Detect the language of a string

Deprecated: bad detection.

You can use the whatlanguage gem to detect the language of a Ruby string.
Note that it has not been updated in quite a while and there may be better alternatives. However, it still works.

It has problems with short strings, but works quite well on longer texts.

Use it like this:

>> WhatLanguage.new(:all).language('Half the price of a hotel for twice the space')
=> :english

There is also a convenience method on Strings (you may need to require 'whatlanguage/string').

>> 'Wir entwickeln und betreiben anspruchsvolle Webanwendungen'.language
=> :german

Depending on your users' input, consider using fewer languages for better accuracy:

>> WhatLanguage.new(:all).language('Hello')
=> :russian # nope
>> WhatLanguage.new(:german, :english).language('Hello')
=> :english
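
Since short strings are the weak spot, one option is to skip detection below a minimum length and only pass longer texts to a detector restricted to the languages you expect. The following is a minimal sketch of that idea, not part of the whatlanguage gem: `detect_language` and `MIN_LENGTH` are hypothetical names, and the threshold of 40 characters is an assumption you would need to tune against your own data.

```ruby
# Hypothetical helper, not part of the whatlanguage gem.
# `detector` is any object that responds to #language(text),
# e.g. WhatLanguage.new(:german, :english).
MIN_LENGTH = 40 # assumed threshold, tune it against your own data

def detect_language(text, detector, min_length: MIN_LENGTH)
  stripped = text.to_s.strip
  # Too short to trust a guess, so report "unknown" instead.
  return nil if stripped.length < min_length

  detector.language(stripped)
end

# Usage (requires the whatlanguage gem):
#   require 'whatlanguage'
#   detector = WhatLanguage.new(:german, :english)
#   detect_language('Hello', detector) # => nil, shorter than MIN_LENGTH
```

Returning nil for short input lets the caller fall back to a default language instead of acting on a likely wrong guess.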

WARNING

whatlanguage's detection is really bad:

LANGUAGES = WhatLanguage.new(:english, :german, :french, :italian, :spanish)

LANGUAGES.language("Updated: ElasticSearch - a database alternative?")
=> :french
LANGUAGES.language("Updated a database alternative?")
=> :german
LANGUAGES.language("a database alternative?")
=> :french
LANGUAGES.language("a database")
=> :french
LANGUAGES.language("database")
=> :italian
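
Spot checks like the ones above can be turned into a small accuracy measurement before you rely on any detector. This is only a sketch under assumptions: `detection_accuracy` is a hypothetical helper, and the labeled samples are something you have to collect yourself from realistic user input.

```ruby
# Hypothetical helper, not part of any gem.
# `detector` is any object that responds to #language(text).
# `samples` is a Hash mapping each text to the language you expect.
def detection_accuracy(detector, samples)
  return 0.0 if samples.empty?

  # Count the texts for which the detector returns the expected language.
  hits = samples.count { |text, expected| detector.language(text) == expected }
  hits.to_f / samples.size
end

# Usage (requires the whatlanguage gem):
#   require 'whatlanguage'
#   detector = WhatLanguage.new(:english, :german, :french, :italian, :spanish)
#   detection_accuracy(detector, 'a database alternative?' => :english)
```

Because the helper only needs an object with a `language` method, the same samples can be used to compare whatlanguage against an alternative.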

An alternative could be Compact Language Detection, but it contains native extensions.

Last edit: Natalie Zeumann
License: Source code in this card is licensed under the MIT License.
Posted by Arne Hartherz to makandra dev (2010-09-10 17:14)