bad detection
You can use the whatlanguage gem to detect the language of a Ruby string.
Note that it has not been updated in quite a while, and there may be better alternatives. However, it still works.
It has problems with short strings, but works quite well on longer texts.
Use it like this:
>> WhatLanguage.new(:all).language('Half the price of a hotel for twice the space')
=> :english
There is also a convenience method on Strings (you may need to require 'whatlanguage/string'):
>> 'Wir entwickeln und betreiben anspruchsvolle Webanwendungen'.language
=> :german
Depending on your users' input, consider restricting detection to fewer languages for better accuracy:
>> WhatLanguage.new(:all).language('Hello')
=> :russian # nope
>> WhatLanguage.new(:german, :english).language('Hello')
=> :english
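Since short strings are where detection goes wrong, one option is to refuse to guess below a certain length. The following is only a minimal sketch: `guess_language` is a hypothetical helper (not part of the gem), and the `min_length` of 40 characters is an arbitrary assumption you would need to tune for your own input. It relies only on the `language` method shown above.

```ruby
# Hypothetical helper, not part of the whatlanguage gem.
# `detector` is any object that responds to #language(text),
# e.g. WhatLanguage.new(:german, :english).
# `min_length` is an arbitrary threshold chosen for illustration.
def guess_language(text, detector:, min_length: 40)
  cleaned = text.to_s.strip
  return nil if cleaned.length < min_length # too short to trust a guess

  detector.language(cleaned)
end

# Usage (assumes the whatlanguage gem is installed):
#   require 'whatlanguage'
#   detector = WhatLanguage.new(:german, :english)
#   guess_language('Hello', detector: detector) # => nil (too short)
#   guess_language('Wir entwickeln und betreiben anspruchsvolle Webanwendungen', detector: detector)
```

Returning nil lets the caller fall back to a default language instead of acting on an unreliable guess.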
WARNING
whatlanguage's detection is really unreliable, even with a restricted set of languages:
LANGUAGES = WhatLanguage.new(:english, :german, :french, :italian, :spanish)
LANGUAGES.language("Updated: ElasticSearch - a database alternative?")
=> :french
LANGUAGES.language("Updated a database alternative?")
=> :german
LANGUAGES.language("a database alternative?")
=> :french
LANGUAGES.language("a database")
=> :french
LANGUAGES.language("database")
=> :italian
An alternative could be Compact Language Detection, but it contains native extensions.
Posted by Arne Hartherz to makandra dev (2010-09-10 17:14)