Language Detection Library for Java
The language-detection library is an open-source Java library that detects the language in which a text is written.
(This task is also known as 'language identification', 'language guessing' and 'language recognition'.)
- Over 99% precision for 40+ languages
- Detects the language of a text using a naive Bayesian filter
- Generates language profiles from a Wikipedia abstract database file
- Supported languages (47 bundled profiles):
  - Afrikaans, Arabic, Bulgarian, Bengali, Czech, German, Greek, English, Spanish, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Macedonian, Malayalam, Marathi, Nepali, Dutch, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Somali, Albanian, Swedish, Swahili, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Simplified Chinese, Traditional Chinese.
- Project Homepage:
- Apache License 2.0
- Shuyo Nakatani (Twitter: @shuyo) / Cybozu Labs, Inc.
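
To illustrate the general idea behind the "naive Bayesian filter" and "language profiles" mentioned above, here is a minimal, self-contained sketch. It is not the library's code and does not use the library's API. The class name NaiveBayesLangSketch, the helpers buildProfile and detect, the use of character 1- to 3-grams, the add-one smoothing, the uniform prior, and the tiny training sentences are all assumptions made for this illustration; a real profile would be built from a much larger corpus such as the Wikipedia abstracts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a toy naive Bayes language guesser over
// character n-grams. All names here are hypothetical, not library API.
class NaiveBayesLangSketch {

    // Build a "profile": counts of every character 1-, 2- and 3-gram in the text.
    static Map<String, Integer> buildProfile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int n = 1; n <= 3; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Return the language whose profile gives the text the highest
    // log-likelihood, assuming a uniform prior over languages.
    static String detect(String text, Map<String, Map<String, Integer>> profiles) {
        // Vocabulary = all n-grams seen in any profile (used for smoothing).
        Set<String> vocab = new HashSet<>();
        for (Map<String, Integer> p : profiles.values()) {
            vocab.addAll(p.keySet());
        }
        Map<String, Integer> grams = buildProfile(text);

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            Map<String, Integer> profile = e.getValue();
            long total = 0;
            for (int c : profile.values()) {
                total += c;
            }
            double score = 0.0;
            for (Map.Entry<String, Integer> g : grams.entrySet()) {
                int count = profile.getOrDefault(g.getKey(), 0);
                // Add-one smoothing so unseen n-grams do not zero out the score.
                double p = (count + 1.0) / (total + vocab.size());
                score += g.getValue() * Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> profiles = new HashMap<>();
        // Toy training data; far too small for real use.
        profiles.put("en", buildProfile(
            "the quick brown fox jumps over the lazy dog and this is a simple sentence about the weather"));
        profiles.put("es", buildProfile(
            "el perro perezoso salta sobre el zorro y esta es una frase sencilla sobre el tiempo de hoy"));

        System.out.println(detect("this is the weather", profiles));
        System.out.println(detect("esta es una frase", profiles));
    }
}
```

The sketch scores each candidate language by summing the log-probabilities of the n-grams found in the input and picks the highest score. Working in log space avoids numeric underflow when many small probabilities are multiplied.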