Input
- Any text file (.txt and .csv are the most common)
Output
A CSV file with three column headers: Language, Encoding, Confidence.
Note: the Language field is left blank when the detected encoding covers all languages (for example, utf_8).
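
As an illustration of how this output could be consumed, here is a minimal Python sketch that reads a report with the three headers above and treats a blank Language as "all languages". The sample rows and the function name `load_report` are hypothetical and invented for this example; they are not real output from the tool. The sketch also assumes Confidence is written as a decimal number.

```python
import csv
import io

# Hypothetical sample report; these rows are invented for illustration only.
SAMPLE_REPORT = (
    "Language,Encoding,Confidence\n"
    "Russian,koi8_r,0.95\n"
    ",utf_8,0.99\n"
)


def load_report(text):
    """Parse a report with the headers Language, Encoding, Confidence."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            # A blank Language means the encoding covers all languages.
            "language": row["Language"] or "all languages",
            "encoding": row["Encoding"],
            # Assumption: Confidence is stored as a decimal number.
            "confidence": float(row["Confidence"]),
        })
    return rows


for entry in load_report(SAMPLE_REPORT):
    print(entry["encoding"], entry["language"], entry["confidence"])
```
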
Character Code Table
Codec | Languages | Codec | Languages | Codec | Languages | Codec | Languages |
---|---|---|---|---|---|---|---|
ascii | English | cp869 | Greek | gbk | Unified Chinese | johab | Korean |
big5 | Traditional Chinese | cp874 | Thai | gb18030 | Unified Chinese | koi8_r | Russian |
big5hkscs | Traditional Chinese | cp875 | Greek | hz | Simplified Chinese | koi8_t | Tajik |
cp037 | English | cp932 | Japanese | iso2022_jp | Japanese | koi8_u | Ukrainian |
cp273 | German | cp949 | Korean | iso2022_jp_1 | Japanese | kz1048 | Kazakh |
cp424 | Hebrew | cp950 | Traditional Chinese | iso2022_jp_2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek | mac_cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp437 | English | cp1006 | Urdu | iso2022_jp_2004 | Japanese | mac_greek | Greek |
cp500 | Western Europe | cp1026 | Turkish | iso2022_jp_3 | Japanese | mac_iceland | Icelandic |
cp720 | Arabic | cp1125 | Ukrainian | iso2022_jp_ext | Japanese | mac_latin2 | Central and Eastern Europe |
cp737 | Greek | cp1140 | Western Europe | iso2022_kr | Korean | mac_roman | Western Europe |
cp775 | Baltic languages | cp1250 | Central and Eastern Europe | latin_1 | Western Europe | mac_turkish | Turkish |
cp850 | Western Europe | cp1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian | iso8859_2 | Central and Eastern Europe | ptcp154 | Kazakh |
cp852 | Central and Eastern Europe | cp1252 | Western Europe | iso8859_3 | Esperanto, Maltese | shift_jis | Japanese |
cp855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian | cp1253 | Greek | iso8859_4 | Baltic languages | shift_jis_2004 | Japanese |
cp856 | Hebrew | cp1254 | Turkish | iso8859_5 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian | shift_jisx0213 | Japanese |
cp857 | Turkish | cp1255 | Hebrew | iso8859_6 | Arabic | utf_32 | all languages |
cp858 | Western Europe | cp1256 | Arabic | iso8859_7 | Greek | utf_32_be | all languages |
cp860 | Portuguese | cp1257 | Baltic languages | iso8859_8 | Hebrew | utf_32_le | all languages |
cp861 | Icelandic | cp1258 | Vietnamese | iso8859_9 | Turkish | utf_16 | all languages |
cp862 | Hebrew | cp65001 | Windows UTF-8 (CP_UTF8), Windows only | iso8859_10 | Nordic languages | utf_16_be | all languages |
cp863 | Canadian | euc_jp | Japanese | iso8859_11 | Thai languages | utf_16_le | all languages |
cp864 | Arabic | euc_jis_2004 | Japanese | iso8859_13 | Baltic languages | utf_7 | all languages |
cp865 | Danish, Norwegian | euc_jisx0213 | Japanese | iso8859_14 | Celtic languages | utf_8 | all languages |
cp866 | Russian | euc_kr | Korean | iso8859_15 | Western Europe | utf_8_sig | all languages |
| | | gb2312 | Simplified Chinese | iso8859_16 | South-Eastern Europe | | |
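
The codec names in this table look like Python's standard encoding names. Assuming that is the case, a value from the Encoding column can be passed straight to `bytes.decode`. The sketch below checks the name with `codecs.lookup` first; the helper name `decode_with` is hypothetical.

```python
import codecs


def decode_with(raw, codec):
    """Decode raw bytes using a codec name taken from the table."""
    # codecs.lookup raises LookupError if Python does not know the name.
    codecs.lookup(codec)
    return raw.decode(codec)


# Round-trip a Russian word through koi8_r (listed as Russian in the table).
raw = "Привет".encode("koi8_r")
print(decode_with(raw, "koi8_r"))
```

An unknown name such as a misspelled codec fails early with `LookupError`, which is easier to diagnose than a garbled decode.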