Detect CharSet-2.326.1130

Detect CharSet

Author: Jerry Chae

This plugin takes a text file and reports which character set (character code/encoding) the file uses.
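To give a feel for what "detecting a character set" involves, here is a minimal Python sketch. It is not the plugin's actual method, which this page does not describe. It only checks for a byte-order mark and then tries a strict ASCII and UTF-8 decode. The helper name guess_encoding is hypothetical.

```python
import codecs

# Byte-order marks, checked longest first so that a UTF-32 LE mark
# is not mistaken for the shorter UTF-16 LE mark it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8_sig"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return a best-guess codec name for raw bytes."""
    # 1. A byte-order mark at the start identifies the Unicode encodings.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise try strict decodes, narrowest codec first.
    for name in ("ascii", "utf_8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # 3. Legacy single-byte and multi-byte encodings need statistical
    #    analysis, which this sketch does not attempt.
    return "unknown"
```

A real detector also has to tell legacy encodings such as cp949 and shift_jis apart, which is why the plugin reports a confidence value rather than a certain answer.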

Need help?

Contact technical support at tech@argos-labs.com

You may also search all operations.

Input

Any text file (.txt and .csv are the most common)

Output

A CSV file with three columns.
The headers are: Language, Encoding, Confidence.

Note: The Language field is left blank when the detected encoding covers all languages (for example, utf_8).
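The output described above could be consumed in a later step roughly as follows. This is a sketch under assumptions: the sample row is invented for illustration, the header names are assumed to be spelled exactly as listed, and Confidence is assumed to be a number. The helper name read_result is hypothetical.

```python
import csv
import io

# Hypothetical sample of the plugin's output; the values are illustrative only.
sample_output = "Language,Encoding,Confidence\n,utf-8,0.99\n"


def read_result(text):
    """Parse the three-column result and return the first data row as a dict."""
    reader = csv.DictReader(io.StringIO(text))
    row = next(reader)
    return {
        # A blank Language cell means the encoding covers all languages.
        "language": row["Language"] or None,
        "encoding": row["Encoding"],
        # Assumption: Confidence is written as a plain number.
        "confidence": float(row["Confidence"]),
    }


result = read_result(sample_output)
print(result)
```

Mapping the blank Language cell to None makes the "all languages" case explicit, so later steps do not have to compare against an empty string.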

Character Code Table

Codec	Languages	Codec	Languages	Codec	Languages	Codec	Languages
ascii	English	cp869	Greek	gbk	Unified Chinese	johab	Korean
big5	Traditional Chinese	cp874	Thai	gb18030	Unified Chinese	koi8_r	Russian
big5hkscs	Traditional Chinese	cp875	Greek	hz	Simplified Chinese	koi8_t	Tajik
cp037	English	cp932	Japanese	iso2022_jp	Japanese	koi8_u	Ukrainian
cp273	German	cp949	Korean	iso2022_jp_1	Japanese	kz1048	Kazakh
cp424	Hebrew	cp950	Traditional Chinese	iso2022_jp_2	Japanese, Korean, Simplified Chinese, Western Europe, Greek	mac_cyrillic	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp437	English	cp1006	Urdu	iso2022_jp_2004	Japanese	mac_greek	Greek
cp500	Western Europe	cp1026	Turkish	iso2022_jp_3	Japanese	mac_iceland	Icelandic
cp720	Arabic	cp1125	Ukrainian	iso2022_jp_ext	Japanese	mac_latin2	Central and Eastern Europe
cp737	Greek	cp1140	Western Europe	iso2022_kr	Korean	mac_roman	Western Europe
cp775	Baltic languages	cp1250	Central and Eastern Europe	latin_1	Western Europe	mac_turkish	Turkish
cp850	Western Europe	cp1251	Bulgarian, Byelorussian, Macedonian, Russian, Serbian	iso8859_2	Central and Eastern Europe	ptcp154	Kazakh
cp852	Central and Eastern Europe	cp1252	Western Europe	iso8859_3	Esperanto, Maltese	shift_jis	Japanese
cp855	Bulgarian, Byelorussian, Macedonian, Russian, Serbian	cp1253	Greek	iso8859_4	Baltic languages	shift_jis_2004	Japanese
cp856	Hebrew	cp1254	Turkish	iso8859_5	Bulgarian, Byelorussian, Macedonian, Russian, Serbian	shift_jisx0213	Japanese
cp857	Turkish	cp1255	Hebrew	iso8859_6	Arabic	utf_32	all languages
cp858	Western Europe	cp1256	Arabic	iso8859_7	Greek	utf_32_be	all languages
cp860	Portuguese	cp1257	Baltic languages	iso8859_8	Hebrew	utf_32_le	all languages
cp861	Icelandic	cp1258	Vietnamese	iso8859_9	Turkish	utf_16	all languages
cp862	Hebrew	cp65001	Windows only: Windows UTF-8 (CP_UTF8)	iso8859_10	Nordic languages	utf_16_be	all languages
cp863	Canadian	euc_jp	Japanese	iso8859_11	Thai languages	utf_16_le	all languages
cp864	Arabic	euc_jis_2004	Japanese	iso8859_13	Baltic languages	utf_7	all languages
cp865	Danish, Norwegian	euc_jisx0213	Japanese	iso8859_14	Celtic languages	utf_8	all languages
cp866	Russian	euc_kr	Korean	iso8859_15	Western Europe	utf_8_sig	all languages
		gb2312	Simplified Chinese	iso8859_16	South-Eastern Europe
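The codec names in the table appear to match the names accepted by Python's standard codecs module. Assuming that holds, a detected codec name can be used to decode the file's bytes, as in this sketch. The helper name decode_with and the sample text are hypothetical.

```python
import codecs


def decode_with(codec_name: str, data: bytes) -> str:
    """Hypothetical helper: decode bytes using a codec name from the table."""
    # codecs.lookup raises LookupError if Python does not know the name.
    info = codecs.lookup(codec_name)
    return data.decode(info.name)


# Illustrative round trip: Korean text stored as cp949 bytes.
raw = "한글".encode("cp949")
text = decode_with("cp949", raw)
print(text)
```

Calling codecs.lookup first separates an unknown codec name (LookupError) from bytes that do not fit the codec (UnicodeDecodeError).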
