How to change the encoding of text files in Linux systems
How many times have you wanted to change the encoding of a text file on a Linux system? Or how many times have you tried to watch a movie, only to find that its .srt subtitles showed up as unreadable characters that you needed to turn back into readable text?
Chances are, many times.
All of this happens because your application reads the text file using the wrong encoding. The solution is very simple: converting the file to the encoding your application expects will end the problem.
In this mini post, I’ll show you how to change the encoding of your text files using the iconv Linux command.
- Changing a File’s Encoding using iconv Linux command
To use the iconv command, you need to know the current encoding of the file you want to convert. Use the following syntax to convert a file’s encoding:
$ iconv -f [from_encoding] -t [to_encoding] [filename] > [output_filename]
Option | Description |
---|---|
-f, --from-code | Convert characters from encoding |
-t, --to-code | Convert characters to encoding |
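If you are not sure whether a file is already UTF-8, one quick check (a minimal sketch, not a full encoding detector) is to convert it from UTF-8 to UTF-8 and throw the output away: iconv exits with a non-zero status as soon as it meets a byte sequence that is not valid UTF-8. The helper name is_utf8 is hypothetical. A plain ASCII file also passes, since ASCII is a subset of UTF-8, and a failing file tells you only that it is not UTF-8, not which encoding it actually uses.

```shell
#!/bin/sh
# is_utf8 FILE (hypothetical helper): succeeds only if FILE exists and is
# valid UTF-8. iconv exits non-zero at the first invalid byte sequence;
# its output and error messages are discarded because only the exit
# status matters here.
is_utf8() {
    [ -r "$1" ] || return 2    # missing or unreadable file
    iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1
}
```

For example, is_utf8 storks.srt && echo "already UTF-8" prints the message only when storks.srt needs no conversion.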
Example 1: Convert a file’s encoding from iso-8859-1 to UTF-8 and save it to New_storks.srt
$ iconv -f iso-8859-1 -t utf-8 storks.srt > New_storks.srt
Now New_storks.srt is UTF-8 encoded.
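Be careful not to redirect the output to the file you are reading: in iconv -f iso-8859-1 -t utf-8 storks.srt > storks.srt the shell truncates storks.srt before iconv reads it, so you end up with an empty file. A minimal sketch for converting a file in place, assuming a POSIX shell and the mktemp command, writes to a temporary file first and only overwrites the original when iconv succeeds. The helper name convert_in_place is hypothetical.

```shell
#!/bin/sh
# convert_in_place FROM TO FILE (hypothetical helper): converts FILE from
# encoding FROM to encoding TO. The result goes to a temporary file first,
# so FILE is only overwritten when iconv finishes without an error.
convert_in_place() {
    from=$1 to=$2 file=$3
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if iconv -f "$from" -t "$to" "$file" > "$tmp"; then
        # Copy back with cat so FILE keeps its original permissions;
        # the temporary file is removed only if the copy succeeded.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"    # conversion failed: leave FILE untouched
        return 1
    fi
}
```

For example, convert_in_place iso-8859-1 utf-8 storks.srt converts the subtitle file without needing a second file name.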
Example 2: Convert a file’s encoding from cp1256 to UTF-8 and save it to output.txt
$ iconv -f cp1256 -t utf-8 input.txt > output.txt
Now output.txt is UTF-8 encoded.
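Subtitles often come as a whole folder of .srt files. The loop below is a minimal sketch, assuming every .srt file in the folder uses the same source encoding (for example cp1256, as in Example 2). It writes each converted copy next to the original with a New_ prefix, following the naming of Example 1. The helper name convert_all_srt and the prefix are illustrative choices, not iconv features.

```shell
#!/bin/sh
# convert_all_srt FROM [DIR] (hypothetical helper): converts every *.srt
# file in DIR (default: the current directory) from encoding FROM to
# UTF-8, writing DIR/New_<name> and leaving the originals unchanged.
convert_all_srt() {
    from=$1 dir=${2:-.}
    for f in "$dir"/*.srt; do
        [ -e "$f" ] || continue                  # no match: glob left unexpanded
        name=${f##*/}                            # file name without directory
        case $name in New_*) continue ;; esac    # skip earlier output files
        if ! iconv -f "$from" -t utf-8 "$f" > "$dir/New_$name"; then
            echo "failed to convert: $f" >&2
            rm -f "$dir/New_$name"               # drop the partial output
        fi
    done
}
```

For example, convert_all_srt cp1256 converts every .srt file in the current directory and reports any file iconv could not convert.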
Example 3: Convert a file’s encoding from ASCII to UTF-8 and save it to output.txt
$ iconv -f ascii -t utf-8 input.txt > output.txt
Now output.txt is UTF-8 encoded.
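Because ASCII is a subset of UTF-8, this conversion does not change a single byte, provided the input really is ASCII (if it contains any byte above 127, iconv -f ascii reports an error instead). The following sketch confirms it with cmp, which exits with status 0 when two files are identical; the sample text and the temporary directory are illustrative.

```shell
#!/bin/sh
# Create a small pure-ASCII sample in a temporary directory.
work=$(mktemp -d)
printf 'Hello, subtitles!\n' > "$work/input.txt"

# Convert it from ASCII to UTF-8, then compare the two files byte by byte.
iconv -f ascii -t utf-8 "$work/input.txt" > "$work/output.txt"
if cmp -s "$work/input.txt" "$work/output.txt"; then
    echo "identical: the ASCII file was already valid UTF-8"
fi
```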
Example 4: Convert a file’s encoding from UTF-8 to ASCII
Hints:
1. UTF-8 can contain characters that cannot be represented in ASCII. When iconv meets one of them, it stops with the error message "illegal input sequence at position X", unless you tell it to omit all non-ASCII characters with the -c option.
2. With the -c option, those characters are silently dropped, so you may lose some characters from your text file.
$ iconv -c -f utf-8 -t ascii input.txt > output.txt
Option | Description |
---|---|
-c | Omit invalid characters from output |
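To see exactly what -c drops, here is a small sketch on an in-memory sample (the word "café" is just an illustration). Without -c, iconv stops at the é and exits with an error; with -c it omits the é. GNU iconv may still return a non-zero exit status with -c when it has omitted characters, so the sketch does not treat that status as a failure.

```shell
#!/bin/sh
# "café" encoded in UTF-8: the é is the two bytes C3 A9 (octal 303 251).
sample=$(printf 'caf\303\251')

# Without -c: iconv stops at the é and exits with a non-zero status.
if ! printf '%s\n' "$sample" | iconv -f utf-8 -t ascii >/dev/null 2>&1; then
    echo "plain conversion failed, as expected"
fi

# With -c: the é is dropped. "|| true" because GNU iconv may still exit
# non-zero after omitting characters.
stripped=$(printf '%s\n' "$sample" | iconv -c -f utf-8 -t ascii) || true
printf '%s\n' "$stripped"    # prints: caf
```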
To list all the coded character sets that iconv knows, run it with the -l option as follows:
$ iconv -l
Option | Description |
---|---|
-l, --list | List known coded character sets |
Here’s part of the output of the above command:
The following list contain all the coded character sets known. This does not necessarily mean that all combinations of these names can be used for the FROM and TO command line parameters. One coded character set can be listed with several different names (aliases). 437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865, 866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4, 8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110, ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5, BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BRF, BS_4730, CA, CN-BIG5, CN-GB, CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278, CP280, CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424, CP437, CP500, CP737, CP770, CP771, CP772, CP773, CP774, CP775, CP803, CP813, CP819, CP850, CP851, CP852, CP855, CP856, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP866NAV, CP868, CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP901, CP902, CP903, CP904, CP905, CP912, CP915, CP916, CP918, CP920, CP921, CP922, CP930, CP932, CP933, CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1008, CP1025, CP1026, CP1046, CP1047, CP1070, CP1079, ........
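The list is long, so piping it through grep (for example iconv -l | grep -i 1256) is a quick way to look for a name. A more direct sketch, assuming a GNU/Linux iconv, is to attempt an empty conversion: iconv reports an error and exits with a non-zero status when it does not recognize an encoding name. The helper name encoding_supported is hypothetical.

```shell
#!/bin/sh
# encoding_supported NAME (hypothetical helper): succeeds if iconv can
# convert from NAME to UTF-8. An empty input is enough, because iconv
# sets up the conversion before it reads any data.
encoding_supported() {
    iconv -f "$1" -t utf-8 </dev/null >/dev/null 2>&1
}

for enc in cp1256 iso-8859-1 no-such-encoding; do
    if encoding_supported "$enc"; then
        echo "$enc: supported"
    else
        echo "$enc: not supported"
    fi
done
```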
Finally, I hope this article is useful for you.