Tip of the Trade: GNU Recode

By Carla Schroder
Posted Dec 26, 2006


When it comes to conquering character encoding chaos, GNU Recode is a simple key to Unicode conformity.

In the beginning, there were C and C++, as well as a host of other computer programming languages. All are based on ASCII (American Standard Code for Information Interchange), which, as the name implies, is built around the English alphabet. This wouldn't be an issue except that there are many humans in the world, and they don't all use the English alphabet.

So along came Unicode to the rescue. Unicode provides a framework for representing all of the world's alphabets on computers. UTF-8 is the most popular Unicode encoding because it preserves backward compatibility with ASCII. That's all fun to know, but what good is it when you're looking at piles of files that must be converted from ISO-8859-1 (Latin-1, Western European) into whatever encoding you prefer? Naturally, there are a number of utilities just for this task.

GNU Recode supports more than 150 character sets and converts just about anything to anything. For example, plenty of legacy Linux systems still hold files encoded in ISO-8859-1. GNU Recode converts these to nice modern UTF-8, like this:

$ recode ISO-8859-1..UTF-8 recode-test.txt
Check out the GNU Recode Manual for instructions.
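
Because recode rewrites the files it is given in place, it can be worth trying a conversion on a throwaway sample first. The sketch below is one way to do that: it builds a tiny Latin-1 file, converts it through a pipe so the source stays untouched, and dumps the resulting bytes. The file names sample-latin1.txt and sample-utf8.txt are made up for illustration, and the iconv fallback is an assumption for machines where recode is not installed.

```shell
#!/bin/sh
# Sketch: convert a Latin-1 sample to UTF-8 without touching the original.
# sample-latin1.txt and sample-utf8.txt are hypothetical file names.

# "café" in ISO-8859-1: the é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > sample-latin1.txt

if command -v recode >/dev/null 2>&1; then
    # With no file argument, recode filters stdin to stdout,
    # so the source file is left unchanged.
    recode ISO-8859-1..UTF-8 < sample-latin1.txt > sample-utf8.txt
else
    # Assumption: iconv stands in when recode is not installed.
    iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt
fi

# Dump the bytes in hex: in UTF-8 the é becomes the two bytes c3 a9.
od -An -tx1 sample-utf8.txt
```

If the dump shows c3 a9 where the Latin-1 file had e9, the conversion did what you wanted, and you can run recode on the real files with more confidence.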

That's fast and easy enough, but one job remains: converting the filenames. The convmv utility is just the tool for this. This example converts all the ISO-8859-1 filenames in the files/ directory to UTF-8:

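$ convmv -r -f iso-8859-1 -t utf8 --notest files/
The -r option tells convmv to descend into files/ instead of looking only at the name given on the command line. Run without the --notest option, convmv does a dry run without changing anything, which is probably a wise first step.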
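
Before renaming anything, it can help to see which names actually need attention. The following is a minimal sketch, not part of convmv: the helper names name_is_utf8 and list_non_utf8_names are invented here, and the check leans on iconv rejecting byte sequences that are not valid UTF-8. Names that are already valid UTF-8 (plain ASCII included) are left off the list.

```shell
#!/bin/sh
# Sketch: list paths whose final component is not valid UTF-8.
# Both function names are hypothetical helpers, not convmv features.

# Succeeds when the argument is valid UTF-8. Converting UTF-8 to UTF-8
# changes nothing, but iconv exits non-zero on an invalid sequence.
name_is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Walk a directory tree and print every path whose last component
# fails the check. Names containing newlines are not handled.
list_non_utf8_names() {
    find "$1" | while IFS= read -r path; do
        name_is_utf8 "${path##*/}" || printf '%s\n' "$path"
    done
}
```

Running list_non_utf8_names files/ prints the candidates, one per line. It cannot tell you which legacy encoding a name uses, only that the name is not UTF-8, so the convmv dry run is still the place to confirm that the result looks right.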

Maybe you have a file whose encoding you don't know. Upload the file to this online tool, and it will tell you. You can even convert files there.
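
If you would rather not upload anything, many systems ship a file utility that can make a local guess. This is a hedged sketch, assuming file is installed and supports the --mime-encoding option; the function name guess_encoding and the guess-*.txt sample files are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: ask the local file(1) utility for an encoding guess.
# guess_encoding and the guess-*.txt names are hypothetical.

guess_encoding() {
    # -b omits the file name; --mime-encoding prints only the charset guess.
    file -b --mime-encoding "$1"
}

# Three small samples: pure ASCII, UTF-8, and ISO-8859-1.
printf 'plain ascii\n' > guess-ascii.txt
printf 'caf\303\251\n' > guess-utf8.txt
printf 'caf\351\n'     > guess-latin1.txt

for f in guess-*.txt; do
    printf '%s: %s\n' "$f" "$(guess_encoding "$f")"
done
```

Treat the answer as a guess. ASCII and UTF-8 are easy to recognize, but the various 8-bit encodings look much alike at the byte level, so a result for one of those deserves a second look.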

Resources

The subject of character encoding is huge and bewildering, especially for us dinosaurs from the typewriter era, when hitting a typewriter key produced the same character every single time. Wikipedia has a number of excellent introductory articles.
