Tip of the Trade: GNU Recode
In the beginning, there were C and C++, as well as a host of other programming languages. All are based on ASCII (American Standard Code for Information Interchange), which, as the name implies, is based on the English alphabet. This wouldn't be an issue except that there are many humans in the world, and they don't all use the English alphabet. When it comes to conquering character-encoding chaos, GNU Recode is a simple key to Unicode conformity.
So along came Unicode to the rescue. Unicode provides a framework for all of the alphabets of the world to be represented on computers. UTF-8 is the most popular Unicode encoding because it preserves backward compatibility with ASCII: every ASCII file is already valid UTF-8. That is all fun to know, but what good is it when you're looking at piles of files that must be converted from ISO-8859-1 (Latin-1, Western European) into whatever encoding you prefer? Naturally, there are a number of utilities just for this task.
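To make the backward-compatibility point concrete, here is a minimal sketch that dumps the bytes of the same text in both encodings. It assumes a POSIX shell with `printf`, `od`, `xargs`, and `iconv` available; `iconv` is used only for illustration and is not part of GNU Recode, and the `hex` helper and variable names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as space-separated hex bytes.
hex() { od -An -tx1 | xargs; }

# Pure ASCII text: the bytes are the same before and after conversion.
ascii_latin1=$(printf 'cafe' | hex)
ascii_utf8=$(printf 'cafe' | iconv -f ISO-8859-1 -t UTF-8 | hex)

# "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351).
accent_latin1=$(printf 'caf\351' | hex)
# After conversion to UTF-8, é becomes the two bytes 0xC3 0xA9.
accent_utf8=$(printf 'caf\351' | iconv -f ISO-8859-1 -t UTF-8 | hex)

echo "ASCII  latin1: $ascii_latin1  utf8: $ascii_utf8"
echo "accent latin1: $accent_latin1  utf8: $accent_utf8"
```

Only the accented character changes, which is why plain English text survives a switch to UTF-8 untouched while Western European text does not.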
GNU Recode supports more than 150 character sets and converts just about anything to anything. For example, there are users of legacy Linux systems whose files are still encoded in ISO-8859-1. GNU Recode converts these files to nice modern UTF-8, overwriting each file in place, like this:
$ recode ISO-8859-1..UTF-8 recode-test.txt
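Because recode overwrites the files it is given, it can be worth trying a conversion without touching the original first. The sketch below uses recode as a filter (reading standard input and writing standard output) in a scratch directory, then converts back and compares as a sanity check. The sample file contents and the `.utf8.txt` output name are made up for illustration, and the script skips the conversion if recode is not installed.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: "café" encoded as ISO-8859-1 (é = octal 351).
printf 'caf\351\n' > recode-test.txt

if command -v recode >/dev/null 2>&1; then
    # Filter mode: no file argument, so the original stays untouched.
    recode ISO-8859-1..UTF-8 < recode-test.txt > recode-test.utf8.txt
    # Convert back and compare with the original as a round-trip check.
    recode UTF-8..ISO-8859-1 < recode-test.utf8.txt \
        | cmp -s - recode-test.txt && echo "round trip OK"
else
    echo "recode is not installed; skipping" >&2
fi
```

Once the output looks right, the in-place form with a filename argument does the same conversion directly on the file.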
That's fast and easy enough, but one job remains: converting the filename. The convmv utility is just the tool for this. This example converts all the ISO-8859-1 filenames in the files/ directory to UTF-8:
$ convmv -f iso-8859-1 -t utf8 -r --notest files/
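By default convmv only prints the renames it would perform, and `--notest` is what makes it actually rename. A cautious workflow is therefore to run the command once without `--notest`, read the output, and then run it again with the flag. The sketch below tries this in a scratch directory. It assumes a Linux filesystem that accepts arbitrary bytes in filenames, the sample filename is made up, and the script skips the conversion if convmv is not installed.

```shell
#!/bin/sh
# Scratch directory so no real filenames are touched.
workdir=$(mktemp -d)
mkdir "$workdir/files"

# Hypothetical sample: a name containing é as the ISO-8859-1 byte 0xE9,
# and the same name with é as the UTF-8 bytes 0xC3 0xA9.
latin1_name=$(printf 'caf\351.txt')
utf8_name=$(printf 'caf\303\251.txt')
touch "$workdir/files/$latin1_name"

if command -v convmv >/dev/null 2>&1; then
    # Dry run (the default): prints the planned renames, changes nothing.
    convmv -f iso-8859-1 -t utf8 -r "$workdir/files"
    # Same command with --notest performs the renames.
    convmv -f iso-8859-1 -t utf8 -r --notest "$workdir/files"
    ls "$workdir/files"
else
    echo "convmv is not installed; skipping" >&2
fi
```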
Maybe you have a file whose encoding you don't know. Upload the file to this online tool, and it will tell you. You can even do file conversions there.
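When uploading a file is not an option, a rough local check can narrow things down. The sketch below is one possible approach, not a replacement for a proper detector: it asks `iconv` whether a file decodes cleanly as UTF-8 and, if the `file` utility is installed, prints its heuristic guess. The `is_valid_utf8` helper and the sample files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Two sample files holding "café" in different encodings.
sample_dir=$(mktemp -d)
printf 'caf\303\251\n' > "$sample_dir/utf8.txt"    # é as UTF-8 (C3 A9)
printf 'caf\351\n'     > "$sample_dir/latin1.txt"  # é as ISO-8859-1 (E9)

for f in "$sample_dir"/*.txt; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
    # file(1) only guesses from the bytes it sees; treat it as a hint.
    if command -v file >/dev/null 2>&1; then
        file --mime-encoding "$f"
    fi
done
```

A failed UTF-8 check does not prove the file is ISO-8859-1, since any byte sequence is valid in that encoding, so the result still needs a human look at the converted text.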
Resources
The subject of character encoding is huge and bewildering, especially for us dinosaurs from the typewriter era, when hitting a typewriter key produced the same result every single time. Wikipedia has a number of excellent introductory articles: