Tip of the Trade: Fixing Filename Encodings
In the beginning was ASCII, a 7-bit character set standardized by ANSI, and it was the universal language of computers. But ASCII covered only unaccented English letters, digits, and punctuation, so dozens of incompatible 8-bit extensions were created for other languages, such as ISO-8859-7 for Greek. This became a big mess: the same byte could mean a different character depending on which charset was assumed, so text moved between systems was often garbled. Then, one day, some brainiacs invented Unicode. Unicode aims to replace all of those incompatible, messy charsets with a single giant character set that assigns a unique code point to each of the world's characters.
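To make the "big mess" concrete, here is a small Python sketch. The helper `interpret` is hypothetical, written only for illustration, and it expects a single byte. The same byte, 0xE1, decodes to a different character depending on which legacy charset is assumed, while Unicode gives each character its own code point and its own UTF-8 byte sequence.

```python
import unicodedata


def interpret(raw, charset):
    """Hypothetical helper: decode one legacy byte and describe the result.

    Returns the Unicode code point, the character's Unicode name, and the
    character's UTF-8 encoding. Assumes `raw` holds exactly one byte.
    """
    ch = raw.decode(charset)
    return f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8")


# One byte, as it might sit in an old file (illustrative value).
raw = bytes([0xE1])
for charset in ("iso-8859-7", "iso-8859-1"):
    print(charset, *interpret(raw, charset))
# iso-8859-7 U+03B1 GREEK SMALL LETTER ALPHA b'\xce\xb1'
# iso-8859-1 U+00E1 LATIN SMALL LETTER A WITH ACUTE b'\xc3\xa1'
```

Without knowing which charset a file was written in, there is no way to tell from the byte alone whether it meant alpha or a-acute, which is exactly the ambiguity Unicode removes.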
Unicode is still a work in progress, but it has been widely adopted and is now the accepted standard. However, we are still in transition, and there are often funny little messes to clean up, such as archives of files whose names or contents use the old pre-Unicode encodings. Linux and Unix users have two great little commands for fixing them: convmv for converting the encodings of filenames, and iconv for converting the contents of files.
convmv, written in Perl, converts file and directory names from one character encoding to another. It converts only the names, not the files' contents. This example is a dry run that shows what would happen if you converted the filenames in the convertme directory from ISO-8859-7 (Greek) to UTF-8:
$ convmv -f iso-8859-7 -t utf8 convertme/
By default, nothing gets changed, so when you're ready to do it for real, add the --notest option:
$ convmv -f iso-8859-7 -t utf8 --notest convertme/
Add -r to recurse through subdirectories.
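For readers curious about what a filename conversion involves, here is a rough Python sketch of the same idea. It is not convmv's code: the function `names_to_utf8` and its parameters are hypothetical, chosen to mirror the dry run, --notest and -r behavior described above. It works on raw byte names (via `os.fsencode`), decodes each name in the source encoding, and re-encodes it as UTF-8. As simplifying assumptions, it skips names that are already valid UTF-8, names containing bytes undefined in the source charset, and renames whose target already exists.

```python
import os


def names_to_utf8(top, from_enc="iso-8859-7", notest=False, recursive=False):
    """Hypothetical convmv-like renamer: a sketch, not convmv itself.

    Re-encodes file and directory *names* under `top` from `from_enc` to
    UTF-8; file contents are never touched. Like convmv's default test mode,
    it only prints the planned renames unless notest=True.
    Returns the list of (old_path, new_path) pairs as bytes.
    """
    top_b = os.fsencode(top)  # raw bytes: legacy names are not valid UTF-8
    if recursive:
        # Bottom-up walk, so entries inside a directory are renamed
        # before the directory's own name changes.
        batches = [(d, dirs + files)
                   for d, dirs, files in os.walk(top_b, topdown=False)]
    else:
        batches = [(top_b, os.listdir(top_b))]  # top-level entries only

    planned = []
    for parent, names in batches:
        for old in names:
            try:
                old.decode("utf-8")
                continue  # already valid UTF-8 (including plain ASCII): skip
            except UnicodeDecodeError:
                pass
            try:
                new = old.decode(from_enc).encode("utf-8")
            except UnicodeDecodeError:
                continue  # byte undefined in from_enc: leave the name alone
            src = os.path.join(parent, old)
            dst = os.path.join(parent, new)
            if os.path.lexists(dst):
                continue  # never overwrite an existing entry
            planned.append((src, dst))
            if notest:
                os.rename(src, dst)
            else:
                print(f"would rename {src!r} -> {dst!r}")
    return planned
```

Calling `names_to_utf8("convertme")` only prints the planned renames, while `names_to_utf8("convertme", notest=True, recursive=True)` performs them throughout the tree. Working with bytes rather than strings matters here, because the broken names cannot be decoded as UTF-8 in the first place.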
iconv works much the same way, except that it converts the contents of files, not their names:
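$ iconv -f ISO-8859-7 -t UTF-8 convertme > converted
Here convertme is the input file, and the shell redirect writes the result to the new file converted. iconv treats every filename on its command line as input, so without the redirect the results are displayed on standard output; GNU iconv also accepts -o converted to name the output file. See man convmv and man iconv for complete command options.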
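The same conversion applied to file contents can be sketched in a few lines of Python. `convert_contents` is a hypothetical helper, not part of iconv. For simplicity it reads the whole file into memory, which is fine for ordinary text files, and it decodes strictly, so an undefined byte raises an error rather than being silently replaced, much as iconv reports an error on input it cannot convert.

```python
import sys


def convert_contents(src, dst=None, from_enc="iso-8859-7", to_enc="utf-8"):
    """Hypothetical helper: re-encode a file's contents, roughly like
    `iconv -f ISO-8859-7 -t UTF-8 src > dst`. A sketch, not iconv itself."""
    with open(src, "rb") as f:
        data = f.read()
    # Strict decoding: a byte undefined in from_enc raises UnicodeDecodeError.
    converted = data.decode(from_enc).encode(to_enc)
    if dst is None:
        sys.stdout.buffer.write(converted)  # no output file: standard output
    else:
        with open(dst, "wb") as f:
            f.write(converted)
    return converted
```

For example, `convert_contents("convertme", "converted")` corresponds to the iconv command above, and `convert_contents("convertme")` writes the result to standard output.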