Unicode Support

By William Rowe, Jr.
Posted May 2, 2002

The single biggest benefit in Apache 2.0 that Windows NT (as well as 2000 and XP) users will notice is Unicode support. Every filename in a request to httpd is treated as a Unicode filename encoded in utf-8. Request URLs were originally defined as ASCII-encoded, which leaves the other 128 possible byte values undefined and up to the imagination of the author.

Windows systems traditionally used the local codepage, so Apache running on U.S. versions of Windows interpreted filenames entirely differently from Apache running on Japanese versions. This gives the server no clue about how to serve a request from a U.S. browser versus a request from a Japanese browser. No rules were ever put in place to provide consistent "hints" about the meaning of those other characters.
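To make the ambiguity concrete, here is a minimal sketch in Python (not Apache code). The sample character and the two codepages, cp932 for Japanese Windows and cp1252 for U.S. Windows, are chosen purely for illustration.

```python
# The same bytes mean different things under different local codepages.
# cp932 and cp1252 are illustrative stand-ins for the "Japanese" and
# "U.S." Windows codepages; the sample character is arbitrary.
raw = "日".encode("cp932")           # bytes as a Japanese system would store them

as_japanese = raw.decode("cp932")    # read back with the Japanese codepage
as_western = raw.decode("cp1252")    # read back with the Western codepage

print(raw.hex())      # the raw byte values, all above 0x7F
print(as_japanese)    # the original character
print(as_western)     # two unrelated Western characters
```

Nothing in the bytes themselves says which reading is the intended one, which is the problem a single agreed encoding removes.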

Most Internet drafts that attempt to deal with this ambiguity focus on utf-8 encoding of Unicode to extend the available characters in request URLs. With utf-8 request URLs and Unicode filenames, Apache running on any Windows NT machine can access filenames in any language or character set. The autoindex module even responds to a request for a Web directory listing with the utf-8 character set identifier, so all modern browsers display all the filenames correctly.
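As a rough illustration of what a utf-8 request URL looks like on the wire, the following Python sketch percent-encodes a non-ASCII filename and decodes it again. The filename is hypothetical, and the code uses Python's standard library only; it shows the encoding convention, not how Apache itself is implemented.

```python
from urllib.parse import quote, unquote

# Hypothetical filename containing non-ASCII characters.
filename = "日本語.html"

# quote() encodes the text as utf-8 by default and then percent-escapes
# every byte that is not allowed to appear literally in a URL path.
encoded = quote(filename)
print(encoded)

# A server that assumes utf-8 can recover the exact Unicode filename,
# regardless of the local codepage on either machine.
decoded = unquote(encoded)
print(decoded == filename)
```

Because both sides agree on utf-8, the escaped URL contains only ASCII characters yet still identifies one unambiguous Unicode filename.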

Unicode has one other side effect: this change is marginally faster than asking Windows NT to handle the old, 8-bit local codepage filenames and identifiers. Windows NT (like 2000 and XP) was designed from the ground up as an international operating system, using Unicode throughout the entire kernel. Every identifier within the kernel is Unicode, so every single system call that uses an old codepage must also translate its identifiers.

On the other hand, utf-8 translation is extremely efficient. So even though it took many years, Apache has caught up with Windows and the worldwide adoption of Unicode and utf-8 standards.
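The translation step can be sketched in a few lines of Python. This assumes the kernel's "Unicode" strings are 16-bit (UTF-16, little-endian) wide strings, which the article does not spell out, and the filename is hypothetical. It only illustrates the conversion; it says nothing about how Apache or Windows implement it internally.

```python
# Hypothetical filename as it would arrive in a utf-8 request.
utf8_bytes = "résumé.txt".encode("utf-8")

# utf-8 to UTF-16 is a fixed, algorithmic bit rearrangement; it needs
# no per-locale codepage table.
# Assumption: the kernel-side form is UTF-16, little-endian.
text = utf8_bytes.decode("utf-8")
wide = text.encode("utf-16-le")

print(len(utf8_bytes))   # byte length of the utf-8 form
print(len(wide))         # byte length of the 16-bit form

# The conversion is lossless in both directions.
print(wide.decode("utf-16-le").encode("utf-8") == utf8_bytes)
```

A codepage conversion, by contrast, depends on which table the system is configured to use, so the same bytes can produce different Unicode strings on different machines.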

Unicode and utf-8 are not part of the URL specification today. But nearly every working group is moving in that direction, so this change is expected to give Apache 2.0 years of service as Web site authors continue to increase their sites' appeal to international users.

The Apache 2.0 configuration and .htaccess files may be saved as utf-8 text files (even Windows' Notepad provides this feature) to provide access control over directories and files. But these benefits aren't available to Windows 9x (95, 98 or ME) users, since this fundamental API change was introduced with Windows NT.

These changes to Apache 2.0 make it the clear Web server of choice for enterprises wishing to standardize on the most widely used Web serving technology on the market. Whether you're running Windows, Unix or a heterogeneous infrastructure, Apache 2.0 now fits the bill.
