What's New and Improved in Apache 2.0 on Windows?

May 2, 2002

In April, the Apache Software Foundation (ASF) released Apache 2.0.35 for Windows and, with it, a Microsoft Installer (.msi) based binary distribution for all Windows versions from 95 through XP.

The ominous warnings for Windows users, such as "Apache 1.3 is not yet optimized for performance ..." and "not as stable or secure as the Unix version," are gone. Outside testing labs have found Apache 2.0's performance to be comparable to Microsoft's IIS product. Apache 2.0 has finally arrived as a viable alternative to IIS on the Windows operating system.

All this might leave you wondering just what changed with respect to Windows between Apache 1.3 and 2.0 development.

This article outlines three major changes made to Apache that affect the 'fringes' (i.e., those platforms other than Unix), especially as they relate to the Windows platform. The first is the separation of the server's request processing logic into a Multi-Processing Module, or MPM. The second change is the breaking out of platform-specific code into the Apache Portable Runtime library, or APR. The third, and biggest, Windows-NT-specific change is the introduction of Unicode filenames, encoded as utf-8.

The Multi-Processing Module (MPM) does the work for each platform or architecture by distributing many requests for Web pages between processes, threads, and whatever other organizational units the MPM's author has designed. In Windows' case, the new "winnt" MPM introduces the use of AcceptEx, which is the mechanism used by Microsoft's own IIS server.

AcceptEx is a huge optimization win, because this one TCP/IP socket call:

  1. Accepts one connection to the server
  2. Discovers the addresses of the client and this server
  3. Collects the first part of the request (including some or all of the request headers and POST data)

This all happens before Apache 2.0 even finds out the request is pending, and the answers to all of the questions Apache must ask are returned by a single call, making the initial response more than ten times faster than in Apache 1.3.
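To make the saving concrete, here is a minimal sketch of the separate steps that a server using only portable socket calls has to perform, one call per step. It is not AcceptEx itself and not Apache's code; it uses Python's standard `socket` module on the loopback interface, and the function names are made up for illustration.

```python
import socket
import threading


def serve_one(listener, result):
    """Handle one connection using a separate call for every step."""
    # Step 1: accept one connection to the server.
    conn, _ = listener.accept()
    with conn:
        # Step 2: discover the client and server addresses (two more calls).
        result["client"] = conn.getpeername()
        result["server"] = conn.getsockname()
        # Step 3: collect the first part of the request (yet another call).
        result["first_bytes"] = conn.recv(4096)


def demo_separate_calls():
    """Run a throwaway loopback server and send it one tiny request."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    result = {}
    worker = threading.Thread(target=serve_one, args=(listener, result))
    worker.start()
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        worker.join()  # keep the client open until the server has read
    listener.close()
    return result
```

Calling `demo_separate_calls()` returns the two addresses and the first bytes of the request, gathered with four socket calls. AcceptEx, as described above, hands back the same information in one.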

Apache 2.0's "winnt" MPM is a threaded architecture, just as Apache version 1.3 for Windows is. However, threaded MPMs on Unix are new to Apache 2.0. This means _all_ module authors must now pay attention to "thread safety," an issue neglected by some Apache 1.3 module authors (who were principally or only interested in Unix).

In the past, when one attempted to build such modules on Windows, the modules might crash and burn, or simply show obscure bugs that Unix users could not reproduce. Even mod_rewrite and other core modules contained latent bugs related to threading. mod_perl would serve only one request at a time in Apache 1.3 because it was designed for multiple processes, rather than threads. Now that everyone is playing by the same rulebook, Apache 2.0 modules are for the most part of similar quality on Unix and Windows.
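As a rough illustration of what "thread safety" asks of a module author, the sketch below protects a piece of shared module state with a lock so that concurrent requests cannot interleave their updates. It is written in Python rather than against the Apache module API, and `HitCounter` and `simulate` are hypothetical names chosen for this example.

```python
import threading


class HitCounter:
    """Hypothetical module state shared by every worker thread."""

    def __init__(self):
        self._hits = 0
        self._lock = threading.Lock()

    def record(self):
        # The read-modify-write must not interleave between threads,
        # so it is performed while holding the lock.
        with self._lock:
            self._hits += 1

    @property
    def hits(self):
        with self._lock:
            return self._hits


def simulate(workers=8, requests_per_worker=1000):
    """Drive the counter from several threads, as a threaded MPM would."""
    counter = HitCounter()

    def handle_requests():
        for _ in range(requests_per_worker):
            counter.record()

    threads = [threading.Thread(target=handle_requests) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.hits
```

A module written only for a one-process-per-request server can skip the lock and never notice. Under a threaded server the unprotected version may silently lose updates, which is the kind of obscure bug described above.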

The second powerful change to Apache 2.0 on Windows was the introduction of APR. Designed for the same reasons as the Mozilla Project's NSPR library, the Apache Portable Runtime (or Portability Runtime) library allows Apache's core code to treat every platform the same way. The vast majority of the code inside of Apache HTTP server remains the same for every platform. No longer does the Unix version differ from Windows, NetWare, OS/2, and other obscure platforms' code. In the core HTTP server, all code is the same, and everyone shares the same bugs. Thus, everyone benefits when bugs are quickly identified and fixed.

APR itself deals with the differences between files, locking, TCP/IP sockets, processes, threads, and the many more distinctions between platforms. APR has been adopted by several projects that aren't under the umbrella of the Apache Software Foundation, including the Subversion project (which extends the WebDAV protocol for version control, the V at the end of DAV). Other projects that plan on using APR are under way, and it has even received a nod of approval from Rob McCool (the original author of the NCSA server, which makes him the grandpappy of the Apache server).

Each version of Windows has introduced new features that improve the programmer's application programming interface. Examples include file system features and TCP/IP sockets. APR helps Apache leverage these features: Apache relies on the APR library abstraction, which uses the feature when the particular platform supports it and chooses a slower (although just as functional) means to handle the request when running on a platform that doesn't provide support (such as an older version of Windows).
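The pattern of "use the fast feature where it exists, otherwise fall back" can be sketched as follows. This is an analogy in Python, not APR's implementation: it assumes `os.sendfile` as the optional fast path and a plain read/write loop as the portable fallback, and the helper names `copy_file` and `roundtrip` are invented for this example.

```python
import os
import tempfile


def copy_file(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path and report which strategy did the work."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "sendfile"):
            try:
                remaining = os.fstat(src.fileno()).st_size
                offset = 0
                while remaining > 0:
                    sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
                    if sent == 0:
                        break
                    offset += sent
                    remaining -= sent
                return "sendfile"
            except OSError:
                # The call exists, but this platform rejects it for regular
                # files: discard any partial output and use the fallback.
                dst.seek(0)
                dst.truncate()
        # Portable path: slower, but works everywhere.
        src.seek(0)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
        return "read/write"


def roundtrip(data):
    """Write data to a temporary file, copy it, and read the copy back."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "src.bin")
        dst_path = os.path.join(tmp, "dst.bin")
        with open(src_path, "wb") as f:
            f.write(data)
        strategy = copy_file(src_path, dst_path)
        with open(dst_path, "rb") as f:
            return strategy, f.read()
```

The caller sees the same result either way. Only the returned strategy name reveals which path the platform allowed, which is the point of hiding the choice behind one abstraction.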

As more projects adopt APR, platform-specific functions are tested and exercised in ways the Apache project authors never imagined. This benefits APR by shaking out platform-specific bugs that have no effect in the Apache Web server today, but that could easily be uncovered with future changes to Apache.

The single biggest benefit in Apache 2.0 that Windows NT (as well as 2000 and XP) users will notice is Unicode support. Every filename requested from httpd is treated as a Unicode filename encoded in utf-8. Request URLs were originally defined as ASCII-encoded, which leaves the other 128 possible byte values undefined and up to the imagination of the author.

Windows systems traditionally used the local codepage, so Apache running on U.S. versions of Windows handled filenames entirely differently from the way it handled them on Japanese versions. However, this leaves no clue in determining how to serve a request from a U.S. browser vs. a request from a Japanese browser. No rules were ever put in place to provide consistent "hints" of the meaning of those other characters.
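A small sketch of the ambiguity: the same bytes mean different things depending on which codepage the server assumes. The example uses Python's built-in `cp932` (Japanese Windows) and `cp1252` (Western European and U.S. Windows) codecs, and the filename is an arbitrary sample.

```python
# An arbitrary sample filename ("test" written in katakana).
name = "テスト"

# The bytes a Japanese Windows system would store or send for that name.
raw_japanese = name.encode("cp932")

# The same bytes, read back under two different local codepages.
as_japanese = raw_japanese.decode("cp932")
as_western = raw_japanese.decode("cp1252")

# Under utf-8 the bytes carry their own meaning, so the round trip
# does not depend on the locale of the machine doing the decoding.
raw_utf8 = name.encode("utf-8")
as_utf8 = raw_utf8.decode("utf-8")
```

Here `as_japanese` recovers the original name, while `as_western` is a different string built from the very same bytes. Nothing in the bytes themselves says which reading was intended.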

Most Internet drafts that attempt to deal with this ambiguity focus on utf-8 encoding of Unicode to extend the available characters in request URLs. With utf-8 request URLs and Unicode filenames, Apache running on any Windows NT machine can access the filenames of any language or characters. The autoindex module even responds to a request for a Web directory listing with the utf-8 character set identifier, so all modern browsers display all the filenames correctly.
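The mapping between a utf-8 request URL and a Unicode filename can be sketched with Python's standard `urllib.parse` functions. This illustrates the encoding convention only, not Apache's own code, and the two helper names and the sample path are hypothetical.

```python
from urllib.parse import quote, unquote


def url_path_to_filename(url_path):
    """Decode a percent-encoded request path, treating the bytes as utf-8."""
    # errors="strict" rejects byte sequences that are not valid utf-8
    # instead of guessing at a local codepage.
    return unquote(url_path, encoding="utf-8", errors="strict")


def filename_to_url_path(filename):
    """Encode a Unicode filename as a percent-encoded utf-8 URL path."""
    return quote(filename, safe="/", encoding="utf-8")


encoded = filename_to_url_path("/docs/日本語.html")
decoded = url_path_to_filename(encoded)
```

Because both sides agree on utf-8, `decoded` is the original name regardless of the codepage of the machine that produced or served the URL.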

There is one other side effect to Unicode: this change is marginally faster than asking Windows NT to handle the old, 8-bit local codepage filenames and identifiers. Windows NT (and 2000, and XP) was designed from the ground up as an international operating system, using Unicode throughout the entire kernel. Every identifier within the kernel is Unicode, so every single system call using old codepages must also translate the identifiers.

On the other hand, utf-8 translation is extremely efficient. So even though it took many years, Apache has caught up with Windows and the worldwide adoption of Unicode and utf-8 standards.

Unicode and utf-8 are not part of the URL specification today. But nearly every working group is moving in that direction, so this change is expected to provide years of service from Apache 2.0 as Web site authors continue to increase their sites' appeal to international users.

The Apache 2.0 configuration and .htaccess files may be saved as utf-8 text files (even Windows' Notepad provides this feature) to provide access control over directories and files. But these benefits aren't available to Windows 9x (95, 98 or ME) users, since this fundamental API change was introduced with Windows NT.

These changes to Apache 2.0 make it the clear Web server of choice for enterprises wishing to standardize on the most widely used Web serving technology on the market. Whether you're running Windows, Unix or a heterogeneous infrastructure, Apache 2.0 now fits the bill.