Looking at Apache 2.0 Alpha 4
Development continues to roll along on Apache 2.0. In his latest column, Ryan Bloom details what's new in the recently released Apache 2.0 Alpha 4.
Since my last article, the Apache Software Foundation has released the fourth alpha version of Apache 2.0. In this article, I will review some of the features new to the 2.0 series and explain why they were added and how they will help site administrators.
Piped and Reliable Piped Logs
Piped logs are a feature that Apache has had for some time, but they have just been added back into 2.0. Because they are a useful feature and are brand new to the 2.0 series, I will discuss them here.
Logs are important to every Apache installation. They tell the administrator who is accessing the site and whether something has gone wrong with the server. One obvious use for logs is to determine whether somebody has tried to break into the server. Logs are not something to be taken lightly; however, they also have drawbacks. The first problem is that they can grow very large. Every time a person accesses a page on a site, a message is written to a log. A basic Apache installation does nothing with its logs other than write to them, so the logs keep growing unless the administrator does something about them. Piped and reliable piped logs provide a way to handle this problem.
The second issue with logs is that they can be slow. If Apache is set up to log the hostname of every machine that requests a page from a site, logging is likely to be very slow. This is because Apache, like all network programs, uses IP addresses instead of hostnames for all network communication. Apache relies on the local machine's hostname resolver to convert IP addresses into hostnames. This can be a slow process, because resolving an address through the Domain Name System (DNS) can require one or more network round trips to remote name servers. The whole time that a thread or process is trying to convert an IP address to a hostname, that thread or process is not doing its primary job: serving web pages. On a heavily loaded site, this can become a serious performance bottleneck. Piped and reliable piped logs also give the server a way to avoid this problem.
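To make the cost of a reverse lookup concrete, here is a minimal Python sketch of converting one IP address into a hostname. It is not Apache's code; the function name resolve_ip, the lookup parameter, and the optional cache are all assumptions made for illustration. The point is that the lookup is a blocking call, so whoever makes it waits until the resolver answers.

```python
import socket


def resolve_ip(ip, lookup=socket.gethostbyaddr, cache=None):
    """Return the hostname for ip, or ip itself if the lookup fails.

    lookup and cache are illustrative parameters, not part of any Apache API.
    """
    if cache is not None and ip in cache:
        return cache[ip]
    try:
        # Blocking call: this thread does nothing else until the
        # resolver answers or gives up.
        hostname = lookup(ip)[0]
    except OSError:
        # No reverse record, or the resolver failed: keep the address.
        hostname = ip
    if cache is not None:
        cache[ip] = hostname
    return hostname
```

Passing a dictionary as cache means a repeated address is looked up only once, which matters because the same clients tend to appear many times in a log.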
Now that we have identified two real-world issues that piped logs can solve, we can talk about what they are and how they work. Piped logs and reliable piped logs move the responsibility for writing the log file away from the Apache server to an external process. When Apache starts, if the configuration file specifies that the logs are to be piped, Apache creates a new process and sets up a pipe between that process and the Apache parent process. When the child processes are created, they inherit that pipe and use it to send log messages to the logging process. This happens for each piped log, which means that if piped logs are specified for the error, transfer, and access logs, the server will create three separate processes, one for each log. Apache takes advantage of the small size of log messages to avoid any synchronization around the pipe: on POSIX systems, a write to a pipe of no more than PIPE_BUF bytes is atomic, so messages from different child processes are not interleaved and the logging process always reads whole messages. This allows a logging process to read one line from its standard input (the pipe), perform some operation on that string, and write it out to the log file. The log process then reads the next message from the pipe.
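The read, process, write loop described above can be sketched in a few lines of Python. This is an illustration of the idea rather than any program shipped with Apache; the names pump and main, the transform parameter, and the default file name access_log are assumptions.

```python
import sys


def pump(source, sink, transform=lambda line: line):
    """Copy log lines from source to sink, one line at a time.

    transform stands in for "some operation on that string"; by default
    the line is written unchanged. Returns the number of lines handled.
    """
    count = 0
    for line in source:              # waits until a full line arrives
        sink.write(transform(line))
        sink.flush()                 # keep the file current on disk
        count += 1
    return count                     # source hit end of file: pipe closed


def main(argv=None):
    """Entry point for use as a piped logger: read stdin, append to a file."""
    argv = sys.argv if argv is None else argv
    # Hypothetical default path; pass the real log file as the first argument.
    path = argv[1] if len(argv) > 1 else "access_log"
    with open(path, "a") as log_file:
        return pump(sys.stdin, log_file)
```

To run this as a logging process, a script would call main() when executed, and the server configuration would name the script after a pipe character in the log directive, for example in a TransferLog or CustomLog line.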
How does this help with the two problems mentioned above? It allows people to write small programs that solve these problems easily and efficiently. Every Apache distribution includes a small program called rotatelogs. This program reads log messages from the pipe for a specified amount of time and then closes the real log file and renames it. Afterwards, it opens a new log file and begins the process over again. This keeps logs from getting too large, and allows the administrator to easily archive all logs in one convenient place. There is another program, called logresolve, that converts IP addresses to hostnames, so the server processes do not have to perform the lookups themselves.
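The rotation scheme described above (write for a fixed interval, close the file, rename it, start a new one) can be sketched as follows. This is not the rotatelogs source; the class name RotatingSink, the clock parameter, and the choice to append the file's opening time to the renamed file are assumptions made for illustration.

```python
import os
import time


class RotatingSink:
    """Append lines to one log file, renaming it after a fixed interval."""

    def __init__(self, path, interval, clock=time.time):
        self.path = path
        self.interval = interval     # seconds each log file stays open
        self.clock = clock           # injectable so the sketch can be tested
        self.handle = None
        self.opened_at = None

    def _open(self, now):
        self.handle = open(self.path, "a")
        self.opened_at = now

    def write(self, line):
        now = self.clock()
        if self.handle is None:
            self._open(now)
        elif now - self.opened_at >= self.interval:
            # Interval is over: close the current file, move it aside
            # under a name that records when it was opened, and start
            # a fresh file at the original path.
            self.handle.close()
            os.rename(self.path, "%s.%d" % (self.path, int(self.opened_at)))
            self._open(now)
        self.handle.write(line)
        self.handle.flush()

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

Because the object has a write method, it can stand in for the log file in any loop that reads lines from standard input and writes them out. The rotation check happens only when a message arrives, so a quiet server keeps its current file open past the interval; that is a simplification of this sketch.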