Filtering I/O in Apache 2.0

One of the holy grails of the Apache developers has always been filtered or layered I/O, the ability for one module to modify the data that was generated by an earlier module. This ability was originally slated for inclusion in Apache 2.0, but when work began in earnest on 2.0, this feature was pushed aside, and marked for inclusion in 2.1 or 3.0. Two months ago however, the Apache developers had a small meeting, and designed filtered I/O for Apache 2.0. The work has been started, and there have been some filters written. Over the next few months, I will explain how this feature works, and how your modules can take advantage of it. One of the holy grails of the Apache developers has always been filtered or layered I/O, the ability for one module to modify the data that was generated by an earlier module.

The general premise of the filtered I/O design in Apache 2.0 is that all data served by a web server can be broken into chunks. Each chunk of data comes from the same place either a file, a CGI program, or it is generated by a module. We also knew that all of the data could always be represented as a string of characters, although that string may not be human-readable. Armed with that knowledge, we sat down to design the filtering system. One of our overriding goals, was that the filtering logic needed to be performance aware. It didn't matter if filters chose to ignore performance issues, but it did matter if the design hindered filters from knowing about performance issues. This meant that we needed to know more about that data than just what the data was, we also needed to know where the data comes from, and what it's lifetime is.

This meta-data is important when actually writing the response to the network. For example, if we have a very simple request that is just a page from disk, then we want to use sendfile (sendfile is provided by APR, and is available on all platforms, it the platform doesn't have a native sendfile, then APR loops reading the file and writing to the network.) If we take this example a step further, and make the whole response an SSI page, where one element is a file from disk, and the rest is generated, such as date strings, then we want to use a single sendfile call if possible. APR's sendfile provides an opportunity to include both header and trailer information with the file, which are sent using writev. In this example, we can send the HTTP headers, the full file, and the date string with one APR call (The number of system calls will differ depending on platform). Keeping the meta-data accessable is obviously a good idea.

In order to keep the meta-data available, the Apache developers needed to find some way to pass everything from one filter to the next. The data structure that was designed to do this is being called a bucket_brigade. Each bucket brigade is composed of multiple buckets. The buckets contain the data that we are sending to the client. And the type of bucket used makes up the meta-data.

Currently, we have a small number of bucket types, but the bucket API was designed to be extendable. The current bucket types are:

  • This bucket type is designed to store data allocated off the heap. This data will be available as long as the bucket is available. If the data needs to be modified and there is space in the bucket, it is acceptable to modify the data in place when using this bucket.

  • This bucket stores data allocated off the stack. This means that when a filter function returns, it is garaunteed that the data will not still be valid the next time this function is called. If the data has not yet been written to the network, then it must be converted to a heap bucket so that it is still available the next time the current filter is called.


This article was originally published on Sep 20, 2000
Page 1 of 3

Thanks for your registration, follow us on our social networks to keep up-to-date