Staying Out of Deep Water: Performance Testing Using HTTPD-Test's Flood
June 3, 2003
Once you've set up your server and users are accessing your Web site, the last thing you want to hear about is performance problems with the site. You can test the system manually, but manual testing has its limitations.
One major downside of manual testing (aside from the time investment) is that it doesn't reveal where the real problem with the site lies. Is it a configuration problem with the server, a problem with some dynamic elements, or a more fundamental network performance issue?
The Apache HTTP Project includes a sub-project called HTTPD-Test. As the name suggests, it's a test suite for Apache and HTTP in general. The suite contains a number of different elements, and this article will focus on the one known as Flood. Flood is so named because it is used to flood an HTTP server with requests to test its response times.
Flood uses an XML document with the necessary settings -- URLs and optional POST data -- to send requests to a given server or range of servers. Flood then measures the time it takes to:

- open the connection to the server
- write the request
- read the response
- close the connection
With these four criteria being measured, administrators can identify whether the problem is with the Apache configuration (or any other HTTP server), the sheer load and performance of the hardware, or a network bottleneck.
You can download the packages -- httpd-test and apr/apr-util -- required for building from the CVS server at Apache. You'll need to log in first though (the password is 'anoncvs'):
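The checkout typically looks something like the following. The repository path shown here is an assumption based on Apache's anonymous CVS conventions of the time, so verify it against the Apache site before relying on it:

```shell
# Log in anonymously first (password: anoncvs), then check out the sources.
# The :pserver: path below is illustrative -- confirm the current location.
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout httpd-test/flood
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout apr
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout apr-util
```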
Once you have the source, you must build and compile the application using:
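On a typical checkout this is a standard autoconf-style sequence; the exact flags and tool requirements may differ on your platform, so treat this as a sketch:

```shell
cd httpd-test/flood
./buildconf            # generate the configure script (needs autoconf and libtool)
./configure
make all               # builds the flood binary in the current directory
```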
You're now ready to go!
Flood is configured through an XML file that defines the parameters for testing the Web site. When testing, Flood uses a profile, which defines how a given list of URLs is accessed. Requests are generated by one or more farmers that are, in turn, members of one or more farms. You can see this more clearly in the illustration below.
As illustrated in the graphic, we have one farm, which specifies two sets of five farmers. Farmer Joe uses ProfileA and a list of five URLs, and farmer Bob uses ProfileB with a list of three URLs. The farmers request the URLs directly from the Web server. Flood uses threads to create the farmers, and then collates the information the farmers collect into a single data file for later processing.
The XML file contains definitions for four main elements: the URL lists, profiles, farmers and farms.
The URL list is just that: a list of URLs to be accessed. URLs can be straight requests or specific types (GET, HEAD, and POST are supported, as is the capability to supply data accordingly for dynamically driven sites).
Profiles define which URL list to use, how they should be accessed, what type of socket to use, and how the information should be reported.
Farmers are responsible for the actual request process. The only configurable elements are the profile to use and the number of times to process the profile. Profiles are executed sequentially by each farmer but can be repeated, so you would end up accessing, for example, urla, urlb, urla, urlb, and so on.
Farms specify the number of farmers to create and when. By increasing the number of farmers created by a farm, the number of simultaneous requests is increased. Additional settings enable you to create a number of initial farmers, and then increase that number at regular intervals. For example, you could initially create two farmers, then add a new farmer every five seconds up to a maximum of 20. Depending on your URL list and server performance, this could result in a slow rise to 20 simultaneous accesses for a given period, and then a slow fall back to zero. Alternatively, it could give the effect of a regular number of users accessing a set number of pages for a longer duration, with peaks of five or six simultaneous requests.
Note: The current version of Flood supports only one Farm, and it must be called 'Bingo'. You can, however, specify multiple farmer definitions within the single farm, which achieves the same basic effect.
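A minimal configuration wiring the four elements together looks roughly like this. The element names follow the round-robin.xml example shipped with Flood, but treat the fragment as an illustrative sketch (hostnames and values are made up) rather than a verbatim copy:

```xml
<flood configversion="1">
  <urllist>
    <name>Simple</name>
    <description>A couple of test pages</description>
    <url>http://www.example.com/</url>
    <url>http://www.example.com/about.html</url>
  </urllist>
  <profile>
    <name>SimpleProfile</name>
    <description>Fetch the list in order</description>
    <useurllist>Simple</useurllist>
    <profiletype>round_robin</profiletype>
    <socket>generic</socket>
    <report>relative_times</report>
  </profile>
  <farmer>
    <name>Joe</name>
    <count>5</count>                      <!-- run the profile five times -->
    <useprofile>SimpleProfile</useprofile>
  </farmer>
  <farm>
    <name>Bingo</name>                    <!-- the single farm must be called Bingo -->
    <usefarmer count="2">Joe</usefarmer>  <!-- create two Joe farmers -->
  </farm>
</flood>
```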
By tuning the farm, farmer, and URL list parameters you can control the number of requests, simultaneous requests, overall duration (as a function of the URL list, repeat count and number of farmers) and how the requests are spread over the duration of the test. This allows you to very specifically test for different situations.
The three basic (and rough) rules of configuration with Flood to remember are:
A sample configuration for Flood is in the examples folder that comes with the distribution; round-robin.xml is probably the easiest one to start with. This article, however, will not discuss the specifics of editing the XML, or even of processing the data file generated.
Instead, we will examine how to tune the parameters to test different types of Web sites. To help understand the implications of the next section, here's a quick look at the results of the analyze-relative script from the examples directory. In this case it shows the results of a test on an internal server:
From these results you can see the average connect, write (request), read (response), and close times for a single page. You also get a basic idea of the number of requests handled per second by the server.
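Producing numbers like these is a two-step process: run flood against your configuration file, then feed the raw output to the analysis script. The file names here are illustrative:

```shell
# Raw timing data, one line per request, goes to a file...
./flood round-robin.xml > flood-raw.out
# ...which the example script summarizes into connect/write/read/close averages.
./examples/analyze-relative flood-raw.out
```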
Testing 'News' Style Web Sites
The majority of news-style Web sites -- New York Times, Slashdot, ServerWatch, and even many blog sites -- have a main index page, which everybody accesses and which contains the links to the main stories, plus one or more 'story' pages that users visit when they deem a story interesting enough to read in full.
In general, this results in a fairly steady stream of people hitting the main page and a variable number hitting specific other pages. If a site publishes an RSS/RDF feed, it will also see a fair number of accesses directly to story pages without first viewing the home page. Nearly all of these types of sites use dynamic elements, so Flood is also a good way to test the dynamic performance of your site, especially if you can compare it to a raw HTML-based response.
You can simulate the news style requests through Flood by using the following settings:
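As a sketch of what that might look like in the configuration (element names per Flood's examples; hostnames and values purely illustrative): weight the URL list toward the index page, keep the repeat count low, and ramp up a fairly high number of farmers:

```xml
<urllist>
  <name>News</name>
  <description>Index-heavy access pattern</description>
  <url>http://www.example.com/</url>                 <!-- index listed twice... -->
  <url>http://www.example.com/</url>                 <!-- ...so it is hit most often -->
  <url>http://www.example.com/story1.html</url>
  <url>http://www.example.com/story2.html</url>
</urllist>
<farmer>
  <name>Reader</name>
  <count>2</count>                                   <!-- low repeat count -->
  <useprofile>NewsProfile</useprofile>
</farmer>
<farm>
  <name>Bingo</name>
  <!-- start with 5 farmers, add one every 2 seconds up to 20 -->
  <usefarmer count="20" startcount="5" startdelay="2">Reader</usefarmer>
</farm>
```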
Testing Shopping Sites
Shopping sites, online product catalogs, and other more interactive sites have a different usage profile. Although some people will hit your home page, others will come in directly to another page within your site. Most users will also spend more time browsing around the site -- they'll look at the product pages for a number of products, perhaps do some searches, and even click through to other related or similar products.
Thus, you should test with a higher number of URLs in the list, larger repeat counts (to simulate a larger number of users), and lower simultaneous access than with a news site:
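A corresponding sketch for a shopping site (again, names and values are illustrative assumptions): a longer URL list covering product and search pages, a higher repeat count per farmer, and fewer simultaneous farmers:

```xml
<urllist>
  <name>Shop</name>
  <description>Browsing-heavy access pattern</description>
  <url>http://www.example.com/</url>
  <url>http://www.example.com/catalog.html</url>
  <url>http://www.example.com/search.html</url>
  <url>http://www.example.com/product1.html</url>
  <url>http://www.example.com/product2.html</url>
  <url>http://www.example.com/related.html</url>
</urllist>
<farmer>
  <name>Shopper</name>
  <count>10</count>                                  <!-- longer browsing sessions -->
  <useprofile>ShopProfile</useprofile>
</farmer>
<farm>
  <name>Bingo</name>
  <!-- only a handful of simultaneous shoppers, added slowly -->
  <usefarmer count="5" startcount="1" startdelay="5">Shopper</usefarmer>
</farm>
```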
Testing the "Slashdot" Effect
Occasionally, a situation will arise where you must check how your system copes with thousands of users trying to access the site at the same time. Many sites have already experienced the "slashdot" effect, whereby a mention of a particular page on the Slashdot Web site (www.slashdot.org) results in a huge number of near-simultaneous requests.
Typically these requests are for only one page, and we can test for that with Flood by creating hundreds, or thousands, of farmers simultaneously accessing the server for just that one page. To simulate the rapid sequential access by a number of readers over a period of time, set a high repeat count and use the delay system to ramp up the requests to their high point.
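A sketch of the "slashdot" scenario (values illustrative): a single URL, a high repeat count, and hundreds of farmers created on a fast ramp:

```xml
<urllist>
  <name>Slashdotted</name>
  <description>One page under very heavy load</description>
  <url>http://www.example.com/popular-story.html</url>
</urllist>
<farmer>
  <name>Mob</name>
  <count>50</count>                                  <!-- rapid sequential re-requests -->
  <useprofile>SlashdotProfile</useprofile>
</farmer>
<farm>
  <name>Bingo</name>
  <!-- ramp quickly: 25 farmers at once, a new one every second, up to 500 -->
  <usefarmer count="500" startcount="25" startdelay="1">Mob</usefarmer>
</farm>
```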
Tips for Testing
For any of these tests to work properly, you must keep a few things in mind:
Future articles will examine ways to summarize the report information and how to test more complex sites and servers.