Configuring Apache 2.0 as a Forward Proxy Server



January 4, 2008

What Is a Proxy Server?

Proxy servers do a number of different things, but the basic term proxy means to do something on behalf of somebody else, usually in an authorized capacity. There are two types of proxy server: a forward proxy and a reverse proxy. A forward proxy gives a number of clients Internet access through a single server, for security, caching, or filtering. A reverse proxy sits in front of a Web site and passes incoming client requests for that site on to one or more servers behind it.

This article concentrates on the forward proxy server, which is generally used for the following reasons:

  • Security -- Because the proxy server relays requests on behalf of clients, we can use it as a gateway to the Internet. Because it is a single machine, it can act as an authenticated gateway through firewalls, while still preventing clients from accessing the Internet directly.
  • Caching -- If one machine (the proxy server) is being used to access the Internet, it can also act as a cache, storing frequently used and accessed sites, graphics, and other elements. Even in a relatively modest installation, the use of a caching server can significantly improve the performance of an entire enterprise's Web access. It can also help lower bandwidth requirements, enabling organizations to squeeze more performance out of an Internet connection.
  • Filtering -- Because all requests for Web pages go through the proxy server, the proxy server can make decisions about which sites and information clients can view or access. A proxy server can block adverts and pop-ups (provided you can easily identify the site or URL) or entire sites.

Architecturally, the proxy server sits on the network, and may be the same machine that provides the Internet connection and firewall/filtering service. Figure 1 illustrates a basic network diagram for this.

Figure 1: Network Diagram

Using a proxy server like this relies on special configuration within the client browser to tell it to communicate its requests for a Web page directly to the proxy server, rather than directly with the host. For example, within Internet Explorer the proxy settings are available through the Connections tab of the Internet Options, as shown in Figure 2.

Figure 2: Internet Options Connections Tab

Apache can act as a proxy for either FTP or HTTP services, and it's possible to add other proxy types using extensions. Whatever protocol is used, communication through a proxy requires extra steps. For example, the interaction between Client and ServerWatch.com, instead of being:

  1. Client sends request to ServerWatch.com
  2. ServerWatch.com replies to Client

Becomes:

  1. Client sends request for ServerWatch.com to Proxy
  2. Proxy sends request to ServerWatch.com
  3. ServerWatch.com replies to Proxy
  4. Proxy sends reply to Client
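
As a rough illustration of the four-step exchange above, the following Python sketch builds a client that hands its requests to the proxy instead of contacting the site directly. It assumes the proxy address 192.168.1.8:8001 from the configuration example in this article. The function names build_proxy_opener and fetch_status are hypothetical, chosen only for this example.

```python
import urllib.request

# Assumption: the proxy listens on the address used in this article's
# Listen example. Adjust to match your own proxy.
PROXY_URL = "http://192.168.1.8:8001"


def build_proxy_opener(proxy_url=PROXY_URL):
    """Return an opener that sends HTTP and FTP requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "ftp": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(opener, url):
    """Send one request through the proxy and return the HTTP status code."""
    # The client talks only to the proxy; the proxy contacts the site
    # and relays the reply back.
    with opener.open(url, timeout=10) as response:
        return response.status
```

Calling fetch_status(build_proxy_opener(), "http://www.serverwatch.com/") would perform steps 1 through 4, provided a proxy is actually listening on that address.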

Proxy configuration involves setting the host address and port to listen on, enabling the proxy service, and then applying optional settings that secure the server and configure the proxy's caching and blocking.

Original date of publication, 10/15/2003

Basic Settings

Setting up a forward proxy service is very straightforward. First, make sure the Apache 2.0 installation has been built with the proxy module enabled. I also prefer to use a separate Apache installation to handle the proxy service. This makes the configuration easier -- especially as it means we can easily create a proxy server on an alternative IP and port allocation -- and provides far more flexibility. For example, you can start and stop the proxy service independently of the main Apache Web server. This is particularly useful should you wish to update the blocking settings.

To enable the proxy module during the configuration use the --enable-proxy option to configure; to specifically enable HTTP and FTP proxy services, and relocate Apache to a separate directory, run configure with the following command line options:

configure --prefix=/export/http/apache2.proxy --enable-proxy
--enable-proxy-http --enable-proxy-ftp

(For example purposes, we've broken the command into two lines. When you run it, the command must be entered on one line.)

Once you've configured, built and installed the new Apache installation, you must update the configuration file or create a new one. The latter option is preferable, and with a proxy service, the configuration file is as simple as:

ServerRoot "/export/http/apache2.proxy"
Listen 192.168.1.8:8001
User nobody
Group nobody
ProxyRequests On

Going through that file step by step:

  • ServerRoot specifies the root directory used to hold the Apache configuration files.
  • Listen specifies the IP address and port number on which Apache will listen for requests. Although 8001 is used in this example, other common values are 8080, 8000, and 8008. (Remember that on Unix, Apache must be started by root in order to bind to any port lower than 1024.)
  • User/Group sets the user name and group name that the server will run as. The typical 'nobody' accounts are used here, but you can also use special proxy accounts set up for that purpose. (If you are using names, you must also create the corresponding entries in the passwd/group files.)
  • ProxyRequests configures Apache as a forward proxy.

That's it -- that's all you need to do to set up a proxy service within Apache. However, before rushing off to create one, there are several other configuration parameters to consider, including the security of the new proxy server and whether to enable caching.


Security

Once the proxy is configured, you must ensure the proxy service is secure so that only acceptable clients can gain access. There are a number of reasons for this. The primary one is that a proxy server is often given pass-through access on a firewall, and because the proxy server might be caching internal intranet content, you don't want clients on the Internet side to gain access. You also don't want to waste precious Internet bandwidth providing a service that isn't intended for public use.

You can also use the proxy capabilities to restrict access to specific groups of people. For example, if the network were divided into two network segments, you could accept proxy traffic from one segment, but not the other.

To secure the proxy server in this way you must include a proxy control block in the configuration file, like this:

<Proxy *>
  Order Deny,Allow
  Deny from all
  Allow from 192.168.1 .mcslp.pri
</Proxy>

The basic format follows the normal access control block profile, so you can specify multiple entries made up of IP prefixes, domain names, IP address subnets, and IPv6 addresses and subnets. As shown here, it is preferable to specify both an IP address and a domain, just to be sure.
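
To make the effect of this block concrete, here is a small Python sketch of the decision it describes: with "Deny from all" in place, a client gets through only if one of the Allow entries matches. This models the rule and is not Apache's implementation. It assumes the prefix 192.168.1 stands for the 192.168.1.0/24 network and takes the client's host name as given, whereas Apache would have to look it up. The name is_allowed is hypothetical.

```python
import ipaddress

# Assumption: "Allow from 192.168.1" is treated as the 192.168.1.0/24 network.
ALLOWED_NETWORK = ipaddress.ip_network("192.168.1.0/24")
# "Allow from .mcslp.pri" matches any host inside that domain.
ALLOWED_DOMAIN = ".mcslp.pri"


def is_allowed(client_ip, client_hostname=None):
    """Model of Order Deny,Allow with 'Deny from all' and two Allow entries."""
    # A matching Allow entry overrides the blanket Deny.
    if ipaddress.ip_address(client_ip) in ALLOWED_NETWORK:
        return True
    if client_hostname and client_hostname.lower().endswith(ALLOWED_DOMAIN):
        return True
    # Nothing matched an Allow entry, so "Deny from all" applies.
    return False
```

For instance, is_allowed("192.168.1.25") is accepted, while is_allowed("10.0.0.5") is refused unless the client's host name falls within mcslp.pri.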

Logging

A major benefit of a centralized point of access to the Internet is that a log of Internet pages accessed by an entire company or department is easily kept. To enable an access log (an error log is automatically generated for the main Apache process), add the following two lines to the configuration file:

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

If you want to modify the content of the log, change the LogFormat directive.
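
Once the access log exists, it can be examined with a short script. The following Python sketch splits one line written in the common format above into its fields. The sample line in the usage note is made up for illustration, and the names COMMON_LOG and parse_line are hypothetical.

```python
import re

# One named group per field of: %h %l %u %t "%r" %>s %b
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    return match.groupdict()
```

Given an invented line such as 192.168.1.25 - - [15/Oct/2003:10:12:01 +0100] "GET http://www.serverwatch.com/ HTTP/1.0" 200 5120, parse_line returns the client address, the requested URL, the status, and the size as separate fields, which is enough to start tallying which sites are requested most often.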

Caching

The basic proxy configuration allows the server to act only as a proxy, relaying requests from clients to their destination. What it doesn't do is cache the content as it accesses the information. Instead, it exchanges the information directly.

To enable caching, you must use the mod_cache and mod_disk_cache modules (use --enable-cache and --enable-disk-cache during configuration) and add a few directives to specify the location, size, and 'refresh' parameters of the cache. To set up a modest disk cache, use the settings in the sample Apache configuration file:

CacheRoot "/export/http/apache2.proxy/cache/"
CacheSize 5
CacheGcInterval 4
CacheMaxExpire 86400
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1

In order, these directives configure the following:

  1. CacheRoot specifies the location of the disk cache. This example uses a directory within the Apache installation, but you might want to place it on a separate partition -- a good fast disk or suitable RAID solution are good choices.
  2. CacheSize defines the maximum amount of space that will be used for the cache on disk. Be careful with this setting; it's tempting to specify a size as large as the available partition, but this can lead to an inefficient cache, largely made up of information you never use again. Instead, estimate the amount of information a user is likely to download in a typical day, and then multiply that by the number of users of the proxy service. For example, 5 MB per user is a reasonable figure for light to medium use; for 100 clients, that translates into 500 MB.
  3. CacheGcInterval specifies the number of hours to wait before attempting to clean out unused objects from the cache. Set this too low, and you can force some objects to constantly be reloaded. Set it too high, and you risk filling the cache with stale data.
  4. CacheMaxExpire specifies the maximum number of seconds for which an object can be cached without checking the origin server to determine whether the document has been updated. This helps keep the objects in the cache "fresh," as it specifies the maximum amount of time an object in the cache can be out of date.
  5. CacheLastModifiedFactor defines a value that will be used to calculate whether an item in the cache should be expired if the object hasn't explicitly been marked with an expiration date.
  6. CacheDefaultExpire specifies the number of seconds after which an object will be expired if no specific data is supplied about the expiration date or period from the original server.
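
The arithmetic behind two of these settings can be sketched in Python. The first pair of functions reproduces the sizing estimate above (5 MB per user for 100 users). The conversion to KBytes assumes CacheSize is expressed in KBytes, as the Apache 2.0 mod_disk_cache documentation describes it. The last function assumes the expiry rule given in the Apache documentation for CacheLastModifiedFactor: the time since the object was last modified is multiplied by the factor, and the result is capped by CacheMaxExpire. All function names are hypothetical.

```python
def cache_size_mb(mb_per_user, users):
    """Estimated cache size in MB: daily download per user times users."""
    return mb_per_user * users


def cache_size_kbytes(mb_per_user, users):
    """The same estimate in KBytes (assumed unit of the CacheSize value)."""
    return cache_size_mb(mb_per_user, users) * 1024


def expiry_seconds(seconds_since_modified, factor=0.1, max_expire=86400):
    """Expiry period for an object that carries no explicit expiration date."""
    # The period scales with the object's age but never exceeds max_expire.
    return min(seconds_since_modified * factor, max_expire)
```

With the sample values, an object last modified an hour ago would be kept for about six minutes before being rechecked, while one last modified a month ago would hit the one-day ceiling set by CacheMaxExpire.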

The CacheMaxFileSize and CacheMinFileSize directives are also useful, as they set the maximum and minimum file size parameters for files to be retained in the cache. The default values are 1,000,000 bytes and 1 byte, respectively. Usually, you will want to prevent very large files (e.g., movies and application installers) from being retained in a cache. That said, if a company regularly views static media files and has a reasonable amount of space available to devote to the cache, it makes sense to raise the CacheMaxFileSize directive to a more video-friendly limit. Be aware, however, that doing so will also make other large objects cacheable.
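
The trade-off can be seen in a small Python sketch of the size check. The 50 MB ceiling is an arbitrary, hypothetical "video-friendly" value, the bounds are assumed to be inclusive, and is_cacheable_size is a hypothetical name.

```python
# Hypothetical limits: a 1-byte minimum and a 50 MB maximum.
MIN_FILE_SIZE = 1
MAX_FILE_SIZE = 50 * 1024 * 1024


def is_cacheable_size(size_bytes, min_size=MIN_FILE_SIZE, max_size=MAX_FILE_SIZE):
    """True if an object's size falls within the cache's size limits."""
    return min_size <= size_bytes <= max_size
```

With the ceiling raised this far, a 20 MB video passes the check, but so does a 20 MB installer that may never be requested again.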

You can prevent the caching of information from certain sites by using the NoCache directive, which accepts the name of a domain or host, like this:

NoCache barclays.co.uk first-direct.co.uk

As you can see from this example, this is particularly useful for sites where caching information is not desirable, whether for security reasons or because it is a dynamically hosted site that doesn't correctly specify the status of the pages it returns.

Filtering

Apache supports very basic filtering when using the proxy feature. It enables the admin to block access to specific sites or domains explicitly within the configuration file through the ProxyBlock directive. This blocks specific hosts, domains, or fragments of names. To block a specific host you would use:

ProxyBlock www.mcslp.com

To block the domain, use:

ProxyBlock mcslp.com

To block any name or domain with a given string:

ProxyBlock mcslp

Obviously, this is best used by an enterprise that wants to block access to sites it does not want its staff accessing. Another use is to block Web advert and pop-up hosts and domains. However, specifying a particularly long list of entries can slow down Apache startup, as it builds an in-memory IP address list based on this information. This can cause the proxy service to return data although the source has been updated.
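
The difference between the three ProxyBlock forms shown above can be illustrated with a Python sketch. It assumes a request is blocked when the requested host name contains one of the listed words, and it leaves out the IP address matching mentioned above. The name is_blocked is hypothetical.

```python
from urllib.parse import urlsplit


def is_blocked(url, blocked_words):
    """True if the URL's host name contains any of the blocked words."""
    host = (urlsplit(url).hostname or "").lower()
    return any(word.lower() in host for word in blocked_words)
```

Under that assumption, the entry www.mcslp.com blocks only that host, mcslp.com also blocks ftp.mcslp.com, and the bare word mcslp blocks any host whose name contains that string.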

Summary

Proxying is one way to lower Internet connectivity needs. It also speeds up access for the entire network, and can help to secure it. Side benefits include the ability to block specific sites and to monitor the sites and files that users are downloading.
