Configuring Apache 2.0 as a Forward Proxy Server
Once the proxy is configured, you must ensure the proxy service is secure so that only acceptable clients can gain access. There are a number of reasons for this, the primary one being that since a proxy server is often given pass-through access on a firewall, you don't want to allow clients from the Internet side to gain access, as the proxy server might be caching internal intranet content. You also don't want to waste precious Internet bandwidth providing a service that isn't intended for public use.
You can also use the proxy's access controls to restrict access to specific groups of people. For example, if the network were divided into two segments, you could accept proxy traffic from one segment but not the other.
To secure the proxy server in this way you must include a proxy control block in the configuration file, like this:
<Proxy *>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.1 .mcslp.pri
</Proxy>
The basic format follows that of a normal access control block, so you can specify multiple entries made up of IP address prefixes, domain names, IP subnets, and IPv6 addresses and subnets. As shown here, it is preferable to specify both an IP address prefix and a domain, just to be sure.
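As a rough illustration of the other entry forms, the following sketch extends the block above. The second and third subnets and the IPv6 prefix are hypothetical values chosen only for the example; substitute the segments that actually exist on your network.

```apache
<Proxy *>
    Order Deny,Allow
    Deny from all
    # Partial IP address (prefix) and domain suffix, as in the article's block
    Allow from 192.168.1 .mcslp.pri
    # Hypothetical second segment, written as network/netmask
    Allow from 192.168.2.0/255.255.255.0
    # Hypothetical third segment, written in CIDR notation
    Allow from 192.168.3.0/24
    # Hypothetical IPv6 subnet (documentation prefix)
    Allow from 2001:db8::/32
</Proxy>
```

To accept one segment but not another, simply leave the unwanted segment out of the Allow list; with Order Deny,Allow and Deny from all, anything not explicitly allowed is refused.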
A major benefit of a centralized point of access to the Internet is that a log of Internet pages accessed by an entire company or department is easily kept. To enable an access log (an error log is automatically generated for the main Apache process), add the following two lines to the configuration file:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
If you want to modify the content of the log, change the LogFormat directive.
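For instance, one common variation is the "combined" format, which appends the referring page and the client's user agent to each entry. This is only a sketch; the nickname combined is a convention rather than a requirement, and the log file name is carried over from the example above.

```apache
# Common log fields plus the Referer and User-Agent request headers
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
```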
The basic proxy configuration allows the server to act only as a proxy, relaying requests from clients to their destination. What it doesn't do is cache the content as it accesses the information. Instead, it exchanges the information directly.
To enable caching, you must use the mod_cache and mod_disk_cache modules (use --enable-cache and --enable-disk-cache during configuration) and add a few directives to specify the location, size, and 'refresh' parameters of the cache. To set up a modest disk cache, use the settings in the sample Apache configuration file:
CacheRoot "/export/http/apache2.proxy/cache/"
CacheSize 5
CacheGcInterval 4
CacheMaxExpire 86400
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
In order, these directives configure the following:
- CacheRoot specifies the location of the disk cache. This example uses a directory within the Apache installation, but you might want to place it on a separate partition -- a good fast disk or suitable RAID solution are good choices.
- CacheSize defines the maximum amount of space that will be used for the cache on disk. Be careful with this setting; it's tempting to specify a size as large as the available partition, but this can lead to an inefficient cache, largely made up of information you never use again. Instead, estimate the amount of information a typical user downloads in a day, and then multiply that by the number of users who will use the proxy service. For example, 5 MB per user is a reasonable figure for light to medium use; for 100 clients, that translates into 500 MB.
- CacheGcInterval specifies the number of hours to wait before attempting to clean out unused objects from the cache. Set this too low, and you can force some objects to be reloaded constantly. Set it too high, and you risk filling the cache with stale data.
- CacheMaxExpire specifies the number of seconds for which an object can be cached without checking the origin server to determine whether the document has been updated. This helps keep the objects in the cache "fresh," as it specifies the maximum amount of time an object in the cache can be out of date.
- CacheLastModifiedFactor defines a value that will be used to calculate whether an item in the cache should be expired if the object hasn't explicitly been marked with an expiration date.
- CacheDefaultExpire specifies the number of seconds after which an object will be expired if no specific data is supplied about the expiration date or period from the original server.
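The sizing rule and the expiry calculation described above can be sketched as follows. This assumes that CacheSize is expressed in kilobytes, as the Apache 2.0 mod_disk_cache documentation describes it, and that the expiry period is computed as the time since last modification multiplied by the factor; check both against the documentation for the version you are running.

```apache
# Sizing sketch: 100 clients x 5 MB per client per day = 500 MB.
# Assumption: CacheSize is given in kilobytes, so 500 MB = 500 x 1024 = 512000.
CacheSize 512000

# Expiry sketch for objects that carry no explicit expiration date:
#   expiry period = (time since last modification) x CacheLastModifiedFactor
# With a factor of 0.1, an object last modified 10 hours ago is treated as
# fresh for 1 hour, subject to the CacheMaxExpire ceiling (86400 s = 24 hours).
CacheLastModifiedFactor 0.1
CacheMaxExpire 86400
```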
The CacheMaxFileSize and CacheMinFileSize directives are also useful, as they set the maximum and minimum sizes of files to be retained in the cache. The default values are 1,000,000 bytes and 1 byte, respectively. Usually, you will want to prevent very large files (e.g., movies and application installers) from being retained in a cache. That said, if a company regularly views static media files and has a reasonable amount of space available to devote to the cache, it makes sense to set the CacheMaxFileSize directive to a more video-friendly limit. Be aware, however, that doing so will make other large objects cacheable as well.
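As an illustration only, a site that wants to cache media files of up to 50 MB might use something like the following. The 50 MB ceiling is a hypothetical figure, not a recommendation; both directives take values in bytes.

```apache
# Hypothetical limits: cache objects between 1 byte and 50 MB
# (50 x 1024 x 1024 = 52428800 bytes)
CacheMinFileSize 1
CacheMaxFileSize 52428800
```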
You can prevent the caching of information from certain sites by using the NoCache directive, which accepts the name of a domain or host, like this:
NoCache barclays.co.uk first-direct.co.uk
As you can see from this example, this is particularly useful for sites where caching information is not desirable, whether for security reasons or because it is a dynamically hosted site that doesn't correctly specify the status of the pages it returns.
Apache supports very basic filtering when using the proxy feature. It enables the admin to block access to specific sites or domains explicitly within the configuration file through the ProxyBlock directive. This blocks specific hosts, domains, or fragments of names. To block a specific host you would use:
To block the domain, use:
To block any name or domain with a given string:
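The three cases might look like the sketch below. The names are hypothetical (example.com is a reserved documentation domain) and stand in for whatever hosts you actually want to block. Because ProxyBlock matches on names that contain the given word, host, or domain, the shorter the entry, the broader the block.

```apache
# Block a single host
ProxyBlock www.example.com
# Block an entire domain
ProxyBlock example.com
# Block any host or domain whose name contains the string
ProxyBlock example
```

Several entries can also be listed on a single ProxyBlock line, separated by spaces.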
This is best used by enterprises that want to block access to sites they do not want their staff visiting. Another use is to block Web advertising and pop-up hosts and domains. However, specifying a particularly long list of entries can slow down Apache startup, as the server builds an in-memory IP address list based on this information. This can cause the proxy service to return data although the source has been updated.
Proxying is one way to lower Internet connectivity needs. It also speeds up access for the entire network, and can help to secure it. Side benefits include the ability to block specific sites and to monitor the sites and files that users are downloading.
Original date of publication, 10/15/2003