Squid Internet Object Cache -- Advanced caching proxy server for Unix
More than a mere proxy, the Squid Internet Object Cache is one of the more popular proxy servers on the Internet for a simple reason -- it's a free and easy-to-configure tool that cuts down on Web traffic for large Internet sites.
The idea behind Squid -- and other Internet caches -- is quite simple: a cache brings documents as close as possible to users. It stores popular Internet documents locally, checks periodically to see whether any of them have changed, and updates those that have. Considering that more than 70 percent of Web documents aren't updated very often, a cache that can store a large number of documents is a valuable asset for larger Internet and/or intranet sites.
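The store, check, and update cycle described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Squid's implementation: the class name, the `fetch` and `check_version` callbacks, and the `recheck_after` interval are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    body: bytes
    version: str       # hypothetical change marker, e.g. a last-modified stamp
    checked_at: float  # when this copy was last confirmed to be current


class SimpleCache:
    """Keeps local copies and periodically asks the origin whether they changed."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[bytes, str]],  # returns (body, version)
        check_version: Callable[[str], str],        # cheap "has it changed?" probe
        recheck_after: float = 300.0,               # seconds between checks (assumed)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.check_version = check_version
        self.recheck_after = recheck_after
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is None:
            # Miss: download once and keep a local copy.
            body, version = self.fetch(url)
            self.entries[url] = Entry(body, version, now)
            return body
        if now - entry.checked_at >= self.recheck_after:
            # The copy is due for a check; re-download only if it changed.
            if self.check_version(url) != entry.version:
                entry.body, entry.version = self.fetch(url)
            entry.checked_at = now
        return entry.body
```

Repeated requests within the recheck interval are answered from the local copy without contacting the origin at all, which is where the traffic savings come from.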
Because the equivalent of tens of thousands of Web sites can be stored locally (or even hundreds of thousands, depending on how much memory and how many servers are devoted to caching), users don't have to wait to download information from the original sites. As a result, caching servers also reduce traffic from the local site to the Internet, freeing valuable bandwidth for more important uses or reducing connection costs.
Experts have shown that a well-designed cache can cut an enterprise network's bandwidth use by at least 35 percent, and even a small cache of a few gigabytes can save as much as 25 percent. And the control lies with you -- using Squid's configuration files, you can specify exactly what you want cached and how long it is to be cached.
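To illustrate what "what gets cached and for how long" means in practice, here is a minimal Python sketch of rule-based cache policy. It does not use Squid's configuration syntax; the patterns, lifetimes, and the `max_age_for` helper are made up for the example.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Each rule is (URL pattern, lifetime in seconds). A lifetime of None means
# "never cache". All patterns and numbers here are illustrative assumptions.
Rule = Tuple[Pattern[str], Optional[int]]

RULES: List[Rule] = [
    (re.compile(r"/cgi-bin/|\?"), None),                        # dynamic content
    (re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE), 7 * 24 * 3600),  # images: one week
    (re.compile(r"."), 3600),                                   # everything else: one hour
]


def max_age_for(url: str, rules: List[Rule] = RULES) -> Optional[int]:
    """Return how long a URL may be cached, or None if it must not be cached."""
    for pattern, max_age in rules:
        if pattern.search(url):
            return max_age  # first matching rule wins
    return None
```

Because the first matching rule wins, the order of the list matters: the "never cache" rule is placed first so that it overrides the more general ones.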
Squid Internet Object Cache contains many distributed features typically found in more expensive caching servers. For starters, it's designed to work closely in conjunction with other Squid caching servers distributed on an intranet with multiple entrances to the Internet. As an example, let's say that a machine on your network asks for a URL that is not currently stored on the local cache. Instead of heading directly to the URL, Squid will first check for other caches on the network hierarchy to see if a copy of the URL is stored elsewhere. If no other cache on the network contains the document, the URL is then downloaded. Since Squid is primarily designed to work with other caches on a network, it will look for data at the closest points before resorting to accessing the Internet.
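The lookup order in that example (local cache, then neighboring caches, then the Internet) can be sketched as follows. This is a simplified Python model under stated assumptions: the `CacheNode` class and its methods are hypothetical, and the peers are queried by direct method call rather than over any real inter-cache protocol.

```python
from typing import Callable, Dict, List, Tuple


class CacheNode:
    """One cache in a hierarchy; `peers` are the other caches it may ask."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.peers: List["CacheNode"] = []

    def has(self, url: str) -> bool:
        return url in self.store

    def resolve(
        self, url: str, fetch_origin: Callable[[str], bytes]
    ) -> Tuple[bytes, str]:
        """Return (document, source), trying the closest source first."""
        # 1. Local hit: nothing leaves this machine.
        if url in self.store:
            return self.store[url], self.name
        # 2. Ask neighboring caches before going out to the Internet.
        for peer in self.peers:
            if peer.has(url):
                body = peer.store[url]
                self.store[url] = body  # keep a local copy for next time
                return body, peer.name
        # 3. No cache has it: download from the origin server.
        body = fetch_origin(url)
        self.store[url] = body
        return body, "origin"
```

The second element of the returned tuple reports where the document came from, which makes it easy to see that a second request for the same URL is answered locally.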
Squid is based on the Harvest Cache, which can still be found on many Internet sites. Developed in the computer-science departments of the University of Southern California and the University of Colorado-Boulder, the Harvest cache is maintained by volunteers and is available for download. In addition to Squid, both Netscape's Compass Server and the commercial WebGlimpse are based on Harvest to some extent.
Squid isn't the best solution for every situation. Since the Squid cache stores previously accessed materials, it can't always handle dynamic Web pages (such as Active Server Pages). A well-designed Web site works with such caches, but of course not all Web sites are well designed. Squid can also have problems passing through dynamically generated ad banners; typically only a static version is stored in the cache. The output of executable CGI scripts is not cached either.
The latest major release of Squid, v2.0, is now available for download and includes the following new features: HTTP/1.1 persistent connections; lower virtual memory usage, as in-transit objects are not held fully in memory; totally independent swap directories; customizable error texts; internal FTP support; asynchronous disk operations; internal icons for FTP and gopher directories; SNMP support; routing requests based on AS numbers; and Cache Digests.
According to the Squid documentation, Cache Digests allow cooperating caches to exchange digests of their contents in a compact form. For example, if Cache A has a digest of Cache B, then A knows what documents are likely to be (or not to be) in Cache B. Consequently, Cache A can decide whether to fetch an object from Cache B without contacting Cache B first. However, Cache Digests use a "lossy compression" technique that can result in some inaccuracies.
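A Bloom filter is one well-known structure that behaves the way this paragraph describes: it is compact, and it can answer "possibly present" for a document that was never added, but never "absent" for one that was. The Python sketch below assumes that structure for illustration; the class name, bit size, hash count, and use of SHA-256 are arbitrary choices and say nothing about Squid's actual digest format.

```python
import hashlib
from typing import Iterator


class Digest:
    """Compact, lossy summary of the URLs held by a cache (Bloom filter sketch)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

In this model, Cache B would build a `Digest` from its stored URLs and send the bit array to Cache A. Cache A then calls `might_contain` locally and contacts B only when the answer is true, accepting that an occasional true answer will turn out to be wrong.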
Since its introduction, the Squid Internet Object Cache has been one of the leaders when it comes to caching Web documents. And while a new breed of commercial caching software and devices offer better performance and/or richer feature-sets, Squid remains a solid choice for Unix site designers who want to add caching functionality to their intranet or Internet site.
Pros: High-quality server distributed under the GNU GPL and available in source code; designed to work well with multiple caches implemented on a multi-server site; fine-grained control over exactly what gets cached
Cons: No Windows NT version; no browser-based administration; a bit of a memory hog relative to other proxy servers
Version Reviewed: 2.0 Patch 2
Reviewed by: Kevin Reichard
Last Updated: 7/2/02
Date of Original Review: 7/30/98