There’s no denying that spam is a pervasive and tricky problem, so it’s no surprise that spam prevention tactics are as varied as the problem itself.
Spam prevention efforts can be undertaken by an outsourced service or tackled in-house. They can occur at the desktop or be loaded into mail servers or machines linked to mail servers at the gateway or ISP.
To illustrate this, we have highlighted anti-spam solutions from a variety of vendors. The list of companies and solution types noted below is by no means complete. Rather, it represents a sample of companies, and types of companies, involved in the spam wars, as well as the solutions they are currently offering.
Company (in Alphabetical Order) | Product | Type of Company | Featured Anti-spam Approach |
Brightmail | Brightmail 4.0 | Anti-spam software | F |
Cloudmark | Authority | Anti-spam software | F |
CMS | Connect Praetor | E-mail infrastructure | F |
Gordano | Anti-Spam | E-mail infrastructure | BL, F, KS |
Lotus Notes | Mail Server | Business software | BL |
Lyris | MailShield | E-mail infrastructure | F |
Microsoft | Exchange | General software | BL |
Mirapoint | Mirapoint | E-mail infrastructure | F, H, RBL |
Postini | Perimeter manager | E-mail security service | H, WL/BL |
Sendmail | Mailstream Manager | E-mail infrastructure | F, H, RBL, WL/BL |
Stalker Software | CommuniGate Pro | E-mail infrastructure | F, KS, RBL, WL/BL |
Vircom | VOP modusGate | Anti-spam software | F |
Key to Anti-spam Approach Abbreviations: BL = Black List, F = Filtering, H = Heuristics, KS = Keyword Search, RBL = Real-Time Blackhole Lists, WL/BL = White List/Black List |
The most rudimentary and common approach to spam prevention is the use of black lists and white lists. As the names imply, this involves continuously updating huge lists of approved and disapproved domain names. Analysts say that this approach is labor-intensive and easily evaded: the spammer simply changes the originating domain of the spam.
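As a rough illustration of the list-based approach described above, here is a minimal Python sketch. The function names and the `.example` domains are hypothetical and are not taken from any product in the table; the point is only to show why a spammer who switches to an unlisted domain slips past the black list.

```python
def sender_domain(address: str) -> str:
    """Return the lowercased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def classify_sender(address: str, whitelist: set, blacklist: set) -> str:
    """Check the sender's domain against the white list, then the black list."""
    domain = sender_domain(address)
    if domain in whitelist:
        return "accept"
    if domain in blacklist:
        return "reject"
    # Domains on neither list are undecided; a new spam domain lands here.
    return "unknown"


# Hypothetical lists; real deployments maintain far larger ones.
WHITELIST = {"partner.example"}
BLACKLIST = {"bulk-offers.example"}

print(classify_sender("sales@bulk-offers.example", WHITELIST, BLACKLIST))
print(classify_sender("sales@new-domain.example", WHITELIST, BLACKLIST))
```

The second call shows the evasion analysts describe: the same sender, mailing from a fresh domain, is no longer caught until someone updates the list.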
Microsoft Exchange and IBM’s Lotus Notes offer black lists and white lists. Free lists are available on the Web at www.mail-abuse.org, www.dsbl.org and elsewhere. “By including these they are able to provide some level of spam filtering and can check the little box in the checklist saying that they provide spam filtering,” says Marten Nelson, a research analyst with Ferris Research.
This approach, one expert says, catches about 80 percent of spam, which may be enough for most enterprises. The marketplace is still determining whether the radical increase in spam will entice e-mail server vendors and their enterprise and ISP clients to implement advanced solutions. The key question is one of economics: At what point does the waste of network resources and manpower caused by spam justify the time, money, and “mind share” required to implement the advanced solutions?
Spam-fighting tools are proliferating, offered both by mail server vendors and by companies specializing in anti-spam products. Each technique has its inherent drawbacks, and many vendors feature one approach while incorporating others. Whether they do this for technical or marketing reasons is debatable. “The means to address spam vary, and range all over the map,” says Lih-Tah Wong, president of Computer Mail Services, which sells Praetor rules-based spam filtering software.
One intermediate approach is simple keyword searches. Obviously, finding individual words or phrases doesn’t determine whether a message is spam or not, so keyword searches must be combined with some other technique to have significant impact.
Spammers often respond to keyword searching with a technique called HTML cloaking. This involves replacing characters with numeric HTML character codes based on their ASCII values. Ultimately, the recipient’s mail client displays the codes as the intended letters. When the message passes through the anti-spam software, however, the full word is not present. Consequently, the message isn’t deemed objectionable.
Users and managers who don’t favor keyword searches argue that HTML cloaking allows spam to pass. Conversely, those backing the approach say that the presence of HTML cloaking in and of itself is a potent sign of spam.
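Both sides of that argument can be seen in a short Python sketch. It assumes the cloaking uses numeric HTML character references such as `&#102;`; the function names and the sample message are made up for illustration. A keyword search over the raw text misses the word, a search over the decoded text finds it, and the count of numeric references is available as a warning sign in its own right.

```python
import html
import re

# Matches numeric character references such as &#102; or &#x66;
ENTITY_RE = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def find_keywords(text: str, keywords: list) -> list:
    """Return the keywords that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [k for k in keywords if k in lowered]


def scan(body: str, keywords: list) -> dict:
    """Search the raw body and the entity-decoded body, and count entities."""
    decoded = html.unescape(body)  # turns &#102; back into "f", and so on
    return {
        "raw_hits": find_keywords(body, keywords),
        "decoded_hits": find_keywords(decoded, keywords),
        "entity_count": len(ENTITY_RE.findall(body)),
    }


# Hypothetical cloaked message: the word "free" spelled with character codes.
cloaked = "Get it &#102;&#114;&#101;&#101; today"
print(scan(cloaked, ["free"]))
```

Whether a high entity count should count against a message, and how heavily, is a policy decision this sketch leaves open.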
Keyword searches are generally combined with heuristic approaches, which use various methods to divine the context within which a word or phrase is used, to determine if the message is likely spam or not.
One heuristic approach is sieve filtering. This approach gives system administrators and others the ability to write scripts based on the characteristics of newly arriving spam that filter out subsequent e-mails following the same pattern. The downside to this approach is that it demands human intervention. “Those are the systems in which the feature set, whether or not the message is spam, is human chosen,” says Bill Yerazunis, author of a spam filter called CRM114.
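The idea of human-written rules can be sketched in Python as follows. This is not the syntax of any actual filtering language or product; the rule names, helper functions, and sample message are hypothetical. It only illustrates that each pattern has to be spotted and written down by a person.

```python
from email import message_from_string


def subject_contains(phrase: str):
    """Build a rule that fires when the Subject header contains the phrase."""
    phrase = phrase.lower()
    return lambda msg: phrase in (msg.get("Subject") or "").lower()


def header_missing(name: str):
    """Build a rule that fires when the named header is absent."""
    return lambda msg: msg.get(name) is None


# Hand-written rules: an administrator adds one each time a new pattern appears.
RULES = [
    ("subject mentions 'act now'", subject_contains("act now")),
    ("no Date header", header_missing("Date")),
]


def matching_rules(raw_message: str, rules=RULES) -> list:
    """Return the names of all rules that the message triggers."""
    msg = message_from_string(raw_message)
    return [name for name, test in rules if test(msg)]


sample = "From: a@promo.example\nSubject: ACT NOW and save\n\nLimited offer."
print(matching_rules(sample))
```

A message that follows a pattern nobody has written a rule for passes untouched, which is the human-intervention cost described above.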
Another heuristic filtering approach is Bayesian analysis. With Bayesian analysis, a large body of spam and an equal number of legitimate e-mails undergo statistical analysis. A comparison of the results creates a baseline threshold against which newly arriving messages are judged.
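A toy version of this idea, written in Python, is sketched below. It is a plain naive Bayes classifier with add-one smoothing and equal priors, chosen for brevity; it is an assumption for illustration and not the formula used by any filter named in this article. The four training messages are invented, and a real filter would train on thousands.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Split a message into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def train(messages: list) -> Counter:
    """Count how often each token appears across a corpus."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def spam_probability(text: str, spam_counts: Counter, ham_counts: Counter) -> float:
    """Estimate P(spam | text), assuming equal-sized corpora (equal priors)."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)  # add-one smoothing
    ham_total = sum(ham_counts.values()) + len(vocab)
    log_odds = 0.0
    for token in tokenize(text):
        if token not in vocab:
            continue  # tokens never seen in training carry no evidence
        p_spam = (spam_counts[token] + 1) / spam_total
        p_ham = (ham_counts[token] + 1) / ham_total
        log_odds += math.log(p_spam / p_ham)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny invented corpora of equal size.
spam_counts = train(["win cash now", "cheap loans win big"])
ham_counts = train(["meeting agenda attached", "lunch at noon tomorrow"])

THRESHOLD = 0.5  # hypothetical cutoff; real filters tune this value
for text in ["win cash", "meeting agenda"]:
    p = spam_probability(text, spam_counts, ham_counts)
    print(text, "-> spam" if p > THRESHOLD else "-> legitimate")
```

The threshold plays the role of the baseline described above: messages scoring over it are treated as spam, and where to set it is a trade-off between missed spam and blocked legitimate mail.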
Paul Graham, a programmer influential in this type of Bayesian filtering, thinks it is the answer. He noted that since he published an article on the topic in August, more than 20 open source Bayesian filters have been written. He says CRM114 (named after the radio security device in the bomber in the 1964 movie “Dr. Strangelove”) is more than 99.95 percent accurate. “The best and the most efficient is the open source stuff,” he says.
A downside of heuristics, proponents of other approaches maintain, is that they can have trouble reacting to subtle changes by spammers.