Slamming Spam: A Guide for System Administrators
Preface
This book is meant to be a reference for the email system administrator who has been asked to implement an anti-spam solution for their organization. This is an administrator's "how to stop spam" book. It is very hands-on, and it skips "why people spam" and other topics that are usually of only peripheral interest or use to a mail administrator.
Fighting spam is a complex problem, with many potential technical, legislative, and social solutions. No book could ever hope to cover them all in a reasonable amount of space. In fact, even considering only the technical spam-fighting solutions, it isn't possible to give each of them the coverage it deserves. Our focus in this book is on the widely used open source anti-spam solutions available for the major mail transfer agents (email servers). The book's web site has all the latest information about the book, including updated URLs, errata, and other information useful in the fight against spam.
Who This Book Is For
The reader is assumed to have at least a basic knowledge of Linux/Unix. In most cases, step-by-step instructions are provided for the covered package or approach. These "cookbook" examples are meant to work for most installations, with minimal changes and/or customizations. While some knowledge of the mail transfer agent software in use (such as Sendmail) is assumed, the administrator doesn't need to be a mail server expert or Linux guru to implement the solutions outlined here.
You will learn about the best current anti-spam methods and software available. Most of the methods are open source and freely available (as in free beer). These open source solutions offer the "best of breed" anti-spam solutions available today. Implementing open source solutions requires more work than implementing commercial ones, but the administrator often ends up with a more flexible, better solution than is otherwise available.
We initially thought we would discuss anti-spam services such as Postini and Symantec's Brightmail in the book. However, we found that most of the commercial anti-spam solutions (such as anti-spam firewalls) and services were documented quite well and didn't require additional coverage. As a result, most commercial solutions are only mentioned in the Introduction. The only non-open source anti-spam solution covered here (McAfee SpamKiller) is directly related to the commercial mail servers covered: IBM Lotus Notes/Domino and Microsoft Exchange.
The IBM Lotus Domino and Microsoft Exchange administrator has a choice. An anti-spam solution can be implemented directly as part of the mail server, since both Lotus Domino and Microsoft Exchange support plug-ins. To supplement, or as an alternative to, a tightly integrated solution (like McAfee SpamKiller), additional open source email servers can be deployed specifically to perform spam filtering or virus checking. These anti-spam/anti-virus servers process each message before passing it on to the Domino/Exchange server for delivery to the recipient.
While this approach adds to the "box count" an administrator needs to manage, it does bring best-of-breed open source solutions to these otherwise "closed" commercial email servers. A hybrid approach can reduce out-of-pocket costs while giving the administrator much flexibility in tweaking the anti-spam solution.
What You Will Need
The solutions in this book focus on the server side, running on Linux. There is some coverage of the client side, but it is primarily meant to complement the server implementations we examine. Although the solutions presented here have been tested on Debian and/or Fedora Core Linux, they should work on almost every Linux distribution available with few modifications.
The covered mail transfer agents (MTAs) are Sendmail, Postfix, qmail, IBM Lotus Domino, and Microsoft Exchange. We assume the reader has a previously installed and working MTA, as installing and configuring even a single MTA can be a book unto itself. Implementing SMTP authentication (SMTP AUTH) for Postfix, Sendmail, and qmail may require recompiling the MTA, so having an MTA that you have already compiled, installed, and run successfully makes adding SMTP AUTH much easier.
We assume the reader has root access to the machine(s) they want to implement the anti-spam solutions covered here. Although many of the solutions do not require root access and can be installed and run as a "regular" user (though sometimes this requires configuration changes), we assume root access in our examples. You will see the use of root only when absolutely necessary. You won't see us compiling or installing anything as the root user, unless there is no other way to do it.
Often, we use the sudo command to run privileged commands that would otherwise require the root password. sudo is often a better way of granting root access, because it does not require disclosing the root password. The commands prefixed by sudo could just as easily be run as root, assuming the root user's path is identical to the unprivileged user's path. For many examples, we assume the user performing the installation tasks has write access to /usr/local.
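If you script installation steps, the sudo convention described above can be captured in a small helper that adds the sudo prefix only when you are not already root. This is a minimal sketch, assuming Python 3 on a POSIX system; the function names privileged and can_write_usr_local are hypothetical and not part of any package covered in this book.

```python
import os
from typing import List, Optional, Sequence


def privileged(cmd: Sequence[str], euid: Optional[int] = None) -> List[str]:
    """Prefix cmd with sudo unless the effective user is already root (uid 0)."""
    if euid is None:
        euid = os.geteuid()  # POSIX only
    # Run as-is when root; otherwise let sudo elevate without sharing the root password.
    return list(cmd) if euid == 0 else ["sudo", *cmd]


def can_write_usr_local(path: str = "/usr/local") -> bool:
    """Check whether the current user can write to the install prefix."""
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    print(privileged(["make", "install"]))
    print("write access to /usr/local:", can_write_usr_local())
```

For example, subprocess.run(privileged(["make", "install"]), check=True) would run make install directly when invoked as root, or through sudo for an unprivileged user, which mirrors how the commands in this book can be run either way.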
A few notes regarding other Linux/Unix command assumptions. We presume the reader has access to and knowledge of the following Linux utilities:
tar for tar formatted archives
gzip for GNU zip formatted archives
zip for the Info-zip formatted archives
bzip2 for bzip2 formatted archives
wget, lynx and/or ftp for retrieving source archives
We presume you have a recent version of gcc on the system to build the anti-spam utilities outlined here. Some of the packages covered here specifically require GNU make. Most Linux distributions come with GNU make. If you are building these solutions on a BSD derivative such as FreeBSD, or on another platform such as Sun Solaris or HP-UX, you may need to install GNU make for the spam-fighting utilities that require it.
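Before building anything, it can help to confirm that the tools listed above are actually present. The following is a minimal sketch, assuming Python 3.7 or later; the helper names missing_tools and find_gnu_make are hypothetical. It treats wget, lynx, and ftp as interchangeable (any one is enough), and it looks for GNU make under both gmake (the name it often has on BSD systems) and make, accepting whichever one identifies itself as GNU Make.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Sequence

# Tools the preface assumes are installed.
REQUIRED = ["tar", "gzip", "zip", "bzip2", "gcc"]
# Any one tool from each group is enough for retrieving source archives.
ANY_OF = [["wget", "lynx", "ftp"]]


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the required tools (or tool groups) that are not found on PATH."""
    missing = [tool for tool in REQUIRED if which(tool) is None]
    for group in ANY_OF:
        if all(which(tool) is None for tool in group):
            missing.append(" or ".join(group))
    return missing


def find_gnu_make(candidates: Sequence[str] = ("gmake", "make")) -> Optional[str]:
    """Return the path of the first candidate that reports itself as GNU Make."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run([path, "--version"], capture_output=True,
                                    text=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            continue
        # GNU make's banner starts with "GNU Make"; other makes typically lack it.
        if result.stdout.startswith("GNU Make"):
            return path
    return None


if __name__ == "__main__":
    print("missing:", missing_tools() or "nothing")
    print("GNU make:", find_gnu_make() or "not found")
```

The which parameter defaults to shutil.which but can be replaced, which keeps the check easy to test without touching the real PATH.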
In this book, we often mention maildir and mbox (or mailbox) formatted files. You should know which mailbox format your email server software uses. The configuration for many anti-spam utilities covered in this book will vary depending upon which mailbox format is used. (Lotus Domino and Microsoft Exchange use their own internal formats, so the mailbox format doesn't apply to those email servers.)
The mbox format stores the messages for a particular user in one file per folder. Because mbox was the original (and at one time the only) mailbox format, it has wide support. Sendmail and Postfix use mbox formatted mailboxes by default. Mailboxes in the mbox format work fine in many installations, but they can cause problems in some environments. For example, mbox formatted mailboxes on NFS-mounted filesystems have locking issues that can result in mailbox corruption.
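To see why locking matters for mbox, it helps to watch several messages land in one shared file. This sketch uses the mailbox module from the Python standard library, whose mbox lock() method uses dot-locking and, where available, flock()/lockf(); the function names make_message and fill_mbox, the folder name inbox, and the example.com addresses are hypothetical placeholders.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(subject: str) -> EmailMessage:
    """Build a small test message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = subject
    msg.set_content("Test body\n")
    return msg


def fill_mbox(folder_dir: str, subjects: list) -> tuple:
    """Append messages to one mbox file; return (files in dir, messages in file)."""
    path = os.path.join(folder_dir, "inbox")  # the whole folder is this one file
    box = mailbox.mbox(path)
    box.lock()  # every writer must honor this lock; NFS makes locking unreliable
    try:
        for subject in subjects:
            box.add(make_message(subject))
        box.flush()
    finally:
        box.unlock()
        box.close()
    # Each message begins with a "From " separator line inside the shared file.
    with open(path, encoding="utf-8") as f:
        separators = sum(1 for line in f if line.startswith("From "))
    return len(os.listdir(folder_dir)), separators


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(fill_mbox(d, ["first", "second"]))
```

Because every delivery appends to the same file, two writers that do not share a working lock can interleave their writes, which is exactly the corruption risk described above.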
Maildir stores each message as an individual file with a unique name, in a directory structure that has a directory for each folder. In many cases, a "/" after a filename parameter indicates a maildir formatted message directory, and the lack of a "/" indicates that a mailbox is in mbox format. qmail uses maildir formatted mailboxes by default. Postfix can easily be configured to use maildir formatted mailboxes. If Procmail is used as the mail delivery agent, it can easily be configured to use maildir format by specifying the folder name with a trailing "/".
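For contrast with mbox, the sketch below delivers messages into a maildir using Python's standard mailbox.Maildir class, which writes each message to tmp/ and then moves it into new/, so no mailbox-wide lock is needed. The folder_format helper simply encodes the trailing-"/" convention described above; it and fill_maildir are hypothetical names, and real tools such as Procmail apply their own rules.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def folder_format(folder: str) -> str:
    """Hypothetical helper mirroring the trailing-"/" convention in the text."""
    return "maildir" if folder.endswith("/") else "mbox"


def fill_maildir(root: str, subjects: list) -> list:
    """Deliver messages into a maildir; return the file names under new/."""
    path = os.path.join(root, "Maildir")
    box = mailbox.Maildir(path, create=True)  # creates tmp/, new/, and cur/
    for subject in subjects:
        msg = EmailMessage()
        msg["To"] = "user@example.com"  # placeholder address
        msg["Subject"] = subject
        msg.set_content("Test body\n")
        box.add(msg)  # written to tmp/, then moved into new/ under a unique name
    return sorted(os.listdir(os.path.join(path, "new")))


if __name__ == "__main__":
    print(folder_format("Maildir/"), folder_format("mail/spam"))
    with tempfile.TemporaryDirectory() as d:
        print(fill_maildir(d, ["one", "two"]))
```

Each delivered message becomes its own file, which is why maildir avoids the whole-file locking problems that mbox has on NFS.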
How This Book Is Organized
This book can be read cover to cover in order to give the reader a hands-on view of the many methods to fight spam. However, the individual chapters are self-contained, so if there are specific anti-spam solutions you want to implement, you can just skip to those particular chapters.
Chapter 1, "Introduction," is an overview of some of the currently available major anti-spam technologies. It is useful for putting the solutions provided in the rest of the book in context. The focus is designing an anti-spam infrastructure for an organization's network, walking through policy, information gathering, design questions, and goals. If you are interested in designing an anti-spam architecture from scratch, Chapter 1 is an excellent starting point.
Chapter 2, "Procmail," covers procmail, a tool often used as a mail delivery agent by anti-spam software to complete the job of fighting spam. For example, many statistical analysis tools depend upon procmail to filter messages into spam or non-spam folders. If the anti-spam tools you plan to use require procmail and you are not familiar with it, you should read this chapter.
Chapter 3, "SpamAssassin," covers this widely known and widely used spam classifier, from installing the required packages to configuring SpamAssassin and creating rulesets (scoring). If you are planning to use a general-purpose anti-spam filter, SpamAssassin is an excellent choice.
Chapter 4, "Native MTA Anti-Spam Features," covers the native anti-spam capabilities included with the covered open source MTAs. Topics covered here include whitelisting/blacklisting, blackhole listing services, tweaking the MTA to help block spam, and other functions native to the modern MTA. If you wonder what the access database is, or how to tweak Postfix's configuration to reject improper SMTP command pipelining, then this is a good chapter for you.
Chapter 5, "SMTP AUTH and STARTTLS," shows how to prevent the covered MTAs from sending unwanted outbound spam. Cyrus SASL is used as the basis of SMTP AUTH and STARTTLS functionality for the Sendmail and Postfix MTAs. The chapter covers installation and configuration of Cyrus SASL for Sendmail and Postfix, as well as the netqmail-1.05 distribution of qmail, which includes patches providing SMTP AUTH and STARTTLS functionality.
Chapter 6, "Distributed Checksum Filtering," covers the Distributed Checksum Clearinghouse (DCC) and Vipul's Razor, which exchange email checksums to identify bulk mailings. Distributed collaborative (or checksum) filtering is an excellent way to help determine whether a message is spam: the receiving server queries shared servers to see how many times other servers have processed a particular message.
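The core idea of checksum-based filtering is that identical (or nearly identical) bulk messages hash to the same value, so a shared counter reveals how widely a message has been seen. The sketch below is an illustration of that idea only, not the DCC or Razor protocol: it uses an exact SHA-256 over a lightly normalized body and an in-memory counter, whereas DCC and Razor use their own checksum schemes designed to tolerate small changes, and networked servers. Clearinghouse, body_checksum, and bulk_threshold are hypothetical names.

```python
import hashlib
from collections import Counter


def body_checksum(body: str) -> str:
    """Checksum of a normalized body so trivial changes still collide."""
    # Hypothetical normalization: lowercase and collapse all whitespace.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Clearinghouse:
    """In-memory stand-in for a shared checksum server (hypothetical)."""

    def __init__(self, bulk_threshold: int = 3) -> None:
        self.counts: Counter = Counter()
        self.bulk_threshold = bulk_threshold

    def report(self, body: str) -> int:
        """Record one sighting of this body; return the running total."""
        key = body_checksum(body)
        self.counts[key] += 1
        return self.counts[key]

    def is_bulk(self, body: str) -> bool:
        """True once enough reports share the same checksum."""
        return self.counts[body_checksum(body)] >= self.bulk_threshold


if __name__ == "__main__":
    ch = Clearinghouse(bulk_threshold=3)
    for copy in ["Buy NOW!", "buy   now!", "BUY now!\n"]:
        ch.report(copy)
    print(ch.is_bulk("buy now!"), ch.is_bulk("a personal note"))
```

Note that a high count only means "bulk"; legitimate bulk mail such as newsletters will also score as bulk, which is why checksum results are usually combined with other tests or whitelists rather than used alone.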
Chapter 7, "Introduction to Bayesian Filtering," gives the reader a working knowledge of the theory behind Bayesian analysis, the most efficient spam-fighting technology to date. Written by Rob Kolstad, it also gives an accessible account of how the covered applications implement the Bayesian analysis algorithms.
Chapter 8, "Bayesian Filtering," covers installation and configuration of a number of the more popular Bayesian filters available, including bogofilter, ASSP, and CRM114.
Chapter 9, "Email Client Filtering," walks the reader through the built-in anti-spam capabilities in Microsoft Outlook, Microsoft Outlook Express, and Mozilla Messenger. It also covers POPFile, one of the Bayesian filters available for any POP3-compliant email client platform.
Chapter 10, "Microsoft Exchange," covers the basic anti-spam capabilities in this popular email server, including the Intelligent Message Filter, Microsoft's anti-spam solution based upon its SmartScreen technology. Chapter 10 also covers McAfee SpamKiller for Exchange 2.1.1, an implementation of SpamAssassin tightly integrated into Exchange.
Chapter 11, "Lotus Domino and Lotus Notes," walks the reader through the built-in anti-spam capabilities of this popular enterprise email server, Domino, and its associated email client, Notes. It also covers McAfee SpamKiller for Domino 2.1, a SpamAssassin-based implementation tightly integrated into Domino, and details how to set up Lotus Domino for use with SMTP AUTH/STARTTLS.
Chapter 12, "Sender Verification," covers some of the lesser-known open source products in the areas of challenge/response and one-time-use email addresses (Active Spam Killer and Tagged Message Delivery Agent). Also covered is Camram, a sender-compute implementation with very nice CRM114 integration.
Appendix A covers Sender Policy Framework (SPF), a relatively new method for checking whether a server is authorized to send email on behalf of a domain. Domain owners publish SPF records (a kind of "reverse mail exchanger," or reverse MX, record) listing the hosts allowed to send their mail, and recipient email servers enforce those published records.
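To make the "receiving servers enforce published records" step concrete, here is a deliberately partial SPF evaluator. It is a minimal sketch, assuming the record text has already been fetched (real SPF records are published as DNS TXT records); it understands only the ip4, ip6, and all mechanisms with their +, -, ~, and ? qualifiers, and it silently skips everything that needs further DNS lookups (a, mx, include, ptr, exists, redirect). check_spf is a hypothetical name; use a complete SPF library for production filtering.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate a (partial) SPF record against the connecting client's IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            try:
                net = ipaddress.ip_network(value, strict=False)
            except ValueError:
                return "permerror"  # malformed address or prefix
            if ip.version == net.version and ip in net:
                return QUALIFIERS[qualifier]
        # Mechanisms and modifiers needing DNS lookups are skipped in this sketch.
    return "neutral"  # nothing matched


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 -all"
    print(check_spf(record, "192.0.2.10"), check_spf(record, "198.51.100.7"))
```

The first matching mechanism decides the result, so the order of terms in a published record matters; a trailing -all tells receivers to reject mail from any host not listed earlier.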
Appendix B shows the reader how to read email headers, and covers tools associated with spam fighting including SpamCop. It uses an example spam message to show how spammers try to obfuscate their intentions.
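The most useful headers for tracing a spam message are the Received headers: each relay adds its own at the top, so reading them from the bottom up follows the message's path. The sketch below, using only the Python standard library, lists the Received headers oldest-first and pulls out the IPv4 addresses they mention. received_chain, relay_ips, and the sample headers are hypothetical illustrations; keep in mind that Received headers added before the message reached servers you trust can be forged.

```python
import re
from email import message_from_string

# Dotted-quad IPv4 addresses (this does not validate octet ranges).
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def received_chain(raw: str) -> list:
    """Return Received headers oldest-first, with folded lines joined."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received") or []
    # Each relay prepends its header, so the bottom one is the oldest hop.
    return [" ".join(hop.split()) for hop in reversed(hops)]


def relay_ips(raw: str) -> list:
    """IPv4 addresses mentioned in the Received chain, oldest first."""
    return [ip for hop in received_chain(raw) for ip in IPV4.findall(hop)]


if __name__ == "__main__":
    sample = (
        "Received: from mx.example.net by mail.example.com;\n"
        " Mon, 1 Mar 2004 10:00:02 -0500\n"
        "Received: from unknown (203.0.113.9)\n"
        "\tby mx.example.net; Mon, 1 Mar 2004 10:00:01 -0500\n"
        "Subject: test\n\nbody\n"
    )
    for hop in received_chain(sample):
        print(hop)
    print(relay_ips(sample))
```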
Appendix C explains the SpamAssassin default ruleset as it is shown on the SpamAssassin web site.
Appendix D covers the command-line options of the SpamAssassin utilities.
Appendix E lists the SpamAssassin configuration file keywords.
Appendix F covers DSPAM, a Bayesian classifier designed for speed and accuracy, aimed squarely at the organization with thousands of email boxes.
Appendix G contains a list of resources the spam-fighting reader should find useful.
Acknowledgments
No project like this occurs without the assistance of numerous people, some of whom are listed here.
First of all, we would like to thank Rob Kolstad for contributing Chapter 7, "Introduction to Bayesian Analysis". This is an accessible and thorough treatment of the theory behind what we consider the most important spam-fighting technique available today.
We owe a great debt of gratitude to all the people from Pearson: Mary Franz, Noreen Regina, Jim Markham, and Lori Lyons.
The following people reviewed the entire manuscript, for which we are greatly indebted: Fredrick M. Avolio, Eric S. Johannson, and Sarah Ratta. The following individuals reviewed pieces of the manuscript on very short notice, for which we are very grateful: Tim Speed, Henrik Walther, Lars Powers, and Pete Moulton.
We would like to thank all the authors of open source packages used in this book, along with the many people who have devised (and shared) their anti-spam solutions through web sites, email lists, and other avenues. Without people like you, our inboxes would be even more flooded with spam! We truly stand on the shoulders of giants.
The Resources appendix lists many of the URLs we used in building the software components described in this book. In particular, we would like to thank the following people for allowing us to use portions of their web sites in parts of our coverage: Mastaler.
From Waggener Edstrom, Microsoft's public-relations firm, we would like to thank Tina Austinson and Amy Petty. From IBM/Lotus, we thank Erica Topolski and Edmund "Ted" Stanton. From McAfee Inc., we thank Tracy Ross, Zoe Lowther, Tim Smithson, and Brian Barnes. From Microsoft support, we thank Fred Wander.
Robert Haskins thanks: Jim Markham of Pearson for his very able assistance in manuscript preparation; my employer, Renesys Corporation (especially Todd Underwood, Andy Ogielski, Jim Cowie, BJ Premore, Rob Bushell, Eric Smith, and Joe Edelman), for their ideas, feedback, and support; David Webster of Computer Net Works for his support and the use of CNW facilities; and most importantly, my spouse Mary and children Claire and Peter, for their encouragement, patience, and understanding during this project.
Dale Nielsen thanks: my partners at Avacoda, LLC, Daniel Dee and Scott Reed, for the use of the Avacoda computing lab facilities as test beds for the software described herein; and especially my wife Janice and my daughter Crystal, for their willingness to have their email put through experimental anti-spam configurations, but most of all for their patience and support over the months that were spent on this project.
© Copyright Pearson Education. All rights reserved.