[linux-elitists] (tmda) Re: Constraining Bogus challenges.

Aaron Lehmann aaronl@vitelus.com
Tue Sep 23 21:28:35 PDT 2003


[Tangent]

On Tue, Sep 23, 2003 at 10:21:04PM -0400, Andrew wrote:
> Offhand, why don't you use spamc/spamd?

Contrary to what people like Karsten Self would have you believe ;),
SpamAssassin is not the be-all and end-all solution I wish it were.
SpamAssassin (spamd) uses something like 20MB of memory. No other
daemon I use, not even Squid, needs that much to run. Memory is
cheap, but SpamAssassin is also SLOW. It shouldn't take so long to
test an email against patterns, and it's even less acceptable for it
to use so much memory for a rather trivial task. Bayesian filtering?
Bogofilter does it in a fraction of the time. The two likely
explanations for SpamAssassin's temporal and spatial lameness are that
SpamAssassin's code is utter shit and that Perl is much slower than C.
I suspect the problems are caused by a mix of both. When software
requires serious users (especially ISPs) to devote far more resources
to it than are truly necessary, it is not good software.
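
For a sense of how little per-message work Bayesian scoring needs,
here is a minimal Python sketch of the combining scheme from Paul
Graham's "A Plan for Spam". It is not Bogofilter's code. The
tokenizer, the 0.4 default for unseen tokens, the 0.01/0.99 clamp,
the 15-token cutoff, and the toy probability table are all parameters
chosen for illustration. Scoring amounts to one dictionary lookup per
distinct token, a small sort, and a product.

```python
import math
import re

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, and a few symbols.
    return set(re.findall(r"[a-z0-9$'!-]+", text.lower()))

def spam_probability(text, token_probs, unknown=0.4, max_tokens=15):
    """Graham-style combination of per-token spam probabilities.

    token_probs is a hypothetical table mapping token -> P(spam | token),
    which a real filter would learn from spam and non-spam corpora.
    """
    probs = []
    for tok in tokenize(text):
        p = token_probs.get(tok, unknown)      # unseen tokens get a slightly innocent default
        probs.append(min(max(p, 0.01), 0.99))  # clamp so log() never sees 0 or 1
    # Keep only the tokens whose probabilities are furthest from neutral (0.5).
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    interesting = probs[:max_tokens]
    # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space to avoid underflow.
    log_spam = sum(math.log(p) for p in interesting)
    log_ham = sum(math.log(1 - p) for p in interesting)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

table = {"viagra": 0.99, "free": 0.9, "meeting": 0.05, "linux": 0.1}
print(spam_probability("FREE viagra now", table))
print(spam_probability("linux meeting agenda", table))
```

In a real filter the table comes from token counts over spam and
non-spam training mail; training is where the effort goes, while
scoring a message stays cheap.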

SpamAssassin may work fine for the class of problems it was originally
intended to solve. Yet once one has to run SpamAssassin as a daemon to
make it somewhat more bearable, it's clear that it's poorly designed.
I would appreciate a good spam filter with a similar design written in
a compiled language and using efficient algorithms.

Here's one example of an easy improvement that could be made to
SpamAssassin if it were not confined to the facilities of Perl: DFA
regexp matching. The difference between a DFA and an NFA is somewhat
technical, but the advantage of a DFA matcher is that it does not
backtrack: it reads each input character once, effectively tracking
every position the pattern could be in at the same time. Perl's
engine, a backtracking ("traditional NFA") matcher, instead tries one
alternative at a time and backs up when it fails. That is a lot better
in pathological cases, where backtracking has to explore an absurd
(potentially exponential) number of possibilities. There's a good one
documented in Mastering Regular Expressions from ORA, but I don't have
that book handy. Even for normal regular expressions, I observed a
performance increase on the order of 10 times by switching my web ad
filter from the standard glibc NFA to the GNU egrep(1) DFA. I'm not
familiar with the internals of SpamAssassin, but I'm confident there
are more optimizations along these lines that could be made in a
rewrite. Please, someone, write a good spam filter that is a
respectable application.
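
To make the backtracking problem concrete, here is a toy Python sketch
(not SpamAssassin's, Perl's, or GNU egrep's code) that matches one tiny
pattern two ways: a backtracking matcher in the style of Perl's engine,
and a Thompson NFA whose DFA states (sets of NFA states) are built
lazily and cached as input is read. The pattern (a+)+b, the tuple-based
AST, and every function name here are hypothetical choices for
illustration; the book's example is not reproduced. Both matchers
report whether the whole string matches plus a count of characters
examined.

```python
# Toy regex AST (hypothetical): ('char', c), ('seq', a, b), ('alt', a, b), ('plus', a).
def char(c): return ('char', c)
def seq(a, b): return ('seq', a, b)
def alt(a, b): return ('alt', a, b)
def plus(a): return ('plus', a)

# --- Backtracking matcher (continuation-passing, like a "traditional NFA") ---
def bt_match(node, s, i, k, steps):
    """Try to match node at s[i:], then call continuation k on the end index."""
    kind = node[0]
    if kind == 'char':
        steps[0] += 1  # count every character test
        return i < len(s) and s[i] == node[1] and k(i + 1)
    if kind == 'seq':
        return bt_match(node[1], s, i,
                        lambda j: bt_match(node[2], s, j, k, steps), steps)
    if kind == 'alt':
        # Try the left branch; on failure, back up and try the right one.
        return bt_match(node[1], s, i, k, steps) or bt_match(node[2], s, i, k, steps)
    if kind == 'plus':
        # Match one repetition, then greedily try another before continuing.
        def after_one(j):
            return (j > i and bt_match(node, s, j, k, steps)) or k(j)
        return bt_match(node[1], s, i, after_one, steps)
    raise ValueError(kind)

def backtrack_fullmatch(pattern, s):
    steps = [0]
    ok = bt_match(pattern, s, 0, lambda j: j == len(s), steps)
    return bool(ok), steps[0]

# --- Thompson NFA + lazily built DFA (subset construction on demand) ---
class NFA:
    def __init__(self):
        self.eps = []    # eps[q]: states reachable from q without input
        self.trans = []  # trans[q]: list of (char, next_state)

    def new_state(self):
        self.eps.append([])
        self.trans.append([])
        return len(self.eps) - 1

def build(node, nfa):
    """Add a fragment for node to nfa; return its (start, accept) states."""
    kind = node[0]
    if kind == 'char':
        s, t = nfa.new_state(), nfa.new_state()
        nfa.trans[s].append((node[1], t))
        return s, t
    if kind == 'seq':
        s1, t1 = build(node[1], nfa)
        s2, t2 = build(node[2], nfa)
        nfa.eps[t1].append(s2)
        return s1, t2
    if kind == 'alt':
        s, t = nfa.new_state(), nfa.new_state()
        for sub in node[1:]:
            ss, tt = build(sub, nfa)
            nfa.eps[s].append(ss)
            nfa.eps[tt].append(t)
        return s, t
    if kind == 'plus':
        ss, tt = build(node[1], nfa)
        t = nfa.new_state()
        nfa.eps[tt] += [ss, t]  # loop back for another repetition, or exit
        return ss, t
    raise ValueError(kind)

def closure(nfa, states):
    """Epsilon closure: every NFA state reachable without consuming input."""
    seen, stack = set(states), list(states)
    while stack:
        for r in nfa.eps[stack.pop()]:
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def dfa_fullmatch(pattern, s):
    nfa = NFA()
    start, accept = build(pattern, nfa)
    cache = {}  # (DFA state, char) -> DFA state; a DFA state is a set of NFA states
    current, steps = closure(nfa, [start]), 0
    for ch in s:
        steps += 1  # exactly one transition per input character, no backing up
        if (current, ch) not in cache:
            moved = [t for q in current for (c, t) in nfa.trans[q] if c == ch]
            cache[(current, ch)] = closure(nfa, moved)
        current = cache[(current, ch)]
        if not current:
            return False, steps  # dead state: no way to match
    return accept in current, steps

pattern = seq(plus(plus(char('a'))), char('b'))  # (a+)+b
for n in (4, 8, 16):
    print(n, backtrack_fullmatch(pattern, 'a' * n), dfa_fullmatch(pattern, 'a' * n))
```

On a string of 16 a's with no b, the backtracker tries every way of
splitting the a's into groups (more than 65,000 character tests)
before giving up, while the DFA side reads each of the 16 characters
once. Building DFA states lazily is one common way to get DFA speed
without constructing the whole automaton up front. A real engine would
also need a parser, character classes, and anchors, and features such
as backreferences cannot be handled by a pure DFA at all.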


