[linux-elitists] Current client-side anti-spam best practices

glen martin glenm@locutory.org
Tue Sep 26 08:33:57 PDT 2006

Jeff Waugh wrote:
> I receive most of it via mail
> aliases on foreign machines, so my MTA can't smack most of it down on the
> way in (I rarely get spam through it).
I'm not sure I understand this: after passing through the remote alias,
doesn't the forwarded message still come in through your own MTA?
>  2) Are there fiendishly clever ways of dealing with spam that comes through
>     via aliases on machines you don't control?
I don't know about fiendishly clever or best practice, but I've been
having pretty good success against spam lately, including mail that
arrives through remote aliases, by filtering at my local MTA.

#include "std_disclaimer.h"  /* my email, my domain, an MTA I control */

I combine a bunch of tools to provide an initial evaluation of received
mail:
  - postfix
  - clamav
  - amavisd-new
  - spamassassin
  - smtp-amavis

I've hacked this up a bit so that it doesn't filter, but instead
decorates inbound email (by adding headers) based on whatever the
various programs detect about it.  This is slightly unusual because it
means they're all hooked up as one long filter (in essence), applying
successive decorations; they don't just pass scores around through
their return codes, which seemed to be the standard way of using amavis
to coordinate virus and spam detection.
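For anyone who wants the decorate-don't-reject idea in miniature, here's a rough Python sketch. It is not my actual setup (that's postfix/amavisd-new plumbing); the two stages are hypothetical toy stand-ins for ClamAV and SpamAssassin, and the header values are made up. The point is only that every stage appends headers and nothing rejects, so later stages (and the trainable filter at the end) can see what earlier ones said.

```python
from email import message_from_string
from email.message import Message
from typing import Callable, List, Tuple

# Each stage inspects the message and returns headers to add; it never rejects.
Stage = Callable[[Message], List[Tuple[str, str]]]


def toy_virus_stage(msg: Message) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for ClamAV via amavisd-new: a trivial string check.
    body = msg.get_payload()
    if isinstance(body, str) and "EICAR" in body:
        return [("X-Amavis-Alert", "INFECTED, message contains virus: EICAR-Test")]
    return [("X-Virus-Scanned", "toy scanner")]


def toy_spam_stage(msg: Message) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for SpamAssassin. Later stages can read earlier
    # decorations, so a virus alert bumps the toy score here.
    score = 0.0
    if "viagra" in (msg.get("Subject") or "").lower():
        score += 5.0
    if msg.get("X-Amavis-Alert"):
        score += 10.0
    return [("X-Spam-Score", f"{score:.3f}")]


def decorate(msg: Message, stages: List[Stage]) -> Message:
    # Run every stage in order; each one only adds headers.
    for stage in stages:
        for name, value in stage(msg):
            msg[name] = value  # Message.__setitem__ appends a header
    return msg


if __name__ == "__main__":
    raw = "Subject: cheap viagra\n\nhello\n"
    m = decorate(message_from_string(raw), [toy_virus_stage, toy_spam_stage])
    print(m["X-Virus-Scanned"], m["X-Spam-Score"])
```

The order of the stage list matters only in that a stage can react to decorations added before it; nothing short-circuits.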

Here are the decorations from a couple of sample emails:

X-Virus-Scanned: amavisd-new at locutory.org
X-Spam-Status: No, score=8.789 required=100 tests=[DATE_IN_PAST_12_24=0.881,
X-Spam-Score: 8.789
X-Spam-Level: ********

X-Amavis-Modified: Original mail wrapped as attachment (defanged) by mail.locutory.org
X-Virus-Scanned: amavisd-new at locutory.org
X-Amavis-Alert: INFECTED, message contains virus: HTML.Phishing.Bank-78
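If you want to get at those decorations programmatically (say, for a report or a downstream step), a small sketch using only Python's stdlib email parser might look like this. It assumes the header names shown in the samples above; the returned dict layout is just my choice for illustration.

```python
from email import message_from_string
from typing import Optional


def summarize_decorations(raw: str) -> dict:
    """Pull the verdicts out of amavisd-new/SpamAssassin-style headers."""
    msg = message_from_string(raw)
    score: Optional[str] = msg.get("X-Spam-Score")
    alert = msg.get("X-Amavis-Alert") or ""
    # "INFECTED, message contains virus: NAME" -> "NAME"
    virus = alert.split("virus:", 1)[1].strip() if "virus:" in alert else None
    return {
        "spam_score": float(score) if score is not None else None,
        "infected": alert.startswith("INFECTED"),
        "virus": virus,
        "defanged": msg.get("X-Amavis-Modified") is not None,
    }
```

Fed the second sample above (headers plus a blank line), this reports infected with HTML.Phishing.Bank-78 and no spam score.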

At the end of the detection process, I use crm114 as a trainable
filter.  crm114 is able to train on (and subsequently use) the spam
and virus decorations applied earlier in the process. I've set this up
on an IMAP account with 'Suspect' and 'Reclassify' mailboxes, and a cron
job that toggles the classification of, and retrains on, anything I've
moved from either 'Suspect' or 'in' into 'Reclassify'.
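The "toggle" step of that cron job boils down to: whatever crm114 decided last time, the user has now said the opposite. Here's a minimal Python sketch of just that decision, under a loudly stated assumption: that crm114's verdict is recorded in a header named X-CRM114-Status whose value starts with SPAM or Good. Check what your own crm114 config actually writes. Fetching from 'Reclassify' (e.g. with imaplib) and invoking crm114's learn step are deliberately left out, since those depend on your setup; plan_retraining is a hypothetical helper name.

```python
from email import message_from_string
from typing import Iterable, List, Optional, Tuple

# Assumption: the crm114 verdict is recorded in a header like
# "X-CRM114-Status: SPAM ( -12.3 )" or "X-CRM114-Status: Good ( 8.1 )".
STATUS_HEADER = "X-CRM114-Status"


def retrain_target(raw: str) -> Optional[str]:
    """Return the class to retrain a reclassified message as, or None."""
    status = (message_from_string(raw).get(STATUS_HEADER) or "").strip().upper()
    if status.startswith("SPAM"):
        return "good"  # filed in 'Suspect', but the user moved it: not spam
    if status.startswith("GOOD"):
        return "spam"  # delivered to 'in', but the user moved it: spam
    return None  # no or unknown verdict: leave it for a human


def plan_retraining(messages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (message_id, raw) pairs from 'Reclassify' to (message_id, target)."""
    plan = []
    for msg_id, raw in messages:
        target = retrain_target(raw)
        if target is not None:
            plan.append((msg_id, target))
    return plan
```

Keying off the stored verdict rather than the source folder means the job doesn't need to remember where a message was moved from.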

Net net: I'd been training my filter for several months, then went
off-grid in August and came back online after a couple of weeks. Over
that period a few thousand emails were kept and a whole mess were
deleted; scanning both sets, I found no false positives and 2 false
negatives.

For the record, whatever cleverness may exist in the idea belongs to
others. Much of the notion behind this mail-processing chain came out
of a discussion at a local BAD meeting a year or so ago.  Amazing what
comes out of a good pint or two. :)

There's also more work to do. One larger issue is that the numeric
values in the SA decorations don't lend themselves to easy
classification by CRM114. That is, "FORGED_MSGID_MSN=1.956" is probably
tokenized by crm114 as just FORGED_MSGID_MSN. That may be OK for this
particular case, in which the test is binary, but I seem to recall that
some of the SpamAssassin rules detect degrees of badness and label them
with similar numeric syntax. I think a better decoration would be a
little different, perhaps using named bands or something (e.g.
"FORGED_MSGID_MSN_HIGH").  But the training seems to work pretty darn
well notwithstanding.
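A sketch of what the named-band decoration could look like, in Python. The band edges (0.5 and 1.5 on the absolute score) and the _LOW/_MED/_HIGH/_NEG_ suffixes are assumptions I picked so that FORGED_MSGID_MSN=1.956 comes out as FORGED_MSGID_MSN_HIGH; nothing in SpamAssassin or crm114 defines them. The idea is to emit these words as an extra header before crm114 sees the message, so each band becomes its own learnable token.

```python
import re
from typing import List

# Assumed band edges on the absolute per-rule score; pick your own.
BANDS = [(0.5, "LOW"), (1.5, "MED")]  # |score| >= 1.5 is HIGH

# Matches "RULE_NAME=1.956" (or a negative score) inside an X-Spam-Status header.
TEST_RE = re.compile(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)")


def band(score: float) -> str:
    magnitude = abs(score)
    for edge, name in BANDS:
        if magnitude < edge:
            return name
    return "HIGH"


def banded_tokens(status: str) -> List[str]:
    """Rewrite 'RULE=score' pairs as 'RULE_BAND' words crm114 can learn."""
    tokens = []
    for rule, value in TEST_RE.findall(status):
        score = float(value)
        sign = "NEG_" if score < 0 else ""  # keep ham-ish rules distinct
        tokens.append(f"{rule}_{sign}{band(score)}")
    return tokens
```

For example, banded_tokens("tests=[DATE_IN_PAST_12_24=0.881, FORGED_MSGID_MSN=1.956]") returns ["DATE_IN_PAST_12_24_MED", "FORGED_MSGID_MSN_HIGH"]. Lower-case pairs such as score=8.789 and required=100 are skipped because the pattern only accepts upper-case rule names.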

I've also had some complaints from another user of my system, who would
like to see an auto-whitelist of known correspondents (that is,
addresses culled from outbound emails).
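For the auto-whitelist request, the culling part is easy enough with Python's stdlib. A minimal sketch, assuming you can get at outbound mail as raw message text (e.g. from a Sent folder); known_correspondents and is_known_sender are hypothetical helper names, not anything in my current setup.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr
from typing import Iterable, Set


def known_correspondents(sent_messages: Iterable[str]) -> Set[str]:
    """Collect lower-cased recipient addresses from raw outbound messages."""
    found: Set[str] = set()
    for raw in sent_messages:
        msg = message_from_string(raw)
        fields = []
        for header in ("To", "Cc", "Bcc"):
            fields.extend(msg.get_all(header, []))
        for _name, addr in getaddresses(fields):
            if "@" in addr:
                found.add(addr.lower())
    return found


def is_known_sender(from_header: str, whitelist: Set[str]) -> bool:
    """True if the From address is someone we've previously written to."""
    _name, addr = parseaddr(from_header)
    return addr.lower() in whitelist
```

Since From: is trivially forged, a hit here probably belongs as one more decoration header for crm114 to weigh, in keeping with the decorate-don't-reject approach, rather than as a bypass of the other checks.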

Also for the record, I used Thunderbird's native filtering for a long
while before this.  I found it a little clunky and not nearly as
accurate, and it didn't deal very well with my multiple email clients
accessing the same account.  My current method integrates very well
into a multi-desktop IMAP usage model.
