rejecting spam at SMTP time (was Re: Postfix anti-antivirus (was Re: [linux-elitists] etc))

Rick Bradley
Fri Sep 24 08:35:39 PDT 2004

* Karsten M. Self [040924 05:03]:
> As Don writes this, I'm working through fallout of having run through
> some 28k messages backlogged over some six weeks, of which ~3k were
> spam.  What with my intensive incoming mail processing, it took six days
> to work through the backlog.

I've been seeing exponential growth in spam counts over the past year.
I had reached > 4,000 / day as of this week.  Filtering has been
absurdly effective.  Between the co-lo'd mail server for my various
domains running SA and my local procmail running simple bogofilter
(seeded with ~1M spam/ham mails over a year ago) I get > 99.99% accuracy
in both directions (actually I've not had a spam in my inbox in a couple
of months now, and in the other direction a graylisting process catches
the rare correspondence which would have been tagged as spam).  The
volume, however, is really depressing, and should only continue to grow.

I have a lot of addresses over a lot of domains, and a number of domains
with blanket catch-all rules.  The co-lo'd exim setup adds a header for
me called 'X-Original-Virtual-Address' which tells me which address the
mail was actually sent to (regardless of whether it was a Bcc: or not),
even though it all eventually ends up in the same bucket.  Since I save
every piece of mail I can analyze it.
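
For anyone wanting to run similar numbers on their own archive, here is a
minimal sketch in Python of that kind of tally. It assumes the stored mail
is in mbox format and that the 'X-Original-Virtual-Address' header holds a
single bare address; both are assumptions, and the function name is made up
for illustration.

```python
import mailbox
import sys
from collections import Counter, defaultdict

# Header added by the mail server; assumed to hold one bare address.
HEADER = "X-Original-Virtual-Address"


def tally_recipients(messages):
    """Count mail per address and per domain, and collect the distinct
    addresses seen at each domain.

    `messages` is any iterable of objects with a dict-style .get(),
    such as the messages yielded by mailbox.mbox.
    """
    per_address = Counter()
    per_domain = Counter()
    addresses_by_domain = defaultdict(set)
    for msg in messages:
        raw = msg.get(HEADER)
        if raw is None:
            continue  # header missing, nothing to count
        address = str(raw).strip().lower()
        if "@" not in address:
            continue  # not in the assumed bare-address form
        domain = address.rsplit("@", 1)[1]
        per_address[address] += 1
        per_domain[domain] += 1
        addresses_by_domain[domain].add(address)
    return per_address, per_domain, addresses_by_domain


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py /path/to/spam.mbox
    _, per_domain, by_domain = tally_recipients(mailbox.mbox(sys.argv[1]))
    for domain, count in per_domain.most_common(10):
        print(domain, count, "messages,", len(by_domain[domain]), "addresses")
```

A domain showing a large number of distinct addresses relative to the
handful you actually use is a likely shotgun target.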

I ran some stats on the 1.8GB of spam I have stored to determine how my
spam is targeted (by domain, by address, by addresses per domain, etc.)
and what I found is that probably 75% of the spam I get is either
shotgun spam (pick a random plausible string for the username and stick
the domain on the end), or bounces from shotgun joe-jobbing (do the
same, but that's what you use to send mails from, not to).  I found that
one particular domain for which I have a blanket address was seeing
mails to over 1,000 different addresses at that domain in the past few

I looked at my ham mails and extracted the valid email addresses for
that domain, turned off the blanket rule for that domain, and set up
explicit aliases for the good addresses (this is easy because my
friend/hosting provider has a fast web interface for managing all the
virtual mail domains and aliases).
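
The extraction step might look something like the following Python sketch.
It assumes the ham is available as an iterable of messages (for example
from mailbox.mbox) and that each one carries the
'X-Original-Virtual-Address' header as a bare address. The function name
and the min_count threshold are hypothetical, and the result is just a list
of local parts to enter as aliases, not any particular MTA's config format.

```python
from collections import Counter

# Header added by the mail server; assumed to hold one bare address.
HEADER = "X-Original-Virtual-Address"


def addresses_to_keep(ham_messages, domain, min_count=1):
    """Return the sorted local parts at `domain` that received at least
    `min_count` legitimate messages.

    `ham_messages` is any iterable of objects with a dict-style .get().
    """
    domain = domain.lower()
    seen = Counter()
    for msg in ham_messages:
        raw = msg.get(HEADER)
        if raw is None:
            continue  # header missing
        address = str(raw).strip().lower()
        local, sep, addr_domain = address.rpartition("@")
        if sep and local and addr_domain == domain:
            seen[local] += 1
    return sorted(local for local, n in seen.items() if n >= min_count)
```

Raising min_count is one way to avoid keeping an address alive on the
strength of a single stray message, at the risk of dropping a rarely used
but legitimate one.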

The jury's still out until the statistics have been averaged out for a
while, but in 24 hours I went from ~4,000 spams per day down to just
over 1,300 spams in one day.

I presume the growth curve will still be mostly exponential, but being
able to step back six months on the curve with a few minutes of work is
a godsend.  Having locked down the biggest culprit domain I have a
couple more, much further down the scale, which I can tackle next (I
want to watch the rate for a few days).  There's still plenty of spam to
legit addresses, so this will only go so far, but it doesn't hurt.
The downside is that it becomes harder to make up email addresses on the
fly to give out to people at parties or to put on luggage, etc.  I
probably needed to start getting more rigorous on that front anyway.
The other possibility is using a domain for that purpose which has no
other exposure.

My hosting friend and I are going to configure SPF for all the domains
handled by his service soon, and begin looking at factoring SPF into the
tagging process.  More than anything we expect this to gradually start
helping with the shotgun joe-jobbing problem (really the joe-jobbing
problem in general) as some of the bigger hosts begin deploying SPF as

--    MUPRN: 493
                       |  and rework Debord's
   random email haiku  |  work. This video just does 
                       |  it, and in fine style.
