rejecting spam at SMTP time (was Re: Postfix anti-antivirus (was Re: [linux-elitists] etc))

Karsten M. Self kmself@ix.netcom.com
Fri Sep 24 03:02:00 PDT 2004


on Thu, Sep 23, 2004 at 11:15:55AM -0700, Don Marti (dmarti@zgp.org) wrote:
> begin  Gerald Oskoboiny quotation of Tue, Feb 03, 2004 at 02:08:50AM -0500:
> 
> > I recently started rejecting most incoming mail to my site, and
> > it feels great! (anything with SA score > 10 is rejected.)

As Don writes this, I'm working through fallout of having run through
some 28k messages backlogged over some six weeks, of which ~3k were
spam.  What with my intensive incoming mail processing, it took six days
to work through the backlog.
 
> Good things about SMTP-time rejection:
> 
> 1. It saves you bandwidth.

Needn't, though it can.
 
> 2. It saves you disk space.

Ditto.

As has been pointed out, and most critically:  it provides instant
feedback to those who are the victims of false positives.  

There are a number of things spam does, but among the worst is to create
a great many opportunities to have email slip off into the great void.
Neither here nor there.  Tangible example:  I just replied a few minutes
ago to mail sent me June 25, which got more-or-less lost in the pile.
Found it, and thankfully it was low-criticality, but I was having worse
problems when dealing with a single, large, unsorted spoolfile.

Creating an imperfect, but fast-acting, filtering system would improve
the situation *markedly*.
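
A minimal sketch of the SMTP-time decision, assuming the score is
already known by the time the message body has been received.  The
threshold of 10 is the one from the quoted message; the function name
and reply texts are made up for illustration, not taken from any
particular MTA or from SpamAssassin.

```python
# Hypothetical helper: map a spam score to an SMTP reply.
REJECT_THRESHOLD = 10.0  # "anything with SA score > 10 is rejected"


def smtp_reply(score, threshold=REJECT_THRESHOLD):
    """Return an (SMTP code, text) pair for a message with this score."""
    if score > threshold:
        # A 5xx reply makes the *sending* MTA bounce the message, so a
        # legitimate sender hears about a false positive right away.
        text = "5.7.1 Message refused: scored as spam (%.1f > %.1f)" % (
            score, threshold)
        return (550, text)
    return (250, "2.0.0 OK")
```

The point of doing it this way is that the refusal travels back over
the same connection, so nothing lands in a bucket that somebody has to
remember to check.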

 
> Bad things about SMTP-time rejection:
> 
> 1. It gives spammers information about messages that
>    definitely aren't making it through.

In the grand scheme of things, this is probably a relatively minor sin.

Fact is, most people are probably using one of a few dozen spam
filtering tools, of which SpamAssassin, Brightmail, and a few of the
Bayesian classifiers are at the top of the heap.  Any idiot spammer can
(and some do) run their spam through SA to get at least a rough idea of
how it will score.

Fact is, I'm *still* getting a hell of a lot of high-scoring spam.  Some
of it over 100[1].  I'm seeing a mean score of 18.7 on some 8700
spams from the past 30 days.

From R, some summary stats:

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      -5.90   11.20   16.80   18.67   25.20  110.60 

...or for deciles:

       0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
     -5.9   7.5  10.2  12.4  14.5  16.8  19.7  23.1  27.3  32.3 110.6 

...and looking at that first 10% range more closely (which is where we
get the problem of low-scoring spam):

       0%    1%    2%    3%    4%    5%    6%    7%    8%    9%   10% 
    -5.90  4.20  4.70  5.00  5.30  5.65  6.10  6.50  6.90  7.20  7.50 

...so I've got issues with maybe 1% of spam, and most of that appears to
be tripping on other rules I've got set.
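
The summaries above came from R.  A rough Python equivalent, as a
sketch only: it assumes the scores are already in a list, uses plain
linear interpolation between ranks, and the sample values are made up
for illustration (they are not the data summarized above).

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile of values, for 0 <= p <= 1."""
    if not values:
        raise ValueError("need at least one score")
    ordered = sorted(values)
    h = (len(ordered) - 1) * p          # fractional rank
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)  # clamp at the last element
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


def share_below(values, threshold):
    """Fraction of scores strictly below threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Made-up scores, for illustration only.
sample_scores = [-5.9, 4.2, 7.5, 11.2, 16.8, 18.0, 25.2, 32.3, 110.6]
deciles = [percentile(sample_scores, i / 10) for i in range(11)]
```

share_below(scores, threshold) gives the fraction of spam that would
slip under a given cutoff, which is the number that matters when
picking a reject threshold.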


And while spammers *can* theoretically get mail to an individual by
finely tuning characteristics, doing so greatly raises the costs of
spam.  Remember, we're talking a medium in which breakeven's at tenths
or hundredths of a penny per email.  Making spammers work *does* address
spam.


 
> 2. It makes it harder for users to deal with
>    false positives.  

Which user, and harder than what, Don?

>    The user can't just point a browser at the webmail interface to the
>    spam bucket when the SMTP server never accepted the message.

If you're addressing the problem from a recipient's perspective, sure.
But for me, wading through spam means going through hundreds or
thousands of messages.  Enough of a pain with a powerful tool such as
mutt.  I'd hate to do that through a browser.

 
> 3. (and this is the big one) It saves spammers
>    bandwidth.  Yes, some spammers get bandwidth at
>    no charge, but only by criminal means, so its cost
>    is in risk, not money.

First off, this needn't be the case.  It depends on how you implement
SMTP-time rejection.  There are lots of things that can be done to cost
the spammer time, bandwidth, cycles, or other forms of (loosely defined)
hash cash.

Second, spammers' bandwidth is probably pretty cheap, especially in the
form of zombies, third-world hosting sites, and the like.
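
Taking "hash cash" literally for a moment, one way to cost the sender
cycles is a proof-of-work stamp.  The sketch below is a simplified
illustration of the idea, not the real Hashcash stamp format: the
"challenge:counter" layout and the function names are assumptions made
up for this example.

```python
import hashlib


def leading_zero_bits(digest):
    """Count the leading zero bits of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(challenge, bits):
    """Brute-force a stamp whose SHA-1 has at least `bits` zero bits."""
    counter = 0
    while True:
        stamp = "%s:%d" % (challenge, counter)
        digest = hashlib.sha1(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp
        counter += 1


def verify(stamp, challenge, bits):
    """Cheap check on the receiving side: one hash, no search."""
    if not stamp.startswith(challenge + ":"):
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits
```

Minting costs on the order of 2^bits hashes on average; verifying
costs one.  That asymmetry is the whole point.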


But there's stuff more valuable to spammers (and their hosting
providers) than bandwidth.

Namely:  connectivity.

I watched the Savvis story unfold from NANAE.  That's the one where a
bunch of internal docs leaked saying "well, we _could_ take blanket
aggressive action against the DNSBL and SPEWS types, but it might look
bad, and the only reason we're considering it is because what they do
really _does_ affect us."   The impact was sufficient that Savvis was
willing to walk from some several millions of dollars of revenues from
spammers.  That's impact.

    Savvis outed as big-time spam host | The Register
    http://www.theregister.com/2004/09/09/savvis_spam_canned/



I've posted here several times on my own work with generating
aggregate-level stats of spam volume by network of origin.  Currently I
can do this by both ASN and CIDR, via the asn.routeviews.org DNS server.
I've got a few interesting things to report.
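
For anyone who wants to try the lookup: a sketch of the two string
steps around the DNS query, with the query itself left out.  It assumes
the zone is queried with the IPv4 octets reversed and answers with a
TXT record of three quoted strings (origin ASN, prefix, prefix length);
check that against a real answer before relying on it.  The function
names are made up, and the addresses and ASN are documentation-reserved
values, not real data.

```python
import ipaddress


def routeviews_query_name(ip, zone="asn.routeviews.org"):
    """Build the reversed-octet name to look up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def parse_routeviews_txt(txt):
    """Parse an assumed '"ASN" "prefix" "length"' TXT answer."""
    fields = [field.strip('"') for field in txt.split()]
    if len(fields) != 3:
        raise ValueError("unexpected TXT answer: %r" % txt)
    asn, prefix, length = fields
    return int(asn), ipaddress.IPv4Network("%s/%s" % (prefix, length))
```

The name goes to whatever resolver you like as a TXT query; the parsed
(ASN, CIDR) pair is what the aggregate stats get keyed on.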


First:  SpamAssassin will be implementing some form of ASN/CIDR scoring
in a near future release.  In its simplest form, this means that the
network-of-origin will be determined and an overall spaminess/haminess
rating for it computed (and likely a volume metric as well).  This all
pretty much just falls out of creating a token and letting the Bayesian
classifier go to town on it.
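
To make the token idea concrete, here is a toy version.  This is not
SpamAssassin's implementation, just an assumption about the simplest
thing that could work: count spam and ham per network token and report
a smoothed ratio plus a volume figure.  The class name and the "ASN:"
token format are invented for the example.

```python
from collections import Counter


class NetworkReputation:
    """Per-network spam/ham counts with add-one smoothing."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, token, is_spam):
        """Record one classified message for a network token."""
        counts = self.spam if is_spam else self.ham
        counts[token] += 1

    def spaminess(self, token):
        """Smoothed spam fraction; 0.5 for a network never seen."""
        s = self.spam[token]
        h = self.ham[token]
        return (s + 1) / (s + h + 2)

    def volume(self, token):
        """Total messages seen from this network."""
        return self.spam[token] + self.ham[token]
```

Usage would be along the lines of rep.train("ASN:64500", True) for
each classified message, then rep.spaminess("ASN:64500") when scoring
new mail from the same network.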


Second:  reporting of such stats may be of some utility in getting
networks to shape up.  While my top contributor, KORNET, has held its
first-place ranking for the nine months I've watched, several other
players (notably Telstra and SBC) have entered and exited the top five
slots.  I don't know if it's me, but it's pretty clear that if you can
be readily classified and identified as a spamhaus, _and_ you have
legitimate business interests at odds with that moniker, you might want
to fix the problem.


Third:  solutions of this sort work on a "works locally, great
globally" basis.  You can solve the spam problem for yourself by
blocking / throttling / limiting / hashcashing / cowtipping all traffic
from a set of addresses known to be spammy.  Drop the packets, flood 'em
with transmit errors, whatever.  Either way, your nets become
beautifully clear of the problem, without waiting for the rest of the
world to take a cluetrain delivery.

...but if the rest of the world *does* catch a clue, you've got some
wildly cool synergy going.  The spammy net suddenly finds itself on
the short side of the Metcalfe effect, as its points of presence fall.
Accelerating slide.  First they lose their legit customers.  Eventually,
even the spammers (remember, the guys whose margins are in the
fractional mills per email) consider them a bad investment.
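
The local half of this (refusing traffic from networks known to be
spammy) comes down to a CIDR membership test.  A minimal sketch,
assuming the list of spammy networks already exists; the function names
are made up and the networks used in the usage note are
documentation-reserved ranges, not anyone's real blocklist.

```python
import ipaddress


def load_networks(cidrs):
    """Turn CIDR strings into network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def policy_for(ip, spammy_networks):
    """Return "reject" if ip falls inside a listed network, else "accept"."""
    addr = ipaddress.ip_address(ip)
    for net in spammy_networks:
        # Only compare like with like (IPv4 vs IPv4, IPv6 vs IPv6).
        if addr.version == net.version and addr in net:
            return "reject"
    return "accept"
```

With nets = load_networks(["192.0.2.0/24", "198.51.100.0/24"]), a
connection from 192.0.2.10 gets "reject" and one from 203.0.113.5 gets
"accept".  Swap "reject" for a throttle or a tarpit to taste.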



 
> Sites that can afford the bandwidth and disk space should avoid
> SMTP-time rejection, especially if it would reveal site-specific
> spam-filtering information such as spamtrap addresses.

IMVAO there are far better ways to spend your time, money, bandwidth,
and disk than this particular battle.



Peace.

--------------------
Notes:

1.  Well, OK.  Apparently someone in the SA dev team decided that AV
    notification spam was among the most vile things on Earth, and
    ranked 'em ~100 or so.  But I still see a lot of high-scoring spam.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    The nearer the bone, the sweeter the meat.