[linux-elitists] paul graham on spam

Karsten M. Self kmself@ix.netcom.com
Thu Apr 1 12:14:09 PST 2004

on Wed, Mar 31, 2004 at 04:15:19PM +1000, Ben Finney (ben@benfinney.id.au) wrote:
> On 30-Mar-2004, Modus Operandi wrote:
> > Of particular interest is the idea of spidering spammers' sites in
> > order to hammer their servers [...]
> > > If widely used, auto-retrieving spam filters would make the email
> > > system rebound.
> Any system that requires "If widely used" for its effects to be felt,
> needs to address the problem of how to *encourage* wide use.

"If widely used" here is probably ~0.1%-1% of systems Internet-wide, or
fewer.  Spam depends on low-percentage returns, so increasing the
response rate by a few orders of magnitude would raise costs
significantly.
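
A back-of-envelope sketch of that arithmetic follows.  Only the
million-emails-an-hour figure comes from the quoted article; the organic
response rate and the filter shares are assumed placeholders, and
hits_per_hour is a hypothetical helper, not anything from the FFB
proposal itself.

```python
# Rough load estimate for a spammer's web server.
# All rates are illustrative assumptions, not measurements.

def hits_per_hour(emails_per_hour, response_rate, filter_share, fetches_per_mail=1):
    """Return (organic_hits, automated_hits) per hour.

    response_rate: fraction of recipients who click through on their own
    filter_share:  fraction of recipients whose filters auto-retrieve links
    """
    organic = emails_per_hour * response_rate
    automated = emails_per_hour * filter_share * fetches_per_mail
    return organic, automated


if __name__ == "__main__":
    emails = 1_000_000  # "a million emails an hour", per the quoted text
    for share in (0.001, 0.01):  # the 0.1%-1% adoption range discussed above
        organic, automated = hits_per_hour(
            emails, response_rate=0.0001, filter_share=share
        )
        print(
            f"filter share {share:.1%}: {organic:.0f} organic hits, "
            f"{automated:.0f} automated hits per hour"
        )
```

With an assumed organic response of one in ten thousand, even the low
end of the adoption range puts automated traffic well above what the
server was sized for.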

> > > Pump out a million emails an hour, get a million hits an hour on
> > > your servers.
> I don't see the author offering any benefit to early adopters of this
> technique: receive the unwanted bandwidth of spam, and suck down an
> entire site worth of *more* unwanted bandwidth.

Depends.  If the early adopter is a major ISP, or even a fairly large
educational institution, there are some gains.  Though I do accept your
point.  It's similar to one of my own complaints against SPF.

> > similar in effect to teergrubing
> Nope.  Teergrubing punishes the very act of sending spam, with no
> extra bandwidth cost for the recipient (admittedly, there's a
> process-denial cost, but that's far easier to bear).

Somewhat agreed, though the costs are IMO disproportionately on the
spammer (or the spammer's trojaned website host).

There are a few problems with the FFB scenario, some generally
recognized, some discussed on news.admin.net-abuse.email, some somewhat

  - There's the obvious Joe-job potential.  This can be mitigated by
    limiting responses to known spammer havens.  But only by greatly
    increasing the complexity of the problem.  That isn't to say it's
    greatly complex.  Just that it's not bog-simple.

  - There's the non-obvious Joe-job potential.  Spammers have taken to
    breaking up spam keywords with bits of junk HTML.  Well, that's
    pretty easy to spot.  So they switched to anchors.  Words are split
    like:  via<a href="http://slashdot.org/"></a>gra.  You've hit three
    birds with one stone:  you've broken a keyword, you've dropped in
    what might be Bayes poison, *and* added a potential Joe-job target.
    This can also be caught if you look at anchors with null (or null
    visible) text, but again you're moving away from bog-simple.

  - There's the bootstrapping problem Ben alludes to.

  - There's the bandwidth suckage at the behest of spammers, depending
    on how your system is configured.  Essentially you're allowing
    someone to request arbitrary bandwidth from you via email request.

These _aren't_ fatal flaws; they can be overcome.  But they make the
problem nontrivial.
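
As an illustration of the null-text anchor check mentioned in the list
above, here is a minimal sketch using Python's standard html.parser.
The class and function names are hypothetical, and it only looks at
literal text inside the anchor; text hidden by styling or other tricks
is not handled.

```python
from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Collect href targets of <a> elements that contain no visible text."""

    def __init__(self):
        super().__init__()
        self.empty_hrefs = []
        self._in_anchor = False
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            # An anchor whose text is empty or whitespace-only is suspect.
            if self._href and not "".join(self._text).strip():
                self.empty_hrefs.append(self._href)
            self._in_anchor = False
            self._href = None


def empty_anchor_targets(html):
    """Return the hrefs of all empty-text anchors in an HTML fragment."""
    finder = EmptyAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_hrefs


if __name__ == "__main__":
    sample = 'via<a href="http://slashdot.org/"></a>gra'
    print(empty_anchor_targets(sample))
```

URLs flagged this way would be excluded from any automated retrieval,
since they are more likely Joe-job targets than the spammer's own site.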

One modification of this technique which could be useful would be a
response to phishing sites.  Rather than trying to wipe out the sites,
it would be IMO highly beneficial to seed them with:

  - Bogus fabricated data based on realistic distributions of names,
    age, sex, address, SSN, etc.

  - Tracer accounts.  Accounts known to banks and credit card agencies
    to be bogus, and whose activity is instantly traceable (yes,
    Virginia, the capability to do this exists through existing neural
    net monitoring systems).  Such tracker cards could be used to trace
    down and arrest users of stolen CCs immediately, or to track trends
    and patterns of behavior over time, possibly building a copious
    dossier on the users.  Or both.
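
The fabricated-data idea can be sketched as weighted sampling.  The
name pools and age-band weights below are made-up placeholders (a real
generator would draw on census-style frequency tables), and decoy_record
is a hypothetical helper.  The SSN field deliberately uses area numbers
900-999, which are not issued as SSNs, so no real person's number is
generated; that is an assumption of this sketch and makes the output
less realistic than the proposal calls for.

```python
import random

# Hypothetical sample pools and weights, for illustration only.
FIRST_NAMES = {
    "F": ["Mary", "Linda", "Susan"],
    "M": ["James", "Robert", "Michael"],
}
LAST_NAMES = ["Smith", "Johnson", "Williams", "Brown", "Jones"]
AGE_BANDS = [(18, 29), (30, 44), (45, 64), (65, 89)]
AGE_WEIGHTS = [0.22, 0.30, 0.32, 0.16]


def decoy_record(rng):
    """Return one fabricated identity as a dict."""
    sex = rng.choice(["F", "M"])
    # Pick an age band by weight, then a uniform age inside that band.
    low, high = rng.choices(AGE_BANDS, weights=AGE_WEIGHTS)[0]
    name = f"{rng.choice(FIRST_NAMES[sex])} {rng.choice(LAST_NAMES)}"
    # Area 900-999 is never issued as an SSN (assumption noted above).
    ssn = f"{rng.randint(900, 999)}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"
    return {"name": name, "sex": sex, "age": rng.randint(low, high), "ssn": ssn}


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(decoy_record(rng))
```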

I feel that spam is going to require a multimodal approach:

  - Strong authentication tokens (GPG/PGP).

  - Other sender-based whitelisting.

  - Virus filtering.

  - Known spam source filtering (SORBS, SpamHaus, SpamCop).

  - SMTP server reputation based filtering.  Known spam-associated
    sources (SPEWS, ASNs).  Whitelisting of known good SMTP servers.
    Teergrubing of known spam source by IP, SPEWS, ASN.

  - Content-based filtering (Bayes, rules-based filters).

  - Active response:  Spam poisoning.  Bogus email addresses.  Active
    response to known spam sites (call it "invited" DDoS), crapflooding
    of phishing sites.

  - Legal actions against known spammers & spam hosters, to the extent

My own recent research suggests that something akin to the Usenet Death
Penalty is very much in order for networks failing to keep spam in
control.  I've been tracking spam by ASN (and reporting on it here
periodically) for the past several months.  I'm running my monthly
reports right now, and am including comparative top-20 results for
January, February, and March, 2004.

My results show that a small number of networks are responsible for a
grossly disproportionate share of spam.  The major contributors are
Korean and Chinese sources, and US broadband providers.  I'm told that
my ISP filters Comcast heavily which may skew data significantly.  In
all cases, the top ten ASN sources account for ~40% of all spam
received.  The main trends are reduced query timeouts (I've increased
both wait and retry on my rDNS queries) and the disappearance of AT&T
WorldNet Services from the top ten.  Timeouts tend to follow overall
distribution patterns and don't seem to skew overall results.

    January, 2004
    Total spams: 2336
    Total ASNs:  440

    Rank  Cum %   Pct  Spams  ASN     Description
    ----  -----   ---- -----  -----   -------------
       1  17.0%  17.0%   396  n/a     Query timed out
       2  24.8%   7.8%   183  4766    KORnet Powered BY Korea Telecom
       3  28.0%   3.2%    75  7132    SBC Internet Services
       4  31.0%   3.0%    70  6478    AT&T WorldNet Services
       5  33.9%   3.0%    69  9318    HANARO Telecom
       6  35.7%   1.8%    42  3462    Chunghwa Telecom Co., Ltd.
       7  37.2%   1.5%    34  3215    RIPE NCC ASN block
       8  38.6%   1.4%    32  1221    TELSTRA-AS
       9  39.8%   1.2%    29  4134    China Telecom
      10  41.0%   1.2%    28  3786    DACOM Corporation in Seoul, Korea

    February, 2004
    Total spams: 4025
    Total ASNs:  590

    Rank  Cum %   Pct  Spams  ASN     Description
    ----  -----   ---- -----  -----   -------------
       1  14.8%  14.8%   597  4766    KORnet Powered BY Korea Telecom
       2  20.3%   5.4%   219  n/a     Query timed out
       3  24.4%   4.1%   166  9318    HANARO Telecom
       4  28.1%   3.7%   150  7132    SBC Internet Services
       5  30.4%   2.3%    92  4134    China Telecom
       6  32.5%   2.1%    86  6478    AT&T WorldNet Services 
       7  34.6%   2.1%    84  9277    THRUNET
       8  36.6%   2.0%    81  4813    China Telecom GUANGDONG PROVINCE BACKBONE NETWORK
       9  38.5%   1.8%    74  3462    Chunghwa Telecom Co., Ltd.
      10  40.1%   1.6%    64  1221    TELSTRA-AS

    March, 2004
    Total spams: 4866
    Total ASNs:  686

    Rank  Cum %   Pct  Spams  ASN     Description
    ----  -----   ---- -----  -----   -------------
       1  17.4%  17.4%   846  4766    KORnet Powered BY Korea Telecom
       2  21.4%   4.0%   196  7132    SBC Internet Services
       3  25.4%   4.0%   193  9318    HANARO Telecom
       4  28.7%   3.3%   162  n/a     Query timed out
       5  31.4%   2.7%   132  4813    China Telecom GUANGDONG PROVINCE BACKBONE NETWORK
       6  33.8%   2.3%   114  4134    China Telecom
       7  35.3%   1.5%    74  3352    Internet Access Network of TDE
       8  36.8%   1.5%    74  3462    Chunghwa Telecom Co., Ltd.
       9  38.3%   1.5%    72  9277    THRUNET
      10  39.7%   1.4%    69  1221    TELSTRA-AS
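
For anyone wanting to produce tables of this shape from their own mail,
here is a minimal sketch of the tabulation step.  It assumes each spam
has already been mapped to an ASN (or None where the lookup timed out);
the lookup itself is not shown, and top_asn_table is a hypothetical
helper rather than the script used for the reports above.

```python
from collections import Counter


def top_asn_table(asn_per_spam, top=10):
    """Rank ASNs by spam count.

    asn_per_spam: one entry per spam message; None marks a failed lookup.
    Returns (total_spams, total_asns, rows), where each row is
    (rank, cumulative_pct, pct, count, asn).
    """
    counts = Counter(asn_per_spam)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for rank, (asn, n) in enumerate(counts.most_common(top), start=1):
        # Accumulate raw counts, not rounded percentages, to avoid drift.
        cumulative += n
        rows.append((rank, 100 * cumulative / total, 100 * n / total, n, asn))
    return total, len(counts), rows


if __name__ == "__main__":
    # Tiny made-up sample, not real data.
    sample = ["4766"] * 5 + [None] * 3 + ["7132"] * 2
    total, asns, rows = top_asn_table(sample)
    print(f"Total spams: {total}")
    print(f"Total ASNs:  {asns}")
    print("Rank  Cum %   Pct  Spams  ASN")
    for rank, cum, pct, n, asn in rows:
        print(f"{rank:4d}  {cum:4.1f}%  {pct:4.1f}%  {n:5d}  {asn or 'n/a'}")
```

Note that failed lookups are counted as their own bucket here, as in
the tables above, so they contribute to both totals.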


Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    Save Bob Edwards!  http://www.savebobedwards.com/