[linux-elitists] web server software for tarpitting?

Karsten M. Self karsten@linuxmafia.com
Tue Feb 12 13:21:28 PST 2008


on Tue, Feb 12, 2008 at 10:33:17AM -0800, Gerald Oskoboiny (gerald@impressive.net) wrote:
> * Evan Prodromou <evan@prodromou.name> [2008-02-12 12:37-0500]
> >On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> >> The other day we posted an article [1] about excessive traffic
> >> for DTD files on www.w3.org: up to 130 million requests/day, with
> >> some IP addresses re-requesting the same files thousands of times
> >> per day. (up to 300k times/day, rarely)
> >>
> >> The article goes into more details for those interested, but the
> >> solution I'm thinking will work best (suggested by Don Marti
> >> among others) is to tarpit the offenders.
> >
> >...and not punish everybody else, right?
> 
> Right, just punish those who are abusive.

<...>

> I'd expect most medium/large sites have some kind of defensive
> measures in place to deal with abuse. Google and Wikipedia block all
> access from generic user-agents like Java/x and Python-urllib/x.

Just out of curiosity, what's the user-agent distribution on this
traffic?

  - Are we dealing with poorly-written end-user software (browsers and
    the like)?  Is it typically proprietary or Free Software?  Are you
    getting hammered on account of legacy proprietary software or
    non-standards-compliant FS tools?

  - Or are you getting slammed by spiders, custom agents, and possibly
    more nefarious tools (viruses included)?  There, a mix of
    robots.txt and agent-based IP blocking might come into play.

... and of course, there's always a salacious interest in seeing names
named.
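
If you want a quick first cut at that distribution, something like the
following works against a combined-format access log.  (A rough
sketch; the log path, the combined-format assumption, and the top-25
cutoff are mine, not anything specific to w3.org's setup.)

    #!/usr/bin/env python
    # Tally User-Agent strings from an Apache combined-format log.
    # Assumes the agent is the final quoted field on each line.
    import re
    import sys
    from collections import Counter

    UA_RE = re.compile(r'"[^"]*" "([^"]*)"\s*$')

    counts = Counter()
    with open(sys.argv[1], errors='replace') as log:
        for line in log:
            m = UA_RE.search(line)
            if m:
                counts[m.group(1)] += 1

    for agent, n in counts.most_common(25):
        print('%10d  %s' % (n, agent))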

Also:  What regions / ASNs / CIDRs are most prevalent among abuse
sources, does the distribution differ markedly from putatively
legitimate traffic, and are any specific sources markedly more abusive
than the rest?  If you're seeing a lot of data coming from bogon or
non-active space, broad-based (entire subnet) blocking or tarpitting
might be particularly effective.  I've used the Routeviews data
(DNS-queryable, with zone files downloadable for local use;
http://www.routeviews.org) to identify the CIDR and ASN of spam
sources.  Over the period 2003-2005, the top four ASNs (which varied
over time) typically accounted for a quarter or more of all spam.
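
For the DNS interface, a minimal sketch (Python with dnspython; I'm
assuming the asn.routeviews.org zone's reversed-octet TXT convention
-- e.g. 4.3.2.1.asn.routeviews.org for 1.2.3.4 -- so check the record
layout against the Routeviews docs before trusting it):

    #!/usr/bin/env python
    # Map source IPs (one per line on stdin) to origin ASNs via the
    # Routeviews DNS zone, then tally.  The reversed-octet query name
    # and the TXT record layout are assumptions; verify them first.
    # Requires dnspython.
    import sys
    from collections import Counter

    import dns.exception
    import dns.resolver

    def origin_asn(ip):
        name = '.'.join(reversed(ip.split('.'))) + '.asn.routeviews.org'
        try:
            answer = dns.resolver.resolve(name, 'TXT')
        except dns.exception.DNSException:
            return None
        return answer[0].strings[0].decode()  # assumed: first string is the ASN

    counts = Counter()
    for line in sys.stdin:
        asn = origin_asn(line.strip())
        if asn:
            counts[asn] += 1

    for asn, n in counts.most_common(10):
        print('AS%-8s %d' % (asn, n))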

Even if you don't find significant concentrations, implementing broad
(CIDR/ASN-based) tarpitting might help you get ISP cooperation in
egress-filtering their own networks.
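
As a starting point for the broad-based blocking, the Python stdlib
will collapse a pile of individual abusive addresses into a small
covering set of subnets (a sketch; rounding everything up to a /24 is
an arbitrary choice, and I'm assuming IPv4 only):

    #!/usr/bin/env python
    # Collapse individual abusive IPv4 addresses (one per line on
    # stdin) into a minimal set of covering subnets for blocking or
    # tarpitting.  The /24 rounding is arbitrary; tune to taste.
    import ipaddress
    import sys

    nets = {ipaddress.ip_network(line.strip() + '/24', strict=False)
            for line in sys.stdin if line.strip()}

    for net in ipaddress.collapse_addresses(sorted(nets)):
        print(net)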

I don't see an easy solution -- you've sort of made your bed here with
the current DTD reference scheme, and we can assume that technology
*will* be poorly implemented.  It might make sense to put DTDs into
some space other than HTTP.  DNS, for example, *is* naturally a
cache-and-forward system, and it's been abused for all sorts of stuff
(see Dan Kaminsky's shenanigans).  While it's not a perfect solution,
it has several properties that could be made useful, and it might be
worth considering for future DTD definition and distribution.
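
To make the idea concrete, here's a purely illustrative sketch of
pulling a document published as chunked, base64-encoded TXT records.
The zone name, chunk numbering, and encoding are all hypothetical --
nothing like this exists at w3.org -- but intermediate resolvers would
cache each record for its TTL, which is the whole attraction:

    #!/usr/bin/env python
    # Fetch a document published as numbered base64 TXT chunks, e.g.
    # 0.xhtml1-strict.dtd.example.org, 1. ..., until NXDOMAIN.  All
    # names and the chunking scheme are hypothetical.  Needs dnspython.
    import base64

    import dns.resolver

    def fetch_chunked(zone):
        chunks = []
        n = 0
        while True:
            try:
                answer = dns.resolver.resolve('%d.%s' % (n, zone), 'TXT')
            except dns.resolver.NXDOMAIN:
                break
            chunks.append(b''.join(answer[0].strings))
            n += 1
        return base64.b64decode(b''.join(chunks))

    print(fetch_chunked('xhtml1-strict.dtd.example.org').decode())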


Peace.

-- 
Karsten M. Self <karsten@linuxmafia.com>        http://linuxmafia.com/~karsten
    Ceterum censeo, Caldera delenda est.