[linux-elitists] web server software for tarpitting?

Evan Prodromou evan@prodromou.name
Tue Feb 12 09:37:27 PST 2008


On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> The other day we posted an article [1] about excessive traffic
> for DTD files on www.w3.org: up to 130 million requests/day, with
> some IP addresses re-requesting the same files thousands of times
> per day. (up to 300k times/day, rarely)
> 
> The article goes into more details for those interested, but the
> solution I'm thinking will work best (suggested by Don Marti
> among others) is to tarpit the offenders.

...and not punish everybody else, right?

>      W3C's current traffic is something like:
> 
>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
>        - 25% valid HTML/CSS/WAI icons
>        - 9% other

It sounds like W3C has been having a problem satisfying its promises,
then. When you publicize a URL, like a DTD or schema, you're giving
some tacit permission to use that URL. Why are you now trying to
penalize those people who actually bought the story and are using the
URL?

It seems to me the way to solve your problem is to:

     1. Codify best practises for using W3C resources in a server use
        policy, and publicize it. How often is it OK to hit a W3C-hosted
        DTD? Once a day? Once an hour? Once a minute?
     2. For absolutely terrible bad-behavers, block them by IP address --
        or return a brief-as-possible HTTP 403 response with a link to
        your server use policy. That sounds like a quick way to cut down
        on your traffic and save some headaches.
     3. Build a content-distribution network (CDN) to free up your
        servers for the important stuff. You could either pony up the
        cash for a commercial CDN, or you could use W3C's goodwill in
        the Web community to put together a free and informal system of
        mirrors.
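
To make items 1 and 2 concrete, here is a rough Python sketch of a
per-IP request counter that answers with a short 403 pointing at the
policy once a client goes over the limit. Everything specific in it is
my own assumption for illustration: the limit of 1000 requests a day,
the placeholder policy URL, and the names DailyLimiter and respond. It
is not how W3C's servers work, and a real deployment would probably do
this in the web server or a front-end proxy instead.

```python
import time

# Hypothetical values: the thread does not say what the limit should be.
MAX_REQUESTS_PER_DAY = 1000
WINDOW_SECONDS = 24 * 60 * 60
# Placeholder address, not a real W3C URL.
POLICY_URL = "http://example.org/server-use-policy"


class DailyLimiter:
    """Fixed-window request counter keyed by client IP address."""

    def __init__(self, limit=MAX_REQUESTS_PER_DAY, window=WINDOW_SECONDS,
                 clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = {}  # ip -> [window_start, request_count]

    def allow(self, ip):
        """Count one request from ip; return True while under the limit."""
        now = self.clock()
        entry = self.counts.get(ip)
        # Start a fresh window for new clients or when the old one expired.
        if entry is None or now - entry[0] >= self.window:
            entry = [now, 0]
            self.counts[ip] = entry
        entry[1] += 1
        return entry[1] <= self.limit


def respond(limiter, ip):
    """Return (status, body) for a request from ip."""
    if limiter.allow(ip):
        return 200, "(serve the DTD as usual)"
    # Keep the refusal as short as possible, but say where the rules are.
    return 403, "Request limit exceeded. See " + POLICY_URL + "\n"
```

Well-behaved clients never notice the counter; only the ones
re-fetching the same file thousands of times a day get the 403, which
is the point of stating the limit in the policy first.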

The whole tarpit thing sounds too clever by half. I think a more direct
approach is more ethical, and it also sets a good example for other Web
publishers.

-Evan

-- 
Evan Prodromou <evan@prodromou.name>

