[linux-elitists] web server software for tarpitting?

Aaron Sherman ajs@ajs.com
Tue Feb 12 11:07:26 PST 2008

Evan Prodromou wrote:
> On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
>>      W3C's current traffic is something like:
>>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
>>        - 25% valid HTML/CSS/WAI icons
>>        - 9% other
> It sounds like W3C has been having a problem satisfying its promises,
> then. When you publicize an URL, like a DTD or schema, you're giving
> some tacit permission to use that URL. Why are you now trying to
> penalize those people who actually bought the story and are using the
> URL?

"Using" is the term in question here. We're not talking about 
constructive access to these URLs, but poorly implemented software that 
is resorting to polling W3C where it should not. Well-behaved software 
does not do this. Why should W3C foot the bill for poorly behaved software?

Making access to W3C degrade the performance of poorly written software 
is a fine way to deal with this. Such software can be trivially fixed to 
avoid the degradation.

> It seems to me the way to solve your problem is to:
>      1. Clarify and publicize best practises for using W3C resources
>         into a server use policy. How often is it OK to hit a W3C-hosted
>         DTD? Once a day? Once an hour? Once a minute?

Typically, it should be almost never. There's no reason for an app that 
actually needs the DTDs not to have a copy of them handy for local 
access. Checking to see if the DTD has changed once a week or month 
might be reasonable, using an HTTP HEAD request. Beyond that, there's 
simply nothing in the DTDs that ever needs to change on a day-by-day 
basis. If, by some oddity, a security problem should involve the way a 
DTD is handled, the app should be the one pushing out the fix, not the 
W3C. Anything else can be fixed on a much longer-term basis.

>      2. For absolutely terrible bad-behavers, block them by IP number --
>         or return a brief-as-possible HTTP 403 response with a link to
>         your server use policy. It sounds like a quick way to cut down
>         on your traffic and save some headaches.

Tar-pitting hurts end-users less. Having my app take an extra 30 seconds 
to do something is far less damaging in many cases than having it fall 
over (which is what many such apps will do if they can't access these 
DTDs, which is why we call them poorly written apps). I'm thinking of 
things like reservation systems here, where a hotel might be hurt by 
having their reservation GUI slow down, but they'd be crippled by having 
it simply stop.
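
To make the idea concrete, here is a minimal sketch of a tarpit in Python: 
clients that stay under a request budget get an immediate answer, and each 
request beyond it is delayed a bit more, up to a cap. The window, budget, 
one-second step, and placeholder response body are assumptions chosen for 
illustration; I don't know how W3C would actually tune or implement this.

```python
import threading
import time
from collections import defaultdict, deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

WINDOW = 60.0     # assumed: seconds of request history kept per client
FREE_HITS = 10    # assumed: requests per window served with no delay
STEP = 1.0        # assumed: extra seconds of delay per request over budget
MAX_DELAY = 30.0  # cap, echoing the "extra 30 seconds" mentioned above


class Tarpit:
    """Tracks recent requests per client and decides how long to stall."""

    def __init__(self):
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()

    def delay_for(self, client, now):
        """Record one request from `client` at time `now`; return its delay."""
        with self._lock:
            hits = self._hits[client]
            # Forget requests that have fallen out of the window.
            while hits and now - hits[0] > WINDOW:
                hits.popleft()
            hits.append(now)
            excess = len(hits) - FREE_HITS
        return min(MAX_DELAY, max(0, excess) * STEP)


TARPIT = Tarpit()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Stall abusive clients, then serve the content anyway.
        time.sleep(TARPIT.delay_for(self.client_address[0], time.monotonic()))
        body = b"<!-- placeholder for the requested DTD -->\n"
        self.send_response(200)
        self.send_header("Content-Type", "application/xml-dtd")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch locally; one thread per connection, so only the
    offending client waits."""
    ThreadingHTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The point of the design is that the request still succeeds. A badly written 
client gets slower rather than falling over, and a client that stops 
hammering the server drops back to zero delay once its old requests age 
out of the window.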

>      3. Build a content-distribution network (CDN) to free up your
>         servers for the important stuff. You could either pony up the
>         cash for a commercial CDN, or you could use W3C's goodwill in
>         the Web community to put together a free and informal system of
>         mirrors.

It's easy to arm-wave and demand that a faceless organization spend 
money on your behalf. In truth, we're talking about the W3C, an 
organization that is at the heart of defining one of the most important 
tools of human communications of the previous millennium, and almost 
certainly of the current one. Addressing early the issues of economic 
and infrastructural scaling is in, literally, all of humanity's best 
interest. That's not something you get to say without hyperbole often.
