[linux-elitists] web server software for tarpitting?
Evan Prodromou
evan@prodromou.name
Tue Feb 12 09:37:27 PST 2008
On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> The other day we posted an article [1] about excessive traffic
> for DTD files on www.w3.org: up to 130 million requests/day, with
> some IP addresses re-requesting the same files thousands of times
> per day. (up to 300k times/day, rarely)
>
> The article goes into more details for those interested, but the
> solution I'm thinking will work best (suggested by Don Marti
> among others) is to tarpit the offenders.
...and not punish everybody else, right?
> W3C's current traffic is something like:
>
> - 66% DTD/schema files (.dtd/ent/mod/xsd)
> - 25% valid HTML/CSS/WAI icons
> - 9% other
It sounds like W3C has been having trouble keeping its promises,
then. When you publicize a URL, like a DTD or schema, you're giving
some tacit permission to use that URL. Why are you now trying to
penalize those people who actually bought the story and are using the
URL?
It seems to me the way to solve your problem is to:
1. Clarify and publicize best practises for using W3C resources
into a server use policy. How often is it OK to hit a W3C-hosted
DTD? Once a day? Once an hour? Once a minute?
2. For absolutely terrible bad-behavers, block them by IP address,
or return a brief-as-possible HTTP 403 response with a link to
your server use policy. It sounds like a quick way to cut down
on your traffic and save some headaches.
3. Build a content-distribution network (CDN) to free up your
servers for the important stuff. You could either pony up the
cash for a commercial CDN, or you could use W3C's goodwill in
the Web community to put together a free and informal system of
mirrors.
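Points 1 and 2 can be sketched in a few lines. What follows is a minimal Python sketch under stated assumptions, not anything W3C actually runs. The policy URL and the limit (MAX_REQUESTS fetches of one resource per client within WINDOW_SECONDS) are hypothetical placeholders for whatever the published use policy would say. It counts each (IP, path) pair's requests in a sliding window, lets anyone under the limit through, and answers offenders and manually blocked IPs with a short 403 that points at the policy.

```python
from collections import defaultdict, deque

# Hypothetical values: neither the URL nor the limit comes from W3C.
POLICY_URL = "https://example.org/server-use-policy"
MAX_REQUESTS = 24            # allowed fetches of one resource per client...
WINDOW_SECONDS = 24 * 3600   # ...within this sliding window (one day)


class UsagePolicy:
    """Track per-(IP, path) request times and flag clients over the limit."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # (ip, path) -> timestamps inside window
        self.blocked_ips = set()        # hand-picked "terrible bad-behavers"

    def allow(self, ip, path, now):
        """Return True if this request fits the policy, False for a 403."""
        if ip in self.blocked_ips:
            return False
        times = self.hits[(ip, path)]
        # Forget requests that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False  # refused requests are not counted
        times.append(now)
        return True


def forbidden_response():
    """Build a brief HTTP/1.1 403 response that links to the use policy."""
    body = f"Request limit exceeded; see {POLICY_URL}\n".encode("ascii")
    head = (
        "HTTP/1.1 403 Forbidden\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    policy = UsagePolicy(max_requests=2, window=60)
    dtd = "/TR/xhtml1/DTD/xhtml1-strict.dtd"
    for t in (0, 1, 2):
        if policy.allow("192.0.2.1", dtd, now=t):
            print(t, "200 OK")
        else:
            print(t, forbidden_response().split(b"\r\n")[0].decode())
```

Because refused requests are not recorded, a client that backs off is let back in once its old requests age out of the window; the blocklist is there for the cases where that is not enough.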
The whole tarpit thing sounds too smart by half. I think a more direct
approach is more ethical, and also sets a good example for other Web
publishers.
-Evan
--
Evan Prodromou <evan@prodromou.name>