[linux-elitists] web server software for tarpitting?
Wed Feb 20 16:05:08 PST 2008
On Tue, 12 Feb 2008 14:07:26 -0500, Aaron Sherman wrote:
> Evan Prodromou wrote:
>> On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
>>> W3C's current traffic is something like:
>>> - 66% DTD/schema files (.dtd/ent/mod/xsd)
>>> - 25% valid HTML/CSS/WAI icons
>>> - 9% other
> "Using" is the term in question here. We're not talking about
> constructive access to these URLs, but poorly implemented software that
> is resorting to polling W3C where it should not. Well behaved software
> does not do this. Why should W3C foot the bill for poorly behaved
Poorly implemented software that isn't a browser, and that may be
compensating for the lack of locally installed support software.
Some DTDs can be organized on a local machine (server, workstation,
laptop, Windows, or GNU/Linux) by the use of catalogs. If the catalog data
hasn't been constructed or doesn't exist, then the standard behavior is to
use the URL inside of the DTD/Schema/RNG to validate the XML file.
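To make the catalog fallback concrete, here is a minimal sketch of the
lookup order described above: try the local catalog first, and go to the
network URL only when there is no entry. The catalog contents and the local
path are hypothetical, and a real XML catalog has more entry types than
this one-table version.

```python
from typing import Dict

# Hypothetical catalog: public identifier -> local copy of the DTD.
# The local path is an assumption for illustration only.
CATALOG: Dict[str, str] = {
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "/usr/share/xml/xhtml/xhtml1-strict.dtd",
}


def resolve_dtd(public_id: str, system_url: str,
                catalog: Dict[str, str] = CATALOG) -> str:
    """Return the location a parser should load the DTD from.

    A catalog hit yields the local file; a miss falls back to the
    system identifier (the remote URL), which is the case that
    generates traffic to the publisher.
    """
    local = catalog.get(public_id)
    if local is not None:
        return local
    return system_url
```

With an empty or missing catalog every lookup falls through to the remote
URL, which is the behavior being complained about.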
> Making access to W3C degrade performance of poorly written software is a
> fine way to deal with this. Such software can be trivially fixed to
> avoid the degradation.
No it can't. Before "trivially fixing the software", it should be a
requirement to fix the standards as defined by W3C. If one defines the DTD
for XHTML as authoritative at one (and only one) URI, then there is only
one logical location that can provide authoritative answers.
In many ways this is similar to the issue that evolved DNS from hosts files.
[ Delegated authority in domains, with localized cached copies of the data ]
The solution is to fix the standards, and then focus on allowing
software to match the standards. DTD validation has been around for a
long time. Quick fixes will generate other problems that can be worse
than the problem they were intended to fix.
>> It seems to me the way to solve your problem is to:
>> 1. Clarify and publicize best practises for using W3C resources
>> into a server use policy. How often is it OK to hit a
>> W3C-hosted DTD? Once a day? Once an hour? Once a minute?
> Typically, it should be almost never. There's no reason for an app that
> actually needs the DTDs not to have a copy of them handy for local
> access. Checking to see if the DTD has changed once a week or month
> might be reasonable, using a HEAD method HTTP call. Beyond that, there's
> simply nothing in the DTDs that needs to ever change on a day-by-day
> basis. If, by some oddity, a security problem should involve the way a
> DTD is handled, the app should be the one pushing out the fix, not the
> W3C. Anything else can be fixed on a much longer-term basis.
Aha. Now we have the idea that validation must be done where the DTD/
Schema/RNG is local to the server and must pre-exist before being
exercised. Is there any allowance to checksum/verify that both ends of
the transaction are using the same version of the same file?
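For what it's worth, the kind of check I am asking about could be as small
as the sketch below. It assumes the authoritative site publishes a digest
for each DTD, which is purely hypothetical here; the function names are
made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a DTD's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def same_version(local_copy: bytes, published_digest: str) -> bool:
    """True if the local copy matches the digest the authority published.

    The published digest is an assumption: it stands in for whatever
    checksum the authoritative location would have to provide.
    """
    return sha256_hex(local_copy) == published_digest.lower()
```

A byte-for-byte comparison like this treats even a whitespace change as a
different version, which may or may not be what is wanted.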
If one can "fix" webservers to utilize your new internal caching to
track copies of the DTD, then it's a fix for a small part of the problem.
One will need to "fix" browsers, and other XML and SGML engine users who
are using the content of the XML file with no involvement in the
Or (simple sarcasm) is the "reasonable fix" to turn off validation?
Most web pages specify an HTML or XHTML level of validation that's
enforceable at any interface that processes the data.
>> 2. For absolutely terrible bad-behavers, block them by IP number
>> or return a brief-as-possible HTTP 403 response with a link to
>> your server use policy . It sounds like a quick way to cut down
>> on your traffic and save some headaches.
> Tar-pitting hurts end-users less. Having my app take an extra 30 seconds
> to do something is far less damaging in many cases than having it fall
> over (which is what many such apps will do if they can't access these
> DTDs, which is why we call them poorly written apps). I'm thinking of
> things like reservation systems here, where a hotel might be hurt by
> having their reservation GUI slow down, but they'd be crippled by having
> it simply stop.
I understand the hope/desire behind this type of solution. Such a
"solution" is effective only when it hits close to someone who has a
degree of control over a remedy. Having a browser stop working because
a hotel proxy doesn't cache DTDs is a solution that creates a need to
resolve the issue.
Having some corporate email server run slower because the chosen
email software doesn't play nice may be reasonable if and only
if there is an alternative that doesn't incur the penalty.
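For reference, the tarpitting being discussed amounts to something like the
sketch below: a per-client delay that grows with the number of recent
requests and is capped (30 seconds is the figure mentioned above). The
class name, thresholds and window are all assumptions for illustration, not
anything W3C has said it runs.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class Tarpit:
    """Compute how long to stall each client before answering."""

    def __init__(self, free_requests: int = 10, window: float = 3600.0,
                 step: float = 1.0, max_delay: float = 30.0) -> None:
        self.free_requests = free_requests  # requests per window with no delay
        self.window = window                # seconds of history to keep
        self.step = step                    # extra seconds per excess request
        self.max_delay = max_delay          # cap on the stall
        self._hits = defaultdict(deque)     # client -> recent request times

    def delay_for(self, client: str, now: Optional[float] = None) -> float:
        """Record one request and return the delay to apply to it."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        hits.append(now)
        excess = len(hits) - self.free_requests
        return min(self.max_delay, max(0.0, excess * self.step))
```

A server would sleep for the returned number of seconds before sending the
response. A client that fetches the DTD once in a while never sees a delay,
and one that polls constantly gets slower rather than broken.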