Don Marti

Mon 14 Dec 2009 10:17:08 AM PST

Correct HTTP status code for removed spam pages?

Bug thread on Launchpad about cleaning up a profile-page spam problem there: "there goes the neighbourhood"

So, what's the right HTTP status code to return when someone requests a page from your site that had been spam, but that you removed?

In my humble opinion, if a spammer used a URL on your site, you should not reuse that URL for legitimate content for a long time. The recipient of a spam link could be checking the target long after you get around to cleaning up the spam. (For example, I'm experimenting with heuristics that mark domains as permanently bad if they are still serving spamvertised pages with a 200 status hours or days after the spam went out.)
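A minimal sketch of that kind of check, assuming you already know when the spam went out. The 24-hour `GRACE_HOURS` threshold and the helper names `fetch_status` and `domain_looks_bad` are hypothetical, made up for illustration; they are not the heuristics actually being tested.

```python
import urllib.error
import urllib.request

# Hypothetical threshold: how long a spamvertised page may keep
# answering 200 before we treat its domain as permanently bad.
GRACE_HOURS = 24

def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including 4xx/5xx codes.

    urlopen follows redirects, so a redirect to a live page reports 200.
    Some servers handle HEAD poorly; switch to GET if that matters.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the code.
        return err.code

def domain_looks_bad(status, hours_since_spam, grace_hours=GRACE_HOURS):
    """True if the spam target still answers 200 after the grace period."""
    return status == 200 and hours_since_spam >= grace_hours
```

Under this sketch, a site that answers 410 or 403 for the cleaned-up page is never flagged, which is the signal a site that dealt with its spam would want to send.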

If a user takes a legit page down, that user might decide to put it back up. (Joe remembered that his Grandma follows him on your site, so he took down http://example.com/~joe/vacation-photos/ to change it around, then put it back up again without those photos.) That's an appropriate use for a 404.

If you took a spam page down, then it should probably return either "410 Gone" or "403 Forbidden." I like "403 Forbidden" because removing a spam page matches its definition in RFC 2616: "The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated." But you could make a case for 410, too. The important thing is that the status code signals "yes, we had a spam problem, but we dealt with it," not "we're spammers" or "we're people who don't maintain our web site."
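Here is a minimal sketch of a server that keeps the two cases apart, using Python's standard `http.server`. The path sets `REMOVED_AS_SPAM` and `REMOVED_BY_OWNER`, the example spam path, and the `status_for` helper are hypothetical stand-ins for whatever moderation records a real site keeps. It picks 410 for removed spam, but swapping in 403 is a one-line change.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical moderation records: paths removed as spam, and paths
# whose owners took them down (and might put them back up).
REMOVED_AS_SPAM = {"/~spammer/cheap-pills/"}
REMOVED_BY_OWNER = {"/~joe/vacation-photos/"}

def status_for(path):
    """Choose a status code for a request path."""
    if path in REMOVED_AS_SPAM:
        # Permanent: don't reuse this URL for legitimate content.
        return 410  # 403 Forbidden is the alternative
    if path in REMOVED_BY_OWNER:
        # The owner may restore the page, so don't promise permanence.
        return 404
    return 200  # hypothetical: everything else is live content

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = status_for(self.path)
        self.send_response(code)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"{code}\n".encode())
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` after importing `HTTPServer` from `http.server`.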

410 seems a better fit than 403

I think 410 is probably better than 403 in this case.

403 Forbidden feels like it's saying "I could serve you this, but for reasons beyond your control, I'm choosing not to". Often, a 403 gets fixed by e-mailing the webmaster for the site in question, who works out which file needs a chmod.

410 Gone (which is rarer) has a much more permanent feel to it: the server is saying that this resource has Gone away and won't ever be coming back. The fact that I've never seen a real-life 410 that was fixed by e-mailing the webmaster helps cement that feeling of permanence.

Comment by Anonymous Tue 15 Dec 2009 08:44:31 AM PST