[linux-elitists] (tmda) Re: Constraining Bogus challenges.

Matt Beland matt@rearviewmirror.org
Fri Oct 3 13:30:46 PDT 2003

On Friday 03 October 2003 01:14 pm, Aaron Lehmann wrote:
> By the way, what does it take for the stock configuration to mark
> something as spam?
> I get some very nasty recurring spams which get these results:
> X-Spam-Status: No, hits=4.7 required=5.0
>         tests=BAYES_99,DATE_IN_FUTURE_03_06,HTML_50_60,MIME_HTML_ONLY,
> Something tells me the Bayes weights are way too low by default.
> bogofilter would have thrown that very quickly. However, I never get
> false positives and don't want to start now. Has anyone found a good
> balance?

I'd suggest that if you've got a well-populated Bayes database (>5k messages
as Spam and Ham) and you're not seeing Bayes hits on legit messages, it'd be
safe to bump up the scores; just be gentle with it. The above example scored
4.7 against a threshold of 5.0, so it would have been marked as spam with only
0.3 more points. Give BAYES_99 an additional 0.5 or so and see how it works.
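
As a rough sketch of that adjustment, the override is a single "score" line
in the site-wide local.cf or in ~/.spamassassin/user_prefs. The 3.5 below is
a made-up illustrative number, not the shipped default for any release; look
up the BAYES_99 score your version actually uses and add roughly 0.5 to that.

```
# Illustrative only: 3.5 is an assumed value, not a known default.
# Replace it with (your version's BAYES_99 score + about 0.5).
score BAYES_99 3.5
```

A single value on the line is used for every score set, so there's no need to
list separate scores for the with-Bayes and with-network-tests cases. Watch
the results for a while before nudging it any further.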

Don't forget to use "sa-learn --spam" on the false negatives, although in this 
case that wouldn't have helped - Bayes was already triggering on the message. 
Still, can't hurt, might help. I whipped up a simple script that sits on the 
webmail interface to the main SA-using system I have; users can submit 
messages to sa-learn as either Spam or Ham just by clicking the appropriate 
button. (I'd share it, but it's both embarrassingly simple and hard-coded for 
the inherited custom webmail interface.)

Another possibility, if it's just a few messages that are recurring 
frequently, is to create a rule to match those messages - some phrase that 
they're typically using, that sort of thing. This can all be done from the
main config file; see the documentation on writing rules.
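
A minimal sketch of such a rule, assuming the recurring spam has a stable
phrase in its body text. The rule name LOCAL_RECURRING_SPAM, the pattern, and
the 1.0 score are all placeholders I made up for illustration; pick a phrase
that doesn't show up in your legit mail and start with a small score.

```
# Hypothetical local rule: name, pattern, and score are placeholders.
body     LOCAL_RECURRING_SPAM  /replace with a phrase from the spam/i
describe LOCAL_RECURRING_SPAM  Phrase seen in a recurring spam run
score    LOCAL_RECURRING_SPAM  1.0
```

Rules like this normally go in the site-wide local.cf. If you run spamd, rules
in a per-user user_prefs are ignored unless the admin has set
allow_user_rules, so the site config is the safer place for it.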

