[linux-elitists] Spam filters

Shot (Piotr Szotkowski) shot at hot.pl
Wed Mar 25 05:55:29 PDT 2009


Jason White:

> I'm experimenting again with CRM114 at the moment. The accuracy 
> doesn't meet my expectations after a couple of weeks of training,
> so I think I'll change to a different classifier and try again.

How do you train it?

I’m currently running Bogofilter with the default cutoff settings (0.45
and 0.99: anything below a spamicity of 0.45 is ham, anything above
0.99 is spam and anything in between is unsure). I didn’t train it on
anything upfront (so the first couple of emails were all unsures), and
I *only* train it on unsures and errors (so I don’t train it on its own
guesses when it’s right).
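
For illustration, the policy described above can be sketched in a few
lines of Python. This is only a sketch of the decision logic, not
Bogofilter's code or API: the names classify() and should_train() are
made up for the example, the cutoffs are the 0.45 and 0.99 values
mentioned above, and the handling of scores exactly on a cutoff is an
assumption.

```python
# Sketch of a "train only on unsures and errors" policy.
# All names here are hypothetical; this does not call Bogofilter.

HAM_CUTOFF = 0.45   # below this spamicity: ham
SPAM_CUTOFF = 0.99  # above this spamicity: spam


def classify(spamicity, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spamicity score in [0, 1] to 'ham', 'spam' or 'unsure'."""
    if spamicity < ham_cutoff:
        return "ham"
    if spamicity > spam_cutoff:
        return "spam"
    # Assumption: scores exactly on a cutoff count as unsure.
    return "unsure"


def should_train(spamicity, true_label):
    """Decide whether a message should be fed back to the filter.

    true_label is the human verdict, 'ham' or 'spam'. Training happens
    when the filter was unsure or wrong, never when it was right.
    """
    verdict = classify(spamicity)
    return verdict != true_label


if __name__ == "__main__":
    # (spamicity, human verdict) pairs, invented for the example
    for score, label in [(0.10, "ham"), (0.70, "spam"), (0.999, "ham")]:
        print(score, label, classify(score), should_train(score, label))
```

Because the human verdict is only ever ham or spam, an unsure result
never matches it, so unsures always get trained on; correct guesses
never do.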

I had been doing this successfully for a couple of years, and started
again from scratch in early February (after migrating from BDB to
SQLite). After a couple of weeks I’ve yet to see a false positive,
I’ve seen two or three false negatives, and the unsures are down to
just a few a day (out of about 200 spams a day).

— Shot
-- 
I have a fever and it’s not disco related. [opi]

