[linux-elitists] Spam filters
Shot (Piotr Szotkowski)
shot at hot.pl
Wed Mar 25 05:55:29 PDT 2009
> I'm experimenting again with CRM114 at the moment. The accuracy
> doesn't meet my expectations after a couple of weeks of training,
> so I think I'll change to a different classifier and try again.
How do you train it?
I’m currently running Bogofilter with the default cutoff settings (45%
and 99%: anything below a spamicity of 0.45 is ham, anything above 0.99
is spam, and anything in between is unsure). I didn’t train it on
anything upfront (so the first couple of emails were all unsures), and
I *only* train it on unsures and errors (so I don’t train it on its own
guesses when it’s right).
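
For illustration, the train-on-unsures-and-errors rule can be sketched in a few lines of Python. This is only a rough sketch of the decision logic, not Bogofilter's own code: the function names are made up for the example, the cutoffs are the 0.45 and 0.99 mentioned above, and the handling of scores exactly on a cutoff is an assumption.

```python
# Cutoffs as quoted in the post; the names are hypothetical.
HAM_CUTOFF = 0.45
SPAM_CUTOFF = 0.99


def classify(spamicity):
    """Map a spamicity score in [0, 1] to 'ham', 'spam' or 'unsure'."""
    if spamicity < HAM_CUTOFF:
        return "ham"
    if spamicity > SPAM_CUTOFF:
        return "spam"
    # Assumption: scores exactly on a cutoff count as unsure.
    return "unsure"


def should_train(spamicity, actual):
    """Return True if the message should be fed back to the filter.

    `actual` is the human verdict, 'ham' or 'spam'. An 'unsure' verdict
    never matches it, so unsures are always trained on; confident
    verdicts are trained on only when they turn out to be wrong.
    """
    return classify(spamicity) != actual


if __name__ == "__main__":
    # (score, human verdict) pairs invented for the example
    for score, actual in [(0.10, "ham"), (0.70, "spam"), (0.30, "spam")]:
        print(score, classify(score), actual, should_train(score, actual))
```

With this rule a correctly classified message never touches the wordlist, so the database only grows with the messages the filter actually struggled with.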
I had been doing this successfully for a couple of years, and started
from scratch in early February (after migrating from BDB to SQLite).
A couple of weeks in, I’ve yet to see a false positive, I’ve seen two
or three false negatives, and the unsures are down to just a few a day
(against about 200 spams a day).
I have a fever and it’s not disco related. [opi]