[linux-elitists] reputation systems, the FUSSP (was Re: Spam filters)
Karsten M. Self
karsten at linuxmafia.com
Thu Apr 2 15:30:24 PDT 2009
on Mon, Mar 30, 2009 at 12:32:13AM -0400, Gerald Oskoboiny (gerald at impressive.net) wrote:
> * Rick Moen <rick at linuxmafia.com> [2009-03-27 09:33-0700]
> > Quoting Gerald Oskoboiny (gerald at impressive.net):
> > > Reputation systems.
> > Have you noticed that, when you say eminently sensible things like
> > "Reputation systems", there's always some clown whose spinal reflex
> > forces him/her to say "But [reputation systems or whatever it was you
> > mentioned] are not the ultimate, final, single solution to spam!",
> I thought about submitting a paper to the spam conference last
> week entitled "Reputation Systems: the Final, Ultimate Solution
> to the Spam Problem" but didn't get around to writing it in time.
> (for those not aware, FUSSP is the classic term used to mock
> people who stupidly think they know how to solve the spam problem.
> Except in this case I TOTALLY DO.)
> I started writing something about this a while ago (yikes, almost
> 4 years ago) but it was a bit too unfocused and I never managed
> to finish it. Here is my draft in progress, where "in progress"
> means ignoring it except for adding a link every few months when
> I see related software or news:
> I'll try to spend more time on this soon. Maybe I should try
> blogging about it in smaller pieces or something.
Noted among inspirations: k5's mojo (and successors?)
Hey! I inspired something (I contributed some of the ideas behind
K5/Scoop's initial moderation/mojo system).
Like most people who've made such attempts, I learned a few things in the
process, including the multiple ways in which trust systems can break
down. On the whole it was a moderately successful approach, though only
partially implemented. What I was really hoping was that a few additional
things would be incorporated.
As a recap, the Scoop (CMS engine) moderation system used by Kuro5hin
(a/k/a K5, the website; Scoop also notably runs Daily Kos) was intended
as a one-up on Slashdot's incremental moderation. The failure mode of
Slashdot was that there was no convergence of moderation -- any
additional moderating of a post moved the post's score by a full point up
or down. As a consequence the range had to be arbitrarily capped.
K5 mojo used a Likert scale -- rate a comment over some range (e.g.:
1-5). "Trusted" users got a '0' option as well. The comment score was
the arithmetic mean of the moderations. I wanted there to be data on
the number of moderations and standard deviation (level of disagreement)
as well. Take two posts, each moderated twice, one getting [3,3], the
other [1,5]: both have a mean of 3, but one has a standard deviation of 0
(perfect agreement among moderators), the other 2.8. This could be used
to show, if nothing else, the controversiality of a post.
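
To make the arithmetic concrete, here is a minimal sketch in Python (not
Scoop's actual code; the function name summarize is made up for
illustration). It reports the number of moderations, the mean, and the
sample standard deviation, which is the variant that gives 2.8 for [1,5]:

```python
from statistics import mean, stdev


def summarize(ratings):
    """Return (count, mean, sample standard deviation) for a list of ratings.

    Hypothetical helper, not part of Scoop. A single rating has no
    measurable disagreement, so its spread is reported as 0.0.
    """
    n = len(ratings)
    spread = stdev(ratings) if n > 1 else 0.0
    return n, mean(ratings), spread


agreed = summarize([3, 3])  # two moderators in full agreement
split = summarize([1, 5])   # same mean, maximal disagreement

print(agreed)
print(split)
```

Both posts come out with the same mean, so only the spread tells them
apart.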
I actually anticipated most of the downsides of explicit moderation, the
biggest of which is that it requires volitional action, which for large
volumes of posts becomes a drag in itself. Generally the system works
well, but there are enough edge cases that it starts to fail. What I saw
(and what's even more true now on social networking sites) is that these
are essentially huge systems for gathering data on preferences and
actions, should anyone have the time, energy, and intelligence to capture
and analyze the data usefully. Actually, sorta scary.
A follow-on was my work with identifying email (and spam) sources by ASN
and CIDR (by way of the Routeviews project reverse-IP lookup -- ASN and
CIDR are returned as TXT fields), documented in my paper "CIDR House
Rules". The upshot is that "house rules" could be defined, as suited a
particular house (any organizational unit controlling its own Internet
connections), concerning traffic, based on desirable/undesirable aspects
of traffic aggregated by origination point. CIDR and ASN are useful as
they also correspond to definable organizational units, with their own
policy and mechanisms for enforcing (or not) good network practices, or
what I call "network hygiene".
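
For the curious, the lookup step might look roughly like the Python
sketch below. It makes no network calls. The zone name
asn.routeviews.org and the three-string TXT layout (ASN, prefix, prefix
length) are assumptions on my part, as are the function names, so check
the Routeviews documentation before relying on any of it:

```python
import ipaddress

# Assumed zone name for the Routeviews reverse-IP service.
ZONE = "asn.routeviews.org"


def routeviews_qname(ip, zone=ZONE):
    """Build the reverse-IP query name for an IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def parse_txt(fields):
    """Turn an assumed (ASN, prefix, length) TXT answer into (int, network)."""
    asn, prefix, length = fields
    return int(asn), ipaddress.ip_network(f"{prefix}/{length}")


# Documentation-only address and ASN values, not real measurements.
qname = routeviews_qname("192.0.2.1")
asn, cidr = parse_txt(("64496", "192.0.2.0", "24"))
print(qname, asn, cidr)
```

The actual TXT query would be issued with whatever resolver is to hand;
the result is the (ASN, CIDR) pair that house rules get keyed on.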
My experience at the time was that "good" and "bad" connections were
overwhelmingly concentrated in a very small number of areas, and by
extrapolation, most organizations would see similar trends:
- A small number of sources from which the vast bulk of "good" traffic
  originates.
- A small number of sources from which the vast bulk of "bad" traffic
  originates.
- Many sources from which smaller (and effectively negligible) volumes
  of "good" and "bad" traffic originate.
In my paper, I noted that one quarter of all spam originated from fewer than
four ASNs, and half from generally 10-20 ASNs. I didn't do a similar
analysis of "good" traffic, though in my case what was of interest was
mail from a dozen or so mailing lists and a few score regular
correspondents. My (large and largely out-of-date) mutt aliases file
runs 880 lines or so. That's remarkably smaller than the total online
population; and while it's not everyone I communicate with, it's the
people I've communicated with over the past decade or so whose addresses
I've felt sufficiently motivated to add to this list.
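
The concentration figure is easy to reproduce from a per-ASN message
tally. A minimal sketch in Python follows; the function name
asns_covering and the counts are invented for illustration (the ASNs are
from the documentation range) and are not the data from the paper:

```python
from collections import Counter


def asns_covering(counts, share):
    """Smallest number of top ASNs whose messages reach `share` of the total.

    `counts` maps ASN -> message count; `share` is a fraction such as 0.25.
    Hypothetical helper for illustration only.
    """
    total = sum(counts.values())
    running = 0
    # Walk the ASNs from largest to smallest, accumulating volume.
    for rank, (_, n) in enumerate(Counter(counts).most_common(), start=1):
        running += n
        if running >= share * total:
            return rank
    return len(counts)


# Made-up spam tally per ASN.
spam_by_asn = {64496: 40, 64497: 25, 64498: 15, 64499: 10, 64500: 5, 64501: 5}

quarter = asns_covering(spam_by_asn, 0.25)
half = asns_covering(spam_by_asn, 0.5)
print(quarter, half)
```

The same function run over a "good" tally would give the analysis I
didn't do.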
Since the bulk of traffic is immediately classifiable at the connection
level (with those classifications being updated on an ongoing basis as
both good and bad patterns change), cheap, line-speed countermeasures,
to the extent of just rate-limiting packets (as opposed to outright
denying them), should prove largely effective. If widely deployed, even
more so, and the acquisition of firms such as IronPort by Cisco remains
a promising development. I'd like to see
smarter routers.... "Good" data of course mostly gets a pass (though it
might hit some sanity checks as well). I see a mix of greymilter,
rate-limiting, and outright blocks (instituted for varying periods of
time) as the likely fix here.
For the "negligible" remaining volume, more traditional content and
contextual filtering would be necessary. Aggregation of email into
large providers (gmail, hotmail, yahoo, aol, etc.) means some of these
can't be presumed all good or bad (they get spammers too), though
outbound controls should help here as well.
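
As a rough illustration of how a house might turn per-source tallies
into the mix of responses described above, here is a Python sketch. The
thresholds, the SourceStats type, and the action names are all
assumptions made up for the example, not anything taken from the paper:

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Observed message counts for one source (ASN or CIDR block)."""
    good: int
    bad: int


def choose_action(stats, min_volume=50):
    """Pick a connection-level response for a source.

    All thresholds are illustrative guesses. Low-volume sources fall
    through to content and contextual filtering instead.
    """
    total = stats.good + stats.bad
    if total < min_volume:
        return "content-filter"
    bad_share = stats.bad / total
    if bad_share >= 0.95:
        return "block"
    if bad_share >= 0.5:
        return "rate-limit"
    if bad_share >= 0.1:
        return "greylist"
    return "pass"


print(choose_action(SourceStats(good=2, bad=998)))    # almost all bad
print(choose_action(SourceStats(good=400, bad=600)))  # mixed, e.g. a big provider
print(choose_action(SourceStats(good=990, bad=10)))   # almost all good
print(choose_action(SourceStats(good=3, bad=4)))      # too little data
```

In practice the blocks would carry an expiry and the tallies would be
refreshed continuously, since both good and bad patterns shift.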
Interesting space. Watch it.
Karsten M. Self <karsten at linuxmafia.com> http://linuxmafia.com/~karsten
Ceterum censeo, Caldera delenda est.