[linux-elitists] Mail indexing, Beagle-styles

Phil Mayers p.mayers@imperial.ac.uk
Tue Jan 25 02:00:52 PST 2005

On Tue, 2005-01-25 at 09:13 +0100, Eugen Leitl wrote:
> On Tue, Jan 25, 2005 at 12:21:02AM +0000, Phil Mayers wrote:
> > Interestingly (sadly) a lot of these search-based email systems (Zoe, 
> > Gmail) eschew folders completely, undoubtedly for complexity reasons.  
> They have an equivalent, though: labels.

Hmm. Maybe. I don't know. They're not hierarchical, for one thing (I've
got about 400 folders, with just 5 at the top level, nested three or four
deep), so I ended up with a large, linear list of labels. They seemed
somehow cumbersome (if I recall, by default labelling a mail didn't remove
it from your "inbox" view). And of course they don't interact with POP at
all (and IMAP ain't to be, apparently), although I confess I haven't
checked whether the labels make it into the RFC 822 headers.

I also found that, good as it was, the Gmail app wasn't quite suited to
managing the sheer volume of mail I get. As I say, it seemed to require
changes to fundamental working practices: I use my INBOX as a virtual
in-tray (if it isn't in the INBOX, it doesn't exist as far as work is
concerned, and once I've completed something I file it).

And of course, fundamentally, I don't feel that leaving all my
work-critical email with Google is desirable. Personal mail I have no
objection to, and I do use it for that (as an inducement to avoid logging
into work and doing work when I'm on my own time). The notion of a
self-managed personal information server appeals more.

Back on track: I spun up Zoe last night. Aside from the fact that it's
Java, and thus relies on a non-free VM, I can't seem to determine what
license it's under.

Having said that, I was quite impressed with it, though I certainly
won't be using it day to day. It took about 20 minutes to index my
complete Cyrus mailbox hierarchy over IMAP to localhost, which is pretty
respectable given the volume of mail.

It seemed slightly buggy. There were various null pointer exceptions and
out-of-memory errors while indexing, but given the volume and nature of
my mail, that's again not surprising (I have a whole folder of malformed
messages for testing purposes).

The local web interface's message display was fairly poor: line widths
weren't preserved and adjacent whitespace was coalesced, as a result of
naive message-to-HTML conversion (<pre>, anyone?). This kills plaintext
emails with, e.g., ASCII diagrams, which I use a lot when explaining
things. The through-the-web message composer was also relatively
primitive, but I guess you're not really supposed to use that except as
a backup.
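For what it's worth, here's a minimal sketch (in Python, my choice; it's
not taken from Zoe's code, and the function name is made up) of the kind
of conversion I mean: escape the message text so it stays literal, then
wrap it in <pre> so the browser keeps line breaks and runs of spaces.

```python
import html


def plaintext_to_html(body: str) -> str:
    """Render a plaintext mail body as HTML without mangling its layout.

    html.escape() turns &, <, > (and quotes) into entities so the text is
    shown literally; the <pre> wrapper tells the browser to preserve line
    breaks and runs of spaces, so ASCII diagrams survive intact.
    """
    return "<pre>" + html.escape(body) + "</pre>"


if __name__ == "__main__":
    diagram = "client --> [proxy] --> server\n   |\n   +--> <log>"
    print(plaintext_to_html(diagram))
```

A real client would probably also want to wrap over-long lines with CSS
rather than reflowing the text itself, but the escape-plus-<pre> step is
the part that stops whitespace being collapsed.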

I got the impression the web UI is really just for searching. It wasn't
so great for reading mail (the default display is "Today", and I couldn't
find a way of displaying "this week" or "last 512 messages"; you could
probably do it with a search, but the documentation is, erm, sparse!)

Clicking on a message subject didn't show me the thread, as I'd expected
it to. Not sure why.

Searching definitely worked, though again the sheer volume of email I
have (including list email) seemed to overwhelm it. I got plenty of
"16,223 hits, displaying the first 200" results, where the first 200
would be, e.g., archived cron output and thus useless.

It does some nice indexing of mail by the Organization: header and
others; it'll correlate all the people sending from a given domain (e.g.
"show me all Imperial College mail", i.e. everything with imperial.ac.uk
in the sender address).
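To make the domain correlation concrete, here's a rough sketch in Python
using the standard library's email.utils.parseaddr. It's only my guess at
the general approach, not how Zoe actually does it, and the function
names are hypothetical. It pulls the address out of a From:/Sender:
header value, groups senders by domain, and treats subdomains (say,
something.imperial.ac.uk) as belonging to the parent domain.

```python
from collections import defaultdict
from email.utils import parseaddr


def sender_domain(header_value):
    """Return the lower-cased domain of a From:/Sender: value, or None."""
    _name, addr = parseaddr(header_value)
    if "@" not in addr:
        return None  # malformed or missing address
    return addr.rsplit("@", 1)[1].lower()


def group_by_domain(header_values):
    """Correlate senders by domain: domain -> list of header values."""
    groups = defaultdict(list)
    for value in header_values:
        domain = sender_domain(value)
        if domain is not None:
            groups[domain].append(value)
    return dict(groups)


def matches_domain(header_value, domain):
    """True if the sender is at `domain` itself or one of its subdomains."""
    d = sender_domain(header_value)
    return d is not None and (d == domain or d.endswith("." + domain))
```

Matching on "." + domain rather than a bare suffix means a sender at
notimperial.ac.uk isn't lumped in with imperial.ac.uk.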

The links to all these cross-reference searches are apparently
blog-style permalinks, which I could see being useful (indeed, the
included OS X Mail.app plugin works, I believe, by hitting those links).

I couldn't get the RSS *output* (syndicate your new mail as RSS) to
display: the feed was blank in Sage+Firefox.

There are certainly some very good ideas there, and I'll be looking at
it in the future. However, it seemed to suffer from the same problem
Gmail did: the relatively large quantity of structured and/or
semi-structured mail I get (cron job output, LogWatch messages, etc.)
seemed to "poison" the index and generate spurious hits.

Interesting and clever stuff though.

One thought I did have: what's to stop the Evolution Data Server (which
presumably does the indexing, if you're running Beagle) from running
*forever* in the background...
