[linux-elitists] Ledger: org mode for double-entry bookkeeping

Nathaniel Smith njs at pobox.com
Mon Dec 10 15:36:27 PST 2012

On Mon, Dec 10, 2012 at 4:51 PM, Don Marti <dmarti at zgp.org> wrote:
> begin Nathaniel Smith quotation of Fri, Dec 07, 2012 at 08:45:42PM +0000:
>> If I wanted to be an engineer when I grew up, then I'd convince
>> someone to pay me to write a sort of bastard child of sqlite and git,
>> where you could write a description of complex data structures and it
>> would implement
>> chained-hash-synchronization/delta-compressed-storage/provably-good-merge-algorithms,
>> with a clean API to expose conflicts to the app as structured
>> first-class entities and other niceties useful for synchronizing apps
>> (e.g., pruning history, in case you don't need 15 years of edits to
>> your mail store or calendar). Couchdb kind of wants to be this, but
>> last I checked couchdb doesn't even record merges in its internal
>> history graph, so it's doomed.
> Good idea.  Making SQLite the default data file
> format for applications is nice as long as you don't
> do anything interesting with those data files such
> as try to version, merge, or collaborate on them.
> Is there a good simple command-line tool for merging
> tree-structured files such as XML or JSON?  One that
> could be used as a git mergetool?

I don't know of any. I'm not sure how well a generic tool can work
either, since semantics matter -- is it meaningful for subtrees to
move from one place to another? How do you define which subtrees in
the old and new documents map to each other? (In many realistic apps
the answer is to just assign a unique id to each
message/contact/transaction/whatever and match on that, which is fast
and accurate; but a generic tool may end up using some diff-like
heuristic instead. And heuristics are dangerous in this business.
Pretty much every merge algorithm that's been analyzed in detail has
turned out to have nasty corner cases where it can get permanently
wedged or silently discard edits; it's like cryptography that way.
Will the tool decide that instead of toggling mail A to read and mail
B to unread, I actually left them both in their original state but
swapped their contents?) That sort of thing.
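
A minimal sketch of the match-on-unique-id approach, assuming each
document is just a dict mapping a record id to that record's value.
The names merge_by_id, Conflict and MISSING are made up for
illustration, and this is a plain three-way merge, not any particular
tool's algorithm. Conflicts come back as structured values instead of
being resolved by guesswork:

```python
from dataclasses import dataclass

MISSING = object()  # sentinel: record absent in that version


@dataclass(frozen=True)
class Conflict:
    """Both sides changed the same record in different ways."""
    key: str
    base: object
    left: object
    right: object


def merge_by_id(base, left, right):
    """Three-way merge of dicts keyed by unique record id.

    Returns (merged, conflicts). A conflicting record keeps its base
    value in `merged` until the application resolves the conflict.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(left) | set(right)):
        b = base.get(key, MISSING)
        l = left.get(key, MISSING)
        r = right.get(key, MISSING)
        if l == r:        # both sides agree (or both deleted it)
            result = l
        elif l == b:      # only the right side changed it
            result = r
        elif r == b:      # only the left side changed it
            result = l
        else:             # divergent edits: report, don't guess
            conflicts.append(Conflict(key, b, l, r))
            result = b
        if result is not MISSING:
            merged[key] = result
    return merged, conflicts


# The mail example: one side marks A read, the other marks B unread.
base = {"A": "unread", "B": "read"}
left = {"A": "read", "B": "read"}
right = {"A": "unread", "B": "unread"}
merged, conflicts = merge_by_id(base, left, right)
print(merged)     # {'A': 'read', 'B': 'unread'}
print(conflicts)  # []
```

Because records are matched by id and never by content, the "swapped
their contents" interpretation cannot come up here.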

Also, git mergetools only get access to left/right/ancestor versions,
right? Three-way merge intrinsically throws away important
information... there are "criss-cross" cases (whenever there are
multiple minimal common ancestors) where there's no good choice of
base version. You can pick one of the nearby common ancestors, but
that can silently discard edits; or you can use some heuristic to pick
a far-back ancestor (this is what git does) but this will often create
spurious conflicts, and AFAICT no one has actually proven that any of
the heuristics in use are completely safe. The *-merge algorithm I
linked to before avoids this problem completely, but it needs access
to the whole ancestry graph, and is most efficient if you can stash
ancillary information along with each revision as you go. (There's a
dynamic programming thing where you do one incremental pass over the
whole graph and then can use the resulting data structure to quickly
perform as many arbitrary merges as you like.)
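
To make the criss-cross case concrete, here is a small sketch that
finds the minimal common ancestors of two revisions. It assumes the
history is a dict mapping each revision to a list of its parents; the
revision names and the helpers ancestors and merge_bases are
hypothetical. It only illustrates why there is no single obvious base
and is not the *-merge algorithm itself. (`git merge-base --all`
reports the same kind of set for a real repository.)

```python
def ancestors(parents, rev):
    """Return every ancestor of rev, including rev itself."""
    seen, stack = set(), [rev]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def merge_bases(parents, left, right):
    """Return the minimal common ancestors of left and right.

    A common ancestor is minimal if it is not a proper ancestor of
    another common ancestor. More than one result means criss-cross.
    """
    common = ancestors(parents, left) & ancestors(parents, right)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other)
                   for other in common)
    }


# a and b both branch from root; then each side merges the other in.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "x": ["a", "b"],  # one user merges b into their line
    "y": ["b", "a"],  # the other user merges a into theirs
}
print(sorted(merge_bases(parents, "x", "y")))  # ['a', 'b']
print(sorted(merge_bases(parents, "a", "b")))  # ['root']
```

Merging x and y gives two equally valid bases, a and b, so a plain
three-way merge has to pick one or fall back to root.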

