[linux-elitists] My first look at BitKeeper. (fwd)
Thu Mar 13 23:21:43 PST 2003
---------- Forwarded message ----------
Date: Thu, 13 Mar 2003 16:20:53 -0800
From: Adam Rifkin <Adam@KnowNow.com>
To: Rohit Khare <email@example.com>
Cc: Ben Sittler <BSittler@KnowNow.com>, Tommy Hui <thui@KnowNow.com>,
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Subject: My first look at BitKeeper.
Resent-Date: Thu, 13 Mar 2003 16:33:24 -0800
Resent-From: Rohit Khare <email@example.com>
[Adam found these bits... RK]
BitKeeper itself seems like a really nice versioning system for
distributed development of big projects...
CVS has a single repository model. Each work area is clear text only,
which means no revision control in the work area during development.
BitKeeper provides staging areas. You can mimic CVS by having one
master repository and several work areas. You can also extend that to
have one master and several staging areas with several work areas below
each staging area. This allows people working on related projects to
merge amongst themselves before merging into the master. Anyone who has
lived through a change that broke the build can see the value of
Merging in CVS is primitive at best.
Branch management in CVS is a nightmare.
CVS has no change sets, i.e., no atomic commits of changes which span
CVS has no rename support.
CVS was based on RCS and still has RCS' limitations.
On the plus side, CVS is free, works well enough for some development
projects, and CVS repositories are easily converted to BitKeeper.
Perforce maintains state in a database next to the RCS files. In order
for this state to be consistent with the RCS files, you must access the
RCS files only through the Perforce daemon. The database is a single
point of failure; if it gets corrupted, your source management system
does not work. The real problem is that when the database gets
corrupted, there is a high chance that you need Perforce to straighten
The Perforce daemon is a bottleneck. Long running operations lock out
all other users. This isn't a problem with small repositories, only
with large ones. Scalability is an issue.
Perforce uses the RCS file format with all of the problems that entails.
The database can use a dramatic amount of disk space.
The main issues are scaling and reliability.
Among the projects hosted by BitKeeper are MySQL and Linux
BitKeeper's dual-license scheme is really clever about how it enforces
Here's a writeup on that...
Not quite Open Source
Larry McVoy is out to change the way cooperative software development
is done, and he may just pull it off. But he also seeks to make a
living from his work, and his way of achieving that goal has put him in
conflict with the Open Source Definition. His novel way of extracting
revenue from proprietary software developers may well fund the creation
of a great new free software tool, but it also has shown that "Open
Source" is not everything.
Some background is in order. Larry has built up an impressive résumé
over the years, with stints at places like SCO, Sun, SGI, and Cobalt.
Much of that time has been spent hacking on one kernel or another, and,
at Sun, putting together configuration management tools. So when he set
out to create a new free tool to address some of the problems that have
come up in the Linux kernel development process, he had a lot of
experience to bring to the task.
The result, a system called BitKeeper, is now nearing readiness.
BitKeeper provides all of the features of systems like SCCS or CVS, and
a lot more. BitKeeper was designed from the beginning to work with
multiple source repositories, and to facilitate moving patches from one
repository to another. Included are some nice graphical tools for
managing and merging patches. To learn more, see the BitKeeper web
Larry's stated goal is to have every free software project using
BitKeeper within a few years. He may just get there. The multiple
repository scheme is designed to work well with large,
globally-distributed development teams. The patch management allows for
the handling of changes, and for filtering these changes on their way
up to the "master" repository. In the Linux kernel case, this means
that Linus can benefit from much greater peer review of patches before
he has to see them. With some luck, the result should be a reduction in
the number of "Linus does not scale" burnouts that have occasionally
halted kernel work in the past.
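
The flow of changes from work areas through a staging area up to the
"master" repository can be sketched roughly as follows. This is only an
illustrative model, not BitKeeper's implementation or command set: the
Repo and ChangeSet classes, the push() method, and the file names are
all hypothetical. It assumes a changeset is an atomic unit that may
span several files and that a push sends only what the parent lacks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeSet:
    """One atomic change that may span several files (hypothetical model)."""
    ident: str
    files: frozenset


@dataclass
class Repo:
    """A repository with an optional parent (a staging area or the master)."""
    name: str
    parent: Optional["Repo"] = None
    changesets: list = field(default_factory=list)

    def commit(self, ident, files):
        """Record a changeset locally; the work area itself is versioned."""
        cs = ChangeSet(ident, frozenset(files))
        self.changesets.append(cs)
        return cs

    def push(self):
        """Send the parent every changeset it lacks, each as a whole unit."""
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent to push to")
        known = {cs.ident for cs in self.parent.changesets}
        sent = [cs for cs in self.changesets if cs.ident not in known]
        self.parent.changesets.extend(sent)
        return [cs.ident for cs in sent]


# One master, one staging area, two work areas below the staging area.
master = Repo("master")
staging = Repo("staging", parent=master)
alice = Repo("alice", parent=staging)
bob = Repo("bob", parent=staging)

alice.commit("a1", ["fs/inode.c", "include/fs.h"])  # spans two files
bob.commit("b1", ["mm/page.c"])

alice_sent = alice.push()      # related work meets in staging first
bob_sent = bob.push()
staging_sent = staging.push()  # only then does it reach the master
```

In this sketch the staging area is where related work is merged and
reviewed before anything reaches the master, which is the filtering
step the article describes.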
As part of Larry's approach to world domination, he intends that
BitKeeper be freely available for any free software development team
that wants it. That includes source availability, ability to distribute
modified versions, etc. But Larry also wants commercial software
companies to use his system, and he would like for them to pay for the
privilege. After all, he estimates that about four person-years of
effort have gone into the development of the system; it would never
have happened without some expectation of a return on that investment.
And it's his way of getting them to pay that has put him in conflict
with the Open Source Initiative.
To understand the problem, it's necessary to understand two features of
BitKeeper and its license. BitKeeper includes a logging feature. Once
multiple repositories are in use, BitKeeper will log all changes to a
central server; these logs will be made available via a web page. Thus
anybody can go to the web site and see what's happening with any
development project out there which is using BitKeeper.
BitKeeper's license allows for modifications, but under one
restriction: all modified versions must pass a regression test. Other
free systems (e.g. Perl) have regression tests in their licenses, but a
modified version which is unable to pass the test simply loses the
right to use the original name. Versions of BitKeeper which fail the
test may not be used at all. And yes, the regression test checks to be
sure that the logging feature has not been removed or disabled. If you
turn off the logging, you violate the license.
The reasoning behind this move is the following: Larry believes that
free software projects want their work to be in the open anyway, and
will not be bothered by the logging. Since the logging only kicks in
when multiple repositories are used, individuals using BitKeeper to
manage their diaries will not be affected. Proprietary vendors, by
contrast, are not likely to be happy with having their change log
messages broadcast to the world. For them, this restriction will
probably make the system unusable.
At this point Larry shows up with a deal: the commercial version of
BitKeeper doesn't do public central logging - you can direct the
logging to an internal server. Pay the price, and you can use the
system with your privacy intact.
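
The logging rule described above can be summarized in a few lines. This
is a sketch of the policy as the article states it, not of BitKeeper's
code: the License enum, the log_destination() function, and both host
names are assumptions made for illustration.

```python
from enum import Enum


class License(Enum):
    FREE = "free"
    COMMERCIAL = "commercial"


# Hypothetical host name standing in for the public central log server.
PUBLIC_LOG_SERVER = "public-log.example"


def log_destination(lic, repo_count, internal_server=None):
    """Return where change logs would be sent, or None if nothing is logged."""
    # Logging only kicks in once multiple repositories are in use.
    if repo_count < 2:
        return None
    # The commercial version does no public logging; it may log internally.
    if lic is License.COMMERCIAL:
        return internal_server
    # The free version logs to the public central server.
    return PUBLIC_LOG_SERVER
```

Under these assumptions a single-repository user is never logged, a
free-license project with several repositories is logged publicly, and
a paying customer keeps its change log messages in house.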
There are a number of other features to the BitKeeper license.
Subsections of the code - generally library modules that could be
useful elsewhere - will be available under the GPL. If the logging
servers go away, or if work on the system stops for two years, the
whole thing goes GPL.
But that is not good enough for the "Open Source" designation, because
the regression test requirement breaks the rules. Larry discussed the
issue at length with the OSI folks, and was not able to get them to
bend on the issue. He has since given up. BitKeeper is not Open Source.
The interesting thing is that, on a list for kernel hackers who intend
to use the system, nobody really cares all that much. Even members of
the OSI board have posted there, saying that the license is a good one,
and that the lack of the "Open Source" designation should not be a
problem. BitKeeper is free enough for that crowd, and they tend to be
pretty fussy on these things.
So we have a situation where a license widely regarded as "free enough"
does not qualify for [what is supposed to be] the free software
community's mark of recognition. We may be seeing the future here: more
"commercially crippled" licenses may well appear as more developers try
to make a go of making a living from free software. When a lot of "free
enough" software is no longer "Open Source," what becomes of the
certification mark? Will people care about it any more?
Maybe the OSI should consider adopting a multi-tier designation. The
top tier could be reserved for fully free code - perhaps with an even
more restrictive set of criteria than what they have now. Lower levels
could then be used to recognize software which is "free enough," but
which does have some restrictions. Doing so could help the community
distinguish between the incredible number of software licenses which
are coming out, and could also help to preserve the relevance of the
Open Source certification mark.
SCM systems are often a productivity bottleneck. Inexpensive entry
level systems don't solve the problems you need solved. Traditional
high end systems are resource and administration intensive. BitKeeper
is light, fast, and exceptionally simple to use, yet it offers advanced
features not found in even the most expensive traditional systems. If
the following list sounds familiar, BitKeeper is right for you.
Merging. Do your engineers spend too much time merging? BitKeeper has
the best-in-class merge algorithms and merge tools which reduce merge
time to 1/10th of the time required by other tools.
Renames. Do you want to reorganize your source tree but can't because
the SCM tool doesn't properly track file names? BitKeeper gets this
right: files may be renamed at any time, in any work space, and the
renames are handled correctly in all cases.
Geographically distributed. Do you have teams in more than one
location? With centralized client/server SCM systems, all the remote
teams suffer. BitKeeper is a peer-to-peer system based on a replicated
database. All teams become local and enjoy local performance in a
Work flow. Are you stuck in your vendor's idea of work flow? Ever
wished you could modify it to suit your needs rather than their idea of
your needs? BitKeeper is a peer-to-peer system, so arbitrary work flows
that match your changing needs are no problem.
Reproducibility. Do you ever have to roll back to fix a bug in an
earlier release only to find that your SCM system doesn't support that
or get it right? BitKeeper guarantees 100% accurate rollback of all
file contents, names, and permissions without requiring any forethought
on your part. While other systems require that you remember to tag the
tree, BitKeeper has no such requirement; all changes are potential
Performance. Do you have to wait because your server gets too busy? Are
you tired of spending more money on expensive machines to keep up with
the load? BitKeeper's replicated nature spreads out the load over all
your machines. A small and cheap PC can easily support thousands of
developers. It would cost more than a hundred times as much to do the
same thing with other SCM solutions.
Reliability. Do you have to wait for your SCM vendor to come unscramble
their database? How about waiting on the overloaded or crashed SCM
server? BitKeeper is based on a replicated database design which means
the main integration server can crash without causing a problem. It is
possible and easy to guarantee 24x7 uptime with BitKeeper.
Data integrity. Have you ever rolled back to fix a bug only to find
that version of the database is corrupted? Most entry level SCM systems
are based on the RCS file format and it is commonplace to have
undetected corruption in those files. You'll find out when a customer
insists on a bugfix in an old release and you can't get at that data.
BitKeeper will tell you immediately if you have data corruption and can
help you fix it.
Reviewing and debugging code. Do you ever want to see all changes
associated with a particular change in a file? Two clicks in BitKeeper
will let you see that for any change in any file. We depend heavily on
this feature to provide fast and accurate support to our customers.
Without this feature, we would have to increase our technical staff by
a factor of three to maintain the same level of support and
Time to market. Do you need to get to market quickly, ahead of your
competitors? BitKeeper will help you by reducing the time engineers
spend merging, catching integrity problems as they happen, allowing
work flow which matches your process, revealing quickly how and why
changes were made, and providing excellent performance as you grow.
Cost. Do you spend as much or more on hardware and support personnel
than on the SCM system itself? You are not alone; that is common for
any medium or large installation. The replicated nature of BitKeeper
means that a PC will work fine and there is no need for full-time
Support. Do you ever have a problem or a question and spend 30 minutes
on hold waiting for an answer? Does your SCM vendor relabel support as
Professional Services and charge you extra? Our support is without
equal in the industry: we are responsive to your needs and will work
with you to deploy BitKeeper effectively, at no extra charge. Our
customers frequently describe our support as the best they've ever