Interview with Tom Lord of Arch Revision System
comforteagle writes "Every revision control system has its supporters and detractors, but none is as polar as Arch. Either you hate it or think it is the best thing in revision control ever. Built more around what our beloved kernel hackers use (BK), Arch is definitely a departure from CVS and Subversion. I've interviewed Tom Lord, Arch's daddy, about the application, and he has some -ahem- interesting answers and opinions."
Re:I'm left out... (Score:5, Interesting)
And those of us who have heard of it, but have no idea if it's a good thing or not.
I noticed freedesktop.org has started using it to some degree [freedesktop.org]. But like I say, I have no idea if that's a good thing. It is slightly inconvenient in that I have to go read yet more docs to use it.
Design and License (Score:2, Interesting)
Look at the way the Linux kernel project works, at least for developers who are willing to drink the koolaid of Bit Keeper (BK) licensing.
I guess that's a different koolaid than what the Stallman/Gnu cult members are drinking.
GNU arch when an OSA (Score:3, Interesting)
As ever, people: OSI is accepting [opensource.org] nominations for OSAs.
John.
Most polar? (Score:3, Interesting)
Personally, I really like ClearCase. Too bad it's so expensive, otherwise I'd use it for all my open source work.
Re:Most polar? (Score:5, Interesting)
Cost issues aside, I think that perception of ClearCase is affected by whether you have to set ClearCase up yourself or not.
The first time I used ClearCase I had to set up the ClearCase environment. I did not like the ClearCase documentation much. Rather than just telling you what you need to know to get the system set up, they provide their grand vision of the world. I couldn't care less about their grand vision; I want to get the source control system working. After this experience I was not a big fan of ClearCase.
I used ClearCase again in an environment where the release engineering group managed ClearCase, along with the releases. They would "freeze" the branches for release (and let you in when you had a bug fix). They would also create new development branches and they managed the main line branch. In this environment ClearCase was really nice. I liked it a lot and prefer it over CVS.
In summary I'd say that ClearCase is a higher cost source control system. You not only have to pay for the software license for ClearCase but part of someone's time to manage it as well. For small projects and software development groups this does not make sense. But once a group reaches a certain size, the cost can be justified and ClearCase is nice.
I am currently working on a project where there is a core set of software that is used by three different groups, each of which will probably want their own changes. In this environment I think that a release engineering group and ClearCase would be justified (of course that does not mean that we're going to get a release engineering group and ClearCase).
I disagree... (Score:5, Interesting)
Not very impressive (Score:1, Interesting)
Even if this arch thing is good, I am not going to switch, for two reasons: I am happy with cvs and, being aware of its drawbacks, do not find switching to a better system critical; and I am certainly not impressed by what its author says.
Perhaps I should have given them the other way around.
No people skills. (Score:5, Interesting)
Re:All that and he doesn't explain... (Score:4, Interesting)
> point out a false statement made by Lord.
> [Hey, FSFS exists.]
I agree it is good to point out FSFS. The
interview is, indeed, misleading in that
respect.
As far as I know, back when the interview was
conducted, FSFS did not exist or at least was
not on many radars.
A separate question is whether or not FSFS
really makes the server-side of svn all nice
now or not --- but certainly that is not going
to be worked out in
-t
darcs (Score:5, Interesting)
Re:Tact? (Score:3, Interesting)
The other extreme is just developers who hate the popular software just for the sake of hating popularity. That seems to be the case with DSpam over Spamassassin. I don't think that's the case here however. While CVS is reliable software and people know how to work around its flaws (and the creator of arch fully admits that) it is at the same time fairly flawed.
I'd tend to agree that CVS is klunky in the way he describes. I still use it of course since it gets the job done. I've not tried subversion at all, so I can't comment on how well that fixes the problems of CVS.
Non-database repositories (Score:1, Interesting)
It's now possible to create repositories that don't use a BerkeleyDB database. Instead, these new repositories store data in the ordinary filesystem.
they *all* suck (Score:3, Interesting)
At the moment I am using subversion because it has versioned properties and I wrote a bunch of scripts to extract filesystem metadata and create svn properties from them and vice versa.
We have at least one arch fanatic where I work and when I asked him about this, he seemed to think that using arch for what I want would be *fantastic* and arch would rule, only I'd have to use the cvs method of maintaining ownerships and permissions, i.e., a script which maintains them in a file which is in the repository. Which I tried and which sucks.
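For readers who have not seen the "cvs method" mentioned above, here is a minimal sketch of what such a script might look like. It is only an illustration under assumptions: the names `record_metadata` and `restore_modes` and the JSON manifest format are hypothetical, not taken from the poster's scripts or from any arch or cvs tooling, and it assumes a POSIX filesystem.

```python
import json
import os
import stat


def record_metadata(root, manifest_path):
    """Walk root and write each entry's permission bits, uid, and gid to a JSON manifest."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            entries[rel] = {
                "mode": stat.S_IMODE(st.st_mode),  # permission bits only
                "uid": st.st_uid,
                "gid": st.st_gid,
            }
    with open(manifest_path, "w") as fh:
        json.dump(entries, fh, indent=2, sort_keys=True)
    return entries


def restore_modes(root, manifest_path):
    """Reapply recorded permission bits; ownership is skipped since chown usually needs root."""
    with open(manifest_path) as fh:
        entries = json.load(fh)
    for rel, meta in entries.items():
        full = os.path.join(root, rel)
        if os.path.exists(full) and not os.path.islink(full):
            os.chmod(full, meta["mode"])
```

The manifest is an ordinary file that gets checked in alongside the tree, which is also the weak point the poster complains about: nothing forces it to stay in step with the files it describes.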
Re:Argument by Slashdot(r) ? (Score:4, Interesting)
Re:I don't like CVS, Subversion, or Arch (Score:3, Interesting)
Arch's biggest bug (Score:5, Interesting)
This article is a good example. Tom Lord just hand-waves his way past every question. Subversion sucks!!! CVS users are teh stupid!!! If he tones it down a bit, he definitely has a future in politics. But I don't think he's a very good software architect.
OK, it's true that CVS and Subversion have problems. But, gak, so does Arch. Good God is it slow for big projects (something they've been promising to fix for years). And it's got some horrifying naming conventions: "tla--devo--1.3". And the files! "{arch}", "++default-version", ",,inode-sigs". Whatever Lord was smoking, it must have been good. The branching and merging operators are powerful but, thanks to all the punctuation, they are also ugly. It's like the entire UI goes out of its way to be downright unfriendly.
Every time someone mentions these deficiencies on the mailing list, they just get flamed for not truly understanding Arch. "Namespaces! Namespaces! Namespaces!" "Win32 is for lusrs!" Whatever. I just want a tool that helps me get the job done.
Personally, I'm in the middle of transitioning to Subversion. It's better than CVS, and it is faster and nicer to use than Arch. Works for me.
Your biggest strength is to know your weaknesses (Score:2, Interesting)
I find it hard to believe that Arch would be so perfect. If he really knew the strength of his software he would also have no problem admitting to its weaknesses and Arch would be that much better for it.
Instead he spent most of the article attacking Subversion. If Arch is really that good, why would he spend so much time complaining and critiquing something else?
Re:I don't like CVS, Subversion, or Arch (Score:4, Interesting)
Re:Arch has great potential... (Score:2, Interesting)
Tom, change your name
you narcissistic f'er
Distributed development under arch? (Score:4, Interesting)
What I was trying to do was to have a two-layer revision control system, where I have a private archive in addition to the project archive, and I check into the private one all the time, and transfer changesets to the project archive when I'm happy with it. That way, I can be halfway through refactoring a big chunk of code, have it completely broken, but have the work so far revision controlled so that, if I accidentally wipe out my build tree, I can recover it.
The problem I ran into was that I couldn't get the two archives to agree exactly on the current status: whenever I transferred my changes up from the private archive, it added a log message to the project archive, and my private archive wasn't up to date, because it didn't have the message. When I updated my private archive from the project archive (either to pick up the message or to get other people's changes), I had to put in a log message, which the project archive then didn't have.
It seems like arch really ought to support getting two archives in perfect sync, as well as disregarding a commit to a remote archive that only adds changesets already in the local archive (as well as disregarding the changesets themselves, which it does do).
Re:I don't like CVS, Subversion, or Arch (Score:2, Interesting)
CVS is, quite frankly, ass! On tagging it can _seem_ like it's tagging successfully (T 'filename') and even give a handy exit code of 0. But then when you go to actually use your tag you may get some, all, or none of the revisions of the files the tag was supposed to be applied to. It's not atomic in anything it does. On subversion operations succeed or they don't. Not some sort of throw the dice and see what actually got checked in/tagged/branched/whatever and what didn't.
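The "succeed or they don't" behaviour described above can be sketched in a few lines. This is a toy model, not how CVS or Subversion store tags: the function `tag_atomically` and the dictionary layout are assumptions made for illustration. The idea is simply to stage the whole tag first and publish it only if every file was staged.

```python
def tag_atomically(tags, name, revisions, files):
    """Attach tag `name` to every file in `files`, or to none of them.

    tags:      dict mapping tag name -> {file: revision}
    revisions: dict mapping file -> current revision
    """
    staged = {}
    for f in files:
        if f not in revisions:
            # Abort before anything is published; `tags` is left untouched.
            raise KeyError(f"cannot tag unknown file: {f}")
        staged[f] = revisions[f]
    tags[name] = staged  # only reached if every file was staged successfully
    return staged
```

A per-file tagger, by contrast, would write each entry as it goes, so a failure halfway through leaves a tag that covers only part of the tree.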
You get better log output, super fast execution, and much much better branching.
The comment from, Lord "numb-nuts" of Arch, about svn being a toy is asinine. Bdb isn't the worst thing there is. And there's work to provide a choice. There's work progressing on using *sql as the backend storage. There's also work to give one the option of using a plain filesystem like CVS. If you're sick and like that kind of thing.
In every way imaginable Subversion is superior to CVS. I have gone through the hell of having to work around CVS' failings. I have also experienced how much better life is with subversion.
Our repository is just 1.2GB right now. I've not experienced any "flaky" behavior whatsoever. It is, hands down, the better scm tool.
If all code, binaries, and everyone involved in the creation and/or continued propagation of CVS were to be de-res'ed, the world would be a better place.
For an opensource scm tool subversion is the way, the truth, and the light.
If you want to talk about the best scm tool, bar none, that would be ClearCase. Truly a best in class application. Although it wouldn't hurt them to step into the latter half of the '90s and get rid of Motif as their widget set for the gui frontends. But that's only if you care about the gui, right?
Conflicts (Score:2, Interesting)
Re:agreed, Arch needs a better advocate (Score:2, Interesting)
The thing is, being really good at arch is more productive than being really good at svn.
I think arch supports a much better model for opensource development than svn, because it is a distributed model. So while *I* might have the official release of a project, if someone else wants to download and hack on it, they get to keep their changes in a revision control system, and I can easily merge their changes back. And if they keep developing, I can keep updating without them having to worry about what patches I accept and what I reject.
It also supports maintaining multiple development branches much better. (You have a --dev tree, and a --release tree, where each one is evolving, hopefully one faster than the other.) With CVS, you pretty much only have a branch to eventually merge it back to HEAD. My understanding is SVN is a little bit better about it, but they still don't natively support doing more than 1 merge between 2 trees and automatically detect what has been merged in the past.
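To make "automatically detect what has been merged in the past" concrete, here is a small sketch of merge tracking by changeset id. It is a toy model under assumptions: the `Tree` class and its methods are hypothetical and say nothing about how arch actually stores its patch logs.

```python
class Tree:
    """A toy branch: an ordered log of changeset ids plus the set already applied."""

    def __init__(self, name):
        self.name = name
        self.log = []         # changeset ids in the order they were applied
        self.applied = set()  # same ids, for fast membership tests

    def commit(self, changeset_id):
        if changeset_id not in self.applied:
            self.log.append(changeset_id)
            self.applied.add(changeset_id)

    def missing_from(self, other):
        """Changesets present in `other` but not yet applied here, in other's order."""
        return [cs for cs in other.log if cs not in self.applied]

    def merge_from(self, other):
        """Apply only what is missing and return the list that was applied."""
        pending = self.missing_from(other)
        for cs in pending:
            self.commit(cs)
        return pending
```

With this bookkeeping a second merge between the same two trees picks up only what arrived since the first one, which is the property the poster says CVS lacks:

```python
dev, release = Tree("dev"), Tree("release")
dev.commit("d1")
dev.commit("d2")
release.merge_from(dev)   # applies d1 and d2
dev.commit("d3")
release.merge_from(dev)   # applies only d3
```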
So, where is it going? (Score:1, Interesting)
Re:Distributed development under arch? (Score:2, Interesting)
When you see it that way, what is happening makes good sense really.
However, I believe "archive mirrors" does what you are trying to achieve. I haven't used them yet myself, since I need the pivot branch to work with my coworkers.
good luck
zenaan
berkeley DB (Score:2, Interesting)
I still use svn, though. I'm just glad to be able to rename directories.
I'd pee myself if someone forked svn and gave it a more friendly backend.
-Ed
*By this, I mean that you can't take the berkeley DB, copy it to another machine, and expect it to work... the internal byte order is machine specific.
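The general byte-order problem behind that footnote is easy to show. The snippet below illustrates endianness with Python's standard `struct` module only; it does not read or model Berkeley DB's actual file format, so take it as an analogy for why a machine-specific layout does not travel well.

```python
import struct

value = 0x01020304

little = struct.pack("<I", value)  # little-endian layout, as on x86
big = struct.pack(">I", value)     # big-endian layout, as on older SPARC or PowerPC

# Same integer, different bytes on disk.
assert little == b"\x04\x03\x02\x01"
assert big == b"\x01\x02\x03\x04"

# Reading little-endian bytes with the wrong assumption yields a different number.
misread = struct.unpack(">I", little)[0]
assert misread == 0x04030201
```

A format that always writes one fixed byte order, or records which order it used, avoids the problem at the cost of a conversion step on some machines.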
Comment removed (Score:3, Interesting)
Re:agreed, Arch needs a better advocate (Score:3, Interesting)
The best thing about darcs is that every operation is local by default. Subversion does diffs locally; darcs does everything locally. You only need to wait on the network when you want to get something not on your machine, or when you want to share your work with others. Arch can be made to work this way, but it requires a bit of setup and a lot of understanding of advanced concepts: mirrored archives, revision libraries, etc. With darcs, fast is the default.
The main downside is that it's still pre-1.0, and so a bit less stable and documented than Subversion, though still reasonably good.
Re:Design and License (Score:2, Interesting)
From what I remember of Subversion, looking at it on-and-off over the years, they have ended up redesigning it several times over, while trying to produce code that would end up being part of the finished product, thus throwing away large portions several times because it didn't fit the new design, rather than the code could be done in a better way.
I also found it a little strange to be trying to basically implement a VC file-system with file-attributes... Why not actually make a VFS plug-in for Linux, even if it forces everyone to adopt a Linux Server for the repository, and do the merge, etc, tools at the user-level? It still seems more elegant that way, even if it forces use of Linux -- didn't VMS have a rudimentary Rev Control in its file-system?
When I first saw Arch, I immediately shied away because it was all written in shell, however when the first version of 'tla' appeared (i.e., the C-code version) I was quite impressed at the fact he was actually allowing himself to test his basic design ideas in what amounts to a Rapid Dev Environment, even if Shell-script is a painful beast to try and do something like this in.
Re:darcs (Score:5, Interesting)
I (Larry McVoy) have looked over Darcs, Monotone, Arch, Codeville, and I think some others that I can't remember and I can easily say that no, they haven't discovered much of what we have done.
Let's take darcs as an example. It's a cool system if you are a math or physics person. You can write proofs about how it works, much like BitKeeper. We like that and applaud anyone who is thinking that hard (and if you are looking for a job please come talk to us, we are always hiring). However, darcs suffers from the math problem. It's all about math and not at all about being pragmatic. Here's a for instance. The BitKeeper tree holding the 2.6 kernel has about 55,000 changesets. A null update using BK is 4 seconds (which is insanely slow in our opinion). Try doing the same thing with darcs and you will wait and wait and wait... That's just the first example of how it doesn't scale. The openlogging tree for linux is somewhere north of 110,000 changesets. *All* other systems die with that sort of load. We're slow but we work and we know how to fix the slow part.
This problem space is strange, it is part math and part pragmatism. You have to do both and darcs does one of them. And it does it in only one of the areas, there are many many more. Repository synchronization, rename handling, merging, user interface, installation tools, working well on Windows as well as Unix, etc., etc.
Our payroll is higher than any open source SCM system has generated by a factor of 50. It's higher than the reiserfs payroll, it's higher than lots of well known little companies doing useful stuff. It's high because there are lots and lots of corner cases *in addition* to the hard math stuff which needs to be done.
Since we're talking about Arch, here's another example: we recently got a commercial customer who tried out arch on windows and came back and told us BK was at least 10x faster. And we told him that we think BK is way too slow on Windows. He liked that. The point is that it isn't just about architecture, or licensing, or features, it's about a lot of not-so-fun stuff and that's why a commercial answer will always be better than a free answer. It costs a lot of money to solve the non-fun problems. Open source solves the fun problems (extremely well, I might add) but unless the project is very visible (i.e., the kernel) it starts to fall down when you hit the non-fun problems. Think about it - if no one is paying you money or telling you that you rock while you are doing the grunt work - how long are you going to do that? Not very long, just look at 90% of the "projects" on sourceforge, all talk, no code.
It's worth repeating that last bit. SCM is an undervalued field. Every engineer thinks that they can reproduce what BK does with a few scripts wrapped around CVS or RCS. While they may think that, it flies in the face of the over 100 man years we have in BK, and we know we are nowhere near good enough. The bummer is that the perception is that this stuff is easy but the reality is that it is hard. Both technically hard and detail hard. It's way more work than people think. But precisely because people don't value it, that's why the only real answer is a commercial answer. Yeah, yeah, you all love to give me crap because BK isn't GPLed but *none* of you have put in 1/10th as much effort as I have or have made 1/10th as much of a difference in this space. Talk is cheap, show me a better answer and I'll be impressed. It won't happen because it costs way way way too much money to deliver a better answer. How's the arch installer on windows? Graphical? Is it careful about not screwing up the registry? Can you have two different versions installed at the same time? What about the transport layers? Works over http? Really? Through all the wacky proxies out there? You get the idea, right?
That's why all this discussion of arch or darcs or whatever is just nonsense. You all think this stuff is easy so you are never going to cough up the $30M or so it will take to solve it right. Sad but true. I guess it's good for us, it means we have a market, but it would be nice if you knew a bit more about the topic. I love it every time it comes up, the world is definitely becoming more aware at least.
--lm
Re:darcs (Score:5, Interesting)
I agree that "darcs suffers from the math problem", at least in that the implementation has focused on getting the semantics right and not on performance. (And unfortunately, the semantics are still not all right.) David maintains a kernel tree in darcs as a reminder of all the ways it doesn't scale. However, he also thinks most of them are fixable "post 1.0", and given how smart and capable he's proven to be, I give that claim some respect. Alas, I haven't had time to learn the math well enough to really be sure.
Regarding the economics, I don't think SCM is an undervalued field. Or at least, the free software community can find a way to value any field it needs to to make progress. (And for SCM, you're helping!) People said we didn't value desktops, or help, or installers, or web browsers, or couldn't do webdav or other protocols "at the top of the stack". "No fun" is what people have said about all of these. (And we're still not great at all these, but I think we're on a clear path to get there.)
What does this mean for darcs? It already has good semantics, is easy to use, and has a solid theoretical foundation. I think that free software folks will increasingly value distributed SCM and it will get more development man-power (if not as much as bk). These are excellent growth factors, and I suspect darcs will be able to handle 90% of projects out there in a few years. Unless the foundation is found to be weak (which is why I asked about that). Unless David loses interest before someone else steps up. Unless, unless, unless, but I like its chances.
Put it this way: I agree that open source does not solve things that are too hard or no fun. But the second is actually a non issue: when we need something, powerful economic and selective forces will make it fun for someone. So I really care about the first, and I'm trying to gauge whether distributed SCM is too hard for David and others attracted to darcs. I suspect that it's not too hard, at least to get to the 90% mark.
Thanks for taking the time to reply. I do enjoy reading what you have to say.
Re:darcs (Score:2, Interesting)
There are some of us who will NEVER trust *our* hard work in a proprietary system. Maybe it is 10x faster and gives blow jobs between commits but it is still *MY* work in *YOUR* product with *YOUR* license on it, subject to *YOUR* bottom line.
I will never use a closed-source/proprietary development tool like that (VMware is probably the closest I'd get, since it is easy to switch to "real" hardware if the VMware folks turn evil). The feature set is almost irrelevant if it's got a restrictive license.
Almost *all* problem spaces are a mix of theory and pragmatism, yet it just takes 1-2 smart people to come along and come up with the breakthrough. Hopefully you haven't hired both of them
CVS has been "good enough" and nobody thought much about a replacement because the only people who were annoyed with CVS were developers who needed to focus on their *projects*, not taking a sideline to develop a good RCS.
But I'll be patient. It'll happen.
Tom Lord was my roommate in '88 (Score:4, Interesting)
Re:Does he know ANYTHING about Subversion? (Score:4, Interesting)
Huh? Did you read the same mails as I? Back then, Tom Lord's ramblings on the svn-dev mailing list had the same problem as this interview. And also those the grandparent complained about:
What exactly is bad about Subversion? Give me an example scenario that shows me just how fucked I would be with svn and how Arch would ride in on a white horse and save the day.
TL talked big about how Subversions design was broken but when asked to give concrete examples he always kept talking about theories.
IMHO, it's not much unlike saying that Linux sucks because it isn't a micro-kernel architecture. And when being asked about details, being unable or unwilling to come up with an example how a micro-kernel design would fix an existing major flaw (without sacrificing the existing good points of the software).
For example, I like QNX's design very much. But that doesn't imply that Linux is broken or sucks. Both have their strong and their weak points, depending on the task at hand. (And for my daily desktop work I would fall into a crisis if I had to use QNX instead of Mandrake due to some QNX usability issues... oh wait, that reminds me of arch!)
Re:Conflicts (Score:3, Interesting)
Re:darcs (Score:1, Interesting)
ssh root@bkbits.net
# cd
# bk changes -r1.1
ChangeSet@1.1, 1999-12-17 02:18:13-07:00, cort@attis.fsmlabs.com +1 -0
Initial repository create
Yup, that's right, we're pushing 5 years of BK supporting the kernel. In those five years, a multitude of lovely examples of humanity such as yourself have been telling everyone how we are evil corporate jerks who are just out to screw everyone. Just out of curiosity, when we go another 5 years and we haven't screwed you, will we then be considered OK people? No? 10 years? No? So the deal is that you just aren't happy unless it is GPLed. OK, cool, then use the GPLed junk and stop whining. It's all about choice, right? You can choose to use a superior product but you don't get source. You can choose to use an inferior product and you get source.
Or you could stop whining and go try and do as good a job, or hey, a better job than we have done. Maybe after you try that and realize it's 10x harder than you thought and you really don't want to do all that work, maybe, just maybe, you'll show up and say "hey, thanks for BK". Then again, maybe not.
Which for files "backup" ? (Score:1, Interesting)
VMS Versioning - Yess!!! (Score:3, Interesting)
TOPS-10 (not sure about VMS) also had project as well as programmer permissions - kinda like groups but more powerful and useful. Once logged in as a user, you could change projects. Your login would look like, e.g., "user[alex:kerneldev]". Thus files and directories were owned by a project as well as a user, and the system maintained accounting data for both. It was easy to allocate and track work time and resource utilization to projects.
The third big thing I'd really like to have is the transcripting facility in the Perq workstation's text editor. (Perq was an ancient workstation - I have three, will consider selling them as I need the $$.) The editor maintained a transcript of all changes made to the file and stored them on disk. In the event of a crash this transcript could be replayed while you watched. Besides being interesting to watch your own work in fast-time, it allowed recovery from the beginning up through the last block saved. VIM has a short transcript/replay, but it's cumbersome to use for anything more than a few keystrokes. It also has a basic recovery capability but doesn't work as well as this. I dunno about Emacs these days. I once restored a marathon 36-hour programming session (deadlines breed insanity!) using replay. The ideal would be a kind of 'tape' feature in the editor, which one could fast-forward and rewind by using a GUI, and grab that part where you wrote a nifty bit of code (or text), but then backtracked and went a different direction, and now you need that nifty bit.
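The "tape" idea above amounts to keeping an append-only log of edits and rebuilding the buffer from any prefix of it. A minimal sketch, assuming a hypothetical `Transcript` class with only insert and delete operations (this is not how the Perq editor, VIM, or Emacs implement recovery):

```python
class Transcript:
    """Record insert/delete operations so a buffer can be rebuilt up to any step."""

    def __init__(self):
        self.ops = []  # each op is ("insert", pos, text) or ("delete", pos, length)

    def insert(self, pos, text):
        self.ops.append(("insert", pos, text))

    def delete(self, pos, length):
        self.ops.append(("delete", pos, length))

    def replay(self, upto=None):
        """Rebuild the buffer from the first `upto` operations (all of them by default)."""
        buf = ""
        for kind, pos, arg in self.ops[:upto]:
            if kind == "insert":
                buf = buf[:pos] + arg + buf[pos:]
            else:
                buf = buf[:pos] + buf[pos + arg:]
        return buf
```

"Rewinding the tape" is then just replaying fewer operations, which is how you would get back the nifty bit you later deleted:

```python
t = Transcript()
t.insert(0, "nifty bit")
t.delete(0, 9)
t.insert(0, "new direction")
t.replay(1)   # the buffer as it stood after the first edit
```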
Re:Distributed development under arch? (Score:3, Interesting)
That is, if a remote archive contains all of the changesets from the local archive, and I update the local archive with the remote one, the local archive needs to notice this operation (so that it knows I have those changesets), but, from the point of the remote archive, I haven't done anything new, because I only added changesets it already has.
I suspect that the underlying issue is that arch fails to drop empty changes. If you commit after you've just committed, you'll generate an empty changeset and a commit message for it. Really, a changeset should have to change something; otherwise, success should be reported with no revision generated. Similarly, if you are applying a changeset from a star-merge and you have all of the changesets it is composed of (you've gotten everything that was merged in, and there were no additional changes by the merger to resolve conflicts), then no revision should be generated. (You don't even need to note the fact that you've now applied that changeset, because applying it again wouldn't do anything)
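The rule proposed above, that a commit which changes nothing should succeed without generating a revision, can be sketched as follows. This is a toy model: the `Archive` class is hypothetical and represents a revision as nothing more than the set of changeset ids it introduces, which is far simpler than what arch actually stores.

```python
class Archive:
    """Toy archive: a list of revisions, each a frozenset of the changeset ids it introduced."""

    def __init__(self):
        self.revisions = []
        self.known = set()  # every changeset id this archive already has

    def commit(self, changeset_ids):
        """Record a revision only if it brings something new; return it, or None if empty."""
        new = frozenset(changeset_ids) - self.known
        if not new:
            return None  # nothing changed: succeed without creating a revision
        self.revisions.append(new)
        self.known |= new
        return new
```

Under this rule, pushing a merge whose component changesets are all already present returns `None` and leaves the revision list alone, so two archives that hold the same changesets would not drift apart through log messages alone.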
I've actually used arch myself for most of a year now, and found the way a single archive works to be a significant improvement over CVS, but I have yet to get multiple archives to interact nicely. It looks like it would work well for cases where each archive is autonomous and no archive automatically picks up changes from other archives (i.e., the fully distributed case), but not for cases where some archive is kept up-to-date with respect to another archive (any centralization at all). Of course, arch is still ahead of CVS (the other system I have experience with) in that CVS doesn't support any relationships between repositories.