Supybot Website

Changeset names and ids.

Originally, I'd planned to give changesets an arbitrary unique identifier, such as might be created by the uuidgen program. It's been shown to me, however, that there's a significant problem with such a plan.

In Darcs, patches are named not by their hashes but by something similar to a uuid, based on the email address of the person creating the patch, the current time, and some other information. As a result of this design decision, there is an attack against any project using Darcs.
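To make the weakness concrete, here is a minimal sketch of a uuid-like naming scheme modeled loosely on the description above (email, timestamp, some random bits). `make_patch_name` and `Patch` are hypothetical illustrations, not Darcs's actual format or code. The point is that the name is derived from who made the patch and when, never from what the patch contains, so nothing ties a name to its contents.

```python
import hashlib
import os
import time
from dataclasses import dataclass


def make_patch_name(email: str) -> str:
    """Hypothetical uuid-like name built from author, time, and randomness."""
    seed = f"{email}|{time.time()}|{os.urandom(8).hex()}".encode("utf-8")
    # Hashing only gives a fixed-length string; the contents are never involved.
    return hashlib.sha1(seed).hexdigest()


@dataclass
class Patch:
    # Hypothetical patch record: the repository trusts `name` as its identity.
    name: str
    contents: str


honest = Patch(make_patch_name("alice@example.com"), "+ x = 1\n")
# Nothing stops anyone from reusing that name on different contents.
forged = Patch(honest.name, "+ x = 2\n")
```

Both patches carry an identical name even though their contents differ, which is exactly the collision Mallory exploits.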

Imagine that a project has two primary developers, "Alice" and "Bob." Alice runs the "official" repository for the project; Bob is a significant lieutenant in the project. Let's imagine that an attacker, "Mallory," wants to get a backdoor into the project. Mallory generates two patches, each inserting reasonable-looking lines, and gives them the same name (in normal use of Darcs they would have different names, but nothing prevents Mallory from renaming patches to get a collision). She sends the first patch (we'll call it p1) to Alice and the second (with the same name but different contents; we'll call it p2) to Bob. Both developers accept these patches, because both seem reasonable. Then Mallory creates another patch (we'll call this one p3) which, when applied to Bob's repository (with p2), does something entirely innocuous, but when applied to Alice's repository (with p1), creates an insidious backdoor. Bob gets p3, looks at it, applies it to his tree, and decides it's good. He then tells Alice to pull the patch from his repository. Since Alice already has a patch with p2's name, her pull brings over only p3, which lands on top of p1, and boom! There's a backdoor in the project.
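The crux of the scenario is that a pull decides what is "new" by name alone. The toy model below is a sketch under that assumption: `Repo`, `Patch`, and `pull` are hypothetical, and patch contents are just descriptive strings. It shows Alice's pull bringing over p3 but silently skipping p2, because she already holds something called "fix-typo".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Patch:
    name: str
    contents: str


@dataclass
class Repo:
    # Hypothetical name-keyed store: a patch "is" whatever its name says.
    patches: dict = field(default_factory=dict)

    def apply(self, patch: Patch) -> None:
        self.patches[patch.name] = patch

    def pull(self, other: "Repo") -> list:
        """Copy every patch from `other` whose *name* we don't already have."""
        new = [p for n, p in other.patches.items() if n not in self.patches]
        for p in new:
            self.apply(p)
        return new


p1 = Patch("fix-typo", "insert a comment at line 1")
p2 = Patch("fix-typo", "insert a comment at line 2")  # same name, different change
p3 = Patch("tidy-up", "delete line 2")

alice, bob = Repo(), Repo()
alice.apply(p1)
bob.apply(p2)
bob.apply(p3)

pulled = alice.pull(bob)  # only p3 comes across
```

After the pull, Alice's "fix-typo" is still p1 while Bob's is p2, and neither repository reports any discrepancy.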

It's worth noting that this attack doesn't even require p1 and p2 to insert different contents: they could insert the same content in two slightly different locations, and a p3 could still be constructed that inserts a backdoor into the project. In that case, even if Alice and Bob discuss the patches out of band, they'll be highly unlikely to discover the difference (quick: name the line number that the last patch you read applied to).
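Here is a deliberately simplified sketch of that variant, assuming patches are bare line operations addressed by position with no context check (real patch formats carry more information than this). The helpers `insert_line` and `delete_line` and the sample function are hypothetical. p1 and p2 insert the identical comment one line apart; p3, "delete line 2" (0-based), removes a stray comment on Bob's tree but the authorization check on Alice's.

```python
def insert_line(lines, index, text):
    """Return a copy of `lines` with `text` inserted before position `index`."""
    return lines[:index] + [text] + lines[index:]


def delete_line(lines, index):
    """Return a copy of `lines` with the line at position `index` removed."""
    return lines[:index] + lines[index + 1:]


base = [
    "def handle(request):",
    "    check_auth(request)",
    "    log(request)",
    "    return serve(request)",
]

# p1 and p2 share a name and insert the *same* comment, one line apart.
alice = insert_line(base, 1, "    # reviewed")  # p1, accepted by Alice
bob = insert_line(base, 2, "    # reviewed")    # p2, accepted by Bob

# p3 deletes position 2. On Bob's tree that's the redundant comment...
bob_after = delete_line(bob, 2)
# ...but on Alice's tree the same position holds the authorization check.
alice_after = delete_line(alice, 2)
```

Bob reviews p3 against his own tree and sees it simply undo a comment; the identical patch on Alice's tree drops `check_auth`.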

So there's a significant security risk in identifying patches (changesets, technically, but I'll use the terms relatively interchangeably; when I'm referring to the actual Patch objects that make up a Changeset in SDF, I'll always capitalize the term) by uuid. We need to prevent attacks like the one described above, but how? What we need is a "self-authenticating id," one that cannot be forged. The simplest way to achieve this, I think, is to use a cryptographic hash of the changeset as its id. That trivially prevents this attack: the changeset Alice received would not be the same as the changeset Bob received, and so would not have the same id. If p3 depended on p1 or p2 (it couldn't depend on both, of course), then one developer or the other couldn't apply it. And if p3 depended on neither, then when Alice pulled from Bob (or Bob pulled from Alice), each would see the other's version (p2 in Bob's repository, p1 in Alice's) as a changeset they didn't have.
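A minimal sketch of content-addressed ids, assuming a changeset is just a message plus a diff; `Changeset`, its `id` property, and `pull` are hypothetical. SHA-1 is used only because the example id later in this post is 40 hex digits, the length of a SHA-1 digest; the post doesn't say which hash SDF uses, and a new design today would likely prefer SHA-256. Each field is length-prefixed before hashing so that different (message, diff) pairs can't concatenate to the same bytes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    message: str
    diff: str

    @property
    def id(self) -> str:
        # Self-authenticating id: a hash over everything the changeset contains.
        h = hashlib.sha1()
        for part in (self.message, self.diff):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))  # length prefix: unambiguous framing
            h.update(data)
        return h.hexdigest()


def pull(mine: dict, theirs: dict) -> list:
    """Return changesets in `theirs` whose ids are missing from `mine`."""
    return [cs for cs_id, cs in theirs.items() if cs_id not in mine]


p1 = Changeset("fix typo", "insert comment at line 1")
p2 = Changeset("fix typo", "insert comment at line 2")  # same title, different contents

alice = {p1.id: p1}
bob = {p2.id: p2}
```

Here `pull(alice, bob)` returns p2 and `pull(bob, alice)` returns p1, so each developer sees the other's variant as new instead of silently treating the two as one.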

The problem with using a cryptographic hash as an id is that it's practically impossible for a mere human to remember. If changesets were identified by an email address and an auto-incrementing number (say, jemfinch@supybot.com:123) then it would probably be relatively easy for developers to remember and discuss a given changeset; when the changeset id is 9b17d94539a22b722e90f8119cfac17b28a6ad31 it's significantly harder to remember or discuss. So to facilitate social use and human interaction, SDF needs to provide a better name for humans to refer to changesets.

We could, of course, use the email:number system mentioned earlier, but since we no longer have to concern ourselves with uniqueness, we might as well pick something even more understandable: the first line of the commit message seems nicely appropriate. So while the system uses the id of a changeset to prevent attacks like the one described above, humans can refer to changesets by names that other humans gave them.
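A sketch of the two-level naming, assuming the human name is the first line of the commit message and the id is a content hash as above. `Changeset`, `changeset_id`, and `NameIndex` are hypothetical helpers. Because names are no longer required to be unique, looking one up can return several ids; the hash remains the only identifier the system itself trusts.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


def changeset_id(message: str, diff: str) -> str:
    """Hypothetical content hash with length-prefixed fields."""
    h = hashlib.sha1()
    for part in (message, diff):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


@dataclass(frozen=True)
class Changeset:
    message: str
    diff: str

    @property
    def id(self) -> str:
        # Key: unique and unforgeable, but unreadable.
        return changeset_id(self.message, self.diff)

    @property
    def name(self) -> str:
        # Human-facing name: the first line of the commit message.
        return self.message.splitlines()[0] if self.message else ""


class NameIndex:
    """Hypothetical index from human-readable names to changeset ids."""

    def __init__(self):
        self._ids = defaultdict(set)

    def add(self, cs: Changeset) -> None:
        self._ids[cs.name].add(cs.id)

    def lookup(self, name: str) -> set:
        # Names need not be unique, so a lookup may yield several ids.
        return set(self._ids.get(name, ()))


a = Changeset("Fix off-by-one in parser\n\nLonger explanation here.", "diff A")
b = Changeset("Fix off-by-one in parser", "diff B")
index = NameIndex()
index.add(a)
index.add(b)
```

Both changesets share the name "Fix off-by-one in parser", so `index.lookup` returns both ids; humans talk in names, while the tools act on ids.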

(I'm particularly indebted to Zooko Wilcox-O'Hearn for explaining the attack against Darcs described above, and for discussing possible ways to resolve the human-unfriendliness of hashes. Zooko invented Zooko's triangle, to which Mark Miller responded with pet names (Marc Stiegler has written a gentle introduction to pet names). In SDF, the title of a changeset will be its pet name, while its hash will be its "key.")

Posted by jemfinch at 14:30 on April 25, 2005
