2011-06-03

SCM: atomic commits and merges

(Note: this post is mostly relevant to the centralized world of SCM.)

There are two types of merges that we face in everyday work: merging from a more stable branch to a less stable one (release to trunk, trunk to feature, central to local, etc.), and merging from a less stable branch to a more stable one (the reverse direction, though I hope you never integrate trunk into a release branch).

The first is for getting the latest (and hopefully somewhat stable) changes.
The second is for delivering your work to the world.

Typically, by the time you want to integrate, the two branches have diverged by more than one commit.

How do we reconcile the idea of atomic commits with merging? I'll show it with the example of integrating back and forth between the trunk and a feature branch, but the same reasoning applies to any kind of integration/merging (just replace "trunk" with "more stable" and "feature branch" with "less stable").

When merging changes from the trunk to the feature branch, you should merge change by change, not the whole bulk of pending changes at once. First of all, it is obviously easier to merge this way. More importantly, it lets you test and review each change made on the trunk in isolation. Merging everything as one lump is a common mistake: a lump change means a lump diff and a log message like "merge from trunk", which definitely doesn't tell you much.

As every single commit to the trunk is supposed to be atomic, integrating those changes into the feature branch should remain atomic as well.
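The change-by-change approach can be sketched as a small script that builds one merge and one commit command per trunk revision. This is only an illustration under assumptions that the post does not make: it assumes Subversion (`svn merge -c REV SOURCE TARGET` and `svn commit -m`), a trunk located at `^/trunk`, and made-up revision numbers and log messages. The function name `per_change_merge_plan` is hypothetical. It only builds the command strings and does not run them.

```python
import shlex
from typing import List, Tuple


def per_change_merge_plan(trunk_url: str, changes: List[Tuple[int, str]]) -> List[str]:
    """Build one merge + one commit command per trunk change, oldest first.

    `changes` is a list of (revision, original log message) pairs.
    Assumption: Subversion is the SCM; adapt the commands to your tool.
    """
    commands = []
    for rev, message in sorted(changes):
        # Merge exactly one trunk change into the working copy of the feature branch.
        commands.append(f"svn merge -c {rev} {shlex.quote(trunk_url)} .")
        # (Build, test and review here, before committing.)
        # Keep the original message so the log says more than "merge from trunk".
        log = f"Merge r{rev} from trunk: {message}"
        commands.append(f"svn commit -m {shlex.quote(log)}")
    return commands


# Hypothetical revisions and messages, for illustration only.
example = [(1204, "Fix crash on empty route"), (1198, "Add route cache")]
for line in per_change_merge_plan("^/trunk", example):
    print(line)
```

Each trunk change ends up as its own commit on the feature branch, with its own diff and its own meaningful log message.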



When merging changes from the feature branch to the trunk, you cannot and do not want to integrate change by change: by definition, you created the feature branch because you didn't want to commit those changes to the trunk. So you can only integrate back into the trunk once the state of the feature branch satisfies the policy of the trunk.


(What is "the policy of a branch"? Think of a policy as an invariant of a codeline. The policy of the trunk could be "the code compiles and passes the unit tests for the whole system", whereas the policy of some experimental branch could be "commit whatever you want". Clearly, you want to integrate your experimental branch back only when it, too, compiles and passes the unit tests.)
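One way to picture a policy as an invariant is as a predicate over the state of a codeline. The sketch below is purely illustrative: `CodelineState`, `trunk_policy`, `experimental_policy` and `can_integrate` are hypothetical names, and the two boolean fields stand in for whatever your real build and test checks report.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CodelineState:
    """Hypothetical snapshot of a codeline's health."""
    compiles: bool
    unit_tests_pass: bool


# A policy is an invariant: a predicate that must hold for a codeline.
Policy = Callable[[CodelineState], bool]


def trunk_policy(state: CodelineState) -> bool:
    # "The code compiles and passes the unit tests for the whole system."
    return state.compiles and state.unit_tests_pass


def experimental_policy(state: CodelineState) -> bool:
    # "Commit whatever you want."
    return True


def can_integrate(source_state: CodelineState, target_policy: Policy) -> bool:
    """Integration is allowed only if the source satisfies the target's policy."""
    return target_policy(source_state)


work_in_progress = CodelineState(compiles=True, unit_tests_pass=False)
finished = CodelineState(compiles=True, unit_tests_pass=True)
print(can_integrate(work_in_progress, trunk_policy))
print(can_integrate(finished, trunk_policy))
```

The check is deliberately one-directional: the experimental branch accepts anything, but the trunk accepts the branch only once the branch meets the stricter invariant.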

Special care is needed to keep such lump commits atomic: your feature branch should have a single purpose. In other words, a branch should be feature-oriented, not component-oriented. Don't mix whitespace fixes, refactorings, bug fixes, and features in one feature branch. For example, if you work in the GIS domain and are optimizing the memory consumption of the routing component, you should have an "optimize-routing-memory" branch; having a "routing" branch for years is plain stupid (to whom it may concern: any resemblance is not coincidental).




For distributed SCMs all of this is much simpler, as such systems store the whole commit graph (instead of trying to flatten it into a linear history).

Some articles and books that I recommend reading:

High-level Best Practices in SCM, from the Perforce site: Perforce Software, and especially Laura Wingerd (their VP of Product Technology), are known for evangelizing good practices in (centralized) SCM usage that are not specific to their product. This article is about... well, high-level best practices in SCM.

Streamed Lines: Branching Patterns for Parallel Software Development: everything about branching, mostly in a "pattern language" format: what, how, when, when not, etc.
