Lock, stock, and barrel: June 2011

2011-06-14

Being cross-platform

Too often developers sing the praises of cross-platform code. I claim that striving to be cross-platform is not always a good idea.

Platform independence is an abstraction. Like any abstraction, it is good when it solves a real problem. But it can just as well be premature, contributing nothing but complexity and inconvenience for both the code's clients and its implementers.

Working with just one platform, you get the benefits of native platform tools: you compile your code with the native build system; you use just one compiler, one standard library, and one set of system primitives; and all that without being forced to work around the endless quirks of each individual component.

If you cannot afford the luxury of developing for just one platform, strive at least to support as small a set of platforms as possible, and keep them as close to each other as makes sense. For example, developing a network server for Linux and FreeBSD can be OK (at least you have POSIX and pretty much the same compiler), but adding Windows to the mix is not so fun. In the same way, developing a desktop game for different Windows versions makes sense, but striving for platform independence only because "one day we may want to run it on Mac" would likely add no value and would definitely inflate your budget and schedule.

After all, you have to stop somewhere: "this application is only going to work on desktops", or "this will be a library to help with mobile development". My point is that the earlier you stop, the better. The less specific and more portable the standard you comply with, the fewer useful primitives you get. In the end you'll be left without threads and directories. Sometimes there is a reason for that.

As with any abstraction, don't try to build this one "just in case": 1) you aren't going to need it: build it on demand instead; 2) you will get it wrong: let it grow organically instead.

That said, this doesn't mean that platform-dependent primitives should proliferate through all your code. On the contrary, your higher-level code should probably not depend on platform-specific low-level details. But hey, that has little to do with being "cross-platform"; it is just how reasonable abstractions are built!
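To make that concrete, here is a minimal sketch (C++14 or later) of keeping a platform detail in one place. All names here (platform::is_path_separator, file_name_offset) are hypothetical and invented for illustration; treating both '/' and '\' as separators on Windows is an assumption of this sketch, not a complete path-handling rule.

```cpp
#include <cstddef>

namespace platform {

// The only platform-specific piece in this sketch.
// Assumption: on Windows both '/' and '\\' separate path components;
// everywhere else only '/' does.
constexpr bool is_path_separator(char c) {
#if defined(_WIN32)
    return c == '/' || c == '\\';
#else
    return c == '/';
#endif
}

}  // namespace platform

// Higher-level code: returns the index where the file name starts.
// It contains no #ifdef of its own; it only asks the platform layer.
constexpr std::size_t file_name_offset(const char* path) {
    std::size_t offset = 0;
    for (std::size_t i = 0; path[i] != '\0'; ++i) {
        if (platform::is_path_separator(path[i])) {
            offset = i + 1;  // the name starts right after the last separator
        }
    }
    return offset;
}
```

Note that this is ordinary layering rather than a portability framework: the single-platform version of the same code would look almost identical, just without the #if.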

Of course, sometimes you can get abstraction from the platform for free -- for example, when you already have a good cross-platform library or tool that does just what you need. In this case there is no reason not to make use of it. Remember, platform independence is not bad in itself, but only when it implies costs that could otherwise be avoided.

2011-06-07

Enums in C++

"Should I use an enumeration in this code, or could I just use a plain integer/boolean type to represent a set of unique integers?"

Enumerations are OK if:

you use the names in 'switch' statements
you use the values as a template parameter and specialize on that parameter
the semantics of some code changes when a new value is added
the semantics of the code should not change when the values of two enumerators in the set are swapped
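A minimal sketch of the first two cases, assuming C++14 or later. Shape, corner_count and IsPolygon are hypothetical names made up for this illustration.

```cpp
// Hypothetical enumeration: the names matter, the underlying values do not.
enum class Shape { Circle, Square, Triangle };

// A switch over the names. With no 'default' label, GCC and Clang can warn
// (-Wswitch) when a newly added enumerator is not handled here.
constexpr int corner_count(Shape s) {
    switch (s) {
        case Shape::Circle:   return 0;
        case Shape::Square:   return 4;
        case Shape::Triangle: return 3;
    }
    return -1;  // only reached for a value outside the enumeration
}

// An enumerator used as a template parameter: the primary template covers
// the general case, and one name gets its own specialization.
template <Shape S>
struct IsPolygon {
    static constexpr bool value = true;
};

template <>
struct IsPolygon<Shape::Circle> {
    static constexpr bool value = false;
};
```

Reordering the enumerators in Shape changes their underlying values but not the behaviour of either piece of code, while adding a new enumerator forces you to revisit the switch. That is exactly the situation where an enumeration pays off.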

You'd better stick with plain integers if:

you routinely iterate through the values in a 'for' loop
you perform arithmetic on the values
the semantics of the code does not change when a new value is added
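For contrast, a small sketch (C++14 or later) of values that are better left as integers. The zoom-level example and the names kZoomLevelCount, scale_at and total_scale are hypothetical.

```cpp
// Hypothetical example: zoom levels are just numbers 0 .. kZoomLevelCount - 1.
constexpr int kZoomLevelCount = 5;

// Arithmetic on the value itself: each level doubles the scale.
constexpr int scale_at(int level) {
    return 1 << level;
}

// Routine iteration over all values in a 'for' loop.
constexpr int total_scale() {
    int sum = 0;
    for (int level = 0; level < kZoomLevelCount; ++level) {
        sum += scale_at(level);
    }
    return sum;
}
```

Raising kZoomLevelCount to 6 requires no other change, and no level has a name worth switching on, so an enumeration would only add casts.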

For values that represent a strict binary choice -- yes/no, forward/backward, good/bad -- it is almost always a great idea to use an enumeration instead of a boolean type: it communicates the meaning of the variable much more clearly.
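A short sketch of the difference at the call site. Direction and step are hypothetical names used only for this illustration.

```cpp
// Hypothetical two-valued enumeration replacing a 'bool forward' parameter.
enum class Direction { Forward, Backward };

// Moves a position by one in the given direction.
constexpr int step(int position, Direction direction) {
    return direction == Direction::Forward ? position + 1 : position - 1;
}
```

A call such as step(10, Direction::Backward) reads unambiguously, whereas with a boolean parameter the equivalent step(10, false) would send the reader to the declaration to find out what false means.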

To sum up, use enumerations only if you are interested in the names and not in the values (with a possible exception for serialization), and if the set of integers is bounded.

2011-06-03

SCM: atomic commits and merges

(Note: this post is more relevant to the centralized world of SCM.)

There are two types of merges that we face in everyday work: merging from a more stable branch to a less stable one (release to trunk, trunk to feature, central to local, etc.), and merging from a less stable branch to a more stable one (the reverse direction, though I hope you never integrate trunk into a release branch).

The first one is for getting the latest (and hopefully somewhat stable) changes.
The second one is for delivering your work to the world.

Typically, by the time you want to integrate, the two branches have diverged by more than one commit.

How do you reconcile the idea of atomic commits with merging? I'll show it using the example of integrating back and forth between the trunk and a feature branch, but the same reasoning applies to any kind of integration/merging (just substitute "more stable" for "trunk" and "less stable" for "feature branch").

When merging changes from the trunk to the feature branch, you should merge change by change, not the whole bulk of pending changes at once. First of all, it is obviously easier to merge this way. More importantly, it lets you test and review each change made on the trunk in isolation. It is a common mistake to merge everything as one lump change: a lump change means a lump diff and a log message like "merge from trunk" -- which definitely doesn't tell you much.

Since every single commit to the trunk is supposed to be atomic, integrating those changes into the feature branch should also remain atomic.

When merging changes from the feature branch to the trunk, you cannot and do not want to integrate change by change: by definition, you created your feature branch because you didn't want to commit those changes to the trunk. So you can only integrate back to the trunk when the state of the feature branch satisfies the policy of the trunk.

(What is "a policy of the branch"? Think of a policy as an invariant of a codeline. The policy of the trunk could be "the code compiles and passes unit tests for the whole system", whereas the policy of some experimental branch could say "commit whatever you want". Clearly you want to integrate your experimental branch back only when it also compiles and passes unit tests.)

Special care is needed to keep such lump commits atomic: your feature branch should have a single purpose. In other words, a branch should be feature-oriented, not component-oriented. Don't mix whitespace fixes, refactorings, bug fixes, and features in one feature branch. For example, if you are working in the GIS domain and optimizing the memory consumption of the routing component, you should have an "optimize-routing-memory" branch; having a "routing" branch for years is plain stupid (to whom it may concern: any coincidences are not).

For distributed SCMs all of this is much simpler, as such systems store the whole graph (rather than trying to convert it to a linear history).

Some articles and books that I recommend reading:
High-level Best Practices in SCM, from the Perforce site: Perforce Software, and especially Laura Wingerd (their VP of Product Technology), are known for evangelizing good practices in (centralized) SCM usage -- not specific to their product only. This article is about... well, high-level best practices in SCM.

Streamed Lines: Branching Patterns for Parallel Software Development: everything about branching, mostly in "pattern language" format: what, how, when, when not, etc.