2011-05-26

SCM: you should make atomic commits

As with functions, modules, classes, tools, and almost everything else in software development, a change that you are going to commit should have a single purpose and should accomplish that purpose. I will refer to such a changeset as an atomic commit.
The most important side effect of an atomic commit is that it produces a diff that is easy to read and understand. In turn, a diff that is easy to read and understand makes you happy because:
- it simplifies your debugging by localizing changes;
- it simplifies code review by localizing changes.

An atomic commit doesn't mix refactoring, bug fixing, development of a new feature, and style changes. Nor does it mix several refactorings, several bug fixes, or several of anything else.

An atomic commit is self-contained and accumulates all the changes that serve its purpose.

An atomic commit has a short, to-the-point log message, which usually contains neither the word 'and' nor lists.

"But I don't have time to fix those minor issues, like capitalization and spacing, separately: I want to do that while working on my primary task at hand!"

If those are minor issues, why bother spending time on them? That's not "fixing", it's polishing. You should polish your product, not your code (unless your code happens to be the product). Otherwise, just get a piece of paper or a text file and add notes about what should be done after you have finished your task.

"But often while working on a task I notice some TODOs or small things that I will forget if I don't fix them right now!"

TODOs are easy to grep, aren't they? Why not make a habit of working through them periodically? Small things can be turned into TODOs so that you don't forget them. Otherwise, just get a notebook or a text file and add notes about what should be done after you have finished your task.

"But dumping small things onto a piece of paper kills my flow!"

And fixing those things while working on a bigger one doesn't? Then forget about the small things.

Even better, switch to a modern SCM. These days modern means distributed, and most often that means git or Mercurial. For those who use such a tool, there is no excuse at all for not producing atomic commits, because in your local repository you can commit absolutely freestyle and then slice the meat you've just produced into nice atomic cuts before upstreaming. (For git users, interactive rebases and partial commits are the primary tools for that.)

"But merging changes from one codeline to another means that the corresponding commit doesn't have a single purpose!"

Wrong. The single purpose of a merge should be delivering a change to another codeline. That's why you should carefully choose the changes you want to merge in one commit. Once again, that is much easier to do with a distributed tool, but it is also simple to do right with svn or Perforce. (I will post more on branching and merging in a subsequent post.)

Example: this is just awful:

$ git diff
diff --git a/my.cpp b/my.cpp
index 6223d3c..4246210 100644
--- a/my.cpp
+++ b/my.cpp
@@ -1,14 +1,17 @@
-int fancy_stuff(int arg)
+int fancyStuff(int n)
 {
-    // do a lot of stuff
-    return arg * 2;
+    // Do a lot of stuff.
+    return n * 2;
 }

-int contrived(int arg)
+int contrivedFunction(int n)
 {
-    if (arg > 0) {
-        return fancy_stuff(arg*2);
-    } else {
-        throw std::runtime_error("arg is negative!");
+    if (n >= 0)
+    {
+        return fancyStuff(n * 2);
+    }
+    else
+    {
+        throw std::runtime_error("n is negative!");
     }
 }


... if all you wanted to say was:

 int contrived(int arg)
 {
-    if (arg > 0) {
+    if (arg >= 0) {
         return fancy_stuff(arg*2);
     } else {
         throw std::runtime_error("arg is negative!");

2011-05-22

Specifications, part IV

In the previous post I talked about how to write specifications for virtual functions. This post is about the second C++ mechanism that uses new code to tune the old: templates.

Templates are probably the most important C++ abstraction-building tool: they allow constructing amazingly powerful abstractions without incurring run-time or memory penalties and without losing type information along the way. I think they should be preferred over virtual functions whenever run-time dispatching is not necessary (choosing between the two is a topic for a separate future blog post, though).

Specify template parameters

Often class and function templates (and member-function templates, and member functions of class templates) expect their template parameters to have certain properties. As usual, it is better to be explicit about what is expected.

Consider the following template:

    /// prints its argument
    template <class T>
    void f(T t)
    {
        t.print();
    }

The code doesn't make sense if type T has no member function print that can be called without arguments. So it's better to state this in the documentation of the function template f:

    /// prints its argument
    /// \pre T has a member-function print which can be called without arguments
    template <class T>
    void f(T t)
    ...

(Note that it is wise to specify the absolute minimum: I'm talking about a function print that can be called without arguments, not about a print that has no parameters.)
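To illustrate the difference, here is a minimal sketch. The types Plain and Indented and the helper demo_print are hypothetical names invented for this example; the point is only that a print with a defaulted parameter still satisfies the precondition of f.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

/// prints its argument
/// \pre T has a member function print which can be called without arguments
template <class T>
void f(T t)
{
    t.print();
}

// Hypothetical type: print() has no parameters at all.
struct Plain
{
    void print() const { std::cout << "plain\n"; }
};

// Hypothetical type: print() has a parameter, but its default value
// makes it callable without arguments, so the precondition still holds.
struct Indented
{
    void print(int indent = 2) const
    {
        std::cout << std::string(indent, ' ') << "indented\n";
    }
};

// Calls f on both types and returns everything written to std::cout.
std::string demo_print()
{
    std::ostringstream captured;
    std::streambuf* old = std::cout.rdbuf(captured.rdbuf());
    f(Plain());
    f(Indented());
    std::cout.rdbuf(old);   // restore the real std::cout
    assert(!captured.str().empty());
    return captured.str();
}
```

Had the precondition demanded "print that has no parameters", Indented would have been rejected by the specification for no good reason.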

This example is somewhat contrived: if a client programmer misuses f, a compilation error will occur.
Sometimes, though, deciphering a compilation error gets tricky because the instantiation that causes the error is buried deep in a chain of nested instantiations. Template parameter constraints help with this kind of thing: you can have your types checked by the type system, as close to the source of the violation as possible. See Bjarne Stroustrup's FAQ on this.
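One possible way to check such a constraint at the entry point is sketched below. It relies on C++11 static_assert and is not necessarily the technique described in the FAQ; the trait has_print, the types Printable and Opaque, and the helper demo_checked_f are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when `t.print()` is a valid expression for a T.
template <class T, class = void>
struct has_print : std::false_type {};

template <class T>
struct has_print<T, decltype(static_cast<void>(std::declval<T&>().print()))>
    : std::true_type {};

/// prints its argument
/// \pre T has a member function print which can be called without arguments
template <class T>
void f(T t)
{
    // Checked right here, so a violation is reported at the call to f
    // rather than somewhere deep inside nested instantiations.
    static_assert(has_print<T>::value,
                  "f<T>: T must have a member function print callable without arguments");
    t.print();
}

struct Printable { void print() const {} };   // satisfies the precondition
struct Opaque {};                             // violates it

// Usage: f(Printable()) compiles; f(Opaque()) would be rejected
// with the message given in the static_assert above.
void demo_checked_f()
{
    f(Printable());
    assert(has_print<Printable>::value);
    assert(!has_print<Opaque>::value);
}
```

The check duplicates the \pre comment in a form the compiler can enforce, but it covers only the syntactic part of the requirement.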

More importantly, the compiler cannot verify everything: it checks syntax and basic things (like the presence of print() with a compatible signature), but it will not help you with semantics that are not expressed in the C++ type system. Examples are complexity, exception guarantees, specific side effects, and commutativity or associativity of an operation.

Consider std::vector<>. It requires that the type of the elements it holds is CopyConstructible, and CopyConstructible implies specific semantics beyond the mere presence of a publicly available copy constructor. If you violate this requirement by instantiating std::vector with std::auto_ptr<SomeType>, you are not guaranteed a compilation error. Instead, you get undefined behavior (which in this case can mean a crash at run time).
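The semantic part of the requirement can be sketched without std::auto_ptr itself (which was removed from the standard library in C++17). StealingPtr below is a hypothetical, simplified stand-in: it has an accessible copy constructor, yet copying empties the source, so the copy and the original are not equivalent afterwards.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for std::auto_ptr<int>:
// its "copy" operations take ownership away from the source object.
class StealingPtr
{
public:
    explicit StealingPtr(int* p = nullptr) : p_(p) {}

    StealingPtr(const StealingPtr& other) : p_(other.p_) { other.p_ = nullptr; }

    StealingPtr& operator=(const StealingPtr& other)
    {
        if (this != &other) {
            delete p_;
            p_ = other.p_;
            other.p_ = nullptr;
        }
        return *this;
    }

    ~StealingPtr() { delete p_; }

    int* get() const { return p_; }

private:
    mutable int* p_;   // mutable so that "copying" can modify the source
};

// Returns true when the source is left equivalent to the copy,
// which is what a CopyConstructible type has to guarantee.
bool copy_preserves_source()
{
    StealingPtr original(new int(42));
    StealingPtr copy(original);          // compiles like an ordinary copy
    return original.get() != nullptr     // but the source is now empty
        && *original.get() == *copy.get();
}
```

The compiler accepts the copy without complaint, and copy_preserves_source() returns false: exactly the kind of violation that only a written specification can rule out.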

Another example of "semantic requirements" is the usual requirement that any clean-up function and any user-defined swap operation must not throw: otherwise generic transactional code is impossible to implement correctly.

A third example is a user-defined template that operates on a container and promises O(N) run time in terms of the number of elements in the container:
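A minimal sketch of why the no-throw requirement matters, assuming a hypothetical class Document and helper demo_assign: the assignment below is transactional only because the final swap cannot fail. The noexcept specifier (available since C++11) records the promise, but the compiler does not prove that the body keeps it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class used only for illustration.
class Document
{
public:
    explicit Document(const std::string& title) : title_(title) {}

    /// \post never throws
    void swap(Document& other) noexcept
    {
        title_.swap(other.title_);
        lines_.swap(other.lines_);
    }

    // Transactional assignment: all work that may throw happens on a
    // temporary; the commit is a swap that is required not to throw.
    Document& operator=(const Document& other)
    {
        Document tmp(other);   // may throw, *this is still untouched
        swap(tmp);             // must not throw, so the result is all-or-nothing
        return *this;
    }

    const std::string& title() const { return title_; }

private:
    std::string title_;
    std::vector<std::string> lines_;
};

// The declaration can be checked; the actual behavior is still a promise.
static_assert(noexcept(std::declval<Document&>().swap(std::declval<Document&>())),
              "Document::swap must be declared noexcept");

// Usage: after the assignment, a holds a copy of b's state.
std::string demo_assign()
{
    Document a("draft");
    Document b("final");
    a = b;
    assert(b.title() == "final");   // the source is unchanged
    return a.title();
}
```

If swap could throw halfway through, *this might be left with the new title and the old lines, and no generic code built on top of it could restore consistency.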

    /// \pre Container is an STL-like container
    /// \pre Op is a class that provides a context-aware operator() callable
    ///      with a Container and an element of the type held by Container
    /// \post run-time complexity is O(c.size())
    template <class Container, class Op>
    void do_something(Container& c, Op f)
    {
        // typename is required: const_iterator is a dependent type
        for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
            f(c, *it);
    }

Note that to keep its promise, it should in general require that Op::operator() itself has O(1) run-time complexity! If all a client sees is the declaration of do_something and its comments, this requirement should be specified there as well.
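The effect of a missing complexity requirement can be sketched as follows. AddToSum, ScanAll, and the two helper functions are hypothetical names for this example; both operations compile equally well, but only the first one lets do_something keep its O(N) promise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// \pre Op::operator() has O(1) run-time complexity
/// \post run-time complexity is O(c.size())
template <class Container, class Op>
void do_something(Container& c, Op f)
{
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        f(c, *it);
}

// Hypothetical operation: O(1) per call, so the O(N) promise holds.
struct AddToSum
{
    long* sum;
    void operator()(const std::vector<int>&, int x) const { *sum += x; }
};

// Hypothetical operation: walks the whole container on every call,
// so the total work becomes O(N^2) although the code compiles just as well.
struct ScanAll
{
    std::size_t* steps;
    void operator()(const std::vector<int>& c, int) const
    {
        for (std::size_t i = 0; i < c.size(); ++i)
            ++*steps;
    }
};

// Sums n ones with the O(1) operation.
long sum_of_ones(std::size_t n)
{
    std::vector<int> v(n, 1);
    long sum = 0;
    AddToSum op = { &sum };
    do_something(v, op);
    return sum;
}

// Counts the inner steps performed by the O(N) operation.
std::size_t inner_steps(std::size_t n)
{
    std::vector<int> v(n, 1);
    std::size_t steps = 0;
    ScanAll op = { &steps };
    do_something(v, op);
    return steps;
}
```

For a container of 10 elements, ScanAll performs 100 inner steps, which is the quadratic growth that the extra \pre line is meant to exclude.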

As with virtual functions, specifying a template is only one part of ensuring correctness. The other part is, of course, providing type parameters that satisfy the specification.