2011-09-20

On overtime

It is widely acknowledged these days that chronic overtime is bad: it makes your team unproductive, unmotivated and unhappy.

At the same time, never having to put in any extra hours could indicate that one's work is unimportant. Or your schedule is too conservative -- which is both bad economically and demotivating.

So I think it is actually useful to have an occasional burst during the development cycle. Not only can it help to meet a tight deadline, prepare a demo, polish the last glitches before a release, or even -- yes -- simply make up for a mistake made during time estimation; it can also motivate and unify the development team. It is healthy to push yourself once in a while.

For me, an "occasional burst" means working harder than my long-term pace allows for about a week or two, twice a year -- otherwise it's neither "occasional" nor a "burst" anymore.

2011-09-10

Templates vs. virtual functions in C++, part II

In part I, I wrote that templates should generally be preferred to virtual functions if dynamic polymorphism is not absolutely needed.
In this part I will deal with some common arguments in favor of object-oriented style interfaces and some criticisms of templates.

So why do C++ developers often prefer interfaces with virtual functions to solutions based on templates? Here is the list of motives I'm aware of:

- code that uses OO interfaces can be hidden in .cpp/.cc files, whereas templates force you to expose the whole code in the header file;
- templates will cause code bloat;
- OO interfaces are explicit, whereas requirements on template parameters are implicit and exist only in the developer's head;
- heavy usage of templates hurts compilation speed.

Let's scrutinize them in order.

Code that uses OO interfaces can be hidden in .cpp/.cc files, whereas templates force you to expose the whole code in the header file

Here is what code that works with object-oriented hierarchies usually looks like:

    // in header file
    void f(Base const& t);

    // in source file
    void f(Base const& t)
    {
        // ...
        // hairy implementation
        // ...
    }

So nobody sees the implementation of f() (*).

Here is how templates are usually implemented:

    // in header file
    template <class T>
    void f(T const& t)
    {
        // ...
        // hairy implementation
        // not only hurts aesthetic feelings of every visitor
        // but also couples the interface with dependencies of the implementation
        // ...
    }

The aesthetics can be fixed very easily by extracting the implementation into a separate "implementation" header file.

The second problem -- exposing the dependencies of the implementation -- is more difficult. It can be fixed, but only if you know the set of types that your template will be instantiated with -- just provide explicit instantiations of the template functions/classes with those arguments:

    // in header file
    template <class T>
    void f(T const& t);

    // in source file
    template <class T>
    void f(T const& t)
    {
        // ...
        // hairy implementation is now hidden
        // ...
    }

    // explicit instantiation: must follow the definition,
    // so it lives in the source file, not in the header
    template
    void f<T1>(T1 const&);

    // another one
    template
    void f<T2>(T2 const&);

Templates will cause code bloat

Usually your code will actually shrink compared to a non-template version -- because only those template functions, classes and member functions are instantiated that you actually use in your code, and only with the type arguments you actually use! (**)

    void f()
    {
        std::vector<int> v;
        v.push_back(42);
        // only constructor, destructor, push_back() 
        // and whatever they use internally will be instantiated,
        // and only for type argument int
    }
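The same rule can be observed directly. Below is a minimal sketch, assuming a hypothetical Holder class template (Holder, length() and holder_demo() are made up for illustration): the body of length() would not compile for int, yet Holder<int> is perfectly usable as long as nobody calls it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper, used only for illustration.
template <class T>
class Holder
{
public:
    explicit Holder(T const& value) : value_(value) {}

    T const& get() const { return value_; }

    // Valid only for types that have a size() member function.
    // For Holder<int> this body would be ill-formed, but it is
    // not instantiated unless somebody actually calls it.
    std::size_t length() const { return value_.size(); }

private:
    T value_;
};

int holder_demo()
{
    Holder<int> number(42);           // constructor instantiated for int
    Holder<std::string> text("abc");  // separate instantiation for std::string

    assert(text.length() == 3);       // length() instantiated for std::string only
    return number.get();              // get() instantiated for int
}
```

Adding a call to number.length() would turn this into a compile error, which shows that the member function really was left uninstantiated until then.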

Bloat can happen, though, if you are not careful. The classic example is containers that hold pointers. In a C implementation there would be just one container that holds void*, plus a couple of macros for convenient (but very unsafe) casting of those pointers to specific types. In a naive C++ implementation, a bunch of functions is generated for each pointer type:

    std::vector<int*> ints;
    std::vector<float*> floats;
    std::vector<Goblin*> goblins;
    ints.push_back(0);
    floats.push_back(0);
    goblins.push_back(0);
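    // three copies of push_back() will be generated... or not?

A container implementation can avoid this, though, by providing a specialization for pointers (the technique Stroustrup describes in The C++ Programming Language; not every standard library actually applies it to std::vector):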

    // primary template
    template <class T> class vector; // additional template parameters omitted

    // full specialization for void*
    template <> class vector<void*>
    {
    public:
        // here comes the real implementation
        void push_back(void* p) { /* ... */ }
    };

    // partial specialization for all other pointer types
    template <class T> class vector<T*> : private vector<void*>
    {
    private:
        typedef vector<void*> base;
    public:
        // forwards to vector<void*> with a cast;
        // inlined, so it doesn't cause any code bloat compared with
        // the version one would implement without templates
        // (and it is safe: the user cannot screw up)
        void push_back(T* p) { base::push_back(static_cast<void*>(p)); }
    };

OO interfaces are explicit, whereas requirements on template parameters are implicit

This one is sad but absolutely true. Remember the example from part I:

    template <class T>
    void f1(T const& t)
    {
        // no requirements on T except in comments if you are lucky
        bool flag = t.do_something();
    }

    // serves as an explicit specification
    class Base
    {
    public:
        virtual bool do_something() const;
        // ...
    };

    void f2(Base const& t)
    {
        // explicit requirement is Base's class interface
        bool flag = t.do_something();
    }

Concepts would have solved this problem, but unfortunately they were dropped from C++11. Let's hope that they appear in the next Standard.

Until then, you have two options.
  1. You can specify requirements in comments and documentation (as the SGI STL documentation and the C++ Standard do).
  2. Or you can emulate concepts. The Boost Concept Check Library is a nice tool for that. Finally, you can at least use constraints in the template's implementation (***)
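
A constraint in the implementation can be sketched with C++11 tools alone. The code below is a hand-rolled illustration, not Boost's mechanism: has_do_something is a hypothetical trait written for this example, and Goblin and Rock are made-up test types.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when `t.do_something()` is valid for a const T
// and its result is convertible to bool.
template <class T, class = void>
struct has_do_something : std::false_type {};

template <class T>
struct has_do_something<
    T,
    typename std::enable_if<
        std::is_convertible<
            decltype(std::declval<T const&>().do_something()),
            bool>::value>::type>
    : std::true_type {};

template <class T>
bool f1(T const& t)
{
    // The requirement is checked up front, so a violation is reported
    // with a readable message instead of an error deep inside the body.
    static_assert(has_do_something<T>::value,
                  "T must provide do_something() const convertible to bool");
    return t.do_something();
}

struct Goblin { bool do_something() const { return true; } };
struct Rock {};  // deliberately does not meet the requirement

bool constraint_demo()
{
    assert(!has_do_something<Rock>::value);
    return f1(Goblin());  // f1(Rock()) would trip the static_assert
}
```

Note that the check still sits inside f1's body, so it gives early diagnostics but does not document the interface at the declaration.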

Heavy usage of templates hurts compilation speed

This one is also true. Compilation slows down for the following reasons:
  • "real code" is generated from templates, and that takes time. Not much can be done about this "issue". (Alternatively you can write all the code by hand, but that would take even more time);
  • templates are usually implemented in header files (see the first point), and thus increase preprocessing time and introduce new dependencies that every user of the template inherits. Sometimes that can be mitigated with explicit instantiation, other times with careful dependency management (not including what can merely be forward declared, etc.). Sometimes you can just live with it, and other times you can consider using another abstraction tool instead of templates.

All in all, in my opinion, templates and static polymorphism should be considered the basic design tool in modern C++ -- especially for libraries -- and object-oriented techniques should be considered only after them, not be something you start with.

_____________________________________________________________________

(*) Unless the developer wants to make it inline, which s/he usually doesn't -- if the efficiency of this function were so important, s/he wouldn't use virtual functions here.

(**) Subject to some restrictions: for instance, virtual functions are usually instantiated even when they are never called, [provide second example]. For more details, read the book C++ Templates -- The Complete Guide.

(***) Constraints solve another (but closely related) problem: early diagnostics of violations of type requirements. Unfortunately they are poorly suited for documenting an interface, as they are specified in the implementation, not in the interface.

2011-09-04

Templates vs. virtual functions in C++, part I

Virtual functions and templates are two major tools for supporting polymorphism in C++.

Virtual functions are the basic building blocks for supporting dynamic polymorphism, and thus the object-oriented paradigm, in C++. Templates are useful for generic programming and metaprogramming; they allow static polymorphism.

Each tool has its own applications. Here is some advice from Bjarne Stroustrup, from his book The C++ Programming Language, 13.8:
  1. Prefer a template over derived classes when run-time efficiency is at a premium.
  2. Prefer derived classes over a template if adding new variants without recompilation is important.
  3. Prefer a template over derived classes when no common base can be defined.
  4. Prefer a template over derived classes when built-in types and structures with compatibility constraints are important.

Note that this leaves only one case for virtual functions: when adding new variants without recompilation is important. If you don't need this benefit, then start with templates. There are several reasons for that:

Templates are non-intrusive

Once a type meets the requirements imposed by a function or class template, it can be used unmodified. Usually that means the type is required to provide some functions or member functions with predefined semantics.

To pass an object to a function that operates via pointers/references to the root of some hierarchy, the type has to be specially prepared (either it is derived from this root, directly or indirectly, from the beginning, or it is adapted later) and has to have some member functions with the exact signature (*):

    template <class T>
    void f1(T const& t)
    {
        // no requirements on T except that it should provide
        // member-function do_something that
        // returns something convertible to bool
        // and can be called without arguments (so it can have e.g. default parameters)
        bool flag = t.do_something();
    }

    class Base
    {
    public:
        virtual bool do_something() const;
        // ...
    };

    void f2(Base const& t)
    {
        // first, t should be of class derived from Base
        // second, Derived::do_something() should follow 
        // the signature of Base::do_something()
        bool flag = t.do_something();
    }
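
To make the contrast concrete, here is a small sketch under stated assumptions: Sensor and Legacy are hypothetical types invented for this example, and f1/f2 are changed to return the flag so that the result can be checked.

```cpp
#include <cassert>

// Hypothetical type written without any knowledge of f1 or Base.
struct Sensor
{
    // Not virtual, returns int rather than bool, has a default argument.
    int do_something(int retries = 1) const { return retries; }
};

class Base
{
public:
    virtual ~Base() {}
    virtual bool do_something() const = 0;
};

// Hypothetical type that was prepared for the hierarchy from the beginning.
class Legacy : public Base
{
public:
    virtual bool do_something() const { return true; }
};

template <class T>
bool f1(T const& t) { return t.do_something(); }

bool f2(Base const& t) { return t.do_something(); }

bool intrusiveness_demo()
{
    bool const a = f1(Sensor());  // OK: Sensor is used unmodified
    bool const b = f1(Legacy());  // OK: the template accepts hierarchy members too
    bool const c = f2(Legacy());  // OK: Legacy derives from Base
    // f2(Sensor());              // would not compile: Sensor is not a Base
    return a && b && c;
}
```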

Templates produce faster and smaller code

Call it "premature optimization" if you like. But if you are implementing a library you often don't know what will be on the critical path of an application using it. And if you are implementing a C++ library, you'd better make everything as fast as possible, and without using unnecessary memory.

    template <class T>
    void f1(T const& t)
    {
        // no virtual call overhead
        // t doesn't have to store a pointer to vtable
        bool flag = t.do_something();
    }

    void f2(Base const& t)
    {
        // probably virtual call overhead
        // t stores a pointer to vtable
        bool flag = t.do_something();

        // that can be undesirable, especially for small objects
        // and trivial and/or frequently called operations
    }
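
The memory cost is easy to check on your own compiler. A minimal sketch, assuming two hypothetical flag types that differ only in "virtuality"; the exact sizes are implementation-specific, but on common ABIs the virtual version carries an extra vtable pointer.

```cpp
#include <cassert>

// Hypothetical small type with no virtual functions.
struct PlainFlag
{
    bool do_something() const { return value; }
    bool value;
};

// The same data, but with a virtual interface.
struct VirtualFlag
{
    virtual ~VirtualFlag() {}
    virtual bool do_something() const { return value; }
    bool value;
};

template <class T>
bool f1(T const& t) { return t.do_something(); }    // resolved at compile time

bool f2(VirtualFlag const& t) { return t.do_something(); }  // dispatched via vtable

bool overhead_demo()
{
    PlainFlag plain;
    plain.value = true;
    VirtualFlag virt;
    virt.value = true;

    // Assumption: holds on mainstream implementations, where the
    // polymorphic object stores a hidden pointer to its vtable.
    assert(sizeof(VirtualFlag) > sizeof(PlainFlag));
    return f1(plain) && f2(virt);
}
```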

Templates don't lose information about types

    template <class T>
    void f1(T const& t)
    {
        // here you know the exact type of t and can make further decisions
        // based on it (perhaps using traits or strategies)
    }

    void f2(Base const& t)
    {
        // here information about the exact type is erased
        // you cannot get it back without run-time type queries
        // (which are inefficient and don't scale with the number of derived classes)
    }
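
Here is one way such a decision could look: a sketch that picks a copying strategy from a standard type trait. The names copy_impl, copy_objects and copy_demo are hypothetical, chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Chosen when T is trivially copyable: a raw byte copy is allowed.
template <class T>
void copy_impl(T const* src, T* dst, std::size_t n, std::true_type)
{
    std::memcpy(dst, src, n * sizeof(T));
}

// Chosen otherwise: element-wise assignment.
template <class T>
void copy_impl(T const* src, T* dst, std::size_t n, std::false_type)
{
    for (std::size_t i = 0; i != n; ++i)
        dst[i] = src[i];
}

// The exact type T is known here, so the strategy is selected
// at compile time, with no run-time type query.
template <class T>
void copy_objects(T const* src, T* dst, std::size_t n)
{
    copy_impl(src, dst, n, typename std::is_trivially_copyable<T>::type());
}

bool copy_demo()
{
    int const ints[3] = {1, 2, 3};
    int ints_out[3] = {0, 0, 0};
    copy_objects(ints, ints_out, 3);      // uses the memcpy overload

    std::string const names[2] = {"orc", "elf"};
    std::string names_out[2];
    copy_objects(names, names_out, 2);    // uses the element-wise overload

    assert(ints_out[0] == 1);
    return ints_out[2] == 3 && names_out[1] == "elf";
}
```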

Making up hierarchy costs nothing

... but not vice versa. You can always wrap types designed for use as template arguments in an object-oriented hierarchy -- without run-time penalty (**) -- but once you have "virtuality" in place, its overhead stays with you forever!

I consider this one the most important, because it means that, other things being equal, you probably won't make a mistake by starting with templates. It is possible because of the previous benefits: performance, non-intrusiveness and keeping type information.
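
Such a wrapper might look like the sketch below. Adapter and Goblin are hypothetical names introduced for this example; f1 and f2 return the flag so that the result can be checked.

```cpp
#include <cassert>

// A type designed for use as a template argument: no base class, no virtuals.
struct Goblin
{
    bool do_something() const { return true; }
};

// Generic code keeps working on Goblin directly, with no virtual call.
template <class T>
bool f1(T const& t) { return t.do_something(); }

// The hierarchy is added later, only for callers that need run-time dispatch.
class Base
{
public:
    virtual ~Base() {}
    virtual bool do_something() const = 0;
};

// Hypothetical adapter: wraps any T that meets f1's requirements.
template <class T>
class Adapter : public Base
{
public:
    explicit Adapter(T const& value) : value_(value) {}
    virtual bool do_something() const { return value_.do_something(); }

private:
    T value_;
};

bool f2(Base const& t) { return t.do_something(); }

bool adapter_demo()
{
    Goblin goblin;
    Adapter<Goblin> wrapped(goblin);

    assert(f1(goblin));   // static polymorphism, Goblin untouched
    return f2(wrapped);   // dynamic polymorphism through the wrapper
}
```

The virtual call is paid only where the wrapper is used; code that takes Goblin as a template argument is unaffected.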

In part II we will deal with some arguments in favor of using dynamic polymorphism.

_____________________________________________________________________


(*) This is relaxed for covariant return types (and for exception specifications, which can be stricter in a derived class -- provided you consider them part of the signature).


(**) Of course, compared with having hierarchy from the start.