May 14, 2004

A Comment on Comments...

The other day, I ran into one of my biggest pet peeves: I opened a header file and found no comments apart from the copyright statement and a few markers indicating where the class definitions were, which I could easily have inferred. The class therein, however, had a lot of things that I could not infer.

The whole reason behind C++'s separation of the interface and implementation files is to allow developers to focus on an interface rather than its implementation details. When I look at an interface file, I am looking for ways that I can use the class defined within and which operations are permitted on it. This is analogous to reading a man page or Microsoft's Help: when I look at one of these, I want to know what the parameters are, what the function does, what it returns, what it does when there is an error, which standards it supports, and so on. When I look at the strcmp man page, for example, I do not want to see how it is implemented, since this could change in the future.

Boost.org also treats the interface as more important than the implementation, quoting Dietmar Kühl: "The main contribution is the interface, which is augmented with an implementation..."

I do not necessarily mean something extravagant for comments, just some basic information. Unfortunately, no matter how descriptive your function name is, it rarely captures everything that the function does, so it is good to have some information on what it really does. In some cases the parameters may seem obvious, but it does not hurt to add some clarity on what they are. For example, if you have a pointer value, is a null pointer acceptable? If you are taking an integer, is there a specific range that the integer should be in?

Information about the return value or class state is also important. Specifically, it is important to indicate what happens should this function fail. Does it return an error code? Does it throw an exception? Does it tweak errno or some other error mechanism?
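As a minimal sketch of what such a comment might look like, consider the declaration below. The function parse_percentage, its error codes, and its accepted range are all invented for illustration; the point is only that the comment answers the questions above (is null acceptable, what range is valid, what happens on failure).

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical example: parse_percentage is invented for illustration.
//
// Converts a string of decimal digits to an integer percentage.
//
// Parameters:
//   text - NUL-terminated string of decimal digits. A null pointer is
//          not acceptable and is reported as an error (see Returns).
//   out  - receives the parsed value; must not be null.
//
// Returns:
//   0 on success, with *out in the range [0, 100].
//   -1 on failure; *out is left unchanged and errno is set to EINVAL
//   (null argument, empty or non-numeric text) or ERANGE (value > 100).
//
// Exceptions: none.
inline int parse_percentage(const char* text, int* out)
{
    if (text == nullptr || out == nullptr || *text == '\0') {
        errno = EINVAL;
        return -1;
    }
    int value = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') {
            errno = EINVAL;   // non-digit character
            return -1;
        }
        value = value * 10 + (*p - '0');
        if (value > 100) {
            errno = ERANGE;   // outside the documented range
            return -1;
        }
    }
    *out = value;
    return 0;
}
```

A caller who reads only the comment knows to check the return value and errno, and never needs to open the implementation.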

Extensions to this would obviously include thread safety, exception safety, references, and standards. For example, imagine that the strtok documentation did not mention that it keeps hidden state between calls and is therefore not fit for threaded applications. Even if the source code were available, I think that many people would still use strtok in threaded code simply because the limitation was not mentioned.
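To illustrate the kind of note I mean, here is a sketch of a tokenizer that carries its state in a caller-supplied cursor and says so in its comment. The function next_token is hypothetical and written only for this example; it is not part of any standard library.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reentrant tokenizer, invented for illustration.
//
// Returns a pointer to the next token in the buffer tracked by *cursor,
// or a null pointer when no tokens remain.
//
// Parameters:
//   cursor - in/out position; must not be null. Before the first call,
//            point *cursor at a writable, NUL-terminated buffer.
//   delims - NUL-terminated set of delimiter characters; must not be null.
//
// Side effects:  overwrites the delimiter that ends each token with '\0'.
// Thread safety: keeps no hidden state; safe to call concurrently as long
//                as each thread uses its own buffer and cursor.
inline char* next_token(char** cursor, const char* delims)
{
    // Skip any delimiters in front of the next token.
    char* start = *cursor + std::strspn(*cursor, delims);
    if (*start == '\0') {
        *cursor = start;
        return nullptr;
    }
    // Find the end of the token and terminate it in place.
    char* end = start + std::strcspn(start, delims);
    if (*end != '\0') {
        *end++ = '\0';
    }
    *cursor = end;
    return start;
}
```

Because the state lives in the cursor, two tokenizations can be interleaved without disturbing each other, and the comment tells the reader so without a trip to the source.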

As is mentioned in Gotcha #1: Excessive Commenting from C++ Gotchas: Avoiding Common Problems in Coding and Design, you should not comment the obvious, such as a destructor (unless it inherently does something really special), since C++ developers know what this is. However, it is important to remember that obvious to you does not mean obvious to everyone else. The definition of obvious should therefore be limited to core C++ knowledge or domain-specific knowledge, where hopefully references are available in your offices for developers who are not familiar with the latter.

What you are doing is essentially C. A. R. Hoare's axiomatic semantics. He used the notation {P} S {R}, which means that if the predicate P is true before S and S completes successfully, then the predicate R will be true afterwards. Predicate P, therefore, is a precondition and predicate R is a postcondition of the statement S.

The precondition and postcondition predicates are extremely powerful. If your precondition and postcondition are well-defined conditions, then there is very little that the statement in between can do. However, if the precondition and postcondition predicates are weak conditions, then the statement can do practically anything.

Design by Contract™ is essentially a specialization of this. The idea behind DBC is that a contract is formed between the caller of the function and the function: if the caller provides an environment that meets the function's preconditions, then the function must honour that contract by following through with its postconditions. In order to make this work ideally, however, your preconditions and postconditions must be well-defined. Whilst the technique is language neutral, there are several programming languages that actually have DBC built in, viz. Eiffel and D.
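C++ has no built-in contract syntax of the kind described here, so the sketch below approximates {P} S {R} with comments plus assert. The function integer_sqrt is a hypothetical example of my own choosing; the comment states the contract and the assertions check it at run time in debug builds.

```cpp
#include <cassert>

// Hypothetical example, invented for illustration.
//
// Precondition  (P): n >= 0
// Postcondition (R): r * r <= n < (r + 1) * (r + 1), where r is the
//                    returned value.
inline int integer_sqrt(int n)
{
    assert(n >= 0);  // P: the caller's side of the contract

    // S: the statement in between. The products are computed as
    // long long so that they cannot overflow for large n.
    int r = 0;
    while (static_cast<long long>(r + 1) * (r + 1) <= n) {
        ++r;
    }

    // R: the function's side of the contract
    assert(static_cast<long long>(r) * r <= n);
    assert(n < static_cast<long long>(r + 1) * (r + 1));
    return r;
}
```

The postcondition here is strong enough that only one return value satisfies it, which is exactly the situation in which the statement in between has very little freedom.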

Comments written in this style are useful in many ways. First, they provide a mechanism to tie your design to your source code in a meaningful way. Many design schemes do not tie design to source code, and this tends to keep the design and implementation separate beasts entirely. Placing more documentation in your headers can replace some of the design documentation, and most reverse engineering tools will absorb comments.

Second, new developers will come up to speed faster by reading through some header files than through a lot of source code. One of the problems with reading source code is that the source will probably make references to other components, which will require reading more source code to acquire a full understanding of what is going on. Furthermore, some developers (dare I say most) cannot simply read source code without wanting to improve it; there is always something that can be optimized in some fashion, but this distracts from simply using the component.

Third, comments like these help you test your class. One of the biggest problems with using a testing framework like JUnit or Boost's is that, for some classes, it is difficult to know what to exercise. Documentation in this form directly indicates what should work and what should not, and this should facilitate the creation of those tests.
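As a rough sketch of how documented behaviour turns into test cases, consider the class below. BoundedCounter and its contract are hypothetical and exist only for this example, and plain assert stands in for a real testing framework. Each check in the test function is read directly off one of the comments.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class, invented for illustration.
// Invariant: 0 <= value() <= limit().
class BoundedCounter {
public:
    // Precondition:  limit >= 0; throws std::invalid_argument otherwise.
    // Postcondition: value() == 0 and limit() == limit.
    explicit BoundedCounter(int limit) : limit_(limit), value_(0)
    {
        if (limit < 0) {
            throw std::invalid_argument("limit must be >= 0");
        }
    }

    // Returns true and increases value() by one if value() < limit();
    // otherwise returns false and leaves the counter unchanged.
    // Never throws.
    bool increment()
    {
        if (value_ >= limit_) {
            return false;
        }
        ++value_;
        return true;
    }

    int value() const { return value_; }
    int limit() const { return limit_; }

private:
    int limit_;
    int value_;
};

// Each check corresponds to one documented statement above.
inline void test_bounded_counter()
{
    BoundedCounter c(1);
    assert(c.value() == 0 && c.limit() == 1);  // constructor postcondition
    assert(c.increment() && c.value() == 1);   // documented success case
    assert(!c.increment() && c.value() == 1);  // documented failure case

    bool threw = false;
    try {
        BoundedCounter bad(-1);
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);                             // documented precondition violation
}
```

Without the comments, the failure case of increment and the behaviour on a negative limit would have to be guessed or dug out of the implementation.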

There are many other advantages to documenting your classes correctly. Large-Scale C++ Software Design recommends this mostly for public classes, but I prefer it for all classes, since this makes the refactoring process easier should you ever need to make this class public and it also shares the other advantages named above.

A disadvantage is maintenance. However, it should become a habit to ensure that the documentation is still correct after you change something. In this regard, your unit tests would probably fail after such a change, requiring you to change something there, which should signal a documentation check.

The separation of interface and implementation is different in programming languages such as Java. In Java, the interface and implementation are in the same file, and the comments that are recommended above would essentially be placed in a javadoc comment before the class or method declaration. The javadoc tool can then be executed on your source code to extract these comments into an organized tree, such as the J2SE API Specification.

Whereas a C++ developer would refer to the header file, a Java developer would refer to the javadoc. Similar tools are available for C++, such as cxx2html or Doxygen. The latter is used for boost.org's documentation.
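For a sense of what this looks like in a C++ header, here is a minimal sketch of a Doxygen-style comment. The function average is hypothetical and invented for this example; @brief, @param, @return, @throws, and @p are standard Doxygen commands.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

/**
 * @brief Computes the arithmetic mean of a sequence of samples.
 *
 * Hypothetical function, invented for illustration.
 *
 * @param values  Samples to average; must not be empty.
 * @return The sum of the samples divided by their count.
 * @throws std::invalid_argument if @p values is empty.
 */
inline double average(const std::vector<double>& values)
{
    if (values.empty()) {
        throw std::invalid_argument("values must not be empty");
    }
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}
```

The comment stays readable in the header itself, and a documentation tool can extract the same text into browsable pages.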

In summary, properly commenting interface files has advantages on many levels. Comments lead to a better understanding of the software, which makes developers' jobs easier in the long run.
