Oct 02, 2004

Java Generics: Better Code or Worse?

In the latest Java Specialists newsletter, it seems that some people were not comfortable with Generics, so much so that Heinz sent a follow-up, imploring us to send e-mails justifying our position on generics. In addition to sending this to the proper list (people who know me well already know which one), I decided to also say something here.

Before getting too involved in this discussion, however, it is important to realize that I only started playing with Tiger when it was released on Thursday. With this in mind, my comments come mainly from what I have read about Java's generics and what I know about C++'s templates. But really, the answer to this question is not about the language being used; it is about the techniques associated with generic programming.

With this in mind, my position is that generics can make software easier to maintain. Recently, a Java journalist commented that the Java culture would generally not be impressed by a developer who wrote their own container class; the normal reaction would be to reuse one of the containers in the java.util package. In Java, we are more likely to reuse these classes; in languages like C, where container implementations are not part of the standard library, developers must implement their own containers for each type, and this is generally done in one of two ways: copying and pasting an existing container implementation and adapting it to the new type, or writing a new container from scratch.

Both approaches have their advantages, such as the ability to optimize for your particular environment instead of settling for a general-purpose container, but they share a similar disadvantage. In the former case, the copy and paste case, it is very likely that you are acquiring a perfectly working piece of code, but it is also possible that the parts of the code you will rely on most were never exercised in their original setting, and therefore have bugs. By the same token, as software maintenance goes, the two copies mature differently: a developer will modify one and forget to update the other. Overall, this is a good recipe for disaster. In the latter case, you are reinventing the wheel. While yes, it is cool to show everyone that you still remember your high school lessons on how to write containers, this new implementation will need to be tested, and it is very likely that there will be a few bugs in the new wheel.

In C, a workaround to this situation is to write the container functions against a void* pointer, which basically permits the container to hold any type of data. In essence, this is similar to java.lang.Object (only void* is far worse, lacking, amongst other things, a ClassCastException). Up until J2SE 5.0, we had no alternative in Java but to use a container of plain Object references. On one hand, this ability was good for those rare cases where you have a heterogeneous set of data. In most cases, however, you have homogeneous data, or at least a common base class or interface that can be used, and constantly casting everything was error prone. For example, if you originally wrote a class to have a container of String and later decided that a StringBuffer would be better, then depending on how your code was factored, some code path could still insert a String into the container or cast an element back to String. The only way to find out is to compile, test, and see whether you get any ClassCastException. And if that happens to work, I would double check the test program to make sure that it exercises all the branches.
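To make the failure mode concrete, here is a minimal sketch of the String to StringBuffer migration described above, written against a raw list. The class and method names (RawContainerDemo, buildNames, failsAtRuntime) are made up for illustration; only the java.util classes are real.

```java
import java.util.LinkedList;
import java.util.List;

class RawContainerDemo {
    // Pre-generics style: the list holds plain Object references,
    // so the compiler accepts anything that is added to it.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List buildNames() {
        List names = new LinkedList();
        names.add(new StringBuffer("alice")); // migrated call site
        names.add("bob");                     // forgotten call site still adds a String
        return names;
    }

    // Returns true if reading the elements back as StringBuffer fails at run time.
    @SuppressWarnings("rawtypes")
    static boolean failsAtRuntime(List names) {
        try {
            for (Object o : names) {
                StringBuffer sb = (StringBuffer) o; // cast is only checked at run time
                sb.append('!');
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRuntime(buildNames())); // prints "true"
    }
}
```

The code compiles cleanly; the mistake only shows up when the branch that reads "bob" is actually executed.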

This is one of the areas where generics come into play. In the previous example, if the developer had created a LinkedList<String> and then changed it to a LinkedList<StringBuffer>, the compiler would catch any code that still tries to put a String into the list or take one out.
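A minimal sketch of the same idea with a typed list follows. TypedContainerDemo and its methods are hypothetical names chosen for this example; the commented-out line is the one the compiler would reject.

```java
import java.util.LinkedList;
import java.util.List;

class TypedContainerDemo {
    static List<StringBuffer> buildNames() {
        List<StringBuffer> names = new LinkedList<StringBuffer>();
        names.add(new StringBuffer("alice"));
        // names.add("bob"); // does not compile: a String is not a StringBuffer
        return names;
    }

    // No casts are needed when reading the elements back.
    static String joined(List<StringBuffer> names) {
        StringBuilder out = new StringBuilder();
        for (StringBuffer sb : names) {
            out.append(sb).append('!');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(joined(buildNames())); // prints "alice!"
    }
}
```

The forgotten call site is now reported at compile time instead of surfacing as a ClassCastException in whichever branch the tests happen to reach.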

Another area where C++ uses generic programming is algorithms. Consider for a moment the std::distance function, which looks like:

template<class InputIterator>
typename iterator_traits<InputIterator>::difference_type
distance(InputIterator first, InputIterator second);

The above function (the implementation is left as an exercise) essentially takes two iterators and calculates the distance between them, returning the difference type that the iterator's traits associate with it. The function is generic because any type that implements the interface of an InputIterator can be passed to it. Now, the attentive reader will note that I just said the word interface; in C++, there is presently no interface support, and therefore this mechanism gives C++ the ability to fake interfaces, albeit harshly, in the sense that the exact interface is only enforced at template instantiation or described in documentation (if any exists). The solution in Java would be simple and would not require any generics; it would simply require a class that abides by an InputIterator interface.
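As a rough sketch of what that Java solution could look like, assume a hypothetical InputIterator interface with just enough operations to walk forward and compare positions. None of these names (InputIterator, ArrayCursor, Distances) exist in the JDK; they are invented for this example, and java.util.Iterator deliberately has a different shape.

```java
// Hypothetical interface: just enough to walk forward and compare positions.
interface InputIterator {
    void advance();
    boolean samePosition(InputIterator other);
}

// Hypothetical implementation: a cursor over array-like storage.
class ArrayCursor implements InputIterator {
    private int index;

    ArrayCursor(int index) {
        this.index = index;
    }

    public void advance() {
        index++;
    }

    public boolean samePosition(InputIterator other) {
        return other instanceof ArrayCursor && ((ArrayCursor) other).index == index;
    }
}

class Distances {
    // Counts the steps needed to move first up to second.
    // Unlike the C++ version, which takes copies, this advances first in place.
    static int distance(InputIterator first, InputIterator second) {
        int steps = 0;
        while (!first.samePosition(second)) {
            first.advance();
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(distance(new ArrayCursor(2), new ArrayCursor(7))); // prints "5"
    }
}
```

Here the interface is declared and checked by the compiler up front, rather than being discovered when a template is instantiated.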

On the other hand, however, C++'s template specialization allows for special treatment of the iterators depending on their exact type. For example, in the case of a linked list, if you have one iterator at the beginning of a sequence and one at the end, the only way to discover the distance between them is to advance the first until they meet. However, in the case of a std::vector in C++, where the elements are contiguous, you can subtract the pointer at the beginning from the pointer at the end. While the former approach also works on vectors, the latter takes advantage of the underlying implementation and optimizes the function. This kind of optimization cannot presently be done in Java (unless you use an instanceof hack), but this may eventually work. Furthermore, C++ allows types to be renamed via the typedef keyword, as demonstrated above with typename iterator_traits<InputIterator>::difference_type; this allows the maximum size of a container to be extensible. In other words, if an int were assumed, the container would be limited to 2^31-1 entries; in C++, however, this type is defined by the container, and could therefore be a long or any other suitable integer type.
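For reference, the instanceof hack could be sketched roughly as follows. It uses the real java.util.RandomAccess marker interface (implemented by ArrayList but not by LinkedList) and applies the dispatch to element lookup rather than to distance, since java.util.Iterator has no notion of position. AdvanceDemo, strategyFor and elementAt are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class AdvanceDemo {
    // Reports which strategy elementAt would pick for the given list.
    static String strategyFor(List<?> list) {
        return (list instanceof RandomAccess) ? "indexed" : "walked";
    }

    // Run-time dispatch in place of compile-time template specialization.
    static <T> T elementAt(List<T> list, int n) {
        if (list instanceof RandomAccess) {
            return list.get(n); // constant-time jump for array-backed lists
        }
        Iterator<T> it = list.iterator();
        for (int i = 0; i < n; i++) {
            it.next(); // step over the first n elements one by one
        }
        return it.next();
    }

    public static void main(String[] args) {
        List<String> array = new ArrayList<String>();
        List<String> linked = new LinkedList<String>();
        for (String s : new String[] {"a", "b", "c"}) {
            array.add(s);
            linked.add(s);
        }
        System.out.println(strategyFor(array) + " " + elementAt(array, 2));   // prints "indexed c"
        System.out.println(strategyFor(linked) + " " + elementAt(linked, 2)); // prints "walked c"
    }
}
```

The choice is made on every call at run time, whereas a C++ specialization is selected once by the compiler.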

Going through some C++ algorithms that are implemented via generics, I cannot think of any that could not be rewritten with some proper abstraction of interfaces (perhaps with some generics in them) or members of a class, but I am still convinced that there are generic algorithms that would definitely benefit from Java's generics.

The new syntax of generics is definitely odd to Java developers, and to be honest, many C++ developers are uncomfortable with the syntax too. As with any technique, generics can be overused, as mentioned here before. When designing code, it is important to think about whether using a language feature is actually a benefit. In my opinion, for C++, generic programming serves no purpose unless you have at least two truly different types using the generic service. With this in mind, if you are writing a class such as a pool of database connections, then it is probably not a good idea to make the interface to the class generic. The internal workings, however, would benefit from the type safety associated with homogeneous, checked containers, and as such, I would recommend generics there.
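A minimal sketch of that recommendation: the pool below exposes a plain, non-generic interface, while its internal storage is a typed list. FakeConnection stands in for a real database connection, and both it and ConnectionPool are hypothetical names for this example.

```java
import java.util.LinkedList;

// Stand-in for a real database connection (hypothetical).
class FakeConnection {
    final int id;

    FakeConnection(int id) {
        this.id = id;
    }
}

// The public interface is not generic; only the internals are.
class ConnectionPool {
    // Typed storage: nothing but a FakeConnection can end up in the idle list.
    private final LinkedList<FakeConnection> idle = new LinkedList<FakeConnection>();
    private int nextId = 0;

    // Hands out an idle connection, or creates a new one if none is idle.
    FakeConnection acquire() {
        if (idle.isEmpty()) {
            return new FakeConnection(nextId++);
        }
        return idle.removeFirst(); // no cast needed
    }

    // Returns a connection to the pool for reuse.
    void release(FakeConnection connection) {
        idle.addFirst(connection);
    }

    int idleCount() {
        return idle.size();
    }
}
```

Callers never see a type parameter, yet any maintenance mistake inside the pool that tries to store the wrong kind of object is rejected by the compiler.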

In summary, since many containers of objects are homogeneous, Java's generic support allows this to be enforced by the language. Although interfaces can be used to replace some of the needs for generics, there are algorithms that exist that will truly benefit from the generic support. Between the code reuse and the type safety, I am sure that generics will render code easier to maintain.
