eyt*

June 25, 2006

Clearing an empty java.util.HashMap...

Here is an interesting question: which of the following programs executes faster?

  public class ProgramA {
    public static void main( String [] args ) {
      java.util.Map<String, String> map = new java.util.HashMap<String, String>( 100000000 );
      long start = System.currentTimeMillis();
      for ( int i = 0; i < 100; ++ i ) {
        map.clear();
      }
      System.err.println( ( System.currentTimeMillis() - start ) + "ms" );
    }
  }

  public class ProgramB {
    public static void main( String [] args ) {
      java.util.Map<String, String> map = new java.util.HashMap<String, String>( 100000000 );
      long start = System.currentTimeMillis();
      for ( int i = 0; i < 100; ++ i ) {
        if ( ! map.isEmpty() ) {  // this line is not in the above.
          map.clear();
        }
      }
      System.err.println( ( System.currentTimeMillis() - start ) + "ms" );
    }
  }

From looking at the code or reading through the documentation, you might expect the two to perform about the same: the map is empty, and presumably the clear() method would take that into consideration. But of course, if that were the answer, this would not be much of a blog post.

On my machine, ProgramA runs in about 37625 milliseconds with one of my processors pegged, whereas ProgramB runs in about 1 millisecond. Why? It would appear that the clear() method walks its entire internal table each time you call it, instead of looking at the current state of the object and realizing that it is indeed empty. Naturally, if the initialCapacity is smaller, the program is less affected by this.

Obviously no one writes code exactly like the above snippet, but if you are calling clear() from within a loop, you may want to be aware of this.
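If the guard is needed in more than one place, it can be pulled into a small helper. The following is only a sketch; the names MapClearing and clearIfNeeded are made up for illustration and are not part of java.util, and the capacity in main is an arbitrary smaller value so that the demo fits in a default heap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the JDK.
final class MapClearing {
    private MapClearing() {}

    /**
     * Calls clear() only when the map actually holds entries.
     * Returns true if clear() was called, false if the map was already empty.
     */
    static boolean clearIfNeeded(Map<?, ?> map) {
        if (map.isEmpty()) {
            return false;
        }
        map.clear();
        return true;
    }

    public static void main(String[] args) {
        // Arbitrary capacity, chosen only to keep the demo's memory use modest.
        Map<String, String> map = new HashMap<String, String>(1 << 20);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; ++i) {
            clearIfNeeded(map);
        }
        System.err.println((System.currentTimeMillis() - start) + "ms");
    }
}
```

The boolean return value is a design choice of this sketch; it simply lets a caller see whether any work was done.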

June 10, 2006

.NET and loading assemblies for other platforms...

If you are working with .NET, and you see the following exception:

TypeLoadException: Method 'Dispose' in type 'eSQL.SqlConnection' from assembly 'eSQLNet, Version=2.9.2056.14469, Culture=neutral, PublicKeyToken=null' does not have an implementation

make sure that the .NET assembly you are referencing (eSQL in this case) is built for your current platform.

In the case depicted above, I had accidentally selected the ARM version of the eSQL DLL instead of the x86 version. Unfortunately, the exception message above does not exactly say that. It would have been nicer to get a message saying something about the assembly not being compiled for the current platform...

June 1, 2006

C++ and finally

Almost two months ago, Danny Kalev wrote a piece regarding adding finally to C++. Unlike Danny, however, I think this is a great addition to C++.

Danny has a good point that part of the reason that finally exists in garbage collected languages is that they lack a destruction mechanism that allows for cleaning up. For example, a common use of finally in Java is with files, sockets, and other such resources, to guarantee that the resource is released (since the finalizer is not guaranteed to be run). In C#, this could be done via the using keyword, and hopefully in a future release of Java, a similar mechanism will exist that would allow the same behaviour.
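As a reminder of what that common Java use looks like, here is a minimal sketch. The class FinallyDemo and the method firstLine are hypothetical names chosen for the example; the reader is closed in the finally block whether readLine() returns normally or throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example class; the names are for illustration only.
final class FinallyDemo {
    private FinallyDemo() {}

    /** Returns the first line of the source and always closes it afterwards. */
    static String firstLine(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        try {
            return reader.readLine();
        } finally {
            // Runs on both the normal and the exceptional path;
            // closing the BufferedReader also closes the wrapped Reader.
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine(new StringReader("first\nsecond")));
    }
}
```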

But I think that there is a certain amount of value to adding such a feature to the language. For example, if you wanted to have a code segment display a log message at the end of a method, you would need to write a small class whose destructor displays the message, and then create an instance of that class at the top of the method.

For a general purpose tracing class, this would be, by far, the best approach. But for a specific operation that is done for one specific method, I think that such a class is not as readable as the same code written using a finally, even in the case where the statements in the finally explicitly call a function.

Said another way, the for keyword is equally redundant, since the 1998 C++ ISO Standard states in section 6.5.3 that the statement for ( for-init-statement condition ; expression ) statement is equivalent to { for-init-statement while ( condition ) { statement expression ; } }.

But while the two are equivalent, the for statement brings enough readability that it is a welcome keyword.

I feel that the addition of the finally keyword would have the same benefits. It is easily implemented in terms of existing infrastructure, much as the for statement above is. It adds readability, particularly for cases where the creation of a class would consume many more lines than a line or two in a finally block. Lastly, I think that there is a huge benefit to people using C++ who are accustomed to using other languages; the familiarity with this feature would permit such developers to concentrate on other, more important C++ features instead of trying to figure out how to get around the perceived lack of this feature. I think these benefits outweigh the cons of adding a new keyword, and I welcome the change.

March 22, 2006

Adapting for Concurrency...

Herb Sutter gave a talk on concurrency at PARC earlier this week, and the Audio, Video, and Slides are available.

Whereas even up to a few years ago multiple processor (and multiple core) machines were only available for niche markets, the talk highlights the fact that very soon multiple core machines will be everywhere. This is very exciting; however, we must change how we develop our applications to take advantage of this. From this perspective, it is not simply a matter of using the existing tools that we have, such as locks and threads, but also of expanding our tools in order to take advantage of these language features.

Herb gives the analogy that this is similar to the early 1990s, when Object Orientation was new. While you could write an object-oriented application in C (for example, consider the FILE structure and methods), it is far easier to write such applications in languages that have native support for objects. Herb states that while we can currently write multithreaded software with our existing tools, writing correct multithreaded software is hard, pointing out that some of the class libraries and even some of his own examples have been incorrect.

During the discussion, Herb mentioned something that I was unaware of. A while ago, I discussed Double-Checked Locking and how it was broken, and later Scott Meyers and Andrei Alexandrescu’s C++ and the Perils of Double-Checked Locking. One part that I had not noticed in Scott and Andrei’s paper is that, thanks to reworking the memory model, the Double-Checked Locking Pattern once again works in Java 1.5/5.0 (JSR 133) and .NET 2.0 (CLI 2.0 Section 12.6). The solution is to use the keyword volatile as follows:

  private static volatile Singleton instance = null;
  public static Singleton get() {
    if ( instance == null ) {
      synchronized ( Singleton.class ) {
        // Second check: another thread may have created the instance
        // while this one was waiting for the lock.
        if ( instance == null ) {
          instance = new Singleton();
        }
      }
    }
    return instance;
  }

Again, this only works in Java 1.5 and .NET 2.0 because of the changes in the memory model. There are some discussions that the restrictions on volatile may not make it faster than plain synchronization, but then there’s a solution that uses memory barriers instead of volatile that could get around this in .NET. Of course, some of this feels like premature optimization, as synchronization and volatile improvements are things that compiler vendors are likely to be working on. I guess it's like the free lunch.

And speaking of which, if you haven't really thought about having your desktop application take advantage of a 32 core machine, this is a good discussion to get you thinking about that and why.
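For anyone who wants to try the pattern, here is a self-contained sketch with both null checks in place, assuming Java 5 or later. The class name LazySingleton, the CONSTRUCTED counter, and the thread count are all assumptions added for the demonstration; the counter exists only so the program can report how many times the constructor ran.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; relies on the Java 5 (JSR 133) memory model.
final class LazySingleton {
    // Counts constructor calls so the demo can report them.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private static volatile LazySingleton instance = null;

    private LazySingleton() {
        CONSTRUCTED.incrementAndGet();
    }

    static LazySingleton get() {
        if (instance == null) {                      // first check, no lock
            synchronized (LazySingleton.class) {
                if (instance == null) {              // second check, under the lock
                    instance = new LazySingleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) throws InterruptedException {
        final int threads = 8;
        final CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; ++i) {
            new Thread(new Runnable() {
                public void run() {
                    get();
                    done.countDown();
                }
            }).start();
        }
        done.await(); // wait until every thread has asked for the instance
        System.out.println("constructor calls: " + CONSTRUCTED.get());
    }
}
```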

January 18, 2006

java.util.concurrent Memory Leaks...

About two months ago, one of the Java servers my group maintains ran out of memory, and since then, whenever the service had been up for more than a week, we would again get the dreaded java.lang.OutOfMemoryError. Each time this happened, we took a histogram of the memory usage (jmap -histo pid). After gathering a few of those and comparing the results to what we expected, two classes seemed to be using a suspicious amount of memory, java.util.concurrent.LinkedBlockingQueue$Node and java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, so I started looking at how these classes were used.

The java.util.concurrent.LinkedBlockingQueue was used in several places in the program; however, its usage was fairly straightforward. In addition to usually being the class that used the most memory, the other thing that made it suspect was that its usage had been added just a week before the problem first occurred. Furthermore, in all of the various tests that I ran, the number of Nodes would increase significantly, but when the full garbage collector ran (either automatically or when forced), the number of nodes would drop to the number of entries that I expected there to be. This made me suspect that perhaps the garbage collector was not running when the OutOfMemoryError was thrown, or that perhaps there was not enough memory at that point to even run the garbage collector, but alas, I was quickly brought back to my senses.

During the investigation of the above, I googled the class and it brought up some interesting results. On the Concurrency Interest mailing list, an entry from last year suggests a memory leak in LinkedBlockingQueue, and it was confirmed by Doug Lea. I re-wrote the program suggested in the message (in the method: originallyReportedLeak) to confirm that it was an actual issue, and sure enough, it is a current problem. Running the program with the JMX arguments, you can attach jconsole to see the memory grow, and no amount of forcing the garbage collector to run will bring it down.
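The kind of call involved is a timed poll that gives up without receiving anything. The sketch below is not the originallyReportedLeak program; it only assumes that the trigger is a timed poll() on an empty queue, and the names TimedPollSketch and countTimeouts are hypothetical. Whether memory actually grows depends on the JDK version it runs on.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the call pattern only; it does not measure memory.
final class TimedPollSketch {
    private TimedPollSketch() {}

    /** Polls an always-empty queue and counts how many polls timed out. */
    static int countTimeouts(int attempts) throws InterruptedException {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        int timeouts = 0;
        for (int i = 0; i < attempts; ++i) {
            // Nothing is ever offered, so each poll waits 1 ms and returns null.
            if (queue.poll(1, TimeUnit.MILLISECONDS) == null) {
                ++timeouts;
            }
        }
        return timeouts;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timed-out polls: " + countTimeouts(100));
    }
}
```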

But of course, that was not my problem. The issue with the leak in the above program has to do with a timeout occurring. In all but one case, our code was not using the methods that take a timeout, and in that one case the timeout value was MAX_INT; because of the way we use this particular service, there is no way that it would have been able to time out and leak. And as I mentioned above, forcing a garbage collection always brought it back to normal, so alas, back to the drawing board.

For the other class that used outrageous amounts of memory, AbstractQueuedSynchronizer’s Node inner class, it was not as easy to find where it was used. By using Borland OptimizeIt, however, I found out that this class was used by java.util.concurrent.CountDownLatch, where the code was calling await() with a one second timeout, and in this case, the timeout was always triggered. Sound familiar? I wrote a program (in the method: anotherSimilarLeak) that demonstrates this precise memory leak, and sure enough, it is the same underlying problem as in the LinkedBlockingQueue above.

In this particular program, we decided that the latch was not necessary for this particular method, and that replacing it with a sleep was all that was necessary. This bug is fixed in Mustang, the next version of Java, so hopefully your project can wait until it is released. Otherwise, you may unfortunately need to roll-your-own timeout functionality... just make sure to properly document such hacks so that you can remove them in the near future!
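To make the swap concrete, here is a sketch of the two ways of pausing. It is not the anotherSimilarLeak program, and PauseSketch, pauseWithLatch and pauseWithSleep are hypothetical names; the latch in main is never counted down, which is an assumption standing in for "the timeout was always triggered".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and timings are for illustration only.
final class PauseSketch {
    private PauseSketch() {}

    /**
     * The original pattern: a timed await() on a latch.
     * Returns false when the wait timed out before the count reached zero.
     */
    static boolean pauseWithLatch(CountDownLatch latch, long millis) throws InterruptedException {
        return latch.await(millis, TimeUnit.MILLISECONDS);
    }

    /** The replacement: a plain sleep, which does not touch any synchronizer. */
    static void pauseWithSleep(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1); // never counted down here
        for (int i = 0; i < 5; ++i) {
            boolean released = pauseWithLatch(latch, 10);
            System.out.println("latch released: " + released);
            pauseWithSleep(10);
        }
    }
}
```

The sleep is only a fair substitute when nothing ever needs to wake the waiter early, which was the case for the method described above.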
