Find your next scenic drive!

August 8, 2012

Don’t return from within finally

Earlier today, I was debugging some seemed to be some pretty straight-forward code. The code effectively was the following function. What would you expect the output of this class to be?

  1. public static void main( String [] args ) {
  2.   System.out.println( foo() );
  3. }
  4. public int foo() {
  5.   try {
  6.     throw new RuntimeException();
  7.   } catch ( RuntimeException e ) {
  8.     System.err.println( "Caught" );
  9.     throw e;
  10.   } finally {
  11.     System.err.println( "Finally" );
  12.     return 1;
  13.   }
  14. }

If you expected an exception to be percolated out as I was, you are wrong. It actually turns out that the above returns 1 and ignores the exception that was just raised. Just something subtle to keep an eye out for.

April 7, 2012

How do you quickly sort 84 million unique Integers?

For the last few weeks, I’ve been working on increasing the performance of various areas at work and one of the areas that kept on hitting my profiles has been sorting integers. All of us with a Computer Science degree have sorted integers before and we know how to cost algorithms with Big-O notation, but most of the time, the default sort in Java, C#, and other languages is “good enough” that we don’t really think of it.

When it starts to show up on a profile though, sometimes that is time to start thinking about it. As with many things, algorithms based on N usually perform well when N is reasonably sized, so you do not always notice the issue until N becomes interesting or the number of times you call the algorithm increases.

Whenever diagnosing performance problems, it is important to characterize whether it is one or more big calls to the method that causes the entire time sink or whether it is the sheer number of calls to that method. In either of these problems, I think a reasonable first thing to ask is whether that call is really necessary or is it at the right location? More specifically, if you are not running binary searches or other algorithms or use-cases which require the values to be sorted, why sort them at all? If for no other reason, this causes you to think through how to code is or could be used and sometimes leads to a different solution.

In my particular case, there were a few large sorts that were taking all the time. Each of these sorts were sorting approximately 84 million integers via the JDK standard Arrays.sort() or the Colt IntArrayList.sort() method and taking approximately 10 seconds per sort. Depending on your background, 10 seconds may be an eternity or a dream, so suffice it to say that the entire processing required for this component had to be less than 3 seconds, approximately half of that had already been accounted for.

So assuming it is required, how do you sort faster? An obvious approach to server engineers is to multithread this operation, such as sorting chunks of that memory in parallel and then merging them together, or perhaps a multi-threaded version of quick-sort, and I am sure that my team expected me to go hide in a corner for a while and code up one of these approaches...

But I didn’t. You see, there’s something particularly interesting about the integers we were sorting. First, they are all positive, and second, there aren’t any duplicate integers. So instead of sorting them, can we do anything else with them that we can infer the sort and have more memory locality?

Well, one way to do it would be to use a set of bits. Say that the largest number in the array was 200 million, you can allocate 200 million bits and then sort the numbers using an algorithm like this:

  1. for each N in array
  2.   set bit N
  3. while more bits
  4.   find next set bit
  5.   set array[i] to that bit
  6.   increment i

Before we go further, you will note that the above algorithm goes through the array twice, and that may seem bad to you, however, there is another thing to keep in mind when assessing this algorithm, which is memory locality, and the fact that especially in the second loop, you are effectively iterating sequentially through two arrays which are probably in the L1 or L2 cache already, potentially making a big performance improvement if you have performance-grade processors.

The following sample code implements this in Java using the BitSet class. I’ve personally not gone through the code to see if it is the most optimal implementation for what I am doing with it, but after this change, the performance was in the acceptable range.

  1. // Code available here
  2.       BitSet bits = new BitSet();
  3.       for ( int i = 0; i < array.length; ++ i )
  4. {
  5.           bits.set( array[i] );
  6.       }
  7.       int lastBit = 0;
  8.       int i = 0;
  9.       while ( true ) {
  10.           lastBit = bits.nextSetBit( lastBit );
  11.           if ( lastBit == -1 ) {
  12.               break;
  13.           }
  14.           array[i] = lastBit;
  15.           ++ lastBit;
  16.           ++ i;
  17.       }
  18.   }

Performance across machines will vary. On my work servers with the latest Xeons, it averages around 800ms per sort, compared to an average of 10 seconds. On my home server, it is a little slower but averages around 9 seconds, verses 23 seconds for the QuickSort algorithm.

Now the above algorithm is not optimal for all sizes. In my tests, the magical point was somewhere around one million rows where the performance of the default Arrays.sort() and the above crossed over, so we do a quick size check and either go to the standard QuickSort or to the BitSet-based sorting.

Those familiar with the BitSet API may mention that I could gain better performance by telling the BitSet algorithm the number of bits I am interested in using and allowing it to pre-allocate the memory for that. Unfortunately though, I did not have that information available for most areas in the code and it was not feasible to guess in this area of the code. What was super interesting though is that doing a pre-scan of the entire array to find the maximum number did not change the runtime of the algorithm significantly. In fact, the TP99 of the two were off by around 100 milliseconds in my test. This could definitely be an area to improve the above algorithm going forward.

If the above algorithm was not fast enough for you still, there is still room for improvement. First, you can look at pre-allocating the array. Second, the second loop of the algorithm is easily parallelizable and perhaps the BitSet implementation for this use-case could be simplified.

As I mentioned above, the algorithm only works for unique, positive integers. If you have negative numbers, you could possibly use a second BitSet object for the absolute value of the negative numbers and iterate it in reverse (applying the sign obviously) which would also work. Duplicates obviously don’t work though, and if you have that problem, ask yourself if you really need the sort at all or whether you need the duplicates, and if you do, consider a parallel sorting algorithm.

Now off to another interesting problem. Wanna help?

December 3, 2011

Rounding numbers with Java’s NumberFormat...

Take a look at the following numbers and round them to the nearest whole number:

  1. 2
  2. 2.1
  3. 2.2
  4. 2.3
  5. 2.4
  6. 2.49
  7. 2.5
  8. 2.51
  9. 2.6
  10. 2.7
  11. 2.8
  12. 2.9
  13. 3

I’m pretty confident that you will get the same answers as my 4th grader, which is that the numbers from 1-6 result in a 2 and 7 onwards results in 3, right? Well, given the following code though, you’d be surprised:

  1. import java.text.DecimalFormat;
  2. import java.text.NumberFormat;
  3. public class test {
  4.   public static void main( String [] args ) {
  5.      double [] values = new double[] { 2, 2.1, 2.2, 2.3, 2.4, 2.49, 2.5, 2.51, 2.6, 2.7, 2.8, 2.9, 3.0 };
  6.      final NumberFormat format = new DecimalFormat( "#" );
  7.      for ( double d : values ) {
  8.         System.out.println( d + ": " + format.format( d ) );
  9.      }
  10.   }
  11. }

Your surprise would come from the fact that the 2.5 rounds down to 2. Why? Well, reading the DecimalFormat javadocs closely, you will notice that the default rounding policy is HALF_EVEN, in which the above behaviour is explained. To change that, simply use format.setRoundingMode( RoundingMode.HALF_UP ); or one of the other policies available in the rounding mode.

Why this is the default behaviour is puzzling to me, but I’ll just add it to the list, which includes the Math.abs() handling of Integer.MIN_VALUE and non-thread-safe SimpleDateFormat [Link removed] (which, for the record, I do not believe that the code suggested in that blog post is guaranteed to be thread-safe either... since the DateFormat class is declared to be non-threadsafe also).

September 17, 2008

Math.abs has an edge case...

Consider the following Java code:

  1.   int ai = Math.abs( i );
  2.   assert ai >= 0;

Will the assert ever fire?

Yes, it turns out that it can fire if i = Integer.MIN_VALUE, and you have been warned about this in the Math.abs javadoc, which states “Note that if the argument is equal to the value of Integer.MIN_VALUE, the most negative representable int value, the result is that same value, which is negative.

Keep this in mind when looking at such innocent looking code...

September 11, 2008

Java Mail: Make sure the mail.smtp.auth property's value is a String...

If you are using the JavaMail API to send out e-mail via an e-mail server that requires authentication and are getting an error like com.sun.mail.smtp.SMTPSendFailedException: 550 You must authenticate, make sure the mail.smtp.auth property is assigned as a String, like this:

  1.   properties.put( "mail.smtp.auth", "true" );

Not exactly the most obvious bug to find...