Sep 20, 2004

Debug using Logs...

Last weeks Developer Central newsletter (#79) had a piece entitled “Improving Software Development: Logging Strategies” by Mike Gunderloy which discusses some of the things that I have been meaning to write here myself.

When you talk to developers about logging, it really means different things to different developers. Generally developers who have the luxury of working with a good logger really want one, whereas those who have either never used a logger or used a poorly designed logger are generally in the camp that it is not particularly useful.

Even with the excellent tools that we now have available, there are a number of developers that still prefer printf() style debugging. In this style of debugging, a developer generally inserts some printf()s in the code, recompiles, runs, evaluates the output, and repeats the process.

This style of debugging is perhaps tried and true, especially in development environments where you know exactly what you are tracing down. Generally you know what you were doing, or the tester can generally give you enough context to narrow it down. Martin Fowler states in his Refactoring: Improving the Design of Existing Code that “Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead;” in other words, instead of extracting energy in writing a printf() function to ensure that the variable you think is 10 now is, write a test program to ensure that it is that value. Martin's premise is that the test program does not go away, whereas your printf() statement will be removed when you merge the code, or perhaps at some point in the future, hopefully before a customer sees some cryptic debug message.

While this approach is interesting, the problem with the approach is that it is only useful in the development environment, presuming that the test programs are executed and inspected regularly. That is a lot of conditions to be true. All too often, the software works perfectly in-house and once released, it fails. This phenomenon rarely fails, especially when it is important that the software works perfectly. But, how do you debug this problem?

While some problems are easily reproduced in-house such as different work flows, some other issues can be extremely difficult to trace, such as those that involve timings, and this is really where good logging will be the difference between finding the bug quickly, and, well, not.

As Mike highlights, there are a number of things that log files are good for. Because log files provide the history (hopefully) surrounding the problem, it allows developers to understand the actions that were actually done and the sequence of those actions. These two items alone are extremely important. There are many times that people will overlook something that they have done that is actually very important in the grand scheme of things, which changes the conditions for the bug. Logging such conditions will ensure that this information is recorded to help in reproducing the problem.

One of the hardest types of bugs to catch are timing related bugs, and debuggers are notorious for making the software behave differently, which usually makes the bug change behaviour. However, since you will be shipping with the logger in the source code (perhaps in a different state), the behaviour of the software should be similar regardless of the surrounding conditions.

Another feature of log files is that you can sample the information within them to either look for patterns, and this search for patterns does not really need to be related to a bug. While you can look through the file and say that after X operations, the bug always occurs and try to see some conditions of why that could be or anything that was different before that time, it could also be used to come up with some real-life usage information to guide optimizations and understand work flow. There are sometimes where the work flow that developers envision is not really the work flow that users use. While there is no better way that to talk with customers to learn how they use the application and what they would like to see differently, sometimes people only think in the current context and do not realize all the potential that software really has, and gathering information from log files is a great idea to get useful statistics on the application. This information not only helps the current developers of the project, but also support and future developers on the project.

In one of the last Journal of Object-Oriented Programming (October/November 2001), Jerry Gao, Eugene Y. Zhu, and Simon Shim wrote “Tracking Software Components” which discusses distributed logging. Of the many things that they mentioned, one of the most interesting ideas presented was the ability to be able to play-back a log file. In other words, the application would record a number of events, and using a supplementary tool, the events could be executed in the same sequence to try to reproduce the bug. This obviously requires some careful planning and in some cases, significant effort to write a playback tool, however, this example is simply provided to demonstrate some of the power that a properly used logging system can do.

As with many techniques, adding logging to new code is usually easier than adding it to existing code, but deciding what to log makes the difference. As mentioned here, error conditions and exceptions should, at the very least, by logged. While the condition is potentially of mediocre concern today, as the software evolves, it can become more important in the future, and logging such conditions will allow you to see some areas that may have gone wrong.

In addition to this, you will want to know some of the user's actions, such as the command-line arguments, the buttons pressed, menus selected, et cetera, as well as information about the environment, such as environment labels, hardware, memory, et cetera. This will allow you to see what the user has done and see the errors in relation to the operation that was being performed. Likewise, information about the internal state or length operations should also be logged, but probably at a lower level to allow developers to see them, but clients to disable them.

One of the things that I mentioned above is that the logger is not removed from the shipping software, but it can be disabled, in many fashions. Mike actually mentions disabling the logger completely, but in my opinion, this is not a good idea, as you never know when an issue can pop up. Besides, the logger can easily be integrated with your user interface and provide users with more context of what is happening within the system.

A better approach is to actually use a logger that filters messages based on some configuration option. Most loggers present some way of assigning a level to a log message, and this determines how the error message is propagated. The range of available levels, however, ranges from logger implementation to logger implementation.

Off-the-shelf loggers are available for most programming languages available, and many modern languages come with a logging service (such as Java 1.4's java.util.logging). No matter which logger you decide on using, it is important to select a flexible logger. The flexibility is usually obtained by using levels to discriminate the messages, the option to be threaded or not, and the ability define an Observer that allows for custom reporting services, such as presenting certain messages in the status bar. The log file utilized must also provide a wrapping feature after either a certain number of messages or a particular file size, and must contain, at the very least, a time stamp, a level, and a message, but more context is always desirable, such as the filename and line number where the error is logged, and the host name and/or thread identifier that the error message is from.

Debugging client applications is much easier with the proper context. Logs provide context in an application with minimal overhead. The advantages greatly outweight most excuses people will bring against them; there are few applications where loggers do not make sense, and usually all that is needed, if anything, is a better logger.

Filed In