Published on May 29, 2008

This post was previously on the Pathfinder Software site. Pathfinder Software changed its name to Orthogonal in 2016.

I’ve had a running debate with my colleague John McCaffrey on the question of testing. He is a big fan of unit testing, and of testing in general that examines the smallest possible units and then assembles them into ever larger integration tests. The idea is that if you get the small stuff right, the larger stuff has a greater chance of being right too.

Come to think of it, I’m a big fan of this approach too. But there are times when this sort of strict constructionism is insufficient and a comprehensive system test is necessary. Some instances that come to mind are:

  • Systems where the correctness of third-party hardware or software, and its interaction with our system, needs to be verified. For example, the interaction of network hardware and the operating system with video conferencing software.
  • Complex algorithms and logic whose correctness is difficult to prove mathematically. For example, a software application that allows the end user to write complex logic, which essentially amounts to the end user writing programs themselves. The Four Color Problem is an excellent example of the sort of algorithm whose correctness is difficult to test: verifying it required far more effort than developing the algorithm itself.
  • High-value systems where an error or an oversight in a unit or integration test can lead to damaging consequences. Security of online systems is an excellent example: it is perhaps the most tested area of applications, yet we still regularly hear of compromised applications.

There are a number of ways to tackle these types of testing challenges, but many of the approaches resemble an empirical, almost experimental tack: treat the system as a black box and record and measure inputs and outputs.

Security and Fuzzing

One extreme approach to security testing, known as “fuzzing”, treats the target system as a black box and tries to subvert it by feeding it garbage. In the case of a web application, for example, pass it crazy query strings, do a POST with a large amount of binary data instead of a simple GET, send it bogus cookies, etc. Now do this a massive number of times with randomly generated attacks. See what breaks. (See here for a list of web fuzzing tools.) The nice thing about fuzzing from an agile development perspective is that it can be automated and, as long as it is black box, doesn’t have to be constantly updated the way interface-level tests (e.g. Selenium) do.
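The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the request dictionary, `fuzz_request`, `run_fuzz`, and `fragile_target` are hypothetical names invented here, and the target is a stand-in function rather than a real web application. A real fuzzer would send the generated data over HTTP and watch for crashes, error responses, or hangs.

```python
import random
import string
from urllib.parse import urlencode


def random_text(rng, max_len=32):
    """Random printable junk, including quotes, whitespace, and punctuation."""
    length = rng.randrange(max_len + 1)
    return "".join(rng.choice(string.printable) for _ in range(length))


def random_bytes(rng, max_len=4096):
    """Random binary blob of random length."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))


def fuzz_request(rng):
    """Build one garbage request (hypothetical shape, not a real HTTP client call)."""
    method = rng.choice(["GET", "POST"])
    params = {random_text(rng, 8): random_text(rng) for _ in range(rng.randrange(5))}
    return {
        "method": method,
        "query": urlencode(params),
        "cookies": {random_text(rng, 8): random_text(rng)},
        "body": random_bytes(rng) if method == "POST" else b"",
    }


def run_fuzz(target, iterations=1000, seed=0):
    """Feed random requests to a black-box target and record what breaks."""
    rng = random.Random(seed)  # seeded so any failure can be reproduced
    failures = []
    for _ in range(iterations):
        request = fuzz_request(rng)
        try:
            target(request)
        except Exception as exc:
            failures.append((request, exc))
    return failures


def fragile_target(request):
    """Stand-in for the system under test: assumes every body is valid UTF-8."""
    request["body"].decode("utf-8")


if __name__ == "__main__":
    broken = run_fuzz(fragile_target, iterations=200, seed=1)
    print(len(broken), "requests broke the target")
```

Seeding the generator is a deliberate choice: a random attack that cannot be replayed is hard to turn into a bug report.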

The Myth of Code Coverage

If you do unit testing, you want to measure how good a job you are doing. Thus we use code coverage tools to see what percentage of line and branch coverage we have. Where code “coverage” is somewhat misleading is that it doesn’t take state into account. For example, if you are testing a class with two instance variables, each a character that can contain ‘a’ through ‘z’, then you have 26 * 26 = 676 different possible states. If you add a few more instance variables with larger ranges, some lists of related objects, and so on, you can get into some pretty serious state.
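To make the gap between coverage and state concrete, here is a small Python sketch. The `Pair` class and its `distance` method are hypothetical, made up for this illustration. The method is a single line with no branches, so one test call yields full line and branch coverage, yet exactly one of the 676 states makes it fail.

```python
import itertools
import string


class Pair:
    """Hypothetical class with two single-letter instance variables."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def distance(self):
        # No branches: a single call covers 100% of this method.
        # The divisor is zero only when first == 'a' and second == 'z'.
        return 100 // (ord(self.first) - ord(self.second) + 25)


# Every possible state of the two instance variables.
all_states = list(itertools.product(string.ascii_lowercase, repeat=2))


def failing_states():
    """Exhaustively try every state and collect the ones that blow up."""
    bad = []
    for first, second in all_states:
        try:
            Pair(first, second).distance()
        except ZeroDivisionError:
            bad.append((first, second))
    return bad


if __name__ == "__main__":
    print(Pair("m", "m").distance())  # 4, and coverage is already "complete"
    print(len(all_states))            # 676
    print(failing_states())           # [('a', 'z')]
```

Exhaustive enumeration works here only because the state space is tiny. With a few more variables or a list of related objects, it stops being an option.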

How complex can this get? Consider the case where our class represents a Turing machine, with a tape instance variable, and a finite state instruction set. Testing that class is the same as testing any arbitrary program — very very very hard.

You might scoff and say that your code doesn’t implement anything as silly as a Turing machine, but how many software programs contain an implicit or explicit language (configurable logic for survey applications, extensible behavior for editors, formulas and scripts in spreadsheets) that represents the same sort of complexity?

So, you should still do your unit testing, but when your software starts to resemble a programming environment, include large system regression tests and random walks through the application functionality.
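A random walk of this kind might look like the Python sketch below. Everything in it is an assumption made for illustration: `Cart` is a toy stand-in for a real application, and `invariant_holds` is one hypothetical property that should survive any sequence of actions. The walk picks actions at random, checks the invariant after each step, and returns the action history if the invariant ever breaks.

```python
import random


class Cart:
    """Hypothetical toy application standing in for the system under test."""

    def __init__(self):
        self.items = {}
        self.count = 0

    def add(self, name):
        self.items[name] = self.items.get(name, 0) + 1
        self.count += 1

    def remove(self, name):
        # Removing an item that is not in the cart is a no-op.
        if self.items.get(name, 0) > 0:
            self.items[name] -= 1
            self.count -= 1

    def clear(self):
        self.items = {}
        self.count = 0


def invariant_holds(cart):
    """The cached count must always match the per-item quantities."""
    return cart.count >= 0 and cart.count == sum(cart.items.values())


def random_walk(steps=1000, seed=0):
    """Apply random actions; return the failing history, or None if all is well."""
    rng = random.Random(seed)  # seeded so a failing walk can be replayed
    cart = Cart()
    names = ["apple", "pear", "plum"]
    history = []
    for _ in range(steps):
        action = rng.choice(["add", "remove", "clear"])
        if action == "clear":
            cart.clear()
            history.append(("clear",))
        else:
            name = rng.choice(names)
            getattr(cart, action)(name)
            history.append((action, name))
        if not invariant_holds(cart):
            return history
    return None


if __name__ == "__main__":
    print(random_walk(steps=500, seed=3))  # None when no invariant is violated
```

Returning the history is the useful part: the sequence of actions that broke the invariant can be replayed as an ordinary regression test.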