Friday, January 25, 2008

Testing

You really appreciate the value of tests (especially regression tests) when you have to fix code that is complex, is used in multiple places, and you aren't sure of the various conditions under which it gets invoked (or even what the result should be). Unfortunately, it is sometimes difficult to convince management. We asked for a separate environment in which to create and run tests, and were initially denied. The separate environment was needed because the extreme complexity of the business demanded, in my opinion, an environment where the initial state is exactly known; otherwise there is no way to accurately predict the outcome of running time-sensitive, old-data-sensitive scenarios. However, the cost of maintaining such an environment was deemed too high. I'm not so sure. Anyway, we are now getting an environment, so yay, let the games begin.
It's also at times like these that you realise how out of touch with reality the TDD/Agile/JUnit testing people you see are, because they never seem to discuss anything but unit testing.
I can't see unit tests adding much value to the system I work on, but oh, what wouldn't I give for a set of repeatable integration and functional tests.
Other than the really good Next Generation Java Testing book, I haven't seen testing books deal with real-world stuff. (Did anyone else think that the JUnit cookbook's Money example was representative of what people test in an automated way in real life?)

So here are the things that make the testing complicated:
a. The system runs on a cluster. Problems have been reported on the cluster, some of which have turned out to be environment-related and some code-related. Would anything other than in-container testing on representative environments find these out? This goes directly against the people advocating mocking interfaces, not running in a container, having fast-running tests, etc.
b. One of the defects was caused when two unrelated processes ran at exactly the same time and on the same server of the cluster. The processes shared infrastructure code but were unrelated business-wise and code-wise. How would a test have found this out, other than sheer luck? Note that JUnit as a framework is really bad at this. Again, this is not a complaint about JUnit, just that you don't see people discussing concurrent testing of unrelated data. (A rough sketch of what such a test might look like follows this list.)
c. A complex architecture. A portal sends data asynchronously (over JMS) to a WLI system with multiple processes, which in turn needs a lot of system data (histories) to perform calculations, makes its business decisions using a set of business rules that are user-configurable (through a totally different system), and then persists the data to a database.
Assuming you want to test that everything works end to end, is there any way other than restoring the system to a known point in time (so you know what the outcome should be)? The catch is that running the same test on the exact same data might still not return the same results, because the results are time-sensitive (e.g. there is a different contract for 2007 than for 2008, so running the test in 2008 will give you different results). The second sketch below shows the usual trick for pinning "today" down, at least at the component level.
d. The results depend on the sequence of tests. This isn't a code problem; this is exactly how the business works. Other than creating different data for different cases, this is difficult to solve. In addition, you also want to test what happens when a particular sequence of events occurs (a claim comes in, it is paid, the cheque is sent, the claim is adjusted, a new cheque is sent, the first one is returned, now assert); the third sketch below walks through exactly that. Yet the unit testing folks shout themselves hoarse that tests should be independent of each other. I know, I know, what I have described isn't a unit test. But you see, the individual bits work. The sequence fails.
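
For what it's worth, here is roughly what I mean by a test that deliberately fires two unrelated jobs at the same instant. It is only a sketch: PaymentBatchJob and StatementJob are made-up stand-ins for the real processes, and the real thing would obviously exercise the shared infrastructure code rather than toy classes.

    import java.util.concurrent.Callable;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class ConcurrentUnrelatedJobsTest {

        // Hypothetical stand-ins for two business-wise unrelated processes
        // that happen to share infrastructure code.
        static class PaymentBatchJob { boolean run() { return true; } }
        static class StatementJob    { boolean run() { return true; } }

        @Test
        public void unrelatedJobsStartedTogetherBothStillSucceed() throws Exception {
            final CountDownLatch startGun = new CountDownLatch(1);
            ExecutorService pool = Executors.newFixedThreadPool(2);

            Future<Boolean> a = pool.submit(new Callable<Boolean>() {
                public Boolean call() throws Exception {
                    startGun.await();                  // wait for the starting gun
                    return new PaymentBatchJob().run();
                }
            });
            Future<Boolean> b = pool.submit(new Callable<Boolean>() {
                public Boolean call() throws Exception {
                    startGun.await();
                    return new StatementJob().run();
                }
            });

            startGun.countDown();                      // release both at (nearly) the same instant
            assertTrue("payment batch failed when run alongside statements", a.get());
            assertTrue("statement job failed when run alongside payments", b.get());
            pool.shutdown();
        }
    }
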
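And here is the usual trick for the time-sensitivity problem, at least at the unit or component level: let the test tell the code what "today" is instead of letting it read the system clock. Again, a sketch only; the Clock interface and the PremiumCalculator are invented for illustration, and it does nothing for the restore-the-whole-environment problem.

    import java.util.Calendar;
    import java.util.Date;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class TimeSensitiveCalculationTest {

        interface Clock { Date now(); }

        // Toy calculator: picks the 2007 or 2008 "contract rate" based on the injected clock.
        static class PremiumCalculator {
            private final Clock clock;
            PremiumCalculator(Clock clock) { this.clock = clock; }
            double premium() {
                Calendar c = Calendar.getInstance();
                c.setTime(clock.now());
                return c.get(Calendar.YEAR) <= 2007 ? 100.0 : 120.0;
            }
        }

        private static Clock fixedAt(final int year, final int month, final int day) {
            return new Clock() {
                public Date now() {
                    Calendar c = Calendar.getInstance();
                    c.set(year, month - 1, day, 0, 0, 0);
                    return c.getTime();
                }
            };
        }

        @Test
        public void premiumUsesThe2007ContractWhenTodayIsPinnedTo2007() {
            // Same data, same outcome, no matter what year the test actually runs in.
            PremiumCalculator calc = new PremiumCalculator(fixedAt(2007, 6, 15));
            assertEquals(100.0, calc.premium(), 0.001);
        }
    }
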
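And finally, the sort of sequence test I'm talking about: one test that walks through the whole lifecycle in order and asserts only at the end. The Claim class here is a toy stand-in for the real thing; the point is the shape of the test, not the arithmetic.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class ClaimLifecycleScenarioTest {

        // Toy stand-in for the real claim aggregate.
        static class Claim {
            private double outstanding;                       // money currently out with the member
            void chequeSent(double amount)     { outstanding += amount; }
            void chequeReturned(double amount) { outstanding -= amount; }
            double amountOutstanding()         { return outstanding; }
        }

        @Test
        public void adjustedClaimWithReturnedChequeLeavesOnlyTheAdjustedAmountOutstanding() {
            Claim claim = new Claim();
            claim.chequeSent(500.0);      // claim comes in, is paid, first cheque goes out
            claim.chequeSent(450.0);      // claim is adjusted, replacement cheque goes out
            claim.chequeReturned(500.0);  // the original cheque comes back
            assertEquals(450.0, claim.amountOutstanding(), 0.001); // assert only now, at the end
        }
    }
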
Oh I can go on. But there is an India v/s Australia cricket match. Go India!

Debugging

Due to circumstances beyond our control, we've had to take up fixing someone else's code. The business rules are complex (which means the code must be too). There are no existing tests (a consequence of both the complex rules and a complex architecture; the latter could be simplified were it not for the fact that the project is years late). Now, the TDD guys are probably jumping up and down at this stage, pointing out that the lack of tests is the reason everything is so complex. Just ignore them; we'll get to these folks later. The bottom line is there are no tests. We are trying to create some regression tests, but that will take time, and meanwhile there are still bugs to be resolved. I was assigned one of them.
It took me two hours to set up the environment: one is a WebLogic Portal domain and the other a WebLogic Integration domain. The build for the portal takes about 30 minutes, after which WebLogic Workshop duly crashes. I have to point my servers to a configuration other than their default, which means messing around manually. It takes another hour to understand what the defect is. It takes some more time to verify that the tester is indeed right and the system is doing something wrong. It takes some time to identify the areas of code that are possibly the cause. It takes a couple of hours to understand the nuances of the mostly undocumented code.
I then have to go and ask another dev whether the tester's expected results are correct, and it turns out they aren't. There is a defect and what the system currently does is wrong, but what the system should do is undocumented and unknown. We could make a guess, but this impacts other systems, so we need them to confirm. A meeting is scheduled.
The meeting lasts two hours. Some issues are resolved, some new ones are raised, and some have to be followed up with other people. Armed with some more knowledge, I look through the code. I get a sudden inspiration, test out two scenarios, and it looks like I'm right.

The defect is fixed by modifying all of two lines of code. Total time to fix the defect? 1.5 days.