Inspired by Probabilistic Programming and Bayesian Methods for Hackers, I wrote a program intended to assess the quality and confidence of testing.
The inputs to the program are as follows:
- your prior estimate of the probability that the code you’ve just written is free from bugs
- given that your program passes your test suite without errors, your estimate of the probability that your code is error-free
- given that your program passes your test suite without errors, your estimate of the probability that your program still contains bugs.
Basically, the first of these three parameters, the ‘prior’, expresses the confidence you have in your coding ability. Parameters 2 and 3 express your confidence in the quality of your test suite.
Above are three graphs, each with a prior of 20%; that is, your belief in your ability to write bug-free code is 20%.
In the first graph, we can pretend that we have just begun to build our test suite, and our confidence in its capability for catching problems is fairly low. Therefore, we set the probability of the code being free from bugs, given that it passes the test suite, to some number reflecting that uncertainty, let’s say 80%. Similarly, we set the probability of the code having bugs, despite having passed the test suite, to 20%.
In the second graph, those numbers have been changed to 90% and 10%; that is, our confidence in our test suite has increased a bit (but we still have a fairly low estimate, 20%, of our ability to write bug-free code).
In the third graph, our confidence in our test suite has grown further.
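To make the arithmetic behind such an update concrete, here is a minimal sketch of Bayes' rule with the numbers quoted for the first two graphs. It is not the program described in this post. It assumes that parameters 2 and 3 are read as likelihoods, that is, the probability of passing the test suite if the code is bug-free and the probability of passing it if the code has bugs. The function name posterior_bug_free and its argument names are made up for illustration.

```python
def posterior_bug_free(prior, p_pass_given_clean, p_pass_given_buggy):
    """Return P(bug-free | tests pass) using Bayes' rule.

    Assumption: the two test-suite parameters are likelihoods,
    P(pass | bug-free) and P(pass | buggy). This reading is ours,
    not something the original program is known to do.
    """
    # Total probability of seeing the test suite pass at all.
    evidence = p_pass_given_clean * prior + p_pass_given_buggy * (1.0 - prior)
    if evidence == 0:
        raise ValueError("a passing test run is impossible under these inputs")
    return p_pass_given_clean * prior / evidence


if __name__ == "__main__":
    prior = 0.20  # belief in our ability to write bug-free code
    # Parameter pairs quoted for the first and second graph.
    for p_clean, p_buggy in [(0.8, 0.2), (0.9, 0.1)]:
        post = posterior_bug_free(prior, p_clean, p_buggy)
        print(f"pass|clean={p_clean:.1f} pass|buggy={p_buggy:.1f} "
              f"-> posterior={post:.3f}")
```

Under this reading, a passing run moves the 20% prior to 50% with the first pair of numbers and to roughly 69% with the second. Whether that matches the plotted curves depends on how the original program interprets its inputs.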
The problem, of course, is not doing the Bayesian calculation; that’s easy. The problem in the real world is finding realistic numbers for the parameters. For the prior, one might use historical records for each developer. For the confidence in the test suite, I can imagine several factors having an impact, e.g. test coverage (that is, the percentage of the application’s lines of code exercised by the test suite), memory leak and usage analysis, performance analysis, domain experience, etc.
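As one possible way to turn historical records into a prior, the sketch below takes the fraction of a developer's past changes that turned out to be bug-free. It adds Laplace (add-one) smoothing so that a short history does not yield a prior of exactly 0% or 100%. The smoothing is my choice for illustration, and the function name prior_from_history and the idea of counting "changes" are assumptions, not part of the program described above.

```python
def prior_from_history(clean_changes, total_changes):
    """Estimate P(bug-free) from a developer's track record.

    Hypothetical helper: uses Laplace (add-one) smoothing, so an
    empty history gives 0.5 rather than a division by zero.
    """
    if total_changes < 0 or not 0 <= clean_changes <= total_changes:
        raise ValueError("need 0 <= clean_changes <= total_changes")
    return (clean_changes + 1) / (total_changes + 2)


if __name__ == "__main__":
    # Example: 1 of 8 past changes shipped without any bug reports.
    print(prior_from_history(1, 8))
```

With 1 bug-free change out of 8, this gives the 20% prior used in the graphs.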
The point of the Bayesian analysis in this context is to identify some rationale for when the testing effort is ‘good enough’ – we can never do exhaustive testing for anything but trivial applications.