Friday, September 18, 2009

Making testing better

Although unit testing has formal metrics that can demonstrate at least a minimal level of thoroughness, many organizations struggle to find a measurement that tells them how much unit testing is enough. The minimal level of testing can be measured in terms of code coverage, and coverage criteria range from ones as weak as statement coverage to ones as strong as du-path coverage. But it is not my goal to explain each of those. My goal is to help those who are looking for an alternative measure of unit testing quality.

One idea is to measure it post factum. Admittedly, that may be too late to change anything in the current project; nonetheless, this knowledge can be used in the following projects to reduce the number of defects of a specific sort. I mean the feedback that you receive from testing.

Software is developed in several stages. Even in Agile there are stages; they are just shorter and tightly intertwined. Defects injected at a given stage can be found at that stage or at the following ones. Depending on how many of the defects injected at a stage escaped its quality control, we can draw a conclusion about the quality of testing at that stage.

For example, we have created a unit and performed unit tests. As a result of testing we found and fixed 16 issues. Later, when the unit went through Integration and System testing, we discovered 4 additional issues attributed to it. This means that the defect removal efficiency (DRE) of unit testing for that module was:

DRE = (1 - 4 / (16 + 4)) * 100% = 80%
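
If you want to automate the calculation, a minimal sketch could look like this (plain Python; the function name and its interface are mine, not taken from any tool):

    def defect_removal_efficiency(found_at_stage, escaped_to_later_stages):
        """DRE of a testing stage: the share of its defects that it actually caught."""
        total = found_at_stage + escaped_to_later_stages
        if total == 0:
            return None  # no defects attributed to this stage, DRE is undefined
        return found_at_stage / total * 100.0

    # The example from the text: 16 defects found in unit testing, 4 escaped to later stages.
    print(defect_removal_efficiency(16, 4))  # 80.0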

To enable this measurement you need to analyze the issues found at the later stages of development. If you have a defect tracking system (and I hope you do), you can simply introduce a field denoting which development stage a defect is attributed to. The biggest problem here is getting the defects found during unit testing filed at all. Many developers see that as additional, unneeded paperwork, so if you are going to implement this metric, be prepared for resistance. You may decrease the pain by requiring only brief information on unit testing defects (a summary only). There are also tools that can be integrated into the development IDE to soothe this pain for developers.

The effectiveness of other testing stages can be measured the same way. All you need is to collect this information continuously. Two or three projects later you will have plenty of interesting information to think about (a small analysis sketch follows the list below):

- Why is this module's unit testing lower in effectiveness than the others'?
- Why does developer A usually achieve higher effectiveness than the others? Can he or she share that experience with the team?
- Who is responsible for most of the issues found down the road? What can you do about it?
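
Here is the analysis sketch mentioned above: a rough illustration of how per-module (or per-developer) DRE could be pulled out of a defect tracker export, assuming the stage-attribution field described earlier. The record layout, field names, and values are hypothetical; adjust them to whatever your tracker actually exports.

    from collections import defaultdict

    # Hypothetical tracker export: each defect carries the stage it is attributed to
    # (the stage that should have caught it) and the stage where it was actually found.
    defects = [
        {"module": "billing", "attributed_to": "unit", "found_at": "unit"},
        {"module": "billing", "attributed_to": "unit", "found_at": "integration"},
        {"module": "auth",    "attributed_to": "unit", "found_at": "unit"},
        {"module": "auth",    "attributed_to": "integration", "found_at": "system"},
    ]

    def dre_by(key, stage):
        """DRE of `stage`, broken down by an arbitrary key (module, developer, ...)."""
        caught, escaped = defaultdict(int), defaultdict(int)
        for d in defects:
            if d["attributed_to"] != stage:
                continue
            if d["found_at"] == stage:
                caught[d[key]] += 1
            else:
                escaped[d[key]] += 1
        return {g: caught[g] / (caught[g] + escaped[g]) * 100.0
                for g in set(caught) | set(escaped)}

    print(dre_by("module", "unit"))  # e.g. {'billing': 50.0, 'auth': 100.0}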

Hope this helps! :)

7 comments:

  1. Vladimir,

    Excellent summary of the problem.

    One question though -- what criteria does one use to attribute a defect as one that was missed in unit testing? Does this mean that if the code coverage criterion for unit testing was "all statements and branches", then a defect found in integration testing that should have been uncovered under that unit test completeness criterion is attributable to unit testing? If not, what am I missing?

  2. The problem with using Code Coverage as a metric is that it only measures one aspect of your tests, which is what code paths the tests manage to traverse.

    The thing is that to be effective, tests need to do three things:
    1) exercise a particular element or path in the code.
    2) do so in a way that uncovers errors
    3) detect the errors when they occur.

    Perhaps you can already see where I'm going here? Code coverage tools will tell you #1, but not #2 and #3. So if you don't have tests that are 'good tests' relative to all three aspects, it's entirely possible to have 100% code coverage, and a ton of undetected bugs still lurking in the system.

    Having developers write tests first, and watch them fail, THEN write the code that makes them pass can mitigate the third aspect for the most part, but can still leave the second unaddressed. That aspect takes careful selection of test criteria and, for example, of the sample values in a 'FIT' or BDD "Scenario outline" style table of values to test with.
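
    For illustration only, a table-driven test of that sort might look roughly like this (a pytest-style sketch; the function under test and the sample values are just placeholders):

        import pytest

        def discount_rate(order_total):
            """Placeholder unit under test: tiered discount rate."""
            if order_total >= 1000:
                return 0.15
            if order_total >= 100:
                return 0.05
            return 0.0

        # Table of sample values, chosen to probe the boundaries rather than
        # just a couple of "happy path" inputs.
        CASES = [
            (0,    0.0),   # lower boundary
            (99,   0.0),   # just below the first threshold
            (100,  0.05),  # exactly at the first threshold
            (999,  0.05),  # just below the second threshold
            (1000, 0.15),  # exactly at the second threshold
        ]

        @pytest.mark.parametrize("order_total, expected_rate", CASES)
        def test_discount_rate(order_total, expected_rate):
            assert discount_rate(order_total) == expected_rate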

    To answer your question directly, William: that takes looking at bugs uncovered in integration testing or later, and once you discover what 'unit' of code is responsible for the bug, you pretty much know the unit tests there didn't catch it. (So better modify them, or add a new test to catch the bug, make sure it's failing, then fix the bug, and your new/modified tests should pass.)

  3. William, a defect in the body of a unit is always attributed to unit testing. Defects that appear between units (boundary shifts, protocol misuse, non-validated input, etc.) are attributed to Integration testing if they appear at the later testing stages...

    The general rule is that defects are attributed to the process that is more suitable to catch them.

  4. Thank you for the clarifications with regard to _Defect Attribution_.

    If I use an SOA service or some transform function as an example, then:
    a defect will be attributed to Unit Testing when the defect is a coding error in the unit and the error can be discovered without combining the unit with other unit-tested code, whereas
    a defect will be attributed to Integration Testing when the error cannot be discovered without running the unit with other unit-tested code.

    Have I got that correct?

    Do you ever consider attributing the error to the design of the unit, or to the specification that led to the design?

    With regard to the use of _Code Coverage_ criteria as a measure of testing completeness, I agree with Chuck's assertion that code coverage may not tell much about the quality of the testing, but I feel that it is better than not having any measurable completeness criterion for the test effort.

    Most of my work has been at the system test level, where my considerations reflect the completeness, correctness, and lack of ambiguity in the specification (requirements). I feel fortunate to have enough experience in that area to be able to target a level of functional coverage for completeness. We usually do not worry about code coverage (statements and branches, for example) unless we are working in a domain that includes it as a must-have criterion. Then, as Chuck says above, the coverage instrumentation and analysis simply tells us which lines of code were not touched by the test cases, whereupon we investigate the reasons for those lines of code and devise suitable test cases with the intention of uncovering defects.

  5. Let's define what we mean by "effective" and then I can give you a targeted response. This is an important topic that has been around for 20+ years. The issue is that we need a strong definition; otherwise our comments will not be taken seriously or lead us to a solution.

    Let's work on a definition of "effective" relative to software testing, and see how it can help guide our work and outcomes.

    Thanks,
    -Herb

  6. I would define "effective" in terms of defect removal effectiveness and cost...

  7. More precisely, testing effectiveness is a function of defect removal capabilities and cost.
