Thursday, June 23, 2011

CEO, are you ready to change?

Many leaders want to improve the quality of the products their organization makes. Few actually succeed. Many of those who don't succeed blame their managers for not performing to the expected level and blah-blah-blah. Meanwhile, the problem may lie with the leader him- or herself.

The goal of improving quality implies changes throughout the organization. The head is no exception. The problem with bosses is their belief that they can drive things by desire alone. Well, my daughter is 4 years old and she also believes she can ;) When you are about to start changes in the company, start with yourself. It is very easy to think that someone else will do all the hard work. No one will be in a position to do this if the problem that prevents changes is at the top, because no one has the authority to affect the boss's behavior.

Late changes are a killer of quality processes. Even agile can't stand late changes. If you are the boss, you must learn the "enough is enough" principle. Learn to distinguish "a must" from "nice to have" so that you don't force your team and processes to collapse into a late-changes nightmare. More important things may suffer because of the "what the boss wants comes first" attitude.

Yes, most of the top guys are very selfish and addicted to the idea that they are the only people who know how to drive things. This is far from always true. Let others drive for a while and see what happens. Sometimes you even have to become a follower. Listen very carefully to what the quality people tell you, and trust them even when you are not sure they are doing what you think is right. They are the people who know their work better than you do. Don't pretend you can cook better than the chef at your favorite restaurant ;)

Sorry for being a bit clumsy. I have little time. Now back to work, and remember: ENOUGH IS ENOUGH :)

Wednesday, June 8, 2011

Load testing

Today I wrote to a customer about what we usually do for load/performance testing. I decided to put it here in case I need it again, as well as to help you organize your load testing activities.

***

Load testing usually starts with learning about the customer's problem. Every load testing session is unique, so it is very important to start moving in the right direction from the beginning.

After learning the purpose of testing, we develop a test strategy. The test strategy includes the definition of load scenarios: how many virtual users are to be involved, where in the system to apply the load (at which hierarchy level), what type of scripts to use (simulative or fast), etc.

Then we start working on scenarios. Scenarios are the individual scripts which will be played back by virtual users. Customer input is vital because domain knowledge is the key to creating the right set of scenarios.

Then we execute several test runs with different load levels to see how the server reacts and to make sure that the load is adequate and the test results are not affected by configuration or communication issues. It usually takes 5 to 10 runs to finalize the test plans and conditions.
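To illustrate the idea of stepping through load levels, here is a minimal Python sketch. It is not the tooling we actually use: `fake_request`, `run_load_level` and `run_levels` are hypothetical names, threads stand in for virtual users, and the "server" is just a short sleep. In a real run the scenario would be one of the scripts agreed with the customer.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_request():
    """Hypothetical stand-in for one scenario step against the server."""
    time.sleep(0.001)  # pretend the server took about 1 ms to answer


def run_load_level(scenario, virtual_users, iterations):
    """Play `scenario` back `iterations` times per virtual user, in parallel."""

    def one_user():
        timings, errors = [], 0
        for _ in range(iterations):
            started = time.perf_counter()
            try:
                scenario()
            except Exception:
                errors += 1  # a failed call counts as an error, not a timing
            else:
                timings.append(time.perf_counter() - started)
        return timings, errors

    # One worker thread per virtual user (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        futures = [pool.submit(one_user) for _ in range(virtual_users)]
        results = [f.result() for f in futures]

    timings = [t for user_timings, _ in results for t in user_timings]
    errors = sum(user_errors for _, user_errors in results)
    return {
        "virtual_users": virtual_users,
        "requests": len(timings) + errors,
        "errors": errors,
        "mean_seconds": mean(timings) if timings else None,
    }


def run_levels(scenario, levels, iterations=5):
    """Run one test per load level, from the lightest to the heaviest."""
    return [run_load_level(scenario, users, iterations) for users in sorted(levels)]


if __name__ == "__main__":
    for row in run_levels(fake_request, levels=[1, 5, 10]):
        print(row)
```

Comparing the rows between levels is what shows how the server reacts: if errors appear or the mean time jumps at a level where the server is clearly not busy, the cause is more likely configuration or communication than the load itself.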

Very tight communication with the development representative and the deployment team is essential. We have to make sure the testing is performed in clean conditions, where no one else can interfere and alter the test results.

During the testing we usually set up server-side monitors to find the resource bottleneck, if any.

***

Having done all of the above, you will make sure that the problem is solved, not just that a load test was performed.

Wednesday, June 1, 2011

Risk management in Test Planning

Did you ever ask yourself what testing does in the big picture? Surely you did :) My answer would be: testing reduces the risk that users will face any issue with the software. In this respect it is very close to what other engineering specialists do. In the same way, they reduce the risk that a building collapses in a severe earthquake or that a car suddenly catches fire while you drive.

Did you ever wonder what techniques other engineers use? Well, I guess most of you didn't, in which case this article will be of help.

I have worked with software that implemented one of the most widely used strategies for risk reduction: Failure Mode and Effects Analysis (hereafter FMEA for the sake of conciseness).

The key in risk management is keeping the possible bad effects in sight. So, it's all about perception. You need to imagine what can go wrong with your application in the user's hands and build a strategy to prevent that particular risk from materializing. In car manufacturing, there can be the risk of losing a wheel at high speed. Engineers will think of special means to prevent a wheel from coming off even if the nuts get loose. Having solved one such problem, engineers advance to the next one, starting from the most severe (severity in this case is a combination of probability and impact).

Now back to software testing. The risks that our users may face are caused by software defects. Of course, it is unrealistic to foresee the defects themselves. But we can foresee the consequences of malfunctions. Let's start from the very beginning. The very first thing a user does is try to set up the application. So, the worst thing that may happen is a setup failure preventing further use of the software. We have found the failure mode, but this is not enough to start creating a prevention strategy, as this risk is too vague. The point is to be as precise as possible in describing the failure mode.

The setup may fail in many different ways:

- Setup failure in the default setup conditions
- Setup failure due to changing setup options
- Setup failure in unattended mode
- Setup failure while running through the deployment center
- Setup failure due to software compatibility
- Setup failure due to hardware compatibility
- Setup failure due to old version compatibility
- Setup racing failure
- Setup performance is unacceptably low

Now we have a list of possible failure scenarios that can be addressed by testing. Each of the items above has a different combination of probability and impact. For example, "Setup failure in the default setup conditions" may have a low probability but the highest impact. Meanwhile, the risk "Setup failure while running through the deployment center" may have a lower impact but the highest probability.
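Here is a small Python sketch of how such a list could be ranked. Everything in it is an assumption made for illustration: the 1 to 5 scales, the choice to combine probability and impact by multiplying them, the names `FailureMode` and `prioritize`, and the ratings themselves, which are made up to mirror the two examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    probability: int  # 1 (rare) .. 5 (almost certain), assumed scale
    impact: int       # 1 (cosmetic) .. 5 (blocks all use), assumed scale

    @property
    def risk_score(self) -> int:
        # Assumption: combine the two ratings by multiplying them.
        return self.probability * self.impact


def prioritize(modes):
    """Return failure modes ordered from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk_score, reverse=True)


# Illustrative ratings only; real values come from the team's own estimates.
modes = [
    FailureMode("Setup failure in the default setup conditions", probability=1, impact=5),
    FailureMode("Setup failure while running through the deployment center", probability=5, impact=3),
    FailureMode("Setup failure in unattended mode", probability=2, impact=3),
]

for mode in prioritize(modes):
    print(f"{mode.risk_score:>2}  {mode.name}")
```

With these made-up numbers the deployment center item comes out on top and the default setup item at the bottom. A plain product can underrate rare but catastrophic failures, so you may prefer to give impact extra weight or simply review the highest-impact items separately.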

FMEA has a lot of templates that help to summarize and analyze this information. You can easily find them on the internet. Most of them will be overburdened with information you will hardly need in your analysis, so I suggest my own variant of the table:

| # | Failure mode | Impact | Probability | Prevention | Comments |

Prevention depends on the context, so I can only offer an example.

One of the systems that I worked on with a test team was a very old and big client-server application. Reliability was the real problem, so I put this risk in the table and started to think of ways to change things for the better. The probability of the failure mode was estimated as medium; the impact was the highest. Prevention included non-stop automated reliability tests on a dedicated server, 24x7. As a result, it helped to find major issues that could never have been found by other types of testing.

So, the bottom line is:

- Strategies developed in other industries work in software development too, so don't neglect those findings just because we-are-so-different. This is not the case, nor is it an excuse.

- Try to foresee possible failure scenarios.

- Build up a prevention strategy (which can include not only testing, but all process stages).

- Define a plan where all the required prevention steps will be listed.

- Work to the plan but keep an eye on the FMEA table. Things may change as you go, so corrections may be needed.


Hope this helps! I would be happy to learn what you think.