
TESTING METRICS

What is a Test Metric?

A test metric is a means of analyzing the current maturity level of a testing process and predicting future trends. Ultimately it is meant to enhance testing: activities that were missed in the current cycle can be added in the next build, improving the overall testing process.
Metrics are numerical data that help us measure test effectiveness.
Metrics are produced in two forms:
1. Base Metrics and
2. Derived Metrics.
Examples of Base Metrics:

# Test Cases
# New Test Cases
# Test Cases Executed
# Test Cases Unexecuted
# Test Cases Re-executed
# Passes
# Fails
# Test Cases Under Investigation
# Test Cases Blocked
# 1st Run Fails
# Test Case Execution Time
# Testers

Examples of Derived Metrics:

% Test Cases Complete
% Test Cases Passed
% Test Cases Failed
% Test Cases Blocked
% Test Defects Corrected
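To make the relationship concrete, here is a minimal Python sketch (all counts hypothetical, and whether percentages are taken against total or executed test cases is a team convention):

# Hypothetical base-metric counts for one test cycle.
total_cases = 200
executed = 180
passed = 150
failed = 20
blocked = 10

# Derived metrics are simple percentages of the base counts.
pct_complete = executed / total_cases * 100  # % Test Cases Complete
pct_passed = passed / executed * 100         # % Test Cases Passed
pct_failed = failed / executed * 100         # % Test Cases Failed
pct_blocked = blocked / total_cases * 100    # % Test Cases Blocked

print(f"Complete: {pct_complete:.1f}%, Passed: {pct_passed:.1f}%, "
      f"Failed: {pct_failed:.1f}%, Blocked: {pct_blocked:.1f}%")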

Objective of Test Metrics
The objective of test metrics is to capture the planned and actual quantities of effort, time, and resources required to complete all phases of testing of a software project.
Test metrics usually cover three things:
1. Test coverage
2. Time for one test cycle
3. Convergence of testing

Why Testing Metrics?
As we all know, a large percentage of software projects suffer from quality problems. Software testing provides visibility into product and process quality. Test metrics are key "facts" that help project managers understand their current position and prioritize their activities to reduce the risk of schedule overruns on software releases.
Test metrics are a very powerful management tool. They help you to measure your current performance, and because today's data becomes tomorrow's historical data, it is never too late to start recording key information on your project. This data can be used to improve future work estimates and quality levels. Without historical data, estimates will be little more than guesses.
You cannot track the project status meaningfully unless you know the actual effort and time spent on each task as compared to your estimates. You cannot sensibly decide whether your product is stable enough to ship unless you track the rates at which your team is finding and fixing defects. You cannot quantify the performance of your new development processes without some statistics on your current performance and a baseline to compare it with. Metrics help you to better control your software projects. They enable you to learn more about the functioning of your organization by establishing a Process Capability baseline that can be used to better estimate and predict the quality of your projects in the future.
The benefits of having testing metrics
1. Test metrics data collection helps predict the long-term direction and scope for an organization, enables a more holistic view of the business, and identifies high-level goals
2. Provides a basis for estimation and facilitates planning for closure of the performance gap
3. Provides a means for control / status reporting
4. Identifies risk areas that require more testing
5. Provides meters to flag actions for faster, more informed decision making
6. Quickly identifies and helps resolve potential problems and identifies areas of improvement
7. Test metrics provide an objective measure of the effectiveness and efficiency of testing
Key factors to bear in mind while setting up test metrics
1. Collect only the data that you will actually use to make informed decisions and to alter your strategies; if you are not going to change your strategy regardless of the findings, your time is better spent testing.
2. Do not base decisions solely on data that is variable or can be manipulated. For example, measuring testers on the number of tests they write per day can reward them for speeding through superficial tests or punish them for tracking trickier functionality.
3. Use statistical analysis to get a better understanding of the data. Difficult metrics data should be analyzed carefully, and the templates used for presenting data should be self-explanatory.
4. One of the key inputs to the metrics program is the defect tracking system, in which the reported process and product defects are logged and tracked to closure. It is therefore very important to carefully decide on the fields that need to be captured per defect in the defect tracking system, and then to generate customizable reports.
5. Metrics should be decided on the basis of their importance to stakeholders rather than ease of data collection. Metrics that are of no interest to the stakeholders should be avoided.
6. Inaccurate data should be avoided, and complex data should be collected carefully. Proper benchmarks should be defined for the entire program.
Deciding on the Metrics to Collect

There are literally thousands of possible software metrics to collect and possible things to measure about software development, and there are many books and training programs available about software metrics. Which of the many metrics are appropriate for your situation? One method is to start with one of the many available published suites of metrics and a vision of your own management problems and goals, and then customize the metrics list based on the following metrics collection checklist. For each metric, you must consider:
1) What are you trying to manage with this metric? Each metric must relate to a specific management area of interest in a direct way. The more convoluted the relationship between the measurement and the management goal, the less likely you are to be collecting the right thing.
2) What does this metric measure? Exactly what does this metric count? High-level attempts to answer this question (such as "it measures how much we've accomplished") may be misleading. A detailed answer (such as "it reports how much we had budgeted for design tasks that first-level supervisors are reporting as greater than 80 percent complete") is much more informative, and can provide greater insight regarding the accuracy and usefulness of any specific metric.
3) If your organization optimized this metric alone, what other important aspects of your software design, development, testing, deployment, and maintenance would be affected? Asking this question will provide a list of areas where you must check to be sure that you have a balancing metric. Otherwise, your metrics program may have unintended effects and drive your organization to undesirable behavior.
4) How hard/expensive is it to collect this information? This is where you actually get to identify whether collection of this metric is worth the effort. If it is very expensive or hard to collect, look for automation that can make the collection easier, or consider alternative metrics that can be substituted.
5) Does the collection of this metric interact with (or interfere with) other business processes? For example, does the metric attempt to gather financial information on a different periodic basis or with different granularity than your financial system collects and reports it? If so, how will the two quantitative systems be synchronized? Who will reconcile differences? Can the two collection efforts be combined into one and provide sufficient software metrics information?

6) How accurate will the information be after you collect it? Complex or manpower-intensive metrics collection efforts are often short-circuited under time and schedule pressure by the people responsible for the collection. Metrics involving opinions (e.g., what percentage complete do you think you are?) are notoriously inaccurate. Exercise caution, and carefully evaluate the validity of metrics with these characteristics.
7) Can this management interest area be measured by other metrics? What alternatives to this metric exist? Always look for an easier-to-collect, more accurate, more timely metric that will measure relevant aspects of the management issue of concern.
Use of this checklist will help ensure the collection of an efficient suite of software development metrics that directly relates to management goals. Periodic review of existing metrics against this checklist is recommended.
Projects that are underestimated or over budget, or that produce unstable products, have the potential to devastate the company. Accurate estimates, competitive productivity, and renewed confidence in product quality are critical to the success of the company.
Hoping to solve these problems as quickly as possible, the company management embarks on the 8-Step Metrics Program.
Step 1: Document the Software Development Process
Integrated Software does not have a defined development process. However, the new metrics coordinator does a quick review of project status reports and finds that the activities of requirements analysis, design, code, review, recode, test, and debugging describe how the teams spend their time. The inputs, work performed, outputs and verification criteria for each activity have not been recorded. He decides to skip these details for this "test" exercise. The recode activity includes only effort spent addressing software action items (defects) identified in reviews.
Step 2: State the Goals
The metrics coordinator sets out to define the goals of the metrics program. The list of goals in Step 2 of the 8-Step Metrics Program is broader than (yet still related to) the immediate concerns of Integrated Software. Discussion with the development staff leads to some good ideas on how to tailor these into specific goals for the company.
1. Estimates
The development staff at Integrated Software considers past estimates to have been unrealistic, as they were established using "finger in the wind" techniques. They suggest that the current plan could benefit from past experience, as the present project is very similar to past projects.
Goal: Use previous project experience to improve productivity estimates.
2. Productivity
Discussions about the significant effort spent in debugging center on a comment by one of the developers that defects found early in reviews have been faster to repair than defects discovered by the test group. It seems that both reviews and testing are needed, but the amount of effort to put into each is not clear.
Goal: Optimize defect detection and removal.
3. Quality
The test group at the company argues for exhaustive testing. This, however, is prohibitively expensive. Alternatively, they suggest looking at the trends of defects discovered and repaired over time to better understand the probable number of defects remaining.
Goal: Ensure that the defect detection rate during testing is converging towards a level that indicates that less than five defects per KSLOC will be discovered in the next year.
Step 3: Define Metrics Required to Reach Goals and Identify Data to Collect
Working from the Step 3 tables, the metrics coordinator chooses the following metrics for the metrics program.
Goal 1: Improve Estimates
• Actual effort for each type of software in person-hours (PH)
• Size of each type of software in source lines of code (SLOC)
• Software product complexity (type)
• Labor rate (PH/SLOC) for each type
Goal 2: Improve Productivity
• Total number of person hours per activity
• Number of defects discovered in reviews
• Number of defects discovered in testing
• Effort spent repairing defects discovered in reviews
• Effort spent repairing defects discovered in testing
• Number of defects removed per effort spent in reviews and recode
• Number of defects removed per effort spent in testing and debug
Goal 3: Improve Quality
• Total number of defects discovered
• Total number of defects repaired
• Number of defects discovered per schedule date
• Number of defects repaired per schedule date
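As an illustration of how the Goal 2 data might be used once collected, here is a minimal Python sketch (all figures hypothetical) comparing defects removed per person-hour in reviews versus testing:

# Hypothetical counts and effort from the defect tracking and time systems.
review_defects = 120        # defects discovered in reviews
review_effort_ph = 300.0    # person-hours spent on reviews and recode
test_defects = 80           # defects discovered in testing
test_effort_ph = 400.0      # person-hours spent on testing and debug

review_yield = review_defects / review_effort_ph  # defects removed per PH
test_yield = test_defects / test_effort_ph

print(f"Reviews: {review_yield:.2f} defects/PH; testing: {test_yield:.2f} defects/PH")

In this hypothetical case, reviews remove defects at twice the rate of testing, which would argue for shifting some effort toward reviews.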


Types of test metrics
1. Product test metrics
i. Number of remarks
ii. Number of defects
iii. Remark status
iv. Defect severity
v. Defect severity index
vi. Time to find a defect
vii. Time to solve a defect
viii. Test coverage
ix. Test case effectiveness
x. Defects/KLOC
2. Project test metrics
i. Workload capacity ratio
ii. Test planning performance
iii. Test effort percentage
iv. Defect category
3. Process test metrics
i. Should be found in which phase
ii. Residual defect density
iii. Defect remark ratio
iv. Valid remark ratio
v. Bad fix ratio
vi. Defect removal efficiency
vii. Phase yield
viii. Backlog development
ix. Backlog testing
x. Scope changes

Product test metrics

I. Number of remarks
Definition
The total number of remarks found in a given time period/phase/test type. A remark is a claim made by a test engineer that the application shows undesired behavior. It may or may not result in a software modification or changes to documentation.
Purpose
One of the earliest indicators to measure once testing commences; provides initial indications about the stability of the software.
Data to collect
Total number of remarks found.

II. Number of defects
Definition
The total number of remarks found in a given time period/phase/test type that resulted in software or documentation modifications.
Purpose
Indicates the quality of the product under test; compared with the total number of remarks, it shows how many reported issues actually resulted in modifications.
Data to collect
Only remarks that resulted in modifying the software or the documentation are counted.

III. Remark status
Definition
The status of a defect can vary depending upon the defect-tracking tool that is used. Broadly, the following statuses are available:
To be solved: logged by the test engineer and waiting to be taken over by the software engineer.
To be retested: solved by the developer and waiting to be retested by the test engineer.
Closed: the issue was retested by the test engineer and approved.
Purpose
Tracks progress with respect to entering, solving, and retesting remarks. During a test phase, it is useful to know the number of remarks logged, solved, waiting to be resolved, and retested.
Data to collect
This information can normally be obtained directly from the defect tracking system based on the remark status.

IV. Defect severity
Definition
The severity level of a defect indicates the potential business impact for the end user (business impact = effect on the end user × frequency of occurrence).
Purpose
Provides indications about the quality of the product under test. A high-severity defect means low product quality, and vice versa. At the end of this phase, this information is useful to make the release decision based on the number of defects and their severity levels.
Data to collect
Every defect has severity levels attached to it. Broadly, these are Critical, Serious, Medium and Low.

V. Defect severity index
Definition
An index representing the average of the severity of the defects.
Purpose
Provides a direct measurement of the quality of the product—specifically, reliability, fault tolerance and stability.
Data to collect
Two measures are required to compute the defect severity index. A number is assigned to each severity level: 4 (Critical), 3 (Serious), 2 (Medium), 1 (Low). Multiply the number of defects at each severity level by that level's number, add the totals, and divide by the total number of defects to obtain the defect severity index.
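For example, a minimal Python sketch of this computation (defect counts hypothetical):

# Severity weights as defined above.
weights = {"Critical": 4, "Serious": 3, "Medium": 2, "Low": 1}

# Hypothetical number of defects logged at each severity level.
counts = {"Critical": 5, "Serious": 12, "Medium": 30, "Low": 20}

total_defects = sum(counts.values())
weighted_sum = sum(weights[level] * n for level, n in counts.items())
severity_index = weighted_sum / total_defects  # average severity, between 1 and 4

print(f"Defect severity index: {severity_index:.2f}")  # 136 / 67 = 2.03

A rising index over successive builds suggests that the defects being found are, on average, more severe.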

VI. Time to find a defect
Definition
The effort required to find a defect.
Purpose
Shows how fast the defects are being found. This metric indicates the correlation between the test effort and the number of defects found.
Data to collect
Divide the cumulative hours spent on test execution and logging defects by the number of defects entered during the same period.




VII. Time to solve a defect
Definition
The effort required to resolve a defect (diagnosis and correction).
Purpose
Provides an indication of the maintainability of the product and can be used to estimate projected maintenance costs.
Data to collect
Divide the number of hours spent on diagnosis and correction by the number of defects resolved during the same period.

VIII. Test coverage
Definition
Defined as the extent to which testing covers the product’s complete functionality.
Purpose
This metric is an indication of the completeness of the testing. It does not indicate anything about the effectiveness of the testing. This can be used as a criterion to stop testing.
Data to collect
Coverage could be with respect to requirements, functional topic list, business flows, use cases, etc. It can be calculated based on the number of items that were covered vs. the total number of items.
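As a sketch (requirement IDs hypothetical), coverage can be computed from the set of items exercised by executed test cases versus the full item list:

# Hypothetical requirement IDs and the subset covered by executed tests.
all_requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4", "REQ-5"}
covered = {"REQ-1", "REQ-2", "REQ-4"}

coverage_pct = len(covered & all_requirements) / len(all_requirements) * 100
print(f"Requirements coverage: {coverage_pct:.0f}%")  # 60%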

IX. Test case effectiveness
Definition
The extent to which test cases are able to find defects.
Purpose
This metric provides an indication of the effectiveness of the test cases and the stability of the software.
Data to collect
Ratio of the number of test cases that resulted in logging remarks vs. the total number of test cases.

X. Defects/KLOC
Definition
The number of defects per 1,000 lines of code.
Purpose
This metric indicates the quality of the product under test. It can be used as a basis for estimating defects to be addressed in the next phase or the next version.
Data to collect
Ratio of the number of defects found to the total number of lines of code (in thousands).

Formula used:
Defects/KLOC = (Number of defects found / Total lines of code) × 1000
Uses of defect/KLOC
Defect density is used to compare the relative number of defects in various software components. This helps identify candidates for additional inspection or testing, or for possible reengineering or replacement. Identifying defect-prone components allows the concentration of limited resources on areas with the highest potential return on investment.
Another use of defect density is to compare subsequent releases of a product to track the impact of defect reduction and quality improvement activities. Normalizing by size allows releases of various sizes to be compared. Differences between products or product lines can also be compared in this manner.
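A minimal Python sketch (component names and figures hypothetical) of ranking components by defect density to flag inspection candidates:

# Hypothetical components: (defects found, lines of code).
components = {
    "parser":   (42, 12000),
    "ui":       (15, 30000),
    "db_layer": (9, 4500),
}

# Rank components by defects per KLOC, highest first.
density = {name: d / (loc / 1000) for name, (d, loc) in components.items()}
for name in sorted(density, key=density.get, reverse=True):
    print(f"{name}: {density[name]:.1f} defects/KLOC")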

Project test metrics:
I. Workload capacity ratio
Definition
Ratio of the planned workload to the gross capacity for the total test project or phase.
Purpose
This metric helps in detecting issues related to estimation and planning. It serves as an input for estimating similar projects as well.
Data to collect
This metric is usually computed at the beginning of a phase or project. The workload is determined by multiplying the number of tasks by their norm times; the gross capacity is the planned working time. The ratio is the workload divided by the gross capacity.
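A minimal Python sketch of this computation (task counts, norm times, and capacity all hypothetical):

# Hypothetical test tasks: (name, number of tasks, norm time in PH per task).
tasks = [
    ("design test cases", 120, 0.75),
    ("execute test cases", 200, 0.50),
    ("log and retest remarks", 60, 1.00),
]

workload_ph = sum(count * norm for _, count, norm in tasks)  # planned workload
gross_capacity_ph = 250.0  # planned working time available

capacity_ratio = workload_ph / gross_capacity_ph
print(f"Workload capacity ratio: {capacity_ratio:.2f}")  # > 1 means overcommitted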

II. Test planning performance
Definition
The planned value compared to the actual value.
Purpose
Shows how well estimation was done.
Data to collect
The ratio of the actual effort spent to the planned effort

III. Test effort percentage
Definition
Test effort is the amount of work spent on testing, in hours, days, or weeks. Overall project effort is divided among multiple phases of the project: requirements, design, coding, testing, and so on.

Purpose
The effort spent in testing, in relation to the effort spent in the development activities, will give us an indication of the level of investment in testing. This information can also be used to estimate similar projects in the future.
Data to collect
This metric can be computed by dividing the overall test effort by the total project effort.

IV. Defect category
Definition
An attribute of the defect in relation to the quality attributes of the product. Quality attributes of a product include functionality, usability, documentation, performance, installation and internationalization.
Purpose
This metric can provide insight into the different quality attributes of the product.

Data to collect
This metric can be computed by dividing the defects that belong to a particular category by the total number of defects.
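A minimal Python sketch (category labels and counts hypothetical) of computing the distribution:

from collections import Counter

# Hypothetical category labels exported from the defect tracking system.
defect_categories = (["functionality"] * 34 + ["usability"] * 10 +
                     ["performance"] * 4 + ["documentation"] * 2)

counts = Counter(defect_categories)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n / total:.0%} of defects")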

Process test metrics
I. Should be found in which phase
Definition
An attribute of the defect, indicating in which phase the remark should have been found.
Purpose
Are we able to find the right defects in the right phase, as described in the test strategy? Indicates the percentage of defects that migrate into subsequent test phases.
Data to collect
Computation of this metric is done by calculating the number of defects that should have been found in previous test phases.

II. Residual defect density:
Definition
An estimate of the number of defects that may remain unresolved in the product after a given phase.
Purpose
The goal is to achieve a defect level that is acceptable to the clients. We remove defects in each of the test phases so that few will remain.
Data to collect
This is a tricky issue. For released products, field data provides a basis for estimation. For new versions, industry standards, coupled with project specifics, form the basis for estimation.


III. Defect remark ratio
Definition
Ratio of the number of remarks that resulted in software modification vs. the total number of remarks.
Purpose
Provides an indication of the level of understanding between the test engineers and the software engineers about the product, as well as an indirect indication of test effectiveness.
Data to collect
The number of remarks that resulted in software modification vs. the total number of logged remarks. Valid for each test type, during and at the end of test phases.

IV. Valid remark ratio
Definition
Percentage of valid remarks during a certain period.
Purpose
Indicates the efficiency of the test process.
Data to collect
Ratio of the total number of remarks that are valid to the total number of remarks found.
Formula used
Valid remarks = number of defects + duplicate remarks + number of remarks that will be resolved in the next phase or release.
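A minimal Python sketch following the formula above (all counts hypothetical):

# Hypothetical remark counts from the defect tracking system.
defects = 60        # remarks that led to software/documentation changes
duplicates = 8      # remarks already reported elsewhere
deferred = 7        # remarks to be resolved in the next phase or release
total_remarks = 100

valid_remarks = defects + duplicates + deferred
valid_remark_ratio = valid_remarks / total_remarks * 100
print(f"Valid remark ratio: {valid_remark_ratio:.0f}%")  # 75%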

V. Phase yield
Definition
Defined as the number of defects found during the phase of the development life cycle vs. the estimated number of defects at the start of the phase.
Purpose
Shows the effectiveness of the defect removal. Provides a direct measurement of product quality; can be used to determine the estimated number of defects for the next phase.
Data to collect
Ratio of the number of defects found to the total number of estimated defects. This can be used during a phase and also at the end of the phase.

VI. Backlog development
Definition
The number of remarks that are yet to be resolved by the development team.
Purpose
Indicates how well the software engineers are coping with the testing efforts.
Data to collect
The number of remarks that remain to be resolved.

VII. Backlog testing

Definition
The number of resolved remarks that are yet to be retested by the test team.
Purpose
Indicates how well the test engineers are coping with the development efforts.
Data to collect
The number of resolved remarks that remain to be retested.

VIII. Scope changes
Definition
The number of changes that were made to the test scope.
Purpose
Indicates requirements stability or volatility, as well as process stability.
Data to collect
Ratio of the number of changed items in the test scope to the total number of items.

IX. Defect removal efficiency
Definition
The number of defects that are removed per time unit (hours/days/weeks)
Purpose
Indicates the efficiency of defect removal methods, as well as indirect measurement of the quality of the product.
Data to collect
Computed by dividing the total effort spent on defect detection, defect resolution, and retesting by the number of remarks. This is calculated per test type, during and across test phases.
