Six Sigma: Hardware Si, Software No!

Robert V. Binder

1. Introduction

Six sigma is a parameter used in statistical models of the quality of manufactured goods. It is also used as a slogan suggesting high quality. Some attempts have been made to apply 6 sigma to software quality measurement. This essay explains what 6 sigma means and why it is inappropriate for measuring software quality.

1.1 What is Six Sigma?
Six sigma means six standard deviations. A standard deviation is a parameter which characterizes a set of measurements, just as the average can characterize such a set. One standard deviation is a value such that roughly two-thirds of all values in a set fall within the range from one standard deviation below average to one standard deviation above average. Sets of values which can be characterized by the average and standard deviation may be modeled by the normal distribution, also known as the "bell-shaped curve". With a larger coefficient for sigma (1 sigma, 2 sigma, ... , 6 sigma) more of the set is included, corresponding to a larger area under the bell-shaped curve. (See any introductory text on statistics for more on this, e.g. [1].)

In most informal discussions, "x sigma" means the range x standard deviations above and below the average, a span of 2x standard deviations (i.e., +- x sigma). For 6 sigma the total range spans 12 standard deviations. As the sigma value increases, a larger area under the "bell curve" is included: 50% at +- 0.67 sigma, 68.3% at +- 1 sigma, 99.7% at +- 3 sigma, and more than 99.999999% at +- 6 sigma.
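The percentages above can be checked with a few lines of Python. This is a minimal sketch that assumes an ideal normal distribution; the function name `coverage` is chosen here for illustration and is not from any cited source.

```python
import math

def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within +- k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))

for k in (0.67, 1, 2, 3, 6):
    print(f"+- {k} sigma: {coverage(k) * 100:.7f}%")
```

The loop prints one line per sigma value; the +- 1 and +- 3 sigma lines should come out near 68.3% and 99.7%.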

1.2 What is the Quality Significance of Six Sigma?
As a concept in statistical quality models of physical manufacturing processes, 6 sigma has a very specific meaning.[2]

Manufactured parts typically have a nominal design value for characteristics of merit. For example, a shaft might have a nominal design diameter of 1.0 mm and a nominal weight of 45 mg. Processes which produce parts are typically imperfect, so the actual value for characteristics of merit in any given part will vary from nominal. Assemblies which use these parts must be designed to tolerate parts with such variances. Determining these tolerances is a classical systems engineering problem. However, once the tolerances are set, any part that exceeds these limits is defective.

While it might seem that a 3 sigma tolerance is generous, it turns out to result in a (typically) unacceptable rate of defects. With 6 sigma tolerances, a single part, and a stable production process, you'd expect to have only 2 defects per billion. However, manufacturing processes vary from one batch to the next, so the batch average for a characteristic often drifts between +- 1.5 sigma. If the mean of a batch happens to be at either extreme, many of the parts will be defective (with a 3 sigma standard, 66810 in a million). Assuming a +- 1.5 sigma process drift, 6 sigma tolerances will result in 3.4 defects per million parts.
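The defect rates quoted above follow from the tail areas of the normal curve. The sketch below assumes a normally distributed characteristic whose batch mean sits a fixed number of standard deviations away from nominal; the names `upper_tail` and `defect_fraction` are illustrative only.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable; erfc keeps precision far out in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def defect_fraction(tolerance_sigma: float, drift_sigma: float = 0.0) -> float:
    """Fraction of parts outside +- tolerance_sigma when the batch mean is off by drift_sigma."""
    # The drift moves one tolerance limit closer to the mean and the other farther away.
    near_limit = upper_tail(tolerance_sigma - drift_sigma)
    far_limit = upper_tail(tolerance_sigma + drift_sigma)
    return near_limit + far_limit

print(f"6 sigma, no drift : {defect_fraction(6) * 1e9:.2f} per billion")
print(f"3 sigma, 1.5 drift: {defect_fraction(3, 1.5) * 1e6:.1f} per million")
print(f"6 sigma, 1.5 drift: {defect_fraction(6, 1.5) * 1e6:.2f} per million")
```

With a drift of 1.5 sigma almost all of the defects come from the limit nearer the shifted mean, which is why the 6 sigma figure is effectively the one-sided tail beyond 4.5 sigma.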

Interestingly, components designed to 6 sigma tolerate more deviance from nominal values than those designed for lower sigma values. It is easier to achieve a conforming part/product designed to 6 sigma than to 3 sigma because more variance from nominal is acceptable. This is quite the opposite of what one might expect for a "high quality" result.

Multi-part products compound the drift problem. The expected number of defective units for a single-part product can be modeled by the distribution for that part. But as the number of parts in a product increases, the chance of getting a product with no defective parts drops. If all parts are designed to 3 sigma and the process drifts by 1.5 sigma, the chance of getting a defect-free product with 100 parts is 0.0013. However, if all part design tolerances are extended to 6 sigma, the chance of a defective product with 10000 parts is less than 0.04 -- or, putting a more cheery spin on this, you'd expect that at least 96% of the 10000-part products would be defect free with 6 sigma design limits and no more than +- 1.5 sigma process drift. (Note that this assumes no faulty interactions among "good" parts can occur.)
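The multi-part arithmetic can be sketched as follows. It assumes that every part has one critical tolerance, that parts fail independently, and that each part's defect probability is the drifted normal tail area; the function names are illustrative only.

```python
import math

def defect_fraction(tolerance_sigma: float, drift_sigma: float = 0.0) -> float:
    """Fraction of parts outside +- tolerance_sigma when the batch mean is off by drift_sigma."""
    def tail(z: float) -> float:
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    return tail(tolerance_sigma - drift_sigma) + tail(tolerance_sigma + drift_sigma)

def defect_free_probability(parts: int, p_defect: float) -> float:
    """Chance that none of `parts` independent parts is defective."""
    return (1.0 - p_defect) ** parts

p3 = defect_fraction(3, 1.5)   # 3 sigma tolerances, 1.5 sigma drift
p6 = defect_fraction(6, 1.5)   # 6 sigma tolerances, 1.5 sigma drift

print(f"100 parts at 3 sigma  : {defect_free_probability(100, p3):.4f} defect free")
print(f"10000 parts at 6 sigma: {defect_free_probability(10_000, p6):.4f} defect free")
```

Under these assumptions the 6 sigma case comes out at a little under 97% defect free for 10000 parts, and the 3 sigma case at roughly 0.001 for 100 parts, which is in the same range as the figures quoted above.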

This is how Motorola has set their "Six Sigma" standard and why Lee Trevino (in a Motorola commercial) says he'd have to make over three million perfect golf shots to meet this goal. The related design strategy is straightforward: use fewer parts, simplify the process, and reduce the number of critical tolerances per part. (Six sigma processes are only part of their strategy for high quality manufacturing. [2])

2. Should the Six Sigma Manufacturing Model be Applied to Software?

In a word, no. While high quality software is a good thing, there are at least three reasons why the 6 sigma model does not make sense for software.

2.1 Software Processes are Fuzzy
Every software "part" is produced by a process which defies the kind of predictable mechanization assumed for physical parts. Even at SEI level 5, the simple variation in human cognitive processes is enough to rule out applying the model. The behavior of a software "process" is an amorphous blob compared to the constrained, limited, and highly predictable behavior of a die, a stamp, or a numerically controlled milling machine.

2.2 Software Characteristics of Merit are not Ordinal Tolerances
Six sigma applies to linear dimensions and counts of the outcomes of identical processes. The ordinal quantification of physical characteristics of merit cannot be applied to software without a wild stretch of imagination. The characteristic of merit is implicitly redefined from ordinal to cardinal in every discussion of 6 sigma software I've encountered. This is problematic; the analytical leverage of the ordinal model is lost and it is unclear what is being counted.

Even if a cardinal interpretation were valid, more problems remain: 6 sigma of what? A fault density of x or less per [Instructions|Non-Comment Lines of Source Code|Function Points| ...]? A failure rate of x or less per [CPU seconds|transactions|interrupts| ... ]? Shall we use either 3.4E-6 (the "drift" number) or 2.0E-9 (the absolute number) for x? Other interpretations are certainly possible -- this is exactly the problem.

As a point of reference, here are several reported defect densities for released software (KLOC = Thousand Lines of Code):
Application                        Defect Density         Source
NASA Space Shuttle Avionics        0.1 Failures/KLOC      [4]
Leading-edge software companies    0.2 Failures/KLOC      [5]
Reliability Survey                 1.4 Faults/KLOC        [6]
Best, Military system survey       5.0 Faults/KLOC        [7]
Worst, Military system survey      55.0 Faults/KLOC       [7]

For the sake of argument, assume that a six sigma software standard calls for no more than 3.4 failures per million lines of code (0.0034 failures per KLOC). This would require a software process roughly 30 times better than the best figure reported above (0.1 failures per KLOC), which is between one and two orders of magnitude. It is hard to imagine how this could be attained, as the average cost of the shuttle code is reported to be $1,000 per line. [4]
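The gap between this hypothetical target and the reported densities is simple arithmetic, sketched below. The densities are copied from the list above; the sketch ignores the difference between failures and faults, which is itself a simplifying assumption, and the function name is illustrative.

```python
import math

# Hypothetical target from the text: 3.4 per million lines = 0.0034 per KLOC.
TARGET_PER_KLOC = 3.4 / 1000.0

# Reported densities per KLOC, copied from the list above.
reported = {
    "NASA Space Shuttle Avionics": 0.1,
    "Leading-edge software companies": 0.2,
    "Reliability Survey": 1.4,
    "Best, Military system survey": 5.0,
    "Worst, Military system survey": 55.0,
}

def improvement_factor(density_per_kloc: float) -> float:
    """How many times lower the density must become to reach the target."""
    return density_per_kloc / TARGET_PER_KLOC

for name, density in reported.items():
    factor = improvement_factor(density)
    print(f"{name:34s} {factor:8.0f}x  ({math.log10(factor):.1f} orders of magnitude)")
```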

2.3 Software is not Mass Produced
Even if software components could be designed to ordinal tolerances, they'd still be one-off artifacts. It is inconceivable that one would attempt to build thousands of identical software components with an identical development process, sample just a few for conformance, and then, post hoc, try to fix the process if it produces too many systems which don't meet requirements. We can produce millions of copies by a mechanical process, but this is irrelevant with respect to software defects. Quantification of reliability is a whole 'nother ballgame.

3. Six Sigma as Slogan/Hype

I'm not against very high quality software (my consulting practice exists because very high quality software is hard to produce but nearly always worth the effort.) However, slogans like "six sigma" can confuse and mislead, even when applied to manufacturing [8]. Used as a slogan, "six sigma" simply means some subjectively (very) low defect level. The precise statistical sense is lost.

Discussions of "six sigma" software based on this vague sloganeering ignore the fundamental flaws of applying a model of physical ordinal variance to the non-physical, non-ordinal behavior of software systems. This is not to say there are no useful applications of statistical process control in software process management (my favorite is to use u-charts for inspection and test results.)
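As an illustration of the u-chart idea, here is a minimal sketch using the standard u-chart limits (center line u-bar, control limits at u-bar +- 3 * sqrt(u-bar / n)). The inspection data are invented for the example, and the choice of KLOC inspected as the unit of size is an assumption, not something prescribed by this essay.

```python
import math

def u_chart(defects, units):
    """Return (u_bar, rows) where each row is (u_i, lower_limit_i, upper_limit_i)."""
    u_bar = sum(defects) / sum(units)          # center line: overall defects per unit
    rows = []
    for d, n in zip(defects, units):
        half_width = 3.0 * math.sqrt(u_bar / n)  # limits widen for smaller samples
        rows.append((d / n, max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, rows

# Hypothetical data: defects found in six inspections and KLOC inspected in each.
defects = [4, 7, 3, 5, 21, 6]
kloc = [2.0, 3.5, 1.5, 2.5, 3.0, 3.0]

u_bar, rows = u_chart(defects, kloc)
flagged = [i for i, (u, lcl, ucl) in enumerate(rows) if u < lcl or u > ucl]

print(f"center line: {u_bar:.2f} defects/KLOC")
for i, (u, lcl, ucl) in enumerate(rows):
    mark = "  <-- outside limits" if i in flagged else ""
    print(f"inspection {i + 1}: u={u:.2f}  limits=({lcl:.2f}, {ucl:.2f}){mark}")
```

With these made-up numbers only the fifth inspection falls outside its limits, which is the kind of signal that would prompt a closer look at that inspection or the work product it covered.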

I say we leave 6 sigma to the manufacturing guys. Let's figure out what we need to do to routinely get very high field reliability in software intensive systems and agree on an operational definition for reliability measurement.

References/Notes

[1] Morris Hamburg, Statistical Analysis for Decision Making. (Harcourt, Brace & World, 1970).

[2] Bill Smith, "Six-Sigma Design," IEEE Spectrum, September 1993, v 30, n 9, pp 43-46.

[3] Michael A. Friedman and Jeffrey M. Voas, Software Assessment: Reliability, Safety, and Testability. (New York: John Wiley & Sons, 1995.)

[4] Edward Joyce, "Is Error-free Software Possible?," Datamation, February 18, 1989.

[5] Leading-edge software companies are achieving 0.025 user-reported failures per function point or better (Capers Jones, Applied Software Measurement (McGraw-Hill, 1991) p. 177.) Assuming Jones' conversion factor of 128 lines of C source per function point, we get 0.2 = (0.025 * (1000/128)).

[6] John D. Musa, Anthony Iannino, and Kazuhira Okumoto, Software Reliability: Measurement, Prediction, Application. (New York: McGraw-Hill Publishing Company, 1990.) p. 116.

[7] Joseph P. Cavano and Frank S. LaMonica, "Quality Assurance in Future Development Environments," IEEE Software, September 1987, pp 26-34.

[8] Jim Smith and Mark Oliver, "Six Sigma: Realistic Goal or PR Ploy," Machine Design, September 10, 1992.


First Release: 1 December 1995. Last Rev: 15 October 2001