In most informal discussions, "x sigma" means the range x standard deviations above and below the average, a total span of 2x standard deviations (i.e., +- x sigma). For 6 sigma the total range spans 12 standard deviations. As the sigma value increases, a larger area under the "bell curve" is included: 50% at +- 0.67 sigma, 68.3% at +- 1 sigma, 99.7% at +- 3 sigma, and more than 99.999999% at +- 6 sigma.
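The coverage figures above follow from the normal distribution and can be checked with a few lines of Python. This is a minimal sketch that assumes the characteristic is normally distributed; the function name `coverage` is illustrative only.

```python
import math

def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

# Print the area under the bell curve for the sigma values quoted in the text.
for k in (0.67, 1.0, 3.0, 6.0):
    print(f"+/- {k} sigma covers {coverage(k):.9%}")
```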
Manufactured parts typically have a nominal design value for each characteristic of merit. For example, the nominal design diameter of a shaft is 1.0 mm, its weight 45 mg, etc. Processes that produce parts are typically imperfect, so the actual value of a characteristic in any given part will vary from nominal. Assemblies that use these parts must be designed to tolerate parts with such variances. Determining these tolerances is a classical systems engineering problem. However, once the tolerances are set, any part that falls outside them is defective.
While it might seem that a 3 sigma tolerance is generous, it turns out to result in a (typically) unacceptable rate of defects. With 6 sigma tolerances, a single part, and a stable production process, you'd expect to have only 2 defects per billion. However, manufacturing processes vary from one batch to the next, so the batch average for a characteristic often drifts by as much as +- 1.5 sigma. If the mean of a batch happens to be at either extreme, many of the parts will be defective (with a 3 sigma standard, 66,810 per million). Assuming a +- 1.5 sigma process drift, 6 sigma tolerances will result in 3.4 defects per million parts.
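The defect rates quoted above can be reproduced by summing the two normal tails that fall outside the tolerance band once the batch mean has shifted. The sketch below assumes a normally distributed characteristic and a mean sitting at the full drift; `upper_tail` and `defect_fraction` are hypothetical helper names, not part of any standard.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def defect_fraction(tolerance_sigma: float, drift_sigma: float = 0.0) -> float:
    """Fraction of parts outside +/- tolerance_sigma when the mean is drift_sigma off nominal."""
    near_limit = upper_tail(tolerance_sigma - drift_sigma)  # limit the mean drifted toward
    far_limit = upper_tail(tolerance_sigma + drift_sigma)   # limit the mean drifted away from
    return near_limit + far_limit

print(f"6 sigma, no drift:  {defect_fraction(6.0) * 1e9:.2f} per billion")
print(f"3 sigma, 1.5 drift: {defect_fraction(3.0, 1.5) * 1e6:.0f} per million")
print(f"6 sigma, 1.5 drift: {defect_fraction(6.0, 1.5) * 1e6:.1f} per million")
```

With a 1.5 sigma drift, nearly all of the defects come from the limit the mean has moved toward; the far tail contributes almost nothing.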
Interestingly, components designed to 6 sigma tolerate more deviation from nominal values than those designed for lower sigma values. It is easier to achieve a conforming part or product designed to 6 sigma than to 3 sigma because more variance from nominal is acceptable. This is quite the opposite of what one might expect for a "high quality" result.
Multi-part products compound the drift problem. The expected number of defective units for a single-part product can be modeled by the distribution for that part. But as the number of parts in a product increases, the chance of getting a product with no defective parts drops. If all parts are designed to 3 sigma, the chance of getting a defect-free product with 100 parts is about 0.0010. However, if all part design tolerances are extended to 6 sigma, the chance of a defective product with 10,000 parts is less than 0.04 -- or, putting a more cheery spin on this, you'd expect that at least 96% of the 10,000-part products would be defect free with 6 sigma design limits and no more than +- 1.5 sigma process drift (note that this assumes no faulty interactions among "good" parts can occur).
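The compounding effect is simple to compute if part defects are assumed to be independent: the chance that every part is good is the per-part yield raised to the number of parts. The sketch below uses the drifted per-part rates quoted earlier (66,810 and 3.4 per million); the function name and the independence assumption are illustrative, not taken from the source. With those inputs it yields roughly 0.001 defect free for 100 parts at 3 sigma, and roughly 0.033 defective for 10,000 parts at 6 sigma.

```python
def defect_free_probability(per_part_defect_rate: float, part_count: int) -> float:
    """Chance that all parts conform, assuming independent parts with the same defect rate."""
    return (1.0 - per_part_defect_rate) ** part_count

THREE_SIGMA_DRIFTED = 66_810e-6  # defects per part at 3 sigma with 1.5 sigma drift
SIX_SIGMA_DRIFTED = 3.4e-6       # defects per part at 6 sigma with 1.5 sigma drift

good_100 = defect_free_probability(THREE_SIGMA_DRIFTED, 100)
bad_10000 = 1.0 - defect_free_probability(SIX_SIGMA_DRIFTED, 10_000)

print(f"3 sigma, 100 parts: P(defect free) = {good_100:.4f}")
print(f"6 sigma, 10,000 parts: P(defective) = {bad_10000:.4f}")
```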
This is how Motorola set its "Six Sigma" standard and why Lee Trevino (in a Motorola commercial) says he'd have to make over three million perfect golf shots to meet this goal. The related design strategy is straightforward: use fewer parts, simplify the process, and reduce the number of critical tolerances per part. (Six sigma processes are only part of Motorola's strategy for high quality manufacturing [2].)
Even if a cardinal interpretation were valid, more problems remain: 6 sigma of what? A fault density of x or less per [Instructions|Non-Comment Lines of Source Code|Function Points| ...]? A failure rate of x or less per [CPU seconds|transactions|interrupts| ... ]? Shall we use 3.4E-6 (the "drift" number) or 2.0E-9 (the absolute number) for x? Other interpretations are certainly possible -- this is exactly the problem.
As a point of reference, here are several reported defect densities for released software (KLOC = Thousand Lines of Code):
| Application | Defect Density | Source |
|---|---|---|
| NASA Space Shuttle Avionics | 0.1 Failures/KLOC | [4] |
| Leading-edge software companies | 0.2 Failures/KLOC | [5] |
| Reliability Survey | 1.4 Faults/KLOC | [6] |
| Best, Military system survey | 5.0 Faults/KLOC | [7] |
| Worst, Military system survey | 55.0 Faults/KLOC | [7] |
For the sake of argument, assume that a six sigma software standard calls for no more than 3.4 failures per million lines of code (0.0034 failures per KLOC). This would require a software process one to two orders of magnitude (roughly 30 to 60 times) better than the best values in the table above. It is hard to imagine how this could be attained, as the average cost of the shuttle code is reported to be $1,000 per line [4].
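Dividing each reported density by the assumed target gives the improvement factor a process would need. The sketch below copies the densities from the table and treats failures and faults as comparable, which is a simplifying assumption; the names are illustrative.

```python
import math

# Assumed target from the text: 3.4 failures per million lines of code.
TARGET_PER_KLOC = 3.4 / 1000.0

# Densities copied from the table; note that it mixes failures and faults.
REPORTED_PER_KLOC = {
    "NASA Space Shuttle Avionics": 0.1,
    "Leading-edge software companies": 0.2,
    "Reliability Survey": 1.4,
    "Best, Military system survey": 5.0,
    "Worst, Military system survey": 55.0,
}

def improvement_factor(density_per_kloc: float) -> float:
    """How many times lower the defect density must become to reach the target."""
    return density_per_kloc / TARGET_PER_KLOC

for name, density in REPORTED_PER_KLOC.items():
    factor = improvement_factor(density)
    print(f"{name}: {factor:,.0f}x lower (10^{math.log10(factor):.1f})")
```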
Discussions of "six sigma" software based on this vague sloganeering ignore the fundamental flaws of applying a model of physical ordinal variance to the non-physical, non-ordinal behavior of software systems. This is not to say there are no useful applications of statistical process control in software process management (my favorite is to use u-charts for inspection and test results).
I say we leave 6 sigma to the manufacturing guys. Let's figure out what we need to do to routinely get very high field reliability in software intensive systems and agree on an operational definition for reliability measurement.
[2] Bill Smith, "Six-Sigma Design," IEEE Spectrum, September 1993, v 30, n 9, pp 43-46.
[3] Michael A. Friedman and Jeffrey M. Voas, Software Assessment: Reliability, Safety, and Testability. (New York: John Wiley & Sons, 1995)
[4] Edward Joyce, "Is Error-free Software Possible?," Datamation, February 18, 1989.
[5] Leading-edge software companies are achieving 0.025 user-reported failures per function point or better (Capers Jones, Applied Software Measurement (McGraw-Hill, 1991) p. 177). Assuming Jones' conversion factor of 128 lines of C source per function point, we get 0.025 * (1000/128) = 0.195, or about 0.2 failures per KLOC.
[6] John D. Musa, Anthony Iannino, and Kazuhira Okumoto, Software Reliability: Measurement, Prediction, Application. (New York: McGraw-Hill Publishing Company, 1990) p. 116.
[7] Joseph P. Cavano and Frank S. LaMonica, "Quality Assurance in Future Development Environments," IEEE Software, September 1987, pp 26-34.
[8] Jim Smith and Mark Oliver, "Six Sigma: Realistic Goal or PR Ploy," Machine Design, September 10, 1992.