The Research That Changed Maintenance Forever
In 1978, Stanley Nowlan and Howard Heap of United Airlines published a study commissioned by the U.S. Department of Defense. Their research analyzed failure data on aircraft components and identified six distinct patterns of failure probability over time. The findings contradicted the prevailing assumption that equipment wears out predictably with age, and they laid the foundation for Reliability Centered Maintenance.
Their core finding: only 11% of components (Patterns A, B, and C below) showed a failure probability that rises with age, and the classic “bathtub curve” that most preventive maintenance programs are designed around described just 4%. The other 89% showed failure patterns where age-based replacement either didn’t help or actively made things worse.
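The 11% / 89% split is simple arithmetic on the six pattern shares quoted in this article. This short sketch tallies them; the dictionary name and the grouping into "age-related" patterns are illustrative labels, not terminology from the study itself.

```python
# Share of components per failure pattern, as quoted in this article
# (percent of components in the Nowlan and Heap data).
PATTERN_SHARES = {
    "A": 4,   # bathtub curve
    "B": 2,   # wear-out only
    "C": 5,   # gradual increase
    "D": 7,   # initial increase then constant
    "E": 14,  # random (constant)
    "F": 68,  # infant mortality then constant
}

# Patterns whose failure probability rises with age.
AGE_RELATED = {"A", "B", "C"}

age_related = sum(PATTERN_SHARES[p] for p in AGE_RELATED)
not_age_related = sum(v for p, v in PATTERN_SHARES.items() if p not in AGE_RELATED)

print(f"Age-related patterns (A-C): {age_related}%")
print(f"No age-related wear-out (D-F): {not_age_related}%")
```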
This research is over 40 years old, but its implications are still not fully absorbed by many industrial maintenance organizations. Understanding these patterns is essential for making rational maintenance decisions.
The Six Failure Patterns
Pattern A: Bathtub Curve (4% of components)
High infant mortality, then a long period of low, constant failure probability, followed by a wear-out zone with increasing failure probability. This is the classic pattern that most people think of when they imagine equipment aging. It applies to only 4% of components in the original study.
Examples: Simple items with direct material degradation — light bulbs with filament fatigue, tires with tread wear, brake pads.
Maintenance strategy: Time-based replacement makes sense here because failure probability genuinely increases with age. Set the replacement interval before the onset of the wear-out zone.
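One common way to model a bathtub curve is as the sum of three hazard (instantaneous failure rate) terms: a Weibull term with shape below 1 for infant mortality, a constant rate, and a Weibull term with shape above 1 for wear-out. The sketch below uses made-up parameters and an arbitrary definition of wear-out onset (the first age after the curve's minimum where the hazard exceeds twice the constant rate). A real replacement interval would come from fitted failure data and sit with some margin below that age.

```python
# Hypothetical Weibull parameters (beta, eta in hours); illustrative only.
# Each Weibull term has hazard h(t) = (beta/eta) * (t/eta)**(beta - 1).
INFANT = (0.5, 1_000_000.0)  # beta < 1: falling hazard (infant mortality)
WEAR_OUT = (4.0, 30_000.0)   # beta > 1: rising hazard (wear-out)
BASE_RATE = 5e-5             # constant random-failure rate, per hour


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at age t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float) -> float:
    """Pattern A: infant-mortality + constant + wear-out hazards added together."""
    return weibull_hazard(t, *INFANT) + BASE_RATE + weibull_hazard(t, *WEAR_OUT)


def wear_out_onset(multiple: float = 2.0, step: float = 10.0,
                   horizon: float = 100_000.0) -> float:
    """First age past the hazard minimum where hazard exceeds multiple * BASE_RATE."""
    t, prev = step, bathtub_hazard(step)
    # Walk forward while the curve is still falling (infant mortality zone).
    while t < horizon:
        nxt = bathtub_hazard(t + step)
        if nxt > prev:
            break
        t, prev = t + step, nxt
    # Keep walking until the rising hazard crosses the threshold.
    while t < horizon and bathtub_hazard(t) <= multiple * BASE_RATE:
        t += step
    return t


onset = wear_out_onset()
print(f"Hazard passes twice the constant rate at about {onset:,.0f} hours")
```

The threshold multiple is a design choice: a lower multiple flags onset earlier and gives a more conservative interval.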
Pattern B: Wear-Out Only (2% of components)
Low, constant failure probability followed by a sharply increasing failure zone. No infant mortality period. The component performs reliably for an extended period, then wears out.
Examples: Erosion-prone components in consistent services, some fatigue-limited structural members operating in stable conditions.
Maintenance strategy: Time-based replacement is effective if you can reliably identify the onset of wear-out. Condition monitoring is often more cost-effective because it detects the actual onset rather than relying on a predicted age.
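A quick calculation shows why relying on a predicted age can be costly even for a true wear-out item. The sketch assumes a hypothetical Weibull wear-out life (shape 4, scale 30,000 hours) and compares the age by which 1% of units have failed (the "B1 life") with the mean life. Replacing every unit at the B1 age discards much of the typical unit's life, which is the life condition monitoring can recover.

```python
import math

# Hypothetical Weibull wear-out life (illustrative parameters only).
BETA = 4.0       # shape > 1: failure rate rises with age
ETA = 30_000.0   # scale, hours


def age_at_fraction_failed(p: float) -> float:
    """Age by which a fraction p of units has failed: solves 1 - exp(-(t/eta)^beta) = p."""
    return ETA * (-math.log(1.0 - p)) ** (1.0 / BETA)


mean_life = ETA * math.gamma(1.0 + 1.0 / BETA)  # Weibull mean life
b1_life = age_at_fraction_failed(0.01)          # age where 1% have failed

print(f"B1 life: {b1_life:,.0f} h, mean life: {mean_life:,.0f} h")
print(f"Fixed replacement at B1 uses {b1_life / mean_life:.0%} of mean life")
```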
Pattern C: Gradual Increase (5% of components)
Failure probability increases gradually and steadily throughout the component’s life. No distinct wear-out zone — just a slowly rising failure rate.
Examples: Components subject to progressive corrosion, erosion, or fatigue in varying conditions.
Maintenance strategy: Condition-based monitoring works well because the gradual increase provides detectable degradation signals over an extended period. Time-based replacement is less effective because there’s no clear age at which failure probability jumps — you’d either replace too early (wasting remaining life) or too late (after probability has risen to unacceptable levels).
Pattern D: Initial Increase Then Constant (7% of components)
Low failure probability when new or just overhauled, rising quickly to a constant level. No wear-out phase. After that initial rise, the component is equally likely to fail at any point.
Examples: Complex systems whose failures, once the brief low-risk period after installation has passed, are driven by random operational events rather than age.
Maintenance strategy: Don’t count on the short low-risk period after installation or overhaul. Once failure probability levels off, condition monitoring detects developing failures when they occur. Time-based replacement adds no value because the failure probability doesn’t increase with age.
Pattern E: Random (Constant) Failure Rate (14% of components)
Constant failure probability throughout life. A brand-new component is equally likely to fail as one that’s been in service for years. No infant mortality, no wear-out.
Examples: Components subject to external stress events (overloads, contamination incidents, foreign object damage) rather than intrinsic degradation. Many electronic components follow this pattern.
Maintenance strategy: Time-based replacement adds cost without reducing risk, because the replacement is just as likely to fail as the component it replaced. Condition monitoring (where applicable) or run-to-failure with adequate sparing is appropriate.
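The claim that a new Pattern E component is as likely to fail as an old one follows from the constant hazard: the exponential distribution is memoryless. This sketch compares the probability of failing in the next 5,000 hours at several ages for an assumed constant-rate item and, for contrast, an assumed Weibull wear-out item (shape 3). All parameters are hypothetical.

```python
import math
from typing import Callable

Survival = Callable[[float], float]


def conditional_failure_prob(survival: Survival, age: float, window: float) -> float:
    """P(fail within `window` | survived to `age`) = 1 - R(age + window) / R(age)."""
    return 1.0 - survival(age + window) / survival(age)


RATE = 1 / 50_000  # hypothetical constant failure rate, per hour (Pattern E)


def random_survival(t: float) -> float:
    """Exponential survival R(t) = exp(-rate * t): constant hazard."""
    return math.exp(-RATE * t)


def wear_out_survival(t: float, beta: float = 3.0, eta: float = 50_000.0) -> float:
    """Weibull survival with shape > 1 (hypothetical wear-out item), for contrast."""
    return math.exp(-((t / eta) ** beta))


for age in (0, 20_000, 40_000):
    p_random = conditional_failure_prob(random_survival, age, 5_000)
    p_wear = conditional_failure_prob(wear_out_survival, age, 5_000)
    print(f"age {age:>6,} h: random item {p_random:.3f}, wear-out item {p_wear:.3f}")
```

The random item's value is identical at every age, so replacing it early buys nothing; only the wear-out item's risk climbs with age.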
Pattern F: Infant Mortality Then Constant (68% of components)
High early failure probability that decreases rapidly to a low, constant level. No wear-out phase. This is the dominant pattern — 68% of components in the Nowlan and Heap study followed this curve.
This pattern is critical to understand because it means scheduled overhauls on complex equipment actually introduce risk. Every time you disassemble and reassemble a component, you reset it to the high-failure-probability infant mortality zone. If the component would otherwise have continued running in the low, constant failure probability zone, the overhaul made reliability worse, not better.
Examples: Complex assemblies — engines, compressors, gearboxes, hydraulic systems — where most failures are driven by installation quality, manufacturing variation, contamination events, and random operational factors rather than intrinsic wear-out.
Maintenance strategy: Minimize unnecessary disassembly. Use condition monitoring to detect developing faults and intervene only when evidence indicates a problem. When maintenance is performed, focus on installation quality, contamination control, and commissioning verification to minimize infant mortality.
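The argument that an overhaul can make a Pattern F component less reliable can be checked with a small model. This sketch assumes a hazard made of a Weibull infant-mortality term (shape 0.4) plus a constant rate, with hypothetical parameters, and treats an overhaul as resetting the unit's age to zero. It compares the chance of failure in the next 2,000 hours for a unit left running at 8,000 hours against a freshly overhauled one.

```python
import math

# Hypothetical Pattern F parameters: Weibull infant mortality (beta < 1)
# plus a constant random-failure rate. Illustrative only.
BETA, ETA = 0.4, 5_000_000.0  # infant-mortality shape and scale (hours)
RANDOM_RATE = 1e-5            # constant failures per hour


def cumulative_hazard(t: float) -> float:
    """H(t) for hazard h(t) = Weibull(BETA, ETA) hazard + RANDOM_RATE."""
    return (t / ETA) ** BETA + RANDOM_RATE * t


def prob_fail_next(age: float, window: float) -> float:
    """P(failure within `window` | survived to `age`) = 1 - exp(-(H(age+window) - H(age)))."""
    return 1.0 - math.exp(-(cumulative_hazard(age + window) - cumulative_hazard(age)))


window = 2_000.0
p_keep_running = prob_fail_next(8_000.0, window)  # leave the unit alone
p_after_overhaul = prob_fail_next(0.0, window)    # overhaul resets age to zero
print(f"keep running: {p_keep_running:.1%}, after overhaul: {p_after_overhaul:.1%}")
```

With these assumed parameters the freshly overhauled unit comes out more than twice as likely to fail in the window. The model ignores any developing defects the overhaul might have removed, so it illustrates the direction of the effect rather than its size for a real machine.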
Implications for Industrial Maintenance
Stop Overhauling Equipment on a Calendar
If 89% of components in the study showed no age-related wear-out, why do maintenance programs still schedule major overhauls at fixed intervals? In many cases, the answer is “we’ve always done it that way” or “the OEM recommends it.”
OEM overhaul recommendations aren’t necessarily wrong, but they are generic and usually conservative. Equipment manufacturers set intervals based on worst-case assumptions and liability considerations. They don’t know your operating conditions, contamination levels, or maintenance quality. Blindly following OEM intervals means you’re either overhauling too frequently (wasting money and introducing infant mortality risk) or, in some cases, not frequently enough (if your conditions are worse than the OEM assumed).
The better approach: use condition monitoring to determine when an overhaul is actually needed. Vibration trends, oil analysis data, performance degradation, and thermographic findings provide the evidence base for scheduling overhauls based on equipment condition rather than calendar date.
Invest in Installation Quality
Because Pattern F (infant mortality followed by constant failure rate) dominated the study data and is typical of complex assemblies, reducing infant mortality has an outsized impact on reliability. Every preventive or corrective maintenance event is a potential infant mortality event. The quality of disassembly, cleaning, inspection, reassembly, and commissioning directly determines whether the equipment enters the constant low-failure zone or re-enters the high-failure infant mortality zone.
Practical measures:
- Written, detailed procedures for assembly and installation of critical components
- Precision alignment verification after every coupling reconnection
- Cleanliness standards for bearings, seal chambers, and lubrication systems during maintenance
- Functional testing and acceptance criteria before returning equipment to service
- Post-maintenance vibration readings to verify the equipment is running correctly before declaring the job complete
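A post-maintenance checklist like this is easy to enforce in software. The sketch below treats every criterion as a maximum limit and counts a missing reading as a failure, so a job cannot be closed on incomplete data. The criterion names, limits, and units are hypothetical placeholders; real values belong in your own procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    limit: float  # maximum acceptable value
    unit: str


# Hypothetical acceptance criteria; replace with your site's own limits.
ACCEPTANCE = [
    Criterion("overall_vibration", 2.8, "mm/s RMS"),
    Criterion("shaft_offset", 0.05, "mm"),
    Criterion("bearing_temp_rise", 40.0, "°C above ambient"),
]


def acceptance_failures(readings: dict[str, float]) -> list[str]:
    """Describe every criterion that has no reading or exceeds its limit."""
    problems = []
    for c in ACCEPTANCE:
        value = readings.get(c.name)
        if value is None:
            problems.append(f"{c.name}: no reading recorded")
        elif value > c.limit:
            problems.append(f"{c.name}: {value} {c.unit} exceeds {c.limit} {c.unit}")
    return problems


# Example: vibration too high and the temperature check was skipped.
for problem in acceptance_failures({"overall_vibration": 3.4, "shaft_offset": 0.03}):
    print(problem)
```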
Don’t Confuse “Random” with “Unpredictable”
Pattern E (random failure rate) means age doesn’t predict failure. It doesn’t mean the failure gives no warning before it occurs. A bearing with a constant failure probability typically doesn’t fail without warning: when it does begin to fail, it goes through a degradation process that produces detectable vibration, heat, and noise. Condition monitoring works on these components because it detects the degradation process, regardless of when that process begins.
This distinction is important. “Random failure pattern” doesn’t mean “give up and let it break.” It means “don’t replace it on a calendar — instead, monitor its condition and act when the data says it’s degrading.”
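The practical follow-up is how often to monitor. RCM practice frames this with the P-F interval: the time between when degradation first becomes detectable (P) and functional failure (F). This sketch assumes the worst case, where degradation becomes detectable just after an inspection, and applies the commonly cited rule of thumb of inspecting at no more than half the P-F interval. The function name and example values are hypothetical.

```python
def max_inspection_interval(pf_interval: float, response_time: float,
                            fraction: float = 0.5) -> float:
    """Longest periodic inspection interval that still leaves `response_time` of warning.

    Worst case: degradation becomes detectable (P) just after an inspection, so it
    is found one full interval later, leaving pf_interval - interval before
    functional failure (F). `fraction` caps the interval at a share of the P-F
    interval (0.5 reflects a commonly cited rule of thumb).
    """
    if pf_interval <= response_time:
        raise ValueError("P-F interval is shorter than the response time; "
                         "periodic inspection cannot give enough warning")
    return float(min(fraction * pf_interval, pf_interval - response_time))


# Hypothetical bearing: vibration analysis detects the defect ~60 days before failure.
print(max_inspection_interval(pf_interval=60, response_time=14))  # half the P-F interval governs
print(max_inspection_interval(pf_interval=60, response_time=40))  # response time governs
```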
Applying This to Your Plant
You don’t need to study failure data for years to apply these concepts. A practical approach:
- For each major failure mode on critical equipment, ask: does this failure mode have a clear age-related onset? Can we predict when it will begin based on operating hours or calendar time?
- If yes (rubber degradation, corrosion at known rates, wear items with established life limits), time-based replacement is appropriate. Set intervals based on known degradation rates.
- If no (bearings, gears, electrical insulation, seals in variable conditions), condition-based monitoring is the preferred strategy. The failure will give detectable warning signs — your job is to have the monitoring in place to detect them.
- For inexpensive, non-critical items with no safety consequence, run-to-failure with spares on hand is usually the most economical strategy, regardless of failure pattern.
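The screening questions above can be written as a small decision function. This is a simplification: the four boolean inputs are hypothetical, the run-to-failure check comes first because the text applies it regardless of failure pattern, and a real analysis weighs more factors than these flags.

```python
from enum import Enum


class Strategy(Enum):
    RUN_TO_FAILURE = "run to failure with spares on hand"
    TIME_BASED = "time-based replacement at a known interval"
    CONDITION_BASED = "condition-based monitoring"


def choose_strategy(age_related_onset: bool, critical: bool,
                    safety_consequence: bool, inexpensive: bool) -> Strategy:
    """Map one failure mode's answers to the screening questions onto a strategy."""
    # Cheap, non-critical, no safety consequence: run to failure, whatever the pattern.
    if inexpensive and not critical and not safety_consequence:
        return Strategy.RUN_TO_FAILURE
    # Clear age-related onset (e.g. rubber degradation, known corrosion rates).
    if age_related_onset:
        return Strategy.TIME_BASED
    # No age-related onset: monitor condition and act on evidence of degradation.
    return Strategy.CONDITION_BASED


print(choose_strategy(age_related_onset=False, critical=True,
                      safety_consequence=False, inexpensive=False).value)
```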
Nowlan and Heap’s research gave us the evidence to stop maintaining equipment based on assumptions and start maintaining it based on actual failure behavior. Four decades later, the plants that have absorbed this lesson tend to be the ones with the highest equipment reliability and the lowest total maintenance costs.