How do I prioritize failure modes after the FMEA?

Use the Risk Priority Number (RPN), but treat it as a starting point, not a final answer. Severity × Occurrence × Detection gives you a quantitative ranking, and the highest RPNs go to the top of the list. Then pressure-test that list against operations: any mode that could shut the plant down for more than 24 hours, or that has a safety or environmental consequence, moves up regardless of RPN. The math is a sorting tool, not a decision tool; the team owns the final call on which modes (say, the top 20) get mitigation.
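A minimal sketch of that two-pass ranking, assuming each failure mode carries its S, O, and D ratings plus two hypothetical flags (`outage_over_24h` and `safety_or_env`) that the team sets during the operations review. The first pass sorts by RPN. The second moves flagged modes ahead of everything else while keeping RPN order within each group. The failure modes and ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10
    occurrence: int  # 1-10
    detection: int   # 1-10 (10 = undetectable with current controls)
    outage_over_24h: bool = False  # hypothetical flag: could shut the plant for >24 h
    safety_or_env: bool = False    # hypothetical flag: safety/environmental consequence

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Sort by RPN, then promote operationally critical modes to the top."""
    by_rpn = sorted(modes, key=lambda m: m.rpn, reverse=True)
    promoted = [m for m in by_rpn if m.outage_over_24h or m.safety_or_env]
    rest = [m for m in by_rpn if not (m.outage_over_24h or m.safety_or_env)]
    # Each group keeps its descending-RPN order from the first pass.
    return promoted + rest


# Illustrative data only.
modes = [
    FailureMode("Seal leak", 4, 6, 5),                         # RPN 120
    FailureMode("Coupling misalignment", 5, 5, 6),             # RPN 150
    FailureMode("Casing crack", 9, 2, 3, safety_or_env=True),  # RPN 54
]
for m in prioritize(modes):
    print(m.name, m.rpn)
```

The output is only the draft list; the team still reviews it and decides what gets mitigated.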

What's the difference between FMEA and FMECA?

FMEA stops at identifying failure modes and their effects. FMECA adds Criticality Analysis, which ranks each failure mode by combining its severity classification (catastrophic, critical, marginal, or minor per MIL-STD-1629A, or similar categories in other standards) with its probability of occurrence. For most industrial reliability work, plain FMEA with an RPN ranking gets you 90% of the value. FMECA is typically required for safety-critical systems where regulatory documentation demands the criticality column.

How long does an FMEA workshop take?

Plan two to four hours per major equipment item with a cross-functional team of four to six people. A centrifugal pump FMEA usually takes three hours including effects discussion. A large gas compressor train can run two full days because of the sub-system complexity. The mistake is doing FMEA solo at a desk — without operators and craft in the room, you miss the real-world failure modes that don't show up in the OEM manual.

FMEA for Maintenance Teams: A Step-by-Step Approach to Failure Mode and Effects Analysis

Why FMEA Belongs in Maintenance, Not Just Engineering

Failure Mode and Effects Analysis traces back to the U.S. military’s MIL-P-1629 procedure, issued in 1949, and was later taken up in aerospace, most famously by NASA. It migrated to automotive manufacturing through the AIAG standard. But FMEA has a direct and practical application in industrial maintenance that most plants underuse.

Maintenance teams deal with failure modes every day. They replace bearings, rebuild pumps, rewire motors, and fix leaks. The problem is that these repairs are almost always reactive — responding to failures after they occur. FMEA flips this around by systematically identifying failure modes before they happen and prioritizing action based on risk.

FMEA Terminology: Speaking the Same Language

Before you can run an FMEA, everyone on the team needs to agree on what the terms mean.

Function — What the item or system is supposed to do. Express functions in terms of performance requirements: “Deliver 200 GPM at 80 psig” rather than “Pump liquid.”
Failure Mode — A specific way the item can fail to perform its function. Failure modes should be specific enough to identify a maintenance action. “Bearing failure” is too broad. “Inner race spalling due to fatigue” tells you something actionable.
Failure Effect — The consequence of the failure mode at the local, system, and plant level. What does the operator see? What production is lost? What safety risks exist?
Failure Cause — The root mechanism driving the failure mode. Contamination, overloading, improper installation, material defect — these are causes.
Current Controls — What maintenance or monitoring is currently in place to detect or prevent this failure mode?
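One way to see how these terms map onto worksheet columns is a simple record type. This is a sketch, not a standard schema: the class and field names are hypothetical, and apart from the function and failure mode (taken from the examples above) the entries are illustrative. The effect is split into local, system, and plant levels to match the definition of Failure Effect.

```python
from dataclasses import dataclass, field


@dataclass
class FailureEffect:
    local: str   # what happens at the component
    system: str  # what the operator sees / what the system does
    plant: str   # production, safety, or environmental consequence


@dataclass
class WorksheetRow:
    function: str      # a performance requirement, not a generic verb
    failure_mode: str  # specific enough to point at a maintenance action
    effect: FailureEffect
    cause: str         # root mechanism driving the failure mode
    current_controls: list[str] = field(default_factory=list)


# Illustrative entries (hypothetical pump).
row = WorksheetRow(
    function="Deliver 200 GPM at 80 psig",
    failure_mode="Inner race spalling due to fatigue",
    effect=FailureEffect(
        local="Rising bearing vibration and temperature",
        system="Pump trips on high vibration; flow drops to zero",
        plant="Line slows until the standby pump is started",
    ),
    cause="Overloading",
    current_controls=["Monthly route-based vibration reading"],
)
print(row.function, "|", row.failure_mode)
```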

The Risk Priority Number: Useful but Imperfect

Traditional FMEA ranks risk using the Risk Priority Number (RPN) — the product of Severity, Occurrence, and Detection ratings, each scored 1-10.

Severity (S) — How bad is the effect if the failure occurs? A score of 1 means negligible impact. A 10 means potential safety hazard or catastrophic production loss.
Occurrence (O) — How likely is this failure mode? Based on historical frequency or engineering judgment when data isn’t available. A 1 means virtually impossible. A 10 means almost certain.
Detection (D) — How likely is it that current controls will detect the failure before it reaches the customer (or in maintenance terms, before it causes a functional failure)? A 1 means current controls will almost certainly detect it. A 10 means the failure is undetectable with current methods.

RPN = S × O × D. The maximum possible score is 1,000 (10 × 10 × 10). Many plants treat scores somewhere above 100 to 200 as warranting action, but there’s no universal threshold.
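A minimal sketch of the calculation, with a guard against ratings outside the 1-10 scale. `ACTION_THRESHOLD` is a hypothetical plant-specific value chosen from the 100-200 range mentioned above, not a standard figure.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return S x O x D, rejecting ratings outside the 1-10 scale."""
    ratings = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in ratings:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


ACTION_THRESHOLD = 150  # hypothetical; each plant sets its own

score = rpn(7, 5, 6)
print(score, score > ACTION_THRESHOLD)  # 210 True
```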

The RPN Problem

RPN treats all three factors equally. A failure mode with Severity 10, Occurrence 1, Detection 1 (RPN = 10) gets the same score as one with Severity 1, Occurrence 2, Detection 5 (RPN = 10). But the first one is a potential catastrophe that current controls catch well. The second is a trivial failure happening occasionally. Equal RPN, completely different risk profiles.

The AIAG/VDA joint FMEA standard published in 2019 addresses this with an Action Priority (AP) table that uses severity-occurrence-detection combinations rather than multiplication. If you’re setting up a new FMEA program, adopt the AP approach from the start. If you’re already using RPN, supplement it with a rule: any failure mode with Severity of 9 or 10 requires action regardless of RPN.
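For teams staying with RPN, the severity rule can be layered on top of the RPN screen as sketched below. This is deliberately not the AIAG/VDA Action Priority table, which is defined in the handbook and not reproduced here. The RPN threshold of 100 is a hypothetical example value. The two calls use the equal-RPN pair from the RPN Problem example.

```python
SEVERITY_OVERRIDE = 9  # severity of 9 or 10 always requires action


def needs_action(severity: int, occurrence: int, detection: int, rpn_threshold: int) -> bool:
    """RPN screen supplemented with a severity override (not the AIAG/VDA AP table)."""
    if severity >= SEVERITY_OVERRIDE:
        return True
    return severity * occurrence * detection > rpn_threshold


# Both modes have RPN = 10, but only the high-severity one is flagged.
print(needs_action(10, 1, 1, rpn_threshold=100))  # True: severity override
print(needs_action(1, 2, 5, rpn_threshold=100))   # False: low RPN, low severity
```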

Running an FMEA Session: Practical Steps

Pre-Session Preparation

Don’t walk into an FMEA session cold. Prepare by gathering:

Equipment drawings and manuals
CMMS work order history for the equipment (minimum 2 years)
Operating procedures
Existing PM task lists
Any previous failure investigation reports

Pre-populate the FMEA worksheet with known functions and failure modes from the work order history. This gives the team a starting point rather than a blank page.
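If your CMMS can export work orders to CSV, a short script can rank the failure codes recorded against an asset over the last two years as a seed list. This sketch assumes hypothetical column names (`asset_id`, `failure_code`, `completed_date` in YYYY-MM-DD form); real exports differ by system. Work order failure codes are usually broader than true failure modes, so the team still has to refine each entry.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta


def candidate_failure_modes(csv_text: str, asset_id: str, as_of: date, years: int = 2):
    """Count failure codes for one asset within the lookback window, most frequent first.

    Assumes hypothetical columns: asset_id, failure_code, completed_date (YYYY-MM-DD).
    """
    cutoff = as_of - timedelta(days=365 * years)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip other assets and work orders closed without a failure code.
        if row["asset_id"] != asset_id or not row["failure_code"]:
            continue
        if date.fromisoformat(row["completed_date"]) >= cutoff:
            counts[row["failure_code"]] += 1
    return counts.most_common()


# Illustrative export.
export = """asset_id,failure_code,completed_date
P-101,SEAL LEAK,2023-03-14
P-101,BEARING,2023-09-02
P-101,SEAL LEAK,2024-05-20
P-102,SEAL LEAK,2024-01-11
P-101,,2024-06-01
P-101,COUPLING,2021-02-10
"""
print(candidate_failure_modes(export, "P-101", as_of=date(2024, 12, 31)))
# [('SEAL LEAK', 2), ('BEARING', 1)]
```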

The Session

Team composition matters. Include an operator who runs the equipment daily, a maintenance technician who works on it, a reliability engineer or planner, and a facilitator who keeps the process on track. The facilitator doesn’t need to be an equipment expert — their job is to manage the process and challenge assumptions.

Work through the equipment systematically. Start with the primary function and work through each component or subsystem. For each failure mode identified:

Describe the failure effect clearly. What does the operator experience? What happens to production? Any safety or environmental consequences?
Identify the failure cause or mechanism.
Document current detection and prevention controls.
Score Severity, Occurrence, and Detection.
Calculate RPN or determine Action Priority.
Assign recommended actions for high-risk items with responsible person and target date.

Limit sessions to 2-3 hours. Fatigue sets in quickly with this level of detailed analysis. Multiple shorter sessions produce better results than marathon sessions.
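To plan the calendar, you can combine per-item time estimates (such as the two to four hours per major equipment item from the FAQ) with a session cap. This sketch assumes analysis time can be split freely across sessions. The scope below, including treating "two full days" as 16 hours, is hypothetical.

```python
import math


def sessions_needed(item_hours: list[float], max_session_hours: float = 3.0) -> int:
    """Number of capped sessions needed to cover the total estimated workshop time."""
    total = sum(item_hours)
    return math.ceil(total / max_session_hours)


# Hypothetical scope: three pumps at ~3 h each plus a compressor train at ~16 h.
print(sessions_needed([3, 3, 3, 16]))  # 25 h total -> 9
```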

Common Pitfalls

Going too deep. Analyzing at the individual fastener level is a waste of time for most industrial equipment. Stay at the component level — bearings, seals, impellers, windings — unless a specific sub-component has a known problematic failure mode.

Underrating Occurrence scores. Teams that haven’t seen a specific failure mode tend to rate it as unlikely. Check your data. A failure that happens once every three years on a single machine might not seem frequent, but across 50 similar machines in the plant, that’s roughly 17 failures per year (50 ÷ 3 ≈ 16.7).

Ignoring Detection. Many teams rush through detection scoring. This is where the actionable insight lives. A high-severity, high-occurrence failure mode with good detection (low D score) is being managed. The same failure mode with poor detection (high D score) is a ticking bomb.

No follow-through. Recommended actions that never get implemented waste everyone’s time. Track FMEA actions in your CMMS or action tracking system. Review progress monthly until all high-priority actions are closed.
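If your tracking system can export open FMEA actions, the monthly review can start from a simple split into overdue and on-track items. The record fields, owners, and dates below are hypothetical, and only open high-priority actions are included, matching the review goal above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FmeaAction:
    failure_mode: str
    description: str
    owner: str
    target_date: date
    high_priority: bool
    closed: bool = False


def monthly_review(actions: list[FmeaAction], today: date) -> dict[str, list[FmeaAction]]:
    """Split open high-priority actions into overdue and on-track lists."""
    open_high = [a for a in actions if a.high_priority and not a.closed]
    return {
        "overdue": [a for a in open_high if a.target_date < today],
        "on_track": [a for a in open_high if a.target_date >= today],
    }


# Illustrative action log.
actions = [
    FmeaAction("Inner race spalling", "Add vibration route point", "Reliability engineer", date(2024, 3, 1), True),
    FmeaAction("Seal leak", "Change seal flush plan", "Maintenance tech", date(2024, 6, 30), True),
    FmeaAction("Gasket weep", "Re-torque flange", "Maintenance tech", date(2024, 2, 1), False),
    FmeaAction("Coupling wear", "Laser align", "Planner", date(2024, 1, 15), True, closed=True),
]
report = monthly_review(actions, today=date(2024, 4, 1))
for status, items in report.items():
    print(status, [a.failure_mode for a in items])
```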

Turning FMEA Results Into Maintenance Strategy

FMEA results directly inform your maintenance program:

High Severity + Poor Detection = Add a predictive maintenance task to improve detection. Vibration monitoring, oil analysis, or thermography for the specific failure mode.
High Severity + High Occurrence = Address the root cause. Change materials, modify operating procedures, or redesign the component. Maintenance alone can’t fix a design or operational problem.
Low Severity + Any Occurrence = Run-to-failure is likely appropriate. Don’t waste resources preventing failures that don’t matter.
Good Detection (low D score; failure easily caught early) = Validate that your current detection methods are actually being performed and acted upon. A vibration program that collects data but doesn’t analyze it offers zero detection capability despite being listed as a control.
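These rules of thumb can be encoded as a first-pass filter over the worksheet. The sketch assumes hypothetical cut-offs: ratings of 7 or more count as "high" (or "poor" for detection), and 3 or less as "low" (or "good" for detection). More than one recommendation can apply to the same failure mode.

```python
HIGH = 7  # hypothetical cut-off: ratings >= 7 count as high severity/occurrence or poor detection
LOW = 3   # hypothetical cut-off: ratings <= 3 count as low severity or good detection


def strategy(severity: int, occurrence: int, detection: int) -> list[str]:
    """Map S/O/D ratings to the strategy rules of thumb (more than one can apply)."""
    if severity <= LOW:
        return ["Run to failure"]
    recs = []
    if severity >= HIGH and detection >= HIGH:
        recs.append("Add a predictive task targeting this failure mode")
    if severity >= HIGH and occurrence >= HIGH:
        recs.append("Eliminate the root cause (materials, procedures, or redesign)")
    if detection <= LOW:
        recs.append("Verify the existing detection task is performed and acted on")
    return recs or ["No rule triggered; review with the team"]


print(strategy(9, 8, 8))
print(strategy(2, 9, 9))
print(strategy(8, 4, 2))
```

Modes that trigger no rule fall back to team review rather than a default strategy, since the cut-offs are only a starting point.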

Revisit your FMEA when significant changes occur — new operating conditions, equipment modifications, or failure events that weren’t captured in the original analysis. A living FMEA that evolves with your equipment is a strategic asset. A static FMEA that sits in a folder is just paperwork.
