Key Reliability Metrics: MTBF, MTTR, OEE, and What They Actually Tell You

You Can’t Manage What You Don’t Measure — But You Can Measure the Wrong Things

Every maintenance and reliability conference emphasizes metrics. KPIs fill dashboards. MTBF appears in quarterly reports. OEE charts hang in production areas. The problem is that many plants measure these metrics incorrectly, interpret them poorly, or focus on numbers that don’t drive the right behaviors.

Good metrics answer specific questions: How reliable is our equipment? How effective is our maintenance? Where should we focus improvement efforts? Bad metrics create busywork, encourage gaming, and distract from the real issues.

Mean Time Between Failures (MTBF)

MTBF is the average time between functional failures on a repairable system. It’s the most commonly cited reliability metric — and the most commonly miscalculated.

Calculating MTBF Correctly

MTBF = Total operating time / Number of failures

For a pump that ran 8,760 hours in a year (continuous operation) and experienced 3 functional failures:

MTBF = 8,760 / 3 = 2,920 hours

Sounds simple. Here’s where plants get it wrong:

  • Counting the wrong failures. MTBF should include only functional failures — events that caused the equipment to stop performing its required function. A minor oil leak that doesn’t affect operation is a deficiency, not a failure. Including it artificially deflates MTBF. Conversely, excluding failures because they were “caught by PdM” before total breakdown understates the reliability problem.
  • Using calendar time instead of operating time. A machine that runs 40 hours per week shouldn’t have its MTBF calculated over 168 hours per week. Use actual running hours.
  • Averaging across dissimilar equipment. Plant-wide MTBF is nearly meaningless. MTBF by equipment class, by system, or by individual critical asset is actionable. Averaging a turbine generator and a sump pump together helps nobody.
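
The three pitfalls above can be sketched in a few lines of Python. This is a minimal illustration, not a CMMS integration: the record layout, the per-event functional-failure flags, and the asset classes are assumptions made for the example. It counts only functional failures, divides by running hours rather than calendar hours, and reports MTBF per equipment class instead of one plant-wide average.

```python
from collections import defaultdict

# Hypothetical asset records; field names are illustrative, not from any CMMS.
# operating_hours are actual running hours, not calendar hours.
# Each entry in "events" is True for a functional failure, False for a deficiency.
ASSETS = [
    {"cls": "centrifugal_pump", "operating_hours": 8760.0,
     "events": [True, True, True, False]},  # False: e.g. a minor oil leak
    {"cls": "centrifugal_pump", "operating_hours": 2080.0, "events": [True]},
    {"cls": "conveyor", "operating_hours": 4000.0,
     "events": [True, True, True, True, True]},
]


def mtbf(operating_hours, functional_failures):
    """MTBF in hours; None when there were no failures in the period."""
    if functional_failures == 0:
        return None
    return operating_hours / functional_failures


def mtbf_by_class(assets):
    """Pool running hours and functional failures per equipment class."""
    hours = defaultdict(float)
    failures = defaultdict(int)
    for asset in assets:
        hours[asset["cls"]] += asset["operating_hours"]
        # Only events flagged as functional failures are counted.
        failures[asset["cls"]] += sum(1 for flag in asset["events"] if flag)
    return {cls: mtbf(hours[cls], failures[cls]) for cls in hours}


pump_example = mtbf(8760, 3)      # the pump example from the text
per_class = mtbf_by_class(ASSETS)
print(pump_example)
print(per_class)
```

Returning None for a zero-failure period is a design choice for this sketch: it avoids reporting an infinite or misleadingly large MTBF when the observation window is simply too short.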

What MTBF Tells You

Trending MTBF over time on specific equipment reveals whether reliability is improving or degrading. An MTBF that’s been climbing for three years indicates effective maintenance. A declining MTBF says something is changing — age, operating conditions, maintenance quality — and warrants investigation.

SMRP best practice 5.5.2 sets MTBF targets specific to equipment type and application, recognizing that a “good” MTBF for a centrifugal pump is different from a “good” MTBF for a conveyor belt.

Mean Time To Repair (MTTR)

MTTR measures how long it takes to restore equipment to operating condition after a failure. It reflects the efficiency of your maintenance response.

MTTR = Total downtime for repairs / Number of repair events

What’s Included in MTTR?

This is where definitions vary — and consistency matters more than the specific definition you choose.

  • Narrow definition: Actual wrench time only. From the moment the technician starts working on the equipment until the repair is complete. This measures maintenance execution efficiency.
  • Broad definition: Total time from failure occurrence to equipment returned to service. This includes notification time, diagnosis time, parts procurement time, actual repair time, and testing/startup time. This measures total maintenance response capability.

The broad definition is more useful for understanding production impact. A 2-hour repair that takes 8 hours to complete because parts weren’t available highlights a planning and inventory problem that the narrow definition misses.
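
The difference between the two definitions is easy to see when both are computed from the same events. The sketch below assumes each repair event carries four timestamps; the field names and the sample data are hypothetical, chosen so the first event mirrors the 2-hour repair that takes 8 hours end to end.

```python
from datetime import datetime

# Hypothetical repair events; timestamp field names are assumptions.
EVENTS = [
    {   # parts were not on hand: 6 hours pass before work starts
        "failed_at": datetime(2024, 3, 1, 6, 0),
        "work_started_at": datetime(2024, 3, 1, 12, 0),
        "work_finished_at": datetime(2024, 3, 1, 14, 0),
        "returned_to_service_at": datetime(2024, 3, 1, 14, 0),
    },
    {
        "failed_at": datetime(2024, 3, 9, 9, 0),
        "work_started_at": datetime(2024, 3, 9, 9, 30),
        "work_finished_at": datetime(2024, 3, 9, 11, 30),
        "returned_to_service_at": datetime(2024, 3, 9, 13, 0),
    },
]


def hours_between(start, end):
    """Elapsed hours between two datetimes."""
    return (end - start).total_seconds() / 3600.0


def mttr(events, start_key, end_key):
    """Mean repair time in hours between two timestamps on each event."""
    durations = [hours_between(e[start_key], e[end_key]) for e in events]
    return sum(durations) / len(durations)


# Narrow: wrench time only. Broad: failure to return to service.
narrow = mttr(EVENTS, "work_started_at", "work_finished_at")
broad = mttr(EVENTS, "failed_at", "returned_to_service_at")
print(f"narrow MTTR: {narrow:.1f} h, broad MTTR: {broad:.1f} h")
```

Storing all four timestamps lets you report either definition later, which matters if you decide to change definitions and need to restate history consistently.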

Track MTTR by craft (mechanical, electrical, instrumentation), by equipment type, and for planned versus unplanned work. Unplanned MTTR is typically 2-3x longer than planned MTTR — providing data to support the case for predictive and preventive maintenance investments.

Overall Equipment Effectiveness (OEE)

OEE combines availability, performance, and quality into a single metric that reflects how well equipment performs relative to its design capability.

OEE = Availability x Performance x Quality

  • Availability = (Planned production time – Downtime) / Planned production time. This captures all stops — planned and unplanned.
  • Performance = (Ideal cycle time x Total pieces produced) / Operating time, where operating time is planned production time minus downtime. This captures speed losses: running slower than design speed.
  • Quality = Good pieces / Total pieces produced. This captures yield losses from defects and rework.
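
As a rough sketch, the three factors can be computed together from five shift-level inputs. The function name, argument names, and sample shift are assumptions for illustration; the performance denominator is taken to be planned production time minus downtime.

```python
def oee(planned_time, downtime, ideal_cycle_time, total_pieces, good_pieces):
    """Return (availability, performance, quality, oee) as fractions.

    All times share one unit (minutes here). Operating time is assumed
    to be planned production time minus downtime.
    """
    operating_time = planned_time - downtime
    availability = operating_time / planned_time
    performance = (ideal_cycle_time * total_pieces) / operating_time
    quality = good_pieces / total_pieces
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 80 minutes down,
# 0.5 min ideal cycle time, 720 pieces made, 684 of them good.
a, p, q, shift_oee = oee(480, 80, 0.5, 720, 684)
print(f"A={a:.3f} P={p:.3f} Q={q:.3f} OEE={shift_oee:.4f}")
```

A useful cross-check: the product collapses algebraically to (good pieces x ideal cycle time) / planned production time, so the three factors explain where the loss occurred rather than changing the total.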

World-Class OEE: The 85% Myth

You’ll see “85% OEE” cited as world-class in almost every OEE reference. That number (90% availability x 95% performance x 99.9% quality = 85.4%, usually rounded to 85%) is a reasonable benchmark for discrete manufacturing, but it’s meaningless without context. A batch process plant with long changeovers will have structurally lower availability. A food processing line with strict quality specifications will have lower quality rates. Compare your OEE to your own historical performance and industry-specific benchmarks, not a generic target.

OEE Traps

Manipulating planned downtime. If you classify all PM downtime as “planned” and exclude it from the availability calculation, OEE looks great while your equipment is actually down frequently. Define what counts as planned versus unplanned and stick with it.

Applying OEE to the wrong assets. OEE is most useful for bottleneck operations and production-critical equipment. Calculating OEE for a utility air compressor that has an installed spare makes little sense — availability is already ensured by redundancy. Focus OEE on the constraint.

Other Metrics That Matter

Planned vs. Unplanned Maintenance Ratio

This might be the single most telling metric of maintenance program maturity. Calculate it as:

% Planned = Planned work orders / Total work orders x 100

SMRP best practice targets greater than 80% planned work. Reactive-dominant plants run 40-60% planned. World-class operations run 90%+ planned. The transition from reactive to planned correlates directly with lower total maintenance costs and higher equipment reliability.
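
A minimal sketch of the calculation, assuming each work order carries a work type code. The "planned" and "unplanned" codes and the sample orders are hypothetical; real CMMS exports will use their own classification.

```python
# Hypothetical work orders; the "work_type" values are assumed codes.
WORK_ORDERS = [
    {"id": "WO-1", "work_type": "planned"},
    {"id": "WO-2", "work_type": "planned"},
    {"id": "WO-3", "work_type": "unplanned"},
    {"id": "WO-4", "work_type": "planned"},
    {"id": "WO-5", "work_type": "unplanned"},
]

PLANNED_TARGET_PCT = 80.0  # the "greater than 80% planned" target cited in the text


def planned_pct(work_orders):
    """Percentage of work orders classified as planned."""
    if not work_orders:
        raise ValueError("no work orders in the period")
    planned = sum(1 for wo in work_orders if wo["work_type"] == "planned")
    return 100 * planned / len(work_orders)


pct = planned_pct(WORK_ORDERS)
print(f"{pct:.0f}% planned, target met: {pct > PLANNED_TARGET_PCT}")
```

Counting work orders weights a 30-minute job the same as a 3-day job. Some plants compute the same ratio on labor hours instead; that is a variation on the formula above, not something the formula requires.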

Schedule Compliance

Of the work that was scheduled for this week, what percentage was completed? Target: 90% or better. Low schedule compliance indicates either poor planning (jobs not ready to execute), excessive emergency work bumping scheduled jobs, or insufficient resources.

PM Compliance

Are PM tasks being completed on time? Track as percentage of PM work orders completed within the scheduled window (typically ±10% of the interval). Below 90% means PMs are being deferred — and deferred PMs become reactive failures.
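
The ±10% window can be checked mechanically. The sketch below is an illustration under stated assumptions: each PM order is a tuple of interval, due date, and completion date, and an order that was never completed counts as non-compliant.

```python
from datetime import date

WINDOW_FRACTION = 0.10  # the ±10% of interval window mentioned in the text

# Hypothetical PM work orders: (interval_days, due_date, completed_date or None)
PM_ORDERS = [
    (30, date(2024, 4, 1), date(2024, 4, 3)),   # 2 days late, window is 3 days
    (30, date(2024, 4, 1), date(2024, 4, 10)),  # 9 days late, outside the window
    (90, date(2024, 4, 1), date(2024, 3, 25)),  # 7 days early, window is 9 days
    (90, date(2024, 4, 1), None),               # never completed
]


def on_time(interval_days, due, completed):
    """True if the PM was completed within ±10% of its interval around the due date."""
    if completed is None:
        return False
    return abs((completed - due).days) <= WINDOW_FRACTION * interval_days


def pm_compliance_pct(orders):
    """Percentage of PM orders completed inside their scheduled window."""
    compliant = sum(1 for order in orders if on_time(*order))
    return 100 * compliant / len(orders)


compliance = pm_compliance_pct(PM_ORDERS)
print(f"PM compliance: {compliance:.0f}%")
```

Because the window scales with the interval, a weekly PM gets well under a day of tolerance while an annual PM gets more than a month, which is usually the intent.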

Maintenance Cost per Replacement Asset Value (RAV)

Total annual maintenance cost divided by the total replacement value of the physical assets. General industry benchmark is 2-3% of RAV. Below 2% may indicate underinvestment. Above 4% suggests excessive reactive maintenance or poor practices. This metric normalizes maintenance spending across different plant sizes and allows meaningful comparisons.
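
For illustration, here is the ratio with the bands from the paragraph above applied. The function names and dollar figures are hypothetical, and the text does not classify the range between 3% and 4%, so the label for that band is an assumption.

```python
def maintenance_cost_pct_rav(annual_maintenance_cost, replacement_asset_value):
    """Annual maintenance cost as a percentage of replacement asset value."""
    return 100 * annual_maintenance_cost / replacement_asset_value


def interpret(pct):
    """Map a %RAV figure onto the bands described in the text."""
    if pct < 2.0:
        return "below benchmark: possible underinvestment"
    if pct <= 3.0:
        return "within the 2-3% benchmark"
    if pct <= 4.0:
        # The text does not classify this range; this label is an assumption.
        return "above benchmark, below the 4% flag"
    return "above 4%: possible excessive reactive maintenance or poor practices"


# Hypothetical plants: (annual maintenance cost, replacement asset value)
plant_a = maintenance_cost_pct_rav(2_400_000, 100_000_000)
plant_b = maintenance_cost_pct_rav(5_500_000, 100_000_000)
print(plant_a, interpret(plant_a))
print(plant_b, interpret(plant_b))
```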

Making Metrics Work

Limit your active dashboard to 5-7 key metrics. More than that dilutes attention. Review monthly at the maintenance leadership level. Review quarterly with plant management. Investigate step changes and trends — a single month’s number is noise; three months in one direction is a signal.
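
The "three months in one direction" rule can be written as a simple check. This sketch reads the rule as three consecutive month-over-month changes with the same sign, which needs four data points; that interpretation, the function name, and the sample values are assumptions.

```python
def trend_signal(monthly_values, run=3):
    """Return "up" or "down" if the last `run` month-over-month changes
    all share a direction; otherwise None (treat the data as noise)."""
    if len(monthly_values) < run + 1:
        return None  # not enough history to judge
    recent = monthly_values[-(run + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    if all(change > 0 for change in changes):
        return "up"
    if all(change < 0 for change in changes):
        return "down"
    return None


# Hypothetical monthly MTBF values (hours) for one asset
declining = [2900, 2850, 2700, 2600]
mixed = [2900, 2950, 2700, 2600]
print(trend_signal(declining))
print(trend_signal(mixed))
```

A run rule like this ignores the size of each change, so it complements rather than replaces a check for step changes.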

Connect metrics to actions. If MTBF on a critical system drops, trigger an RCA. If schedule compliance falls below 85%, investigate the root cause (emergency work displacement, parts availability, labor shortages). Metrics that sit on dashboards without driving decisions are decoration, not management tools.
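
One way to keep metrics tied to actions is to encode the triggers explicitly. The sketch below is hypothetical: the dictionary keys and action wording are invented for the example, and "drops" is read as the current MTBF being lower than the previous period's.

```python
SCHEDULE_COMPLIANCE_FLOOR_PCT = 85.0  # investigation threshold from the text


def actions_for(metrics):
    """Return the follow-up actions triggered by one period's metrics.

    `metrics` is a dict with hypothetical keys: asset, critical,
    mtbf_prev, mtbf_now, schedule_compliance_pct.
    """
    actions = []
    if metrics["critical"] and metrics["mtbf_now"] < metrics["mtbf_prev"]:
        actions.append(f"Open RCA for {metrics['asset']}")
    if metrics["schedule_compliance_pct"] < SCHEDULE_COMPLIANCE_FLOOR_PCT:
        actions.append(
            "Investigate schedule compliance: emergency work displacement, "
            "parts availability, labor shortages"
        )
    return actions


# Hypothetical monthly snapshot for one critical asset
snapshot = {
    "asset": "P-101",
    "critical": True,
    "mtbf_prev": 2920.0,
    "mtbf_now": 2400.0,
    "schedule_compliance_pct": 82.0,
}
triggered = actions_for(snapshot)
for action in triggered:
    print(action)
```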

Most importantly, ensure data integrity. Metrics are only as good as the data feeding them. Work orders must be accurately coded — failure codes, equipment identifiers, timestamps, and work type classification. Invest time in training your maintenance team on proper work order completion. Garbage in, garbage out applies to maintenance metrics as much as any other data system.
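
A basic completeness check at work order close-out catches many coding gaps before they reach the metrics. This is a sketch under assumptions: the field names are hypothetical, and it treats every listed field as mandatory for every work order, which a real plant may want to relax for some work types.

```python
from datetime import datetime

# Hypothetical required fields, following the items listed in the text.
REQUIRED_FIELDS = ("equipment_id", "failure_code", "work_type",
                   "started_at", "finished_at")


def validation_errors(work_order):
    """Return a list of problems found on one work order (empty if clean)."""
    errors = [f"missing {field}" for field in REQUIRED_FIELDS
              if not work_order.get(field)]
    started = work_order.get("started_at")
    finished = work_order.get("finished_at")
    if started and finished and finished < started:
        errors.append("finished_at is before started_at")
    return errors


# Hypothetical work orders
clean = {
    "equipment_id": "P-101", "failure_code": "SEAL-LEAK", "work_type": "unplanned",
    "started_at": datetime(2024, 3, 1, 12, 0),
    "finished_at": datetime(2024, 3, 1, 14, 0),
}
sloppy = {
    "equipment_id": "P-101", "failure_code": "", "work_type": "unplanned",
    "started_at": datetime(2024, 3, 1, 14, 0),
    "finished_at": datetime(2024, 3, 1, 12, 0),
}
print(validation_errors(clean))
print(validation_errors(sloppy))
```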
