Mean Time Between Failure
Definition: Mean time between failure (MTBF) is the average time a repairable asset operates before it fails. It is calculated by dividing total operational hours by the number of failures during that period and is expressed in operating hours.
Key Takeaways
- MTBF applies only to repairable assets that are fixed and returned to service after each failure.
- The formula is: MTBF = Total Operational Time divided by Number of Failures.
- Scheduled maintenance windows, shutdowns, and non-production days are excluded from total operational time.
- A higher MTBF means greater reliability; a declining MTBF is an early warning of a maintenance problem.
- MTBF is most useful when paired with mean time to failure (MTTF) and mean time to repair (MTTR) to build a complete reliability picture.
- Data quality is the single biggest factor in whether your MTBF is trustworthy and actionable.
What Is Mean Time Between Failure?
MTBF measures the average time a repairable asset operates before it fails. It is your reliability score, quantifying how long equipment typically runs between breakdowns.
It applies only to repairable systems. If you replace the component entirely, you are looking at a different metric, such as Mean Time to Failure (MTTF). MTBF focuses on assets you bring back into service after they break.
A few key points to understand MTBF:
- Repairable systems only: MTBF only applies to assets that are fixed after failure, not disposable components.
- Measured in hours: While you can use cycles or days, most teams track MTBF in operating hours.
- Based on history: It reflects what has already happened, not predictions of future performance.
For example, if a conveyor motor runs for 1,000 hours and fails four times during that period, your MTBF is 250 hours. That number is not a countdown timer: the motor will not necessarily break down at the 250-hour mark. It is a benchmark for how much uptime you can generally expect from the asset between failures.
MTBF is a tangible and actionable metric. It helps determine what is working for a particular asset and what is not. This is a much better approach than making decisions without data to back them up.
MTBF helps answer core operational questions:
- How reliable is this machine?
- Are we seeing improvement after implementing preventive maintenance?
- When should we schedule the next PM to avoid an unexpected failure?
Also worth clearing up: "mean time before failure" is often used interchangeably with MTBF, but that usage is incorrect. "Mean time between failure" is the correct term in this context.
How To Calculate Mean Time Between Failure
The formula for MTBF is straightforward:
MTBF = Total Operational Time / Number of Failures
But like any maintenance metric, the value of the result entirely depends on the quality of your inputs. If your failure data is incomplete or inconsistent, your MTBF will not tell you much.
Step 1: Gather Reliable Failure Data
You cannot improve what you do not measure, and you cannot measure what you do not define. A failure, in this case, means the asset stopped performing its intended function and required a maintenance intervention.
Not every issue qualifies. For example, a vibration spike that does not impact performance is not a failure. However, a jam that halts production definitely qualifies.
For each failure event recorded, include the:
- Date and time of failure
- Downtime duration
- Component or system that failed
- Corrective action taken
For example, if your packaging machine jams and production comes to a halt, that is a failure event. If it squeaks but keeps running properly, it is not. The squeak may be symptomatic of a potential failure, but only actual failures are included in MTBF.
The key to tracking failures effectively is to keep it simple enough that people actually do it, but detailed enough to be useful later.
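The fields listed above can be captured in a small record type. The following is a minimal sketch, assuming a Python-based log; the class name, field names, and sample values are hypothetical and only illustrate the shape of one entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailureEvent:
    """One logged failure. All names here are illustrative, not a standard schema."""
    asset_id: str
    failed_at: datetime        # date and time of failure
    downtime: timedelta        # downtime duration
    component: str             # component or system that failed
    corrective_action: str     # corrective action taken


# Hypothetical sample entry for a packaging machine jam.
event = FailureEvent(
    asset_id="PKG-01",
    failed_at=datetime(2024, 3, 4, 14, 30),
    downtime=timedelta(minutes=45),
    component="infeed conveyor",
    corrective_action="cleared jam, realigned guide rail",
)
```

Keeping the record this small is deliberate: four or five required fields are easier to fill in consistently than a long form.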
Step 2: Sum Total Operational Hours
Calculate how long the asset actually ran between failures. Not in calendar days, but in actual runtime. Only include the hours when the equipment was actively operating.
The following items should be excluded from your calculations:
- Scheduled maintenance windows
- Plant-wide shutdowns
- Non-production days (such as holidays and idle shifts)
To restate this: "total operating hours" covers every hour the asset is actually running and doing its job. "Failure count" covers every breakdown. Plant-wide shutdowns, holidays, scheduled maintenance, and other planned stops count as neither operating hours nor failures.
Most teams pull this data from runtime logs, SCADA data, or operator reports. The method does not matter as much as the consistency. Stick to one source and use it across all your MTBF calculations.
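One way to sum runtime, assuming your source can export the periods when the asset was actually running as start and end timestamps, is sketched below. The function name and the sample intervals are hypothetical. Planned stops and repair time simply never appear in the list, so they are excluded automatically.

```python
from datetime import datetime


def total_operating_hours(run_intervals):
    """Sum the hours in (start, end) pairs during which the asset was running."""
    total_seconds = 0.0
    for start, end in run_intervals:
        if end < start:
            raise ValueError("interval ends before it starts")
        total_seconds += (end - start).total_seconds()
    return total_seconds / 3600.0


# Hypothetical runtime log: only periods of active operation are listed.
runs = [
    (datetime(2024, 3, 4, 6, 0), datetime(2024, 3, 4, 14, 0)),    # full 8 h shift
    (datetime(2024, 3, 5, 6, 0), datetime(2024, 3, 5, 10, 30)),   # 4.5 h, ended by a failure
    (datetime(2024, 3, 5, 12, 0), datetime(2024, 3, 5, 14, 0)),   # 2 h after the repair
]
hours = total_operating_hours(runs)  # 14.5
```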
Step 3: Divide by Number of Failures
Once you have your total operating hours and failure count, apply the MTBF formula.
Example: If a mixing tank operates for 3,000 hours and fails 3 times, your MTBF is 1,000 hours.
That is your indicator of how long, on average, this asset runs before something goes wrong. High MTBF means better reliability. Low means trouble brewing.
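The calculation itself is a single division. Here is a minimal sketch using the mixing tank and conveyor motor figures from this article; the function name is an assumption, and the zero-failure guard reflects the fact that the formula is undefined when no failures have been recorded.

```python
def mtbf(total_operating_hours, failure_count):
    """Return MTBF in operating hours: total operational time / number of failures."""
    if total_operating_hours < 0:
        raise ValueError("operating hours cannot be negative")
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one recorded failure")
    return total_operating_hours / failure_count


mixing_tank = mtbf(3000, 3)      # 1000.0 hours
conveyor_motor = mtbf(1000, 4)   # 250.0 hours
```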
What Does a "Good" MTBF Look Like?
MTBF values vary by asset type and operating conditions. But one rule never changes: higher MTBF equals higher reliability; lower MTBF is a maintenance red flag.
Tracking this over time shows whether your reliability strategies are working or if it is time to reassess.
Why Tracking MTBF Matters for Maintenance
With dozens or sometimes hundreds of assets to keep running, not all of them fail the same way or at the same pace.
MTBF helps you understand how reliable your equipment really is, not just how often it breaks. It gives you the historical insight you need to plan smarter, prioritize better, and spot patterns before they turn into production-killing problems.
Here is how MTBF directly supports better maintenance decisions:
Smarter Preventive Maintenance Timing
If your PM schedule is too aggressive, you waste time. Too relaxed, and failures hit without warning. MTBF helps you align PM intervals with real-world failure patterns, reducing unnecessary work while protecting uptime.
Equipment Performance Benchmarking
When identical assets show drastically different MTBF values, you have a reliability issue worth digging into. MTBF highlights outliers, allowing you to take targeted action where it is needed most.
Spare Parts and Inventory Planning
Assets with lower MTBF wear through parts faster. Use MTBF to guide more accurate spare part stocking, ensuring you are neither over-ordering nor caught without critical components when failures occur.
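As a rough planning aid, you can estimate how many failures to expect over a planned run by dividing the planned operating hours by MTBF. This is a sketch under a strong assumption: that the asset keeps failing at about its historical rate. The function name is hypothetical, and the result says nothing about how many parts each failure consumes.

```python
def expected_failures(planned_operating_hours, mtbf_hours):
    """Estimate failures over a planned run, assuming the historical rate holds."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return planned_operating_hours / mtbf_hours


# A conveyor motor with a 250-hour MTBF, planned to run 2,000 hours.
estimate = expected_failures(2000, 250)  # 8.0
```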
Repair vs. Replace Decisions
Should you keep fixing a piece of equipment or move on? MTBF trends reveal if reliability is improving, stable, or declining, helping you make informed replacement calls based on data, not guesswork.
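To see which way MTBF is heading, you can compute it per period and compare the oldest and newest values. The sketch below is one simple way to do that; the function name, the 10% tolerance band, and the quarterly figures are all hypothetical choices for illustration.

```python
def mtbf_trend(periods, tolerance=0.10):
    """Classify the MTBF trend across consecutive periods.

    periods: list of (label, operating_hours, failure_count), oldest first.
    tolerance: assumed band; smaller relative changes are treated as stable.
    """
    # Periods with no failures have no defined MTBF and are skipped.
    values = [hours / failures for _, hours, failures in periods if failures > 0]
    if len(values) < 2:
        return "insufficient data"
    change = (values[-1] - values[0]) / values[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "stable"


# Hypothetical quarterly data: MTBF of 300, 200, then 150 hours.
quarters = [("Q1", 600, 2), ("Q2", 600, 3), ("Q3", 600, 4)]
direction = mtbf_trend(quarters)  # "declining"
```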
MTBF vs. MTTF vs. MTTR
These three metrics often get lumped together, but they serve very different purposes. Understanding the difference is key to building a reliable maintenance strategy.
| Metric | Applies To | Question It Answers | Typical Examples |
|---|---|---|---|
| MTBF | Repairable equipment | How often does it break? | Motors, pumps, gearboxes, control systems |
| MTTF | Non-repairable components | How long does it last before failing? | Bearings, seals, fuses, certain electronic parts |
| MTTR | Repair and recovery speed | How long does it take to fix once it fails? | Any repairable asset after a breakdown |
MTBF measures how often repairable equipment fails. It tracks the average runtime between breakdowns for assets that can be repaired and returned to service.
MTTF applies to components that are not repaired, just replaced. It measures how long something typically lasts before its first and final failure. Use MTTF for replacement planning, stocking consumables, and building failure models for non-repairable assets.
MTTR tells you how fast your team can restore service after a failure. It measures the average time from breakdown to full recovery, including diagnosis, repair, and reactivation. Use MTTR to identify bottlenecks in your corrective maintenance process and improve response times.
Each metric answers a different question, but together they give you a complete picture of reliability and repair efficiency.
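Because MTBF and MTTR draw on the same failure log, both can be computed side by side. The following is a minimal sketch with hypothetical numbers: MTBF averages the operating hours over the failures, and MTTR averages the breakdown-to-recovery time over the same failures.

```python
def mean_hours(total_hours, event_count):
    """Average hours per failure event; requires at least one event."""
    if event_count <= 0:
        raise ValueError("at least one failure is required")
    return total_hours / event_count


# Hypothetical log for one repairable asset.
operating_hours = 3000.0
repair_hours = [2.0, 1.5, 4.0]   # breakdown-to-recovery time for each failure
failures = len(repair_hours)

mtbf_hours = mean_hours(operating_hours, failures)     # 1000.0
mttr_hours = mean_hours(sum(repair_hours), failures)   # 2.5
```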
Challenges with Mean Time Between Failure Data
On paper, MTBF is a simple formula. In real maintenance environments, though, the accuracy and usefulness of that number depend on how you track and interpret the data behind it.
Here is where teams often run into trouble:
1. Averages Can Be Misleading
MTBF is an average, not a guarantee. If nine identical motors run flawlessly and one fails repeatedly, your fleet-wide MTBF may still look healthy. That could mask the fact that one unit is clearly dragging performance down.
Always break MTBF down by asset or asset group to avoid hiding individual reliability problems behind blended data.
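The masking effect is easy to reproduce. The sketch below uses hypothetical figures for ten identical motors, nine with no failures and one with five, and compares the blended fleet number with the per-asset breakdown. The asset IDs and hours are assumptions for illustration.

```python
def mtbf_or_none(operating_hours, failure_count):
    """Return MTBF, or None when no failures have been observed yet."""
    return operating_hours / failure_count if failure_count > 0 else None


# Hypothetical fleet: asset id -> (operating hours, failure count).
fleet = {f"M{i}": (1000.0, 0) for i in range(1, 10)}
fleet["M10"] = (1000.0, 5)

total_hours = sum(hours for hours, _ in fleet.values())
total_failures = sum(failures for _, failures in fleet.values())

fleet_mtbf = mtbf_or_none(total_hours, total_failures)   # 2000.0, looks healthy
per_asset = {asset: mtbf_or_none(h, f) for asset, (h, f) in fleet.items()}
worst = per_asset["M10"]                                 # 200.0, the real problem
```

The fleet-wide figure is ten times higher than the MTBF of the one motor that is actually failing, which is why the per-asset view matters.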
2. Data Quality Defines the Value
Missed failure events, inconsistent logging, or vague documentation will skew your results. You might see a rising MTBF and assume reliability is improving, when in reality, failures are just going unreported.
A useful MTBF starts with disciplined, consistent failure tracking.
3. Context Changes Everything
Not all equipment operates under the same conditions. One pump may run on clean water, another on abrasive slurry. Even if they are identical models, their MTBFs will look completely different.
Always factor in the operating context when comparing MTBF across assets or sites. Consider the environment, load, duty cycles, and other impactful factors.
4. Inconsistent Failure Definitions
If your team does not share a clear definition of what counts as a failure, your MTBF data will be impossible to compare over time.
Decide and standardize definitions upfront. Is a reset a failure? What about a brief shutdown that does not require intervention? Set the criteria and stick to them. This consistency is what enables you to see patterns, trends, and deviations over time.
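One way to make the definition stick is to write it down as a single rule that every event passes through. The sketch below encodes the definition used earlier in this article (the asset stopped performing its intended function and required a maintenance intervention); the class and field names are hypothetical, and your team may settle on different criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A logged event; field names are illustrative."""
    description: str
    function_lost: bool            # asset stopped performing its intended function
    intervention_required: bool    # maintenance had to step in


def is_failure(event):
    """Count an event as a failure only when both criteria are met."""
    return event.function_lost and event.intervention_required


jam = Event("packaging machine jam halts production", True, True)
squeak = Event("squeak, machine keeps running", False, False)
brief_stop = Event("brief shutdown, no intervention needed", True, False)

counted = [e.description for e in (jam, squeak, brief_stop) if is_failure(e)]
```

Under this particular rule, only the jam is counted. The point is less the rule itself than that the same rule is applied to every event.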
The takeaway: MTBF is powerful, but only when it is built on clean, consistent, and contextual data. If your inputs are vague or inconsistent, your MTBF will be inaccurate and will likely lead you in the wrong direction.
Practical Ways to Improve MTBF
If your MTBF is trending downward, it is a signal that your reliability strategy needs improvement. Here are three proven ways to move that number in the right direction:
1. Track Historical Failures Consistently
You cannot improve what you do not document. Every failure event should be logged in detail, not just recording when it happened, but why it happened.
Start recording activity with a basic template:
- What failed
- When it failed
- Why it failed (root cause, if available)
- What resolved it
Over time, this data reveals failure patterns across components, shifts, or environments. These patterns are your roadmap to making more intelligent decisions.
2. Align Preventive Maintenance With MTBF
Once you know your typical failure intervals, use that data to fine-tune your PM schedule.
Example: If a critical pump typically fails every 300 hours, schedule inspections at 250 hours to prevent failure.
The goal is not to increase the frequency of PMs, but to time maintenance interventions where they can actually prevent failures.
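The pump example amounts to subtracting a safety margin from the typical failure interval. Here is a minimal sketch of that arithmetic; the function name and the margin parameter are hypothetical, and the 50-hour margin is taken from the example above rather than being a recommended value.

```python
def inspection_interval(typical_failure_interval_hours, margin_hours):
    """Schedule the inspection a fixed margin ahead of the typical failure interval."""
    if margin_hours < 0 or margin_hours >= typical_failure_interval_hours:
        raise ValueError("margin must be non-negative and smaller than the interval")
    return typical_failure_interval_hours - margin_hours


# Pump that typically fails every 300 hours, inspected 50 hours early.
interval = inspection_interval(300, 50)  # 250
```

Because MTBF is an average, some failures will still arrive before the inspection; the right margin depends on how widely the actual intervals vary.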
3. Standardize Root Cause Analysis
Do not just fix and forget. When high-impact equipment fails, dig deeper with a structured root cause analysis process:
- Why did it fail?
- Has this happened before?
- What conditions contributed?
- What change would prevent recurrence?
You can apply the 5 Whys analysis or FMEA to structure this investigation. This transforms reactive maintenance into systematic problem-solving and prevents repeated failures from compromising your MTBF.
MTBF and Condition-Based Maintenance
Condition-based maintenance takes the logic behind MTBF a step further. Rather than scheduling maintenance based on average failure intervals, it monitors actual asset condition in real time and triggers intervention only when performance degrades to a defined threshold.
Teams using predictive maintenance alongside MTBF tracking can validate whether condition-based decisions are extending failure intervals over time. If MTBF is rising after sensor-driven interventions, the strategy is working. If it stays flat or falls, there is a gap to investigate.
Together, MTBF and condition monitoring give you both the historical baseline and the real-time signal needed to manage reliability proactively.
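A before-and-after comparison is the simplest way to run that check. The sketch below assumes you have operating hours and failure counts for comparable periods on either side of the change; the function names and figures are hypothetical.

```python
def mtbf(operating_hours, failure_count):
    """MTBF in operating hours; requires at least one failure."""
    if failure_count <= 0:
        raise ValueError("at least one failure is required")
    return operating_hours / failure_count


def intervention_verdict(mtbf_before, mtbf_after):
    """Rising MTBF suggests the strategy is working; flat or falling needs a look."""
    return "working" if mtbf_after > mtbf_before else "investigate"


# Hypothetical periods of equal runtime before and after sensor-driven interventions.
before = mtbf(2400, 8)   # 300.0 hours
after = mtbf(2400, 5)    # 480.0 hours
verdict = intervention_verdict(before, after)  # "working"
```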
The Bottom Line
MTBF is a pulse check on your operation's reliability, grounded in what your assets have actually done. When you consistently monitor and improve it, you stop reacting to failures and start planning for optimal performance.
Yet for most teams, this is a task easier said than done. Not every team has the tools or systems in place to track failures cleanly or calculate MTBF automatically. And without these, it is nearly impossible to connect data to decision-making.
Without clear failure data, aligned schedules, and centralized tracking, even simple metrics like MTBF become hard to generate or trust, and even harder to act on.
See How Tractian Improves MTBF
Tractian's condition monitoring platform detects developing faults early, extending equipment life and increasing mean time between failures.
Frequently Asked Questions
What is mean time between failure?
Mean time between failure (MTBF) is the average time a repairable asset operates before it fails. It is calculated by dividing total operational time by the number of failures during that period and is expressed in operating hours.
How do you calculate MTBF?
MTBF = Total Operational Time divided by Number of Failures. Total operational time includes only hours the asset was actively running, excluding scheduled maintenance windows, plant-wide shutdowns, and non-production days.
What is the difference between MTBF and MTTF?
MTBF applies to repairable assets that are fixed and returned to service after a breakdown. MTTF applies to non-repairable components such as bearings or fuses, measuring how long a component lasts before its first and final failure.
What does a high MTBF indicate?
A high MTBF indicates better equipment reliability, meaning the asset operates for longer periods between failures. A low MTBF signals a reliability problem that may require maintenance strategy changes, root cause analysis, or an asset replacement decision.
What should be excluded from MTBF calculations?
Exclude scheduled maintenance windows, plant-wide shutdowns, and non-production days such as holidays or idle shifts. Only count hours when the equipment was actively operating and performing its intended function.
What counts as a failure for MTBF tracking?
A failure is any event where the asset stopped performing its intended function and required a maintenance intervention. Minor issues that do not impact performance, such as a vibration spike that does not halt production, do not qualify as failures for MTBF purposes.