It’s well accepted that every car lot has at least one lemon. A “lemon car” is a vehicle with repeated defects that continues to break down even after multiple repair attempts. The problem is, it can take a while to realize the breakdowns are related to a deeper issue with the vehicle.
Plant assets are no different. For every plant, there is bound to be one asset that just won’t stop failing. You repair it, reset it, and hope for the best, only to soon repeat the cycle. The question on every technician’s mind is, “At what point should we realize the cycle of repairs is signaling a deeper problem?”
Mean Time Between Failure (MTBF) is the way to measure that. It isn’t a theoretical metric, though. It’s a data-driven direct indicator of how reliable your equipment really is. MTBF is a practical way for maintenance teams to gain the insight required to shift their approach to assets from reactive firefighting to strategic decision-making.
In this article, we’ll break down what MTBF means, how to calculate it accurately, and how to apply it to improve reliability. The goal is for you to optimize your maintenance schedules and reduce time spent on downtime that can be avoided.
What Is Mean Time Between Failure (MTBF)?
MTBF measures the average time a repairable asset operates before it fails. It’s your reliability score, quantifying how long equipment typically runs between breakdowns.
It applies only to repairable systems. If you replace the component entirely, you’re looking at a different metric, such as Mean Time to Failure (MTTF). MTBF focuses on assets you bring back into service after they break.
A few key points to understand MTBF:
- Repairable systems only: MTBF only applies to assets that are fixed after failure, not disposable components.
- Measured in hours: While you can use cycles or days, most teams track MTBF in operating hours.
- Based on history: It reflects what’s already happened, not predictions of future performance.
For example, if a conveyor motor runs for 1,000 hours and fails four times during that period, your MTBF is 250 hours. That number isn’t something you can set a timer for and guarantee a breakdown. It’s a benchmark representing the general expectation of the amount of uptime experienced by an asset between failures.
It’s a number, based on real-world data, that helps you evaluate whether your current maintenance efforts are working.
MTBF is a tangible and actionable metric that helps determine what is working for a particular asset and what is not. This is a much better approach than making guesses without anything backing them up.
MTBF helps answer core operational questions:
- How reliable is this machine?
- Are we seeing improvement after implementing preventive maintenance?
- When should we schedule the next PM to avoid an unexpected failure?
Also worth clearing up: “mean time before failure” is often used interchangeably with MTBF, but incorrectly. Mean Time Between Failure is the correct term in this context.
How To Calculate Mean Time Between Failure
The formula for MTBF is straightforward:
MTBF = Total Operational Time ÷ Number of Failures
But like any maintenance metric, the value of the result entirely depends on the quality of your inputs. If your failure data is incomplete or inconsistent, your MTBF won’t tell you much.
Let’s break down how to calculate MTBF the right way:

1. Gather Reliable Failure Data
You can’t improve what you don’t measure, and you can’t measure what you don’t define. A failure, in this case, means the asset stopped performing its intended function and required a maintenance intervention.
Not every issue qualifies. For example, a vibration spike that doesn’t impact performance is not a failure. However, a jam that halts production? This definitely qualifies.
What data should you include for each failure event recorded? Include the:
- Date and time of failure
- Downtime duration
- Component or system that failed
- Corrective action taken
For example, if your packaging machine jams and production comes to a halt, this would be considered a failure event. If it squeaks but keeps running properly, then it would not. The squeak may be symptomatic of a potential failure, but you’re only including actual failures in MTBF.
The key to tracking failures effectively is to keep it simple enough that people actually do it, but detailed enough to be useful later.
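As a rough illustration, the four fields above could be captured in a small record like the one below. This is a minimal sketch, not a prescribed schema: the `FailureEvent` class, its field names, the `stopped_function` flag, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailureEvent:
    """One logged event for an asset (hypothetical field names)."""
    asset_id: str
    failed_at: datetime        # date and time of failure
    downtime_hours: float      # downtime duration
    component: str             # component or system that failed
    corrective_action: str     # corrective action taken
    # Assumption: True only when the asset stopped performing its
    # intended function and needed a maintenance intervention.
    stopped_function: bool = True


def countable_failures(events):
    """Keep only the events that count toward MTBF."""
    return [e for e in events if e.stopped_function]


events = [
    FailureEvent("PKG-01", datetime(2024, 3, 4, 14, 30), 1.5,
                 "infeed belt", "cleared jam, realigned guide"),
    # A squeak that did not stop the machine: logged, but not a failure.
    FailureEvent("PKG-01", datetime(2024, 3, 9, 9, 0), 0.0,
                 "drive roller", "noted squeak, no action",
                 stopped_function=False),
]

print(len(countable_failures(events)))  # 1
```

Keeping the non-failure observation in the log, but flagged, preserves the early warning without inflating the failure count.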
2. Sum Total Operational Hours
Now, calculate how long the asset actually ran between failures. Count actual runtime, not calendar days: only include the hours when the equipment was actively operating.
It’s also helpful to know what you should exclude from your calculations. The following are items to leave out:
- Scheduled maintenance windows
- Plant-wide shutdowns
- Non-production days (e.g., holidays, idle shifts)
To restate this another way, “total operating hours” includes all hours the asset is actually running, doing its job. “Failure count” includes all breakdowns, which are failures to do its job. Planned events such as plant-wide shutdowns, holidays, and scheduled maintenance count toward neither figure: they are not operating hours, and they are not failures.
Moving forward, this sum is your baseline for the total time the asset was available and in use, tracked against your regular production schedule.
Most teams pull this from runtime logs, SCADA data, or operator reports. The method doesn’t matter as much as the consistency. Stick to one source and use it across all your MTBF calculations.
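Here is one way the runtime sum might look if your log records how each block of hours was spent. It is only a sketch: the state labels (`"running"`, `"scheduled_maintenance"`, and so on), the log layout, and the numbers are assumptions made for illustration, not the format of any particular SCADA or CMMS export.

```python
# Hypothetical runtime log: (state, hours) pairs for one asset.
runtime_log = [
    ("running", 160.0),
    ("scheduled_maintenance", 8.0),   # excluded: planned work
    ("running", 152.0),
    ("plant_shutdown", 48.0),         # excluded: plant-wide shutdown
    ("non_production", 24.0),         # excluded: holiday / idle shift
    ("down_for_repair", 6.0),         # excluded: the asset was not running
    ("running", 88.0),
]


def total_operating_hours(log):
    """Sum only the hours in which the asset was actively running."""
    return sum(hours for state, hours in log if state == "running")


print(total_operating_hours(runtime_log))  # 400.0
```

Counting only the `"running"` state, rather than subtracting a list of exclusions, means a new or misspelled state label can never be mistaken for runtime.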
3. Divide by Number of Failures
Once you have calculated your total operating hours and failure count, it’s time to apply the MTBF formula.
Example: If a mixing tank operates for 3,000 hours and fails 3 times, your MTBF is 1,000 hours.
There's your indicator of how long, on average, this asset runs before something goes wrong.
Remember: high MTBF means better reliability. Low means trouble brewing.
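The division itself can be sketched in a few lines. The function name `mtbf` and its error handling are illustrative choices, and the inputs are the two examples used in this article.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """MTBF = total operational time / number of failures."""
    if total_operating_hours < 0:
        raise ValueError("operating hours cannot be negative")
    if failure_count <= 0:
        # With no recorded failures the average is undefined, so this
        # sketch raises instead of returning a misleading number.
        raise ValueError("MTBF needs at least one recorded failure")
    return total_operating_hours / failure_count


print(mtbf(3000, 3))  # 1000.0  (mixing tank example)
print(mtbf(1000, 4))  # 250.0   (conveyor motor example)
```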
What Does a "Good" MTBF Look Like?
MTBF values vary by asset type and operating conditions. But here’s the rule that never changes:
Higher MTBF = Higher reliability.
Lower MTBF = Maintenance red flag.
Tracking this over time shows whether your reliability strategies are working or if it's time to reassess.
Why Tracking MTBF Matters for Maintenance
You’ve got dozens, sometimes hundreds, of assets to keep running. And not all of them fail the same way or at the same pace.
MTBF helps you understand how reliable your equipment really is, not just how often it breaks. It gives you the historical insight you need to plan smarter, prioritize better, and spot patterns before they turn into production-killing problems.
Here’s how MTBF directly supports better maintenance decisions in the real world:
Smarter Preventive Maintenance Timing
If your PM schedule is too aggressive, you waste time. Too relaxed, and failures hit without warning. MTBF helps you align PM intervals with real-world failure patterns, reducing unnecessary work while protecting uptime.
Equipment Performance Benchmarking
When identical assets show drastically different MTBF values, you’ve got a reliability issue worth digging into. MTBF highlights outliers, allowing you to take targeted action where it's needed most.
Spare Parts and Inventory Planning
Assets with lower MTBF wear through parts faster. Use MTBF to guide more accurate spare part stocking, ensuring you’re neither over-ordering nor caught without critical components when failures occur.
Repair vs. Replace Decisions
Should you keep fixing a piece of equipment or move on? MTBF trends reveal if reliability is improving, stable, or declining, helping you make informed replacement calls based on data, not guesswork.
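To see a trend rather than a single number, you could compute MTBF per period and compare the results. The sketch below assumes quarterly buckets and made-up figures, and its `direction` helper uses a deliberately crude rule (first period versus last) that stands in for whatever trend analysis your team prefers.

```python
def mtbf_by_period(periods):
    """periods: list of (label, operating_hours, failures) tuples."""
    trend = []
    for label, hours, failures in periods:
        # No failures in a period means MTBF is undefined for it.
        value = hours / failures if failures else None
        trend.append((label, value))
    return trend


def direction(trend):
    """Crude assumption: compare the first and last defined values."""
    values = [v for _, v in trend if v is not None]
    if len(values) < 2:
        return "insufficient data"
    if values[-1] > values[0]:
        return "improving"
    if values[-1] < values[0]:
        return "declining"
    return "stable"


# Hypothetical history for one asset.
history = [("Q1", 600.0, 2), ("Q2", 600.0, 3), ("Q3", 600.0, 4)]
trend = mtbf_by_period(history)

print(trend)             # [('Q1', 300.0), ('Q2', 200.0), ('Q3', 150.0)]
print(direction(trend))  # declining
```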
MTBF vs. MTTF vs. MTTR
Let’s clear up the confusion. These three metrics often get lumped together, but they serve very different purposes. Understanding the difference is key to building a reliable maintenance strategy.
MTBF: Mean Time Between Failure (For Repairable Equipment)
MTBF measures the frequency of failures in repairable equipment. It tracks the average runtime between breakdowns for assets that can be repaired and returned to service.
Typical examples:
- Motors
- Pumps
- Gearboxes
- Control systems
Use MTBF to assess asset reliability and guide preventive maintenance, spares planning, and lifecycle decisions.
MTTF: Mean Time to Failure (For Non-Repairable Components)
MTTF applies to components that aren’t repaired, just replaced. It measures how long something typically lasts before its first (and final) failure.
Typical examples:
- Bearings
- Seals and gaskets
- Fuses
- Certain electronic parts
Use MTTF for replacement planning, stocking consumables, and building failure models for non-repairable assets.
MTTR: Mean Time to Repair (For Recovery Speed)
MTTR tells you how fast your team can restore service after a failure. It measures the average time from breakdown to full recovery, including diagnosis, repair, and reactivation.
Use MTTR to identify bottlenecks in your corrective maintenance process and improve response times.
Quick Comparison:
- MTBF = How often does it break?
- MTTF = How long does it last before failing?
- MTTR = How long does it take to fix once it fails?
Each metric answers a different question, but together, they give you a complete picture of reliability and repair efficiency.
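The three calculations can be placed side by side to make the distinction concrete. This is a sketch with invented numbers, and the function names are illustrative. MTTF is computed here as the average observed life of replaced units, which is one common way to estimate it.

```python
def mean_time_between_failure(operating_hours, failure_count):
    """Repairable asset: average runtime between breakdowns."""
    return operating_hours / failure_count


def mean_time_to_repair(repair_durations_hours):
    """Average time from breakdown to full recovery."""
    return sum(repair_durations_hours) / len(repair_durations_hours)


def mean_time_to_failure(unit_lifetimes_hours):
    """Non-repairable part: average life before its one failure."""
    return sum(unit_lifetimes_hours) / len(unit_lifetimes_hours)


# Hypothetical pump: 1,200 running hours, four breakdowns.
repairs = [2.0, 1.5, 4.0, 0.5]           # hours to restore service each time
print(mean_time_between_failure(1200.0, len(repairs)))  # 300.0
print(mean_time_to_repair(repairs))                     # 2.0

# Hypothetical bearings, each replaced (not repaired) at failure.
bearing_lives = [4000.0, 5200.0, 4600.0]
print(mean_time_to_failure(bearing_lives))              # 4600.0
```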
Challenges with Mean Time Between Failure Data
On paper, MTBF is a simple formula. In real maintenance environments, though, the accuracy and usefulness of that number depend on how you track and interpret the data behind it.
Here’s where teams often run into trouble with MTBF data:
1. Averages Can Be Misleading
MTBF is an average, not a guarantee. If nine identical motors run flawlessly and one fails repeatedly, your fleet-wide MTBF may still look healthy. That could mask the fact that one unit is clearly dragging performance down.
Always break MTBF down by asset or asset group to avoid hiding individual reliability problems behind blended data.
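The ten-motor scenario can be worked through with made-up numbers. The sketch assumes each motor logged 2,000 operating hours and that the fleet-wide figure is pooled (total hours divided by total failures); the asset IDs and counts are hypothetical.

```python
# Nine motors with no failures, one with five. Values are
# (operating_hours, failure_count) and are purely illustrative.
fleet = {f"M-{i:02d}": (2000.0, 0) for i in range(1, 10)}
fleet["M-10"] = (2000.0, 5)


def pooled_mtbf(assets):
    """Fleet-wide MTBF: all hours divided by all failures."""
    hours = sum(h for h, _ in assets.values())
    failures = sum(f for _, f in assets.values())
    return hours / failures if failures else None


def per_asset_mtbf(assets):
    """MTBF per asset; None where no failure has been recorded."""
    return {name: (h / f if f else None) for name, (h, f) in assets.items()}


print(pooled_mtbf(fleet))             # 4000.0
print(per_asset_mtbf(fleet)["M-10"])  # 400.0
```

The pooled figure of 4,000 hours looks healthy, while the per-asset view shows one motor failing every 400 hours.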
2. Data Quality Defines the Value
Missed failure events, inconsistent logging, or vague documentation will skew your results. You might see a rising MTBF and assume reliability is improving, when in reality, failures are just going unreported.
A useful MTBF starts with disciplined, consistent failure tracking.
3. Context Changes Everything
Not all equipment operates under the same conditions. One pump may run on clean water, another on abrasive slurry. Even if they’re identical models, their MTBFs will look completely different.
Always factor in the operating context when comparing MTBF across assets or sites. Consider the environment, load, duty cycles, and other impactful factors.
4. Inconsistent Failure Definitions
If your team doesn’t share a clear definition of what counts as a failure, your MTBF data will be impossible to track over time.
Decide and standardize definitions upfront. For example, is a reset a failure? What about a brief shutdown that doesn’t require intervention? Set the criteria and stick to them. This consistency is what enables you to see patterns and changes, such as trends and deviations.
The takeaway: MTBF is powerful, but only when it’s built on clean, consistent, and contextual data. If your inputs are vague or inconsistent, your MTBF will be inaccurate, and it’ll probably lead you in the wrong direction.
Practical Ways to Improve MTBF
If your MTBF is trending low, it’s a signal that your reliability strategy needs improvement. Here are three proven ways to move that number in the right direction:
1. Track Historical Failures Consistently
You can’t improve what you don’t document. Every failure event should be logged in detail. This means not just recording when it happened, but why it happened.
Start recording activity with a basic template:
- What failed
- When it failed
- Why it failed (root cause, if available)
- What resolved it
Over time, this data reveals failure patterns across components, shifts, or environments. These patterns are your roadmap to making more intelligent decisions.
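Once the template is filled in consistently, even a simple tally can surface those patterns. The sketch below uses invented log entries and hypothetical key names that mirror the four template fields. A missing root cause is grouped under "unknown".

```python
from collections import Counter

# Hypothetical entries following the what / when / why / resolved template.
failure_log = [
    {"what": "conveyor motor", "when": "2024-01-12",
     "why": "bearing wear", "resolved_by": "replaced bearing"},
    {"what": "conveyor motor", "when": "2024-02-20",
     "why": "bearing wear", "resolved_by": "replaced bearing"},
    {"what": "gearbox", "when": "2024-03-03",
     "why": "misalignment", "resolved_by": "realigned coupling"},
    {"what": "conveyor motor", "when": "2024-04-15",
     "why": "bearing wear", "resolved_by": "replaced bearing, added grease PM"},
    {"what": "pump", "when": "2024-05-01",
     "why": None, "resolved_by": "reset and restarted"},
]

# Count failures per root cause; a missing cause is labeled "unknown".
by_cause = Counter(entry["why"] or "unknown" for entry in failure_log)

print(by_cause.most_common(1))  # [('bearing wear', 3)]
```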
2. Align Preventive Maintenance With MTBF
Once you know your typical failure intervals, use that data to fine-tune your PM schedule.
Example: If a critical pump typically fails every 300 hours, schedule inspections at 250 hours to prevent failure.
The goal isn’t to increase the frequency of PMs, but to time maintenance interventions where they can actually prevent failures.
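The pump example translates into a small calculation. This is only a sketch: the 50-hour margin is the gap implied by the 300-hour and 250-hour figures above, the function names are hypothetical, and the right margin for a real asset depends on how much its failure intervals vary.

```python
def pm_interval_hours(typical_failure_interval, margin_hours):
    """Schedule the inspection some margin ahead of the typical failure."""
    if not 0 <= margin_hours < typical_failure_interval:
        raise ValueError("margin must be non-negative and below the interval")
    return typical_failure_interval - margin_hours


def hours_until_pm(runtime_since_last_service, interval_hours):
    """Remaining runtime before the next inspection is due (never negative)."""
    return max(interval_hours - runtime_since_last_service, 0.0)


interval = pm_interval_hours(300.0, 50.0)
print(interval)                         # 250.0
print(hours_until_pm(180.0, interval))  # 70.0
```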
3. Standardize Root Cause Analysis (RCA)
Don’t just fix and forget. When high-impact equipment fails, dig deeper with a structured root cause analysis process:
- Why did it fail?
- Has this happened before?
- What conditions contributed?
- What change would prevent recurrence?
Structured methods such as the 5 Whys or Failure Mode and Effects Analysis (FMEA) can guide this process.
This transforms reactive maintenance into systematic problem-solving and prevents repeated failures from compromising your MTBF.
Make MTBF the Foundation of Reliable Maintenance
MTBF is a real-time pulse check on your operation’s reliability. When you consistently monitor and improve it, you stop reacting to failures and start planning for optimal performance.
Yet for most teams, this is a task easier said than done. Not every team has the tools or systems in place to track failures cleanly or calculate MTBF automatically. And, without these, it’s nearly impossible to connect data to decision-making.
Without clear failure data, aligned schedules, and centralized tracking, even simple metrics like MTBF become hard to generate or trust. And even harder to act on.
Tractian’s CMMS changes that. It tracks MTBF automatically through your active work orders, keeps your failure history organized, and surfaces insights right when your team needs them. No extra spreadsheets. No manual math.
You’ll log tasks directly from the floor (even offline), use AI-generated SOPs to reduce variability, and follow structured checklists to prevent repeat errors. As a result, your MTBF goes up, not out of luck, but because of smart, evidence-based decisions.
With real-time dashboards, your team can spot performance gaps, respond to failures faster, and adjust PMs before breakdowns occur. With Tractian’s CMMS, you’re not just passively watching reliability improve. You’re actively driving it, with clear visibility and control.