MTBF and MTTR: Reduce Failures with Maintenance KPIs

Billy Cassano

Reading time: 5 min.

Most companies use traditional maintenance Key Performance Indicators (KPIs) to track maintenance management and machine performance. Two of the most common are MTBF and MTTR, or Mean Time Between Failures and Mean Time to Repair.

Industries operate at a constant rhythm, especially in today’s fast-paced manufacturing environment. That is why avoiding equipment failures and unplanned downtime has become more crucial than ever, and that is where KPIs come into play. A single hour of downtime on a production line can have serious consequences, such as delayed deliveries and significant financial losses.

Conducting detailed inspections of machine performance, availability, and reliability is crucial. This is what allows factories to operate at peak performance and implement successful predictive maintenance strategies.


KPIs are established to assist maintenance teams, indicating how frequently machine failures occur and how quickly technicians can repair them. Below, we’ll further explain these metrics and explore how to calculate MTBF and MTTR with examples.

What Is MTBF?

MTBF stands for Mean Time Between Failures, which refers to the average operating time of a piece of equipment before failure. If a machine remains operational for an extended period of time without any interruptions, MTBF stays high – so the higher the MTBF, the better.

MTBF is a KPI used for repairable systems where interventions are needed to replace components, such as changing a shaft’s bearings in a centrifugal pump.

Engineers and technicians responsible for inspections use MTBF to track equipment reliability and to detect potential defects in maintenance previously performed on the same equipment. This, in turn, leads to a more in-depth investigation into the root cause of a failure, from which the best way to fix it can be determined.

How to Calculate MTBF

MTBF is calculated using an arithmetic average. Essentially, this means taking the data for the desired period (six months, a year, etc.) and dividing the total operating time by the number of failures during that same period.

The formula to calculate MTBF would be:

$$\text{MTBF} = \frac{\text{Total available time} - \text{Lost time}}{\text{Total number of stops}}$$

or, equivalently,

$$\text{MTBF} = \frac{\text{Machine operating time}}{\text{Total number of failures}}$$

Since this KPI is also used to calculate reliability, MTBF does not take into account planned downtime during scheduled preventive maintenance. Instead, it focuses on unexpected interruptions in production and failures.
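The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not TRACTIAN’s implementation: the `Stop` record, the `mtbf_hours` function and the sample log are hypothetical, and the sketch assumes the available time passed in already excludes scheduled preventive maintenance windows, so planned stops are simply filtered out instead of being counted as failures.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    """One recorded stop for an asset (hypothetical record format)."""
    duration_hours: float
    planned: bool  # True for scheduled preventive maintenance


def mtbf_hours(available_hours: float, stops: list[Stop]) -> float:
    """MTBF = (total available time - lost time) / number of failures.

    Assumes available_hours already excludes planned maintenance windows,
    so planned stops are dropped: they count neither as failures nor as
    lost time.
    """
    failures = [s for s in stops if not s.planned]
    if not failures:
        raise ValueError("no unplanned failures in the period; MTBF is undefined")
    lost_hours = sum(s.duration_hours for s in failures)
    return (available_hours - lost_hours) / len(failures)


# Hypothetical stop log for one asset over a 100-hour window.
log = [
    Stop(duration_hours=2.0, planned=False),
    Stop(duration_hours=4.0, planned=True),  # preventive maintenance, excluded
    Stop(duration_hours=3.0, planned=False),
]
print(mtbf_hours(100.0, log))  # (100 - 5) / 2 = 47.5
```

Raising an error when there are no failures is a design choice: with zero failures MTBF has no finite value, and silently returning the whole window would overstate reliability.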

Let’s use one of TRACTIAN’s clients as an example: a hydraulic pump in a CNC machining center.

The CNC machine has an operating window of 21 hours per day from Monday to Friday, which totals 105 hours per week. Over the course of one week, the hydraulic pump stopped the cutting tool lubrication system on 3 occasions, resulting in a total of 11 hours of downtime during that week.

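This is how to calculate Mean Time Between Failures, or MTBF:

$$\text{MTBF} = \frac{105\ \text{h} - 11\ \text{h}}{3} = \frac{94\ \text{h}}{3} \approx 31.3\ \text{h}$$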

This tells us that, on average, the pump experienced issues every 31 hours during that week.
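As a quick check, here is the same arithmetic for the CNC example in Python. Only the weekly totals given in the example are used; the variable names are illustrative.

```python
# Weekly totals from the CNC hydraulic pump example.
hours_per_day = 21
operating_days = 5                                  # Monday to Friday
available_hours = hours_per_day * operating_days    # 105 h per week
unplanned_downtime_hours = 11                       # lost time from the 3 stops
failures = 3

mtbf = (available_hours - unplanned_downtime_hours) / failures
print(f"MTBF = {mtbf:.1f} hours")  # MTBF = 31.3 hours
```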

Thanks to TRACTIAN’s condition monitoring, this company was able to calculate critical equipment MTBF numbers in a simple and effective manner, while preventing manual errors and reducing the need for run-to-failure or reactive maintenance.

What Is MTTR?

MTTR stands for Mean Time to Repair, which refers to the average time it takes to fix a failure or restore the operation of a piece of equipment. Most companies aim to minimize this KPI, thereby increasing the efficiency of their maintenance processes and equipment – so the lower the MTTR, the better.

In simpler terms, MTTR measures the effectiveness of a maintenance team. However, it is easy to assume that this KPI has a single meaning. In practice, it can represent four different concepts: the “R” can stand for repair, recovery, response, or resolution. While all four overlap, each has its own specific meaning.

MTTR (Mean Time to Respond): Average response time from the moment a failure is identified to when repair starts. It is significantly affected by the availability of parts in the inventory to be used in work orders.
MTTR (Mean Time to Recovery): Average time to restore the system. This includes the entire time from when a failure alert is generated until the system or equipment is back to operating normally.
MTTR (Mean Time to Resolve): This includes the time dedicated not only to identifying the fault, diagnosing the problem, and resolving the incident, but also to guaranteeing that the same failure won’t recur.
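One way to keep these definitions apart is to derive each of them from the timestamps of the same incidents. The sketch below is hypothetical: the `Incident` fields stand in for whatever your work orders actually record, and the “resolve” interval assumes an incident is only closed once the root cause has been addressed and the fix verified. Mean Time to Repair is included for comparison as the interval between repair start and repair finish.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Timestamps for one failure (hypothetical work-order fields)."""
    alerted: datetime            # failure alert generated / failure identified
    repair_started: datetime     # technician starts the repair
    repair_finished: datetime    # repair work completed
    back_in_operation: datetime  # equipment operating normally again
    closed: datetime             # root cause addressed and fix verified


def mean_hours(intervals: list[timedelta]) -> float:
    """Average a list of intervals and return the result in hours."""
    return mean(td.total_seconds() for td in intervals) / 3600


def mttr_variants(incidents: list[Incident]) -> dict[str, float]:
    """Average each MTTR interval over all incidents, in hours."""
    return {
        "respond": mean_hours([i.repair_started - i.alerted for i in incidents]),
        "repair": mean_hours([i.repair_finished - i.repair_started for i in incidents]),
        "recovery": mean_hours([i.back_in_operation - i.alerted for i in incidents]),
        "resolve": mean_hours([i.closed - i.alerted for i in incidents]),
    }


# A single hypothetical incident with made-up timestamps.
t0 = datetime(2024, 1, 8, 9, 0)
incident = Incident(
    alerted=t0,
    repair_started=t0 + timedelta(minutes=30),
    repair_finished=t0 + timedelta(hours=2, minutes=30),
    back_in_operation=t0 + timedelta(hours=3),
    closed=t0 + timedelta(days=2),
)
print(mttr_variants([incident]))
# {'respond': 0.5, 'repair': 2.0, 'recovery': 3.0, 'resolve': 48.0}
```

Because all four figures come from the same records, comparing them shows where time is lost: a long response points to scheduling or spare parts, while a long resolution points to root-cause work.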

How to Calculate MTTR (Mean Time to Repair)

To calculate MTTR, sum up the total time spent on repairs during a specific period and divide that time by the number of repairs, or as follows:

$$\text{MTTR} = \frac{\text{Total repair time}}{\text{Total number of failures}}$$
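A minimal sketch of this formula in Python, assuming one repair per failure so that the number of repairs equals the number of failures. The `mttr_hours` function and the sample durations are hypothetical.

```python
def mttr_hours(repair_hours: list[float]) -> float:
    """MTTR = total repair time / number of repairs in the period.

    Assumes one repair per failure, so len(repair_hours) is also the
    number of failures.
    """
    if not repair_hours:
        raise ValueError("no repairs in the period; MTTR is undefined")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair durations (hours) from three work orders.
print(mttr_hours([1.5, 2.0, 4.0]))  # 7.5 / 3 = 2.5
```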

Let’s use the same previous example, the hydraulic pump of a CNC machine. The work order report showed technicians spent 11 hours that week to address the three failures.

This is how to calculate Mean Time to Repair, or MTTR:

$$\text{MTTR} = \frac{11\ \text{h}}{3} \approx 3.67\ \text{h}$$

The calculation indicates that, on average, it took the maintenance team 3 hours and 40 minutes to repair each failure.
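To express the CNC result in hours and minutes, it helps to convert to minutes before rounding: 11/3 hours is 3.67 hours, and the fractional 0.67 hour is 40 minutes. A small sketch using only the example’s totals (variable names are illustrative):

```python
total_repair_hours = 11   # from the work order report
failures = 3

# Convert to whole minutes first, then split into hours and minutes.
mttr_minutes = round(total_repair_hours * 60 / failures)  # 660 / 3 = 220
hours, minutes = divmod(mttr_minutes, 60)
print(f"MTTR = {hours} h {minutes} min")  # MTTR = 3 h 40 min
```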

What Is the Importance of MTBF and MTTR?

MTBF and MTTR originally came from the aviation industry, where system failures usually mean significant costs, not only in economic terms but also in terms of employee safety.

Both KPIs, when well managed, can offer substantial benefits, as they:

Determine both the type of failures and their impact on machine downtime.
Save time not only in detecting problems, but also in making data-driven decisions to resolve them completely.
Help set an appropriate monitoring frequency when maintenance team members cover more than one area.
Identify the equipment or components that cause the majority of the problems.
Anticipate potential failures, thereby reducing unplanned downtime.
Help uncover the root cause of problems, which can point toward a predictive maintenance strategy.
Provide a reliable guide to maintenance costs over a defined period of time.

The following graph illustrates the relationship between MTBF and MTTR:

[Figure: Difference between mean time to repair, mean time to resolve, mean time to respond, and mean time to recovery]

MTBF and MTTR: How to Track Maintenance KPIs

As covered throughout this article, MTBF and MTTR are powerful allies for optimizing time management at large corporations, anticipating failures, and reducing maintenance costs, helping operational systems work at maximum capacity.

A CMMS or EAM, such as TRACTIAN’s TracOS™, takes this advantage one step further by generating real-time reports on maintenance management.

By centralizing alerts and notifying assignees about machine failures in real time, a CMMS or EAM lets teams address these issues promptly, reducing the risk of unplanned downtime. It also calculates and updates these KPIs automatically, with accurate information about which equipment requires inspection and about failure root causes.

[Figure: Maintenance KPIs such as MTBF, MTTR, MTTA, and reliability seen on a dashboard in TRACTIAN’s maintenance management software.]

When KPIs are well implemented and managed, they become one of the greatest allies for maintenance teams.

Billy Cassano

Solutions Specialist

As a Solutions Specialist at TRACTIAN, Billy spearheads the implementation of predictive monitoring projects, ensuring maintenance teams maximize the performance of their machines. With expertise in deploying cutting-edge condition monitoring solutions and real-time analytics, he drives efficiency and reliability across industrial operations.
