
Machine Downtime Tracking: Methods and Metrics

Luke Bennett

Updated on Mar 20, 2026

9 min read

Most plants know they have a downtime problem. Fewer know exactly how big it is, where it comes from, or whether the number on the board reflects reality.

Accurate machine downtime tracking is the foundation of any serious improvement effort. Without reliable data, maintenance teams chase the wrong problems, production planners build schedules on assumptions, and OEE scores look better on paper than they do on the floor.

What Is Machine Downtime Tracking?

Machine downtime tracking is the systematic process of recording when a machine stops producing, for how long, and why. It spans both planned downtime (scheduled maintenance windows, changeovers, and tooling changes) and unplanned downtime caused by breakdowns, faults, or process failures.

Done well, downtime tracking gives maintenance and operations teams a factual basis for prioritising repairs, scheduling preventive work, and measuring the true cost of equipment reliability. It directly feeds the Availability component of Overall Equipment Effectiveness (OEE), the standard metric for production performance in manufacturing.

Why Most Downtime Data Is Unreliable

The uncomfortable reality in most plants is that downtime data is systematically incomplete. The gap between reported downtime and actual downtime is rarely small, and it skews decisions at every level.

The manual logging problem

Paper-based and spreadsheet-based logging puts the recording burden on operators who are simultaneously managing a production line. The result is predictable:

  • Short stops go unrecorded. A two-minute jam or a quick reset rarely makes it onto a log sheet. Over a shift, those events can add up to 30 or 40 minutes of untracked lost time.
  • Start and end times are estimated, not measured. Operators often fill in logs at the end of a shift from memory, which introduces rounding and compression errors.
  • Reason codes are guessed. When a machine stops unexpectedly, the operator's job is to get it running again, not to diagnose the root cause. Codes like "unknown fault" or "mechanical issue" obscure the data rather than clarify it.
  • Under-reporting is incentivised. In plants where downtime is tracked against individual operators or lines, there is social pressure to minimise what gets recorded.

Why this matters for decision-making

If your downtime data captures only 50 to 60 percent of actual events, your MTBF figures are inflated, your maintenance schedule rests on optimistic assumptions, and your improvement initiatives target the stops that are easy to see, not necessarily the ones that cost the most. Because each of these outputs feeds the next round of planning, the distortion compounds over time.
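The MTBF effect can be made concrete with a minimal Python sketch. It uses illustrative numbers, not plant data: a machine runs 100 hours and actually fails 20 times, but the log captures only a fraction of those failures. The function name `mtbf_hours` and all the figures are hypothetical.

```python
def mtbf_hours(operating_hours: float, failure_count: int) -> float:
    """MTBF = operating time / number of unplanned failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures are recorded")
    return operating_hours / failure_count


# Illustrative figures only, not measured plant data.
OPERATING_HOURS = 100.0
ACTUAL_FAILURES = 20

true_mtbf = mtbf_hours(OPERATING_HOURS, ACTUAL_FAILURES)

for capture_rate in (0.5, 0.6):
    # Only this share of the real failures ever makes it onto the log.
    logged_failures = round(ACTUAL_FAILURES * capture_rate)
    reported_mtbf = mtbf_hours(OPERATING_HOURS, logged_failures)
    print(
        f"capture {capture_rate:.0%}: reported MTBF {reported_mtbf:.1f} h, "
        f"true MTBF {true_mtbf:.1f} h ({reported_mtbf / true_mtbf:.2f}x)"
    )
```

At a 50 percent capture rate, the reported MTBF is 10.0 hours against a true 5.0 hours. The sketch holds operating time fixed. In practice, missed stops are also counted as run time, which inflates the reported figure slightly further.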

What to Track: Key Downtime Metrics

Effective downtime tracking requires more than a total minutes-lost figure. The metrics below give a complete picture of equipment reliability and maintenance performance.

Total downtime

The baseline: total minutes or hours lost per machine, line, shift, or day. Segment by planned vs. unplanned to separate maintenance execution from breakdown response.
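A minimal sketch of that segmentation in Python follows. It assumes each stop has already been recorded with a machine name, a duration in minutes, and a planned/unplanned flag. The `StopEvent` record and the machine names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StopEvent:
    machine: str    # hypothetical machine identifier
    minutes: float  # duration of the stop
    planned: bool   # True for scheduled PMs, changeovers, tooling changes


def downtime_totals(events):
    """Sum downtime minutes per machine, split into planned and unplanned."""
    totals = defaultdict(lambda: {"planned": 0.0, "unplanned": 0.0})
    for event in events:
        bucket = "planned" if event.planned else "unplanned"
        totals[event.machine][bucket] += event.minutes
    return dict(totals)


events = [
    StopEvent("press-01", 30, planned=True),    # scheduled PM
    StopEvent("press-01", 12, planned=False),   # breakdown
    StopEvent("press-01", 3, planned=False),    # short jam
    StopEvent("filler-02", 45, planned=False),  # breakdown
]

for machine, split in downtime_totals(events).items():
    print(machine, split)
```

You can group the same totals by line, shift, or day by swapping the grouping key.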

Machine availability

Availability is the ratio of actual operating time to planned production time. It is one of the three factors in OEE and the most directly affected by downtime. A machine running 7.5 hours of an 8-hour shift has 93.75 percent availability.
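The ratio is simple to compute. This is a sketch in Python under the assumption that both times are given in minutes for the same period. The function name is illustrative.

```python
def availability(planned_minutes: float, downtime_minutes: float) -> float:
    """Availability = (planned production time - downtime) / planned production time."""
    if planned_minutes <= 0:
        raise ValueError("planned production time must be positive")
    if not 0 <= downtime_minutes <= planned_minutes:
        raise ValueError("downtime must be between 0 and planned production time")
    operating_minutes = planned_minutes - downtime_minutes
    return operating_minutes / planned_minutes


# The example from the text: 7.5 hours of running in an 8-hour shift,
# i.e. 30 minutes of downtime out of 480 planned minutes.
print(f"{availability(480, 30):.2%}")  # 93.75%
```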

MTBF (Mean Time Between Failures)

Mean Time Between Failures measures the average operating time between unplanned stop events. A declining MTBF trend is an early indicator that a machine is degrading. Tracking MTBF by machine and by failure type reveals which assets are consuming the most reliability effort.
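A minimal Python sketch of MTBF by machine, and optionally by failure type. It assumes you already know how many hours each machine actually ran, and that each unplanned stop has been logged with a failure type. Per-type MTBF here divides a machine's operating hours by the count of that failure type. This is one common convention, not the only one. The machine names and figures are hypothetical.

```python
from collections import defaultdict


def mtbf(operating_hours, failures, by_type=False):
    """Mean Time Between Failures = operating hours / count of unplanned failures.

    operating_hours: dict mapping machine -> hours the machine actually ran
    failures: list of (machine, failure_type) tuples, one per unplanned stop
    by_type: if True, report MTBF per (machine, failure_type) pair
    Machines with no recorded failures are omitted (MTBF is undefined).
    """
    counts = defaultdict(int)
    for machine, failure_type in failures:
        key = (machine, failure_type) if by_type else machine
        counts[key] += 1

    result = {}
    for key, count in counts.items():
        machine = key[0] if by_type else key
        result[key] = operating_hours[machine] / count
    return result


operating_hours = {"press-01": 160.0, "filler-02": 150.0}
failures = [
    ("press-01", "bearing"),
    ("press-01", "jam"),
    ("press-01", "jam"),
    ("press-01", "jam"),
    ("filler-02", "sensor"),
]
print(mtbf(operating_hours, failures))                # per machine
print(mtbf(operating_hours, failures, by_type=True))  # per machine and failure type
```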

MTTR (Mean Time to Repair)

Mean Time to Repair measures how long it takes to restore a machine to operation after a failure. High MTTR often reflects spare parts availability issues, technician skill gaps, or inadequate troubleshooting procedures, not just the severity of the fault itself.
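A minimal Python sketch of MTTR follows. It assumes each unplanned failure has a timestamp for when the machine stopped and another for when it was restored. The timestamps are illustrative.

```python
from datetime import datetime
from statistics import mean


def mttr_minutes(repairs):
    """Mean Time to Repair: average of (restored - stopped) across unplanned failures.

    repairs: list of (stopped_at, restored_at) datetime pairs.
    Raises statistics.StatisticsError if the list is empty.
    """
    durations = [
        (restored - stopped).total_seconds() / 60 for stopped, restored in repairs
    ]
    return mean(durations)


repairs = [
    (datetime(2026, 3, 2, 8, 10), datetime(2026, 3, 2, 8, 55)),    # 45 min
    (datetime(2026, 3, 3, 14, 0), datetime(2026, 3, 3, 14, 20)),   # 20 min
    (datetime(2026, 3, 5, 22, 30), datetime(2026, 3, 5, 23, 10)),  # 40 min
]
print(mttr_minutes(repairs))  # 35.0
```

If your records also timestamp when a technician arrived or when parts were issued, splitting each interval at those points helps show whether a high MTTR comes from waiting or from the repair itself.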

Downtime by reason code

Reason codes categorise why a machine stopped. Aggregating downtime by reason code over time reveals patterns: which failure types occur most often, which ones take longest to resolve, and which are trending upward. This is the diagnostic layer that converts raw downtime minutes into actionable maintenance intelligence.
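One way to build that diagnostic layer is a Pareto-style summary. This is a minimal sketch in Python that assumes each stop event carries a reason code and a duration in minutes. The codes and figures are illustrative.

```python
from collections import Counter, defaultdict


def reason_code_summary(events):
    """Aggregate stop events by reason code, largest total downtime first.

    events: iterable of (reason_code, minutes) pairs.
    Returns rows of (code, stop_count, total_minutes, avg_minutes, cumulative_share).
    """
    minutes = defaultdict(float)
    counts = Counter()
    for code, mins in events:
        minutes[code] += mins
        counts[code] += 1

    total = sum(minutes.values())
    if total == 0:
        return []

    rows = []
    cumulative = 0.0
    for code in sorted(minutes, key=minutes.get, reverse=True):
        cumulative += minutes[code]
        rows.append(
            (code, counts[code], minutes[code], minutes[code] / counts[code], cumulative / total)
        )
    return rows


events = [
    ("Sensor fault", 40),
    ("Conveyor belt jam", 5),
    ("Conveyor belt jam", 7),
    ("Material defect", 8),
]
for code, count, total_min, avg_min, share in reason_code_summary(events):
    print(f"{code:<18} {count:>2} stops {total_min:>5.0f} min avg {avg_min:>4.1f} cum {share:.0%}")
```

The count column shows which failures occur most often, and the average column shows which take longest to resolve. Comparing the output across successive weeks or months shows which codes are trending upward.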

Unplanned vs. planned downtime ratio

The proportion of downtime that was unplanned is a direct indicator of maintenance programme maturity. A plant with a high unplanned ratio is operating reactively. As preventive and predictive programmes mature, more downtime shifts into planned windows, which are typically shorter, cheaper, and less disruptive to production.
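The ratio can be tracked period by period. This is a minimal Python sketch with illustrative quarterly totals. A falling share of unplanned downtime is the shift the paragraph describes.

```python
def unplanned_ratio(planned_minutes: float, unplanned_minutes: float) -> float:
    """Share of total downtime that was unplanned (0.0 to 1.0)."""
    total = planned_minutes + unplanned_minutes
    if total == 0:
        raise ValueError("no downtime recorded in this period")
    return unplanned_minutes / total


# Illustrative quarterly totals in minutes: (planned, unplanned).
quarters = {"Q1": (120, 480), "Q2": (200, 400), "Q3": (260, 240)}

for quarter, (planned, unplanned) in quarters.items():
    print(quarter, f"{unplanned_ratio(planned, unplanned):.0%} unplanned")
```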

Summary table

| Metric | What it measures | Why it matters |
| --- | --- | --- |
| Total downtime | Minutes lost per machine or line | Baseline for improvement |
| Availability | Operating time as a % of planned time | Core OEE input |
| MTBF | Average run time between failures | Reliability trend indicator |
| MTTR | Average time to restore after failure | Maintenance response efficiency |
| Downtime by reason code | Failure type breakdown | Root cause prioritisation |
| Unplanned vs. planned ratio | Reactive vs. proactive maintenance split | Programme maturity indicator |

Methods for Tracking Machine Downtime

There are three broad approaches to downtime tracking. Each has different data quality characteristics, implementation costs, and operator requirements.

Manual and paper-based logging

Operators record stop events on a paper sheet or spreadsheet at the machine. Reason codes are selected from a printed list or written in free text.

Advantages: zero infrastructure cost, works on any machine, no IT involvement required.

Limitations: data is incomplete, timestamps are approximate, reason codes are inconsistent, and the process depends entirely on operator compliance. Short stops are routinely missed.

CMMS and MES software

A CMMS (Computerised Maintenance Management System) or MES (Manufacturing Execution System) captures downtime as part of work order management. Technicians log a stop event when they open a work order, and the system records start and end times.

Advantages: downtime is linked directly to maintenance activity, reason codes are standardised, and the data integrates with maintenance cost and parts records.

Limitations: only captures events that generate a work order, which means minor stops and short-duration events are still missed. Data quality depends on technician discipline in opening and closing work orders accurately.

Hardware sensors and automated capture

Production monitoring sensors are installed at the machine level and detect operating state (running, idle, or stopped) based on electrical current draw or signals from the machine's PLC. Every state change is timestamped automatically, with no operator input required at the point of capture.
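The general idea behind current-based state detection can be sketched in a few lines of Python. This is a generic threshold classifier, not any vendor's actual algorithm. The amp thresholds and one-minute sampling interval are hypothetical. A real system would also debounce readings so that brief flickers around a threshold do not register as separate events.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds in amps; real values depend on the machine and
# would be set during commissioning.
STOPPED_BELOW_A = 0.5
RUNNING_AT_OR_ABOVE_A = 5.0


def classify(current_a: float) -> str:
    """Map one current reading to a machine state."""
    if current_a < STOPPED_BELOW_A:
        return "stopped"
    if current_a >= RUNNING_AT_OR_ABOVE_A:
        return "running"
    return "idle"


def state_changes(samples):
    """Return (timestamp, state) each time the classified state changes.

    samples: list of (timestamp, current_amps) in time order.
    """
    changes = []
    previous = None
    for timestamp, current_a in samples:
        state = classify(current_a)
        if state != previous:
            changes.append((timestamp, state))
            previous = state
    return changes


start = datetime(2026, 3, 2, 8, 0)
readings = [12.0, 11.5, 0.2, 0.1, 2.0, 12.3]  # one sample per minute
samples = [(start + timedelta(minutes=i), amps) for i, amps in enumerate(readings)]

for timestamp, state in state_changes(samples):
    print(timestamp.strftime("%H:%M"), state)
```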

Advantages: captures every stop event regardless of duration, timestamps are exact, no operator burden, and data is continuous and objective.

Limitations: sensors require installation and commissioning, and reason codes still require operator input after the fact to explain why a stop occurred. The hardware investment is higher than for a paper-based system.

Comparison table

| Method | Data completeness | Timestamp accuracy | Operator burden | Setup cost | Reason code quality |
| --- | --- | --- | --- | --- | --- |
| Manual/paper | Low (50-70%) | Low (estimated) | High | None | Low (inconsistent) |
| CMMS/MES | Medium (work order events only) | Medium | Medium | Medium | Medium |
| Hardware sensors | High (all stops captured) | High (automated) | Low | Medium-high | Depends on follow-up process |

How to Categorise and Code Downtime Events

Raw stop-time data becomes useful only when it is categorised consistently. A well-designed reason code system is one of the highest-leverage investments a plant can make in its data quality.

Common downtime categories

Most plants use four to six top-level categories:

  • Mechanical failure: bearing failures, seal leaks, structural damage, wear-related breakdowns
  • Electrical or controls fault: motor failures, sensor faults, PLC errors, wiring issues
  • Process or quality issue: jams, material defects, setup errors, out-of-spec production requiring a stop
  • Planned maintenance: scheduled PMs, inspections, lubrication routes, tooling changes
  • Changeover: product or format changes, cleaning between runs
  • Operator or external: material shortages, operator absence, utility interruptions

Structured reason code design

A two-level code structure (category and sub-reason) gives enough granularity for root cause analysis without overwhelming operators. For example: Category = Mechanical, Sub-reason = Conveyor belt jam. This structure allows you to aggregate across categories for trend analysis while retaining the detail needed to investigate specific events.
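The two-level structure maps naturally onto a small data model. This is a minimal Python sketch. The code list is a hypothetical example, and validating entries against it is one assumed way to keep free-text variants out of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasonCode:
    category: str
    sub_reason: str


# Hypothetical code list: a short top-level list, each with a few sub-reasons.
CODE_LIST = {
    "Mechanical": {"Conveyor belt jam", "Bearing failure", "Seal leak"},
    "Electrical or controls": {"Sensor fault", "PLC error", "Motor failure"},
    "Changeover": {"Format change", "Cleaning between runs"},
}


def make_code(category: str, sub_reason: str) -> ReasonCode:
    """Accept only codes from the list, so free-text variants cannot creep in."""
    if sub_reason not in CODE_LIST.get(category, set()):
        raise ValueError(f"unknown code: {category} / {sub_reason}")
    return ReasonCode(category, sub_reason)


def minutes_by_category(coded_events):
    """Roll sub-reasons up to their top-level category for trend analysis."""
    totals = {}
    for code, minutes in coded_events:
        totals[code.category] = totals.get(code.category, 0) + minutes
    return totals


coded_events = [
    (make_code("Mechanical", "Conveyor belt jam"), 12),
    (make_code("Mechanical", "Bearing failure"), 90),
    (make_code("Changeover", "Format change"), 25),
]
print(minutes_by_category(coded_events))
```

The sub-reason stays attached to every event. Investigating a specific category therefore only requires filtering on `code.category`, with no re-coding needed.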

Getting operators to code accurately

The most common failure point in reason code systems is inconsistent use. Operators default to generic codes when they are not certain of the cause or when the code list is too long and complex. Best practices:

  • Keep the top-level list to six or fewer categories
  • Train operators on the purpose of coding, not just the mechanics
  • Review reason code distributions weekly: a spike in "unknown" codes is a signal that either the list is unclear or operators are not confident making the call
  • Where root cause is genuinely unclear at the time of the stop, flag it for follow-up rather than forcing a guess
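The weekly review of "unknown" codes can be automated with a simple share calculation. This is a minimal Python sketch. The set of generic codes, the week labels, and the 15 percent alert threshold are all hypothetical and would need tuning to your own code list.

```python
from collections import Counter

# Hypothetical generic codes and alert threshold; tune to your own code list.
GENERIC_CODES = {"Unknown", "Unknown fault"}
ALERT_SHARE = 0.15


def generic_share_by_week(events):
    """events: iterable of (week_label, reason_code). Returns {week: share of generic codes}."""
    total = Counter()
    generic = Counter()
    for week, code in events:
        total[week] += 1
        if code in GENERIC_CODES:
            generic[week] += 1
    return {week: generic[week] / total[week] for week in sorted(total)}


def weeks_to_review(events):
    """Weeks where generic codes exceed the alert share."""
    return [week for week, share in generic_share_by_week(events).items() if share > ALERT_SHARE]


events = (
    [("2026-W10", "Sensor fault")] * 9
    + [("2026-W10", "Unknown")]
    + [("2026-W11", "Bearing failure")] * 7
    + [("2026-W11", "Unknown fault")] * 3
)
print(generic_share_by_week(events))
print(weeks_to_review(events))
```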

How Tractian Automates Machine Downtime Tracking

Tractian's Sensor + Software solution addresses the two core failure modes of manual downtime tracking: incomplete event capture and unreliable timestamps.

Automated stop detection at the machine level

Tractian's current-monitoring sensor clamps directly onto the machine's electrical supply. It detects run, idle, and stop states based on current draw, requiring no wiring, no PLC access, and no modification to the machine. Every state change is logged automatically with a precise timestamp.

This means short stops are captured alongside extended breakdowns. A 90-second jam that an operator would never log on paper appears in the data. Over a week, those micro-stops often account for more lost production than the handful of major breakdowns that everyone remembers.
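Splitting stops by duration shows how much time the short stops add up to. This is a minimal Python sketch. The 5-minute micro-stop cutoff is a hypothetical choice, and the week of stops is illustrative, not measured data.

```python
MICRO_STOP_MAX_MIN = 5.0  # hypothetical cutoff; each plant sets its own


def split_by_duration(stop_minutes):
    """Separate short stops from longer breakdowns and total each group."""
    micro = [m for m in stop_minutes if m <= MICRO_STOP_MAX_MIN]
    major = [m for m in stop_minutes if m > MICRO_STOP_MAX_MIN]
    return {
        "micro_count": len(micro),
        "micro_minutes": sum(micro),
        "major_count": len(major),
        "major_minutes": sum(major),
    }


# Illustrative week: sixty 90-second jams and two 35-minute breakdowns.
week = [1.5] * 60 + [35.0, 35.0]
print(split_by_duration(week))
```

In this illustrative week, the sixty jams total 90 minutes, while the two breakdowns everyone remembers total 70.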

Sensors complement operator knowledge rather than replace it. When the system flags a stop event, operators use the dashboard to assign a reason code and add context. The hardware provides the objective record; the operator provides the interpretation.

OmniTrac for PLC-connected machines

For machines with existing automation, Tractian's OmniTrac (PLC reader) pulls state signals and production counts directly from the PLC. This adds production cycle data to the downtime picture: not just when the machine was stopped, but how many cycles it completed and whether it was running at the target rate.
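Once cycle counts are available, checking the rate against a target reduces to counter arithmetic. This is a generic Python sketch, not OmniTrac's interface. It assumes periodic readings of a cumulative part counter, and the target rate is hypothetical. It also assumes the machine ran throughout each interval. A stop inside an interval would show up as a slow rate, which is why cycle data is combined with the stop events.

```python
# Hypothetical target rate; in practice this comes from the product's standard.
TARGET_CYCLES_PER_MIN = 2.0


def rate_check(readings):
    """Compare actual cycle rate with the target for each interval.

    readings: list of (minute, cumulative_cycle_count) pairs in time order,
    e.g. periodic reads of a PLC part counter. Assumes the counter does not
    reset within the window and that the machine was running throughout.
    Returns (start_min, end_min, cycles_per_min, fraction_of_target) per interval.
    """
    rows = []
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        rate = (c1 - c0) / (t1 - t0)
        rows.append((t0, t1, rate, rate / TARGET_CYCLES_PER_MIN))
    return rows


readings = [(0, 0), (60, 120), (120, 210), (180, 330)]
for start, end, rate, fraction in rate_check(readings):
    print(f"{start}-{end} min: {rate:.2f} cycles/min ({fraction:.0%} of target)")
```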

Real-time dashboards and OEE visibility

Both data sources feed Tractian's OEE platform, which displays Availability, Performance, and Quality by machine, line, and shift in real time. Maintenance and operations leaders can see where production is being lost as it happens, not during the next morning's review meeting.
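The three factors combine multiplicatively. This is a minimal Python sketch using the standard OEE definitions, not the platform's own implementation. The shift figures are illustrative.

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_count, good_count):
    """OEE = Availability x Performance x Quality (standard OEE definitions)."""
    run_min = planned_min - downtime_min
    availability = run_min / planned_min
    performance = (ideal_cycle_min * total_count) / run_min
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Illustrative shift: 480 planned minutes, 60 minutes of stops,
# 0.5-minute ideal cycle, 798 parts made, 780 of them good.
a, p, q, overall = oee(480, 60, 0.5, 798, 780)
print(f"A={a:.1%} P={p:.1%} Q={q:.1%} OEE={overall:.1%}")
```

Every minute of downtime that goes unrecorded raises the availability factor. That error carries straight through to the headline OEE figure.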

The downtime prevention and reporting module aggregates stop events by reason code, machine, and time period. Teams can identify which assets account for the largest share of downtime, track MTBF and MTTR trends, and measure the impact of maintenance interventions over time.

What changes when data is automated

Plants that move from manual logging to automated capture typically see two immediate effects: reported downtime increases (because they are now counting events that were previously invisible), and the reliability of the data improves enough to drive genuine prioritisation decisions. The first effect can be uncomfortable. The second is what makes continuous improvement possible.

Frequently Asked Questions

What is the difference between planned and unplanned downtime?

Planned downtime covers stops that are scheduled in advance, such as preventive maintenance, changeovers, and inspections. Unplanned downtime is any stop that was not scheduled, typically caused by equipment failure, process faults, or external disruptions. The ratio of unplanned to total downtime is one of the clearest indicators of maintenance programme maturity.

How do you calculate machine availability from downtime data?

Availability is calculated as (Planned Production Time minus Total Downtime) divided by Planned Production Time, expressed as a percentage. For example, a machine with 480 minutes of planned production time and 45 minutes of total downtime has an availability of (480 - 45) / 480, or 90.6 percent. Planned downtime is typically excluded from this calculation (it is removed from Planned Production Time rather than counted as downtime), though conventions vary by plant.

What is a good MTBF for manufacturing equipment?

MTBF benchmarks vary significantly by machine type, operating environment, and maintenance maturity. Rather than targeting a universal number, the more useful approach is to track MTBF trends over time for each individual asset. A consistent upward trend indicates that maintenance interventions are working. A declining trend is a signal to investigate before the next failure.
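A trend can be summarised as the slope of a straight line fitted to each asset's MTBF history. This is a minimal Python sketch using ordinary least squares. The monthly figures are illustrative, not benchmarks.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against period index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Illustrative monthly MTBF figures in hours, not benchmarks.
monthly_mtbf = {"press-01": [52, 55, 61, 64], "filler-02": [70, 66, 58, 51]}

for machine, series in monthly_mtbf.items():
    slope = trend_slope(series)
    if slope > 0:
        verdict = "improving"
    elif slope < 0:
        verdict = "declining, investigate"
    else:
        verdict = "flat"
    print(f"{machine}: {slope:+.1f} h per month ({verdict})")
```

On Python 3.10 or later, `statistics.linear_regression(range(n), values).slope` gives the same slope.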

Can downtime tracking software integrate with our existing CMMS?

Most modern downtime tracking platforms, including Tractian's, offer integration with standard CMMS and ERP systems. When a sensor detects a stop event that requires a repair, the system can trigger a work order automatically, linking the downtime record to the maintenance activity and cost data in the CMMS. This closes the loop between the operational record and the maintenance record.

See Every Stop Before It Becomes a Problem

Accurate downtime tracking starts with capturing every stop event, not just the ones that make it onto a log sheet. Tractian's Sensor + Software solution automates machine state detection at the electrical level, giving your team objective data on every minute of lost production, with no operator burden at the point of capture.


Luke Bennett

Applications Engineer

As an OEE Solutions Specialist at Tractian, Luke is dedicated to empowering manufacturing teams to achieve peak operational efficiency. He spearheads the implementation of cutting-edge Overall Equipment Effectiveness (OEE) projects, driving significant improvements in productivity, quality, and machine reliability across diverse industrial environments. Luke's expertise is built on more than five years of engineering experience at General Motors, Honda, and other companies, where he honed the skills he now uses to help clients maximize the performance of their machines and realize sustainable gains in their production processes.
