Most plants know they have a downtime problem. Fewer know exactly how big it is, where it comes from, or whether the number on the board reflects reality.
Accurate machine downtime tracking is the foundation of any serious improvement effort. Without reliable data, maintenance teams chase the wrong problems, production planners build schedules on assumptions, and OEE scores look better on paper than they do on the floor.
What Is Machine Downtime Tracking?
Machine downtime tracking is the systematic process of recording when a machine stops producing, for how long, and why. It spans both planned downtime (scheduled maintenance windows, changeovers, and tooling changes) and unplanned downtime caused by breakdowns, faults, or process failures.
Done well, downtime tracking gives maintenance and operations teams a factual basis for prioritising repairs, scheduling preventive work, and measuring the true cost of equipment reliability. It directly feeds the Availability component of Overall Equipment Effectiveness (OEE), the standard metric for production performance in manufacturing.
Why Most Downtime Data Is Unreliable
The uncomfortable reality in most plants is that downtime data is systematically incomplete. The gap between reported downtime and actual downtime is rarely small, and it skews decisions at every level.
The manual logging problem
Paper-based and spreadsheet-based logging puts the recording burden on operators who are simultaneously managing a production line. The result is predictable:
- Short stops go unrecorded. A two-minute jam or a quick reset rarely makes it onto a log sheet. Over a shift, those events can add up to 30 or 40 minutes of untracked lost time.
- Start and end times are estimated, not measured. Operators often fill in logs at the end of a shift from memory, which introduces rounding and compression errors.
- Reason codes are guessed. When a machine stops unexpectedly, the operator's job is to get it running again, not to diagnose the root cause. Codes like "unknown fault" or "mechanical issue" obscure the data rather than clarify it.
- Under-reporting is incentivised. In plants where downtime is tracked against individual operators or lines, there is social pressure to minimise what gets recorded.
Why this matters for decision-making
If your downtime data only captures 50 to 60 percent of actual events, your MTBF calculations are inflated, your maintenance scheduling is based on optimistic assumptions, and your improvement initiatives target the stops that are easy to see, not necessarily the ones that cost the most. The data problem compounds over time.
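Here is a small worked example with hypothetical figures. A machine logs 6,000 minutes of operating time and actually fails 10 times, but only 6 of those failures (60 percent) reach the log. For simplicity, the example holds operating time constant. In practice, unlogged stops are also counted as run time, which inflates MTBF a little further.

```python
# Hypothetical period: the machine actually fails 10 times,
# but only 6 failures (60 percent) are written on the log sheet.
operating_minutes = 6000
actual_failures = 10
logged_failures = 6

true_mtbf = operating_minutes / actual_failures      # 600 minutes
reported_mtbf = operating_minutes / logged_failures  # 1000 minutes

# The log overstates reliability by roughly two thirds.
overstatement = reported_mtbf / true_mtbf - 1
print(f"True MTBF {true_mtbf:.0f} min, reported {reported_mtbf:.0f} min, "
      f"overstated by {overstatement:.0%}")
```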
What to Track: Key Downtime Metrics
Effective downtime tracking requires more than a total minutes-lost figure. The metrics below give a complete picture of equipment reliability and maintenance performance.
Total downtime
The baseline: total minutes or hours lost per machine, line, shift, or day. Segment by planned vs. unplanned to separate maintenance execution from breakdown response.
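Below is a minimal sketch of this segmentation. It assumes each stop event is stored as a hypothetical `(machine, minutes, planned)` tuple. A real system would pull these records from its own data store.

```python
from collections import defaultdict

# Hypothetical stop events: (machine, duration in minutes, planned?)
events = [
    ("Press-01", 45, True),    # scheduled PM
    ("Press-01", 12, False),   # unplanned breakdown
    ("Press-01", 3, False),    # short jam
    ("Filler-02", 30, True),   # changeover
    ("Filler-02", 25, False),  # unplanned fault
]

def total_downtime(events):
    """Sum downtime minutes per machine, split into planned and unplanned."""
    totals = defaultdict(lambda: {"planned": 0, "unplanned": 0})
    for machine, minutes, planned in events:
        key = "planned" if planned else "unplanned"
        totals[machine][key] += minutes
    return dict(totals)

for machine, split in total_downtime(events).items():
    print(machine, split, "total:", split["planned"] + split["unplanned"])
```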
Machine availability
Availability is the ratio of actual operating time to planned production time. It is one of the three factors in OEE and the most directly affected by downtime. A machine running 7.5 hours of an 8-hour shift has 93.75 percent availability.
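The same calculation can be written as a small Python function. This sketch assumes operating time and planned production time are already measured in the same units.

```python
def availability(operating_time: float, planned_production_time: float) -> float:
    """Availability = actual operating time / planned production time."""
    if planned_production_time <= 0:
        raise ValueError("planned production time must be positive")
    return operating_time / planned_production_time

# 7.5 hours of running in an 8-hour shift.
print(f"{availability(7.5, 8.0):.2%}")  # 93.75%
```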
MTBF (Mean Time Between Failures)
Mean Time Between Failures measures the average operating time between unplanned stop events. A declining MTBF trend is an early indicator that a machine is degrading. Tracking MTBF by machine and by failure type reveals which assets are consuming the most reliability effort.
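This sketch shows one way to compute MTBF from a stop log, under two assumptions. First, stop times are recorded in minutes from the start of an observation window. Second, operating time is approximated as the window minus all recorded stop time. The `Stop` record is a hypothetical structure, not any specific product's schema.

```python
from dataclasses import dataclass

@dataclass
class Stop:
    start_min: float  # minutes from the start of the observation window
    end_min: float
    unplanned: bool

def mtbf(stops: list[Stop], window_min: float) -> float:
    """Average operating time between unplanned stops.

    Operating time is approximated as the observation window minus all
    recorded stop time; only unplanned stops count as failures.
    """
    downtime = sum(s.end_min - s.start_min for s in stops)
    failures = sum(1 for s in stops if s.unplanned)
    if failures == 0:
        raise ValueError("no unplanned stops in window; MTBF undefined")
    return (window_min - downtime) / failures

stops = [
    Stop(100, 130, unplanned=True),
    Stop(400, 460, unplanned=False),  # planned PM: not a failure
    Stop(700, 710, unplanned=True),
]
print(mtbf(stops, window_min=960))  # (960 - 100) / 2 = 430.0
```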
MTTR (Mean Time to Repair)
Mean Time to Repair measures how long it takes to restore a machine to operation after a failure. High MTTR often reflects spare parts availability issues, technician skill gaps, or inadequate troubleshooting procedures, not just the severity of the fault itself.
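MTTR itself is a simple mean over repair durations. The sketch below uses hypothetical durations and also prints the median. A single long repair, for example one delayed by a missing spare part, can pull the mean well above the typical case.

```python
from statistics import median

def mttr(repair_durations_min: list[float]) -> float:
    """Mean time to repair: total repair time / number of repairs."""
    if not repair_durations_min:
        raise ValueError("no repairs recorded; MTTR undefined")
    return sum(repair_durations_min) / len(repair_durations_min)

# Hypothetical repair durations for one machine's unplanned stops (minutes).
repairs = [30, 10, 95, 25]
print(mttr(repairs), median(repairs))  # 40.0 27.5
```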
Downtime by reason code
Reason codes categorise why a machine stopped. Aggregating downtime by reason code over time reveals patterns: which failure types occur most often, which ones take longest to resolve, and which are trending upward. This is the diagnostic layer that converts raw downtime minutes into actionable maintenance intelligence.
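Below is a minimal aggregation sketch. It assumes stops are stored as hypothetical `(reason code, minutes)` pairs. It ranks codes by total minutes and also reports the count and average duration, so frequent short stops can be told apart from rare long ones.

```python
from collections import defaultdict

# Hypothetical coded stops: (reason code, duration in minutes)
stops = [
    ("Mechanical/Bearing failure", 120),
    ("Process/Material jam", 4),
    ("Process/Material jam", 6),
    ("Electrical/Sensor fault", 35),
    ("Process/Material jam", 5),
    ("Electrical/Sensor fault", 20),
]

def by_reason(stops):
    """Aggregate minutes and event counts per reason code, largest first."""
    minutes = defaultdict(float)
    counts = defaultdict(int)
    for reason, duration in stops:
        minutes[reason] += duration
        counts[reason] += 1
    rows = [(r, minutes[r], counts[r], minutes[r] / counts[r]) for r in minutes]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for reason, total, n, avg in by_reason(stops):
    print(f"{reason:28} {total:6.0f} min  {n} stops  avg {avg:.1f} min")
```

In this illustrative data, the material jam is the most frequent code but the smallest by total minutes. The count and average columns expose exactly that distinction.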
Unplanned vs. planned downtime ratio
The proportion of downtime that was unplanned is a direct indicator of maintenance programme maturity. A plant with a high unplanned ratio is operating reactively. As preventive and predictive programmes mature, more downtime shifts into planned windows, which are typically shorter, cheaper, and less disruptive to production than breakdowns.
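The ratio itself is simple arithmetic. This sketch uses hypothetical monthly totals. It returns 0.0 when no downtime was recorded, a convention chosen here to avoid division by zero.

```python
def unplanned_ratio(planned_min: float, unplanned_min: float) -> float:
    """Share of total downtime that was unplanned (0.0 to 1.0)."""
    total = planned_min + unplanned_min
    if total == 0:
        return 0.0
    return unplanned_min / total

# Hypothetical month: 600 min of planned work, 900 min of breakdowns.
print(f"{unplanned_ratio(600, 900):.0%}")  # 60%
```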
Summary table
| Metric | What it measures | Why it matters |
|---|---|---|
| Total downtime | Minutes lost per machine or line | Baseline for improvement |
| Availability | Operating time as a % of planned time | Core OEE input |
| MTBF | Average run time between failures | Reliability trend indicator |
| MTTR | Average time to restore after failure | Maintenance response efficiency |
| Downtime by reason code | Failure type breakdown | Root cause prioritisation |
| Unplanned vs. planned ratio | Reactive vs. proactive maintenance split | Programme maturity indicator |
Methods for Tracking Machine Downtime
There are three broad approaches to downtime tracking. Each has different data quality characteristics, implementation costs, and operator requirements.
Manual and paper-based logging
Operators record stop events on a paper sheet or spreadsheet at the machine. Reason codes are selected from a printed list or written in free text.
Advantages: zero infrastructure cost, works on any machine, no IT involvement required.
Limitations: data is incomplete, timestamps are approximate, reason codes are inconsistent, and the process depends entirely on operator compliance. Short stops are routinely missed.
CMMS and MES software
A CMMS (Computerised Maintenance Management System) or MES (Manufacturing Execution System) captures downtime as part of work order management. Technicians log a stop event when they open a work order; the system records start and end times.
Advantages: downtime is linked directly to maintenance activity, reason codes are standardised, and the data integrates with maintenance cost and parts records.
Limitations: only captures events that generate a work order, which means minor stops and short-duration events are still missed. Data quality depends on technician discipline in opening and closing work orders accurately.
Hardware sensors and automated capture
Production monitoring sensors are installed at the machine level and detect operating state (running, idle, or stopped) from electrical current draw or from signals in the machine's PLC. Every state change is timestamped automatically, with no operator input required at the point of capture.
Advantages: captures every stop event regardless of duration, timestamps are exact, no operator burden, and data is continuous and objective.
Limitations: sensors require installation and commissioning, and reason codes still require operator input after the fact to explain why a stop occurred. The hardware investment is higher than for paper-based logging, which needs none.
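The state detection described in this section can be approximated with simple current thresholds. The sketch below illustrates the idea under stated assumptions and is not Tractian's or any vendor's actual algorithm. The amp thresholds, the `(timestamp_s, amps)` sample format and the state names are all hypothetical. Real thresholds depend on the machine's load profile.

```python
from typing import Iterable

# Hypothetical thresholds in amps; real values depend on the machine.
STOP_BELOW = 0.5
RUN_ABOVE = 5.0

def classify(amps: float) -> str:
    """Map a current reading to a coarse machine state."""
    if amps < STOP_BELOW:
        return "stopped"
    if amps > RUN_ABOVE:
        return "running"
    return "idle"

def state_changes(samples: Iterable[tuple[float, float]]) -> list[tuple[str, float, float]]:
    """Collapse (timestamp_s, amps) samples into (state, start_s, end_s) intervals.

    Each interval ends at the timestamp of the sample that started the next one;
    the final interval ends at the last sample's timestamp.
    """
    intervals: list[tuple[str, float, float]] = []
    current, start, last_t = None, None, None
    for t, amps in samples:
        state = classify(amps)
        if state != current:
            if current is not None:
                intervals.append((current, start, t))
            current, start = state, t
        last_t = t
    if current is not None:
        intervals.append((current, start, last_t))
    return intervals

samples = [(0, 12.1), (10, 11.8), (20, 0.2), (30, 0.1), (40, 2.3), (50, 12.0), (60, 12.2)]
for state, start, end in state_changes(samples):
    print(f"{state:8} {start:>4}s -> {end:>4}s")
```

A production system would also debounce brief fluctuations, so that a single noisy reading does not register as a stop.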
Comparison table
| Method | Data completeness | Timestamp accuracy | Operator burden | Setup cost | Reason code quality |
|---|---|---|---|---|---|
| Manual/paper | Low (50-70%) | Low (estimated) | High | None | Low (inconsistent) |
| CMMS/MES | Medium (work order events only) | Medium | Medium | Medium | Medium |
| Hardware sensors | High (all stops captured) | High (automated) | Low | Medium-high | Depends on follow-up process |
How to Categorise and Code Downtime Events
Raw stop-time data becomes useful only when it is categorised consistently. A well-designed reason code system is one of the highest-leverage investments a plant can make in its data quality.
Common downtime categories
Most plants use four to six top-level categories:
- Mechanical failure: bearing failures, seal leaks, structural damage, wear-related breakdowns
- Electrical or controls fault: motor failures, sensor faults, PLC errors, wiring issues
- Process or quality issue: jams, material defects, setup errors, out-of-spec production requiring a stop
- Planned maintenance: scheduled PMs, inspections, lubrication routes, tooling changes
- Changeover: product or format changes, cleaning between runs
- Operator or external: material shortages, operator absence, utility interruptions
Structured reason code design
A two-level code structure (category and sub-reason) gives enough granularity for root cause analysis without overwhelming operators. For example: Category = Process or quality issue, Sub-reason = Conveyor belt jam. This structure lets you aggregate across categories for trend analysis while retaining the detail needed to investigate specific events.
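A two-level code can be modelled as a category enum plus a free-text sub-reason. This sketch uses the six categories listed above. The `ReasonCode` type and the sample stops are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    MECHANICAL = "Mechanical failure"
    ELECTRICAL = "Electrical or controls fault"
    PROCESS = "Process or quality issue"
    PLANNED_MAINTENANCE = "Planned maintenance"
    CHANGEOVER = "Changeover"
    OPERATOR_EXTERNAL = "Operator or external"

@dataclass(frozen=True)  # frozen makes codes hashable, so they can key a Counter
class ReasonCode:
    category: Category
    sub_reason: str

# Hypothetical coded stops with durations in minutes.
coded = [
    (ReasonCode(Category.PROCESS, "Conveyor belt jam"), 4),
    (ReasonCode(Category.MECHANICAL, "Bearing failure"), 90),
    (ReasonCode(Category.PROCESS, "Conveyor belt jam"), 6),
    (ReasonCode(Category.PROCESS, "Setup error"), 15),
]

# Level 1: roll up by category for trend analysis.
by_category = Counter()
for code, minutes in coded:
    by_category[code.category] += minutes

# Level 2: keep sub-reason detail for investigating specific events.
by_sub_reason = Counter()
for code, minutes in coded:
    by_sub_reason[code] += minutes

print({c.value: m for c, m in by_category.items()})
print({f"{c.category.value} / {c.sub_reason}": m for c, m in by_sub_reason.items()})
```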
Getting operators to code accurately
The most common failure point in reason code systems is inconsistent use. Operators default to generic codes when they are not certain of the cause or when the code list is too long and complex. Best practices:
- Keep the top-level list to six or fewer categories
- Train operators on the purpose of coding, not just the mechanics
- Review reason code distributions weekly: a spike in "unknown" codes is a signal that either the list is unclear or operators are not confident making the call
- Where root cause is genuinely unclear at the time of the stop, flag it for follow-up rather than forcing a guess
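The weekly review can be partly automated. In this sketch, a week is flagged when its share of "Unknown" codes exceeds 1.5 times the average of earlier weeks. That factor and the sample data are assumptions for illustration, not an industry standard.

```python
def unknown_share(codes: list[str], unknown_label: str = "Unknown") -> float:
    """Fraction of a week's stop events coded with the unknown label."""
    if not codes:
        return 0.0
    return sum(1 for c in codes if c == unknown_label) / len(codes)

def flag_spike(weekly_shares: list[float], factor: float = 1.5) -> bool:
    """True if the latest week's share exceeds factor x the mean of earlier weeks."""
    if len(weekly_shares) < 2:
        return False  # no history to compare against
    *history, latest = weekly_shares
    baseline = sum(history) / len(history)
    return latest > factor * baseline

this_week = ["Mechanical", "Unknown", "Process", "Unknown", "Unknown", "Changeover"]
shares = [0.10, 0.12, 0.08, unknown_share(this_week)]
print(f"Unknown share this week: {shares[-1]:.0%}, spike: {flag_spike(shares)}")
```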
How Tractian Automates Machine Downtime Tracking
Tractian's Sensor + Software solution addresses the two core failure modes of manual downtime tracking: incomplete event capture and unreliable timestamps.
Automated stop detection at the machine level
Tractian's current monitoring sensor clamps directly onto the machine's electrical supply. It detects run, idle, and stop states based on current draw, requiring no wiring, no PLC access, and no modification to the machine. Every state change is logged automatically with a precise timestamp.
This means short stops are captured alongside extended breakdowns. A 90-second jam that an operator would never log on paper appears in the data. Over a week, those micro-stops can account for more lost production than the handful of major breakdowns that everyone remembers.
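A simple duration split makes this comparison visible in any stop log. The 5-minute micro-stop cutoff and the week of durations below are illustrative assumptions, not measured data. Plants define the cutoff differently.

```python
# Illustrative week of stop durations (minutes), not measured data.
durations = [1.5] * 120 + [3] * 40 + [95, 140]

MICRO_STOP_MAX_MIN = 5  # assumed cutoff; plants define this differently

micro = [d for d in durations if d <= MICRO_STOP_MAX_MIN]
major = [d for d in durations if d > MICRO_STOP_MAX_MIN]

print(f"{len(micro)} micro-stops: {sum(micro):.0f} min lost")
print(f"{len(major)} major stops: {sum(major):.0f} min lost")
```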
Sensors complement operator knowledge rather than replace it. When the system flags a stop event, operators use the dashboard to assign a reason code and add context. The hardware provides the objective record; the operator provides the interpretation.
OmniTrac for PLC-connected machines
For machines with existing automation, Tractian's OmniTrac (PLC reader) pulls state signals and production counts directly from the PLC. This adds production cycle data to the downtime picture: not just when the machine was stopped, but how many cycles it completed and whether it was running at the target rate.
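Cycle counts turn downtime data into the Performance factor of OEE. Performance is conventionally calculated as ideal cycle time multiplied by total count, divided by run time. The sketch below uses hypothetical PLC figures, and the ideal cycle time is an assumed target rate.

```python
def performance(total_count: int, ideal_cycle_time_s: float, run_time_s: float) -> float:
    """OEE Performance: ideal time for the parts produced / actual run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return (ideal_cycle_time_s * total_count) / run_time_s

# Hypothetical PLC readout: 1,500 cycles in 7 h of run time, 15 s ideal cycle.
print(f"{performance(1500, 15.0, 7 * 3600):.1%}")  # 89.3%
```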
Real-time dashboards and OEE visibility
Both data sources feed Tractian's OEE platform, which displays Availability, Performance, and Quality by machine, line, and shift in real time. Maintenance and operations leaders can see where production is being lost as it happens, not during the next morning's review meeting.
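OEE combines the three factors by multiplication. This minimal sketch uses hypothetical shift values and checks that each factor is a fraction between 0 and 1.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as the product of its three factors."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality

# Hypothetical shift: 90% availability, 95% performance, 98% quality.
print(f"OEE = {oee(0.90, 0.95, 0.98):.1%}")  # OEE = 83.8%
```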
The downtime prevention and reporting module aggregates stop events by reason code, machine, and time period. Teams can identify which assets account for the largest share of downtime, track MTBF and MTTR trends, and measure the impact of maintenance interventions over time.
What changes when data is automated
Plants that move from manual logging to automated capture typically see two immediate effects: reported downtime increases (because they are now counting events that were previously invisible), and the reliability of the data improves enough to drive genuine prioritisation decisions. The first effect can be uncomfortable. The second is what makes continuous improvement possible.
Frequently Asked Questions
What is the difference between planned and unplanned downtime?
Planned downtime covers stops that are scheduled in advance, such as preventive maintenance, changeovers, and inspections. Unplanned downtime is any stop that was not scheduled, typically caused by equipment failure, process faults, or external disruptions. The ratio of unplanned to total downtime is one of the clearest indicators of maintenance programme maturity.
How do you calculate machine availability from downtime data?
Availability is calculated as (Planned Production Time minus Total Downtime) divided by Planned Production Time, expressed as a percentage. For example, a machine with 480 minutes of planned production time and 45 minutes of total downtime has an availability of 435 / 480 = 90.6 percent. Scheduled stops such as planned maintenance windows are usually removed from planned production time rather than counted as downtime, though conventions vary by plant.
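The sketch below shows both conventions, using hypothetical names. The first call reproduces the 480-minute example. The second and third take the same 45 minutes of unplanned downtime and add a 30-minute planned maintenance window (a hypothetical addition). The second removes that window from planned production time; the third counts it as downtime.

```python
def availability(shift_min: float, planned_stop_min: float,
                 unplanned_min: float, count_planned_as_downtime: bool = False) -> float:
    """Availability under two common conventions for planned stops.

    Default: planned stops are removed from planned production time.
    Alternative: planned stops stay in the denominator and count as downtime.
    """
    if count_planned_as_downtime:
        planned_production = shift_min
        downtime = planned_stop_min + unplanned_min
    else:
        planned_production = shift_min - planned_stop_min
        downtime = unplanned_min
    if planned_production <= 0:
        raise ValueError("no planned production time")
    return (planned_production - downtime) / planned_production

print(f"{availability(480, 0, 45):.1%}")                                   # 90.6%
print(f"{availability(510, 30, 45):.1%}")                                  # 90.6%
print(f"{availability(510, 30, 45, count_planned_as_downtime=True):.1%}")  # 85.3%
```

The gap between the last two lines is why availability figures should only be compared between plants or lines that use the same convention.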
What is a good MTBF for manufacturing equipment?
MTBF benchmarks vary significantly by machine type, operating environment, and maintenance maturity. Rather than targeting a universal number, the more useful approach is to track MTBF trends over time for each individual asset. A consistent upward trend indicates that maintenance interventions are working. A declining trend is a signal to investigate before the next failure.
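Trend direction can be quantified with an ordinary least-squares slope over each asset's periodic MTBF values. This sketch uses hypothetical monthly figures and a plain-Python slope calculation.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (units per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hypothetical monthly MTBF (hours) for one asset.
monthly_mtbf = [410, 395, 370, 360, 330, 318]
slope = trend_slope(monthly_mtbf)
print(f"MTBF trend: {slope:+.1f} h/month")
if slope < 0:
    print("Declining MTBF: investigate before the next failure.")
```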
Can downtime tracking software integrate with our existing CMMS?
Most modern downtime tracking platforms, including Tractian's, offer integration with standard CMMS and ERP systems. When a sensor detects a stop event that requires a repair, the system can trigger a work order automatically, linking the downtime record to the maintenance activity and cost data in the CMMS. This closes the loop between the operational record and the maintenance record.
See Every Stop Before It Becomes a Problem
Accurate downtime tracking starts with capturing every stop event, not just the ones that make it onto a log sheet. Tractian's Sensor + Software solution automates machine state detection at the electrical level, giving your team objective data on every minute of lost production, with no operator burden at the point of capture.


