You can decompose the OEE number. You can separate availability loss from performance and quality. You can identify the Tier 1 bottleneck asset generating the most equipment-initiated stops. Then you arrive at the part of the analysis where the data runs out: you know what failed, and roughly when, but you do not know how the degradation progressed before the event. Without that pre-failure record, the root cause analysis is reconstructing a sequence from partial evidence. The FMEA update is based on what was found at inspection, not on the degradation trajectory that led there. The CI kaizen on availability cannot quantitatively separate equipment-driven from process-driven stops.

This guide covers the three engineering challenges that consistently limit OEE improvement work when asset health data is missing, and what changes when continuous condition monitoring fills the data gap.

In this guide

The Data Gap That Limits Availability RCA
Challenge 1: Incomplete RCA on Equipment-Driven Availability Events
Challenge 2: CI Kaizen Projects Without Quantitative Loss Attribution
Challenge 3: FMEA Updates Without Pre-Failure Data
What Changes With Continuous Condition Monitoring
The Asset Selection Decision for OEE Improvement
Continuous vs. Periodic Measurement: Why Route-Based Approaches Miss Transient Events

What Most Manufacturing Engineers Get Wrong About Asset Health and OEE

Treating condition monitoring as a maintenance tool rather than an OEE tool. The framing in most plants positions condition monitoring as a maintenance department capability: the maintenance team uses it to catch bearing failures and schedule repairs. That framing is technically correct but misses half the engineering value. For a manufacturing engineer, condition monitoring provides the asset health dimension in the OEE availability analysis that calendar-based PM data cannot supply.

Accepting reconstructed RCA as complete root cause analysis. A 5-Why or fishbone completed without the pre-failure asset health record is working from the failure outcome, not from the failure sequence. The proximate cause (bearing failed, motor overheated) is typically identified correctly. The contributing causes rooted in the degradation progression (how long it developed, under what operating conditions, whether it was detectable before the event) require the monitoring data record to validate.

Running availability improvement kaizens without baseline asset health data. A kaizen that defines the current state using downtime logs alone cannot distinguish equipment-initiated from process-initiated availability events unless the downtime log already includes validated root cause categorization. Most do not. Starting a kaizen on availability without knowing what percentage of stops are equipment-initiated versus process-initiated is working on an uncharacterized problem.

Using periodic vibration routes as the condition monitoring baseline. A quarterly or monthly route captures the asset state at the moment of measurement, typically under planned conditions. Transient events between route visits are invisible. An asset with quarterly routes and a failure mode that progresses to failure in six to eight weeks can fail in the interval between two consecutive route visits without ever generating an out-of-spec reading. Continuous monitoring is not an enhanced version of periodic routes; it captures a fundamentally different data set.

Treating FMEA as a theoretical document rather than a living record. An FMEA built from engineering judgment and historical analogy has assumed failure modes, assumed detection intervals, and assumed RPN rankings. Every failure event on a monitored asset is an opportunity to validate or update those assumptions with actual degradation data. Without condition monitoring, that validation loop does not close.

The Data Gap That Limits Availability RCA

The standard availability analysis for a manufacturing engineer works from two sources: the downtime log (what stopped, for how long, coded by fault category) and the maintenance work order record (what was found at inspection, what was replaced, how long the repair took).

Both sources describe the failure outcome. Neither describes the failure sequence.

The failure sequence is the degradation timeline from the first detectable anomaly to the event that stopped the line. It includes:

When the first detectable precursor appeared (vibration amplitude change, thermal elevation, bearing fault frequency emergence)
How fast the degradation progressed under the operating conditions at the time
Whether operating load changes, process parameter variations, or lubrication intervals influenced the progression rate
What the earliest intervention point was at which the failure could have been prevented with a planned repair

Without condition monitoring, this sequence is unavailable. The failure happened, the line stopped, maintenance investigated and repaired. The pre-failure state is unrecorded. The RCA works from the outcome.

With continuous monitoring, the pre-failure record exists. The full degradation timeline is in the data. The RCA can work from the sequence rather than from the outcome alone. That is the analytical difference that separates reconstructed root cause analysis from validated root cause analysis.

Challenge 1: Incomplete RCA on Equipment-Driven Availability Events

A manufacturing engineer leading an RCA on an equipment-initiated availability event has several standard tools: 5-Why, fishbone (Ishikawa) diagram, fault tree analysis. Each requires a well-characterized event sequence to generate valid root cause conclusions.

The limitation without pre-failure monitoring data is consistent across all three methods: the analysis starts at the failure event, not at the point of first detectable degradation. The 5-Why working backward from "stamping press main motor stopped" can reach "bearing failure" and "bearing failure caused by contamination and insufficient lubrication interval." That is the proximate cause chain.

What it cannot reach without the monitoring record: whether the contamination and lubrication interval were the actual primary cause, or whether an operating load change in the weeks before the event accelerated a pre-existing bearing wear condition that would have been detectable three weeks earlier in the vibration spectrum. Those are two different root causes. The first suggests a contamination control and PM interval fix. The second suggests a load change evaluation and condition-based intervention protocol. They require different corrective actions and produce different FMEA updates.

The asset health timeline from continuous monitoring provides the additional node in the 5-Why that validates which branch is the actual root cause path.

What the RCA gains from monitoring data:

The first detectable precursor and its timestamp (anchors the failure sequence to actual asset behavior)
The degradation rate from first precursor to failure (validates or challenges the assumed progression rate in the FMEA)
Correlation between monitoring parameter changes and process or operating condition changes (enables distinguishing internal failure mode from externally-driven acceleration)
The detection interval at the alert threshold used (confirms whether the current alerting configuration provides sufficient lead time for the next changeover window)
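
As a rough illustration of the first two items, the sketch below pulls a first-precursor timestamp and an average degradation rate from a trend export. Everything in it is an assumption made for the example: the readings are a hypothetical list of (timestamp, vibration amplitude) pairs, "normal" is defined as the mean plus three standard deviations of an initial healthy period, and a precursor counts only if three consecutive readings exceed that limit. It is not a description of how any particular monitoring platform detects anomalies.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def first_precursor(readings, baseline_count, sigmas=3.0, persist=3):
    """Return the timestamp that starts the first run of `persist`
    consecutive readings above baseline mean + sigmas * stdev."""
    baseline = [value for _, value in readings[:baseline_count]]
    limit = mean(baseline) + sigmas * stdev(baseline)
    run = 0
    for i in range(baseline_count, len(readings)):
        run = run + 1 if readings[i][1] > limit else 0
        if run == persist:
            return readings[i - persist + 1][0]
    return None  # no sustained deviation found


def degradation_rate(readings, start, end):
    """Average amplitude change per day between two timestamps."""
    window = [(t, v) for t, v in readings if start <= t <= end]
    (t0, v0), (t1, v1) = window[0], window[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (v1 - v0) / days


# Illustrative data only: ten healthy daily readings, then a steady rise.
start_day = datetime(2024, 1, 1)
values = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 1.9, 2.1, 2.0, 2.0,
          2.1, 2.6, 3.0, 3.4, 3.8, 4.2]
readings = [(start_day + timedelta(days=i), v) for i, v in enumerate(values)]

onset = first_precursor(readings, baseline_count=10)
failure = readings[-1][0]  # treat the last reading as the failure event
rate = degradation_rate(readings, onset, failure)
print(onset.date(), round(rate, 2))
```

The persistence requirement is a design choice to keep a single noisy reading from being logged as the start of the failure sequence; the right threshold and run length depend on the asset and the sampling rate.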

Challenge 2: CI Kaizen Projects Without Quantitative Loss Attribution

A CI kaizen on OEE availability requires a defined current state and a measurable target condition. For the current state to be well-defined on the availability component, the loss attribution question must be answered: of the total availability loss on this line or asset, how much is equipment-initiated, and how much is process-initiated, changeover, or scheduling-driven?

Without asset health data, this question is answered from the downtime log root cause codes. The problem is that downtime log coding is subjective. An operator who records "machine fault" for a stop that was actually a process-initiated feed jam has misclassified the event. A maintenance technician who codes "bearing failure" without distinguishing a sudden failure from a developing degradation has recorded the outcome but not the mechanism.

In plants without continuous monitoring, downtime log accuracy typically degrades over time as the workforce learns that vague codes avoid follow-up questions. The kaizen team that uses this data as the current state baseline is working from a systematically noisy input.

What condition monitoring adds to the kaizen current state analysis:

Asset health correlated with stop events. For each equipment-initiated stop in the monitoring period, the asset health record confirms whether the failure had a detectable precursor. Events with precursors were developing faults that early intervention could have prevented. Events without precursors were sudden failures with no practical early intervention window. These require different corrective actions.

Asset health independent of stop events. The monitoring data shows asset condition continuously, including periods when no stop occurred. This enables detection of developing conditions that have not yet generated an availability event: an asset trending toward early-stage bearing wear that has not yet caused a stop is visible in the monitoring data but invisible in the downtime log. The kaizen can address it before it becomes an availability loss.

Quantitative separation of loss types. Equipment-initiated stops with monitoring precursors are addressable through predictive intervention. Equipment-initiated sudden failures require a different approach (FMEA review, root cause validation). Process-initiated stops require process engineering intervention. The monitoring data provides the quantitative basis for allocating kaizen effort across these categories.
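
A minimal sketch of that allocation, assuming each stop has already been tagged with a hypothetical `initiator` field and a `had_precursor` flag taken from the monitoring record. The field names and the sample figures are invented for illustration.

```python
from collections import Counter


def attribute_losses(stops):
    """Sum downtime minutes per loss category and return each share in percent."""
    minutes = Counter()
    for stop in stops:
        if stop["initiator"] == "process":
            category = "process-initiated"
        elif stop["had_precursor"]:
            category = "equipment, developing fault"
        else:
            category = "equipment, sudden failure"
        minutes[category] += stop["minutes"]
    total = sum(minutes.values())
    return {cat: round(100 * m / total, 1) for cat, m in minutes.items()}


# Illustrative stop records for one line over the baseline period.
stops = [
    {"initiator": "equipment", "had_precursor": True, "minutes": 240},
    {"initiator": "equipment", "had_precursor": False, "minutes": 60},
    {"initiator": "process", "had_precursor": False, "minutes": 90},
    {"initiator": "equipment", "had_precursor": True, "minutes": 110},
]

shares = attribute_losses(stops)
for category, share in shares.items():
    print(f"{category}: {share}%")
```

With these sample figures, 70% of the lost minutes fall in the category that predictive intervention can address, which is the kind of number a kaizen charter can state as its current condition.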

Challenge 3: FMEA Updates Without Pre-Failure Data

FMEA (Failure Mode and Effects Analysis) documents the assumed failure modes, effects, causes, and detection methods for each critical asset. The RPN (Risk Priority Number) rankings that come from FMEA drive maintenance interval decisions, inspection protocols, and spare parts stocking.

An FMEA is only as accurate as the failure history and operating experience it was built from. In most discrete manufacturing environments, FMEAs are initially built from engineering judgment, OEM documentation, and industry analogy. They are updated after significant failure events. Between events, the FMEA assumptions may diverge from actual asset behavior as operating loads change, process modifications are made, and the asset fleet ages.

The specific FMEA limitation without pre-failure monitoring data: the detection ranking (D) in the RPN calculation assumes a detection method and detection interval. For an asset on periodic vibration routes, the assumed detection method is the route, and the assumed detection interval is something shorter than the route interval. For a quarterly route, the assumed interval might be one to two months.

If the actual failure mode on that asset progresses from undetectable to failure-critical in three to four weeks, the quarterly route provides little practical detection capability: most failures of that type will develop and complete between two visits. The FMEA detection ranking is wrong. The RPN is miscalculated. The maintenance interval derived from it is not calibrated to the actual failure progression rate.
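
To make the effect on the RPN concrete, here is a small sketch using the conventional product of severity, occurrence, and detection rankings on 1 to 10 scales. The specific rankings are invented for illustration; a real FMEA would take them from the plant's own ranking tables.

```python
def rpn(severity, occurrence, detection):
    """Conventional Risk Priority Number: S x O x D, each ranked 1-10."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("rankings are expected on a 1-10 scale")
    return severity * occurrence * detection


# Assumed case: the quarterly route is credited with good detection (D = 3).
assumed = rpn(severity=8, occurrence=4, detection=3)

# Revised case: the route rarely catches a fast-progressing fault (D = 8).
revised = rpn(severity=8, occurrence=4, detection=8)

print(assumed, revised)
```

Severity and occurrence are unchanged, yet the revised detection ranking alone moves the failure mode from an RPN of 96 to 256, which can be enough to reorder the priority list.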

Continuous monitoring provides the actual detection interval for each failure mode on each monitored asset: the time between first detectable precursor and failure. That measured interval is the data that should populate the FMEA detection ranking, replacing the assumed value with an empirically validated one. Over time, a fleet of monitored assets generates a calibration dataset for FMEA across every failure mode that has appeared in the monitoring period.

Practical FMEA update protocol using monitoring data:

For each failure event on a monitored asset, extract three values from the monitoring record:

First detectable anomaly timestamp: the point at which any monitoring parameter first deviated from the established normal range.
Alert threshold crossing timestamp: the point at which the deviation reached the configured alert level.
Failure event timestamp: the point at which the asset stopped or was removed from service.

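The interval between the first detectable anomaly and the failure event is the actual detection window for that failure mode under the operating conditions present at the time. The interval between the alert threshold crossing and the failure event is the lead time the current alert configuration actually provided. Compare the detection window to the FMEA assumed detection interval. If the actual interval is shorter than assumed, the FMEA detection ranking is optimistic and the maintenance interval needs to be reduced for that failure mode. If it is longer, the FMEA is conservative and may be revised accordingly.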
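
The protocol above can be sketched in a few lines. The function names, the example dates, and the 45-day assumed interval are all hypothetical. The code returns both the detection window (first anomaly to failure) and the alert lead time (alert crossing to failure), then compares the window with the FMEA assumption.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def detection_windows(first_anomaly, alert_crossing, failure):
    """Return (detection window, alert lead time), both in days."""
    if not first_anomaly <= alert_crossing <= failure:
        raise ValueError("timestamps must be in chronological order")
    window = (failure - first_anomaly).total_seconds() / SECONDS_PER_DAY
    lead = (failure - alert_crossing).total_seconds() / SECONDS_PER_DAY
    return window, lead


def review_detection(actual_days, assumed_days):
    """Compare the measured window with the interval assumed in the FMEA."""
    if actual_days < assumed_days:
        return "optimistic: tighten detection ranking and interval"
    if actual_days > assumed_days:
        return "conservative: candidate for revision"
    return "consistent with assumption"


# Illustrative event: anomaly on 1 March, alert on 12 March, failure on 25 March.
window, lead = detection_windows(
    datetime(2024, 3, 1), datetime(2024, 3, 12), datetime(2024, 3, 25)
)
verdict = review_detection(window, assumed_days=45)
print(window, lead, verdict)
```

A single event is one data point, not a new ranking. The comparison becomes more trustworthy as several events for the same failure mode accumulate.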

What Changes With Continuous Condition Monitoring

The three engineering challenges above share a common root: the availability analysis is working from outcome data only. Condition monitoring adds the input data: the degradation sequence that preceded each outcome.

This changes the engineering toolkit in three specific ways:

RCA becomes forensic rather than inferential. The 5-Why team working with a complete degradation timeline is validating a hypothesis against a data record rather than inferring a cause from limited physical evidence. The conclusions are more defensible and the corrective actions are better targeted.

CI kaizens can define the current state quantitatively. The baseline asset health data enables quantitative separation of equipment-driven from process-driven availability loss before the kaizen begins. The improvement target is anchored in a characterized current state, not in a downtime log whose accuracy depends on operator coding consistency.

FMEA is updated from measured evidence, not assumed values. Each failure event on a monitored asset provides a data point for FMEA calibration. Over two to three years of monitoring, the FMEA for each critical asset shifts from an engineering judgment document to an empirically validated one. Maintenance intervals, inspection frequencies, and spare parts stocking decisions derived from that FMEA are grounded in actual asset behavior.

The Asset Selection Decision for OEE Improvement

Continuous monitoring on every asset in a plant is not the starting point. The highest-leverage starting configuration targets the assets whose failure causes the most OEE availability loss.

For a manufacturing engineer leading an OEE improvement project, the asset selection logic follows directly from the OEE decomposition:

Identify the lines with the highest availability loss component in OEE.
On those lines, identify the Tier 1 assets whose failure stops the line (rather than Tier 2 assets with backup capacity or fast replacement).
Within that Tier 1 set, rank by historical availability loss impact: for each asset, total downtime hours over the last 12 months multiplied by production value per hour.
Start monitoring on the top three to five assets by that ranking.
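
The ranking step can be sketched as below. The asset names, downtime hours, and hourly production values are invented for illustration; in practice they would come from the downtime log and the plant's costing data.

```python
def rank_assets(assets, top_n=5):
    """Rank assets by downtime hours x production value per hour, highest first."""
    scored = [
        (asset["name"], asset["downtime_hours"] * asset["value_per_hour"])
        for asset in assets
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Illustrative Tier 1 assets with 12 months of downtime history.
tier1_assets = [
    {"name": "Scrap conveyor", "downtime_hours": 80, "value_per_hour": 1500},
    {"name": "Press main drive motor", "downtime_hours": 42, "value_per_hour": 18000},
    {"name": "Hydraulic power unit", "downtime_hours": 55, "value_per_hour": 6000},
    {"name": "Transfer system", "downtime_hours": 30, "value_per_hour": 18000},
]

shortlist = rank_assets(tier1_assets, top_n=3)
for name, impact in shortlist:
    print(f"{name}: {impact:,}")
```

In this made-up data the asset with the most downtime hours (the scrap conveyor) does not make the shortlist, because its hourly production value is low. Ranking on the product of hours and value, rather than on hours alone, is what keeps the monitoring budget on the assets that drive the loss.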

The monitoring data from those assets provides the RCA input, the kaizen current state data, and the FMEA calibration base for the highest-impact availability loss category in the OEE analysis. Expand coverage as the analytical framework is validated and the improvement results are confirmed.

For a Tier 1 auto parts stamping plant, this means starting with the stamping press main drive motor and transfer system. For an appliance plant, the main assembly conveyor drive. For a CNC machining cell, the spindle motor and primary axis drives. In every case, the starting point is the asset whose failure has the highest historical contribution to the availability loss component of OEE.

Continuous vs. Periodic Measurement: Why Route-Based Approaches Miss Transient Events

A periodic vibration route provides one data point per visit. The measurement reflects asset condition at the specific moment the route technician arrives, under the operating conditions present at that moment: typically planned production conditions, mid-shift, at a steady operating load.

Several classes of failure mode develop and manifest in operating conditions that a periodic route cannot capture:

Startup and shutdown transients. Some failure modes in rotating equipment produce the most diagnostic vibration signal during startup or shutdown ramps, when the operating speed crosses through a resonance or when load changes rapidly. A route measurement taken mid-shift at steady state may show a normal signature on an asset that produces a clear fault signature during the startup cycle occurring four times per day.

Load-dependent failure modes. Bearing faults that are subtle at 60% load can become clearly detectable at 100% load. A route conducted at reduced production load during a shift changeover may miss the fault that is visible during peak production hours. Continuous monitoring captures the full load profile including peak-load operation.

Intermittent process-condition faults. Some availability events are caused by intermittent process conditions that coincide with specific production configurations: a material feed characteristic that causes resonance in the conveyor system only during high-density material runs, a lubrication delivery fault that manifests under specific orientation or load cycles. These are invisible to periodic routes unless the route happens to be conducted during the specific condition. Continuous monitoring captures the event whenever it occurs.

Sub-interval degradation rates. A bearing fault that progresses from early-stage to failure-critical in three to five weeks is not reliably caught by monthly routes. The route interval is close enough to the progression time that one visit may show a normal signature and the next may arrive after the failure. Continuous monitoring would typically detect the progression within the first week of development.

The practical consequence for FMEA: periodic route data cannot reliably populate the detection interval field for failure modes with sub-interval progression rates. FMEA detection rankings built from route-based programs are optimistic for fast-progressing failure modes. Continuous monitoring provides the data to identify which failure modes progress faster than the route interval and prioritize those for continuous coverage.
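
A back-of-the-envelope way to see why sub-interval progression matters: if route visits are evenly spaced, the fault onset is unrelated to the route schedule, and the fault is detectable at any point in its progression window, then the chance that at least one visit lands inside the window is roughly the window length divided by the route interval, capped at 1. The sketch below applies that simplified model. It is optimistic, because it ignores the time needed to plan a repair after detection and assumes the fault is visible from the first day.

```python
def route_catch_probability(window_days, route_interval_days):
    """Chance that at least one evenly spaced route visit falls inside the
    detectable window, assuming onset timing is independent of the schedule."""
    return min(1.0, window_days / route_interval_days)


# Progression windows of three and five weeks against two route intervals.
for label, interval in (("monthly", 30), ("quarterly", 91)):
    for window in (21, 35):
        p = route_catch_probability(window, interval)
        print(f"{label:9s} route, {window}-day window: {p:.0%}")
```

Under these assumptions a three-week failure mode has about a 70% chance of being seen by a 30-day route and about a 23% chance of being seen by a 91-day route, before allowing any time to act on the finding.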

The Hidden Factory: Invisible Downtime You Can't See or Fix

Every discrete manufacturing plant has a hidden factory: the accumulated production time lost to micro-stops, brief stoppages, and speed losses that never make it into the ERP or the daily report.

A 2-minute micro-stop on a CNC machining center does not trigger a downtime alarm. An operator clears it and moves on. It is not logged. It happens again 40 minutes later. And again. By the end of the shift, 18 minutes of real production time have vanished. Multiplied across a line, across a week, across a year, these micro-stops represent a significant OEE loss (conventionally counted against performance rather than availability, because each stop is too short to be logged as downtime) that management has no visibility into and the Manufacturing Engineer cannot address without data.
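
As a sketch of how unlogged micro-stops might be recovered from machine data, the example below scans cycle-completion times and flags any gap that exceeds the ideal cycle time by more than a tolerance but is still too short to count as logged downtime. The thresholds (30 seconds and 5 minutes), the 60-second ideal cycle, and the timestamps are all assumptions made for the example.

```python
def micro_stops(cycle_end_times, ideal_cycle_s, min_lost_s=30, max_lost_s=300):
    """Return the lost seconds for every gap between consecutive cycles that
    looks like a micro-stop (longer than ideal, shorter than logged downtime)."""
    losses = []
    for prev, curr in zip(cycle_end_times, cycle_end_times[1:]):
        lost = (curr - prev) - ideal_cycle_s
        if min_lost_s <= lost <= max_lost_s:
            losses.append(lost)
    return losses


# Illustrative cycle-end times in seconds from shift start; ideal cycle is 60 s.
cycle_end_times = [0, 60, 120, 300, 360, 420, 600, 660]

losses = micro_stops(cycle_end_times, ideal_cycle_s=60)
total_lost_minutes = sum(losses) / 60
print(len(losses), total_lost_minutes)
```

Here two 2-minute interruptions are recovered from eleven minutes of data even though no alarm was raised and nothing was logged.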

Manual clipboards and ERP manual entry are not solutions. They reflect what operators choose to log, which is not the same as what actually happened to the machine. Manufacturing Engineers who try to drive OEE improvement with operator-reported downtime data are working with a systematically biased dataset.

Machine-level sensor data changes this. Electrical signatures, vibration baselines, and cycle time deviations captured continuously give the Manufacturing Engineer an objective record of exactly when the machine was running, idle, slowing, or cycling below target. The hidden factory becomes visible, and visible problems can be addressed with Lean and Six Sigma methodology.

Finger-Pointing Between Maintenance and Production

The most common data problem in discrete manufacturing is not the absence of data. It is the presence of two incompatible datasets: the operator log and the maintenance log, each of which attributes downtime to the other function.

"The machine is broken." "The operator is running it wrong." This is not a cultural problem that gets resolved through better teamwork. It is an information problem. Without objective, sensor-driven truth about what the machine was actually doing (its vibration level, its electrical load, its cycle time), neither side has the evidence to end the argument. The Manufacturing Engineer is stuck adjudicating between two subjective accounts rather than conducting RCA on real data.

Continuous machine monitoring provides the objective record that eliminates the debate: vibration trend at the time of the reported stoppage, cycle time deviation from baseline, temperature excursion correlated with the production event. The data either shows a developing mechanical fault or it shows the machine was healthy and the process parameter was wrong. Either way, the Manufacturing Engineer has a starting point for a Six Sigma root cause analysis, not an argument to mediate.

Degrading Machines Make Bad Products Before They Stop

A spindle with excessive vibration, a motor running hot, a press with a worn guide: each of these produces defects before it produces a catastrophic failure. The machine does not go from healthy to stopped. It goes from healthy to making marginal product, then to making defective product, then to stopping.

If the first signal the Manufacturing Engineer receives is a maintenance callout or an operator complaint, the defective production has already occurred. Discovering that a machine was running out of specification only after a batch has been produced, and now has to be scrapped or reworked, is among the most expensive quality failure modes in discrete manufacturing. The scrap cost is real. The rework labor cost is real. But the worst cost is the customer impact if the defective product shipped.

Machine health data correlated with quality data closes this gap. Vibration signatures and temperature trends that precede dimensional drift or surface finish degradation give the Manufacturing Engineer advance warning in the domain they care about: product quality, not just equipment availability.
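
One simple way to test whether a health signal leads a quality signal is to correlate the two at several time lags and see where the relationship is strongest. The sketch below does this with a plain Pearson correlation. The per-shift vibration and dimensional-deviation values are synthetic and built so that quality follows vibration by two shifts. Real data will be noisier, and a high correlation alone does not establish cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_lag(health, quality, max_lag):
    """Return (lag, r) where health[t] correlates best with quality[t + lag]."""
    results = {}
    for lag in range(max_lag + 1):
        xs = health[: len(health) - lag]
        ys = quality[lag:]
        results[lag] = pearson(xs, ys)
    return max(results.items(), key=lambda item: item[1])


# Synthetic per-shift data: spindle vibration and dimensional deviation.
vibration = [1.0, 1.0, 1.1, 1.3, 1.6, 2.0, 2.5, 3.1, 3.2, 3.3]
deviation = [10, 10, 10, 10, 11, 13, 16, 20, 25, 31]

lag, r = best_lag(vibration, deviation, max_lag=4)
print(lag, round(r, 3))
```

A positive best lag suggests how many sampling periods of warning the health signal might offer before the quality metric moves, which is the advance-warning window described above.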

How Tractian Supports Manufacturing Engineer RCA and CI Work

Tractian's continuous monitoring on Tier 1 assets gives manufacturing engineers the pre-failure degradation record that completes the availability RCA. When an equipment-initiated availability event occurs on a monitored asset, the full monitoring history from the weeks before the event is in the record. The first detectable precursor, the progression rate, and the operating conditions during the degradation period are all accessible for the RCA team.

For CI kaizens, Tractian provides the continuous asset health signal alongside the production data, enabling quantitative separation of equipment-driven from process-driven availability events in the baseline analysis. The data is accessible for export to support integration with FMEA documentation and CI project workflows.

Spectrum-level vibration data from Tractian sensors enables failure mode identification rather than threshold-only alerting, which means the RCA team receives the specific failure mode classification with the alert, not just a general anomaly flag. That specificity is what enables FMEA validation rather than requiring a separate diagnostic investigation after each event.

Why is availability RCA incomplete without asset health data?

Traditional availability RCA reconstructs the failure sequence from what is observable after the event: maintenance records, operator logs, component inspection results. This tells you what failed but rarely why the degradation progressed undetected. Continuous condition monitoring provides the pre-failure asset health record, enabling RCA to distinguish a sudden failure from a developing degradation that was progressing for weeks and could have been intercepted.

How does condition monitoring improve CI kaizen projects on bottleneck assets?

A CI kaizen on availability loss requires distinguishing equipment-driven from process-driven stops before defining the intervention. Without asset health data, this distinction relies on observation and operator reporting, which can misclassify process-initiated events as equipment failures and vice versa. Continuous monitoring provides an asset health signal alongside the production data, enabling quantitative separation of loss types before the kaizen begins.

What is the FMEA limitation when pre-failure data is unavailable?

FMEA updates after a failure event typically work from the failure outcome and the maintenance record. Without pre-failure monitoring data, the failure mode sequence and interval cannot be validated from actual asset behavior. Condition monitoring provides the degradation timeline that validates or corrects the assumed failure mode sequence.

How does condition monitoring distinguish equipment-driven from process-driven availability loss?

Developing equipment faults typically produce an asset health signature before the stop event: rising vibration amplitude, bearing fault frequency emergence, thermal elevation, or current anomaly. Process-driven stops typically do not. Continuous monitoring makes this distinction available in real time and retrospectively for any historical event in the monitoring record. Equipment-initiated stops with no precursor are the exception: they indicate sudden failures and call for a different corrective approach.

What changes in the RCA process when condition monitoring data is available?

The 5-Why and fishbone analysis gain an additional dimension: the asset health timeline. Instead of working from the failure event backward, the RCA team can examine the complete degradation sequence from the first detectable anomaly to the failure. This enables validation of the failure mode sequence against the FMEA assumptions and identification of the earliest intervention point.

How should a manufacturing engineer use condition monitoring data in an FMEA review?

Use the monitoring data to populate three FMEA fields with actual evidence: failure mode sequence (confirm the observed degradation matches the assumed progression), detection interval (measure the time between first detectable anomaly and failure to validate or update the FMEA detection ranking), and failure cause (confirm whether degradation was driven by maintenance interval, operating load, or process conditions).

What is the engineering benefit of continuous vs. periodic vibration measurement?

Periodic routes capture the asset health state at the moment of measurement, typically under planned, steady-state conditions. Transient events that occur between route visits are invisible to periodic measurement. Continuous monitoring captures the full operating profile, including conditions that only occur during production peaks, startup sequences, or process transitions.

How does monitoring data improve the planned-to-unplanned maintenance ratio?

The planned-to-unplanned ratio improves when maintenance interventions can be scheduled before failure rather than in response to failure. Condition monitoring provides the advance warning required for scheduling: when a developing fault is identified weeks before the projected failure point, the repair can be planned for the next available changeover window rather than executed as an emergency.

How Manufacturing Engineers Can Use Asset Health Data to Drive OEE Improvement