How Manufacturing Engineers Have Used Condition Monitoring to Lead OEE Improvement Projects
The analytical gap is familiar: you know the availability loss is real, you can see it in the OEE data, but you cannot validate the root cause because the pre-failure asset health record does not exist. The downtime log tells you what failed and how long it was down. It does not tell you how the degradation progressed before the failure, whether the maintenance interval was the relevant variable, or whether an operating condition change in the weeks before the event accelerated a pre-existing wear condition.
Manufacturing engineers who have worked through OEE improvement projects with condition monitoring data available describe the same shift: the availability analysis becomes forensic rather than inferential. The RCA team is validating a hypothesis against an actual data record rather than constructing a cause narrative from physical inspection evidence alone. The FMEA update is calibrated to measured detection intervals rather than assumed ones. The CI kaizen current state is quantitatively characterized rather than estimated from inconsistently coded downtime logs.
This article covers engineering stories from discrete manufacturing plants, documented case references from Tractian installations, and the common mistakes that limit manufacturing engineers' impact on OEE improvement projects before monitoring data is available.
- The Availability RCA Gap: What Manufacturing Engineers Find
- Story 1: The RCA That Changed the Failure Mode Hypothesis
- Story 2: The CI Kaizen That Needed Continuous Monitoring Data to Validate the Intervention
- Story 3: The FMEA Update That Came from Measured Detection Intervals
- Tractian Customer Results in Discrete Manufacturing
- What Manufacturing Engineers Learn in the First 90 Days of Monitoring
- The Production-Maintenance Interface: What Changes When Engineering Owns It
What Most Manufacturing Engineers Get Wrong in OEE Improvement Projects That Limits Their Impact
Not addressing the availability component because it requires the maintenance team's data. The path of least resistance in an OEE improvement project is to work on performance rate and quality losses, because the manufacturing engineer owns those levers without needing the maintenance team's cooperation. The availability component requires the asset failure history, condition data, and maintenance cost records that live in the maintenance system. Engineers who avoid this dependency consistently work on only two of the three OEE factors.
Running availability kaizens without quantitative loss attribution. A kaizen on availability that does not separate equipment-initiated from process-initiated stops is working on an uncharacterized problem. If 45% of availability events are process-initiated (material jams, sensor faults, fixture misalignment) and the kaizen targets only the maintenance and reliability dimension, nearly half the intervention energy is misdirected. The quantitative loss attribution step is not optional if the kaizen is to target the correct root cause.
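A minimal sketch of the attribution step, assuming each stop in the downtime log has already been coded as equipment- or process-initiated. The `StopEvent` record and its `initiator` values are hypothetical, not a standard schema. The function reports each initiator's share either by downtime minutes or by event count, because the two bases can give different answers:

```python
from dataclasses import dataclass


@dataclass
class StopEvent:
    asset: str
    minutes: float
    initiator: str  # hypothetical coding: "equipment" or "process"


def attribute_losses(events: list[StopEvent], by: str = "minutes") -> dict[str, float]:
    """Return each initiator's share (0..1) of downtime minutes or of event count."""
    if by not in ("minutes", "count"):
        raise ValueError("by must be 'minutes' or 'count'")
    totals: dict[str, float] = {}
    for e in events:
        weight = e.minutes if by == "minutes" else 1.0
        totals[e.initiator] = totals.get(e.initiator, 0.0) + weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {initiator: w / grand_total for initiator, w in totals.items()}


# Illustrative (made-up) downtime log for one line
events = [
    StopEvent("press_3", 120, "equipment"),
    StopEvent("press_3", 45, "process"),       # material jam
    StopEvent("conveyor_1", 30, "process"),    # sensor fault
    StopEvent("conveyor_1", 105, "equipment"),
]
print(attribute_losses(events))              # share of downtime minutes
print(attribute_losses(events, by="count"))  # share of stop events
```

In this made-up log, process-initiated stops are half of the events but only a quarter of the downtime minutes. That is why the kaizen needs to state which basis it uses for attribution.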
Using MTBF history as a static number rather than a trending signal. An asset with a 12-month average MTBF of 1,100 hours looks identical in the report to an asset with an MTBF that was 1,400 hours six months ago and is now 800 hours. The trending signal is what matters for OEE improvement: it tells you whether the next equipment-initiated availability event is getting closer or farther away. Trending requires tracking MTBF by period rather than averaging across the full history.
Designing CI projects without securing the monitoring data as part of the current state baseline. A kaizen that defines its current state from the downtime log alone has a current state with an unvalidated attribution. Equipment-initiated events may be overcounted (some process stops coded as equipment failures) or undercounted (micro-stoppages that operators cleared without logging). Starting the kaizen with monitoring data available produces a better-characterized current state and a more defensible improvement measurement.
Not documenting financial results from completed projects. The most common portfolio gap for manufacturing engineers at the mid-career stage is not technical capability: it is documentation. Projects that produced real OEE improvements but were not documented with rigorous before-and-after financial measurement are anecdote rather than evidence. The documentation discipline is what converts the technical work into the career portfolio.
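One way to make the before-and-after measurement explicit is to convert the availability change into recovered hours, and then into contribution. This is a sketch under stated assumptions. It assumes that recovered hours on a bottleneck asset convert one-for-one into sellable output. The throughput and margin figures are placeholders that a plant would replace with its own:

```python
def availability(planned_hours: float, unplanned_downtime_hours: float) -> float:
    """Share of planned production time the asset was actually available."""
    return (planned_hours - unplanned_downtime_hours) / planned_hours


def recovered_value(before: tuple[float, float], after: tuple[float, float],
                    units_per_hour: float, margin_per_unit: float) -> tuple[float, float]:
    """before/after: (planned_hours, unplanned_downtime_hours) for each period.

    The availability gain is applied to the after-period planned hours so that
    periods of different length are compared on the same basis.
    Assumes recovered bottleneck hours become sellable units (an assumption)."""
    gain = availability(*after) - availability(*before)
    recovered_hours = gain * after[0]
    return recovered_hours, recovered_hours * units_per_hour * margin_per_unit


# Placeholder figures: availability rises from 0.92 to 0.95 over 2,000 planned hours
hours, value = recovered_value((2000, 160), (2000, 100), units_per_hour=50, margin_per_unit=12.0)
print(f"{hours:.1f} h recovered, ${value:,.0f} contribution")
```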
Not including the maintenance team in the kaizen from the beginning. A CI project on availability improvement designed by production engineering and then handed to the maintenance team for implementation consistently underperforms relative to a jointly designed project. The maintenance team's knowledge of failure mode history, PM interval rationale, and monitoring data interpretation is essential to the analysis. Bringing them in as co-owners from the start produces a better project and a better working relationship.
The Availability RCA Gap: What Manufacturing Engineers Find
When manufacturing engineers first conduct a rigorous availability RCA with monitoring data available, three consistent findings appear:
Finding 1: The equipment-initiated availability loss was higher than the downtime log suggested. Micro-stoppages under five minutes that operators cleared without logging show up in the monitoring data as brief anomalies correlated with specific asset conditions. Consider a stamping press that produces 14 brief stop events per shift. None are logged, but all correlate with a developing clutch pressure anomaly in the monitoring data. That press has equipment-initiated availability loss that the OEE calculation missed entirely. When monitoring data is added, the true OEE turns out to be lower than the self-reported figure, and the gap can be attributed to equipment-initiated causes with much greater precision.
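A sketch of that reconciliation. It assumes the monitoring system can export both short stop events and anomaly windows as minute offsets within a shift; the data layout and the five-minute cutoff are illustrative assumptions. A short stop counts as hidden equipment-initiated loss only if it is absent from the downtime log and also coincides with an anomaly window:

```python
def overlaps(a_start: float, a_end: float, b_start: float, b_end: float) -> bool:
    """True when the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def hidden_equipment_minutes(stops, logged, anomalies, max_minutes=5.0):
    """stops: (start_min, duration_min) from monitoring; logged/anomalies: (start, end).
    Sums short, unlogged stops that coincide with a monitoring anomaly window."""
    hidden = 0.0
    for start, duration in stops:
        end = start + duration
        if duration >= max_minutes:
            continue  # not a micro-stoppage under this cutoff
        if any(overlaps(start, end, s, e) for s, e in logged):
            continue  # already counted by the downtime log
        if any(overlaps(start, end, s, e) for s, e in anomalies):
            hidden += duration
    return hidden


planned = 480.0          # one 8-hour shift, in minutes
logged = [(100, 130)]    # 30 logged minutes
stops = [(10, 2), (50, 3), (100, 30), (200, 1.5), (300, 4)]
anomalies = [(0, 60), (190, 210)]

hidden = hidden_equipment_minutes(stops, logged, anomalies)
logged_minutes = sum(e - s for s, e in logged)
print("availability from log:", (planned - logged_minutes) / planned)
print("with hidden stops:", (planned - logged_minutes - hidden) / planned)
```

The unlogged 4-minute stop at minute 300 has no matching anomaly, so the sketch leaves it unattributed. It may be process-initiated.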
Finding 2: One or two assets account for most of the equipment-initiated availability loss, and the monitoring trend on those assets is detectable well before the failure. The asset that generated the most unplanned downtime events over the past 12 months typically shows a detectable deterioration trend in the monitoring data beginning four to eight weeks before each event. This is not a surprising finding from a reliability engineering perspective, but it is consistently surprising to manufacturing engineers who expected equipment failures to be sudden rather than progressive.
Finding 3: The FMEA detection ranking was optimistic. The assumed detection interval in the FMEA was based on the inspection method in place (periodic routes, quarterly inspections), on the assumption that the route frequency was sufficient to detect the developing fault before failure. The monitoring data shows the actual detectable window: how many days or weeks before failure the first detectable precursor appeared. In most cases, that window is shorter than the route interval. A periodic route can therefore fall entirely outside the window and miss the fault before failure, even if every route was executed on schedule.
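The comparison can be made quantitative with standard P-F interval reasoning. This is a minimal sketch under two assumptions: the fault onset is uniformly distributed relative to a fixed route schedule, and any route that falls inside the detectable window finds the fault. Under those assumptions, the chance that a route catches the fault is the window divided by the route interval, capped at 1. The Monte Carlo function simply checks that expression:

```python
import math
import random


def route_detection_probability(window_days: float, route_interval_days: float) -> float:
    """Chance that at least one periodic route falls inside the detectable window."""
    if route_interval_days <= 0:
        raise ValueError("route interval must be positive")
    return min(1.0, window_days / route_interval_days)


def simulate_detection(window_days, route_interval_days, trials=100_000, seed=1):
    """Monte Carlo check: random fault onset, routes every route_interval_days."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        onset = rng.uniform(0.0, route_interval_days)
        # First route at or after the onset of the detectable precursor
        next_route = math.ceil(onset / route_interval_days) * route_interval_days
        if next_route < onset + window_days:  # route happens before functional failure
            hits += 1
    return hits / trials


# A six-week (42-day) detectable window against a quarterly (91-day) route
print(route_detection_probability(42, 91), simulate_detection(42, 91))
```

Under these assumptions, a quarterly route catches a six-week fault less than half the time, even with perfect route compliance.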
Story 1: The RCA That Changed the Failure Mode Hypothesis
At Pirelli's tire manufacturing facility, continuous monitoring on Banbury mixer gearboxes and drive motors provided exactly this type of pre-failure data record. The monitoring detected a gearbox oil leak through gear wear signals before operators had observed any symptom. Without the monitoring data, the RCA would have started from physical inspection findings at the point of failure. With it, the engineering team had the degradation sequence: the gear wear signature that preceded the leak, the timeline from first detectable precursor to the point of intervention, and confirmation that the failure mode was detectable weeks before it would have progressed to structural damage. The corrective action (maintenance pulled forward before structural damage) was validated by the monitoring data, not inferred from inspection findings alone. Zero breakdowns have been recorded on monitored exhaust systems at the facility since deployment. The full case study is available at tractian.com/en/case-studies/pirelli.
The engineering insight in this type of story is the distinction between the proximate cause and the contributing cause. Consider a typical case. Lubricant depletion was confirmed at inspection, so the lubrication state was a factor. However, the lubricant was being depleted faster than the PM interval assumed because of an above-nominal load condition. An RCA that worked only from the inspection finding (lubricant depleted, interval too long) would have shortened the PM interval. An RCA that used the monitoring data (degradation rate correlated with production load, not with elapsed time since last service) identified the mechanism and produced a different, more effective corrective action.
Maintenance Manager Ana D. at Pirelli described the principle the monitoring made operational: "Without connectivity, there is no reliability. Assets only deliver consistent results when they are properly integrated and connected." For the manufacturing engineer, that connectivity is what makes the pre-failure data record possible and the RCA worth conducting.
Story 2: The CI Kaizen That Needed Continuous Monitoring Data to Validate the Intervention
The Sherwin-Williams powder coating deployment illustrates how monitoring data focuses a kaizen. Before deployment, recurring unplanned downtime on coating lines was the primary availability loss driver. When Tractian sensors were installed on key motors across the lines, the team gained a continuous data record that quantified which assets were generating the availability losses. The result: 564 hours of downtime prevented, $150,000 in avoided production losses, and a 20% reduction in corrective maintenance. The 20% corrective maintenance reduction is the operational signal of the OEE availability improvement. The deployment concentrated effort on the highest-risk assets and produced a measurable structural shift in the planned-to-corrective ratio. Supervisor Engineer Antonio N. described the transformation: "Today, our equipment talks to us. With online monitoring, we are able to anticipate failures, cut downtime, and improve productivity in a consistent and measurable way." Full case study: tractian.com/en/case-studies/sherwin-williams
The structural point: the kaizen without monitoring data would have been designed for all equipment-initiated availability loss across the full line asset set. The monitoring data concentrated the intervention on the two assets generating 78% of the problem. A more targeted kaizen, better-resourced on the actual problem, produced a larger result than a broad-scope kaizen with diluted effort.
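The focusing step is a Pareto cut on equipment-initiated downtime by asset. Below is a minimal sketch. The asset names and hours are hypothetical, chosen to mirror the two-asset, 78% split described above:

```python
def vital_few(downtime_by_asset: dict[str, float], threshold: float = 0.8):
    """Smallest set of top assets whose cumulative share of downtime reaches threshold."""
    total = sum(downtime_by_asset.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(downtime_by_asset.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for asset, hours in ranked:
        if cumulative / total >= threshold:
            break
        selected.append(asset)
        cumulative += hours
    return selected, cumulative / total


# Hypothetical equipment-initiated downtime hours per asset on one line
downtime_hours = {"motor_A": 52, "motor_B": 26, "oven_fan": 9, "conveyor": 7, "booth_exhaust": 6}
print(vital_few(downtime_hours, threshold=0.75))
```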
Story 3: The FMEA Update That Came from Measured Detection Intervals
The Whirlpool appliance manufacturing deployment provides an example at production line scale. At Whirlpool, 95% of previously unmonitored vibration points were brought under continuous monitoring, with an 85% insight validation rate on generated alerts. For a manufacturing engineer, the 95% coverage figure means that the FMEA asset risk assessment for the covered asset population now uses continuous monitoring as its detection method. It no longer relies on periodic route inspection and the detection interval assumptions that come with it. The 85% validation rate means the alerts are technically credible: the maintenance team confirms the flagged faults on inspection at a high rate. This supports the assumption that the monitoring model's detection capability matches the actual failure progression on those assets. Senior Maintenance Manager Rafael F. described the program-level outcome: "Routine management and asset reliability have become strategic pillars for our plant. By applying predictive techniques to critical machines, we've turned information into a competitive advantage, boosting reliability, cutting costs, and making our results more predictable." Full case study: tractian.com/en/case-studies/whirlpool
The FMEA point: the D ranking in the RPN calculation is only accurate when the detection method is matched to the actual failure progression rate. An inspection interval that is longer than the detectable window for the primary failure mode produces a systematically optimistic D ranking, and therefore an RPN that understates the actual risk. The monitoring data provides the empirical basis to identify and correct these mismatches across the fleet.
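A sketch of how a measured detection window could feed back into the RPN (RPN = S × O × D). The mapping from inspection detection probability to a D rating is hypothetical; plants use their own detection scales. The probability assumes that fault onset is uniformly distributed relative to a fixed inspection interval. Continuous monitoring is treated here as a 1-day interval, which is also an assumption:

```python
# Hypothetical bands: (minimum detection probability, D rating); lower D = better detection
DETECTION_BANDS = [(0.9, 3), (0.5, 5), (0.2, 7), (0.0, 9)]


def detection_rating(window_days: float, inspection_interval_days: float) -> int:
    """Map the chance that an inspection lands inside the detectable window to a D rating."""
    if inspection_interval_days <= 0:
        raise ValueError("inspection interval must be positive")
    p = min(1.0, window_days / inspection_interval_days)
    for min_probability, rating in DETECTION_BANDS:
        if p >= min_probability:
            return rating
    raise ValueError("detection window must be non-negative")


def rpn(severity: int, occurrence: int, detection: int) -> int:
    return severity * occurrence * detection


S, O = 8, 4
assumed = rpn(S, O, 3)                           # route assumed adequate in the original FMEA
measured = rpn(S, O, detection_rating(42, 91))   # 42-day measured window, 91-day route
continuous = rpn(S, O, detection_rating(42, 1))  # monitoring treated as a 1-day interval
print(assumed, measured, continuous)
```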
The broader FMEA calibration principle applies across the Tractian customer base: the detection interval is only credible when it is matched to the actual failure progression rate on the specific asset under the specific operating conditions at that plant. Continuous monitoring replaces the assumed detection interval with a measured one.
Tractian Customer Results in Discrete Manufacturing
Tractian works with discrete manufacturing plants including Whirlpool, Pirelli, and Sherwin-Williams. The case study documentation for these installations is available at tractian.com/en/case-studies and includes the specific failure events intercepted, the monitoring parameter that generated the alert, and the production outcome.
For manufacturing engineers evaluating condition monitoring for OEE improvement work, the most relevant cases to review are those involving:
- Assembly line conveyor drives and drive systems (appliance and consumer goods manufacturing)
- Press drives and transfer systems (stamping and forming operations)
- High-cycle rotating equipment where bearing fault frequencies are the primary failure mode
Three Tractian manufacturing customers illustrate the OEE-relevant outcomes from documented deployments:
Whirlpool (Home Appliances Manufacturing): Over $1 million in avoided costs from preventing downtime and production losses, 95% of previously unmonitored vibration points brought under coverage, 85% insight validation rate. Asset types: assembly line conveyor drives, paint shop systems. The validation rate is the engineering-relevant metric. It confirms that the monitoring model generates alerts on failure modes the maintenance team can confirm on physical inspection, which supports the detection capability assumption for FMEA purposes. Full case study: tractian.com/en/case-studies/whirlpool
Pirelli (Tire Manufacturing, 2,800 employees): 98% alert check-in rate, 77 failures identified across the asset base, zero breakdowns on monitored exhaust systems since deployment. Specific fault: gearbox oil leak caught through gear wear signal, preventive maintenance pulled forward before structural damage. Asset types: Banbury mixer gearboxes, drive motors, exhaust system equipment. Full case study: tractian.com/en/case-studies/pirelli
Sherwin-Williams (Powder Coating Manufacturing): 564 hours of downtime prevented, $150,000 in avoided production losses, $13,000+ in direct savings, 20% reduction in corrective maintenance. Asset types: key motors on powder coating production lines. The 20% corrective maintenance reduction is the ratio metric a manufacturing engineer would track as the OEE availability improvement signal. Full case study: tractian.com/en/case-studies/sherwin-williams
What the case studies consistently show:
Across the discrete manufacturing installations documented in Tractian's case study library, the pattern is consistent:
- Monitoring is deployed on Tier 1 bottleneck assets based on OEE downtime history.
- Within the first three to six months, one or more developing fault conditions are detected on assets that had no prior warning in the downtime log.
- The developing fault is scheduled for repair in the next available changeover window.
- The repair is completed. Post-repair monitoring confirms return to normal operating baseline.
- The availability event that would have occurred does not occur. The OEE availability improvement is measured in the subsequent period.
The manufacturing engineer's contribution to this result is the OEE analysis that identified which assets to monitor, the cross-functional work that connected the monitoring alert to the production scheduling decision, and the post-period measurement that validated the improvement against the current state baseline.
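The post-period measurement is the standard OEE decomposition, OEE = Availability × Performance × Quality, computed the same way for the baseline and for the post-repair period. This is a minimal sketch with made-up figures. It assumes that planned time, unplanned downtime, ideal cycle time, and good and total counts are available for each period:

```python
from dataclasses import dataclass


@dataclass
class Period:
    planned_min: float    # planned production time, minutes
    downtime_min: float   # unplanned stop time, minutes
    ideal_cycle_s: float  # ideal seconds per unit
    total_units: int
    good_units: int


def oee(p: Period) -> dict[str, float]:
    run_min = p.planned_min - p.downtime_min
    availability = run_min / p.planned_min
    performance = (p.ideal_cycle_s * p.total_units) / (run_min * 60)
    quality = p.good_units / p.total_units
    return {"A": availability, "P": performance, "Q": quality,
            "OEE": availability * performance * quality}


baseline = Period(2400, 240, 30, 3888, 3810)
post_repair = Period(2400, 120, 30, 4104, 4022)
for name, period in (("baseline", baseline), ("post-repair", post_repair)):
    print(name, {k: round(v, 3) for k, v in oee(period).items()})
```

In this example, performance and quality stay roughly constant, which isolates the availability change. If they shift as well, the improvement attribution needs more care.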
What Manufacturing Engineers Learn in the First 90 Days of Monitoring
Engineering teams that deploy monitoring on Tier 1 bottleneck assets for the first time consistently report the same learning curve, regardless of plant type or asset class.
Days 1 to 30: Baseline calibration reveals the operating signature variability. The same asset at different load conditions, different production configurations, and different temperatures produces different vibration signatures. The first 30 days of monitoring reveal the full variability range of the normal operating state. Alerts in this period are often false positives as the diagnostic model learns to distinguish normal operating variability from genuine anomalies. This calibration period is expected and should not be interpreted as a monitoring system quality problem.
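To see why a single threshold misfires during calibration, consider baselines kept per operating condition. This is a simplified illustration, not a description of how any particular vendor's diagnostic model works. Readings from the calibration period are grouped by a hypothetical load band. A new reading is flagged when its z-score against its own band exceeds a limit:

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(readings):
    """readings: (load_band, vibration_mm_s) pairs from the calibration period.
    Returns {band: (mean, sample_stdev)} for bands with at least two readings."""
    by_band = defaultdict(list)
    for band, value in readings:
        by_band[band].append(value)
    return {band: (mean(v), stdev(v)) for band, v in by_band.items() if len(v) >= 2}


def is_anomalous(baselines, band, value, z_limit=3.0):
    """Flag a reading more than z_limit standard deviations from its own band mean."""
    mu, sigma = baselines[band]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


calibration = [("low", 1.0), ("low", 1.1), ("low", 0.9), ("low", 1.0),
               ("high", 2.0), ("high", 2.2), ("high", 1.8), ("high", 2.0)]
baselines = build_baselines(calibration)
print(is_anomalous(baselines, "low", 2.0), is_anomalous(baselines, "high", 2.0))
```

The same 2.0 mm/s reading is anomalous at low load and normal at high load. A single global threshold would either flag normal high-load running or miss the low-load anomaly.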
Days 31 to 60: First actionable alerts on developing faults. After the baseline is established, genuine fault signatures that were previously masked by operating variability begin to be distinguishable. The first actionable alerts typically appear in this period. For the manufacturing engineer, the most valuable part of this stage is not the alert itself: it is the monitoring data review that shows where the fault first appeared in the timeline relative to the current alert state. This is the first experience of the pre-failure data record that changes how RCA is conducted.
Days 61 to 90: The connection between asset health trends and OEE appears. With 90 days of monitoring data alongside 90 days of OEE data, the manufacturing engineer can begin to build the correlation analysis: do periods of elevated vibration on the bottleneck asset correlate with performance rate reductions (machine condition affecting cycle time), or only with availability events? This analysis often reveals that the equipment condition effect on OEE starts in the performance component before it reaches the availability component, providing an earlier intervention trigger than the availability event alone.
The Production-Maintenance Interface: What Changes When Engineering Owns It
Manufacturing engineers who have established ownership of the production-maintenance interface on OEE improvement projects describe a consistent shift in how availability improvement work is conducted at their plants.
Before the engineering-led interface: OEE review meetings covered the composite score and the total availability loss figure. The maintenance team reported what failures occurred and what was repaired. The discussion was reactive.
After the engineering-led interface: OEE review meetings include the monitoring alert history alongside the OEE data. Developing faults on Tier 1 assets are visible in advance. The production scheduling team is involved in the changeover window decision for repairs, because the manufacturing engineer has quantified the production value at stake from allowing the fault to progress. The discussion is forward-looking.
The manufacturing engineer's contribution is not technical mastery of vibration analysis. It is the analytical framework that connects the monitoring data to the production consequence, and the facilitation skill to bring the right people into the conversation before the failure rather than after it.
The Sherwin-Williams deployment illustrates this workflow in practice. Sensors on key coating line motors generated alerts on developing faults. The engineering and maintenance team reviewed the data, confirmed the faults, and scheduled repairs in available production windows rather than waiting for failures during runs. The result was 564 hours of downtime prevented across the program, with maintenance becoming, in the words of Supervisor Engineer Antonio N., "more structured and data-driven, allowing faster and more accurate interventions." The production-maintenance interface changed from reactive reporting to forward-looking scheduling participation because the engineering team had monitoring data to bring to the conversation. Full case study: tractian.com/en/case-studies/sherwin-williams
How Tractian Supports Manufacturing Engineer OEE Stories
Tractian's case study library at tractian.com/en/case-studies documents the specific failure events intercepted at manufacturing plants, the monitoring data that generated the alert, and the production outcomes that resulted from the planned repair versus the unplanned failure counterfactual.
For manufacturing engineers building their CI project portfolio, each Tractian-documented case represents a template for how the engineering story is structured: the problem (availability event on a Tier 1 asset with no advance warning system), the monitoring intervention (specific failure mode detected at early stage), the repair (scheduled for changeover window, executed as planned), and the result (availability event did not occur, OEE availability improved in the measurement period).
The manufacturing engineer's role in each of these stories is not the monitoring technology: it is the analytical work that identified which assets to monitor, connected the alert to the production scheduling decision, and measured the result against the current state baseline.
What is the most common OEE availability problem that condition monitoring solves for manufacturing engineers?
The most common problem is incomplete root cause analysis on equipment-initiated availability events. The downtime log records what failed and when, but not the degradation sequence that led to the failure. Condition monitoring provides the degradation timeline that enables the manufacturing engineer to validate the FMEA failure mode assumption against actual asset behavior.
How do manufacturing engineers typically first encounter condition monitoring as an OEE tool?
The most common entry point is an OEE improvement project on a bottleneck line where the availability component is the largest loss category, and the root cause attribution of equipment-initiated stops is contested. When the manufacturing engineer finds that the downtime log root cause codes are inconsistent and cannot support the availability RCA, condition monitoring emerges as the instrument that fills the analytical gap.
What does a manufacturing engineer typically find when monitoring is first installed on a previously unmonitored Tier 1 asset?
Three consistent findings: the actual availability loss from equipment-initiated events is higher than the downtime log suggested, one or two assets account for a disproportionate share of equipment-initiated loss, and the FMEA detection ranking on the primary failure modes was optimistic given the actual failure progression rate.
What is the typical timeline from deployment to first validated OEE improvement?
Six to twelve months from deployment on Tier 1 assets. The first weeks to months establish the normal operating baseline. The first actionable alerts on developing faults typically appear within the first three to six months. The availability improvement becomes visible in the OEE data from roughly months seven through twelve, as the prevented failures are absent from the unplanned downtime record.
How have manufacturing engineers used monitoring data to update FMEA documentation?
The most impactful use is calibrating the detection interval in the FMEA detection ranking from assumed to measured values. When a failure event occurs on a monitored asset, the time between first detectable precursor and the failure event is a measured detection interval for that specific failure mode. This converts the FMEA detection section from engineering judgment to evidence-based specification.
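A minimal sketch of extracting measured detection intervals from the record. It assumes that the first detectable precursor for each failure has been identified in the trend data and tagged with asset and failure mode; the record layout and names are hypothetical. Each precursor is matched only to the failure that followed it, not to an earlier failure cycle:

```python
from datetime import date


def measured_detection_intervals(precursors, failures):
    """precursors, failures: lists of (asset, failure_mode, date).
    Returns {(asset, failure_mode): [days from first precursor to failure, ...]}."""
    intervals = {}
    last_failure = {}
    for asset, mode, failed_on in sorted(failures, key=lambda f: f[2]):
        key = (asset, mode)
        since = last_failure.get(key)
        # Only precursors after the previous failure of this mode and on/before this one
        candidates = [d for a, m, d in precursors
                      if (a, m) == key and d <= failed_on and (since is None or d > since)]
        if candidates:
            intervals.setdefault(key, []).append((failed_on - min(candidates)).days)
        last_failure[key] = failed_on
    return intervals


precursors = [("press_7", "bearing_outer_race", date(2024, 3, 1)),
              ("press_7", "bearing_outer_race", date(2024, 3, 20)),
              ("press_7", "bearing_outer_race", date(2024, 8, 5))]
failures = [("press_7", "bearing_outer_race", date(2024, 4, 12)),
            ("press_7", "bearing_outer_race", date(2024, 9, 16))]
print(measured_detection_intervals(precursors, failures))
```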
What does owning the production-maintenance interface look like in practice?
A monthly joint working session where the manufacturing engineer brings the OEE data for the target line and the maintenance lead brings the condition monitoring alert history for the same period. Together they review which alerts were actioned, which equipment-initiated events occurred, and whether any events had prior alert signals. The manufacturing engineer owns the OEE context; the maintenance lead owns the asset health interpretation.
How do manufacturing engineers present monitoring data in a CI kaizen review?
The monitoring data contributes the asset health status of Tier 1 assets at the time of the kaizen, the history of equipment-initiated stops with asset health correlation, and the detection interval data for primary failure modes. This transforms the availability analysis from a downtime log review (what failed) to a failure sequence analysis (how the failure developed and whether it was detectable before the event).
What are the most common mistakes manufacturing engineers make in OEE improvement projects?
Six consistent mistakes: not addressing the availability component because it requires the maintenance team's data, running availability kaizens without quantitative loss attribution between equipment-initiated and process-initiated stops, using MTBF as a static number rather than a trending signal, designing CI projects without securing monitoring data as part of the current state baseline, not documenting financial results from completed projects, and not including the maintenance team from the beginning of the kaizen design.