How Manufacturing Engineers Should Evaluate Condition Monitoring for OEE and CI Projects
Most condition monitoring evaluation processes are run by maintenance departments. The evaluation criteria reflect maintenance priorities: reliability of alerting, ease of work order creation, technician usability. These are legitimate criteria. They are also incomplete from a manufacturing engineer's perspective.
A manufacturing engineer using condition monitoring data for OEE improvement, root cause analysis, and CI project support has different requirements. You need spectrum-level data that enables failure mode identification, not just anomaly detection. You need continuous measurement that captures transient events and peak-load conditions, not periodic routes. You need data that you can export and analyze in your own toolset, not data locked in a vendor dashboard. And when specifying new equipment, you need to evaluate monitoring-readiness as a component of the equipment selection FMEA before the asset is installed.
This guide covers the evaluation framework from that perspective.
What Most Manufacturing Engineers Get Wrong When Evaluating Monitoring Solutions
Accepting global RMS vibration as equivalent to spectrum data. A vendor who shows you an RMS trend line that goes up and then an alarm is showing you anomaly detection. They are not showing you failure mode identification. For OEE RCA and FMEA work, you need to know what failure mode is developing: outer race bearing fault, gear mesh degradation, imbalance. RMS cannot answer that question.
Not distinguishing truly continuous measurement from periodic-but-automated measurement. Some systems sample every few seconds. Some sample every hour. Some sample every day on a fixed schedule. The marketing language often uses "continuous monitoring" to describe scheduled automated routes that are simply executed more frequently than a human-conducted route. Ask specifically: what is the measurement interval, and is data stored at the native measurement rate or aggregated?
Evaluating the vendor interface rather than the underlying data. A well-designed dashboard that presents data clearly is useful for maintenance operations. A manufacturing engineer running RCA needs to access the raw data: time-series exports, spectrum files, pre-event history for any alert. If the evaluation only looks at what the dashboard shows, it misses the most important criterion for engineering use.
Not asking about detection lead time on the specific failure modes you care about. Generic claims about early detection are not useful. Ask the vendor directly: for an early-stage outer race bearing fault on a motor similar to yours, at what point does the system generate an alert, and how much time does that typically leave before failure? The answer must come with case history data, not general claims.
Deferring the monitoring-readiness assessment to post-installation. If a new line or press is specified and installed without sensor mount point accessibility in the commissioning specification, retrofitting monitoring later is mechanically constrained and expensive. The evaluation happens at equipment selection, not during the first PM cycle after installation.
Evaluation Criterion 1: Vibration Spectrum Data vs. Global RMS Only
This is the first criterion to evaluate, because it determines whether the monitoring system is useful for engineering analysis or only for maintenance alerting.
Global RMS monitoring measures the total vibration energy across the full frequency range as a single value. When the value changes significantly, an alert is generated. This enables anomaly detection: something changed. It does not enable failure mode identification: what changed, and which failure mechanism is responsible.
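A minimal sketch of this limitation, using synthetic signals rather than real machine data. Two sine waves of equal amplitude at different frequencies (29.5 Hz and 105.5 Hz are illustrative assumptions, as are the sample rate and window length) give the same global RMS, while a single-frequency DFT tells them apart.

```python
import cmath
import math

SAMPLE_RATE = 2000  # Hz, illustrative
N = 4000            # 2 s window, so both test frequencies complete whole cycles

def sine(freq_hz, amplitude=1.0):
    """Synthetic vibration signal with a single frequency component."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(N)]

def rms(signal):
    """Global RMS: one number for the whole signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def amplitude_at(signal, freq_hz, sample_rate_hz):
    """Peak amplitude of one frequency component (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

imbalance_like = sine(29.5)   # energy at 1x running speed (assumed 1770 rpm)
bearing_like = sine(105.5)    # energy at a hypothetical bearing fault frequency

rms_a, rms_b = rms(imbalance_like), rms(bearing_like)
print(f"RMS: {rms_a:.4f} vs {rms_b:.4f}")  # indistinguishable
print(f"29.5 Hz component: {amplitude_at(imbalance_like, 29.5, SAMPLE_RATE):.3f} "
      f"vs {amplitude_at(bearing_like, 29.5, SAMPLE_RATE):.3f}")
```

Real spectra contain many components plus noise, so this only illustrates the principle: the RMS trend alone carries no information about where in the frequency range the energy sits.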
Spectrum-based monitoring resolves the vibration signal using Fast Fourier Transform (FFT) processing, expressing the vibration energy at each frequency band. Each failure mode in a rotating machine generates characteristic frequencies:
- Outer race bearing fault: calculated from bearing geometry (number of rolling elements, ball diameter, pitch circle diameter, contact angle) and rotation speed. Typically expressed as BPFO (Ball Pass Frequency Outer Race).
- Inner race bearing fault: BPFI (Ball Pass Frequency Inner Race). Because the defect passes through the load zone once per revolution, its amplitude is modulated and sidebands appear at 1x rotation frequency spacing.
- Gear mesh defect: gear mesh frequency (teeth count times rotation speed) and its harmonics, with sidebands at rotation frequency of the defective gear.
- Imbalance: 1x rotation frequency, typically with low harmonic content.
- Misalignment: elevated 1x and 2x rotation frequency (sometimes with higher harmonics), often with a strong axial component.
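The characteristic frequencies in the list above can be estimated from nameplate and datasheet values. The sketch below uses the standard kinematic formulas for a bearing with a rotating inner race and stationary outer race; the bearing dimensions, tooth count, and shaft speed are illustrative assumptions, not values from any specific datasheet.

```python
import math

def bearing_fault_frequencies(shaft_hz, n_rolling_elements, ball_diameter,
                              pitch_diameter, contact_angle_deg=0.0):
    """Return (BPFO, BPFI) in Hz for a rotating inner race, fixed outer race."""
    ratio = (ball_diameter / pitch_diameter) * math.cos(
        math.radians(contact_angle_deg))
    bpfo = (n_rolling_elements / 2) * shaft_hz * (1 - ratio)
    bpfi = (n_rolling_elements / 2) * shaft_hz * (1 + ratio)
    return bpfo, bpfi

def gear_mesh_frequency(shaft_hz, tooth_count):
    """Gear mesh frequency: tooth count times the rotation frequency."""
    return shaft_hz * tooth_count

# Illustrative inputs only.
shaft_hz = 1770 / 60  # 1770 rpm motor
bpfo, bpfi = bearing_fault_frequencies(
    shaft_hz, n_rolling_elements=9, ball_diameter=7.9, pitch_diameter=39.0)
gmf = gear_mesh_frequency(shaft_hz, tooth_count=23)

print(f"1x = {shaft_hz:.1f} Hz, BPFO = {bpfo:.1f} Hz, "
      f"BPFI = {bpfi:.1f} Hz, GMF = {gmf:.1f} Hz")
```

These are theoretical values. Measured fault frequencies usually sit slightly off them because of slip and speed variation, so a spectrum comparison needs a tolerance band around each line.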
When the monitoring system provides spectrum data, the alert is not just "vibration is high." The alert is: "outer race bearing fault frequency detected at this amplitude, at this asset, at this measurement point." That is the alert content that enables the RCA team to validate the FMEA failure mode assumption directly.
Evaluation test: request a real failure case from the vendor for an asset class similar to yours. Ask them to show you the spectrum data at three points: first alert, halfway to failure, and at failure. If they cannot show spectrum plots at each stage, they are providing global RMS monitoring, not spectrum-based monitoring.
Evaluation Criterion 2: Measurement Frequency and Transient Coverage
The measurement interval determines what classes of event the monitoring system can capture.
Why measurement interval matters for OEE analysis:
A monitoring system that measures every 30 minutes captures the asset state 48 times per day. An event that occurs, progresses, and resolves within a 20-minute production transient may not be captured at all if the measurement interval is wider than the event duration.
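The arithmetic can be sketched as follows, under two simplifying assumptions that are not in the original text: each measurement is effectively instantaneous, and the event starts at a random point relative to the measurement schedule.

```python
def samples_per_day(interval_min):
    """Number of measurements per day at a fixed interval."""
    return 24 * 60 / interval_min

def capture_probability(event_duration_min, interval_min):
    """Chance that at least one periodic measurement lands inside the event,
    assuming the event start is random relative to the schedule."""
    return min(1.0, event_duration_min / interval_min)

for interval in (1, 30, 60):
    p = capture_probability(event_duration_min=20, interval_min=interval)
    print(f"{interval:>2} min interval: {samples_per_day(interval):.0f} samples/day, "
          f"P(capture 20 min event) = {p:.2f}")
```

Under these assumptions a 20-minute transient is seen about two times in three at a 30-minute interval and one time in three at an hourly interval, and even a captured event is represented by a single point.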
For OEE improvement, the events that matter most are often transient: startup vibration spikes that indicate a developing resonance condition, thermal excursions during peak load cycles, feed-rate-induced resonance in conveyor systems during specific material configurations. These are the failure precursors that explain why the asset is generating more availability events than the MTBF history alone would predict.
What to confirm with the vendor:
- What is the native measurement interval? (Not the alert checking interval: the actual measurement rate.)
- Is data stored at the native rate, or is it aggregated before storage?
- Can you access the full time-series record for any period, not just the period around an alert event?
- Does the system capture startup and shutdown ramps, or only steady-state operation?
Continuous vs. periodic-but-automated: some vendors describe their system as "continuous" when what they mean is "automated routes that run more frequently than a human-conducted route." A system that takes one measurement per hour and generates an alert when the measurement exceeds a threshold is not continuous monitoring for engineering analysis purposes. It is automated periodic monitoring. The distinction matters: a failure mode that progresses from detectable to critical in six to eight hours is traced in detail by truly continuous data, is reduced to a handful of points by hourly sampling, and can be missed entirely by daily sampling.
Evaluation Criterion 3: Alert Specificity and Failure Mode Classification
For a maintenance technician, a useful alert is one that says: "Go check asset X. Vibration is high." The technician arrives, inspects, and determines what to do.
For a manufacturing engineer running RCA, a useful alert is one that says: "Outer race bearing fault detected on the main drive motor of Stamping Press 3, measurement point DE bearing housing. Current BPFO amplitude is at [value], [X]% above baseline. Recommended action: inspect and replace outer race bearing before next planned production stop."
The difference is failure mode classification versus threshold crossing. Threshold crossing tells you that something changed. Failure mode classification tells you what changed and what it means for the FMEA.
Why failure mode classification is required for FMEA work:
When an alert fires and the maintenance team inspects the asset, their inspection report will describe what they found. If the monitoring alert included a failure mode classification, the RCA team can immediately compare the inspection finding to the alert classification: did the system correctly identify the failure mode, or did it misclassify? That comparison provides the calibration feedback that improves the monitoring system's classification accuracy over time.
If the alert only said "vibration high," there is nothing to compare. The alert was a trigger for investigation; it was not a hypothesis about failure mode that the inspection could validate or refute.
Evaluation test: request three to five alert examples from a comparable installation. For each, confirm whether the alert included: asset identification, measurement point, specific parameter and value, failure mode classification (not just anomaly type), and recommended action. If the alerts are all threshold crossings without failure mode specificity, the system is not built for engineering use.
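One way to run this test consistently across vendors is to score each sample alert against the same field list. The sketch below is a hypothetical structure for that check; the field names and example values are assumptions for illustration and do not reflect any vendor's alert schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Alert:
    asset: Optional[str] = None
    measurement_point: Optional[str] = None
    parameter: Optional[str] = None
    value: Optional[float] = None
    failure_mode: Optional[str] = None
    recommended_action: Optional[str] = None

def missing_fields(alert):
    """Names of the required fields the alert leaves empty."""
    return [f.name for f in fields(alert)
            if getattr(alert, f.name) in (None, "")]

# Illustrative alerts with made-up values.
threshold_alert = Alert(asset="Stamping Press 3 main drive motor",
                        parameter="velocity RMS", value=7.1)
classified_alert = Alert(asset="Stamping Press 3 main drive motor",
                         measurement_point="DE bearing housing",
                         parameter="BPFO amplitude", value=0.8,
                         failure_mode="outer race bearing fault",
                         recommended_action="replace bearing at next planned stop")

print(missing_fields(threshold_alert))
print(missing_fields(classified_alert))
```

An alert that leaves `failure_mode` empty is a threshold crossing in the sense described above, whatever the dashboard calls it.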
Evaluation Criterion 4: Data Accessibility and Export Capability
A manufacturing engineer's analytical workflow does not live in the monitoring vendor's platform. Root cause analysis documentation, FMEA updates, CI kaizen current state analysis, and OEE reporting all happen in tools the engineering team owns and controls. Monitoring data that cannot be extracted and used in those tools has limited engineering value regardless of how well it works inside the vendor dashboard.
Minimum data accessibility requirements for engineering use:
Time-series data export: the complete measurement record for any asset, any time range, exportable as CSV or accessible via API. This enables integration into RCA documentation, trending analysis in engineering tools, and CI kaizen baseline analysis.
Spectrum data export: the FFT spectrum for any measurement event, exportable in a format that can be opened in a spectrum analysis tool or imported into the engineering team's analysis environment. A spectrum visible only inside the vendor dashboard cannot be attached to an RCA report or FMEA update document.
Pre-event history access: for any alert event, the ability to export the asset health record for the period before the alert, not just the current state. This is the forensic data that RCA requires: the degradation timeline leading to the event.
Unfiltered access without vendor intermediation: the manufacturing engineer should be able to pull any data export themselves, without requesting it from vendor support. Data access that requires submitting a support ticket defeats the purpose of having the data for on-demand engineering analysis.
Questions for the vendor:
- Can I export time-series data for any asset, any time range, as CSV?
- Can I export spectrum data for any individual measurement event?
- Is there an API available for programmatic data access?
- Can I access pre-event history for any alert without contacting your support team?
- Is exported data the raw measurement values, or is it processed or aggregated?
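When the vendor supplies a sample export, a quick check is whether the pre-event history can be pulled with ordinary tooling. The sketch below assumes a CSV layout with hypothetical column names (`timestamp`, `asset_id`, `velocity_rms_mm_s`) and made-up readings; a real export will differ.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical export layout and invented values, for illustration only.
SAMPLE_EXPORT = """timestamp,asset_id,velocity_rms_mm_s
2024-03-01T00:00:00,press3_motor_de,2.1
2024-03-08T00:00:00,press3_motor_de,2.3
2024-03-15T00:00:00,press3_motor_de,3.0
2024-03-22T00:00:00,press3_motor_de,4.4
2024-03-29T00:00:00,press3_motor_de,4.6
"""

def pre_event_history(csv_text, alert_time, lookback_days):
    """Rows recorded in the lookback window before the alert timestamp."""
    start = alert_time - timedelta(days=lookback_days)
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        if start <= ts < alert_time:
            rows.append((ts, float(row["velocity_rms_mm_s"])))
    return rows

alert_time = datetime(2024, 3, 22)
history = pre_event_history(SAMPLE_EXPORT, alert_time, lookback_days=21)
for ts, value in history:
    print(ts.date(), value)
```

If the export only contains rows from the alert onward, or only daily averages where the system measures far more often, that answers the raw-versus-aggregated question directly.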
Evaluation Criterion 5: CMMS Integration for RCA Loop Closure
The monitoring system detects a developing fault. An alert is generated. A work order is created in the CMMS. A maintenance technician inspects the asset, finds the fault, and executes the repair. The work order is closed with the finding and the action taken.
Without CMMS integration, the work order outcome exists in the maintenance system and the alert exists in the monitoring system. They are not linked. The question "was this alert valid, and what was found at inspection?" requires manual correlation between two systems.
With CMMS integration, the work order closure feeds back to the monitoring record: what was found at inspection, what was done, and whether the alert correctly identified the failure mode. This is the feedback loop that enables:
Alert threshold calibration: if a class of alerts consistently results in no-fault-found inspections, the threshold is too sensitive for that failure mode and operating condition. The CMMS feedback data identifies this pattern so the threshold can be adjusted.
Failure mode classification validation: if the monitoring system classified an alert as an outer race bearing fault and the technician found an inner race fault, that misclassification is recorded. Over time, these discrepancies improve the diagnostic model for that asset class.
FMEA detection ranking updates: the CMMS outcome data shows the time between first alert and the failure mode confirmation at inspection. That is the actual detection interval for the current alert threshold configuration, which directly informs the FMEA D ranking.
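Once alert records and work order outcomes are linked, the three feedback measures above reduce to simple counting. The sketch below assumes a hypothetical joined record format with invented entries; the interval is measured from alert to inspection for alerts whose classification was confirmed.

```python
from collections import Counter
from datetime import date

# Hypothetical joined alert / work order records (invented data).
records = [
    {"alert_mode": "outer race", "found_mode": "outer race",
     "alert_date": date(2024, 1, 8), "inspection_date": date(2024, 1, 22)},
    {"alert_mode": "outer race", "found_mode": "inner race",
     "alert_date": date(2024, 1, 15), "inspection_date": date(2024, 1, 25)},
    {"alert_mode": "imbalance", "found_mode": None,  # no fault found
     "alert_date": date(2024, 2, 1), "inspection_date": date(2024, 2, 3)},
    {"alert_mode": "outer race", "found_mode": "outer race",
     "alert_date": date(2024, 2, 5), "inspection_date": date(2024, 2, 26)},
]

def feedback_summary(records):
    """Count inspection outcomes and collect alert-to-confirmation intervals."""
    outcomes = Counter()
    intervals_days = []
    for r in records:
        if r["found_mode"] is None:
            outcomes["no_fault_found"] += 1
        elif r["found_mode"] == r["alert_mode"]:
            outcomes["confirmed"] += 1
            intervals_days.append((r["inspection_date"] - r["alert_date"]).days)
        else:
            outcomes["misclassified"] += 1
    return outcomes, intervals_days

outcomes, intervals_days = feedback_summary(records)
print(dict(outcomes), intervals_days)
```

Without the link between the two systems, building the `records` list is the manual correlation step described earlier.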
Evaluation question: what CMMS platforms does your system integrate with, and does the integration include bi-directional data exchange (alert to work order, and work order outcome back to the monitoring record) or only one-directional alert push?
Evaluation Criterion 6: Detection Lead Time for Changeover Window Scheduling
For an OEE improvement program in discrete manufacturing, the practical value of condition monitoring depends on whether the detection lead time is sufficient to schedule the repair before the asset fails.
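The calculation is simple: compare the detection-to-failure interval with the spacing between changeover windows. A planned window is guaranteed to fall inside the warning period only when the interval is at least as long as the window spacing. If your changeover windows occur every six to eight weeks and your most common failure mode on the monitored asset class has a detection-to-failure interval of two to three weeks, the next window falls inside the warning period only when the first alert happens to arrive within two to three weeks of it; in the remaining cases the repair must be expedited or an additional stop scheduled. If the detection interval is five to seven days, that constraint is tighter still.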
How to get accurate detection lead time data:
Generic vendor claims about early detection are not useful for this calculation. You need case history data for the specific failure modes on assets similar to yours.
Request from the vendor: case histories showing the monitoring data record for three to five bearing failure events on motors similar to your Tier 1 assets. For each case, confirm the time between first detectable precursor (when the monitoring parameter first deviated from baseline) and the failure event or planned removal. The range across cases is as important as the average: a system that provides 21 days of average lead time with a range of 2 to 45 days is less operationally useful than one that consistently provides 14 to 21 days.
Cross-reference the detection lead time against your changeover window schedule. The monitoring system needs to provide enough advance warning for repair planning and parts staging within the scheduling constraints your plant actually operates under.
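A rough way to do this cross-reference is to estimate, for each case history, how often a planned window would fall inside the usable warning period. The sketch below assumes evenly spaced windows and an alert that arrives at a random point in the changeover cycle; the lead times, spacing, and planning allowance are hypothetical numbers.

```python
def window_coverage(lead_time_days, window_spacing_days, planning_days=0):
    """Fraction of alerts for which a planned window falls after the planning
    allowance and before failure, assuming random alert timing."""
    usable = max(0.0, lead_time_days - planning_days)
    return min(1.0, usable / window_spacing_days)

# Hypothetical detection-to-failure intervals from vendor case histories.
case_lead_times = [2, 14, 21, 30, 45]
spacing = 42  # days between changeover windows (assumed)

coverage = [window_coverage(d, spacing, planning_days=3) for d in case_lead_times]
for days, share in zip(case_lead_times, coverage):
    print(f"lead time {days:>2} d -> window inside warning period "
          f"for {share:.0%} of alerts")
```

Running the calculation per case rather than on the average makes the spread visible: the short-lead-time cases are the ones that end in expedited repairs.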
Evaluation Criterion 7: False-Positive Rate and Alert Credibility
A monitoring system with a high false-positive rate trains engineers and technicians to discount alerts. This is a program-ending failure mode that does not appear in product evaluations and takes several months to manifest in a new installation.
The mechanism is straightforward: if 40% of alerts result in no-fault-found inspections, technicians learn to defer response. The deferred response becomes normal. An alert that would have enabled a planned repair in the next changeover window instead sits in the queue until the asset fails. The monitoring system is generating correct alerts, but the operational behavior has been conditioned by the false-positive history to treat them as background noise.
How to assess false-positive rate:
Request alert history data from a comparable installation for a 12-month period. Calculate the percentage of alerts that resulted in a confirmed finding at inspection versus alerts where inspection found no fault. A system that cannot provide this data is a system whose false-positive rate has not been measured, which is itself informative.
A manageable false-positive rate in a well-configured installation should be low enough that technicians trust the alerts and respond to them with appropriate urgency. The exact figure will vary by asset class and operating environment, but the trend direction matters most: is the false-positive rate improving over time as the system learns the normal operating signature of each asset, or is it stable or worsening?
Also evaluate alert fatigue from the volume side: how many alerts does a typical installation generate per asset per month? An installation with 200 assets generating 150 alerts per week requires different operational handling than one generating 20 alerts per week. Neither volume is inherently right or wrong; the question is whether the engineering and maintenance team has the capacity to respond to the alert volume the system produces.
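Both checks are straightforward once the alert history is in hand. The sketch below uses hypothetical quarterly counts to show the false-positive trend, and the 200-asset figures from the paragraph above to normalize alert volume per asset; the function names are illustrative.

```python
def false_positive_rate(confirmed, no_fault_found):
    """Share of inspected alerts where no fault was found."""
    total = confirmed + no_fault_found
    return no_fault_found / total if total else 0.0

def alerts_per_asset_per_month(alerts_per_week, asset_count,
                               weeks_per_month=52 / 12):
    """Normalize a weekly alert count to alerts per asset per month."""
    return alerts_per_week * weeks_per_month / asset_count

# Hypothetical quarterly counts: (confirmed findings, no fault found).
quarters = [(30, 20), (34, 14), (38, 9), (40, 6)]
rates = [false_positive_rate(c, n) for c, n in quarters]
improving = all(later < earlier for earlier, later in zip(rates, rates[1:]))

print([f"{r:.0%}" for r in rates], "improving:", improving)
print(f"{alerts_per_asset_per_month(150, 200):.2f} alerts per asset per month")
print(f"{alerts_per_asset_per_month(20, 200):.2f} alerts per asset per month")
```

Alerts that were never inspected are excluded from the rate here; a large uninspected share is worth reporting separately, since it is an early sign of the deferral behavior described above.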
Monitoring-Readiness as an Equipment Selection Criterion
Most manufacturing engineers encounter condition monitoring as a retrofit decision on existing equipment. But the decision that has the highest long-term leverage is the one made at equipment selection: specifying monitoring-readiness into the asset before it is installed.
An asset that was designed with accessible, flat, stud-mounted sensor locations at each critical bearing housing, motor drive end and non-drive end, and gearbox output bearing is a monitoring-ready asset. Sensor installation at commissioning takes a few hours. Sensor installation on an asset where the critical bearing housings are obstructed by guarding, located inside a sealed enclosure, or accessible only when a major component is removed may take a planned shutdown and mechanical modification.
What to include in the equipment selection FMEA for monitoring-readiness:
- Identify the key failure modes for the asset type from the FMEA.
- For each key failure mode, identify the optimal measurement location (e.g., drive end bearing housing for outer race bearing fault detection).
- Evaluate whether that location is accessible for sensor installation, cabling routing, and periodic verification without removing guards or components.
- Specify accessible measurement points as a commissioning requirement in the purchase specification.
- Verify accessibility as part of the equipment acceptance test before the asset is accepted into the plant.
The question to ask the equipment vendor: show me the intended sensor mounting locations for continuous vibration monitoring on the critical bearing points of this asset. If the vendor cannot answer this question, it means the asset was not designed with monitoring-readiness as a requirement. That is a useful signal at selection time.
Complementary Measurement: Ultrasonic and Thermal Alongside Vibration
Vibration analysis is the primary measurement method for rotating equipment failure mode detection. Two complementary measurement methods cover failure mode classes that vibration alone misses.
Ultrasonic monitoring detects high-frequency acoustic emission signals (typically in the 20 to 100 kHz range) generated by friction and impacting in early-stage bearing degradation. The physical mechanism is different from vibration: ultrasonic energy is generated by the mechanical event at the defect surface before the defect has grown large enough to produce significant low-frequency vibration. For many bearing fault modes, ultrasonic signatures appear earlier in the degradation progression than vibration signatures.
For a manufacturing engineer focused on maximizing detection lead time for changeover window scheduling, ultrasonic monitoring on the highest-consequence Tier 1 assets provides the longest possible advance warning interval. The combination of ultrasonic (early detection) and vibration (failure mode confirmation) provides both early warning and diagnostic specificity.
Thermal monitoring detects failure modes that produce heat signatures before mechanical signatures: electrical faults in motor windings (turn-to-turn shorts generate local thermal elevation before they affect vibration), lubrication delivery failures (inadequate lubrication produces bearing temperature rise before the dry-running condition generates detectable vibration), and cooling system degradation (heat exchanger fouling or cooling pump degradation shows in gearbox or drive temperature trends before mechanical consequences appear).
For a comprehensive FMEA coverage perspective: vibration covers mechanical failure modes in rotating elements; ultrasonic extends the detection lead time for the same failure modes; thermal covers electrical and lubrication failure modes that are invisible to vibration analysis in their early stages. A monitoring configuration that includes all three measurement types on Tier 1 assets provides the most complete failure mode coverage.
The Evaluation Sequence
- Define your failure mode requirements first. Pull the FMEA for your Tier 1 bottleneck assets and list the failure modes with the highest RPN. These are the failure modes the monitoring system must detect. Evaluate vendors against this specific list, not against a generic capability claim.
- Confirm spectrum data availability for those failure modes. For each high-RPN failure mode, confirm that the vendor's system produces spectrum data (not just global RMS) that would enable detection of that specific failure mode at the relevant frequency bands.
- Confirm measurement frequency and continuity. Ask specifically what the measurement interval is, whether it is truly continuous, and whether transient events between nominal measurement intervals are captured.
- Request a data export sample. Ask for a raw time-series export and a spectrum data export from a comparable installation. Confirm the format and verify that it is accessible without vendor intermediation.
- Confirm CMMS integration for your platform. If your plant uses a specific CMMS, confirm that bidirectional integration is available and ask for a reference from a comparable installation using the same CMMS.
- Get documented detection lead time data. Request case histories for the specific failure modes on assets similar to yours, with documented detection-to-failure intervals.
- Request false-positive rate data. Ask for 12-month alert history from a comparable installation and calculate the proportion of confirmed findings to no-fault-found inspections.
- Evaluate the ultrasonic and thermal capability. Confirm whether the sensor hardware and data platform support ultrasonic and thermal measurement alongside vibration, or whether those require separate hardware and separate data streams.
OEE visibility and micro-stop detection: Evaluate whether the platform surfaces production data at the cycle-time level: not just downtime events, but also the idle periods, micro-stops, and speed losses that operators do not log. The hidden factory of brief, unlogged stoppages represents a significant OEE loss in discrete manufacturing (small stops and reduced speed are conventionally counted against the performance factor rather than availability) and is invisible without machine-level sensor data. A manufacturing engineer driving OEE improvement needs cycle-time resolution, not shift-level averages. Tractian's OEE solution provides automatic production tracking without relying on manual operator input.
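To illustrate what cycle-time resolution makes possible, the sketch below labels the gaps between consecutive cycle completions. The timestamps, the 1.5x tolerance, and the 300-second cutoff between a micro-stop and logged downtime are all assumptions chosen for the example, not standard values.

```python
def classify_gaps(cycle_end_times_s, ideal_cycle_s,
                  micro_stop_max_s=300, tolerance=1.5):
    """Label each gap between cycle completions and report the time lost."""
    labels = []
    for prev, curr in zip(cycle_end_times_s, cycle_end_times_s[1:]):
        gap = curr - prev
        lost = gap - ideal_cycle_s
        if gap <= ideal_cycle_s * tolerance:
            labels.append(("running", max(0, lost)))   # includes minor speed loss
        elif lost <= micro_stop_max_s:
            labels.append(("micro_stop", lost))
        else:
            labels.append(("downtime", lost))
    return labels

# Hypothetical cycle completion times in seconds, ideal cycle of 10 s.
times = [0, 10, 20, 31, 76, 86, 96, 706]
labels = classify_gaps(times, ideal_cycle_s=10)
micro_stop_loss_s = sum(lost for kind, lost in labels if kind == "micro_stop")

print(labels)
print("time lost to micro-stops:", micro_stop_loss_s, "s")
```

A shift-level average would fold the 35-second stop into overall throughput, which is why it never appears in operator logs.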
Machine health to product quality correlation: Evaluate whether the platform allows correlation of machine health signals (vibration, temperature, power draw) with production quality data. A spindle running with elevated vibration, a motor running hot, a press with worn guides: each produces dimensional variation or surface defects before it produces a failure event. The manufacturing engineer who can correlate a vibration trend increase with a Cp/Cpk decline has the data for a Six Sigma RCA. Without this correlation capability, the quality signal and the equipment signal remain in separate systems and the root cause stays invisible. Scrap and rework from parts produced during the degradation window, before the equipment fault was identified, are the direct cost that machine-health-to-quality correlation prevents.
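A minimal sketch of that correlation, assuming both data sets can be exported and aligned on the same weekly periods. The vibration and Cpk series, the measurements, and the specification limits are invented for illustration.

```python
import math
import statistics

def cpk(measurements, lsl, usl):
    """Process capability index from a sample and its specification limits."""
    mean = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)
    return min(usl - mean, mean - lsl) / (3 * sigma)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly values for one spindle and the part it machines.
vibration_mm_s = [2.0, 2.1, 2.4, 2.9, 3.5, 4.2]
weekly_cpk = [1.65, 1.62, 1.50, 1.38, 1.21, 1.05]
r = pearson(vibration_mm_s, weekly_cpk)

sample_cpk = cpk([9.98, 10.02, 10.00, 9.99, 10.01], lsl=9.9, usl=10.1)
print(f"correlation between vibration and Cpk: {r:.2f}")
print(f"Cpk of the sample batch: {sample_cpk:.2f}")
```

A strong negative coefficient is a reason to investigate, not proof of cause; the RCA still has to rule out tooling, material, and process changes over the same period.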
Objective data for RCA and finger-pointing resolution: Evaluate whether the platform produces timestamped, sensor-driven machine state records that can be exported for RCA analysis. The maintenance-versus-production blame cycle in discrete manufacturing is an information problem. Timestamped machine health data that covers the period of a quality event, a stoppage, or a performance deviation gives the manufacturing engineer an objective starting point for a PFMEA update or a Six Sigma investigation, rather than two conflicting accounts, one from the maintenance log and one from the operator log.
How Tractian Meets the Manufacturing Engineer's Evaluation Criteria
Tractian's Smart Trac sensors provide continuous vibration, temperature, and ultrasonic measurement on a single device, eliminating the need for separate hardware for each measurement type. The vibration measurement is continuous at high sampling rates, with FFT spectrum data stored for every measurement event and accessible for export.
Alerts from Tractian's AI diagnostic layer include failure mode classification, not just threshold crossing. The alert content specifies the failure mode, the asset, the measurement point, and the recommended action, providing the RCA starting point without requiring a separate diagnostic investigation.
Time-series and spectrum data are accessible for export via the platform's data export tools and API, without requiring vendor intermediation. Pre-event history for any alert event is retained in the data record and exportable for RCA use.
Tractian integrates with major CMMS platforms for bidirectional work order data exchange, closing the RCA loop from alert through repair outcome to system calibration feedback. False-positive rates improve over time as the diagnostic models learn the normal operating signature of each specific asset in your installation.
What is the difference between global RMS vibration monitoring and spectrum-based monitoring?
Global RMS monitoring captures the overall vibration energy across all frequencies as a single value. It tells you that something changed, but not what changed or which failure mode is developing. Spectrum-based monitoring resolves the vibration signal into its frequency components, enabling identification of specific failure modes: bearing fault frequencies, gear mesh harmonics, imbalance, misalignment. For manufacturing engineer use in FMEA validation and RCA, spectrum data is required.
Why does continuous measurement matter for OEE availability analysis?
Scheduled route measurements capture asset condition at one point in time, typically under planned operating conditions. Transient events, load-dependent failure modes, and degradations that progress within the route interval are invisible to periodic measurement. Continuous monitoring captures the full operating profile including startup transients, peak-load conditions, and process parameter variations.
What should condition monitoring alerts include to be useful for manufacturing engineer RCA?
Alerts useful for RCA must include the failure mode identified (not just a threshold crossing), the asset and measurement point, the detected parameter and its current value relative to baseline, and the recommended action or urgency level. A threshold alert that says "vibration high" requires a separate diagnostic investigation before RCA can begin.
What data export capabilities should a manufacturing engineer require from a monitoring system?
Raw time-series data export in a standard format, spectrum data export for any event period, and the ability to pull exports without vendor intermediation are the minimum requirements. Data locked inside a vendor dashboard cannot be integrated into RCA workflows, FMEA documentation, or CI project analysis tools.
How should condition monitoring capability be included in equipment selection FMEA?
Add monitoring-readiness as an evaluated criterion in the Detection section of the FMEA at the equipment selection stage. Confirm that sensor mounting points for critical failure-mode locations are accessible without removing guards or components, and specify accessibility as a commissioning requirement.
What is the right evaluation sequence for comparing condition monitoring vendors as a manufacturing engineer?
Step 1: define the failure modes you need to detect from your FMEA. Step 2: confirm spectrum data availability for those failure modes. Step 3: confirm measurement frequency and continuity. Step 4: request a data export sample. Step 5: confirm CMMS integration. Step 6: get documented detection lead time data. Step 7: request false-positive rate data. Step 8: evaluate ultrasonic and thermal capability.
How does CMMS integration complete the condition monitoring RCA loop?
The monitoring system detects a developing fault and generates an alert. The alert becomes a work order. The work order closure feeds back to the monitoring system: what was found and whether the alert was valid. This closes the RCA loop and provides the calibration feedback that improves alert accuracy over time.
How do you evaluate whether a monitoring solution provides sufficient detection lead time for your changeover window schedule?
Request failure case histories from the vendor for the specific failure modes relevant to your assets. For each case, confirm the time between first detectable precursor and failure event. Compare that interval to your typical changeover window schedule to confirm that the system provides sufficient advance warning for repair planning and parts staging.
What is vibration spectrum analysis and why does it matter for FMEA validation?
Spectrum analysis resolves the raw vibration signal into its component frequencies using FFT processing. Each failure mode in a rotating machine generates characteristic frequencies. Comparing the measured spectrum to characteristic frequencies for the specific asset configuration enables identification of which failure mode is developing, confirming or updating the FMEA failure mode assumption from actual asset behavior rather than inspection outcomes only.
How should ultrasonic monitoring complement vibration analysis for a manufacturing engineer's toolkit?
Ultrasonic sensors detect high-frequency acoustic emission signals generated by friction and early-stage material fatigue before they are detectable in the lower-frequency vibration spectrum. For early-stage bearing fault detection, ultrasonic monitoring typically provides a longer detection lead time than vibration analysis alone. The most capable configurations use both vibration and ultrasonic measurement on the same asset.