The process you are responsible for optimizing cannot be optimized if it is not running. In a continuous chemical plant, an unplanned shutdown on a non-redundant centrifugal pump or compressor is not an inconvenience for another department. It is a process-wide event that stops your unit train, destroys any in-process material, and resets the reliability baseline you have been building toward your next process improvement milestone.

Manufacturing engineers in chemical plants work at the boundary between process design and asset reliability. The process availability that your optimization projects require depends on the mechanical integrity of the rotating equipment in your process. The process hazard analysis (PHA) and hazard and operability (HAZOP) study updates you support require failure rate data that generic industry databases do not supply for your specific plant, service conditions, and operating history. The turnaround (TAR) scope that determines your next operating window is partly a process engineering decision, not just a maintenance decision.

The metrics that belong in your engineering review are not the same as the maintenance team's work order completion rates. They are process availability by unit, mean time between failures (MTBF) by rotating equipment service class, inspection adherence against the process safety management (PSM) schedule, and unplanned shutdown frequency classified by root cause. Each has a direct connection to your process reliability scope, and each changes character when continuous asset health data becomes available.

In this guide

What Most Manufacturing Engineers Get Wrong About KPIs in Chemical Manufacturing
Process Availability: The Primary Engineering Metric
MTBF by Rotating Equipment Service Class
PSM Inspection Completion Rate as a Process Engineering Input
Unplanned Shutdown Frequency by Root Cause Category
From Lagging to Leading: How Continuous Monitoring Changes Each Metric
KPI Reference Table for Chemical Process Engineering Scope
How Tractian Provides the Data These Metrics Require

What Most Manufacturing Engineers Get Wrong About KPIs in Chemical Manufacturing

The data chemical plants track most readily is not the data with the highest consequence for process reliability.

Three specific measurement gaps create the most engineering exposure in chemical process environments:

Treating maintenance KPIs as belonging to a different department. Planned maintenance ratio and MTBF averages are often tracked exclusively by maintenance teams, reviewed in maintenance meetings, and absent from process engineering scope. In a continuous chemical process, that organizational separation is a reliability liability. The failure modes that will initiate your next unplanned process shutdown are developing in assets you can see in your P&ID right now. The metrics tracking those failure modes should be in your engineering review.

Using plant-wide MTBF averages that obscure individual asset trends. Averaging MTBF across all rotating equipment in a unit produces a metric that looks stable while one non-redundant process-critical pump trends toward failure. Centrifugal pumps in corrosive or high-temperature service have fundamentally different failure mode profiles than general utility service pumps. Tracking them together tells you nothing actionable about either.

Relying on scheduled inspection data for process risk assessment when continuous operating-load data is available. Scheduled inspections confirm what the asset looks like when it is cold and accessible. The bearing degradation, impeller cavitation wear, and seal face deterioration that drive failure in continuous chemical service develop under operating load, between inspection cycles. A manufacturing engineer relying on inspection records alone for PHA failure rate validation is working with data that reflects asset condition during shutdown states, not the state that actually produces the failure mode.

The corrective is not more metrics. It is four specific metrics, each tracked at the right granularity, each connected to its process engineering consequence.

Process Availability: The Primary Engineering Metric

In continuous chemical manufacturing, process availability is not one component of an overall equipment effectiveness (OEE) calculation. It is the state of the process stream.

When a non-redundant centrifugal pump in critical process service fails, the result is not a reduced throughput rate. It is a process shutdown, affecting the entire unit train, with a restart sequence that may take hours to days depending on process conditions and safety requirements. The yield loss is not proportional to downtime minutes. It includes the entire batch or in-process volume at time of failure, the startup transient, and any quality qualification period before the process returns to specification.

For a manufacturing engineer, process availability by unit or train is the primary engineering metric because it is the envelope within which all other work occurs. Process optimization, HAZOP update cycles, equipment specification projects, and continuous improvement (CI) initiatives all require a running process. Each unplanned shutdown is both a direct production loss and a disruption to the engineering work program.

How to track it correctly:

Track availability by individual unit train or process section, not by facility. A facility availability number of 97% can mask a unit train that experienced multiple events, each representing full unit downtime rather than proportional losses.
Record each availability event with a duration and an initiating equipment tag. Over a 12-month period, this produces the event frequency and consequence data needed for a meaningful FMEA update.
Distinguish between process excursion events and equipment-initiated events. A shutdown triggered by a process parameter exceedance has a different RCA pathway than one triggered by a rotating equipment failure, even if both appear as availability losses in the same metric.
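
The three tracking rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the ShutdownEvent record, the train names, the equipment tags, and every duration are invented for the example, and it assumes each event has already been assigned a unit train, an initiating tag, and an initiator type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ShutdownEvent:
    unit_train: str     # unit train or process section that went down
    equipment_tag: str  # initiating tag (made-up tags in this example)
    duration_h: float   # hours from trip until back on specification
    initiator: str      # "equipment" or "process_excursion"


def availability_by_train(events, trains, period_h):
    """Return {train: availability fraction} for every train over period_h hours."""
    downtime = {train: 0.0 for train in trains}
    for ev in events:
        downtime[ev.unit_train] += ev.duration_h
    return {train: 1.0 - hours / period_h for train, hours in downtime.items()}


def summary_by_tag(events):
    """Return {equipment_tag: (event count, total downtime hours)}."""
    counts = defaultdict(int)
    hours = defaultdict(float)
    for ev in events:
        counts[ev.equipment_tag] += 1
        hours[ev.equipment_tag] += ev.duration_h
    return {tag: (counts[tag], hours[tag]) for tag in counts}


# Illustrative 12-month event log; all values are invented.
events = [
    ShutdownEvent("Train-A", "P-101A", 36.0, "equipment"),
    ShutdownEvent("Train-A", "P-101A", 60.0, "equipment"),
    ShutdownEvent("Train-B", "TIC-220", 12.0, "process_excursion"),
]
trains = ["Train-A", "Train-B", "Train-C"]
period_h = 8760.0  # 12 months

by_train = availability_by_train(events, trains, period_h)
by_tag = summary_by_tag(events)
equipment_initiated = [ev for ev in events if ev.initiator == "equipment"]

for train, value in by_train.items():
    print(f"{train}: {value:.2%}")
```

Keeping the initiating tag on every record is what lets the same log feed both the availability number and the per-asset event frequency used in an FMEA update.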

Benchmarks for continuous chemical operations:

Process Type	World Class	Acceptable	Requires Investigation
Continuous petrochemical	98%+ inter-TAR availability	95 to 97.9%	Below 95% or any unplanned TAR event
Specialty chemical batch	95%+ campaign completion rate	88 to 94.9%	Below 88% or multiple mid-campaign events
Continuous pharma or fine chemical	96%+ per campaign or quarter	92 to 95.9%	Below 92%
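
If you want to apply these bands automatically, a lookup like the following is one way to do it. It is a sketch under assumptions: the dictionary keys and the function name are invented, the thresholds are copied from the table, and the event_trigger flag stands in for the event-based conditions in the last column (an unplanned TAR event, or multiple mid-campaign events). Values that fall in the gap between the printed bands, such as 97.95%, are treated here as Acceptable.

```python
# (world-class floor, acceptable floor) in percent, copied from the benchmark table.
BENCHMARKS = {
    "continuous_petrochemical": (98.0, 95.0),
    "specialty_chemical_batch": (95.0, 88.0),
    "continuous_pharma_or_fine_chemical": (96.0, 92.0),
}


def classify_availability(process_type, availability_pct, event_trigger=False):
    """Map an availability percentage to the benchmark band for its process type."""
    world_class, acceptable = BENCHMARKS[process_type]
    # An event-based trigger overrides the percentage, as in the table's last column.
    if event_trigger or availability_pct < acceptable:
        return "Requires Investigation"
    if availability_pct >= world_class:
        return "World Class"
    return "Acceptable"


print(classify_availability("continuous_petrochemical", 96.0))
print(classify_availability("continuous_petrochemical", 99.0, event_trigger=True))
```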

MTBF by Rotating Equipment Service Class

MTBF is the primary indicator of process availability trajectory, but only when tracked at the right level of resolution.

In a chemical process environment, the relevant categorization is not asset type alone. It is asset type combined with service class, because the failure mode distribution and consequence severity differ substantially between service classes.

Centrifugal pumps: critical process service versus general service. A centrifugal pump transferring a corrosive process fluid at high temperature under continuous duty has a fundamentally different failure mode profile than a utility water pump. Seal face wear rates, impeller cavitation susceptibility, and bearing life under chemical exposure differ by service. Tracking these together produces an MTBF number that underrepresents the risk on the critical service asset.

Critical process service classification criteria for chemical plants:

Handles a fluid covered by PSM (highly hazardous chemical) at process conditions
Non-redundant: no installed spare or automatic switchover
Failure initiates a unit-level or plant-level process shutdown
Downstream process cannot continue safely at reduced flow

Assets meeting all four criteria need individual MTBF tracking, with trend review on a 90-day rolling basis.
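
The four-criteria screen can be written as a simple predicate. The Asset record and its field names below are assumptions made for illustration; in practice the answers would come from your P&IDs, PSM applicability review, and operating procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    tag: str
    handles_psm_fluid: bool                # highly hazardous chemical at process conditions
    has_installed_spare: bool              # installed spare or automatic switchover
    failure_causes_unit_shutdown: bool     # unit-level or plant-level shutdown
    downstream_safe_at_reduced_flow: bool  # downstream can continue safely at reduced flow


def is_critical_process_service(asset):
    """True only when all four classification criteria are met."""
    return (
        asset.handles_psm_fluid
        and not asset.has_installed_spare
        and asset.failure_causes_unit_shutdown
        and not asset.downstream_safe_at_reduced_flow
    )


# Made-up tags for illustration.
assets = [
    Asset("P-101A", True, False, True, False),
    Asset("P-450B", False, True, False, True),
]
critical_tags = [a.tag for a in assets if is_critical_process_service(a)]
print(critical_tags)
```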

Compressors in process service. For continuous gas processing, steam cracking, or any compression-dependent reaction pathway, compressor reliability is the single largest rotating equipment risk. A charge gas compressor trip in a steam cracker is a plant-wide event. MTBF should be tracked at the individual machine level, with vibration trend data reviewed at a higher frequency than the normal maintenance review cycle.

Agitators. In batch chemical and pharmaceutical operations, agitator failure during a batch destroys the batch and its full material value. MTBF tracking for agitators needs to include gearbox and motor drive components, not just the shaft and impeller, because the most common failure modes in high-viscosity chemical service occur in the drive train.

Heat exchanger drivers. Pump and fan drivers on heat exchangers, such as circulation pumps on shell-and-tube units and fans on air-cooled units, often receive less monitoring attention than reaction and compression equipment. Fouling-induced overload and elevated temperature service create failure mode dynamics that are worth tracking by service class, separately from general rotating equipment.

MTBF trend interpretation:

A stable MTBF over a 90-day window is acceptable. An improving MTBF trend reflects effective condition-based maintenance execution. A declining trend over 60 or more days on a non-redundant process-critical asset is a process availability risk event, not a maintenance scheduling observation. It warrants RCA initiation, not just a maintenance work order.
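
As a rough sketch of that interpretation rule: MTBF is operating hours divided by failure count, and the trend is read from successive rolling values for one asset. The 5% tolerance band used to call a trend stable and the "watch" label for a short decline are assumptions of this example, not thresholds from this guide; only the 60-day criterion comes from the text.

```python
from datetime import date


def mtbf_hours(operating_hours, failure_count):
    """MTBF = operating hours / number of failures; undefined with zero failures."""
    if failure_count == 0:
        return None
    return operating_hours / failure_count


def classify_trend(readings, min_days=60, tolerance=0.05):
    """Classify a series of (date, MTBF hours) readings sorted by date.

    Compares the first and last reading. A relative drop beyond `tolerance`
    is "declining" once it spans at least `min_days`, otherwise "watch".
    """
    if len(readings) < 2:
        return "insufficient data"
    (start_day, start_val), (end_day, end_val) = readings[0], readings[-1]
    span_days = (end_day - start_day).days
    change = (end_val - start_val) / start_val
    if change < -tolerance:
        return "declining" if span_days >= min_days else "watch"
    if change > tolerance:
        return "improving"
    return "stable"


# Invented rolling MTBF values for one non-redundant pump.
readings = [
    (date(2024, 1, 1), 4000.0),
    (date(2024, 2, 1), 3700.0),
    (date(2024, 3, 15), 3200.0),
]
print(classify_trend(readings))
```

A "declining" result on a critical asset is the point at which the text above calls for RCA initiation rather than a work order.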

The value of MTBF tracking for a manufacturing engineer extends beyond availability management. It produces the plant-specific failure frequency data that supports PHA failure rate validation, FMEA maintenance column updates, and turnaround scope engineering decisions.

PSM Inspection Completion Rate as a Process Engineering Input

Under OSHA PSM 29 CFR 1910.119(j), facilities handling highly hazardous chemicals must maintain documented mechanical integrity programs that include written procedures, qualified inspection performance, frequency conformance, and corrective action documentation for covered equipment.

The manufacturing engineer's connection to this requirement is the PHA and HAZOP update cycle. HAZOP and process FMEA for covered equipment include failure rate assumptions, detection method reliability assessments, and maintenance interval justifications for rotating equipment in the HAZOP nodes. When those assumptions default to industry generic databases because plant-specific inspection history is incomplete or not systematically documented, the process risk assessment is built on data that does not reflect actual plant operating conditions.

What the metric should capture:

Inspection completion rate against scheduled intervals, per asset tag, for all PSM-covered rotating equipment
Inspection deferral events: instances where a scheduled inspection was postponed, with the actual inspection date and duration of deferral recorded
Corrective action completion rate: of deficiencies identified during inspections, what percentage received corrective action within the defined response interval
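
The three quantities in this list can be computed from two simple record types. The sketch below assumes an inspection counts as complete only if performed on or before its scheduled date, and that each finding carries a due date marking the end of its response interval. The record layouts, tags, and dates are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Inspection:
    asset_tag: str
    scheduled: date
    performed: Optional[date]  # None while the inspection is outstanding


@dataclass
class Finding:
    asset_tag: str
    due: date                  # end of the defined response interval
    closed: Optional[date]     # None while corrective action is open


def completion_rate_by_tag(inspections, as_of):
    """Share of due inspections per asset tag performed on or before schedule."""
    due = defaultdict(int)
    on_time = defaultdict(int)
    for insp in inspections:
        if insp.scheduled > as_of:
            continue  # not yet due
        due[insp.asset_tag] += 1
        if insp.performed is not None and insp.performed <= insp.scheduled:
            on_time[insp.asset_tag] += 1
    return {tag: on_time[tag] / due[tag] for tag in due}


def deferral_events(inspections, as_of):
    """Return (tag, scheduled, performed, days deferred) for late or outstanding inspections."""
    events = []
    for insp in inspections:
        if insp.scheduled > as_of:
            continue
        end = insp.performed if insp.performed is not None else as_of
        days = (end - insp.scheduled).days
        if days > 0:
            events.append((insp.asset_tag, insp.scheduled, insp.performed, days))
    return events


def corrective_action_rate(findings, as_of):
    """Share of findings due by as_of that were closed within the response interval."""
    due = [f for f in findings if f.due <= as_of]
    if not due:
        return None
    in_time = [f for f in due if f.closed is not None and f.closed <= f.due]
    return len(in_time) / len(due)


as_of = date(2024, 6, 30)
inspections = [
    Inspection("P-101A", date(2024, 1, 15), date(2024, 1, 15)),
    Inspection("P-101A", date(2024, 4, 15), date(2024, 5, 5)),
    Inspection("K-201", date(2024, 3, 1), None),
]
findings = [
    Finding("P-101A", date(2024, 2, 15), date(2024, 2, 10)),
    Finding("K-201", date(2024, 5, 1), None),
]
rates = completion_rate_by_tag(inspections, as_of)
deferred = deferral_events(inspections, as_of)
ca_rate = corrective_action_rate(findings, as_of)
```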

For PHA update purposes, the manufacturing engineer needs both the completion history and the finding history. An asset that completes inspections on schedule but consistently produces minor findings over multiple cycles has a different failure mode profile than one that produces intermittent major findings. That difference should be reflected in the failure rate and detection columns of the HAZOP analysis.

Continuous monitoring as a complement to scheduled inspection:

Scheduled inspections provide point-in-time condition data. Continuous vibration and temperature monitoring provides the between-inspection operating-load condition stream. For PSM mechanical integrity documentation, both contribute: the scheduled inspection satisfies the procedural requirement, and the continuous monitoring record provides the failure mode evidence base that makes PHA assumptions defensible.

When continuous monitoring data identifies a developing fault between inspection cycles, the finding also supports the PHA assumption that the detection method is effective. If the monitoring system identified a bearing fault 47 days before the projected failure, that is a data point validating the detection reliability assumed in the HAZOP node. Over time, this type of data improves PHA quality because it replaces assumed detection probabilities with plant-demonstrated detection performance.

Unplanned Shutdown Frequency by Root Cause Category

Unplanned shutdown frequency as a single number has limited engineering utility. What matters for process reliability analysis is the distribution of root cause categories and the trend within each category over time.

Root cause category framework for chemical process environments:

Equipment failure, rotating equipment. This is the category most directly within the scope of condition monitoring improvement. If, for example, 60% of unplanned shutdown events trace to centrifugal pump, compressor, or agitator failure modes, the process reliability improvement pathway is asset health monitoring and maintenance execution quality on those specific equipment classes.

Equipment failure, static equipment. Vessel nozzle leaks, heat exchanger tube failures, and valve failures that initiate process shutdowns trace to corrosion, erosion, and pressure cycling damage mechanisms. This category has a different inspection methodology pathway than rotating equipment failure and requires different data.

Process excursion. These are shutdowns initiated by process parameter deviation, such as high-high temperature or pressure trips, that trace to a process control failure, feedstock quality variation, or reaction kinetics anomaly rather than a mechanical failure. RCA for this category routes through process engineering scope, not maintenance scope.

Utility failure. Loss of steam, cooling water, instrument air, or power that initiates a process safety shutdown. Utility reliability analysis is a separate engineering workstream.

Correctly classifying events into these categories is an engineering function, not an administrative one. A DCS high-high trip event may appear as a process excursion in the control system log but trace to a pump cavitation event that caused a process parameter exceedance. If it is classified as a process excursion without rotating equipment investigation, the RCA will not identify the underlying equipment failure mode. The manufacturing engineer reviewing root cause classification contributes process knowledge that a maintenance-only review will miss.

Target frequency thresholds for each category:

Root Cause Category	Target Frequency	Investigation Trigger
Rotating equipment failure	0 to 1 events per quarter, non-redundant assets	Any second event within 180 days on same equipment class
Static equipment failure	0 to 1 events per half-year	Any event with process fluid release requiring PSM incident review
Process excursion (equipment-initiated)	0 events	Any event, because a process trip triggered by equipment degradation is a near-miss reliability event
Utility failure	0 events attributed to utility infrastructure reliability	Distinguish external grid events from internal utility system failures
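
The category distribution and the 180-day repeat-event trigger for rotating equipment can be checked with a short script. This is a sketch under assumptions: the Shutdown record, the category labels, and the event dates are invented, and each event is assumed to have been classified by an engineer before it reaches this code.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Shutdown:
    day: date
    category: str         # "rotating", "static", "process_excursion", or "utility"
    equipment_class: str  # e.g. "centrifugal_pump"; empty if not applicable


def category_shares(shutdowns):
    """Return {category: share of all unplanned shutdown events}."""
    counts = Counter(s.category for s in shutdowns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}


def repeat_event_classes(shutdowns, window_days=180):
    """Equipment classes with two rotating equipment events within window_days."""
    by_class = defaultdict(list)
    for s in shutdowns:
        if s.category == "rotating":
            by_class[s.equipment_class].append(s.day)
    flagged = set()
    for eq_class, days in by_class.items():
        days.sort()
        # Consecutive events are enough: the closest pair is always adjacent.
        for earlier, later in zip(days, days[1:]):
            if (later - earlier).days <= window_days:
                flagged.add(eq_class)
    return flagged


# Invented event history.
shutdowns = [
    Shutdown(date(2024, 1, 10), "rotating", "centrifugal_pump"),
    Shutdown(date(2024, 5, 20), "rotating", "centrifugal_pump"),
    Shutdown(date(2024, 1, 5), "rotating", "compressor"),
    Shutdown(date(2024, 11, 1), "rotating", "compressor"),
    Shutdown(date(2024, 7, 3), "utility", ""),
]
shares = category_shares(shutdowns)
flagged = repeat_event_classes(shutdowns)
print(shares, flagged)
```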

From Lagging to Leading: How Continuous Monitoring Changes Each Metric

The four metrics above are lagging in their traditional form. They record what happened. The value of continuous asset health monitoring for a manufacturing engineer is that it converts each metric from a historical record into a forward-looking signal.

Process availability: from recording events to predicting them. When continuous vibration and temperature data is available on non-redundant process-critical assets, a developing bearing fault or impeller wear pattern becomes visible weeks to months before failure. The process availability metric shifts from recording the last unplanned event to tracking the health margin on the assets most likely to initiate the next one.

MTBF: from calculating averages to tracking degradation rates. Traditional MTBF calculation requires sufficient failure history to produce a statistically meaningful average. For low-frequency failure events on critical assets, waiting for enough failures to calculate a reliable MTBF is not an acceptable engineering methodology. Continuous monitoring provides a degradation rate for each asset, allowing a remaining useful life estimate that does not depend on historical failure frequency. The metric becomes prospective.
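
One simple way to picture a degradation-rate estimate is a straight-line fit through a trended health indicator, extrapolated to an intervention threshold. The sketch below assumes a linear trend, which real degradation often does not follow, and it is not a description of how any particular monitoring product calculates remaining useful life. The vibration values and the 7.1 mm/s threshold are hypothetical.

```python
def remaining_useful_life_days(days, values, threshold):
    """Estimate days until a linear trend through (days, values) reaches threshold.

    Uses an ordinary least-squares line. Returns None when the trend is flat
    or improving, and 0.0 when the fitted line has already crossed the threshold.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        return None  # all readings taken on the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx                    # degradation rate per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical vibration velocity trend (mm/s RMS) on one pump bearing.
days = [0, 10, 20, 30]
values = [2.0, 2.5, 3.0, 3.5]
rul = remaining_useful_life_days(days, values, threshold=7.1)
print(f"Estimated days to threshold: {rul:.0f}")
```

The point of the example is the shape of the calculation: it needs a trend on one asset, not a population of past failures.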

PSM inspection rate: from compliance confirmation to failure mode database construction. Each condition monitoring alert that is later confirmed by an inspection finding is a validated detection event. Over time, these events build a plant-specific database of actual failure mode onset timing and detection lead time that is more defensible in a PHA than the generic industry database assumptions the process engineer would otherwise have to use.
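
A plant-specific detection database can start as a list of confirmed alerts grouped by equipment class, service, and failure mode. In this sketch, lead time is measured from the first alert to the confirming inspection or maintenance finding, which is an assumption of the example: time to projected failure is an estimate, while these two dates are observed. The record layout and all entries are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class ConfirmedAlert:
    equipment_class: str
    service: str
    failure_mode: str
    alert_day: date      # first monitoring alert
    confirmed_day: date  # finding that confirmed the fault


def lead_time_summary(alerts):
    """Return {(class, service, mode): (confirmed detections, median lead time in days)}."""
    groups = defaultdict(list)
    for a in alerts:
        key = (a.equipment_class, a.service, a.failure_mode)
        groups[key].append((a.confirmed_day - a.alert_day).days)
    return {key: (len(days), median(days)) for key, days in groups.items()}


# Invented records for illustration.
alerts = [
    ConfirmedAlert("centrifugal_pump", "corrosive", "bearing_wear",
                   date(2024, 1, 1), date(2024, 1, 31)),
    ConfirmedAlert("centrifugal_pump", "corrosive", "bearing_wear",
                   date(2024, 3, 1), date(2024, 4, 10)),
    ConfirmedAlert("centrifugal_pump", "corrosive", "bearing_wear",
                   date(2024, 6, 1), date(2024, 7, 21)),
    ConfirmedAlert("compressor", "charge_gas", "imbalance",
                   date(2024, 2, 1), date(2024, 2, 22)),
]
summary = lead_time_summary(alerts)
for key, (count, lead) in summary.items():
    print(key, count, lead)
```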

Unplanned shutdown frequency: from post-event classification to pre-event identification. If the monitoring system identifies a developing fault on a critical pump 30 days before it would cause a process trip, the root cause category for that potential event is already known before the event occurs. The classification work happens during the maintenance intervention rather than after the failure event.

This shift from lagging to leading is the engineering case for continuous condition monitoring in chemical process environments. It is not a maintenance argument about faster response to faults. It is a process engineering argument about converting unavoidable uncertainty about equipment failure timing into a manageable engineering variable with a quantifiable intervention window.

KPI Reference Table for Chemical Process Engineering Scope

Metric	Tracked At	Review Frequency	World Class	Needs Investigation
Process availability	Unit or train level	Monthly, with event-level review	98%+ continuous, 95%+ batch campaign completion	Any unplanned TAR event; below 95% continuous
MTBF, critical rotating equipment	Individual asset, by service class	90-day rolling trend	Stable or improving trend	Declining over 60 days
PSM inspection completion rate	Per asset tag, PSM-covered equipment	Quarterly	100% on schedule	Any deferral on a covered item without documented justification
PSM corrective action completion	Per finding, by severity tier	Monthly	100% within defined interval by severity	Any overdue corrective action on high-severity finding
Unplanned shutdown frequency	By root cause category	Monthly	0 events rotating equipment, unit level	Any second event within 180 days on same equipment class
Planned maintenance ratio	By unit	Monthly	80%+	Below 60%

How Tractian Provides the Data These Metrics Require

The metrics in this guide require continuous operating-load asset health data. Tractian delivers that data in classified chemical process environments.

Tractian deploys ATEX/UL/CSA-certified sensors on non-redundant process-critical rotating equipment in classified chemical process areas. Continuous vibration spectrum, temperature, and operational parameter data is collected at operating load conditions, not during shutdown states.

For manufacturing engineers supporting PHA and HAZOP updates, Tractian's monitoring record provides the plant-specific failure mode frequency data that replaces generic industry database assumptions. Each monitoring alert with a confirmed maintenance finding is a data point in the plant-specific failure mode database: this equipment class, in this service, at this site, exhibits this failure mode at this frequency with this detection lead time. That data is the basis for defensible PHA failure rate and detection reliability assumptions.

For process availability tracking, Tractian's asset health trend data provides the leading indicator that converts availability from a lagging event record into a forward-looking metric. A developing fault identified 45 days before the projected failure event is an availability risk that can be managed with planned maintenance. The same fault discovered at failure is an unplanned process shutdown.

For turnaround scope engineering, Tractian's inter-TAR health trend data provides the degradation rate evidence that supports condition-based scope decisions. The manufacturing engineer who brings 18 months of continuous monitoring data into a TAR planning review is making scope decisions from actual asset health evidence, not calendar age assumptions.

What is the primary KPI for a manufacturing engineer in a continuous chemical process?

Process availability by unit or train is the primary metric. In continuous chemical operations, a failure on a non-redundant centrifugal pump or compressor initiates an unplanned process shutdown, stopping all optimization work and resetting any reliability gains. MTBF on non-redundant rotating equipment by service class is the main indicator of where process availability is heading.

How should MTBF be tracked in a chemical process environment?

MTBF must be tracked by rotating equipment service class, not as a facility-wide average. Centrifugal pumps in critical process service, compressors, agitators, and heat exchanger drivers each have different failure mode profiles and consequence severity. Averaging across classes obscures the declining trend on a non-redundant asset before it forces an unplanned process shutdown.

How does continuous monitoring change inspection KPIs from lagging to leading?

Traditional inspection completion rate confirms that a scheduled activity occurred but does not indicate equipment condition between inspections. Continuous vibration and temperature monitoring converts each asset's health into a real-time stream. The KPI shifts from whether the inspection occurred to what the current degradation rate is and when it will cross the intervention threshold.

What does unplanned shutdown frequency by root cause category reveal?

Classifying unplanned shutdown events by root cause, specifically equipment failure versus process excursion versus utility failure, directs process reliability analysis to the right system. If the majority of events trace to rotating equipment failure modes, that is an asset health monitoring problem, not a process parameter problem. Misclassifying root cause leads to process design interventions that do not address the actual failure driver.

How does PSM mechanical integrity inspection rate connect to manufacturing engineer responsibilities?

Under OSHA PSM 29 CFR 1910.119(j), mechanical integrity inspection schedules for pressure vessels and rotating equipment in highly hazardous chemical (HHC) service are a compliance requirement. The manufacturing engineer who participates in PHA and HAZOP updates needs accurate inspection history to validate the failure rates and maintenance intervals assumed for covered equipment. Inspection completion rate is therefore both a compliance metric and a process engineering data input.

Why is process availability different from OEE availability in chemical manufacturing?

OEE availability in discrete manufacturing measures the share of scheduled time a machine is running. In continuous chemical processes, availability is the state of the entire process stream. When a non-redundant pump fails, the entire unit train goes down. The manufacturing engineer's frame is process stream availability because failure consequences are process-wide, not isolated to a single workstation.

What role does MTBF trend data play in supporting HAZOP and PHA updates?

HAZOP and process FMEA failure rate assumptions for rotating equipment default to industry generic data when plant-specific history is unavailable. Continuous monitoring creates a plant-specific failure mode frequency database. A manufacturing engineer updating a PHA node for a centrifugal pump in corrosive service can use actual pump MTBF history from that service rather than assuming a generic average.

How does turnaround interval tracking serve as a manufacturing engineer KPI?

The interval between planned turnarounds is the operating window within which CI projects must be implemented. An unplanned turnaround shortens that window and disrupts the engineering work program. Tracking achieved operating hours versus planned TAR interval by unit gives the manufacturing engineer a direct measure of process reliability against the planning assumption underpinning the CI project roadmap.
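
The comparison described here reduces to a ratio per unit. The sketch below is only an illustration: the function name, the unit names, and the hours are made up, and it assumes operating hours are counted from the end of the last planned turnaround to the point where the run ended.

```python
def tar_interval_achievement(achieved_operating_h, planned_interval_h):
    """Fraction of the planned inter-turnaround operating window actually achieved."""
    if planned_interval_h <= 0:
        raise ValueError("planned interval must be positive")
    return achieved_operating_h / planned_interval_h


# Hypothetical units with a planned 4-year (35,040 h) TAR interval.
planned_h = 35040.0
achieved_by_unit = {"Unit-100": 35040.0, "Unit-200": 30660.0}
achievement = {
    unit: tar_interval_achievement(hours, planned_h)
    for unit, hours in achieved_by_unit.items()
}
for unit, ratio in achievement.items():
    print(f"{unit}: {ratio:.1%} of planned interval")
```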

What Are the Key Metrics for a Manufacturing Engineer in Chemical Manufacturing?