What Are the Key Metrics for a Manufacturing Engineer in Chemical Manufacturing?
The process you are responsible for optimizing cannot be optimized if it is not running. In a continuous chemical plant, an unplanned shutdown on a non-redundant centrifugal pump or compressor is not an inconvenience for another department. It is a process-wide event that stops your unit train, destroys any in-process material, and resets the reliability baseline you have been building toward your next process improvement milestone.
Manufacturing engineers in chemical plants work at the boundary between process design and asset reliability. The process availability that your optimization projects require is delivered by the mechanical integrity of the rotating equipment your process depends on. The process hazard analysis (PHA) and hazard and operability (HAZOP) study updates you support require failure rate data that generic industry databases do not supply for your specific plant, service conditions, and operating history. The turnaround scope that determines your next operating window is partly a process engineering decision, not just a maintenance decision.
The metrics that belong in your engineering review are not the same as the maintenance team's work order completion rates. They are process availability by unit, mean time between failures (MTBF) by rotating equipment service class, inspection adherence against the Process Safety Management (PSM) schedule, and unplanned shutdown frequency classified by root cause. Each has a direct connection to your process reliability scope, and each changes character when continuous asset health data becomes available.
- What Most Manufacturing Engineers Get Wrong About KPIs in Chemical Manufacturing
- Process Availability: The Primary Engineering Metric
- MTBF by Rotating Equipment Service Class
- PSM Inspection Completion Rate as a Process Engineering Input
- Unplanned Shutdown Frequency by Root Cause Category
- From Lagging to Leading: How Continuous Monitoring Changes Each Metric
- KPI Reference Table for Chemical Process Engineering Scope
- How Tractian Provides the Data These Metrics Require
What Most Manufacturing Engineers Get Wrong About KPIs in Chemical Manufacturing
The data that chemical plants track most readily is not the data with the highest consequence for process reliability.
Three specific measurement gaps create the most engineering exposure in chemical process environments:
Treating maintenance KPIs as belonging to a different department. Planned maintenance ratio and MTBF averages are often tracked exclusively by maintenance teams, reviewed in maintenance meetings, and absent from process engineering scope. In a continuous chemical process, that organizational separation is a reliability liability. The failure modes that will initiate your next unplanned process shutdown are developing in assets you can see in your P&ID right now. The metrics tracking those failure modes should be in your engineering review.
Using plant-wide MTBF averages that obscure individual asset trends. Averaging MTBF across all rotating equipment in a unit produces a metric that looks stable while one non-redundant process-critical pump trends toward failure. Centrifugal pumps in corrosive or high-temperature service have fundamentally different failure mode profiles than general utility service pumps. Tracking them together tells you nothing actionable about either.
Relying on scheduled inspection data for process risk assessment when continuous operating-load data is available. Scheduled inspections confirm what the asset looks like when it is cold and accessible. The bearing degradation, impeller cavitation wear, and seal face deterioration that drive failure in continuous chemical service develop under operating load, between inspection cycles. A manufacturing engineer relying on inspection records alone for PHA failure rate validation is working with data that reflects asset condition during shutdown states, not the state that actually produces the failure mode.
The corrective is not more metrics. It is four specific metrics, each tracked at the right granularity, each connected to its process engineering consequence.
Process Availability: The Primary Engineering Metric
In continuous chemical manufacturing, process availability is not one component of an OEE calculation. It is the state of the process stream.
When a non-redundant centrifugal pump in critical process service fails, the result is not a reduced throughput rate. It is a process shutdown, affecting the entire unit train, with a restart sequence that may take hours to days depending on process conditions and safety requirements. The yield loss is not proportional to downtime minutes. It includes the entire batch or in-process volume at time of failure, the startup transient, and any quality qualification period before the process returns to specification.
For a manufacturing engineer, process availability by unit or train is the engineering metric because it is the envelope within which all other work occurs. Process optimization, HAZOP update cycles, equipment specification projects, and continuous improvement (CI) initiatives all require a running process. Each unplanned shutdown is both a direct production loss and a disruption to the engineering work program.
How to track it correctly:
- Track availability by individual unit train or process section, not by facility. A facility availability number of 97% can mask a unit train that experienced multiple events, each representing full unit downtime rather than proportional losses.
- Record each availability event with a duration and an initiating equipment tag. Over a 12-month period, this produces the event frequency and consequence data needed for a meaningful FMEA update.
- Distinguish between process excursion events and equipment-initiated events. A shutdown triggered by a process parameter exceedance has a different RCA pathway than one triggered by a rotating equipment failure, even if both appear as availability losses in the same metric.
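The tracking rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `ShutdownEvent` record, its field names, the unit names, and the equipment tags are all hypothetical, and availability is assumed to be simply (period hours minus lost hours) divided by period hours.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ShutdownEvent:
    unit: str           # unit train or process section (hypothetical names)
    equipment_tag: str  # initiating equipment tag, "" if none
    duration_h: float   # hours of lost availability
    initiator: str      # "equipment" or "process_excursion"


def availability_by_unit(events, period_h, units):
    """Return {unit: availability fraction} for a period of period_h hours."""
    lost = defaultdict(float)
    for e in events:
        lost[e.unit] += e.duration_h
    # Units with no recorded events come out at 1.0.
    return {u: (period_h - lost[u]) / period_h for u in units}


# Illustrative data only: one quarter (90 days) across three trains.
events = [
    ShutdownEvent("Train A", "P-101A", 36.0, "equipment"),
    ShutdownEvent("Train A", "", 12.0, "process_excursion"),
    ShutdownEvent("Train B", "K-201", 6.0, "equipment"),
]
period_h = 90 * 24
result = availability_by_unit(events, period_h, ["Train A", "Train B", "Train C"])
equipment_initiated = [e for e in events if e.initiator == "equipment"]
```

Keeping the initiating tag and the initiator type on every event record is what allows the same data set to feed both the availability figure and the later root cause review.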
Benchmarks for continuous chemical operations:
| Process Type | World Class | Acceptable | Requires Investigation |
|---|---|---|---|
| Continuous petrochemical | 98%+ inter-TAR availability | 95 to 97.9% | Below 95% or any unplanned TAR event |
| Specialty chemical batch | 95%+ campaign completion rate | 88 to 94.9% | Below 88% or multiple mid-campaign events |
| Continuous pharma or fine chemical | 96%+ per campaign or quarter | 92 to 95.9% | Below 92% |
MTBF by Rotating Equipment Service Class
MTBF is the primary leading indicator of process availability trajectory, but only when tracked at the right level of resolution.
In a chemical process environment, the relevant categorization is not asset type alone. It is asset type combined with service class, because the failure mode distribution and consequence severity differ substantially between service classes.
Centrifugal pumps: critical process service versus general service. A centrifugal pump transferring a corrosive process fluid at high temperature under continuous duty has a fundamentally different failure mode profile than a utility water pump. Seal face wear rates, impeller cavitation susceptibility, and bearing life under chemical exposure differ by service. Tracking these together produces an MTBF number that underrepresents the risk on the critical service asset.
Critical process service classification criteria for chemical plants:
- Handles a fluid covered by PSM (highly hazardous chemical) at process conditions
- Non-redundant: no installed spare or automatic switchover
- Failure initiates a unit-level or plant-level process shutdown
- Downstream process cannot continue safely at reduced flow
Assets meeting all four criteria need individual MTBF tracking, with trend review on a 90-day rolling basis.
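As a sketch, the four criteria reduce to a single boolean test per asset. The `Asset` record and its field names below are hypothetical; a real asset register would hold these attributes under its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    tag: str
    handles_psm_fluid: bool     # PSM-covered fluid at process conditions
    has_installed_spare: bool   # installed spare or automatic switchover
    failure_trips_unit: bool    # failure initiates a unit- or plant-level shutdown
    safe_at_reduced_flow: bool  # downstream can continue safely at reduced flow


def is_critical_process_service(a: Asset) -> bool:
    """True only when all four classification criteria are met."""
    return (
        a.handles_psm_fluid
        and not a.has_installed_spare
        and a.failure_trips_unit
        and not a.safe_at_reduced_flow
    )


# Illustrative assets only.
critical = is_critical_process_service(Asset("P-101", True, False, True, False))
spared = is_critical_process_service(Asset("P-102", True, True, True, False))
```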
Compressors in process service. For continuous gas processing, steam cracking, or any compression-dependent reaction pathway, compressor reliability is the single largest rotating equipment risk. A charge gas compressor trip in a steam cracker is a plant-wide event. MTBF should be tracked at the individual machine level, with vibration trend data reviewed at a higher frequency than the normal maintenance review cycle.
Agitators. In batch chemical and pharmaceutical operations, agitator failure during a batch destroys the batch and its full material value. MTBF tracking for agitators needs to include gearbox and motor drive components, not just the shaft and impeller, because the most common failure modes in high-viscosity chemical service occur in the drive train.
Heat exchanger drivers. Shell-and-tube heat exchanger pump and fan drivers often receive less monitoring attention than reaction and compression equipment, but fouling-induced overload and elevated temperature service create failure mode dynamics that are worth tracking by service class separately from general rotating equipment.
MTBF trend interpretation:
A stable MTBF over a 90-day window is acceptable. An improving MTBF trend reflects effective condition-based maintenance execution. A declining trend over 60 or more days on a non-redundant process-critical asset is a process availability risk event, not a maintenance scheduling observation. It warrants RCA initiation, not just a maintenance work order.
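One possible way to apply the 60-day rule is sketched below. It assumes MTBF is computed as operating hours divided by failure count, and that the rolling MTBF is sampled at dated intervals; "declining" is read here as a trailing run of strictly decreasing samples, which is an assumption rather than a definition taken from this guide. Function names and sample values are hypothetical.

```python
from datetime import date


def mtbf_hours(operating_hours: float, failures: int) -> float:
    """Mean time between failures as operating hours per failure."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


def declining_days(samples):
    """Days spanned by the trailing run of strictly decreasing MTBF samples.

    samples: list of (date, mtbf) tuples sorted by date. The span is measured
    from the sample where the current decline began to the latest sample.
    """
    if len(samples) < 2:
        return 0
    i = len(samples) - 1
    while i > 0 and samples[i][1] < samples[i - 1][1]:
        i -= 1
    return (samples[-1][0] - samples[i][0]).days


def needs_rca(samples, non_redundant_critical: bool, min_days: int = 60) -> bool:
    """Flag a declining trend of min_days or more on a critical asset."""
    return non_redundant_critical and declining_days(samples) >= min_days


# Illustrative rolling MTBF samples (hours) for a single asset.
samples = [
    (date(2024, 1, 1), 4000.0),
    (date(2024, 2, 1), 4100.0),
    (date(2024, 3, 1), 3900.0),
    (date(2024, 4, 1), 3600.0),
    (date(2024, 5, 1), 3300.0),
]
flag = needs_rca(samples, non_redundant_critical=True)
```

A strictly decreasing run is a deliberately simple test; a plant with noisy MTBF samples might prefer a fitted slope over the window instead.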
The value of MTBF tracking for a manufacturing engineer extends beyond availability management. It produces the plant-specific failure frequency data that supports PHA failure rate validation, FMEA maintenance column updates, and turnaround scope engineering decisions.
PSM Inspection Completion Rate as a Process Engineering Input
Under OSHA PSM 29 CFR 1910.119(j), facilities handling highly hazardous chemicals must maintain documented mechanical integrity programs that include written procedures, qualified inspection performance, frequency conformance, and corrective action documentation for covered equipment.
The manufacturing engineer's connection to this requirement is the PHA and HAZOP update cycle. HAZOP and process FMEA for covered equipment include failure rate assumptions, detection method reliability assessments, and maintenance interval justifications for rotating equipment in the HAZOP nodes. When those assumptions default to industry generic databases because plant-specific inspection history is incomplete or not systematically documented, the process risk assessment is built on data that does not reflect actual plant operating conditions.
What the metric should capture:
- Inspection completion rate against scheduled intervals, per asset tag, for all PSM-covered rotating equipment
- Inspection deferral events: instances where a scheduled inspection was postponed, with the actual inspection date and duration of deferral recorded
- Corrective action completion rate: of deficiencies identified during inspections, what percentage received corrective action within the defined response interval
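The three quantities above could be computed along the following lines. This is a sketch under stated assumptions: the `Inspection` and `Finding` records and their fields are hypothetical, an inspection counts as complete only if performed on or before its due date, and an open finding counts as incomplete even if its response interval has not yet expired.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Inspection:
    asset_tag: str
    due: date
    performed: Optional[date]  # None if not yet performed


@dataclass
class Finding:
    asset_tag: str
    raised: date
    closed: Optional[date]     # None if still open
    response_days: int         # defined response interval for this finding


def completion_rate(inspections, as_of: date) -> float:
    """Share of inspections due by as_of that were performed on schedule."""
    due = [i for i in inspections if i.due <= as_of]
    if not due:
        return 1.0
    on_time = sum(1 for i in due if i.performed is not None and i.performed <= i.due)
    return on_time / len(due)


def deferrals(inspections):
    """(asset_tag, days deferred) for inspections performed after the due date."""
    return [
        (i.asset_tag, (i.performed - i.due).days)
        for i in inspections
        if i.performed is not None and i.performed > i.due
    ]


def corrective_action_rate(findings) -> float:
    """Share of findings closed within their response interval (open = incomplete)."""
    if not findings:
        return 1.0
    ok = sum(
        1 for f in findings
        if f.closed is not None and (f.closed - f.raised).days <= f.response_days
    )
    return ok / len(findings)


# Illustrative records only.
inspections = [
    Inspection("P-101", date(2024, 3, 1), date(2024, 2, 27)),
    Inspection("P-102", date(2024, 3, 1), date(2024, 3, 15)),
    Inspection("K-201", date(2024, 3, 10), None),
]
findings = [
    Finding("P-102", date(2024, 3, 15), date(2024, 3, 25), 14),
    Finding("P-102", date(2024, 3, 15), None, 30),
]
rate = completion_rate(inspections, as_of=date(2024, 4, 1))
late = deferrals(inspections)
ca_rate = corrective_action_rate(findings)
```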
For PHA update purposes, the manufacturing engineer needs both the completion history and the finding history. An asset that completes inspections on schedule but consistently produces minor finding categories over multiple cycles has a different failure mode profile than one that produces intermittent major findings. That difference should be reflected in the failure rate and detection column of the HAZOP analysis.
Continuous monitoring as a complement to scheduled inspection:
Scheduled inspections provide point-in-time condition data. Continuous vibration and temperature monitoring provides the between-inspection operating-load condition stream. For PSM mechanical integrity documentation, both contribute: the scheduled inspection satisfies the procedural requirement, and the continuous monitoring record provides the failure mode evidence base that makes PHA assumptions defensible.
When continuous monitoring data identifies a developing fault between inspection cycles, the finding also supports the PHA assumption that the detection method is effective. If the monitoring system identified a bearing fault 47 days before failure would have occurred, that is a data point that validates the detection reliability assumed in the HAZOP node. Over time, this type of data improves PHA quality because it replaces assumed detection probabilities with plant-demonstrated detection performance.
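A plant-demonstrated detection record of this kind can be summarized very simply. The sketch below assumes each confirmed fault is logged with the number of days of warning the monitoring system gave, or `None` when the fault was not detected before failure; the function name and the sample values are illustrative, and the lead times themselves are estimates because the failure date of a repaired asset is never observed.

```python
from statistics import median


def detection_summary(lead_days):
    """Summarize detection performance over confirmed faults.

    lead_days: one entry per confirmed fault, holding the estimated days of
    warning before failure, or None if the fault was missed.
    """
    detected = [d for d in lead_days if d is not None]
    return {
        "faults": len(lead_days),
        "detected_fraction": len(detected) / len(lead_days) if lead_days else None,
        "median_lead_days": median(detected) if detected else None,
    }


# Illustrative log: three faults detected in advance, one missed.
summary = detection_summary([47, 30, None, 45])
```

With only a handful of events the detected fraction is a weak estimate, so it is better read as supporting evidence for a HAZOP node than as a measured probability.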
Unplanned Shutdown Frequency by Root Cause Category
Unplanned shutdown frequency as a single number has limited engineering utility. What matters for process reliability analysis is the distribution of root cause categories and the trend within each category over time.
Root cause category framework for chemical process environments:
Equipment failure, rotating equipment. This is the category most directly within the scope of condition monitoring improvement. If 60% of unplanned shutdown events trace to centrifugal pump, compressor, or agitator failure modes, the process reliability improvement pathway is asset health monitoring and maintenance execution quality on those specific equipment classes.
Equipment failure, static equipment. Vessel nozzle leaks, heat exchanger tube failures, and valve failures that initiate process shutdowns trace to corrosion, erosion, and pressure cycling damage mechanisms. This category has a different inspection methodology pathway than rotating equipment failure and requires different data.
Process excursion. Shutdowns initiated by process parameter deviation, such as high-high temperature or pressure trips, that trace to a process control failure, feedstock quality variation, or reaction kinetics anomaly rather than a mechanical failure. RCA for this category routes through process engineering scope, not maintenance scope.
Utility failure. Loss of steam, cooling water, instrument air, or power that initiates a process safety shutdown. Utility reliability analysis is a separate engineering workstream.
Correctly classifying events into these categories is an engineering function, not an administrative one. A DCS high-high trip event may appear as a process excursion in the control system log but trace to a pump cavitation event that caused a process parameter exceedance. If it is classified as a process excursion without rotating equipment investigation, the RCA will not identify the underlying equipment failure mode. The manufacturing engineer reviewing root cause classification contributes process knowledge that a maintenance-only review will miss.
Target frequency thresholds for each category:
| Root Cause Category | Target Frequency | Investigation Trigger |
|---|---|---|
| Rotating equipment failure | 0 to 1 events per quarter, non-redundant assets | Any second event within 180 days on same equipment class |
| Static equipment failure | 0 to 1 events per half-year | Any event with process fluid release requiring PSM incident review |
| Process excursion (equipment-initiated) | 0 events | Any event, because a process trip triggered by equipment degradation is a near-miss reliability event |
| Utility failure | 0 events attributed to utility infrastructure reliability | Distinguish external grid events from internal utility system failures |
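The category distribution and the rotating equipment investigation trigger from the table can be checked mechanically once events are classified. The sketch below assumes each event is recorded as a (date, category, equipment class) tuple; the category labels and the sample events are hypothetical, and "within 180 days" is read as a gap of 180 days or fewer between consecutive events.

```python
from collections import Counter, defaultdict
from datetime import date


def category_distribution(events):
    """Fraction of unplanned shutdown events in each root cause category."""
    counts = Counter(category for _, category, _ in events)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def repeat_triggers(events, window_days: int = 180):
    """Equipment classes with two rotating equipment events within window_days."""
    by_class = defaultdict(list)
    for when, category, equipment_class in events:
        if category == "rotating_equipment":
            by_class[equipment_class].append(when)
    flagged = set()
    for equipment_class, dates in by_class.items():
        dates.sort()
        # Compare each event with the next one in time for that class.
        for earlier, later in zip(dates, dates[1:]):
            if (later - earlier).days <= window_days:
                flagged.add(equipment_class)
    return flagged


# Illustrative events only.
events = [
    (date(2024, 1, 10), "rotating_equipment", "centrifugal pump"),
    (date(2024, 5, 20), "rotating_equipment", "centrifugal pump"),
    (date(2024, 2, 1), "utility", "instrument air"),
    (date(2024, 3, 1), "rotating_equipment", "compressor"),
    (date(2024, 11, 15), "rotating_equipment", "compressor"),
]
distribution = category_distribution(events)
flagged = repeat_triggers(events)
```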
From Lagging to Leading: How Continuous Monitoring Changes Each Metric
The four metrics above are lagging in their traditional form. They record what happened. The value of continuous asset health monitoring for a manufacturing engineer is that it converts each metric from a historical record into a forward-looking signal.
Process availability: from recording events to predicting them. When continuous vibration and temperature data is available on non-redundant process-critical assets, a developing bearing fault or impeller wear pattern becomes visible weeks to months before failure. The process availability metric shifts from recording the last unplanned event to tracking the health margin on the assets most likely to initiate the next one.
MTBF: from calculating averages to tracking degradation rates. Traditional MTBF calculation requires sufficient failure history to produce a statistically meaningful average. For low-frequency failure events on critical assets, waiting for enough failures to calculate a reliable MTBF is not an acceptable engineering methodology. Continuous monitoring provides a degradation rate for each asset, allowing a remaining useful life estimate that does not depend on historical failure frequency. The metric becomes prospective.
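To make the idea of a degradation rate concrete, the sketch below fits a straight line to a condition indicator and extrapolates to an intervention threshold. It is a deliberately simplified illustration: real degradation is rarely linear, the function name, indicator values, and threshold are hypothetical, and this is not a description of how any particular monitoring product estimates remaining useful life.

```python
def remaining_useful_life_days(days, values, threshold):
    """Days from the last sample until a least-squares line reaches threshold.

    days:   sample times in days, ascending
    values: condition indicator at each time (higher means worse)
    Returns None when the fitted trend is flat or improving.
    """
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Never report a negative value if the threshold is already exceeded.
    return max(0.0, crossing_day - days[-1])


# Illustrative vibration trend (arbitrary velocity units) and threshold.
rul = remaining_useful_life_days(
    days=[0, 10, 20, 30],
    values=[2.0, 2.5, 3.0, 3.5],
    threshold=7.0,
)
```

The point of the example is the shape of the calculation: the estimate depends on the asset's own trend and an intervention threshold, not on a count of historical failures.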
PSM inspection rate: from compliance confirmation to failure mode database construction. Each condition monitoring alert that precedes a scheduled inspection finding validates a detection event. Over time, this creates a plant-specific database of actual failure mode onset timing and detection lead time that is more defensible in a PHA than the generic industry database assumptions that the process engineer is otherwise required to use.
Unplanned shutdown frequency: from post-event classification to pre-event identification. If the monitoring system identifies a developing fault on a critical pump 30 days before it would cause a process trip, the root cause category for that potential event is already known before the event occurs. The classification work happens during the maintenance intervention rather than after the failure event.
This shift from lagging to leading is the engineering case for continuous condition monitoring in chemical process environments. It is not a maintenance argument about faster response to faults. It is a process engineering argument about converting unavoidable uncertainty about equipment failure timing into a manageable engineering variable with a quantifiable intervention window.
KPI Reference Table for Chemical Process Engineering Scope
| Metric | Tracked At | Review Frequency | World Class | Needs Investigation |
|---|---|---|---|---|
| Process availability | Unit or train level | Monthly, with event-level review | 98%+ continuous, 95%+ batch campaign completion | Any unplanned TAR event; below 95% continuous |
| MTBF, critical rotating equipment | Individual asset, by service class | 90-day rolling trend | Stable or improving trend | Declining over 60 days |
| PSM inspection completion rate | Per asset tag, PSM-covered equipment | Quarterly | 100% on schedule | Any deferral on a covered item without documented justification |
| PSM corrective action completion | Per finding, by severity tier | Monthly | 100% within defined interval by severity | Any overdue corrective action on high-severity finding |
| Unplanned shutdown frequency | By root cause category | Monthly | 0 events rotating equipment, unit level | Any second event within 180 days on same equipment class |
| Planned maintenance ratio | By unit | Monthly | 80%+ | Below 60% |
How Tractian Provides the Data These Metrics Require
The metrics in this guide require continuous operating-load asset health data. Tractian delivers that data in certified chemical process environments.
Tractian deploys ATEX/UL/CSA-certified sensors on non-redundant process-critical rotating equipment in classified chemical process areas. Continuous vibration spectrum, temperature, and operational parameter data is collected at operating load conditions, not during shutdown states.
For manufacturing engineers supporting PHA and HAZOP updates, Tractian's monitoring record provides the plant-specific failure mode frequency data that replaces generic industry database assumptions. Each monitoring alert with a confirmed maintenance finding is a data point in the plant-specific failure mode database: this equipment class, in this service, at this site, exhibits this failure mode at this frequency with this detection lead time. That data is the basis for defensible PHA failure rate and detection reliability assumptions.
For process availability tracking, Tractian's asset health trend data provides the leading indicator that converts availability from a lagging event record into a forward-looking metric. A developing fault identified 45 days before the projected failure event is an availability risk that can be managed with planned maintenance. The same fault discovered at failure is an unplanned process shutdown.
For turnaround scope engineering, Tractian's inter-TAR health trend data provides the degradation rate evidence that supports condition-based scope decisions. The manufacturing engineer who brings 18 months of continuous monitoring data into a TAR planning review is making scope decisions from actual asset health evidence, not calendar age assumptions.
What is the primary KPI for a manufacturing engineer in a continuous chemical process?
Process availability by unit or train is the primary metric. In continuous chemical operations, a centrifugal pump or compressor failure initiates an unplanned process shutdown, stopping all optimization work and resetting any reliability gains. MTBF on non-redundant rotating equipment by service class is the leading indicator of process availability trajectory.
How should MTBF be tracked in a chemical process environment?
MTBF must be tracked by rotating equipment service class, not as a facility-wide average. Centrifugal pumps in critical process service, compressors, agitators, and heat exchanger drivers each have different failure mode profiles and consequence severity. Averaging across classes obscures the declining trend on a non-redundant asset before it forces an unplanned process shutdown.
How does continuous monitoring change inspection KPIs from lagging to leading?
Traditional inspection completion rate confirms that a scheduled activity occurred but does not indicate equipment condition between inspections. Continuous vibration and temperature monitoring converts each asset's health into a real-time stream. The KPI shifts from whether the inspection occurred to what the current degradation rate is and when it will cross the intervention threshold.
What does unplanned shutdown frequency by root cause category reveal?
Classifying unplanned shutdown events by root cause, specifically equipment failure versus process excursion versus utility failure, directs process reliability analysis to the right system. If the majority of events trace to rotating equipment failure modes, that is an asset health monitoring problem, not a process parameter problem. Misclassifying root cause leads to process design interventions that do not address the actual failure driver.
How does PSM mechanical integrity inspection rate connect to manufacturing engineer responsibilities?
Under OSHA PSM 29 CFR 1910.119, mechanical integrity inspection schedules for pressure vessels and rotating equipment in highly hazardous chemical (HHC) service are a compliance requirement. The manufacturing engineer who participates in PHA and HAZOP updates needs accurate inspection history to validate assumed failure rate intervals for covered equipment. Inspection completion rate is therefore both a compliance metric and a process engineering data input.
Why is process availability different from OEE availability in chemical manufacturing?
OEE availability in discrete manufacturing measures time a machine is running. In continuous chemical processes, availability is the state of the entire process stream. When a non-redundant pump fails, the entire unit train goes down. The manufacturing engineer's frame is process stream availability because failure consequences are process-wide, not isolated to a single workstation.
What role does MTBF trend data play in supporting HAZOP and PHA updates?
HAZOP and process FMEA failure rate assumptions for rotating equipment default to industry generic data when plant-specific history is unavailable. Continuous monitoring creates a plant-specific failure mode frequency database. A manufacturing engineer updating a PHA node for a centrifugal pump in corrosive service can use actual pump MTBF history from that service rather than assuming a generic average.
How does turnaround interval tracking serve as a manufacturing engineer KPI?
The interval between planned turnarounds is the operating window within which CI projects must be implemented. An unplanned turnaround shortens that window and disrupts the engineering work program. Tracking achieved operating hours versus planned TAR interval by unit gives the manufacturing engineer a direct measure of process reliability against the planning assumption underpinning the CI project roadmap.