What Are the Key KPIs for a VP of Maintenance in Discrete Manufacturing?

There are three questions a VP of Maintenance in discrete manufacturing needs to be able to answer in any leadership conversation. Is the enterprise meeting its production protection targets? Which sites or asset classes represent the highest financial risk to the business? Is the maintenance program building enterprise capability or eroding it?

Most VPs of Maintenance inherit a portfolio of sites where each plant tracks different metrics, uses different definitions, and reports on a different cadence. Aggregating that into an enterprise performance view is not a reporting exercise. It is the core discipline that separates a VP who manages maintenance from one who leads an enterprise reliability program.

This guide covers the enterprise KPI framework, how to read each metric across a multi-site portfolio, and the one board-level financial number that makes every investment conversation credible.

What Most VPs of Maintenance Get Wrong About KPIs

Reporting enterprise averages that hide site-level risk. A 78% average OEE across twelve plants looks acceptable. One plant at 52% generating two-thirds of your unplanned downtime cost is buried inside that number. Enterprise KPIs must be reported with distribution, not just averages. The VP who knows which sites are pulling the average down has a different conversation with their COO than the VP who reports the mean.

Tracking total maintenance spend instead of maintenance cost as % RAV. Total spend in dollars varies with plant size, asset intensity, and production volume. It is not comparable across sites and not defensible to a CFO. Maintenance cost as a percentage of Replacement Asset Value normalizes for size and makes cross-site comparison meaningful. World-class programs run at 2 to 3% of RAV. Most reactive programs run at 5 to 8%. The VP who can trend this metric by site is having a capital efficiency conversation. The VP who reports total dollars is explaining a cost line.

Using planned-to-unplanned ratio as a single enterprise number. A 72% planned ratio averaged across fifteen sites conceals three sites in chronic reactive mode driving the enterprise emergency repair premium. The metric needs site-level distribution: how many sites are above 85% (world-class), how many are between 70 and 84% (managing), and how many are below 70% (structurally reactive). The sites below 70% are where the enterprise downtime cost is being generated.

Not calculating enterprise workforce skill coverage. Experienced technicians retire. Institutional knowledge walks out the door. If no metric tracks what percentage of critical procedures are covered by more than one technician per site, the reliability program has unmapped single points of failure in its workforce. The VP who tracks this is building organizational resilience. The VP who doesn't discovers it when a site loses a key person and a critical asset fails.

Three Questions That Define Enterprise Performance

Enterprise maintenance performance answers three questions, in order.

Is the enterprise meeting its production protection targets? That is an output question: are facilities delivering the reliability that production commitments require?

Which sites or asset classes represent the highest financial risk to the enterprise? That is a risk question: where is the next major unplanned event most likely to occur, and what will it cost when it does?

Is the maintenance program building enterprise capability or eroding it? That is a trajectory question: is the reliability standard improving across the portfolio, or are sites drifting toward reactive mode faster than the program can correct them?

The metrics below answer each question at the enterprise level.

Question 1: Is the Enterprise Meeting Production Protection Targets?

Two metrics define production protection at the enterprise level.

Aggregate OEE variance across sites is not the enterprise OEE average: it is the distribution. A world-class discrete manufacturing operation targets 85% OEE at individual lines. At the enterprise level, the VP of Maintenance needs to know what percentage of sites are at 80% or above, what percentage are between 65 and 79%, and what percentage are below 65%. The distribution reveals the reliability quality of the portfolio.

Track OEE by site and by major asset class within each site, not as a single enterprise number. A stamping plant running at 81% OEE with a sub-60% line on a specific press family has a different risk profile than a plant at 73% OEE distributed evenly across all lines.
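A minimal Python sketch of the distribution view, using the band cut-offs described above. The plant names and OEE values are hypothetical, invented only to show how an acceptable-looking mean can sit on top of one weak site. Treating exactly 80% as the top band is an assumption.

```python
from statistics import mean

# Hypothetical site-level OEE values (as fractions); not real data.
site_oee = {
    "Plant A": 0.84, "Plant B": 0.82, "Plant C": 0.81,
    "Plant D": 0.78, "Plant E": 0.74, "Plant F": 0.52,
}

def oee_band(oee):
    """Assign a site to a band; 80% itself is assumed to fall in the top band."""
    if oee >= 0.80:
        return "80%+"
    if oee >= 0.65:
        return "65-79%"
    return "below 65%"

def band_shares(values):
    """Share of sites in each band, as a fraction of the portfolio."""
    counts = {"80%+": 0, "65-79%": 0, "below 65%": 0}
    for oee in values.values():
        counts[oee_band(oee)] += 1
    return {band: n / len(values) for band, n in counts.items()}

enterprise_mean = mean(site_oee.values())
shares = band_shares(site_oee)

print(f"Mean OEE: {enterprise_mean:.1%}")
for band, share in shares.items():
    print(f"{band}: {share:.0%} of sites")
```

In this made-up portfolio the mean lands near 75%, while the band view shows that one site in six sits below 65%.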

Planned-to-unplanned maintenance ratio by site is the leading indicator of where production protection is at risk. Sites running below 70% planned maintenance are in structurally reactive mode: they are spending their maintenance labor responding to failures rather than preventing them. A ratio that low produces higher labor costs, shorter asset life, and more frequent unplanned downtime than sites running 85%+ planned.

Track the distribution: how many sites at world-class (85%+), how many managing (70 to 84%), how many reactive (below 70%). Set a target for moving the reactive sites to managing within a defined timeframe. That is a maintenance program improvement target, not an operational one.
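As a rough illustration, the ratio and its bands can be computed from maintenance labor hours. The sketch below assumes the ratio is planned hours divided by total maintenance hours; the plant names and hour counts are hypothetical.

```python
# Hypothetical annual maintenance labor hours per site: (planned, unplanned).
site_hours = {
    "Plant A": (8800, 1200),
    "Plant B": (7500, 2500),
    "Plant C": (5200, 4800),
}

def planned_ratio(planned, unplanned):
    """Planned hours as a fraction of all maintenance hours."""
    total = planned + unplanned
    if total == 0:
        raise ValueError("no maintenance hours recorded")
    return planned / total

def planned_band(ratio):
    """Bands from the text: 85%+ world-class, 70 to 84% managing, below 70% reactive."""
    if ratio >= 0.85:
        return "world-class"
    if ratio >= 0.70:
        return "managing"
    return "reactive"

bands = {
    site: planned_band(planned_ratio(planned, unplanned))
    for site, (planned, unplanned) in site_hours.items()
}
reactive_sites = [site for site, band in bands.items() if band == "reactive"]
print(bands)
print("Reactive sites to move to managing:", reactive_sites)
```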

Question 2: Which Sites Are the Highest Financial Risk?

Two metrics answer the financial risk question at the enterprise level.

Maintenance cost as % of Replacement Asset Value (RAV) by site is the primary financial benchmark for enterprise maintenance efficiency. Calculate RAV for each site as the current replacement cost of all production equipment. Divide annual maintenance spend by that number. The result is comparable across sites of different sizes, ages, and asset configurations.

World-class: 2 to 3% of RAV. Acceptable: 3 to 5%. Reactive mode: 5% or above.

A site running at 7% of RAV is spending more than twice as many maintenance dollars per asset dollar as a world-class program. The excess spend is driven by emergency repair premiums, unplanned labor overtime, and expedited parts sourcing. Identifying the high-RAV-percentage sites tells the VP of Maintenance exactly where to focus program improvement investment.
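The calculation is simple enough to sketch in a few lines of Python. The spend and RAV figures below are hypothetical. The text does not say how to treat a site below 2% or exactly at 3%, so this sketch assumes anything at or under 3% counts as world-class.

```python
# Hypothetical figures per site: (annual maintenance spend, replacement asset value).
site_financials = {
    "Plant A": (1_250_000, 50_000_000),
    "Plant B": (2_000_000, 50_000_000),
    "Plant C": (3_500_000, 50_000_000),
}

def maintenance_cost_pct_rav(annual_spend, rav):
    """Annual maintenance spend divided by replacement asset value."""
    if rav <= 0:
        raise ValueError("RAV must be positive")
    return annual_spend / rav

def rav_band(pct):
    """Bands from the text; boundary handling is an assumption of this sketch."""
    if pct <= 0.03:
        return "world-class"
    if pct < 0.05:
        return "acceptable"
    return "reactive"

rav_report = {}
for site, (spend, rav) in site_financials.items():
    pct = maintenance_cost_pct_rav(spend, rav)
    rav_report[site] = (pct, rav_band(pct))
    print(f"{site}: {pct:.1%} of RAV ({rav_band(pct)})")
```

With these invented numbers, Plant C at 7% spends 2.8 times as much per asset dollar as Plant A at 2.5%.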

MTBF trends on Tier 1 asset classes across sites identify where the next major unplanned event is most likely to originate. A declining MTBF trend on a specific asset class at multiple sites is a fleet-wide risk signal, not a site-specific maintenance problem. The VP of Maintenance who sees a declining MTBF trend on conveyor drive motors across four of their eight sites has a different response than a site manager seeing it on a single asset.

Review MTBF trends quarterly by asset class across the portfolio. Flag any class showing consistent decline across more than two sites. That is a fleet-level engineering and maintenance standard review, not a per-site corrective action.
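One way to sketch the quarterly flagging rule in Python. "Consistent decline" is not defined in the text, so this sketch assumes it means every quarter's MTBF is lower than the one before it. The asset classes, plants, and MTBF values are hypothetical.

```python
# Hypothetical quarterly MTBF (hours) per asset class and site, oldest first.
mtbf_history = {
    "conveyor drive motor": {
        "Plant A": [1400, 1320, 1250, 1180],
        "Plant B": [1500, 1460, 1390, 1300],
        "Plant C": [1350, 1300, 1210, 1150],
        "Plant D": [1420, 1450, 1430, 1460],
    },
    "hydraulic press": {
        "Plant A": [900, 880, 910, 905],
        "Plant B": [950, 930, 900, 870],
    },
}

def is_declining(series):
    """Assumed definition: each reading is strictly lower than the previous one."""
    return len(series) >= 2 and all(
        later < earlier for earlier, later in zip(series, series[1:])
    )

def fleet_flags(history, min_sites=3):
    """Asset classes declining at more than two sites (three or more)."""
    flags = {}
    for asset_class, by_site in history.items():
        declining = sorted(s for s, series in by_site.items() if is_declining(series))
        if len(declining) >= min_sites:
            flags[asset_class] = declining
    return flags

print(fleet_flags(mtbf_history))
```

A stricter or looser trend test (for example a fitted slope over more quarters) could replace `is_declining` without changing the flagging logic.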

Question 3: Is the Program Building Capability or Eroding It?

Workforce skill coverage ratio by site measures what percentage of critical maintenance procedures at each site are documented and covered by more than one qualified technician. A site where one technician holds sole knowledge of a critical procedure (the only person who knows how to rebuild the main conveyor gearbox, calibrate the welding robot, or service the hydraulic press) has a workforce single point of failure.

At enterprise scale, this metric reveals the institutional knowledge risk distribution across the portfolio. Acquired sites are particularly vulnerable: different training histories, different documentation standards, different tenure profiles. The VP of Maintenance who maps skill coverage across all sites before a key technician retires is building resilience. The VP who discovers the gap after retirement is managing a crisis.
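A minimal sketch of the coverage calculation for one site. The technician IDs are hypothetical, and the sketch only counts qualified technicians per procedure; it does not check whether a procedure is documented, which the metric also requires.

```python
# Hypothetical mapping for one site: critical procedure -> qualified technicians.
site_procedures = {
    "rebuild main conveyor gearbox": {"tech_01"},
    "calibrate welding robot": {"tech_02", "tech_03"},
    "service hydraulic press": {"tech_01", "tech_04"},
}

def skill_coverage(procedures, min_techs=2):
    """Fraction of critical procedures covered by at least min_techs technicians."""
    if not procedures:
        raise ValueError("no critical procedures listed")
    covered = sum(1 for techs in procedures.values() if len(techs) >= min_techs)
    return covered / len(procedures)

def coverage_gaps(procedures, min_techs=2):
    """Procedures that depend on fewer than min_techs technicians."""
    return sorted(name for name, techs in procedures.items() if len(techs) < min_techs)

coverage = skill_coverage(site_procedures)
gaps = coverage_gaps(site_procedures)
print(f"Skill coverage: {coverage:.0%}")
print("Single points of failure:", gaps)
```

Running the same function per site gives the portfolio distribution of knowledge risk.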

Percentage of sites at world-class planned maintenance ratio is the long-horizon program health metric. Improving this number is the VP of Maintenance's primary program mandate. A portfolio where 40% of sites are at world-class planned maintenance ratio (85%+) and that percentage is increasing quarter over quarter is a program building capability. A portfolio where that number is flat or declining is eroding, regardless of what the total maintenance spend says.

Track this as a trend, not a point-in-time number. Set a time-bound target: what percentage of sites should be at world-class in 18 months? That target is the maintenance program's organizational KPI.
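A short sketch of tracking this share quarter over quarter. The site labels and planned ratios are hypothetical, and "improving" is assumed here to mean the share never drops and ends higher than it started.

```python
# Hypothetical planned-maintenance ratios per site, by quarter (oldest first).
quarterly_ratios = {
    "Q1": {"A": 0.88, "B": 0.83, "C": 0.71, "D": 0.62, "E": 0.86},
    "Q2": {"A": 0.88, "B": 0.85, "C": 0.74, "D": 0.66, "E": 0.87},
    "Q3": {"A": 0.89, "B": 0.86, "C": 0.78, "D": 0.69, "E": 0.87},
}

def world_class_share(ratios, threshold=0.85):
    """Fraction of sites at or above the world-class planned ratio."""
    return sum(1 for r in ratios.values() if r >= threshold) / len(ratios)

trend = [world_class_share(ratios) for ratios in quarterly_ratios.values()]

# Assumed definition of "improving": never declines and ends above the start.
improving = all(b >= a for a, b in zip(trend, trend[1:])) and trend[-1] > trend[0]

print([f"{share:.0%}" for share in trend])
print("Program building capability:", improving)
```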

Enterprise Benchmark Table

| Metric | World-Class | Acceptable | Needs Attention |
|---|---|---|---|
| Maintenance cost as % RAV | 2 to 3% | 3 to 5% | Above 5% |
| Planned-to-unplanned ratio | 85%+ planned | 70 to 84% | Below 70% |
| OEE (by line, top sites) | 85%+ | 65 to 84% | Below 65% |
| MTBF on Tier 1 assets | Rising trend | Stable | Declining trend |
| Workforce skill coverage | 100% | 80 to 99% | Below 80% |
| Sites at world-class planned ratio | 60%+ of portfolio | 30 to 59% | Below 30% |

The One Board-Level Number

The metrics above belong in management reviews and operational governance. One number belongs in every board-level or CFO-level conversation about maintenance investment.

Total annual enterprise cost of unplanned downtime = (Aggregate unplanned downtime hours x Production value per hour, by site and line) + Emergency repair premium across all sites + OEM penalty exposure for JIT supply chain customers

Build this number by aggregating from sites. Pull 12 months of work order history across all locations. Categorize each unplanned event by asset and calculate the production value at risk per hour for the affected line. Add emergency repair premium from unplanned work orders (typically two to three times the equivalent planned repair cost). Add any OEM penalty or expediting cost from supply chain disruption to JIT customers.

Sum across all sites.

That total is your enterprise downtime cost baseline. It is almost always larger than expected, because sites track production loss, emergency repair costs, and OEM penalties in separate systems. Consolidating them into a single enterprise number is the analysis that turns a maintenance budget conversation into a capital protection conversation.
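The aggregation can be sketched as follows, assuming each unplanned event has already been extracted from work order history with its downtime hours, line value, repair premium, and any penalty attached. The record layout, field names, and figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class UnplannedEvent:
    site: str
    line: str
    downtime_hours: float
    production_value_per_hour: float  # value at risk for the affected line
    emergency_repair_premium: float   # extra cost versus the planned repair
    oem_penalty: float = 0.0          # penalty or expediting cost for JIT customers

def event_cost(event):
    """Production loss plus repair premium plus penalty for one event."""
    production_loss = event.downtime_hours * event.production_value_per_hour
    return production_loss + event.emergency_repair_premium + event.oem_penalty

def cost_by_site(events):
    """Sum event costs per site."""
    totals = defaultdict(float)
    for event in events:
        totals[event.site] += event_cost(event)
    return dict(totals)

# Hypothetical 12-month event history.
events = [
    UnplannedEvent("Plant A", "Line 1", 6, 12_000, 8_000),
    UnplannedEvent("Plant A", "Line 2", 3, 9_000, 4_000, 15_000),
    UnplannedEvent("Plant B", "Line 1", 10, 15_000, 20_000, 50_000),
]

site_totals = cost_by_site(events)
enterprise_total = sum(site_totals.values())
print(site_totals)
print(f"Enterprise downtime cost baseline: ${enterprise_total:,.0f}")
```

Keeping the per-site totals alongside the enterprise sum preserves the distribution that the board conversation depends on.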

In a discrete manufacturing enterprise with ten sites, it is common to find that two or three sites generate 60 to 70% of the total downtime cost. Those are the sites where enterprise investment produces the highest return. The board-level conversation is not "we need to spend more on maintenance." It is "we have $X in annual downtime cost concentrated at three sites, and these are the program changes that reduce it."
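To check how concentrated the cost is in a given portfolio, sites can be ranked by downtime cost and their cumulative share computed. The ten site labels and cost figures below are hypothetical, chosen so that three sites carry 70% of the total.

```python
# Hypothetical annual downtime cost per site, in thousands of dollars.
site_cost = {
    "S1": 4200, "S2": 3100, "S3": 2500, "S4": 900, "S5": 800,
    "S6": 700, "S7": 600, "S8": 500, "S9": 400, "S10": 300,
}

def ranked_cumulative_share(costs):
    """Sites sorted by cost (highest first) with cumulative share of the total."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total downtime cost must be positive")
    running = 0.0
    ranked = []
    for site, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        running += cost
        ranked.append((site, running / total))
    return ranked

def sites_to_reach(costs, target_share):
    """Smallest number of top-ranked sites whose combined share meets the target."""
    for count, (_, share) in enumerate(ranked_cumulative_share(costs), start=1):
        if share >= target_share:
            return count
    return len(costs)

ranked = ranked_cumulative_share(site_cost)
for site, share in ranked[:3]:
    print(f"{site}: cumulative {share:.0%}")
print("Sites needed to reach 60% of cost:", sites_to_reach(site_cost, 0.60))
```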

Reading the Portfolio: When One Site Moves the Enterprise Average

A single underperforming site can dominate enterprise KPIs even in a large portfolio. If one site has a chronic reactive maintenance culture (low planned ratio, high RAV%, frequent emergency events), it will distort the enterprise average while appearing to be a site-level problem.

The corrective approach is a tiered response. Classify sites by reliability maturity: world-class (85%+ planned, below 3% RAV), developing (70 to 84% planned, 3 to 5% RAV), and early-stage (below 70% planned, above 5% RAV). Direct enterprise program investment to early-stage sites first. Standardized PM programs, technology deployment, and workforce training belong there. Developing sites need governance and standard-setting. World-class sites need sustaining investment and can serve as benchmarks for the portfolio.
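The tiering rule can be sketched in Python as below. The text does not say how to classify a site whose planned ratio and RAV percentage fall in different tiers, so this sketch assumes the lower of the two applies. The plant names and values are hypothetical.

```python
TIERS = ("early-stage", "developing", "world-class")

def planned_rank(planned_ratio):
    """0 = below 70% planned, 1 = 70 to 84%, 2 = 85% or above."""
    if planned_ratio >= 0.85:
        return 2
    if planned_ratio >= 0.70:
        return 1
    return 0

def rav_rank(rav_pct):
    """0 = above 5% of RAV, 1 = 3 to 5%, 2 = below 3%."""
    if rav_pct < 0.03:
        return 2
    if rav_pct <= 0.05:
        return 1
    return 0

def maturity_tier(planned_ratio, rav_pct):
    """Assumption: a site with mixed signals takes the lower of its two tiers."""
    return TIERS[min(planned_rank(planned_ratio), rav_rank(rav_pct))]

# Hypothetical portfolio: site -> (planned ratio, maintenance cost as % RAV).
portfolio = {
    "Plant A": (0.88, 0.025),
    "Plant B": (0.76, 0.041),
    "Plant C": (0.90, 0.062),
    "Plant D": (0.58, 0.071),
}

tiers = {site: maturity_tier(*metrics) for site, metrics in portfolio.items()}
# Early-stage sites come first in the investment queue.
investment_order = sorted(portfolio, key=lambda site: TIERS.index(tiers[site]))
print(tiers)
print(investment_order)
```

Plant C shows why the tie-break matters: a strong planned ratio with a 6.2% RAV figure still lands in early-stage under this assumption.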

The goal is to shrink the early-stage population over a 12 to 24 month horizon. Each site that moves from early-stage to developing reduces the enterprise downtime cost and improves the enterprise average. That is a measurable program outcome, reportable to the board with financials attached.

How Tractian Supports Enterprise KPI Reporting

Tractian's condition monitoring platform provides the asset health data that feeds enterprise KPI reporting: MTBF trends by asset class across sites, alert-to-resolution timelines, and planned-versus-unplanned ratios by location. The data is consistent across all sites on the same platform, making cross-site comparison reliable rather than dependent on how each site defines and counts events.

When MTBF trends are declining on a specific asset class at multiple sites simultaneously, the enterprise-level signal appears before the site-level impact. VPs of Maintenance using a common platform across their portfolio see fleet-wide risk patterns that site managers, each looking at their own data, cannot.

See how Tractian supports enterprise manufacturing operations

What are the most important KPIs for a VP of Maintenance in discrete manufacturing?

Three enterprise questions define the framework: Is the enterprise meeting production protection targets (aggregate OEE variance, planned-to-unplanned ratio by site)? Which sites are the highest financial risk (maintenance cost as % RAV, MTBF trends on Tier 1 assets)? Is the program building capability (workforce skill coverage ratio, percentage of sites at world-class planned ratio)? The board-level number that ties all three together is the total annual cost of unplanned downtime across all sites.

What is maintenance cost as a percentage of RAV?

Maintenance cost as a percentage of Replacement Asset Value (RAV) is the primary financial benchmark for enterprise maintenance efficiency. Divide annual maintenance spend by the current replacement cost of all production equipment at each site. World-class programs run at 2 to 3%. Reactive programs run at 5 to 8% or higher. The metric is comparable across sites of different sizes, making it the right number for cross-site benchmarking and board reporting.

What is the planned-to-unplanned maintenance ratio and what is world-class?

The ratio measures what percentage of total maintenance hours are planned versus reactive. World-class is 85% planned or higher. Most manufacturing enterprises average 55 to 65% planned across their site portfolio. Sites below 70% are in structurally reactive mode, with higher labor costs, shorter asset life, and more frequent unplanned downtime events.

How do you calculate total enterprise cost of unplanned downtime?

Pull 12 months of work order history across all sites. For each site, aggregate unplanned downtime hours and multiply by production value per hour for each affected line, add the emergency repair premium from unplanned work orders (typically two to three times planned repair cost), and add OEM penalty exposure for JIT supply chain customers. Sum across all sites. The total is almost always larger than expected because sites track these costs in separate systems.

What is the workforce skill coverage ratio?

The ratio measures what percentage of critical maintenance procedures are covered by more than one qualified technician per site. A procedure covered by only one person is a workforce single point of failure. World-class programs target 100% coverage for all Tier 1 asset maintenance procedures. Tracking this across an enterprise portfolio reveals where institutional knowledge risk is concentrated before retirements or departures create a crisis.

How should a VP of Maintenance present KPIs to the board?

Present three numbers: total enterprise maintenance cost as % RAV (vs. world-class benchmark of 2 to 3%), total annual cost of unplanned downtime across all sites (production loss plus emergency repair premium plus OEM penalty exposure), and percentage of sites at world-class planned maintenance ratio (85%+ planned). These three numbers frame maintenance as a capital protection program. Every investment request is evaluated against reduction in downtime cost and improvement in the RAV percentage.