Failure Mode: Definition

Definition: A failure mode is the specific way in which a component, system, or process fails to perform its required function. It describes what goes wrong and how it goes wrong physically or functionally, independently of why it happened or what consequence it produces.

What Is a Failure Mode?

In maintenance and reliability engineering, a failure mode is one of three linked concepts. You identify the failure mode (what failed and how), trace its failure cause (why it happened), and document its failure effect (what happened as a result). All three must be understood before you can select the right maintenance task.

Failure modes are the foundational input to FMEA (Failure Mode and Effects Analysis), Reliability-Centered Maintenance, and most structured reliability programs.

Failure Mode vs Failure Cause vs Failure Effect

These three terms are frequently confused. They are distinct concepts that form a chain: a cause triggers a failure mode, and the failure mode produces an effect on the system.

| Concept | Definition | Example (Pump) |
| --- | --- | --- |
| Failure Mode | The specific way in which an item fails to perform its function | Mechanical seal leaks |
| Failure Cause | The underlying physical, chemical, or human reason the failure mode occurred | Dry running caused face erosion |
| Failure Effect | The consequence of the failure mode on the system, process, or safety | Fluid loss, process contamination, pump shutdown |
| Failure Code | A structured code used in a CMMS to classify the failure mode, cause, and remedy | Problem: P07 / Cause: C14 / Remedy: R03 |

Understanding this chain is critical. Maintenance tasks that only address the failure effect (for example, shutting down and restarting the pump) do not prevent recurrence. Tasks must be designed to address the failure cause or to detect the failure mode before it fully develops.
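As a minimal sketch, the cause, mode, and effect chain can be held in one record so that none of the three links is lost when a failure is logged. The class name `FailureRecord` and its field names are illustrative assumptions, not a CMMS schema; the values are taken from the pump example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    """One cause -> mode -> effect chain (hypothetical structure for illustration)."""
    asset: str
    failure_mode: str    # what failed and how
    failure_cause: str   # why it happened
    failure_effect: str  # what happened as a result
    problem_code: str    # CMMS failure codes for the mode, cause, and remedy
    cause_code: str
    remedy_code: str

    def summary(self) -> str:
        # Present the chain in causal order: cause triggers mode, mode produces effect.
        return f"{self.failure_cause} -> {self.failure_mode} -> {self.failure_effect}"


seal_leak = FailureRecord(
    asset="Pump",
    failure_mode="Mechanical seal leaks",
    failure_cause="Dry running caused face erosion",
    failure_effect="Fluid loss, process contamination, pump shutdown",
    problem_code="P07",
    cause_code="C14",
    remedy_code="R03",
)
print(seal_leak.summary())
```

Keeping the cause and effect as separate fields makes it harder to record only the symptom and close the work order.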

For a deeper look at how failures are classified in work orders, see the glossary entry on failure codes.

Common Types of Failure Modes

Failure modes are grouped by the physical or functional mechanism involved. The table below covers the most common categories in industrial maintenance.

| Category | Common Failure Modes | Typical Assets Affected |
| --- | --- | --- |
| Fatigue | Fatigue fracture, crack initiation and propagation, surface spalling | Shafts, gears, bearings, structural members |
| Corrosion | General corrosion, pitting, galvanic corrosion, stress corrosion cracking | Pipework, tanks, heat exchangers, structural steel |
| Wear | Abrasive wear, adhesive wear, erosion, fretting | Bearings, seals, pump impellers, conveyor components |
| Overheating / Thermal | Thermal deformation, insulation degradation, seized components, warped surfaces | Motors, transformers, gearboxes, brakes |
| Electrical | Insulation breakdown, short circuit, open circuit, arcing, ground fault | Motors, switchgear, cables, control panels |
| Deformation | Plastic deformation, buckling, creep, overload yielding | Pressure vessels, structural frames, shafts |
| Leakage / Seal | Seal extrusion, O-ring degradation, flange leakage, valve seat leakage | Pumps, valves, compressors, hydraulic systems |
| Contamination / Blockage | Filter clogging, foreign object ingestion, fouling, scale buildup | Heat exchangers, filters, lubrication systems, nozzles |
| Control / Instrumentation | Signal loss, sensor drift, false trip, software fault, communication failure | PLCs, transmitters, safety systems, control valves |

One component can have several independent failure modes. A centrifugal pump, for example, may be susceptible to seal leakage, bearing wear, impeller erosion, cavitation damage, and motor insulation breakdown as entirely separate failure modes. Each one may require a different maintenance approach.

How Failure Modes Are Identified and Documented

Failure modes are not assumed from design specs alone. They are identified through a combination of structured analysis and operational experience.

Structured Analysis Methods

FMEA workshops bring together engineers, operators, and maintenance planners to systematically ask: "In how many ways could this component fail to perform its function?" Each answer is a potential failure mode.

Reliability-Centered Maintenance (RCM) analysis uses a structured decision logic to work through every function and every potential failure of each asset. See the entry on Reliability-Centered Maintenance for a detailed breakdown of the RCM process.

Fault Tree Analysis works in reverse: it starts from an undesired top-level event and traces down through the failure modes and causes that could produce it.

Operational and Historical Data Sources

Historical failure analysis records, work order histories, and FRACAS (Failure Reporting, Analysis, and Corrective Action System) data are among the most valuable inputs. They reveal which failure modes have actually occurred, how frequently, and under what conditions.

Operator and technician interviews capture failure modes that have been observed but never formally recorded. Field experience often surfaces failure modes that no design engineer anticipated.

Documentation in a CMMS

Once identified, failure modes should be documented in a structured format. In a CMMS, this is typically done through failure codes linked to work orders. Each closed work order records the problem (failure mode), the cause, and the remedy applied. Over time, this creates a queryable database of failure mode frequency and consequence data.

Standardized failure code libraries prevent the data quality problems that result when technicians describe the same failure mode in dozens of different ways.
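A minimal sketch of how coded work orders become a queryable failure mode history. The work order tuples, asset tags, and most of the codes below are hypothetical sample data, and the function name `failure_mode_frequency` is an assumption for illustration; a real CMMS would expose this through its own reports or database.

```python
from collections import Counter

# Hypothetical closed work orders: (asset tag, problem code, cause code, remedy code).
closed_work_orders = [
    ("PUMP-01", "P07", "C14", "R03"),
    ("PUMP-01", "P07", "C14", "R03"),
    ("PUMP-02", "P07", "C02", "R03"),
    ("PUMP-01", "P11", "C05", "R08"),
]


def failure_mode_frequency(work_orders):
    """Count how often each (asset, problem code) pair appears in closed work orders."""
    return Counter((asset, problem) for asset, problem, _cause, _remedy in work_orders)


frequency = failure_mode_frequency(closed_work_orders)
for (asset, problem), count in frequency.most_common():
    print(asset, problem, count)
```

The count only works because every technician used the same code for the same failure mode; free-text descriptions of the same problem would be tallied as separate entries.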

Failure Modes in FMEA

FMEA is the most widely used formal process for identifying and analyzing failure modes. The approach is the same regardless of FMEA type: list functions, identify failure modes, analyze effects and causes, score risk, and assign corrective actions.

FMEA Types and Their Focus

| FMEA Type | Focus | Primary Use Case |
| --- | --- | --- |
| Design FMEA (DFMEA) | Failure modes inherent in product or component design | New product development, engineering change review |
| Process FMEA (PFMEA) | Failure modes in manufacturing or operational processes | Quality control, process improvement |
| System FMEA | Failure modes across an integrated system or subsystem | Complex equipment, production lines, safety systems |
| FMECA | Failure modes plus criticality ranking by probability and severity | Defense, aerospace, high-consequence industrial assets |

For more detail on Design FMEA, see the entry on DFMEA. For process-level analysis, see PFMEA. For criticality-weighted analysis, see FMECA.

How FMEA Uses Failure Modes

In an FMEA worksheet, each failure mode is scored across three dimensions:

  • Severity (S): How serious is the effect of this failure mode on safety, operations, or quality?
  • Occurrence (O): How likely is this failure mode to occur in a given timeframe?
  • Detectability (D): How easily can the failure mode be detected before it causes a full failure? In the usual scoring convention, a higher score means the failure mode is harder to detect.

The three scores are multiplied to produce a Risk Priority Number (RPN). High RPN failure modes receive corrective actions first: redesign, additional inspection, new monitoring, or modified maintenance tasks.
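The RPN calculation can be sketched as follows. The 1 to 10 scoring scale, the class name `FmeaRow`, and the example scores are assumptions for illustration only; they are not taken from a real worksheet.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One FMEA worksheet row (hypothetical structure, scores assumed on a 1-10 scale)."""
    failure_mode: str
    severity: int
    occurrence: int
    detection: int  # higher means harder to detect

    @property
    def rpn(self) -> int:
        # Risk Priority Number: the product of the three scores.
        return self.severity * self.occurrence * self.detection


rows = [
    FmeaRow("Seal leakage", severity=6, occurrence=5, detection=4),
    FmeaRow("Bearing wear", severity=7, occurrence=4, detection=3),
    FmeaRow("Motor insulation breakdown", severity=9, occurrence=2, detection=8),
]

# Highest RPN first: these failure modes receive corrective actions first.
ranked = sorted(rows, key=lambda row: row.rpn, reverse=True)
for row in ranked:
    print(f"{row.failure_mode}: RPN {row.rpn}")
```

In this made-up example the insulation breakdown ranks first despite its low occurrence score, because it is both severe and hard to detect.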

Failure Modes in RCM

Reliability-Centered Maintenance uses failure modes as the basis for selecting the right maintenance task for each asset function. The RCM decision logic asks, for each failure mode:

  • Is the failure mode evident or hidden?
  • Does it have safety or environmental consequences?
  • Is there a condition-based or predictive task that can detect it in time to take action?
  • Is a scheduled restoration or replacement task applicable and cost-effective?
  • Is redesign or redundancy the only viable option?

RCM analysis often surfaces hidden failure modes in protective devices, where a component has failed but the failure has not been detected because the device has not been called upon to act. For these, a Failure Finding Interval (FFI) is established to test the device periodically.
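The questions above can be sketched as a small decision function. This is a deliberately simplified ordering, not the full RCM decision diagram; the function name `select_task`, its parameters, and the returned labels are assumptions made for illustration.

```python
def select_task(
    hidden: bool,
    safety_or_environmental: bool,
    condition_task_feasible: bool,
    scheduled_task_feasible: bool,
) -> str:
    """Pick a maintenance response for one failure mode (simplified sketch)."""
    if condition_task_feasible:
        # A detectable precursor exists and leaves time to act.
        return "condition-based task"
    if scheduled_task_feasible:
        # Restoration or replacement is applicable and cost-effective.
        return "scheduled restoration or replacement"
    if hidden:
        # Nothing prevents the failure, so test the device periodically (FFI).
        return "failure-finding task"
    if safety_or_environmental:
        # No task manages the risk, so the design has to change.
        return "redesign or redundancy"
    return "run-to-failure"


print(select_task(hidden=True, safety_or_environmental=False,
                  condition_task_feasible=False, scheduled_task_feasible=False))
```

A real analysis weighs consequences and task cost at each step; the sketch only shows that the answer depends on the failure mode's properties rather than on the asset type.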

The bathtub curve is also relevant here: it illustrates that different failure modes dominate at different stages of an asset's life, and the appropriate maintenance response changes accordingly.

Failure Mode Patterns and the P-F Interval

Not all failure modes behave the same way over time. Some develop gradually and give advance warning. Others occur suddenly with no detectable precursor.

The P-F curve plots an asset's condition over time. The P-F interval is the time between the point at which a failure mode becomes detectable (potential failure, P) and the point at which it becomes a functional failure (F). A longer P-F interval gives maintenance teams more time to respond.

Failure modes with long, detectable P-F intervals are the best candidates for condition monitoring and predictive maintenance. Failure modes with no detectable interval require a different approach: redundancy, redesign, or run-to-failure if consequences are low.
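One way to reason about this is to compare the inspection interval with the P-F interval. In the worst case a defect becomes detectable just after an inspection, so the warning time left is the P-F interval minus the inspection interval. The sketch below assumes that simple model; the function names and the example figures are hypothetical.

```python
def worst_case_warning_days(pf_interval_days: float, inspection_interval_days: float) -> float:
    """Warning time left if the defect becomes detectable just after an inspection."""
    return pf_interval_days - inspection_interval_days


def inspection_is_adequate(
    pf_interval_days: float,
    inspection_interval_days: float,
    response_days: float,
) -> bool:
    """True if the worst-case warning still covers the time needed to plan and repair."""
    warning = worst_case_warning_days(pf_interval_days, inspection_interval_days)
    return warning >= response_days


# Hypothetical failure mode with a 60-day P-F interval and a 14-day response time.
print(inspection_is_adequate(60, 30, 14))  # monthly inspection
print(inspection_is_adequate(60, 50, 14))  # inspection every 50 days
```

Under these assumptions, an inspection interval longer than the P-F interval can miss the failure entirely, which is why failure modes with very short P-F intervals tend to need continuous monitoring or a different strategy.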

Six Failure Mode Patterns (Nowlan and Heap)

Research by Nowlan and Heap on commercial aircraft components identified six patterns of failure probability over an item's age. The patterns have since been widely applied to industrial equipment:

| Pattern | Description | Best Maintenance Response |
| --- | --- | --- |
| A - Bathtub | High early failure, stable useful life, increasing wearout | Break-in inspection, then condition monitoring |
| B - Wearout | Increasing failure probability after a useful-life period | Scheduled replacement before wearout |
| C - Gradual aging | Steadily increasing failure probability with no distinct wearout point | Condition-based monitoring |
| D - Initial low / then high | Low failure probability when new, then a rapid rise to a roughly constant level | Monitoring after initial break-in period |
| E - Random | Constant failure probability at any age | Redundancy, run-to-failure (if low consequence) |
| F - Infant mortality / then random | High early failures, then low random rate | Burn-in testing, quality control at installation |

The practical implication is significant. Age-based maintenance tasks, such as replacing a component every fixed interval, are only appropriate for failure modes that follow patterns A or B, where a clear wearout age exists. For the majority of industrial failure modes, condition-based approaches or redesign deliver better results at lower cost.

How Failure Mode Data Improves Maintenance Strategy

Failure mode data, when systematically collected and analyzed, changes the quality of maintenance decisions at every level.

Task Selection

Each failure mode has a technically feasible maintenance response. Knowing the failure mode tells you whether vibration analysis, oil sampling, thermography, ultrasound, or a time-based replacement interval is the right tool. Applying condition monitoring to a failure mode that has no detectable precursor is waste. Applying a fixed interval to a failure mode that follows a random pattern is equally wasteful.

Vibration analysis, for example, is highly effective at detecting bearing fatigue, misalignment, and imbalance before they become functional failures. But it provides no early warning for a sudden electrical short circuit.

Maintenance Budget Justification

When failure modes are documented with their frequency, severity, and detection lead time, maintenance teams can quantify what each monitoring task is worth. A failure mode that causes four hours of unplanned downtime per event, occurs six times per year, and is detectable two weeks in advance with continuous monitoring provides a clear return-on-investment case for the monitoring investment.
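The arithmetic behind such a case can be sketched as follows. Only the four hours per event and six events per year come from the example above; the downtime cost per hour, the annual monitoring cost, and the fraction of events prevented are hypothetical inputs chosen for illustration.

```python
def annual_downtime_hours(hours_per_event: float, events_per_year: float) -> float:
    """Unplanned downtime per year attributable to one failure mode."""
    return hours_per_event * events_per_year


def monitoring_roi(
    hours_per_event: float,
    events_per_year: float,
    cost_per_downtime_hour: float,
    annual_monitoring_cost: float,
    fraction_prevented: float = 1.0,
) -> float:
    """Simple ROI: (avoided downtime cost - monitoring cost) / monitoring cost."""
    avoided_cost = (
        annual_downtime_hours(hours_per_event, events_per_year)
        * cost_per_downtime_hour
        * fraction_prevented
    )
    return (avoided_cost - annual_monitoring_cost) / annual_monitoring_cost


# 4 h per event and 6 events per year are from the text; the rest is assumed.
hours = annual_downtime_hours(4, 6)
roi = monitoring_roi(4, 6, cost_per_downtime_hour=5000,
                     annual_monitoring_cost=20000, fraction_prevented=0.75)
print(hours, roi)
```

The sketch leaves out repair cost differences, secondary damage, and safety consequences, all of which would normally strengthen or weaken the case.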

Criticality Analysis

Failure mode data feeds directly into criticality analysis, where assets are prioritized by the risk their failure modes create. High-frequency, high-consequence failure modes on critical assets receive the highest maintenance attention. Low-consequence failure modes on non-critical assets may be intentionally managed through run-to-failure.

Failure Lifecycle Management

Tracking failure modes across an asset's life allows teams to identify patterns, compare performance across similar assets, and manage the full failure lifecycle from detection through resolution and prevention. Over time, this data becomes the basis for revising PM intervals, updating spare parts inventories, and informing procurement decisions when assets are replaced.

Failure Mode vs Functional Failure

A functional failure is the state in which an asset can no longer perform its required function at the required standard. A failure mode is the mechanism that leads to that state.

A pump has the function "deliver 500 liters per minute at 4 bar." Failure modes such as impeller erosion, bearing seizure, or seal leakage can each independently cause this functional failure, but through different physical mechanisms and at different rates. Each failure mode must be addressed separately.
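A minimal sketch of the distinction, assuming the stated function is read as a minimum standard (at least 500 liters per minute and at least 4 bar). The class name `PumpFunction` and the measured values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PumpFunction:
    """Required performance standard for the pump (interpreted here as minimums)."""
    min_flow_l_per_min: float = 500.0
    min_pressure_bar: float = 4.0

    def is_functional_failure(self, flow_l_per_min: float, pressure_bar: float) -> bool:
        # A functional failure is a state: the standard is no longer met.
        return (
            flow_l_per_min < self.min_flow_l_per_min
            or pressure_bar < self.min_pressure_bar
        )


# Failure modes are the mechanisms that can each lead to that state.
failure_modes = ["Impeller erosion", "Bearing seizure", "Seal leakage"]

duty = PumpFunction()
print(duty.is_functional_failure(flow_l_per_min=520, pressure_bar=4.1))
print(duty.is_functional_failure(flow_l_per_min=450, pressure_bar=4.1))
```

The check says nothing about which of the listed failure modes caused the shortfall, which is exactly why each failure mode has to be analyzed separately.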

This distinction matters in RCM, where the analyst first defines functions and functional failures, then works down to identify the failure modes responsible for each functional failure.

How Condition Monitoring Targets Failure Modes

Condition monitoring is most valuable when it is applied to specific, known failure modes with measurable P-F intervals. The monitoring technique must match the physical signature of the failure mode being targeted.

| Failure Mode | Detectable Indicator | Monitoring Technique |
| --- | --- | --- |
| Bearing fatigue | Increased vibration at bearing frequencies | Vibration analysis |
| Gear tooth wear | Metallic particles in oil, gear mesh frequency change | Oil analysis, vibration analysis |
| Motor insulation degradation | Rising winding temperature, increased current draw | Thermal monitoring, current analysis |
| Seal leakage (early) | Ultrasonic emission at seal face | Ultrasonic testing |
| Corrosion / wall thinning | Reduced wall thickness measurement | Ultrasonic thickness testing, corrosion monitoring |
| Electrical connection overheating | Thermal hotspot at connection point | Infrared thermography |
| Cavitation | High-frequency acoustic signature, vibration spike | Acoustic monitoring, vibration analysis |

When condition monitoring is deployed against known failure modes, every alert carries context. The team knows what is failing, how fast it is likely to progress, and what corrective action is required, rather than receiving a generic alert and spending diagnostic time narrowing down the cause.
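The mapping in the table above can be sketched as a lookup that works in both directions: from a failure mode to the techniques that can detect it, and from a technique to the failure modes it covers. The dictionary mirrors the table; the function names are assumptions for illustration.

```python
# Failure mode -> monitoring techniques, transcribed from the table above.
TECHNIQUES = {
    "Bearing fatigue": ["Vibration analysis"],
    "Gear tooth wear": ["Oil analysis", "Vibration analysis"],
    "Motor insulation degradation": ["Thermal monitoring", "Current analysis"],
    "Seal leakage (early)": ["Ultrasonic testing"],
    "Corrosion / wall thinning": ["Ultrasonic thickness testing", "Corrosion monitoring"],
    "Electrical connection overheating": ["Infrared thermography"],
    "Cavitation": ["Acoustic monitoring", "Vibration analysis"],
}


def techniques_for(failure_mode: str) -> list[str]:
    """Techniques listed for a failure mode; empty if the mode is not in the table."""
    return TECHNIQUES.get(failure_mode, [])


def failure_modes_detectable_by(technique: str) -> list[str]:
    """Failure modes a given technique is listed against, in alphabetical order."""
    return sorted(mode for mode, techs in TECHNIQUES.items() if technique in techs)


print(techniques_for("Gear tooth wear"))
print(failure_modes_detectable_by("Vibration analysis"))
print(techniques_for("Electrical short circuit"))
```

An empty result for a failure mode is itself useful information: it flags a mode that the monitoring program does not cover and that needs another strategy.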

Frequently Asked Questions

What is a failure mode?

A failure mode is the specific way in which a component, system, or process fails to perform its required function. It describes what goes wrong physically or functionally, such as a bearing seizing, a seal leaking, or a motor overheating, without explaining why it happened or what effect it produces.

What is the difference between a failure mode and a failure cause?

A failure mode is what fails and how it fails (for example, fatigue fracture of a shaft). A failure cause is the underlying reason the failure mode occurred (for example, misalignment causing cyclic stress). A failure effect is the consequence of the failure mode on system operation (for example, conveyor stops production). FMEA analyzes all three in sequence.

What are the most common types of failure modes in industrial equipment?

The most common failure modes in industrial equipment include fatigue fracture, corrosion, erosion, wear, overheating, deformation, electrical short circuit, insulation breakdown, leakage, contamination, blockage, and control or sensor signal loss. The specific failure modes that matter most depend on the equipment type, operating environment, and criticality.

How are failure modes identified in practice?

Failure modes are identified through structured methods including FMEA workshops, review of historical work order and failure code data, field interviews with operators and technicians, equipment teardowns, and analysis of sensor and monitoring data. RCM programs use a systematic function-by-function analysis to ensure no failure mode is overlooked.

How does understanding failure modes improve maintenance strategy?

When a team knows the specific failure modes an asset is susceptible to, they can select maintenance tasks that directly address those modes. Detectable failure modes with a measurable P-F interval are suited to condition-based and predictive maintenance. Random or sudden failure modes may require redesign or redundancy rather than scheduled inspection.

What is the role of failure modes in FMEA?

In FMEA, failure modes are the starting point of the analysis. For each function, analysts list every potential failure mode, then identify the effects of each failure mode and the causes that could trigger it. Each failure mode is scored for severity, occurrence, and detectability to calculate a Risk Priority Number (RPN), which drives corrective action priorities.

Can one component have multiple failure modes?

Yes. Most industrial components have multiple failure modes. A pump, for example, may experience seal leakage, cavitation, impeller erosion, bearing wear, and shaft misalignment as separate and independent failure modes. Each failure mode must be analyzed separately because each may require a different maintenance response.

The Bottom Line

A failure mode is more than a label on a work order. It is the specific physical or functional mechanism by which an asset stops meeting its required standard. Identifying failure modes accurately, and distinguishing them from failure causes and failure effects, is the foundation of every effective reliability program.

When failure modes are systematically captured in FMEA, RCM, and CMMS failure codes, they enable maintenance teams to select the right task for the right asset at the right time. Condition monitoring adds the most value when it is matched to specific failure modes with detectable P-F intervals, turning generic sensor data into actionable early warnings.
