FMEA (Failure Mode and Effects Analysis): Definition
Key Takeaways
- FMEA stands for Failure Mode and Effects Analysis. It is a proactive reliability tool used to identify failure risks before they cause downtime, safety incidents, or quality defects.
- The Risk Priority Number (RPN) is calculated as Severity x Occurrence x Detection, with each factor scored 1 to 10. Higher RPNs represent higher priority risks requiring corrective action.
- The three main FMEA types are Design FMEA (DFMEA), Process FMEA (PFMEA), and System FMEA, each targeting a different point in the product and manufacturing lifecycle.
- FMEA is a living document. Teams should update it whenever designs change, new failures are discovered, or corrective actions are implemented.
- FMEA differs from FMECA in that FMECA adds a quantitative criticality analysis step, making it more rigorous for high-consequence industries such as aerospace and defence.
What Is FMEA?
FMEA is a team-based analysis technique that asks a simple but powerful question for every component, function, or process step: "How can this fail, and what happens if it does?"
The method was developed by the US military in the 1940s and formalised in MIL-P-1629 in 1949. It was later adopted by the aerospace industry and, by the 1970s, by automotive manufacturers. Today it is a standard tool in industries ranging from manufacturing and healthcare to oil and gas.
FMEA forces engineering, operations, and maintenance teams to think through failure scenarios systematically rather than relying on intuition. Each failure mode is documented alongside its cause, its effect on the system, and the controls already in place to prevent or detect it. The output is a prioritised action list that directs resources toward the failures that matter most.
Because FMEA is conducted before failure, it sits alongside fault tree analysis and criticality analysis as one of the core proactive reliability methods in any mature maintenance programme.
How FMEA Works
FMEA works by decomposing a system or process into its individual functions, then methodically analysing what could go wrong at each step. The analysis follows a consistent structure that allows teams to compare and prioritise risk across many different failure scenarios.
For each failure mode, the team assigns three numerical ratings on a 1-to-10 scale:
- Severity (S): How serious is the effect of the failure on the customer, the process, or safety? A score of 1 means negligible impact; 10 means a safety-critical failure with no warning.
- Occurrence (O): How likely is this failure mode to occur? A score of 1 means extremely unlikely; 10 means near-certain to happen.
- Detection (D): How likely are existing controls to detect the failure before it reaches the end user or causes harm? A score of 1 means detection is almost certain; 10 means the failure is virtually undetectable with current controls.
These three scores are multiplied together to produce the Risk Priority Number:
RPN = Severity x Occurrence x Detection
The RPN ranges from 1 to 1,000. Teams use the RPN to rank failure modes and focus corrective actions on the highest-risk items first. Critically, a high Severity score alone can justify immediate action even if the RPN is moderate, because some failure consequences are unacceptable regardless of probability.
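The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than part of any FMEA standard; the function name `rpn` and the range check are assumptions added for clarity, and the example scores are invented.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number for one failure mode.

    Each rating must be an integer from 1 to 10.
    """
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


# Lowest and highest possible values on the 1-to-10 scales.
lowest = rpn(1, 1, 1)      # 1
highest = rpn(10, 10, 10)  # 1000

# A hypothetical failure mode: serious effect, occasional, fairly hard to detect.
example = rpn(8, 3, 6)     # 144
```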
FMEA RPN Calculation
The RPN is the core prioritisation tool within FMEA. It provides a single number that captures three dimensions of risk and allows teams to compare hundreds of failure modes on the same scale.
| Rating | Severity | Occurrence | Detection |
|---|---|---|---|
| 1 | No effect | Failure almost impossible | Detection almost certain |
| 2-3 | Minor effect, slight disruption | Low probability | High chance of detection |
| 4-6 | Moderate effect, customer dissatisfaction | Moderate probability | Moderate detection chance |
| 7-8 | High effect, loss of function | Moderately high probability | Low detection chance |
| 9-10 | Critical: safety hazard or regulatory violation | Very high, near-certain failure | Detection nearly impossible |
When interpreting RPN results, teams should not rely on the number alone. A failure mode with a Severity of 10 and low Occurrence and Detection scores may still demand immediate attention because the consequence is catastrophic. Many organisations set an absolute threshold for Severity (for example, any S score of 9 or 10 triggers mandatory action regardless of RPN).
After corrective actions are implemented, the team recalculates the RPN to confirm that risk has been reduced to an acceptable level.
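As a sketch of that recalculation, the snippet below compares the RPN before and after a corrective action. The scores are invented for illustration, and the helper name `rpn_reduction` is an assumption, not a standard term.

```python
def rpn_reduction(before: tuple[int, int, int], after: tuple[int, int, int]) -> dict:
    """Compare RPN before and after a corrective action.

    Each tuple is (severity, occurrence, detection) on the 1-to-10 scales.
    """
    s0, o0, d0 = before
    s1, o1, d1 = after
    initial = s0 * o0 * d0
    revised = s1 * o1 * d1
    return {
        "initial_rpn": initial,
        "revised_rpn": revised,
        "reduction_pct": round(100 * (initial - revised) / initial, 1),
    }


# Hypothetical case: a new inline sensor improves Detection from 8 to 3 and a
# tooling change lowers Occurrence from 5 to 3. Severity stays at 7.
result = rpn_reduction(before=(7, 5, 8), after=(7, 3, 3))
```

In this example the RPN falls from 280 to 63. Whether 63 counts as acceptable depends on the thresholds the organisation has agreed.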
Types of FMEA
FMEA is applied at different points in the product and process lifecycle. The three primary types each target a different source of risk.
Design FMEA (DFMEA)
Design FMEA analyses the failure modes that could arise from a product's design before it enters manufacturing. DFMEA is used by engineering teams during the design and development phase to identify weaknesses in components, materials, or geometry that could lead to product failures in the field.
The scope of DFMEA is the product design itself. It does not cover how the product is built, only how it was designed. Corrective actions from DFMEA typically involve design changes, tighter tolerances, alternative materials, or redundant components.
Process FMEA (PFMEA)
Process FMEA analyses the failure modes that could arise from a manufacturing, assembly, or service process. PFMEA is conducted by manufacturing and process engineering teams to identify how variability in a process step could produce a defect or non-conformance.
PFMEA covers process inputs such as equipment settings, tooling, operator actions, and environmental conditions. Corrective actions typically include process controls, mistake-proofing (poka-yoke), operator training, or inspection procedures.
System FMEA
System FMEA analyses failure modes at the highest level, looking at how subsystems and interfaces interact and how failures in one part of the system can propagate to affect the overall system function. It is conducted early in the development process when detailed design information is not yet available.
System FMEA is common in complex industries such as aerospace, automotive, and energy, where integrated systems must meet strict reliability and safety targets.
Other Variants
Additional FMEA variants exist for specific contexts:
- Machinery FMEA (MFMEA): Focuses on manufacturing equipment. Used by maintenance and reliability teams to prevent machine breakdowns.
- Software FMEA: Analyses failure modes in software functions and their effects on the system or user.
- Service FMEA: Analyses failure modes in service delivery processes such as maintenance procedures or logistics workflows.
FMEA Worksheet Structure
The FMEA worksheet is the working document where all analysis is recorded. A standard FMEA worksheet contains the following columns:
| Column | What to Record |
|---|---|
| Item / Process Step | The component, subsystem, or process step being analysed |
| Function | What the item or step is required to do |
| Potential Failure Mode | The specific way the item or step could fail to perform its function |
| Potential Effect of Failure | The consequence of the failure mode on the system, process, or customer |
| Severity (S) | Seriousness of the effect, rated 1 to 10 |
| Potential Cause of Failure | The root cause or mechanism that produces the failure mode |
| Current Prevention Controls | Existing controls that reduce the likelihood of the cause or failure mode occurring |
| Occurrence (O) | Likelihood that the failure mode will occur, rated 1 to 10 |
| Current Detection Controls | Existing controls that detect the failure mode or its cause before impact |
| Detection (D) | Likelihood that detection controls will catch the failure, rated 1 to 10 (1 = detection almost certain, 10 = virtually undetectable) |
| RPN | S x O x D: the calculated risk priority number |
| Recommended Action | The specific corrective or preventive action to reduce risk |
| Responsibility and Target Date | Owner accountable for the action and deadline for completion |
| Revised RPN | Recalculated RPN after the corrective action has been implemented |
The worksheet is a living document. Teams should review and update it whenever a design changes, a new failure mode is discovered, or a corrective action is implemented and its effectiveness confirmed.
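For teams that keep the worksheet in software rather than a spreadsheet, one row could be modelled roughly as follows. This is a sketch only: the class name `FmeaRow`, its field names, and the pump example are assumptions chosen to mirror the columns above, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FmeaRow:
    """One line of an FMEA worksheet, mirroring the columns listed above."""
    item: str
    function: str
    failure_mode: str
    effect: str
    severity: int
    cause: str
    prevention_controls: str
    occurrence: int
    detection_controls: str
    detection: int
    recommended_action: str = ""
    owner: str = ""
    target_date: str = ""  # kept as text for simplicity
    revised_severity: Optional[int] = None
    revised_occurrence: Optional[int] = None
    revised_detection: Optional[int] = None

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

    @property
    def revised_rpn(self) -> Optional[int]:
        """Revised RPN, or None until all three revised scores are recorded."""
        scores = (self.revised_severity, self.revised_occurrence, self.revised_detection)
        if any(s is None for s in scores):
            return None
        return scores[0] * scores[1] * scores[2]


# Invented example row.
row = FmeaRow(
    item="Cooling water pump",
    function="Deliver cooling water at the required flow",
    failure_mode="Bearing seizure",
    effect="Loss of cooling flow, process shutdown",
    severity=8,
    cause="Lubricant degradation",
    prevention_controls="Scheduled regreasing",
    occurrence=4,
    detection_controls="Monthly vibration route",
    detection=5,
)
```

Leaving the revised scores empty until the action is complete keeps the Revised RPN column blank instead of showing a number nobody has confirmed.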
FMEA vs FMECA
FMEA and FMECA are closely related but differ in their approach to quantifying risk.
| Dimension | FMEA | FMECA |
|---|---|---|
| Full name | Failure Mode and Effects Analysis | Failure Mode, Effects, and Criticality Analysis |
| Criticality step | Qualitative RPN scoring (S x O x D) | Adds quantitative criticality number using failure rate data and probability |
| Data requirements | Expert judgement and team knowledge | Requires historical failure rate data (e.g. MIL-HDBK-217) |
| Typical industries | Automotive, manufacturing, healthcare, oil and gas | Aerospace, defence, nuclear, space |
| Standards | AIAG-VDA FMEA Handbook, IEC 60812 | MIL-STD-1629A, SAE ARP5580 |
| Output | Prioritised action list based on RPN | Criticality matrix showing probability and consequence for each failure mode |
For most industrial maintenance and manufacturing teams, FMEA is the right starting point. FMECA is warranted when quantitative failure rate data is available and the consequences of failure justify the additional analysis effort.
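To make the quantitative criticality number concrete: in the MIL-STD-1629A approach, a failure mode criticality number is the product of a failure effect probability, a failure mode ratio, a part failure rate, and operating time. The sketch below follows that definition. The function name and the numeric inputs are illustrative assumptions, not values taken from any handbook.

```python
def mode_criticality(beta: float, alpha: float, failure_rate: float, hours: float) -> float:
    """Failure mode criticality number, C_m = beta * alpha * lambda_p * t.

    beta         -- conditional probability that the failure effect occurs (0 to 1)
    alpha        -- fraction of the part's failures that take this failure mode (0 to 1)
    failure_rate -- part failure rate (lambda_p), in failures per hour
    hours        -- operating time (t), in hours
    """
    if not (0 <= beta <= 1 and 0 <= alpha <= 1):
        raise ValueError("beta and alpha must be between 0 and 1")
    return beta * alpha * failure_rate * hours


# Illustrative numbers only.
c_m = mode_criticality(beta=0.5, alpha=0.25, failure_rate=2e-6, hours=1000)
```

Unlike the RPN, every input here is a measured or estimated physical quantity, which is why FMECA depends on failure rate data being available.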
When to Use FMEA
FMEA delivers the most value when used proactively, before failures occur. The most common situations where an FMEA is warranted include:
- New product or process design: Identify design weaknesses before tooling is committed or production begins.
- Design or process changes: Any modification to an existing product or process can introduce new failure modes that the original FMEA did not consider.
- Safety-critical applications: Any system where failure could injure people, damage the environment, or violate regulations demands a formal FMEA.
- High-cost failure history: When recurring failures are driving significant corrective maintenance costs, FMEA helps identify and address root causes systematically.
- Reliability-centred maintenance programmes: FMEA is a core input to reliability-centred maintenance (RCM), which uses failure mode data to select the most appropriate maintenance strategy for each asset.
- New supplier or manufacturing site qualification: FMEA helps assess whether a new source or location introduces risk before production is transferred.
FMEA is less suitable as a reactive investigation tool after a specific failure event. For post-failure analysis, root cause analysis techniques such as Five Whys are more appropriate starting points.
Steps to Conduct an FMEA
A rigorous FMEA follows a defined sequence. Skipping steps reduces the quality of the output and the reliability of the RPN scores.
Step 1: Define the Scope and Assemble the Team
Start by defining what is being analysed: a specific component, a process, a subsystem, or a full system. Set clear boundaries so the team knows what is in and out of scope. Assemble a cross-functional team that includes design, manufacturing, quality, maintenance, and any other function with relevant knowledge.
Step 2: Create a Functional Block Diagram
Break the system or process into its individual functions or steps. A functional block diagram or process flow map provides the structure for the FMEA worksheet. Each block or step becomes a line item in the analysis.
Step 3: Identify Potential Failure Modes
For each function or process step, the team identifies all the ways it could fail to perform as intended. A single function can have multiple failure modes. Teams should consider failure modes observed in similar products or processes, customer complaints, warranty data, and engineering judgement.
Step 4: Determine the Effects of Each Failure Mode
For each failure mode, the team defines what happens as a result: what effect does the failure have on the next step in the process, on the end customer, on safety, or on compliance? Effects can be local (within the subsystem) or at the system level.
Step 5: Rate Severity, Occurrence, and Detection
Using the agreed rating scales, the team assigns scores for Severity, Occurrence, and Detection for each failure mode. Ratings should be based on data where available, and on team consensus where data is limited. It is important that the team uses consistent criteria throughout the worksheet.
Step 6: Calculate the RPN and Prioritise
Multiply the three scores to produce the RPN. Sort the worksheet by RPN from highest to lowest. Focus corrective actions on the highest-RPN items first, but also flag any failure mode with a Severity score of 9 or 10 regardless of its RPN.
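The ranking and the separate Severity flag can be sketched as below. The failure modes and scores are invented, and `SEVERITY_THRESHOLD` is an assumed name for the "9 or 10" cut-off described in this step.

```python
# Each entry: (failure mode, severity, occurrence, detection). Scores are invented.
failure_modes = [
    ("Seal leak", 5, 6, 4),
    ("Bearing seizure", 8, 4, 5),
    ("Pressure relief valve stuck closed", 10, 2, 3),
    ("Gauge misread", 3, 5, 7),
]

SEVERITY_THRESHOLD = 9  # assumed cut-off, matching the "9 or 10" rule

scored = [
    {"mode": name, "severity": s, "rpn": s * o * d}
    for name, s, o, d in failure_modes
]

# Rank from highest to lowest RPN.
ranked = sorted(scored, key=lambda row: row["rpn"], reverse=True)

# Flag high-Severity items separately so a low RPN cannot hide them.
flagged = [row["mode"] for row in scored if row["severity"] >= SEVERITY_THRESHOLD]

for row in ranked:
    marker = " (high Severity)" if row["mode"] in flagged else ""
    print(f'{row["rpn"]:>4}  {row["mode"]}{marker}')
```

In this made-up data the stuck relief valve has the lowest RPN of the four (60) yet is the only item flagged, which is the situation the Severity rule exists to catch.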
Step 7: Assign Corrective Actions
For each high-priority failure mode, define a specific action to reduce risk. Actions fall into three categories: reducing Occurrence (eliminating or controlling the cause), improving Detection (adding or improving controls), or, as a last resort when the first two are not feasible, reducing Severity by mitigating the effect of the failure. Each action must have a named owner and a target completion date.
Step 8: Implement Actions and Recalculate the RPN
After actions are completed, the team reassigns Severity, Occurrence, and Detection scores and calculates a revised RPN. This confirms that the corrective action has achieved the intended risk reduction. If the revised RPN remains unacceptably high, further action is required.
Step 9: Maintain the FMEA as a Living Document
An FMEA is not a one-time exercise. It should be updated whenever the design or process changes, when new failure modes are identified through field experience, or when periodic reviews are scheduled. A CMMS can support this by linking FMEA findings to work orders and maintenance records, making it easier to track whether predicted failure modes are occurring in practice.
Benefits of FMEA
When conducted rigorously, FMEA delivers measurable benefits across design, manufacturing, and maintenance functions:
- Failure prevention: By identifying failure modes before they occur, FMEA allows teams to address root causes during design or process planning rather than after a costly failure event.
- Prioritised risk management: The RPN provides a structured basis for allocating corrective action resources to the failures that pose the greatest risk.
- Improved design quality: DFMEA surfaces design weaknesses early, when changes are least expensive to implement.
- Reduced warranty and field failures: PFMEA reduces the likelihood of process-related defects reaching customers, lowering warranty costs and protecting brand reputation.
- Maintenance strategy development: FMEA results inform preventive maintenance task selection, inspection intervals, and condition monitoring priorities. This link to condition monitoring and predictive maintenance is particularly valuable for high-consequence failure modes where early detection prevents catastrophic outcomes.
- Regulatory compliance: Many industries require documented FMEA as part of product approval, supplier qualification, or safety certification processes.
- Knowledge capture: The FMEA worksheet preserves institutional knowledge about failure risks, making it available to new team members and future design or process reviews.
- Cross-functional alignment: The team-based FMEA process surfaces perspectives from multiple disciplines, reducing the risk that any single viewpoint overlooks an important failure scenario.
FMEA and Reliability Programmes
FMEA does not stand alone. It is most effective when integrated into a broader reliability programme that includes failure analysis, RAM analysis, and risk-based maintenance planning.
In an RCM programme, FMEA provides the failure mode catalogue that feeds maintenance task selection. For each failure mode identified, the RCM logic asks whether a proactive maintenance task is technically feasible and worth doing. If so, the task is assigned to a schedule. If not, the analysis may lead to a design change, a one-time inspection, or an accepted run-to-failure decision based on consequence severity.
The failure rate data accumulated through a CMMS and condition monitoring systems feeds back into the FMEA, allowing teams to validate or revise the Occurrence scores assigned during the original analysis. This creates a continuous improvement loop that sharpens the accuracy of the FMEA over time.
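One way to picture this feedback loop is a check that converts observed failure history into an Occurrence score and compares it with the score assigned in the original analysis. Everything specific in the sketch below is hypothetical: the band boundaries in `OCCURRENCE_BOUNDS`, the function name, and the example figures are placeholders, and a real programme would substitute its own agreed Occurrence scale.

```python
from bisect import bisect_left

# Hypothetical bands: inclusive upper bound on failures per 1,000 operating
# hours for Occurrence scores 1 through 9. Anything above the last bound scores 10.
OCCURRENCE_BOUNDS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0]


def occurrence_from_history(failures: int, operating_hours: float) -> int:
    """Map an observed failure count to an Occurrence score using the bands above."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    rate_per_1000h = 1000 * failures / operating_hours
    return bisect_left(OCCURRENCE_BOUNDS, rate_per_1000h) + 1


assigned_occurrence = 4  # score from the original analysis (invented)
observed = occurrence_from_history(failures=3, operating_hours=20_000)
needs_review = observed != assigned_occurrence
```

With these placeholder bands, three failures in 20,000 hours maps to an Occurrence of 6, so the originally assigned 4 would be queued for review.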
Common FMEA Mistakes to Avoid
Several recurring mistakes reduce the value of an FMEA:
- Treating it as a paperwork exercise: FMEA only delivers value if the team engages seriously with each failure mode and follows through on corrective actions. An FMEA completed for compliance purposes without genuine analysis produces false confidence.
- Overly broad failure modes: Describing failure modes too broadly (for example, "pump fails") makes it impossible to identify specific causes and corrective actions. Failure modes should be precise and specific.
- Inconsistent rating scales: Teams must agree on rating criteria before scoring. If different team members interpret the Severity or Occurrence scales differently, the RPN rankings become unreliable.
- Ignoring high-Severity items: Focusing only on RPN without separately flagging high-Severity failures can leave catastrophic failure modes unaddressed if their Occurrence or Detection scores are low.
- Failing to update the document: An FMEA that is created once and never revised quickly becomes outdated and loses its value as a risk management tool.
- Inadequate team composition: FMEA conducted without input from operations, maintenance, or quality teams misses failure modes that only frontline experience reveals.
Frequently Asked Questions
What is FMEA?
FMEA (Failure Mode and Effects Analysis) is a structured, proactive method for identifying every way a system, product, or process can fail, determining the effect of each failure, and prioritising corrective actions based on a Risk Priority Number (RPN). It is performed before failures occur to prevent them.
What is the RPN in FMEA?
RPN stands for Risk Priority Number. It is calculated by multiplying three scores: Severity (S), Occurrence (O), and Detection (D). Each factor is rated on a 1-to-10 scale, giving an RPN range of 1 to 1,000. Higher RPNs represent higher priority risks requiring corrective action.
What are the main types of FMEA?
The three primary types are Design FMEA (DFMEA), Process FMEA (PFMEA), and System FMEA. DFMEA analyses product design failures, PFMEA analyses manufacturing and assembly process failures, and System FMEA analyses failures across integrated systems and subsystems.
What is the difference between FMEA and FMECA?
FMEA identifies failure modes and their effects. FMECA extends FMEA by adding a quantitative criticality analysis step, which uses failure rate data to calculate the probability and consequence severity of each failure mode. FMECA is more rigorous and is commonly used in aerospace, defence, and nuclear industries.
When should you conduct an FMEA?
FMEA should be conducted when designing a new product or process, when modifying an existing design, when failure consequences are safety-critical, as part of reliability-centred maintenance programmes, and when recurring failures are driving significant corrective maintenance costs.
What does an FMEA worksheet contain?
A standard FMEA worksheet contains the item or process step, its required function, the potential failure mode, the effect of the failure, Severity rating, the cause of the failure, current controls, Occurrence rating, Detection rating, the calculated RPN, recommended corrective action, responsible owner, target date, and a revised RPN after action is completed.
What are the steps to conduct an FMEA?
The core steps are: define the scope and assemble the team; create a functional block diagram; identify failure modes; determine their effects; rate Severity, Occurrence, and Detection; calculate and rank the RPN; assign corrective actions; implement actions and recalculate the RPN; and maintain the FMEA as a living document.
The Bottom Line
FMEA is most valuable when it is treated as a living document rather than a one-time exercise. As equipment ages, operating conditions change, and failure data accumulates, the failure modes and risk rankings need to be revisited regularly to remain accurate and actionable.
The organisations that get the most from FMEA are those that close the loop between FMEA outputs and maintenance execution. When high-RPN failure modes drive specific inspection tasks in the CMMS, and when field failure data is fed back to update the analysis, FMEA becomes a continuously improving guide to where maintenance resources should be focused rather than a static document completed once at commissioning.