Fault Tree Analysis: Definition

Definition: Fault tree analysis (FTA) is a top-down, deductive method for identifying all the combinations of causes that could lead to a specific undesired event, such as equipment failure, a process upset, or a safety incident. Starting from the top event and working systematically downward through a logic tree of AND and OR gates, FTA maps every pathway through which the failure could occur. It is one of the core tools in reliability engineering and safety analysis, used both to prevent failures during design and to investigate them after the fact.

What Is Fault Tree Analysis?

Fault tree analysis is a structured, logic-based technique for understanding how failures can occur in complex systems. An analyst defines a single undesired outcome at the top of the tree and then asks: what must happen for this event to occur? Each answer becomes a branch of the tree, and each branch is decomposed further until the analysis reaches basic, quantifiable events such as individual component failures, human errors, or external conditions.

The method was originally developed in the early 1960s at Bell Telephone Laboratories and was adopted by the aerospace and nuclear industries for safety-critical system analysis. It has since become a standard tool in process safety, mechanical reliability, and maintenance engineering across a wide range of industries.

The logic structure of the tree uses two primary gates. OR gates represent conditions where any single input is sufficient to produce the output. AND gates represent conditions where all inputs must be present simultaneously to produce the output. This Boolean logic structure allows the analyst to model redundancy, single points of failure, and complex multi-cause failure scenarios within a single diagram.
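
The two gates map directly onto Boolean operators. The Python sketch below is illustrative only: the event names (duty_pump_fails, standby_pump_fails, power_supply_fails) and the small tree are hypothetical, chosen to show one AND gate feeding one OR gate.

```python
from typing import Dict


def or_gate(*inputs: bool) -> bool:
    """Output occurs if any single input has occurred."""
    return any(inputs)


def and_gate(*inputs: bool) -> bool:
    """Output occurs only if every input has occurred."""
    return all(inputs)


def loss_of_flow(state: Dict[str, bool]) -> bool:
    # Hypothetical tree: flow is lost if both pumps fail (redundancy, AND)
    # or if the shared power supply fails (single point of failure, OR).
    both_pumps_failed = and_gate(
        state["duty_pump_fails"], state["standby_pump_fails"]
    )
    return or_gate(both_pumps_failed, state["power_supply_fails"])


# The duty pump failing alone does not cause the top event.
only_duty = {
    "duty_pump_fails": True,
    "standby_pump_fails": False,
    "power_supply_fails": False,
}
# The shared power supply is sufficient on its own.
only_power = {
    "duty_pump_fails": False,
    "standby_pump_fails": False,
    "power_supply_fails": True,
}

print(loss_of_flow(only_duty))   # False
print(loss_of_flow(only_power))  # True
```

Writing the tree as a function makes the redundancy explicit: the standby pump masks a duty pump failure, while nothing masks the power supply.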

FTA can be used qualitatively, to identify failure pathways and minimal cut sets, or quantitatively, to calculate the numerical probability of the top event when failure rate data is available for each basic event.

Top-Down Logic: How Fault Tree Analysis Works

The defining characteristic of FTA is its direction. Most failure analysis methods work upward from component-level detail. FTA works downward from a specific outcome.

The analyst begins by precisely defining the top event. Precision matters here. "Pump fails" is too vague. "Centrifugal pump P-101 fails to deliver flow at design conditions" is a workable top event. The more specific the definition, the more useful the analysis.

From the top event, the analyst asks: what immediate causes could produce this event? Those causes become the second level of the tree. Each cause is then decomposed in turn: what could produce this intermediate event? This decomposition continues level by level until the analysis reaches basic events that cannot be decomposed further, typically individual component failures, human errors, or environmental conditions that have known or estimated failure probabilities.

At each level, the analyst assigns a gate. An OR gate is used when any single input is sufficient to cause the output. An AND gate is used when every input must be present simultaneously.

The completed tree is a comprehensive, structured map of every failure pathway leading to the top event. Reading up from any basic event through its gate structure shows exactly how that event contributes to the system-level failure.
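
One way to picture "reading up" the tree is to store it as nested nodes and trace a basic event back to the top. This is a minimal sketch, assuming a simple Node class and a small hypothetical pump tree; neither comes from a specific FTA tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    gate: Optional[str] = None  # "AND", "OR", or None for a basic event
    children: List["Node"] = field(default_factory=list)


def path_to_top(root: Node, target: str) -> Optional[List[str]]:
    """Return labels from the target event up to the top event, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub = path_to_top(child, target)
        if sub is not None:
            # Record the gate the path passes through at this level.
            return sub + [f"{root.name} [{root.gate}]"]
    return None


# Hypothetical tree for illustration.
top = Node(
    "P-101 fails to deliver flow",
    "OR",
    [
        Node(
            "Pump mechanical failure",
            "OR",
            [Node("Bearing fails due to fatigue"), Node("Impeller fails")],
        ),
        Node("Motor fails"),
    ],
)

for step in path_to_top(top, "Bearing fails due to fatigue"):
    print(step)
```

Each printed line is one level of the tree, so the output shows which gates the bearing failure passes through on its way to the top event.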

Key Symbols and Gates in a Fault Tree Diagram

Fault tree diagrams use a standardized set of symbols. Understanding the primary symbols is essential for reading and building a fault tree.

Symbol	Name	Meaning	Example
Rectangle	Intermediate or top event	A fault event that can be decomposed further into more basic causes	Pump fails to deliver flow
Circle	Basic event	A primary fault event that requires no further decomposition; has an assigned probability	Bearing fails due to fatigue
Diamond	Undeveloped event	An event not further developed due to lack of data or low significance; treated as a basic event with estimated probability	External contamination source
OR gate	OR gate	The output event occurs if any one or more of the input events occurs; models single points of failure	Pump fails if impeller fails OR seal fails OR motor fails
AND gate	AND gate	The output event occurs only if all input events occur simultaneously; models redundancy requirements	System loses flow only if duty pump fails AND standby pump fails
House	House event	A normal event that is expected to occur under specific operating conditions; used to switch branches of the tree on or off	System operating in high-load mode

OR Gates: Single Points of Failure

An OR gate at a given node means the output event occurs whenever any single one of its inputs occurs. OR gates represent parallel failure pathways that converge on the same outcome. Every branch under an OR gate is an independent failure route to the output above it.

In practice, OR gates are very common because most intermediate events in a complex system can be caused by any one of several independent root causes. A motor failure might occur because of overheating, bearing wear, insulation breakdown, or overcurrent. Any one of these is sufficient: the connection is OR logic.

AND Gates: Redundancy and Combined Failures

An AND gate means the output event occurs only when all of its inputs are present at the same time. AND gates are how redundancy is represented in a fault tree. A system with a duty pump and a standby pump will have an AND gate above the event "loss of pumping capacity": both the duty pump failure and the standby pump failure must occur together.

AND gates are critical for identifying scenarios where fault tolerance through redundancy is effective. They also highlight common-cause failure risks: if the duty and standby components share a common failure mechanism, such as the same lubricant type, the same environmental exposure, or the same maintenance interval, the AND gate protection can be undermined.

Fault Tree Analysis vs FMEA

FTA and FMEA (Failure Mode and Effects Analysis) are the two dominant analytical frameworks in reliability engineering. They are often used together because they approach the same problem from opposite directions.

Dimension	Fault Tree Analysis (FTA)	FMEA
Direction	Top-down: starts with the failure outcome	Bottom-up: starts with individual failure modes
Starting point	A specific, predefined undesired event	Every failure mode of every component
Best suited for	Complex, multi-cause scenarios; investigating a known failure event; quantitative probability analysis	Comprehensive coverage of all possible failure modes; design review; prioritization by risk priority number
Output	Fault tree diagram, minimal cut sets, top event probability	FMEA worksheet with severity, occurrence, and detection ratings; risk priority numbers
Handles combined failures	Yes, through AND gate logic	Not directly; each failure mode is analyzed independently
Quantitative	Yes, when failure rate data is available	Semi-quantitative (risk priority number scoring)
Typical use in maintenance	Post-incident investigation, safety case development, reliability design review	Maintenance strategy development, reliability-centered maintenance (RCM) programs, design FMEA

In a comprehensive reliability program, FMEA provides broad coverage of the failure modes present in a system, and FTA provides depth on the specific failure scenarios that matter most. FMECA extends the FMEA approach by adding a criticality ranking, which helps prioritize which failure modes are most worth investigating further with FTA.
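
The risk priority number mentioned above is conventionally the product of the severity, occurrence, and detection ratings. The sketch below uses made-up failure modes and ratings, purely to show how an FMEA worksheet might be sorted to pick candidates for a follow-up FTA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # rating assigned by the FMEA team
    occurrence: int  # rating assigned by the FMEA team
    detection: int   # rating assigned by the FMEA team

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet rows.
modes = [
    FailureMode("Seal leak", severity=6, occurrence=5, detection=4),
    FailureMode("Bearing fatigue", severity=7, occurrence=4, detection=6),
    FailureMode("Impeller erosion", severity=5, occurrence=3, detection=3),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(mode.name, mode.rpn)
```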

How to Build a Fault Tree: Step-by-Step

Step 1: Define the Top Event

Write a precise, unambiguous description of the failure you are analyzing. Include the system, the function that is lost, and the conditions under which it occurs. Vague top events lead to incomplete trees.

Step 2: Assemble the Analysis Team

FTA benefits from cross-functional input. The team should include process or design engineers who understand system function, maintenance technicians who know how the equipment actually fails, and a reliability or safety engineer to lead the logic structure. Each participant brings knowledge that the others may lack.

Step 3: Identify the Immediate Causes

Working directly below the top event, list all the immediate causes that could produce it. For each, decide whether the relationship is OR (any one cause is sufficient) or AND (all causes must be present simultaneously). Place the appropriate gate and connect the causes.

Step 4: Decompose Each Cause

For each intermediate event created in step 3, repeat the process: what are its immediate causes? Continue decomposing level by level until you reach basic events that have known or estimable probabilities and cannot be broken down further into constituent causes.

Step 5: Identify Minimal Cut Sets

A cut set is a combination of basic events whose simultaneous occurrence causes the top event. A minimal cut set is the smallest such combination: removing any event from the set means the top event can no longer occur via that route.

Single-event minimal cut sets are single points of failure: one component or action is sufficient on its own to cause the top event. These are the highest-priority findings for design improvement and maintenance planning.
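
For small trees, minimal cut sets can be found by expanding the gates directly: an OR gate collects the cut sets of its inputs, and an AND gate combines one cut set from each input. The sketch below assumes a nested-tuple tree encoding and hypothetical event names. It is a simple illustration, not an optimized algorithm, and is not how any particular FTA package does it.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple, Union

# A tree is either a basic event name or a (gate, [children]) pair.
Tree = Union[str, Tuple[str, list]]


def cut_sets(node: Tree) -> Set[FrozenSet[str]]:
    """Expand the tree into cut sets (not yet minimal)."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set is a cut set of the output.
        return set().union(*child_sets)
    if gate == "AND":
        # Take one cut set from every child and merge them.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node: Tree) -> List[FrozenSet[str]]:
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(node)
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# Hypothetical tree: the third branch is redundant once the power supply
# is already a cut set on its own.
tree = (
    "OR",
    [
        ("AND", ["duty_pump_fails", "standby_pump_fails"]),
        "power_supply_fails",
        ("AND", ["power_supply_fails", "operator_error"]),
    ],
)

for cs in minimal_cut_sets(tree):
    print(sorted(cs))
# ['power_supply_fails']
# ['duty_pump_fails', 'standby_pump_fails']
```

The single-event set printed first is the single point of failure. The combination of power supply failure and operator error is discarded because the power supply failure alone already causes the top event.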

Step 6: Quantify (Optional)

If failure rate data is available for each basic event, calculate the probability of each event over the analysis period, then propagate those probabilities up through the gate structure. OR gates combine probabilities by addition, minus the overlap so that events occurring together are not double-counted. AND gates combine probabilities by multiplication. Both rules assume the input events are independent. The result is a calculated probability for the top event.
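
The propagation rules can be sketched in a few lines. For an OR gate, the adjusted sum for two inputs is p1 + p2 - p1*p2, which generalizes to 1 minus the product of (1 - p) over all inputs. The code below assumes independent basic events, no basic event repeated in more than one branch, a nested-tuple tree encoding, and hypothetical probabilities.

```python
from math import prod
from typing import Dict, Tuple, Union

# A tree is either a basic event name or a (gate, [children]) pair.
Tree = Union[str, Tuple[str, list]]


def top_probability(node: Tree, p: Dict[str, float]) -> float:
    """Propagate basic event probabilities up through AND/OR gates.

    Assumes all basic events are independent and each appears once.
    """
    if isinstance(node, str):
        return p[node]
    gate, children = node
    child_p = [top_probability(child, p) for child in children]
    if gate == "AND":
        return prod(child_p)
    if gate == "OR":
        # 1 - P(no input occurs); equals p1 + p2 - p1*p2 for two inputs.
        return 1.0 - prod(1.0 - q for q in child_p)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree and probabilities over the analysis period.
tree = (
    "OR",
    [
        ("AND", ["duty_pump_fails", "standby_pump_fails"]),
        "power_supply_fails",
    ],
)
probabilities = {
    "duty_pump_fails": 0.1,
    "standby_pump_fails": 0.1,
    "power_supply_fails": 0.01,
}

p_top = top_probability(tree, probabilities)
print(round(p_top, 4))  # 0.0199
```

When the same basic event appears in several branches, this gate-by-gate propagation is no longer exact, and the calculation is normally done on the minimal cut sets instead.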

Step 7: Interpret and Act

Review the minimal cut sets in order of their contribution to top event probability. Identify design changes, maintenance tasks, or operating procedures that reduce the likelihood of the most critical cut sets. Document findings and assign actions. For post-incident analysis, verify that the basic events identified actually occurred and that the tree correctly represents the failure mechanism.
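
Ranking cut sets by contribution can be sketched as follows. The example assumes independent basic events and uses the rare-event approximation, in which the top event probability is taken as the sum of the cut set probabilities. The event names and values are hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, List, Tuple


def rank_cut_sets(
    cut_sets: List[FrozenSet[str]], p: Dict[str, float]
) -> List[Tuple[List[str], float, float]]:
    """Return (events, probability, share of total), highest first."""
    scored = [(cs, prod(p[event] for event in cs)) for cs in cut_sets]
    # Rare-event approximation: total is the sum of cut set probabilities.
    total = sum(q for _, q in scored)
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(sorted(cs), q, q / total) for cs, q in scored]


# Hypothetical minimal cut sets and basic event probabilities.
cut_sets = [
    frozenset({"duty_pump_fails", "standby_pump_fails"}),
    frozenset({"power_supply_fails"}),
    frozenset({"suction_strainer_blocked"}),
]
probabilities = {
    "duty_pump_fails": 0.05,
    "standby_pump_fails": 0.1,
    "power_supply_fails": 0.02,
    "suction_strainer_blocked": 0.001,
}

ranking = rank_cut_sets(cut_sets, probabilities)
for events, q, share in ranking:
    print(events, f"{q:.4f}", f"{share:.1%}")
```

With these assumed numbers, the power supply dominates the total, so it would be the first candidate for a design change or a maintenance task.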

Fault Tree Analysis Example

Consider a fire suppression system that fails to activate on demand. The top event is: "Suppression system fails to deliver agent to the fire zone."

The immediate causes might be: no agent reaches the nozzles even though the release valve opens (OR) the control system fails to open the release valve (OR) the agent storage vessel is empty. Each of these branches decomposes further.

"Control system fails to open the release valve" might decompose as: detection sensor fails to trigger (OR) control panel fails to process signal (OR) solenoid valve fails to open. This is an OR gate: any single failure in the chain produces the outcome.

"No agent reaches the nozzles even though the valve opens" might decompose as: distribution pipework is blocked (OR) nozzle orifices are blocked. Again, OR logic.

Now consider the redundancy the designer intended to build in. If a standby detector is connected in parallel with the primary detector, the branch becomes: primary detector fails to trigger AND standby detector fails to trigger. This is an AND gate. Both detectors must fail simultaneously for the control system to be starved of a signal. This cut set has a much lower probability than either detector failing alone because, provided the two failures are independent, the individual probabilities are multiplied.
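
As a worked illustration with assumed numbers (not taken from any real detector data): if each detector independently has a 0.01 probability of failing on demand, the AND gate gives

```latex
P(\text{no detection signal})
  = P(\text{primary fails}) \times P(\text{standby fails})
  = 0.01 \times 0.01 = 10^{-4}
```

That is a hundred times less likely than a single detector failing, but only for as long as the independence assumption holds.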

The minimal cut sets from this tree would include: (a) control panel failure alone (single-event cut set, high priority), (b) primary detector failure AND standby detector failure (two-event cut set, lower priority if both detectors are well maintained), (c) solenoid valve failure alone (single-event cut set, high priority), and so on.

The analysis tells the team that the control panel and the solenoid valve are single points of failure requiring priority maintenance attention and possibly design redundancy, while the detection circuit is already protected by the AND logic of the parallel detectors.

When to Use Fault Tree Analysis

FTA is not appropriate for every reliability problem. It is most valuable in specific circumstances.

Post-incident investigation. After a significant failure, FTA provides a structured method to trace the event back to its root causes and identify every pathway through which the failure occurred. It complements root cause analysis by providing a logic framework that ensures no pathway is overlooked. While Five Whys traces a single causal chain, FTA maps all chains in parallel.

Safety case development. Regulatory submissions for safety-critical systems often require quantitative evidence that the probability of a hazardous event is below a defined threshold. FTA provides the analytical structure to calculate that probability and demonstrate compliance.

Design review of complex systems. When evaluating a new design, FTA identifies single points of failure, common-cause failure vulnerabilities, and the adequacy of redundant protection before the system is built. This is far less costly than discovering the same problems through operational failures.

Reliability-centered maintenance programs. In RCM, FTA can be used to understand the system context of a critical failure mode and to evaluate whether a proposed maintenance task actually addresses the relevant failure pathway.

Prioritizing maintenance resources. The minimal cut sets and their probabilities provide an objective basis for allocating maintenance effort. Single-event minimal cut sets and high-probability cut sets receive priority. This supports criticality analysis and risk-based maintenance planning.

Qualitative vs Quantitative Fault Tree Analysis

FTA can be performed at two levels of rigor depending on the purpose and available data.

Qualitative FTA is completed without numerical probability values. The goal is to map the failure structure, identify all minimal cut sets, and understand which pathways are logically possible. Qualitative FTA is useful for design reviews, failure investigations, and maintenance planning when precise probability data is not available. The output is a prioritized list of failure pathways ranked by their logical structure and engineering judgment about likelihood.

Quantitative FTA assigns failure probabilities to each basic event and calculates the probability of the top event. It requires failure rate data, which can come from equipment databases, manufacturer specifications, field history, or published reliability handbooks. The output includes a numerical top event probability and a ranked list of cut sets by their contribution to that probability. Quantitative FTA is used in formal safety cases, probabilistic risk assessments, and RAM analysis.
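
A common way to turn a failure rate into a basic event probability is to assume a constant failure rate (exponential model). That modeling choice is an assumption here, not something the method itself prescribes, and the rate and mission time below are hypothetical:

```latex
p(t) = 1 - e^{-\lambda t}, \qquad
\lambda = 10^{-5}\ \text{h}^{-1},\quad t = 8760\ \text{h}
\;\Rightarrow\;
p = 1 - e^{-0.0876} \approx 0.084
```

In this sketch, a component with that failure rate has roughly an 8% chance of failing at some point during one year of continuous operation.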

The choice between qualitative and quantitative FTA is driven by purpose, available data, and the cost of analysis relative to the benefit of precision. For most operational maintenance applications, qualitative FTA provides sufficient insight. Quantitative FTA adds value when a specific numerical probability target must be demonstrated.

Common Pitfalls in Fault Tree Analysis

An imprecise top event. A vague top event produces a vague tree. If the top event is defined as "pump fails" rather than "pump fails to deliver rated flow under design conditions," the tree will include failure modes that are not actually relevant to the scenario being studied.

Missing common-cause failures. A common-cause failure occurs when a single event causes multiple components to fail simultaneously, undermining the AND gate protection that redundancy provides. If two parallel pumps share the same lubricant, the same vibration environment, or the same maintenance technician making the same error, a single cause can defeat both. Common-cause failure analysis should be a deliberate step in any FTA involving redundant components.
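
The effect can be illustrated with the beta-factor model, a widely used simplification that is brought in here for illustration and is not part of the text above. It assumes a fraction beta of each component's failure probability comes from a cause shared by both components. The values of q and beta are hypothetical.

```python
def redundant_pair_failure(q: float, beta: float) -> float:
    """Approximate failure probability of a two-component redundant pair.

    q    -- failure probability of each component (assumed equal)
    beta -- assumed fraction of q attributable to a shared cause
    """
    independent = ((1.0 - beta) * q) ** 2  # both fail independently
    common_cause = beta * q                # one shared cause fails both
    return independent + common_cause


q = 0.01
no_ccf = redundant_pair_failure(q, beta=0.0)    # about 1.0e-4
with_ccf = redundant_pair_failure(q, beta=0.1)  # about 1.08e-3

print(f"{no_ccf:.2e}", f"{with_ccf:.2e}")
```

Under these assumed values, a 10% common-cause fraction makes the redundant pair roughly ten times more likely to fail than the pure AND gate calculation suggests.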

Incomplete decomposition. Stopping the tree too early, before reaching basic events with known probabilities, means the analysis cannot be made quantitative and may miss important failure pathways at lower levels. Each branch should be decomposed until it reaches events that are genuinely independent and have estimable probabilities.

Treating the tree as a one-time exercise. Equipment changes, operating condition changes, and accumulated failure history all affect the validity of a fault tree over time. FTA findings should be reviewed when significant changes occur and updated as part of ongoing reliability management.

FTA and Predictive Maintenance

Fault tree analysis identifies which basic events, if they occur, have the greatest impact on the top event probability. Predictive maintenance provides the means to detect those basic events in their early stages, before they cause the full failure pathway to become active.

The basic events identified in a fault tree as high-priority single points of failure or members of high-probability cut sets are precisely the components that most benefit from continuous condition monitoring. If a solenoid valve is identified as a single-event cut set for a critical safety function, monitoring its electrical response time and coil insulation condition provides early warning before the functional failure occurs. If a bearing failure in the duty pump is a basic event in a two-event cut set alongside standby pump failure, vibration monitoring on both pumps ensures that the AND gate protection is never silently degraded.

Used together, FTA and condition monitoring form a closed loop: FTA identifies what to monitor and why, and monitoring data validates the failure probabilities assumed in the tree while detecting developing faults before they contribute to the top event. This approach also feeds FRACAS (Failure Reporting, Analysis, and Corrective Action Systems), which captures operational failure data to update and refine the tree over time.

FTA in the Context of Other Reliability Methods

FTA sits within a broader ecosystem of reliability and failure analysis methods. Each method has a different scope and level of analysis:

FMEA provides breadth: systematic coverage of all failure modes. FTA provides depth: detailed analysis of specific failure scenarios. They are complementary.
Root cause analysis investigates why a specific failure occurred after the fact. FTA can be used as the structured framework within an RCA to ensure all causal pathways are considered, not just the most obvious one.
Five Whys is a rapid, informal technique for simple, single-chain failure investigations. FTA is a formal, structured method for complex, multi-cause scenarios.
The bathtub curve context helps determine whether a failure is early-life, random, or wear-out in nature, which informs which branches of the fault tree are most likely.
RAM analysis models overall system reliability and availability. FTA provides the failure pathway logic that underpins the RAM model for complex systems.
Event tree analysis (ETA) is the forward-looking complement to FTA. Where FTA asks "what could cause this top event," ETA asks "if this initiating event occurs, what are the possible outcomes?" The two methods are often used together in full hazard and risk assessments.

Benefits of Fault Tree Analysis

FTA provides several distinct advantages over less structured approaches to failure analysis and prevention.

It forces explicit logic. Every connection in a fault tree must be justified as either AND or OR logic. This discipline prevents the analytical shortcuts that can cause informal investigations to miss important failure pathways.

It handles complexity. Systems with redundancy, multiple failure modes, and interdependent components are difficult to analyze informally. The tree structure handles this complexity systematically, ensuring that combined failure scenarios are not overlooked.

It prioritizes action. Minimal cut sets ranked by their probability or engineering significance give maintenance and engineering teams an objective basis for allocating limited resources to the failure pathways that matter most.

It supports both prevention and investigation. The same method is applicable before a failure occurs (design review, maintenance planning) and after one occurs (incident investigation, corrective maintenance follow-up). This makes it a versatile tool across the reliability engineering lifecycle.

It creates a shared understanding. A completed fault tree is a visual document that engineers, maintenance teams, operators, and managers can all read. It creates a shared understanding of how the system can fail and what is being done to prevent it.

Frequently Asked Questions

What is fault tree analysis?

Fault tree analysis (FTA) is a top-down, deductive reliability method that starts with an undesired top-level event and systematically maps all the combinations of causes that could produce it. The analysis uses a logic tree of AND gates, OR gates, and basic event symbols to represent the failure structure. The output is a visual diagram showing every failure pathway to the top event, together with the minimal cut sets that represent the smallest combinations of failures sufficient to cause the outcome.

What is the difference between FTA and FMEA?

FTA and FMEA analyze the same system from opposite directions. FTA is top-down: it starts with a specific failure outcome and asks what combinations of causes could produce it. FMEA is bottom-up: it starts with individual component failure modes and asks what effects each could have on the system. FTA is better suited to complex multi-cause scenarios and quantitative probability analysis. FMEA provides broader, more comprehensive coverage of all failure modes. Both are used together in robust reliability programs.

What are AND gates and OR gates in a fault tree?

An OR gate means the output event occurs if any one of the input events occurs. OR gates represent single points of failure where a single cause is sufficient. An AND gate means the output event occurs only if all input events occur simultaneously. AND gates model redundancy: every element of the group must fail at the same time to produce the output. AND gates are how fault trees represent the protection provided by redundant components, and they show where common-cause failure vulnerability can undermine that protection.

When should you use fault tree analysis?

FTA is most valuable for complex, high-consequence failure scenarios involving multiple possible causes or redundant system architectures. Key use cases include: post-incident investigation to identify all failure pathways, design review to find single points of failure before a system is built, safety case development requiring quantitative probability evidence, and maintenance prioritization based on minimal cut set analysis. For simple, single-cause failures, less intensive methods such as root cause analysis or Five Whys are sufficient.

What is a minimal cut set in fault tree analysis?

A minimal cut set is the smallest combination of basic events (component failures, human errors, or external conditions) whose simultaneous occurrence is sufficient to cause the top event. Single-event minimal cut sets represent single points of failure. Identifying minimal cut sets is one of the primary analytical outputs of FTA because it shows exactly which individual failures or combinations of failures the design and maintenance program must prevent. Cut sets are ranked by probability to focus resources on the most critical pathways.

Can fault tree analysis be quantitative?

Yes. When failure rate data is available for each basic event, FTA can calculate the probability of the top event over a defined time period. The calculation applies Boolean probability rules through the gate structure: OR gates add probabilities (with adjustment for overlaps), AND gates multiply them. Quantitative FTA is used in formal safety cases, probabilistic risk assessments, and regulatory submissions where a specific numerical probability target must be demonstrated. When failure data is not available, qualitative FTA still provides valuable structural insight through minimal cut set analysis.

The Bottom Line

Fault tree analysis is one of the most rigorous tools available for understanding system failure. By working backward from a defined undesired event through logic gates to contributing causes, FTA reveals the specific combinations of failures that can produce a catastrophic outcome, and therefore exactly where design safeguards and maintenance tasks need to be focused.

For maintenance organizations, FTA findings have direct practical implications. When a minimal cut set identifies a combination of component failures that can cause a dangerous or costly system failure, each component in that cut set needs a maintenance strategy that prevents the failure mode from occurring. The rigor of FTA ensures that maintenance priorities are grounded in a defensible analysis of actual system risk rather than general experience or intuition.
