How do you calculate failure rate?

Failure rate (lambda, λ) is calculated using the formula: λ = Number of Failures / Total Operating Time. For example, if a pump experiences 5 failures over 10,000 hours of operation, the failure rate is 0.0005 failures per hour. For highly reliable components, the result is often expressed in FIT (failures in time), where 1 FIT equals one failure per billion device-hours.

What is the difference between failure rate and MTBF?

Failure rate (λ) and Mean Time Between Failures (MTBF) are mathematical inverses of each other: MTBF = 1 / λ. A higher failure rate means a lower MTBF, and vice versa. MTBF expresses reliability as an average time between failures, while failure rate expresses it as a frequency. The inverse relationship holds only when the failure rate is constant, as it roughly is during the useful life phase of the bathtub curve.

What is a good failure rate for industrial equipment?

There is no universal benchmark because what constitutes an acceptable failure rate depends on the asset's criticality, the industry, and the consequence of failure. A good failure rate is one that is low enough to meet production targets and safety requirements without causing unacceptable downtime or maintenance costs. Maintenance teams typically set target failure rates based on historical data and OEM specifications.

How does the bathtub curve relate to failure rate?

The bathtub curve illustrates how failure rate changes across an asset's life. During the infant mortality phase, failure rate is high and decreasing due to manufacturing defects and installation errors. In the useful life phase, failure rate stabilizes at a low, roughly constant level. During the wear-out phase, failure rate increases again as components degrade. Understanding which phase an asset is in helps teams apply the right maintenance strategy.

How is failure rate used in maintenance planning?

Failure rate data is used to set preventive maintenance intervals, justify predictive monitoring investments, prioritize critical assets, determine spare parts stocking levels, and perform risk-based maintenance analysis. When failure rate rises above historical baselines, it signals that an asset may be entering the wear-out phase and needs attention before a breakdown occurs.

What is the unit of failure rate?

The most common unit is failures per hour (f/h) for industrial equipment. In electronics and high-reliability systems, failure rate is often expressed in FIT (failures in time), where 1 FIT equals one failure per billion device-hours. Some industries also express failure rate as a percentage per year or failures per operating cycle.

Failure Rate: Definition

Definition: Failure rate is the frequency at which an asset or component fails within a given operating period. It is expressed as the number of failures per unit of time (typically failures per hour) and is the foundational metric of reliability engineering. A lower failure rate indicates a more reliable asset.

What Is Failure Rate?

Failure rate measures how often a piece of equipment fails during normal operation. It answers a direct question: on average, how many times per unit of time will this asset fail?

Maintenance teams use failure rate to understand asset reliability, set maintenance intervals, and prioritize where to invest in monitoring or upgrades. It is one of the core inputs in reliability-centered maintenance (RCM), FMEA, and risk-based maintenance programs.

Failure rate is typically denoted by the Greek letter lambda (λ) and is mathematically related to Mean Time Between Failures (MTBF), Mean Time to Failure (MTTF), and overall asset reliability.

Failure Rate Formula and Calculation

The standard failure rate formula is:

  λ = Number of Failures / Total Operating Time

The result is expressed in failures per unit of time. Common units include failures per hour, failures per year, or failures per operating cycle, depending on the asset type.

Step-by-Step Example

Suppose a centrifugal pump runs continuously for 12 months and experiences 3 failures in that period. One year of continuous operation is 8,760 hours (repair downtime is ignored here for simplicity).

Number of failures: 3
Total operating time: 8,760 hours
Failure rate (λ): 3 / 8,760 = 0.000342 failures per hour

This means the pump fails approximately 0.000342 times per operating hour, or roughly once every 2,920 hours under normal conditions.
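
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names `failure_rate` and `mtbf_hours` are made up for this example, and the inputs are the pump figures from the text.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Return lambda: failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def mtbf_hours(failures: int, operating_hours: float) -> float:
    """Return mean operating hours between failures (1 / lambda)."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return operating_hours / failures


# Pump example from the text: 3 failures in 8,760 hours of continuous operation.
pump_lambda = failure_rate(3, 8760)
pump_mtbf = mtbf_hours(3, 8760)

print(f"lambda = {pump_lambda:.6f} failures per hour")
print(f"MTBF   = {pump_mtbf:.0f} hours")
```

Keeping the two functions separate makes the zero-failure case explicit: the failure rate is simply 0, while MTBF cannot be computed from the observation window alone.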

FIT: Failures in Time

For components with very low failure rates (such as electronic parts or safety systems), failure rate is often expressed in FIT (failures in time), where:

  1 FIT = 1 failure per 1,000,000,000 (10⁹) device-hours

FIT is common in semiconductor, automotive, and aerospace reliability analysis where failure rates are extremely small.
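
Converting between failures per hour and FIT is a matter of scaling by 10⁹. The sketch below assumes nothing beyond that definition; the function names and the 50 FIT component are hypothetical.

```python
FIT_SCALE = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def per_hour_to_fit(lam_per_hour: float) -> float:
    """Convert a failure rate in failures per hour to FIT."""
    return lam_per_hour * FIT_SCALE


def fit_to_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT to failures per hour."""
    return fit / FIT_SCALE


# Hypothetical electronic component rated at 50 FIT.
component_lambda = fit_to_per_hour(50)

# The pump from the earlier example, for contrast.
pump_fit = per_hour_to_fit(3 / 8760)

print(f"50 FIT = {component_lambda:.2e} failures per hour")
print(f"Pump   = {pump_fit:,.0f} FIT")
```

The contrast shows why FIT is reserved for very reliable parts: a typical mechanical asset expressed in FIT produces an unwieldy six-digit number.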

Failure Rate vs MTBF

Failure rate and MTBF describe the same reliability characteristic from different directions. MTBF asks how long an asset typically operates between failures. Failure rate asks how often failures occur per unit of time. They are mathematical inverses.

Metric	Formula	Unit	What It Answers
Failure Rate (λ)	Failures / Operating Time	Failures per hour	How often does this asset fail?
MTBF	1 / λ (or Operating Time / Failures)	Hours between failures	How long between failures, on average?
MTTF	Total Operating Time / Failures (non-repairable items)	Hours to first failure	How long before a new component first fails?
MTTR	Total Repair Time / Number of Repairs	Hours per repair	How long does it take to restore the asset?

MTBF is the most common way to communicate reliability to maintenance managers and operations leaders because it is expressed in familiar time units. Failure rate is preferred by reliability engineers and statisticians because it integrates directly into reliability models and probability calculations.

One important note: the relationship MTBF = 1 / λ assumes a constant failure rate. This assumption is only valid during the useful life phase of the asset. In the infant mortality and wear-out phases, the failure rate changes with age and more advanced statistical models (such as the Weibull distribution) are needed.

The Bathtub Curve: How Failure Rate Changes Over an Asset's Life

The bathtub curve is the standard model used in reliability engineering to describe how failure rate evolves across an asset's operational life. The curve is shaped like a bathtub when plotted, with high failure rates at the beginning and end of life and a low, stable failure rate in the middle.

The three phases are:

Phase 1: Infant Mortality (Decreasing Failure Rate)

Newly commissioned assets have an elevated failure rate that decreases over time. Failures in this phase are caused by manufacturing defects, installation errors, substandard components, and incorrect commissioning procedures.

This phase is also called "early life failure" or "burn-in failure." Manufacturers of electronic components sometimes test new products at full stress levels before shipment to eliminate defective units and reduce the effective failure rate seen by end users.

For maintenance teams, the infant mortality phase means new equipment should be monitored more closely after installation. A high early failure rate does not, by itself, indicate poor long-term reliability.

Phase 2: Useful Life (Constant Failure Rate)

During normal operating life, failure rate stabilizes at a relatively low, constant level. Failures in this phase are random and unpredictable. They are caused by sudden overloads, operator errors, unexpected process upsets, or random component variation rather than systematic wear.

This is the phase where MTBF-based planning is most accurate. An asset in its useful life phase with an MTBF of 5,000 hours has a failure rate of 0.0002 failures per hour, and that rate remains roughly constant until the asset begins to age.

Time-based preventive maintenance, when applied during the useful life phase, may not reduce random failures. Condition-based maintenance and continuous monitoring are often more cost-effective strategies for this phase.

Phase 3: Wear-Out (Increasing Failure Rate)

As components reach the end of their design life, wear, corrosion, fatigue, and material degradation cause failure rate to rise. Failures become more predictable and more frequent. This is the phase where scheduled replacement and overhaul tasks are most valuable.

Identifying when an asset transitions into the wear-out phase is one of the most important tasks in a reliability-centered maintenance program. Condition monitoring technologies such as vibration analysis, oil analysis, and thermography help detect early signs of degradation before failure rate accelerates.

How Failure Rate Changes Over Time

The bathtub curve shows the general shape of failure rate over a lifetime, but individual assets can exhibit different patterns. Reliability engineers use the Weibull distribution to model failure rate more precisely because it can represent decreasing, constant, or increasing failure rates depending on the shape parameter (beta, β):

Weibull Shape (β)	Failure Rate Behavior	Lifecycle Phase
β less than 1	Decreasing failure rate	Infant mortality
β equals 1	Constant failure rate	Useful life
β greater than 1	Increasing failure rate	Wear-out

Understanding the Weibull shape parameter for a given asset class tells reliability engineers which maintenance strategy is most appropriate and when proactive intervention is warranted.
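
To make the table concrete, here is a small sketch of the two-parameter Weibull hazard (instantaneous failure rate) function, h(t) = (β / η) · (t / η)^(β − 1). The scale parameter η (eta, the characteristic life) is not discussed in the text above, and the value of 5,000 hours is an arbitrary assumption for illustration.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate of a two-parameter Weibull at age t."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


ETA_HOURS = 5000.0  # assumed characteristic life, for illustration only
AGES = (500.0, 2500.0, 10000.0)

for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(age, beta, ETA_HOURS) for age in AGES]
    formatted = ", ".join(f"{r:.6f}" for r in rates)
    print(f"beta = {beta}: hazard at {AGES} hours -> {formatted}")
```

With β = 1 the age term drops out and the hazard reduces to 1 / η at every age, which is the constant-rate case that MTBF-based planning relies on.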

When an asset's observed failure rate begins to rise above its historical baseline, that is a key signal to investigate root causes, inspect for wear, and consider whether replacement or overhaul is approaching. Remaining useful life models use failure rate trends to estimate when an asset is likely to fail.

How Failure Rate Is Used in Maintenance Planning

Failure rate is not just a historical metric. It is an active planning tool. Maintenance teams use failure rate data in the following ways:

Setting Preventive Maintenance Intervals

PM intervals should be shorter than the expected time to failure, calculated from failure rate data. If a bearing has an MTBF of 4,000 hours (failure rate of 0.00025 failures per hour), a PM interval of 2,000 to 3,000 hours is typically appropriate. Setting intervals without failure rate data leads to either over-maintenance (wasted cost) or under-maintenance (unplanned failures).

Spare Parts Stocking

Failure rate determines how often a part will need to be replaced. A higher failure rate means a faster consumption rate and a higher recommended stock level. Inventory management teams use failure rate alongside lead time to calculate reorder points and safety stock levels for critical components.
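
One common simplification, sketched below, treats part demand during the supplier lead time as Poisson-distributed, which is consistent with a constant failure rate. The reorder point is then the smallest stock level that covers lead-time demand with a chosen probability. The fleet size, lead time, and service level are hypothetical; the bearing failure rate of 0.00025 per hour is the one used earlier in this article.

```python
import math


def expected_demand(lam_per_hour: float, units_installed: int,
                    lead_time_hours: float) -> float:
    """Mean number of replacement parts needed during the lead time."""
    return lam_per_hour * units_installed * lead_time_hours


def reorder_point(mean_demand: float, service_level: float = 0.95,
                  max_stock: int = 10_000) -> int:
    """Smallest stock k such that Poisson P(demand <= k) >= service_level."""
    if not 0 < service_level < 1:
        raise ValueError("service_level must be between 0 and 1")
    if mean_demand < 0 or mean_demand > 700:
        raise ValueError("mean_demand outside the range this sketch handles")
    k = 0
    term = math.exp(-mean_demand)  # P(demand = 0)
    cumulative = term
    while cumulative < service_level:
        if k >= max_stock:
            raise ValueError("service level not reached within max_stock")
        k += 1
        term *= mean_demand / k    # P(demand = k) from P(demand = k - 1)
        cumulative += term
    return k


# Assumed scenario: 10 identical bearings, 30-day (720-hour) lead time.
mean = expected_demand(0.00025, units_installed=10, lead_time_hours=720)
stock = reorder_point(mean, service_level=0.95)
print(f"Expected lead-time demand: {mean:.2f} parts")
print(f"Reorder point at 95% service level: {stock} parts")
```

The gap between the mean demand and the reorder point is the safety stock. Assets in the wear-out phase violate the constant-rate assumption, so this sketch would understate their demand.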

Prioritizing Predictive Monitoring

Assets with high failure rates, or assets whose failure rate is trending upward, are the highest-priority candidates for continuous condition monitoring. Deploying sensors on these assets first delivers the greatest reduction in unplanned downtime.

Risk-Based Maintenance Prioritization

In risk-based maintenance, risk is calculated as the product of failure probability (derived from failure rate) and the consequence of failure. Assets with a high failure rate and high consequence of failure are ranked first for maintenance resources and investment.
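
A minimal sketch of that ranking follows. It assumes a constant failure rate, so the probability of at least one failure over a planning horizon is 1 − e^(−λt). The asset names, failure rates, and consequence costs are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    lam_per_hour: float      # failure rate, failures per hour
    consequence_cost: float  # estimated cost of one failure


def failure_probability(lam_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within `hours` (constant rate)."""
    return 1.0 - math.exp(-lam_per_hour * hours)


def rank_by_risk(assets: list[Asset], horizon_hours: float) -> list[tuple[str, float]]:
    """Return (name, risk) pairs sorted from highest to lowest risk."""
    scored = [
        (a.name, failure_probability(a.lam_per_hour, horizon_hours) * a.consequence_cost)
        for a in assets
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical assets and costs.
fleet = [
    Asset("Pump A", 0.0003, 20_000),
    Asset("Compressor B", 0.0001, 150_000),
    Asset("Fan C", 0.0005, 2_000),
]

ranking = rank_by_risk(fleet, horizon_hours=1000)
for name, risk in ranking:
    print(f"{name}: expected loss {risk:,.0f}")
```

In this made-up fleet the compressor ranks first despite having the lowest failure rate, because its consequence of failure dominates.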

Reliability Centered Maintenance Analysis

RCM uses failure rate data from historical records, OEM specifications, and industry databases to evaluate failure modes and select the most effective maintenance task for each one. Without accurate failure rate data, RCM analysis relies on assumptions rather than evidence.

Failure Rate in Industry Applications

Different industries apply failure rate analysis in different contexts. The underlying formula is the same, but the data sources, acceptable thresholds, and maintenance consequences vary significantly.

Industry	Common Application	Typical Data Sources
Manufacturing	Setting PM intervals, OEE analysis	CMMS work order history, OEM data
Oil and Gas	Safety system testing (FFI), pressure vessel inspection	OREDA database, plant history
Chemical	Process reliability, safety instrumented systems	CCPS data, site records
Food and Beverage	Packaging line reliability, hygienic equipment performance	Internal maintenance records, OEM specs
Mining	Rotating equipment reliability, haul truck components	Fleet records, component tracking systems

Across all industries, the quality of failure rate analysis depends on the quality of failure data recorded. Standardized failure codes in a CMMS make it possible to accurately count failures by mode, asset class, and time period, which improves the accuracy of calculated failure rates.

Failure Rate and Reliability: The Mathematical Relationship

For an asset with a constant failure rate (the useful life phase), the reliability function R(t) gives the probability that the asset will operate without failure for a period of time t:

  R(t) = e^{-λt}

Where e is Euler's number (approximately 2.718), λ is the failure rate, and t is the time of interest.

For example, if a motor has a failure rate of 0.0001 failures per hour, its reliability over 1,000 hours is:

  R(1000) = e^{-(0.0001 × 1000)} = e^{-0.1} ≈ 0.905, or 90.5% reliability.

This means the motor has a 90.5% probability of surviving 1,000 hours without failure. Reliability engineers use this calculation to design maintenance intervals that achieve a target reliability level, such as 90%, 95%, or 99.9%, depending on the criticality of the asset.
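
The same calculation, and its inverse, can be sketched as follows. Solving the constant-rate reliability function for time gives t = −ln(R_target) / λ, the longest interval that still meets a target reliability. The function names are illustrative, and the motor figures are the ones from the example above.

```python
import math


def reliability(lam_per_hour: float, hours: float) -> float:
    """Probability of surviving `hours` without failure (constant rate)."""
    return math.exp(-lam_per_hour * hours)


def hours_for_target_reliability(lam_per_hour: float, target: float) -> float:
    """Longest interval over which reliability stays at or above `target`."""
    if lam_per_hour <= 0:
        raise ValueError("lam_per_hour must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    return -math.log(target) / lam_per_hour


MOTOR_LAMBDA = 0.0001  # failures per hour, from the example above

print(f"R(1000 h) = {reliability(MOTOR_LAMBDA, 1000):.3f}")
for target in (0.90, 0.95, 0.999):
    interval = hours_for_target_reliability(MOTOR_LAMBDA, target)
    print(f"Target {target:.1%}: interval of about {interval:,.0f} hours")
```

The required interval shrinks quickly as the target approaches 100%, which is why very high reliability targets are usually reserved for the most critical assets.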

How to Reduce Failure Rate

Reducing failure rate requires addressing the root causes of failure rather than simply responding to failures after they occur. The most effective approaches are:

Condition monitoring: Detecting degradation early using vibration analysis, oil analysis, thermography, and ultrasound allows teams to intervene before failures occur.
Root cause analysis: Every failure event should trigger a root cause analysis to prevent recurrence. Repeatedly fixing the same symptom without addressing the cause sustains a high failure rate.
Precision maintenance: Misalignment, imbalance, incorrect lubrication, and improper fastener torque are leading causes of early wear and elevated failure rates. Precision installation and maintenance practices reduce random failures during the useful life phase.
Proper commissioning: Infant mortality failures are often preventable. Thorough installation checks, alignment verification, and run-in procedures reduce the failure rate in the early life phase.
Failure mode and effects analysis: Proactively identifying high-risk failure modes using FMEA allows teams to design controls and maintenance tasks before high failure rates develop.

Frequently Asked Questions

What does a failure rate of 0 mean?

A failure rate of zero means the asset experienced no failures during the observation period. It does not mean the asset will never fail. Failure rate is a statistical measure, and a short observation period with no failures does not reliably predict long-term behavior.
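
One way to quantify this is an upper confidence bound on the true failure rate. Assuming a constant failure rate, the chance of seeing zero failures in T hours is e^(−λT), so the one-sided upper bound at confidence C is −ln(1 − C) / T. The sketch below applies this to a hypothetical 5,000-hour observation window.

```python
import math


def zero_failure_upper_bound(operating_hours: float,
                             confidence: float = 0.95) -> float:
    """Upper bound on lambda after observing zero failures (constant rate)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1.0 - confidence) / operating_hours


# Hypothetical asset: no failures recorded in 5,000 operating hours.
bound = zero_failure_upper_bound(5000, confidence=0.95)
print(f"Observed failure rate: 0 failures per hour")
print(f"95% upper bound:       {bound:.6f} failures per hour")
print(f"Equivalent MTBF floor: {1 / bound:,.0f} hours")
```

At 95% confidence the bound is roughly 3 / T, so a clean 5,000-hour record only supports the claim that MTBF is above about 1,700 hours.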

Is a lower failure rate always better?

In most cases, yes. A lower failure rate means fewer failures, which means less unplanned downtime and lower maintenance costs. However, achieving a very low failure rate may require significant investment in monitoring, replacement parts, or more frequent maintenance. The optimal failure rate is one that balances reliability with total maintenance cost.

How does failure rate relate to asset availability?

Failure rate directly affects asset availability. Higher failure rates mean more frequent downtime events, which reduce the proportion of time the asset is available to produce. Availability is calculated from MTBF and MTTR: Availability = MTBF / (MTBF + MTTR). Reducing failure rate increases MTBF, which raises availability without requiring faster repairs.
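
As a quick illustration of the formula, the sketch below compares availability before and after halving the failure rate. The MTBF of 2,920 hours is the pump from the earlier example; the 8-hour MTTR is an assumed value.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the asset is available: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTTR_HOURS = 8.0  # assumed average repair time

baseline = availability(2920.0, MTTR_HOURS)   # pump example: lambda = 3 / 8760
improved = availability(5840.0, MTTR_HOURS)   # failure rate halved, MTBF doubled

print(f"Baseline availability: {baseline:.4%}")
print(f"After halving lambda:  {improved:.4%}")
```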

Can failure rate be predicted before an asset is installed?

Yes, to a degree. OEM reliability specifications, industry databases (such as MIL-HDBK-217 for electronics, OREDA for oil and gas equipment), and similar-asset historical data can provide initial failure rate estimates. These estimates are updated over time as actual operating data is collected.

What is the difference between failure rate and probability of failure?

Failure rate (λ) is the frequency of failures per unit of time. Probability of failure is the likelihood that a failure will occur within a specified time window. For a constant failure rate it is the complement of the reliability function: F(t) = 1 - R(t) = 1 - e^{-λt}. The conditional probability of failure extends this to account for age: given that an asset has survived to time t, what is the probability it fails in the next small interval?

What data do I need to calculate failure rate?

You need two pieces of information: the number of failures over a defined period, and the total operating time of the asset during that period. Both data points should be available in your CMMS work order history and equipment runtime records. Accurate failure code recording in the CMMS is essential for separating failures by mode and asset class.
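
A rough sketch of that calculation from exported records is shown below. The record layout, asset tags, and failure codes are invented for illustration and do not reflect any particular CMMS export format.

```python
from collections import Counter

# Hypothetical failure work orders exported from a CMMS.
work_orders = [
    {"asset": "PUMP-01", "failure_code": "SEAL-LEAK"},
    {"asset": "PUMP-01", "failure_code": "BEARING"},
    {"asset": "FAN-07", "failure_code": "BEARING"},
    {"asset": "PUMP-01", "failure_code": "SEAL-LEAK"},
]

# Hypothetical runtime records for the same period, in operating hours.
runtime_hours = {"PUMP-01": 8760.0, "FAN-07": 4000.0, "MOTOR-03": 6000.0}


def failure_rates(orders: list[dict], runtime: dict[str, float]) -> dict[str, float]:
    """Failures per operating hour for every asset with recorded runtime."""
    counts = Counter(order["asset"] for order in orders)
    return {
        asset: counts[asset] / hours  # Counter returns 0 for assets with no failures
        for asset, hours in runtime.items()
        if hours > 0
    }


rates = failure_rates(work_orders, runtime_hours)
for asset, lam in rates.items():
    print(f"{asset}: {lam:.6f} failures per hour")

# Standardized codes also allow counting by failure mode.
by_mode = Counter(order["failure_code"] for order in work_orders)
print(dict(by_mode))
```

Driving the loop from the runtime records rather than the work orders keeps assets with zero failures in the result, so they are not silently dropped from the analysis.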

The Bottom Line

Failure rate is one of the most fundamental concepts in reliability engineering and maintenance planning. It quantifies how often equipment fails, connects directly to MTBF and reliability calculations, and changes across an asset's life in predictable patterns described by the bathtub curve.

Teams that measure failure rate accurately can set evidence-based PM intervals, stock the right spare parts, prioritize condition monitoring investments, and detect early signs of wear-out before a costly breakdown occurs. Teams that do not track failure rate are forced to make maintenance decisions based on instinct rather than data.

The starting point is reliable failure data captured in a CMMS with standardized failure codes. From there, failure rate becomes an active tool for reducing unplanned downtime and improving asset reliability across every asset class.
