Failure Rate: Definition

Definition: Failure rate is the frequency at which an asset or component fails within a given operating period. It is expressed as the number of failures per unit of time (typically failures per hour) and is the foundational metric of reliability engineering. A lower failure rate indicates a more reliable asset.

What Is Failure Rate?

Failure rate measures how often a piece of equipment fails during normal operation. It answers a direct question: on average, how many times per unit of time will this asset fail?

Maintenance teams use failure rate to understand asset reliability, set maintenance intervals, and prioritize where to invest in monitoring or upgrades. It is one of the core inputs in reliability-centered maintenance (RCM), FMEA, and risk-based maintenance programs.

Failure rate is typically denoted by the Greek letter lambda (λ) and is mathematically related to Mean Time Between Failure (MTBF), Mean Time to Failure (MTTF), and overall asset reliability.

Failure Rate Formula and Calculation

The standard failure rate formula is:

λ = Number of Failures / Total Operating Time

The result is expressed in failures per unit of time. Common units include failures per hour, failures per year, or failures per operating cycle, depending on the asset type.

Step-by-Step Example

Suppose a centrifugal pump is operated continuously for 12 months. During that period, it experiences 3 failures. The operating time (assuming continuous operation) is 8,760 hours per year.

  • Number of failures: 3
  • Total operating time: 8,760 hours
  • Failure rate (λ): 3 / 8,760 = 0.000342 failures per hour

This means the pump fails approximately 0.000342 times per operating hour, or roughly once every 2,920 hours under normal conditions.
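The pump calculation can be reproduced in a few lines of Python. This is a minimal sketch; the function name failure_rate is illustrative and not part of any standard library.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Return the failure rate (lambda) in failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


# Pump example from the text: 3 failures over 8,760 hours of continuous operation.
lam = failure_rate(3, 8760)
mean_hours_between_failures = 1 / lam

print(f"lambda = {lam:.6f} failures per hour")
print(f"about one failure every {mean_hours_between_failures:,.0f} hours")
```

The same function works for any time unit (years, cycles) as long as the operating time is expressed in that unit.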

FIT: Failures in Time

For components with very low failure rates (such as electronic parts or safety systems), failure rate is often expressed in FIT (failures in time), where:

1 FIT = 1 failure per 1,000,000,000 (10⁹) device-hours

FIT is common in semiconductor, automotive, and aerospace reliability analysis where failure rates are extremely small.
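Converting between FIT and failures per hour is a simple scaling by 10⁹. The sketch below shows both directions; the 50 FIT rating is a hypothetical value chosen only for illustration.

```python
FIT_DEVICE_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def per_hour_to_fit(lam_per_hour: float) -> float:
    """Convert failures per hour to FIT."""
    return lam_per_hour * FIT_DEVICE_HOURS


def fit_to_per_hour(fit: float) -> float:
    """Convert FIT to failures per hour."""
    return fit / FIT_DEVICE_HOURS


component_fit = 50.0  # hypothetical rating for an electronic part
lam = fit_to_per_hour(component_fit)
print(f"{component_fit:.0f} FIT = {lam:.1e} failures per hour")

# For contrast, the pump example (3 failures in 8,760 hours) expressed in FIT.
print(f"pump = {per_hour_to_fit(3 / 8760):,.0f} FIT")
```

The contrast shows why FIT is reserved for very reliable components: mechanical assets such as the pump produce unwieldy six-digit FIT values.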

Failure Rate vs MTBF

Failure rate and MTBF describe the same reliability characteristic from different directions. MTBF asks how long an asset typically operates between failures. Failure rate asks how often failures occur per unit of time. Under a constant failure rate, each is the reciprocal of the other: MTBF = 1 / λ.

Metric | Formula | Unit | What It Answers
Failure Rate (λ) | Failures / Operating Time | Failures per hour | How often does this asset fail?
MTBF | 1 / λ (or Operating Time / Failures) | Hours between failures | How long between failures, on average?
MTTF | Total Operating Time / Failures (non-repairable items) | Hours to first failure | How long before a new component first fails?
MTTR | Total Repair Time / Number of Repairs | Hours per repair | How long does it take to restore the asset?

MTBF is the most common way to communicate reliability to maintenance managers and operations leaders because it is expressed in familiar time units. Failure rate is preferred by reliability engineers and statisticians because it integrates directly into reliability models and probability calculations.

One important note: the reciprocal relationship between MTBF and failure rate assumes a constant failure rate. This assumption is only valid during the useful life phase of the asset. In the infant mortality and wear-out phases, the failure rate changes over time and more advanced statistical models (such as the Weibull distribution) are needed.

The Bathtub Curve: How Failure Rate Changes Over an Asset's Life

The bathtub curve is the standard model used in reliability engineering to describe how failure rate evolves across an asset's operational life. The curve is shaped like a bathtub when plotted, with high failure rates at the beginning and end of life and a low, stable failure rate in the middle.

The three phases are:

Phase 1: Infant Mortality (Decreasing Failure Rate)

Newly commissioned assets have an elevated failure rate that decreases over time. Failures in this phase are caused by manufacturing defects, installation errors, substandard components, and incorrect commissioning procedures.

This phase is also called "early life failure" or "burn-in failure." Manufacturers of electronic components sometimes test new products at full stress levels before shipment to eliminate defective units and reduce the effective failure rate seen by end users.

For maintenance teams, the infant mortality phase means new equipment should be monitored more closely after installation. High early failure rates do not predict long-term reliability.

Phase 2: Useful Life (Constant Failure Rate)

During normal operating life, failure rate stabilizes at a relatively low, constant level. Failures in this phase are random and unpredictable. They are caused by sudden overloads, operator errors, unexpected process upsets, or random component variation rather than systematic wear.

This is the phase where MTBF-based planning is most accurate. An asset in its useful life phase with an MTBF of 5,000 hours has a failure rate of 0.0002 failures per hour, and that rate remains roughly constant until the asset begins to age.

Time-based preventive maintenance, when applied during the useful life phase, may not reduce random failures. Condition-based maintenance and continuous monitoring are often more cost-effective strategies for this phase.

Phase 3: Wear-Out (Increasing Failure Rate)

As components reach the end of their design life, wear, corrosion, fatigue, and material degradation cause failure rate to rise. Failures become more predictable and more frequent. This is the phase where scheduled replacement and overhaul tasks are most valuable.

Identifying when an asset transitions into the wear-out phase is one of the most important tasks in a reliability-centered maintenance program. Condition monitoring technologies such as vibration analysis, oil analysis, and thermography help detect early signs of degradation before failure rate accelerates.

How Failure Rate Changes Over Time

The bathtub curve shows the general shape of failure rate over a lifetime, but individual assets can exhibit different patterns. Reliability engineers use the Weibull distribution to model failure rate more precisely because it can represent decreasing, constant, or increasing failure rates depending on the shape parameter (beta, β):

Weibull Shape (β) | Failure Rate Behavior | Lifecycle Phase
β less than 1 | Decreasing failure rate | Infant mortality
β equals 1 | Constant failure rate | Useful life
β greater than 1 | Increasing failure rate | Wear-out

Understanding the Weibull shape parameter for a given asset class tells reliability engineers which maintenance strategy is most appropriate and when proactive intervention is warranted.
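The effect of β can be seen by evaluating the Weibull hazard (instantaneous failure rate) function, h(t) = (β / η) × (t / η)^(β − 1). The sketch below assumes the common two-parameter form; the scale parameter η (characteristic life) is not discussed above, and the 5,000-hour value is hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


ETA = 5000.0  # hypothetical characteristic life, in hours

for beta in (0.5, 1.0, 3.0):
    early = weibull_hazard(500, beta, ETA)
    late = weibull_hazard(4000, beta, ETA)
    if late < early:
        trend = "decreasing"
    elif late > early:
        trend = "increasing"
    else:
        trend = "constant"
    print(f"beta={beta}: h(500)={early:.6f}, h(4000)={late:.6f} -> {trend}")
```

With β = 1 the expression reduces to 1 / η at every age, which is the constant failure rate of the useful life phase.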

When an asset's observed failure rate begins to rise above its historical baseline, that is a key signal to investigate root causes, inspect for wear, and consider whether replacement or overhaul is approaching. Remaining useful life models use failure rate trends to estimate when an asset is likely to fail.

How Failure Rate Is Used in Maintenance Planning

Failure rate is not just a historical metric. It is an active planning tool. Maintenance teams use failure rate data in the following ways:

Setting Preventive Maintenance Intervals

PM intervals should be shorter than the expected time to failure, calculated from failure rate data. If a bearing has an MTBF of 4,000 hours (failure rate of 0.00025 failures per hour), a PM interval of 2,000 to 3,000 hours is typically appropriate, provided the dominant failure modes are wear-related; time-based tasks do little to prevent purely random failures. Setting intervals without failure rate data leads to either over-maintenance (wasted cost) or under-maintenance (unplanned failures).

Spare Parts Stocking

Failure rate determines how often a part will need to be replaced. A higher failure rate means a faster consumption rate and a higher recommended stock level. Inventory management teams use failure rate alongside lead time to calculate reorder points and safety stock levels for critical components.
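One way to turn failure rate and lead time into a reorder point is sketched below. It assumes that part demand during the lead time follows a Poisson distribution with mean λ × installed units × lead time, which is a modeling assumption and not something stated above. The fleet size, lead time, and service level are hypothetical; the failure rate is the bearing example (MTBF of 4,000 hours).

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Probability of k or fewer events for a Poisson distribution."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def reorder_point(lam_per_hour: float, units_installed: int,
                  lead_time_hours: float, service_level: float = 0.95) -> int:
    """Smallest stock level that covers lead-time demand at the given service level."""
    mean_demand = lam_per_hour * units_installed * lead_time_hours
    stock = 0
    while poisson_cdf(stock, mean_demand) < service_level:
        stock += 1
    return stock


# Bearing with lambda = 0.00025 per hour, 10 installed units (hypothetical),
# 30-day lead time (720 hours, hypothetical), 95% service level.
stock = reorder_point(0.00025, 10, 720, service_level=0.95)
print(f"expected lead-time demand: {0.00025 * 10 * 720:.1f} parts")
print(f"reorder point: {stock} parts")
```

A higher failure rate or a longer lead time raises the mean demand, which pushes the reorder point up.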

Prioritizing Predictive Monitoring

Assets with high failure rates, or assets whose failure rate is trending upward, are the highest-priority candidates for continuous condition monitoring. Deploying sensors on these assets first delivers the greatest reduction in unplanned downtime.

Risk-Based Maintenance Prioritization

In risk-based maintenance, risk is calculated as the product of failure probability (derived from failure rate) and the consequence of failure. Assets with a high failure rate and high consequence of failure are ranked first for maintenance resources and investment.
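A simple ranking along these lines is sketched below. The asset names, consequence costs, and 1,000-hour planning horizon are hypothetical, and failure probability is derived from failure rate under the constant-rate assumption, as 1 − e^(−λt).

```python
import math


def failure_probability(lam_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within the horizon (constant failure rate)."""
    return 1 - math.exp(-lam_per_hour * hours)


HORIZON_HOURS = 1000  # hypothetical planning horizon

# Hypothetical assets: failure rate per hour and cost of a single failure.
assets = [
    {"name": "pump A", "lam": 0.000342, "consequence_cost": 20_000},
    {"name": "motor B", "lam": 0.0001, "consequence_cost": 150_000},
    {"name": "fan C", "lam": 0.0005, "consequence_cost": 5_000},
]

for asset in assets:
    probability = failure_probability(asset["lam"], HORIZON_HOURS)
    asset["risk"] = probability * asset["consequence_cost"]

ranked = sorted(assets, key=lambda a: a["risk"], reverse=True)
for asset in ranked:
    print(f"{asset['name']}: risk score {asset['risk']:,.0f}")
```

In this example the motor ranks first despite having the lowest failure rate, because its consequence of failure dominates the product.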

Reliability Centered Maintenance Analysis

RCM uses failure rate data from historical records, OEM specifications, and industry databases to evaluate failure modes and select the most effective maintenance task for each one. Without accurate failure rate data, RCM analysis relies on assumptions rather than evidence.

Failure Rate in Industry Applications

Different industries apply failure rate analysis in different contexts. The underlying formula is the same, but the data sources, acceptable thresholds, and maintenance consequences vary significantly.

Industry | Common Application | Typical Data Sources
Manufacturing | Setting PM intervals, OEE analysis | CMMS work order history, OEM data
Oil and Gas | Safety system testing (failure-finding interval, FFI), pressure vessel inspection | OREDA database, plant history
Chemical | Process reliability, safety instrumented systems | CCPS data, site records
Food and Beverage | Packaging line reliability, hygienic equipment performance | Internal maintenance records, OEM specs
Mining | Rotating equipment reliability, haul truck components | Fleet records, component tracking systems

Across all industries, the quality of failure rate analysis depends on the quality of failure data recorded. Standardized failure codes in a CMMS make it possible to accurately count failures by mode, asset class, and time period, which improves the accuracy of calculated failure rates.
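As an illustration of why standardized codes matter, the sketch below counts failures by asset class and failure code, then divides by operating hours. The record layout, failure codes, and operating hours are hypothetical and do not reflect any particular CMMS export format.

```python
from collections import Counter

# Hypothetical work order records with standardized failure codes.
work_orders = [
    {"asset_class": "pump", "failure_code": "SEAL_LEAK"},
    {"asset_class": "pump", "failure_code": "SEAL_LEAK"},
    {"asset_class": "pump", "failure_code": "BEARING_WEAR"},
    {"asset_class": "motor", "failure_code": "WINDING_FAULT"},
]

# Hypothetical total operating hours per asset class over the same period
# (for example, 3 pumps and 2 motors running continuously for one year).
operating_hours = {"pump": 3 * 8760, "motor": 2 * 8760}

counts = Counter((wo["asset_class"], wo["failure_code"]) for wo in work_orders)
rates = {key: n / operating_hours[key[0]] for key, n in counts.items()}

for (asset_class, code), rate in sorted(rates.items()):
    print(f"{asset_class} / {code}: {rate:.6f} failures per hour")
```

If failure codes were entered as free text, the grouping step would split the same failure mode across several spellings and understate each rate.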

Failure Rate and Reliability: The Mathematical Relationship

For an asset with a constant failure rate (the useful life phase), the reliability function R(t) gives the probability that the asset will operate without failure for a period of time t:

R(t) = e^(-λt)

Where e is Euler's number (approximately 2.718), λ is the failure rate, and t is the time of interest.

For example, if a motor has a failure rate of 0.0001 failures per hour, its reliability over 1,000 hours is:

R(1000) = e^(-(0.0001 × 1000)) = e^(-0.1) ≈ 0.905, or 90.5% reliability.

This means the motor has a 90.5% probability of surviving 1,000 hours without failure. Reliability engineers use this calculation to design maintenance intervals that achieve a target reliability level, such as 90%, 95%, or 99.9%, depending on the criticality of the asset.
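The motor example, and the reverse calculation of how long an asset can run before reliability drops below a target, can be sketched as follows. The interval formula t = −ln(R) / λ comes from solving the reliability function for t; the function names are illustrative.

```python
import math


def reliability(lam_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), valid for a constant failure rate."""
    return math.exp(-lam_per_hour * hours)


def hours_for_target_reliability(lam_per_hour: float, target: float) -> float:
    """Longest run time t for which R(t) stays at or above the target."""
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    return -math.log(target) / lam_per_hour


LAM = 0.0001  # motor example from the text, failures per hour

print(f"R(1000) = {reliability(LAM, 1000):.3f}")
for target in (0.90, 0.95, 0.999):
    hours = hours_for_target_reliability(LAM, target)
    print(f"target {target:.1%}: run time up to {hours:,.1f} hours")
```

Tightening the target from 90% to 99.9% shrinks the allowable run time from roughly 1,050 hours to about 10 hours, which is why very high targets are usually reserved for critical assets.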

How to Reduce Failure Rate

Reducing failure rate requires addressing the root causes of failure rather than simply responding to failures after they occur. The most effective approaches are:

  • Condition monitoring: Detecting degradation early using vibration analysis, oil analysis, thermography, and ultrasound allows teams to intervene before failures occur.
  • Root cause analysis: Every failure event should trigger a root cause analysis to prevent recurrence. Repeatedly fixing the same symptom without addressing the cause sustains a high failure rate.
  • Precision maintenance: Misalignment, imbalance, incorrect lubrication, and improper fastener torque are leading causes of early wear and elevated failure rates. Precision installation and maintenance practices reduce random failures during the useful life phase.
  • Proper commissioning: Infant mortality failures are often preventable. Thorough installation checks, alignment verification, and run-in procedures reduce the failure rate in the early life phase.
  • Failure mode and effects analysis: Proactively identifying high-risk failure modes using FMEA allows teams to design controls and maintenance tasks before high failure rates develop.

Frequently Asked Questions

What does a failure rate of 0 mean?

A failure rate of zero means the asset experienced no failures during the observation period. It does not mean the asset will never fail. Failure rate is a statistical measure, and a short observation period with no failures does not reliably predict long-term behavior.
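When no failures have been observed, a more useful figure than zero is an upper confidence bound on the failure rate. Assuming a constant failure rate, the one-sided bound at confidence level C after T failure-free operating hours is −ln(1 − C) / T. The sketch below applies this to one year of continuous operation; the function name is illustrative.

```python
import math


def zero_failure_upper_bound(operating_hours: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on lambda after zero failures (constant-rate assumption)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1 - confidence) / operating_hours


upper = zero_failure_upper_bound(8760, confidence=0.95)
print(f"95% upper bound: {upper:.6f} failures per hour")
print(f"equivalent MTBF lower bound: {1 / upper:,.0f} hours")
```

The bound tightens only as failure-free operating hours accumulate, which is the quantitative version of the caution about short observation periods.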

Is a lower failure rate always better?

In most cases, yes. A lower failure rate means fewer failures, which means less unplanned downtime and lower maintenance costs. However, achieving a very low failure rate may require significant investment in monitoring, replacement parts, or more frequent maintenance. The optimal failure rate is one that balances reliability with total maintenance cost.

How does failure rate relate to asset availability?

Failure rate directly affects asset availability. Higher failure rates mean more frequent downtime events, which reduce the proportion of time the asset is available to produce. Availability is calculated from MTBF and MTTR: Availability = MTBF / (MTBF + MTTR). Reducing failure rate (and increasing MTBF) raises availability without requiring faster repairs.
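The availability formula can be checked with a short calculation. The sketch uses the pump's MTBF of 2,920 hours from the earlier example; the 8-hour MTTR is a hypothetical value.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTTR = 8.0  # hypothetical mean time to repair, in hours

baseline = availability(2920, MTTR)   # failure rate of 3 per 8,760 hours
improved = availability(5840, MTTR)   # failure rate halved, so MTBF doubles

print(f"baseline availability: {baseline:.4%}")
print(f"after halving failure rate: {improved:.4%}")
```

Halving the failure rate with the repair time unchanged roughly halves the unavailable fraction of time.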

Can failure rate be predicted before an asset is installed?

Yes, to a degree. OEM reliability specifications, reliability handbooks and industry databases (such as MIL-HDBK-217 for electronics and OREDA for oil and gas equipment), and similar-asset historical data can provide initial failure rate estimates. These estimates are updated over time as actual operating data is collected.

What is the difference between failure rate and probability of failure?

Failure rate (λ) is the frequency of failures per unit of time. Probability of failure is the likelihood that a failure will occur within a specified time window. For a constant failure rate, it is the complement of the reliability function: F(t) = 1 - R(t) = 1 - e^(-λt). The conditional probability of failure extends this to account for age: given that an asset has survived to time t, what is the probability it fails in the next small interval?
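The distinction can be made concrete with a short sketch. It assumes a constant failure rate, under which the conditional probability of failure over a fixed window does not depend on the asset's age; the ages and the 100-hour window are hypothetical.

```python
import math


def reliability(lam_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-lam_per_hour * hours)


def probability_of_failure(lam_per_hour: float, hours: float) -> float:
    """F(t) = 1 - R(t): probability of failing within the first t hours."""
    return 1 - reliability(lam_per_hour, hours)


def conditional_probability_of_failure(lam_per_hour: float, age: float, window: float) -> float:
    """P(fail within the next window | survived to age) = 1 - R(age + window) / R(age)."""
    return 1 - reliability(lam_per_hour, age + window) / reliability(lam_per_hour, age)


LAM = 0.0001  # failures per hour

print(f"F(1000) = {probability_of_failure(LAM, 1000):.4f}")
for age in (0, 5000, 20000):
    p = conditional_probability_of_failure(LAM, age, 100)
    print(f"age {age:>6} h: P(fail in next 100 h) = {p:.5f}")
```

The conditional probability is the same at every age here. With a Weibull model and β greater than 1, the same calculation would show it rising with age.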

What data do I need to calculate failure rate?

You need two pieces of information: the number of failures over a defined period, and the total operating time of the asset during that period. Both data points should be available in your CMMS work order history and equipment runtime records. Accurate failure code recording in the CMMS is essential for separating failures by mode and asset class.

The Bottom Line

Failure rate is one of the most fundamental concepts in reliability engineering and maintenance planning. It quantifies how often equipment fails, connects directly to MTBF and reliability calculations, and changes across an asset's life in predictable patterns described by the bathtub curve.

Teams that measure failure rate accurately can set evidence-based PM intervals, stock the right spare parts, prioritize condition monitoring investments, and detect early signs of wear-out before a costly breakdown occurs. Teams that do not track failure rate are forced to make maintenance decisions based on instinct rather than data.

The starting point is reliable failure data captured in a CMMS with standardized failure codes. From there, failure rate becomes an active tool for reducing unplanned downtime and improving asset reliability across every asset class.
