
Failure Mode and Effect Analysis (FMEA)

 

Composed By Muhammad Aqeel Khan
Date 21/9/2025


Failures happen in designs, processes, systems, and services. What separates high-reliability organizations from others is not simply how they respond to failure, but how they anticipate it: careful up-front analysis lets them prevent failures or mitigate their effects. One of the most widely used tools for this kind of risk-based prevention is Failure Mode and Effect Analysis (FMEA).

What Is FMEA?

FMEA (sometimes FMECA when combined with Criticality Analysis) is a structured, systematic, proactive risk assessment method. It aims to identify:

  • Failure modes — the ways in which a component, system, or process might fail to perform its intended function;

  • Effects — what happens if a failure mode occurs;

  • Causes — why that failure might occur;

  • Controls: what measures exist (or could be put in place) to prevent the failure, reduce its occurrence, or detect it before it causes harm.

By evaluating and prioritizing these potential failure modes, FMEA helps organizations allocate resources (design changes, process improvements, monitoring, detection, prevention) to reduce risk. 
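The four elements above map naturally onto one row of an FMEA worksheet. The following is a minimal sketch of such a row as a Python data structure; the class name, field names, and the example entry are all illustrative assumptions, not a format prescribed by any FMEA standard.

```python
from dataclasses import dataclass, field


@dataclass
class FailureModeEntry:
    """One row of a hypothetical FMEA worksheet (field names are illustrative)."""
    item: str            # component, function, or process step under analysis
    failure_mode: str    # how the item might fail to perform its function
    effect: str          # what happens if the failure mode occurs
    cause: str           # why the failure might occur
    prevention_controls: list[str] = field(default_factory=list)
    detection_controls: list[str] = field(default_factory=list)


# Made-up example row, for illustration only.
entry = FailureModeEntry(
    item="Infusion pump programming",
    failure_mode="Wrong dose rate entered",
    effect="Patient receives an overdose or underdose",
    cause="Unit confusion between mg/h and mL/h",
    prevention_controls=["Standardized concentration library"],
    detection_controls=["Independent double check by a second nurse"],
)
print(entry.failure_mode, "->", entry.effect)
```

Keeping prevention and detection controls in separate fields makes it easier to see, per failure mode, whether the team is relying on catching a failure rather than avoiding it.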

FMECA is a variation that adds criticality assessment — combining failure likelihood and severity into a measure (often used in aerospace, military, and safety-critical systems) to further prioritize. 

History of FMEA

The origins of FMEA trace back to the U.S. military and aerospace sectors:

  • In the 1950s and 1960s, FMEA variants were adopted by NASA and its contractors for space programs such as Apollo and Viking.

  • In subsequent decades, it spread into automotive, manufacturing, chemical, food, healthcare, electronics, and software industries. 

  • In automotive especially, the AIAG (Automotive Industry Action Group, formed ~1982) and VDA (German automotive association) produced long-used FMEA handbooks. In 2019, AIAG and VDA released a harmonized FMEA manual to unify DFMEA (Design FMEA), PFMEA (Process FMEA), and supplemental FMEA practices. 

So, while the core idea is old, FMEA has evolved, adapted, expanded, and standardized more recently.

Why FMEA Is Widely Used in Manufacturing, Healthcare, Engineering

Several industries adopt FMEA because it helps them meet strong demands for safety, reliability, quality, regulation, and customer satisfaction. Some reasons:

  1. Proactive risk identification

    Instead of responding after failures occur (reactive), FMEA helps anticipate and prevent failures (or minimize their impact).

  2. Regulatory, safety, and quality compliance requirements

    Industries like automotive, aerospace, medical devices, healthcare, and nuclear are subject to strict safety standards. FMEA (including its newer versions) is embedded in many industry standards and customer requirements. For example, use of the AIAG-VDA FMEA handbook is expected in many OEM/Tier-1 supplier relationships.

  3. Support for ISO standards and risk management systems

    • ISO 9001:2015 emphasizes “risk-based thinking”, which took over the role of the separate preventive-action clause in earlier editions. FMEA can be (and often is) used to satisfy this requirement.

    • ISO 31000 (Risk Management) provides a general framework for risk identification, analysis, evaluation, treatment, monitoring. FMEA fits into that framework as a technique under risk analysis and evaluation. 

  4. Cost savings and reliability improvements

    By identifying high-severity or high-occurrence failures early, designing in mitigating controls, and improving detection, organizations reduce warranty costs, recalls, downtime, errors, rework, and safety incidents.

  5. Cross-industry versatility

    FMEA works in manufacturing (design & production), in engineering (systems, mechanisms), in software and hardware reliability, in healthcare (patient safety, medication errors, procedural risks), and more. Literature is rich with healthcare FMEA applications.

Step-by-Step Process of Conducting FMEA

Although there are variations by industry (traditional vs AIAG-VDA harmonized practices), a typical FMEA process follows roughly these main steps. I’ll include the newer AIAG/VDA (2019-on) seven-step approach where relevant.

| Step | What Happens | Key Outputs |
|---|---|---|
| 1. Planning & Preparation / Define Scope | Decide what product, process, or system to analyze; assemble a multi-disciplinary team; define boundaries (e.g. up to subcomponent level, process steps), customer requirements, function(s), and environmental/usage conditions. Establish resources, timelines, and tools. AIAG-VDA calls this “Planning & Preparation.” | Scope document, team members, timeline, definitions of severity/occurrence/detection scales, interfaces/subsystems identified |
| 2. Structure / Function Analysis | Map the system: break it into subsystems, components, process steps (PFMEA) or functional blocks (DFMEA). Determine the functions of each under normal operation. Define what is required. AIAG/VDA includes structural and functional analysis as its second and third steps. | … inputs/outputs, interfaces |
| 3. Failure Mode Identification | For each component, process step, or function, ask: “What can go wrong?” (failure modes). Then determine the potential effects (what happens downstream if that failure mode occurs) and the root causes. Also identify existing controls (prevention, detection). Traditional FMEA worksheets include columns for failure mode, effect, cause, and current controls. | List of failure modes, corresponding effects and causes, current prevention/detection controls |
| 4. Assessment: Severity, Occurrence, Detection | For each failure mode, assign a severity rating (how bad the effect is if the failure occurs), an occurrence rating (how likely the cause/failure is to occur), and a detection rating (how likely the failure or cause will be detected before impact). These ratings use a scale (often 1-10). | Severity, occurrence, and detection ratings per failure mode; these feed into risk quantification (e.g. RPN or other) |
| 5. Risk Prioritization / Risk Analysis | Multiply S × O × D to compute the Risk Priority Number (RPN) in traditional FMEA; a higher RPN means more urgent action. In the newer AIAG/VDA methodology, RPN is being replaced (or supplemented) by Action Priority (AP) tables that consider severity first, then occurrence and detection, to decide which failure modes need actions. | Ranked list of failure modes, with RPN or AP; identification of which ones cross thresholds for mitigation |
| 6. Actions / Mitigation | For high-risk failure modes, decide corrective/preventive actions: redesign, additional prevention controls, detection improvements, monitoring, inspection, redundancy, etc. Assign owners and deadlines. Evaluate how the actions will reduce occurrence, improve detection, or reduce severity if possible. | Action plans with responsibilities, target values, timelines; expected revised S, O, or D after mitigation |
| 7. Review, Documentation & Feedback / Optimization | Document findings, decisions, and the FMEA report. After actions are implemented, re-assess the failure modes: did the controls reduce risk? Update the FMEA when the design or process changes, review it periodically, monitor performance, and integrate lessons learned. AIAG/VDA emphasizes results documentation and communication. | … monitoring |
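Steps 6 and 7 above hinge on re-assessing a failure mode once an action has been implemented. A minimal sketch of how an action record might carry its "before" and "after" ratings is shown here; the class names, the example action, the owner, the date, and all ratings are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Ratings:
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Traditional Risk Priority Number: S x O x D
        return self.severity * self.occurrence * self.detection


@dataclass
class Action:
    description: str
    owner: str
    due: date
    before: Ratings  # ratings at the time the action was decided
    after: Ratings   # ratings re-assessed once the action is in place

    def rpn_reduction(self) -> int:
        return self.before.rpn - self.after.rpn


# Hypothetical example: a monitoring action lowers occurrence and improves
# detection, but leaves severity unchanged because the effect is the same.
action = Action(
    description="Add torque monitoring to fastening station",
    owner="Process engineering",
    due=date(2026, 3, 31),
    before=Ratings(severity=8, occurrence=6, detection=7),
    after=Ratings(severity=8, occurrence=3, detection=4),
)
print(action.before.rpn, action.after.rpn, action.rpn_reduction())
```

Storing both rating sets on the action keeps the re-assessment traceable: a reviewer can see what was expected to change and check it against later field data.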

Calculating Risk Priority Number (RPN)

The RPN is a quantitative measure in traditional FMEA that helps prioritize which failure modes deserve attention. It is defined as:

$$\text{RPN} = S \times O \times D$$

  • S = Severity rating (how severe the effect is if failure occurs)

  • O = Occurrence rating (how likely the failure is to happen)

  • D = Detection rating (how likely it is that existing controls will detect the failure or cause before it reaches the effect; on conventional scales, a higher rating means detection is less likely)

Scales are often 1 to 10, though some organizations use 1-5 or other ranges. The maximum possible RPN is the product of the highest ratings (e.g. 10 × 10 × 10 = 1000 on a 1-10 scale).
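As a small sketch of the formula, the function below multiplies the three ratings and rejects values outside the chosen scale. The function name and the `scale_max` parameter are assumptions made for illustration; the example ratings are invented.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int,
                         scale_max: int = 10) -> int:
    """Return S x O x D after checking each rating lies on the 1..scale_max scale."""
    ratings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in ratings.items():
        if not 1 <= value <= scale_max:
            raise ValueError(f"{name} must be between 1 and {scale_max}, got {value}")
    return severity * occurrence * detection


print(risk_priority_number(7, 4, 3))               # 7 * 4 * 3 = 84
print(risk_priority_number(10, 10, 10))            # maximum on a 1-10 scale: 1000
print(risk_priority_number(5, 5, 5, scale_max=5))  # maximum on a 1-5 scale: 125
```

Validating the range matters in practice because RPNs computed on different scales are not comparable: 125 is the worst case on a 1-5 scale but a moderate value on a 1-10 scale.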

Once RPNs are calculated, common practices are:

  • Rank failure modes by RPN

  • Define threshold(s) above which action is mandatory

  • Focus on reducing high RPNs through mitigation of causes, improvement in detection, or reducing occurrence
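The ranking and threshold practices above can be sketched in a few lines. The failure modes, their ratings, and the threshold of 100 are invented for illustration; real thresholds are set by each organization or customer.

```python
# Hypothetical failure modes: (name, severity, occurrence, detection), rated 1-10.
failure_modes = [
    ("Seal leak", 7, 4, 3),
    ("Connector corrosion", 5, 6, 6),
    ("Sensor drift", 8, 3, 8),
    ("Label misprint", 3, 5, 2),
]

ACTION_THRESHOLD = 100  # hypothetical cut-off above which action is mandatory


def rpn(mode):
    _, s, o, d = mode
    return s * o * d


# Rank from highest to lowest RPN, then pick out those above the threshold.
ranked = sorted(failure_modes, key=rpn, reverse=True)
needs_action = [name for name, *_ in ranked if rpn((name, *_)) > ACTION_THRESHOLD]

for mode in ranked:
    print(f"{mode[0]:<22} RPN={rpn(mode)}")
print("Action required:", needs_action)
```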

Note: With the AIAG/VDA harmonized manual, RPN is no longer the only method. The manual introduces Action Priority (AP) tables (High/Medium/Low) which help decide priority of action in a more structured way than using RPN alone. AP considers severity first, then occurrence and detection. 
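To see why considering severity first can change priorities, the sketch below ranks the same hypothetical failure modes two ways. Note that this is not the AIAG-VDA AP table, which is a lookup table yielding High/Medium/Low; the simple sort by (S, O, D) used here is only an assumed stand-in to illustrate the "severity first" idea.

```python
# Hypothetical failure modes: (name, severity, occurrence, detection), rated 1-10.
failure_modes = [
    ("Brake line rupture", 10, 2, 3),
    ("Trim rattle", 3, 8, 6),
    ("Window switch sticks", 5, 5, 4),
]


def rpn(mode):
    _, s, o, d = mode
    return s * o * d


# Ranking 1: traditional RPN, highest product first.
by_rpn = [m[0] for m in sorted(failure_modes, key=rpn, reverse=True)]

# Ranking 2: illustrative severity-first ordering (NOT the handbook's AP table):
# compare severity, then occurrence, then detection.
by_severity_first = [
    m[0] for m in sorted(failure_modes, key=lambda m: (m[1], m[2], m[3]), reverse=True)
]

print("By RPN:           ", by_rpn)
print("By severity first:", by_severity_first)
```

With these made-up ratings, the safety-relevant failure mode has the lowest RPN (60) yet comes first once severity is compared before the other two ratings.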

Real-World Examples

Here are practical examples across industries illustrating how FMEA is used and what benefits it delivers.

1. Healthcare – Pediatric / Adolescent Hospital Care

A recent scoping review found that FMEA, FMECA, and HFMEA have been used widely in pediatric/adolescent hospital settings—particularly in medication prescribing, dispensing & administration, hospital pharmacy workflows, infection control, and sterilization processes. The studies often identified many high-RPN failure modes and recommended improvement actions. 

In one example, FMEA was applied to reduce medication errors: mapping out the steps, failure modes (e.g. wrong dosage, wrong patient, wrong time), rating severity, occurrence, detection, and then implementing double checks, software alerts, staff training, improved labeling, etc. The result was a measurable drop in error rates. 

2. Manufacturing / Engineering: Automotive & Industrial Radiography

  • In automotive, the harmonized AIAG-VDA manual ensures that design and process FMEAs meet OEM expectations; suppliers use FMEA to anticipate failure modes during design and process, reduce warranty claims, improve reliability. 

  • A study of industrial radiography exposure devices used FMEA to assess component failure modes; expert teams identified many failure modes, assigned RPNs, and found that low detectability (i.e. failures that are hard to detect) and high severity were often greatest risks; actions focused on improving detection systems and preventive maintenance. 

  • In engineering design, FMEA is used early to check alternative designs, materials, configurations to see which potential failure modes are most dangerous or likely — leading to design adjustments before prototyping.

Benefits & Limitations

Every tool has strengths and trade-offs. FMEA is powerful but not perfect.

Benefits

  • Prevention over correction: Helps avoid failures before they happen.

  • Improved safety and reliability: Especially in safety-critical systems.

  • Better design & process decisions: Teams think systematically about failure, which often exposes blind spots.

  • Cost savings: Less rework, recalls, warranty, downtime, legal / regulatory costs.

  • Cross-functional collaboration: FMEA tends to bring together design, quality, process, maintenance, operations etc.

  • Documentation & traceability: Good for audits, regulatory compliance.

Limitations

  • Labor-intensive: Thorough FMEA can require many people, time, data. May be difficult for smaller organizations without resources.

  • Subjectivity in ratings: Severity, occurrence, detection ratings often depend on judgement, which can vary between people. Differences in scale definitions can lead to inconsistent RPNs.

  • Potential overemphasis on RPN: RPN treats all three dimensions equally in multiplication, but sometimes severity should dominate. Also, two different failure modes with same RPN might demand different priority depending on severity or context.

  • May miss combined or cascading failures: FMEA examines single failure modes; interactions, multiple simultaneous failures, or complex dependency often require supplementary tools (like Fault Tree Analysis). 

  • Detection control limitations: Some failure modes are inherently undetectable or detected too late; improving detection is sometimes expensive or impractical.

  • Maintenance of FMEA: Unless regularly reviewed, updated (after design/process changes, after field data), the FMEA becomes stale.
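To make the RPN overemphasis limitation concrete, consider two hypothetical failure modes A and B rated on a 1-10 scale (the ratings are invented for illustration):

```latex
\begin{aligned}
\text{RPN}_A &= S_A \times O_A \times D_A = 10 \times 2 \times 5 = 100 \\
\text{RPN}_B &= S_B \times O_B \times D_B = 2 \times 5 \times 10 = 100
\end{aligned}
```

Both score 100, yet A carries the highest possible severity while B's effect is minor. A ranking based on RPN alone cannot tell them apart, which is why severity is often reviewed separately.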

Comparison with Other Risk Assessment Tools

Some tools are complementary; others serve different purposes. Here are comparisons:

| Tool | Top-down / Bottom-up | Best for | Strengths | Weaknesses vs FMEA |
|---|---|---|---|---|
| Fault Tree Analysis (FTA) | Top-down (start from a top-level event and trace backwards) | Analyzing system-level failures, especially logical or event combinations; probabilistic modeling | Good at mapping logical dependencies, multiple failure paths, quantitative reliability in complex systems | More complex; FTA does not always consider detailed causes/effects for each component as FMEA does; may require more data; less intuitive for non-engineers |
| Preliminary Hazard Analysis (PHA) | Broad risk identification early in lifecycle | Early stage assessment of broad hazards; screening high-level risks | Fast, lightweight; helps set context | Less detailed; may miss subtle failure modes or detailed causes; not as strong in detailed action planning as FMEA |
| Hazard & Operability Study (HAZOP) | Systematic, often bottom-up but in process / chemical engineering contexts | Processes with complex flows (chemical plants, pipelines) where deviations from design conditions are critical | Very thorough in process deviations; structured with guidewords; good for identifying issues with process variables | More specialized; can be very time consuming; may need domain experts; not all failure modes (especially hardware failures) are covered; may not integrate as easily with design FMEA |
| Risk matrices / Qualitative Risk Assessment | Often high-level, “quick & dirty” evaluations | Early screening, prioritizing risks to analyze deeper | Simple; less resource heavy; good for initial triage | Less precise; lacks detailed cause/effect analysis; less useful for specifying corrective/preventive control; may be subjective and inconsistent |

FMEA often works best when used in conjunction with these other tools: e.g. PHA early, FMEA for deeper component-level work, FTA for system logic, HAZOP where process variables matter heavily.
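To illustrate what "FTA for system logic" adds on top of single-failure-mode analysis, here is a minimal sketch of evaluating a tiny fault tree with AND and OR gates. It assumes the basic events are statistically independent; the tree structure and the probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all (independent) input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one (independent) input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical top event "loss of cooling":
#   (pump A fails AND pump B fails) OR controller fails
pump_a, pump_b, controller = 0.01, 0.02, 0.001

both_pumps_fail = and_gate([pump_a, pump_b])
loss_of_cooling = or_gate([both_pumps_fail, controller])

print(f"P(both pumps fail) = {both_pumps_fail:.6f}")
print(f"P(loss of cooling) = {loss_of_cooling:.7f}")
```

An FMEA worksheet would list each pump failure as its own row; the fault tree shows that, under these assumed numbers, the redundant pumps together contribute less to the top event than the single controller does.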

Standards and Guidelines Relevant to FMEA

Understanding the standards helps implement FMEA in a way that satisfies regulatory, customer, and quality management expectations.

  • ISO 9001:2015: Quality management systems. It doesn’t mandate FMEA, but it requires risk-based thinking and actions to address risks and opportunities (clause 6.1), which took over the role of the separate preventive-action clause in earlier editions. FMEA is often used to satisfy these requirements.

  • ISO 31000: Risk management standard. Provides a general framework, definitions, and steps (identification, analysis, evaluation, treatment, monitoring). FMEA fits under risk identification, analysis, and evaluation.

  • AIAG & VDA FMEA Handbook (2019/2020): The harmonized automotive standard for DFMEA, PFMEA, and supplemental FMEA-MSR. Introduces updated methodology, including the seven-step process, more emphasis on prevention vs detection, and action priority tables.

  • IEC 60812: Standard for FMEA / FMECA. Defines methodology for failure analysis. Many studies reference this when describing the “official” table/scoring practices.

  • ISO 14971 (for medical devices): Risk management of medical devices. FMEA is one of the techniques that can be used, but ISO 14971 covers broader risk management policy, post-market surveillance, and overall risk acceptability, not just product/process failure modes.

Effective Implementation Strategies

To get full value from FMEA (and avoid pitfalls), here are strategies and best practices.

  1. Involve a multidisciplinary team

    Include people from design/engineering, process, quality, operations, maintenance, safety, user perspectives. Diverse viewpoints help reveal failure modes that might be overlooked.

  2. Define clear scoring scales and criteria

    For severity, occurrence, detection, ensure that all participants understand what each rating means. Use anchor examples. This reduces subjectivity and improves consistency.

  3. Use relevant data

    Use historical failure data, field data, test data, supplier data. Don’t rely purely on guesswork. Where data is missing, treat estimates carefully, with conservative assumptions.

  4. Prioritize high severity & undetected failure modes

    When severity is very high (e.g. safety, regulatory compliance), you may need mitigation even if probability is low. Also look closely at failure modes with weak detection, since improving detection is often an efficient way to reduce risk.

  5. Thresholds / action priority

    Use thresholds for RPN or AP to decide which failure modes need immediate action. The AIAG/VDA method’s AP tables help with consistency in deciding priority.

  6. Document everything, assign ownership and deadlines

    Without action planning, even identified risks may not get addressed. Assign who is responsible, by when; include metrics to assess effectiveness.

  7. Update FMEA dynamically

    Whenever design changes, process changes, usage conditions change, or new data emerges (failures, near misses), revise the FMEA. It should be a living document.

  8. Balance effort

    Avoid over-documenting trivial failure modes (ones with very low severity, fully controlled). Instead, ensure the effort is focused where risk is meaningful.

  9. Integrate with other risk tools and management systems

    Align with ISO standards, regulatory requirements, management reviews, customer feedback. Use FMEA together with tools like fault tree analysis, PHA, RCA etc., for complementary views.

  10. Training and culture

    People need to be trained not just on method but on mindset: thinking ahead, looking for failure, continuous improvement. A culture that rewards raising issues is crucial.
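One way to act on the advice about clear scoring criteria is to collect ratings independently and flag failure modes where the team disagrees. The sketch below does this for severity; the ratings, the failure mode names, and the spread limit of 2 are hypothetical choices, not values from any standard.

```python
from statistics import median

# Hypothetical independent severity ratings (1-10) from four team members.
severity_ratings = {
    "Wrong dose rate entered": [9, 9, 8, 9],
    "Label misprint": [3, 7, 4, 8],
}

MAX_SPREAD = 2  # hypothetical limit on (max - min) before the team must discuss


def review_ratings(ratings, max_spread):
    """Summarize each failure mode's ratings and flag large disagreement."""
    summary = {}
    for mode, scores in ratings.items():
        spread = max(scores) - min(scores)
        summary[mode] = {
            "median": median(scores),
            "spread": spread,
            "needs_discussion": spread > max_spread,
        }
    return summary


for mode, result in review_ratings(severity_ratings, MAX_SPREAD).items():
    print(mode, result)
```

A large spread usually signals that the scale definitions or anchor examples are being read differently, so the discussion is often more valuable than the final number.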

Traditional FMEA vs AIAG-VDA Harmonized FMEA

To illustrate, here is a comparison:

  • Traditional FMEA (pre-2019) uses RPN and long worksheets, and in effect weights S, O, and D equally. Detection is sometimes overemphasized, because improving detection is often easier than reducing occurrence.

  • AIAG-VDA Harmonized FMEA (2019+) adds:

    • Seven-step methodology (Planning & Preparation; Structure; Function; Failure; Risk Analysis; Optimization; Documentation) 

    • Action Priority (AP) tables to decide what action to take, rather than exact RPN thresholds (or complementing them) 

    • More emphasis on prevention controls (design and process design), not just detection, and making detection control effectiveness clearer.

    • Supplemental FMEA-MSR (Monitoring and System Response) in some domains (especially automotive / safety critical) to address how systems behave post-delivery, durability, monitoring, diagnostics.

Practical Tips & Pitfalls to Avoid

  • Don’t wait until late design stages: Performing FMEA early (even at concept stage) provides more flexibility to change design and avoid risk.

  • Be realistic in detection and occurrence ratings: Overoptimistic detection or underestimating occurrence leads to under-treating risks.

  • Avoid “number inflation”: If every failure mode is assigned high occurrence/severity to force action, RPNs lose discriminative power.

  • Ensure follow-through on actions: Identify actions, assign ownership, track progress. Without action, FMEA becomes documentation only.

  • Use RPN/AP for prioritization, not just ticking boxes: The value is in what you do with findings.

  • Communicate results well: Make findings visible to management, design teams, process owners. Risk perception differs: what is “high severity” to one group may seem less to another.

Conclusion

Failure Mode and Effect Analysis (FMEA) is a mature, powerful, and widely used method for proactively identifying and mitigating failure risks in design, process, systems, or service contexts. Its history, from military/aerospace roots through automotive and manufacturing to healthcare and software, shows its adaptability and relevance. Modern standards (AIAG/VDA, ISO 9001, ISO 31000, etc.) increasingly push organizations to incorporate risk-based thinking and structured risk analysis, and FMEA is a prime tool for doing so.

By understanding how to conduct FMEA rigorously — defining scope, mapping structure and functions, identifying failure modes, assessing severity/occurrence/detection, calculating RPN or using AP, identifying mitigation actions, and regularly reviewing — organizations can improve safety, reduce cost, meet regulatory expectations, and enhance reliability.

References

Here are key sources / studies mentioned or useful for deeper reading:

  • SMM El-Awady, et al. (2023). “Overview of Failure Mode and Effects Analysis (FMEA)”. PMC. 

  • M Vecchia et al. (2025). “Healthcare Application of Failure Mode and Effect Analysis” - literature review. PMC.

  • Industrial radiography devices risk assessment using FMEA

  • AIAG & VDA FMEA Handbook; AIAG-VDA harmonization project.

  • Comparison of traditional vs fuzzy-based FMEA in smart grid systems.
