Run an FMEA, before it fails you.
Also called: Failure modes analysis · FMEA · Risk priority analysis · Failure planning
Listing every way the product could fail, scoring each by severity, likelihood and detectability, and fixing the worst before launch.
An FMEA makes you imagine failure on purpose. You list how each part could fail, score it three ways, multiply to a priority number, and act on the top scorers. The worked table below shows it on a real product.
What an FMEA is
Failure modes and effects analysis is a structured way of finding where a product will let people down before it does. For each part or function you ask four things: how could this fail, what happens when it does, how likely is it, and would we even notice. The answers point your prevention effort at the failures that actually matter, instead of spreading it thin across every worry.
How the scoring works
Each failure mode gets three scores, one to ten: Severity (how bad is the effect), Occurrence (how likely is it), and Detection (how hard it is to catch before the customer does, where one means a check catches it almost every time and ten means you almost certainly won’t). Multiply them for a Risk Priority Number, RPN = S × O × D, and sort from highest to lowest. The high numbers tell you where to spend your prevention effort.
To keep the numbers consistent, anchor the ends of each scale before you score. For Occurrence, 1 is a failure you would expect on almost no units and 10 is one you would expect on nearly every unit, so a one-off shipping knock sits low and a marginal tolerance that bites most builds sits high. For Detection, 1 is a fault a built-in check or test catches every time and 10 is one nothing in your process would catch before the customer does.
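The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `rpn` and the range check are assumptions added here, and the only things taken from the text are the one-to-ten scales and RPN = S × O × D.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D, each score an integer from 1 to 10."""
    for name, score in (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    ):
        # Reject anything off the agreed scale so a typo cannot skew the ranking.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


# The product runs from 1 (all three scores at 1) to 1000 (all three at 10).
lowest = rpn(1, 1, 1)
highest = rpn(10, 10, 10)
```

Rejecting out-of-range scores early is a design choice: one mistyped score can move a failure mode a long way up or down the sorted list.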
Here is the proofing box’s FMEA, scored. It is worth reading the severity column closely, because on a mains appliance the worst-consequence failure is what comes top.
| Failure mode | S | O | D | RPN |
|---|---|---|---|---|
| Heater fails on and runs away, unattended overnight | 10 | 4 | 7 | 280 |
| Ceramic body cracks or breaks in shipping | 6 | 4 | 5 | 120 |
| Temperature sensor drifts off the ±0.5°C band | 5 | 3 | 4 | 60 |
Thermal runaway tops the list not because it is the most likely, but because of severity: it scores a ten there, since a heater that fails on and keeps climbing, unattended overnight on a counter, is the worst case any kitchen appliance carries. That single column is why it earned a real design fix, an independent thermal cutoff that kills the heater regardless of what the control board is doing, rather than a note in a file.
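If you keep the sheet in code rather than a spreadsheet, the multiply-and-sort step might look like the sketch below. The scores are copied from the table above; the variable names and tuple layout are illustrative assumptions.

```python
# Scores copied from the worked table: (failure mode, S, O, D).
failure_modes = [
    ("Temperature sensor drifts off the ±0.5°C band", 5, 3, 4),
    ("Heater fails on and runs away, unattended overnight", 10, 4, 7),
    ("Ceramic body cracks or breaks in shipping", 6, 4, 5),
]

# Compute RPN = S x O x D for each row, then sort highest first.
ranked = sorted(
    ((name, s, o, d, s * o * d) for name, s, o, d in failure_modes),
    key=lambda row: row[4],
    reverse=True,
)

for name, s, o, d, score in ranked:
    print(f"{score:>4}  S={s} O={o} D={d}  {name}")
```

The input order does not matter. The sort puts the heater failure first at 280, followed by the ceramic body at 120 and the sensor drift at 60, matching the table.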
Watch-outs
Running it too late. An FMEA after tooling or production has started is an autopsy. Run it while a fix is still a CAD change, not a tool modification.
Treating it as paperwork. A scored sheet nobody acts on is theatre. The output is the fixes and the re-score, not the spreadsheet.
Keeping it in the design team. The people who make it, use it and test it see failures the designer cannot. Pull them in, or you will miss whole columns of risk.
Ignoring the clusters. A pile of medium-priority items can add up to one big problem. Do not only chase the single highest number.
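One way to spot the clusters mentioned in the last watch-out is to group medium-scoring items by area and flag any area that collects several. The sketch below is a rough aid only. The rows, the area labels, the "medium" band of 50 to 149 and the threshold of three items are all invented for illustration and are not taken from the proofing box's FMEA.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration: (area, failure mode, RPN).
rows = [
    ("heating", "mode A", 280),
    ("packaging", "mode B", 90),
    ("packaging", "mode C", 84),
    ("packaging", "mode D", 72),
    ("sensing", "mode E", 60),
]

MEDIUM_BAND = range(50, 150)  # assumed band for "medium"; choose your own
CLUSTER_SIZE = 3              # assumed threshold for "a pile"; choose your own

# Collect the medium-scoring items under the area they belong to.
clusters = defaultdict(list)
for area, mode, score in rows:
    if score in MEDIUM_BAND:
        clusters[area].append((mode, score))

# Flag any area carrying several medium items, even though none tops the list.
flagged = {
    area: items for area, items in clusters.items() if len(items) >= CLUSTER_SIZE
}
```

With these made-up rows, only the packaging area is flagged: none of its three items would win on RPN alone, but together they suggest one underlying problem worth a look.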
How it fits the bigger picture
Run FMEA is an Engineer-stage activity, code 07.10.10. Behind it sit the analysis and the wider risk work that feed it. Ahead of it sit the pre-production CAD and the prototypes where the fixes get tested.
What it can do
Surface the failure modes worth preventing while prevention is still cheap, and direct your limited effort to the failures that would actually hurt the customer or the business.
What it can’t do
An FMEA only catches the failures you think to list; it will miss the ones nobody imagined. It reduces risk but does not remove it, and it is no substitute for real testing.
Try it yourself
For each part of your product, write one specific way it could fail and what happens when it does. Score each on severity, occurrence and detection, one to ten, and multiply the three scores to get the RPN. Sort by RPN, highest first. Fix the top few, then re-score them to prove the fix worked. The list is only worth the actions it triggers.
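The re-score step can be checked mechanically. In the sketch below the failure mode names and every score are hypothetical, invented only to show the comparison; substitute the before and after scores from your own sheet.

```python
def rpn(s: int, o: int, d: int) -> int:
    """Risk Priority Number: severity x occurrence x detection."""
    return s * o * d


# Hypothetical scores, invented for illustration: name -> (before, after),
# where each entry is (S, O, D).
rescored = {
    "Example mode A": ((8, 5, 6), (8, 2, 3)),  # fix cut occurrence and detection
    "Example mode B": ((6, 4, 5), (6, 4, 5)),  # fix made no measurable difference
}

for name, (before, after) in rescored.items():
    b, a = rpn(*before), rpn(*after)
    status = "improved" if a < b else "NOT improved"
    print(f"{name}: {b} -> {a} ({status})")

# Anything whose RPN did not drop still needs a real fix.
unimproved = [
    name
    for name, (before, after) in rescored.items()
    if rpn(*after) >= rpn(*before)
]
```

In this made-up example, mode A drops from 240 to 48 while mode B stays at 120, so mode B lands in `unimproved` and goes back on the action list.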
Before you call the FMEA done
▸ From the notebook · optional reading
Project notes: the proofing box’s real failure modes
The proofing box’s FMEA produced a handful of failure modes that mattered. The highest scorer was the one with the worst consequence: a heating fault on a mains appliance.
A countertop appliance does not feel like it needs a full FMEA. On a mains product that runs warm overnight unattended, it absolutely does, and four failure modes rose to the top: thermal runaway, ceramic crack and shipping breakage, PSU isolation failure, and the temperature sensor drifting off the ±0.5°C band.
The one that dominated was thermal runaway. It scored high not because it was likely, but because of consequence: a heater that fails on and keeps climbing, unattended overnight on a counter, is the worst case any kitchen appliance carries. That earned the hardest fix: an independent thermal cutoff that kills the heater regardless of what the control board is doing, on top of the BS EN 61010-class design and the isolated PSU. PSU isolation failure was mitigated the same way, by the isolation built into the supply rather than trusted to firmware.
Ceramic crack and shipping breakage drove the protective packaging, and sensor drift drove the thermal-hold spot check in QC. I re-scored all of them after the fixes to confirm the numbers had actually dropped, because an FMEA you do not re-score is just a worried afternoon. Most of these landed in the risk register too, but the FMEA is where they turned from risks into specific engineering changes on a £149 mains appliance, where the worst case is not an inconvenience but a fire.
