Latent Failures
Categories: Systems
Sources: How Complex Systems Fail
Complex systems always contain multiple flaws, each latent and individually insufficient to cause harm. Because the system keeps running, these flaws accumulate largely unnoticed, and you can never remove them all.
Why It Matters
Failures are not anomalies waiting to be eliminated; they are a normal, permanent feature of complex systems. Accepting this reframes the goal from "zero flaws" to managing how flaws combine and what defends against the combination.
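One way to make "managing how flaws combine" concrete is a toy model, sketched below. Everything in it is an assumption for illustration: the flaw names, the probabilities, and above all the idea that flaws become active independently of one another, which real systems often violate. It shows only that no single flaw causes harm, while ranking combinations points attention at the most likely alignments.

```python
from itertools import combinations

# Hypothetical latent flaws, each with an assumed chance of being "active"
# at a given moment. The numbers are illustrative, not measured.
LATENT_FLAWS = {
    "stale_config": 0.05,
    "retry_storm": 0.02,
    "full_disk": 0.01,
}


def alignment_probability(flaws, probs=LATENT_FLAWS):
    """Chance that every flaw in `flaws` is active at once.

    Assumes the flaws are independent, which is a simplification.
    """
    p = 1.0
    for name in flaws:
        p *= probs[name]
    return p


def rank_combinations(size, probs=LATENT_FLAWS):
    """Return (probability, combination) pairs, most likely alignment first."""
    combos = combinations(sorted(probs), size)
    return sorted(
        ((alignment_probability(c, probs), c) for c in combos),
        reverse=True,
    )


if __name__ == "__main__":
    for p, combo in rank_combinations(2):
        print(f"{p:.4f}  {' + '.join(combo)}")
```

Under these assumed numbers the pair most likely to align is retry_storm with stale_config, so a defense against that pair would be considered before one against the rarer pairs.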
Signals
- A system that works while known issues remain open.
- "We've always had that problem."
- Incidents traced to conditions that existed long before they bit.
Benefits
A realistic risk posture: attention shifts to dangerous combinations and to defenses, rather than chasing every individual flaw.
Risks
Complacency, because the system runs and each flaw seems harmless until several align. The opposite risk is trying to eliminate every latent flaw, which is impossible and ruinously costly.
Tensions
You cannot remove all latent failures, yet each is a genuine hazard. The judgment is which to fix and which to defend against.
Examples
- A service running for months with a race condition that only bites under a rare load combination.
- Safety incidents where every contributing condition had been present and tolerated for years.
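The race-condition example can be sketched in a few lines. This is a minimal illustration, not taken from any real service: `Counter` and `run_two_increments` are hypothetical names, and a `threading.Barrier` stands in for the rare load combination by forcing two calls to overlap. Under ordinary, non-overlapping use the flaw stays latent and the result is correct.

```python
import threading


class Counter:
    """Hypothetical counter with a latent flaw: read and write are separate steps."""

    def __init__(self):
        self.value = 0

    def increment(self, pause=None):
        current = self.value      # step 1: read
        if pause is not None:
            pause()               # hold here until the other caller has also read
        self.value = current + 1  # step 2: write back a possibly stale value


def run_two_increments(overlap):
    """Increment twice; with overlap=True, force both reads before either write."""
    counter = Counter()
    if not overlap:
        # Ordinary load: the calls never overlap, so the flaw stays latent.
        counter.increment()
        counter.increment()
        return counter.value

    # Rare load: both threads read 0, meet at the barrier, then both write 1.
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=counter.increment, args=(barrier.wait,))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_two_increments(overlap=False))  # prints 2
    print(run_two_increments(overlap=True))   # prints 1: one update is lost
```

The flaw is present in both runs. Only the timing differs, which is why such a service can run for months before the condition bites.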