> "We had the same outage three times in a year — different teams, same root cause."
^pain
A collective synthesis pain. Post-incident reviews are scoped too narrowly, lack rationale, and are never revisited. The result is Groundhog-day outages, where the same class of problem recurs for months or years. The post-mortem document exists; the *learning* doesn't propagate.
## Discovery questions
- "Walk me through your last three incidents - is there a thread you only see in retrospect?"
- "When was the last time a post-mortem changed how your team actually works, six months later?"
^discovery-questions
## Examples
- CFPB on Wells Fargo: improper sales practices and fake-account patterns persisted for years across business units because root causes were not addressed.[^1]
- USENIX SREcon talks like "The Halting Problem of Incident Response" document organizations seeing recurring outages because post-mortems focused on proximate causes rather than systemic fixes.[^2]
- Cloudflare incident reports include cases of repeated BGP or configuration-related outages, where the Groundhog-day pattern prompted stronger change management.[^3]
- The UK FSA/FCA documented repeated control failures at UBS that enabled multiple rogue-trading incidents over several years, despite earlier warnings.[^4]
- NTSB repeatedly cites airlines and rail operators for the same kinds of human-factors and signalling errors across accidents, highlighting shallow learning.[^5]
[^1]: https://www.consumerfinance.gov/enforcement/actions/wells-fargo-bank-n-a-2016-ct-order/
[^2]: https://www.usenix.org/conference/srecon
[^3]: https://blog.cloudflare.com
[^4]: https://www.fca.org.uk
[^5]: https://www.ntsb.gov