11

The week our API went down for 3 days straight was a nightmare for ops

Last Monday our entire payment API crashed at 9am, and it took us until Thursday to get a fix deployed. We lost about $12k in pending invoices and had to manually reconcile every single customer account afterward. Has anyone else dealt with a major third-party dependency failing like this and had to scramble to cover for it without an SLA lawsuit?
3 comments

laura_chen41
My cousin runs a small bakery, and last summer their credit card reader vendor pushed an update that bricked the whole system for almost 5 days. No POS, no invoices, nothing. They lost a full weekend of sales and had to take handwritten orders. It reminded me of how we all just trust these services like they're part of the walls, and then one tiny thing breaks and you're scrambling with notebooks and sticky notes. The pattern I see is that nobody builds for failure anymore; we all just assume the cloud will always work. But the real world doesn't work that way. Your payment API going down is no different from a spark plug failing in a car that's been run too long without a tune-up. We just keep patching instead of reinforcing the weak spots.
9
elliotm57 10d ago
Did they figure out if there was any kind of backup or manual override that could have kept them running? Seems like a bakery especially should have a way to take payments without the whole system going dark. Doesn't make sense to have all your eggs in one basket like that.
6
pat_moore 10d ago
Honestly, having fallback systems for a bakery is just extra cost and complexity most of the time. Tbh, if your internet goes down for more than an hour, you've got bigger problems than just taking payments. Ngl, the cloud is usually way more reliable than some janky backup system nobody bothers to maintain anyway.
5