This week, a large chunk of the internet went down when the CDN provider Fastly suffered a massive outage on Tuesday that affected about 85 percent of its network.
The near-total collapse, which was quickly identified and fixed, took down sites like GitHub, Stack Overflow, PayPal, Shopify, Stripe, Reddit, Amazon and CNN. Even venting your rage on Twitter was impossible, because the outage also hit the server that delivers the social network’s emoji.
This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them.
– Nick Rockwell, SVP of Engineering and Infrastructure, Fastly Inc.
The incident began at approximately 10:00 UTC (06:00 EDT) and caused widespread “Error 503” (Service Unavailable) responses. Fastly detected it in less than a minute and had it fixed within an hour.
Fastly’s initial analysis shows that the whole episode was triggered by a single customer updating their settings, in a perfectly valid way. You know that nightmare about clicking the wrong button and taking down the entire internet? Imagine being that person. That particular combination of settings triggered a bug in a software update that had slipped past Fastly’s quality control and had been sitting in production since May 12.
If you’ve ever visited a serious data center, you know the kind of security they use to defend against potential criminal attacks. The only one I’ve visited in person was inside a nuclear-proof bunker with multiple security checkpoints, and I wasn’t even allowed into the really secure part. Yet it turns out that all a terrorist needs to bring down the global economy is a CDN account and the right settings update.
Fastly’s response was genuinely quick, much faster than competitors managed during previous mass CDN outages, which may be one reason its share price rose this week. But Fastly is still caught in a competitive cycle where fast and cheap are easy to compare, and good is somewhat abstract… until it isn’t.
Most of us feel like old hands online, when the truth is that we are very early adopters. It will take a century or more for the internet to become truly woven into society. Yet we are laying the foundations now, and future generations need those foundations to be solid. We need less focus on shaving a few cents off the price, less focus on delivering sites 3 nanoseconds before the user has even opened the browser, and more attention to resilience.
Like everyone, I love blazing-fast sites, and I’m more than happy to get a great deal, but personally, I don’t think either is worth waking up to a 503 error on a site I’m responsible for.
Image via Unsplash.