AWS: last week’s outage happened because reasons

A bunch of digital services were unavailable for hours last week due to an AWS outage and the company has finally offered an explanation.

For those of us bemoaning the lack of detailed information provided at the time, this is a case of ‘careful what you wish for’. In the explanatory post, published three days after the outage, a breakdown of the plumbing of the AWS network is offered as context. You can read it here if you’re into that sort of thing, but the short version is: it’s complicated in there.

There are many good reasons for AWS to inflict so much arcana on the outside world. Firstly, there's the acknowledgement that a proper explanation was overdue. Then there's the likelihood that the longer and more complicated the explanation is, the more people will get bored and just accept it at face value. And there's the implication that it's amazing such a complicated thing goes wrong so rarely.

While all of these are valid, they still miss the central point, which is that the modern economy has become so reliant on a few public cloud providers that we're all effectively at their mercy. No doubt AWS is trying its best and will learn from the outage, but that's of little consolation to companies that lost a day's business or, worse, people who had their video streaming options severely limited (Amazon Prime seems to have been unaffected).

“Finally, we want to apologize for the impact this event caused for our customers,” concludes the AWS post. “While we are proud of our track record of availability, we know how critical our services are to our customers, their applications and end users, and their businesses. We know this event impacted many customers in significant ways. We will do everything we can to learn from this event and use it to improve our availability even further.”

For all its renewed striving, this presumably won't be the last time AWS or Azure or Google Cloud has a wobble. They will be no less contrite and full of resolve to avoid a repetition next time too. But you have to wonder how close regulatory authorities are to trying to intervene in the name of the greater good. It remains to be seen what, if anything, they can do, but political pressure to be seen to be doing something is likely to grow.

One comment

  1. Rodolfo Di Muro 13/12/2021 @ 3:51 pm

    Hi Scott,

    Outages have been caused by telco equipment providers in the past, including well-consolidated vendors, so it is not surprising that they can now be caused by IT providers as well.

    Private Cloud has a scaling factor, and although you have the 3 major companies, there is also a learning aspect that you are disregarding.

    Despite what your experience is, there is not a single new product that does not require an update release to debug software or fix little issues not detected on the initial delivery.

    The issue is if there is a continuous outage along the journey!

    take care!
