The Amazon cloud crash’s silver lining

When Amazon’s data center in Virginia began experiencing some problems last week, it resulted in a major outage for many online businesses and services that rely on Amazon Web Services (AWS) — or “the cloud,” if you will — for their data storage. This is the first major widespread failure of cloud storage, and plenty of tech bloggers have been analyzing the incident and trying to figure out what went wrong.

Over the weekend I thought about the Amazon cloud collapse in the context of a recent lecture I attended on our campus about success and failure in engineering design. The lecturer, Henry Petroski, a professor of civil engineering and history at Duke University, approaches failure from an unusual perspective, since he is both an engineer and a historian. In the talk, Petroski described the paradox of failure:

  1. Anticipating failure leads to success. But …
  2. Successful designs evolve into failures. That’s because we tend to become overconfident in our successful designs (or programs, or models, whatever is working at the time).

The cloud computing model of data storage has chugged merrily along for years. Amazon’s approach seemed to be the best model around. Foursquare, Hootsuite, Reddit and Netflix all put their eggs in the AWS basket, because it seemed like a good thing to do, and Amazon guaranteed the service. No doubt Amazon’s designers created what they thought was a fail-safe system, with multiple availability zones. The thought was that, even if one of those zones failed, others would not. But the concept itself didn’t work. More than one availability zone failed.

Back to Petroski: One of his examples was the infamous failure of the Titanic, a ship that was supposed to be unsinkable. We all know what happened to that vessel.

But Petroski posed another question: What if the Titanic hadn’t struck the iceberg? Then transatlantic voyages would have continued as before, more shipping companies would have followed the paradigm of the Titanic’s designers, and perhaps an even greater tragedy, or multiple tragedies, would have occurred later. Instead, ship designers learned from the Titanic failure and redesigned their vessels with double hulls and taller bulkheads in an effort to prevent a repeat of that disaster.

What lessons will we learn from the crash of the Amazon cloud? More pertinent to those of us in marketing, what can we learn from our own failures?

The value of failure

Last summer, after reviewing Charlene Li’s wonderful book Open Leadership (affiliate link), I riffed on her thoughts on failure in a blog post called The value of failure. I quoted a question from Li that is relevant to higher education:

In your organization, how important is it for people to be risk takers, to be innovators? If initiative and innovation are key to your future success, then you need to take a long hard look at how you personally create trust and approach failure, because it will be reflected back in the culture that you create.

“Colleges and universities tend to be mainly conservative, cautious institutions,” I wrote back then. And nothing has changed. We’re still conservative and cautious. “Not many of our leaders got to where they are by taking huge risks in their careers. And so, the culture that rewards a cautious approach is not likely to reward risk takers — especially if they fail.

“So where,” I asked, “does that leave us who fall in the middle of the org charts and who aspire to be the open leaders Li talks about?”

I think it leaves us to take risks and give those who report to us as many opportunities to take risks as we can. We should not discourage risk-taking simply because we are in a culture that rewards caution. Moreover, we should learn to practice the art of forgiveness. To those in our organizations who take risks and fail, we should ask: “What did you (we) learn? How can this help us in the future?”

We should also take a cue from Google, which has a motto — “Fail fast, fail smart” — that would be a nice one to adopt in higher ed.

So where’s the silver lining in Amazon’s cloud fail? It’s in the lessons we learn from it. After all, we’re in the learning business, right?

* * *

Update, 11:40 a.m. CDT: This post about lessons from four famous marketing failures, via Laura D.’s Marketing for Higher Ed, provides relevant insight from the marketing realm and reminds us that not all disasters are related to engineering and technology. Sometimes marketers blow it — and blow it big-time.


Author: andrewcareaga

Higher ed PR and marketing guy. Communications director for Missouri University of Science and Technology (Missouri S&T) in Rolla, Missouri, USA. Slow runner, mediocre guitarist, lover of music and puns, and an avid St. Louis Cardinals fan. I blog and Tweet about #highered, #music, #gocards and #random stuff.

6 thoughts on “The Amazon cloud crash’s silver lining”

  1. We learn from our mistakes. Do we learn anything if things run smoothly?

    I don’t think we can ever give an absolute 100% guarantee that problems won’t occur.

    Good arguments.

  2. Ben – Thanks for the comment. I think one of the mistakes Amazon made — and something we might all learn from this situation — is the “guarantee.” You’re correct: no guarantee is iron-clad. So what does one get in exchange for a guarantee? What is the “…or your money back” part of AWS’s agreement with its clients?

  3. Andrew, all the hosting services providers I’ve ever dealt with promise a 99.99% uptime or something similar. Essentially what they’re saying by the 99.99% is that they’re totally confident the service won’t go down and have multiple backups and so on, but they always leave that 0.01% as an out so they won’t get sued if they do go down. I imagine the Amazon Cloud Service has (or had!) a similar 99.99999% guarantee. No hosting company will ever guarantee 100% uptime – at least I’ve never seen one that did.

  4. Hi AC – great post. I couldn’t agree more with Petroski’s second point, that successful designs evolve into failures. One of the most frustrating things I’ve found working in this industry is people’s reluctance to change, whether it’s a media plan, an office structure, or even a publication. The going attitude seems to be “if it ain’t broke, why fix it?” But I’ve found that the people who are saying this are not paying attention to what others are doing, and therefore are not aware that their methods are indeed broken – or on their way out. It’s important to key into what others are doing to stay with the trends, and to constantly evaluate the work you’re doing to make sure it’s measuring up.

  5. A great blog. We were hit when Amazon’s cloud went down, through our use of CoTweet (which uses Amazon’s cloud). Since this is the first such issue with them, we are relying on CoTweet to minimize future issues. Otherwise, bye bye CoTweet.

    More to the heart of the matter, we learn the best from our mistakes and failures. One thing that many lack is the ability to brainstorm and foresee possible issues. While there are always failures, there are always ways to mitigate some failures. One of our biggest failures is our failure to foresee future failures. Staying ahead of failures and being proactive could possibly be just as productive and keep things running smoothly.
