Is AWS S3 the CDO of the Cloud?

The answer: not really, but the question needs examination.

One of the causes of the financial crisis of 2008 was the flawed ratings of complex financial instruments by supposed experts (ratings bodies such as S&P). Instruments such as CDOs commingled mortgages of varying risk levels, yet managed to get the best (AAA) ratings. The math (simplified) was that the likelihood of all the component mortgages failing at the same time was very low, and hence the CDO (more accurately, the AAA-rated portion of the CDO) itself was safe. For example, if the probability of a single mortgage defaulting is 5%, and the defaults are independent, then the probability of 5 mortgages defaulting at the same time is about 1 in 3 million. The painful aftereffects of the bad assumptions underlying the math are still being felt globally, 5 years later.
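The arithmetic behind that example is easy to reproduce. This is a minimal sketch of the simplified independence math described above, not the ratings agencies' actual model; the function name is made up for illustration.

```python
def joint_default_probability(p_single: float, n_mortgages: int) -> float:
    """Probability that all n mortgages default, ASSUMING the defaults are independent."""
    return p_single ** n_mortgages


# Figures from the text: 5% chance per mortgage, 5 mortgages.
p_all = joint_default_probability(0.05, 5)
print(f"P(all 5 default) = {p_all:.3e}")
print(f"roughly 1 in {1 / p_all:,.0f}")
```

The result is roughly 1 in 3.2 million. Everything rests on the word "independent": the multiplication is only valid if one default tells you nothing about the others.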

If there is a AAA equivalent in the Cloud, it is AWS S3 which promises “11 9s of durability” (99.999999999%) or:

“if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years”.
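The two statements (11 nines, and one object in 10,000 every 10,000,000 years) are at least consistent with each other. A quick check, assuming the durability figure is a per-object, per-year probability and that losses are independent; the function name is hypothetical:

```python
def years_per_lost_object(n_objects: int, nines: int) -> float:
    """Average number of years between losses of a single object."""
    annual_loss_probability = 10.0 ** -nines  # assumed: per object, per year
    expected_losses_per_year = n_objects * annual_loss_probability
    return 1.0 / expected_losses_per_year


print(f"{years_per_lost_object(10_000, 11):,.0f} years per lost object")
```

This only shows that the quoted sentence is a restatement of the 11 nines. It says nothing about where the 11 nines come from.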

However, AWS does not give us the mathematical reasoning behind it. We have some clues:

“Amazon S3 redundantly stores your objects on multiple devices across multiple facilities in an Amazon S3 Region. The service is designed to sustain concurrent device failures by quickly detecting and repairing any lost redundancy”.

Let’s assume S3 stores 3 copies of each object in 3 datacenters that do not share the same failure domains. For example, the datacenters can each have different sources of power and be located in different earthquake fault zones and flood zones. So, by design, the probability of 3 simultaneous failures is presumably low. Intuitively, if the probability of losing one copy is 10^-4, then the probability of losing all 3 is (10^-4)^3, or 10^-12. This is an oversimplification: there are other factors in the calculation, such as the time to recover a failed copy. This graph (courtesy of Mozy) shows that the probability of losing an object is far lower with 3 copies (blue) than with 2 copies (yellow).

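[Figure: probability of losing an object with 2 copies (yellow) versus 3 copies (blue), courtesy of Mozy]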
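To make the intuition concrete, here is a rough sketch of both calculations: the naive "all copies lost" product, and a back-of-the-envelope version that accounts for the time to recover a failed copy. Nothing here is AWS's model. The 1% annual failure rate and the one-day repair window are invented for illustration, and the second formula is only a reasonable approximation when the failure rate times the repair time is much smaller than 1.

```python
def naive_loss_probability(p_copy: float, copies: int) -> float:
    """Chance that every copy is lost, treating the losses as independent."""
    return p_copy ** copies


def loss_rate_with_repair(failure_rate: float, repair_years: float, copies: int) -> float:
    """Rough annual rate of losing all copies when failed copies get rebuilt.

    failure_rate is per copy per year; repair_years is the time to restore a copy.
    """
    # Any one of the copies can be the first to fail.
    rate = copies * failure_rate
    # Each remaining copy must then also fail before the repair completes.
    for remaining in range(copies - 1, 0, -1):
        rate *= remaining * failure_rate * repair_years
    return rate


print(naive_loss_probability(1e-4, 2), naive_loss_probability(1e-4, 3))

one_day = 1 / 365  # hypothetical repair window, in years
for copies in (2, 3):
    print(copies, "copies:", loss_rate_with_repair(0.01, one_day, copies))
```

In this toy model, adding a third copy or shortening the repair window each cuts the loss rate by orders of magnitude, which is the shape of the argument behind the graph. Both calculations still assume the copies fail independently.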

The fatal flaw with the CDO ratings was that the failures of the component mortgages did chillingly correlate when the housing bubble burst: something that the ratings agencies had not considered in their models. Perhaps the engineers and modelers at Amazon are far better at prediction science. But remember that the Fukushima nuclear disaster was also a result of a failure to account for an earthquake larger than 8.6. What we do know is that, at least in the US Standard region, AWS stores the object bi-coastally, so the odds of a natural disaster simultaneously wiping out everything must indeed be quite low. There are still manmade disasters (bugs, malicious attacks), of course. But given the uncertainty of such events (known unknowns), it is likely that the 10^-11 figure ignores such eventualities. Other AWS regions (EU, Japan, etc.) do not store the objects with such wide geographic separation, but there are no caveats on the 10^-11 figure for those regions, so I’m guessing that the geographical separation in the US Standard region does not figure in the 10^-11 calculation.
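A toy "common shock" model shows how quickly correlation eats the nines. Assume that with some small probability q a single event (a bug, an attack, a burst bubble) takes out every copy at once, and that otherwise the copies fail independently. The model, the function names and the value of q are all hypothetical; they are not drawn from AWS or from the ratings agencies.

```python
def p_all_fail_independent(p_single: float, n: int) -> float:
    """All n components fail, assuming full independence."""
    return p_single ** n


def p_all_fail_common_shock(p_single: float, n: int, q_shock: float) -> float:
    """All n components fail, where a shock of probability q_shock kills them all."""
    return q_shock + (1.0 - q_shock) * p_single ** n


independent = p_all_fail_independent(1e-4, 3)
correlated = p_all_fail_common_shock(1e-4, 3, q_shock=1e-6)  # q is made up
print(f"independent: {independent:.1e}")
print(f"with a one-in-a-million common shock: {correlated:.1e}")
```

With these made-up numbers the answer is dominated by q: a one-in-a-million shared failure mode makes total loss about a million times more likely than the independent estimate. The headline figure is only as good as the estimate of the shared risk.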

Interestingly, none of S3’s competitors makes a similar claim. I can’t find Google Storage or Azure Storage quoting a durability figure of any kind. This riposte from the Google Storage team says:

“We don’t believe that quoting a number without hard data to back it up is meaningful to our customers … we can’t share the kind of architectural information necessary to back up a durability number”.

The Windows Azure storage SLA quotes an “availability” number of 99.9%. This availability number is the same as that of Amazon S3.

Waitaminnit. What happened to the 11 9’s we saw above?

The difference between availability and durability is that while the object (at least 1 copy) might exist in AWS S3, it may not be available via the S3 APIs. So there’s a difference of 8 nines between the availability and durability of an S3 object.
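To put the two figures side by side, here is a small sketch that counts the nines in each and converts the availability number into hours of allowed unavailability. It assumes, for simplicity, that the 99.9% is measured over a full year; an actual SLA may use a different measurement period.

```python
import math

HOURS_PER_YEAR = 365 * 24


def count_nines(fraction: float) -> int:
    """Number of nines in a figure such as 0.999 (three nines)."""
    return round(-math.log10(1.0 - fraction))


def downtime_hours_per_year(availability: float) -> float:
    """Hours per year the service may be unreachable at this availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


durability, availability = 0.99999999999, 0.999
print("difference in nines:", count_nines(durability) - count_nines(availability))
print(f"allowed downtime: {downtime_hours_per_year(availability):.2f} hours/year")
```

Three nines of availability leaves room for roughly 8.76 hours a year in which an object exists but cannot be fetched.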

Given the actual track record of AWS S3, it is perhaps time to revise the durability estimates of this amazing service. If I were an enterprise or government organization considering moving my applications to the AWS cloud, I would certainly want to examine the math and modeling behind the figures. During congressional testimony, the ratings agencies defended their work by saying that it was impossible to anticipate the housing downturn, but clearly lots of people had anticipated it and made a killing by shorting those gold-plated CDOs. The suckers who trusted the ratings agencies blindly failed to do their own due diligence.

I do believe that storing your data in AWS S3 is less risky than storing it in your own data center, and probably cheaper than storing it across multiple data centers of your own. But it behooves one to keep a backup, preferably with another provider such as Google Storage. As chief cloud booster Adrian Cockcroft of Netflix says:
