We're evaluating cloud providers, and I've heard some horror stories about bad service-level agreements (SLAs). What should we watch out for?
What most cloud providers cover in their SLAs is the general availability of their services, not the uptime of any individual server. That means any single server can go down without it being considered an SLA violation.
Cloud-provider SLAs also typically allow for a certain amount of failure before a problem qualifies as an outage. For example, with Amazon Elastic Compute Cloud (EC2), a problem is considered a true outage only if all of your instances in at least two availability zones (AZs) are down. That means that if only a single AZ is down, or if you're running in only a single AZ, you're not covered by the SLA.
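To make that rule concrete, here is a minimal Python sketch of the check as paraphrased above. It is an illustration only, not the provider's actual SLA logic: the function name, the (zone, is_up) input format and the zone labels are all hypothetical, and the sketch assumes the rule means "at least two zones with every instance down."

```python
from collections import defaultdict

def qualifies_as_outage(instances, min_zones_down=2):
    """Return True if the failure would count as an outage under the
    rule as paraphrased in this article (an assumption, not SLA text).

    instances: iterable of (zone, is_up) pairs, one per running instance.
    """
    zone_states = defaultdict(list)
    for zone, is_up in instances:
        zone_states[zone].append(is_up)

    # A zone counts as down only when every instance in it is down.
    zones_down = [z for z, states in zone_states.items() if not any(states)]
    return len(zones_down) >= min_zones_down

# Hypothetical zone labels, for illustration only.
single_az = [("zone-a", False), ("zone-a", False)]
two_az_one_down = [("zone-a", False), ("zone-b", True)]
two_az_both_down = [("zone-a", False), ("zone-b", False)]

print(qualifies_as_outage(single_az))         # False: only one zone in use
print(qualifies_as_outage(two_az_one_down))   # False: one zone still up
print(qualifies_as_outage(two_az_both_down))  # True: both zones fully down
```

Running a check like this against your own deployment layout shows quickly whether the SLA could ever apply to you. A single-AZ deployment never qualifies under this reading.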
It's also important to understand what you'll receive, and how you'll receive it, if the provider breaches the agreement. In Amazon's case, you typically receive only statement credits, which don't help you recoup any potential revenue lost during downtime.
In addition, many providers don't grant such credits automatically. Instead, they offer a way for you to request statement credits when you see an SLA violation. Going through that process is usually more trouble than it's worth.
In general, SLAs won't help you recover lost business when the provider violates them. For that reason, it's important to identify your own recovery steps to prevent provider outages from costing you money.
Some questions to consider in developing those steps: If the servers in one area are down, do you have the ability to launch servers in another area? Can you migrate your data and instances from one location to another? And what can you do if the provider does violate the SLA?
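Those questions can be turned into a simple decision routine. The Python sketch below is one possible shape for it, assuming you already have some way to determine whether each location is healthy. The function name, the location labels and the three action names are hypothetical and don't correspond to any provider's API.

```python
def plan_recovery(current, health, preferred_order):
    """Decide what to do when the current location may be down.

    current: the location you are running in now.
    health: dict mapping location name -> True if healthy.
    preferred_order: locations to try, most preferred first.
    Returns an (action, location) tuple.
    """
    # Nothing to do if the current location is still healthy.
    if health.get(current, False):
        return ("stay", current)

    # Otherwise launch in the first healthy alternative.
    for location in preferred_order:
        if location != current and health.get(location, False):
            return ("failover", location)

    # No healthy location: fall back to manual steps, such as
    # contacting the provider and filing a credit request.
    return ("escalate", None)

# Hypothetical location names and health data, for illustration only.
health = {"area-1": False, "area-2": True, "area-3": True}
print(plan_recovery("area-1", health, ["area-1", "area-2", "area-3"]))
# ('failover', 'area-2')
```

The value of writing the plan down this way is that the "escalate" branch forces you to decide in advance what happens when no alternative location is available.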
This was first published in October 2013