Reliability
CLOUD SERVICES ALSO HAVE OUTAGES AND THAT’S PERFECTLY ALRIGHT

There is a fairly popular belief that cloud services are flawless and never experience outages. Yet we see news of real-world cloud outages over and over again. Just recently, an outage in one of the US regions of AWS affected both small and large players around the world, from Amazon’s own services to dating apps, security systems, and robots, all of which need cloud services to function. The last big outage before that hit Fastly and crippled a large part of the internet, and nearly every cloud service provider, large or small, has experienced similar outages.
Does this mean that cloud services are not reliable?

Technology is technology, whether it runs in the cloud or somewhere else, which means that outages are unavoidable and only a matter of time. Luckily, cloud service developers and architects know to take this into account at every step and build backup systems for critical services and data, so that both the infrastructure and the software running on it can switch over automatically in case of an outage.

A cloud service outage critically affects every client whose business depends on it – whether a hospital, a governmental service, or an online store. For example, suppose an online store is hosted in AWS’s Virginia data centre in the US, which experienced an outage just recently. For the store’s business to remain unaffected, a copy of it would also have had to exist in a European data centre. In that case, a correctly set up system would have automatically redirected users from the American store to the one in Europe. Customers would not have noticed the outage happening in the background, and the store owner would not have missed a single sale.
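The redirect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AWS or any particular provider implements failover: the endpoint URLs, the `pick_endpoint` function, and the simulated health check are all hypothetical names made up for this example.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint, in priority order, that passes its health check."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated like an unhealthy endpoint.
            continue
    return None  # Every copy of the store is down.


# Hypothetical copies of the same store: primary in the US, backup in Europe.
ENDPOINTS = ["https://us.store.example", "https://eu.store.example"]


def simulated_health_check(endpoint: str) -> bool:
    """Stand-in for a real health probe; here the US copy is pretended to be down."""
    return not endpoint.startswith("https://us.")


active = pick_endpoint(ENDPOINTS, simulated_health_check)
print(active)
```

In a real setup this decision is usually made by DNS or a load balancer rather than by application code, but the logic is the same: keep an ordered list of copies and send users to the first one that still responds.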

This sounds simple and logical, but unfortunately people often fail to account for such an event, or systems are knowingly not built to be reliable from the beginning.

Generally, there are two reasons for this. The first is the misconception that the cloud is absolutely flawless, even when used without any special skills. Cloud service providers themselves emphasise, and real-life examples prove, that this is not true.

The second reason is a deliberate, calculated risk: companies calculate how much an outage of a service will cost them per hour and how much making the service reliable will cost them. Based on that calculation, they decide whether it is more profitable to accept the service being unavailable for a number of hours or to make it more reliable.
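As a rough illustration of that calculation, the sketch below compares the expected yearly loss from outages with the yearly cost of redundancy. All figures and names are hypothetical, and the model is deliberately simplified: it assumes that redundancy removes the outage loss entirely, which is rarely quite true in practice.

```python
def expected_outage_loss(cost_per_outage_hour: float, outage_hours_per_year: float) -> float:
    """Expected yearly loss if the service is left without redundancy."""
    return cost_per_outage_hour * outage_hours_per_year


def redundancy_pays_off(
    cost_per_outage_hour: float,
    outage_hours_per_year: float,
    redundancy_cost_per_year: float,
) -> bool:
    """True when the expected outage loss exceeds the yearly cost of redundancy."""
    loss = expected_outage_loss(cost_per_outage_hour, outage_hours_per_year)
    return loss > redundancy_cost_per_year


# Hypothetical online store: 2,000 lost per outage hour, 8 outage hours a year,
# and 10,000 a year to run a second copy of the store.
busy_store = redundancy_pays_off(2000, 8, 10000)

# Hypothetical smaller store: only 500 lost per outage hour, same other figures.
small_store = redundancy_pays_off(500, 8, 10000)

print(busy_store, small_store)
```

With these made-up numbers the busier store loses more in a year of outages than redundancy would cost, while the smaller store does not, which is exactly the kind of trade-off described above.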

Understandably, making a service more reliable can be a very costly undertaking, but luckily there are many ways of doing it. For example, reliability can be increased by duplicating only the critical services and data between data centres located within the same geographical region, or by duplicating the critical parts of the service across data centres located in different geographical regions.

To create an even more reliable solution, it is possible to duplicate the services across different cloud service providers in addition to different geographical regions.

In conclusion, one should always keep in mind that technology will occasionally break. If you consider that from the start, you can save both your nerves and money – by being informed of the actual risks, you can prepare development jobs, work processes, and crisis communication well ahead of time.

Klemens Arro
CEO of ADM Cloudtech

This article was also published in Geenius DigiPRO.