Vendor Lock-in on the Cloud

March 31, 2020 adminkw

I see a lot of talk online about avoiding cloud vendor lock-in. I used to buy into this axiom as well and for some cases that might be prudent. But, if you’re looking to save money, leverage vendor security controls for compliance, and increase development velocity , vendor lock-in shouldn’t be so worrisome.

There’s a TL;DR at the bottom, the last 2 sections

The big cloud vendors offer value far past simple compute and storage. Most companies rely on AWS or Azure for services that a simple VPS provider could provide. The real value is in re-architecting applications to utilize your provider’s managed services. When you take this into account, you can easily see the savings and value falling into vendor lock-in can provide.

Before you go further…

If you’re an engineer running software outside of web, this blog post might not be for you. I’m thinking of people running micro-services and people who want to migrate their monolith to them. This post is for people running scalable web applications that have some allowable downtime. Please see Google’s Error Budget.

This is not for people running services that have to have cross provider configurations (AKA: places that have the budget $$$)

Why would you willingly implement vendor lock-in?

Focusing on a single cloud platform allows your engineers to become experts of that domain. There is a learning curve. You can’t expect engineers to go from a standard environment to a cloud platform in an instant. There are lots of assumptions that no longer apply and new opinionated configurations for them to learn.

My current employer has complete vendor lock-in with AWS. I started my position about 2 years ago now. Not long before I started, I used the basic services; ec2, vpc, s3, rds. At the time of being hired, I may have even been intermediate level, using things like alb, elasticache, and the es services. But it wasn’t until I was working somewhere with vendor lock-in to AWS when I really got going with all there was to offer.

As a Developer

vendor lock-in savings over time — Cost initially went up due to some service misuse. Iterated on the design until, voila

I have personally re-engineered apps to reach huge reduction in costs. I’ve had teammates reach up to 5000% reduction on some apps, the one in particular being $3,000 per month down to the $50s. You’ll be surprised what once required a full Java enterprise app (relational database, web server, activemq, plus the JVM) can be whittled down to s3, lambda, dynamodb, sqs, and api gateway.

You only get these benefits if you commit to vendor lock-in. Your engineer’s expertise is what can lead you to these cost reductions. The vendor specific technologies allowed me to retool and rebuild this app in an incredible amount of speed. If you don’t have the budget, and you’re busy trying to be cross-cloud compatible, these are the kind of savings you’re missing out on while you spin your wheels trying to be cross-cloud.

As Operations

Almost every service has some sort of automated backup built right in. Most fully compliant with any security control at the click of a button or setting of a configuration parameter.

Different regions will save you from major outages. If you’re really worried about global service downtime, you most likely won’t be the only person experiencing it, and your service probably isn’t that important. If it is, you’ll be included in the people I mentioned above, with a big budget. You need to account for the acceptable risk to your organization here and the web services you’re running.

There is no cloud agnostic tooling to manage across clouds easily (outside of K8s stack, which can be a different burden). Deploying to multiple clouds, even if your infrastructure is IaC such as Terraform, you will be re-writing and re-engineering a ton of code. It’s a massive overhead deploying to 2 separate cloud platforms in active->passive, and even more so in active->active.

All those savings you hear about?

They are real! But sometimes the savings from this kind of vendor lock-in can be more abstract. You need to consider the value you are purchasing.

First you need to consider all the savings alone on not having hardware or personnel to take care of it. Then, multiply those savings because you your cloud provides the value of multiple data-centers.

If you have backups now, do you know what they cost? If you can sum up that number easily; labor, hardware, physical space, power, internet, etc. then you’ll be able to see how much you can save. If you can’t, don’t worry about it, I haven’t seen a company at less than $80MM ARR capable of giving a good estimate on their backup costs. Cost overhead on backups in the cloud? Fractions of what it would cost to do it yourself, sometimes free.

With these more direct cost reductions, you will also save on effort to perform security audits and you will gain a ton of value in a faster and more agile development team on the cloud.

The C word – Compliance

Have you ever had to go through a compliance audit? Leverage as many controls covered by your cloud provider as you can. This will greatly ease the burden of many compliance controls. I’ve knocked out whole families of FISMA controls by just pointing at AWS’s compliance page.

You’re wondering why cloud vendor lock-in matters for this. Most providers cover the same controls/control families. You’re right. But you’ve just increased the boundaries of your environment and included more infrastructure, different types of infrastructure, and different process/procedure into your audit. You’ve effectively increased the scope of your audit anywhere from 1.5x to 2x.

Vendor lock-in can speed up development

This is probably my biggest claim and it’s not going to automatically be true just because you jump on a single cloud provider. This will take discipline and some initial investment.

A lot of devs are not used to working in the cloud and being able to rely on cloud services. There are patterns and best practices that must be learned when using AWS services. Due to this, if you are buying into a cloud provider, make sure to get your operations AND developers courses/training. Help facilitate them to get certified, studying and training for certs will lead to you finding new services you didn’t know of before.

Now this increased speed isn’t just from your development team. Your operations/devops/sre team (I’ll just call them the ops team from here on) needs to be on-board with helping your dev team MOVE! Separate out your infrastructure from your applications. Devs should be able to write their own Terraform to deploy their application (most likely a lambda function or ecs service) on to the infrastructure provided by your ops team.

If you are somewhere where security/compliance is important (THIS SHOULD BE EVERYWHERE!) do not rush these steps. You want to make sure you have a good developer workflow, separate dev/staging/prod environments, and proper mapping of permissions to users. So for important infrastructure the devs can write the Terraform, but they can’t deploy it, that’s up to the CI/CD pipelines.

Finally, to keep speed while maintaining compliance, make sure your policies and procedures are in line with what you’re doing. Tie commits and merges to tickets, get approvals on merge requests, and make sure you stick to what your internal policies and procedures say. There’s no reason for compliance to slow down development in a modern environment.

A final word

If you want to ignore all the crap I wrote above, go ahead. Just remember this metaphor and try to think of this issue like so:

You are a race car team owner and manager. AWS cars and Microsoft cars run the same as normal cars, but they also support a few extra features not in normal cars. These features can make them faster, more convenient, and more reliable.

You have some drivers, mechanics, pit crew, etc. and they are all experts with normal cars. So everything on AWS cars or Microsoft cars that makes the car work and function like a normal car is the same, and these guys can work on them and drive them. The extras they’ll learn as they continue to work on them, they’ll gain more expertise as time goes on.

Now you need to decide.

You can get AWS or Microsoft cars and stick with one brand. Then make sure your drivers and crew understand how to drive them and get the extra features out of them. Keeping up with the metaphor, let’s say those extra features are like automatic tire changes while driving, unlimited fuel, etc. Starting out you’ll do fine, driving the cars like normal cars, but as you learn the features you’ll start to blow your competition away.

You can worry about AWS or Microsoft raising prices on your cars and parts. You decide to go with both to balance this risk and miss out on all those early wins in the series because your drivers and crew weren’t specialized. Hell, some of them were still putting the cars together when the races were starting, unless you decide to hire more crew.

So do yourself a favor

Stick to a single cloud provider unless you have a massive budget. The time, effort, and money you spend on being cloud agnostic could be better spent digging into a single cloud providers value added services and then building around your provider’s paradigms.

This is how you generate true value from your cloud provider, gain the promised savings you’ve heard so much about, improve and lower costs of security and compliance, and increase developer velocity.

KWNetApps

Devops and Systems Engineering