I'm just going to say it: DevOps is hard.
I don't think many people would argue with that point, and yet I still feel a tinge of shame admitting it. I expect a lot of my "full stack" peers might agree. I'm not totally sure what it is about DevOps that makes it so easy to underestimate, but I'd guess it has to do with cultural perceptions of IT work as procedural and repetitive.
Of course, setting up a Kubernetes cluster is an entirely different beast than helping Mom install a PDF reader. But in the back of my head there's still a voice saying: "I just need to follow some step-by-step instructions, click a few buttons in the AWS Console, and I'll be all set. It's easy. Right?"
What's so hard about DevOps?
Let's be honest, though: There's a reason some of the best and brightest are employed as full-time DevOps engineers. It's hairy stuff. First, there's the staggering menu of options and decisions to be made when spinning up a new web service:
- Which cloud provider will you choose? AWS, Google, Microsoft, ...
- How will you manage infrastructure within that cloud? Terraform, Chef, Puppet, Ansible, Kubernetes, CloudFormation, Serverless, some combination of these...
- What will you do for logging? ELK stack, a logging SaaS, a service specific to your cloud...
- ...error handling?
- ...background jobs?
- ...asset management?
- ...data storage?
Determining the setup you want is a significant task that will take a long time, and actually getting it all spun up takes even longer. I've heard from several coding boot camp instructors that deployment is one of the biggest hurdles for students, frustrating even those who excel in other areas of their programs.
Then, of course, once you do get everything set up, you and your team have to debug it when things go wrong. Complicating this is the fact that your configuration is probably close to unique at this point - how many others will have chosen exactly the same seven-course meal as you?
Finally, you'll have to pick up your toys when you're done playing with them. Eventually, you're going to have to destroy a service you created. Maybe you're migrating to a v2, or your bet on a new service didn't pan out. You'll want to sort through all of the relevant pieces of infrastructure and flip the switch. This can be difficult, and it's an easy step to skip. But cutting corners here can be costly. Just do a Twitter search for 'AWS bill forgot'.
"Oh my god. 13 years after I sign up for AWS it finally happened to me: my first terrifying bill. $1300. 😱
I'd been playing with Control Tower and set up a handful of accounts. Forgot about it for a month. Turns out it deploys a LOT of NAT gateways."
- Aidan W Steele (@__steele), January 18, 2020
You'll mostly find personal anecdotes like that, but you're dreaming if you think the risk is smaller in the workplace, where you're dealing with a larger, more complex architecture and have to consider things like developer turnover.
How platforms help
So, what's the alternative? Well, instead of setting up what I'll call a "bespoke architecture", you can use a platform like Heroku or Netlify that manages most of the DevOps toolchain for you. Of course, platforms don't totally eliminate the complexities of DevOps, and it doesn't have to be a stark dichotomy - you can use a platform alongside a service configured directly in a cloud provider like AWS. Still, I think it's generally valid to put platforms in one group and the more a la carte approach in another. And I think platforms deliver on their promise to make our lives massively easier. Let's take a look at how they address the pain points I mentioned above.
- Bespoke architectures have too many options. Platforms remove a lot of these questions, or at the very least give you a more obvious default. You can get started quickly and swap things out as the need arises.
- Bespoke architectures are difficult to debug. Because platforms steer everyone toward the same defaults, many teams end up with very similar setups, and I believe that consistency makes it a lot easier to debug an issue on a platform. Chances are my teammates already have some experience with it. And even if they don't, I'm likely to find someone online who's had the same issue.
- It's hard to confidently destroy a service in a bespoke architecture. Here's a strained extension of the toy metaphor: platforms are Mr. Potato Head, and bespoke architectures are LEGO. You can be more creative with LEGO, but it's a lot harder to keep track of everything. Or, to ditch the metaphor: Heroku has Apps and Add-ons, and Add-ons are reliably coupled to Apps. So, it's relatively easy to keep track of what you're paying for and to fully delete an App. In AWS, you can choose a tagging convention to mirror this approach, but you might stray from that convention or share resources across apps. There are just fewer promises.
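To make the tagging point a bit more concrete, here's a minimal sketch of what "mirroring Apps and Add-ons with tags" might look like. Everything in it is an assumption for illustration: the `app` tag key, the resource IDs, and the hard-coded inventory (in real life that list would come from your cloud provider's API, which I'm not calling here). The idea is just to group resources by the app they claim to belong to and surface the ones that strayed from the convention.

```python
from collections import defaultdict

# Hypothetical inventory for illustration only. In practice this list
# would be fetched from your cloud provider rather than hard-coded.
resources = [
    {"id": "db-1", "type": "database", "tags": {"app": "blog"}},
    {"id": "cache-1", "type": "cache", "tags": {"app": "blog"}},
    {"id": "nat-1", "type": "nat-gateway", "tags": {}},
    {"id": "bucket-1", "type": "bucket", "tags": {"app": "shop"}},
]


def group_by_app(resources, tag_key="app"):
    """Return (resource ids grouped by app tag, ids missing the tag)."""
    by_app = defaultdict(list)
    untagged = []
    for resource in resources:
        app = resource["tags"].get(tag_key)
        if app:
            by_app[app].append(resource["id"])
        else:
            # Nobody "owns" this resource, so nobody will remember to delete it.
            untagged.append(resource["id"])
    return dict(by_app), untagged


by_app, untagged = group_by_app(resources)
print("By app:", by_app)
print("Untagged:", untagged)
```

The untagged list is the interesting part: those are the resources that no app deletion would ever clean up. On a platform you get that grouping for free; here you only get it for as long as everyone sticks to the convention.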
Resisting the urge
Despite these benefits of platforms, I've found the urge to go with a bespoke architecture inevitably rises up from time to time. To be sure, there are sometimes good reasons to use or migrate to a bespoke architecture. But most of the time, I think the right call is to stick with a platform as long as you possibly can. Below are some common reasons I've seen cited for moving away from platforms, as well as my responses to them (which, sadly, were not always my responses at the time).
- It's for beginners. There's sometimes a sense that platforms are for beginners, and that real professionals create more bespoke architectures. I think this is wrong and largely a case of confusing correlation with causation. Yes, a lot of large, successful businesses have bespoke architectures and DevOps teams, but that's because they grew to a point where they needed them, not because they started with them. (see also, DHH on monoliths)
- But, if we're on Heroku I can't.... It's tempting to become fixated on a particular benefit you could get with a bespoke setup that isn't possible on a platform. Maybe you can cut your time to first paint in half, or try out a fancy new database. With a clear head, we might admit that the upside of this single tool would be dwarfed by all the aforementioned pitfalls of moving off a platform. But there's something that really bothers developers (myself included) about firm constraints. After all, one of the joys of coding is that pretty much anything is possible with enough time and dedication.
- We'll save so much money. Generally, it's much more expensive to run a service on a platform than on the ideal bespoke architecture. A service with large, unpredictable usage spikes can be much cheaper on a Function-as-a-Service offering like AWS Lambda. Large data processing jobs that aren't too time sensitive can be done at a significant discount with spot instances. But there are also significant costs to managing those bespoke solutions. Most obviously, there's the personnel cost of extra engineering time. There's also a cost in agility: it's harder to iterate quickly if you're trying to create the ideal architecture for every service you have. "Good enough" may actually be better, especially if you're focused on growth.
- If there's an outage, we're dead in the water. Outages are particularly frustrating if you are on a platform because you're left feeling helpless. There's also an argument that platforms are more prone to outages since they can be attractive targets for attackers. In my experience, though, platforms have been as reliable as (if not more reliable than) bespoke setups. This is at least partially because bespoke architectures have a corresponding problem: you can help yourself, and in fact you might have to. Resolution can take hours.
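The "we'll save so much money" argument is easier to evaluate once engineering time is counted as part of the bill. Here's a rough sketch of that arithmetic. All of the numbers are made up for illustration (they are not real pricing for any provider or platform), and `monthly_cost` is just a hypothetical helper; plug in your own figures.

```python
def monthly_cost(infra_cost, engineer_hours, hourly_rate):
    """Total monthly cost: the hosting bill plus the time spent maintaining it."""
    return infra_cost + engineer_hours * hourly_rate


# Made-up numbers, purely illustrative.
platform = monthly_cost(infra_cost=500, engineer_hours=2, hourly_rate=100)
bespoke = monthly_cost(infra_cost=150, engineer_hours=10, hourly_rate=100)

print("Platform:", platform)
print("Bespoke:", bespoke)
print("Bespoke is cheaper:", bespoke < platform)
```

With these particular inputs the bespoke setup has the smaller hosting bill but the larger total, which is the whole point: the comparison only favors bespoke once the hosting savings outgrow the extra hours.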
Perhaps I'm making an obvious point with all of this: small teams without obviously unique architecture needs should leverage DevOps platforms. They will be more productive and likely save money.
If you nonetheless find yourself being drawn by the siren call of a fancy bespoke architecture, I hope I've helped make the answer even more obvious: just use Heroku!