Advice to an early-career DevOps engineer

This week I was asked a really fundamental question and I totally flunked it. I didn't grasp its importance at the time, so here is my attempt to correct that mistake:

What advice would you give to an early-career DevOps engineer?

I had been engrossed in writing automation scripts that morning, so I naturally started thinking about the nuts and bolts of development and ended up pigeonholing the question into a very narrow scenario. It took me a while, but I came up with a better answer:

Think of your code in its context

If you come from a developer background as I do, you may agree that we often feel at home within the boundaries of our IDE, shell, and repos, and fixate on details such as: Is my code easy to read? Can I write this as a one-liner? Is my program thread-safe? And of course, are my tests passing? These are important considerations, but they are insufficient. The days of throwing our code over the wall are long gone, and shift-left security thinking is here to stay. Tactical (and even strategic) thinking about our code is necessary. It all sounds good, right? But what does "think of your code in its context" actually mean? Let me explain...

Defense in depth

Defense in depth is a widely studied concept in the information security community that has its roots in military strategy. In a nutshell, it proposes that a layered approach to security is more effective than any individual protection we can put in place. Leaving aside network and data security, this approach is highly relevant to developing cloud software.

As we smash monolithic applications, it is easy to think that our software is more secure, better provisioned, and more resilient than before. After all, there is no longer a single point of failure, and large, tightly coupled code bases are brittle. In reality, we need to build security in from the moment we write and push the code to the instant we deploy it, and beyond. An effective cloud security posture includes infrastructure protection, monitoring, remediation, and security invariants, all while allowing fluid operations and keeping an eye on regulations and compliance. Yikes!

What is your code's Context?

Regardless of the size of the team you are working on, your code is your responsibility during its whole lifetime. Yes, there are other roles and team members, and trust is of the essence, but an early security mindset truly goes a long way.

So we need to ask a whole new set of questions, even when we think of a straightforward Lambda function or microservice. What are the inputs? Is it an API? You'd better sanitize that payload. What npm libraries are you using? You probably want to run a dependency scanner before committing that. Code coverage is nice, but did you remember to set up MFA on your GitHub account? You get the idea: it is not only about writing clean and efficient code.
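
To make "sanitize that payload" a bit more concrete, here is a minimal sketch in Python. The field names (`name`, `quantity`) and the limits are hypothetical, chosen only for illustration; a real service would validate against its own schema, possibly with a dedicated validation library.

```python
import json

MAX_NAME_LENGTH = 64  # hypothetical limit, chosen for this example only


def parse_payload(raw_body: str) -> dict:
    """Validate a JSON request body and return only the fields we expect."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc

    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")

    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'name' must be a non-empty string")
    name = name.strip()
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("'name' is too long")

    quantity = data.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("'quantity' must be an integer")
    if not 1 <= quantity <= 100:
        raise ValueError("'quantity' must be between 1 and 100")

    # Return an allow-listed copy so unexpected keys never travel further.
    return {"name": name, "quantity": quantity}
```

The point of the allow-listed return value is that anything the caller sneaks into the body (an extra `admin` flag, say) is dropped at the edge instead of being trusted downstream.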

Our code has already broken free from the monolith, and we tend to believe that automation and managed platforms are strong guarantees, but the shared responsibility model only takes us so far.

I try to ask myself some key questions from the start. Where will that code run? Inside a Docker container? Then a distroless base image leaves fewer vulnerabilities to exploit, and we lose nothing by using one with a multi-stage build. Is our code part of a Lambda function? Apply the least privilege principle to its CloudFormation template¹.
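
As a rough illustration of what checking for least privilege can look like, the Python sketch below scans an IAM-style policy document (already loaded as a dictionary) for `Allow` statements that grant every action or every resource. The function name, the example policy, and the heuristic itself are assumptions made for this example; it is a crude check, not a replacement for a proper policy linter or your cloud provider's own analysis tools.

```python
def find_broad_grants(policy: dict) -> list:
    """Flag Allow statements whose Action or Resource is a blanket wildcard."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            values = statement.get(key, [])
            if isinstance(values, str):  # a single value may be a bare string
                values = [values]
            for value in values:
                # "*" grants everything; "service:*" grants every action of a service.
                too_broad = value == "*" or (key == "Action" and value.endswith(":*"))
                if too_broad:
                    findings.append(f"Statement {index}: {key} '{value}' is too broad")
    return findings


# Hypothetical policy: the first statement is scoped, the second is not.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/reports/*",
        },
        {"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"},
    ],
}

if __name__ == "__main__":
    for finding in find_broad_grants(EXAMPLE_POLICY):
        print(finding)
```

A check like this could run in a pre-commit hook or a pipeline step, so an overly generous statement is questioned before it ever reaches a deployed function.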

Here is another one: where will that code live and be maintained? What container registry are you using, and what guarantees that your image won't be tampered with? Lambda functions are extremely easy to deploy, which also means that a mistake can easily be introduced into production, but there is hope if your setup includes blue/green deployments, bake times, and alarms.
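
One building block behind tamper detection is comparing an artifact against a digest you pinned earlier, which is the same idea as referring to an image by its content digest rather than by a mutable tag. The Python sketch below shows that comparison for arbitrary bytes; the function name is hypothetical, and this is only an illustration of the principle, not a substitute for image signing or for your registry's own verification features.

```python
import hashlib
import hmac


def matches_pinned_digest(artifact: bytes, pinned_digest: str) -> bool:
    """Return True if the artifact's SHA-256 equals a digest like 'sha256:<hex>'."""
    algorithm, _, expected_hex = pinned_digest.partition(":")
    if algorithm != "sha256" or not expected_hex:
        raise ValueError("expected a digest of the form 'sha256:<hex>'")

    actual_hex = hashlib.sha256(artifact).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual_hex, expected_hex.lower())
```

If a single byte of the artifact changes, the computed digest no longer matches the pinned one and the deployment step can refuse to proceed.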

Going a step further, what is your code interacting with? A misconfigured Kubernetes service can expose our cluster, and an outdated package in a Python web function could have serious consequences. These are easy to fix, but not as easy to maintain. These are the questions that describe our program's context.

As we move towards easier models for deploying complex infrastructure (see Helm charts, serverless pipelines, GitOps, cloud-native integrations, and fully managed services), we tend to believe we can rely on our cloud provider or the latest scanning tool, but at the end of the day nothing can replace a more holistic view of that line of code.