Advanced Terraform for Software Engineers

By Johannes Rudolph, 22 August 2022

Terraform is without a doubt one of the most popular infrastructure as code tools in use today. When we started using it more and more seriously to build landing zones, I initially made the mistake of thinking of it as “a tool for reconciling infrastructure with a better YAML language”. This assumption held back my learning and left me unable to tap into its more advanced capabilities.

I have seen many other software engineers make that same mistake. Now, as software engineers on a DevOps team, terraform is not our home turf. There’s usually someone working on the infrastructure side who has a much better grasp of it, and we stay content learning as little terraform as we need to get our apps deployed. If your team is working on infrastructure slightly more complex than an S3 bucket and a few lambdas, that’s a terrible mistake. DevOps teams working with complex infrastructure can benefit greatly when ops and dev experts bring their relative strengths together: infrastructure experts when it comes to configuring resources, and developers when it comes to structuring large codebases with design techniques.

In this post I’m going to look at terraform from the perspective of a software engineer and what that perspective can bring to the table for the team. I hope this post will help software engineers understand terraform more quickly in terms of concepts they’re already familiar with – and similarly inspire infrastructure experts to learn more about software engineering principles for their terraform codebases.

What does advanced terraform mean anyway?

Terraform has a gentle learning curve in the beginning. Getting started with a simple configuration to deploy a basic application, e.g. with an EC2 instance and a lambda, takes just a couple of lines of code and straightforward commands like terraform apply.

As soon as your infrastructure needs and terraform code become more complex, solutions start to become less obvious. How do you split up your code into multiple modules? How can you implement complex logic in HCL, terraform’s configuration language? How do you manage refactoring configurations over time? All of these are well-known and well-solved problems for software engineers working in modern application languages.

Understanding HCL by analogy

Viewing terraform’s configuration language HCL through the lens of “a better YAML” is selling its capabilities short. I find it more helpful to view it as a domain-specific language with lazy evaluation for dependency tracking and python-style comprehensions and statefiles as its database. Let’s dismantle this in order.

HCL as a DSL with lazy evaluation

Domain-specific language means that the language has been built to express logic for a particular domain. In terraform’s case, that domain is expressing a desired state of inter-dependent cloud resources. The terraform CLI, as the execution engine for that DSL, then wires together primitive cloud API calls (implemented by pluggable providers) to refresh resources, generate execution plans and finally execute them.

The lazy evaluation in HCL is an important distinction from the eager evaluation in imperative programming languages that most software engineers are familiar with. In Java, for example, your code is evaluated line by line from top to bottom, and every statement or expression is evaluated right when it is encountered. Lazy evaluation, on the other hand, is more common in functional languages like Haskell and means that an expression is only evaluated when it’s needed. Lazy evaluation is a great fit for HCL because it allows the language to recognize and resolve dependencies between resources naturally, instead of forcing the programmer to bring expressions into a well-defined execution order.
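Here is a minimal sketch of what that means in practice. The locals below are deliberately declared in the “wrong” order, and terraform still resolves them by following the references between them. All names are made up for illustration.

```hcl
locals {
  # Refers to local.name, which is only declared further down.
  greeting = "Hello, ${local.name}!"

  # Refers to local.raw_name, which is declared last.
  name = upper(local.raw_name)

  raw_name = "terraform"
}

output "greeting" {
  value = local.greeting
}
```

Running terraform apply on this configuration yields the output "Hello, TERRAFORM!". Reordering the three locals does not change the result, because the order of evaluation comes from the references and not from the position in the file.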

Python-style comprehensions in HCL

As your terraform configurations become more complex, you’ll increasingly need to transform data between input variables and dependencies between different resources. Since HCL is a language designed for declarative infrastructure as code, your first instinct may be to look for declarative list methods like map and filter, which are common in many mainstream languages like JavaScript or Java/Kotlin.

Not with HCL though. Oddly enough, HashiCorp chose a design inspired by Python-style comprehensions that feels more imperative, contrasting with the otherwise declarative paradigm of the language. As I wasn’t familiar with Python-style list and dictionary comprehensions before, I created (and still use) this JavaScript → HCL cheat sheet to quickly translate code that I’m trying to write to HCL.

💡 You can use terraform console and node REPLs to run and play with these examples

Here’s the javascript version

const list = [{id: 1, value: "a"}, {id: 2, value: "b"}, {id: 1, value: "c"}]

// map to extract a single value
> list.map(x => x.id)
[ 1, 2, 1 ]

// filter and map to transform objects
> list.filter(x => x.value !== "c").map(x => ({ id: x.id, upper: x.value.toUpperCase()}))
[ { id: 1, upper: 'A' }, { id: 2, upper: 'B' } ]

// build an object/map from a list (for duplicate keys, the last entry wins)
> Object.fromEntries(list.map(x => [x.id, x.value]))
{ '1': 'c', '2': 'b' }

// group by - usually from a library like lodash
> _.mapValues(_.groupBy(list, x => x.id), xs => xs.map(x => x.value))
{ '1': [ 'a', 'c' ], '2': [ 'b' ] }

And the equivalent in hcl/terraform

locals {
  list = [{ id = 1, value = "a" }, { id = 2, value = "b" }, { id = 1, value = "c" }]
}

# map to extract a single value
> [for x in local.list: x.id]
[ 1, 2, 1,]

# filter and map to transform objects
> [for x in local.list: {id = x.id, upper = upper(x.value)} if x.value != "c"]
[
  {
    "id" = 1
    "upper" = "A"
  },
  {
    "id" = 2
    "upper" = "B"
  }
]

# build an object/map from a list, two variants
# - using merge (for duplicate keys, the last entry wins)
> merge([for x in local.list: { (x.id) = x.value }]...)
{
  "1" = "c"
  "2" = "b"
}

# - using a map comprehension with the => syntax
#   (fails for this list because id 1 appears twice)
> { for x in local.list : x.id => x.value }
│ Error: Duplicate object key

# group by (the ... after the value collects duplicate keys into lists)
> { for x in local.list : x.id => x.value... }
{
  "1" = [ "a", "c" ]
  "2" = [ "b" ]
}

State files as a Database

Terraform records the actual state of cloud resources in a statefile. You can store statefiles on different backends (e.g. local files, GCS buckets etc.), but their structure is always the same. Resources have an address (think “primary key”) and each resource has an id (think “foreign key” to the cloud’s API). Resource providers decide what value they use as an id for each resource type, but in most cases it’s an id that uniquely identifies the resource in the cloud’s API. This “foreign key” also enables terraform to import existing resources from the cloud.
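To make the “primary key” and “foreign key” analogy concrete, here is a hedged sketch of an import. It assumes Terraform v1.5 or later (which supports import blocks) and the AWS provider, where the import id of an S3 bucket is its name. The bucket name and the resource address are hypothetical.

```hcl
# "to" is the resource address in the statefile (the "primary key").
# "id" is the identifier of the existing object in the cloud API (the "foreign key").
import {
  to = aws_s3_bucket.assets
  id = "example-existing-bucket" # hypothetical bucket name
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-existing-bucket"
}
```

On older versions, the terraform import command takes the same two pieces of information, address first and id second.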

Translating Software Engineering Practices to Terraform

Refactoring and Database Migrations

As your terraform configurations grow in complexity, you’ll occasionally have the urge to just rename a few variables and resources. I mean, how hard could it be? Well, exactly as hard as you’re used to, given the following analogies:

  • refactoring locals – no problem, this code is internal to your terraform module. While IDE support for refactoring may not be as great as you’re used to, find/replace will mostly just work.
  • refactoring resources – resources are stateful and you need to exercise the same level of caution as when you’re refactoring a domain model that’s persisted in a database (see above “state files as a database”). That means you will need the equivalent of a database migration. Luckily, the recently added moved block can express simple rename migrations. For more complex refactorings, you may need a custom migration script that leverages terraform state mv and related commands.
  • refactoring variables and outputs – this is like changing the API of your code, and the same caveats with regard to versioning and upgrading all consumers of the API apply.
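A simple rename migration with a moved block could look like the sketch below. It assumes Terraform v1.1 or later; the resource names are hypothetical.

```hcl
# The resource used to be declared as aws_s3_bucket.bucket.
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts"
}

# Tells terraform that the existing state entry should be treated as the
# renamed resource, instead of planning a destroy and a re-create.
moved {
  from = aws_s3_bucket.bucket
  to   = aws_s3_bucket.artifacts
}
```

Much like a database migration script, the moved block can stay in the codebase until every environment has applied it.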

Abstractions and Functions

Where most application programming languages offer a range of lower- and higher-level abstraction mechanisms like functions, classes and generics, HCL only has modules. The lack of user-defined functions in particular means that writing reusable logic as modules involves quite a bit of boilerplate: function arguments are passed in as variables, the logic lives in locals blocks, results are returned as outputs, and every invocation needs a cumbersome module block. In the meantime, HCL offers a growing range of built-in functions, and HashiCorp may address this need in the future.
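To illustrate the boilerplate, here is a sketch of a “function” that builds a resource name, written as a module. The module path, variable names and naming scheme are assumptions made up for this example. The two files are shown in one listing.

```hcl
# --- modules/naming/main.tf (hypothetical module acting as a function) ---

# "Function arguments"
variable "project" {
  type = string
}

variable "environment" {
  type = string
}

# "Function body"
locals {
  name = lower("${var.project}-${var.environment}")
}

# "Return value"
output "name" {
  value = local.name
}

# --- main.tf (the "call site") ---

module "bucket_name" {
  source      = "./modules/naming"
  project     = "Shop"
  environment = "Dev"
}

output "bucket_name" {
  value = module.bucket_name.name # "shop-dev"
}
```

That is five blocks for the definition and one module block per call, for what would be a one-line function in most application languages.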

I/O and data serialization (JSON, YAML)

When I still looked at HCL as a “better YAML”, I naively assumed that reading and writing files has no place in a language for declarative configuration. After all, reading and writing files is very… imperative. Turns out my intuition was completely wrong: terraform solves I/O in a very elegant and declarative way.

Terraform configurations can read files using the file function and even files matching glob patterns using the fileset function. Combining this with various decoding functions like jsondecode or yamldecode allows deserializing file contents to native HCL objects very easily.
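As a sketch, the following locals load every YAML file in a directory into a single map keyed by file name. The teams/ directory and its contents are hypothetical.

```hcl
locals {
  # Set of matching paths, relative to the module directory,
  # e.g. "teams/platform.yaml"
  team_files = fileset(path.module, "teams/*.yaml")

  # Map from file name without extension to the decoded YAML document,
  # e.g. { platform = { ... }, payments = { ... } }
  teams = {
    for f in local.team_files :
    trimsuffix(basename(f), ".yaml") => yamldecode(file("${path.module}/${f}"))
  }
}
```

A map like local.teams is then a natural input for for_each on a resource or module.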

The local_file resource enables writing files. One pitfall with the local_file resource is that it stores the file path in state, so you need to avoid using absolute paths if you plan on executing the terraform configuration from different machines.
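A minimal sketch of writing a file with a path that stays relative to the module, assuming the hashicorp/local provider; the file name and content are made up.

```hcl
resource "local_file" "inventory" {
  # path.module keeps the path relative to the module, so the value stored
  # in state does not depend on where the repository is checked out.
  filename = "${path.module}/generated/inventory.json"

  content = jsonencode({
    hosts = ["app-1", "app-2"]
  })
}
```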

SOLID Modules and Dependency Injection

As your terraform configurations grow, your team will face the challenge of breaking up the code into multiple smaller modules to tame complexity. Inspired by SOLID principles for object-oriented programming, here are some software engineering heuristics I have found useful to scope my modules:

  • single source of truth: one component of your infrastructure should be defined in a single place only, not spread across twenty different modules
  • single responsibility principle: a module should have a "single" reason to change. E.g. if you want to change your network layout, that change should be isolated to the network module + the composition root (because of changed dependencies) but otherwise leave the other modules untouched. This is sometimes also described as an "axis of change", i.e. all changes along that axis should be in one module
  • dependency inversion: your modules should explicitly declare their dependencies (e.g. an RDS instance needs a VPC as an input variable)
  • composition roots: treat your main module as a composition root that injects dependencies to child modules
  • open/closed principle: design your module structure so that it’s open for extension, but closed for modification. E.g. if you’re deploying microservices, adding a new service should be another module (or module instance if the services are all the same) instead of modifying the internals of an existing module
  • low coupling/high cohesion: resources in a module should have high cohesion, e.g. because they have strong dependencies on one another, share a privilege boundary and often need to change together (volatility)
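The dependency inversion and composition root heuristics could be sketched like this. The module paths, variable names and outputs are hypothetical, and the two files are shown in one listing.

```hcl
# --- modules/database/variables.tf ---
# The database module declares what it depends on instead of looking it up.

variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# --- main.tf (composition root) ---
# The root module creates the dependencies and injects them into children.

module "network" {
  source     = "./modules/network"
  cidr_block = "10.0.0.0/16"
}

module "database" {
  source     = "./modules/database"
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}
```

With this layout, a change to the network layout touches the network module and possibly the wiring in main.tf, while the database module stays untouched.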

Fear the null

Unfortunately, terraform has null, and variables are even nullable by default. The only solace I can offer is that with terraform v1.1 and later you can explicitly declare variables to be non-nullable by setting nullable = false. You should strongly consider adopting this as a default in your terraform coding standard.
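A short sketch of non-nullable variables, assuming terraform v1.1 or later; the variable names are made up.

```hcl
variable "environment" {
  type     = string
  nullable = false # passing null for this variable is rejected
}

variable "tags" {
  type     = map(string)
  default  = {}
  nullable = false # a caller passing null gets the default {} instead
}
```

Inside the module, both variables can then be used without null checks.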

Try/catch, kind of

Since terraform allows null, it also has runtime errors. To deal with those, you have various means from the simple coalesce and coalescelist functions that are good for input handling to the more advanced try and can functions that are the closest to a try/catch.

Of course, nulls are not the only source of runtime exceptions. You can also get them from reading files with file or attempting YAML/JSON decoding.
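Here is a hedged sketch of both functions at work: try supplies fallbacks for decoding and attribute access, and can turns a failing expression into a boolean for input validation. The variable names and default values are made up.

```hcl
variable "raw_settings" {
  type    = string
  default = "{\"replicas\": 3}"
}

variable "cidr" {
  type    = string
  default = "10.0.0.0/16"

  validation {
    # can() returns false instead of failing when cidrhost raises an error.
    condition     = can(cidrhost(var.cidr, 0))
    error_message = "The cidr value must be a valid CIDR block such as 10.0.0.0/16."
  }
}

locals {
  # Falls back to an empty object if the string is not valid JSON.
  settings = try(jsondecode(var.raw_settings), {})

  # Falls back to 1 if the decoded object has no "replicas" attribute.
  replicas = try(local.settings.replicas, 1)
}
```

As with a broad catch clause, try can hide real mistakes, so it is worth keeping the wrapped expression as small as possible.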

Unit Testing

While there are many ways to test terraform, unit testing has very recently become a lot more attainable with the experimental test_assertions resource. Unit testing your code this way is a good first step on the testing pyramid, as it requires a low up-front engineering investment compared to integration and end-to-end acceptance tests.

Terragrunt as a Terraform Build Tool

As your infrastructure grows, your team will likely encounter the need to break apart your infrastructure into multiple terraform configurations to contain blast radius or break dependencies between teams. This is similar to breaking application code into multiple libraries. In the end, however, you still need to build cohesive applications from these libraries. You also want to be able to build applications for debug and release targets, similar to how you want to deploy your infrastructure for development and production environments.

Most modern application programming languages come with their own mature build and package management tools to serve that need (e.g. Maven for Java). Terraform includes native package management via its registry, but lacks a higher-order build tool to manage “builds” consisting of multiple terraform configurations or to build across variations of different environments.

While terraform includes some useful features like workspaces, tfvars files and the terraform_remote_state data source, most teams have to roll their own primitive build scripts using a combination of bash, make or other third-party build tools around those features. Instead of reinventing the wheel, we discovered that terragrunt provides us with a solid foundation to orchestrate terraform builds with proper dependency tracking and quality-of-life features that help centralize backend/provider configurations, all while using HCL to define the builds.
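To give an impression of what such a build definition looks like, here is a minimal sketch of a terragrunt.hcl for one configuration in a dev environment. The directory layout, the root.hcl file name, the module path and the vpc_id output are assumptions for illustration.

```hcl
# live/dev/app/terragrunt.hcl (hypothetical location)

# Pull in shared backend/provider settings from a file further up the tree.
include "root" {
  path = find_in_parent_folders("root.hcl")
}

# The terraform module this unit deploys.
terraform {
  source = "../../../modules/app"
}

# Declare a dependency on another configuration; terragrunt applies it
# first and exposes its outputs.
dependency "network" {
  config_path = "../network"
}

inputs = {
  environment = "dev"
  vpc_id      = dependency.network.outputs.vpc_id
}
```

A production environment would then be a sibling directory with the same structure and different inputs.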

Like every build tool, terragrunt has a learning curve of its own. However, it enjoys healthy popularity in the terraform community and is a much more consistent and better alternative to home-grown, quirky and badly documented build scripts.

One of the downsides of using terragrunt is that it can create friction integrating additional build steps like formatting, linting (e.g. tflint) or security scanning (e.g. tfsec).

Terraform has a growing and vibrant ecosystem

Last but not least, I want to emphasize that terraform is an evolving ecosystem. As with any ecosystem, it’s worth following its evolution closely and adopting emerging features and tools when they address needs faced by your team.

If you have any further insights or tips and useful analogies for software engineers learning terraform, let me know in the comments.