Highway seen from above

Advanced Terraform for Software Engineers

Terraform is without a doubt one of the most popular infrastructure as code tools in use today. When we started using it more and more seriously to build landing zones, I initially made the mistake of approaching it as “a tool for reconciling infrastructure with a better YAML language”. That assumption held back my learning and left me unable to tap into its more advanced capabilities.

I have seen many other software engineers make that same mistake. For software engineers on a DevOps team, terraform is not home turf. There’s usually someone on the infrastructure side with a much better grasp of it, and we stay content learning as little terraform as we need to get our apps deployed. If your team is working on infrastructure even slightly more complex than an S3 bucket and a few lambdas, that’s a terrible mistake. DevOps teams working with complex infrastructure benefit greatly when ops and dev experts bring their respective strengths together - infrastructure experts when it comes to configuring resources, and developers when it comes to structuring large codebases with design techniques.

In this post I’m going to look at terraform from the perspective of a software engineer and what that perspective can bring to the table for the team. I hope this post will help software engineers understand terraform more quickly in terms of concepts they’re already familiar with - and similarly inspire infrastructure experts to learn more about software engineering principles for their terraform codebases.

What does advanced terraform mean anyway?

Terraform has a gentle learning curve in the beginning. Getting started with a simple configuration to deploy a basic application, e.g. an EC2 instance and a lambda, takes just a couple of lines of code and straightforward commands like terraform apply.

As soon as your infrastructure needs and terraform code become more complex, solutions start to become less obvious. How do you split up your code into multiple modules? How can you implement complex logic in HCL, terraform’s configuration language? How do you manage refactoring configurations over time? All of these are well-known and well-solved problems for software engineers working in modern application languages.

Understanding HCL by analogy

Viewing terraform’s configuration language HCL through the lens of “a better YAML” is selling its capabilities short. I find it more helpful to view it as a domain-specific language with lazy evaluation for dependency tracking, python-style comprehensions, and state files as its database. Let’s unpack this in order.

HCL as a DSL with lazy evaluation

Domain-specific language means that the language has been built to express logic for a particular domain. In terraform’s case, that domain is expressing a desired state of inter-dependent cloud resources. The terraform CLI, as the execution engine for that DSL, then wires together primitive cloud API calls (implemented by pluggable providers) to refresh resources, generate execution plans and finally execute them.

Lazy evaluation in HCL is an important departure from the eager evaluation of the imperative programming languages most software engineers are familiar with. In Java, for example, your code is evaluated line by line from top to bottom and every statement or expression is evaluated right when it is encountered. Lazy evaluation, on the other hand, is more common in functional languages like Haskell and means that an expression is only evaluated when it’s needed. Lazy evaluation is a great fit for HCL because it allows the language to recognize and resolve dependencies between resources naturally, instead of forcing the programmer to bring expressions into a well-defined execution order.
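
To make this concrete, here is a minimal sketch (the variable, resource and local names are made up for illustration): the local value is declared after the resource that references it, and terraform still resolves the ordering through its dependency graph.

variable "project" {
  type = string
}

resource "aws_s3_bucket" "assets" {
  # only evaluated once local.bucket_name is known
  bucket = local.bucket_name
}

locals {
  # declared "below" its usage - declaration order does not matter in HCL
  bucket_name = "${var.project}-assets"
}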

Python-style comprehensions in HCL

As your terraform configurations become more complex, you’ll increasingly need to transform data between input variables and dependencies between different resources. Since HCL is a language designed for declarative infrastructure as code, your first instinct may be to look for declarative list methods like map and filter, as found in many mainstream languages like JavaScript or Java/Kotlin.

Not with HCL though. Oddly enough, HashiCorp chose a design inspired by python-style comprehensions that feels more imperative, contrasting with the otherwise declarative paradigm of the language. As I wasn’t familiar with Python-style list and dictionary comprehensions before, I created (and still use) this javascript → HCL cheat sheet to quickly translate code that I’m trying to write to HCL.

💡 You can use terraform console and node REPLs to run and play with these examples

Here's the javascript version

const list = [{id: 1, value: "a"}, {id: 2, value: "b"}, {id: 1, value: "c"}]

// map to extract a single value
> list.map(x => x.id)
[ 1, 2, 1 ]

// filter and map to transform objects
> list.filter(x => x.value !== "c").map(x => ({ id: x.id, upper: x.value.toUpperCase()}))
[ { id: 1, upper: 'A' }, { id: 2, upper: 'B' } ]

// build an object/map from a list (the later entry wins for the duplicate id 1)
> Object.fromEntries(list.map(x => [x.id, x.value]))
{ '1': 'c', '2': 'b' }

// group by - usually from a library like lodash
> _.mapValues(_.groupBy(list, x => x.id), xs => xs.map(x => x.value))
{ '1': [ 'a', 'c' ], '2': [ 'b' ] }

And the equivalent in hcl/terraform

locals {
  list = [{ id = 1, value = "a" }, { id = 2, value = "b" }, { id = 1, value = "c" }]
}

# map to extract a single value
> [for x in local.list: x.id]
[ 1, 2, 1,]

# map to transform objects
> [for x in local.list: {id = x.id, upper = upper(x.value)} if x.value != "c"]
[
  {
    "id" = 1
    "upper" = "A"
  },
  {
    "id" = 2
    "upper" = "B"
  }
]

# build an object/map from a list, two variants
# - using merge (the later entry wins for the duplicate key 1)
> merge([for x in local.list: { (x.id) = x.value }]...)
{
  "1" = "c"
  "2" = "b"
}

# - using a map comprehension with the => syntax
> { for x in local.list : x.id => x.value }
│ Error: Duplicate object key

# group by
> { for x in local.list : x.id => x.value... }
{
  "1" = [ "a", "c" ]
  "2" = [ "b" ]
}

State files as a Database

Terraform records the actual state of cloud resources in a state file. You can store state files on different backends (e.g. local files, GCS buckets etc.), but their structure is always the same. Resources have an address (think “primary key”) and each resource has an id (think “foreign key” into the cloud’s API). Resource providers decide what value they use as an id for each resource type, but in most cases it’s an id that uniquely identifies the resource in the cloud’s API. This “foreign key” also enables terraform to import existing resources from the cloud.
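
For illustration, importing an existing resource is exactly the act of binding a state address to a cloud-side id (both names below are made up):

# aws_s3_bucket.assets is the address in state ("primary key"),
# my-existing-bucket is the id the provider uses in the cloud API ("foreign key")
terraform import aws_s3_bucket.assets my-existing-bucket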

Translating Software Engineering Practices to Terraform

Refactoring and Database Migrations

As your terraform configurations grow in complexity, you’ll occasionally have the urge to just rename a few variables and resources. I mean, how hard could it be? Well, exactly as hard as you’re used to, with the following analogies:

  • refactoring locals - no problem, this code is internal to your terraform module. While IDE support for refactoring may not be as great as you’re used to, find/replace will mostly just work
  • refactoring resources - resources are stateful and you need to exercise the same level of caution as when you’re refactoring a domain model that’s persisted in a database (see “state files as a database” above). That means you will need the equivalent of a database migration. Luckily, the recently added moved block can express simple rename migrations (see the sketch after this list). For more complex refactorings, you may need a custom migration script that leverages terraform state mv and related commands
  • refactoring variables and outputs - this is like changing the API of your code, and the same caveats regarding versioning and upgrading all consumers of the API apply
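
Here is a minimal sketch of such a rename migration (the resource names are made up). The moved block records the migration from the old address to the new one, so terraform updates the state instead of destroying and recreating the resource:

moved {
  from = aws_s3_bucket.assets
  to   = aws_s3_bucket.static_assets
}

For refactorings the moved block cannot express, the imperative equivalent terraform state mv aws_s3_bucket.assets aws_s3_bucket.static_assets can be run from a migration script.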

Abstractions and Functions

Where most application programming languages offer a range of lower- and higher-level abstraction mechanisms like functions, classes and generics, HCL only has modules. The lack of functions in particular means that writing reusable logic as a module involves quite a bit of boilerplate: passing in function arguments as variables, expressing logic in locals blocks, returning results as outputs, and writing cumbersome module blocks for every invocation. In the meantime, HCL offers a growing range of built-in functions; HashiCorp may address the need for user-defined functions in the future.
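
To illustrate that boilerplate, here is a sketch of a trivial “function” expressed as a module (paths and names are made up):

# modules/bucket-name/main.tf - the "function" definition
variable "project" {       # function arguments become variables
  type = string
}
variable "suffix" {
  type = string
}

locals {
  name = lower("${var.project}-${var.suffix}")   # the actual logic
}

output "name" {            # the return value becomes an output
  value = local.name
}

# call site - every "invocation" needs its own module block
module "assets_bucket_name" {
  source  = "./modules/bucket-name"
  project = var.project
  suffix  = "assets"
}
# the result is then available as module.assets_bucket_name.name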

I/O and data serialization (JSON, YAML)

When I still looked at HCL as a “better YAML”, I naively assumed that reading and writing files had no place in a language for declarative configuration. After all, reading and writing files is very… imperative. Turns out my intuition was completely wrong: terraform solves I/O in a very elegant and declarative way.

Terraform configurations can read files using the file function and even files matching glob patterns using the fileset function. Combining this with various decoding functions like jsondecode or yamldecode allows deserializing file contents to native HCL objects very easily.
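
A small sketch of what this looks like in practice (the file paths are made up):

locals {
  # read and decode a single JSON file into a native HCL object
  app_config = jsondecode(file("${path.module}/config/app.json"))

  # read all YAML files matching a glob pattern into a map keyed by file name
  services = {
    for f in fileset("${path.module}/services", "*.yaml") :
    f => yamldecode(file("${path.module}/services/${f}"))
  }
}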

The local_file resource enables writing files. One pitfall with the local_file resource is that it stores the file path in state, so you need to avoid absolute paths if you plan on executing the terraform configuration from different machines.
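
A sketch of writing a file with local_file, keeping the path relative to the module so the state stays portable (the file name and content are made up):

locals {
  inventory = { web = ["10.0.0.1", "10.0.0.2"] }
}

resource "local_file" "inventory" {
  # relative path: avoids machine-specific absolute paths ending up in state
  filename = "${path.module}/generated/inventory.yaml"
  content  = yamlencode(local.inventory)
}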

SOLID Modules and Dependency Injection

As your terraform configurations grow, your team will face the challenge of breaking up the code into multiple smaller modules to tame complexity. Inspired by the SOLID principles of object-oriented programming, here are some software engineering heuristics I have found useful to scope my modules:

  • single source of truth: one component of your infrastructure should be defined in a single place only, not spread across twenty different modules
  • single responsibility principle: a module should have a "single" reason to change. E.g. if you want to change your network layout, that change should be isolated to the network module + the composition root (because of changed dependencies) but otherwise leave the other modules untouched. This is sometimes also described as an "axis of change", i.e. all changes along that axis should be in one module
  • dependency inversion: your modules should explicitly declare their dependencies (e.g. an RDS instance needs a VPC as an input variable)
  • composition roots: treat your main module as a composition root that injects dependencies into child modules (see the sketch after this list)
  • open/closed principle: design your module structure so that it's open for extension, but closed for modification. E.g. if you're deploying microservices, adding a new service should be another module (or module instance if the services are all the same) instead of modifying the internals of an existing module
  • low coupling/high cohesion: resources in a module should have high cohesion, e.g. because they have strong dependencies on one another, share a privilege boundary and often need to change together (volatility)
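
As a sketch of the dependency inversion and composition root heuristics above (module names, variables and outputs are made up), the root module wires the network module’s outputs into the database module’s declared input variables:

# main.tf as the composition root
module "network" {
  source     = "./modules/network"
  cidr_block = var.cidr_block
}

module "database" {
  source = "./modules/database"

  # the database module declares these as input variables instead of
  # reaching into the network module's internals
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}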

Fear the null

Unfortunately, terraform has null and variables are even nullable by default. The only solace I can offer is that with terraform v1.1 and later you can explicitly declare variables to be non-nullable. You should strongly consider adopting this as a default in your terraform coding standard.
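
Since terraform v1.1, declaring that looks like this (the variable name is just an example):

variable "environment" {
  type     = string
  nullable = false   # passing null explicitly is now a validation error
}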

Try/catch, kind of

Since terraform allows null, it also has runtime errors. To deal with those, you have various means, from the simple coalesce and coalescelist functions that are good for input handling to the more advanced try and can functions, which are the closest thing HCL has to a try/catch.

Of course, nulls are not the only source of runtime errors. You can also get them from reading files with file or from attempting YAML/JSON decoding.
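
A few sketches of these functions in combination (the file name and default values are made up):

variable "region" {
  type    = string
  default = null
}

locals {
  # coalesce: fall back to a default when the input is null (or empty)
  region = coalesce(var.region, "eu-central-1")

  # try: return the first expression that evaluates without an error,
  # e.g. when the optional settings file is missing or not valid YAML
  settings = try(yamldecode(file("${path.module}/settings.yaml")), {})

  # can: turn an error into a boolean, useful in conditions and validations
  has_owner = can(local.settings.owner)
}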

Unit Testing

While there are many ways to test terraform, unit testing has recently become a lot more attainable with the experimental test_assertions resource. Unit testing your code this way is a good first step on the testing pyramid, as it requires little up-front engineering investment compared to integration and end-to-end acceptance tests.
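
Based on the experimental module testing docs, a unit test is itself a small terraform configuration (placed e.g. under tests/defaults) that instantiates the module under test and declares assertions; the output name and expected value below are made up:

terraform {
  required_providers {
    test = {
      source = "terraform.io/builtin/test"
    }
  }
}

# instantiate the module under test from the repository root
module "main" {
  source = "../.."
}

resource "test_assertions" "bucket_name" {
  component = "bucket_name"

  equal "name_format" {
    description = "bucket name is derived from the project name"
    got         = module.main.bucket_name
    want        = "demo-assets"
  }
}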

Terragrunt as a Terraform Build Tool

As your infrastructure grows, your team will likely encounter the need to break apart your infrastructure into multiple terraform configurations to contain blast radius or break dependencies between teams. This is similar to breaking application code into multiple libraries. In the end, however, you still need to build cohesive applications from these libraries. You also want to be able to build applications for debug and release targets - similar to how you want to deploy your infrastructure to development and production environments.

Most modern application programming languages come with mature build and package management tools to serve that need (e.g. Maven for Java). Terraform includes native package management via its registry, but lacks a higher-order build tool to manage “builds” consisting of multiple terraform configurations or to build variations for different environments.

While terraform includes some useful features like workspaces, tfvars files and remote_state, most teams end up rolling their own primitive build scripts using a combination of bash, make or other third-party tooling around those features. Instead of reinventing the wheel, we discovered that terragrunt provides a solid foundation to orchestrate terraform builds with proper dependency tracking and quality-of-life features that help centralize backend/provider configurations - all while using HCL to define the builds.
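
For a flavor of what that looks like, here is a sketch of a terragrunt.hcl for one unit of such a build (the module source and names are made up):

include {
  path = find_in_parent_folders()   # pulls in shared backend/provider configuration
}

terraform {
  source = "git::https://example.com/modules.git//database?ref=v1.2.0"
}

dependency "network" {
  config_path = "../network"        # terragrunt runs network before database
}

inputs = {
  vpc_id = dependency.network.outputs.vpc_id
}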

Like every build tool, terragrunt has a learning curve of its own. However, it enjoys healthy popularity in the terraform community and is a far more consistent alternative to home-grown, quirky and badly documented build scripts.

One of the downsides of using terragrunt is that it can create friction integrating additional build steps like formatting, linting (e.g. tflint) or security scanning (e.g. tfsec).

Terraform has a growing and vibrant ecosystem

Last but not least, I want to emphasize that terraform is an evolving ecosystem. As with any ecosystem, it’s worth following its evolution closely and adopting emerging features and tools when they address needs faced by your team.

If you have any further insights or tips and useful analogies for software engineers learning terraform, let me know in the comments.


Kid plays in sandbox

Empower Developers with Cloud Sandbox Environments

One of the most powerful opportunities that a well-run Cloud Foundation offers is giving developers a new cloud account at the snap of a finger. When I work with customers and explain the concept of Cloud Sandbox Environments, the most common reaction I get is “Do we even need this?” Well, a cloud sandbox for developers is one of those things that you only truly know you need once you’ve experienced it.

In this post I’m going to show you an actual (non-fictional) example of how having not just one, but two developer cloud sandboxes for different cloud platforms at the snap of a finger made my life a lot easier. Essentially, I just want to serve 5 terabytes (okay, megabytes) worth of static files...

Setting up a Cloud Sandbox

I was quickly slapping together a small angular application to help one of our customers write a service catalog for integration into the service marketplace: a live-preview editor that renders a catalog and the parameter forms side-by-side. You can find this and a link to the source code on our product feedback board.

Setting up a cloud sandbox

So I had this up and running quickly, and now I just wanted to throw this tool up on a cloud storage bucket, send them the link and be done with it. I’d recently done a bit of work with Google Cloud Storage (GCS) and had the command for gsutil cp ... right in my head, so I quickly created a new GCP project in meshStack, and no more than 60 seconds later I was copying that angular tool over to a new cloud storage bucket.

Unipipe Panel

But ... duh! Accessing the file on the public URL at https://storage.googleapis.com/unipipe-catalog-preview/index.html did not work - it seems the app tries to retrieve all javascript bundles relative to the domain, like https://storage.googleapis.com/main.js, and obviously that gives me an HTTP 404.

Okay, can I fix this quickly? I’m in a hurry and have other stuff to do. So like everyone who hasn’t lived under a rock for the past ten years, I go to stackoverflow and find an answer that essentially says RTFM. Nope, not gonna do that, clock’s ticking!

While I really like Google Cloud, the root of the problem here is that GCS wants to host my bucket at a relative path instead of hosting it on its own domain like https://unipipe-catalog-preview.storage.googleapis.com. Now, I could figure out whether I can fix that, but I also know that AWS S3 gives buckets their own domain. So, let’s ditch GCP and go to AWS instead. Yay, multi-cloud FTW! (Sorry GCP, but I promise we’re still good friends).

How to add a tenant

A few seconds later I have my AWS account ready to sign in. Magic! Since I already had a meshProject containing the GCP project, meshStack derived a desired state for the new AWS account with an AWS IAM role for my user, AWS SSO and AWS CLI integration. I literally could not have created that AWS account any faster on the cloud console.

Admin Access Panel

Alright, create a bucket with a public access policy, aws s3 cp unipipe-preview s3://unipipe-catalog-preview --recursive on the cli aaaand done!

Here it is: https://unipipe-catalog-preview.s3.eu-central-1.amazonaws.com/index.html

Cloud Sandboxes Empower Engineers to Build

As I hope I could convey in this post, the ability to move fast and get things done is important to empower developers to do their jobs. Having one cloud at your fingertips is cool; having any cloud available at the snap of a finger (or: API call) is pure magic.

Unfortunately, that kind of power is not available to most developers today. Only a few organizations have a “time to cloud” measured in minutes - most often, it takes weeks, or days at best. In IT we focus on making the hard things easy, but tend to forget that doing so sometimes makes the easy things unnecessarily hard. If you haven’t watched the “I just want to serve 5 terabytes” video I mentioned above - you’re in for a good laugh.

Cloud Foundations Provide Cloud Sandboxes

Mind you, most of the ceremony involved in setting up a new application on the cloud has its purpose and place. It’s not designed out of evil spirit. But by building a cloud foundation right, we can establish a clear Shared Responsibility Model, trust our engineers to make the right calls about the Cloud Zones they want to put their applications into, and then empower them with access to the infrastructure they need. Having the right controls in place, like automated landing zone enforcement and cost management, enables a lot of agility and control at the same time.

If empowering engineers to build a better future sounds like something you want to work on - and you’d fancy a workplace that gives you the freedom to get stuff done - we’re hiring!


Life as a meshi: What is a Customer Result

Since Customer Results are such a central concept to our way of work, this post will explain what they are, why we use them and how we organize our work around them.

Customer Results Focus on the Outcome

Outcomes Graphic

As the name gives away, customer results are about just two things: customers, and results.

This means that when describing customer results, we focus on the outcome for our current and future customers. Anything else like "how can we achieve this?" or "what does this mean for our product?" are not relevant at this stage yet.

We love customer results because they provide...

  1. focus: To any problem out there, there are thousands of solutions. Especially when you work in a great team. That's why it's important to focus on the challenge first. What is our biggest challenge at the moment that keeps us from succeeding? What is the most valuable thing we can do right now? All those things are easier to determine when you start at the outcome that you want to achieve instead of all of the solutions to all problems we could solve.
  2. flexibility: Customer results are being worked on by self-organizing teams who have the freedom and flexibility to decide on the best way to achieve a certain result. That also means that we do not narrow down the solution before the team starts working on it.
  3. empathy: We describe challenges faced by our customers using real people and real stories instead of idealized personas. And a customer result describes what the outcome will mean to them.

Focus Graphic

Since customer results are about achieving tangible and meaningful outcomes, they are typically not small feats. Most of our customer results span 4 to 12 weeks worth of dedicated team work.

How we Write a Customer Result

Our inspiration for customer results comes from our daily interactions with customers. For example, we collect insights from customer success meetings and gather feedback from all users via our public feedback portal. No matter which interaction the inspiration comes from – anyone at meshcloud can propose a customer result.

Titles, Titles, Titles

The first step to a good customer result is to pick a good title. We like catchy phrases that transport empathy. Let's take a real example from our backlog right now:

Landing Zone Management like a boss
What's great about this title? Suppose you're on the team working on this customer result, what is your mission? To make the people using meshStack to manage their landing zones feel like a boss!

A traditional scrum backlog would have had a couple of user stories like "as a platform operator, I want a button to ... so I can ...". Not so our customer result. This is about making someone feel like a boss! That's a clear-cut mission statement: the team owns the entirety of discovering, deciding and delivering on the solution. What matters is that the team is able to demonstrate that the goal has been achieved. Working directly with customers makes this easy, too – in this case, just ask your customers whether they feel like a boss using your solution!

Working backward

While a title sets the general direction, we do of course describe the challenge and outcome in more detail. In the spirit of working backwards, the description starts with a fictional press release or story that describes the outcome. In those statements, we often use fictional quotes that we hope real people we work with might say about the outcome, like "Peter goes to twitter to rave about his newfound landing zone power: just deployed a new AWS Config rule to all our 500 AWS accounts with meshStack Landing Zones, was a breeze!"

This is a success when...

While empathy and qualitative results are important, we always explore options to quantify and validate results. Like acceptance criteria in traditional user stories, this section of a customer result describes the success criteria we hope to meet. This can leverage metrics and other sources of feedback as well.

It's all in the team

Who do we need to succeed in delivering this customer result? This is where a description of the customer result dream team comes in.

When working on a result that affects a particular type of customer situation, the core of the dream team is to bring in meshis with first-hand expertise of these situations. In our "Landing Zone Management Like a Boss" example above, we would want to bring in colleagues from the solutions team that help customers build and configure landing zones first hand. We also bring in someone from the engineering team that knows a lot about how meshStack handles and orchestrates landing zones, as well as a meshi from the growth flock that is responsible for the communication of these newly achieved capabilities to the outside.

Prioritization with RICE

There are so many good ideas to work on at any moment that it's hard to pick the next thing. To help us guide these decisions, we use the RICE framework. Simply put, we describe the dimensions

  • Reach: Who is this result affecting?
  • Impact: What "magnitude" of impact does this result have?
  • Confidence: How confident are we in our assessment of reach, impact and effort?
  • Effort: How much effort do we estimate this is for a team working full time on it?

One positive effect of applying RICE to all customer results is that we ensure all dimensions have been thought of and described against an established frame of reference. Of course, the simplicity of the scoring model (Score = Reach * Impact * Confidence / Effort) is misleading. It's not as easy as picking the results with the highest scores. We also consider alignment with strategic goals and other criteria. Nonetheless, we find the method provides good comparability of customer results and it has proven its worth as a basis for making difficult prioritization decisions.

Product Managers guide the Process

Remember how we said earlier that every meshi can propose a new customer result? As a meshi proposing a customer result, you are not on your own: our product managers are there to help and guide you through the customer result process. They provide constructive feedback on customer results and help refine them to a state where they become plannable and prioritizable.

They also maintain a global board of Customer Results visible to everybody in the company. This allows us to see exactly what everyone is working on at any time, and why. We also regularly publish a high-level roadmap externally, to give our customers and other external stakeholders transparency about what we're working on.

Do you want to empower humans to build a better future? Then we'd love to meet you!


meshcloud Jobs

The meshcloud Way of Work

At meshcloud we empower humans to build a better future. When you hear this vision for the first time, you may think it sounds incredibly ambitious and completely intangible at the same time. Yet, we mean and live this in everything that we do. In this post I want to describe how our way of work connects this ambitious vision to our everyday routines. You will learn why we work on cloud foundations, how we go about this and how our work empowers our meshis and our customers to build a better future every day.

Why

We believe that computing is an essential means of production in the 21st century and that cloud computing is the best way to deliver it. meshcloud builds a Cloud Foundation Platform that enables organizations to build on the endless possibilities of the cloud while protecting their digital freedom. Cloud Foundations help organizations deliver cloud environments of any provider and any technology to thousands of teams while staying in control of their infrastructure and reducing complexity. Our platform provides essential cloud foundation capabilities: cloud-native technology access, with cross-platform Identity and Access Management, Tenant Management, Compliance & Security and Cost Management covered.

The challenges that Cloud Foundations solve become most apparent in organizations that are fully committed to cloud transformation and embrace multi-cloud at scale. Consequently, our customers are among the largest organizations in the world and are recognized as cloud transformation leaders in their industries.

Building Cloud Foundations is an exciting new challenge. Customer needs are rapidly evolving as organizations discover the challenges imposed by leveraging multi-cloud at scale. In fact, we believe that almost every larger organization will have to discover and solve these challenges within the next 5 years as multi-cloud adoption becomes more ubiquitous. And when that happens, we want them to think of meshcloud to solve them. This insight informs how we work at meshcloud: we embrace every customer relationship as an opportunity to learn something new about the best way to build solid Cloud Foundations.

Customer Empathy

If there's one super-power that we look for in every meshi working in our team, it's customer empathy.

Customer empathy is understanding the underlying needs and feelings of customers. It goes beyond recognizing and addressing their tactical requirements and puts things into further context by viewing things from their perspective. [...] Customer empathy sees users as real people and not just individuals trying to do something. It rounds out customers into whole people, provides a larger context for how products and solutions fit into the much broader ecosystem of their lives, their jobs, and their environment.
https://www.productplan.com/glossary/customer-empathy/

For a startup looking to solve an entirely new category of problems, empathy for current and prospective customers is a real super-power. Customer empathy empowers every meshi to

  • iterate faster by collaborating and communicating on customer challenges with more context and deeper understanding
  • drive initiatives more successfully and with higher quality by making better decisions that anticipate customers’ perspectives

This is why we have ingrained customer empathy into our own implementation of an agile way of working.

The meshcloud model: Empowering meshis to build a better future

We embrace people and interactions over tools and processes (Agile Manifesto). That's why our way of working is decidedly agile, yet does not fit any popular method like scrum. Our way of work rests on three pillars:

  • We prioritize work using Customer Results: a description of a challenge and desired outcomes that we want to achieve for our customers
  • Customer Result Teams: Cross-functional teams empowered to make all decisions necessary to achieve this outcome
  • Flocks own a functional area of responsibility and its routines – championing cohesion and professional excellence

Customer

Working cross-functionally by default, in teams centered on delivering customer outcomes, empowers every meshi to build a close connection to and a deep understanding of our customers. Of course, building customer empathy requires communication. Whereas purely functional organizations attempt to funnel customer touchpoints through specialized functions like support and customer success, our way of work enables high-bandwidth communication for all functions. Championing customer empathy in this way helps us understand what building a better future means to our customers – and deliver on that mission.

We'll be sharing more on our way of work in the future and we would love to see you here again.

Learn More On The Way We Work


Why we're sponsoring the Dhall Language Project

We're very happy to announce that meshcloud is the first corporate sponsor of the Dhall Language Project via open collective. In this post I want to explain how we came to use Dhall at meshcloud, what challenges it solves for us and why we hope it will play a role in enabling software projects to more easily adapt to multi-cloud environments.

Enabling DevOps at scale

At the beginning of this year, we realized we had a challenge scaling configuration and operations of our software for customers. meshcloud helps enterprises become cloud-native organizations by enabling "DevOps at scale". Our tool helps hundreds or thousands of DevOps teams in an enterprise to provision and manage cloud environments like AWS Accounts or Azure Subscriptions for their projects while ensuring they are secured and monitored to the organization's standards.

Enabling DevOps teams with the shortest "time to cloud" possible involves the whole organization. Our product serves DevOps teams, IT Governance, Controlling and IT Management in large enterprises. That means meshcloud is an integration solution for a lot of things, so we need to be highly configurable.

Because we also manage private clouds (OpenStack, Cloud Foundry, OpenShift etc.) we often run on-premises and operate our software as a managed service. This presents unique challenges for our SRE team. Not only do we need to maintain and evolve configuration for our growing number of customers, but we also need to support deploying our own software on different infrastructures like OpenStack, AWS or Azure[1].

At the end of the day, this boils down to having good and scalable configuration management. After going through various stages of slinging around YAML with ever more advanced tricks, we realized we needed a more fundamental solution to really crack this challenge.

Configuration management at scale - powered by dhall

The Dhall configuration language solves exactly this problem. It's a programmable configuration language that was built to express configuration - and just that. Dhall is decidedly not Turing complete. Its functional nature makes configuration easy to compose from a set of well-defined operations and ensures that configuration stays consistent.

Using Dhall allows us to compile and type check[2] all our configuration for all our customers before rolling things out. We use Dhall to compile everything we need to configure and deploy our software for a customer: Terraform, Ansible, Kubernetes templates, Spring Boot Config. We even use Dhall to automatically generate Concourse CI pipelines for continuous delivery of our product to customers.
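
As a tiny illustration of that compositional style (the record fields are made up and not our actual configuration), shared definitions can be written once and reused like functions:

-- define a reusable "constructor" for a service configuration
let makeService =
      \(name : Text) ->
        { name = name, replicas = 2, logLevel = "info" }

-- compose the configuration for several services from it
in  [ makeService "api", makeService "worker" ]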

Since adopting Dhall earlier this year, we measurably reduced our deployment defect rate. We feel more confident about configuration changes and can safely express configuration that affects multiple services in our software.

Empowering a Multi-Cloud Ecosystem

We believe that open-source software and open-source cloud platforms are crucial for enabling organizations to avoid vendor lock-in. Now that mature tools like Kubernetes exist and can do the heavy lifting, enabling portability between clouds has become a configuration management challenge.

What we found especially interesting about Dhall is that it's not just an "incremental" innovation atop existing configuration tooling like template generators; instead, it looks at the problem from a new angle and tries to solve it at a more fundamental level. This is something we can relate to very well, as we're trying to solve multi-cloud management using an organization-as-code (like infrastructure as code) approach.

That's why we're happy to see Dhall innovating in this space and why we reached out to the Dhall community to explore ways we can support the project. We hope that providing a steady financial contribution will allow the community to further evolve the language, its tooling and its ecosystem.

Footnotes:

  • [1]: In this way meshcloud is not only a multi-cloud management software but is also a multi-cloud enabled software itself.

  • [2]: Dhall purists will want to point out that expressions are not compiled, instead they're normalized.


How to apply for a technical role at meshcloud

This post from 2019 still represents the spirit of our hiring process for technical roles. However, please find the most up to date description of our interview process on our careers page.

In this post we want to give you an overview of our values and interview process hiring for technical full-time positions in our team. We hope this guide helps you navigate the process successfully and answers your questions. Should you have any more questions, please don't hesitate to reach out at jobs@meshcloud.io.

We believe that hiring is as much about us getting to know you as it is about you getting to know us. Our application and interview process is thus designed to give both of us a chance to evaluate your fit for a position at meshcloud.

Overview and TL;DR

  • Application with CV and Portfolio (Github, Stackoverflow, etc.)
  • Phone Interview
  • On-Site Interview at our Office & Design Exercise
  • On-Site "MVP Test" with your future colleagues
  • Feedback and offer

Stage 0: Your Application

Present yourself and your skills in the best possible light. Let us know why you're interested in working for meshcloud and why you consider yourself a good fit for our team. Tell us about your values, achievements and contributions you have made in prior roles. If you're a recent graduate, tell us about a project you've worked on that you're proud of. Even more than your concise resume, we like seeing a sample of your work and abilities. Send us a link to your projects, your stackoverflow or github profile.

Please do not include information on your resume that we don't need to evaluate your application. All that matters to us is your qualifications and personality. We do specifically ask you to not include a photo, gender, marital status or religious orientation.

When we list the technologies we work with in our job profiles, we always separate "must have" skills from "nice to have" skills. We believe that every technical skill is valuable. So while we may not use [insert obscure language] right now, there's a good chance you have learned something valuable and transferable using it. So, please do include it on your CV! We're open-minded when it comes to integrating new tech & tools into our stack. Our most recent addition is dhall.

1st Stage: Phone Interview

You'll meet one on one with the future manager of your position for a structured 30 minute phone interview. We expect you to tell us briefly about yourself and your experience. We'll discuss the role and answer any questions you may have about the position. The second half of the interview is a set of technical questions that helps us get an indication of your skill level in competence areas relevant for the job. We're not looking for textbook answers and you should not prepare specifically for this part.

2nd Stage: On-Site Interview

The on-site interview typically lasts for 2-3 hours. You'll get to visit our office and meet members of the team you may be working with in the future! You'll also meet members of other teams at meshcloud.

We'll discuss your prior experience in depth and will together walk through a technical design exercise appropriate for the role. We use this exercise to see your problem-solving process and how you leverage your experience, skills and knowledge to solve the task at hand. This may also involve some whiteboarding or scribbling on paper, but we'll not ask you to come up with syntactically correct code on paper. The challenges are hands-on and real things we're working on, so they will let you discover the things we work on and what our tech stack looks like.

3rd Stage: MVP Build

We don't believe take-home "coding exercises" or "coding tests" give you a good idea of how we work at meshcloud. Instead, we want to give you a chance to experience being part of the team and see how we work first hand.

So instead, we will together develop a small and focused "minimum viable product" (MVP) related to your role. We typically start in the morning and walk through the task at hand. The goal is to produce a working prototype in 3 hours.

When building the prototype, we totally expect you'll have to cut some corners. Of course you'll discuss requirements and implementation questions with your colleagues. Since we start in the morning, we'll invite you out for lunch with the team and review your results together after we return. In the review you'll present your approach and results, tell us about the corners you cut and what would be left to finish the work to your own quality standards.

If you're a remote candidate or considering relocation to work with us in Frankfurt am Main, we will try to schedule the On-Site Interview and MVP Test for the same day.

Final Stage: Feedback and Offer

You'll typically hear from us within a week of your interview whether we want to offer you the position. We know that you'd rather hear sooner rather than later whether your interview with us was a success. However, we want to make sure we give every application the attention it deserves. After your interview, we collect feedback from all colleagues who got a chance to get to know you. We also give our colleagues some time to make up their minds and offer additional feedback after the experience has settled for a bit.

You want to learn more about us?

Please check our open positions.


computer and display

GPU Acceleration for Chromium and VSCode

At meshcloud we believe in using the best tools money can buy. Most developers in our team use Linux Workstations with 2x27" 4K displays as their daily drivers. Working with text all day is certainly less straining on a 4K (or hi-dpi) display: the characters are sharper and more easily readable.

Unfortunately, Chromium and consequently all electron based apps like VSCode disable GPU acceleration on Linux, claiming Linux GPU drivers are too buggy to support. If you're running a traditional Full-HD screen, falling back on (non-accelerated) software rendering is not a big deal. But on a 4K display your CPU has to push 4 times the amount of pixels, and that can quickly lead to unpleasant input lag when working on code or a sluggish-feeling browser. And all of that on powerful machines with tons of CPU cores, RAM and blazingly fast SSDs. Certainly not how a developer workstation should feel in 2019.

With a clever combination of flags, you can force Chromium and VSCode to use GPU acceleration instead. You may experience a couple of graphics glitches here and there, but that's a small price to pay for a much more responsive browser and text editor. The settings below worked for me on my machine running Fedora 30 and made Chromium and VSCode much more enjoyable to use.

Chromium

For chromium, use these flags (depending on your distro, you can also write them to a launcher or config file):

chromium-browser --ignore-gpu-blacklist --enable-gpu-rasterization --enable-native-gpu-memory-buffers

This should result in the following acceleration status on chrome://gpu:

Canvas: Hardware accelerated
Flash: Hardware accelerated
Flash Stage3D: Hardware accelerated
Flash Stage3D Baseline profile: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
Native GpuMemoryBuffers: Hardware accelerated
Out-of-process Rasterization: Disabled
Hardware Protected Video Decode: Hardware accelerated
Rasterization: Hardware accelerated
Skia Renderer: Disabled
Surface Control: Disabled
Surface Synchronization: Enabled
Video Decode: Hardware accelerated
Viz Service Display Compositor: Enabled
Viz Hit-test Surface Layer: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated

VSCode

You can use the same flags on VSCode (tested on v1.36) to get GPU acceleration.

code --ignore-gpu-blacklist --enable-gpu-rasterization --enable-native-gpu-memory-buffers

You can check acceleration status using code --status while you have another instance of the editor already running. This should result in:

Version:          Code 1.36.1 (2213894ea0415ee8c85c5eea0d0ff81ecc191529, 2019-07-08T22:55:08.091Z)
OS Version:       Linux x64 5.1.17-300.fc30.x86_64
CPUs:             Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (12 x 800)
Memory (System):  31.29GB (22.39GB free)
Load (avg):       1, 1, 1
VM:               0%
Screen Reader:    no
Process Argv:     --enable-gpu-rasterization --enable-native-gpu-memory-buffers
GPU Status:       2d_canvas:                     enabled
                  flash_3d:                      enabled
                  flash_stage3d:                 enabled
                  flash_stage3d_baseline:        enabled
                  gpu_compositing:               enabled
                  multiple_raster_threads:       enabled_on
                  native_gpu_memory_buffers:     enabled
                  oop_rasterization:             disabled_off
                  protected_video_decode:        unavailable_off
                  rasterization:                 enabled
                  skia_deferred_display_list:    disabled_off
                  skia_renderer:                 disabled_off
                  surface_synchronization:       enabled_on
                  video_decode:                  unavailable_off
                  viz_display_compositor:        disabled_off
                  webgl:                         enabled
                  webgl2:                        enabled

Note that I haven't bothered with out-of-process rasterization yet. You can enable this using a flag too. But it appears not to be available on e.g. macOS either, so I don't expect it will make a big difference performance-wise.


Timestamp Initialization

MySQL Timestamp Initialization

During testing of a new migration, we discovered that timestamps in an audit table were suddenly reset to the same timestamp (close to now). That hit quite a nerve. After some investigation, it turned out that MySQL and MariaDB may have dangerous default behavior when working with columns of type timestamp. Suppose you declare an audit event table like this:

CREATE TABLE `Event` (
  `id` varchar(128) NOT NULL,
  `createdOn` timestamp NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

When reading back the table definition (e.g. via MySQL Workbench), you will find it's actually:

CREATE TABLE `Event` (
  `id` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL,
  `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Note the automatic addition of DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP for the createdOn column. This creates two problems:

  • the timestamp values set by your code may not match the timestamps stored in the database
  • the timestamps on the table become mutable, i.e. in case a migration runs an UPDATE on the table all existing timestamps will be overwritten.

The source of this behavior is Automatic Initialization and Updating for TIMESTAMP and DATETIME, which is controlled by the explicit_defaults_for_timestamp configuration variable and also depends on the active SQL Mode.

To see your active configuration, run this SQL on your active connection

SHOW Variables WHERE Variable_name = "explicit_defaults_for_timestamp";
SELECT @@GLOBAL.sql_mode;
SELECT @@SESSION.sql_mode;

In our case, explicit_defaults_for_timestamp was off, which specifically results in the observed behavior:

The first TIMESTAMP column in a table, if not explicitly declared with the NULL attribute or an explicit DEFAULT or ON UPDATE attribute, is automatically declared with the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes.

Another caveat: Your database team or provider may not have explicitly configured explicit_defaults_for_timestamp. To add insult to injury, the default value for this variable depends on your MySQL Version.

MySQL Version        default
MySQL >= 8.0.2       ON
MySQL <= 8.0.1       OFF
MariaDB >= 10.1.8    OFF

Explicitly Controlling Timestamp Initialization

It's bad news when the behavior of your application depends on a database configuration variable outside of your team's direct control. We thus have to fix up our tables right after creating them, as suggested in this stackoverflow answer and adapted for MariaDB:

ALTER TABLE `Event` MODIFY COLUMN `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE `Event` ALTER COLUMN `createdOn` DROP DEFAULT;

Note: These statements are idempotent, i.e. we can safely run them even if explicit_defaults_for_timestamp is ON, and our table will have the desired state: a timestamp column with no DEFAULT and no ON UPDATE clause.

We also added a test to our migration test suite that verifies all timestamp columns are created as intended and no hidden behavior messes with our column definitions.
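
One way to implement such a check (a sketch, not necessarily our exact test) is to query information_schema for timestamp columns that picked up automatic initialization or on-update behavior:

-- list timestamp columns with a CURRENT_TIMESTAMP default or ON UPDATE clause
SELECT table_name, column_name, column_default, extra
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND data_type = 'timestamp'
  AND (column_default LIKE '%CURRENT_TIMESTAMP%' OR extra LIKE '%on update%');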


Running Cron Jobs on Cloud Foundry

Many cloud applications need to run some sort of scheduled task for chores like processing statistics or doing internal housekeeping. There are two different strategies to implement scheduled tasks for cloud applications running on Cloud Foundry. You can either build scheduling and tasks into the application itself, or schedule and run the task in separate containers.

Some application frameworks like Spring include built-in scheduling support. However, this scheduling support does not include a distributed coordination mechanism. This means that each instance of an application that has been horizontally scaled to multiple instances will run the scheduled task individually. Depending on the nature of the task, this may cause observable side effects, like sending repeated emails to your customers.

It's thus preferable to have a central entity for scheduling. You could of course use e.g. a Java Spring app that needs approximately 1 GB of RAM to do that for you, but that would be very wasteful. Instead, we can build a simple cron scheduler that runs on 16 MB of RAM to get reliable task scheduling for just a few cents per month.

The task scheduler can then execute arbitrary scripts or code, for example to:

  • invoke an https endpoint on your application to perform the task
  • queue a message on RabbitMQ for processing by a worker
  • trigger execution of the job in a separate Cloud Foundry Task Container

meshcloud's cf-cron scheduler

Our sample repository demonstrates how to run scheduled tasks on Cloud Foundry with a very small footprint (8 to 16 MB RAM) using a traditional crontab. Traditional cron daemons need to run as root and have opinionated defaults for logging and error notifications, which makes them unsuitable for running in a containerized environment like Cloud Foundry. Instead of a system cron daemon, we're thus using supercronic to run our crontab.

How it works

This application is built using the binary buildpack and executes supercronic on the crontab file. The crontab file specifies all your cron jobs. To add additional jobs, simply add a new line specifying a schedule and command to the crontab.
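
For illustration, an added entry could look like this (the URL is made up):

# trigger the nightly statistics processing at 03:00
0 3 * * * curl -fsS https://my-app.example.com/internal/run-statistics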

Note: By default, supercronic will log all output to stderr so we redirect that to stdout in our cf manifest.

You can also include additional scripts and binaries to execute more complex actions. This example allows you to install apt and debian packages to use in your cronjobs. You can specify these packages in apt.yml and they will be installed during staging by the apt-buildpack, courtesy of the magic multi-buildpack.

After cf pushing this sample app to Cloud Foundry, you can see that it happily executes the jobs from the crontab in the log output:

2018-03-05T10:59:00.00+0100 [APP/PROC/WEB/0] OUT time="2018-03-05T09:59:00Z" level=info msg=starting iteration=237 job.command="echo "hello world, every 2 seconds"" job.position=1 job.schedule="*/2 * * * * * *"
2018-03-05T10:59:00.00+0100 [APP/PROC/WEB/0] OUT time="2018-03-05T09:59:00Z" level=info msg="hello world, every 2 seconds" channel=stdout iteration=237 job.command="echo "hello world, every 2 seconds"" job.position=1 job.schedule="*/2 * * * * * *"
2018-03-05T10:59:00.00+0100 [APP/PROC/WEB/0] OUT time="2018-03-05T09:59:00Z" level=info msg="job succeeded" iteration=237 job.command="echo "hello world, every 2 seconds"" job.position=1 job.schedule="*/2 * * * * * *"
2018-03-05T10:59:00.05+0100 [APP/PROC/WEB/0] OUT time="2018-03-05T09:59:00Z" level=info msg="cf version 6.34.1+bbdf81482.2018-01-17" channel=stdout iteration=7 job.command="cf --version" job.position=0 job.schedule="*/1 * * * *"
2018-03-05T10:59:00.05+0100 [APP/PROC/WEB/0] OUT time="2018-03-05T09:59:00Z" level=info msg="job succeeded" iteration=7 job.command="cf --version" job.position=0 job.schedule="*/1 * * * *"

Scheduling Cloud Foundry Tasks

While the cron container here is designed to be small and lightweight, you may want to use it to trigger more resource intensive tasks and processes. When a simple curl to an http endpoint is not enough to kick off such a task on your existing app, Cloud Foundry Tasks are a great solution to run these processes.

This sample repository thus includes instructions to install the cf cli tool which you can use to trigger such a task using a meshcloud Service User.


Connect Database to Cloud Foundry

Securely connecting to Service Instances on Cloud Foundry

To connect to a managed service instance on your Cloud Foundry space, most developers use service keys. A service key is a set of authentication credentials that allows you to connect to your database instance via a public IP address and port. While this is quick and easy to do, we do not recommend keeping service keys open for extended periods of time. Instead, you should delete them as soon as possible and create a new service key anytime you need access again.

A more secure approach that does not involve exposing a connection to your database on a public IP is to spin up a shell container on Cloud Foundry and connect to it via cf ssh. This approach is also more suitable for long running or high performance operations that require close proximity between the database and the shell.

Here's how to do it, showcased for MongoDB, but a similar approach also works for our other managed services like MySQL or PostgreSQL.

  1. Create an app named mongocli based on a docker container image containing the mongo cli. Tip: you can also pin a specific version using the appropriate container image tag; the example below uses :latest. Note that we tell Cloud Foundry that we need only very little RAM (128 MB), don't want a health-check on the app and that it doesn't need an HTTP route to be reachable from the outside. After all, we just want to ssh into this app.

    cf push -o mongo:latest mongocli --no-route --no-start -u none -m 128M
  2. Create a binding of the service instance to your new app. This makes a connection string available to the mongocli app that it can use to connect to the database instance on a private network, just like your production app does.

    cf bind-service mongocli my-mongodb
  3. Start the container and just let it run a bash shell:

    cf push -o mongo:latest mongocli --no-route -u none -m 128M -c bash

That's it, now we can easily ssh into the container using cf ssh mongocli and run env to find our connection string in the VCAP_SERVICES variable. The connection string looks approximately like this:

VCAP_SERVICES={"MongoDB":[{
"credentials": {
"password": "abc",
"database": "db",
"uri": "mongodb://user:pw@ip1:27017,ip2:27017,ip3:27017/db",
"username": "xxx"
},
"syslog_drain_url": null,
"volume_mounts": [

],
"label": "MongoDB",
"provider": null,
"plan": "S",
"name": "my-mongodb",
"tags": [

]
}]}

Now you can simply run mongo mongodb://user:pw@ip1:27017,ip2:27017,ip3:27017/db and you're securely connected to your managed database instance - on a docker container running mongo shell on Cloud Foundry - connected via ssh.