Designing meshStack Building Blocks

By Johannes Rudolph, 21 March 2023

In this post I want to provide some technical background on meshStack’s new building blocks design that our team is actively working on. As outlined in our previous post, the key challenges we set out to solve are publishing building blocks for self-service, resource orchestration using familiar tooling, incremental automation and enabling composition of building blocks.

Defining the Problem Space for Building Blocks

Equipped with a lot of learnings from helping cloud foundation teams build hundreds of landing zones and delivering them to thousands of application teams, we defined the problem space we want to design for:

  • Automation is a must, though for many building blocks it becomes crucial only once you hit a scale > 50 cloud tenants (→ obligatory xkcd).
  • Building blocks ought to represent meaningful high-level capabilities – as such, application teams usually compose only a rather small number of them. Extrapolating from our existing experience with marketplace services, we expect most applications will have < 20 building blocks per cloud tenant. We think this is fundamentally different from orchestrating at the cloud resource level, where it’s common to have hundreds or thousands of resources to deploy an application on.
  • Cloud foundation teams will leverage building blocks to design and deliver modular landing zones. The lifecycle of building blocks like VPCs or firewall rules will consequently be more closely tied to the lifecycle of the cloud tenant than to the lifecycle of an application deployment.
  • The key value proposition of a cloud foundation platform like meshStack is integrating all essential governance functions like tenant management, IAM, cost management, security and compliance in a single application. It’s thus important that building blocks seamlessly integrate with every governance function available via meshStack.
  • Most smaller cloud foundation teams lack the capacity to implement all cloud foundation capabilities by themselves. The ability to tap into an established ecosystem offering a starting point in the form of reusable building blocks is an immense value add.

Orchestrating Building Blocks

We found it helpful to phrase the underlying problem for implementing building blocks in meshStack as an orchestration problem. Platform Engineers and Enterprise Architects already interact with many different incarnations of this problem on a day-to-day basis, for example in Terraform, Kubernetes, or CI/CD tools like GitHub Actions.

Adding and reconciling building blocks

Our design takes inspiration from these examples, combining the relative strengths of these orchestration solutions in a way that makes sense for our problem space. The figure below describes the key elements of our design:

Building Blocks Interaction Diagram

  1. Application teams can select building blocks from a catalog. Building blocks can have inputs of different types like manually specified inputs (entered by the application team or platform engineers), outputs from other building blocks (creating a dependency between blocks) or metadata derived from other meshObjects like meshTags.
  2. meshStack validates and updates the building block graph. The building block graph is a DAG (directed acyclic graph) representing the dependencies between building blocks. meshStack always maintains a single source of truth for the desired state of this graph.
  3. block-runners are independent processes that can reconcile a certain building block implementation type. Runners connect to meshStack via the meshObject API, polling for runnable blocks. Building blocks are runnable when they have all their inputs satisfied.
  4. meshStack passes the desired state of the building block to the runner, including the desired lifecycle state of the block as well as all of its inputs.
  5. The block-runner reconciles the building block, for example by executing a terraform apply in case of a Terraform building block.
  6. The block-runner collects the resulting output and returns it to meshStack. This information can also include detailed execution logs to help with debugging.
  7. Application teams can at any time inspect the status of their cloud tenants and the associated building block graph.
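
To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch. It is not meshStack's implementation: `Block`, `validate_graph` and `runnable_blocks` are hypothetical names, and the sketch assumes a block counts as reconciled as soon as it has outputs. It rejects graphs with unknown dependencies or cycles, then lists the blocks whose inputs are all satisfied.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


@dataclass
class Block:
    """Hypothetical building block; not a real meshStack type."""
    name: str
    depends_on: Set[str] = field(default_factory=set)  # blocks whose outputs feed this one
    outputs: Optional[Dict[str, str]] = None            # None until a runner reported back


def validate_graph(graph: Dict[str, Block]) -> None:
    """Raise ValueError if a dependency is unknown or the graph contains a cycle."""
    for block in graph.values():
        unknown = block.depends_on - set(graph)
        if unknown:
            raise ValueError(f"{block.name} depends on unknown blocks {sorted(unknown)}")
    try:
        TopologicalSorter({b.name: b.depends_on for b in graph.values()}).prepare()
    except CycleError as err:
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


def runnable_blocks(graph: Dict[str, Block]) -> List[str]:
    """Blocks that still need a run and whose dependencies all have outputs."""
    return sorted(
        b.name
        for b in graph.values()
        if b.outputs is None and all(graph[d].outputs is not None for d in b.depends_on)
    )


graph = {
    "vpc": Block("vpc"),
    "firewall-rule": Block("firewall-rule", depends_on={"vpc"}),
}
validate_graph(graph)
print(runnable_blocks(graph))                  # ['vpc']
graph["vpc"].outputs = {"vpc_id": "vpc-123"}   # simulated runner result
print(runnable_blocks(graph))                  # ['firewall-rule']
```

A runner polling for work would only ever see the blocks returned by something like `runnable_blocks`, which is what keeps dependency handling out of the runners themselves.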

Benefits of meshStack’s Building Block Design

This design offers a number of interesting properties relevant to our problem space.

Desired state reconciliation – a typical meshStack customer will manage thousands of building blocks with meshStack across multiple cloud providers. Desired state reconciliation of each individual block makes management robust in the face of inevitable cloud failures and curbs configuration drift.

Autonomous reconciliation – when managing a huge number of building blocks, it’s important to intelligently balance reconciling building blocks with changed inputs against detecting configuration drift, while carefully observing cloud API rate limits and automatically recovering from transient error conditions.
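
One way to picture this prioritization is a small scheduling queue. The sketch below is an illustration under assumptions, not our actual scheduler: `ReconcileQueue`, the two priority constants and the backoff parameters are made up for this example, and rate limiting is left out entirely. Blocks with changed inputs are served before drift checks, and a block that failed transiently is re-queued with capped exponential backoff.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

CHANGED_INPUTS = 0  # highest priority: the desired state changed
DRIFT_CHECK = 1     # background priority: periodic re-reconciliation

# (priority, not_before, sequence, block name, attempt)
Task = Tuple[int, float, int, str, int]


class ReconcileQueue:
    """Hypothetical queue; time is passed in explicitly to keep it testable."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 60.0) -> None:
        self._heap: List[Task] = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order stable
        self.base_delay = base_delay
        self.max_delay = max_delay

    def push(self, block: str, priority: int, not_before: float = 0.0, attempt: int = 0) -> None:
        heapq.heappush(self._heap, (priority, not_before, next(self._seq), block, attempt))

    def pop_due(self, now: float) -> Optional[Task]:
        """Pop the highest-priority task whose backoff has elapsed, if any."""
        skipped = []
        due = None
        while self._heap:
            task = heapq.heappop(self._heap)
            if task[1] <= now:
                due = task
                break
            skipped.append(task)
        for task in skipped:  # put tasks that are still backing off back on the heap
            heapq.heappush(self._heap, task)
        return due

    def retry(self, task: Task, now: float) -> None:
        """Re-queue after a transient error with capped exponential backoff."""
        priority, _, _, block, attempt = task
        delay = min(self.max_delay, self.base_delay * 2 ** attempt)
        self.push(block, priority, now + delay, attempt + 1)


queue = ReconcileQueue()
queue.push("dns-zone", DRIFT_CHECK)
queue.push("vpc", CHANGED_INPUTS)
first = queue.pop_due(now=0.0)
print(first[3])                    # vpc
queue.retry(first, now=0.0)        # transient failure: not due again before t=1.0
print(queue.pop_due(now=0.5)[3])   # dns-zone
print(queue.pop_due(now=0.5))      # None
print(queue.pop_due(now=1.0)[3])   # vpc
```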

Predictable execution – when failures occur, it’s important that cloud engineers are able to quickly troubleshoot what’s gone wrong. With a single source of truth for block graphs and a central coordinator, reconciliation failures always occur at a defined point in the execution plan and do not propagate further.
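
As a rough illustration of what "do not propagate further" can mean, the following sketch walks a dependency graph in topological order and marks everything downstream of a failure as blocked instead of attempting it. The function name `execute_plan`, the status strings and the sequential execution are assumptions for this example; a real coordinator would hand blocks to runners rather than call them inline.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def execute_plan(deps: Dict[str, Set[str]], run: Callable[[str], bool]) -> Dict[str, str]:
    """Run blocks in dependency order; a failure blocks only its dependents."""
    status: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        if any(status[d] != "succeeded" for d in deps.get(name, ())):
            status[name] = "blocked"   # never attempted, so the failure cannot spread
        else:
            status[name] = "succeeded" if run(name) else "failed"
    return status


deps = {
    "vpc": set(),
    "firewall-rule": {"vpc"},
    "bastion-host": {"firewall-rule"},
    "budget-alert": set(),
}
# Simulate a run in which only the firewall rule fails.
result = execute_plan(deps, run=lambda name: name != "firewall-rule")
print(result["vpc"], result["firewall-rule"], result["bastion-host"], result["budget-alert"])
# succeeded failed blocked succeeded
```

The failure is pinned to exactly one block, its dependents are visibly blocked, and unrelated blocks are unaffected.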

Simple Composition – the types of composition we need to enable between building blocks are mostly simple input/output relations. For example, a firewall rule building block may need a VPC id specified as an output of a VPC building block. This means that we will start our design from a simple 1:1 mapping from inputs to outputs. There will be different types of inputs. Building block implementations can perform more complex computations from inputs with familiar and more suitable tools like HCL (for a Terraform building block), pushing this complexity to the edges of the system.
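
The firewall rule example can be sketched in a few lines of Python. The `InputMapping` shape and the `resolve_inputs` helper are hypothetical and only meant to show how little machinery a 1:1 mapping needs: each mapped input points at exactly one output of one other block, with no expressions in between.

```python
from typing import Dict, Tuple

# input name -> (source block, output name); a plain 1:1 reference, no expressions
InputMapping = Dict[str, Tuple[str, str]]


def resolve_inputs(
    mapping: InputMapping,
    manual: Dict[str, str],
    outputs: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Combine manually entered inputs with outputs copied from other blocks."""
    resolved = dict(manual)
    for input_name, (block, output_name) in mapping.items():
        try:
            resolved[input_name] = outputs[block][output_name]
        except KeyError:
            raise KeyError(
                f"input {input_name!r} needs output {output_name!r} of block {block!r}"
            ) from None
    return resolved


# Outputs reported so far, keyed by block name (values are made up).
outputs = {"vpc": {"vpc_id": "vpc-123"}}
inputs = resolve_inputs(
    mapping={"vpc_id": ("vpc", "vpc_id")},
    manual={"port": "443"},
    outputs=outputs,
)
print(inputs)  # {'port': '443', 'vpc_id': 'vpc-123'}
```

Anything more involved than copying a value, such as deriving a CIDR range from the VPC id, would live inside the building block implementation rather than in the mapping.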

Swappable Block Definitions – it’s important that platform engineers can reuse various existing automation technologies to implement building blocks. In that sense, building block definitions act like an interface for which platform engineers can seamlessly swap the implementation.
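
In code terms, this is the familiar interface/implementation split. The sketch below uses hypothetical names throughout (`BlockImplementation`, `BlockDefinition`, `ManualVpc`, `AutomatedVpc`) and stubs instead of real automation; it only shows that consumers of a block depend on its declared inputs and outputs, not on how they are produced.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


class BlockImplementation(Protocol):
    """Anything that can turn inputs into outputs can implement a block."""

    def reconcile(self, inputs: Dict[str, str]) -> Dict[str, str]: ...


@dataclass
class BlockDefinition:
    """Hypothetical definition: the stable contract other blocks depend on."""
    name: str
    input_names: Tuple[str, ...]
    output_names: Tuple[str, ...]
    implementation: BlockImplementation

    def run(self, inputs: Dict[str, str]) -> Dict[str, str]:
        missing = set(self.input_names) - set(inputs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        outputs = self.implementation.reconcile(inputs)
        if set(outputs) != set(self.output_names):
            raise ValueError(f"{self.name} must return exactly {self.output_names}")
        return outputs


class ManualVpc:
    """Stub for a block an engineer fulfils by hand, recording the result."""

    def __init__(self, recorded_vpc_id: str) -> None:
        self.recorded_vpc_id = recorded_vpc_id

    def reconcile(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"vpc_id": self.recorded_vpc_id}


class AutomatedVpc:
    """Stub for an automated implementation, e.g. one wrapping an IaC module."""

    def reconcile(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"vpc_id": f"vpc-{inputs['region']}"}


vpc = BlockDefinition("vpc", ("region",), ("vpc_id",), ManualVpc("vpc-by-hand"))
print(vpc.run({"region": "eu-west-1"}))   # {'vpc_id': 'vpc-by-hand'}
vpc.implementation = AutomatedVpc()       # swap the implementation, keep the contract
print(vpc.run({"region": "eu-west-1"}))   # {'vpc_id': 'vpc-eu-west-1'}
```

Starting with a manual implementation and swapping in an automated one later is also one way to read "incremental automation".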

Extensible Runner model – simple things should be simple, so meshStack will include out-of-the-box runners for common scenarios like Terraform modules. Similar to GitHub Actions, our runners will be open source and easy to self-host. This enables advanced scenarios like deploying runners with access to sensitive environments like on-premises or special secrets, runners implementing custom automations and so on.

The Migration Path for Marketplace Services Based on Open Service Broker API

When we designed the first version of meshStack’s marketplace almost five years ago in 2018, we wanted to provide a capable platform that enabled private cloud as well as public cloud use cases alike. The problem space we had in mind was to provide application teams with a PaaS experience as pioneered by platforms like Heroku and Cloud Foundry. Approaching the challenge from that angle, OSB API shines with great support for service catalogs to aid application teams’ discovery of compatible services as well as metering service usage for internal chargeback.

From the perspective of a cloud foundation team, however, implementing building blocks on top of OSB API poses a few challenges. Implementing a service broker requires implementing a conformant API – a software engineering problem. While especially bigger cloud foundation teams are willing and able to develop and operate custom service brokers, we learned that smaller teams are keenly aware of their limited bandwidth.

To deal with this, we sought to enable cloud platform engineers by building on workflows they are already experts at, like writing simple scripts and IaC. With the unipipe open source project we tried to make OSB API more accessible by transparently translating it to a GitOps workflow. Despite all of our efforts, the resulting experience still fell short of our ambitions while also critically lacking desirable composition capabilities, as outlined in our previous post on modular landing zones.

Building blocks will offer a clean migration path for customers already leveraging the OSB API marketplace – OSB API service instances will be just another type of building block implementation. We will be tackling OSB API integration with an OSB API-compatible block runner in a later milestone, but our current plan is to support the migration as much as possible out of the box without requiring changes to existing service brokers, service definitions and service instances.

Enabling Custom Platforms through Building Blocks

One of the design areas we are also actively looking into is exposing existing meshStack tenant replication capabilities as built-in building blocks. This will give platform engineers fine-grained control over how their modular landing zones apply to cloud tenants.