A Cloud Management Platform helps Cloud Centers of Excellence master the complexity of their multi-cloud environments by implementing governance and control mechanisms.
In this post we provide guidance and orientation for the evaluation of Cloud Management Platforms to help you make the right choice for your organization.

What is a Cloud Management Platform?

Coined as its own product category by Gartner, Cloud Management Platforms from vendors like meshcloud, CloudCheckR or Morpheus have been the tool of choice for many organizations. They help these organizations keep control of their cloud infrastructure landscape as cloud adoption increases.

Here are 10 criteria to consider when choosing a Cloud Management Platform:

(✅ 1) What is your intention of moving to the cloud?

There are different strategies that lead organization leaders to move to the cloud. I’ll briefly describe the 2 extremes and how they can impact your choice of Cloud Management Platform.

  1. Empty Datacenter with Lift&Shift: High, inflexible hardware and operations costs drive organizations to shut down their data centers and move workloads to the cloud (no matter how). This is less about innovation or speed; it’s a shift from CapEx to OpEx for more flexibility. What you will need from a Cloud Management Platform to achieve your goals is easy provisioning of commodity services: VMs, Networks and so on. Teams do not have (or need) in-depth cloud know-how. They just move from the actual datacenter to a virtual one.
  2. Accelerating with cloud-native development: For other organizations the main driver of cloud is speed. Accelerating development, collecting customer feedback and shortening release cycles for fast iterations are crucial. Teams have to adopt new ways of working (DevOps), leverage cloud-native practices like CI/CD or Infrastructure as Code and get familiar with the cloud providers’ services. A Cloud Management Platform should be almost invisible to developers that are working in this mode. It should help them get access to the cloud faster and support with standard setups for developers: Git repositories, CI/CD tooling etc. (Developer Toolchain). But once the initial setup is done, they should be free to access native cloud environments: AWS accounts, Azure subscriptions, GCP projects and choose the services they like.

Of course there are variations of these 2 strategies. And in some cases you might have to enable both as they are executed in parallel. So if your Cloud Management Platform supports both: Self-service provisioning of simple cloud services through a marketplace-like interface as well as native cloud environments and the access to them, you are on the safe side.

The alternative is to look for Cloud Management Platforms that provide DevOps teams with entire cloud environments like AWS accounts, Azure subscriptions or GCP projects and leave the choice of specific services within these platforms to the teams.

(✅ 2) What is the right level of abstraction for you?

This question is not new to cloud. Enterprise architects have been dealing with abstraction levels for decades, and it’s a key decision to make when looking at Cloud Management Platforms. So let’s have a look at the options.

There are 2 main approaches we see on the market:

  1. Abstraction on the resource level: When abstracting on the resource level, teams that have access to the Cloud Management Platform are offered different resource types, e.g. VMs, databases and so on. They choose the resource they need and can usually provision it via a self-service portal. Depending on the “depth” of the abstraction they may or may not be aware of which cloud the resource is provisioned in. This approach implies that every cloud service a developer wants to use has to be integrated into the Cloud Management Platform, either by the CMP vendor or your team. Especially when you expect demand for cloud to scale, this can make it hard to accommodate all the different use cases.
  2. Abstraction on the tenant level: In contrast to resource-level abstraction, abstracting on the tenant level means that the Cloud Management Platform provides the capability to provision native cloud tenants, e.g. AWS accounts, Azure subscriptions, GCP projects and so on, in self-service. The developers get the freedom to access these native cloud environments and choose their preferred cloud resources directly in the cloud. However, thanks to the tenant-level abstraction, they will not have to deal with different tenant provisioning processes for each cloud platform, figure out how they’ll get access to the environments, or how to extract costs. That will be handled homogeneously by the Cloud Management Platform to reduce complexity and lower organizational overhead.
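
To make the tenant-level idea more concrete, here is a minimal Python sketch of a platform-neutral tenant request that is translated into a platform-specific plan. All names in it (`TenantRequest`, `plan_tenant`, the naming scheme and the `cost_center` tag) are hypothetical and only serve as illustration; they do not describe the API of any particular Cloud Management Platform, and nothing is actually created in a cloud.

```python
from dataclasses import dataclass

# What each platform calls its native tenant.
TENANT_KINDS = {
    "aws": "account",
    "azure": "subscription",
    "gcp": "project",
}


@dataclass(frozen=True)
class TenantRequest:
    """Platform-neutral request, identical for every cloud (hypothetical)."""
    platform: str      # "aws", "azure" or "gcp"
    project_name: str
    owner: str
    cost_center: str


def plan_tenant(request: TenantRequest) -> dict:
    """Translate a neutral request into a platform-specific plan.

    This only builds a description; a real platform would pass it on
    to the respective cloud API.
    """
    if request.platform not in TENANT_KINDS:
        raise ValueError(f"unsupported platform: {request.platform}")
    return {
        "platform": request.platform,
        "tenant_kind": TENANT_KINDS[request.platform],
        # Assumed naming convention: <project>-<platform>
        "name": f"{request.project_name}-{request.platform}",
        "owner": request.owner,
        "tags": {"cost_center": request.cost_center},
    }


if __name__ == "__main__":
    for platform in ("aws", "azure", "gcp"):
        print(plan_tenant(TenantRequest(platform, "shop", "tom", "cc-42")))
```

The point of the sketch is that the requesting team fills in the same four fields regardless of the target cloud, while access and cost attribution (here reduced to an owner and a tag) are attached in one consistent place.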

(✅ 3) How important is vendor lock-in to you and your organization?

Avoiding vendor lock-in is one of the drivers of multi-cloud strategies. By having multiple providers in place you reduce your dependency on each individual provider and can easily define an exit strategy.

In reality this is a bit more complex:

If you abstract on the resource level, you can make sure that you only provide services that are available across multiple clouds, rather than proprietary and highly specialized services. This facilitates moving from one cloud to the other. However, in this case vendor lock-in to the Cloud Management Platform itself becomes a critical risk. Removing the CMP from your setup would leave you without any organizational structure or processes, and with badly isolated cloud resources.

If you abstract on the tenant level, you will have your organizational structure represented within each cloud platform. Most importantly, you will have isolated tenants for your workloads that people can continue to access when removing the CMP from your setup. The downside: Depending on the resources used within each cloud, migrating workloads from one cloud to the other may be a bit more complex.

(✅ 4) What do you want the onboarding process for new cloud tenants to look like?

Cloud Onboarding is where everything starts. The process you set up here defines how different teams and stakeholders within your organization will get in touch with the “new” cloud offering you are building. It has a great impact on how fast and intensively cloud will be adopted and finally on the speed of digitization for your whole company. The smoother the onboarding, the lower the hurdle to access cloud resources. If you want to call yourself cloud-native, self-service is a must here.

One way to measure the maturity of your cloud onboarding process is to define your time-to-cloud: How long does it take from a product owner or developer requesting a cloud environment to them actually having access to that environment? A good answer is 5 minutes. This should include all the various aspects of cloud onboarding like assigning a budget, defining access rights, documenting it in a central tenant database and actually creating the environment. This should at least be the case for low-risk sandbox environments.
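
As a small illustration, time-to-cloud can be computed from two timestamps: when the environment was requested and when the requester first had access. The sketch below assumes you record both events somewhere; the function names are made up for this example, and the 5-minute default simply mirrors the target mentioned above.

```python
from datetime import datetime, timedelta


def time_to_cloud(requested_at: datetime, access_granted_at: datetime) -> timedelta:
    """Elapsed time between the request and the moment access was granted."""
    if access_granted_at < requested_at:
        raise ValueError("access cannot be granted before the request")
    return access_granted_at - requested_at


def meets_target(duration: timedelta,
                 target: timedelta = timedelta(minutes=5)) -> bool:
    """True if the onboarding stayed within the target (default: 5 minutes)."""
    return duration <= target


if __name__ == "__main__":
    requested = datetime(2024, 1, 1, 9, 0, 0)   # example timestamps
    granted = datetime(2024, 1, 1, 9, 4, 0)
    duration = time_to_cloud(requested, granted)
    print(duration, meets_target(duration))
```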

(✅ 5) What kinds of resources do you want your application teams to use?

IaaS is a classic, but most organizations are looking to leverage the cloud for higher-level services: K8s, serverless functions and more importantly managed PaaS services like PostgreSQL databases, BigQuery or Kafka just to name a few. They free you from heavy operating efforts and allow you to move faster and focus on what matters: custom functionality that stands out in the market.

A lot of Cloud Management Platforms (CMPs) originate from an IaaS time and focus on provisioning infrastructure like VMs, Networks and nowadays K8s rather than making the entire cloud portfolio accessible to teams. These commodity services are integrated into the platform and can be easily provisioned by DevOps teams. For Lift&Shift workloads this can work out nicely, because the teams do not need to deal with the specifics of each cloud platform. If they have more sophisticated requirements though, it will be hard for a CCoE to accommodate them. New services will have to be integrated individually which will slow down software delivery.

(✅ 6) Are you aiming to change the way you work?

You might recall the old times when requesting a server required filling out lengthy forms and maintaining a good relationship with the server team to accelerate the process from months to weeks.

In cloud-native development code is king. You describe everything you can as code to make it fast, consistent and easily repeatable. Leveraging cloud-native paradigms like CI/CD and Infrastructure as Code will help you shorten your release cycles and deliver new features faster. To support this shift, your Cloud Management Platform has to provide you access to the native cloud APIs and allow you to use cloud-native tools like Terraform or GitLab.

(✅ 7) What is your approach to achieve compliance?

Staying compliant is one of the core goals of cloud management. Simplified a lot, there are 2 basic approaches to achieve compliance when moving to the cloud: preventive or reactive. You usually recognize a reactive approach by a list of policy violations you need to fix, often displayed with the classic “traffic lights”. However, it may require a lot of resources to remediate violations reactively once they have been detected.

In contrast, cloud environments can be secured preventively in a fully automated fashion. Landing Zones are the “tool” of your choice to accomplish that. They define the guardrails for cloud usage within your organization and prevent DevOps teams from doing things they shouldn’t: E.g. deploying outside of Europe (GDPR) or leaving S3 buckets public.

Your Cloud Management Platform should support an automated rollout of Landing Zones. This means that developers won’t be able to make basic mistakes and that security configurations are implemented consistently across different use cases. It also makes it much easier for the security team to assess an application’s security, as every application starts from a solid baseline.
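
The difference between preventive and reactive can be sketched in a few lines of Python: a preventive guardrail rejects a non-compliant request before anything is created, instead of reporting it afterwards. This is only a simplified model. In practice such guardrails are enforced by the cloud platform's own policy mechanisms as part of the Landing Zone, and the names here (`BucketRequest`, `guardrail_violations`, `create_bucket`) as well as the AWS-style `eu-` region prefix are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumption: AWS-style region names, where European regions start with "eu-".
ALLOWED_REGION_PREFIXES = ("eu-",)


@dataclass(frozen=True)
class BucketRequest:
    """Hypothetical request for a storage bucket."""
    name: str
    region: str
    public: bool


def guardrail_violations(request: BucketRequest) -> List[str]:
    """Return all guardrails the request would violate (empty if compliant)."""
    violations = []
    if not request.region.startswith(ALLOWED_REGION_PREFIXES):
        violations.append(f"region {request.region} is outside Europe")
    if request.public:
        violations.append("bucket must not be public")
    return violations


def create_bucket(request: BucketRequest) -> str:
    """Preventive enforcement: refuse the request before anything is created."""
    violations = guardrail_violations(request)
    if violations:
        raise PermissionError("; ".join(violations))
    # A real implementation would call the cloud API here.
    return f"created {request.name} in {request.region}"


if __name__ == "__main__":
    print(create_bucket(BucketRequest("logs", "eu-central-1", public=False)))
    print(guardrail_violations(BucketRequest("demo", "us-east-1", public=True)))
```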

(✅ 8) Are you ready to trust your developers?

The cloud only unfolds its full potential when you use it to enable a wide range of use cases within your organization. What does that imply? A lot of engineering moves from IT into functional departments, like production or sales. These teams know best what they want to build and even how they want to build it.

Giving them their freedom requires a new mindset and even more important: A new shared responsibility model. I don’t mean the one between your company and the cloud provider, but your internal one: between your CCoE and developers.

When choosing your Cloud Management Platform you’ll have to make sure your developers get the technological freedom they need to be successful. Specifically, check whether they can create cloud environments in self-service, use the cloud-native APIs for automation and access the large variety of services the cloud providers offer without anyone’s assistance.

(✅ 9) No native Policy Orchestration

One reason to opt for a Cloud Management Platform is to avoid lock-in to the specifics of each individual cloud platform. And yet there are some aspects you can hardly abstract away. One of them is policy orchestration. All three hyperscalers provide native policy orchestration tools. You have most probably come across at least one of them: CloudFormation (AWS), Azure Blueprints or Google Cloud Deployment Manager.

As the cloud itself is the place where policies are enforced (the so-called policy enforcement point), it’s helpful if your Cloud Management Platform leverages this native policy orchestration. You want to have full flexibility when it comes to security and compliance, even if this may cause more effort in setting up an appropriate policy framework as part of your Landing Zone.

(✅ 10) The beauty of Desired State

As humans, we tend to think in workflows that execute one step after the other. Automation enables us to improve workflows by making them faster and more reliable (you can be sure a workflow runs exactly the same way every time). Looking at the tooling landscape, ITSM tools are a common way to model workflows digitally. They give you the power to easily automate your workflows and save you time while increasing consistency. They also facilitate collaboration: by automatically triggering the next step once the previous one has been completed, they reduce the need for coordination.

Here is a workflow example in the context of cloud account creation:

  1. Product owner requests an AWS account by providing her name and use case
  2. Admin receives service request and triggers automation to create the account with the provided parameters
  3. An account automation script creates the account and tags it
  4. Once the account is created, the IAM team receives a notification to provide access to the account
  5. Once access is provided, the Product owner receives a notification and can access the account

While this is already much better than handing over tasks manually, a workflow approach for Cloud Management comes with weaknesses, especially when you have a large-scale cloud migration project ahead of you. Workflows often still include manual steps, and in the long term they suffer from inconsistencies. How do you integrate future changes? You’ll have to implement a new workflow, which will further increase complexity. What if an error caused the workflow to end early? And after months and tens or hundreds of cloud environments, how do you know what the actual state of a specific cloud environment is?

That is where declarative definitions or Desired State Models come into play. Instead of defining each step individually, you define the desired outcome, e.g. an Azure Subscription with access for “Tom Teamlead” and a security configuration that allows productive workloads. That’s the desired state that will be “replicated” to the cloud (creation of the cloud environment, tagging for production, rolling out a Landing Zone, providing permissions and so on). While in the beginning the results of workflows and the desired state model may seem similar, the approach to get there is fundamentally different, and the value of that materializes in the long term.

Declarative automation in the context of Cloud Management focuses on achieving continuous compliance. What does that mean? A desired state that is defined upon the creation of a cloud environment can be compared against the actual state (actual existence of a cloud environment, actual permission set, actual security configuration etc.) in the cloud. In case of divergence the actual state can automatically be set back to the desired one. Infrastructure-as-Code is a commonly known example of a desired state approach.
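
The compare-and-correct loop behind a desired state model can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: states are plain dictionaries, and the keys and helper names (`diff_state`, `reconcile`) are invented for this example. It is not how any particular product implements reconciliation.

```python
from typing import Any, Dict, Tuple


def diff_state(desired: Dict[str, Any],
               actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (actual_value, desired_value)} for every diverging key.

    Only keys named in the desired state are checked; anything else in
    the actual state is ignored in this simplified model.
    """
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            drift[key] = (have, want)
    return drift


def reconcile(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the actual state with all drift set back to desired."""
    fixed = dict(actual)
    for key, (_, want) in diff_state(desired, actual).items():
        fixed[key] = want
    return fixed


if __name__ == "__main__":
    desired = {
        "platform": "azure",
        "members": frozenset({"Tom Teamlead"}),
        "landing_zone": "production",
    }
    # Someone added an extra member by hand after the environment was created.
    actual = {
        "platform": "azure",
        "members": frozenset({"Tom Teamlead", "Eve External"}),
        "landing_zone": "production",
    }
    print(diff_state(desired, actual))
    print(reconcile(desired, actual) == desired)
```

Unlike a workflow, this check does not depend on which steps ran in the past: it can be repeated at any time, and an interrupted run is simply picked up by the next comparison.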

meshStack as a Cloud Management Platform follows a desired state approach that is mostly focused on organizational aspects of Cloud Management like Tenant Management, IAM, Security & Compliance or Cost Management.

Cloud Management Platforms are a great way to improve the structure of your cloud landscape and ensure transparency across multiple cloud platforms. However, the choice of platform is a strategic decision that will affect the way you work and the speed at which you’ll be able to adopt cloud in the future. Taking these 10 aspects into consideration will help you make the right choice and better understand its implications.