Cloud Tenant Management: What You Need to Know in 2021

This post gives an overview of important cloud tenant management concepts.

If you work as a CIO or Cloud Architect, in DevOps, or you're just interested in the cloud tenant lifecycle and its implications, this post is for you.

We'll look at the cloud tenant, or cloud account, lifecycle from fundamental considerations, to provisioning, running workloads, and de-provisioning.

In this new guide you will learn:

  • What cloud tenant management is
  • Why cloud tenant management is at the core of good cloud governance
  • How cloud tenant management ties in with billing, IAM and cloud security
  • Best practices in structuring cloud accounts
  • The resource hierarchies of AWS, Azure and GCP
  • Why tagging and labeling are an essential part of cloud tenant management

Let's get started with the basics:

What is Cloud Tenant Management?

Cloud tenant management is the coordination and administration of a cloud tenant throughout its entire lifecycle to provide developers with a secure, efficient, and fitting working basis in the cloud.

We refer to a cloud tenant as an account with a cloud provider. AWS calls it an account, Azure calls it a subscription, and GCP calls it a project. Other providers may have come up with other names, but they all describe what we call a cloud tenant. Others just go with cloud account - this post covers the management of such tenants or accounts.

How Cloud Tenant Management Ties in with Cloud Governance

Thorough cloud governance is essential for enterprises and it gets more complex with the number of clouds in use and the number of projects running.

Cloud tenant or cloud account management is an integral part of cloud governance: It's important to effectively manage many cloud accounts across numerous cloud platforms.

Take Care of Cloud Tenant Management Early

Losing track of cloud accounts costs money and risks security: Enterprises embarking on their cloud journey may start with a small number of projects and accounts. But putting thought into proper cloud tenant management can save a great deal of effort in later stages when things grow and get more complex.

The Cloud Tenant Lifecycle

The lifecycle of a cloud tenant is closely related to the cloud project lifecycle, in which a tenant is configured, users are invited, resources are provisioned, and finally everything is de-provisioned again. The cloud project lifecycle is, so to speak, a subset of the cloud tenant lifecycle.
The cloud tenant lifecycle: Tenant structure strategy, tenant provisioning, tenant configuration, tenant operation and tenant deprovisioning.

Resource Hierarchies for AWS, Azure and GCP

In this section, we'll describe the resource hierarchies of AWS, Azure, and GCP on a very high level. The resource hierarchy refers to the organization of resources inside a cloud platform account. The three big providers offer several levels, some of which are optional:

Resource hierarchies of AWS, Azure and GCP showing the cloud tenant level.
AWS and GCP allow the cloud tenant (account or project) to sit at the top of the resource hierarchy. Azure requires you to have a root management group for all tenants that can't be moved or deleted.

Azure also requires you to have at least one resource group - a level that the other two providers do not have at all.

The levels determine the inheritability of policies, how access is managed, and so on. These levels are your building blocks to map your organization to the cloud.
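To make policy inheritance concrete, here is a minimal Python sketch. The node names and policy identifiers are made up and not tied to any real provider API; the point is that a policy attached at a higher level applies to everything below it:

```python
# Hypothetical resource hierarchy modeled as nested dicts; "_policies" holds
# the policies attached at each level. All names are illustrative only.
org = {
    "root-management-group": {
        "_policies": ["require-tags"],
        "prod-subscription": {
            "_policies": ["deny-public-ip"],
            "app-resource-group": {"_policies": ["allow-eu-regions-only"]},
        },
    }
}

def effective_policies(hierarchy, path):
    """Collect all policies inherited along a path from the root to a leaf."""
    policies, node = [], hierarchy
    for name in path:
        node = node[name]
        policies.extend(node.get("_policies", []))
    return policies

print(effective_policies(org, ["root-management-group", "prod-subscription", "app-resource-group"]))
# ['require-tags', 'deny-public-ip', 'allow-eu-regions-only']
```

The resource group inherits everything set above it - which is exactly why the structure you choose determines how policies and access are managed.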

Best Practices for Organizational Structure in the Cloud

Mapping your organization to the cloud is a major and fundamental part of the cloud journey. In this section, we'll cover best practices to provide some guidance along the way.

Planning a consistent scheme ahead of time is important to avoid structures that occlude what is actually going on further down the road.

A best practice we see in the industry and with our customers is creating cloud accounts according to the environments an application should run in. Commonly that means three cloud accounts for one application: One for development, one for staging/QA, and one cloud account for the production environment.
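Such a scheme is easy to automate. The following sketch derives one account name per environment; the `org-app-env` naming convention is an assumption, not a provider requirement:

```python
# Illustrative helper: one cloud account per environment for an application.
# The naming scheme (org-app-env) is a made-up convention - adapt it to yours.
ENVIRONMENTS = ("dev", "staging", "prod")

def accounts_for(app, org="acme"):
    return [f"{org}-{app}-{env}" for env in ENVIRONMENTS]

print(accounts_for("webshop"))
# ['acme-webshop-dev', 'acme-webshop-staging', 'acme-webshop-prod']
```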

Keep in mind that for every new cloud account this can mean management overhead piling up - depending on the maturity of your cloud governance capabilities. We'll get to that later.

Flat hierarchies reduce complexity and avoid vendor lock-in to a single cloud service provider.

In our experience, as well as from a best-practice point of view, your organizational structure should be modeled after your use cases or IT products instead of being aligned with your departments or organizational hierarchy.

Look at your cloud organization from three different angles:

  • How do you want your teams to work?
  • How do you want your IT products to be managed?
  • What does the lifecycle perspective require?

This clearly shows why modeling your organizational structure after IT products or use cases is meaningful:

  1. Start with how you want your teams to work on IT products or use cases. "The right person for the right job" describes pretty nicely what you want, and to gather a team of specialists you may need employees from different departments, as most goals can only be achieved if various departments work cooperatively.
  2. How do you want IT products to be managed? Avoid bottlenecks and enable teams to self-organize and drive innovation! This means you want a self-service approach for each IT product so that the team can be self-sovereign to some degree, e.g. having its own DevOps team, rights and role structure, chargeback, and metadata.
  3. Everything changes, and the organizational structure model should be robust against these changes. Responsibilities of departments for IT products change over time, IT products move between departments, or the departments themselves get reorganized. If your multi-cloud structure is modeled after your departments, each of these events requires major remodeling.

The Importance of a Multi-Cloud-Account Strategy in the Cloud

Most enterprises need a multi-cloud strategy and within these clouds a multi-cloud-account strategy.

Having only one cloud account - or cloud tenant - for each cloud in use brings major drawbacks that have caused serious trouble: Hackers exploited the chaos created by running all applications on one cloud tenant to mine cryptocurrency in Tesla's AWS account.

Here are 4 risks and limitations caused by running both production and non-production services in one cloud tenant:

1. Concentration risk

2. Service limits

3. Permissions and transparency (think of the Tesla case)

4. Cost allocation (also think of Tesla)

Multi-Tenant Approach for better Security, Agility, and Transparency

The solution to avoid these unnecessary and dangerous downsides of running everything in one cloud tenant is - of course - having a cloud tenant for every production and non-production stage of your applications.

It's also a prerequisite for enabling teams to create their own cloud tenants in self-service - we'll get to that later. This approach is great for tenant isolation, innovation, and enablement. But it definitely needs a good grasp on proper cloud tenant management to keep track of every cloud tenant's lifecycle.

Cloud Account Metadata

When creating a cloud account - a cloud tenant - metadata is required to manage the specific new cloud account and the overall number of accounts in the organization. Here are some common examples of cloud tenant metadata:

  • Budget
  • The responsible developer
  • Type of environment: development, staging or production?

Many companies gather this data from the requesting team via a form or survey prior to provisioning the cloud account. Some store this important metadata in a database, an Excel sheet, or simply a PowerPoint presentation.

This manual process is slow and prone to errors and manipulation. Over time, the reality of a specific cloud account - or of all cloud tenants in the organization - will differ dramatically from the static metadata: A single source of truth is missing.
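One step toward a single source of truth is validating the metadata at request time, before the tenant is provisioned. Here is a hypothetical sketch using the example fields from above - a real schema will of course be organization-specific:

```python
# Sketch: reject incomplete tenant requests up front. The required fields and
# allowed environment values below are hypothetical examples.
REQUIRED_FIELDS = {"budget", "responsible", "environment"}
VALID_ENVIRONMENTS = {"development", "staging", "production"}

def validate_metadata(meta):
    """Raise ValueError on incomplete or inconsistent tenant metadata."""
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    if meta["environment"] not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {meta['environment']}")
    return True

validate_metadata({"budget": 5000, "responsible": "jane@example.com", "environment": "staging"})
```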

Cloud tagging needs to be consistent: Establishing authoritative sources for each tag and clarifying responsibilities for keeping tag values up to date is essential.

Metadata needs to be integrated with cloud tenants - meshStack serves as a metadata layer for many of our customers.

Tagging and Labeling Your Cloud Accounts

One way of attaching metadata to a cloud tenant is using tags (Azure and AWS) or labels (GCP). Tagging or labeling cloud tenants increases transparency within the organization:

One central advantage of using the cloud is rapid scalability. And with it comes the necessity to keep track of what is going on in your cloud infrastructure while it is constantly growing and changing. That's where tags come in: You will need a consistent set of tags applied globally across all of your resources, following a consistent set of rules. Tagging is the cornerstone of effective cloud governance: Cost allocation, reporting, chargeback and showback, cost optimization, compliance, and security - all these aspects can only be managed with proper tagging in place.
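A consistent tag scheme is also checkable by machine. In the sketch below, the required tag keys are examples, not a standard - a real tagging policy is specific to your organization:

```python
# Toy tagging-policy check: flag resources missing any required tag key.
# "cost-center", "environment" and "owner" are illustrative keys only.
REQUIRED_TAGS = {"cost-center", "environment", "owner"}

def untagged(resources):
    """Return the names of resources missing at least one required tag."""
    return [r["name"] for r in resources if REQUIRED_TAGS - r.get("tags", {}).keys()]

resources = [
    {"name": "vm-1", "tags": {"cost-center": "4711", "environment": "prod", "owner": "team-a"}},
    {"name": "bucket-7", "tags": {"owner": "team-b"}},
]
print(untagged(resources))  # ['bucket-7']
```

Running such a check continuously - rather than once at provisioning time - is what keeps the tag data trustworthy.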

It all comes down to this mnemonic: Tag early, tag often.

We've covered why tagging is essential; coming up with a cloud tagging strategy should be an early step in setting up your cloud governance.

Here are the most common use cases to show you why:

1. Cloud Cost Management
Gain transparency when it comes to cloud usage and costs: Tagging cost centers, business units, and specific purposes help you keep track.

2. Cloud Access Management
Proper tagging enables administrators to control the access of users or user groups to resources or services.

3. Cloud Security Management
Tags are essential to identify sensitive cloud tenants and keep them secure. Cloud tenant tagging is a matter of compliance and should be treated as such by the central cloud team (the Cloud Foundation Team).

4. Automation
The added metadata of tags enables a whole new level of automation: Many automation tools can read these tags and use them to make your life easier in almost every regard concerning the previously mentioned use cases.
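As a toy example of such tag-driven automation, the sketch below selects resources for a nightly shutdown based on an assumed `auto-off` tag. The tag name and the business hours are made up:

```python
# Hypothetical nightly-shutdown selection driven purely by tags.
def shutdown_candidates(resources, hour):
    """Outside business hours (8-18h), return resources tagged auto-off=true."""
    if 8 <= hour < 18:
        return []
    return [r["name"] for r in resources
            if r.get("tags", {}).get("auto-off") == "true"]

fleet = [
    {"name": "dev-vm", "tags": {"auto-off": "true"}},
    {"name": "prod-db", "tags": {"auto-off": "false"}},
]
print(shutdown_candidates(fleet, hour=22))  # ['dev-vm']
```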

Self-Service Cloud Tenant Creation

We talked about the advantages that draw enterprises to the cloud in the first place: To leverage these advantages you will need a self-service cloud tenant creation for your DevOps teams.

A DevOps team lead should be able to provision a cloud tenant and deploy applications without depending on central IT. Offering self-service cloud account creation requires a high degree of automation. This reduces manual workload and with it the "time-to-cloud" for developers.

The user interface for that has to be easy to use to enable as many users as possible to provision their own cloud tenants.

Automate Tenant Configuration

Automating the cloud tenant creation and enabling users to do so in self-service can only be the first step. Automating the tenant configuration has to follow to keep a consistent level of cloud security, compliance, and transparency throughout all company cloud accounts and across cloud providers.

Depending on whether the newly created cloud account is for development, QA, or production, different configurations and policies should apply automatically. An Azure subscription for production gets a different set of tags and a specific blueprint than, say, an AWS account that's used for staging and is automatically configured with the according landing zone.
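Conceptually, that mapping from environment type to configuration can be a simple lookup table. The landing-zone names and policy sets below are placeholders, not real provider artifacts:

```python
# Illustrative environment-to-configuration mapping for new tenants.
CONFIGS = {
    "production": {"landing_zone": "lz-prod", "tags": {"environment": "production"},
                   "policies": ["deny-public-ip", "require-encryption"]},
    "staging": {"landing_zone": "lz-staging", "tags": {"environment": "staging"},
                "policies": ["require-encryption"]},
    "development": {"landing_zone": "lz-dev", "tags": {"environment": "development"},
                    "policies": []},
}

def configuration_for(environment):
    """Look up the configuration to apply; fail loudly on unknown environments."""
    try:
        return CONFIGS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment}") from None

print(configuration_for("production")["landing_zone"])  # lz-prod
```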

The cloud service providers offer their own tooling and support for third-party tools to automate cloud tenant configuration. AWS has its Account Vending Machine, and GCP provides the Google Project Factory to work with Terraform configuration files. The same is possible with Microsoft Azure.


To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We're looking forward to getting in touch with you.


The Cloud Identity and Access Management Guide for 2021

This is a comprehensive overview of cloud identity and access management.

If you work as an Enterprise Architect, in a Cloud Foundation Team, in DevOps - or you're just interested in cloud identity and access management - this post is for you.

In this post you will learn:

  • What cloud identity and access management is
  • Why good cloud IAM is so important
  • The difference between authentication and authorization
  • The principle of least privilege
  • How identity federation works
  • How the big cloud service providers handle IAM

What is Identity and Access Management?

In the enterprise IT environment, IAM is all about managing the roles, access authorizations, and requirements of individual users. The core task is to assign a digital identity to an individual. Once created, this identity must be maintained, updated, and monitored throughout the entire lifecycle of a user.

Why is good Cloud Identity and Access Management so important?

Authentication identifies and verifies who you are. Authorization determines what an identity can access within a system once it has authenticated to it. The combination of a central identity, authentication, and authorization is a major pillar of cloud security. It enforces that only authorized people can access only those systems that are necessary to fulfill the tasks relevant to their role in the organization. On the other hand, it allows auditing changes in these systems and tracing them back to specific people. A requirement that is getting more and more important when designing an identity and access management system for your organization is to have efficient processes in place so your team can focus on their actual work.

Authentication vs. Authorization

Let's start by looking at authentication:

The authentication process consists of two parts:

The first part of this process is to define who you are, effectively presenting your identity. An example of this would be your login username to your AWS account or environment.

The second part of the authentication process is to verify that you are who you say you are in the first step. This is achieved by providing additional information which should be kept private and secret for security purposes. However, this private information does not have to be a unique value within the system. In the AWS example, you provide your identity in the form of a username to your AWS account, which will be a unique value. The next step is to verify that identity by providing a password.

Authorization deals with the question of what an authenticated user is allowed to do. So here, we are really looking at your access privileges, roles, and permissions.

The Principle of Least Privilege

The principle of least privilege (PoLP) is the concept of granting access to only the resources that are necessary to do the assigned tasks. And within this access only granting the necessary permissions.

It is pretty similar to what you might know from movies about secret agents: They operate on a need-to-know basis to accomplish the mission - in that way they can't endanger the whole operation in case of failure.
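In code terms, least privilege boils down to roles that grant only what a task needs, with everything else denied by default. A minimal sketch with made-up role and action names:

```python
# Hypothetical roles mapping to the minimal set of permitted actions.
ROLES = {
    "log-reader": {"logs:read"},
    "vm-operator": {"vm:start", "vm:stop"},
}

def allowed(role, action):
    """Deny by default: an action is permitted only if the role grants it."""
    return action in ROLES.get(role, set())

print(allowed("log-reader", "logs:read"))  # True
print(allowed("log-reader", "vm:start"))   # False - not needed, so not granted
```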

4 Tips to implement the principle of least privilege

From developer onboarding to long-term management of user and permission lifecycles, managing access to cloud infrastructure is complex and security-critical. Authorizations should be granted as sparingly as possible (principle of least privilege) to reduce security risks. At the same time, the productivity of the teams should not be restricted by lacking access rights or tedious approval and login processes. A simple and transparent process for assigning access rights is therefore essential.

1. Restrict the use of broad primitive roles
The use of primitive roles generally grants more privileges than necessary. Use custom or pre-defined roles that are more specific to limit access to the necessary minimum.

2. Assign roles to groups, not individuals
To keep the assignment of roles maintainable, assign them to groups rather than to individuals. This way you can make sure users don't keep roles when they move to another job in the company or when the group's roles change.

3. Use networking features to control access
Configure resource and application connectivity following the same principle of least privilege to reduce the risk of unauthorized access. The permission to modify network configuration should only be granted to those directly responsible.

4. Consider using managed platforms and services
To limit your responsibilities for security configuration and maintenance of accounts and permissions you might consider using managed platforms and services.

Be Aware of Privilege Escalation

An important aspect to consider when designing processes in the field of identity and access management is privilege escalation. Privilege escalation describes the case where users with a limited set of permissions have the possibility (due to a bug or bad design) to change their own permissions and gain elevated access. When it comes to privilege escalation we distinguish between vertical and horizontal privilege escalation.

Vertical Privilege Escalation: A user obtains higher privileges (more permissions) than intended (e.g. write instead of read permissions)

Horizontal Privilege Escalation: A user obtains privilege to access more resources than intended.

Identity and Access Management in the Cloud

Compared to traditional environments, cloud environments are more dynamic: Resources change frequently, and so do permissions. Nevertheless, when it comes to cloud identity and access management, the same requirements apply as in any local environment. To get to an integral identity and access management, cloud IAM and on-premise IAM should not merely co-exist. Instead, they should be an integral part of the same approach.

To avoid heterogeneous solutions within individual cloud silos it has proven to be best practice to let a central cloud foundation team take over the basic governance of all clouds and thus also of the identity and access management across all clouds.

IAM integration is a requirement for most enterprise systems because you want people to have a single identity to ensure the lifecycle is managed. So cloud needs to be integrated and identities closely monitored as there is sensitive data in the cloud.

Five Common Challenges in Cloud IAM

1. Lack of agility
Existing processes don’t meet cloud-native requirements like self-service, immediate implementation, and scalability

2. Strictly regulated field
Identity and access management is subject to strict regulation (BaFin requirements - PDF) and has established processes outside the cloud world, e.g. joiner/mover/leaver or segregation of duties

3. Missing transparency and risk of shadow IT
There is no cross-cloud overview on existing cloud tenants and related permissions. Undetected shadow IT is a real financial and security risk.

4. Lack of automation
Cloud projects are frequently changing, dynamic environments and come with a great number of IAM objects that are impossible to manage manually

5. Complexity
Large complexity due to the use of multiple clouds, strict separation of environments, multiple roles, flexible teams

The Benefits of Identity Federation

Federated identity means that a third-party identity service vouches for the authenticity of your users - usually by confirming they've entered the correct username and password. It enables users to use their existing directory service credentials to get seamless access to cloud platforms. The benefits of identity federation are:

1. Single Sign-On (SSO)
Seamless access to applications with one set of credentials and authorization through a central identity provider like Microsoft's Active Directory Federation Service (ADFS).

2. Security
Multiple login credentials expose your organization to various risks, including the potential use of easy-to-crack passwords by users. Managing a single set of credentials provides convenience to employees and IT admins and helps in creating a strong, single password that can be rotated regularly.

3. Productivity
The time IT teams spend helping users resolve login issues keeps both parties from doing actual work and solving actual problems.

Azure, AWS and GCP: Identity and Access Management Overview

The cloud providers each handle the topic of identity and access management a little differently.

Here is a little overview:

Azure

Azure Active Directory (Azure AD) is Microsoft's cloud-based identity and access management service, which helps your employees sign in and access:

  • External resources, such as Microsoft 365, the Azure portal, and thousands of other SaaS applications.
  • Internal resources, such as apps on your corporate network and intranet, along with any cloud apps developed by your own organization. For more information about creating a tenant for your organization, see Quickstart: Create a new tenant in Azure Active Directory.

Amazon Web Services

AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.

IAM is a feature of your AWS account offered at no additional charge. You will be charged only for use of other AWS services by your users.

To get started using IAM, or if you have already registered with AWS, go to the AWS Management Console and get started with these IAM Best Practices.

Google Cloud Platform

Google Cloud Identity and Access Management lets administrators authorize who can take action on specific resources, giving you full control and visibility to manage Google Cloud resources centrally. For enterprises with complex organizational structures, hundreds of workgroups, and many projects, Cloud IAM provides a unified view into security policy across your entire organization, with built-in auditing to ease compliance processes.

Leverage Cloud Identity, Google Cloud's built-in managed identity service, to easily create or sync user accounts across applications and projects. It's easy to provision and manage users and groups, set up single sign-on, and configure two-factor authentication (2FA) directly from the Google Admin Console. You also get access to the Google Cloud Organization, which enables you to centrally manage projects via Resource Manager.


To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We're looking forward to getting in touch with you.


Cloud Cost Management

The 2021 Guide to Multi-Cloud Billing and Cost Management

This is a comprehensive guide to multi-cloud billing and cost management.

If you work as a CIO, CFO, in IT financial management, as a Cloud Architect, in DevOps or you are just interested in cloud billing and cost management, this guide is for you.

We'll look at how to establish an automated end-to-end cost management process as part of an organization's cloud governance.

In this new guide you will learn:

  • What cloud cost management is
  • Why cloud cost management is becoming more important
  • What strategies you can use to manage cloud cost
  • The ins and outs of chargeback and showback
  • About the challenges of cloud metering and pay-per-use in the private cloud
  • What to expect and where to go on your cost management journey

Let's get started with the basics:

What Is Cloud Cost Management?

The cloud promises a simple, cost-effective, highly scalable alternative to running your own servers in a data center. But as cloud infrastructure becomes more complex - a lot of companies use multi-cloud architectures - the associated costs become difficult to track and evaluate:

Cloud cost management, or cloud cost optimization, is the effort to gain valuable insights into the costs of cloud usage within the enterprise and to find effective ways to maximize cloud usage and efficiency.

Multi-Cloud Cost: 4 Expensive Factors That Are Easy To Miss

Without proper cloud cost management in place, things can get out of hand and more expensive than they need to be.

Here are 4 factors that drive cloud cost and are easily overlooked:

  1. Un- or underused cloud resources

    Provisioning cloud resources may or may not be difficult in your organization - but de-provisioning is forgotten easily. With the public clouds' pay-per-use model, costs can spiral out of control.

  2. Third-party cloud services

    The usage of complementary cloud services like cloud monitoring or backend connectivity is necessary for DevOps teams. The cost of these services is often not attributed or monitored closely enough to detect the potential for significant savings.

  3. Provisioning and organizational support

    For developers, it's not always easy to get the cloud environments they need for their projects. Manual processes for approving and provisioning their requests are time-consuming and costly.

  4. Projects exceeding budget

    Budgeting cloud projects is common; without the right cost management, it might not be possible to get an early warning on projects that are about to run out of budget. Shutting down a project's cloud resources just because it exceeds its budget is usually not an option, so cloud costs keep climbing.
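An early warning for the fourth factor can be as simple as projecting month-end spend from the month-to-date run rate. All figures in this sketch are illustrative:

```python
# Toy budget alert: project month-end spend linearly from spend so far.
def budget_alert(spent, budget, day, days_in_month=30, threshold=0.9):
    """True if the projected month-end spend crosses the alert threshold."""
    projected = spent / day * days_in_month
    return projected >= threshold * budget

# 600 spent after 10 of 30 days projects to 1800 against a 1500 budget
print(budget_alert(spent=600, budget=1500, day=10))  # True
```

A real system would use calendar-aware dates and non-linear forecasts, but even this naive projection catches runaway projects weeks before the invoice arrives.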

Why Cloud Cost Management is so Important

Cloud usage is an ever-increasing factor in most enterprises. The advantages seem clear - and they are - but the organization needs to adapt to and keep up with these advantages: Rapid scalability, pay-per-use, automation, overall agility - all this needs proper cloud governance to come into effect without exploding costs.

DevOps teams consuming cloud resources as if they were free, multi-cloud architectures adding extra complexity, and private cloud components that don't follow the pay-per-use model all make effective cloud cost management an absolute necessity.

At the beginning of an organization's cloud transformation, the topic might not seem as pressing - but that's actually the right time to lay the organizational and technical foundation for cost-effective growth to avoid costly surprises further down the road.

Is Cloud Cost Management an Issue for Businesses?

A recent survey revealed that cutting cloud costs is the top priority for companies' cloud strategy. They face growing invoices from their cloud service providers and the lack of insight into those costs has considerable financial consequences. Most companies expected their cloud budget to increase by 10 to 25 percent in 2020 while considering the applications they deploy to the cloud as "mission-critical". That makes it critical for business success to put the budget increase to good use and not waste it on hidden and avoidable costs in their cloud organization.

Why a lot of Companies are Struggling with Cloud Costs

Not utilizing the advantages of the cloud means real competitive disadvantages and poses a risk to business success - that's for sure. But going on that journey to the cloud, transforming the organization, and moving workload out of data centers is not an easy task and must not be rushed.

Some companies rushed to the cloud, and with time came the shocking surprise of mounting costs.

Here are some reasons for that (and for you to avoid):

  • No proper cloud governance
  • Not adopting the cloud-native mindset
  • Overlooking the risks of the pay-per-use model
  • Poor visibility of cloud costs (build a cloud cost dashboard)

6 Levers to Pull to Reduce Cloud Costs

There are a number of levers businesses can pull on to manage cloud costs.

Here are 6 of the most promising:

  1. Right-sizing:

    Ensure that the public cloud instances you choose are the right fit for your organization’s needs.

  2. Automatic scaling:

    This allows organizations to scale up resources when needed and scale down the rest of the time, rather than planning for maximum utilization at all times (which can be needlessly expensive).

  3. Reserved Instances:

    Reserved Instances offer a significant discount over on-demand instance pricing (for example, up to 72% on AWS). To make full use of them you have to know what you actually need.

  4. Power scheduling:

    Not all instances need to be used 24/7. Scheduling non-essential instances to shut down overnight or on weekends is more cost-effective than keeping them running constantly.

  5. Removing unused instances:

    If you’re not using an instance, there’s no need to keep it around (and paying for it!). Removing unused instances is also important for security since unused resources can create vulnerabilities.

  6. Discount instances:

    Since discount instances usually do not guarantee availability, they're not appropriate for business-critical workloads that must run constantly. But for occasional use, they can result in a significant cost reduction.
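To get a feeling for lever 4 (power scheduling), here is a back-of-the-envelope calculation with made-up rates, comparing an always-on instance to one that only runs during weekday business hours:

```python
# Rough savings estimate for power scheduling; rate and hours are invented.
def monthly_cost(rate_per_hour, hours_per_day, days):
    return rate_per_hour * hours_per_day * days

always_on = monthly_cost(0.10, 24, 30)  # roughly 72 EUR per month
scheduled = monthly_cost(0.10, 10, 22)  # ~22 weekdays, 10 hours per day
print(f"saving: {always_on - scheduled:.2f} EUR")  # saving: 50.00 EUR
```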

The most effective lever is the organizational transformation to fit the requirements of the cloud: We'll get to the why and how later in this guide.

Multi-Cloud Billing: Showback vs. Chargeback

Strategic considerations aside, showback and chargeback are pretty similar on an operational level. They share the same structure - but of course, there are essential differences:

  • The showback approach does not involve bills or invoices that need to be paid by consumers. IT reports costs and usage while also paying the cloud service providers' invoices, services, and so on. The actual source of these costs is not charged: A DevOps team won't be charged for a VM or a third-party cloud service it uses.
  • In the chargeback approach, each consumer of cloud resources and services is actually billed for what they used. That can mean that money flows from one department to another within a company or just an entry in accounting charging the consuming department.
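Operationally, chargeback amounts to allocating every billed line item to a cost center, for instance via tags. A minimal sketch with illustrative data:

```python
# Toy chargeback: sum billed cost per cost center; untagged spend surfaces
# as "unallocated" instead of silently disappearing.
from collections import defaultdict

def charge_back(line_items):
    totals = defaultdict(float)
    for item in line_items:
        totals[item.get("cost_center", "unallocated")] += item["cost"]
    return dict(totals)

bill = [
    {"resource": "vm-1", "cost": 120.0, "cost_center": "team-a"},
    {"resource": "db-2", "cost": 80.0, "cost_center": "team-b"},
    {"resource": "old-bucket", "cost": 5.0},
]
print(charge_back(bill))
# {'team-a': 120.0, 'team-b': 80.0, 'unallocated': 5.0}
```

An "unallocated" bucket that keeps growing is itself a useful signal: It points at exactly the untagged resources discussed earlier.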

The Strategic Differences of Showback and Chargeback Models

On a strategic level, showback and chargeback are quite different. The question here is why should you use one over the other? What implications does that have on usage behavior and accountability?

Showback does not have any enforcement mechanisms to guide usage behavior. It is basically providing cloud cost transparency and visibility - which is great and useful - without encouraging users to pay attention and be proactive about the costs they produce.

In a chargeback model, users are incentivized to consider the cost of the cloud resources they request and use.

The Pros and Cons of Showback vs. Chargeback

In terms of the detail required, effort, and cost, showback requires less and is easier and faster to execute.

For a chargeback model more detail and effort are required and therefore it can be more expensive.

Both models expose levers to reduce cloud costs and optimize usage. They are both fit to demonstrate the value that the cloud brings to the business.

Chargeback offers total accountability and allows IT to recover costs across all DevOps teams and departments that use IT services.

Metering in the Private Cloud

The private cloud needs a little extra treatment when it comes to cloud billing and cost management. There is no large bill at the end of the month that lists the cloud costs on a pay-per-use basis. To get that, you have to build or buy a metering implementation that does this for you.

Metering is the process of collecting and calculating cloud resource usage. It involves pricing resource usage to calculate the cost.

Cloud platforms record events and other information about deployed cloud resources. Some of these events are relevant for metering. For example, starting and stopping a virtual machine may generate a corresponding stream of events that describes for how long the virtual machine was running. This data from the cloud platform can be used to calculate how many RAM-hours and vCPU-hours a virtual machine consumed in a given period.
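
The start/stop accounting described here can be sketched in a few lines. The event format and the VM flavor (2 vCPU, 4 GB RAM) are illustrative assumptions, not an actual platform's event schema:

```python
from datetime import datetime

# Hypothetical event stream for one VM (timestamps and event types are
# illustrative; real platforms emit richer event records).
events = [
    (datetime(2021, 3, 1, 8, 0), "started"),
    (datetime(2021, 3, 1, 12, 0), "stopped"),
    (datetime(2021, 3, 1, 14, 0), "started"),
    (datetime(2021, 3, 1, 15, 30), "stopped"),
]

def running_hours(events):
    """Sum the hours between each start/stop pair of events."""
    total = 0.0
    started_at = None
    for timestamp, kind in events:
        if kind == "started":
            started_at = timestamp
        elif kind == "stopped" and started_at is not None:
            total += (timestamp - started_at).total_seconds() / 3600
            started_at = None
    return total

# Assumed flavor of the VM: 2 vCPUs and 4 GB RAM.
hours = running_hours(events)
vcpu_hours = 2 * hours
ram_gb_hours = 4 * hours
```

For the example stream above this yields 5.5 running hours, i.e. 11 vCPU-hours and 22 GB-hours of RAM.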

Cloud resources have many different traits - a virtual machine, for example, has RAM and vCPU. In the public cloud, it is common to offer t-shirt sizes for cloud resources. You can build the same for the private cloud and offer VMs in S, M, and L instead of giving exact quantities for RAM or vCPU.
A product catalog defines which of these traits are relevant for metering and how their usage is calculated. Typically, usage is the product of a quantity and a duration, e.g. a single vCPU used for an hour. But there may be other usage units as well that consist only of a quantity (e.g. bytes transferred over the network) or only of a duration (e.g. a resource usage hour).

A product catalog also contains pricing rules that determine the cost for particular resource usage.
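
A minimal product catalog with pricing rules might look like this - the sizes, traits, and prices are made-up examples, not real rates:

```python
# Hypothetical product catalog: t-shirt-sized VM flavors with a price per
# running hour. Sizes, traits, and prices are illustrative assumptions.
catalog = {
    "S": {"vcpu": 1, "ram_gb": 2, "price_per_hour": 0.03},
    "M": {"vcpu": 2, "ram_gb": 4, "price_per_hour": 0.06},
    "L": {"vcpu": 4, "ram_gb": 8, "price_per_hour": 0.12},
}

def cost(size, hours):
    """Apply the catalog's pricing rule: cost = duration * price per hour."""
    return round(catalog[size]["price_per_hour"] * hours, 2)

# An M-sized VM running for a full month (roughly 730 hours):
monthly_cost = cost("M", 730)
```

This is the mechanism that turns metered usage into a priced line item on the internal bill.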

This way, private cloud usage can be accounted for in a pay-per-use model, as is common with the large public cloud providers.

Cloud Billing and Cost Management Maturity Model

At meshcloud, we came up with a six-stage maturity model for cloud billing and cost management. We see these stages when we look at where our customers are on their cloud journey - but not necessarily in that order:
The six stages of cloud billing and cost management maturity:  Private cloud metering, One large bill per cloud provider, Per project cloud consumption, Tenant fee for cloud governance, Charge for cloud services and Cooperation and cost allocation with external partners

  1. Private cloud metering

    The private cloud does not come with a pay-per-use-based invoice at the end of the month, as is standard with the public cloud providers. Private cloud metering has to be built and implemented to get to the pay-by-consumption model.

  2. One large bill per cloud provider

    Every month or quarter, there is one large bill for all cloud services that are used across the organization. There is no way of getting around that bill - and it is a black box.

  3. Per project cloud consumption

    Allocating cost to the consumption in each project is a big leap for cost transparency and visibility. Projects exceeding their budget can be identified early on and adjusted accordingly.

  4. Tenant fee for cloud governance

    A cloud foundation team or a cloud center of excellence takes care of cloud governance: Security and IAM are centralized and the team aims to deliver a good user experience for the DevOps teams. The DevOps teams can focus on their actual work and pay a tenant fee for the services of the cloud foundation team.

  5. Charge for cloud services

    Development and operations rely on more than the cloud providers' native services: Additional cloud services (monitoring, CI/CD tooling, connectivity) are provided by the cloud foundation team and other DevOps teams, which can then charge for them.

  6. Cooperation and cost allocation with external partners

    The internal transformation into a service organization with full chargeback makes it possible to offer the same services to external partners. This is the peak of the cloud billing mountain everybody strives to climb.

The 7 Steps to Multi-Cloud Cost Management Perfection

Let's take our maturity model and see how to get from where you are now in terms of cloud billing and cost management to where you would like to be.

We have 6 stages in our model, but we'll start with step #0 to get the very basics covered:

Step #0 - Laying the Foundation for Cloud Billing and Cost Management

This step is all about making the right choices and laying the foundation. It has nothing to do with cloud billing and cost management directly.

There are two things you need to consider:

  1. Account structure: Projects, folders, subscriptions, resource groups, accounts - all these entities you are confronted with in the cloud need to be mapped to your organizational structure with teams, departments, and products. For example, map an IT product (an organizational construct) to a customer and an application stage like dev, QA, or production to a tenant (Azure subscription, AWS account, or GCP project).

    At meshcloud, the meshProject is the central entity that ties together assigned project users and team leads, cloud tenants with landing zones, the service marketplace, and the approved budget and chargeback.

  2. Metadata and Tagging - Tags or labels are custom to every organization. Common tags are data classification, cost center, and environment. Defining a tagging schema helps with scaling and automating cloud usage. Resource tags and labels can be used to control and analyze costs.
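
A tagging schema like this can be checked in code before resources are created. The keys, allowed values, and the cost center pattern below are illustrative assumptions, not an official schema:

```python
import re

# Illustrative tagging schema - keys, allowed values, and the cost center
# pattern are assumptions for this example.
SCHEMA = {
    "environment": {"allowed": {"dev", "qa", "prod"}},
    "costcenter": {"pattern": r"^ACME-\d{5}$"},
    "dataclassification": {"allowed": {"public", "internal", "confidential"}},
}

def validate_tags(tags):
    """Return a list of schema violations for a resource's tag set."""
    errors = []
    for key, rule in SCHEMA.items():
        value = tags.get(key)
        if value is None:
            errors.append(f"missing required tag: {key}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"invalid value for {key}: {value}")
        elif "pattern" in rule and not re.match(rule["pattern"], value):
            errors.append(f"value for {key} does not match pattern: {value}")
    return errors
```

Wiring a check like this into provisioning is what makes the schema enforceable rather than just documented.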

Step #1 - Splitting up the Cloud Bill

This step is about getting from one big bill to allocated per project costs.

Central to this step is proper showback to increase transparency and visibility for cloud costs.

Why is that important?

For DevOps teams:

For central IT:

  • negotiating better contract conditions with cloud providers
  • defining the service portfolio and identifying which cloud services drive costs, to find alternatives
  • taking cost optimization steps

meshcloud offers per-project usage reports with the corresponding costs to increase transparency. For example, meshcloud makes it easy to split the bill for shared Kubernetes clusters.

Step #2 - Charging the Right People

After raising awareness with proper showback, chargeback is the next logical step.

The large bill that goes to central IT is something you can't avoid. In a worst-case scenario, it is allocated manually to teams and projects: A model that is not sustainable.

Automation is key when it comes to allocating cloud costs:

Aggregate cost per customer → create monthly chargeback statements per project → provide detailed tenant usage reports per platform. This can then be exported to SAP or other tools. The tags and labels from step #0 come into play here. Without them, automation is not possible.
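
The aggregation step of this pipeline can be sketched as follows - the line items and tag names are hypothetical:

```python
from collections import defaultdict

# Hypothetical billing line items, already enriched with the project and
# cost center tags defined in step #0.
line_items = [
    {"project": "webshop-prod", "costcenter": "ACME-10001", "cost": 812.40},
    {"project": "webshop-dev",  "costcenter": "ACME-10001", "cost": 96.10},
    {"project": "analytics",    "costcenter": "ACME-20002", "cost": 455.00},
]

def chargeback_statements(items):
    """Aggregate cost per cost center for the monthly chargeback statement."""
    totals = defaultdict(float)
    for item in items:
        totals[item["costcenter"]] += item["cost"]
    return {costcenter: round(total, 2) for costcenter, total in totals.items()}

statements = chargeback_statements(line_items)
```

The resulting per-cost-center totals are what would then be exported to SAP or other accounting tools.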

Step #3 - Staying within Budgets

This step is about keeping projects in budget by mapping budgets to projects.

For that, you need an end-to-end integration: A project is created and approved with a budget before a single cloud resource is used. As cloud resource consumption can vary greatly over the lifecycle of a project it is not easy to plan a budget that fits exactly and keep it in check.

A common setup we see: A DevOps team lead requests a cloud account, and the cost center manager and central IT approve. The estimated budget is mapped to the project as it is created. As the team works on the project and consumes cloud resources, the resources are tagged and tracked.

All stages of the product - e.g. development, staging, and production - have to share the overall budget for the project. Since you can't cut off a project when the budget is exceeded you need to implement processes that regulate the costs beforehand.

Excursion: Multi-Cloud Organization Done Right

To utilize the cloud to the fullest enterprises have to transform their internal organization to fit the new cloud-native approach. What we at meshcloud see most with our customers is a transformation from a silo-like organization to a centralized cloud foundation team or cloud center of excellence that takes care of cloud or multi-cloud governance.
Visualization of a modern multi-cloud governance. Centralized in a cloud foundation team providing DevOps with good user experience, cloud environments, landing zones and so on.
They integrate platforms, define landing zones, organize provisioning, ensure continuous compliance and provide cross-cloud transparency. All this is centralized in a tool and highly automated. Manual processes are not scalable when DevOps grows. Self-service is implemented to decouple DevOps from cloud governance and the cloud foundation team. That accelerates time-to-cloud for DevOps and time to market for the business.

Step #4 - Cloud Governance to Boost Development

Traditionally DevOps teams have a lot of non-functional things to take care of (compliance, security, general organization, and governance) if not provided by a cloud foundation team. The cloud foundation team should take care of all of that and it can charge for the value it provides with what we call a tenant fee.

This model transforms the cloud foundation team from a cost to a profit center because they charge internally and there is a detailed cost allocation.

Step #5 - The Service Organization

Cloud-native resources from the cloud service providers need to be complemented by other building blocks like backend connectivity, CI/CD-tooling, or multi-cloud monitoring. These cloud services can be provided by the cloud foundation team, other DevOps teams, or by third parties as managed services.

To integrate these services you need cloud-native provisioning (self-service, on-demand) and cloud-native billing (pay-per-use) so as not to lose the advantages the cloud provides. The goal is to avoid manual processes.

A service marketplace is a solution that keeps processes cloud-native and brings service owners and cloud users together. As a basic setup, cloud services are provisioned through the marketplace with plans and pricing to keep track of usage and costs. The service marketplace can also include third-party cloud services like Datadog or a managed MongoDB.

A system like this incentivizes teams to provide services for others as well as to pay attention to their own consumption.

The goal here is: No emails, no phone calls but a fully automated provisioning with costs showing up on the monthly chargeback statement that also contains the cloud costs.

Step #6 - Running IT Like a Business

This is the last step to fully integrated end-to-end cloud billing and cost management. With the service economy set up internally, it is now possible to offer and purchase cloud services to and from external organizations like start-ups or cooperation partners.

The service economy built in this step is based on the foundations you lay in step #0 to get proper isolation and comply with regulatory and internal rules.

With a cloud ecosystem like this, you have everything you need to boost business success in the cloud-native way!


To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We look forward to getting in touch with you.


Building a generic Cloud Service Broker using the OSB API

This post gives an overview of OSB API service brokers and introduces an open-source generic OSB using git.

If you work as an Enterprise Architect, in a Cloud Foundation Team, in DevOps - or you're just interested in implementing the OSB API - this post is for you.

In this post we will answer the questions:

  • How can cloud services be distributed enterprise wide?
  • What are cloud service brokers?
  • How does the OSB API work?
  • What are the advantages of the OSB API?

Also, we'll go into detail on our generic Unipipe Service Broker that uses git to version cloud service instances, their provisioning status, and the service marketplace catalog.

The Service Marketplace as Central Distributor for Cloud Services

Platform services play an increasingly important role in cloud infrastructures. They enable application operators to quickly stick together the dependencies they need to run their applications.

Many IT organizations choose to offer a cloud service marketplace as part of their service portfolio. The marketplace acts as an addition to the cloud infrastructure services provided by large cloud providers like AWS, GCP, or Azure.

Service owners can build a light-weight service broker to offer their service on the marketplace. The broker is independent of any implementations of the service itself.

What exactly is a Service Broker and how does OSB API work?

The service broker handles the order of a cloud service and requests a service instance on the actual cloud platform. The user can choose from a catalog of services - e.g. an Azure vNet instance - enter necessary parameters and the broker takes care of getting this specific instance running for the user. In this example, the cloud service user would specify the vNet size (how many addresses do you need?), if it needs to be connected to an on-prem-network and so on.

A popular choice for modeling service broker to marketplace communication is the Open Service Broker API (OSB API). The OSB API standardizes the way cloud services are called on cloud-native platforms across service providers. When a user browses the service catalog on the marketplace, finds a cloud service useful for their project, and orders it, an OSB API request is invoked to provision the service.
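
Per the OSB API specification, such a provisioning call is a PUT to /v2/service_instances/{instance_id}. Here is a sketch that only builds the request rather than sending it; the instance id, service, plan, and parameters are illustrative:

```python
import json

# Sketch of the provisioning request a marketplace sends to a service
# broker. The instance id, service, plan, and parameters are illustrative.
def provision_request(instance_id, service_id, plan_id, parameters):
    """Build method, path, and JSON body for an OSB API provision call."""
    path = f"/v2/service_instances/{instance_id}?accepts_incomplete=true"
    body = {
        "service_id": service_id,
        "plan_id": plan_id,
        "parameters": parameters,  # e.g. the requested vNet size
    }
    return "PUT", path, json.dumps(body)

method, path, body = provision_request(
    "0a6d1b1e-4f3a-9d6c-example",  # hypothetical id chosen by the platform
    "azure-vnet",
    "small",
    {"address_count": 256, "connect_on_prem": False},
)
```

The broker answers such a request by creating the service instance (or, with accepts_incomplete, starting an asynchronous provisioning operation).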

Building a Cloud Service Broker using the OSB API and GIT

At meshcloud we offer a Service Marketplace with our cloud governance solution that communicates via the OSB API. A service broker has to be built to offer services on this marketplace: We started an open-source project to provide developers with a generic broker called the UniPipe Service Broker.

The idea is an implementation of GitOps.
To understand how UniPipe can help to implement GitOps, consider the following definition:

The core idea of GitOps is (1) having a Git repository that always contains declarative descriptions of the infrastructure currently desired in the production environment and (2) an automated process to make the production environment match the described state in the repository.

The broker implements (1) by creating and maintaining an up-to-date set of instance.yml files for services ordered from a platform that speaks OSB API.
The automated process (2) can then be built with the tools your GitOps team chooses.
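
To make (1) more tangible, a committed service instance file could look roughly like this - the exact layout and field names UniPipe uses may differ, so treat this as an illustrative sketch:

```yaml
# instances/<instance-id>/instance.yml (layout and field names are illustrative)
serviceInstanceId: "0a6d1b1e-4f3a-9d6c-example"   # hypothetical id from the platform
serviceDefinitionId: "azure-vnet"
planId: "small"
parameters:
  addressCount: 256
  connectOnPrem: false
deleted: false
```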

The actual provisioning is done via a CI/CD pipeline triggered by changes to the git repository, using Infrastructure-as-Code (IaC) tools that are made for service provisioning and deployment, like Terraform.

Using git might be a limiting choice for service owners who expect frequent concurrent orders. But from our experience, the majority of service brokers are called more like once an hour than once a second - even at large companies.

Configuration of the Unipipe Service Broker

You can look up everything you need to get started on our Unipipe Service Broker page.

The custom configuration of our generic broker can be done via environment variables. The following properties can be configured:

  • GIT_REMOTE: The remote Git repository to push the repo to
  • GIT_LOCAL-PATH: The path where the local Git Repo shall be created/used. Defaults to tmp/git
  • GIT_SSH-KEY: If you want to use SSH, this is the SSH key to be used for accessing the remote repo. Linebreaks must be replaced with spaces
  • GIT_USERNAME: If you use HTTPS to access the git repo, define the HTTPS username here
  • GIT_PASSWORD: If you use HTTPS to access the git repo, define the HTTPS password here
  • APP_BASIC-AUTH-USERNAME: The service broker API itself is secured via HTTP Basic Auth. Define the username for this here.
  • APP_BASIC-AUTH-PASSWORD: Define the basic auth password for requests against the API

The expected format for the GIT_SSH-KEY variable looks like this:

GIT_SSH-KEY=-----BEGIN RSA PRIVATE KEY-----
Hgiud8z89ijiojdobdikdosaa+hnjk789hdsanlklmladlsagasHOHAo7869+bcG x9tD2aI3...ysKQfmAnDBdG4=
-----END RSA PRIVATE KEY-----

Deployment using Docker

We publish generic-osb-api container images to the GitHub Container Registry. These images are built with GitHub Actions and are publicly available:

$ docker pull ghcr.io/meshcloud/generic-osb-api:v1.0.5

Deployment to Cloud Foundry

In order to deploy the Unipipe Service Broker to Cloud Foundry, you just have to use a configured manifest file like this:

applications:
- name: generic-osb-api
  memory: 1024M
  path: build/libs/generic-osb-api-0.9.0.jar
  env:
    GIT_REMOTE: <https or ssh url for remote git repo>
    GIT_USERNAME: <if you use https, enter username here>
    GIT_PASSWORD: <if you use https, enter password here>
    APP_BASIC-AUTH-USERNAME: <the username for securing the OSB API itself>
    APP_BASIC-AUTH-PASSWORD: <the password for securing the OSB API itself>

$ ./gradlew build # build the jar of the Generic OSB API
$ cf push -f cf-manifest.yml # deploy it to CF

Communication with the CI/CD pipeline

As the OSB API is completely provided by the Unipipe Service Broker, what you as a developer of a service broker have to focus on is building your CI/CD pipeline. An example pipeline can be found here.




Sticky notes on a whiteboard symbolizing a tagging strategy workshop.

Your Path to a Winning Multi-Cloud Tagging Strategy

This is an introduction to cloud resource tagging and labeling: If you are concerned with building a cloud architecture, then this blog post is for you! Tagging and labeling is an early stage topic of your cloud journey. It forms the foundation of organized and structured growth.

In this post we will cover:

  • Why tagging is an integral part of every cloud journey
  • 5 steps to a winning cloud tagging strategy
  • Common use cases of cloud resource tagging
  • How to stay consistent across multiple platforms
  • How meshcloud takes tagging to the next level

What are Cloud Resource Tags?

A tag is a label assigned to a cloud resource to apply custom metadata. Anything is taggable - from the cloud tenant on the top level to resource groups to single resources like virtual machines and databases.

Tags come as key value pairs:

The key describes the kind of tag that is then further specified by its value. For example, the key could be environment and the values could be development, staging, or production.

There are two different kinds of tags: The ones that are automatically generated by the cloud service provider - e.g. instance or subnet IDs - and user-defined tags.

For this post, we'll focus on the user-defined tags since they enable us to freely enrich our cloud resources with the information we consider relevant.

Why a tagging strategy is an absolute must-have

One central advantage of using the cloud is rapid scalability. With it comes the necessity to keep track of what is going on in your cloud infrastructure while it is constantly growing and changing. That's where tags come in: You will need a consistent set of tags to apply globally across all of your resources, following a consistent set of rules. Tagging is the cornerstone of effective cloud governance: Cost allocation, reporting, chargeback and showback, cost optimization, compliance, and security - all these aspects can only be managed with proper tagging in place.

For example, you can build a cloud cost dashboard in under 10 minutes.

It all boils down to this mnemonic: Tag early, tag often.

Five steps to a winning tag management strategy

Tagging early and tagging often requires a tag management strategy that streamlines tagging practices across all teams, platforms, and resources.

The cloud governance team - or cloud foundation team - should take the lead in defining your global tagging strategy.

Here are 5 steps to get you started:

  1. Bring the stakeholders together

    Get everyone involved in the process who will be using tags or might have something to contribute to the integration of the strategy in the overall company processes. Of course, these are DevOps representatives, but also non-technical roles from accounting or marketing or any other group using cloud resources. Meet as a group to get the full picture, hear everybody's concerns, avoid misunderstandings and save yourself the trouble of making changes later. If your organization already uses tags, start with auditing what is there.

  2. Understand the purpose

    It is important to have a common understanding of what problems cloud resource tagging is supposed to solve. Define these questions early on in the process - here are some examples of what they could be:

    Which business unit within the organization should this cost be charged to?

    Which cost centers are driving my costs up or down?

    How much does it cost to operate a product that I’m responsible for?

    Are there unused resources in my dev/test environment?

  3. Focus and keep it simple

    You will not be able to set up an all-encompassing tagging strategy that will be valid for eternity. So don't make that your objective - keep it simple and set your focus. To get started, choose a small set of required tags you will need in the short term and build on them
    as needed. Choose three to five pressing areas you want to understand. For example, focus on cost reporting and align these tags with internal reporting requirements. Aim for an intuitive system to build on - more layers and granularity can be added further down the road.

  4. Define the naming convention

    You will need to decide on a naming convention for your tagging system. This is the backbone of everything you're trying to accomplish with your tagging strategy and must be enforced globally. If your company uses multiple cloud platforms or is planning on doing so, take into account that the platforms have different requirements for character count, allowed characters, case-sensitivity, and so on. You can consult our tags and labels cheat sheet to help you with that.

  5. Document everything and make it count

    Make sure to document everything you agree upon in this cross-sectional team working on the tagging strategy. This documentation should cover the naming convention, the policies when to use which tags, and the reasoning behind these decisions.

An organization-wide tagging strategy should make sure that tagging stays consistent on a global level. But take into account that individual teams or applications may add additional tags for their specific needs as well.

Common Use Cases for Cloud Resource Tagging

We've been talking about how tagging is essential and coming up with a tagging strategy should be an early stage step in setting up your cloud governance.

Here are the most common use cases to show you why:

  1. Cloud Cost Management

    Gain transparency when it comes to cloud usage and costs: Tagging cost centers, business units, and specific purposes help you keep track.

  2. Cloud Access Management

    Proper tagging enables administrators to control the access of users or user groups to resources or services.

  3. Cloud Security Management

    Tags are essential to identifying sensitive resources and keeping them secure. For example, tagging the confidentiality classification helps to find the S3 bucket that's public and definitely shouldn't be - or to prevent that from happening in the first place (we'll come to that later).

  4. Automation

    The added metadata of tags enables a whole new level of automation: Many automation tools can read these tags and utilize them to make your life easier in almost every regard concerning the previously mentioned use cases.

Challenges of Tagging in Multi-Cloud Architectures

Most companies use multiple cloud platforms and - in one way or another - struggle with the governance of their cloud architecture. Tags are here to help! BUT there are a few caveats that you need to be aware of to actually make things better.

Each cloud platform has its own tagging constraints - Google doesn't even call them tags but labels.

These questions will come up:

  • How many tags per resource are possible?
  • How many characters per tag and which characters are not allowed?
  • Is there a difference in requirements for keys and values?
  • What exceptions are there?

To help you with that we've created our Cheat Sheet for Tags and Labels on Cloud Platforms. There you can look up the differences in Azure, AWS, and GCP tagging and labeling.
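
As a sketch, such platform limits can be encoded and checked before tags are applied. The numbers below reflect commonly documented values (e.g. 50 tags per resource on AWS and Azure, 64 labels on GCP) but are assumptions here - verify them against the cheat sheet or the providers' current documentation:

```python
# Rough per-platform tag limits. These reflect commonly documented values
# (assumptions - verify against the providers' current documentation).
CONSTRAINTS = {
    "aws":   {"max_tags": 50, "max_key": 128, "max_value": 256},
    "azure": {"max_tags": 50, "max_key": 512, "max_value": 256},
    "gcp":   {"max_tags": 64, "max_key": 63,  "max_value": 63},
}

def fits(platform, tags):
    """Check a tag set against a platform's basic count and length limits."""
    limits = CONSTRAINTS[platform]
    if len(tags) > limits["max_tags"]:
        return False
    return all(
        len(key) <= limits["max_key"] and len(value) <= limits["max_value"]
        for key, value in tags.items()
    )
```

A check like this catches tag sets that would be rejected by one platform but not another before they cause inconsistencies.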

Consistency in the usage and naming of tags becomes even more important when working in a multi-cloud architecture. It is especially critical if you want to run queries based on tags - inconsistencies and typos can ruin the whole point of what you were trying to achieve.

Making the Most of Tagging with meshcloud

Now that we've covered what tags are, what they are good for and how to create a tagging strategy to drastically expand the possibilities for cloud governance, we'll talk about how meshcloud takes this to a whole new level:

With meshcloud, cloud governance or cloud foundation teams can define tags globally in one single place. This is incredibly helpful in keeping tags consistent across all platforms, teams, and resources.

meshcloud enables you to set and enforce tag formats, patterns, and constraints globally and make them work with all cloud platforms. With meshcloud, you define your tags as JSON, and these can be entered in the UI either by employees themselves or only by administrators.

A code example that shows tag definition in meshcloud.
Tag definition example, in this case for classifying the environment of a project.

UI showing tags previously defined as JSON
The JSON you define will render into UI for your users

meshcloud enables cloud foundation teams to enforce possible tag values at a very granular level. You'll never have to worry about team members making typos or using different values for your tags. It is even possible to enforce the format of values using RegEx. For example, if your cost centers look like ACME-12345, you can enforce this format globally for all clouds.

And, remember when we discussed tag constraints on cloud platforms? We got you covered here. If a tag value is not valid in a cloud platform, meshcloud automatically converts this value to a valid value inside of the cloud. For example, GCP would not allow www.meshcloud.io as a value. It will automatically be converted to www_meshcloud_io, which is a valid GCP value.
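
A conversion like this can be sketched with a simple character substitution. The rule set here is a simplification (an assumption): broadly, GCP label values allow lowercase letters, digits, dashes, and underscores, with a 63-character limit:

```python
import re

# Sketch of converting an arbitrary tag value into a GCP-compatible label
# value. Simplified rule set (an assumption): lowercase letters, digits,
# dashes, and underscores, at most 63 characters.
def to_gcp_label_value(value):
    sanitized = re.sub(r"[^a-z0-9_-]", "_", value.lower())
    return sanitized[:63]
```

With this rule, www.meshcloud.io becomes www_meshcloud_io, matching the conversion described above.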

Implementing your global tagging strategy across all clouds is not the only value meshcloud has to offer. With our policies we enable our customers to set and enforce rules based on tags across all platforms, teams, projects, and landing zones. This gives cloud foundation teams even more control over who has access to what. For example, you could enforce a certain Azure blueprint to be only used for production projects. Or you enforce that teams can only create projects for the environment they have been approved for. This makes sure that teams will not create production projects without being approved first.

Authors: Wulf Schiemann and Jelle den Burger





Cloud Infrastructure Services: Enterprise-wide Distribution via a Marketplace

This is a comprehensive overview of provisioning cloud infrastructure services in 2020.

If you work as an Enterprise Architect, in a Cloud Foundation Team, in DevOps - or you're just interested in cloud infrastructure services - this post is for you.

In this post you will learn:

  • What cloud infrastructure services are
  • What benefits you can expect from them
  • Who the stakeholders are – service user and owner
  • About the role of Cloud Foundation Teams
  • About the challenges posed by provisioning of services
  • How a service marketplace helps you offering and using infrastructure services

If you want to dive deeper, this guide will go into more detail on:

  • The Open Service Broker API
  • How service owners can move from cost to profit center
  • Platform service model classification

Chapter 1: Fundamentals of Cloud Infrastructure Services

Let's get started with a quick intro to cloud infrastructure services.

Specifically, we will define what cloud infrastructure services are, what kinds there are, and what they are good for.

What is a Cloud Infrastructure Service?

Any service needed to actually run workloads in the cloud is a cloud infrastructure service: These services are provided by AWS, Azure, or GCP for their platforms, but can also be provided by third-party vendors or by the developers using the cloud.

Here are a few examples to get a quick and intuitive understanding of what cloud infrastructure services are: A database is such a service or a monitoring stack or the backend connectivity you need to run your application in the cloud. Another example would be a managed CI/CD service like Gitlab or Azure DevOps that is used in addition to the native infrastructure to follow cloud-native best practices.

You can think of cloud services as infrastructure building blocks - standardized and reusable - for any imaginable purpose.

What are Cloud Infrastructure Services good for?

Platform services play an increasingly important role in cloud infrastructures. They enable application operators to quickly stick together the dependencies they need to run their applications. For example, when deploying your application and you need a database, you just request a service instance of the database of your choice and connect it to your application. Done.

Managed services especially - we'll get to that in the next section - enable developers to focus on their application code and not on operations and dependencies or ordering processes.

The basis is the IaaS services provided natively by the cloud service providers: You can create any other - more complex - service on top of them. Organizations strive for higher-level services because it's less effort, accelerates time-to-market, and leverages existing knowledge.

What kinds of Cloud Infrastructure Services are there?

Now for a brief dive into the categories that cloud infrastructure services come in.

The first thing we can ask ourselves is by whom the services are provided:

On the one hand, there are the cloud service providers (CSPs), who offer first-hand services for their platforms.

Then there are a large number of third-party providers who bring their services to users via the corresponding cloud provider marketplaces.

Very often, however, services are also provided inhouse in order to meet very specific requirements.

We can also categorize cloud services by their operating model:

First there is the dimension and the level of management of services.

They come as

  • unmanaged,
  • operated and
  • managed services.

Unmanaged services are provisioned by the vendor, and from there on the service user is on their own.

Operated services are monitored by the vendor and recovered if anything goes wrong. The effects of the service on the users' applications and data are still out of the vendor's scope in this case.

Managed services are basically run by the vendor: The responsibility of keeping the service available, providing data backups, recovery, and continuous updates makes this the most comfortable and most expensive way of consuming services.

There is a second dimension to service classification:

Is the service a white- or a blackbox to the user?

Whitebox: Details of the service deployment are visible to the user.

Blackbox: The service only offers a defined interface to control your service instance, but no insight into where or how this service is deployed or operated.

Chapter 2: Stakeholders of Service Provisioning

First of all: It is important to know who is responsible for what aspect of a cloud infrastructure service:

The service owner and the service user are the two parties that share interest and responsibility in the service.

The Service Owner

Services can be owned by CSPs like AWS, Azure, and GCP, by third-party companies that offer special monitoring or connectivity services, or by corporate IT or DevOps teams. The owner usually charges for the usage of the service.

The Service User

Service users are usually developers and operators who work on applications in the cloud. The user usually bears the cost of the service instance.

The Role of Inhouse Cloud Service Owners

Cloud foundation teams, network teams, or database teams are usually in the role of a service provider: They offer databases-as-a-service or monitoring services to the developers in the company, focusing on cross-sectional topics that are relevant for many cloud applications. More often than not, inhouse service owners cannot charge for the services they provide - missing billing processes are a major challenge here.

Chapter 3: The Challenges of Provisioning Services

Cloud infrastructure services arise from the demands of the development teams that work with cloud infrastructure: They need monitoring, databases, connectivity, and so on, to do their job.

Where can developers find the right services?

Depending on the cloud technology they work with, developers have a catalog of third-party services to choose from: Google runs its Google Cloud Marketplace, Microsoft offers services in the Azure Marketplace, and Amazon has its AWS Marketplace.

From operating systems to data analytics and machine learning services - there are solutions of every category.

Challenges for enterprise-wide service distribution

Even with the countless first- and third-party services in the cloud providers' marketplaces, there is a massive need for customized solutions:

In a worst-case scenario, the developers can't find a service that solves their problem and they build a solution themselves.

This leads us to the challenges that arise when offering and consuming services within an organization:

  • every team reinvents the wheel
  • opaque and time-consuming processes
  • manual provisioning, e.g. having to send a mail to the service owner
  • no central overview of which services are available within the organization
  • resource- and time-consuming service delivery when self-service is not available
  • lack of billing

Supply and demand of services

This set of challenges leads to the fundamental issue of service development and consumption in an organization: Potential service owners are deterred from developing services. Offering a service turns them into a cost center, and every new service adds provisioning effort.

In the next chapter, we'll see how a service marketplace addresses this issue: It motivates service owners by letting them offer services company-wide, gain visibility, and charge for the usage of their services.

Chapter 4: The Service Marketplace

Many IT organizations choose to offer a Service Marketplace as part of their service portfolio. The Marketplace acts as an addition to the cloud infrastructure services provided by large cloud providers like AWS, GCP, or Azure.

The 4 components of a marketplace

Four components make up a cloud service marketplace: a service catalog, provisioning, billing, and service monitoring.

  1. The catalog requires the service owners to create intuitive experiences to discover the right product and help developers choose the right solution. The marketplace will also need to have enough available inventory, which means continually incentivizing service owners to provide new services.
  2. Provisioning acts as the backbone of the whole system. The marketplace has to manage which resources need to be created or deleted, it has to deliver the configuration and provision dedicated service instances.
  3. Billing provides the system of record. This system needs to implement several different billing models, payment methods, and processing.
  4. Lastly, service monitoring allows the service owner to check on running service instances and take care of failed services.

An active marketplace needs these critical systems to work smoothly together.

The major use case of a marketplace working like this is the possibility of offering services in a multi-cloud federation.

Benefits of a service marketplace

We learned that cloud services are the building blocks to a more complex cloud infrastructure. We also learned that there are several ways of offering and accessing services.

A service marketplace is the best option, and here is why:

  • Self-service
  • Chargeback
  • Service monitoring

Marketplace implementation with Open Service Broker API

At meshcloud, we've implemented our Service Marketplace solution based on the Open Service Broker API.

The Open Service Broker API defines an interface between platforms and service brokers. Service owners can build a lightweight service broker to offer their service on the marketplace. The broker is independent of any implementation of the service itself.

With this technology, services of every category can be offered: managed or unmanaged, blackbox or whitebox.
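To make this concrete, here is a minimal, hypothetical sketch of the two core OSB API calls a broker answers: fetching the catalog and provisioning a service instance. The service name, plan, and dashboard URL are invented for illustration; a real broker serves these endpoints over authenticated HTTP as defined by the OSB specification.

```python
# Hypothetical in-memory service broker sketch; "example-db", its plan,
# and the dashboard URL are invented for illustration only.
CATALOG = {
    "services": [{
        "id": "a3f8e7c2-0000-4000-8000-000000000001",  # OSB ids should be GUIDs
        "name": "example-db",
        "description": "Managed PostgreSQL for internal teams",
        "bindable": True,
        "plans": [{
            "id": "a3f8e7c2-0000-4000-8000-000000000002",
            "name": "small",
            "description": "1 CPU, 2 GB RAM",
        }],
    }]
}

PROVISIONED = {}  # instance_id -> requested configuration

def handle(method, path, body=None):
    """Dispatch the two core OSB API calls: fetching the catalog and
    provisioning a dedicated service instance."""
    if method == "GET" and path == "/v2/catalog":
        return 200, CATALOG
    if method == "PUT" and path.startswith("/v2/service_instances/"):
        instance_id = path.rsplit("/", 1)[1]
        PROVISIONED[instance_id] = body or {}
        return 201, {"dashboard_url": "https://dashboard.example.com/" + instance_id}
    return 404, {"description": "not found"}
```

In the OSB model, binding, deprovisioning, and asynchronous operations follow the same request/response pattern, so the broker stays lightweight.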

This self-service, on-demand marketplace increases developer velocity and minimizes time to deliver value to the market.

This graphic shows how we at meshcloud implemented a marketplace to bring both service owners and users together.

About the Open Service Broker API
The Open Service Broker API project allows independent software vendors, SaaS providers, and developers to easily provide backing services to workloads running on cloud-native platforms such as Cloud Foundry and Kubernetes. The specification, which has been adopted by many platforms and thousands of service providers, describes a simple set of API endpoints that can be used to provision, gain access to, and manage service offerings.
The project has contributors from Google, IBM, Pivotal, Red Hat, SAP, and many other leading cloud companies.


To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We're looking forward to getting in touch with you.


Multi-Cloud Monitoring: A Cloud Security Essential

This is an introduction to cloud monitoring: If you work as a cloud operator or developer or you want to learn about cloud monitoring - this blog post is for you!

In this post you will learn:

  • What cloud monitoring is
  • How it helps you secure business success
  • How monitoring and alerting connect
  • About different types of monitoring
  • How Prometheus and cAdvisor work

Let's get started with the basics!

Cloud Monitoring: Definition and Challenges

Monitoring helps you understand the behavior of your cloud environments and applications.
Technically speaking, in IT, monitoring refers to observing and checking the state of hardware or software systems, essentially to ensure the system functions as intended at a specific level of performance.

Monitoring in cloud environments can be a challenging task. Since consumers have no control over all layers of the infrastructure, monitoring is limited to the upper layers, depending on the cloud service model. In addition, cloud consumers frequently run containerized applications. Containers are intended to be short-lived, and even when they live longer, we don't rely on them for storing data, for example. This dynamic nature makes containers challenging to monitor. Tools such as Prometheus with cAdvisor address this challenge - more on that in the two bonus sections at the end of this blog post.

Five reasons why cloud monitoring helps business success

Here are five reasons why good monitoring helps you secure business success:

  1. Increase system availability: Don't let users take the place of proper monitoring and alerting. When an issue occurs on a system that is not being monitored, it will most certainly be reported by the users of that system. Detect problems early to mitigate them, before a user is disrupted by them.
  2. Boost performance: Monitoring a system leads to a more detailed understanding of it. Flaws become visible, and developers can pinpoint and fix problems for better performance.
  3. Make better decisions: Detailed insight into the current state of a system allows more accurate decision-making based on actual data analysis.
  4. Predict the future: Predicting what might happen in the future by analyzing historical data is very powerful. An example is so-called pre-emptive maintenance; performing maintenance on parts of the system that have a high probability of failing soon, given the historical data provided.
  5. Automate, automate, automate: Monitoring greatly reduces manual work. There is no need to manually check system components when a monitoring system does the checks instead.

Monitoring and Alerting

Monitoring is usually linked to alerting. While monitoring introduces automation by pulling data from running processes, alerting adds even more automation by alerting developers when a problem occurs.

For example: Alerting if a critical process stops running.

Another important reason to monitor is conforming to Service Level Agreements (SLA). Violating the SLA could lead to damage to the business and monitoring helps to keep track of the agreements set in the SLA.
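As a back-of-the-envelope illustration of SLA tracking (the target and downtime numbers here are made up), checking conformance to an availability SLA is simple arithmetic over monitored downtime:

```python
def sla_compliance(total_minutes, downtime_minutes, target_percent=99.9):
    """Compute availability from monitored downtime and check it
    against an SLA target (99.9% allows ~43 min/month of downtime)."""
    availability = 100.0 * (total_minutes - downtime_minutes) / total_minutes
    return availability, availability >= target_percent

# 50 minutes of monitored downtime in a 30-day month (43,200 minutes)
availability, compliant = sla_compliance(30 * 24 * 60, 50)
# availability is about 99.88%, so a 99.9% SLA would be violated
```

Without monitoring the downtime in the first place, there is no reliable way to know whether the agreement is being kept.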

The Different Types of Monitoring

To classify types of monitoring we can ask two questions:

What is being monitored?

and

How is it being monitored?

To the first question there are many answers:

  • Uptime monitoring: As the name suggests, this checks that a service is up and reachable.
  • Infrastructure monitoring: In the cloud world, infrastructure differs from traditional infrastructure in that resources are software-based, i.e. virtual machines and containers. It is important to monitor these resources, since they are the base of running processes and services.
  • Security monitoring: Security monitoring is concerned with SSL certificate expiry, intrusion detection, and penetration testing.
  • Disaster recovery monitoring: Backing up stored data is always a necessary practice, and monitoring the backup process ensures it completes properly within its intended timeframe.

Now to the second question: How is it being monitored?

This lets us differentiate between Whitebox and Blackbox monitoring:
Illustration of whitebox and blackbox monitoring. Credits to https://medium.com/@ATavgen for the illustration idea.

Whitebox monitoring: This type refers to monitoring the internals of a system. A monitored application's running process exposes information about itself, making it visible to the outside world. Exposed information can be in the form of metrics, logs, or traces.

Blackbox monitoring: This type refers to monitoring the behavior of an object or a system, usually by performing a probe (e.g. sending an HTTP request) and checking the result - such as pinging a server to check request latency. This type does not inspect any internals of the application.
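A minimal blackbox probe can be sketched with the standard library alone; the URL and timeout are whatever endpoint you want to check, and real monitoring systems add scheduling and alerting on top:

```python
import time
import urllib.request
import urllib.error

def probe(url, timeout=5.0):
    """Blackbox HTTP probe: request the URL, measure latency, and
    report health without inspecting any application internals."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, time.monotonic() - start, resp.status
    except (urllib.error.URLError, OSError) as exc:
        # Unreachable or failing endpoint: unhealthy, but still timed
        return False, time.monotonic() - start, str(exc)
```

A monitoring system would run such probes periodically and alert when the health flag flips or latency rises.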

The concept of whitebox and blackbox is also used in software testing, with a meaning similar to monitoring: It likewise distinguishes between testing the internals and the externals of a software system. The difference is that software testing usually occurs during development, while monitoring applies to software that is already running.

4 Tips for monitoring cloud security

Correct monitoring will tell you if your cloud infrastructure functions as intended while minimizing the risk of data breaches.

To do that there are a few guidelines to follow:

  • Your monitoring tools need to be scalable to your growing cloud infrastructure and data volumes
  • Aim for constant and instant monitoring of new or modified components
  • Don't rely on what you get from your cloud service provider alone - you need transparency in every layer of your cloud infrastructure
  • Make sure you get enough context with your monitoring alerts to help you understand what is going on

You can and should monitor on different layers (e.g. network, application performance) and there are different tools for doing this. SIEM (Security Information and Event Management) tools collect data from various sources. They process this data to identify and report on security-related incidents and send out alerts whenever a potential risk has been identified.

Bonus 1: Prometheus Architecture

As promised a short excursion to Prometheus:

Prometheus is a metric-based, open-source monitoring tool written in Go. It was the second project, after Kubernetes, to graduate from the CNCF, and it remains fully open-source. Prometheus has its own powerful query language called PromQL for performing operations in the metric space, and it uses its own time-series database (TSDB) for storage.

Prometheus architecture illustration

Prometheus uses service discovery to find targets, or it can use statically defined targets. It scrapes these targets, which are either applications that expose Prometheus metrics directly through the Prometheus client libraries, or exporters that translate data from third-party applications into metrics Prometheus can scrape.
Prometheus stores the scraped metrics in its own time-series storage and evaluates alert rules against this stored data. Once a condition is met, alerts are sent to Alertmanager, which in turn sends a notification to a configured destination (email, PagerDuty, etc.).

Prometheus time-series data can also be visualized by third-party visualization tools such as Grafana. These tools leverage the Prometheus query language to pull time-series data from Prometheus storage.

Bonus 2: Container Monitoring using cAdvisor and Prometheus

cAdvisor (Container Advisor) is a tool to tackle the challenge of monitoring containers. Its core functionality is making the resource usage and performance characteristics of containers transparent to their users. cAdvisor exposes Prometheus metrics out of the box. It is a running daemon that collects, aggregates, processes, and exports information about running containers. cAdvisor supports Docker and pretty much every other container type out there.

To get started you'll need to configure Prometheus to scrape metrics from cAdvisor:

scrape_configs:
- job_name: cadvisor
  scrape_interval: 5s
  static_configs:
  - targets:
    - cadvisor:8080

Then create containers - with Docker, for example - that run Prometheus, cAdvisor, and an application, and you will see metrics produced by your containers, collected by cAdvisor, and scraped by Prometheus.

Authors: Mohammad Alhussan and Wulf Schiemann





Ensuring Continuous Compliance in the Cloud

Ensuring continuous compliance in dynamic multi-cloud architectures is quite the task. The challenge: Achieving and maintaining the required level of compliance across all environments. To many this seems more complex than traditional data center operations. But cloud computing opens up possibilities to stay compliant that by far outweigh the challenges it may add.

In this post you will learn

  • what we mean by continuous compliance,
  • about the typical challenges of implementing continuous compliance,
  • why a declarative approach is superior
  • and as a bonus: What tools and services Azure, AWS and GCP offer you to stay compliant.

Let's get started!

What is continuous compliance?

For compliance efforts to make any sense they have to be ongoing. That means they have to stretch way beyond the initial setup and migration. Continuous compliance ensures a compliant state of all cloud environments at any point - especially in day-to-day operations.

A failure to comply - e.g. with the European GDPR - can result in substantial fines and loss of reputation.

Continuous compliance is a matter of culture and strategy in your organization - and of using the right tools and services to actually live up to the set standards in practice.

Continuous compliance frees IT departments from only reacting to regulators or threats to data security: Well-implemented continuous compliance practices prepare the organization for future security threats and audit requirements.

The 3 hurdles to get over when implementing continuous compliance

There are 3 major hurdles any organization needs to clear when it comes to continuous compliance and moving to the cloud:

  1. Evaluation,
  2. building,
  3. scaling.

The evaluation of general and industry-specific regulation is the first - easier - step. It's followed by the assessment of possible cloud platforms and the definition of enterprise-specific compliance requirements.

By building, we mean the challenge of implementing the continuous compliance strategy organization-wide, spanning all teams, environments, and applications.

The last hurdle is making the continuous compliance strategy and implementation fit for scale. New cloud platforms, new projects, and the dynamic change of existing environments must all be incorporated in continuous compliance efforts.

Continuous Compliance in Cloud Computing

Moving workloads to the cloud is a complex operation, especially in multi-cloud architectures spanning multiple cloud service providers like AWS, Azure, and GCP. Achieving and maintaining compliance across clouds and applications can seem more difficult than traditional data center operations.

On this timeline, you can see major compliance milestones in the cloud migration process and during operations: From evaluation to building and scaling.


The added complexity is outweighed by the added transparency the cloud offers: Cloud technology allows you to audit, query, alert and resolve issues on a grand scale across all environments.

There is no denying it - initial definition is complex: A service provided by central IT - let's say a jump host - may take weeks or even months to get security and compliance clearance. But services provided by the cloud, like logging and anomaly detection, paradigms like CI/CD, and automation are great tools to overcome complexity and build on a large scale.

The extent to which companies can utilize the new options and what it will cost them depends on the approach they opt for:

We discern between the workflow-centric and the declarative approach.

Challenges of the workflow-centric approach

A common way to speed up slow manual processes is to automate the workflow.

So, for example, instead of having an Azure admin manually create and configure a subscription for a DevOps team, a script automates the workflow to reduce the time needed.

But what happens if the DevOps team lead goes ahead and changes the setup to better suit the application's needs? Right: configuration drift, and no one will be aware of environments becoming non-compliant.

To detect non-compliant environments, compliance monitoring can be introduced subsequently: It raises an alarm if compliance policies are violated, and a workflow must then be triggered again to resolve the discrepancies.

The declarative approach: Taking continuous compliance a step further

A superior approach is to define the desired state. That is what we mean by the declarative approach. It is the final and most mature stage in our multi-cloud management maturity model. It offers a lot of potential to take the hurdles of building and scaling we talked about earlier.

The declarative approach focuses on the what as opposed to the how of the workflow-centric approach: The declarative approach has the benefit that it enables a continuous validation of the actual state against the defined desired state (re-certification) and provides a single source of truth to avoid configuration drift.

To stick with the Azure example, this could be an Azure subscription with access permissions for a DevOps team lead and one of their team members. This desired state definition can be continuously compared to the actual state. If no subscription or permissions exist yet, they will be set up initially. If the DevOps team lead changes the configuration, this will be detected. If the change is intended, the desired state can be updated; if not, the action can be undone to get back to the desired configuration.
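The reconciliation at the heart of this can be sketched in a few lines. The role names and users below are invented for illustration, and this only demonstrates the declarative principle, not meshcloud's actual implementation:

```python
# Hypothetical desired state: subscription permissions as declared
# for a DevOps team lead and one team member (names are made up).
DESIRED = {
    "alice@example.com": "Owner",        # DevOps team lead
    "bob@example.com": "Contributor",    # team member
}

def reconcile(desired, actual):
    """Compare the actual permissions against the desired state and
    return the changes needed to restore a compliant configuration."""
    changes = []
    for user, role in desired.items():
        if user not in actual:
            changes.append(("grant", user, role))
        elif actual[user] != role:
            changes.append(("correct", user, role))
    for user, role in actual.items():
        if user not in desired:
            changes.append(("revoke", user, role))
    return changes
```

Run continuously, such a loop both provisions missing permissions initially and undoes unintended changes later - the same definition covers setup and drift correction.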

The declarative approach covers both technical and organizational compliance measures: Tools like Azure Resource Manager templates help you describe the desired state to ensure continuous technical compliance.

The same is possible for continuous organizational compliance: As part of the government-funded MultiSecure project, meshcloud enables enterprises to describe organizational structures in a declarative format.

Let's have a quick look at an example: Productive cloud environments do not only have to follow specific configurations - provided by landing zones - but organizations have to make sure that only authorized staff creates and accesses these environments.

The idea of MultiSecure is to describe organizational elements and their relationships as code in an open and reusable format - a declarative manifest that represents the desired target state of the organization. MultiSecure centralizes this information in an open format, instead of squeezing the organization into the organizational models envisaged by the cloud providers and thus maintaining multiple proprietary models in parallel. It builds a projection of the organization that can be consumed by different systems.

A practical look at using meshcloud

Let's have a look at how you can prevent unauthorized and unintended permission changes in your cloud environments using meshcloud:

The creation of new cloud environments often comes with a certain permission set. A DevOps team lead creates a project and receives permissions to access and edit the corresponding cloud environments and so do the other DevOps team members who work on the application deployment.

If DevOps teams receive cloud-native access to the clouds, such permissions are prone to unintended or unauthorized change.

To prevent this configuration drift, the permissions must be monitored. With meshcloud - following the declarative approach - the desired state of configurations is continuously checked against the actual state of the subscription. In case of deviations, the configurations will be automatically restored to maintain a compliant state of all environments.

Bonus: A quick look at Azure, AWS and GCP compliance services

The big public cloud platforms offer a range of resources, tools, and services to help their customers implement their continuous compliance strategies.

Let's have a look!

Microsoft Azure:

Microsoft puts its Azure Trust Center forward to explain what Azure offers in terms of compliance: From audit reports and compliance offerings (including regulation and certification like GDPR or ISO 27001) to their understanding of shared responsibility.

With Azure Security Center Microsoft offers an infrastructure security management system to protect cloud and data center workloads.

Further Azure services include

  • Azure Sentinel (cloud native SIEM and security analytics)
  • Azure Policy (implementing governance and standards for Azure resources)

Amazon AWS:

Amazon offers what they call the AWS Security Hub. It provides insights into the security state of AWS environments and helps to check against set security and compliance standards.

AWS Systems Manager provides visibility and control to manage infrastructure securely at scale. It helps to maintain compliance by detecting policy violations.

Google Cloud Platform:

In its Cloud Compliance Resource Center, Google collects all important information on the tools and services GCP offers to help customers stay compliant on its platform. Google provides a wide variety of compliance offerings - global and regional.

With Google Anthos, there is now a service that lets you enforce security and compliance policies across all cloud environments.

GCP also supports third-party services like Forseti Security that provide monitoring, enforcement, and display of policies.




Cloud Exit Strategy: Ensure Compliance and Prevent Vendor Lock-in

A clear cloud exit strategy is absolutely necessary when moving to the cloud. Ensuring both compliance and business continuity - and avoiding vendor lock-in - are the primary reasons.

Today, large companies can no longer do without the advantages of the cloud. The competitive pressure of digitalization requires migration to the cloud. At the same time, high regulatory demands are being placed on outsourcing - especially in critical industries. And using public cloud is outsourcing!

Placing oneself in the hands of the hyperscalers involves risks. You generally don't have to worry about security: Amazon, Microsoft, and Google do a lot to keep their offerings safe. Compliance and vendor lock-in are a different matter: It is important to clarify which data and solutions end up with which cloud providers. For this purpose, a clear exit strategy is required in addition to the (multi-)cloud architecture and the cloud sourcing strategy. A high-profile example is that of Dropbox leaving AWS in favor of hosting their workloads themselves.

Regulatory Requirements

In certain critical industries, a documented exit strategy is more than just a good idea, it is a regulatory requirement.

The banking sector is one of the most heavily regulated industries. The regulators also deal with the cloud use of companies. The European Banking Authority, for example, requires an exit strategy for outsourced critical or important functions in its EBA Guidelines on outsourcing arrangements under chapter 15. This includes the use of the public cloud.

The German financial supervisory authority BaFin also prescribes the development of an exit strategy in its banking supervisory requirements for IT.

4 aspects of vendor lock-in

Vendor lock-in means not being able to shift workloads from one vendor to another without too much hassle. The recent downtime at IBM shows the great advantage of being able to do so.

There are several possible reasons for not being able to do so:

  • Cost
  • Contracts
  • Skills
  • Technology

Cost is a major factor prohibiting migration from one vendor to another. The vendor might charge for exporting data, and on top of that, costs pile up for training staff, hiring consultants, and lowered productivity. The larger the workload, the larger the costs. A good example is Netflix: The streaming service is all in on AWS and won't be able to change that - at least not at a reasonable cost.

Contracts can play a big role in vendor lock-in. Some cloud service providers make it hard to decide for a migration to an alternative vendor by implementing a continuously upward pricing model that aims at drawing their customers deeper and deeper into a factual lock-in. At some point, a partial exit may no longer be economical and a complete and difficult withdrawal from the whole contract the only option.

Skills play a big role in migrating and operating workloads. Cloud architects, DevOps teams, and security experts are specialized, and it takes time and money to shift that knowledge to newly adopted cloud platforms. That can be a major hurdle when considering leaving one vendor for another. Going multi-cloud from the start provides companies with a larger talent pool, which eases transitioning somewhat.

Technology causes vendor lock-in as well - at least when it comes to proprietary technology vendors use to differentiate. On the one hand, that's great and can offer a competitive edge. On the other hand, it can get companies locked in on this technology and hinder the adoption of the next big thing in cloud technologies.

The 4 key aspects to every cloud exit strategy

So here are 4 aspects you will have to have an eye on when building your cloud exit strategy:

  1. Most importantly: Take inventory! Knowing your assets is essential. Exit strategies often apply to critical business functions only. So it’s important to know what you have running in which cloud – an up-to-date cloud inventory is of great help.
  2. Open-source infrastructure is key. Open-source infrastructure components like Kubernetes or OpenShift clusters or open-source databases can make a move between clouds much easier. The more proprietary services you use, the harder it will be to adapt your application to run in a new cloud environment.
  3. Go multi-cloud from the beginning. Contract negotiations between enterprises and cloud providers can take a while. It’s too late to start the process when it’s actually time to move.
  4. Watch out for organizational lock-in. Even if, from a technical perspective, your application can easily be moved to a different cloud provider, there’s more to it. If you are running cloud applications at scale, setting up the corresponding cloud environments and transferring permissions and configurations comes with massive complexity. Use a centralized governance system like meshcloud to keep your organizational structures independent from specific providers.

To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We're looking forward to getting in touch with you.


Cloud Landing Zone Lifecycle explained!

The Cloud is the backbone and foundation of digital transformation in its many forms. The - quite literal - foundation for a successful transformation to the cloud is the concept of landing zones. This post will cover the management of landing zones over their lifetime.

But let's start with a brief definition of what a landing zone is and does:

What is a Landing Zone?

A landing zone is the underlying core configuration of any cloud adoption environment. Landing zones provide a pre-configured environment - provisioned through code - to host workloads in private, hybrid, or public clouds. You don't want to hand your developers "naked" cloud tenants - completely unconfigured AWS accounts, Azure subscriptions, or GCP projects.

Here are 4 key aspects a landing zone can and should take care of in your cloud:

  1. Security & Compliance
  2. Standardized tenancy
  3. Identity and access management
  4. Networking

A landing zone is certainly the starting point of your cloud journey - but it is also a constantly evolving core component of your infrastructure.

Benefits of Landing Zones:

Landing Zones allow you to standardize cloud environments that are provisioned to DevOps teams. They offer consistency across all tenants in naming, scaling, and access control. With that comes a security baseline that preempts (accidental) non-compliant or unauthorized configurations.

So let's talk about the different phases of a landing zone's lifecycle!

Design, Deploy, Operate: 3 "Days" in the life of a landing zone

In software development you often hear the terms

"Day 0/Day 1/Day 2".

Those refer to different phases in the life of software: From specification and design (Day 0) to development and deployment (Day 1) to operations (Day 2). For this blog post, we're going to use this terminology to describe the phases of the landing zone lifecycle.

Visualization of a cloud landing zone lifecycle.

Day 0: Designing a Landing Zone

As the starting point of your cloud journey and the core component of your cloud environment landing zones should be well thought out and strategized - certainly with Day 1 and 2 in mind. Let's expand on the 4 aspects a well-designed landing zone should take care of in the cloud:

  1. Security and Compliance: Centralize your security, monitoring, and logging approach. Company-wide compliance and data residency policies for example can be implemented with landing zones. This way you can ensure a base level of compliance over multiple tenants or environments.
  2. Standardized tenancy: Enforce tagging policies across multiple cloud tenants and provide standardized tenants for different security profiles (dev/staging/prod).
  3. Identity and access management: Implement the principle of least privilege by defining roles and access policies. Define your user ID configurations and password standards across tenants.
  4. Networking: Provide IaaS network configurations, firewalls, and other basic networking parameters you want to have in place.
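To make the "standardized tenancy" aspect concrete, here is a minimal sketch of a tagging-policy check a landing zone could enforce across tenants. It is illustrative only and not tied to any provider's API; the tag keys and the allowed environment names are assumptions for the example.

```python
# Illustrative tagging policy: every tenant must carry these tags
# (the keys and allowed values are made up for this example).
REQUIRED_TAGS = {"environment", "cost-center", "owner"}
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def tenant_violations(tags: dict) -> list:
    """Return a list of policy violations for a tenant's tag set."""
    violations = [f"missing tag: {key}"
                  for key in sorted(REQUIRED_TAGS - tags.keys())]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        violations.append(f"invalid environment: {env}")
    return violations


# Example: a dev tenant that is missing its owner tag
print(tenant_violations({"environment": "dev", "cost-center": "4711"}))
# → ['missing tag: owner']
```

In practice such checks run automatically when a tenant is provisioned or changed, so non-compliant configurations are caught before workloads land in them.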

Day 1: Deploying a Landing Zone

Day 1 is about customizing and deploying a landing zone according to the design and specifications determined on Day 0. Every public cloud service provider implements the landing zone concept differently.

Let's have a look at the big 3 CSPs:

Microsoft Azure: Within Microsoft's public cloud platform, the concept of landing zones is implemented in the Cloud Adoption Framework. A major tool is Azure Blueprints: you can choose and configure migration landing zone blueprints within Azure to set up your cloud environments. Alternatively, you can use third-party tools like Terraform.

Amazon Web Services: The landing zone solution provided by AWS is simply called AWS Landing Zone. It includes a security baseline that pre-configures AWS services like CloudTrail, GuardDuty, and Landing Zone Notifications. The solution also automates the setup of a landing zone environment, thereby speeding up cloud migrations. Depending on your use case, AWS also offers CloudFormation templates to customize and standardize service or application architectures.

Google Cloud Platform: On GCP, the Google Cloud Deployment Manager is the way to go to write flexible template and configuration files. You can use a declarative YAML format - or Python and Jinja2 templates - to configure your deployments.
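As a taste of what a Deployment Manager Python template looks like, here is a minimal sketch that provisions one network for a landing zone. Deployment Manager calls `generate_config(context)` in such a template and expects a dictionary of resources back; the naming scheme and the choice of properties below are assumptions for the example.

```python
# Minimal Deployment Manager Python template sketch. In a real deployment
# this file is imported from a top-level YAML config, and Deployment
# Manager passes in the 'context' object.

def generate_config(context):
    """Return the resources for one network in a landing zone."""
    # Hypothetical naming scheme: derive the name from the deployment name.
    name = context.env["deployment"] + "-network"
    return {
        "resources": [{
            "name": name,
            "type": "compute.v1.network",
            "properties": {
                # Disable auto-created subnets so the landing zone
                # stays in control of IP ranges.
                "autoCreateSubnetworks": False,
            },
        }]
    }
```

The template would be referenced from a YAML configuration via its `imports:` section, which is where per-environment properties are injected.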

Day 2: Operating a Landing Zone

Cloud environments and their usage are never static. That means ongoing effort has to go into the management and operations of the underlying landing zones.

As your use of the cloud expands, the landing zones need to be well-maintained and updated as all aspects of cloud environments evolve: implementing new best practices from the cloud providers, reacting to new needs arising from new applications, or responding to emerging security threats. Make sure your architecture stays flexible enough to expand and update your landing zones during operations.
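A simple way to keep Day 2 updates manageable is to version your landing zones and track which tenant runs which version. The sketch below shows this bookkeeping idea in a few lines of Python; all landing zone names, tenant names, and version numbers are made up for the example.

```python
# Illustrative Day-2 bookkeeping: which tenant runs which landing zone
# version, so a security fix can be rolled out exactly where it is needed.
CURRENT_VERSION = {"base-lz": 3, "data-lz": 2}  # latest published versions

tenants = [
    {"name": "team-a-dev",  "landing_zone": "base-lz", "version": 3},
    {"name": "team-a-prod", "landing_zone": "base-lz", "version": 2},
    {"name": "analytics",   "landing_zone": "data-lz", "version": 2},
]


def outdated(tenants):
    """Tenants whose landing zone version lags the latest published one."""
    return [t["name"] for t in tenants
            if t["version"] < CURRENT_VERSION[t["landing_zone"]]]


print(outdated(tenants))  # → ['team-a-prod']
```

With an overview like this, a short-term security fix becomes a targeted rollout to the outdated tenants instead of a blanket redeployment.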

The meshcloud take on Landing Zones

We at meshcloud have our own take on the landing zone concept: With meshLandingzones we support native tooling provided by the different cloud platforms and vendors. This way we ensure seamless integration of existing operational capabilities and leverage the most powerful and best-integrated tooling available for each platform. In most instances, this tooling follows an infrastructure-as-code paradigm that fits naturally with meshcloud's multi-cloud orchestration approach.

On Day 0, the design of the landing zones is done with the native tools of the respective providers.
On Day 1, meshcloud comes into play for the deployment: previously created Azure Blueprints, for example, can be integrated into meshLandingzones. meshLandingzones rely on the various native tools of the providers - in the case of AWS, these are in particular OU assignments, Lambda invocations, and CloudFormation templates.

For Day 2 operations, meshcloud offers various mechanisms for landing zone management. Fast updates of landing zones across many projects make it possible to react to short-term security risks, while versioned landing zones enable long-term evolution to comply with new regulations and requirements. With meshcloud, you always have a cross-platform overview of which projects use which landing zone (and in which version).


To learn more about the meshcloud platform, please get in touch with our sales team or book a demo with one of our product experts. We look forward to hearing from you.