meshBlog

This is an introduction to cloud monitoring. If you work as a cloud operator or developer, or you simply want to learn about cloud monitoring, this blog post is for you!

In this post you will learn:

  • What cloud monitoring is
  • How it helps you secure business success
  • How monitoring and alerting connect
  • About different types of monitoring
  • How Prometheus and cAdvisor work

Let's get started with the basics!

Cloud Monitoring: Definition and Challenges

Monitoring helps you understand the behaviour of your cloud environments and applications.
Technically speaking, monitoring in IT refers to observing and checking the state of hardware or software systems, essentially to ensure that the system is functioning as intended at a specific level of performance.

Monitoring in cloud environments can be a challenging task. Since cloud consumers do not control all layers of the infrastructure, monitoring is limited to the upper layers, depending on the cloud service model. In addition, cloud consumers frequently use containerized applications. Containers are intended to be short-lived, and even when they do run for a long time, we don't rely on them for things like storing data. Their dynamic nature makes monitoring them challenging. Tools such as Prometheus with cAdvisor address this challenge. More on that in the two bonus sections at the end of this blog post.

Five ways cloud monitoring helps business success

Here are five ways good monitoring helps you secure business success:

  1. Increase system availability: Don't let users take the place of proper monitoring and alerting. When an issue occurs on a system that is not being monitored, it will most certainly be reported by the users of that system. Detect problems early to mitigate them, before a user is disrupted by them.
  2. Boost performance: Monitoring a system gives you a more detailed understanding of how it behaves. Flaws become visible, and developers get the insight they need to fix problems and improve performance.
  3. Make better decisions: Detailed insight into the current state of a system allows more accurate decision making based on actual data analysis.
  4. Predict the future: Predicting what might happen in the future by analyzing historical data is very powerful. An example is so-called pre-emptive maintenance: performing maintenance on parts of the system that, according to the historical data, have a high probability of failing in the near future.
  5. Automate, automate, automate: Monitoring greatly reduces manual work. There is no need to manually check system components when a monitoring system does the checks instead.
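
To make the "predict the future" point more concrete, here is a minimal sketch that fits a straight line through historical disk-usage samples and estimates when the disk would be full. The function names, the sample data and the 100 GB capacity are hypothetical, and a linear trend is only one simple assumption about how usage grows.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (hour, used gigabytes)


def fit_line(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * t + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def hours_until_full(samples: Sequence[Sample], capacity: float) -> Optional[float]:
    """Hours after the last sample until the trend reaches capacity.

    Returns None when usage is flat or shrinking, because the trend
    then never reaches the capacity.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


# Hypothetical history: usage grows by 2 GB per hour.
history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
print(hours_until_full(history, capacity=100.0))
```

An estimate like this can feed an alert ("disk full in less than a day") long before users notice anything.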

Monitoring and Alerting

Monitoring is usually linked to alerting. While monitoring introduces automation by pulling data from running processes, alerting adds even more automation by notifying developers when a problem occurs.

For example, an alert can fire when a critical process stops running.
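
A rough sketch of such a check, assuming each process reports a heartbeat timestamp to the monitoring system. The process names, the 60-second threshold and the `notify` helper are hypothetical. A real setup would send the notification to email or a paging service instead of printing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    process: str
    message: str


def check_heartbeats(last_seen: Dict[str, float], now: float,
                     max_silence_s: float) -> List[Alert]:
    """Return one alert per process that has been silent for too long."""
    alerts = []
    for name, timestamp in sorted(last_seen.items()):
        silence = now - timestamp
        if silence > max_silence_s:
            alerts.append(Alert(name, f"{name} has not reported for {silence:.0f}s"))
    return alerts


def notify(alerts: List[Alert], send: Callable[[str], None] = print) -> None:
    """Hand each alert to a sender; print stands in for email or a pager."""
    for alert in alerts:
        send(f"ALERT: {alert.message}")


# Hypothetical state: the mail sender stopped reporting five minutes ago.
now = 1000.0
last_seen = {"payment-worker": 990.0, "mail-sender": 700.0}
notify(check_heartbeats(last_seen, now, max_silence_s=60.0))
```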

Another important reason to monitor is conforming to Service Level Agreements (SLAs). Violating an SLA could damage the business, and monitoring helps keep track of the targets agreed in the SLA.
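
As a small worked example, the arithmetic behind an availability target can be sketched as follows. The 99.9% target, the 30-day period and the check counts are assumptions chosen for illustration, not values from any particular SLA.

```python
def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, that still meets the availability target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - target_percent / 100)


def measured_availability(total_checks: int, failed_checks: int) -> float:
    """Availability in percent, derived from uptime check results."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    return 100 * (total_checks - failed_checks) / total_checks


# Hypothetical month: one uptime check per minute, 50 of them failed.
target = 99.9
budget = allowed_downtime_minutes(target)
availability = measured_availability(total_checks=43200, failed_checks=50)

print(f"downtime budget: {budget:.1f} minutes")
print(f"measured availability: {availability:.3f}%")
print("SLA met" if availability >= target else "SLA violated")
```

With these numbers the budget is roughly 43 minutes per month, so 50 failed one-minute checks would exceed it.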

The Different Types of Monitoring

To classify types of monitoring we can ask two questions:

What is being monitored?

and

How is it being monitored?

To the first question there are many answers:

  • Uptime monitoring: As its name suggests, this checks whether a service is up and reachable.
  • Infrastructure monitoring: In the cloud world, infrastructure differs from traditional infrastructure in that resources are software-based, i.e. virtual machines and containers. It is important to monitor these resources since all running processes and services are built on them.
  • Security monitoring: Security monitoring is concerned with SSL certificate expiry, intrusion detection or penetration testing.
  • Disaster recovery monitoring: Taking backups of stored data is a necessary practice. Monitoring the backup process ensures that each backup completed properly and within its intended timeframe.

Now to the second question: How is it being monitored?

This lets us differentiate between whitebox and blackbox monitoring:
Illustration of whitebox and blackbox monitoring. Credits to https://medium.com/@ATavgen for the illustration idea.

Whitebox monitoring: This type refers to monitoring the internals of a system. When monitoring applications, the running process exposes information about itself, which makes its internal state visible to the outside world. The exposed information can be in the form of metrics, logs or traces.
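
To illustrate whitebox monitoring, here is a minimal sketch of an application-side counter rendered in the Prometheus text exposition format. The `Counter` class and the metric name are hypothetical and hand-rolled for illustration. In practice you would normally use one of the Prometheus client libraries rather than formatting the text yourself.

```python
from typing import Dict, Mapping, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """Hypothetical in-process counter, kept per label combination."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelSet, float] = {}

    def inc(self, labels: Mapping[str, str], amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Render HELP and TYPE lines plus one sample line per label set."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            if key:
                pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in key)
                lines.append(f"{self.name}{{{pairs}}} {value}")
            else:
                lines.append(f"{self.name} {value}")
        return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Total HTTP requests.")
requests_total.inc({"method": "GET", "code": "200"})
requests_total.inc({"method": "GET", "code": "200"})
requests_total.inc({"method": "POST", "code": "500"})
print(requests_total.render(), end="")
```

An application would serve this text on an HTTP endpoint so that a monitoring system can collect it.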

Blackbox monitoring: This type refers to monitoring the behaviour of an object or a system from the outside, usually by performing a probe (e.g. sending an HTTP request or a ping) and checking the result, such as the response status or the latency. This type does not check any internals of the application.
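
A blackbox probe can be sketched in a few lines using only the Python standard library. The `http_probe` function and the `ProbeResult` type are hypothetical names. The sketch treats any 2xx or 3xx answer as healthy, which is an assumption you may want to tighten for your own services.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool
    status: Optional[int]  # HTTP status code, None if no response arrived
    latency_s: float       # time spent on the request, in seconds
    error: Optional[str] = None


def http_probe(url: str, timeout_s: float = 5.0) -> ProbeResult:
    """Send one GET request and report status and latency, never raising on failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
        return ProbeResult(200 <= status < 400, status, time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return ProbeResult(False, exc.code, time.monotonic() - start, str(exc))
    except OSError as exc:
        # Covers connection failures, DNS errors and timeouts.
        return ProbeResult(False, None, time.monotonic() - start, str(exc))
```

Calling `http_probe("https://example.com")` on a schedule and alerting on `ok == False` or on a high `latency_s` gives a very basic uptime check without any knowledge of the application's internals.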

The concepts of whitebox and blackbox are also used in software testing, with a meaning similar to the one in monitoring: there, too, they are concerned with checking the internals and externals of a software system. The difference is that software testing usually occurs during development, while monitoring is applied when the software is already running.

4 Tips for monitoring cloud security

Correct monitoring will tell you if your cloud infrastructure functions as intended while minimizing the risk of data breaches.

To do that there are a few guidelines to follow:

  • Your monitoring tools need to be scalable to your growing cloud infrastructure and data volumes
  • Aim for constant and instant monitoring of new or modified components
  • Don't rely on what you get from your cloud service provider alone - you need transparency in every layer of your cloud infrastructure
  • Make sure you get enough context with your monitoring alerts to help you understand what is going on

You can and should monitor on different layers (e.g. network, application performance) and there are different tools for doing this. SIEM (Security Information and Event Management) tools collect data from various sources. They process this data to identify and report on security-related incidents and send out alerts whenever a potential risk has been identified.

Bonus 1: Prometheus Architecture

As promised, a short excursion into Prometheus:

Prometheus is a metrics-based, open-source monitoring tool written in Go. After Kubernetes, it was the second project to be adopted by the CNCF and the second to graduate, and it will remain fully open-source. Prometheus has its own query language called PromQL, which is powerful for performing operations on metrics. Prometheus also uses its own time-series database (TSDB) for storage.

Prometheus architecture illustration

Prometheus uses service discovery to find targets, or it can use statically defined targets. It then scrapes those targets. A target is either an application that directly exposes Prometheus metrics through a Prometheus client library, or an exporter that translates data from a third-party application into metrics that Prometheus can scrape.
Prometheus stores the scraped metrics in its own time-series storage and also uses this stored data to evaluate alert rules. Once a rule's condition is met, an alert is sent to Alertmanager, which in turn sends a notification to a configured destination (email, PagerDuty, etc.).

Prometheus time-series data can also be visualized by third-party visualization tools such as Grafana. These tools use the Prometheus query language to pull time-series data from Prometheus storage.

Bonus 2: Container Monitoring using cAdvisor and Prometheus

cAdvisor (Container Advisor) is a tool to tackle the challenge of monitoring containers. Its core functionality is making the resource usage and performance characteristics of containers transparent to their users. cAdvisor exposes Prometheus metrics out of the box. It is a running daemon that collects, aggregates, processes, and exports information about running containers. cAdvisor supports Docker and pretty much every other container type out there.

To get started you'll need to configure Prometheus to scrape metrics from cAdvisor:

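# Prometheus configuration: scrape the metrics that cAdvisor exposes
scrape_configs:
- job_name: cadvisor        # name of this scrape job
  scrape_interval: 5s       # how often Prometheus scrapes the target
  static_configs:
  - targets:
    - cadvisor:8080         # host:port where cAdvisor is reachable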

Then create containers (with Docker, for example) that run Prometheus, cAdvisor and an application. You can now see the metrics produced by your containers, collected by cAdvisor and scraped by Prometheus.
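
Once Prometheus is scraping cAdvisor, the collected data can be read back through the Prometheus HTTP API. The following is a minimal sketch under a few assumptions: Prometheus is reachable at `http://localhost:9090`, cAdvisor exports the metric `container_cpu_usage_seconds_total` with a `name` label, and the helper function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value)


def build_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the /api/v1/query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload: dict) -> List[Series]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is a [timestamp, "value as string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query(base_url: str, promql: str, timeout_s: float = 5.0) -> List[Series]:
    """Run an instant query and return the parsed series."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return parse_instant_vector(json.load(response))


# Assumed PromQL: per-container CPU usage rate over the last minute.
CPU_RATE = 'rate(container_cpu_usage_seconds_total{name!=""}[1m])'

if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", CPU_RATE))
```

With the containers running, `query("http://localhost:9090", CPU_RATE)` should return one entry per named container.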

Authors: Mohammad Alhussan and Wulf Schiemann
