This is an introduction to cloud monitoring: If you work as a cloud operator or developer or you want to learn about cloud monitoring - this blog post is for you!
In this post you will learn:
- What cloud monitoring is
- How it helps you secure business success
- How monitoring and alerting connect
- About different types of monitoring
- How Prometheus and cAdvisor work
Let's get started with the basics!
Cloud Monitoring: Definition and Challenges
Monitoring helps you understand the behavior of your cloud environments and applications.
Technically speaking, in IT, monitoring refers to observing and checking the state of hardware or software systems. Essentially to ensure the system is functioning as intended on a specific level of performance.
Monitoring in cloud environments can be a challenging task. Since there is no control over all layers of the infrastructure, monitoring becomes limited to upper layers depending on the cloud service model. Besides, cloud consumers frequently use containerized applications. Containers are intended to have short lives, even if they did last for long, we don’t rely on them e.g. for storing data. Since their nature is dynamic monitoring them is challenging. Tools such as Prometheus with cAdvisor take care of this challenge. More on that in the two bonus sections at the end of this blog post.
Five reasons how cloud monitoring helps business success
Here are five reasons how good monitoring helps you secure business success:
- Increase system availability: Don't let users take the place of proper monitoring and alerting. When an issue occurs on a system that is not being monitored, it will most certainly be reported by the users of that system. Detect problems early to mitigate them, before a user is disrupted by them.
- Boost performance: Monitoring systems leads to a more detailed understanding. Flaws become visible and Developers can gain detailed access and fix problems for better performance.
- Make better decisions: Detailed insight into the current state of a system allows more accurate decision-making based on actual data analysis.
- Predict the future: Predicting what might happen in the future by analyzing historical data is very powerful. An example is so-called pre-emptive maintenance; performing maintenance on parts of the system that have a high probability of failing soon, given the historical data provided.
- Automate, automate, automate: Monitoring highly reduces manual work. There is no need to manually check system components when there is a monitoring system doing the checks instead.
Monitoring and Alerting
Monitoring is usually linked to alerting. While monitoring introduces automation by pulling data from running processes, alerting adds even more automation by alerting developers when a problem occurs.
For example: Alerting if a critical process stops running.
Another important reason to monitor is conforming to Service Level Agreements (SLA). Violating the SLA could lead to damage to the business and monitoring helps to keep track of the agreements set in the SLA.
The Different Types of Monitoring
To classify types of monitoring we can ask two questions:
What is being monitored?
How is it being monitored?
To the first question there are many answers:
- Uptime monitoring: As its name suggests, this is important to monitor service uptime.
- Infrastructure monitoring: In the cloud world, infrastructure varies from traditional infrastructure in that resources are software-based; i.e. virtual machines and containers. And it is important to monitor these resources since they are the base of running processes and services.
- Security monitoring: Security monitoring is concerned with SSL certificate expiry, intrusion detection, or penetration testing.
- Disaster recovery monitoring: Also, taking backups for stored data is always an important and necessary practice. Monitoring the backup process is important to ensure it was done properly at its intended timeframe.
Now to the second question: How it is being monitored?
This lets us differentiate between Whitebox and Blackbox monitoring:
Credits to https://medium.com/@ATavgen for the illustration idea.
Whitebox monitoring: This type refers to monitoring the internals of a system. When monitoring applications, the running process also exposes information about itself which makes it visible to the outside world. Exposed information can be in a form of metrics, logs, or traces.
Blackbox monitoring: This type refers to monitoring the behavior of an object or a system usually by performing a probe (i.e. sending an HTTP request) and checking the result such as ping to check the latency of a request to a server. This type does not check any internals of the application.
The concept of white box and black box is used in software testing with semantically similar meaning as in monitoring. It is also concerned with testing the internals and externals of a software system. The difference being, that software testing usually occurs during development while monitoring is applied when the software is already running.
4 Tips for monitoring cloud security
Correct monitoring will tell you if your cloud infrastructure functions as intended while minimizing the risk of data breaches.
To do that there are a few guidelines to follow:
- Your monitoring tools need to be scalable to your growing cloud infrastructure and data volumes
- Aim for constant and instant monitoring of new or modified components
- Don't rely on what you get from your cloud service provider alone - you need transparency in every layer of your cloud infrastructure
- Make sure you get enough context with your monitoring alerts to help you understand what is going on
You can and should monitor on different layers (e.g. network, application performance) and there are different tools for doing this. SIEM (Security Information and Event Management) tools collect data from various sources. They process this data to identify and report on security-related incidents and send out alerts whenever a potential risk has been identified.
Bonus 1: Prometheus Architecture
As promised a short excursion to Prometheus:
Prometheus is a metric-based, open-source monitoring tool written in Go. It is the second graduating project after Kubernetes was adopted by the CNCF and will remain fully open-source. Prometheus has its own query language called PromQL which is powerful for performing operations in the metric space. Prometheus also uses its own time-series database (TSDB) for storage.
Prometheus uses service discovery to discover targets or can use statically defined targets as well. It scrapes those targets which are either applications that directly expose Prometheus metrics through Prometheus client libraries or with the help of exporters that translate data from third-party applications into metrics that can be scraped by Prometheus.
While Prometheus has its own time-series storage in which the scraped metrics are stored, it can also use these stored time-series data to evaluate alert rules. Once a condition is met, alerts are sent to Alertmanager which in turn sends a notification to a configured destination (Email, PagerDuty, etc.)
Prometheus time-series data can also be visualized by third-party visualization tools such as Grafana. These tools leverage Prometheus query language to pull time-series data from Prometheus
Bonus 2: Container Monitoring using cAdvisor and Prometheus
cAdvisor (Container Advisor) is a tool to tackle the challenge of monitoring containers. Its core functionality is making the resource usage and performance characteristics of containers transparent to their users. cAdvisor exposes Prometheus metrics out of the box. It is a running daemon that collects, aggregates, processes, and exports information about running containers. cAdvisor supports Docker and pretty much every other container type out there.
To get started you'll need to configure Prometheus to scrape metrics from cAdvisor:
scrape_configs: - job_name: cadvisor scrape_interval: 5s static_configs: - targets: - cadvisor:8080
Create your containers - Docker for example - that run Prometheus, cAdvisor, and an application to see metrics produced by your containers, collected by cAdvisor, and scraped by Prometheus.
Authors: Mohammad Alhussan and Wulf Schiemann