Why has observability become so complex and expensive?
In today's IT software and hardware industry, complexity grows by the day.

Krzysztof Grabowski
Jul 22, 2024 • 2 min.
In today's IT software and hardware industry, complexity grows by the day. We have already moved on from bare-metal machines to virtual machines running with cloud providers. But complexity didn't end there, and containers have become the new reality for most software deployments.

Today's complex software deployments solve some performance problems but unfortunately create many new ones. In the distant past, monitoring deployed software was fairly straightforward: monolithic software was deployed directly to bare-metal machines, and the term observability didn't even exist in the IT industry. Monitoring back then consisted mostly of alerts, typically handled with Nagios, plus some basic metrics collected with RRDtool and a few Nagios extensions. There was no need for high-resolution graphs; a resolution of a few minutes was sufficient.
When virtual machines became far more common for software deployments, monitoring became much more demanding: instead of a small number of bare-metal machines, it now had to cover a much larger number of software deployments spread across virtual machines. In a very short period of time, the requirements for monitoring grew by at least an order of magnitude. Deployed monitoring software quickly became a bottleneck, and many of the solutions used until then stopped being sufficient because they weren't architected to process and store so much data. New monitoring software had to be written to meet these challenges; the most popular solutions became Graphite and, later on, Prometheus. During that time, one of Prometheus's core values, "Instrument first, ask questions later", became the new reality. Besides metrics with alerts and graphs, logs also became an important part of the monitoring stack, and the new term observability was born to describe not only alerts, metrics, and graphs but also logs. Driven by these growing needs, the observability market attracted new vendors, but observability in general became quite costly due to the demand for high resolution and long retention periods. Unfortunately, high-quality open-source observability software remained limited, as many companies decided to go fully closed source and charge quite expensive license fees.
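To make the "instrument first, ask questions later" idea concrete, here is a minimal sketch of application-side instrumentation using the Prometheus Go client library. The metric name, label, and port are illustrative choices, not anything prescribed by Prometheus; in a real service you would pick names that match your own conventions.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal counts handled HTTP requests, labelled by request path.
var requestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total number of HTTP requests handled.",
	},
	[]string{"path"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Instrument first: record the event even before anyone asks for a graph.
	requestsTotal.WithLabelValues(r.URL.Path).Inc()
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	// Expose all registered metrics on /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

Once a Prometheus server scrapes the /metrics endpoint, the data is there to answer questions that nobody thought to ask at development time.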
But software complexity didn't end there; it went to another level with microservices architecture and containers. All the problems mentioned earlier worsened again, because the transition from the previous architecture to a container-based one brought even more congestion. The volume of data that needs to be stored and processed grew again by at least an order of magnitude, or even more. Observability started to cover distributed tracing as part of software instrumentation, and short-lived containers added a further burden in the form of churn on observability software. Prometheus, the best-known open-source monitoring tool, was rewritten so that it could also be used efficiently with containers.
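As a rough illustration of what distributed tracing instrumentation looks like, here is a small sketch using the OpenTelemetry Go SDK. The service and span names ("checkout-service", "process-order", "charge-card") are hypothetical, and spans are exported to stdout only for simplicity; a real deployment would typically use an OTLP exporter pointed at a tracing backend.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Export spans to stdout for demonstration purposes.
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatal(err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer tp.Shutdown(context.Background())
	otel.SetTracerProvider(tp)

	// Create a span around one unit of work; in a microservice the context
	// would be propagated across HTTP/gRPC calls to form one distributed trace.
	tracer := otel.Tracer("checkout-service")
	ctx, span := tracer.Start(context.Background(), "process-order")
	doWork(ctx)
	span.End()
}

func doWork(ctx context.Context) {
	// Child span, linked to "process-order" through the propagated context.
	_, span := otel.Tracer("checkout-service").Start(ctx, "charge-card")
	defer span.End()
}
```

With every short-lived container emitting spans like these, the volume and churn of telemetry data becomes easy to appreciate.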
Over time, the Grafana company became really important for observability in the open-source world. Grafana decided to release its observability products in open-source form, which allows them to be used without expensive licensing fees. This approach, however, requires skillful operators to prepare, run, and maintain the Grafana observability stack.
Over time, observability became a fundamental source of insight into the state of software deployed on any available platform.
Observability currently consists of the following pillars:
metrics
alerts
graphs
logs
distributed tracing
Observability is still evolving, covering new areas and providing even more insight into deployed software.