State of the Art: StatsD Deployment Strategies in Containerized Environments


The most common statsd deployment strategy consists of a statsd collector that resides on a separate machine and aggregates the traffic from all of the statsd sources. In the image below, for example, we have three different containers, each with an application emitting statsd traffic toward the collector.

[Image: three application containers sending statsd traffic over the network to a central statsd collector]

Usually, the collector’s machine is reachable through a static IP address or an ad hoc DNS entry. This is how most open source statsd deployments are done. It works well, but has a few limitations:

  • every single statsd metric update has to travel across the network, which imposes a tradeoff between the frequency of metric updates and the network bandwidth that gets consumed.
  • statsd metrics travel on the network separately from other metrics gathered from the machine or container, decreasing the opportunity to compress and efficiently transmit performance data.
  • it’s impossible to automatically tag and context-enhance the metrics for later segmentation, because the information about which container/host/application generated the metric is lost.
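
To make the model concrete, here’s a minimal sketch of what each application does: it formats a metric in the statsd wire format and fires it over UDP toward the remote collector. The hostname below is a placeholder for illustration.

    import socket

    # Placeholder address: in this model it would be the static IP or
    # ad hoc DNS entry of the machine hosting the statsd collector.
    COLLECTOR = ("statsd.example.com", 8125)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # statsd wire format: <metric name>:<value>|<type>
    sock.sendto(b"app.requests:1|c", COLLECTOR)          # counter increment
    sock.sendto(b"app.response_time:320|ms", COLLECTOR)  # timer, in milliseconds

Note that each sendto() produces a separate datagram on the network, which is exactly the bandwidth trade-off described in the first limitation above.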


For these reasons, several commercial implementations adopt a different model: a local collector on each machine. This local collector assembles different types of metrics (statsd, but also system information, process information, application information, and so on) into samples that get shipped to a general purpose monitoring backend at regular intervals, for example every minute. This model is shown in the next image, and has the advantage of being more efficient, especially in bigger deployments, because metrics are aggregated and compressed before being sent to the monitoring backend. It’s also simple, because you don’t have to configure a target for statsd: it just goes to localhost.
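
From the application’s point of view, the target is simply localhost. As a sketch, using the popular Python statsd client (an assumption for illustration; any statsd client behaves the same way):

    import statsd

    # The local agent listens on the default statsd port on this machine,
    # so no remote address or DNS entry needs to be configured in the app.
    client = statsd.StatsClient("localhost", 8125)

    client.incr("app.requests")               # counter
    client.timing("app.response_time", 320)   # timer, in milliseconds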



Only problem? This model works well in traditional virtualized environments, but tends to be a pain with containers. Adding a metric collection agent to every container is inefficient, complicates deployments, and runs counter to the container philosophy of one process per container.


As a consequence, a third strategy is gaining adoption, shown in this picture:

The statsd collector resides on the same machine, but in a separate container. This separate container is often also in charge of collecting system metrics, stitching everything together, and sending samples to a general purpose backend at regular intervals. This is a good compromise, definitely much better than having a statsd daemon in each container, but it’s still far from ideal. In particular, the apps in the containers need to know where to send their statsd UDP payloads. This can be done in a couple of ways:

  • through a mechanism called linking, which involves exporting the IP address of the monitoring container to the other containers as an environment variable (see the sketch after this list). This mechanism is quite rudimentary and pretty fragile. For example, it makes it hard to update the monitoring container, because the update will almost certainly change the container’s IP address and break the link.
  • by assigning a static IP to the monitoring container. This has all the limitations involved in using static IP addresses, including having to come up with a way to avoid address conflicts if you want a monitoring container on each physical host.
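
To illustrate the linking approach: with Docker’s legacy container links, the monitoring container’s address shows up inside the application container as an environment variable. The variable name below assumes a link alias of "monitor" and UDP port 8125; both are illustrative.

    import os
    import socket

    # Legacy Docker links inject the linked container's address as
    # <ALIAS>_PORT_<port>_<proto>_ADDR; "monitor" is an assumed alias.
    collector_ip = os.environ["MONITOR_PORT_8125_UDP_ADDR"]

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"app.requests:1|c", (collector_ip, 8125))

If the monitoring container is recreated, the injected address goes stale, which is exactly the fragility described above.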


So, too bad: we’re left without a good, easy option. Or are we?


The Sysdig Monitor Approach

When we designed our highly requested statsd integration, we had containerized environments in mind from day one. We came up with what we call passive statsd collection (or, as I prefer to call it, statsd teleport). Let me show you how it works:



The careful reader has surely noticed that there are no arrows connecting the app containers to the monitor container. That’s because the applications inside the containers send their statsd messages to localhost, which means there’s no need to hardcode a collector IP address. However, there’s no statsd collector on localhost, so the UDP payload just gets dropped (the image depicts this as the message going into a trashcan). The same message then “magically” appears in the monitoring container (container4). Once there, it is received by sysdig, which enriches it with a set of tags (container name, ID and image name) that will enable segmentation, and feeds it to the local statsd collector. The statsd messages are merged with Sysdig Monitor’s incredibly detailed system, network and application metrics, and then compressed and shipped to the Sysdig Monitor backend once per second.
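
From the application’s perspective, nothing changes compared to a local collector: it’s the same fire-and-forget UDP write to localhost, and it’s harmless even though nothing is listening there. A minimal sketch:

    import socket

    # Plain "push to localhost": UDP is fire and forget, so this send
    # succeeds from the app's point of view even though no collector is
    # listening on 127.0.0.1:8125 inside this container.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"app.requests:1|c", ("127.0.0.1", 8125))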


You’re probably wondering: “How do statsd messages get from the application container to the Sysdig Monitor one?” Let’s add some detail to the previous image:

[Image: statsd messages traveling from the application containers to the monitoring container through system calls captured by the sysdig agent]

What happens is that every network transmission made from inside the application containers, including statsd messages sent to a nonexistent destination, generates a system call. The sysdig agent can capture these system calls from a separate container, where the statsd collector is listening. In practice, sysdig acts as a transparent proxy between the application and the statsd collector, even though they live in different containers. Sysdig also knows which container each system call comes from, and uses that information to transparently tag the statsd message.
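
As a purely conceptual sketch (the real work happens inside the sysdig agent at the system call level, and the tag encoding below is hypothetical), the enrichment step boils down to attaching container metadata to each captured payload before handing it to the collector:

    # Conceptual illustration only, not the actual agent implementation.
    # The agent knows which container issued the sendto() system call,
    # so it can attach that context to the captured statsd payload.
    def enrich(payload: bytes, name: str, container_id: str, image: str) -> bytes:
        # Hypothetical tag encoding, for illustration only; Sysdig Monitor
        # attaches container name, ID and image name as metric tags.
        tags = f"container:{name},id:{container_id},image:{image}"
        return payload + b"|#" + tags.encode()

    # e.g. enrich(b"app.requests:1|c", "web1", "3fae2b0c91d7", "nginx")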


Benefits

We believe our approach brings the benefits of a local statsd collector without the typical drawbacks of container integration. In particular, the benefits include:

  • no need to instrument the container in any way.
  • super simple “push to localhost” approach.
  • no network configuration required, no need to deal with DNS or static IP addresses.
  • local aggregation with minimal bandwidth overhead.
  • out of the box container/host tagging.
  • aggregation of statsd metrics with best-in-class container system, network and application metrics.
  • it just works. Note that passive collection works even if you are already exporting statsd metrics to an existing collector. In that case, Sysdig Monitor will capture them as well, with minimal overhead and no disruption to the current export.
