Aggregation settings

When graphing or alerting on a metric in Sysdig Monitor, you can adjust the aggregation settings, which determine how Sysdig Monitor rolls up the available data samples to create the chart or evaluate the alert. There are two types of aggregation: time aggregation and group aggregation.

 

Order of operations

Time aggregation is always performed before group aggregation. 

 

Time Aggregation

Time aggregation comes into effect in two potentially overlapping situations:

  • Charts can only render a limited number of data points, so when you view a wide time range, Sysdig Monitor may need to aggregate granular data into larger samples to visualize it.
  • Sysdig Monitor rolls up historical data over time, but it retains rollups for every aggregation type, so you can choose which one to use when evaluating older data.

Note that when you look at recent data in a small time window, time aggregation may not be needed at all, in which case the settings here have no impact.
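
The roll-up of granular data into larger samples can be sketched as follows. This is only an illustration, not Sysdig Monitor's implementation: the function name roll_up, the bucket size, and the sample values are all hypothetical.

```python
from statistics import mean

def roll_up(samples, bucket_size, aggregate):
    """Collapse consecutive samples into buckets and reduce each bucket to one value."""
    return [
        aggregate(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]

# Hypothetical CPU % samples, one per interval, containing a brief spike.
cpu = [10, 12, 11, 95, 13, 12, 11, 10]

# Render two chart points instead of eight, using two different time aggregations.
averaged = roll_up(cpu, 4, mean)
peaks = roll_up(cpu, 4, max)

print("average:", averaged)
print("max:", peaks)
```

With these made-up values, the averaged points flatten the spike, while the max points preserve it.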

Average: 

  • Returns the average value of the metric across the time period being evaluated
  • Formula: sum of data samples / number of data samples
  • Often useful for "gauge" type metrics (e.g. CPU %, memory bytes)

Rate:

  • Returns the average value of the metric per time interval across the time period being evaluated
  • Formula: sum of data samples / number of time intervals
  • Often useful for "counter" type metrics (e.g. network I/O, file I/O)
  • Note: most metrics are sampled once for each time interval, so "Average" and "Rate" return the same value. They differ, however, for metrics that are not reported at every interval, e.g. some custom statsd metrics.
  • Note: "Rate" is currently known as timeAvg in the Sysdig Monitor API and advanced alerting language.
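
The difference between the "Average" and "Rate" formulas can be shown with a small worked example. This is a sketch based only on the two formulas above; the function names and the sample values are hypothetical, and it assumes a metric that reported in only three of the five intervals being evaluated.

```python
def average(samples):
    """Average: sum of data samples / number of data samples."""
    return sum(samples) / len(samples)

def rate(samples, num_intervals):
    """Rate: sum of data samples / number of time intervals."""
    return sum(samples) / num_intervals

# Hypothetical custom metric: five intervals evaluated, only three samples reported.
sparse_samples = [4, 6, 2]
num_intervals = 5

sparse_average = average(sparse_samples)
sparse_rate = rate(sparse_samples, num_intervals)

# A metric reported at every interval gives the same result for both.
dense_samples = [4, 6, 2, 3, 5]
dense_average = average(dense_samples)
dense_rate = rate(dense_samples, num_intervals)

print(sparse_average, sparse_rate)
print(dense_average, dense_rate)
```

When samples are missing, "Average" divides by the smaller sample count and therefore returns a higher value than "Rate" for the same data.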

Sum:

  • Returns the total value of the metric summed across the time period being evaluated
  • Often useful for "counter" type metrics (e.g. network I/O, file I/O)

Min/Max:

  • Returns the lowest/highest value during the time period being evaluated

 

Group Aggregation

When a metric is evaluated across a group of entities (hosts, containers, etc.), Sysdig Monitor needs to aggregate data samples from across the group. The method of aggregation used in this case is the "Group Aggregation".

Average:

  • Returns the average value of the metric across the group

Sum:

  • Returns the total value of the metric summed across the group

Min/Max:

  • Returns the lowest/highest value of the metric for any member of the group

A Real Life Example

In the graphic below, the CPU % metric is applied to the "webserver" group of servers. The top chart uses Average aggregation for both Time and Group. The bottom chart uses Maximum aggregation for both Time and Group:

 

For each one-minute interval shown on the chart, the bottom chart renders the highest CPU usage value found across all servers in the "webserver" group and across all samples reported during that interval. This configuration is useful when looking for transient spikes over long periods of time that averaging would otherwise hide.
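
The two chart configurations can be approximated in a few lines. This is a sketch only: the host names and CPU % values are invented, and the aggregate helper is hypothetical rather than part of any Sysdig API. It applies time aggregation to each host first and group aggregation second, following the order of operations described earlier.

```python
from statistics import mean

# Hypothetical CPU % samples reported during a single one-minute interval,
# keyed by host in the "webserver" group.
samples_by_host = {
    "web-1": [20, 22, 21, 19],
    "web-2": [25, 90, 24, 23],  # contains a transient spike
    "web-3": [30, 31, 29, 30],
}

def aggregate(samples_by_host, time_agg, group_agg):
    """Apply time aggregation per host, then group aggregation across hosts."""
    per_host = [time_agg(samples) for samples in samples_by_host.values()]
    return group_agg(per_host)

avg_point = aggregate(samples_by_host, mean, mean)  # top chart: Average / Average
max_point = aggregate(samples_by_host, max, max)    # bottom chart: Maximum / Maximum

print("average/average:", avg_point)
print("max/max:", max_point)
```

With these values, the spike on one host barely moves the Average/Average point, but it fully determines the Maximum/Maximum point.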

 

Group Aggregation with Segmentation

Note that if you segment a chart or alert, the Group Aggregation setting is used both for aggregation across the whole group and for aggregation within each individual segment.

For example, let's say you are looking at a chart of CPU % for a server:

Now suppose you segment by proc.name. In this case, you will see one CPU % line in the chart for each process name:

The important thing to realize now is that each of those lines is also an aggregation of all the processes which share that same name.

For example, in this case there are multiple processes named java. And since we're using "Group Aggregation = Average", that java line represents the average CPU % used by all Java processes.

This explains why the lines in the second chart do not seem to add up to the total in the first chart. To address this confusion, just switch to "Group Aggregation = Sum":
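
The effect of the Group Aggregation setting on segmented lines can be illustrated with a small sketch. The process list and CPU % values below are invented, the segment helper is hypothetical, and the example assumes the unsegmented chart shows the total CPU % of the server.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-process CPU % on one server: (proc.name, CPU %).
processes = [
    ("java", 30.0),
    ("java", 10.0),
    ("java", 20.0),
    ("nginx", 5.0),
    ("sshd", 1.0),
]

def segment(processes, group_agg):
    """Group processes by name and aggregate the values within each segment."""
    by_name = defaultdict(list)
    for name, cpu in processes:
        by_name[name].append(cpu)
    return {name: group_agg(values) for name, values in by_name.items()}

by_average = segment(processes, mean)  # Group Aggregation = Average
by_sum = segment(processes, sum)       # Group Aggregation = Sum
total_cpu = sum(cpu for _, cpu in processes)

print("average per name:", by_average)
print("sum per name:", by_sum)
print("server total:", total_cpu)
```

Here the "Average" segments add up to less than the server total because the three java processes are collapsed into a single averaged value, whereas the "Sum" segments add up to the total exactly.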
