Sysdig Monitor can aggregate metrics in two ways: across time and across groups. Both kinds of aggregation apply to metrics displayed in charts as well as to metrics used to calculate alerts.
Sysdig agents collect and report metrics at 10-second resolution. When a time series chart covers 5 minutes or less, datapoints are drawn at this 10-second sampling rate, so time aggregation selections have no effect. When the chart covers more than 5 minutes, each datapoint is drawn as an aggregate over an appropriate time interval. For example, on a chart that covers 1 hour, each datapoint reflects a one-minute interval.
At intervals of 1 minute and above, you can configure your chart to display different aggregates of the 10-second samples used to calculate each datapoint. By default, the Average aggregation type is used, but you can also choose to display the highest value among the interval’s samples (Maximum) or the lowest (Minimum). To display the combined total of all the samples in the interval, use Sum as the aggregation type.
For alerts that cover longer stretches of time, time aggregation applies in this same way.
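The time aggregation described above can be illustrated with a minimal Python sketch. It is not Sysdig code: the names `AGGREGATES` and `time_aggregate` and the sample values are hypothetical, and the sketch assumes each one-minute datapoint is computed from six consecutive 10-second samples.

```python
from statistics import mean

# Hypothetical mapping of the four aggregation types named in the text
# to plain Python functions.
AGGREGATES = {
    "Average": mean,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}


def time_aggregate(samples, aggregate="Average", samples_per_point=6):
    """Collapse 10-second samples into one datapoint per interval.

    samples_per_point=6 models a one-minute interval
    (6 samples x 10 seconds); this is an assumption for illustration.
    """
    func = AGGREGATES[aggregate]
    return [
        func(samples[i:i + samples_per_point])
        for i in range(0, len(samples), samples_per_point)
    ]


# Two minutes of made-up CPU% samples, one every 10 seconds.
cpu = [10, 12, 11, 90, 13, 14, 20, 22, 21, 19, 18, 20]

for name in AGGREGATES:
    print(name, time_aggregate(cpu, name))
```

In this made-up data, the single sample of 90 in the first minute is visible with Maximum but is smoothed away by Average.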
When a metric is applied to a group of items (hosts, containers), its values are averaged across the members of the group by default. For example, suppose three hosts report different CPU usage for one sample interval. The three values are averaged and reported on the chart as a single datapoint for that metric. Using the Group Aggregation menu, you can switch the aggregation between Average, Minimum, Maximum, and Sum. If Maximum is selected in the example above, the highest CPU value of the three hosts is reported on the chart. To display the total of the values from all the hosts, select Sum as the Group Aggregation type.
For alerts covering groups of nodes, group aggregation applies in this same way.
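As a rough illustration of the three-host example above, the following Python sketch reduces one sample interval's values to a single datapoint. The host names, CPU values, and the `group_aggregate` helper are hypothetical and do not reflect how Sysdig Monitor is implemented.

```python
from statistics import mean

# Hypothetical mapping of the Group Aggregation menu options to functions.
GROUP_AGGREGATES = {
    "Average": mean,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}


def group_aggregate(values_by_member, aggregate="Average"):
    """Reduce one value per group member to a single datapoint."""
    return GROUP_AGGREGATES[aggregate](list(values_by_member.values()))


# Made-up CPU% values reported by three hosts for one sample interval.
hosts = {"host-a": 20.0, "host-b": 50.0, "host-c": 80.0}

for name in GROUP_AGGREGATES:
    print(name, group_aggregate(hosts, name))
```

With these values, Average yields 50.0, Maximum yields 80.0, and Sum yields 150.0 for the interval.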
In the graphic below, the CPU% metric is applied to the ‘webserver’ group of servers. The top chart shows metrics with Average aggregations for Time and Group. The bottom chart is set to Maximum aggregation for Time and Group:
For each one-minute interval shown on the chart, the bottom chart renders the highest CPU usage value found across all of the servers in the webserver group and all of the samples reported during that interval. This configuration is useful when looking for transient spikes over long periods of time that averaging would otherwise hide.
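The difference between the two chart configurations can be sketched in Python as follows. This is an illustration only: the server names, sample values, and the `chart_points` helper are hypothetical, and the sketch assumes time aggregation is applied per server before group aggregation (for Maximum the order does not change the result).

```python
from statistics import mean


def chart_points(series_by_server, time_agg, group_agg, samples_per_point=6):
    """Compute one datapoint per interval for a group of servers.

    Assumption: each server's 10-second samples are first aggregated
    over the interval, then the per-server results are aggregated
    across the group.
    """
    length = min(len(s) for s in series_by_server.values())
    points = []
    for start in range(0, length, samples_per_point):
        per_server = [
            time_agg(samples[start:start + samples_per_point])
            for samples in series_by_server.values()
        ]
        points.append(group_agg(per_server))
    return points


# One minute of made-up CPU% samples; web-2 has a brief spike to 100.
webserver = {
    "web-1": [10, 10, 10, 10, 10, 10],
    "web-2": [10, 10, 100, 10, 10, 10],
    "web-3": [10, 10, 10, 10, 10, 10],
}

average_view = chart_points(webserver, mean, mean)  # like the top chart
maximum_view = chart_points(webserver, max, max)    # like the bottom chart
print(average_view, maximum_view)
```

In this made-up data, the Average/Average view reports 15 for the minute, while the Maximum/Maximum view reports 100, which exposes the spike on web-2.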
Note that group aggregation depends on the segmentation. If a view shows metrics for a group of items and the Segment By selection is changed to break out the individual items, the group aggregation setting no longer has any effect.
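One way to see why segmentation cancels out group aggregation is sketched below in Python. It rests on an interpretation rather than on documented Sysdig behavior: once the view is segmented by individual item, each segment contains a single member, so every aggregation type returns that member's value unchanged. The `render` helper and host values are hypothetical.

```python
from statistics import mean

# Hypothetical mapping of the Group Aggregation menu options to functions.
GROUP_AGGREGATES = {
    "Average": mean,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}


def render(values_by_host, aggregate, segment_by_host=False):
    """Return the datapoint(s) a view would show for one interval."""
    func = GROUP_AGGREGATES[aggregate]
    if segment_by_host:
        # Each segment is a group of one, so the aggregate is the value itself.
        return {host: func([value]) for host, value in values_by_host.items()}
    return {"all hosts": func(list(values_by_host.values()))}


# Made-up CPU% values for one interval.
hosts = {"host-a": 20.0, "host-b": 50.0, "host-c": 80.0}

print(render(hosts, "Maximum"))
print(render(hosts, "Maximum", segment_by_host=True))
print(render(hosts, "Sum", segment_by_host=True))
```

Under this assumption, the segmented output is the same for every aggregation type, while the unsegmented output changes with the selection.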