Sysdig Monitor Metrics

This page lists the metrics collected by the Sysdig Cloud Agent, with a brief description of each. New metrics are added regularly, and this list is updated over time.

Categories

Metrics are grouped in the following categories:

Name          Description
App Checks    Custom metrics specific to an application, for example Redis, MongoDB, Memcached, and more
Containers    Container metrics
File          File related metrics
Host          Host related metrics
JMX           Metrics coming from the Java Management Extensions
JVM           Metrics coming from the Java Virtual Machine
Kubernetes    Kubernetes related metrics
Network       Network related metrics
Process       Process related metrics
Provider      Provider related metrics
System        All metrics related to the system, such as CPU, Memory, File System, and Processes
StatsD        Metrics coming from StatsD

Containers

Container metrics

Container ID

container.id

The ID of the running container. For Docker, this is a 12-digit hexadecimal string.

Container Image

container.image

The name of the image used to run the container.

Container Name

container.name

The name of a running container.

Container Count

container.count

Number of containers.
Tip: This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) containers of a certain type in a certain group or node. Try segmenting by container.image, container.id, or container.name. See also: host.count.
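
As a rough illustration of the segmentation idea in the tip above, the following sketch counts containers per image and flags images whose count falls outside an expected range. The container records, the `expected` bounds, and the function names are hypothetical. This is not the Sysdig alerting API, only the logic such an alert expresses.

```python
from collections import Counter

def count_by_image(containers):
    """Count containers per image (like container.count segmented by container.image)."""
    return Counter(c["image"] for c in containers)

def out_of_range(counts, expected):
    """Return {image: count} for images whose count is outside (min, max) in `expected`."""
    alerts = {}
    for image, (low, high) in expected.items():
        n = counts.get(image, 0)
        if n < low or n > high:
            alerts[image] = n
    return alerts

# Hypothetical inventory of running containers.
containers = [
    {"id": "3f2a9c1b7d4e", "name": "web-1", "image": "nginx"},
    {"id": "a81b0c22e9f0", "name": "web-2", "image": "nginx"},
    {"id": "0c9d4e5f6a7b", "name": "cache-1", "image": "redis"},
]

counts = count_by_image(containers)
# Assumed policy: at least 3 nginx containers, 1 to 2 redis containers.
alerts = out_of_range(counts, {"nginx": (3, 10), "redis": (1, 2)})
```

Here only `nginx` is flagged, because two containers are running where the assumed policy expects at least three.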

 

File

File related metrics

File Bytes Received

file.bytes.in

Number of bytes read from files.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.
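
The difference between the default total and a segmented view can be sketched as follows. The sample tuples and helper names are hypothetical and only illustrate the aggregation described in the tip. They do not reflect how the product stores or queries data.

```python
from collections import defaultdict

# Hypothetical samples for one interval: (host, process, bytes read).
samples = [
    ("host-a", "nginx", 4096),
    ("host-a", "mysqld", 8192),
    ("host-b", "nginx", 2048),
]

def total(samples):
    """Default view: one total value for the whole selected scope."""
    return sum(value for _, _, value in samples)

def segment_by(samples, key_index):
    """Segmented view: one total per distinct host (index 0) or process (index 1)."""
    groups = defaultdict(int)
    for sample in samples:
        groups[sample[key_index]] += sample[2]
    return dict(groups)

group_total = total(samples)
by_host = segment_by(samples, 0)
by_process = segment_by(samples, 1)
```

The same samples produce a single number for the group and a per-host or per-process breakdown, depending on the segmentation chosen.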

File Bytes Written

file.bytes.out

Number of bytes written to files.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

File Bytes Total

file.bytes.total

Number of bytes read from and written to files.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

File Open Error Count

file.error.open.count

Number of errors in opening files.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

File Error Count

file.error.total.count

Number of errors caused by file access.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Read File IOPS

file.iops.in

Number of file read operations per second.
Note: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.
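
The distinction drawn in the note above can be illustrated with a small sketch. It compares an IOPS value obtained by counting each read request with one estimated from the number of bytes read. The fixed `block_size` used for the estimate is an assumption made for illustration. Other tools may derive their figure differently.

```python
def counted_iops(request_sizes, interval_seconds):
    """IOPS from the actual number of requests observed in the interval."""
    return len(request_sizes) / interval_seconds

def estimated_iops(request_sizes, interval_seconds, block_size=4096):
    """IOPS estimated from bytes transferred, assuming a fixed block size (hypothetical)."""
    total_bytes = sum(request_sizes)
    return (total_bytes / block_size) / interval_seconds

# Four small reads and one large read during a one-second interval.
reads = [100, 100, 100, 100, 65536]

actual = counted_iops(reads, 1)       # five requests were made
estimate = estimated_iops(reads, 1)   # byte volume suggests many more
```

With a mix of small and large requests the two values diverge, which is why a counted figure can differ from one reported by a byte-based tool.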

Write File IOPS

file.iops.out

Number of file write operations per second.
Note: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Total File IOPS

file.iops.total

Number of read and write file operations per second.
Note: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Number Of Open Files

file.open.count

Number of times files have been opened.

Time Spent In File Reading

file.time.in

Time spent in file reading.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Time Spent In File Writing

file.time.out

Time spent in file writing.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Time Spent In File I/O

file.time.total

Time spent in file I/O.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

 

Host

Host related metrics

Sysdig Agent tags

agent.tag

An optional user-defined label assigned during the Sysdig Monitor agent installation.
Usage: Use tags to assign roles or other identifying information to each instance so it can be logically grouped and displayed in hierarchies on views. Assign tags upon Sysdig agent installation (e.g., role:webserver, location:bldg_3, area:east_coast).
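
The key:value form of the example tags above lends itself to simple grouping. The sketch below parses such a tag string and groups hosts by one tag. The host names and helper functions are hypothetical, and the comma-separated input simply mirrors the example list given above.

```python
def parse_agent_tags(tag_string):
    """Parse 'key:value, key:value' into a dict."""
    tags = {}
    for item in tag_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        tags[key.strip()] = value.strip()
    return tags

def group_hosts_by_tag(host_tags, tag_name):
    """Group host names by the value of one tag; hosts without the tag are skipped."""
    groups = {}
    for host, tag_string in host_tags.items():
        value = parse_agent_tags(tag_string).get(tag_name)
        if value is not None:
            groups.setdefault(value, []).append(host)
    return groups

# Hypothetical hosts and their tag strings.
host_tags = {
    "web-01": "role:webserver, location:bldg_3, area:east_coast",
    "web-02": "role:webserver, location:bldg_4, area:east_coast",
    "db-01": "role:database, location:bldg_3, area:east_coast",
}

by_role = group_hosts_by_tag(host_tags, "role")
```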

Host Count

host.count

Number of hosts.
Tip: This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) machines of a certain type in a certain group. Try segmenting by tag or hostname. See also: container.count.

Website Domain

host.domain

Domain name for external websites.

System Call Errors

host.error.count

Number of system call errors.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Hostname

host.hostName

Host name as defined in the /etc/hostname file.

Private IPs

host.ipList

Private machine IP addresses.

Is Client or Server

host.isClientServer

Returns 'Client' if the host is a client; otherwise returns 'Server'.

Instrumented Host

host.isInstrumented

Specifies if the host has a Sysdig Monitor agent installed.

Internal/External Host

host.isInternal

Specifies if the host is part of your AWS or Rackspace infrastructure or if it is external to it (e.g. another website).

Host MAC Address

host.mac

Media Access Control address of the host.

Main Processes

host.procList.main

The top server processes on the host.

Server Process Names

host.procList.server

List of server program names present in the host.

Number of Processes

proc.count

Number of processes on host or container.

Number of Process Starts

proc.start.count

Number of process starts on host or container.

 

JVM

Metrics coming from the Java Virtual Machine

JVM Heap Max (MB)

jvm.heap.max

The maximum size allocation of heap memory for the JVM (defined by the -Xmx option). Any memory allocation attempt that would exceed this limit will cause an OutOfMemoryError exception to be thrown.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

JVM Non-Heap Max (MB)

jvm.nonHeap.max

The maximum size allocation of non-heap memory for the JVM. This memory is used by Java to store loaded classes and other meta-data.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Heap Used (MB)

jvm.heap.used

The amount of allocated heap memory (i.e., Heap Committed) currently in use. Heap memory is the storage area for Java objects. An object in the heap that is referenced by another object is 'live', and will remain in the heap as long as it continues to be referenced. Objects that are no longer referenced are garbage and will be cleared out of the heap to reclaim space.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Non-Heap Used (MB)

jvm.nonHeap.used

The amount of allocated non-heap memory (i.e., Non-Heap Committed) currently in use. Non-heap memory is used by Java to store loaded classes and other meta-data.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Heap Used %

jvm.heap.used.percent

The ratio between Heap Used and Heap Committed.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.
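
The ratio described above can be written out as a short calculation. This is a minimal sketch: the function name is hypothetical, and expressing the ratio on a 0 to 100 scale is an assumption based on the '%' in the metric name.

```python
def heap_used_percent(heap_used_mb, heap_committed_mb):
    """Heap Used as a percentage of Heap Committed."""
    if heap_committed_mb <= 0:
        # No committed heap reported; avoid dividing by zero.
        return 0.0
    return 100.0 * heap_used_mb / heap_committed_mb

# Example: 384 MB in use out of 512 MB currently committed.
percent = heap_used_percent(384, 512)
```

The same calculation applies to the non-heap variant, with Non-Heap Used and Non-Heap Committed as inputs.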

Non-Heap Used %

jvm.nonHeap.used.percent

The ratio between Non-Heap Used and Non-Heap Committed.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Heap Init (MB)

jvm.heap.init

The initial amount of memory that the JVM requests from the operating system for heap memory during startup (defined by the -Xms option). The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Heap Init may be undefined.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Non-Heap Init (MB)

jvm.nonHeap.init

The initial amount of memory that the JVM requests from the operating system for non-heap memory during startup. The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Non-Heap Init may be undefined.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Heap Committed (MB)

jvm.heap.committed

The amount of memory that is currently allocated to the JVM for heap memory. Heap memory is the storage area for Java objects. The JVM may release memory to the system and Heap Committed could decrease below Heap Init; but Heap Committed can never increase above Heap Max.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Non-Heap Committed (MB)

jvm.nonHeap.committed

The amount of memory that is currently allocated to the JVM for non-heap memory. Non-heap memory is used by Java to store loaded classes and other meta-data. The JVM may release memory to the system and Non-Heap Committed could decrease below Non-Heap Init; but Non-Heap Committed can never increase above Non-Heap Max.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

JVM Loaded Classes

jvm.class.loaded

The number of classes that are currently loaded in the JVM.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

JVM Thread Count

jvm.thread.count

The current number of live daemon and non-daemon threads.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

JVM Thread Daemon

jvm.thread.daemon

The current number of live daemon threads. Daemon threads are used for background supporting tasks and are only needed while normal threads are executing.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

JVM PS MarkSweep Time

jvm.gc.PS_MarkSweep.time

The amount of time the parallel scavenge Mark-Sweep old generation garbage collector has run.

JVM PS MarkSweep Count

jvm.gc.PS_MarkSweep.count

The number of times the parallel scavenge Mark-Sweep old generation garbage collector has run.

JVM PS Scavenge Count

jvm.gc.PS_Scavenge.count

The number of times the parallel eden/survivor space garbage collector has run.

JVM PS Scavenge Time

jvm.gc.PS_Scavenge.time

The amount of time the parallel eden/survivor space garbage collector has run.

JVM Par New Count

jvm.gc.ParNew.count

The number of times the parallel garbage collector has run.

JVM Par New Time

jvm.gc.ParNew.time

The amount of time the parallel garbage collector has run.

JVM ConcurrentMarkSweep Count

jvm.gc.ConcurrentMarkSweep.count

The number of times the Concurrent Mark-Sweep garbage collector has run.

JVM ConcurrentMarkSweep Time

jvm.gc.ConcurrentMarkSweep.time

The amount of time the Concurrent Mark-Sweep garbage collector has run.

 

Kubernetes

Kubernetes related metrics

Desired Daemon Set Pods

kubernetes.daemonSet.pods.desired

The number of nodes that should be running the daemon pod.

Misscheduled Daemon Set Pods

kubernetes.daemonSet.pods.misscheduled

The number of nodes that are running a daemon pod but are not supposed to.

Ready Daemon Set Pods

kubernetes.daemonSet.pods.ready

The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready.

Scheduled Daemon Set Pods

kubernetes.daemonSet.pods.scheduled

The number of nodes that are running at least one daemon pod and are supposed to.

Available Deployment Replicas

kubernetes.deployment.replicas.available

The number of available pods per deployment.

Desired Deployment Replicas

kubernetes.deployment.replicas.desired

The number of desired pods per deployment.

Paused Deployment Replicas

kubernetes.deployment.replicas.paused

The number of paused pods per deployment. These pods will not be processed by the deployment controller.

Running Deployment Replicas

kubernetes.deployment.replicas.running

The number of running pods per deployment.

Unavailable Deployment Replicas

kubernetes.deployment.replicas.unavailable

The number of unavailable pods per deployment.

Updated Deployment Replicas

kubernetes.deployment.replicas.updated

The number of updated pods per deployment.

Job Completions

kubernetes.job.completions

The desired number of successfully finished pods that the job should be run with.

Job Failed

kubernetes.job.numFailed

The number of pods which reached Phase Failed.

Job Succeeded

kubernetes.job.numSucceeded

The number of pods which reached Phase Succeeded.

Job Parallelism

kubernetes.job.parallelism

The maximum desired number of pods that the job should run at any given time.

Active Job

kubernetes.job.status.active

The number of actively running pods.

Namespace Count

kubernetes.namespace.count

The number of namespaces.

Namespace Deployment Count

kubernetes.namespace.deployment.count

The number of deployments per namespace.

Namespace Job Count

kubernetes.namespace.job.count

The number of jobs per namespace.

Namespace Replica Set Count

kubernetes.namespace.replicaSet.count

The number of replicaSets per namespace.

Namespace Service Count

kubernetes.namespace.service.count

The number of services per namespace.

Node Allocatable CPU Cores

kubernetes.node.allocatable.cpuCores

The CPU resources of a node that are available for scheduling.

Node Allocatable Memory Bytes

kubernetes.node.allocatable.memBytes

The memory resources of a node that are available for scheduling.

Node Allocatable Pods

kubernetes.node.allocatable.pods

The pod resources of a node that are available for scheduling.

Node Capacity CPU Cores

kubernetes.node.capacity.cpuCores

The maximum CPU resources of the node.

Node Capacity Memory Bytes

kubernetes.node.capacity.memBytes

The maximum memory resources of the node.

Node Capacity Pods

kubernetes.node.capacity.pods

The maximum number of pods of the node.

Node Disk Pressure

kubernetes.node.diskPressure

The number of nodes with disk pressure.

Node Memory Pressure

kubernetes.node.memoryPressure

The number of nodes with memory pressure.

Node Network Unavailable

kubernetes.node.networkUnavailable

The number of nodes with network unavailable.

Node Out of Disk

kubernetes.node.outOfDisk

The number of nodes that are out of disk space.

Node Ready

kubernetes.node.ready

The number of nodes that are ready.

Node Unschedulable

kubernetes.node.unschedulable

The number of nodes unavailable to schedule new pods.

Waiting Pod Containers

kubernetes.pod.containers.waiting

The number of containers in the pod that are in the waiting state.

Pod Resource Limits on CPU Cores

kubernetes.pod.resourceLimits.cpuCores

The limit on CPU cores to be used by a container.

Pod Resource Limits on Memory Bytes

kubernetes.pod.resourceLimits.memBytes

The limit on memory to be used by a container in bytes.

Pod Resource Requests of CPU Cores

kubernetes.pod.resourceRequests.cpuCores

The number of CPU cores requested by containers in the pod.

Pod Resource Requests of Memory Bytes

kubernetes.pod.resourceRequests.memBytes

The number of memory bytes requested by containers in the pod.

Pod Restart Count

kubernetes.pod.restart.count

The number of container restarts for the pod.

Pod Status Ready

kubernetes.pod.status.ready

The number of pods ready to serve requests.

Desired Replica Set Replicas

kubernetes.replicaSet.replicas.desired

The number of desired pods per replicaSet.

Fully Labeled Replica Set Replicas

kubernetes.replicaSet.replicas.fullyLabeled

The number of fully labeled pods per replicaSet.

Ready Replica Set Replicas

kubernetes.replicaSet.replicas.ready

The number of ready pods per replicaSet.

Running Replica Set Replicas

kubernetes.replicaSet.replicas.running

The number of running pods per replicaSet.

Desired Replication Controller Replicas

kubernetes.replicationController.replicas.desired

The number of desired pods per replication controller.

Running Replication Controller Replicas

kubernetes.replicationController.replicas.running

The number of running pods per replication controller.

 

Network

Network related metrics

Estimated Max Stolen Requests

capacity.estimated.request.stolen.count

Number of requests that this node cannot serve because of CPU steal time.
Usage: This metric is calculated by measuring the current number of requests that a machine is serving, and extrapolating how many more it could serve if there were no steal time. You can use it to understand how steal time is impacting your ability to serve user requests.
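
To make the idea of extrapolation concrete, here is one plausible way such an estimate could be computed. The linear model below is an assumption made purely for illustration and is not the formula used by the product: it supposes that request throughput scales in proportion to the CPU time that was not stolen.

```python
def estimated_stolen_requests(current_requests, steal_percent):
    """Extra requests that could be served with no steal time.

    Assumption (hypothetical model): throughput is proportional to the
    fraction of CPU time actually available to the machine.
    """
    if not 0 <= steal_percent < 100:
        raise ValueError("steal_percent must be in [0, 100)")
    available_fraction = 1.0 - steal_percent / 100.0
    capacity_without_steal = current_requests / available_fraction
    return capacity_without_steal - current_requests

# Example: 800 requests served while 20% of CPU time was stolen.
stolen = estimated_stolen_requests(800, 20)
```

Under this assumed model, a machine serving 800 requests with 20% steal time could have served roughly 200 more.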

Estimated Max Requests

capacity.estimated.request.total.count

Number of requests that this node is estimated to serve at full capacity.
Usage: This metric is calculated by measuring the current number of requests that a machine is serving and the resources (CPU, disk, network...) that each of them is using. The values are combined to project how many requests the machine will be able to serve at full capacity. Observing this metric will help you determine if and when you need to increase the capacity of your infrastructure.

Network Bytes In

net.bytes.in

Inbound network bytes.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Network Bytes Out

net.bytes.out

Outbound network bytes.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Network Bytes Total

net.bytes.total

Total network bytes, inbound and outbound.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Client IP Address

net.client.ip

Client IP address.

Number Of Client Connections

net.connection.count.in

Number of currently established client (inbound) connections.
Tip: This metric is especially useful when segmented by protocol, port or process.

Number Of Server Connections

net.connection.count.out

Number of currently established server (outbound) connections.
Tip: This metric is especially useful when segmented by protocol, port or process.

Number Of Connections

net.connection.count.total

Number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.
Tip: This metric is especially useful when segmented by protocol, port or process.

Network Error Count

net.error.count

Number of network errors.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Failed HTTP Requests

net.http.error.count

Number of failed HTTP requests as counted from 4xx/5xx status codes.
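
Counting failures from status codes, as described above, amounts to the following. The function names and the list of sample codes are hypothetical.

```python
def is_failed(status_code):
    """A request counts as failed when its status code is 4xx or 5xx."""
    return 400 <= status_code <= 599

def http_error_count(status_codes):
    """Number of failed requests among the observed status codes."""
    return sum(1 for code in status_codes if is_failed(code))

# Hypothetical status codes observed during one interval.
codes = [200, 201, 301, 404, 500, 503]

errors = http_error_count(codes)   # 404, 500 and 503 are failures
requests = len(codes)              # all requests, successful or not
```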

HTTP Method

net.http.method

HTTP request method.

HTTP Request Count

net.http.request.count

Count of HTTP requests.

Average HTTP Request Time

net.http.request.time

Average time for HTTP requests.

Max HTTP Request Time

net.http.request.time.worst

Maximum time for HTTP requests.

HTTP Status Code

net.http.statusCode

HTTP response status code.

URL

net.http.url

URL from an HTTP request.

IP Address

net.ip

IP address.

Link Delay Per Request

net.link.delay.perRequest

Average delay in the network link per request.

Client->Server Bytes

net.link.clientServer.bytes

Bytes passing through the link from client to server.

Server->Client Bytes

net.link.serverClient.bytes

Bytes passing through the link from server to client.

MongoDB Collections

net.mongodb.collections

MongoDB collections.

MongoDB Collections Details

net.mongodb.collections.details

MongoDB collections details.

Failed MongoDB Request Count

net.mongodb.error.count

Number of failed MongoDB requests.

MongoDB Operation

net.mongodb.operation

MongoDB operation.

MongoDB Query Type

net.mongodb.op.type

MongoDB operation type.

MongoDB Request Count

net.mongodb.request.count

Total number of MongoDB requests.

Avg MongoDB Request Time

net.mongodb.request.time

Average time to complete a MongoDB request.

Max MongoDB Request Time

net.mongodb.request.time.worst

Maximum time to complete a MongoDB request.

Failed SQL Request Count

net.sql.error.count

Number of failed SQL requests.

SQL Query

net.sql.query

The full SQL query.

SQL Query Type

net.sql.query.type

SQL query type (SELECT, INSERT, DELETE, etc.).

SQL Request Count

net.sql.request.count

Number of SQL requests.

Avg SQL Request Time

net.sql.request.time

Average time to complete a SQL request.

Max SQL Request Time

net.sql.request.time.worst

Maximum time to complete a SQL request.

SQL Table

net.sql.table

SQL query table name.

Network Protocol

net.protocol

The network protocol of a request (e.g. HTTP, MySQL).

Remote Connection Endpoint

net.remote.endpoint

IP address of a remote node.

Remote Connection Service

net.remote.service

Service (port number) of a remote node.

Requests Total

net.request.count

Total number of network requests. Note that this value may exceed the sum of inbound and outbound requests, because it includes requests over internal connections.

Requests In

net.request.count.in

Number of inbound network requests.

Requests Out

net.request.count.out

Number of outbound network requests.

Avg Request Time

net.request.time

A measure of response time that includes application and network latency. On the server side, it is purely a measure of application latency. It is calculated as the time between the arrival of the last request buffer and the departure of the first response buffer.
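
The measurement described above can be sketched as follows. The `Transaction` record and the millisecond timestamps are hypothetical. The sketch only shows the interval being measured, from the last request buffer to the first response buffer, and the averaging across requests.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    request_buffer_times: list    # arrival time (ms) of each request buffer
    response_buffer_times: list   # departure time (ms) of each response buffer

def request_time(tx):
    """Time from the last request buffer arriving to the first response buffer leaving."""
    return min(tx.response_buffer_times) - max(tx.request_buffer_times)

def average_request_time(transactions):
    """Average request time across transactions (0.0 if there are none)."""
    if not transactions:
        return 0.0
    return sum(request_time(tx) for tx in transactions) / len(transactions)

# Two hypothetical transactions, timestamps in milliseconds.
tx1 = Transaction(request_buffer_times=[0, 5], response_buffer_times=[25, 30])
tx2 = Transaction(request_buffer_times=[100], response_buffer_times=[140, 150])

average = average_request_time([tx1, tx2])
```

The first transaction takes 20 ms (from 5 to 25) and the second 40 ms (from 100 to 140), so the average is 30 ms.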

Request Time - File I/O

net.request.time.file

The amount of time for serving a request that is spent doing file I/O. See also net.request.time.net (network I/O time) and net.request.time.processing (CPU processing time).

Request Time % - File I/O

net.request.time.file.percent

The percentage of time for serving a request that is spent doing file I/O. See also net.request.time.net (network I/O time) and net.request.time.processing (CPU processing time).

Avg Request Time - Inbound

net.request.time.in

Average time to serve an inbound request.

Total Time In Node

net.request.time.local

Average per-request delay introduced by this node when it serves requests coming from the previous tiers. In other words, this is the time spent serving incoming requests minus the time spent waiting for outgoing requests to complete.

Delay Local Tiers %

net.request.time.local.percent

When serving requests that come from previous tiers, the percentage of time spent in the local node versus the next tiers.

Request Time - Network I/O

net.request.time.net

The amount of time for serving a request that is spent doing network I/O. See also net.request.time.file (file I/O time) and net.request.time.processing (CPU processing time).

Request Time % - Network I/O

net.request.time.net.percent

The percentage of time for serving a request that is spent doing network I/O. See also net.request.time.file (file I/O time) and net.request.time.processing (CPU processing time).

Total Time In Next Tiers

net.request.time.nextTiers

Delay introduced by the successive tiers when serving requests.

% Time In Next Tiers

net.request.time.nextTiers.percent

When serving requests that come from previous tiers, the percentage of time spent in the next tiers versus the local node.
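
The relationship between the local and next-tier metrics can be worked through with a small example. The function name and input values are hypothetical, and computing both percentages against the total time spent serving incoming requests is an assumption based on the descriptions of net.request.time.local and net.request.time.nextTiers.

```python
def tier_breakdown(time_serving_incoming, time_waiting_outgoing):
    """Split request time between the local node and the next tiers.

    local      = time serving incoming requests - time waiting on outgoing requests
    next tiers = time waiting on outgoing requests
    Percentages are relative to the time serving incoming requests (assumption).
    """
    local = time_serving_incoming - time_waiting_outgoing
    if time_serving_incoming <= 0:
        return {"local": 0, "next_tiers": 0, "local_percent": 0.0, "next_tiers_percent": 0.0}
    return {
        "local": local,
        "next_tiers": time_waiting_outgoing,
        "local_percent": 100.0 * local / time_serving_incoming,
        "next_tiers_percent": 100.0 * time_waiting_outgoing / time_serving_incoming,
    }

# Example: a request takes 200 ms to serve, 150 ms of which is spent
# waiting for a downstream service.
breakdown = tier_breakdown(200, 150)
```

In this example the node itself contributes 50 ms (25%), while the next tiers account for the remaining 150 ms (75%).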

Avg Request Time - Outbound

net.request.time.out

Average time spent waiting for an outbound request.

Request Time - Processing

net.request.time.processing

The amount of time for serving a request that is spent doing CPU processing. See also net.request.time.file (file I/O time) and net.request.time.net (network I/O time).

Request Time % - Processing

net.request.time.processing.percent

The percentage of time for serving a request that is spent doing CPU processing. See also net.request.time.file (file I/O time) and net.request.time.net (network I/O time).

Max Request Time In

net.request.time.worst.in

Maximum time to serve an inbound request.

Max Request Time Out

net.request.time.worst.out

Maximum time spent waiting for an outbound request.

Server IP Address

net.server.ip

Server IP address.

TCP/UDP Server Port

net.server.port

TCP/UDP Server Port number.

TCP Request Queue Len

net.tcp.queue.len

Length of the TCP request queue.

 

Process

Process related metrics

FD Usage %

fd.used.percent

Percentage of used file descriptors out of the maximum available.
Usage: Usually, when a process reaches its FD limit it will stop operating properly and possibly crash. As a consequence, this is a metric you want to monitor carefully, or even better use for alerts.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.
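
A minimal sketch of how this percentage might be used to spot processes approaching their limit is shown below. The process list, the 70% threshold, and the function names are hypothetical.

```python
def fd_used_percent(open_fds, max_fds):
    """Percentage of file descriptors in use out of the maximum available."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return 100.0 * open_fds / max_fds

def processes_near_limit(processes, threshold_percent):
    """Return (name, percent) pairs at or above the threshold, highest first.

    `processes` is an iterable of (name, open_fds, max_fds) tuples.
    """
    usage = [
        (name, fd_used_percent(open_fds, max_fds))
        for name, open_fds, max_fds in processes
    ]
    flagged = [(name, pct) for name, pct in usage if pct >= threshold_percent]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical processes, each with a limit of 1024 descriptors.
processes = [("nginx", 768, 1024), ("mysqld", 1000, 1024), ("cron", 8, 1024)]

near_limit = processes_near_limit(processes, threshold_percent=70.0)
```

Sorting by usage puts the process closest to exhausting its descriptors first, which is the one most at risk of misbehaving.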

Process Start Counter

proc.start.count

A count of how many times a process starts, per second.
Tip: Define this metric in an alert in order to be notified if a process is restarted unexpectedly. For example, create a manual alert with the condition of proc.start.count > 0 on sum, then define the alert's 'Where' clause with 'proc.name' equal to the process to be monitored.
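
The alert condition in the tip above can be expressed as a small predicate. The sample data and function name are hypothetical. The sketch only mirrors the logic of summing proc.start.count for one proc.name over the evaluation window and checking whether the sum is greater than zero.

```python
def restart_alert(samples, process_name):
    """True if the summed start count for `process_name` is greater than zero.

    `samples` is a list of (proc_name, start_count) pairs collected over
    the evaluation window.
    """
    total_starts = sum(count for name, count in samples if name == process_name)
    return total_starts > 0

# Hypothetical samples: nginx started once during the window, redis did not.
samples = [("nginx", 0), ("nginx", 1), ("redis", 0), ("redis", 0)]

nginx_restarted = restart_alert(samples, "nginx")
redis_restarted = restart_alert(samples, "redis")
```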

Process Command Line

proc.commandLine

Command line used to start the process.

Process ID

proc.id

Process ID (Internal)

Process Name

proc.name

Name of the process.

Client Process Name

proc.name.client

Name of the client process.

Server Process Name

proc.name.server

Name of the server process.

 

Provider

Top

Provider related metrics

AWS T2 EC Credit Balance

aws.ec2.CPUCreditBalance

CPU credit balance for AWS T2 instances.

AWS T2 EC Credit Usage

aws.ec2.CPUCreditUsage

CPU credit usage for AWS T2 instances.

Account Id

cloudProvider.account.id

The account number related to your AWS account - useful when you have multiple AWS accounts linked with Sysdig Monitor.

Availability Zone

cloudProvider.availabilityZone

The AWS Availability Zone where the entity or entities are located. Each Availability Zone is an isolated subsection of an AWS region (see cloudProvider.region).

Provider Private Addresses

cloudProvider.host.ip.private

The private IP address allocated by the cloud provider for the instance. This address can be used for communication between instances in the same network.

Public IPs

cloudProvider.host.ip.public

Public IP addresses of the selected host.

Cloud Provider

cloudProvider.host.name

The name of the host as reported by the cloud provider (e.g. AWS).

Provider ID

cloudProvider.id

ID number as assigned and reported by the cloud provider.

Instance Type

cloudProvider.instance.type

The type of AWS or Rackspace instance.
Usage: This metric is extremely useful to segment instances and compare their resource usage and saturation. You can use it as a grouping criterion for the explore table to quickly explore AWS usage on a per-instance-type basis. You can also use it to compare things like CPU usage, number of requests or network utilization for different instance types.
Tip: Use this grouping criterion in conjunction with the host.count metric to easily create a report on how many instances of each type you have.

Cloud Provider Name

cloudProvider.name

Name of the cloud service provider (AWS, Rackspace, etc.).

Region

cloudProvider.region

The AWS or Rackspace region where the host (or group of hosts) is located.
Tip: Use this grouping criterion in conjunction with the host.count metric to easily create a report on how many instances you have in each region.

Resource Endpoint

cloudProvider.resource.endPoint

DNS name through which the resource can be accessed.

Service Name

cloudProvider.resource.name

The AWS service name (e.g. EC2, RDS, ELB).

Service Type

cloudProvider.resource.type

The AWS service type (e.g. INSTANCE, LOAD_BALANCER, DATABASE).

Security Group Name

cloudProvider.securityGroups

Names of the security groups.

Provider Status

cloudProvider.status

Resource status.

Provider Tag

cloudProvider.tag

One of the AWS tags.

 

System

Top

Contains all the metrics related with the system, such as CPU, Memory, File System, Processes

Capacity Stolen

capacity.stolen.percent

This metric reflects the loss of capacity to service requests due to stolen CPU and its impact on other resource usage capabilities such as disk I/O and network I/O.
Usage: capacity.stolen.percent is non-zero only if cpu.stolen.percent is also non-zero.

Capacity in Use + Stolen

capacity.total.percent

Shows the estimated current capacity usage of this machine, based on CPU, disk and network utilization, with CPU stolen time added back.
Usage: This metric can tell you how the system would perform if it had dedicated use of the CPU.

Capacity in Use

capacity.used.percent

Estimated current capacity usage of this machine, based on CPU, disk and network utilization.
Usage: This metric is calculated by measuring the resources (CPU, disk, network...) that each request coming to the machine is using. The values are combined to create a score that indicates how saturated the machine resources are.

CPU Stolen %

cpu.stolen.percent

CPU steal time is a measure of the percent of time that a virtual machine's CPU is in a state of involuntary wait due to the fact that the physical CPU is shared among virtual machines. In calculating steal time, the operating system kernel detects when it has work available but does not have access to the physical CPU to perform that work.
Tip: If the percent of steal time is consistently high, you may want to stop and restart the instance (since it will most likely start on different physical hardware) or upgrade to a virtual machine with more CPU power. See also the capacity.total.percent metric to see how steal time directly impacts the number of server requests that could not be handled. On AWS EC2, steal time does not depend on the activity of other virtual machine neighbours. EC2 is simply making sure your instance is not using more CPU cycles than paid for.
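Example: on Linux, steal time is exposed as the eighth value of the aggregate "cpu" line in /proc/stat (after user, nice, system, idle, iowait, irq and softirq). The following is a minimal sketch of deriving a steal percentage from two snapshots of that line. It is not necessarily how the agent computes cpu.stolen.percent, and the function names are hypothetical.

```python
from typing import List

STEAL_INDEX = 7  # position of 'steal' among the first eight counters


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into eight counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen between two /proc/stat snapshots."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [n - o for n, o in zip(new, old)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[STEAL_INDEX] / total
```

If 45 of the 200 ticks elapsed between the two snapshots were steal time, the function returns 22.5.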

CPU Used %

cpu.used.percent

The average CPU usage of the host, measured as the sum of the usage of all cores divided by the number of cores.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.
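Example: the per-host averaging described for this metric amounts to the following arithmetic. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def cpu_used_percent(per_core_usage: Sequence[float]) -> float:
    """Average CPU usage: sum of per-core usage divided by the core count."""
    if not per_core_usage:
        raise ValueError("at least one core is required")
    return sum(per_core_usage) / len(per_core_usage)


# Hypothetical 4-core host: one core saturated, the others mostly idle.
cores = [100.0, 50.0, 25.0, 25.0]
host_usage = cpu_used_percent(cores)
```

A host-level value of 50% can therefore hide a single saturated core, which is one reason to segment the metric.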

CPU User %

cpu.user.percent

Percentage of CPU utilization that occurred while executing at the user level (application).
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

CPU Nice %

cpu.nice.percent

Percentage of CPU utilization that occurred while executing at the user level with nice priority.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

CPU System %

cpu.system.percent

Percentage of CPU utilization that occurred while executing at the system level (kernel).
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

CPU I/O Wait %

cpu.iowait.percent

Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

CPU Idle %

cpu.idle.percent

Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

FS Free Space

fs.bytes.free

Filesystem available space.

FS Size

fs.bytes.total

Filesystem size.

Disk Used Bytes

fs.bytes.used

Filesystem used space.

FS Device

fs.device

Filesystem device.

FS Free Space %

fs.free.percent

Percentage of filesystem free space.

FS Mount Dir

fs.mountDir

Filesystem mount directory.

FS Type

fs.type

Filesystem type.

FS Usage %

fs.used.percent

Percentage of the combined capacity of all filesystems that is in use.

FS Root Usage %

fs.root.used.percent

Percentage of the root filesystem in use.

FS Largest Usage %

fs.largest.used.percent

Percentage of the largest filesystem in use.
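Example: the three filesystem usage percentages can be contrasted with a small sketch. It assumes that "largest" means the filesystem with the greatest total size and that the root filesystem is mounted at "/". Both are assumptions made for illustration, as are the function name and the input layout.

```python
from typing import Dict, Tuple


def fs_percentages(filesystems: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute usage percentages from {mount_dir: (used_bytes, total_bytes)}."""
    total_used = sum(used for used, _size in filesystems.values())
    total_size = sum(size for _used, size in filesystems.values())
    root_used, root_size = filesystems["/"]
    # Assumption: 'largest' is the filesystem with the greatest total size.
    big_used, big_size = max(filesystems.values(), key=lambda entry: entry[1])
    return {
        "fs.used.percent": 100.0 * total_used / total_size,
        "fs.root.used.percent": 100.0 * root_used / root_size,
        "fs.largest.used.percent": 100.0 * big_used / big_size,
    }


# Hypothetical host with a small root volume and a larger data volume.
usage = fs_percentages({"/": (30, 100), "/data": (300, 400)})
```

Here the combined usage is 66%, the root filesystem is 30% full and the largest filesystem is 75% full.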

Swapped Memory Used

memory.swap.bytes.used

The amount of swapped memory currently in use.
Tip: Swap space is secondary memory storage on a drive for data that does not fit into physical RAM. If swap space usage is high, consider adding memory to the instance to increase performance.

Memory Total

memory.bytes.total

The total amount of physical memory available.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Memory Used

memory.bytes.used

The amount of physical memory currently in use.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Virtual Memory Used

memory.bytes.virtual

The amount of virtual memory currently in use.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Available Memory

memory.bytes.available

The amount of available memory.
Tip: An estimate of how much memory is available for starting new applications, without swapping. Note that this metric may not be directly available on older systems (kernel version < 3.14), in which case it is approximated as the sum of free and cached memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.
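Example: the fallback described in the tip above can be sketched against the text of /proc/meminfo. The sketch assumes the standard MemAvailable, MemFree and Cached fields. The function names are hypothetical and this is not the agent's code.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into a {field: bytes} mapping."""
    values = {}
    for line in text.splitlines():
        key, _sep, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        # Most fields are reported in kB; a few are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def available_bytes(meminfo: Dict[str, int]) -> int:
    """Use MemAvailable when present, else approximate as free + cached."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    return meminfo["MemFree"] + meminfo["Cached"]
```

On a Linux host, available_bytes(parse_meminfo(open("/proc/meminfo").read())) returns the estimate in bytes.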

Total Swap Memory

memory.swap.bytes.total

Total amount of swap memory.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Available Swap Memory

memory.swap.bytes.available

Available amount of swap memory.
Tip: Sum of free and cached swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Used Swap Memory

memory.swap.bytes.used

Used amount of swap memory.
Tip: The amount of used swap memory is calculated by subtracting available from total swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.

Used Swap Memory %

memory.swap.used.percent

Used percent of swap memory.
Tip: The percentage of used swap memory is calculated as the ratio of used to total swap memory, expressed as a percentage. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using 'Segment by' in the UI.
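Example: the swap calculations described in the tips for memory.swap.bytes.used and memory.swap.used.percent reduce to the following arithmetic. The function name is hypothetical, and returning 0% for a host without swap is an assumption of this sketch.

```python
from typing import Tuple


def swap_usage(total_bytes: int, available_bytes: int) -> Tuple[int, float]:
    """Return (used_bytes, used_percent) for swap memory."""
    used = total_bytes - available_bytes
    # Assumption: a host with no swap configured reports 0% used.
    percent = 100.0 * used / total_bytes if total_bytes else 0.0
    return used, percent


# Hypothetical host: 4096 bytes of swap, 1024 still available.
used, percent = swap_usage(4096, 1024)
```

In this case 3072 bytes are used, which is 75% of the total.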

Memory Page Faults

memory.pageFault.major

A count of the condition that occurs when a program accesses a memory page that is mapped in the virtual address space, but not loaded in physical memory.
Usage: A major or 'hard' page fault is handled by using a disk I/O operation (e.g., a memory mapped file or page replacement causing page swapping). For instance, when starting an application the Linux kernel will search physical memory and the CPU cache, and, if the data is not found, a major page fault occurs. Generally, adjusting application source code or making more physical memory available reduces major page faults.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Memory Minor Page Faults

memory.pageFault.minor

A count of the condition in which a memory page had been loaded in memory at the time the page fault was generated, but was not marked in the memory management unit as being loaded in memory.
Usage: If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or 'soft' page fault. A minor page fault is handled without using a disk I/O operation (e.g., memory allocated by malloc()). The effect of minor page faults depends on system load and other factors, but they are typically short and have very little impact.
Tip: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Memory Usage %

memory.used.percent

The percentage of physical memory in use.
Tip: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use 'Segment by' in the UI.

Uptime

uptime

The percentage of time the selected entity was down during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down.

CPU Shares Count

cpu.shares.count

Amount of CPU Shares assigned to a container (technically, the container's cgroup) - this is a common way of creating a CPU limit for a container. CPU Shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, then each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren’t being consumed by the container they were originally allocated to.

CPU Shares Used %

cpu.shares.used.percent

Percentage of a container's allocated CPU Shares that are actually used. CPU Shares are a common way of creating a CPU limit for a container. CPU Shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, then each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren’t being consumed by the container they were originally allocated to - so this metric, CPU Shares Used %, can actually exceed 100%.
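Example: the share arithmetic described above, including why the percentage can exceed 100%, can be sketched as follows. The function names are hypothetical, and the way usage is related to the allocation is an assumption made for illustration rather than the agent's exact formula.

```python
from typing import Dict


def cpu_share_allocation(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU cycles each container is entitled to under contention."""
    total = sum(shares.values())
    return {name: count / total for name, count in shares.items()}


def shares_used_percent(cpu_fraction_used: float, allocated_fraction: float) -> float:
    """CPU actually used, as a percentage of the share-based allocation."""
    return 100.0 * cpu_fraction_used / allocated_fraction


# Three containers with the default 1024 shares each get 1/3 of the cycles.
allocation = cpu_share_allocation({"a": 1024, "b": 1024, "c": 1024})

# If the other two are idle, container "a" may use half of the CPU,
# which is more than its 1/3 allocation.
a_percent = shares_used_percent(0.5, allocation["a"])
```

In this scenario container "a" uses roughly 150% of its allocation, which is possible because shares are a relative weight and not a hard limit.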

CPU Quota Used %

cpu.quota.used.percent

Percentage of a container's CPU Quota that is actually used. CPU Quotas are a common way of creating a CPU limit for a container. CPU Quotas are based on a percentage of time - a container can only spend its quota of time on CPU cycles across a given time period (default period is 100ms). Note that, unlike CPU Shares, CPU Quota is a hard limit to the amount of CPU the container can use - so this metric, CPU Quota Used %, should not exceed 100%.
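Example: a quota is an amount of CPU time allowed per period, so the metric can be sketched as the CPU time consumed in one period divided by the quota. The function names and the microsecond units are assumptions of this illustration.

```python
def cpu_limit_cores(quota_us: int, period_us: int = 100_000) -> float:
    """Number of CPU cores a quota corresponds to (default period: 100ms)."""
    return quota_us / period_us


def quota_used_percent(cpu_time_used_us: int, quota_us: int) -> float:
    """CPU time consumed during one period, as a percentage of the quota."""
    return 100.0 * cpu_time_used_us / quota_us


# Hypothetical container allowed 50ms of CPU time per 100ms period.
limit = cpu_limit_cores(50_000)
used = quota_used_percent(25_000, 50_000)
```

This container is limited to half a core and, having consumed 25ms in the period, is at 50% of its quota.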

Memory Limit

memory.limit.bytes

Memory limit in bytes assigned to a container.

Memory Limit Usage %

memory.limit.used.percent

Percentage of memory limit used by a container.

Swap Limit

swap.limit.bytes

Swap limit in bytes assigned to a container.

Swap Limit Usage %

swap.limit.used.percent

Percentage of swap limit used by the container.

System Average 1 min Load

load.average.1m

The 1 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 1 minute for all cores. The value should correspond to the first (of three) load average values displayed by the 'uptime' command.

System Average 5 min Load

load.average.5m

The 5 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 5 minutes for all cores. The value should correspond to the second (of three) load average values displayed by the 'uptime' command.

System Average 15 min Load

load.average.15m

The 15 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 15 minutes for all cores. The value should correspond to the third (and last) load average value displayed by the 'uptime' command.

System Average per CPU 1 min Load

load.average.percpu.1m

The 1 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 1 minute, divided by number of system CPUs.

System Average per CPU 5 min Load

load.average.percpu.5m

The 5 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 5 minutes, divided by number of system CPUs.

System Average per CPU 15 min Load

load.average.percpu.15m

The 15 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O averaged over 15 minutes, divided by number of system CPUs.
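Example: the per-CPU load metrics divide the system load average by the number of CPUs. A minimal sketch using the Python standard library follows. The function names are hypothetical, and os.getloadavg() is only available on Unix-like systems.

```python
import os
from typing import Tuple


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Normalize a system load average by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def current_load_per_cpu() -> Tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages divided by the CPU count."""
    cpus = os.cpu_count() or 1
    one, five, fifteen = os.getloadavg()
    return (
        load_per_cpu(one, cpus),
        load_per_cpu(five, cpus),
        load_per_cpu(fifteen, cpus),
    )
```

A load average of 6.0 on a 4-CPU host gives a per-CPU load of 1.5, which means more runnable jobs than CPUs on average.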

System Uptime

system.uptime

System Uptime