Sysdig Monitor Drilldown Views

This page lists the views available for each host in your infrastructure. To display them, go to the Explore page, select a host in the table, and use the Display menu in the bottom panel to skim through them.

Categories

Drilldown Views are grouped in the following categories:

Name | Description
Services | Views offering visibility into the holistic performance of your services, even if those services are deployed in orchestrated containers.
Applications | Views for all your various Applications and Infrastructure components. Are we missing your favorite app? Let us know at info@sysdig.com!
Hosts & Containers | Views offering deep visibility into resource utilization and system activity on your hosts and in your containers.
Network | Views offering deep visibility into network connections and activity.
AWS ECS | Views for environments running on Amazon’s EC2 Container Service (ECS).
Docker Orchestration | Views for environments built on Docker Swarm (v1.12+) and Docker Compose (including Universal Control Plane - UCP, and Docker DataCenter).
Kubernetes | Views for environments running on Kubernetes, including Google’s Container Engine.
Marathon | Views for environments running on Mesos’s Marathon framework, including Mesosphere's DCOS.
Mesos | Views for environments running on Mesos, including Mesosphere's DCOS.
Topology | Topology views map the logical dependencies of your application tiers and overlay metrics.

 

Services

Views offering visibility into the holistic performance of your services, even if those services are deployed in orchestrated containers.

Service Overview

view.services.overview

Apply this view to a single service to get an overview of the resource utilization and response times of that service.

Service Bottlenecks

view.services.bottlenecks

Apply this view to a single service to get a better understanding of what might be causing the bottlenecks in the service.
Local vs Next Tiers: these panels show the amount of time that requests spend in local versus remote tiers of a distributed transaction. For example, in a scenario where a client makes a request to a web server (local tier) which, in turn, makes a request to a back-end database server (next tier), the Local vs Next Tiers panels show the percentage of time that the request spends in each tier.
Processing vs Net vs Disk: these panels show the amount of time that local requests spend on processing, network I/O, and disk I/O.

Overview by Service

view.services.overviewByService

Apply this view to a group of services to get an overview of the size, performance, and limitations of each service. In this view, services are defined as sharing the same container image. Note that specialized views are also available showing service breakdowns based on your particular orchestrator (e.g. Docker Swarm, Kubernetes, Mesos/Marathon).
 

Applications

Views for all your various Applications and Infrastructure components. Are we missing your favorite app? Let us know at info@sysdig.com!

HTTP Overview

view.net.http

This view gives a basic understanding of the health of your web server by showing the load being put on it and the server's ability to service requests in a timely manner.
Usage: Use the Number of Requests and Avg/Max Request Time panels to gauge how busy your server is. See if correlations exist between the Top URLs and Slowest URLs panels to find opportunities to significantly enhance performance. Use the Status Codes metric to keep an eye on problems with excessive 4xx and 5xx error codes.
Tip: When improvements are made to the web server, use the Time Travel window's Compare function to help verify if overall performance improved as expected.
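As a rough illustration of the Status Codes check described above, the following sketch computes the share of 4xx and 5xx responses from a set of per-status counts. The `status_counts` sample data and the `error_share` helper are hypothetical and are not part of Sysdig Monitor; in practice the counts would be read from the Status Codes panel.

```python
from collections import Counter

def error_share(status_counts):
    """Return the fraction of all responses that fall in the 4xx and 5xx classes."""
    total = sum(status_counts.values())
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    by_class = Counter()
    for code, count in status_counts.items():
        # 404 -> "4xx", 503 -> "5xx", and so on.
        by_class[f"{code // 100}xx"] += count
    return {cls: by_class[cls] / total for cls in ("4xx", "5xx")}

# Hypothetical sample: HTTP status code -> number of responses.
status_counts = {200: 900, 301: 40, 404: 50, 500: 8, 503: 2}
shares = error_share(status_counts)
print(shares)
```

A rising 5xx share usually points at the server itself, while a rising 4xx share more often points at clients or broken links, so it can be worth tracking the two separately.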

HTTP Top Requests

view.http.top.request

This view details the top requested URLs on your web server and includes the total number of requests, the average and maximum times to service the requests, and the amount of traffic contained in the requests and responses.
Metrics:
  1. URL
  2. Method
  3. Requests Total
  4. Avg Request Time
  5. Max Request Time
  6. Network Bytes In
  7. Network Bytes Out
  8. Network Bytes Total
  9. Error Count

MySQL/PostgreSQL

view.net.sql

This view shows the overall load and performance status of your SQL database transactions with metrics for the number of requests and how quickly they are handled. Actual database queries are shown by top request count and slowest performance, as are database tables.
Metrics:
  1. Number of Requests
  2. Average and Max Request Time
  3. Top Queries by Number of Requests
  4. Top Tables by Number of Requests
  5. Slowest Queries
  6. Slowest Tables
  7. Request Types Over Time
Usage: Use this view to help determine whether you can improve performance. For example, by monitoring the response times for slowest queries, you can determine whether changes to the query or indexes on the tables are required.

MySQL/PostgreSQL Top

view.sql.top

These three tables give insight into the top SQL queries by displaying metrics for the number of queries received and the amount of traffic sent and received for each query. Request times are also shown, along with a count of the number of query failures.
Metrics:
  1. SQL Query Chart
  2. Top Tables Chart
  3. Top Query Types Chart
  4. Requests Total
  5. Network Bytes In
  6. Network Bytes Out
  7. Network Bytes Total
  8. Avg Req Time
  9. Max Req Time
  10. Failed Reqs
Usage: Sort columns as needed to see the most requested, highest traffic producing or slowest processing queries. If the query statement is truncated in the view, mouse over it to see a pop-up with the full query.

MongoDB

view.mongodb

This view shows how busy your MongoDB service is, which collections are in highest demand, and which have the slowest performance. Top and slowest operations are also shown.
Metrics:
  1. Number of Requests
  2. Average and Max Request Time
  3. Top Collections by Number of Requests
  4. Top Operations by Number of Requests
  5. Slowest Collections
  6. Slowest Operations
  7. Number of Operations Over Time
Usage: Use to spot which collections may benefit from query and index performance tuning.

JVM

view.jvm

This view shows heap usage and thread counts for your Java virtual machines.
Metrics:
  1. Heap Usage Over Time
  2. Heap Usage by Process
  3. Thread Count
Usage: Use this view to track memory management of your Java virtual machines. Watch dynamic memory allocation with the Heap Usage graphs and memory reclamation via the Garbage Collector charts.

Cassandra Overview

view.cassandra.overview

This view shows how a Cassandra cluster is performing by mixing key system metrics with Cassandra-specific metrics such as request volume and compactions.
Metrics:
  1. Heap Usage Over Time
  2. Heap Usage by Process
  3. Garbage Collector: Collection Time
  4. Garbage Collector: Collection Count
  5. Thread Count
Usage: Use this view on a group containing the entire Cassandra cluster as a starting point to troubleshoot the overall health of your database. You can inspect typical system metrics to make sure the cluster is not being overloaded, and then correlate this information with important advanced Cassandra metrics such as pending compactions, which are useful for spotting a possible imminent degradation of performance, or JVM metrics such as heap usage and garbage collection times, which can point out critical problems.

Cassandra by Node

view.cassandra.byNode

This view shows how every node in a Cassandra cluster is performing by mixing key system metrics with Cassandra-specific metrics such as request volume and compactions.
Metrics:
  1. Heap Usage Over Time
  2. Heap Usage by Process
  3. Garbage Collector: Collection Time
  4. Garbage Collector: Collection Count
  5. Thread Count
Usage: Use this view on a group containing the entire Cassandra cluster when you have already identified a problem with some metric (using the "Cassandra Overview" view) and need to see which node is causing it. You can easily spot issues such as imbalances between the size of data held in each node, nodes going down and generating a lot of hinted handoffs, or disk bottlenecks by looking at the pending compactions.

ActiveMQ

view.activemq

This view presents an overview of thirteen performance metrics for throughput, resource usage, and processing times for your ActiveMQ servers.
Metrics:
  1. All Pending Messages
  2. Expired Message Rate
  3. Enqueue Time Overview
  4. Storage Usage
  5. Broker Memory Usage
  6. Producer Count
  7. Enqueue Rate
  8. Consumer Count
  9. Dequeue Rate
  10. Dispatch Rate
  11. Inflight Messages
  12. Queues Memory Usage
  13. Broker Temp Usage
Tips: Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Tomcat

view.tomcat

This view displays an overview of nine major performance metrics for throughput, errors, requests, and processing times for your Tomcat servers.
Metrics:
  1. Bytes Sent vs Bytes Received
  2. Error Count
  3. Request Count
  4. Active Sessions vs Expired Sessions
  5. Datasources Overview
  6. Threads Overview
  7. Servlet Processing Time
  8. Request Processing Time vs Max Time
  9. Sessions: Create vs Expire Rate
Tips: Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Kafka

view.kafka

This view displays specific performance and capacity metrics for your Kafka messaging service.
Metrics:
  1. In-Messages
  2. Network Overview
  3. Fetch and Produce Fails
  4. Average Fetch Time
  5. Average Operation Time
  6. Replication Overview
Tip: Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Apache ZooKeeper Standalone

view.zookeeper

This view displays an overview of latency and capacity metrics on a per cluster basis.
Metrics:
  1. Request Latency
  2. zNode Count
  3. Active Connections
Tip: Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Apache ZooKeeper Cluster

view.zookeeper.byNode

This view displays an overview of latency and capacity metrics on a per node basis.
Metrics:
  1. Request Latency
  2. zNode Count
  3. Alive Connections
  4. Quorum Size
Tip: Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Apache HBase

view.hbase

This view displays an overview of critical latency and capacity metrics for the HBase distributed database.
Metrics:
  1. Server Load
  2. Requests Overview
  3. Flush Queue Length
  4. Compaction Queue Length
  5. Cache Overview
  6. Cache Eviction
  7. Memory Store Size
  8. SlowAppend Count
Tip: Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Apache

view.app.apache

This view displays eight performance and resource usage metrics for the server overall and top requested and slowest performing pages.
Metrics:
  1. Busy vs Idle Workers
  2. CPU Load by Machine
  3. Network Bytes Activity
  4. Requests per Second
  5. Top URLs
  6. Slowest URLs
  7. HTTP Methods
  8. Response Codes
Usage: Use the first four metrics to determine 'busyness' when making load-balancing decisions. See 'Slowest URLs' to pinpoint pages that should be optimized and 'Response Codes' to find problems. Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected or require more resources.
Tip: This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.

Elasticsearch

view.app.elasticsearch

This view lists eight important metrics for node and document counts, shards, indexing time and query latency.
Metrics:
  1. Number of Nodes
  2. Network Bytes Activity
  3. Number of Shards
  4. Active Shards by Host
  5. Document Count
  6. Query Latency
  7. Indexing Time
  8. Index Memory Size
Usage: Since query latency can directly impact user experience, consider adding an alert for it. Watch the node count, as this can also impact query times. Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.
Tip: Apply the view 'App: JVM' for additional memory management insight into this JVM application. This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.

HAProxy

view.app.haproxy

This view reports metrics for host CPU use and proxy throughput.
Metrics:
  1. Frontend Network Bytes (In vs Out)
  2. Frontend sessions
  3. Backend Network Bytes (In vs Out)
  4. Backend errors
  5. Backend warning rate
  6. CPU Usage by Machine
Usage: Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.
Tip: This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.

Nginx

view.app.nginx

This view reports nine metrics for host resources, HTTP connections, top and slowest URLs, and host response codes.
Metrics:
  1. Writing, Reading, Waiting connections
  2. CPU Load by Machine
  3. Network Bytes Activity
  4. Requests per Second
  5. Top URLs
  6. Slowest URLs
  7. Active Connections
  8. Dropped Connections
  9. Response Codes
Usage: Use the first four metrics to determine 'busyness' for load-balancing adjustments. Focus on 'Slowest URLs' to identify pages for optimization and 'Response Codes' to find problems. Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected.
Tip: This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.

RabbitMQ

view.app.rabbitmq

This view reports seven metrics for host loading and queue performance.
Metrics:
  1. Queue Messages
  2. Queue consumers
  3. Queue Memory
  4. Network Bytes Activity
  5. CPU Usage by Machine
  6. Sockets per Node
  7. File Descriptors per Node
Usage: Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.
Tip: This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.

Redis

view.app.redis

This view reports seven metrics for host resource usage and application performance.
Metrics:
  1. Connections
  2. Keys
  3. Memory Usage
  4. Keyspaces hits and misses
  5. Commands
  6. Network Bytes Activity
  7. CPU Usage by Machine
Usage: Pin individual metrics to your own custom dashboard using the push-pin icon in each panel. Use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.
Tip: This view uses application check functionality for metrics polling. Configure non-default application connection parameters in the agent's configuration file, /opt/draios/etc/dragent.yaml, per the user guide.
 

Hosts & Containers

Views offering deep visibility into resource utilization and system activity on your hosts and in your containers.

Overview by Process

res.usage.proc

This view reports general resource usage statistics for the top processes on a host, or if the view is applied to a group of instances, the top processes aggregated for all instances of the group.
Metrics:
  1. CPU %
  2. Memory Usage %
  3. Network Bytes Total
  4. Number of Network Connections
  5. File Bytes Total
  6. Disk Usage
Usage: Monitor this view to identify which processes are using disproportionate amounts of resources. This is helpful in determining whether an application should be moved to a more capable host.
Tip: You can use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Overview by Container

res.usage.container

This view reports resource usage statistics for containers running on the selected group or instance.
Metrics:
  1. CPU %
  2. Memory Usage %
  3. Network Bytes Total
  4. File Bytes Total
Usage: Monitor this view to identify which containers are using disproportionate amounts of resources. This is helpful in determining whether an application should be moved to a more capable host. Drill into an instance by clicking the '+' symbol and selecting a container to automatically bring up the System: Container Overview screen for more detailed information about the processes running inside.
Tip: You can use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.

Container Limits

view.container.limits

This container view shows both CPU and memory metrics relative to the host resources allocated to each container. Note that this is different from the standard metrics surfaced for containers, which convey resource utilization relative to the total host resources.
Tip: CPU shares and quotas represent two different ways that CPU can be allocated to a container at the kernel level. For detailed descriptions of these metrics, see the Sysdig metrics dictionary (http://support.sysdigcloud.com/hc/en-us/articles/204931155#container).
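To make the distinction above concrete, the following sketch contrasts usage relative to a container's own allocation with usage relative to the whole host, and shows the usual arithmetic behind CPU quotas and CPU shares. All helper names and sample numbers are hypothetical; the sketch relies only on the general kernel behavior that a quota caps a container at quota/period cores, while shares are relative weights that matter when containers compete for CPU.

```python
def usage_vs_limit(used, limit):
    """Percent of the container's own allocation that is in use."""
    return 100.0 * used / limit

def usage_vs_host(used, host_total):
    """Percent of the whole host's capacity that is in use."""
    return 100.0 * used / host_total

def cores_from_quota(quota_us, period_us):
    """A CPU quota caps the container at quota/period cores."""
    return quota_us / period_us

def share_fraction(shares, all_shares):
    """Fraction of CPU a container can claim when every container is busy."""
    return shares / sum(all_shares)

# Hypothetical container: 1.5 GiB used, 2 GiB limit, on a 16 GiB host.
GIB = 1024 ** 3
mem_used, mem_limit, host_mem = 1.5 * GIB, 2 * GIB, 16 * GIB

pct_of_limit = usage_vs_limit(mem_used, mem_limit)   # close to its own limit
pct_of_host = usage_vs_host(mem_used, host_mem)      # small on the host scale
quota_cores = cores_from_quota(50_000, 100_000)      # half a core
weight = share_fraction(512, [512, 1024, 512])       # a quarter under contention
print(pct_of_limit, pct_of_host, quota_cores, weight)
```

The same container can look nearly idle from the host's point of view while being close to its own limit, which is the situation this view is meant to expose.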

Overview by Host

res.usage.host

This view reports general resource usage statistics for the whole host, or if the view is applied to a group of instances, for each instance of the group.
Metrics:
  1. CPU %
  2. Memory Usage %
  3. Network Bytes Total
  4. Number of Network Connections
  5. File Bytes Total
  6. Disk Usage
Usage: This view is especially useful because it readily shows when a host is being under- or over-utilized within a group of hosts with similar job functions.
Tip: You can use the Time Travel feature with the Compare function to see historical context and determine if processes are running as expected, misbehaving, or now require more resources.
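One way to reason about under- or over-utilization within a group is to compare each host with the group average. The sketch below uses a simple z-score for this; it is an illustrative technique chosen here, not how Sysdig Monitor itself evaluates hosts, and the `flag_hosts` helper, the threshold, and the sample CPU values are all hypothetical.

```python
from statistics import mean, pstdev

def flag_hosts(cpu_by_host, threshold=1.5):
    """Flag hosts whose CPU % is far from the group mean (simple z-score)."""
    values = list(cpu_by_host.values())
    avg, spread = mean(values), pstdev(values)
    flagged = {}
    if spread == 0:
        return flagged  # every host reports the same value
    for host, cpu in cpu_by_host.items():
        z = (cpu - avg) / spread
        if z >= threshold:
            flagged[host] = "over"
        elif z <= -threshold:
            flagged[host] = "under"
    return flagged

# Hypothetical CPU % readings for hosts with the same job function.
cpu_by_host = {"web-1": 40.0, "web-2": 42.0, "web-3": 38.0, "web-4": 90.0, "web-5": 41.0}
flagged = flag_hosts(cpu_by_host)
print(flagged)
```

With only a handful of hosts, a single outlier inflates the standard deviation, so a fairly low threshold is used here; for larger groups a median-based measure may be more robust.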

Top Processes

view.top.proc

This view lists the top processes running on the host or hosts (if a group is selected). The process name, its path and all arguments will be shown as well.
Metrics:
  1. Process command line
  2. CPU %
  3. Memory Usage %
  4. Network Bytes In
  5. Requests In
Usage: Use this view when you want to find which processes are top consumers in an environment where the same process is spawned multiple times.
Tip: Remember to use the filter feature to help restrict which processes are listed (e.g. getty). If a process name is truncated in the list, move the mouse over the name to see a tooltip with the full command.

Top Server Processes

view.top.proc.server

Similar to the System: Top Processes view, this view displays resource consumption only for processes identified as server oriented (httpd, java, ntpd, etc.).
Metrics:
  1. Process command line
  2. CPU %
  3. Memory Usage %
  4. Network Bytes
  5. Requests In
Usage: Use this view to see resource usage for only server processes.
Tip: If a process name is truncated, move the mouse over the name to see a tooltip with the full command.

File System

view.fs

This table view shows directory mount points, file system devices, and capacity and usage information for the file systems mounted on the instance. When groups are selected, metrics are averages for similar filesystem mount points.
Metrics:
  1. FS Mount Dir
  2. FS Device
  3. FS Type
  4. FS Size
  5. Disk Used Bytes
  6. FS Free Space
  7. FS Usage %
Usage: Sort on 'FS Usage %' to find which file systems are filling up or which ones are being underutilized.
Tip: Note that remotely mounted file systems are not listed by default. To enable, add the entry 'remotefs.enabled = true' to the /opt/draios/bin/dragent.properties file on each instance.
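The following sketch illustrates the kind of arithmetic behind this table: usage percent per file system, averaged across hosts for the same mount point, then sorted so the fullest file systems come first. The sample rows and helper names are hypothetical, and averaging the per-host percentages is an assumption about how a group average might be formed, not a statement of how Sysdig Monitor computes it.

```python
from collections import defaultdict

# Hypothetical rows: (host, mount dir, size in bytes, used bytes).
rows = [
    ("host-a", "/", 100_000, 80_000),
    ("host-b", "/", 100_000, 40_000),
    ("host-a", "/data", 500_000, 450_000),
    ("host-b", "/data", 500_000, 475_000),
]

def usage_pct(size, used):
    """Used space as a percentage of the file system size."""
    return 100.0 * used / size

def average_by_mount(rows):
    """Average usage percent across hosts for each mount point."""
    buckets = defaultdict(list)
    for _host, mount, size, used in rows:
        buckets[mount].append(usage_pct(size, used))
    return {mount: sum(values) / len(values) for mount, values in buckets.items()}

averages = average_by_mount(rows)
# Fullest file systems first, as when sorting on 'FS Usage %'.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```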

Top Files

view.files.top

This view lists the most active, open files and metrics on their size, I/O time, and number of errors encountered when accessing the file.
Metrics:
  1. File name
  2. File Bytes Total
  3. Time spent in file I/O
  4. File Error Count
Usage: Sort by size to see the top disk consumers. Sort by I/O to see the busiest files. Spot potential problems with the File Error Count column.

Memory

view.memory

This view shows the quantity of memory used over time in aggregate, and by top processes and top hosts. Major memory page faults are similarly shown over time, and by top processes and top hosts.
Metrics:
  1. Average Memory Usage Over Time
  2. Average Page Faults Over Time
  3. Memory Usage - Top Processes
  4. Page Faults - Top Processes
  5. Memory Usage - Top Hosts
  6. Page Faults - Top Hosts
Usage: A major page fault occurs when a program accesses a memory page that is mapped in the virtual address space but not loaded in physical memory. For example, when an application starts, the Linux kernel searches physical memory and the CPU cache. If the data is not found there, a major page fault occurs and disk I/O results.
Tip: While page faults are normal, excessive major page faults can degrade performance. Generally, making more physical memory available reduces page faults. To see the complete command line and arguments of processes listed, change the processes grouping to 'proc.command.line' in the view menu bar.
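To see the difference between minor and major page faults outside of Sysdig Monitor, the following sketch reads the fault counters that the operating system keeps for the current process. It uses Python's standard `resource` module, which is available on Unix-like systems only, and is independent of the Sysdig agent; the `fault_counts` helper is a name introduced here for illustration.

```python
import resource

def fault_counts():
    """Return (minor, major) page fault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

minor_before, major_before = fault_counts()

# Allocate a buffer and write to one byte in every 4 KiB step so the pages
# are actually touched. Touching fresh memory typically shows up as minor
# faults; major faults need data to be read back from disk.
buffer = bytearray(8 * 1024 * 1024)
for offset in range(0, len(buffer), 4096):
    buffer[offset] = 1

minor_after, major_after = fault_counts()
print("minor faults:", minor_after - minor_before)
print("major faults:", major_after - major_before)
```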

Overview by Container Image

view.container.image

Apply this view to a group of services to get an overview of the size, performance, and limitations of each service. In this view, services are defined as sharing the same container image. Note that specialized views are also available showing service breakdowns based on your particular orchestrator (e.g. Docker Swarm, Kubernetes, Mesos/Marathon).
 

Network

Views offering deep visibility into network connections and activity.

Overview

view.net.overview

Four strip charts plot total network utilization over time by direction, application, process and host.
Metrics:
  1. In vs Out Network Bytes
  2. Network Bytes by Application
  3. Network Bytes by Process
  4. Network Bytes by Host
Usage: For the selected hosts, average inbound traffic versus outbound traffic is shown on the first chart. The remaining charts show total combined values by application type, by individual process and then by individual host.
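The four charts can be thought of as different aggregations of the same traffic samples. The sketch below shows that idea on a few hypothetical records: an average of inbound versus outbound bytes, then combined totals grouped by host, process, and application. The record layout and helper names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical samples: (host, process, application, bytes in, bytes out).
samples = [
    ("web-1", "nginx", "http", 1200, 5400),
    ("web-2", "nginx", "http", 800, 3600),
    ("db-1", "mysqld", "mysql", 3000, 900),
]

def in_vs_out(samples):
    """Average inbound and outbound bytes across the samples."""
    n = len(samples)
    return (sum(s[3] for s in samples) / n, sum(s[4] for s in samples) / n)

def total_by(samples, column):
    """Sum bytes in + bytes out, grouped by the value in the given column."""
    totals = defaultdict(int)
    for s in samples:
        totals[s[column]] += s[3] + s[4]
    return dict(totals)

avg_in, avg_out = in_vs_out(samples)
by_host = total_by(samples, 0)
by_process = total_by(samples, 1)
by_application = total_by(samples, 2)
print(avg_in, avg_out, by_host, by_process, by_application)
```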

Connections Table

view.net.connections

This detailed table view lists all connections between your instance(s) and remote nodes with traffic metrics for each connection.
Metrics:
  1. Local Endpoint
  2. Local Service
  3. Remote Endpoint
  4. Remote Service
  5. Bytes In
  6. Bytes Out
  7. Bytes Total
  8. Number of Connections
  9. Requests In
  10. Requests Out
  11. Requests Total
Usage: Sorted by the appropriate columns, use this view to quickly find the top talkers on your network for the host under review.

Response Times vs Resource Usage

view.response.time.vs.res.usage

This strip chart graphs several host resource metrics over time as compared to the host's response time in processing network requests.
Metrics:
  1. Net Request Time
  2. CPU Used %
  3. Memory Used %
  4. Net Bytes Total
  5. File Bytes Total
Usage: Apply this view to your servers to help identify which resources impact response performance the most and then increase those resources as necessary to see if improved response rates result.
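One rough way to quantify what the strip chart shows visually is to correlate each resource series with the request time series. The sketch below computes a Pearson correlation coefficient by hand and ranks the resources by how strongly they track request time. The sample series, the metric names used as dictionary keys, and the `pearson` helper are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)

# Hypothetical samples taken at the same timestamps.
request_time_ms = [10, 12, 15, 30, 45, 50]
resources = {
    "cpu_used_pct": [20, 25, 30, 60, 85, 95],
    "memory_used_pct": [40, 41, 39, 40, 42, 40],
}

# Strongest relationship with request time first.
ranking = sorted(
    ((name, pearson(series, request_time_ms)) for name, series in resources.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
print(ranking)
```

A high correlation only suggests where to look first; it does not prove that the resource is the cause, which is why the usage note above recommends increasing the resource and checking whether response times actually improve.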

Response Times

view.net.request.breakdown

For the local and next tiers metrics, this view shows the amount of time requests spend in local versus remote tiers of a distributed transaction. For example, in a scenario where a client makes a request to a web server (local tier) which, in turn, makes a request to a back-end database server (next tier), the Local vs Next Tiers panels show the percentage of time that the request spends in each tier.
Metrics:
  1. Local vs Next Tiers Request Time Breakdown Over Time
  2. Current Local vs Next Tiers Request Time Breakdown
  3. Processing vs Net vs Disk Request Time Breakdown Over Time
  4. Current Processing vs Net vs Disk Request Time Breakdown
Usage: Use this view to help identify where any delays may be taking place: on the local host due to heavy processing and disk use or due to client/server delays or back-end delays. A comparison of the percent of time spent by the local host(s) in CPU processing versus network and disk I/O can be seen in the bottom two charts.
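The breakdowns in this view are percentages of a total. As a minimal sketch of that arithmetic, the example below takes hypothetical timings for the web server and database scenario described above and converts them into a Local vs Next Tiers split and a Processing vs Net vs Disk split. The numbers and the `percent_breakdown` helper are illustrative assumptions, not values or functions from Sysdig Monitor.

```python
def percent_breakdown(times):
    """Convert absolute times per component into percentages of the total."""
    total = sum(times.values())
    if total == 0:
        return {name: 0.0 for name in times}
    return {name: 100.0 * value / total for name, value in times.items()}

# Hypothetical request: 120 ms end to end at the web server (local tier),
# of which 90 ms was spent waiting on the back-end database (next tier).
total_ms, next_tier_ms = 120.0, 90.0
tiers = percent_breakdown({"local": total_ms - next_tier_ms, "next": next_tier_ms})

# Hypothetical split of the 30 ms spent in the local tier.
local = percent_breakdown({"processing": 15.0, "net": 9.0, "disk": 6.0})
print(tiers, local)
```

In this example most of the time is spent in the next tier, so tuning the web server itself would help far less than looking at the database.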
 

Docker Orchestration

Views for environments built on Docker Swarm (v1.12+) and Docker Compose (including Universal Control Plane - UCP, and Docker DataCenter).

Compose Overview

view.docker.overview

This view should support environments built on Docker Compose, including older Swarm environments, Universal Control Plane (UCP), and Docker DataCenter.

Compose Projects

view.docker.projects

This view should support environments built on Docker Compose, including older Swarm environments, Universal Control Plane (UCP), and Docker DataCenter.

Compose Services

view.docker.services

This view should support environments built on Docker Compose, including older Swarm environments, Universal Control Plane (UCP), and Docker DataCenter.

Swarm Overview

view.swarm.overview

This view should work for environments built on Docker Swarm v1.12+. For older Swarm environments, try the Docker Compose views.

Swarm Services

view.swarm.services

This view should work for environments built on Docker Swarm v1.12+. For older Swarm environments, try the Docker Compose views.

Swarm Tasks

view.swarm.tasks

This view should work for environments built on Docker Swarm v1.12+. For older Swarm environments, try the Docker Compose views.
 

Topology

Topology views map the logical dependencies of your application tiers and overlay metrics.

CPU Usage

view.cpu.usage.map

This view renders your selected host's top processes and their connections with processes on other hosts or host groups, so you can immediately see the scope of interaction between your host and the rest of your infrastructure.
Metrics:
  1. IP Address
  2. Top Processes
  3. CPU %
Usage: The view will show dashed lines and grey boxes for those instances that do not have the Sysdig Monitor agent installed but for which communication can still be detected. Busy hosts are color-coded when CPU usage is elevated, which makes them easy to spot.
Tip: Zoom into a host by clicking the associated '+' symbol and view the top processes within a host. Network connection lines are rendered between hosts and, when zoomed, between individual processes and hosts. Use your mouse scroll wheel to zoom the contents of the window then left-click and drag to move the map components within the window.

Network Traffic

view.net.connections.map

Similar to the Topology: CPU Usage view, this map view lets you visually understand the selected instance's bandwidth usage between local processes and processes on remote nodes.
Metrics:
  1. IP Address
  2. Process
  3. Bytes Total
Usage: The view will show dashed lines and grey boxes for those instances that do not have the Sysdig Monitor agent installed but for which communication can still be detected.
Tip: Zoom into a host by clicking the associated '+' symbol and view the top processes within a host. Network connection lines are rendered between hosts and, when zoomed, between individual processes and hosts. Use your mouse scroll wheel to zoom the contents of the window then left-click and drag to move the map components within the window.

Topology: Response Times

view.net.requests.time.map

This graphical map shows communication and network response metrics between all processes on the selected hosts. Response counts and times are shown as averages down to 1-second granularity. The view will show dashed lines and grey boxes for those instances that do not have the Sysdig Monitor agent installed but for which communication can still be detected.
Metrics:
  1. IP Address
  2. Network Processes
  3. Response Times
Tip: Expand a group or host by clicking the associated '+' symbol. Hosts can be expanded to see individual processes inside. Network connection lines are rendered between hosts and, when zoomed, between individual processes and hosts. Use your mouse scroll wheel to zoom the contents of the window then left-click and drag to move the map components within the window.