Metrics integrations: Application Checks

The Sysdig agent supports additional application monitoring through application check scripts, or 'app checks'. These are plugins that poll custom metrics from applications that export them via status or management pages, for example Nginx, Redis, MongoDB, and Memcached.

Many app checks are enabled by default in the agent. When a supported application is found, the correct app check script is called and metrics are polled automatically. However, if you have changed the default connection parameters in your application, you will need to modify the app check connection parameters in the Sysdig agent user-settings configuration file to match.

In some cases you may also need to enable metrics reporting in your application before the agent can poll it. This guide details how to make configuration changes in the agent's configuration file and lists several application integration examples.


Supported Applications List: 

Below is the list of supported applications the agent will poll automatically. Some app check scripts must be configured because no defaults exist, and some applications must be configured to output their metrics. See Application Specific Notes below for applications that need extra configuration:


ActiveMQ        Apache             Apache CouchDB   Apache HBase
Apache Kafka    Apache Zookeeper   Cassandra        Consul
CEPH            Couchbase          Elasticsearch    etcd
fluentd         Gearman            Go               Gunicorn
Jenkins         JVM                Kyoto Tycoon     Lighttpd
Memcached       Mesos/Marathon     MongoDB          MySQL
network         Nginx              ntp              Percona TokuMX
PGBouncer       PHP-FPM            Postfix          PostgreSQL
Prometheus      RabbitMQ           Redis            Riak
Riak CS         Supervisord        TCP              Tomcat



If you need to customize a connection configuration for the agent, for example to change usernames or passwords to match your application, add an entry to the user-settings configuration file, /opt/draios/etc/dragent.yaml.

If your application is already supported, the default settings can be seen in the agent's default-settings configuration file, dragent.default.yaml.


To override defaults, copy the entry from the default-settings file into the user-settings file and modify parameters as needed. For information on adding parameters to a container agent's configuration file, see the FAQ: How-can-I-edit-the-agent-s-configuration-file?

Any entries copied into the user-settings file will override similar entries in the default-settings file. This is required since dragent.default.yaml will be overwritten when subsequent agent upgrades are performed and any manual edits to that default file would be lost. The dragent.yaml user-settings file never gets overwritten - modify dragent.yaml only!
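
The override behaviour described above can be pictured with a small sketch. This is not the agent's code: the function name merge_app_checks is hypothetical, and the sketch assumes that entries are matched by their name: key and that a user entry replaces the default entry as a whole.

```python
def merge_app_checks(default_checks, user_checks):
    """Return the effective check list.

    Assumption: a user entry replaces the default entry that has the same name.
    """
    merged = {check["name"]: check for check in default_checks}
    for check in user_checks:
        merged[check["name"]] = check  # the user-settings entry wins
    return list(merged.values())


# Entries as they might look after loading the two YAML files (illustrative data)
defaults = [
    {"name": "redis", "check_module": "redisdb", "conf": {"port": "{port}"}},
    {"name": "nginx", "check_module": "nginx"},
]
user = [{"name": "nginx", "enabled": False}]

effective = merge_app_checks(defaults, user)
print(effective)
```

Under these assumptions the redis entry is left at its default while the nginx entry comes entirely from the user-settings file.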

The basic app checks configuration template is as follows:

app_checks:
  - name: APP-NAME
    check_module: APP_CHECK_SCRIPT
    pattern:
      comm: PROCESS-NAME
    conf:
      host: IP_ADDR
      port: PORT

app_checks: is the main section of dragent.default.yaml and contains a list of pre-configured checks. Every check should have a unique name:, which is displayed in Sysdig Monitor as the process name of your software. check_module: is the name of the Python plugin that polls the data from your application. All the app check scripts can be found in the /opt/draios/lib/python/checks.d directory.

The pattern: section is used by the Sysdig agent to match a process with a check. Four kinds of keys can be specified, along with any arguments to help distinguish the process:

  • comm: -> matches process name as seen in /proc/<pid>/status
  • exe:  ->  matches the process exe as seen in /proc/<pid>/exe link
  • port: -> matches the port where the process is listening
  • arg:  ->  matches any process arguments
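
As a rough illustration of how these keys might narrow down a process, here is a minimal sketch. The function name matches and the process dictionary layout are hypothetical, and the matching rules (exact match for comm:, substring match for exe: and arg:, all listed keys must match) are assumptions rather than the agent's documented behaviour.

```python
def matches(pattern, proc):
    """Return True when every key present in the pattern matches the process."""
    if "comm" in pattern and pattern["comm"] != proc["comm"]:
        return False
    if "exe" in pattern and pattern["exe"] not in proc["exe"]:
        return False
    if "port" in pattern and pattern["port"] not in proc["ports"]:
        return False
    if "arg" in pattern and not any(pattern["arg"] in a for a in proc["args"]):
        return False
    return True


# Illustrative process info for a Kafka broker running under the JVM
kafka_proc = {
    "comm": "java",
    "exe": "/usr/bin/java",
    "ports": [9092],
    "args": ["-Xmx1G", "kafka.Kafka"],
}

print(matches({"comm": "java", "arg": "kafka.Kafka"}, kafka_proc))
print(matches({"comm": "redis-server"}, kafka_proc))
```

Combining comm: with arg: in this way is what lets a generic process name such as java be tied to one specific application.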

The conf: section is specific for each plugin, you can specify any key/values that the plugins support. Also, as values, you can use {...} tokens which will be substituted with values from process info. 
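
The token substitution can be sketched as follows. The function name expand_tokens is hypothetical and the sketch only assumes that a token is the key name wrapped in braces; the agent's real implementation may differ.

```python
def expand_tokens(conf, proc_info):
    """Replace {token} placeholders in string values with values from proc_info."""
    expanded = {}
    for key, value in conf.items():
        if isinstance(value, str):
            for token, actual in proc_info.items():
                value = value.replace("{" + token + "}", str(actual))
        expanded[key] = value
    return expanded


conf = {"apache_status_url": "http://localhost:{port}/server-status?auto"}
resolved = expand_tokens(conf, {"port": 8080})
print(resolved["apache_status_url"])
```

Plain string replacement is used here instead of str.format so that values containing unrelated braces are left untouched.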


Example Config File

Here is a complete example dragent.yaml user settings file with an app-check entry for Redis. The app-checks section  was copied from the dragent.default.yaml file and modified for our specific instance: 

customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:dev,svc:db
app_checks:
  - name: redis-6380
    check_module: redisdb
    pattern:
      comm: redis-server
    conf:
      port: "{port}"
      password: mysecret

We changed the name to be displayed in the interface, and added a required password. Since the token {port} is used, it will be translated to the actual port where Redis is listening. Be sure to use consistent spacing for indents as shown and list all check entries under an 'app_checks:' section title.

After saving the changes in /opt/draios/etc/dragent.yaml and restarting the agent with 'service dragent restart' or 'docker restart sysdig-agent', metrics for the Redis database should appear in your Sysdig Monitor interface in the App:Redis view. Additional individual metrics polled will appear in the Metrics list.


Application Specific Notes

Some applications may not work out-of-the-box because they require a non-default username and password or because they do not expose metrics by default. Below are several applications that require additional configuration before they can be polled by the Sysdig agent:



Apache

Apache has a common default for exposing metrics. The process command name can be either apache2 or httpd. By default, the agent looks for the process 'apache2'. If it is named differently in your environment, copy the agent's default Apache configuration from dragent.default.yaml into dragent.yaml and modify the comm: line to match the process name. The default entry is:

app_checks:
  - name: apache
    check_module: apache
    pattern:
      comm: apache2
    conf:
      apache_status_url: "http://localhost:{port}/server-status?auto"


Apache Kafka

Metrics from Kafka via JMX polling are already configured in the agent's default-settings configuration file. Metrics for consumers, however, need to use app-checks to poll the Kafka and Zookeeper API.  Custom configuration is required for the checks since consumer names and topics are unique.

For Kafka, here is a sample entry for the agent's user-settings config file dragent.yaml:

app_checks:
  - name: kafka
    check_module: kafka_consumer
    pattern:
      comm: java
      arg: kafka.Kafka
    conf:
      kafka_consumer_offsets: true
      kafka_connect_str: "" # kafka address, usually localhost as we run the check on the same instance
      zk_connect_str: "zookeeper:2181" # zookeeper address, may be different than localhost
      zk_prefix: /
      consumer_groups:
        sample-consumer: # sample consumer name
          test: [0, ] # sample topic name and partitions
        sample-consumer-2: # sample consumer name
          test-2: [0, 3, 5] # sample topic name and partitions

Note: monitor_unlisted_consumer_groups: if set to true will auto-discover consumer groups and topics, but only for offsets stored in zookeeper. It will also ignore configured consumer_groups.

For users of Kafka 0.9 and later, Kafka can store consumer group offsets inside Kafka itself for better performance. If you use this scenario, use this alternate Kafka app check entry in the dragent.yaml file:

app_checks:
  - name: kafka
    check_module: kafka_consumer
    pattern:
      comm: java
      arg: kafka.Kafka
    conf:
      kafka_connect_str: "localhost:9092"
      zk_connect_str: "localhost:2181"
      zk_prefix: /
      kafka_consumer_offsets: true
      consumer_groups:
        testgroup: # Replace with actual consumer group name
          test: [0] # Replace "test" with actual topic name and "0" with list of partitions

Note: kafka_consumer_offsets: if set to true will look for consumer offsets in Kafka. The appcheck will also look in Kafka if zk_connect_str is not configured. 



Consul

Consul support works out-of-the-box if you use the standard port. Otherwise, you can configure your custom port in the agent's config file by adding this entry:

app_checks:
  - name: consul
    pattern:
      comm: consul
    conf:
      url: "http://localhost:<port>"
      catalog_checks: yes

In addition to the metrics from our app-check, there are many other metrics that Consul can send using StatsD. Those metrics will be automatically collected by our agent's StatsD integration if Consul is configured to send them by adding this line to its config file:

"statsd_address": ""

for example:

"leave_on_terminate": "true",
"statsd_address": "",
"recursors": [""]



Couchbase

The default Couchbase entry will return ep_io_* metrics if the cbstats command line utility is available, and the entry should work for most installations without changes. If the utility is not available at the default location, specify a location in the agent config using the added cbstats port and path statements. If the utility is not available at all, the app check does not report the ep_io_* metrics but still collects all other metrics:

app_checks:
  - name: couchbase
    pattern:
      comm: beam.smp
      arg: couchbase
      port: 8091
    conf:
      server: http://localhost:8091
      # The following block is optional and required only if the 'path' and
      # 'port' need to be set to non-default values specified here
      cbstats:
        port: 11210
        path: /opt/couchbase/bin/cbstats



Elasticsearch

The default Elasticsearch entry should work for most installations without modification. The entry needs to be changed only if you have enabled password authentication. Here is the default entry:

app_checks:
  - name: elasticsearch
    check_module: elastic
    pattern:
      port: 9200
      comm: java
    conf:
      url: http://localhost:9200
      ssl_cert: <path to the cert>
      ssl_key: <path to the key>


etcd

The default agent configuration for etcd looks for the application on localhost, port 2379, and no customization should be required.

Etcd (before version 2) does not listen on localhost, so our agent will not connect to it automatically. Add the option -bind-addr to the etcd commandline to allow our agent to connect.

If you use a port different from 4001, set it by copying and pasting this etcd entry into the agent configuration file dragent.yaml changing <port> as needed.

Alternatively, you can use {hostname} as a token in the conf: section; it will be replaced at runtime with the hostname where the agent is running. For Kubernetes customers, this is the recommended setting.

Configuring an alternate port:

app_checks:
  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "http://localhost:<port>"

Configuring with {hostname} - preferred when using Kubernetes:

app_checks:
  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "http://{hostname}:<port>"

If encryption is used, add the appropriate SSL entries:

app_checks:
  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "https://localhost:<port>"
      ssl_keyfile:  /etc/etcd/peer.key
      ssl_certfile: /etc/etcd/peer.crt
      ssl_ca_certs: /etc/etcd/ca.crt
      ssl_cert_validation: True



fluentd

Make sure to have these lines in fluentd.conf:

<source>
  @type monitor_agent
  port 24220
</source>

If you use a non-standard port for monitor_agent, you can configure it as usual in the agent config file dragent.yaml:

app_checks:
  - name: fluentd
    pattern:
      comm: fluentd
    conf:
      monitor_agent_url: http://localhost:24220/api/plugins.json



Go

The Go programming language provides an easy way for developers to expose application metrics. Because it is difficult to determine whether an application is written in Go by looking at process names or arguments, you will need to create a custom entry in the user-settings config file for your Go application. Be sure your app has expvar enabled. This means importing the expvar package and having an HTTP server started from inside your app:

import (
    "expvar"
    "net/http"
)

// If your application has no http server running for the DefaultServeMux,
// you'll have to have a http server running for expvar to use, for example
// by adding the following to your init function
func init() {
    go http.ListenAndServe(":8080", nil)
}

// You can also expose variables that are specific to your application
// See the expvar package documentation for more information

var (
    exp_points_processed = expvar.NewInt("points_processed")
)

// RawPoints and parsePoints stand in for your application's own types and
// functions; parsePoints is assumed here to return (int64, error)
func processPoints(p RawPoints) {
    points_processed, err := parsePoints(p)
    if err != nil {
        return
    }
    exp_points_processed.Add(points_processed)
}

Then add these lines to your dragent.yaml file, customized for your app:

app_checks:
  - name: mygoapp # customize the name
    check_module: go_expvar # all go apps will use the same check_module
    pattern:
      comm: app # In this case we are matching the app by process name, use other selectors if needed
    conf:
      expvar_url: "http://localhost:{port}/debug/vars" # automatically match url using the listening port
      # Add custom metrics if you want
      # metrics:
      #   - path: points_processed
      #     type: rate # rate or gauge
      #     alias: points.processed.count



HAProxy

The stats feature needs to be enabled on your HAProxy instance. This can be done by adding the following entry to the HAProxy configuration file, /etc/haproxy/haproxy.cfg:

listen stats :1936
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth stats:stats

If changes are made to any ports or passwords, copy the entry below into the user-settings config file and modify as necessary:

app_checks:
  - name: haproxy
    pattern:
      comm: haproxy
      port: 1936
    conf:
      username: stats
      password: stats
      url: http://localhost:1936
      status_check: True
      collect_aggregates_only: True
      collect_status_metrics: True
      collect_status_metrics_by_host: True
      tag_service_check_by_host: True



HTTP

Similar to the TCP check, the HTTP check monitors your HTTP-based applications for URL availability. It sends a request to the HTTP endpoint you describe and returns the metric 'http.can_connect' with a value of 0 for closed or 1 for open. You can further specify content that must be found for the check to be successful.

Add the entry below to the user-settings config file dragent.yaml and modify the `name:`  `comm:`  `arg:` and `url:` parameters as needed:

app_checks:
  - name: my_http_backend
    check_module: http_check
    pattern:
      comm: ruby
      arg: http.rb
    conf:
      url: "http://localhost:{port}/ping"
      collect_response_time: true
      content_match: 'mycontent'

In this example the metric 'network.http.response_time' will also be returned with the time in seconds for the web server to accept the connection.  Since the optional `content_match:` parameter is specified the URL must be up and return the specified content in the page.



Istio

Istio metrics can be obtained using Prometheus.



Lighttpd

For Lighttpd, the status page needs to be enabled. Add mod_status in the /etc/lighttpd/lighttpd.conf config file:

server.modules = ( ..., "mod_status", ... )

Then configure an endpoint for it. For security, you can allow it only for local users:

$HTTP["remoteip"] == "" {
    status.status-url = "/server-status"
}

If changes are made to any ports or passwords, use the entry below and modify as necessary:

app_checks:
  - name: lighttpd
    pattern:
      comm: lighttpd
    conf:
      lighttpd_status_url: "http://localhost:{port}/server-status?auto"



Mesos/Marathon

Mesos master and slave application checks should work with no additional configuration. However, to customize them for your configuration, copy the entries from the dragent.default.yaml file and add them as explained above to dragent.yaml with your required changes.

Note: In the latest versions of Mesos on DC/OS, the Sysdig Monitor application checks will not work without specific additional configuration.  Please review this guide for more details: Sysdig Application checks for Mesos/Marathon in DC/OS.

For the Mesos master, the default is:

app_checks:
  - name: mesos-master
    check_module: mesos_master
    interval: 30
    pattern:
      comm: mesos-master
    conf:
      url: "http://localhost:{port}"

For the Mesos slave:

app_checks:
  - name: mesos-slave
    check_module: mesos_slave
    pattern:
      comm: mesos-slave
    interval: 30
    conf:
      url: "http://localhost:{port}"
      # Name of individual tasks to monitor, if needed
      # tasks:
      #   - mongo
      #   - cassandra

For Marathon:

app_checks:
  - name: marathon
    check_module: marathon
    interval: 30
    pattern:
      arg: mesosphere.marathon.Main
    conf:
      url: "http://localhost:{port}"



MongoDB

The default MongoDB entry should work for most installations without modification. The entry needs to be changed only if you have enabled password authentication. Here is the default entry:

app_checks:
  - name: mongodb
    check_module: mongo
    pattern:
      comm: mongod
    conf:
      server: "mongodb://localhost:{port}/admin"

If you have added a username and password, copy the default Mongo entry into the dragent.yaml file and modify the `server:` entry by adding your user account and password:

      server: "mongodb://USER:PASSWORD@localhost:{port}/admin"



MySQL

There is no default configuration for MySQL since a unique user and password are required for metrics polling. To configure credentials, run the following command on your server, replacing the 'sysdig-cloud' user and password parameters:

mysql -e "CREATE USER 'sysdig-cloud'@'' IDENTIFIED BY 'sysdig-cloud-password';"

Then add the entry for MySQL to dragent.yaml, again changing the credential information:

app_checks:
  - name: mysql
    pattern:
      comm: mysqld
    conf:
      user: sysdig-cloud
      pass: sysdig-cloud-password



Nginx

Open-source NGINX exposes basic metrics about server activity on a simple status page, provided that you have the HTTP stub status module enabled. To check if the module is already enabled, run:

nginx -V 2>&1 | grep -o with-http_stub_status_module

If 'with-http_stub_status_module' is listed, the status module is enabled. If that command returns no output, you will need to enable the status module.

The agent has an entry (/nginx_status) for Nginx in dragent.default.yaml. However, the free version can have a different status page (/basic_status). Copy the agent's default Nginx configuration into dragent.yaml and modify it to match your Nginx's configured status URL:

app_checks:
  - name: nginx
    check_module: nginx
    pattern:
      exe: "nginx: worker process"
    conf:
      nginx_status_url: "http://localhost:{port}/nginx_status/"



PGBouncer

PGBouncer does not ship with a default stats user configuration. To configure it, you need to add a user allowed to access PGBouncer stats. Do so by adding this line in pgbouncer.ini:

stats_users = sysdig_cloud

For the same user you need an entry in userlist.txt:

"sysdig_cloud" "sysdig_cloud_password"

Then add a PGBouncer entry in the agent's config file dragent.yaml:

app_checks:
  - name: pgbouncer
    pattern:
      comm: pgbouncer
    conf:
      # set if the bind ip is different
      # set if the port is not the default
      username: sysdig_cloud
      password: sysdig_cloud_password



PHP-FPM

This check has a default configuration that should suit many use cases. If it does not work for you, verify you have added these lines to your php-fpm.conf file:

pm.status_path = /status
ping.path = /ping

If you need a different configuration, you can change the conf: part below:

app_checks:
  - name: php-fpm
    check_module: php_fpm
    pattern:
      exe: "php-fpm: master process"
    conf:
      status_url: /mystatus
      ping_url: /myping
      ping_reply: mypingreply



PostgreSQL

PostgreSQL will be auto-discovered and the agent will connect through the Unix socket using the postgres default user. If it does not work, you can create a user for Sysdig Monitor and give it enough permissions to read Postgres stats. To do this, execute these example statements on your server:

create user sysdig_cloud with password 'password';
grant SELECT ON pg_stat_database to sysdig_cloud;

Then add these lines to the dragent.yaml configuration file:

app_checks:
  - name: postgres
    pattern:
      comm: postgres
      port: 5432
    conf:
      username: sysdig_cloud
      password: password



Prometheus

Starting with agent version 0.70.0, Sysdig introduced automatic gathering of Prometheus metrics. See the documentation and blog post for more information.

Any application that exposes Prometheus metrics via exporters can be monitored by Sysdig.



RabbitMQ

For RabbitMQ, you need to install its management plugin. This can be done with the command below:

rabbitmq-plugins enable rabbitmq_management

After installation, if you change the default RabbitMQ user/password (guest:guest), you will need to add it in dragent.yaml. Copy and paste the RabbitMQ configuration from dragent.default.yaml and change the user name and password:

app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: myuser
      rabbitmq_pass: mypassword
      queues:
        - queue1
        - queue2
        # ...

To limit the number of queues monitored, add the "queues:" parameter and add the list of queue names. This is useful if you see the error "To many queues to fetch" in the agent's log file.


Riak CS

Riak CS does not ship with a default configuration because it needs at least access_id and access_secret to work. Add the Riak CS entry to your dragent.yaml with those unique parameters:

app_checks:
  - name: riakcs
    pattern:
      comm: beam.smp
      port: 8080
    conf:
      access_id: "my_access_id"
      access_secret: "my_access_secret"
      #is_secure: false

To get statistics, the app check also needs permission to read the riak-cs bucket.




TCP

When you want to monitor the status of your custom application's port, use the TCP check. This check routinely connects to the designated port and sends Sysdig Monitor a simple on/off metric. This configuration is not in the default-settings file, so you must add the entry below to the user-settings config file dragent.yaml:

app_checks:
  - name: myapp
    check_module: tcp_check
    pattern:
      comm: ruby
      arg: myapp.rb
    conf:
      port: 8080

The above example will look for the 'ruby' process name with argument 'myapp.rb' running on port 8080 and return the metric 'tcp.can_connect' with a value of 0 for closed or 1 for open.

If you want the response time for your port, meaning the amount of time the process takes to accept the connection, you can add the collect_response_time: true parameter under the conf: section and the additional metric 'network.tcp.response_time' will appear in the Metrics list.

Warning: do not use port: under the pattern: section in this case, since if the process is not listening it will not be matched and the metric will not be sent to Sysdig Monitor.
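
To make the two metrics concrete, here is a minimal sketch of what such a probe could look like. It is not the agent's tcp_check module; the function name, its signature, and the timeout are hypothetical, and it only mirrors the behaviour described above (1 or 0 for 'tcp.can_connect', plus the time taken to accept the connection).

```python
import socket
import time


def tcp_check(host, port, timeout=2.0):
    """Return (can_connect, response_time_seconds) for one connection attempt.

    can_connect is 1 when the port accepts the connection and 0 otherwise;
    response_time_seconds is None when the connection fails.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1, time.monotonic() - start
    except OSError:
        return 0, None
```

For example, tcp_check("localhost", 8080) would probe the port used in the entry above. Because a failed connection is still reported as 0, the check is useful precisely when the process is not listening, which is why the port belongs under conf: rather than pattern:.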



Jenkins

To monitor Jenkins, add the following configuration to dragent.yaml:

app_checks:
  - name: jenkins
    pattern:
      comm: java
      port: 50000
    conf:
      name: default
      jenkins_home: /var/lib/jenkins # this depends on your environment


Working With Agent Configuration Files

To change the user settings configuration file for the native Linux agent, edit the file /opt/draios/etc/dragent.yaml and then restart the agent with the shell command service dragent restart.  Never edit the default settings configuration file dragent.default.yaml since this file will be overwritten when upgrading the agent.

If you need to change the configuration file for the containerized agent, please see the FAQ How-can-I-modify-the-containerized-agents-configuration-file?


Disabling A Single Application Check

Sometimes the default configuration shipped with the Sysdig agent does not work for you or you may not be interested in checks for a single application. To turn a single check off, add an entry like this to disable it:

app_checks:
  - name: nginx
    enabled: false

This entry overrides the default configuration of the nginx check, disabling it. 

If you are using the `ADDITIONAL_CONF` parameter to modify your container agent's configuration (from the above FAQ), you would add an entry like this to your Docker run command (or Kubernetes manifest):

-e ADDITIONAL_CONF="app_checks:\n  - name: nginx\n    enabled: false\n"
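
If you maintain the YAML snippet in a file or script, the one-line form can be generated instead of typed by hand. The helper below is a hypothetical convenience, not part of the agent; it assumes the snippet contains no double quotes, which would need additional shell escaping.

```python
def to_additional_conf(yaml_text):
    """Collapse a multi-line YAML snippet into one line with literal \\n separators."""
    return yaml_text.replace("\n", "\\n")


snippet = "app_checks:\n  - name: nginx\n    enabled: false\n"
flag = '-e ADDITIONAL_CONF="' + to_additional_conf(snippet) + '"'
print(flag)
```

The indentation inside the snippet is preserved, which matters because the agent still parses the value as YAML.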


Disabling ALL Application Checks

If you do not need the application check functionality or otherwise want to disable it, add the following entry to the agent's user-settings configuration file /opt/draios/etc/dragent.yaml:

app_checks_enabled: false

Restart the agent as shown immediately above for either the native Linux agent installation or the container agent installation.


Increasing Polling Interval

By default, the agent runs each application check every second. You can increase the interval per application check by adding the interval: parameter (under the - name: entry) with the number of seconds to wait between runs of the script.

Example: Run the NTP check once per minute:

app_checks:
  - name: ntp
    interval: 60
    pattern:
      comm: systemd


Metrics Limit

There is a limit of 300 metrics which can be reported by all application check scripts per host. If more metrics are needed please contact your sales representative with your use case.

Note that metrics with the same name but different tags are counted as separate metrics by the agent. Example: a metric 'user.clicks' with the tag 'country=us' and another 'user.clicks' with the tag 'country=it' are considered two metrics, both of which count towards the limit of 300.
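
The counting rule can be illustrated with a short sketch. The function name and the sample data are hypothetical; the only assumption is that a metric is identified by its name together with its full set of tags.

```python
METRICS_LIMIT = 300  # per-host limit stated above


def count_unique_metrics(samples):
    """Count distinct (name, tags) combinations among reported samples."""
    return len({(name, frozenset(tags.items())) for name, tags in samples})


samples = [
    ("user.clicks", {"country": "us"}),
    ("user.clicks", {"country": "it"}),
    ("user.clicks", {"country": "us"}),  # same name and tags: not counted again
]

used = count_unique_metrics(samples)
print(used, "of", METRICS_LIMIT, "metrics used")
```

Tags with many possible values therefore consume the budget quickly, since every distinct value creates another counted metric.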

