Metrics integrations: Application Checks

The Sysdig agent supports additional application monitoring through application check scripts, or 'app checks'. These plugins poll custom metrics from applications that export them through status or management pages, such as Nginx, Redis, MongoDB, and Memcached.

Many app checks are enabled by default in the agent. When a supported application is found, the correct app check script is called and metrics are polled automatically. However, if the default connection parameters have been changed in your application, you will need to modify the app check connection parameters in the Sysdig agent user-settings configuration file to match your application.

In some cases you may also need to enable the metrics reporting functionality in your application before the agent can poll them. This guide will detail how to make any configuration changes in the agent's configuration file and list several application integration examples.

 

Supported Applications List: 

Below is the supported list of applications the agent will automatically poll. Some app check scripts will need to be configured since no defaults exist while some applications may need to be configured to output their metrics. Click a highlighted link to see application specific notes if any:

 

Active MQ       Apache             Apache CouchDB   Apache HBase
Apache Kafka    Apache Zookeeper   Cassandra        Consul
CEPH            Couchbase          Elasticsearch    etcd
fluentd         Gearman            Go               Gunicorn
HAProxy         HDFS               HTTP             Jenkins
JVM             Kyoto Tycoon       Lighttpd         Memcached
Mesos/Marathon  Mongo DB           MySQL            network
Nginx           Percona TokuMX     ntp              PGBouncer
PHP-FPM         Postfix            PostgreSQL       Prometheus
RabbitMQ        Redis              Riak             Riak CS
Supervisord     TCP                Tomcat           Varnish

 

Configuration

If you need to customize a connection configuration for the agent, for example to change usernames or passwords to match your application, you will add an entry to the user-settings configuration file here:

/opt/draios/etc/dragent.yaml

If your application is already supported, the default settings can be seen in the agent's default-settings configuration file:

/opt/draios/etc/dragent.default.yaml

To override defaults, copy the entry from the default-settings file into the user-settings file and modify parameters as needed. For information on adding parameters to a container agent's configuration file, see the FAQ: How-can-I-edit-the-agent-s-configuration-file?

Any entries copied into the user-settings file will override similar entries in the default-settings file. This is required since dragent.default.yaml will be overwritten when subsequent agent upgrades are performed and any manual edits to that default file would be lost. The dragent.yaml user-settings file never gets overwritten - modify dragent.yaml only!

The basic app checks configuration template is as follows:

app_checks:
  - name: APP-NAME
    check_module: APP_CHECK_SCRIPT
    pattern:
      comm: PROCESS-NAME
    conf:
      host: IP_ADDR
      port: PORT

app_checks: is the main section of dragent.default.yaml and contains a list of pre-configured checks. Every check should have a unique name:, which is displayed in Sysdig Monitor as the process name of your software. check_module: is the name of the Python plugin that polls the data from your application. All the app check scripts can be found in the /opt/draios/lib/python/checks.d directory.

The pattern: section is used by the Sysdig agent to match a process with a check. Four kinds of keys can be specified, along with any arguments to help distinguish the process:

  • comm: -> matches the process name as seen in /proc/<pid>/status
  • exe:  -> matches the process exe as seen in the /proc/<pid>/exe link
  • port: -> matches the port where the process is listening
  • arg:  -> matches any process arguments

The conf: section is specific to each plugin; you can specify any key/value pairs that the plugin supports. As values, you can also use {...} tokens, which will be substituted with values from the process info.
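To make the token mechanism concrete, here is a minimal Python sketch of how {...} tokens in conf: values could be filled in from process information. It is an illustration only, not the agent's actual code: the function name expand_tokens and the proc_info dictionary are hypothetical.

```python
import re


def expand_tokens(conf, proc_info):
    """Return a copy of conf with {token} placeholders replaced.

    Hypothetical helper: conf stands for a check's conf: mapping and
    proc_info for values discovered about the matched process (for
    example its listening port). Unknown tokens are left untouched.
    """
    def substitute(value):
        if not isinstance(value, str):
            return value
        return re.sub(
            r"\{(\w+)\}",
            lambda m: str(proc_info.get(m.group(1), m.group(0))),
            value,
        )

    return {key: substitute(value) for key, value in conf.items()}


conf = {"host": "127.0.0.1", "port": "{port}", "url": "http://localhost:{port}/status"}
print(expand_tokens(conf, {"port": 6380}))
```

In this sketch a token with no matching process value is passed through unchanged, which is one possible design choice; the agent may behave differently.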

 

Example Config File

Here is a complete example dragent.yaml user settings file with an app-check entry for Redis. The app-checks section  was copied from the dragent.default.yaml file and modified for our specific instance: 

customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:dev,svc:db
app_checks: 
  - name: redis-6380
    check_module: redisdb
    pattern:
      comm: redis-server
    conf:
      host: 127.0.0.1
      port: "{port}"
      password: mysecret

We changed the name to be displayed in the interface, and added a required password. Since the token {port} is used, it will be translated to the actual port where Redis is listening. Be sure to use consistent spacing for indents as shown and list all check entries under an 'app_checks:' section title.

After saving the changes in /opt/draios/etc/dragent.yaml and restarting the agent with 'service dragent restart' or 'docker restart sysdig-agent', metrics for the Redis database should appear in your Sysdig Monitor interface in the App:Redis view. Additional individual metrics polled will appear in the Metrics list.

 

Application Specific Notes

Some applications may not work out-of-the-box because they require a non-default username and password or because they do not expose metrics by default. Below are several applications that require additional configuration before they can be polled by the Sysdig agent:

Apache

Apache has a common default for exposing metrics. The process command name can be either apache2 or httpd. By default, our agent will look for the process 'apache2'. If it is named differently in your environment, copy the agent's default Apache configuration from dragent.default.yaml and modify the comm: line to match the process name. The default entry is:

  - name: apache
    check_module: apache
    pattern: 
      comm: apache2
    conf:
      apache_status_url: "http://localhost:{port}/server-status?auto"

Apache Kafka

Metrics from Kafka via JMX polling are already configured in the agent's default-settings configuration file. Metrics for consumers, however, need to use app-checks to poll the Kafka and Zookeeper API.  Custom configuration is required for the checks since consumer names and topics are unique. Here is a sample entry for the agent's user-settings config file dragent.yaml:

  - name: kafka
    check_module: kafka_consumer
    pattern:
      comm: java
      arg: kafka.Kafka
    conf:
      kafka_connect_str: "127.0.0.1:9092" # kafka address, usually localhost as we run the check on the same instance
      zk_connect_str: "zookeeper:2181" # zookeeper address, may be different than localhost
      zk_prefix: /
      consumer_groups:
        sample-consumer: # sample consumer name
          test: [0, ] # sample topic name and partitions
        sample-consumer-2: # sample consumer name
          test-2: [0, 3, 5] # sample topic name and partitions

 

Consul

Consul support works out-of-the-box if you use the standard port, otherwise you can configure your custom port in the agent's config file adding this entry:

  - name: consul
    pattern:
      comm: consul
    conf:
      url: "http://localhost:<port>"
      catalog_checks: yes

In addition to the metrics from our app-check, there are many other metrics that Consul can send using StatsD. Those metrics will be automatically collected by our agent's StatsD integration if Consul is configured to send them by adding this line to its config file:

"statsd_addr": "127.0.0.1:8125"

for example:

{
  "leave_on_terminate": "true",
  "statsd_addr": "127.0.0.1:8125",
  "recursors": ["10.4.0.2"]
}

etcd

The default agent configuration for etcd looks for the application on localhost, port 2379, and no customization should be required.

Etcd (before version 2) does not listen on localhost, so our agent will not connect to it automatically. Add the option -bind-addr 0.0.0.0:4001 to the etcd command line to allow our agent to connect.

If you use a port different from 4001, set it by copying and pasting this etcd entry into the agent configuration file dragent.yaml, changing <port> as needed.

Alternatively, you can use {hostname} as a token in the conf: section; it will be replaced at runtime with the hostname where the agent is running.
For Kubernetes customers, this is the recommended setting.


Configuring alternate port:

  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "http://localhost:<port>"


Configuring {hostname} (preferred setting for using etcd with Kubernetes):

  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "http://{hostname}:<port>"


If encryption is used, add the appropriate SSL entries:


  - name: etcd
    pattern:
      comm: etcd
    conf:
      url: "https://localhost:<port>"
      ssl_keyfile:  /etc/etcd/peer.key
      ssl_certfile: /etc/etcd/peer.crt
      ssl_ca_certs: /etc/etcd/ca.crt
      ssl_cert_validation: True

 

fluentd

Make sure to have these lines in fluentd.conf:

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

If you use a non-standard port for monitor_agent, you can configure it as usual in the agent config file dragent.yaml:

  - name: fluentd
    pattern:
      comm: fluentd
    conf:
      monitor_agent_url: http://localhost:24220/api/plugins.json

Go

The Go programming language provides an easy way for developers to expose application metrics. Because it is difficult to determine whether an application is written in Go by looking at process names or arguments, you will need to create a custom entry in the user settings config file for your Go application. Be sure your app has expvar enabled, which means importing the expvar package and starting an HTTP server from inside your app:

import (
    ...
    "net/http"
    "expvar"
    ...
)

// If your application has no HTTP server running on the DefaultServeMux,
// start one so that expvar can serve its variables, for example
// by adding the following to your init function
func init() {
    go http.ListenAndServe(":8080", nil)
}

// You can also expose variables that are specific to your application
// See http://golang.org/pkg/expvar/ for more information

var (
    exp_points_processed = expvar.NewInt("points_processed")
)

func processPoints(p RawPoints) {
    points_processed, err := parsePoints(p)
    if err != nil {
        return // skip the counter update when parsing fails
    }
    exp_points_processed.Add(points_processed)
    ...
}

Then add these lines to your dragent.yaml file, customized for your app:

  - name: mygoapp # customize the name
    check_module: go_expvar # all go apps will use the same check_module
    pattern:
      comm: app # In this case we are matching the app by process name, use other selectors if needed
    conf:
      expvar_url: "http://localhost:{port}/debug/vars" # automatically match url using the listening port
      # Add custom metrics if you want
      # metrics:
      #   - path: points_processed 
      #     type: rate # rate or gauge
      #     alias: points.processed.count

HAProxy

The stats feature needs to be enabled on your HAProxy instance. This can be done by adding the following entry to the haproxy configuration file here: /etc/haproxy/haproxy.cfg:

listen stats :1936
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth stats:stats

If changes are made to any ports or passwords, copy the entry below into the user-settings config file and modify as necessary:

  - name: haproxy
    pattern:
      comm: haproxy
      port: 1936
    conf:
      username: stats
      password: stats
      url: http://localhost:1936/
      status_check: True
      collect_aggregates_only: True
      collect_status_metrics: True
      collect_status_metrics_by_host: True
      tag_service_check_by_host: True
      services_include:
        - foo
        - bar
      services_exclude:
        - zoo
        - keeper

HTTP

Similar to the TCP check, the HTTP check monitors your HTTP-based applications for URL availability. It sends a request to the HTTP endpoint you specify and returns the metric 'http.can_connect' with a value of 0 for closed or 1 for open. You can further specify content that must be found for the check to be successful.

Add the below entry to the user-settings config file dragent.yaml and modify the `name:`  `comm:`  `arg:` and `url:` parameters as needed:

  - name: my_http_backend
    check_module: http_check
    pattern:
      comm: ruby
      arg: http.rb
    conf:
      url: "http://localhost:{port}/ping"
      collect_response_time: true
      content_match: 'mycontent'

In this example, the metric 'network.http.response_time' will also be returned with the time in seconds for the web server to accept the connection. Since the optional `content_match:` parameter is specified, the URL must be up and return the specified content in the page.
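The behaviour described above can be approximated with a short Python sketch. It is not the http_check plugin itself: the function name http_check and its return shape are assumptions made for illustration, and the elapsed time here covers the whole request rather than only the connection being accepted.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_check(url, content_match=None, timeout=5.0):
    """Return (can_connect, response_time_seconds) for url.

    can_connect is 1 when the request succeeds and, if content_match is
    given, the response body contains it; otherwise it is 0.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return 0, None
    elapsed = time.monotonic() - start
    if content_match is not None and content_match not in body:
        return 0, elapsed
    return 1, elapsed


# Example call; the URL is a placeholder for your own endpoint.
print(http_check("http://localhost:8080/ping", content_match="mycontent"))
```

A failed request and a response that lacks the expected content both map to 0 here, mirroring the single on/off metric described in the text.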

Lighttpd

For Lighttpd the status page needs to be enabled. Add mod_status in the /etc/lighttpd/lighttpd.conf config file:

server.modules = ( ..., "mod_status", ... )

And configure an endpoint for it, for security you can allow it only for local users:

$HTTP["remoteip"] == "127.0.0.1/8" {
    status.status-url = "/server-status"
}

If changes are made to any ports or passwords, use the entry below and modify as necessary:

  - name: lighttpd
    pattern:
      comm: lighttpd
    conf:
      lighttpd_status_url: "http://localhost:{port}/server-status?auto"

Mesos/Marathon

Mesos master and slave application checks should work with no additional configuration. However, to customize them for your configuration, copy the entries from the dragent.default.yaml file and add them as explained above to dragent.yaml with your required changes.

Note: In the latest versions of Mesos on DC/OS, the Sysdig Monitor application checks will not work without specific additional configuration.  Please review this guide for more details: Sysdig Application checks for Mesos/Marathon in DC/OS.

For the Mesos master, the default is:

  - name: mesos-master
    check_module: mesos_master
    interval: 30
    pattern:
      comm: mesos-master
    conf:
      url: "http://localhost:{port}"

For the Mesos slave:

  - name: mesos-slave
    check_module: mesos_slave
    pattern:
      comm: mesos-slave
    interval: 30
    conf:
      url: "http://localhost:{port}"
      # Name of individual tasks to monitor, if needed
      # tasks:
        # - mongo
        # - cassandra

For Marathon:

  - name: marathon
    check_module: marathon
    interval: 30
    pattern:
      arg: mesosphere.marathon.Main
    conf:
      url: "http://localhost:{port}"

Mongo

The default MongoDB entry should work for most installations without modification.  Only if you have enabled password authentication will the entry need to be changed.  Here is the default entry:

  - name: mongodb
    check_module: mongo
    pattern:
      comm: mongod
    conf:
      server: "mongodb://localhost:{port}/admin"

If you have added a username and password, copy the default Mongo entry into the dragent.yaml file modifying the `server:` entry by adding your user account and password:

      server: "mongodb://USER:PASSWORD@localhost:{port}/admin"

MySQL

There is no default configuration for MySQL since a unique user and password are required for metrics polling. To configure credentials, run the following commands on your server, replacing the 'sysdig-cloud' user and 'sysdig-cloud-password' password with your own values:

mysql -e "CREATE USER 'sysdig-cloud'@'127.0.0.1' IDENTIFIED BY 'sysdig-cloud-password';"
mysql -e "GRANT REPLICATION CLIENT ON *.* TO 'sysdig-cloud'@'127.0.0.1' WITH MAX_USER_CONNECTIONS 5;"

Then add the entry for MySQL into dragent.yaml, again, changing credential information:

  - name: mysql
    pattern:
      comm: mysqld
    conf:
      server: 127.0.0.1
      user: sysdig-cloud
      pass: sysdig-cloud-password

Nginx

Open-source NGINX exposes basic metrics about server activity on a simple status page, provided that you have the HTTP stub status module enabled. To check if the module is already enabled, run:

nginx -V 2>&1 | grep -o with-http_stub_status_module 

If 'with-http_stub_status_module' is listed, the status module is enabled. If that command returns no output, you will need to enable the status module: http://nginx.org/en/docs/http/ngx_http_stub_status_module.html

The agent already has an entry for Nginx in dragent.default.yaml. However, the commercial version may have a different status page. If so, copy and paste the agent's default Nginx configuration to dragent.yaml and modify to match the configured status URL:

  - name: nginx 
    check_module: nginx
    pattern:
      exe: "nginx: worker process"
    conf:
      nginx_status_url: "http://localhost:{port}/nginx_status/"

PGBouncer

PGBouncer does not ship with a default stats user configuration. To configure it, you need to add a user allowed to access PGBouncer stats. Do so by adding this line in pgbouncer.ini:

stats_users = sysdig_cloud

For the same user you need an entry in userlist.txt:

"sysdig_cloud" "sysdig_cloud_password"

Then add a PGBouncer entry in the agent's config file dragent.yaml:

  - name: pgbouncer
    pattern:
      comm: pgbouncer
    conf:
      # set if the bind ip is different
      # set if the port is not the default
      username: sysdig_cloud
      password: sysdig_cloud_password

PHP-FPM

This check has a default configuration that should suit many use cases. If it does not work for you, verify that you have added these lines to your php-fpm.conf file:

pm.status_path = /status
ping.path = /ping

If you need a different configuration, you can change the conf part below:

  - name: php-fpm
    check_module: php_fpm
    pattern:
      exe: "php-fpm: master process"
    conf:
      status_url: /mystatus
      ping_url: /myping
      ping_reply: mypingreply

PostgreSQL

PostgreSQL will be auto-discovered and the agent will connect through the Unix socket using the postgres default user. If it does not work, you can create a user for Sysdig Monitor and give it enough permissions to read Postgres stats. To do this, execute these example statements on your server:

create user sysdig_cloud with password 'password';
grant SELECT ON pg_stat_database to sysdig_cloud;

And then add these lines to the dragent.yaml configuration file:

  - name: postgres
    pattern:
      comm: postgres
      port: 5432
    conf:
      username: sysdig_cloud
      password: password

Prometheus

This application check is able to collect metrics from an external HTTP endpoint exposing metrics in Prometheus format and import them as StatsD metrics in Sysdig.

Tests were performed against: https://github.com/kubernetes/kube-state-metrics

Add these lines to the agent's configuration file:

  - name: prometheus
    pattern:
      comm: python
      arg: /opt/draios/bin/sdchecks
    conf:
      url: http://{YOUR_PROMETHEUS_ENDPOINT_IP}:8080/metrics

RabbitMQ

For RabbitMQ, you need to install its management plugin. This can be done with the command below (see https://www.rabbitmq.com/management.html for more info):

rabbitmq-plugins enable rabbitmq_management

After installation, if you change the default RabbitMQ user/password (guest:guest), you will need to add it in dragent.yaml. Copy and paste the RabbitMQ configuration from dragent.default.yaml and change the user name and password:

  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: myuser
      rabbitmq_pass: mypassword
      queues:
        - queue1
        - queue2
        - . . .

To limit the number of queues monitored, add the "queues:" parameter and add the list of queue names. This is useful if you see the error "To many queues to fetch" in the agent's log file.

Riak CS

Riak CS does not ship with a default configuration because it needs at least access_id and access_secret to work. Add the Riak CS entry to your dragent.yaml with those unique parameters:

  - name: riakcs
    pattern:
      comm: beam.smp
      port: 8080
    conf:
      access_id: "my_access_id"
      access_secret: "my_access_secret"
      #is_secure: false 
      s3_root: s3.amazonaws.dev

To get statistics, app_checks also needs permissions to read the riak-cs bucket.

TCP

When you want to monitor the status of your custom application's port, use the TCP check. This check will routinely connect to the designated port and send Sysdig Monitor a simple on/off metric. This configuration is not in the default settings file; you must add the entry below to the user settings config file dragent.yaml:

  - name: myapp
    check_module: tcp_check
    pattern:
      comm: ruby
      arg: myapp.rb
    conf:
      port: 8080

The above example will look for the 'ruby' process name with argument 'myapp.rb' running on port 8080 and return the metric 'tcp.can_connect' with a value of 0 for closed or 1 for open.

If you want the response time for your port, meaning the amount of time the process takes to accept the connection, you can add the collect_response_time: true parameter under the conf: section and the additional metric 'network.tcp.response_time' will appear in the Metrics list.

Warning: do not use port: under the pattern: section in this case, since if the process is not listening it will not be matched and the metric will not be sent to Sysdig Monitor.
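As a rough illustration of what such a check measures, the Python sketch below attempts a TCP connection and reports an on/off value plus the time taken to connect. It is not the tcp_check plugin's code; the function name tcp_check and its return shape are assumptions made for this example.

```python
import socket
import time


def tcp_check(host, port, timeout=2.0):
    """Return (can_connect, response_time_seconds) for host:port.

    can_connect is 1 if a TCP connection is accepted and 0 otherwise;
    the response time is None when the connection fails.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1, time.monotonic() - start
    except OSError:
        return 0, None


# Example call against the port used in the entry above.
print(tcp_check("127.0.0.1", 8080))
```

The sketch still returns a value when nothing is listening, which reflects the point of the warning above: the check should report 0 for a closed port rather than report nothing.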

Jenkins

To monitor Jenkins, add the following configuration to the dragent.yaml:

  - name: jenkins
    pattern:
      comm: java
      port: 50000
    conf:
      name: default
      jenkins_home: /var/lib/jenkins # this depends on your environment

Working With Agent Configuration Files

To change the user settings configuration file for the native Linux agent, edit the file /opt/draios/etc/dragent.yaml and then restart the agent with the shell command service dragent restart.  Never edit the default settings configuration file dragent.default.yaml since this file will be overwritten when upgrading the agent.

If you need to change the configuration file for the containerized agent, please see the FAQ How-can-I-modify-the-containerized-agents-configuration-file?

 

Disabling A Single Application Check

Sometimes the default configuration shipped with the Sysdig agent does not work for you or you may not be interested in checks for a single application. To turn a single check off, add an entry like this to disable it:

app_checks:
  - name: nginx
    enabled: false

This entry overrides the default configuration of the nginx check, disabling it.

If you are using the `ADDITIONAL_CONF` parameter to modify your container agent's configuration (from the above FAQ), you would add an entry like this to your Docker run command (or Kubernetes manifest):

-e ADDITIONAL_CONF="app_checks:\n  - name: nginx\n    enabled: false\n"
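Writing the escaped one-line form by hand is error-prone, so a small helper can generate it from a normal YAML snippet. The Python sketch below is only a convenience for building the string; the function name to_additional_conf is hypothetical, and values containing double quotes would need extra shell escaping that is not handled here.

```python
def to_additional_conf(yaml_text):
    """Join the lines of a YAML snippet with literal backslash-n sequences."""
    lines = yaml_text.strip("\n").split("\n")
    return "".join(line + "\\n" for line in lines)


snippet = """
app_checks:
  - name: nginx
    enabled: false
"""

# Prints the same option shown above for the Docker run command.
print('-e ADDITIONAL_CONF="%s"' % to_additional_conf(snippet))
```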

 

Disabling ALL Application Checks

If you do not need application checks or otherwise want to disable the functionality, add the following entry to the agent's user settings configuration file /opt/draios/etc/dragent.yaml:

app_checks_enabled: false

Restart the agent as shown immediately above for either the native Linux agent installation or the container agent installation.

 

Metrics Limit

There is a limit of 300 metrics which can be reported by all application check scripts per host. If more metrics are needed please contact your sales representative with your use case.

Note that a metric with the same name but a different tag counts as a unique metric for the agent. Example: a metric 'user.clicks' with the tag 'country=us' and another 'user.clicks' with the tag 'country=it' are considered two metrics, both of which count towards the limit of 300.
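To estimate how close a host is to the limit, you can count distinct name and tag combinations. The Python sketch below shows that counting rule under the assumption that each sample is a (name, tags) pair; the function name count_unique_metrics and the sample data are illustrative and do not come from the agent.

```python
def count_unique_metrics(samples):
    """Count distinct (name, tags) combinations in an iterable of samples."""
    return len({(name, frozenset(tags.items())) for name, tags in samples})


LIMIT = 300  # per-host limit for app check metrics stated above

samples = [
    ("user.clicks", {"country": "us"}),
    ("user.clicks", {"country": "it"}),
    ("user.clicks", {"country": "us"}),  # repeat of the first combination
]

used = count_unique_metrics(samples)
print(used, "of", LIMIT, "metrics used")
```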

 
