Metrics integrations: Build a Custom App Check

Application checks are integrations that allow the Sysdig agent to poll specific metrics exposed by any application. We provide built-in app checks for many common infrastructure applications; for the list of currently supported app checks, see Metrics integrations: Application Checks. Many Java-based applications are also supported out of the box through our JMX integrations; see Metrics integrations: JMX.

If your application is not already supported, you have a few options:

 

  1. Use StatsD or JMX to collect custom metrics.
  2. Send us a request at support@sysdig.com, and we'll do our best to add support for your application.
  3. Create your own check by following the instructions below.

If you do write a custom check, let us know! We love hearing about how our users extend Sysdig Monitor, and we can also consider embedding your app check automatically in the Sysdig agent.

Check Anatomy

An app check is a Python class that extends AgentCheck:

from checks import AgentCheck

class MyCustomCheck(AgentCheck):
    # Namespaces of the monitored process to join.
    # Currently 'net', 'mnt' and 'uts' are supported.
    # List only the namespaces the check actually needs;
    # 'net' is usually enough, and in that case the variable can be omitted.
    # NEEDED_NS = ( 'net', )

    # def __init__(self, name, init_config, agentConfig):
    #     '''
    #     Optional, define it if you need custom initialization
    #     remember to accept these parameters and pass them to the superclass
    #     '''
    #     AgentCheck.__init__(self, name, init_config, agentConfig)
    #     self.myvar = None

    def check(self, instance):
        '''
        This function gets called to perform the check.
        Connect to the application, parse the metrics and add them to aggregation using
        superclass methods like `self.gauge(metricname, value, tags)`
        '''
        server_port = instance['port']
        self.gauge("testmetric", 1)

Put this file into /opt/draios/lib/python/checks.custom.d (create the directory if it does not exist) and it will be available to the Sysdig agent. To run your check, supply its configuration in the agent's config file, dragent.yaml, as is done for bundled checks:

app_checks:
  - name: voltdb # check name, must be unique
    # name of your .py file; if it's the same as the check name you can omit it
    # check_module: voltdb 
    pattern: # pattern to match the application
      comm: java
      arg: org.voltdb.VoltDB
    conf:
      port: 21212 # any key/value config you need in the `check(self, instance)` function

See the Metrics Configuration guide above for more examples. For information on adding parameters to an agent's user settings configuration file, see the FAQ: How can I edit the agent's configuration file?

 

Check Interface Detail

As you can see, the most important piece of the check interface is the check function. The function declaration is:

    def check(self, instance)

instance is a dict containing the configuration of the check. It will contain all the attributes found in the conf: section in dragent.yaml plus the following:

  • name: the unique name of the check
  • ports: an array of all listening ports of the process
  • port: the first listening port of the process

These attributes are available as defaults and allow you to configure your check automatically. Values set in the conf: section take priority over these defaults.
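The precedence rule can be pictured as a plain dictionary merge. The sketch below is illustrative only: `build_instance` is a hypothetical helper, not part of the agent, and the way the agent actually assembles `instance` internally may differ.

```python
def build_instance(name, listening_ports, conf):
    """Hypothetical helper: combine discovered defaults with the conf: section."""
    instance = {"name": name, "ports": list(listening_ports)}
    if listening_ports:
        # 'port' defaults to the first listening port of the process
        instance["port"] = listening_ports[0]
    # Values from conf: are applied last, so they win over the defaults
    instance.update(conf or {})
    return instance


merged = build_instance("voltdb", [8080, 21212], {"port": 21212})
print(merged["port"])  # the conf: value, not the first listening port
```

With an empty conf: section, `port` would simply stay at the first listening port.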

Inside the check function you can call these methods to send metrics:

self.gauge(metric_name, value, tags) # Sample a gauge metric

self.rate(metric_name, value, tags) # Sample a point, with the rate calculated at the end of the check

self.increment(metric_name, value, tags) # Increment a counter metric

self.decrement(metric_name, value, tags) # Decrement a counter metric

self.histogram(metric_name, value, tags) # Sample a histogram metric

self.count(metric_name, value, tags) # Sample a raw count metric

self.monotonic_count(metric_name, value, tags) # Sample an increasing counter metric

The most commonly used are gauge and rate. Besides the self-explanatory metric_name and value parameters, you can also add tags to your metric using this format:

tags = [ "key:value", "key2:value2", "key_without_value"]

tags is a list of strings, each either a key:value pair or a single key without a value. Tags are useful in Sysdig Monitor for segmenting graphs.
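As an illustration of the tag format, the following sketch builds a tag list from a dictionary. `make_tags` is a hypothetical helper, not something the agent provides, and treating a value of `None` as "tag without a value" is an assumption made for this example.

```python
def make_tags(pairs):
    """Hypothetical helper: turn a dict into the list-of-strings tag format."""
    tags = []
    for key, value in pairs.items():
        if value is None:
            # Tag without a value
            tags.append(str(key))
        else:
            # key:value tag
            tags.append("%s:%s" % (key, value))
    return tags


tags = make_tags({"role": "primary", "port": 6379, "canary": None})
print(sorted(tags))
# Inside check() the list would be passed along with the sample, e.g.
# self.gauge("testmetric", 1, tags)
```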

You can also send service checks, which are on/off metrics, using this interface:

self.service_check(name, status, tags)

Where status can be:

  • AgentCheck.OK
  • AgentCheck.WARNING
  • AgentCheck.CRITICAL
  • AgentCheck.UNKNOWN
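A minimal sketch of a check that reports a service check might look like the following. The class name `TcpUpCheck`, the service check name `myapp.can_connect`, and the `host` config key are hypothetical. The fallback `AgentCheck` class is only a stand-in (with assumed numeric status values) so the sketch can be run outside the agent; it is not the real base class.

```python
import socket

try:
    from checks import AgentCheck  # available when run by the Sysdig agent
except ImportError:
    # Assumption: minimal stand-in so the sketch runs outside the agent.
    # It only records calls and is NOT the real AgentCheck.
    class AgentCheck(object):
        OK, WARNING, CRITICAL, UNKNOWN = (0, 1, 2, 3)

        def __init__(self, name, init_config, agentConfig):
            self.name = name
            self.recorded = []

        def service_check(self, name, status, tags=None):
            self.recorded.append((name, status, tags))


class TcpUpCheck(AgentCheck):
    def check(self, instance):
        host = instance.get("host", "127.0.0.1")  # hypothetical conf key
        port = instance["port"]
        tags = ["port:%s" % port]
        try:
            # Open and immediately close a TCP connection to the application
            sock = socket.create_connection((host, port), timeout=2)
            sock.close()
            status = AgentCheck.OK
        except socket.error:
            # Refused connections and timeouts both end up here
            status = AgentCheck.CRITICAL
        self.service_check("myapp.can_connect", status, tags)
```

Reporting CRITICAL rather than raising keeps the check run itself from failing when the application is down, which is the situation a service check is meant to surface.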

 

Testing

To test your check, you can launch Sysdig App Checks from the command line, which avoids running the full agent and lets you iterate faster:

# from /opt/draios directory
sudo ./bin/sdchecks runCheck <check_unique_name> <process_pid> [<process_vpid>] [<process_port>]

  • check_unique_name: the check name as it appears in the config file
  • process_pid: the process PID as seen from the host
  • process_vpid: optional, the process PID as seen inside the container; defaults to 1
  • process_port: optional, the port where the process is listening; defaults to None

Example:

sudo ./bin/sdchecks runCheck redis 1254 1 6379
5658:INFO:Starting
5658:INFO:Container support: True
5658:INFO:Run AppCheck for {'ports': [6379], 'pid': 5625, 'check': 'redis', 'vpid': 1}
Conf: {'port': 6379, 'socket_timeout': 5, 'host': '127.0.0.1', 'name': 'redis', 'ports': [6379]}
Metrics: # metrics array
Checks: # metrics check
Exception: None # exceptions

The output is intentionally raw to allow you to better debug what the check is doing.

 

 
