Sysdig Monitor Agent Changelogs

  • Aug 10, 2018

    0.84.0

    New Features

    • Secure compliance checks that run CIS Docker and Kubernetes benchmarks to check for best practices.
    • Allow multiple instances of an AppCheck to run per process.
    • Capture files created by the agent can now be read by older versions of the open source Sysdig tool.

    Bug Fixes

    • Several performance improvements.
    • Fixed sdchecks crash on Python 2.6 caused by an incompatibility in the websocket library.
    • Fixed file descriptor leak that can occur if an agent process crashes while disconnected from the backend.
    • Upgraded fasterxml:jackson-databind from version 2.8.4 to 2.8.11.1.
    • Fix logic that avoids scanning duplicate ports while fetching Prometheus metrics.
    • Fix the Varnish AppCheck for Varnish 5.x.
    • Only scrape Solr processes with listening ports.
    • Refresh list of listening ports after 30 seconds.
    • Fix collecting container metadata from older versions of Docker.
    • Consul AppCheck now supports Consul 0.7 and above.
    • Read environment variables from environments larger than 4 KiB and optionally walk the process hierarchy to find environment variables on custom containers.
    • Fix Kubernetes node and deployment metrics not updating.
    • Minor improvements to host CPU usage calculation.
    • Compatibility fix in Sysdig probe for Linux kernel versions 4.17 and later.
  • Jul 11, 2018

    0.83.1

    Bug fixes

    • Solr AppCheck refactor: added default 5 second timeout to HTTP requests, get index size from core stats if replication handler is not available, performance improvements.
    • Added processes with AppCheck metrics to the top process list sent to the collector.
    • To prevent sdjagent restarts, we now send heartbeats from sdjagent if VM discovery takes a long time.
    • Fixed debug log for the number of Prometheus metrics sent, filtered and total.
    • Limit file system stats to the configured number of mount points.
  • Jun 28, 2018

    0.83.0

    New Features

    • Added support for matching jar name when collecting JMX metrics.

    Bug fixes

    • Avoid duplicate port scans when scraping Prometheus metrics.
  • Jun 8, 2018

    0.82.0

    New Features

    • Added the ability to suppress system events for a process and its descendants. This can reduce agent CPU usage by ignoring non-essential programs running on the host.
    • Performance improvements to program hash calculation.
    • Performance improvements related to Falco policies in Secure.
    • Increased default AppChecks metric limit to 500 (was 300).
    • Updated to latest Redis AppCheck upstream version.

    Bug fixes

    • Fixed potential crash in high-stress environments.
    • Fixed issue in reporting kubernetes.node.ready.
    • Fixed version checking in PostgreSQL AppCheck that was causing certain metrics to not get reported. Fixed reporting replication_delay for Postgres 10.
    • Reduced frequency of warnings in agent log for RabbitMQ AppCheck when limits on the number of queues, nodes, or exchanges are exceeded.
    • Fixed error that was causing the NTP AppCheck to fail when the agent was running in a Kubernetes daemonset.
    • Added missing AmazonLinux repos for building Sysdig driver.
    • Added location of newer kbuild Debian packages for building Sysdig driver.
    • Addressed unknown metric type errors reported in agent log in Mesos, MySQL and PHP-FPM AppChecks by converting counters to rate metrics.
    • Fixed sdjagent crash when running in Java 10 & 11.
    • Fixed URL for Fedora repos for building Sysdig driver.
    • Report memory values in bytes as opposed to MB in MongoDB AppCheck.
  • May 17, 2018

    0.81.0

    New Features

    • Improved performance for event processing when using Sysdig Secure.
    • Reduce agent load by dropping unneeded events in the driver.
    • Log the top syscalls seen by the analyzer each flush loop.

    Bug fixes

    • Fixed multiline logs to use the correct log level instead of always using error level.
    • Lower the default number of proc and socket lookups to reduce agent instability under heavy load.
    • Fixed the loss of default metrics when using custom_metrics and fixed unknown metric errors in the PostgreSQL AppCheck.
    • Fixed unknown metric errors in the PHP-FPM AppCheck.
  • Apr 23, 2018

    0.80.2

    New Features

    • Added support for ep_io_* metrics on Couchbase AppCheck.
    • Updated RabbitMQ AppCheck with upstream.
    • Added option to send JMX Bean attributes to be used for segmentation.

    Bug fixes

    • Fixed potential agent crash during log of first export of Prometheus metric.
    • Report lag as 0 when consumer offset is negative in Kafka AppCheck.
    • Increased default maximum container label length to 100 characters.
    • Fixed probe DKMS builds for Debian Jessie versions by adding symbolic links for gcc.
    • Fixed bug where setting max_n_proc_lookups: 0 allowed uncapped number of lookups.
  • Apr 26, 2018

    0.80.1

    Bug fixes

    • Filter Kubernetes replicaSets and replicationControllers that have spec == 0.
  • Apr 23, 2018

    0.80.0

    New Features

    • Added info-level log to indicate when Prometheus metrics are first found and exported.
    • Added an easier way to get dragent.yaml configurations from Kubernetes ConfigMaps and Secrets. They can be used with the updated daemonSet YAML manifest (sysdig-agent-deamonset-v2.yaml) found in https://github.com/draios/sysdig-cloud-scripts/tree/master/agent_deploy/kubernetes.

    Bug fixes

    • Reduce cointerface memory usage during startup, especially on large Kubernetes clusters.
    • Don't report watch timeouts as errors in the Kubernetes orchestrator event watchdog.
  • Apr 11, 2018

    0.79.1

    Bug Fixes

    • Fix a problem that caused captures to have inconsistent state and not be readable by any Sysdig version. Captures generated by this agent can be read with Sysdig versions >= 0.21.0.
  • Apr 5, 2018

    0.79.0

    New Features

    • Added kubestate metrics for Horizontal Pod AutoScaler.
    • Added kubestate metrics for StatefulSets.
    • Track HTTP requests, MongoDB and SQL query counts as separate metrics.
    • Host memory usage calculation now matches free.
    • Pick delegated node by node name instead of UUID.
    • Added support for use of HTTPS to connect to a Prometheus exporter.
    • Track versioning in capture files: with this release, we will increment the pcap major/minor version in capture files when a release adds new event types, additional event fields, etc. that are incompatible with earlier sysdig versions.
    • Added ability to specify a custom URL to pull the agent probe.
    • Falco secure policies can now use the in operator with a set of addr/netmask values.
    • Added support for AMD's Secure Memory Encryption (SME).
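
    To illustrate what matching an address against a set of addr/netmask values means, here is a minimal Python sketch using the standard ipaddress module. It is not Falco rule syntax and not the agent's implementation; the function name addr_in_any and the sample networks are hypothetical.

```python
import ipaddress

def addr_in_any(addr, networks):
    """Return True if addr falls inside any addr/netmask entry in networks."""
    ip = ipaddress.ip_address(addr)
    # strict=False tolerates entries whose host bits are set, e.g. 10.1.2.3/8.
    return any(ip in ipaddress.ip_network(net, strict=False) for net in networks)

# Hypothetical set of networks, written in addr/netmask (CIDR) form.
allowed = ["10.0.0.0/8", "192.168.1.0/24"]

print(addr_in_any("10.2.3.4", allowed))    # inside 10.0.0.0/8
print(addr_in_any("172.16.0.1", allowed))  # inside neither network
```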

    Bug fixes

    • Report negative offsets as zero in Kafka AppCheck.
    • Fixed hostname resolution in etcd AppCheck.
    • Bumped up agent's default memory limit for cointerface to 256 MiB.
    • Pruned list of known_ports in dragent.default.yaml for which net.request.* metrics are calculated.
    • Blacklisted DNS from known_ports to prevent DNS from consuming entries in agent's connection table.
    • Mark a protocol as TLS only if the port is in known_ports.
    • Output fields in secure policy events are specified in agent (and are more flexible).
    • Small fixes to filesystem-related secure policies.
    • Added /run/secrets to the mount points exclude list in dragent.default.yaml.
  • Mar 15, 2018

    0.78.1

    Bug Fixes

    • Fix a regression where some encryption certificates could not be parsed properly when accessing Kubernetes API.
  • Mar 9, 2018

    0.78.0

    New Features

    • Install script now works on Amazon Linux 2.
    • We have added a filter to prune most of the default labels that orchestrators add in order to reduce noise and redundant metadata. Container and Kubernetes labels are limited to a maximum of 50 characters.

    Bug Fixes

    • UDP connection tracking improvements: a connection will be reported only if there has been at least one write/read between endpoints.
    • Fixed driver compilation on Ubuntu Xenial when using containerized agent.
    • Fixed a race condition that caused crashes of Kubernetes cointerface.
    • Fixed a bug where the agent would stop collecting metadata after a Kubernetes cointerface crash.
    • Avoid retrying the Kubernetes API when collecting CronJob state if the API does not support CronJobs.
    • Fixed a bug that caused sdjagent (Java metrics collector) to stall.
  • Feb 9, 2018

    0.77.0

    New Features

    • Support for new Sysdig Secure policy types in addition to Falco based policies. Requires additional configuration on the backend side at the account level to enable them.
    • Add ability to scrape Prometheus histogram metric type.
    • AppChecks can now ignore all SSL warnings.

    Bug Fixes

    • Fix Gunicorn AppCheck compatibility with a more recent Python psutil package.
    • Fix Kafka AppCheck to handle cases when node_id is 0.
    • Improve accuracy of CPU usage calculations for newly started containers with recycled container IDs.
    • Add additional validation of and warnings about degenerate Mesos task IDs such as 0 and 1.
    • More aggressively filter container mounts to exclude common mounts set up by Docker and Kubernetes.
    • Fix regression in reporting container counts caused by not reporting CPU usage when 0.
  • Jan 31, 2018

    0.76.4

    Bug Fixes

    • Fixed calculation of CPU usage after missed samples.
    • Stuck captures no longer prevent the start of other captures.
    • Fixed a problem that could cause repeated events in captures.
  • Jan 26, 2018

    0.76.3

    New Features

    • Added ability to get Mesos lightweight metadata from Docker environment data if container processes zero out environment variables.

    Bug Fixes

    • Suppress SSL warnings from http_check AppCheck if it is configured to ignore warnings.
    • Fixed a rare crash that can occur during agent startup due to uninitialized internal state.
    • Changed level of Kubernetes delegated node log message from DEBUG to INFO.
    • Clamp CPU values when incorrect values are detected due to issues in kernels 4.8 - 4.10.
  • Jan 18, 2018

    0.76.2

    New Features

    • Support for JMX metrics from Java 9 applications.

    Bug Fixes

    • Fixed an issue in 0.76.1 that causes agent captures to not work with the Sysdig command line tool and Sysdig Inspect.
    • Added logic to handle invalid values in cgroup's cpuacct.usage.
  • Jan 17, 2018

    0.76.1

    New Features

    • Added Amazon Linux 2 support.
    • Dump make.log when dkms probe module builds fail.

    Bug Fixes

    • Fixed crash in setsid() exit event parser.
    • Upgraded various build dependencies.
    • Fix slow processing of cointerface RPC responses that can cause memory pressure.
  • Jan 12, 2018

    0.76

    New Features

    • Agent now reads container cpu.used.percent directly from cgroups instead of summing all the processes, so it will be more accurate.
    • There is a new command on sdjagent: allAvailableBeans that prints all the beans found on a JVM with their respective values.
    • Kubernetes metadata is available also for init containers.
    • Kafka AppCheck now supports consumer offsets stored in Kafka.
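
    As a rough illustration of deriving a CPU percentage from cgroup accounting rather than from per-process sums, the Python sketch below works from two samples of cpuacct.usage, which is a cumulative counter in nanoseconds. The function name is hypothetical, and expressing the result as a percentage of one core and returning 0 for a counter that goes backwards are assumptions, not the agent's documented behavior.

```python
def cpu_used_percent(prev_usage_ns, cur_usage_ns, interval_s):
    """CPU used between two cpuacct.usage samples, as a percent of one core."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_ns = cur_usage_ns - prev_usage_ns
    if delta_ns < 0:
        # Assumption: a counter that goes backwards is treated as no usage.
        return 0.0
    return 100.0 * delta_ns / (interval_s * 1e9)

# Half a second of CPU time consumed over a one-second interval.
print(cpu_used_percent(1_000_000_000, 1_500_000_000, 1.0))  # prints 50.0
```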

    Bug Fixes

    • sdjagent availableMetrics command now uses the same logic as getMetrics to detect JMX attributes convertible to numbers.
    • Fix a crash occurring if all statsd metrics are filtered.
    • Fixed an error when calculating the hash for precompiled CoreOS probe modules.
    • Fixed an agent probe module failure on Ubuntu kernels with the Meltdown patches that disable the page-fault tracepoints.
  • Dec 20, 2017

    0.75

    New Features

    • Updated host CPU usage formula, excluding iowait.
    • The Mesos framework name is now reported as just 'marathon' rather than 'marathon [url]', so the framework name stays consistent regardless of Marathon leader failover.
    • Agent is now able to gather a subset of Mesos/Marathon labels without involving the master API. This can reduce overhead and increase reliability.
    • Container labels with > 200 characters will be skipped.
    • Updated all Docker API calls to use versioned endpoints.
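
    The changelog does not give the updated host CPU formula. As one possible reading, the Python sketch below computes usage from two samples of the aggregate cpu line in /proc/stat and counts iowait as non-busy time alongside idle. The function names and sample values are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(names, (int(v) for v in fields[1:9])))

def cpu_usage_percent(prev, cur):
    """Busy percentage between two samples, with iowait excluded from busy."""
    delta = {key: cur[key] - prev[key] for key in cur}
    total = sum(delta.values())
    if total <= 0:
        return 0.0
    # Assumption: both idle and iowait count as time the CPU was not in use.
    busy = total - delta["idle"] - delta["iowait"]
    return 100.0 * busy / total

prev = parse_cpu_line("cpu 100 0 50 800 50 0 0 0")
cur = parse_cpu_line("cpu 200 0 100 1500 200 0 0 0")
print(cpu_usage_percent(prev, cur))  # prints 15.0
```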

    Bug Fixes

    • Improved validation of Mesos containers (avoid spurious slave containerid in UI).
    • The agent container can now compile the kernel module even if the kernel is compiled with CONFIG_STACK_VALIDATION or CONFIG_ORC_UNWINDER.
    • Fix crash caused by unhandled exception from scap_get_n_tracepoint_hit.
    • Fix warning for unhandled Docker event.
    • Fix JMX aliases with more than one token.
  • Dec 14, 2017

    0.74

    Bug Fixes

    • Fix a crash that occurs during error conditions of DNS lookups.
    • Fixed get_env() to handle spaces properly and to only return exact matches.
  • Dec 4, 2017

    0.73.2

    Bug Fixes

    • Fix an issue parsing Mesos API.
    • Make JMX class matching case insensitive.
  • Nov 29, 2017

    0.73.1

    Bug Fixes

    • Properly handle boolean values on AppChecks config.
    • Add additional logging to track JSON parse failures in Mesos environments.
  • Nov 21, 2017

    0.73.0

    New Features

    • Agent is now able to monitor activity of ia32 apps running on a 64-bit OS.

    Bug Fixes

    • Fix a bug that caused the agent to report wrong memory usage for newly created processes.
  • Nov 10, 2017

    0.72.4

    New Features

    • Reported container memory now more closely matches what top reports for processes.
  • Nov 9, 2017

    0.72.3

    Bug Fixes

    • Fix a bug in percentile calculation.
  • Nov 8, 2017

    0.72.2

    Bug Fixes

    • Fix cointerface crash caused by incompatible versions of the Go and C++ GRPC dependencies.
    • Fix invalid EPEL link in the agent install script.
  • Oct 27, 2017

    0.72.0

    New Features

    • Support for Prometheus histogram and summary metric types. The agent now reports avg and count for each of these.
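
    Prometheus histograms and summaries both expose a _sum and a _count series, so an average can be derived as sum divided by count. Whether the agent computes avg exactly this way is an assumption. The Python sketch below handles only label-free, timestamp-free sample lines, and the function and metric names are hypothetical.

```python
def avg_and_count(exposition_text, metric):
    """Derive (avg, count) from the <metric>_sum and <metric>_count samples."""
    total = count = None
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name == metric + "_sum":
            total = float(value)
        elif name == metric + "_count":
            count = float(value)
    if total is None or count is None:
        raise ValueError("missing _sum or _count sample for " + metric)
    avg = total / count if count else 0.0
    return avg, int(count)

sample = """\
# TYPE request_seconds histogram
request_seconds_bucket{le="1"} 3
request_seconds_bucket{le="+Inf"} 5
request_seconds_sum 12.5
request_seconds_count 5
"""
print(avg_and_count(sample, "request_seconds"))  # prints (2.5, 5)
```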

    Bug Fixes

    • Report component name for Kubernetes events, e.g. the pod name is reported on pod-related events.
    • Performance improvements on Sysdig Secure capabilities.
  • Oct 19, 2017

    0.71.0

    New Features

    • Add exitCode and signal on Docker die and kill events.

    Bug Fixes

    • Updates to Kubernetes annotations were not propagated for Prometheus scraping.
    • Containerized agent would not use the probe module it built through DKMS.
    • Prevent log spam by throttling how frequently the agent connects to the Kubernetes API server.
    • Do not require new_k8s to be enabled in order to use Kubernetes data for Prometheus autodetection.
  • Oct 12, 2017

    0.70.0

    New Features

    • Agent will automatically shut down the kernel driver if overhead is too high; see the tracepoint_hits_threshold and related options.
    • Prometheus metrics can be autodiscovered using the standard Kubernetes metadata too.

    Bug Fixes

    • Fix race condition that would cause watchdog to wrongly kill agent subprocesses.
    • Performance improvements on percentile aggregations.
  • Sep 25, 2017

    0.69.0

    New Features

    • Sysdig captures performed by the agent can be read by Sysdig Inspect.
    • New support for proc.exepath.
    • Improved Couchbase check with new metrics.

    Bug Fixes

    • Report Prometheus metrics as AppChecks for quota purposes.
    • Fix gaps in metric transmissions when the collection interval is longer than 1 second.
  • Sep 16, 2017

    0.68.0

    New Features

    • Report number of custom metrics sent for each category: statsd, JMX, AppChecks and Prometheus.
    • Report native thread count per process/container/host.

    Bug Fixes

    • Gracefully handle running out of memory for back in time captures. At startup, the agent will try to allocate the required memory and disable back in time captures if the allocation fails.
  • Sep 7, 2017

    0.67.0

    New Features

    • Prometheus metrics: automatically detect and report metrics generated by Prometheus-enabled applications.

    Bug Fixes

    • Include meta events in captures, both regular and back-in-time.
  • Aug 23, 2017

    0.66.0

    New Features

    • Linux install script now supports the --additional-conf parameter with the same semantics as in our Docker containerized agent.
    • AppChecks can now use {hostname} as a token in the conf: section; it is replaced at runtime with the hostname where the agent is running.
    • Send BackOff, Unhealthy and FailedScheduling events by default. Useful to detect issues on Kubernetes pods.
    • Hide agent events from captures by default.
    • Changed the description of capture notifications to the policy name for clarity.
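
    The {hostname} token substitution described above can be pictured with a short Python sketch. It only illustrates the idea and is not the agent's code; the function name, the sample URL, and the use of socket.gethostname() as the hostname source are assumptions.

```python
import socket

def expand_hostname_token(value, hostname=None):
    """Replace every {hostname} token in value with the local hostname."""
    if hostname is None:
        # Assumption: the hostname is taken from the local system at runtime.
        hostname = socket.gethostname()
    return value.replace("{hostname}", hostname)

# A fixed hostname is passed here so the output is predictable.
print(expand_hostname_token("http://{hostname}:8080/status", hostname="node-1"))
# prints http://node-1:8080/status
```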

    Bug Fixes

    • Fix Docker container to properly build kernel module on Debian Stretch.
  • Aug 12, 2017

    0.65.1

    Bug Fixes

    • Report load averages in nodriver mode.
  • Jul 28, 2017

    0.65.0

    New Features

    • Update AppChecks from upstream project.

    Bug Fixes

    • Properly terminate sdjagent on Java 1.6 instead of crashing.
    • Improved Rkt detection.
  • Jul 20, 2017

    0.64.0

    New Features

    • Support Unix sockets on PHP-FPM check.

    Bug Fixes

    • Fix Kubernetes auto-delegation algorithm in case nodes have taints (the latest OpenShift release has them by default).
  • Jul 10, 2017

    0.63.1

    Bug Fixes

    • Fix Marathon Groups reporting.
    • Fix a bug that caused memory.limit.used.bytes to report wrong values.
  • Jul 5, 2017

    0.63.0

    New Features

    • Limit mount points to 15 per host and per container, and allow custom filtering to include/exclude mount points.

    Bug Fixes

    • Fix mount points reporting and JMX for rkt containers.
    • Log metrics over limit even if there are no filters.
  • Jun 15, 2017

    0.62.0

    New features

    • Additional Cassandra JMX metrics to monitor cluster health (cassandra.mutation.dropped, cassandra.counter.mutation.dropped, cassandra.read.dropped).
    • RabbitMQ check has a new ssl_verify parameter that allows the agent to connect to RabbitMQ instances without verifying the SSL certificate.

    Bug fixes

    • Fix statsd support for rkt containers.
    • Reduce severity of misleading Kubernetes delegation logs.
    • Rework HAProxy metrics to avoid per app tagging.
    • Exit gracefully when the agent cannot load the kernel module.
    • Avoid 32 bit overflow for net.bytes.{in,out,total}.
    • Fix slow memory growth related to metrics messages and shared pointers.
    • Don't crash if PerfFile library throws NullPointerException.