Sysdig Monitor is the first and only monitoring, alerting, and troubleshooting solution designed from the ground up to provide unprecedented visibility into containerized infrastructures.
Sysdig Monitor comes with built-in, first-class support for Kubernetes. To instrument your Kubernetes environment with Sysdig Monitor, you only need to install the Sysdig agent container on each underlying host in your Kubernetes cluster. Sysdig Monitor will automatically begin monitoring all of your hosts, apps, pods, and services, and will also connect to the Kubernetes API to pull relevant metadata about your environment.
The recommended way to install Sysdig Monitor across a Kubernetes cluster is using DaemonSets. A DaemonSet will automatically place a single Sysdig agent container (via a pod) on each node in your cluster.
Kernel headers: be sure to install the required kernel headers for all instances as detailed in the Agent Installation tab of the Sysdig Monitor app settings.
DaemonSets: The minimum version of Kubernetes that supports DaemonSets is v1.1.1.
Note: for some older versions of Kubernetes, you might need to enable DaemonSets on the Kubernetes API server by adding an additional flag to the startup command:
Step 1: Installation with DaemonSet
Schedule the Sysdig agent container as a DaemonSet by creating a resource manifest (YAML) file following this example provided on GitHub. At a minimum, you must enter your Sysdig Monitor agent access key, as found in the Agent Installation tab of the Sysdig Monitor app settings. Any parameters commented out with '#' are optional or are needed only in on-premises installations. You will likely want to uncomment and set the "TAGS" section.
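As a rough sketch of the two settings mentioned above, the environment section of the agent container in the manifest might look like the fragment below. This is not the complete manifest from the linked example. The variable name `ACCESS_KEY` is an assumption (only "TAGS" is named on this page), and all values are placeholders.

```yaml
# Fragment of the agent container spec in sysdig.yaml; NOT a complete DaemonSet manifest.
env:
  - name: ACCESS_KEY                  # assumed variable name for the agent access key
    value: "YOUR-AGENT-ACCESS-KEY"    # placeholder; copy the real key from the app settings
  - name: TAGS                        # optional section referred to in the text
    value: "role:k8s-node,env:staging"  # hypothetical tags, for illustration only
```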
Deploy the DaemonSet by issuing this command:
kubectl create -f sysdig.yaml
The Sysdig agent pods will automatically self-elect two "delegated" agents. These delegated agents will automatically detect the location of the Kubernetes API server and authenticate using the Bearer Token method to connect to the API and collect cluster metadata. If any issues arise with the delegated agents, new delegates will be automatically elected, in order to maintain a consistent connection to the API server, with no configuration needed.
Note: We populate `kubernetes.cluster.name` with the value from the agent config `k8s_cluster_name`, and if nothing is set, we report `kubernetes.cluster.name` as the value "default".
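For example, to report a cluster name other than "default", the agent configuration could carry an entry like the one below. The cluster name shown is a hypothetical placeholder.

```yaml
# Agent config (dragent.yaml): sets the value reported as kubernetes.cluster.name
k8s_cluster_name: production-east   # hypothetical name; use your own
```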
Step 2: Monitoring your master instance
If you have access to your master instance and want to monitor it in addition to your cluster, you will need to install a Sysdig agent container on the server.
The recommended method is to register the master instance of your Kubernetes cluster by restarting the master kubelet with these flags:
This will allow the Sysdig DaemonSet to automatically install the agent on the master instance, just like every other node in the cluster (while still preventing "regular" pods from being scheduled on the master node).
Alternatively, you can follow the standard Sysdig install instructions and manually install a Sysdig agent container on the master instance.
If you do not have access to the master node (e.g. in a Google Container Engine environment), don't worry: this step is optional and is only necessary if you need to monitor your master node. The Kubernetes API will still be automatically accessed remotely as detailed in Step 1 above. Note, however, that if remote API server detection is failing for any reason, this local Sysdig agent container can serve as a fallback option. It will automatically attempt to connect to the local Kubernetes API. Local detection is based on process name (hyperkube with the --apiserver command line parameter), so if the Kubernetes API server process is named differently, it will not be auto-detected locally.
Role-Based Access Control (RBAC) Policies:
If Role-Based Access Control (RBAC) is employed, the following reduced configuration should be used for the cluster at this time. However, this list is expanding, and modifications to your configuration will be needed at a later stage.
- apiGroups: ["extensions","batch","apps",""]
- nonResourceURLs: ["/healthz", "/healthz/*"]
- kind: ServiceAccount
  namespace: default # SAMPLE
  name: default # SAMPLE
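The lines above are only excerpts of a larger policy. As a sketch of how they might fit into a complete ClusterRole and ClusterRoleBinding, consider the manifest below. The `apiGroups`, `nonResourceURLs`, and ServiceAccount entries come from this page. The `resources` and `verbs` values, the object names, and the `apiVersion` are assumptions made for illustration, so compare them against the configuration Sysdig publishes before use.

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1   # assumption; older clusters may need v1beta1
metadata:
  name: sysdig-agent-reader                # hypothetical name
rules:
  - apiGroups: ["extensions", "batch", "apps", ""]   # from the excerpt above
    resources: ["*"]                                 # assumption
    verbs: ["get", "list", "watch"]                  # assumption: read-only access
  - nonResourceURLs: ["/healthz", "/healthz/*"]      # from the excerpt above
    verbs: ["get"]                                   # assumption
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sysdig-agent-reader-binding        # hypothetical name
subjects:
  - kind: ServiceAccount
    namespace: default # SAMPLE
    name: default # SAMPLE
roleRef:
  kind: ClusterRole
  name: sysdig-agent-reader                # must match the ClusterRole name above
  apiGroup: rbac.authorization.k8s.io
```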
Manual Configuration / Troubleshooting:
If you're not seeing Kubernetes metadata appearing in Sysdig Monitor, then there may be an issue with the Sysdig agents automatically connecting to the Kubernetes API server. Some manual configuration may be needed.
Several settings in the Sysdig agent are available for manual configuration:
- Manual delegation and API server detection
  - Disabling automatic API server detection and delegation
  - Manually delegating a node for remote API server detection
  - Providing API server port + authentication
- Adjusting the number of auto-delegated nodes
- API server connection timeout [deprecated for Sysdig agent v0.35.0+]
The necessary parameters for these settings are described below.
In some situations, these parameters can be passed in as environment variables from the DaemonSet YAML file. Alternatively, they can be added directly to the Sysdig agent's user settings configuration file, dragent.yaml.
For more details on modifying the dragent.yaml file, including how to pass in parameters to a Docker container, check out this FAQ.
Update for Kubernetes 1.6: If the master nodes happen to be the N oldest nodes, the remaining nodes assume that the masters will elect themselves as delegates. Because the masters do not have any agents running on them, no node ends up collecting the metadata. If this happens, follow the Manual delegation instructions below.
Manual delegation and API server detection
If automatic delegation is failing, or if automatic bearer token authentication is failing, you may need to manually delegate a Sysdig agent to connect to the API server.
1. Disabling automatic delegation and API server detection
You can disable auto-delegation and (both local and remote) auto-detection of the API server by the Sysdig agent container by providing the following parameter:
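A minimal sketch of such a setting in dragent.yaml is shown below. The parameter names `k8s_autodetect` and `k8s_delegated_nodes` are assumptions, since this page does not show them; confirm them against the documentation for your agent version.

```yaml
# dragent.yaml (sketch; both parameter names are assumptions)
k8s_autodetect: false     # turn off local and remote API server auto-detection
k8s_delegated_nodes: 0    # turn off automatic election of delegated agents
```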
2. Manually delegating a node for remote API server detection
You can manually delegate one Sysdig agent to remotely connect to the API server by providing the following parameter in the agent's dragent.yaml file:
Choose only one Sysdig agent for this role. Provide the API server's connection protocol, address, and port. Also specify credentials if non-default (see below for more auth options).
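As an illustration, the entry on the chosen agent might look like the following. The parameter name `k8s_uri` is an assumption not confirmed on this page, and the address and port are hypothetical.

```yaml
# dragent.yaml on the single delegated agent (sketch; parameter name is an assumption)
k8s_uri: https://10.0.0.12:6443   # protocol, address, and port of the API server (hypothetical values)
```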
3. Providing API server port + authentication
If the API server uses a different port or requires credentials, they must be specified in the configuration files. Three forms of authentication are supported: basic HTTP, client certificate-based and/or server verification, and bearer token. For more details, see Kubernetes authentication.
Basic HTTP Authentication and Port specification
You must specify the API server's connection method, user credentials, and port:
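One way this could look, assuming the agent accepts credentials embedded in the API server URI and that the parameter is named `k8s_uri` (both are assumptions; the user name, password, address, and port are placeholders):

```yaml
# dragent.yaml (sketch; parameter name and URI form are assumptions)
k8s_uri: https://admin:changeme@10.0.0.12:6443   # protocol://user:password@host:port
```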
Client Certificate-based Authentication and/or Server Verification
The client authentication certificate and private key file names must be configured with two parameters:
If the private key is protected with a password, the password must be specified:
Server verification is configured with the following entries:
The paths to all files can be either absolute or relative to the /opt/draios/ directory.
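Taken together, the certificate settings described above might be sketched as follows. Every parameter name here is an assumption, because this page does not list the names, and the file names and password are hypothetical. Relative paths are shown on the basis that they resolve against /opt/draios/.

```yaml
# dragent.yaml (sketch; all parameter names are assumptions, all values hypothetical)
k8s_ssl_cert: k8s-client.crt        # client authentication certificate
k8s_ssl_key: k8s-client.key         # client private key
k8s_ssl_key_password: "changeme"    # only needed if the private key is password protected
k8s_ssl_verify_certificate: true    # enable verification of the API server certificate
k8s_ca_certificate: k8s-ca.crt      # CA certificate used for server verification
```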
Bearer Token Authentication
The token file name for bearer token authentication must be configured with a parameter:
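A sketch, assuming the parameter is named `k8s_bt_auth_token` (the name is not confirmed on this page). The path shown is the standard location where Kubernetes mounts a pod's service account token.

```yaml
# dragent.yaml (sketch; parameter name is an assumption)
k8s_bt_auth_token: /var/run/secrets/kubernetes.io/serviceaccount/token
```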
Adjusting the number of auto-delegated nodes
By default, two Sysdig agent containers are automatically elected as delegates - this is the recommended number. However, the number of automatically delegated nodes can be adjusted manually:
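For instance, raising the number of delegates from two to three might look like this, assuming the parameter is named `k8s_delegated_nodes` (an assumption; check your agent version's documentation):

```yaml
# dragent.yaml (sketch; parameter name is an assumption)
k8s_delegated_nodes: 3   # the default described above is 2
```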
API Timeouts Occur
Note, this functionality is deprecated starting with v0.35.0 of the Sysdig agent.
When communicating with the API server, the designated agent waits for a response for 10 seconds before timing out. While timing is not usually a concern when the API server is on the same machine (connection attempts to non-existing servers will fail very quickly), it may become an obstacle in cases when an API server is remote.
While rarely required, you can increase the timeout, which is specified in milliseconds. The default is 10000 ms (10 seconds). The value below causes the agent to wait for 15 seconds before timing out:
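A sketch of a 15-second timeout, assuming the parameter is named `k8s_timeout_ms` (the name is not confirmed on this page):

```yaml
# dragent.yaml (sketch; parameter name is an assumption)
k8s_timeout_ms: 15000   # 15 seconds, expressed in milliseconds
```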