Sysdig Monitor Troubleshooting Guide

This guide is intended to help you troubleshoot and proactively collect information for a technical support request. When troubleshooting issues with the Sysdig Monitor service or where the agent will not install or run, we may ask for multiple pieces of information. Supplying as much information as possible with your request will help considerably in resolving your issue as quickly as possible.

Most importantly, describe the problem you are having, list any commands given and errors seen verbatim (screenshots preferred), and supply the most recent log file covering the time period of the error. Because the agent keeps up to eleven 10MB log files in rotation, each with a date and timestamp, we can troubleshoot some agent reporting issues for a substantial time after an event.

Sysdig Monitor 'Frequently Asked Questions' link

 

Confirm Agent Connectivity

When you notice a host not showing up in the user interface or see "Error, connection_manager: Lost connection" messages in the agent's log file (/opt/draios/logs/draios.log),  suspect agent connectivity problems.  Here are a few quick things to check:

1. Confirm that you have not used all available agent licenses. The agent license count is available in the Settings > Subscription tab.  You can purchase additional agent licenses from that tab if needed.

2. Confirm basic connectivity through your firewall:

ping collector.sysdigcloud.com

3. Confirm the correct port is open through your firewall:

telnet collector.sysdigcloud.com 6666 

See the FAQ to change the port number to 80 if 6666 is not available.

4. Check for duplicate MAC addresses among your hosts. When the agent starts, an entry in the /opt/draios/logs/draios.log file reports the host's MAC:

2016-09-26 10:20:25.982, 2363, Information, machine id: a2:11:0b:84:11:21 

Compare the logged MAC address to any existing reporting agents in the Explore tab using the 'Hosts & Containers' hierarchy (Show feature).

5. Confirm your access key is correct. Your access key is available in the Settings >  User Profile tab of your account. You can see the Sysdig agent key as configured in /opt/draios/etc/dragent.yaml.
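
The port check in step 3 and the MAC lookup in step 4 can be scripted. The following is a minimal sketch, not an official Sysdig tool: the function names are hypothetical, check_port assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed, and logged_machine_ids assumes the log line format shown in step 4.

```shell
# Hypothetical helpers; the names are illustrative, not part of the agent.

# Succeeds if a TCP connection to host $1 on port $2 opens within 5 seconds.
check_port() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Reads an agent log on stdin and prints each distinct machine id (MAC).
logged_machine_ids() {
  sed -n 's/.*machine id: \([0-9A-Fa-f:]\{17\}\).*/\1/p' | sort -u
}

# Usage:
#   check_port collector.sysdigcloud.com 6666 && echo "port reachable"
#   logged_machine_ids < /opt/draios/logs/draios.log
```

If logged_machine_ids prints the same MAC on two different hosts, those hosts are likely candidates for the duplicate described in step 4.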

 

The agent is routinely updated to include new features and resolve bugs.  Many times problems can be resolved by simply updating your Sysdig agent.  Please update your agent to make sure you are not troubleshooting a known issue.

 

For Your Support Request

If you need to contact us about a problem, please review the list below and feel free to supply anything else you think may be useful in helping us to understand the issue you are having and speed its resolution:

 

1 - Your Sysdig Monitor Account

If the Sysdig Monitor account you are using is not the same as the email address on your support ticket or signature, please be sure to list it so we troubleshoot the proper account. Your full name and company name can also be useful in finding you in our databases.  

 

2 - The Operating System

Please be sure to check that your operating system is supported.  Submitting the output of uname -a and lsb_release -a will help us determine if the kernel is supported. If you have a custom kernel and the kernel development headers are not available you will not be able to install our agent.

Be sure to also note if the Linux distribution used inside your app's container is not the same as the host.
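
To gather the operating system details in one file for attachment, something like the sketch below may help. The function name and the default output file name are hypothetical; the only commands it runs are the two named above.

```shell
# Hypothetical helper: writes OS details for a support request to a file.
collect_os_info() {
  out=${1:-os-info.txt}
  {
    echo "== uname -a =="
    uname -a
    echo "== lsb_release -a =="
    if command -v lsb_release >/dev/null 2>&1; then
      lsb_release -a 2>/dev/null || echo "lsb_release failed"
    else
      # lsb_release is not installed on every distribution.
      echo "lsb_release not available"
    fi
  } > "$out"
}

# Usage: collect_os_info os-info.txt
```

Run it on the host, and again inside the application container if the container uses a different distribution.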

 

3 - Your Infrastructure and Agent Version

When using our agent in an orchestration infrastructure (Kubernetes, ECS, OpenShift, Mesos, etc.), please let us know the details of how you installed the infrastructure so that we can recreate your scenario. Examples of the information needed are:

  • Which Kubernetes version
  • The cloud provider or platform being run under (AWS, GKE, etc.)
  • The tool used to set up Kubernetes (KOPS, Kube-AWS, etc.)
  • The number of nodes
  • The authentication method configured for the API server 

It is useful to examine the log file for the 'delegated' Sysdig agent polling your Kubernetes API server. You can find the address of the agent performing the API polling by issuing the command: kubectl get nodes. Agents use this output to determine if they are to become a delegated polling agent (first two nodes in the list). Alternatively, enabling debug mode in agent versions 0.64.0 and beyond will show the node list in any agent logs as well. Find the delegated nodes and use step #5 below to get their agent log files: /opt/draios/logs/draios.log.
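
Since the delegated agents are described above as the first two nodes in the kubectl get nodes list, a small filter can pick them out. This is a sketch only; the function name is hypothetical and it assumes the default table output with a single header row.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and
# prints the names of the first two nodes (the header row is skipped).
first_two_nodes() {
  awk 'NR > 1 && NR <= 3 { print $1 }'
}

# Usage: kubectl get nodes | first_two_nodes
```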

It is also important to verify the version of the Sysdig Monitor agent installed since older agents may not support collecting metrics from the latest orchestration tools.  You can check the agent version with the command:  /opt/draios/bin/dragent --version  or via the UI by applying the Host & Containers > Sysdig Agent Summary view.

Compare your installed version to the latest release shown on our agent build list. Upgrade any older agents to the latest version to make sure you are not encountering a known issue: Sysdig-Agent-Update-Uninstall 
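
Comparing version numbers by eye is error-prone (0.100.0 is newer than 0.64.0, for instance). A minimal sketch of a comparison helper follows. The function name is hypothetical and it assumes a sort command that supports -V (version sort), as GNU coreutils does. The exact output format of dragent --version is not covered here, so pass the bare version numbers by hand.

```shell
# Hypothetical helper: succeeds if version $1 is strictly older than $2.
# Assumes `sort -V` (version sort) is available.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (both arguments are example version numbers):
#   version_lt "0.21.0" "0.64.0" && echo "installed agent is older"
```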

 

4 - Agent Start Command or Manifest File 

Many agent connection problems are due to transcription errors in the agent start command or manifest files. This is especially true with truncated access keys and when using the Additional_Conf parameter in a container agent installation. Always cut and paste and then modify our example commands or manifest files. 

  • Try running the agent from the command line using the 'docker start' or curl command.
  • Send in the command used and initial agent output.  
  • If using docker start, remove the '-d' option so output will display on the console.

Your docker start and native agent run commands are available in the Settings > Agent Installation tab of the user interface. If you are not an Admin level user, you will not see the installation tab. In this case, please request the instructions from your admin.

 

5 - Sysdig Agent Configuration File and Logs

The Sysdig agent reads the user-settings configuration file /opt/draios/etc/dragent.yaml and generates log entries in /opt/draios/logs/draios.log. The agent rotates the log file when it reaches 10MB in size, keeping the 10 most recent log files archived with a date stamp appended to the filename.

It's always helpful to attach the config file and latest log to your support request if you see metrics not reporting or have agent connection issues. Since the agent logs critical startup information when initializing, restarting the agent and then collecting the logs is desirable.

For a container agent you can use:

docker restart sysdig-agent

For a native Linux service agent use:

service dragent restart

Whenever possible, attach files rather than pasting log file or config file text inline in the email body, so that important formatting is preserved.

To copy the configuration file and most recent log file out of an agent running in a container use these Docker commands:

docker cp sysdig-agent:/opt/draios/logs/draios.log  ./draios.log
docker cp sysdig-agent:/opt/draios/etc/dragent.yaml ./dragent.yaml

Please compress large files before attaching them to a support ticket.  Files over 7MB will require us to supply you with a download link.

Also be sure to let us know the host name that the files came from.
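
For a natively installed agent, collecting and compressing the two files can be done in one step. The sketch below is an illustration under stated assumptions: the function name and the archive naming scheme are hypothetical, and it uses the file locations given above. Including the host name in the archive name records which host the files came from.

```shell
# Hypothetical helper: bundles the agent config and current log into a
# gzip-compressed tarball named after the host, then prints its path.
bundle_agent_files() {
  base=${1:-/opt/draios}   # agent install directory
  dest=${2:-.}             # directory to write the tarball into
  archive="$dest/sysdig-agent-$(uname -n).tar.gz"
  tar -czf "$archive" -C "$base" etc/dragent.yaml logs/draios.log || return 1
  # Files over 7MB cannot be attached directly (see the note above).
  size=$(($(wc -c < "$archive")))
  if [ "$size" -gt $((7 * 1024 * 1024)) ]; then
    echo "note: $archive exceeds 7MB; ask support for a download link" >&2
  fi
  echo "$archive"
}

# Usage: bundle_agent_files /opt/draios /tmp
```

For a container agent, first copy the files out with the docker cp commands above into matching etc/ and logs/ subdirectories, then point the first argument at that directory.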

Debug log level available

 

6 - Sysdig Monitor Application Logs

Our on-premises version of the Sysdig Monitor application server is available in two infrastructure versions: Replicated or Kubernetes. Both have facilities to create support bundle files, which are invaluable when troubleshooting suspected back-end issues such as problems with component startup or many agents failing to connect. Create a bundle file using the appropriate section below and send it to support. The file can be many MB in size; files over 7MB need to be sent via a file transfer service, and we can supply you with a third-party file transfer link if you do not have your own.

    Replicated On-Prem Infrastructure:

If you are running the Sysdig Monitor on-premises Replicated version, you can generate a complete support bundle from the management console's Support tab. 

Go to the Support tab and click on "Download Support Bundle".  It can take a minute or two for larger installations or those with more history.  You should be prompted to save a file "replicated-support<#####>.tar.gz".  

 

    Kubernetes On-Prem Infrastructure:

If you are running the Sysdig Monitor on-premises Kubernetes version, generate the application support bundle with the get_support_bundle.sh script provided in our GitHub repository at https://github.com/draios/sysdigcloud-kubernetes. Supply the script (also attached to the end of this guide) with the namespace where Sysdig Cloud is deployed and it will generate a tarball with backend logs and configuration information:

  $ ./scripts/get_support_bundle.sh sysdigcloud

 

Generating A Sysdig Agent Coredump File

We typically ask for this only when the agent log files do not supply enough details. Core dump files are useful for troubleshooting when the Sysdig agent crashes. The ability to create a core dump is available starting in agent version 0.21.0.  To allow the agent to create a file upon a crash, add the core dump entry to the agent's user settings configuration file /opt/draios/etc/dragent.yaml

echo coredump: true >> /opt/draios/etc/dragent.yaml

After restarting the agent ('service dragent restart' or 'docker restart sysdig-agent'), the next crash will generate a coredump file, which can be sent to support@sysdig.com for troubleshooting.
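
Running the echo command above more than once appends a duplicate entry each time. A hedged alternative is sketched below. The function name is hypothetical, and it only checks for an existing top-level coredump line; it does not change an entry that is already set to another value.

```shell
# Hypothetical helper: appends "coredump: true" to the config file only
# if no coredump entry exists yet, so repeated runs do not duplicate it.
enable_coredump() {
  conf=${1:-/opt/draios/etc/dragent.yaml}
  if grep -q '^coredump:' "$conf" 2>/dev/null; then
    echo "coredump entry already present in $conf"
  else
    echo 'coredump: true' >> "$conf"
  fi
}

# Usage: enable_coredump /opt/draios/etc/dragent.yaml
```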

Core dumps can be found in the location configured in /proc/sys/kernel/core_pattern. Usually, the location is /tmp or the process's current working directory. However, note that some operating systems (Ubuntu, for example) install a hook that applies custom logic to the core file. For easier troubleshooting in those cases, you can temporarily override the hook by putting 'core' inside /proc/sys/kernel/core_pattern:

echo core | sudo tee /proc/sys/kernel/core_pattern

The coredump will be called 'core' and will be found in root /, or /opt/draios if the Sysdig agent is installed natively, otherwise within the agent container. For container agent installations, retrieve the core file with the docker copy command:

docker cp sysdig-agent:/core .
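
To see whether a hook is in effect before overriding it, you can inspect the current core_pattern value. On Linux, a value beginning with "|" means the kernel pipes the core to a program instead of writing a file. The function name below is hypothetical; treat this as a sketch.

```shell
# Hypothetical helper: classifies a core_pattern value.
# A leading "|" means the core is piped to a program (a hook);
# anything else is treated as a file name template.
core_pattern_kind() {
  case $1 in
    "|"*) echo "pipe" ;;
    *)    echo "file" ;;
  esac
}

# Usage: core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```

If the result is "pipe", note the original value before replacing it so that you can restore the hook once troubleshooting is finished.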

 

For more information on adding parameters to a container agent's configuration file, see the FAQ: How-can-I-edit-the-agent-s-configuration-file?

 

 
