This article walks you through troubleshooting and diagnosing high CPU consumption on UNIX servers hosting Incorta / Spark. The article is divided into four sections:
How to identify high CPU usage
How to find the process causing the high CPU
How to find the thread within the process causing the problem, and
Finding the thread name.
Part 1: How to Identify High CPU Usage
1. Using the top command:
top is an interactive UNIX tool that shows OS metrics. It can be run interactively at the time of the issue.
The id value (highlighted in the snapshot below) is the idle percentage across all CPUs. The higher the CPU usage, the closer the idle percentage is to zero.
Total processing capacity is the number of cores * 100%. For example, a machine with 16 cores can reach up to 1600%.
If top shows a single process consuming 200% or 300% (above 100%), this is not by itself high CPU utilization; the figure must be judged against the total capacity of all cores.
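The arithmetic above can be sketched as two small shell helpers. This is a minimal sketch: the function names are hypothetical, and the default core count assumes the GNU coreutils nproc command is available (on other systems, getconf _NPROCESSORS_ONLN may be needed instead).

```shell
#!/bin/sh
# Total capacity in "top percent": each core contributes 100%.
# Hypothetical helper; the core count defaults to nproc (assumes GNU coreutils).
cpu_capacity_pct() {
    cores="${1:-$(nproc)}"
    echo "$((cores * 100))"
}

# Express a per-process %CPU value from top as a share of the whole machine.
# Arguments: <process %CPU> [number of cores]
normalized_cpu_pct() {
    proc_pct="$1"
    cores="${2:-$(nproc)}"
    awk -v p="$proc_pct" -v c="$cores" 'BEGIN { printf "%.1f\n", p / c }'
}
```

For example, normalized_cpu_pct 200 16 reports that a process shown at 200% on a 16-core machine is using 12.5% of the machine's total capacity.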
Use the following table as a guide to using the top command.
sort all running processes by CPU usage
sort all processes by how long the processes have been running
hide all idle processes
sort all running processes by Memory usage
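The same information can also be captured non-interactively, which is useful when the output has to be saved or attached to a support ticket. The sketch below assumes the procps-ng version of top found on most Linux distributions (batch mode with -b, a single iteration with -n 1, and sorting with -o); the function names are hypothetical, and the parser assumes the summary line is laid out like "... 0.0 ni, 91.8 id, 0.0 wa, ...".

```shell
#!/bin/sh
# Extract the idle percentage from top's CPU summary line, read from stdin.
# Assumes the procps-ng layout, where the idle value is followed by "id".
parse_idle_pct() {
    awk -F',' '/Cpu\(s\)/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /id *$/) {
                split($i, parts, " ")
                print parts[1]
                exit
            }
        }
    }'
}

# Take one batch-mode snapshot and print only the idle percentage.
snapshot_idle_pct() {
    top -b -n 1 | parse_idle_pct
}

# Print the first lines of a single snapshot sorted by a field such as %CPU or %MEM.
# Arguments: [field] [number of output lines]
top_sorted_by() {
    field="${1:-%CPU}"
    lines="${2:-20}"
    top -b -n 1 -o "$field" | head -n "$lines"
}
```

Calling top_sorted_by %MEM gives a memory-sorted view; other top implementations (for example on macOS or AIX) use different options and would need a different command.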
2. Monitoring Systems:
Several third-party monitoring systems can be configured to monitor the OS metrics of Incorta servers and detect usage anomalies.
For example, the snapshot below is taken from WatchDog, which is used here to monitor the CPU of an Incorta server. You can recognize the high CPU peak highlighted below.
3. OS SAR (System Analysis Reports):
UNIX provides system analysis reports, usually located under /var/log/sa (the directory can vary by distribution). These files help you check the OS's performance metrics. With third-party tools, the files can also be graphed to show the metrics visually.
Check your OS manual; sometimes SAR is not enabled by default.
SAR reports are available in binary and text formats (files prefixed with sa are binary; files prefixed with sar are text). At the end of each day, the binary data is converted into a text report. Ad-hoc conversion can also be done using UNIX commands.
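As an illustration of ad-hoc conversion, the sar command from the sysstat package can read a binary file with -f and print CPU utilization with -u. The sketch below is an assumption-laden example, not a fixed recipe: the helper names are hypothetical, and the default directory /var/log/sa and the saDD file naming are assumptions that differ on some distributions and sysstat configurations.

```shell
#!/bin/sh
# Directory holding the binary sa files; /var/log/sa is an assumed default.
SA_DIR="${SA_DIR:-/var/log/sa}"

# Build the path of the binary file for a day of the month (default: today).
sa_file_for_day() {
    day="${1:-$(date +%d)}"
    echo "${SA_DIR}/sa${day}"
}

# Print the CPU utilization report (-u) stored in that day's binary file (-f).
sar_cpu_report() {
    sa_file="$(sa_file_for_day "$1")"
    if [ ! -r "$sa_file" ]; then
        echo "cannot read $sa_file" >&2
        return 1
    fi
    sar -u -f "$sa_file"
}
```

For example, sar_cpu_report 15 prints the CPU figures recorded on the 15th, provided that file still exists and sysstat is installed.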
Part 2: Which Process is causing the high CPU?
Now that we have identified a CPU issue, we want to drill down further to the process causing it. In some cases, Incorta is not the only software installed on a server; we may also find Spark, MetaDBs, etc.
Running OS commands at the time of the issue is a difficult way to diagnose such problems, because running the commands can itself be a problem: servers may not be accessible during high CPU incidents, or the issue may be resolved before it can be diagnosed.
Instead, OS commands can be scheduled through crontab to record which process is causing the problem.
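A scheduled capture could look like the following minimal sketch. The script name, function name, and paths are hypothetical, and the ps options assume the procps version of ps found on most Linux distributions.

```shell
#!/bin/sh
# cpu_snapshot.sh (hypothetical name): append a timestamped list of the
# busiest processes to a log file so it can be reviewed after an incident.
log_cpu_snapshot() {
    log_file="${1:-/tmp/cpu_snapshot.log}"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # Header line plus the 6 processes with the highest %CPU (procps syntax).
        ps -eo pid,user,%cpu,%mem,args --sort=-%cpu | head -n 7
    } >> "$log_file" 2>&1
}

# In the script itself, finish by calling:  log_cpu_snapshot "$1"
#
# Example crontab entry (added with `crontab -e`) that runs it every minute:
# * * * * * /opt/scripts/cpu_snapshot.sh /var/log/cpu_snapshot.log
```

Keeping the ps command inside a script rather than directly in the crontab line avoids having to escape the % character, which crontab treats specially.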
1. ps commands
ps is often the most effective method, since it gives a wide range of information about the running processes. Unlike the top command, it prints the whole command line being executed.
A ps command sorted by CPU usage can print the 6 highest CPU-consuming processes. Note that the exact command can differ from one UNIX distribution to another.
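One possible form of such a command is sketched below. It assumes the procps version of ps used on most Linux distributions (the -eo output format and the --sort option); BSD, macOS, AIX, and Solaris need different syntax. The function name is hypothetical.

```shell
#!/bin/sh
# Print the N processes with the highest %CPU (default 6), with full command lines.
# Assumes procps ps; head keeps one extra line for the column header.
top_cpu_processes() {
    count="${1:-6}"
    ps -eo pid,ppid,user,%cpu,%mem,args --sort=-%cpu | head -n "$((count + 1))"
}
```

Note that procps ps reports %CPU as CPU time divided by the time the process has been running, so a long-running process that has only recently become busy can rank lower here than it does in top.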
Using these steps, you can quickly and efficiently investigate the issue. The Incorta Engineering team will help you further by using all the information you provided through this investigation.
As a future consideration, the collection of all of the above information can be automated to run regularly. Just remember that the script's run interval must be shorter than the duration of the high CPU incident; otherwise the spike can fall between two runs and go unrecorded.
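Because cron cannot schedule jobs more often than once per minute, shorter intervals need a loop inside the script. The following is a minimal sketch under stated assumptions: the function name, default values, and log path are hypothetical, and the ps options assume procps on Linux.

```shell
#!/bin/sh
# Record the busiest processes every <interval> seconds, <samples> times.
# Arguments: [interval seconds] [number of samples] [log file]
collect_cpu_samples() {
    interval="${1:-10}"
    samples="${2:-6}"
    log_file="${3:-/tmp/cpu_samples.log}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
            ps -eo pid,user,%cpu,args --sort=-%cpu | head -n 7
        } >> "$log_file" 2>&1
        i=$((i + 1))
        # Wait between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

With the defaults, one invocation per minute from cron takes a sample every 10 seconds, so an incident lasting 30 seconds would still be recorded in several samples.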