03-21-2022 08:48 AM - edited 05-26-2023 01:02 PM
Over time, as you add new users and new data sources, import additional data, and use Incorta data in different ways, the stresses on your Incorta instance architecture will change as well. As with other enterprise applications, it is important to monitor the processes that Incorta requires so that your application administration team is aware of any issues that could adversely affect the experience for your users. Appropriate monitoring shows how your instance is performing over time so that you can take corrective action to ensure a good experience or to reduce cost.
We recommend that you be familiar with these Incorta concepts before exploring this topic further.
These concepts apply to all releases of Incorta. Note that for Incorta Cloud customers, Incorta will manage most of the monitoring for you.
Many different products can be used for monitoring applications, and in the future Incorta will offer an expanded self-monitoring capability from within the Cluster Management Console (CMC). Until the new monitoring features are available natively, choose the third-party tool that works best for you. Commonly used tools include CloudWatch (AWS), often combined with Datadog, and AppDynamics.
The table below lists the recommended areas to monitor for your Incorta implementation. We also recommend that you set up alerts on most of these measures so that if usage goes too high, your admin team gets notified.
| To Monitor | What To Look For | Alert |
| --- | --- | --- |
| CPU | If CPU usage stays above 90% for an extended period of time, you should be aware of it. Note that sometimes this is normal. | Send an alert if CPU usage is greater than 90% for more than 30 minutes. |
| I/O Wait | High I/O wait can slow Incorta down. | Send an alert if I/O wait is above 90% for more than 30 minutes. |
| RAM | Very high RAM usage could cause a crash. Rising usage over time could indicate a memory leak. | Send an alert when RAM usage goes above 90%. |
| Loader Service | Check to make sure that the service is up and running. To identify the Incorta Loader java process to monitor, run `<incorta home>/IncortaNode/listServices.sh`. The Service Location will contain a GUID that you can use to monitor the Loader process. | Send an alert if the service goes down. Note that starting with 4.9, if the Loader Service goes down, the CMC will send an email to the designated Administrator email address and will attempt to restart the service up to three times. |
| Analytics Service | Check to make sure that the service is up and running. To identify the Incorta Analytics java process to monitor, run `<incorta home>/IncortaNode/listServices.sh`. The Service Location will contain a GUID that you can use to monitor the Analytics process. | Send an alert if the service goes down. If you use a load balancer, it should be possible to configure a health check. Note that starting with 4.9, if the Analytics Service goes down, the CMC will send an email to the designated Administrator email address and will attempt to restart the service up to three times. |
| Data Agent | If you are using a Data Agent to connect from Incorta to your data sources, check to make sure that the Data Agent service installed in your network (not in Incorta) is up and running. | Send an alert if the Data Agent service goes down. |
| CMC Service | Check to make sure that the service is up and running. | Send an alert if the service goes down. |
|  | Check to make sure that the service is up and running. | Send an alert if the service goes down. |
| Zookeeper | Check to make sure that the service is up and running. | Send an alert if the service goes down. |
| Active Memory (for Analytics and Loader services) | Check on-heap and off-heap memory usage. You can check from the OS or from the CMC endpoint. | For more advanced monitoring of memory usage, consider using an Application Performance Monitoring tool such as AppDynamics. |
| Disk Usage | Check that the disk is not filling up. | Send alerts when disk usage reaches 80% of capacity (Warning) and 90% of capacity (Action Required). |
| Network Traffic | Network traffic can be used as a diagnostic tool. If an issue occurs, you can look at network traffic history to help determine the root cause. |  |
| Requests (count) | The number of requests coming into your instance. Tracking requests can be used as a diagnostic tool. |  |
| Servers | Check how long each server has been up so that you know whether there has been a reboot. | Send an alert if a server goes down. |
| MySQL Connections (Metadata database) | By default, MySQL has a maximum of 300 connections. Incorta rarely uses more than 30 connections. | Send an alert if the number of connections rises above 125. |
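
The numeric thresholds in the table above can be expressed as a small set of rules, which may help when translating them into whichever monitoring tool you choose. The following is a minimal sketch, not an Incorta or vendor API: the metric names (`cpu_percent`, `iowait_percent`, and so on), the `Rule` class, and the `(minutes, value)` sample format are all assumptions made for illustration. It treats "for more than 30 minutes" as an unbroken run of samples above the threshold that spans more than 30 minutes up to the latest sample.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# A sample is (minutes_since_start, value). This shape and the metric names
# below are illustrative assumptions, not part of any Incorta or vendor API.
Sample = Tuple[float, float]


@dataclass(frozen=True)
class Rule:
    metric: str                   # hypothetical metric name
    threshold: float              # alert when the value is strictly above this
    sustain_minutes: float = 0.0  # how long the breach must last (0 = immediate)
    severity: str = "Alert"


# Thresholds taken from the table; everything else is an assumption.
RULES: List[Rule] = [
    Rule("cpu_percent", 90, sustain_minutes=30),
    Rule("iowait_percent", 90, sustain_minutes=30),
    Rule("ram_percent", 90),
    Rule("disk_percent", 80, severity="Warning"),
    Rule("disk_percent", 90, severity="Action Required"),
    Rule("mysql_connections", 125),
]


def breached(rule: Rule, samples: Sequence[Sample]) -> bool:
    """Return True if the latest samples violate the rule.

    Samples must be sorted by time. For sustained rules, the value must have
    stayed above the threshold, without interruption, for more than
    `sustain_minutes` up to the most recent sample.
    """
    if not samples:
        return False
    latest_time, latest_value = samples[-1]
    if latest_value <= rule.threshold:
        return False
    if rule.sustain_minutes == 0:
        return True
    # Walk backwards to find where the current unbroken breach started.
    breach_start = latest_time
    for t, v in reversed(samples):
        if v <= rule.threshold:
            break
        breach_start = t
    return latest_time - breach_start > rule.sustain_minutes


def evaluate(metrics: Dict[str, Sequence[Sample]]) -> List[str]:
    """Return one message per breached rule for the given metric histories."""
    alerts = []
    for rule in RULES:
        if breached(rule, metrics.get(rule.metric, [])):
            alerts.append(f"{rule.severity}: {rule.metric} above {rule.threshold:g}")
    return alerts
```

In this sketch, a disk at 95% triggers both the Warning and the Action Required rule. A real alerting setup would more likely report only the highest severity, and would read samples from your monitoring tool rather than from an in-memory dictionary.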