This article collects the common information that should be provided with every support ticket. Providing it up front facilitates the investigation, expedites the solution, and avoids unnecessary back-and-forth communication between the customer and Support.
For details about the Data Collection Script, refer to the article https://community.incorta.com/t5/administration-knowledgebase/useful-scripts/ta-p/193.
For details about how to generate thread dumps from the CMC (available starting from 5.2.3), refer to the article https://docs.incorta.com/5.2/release-notes-5-2-3/#thread-dump-generation.
Here are the points that will be covered in the article:
- How to create an optimal ticket.
- What information needs to be provided for each ticket type.
- How to help the support team perform better by creating a complete ticket.
- Setting priorities accordingly.
The sections below discuss each point separately:
1. How to create an optimal ticket
We need to have the following:
- A clear description of the issue:
  - Time frame.
  - Upgrade issue (did the issue occur after an upgrade?).
  - Names of the users performing the action.
  - Screenshots, recordings, etc.
  - Log files.
- The current version of Incorta, and the previous version in case of issues occurring after upgrades.
- The environment of the issue:
  - Cloud or non-cloud.
  - PRD or DEV.
- If it's a cloud environment:
  - We will need a token.
  - We will also need the cluster's name.
2. How to help the support team perform better by creating a complete ticket:
Describe the problem as it is. We then need to gather the information that will help us start the investigation, as we won't be able to start without most of the details. After that, we begin the analysis, which makes the picture clearer until we have a complete view of the issue and can get it solved.
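One way to make "complete" concrete is to check a draft ticket against the checklist in "How to create an optimal ticket" before submitting it. The following is a minimal sketch, not an Incorta tool: the `Ticket` class, its field names, and the `missing_fields` helper are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; all field names are illustrative assumptions."""
    description: str = ""
    time_frame: str = ""
    current_version: str = ""
    environment: str = ""                     # "PRD" or "DEV"
    user_names: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)  # screenshots, recordings, logs
    after_upgrade: bool = False
    previous_version: Optional[str] = None    # only needed after an upgrade
    is_cloud: bool = False
    cloud_token: Optional[str] = None         # only needed for cloud environments
    cluster_name: Optional[str] = None        # only needed for cloud environments


def missing_fields(ticket: Ticket) -> List[str]:
    """Return the names of the checklist items that are still empty."""
    missing = []
    # Items that every ticket needs.
    for name in ("description", "time_frame", "current_version",
                 "environment", "user_names", "attachments"):
        if not getattr(ticket, name):
            missing.append(name)
    # The previous version matters only when the issue followed an upgrade.
    if ticket.after_upgrade and not ticket.previous_version:
        missing.append("previous_version")
    # Cloud environments additionally need a token and the cluster name.
    if ticket.is_cloud:
        for name in ("cloud_token", "cluster_name"):
            if not getattr(ticket, name):
                missing.append(name)
    return missing


if __name__ == "__main__":
    draft = Ticket(description="Dashboard times out", current_version="5.2.3",
                   is_cloud=True)
    print(missing_fields(draft))
```

An empty result means the draft covers every item on the checklist. Anything the function returns is a detail Support would otherwise have to request in a follow-up message.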
3. Common ticket types
We have 5 main components in which most issues occur:
4. What information needs to be provided for each type of issue?
Schema stuck
This means that the schema has been running for a long time without completing. In this case we will need:
1. Thread dumps. These can be generated while running the data collection script, and starting from 5.2.3 they can also be generated from the CMC UI, as described in the 5.2.3 release notes linked above. They need to be taken before restarting.
2. Name of the schema.
3. Tenant loader logs.
4. Time the schema started.

Schema performance
This means that there is slowness across days or between environments. In this case we will need:
1. The performance baseline (the good run).
2. Loader logs for the good run vs. the bad run.

Schema queued for a long time
In this case we will need:
1. Name of the queued schema.
2. GC logs.
3. Tenant loader logs.

Schema failure
In this case we will need:
1. Name of the failed schema.
2. Screenshot of the errors.
3. Tenant loader logs.
Login is slow (user name, time of login, Analytics Tenant login).
Dashboard slowness (Dashboard/Insight name, Dashboard/Insight URL, Analytics Tenant logs, date and time of the issue, name of the user experiencing the issue, and whether the dashboard performed better before or needs optimization in general).
General slowness (this will probably need a call to gather information about the endpoint causing the slowness).
Insight error
In this case we will need:
1. The encountered error.
2. Dashboard and Insight name.
3. Dashboard/tenant export.
4. Tenant analytics logs.

Login issues (SSO)
In this case we will need:
1. Whether this is the first time logging in using SSO.
2. Name of the provider (OKTA, ADFS, etc.).
3. SSO configurations from the CMC.
4. Screenshot of the error.
5. User name, tenant and service analytics logs.

Connector issues
In this case we will need:
1. Name of the connector.
2. Screenshot of the errors.
3. Tenant analytics logs (for testing the connection).
4. Tenant loader logs (for checking the table failures).

Failed MVs
In this case we will need:
1. Screenshot of the error.
2. Spark configurations from the CMC.
3. MV-level properties, if there are any.
4. Spark and loader logs.
5. Event logs and work logs.

Notebook failures
In this case we will need:
1. The encountered error.
2. Zeppelin logs.
3. Tenant analytics logs.

Stuck MVs
In this case we will need:
1. Name of the stuck MV.
2. Tenant loader logs.
3. Spark logs.

Failed tables that use PostgreSQL ‘IOI’
In this case we will need:
1. Screenshot of the error.
2. Tenant loader logs.
3. SQLI logs from the loader service.

Stuck tables that use PostgreSQL ‘IOI’
In this case we will need:
1. Name of the table.
2. Tenant loader logs.
3. SQLI logs.
Crashes
This means that services are down. Asking for the Data Collection Script to be run on all the services is the best approach here (it's better to run it with sudo/root access).

OOM
This means an Out Of Memory failure. Asking for the Data Collection Script to be run on all the services is the best approach here, as we will be able to get all the needed details from it and recommend a resizing.
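The per-type requirements listed above can be summarized as a lookup table. The sketch below is illustrative only: the `REQUIRED_ITEMS` dictionary, its key names, and the `still_needed` helper are hypothetical (they are not Incorta identifiers), and only a few of the ticket types above are included.

```python
from typing import Dict, List, Set

# Hypothetical lookup built from a subset of the lists above.
# Key names and item labels are assumptions made for this example.
REQUIRED_ITEMS: Dict[str, List[str]] = {
    "schema_stuck": [
        "thread dumps (taken before restart)",
        "schema name",
        "tenant loader logs",
        "schema start time",
    ],
    "schema_failure": [
        "schema name",
        "screenshot of the errors",
        "tenant loader logs",
    ],
    "notebook_failure": [
        "encountered error",
        "Zeppelin logs",
        "tenant analytics logs",
    ],
    "stuck_mv": [
        "MV name",
        "tenant loader logs",
        "Spark logs",
    ],
}


def still_needed(ticket_type: str, provided: Set[str]) -> List[str]:
    """Return the required items for ticket_type that are not yet provided."""
    if ticket_type not in REQUIRED_ITEMS:
        raise ValueError(f"Unknown ticket type: {ticket_type!r}")
    return [item for item in REQUIRED_ITEMS[ticket_type] if item not in provided]


if __name__ == "__main__":
    # Example: a stuck-MV ticket where only the MV name has been given so far.
    print(still_needed("stuck_mv", {"MV name"}))
```

Keeping the requirements in one table means a new ticket type can be supported by adding an entry, without changing the checking logic.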