Introduction
This article collects the common information that should be provided with each ticket, to facilitate the investigation process, expedite the solution, and avoid unnecessary back-and-forth communication between the customer and Support.
Kindly refer to the article https://community.incorta.com/t5/administration-knowledgebase/useful-scripts/ta-p/193 for details about the Data Collection Script.
Kindly refer to the article https://docs.incorta.com/5.2/release-notes-5-2-3/#thread-dump-generation for details about how to generate thread dumps from the CMC (available starting from 5.2.3).
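On versions before 5.2.3, or when the CMC is unreachable, thread dumps can also be taken from the command line. The sketch below is an assumption-based example, not the official procedure: it assumes the service runs as a local Java process, that the JDK tools `jps` and `jstack` are on the PATH, and that the process name contains "incorta" (adjust the filter to match your environment). Remember: dumps must be taken before restarting the service.

```shell
# Find the service's Java PID; the "incorta" name filter is an assumption.
PID=$(jps -l 2>/dev/null | awk 'tolower($0) ~ /incorta/ {print $1; exit}')
if [ -n "$PID" ]; then
  # Take 3 dumps a few seconds apart so Support can compare thread states.
  for i in 1 2 3; do
    jstack -l "$PID" > "threaddump_${i}_$(date +%H%M%S).txt"
    sleep 10   # a longer interval (e.g. 30s) is also common
  done
else
  echo "No matching Java process found; check the output of: jps -l"
fi
```

Attach all three dump files to the ticket together with the time they were taken.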
Here are the points that will be covered in the article:
- How to create an optimal ticket.
- What information needs to be provided for each ticket type.
- How to help the support team perform better by creating a complete ticket.
- Ticket types.
- Setting priorities accordingly.
Let's Go
In this article we will discuss each and every point separately:
1. How to create an optimal ticket? We need to have the following:
- Clear description of the issue.
- Time frame.
- Upgrade issue (did the issue occur after an upgrade?).
- Names of the users doing the action.
- Screenshots, recordings, etc.
- Log files.
- The current version of Incorta, and the previous version in case of issues occurring after an upgrade.
- The issue's environment:
- Cloud or non-cloud.
- PRD or DEV.
- If it's a cloud environment:
- We will need a token.
- We will also need the cluster's name.
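For non-cloud environments, a quick way to capture several of the basics above in one file is a short shell snippet like the following. It uses only standard Linux commands; the output filename is just a suggestion.

```shell
# Gather basic environment facts to attach to the ticket.
{
  echo "Date: $(date -u)"
  echo "Host: $(hostname)"
  uname -a                                   # OS and kernel details
  java -version 2>&1 || echo "java not on PATH"
  df -h . | tail -n 1                        # free disk space on this volume
} > env_summary.txt
cat env_summary.txt
```

Attach `env_summary.txt` alongside the description, screenshots, and logs.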
2. How to help the support team perform better by creating a complete ticket:
Describe the problem as it is, then gather the information that will help us start the investigation; we won't be able to start without most of these details.
We then begin the analysis process, which gradually makes the picture clearer, until we have a complete vision of the issue and can get it solved.
3. Common ticket types
We have 5 main components where most issues happen:
- Loader
- Analytics
- Spark
- SQLI
- Memory
4. What information needs to be provided for each type of issue?
Loader component
- Schema stuck
It means that the schema has been running for a long time without completing. In this case we will need:
1. Thread dumps, which can be generated while running the data collection script; starting from 5.2.3, they can also be generated from the CMC UI (see the article linked above). They need to be taken before any restart.
2. Name of the schema.
3. Tenant loader logs.
4. Time the schema started.
- Schema performance
It means there is slowness across days or between environments. In this case we will need:
1. The performance baseline (a Good Run).
2. Loader logs for the Good Run vs. the Bad Run.
- Schema queued for long time
In this case we will need:
1. Name of the queued schema.
2. GC logs.
3. Tenant loader logs
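GC logs are requested for queued schemas because long garbage-collection pauses can keep the loader from picking up new jobs. If GC logging is not already enabled, the JVM options below turn it on; the log path is a placeholder you should adapt, and where exactly to add the options depends on how the service's JVM arguments are configured in your installation.

```
# Java 8:
-Xloggc:/path/to/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
# Java 9 and later (unified logging):
-Xlog:gc*:file=/path/to/gc.log:time,uptime
```

Restarting the service is required for new JVM options to take effect, so enable this proactively rather than after an incident.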
- Schema failure
In this case we will need:
1. Name of the failed schema.
2. Screenshot of the errors.
3. Tenant loader logs.
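When collecting tenant loader logs for a failed schema, it helps to include the exact error lines in the ticket description. The snippet below is a self-contained demonstration: it creates a sample log file so the command is runnable as-is; in practice, point `grep` at the real tenant loader log directory (its path varies by installation).

```shell
# Demonstration fixture -- replace logs/loader.log with your real loader log.
mkdir -p logs
printf 'INFO load started\nERROR Table SALES failed: java.lang.NullPointerException\n' > logs/loader.log

# Show error lines with their line numbers for the ticket.
grep -nE "ERROR|Exception" logs/loader.log
```

Copying the matching lines (with timestamps) into the ticket saves a round trip with Support.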
Analytics component
- Login is slow (user name, time of login, analytics tenant logs).
- Dashboard slowness (dashboard/insight name, dashboard/insight URL, analytics tenant logs, date and time of the issue, user name experiencing the issue, and whether this dashboard was working better before or an optimization is needed in general).
- General slowness (this will probably need a call to gather information about the endpoint causing the slowness).
- Insight error
In this case we will need:
1. What’s the encountered error?
2. Dashboard and insight names.
3. Dashboard/tenant export.
4. Tenant analytics logs.
- Login issues (SSO)
In this case we will need:
1. Is it the first time to log in using SSO?
2. Name of the provider (Okta, ADFS, etc.).
3. SSO configurations from the CMC.
4. Screenshot of the error.
5. User name, plus tenant and analytics service logs.
- Connector issues
In this case we will need:
1. Name of the connector.
2. Screenshot of the errors.
3. Tenant analytics logs (connections are tested from the Analytics service).
4. Tenant loader logs, to check the table load failures.
Spark issues
- Failed MVs
In this case we will need:
1. Screenshot of the error.
2. Spark configurations from the CMC.
3. Properties at the MV level, if there are any.
4. Spark and loader logs.
5. Event logs and work logs.
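Spark event logs are written to the directory named by the `spark.eventLog.dir` property. The snippet below shows one way to look it up; it writes a sample `spark-defaults.conf` so it is runnable as a demonstration, but in practice you would read the real file under `$SPARK_HOME/conf` (the path is an assumption, adjust to your installation).

```shell
# Demonstration fixture -- in practice use $SPARK_HOME/conf/spark-defaults.conf.
CONF=spark-defaults.conf
printf 'spark.eventLog.enabled true\nspark.eventLog.dir file:/tmp/spark-events\n' > "$CONF"

# Extract the event log directory to collect logs from.
EVENT_DIR=$(awk '$1=="spark.eventLog.dir" {print $2}' "$CONF")
echo "Collect event logs from: $EVENT_DIR"
```

Attach the event log files for the failed application ID together with the worker logs.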
- Notebook failures
In this case we will need:
1. What’s the encountered error?
2. Zeppelin logs.
3. Tenant analytics logs.
- Stuck MVs
In this case we will need:
1. Name of the stuck MV.
2. Tenant loader logs.
3. Spark logs.
SQLI issues
- Failed tables that use PostgreSQL 'IOI'
In this case we will need:
1. Screenshot of the error.
2. Tenant loader logs.
3. SQLI logs from the loader service.
- Stuck tables that use PostgreSQL 'IOI'
In this case we will need:
1. Name of the table.
2. Tenant Loader logs.
3. SQLI logs.
Memory issues
- Crashes
It means that services are down. Asking for the Data Collection Script output is the best approach here, run on all the services (it's better to run it with sudo/root access).
- OOM
It means an Out Of Memory failure. Asking for the Data Collection Script output on all the services is the best approach here, as we will be able to get all the needed details from it and recommend a resizing.
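While waiting for the full data collection output, two quick checks help confirm an OOM: searching the service logs for `OutOfMemoryError`, and checking whether the Linux kernel OOM killer terminated the process. The snippet below creates a sample log so it is runnable as a demonstration; point the `grep` at your real service log files instead.

```shell
# Demonstration fixture -- replace sample_service.log with the real service log.
printf 'java.lang.OutOfMemoryError: Java heap space\n' > sample_service.log

# Count heap OOM occurrences in the log.
grep -c "OutOfMemoryError" sample_service.log

# Check for kernel OOM-killer activity (may need root; harmless if empty).
dmesg 2>/dev/null | grep -i "killed process" || true
```

Include both results in the ticket; a kernel-level kill versus a Java heap OOM points to different sizing recommendations.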