on 03-08-2022 01:49 PM - edited on 02-16-2023 03:43 PM by Tristan
Introduction
Several scripts have been developed to help collect information when debugging issues or working with Incorta Support. This page describes those scripts; more will be added as appropriate.
What you should know before reading this article
We recommend that you be familiar with these Incorta concepts before exploring this topic further.
Applies to
These concepts apply to most releases of Incorta 4.x and later but are most useful for customers who have installed Incorta on-premises or in their own private cloud.
Some scripts will receive updates from time to time, so be sure to check back for the latest versions. If a particular script does not work in your environment please contact Incorta Support to see if there is a different version available.
To use performance.jsp you must have backend access to the analytics node server. Note that performance.jsp is only applicable to versions of Incorta prior to 5.0.
Let's Go
Data Collection Script
The data collection script is a bash script that can be used to collect information on your Incorta environment. Typically you would not need to run this script unless Support asks you to.
Input parameters
The package contains a parameters file that requires this data:
- Incorta home: The path to the Incorta installation directory
- Tenant name: The name of the tenant
- Target folder: The path to the target folder where the information will be stored
- Logs days ago: How far back, in days, to collect the log files
Note: The script and the parameters file must be in the same directory, because the script reads its inputs from this file.
The information gathered
The script will gather the following information:
- Configuration files (server.xml, web.xml, server.properties, engine.properties, start.sh) for each service (Analytics and Loader)
- Incorta log files (incorta, tenant, and GC) for any number of days, plus the first 2000 and the last 10000 lines of the Catalina log
- Zookeeper logs and configurations (if not running on a separate machine)
- Spark logs and configurations (if not running on a separate machine)
- Server info (OS release, kernel details, CPU, memory, swappiness, ulimit details)
- Disk info (filesystem, mount options, iostat)
- Diagnostics info (the output of the top, iostat, and sar commands at the time the script runs)
- Thread dumps (loader and analytics services)
- JVM flags (used for the tomcat PID)
- cmc.log config and logs (if not running on a separate machine)
- System logs (syslogs-messages)
- SAR files
- SQLi logs
- Tenant export (if CMC is running on the server)
- Notebook logs (if a notebook service is running on the server)
- Local access logs
Running the script
To run the script after receiving it from Support:
- Download it onto the server running Incorta, then make it executable and run it:
$ chmod +x Incorta_DataCollection.sh
$ ./Incorta_DataCollection.sh
Spark Data Collection
In some environments, the Spark WebUI may not be accessible due to ports not being open or other security considerations. It is still possible to view the output files (stdout and stderr) if you have access to the Spark master machine.
- Go to <Spark Home>
- Go to the <Spark Home>/work
- You will see a list of app IDs. Open the folder for the specific app ID, then open its child folder
- You will see the files: stderr, stdout, and the py file generated by Incorta
Because the Python code, named after the Incorta Materialized View (MV), is stored in that folder, you can identify the log for a specific MV:
cd <spark home>/work
ls -ld app-*-*/0/* |grep BOOK_BY
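The lookup above can be wrapped in a small helper. This is a minimal sketch, not part of any Incorta tooling: the function name find_mv_logs is hypothetical, and it assumes the folder layout described above, where each <Spark Home>/work/app-* folder has a child folder holding stdout, stderr, and a generated file whose name contains the MV name.

```shell
# find_mv_logs SPARK_HOME MV_NAME   (hypothetical helper name)
# Prints the stderr/stdout paths of every Spark app folder that contains
# a file whose name includes MV_NAME.
find_mv_logs() {
  local spark_home="$1" mv_name="$2" match app_dir log
  for match in "$spark_home"/work/app-*/*/*"$mv_name"*; do
    [ -e "$match" ] || continue          # the glob matched nothing
    app_dir=$(dirname "$match")
    for log in stderr stdout; do
      if [ -f "$app_dir/$log" ]; then
        echo "$app_dir/$log"
      fi
    done
  done
  return 0
}
```

For example, `find_mv_logs /opt/spark BOOK_BY` (with /opt/spark standing in for your Spark home) lists the log files to open for the BOOK_BY MV.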
Performance.jsp
The main purpose of this tool is to identify regressions between Incorta builds. It can also be used to simulate concurrent user requests for stress testing Incorta dashboards.
IMPORTANT NOTE: performance.jsp is only applicable for versions of Incorta prior to 5.0.
Instructions
You will need to have backend access to the analytics node server
- Contact Incorta Support for the most recent version, then upload it to the analytics node server under the <Incorta Installation>/IncortaNode/runtime/webapps/incorta folder.
- Open a web browser and go to <host:port>/incorta/performance.jsp. The form page will appear.
- The script is not compatible with all Incorta versions, so you may get an HTTP 500 error.
- The error line may carry a comment that explains how to resolve it, such as the comment on line 225: "Comment/Uncomment between next two lines if you get compilation error".
- To resolve it, open the deployed file in any text editor, toggle the "//" comment markers between the two lines (227 and 228 in this example), save the file, and reload the page in the browser. You might get the same error on different lines; repeat the process until the error is resolved.
Input Parameters
- Enter the tenant name, user, and password.
- Choose the run mode: select the Run radio button for a new run, or the Compare radio button to compare with another run.
- In the Workspace Path field, enter a path on the analytics node file system where the results will be saved.
- In the Run Name field, enter any identifier for your run.
- If Compare mode is selected, enter the identifier of the run to compare against in the Previous Run Name to Compare with field.
- If Compare Only is checked, the Run Name field must hold the name of a run that already exists in the workspace folder. The tool then generates only a comparison report between the two existing runs and does not create a new run.
- Check Impersonate all users to run as every user in the tenant, or enter a comma-separated list of usernames to run as those users only.
- Skip not owned dashboards: if this option is checked, each dashboard is run once, by its owner only. For users who have access to a dashboard but do not own it, the dashboard is skipped.
- To run a set of folders or dashboards instead of the entire tenant, enter the folder IDs and dashboard GUIDs, comma-separated.
- Alternatively, save a file of dashboard GUIDs (one per line) on the analytics node file system and pass its path in the guidFilePath query parameter in the URL.
- The Apply Sorting flag adds sorting by the first measure column by default. Use it when comparing runs to minimize mismatches caused by differing row orders in tables that have no sort-by column.
- In the concurrency section, choose how many insights to run in parallel from the Number of Workers dropdown.
- To log the query plan for all insights, add the logQueryPlan query parameter to the URL with the value true.
- To run without saving the run outputs, add the noReportData query parameter to the URL with the value true.
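As an illustration of the URL-only options in the list above, the sketch below assembles a performance.jsp URL from key=value pairs. The helper name perf_url is hypothetical; the parameter names logQueryPlan, noReportData, and guidFilePath come from the list above. Values are appended as given, with no URL-encoding, so this assumes they contain no special characters.

```shell
# perf_url HOST [key=value ...]   (hypothetical helper name)
# Appends each key=value pair to the performance.jsp URL as a query parameter.
perf_url() {
  local host="$1"; shift
  local url="$host/incorta/performance.jsp" sep='?' param
  for param in "$@"; do
    url="$url$sep$param"
    sep='&'                      # every parameter after the first uses '&'
  done
  echo "$url"
}

perf_url 'http://localhost:8080' logQueryPlan=true noReportData=true
# prints http://localhost:8080/incorta/performance.jsp?logQueryPlan=true&noReportData=true
```

Open the printed URL in the browser to load the form with those options applied.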
Results
The tool has two output formats (HTML/Tab-separated .txt file) that get generated based on the Mode.
Run Mode HTML
The HTML report lists all visited insights, with their folder/dashboard hierarchy and loading times. It flags the skipped and failed insights.
Skipped insights are hierarchy insights or, when the Skip not owned dashboards flag is on, all insights belonging to a dashboard that is not owned by the running user. An example of the HTML output:
Run Mode Tab-separated .txt
A Tab-separated .txt file will be generated in the run folder to save the report for further analysis. The file will contain the following columns:
Column | Description | Possible Values |
Time | Load Time in Milliseconds | 1 for skipped insights, -2 for failed insights |
Run Name | The Run Identifier Name | |
Run Timestamp | Timestamp when the run started | field, gauge, single row chart, series chart, pivot chart, flat table, aggregated table, detail table, pivot, unknown |
Path | The Path to the entity | |
Error | The error message if the insight failed | format, format2, sort, sort2, right_fail, fail, left_fail, mismatch (exact lines count), mismatch (different lines count) |
Entity Type | Entity Type | Tenant, folder, dashboard, insight |
Compare Mode HTML
The HTML report for compare mode will list the count of mismatched insights with the insight type and mismatch category/sub-category. An example of the HTML output:
Compare Mode Tab-separated .txt
A Tab-separated compare.txt file will be generated in the compare folder to save the report for further analysis. The file will contain the following columns:
Column | Description | Possible values |
User | Username for the user who rendered the insight | - |
Tenant | Tenant name | - |
Left | New run name | - |
Right | Referenced (compared with) run name | - |
Insight category | Category of the rendered insight | field, gauge, single row chart, series chart, pivot chart, flat table, aggregated table, detail table, pivot, unknown |
Mismatch type | Identified mismatch type between the two runs | format, format2, sort, sort2, right_fail, fail, left_fail, mismatch (exact lines count), mismatch (different lines count) |
Insight name | The insight name | If the insight is unnamed, the GUID is used |
Path | Incorta relative insight path | - |
File path (Left) | File system path to the new run file | - |
File path (Right) | File system path to the old run file | - |
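For a quick summary of compare.txt on the command line, the mismatch types can be tallied with awk. This is a sketch under stated assumptions: it assumes the first line of the file is a header row that uses the column names listed above (in particular "Mismatch type"), which the article does not confirm, and the function name count_mismatches is hypothetical. If your file has no header row, the function prints nothing.

```shell
# count_mismatches FILE   (hypothetical helper name)
# Prints "<count><TAB><mismatch type>" lines, most frequent first.
# ASSUMPTION: line 1 is a tab-separated header containing "Mismatch type".
count_mismatches() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "Mismatch type") { col = i } }
      next
    }
    col { count[$col]++ }        # skipped entirely if the header was not found
    END { for (k in count) { print count[k] "\t" k } }
  ' "$1" | sort -rn
}
```

Running `count_mismatches compare.txt` inside the compare folder gives a rough picture of which categories dominate before you open individual files.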
Comparison Analysis
- Navigate to the compare folder under the workspace folder, then open the folder named <run-name>--<compare-with-run-name> for the comparison you want to check.
- This folder contains two folders named after the runs. Open them with any file comparison tool, such as Meld or Beyond Compare.
- You can then navigate through the mismatches and check the run output for further investigation.
- The mismatches are categorized as follows:
Mismatch Category | Mismatch Sub Category | Description | Notes |
1- Exact Match | exact | exactly matching | |
2- Semantic Match | format | format mismatch | Some values have a different format |
| format2 | format mismatch, ignoring precision | Some variation of the above around precision |
| sort | sorting mismatch | The order does not match for reports where sorting was declared |
| sort2 | sorting mismatch, ignoring precision | Differences in order with some variations around precision |
| fail | both are failing | Report did not run - no comparison needed |
| right_fail | old run failed | New run succeeded, but old run had failed |
4- Mismatch | left_fail | old run succeeded but the new run fails | |
5- Unknown | mismatch (exact lines count) | content mismatch, but the line count is exactly the same between both runs | |
| mismatch (different lines count) | mismatch and line count is different | |
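If no graphical comparison tool is available on the server, the two run folders can be compared with the standard diff utility instead. The sketch below is only an illustration: the function name list_changed_outputs is hypothetical, and it assumes the comparison folder is literally named "compare" under the workspace path and follows the <run-name>--<compare-with-run-name> layout described above.

```shell
# list_changed_outputs WORKSPACE RUN_NAME COMPARE_WITH_RUN_NAME   (hypothetical)
# Lists files that differ, or exist on one side only, between the two runs.
list_changed_outputs() {
  local workspace="$1" run="$2" ref="$3"
  local dir="$workspace/compare/${run}--${ref}"   # ASSUMED folder layout
  # diff exits 0 when identical, 1 when differences exist, 2 on error;
  # treat both 0 and 1 as success.
  diff -rq "$dir/$run" "$dir/$ref" || [ $? -eq 1 ]
}
```

Each "Files ... differ" line in the output points to an insight output worth opening side by side.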
Notes
- You can use the tool to simulate concurrent requests, using Run mode only.
- You can use the tool to find regressions between builds: generate a run before the upgrade, perform the upgrade, generate a new run in Compare mode against the pre-upgrade run, and then analyze any mismatches.
- The maximum number of parallel users while using impersonation is 10.
- Try to keep the Number of Workers low when running the tool with multiple users.
- All the inputs can be set in the URL as query parameters, so you can run the tool through a curl command or schedule it with a cronjob like this example:
BASE_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
host='http://localhost:8080'
tenant='ebs_cloud'
usr='admin'
passEnc='vNurdjf9YdcpAFPBqSawXQ=='
path='/home/incorta/zakaria' #doesn't end with '/', can be empty
comparewithrunname='' #compare run name, leave it empty for run mode
compareonly='' #set it to 'true' to compare only
folderIds='' #comma separated ids '746,225,222,227', leave empty if not needed
dashboardIds='' #comma separated guids '7fa751fd-fa26-4036-9d85-cd718631bfd6,3e1e7e99-3c11-491c-8938-f5efb67515be', leave empty if not needed
runwithusers='' #comma separated usernames 'user1,user2', leave empty if not needed
workerCount=4 #number of threads
output='html' #can be html, csv, txt or data
runname=$(date "+%Y%m%d%H%M%S") #run name. it is the current timestamp but can be anything
if [ "$output" != "data" ]; then
  curl -s -d "tenant=$tenant&login=$usr&passEnc=$passEnc&workspacePath=$path&currentRun=$runname&ref=$comparewithrunname&compareOnly=$compareonly&foldersIdsStr=$folderIds&guidsStr=$dashboardIds&userNamesStr=$runwithusers&workersCount=$workerCount&output=$output" "$host/incorta/performance.jsp" > "$runname.$output"
else #compare is not supported with 'data' output
  curl -s -d "tenant=$tenant&login=$usr&passEnc=$passEnc&workspacePath=$path&currentRun=$runname&foldersIdsStr=$folderIds&guidsStr=$dashboardIds&userNamesStr=$runwithusers&workersCount=$workerCount&output=data" "$host/incorta/performance.jsp" > "$runname.zip"
fi
Tenant Comparison Tool
Incorta admins often need to migrate changes from a UAT or Dev instance to PROD without knowing exactly which changes were made in each environment. Manual comparison is tedious and error-prone. Even a raw XML comparison is error-prone, because the same entity (insight, schema, table, etc.) or attribute can appear in different positions in the two XML files.
A raw XML comparison also reveals changes that result merely from Incorta schema upgrades and are not changes the user needs to be aware of.
The Tenant Comparison Tool is a standalone command-line (CLI) tool that helps users compare two tenants (for example, a UAT tenant and a PROD tenant) by comparing their dashboards, schemas, business schemas, and data sources.
The user can choose the tool's output format, HTML or CSV. With the CSV format, the user can upload the comparison output CSV files to Incorta and use the pre-built schemas and dashboards shipped with the tool to analyze the differences between the tenants in Incorta dashboards.
Terminology
'Entity' refers to one of the following: Schema, Business Schema, Dashboard, Data Source
- Revised Entity: the more recent version of the entity (e.g. the UAT version)
- Base Entity: the older version of the entity (e.g. the Production version)
Installation and Usage
- Ensure that JDK 1.8 is installed on your machine.
- Contact Incorta Support for the appropriate version of the tool.
- Import the dashboard folder (named "incorta_dashboards_export.zip") and the schema (named "schema_export.zip") into the Incorta instance that you plan to use to analyze the tool's output.
- Unzip the tool build file comparison-tool.zip to a location on your machine.
- Export the base tenant and the new tenant from the environments you are comparing (e.g. UAT and PROD).
- Run the tool by running the script compare-incorta.sh, passing the input parameters explained in the Command-Line Options section.
- If you chose CSV output in order to analyze it in the tool's dashboards:
  - Compress the output folder, then upload it to Incorta under the Incorta Data Files section.
  - Reload the schema named "Incorta_tenant_comparison".
  - Check the tool's dashboards under the dashboard folder "Tenant Comparison".
Output Formats
The tool can produce comparison reports in two different file formats, HTML and CSV format.
- The HTML format can be viewed in a browser through the index.html file generated with the output files. It lists the changed entities, each with a hyperlink to navigate to and view the changes.
- The CSV format should be used if you are planning to analyze the output using the auxiliary dashboards shipped with the tool.
Command-Line Options
Parameter | Description |
-b,--base-path | The path to the base tenant export (zip) file. |
-n,--new-path | The path to the new version of the tenant export (zip) file. |
-o,--output-path | The output directory in which to save the comparison reports and the migrated entities. The default output path is './comparison-output', which is next to the jar file. |
-f,--output-format | The output comparison file format, either 'html' or 'csv'. The default format is HTML. |
-s,--schema-list | A comma-separated list of schema/business-schema names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of schemas are compared. |
-d,--dashboard-list | A comma-separated list of dashboard names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of dashboards are compared. |
-c,--datasource-list | A comma-separated list of data source names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of data sources are compared. |
-m,--mode | Specify 'compare-only' to only compare the tenants. The default value is 'compare-only'. There is another mode called 'migrate' which will migrate (step-by-step) the mentioned entities from the base to the revised version (this is still under investigation). |
-u,--create-subfolder | Whether to save the output in a sub-folder (named by the run timestamp) under the specified output path. |
Examples
This command compares the whole SAP tenant environments by providing the paths of their exports:
./compare-incorta.sh \
  -b tenant_sapeccdev_20200609.zip \
  -n tenant_sapeccdev_ent_20200609.zip \
  -o compare-tenant-output \
  -m compare-only \
  -f csv \
  -s all \
  -d all \
  -c all
Another example of comparing specific schemas (SAPECC_PP and MS_AR), specific dashboards (Unused Business Views and Sales Order Check) and all data sources would be:
./compare-incorta.sh \
  -b tenant_sapeccdev_20200609.zip \
  -n tenant_sapeccdev_ent_20200609.zip \
  -o compare-tenant-output \
  -m compare-only \
  -f csv \
  -s SAPECC_PP,MS_AR \
  -d "Unused Business Views,Sales Order Check" \
  -c all
The standard output summary for the above command:
The following screenshot shows the output index.html page generated, and each change has a hyperlink to the changed report. This is useful when dealing with an HTML output format:
The following screenshots show the output after uploading the CSV output into Incorta and previewing the Schema Summary dashboard:
Implementation Details
Scanning Phase
The tool scans the input paths of the two tenant exports, then constructs the pairs of corresponding entities (schemas, dashboards, etc.) to be compared later. The user can name specific entities to limit the scope of the comparison, or include every existing entity (see the Command-Line Options section).
Comparison Phase
The pairs saved during the scanning phase are then processed one after the other, starting with schemas and business schemas, then data sources, and finally dashboards.
For each pair of corresponding entities:
- Two DOM trees are built from their XML representation. Let's call them the Revised and Base trees as mentioned in the terminology.
- Traverse the Revised DOM tree, and for each element in the tree, the tool will search for a corresponding element in the Base tree. The definition of the corresponding element is defined by a configuration file that identifies each element's key XPATH. For example, a Dashboard is identified by its @name attribute, a Measure is identified by both its Label and Field attributes, etc.
- If no corresponding element is found in the Base tree, the tool marks the current element as ADDED.
- If the corresponding element is found in the Base tree, the tool compares the element values and the value of each attribute in both trees. If any change in values is detected, the tool records the current element as UPDATED.
- The tool then detects deleted elements by traversing the Base DOM tree. For each element, if no corresponding element is found in the Revised tree, the tool marks the current element as DELETED.
- After recording these changes, the tool writes the output files in the format the user selected, HTML or CSV. The HTML format produces English-like statements describing the changes. The CSV format is useful for uploading the data into Incorta and visualizing it there with customized insights.
- An additional file called 'index.html' is generated. It lists the name of each entity along with a hyperlink to its output report file. (Useful when using the HTML format.)
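The ADDED / UPDATED / DELETED classification described above can be illustrated on a much simpler data model. The sketch below is not the tool's implementation: instead of DOM trees matched by key XPATH, it assumes two flat, tab-separated files of "key<TAB>value" lines, where the key stands in for an element's identifying attributes. The function name classify_changes is hypothetical.

```shell
# classify_changes BASE_FILE REVISED_FILE   (hypothetical, simplified model)
# Each input line is "key<TAB>value". Prints one line per detected change.
classify_changes() {
  awk -F'\t' '
    FILENAME == ARGV[1] { base[$1] = $2; next }   # load the Base entries
    {
      seen[$1] = 1                                # key exists in Revised
      if (!($1 in base))       { print "ADDED\t" $1 }
      else if (base[$1] != $2) { print "UPDATED\t" $1 }
    }
    END {
      # Second pass over Base: anything never seen in Revised was deleted.
      for (k in base) { if (!(k in seen)) { print "DELETED\t" k } }
    }
  ' "$1" "$2"
}
```

As in the tool, the Revised side is traversed first to find additions and updates, and the Base side is traversed afterwards to find deletions.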
Log Parser
It can be difficult to look for specific information in Incorta log files manually, and often the log files may be too large to be opened in a text editor. Incorta maintains several versions of log parsers that will parse out the logs from the Loader or Analytics Service into a CSV file that makes it much easier to read, search or even use as a data source for an Incorta dashboard. There are also versions that can parse the log files without having to extract the log files from a .zip file in case of very large log files. Please contact Incorta Support if you are in need of one and they can provide the appropriate version.
Extract Table Column Attributes
This Python script takes an unzipped tenant export file and produces a .csv file containing the table and column details within a schema, such as data type, function, and label. Please click HERE for the latest version of the script.
Missing Sync Notifier
Network or server issues could potentially cause nodes to go out of sync, a problem which may cause data inconsistency and information loss. The below script can be run periodically to review the successfully loaded jobs in the metadata and determine if the sync message arrived for the node.
Implementation Details
- This tool works only for on-premises customers, because it must be placed in the 'IncortaNode' folder on each physical node in the cluster.
- It creates a connection to the metadata DB and extracts the successfully loaded jobs to compare them with the syncs that appear in the log file.
- Once it determines that every load job has a corresponding sync, it saves the time of the last successfully loaded job.
- If it finds a missing sync, it triggers an email notifying you that, on a specific node, the specific schema(s) failed to sync for the specific load job(s).
- The first time the tool runs on a machine, it checks all load jobs that ran that day up to the current run time.
The jar file takes a single parameter: its absolute path, which is needed when it is run from a different path. Running it from the shell script supplies this parameter automatically.
In order for the tool to function correctly you must provide a ‘logger.properties’ file which looks like this:
mail.smtp.auth=true
mail.smtp.starttls.enable=true
mail.smtp.host=smtp.gmail.com
mail.smtp.port=587
mail.smtp.ssl.trust=smtp.gmail.com
mail.session.mail.transport.protocol=smtp
from=system@incorta.com
to=[ADD EMAIL ADDRESSES]
password=[ADD PASSWORD]
env=[ENTER ENVIRONMENT NAME]
initialStart=72
log_location=[ADD PATH TO /IncortaNode/ FOLDER]
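Before scheduling the notifier, it can help to confirm that the properties file is complete. The sketch below is an illustration only: the function name check_logger_properties is hypothetical, and it assumes that every key shown in the sample above is required and that a value still starting with "[" is an unfilled placeholder.

```shell
# check_logger_properties FILE   (hypothetical helper name)
# Returns non-zero if a key from the sample file is missing, empty,
# or still holds a [PLACEHOLDER] value.
check_logger_properties() {
  local file="$1" key bad=0
  for key in mail.smtp.auth mail.smtp.starttls.enable mail.smtp.host \
             mail.smtp.port mail.smtp.ssl.trust \
             mail.session.mail.transport.protocol \
             from to password env initialStart log_location; do
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty: $key" >&2
      bad=1
    elif grep -q "^${key}=\[" "$file"; then
      echo "placeholder not replaced: $key" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

A call such as `check_logger_properties logger.properties && echo OK` reports each problem key on standard error.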