Useful Scripts

TomW · 03-08-2022

Introduction

A number of useful scripts have been developed to help collect information when debugging issues or working with Incorta Support. This page contains those scripts, and more will be added as appropriate.

What you should know before reading this article

We recommend that you be familiar with these Incorta concepts before exploring this topic further.

Install Incorta

Applies to

These concepts apply to most releases of Incorta 4.x and later but are most useful for customers who have installed Incorta on-premises or in their own private cloud.

Some scripts will receive updates from time to time, so be sure to check back for the latest versions. If a particular script does not work in your environment please contact Incorta Support to see if there is a different version available.

To use performance.jsp you must have backend access to the analytics node server. Note that performance.jsp is only applicable to versions of Incorta prior to 5.0.

Let's Go

Data Collection Script

The data collection script is a bash script that can be used to collect information on your Incorta environment. Typically you would not need to run this script unless Support asks you to.

Input parameters

The package contains a parameters file that requires this data:

Incorta home: The path to Incorta installation directory
Tenant name: The name of the tenant
Target folder: The path to the target folder where information will be stored
Logs days ago: The date for getting the log files

Note: The script and the parameters file must be in the same directory, because the script reads its inputs from this file.

The information gathered

The script will gather the following information:

Configuration files (server.xml, web.xml, server.properties, engine.properties, start.sh) for each service (Analytics and Loader)
Incorta log files (incorta, tenant, and GC) for any number of days, plus the first 2000 and the last 10000 lines of the Catalina log
Zookeeper logs and configurations (if not running on a separate machine)
Spark logs and configurations (if not running on a separate machine)
Server info (OS release, kernel details, CPU, memory, swappiness, ulimit details)
Disk info (filesystem, mount options, iostat)
Diagnostics info (the output of the top, iostat, and sar commands at the time the script runs)
Thread dumps (loader and analytics services)
JVM flags (used for the tomcat PID)
cmc.log config and logs (if not running on a separate machine)
System logs (syslogs-messages)
SAR files
SQLi logs
Tenant export (if CMC is running on the server)
Notebook logs (if a notebook service is running on the server)
Local access logs
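The Catalina excerpt mentioned in the list above (the first 2000 and the last 10000 lines) can be approximated with head and tail. This is a minimal sketch, not the collection script's actual implementation; the function name excerpt_log and its arguments are hypothetical.

```shell
# excerpt_log FILE FIRST LAST   (hypothetical helper, not part of the Incorta script)
# Prints the first FIRST and the last LAST lines of FILE.
# A file with no more than FIRST+LAST lines is printed whole, so no line is duplicated.
excerpt_log() (
    file=$1; first=$2; last=$3
    total=$(wc -l < "$file")
    total=$((total))            # normalize any padding some wc versions emit
    if [ "$total" -le $((first + last)) ]; then
        cat "$file"
    else
        head -n "$first" "$file"
        tail -n "$last" "$file"
    fi
)

# Example with a hypothetical log path:
# excerpt_log /path/to/catalina.out 2000 10000 > catalina_excerpt.log
```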

Running the script

To run the script after receiving it from Support, download it to the server running Incorta, make it executable, and run it:
$ chmod +x Incorta_DataCollection.sh
$ ./Incorta_DataCollection.sh

Spark Data Collection

In some environments, the Spark WebUI may not be accessible due to ports not being open or other security considerations. It is still possible to view the output files (stdout and stderr) if you have access to the Spark master machine.

Go to <Spark Home>
Go to <Spark Home>/work
You will see a list of app IDs. Open the folder for the specific app ID, then open its child folder.
You will see the files stderr and stdout, and the py file generated by Incorta.

Since the Python code carrying the Incorta Materialized View (MV) name is stored in the folder, you can identify the log for a specific MV:

cd <spark home>/work
ls -ld app-*-*/0/* |grep BOOK_BY
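The lookup above can be wrapped in a small function that prints the folder holding the stdout and stderr files for a given MV. This is a sketch under assumptions: the function name find_mv_logs is hypothetical, and it assumes the generated file name contains the MV name and sits one level below the app ID folder, as in the listing above.

```shell
# find_mv_logs WORK_DIR MV_NAME   (hypothetical helper)
# Prints every folder under WORK_DIR/app-*/ that contains a file whose name
# includes MV_NAME. The stdout and stderr files are in the same folder.
find_mv_logs() (
    work_dir=$1; mv_name=$2
    for match in "$work_dir"/app-*/*/*"$mv_name"*; do
        [ -e "$match" ] || continue     # the glob matched nothing
        dirname "$match"
    done | sort -u
)

# Example, using the MV name from the listing above:
# find_mv_logs "<spark home>/work" BOOK_BY
```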

Performance.jsp

The main purpose of this tool is to identify regressions between Incorta builds. It can also be used to simulate concurrent user requests for stress testing Incorta dashboards.

IMPORTANT NOTE: performance.jsp is only applicable for versions of Incorta prior to 5.0.

Instructions

You will need backend access to the analytics node server.

Contact Incorta Support for the most recent version, then upload it to the analytics node server under the <Incorta Installation>/IncortaNode/runtime/webapps/incorta folder.
Open a web browser and go to the URL <host:port>/incorta/performance.jsp. The form page will appear.

The script is not compatible with all Incorta versions, so you may get a 500 error.

The error line may include a comment that shows how to solve it, such as the comment on line 225: “Comment/Uncomment between next two lines if you get compilation error”.
To solve this, open the deployed file in any text editor, toggle the “//” comment markers between the two lines 227 and 228, save the file, and reload the page in the browser. You might get the same error on different lines. If so, repeat the process until the error is resolved.

Input Parameters

Enter the tenant name, user, and password.
Choose the run mode: select the Run radio button for a new run, or the Compare radio button to compare with other runs.
In the Workspace Path field, enter a path on the analytics node file system where the results will be saved.
In the Run Name field, enter any identifier for your run.
If Compare mode is selected, enter the name of the run to compare against in the Previous Run Name to Compare with field.

If Compare Only is checked, the Run Name field should be the name of a run that already exists in the workspace folder. The tool then generates only a comparison report between the two existing runs and does not create a new run.
Choose to impersonate all users if you want to run with all users in the tenant, or enter comma-separated usernames to run for those users only.
If Skip not owned dashboards is checked, each dashboard is run once, by its owner only. It is skipped for users who have access to it but do not own it.
You can run a specific set of folders or dashboards instead of the entire tenant by entering comma-separated folder IDs and dashboard GUIDs.
You can save a file of dashboard GUIDs (one per line) on the analytics node file system and pass its path in the guidFilePath query parameter in the URL.
The Apply Sorting flag adds sorting by the first measure column by default. Use it when comparing runs to minimize mismatches caused by different row orders in tables that have no sort-by column.
In the concurrency section, you can configure how many insights run in parallel from the Number of Workers dropdown.
To log the query plan for all insights, add the logQueryPlan query parameter to the URL with the value true.
To run without saving the run outputs, add the noReportData query parameter to the URL with the value true.
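The three URL query parameters named above (guidFilePath, logQueryPlan, noReportData) can be assembled as in the following sketch. The function name build_perf_url is hypothetical, and the sketch does not URL-encode the file path, so it assumes a path without spaces or special characters.

```shell
# build_perf_url HOST_PORT [GUID_FILE_PATH] [LOG_QUERY_PLAN] [NO_REPORT_DATA]
# (hypothetical helper) Prints the performance.jsp URL with the optional
# query parameters appended. Pass '' to leave an optional parameter out.
build_perf_url() (
    base="$1/incorta/performance.jsp"
    query=''
    [ -n "${2:-}" ] && query="${query}&guidFilePath=$2"
    [ "${3:-}" = "true" ] && query="${query}&logQueryPlan=true"
    [ "${4:-}" = "true" ] && query="${query}&noReportData=true"
    if [ -n "$query" ]; then
        printf '%s?%s\n' "$base" "${query#&}"   # drop the leading '&'
    else
        printf '%s\n' "$base"
    fi
)

# Example: log query plans for the dashboards listed in a GUID file
# build_perf_url http://localhost:8080 /tmp/guids.txt true
```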

Results

The tool has two output formats (HTML and a tab-separated .txt file), which are generated according to the mode.

Run Mode HTML

An HTML report listing all visited insights and their folder/dashboard hierarchy, with loading times. It flags skipped and failed insights.

Skipped insights are either hierarchy insights or, when the Skip not owned dashboards flag is on, all insights that belong to a dashboard not owned by the running user. An example of the HTML output:

Run Mode Tab-separated .txt

A Tab-separated .txt file will be generated in the run folder to save the report for further analysis. The file will contain the following columns:

Column	Description	Possible Values
Time	Load Time in Milliseconds	1 for skipped insights, -2 for failed insights
Run Name	The Run Identifier Name
Run Timestamp	Timestamp when the run started	field, gauge, single row chart, series chart, pivot chart, flat table, aggregated table, detail table, pivot, unknown
Path	The Path to the entity
Error	The error message if the insight failed	format, format2, sort, sort2, right_fail, fail, left_fail, mismatch (exact lines count), mismatch (different lines count)
Entity Type	Entity Type	Tenant, folder, dashboard, insight

Compare Mode HTML

The HTML report for compare mode will list the count of mismatched insights with the insight type and mismatch category/sub-category. An example of the HTML output:

Compare Mode Tab-separated .txt

A Tab-separated compare.txt file will be generated in the compare folder to save the report for further analysis. The file will contain the following columns:

Column	Description	Possible values
User	Username for the user who rendered the insight	-
Tenant	Tenant name	-
Left	New run name	-
Right	Referenced (compared with) run name	-
Insight category	category of the rendered insight	field, gauge, single row chart, series chart, pivot chart, flat table, aggregated table, detail table, pivot, unknown
Mismatch type	identified mismatch type between the two runs	format, format2, sort, sort2, right_fail, fail, left_fail, mismatch (exact lines count), mismatch (different lines count)
Insight name	the insight name	if the insight is unnamed then guid is used
Path	Incorta relative insight path	-
File path (Left)	file system path to the new run file	-
File path (Right)	file system path to the old run file	-
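For a quick summary outside Incorta, the compare.txt file can be tallied by mismatch type with awk. This is a sketch under assumptions: it assumes the file starts with a header row that uses the column names from the table above, and the function name count_mismatches is hypothetical.

```shell
# count_mismatches COMPARE_FILE   (hypothetical helper)
# Prints "<count><TAB><mismatch type>" per mismatch type, sorted by type.
# Assumes a tab-separated file whose first row is a header containing a
# column named "Mismatch type".
count_mismatches() (
    awk -F'\t' '
        NR == 1 {                       # locate the column by its header name
            for (i = 1; i <= NF; i++) if ($i == "Mismatch type") col = i
            next
        }
        col { count[$col]++ }           # skipped entirely if the column is absent
        END { for (t in count) printf "%d\t%s\n", count[t], t }
    ' "$1" | sort -t "$(printf '\t')" -k2
)

# Example:
# count_mismatches <workspace>/compare/<run-name>--<compare-with-run-name>/compare.txt
```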

Comparison Analysis

Navigate to the compare folder under the workspace folder, then open the folder named <run-name>--<compare-with-run-name> for the comparison you want to check.
This folder contains two folders named after the runs. Open them with any file comparison tool, such as Meld or Beyond Compare.
You can then step through the mismatches and inspect the run output for further investigation.
The mismatches are categorized as follows:

Mismatch Category	Mismatch Sub Category	Description	Notes
1- Exact Match	exact	exactly matching
2- Semantic Match	format	format mismatch	Some values have different format
	format2	format mismatch, ignoring precision	Some variation of above around precision
	sort	sorting mismatch	The order does not match for reports where sorting was declared
	sort2	sorting mismatch, ignoring precision	Differences in order with some variations around precision
	fail	both are failing	Report did not run - no comparison needed
	right_fail	old run failed	New run succeeded, but old run had failed
4- Mismatch	left_fail	old run succeeded but the new run fails
5- Unknown	mismatch (exact lines count)	content mismatch, but the line count is exactly the same between both runs.
	mismatch (different lines count)	mismatch and line count is different.
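If no graphical comparison tool is available on the server, the two run folders described in the Comparison Analysis steps can be compared with diff. This is a sketch under assumptions: it assumes the compare folder is literally named compare, and the function name list_run_differences is hypothetical.

```shell
# list_run_differences WORKSPACE RUN_NAME COMPARED_WITH_RUN_NAME
# (hypothetical helper) Lists the files that differ between the two run folders
# inside WORKSPACE/compare/RUN_NAME--COMPARED_WITH_RUN_NAME.
list_run_differences() (
    pair_dir="$1/compare/$2--$3"
    # diff exits with 1 when differences exist; treat that as success and
    # only report a failure for real errors (exit status 2 or higher).
    diff -rq "$pair_dir/$2" "$pair_dir/$3" || [ "$?" -eq 1 ]
)

# Example with hypothetical run names:
# list_run_differences /home/incorta/workspace after_upgrade before_upgrade
```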

Notes

You can use the tool to simulate concurrent requests, using Run mode only.
You can use the tool to find regressions between builds: generate a run before the upgrade, upgrade, then generate a new run in Compare mode against the pre-upgrade run, and analyze any mismatches found.
The maximum number of parallel users while using impersonation is 10.
Try to keep the Number of Workers low when running the tool with multiple users.
All the inputs can be set in the URL as query parameters, so you can run the tool through a curl command or schedule it with a cronjob like this example:

#!/bin/bash
# Calls performance.jsp with curl and saves the response to a file named after the run.

BASE_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )

host='http://localhost:8080'
tenant='ebs_cloud'
usr='admin'
passEnc='vNurdjf9YdcpAFPBqSawXQ=='
path='/home/incorta/zakaria' #doesn't end with '/', can be empty
comparewithrunname='' #compare run name, leave it empty for run mode
compareonly='' #set it to 'true' to compare only
folderIds='' #comma separated ids '746,225,222,227', leave empty if not needed
dashboardIds='' #comma separated guids '7fa751fd-fa26-4036-9d85-cd718631bfd6,3e1e7e99-3c11-491c-8938-f5efb67515be', leave empty if not needed
runwithusers='' #comma separated usernames 'user1,user2', leave empty if not needed
workerCount=4 #number of threads
output='html' #can be html, csv, txt and data
runname=$(date "+%Y%m%d%H%M%S") #run name. it is the current timestamp but can be anything

if [ "$output" != "data" ]; then
    curl -s -d "tenant=$tenant&login=$usr&passEnc=$passEnc&workspacePath=$path&currentRun=$runname&ref=$comparewithrunname&compareOnly=$compareonly&foldersIdsStr=$folderIds&guidsStr=$dashboardIds&userNamesStr=$runwithusers&workersCount=$workerCount&output=$output" "$host/incorta/performance.jsp" > "$runname.$output"
else #compare is not supported with 'data' output
    curl -s -d "tenant=$tenant&login=$usr&passEnc=$passEnc&workspacePath=$path&currentRun=$runname&foldersIdsStr=$folderIds&guidsStr=$dashboardIds&userNamesStr=$runwithusers&workersCount=$workerCount&output=data" "$host/incorta/performance.jsp" > "$runname.zip"
fi

Tenant Comparison Tool

Incorta admins often need to migrate changes from a UAT or Dev instance to PROD, but they may not be sure which changes took place in either environment. Manual comparison is tedious and error-prone. Even raw XML comparison is error-prone, because the same entity (insight, schema, table, etc.) or attribute can appear in different positions in the two XML files.

Raw XML comparison also reveals changes that exist only because of Incorta schema upgrades and are not changes the user needs to be aware of.

The Tenant Comparison Tool is a standalone command-line (CLI) tool that helps users compare two tenants (for example, a UAT tenant and a PROD tenant) by comparing their dashboards, schemas, business schemas, and data sources.

The user can specify the output format of the tool, HTML or CSV. Using the CSV format, the user can upload the comparison output CSV files to Incorta, and use the pre-built schemas and dashboards shipped with the tool to analyze the differences between the tenants using Incorta dashboards.

Terminology

‘Entity’ refers to one of the following: Schema, Business Schema, Dashboard, Data Source

Revised Entity: this represents the more updated version of the entity. (e.g. UAT version)
Base Entity: this represents the older version of the entity. (e.g. Production version)

Installation and Usage

Ensure that JDK 1.8 is installed on your machine.
Contact Incorta Support for the appropriate version of the tool.
Import the dashboard folder (named “incorta_dashboards_export.zip”) and the schema (named “schema_export.zip”) into the Incorta instance that you plan to use to analyze the tool’s output.
Unzip the tool build file comparison-tool.zip to a location on your machine.
Export the base tenant and the new tenant from the environments you are comparing (e.g. UAT and PROD).
Run the tool by running the script compare-incorta.sh, passing the input parameters explained in the Command-Line Options section.
If you choose CSV output in order to analyze it in the tool’s dashboards:

Compress the output folder, then upload it to Incorta under the Incorta Data Files section.
Reload the schema named “Incorta_tenant_comparison”.
Check the tool’s dashboards under the dashboard folder “Tenant Comparison”.

Output Formats

The tool can produce comparison reports in two different file formats, HTML and CSV format.

The HTML format can be viewed in a browser with the help of the index.html file generated alongside the output files, which lists the changed entities with hyperlinks to navigate to and view the changes.
The CSV format should be used if you are planning to analyze the output using the auxiliary dashboards shipped with the tool.

Command-Line Options

Parameter	Description
-b,--base-path	The path to the base tenant export (zip) file.
-n,--new-path	The path to the new version of the tenant export (zip) file.
-o,--output-path	The output directory where the comparison reports and the migrated entities are saved. The default output path is './comparison-output', next to the jar file.
-f,--output-format	The output comparison file format, either 'html' or 'csv'. The default format is HTML.
-s,--schema-list	A comma-separated list of schema/business-schema names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of schemas are compared.
-d,--dashboard-list	A comma-separated list of dashboard names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of dashboards are compared.
-c,--datasource-list	A comma-separated list of data source names to compare. If not specified, or if the value 'all' is passed, all base-new pairs of data sources are compared.
-m,--mode	Specify 'compare-only' to only compare the tenants. The default value is 'compare-only'. There is another mode called 'migrate' which will migrate (step-by-step) the mentioned entities from the base to the revised version (this is still under investigation).
-u,--create-subfolder	Whether to save the output in a sub-folder (named by the run timestamp) under the specified output path.

Examples

This command compares the whole SAP tenant environments by providing the paths of their exports:

./compare-incorta.sh \
  -b tenant_sapeccdev_20200609.zip \
  -n tenant_sapeccdev_ent_20200609.zip \
  -o compare-tenant-output \
  -m compare-only \
  -f csv \
  -s all \
  -d all \
  -c all

Another example of comparing specific schemas (SAPECC_PP and MS_AR), specific dashboards (Unused Business Views and Sales Order Check) and all data sources would be:

./compare-incorta.sh \
  -b tenant_sapeccdev_20200609.zip \
  -n tenant_sapeccdev_ent_20200609.zip \
  -o compare-tenant-output \
  -m compare-only \
  -f csv \
  -s SAPECC_PP,MS_AR \
  -d "Unused Business Views,Sales Order Check" \
  -c all

The standard output summary for the above command:

The following screenshot shows the output index.html page generated, and each change has a hyperlink to the changed report. This is useful when dealing with an HTML output format:

The following screenshots show the output after uploading the CSV output into Incorta and previewing the Schema Summary dashboard:

The following screenshot shows the output after uploading the CSV output into Incorta and previewing the Dashboard Summary dashboard:

Implementation Details

Scanning Phase

The tool scans the input paths of the two tenant exports, then constructs pairs of corresponding entities (schemas, dashboards, etc.) to be compared later. The user can name specific entities to limit the scope of the comparison, or include every existing entity (see the Command-Line Options section).

Comparison Phase

The pairs saved during the scanning phase are then processed one entity after another, starting with schemas and business schemas, then data sources, and finally dashboards.

For each pair of corresponding entities:

Two DOM trees are built from their XML representations. Let's call them the Revised and Base trees, as defined in the Terminology section.
The tool traverses the Revised DOM tree and, for each element in the tree, searches for a corresponding element in the Base tree. What counts as a corresponding element is defined by a configuration file that identifies each element's key XPath. For example, a Dashboard is identified by its @name attribute, a Measure is identified by both its Label and Field attributes, etc.

If no corresponding element is found in the Base tree, the tool marks the current element as ADDED.
If the corresponding element is found in the Base tree, the tool compares the element values in both trees and the values of each attribute. If any change in values is detected, the tool records the current element as UPDATED.

The tool then detects deleted elements by traversing the Base DOM tree. For each element, if no corresponding element is found in the Revised tree, the tool marks the current element as DELETED.
After recording these changes, the tool writes the output files in HTML or CSV, depending on the output format the user chose. The HTML format produces English-like statements that describe the changes. The CSV format is useful for uploading the data into Incorta and visualizing it there with customized insights.
An additional file called ‘index.html’ is generated. It lists the name of each entity along with a hyperlink to the output report file, which is useful when using the HTML format.
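The ADDED / UPDATED / DELETED classification described above can be illustrated on flat key-value records instead of DOM trees. This is a simplified sketch, not the tool's implementation: each input line is assumed to be "<key><TAB><value>", where the key stands in for an element's key XPath, and the function name keyed_diff is hypothetical.

```shell
# keyed_diff BASE_FILE REVISED_FILE   (hypothetical helper)
# Each input line is "<key><TAB><value>". Prints "<CHANGE><TAB><key>" for
# every key that was added, updated, or deleted between Base and Revised.
keyed_diff() (
    awk -F'\t' '
        pass == 1 { base[$1] = $2; next }        # index the Base records by key
        pass == 2 {                              # walk the Revised records
            revised[$1] = 1
            if (!($1 in base))           print "ADDED\t" $1
            else if (base[$1] != $2)     print "UPDATED\t" $1
            next
        }
        pass == 3 && !($1 in revised) {          # walk Base again for deletions
            print "DELETED\t" $1
        }
    ' pass=1 "$1" pass=2 "$2" pass=3 "$1"
)
```

Matching on a key rather than on position is what lets the comparison ignore elements that merely moved within the XML.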

Log Parser

It can be difficult to look for specific information in Incorta log files manually, and often the log files may be too large to be opened in a text editor. Incorta maintains several versions of log parsers that will parse out the logs from the Loader or Analytics Service into a CSV file that makes it much easier to read, search or even use as a data source for an Incorta dashboard. There are also versions that can parse the log files without having to extract the log files from a .zip file in case of very large log files. Please contact Incorta Support if you are in need of one and they can provide the appropriate version.

Extract Table Column Attributes

This python script takes an unzipped tenant export file and produces a .csv file which has all the table and column details like data type, function, label, etc. within a schema. Please click HERE for the latest version of the script.

Missing Sync Notifier

Network or server issues could potentially cause nodes to go out of sync, a problem which may cause data inconsistency and information loss. The below script can be run periodically to review the successfully loaded jobs in the metadata and determine if the sync message arrived for the node.

Implementation Details

This tool will only work for on-premises customers, as it needs to be placed in the 'IncortaNode' folder on each physical node in the cluster.
It creates a connection to the metadata DB and extracts the successfully loaded jobs to compare them with the syncs that appear in the log file.
Once it determines that all the load jobs have a corresponding sync, it saves the time of the last successfully loaded job.
If it finds a missing sync, it sends an email notifying you that, on a specific node, specific schema(s) failed to sync for specific load job(s).
The first time the tool runs on a machine, it checks all load jobs that ran that day, up to the current run time.
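The core check described above, finding load jobs that have no matching sync, amounts to a set difference. The following sketch shows that idea only; it is not the tool's code. The real tool reads jobs from the metadata DB and syncs from the log file, whereas this sketch assumes both have already been reduced to plain files with one job identifier per line. The function name missing_syncs is hypothetical.

```shell
# missing_syncs LOADED_JOBS_FILE SYNCED_JOBS_FILE   (hypothetical helper)
# Both files hold one job identifier per line.
# Prints the loaded jobs that have no corresponding sync.
missing_syncs() (
    if [ -s "$2" ]; then
        # -F fixed strings, -x whole-line match, -v keep the lines NOT found.
        # grep exits with 1 when nothing is missing; treat that as success.
        grep -Fxv -f "$2" "$1" || [ "$?" -eq 1 ]
    else
        cat "$1"        # no syncs recorded at all: every loaded job is missing
    fi
)
```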

The jar file takes a single parameter: its absolute path, which is needed when it is run from a different directory. Running it from the shell script supplies this parameter automatically.

In order for the tool to function correctly you must provide a ‘logger.properties’ file which looks like this:

mail.smtp.auth=true
mail.smtp.starttls.enable=true
mail.smtp.host=smtp.gmail.com
mail.smtp.port=587
mail.smtp.ssl.trust=smtp.gmail.com
mail.session.mail.transport.protocol=smtp
from=system@incorta.com
to=[ADD EMAIL ADDRESSES]
password=[ADD PASSWORD]
env=[ENTER ENVIRONMENT NAME]
initialStart=72
log_location=[ADD PATH TO /IncortaNode/ FOLDER]
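Before scheduling the tool, it may help to confirm that no placeholder was left unfilled in logger.properties. This is a sketch under assumptions: it treats every key in the sample above as needed, which the source does not state explicitly, and the function name missing_properties is hypothetical.

```shell
# missing_properties FILE   (hypothetical helper)
# Prints each key from the sample file that is absent, empty, or still set
# to a [PLACEHOLDER] value.
missing_properties() (
    for key in mail.smtp.auth mail.smtp.starttls.enable mail.smtp.host \
               mail.smtp.port mail.smtp.ssl.trust \
               mail.session.mail.transport.protocol \
               from to password env initialStart log_location; do
        # Take the text after the first '=' on the first line whose key matches.
        value=$(awk -F= -v k="$key" '
            { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
            name == k { sub(/^[^=]*=/, ""); print; exit }
        ' "$1")
        case "$value" in
            ''|'['*']') echo "$key" ;;
        esac
    done
)

# Example:
# missing_properties logger.properties
```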
