Deploy Incorta Data Profiler

suxinji · ‎09-01-2022

Overview
Deployment Overview
Sample Data
Installation
Step 1: Install the DataPrepAPI Wheel File
Step 2: Upload the Data_Profiler_Datasets data files
Step 3: Import Schema
Step 4: Import Dashboard
Step 5: Add Session Variables (Optional)
Related Materials

Overview

This article explains how to deploy the Incorta Profiler.

Please read the Introduction to Incorta Profiler before deploying it.

Deployment Overview

The Incorta Profiler deployment consists of four components, three of which are required:

IncortaDataPrep Python Package (Required)
Sample Datasets (Optional)
Schema (Required)
Dashboard (Required)

Sample Data

Deploying the sample data and its data folder is optional, because you will eventually replace the data source in the materialized views with your own schema tables. However, we recommend deploying the sample data first so that you can verify your deployment and see how the profiler works before pointing it at your own data.

Two data sets are used as samples:

Titanic Survivors
House Price Prediction

Installation

Step 1: Install the DataPrepAPI Wheel File

Here is how to install the IncortaDataPrep wheel file in the on-premises environment:

First, copy the wheel file to your Incorta on-premises server.

scp -i remove.key ~/Download/IncortaDataPrep-0.0.1-py3-none-any.whl incorta@00.000.000.000:/home/incorta

Then, use pip to install the wheel file into the Python environment used by the Incorta instance.

/usr/bin/python3 -m pip install --user IncortaDataPrep-0.0.1-py3-none-any.whl
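To confirm that the install succeeded, you can ask the same interpreter whether it can see the package. The following is a minimal sketch, not part of the Incorta tooling: it assumes Python 3.8 or later (for importlib.metadata) and takes the distribution name IncortaDataPrep from the wheel file name above. The helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "IncortaDataPrep" is an assumption taken from the wheel file name in this step.
    version = installed_version("IncortaDataPrep")
    if version is None:
        print("IncortaDataPrep is not installed for this interpreter")
    else:
        print(f"IncortaDataPrep {version} is installed")
```

Run the script with the interpreter that received the wheel (/usr/bin/python3 in the command above) and as the same user, because a --user install is only visible to that user.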

Step 2: Upload the Data_Profiler_Datasets data files

The Titanic Survivors and House Price Prediction data sets are packaged together for use as the sample data.

This step is optional. You can instead use your own datasets.

Please note that the data files are stored under the folder Data_Profiler_Datasets.

Step 3: Import Schema

The schema contains five materialized views (MVs):

table_info
summary_table
freq_item
correlation_table
histogram

All these MVs call functions from the DataPrepAPI Python package, so the package must be installed before you can validate and save the MVs.

Note:

If you did not deploy the sample datasets (Data_Profiler_Datasets) as described in the previous step, you need to open the MVs and change the full load and incremental load logic to point to your own data source.

For example:

In freq_item, for both the full load and the incremental load, change the part highlighted in the screenshot to your [SCHEMANAME].[TABLENAME].

Edit Full load logic:

Please switch off the incremental logic and run the schema load job before editing the incremental logic.

Edit Incremental logic:

If you have multiple data sources, you can combine them in the incremental logic using unionAll.
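One way to chain unionAll across several sources is sketched below. This is an illustration under stated assumptions, not the profiler's own code: union_sources is a hypothetical helper, and it relies only on each input exposing a unionAll(other) method, as PySpark DataFrames do. In PySpark, unionAll matches columns by position rather than by name, so the sources should share the same column order.

```python
from functools import reduce
from typing import Iterable, TypeVar

Frame = TypeVar("Frame")


def union_sources(frames: Iterable[Frame]) -> Frame:
    """Stack several DataFrames into one by chaining unionAll calls.

    Works with any object that exposes unionAll(other), such as a PySpark DataFrame.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("at least one data source is required")
    # Fold the list left to right: ((a.unionAll(b)).unionAll(c)) ...
    return reduce(lambda left, right: left.unionAll(right), frames)


# Hypothetical use inside an MV script, assuming the MV runtime provides a
# read() helper and that both tables have the same columns in the same order:
#
#   df = union_sources([read("SCHEMA_A.TABLE_1"), read("SCHEMA_B.TABLE_2")])
```

The combined DataFrame then takes the place of the single source table in the existing incremental logic.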

Note:

For all five MVs (table_info, summary_table, correlation_table, histogram, and freq_item), change the table reference to the format [SCHEMANAME].[TABLENAME] in both the full load and the incremental load logic.

Step 4: Import Dashboard

Step 5: Add Session Variables (Optional)

The session variable sets the default table displayed in the list of Table Names.

query(
Data_Profiler.table_info.table_name,
rowNumber() = 1
)