on 09-01-2022 09:30 AM - edited on 09-23-2022 03:34 PM by Tristan
This article will give instructions for deploying Incorta Profiler.
Please read the Introduction to Incorta Profiler before deploying it.
To use the Incorta Profiler, you have to install these four components:
Deploying the sample data and data folder is optional, because you will eventually replace the data source in the materialized views (MVs) with your own schema tables. However, we recommend deploying the sample data first so you can verify your deployment and see how the profiler works.
Two data sets are used as samples: Titanic Survivors and House Price.
Here is how to install the IncortaDataPrep wheel file in the on-premises environment:
First, copy the file to your Incorta on-premises server.
scp -i remove.key ~/Download/IncortaDataPrep-0.0.1-py3-none-any.whl incorta@00.000.000.000:/home/incorta
Then, run pip to install the wheel file into the Python environment used by the Incorta instance.
/usr/bin/python3 -m pip install --user IncortaDataPrep-0.0.1-py3-none-any.whl
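To confirm that the install succeeded, one option is to query the package metadata from the same interpreter (/usr/bin/python3 in the command above). The following is a minimal sketch, not part of the Incorta tooling: it assumes Python 3.8 or later and that the distribution name is IncortaDataPrep, as implied by the wheel file name. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "IncortaDataPrep" is the distribution name implied by the wheel file name
    # (an assumption; adjust it if your wheel is named differently).
    version = installed_version("IncortaDataPrep")
    if version:
        print("IncortaDataPrep version:", version)
    else:
        print("IncortaDataPrep is not installed for this interpreter")
```

If the script reports that the package is missing, check that you ran it with the same Python interpreter that the Incorta instance uses.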
The Titanic Survivors and House Price data sets are packaged as the sample data.
This step is optional. You can instead use your own datasets.
Please note that the data files are stored under the folder Data_Profiler_Datasets.
The schema contains five MVs: table_info, summary_table, correlation_table, histogram, and freq_item.
All of these MVs call functions from the DataPrepAPI Python package, so the package must be installed before you can validate and save the MVs.
Note:
If you did not deploy the sample datasets (Data_Profiler_Datasets) as described in the prior step, you need to open the MVs and change the full load and incremental load logic to point to your own data source.
For example:
In freq_item, for both the full load and the incremental load, replace the table reference (the part highlighted in the screenshot) with your own [SCHEMANAME].[TABLENAME].
If you have multiple data sources, you can combine them in the incremental logic using unionAll.
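Combining several sources with unionAll can be sketched as follows. This is an illustration under stated assumptions, not the logic shipped in the Profiler MVs: the helper name union_sources and the table names are hypothetical, and the reader function is passed in as a parameter so the sketch does not depend on a particular environment. In an Incorta PySpark MV, the reader would typically be the built-in read function. Spark's unionAll matches columns by position, so the sources need the same column layout.

```python
from functools import reduce


def union_sources(read_table, table_names):
    """Read each [SCHEMANAME].[TABLENAME] entry and stack the results with unionAll.

    read_table is any callable that returns a DataFrame-like object exposing
    unionAll(), such as a PySpark DataFrame.
    """
    if not table_names:
        raise ValueError("at least one table name is required")
    frames = [read_table(name) for name in table_names]
    # unionAll resolves columns by position, not by name.
    return reduce(lambda left, right: left.unionAll(right), frames)


# Hypothetical usage inside an MV script, assuming the MV environment
# provides read() and save():
#
#   df = union_sources(read, ["MY_SCHEMA.TABLE_A", "MY_SCHEMA.TABLE_B"])
#   save(df)
```

Passing the reader in as an argument keeps the helper easy to try outside the MV editor with a stand-in object before pasting the logic into the MV.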
Note:
Change the table name, using the format [SCHEMANAME].[TABLENAME], in both the full load and the incremental load for all five MVs: table_info, summary_table, correlation_table, histogram, and freq_item.
The session variable sets the default table displayed in the list of Table Names. Its value is defined by the following query:
query(
Data_Profiler.table_info.table_name,
rowNumber() = 1
)