on 06-04-2024 02:50 PM
- Overview
- What is Chidori?
- Incorta and Sparkflows Integration
- Configure the Connections
- Create Project and workflow in Sparkflows
- ML Model Registry
- Create an Incorta MV using the PySpark generated by Sparkflows
Overview
Sparkflows provides a no-code or low-code alternative to writing an Incorta materialized view (MV) by hand. This article explains how to configure Sparkflows to submit Spark jobs to Incorta Cloud and how to deploy a Sparkflows workflow as PySpark.
What is Chidori?
Chidori is both a technology and a product service provided by Incorta that lets Incorta run Spark jobs without affecting regular Incorta workloads. To learn more about the Chidori service from Incorta, please watch this video.
Incorta and Sparkflows Integration
Configure the Connections
Sparkflows supports two types of connections:
- compute connection
- data connection
By default, Sparkflows jobs are submitted on the local machine. Sparkflows can also be configured to submit jobs to a cluster through its compute connection. Chidori is the AI/ML cluster management service from Incorta. When Sparkflows is connected to Incorta Chidori, its jobs are submitted to the Incorta Kubernetes (K8s) Spark cluster.
Connect to Incorta Chidori as a compute connection from Sparkflows
In Sparkflows, navigate to Administration -> Configuration -> Global Connection.
Pick "Chidori" as the connection type.
Create Project and workflow in Sparkflows
Connect to Incorta as a data source from Sparkflows
Create a Dataset with an Incorta object store connection:
Read Incorta Data in a Sparkflows Workflow
Use the ‘Read Incorta’ node in a Sparkflows workflow to ingest data from Incorta.
After you refresh the schema from Incorta, the column names and types are updated in Sparkflows and become available to the rest of the workflow.
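As a rough illustration of what the refreshed schema means for downstream steps, the sketch below inspects the columns and types of a Spark DataFrame and checks that the columns a later step needs are present. It relies only on the standard PySpark `DataFrame.dtypes` and `DataFrame.columns` attributes. The helper names and the column names in the usage comment are hypothetical and are not part of Sparkflows or Incorta.

```python
from typing import Dict, Iterable, List


def column_types(df) -> Dict[str, str]:
    """Return a {column name: Spark type string} mapping for a DataFrame.

    `df.dtypes` is the standard PySpark list of (name, type) pairs.
    """
    return dict(df.dtypes)


def missing_columns(df, required: Iterable[str]) -> List[str]:
    """List the required columns that are absent from the DataFrame."""
    present = set(df.columns)
    return [name for name in required if name not in present]


# Usage inside a Spark session (column names are hypothetical):
#   print(column_types(orders_df))
#   absent = missing_columns(orders_df, ["customer_id", "order_total"])
#   if absent:
#       raise ValueError(f"Refresh the schema; missing columns: {absent}")
```

A check like this can help catch a stale schema before the rest of the workflow runs.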
Save and Share data with Incorta from Sparkflows
Use the Save Parquet node to persist the output data in a GCS bucket that is accessible to Incorta.
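In plain PySpark terms, the Save Parquet step is roughly equivalent to the sketch below. This is not the code Sparkflows generates. It is a minimal illustration that uses the standard `DataFrame.write.mode(...).parquet(...)` call. The helper names, the bucket name, and the folder layout are assumptions, and the cluster is assumed to already have GCS access configured.

```python
def gcs_parquet_path(bucket: str, *parts: str) -> str:
    """Build a gs:// URI from a bucket name and optional path segments."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(["gs://" + bucket.strip("/")] + cleaned)


def save_parquet(df, path: str, mode: str = "overwrite") -> None:
    """Write a Spark DataFrame to `path` in Parquet format.

    `mode` is passed to DataFrameWriter.mode, for example "overwrite"
    or "append".
    """
    df.write.mode(mode).parquet(path)


# Usage inside a Spark session (bucket and folders are hypothetical):
#   target = gcs_parquet_path("my-incorta-bucket", "sparkflows", "cltv")
#   save_parquet(predictions_df, target)
```

Writing with "overwrite" keeps the output folder in step with the latest workflow run; "append" may be a better fit if Incorta should see accumulated history.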
ML Model Registry
Save an ML Model to Incorta
View Model Registry in Sparkflows
For each executed model, the Sparkflows Model Registry stores the model summary, hyperparameters, performance metrics, feature importances, and model path. This lets users compare different models.
Create an Incorta MV using the PySpark generated by Sparkflows
Generate PySpark Code
PySpark code can be generated for a workflow and executed in any Spark environment by following the steps below:
Click on the Copy to Clipboard button to copy the generated code.
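Once copied, the generated code becomes the body of an Incorta MV. The sketch below shows the general read, transform, save shape such a script tends to have. It is not the code Sparkflows generates. It assumes the MV environment supplies a `read("schema.table")` helper that returns a Spark DataFrame and a `save(df)` helper that stores the result. Both are passed in as parameters here so the sketch stays self-contained, and the table name in the usage comment is hypothetical.

```python
from typing import Any, Callable


def run_materialized_view(
    read: Callable[[str], Any],
    save: Callable[[Any], None],
    source_table: str,
    transform: Callable[[Any], Any],
) -> Any:
    """Load a source table, apply the workflow logic, and save the result.

    `read` and `save` stand in for the helpers assumed to be available
    in the MV environment; `transform` stands in for the PySpark code
    generated from the workflow.
    """
    source_df = read(source_table)
    result_df = transform(source_df)
    save(result_df)
    return result_df


# Usage inside an MV script (table name is hypothetical):
#   run_materialized_view(read, save, "Sales.Orders", generated_workflow)
```

Keeping the generated logic inside a single `transform` function makes it easier to replace when the workflow is regenerated.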
Below is a dashboard in Incorta with Order Summary details and Predicted CLTV for each customer.