Sparkflows provides a no-code/low-code alternative to writing an Incorta materialized view by hand. This article discusses how to configure Sparkflows to submit Spark jobs to Incorta Cloud and how to deploy a Sparkflows workflow as a PySpark materialized view.

What is Chidori?

Chidori is a technology as well as a product service provided by Incorta that enables Incorta to run Spark jobs without affecting regular Incorta operations. To learn more about the Chidori service from Incorta, please watch this video.

Incorta and Sparkflows Integration

Below are the steps to integrate Incorta and Sparkflows.

Configure the Connections

Sparkflows supports two types of connections:

  • Compute connection
  • Data connection

By default, Sparkflows submits jobs on the local machine itself. Sparkflows can also be configured to submit jobs to a cluster via a compute connection. Chidori is Incorta's AI/ML cluster-management service; by connecting to Incorta Chidori, Sparkflows submits its jobs to Incorta's Kubernetes (K8s) Spark cluster.
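As a rough illustration, a Chidori compute connection captures details like the ones below. All field names and values here are hypothetical placeholders, not Sparkflows' actual configuration keys; consult your Incorta Cloud tenant for the real endpoint and credentials.

```text
# Hypothetical compute-connection fields (placeholders, not actual keys)
connection.name   = incorta-chidori
connection.type   = Chidori
chidori.endpoint  = https://<your-tenant>.cloud.incorta.com/chidori
auth.token        = <api-token>
spark.extra.files = my_udfs.py, my_lib.jar
```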

Connect to Incorta Chidori as a compute connection from Sparkflows

In Sparkflows, navigate to Administration -> Configuration -> Global Connection.


Pick "Chidori" as the connection type.

You can add the JAR or Python files required for your Sparkflows jobs.

Create a Project and Workflow in Sparkflows

Connect to Incorta as a data source from Sparkflows

Create a Dataset with an Incorta object store connection:


Read Incorta Data in a Sparkflows Workflow

Use the "Read Incorta" node in a Sparkflows workflow to ingest data from Incorta.

"Read Incorta" lets you take data that has been extracted, transformed, and enriched in Incorta and use it in Sparkflows for further processing.
In the example below, Demand_Forecasting_Usecase is an Incorta schema name, and Demand_forecasting_Train is a table or materialized view in that schema.

After you refresh the schema from Incorta, the columns and their types are refreshed in Sparkflows and become available to the rest of the workflow.
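As a hedged sketch of what the node resolves to: the "Read Incorta" node references the table by its qualified Incorta name. The helper below is purely illustrative, not Sparkflows' actual code; how the node really reads the data (e.g. via Spark SQL or Incorta's shared parquet storage) depends on the deployment.

```python
# Illustrative sketch: a "Read Incorta" node boils down to a qualified
# "schema.table" reference into Incorta. The read mechanism itself is
# an assumption and is shown only as a comment.

def incorta_table_ref(schema: str, table: str) -> str:
    """Build the fully qualified 'schema.table' name Incorta exposes."""
    return f"{schema}.{table}"

ref = incorta_table_ref("Demand_Forecasting_Usecase", "Demand_forecasting_Train")
# With a live Spark session this would be something like:
#   df = spark.read.table(ref)
print(ref)  # Demand_Forecasting_Usecase.Demand_forecasting_Train
```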

Save and Share data with Incorta from Sparkflows

Use the Save Parquet node to persist the output data in a GCS bucket that is accessible to Incorta.
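Under the hood, the Save Parquet node amounts to a DataFrame write to a GCS location that Incorta can read. The bucket, prefix, and dataset names below are hypothetical placeholders, and the actual write call is shown only as a comment.

```python
# Hedged sketch: build the gs:// URI a Save Parquet node would write to.
# All names here are placeholders, not real Sparkflows parameters.

def gcs_output_path(bucket: str, prefix: str, dataset: str) -> str:
    """Assemble the GCS URI for the persisted parquet output."""
    return f"gs://{bucket}/{prefix}/{dataset}"

path = gcs_output_path("my-incorta-bucket", "sparkflows/output", "cltv_predictions")
# With a live Spark session: df.write.mode("overwrite").parquet(path)
print(path)  # gs://my-incorta-bucket/sparkflows/output/cltv_predictions
```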




ML Model Registry

Save an ML Model to Incorta




View Model Registry in Sparkflows

In the Sparkflows Model Registry, the model summary, hyper-parameters, performance metrics, feature importances, and model path are stored for each executed model. This allows users to compare different models.
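To make the comparison concrete, here is an illustrative sketch (not Sparkflows' actual API or storage format) of what selecting the best model from registry entries by a performance metric looks like. The run entries, metric names, and paths are all invented for this example.

```python
# Illustrative only: registry entries as plain dicts, compared by RMSE.
# Real Sparkflows registry entries and fields may differ.
runs = [
    {"model": "gbt", "metrics": {"rmse": 12.4}, "path": "gs://bucket/models/gbt"},
    {"model": "rf",  "metrics": {"rmse": 14.1}, "path": "gs://bucket/models/rf"},
]

# Lower RMSE is better, so pick the run with the minimum value.
best = min(runs, key=lambda r: r["metrics"]["rmse"])
print(best["model"])  # gbt
```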


Create an Incorta MV using the PySpark generated by Sparkflows

Generate PySpark Code

PySpark code can be generated for a workflow and executed on any Spark environment by following the steps below:


Click the Copy to Clipboard button to copy the generated code.
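Once copied, the generated code can be pasted into an Incorta materialized view. The sketch below is illustrative, not actual Sparkflows output; inside an Incorta MV, Incorta supplies the read()/save() helpers, and the stub definitions here stand in for them only so that the sketch runs standalone.

```python
# Stubs standing in for the helpers Incorta provides inside an MV;
# in a real MV you would NOT define these yourself.
def read(table_ref):
    """Stub for Incorta's MV read(); returns toy rows for illustration."""
    return [{"customer_id": 1, "orders": 3}]

saved = {}
def save(df):
    """Stub for Incorta's MV save(); records the result locally."""
    saved["result"] = df

# --- generated-code region (illustrative, not actual Sparkflows output) ---
df = read("Demand_Forecasting_Usecase.Demand_forecasting_Train")
# ... the generated transformations would go here ...
save(df)
# --- end generated-code region ---
```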


Below is an Incorta dashboard with Order Summary details and the predicted CLTV for each customer.



Version history
Last update: 06-04-2024 02:50 PM