Plan for Spark Version 3.2

Stracey
Ranger

Hi all,

I was wondering if there is a roadmap for the upgrade of Incorta Spark to version 3.2 or 3.3?

I ask because Spark 3.2 introduced the pandas API on Spark (https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html).

As we know, one limitation of using the standard pandas library with Spark is that it runs on a single machine, so it cannot scale with data volume. The pandas API on Spark overcomes this by distributing the same pandas-style operations across the cluster.
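To make the point concrete, here is a minimal sketch of what "same code, different engine" could look like. The data, the column names, and the helper `mean_amount_by_region` are made up for illustration, and the Spark call is left as a comment because it assumes a cluster running Spark 3.2 or later.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
ROWS = {
    "region": ["east", "west", "east", "west"],
    "amount": [10.0, 20.0, 30.0, 40.0],
}


def mean_amount_by_region(frame_lib, rows):
    """Build a DataFrame with `frame_lib` and average `amount` per region.

    `frame_lib` is any module exposing the pandas DataFrame API:
    plain `pandas`, or `pyspark.pandas` on Spark 3.2 and later.
    """
    df = frame_lib.DataFrame(rows)
    return df.groupby("region")["amount"].mean()


# Runs on a single machine with plain pandas.
local_means = mean_amount_by_region(pd, ROWS)
print(local_means.sort_index())

# On a cluster with Spark 3.2+, the same function is meant to accept the
# distributed module instead (assumption, not executed here):
#   import pyspark.pandas as ps
#   spark_means = mean_amount_by_region(ps, ROWS)
```

The helper only touches `DataFrame`, `groupby`, and `mean`, all of which exist in both libraries, so in principle nothing but the import would need to change.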

Another advantage (for me personally) is that the pandas API on Spark uses Plotly as its default plotting backend, which lets us create interactive charts. That is extremely useful during the Exploratory Data Analysis and Model Evaluation stages.

Currently, when we use matplotlib.pyplot or Seaborn for EDA, we can only generate static charts, and we also need to be extremely careful with sampling when working with large datasets.
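As an illustration of the sampling concern, this is roughly how a per-class sample might be taken before plotting. It is only a sketch: the data is invented, the 10% fraction is arbitrary, and `groupby(...).sample` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical, imbalanced dataset: 900 rows of class "a", 100 of class "b".
df = pd.DataFrame({
    "label": ["a"] * 900 + ["b"] * 100,
    "value": range(1000),
})

# Sample 10% within each class so the rare class keeps its share.
# random_state makes the sample reproducible between runs.
plot_sample = df.groupby("label").sample(frac=0.1, random_state=42)

print(plot_sample["label"].value_counts())
```

Sampling within each group, rather than across the whole frame, keeps the class proportions of the plotted subset the same as in the full data.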

Any feedback would be much appreciated!

Sam

1 REPLY

DustinB
Employee

Hi Sam, yes, Spark 3.2 is on our roadmap for Incorta for Q4 of this year and is currently in development. Spark 3.3 will follow later once it supports other components leveraged by our platform.