Solved: Plan for Spark Version 3.2

Stracey · 08-09-2022

Hi all,

I was wondering if there is a roadmap for the upgrade of Incorta Spark to version 3.2 or 3.3?

The reason I ask is that Spark 3.2 introduced the pandas API on Spark (https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html).

As we know, one limitation of the standard pandas library is that it runs on a single machine, so it cannot scale with data volume. The pandas API on Spark overcomes this limitation by distributing the work across the Spark cluster.
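For anyone who has not tried it yet, here is a minimal sketch of what this looks like, assuming PySpark 3.2 or later is available. The function name and the `region` and `amount` columns are made up for illustration; they are not from any Incorta schema.

```python
def mean_amount_by_region(records):
    """Group-by mean with the pandas API on Spark (needs PySpark 3.2+).

    `records` is a dict of column name -> list of values; the column
    names used below are hypothetical.
    """
    # Imported inside the function so this file still loads on
    # environments where PySpark is not installed.
    import pyspark.pandas as ps

    # Builds a distributed pandas-on-Spark DataFrame, not a local pandas one.
    psdf = ps.DataFrame(records)

    # Same syntax as pandas, but the aggregation is executed by Spark.
    result = psdf.groupby("region")["amount"].mean()

    # to_pandas() collects the result to the driver, so it is only
    # appropriate when the aggregated result is small.
    return result.to_pandas()
```

Calling it with something like `{"region": ["EU", "EU", "US"], "amount": [10.0, 20.0, 5.0]}` should give one mean per region, and the body is the same code I would write against plain pandas.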

Another advantage (for me personally) is that the pandas API on Spark uses Plotly as its default plotting backend, which lets us create interactive charts. That is extremely useful during the Exploratory Data Analysis and Model Evaluation stages.
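As a rough sketch of the plotting side, again assuming PySpark 3.2 or later with the plotly package installed (the function name and the `amount` column are hypothetical):

```python
def amount_histogram(records, bins=20):
    """Return an interactive histogram of a hypothetical 'amount' column.

    Needs PySpark 3.2+ and plotly installed.
    """
    # Imported inside the function so this file still loads on
    # environments where PySpark is not installed.
    import pyspark.pandas as ps

    # Plotly is already the default backend; it is set explicitly here
    # only to make the choice visible.
    ps.set_option("plotting.backend", "plotly")

    psdf = ps.DataFrame(records)

    # With the Plotly backend this returns a Plotly figure object.
    fig = psdf["amount"].plot.hist(bins=bins)
    return fig
```

The returned figure can be displayed with `fig.show()` in a notebook, where it supports zooming and hover tooltips.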

Currently, when we use matplotlib.pyplot or Seaborn for EDA, we can only generate static charts, and we also need to be extremely careful with sampling when working with large datasets.

Any feedback would be much appreciated!

Sam

DustinB · 08-09-2022

Hi Sam, yes, Spark 3.2 is on our roadmap for Incorta for Q4 of this year and is currently in development. Spark 3.3 will follow later once it supports other components leveraged by our platform.