Why do we need Spark

Spark is not mandatory, but it helps with distributed processing of Parquet data. Complex stored procedures can be converted to PySpark programs and run on a Spark cluster.

Here is the link on how to set up Spark with Incorta: https://docs.incorta.com/4.4/spark/

  • Hi Mateen Mohammed, can you please provide access to the file?

  • Hi Mateen Mohammed, could you please provide this file? It's inaccessible with the provided link.

    An option is to email it to lchizum@c-es.com

    • Lee Chizum I did share the file.

      • Lee Chizum (2 yrs ago):

        Mateen Mohammed
        Rec'd - Thx

  • Regarding configuring Spark on a distributed node: the article says to copy spark_home to a shared drive. We should avoid that, as running Spark out of a shared drive leads to performance degradation.

    Instead, follow these steps:

    1. Zip spark_home from the Incorta node.

    2. Unzip it on the Spark machine's local disk, NOT on a shared disk.

    3. Modify spark-env.sh and spark-defaults.conf to change the hostname to the name of the Spark machine. These files are under the spark_home/conf directory.
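    The three steps above can be sketched as shell commands. This is a local simulation with placeholder hostnames ("incorta-node", "spark-node") and placeholder conf contents; it uses tar in place of zip for the archiving step, and in a real setup the extract would happen on the Spark machine itself.

```shell
set -e
WORK=$(mktemp -d)

# Stand-in for spark_home on the Incorta node, with the two conf files.
mkdir -p "$WORK/incorta-node/spark_home/conf"
echo "export SPARK_MASTER_HOST=incorta-node" > "$WORK/incorta-node/spark_home/conf/spark-env.sh"
echo "spark.master spark://incorta-node:7077" > "$WORK/incorta-node/spark_home/conf/spark-defaults.conf"

# 1. Archive spark_home on the Incorta node (tar here in place of zip).
tar -C "$WORK/incorta-node" -czf "$WORK/spark_home.tar.gz" spark_home

# 2. Extract it on the Spark machine's LOCAL disk (simulated by another dir).
mkdir -p "$WORK/spark-node"
tar -C "$WORK/spark-node" -xzf "$WORK/spark_home.tar.gz"

# 3. Rewrite the hostname in both conf files to the Spark machine's name.
sed -i 's/incorta-node/spark-node/g' \
    "$WORK/spark-node/spark_home/conf/spark-env.sh" \
    "$WORK/spark-node/spark_home/conf/spark-defaults.conf"

grep "spark-node" "$WORK/spark-node/spark_home/conf/spark-defaults.conf"
```
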

  • Status: Answered
  • Last active: 2 yrs ago
  • Replies: 5
  • Views: 376
  • Following: 5
