Why do we need Spark?

Spark is not mandatory, but it helps with distributed processing of Parquet data. Complex stored procedures can be converted to PySpark programs and run on a Spark cluster.
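As a hedged sketch of what "converting a stored procedure to PySpark" can look like: the function below aggregates a sales table read from Parquet, the kind of logic stored procedures often hold. The function name, column names, and Parquet paths are hypothetical examples, not Incorta-specific paths; the real layout depends on your tenant.

```python
def monthly_sales_summary(parquet_path, out_path):
    """PySpark sketch of a stored-procedure-style aggregation over Parquet."""
    # Imported inside the function so this module loads even on machines
    # where PySpark is not installed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sales-summary").getOrCreate()
    sales = spark.read.parquet(parquet_path)  # hypothetical sales table
    (sales
        .withColumn("month", F.date_trunc("month", F.col("order_date")))
        .groupBy("month")
        .agg(F.sum("amount").alias("total_amount"),
             F.countDistinct("customer_id").alias("customers"))
        .write.mode("overwrite").parquet(out_path))
    spark.stop()

# On a machine with Spark available, something like:
# monthly_sales_summary("/path/to/tenant/parquet/SALES", "/tmp/sales_summary")
```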

Here is the link on how to setup Spark with Incorta: https://docs.incorta.com/4.4/spark/

  • Hi Mateen Mohammed, can you please provide access to the file?
  • Hi Mateen Mohammed, please share this file? It's inaccessible via the provided link.

    An option is to email it to lchizum@c-es.com
    • Lee Chizum, I did share the file.

      • Lee Chizum

      Mateen Mohammed 
      Rec'd - Thx

  • Regarding configuring Spark on a distributed node: the article says to copy spark_home to a shared drive. Avoid that, because running Spark out of a shared drive leads to performance degradation.

    Instead, follow these steps:

    1. Zip spark_home on the Incorta node.

    2. Unzip it on the Spark machine's local disk, NOT on a shared disk.

    3. Modify spark-env.sh and spark-defaults.conf to change the hostname to the Spark machine's name. These files are under the spark_home/conf directory.
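The hostname edit in step 3 can be sketched as a small script. This is a hedged illustration: the two file names come from the post above, but the helper function, the example hostnames, and the config contents are hypothetical.

```python
# Hypothetical sketch of step 3: after unzipping spark_home on the Spark
# machine, rewrite the Incorta node's hostname to the Spark machine's name
# in spark-env.sh and spark-defaults.conf (both under spark_home/conf).
from pathlib import Path

def retarget_spark_conf(conf_dir, old_host, new_host):
    """Replace old_host with new_host in the two Spark config files."""
    for name in ("spark-env.sh", "spark-defaults.conf"):
        path = Path(conf_dir) / name
        if path.exists():  # skip a file that was not copied over
            path.write_text(path.read_text().replace(old_host, new_host))

# Example (placeholder path and hostnames):
# retarget_spark_conf("/opt/spark/conf", "incorta-node", "spark-node")
```

A plain text-replace like this assumes the old hostname appears only where it should; review the files afterward rather than trusting the substitution blindly.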

  • Status: Answered