
Why do we need Spark?

Spark is not mandatory, but it helps with distributed processing of Parquet data. Complex stored procedures can be converted to PySpark programs and run on a Spark cluster.
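
For example, here is a minimal PySpark sketch of the kind of set-based logic a stored procedure might be rewritten into. The Parquet path, table, and column names are hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build (or reuse) a session; on a managed cluster the master URL
# would normally come from spark-defaults.conf.
spark = SparkSession.builder.appName("stored_proc_rewrite").getOrCreate()

# Hypothetical Parquet path and columns -- substitute your own.
orders = spark.read.parquet("/incorta/tenants/demo/parquet/orders")

# The kind of aggregation a stored procedure might do row by row,
# expressed instead as a distributed DataFrame transformation.
totals = (
    orders
    .filter(F.col("status") == "SHIPPED")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.show()
```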

Here is the link on how to set up Spark with Incorta: https://docs.incorta.com/4.4/spark/

  • Hi Mateen Mohammed, can you please provide access to the file?

  • Hi Mateen Mohammed, could you please provide this file? It's inaccessible via the provided link.

    An option is to email it to lchizum@c-es.com

    • Lee Chizum, I did share the file.

      • Lee Chizum: Mateen Mohammed, Rec'd - Thx
  • Regarding configuring Spark on a distributed node: the article says to copy SPARK_HOME to a shared drive. Avoid that, because running Spark from a shared drive degrades performance.

    Instead, follow these steps (a sketch of the edited config files appears after the list):

    1. Zip SPARK_HOME on the Incorta node.

    2. Unzip it onto the Spark machine's local disk, NOT onto the shared disk.

    3. Modify spark-env.sh and spark-defaults.conf to change the hostname to the name of the Spark machine. These files are under the SPARK_HOME/conf directory.
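
    As a sketch of what step 3 might look like after editing: the hostname spark-node-01 is hypothetical, and 7077 is Spark's default standalone master port (yours may differ):

    ```
    # SPARK_HOME/conf/spark-env.sh -- bind the master to the Spark machine
    export SPARK_MASTER_HOST=spark-node-01   # hypothetical hostname

    # SPARK_HOME/conf/spark-defaults.conf -- master URL clients will use
    spark.master    spark://spark-node-01:7077
    ```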