Sample Incorta v4 Cluster Setup with External Spark

Example Incorta v4.2 Cluster Topology:

  • Node 1: CMC, Metadata DB (MySQL), Loader Service, Zookeeper
  • Node 2: Analyzer Service
  • Node 3: Spark


Prerequisites:

  • Ensure all nodes have the latest update of Oracle JDK 1.8 installed and the JAVA_HOME environment variable set
  • Ensure all nodes have Python installed
  • A blank metadata database has been created (MySQL)
  • Refer to the Installation Guide for node installation details
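
A quick way to verify the Java and Python prerequisites on each node (the expected output noted in the comments is illustrative):

  java -version      # should report a 1.8.0_xx build
  echo $JAVA_HOME    # should print the JDK 1.8 install directory
  python --version   # confirms Python is on the PATH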

Steps:

  1. Run the Incorta installer on Node 1, select "Custom", and install "CMC"
  2. Run the Incorta installer on Node 1 again, select "Custom", and install "Incorta HA Node"
  3. Run the Incorta installer on Node 2, select "Custom", and install "Incorta HA Node"
  4. The Spark node does not need to be an Incorta node, so there is no need to run the installer on it. Simply copy the Spark binaries from an Incorta node to the Spark machine (example commands follow the sub-steps below):
    1. Copy the <incorta_home>/IncortaNode/spark folder from Node 1 or Node 2, e.g.
      1. /opt/incorta/spark
    2. Copy <incorta_home>/IncortaNode/startSpark.sh and stopSpark.sh from Node 1 or Node 2, e.g.
      1. /opt/incorta/startSpark.sh
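    For example, from Node 1, assuming <incorta_home> is /opt/incorta and the Spark machine resolves as node3 (both assumptions specific to this example topology):

      # copy the bundled Spark distribution and its start/stop scripts to the Spark machine
      scp -r /opt/incorta/IncortaNode/spark node3:/opt/incorta/spark
      scp /opt/incorta/IncortaNode/startSpark.sh /opt/incorta/IncortaNode/stopSpark.sh node3:/opt/incorta/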
  5. At this point, the Incorta cluster can be created. Please follow the "Configure an Incorta v4 Cluster using the CMC step by step" article.
    1. Skip the Spark setup in that article; Spark integration can be configured from the Admin UI later (step 9 below).
    2. Note that Zookeeper is installed on both Incorta nodes, but since this is not an HA setup, only one Zookeeper node is needed. In this example, Zookeeper runs on Node 1. If you wish to run a 3-node Zookeeper quorum, enter the Zookeeper <ip:port> URLs comma-delimited at step 6 "Zookeeper URL" (see the example below).
    3. The guide installs a Loader and an Analyzer on the same node. Here, you will install the Loader on Node 1 and the Analyzer on Node 2, so select "Finish" at step 7 of the guide [from guide: 7- Click "Finish" if you want to have only one service created on this node... ] and then repeat the process for the other service on the other node.
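    For reference, assuming example hostnames and the default ZooKeeper client port 2181, a 3-node quorum entry for the "Zookeeper URL" field would look like:

      node1:2181,node2:2181,node3:2181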
  6. Now to configure Spark on Node 3
    1. Edit /opt/incorta/spark/conf/spark-defaults.conf
      1. Update spark.master with the Node 3 hostname or IP
      2. Update spark.cores.max as preferred. It represents the number of cores allocated per MV (materialized view) job; e.g. on a 12-core machine, a setting of 4 means up to 3 MV jobs can run in parallel.
      3. Update spark.executor.cores to a value equal to or less than spark.cores.max. With spark.cores.max set to 4, an executor cores value of 4 means each MV job gets 1 executor with 4 cores, while a value of 2 means each MV job gets 2 executors with 2 cores each.
      4. Further Spark settings are beyond the scope of this document
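    For illustration, the resulting spark-defaults.conf entries on a 12-core Node 3 might look like the following (the node3 hostname and the chosen values are examples; 7077 is the default Spark master port):

      spark.master           spark://node3:7077
      spark.cores.max        4
      spark.executor.cores   2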
    2. Edit /opt/incorta/spark/conf/spark-env.sh
      1. Update SPARK_MASTER_IP with the Node 3 hostname or IP
      2. Update SPARK_WORKER_MEMORY to the total amount of memory you want allocated to the Spark worker. On a dedicated box, you might go as high as 85% of physical memory.
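    As a sketch, the corresponding spark-env.sh entries might be (node3 is the example hostname; 48g assumes a dedicated box with 56 GB of physical memory at roughly 85%):

      export SPARK_MASTER_IP=node3
      export SPARK_WORKER_MEMORY=48g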
  7. Zookeeper config recommendation
    1. On the chosen Zookeeper node (Node 1 in this example), add the following lines to <incorta_home>/IncortaNode/zookeeper/conf/zoo.cfg
    2. maxSessionTimeout=900000
       minSessionTimeout=600000
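    3. The change takes effect the next time Zookeeper is started. As an optional check once it is running (assuming the default client port 2181, the nc utility, and that ZooKeeper's four-letter-word commands are enabled), the following reports the server's mode and connection stats:

      echo stat | nc node1 2181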
  8. Using the CMC, navigate to your cluster and on the Details tab, click Start
    1. Wait until both Loader and Analyzer services have started
  9. To enable the Spark integration, navigate to the Incorta Admin page, e.g. http://node2:8080/incorta/admin
    1. On the System Configuration page, Server Configs tab, Spark Integration section, set the Spark master URL to the Node 3 hostname or IP, e.g. spark://node3:7077
    2. Click Save.
    3. Restart both the Loader and Analyzer services
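    4. As an optional sanity check (the hostname and ports are the example defaults, and the exact utilities may differ in your environment), confirm from an Incorta node that the Spark master is reachable before running a materialized view:

      # the Spark master web UI (default port 8080) shows the master URL and registered workers
      curl -s http://node3:8080 | grep -o "spark://[^<]*"
      # or simply confirm the master port is open
      nc -z node3 7077 && echo "Spark master reachable"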