Pandas (and other Python libraries) in Materialized View
I am a new user of Incorta, and I'm wondering -- is it possible to import pandas into materialized views?
I have installed Anaconda in the Linux environment where Incorta is housed and can confirm it works from the command line, but Incorta does not seem to pick it up.
Linux environments can sometimes have multiple installs of Python. Can you go to the Linux command line of the server running Spark for Incorta, switch to the user that Incorta runs as, and run the 'which python' command to confirm that Incorta is using the same Python install that you installed pandas into?
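If it helps, the same check can also be done from inside Python itself. The following is only a minimal sketch using the standard library: the function name describe_environment is made up for illustration, and I am assuming you can run a few lines of Python in the same place your materialized view script runs. It reports which interpreter is actually executing and whether the modules you care about are visible to it.

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Report the running interpreter and whether each top-level module is importable.

    describe_environment is a hypothetical helper, not part of Incorta or Spark.
    """
    return {
        # Full path of the Python binary that is executing this code.
        "executable": sys.executable,
        # Version string such as "3.8.10".
        "version": sys.version.split()[0],
        # True if the interpreter can locate the module, False otherwise.
        "modules": {
            name: importlib.util.find_spec(name) is not None
            for name in module_names
        },
    }


report = describe_environment(["pandas", "wget"])
print("Interpreter:", report["executable"])
print("Version:", report["version"])
for name, available in report["modules"].items():
    print(name, "is available" if available else "is NOT available")
```

If the interpreter path printed here differs from the one 'which python' shows for your Anaconda install, the two are not the same Python.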
1) In the <incorta home>/IncortaNode/spark/conf folder, edit the spark-env.sh file to add the following line, then save the file and restart Spark:
PYSPARK_PYTHON=<full path to python executable/python>
2) You can use pandas, but pandas DataFrames, unlike Spark DataFrames, are mutable and held in a single machine's memory, so Spark cannot process them in a distributed manner. Koalas is an open source Python project that augments PySpark’s DataFrame API to make it compatible with pandas. For details, please refer to https://databricks.com/blog/2019/04/24/koalas-easy-transition-from-pandas-to-apache-spark.html
I am trying to import wget in the Materialized View, but I am getting "INC_005005001:Failed to load data from [spark://DESKTOP-KI0KIC7:7077] with properties [[error, No module named 'wget' ]]"
Before this, I was getting "Failed to connect to [spark] due to [null] with properties". After following the Spark Integration documents, I am now getting the error above instead. Please let me know how I can fix this.
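A "No module named" error like this usually means the package is not installed for the Python interpreter that Spark launches, even if it is installed for some other Python on the machine. One way to avoid installing into the wrong Python is to invoke pip through the interpreter itself. This is only a sketch: pip_install_command is a made-up helper name, it assumes pip is available for that interpreter, and it only builds and prints the command rather than running it.

```python
import shlex
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command that targets one specific Python interpreter.

    pip_install_command is a hypothetical helper for illustration. Running
    pip as "<python> -m pip" installs into that interpreter's site-packages
    rather than whichever pip happens to be first on the PATH.
    """
    return [python, "-m", "pip", "install", package]


# By default, target the interpreter that is running this code.
cmd = pip_install_command("wget")
print(shlex.join(cmd))
```

To target the interpreter configured in PYSPARK_PYTHON, pass its full path as the python argument and run the printed command on the server as the user that Incorta runs as.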