PySpark Regressions using pyspark.ml Library

mkrieger
Ranger

I am developing a pipeline for some experimental regression modeling, and I have a working script whose output I am reasonably happy with. However, I am unable to write new scripts that use the ml library. I can't even copy and paste my working code into a new materialized view and run it.

When I copy and paste it into a new materialized view, the errors start after all my data cleaning, at the point where I fit the regression:

 

# Importing libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
import pyspark.sql.functions as F
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, DoubleType
# ML library
# documentation: https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml import Pipeline
from pyspark.ml.linalg import Vectors
# [...]
# Skipping my data cleaning process for sake of simplicity
# [...]

lr = LinearRegression(featuresCol='features', labelCol='target')
lr_model = lr.fit(training_data)

 

I get the following error message:

 

Error An error occurred while calling o605.fit.
: java.util.NoSuchElementException: next on empty iterator
Py4JJavaError : An error occurred while calling o605.fit.
: java.util.NoSuchElementException: next on empty iterator

 

This exact script works fine in the materialized view I developed it in. However, if I copy it to a new materialized view to alter it (for example, to test different modeling methods such as decision trees or time-lag modeling), I receive the above error.
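
One check that might help narrow this down is to validate training_data right before the fit. The sketch below rests on an unconfirmed assumption: that the "next on empty iterator" exception comes from Spark calling first() on a DataFrame that has no rows in the new materialized view. The helper name check_training_data is hypothetical; it only uses the columns attribute and head(n) method that PySpark DataFrames provide.

```python
# Hypothetical helper (not part of pyspark or Incorta): fail with a readable
# message before lr.fit() if the training DataFrame is unusable.
# Assumption: the Java exception is caused by an empty DataFrame.
def check_training_data(df, required_cols=("features", "target")):
    # Confirm the columns that LinearRegression was told to use are present.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise ValueError("training data is missing columns: %s" % missing)

    # head(1) returns a list with at most one row, so this avoids a full count().
    if len(df.head(1)) == 0:
        raise ValueError("training data has no rows; check the upstream "
                         "reads and filters in this materialized view")
    return df


# Intended use in the script above:
# lr_model = lr.fit(check_training_data(training_data))
```

If this raises the "no rows" error in the new materialized view but not in the original one, the difference is likely in the input data or filters rather than in the ml library itself.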

How can I reliably use the ml library in Incorta?
