
Trying To Fit A Linear Regression with pyspark.ml

mkrieger
Ranger

I am trying to fit a simple linear regression with pyspark.ml in a materialized view (MV).

The packages I am using are as follows:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

My code looks like this:

# select feature and target columns
feature_columns = ['CustomerId', 'BuildingId', 'year', 'month']
assembler = VectorAssembler(inputCols=feature_columns, outputCol='features')
feature_frame = assembler.transform(frame).select('features', 'target')

# placeholder: the additional filtering (not shown) produced training_data here
training_data = feature_frame

# build regression
lr = LinearRegression(featuresCol='features', labelCol='target')
lr_model = lr.fit(training_data)

However, whenever I try to validate my script, I keep running into the following error when I run lr.fit:

INC_03070101: Transformation error [Error An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
Py4JJavaError : An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
2023-12-07 09:21:58,452 ERROR util.Instrumentation: java.util.NoSuchElementException: next on empty iterator

I can't find anything about an o560 error on GitHub, Stack Overflow, or Google.

I have made sure that my training data contains no NA values. 

dylanwan
Employee

Can you add the MV property "spark.dataframe.sampling.enabled" and set the property to false? 

mkrieger
Ranger

Sure, although I run into the error when I try to validate the MV, not when I run/load the table.

mkrieger
Ranger

Solved.

It was an issue with the order of operations between my vector assembly and some additional filtering I was doing before fitting the model (not pictured in my code snippets).

Filter first, then create your vector.

Thanks,

Matt