12-07-2023 06:29 AM
I am trying to fit a simple linear regression with pyspark.ml in a materialized view (MV).
The imports and the relevant part of my script are as follows:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
# select feature and target columns
feature_columns = ['CustomerId','BuildingId','year','month']
assembler = VectorAssembler(inputCols=feature_columns, outputCol='features')
feature_frame = assembler.transform(frame).select('features', 'target')
# build regression (training_data comes from additional filtering that is not shown here)
lr = LinearRegression(featuresCol='features', labelCol='target')
lr_model = lr.fit(training_data)
However, whenever I try to validate my script, I keep running into the following error at the lr.fit call:
INC_03070101: Transformation error [Error An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
Py4JJavaError : An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
2023-12-07 09:21:58,452 ERROR util.Instrumentation: java.util.NoSuchElementException: next on empty iterator
I can't seem to find anything about an o560 error on GitHub, Stack Overflow, or Google.
I have made sure that my training data contains no NA values.
12-07-2023 07:22 AM
Can you add the MV property "spark.dataframe.sampling.enabled" and set the property to false?
12-07-2023 07:35 AM
Sure, although I run into the error when I try to validate the MV, not when I run/load the table.
12-07-2023 02:10 PM
Solved.
It was an issue with the order of operations between my vector assembly and some additional filtering I was doing before fitting the model (not pictured in my code snippets).
Filter first, then create your vector.
Thanks,
Matt