<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Trying To Fit A Linear Regression with pyspark.ml in Data &amp; Schema Discussions</title>
    <link>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5341#M427</link>
    <description>&lt;P&gt;Sure, although I run into the error when I try to validate the MV, not when I run/load the table.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Dec 2023 15:35:03 GMT</pubDate>
    <dc:creator>mkrieger</dc:creator>
    <dc:date>2023-12-07T15:35:03Z</dc:date>
    <item>
      <title>Trying To Fit A Linear Regression with pyspark.ml</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5339#M425</link>
      <description>&lt;P&gt;I am trying to fit a simple linear regression with pyspark.ml in a materialized view (MV).&lt;/P&gt;&lt;P&gt;The imports I am using are as follows:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression&lt;/LI-CODE&gt;&lt;P&gt;My code looks like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# assemble the feature columns into one vector column, keeping only features and target
feature_columns = ['CustomerId','BuildingId','year','month']
assembler = VectorAssembler(inputCols=feature_columns, outputCol='features')
feature_frame = assembler.transform(frame).select('features', 'target')

# build and fit the regression model
lr = LinearRegression(featuresCol='features', labelCol='target')
# training_data: feature_frame after some additional filtering (not shown)
lr_model = lr.fit(training_data)&lt;/LI-CODE&gt;&lt;P&gt;However, whenever I try to validate my script, I keep running into the following error at lr.fit:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;INC_03070101: Transformation error [Error An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
Py4JJavaError : An error occurred while calling o560.fit.
: java.util.NoSuchElementException: next on empty iterator
2023-12-07 09:21:58,452 ERROR util.Instrumentation: java.util.NoSuchElementException: next on empty iterator&lt;/LI-CODE&gt;&lt;P&gt;I can't seem to find anything about an o560 error on GitHub, Stack Overflow, or Google.&lt;/P&gt;&lt;P&gt;I have made sure that my training data contains no NA values.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 14:29:16 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5339#M425</guid>
      <dc:creator>mkrieger</dc:creator>
      <dc:date>2023-12-07T14:29:16Z</dc:date>
    </item>
    <item>
      <title>Re: Trying To Fit A Linear Regression with pyspark.ml</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5340#M426</link>
      <description>&lt;P&gt;Can you add the MV property "spark.dataframe.sampling.enabled" and set the property to false?&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 15:22:29 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5340#M426</guid>
      <dc:creator>dylanwan</dc:creator>
      <dc:date>2023-12-07T15:22:29Z</dc:date>
    </item>
    <item>
      <title>Re: Trying To Fit A Linear Regression with pyspark.ml</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5341#M427</link>
      <description>&lt;P&gt;Sure, although I run into the error when I try to validate the MV, not when I run/load the table.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 15:35:03 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5341#M427</guid>
      <dc:creator>mkrieger</dc:creator>
      <dc:date>2023-12-07T15:35:03Z</dc:date>
    </item>
    <item>
      <title>Re: Trying To Fit A Linear Regression with pyspark.ml</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5345#M428</link>
      <description>&lt;P&gt;Solved.&lt;/P&gt;&lt;P&gt;It was an issue with the order of operations between my vector assembly and some additional filtering I was doing before fitting my model (not shown in my code snippets).&lt;/P&gt;&lt;P&gt;Filter first, then create your vector.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 22:10:37 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/trying-to-fit-a-linear-regression-with-pyspark-ml/m-p/5345#M428</guid>
      <dc:creator>mkrieger</dc:creator>
      <dc:date>2023-12-07T22:10:37Z</dc:date>
    </item>
  </channel>
</rss>

