dylanwan
Employee

Symptoms

You receive the following error when converting a Pandas DataFrame to a Spark DataFrame in a PySpark MV:

- INC_03070101: Transformation error Error 'DataFrame' object has no attribute 'iteritems' AttributeError : 'DataFrame' object has no attribute 'iteritems'

Diagnosis

The iteritems() method was deprecated in Pandas 1.5.0 and removed in Pandas 2.0.0, which was released in April 2023. The items() method is its replacement. However, spark.createDataFrame() in older Spark versions still calls the removed method.
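One way to confirm this diagnosis in your own environment is to check whether the installed Pandas still exposes iteritems(). The following is a minimal sketch only. The helper names lacks_iteritems and describe_pandas are made up for this example and are not part of Pandas, Spark, or Incorta.

```python
def lacks_iteritems(frame_cls) -> bool:
    """Return True if the class has items() but no iteritems()."""
    return hasattr(frame_cls, "items") and not hasattr(frame_cls, "iteritems")


def describe_pandas() -> str:
    """Report the installed Pandas version and whether DataFrame.iteritems is gone."""
    import pandas as pd  # imported here so lacks_iteritems() can be used without Pandas

    missing = lacks_iteritems(pd.DataFrame)
    return f"pandas {pd.__version__}, iteritems missing: {missing}"
```

Calling print(describe_pandas()) where the MV runs should show whether the combination described above applies. If iteritems is reported as missing and your Spark version predates the fix, the error is expected.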

Solutions

  1. Downgrade the Pandas version to an earlier version, such as 1.5.3.
  2. Upgrade Spark to Spark 3.4.1 or later.
  3. Rewrite your code according to the following example:
import pandas as pd 

month_data = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}
pdf = pd.DataFrame(month_data.items(), columns=['Month', 'Month_Number'])
# Keep Month_Number as a regular column: spark.createDataFrame() does not carry over the Pandas index.

# Add this line, before you call createDataFrame()
pdf.iteritems = pdf.items

# spark and save() are provided by the Incorta MV environment
df = spark.createDataFrame(pdf)

save(df)

The sample code above creates an MV that lists the months along with a month number that can be used for sorting. To reproduce the problem, it first builds a Python dictionary and converts it to a Pandas DataFrame.

After we added the line marked by the comment in the code, the error no longer occurred.
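If an MV converts more than one Pandas DataFrame, patching each instance separately can get repetitive. A common variation, sketched here as an assumption rather than an Incorta-recommended method, is to add the alias once at the class level, and only when it is missing. The helper names ensure_iteritems_alias and patch_pandas are hypothetical.

```python
def ensure_iteritems_alias(frame_cls) -> bool:
    """Alias iteritems to items on frame_cls if iteritems is missing.

    Returns True if the alias was added, False if nothing was changed.
    """
    if hasattr(frame_cls, "iteritems"):
        return False  # older Pandas: the method still exists, leave it alone
    frame_cls.iteritems = frame_cls.items
    return True


def patch_pandas() -> bool:
    """Apply the alias to pandas.DataFrame; call once before spark.createDataFrame()."""
    import pandas as pd  # imported lazily so the helper above has no dependencies

    return ensure_iteritems_alias(pd.DataFrame)
```

With this in place, a single patch_pandas() call near the top of the MV script would cover every DataFrame created afterwards. Because of the guard, the call does nothing on Pandas versions that still ship iteritems().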

The same issue also appears in the Incorta Data Profiler data application. For now, the workaround is to downgrade the Pandas version.

Last update: 11-15-2023 10:57 AM