on 11-15-2023 10:57 AM
Symptoms
You receive the following error when converting a Pandas DataFrame to a Spark DataFrame in a PySpark materialized view (MV):
- INC_03070101: Transformation error Error 'DataFrame' object has no attribute 'iteritems' AttributeError : 'DataFrame' object has no attribute 'iteritems'
Diagnosis
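The iteritems() method was deprecated in Pandas 1.5.0 and removed in Pandas 2.0.0, which was released in April 2023. Its replacement is the items() method. However, older Spark versions still call iteritems() inside spark.createDataFrame(), so the conversion fails when Pandas 2.0.0 or later is installed.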
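The check below is a minimal sketch of the diagnosis, not part of the original fix. It assumes the version thresholds stated in this article (Pandas 2.0.0 and Spark 3.4.1); the Spark threshold is conservative, and the helper names parse_version and is_affected are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '3.3.2' or '2.0.0rc1' into a 3-integer tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '2.0'
    return tuple(parts)


def is_affected(pandas_version, spark_version):
    """True when Pandas has dropped iteritems() and Spark predates 3.4.1.

    The thresholds are assumptions taken from this article.
    """
    pandas_removed_iteritems = parse_version(pandas_version) >= (2, 0, 0)
    spark_predates_fix = parse_version(spark_version) < (3, 4, 1)
    return pandas_removed_iteritems and spark_predates_fix


# Inside an MV you could pass the live versions, for example:
#   is_affected(pd.__version__, spark.version)
print(is_affected("2.0.0", "3.3.2"))
```

If the check reports that the environment is affected, one of the solutions in this article should apply.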
Solutions
- Downgrade the Pandas version to an earlier version, such as 1.5.3.
- Upgrade Spark to version 3.4.1 or later.
- Rewrite your code according to the following example:
import pandas as pd

month_data = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}

pdf = pd.DataFrame(month_data.items(), columns=['Month', 'Month_Number'])
pdf.set_index('Month_Number', inplace=True)

# Add this line, before you call createDataFrame()
pdf.iteritems = pdf.items

df = spark.createDataFrame(pdf)
save(df)
The sample code above creates an MV that lists the months together with a numeric index that can be used for sorting. To reproduce the problem, we first build a Python dictionary and convert it to a Pandas DataFrame.
After we added the line marked by the comment in the code (pdf.iteritems = pdf.items), the issue went away.
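A variant of the same idea is to add the alias on the class rather than on a single DataFrame object, so that any copy of the DataFrame also has it. The sketch below is an assumption, not something tested in Incorta, and the helper name add_iteritems_alias is hypothetical. It leaves the class untouched if iteritems already exists.

```python
def add_iteritems_alias(frame_cls):
    """Make `iteritems` an alias of `items` on frame_cls if it is missing.

    Returns True when the alias was added, False when it already existed.
    """
    if hasattr(frame_cls, "iteritems"):
        return False  # older Pandas: nothing to do
    frame_cls.iteritems = frame_cls.items
    return True


# Intended use in an MV, once, before spark.createDataFrame(pdf):
#   import pandas as pd
#   add_iteritems_alias(pd.DataFrame)
```

Because the alias is only added when it is missing, the same script should run unchanged on both older and newer Pandas versions.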
The issue can also be seen in the Incorta Data Profiler data application. For that application, the workaround for now is to downgrade the Pandas version.