on 11-15-2023 10:57 AM
You may receive the following error when converting a Pandas DataFrame to a Spark DataFrame in a PySpark materialized view (MV):
- INC_03070101: Transformation error Error 'DataFrame' object has no attribute 'iteritems' AttributeError : 'DataFrame' object has no attribute 'iteritems'
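Pandas 2.0.0, released in April 2023, removed the DataFrame.iteritems() method, which had been deprecated since Pandas 1.5.0, in favor of items(). However, spark.createDataFrame() in older PySpark versions still calls the removed method, which raises the AttributeError shown above. The following sample reproduces the problem and includes the fix.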
import pandas as pd
month_data = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}
pdf = pd.DataFrame(month_data.items(), columns=['Month', 'Month_Number'])
# Keep Month_Number as a regular column; spark.createDataFrame() does not carry over the Pandas index
# Add this line, before you call createDataFrame()
pdf.iteritems = pdf.items
df = spark.createDataFrame(pdf)
save(df)
The sample code above creates an MV that lists the months together with a month number that can be used for sorting. To reproduce the problem, we first build a Python dictionary and then convert it to a Pandas DataFrame.
After we added the line pdf.iteritems = pdf.items before the call to createDataFrame(), the issue went away.
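If several DataFrames are converted in the same MV, the alias can also be applied once at the class level instead of per DataFrame. The following is a minimal sketch of that variant, not an official Incorta or PySpark fix; it assumes that only DataFrame.iteritems() is needed, and the two-row DataFrame is illustrative sample data.

```python
import pandas as pd

# Assumption: on Pandas >= 2.0 the attribute is missing. On older versions
# the guard leaves the built-in method untouched.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

# Illustrative sample data, not part of the original MV
pdf = pd.DataFrame({"Month": ["January", "February"], "Month_Number": [1, 2]})

# iteritems() yields (column name, Series) pairs, the same as items()
column_names = [name for name, _ in pdf.iteritems()]
```

With the class-level alias in place, every DataFrame created afterwards in the same session exposes iteritems(), so the per-DataFrame line is no longer needed.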
The issue can also be seen in the Incorta Data Profiler data application. An alternative workaround, for now, is to downgrade Pandas to a version earlier than 2.0.0.
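Before downgrading, it can help to confirm that the installed Pandas version is one that lacks iteritems(). The sketch below is one way to check; the helper name needs_iteritems_workaround is hypothetical, and it assumes the version string starts with a numeric major version.

```python
import pandas as pd

def needs_iteritems_workaround(pandas_version: str) -> bool:
    """Return True for Pandas 2.x and later, where iteritems() was removed.

    Hypothetical helper for illustration; it only inspects the major version.
    """
    major = int(pandas_version.split(".")[0])
    return major >= 2

# Check the Pandas version available in the current environment
affected = needs_iteritems_workaround(pd.__version__)
print(f"Pandas {pd.__version__}: workaround needed = {affected}")
```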