Groupby Pivot in Materialized View

mkrieger — Fri, 24 Feb 2023 19:51:36 GMT

Hello,

I am trying to create a materialized view with PySpark in which I reshape a dataframe, using groupby and pivot to turn the values of an existing column into many new columns.

Like so:

from pyspark.sql.functions import first
result = df.groupby("ItemId").pivot("Question").agg(first("Answer", ignorenulls=True))

My code validates successfully, but when I attempt to load it, I get the following error message:

- INC_03020207: The result definition mismatches the current table definition. Columns [INCORTA LISTS THE COLUMN NAME VALUES I AM ATTEMPTING TO ADD HERE] are missing(-)/extra(+). Please re-run table discovery

How can I overwrite / update the table definition to perform my transformation?

Re: Groupby Pivot in Materialized View

rsather — Wed, 07 Jun 2023 21:38:48 GMT

Hello,

I have run into this when new values appear in the source data, since each new value in the pivoted column becomes a new column. The key for me was to make Incorta think the script had changed so that it rediscovers the columns. Adding a # on an empty line is enough of a difference to pull in the new fields. The next time it happens, I take the # back out, which counts as the next "change". I keep going back and forth like this each time the underlying data changes.
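If the set of questions is known ahead of time, another option that might avoid the back-and-forth is to pass the values to pivot explicitly, so the output columns stay the same even when new values show up in the source. This is only a rough sketch: KNOWN_QUESTIONS, expected_columns and pivot_answers are made-up names for illustration, the question values are placeholders, and I have not confirmed how Incorta's table discovery reacts to it.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:  # only needed for the type hints below
    from pyspark.sql import DataFrame

# Hypothetical values: replace with the Question values you want as columns.
KNOWN_QUESTIONS = ["Color", "Size", "Weight"]


def expected_columns(questions: Sequence[str]) -> list[str]:
    """Column names the pivot below produces: the key, then one per question."""
    return ["ItemId", *questions]


def pivot_answers(df: DataFrame, questions: Sequence[str] = KNOWN_QUESTIONS) -> DataFrame:
    """Pivot Question into a fixed set of columns, one row per ItemId."""
    # Imported here so the module still loads where Spark is not installed.
    from pyspark.sql.functions import first

    return (
        df.groupBy("ItemId")
        # With an explicit list, only these values become columns. Values
        # missing from the list are ignored, and a listed value with no
        # matching rows yields a column of nulls.
        .pivot("Question", list(questions))
        .agg(first("Answer", ignorenulls=True))
    )
```

In the materialized view script this would be called as result = pivot_answers(df). The trade-off is that a brand-new question will not appear as a column until it is added to the list, and at that point the table definition would presumably still need to be refreshed once.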

Ryan
