on 10-07-2022 11:43 AM - edited on 10-07-2022 11:59 AM by Tristan
Your MV ran successfully in Incorta Notebook, but when you tried to validate and save the MV, it failed.
Incorta samples the data when you save or validate an MV. If the sampled dataframe does not contain enough data for the MV logic, the MV may fail. Unfortunately, the error message you receive when sampling returns too little data is not very clear.
Incorta does not perform data sampling when you run the logic from the Incorta Notebook. That's why the MV won't fail there.
We cannot change the default to no sampling because, when the data set is huge, validating and saving the MV against all of the data would take a large amount of resources and time.
When you run into this issue, add the property spark.dataframe.sampling.enabled to the Incorta MV and set it to false. With sampling disabled, the MV brings back all of the data, which prevents the failure caused by the sample being too small. Try to validate and save the MV again. It should succeed!
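To make the failure mode concrete, here is a minimal plain-Python sketch of why logic that works on the full data can break on a sample. It is an illustrative model only, not Incorta's implementation: the helper names (rows_for_validation, average_per_group), the 20% sample fraction, and the assumption that sampling is on unless the property is set to false are all hypothetical. Only the property name spark.dataframe.sampling.enabled comes from the article.

```python
# Illustrative model only: this is NOT how Incorta implements sampling.
SAMPLING_PROPERTY = "spark.dataframe.sampling.enabled"  # property name from the article


def rows_for_validation(rows, properties, fraction=0.2):
    """Return the rows the MV logic would see on validate/save (hypothetical)."""
    # Assumption: sampling is on unless the property is explicitly "false".
    if properties.get(SAMPLING_PROPERTY, "true").lower() == "false":
        return list(rows)
    # Assumption: a 20% sample, modeled here as a deterministic prefix.
    keep = max(1, int(len(rows) * fraction))
    return list(rows[:keep])


def average_per_group(rows, min_rows=2):
    """Stand-in for MV logic that needs at least `min_rows` rows per group."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    for group, values in groups.items():
        if len(values) < min_rows:
            raise ValueError(f"group {group!r} has only {len(values)} row(s)")
    return {group: sum(values) / len(values) for group, values in groups.items()}


full_data = [("A", 1.0), ("A", 3.0), ("B", 10.0),
             ("B", 20.0), ("A", 5.0), ("B", 30.0)]

# Notebook-style run: no sampling, so the logic sees every row and succeeds.
notebook_result = average_per_group(full_data)

# Validate/save with sampling disabled through the MV property: same outcome.
unsampled = rows_for_validation(full_data, {SAMPLING_PROPERTY: "false"})
validate_result = average_per_group(unsampled)
```

Calling average_per_group(rows_for_validation(full_data, {})) in this model raises a ValueError, because the sample holds a single row. That mirrors the situation described above: the logic itself is fine, but the sampled input is too small for it.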