09-28-2022 11:53 AM
Is it possible to set a non-incremental table to load from staging (rather than performing a full load) when an incremental load is run?
Solved! Go to Solution.
09-29-2022 10:00 AM
Hi @kxie ,
Is there a particular use case for why you would want to do this? Typically we only run load from staging manually.
09-29-2022 10:37 AM
I currently have tables grouped together for security/access that had full loads at the end of the day. However, due to a change in requirements, incremental refreshes are being done throughout the day. For the more complex python MV's creating incremental load logic is complicated, and until that logic can be developed and validated, I'm looking for some way to shorten the schema's overall incremental load time. I did consider moving the MV's to another schema, but that would require a significant refactor.
09-29-2022 11:40 AM - edited 09-29-2022 11:41 AM
In an incremental refresh, for those tables that do not have the incremental logic, the full load logic will be used. Instead of creating the right incremental logic, a dummy logic can be added with the filter 1=0 so it won't do anything.
Here are some code examples:
For SQL based table:
SELECT <same as full load>
FROM <same as full load>
WHERE <same as full load>
AND 1=0
Add the last line.
SQL base MV, we can take the same approach as SQL based table.
For PySpark MV:
df = read("<current Schema>.<Current MV>")
df = df.filter("1=0")
save(df)
This approach is using the MV itself as the source and select no data from it.
The incremental logic will not bring any data and should run very fast during incremental refresh.