Solved: Re: Incremental Loads

kxie · ‎09-28-2022

Is it possible to set a non-incremental table to load from staging (rather than performing a full load) when an incremental load is run?

KailaT · ‎09-29-2022

Hi @kxie ,

Is there a particular use case for why you would want to do this? Typically we only run load from staging manually.

kxie · ‎09-29-2022

I currently have tables grouped together for security/access that had full loads at the end of the day. However, due to a change in requirements, incremental refreshes are being done throughout the day. For the more complex python MV's creating incremental load logic is complicated, and until that logic can be developed and validated, I'm looking for some way to shorten the schema's overall incremental load time. I did consider moving the MV's to another schema, but that would require a significant refactor.

dylanwan · ‎09-29-2022

In an incremental refresh, for those tables that do not have the incremental logic, the full load logic will be used. Instead of creating the right incremental logic, a dummy logic can be added with the filter 1=0 so it won't do anything.

Here are some code examples:
For SQL based table:

SELECT <same as full load>
FROM <same as full load>
WHERE <same as full load>
AND 1=0

Add the last line.

SQL base MV, we can take the same approach as SQL based table.

For PySpark MV:

df = read("<current Schema>.<Current MV>")
df = df.filter("1=0")
save(df)

This approach is using the MV itself as the source and select no data from it.

The incremental logic will not bring any data and should run very fast during incremental refresh.