Incremental loads - when and how much does it negatively impact runtime queries?

RADSr
Captain

Before anyone types "it depends" know that I'm a consultant and claim that answer for myself  😉 

I am interested to know if there are any measurements which can help me determine a strategy for doing incremental builds as my routine, and only doing full loads as runtime performance dictates.  I'm predominantly on Incorta cloud so kind of don't care if I clutter up a directory w/ 1 or 1000 files, but do care if I watch the "wheel spin" when I open dashboards.

I don't know enough to know if this should be based on number of parquet files, size of the files, a combination of both, or some other factor(s).  

Hoping the fine folks in product development, support, or services have done some benchmarking and can provide some guidance.  

-- IncortaOne@PMsquare.com --
1 REPLY

dylanwan
Employee

The incremental logic is executed by the source database. If you just add a simple WHERE clause like "LAST_UPDATE_DATE > ?", it may be fine. But if the source extraction query involves multiple tables and you use OR to connect multiple filters, like

WHERE TB1.LAST_UPDATE_DATE > ?
OR TB2.LAST_UPDATE_DATE > ?
OR TB3.LAST_UPDATE_DATE > ?

then the database may not perform well on such a query, and the incremental load can end up having a negative impact on extraction time.
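One common workaround for OR-connected filters is to rewrite the query as a UNION of single-filter branches, so the database can use an index on each table's date column independently. A minimal sketch using SQLite and made-up table names (TB1/TB2/TB3 as in the example above; the schema and data are illustrative, not Incorta-specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE tb1 (id INTEGER PRIMARY KEY, last_update_date TEXT);
    CREATE TABLE tb2 (id INTEGER, last_update_date TEXT);
    CREATE TABLE tb3 (id INTEGER, last_update_date TEXT);
    CREATE INDEX i1 ON tb1(last_update_date);
    CREATE INDEX i2 ON tb2(last_update_date);
    CREATE INDEX i3 ON tb3(last_update_date);
    INSERT INTO tb1 VALUES (1,'2024-01-01'), (2,'2024-03-01');
    INSERT INTO tb2 VALUES (1,'2024-03-05'), (2,'2024-01-02');
    INSERT INTO tb3 VALUES (1,'2024-01-03'), (2,'2024-01-04');
""")
cutoff = ("2024-02-01",) * 3

# OR-connected filters: the optimizer often cannot pick one index
# and falls back to scanning the whole join.
or_sql = """
    SELECT tb1.id FROM tb1
    JOIN tb2 ON tb2.id = tb1.id
    JOIN tb3 ON tb3.id = tb1.id
    WHERE tb1.last_update_date > ?
       OR tb2.last_update_date > ?
       OR tb3.last_update_date > ?
"""

# Equivalent UNION: each branch has a single filter that can use its
# own index; UNION deduplicates ids that changed in more than one table.
union_sql = """
    SELECT tb1.id FROM tb1
    JOIN tb2 ON tb2.id = tb1.id
    JOIN tb3 ON tb3.id = tb1.id
    WHERE tb1.last_update_date > ?
    UNION
    SELECT tb1.id FROM tb1
    JOIN tb2 ON tb2.id = tb1.id
    JOIN tb3 ON tb3.id = tb1.id
    WHERE tb2.last_update_date > ?
    UNION
    SELECT tb1.id FROM tb1
    JOIN tb2 ON tb2.id = tb1.id
    JOIN tb3 ON tb3.id = tb1.id
    WHERE tb3.last_update_date > ?
"""

or_rows = sorted(r[0] for r in cur.execute(or_sql, cutoff))
union_rows = sorted(r[0] for r in cur.execute(union_sql, cutoff))
print(or_rows, union_rows)  # both [1, 2]
```

Whether the UNION form actually wins depends on the source database's optimizer, so check the execution plan before committing to it.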

When an incremental refresh would extract a large volume of data that then needs to be merged, Incorta may actually perform better with a full refresh.

The incremental logic can also fragment the table: when the refresh is scheduled at a very high frequency, it produces many small parquet files. That has a negative impact on downstream processing, so a parquet merge (compaction) tool may be needed.