
How does Incorta treat duplicate rows for load and update?

RADSr
Captain

If I have a table with keys defined and primary key enforcement turned on, how does Incorta ingest and process duplicate rows? First in wins? Last one in wins? Other?

If duplicates are found during an incremental update, does the existing Incorta record get updated in the same fashion? That is, if there are two duplicates in the incremental load, does the existing record get updated with the first "new" record in, or the last?

I think I understand that incremental loads create their own parquet file. Is the order of operations: 1) the full load creates a temp file, goes through compaction and dedup, and then creates the final file; 2) the incremental load creates a temp file, goes through compaction and dedup, and creates its final file; and then 3) the original and subsequent incremental files are read into memory with further dedup in chronological/file order?


-- IncortaOne@PMsquare.com --
1 REPLY

amit_kothari
Employee

In compaction, the most recent record for each key gets selected. So for the first question, the last one in wins. For the second, the last duplicate in the incremental load is the one picked.

For the order of operations:
Increments are scanned in order, and duplicates are marked. Once the scan is finished, we remove the duplicates by rewriting files if needed.
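
To make the "last one wins" rule concrete, here is a minimal Python sketch of that scan-mark-rewrite sequence. It is only an illustration under stated assumptions, not Incorta's actual code: the `compact` function, the in-memory lists standing in for parquet files, and the `id`/`v` columns are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]

def compact(files: list[list[Row]], key: str) -> list[list[Row]]:
    """Hypothetical sketch: scan files oldest to newest, keep the latest row per key."""
    # Pass 1: scan in order and remember where each key was last seen.
    # Any earlier row with the same key is thereby "marked" as a duplicate.
    last_seen: dict[Any, tuple[int, int]] = {}
    for f_idx, rows in enumerate(files):
        for r_idx, row in enumerate(rows):
            last_seen[row[key]] = (f_idx, r_idx)  # later rows overwrite earlier ones

    # Pass 2: rewrite each file, dropping the rows marked as duplicates.
    return [
        [row for r_idx, row in enumerate(rows) if last_seen[row[key]] == (f_idx, r_idx)]
        for f_idx, rows in enumerate(files)
    ]

# Stand-ins for a full-load file and one incremental file (made-up data).
full_load = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
increment = [{"id": 2, "v": "d"}, {"id": 2, "v": "e"}, {"id": 3, "v": "f"}]

result = compact([full_load, increment], key="id")
print(result)
```

In this sketch, key 1 keeps its last row from the full load, while key 2 is superseded by the last of the two duplicates in the increment, which matches the behavior described for both questions.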