cancel
Showing results for 
Search instead for 
Did you mean: 

Why are primary keys important?

We noticed that starting in 2023.7 we are getting error messages in the loader status, if the Enforce Primary Key Constraint option isn't selected with Primary keys setup in the table.  Screenshot below. 

JC_Data_Wizard_0-1696883505198.png

Someone on our team asked why we even need primary keys on the table. Outside of the deduplication it provides, I didn't really have a good answer. There isn't much on the importance of using primary keys in the documentation or the community. 

 

Outside of deduplication why do we need primary keys set up in the table? 

 

Thank you, 

John Cross

7 REPLIES 7

mhelmy
Employee
Employee

As mentioned by @RADSr, primary keys are crucial for ensuring the uniqueness of each row identifier in a table. They are particularly important for Incorta during the incremental data loading and deduplication process.

Previously, Incorta used to perform deduplication during both full and incremental loading jobs for tables. However, to optimize the load process, we stopped deduplicating data during full loads by default because most data sources already apply primary key constraints. Nevertheless, we require our clients to enforce the primary key constraint for data sources that don't enforce it.

Specific data sources, especially those storing data in a staging area with batches that might contain duplicates, necessitate enabling this flag to perform the deduplication during the full load and the incremental load. We encountered cases where customers didn't enforce the deduplication process during full loads for such data sources. Hence, we introduced this check to ensure the accuracy of join calculations.
Also, the logic of some MVs might return duplicates, which is why it is common to see issues like this with the MVs with keys identified after the upgrade. Enabling the flag (Enforce Primary Key Constraint) and then applying the loading from staging would fix the issue.

Thanks @mhelmy  - and just to confirm, there's no query performance hit/gain associated with the key designation? 

-- IncortaOne@PMsquare.com --

@RADSr 

Having keys on tables introduces additional steps during the load process. However, they are necessary for the deduplication step. As for query performance, keys do not have an impact.