10-09-2023 01:37 PM
We noticed that starting in 2023.7 we are getting error messages in the loader status, if the Enforce Primary Key Constraint option isn't selected with Primary keys setup in the table. Screenshot below.
Someone on our team asked why we even need primary keys on the table. Outside of the deduplication it provides, I didn't really have a good answer. There isn't much on the importance of using primary keys in the documentation or the community.
Outside of deduplication why do we need primary keys set up in the table?
Thank you,
John Cross
Solved! Go to Solution.
10-12-2023 10:13 AM
As mentioned by @RADSr, primary keys are crucial for ensuring the uniqueness of each row identifier in a table. They are particularly important for Incorta during the incremental data loading and deduplication process.
Previously, Incorta used to perform deduplication during both full and incremental loading jobs for tables. However, to optimize the load process, we stopped deduplicating data during full loads by default because most data sources already apply primary key constraints. Nevertheless, we require our clients to enforce the primary key constraint for data sources that don't enforce it.
Specific data sources, especially those storing data in a staging area with batches that might contain duplicates, necessitate enabling this flag to perform the deduplication during the full load and the incremental load. We encountered cases where customers didn't enforce the deduplication process during full loads for such data sources. Hence, we introduced this check to ensure the accuracy of join calculations.
Also, the logic of some MVs might return duplicates, which is why it is common to see issues like this with the MVs with keys identified after the upgrade. Enabling the flag (Enforce Primary Key Constraint) and then applying the loading from staging would fix the issue.
10-12-2023 11:41 AM
Thanks @mhelmy - and just to confirm, there's no query performance hit/gain associated with the key designation?
10-19-2023 08:45 AM
Having keys on tables introduces additional steps during the load process. However, they are necessary for the deduplication step. As for query performance, keys do not have an impact.