
Deleting rows w/o full load
Is there a way to delete unwanted rows from a table without performing a full schema load? I have a large schema that is incrementally populated, and the source data is systematically deleted after it has been incrementally loaded into Incorta. So performing a full load on the schema is not an option. I've found ~750,000 rows (out of 171 million total) that need to be deleted. Is there a way to delete them from the table but retain all the rest of the data?
-
There are two scenarios with Deleted records:
1. We have a way in the source system to identify deleted records. This could be done using a tool, like GoldenGate, which goes over the database logs, and create a file with all updated/inserted/deleted records, with the appropriate flag for each one. In this case we create a small table DELETED_ROWS(ROW_ID).
2. We don't have a way to identify deleted records. In this case we create a skinny table CURRENT_ROWS(ROW_ID) which contains all the primary IDs of records in the main table. This table needs to be refreshed frequently with a Full load. But given we are extracting only one column (or as many columns as constitute the primary key), refreshing this table should not take much time.
Next we build a join between our main table and either one of these tables on ROW_ID. The new table (DELETED_ROWS or CURRENT_ROWS) will be the parent in this join obviously, and our transaction table will be the child.
Next in our Transaction table we add a security filter to eliminate the deleted rows from any query on our transaction table.
3. In the first case (where we have a DELETED_ROWS table), our filter would be : DELETE_ROWS.ROW_ID is NULL
4. In the second case (where we have a CURRENT_ROWS table), our filter would be : CURRENT_ROWS.ROW_ID is not NULL
-
See this thread towards the bottom for an additional method of handling deletes. https://community.incorta.com/t/x1rarf/how-does-incorta-handle-source-deletions-of-rows