There are two use cases here:
1) For sources like Siebel that support soft deletes (typically a binary-valued column indicating whether the row is deleted or not), we handle deletes by applying a table security filter on that field.
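For example, a soft-delete security filter on a Siebel table might look like the following (the schema, table, and column names here are hypothetical, and the actual flag values depend on your source):

```
Siebel.S_ORDER.DELETED_FLG = 'N'
```

Rows whose flag is 'Y' would then never appear in dashboards built on that table.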
2) For sources like E-Business Suite that do not support soft deletes, here is one way to handle it in Incorta:
- Define a slim table that contains just the key fields of the table for which we want to support deletes.
- Always do a full load of this keys table.
- Make this keys table a parent of the base table by creating a child join from the base table to the keys table.
- Apply a table security filter on the base table of the form "a field of the parent table is not null".
- Sample security filter on the base table using the parent table field: not(isNull(Test.parent_table_with_keys.id))
This way the deleted rows from the base fact will always be filtered out of the dashboards. If required, you can also schedule a periodic full load (for example, every week) to flush the deleted records out of the base table.
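Conceptually, the security filter gives the child join inner-join semantics: only base rows that still have a matching key in the freshly full-loaded keys table survive. In SQL terms (table and column names are hypothetical):

```sql
-- Rows deleted in the source have no match in the full-loaded keys table,
-- so they drop out of the result.
SELECT b.*
FROM base_table b
JOIN parent_table_with_keys k
  ON k.id = b.id;
```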
Can a security filter be applied to a table that is not performance optimized? If so, will the filter be applied when a Spark MV selects data from that table?
In my use case, the base table I need to apply the filter to is not performance optimized, and the data from it is selected into several MVs using Spark. If Spark ignores the security filter (I suspect it will, since it works directly with the underlying parquet files), then this approach won't work for me.
There is a similar method for handling deleted rows that is faster because it only requires incremental loads on the source keys table. You will need the ability to create a new table and a trigger in the source database. The method is as follows:
1) Create a table in the source database with just the key column(s) of the table you need to handle deletes for. Add a Deleted_On column with a timestamp data type.
2) Create an after-delete trigger on the source table that writes out the key(s) to the keys table created above, setting the Deleted_On column to the current timestamp.
3) Create a table in Incorta for this narrow keys table in the source database. Set up incremental loads on this table.
4) Create a join as mentioned above where the keys table is the parent table.
5) Create a runtime security filter as mentioned above, except the logic is reversed. In this case it should be "a field in the keys table is null" (i.e. an isNull() check on the parent keys-table field), since a row that does match a key in this table is one that was deleted in the source.
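Steps 1 and 2 might look like the following in an Oracle-style source database (all table and column names here are hypothetical, not actual E-Business Suite objects):

```sql
-- Narrow keys table holding only the key and a deletion timestamp.
CREATE TABLE orders_deleted_keys (
  order_id   NUMBER    NOT NULL,
  deleted_on TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

-- After-delete row trigger on the source table: writes out the deleted key.
CREATE OR REPLACE TRIGGER orders_after_delete
AFTER DELETE ON source_orders
FOR EACH ROW
BEGIN
  INSERT INTO orders_deleted_keys (order_id, deleted_on)
  VALUES (:OLD.order_id, SYSTIMESTAMP);
END;
/
```

The Deleted_On timestamp is what lets Incorta run incremental loads against this keys table.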
This way the deleted rows will be filtered out of the base table, but you only need to do incremental loads on the keys table, so the loads can be scheduled more frequently. You may want to put the keys table in a separate schema so that you can run its loads on a separate schedule. Also, the keys table in the source database can be truncated periodically to keep it small. The base table can then be scheduled for full loads on a weekly or monthly basis during off-business hours to physically remove the deleted rows from the base table.