2

Snapshotting Data

If you need to create point in time reporting in Incorta, you may need to implement a snapshotting strategy.  Often the source application itself will track changes to its data and thus you can get a full historical view of changes to data over time by doing a full load from that source.  There are, however, many source systems that do not track changes in data over time.  Incorta can help in this regard by capturing snapshots of your data or of the changes in data at regular intervals and storing the history of change so that it can be reported on in Incorta even if it is not available in the source system itself.  There are a number of strategies that can be implemented depending on your reporting needs and the size of your data.

What you should know before reading this article

We recommend that you be familiar with these Incorta concepts before exploring this topic further.

These concepts apply to all releases of Incorta.

Dense snapshotting captures and stores full copies of the record set that requires change over time reporting each time the data is loaded.  Typically, the data load frequency is once per day, but it can be set to whatever frequency is required.  In order to differentiate each load of the data set, you need to add a date column that is populated with the load date and timestamp and make that column part of the key for the table.  Because a full copy of the data set is created with each incremental data load, this method builds up a lot of data over time and is not viable for large data sets.  It should only be used if there is no other alternative.  Note that you will need to account for as of date when reporting in dashboards with data from this table in order to limit the data to return values for the appropriate time frame.

You should turn off the ability to do a full load on the table and should only use incremental loads when employing dense snapshotting.  Also, you should back up the parquet file directory on a frequent basis as it is not possible to reconstruct the change history from the source system if there is a failure that causes you to lose your Incorta instance. 

Refer to this Community article for additional detail on how to implement Dense Snapshotting in Incorta: Creating Snapshot Tables with Incorta

Sparse snapshotting captures and stores copies of the records from a record set that requires change over time reporting each time the data is loaded only if there is a change to the record.  As such, it is a more efficient way to capture change over time than dense snapshotting.  Typically, the data load frequency is once per day, but it can be set to whatever frequency is required.  In order to differentiate copies of records from one another, you need to add a date column that is populated with the load date and timestamp and make that column part of the key for the table.  When you create dashboards using data from the table, you will need to account for as of date so that you only return the records that are current for the timeframe of the dashboard.

You should turn off the ability to do a full load on the table and should only use incremental loads when employing sparse snapshotting.  Also, you should back up the parquet file directory on a frequent basis as it is not possible to reconstruct the change history from the source system if there is a failure that causes you to lose your Incorta instance. 

Refer to this Community article for additional detail on how to implement Sparse Snapshotting in Incorta: Creating Snapshot Tables with Incorta

Where as snapshotting strategies focus on transactional data, slowly changing dimension (SCD) strategies focus on dimensional data.  There are two approaches to SCD supported in Incorta.

  1. When the source system tracks changes via an effective date and always inserts a new row on a change, then it is simply a matter of creating a >= join condition from the child transactional table(s) to the dimension table on the effective date in addition to the normal foreign key to primary key join used for the two tables.
  2. If the source system just updates dimension records without tracking changes, then in Incorta, logic (via a Materialized View normally) can be used to capture the changes in Incorta as new records that are differentiated with new effective date ranges. 

Refer to this Community article for additional detail on how to implement Slowly Changing Dimensions in Incorta: Slowly Changing Dimensions

Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular

Product Announcement

Incorta 5 is now Generally Available