When an incremental refresh is executed, Incorta always does a UPSERT to find the matched ID and update it and insert the row if the ID is not found. You cannot clear the data from one data set and run a full load only for that data set but still run incremental on the other. The answer is No to your question.
However, we may want to run a full load only on a data set for one of these reasons:
- No change on the data set, for example, it include the legacy data migrated from an old source or an old data warehouse.
- No incremental logic can be defined
- The table may be refreshed entirely in the data source which may have deleted data
- The table in the data source allows delete
One of the approaches can be used:
- If the data source will not be updated, we can still have an incremental logic that return no data by including a SQL statement with the 1=0 filter.
- If there is no typical incremental logic can be defined, such as no LAST_UPDATE_DATE from the table, consider separate the static data from the data may be updated. Perform a full load in a separate table with the data may be changed, for example, the LAST two periods of the data. Perform a one time only full load with dummy incremental logic for the data which will never changed. Merge these two data sets using a MV with UNION with full load only.
- For the table that may be refreshed entirely in the source system. We can create a separate table to perform full load logic in each incremental refresh.
- For the table that may have the deleted data, use the above 2 or 3.
Once you handle the logic for the full load only data set in a separate table. Merge the data with the table with Incremental logic.
To merge two tables into one table, here are the options:
Merge these two data sets using a MV with UNION or UNION ALL with full load only. MV runs faster in that it can avoid re-extracting data. Full load only MV bypass the compaction process.
Use multiple data sets with full load only. For data which is available in Incorta, we can perform a Incorta over Incorta SQL by defining a postgreSQL data source that link to Incorta 5436 or 5442.
To add to to Dylan's points:
In case of data source as Files, You can specify different files for insert and update. When full load happens, the file specified
for insert will be used. And for incremental, the file specified for update will be used.
This way you can have two different data source files.
I use different tables in my full and incremental load because we have a table of daily activity and a table of history.
We pull from the daily activity table every day for an incremental load, but if we ever had to do a full load and start over we'd pull everything from the history table.
Very important: these two tables have exactly the same schema (fields and data types).
Mike, I believe this should work as long as the 2 SQLs use the same Data-Source itself.
Incorta-Table using Data-Source OracleDBTest
Full load SQL:
select columnA, ColumnB from Table A (history)
Incremental load SQL:
select columnA, ColumnB from Table B (daily) WHERE created_date > ?
History and Daily Table must exist in the same data source.
Note: if they exist in a different data source, then this can also be accomplished but only in a Materialized View.