Managing incremental data loads efficiently is crucial for keeping your data warehouse up-to-date without putting unnecessary strain on your data sources. Incorta has traditionally relied on a query-based approach using either the LAST_UPDATED_TIMESTAMP or the MAX_VALUE of a column to handle these loads. However, this method has limitations, particularly when these options are unavailable or insufficient.
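For context, a query-based incremental load typically filters the source table on a timestamp or monotonically increasing column and fetches only the rows changed since the last successful load. The following minimal sketch uses an in-memory SQLite table with illustrative names; it is not Incorta's internal query, just a demonstration of the pattern.

```python
import sqlite3

# Hypothetical source table; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, last_updated TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 100.0, "2025-02-10T09:00:00Z"),
        (2, 250.0, "2025-02-11T14:30:00Z"),  # modified after the last load
    ],
)

# Query-based incremental load: fetch only rows modified since the last run.
last_load_time = "2025-02-11T00:00:00Z"
changed_rows = conn.execute(
    "SELECT id, amount, last_updated FROM sales WHERE last_updated > ?",
    (last_load_time,),
).fetchall()
print(changed_rows)  # [(2, 250.0, '2025-02-11T14:30:00Z')]
```

The approach works well when a reliable last-updated timestamp or max-value column exists, which is exactly the limitation log-based incremental load removes.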
Log-based incremental load offers a comprehensive solution, enabling Incorta to support incremental data loading for insert, update, and delete events without requiring specific columns. This method also ensures that there is no performance impact on data sources during incremental data loads.
2024.1.x and up (Insert & Update events)
The source database must have transaction logging enabled, with logs continuously streamed to Kafka. The current implementation uses Debezium to consume these logs, extract change events, and publish them to Kafka topics.
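To illustrate the flow, the hedged sketch below consumes Debezium-style change events from a Kafka topic using the kafka-python client. The topic name, broker address, and envelope handling are assumptions for demonstration only and do not represent Incorta's internal consumer; in a real setup the connection details would come from the Kafka credentials configured in Incorta.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python (assumed client library)

consumer = KafkaConsumer(
    "dbserver1.inventory.orders",        # assumed Debezium topic: <server>.<schema>.<table>
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value or {}            # tombstone messages have no value
    payload = event.get("payload", event)  # envelope shape depends on converter settings
    op = payload.get("op")                 # 'c' = insert, 'u' = update, 'd' = delete
    if op in ("c", "u"):
        row = payload["after"]             # row image after the change
        print(f"upsert: {row}")
```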
1. Schema managers can enable 'Log-based Incremental Load' from the Data Source screen, allowing them to add the necessary Kafka credentials.
2. Schema managers can choose between the 'Query-based' and 'Log-based' incremental types from the dataset screen.
An upcoming enhancement will support delete operations using log-based incremental load. These operations will be handled as soft deletes, ensuring that deleted records are marked in Incorta.
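To give a sense of what soft deletes could look like, the sketch below applies Debezium-style events to an in-memory target and flags deleted rows instead of removing them. The `_is_deleted` column and the `apply_change` helper are hypothetical illustrations, not the actual behavior of the upcoming feature.

```python
# Soft-delete sketch: deleted records are marked, never physically removed.
def apply_change(target, payload):
    op = payload.get("op")
    if op == "d":
        key = payload["before"]["id"]            # deleted row image arrives in "before"
        if key in target:
            target[key]["_is_deleted"] = True    # mark the record as deleted
    elif op in ("c", "u"):
        row = dict(payload["after"], _is_deleted=False)
        target[row["id"]] = row

# Example: an insert followed by a delete for the same key.
target = {}
apply_change(target, {"op": "c", "after": {"id": 7, "amount": 42.0}})
apply_change(target, {"op": "d", "before": {"id": 7, "amount": 42.0}, "after": None})
print(target)  # {7: {'id': 7, 'amount': 42.0, '_is_deleted': True}}
```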