Introduction
Managing incremental data loads efficiently is crucial for keeping your data warehouse up to date without putting unnecessary strain on your data sources. Incorta has traditionally relied on a query-based approach, using either the LAST_UPDATED_TIMESTAMP or the MAX_VALUE of a column to identify changed rows. This approach has limitations, particularly when no suitable timestamp or ever-increasing column is available on the source table.
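To make the limitation concrete, here is a minimal sketch of the query-based approach, using SQLite and a hypothetical orders table with a last_updated column. Each load can only fetch rows past the previous watermark, which is why a usable timestamp column is a hard requirement:

```python
# Sketch of the traditional query-based incremental load.
# Table and column names ("orders", "last_updated") are hypothetical.
import sqlite3

def incremental_extract(conn, last_watermark):
    """Fetch only the rows that changed since the previous load."""
    rows = conn.execute(
        "SELECT id, amount, last_updated FROM orders WHERE last_updated > ?",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the latest timestamp seen in this batch.
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, last_updated TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 9.99, '2025-02-11T10:00:00')")
rows, watermark = incremental_extract(conn, "2025-02-01T00:00:00")  # returns the one row
```

Note that a query like this can never observe deletes, and every incremental load still runs against the source database.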
Solution
Log-based incremental load addresses these limitations by reading change events directly from the database transaction log. This enables Incorta to load insert and update events incrementally without needing dedicated timestamp or ID columns (support for delete events is planned; see Future Enhancements below). Because changes are read from the log rather than queried from tables, there is no performance impact on the data source during incremental loads.
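For context, a Debezium change event carries the row's state before and after the change plus an op code, which is how inserts and updates are identified without any timestamp or max-value column. A simplified update event (field names follow Debezium's envelope; values are illustrative):

```python
# Simplified Debezium change-event envelope (values are illustrative).
# op codes: "c" = insert (create), "u" = update, "d" = delete, "r" = snapshot read.
update_event = {
    "payload": {
        "before": {"id": 42, "status": "PENDING"},  # row state before the change
        "after":  {"id": 42, "status": "SHIPPED"},  # row state after the change
        "op": "u",                                  # this event is an update
        "source": {"table": "orders", "ts_ms": 1739350800000},
    }
}
```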
Applies to
2024.1.x and up (Insert & Update events)
Pre-requisites
The source database must have transaction logging enabled, with logs continuously streamed to Kafka. The current implementation uses Debezium to consume these logs, extract change events, and publish them to Kafka topics.
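As a rough illustration of that setup, the snippet below registers a Debezium MySQL connector through the Kafka Connect REST API. All hostnames, credentials, and table names are placeholders; the exact configuration options vary by source database, so consult the Debezium documentation for your connector:

```python
# Sketch: register a Debezium MySQL connector with Kafka Connect.
# Every host, credential, and table name below is a placeholder.
import json
import requests

connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.com",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "secret",
        "database.server.id": "184054",
        "topic.prefix": "srcdb",  # change events land on srcdb.<schema>.<table>
        "table.include.list": "inventory.orders",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory",
    },
}

resp = requests.post(
    "http://connect.example.com:8083/connectors",  # Kafka Connect REST endpoint
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```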
How to enable this feature:
1. Schema managers can enable 'Log-based Incremental Load' from the Data Source screen, where they provide the necessary Kafka credentials.
2. Schema managers can then choose between the 'Query-based' and 'Log-based' incremental types on the dataset screen.
Supported Databases:
- Oracle
- Microsoft SQL Server
- PostgreSQL
- MySQL
Note:
- Queries, Database Views, and Materialized Views (MVs) do not support log-based CDC.
- A warning message is displayed if a table does not have a key: without one, log-based incremental load might capture the same record multiple times (see the sketch after this list).
- Delete transactions are currently ignored; see Future Enhancements below.
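The sketch below illustrates both notes: change events are applied to the target keyed by primary key, so repeated captures of the same row collapse into one record, and delete events are skipped. It assumes the kafka-python package and the placeholder topic from the earlier connector sketch; it is not Incorta's actual loader.

```python
# Sketch: apply Debezium change events from Kafka to a keyed target.
# Assumes the kafka-python package; topic and field names are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "srcdb.inventory.orders",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
)

target = {}  # stand-in for the warehouse table, keyed by primary key

for message in consumer:
    if message.value is None:
        continue  # Kafka tombstone record, no event payload
    # With the JSON converter, the event may be wrapped in a "payload" envelope.
    event = message.value.get("payload", message.value)
    if event["op"] == "d":
        continue  # delete events are currently ignored (see Note above)
    row = event["after"]
    # Upserting by primary key means a row changed several times between
    # loads lands exactly once; without a key, each change event would
    # append a duplicate record.
    target[row["id"]] = row
```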
Future Enhancements
An upcoming enhancement will add support for delete operations in log-based incremental load. Deletes will be handled as soft deletes: deleted records are marked in Incorta rather than physically removed.
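As a rough sketch of that planned behavior (the _deleted flag and function below are hypothetical, not Incorta's actual schema), a soft delete flags the record rather than removing it:

```python
# Speculative sketch of soft-delete handling, based only on the planned
# behavior described above: a delete event flags the row, never removes it.
def apply_event(target, event):
    if event["op"] == "d":
        key = event["before"]["id"]         # deleted row's key comes from "before"
        if key in target:
            target[key]["_deleted"] = True  # mark as deleted, keep the record
    else:
        row = dict(event["after"])
        row["_deleted"] = False
        target[row["id"]] = row
```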