Incremental load from Data Lake
I am loading data from Azure Data Lake Storage (ADLS) using a Data Lake connection in Incorta. When I enable incremental load for this table, how does Incorta find the incremental data? I do not see any additional placeholder for defining the incremental load, as we get when data is loaded from Oracle tables.
When the wildcard union option is turned on (that is, you want all the files under a single directory to be included in the table), incremental load is based on each file's last-modified timestamp. For example, if you start with a directory that contains 5 files, the first full load extracts all 5 files. The next incremental load then looks for files whose last-modified timestamp is more recent than the time of the last load. This includes new files added to the directory as well as existing files that were modified after they were last loaded.
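To make the selection rule concrete, here is a minimal Python sketch of the logic described above. It only illustrates the rule and is not Incorta's actual implementation. The names LakeFile and select_files_to_load, and the sample file names and dates, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LakeFile:
    """Hypothetical stand-in for a file in the data lake directory."""
    name: str
    last_modified: datetime


def select_files_to_load(
    files: Iterable[LakeFile],
    last_load_time: Optional[datetime] = None,
) -> List[str]:
    """Return the names of the files a load would extract.

    Assumption: no last_load_time means a full load (every file).
    Otherwise only files modified strictly after the last load qualify.
    """
    if last_load_time is None:
        return [f.name for f in files]
    return [f.name for f in files if f.last_modified > last_load_time]


# Illustrative data only.
files = [
    LakeFile("orders_1.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    LakeFile("orders_2.csv", datetime(2024, 1, 5, tzinfo=timezone.utc)),  # modified after last load
    LakeFile("orders_3.csv", datetime(2024, 1, 6, tzinfo=timezone.utc)),  # added after last load
]
last_load = datetime(2024, 1, 3, tzinfo=timezone.utc)

print(select_files_to_load(files))             # full load: all three files
print(select_files_to_load(files, last_load))  # incremental: orders_2.csv and orders_3.csv
```

Note that a modified file is picked up again as a whole, because the rule looks only at the file's timestamp and not at individual rows.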
When the wildcard union option is turned off (that is, you want a single file to be included in the table), you have the option to provide an update file. In that case, only the first file is extracted in full load events, and only the update file is extracted in incremental load events.
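The single-file case can be sketched the same way. Again, this is only an illustration of the rule as described, not Incorta's code. The function name file_to_extract and the file names are hypothetical, and the sketch assumes an update file has been provided.

```python
def file_to_extract(base_file: str, update_file: str, is_full_load: bool) -> str:
    """Pick the file a load event would read when wildcard union is off.

    Assumption: full loads read only the base (first) file,
    and incremental loads read only the update file.
    """
    return base_file if is_full_load else update_file


# Illustrative file names only.
print(file_to_extract("orders.csv", "orders_update.csv", is_full_load=True))   # orders.csv
print(file_to_extract("orders.csv", "orders_update.csv", is_full_load=False))  # orders_update.csv
```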
Usually in a data lake setup, the first option is more convenient and better suits business use cases.