Incorta has a wide variety of external data source connectors. A connector specifies how Incorta connects to an external system or application, ingests data, and publishes to destinations. Incorta includes many out-of-the-box connectors for data ingestion; please refer to this page for more details on the types of connectors supported.
We recommend that you be familiar with these Incorta concepts before exploring this topic further.
These concepts apply to all releases of Incorta.
The data agent (DA) is a lightweight application that runs on a server that has access to the source database, on premises or in the cloud, and makes an outbound connection to the customer's Incorta Cloud instance. Please ensure that the data agent is placed on the same server, or in the same network, as the server on which the source database is running. Please refer to this article on how to install and configure a data agent.
Data lakes usually receive updates in the form of new increment files added to an existing directory. Incorta's data lake connector leverages that via the “Wildcard Union” option.
Use the "Wildcard Union" option and the "Last Successful Extract Time" incremental strategy.
Now when the user triggers a full load, the union of all existing files will be extracted into the same table. After that, if the directory receives a new file, for example /path/to/sales/sales_ohio.parquet, the next incremental load will pick up this file because its last-modified timestamp is more recent than those of the files extracted in the previous full load.
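The selection rule behind this strategy can be sketched in a few lines of Python. This is only an illustration of the idea, not Incorta's implementation: the function name `files_modified_since` and the use of epoch seconds for the last extract time are assumptions made for the example.

```python
from pathlib import Path


def files_modified_since(directory, last_extract_time):
    """Return the names of files in `directory` modified after `last_extract_time`.

    `last_extract_time` is given in epoch seconds (an assumption of this sketch).
    Passing 0 behaves like a full load: every file is selected.
    """
    selected = []
    for entry in Path(directory).iterdir():
        # Only regular files are candidates; sub-directories are ignored here.
        if entry.is_file() and entry.stat().st_mtime > last_extract_time:
            selected.append(entry.name)
    return sorted(selected)
```

Under these assumptions, a file that arrives after the previous load has a later modification time than the recorded extract time, so it is the only one returned on the next incremental run.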
Use the "Wildcard Union" option and the "Timestamp in File Name" incremental strategy.
Now when the user triggers a full load, the union of all existing files will be extracted into the same table. After that, if the directory receives a new file, for example /path/to/sales/sales_2020-04-04.parquet, the next incremental load will pick up this file because the date part in its name is more recent than that of the latest file previously loaded.
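As a rough illustration of this comparison, the sketch below extracts a date from each file name and keeps only the files dated after the latest one already loaded. The `YYYY-MM-DD` pattern and the helper names `date_from_name` and `newer_files` are assumptions for the example; the connector's actual pattern configuration may differ.

```python
import re
from datetime import date

# Assumed naming convention for this sketch: a YYYY-MM-DD date somewhere in the name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_name(file_name):
    """Return the date embedded in `file_name`, or None if there is no valid date."""
    match = DATE_IN_NAME.search(file_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def newer_files(file_names, latest_loaded):
    """Return the files whose embedded date is strictly after `latest_loaded`."""
    selected = []
    for name in file_names:
        stamp = date_from_name(name)
        if stamp is not None and stamp > latest_loaded:
            selected.append(name)
    return sorted(selected)
```

For instance, with `latest_loaded` set to April 3, 2020, a file named sales_2020-04-04.parquet is selected, while files with earlier dates or no date in their names are skipped.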
If the directory contains heterogeneous files and you want to pick up only files that start with a certain prefix, you can use the Include field to define a pattern such as sales*.parquet. With this pattern, load events consider only files whose names start with the word sales and end with .parquet.
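To see which names a pattern like this would select, the following sketch uses Python's standard `fnmatch` module. It assumes the Include field follows ordinary glob semantics, where `*` matches any run of characters (including none); the function name `apply_include` is hypothetical.

```python
from fnmatch import fnmatchcase


def apply_include(file_names, include_pattern):
    """Keep only the names that match a glob-style include pattern (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, include_pattern)]


candidates = [
    "sales_ohio.parquet",
    "returns_ohio.parquet",  # wrong prefix
    "sales_notes.txt",       # wrong extension
    "sales.parquet",         # '*' may match an empty string
]
included = apply_include(candidates, "sales*.parquet")
```

Here `included` contains only sales_ohio.parquet and sales.parquet; the other two names fail either the prefix or the extension part of the pattern.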
If the directory contains an arbitrary hierarchy of sub-directories and you would like to load all files within the hierarchy, enable the Include Sub-Directories flag. The Include pattern, if specified, still applies to files found in sub-directories.
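The combined effect of the flag and the pattern can be approximated with `pathlib`, as in the sketch below. It is an illustration under the assumption that the pattern is matched against file names at every level of the hierarchy; `find_files` and its parameters are hypothetical names, not part of Incorta.

```python
from pathlib import Path


def find_files(root, include="*", include_subdirectories=False):
    """List files under `root` that match `include`, as paths relative to `root`.

    With `include_subdirectories=True` the whole hierarchy is searched;
    otherwise only the top-level directory is considered.
    """
    base = Path(root)
    matches = base.rglob(include) if include_subdirectories else base.glob(include)
    # Directories can match the pattern too, so keep regular files only.
    return sorted(p.relative_to(base).as_posix() for p in matches if p.is_file())
```

Calling `find_files("/path/to/sales", "sales*.parquet", include_subdirectories=True)` would return matching files at any depth, while leaving the flag off restricts the result to the top-level directory.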