Remote tables are an Incorta feature that lets you access large Parquet, ORC, and CSV files in data lakes without loading them into Incorta memory. Duplicating such large files wastes disk space and consumes excessive memory, even though only a small portion of the data might be needed.
The following diagram shows a setup where:
- HDFS data lake contains large files
- Hive stores the metadata for the large files (there are external tables in Hive that point to the files)
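To make the Hive side of this setup concrete, the sketch below builds the kind of `CREATE EXTERNAL TABLE` statement that registers a data-lake file with Hive: the table metadata lives in the Hive metastore while the data itself stays in HDFS. The database, table, columns, and HDFS path here are hypothetical examples, not names from the original setup.

```python
def external_table_ddl(db: str, table: str, columns: dict, location: str) -> str:
    """Build Hive DDL for an external table whose data stays in the lake.

    `columns` maps column names to Hive types; `location` is the HDFS
    directory holding the Parquet files. All values are illustrative.
    """
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {db}.{table} (\n  {cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )

# Hypothetical example: a large web-logs file stored as Parquet in HDFS.
ddl = external_table_ddl(
    "lake", "web_logs",
    {"event_time": "TIMESTAMP", "user_id": "BIGINT", "url": "STRING"},
    "hdfs://namenode:8020/data/web_logs",
)
print(ddl)
```

Because the table is `EXTERNAL`, dropping it in Hive removes only the metadata; the underlying files remain in the data lake.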
For example, suppose you want to query a large file in place through Incorta from a BI tool, without loading the file into memory. Incorta's Hive connector is usually used as a SQL connector to bring data into Incorta, but it also has a remote mode: when you enable it, the connector provides Incorta with access information for the files instead of loading the data.
Perform the following steps to set up a remote table:
- Create a Hive data source and point it to your Hive instance
- Create a schema, either from the Schema Wizard or the Schema Editor, and populate it with tables
- Edit the table that points to a large file and enable the Remote toggle
- Save the change
After you mark one or more tables as remote, they become Performance Non-Optimized by default because they cannot be brought into Incorta memory. If you load a schema that contains a mix of regular and remote tables, only the regular tables are loaded into memory. You cannot create a dashboard based on a remote table.
With remote tables you can:
- Query them from external BI tools through the SQL interface.
- Create Materialized Views on top of remote tables. After you load a Materialized View into Incorta memory, you can see its results in a dashboard.
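As a sketch of the first option above, querying a remote table from an external tool through the SQL interface might look like the following. This assumes the SQL interface is reachable over a PostgreSQL-compatible endpoint; the host, port, credentials, and the `Sales.Orders` table are all hypothetical placeholders, not values from the original setup.

```python
def build_query(schema: str, table: str, limit: int = 10) -> str:
    """Build a simple SELECT against a table exposed via the SQL interface."""
    return f"SELECT * FROM {schema}.{table} LIMIT {limit}"


def query_remote_table(host: str, port: int, user: str, password: str,
                       schema: str, table: str):
    """Run a query against a live SQL-interface endpoint (sketch only).

    psycopg2 is assumed here because the endpoint is taken to be
    PostgreSQL-compatible; any PostgreSQL driver should work the same way.
    """
    import psycopg2  # assumed driver; requires a reachable server to run
    conn = psycopg2.connect(host=host, port=port, dbname="incorta",
                            user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute(build_query(schema, table))
            return cur.fetchall()
    finally:
        conn.close()


# The query text itself can be inspected without a live server:
print(build_query("Sales", "Orders"))
```

The same query text could be issued from any BI tool that speaks the SQL interface's protocol; the point is that the scan runs against the data-lake files in place rather than against data loaded into Incorta memory.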