Data Hub/Materialized View question
It seems that Data Hub and Materialized View queries both read the compacted parquet files, and that compaction starts only after the incremental load has finished. Is there a recommended way to make sure the Materialized View and Data Hub queries include the most current data, which was just picked up in the same incremental load?
This is what I am seeing:
- Incremental Load on Schema X happens 12/4 and loads data from 12/3
- Materialized views and Data Hub queries (also part of Schema X) only pick up data from 12/2 and prior in the same incremental load
- Compaction is triggered, and the next time Materialized Views and Data Hub queries run in the incremental load on 12/5, data from 12/3 will be picked up, but not 12/4
This is not ideal for me: I need the MVs and DH queries to include the new data that was picked up in the incremental load.
When Always Compact is on, compaction is triggered as soon as the extract process finishes. When Always Compact is off, Incorta currently triggers compaction based on the MVs that use the table. Incorta compares the timestamp of the extracted file with the timestamp of the compacted file to determine whether compaction is necessary.
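To illustrate the timestamp comparison described above, here is a minimal Python sketch. It is not Incorta's code: the function name `needs_compaction`, its parameters, and the example year are hypothetical. It assumes only what the reply states, namely that compaction is needed when the extracted file is newer than the compacted one.

```python
from datetime import datetime
from typing import Optional


def needs_compaction(extracted_at: datetime,
                     compacted_at: Optional[datetime]) -> bool:
    """Hypothetical check: compact only if the extract is newer.

    extracted_at -- timestamp of the latest extracted parquet file
    compacted_at -- timestamp of the compacted file, or None if the
                    table has never been compacted (assumption)
    """
    if compacted_at is None:
        # Nothing compacted yet, so there is nothing for an MV to read.
        return True
    return extracted_at > compacted_at


# Example mirroring the thread: extract ran on 12/4, the last compaction
# ran on 12/3 (the year is arbitrary, chosen only for illustration).
extract_time = datetime(2023, 12, 4, 2, 0)
compaction_time = datetime(2023, 12, 3, 3, 0)
stale = needs_compaction(extract_time, compaction_time)
```

In this sketch `stale` is true, which corresponds to the situation in the question: the compacted files lag one load behind the extracted ones until compaction runs again.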
The compaction jobs are put in a queue and the resources available for running compaction jobs are controlled by Compaction Pool size and Java Compaction memory, both of which you can adjust in the admin UI.
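As a rough analogy for how a pool size limits queued compaction jobs, the sketch below uses Python's standard `ThreadPoolExecutor`. It is only an illustration of a bounded worker pool: `run_compaction_jobs`, `pool_size`, and the table names are hypothetical and do not reflect how Incorta implements its queue or what the admin UI settings map to internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_compaction_jobs(tables, pool_size):
    """Run one simulated compaction job per table with a bounded pool.

    Returns the tables in submission order and the highest number of
    jobs that were observed running at the same time.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def compact(table):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for the real compaction work
        with lock:
            active -= 1
        return table

    # Jobs beyond pool_size wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        finished = list(pool.map(compact, tables))
    return finished, peak


finished, peak = run_compaction_jobs(["t1", "t2", "t3", "t4", "t5"], pool_size=2)
```

With a pool of two workers, no more than two jobs run at once and the rest wait their turn, so a larger pool shortens the wait before MVs see fresh compacted files, at the cost of more resources.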
Please note that a Data Hub query will not trigger the compaction process. If you plan to use Data Hub queries in your deployment, please switch Always Compact on.