Data Hub/Materialized View question

It seems that Data Hub and Materialized View queries both use the compacted Parquet files, and that compaction starts after the incremental load has finished. Is there a recommended way to make sure the Materialized View and Data Hub queries contain the most current data, including what was just picked up in the incremental load?

This is what I am seeing:

  • Incremental load on Schema X happens on 12/4 and loads data from 12/3
  • Materialized Views and Data Hub queries (also part of Schema X) only pick up data from 12/2 and prior in the same incremental load
  • Compaction is triggered, and the next time Materialized Views and Data Hub queries run in the incremental load on 12/5, data from 12/3 will be picked up, but not data from 12/4

This is not ideal for me. I need the MVs and DH queries to have the new data that was picked up in the incremental load.

  • When Always Compact is on, the compaction process is triggered right after the extract process finishes. When Always Compact is off, Incorta currently triggers the compaction process based on the MVs that use the table. Incorta compares the timestamp of the extracted file with the timestamp of the compacted file to determine whether compaction is necessary.

    The compaction jobs are put in a queue and the resources available for running compaction jobs are controlled by Compaction Pool size and Java Compaction memory, both of which you can adjust in the admin UI.  

    Please note that Data Hub queries do not trigger the compaction process. If you plan to use Data Hub queries in your deployment, please switch Always Compact on.
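
    To make the timestamp comparison described above concrete, here is a minimal sketch in Python. It is not Incorta's code: the function name `needs_compaction`, its parameters, and the year in the example dates are all hypothetical, and the only thing taken from the reply is the idea that compaction is needed when the extracted file is newer than the compacted one.

```python
from datetime import datetime
from typing import Optional


def needs_compaction(extracted_at: datetime,
                     compacted_at: Optional[datetime]) -> bool:
    """Hypothetical check, not an Incorta API.

    Assumes compaction is needed when no compacted file exists yet,
    or when the extracted file is newer than the compacted file.
    """
    if compacted_at is None:
        return True
    return extracted_at > compacted_at


# Illustrative dates only (the year is arbitrary).
last_extract = datetime(2020, 12, 4)
last_compaction = datetime(2020, 12, 3)
stale = needs_compaction(last_extract, last_compaction)
```

    In this sketch `stale` is true, which corresponds to the situation in the question: the extract has moved ahead of the compacted files, so anything reading the compacted files lags behind until compaction runs.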

