Solved: Re: Incorta table load/query timing

RADSr · 06-02-2023

Doing some testing. I have two tables from the same source - they should be mutually exclusive result sets, but may not be.

I created a third table with two SQLi-based datasets that query the first two tables, and set the key columns to eliminate any duplicate rows.

*All three are currently in the same schema, and I've modified the load order so the two "source" tables are in load-order group 1 and the composite table is in group 2.*

Can I reliably expect that the two source tables will run to completion first and that the composite table will then query those results? IOW, when do Incorta tables (parquet files) get locked/sequestered for querying during a load?

mhelmy · 06-05-2023

Hello @RADSr

What version of Incorta are you currently using, and what is the type of Incorta table that has been created for the third table?

If you only need the union of both tables, using a single table with two Datasets would be a good option in your case. You can set the key column(s), and enable the Enforce Primary Key Constraint to remove duplicates.
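A rough way to picture what this setup does: the table takes the union of its two datasets and keeps one row per key. The plain-Python sketch below is only a conceptual model, not Incorta code. The row layout, the hypothetical `union_dedup` helper and its `key_cols` parameter, and the rule that the first row seen for a key wins are all assumptions for illustration, since this thread doesn't say which duplicate Incorta keeps.

```python
from typing import Iterable

Row = dict[str, object]


def union_dedup(datasets: Iterable[list[Row]], key_cols: tuple[str, ...]) -> list[Row]:
    """Union several datasets and keep one row per key.

    Hypothetical model of a multi-dataset table with key columns and
    primary-key enforcement; this is not an Incorta API.
    """
    seen: dict[tuple, Row] = {}
    for rows in datasets:
        for row in rows:
            key = tuple(row[c] for c in key_cols)
            # Assumption: the first occurrence of a key is kept; later duplicates are dropped.
            seen.setdefault(key, row)
    # dicts preserve insertion order, so rows come back in first-seen order
    return list(seen.values())


source_a = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
source_b = [{"id": 2, "amt": 20}, {"id": 3, "amt": 30}]  # id 2 overlaps with source_a

combined = union_dedup([source_a, source_b], key_cols=("id",))
print([r["id"] for r in combined])  # [1, 2, 3]
```

Keying on a tuple of column values lets the same helper handle composite keys, e.g. `key_cols=("org_id", "invoice_id")`.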

RADSr · 06-05-2023

Yep - that's what I'm doing.

Since they're in the same schema, just in different load-order groups, I want to make sure the direct-from-source tables load first and that the "two dataset" table reads from the newly loaded version of the original two.

I'm on 2023.4.0. What you describe is what I'm doing, with the multiple sources being Incorta tables accessed via SQLi.

I'm not using direct-to-ERP sources because the SQL takes exponentially longer to execute as the data volume grows.

In database terms, I'm wondering when the SQLi "tables" (i.e., the parquet files) get locked for reading by the multi-source table: at the beginning of the load plan, or when the table itself leaves the queue for extract?

This is going to get way more complex for me fairly quickly, so the better my foundational knowledge of how Incorta works, the better the plan I'll be able to craft for the "way more complex" version.

mhelmy · 06-08-2023

If you add the source tables and the SQLi table to the same schema, the latest version of the data may not be retrieved, even when using load order. With the Incorta port, the SQLi table will not yet be reading the recently extracted version, and even with the Spark port, the deduplicated version may not be available yet.

Please add the SQLi table to a separate schema and load that schema after the first one to get the most recent data.
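To see why a separate schema helps, here is a toy model of the behavior described above. It assumes, purely for illustration, that newly extracted data becomes visible to SQLi readers only when the load job that wrote it finishes; `Store`, `LoadJob`, and their methods are hypothetical names, not Incorta APIs, and the real engine is more involved (e.g. the Spark-port deduplication step mentioned above).

```python
from typing import Optional


class Store:
    """Hypothetical store: readers only see published (committed) versions."""

    def __init__(self) -> None:
        self.published: dict[str, int] = {}

    def read(self, table: str) -> Optional[int]:
        return self.published.get(table)


class LoadJob:
    """Hypothetical load job: writes are staged and published when the job finishes."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.staged: dict[str, int] = {}

    def load(self, table: str, version: int) -> None:
        self.staged[table] = version

    def finish(self) -> None:
        # Assumption: all staged versions become visible together at job end.
        self.store.published.update(self.staged)
        self.staged.clear()


store = Store()
store.published["source_a"] = 1  # the previous load

# Same schema, one job: the composite step (load-order group 2) still sees version 1.
job = LoadJob(store)
job.load("source_a", 2)
seen_same_job = store.read("source_a")
job.finish()

# Separate schemas, two jobs: the second job starts after the first has published.
job1 = LoadJob(store)
job1.load("source_a", 3)
job1.finish()
seen_next_job = store.read("source_a")

print(seen_same_job, seen_next_job)  # 1 3
```

Under this assumption, load order within one job only controls when a step runs, not which version it can see; splitting into two schemas loaded one after the other is what makes the new version visible to the SQLi reader.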

We are currently working on a new feature that will allow you to load multiple schemas in the same job and specify the load order of these schemas. This feature will address the issue you are experiencing.

RADSr · 06-08-2023

Thanks @mhelmy - that's what I was afraid of.

That schema dependency scheduling capability will be hugely valuable!