Load Plans and Sequential Groups

mhelmy · 08-25-2023

Introduction

Incorta now offers Load Plans, which replace individual schema load schedules to make loading data into Incorta more efficient. Load Plans let Incorta identify the dependencies between schemas and load many schemas together. This eliminates the need to schedule related schemas individually and, more importantly, the need to manually time when each schema load should start, which removes guesswork and wasted idle time. Post-load processing is also reduced, because that step now happens once per group in a Load Plan instead of once for each schema.

This article provides tips and tricks for getting the most out of Load Plans for scheduling new data loads and converting existing Schema Load schedules to Load Plan schedules successfully.

What you need to know before reading this article

Applies To

Incorta Cloud

- Starting with version 2023.1.0, available as a Lab feature
- Starting with version 2023.7.0, available as a GA feature
  - Includes the Sequential Loading Group enhancement

On-Premises

- Starting with version 6.0, available as a Lab feature
- Starting with version 6.1, available as a GA feature
  - Includes the Sequential Loading Group enhancement

Let's Go

As noted in the introduction, Load Plans are more efficient than their predecessors, schema loads, because they allow multiple schemas to be loaded together. When a Load Plan runs with more than one schema, it treats all the table objects in all the schemas included in a group (more on groups later) as if they belong to a single pool. It then determines the dependencies within the pool and which objects it can run in parallel. This short planning phase is followed by the actual load, which follows the plan and executes the post-load process only once per sequential loading group. This is the biggest differentiator of Load Plans. In other respects, such as scheduling and support for full, incremental, and staging loads, they operate much as schema loads always have.
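To make the planning phase concrete, here is a minimal sketch of how a pool of tables from several schemas could be ordered into parallel "waves" based on their dependencies. It is an illustration only and not Incorta's actual planner: the function name `plan_waves`, the table names, and the dependency map are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_waves(dependencies):
    """Order a pool of tables into waves that could load in parallel.

    dependencies maps each table to the set of tables it depends on.
    Hypothetical model only; not how Incorta implements planning.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Tables from three schemas treated as one pool (names are made up).
pool = {
    "Inventory.items": set(),
    "BOM.components": {"Inventory.items"},
    "WIP.jobs": {"Inventory.items"},
    "WIP.job_costs": {"WIP.jobs", "BOM.components"},
}

waves = plan_waves(pool)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {wave}")
```

In this sketch, tables in the same wave have no dependencies on one another, so tables from different schemas can load side by side rather than waiting for a whole schema to finish.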

In the remainder of this article, we will explore how to take full advantage of the Load Plan feature.

Upon Upgrade

Once you upgrade to a version of Incorta that features Load Plans, your schema load schedules are replaced with Load Plan schedules on a one-for-one basis. That is, everywhere you had a schema scheduled, you now have a Load Plan that contains that single schema on the same schedule. For the sake of efficiency, you will want to consolidate your schemas into fewer Load Plans. To do this:

1. Determine which schemas should be loaded together
2. Pause Scheduled Jobs from the CMC or ask your Administrator to do this for you
3. Go to the Load Plans sub-tab on the Scheduler tab
4. Choose one of your Load Plans and rename it as appropriate
5. Add all of the schemas that you want to run together to it
6. Delete the converted single-schema Load Plans that you no longer need

Determine which Schemas to Add to a Data Load

When getting ready to define a Load Plan, there are several factors to think about as you determine which schemas to include in the Load Plan.

Schemas with Related Tables

Schemas are generally defined with tables that are related or joined to one another, but there are often multiple schemas with related data that it makes sense to load together. For example, you might want to load the Inventory, BOM, and WIP schemas together. Besides natural business fit, a good indicator of which schemas to load together is the presence of cross-schema joins between their tables, which tell you that the data is related.

Another indicator that schemas are related is if their data elements are reported on together. You would want the data to stay synchronized among these schemas for the consumers of your dashboards so they see a chronologically complete picture. This sort of relationship can be identified when you see business views that use data elements from multiple physical schemas or dashboards with insights based on different physical schema data. The data lineage feature can help you identify these relationships.
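As a rough illustration of the cross-schema join heuristic, the sketch below groups schemas into candidate Load Plans by treating each join between tables in different schemas as a link and collecting the connected sets. The join list is assumed to have been exported by hand (for example, from the schema designer or data lineage); the function name `related_schema_sets` and the `Schema.table` naming are hypothetical.

```python
from collections import defaultdict


def related_schema_sets(joins):
    """Group schemas that are linked, directly or indirectly, by joins.

    joins is an iterable of (table, table) pairs using 'Schema.table' names.
    Hypothetical helper; the join list is assumed to be gathered manually.
    """
    neighbors = defaultdict(set)
    for left, right in joins:
        a, b = left.split(".")[0], right.split(".")[0]
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, result = set(), []
    for schema in sorted(neighbors):
        if schema in seen:
            continue
        # Depth-first walk collects every schema reachable through joins.
        stack, component = [schema], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


joins = [
    ("WIP.jobs", "Inventory.items"),
    ("BOM.components", "Inventory.items"),
    ("Sales.orders", "Sales.customers"),  # join within a single schema
]
candidates = related_schema_sets(joins)
print(candidates)
```

Each returned set is only a candidate. Business fit, and whether the data is reported on together, should still decide the final grouping.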

Schemas Whose Data is Ready for Load at the Same Time

An alternative but efficient way to group schemas in Load Plans is to look at when their data becomes ready in the source systems. The schemas in a Load Plan do not necessarily need to be related. Incorta will figure out the dependencies between them and load as many tables as it can in parallel. If there is nothing a schema needs to wait on, you can load it as soon as it is ready, together with whichever other schemas are ready at the same time.
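If you keep a simple inventory of when each source is ready, bucketing schemas by that time gives a first draft of your Load Plans. The sketch below assumes a hand-maintained mapping of schema names to ready times in `HH:MM` format; the schema names, the times, and the function name `plans_by_ready_time` are all hypothetical.

```python
from collections import defaultdict


def plans_by_ready_time(ready_times):
    """Bucket schemas by the time their source data is ready.

    ready_times maps schema name -> 'HH:MM' string (an assumed, hand-kept list).
    Returns a dict of ready time -> sorted schema names, earliest time first.
    """
    plans = defaultdict(list)
    for schema, ready in ready_times.items():
        plans[ready].append(schema)
    return {time: sorted(names) for time, names in sorted(plans.items())}


ready_times = {"GL": "01:00", "AP": "01:00", "Sales": "02:30"}
plans = plans_by_ready_time(ready_times)
for time, schemas in plans.items():
    print(f"Load Plan scheduled at {time}: {schemas}")
```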

Schemas with Similar Load Times

Another way to group schemas in Load Plans, or within groups in a Load Plan, is by how long the schemas typically take to process. For example, suppose you are thinking about combining four schemas in a Load Plan: two run in less than five minutes, and two run in about thirty minutes. You could put them all into one Load Plan without any groups (or, equivalently, in a single group), but you would then have to wait for all four schemas to finish processing before the data from any of them is available.

Assuming that the short-running schemas do not depend on the longer-running schemas, you can set up your Load Plan or Load Plans so that the data from the fast-loading schemas is available for analytics as soon as they complete the load process (in less than five minutes), while the data from the two slower-loading schemas is still available in about thirty minutes. You accomplish this by splitting the four schemas into two Load Plans or by distributing them into two groups within a single Load Plan. In the latter case, you add the two fast-running schemas to a group (Group 1) that comes before the group (Group 2) containing the two long-running schemas. Note that if you use one Load Plan with two groups, the second group does not begin until the first group has completed processing.
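The trade-off can be worked through with a small back-of-the-envelope model. The sketch below makes simplifying assumptions that are not from Incorta: schemas in a group load fully in parallel, a group's data becomes available when its slowest schema finishes, and post-load time and resource contention are ignored. The schema names and the function name `availability` are hypothetical.

```python
def availability(groups):
    """Estimate when each schema's data becomes available, in minutes.

    groups is an ordered list of dicts mapping schema name -> load minutes.
    Simplified model: groups run one after another, and each group finishes
    when its slowest schema finishes.
    """
    clock = 0
    ready_at = {}
    for group in groups:
        clock += max(group.values())  # the group ends with its slowest schema
        for schema in group:
            ready_at[schema] = clock
    return ready_at


# One Load Plan, one group: everything waits for the slowest schema.
one_group = availability([{"FastA": 5, "FastB": 5, "SlowC": 30, "SlowD": 30}])

# One Load Plan, two sequential groups: fast schemas are available early.
two_groups = availability([{"FastA": 5, "FastB": 5}, {"SlowC": 30, "SlowD": 30}])

print(one_group)
print(two_groups)
```

Under this model, two groups make the fast schemas available at about minute 5, while the slow schemas finish at about minute 35 rather than 30 because Group 2 waits for Group 1. Two separate Load Plans started at the same time would avoid that wait.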

Use Groups to Batch within a Load Plan

Starting with release 2023.7.0 (6.1 for On-Premises), you can define Sequential Loading Groups within a Load Plan. Groups let you orchestrate the order of the load: they run sequentially, meaning that all the schemas in Group 1 load before any of the schemas in Group 2 begin to load, and so forth.

Above, we described a use case where groups enable you to control when data becomes available for analytics based on load time. Another and possibly more critical reason to use groups is that they allow you to set the order when it is necessary that one table object fully processes before another can begin to process. Here are a couple of scenarios where this could come in handy.

- Using groups can help if you have a materialized view (MV) in one schema that depends on a derived table, e.g. an analyzer table, in another schema. The MV should not process until after the post-load of the analyzer table is completed. You can keep the MV from loading too soon by placing its schema in a separate group that runs after the analyzer table's group.
- Using groups in a Load Plan is also helpful if you need to materialize a business view into parquet. You might do this to send flattened data to a destination outside of Incorta. You could use an MV to materialize the business view, but you would want the MV to run only after the post-load process has occurred for the tables and columns that the business view uses. You accomplish that by putting the schema containing the MV(s) into a group in the Load Plan that follows the group(s) that process the schemas containing the tables your business view depends on.

Note that using groups generally slows down the overall processing time of a Load Plan, so as a best practice, avoid using more than one group unless you need to. That said, there are times when you may need them. Load Plans with Sequential Loading Groups give you the flexibility to manage the use cases mentioned above and others if needed.
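One way to reason about group placement is to write down only the "must run after post-load" constraints between schemas and put each schema in the earliest group that satisfies them, which keeps the number of groups as small as the constraints allow. The sketch below is a planning aid under that assumption, not an Incorta API. The schema names, the `must_follow` mapping, and the function name `assign_groups` are hypothetical, and the constraints must not be circular.

```python
from functools import lru_cache


def assign_groups(schemas, must_follow):
    """Place each schema in the earliest sequential group that is allowed.

    must_follow maps a schema to the schemas whose post-load must finish
    before it starts. Hypothetical planning aid; assumes no circular
    constraints (a cycle would recurse without end).
    """

    @lru_cache(maxsize=None)
    def level(schema):
        prerequisites = must_follow.get(schema, ())
        if not prerequisites:
            return 0  # nothing to wait for: first group
        return 1 + max(level(p) for p in prerequisites)

    groups = {}
    for schema in schemas:
        groups.setdefault(level(schema), []).append(schema)
    return [groups[index] for index in sorted(groups)]


schemas = ["Sales", "Analyzer", "Flatten_MV"]
must_follow = {"Flatten_MV": {"Sales", "Analyzer"}}  # MV waits for post-load

load_plan_groups = assign_groups(schemas, must_follow)
for number, group in enumerate(load_plan_groups, start=1):
    print(f"Group {number}: {group}")
```

Schemas with no ordering constraint all land in Group 1, so a second group appears only when a constraint actually requires it.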

Sequential Load Group Best Practices

Scenario	Recommendation
You have a dependency that cannot be resolved within the same group	Use groups to force the required processing order. In addition to the two scenarios above, another example would be an MV that reads from an Analyzer table.
Incorta over Incorta (IOI)	If you can, convert the IOI table to an MV. If not, then add the schema containing the IOI table to a group that runs after the main group.
The Loader Service has low memory and few cores	Use groups to spread the work by distributing schemas (and thus tables).