Multi-Source Table Design

LukeG · ‎06-30-2022

Introduction

In order to achieve a holistic view of business operations, it is common for organizations to unify similar data from disparate sources. This is often required after an ERP migration, company acquisition, or migration of data from on-premises solutions to cloud-based solutions.

For simple use cases where disparate data sources share similar schemas, users can leverage the native multi-source table feature. This is the best option for simple use cases like merging two separate instances of Oracle EBS. For use cases that require more flexibility and control, it is recommended to use one of the approaches in this article.

What you need to know before reading this article

Let's Go

Option 1: Full Union (Materialized View)

The first approach is to create a Materialized View (MV) with data from each source. In this example we choose six identical columns to align in our MV using Incorta PostgreSQL. Columns can be added to the materialized view when change is required.

Example

We begin by creating a supporting business schema to simplify the process. Typically, you will create one view for each source. In our example, we have two views: 1 for EBS, 1 for SAP.

Screen Shot 2022-08-16 at 8.12.06 PM.png

Each view will contain the subset of columns to be used as part of a union. The image below shows the six columns selected from the EBS AP module. We will select the corresponding six columns from the SAP AP module in our second view.

Screen Shot 2022-08-16 at 8.13.11 PM.png

Once the columns are selected, create a Materialized View in a new schema. Ours is titled MultiSource. In the MV, we can use Incorta PostgreSQL as the language. The following script will create a single table from the two sources allowing us to see all of our data in a single view.

Screen Shot 2022-08-16 at 9.04.54 PM.png

The resulting table can be loaded as needed It will contain data from both sources. See the example dashboard below.

It is recommended that you "harmonize" your data so that different names are not used to represent the same thing in the new table. Check out this article to learn more about Data Harmonization for Multi-Source.

Option 2: Bridge Table (Materialized View)

The second approach is to create a Materialized View with the primary keys from a table in each source. The MV serves as a bridge between the schema from each source. A business view will be used to create formulas that determine which schema to query. Columns can be added to the business view as formulas when change is required.

Example

Part 1. Schema Design

Start by creating the bridge between our Oracle EBS and SAP data. In the new schema, create a materialized view using your language of choice. We will use Spark SQL. The script below creates a table with four columns: Source Name, Transaction ID, Transaction Line ID, and Order Date.

Screen Shot 2022-06-23 at 10.33.56 AM.png

After the bridge table is created, we can use the IDs and dates to join to the existing EBS data, SAP data, and date dimensions.

Screen Shot 2022-06-23 at 12.03.11 PM.png

As we explore the data and build insights, Incorta intelligently builds the multi-source query allowing us to produce a holistic view of business operations in a single dashboard. Below is an example of a query plan using a bridge MV.

Screen Shot 2022-06-23 at 10.36.12 AM.png

Part 2. Business Schema Design

To ensure that dimensions and measures are pulled from the appropriate sources, logic should be defined at the business schema level. We typically include:

Primary key(s)
Date
Source of record

To include other attributes, we will create a view in a business schema. The business view will include a formula field for each attribute that determines which source to fetch the data from.

The formula below is defined in a business view and retrieves the AMOUNT field from each invoice. The source system from which the transaction originated will determine which table to query for the AMOUNT field.

if(
	MultiSource.EBS_SAP_BRIDGE.Source = 'EBS', 
	EBS_AR.RA_CUSTOMER_TRX_LINES_ALL.EXTENDED_AMOUNT, 
	SAPECC_SD.VBRP.NETWR
)

In plain english, the formula says: If the source of the transaction is 'EBS', query EBS_AR.RA_CUSTOMER_TRX_LINES_ALL.EXTENDED_AMOUNT, if not, query SAPECC_SD.VBRP.NETWR

This can be done for all other attributes required for insights (Business_Unit, Supplier_Name, Item_Description, etc...).

Conclusion

These are common approaches to multi-source table design in Incorta. The best approach for you will depend on your data sources, data volume, and use case(s).