on 06-21-2022 10:11 AM
Data extraction is one of the key components of any BI product. The infrastructure and processes responsible for extracting data must be highly reliable and able to fail over without manual intervention. When data volumes become very large, it also becomes important to distribute the load, either for performance reasons or because of capacity limits on the available hardware.
Incorta's Loader Service is responsible for extracting data from various data sources and needs to be highly available to make sure that no single point of failure can bring down this key component. This article will help you to configure this service for high availability as well as to distribute the load.
This article requires knowledge of Incorta installation and administration. It also requires an understanding of load balancing, high availability, and failover.
When an Incorta cluster is deployed, the default configuration consists of a single loader service (for data acquisition) and a single analytics service (for analytics on the extracted data). All data extraction is performed by the single loader service, and all analytics users are served by the single analytics service. There is no high availability for either the loader or the analytics service.
The following diagram represents an Incorta architecture with a single loader service and a single analytics service. If either service goes down, the corresponding loader or analytics functionality will not be available.
The most common option customers choose is to provision multiple analytics services to make analytics highly available. If a particular analytics service goes down, users are still served by the other configured analytics services.
The diagram below represents one loader (no high-availability) and three analytics services (high-availability).
Advantage: High availability at the analytics level ensures that analytics users are not affected by individual analytics service failures.
Disadvantage: There is no high availability at the loader level, so the single loader service can fail and leave users with stale data.
The simple solution is to add multiple loader services, which ensures high availability for data extraction. Beyond that, a slightly more complex architecture can be configured to distribute the extraction load evenly among multiple loader services.
In the following architecture diagram, three primary loader services are defined, each extracting data related to a particular business area. This is achieved by assigning schemas to a specific loader service in the distribution.properties file in the tenant directory.
In the above distribution file, schemas related to three different business areas are assigned to three different loader services. In addition, a backup loader service is also configured to provide high availability in case of primary loader service failure.
Note: A default loader service is also assigned, so that data is still extracted for schemas that have been defined but not yet assigned to a specific loader service. If no default is specified, any new schema that is not yet assigned to a loader service will not be extracted.
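The routing behavior described above (schemas assigned to primary loaders, a default loader for unassigned schemas, and a backup loader for failover) can be illustrated with a small sketch. This is only a conceptual model in Python, not Incorta's implementation and not the syntax of distribution.properties. The class `LoaderRouting`, the loader names, and the schema names are all hypothetical. The sketch also assumes that the backup loader covers the default loader, which the article does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LoaderRouting:
    """Hypothetical model of schema-to-loader assignment (not Incorta's API)."""
    assignments: Dict[str, str]            # schema name -> primary loader name
    default_loader: Optional[str] = None   # handles schemas with no assignment
    backup_loader: Optional[str] = None    # takes over when a loader is down
    down: Set[str] = field(default_factory=set)  # loaders currently unavailable

    def resolve(self, schema: str) -> Optional[str]:
        """Return the loader that should extract `schema`, or None if none can."""
        primary = self.assignments.get(schema, self.default_loader)
        if primary is None:
            # Unassigned schema and no default loader: nothing extracts it.
            return None
        if primary not in self.down:
            return primary
        # Primary (or default) loader is down: fail over to the backup if it is up.
        if self.backup_loader is not None and self.backup_loader not in self.down:
            return self.backup_loader
        return None


# Hypothetical names: three business areas, each with its own primary loader.
routing = LoaderRouting(
    assignments={"Sales": "loader_1", "Finance": "loader_2", "HR": "loader_3"},
    default_loader="loader_1",
    backup_loader="loader_backup",
)
```

With this model, `routing.resolve("Finance")` picks the Finance loader while it is up and the backup loader once `"loader_2"` is added to `routing.down`. A schema that has no assignment resolves to the default loader, or to `None` when no default is configured, which mirrors the note above about unassigned schemas not being extracted.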
Note: This article applies to on-premises deployments.