Backup and Restore Strategy
Backing up your Incorta environment is one of the most important tasks you can perform to ensure a reliable path for restoring your Incorta reporting environment should the need arise. This document describes the recommended strategies for backing up Incorta and will help you select the appropriate components and configuration for backup.
We recommend that you be familiar with these Incorta concepts before exploring this topic further.
- The basic structure of your Incorta installation
  - Install location / path
  - Tenant data location / path
  - Tenant names and associated Admin ID / password
- Your operational enterprise backup strategies
  - Does your organization already have a backup / DR plan for servers, data files, and data storage (including databases)?
  - What software is used for data backup?
  - What maintenance windows does IT set aside for regular downtime for backups and maintenance?
Making yourself aware of the above topics will assist you in determining the best strategy for backing up your Incorta environment.
These concepts apply to all releases of Incorta. Note: this document also refers to backup capabilities that are available, and performed for you, as part of the Incorta Cloud hosted environment.
There are many different strategies for backing up the Incorta environment. Which combination you choose may, as noted above, depend in large part on what your IT department already does for Production application servers as part of its standard backup, recovery, and disaster recovery processes. For example, there may already be processes in place to back up the server where Incorta is installed and/or the server hosting your metadata database, which contains critical information about your Incorta application. Similarly, if shared or network storage has been provisioned for your Incorta data files (parquet, snapshots, etc.), enterprise software may already be backing up those storage locations. With that in mind, what are the core aspects of an Incorta backup that need to be considered?
- Incorta software binaries installed at some location like <install path>/IncortaAnalytics/
- Incorta Data directories installed under some structure like <data path>/Tenants/<tenant name>
- Incorta Metadata Database typically loaded as a MySQL or Oracle database instance residing in your network
- A special "application configuration" backup available with Incorta called a Tenant Backup, which is a FULL backup of the structures and objects you have built in Incorta, such as Schemas, Business Views, Folders, Dashboards, Schedules, and Data Sources: everything you have created as a reporting application for a given Tenant.
Incorta is an interactive, dynamic and 24/7 application that services the needs of your online users as they access the data for reporting. However, Incorta is also the data ingestion engine for loading and preparing the data for reporting, and it is also the development environment where changes can be made that alter the structures, views, rules, and dashboards that define what you are presenting to the end users for reporting. Because of this, there are a lot of moving parts that need to be taken into account as it relates to the integrity of a backup.
- Loader Service: If the Loader Service is running, there could be jobs actively writing data to the logs, parquet files, and other files on the system. If possible, stop the Loader Service so that no disk activity is occurring at the time of the backup.
- Analytics Service: For most users, the Analytics Service allows them to log in and run reports. More advanced users can create new dashboards, and the highest level of users can create and alter schemas, business views, and dashboards. Changes made to the Tenant are written to the Metadata database, and if those changes also affect the data (for example, adding a column or formula column to a table), they will affect the physical structure of the parquet and snapshot files during the next data load. Therefore, the Analytics Service should also be shut down so that, at the time of the backup, no users are logged in and no changes are being made to the system.
- Size & Availability of Your Source Data: As discussed below, the easiest and most common backup method is a periodic Tenant Backup. A Tenant Backup can be taken often, and more frequently than backups of your servers and data files. However, a restore performed ONLY from the Tenant Backup requires a FULL RELOAD of all the data. If your data sources can support such a full reload within acceptable time frames, and the volumes of data are not so large that a Full Load is prohibitive, then relying on Tenant Backups alone can be an acceptable strategy, because an Incorta Tenant Application can be fully restored by loading the Tenant and running Full Loads of each Schema.
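The last consideration above comes down to simple arithmetic: do the full loads fit in the time you can afford? The following is a minimal sketch of that check, not an Incorta feature. The `SchemaLoad` type, the function name, and the idea of summing previously measured full-load durations (assuming schemas load one after another) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SchemaLoad:
    """Hypothetical record: a schema and the duration of a past full load."""
    name: str
    full_load_minutes: float


def tenant_only_restore_fits(schemas: Iterable[SchemaLoad],
                             allowed_minutes: float) -> bool:
    """Return True if sequential full loads of every schema fit the window.

    This is a rough estimate: it assumes loads run one after another and
    take about as long as they did when they were last measured.
    """
    estimated = sum(s.full_load_minutes for s in schemas)
    return estimated <= allowed_minutes
```

If the check fails, a Tenant Backup alone is probably not enough, and backups of the data files themselves become more important.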
The most common method of backing up the Incorta application that you have built is to back up the Tenant. In some contexts, this is also referred to as exporting the Tenant. Support may ask for a Tenant export, for example, in conjunction with an open support ticket. Also, as a best practice when performing changes or migrations from one environment to the next, take a backup of the target Tenant before making the changes so that you can revert if necessary. An ad hoc export of a Tenant can be accomplished in one of two ways:
- From the Tenant screen in the CMC, click on the "3 dots" on the right side of the screen at the end of the row naming your tenant. You will see the option to Export. A Zip file of your Tenant will be immediately downloaded to your local machine and you can rename it and save it as you desire.
- If you have access to the server where the CMC is loaded, and you can open a terminal session, you can manually export the Tenant from the command line. There is a tool called the Tenant Management Tool (aka TMT) in the <incorta install path>/IncortaAnalytics/cmc/tmt directory. To learn more, read the procedure on docs.incorta.com (https://docs.incorta.com/5.0/backup-restore).
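Whichever export method you use, giving each export a timestamped name makes it easier to tell backups apart later. The sketch below is one possible way to do that from a script. It deliberately does not spell out the TMT command line: the `command` argument is a placeholder that you would fill in from the procedure in the Incorta documentation, and both function names are hypothetical.

```python
import subprocess
from datetime import datetime
from typing import Optional, Sequence


def timestamped_export_name(tenant: str, when: Optional[datetime] = None) -> str:
    """Build a file name such as 'demo_20210304_050607.zip' for an export."""
    when = when or datetime.now()
    return f"{tenant}_{when:%Y%m%d_%H%M%S}.zip"


def run_export(command: Sequence[str]) -> int:
    """Run an export command supplied by the caller and return its exit code.

    The command itself (for example, the TMT invocation taken from the
    Incorta documentation) is NOT defined here; it is an input.
    check=True raises CalledProcessError if the command fails.
    """
    completed = subprocess.run(list(command), check=True)
    return completed.returncode
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.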
It is also recommended that regular tenant backups are scheduled. This should be done at least once per day, but in heavy development environments, and especially on Dev servers, it may be beneficial to take Tenant backups more frequently in order to keep snapshots of the Tenant application at various moments in time that can be reverted to if needed. The manner in which Tenant backups are scheduled and run differs depending on what version of Incorta you are running.
- For versions 4.8 and below, recurring tenant backups can only be created using scripts that execute the TMT command discussed above. These scripts are then scheduled using cron or another preferred scheduling application running on the OS where Incorta is installed.
- For versions 4.9 and higher, the CMC can be used to create a recurring schedule for backing up the Tenant that uses the same scheduling method that is used in the Incorta UI to schedule schema loads. See the Screenshots below. Note on the schedule creation screen there is a slider-bar that allows you to choose how many versions of the tenant backups are being stored on the server.
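For script-based schedules, the scripts need their own retention rule, comparable in spirit to the "number of versions to keep" setting in the CMC. The following is a minimal sketch of such a rule, assuming the exports are `.zip` files in a single directory and that file modification time reflects when each backup was taken. The function name and directory layout are assumptions, not part of Incorta.

```python
from pathlib import Path
from typing import List


def prune_backups(backup_dir: str, keep: int) -> List[Path]:
    """Delete all but the `keep` newest .zip files; return the deleted paths."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Newest first, judged by modification time.
    archives = sorted(
        Path(backup_dir).glob("*.zip"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = []
    for old in archives[keep:]:
        old.unlink()
        removed.append(old)
    return removed
```

A script run from cron could call this right after each successful export so that the archive directory does not grow without bound.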
The data that has been loaded for your reporting is stored in physical files. If you want to back up your data as it has been loaded into Incorta, you will need to construct a process for backing up the location where the data files reside in your installation:
- In single server installations, the data will more than likely be stored in the <Incorta install path>/IncortaAnalytics/Tenants directory with a subdirectory created at that location for each Tenant you have configured on your system.
- In multi-server environments, it is required that all of the servers running Analytics or Loader services have access to a shared network location since all services need access to the same data loaded for a Tenant.
Incorta does not contain an automated or scheduled method for backing up your data as it resides on the file system. Therefore, as mentioned at the beginning of this document, it is important to understand if your enterprise IT already performs regular backups of the servers and/or network storage devices used in your Incorta environment. In cloud environments such as AWS for example, there may already be processes that are taking Amazon Machine Images (AMI) to back up the machine, and if Amazon EBS volumes are in use, there may be processes in place that are already creating snapshots of those volumes. These concepts are similar across clouds (Google or Microsoft) and other virtual platforms. But even if your organization is using on-premises hardened servers, your IT group should be able to assist you in setting up an appropriate backup methodology for your Incorta servers.
Important Note: As mentioned above, your application is represented by the metadata that is exported as part of the Tenant Backup, and that metadata maps directly to the structures written to your data files. Therefore, if you cannot time your tenant backup to coincide with your physical backups, restore using the tenant backup taken closest to the time of your file backups. This minimizes the chance that the structure of the files does not match the structure restored from the tenant. Also, while not strictly required, it is recommended that the Loader service be stopped before taking the data backups to ensure that no file writes are in progress while the backup is taken.
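Choosing the tenant backup closest in time to a file backup is easy to automate. Here is a minimal sketch, assuming you keep a record of when each tenant backup was taken. The function name and the dictionary of archive names to timestamps are hypothetical.

```python
from datetime import datetime
from typing import Dict


def closest_tenant_backup(tenant_backups: Dict[str, datetime],
                          file_backup_time: datetime) -> str:
    """Return the name of the tenant backup taken nearest to the file backup.

    tenant_backups maps an archive name to the time it was taken.
    """
    if not tenant_backups:
        raise ValueError("no tenant backups to choose from")
    return min(
        tenant_backups,
        key=lambda name: abs(tenant_backups[name] - file_backup_time),
    )
```

Note that the nearest backup may be slightly older or slightly newer than the file backup. If schema changes were made in between, a structural mismatch is still possible.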
Incorta relies heavily on a metadata database that stores all the data about the structure of your schemas, views, dashboards, security, schedules and all aspects relating to your Incorta reporting applications. In fact, creating a Tenant backup is largely an export of much of this metadata.
The first step in understanding the options for backing up your metadata database is to determine which database vendor is being used and where the database resides. For single-server, on-premises installations, Incorta has the option of installing MySQL on the same server as Incorta and using that database to run the Incorta application. However, more often than not, the customer already has either MySQL or Oracle running within their enterprise, and these database servers can be used to create the "incorta database schema". In either case, backing up this database means using the approved backup procedures and software described in each database vendor's documentation. There is no function within Incorta to request and automate a backup of the metadata database itself.
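For a MySQL-hosted metadata database, one common vendor tool is `mysqldump`. The sketch below only builds the command line so that you can review it before running it. The host, user, and schema name are placeholders that you must replace with your own values, and the function name is hypothetical. The password is intentionally not placed on the command line; `mysqldump` can read it from a MySQL option file instead. Oracle-hosted metadata needs Oracle's own tooling, which is not shown here.

```python
from typing import List


def build_mysqldump_command(host: str, user: str, database: str,
                            out_file: str) -> List[str]:
    """Build a mysqldump command that writes a dump of one schema to a file."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",   # consistent snapshot for InnoDB tables
        "--routines",             # include stored procedures and functions
        f"--result-file={out_file}",
        database,                 # placeholder: your Incorta metadata schema
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a scheduled script, ideally at a time close to your Tenant and data backups.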
Important Note: Fully restoring a Tenant from a Tenant backup has the effect of fully replacing all of the information about your application in the metadata database. Therefore, it is not necessary to restore all three components together. However, if you have a true database outage, having a backup of the database itself, or at least an export of the Incorta Schema, will allow you to ensure that all the tables and structures Incorta requires are in place.
Backing up the Incorta Binaries located at the <incorta install path>/IncortaAnalytics directory largely follows the same analysis as is applied to backing up the Data. The process for backing up the Incorta binaries may automatically be covered by the same process that is already backing up the OS for the server / VM itself. Similarly, in cloud environments, this may already be accomplished via the application of volume image snapshots, etc. It should be recognized that in addition to the binaries that run the core CMC, Analytics and Loader services of Incorta, the base Incorta install also includes installs of other related software such as Zookeeper and Spark, which are located within the ../IncortaAnalytics/ structure. Further, there are some key configuration files for such things as Active Directory / LDAP synchronization, SSO Login, SSL, various logs, tomcat, Spark defaults, and other settings that have an impact on how your application operates. It is very important that your Install directory is also backed up to retain all the settings that are affecting the operation of your application.
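If a full image of the install directory is not available, copying just the key configuration files is a lightweight complement. The following is a minimal sketch of that idea. The list of relative paths is something you would compile for your own installation; no specific Incorta file names are assumed here, and the function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_config_files(install_dir: str, relative_paths: Iterable[str],
                      dest_dir: str) -> List[Path]:
    """Copy selected files from the install directory, keeping their layout."""
    source_root, dest_root = Path(install_dir), Path(dest_dir)
    copied = []
    for rel in relative_paths:
        src = source_root / rel
        if not src.is_file():
            continue  # skip entries that do not exist in this installation
        target = dest_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Keeping the relative layout under the destination makes it clear where each file belongs when it has to be restored.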
This section covers a method for performing backups that is more appropriate for smaller Incorta implementations. This link and the associated downloadable scripts describe a process that performs a FULL backup of Incorta. It includes:
- Using TMT to back up the Tenant
- Performing specific configuration file backups
- Performing backup of the <tenant name>/data directory where user uploaded flat files are located
- Performing a backup of all the Incorta Data such as parquet, snapshot, and related folders
This method is more suitable for smaller, and more specifically single-server, environments because of the weakness in the way it backs up the data itself. The script accomplishes the data backup by zipping the data directories and then moving the Zip files to an archive location. The Zip process is not very fast, so this method is not an efficient way to back up large volumes of large data files. Having said that, this process can also be used with its associated control properties file to perform only the first three steps listed above, which still yields a fairly comprehensive backup.
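The zip-then-move approach described above can be sketched in a few lines. This is an illustration of the general technique, not the downloadable script itself. The function name, the `label` prefix, and the timestamped naming scheme are assumptions.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def zip_and_archive(source_dir: str, archive_dir: str, label: str,
                    when: Optional[datetime] = None) -> Path:
    """Zip a directory in a staging area, then move the zip to the archive."""
    when = when or datetime.now()
    archive_root = Path(archive_dir)
    archive_root.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as staging:
        base = Path(staging) / f"{label}_{when:%Y%m%d_%H%M%S}"
        # make_archive appends ".zip" and returns the full path of the archive.
        zip_path = shutil.make_archive(str(base), "zip", root_dir=str(source_dir))
        # Moving into an existing directory keeps the archive's file name.
        return Path(shutil.move(zip_path, str(archive_root)))
```

Because the whole directory is compressed in one pass, run time grows with data volume, which is why this approach fits small installations better than large ones.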