on 11-14-2022 04:31 PM
When you first implement Incorta, you typically size your hardware for expected current needs plus expected growth in data and users for two years. Over time, as more data is brought into Incorta and as your developers configure new use cases, you will start to hit your headroom on memory. You can always add more RAM and disk, but before doing that, it is worth reviewing your implementation for unused entities that are taking up space unnecessarily.
The topics covered here are useful to both on premises and Incorta Cloud customers. It will be helpful to be familiar with these concepts as you read this article.
There are a number of levers available to you to reduce the amount of memory that Incorta uses by cleaning up what you do not need!
With Incorta, it is so fast to create schemas and tables and load data that we actually encourage it as a way to quickly experiment and build proofs of concept before formalizing your use cases. The result over time can be a build up of objects/entities that are no longer being used but still exist.
Incorta has created a number of tools that help you to understand what is going on with your implementation, including an "Unused Entities" dashboard that helps you identify:
Note that a reference to an entity in an Incorta dashboard can be directly to the entity or via a business view.
This dashboard gives you a solid list to look at for clean up. You can and should install the latest version of the Incorta Tools which includes the "4- Unused Entities" dashboard from the Home tab of Incorta or you can work with Support to get it installed.
The Physical schemas not referenced in dashboards insight on the Unused Entities dashboard provides you with a list of schemas that are not referenced by Incorta dashboards to start with in your review, but there are some caveats to be aware of. First, you may not be using Incorta to build your dashboards. In fact, there are number of reasons to bring data into Incorta and not use the data in an Incorta dashboard:
Access Point | Comments |
SQLi | You are creating your dashboards using another tool like PowerBI or Tableau to access data stored in Incorta with the SQL Interface (SQLi) |
Derived Tables | The schema may be referenced in a Derived Table like a Materialized View |
API | The data in the schema may be accessed via API for use outside of Incorta |
Data Destination | Data from the schema may pushed to a Data Destination directly |
Second, it is a common practice to create dashboards to help validate the data you have brought in and to sketch out how you might present the data to an audience. Just because a dashboard exists does not mean that it and the schema(s) behind it are currently relevant. You will also need to look at the schema that do not show up on the Physical schemas not referenced in dashboards insight.
Recommendation | Notes |
Use the 4- Unused Objects dashboard | This dashboard can help you identify schemas that are not used. |
Understand how schemas are used | The data in a schema may be accessed in ways that are not obvious. See Access Point table above. |
Review all schemas | Even if a schema is referenced by an Incorta dashboard does not mean that it is still relevant. |
Most of the same caveats that apply to removing schemas, also apply to removing tables. There are also a couple other considerations for saving memory specific to tables.
First, is the table optimized? There is a setting on tables called Performance Optimized. When enabled, the data in the table will be loaded into memory by Incorta to speed access to it. Depending on how the table is used, it may not need to be optimized. For example, if it is only ever referenced by an MV or one of the other methods mentioned in the Access Point table, it does not need to be in memory. In this case, make sure that the table is not optimized.
Second, does the table have more data in it than is needed? This may happen if you have been loading data incrementally for a long time. If the answer is yes, then there are two simple approaches to reduce the amount of table data that goes to memory.
Recommendation | Notes |
Use the 4- Unused Objects dashboard | This dashboard can help you identify tables that are not used. |
Understand how tables are used | The data in a table may be accessed in ways that are not obvious. See Access Point table above. |
Review all tables | Even if a table is referenced by an Incorta dashboard it does not mean that it is still relevant. |
Make non-optimized | If table data is only accessed via parquet, it does not need to be optimized. |
Reduce data in tables | If a table has more data than needed, perform a full load that is properly filtered to reduce the data in both memory and parquet. If that is not possible, then add a load filter to the table to limit the amount of data pushed to memory. |
The same caveats that apply to schemas and tables also apply when reviewing columns for removal. That said, there are often a lot of columns that get into Incorta unnecessarily. That is because it is a common practice to just select all columns (SELECT *) from a table when creating the SQL that defines an Incorta table. The best practice is to explicitly specify only the columns that you need when creating your table definitions.
The Tables with Over 90% Columns Unused insight on the Unused Entities dashboard should quickly point out some tables for you to focus on first. For Incorta tables that already exist, drop columns that will never be used. Look at formula columns as well but make sure not to drop Key columns.
Recommendation | Notes |
Use the 4- Unused Objects dashboard | This dashboard can help you identify columns that are not used. |
Understand how columns are used | The data in a column may be accessed in ways that are not obvious. See Access Point table above. |
Review all columns | Even if a column is referenced by an Incorta dashboard it does not mean that it is still relevant. |
Avoid Select * | Specify only the columns you need when writing the SQL that defines your table (or choose only the columns you need if using the wizard). |
Delete columns that will never be used | You can always add columns back at a later date if you find you need them. Make sure not to remove Key columns. |
Going through a clean up process is a good idea when you start to notice that your memory is filling up. It can extend the life of your current hardware setup and push out the expense of upgrading. That said, there is no reason to wait to go through this exercise. Check the Unused Entities dashboard frequently and perform clean up regularly.