cancel
Showing results for 
Search instead for 
Did you mean: 
Tristan
Employee
Employee

Introduction

When you first implement Incorta, you typically size your hardware for expected current needs plus expected growth in data and users for two years.  Over time, as more data is brought into Incorta and as your developers configure new use cases, you will start to hit your headroom on memory.  You can always add more RAM and disk, but before doing that, it is worth reviewing your implementation for unused entities that are taking up space unnecessarily.

What you need to know before reading this article

The topics covered here are useful to both on premises and Incorta Cloud customers.  It will be helpful to be familiar with these concepts as you read this article.

Let's Go

There are a number of levers available to you to reduce the amount of memory that Incorta uses by cleaning up what you do not need!

Identify Unused Entities

With Incorta, it is so fast to create schemas and tables and load data that we actually encourage it as a way to quickly experiment and build proofs of concept before formalizing  your use cases.  The result over time can be a build up of objects/entities that are no longer being used but still exist.

Incorta has created a number of tools that help you to understand what is going on with your implementation, including an "Unused Entities" dashboard that helps you identify:

  • Schemas not referenced in dashboards
  • Tables not referenced in dashboards
  • Tables not referenced in business schemas
  • Columns not referenced in dashboards

Note that a reference to an entity in an Incorta dashboard can be directly to the entity or via a business view.

Screen Shot 2022-11-14 at 4.20.22 PM.png

This dashboard gives you a solid list to look at for clean up.  You can and should install the latest version of the Incorta Tools which includes the "4- Unused Entities" dashboard from the Home tab of Incorta or you can work with Support to get it installed.

Schemas

The Physical schemas not referenced in dashboards insight on the Unused Entities dashboard provides you with a list of schemas that are not referenced by Incorta dashboards to start with in your review, but there are some caveats to be aware of.  First, you may not be using Incorta to build your dashboards.  In fact, there are number of reasons to bring data into Incorta and not use the data in an Incorta dashboard:

Access Point Comments
SQLi You are creating your dashboards using another tool like PowerBI or Tableau to access data stored in Incorta with the SQL Interface (SQLi)
Derived Tables The schema may be referenced in a Derived Table like a Materialized View
API The data in the schema may be accessed via API for use outside of Incorta
Data Destination Data from the schema may pushed to a Data Destination directly

Second, it is a common practice to create dashboards to help validate the data you have brought in and to sketch out how you might present the data to an audience.  Just because a dashboard exists does not mean that it and the schema(s) behind it are currently relevant.  You will also need to look at the schema that do not show up on the Physical schemas not referenced in dashboards insight.

Recommendation Notes
Use the 4- Unused Objects dashboard This dashboard can help you identify schemas that are not used.
Understand how schemas are used The data in a schema may be accessed in ways that are not obvious.  See Access Point table above.
Review all schemas Even if a schema is referenced by an Incorta dashboard does not mean that it is still relevant.

Tables

Most of the same caveats that apply to removing schemas, also apply to removing tables. There are also a couple other considerations for saving memory specific to tables.

First, is the table optimized?  There is a setting on tables called Performance Optimized. When enabled, the data in the table will be loaded into memory by Incorta to speed access to it.  Depending on how the table is used, it may not need to be optimized.  For example, if it is only ever referenced by an MV or one of the other methods mentioned in the Access Point table, it does not need to be in memory.  In this case, make sure that the table is not optimized.

Second, does the table have more data in it than is needed?  This may happen if you have been loading data incrementally for a long time.  If the answer is yes, then there are two simple approaches to reduce the amount of table data that goes to memory.

  1. You can reload the data with a full load making sure that your full load query is appropriately filtered so that you only load the needed data.  Doing this on a regular schedule, will keep the size of your table in check which will limit the amount of memory it takes up.
  2. Alternatively, you can create a load filter on your table. This will not affect the amount of data loaded to parquet, but it will limit the amount of data that gets loaded into memory.  If you cannot do a full load (because the table contains snapshots or would take to long to reload etc...), then this would be the way to go.
Recommendation Notes
Use the 4- Unused Objects dashboard This dashboard can help you identify tables that are not used.
Understand how tables are used The data in a table may be accessed in ways that are not obvious.  See Access Point table above.
Review all tables Even if a table is referenced by an Incorta dashboard it does not mean that it is still relevant.
Make non-optimized If table data is only accessed via parquet, it does not need to be optimized.
Reduce data in tables If a table has more data than needed, perform a full load that is properly filtered to reduce the data in both memory and parquet.  If that is not possible, then add a load filter to the table to limit the amount of data pushed to memory.

Columns

The same caveats that apply to schemas and tables also apply when reviewing columns for removal.  That said, there are often a lot of columns that get into Incorta unnecessarily.  That is because it is a common practice to just select all columns (SELECT *) from a table when creating the SQL that defines an Incorta table.  The best practice is to explicitly specify only the columns that you need when creating your table definitions.  

The Tables with Over 90% Columns Unused insight on the Unused Entities dashboard should quickly point out some tables for you to focus on first.  For Incorta tables that already exist, drop columns that will never be used.  Look at formula columns as well but make sure not to drop Key columns.

Recommendation Notes
Use the 4- Unused Objects dashboard This dashboard can help you identify columns that are not used.
Understand how columns are used The data in a column may be accessed in ways that are not obvious.  See Access Point table above.
Review all columns Even if a column is referenced by an Incorta dashboard it does not mean that it is still relevant.
Avoid Select * Specify only the columns you need when writing the SQL that defines your table (or choose only the columns you need if using the wizard).
Delete columns that will never be used You can always add columns back at a later date if you find you need them.  Make sure not to remove Key columns.

Conclusion

Going through a clean up process is a good idea when you start to notice that your memory is filling up.  It can extend the  life of your current hardware setup and push out the expense of upgrading.  That said, there is no reason to wait to go through this exercise.  Check the Unused Entities dashboard frequently and perform clean up regularly.

Related Material

Best Practices Index
Best Practices

Just here to browse knowledge? This might help!

Contributors
Version history
Last update:
‎11-14-2022 04:31 PM
Updated by: