3

How does the Max Groups Parameter work?

Some queries can create very large results that may impact the working memory of the server. This is the case for queries with multiple nested groups. To avoid the server running out of memory incorta limits the number of group nodes in the result set.

The result set for a grouped query is a tree with each grouping field increasing the depths of the tree by one level. Results are grouped by the “Grouping Dimensions” of a table or the “Row” and “Column” fields of a Pivot table.

Incorta has a queuing system that limits the amount of simultaneous running reports to the number of physical cores. So you need to take into account that incorta may run 20-40 reports concurrently when increasing the grouping limit.

Each node of the result tree takes about 200 byte of storage. The default limit is 500,000 groups which is equivalent of 100 MB per report component. So if you have 20 physical cores, the default setting limits the memory for 20 simultaneous running reports to 2GB. Since incorta uses per default 75% of the heap for data storage, 25% of memory is left for garbage collection and working memory. We would not recommend setting the grouping limit to result in more than 10-15% of the heap memory.

This would roughly result in the following safe group limit formula:

Safe max group limit = 10% *heap size  / cores / 200

For a heap size of 200GB with 20 cores the recommended limit would be

Group limit Low = 10% 200 *10^9 / 20 / 200 = 5 million

Group limit High = 15% 200 *10^9 / 20 / 200 = 7.5 million

Note that both the maximum number of parallel running reports (cores) in the equation as well the maximum memory (75%) can be adjusted downwards to leave more working memory to run larger reports.

Adding (mandatory) filters to a dashboard is another way to reduce the size of the grouping tree.

So how many group nodes does a report create? Here are a few examples:

Example 1

Assume you have a report that groups by customer and then by year. There are 10,000 customers and for each customer 10 years of data:

  1. The first level of the customer tree will have 10,000 nodes.

  2. The second level will have 10,000 customers * 10 years = 100,000 nodes

  3. The total amount of grouping nodes will be 10,000+100,000 = 110,000 nodes

  4. The amount of nodes is less than the default grouping limit.

Example 2

Grouping is much more efficient if it follows a natural hierarchy from small numbers to larger numbers than in a random order.

Assume your data set has 100 million transaction rows.

You have 5 regions, 20 countries and 200,000 customers.

So in this case it would be natural to group by Region, Country and then by Customer.

Example 2a

You group by Region, Country, and then by Customer

  1. You group by 5 Regions that require 5 group nodes

  2. You group 20 Countries - each country is in one region and requires 20 group nodes

  3. The 200,000 Customers are distributed over the 20 countries and require 200,000 group nodes

  4. The result is 200,000+20+5 = 200,025 group nodes

  5. The amount of nodes is less than the default grouping limit.

Example 2b

You group by Customer (!), Country, and then by Region.

  1. You group by 200,000 customer which requires 200,000 group nodes

  2. Under each customer there is one country resulting into one group node per customer requiring 200,000 additional node for country.

  3. Under country there is one region per country/customer which requires 200,000 grouping nodes.

  4. The result is 200,000+200,000+200,000  = 600,000 group nodes

  5. The amount of nodes is more than the default grouping limit and the report fails

Note on Pivot Tables

For Pivot table there are some more nodes required. The pivot table creates:

  1. A grouping tree over rows+columns, and then two further (smaller trees)

  2. A grouping tree for the column totals and

  3. A grouping tree for the row totals.

1reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • Based on the explanations from Matthew, here are the best practices:

    • Add the dashboard level prompt default filter to limit the size of data to be grouped when the user first open the dashboard
    • Add mandatory dashboard prompt to slice the data to avoid the SELECT ALL data in an aggregate insight with a lot of grouping attributes.  Please note a user can still select values but we can avoid the chance to have a large number of groups.
    • Please control the usage of the max group limit by setting the value at the CMC level to a reasonable size, 0.5 M, for example.  This number is related to the on-heap memory allocated in the CMC. 
    • Check the overall memory available from the analytics service machine before increasing the memory allocation in CMC.  Increasing the memory available before increasing the default max group.
    • When necessary, override the setting only at the dashboard level, not increase the default to avoid misuse of the max grouping.  
    • Use a List report, instead of Aggregate report, when appropriate, since List report does not rely on Max Group.
    • Avoid using the attribute that have a huge number of distinct values as the grouping attribute. Sometime concatenating attributes may help reducing the number of grouping attributes in an insight.
    • The order of grouping attribute does matter.  Please put the grouping attributes that are parent of other grouping attribute first.  Basically, from the Parent to the Child according to the logical hierarchical level if such data pattern exists.  For example, Country -> Region -> State -> City -> Address.
    • Design a Drill down dashboard for the lowest level and use the List Report as the drill down report.  This way the grouping attribute at the bottom can be eliminated.
    Like
Like3 Follow
  • Status Answered
  • 3 Likes
  • 5 days agoLast active
  • 1Replies
  • 456Views
  • 3 Following

Product Announcement

Incorta 4.9 is now Generally Available (GA)!!!