Load failures are sometimes not graceful

johnmanitaras · 11-28-2023

Hi,

Some of our schema loads will occasionally fail due to memory errors. On this I would like to pass on some feedback as well as seek some advice.

The feedback is that, depending on how a load falls over, it can end up breaking all practical data access. I suspect that if the memory issue happens while formula columns are being calculated in post-load, then the dependent data from other tables and schemas is not loaded into memory, so some of those formula calculations error while the overall schema load still completes and updates. The result is data with null values in those formula columns, which is a serious problem because some of these formula columns are used extensively as filters in virtually all of our dashboards. This can effectively break all access to data until the schema loads again, and even a staging load on some of these schemas isn't especially quick to complete (assuming someone notices the problem and triggers it!).

It would be much preferred if the load job just errored and did not update the parquet, so that data could still be accessed.
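To make the requested behaviour concrete, here is a minimal sketch of a "validate, then publish" step. It is only an illustration under stated assumptions, not how Incorta's loader works: the function `publish_if_valid`, its parameters, and the JSON file standing in for the parquet output are all hypothetical.

```python
import json
import os
import tempfile


def publish_if_valid(rows, target_path, required_columns, max_null_fraction=0.0):
    """Replace target_path with the new rows only if validation passes.

    Hypothetical helper: rows is a list of dicts, and a JSON file stands in
    for the real parquet output. On failure the previous file is untouched.
    """
    if not rows:
        raise ValueError("refusing to publish an empty load")

    # Reject the load if any required (formula) column has too many nulls.
    for col in required_columns:
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls / len(rows) > max_null_fraction:
            raise ValueError(
                f"column {col!r} has {nulls} null values; keeping previous data"
            )

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(rows, f)
        os.replace(tmp_path, target_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With a check like this, a load that produced nulls in a key formula column would raise an error and leave the last good data in place for dashboards to keep using.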

On our side, we are aware that we have multiple opportunities to optimise our data model. However, Incorta's documentation on this is very general, and I am not clear on which aspects of our data model have the biggest impact on loader memory specifically. Memory has been a persistent issue for us since we started using Incorta five years ago, so it would be great if we could understand this in more detail. Can you provide some specific advice about optimising for loader service memory?

mhelmy · 12-13-2023

Hello John,

Thank you for your feedback.

We've made major enhancements to the loader architecture starting from the On-prem 6.0 release, introducing improved memory management that significantly reduces memory issues. In the latest updates, we've implemented a red-zone threshold for off-heap memory, allowing us to pause memory-bound calculations as needed. Additionally, a retry mechanism has been introduced to reduce failed calculations caused by memory issues.
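As a rough illustration of the general idea only (this is not Incorta's implementation, and every name here is hypothetical), a red-zone threshold combined with a retry loop can be sketched like this: a calculation is held back while the memory budget is in the red zone, and retried a limited number of times before it is reported as failed.

```python
import time


class MemoryBudget:
    """Hypothetical tracker for a fixed memory pool with a red-zone threshold."""

    def __init__(self, capacity_bytes, red_zone_fraction=0.9):
        self.capacity = capacity_bytes
        self.red_zone = capacity_bytes * red_zone_fraction
        self.used = 0

    def in_red_zone(self, extra=0):
        # True if reserving `extra` more bytes would cross the threshold.
        return self.used + extra > self.red_zone


def run_with_retry(task, needed_bytes, budget, max_attempts=3,
                   wait_seconds=0.0, on_wait=None):
    """Run task() once memory allows; pause and retry while in the red zone."""
    for _ in range(max_attempts):
        if budget.in_red_zone(needed_bytes):
            if on_wait is not None:
                on_wait(budget)  # e.g. let other work finish and free memory
            time.sleep(wait_seconds)
            continue
        budget.used += needed_bytes
        try:
            return task()
        finally:
            budget.used -= needed_bytes  # release the reservation
    raise MemoryError(f"still in red zone after {max_attempts} attempts")
```

The point of the sketch is that a memory-bound calculation either runs with its memory reserved or fails explicitly after its retries, rather than completing with partial results.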

For more details about the loader enhancements, please refer to this community article.

Furthermore, we have plans to provide you with visibility into the load job, enabling you to identify memory requirements for specific calculations like formulas or joins. This enhancement should assist you in optimizing your model.