08-07-2022 05:02 PM
We have a schema whose load has been unable to complete for over 50 hours. We normally load this schema twice daily, and a load usually takes about 1.5 hours. Here you can see the status:
There have been no changes in the schema design that I'm aware of. Prior to Friday it ran successfully every time. If we click on "In Queue" to look for blocking jobs, the list is empty:
So how do we troubleshoot this? Is there anything in particular that would prevent Incorta from syncing the metadata in the Commit and Sync stage of a job? It's clear that it will never finish, since one phase of the job has already run more than 30 times as long as the whole job normally takes (50 hours versus about 1.5 hours). Sync normally takes a trivial amount of time.
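The comparison above (how long one phase has been running versus what it normally takes) can be turned into a simple check. This is a minimal sketch, assuming you can export per-phase durations for past successful runs; the phase names and numbers in `HISTORY` are hypothetical placeholders, not Incorta output, and `stuck_phases` is an illustrative helper, not an Incorta function.

```python
from statistics import median

# Hypothetical per-phase durations (in hours) from past successful runs.
# Phase names are placeholders, not Incorta identifiers.
HISTORY = {
    "extract": [0.9, 1.0, 0.95],
    "load": [0.4, 0.45, 0.5],
    "commit_and_sync": [0.05, 0.04, 0.06],
}


def stuck_phases(history, current, factor=5.0):
    """Return (phase, ratio) pairs for phases whose current elapsed time exceeds
    `factor` times their historical median, largest ratio first."""
    flagged = []
    for phase, elapsed in current.items():
        past = history.get(phase)
        if not past:
            continue  # no baseline for this phase, so nothing to compare against
        ratio = elapsed / median(past)
        if ratio > factor:
            flagged.append((phase, round(ratio, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # The job in question: commit/sync has been running for about 50 hours.
    print(stuck_phases(HISTORY, {"commit_and_sync": 50.0}))
```

Comparing each phase against its own baseline, rather than against the whole job, makes the problem stand out even more sharply: a sync phase that normally takes a few minutes and has run for 50 hours is roughly a thousand times its usual length.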
I have reviewed the documentation here: https://docs.incorta.com/cloud/references-data-ingestion-and-loading and several related parts of the docs. I could not find any guidance on how to return to operational status in this situation.
I've seen some similar threads on Spark forums and developer boards about the job executor getting stuck on the last step. Any insights would be greatly appreciated.
08-07-2022 05:11 PM
Additional note: other schemas are able to load and sync just fine. Unfortunately, as might be expected, the one that's stuck is the most-used schema.
08-08-2022 12:06 AM
The Sync phase is for making the data available in the analytics service for reporting after a schema refresh.
After the data is refreshed by a scheduled or ad hoc data load job, the refreshed data has to be made available for reporting. Incorta uses the Sync phase to sync between the nodes, typically the analytics service and the loader service. In a multi-node architecture, the Sync phase covers the sync among all nodes, including analytics nodes and loader nodes.
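One way to picture why a single slow or unresponsive node could hold a job in Sync is to treat sync as a barrier: the round completes only once every node has acknowledged the refreshed data. This is a simplified conceptual model under that assumption, not Incorta's actual implementation; the `SyncTracker` class and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyncTracker:
    """Hypothetical model: a sync round completes only after every node acks."""
    nodes: set
    acked: set = field(default_factory=set)

    def ack(self, node):
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        self.acked.add(node)

    def pending(self):
        # Nodes that have not yet confirmed they picked up the refreshed data.
        return sorted(self.nodes - self.acked)

    def is_complete(self):
        return not self.pending()


if __name__ == "__main__":
    tracker = SyncTracker(nodes={"loader-1", "analytics-1", "analytics-2"})
    tracker.ack("loader-1")
    tracker.ack("analytics-1")
    # analytics-2 never acknowledges, so the round stays open indefinitely.
    print(tracker.is_complete(), tracker.pending())
```

Under this model, the useful troubleshooting question is which node has not finished syncing, since a single pending node is enough to keep the whole job open.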
You may try the following:
To get an alert for long-running schema jobs, you may consider deploying a data alert that runs against the Incorta metadata job history.
You may be running into an issue that is addressed in a later Incorta release. Here are the release notes for Incorta 5.2.
They mention: "Interrupt long-running dashboards that block sync processes."
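For the data alert suggested above, the core logic is a query over job history that returns schema loads still running past a threshold. This is a minimal sketch using Python's built-in `sqlite3` with an in-memory table; the table name `job_history`, its columns, the `'RUNNING'` status value, and the schema names are hypothetical stand-ins, since the actual Incorta metadata tables are not described here.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical stand-in for a job-history table; real column names will differ.
SCHEMA = """
CREATE TABLE job_history (
    schema_name TEXT,
    status      TEXT,
    start_time  TEXT  -- ISO 8601 timestamp
)
"""

# Jobs that are still running and started before the cutoff.
LONG_RUNNING_SQL = """
SELECT schema_name, start_time
FROM job_history
WHERE status = 'RUNNING' AND start_time < ?
ORDER BY start_time
"""


def long_running_jobs(conn, now, max_hours):
    """Return (schema_name, start_time) rows running longer than max_hours."""
    cutoff = (now - timedelta(hours=max_hours)).isoformat()
    return conn.execute(LONG_RUNNING_SQL, (cutoff,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = datetime(2022, 8, 7, 17, 0)
    conn.executemany(
        "INSERT INTO job_history VALUES (?, ?, ?)",
        [
            ("SalesSchema", "RUNNING", (now - timedelta(hours=50)).isoformat()),
            ("HRSchema", "RUNNING", (now - timedelta(hours=1)).isoformat()),
            ("FinanceSchema", "SUCCEEDED", (now - timedelta(hours=60)).isoformat()),
        ],
    )
    # With a normal load of about 1.5 hours, a 3-hour threshold leaves headroom.
    for schema_name, started in long_running_jobs(conn, now, max_hours=3):
        print(f"ALERT: {schema_name} has been running since {started}")
```

Setting the threshold relative to the schema's normal load time (here, twice the usual 1.5 hours) would have flagged the stuck job within a few hours instead of after two days.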
Hope this helps.