06-22-2023 11:27 AM
I have a scenario in front of me where I have a bunch of identical ( columns ) Incorta tables -- i.e. I have a table for each month of the year for the past 2+ years which I want to consolidate into one big table w/ all the data inside.
Monthly tables: Pulled from original source. Broken into monthly tables because they return exponentially faster that way.
Consolidated table: IncortaSQL apparently does not support doing a big UNION or UNION ALL and we were getting memory errors using SQLi with the big UNION statement.
What *IS* working is having all the month tables as separate datasets via SQLi - and it's pretty fast!
That said, we must remain optimistic and assume the calendar will not stop.
So we will need to keep adding month tables as we go ( we have 2023 covered, w/ July forward obvs not having any data )
So - finally - the question: Is there a limit to the number of datasets which Incorta will support?
Corollary question - how are these processed? Does Incorta send 36 requests ( to itself in this case ) at the same time? Is that governed by the parallel job tasks config in the CMC? I can't imagine they are all done in series, although there must be some staging somewhere w/in the mix because I am having Incorta enforce primary key constraints.
I'm very excited about this - give me some good news Community members! 😉
Screen cap - 3 of 30! datasets ( 2021 combines Jan-Jun as they were relatively low volume months )
06-27-2023 04:41 PM
TL/DR - making these modular will give me more flexibility and increase my ability to move corrective changes into the most used dashboards in the shortest time possible.
------------
That said, the primary reason for staging into Incorta first is to meter/limit the activity against the source DB . The source is a view on a production box. The view is ( I gather ) pretty heavy and is completely "black box" to me, so I can't do any tuning or even recreate it w/in Incorta. The short is, because it's production I don't want to be a burden on the system which may affect operations.
Currently I have three schemae ( 2021, 2022, 2023 with 7, 12, and12 tables respectively ) which I can run in series or independently and which can be further broken down if need be.
So - use case: A change is made to move a product to a different category and that change needs to be reflected ASAP for the WTD/MTD/QTD/YTD reporting I can choose to run only the tables necessary for the immediate need *now* and be running correct dashboards very quickly while correcting further history at my schedule.