Performance Analysis of Postgres connection

srini_ch
Astronaut

Hi Community,

Basically, I am creating a multi-source table using a Postgres connection, which utilizes tables that already exist in other Incorta schemas. This is similar to writing an MV with a Union All.

I just wanted to understand the performance impact of this approach versus standard MV creation using Spark SQL or PySpark. Is there any documentation that I could use?

Thanks,

Srini 

1 REPLY

dylanwan
Employee

We call this approach "Incorta over Incorta". It is a SQL table that uses the PostgreSQL driver against the Incorta SQL interface. In general, it is not recommended unless you need it to support multiple sources.
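To make the mechanics concrete, here is a minimal sketch of what the connection amounts to: a standard PostgreSQL client pointed at the Incorta SQL interface. The host, port, tenant, credentials, and table names below are placeholders, not values from this thread; check your own cluster's configuration.

```python
import psycopg2

# Placeholder connection details for the Incorta SQL interface (assumptions):
# the port is commonly 5436 by default, but verify it in your CMC settings.
conn = psycopg2.connect(
    host="incorta-analytics-host",  # Analytics service host (placeholder)
    port=5436,                      # SQL interface port (verify in your CMC)
    dbname="my_tenant",             # tenant name (placeholder)
    user="sql_user",
    password="********",
)

with conn.cursor() as cur:
    # The query is answered from data already loaded into the Analytics
    # service, i.e. the last completed refresh, not a load still in progress.
    cur.execute(
        "SELECT * FROM schema_a.orders "
        "UNION ALL "
        "SELECT * FROM schema_b.orders"
    )
    rows = cur.fetchall()

conn.close()
```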

Since it runs against the Incorta SQL interface, which is served by the Incorta Analytics service, only data that has already been loaded into the Analytics service is available. Data that has been extracted but not yet loaded will not be visible; you will see the data from the last completed refresh, not from a refresh that is still in progress.

It can perform very well if the data fits in memory and the SQL query does not cause the SQL interface to fall back to Spark. However, it will compete for the same resources that serve users running Incorta dashboards against the Analytics service.

If the query does fall back to Spark, it will be handled by the SQLApp job running in your Spark cluster. It may still perform very well, since Spark processes the query in parallel.

Whether the query falls back to Spark is not really a runtime decision; it is determined by the following criteria:

  • Whether the query involves any table that is not loaded into memory, such as tables with optimized set to false.
  • Whether the query syntax can be supported by the Incorta query engine, for example, joining tables whose joins are not defined in Incorta (see the sketch after this list).
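As a hedged illustration of the second bullet, a query joining two tables whose join is not modeled in Incorta would be routed to the SQLApp job on the Spark cluster rather than answered from memory. The schema, table, and column names here are hypothetical.

```python
# Hypothetical query that would trigger the Spark (SQLApp) fallback because
# the join between these two tables is not defined in Incorta.
fallback_query = """
    SELECT c.customer_id, SUM(o.amount) AS total_amount
    FROM schema_a.orders o
    JOIN schema_b.customers c
      ON o.customer_id = c.customer_id   -- join not modeled in Incorta
    GROUP BY c.customer_id
"""
# Submitting this through the same PostgreSQL connection shown earlier would
# execute it on the Spark cluster instead of the in-memory engine.
```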

Using an MV is recommended, but merging multiple data sets this way does involve duplicating the data: you extract the data into a SQL table and then use it as the source for the MV. Since both the MV and the source extract can run incrementally, duplicating the data may not be too costly. If the source table is not used in any analytic query, we suggest making it non-optimized to save memory and data refresh time.
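For reference, a minimal PySpark sketch of the recommended alternative, a Union All MV over the already-loaded tables, might look like the following. The schema_a.orders and schema_b.orders names are placeholders, and the exact read/save hooks Incorta's MV environment provides can differ by version, so treat this as an outline rather than a drop-in script.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the two source tables (placeholder names).
orders_a = spark.read.table("schema_a.orders")
orders_b = spark.read.table("schema_b.orders")

# unionByName keeps the merge robust if the two sources list their columns
# in a different order.
df = orders_a.unionByName(orders_b)

# Incremental logic (e.g. filtering on a last-updated timestamp) can be
# layered on top if both sources support it.
# Outside Incorta this is just a preview; inside an Incorta MV you would hand
# `df` to the MV runtime's save hook instead.
df.show(5)
```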

If you are not using Incorta over Incorta and are not using any 3rd-party tool against Incorta, the SQL interface can be disabled. If you disable the SQL interface, SQLApp can also be disabled, so it will not compete with Incorta MVs for the limited resources of the Spark cluster.