
Difference in data between Incorta Insight and generated SQL Output

I have noticed a difference between the output of an Incorta Insight and the output of the SQL generated by that Insight.

The Incorta Analyzer handles the query differently than the PostgreSQL engine handles the same generated SQL.

Please find my findings below.

Can anyone shed some light on this behavior?

Regards

Ishaq

Query Plan (screenshot attached)

Incorta Insight output (screenshot attached)

Output of the Generated SQL of the above Insight (screenshot attached)

SQL of the Insight

SELECT   "EBS_HR_COMMON"."HR_OPERATING_UNITS".organization_id                  AS "Org Id",
         "EBS_CAL_COMMON"."GREGORIAN_CALENDAR".month_year                      AS "Invoice Period",
         "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".category_set_name             AS "Category Set Name",
         (SUM ("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL".quantity_invoiced))        AS "Quantity",
         (SUM ("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL".quantity_invoiced_mt))     AS "Quantity Invoiced Mt"
    FROM ((((("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL"
              LEFT OUTER JOIN
              ebs_item_common.mtl_system_items_b
              ON (    ("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL".warehouse_id = ebs_item_common.mtl_system_items_b.organization_id)
                  AND ("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL".inventory_item_id = ebs_item_common.mtl_system_items_b.inventory_item_id)))
             LEFT OUTER JOIN
             "EBS_AR"."RA_CUSTOMER_TRX_ALL"
             ON ("EBS_AR"."RA_CUSTOMER_TRX_LINES_ALL".customer_trx_id = "EBS_AR"."RA_CUSTOMER_TRX_ALL".customer_trx_id))
            LEFT OUTER JOIN
            "EBS_CAL_COMMON"."GREGORIAN_CALENDAR"
            ON ("EBS_AR"."RA_CUSTOMER_TRX_ALL".trx_date = "EBS_CAL_COMMON"."GREGORIAN_CALENDAR"."Date"))
           LEFT OUTER JOIN
           "EBS_HR_COMMON"."HR_OPERATING_UNITS"
           ON ("EBS_AR"."RA_CUSTOMER_TRX_ALL".org_id = "EBS_HR_COMMON"."HR_OPERATING_UNITS".organization_id))
          LEFT OUTER JOIN
          "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES"
          ON (    (ebs_item_common.mtl_system_items_b.organization_id = "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".organization_id)
              AND (ebs_item_common.mtl_system_items_b.inventory_item_id = "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".inventory_item_id)))
   WHERE (    (    ((UPPER ("EBS_CAL_COMMON"."GREGORIAN_CALENDAR".month_year)) IN ('APR-21'))
               AND ("EBS_HR_COMMON"."HR_OPERATING_UNITS".organization_id = 7302))
          AND ((UPPER ("EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".category_set_name)) IN ('IFI_DSR_CATEGORY')))
GROUP BY "EBS_HR_COMMON"."HR_OPERATING_UNITS".organization_id, 
    "EBS_CAL_COMMON"."GREGORIAN_CALENDAR".month_year, 
    "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".category_set_name
ORDER BY "EBS_HR_COMMON"."HR_OPERATING_UNITS".organization_id ASC,
         "EBS_CAL_COMMON"."GREGORIAN_CALENDAR".month_year ASC,
         "EBS_ITEM_COMMON"."MTL_ITEM_CATEGORIES".category_set_name ASC

1 reply
    Hi Ishaq,

    In my understanding, the same query is handled differently in SQL than in an insight because of how Incorta performs joins.

    For example, if you have multiple parent table rows matching a single child table row, an Incorta insight will only return 1 row per child row.  But if you were to run a left outer join between the same tables through SQL, you would get a repeated child row for each matching parent row.  (This might be hard to test, because you should try to avoid this situation when designing a schema.)
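
    A minimal sketch of that difference, using hypothetical CHILD_FACT and PARENT_DIM tables (not your schema):

        -- Hypothetical: CHILD_FACT has one row with key_id = 100 and amount = 50;
        -- PARENT_DIM has two rows whose key_id is also 100.
        SELECT c.key_id,
               c.amount,
               p.attribute
          FROM CHILD_FACT c
               LEFT OUTER JOIN PARENT_DIM p
               ON (c.key_id = p.key_id)
        -- Plain SQL returns 2 rows here: the child row is repeated once per
        -- matching parent row, so SUM(c.amount) comes back doubled.
        -- An Incorta insight joining the same tables keeps 1 row per child
        -- row, so the measure is not inflated.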

    One easy way to understand it is if you ever use a < or > operator in an Incorta join.  Incorta will return the greatest row (1 row only) when using > as a join operator.  But if you were to put the same > condition in a SQL WHERE clause or join, it would return all rows that satisfy the comparison.
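
    Again with hypothetical TRANSACTIONS and RATE_CHANGES tables, the same > comparison in plain SQL behaves like this:

        -- Hypothetical: TRANSACTIONS (trx_date) and RATE_CHANGES (effective_date).
        SELECT t.trx_id,
               r.rate
          FROM TRANSACTIONS t
               LEFT OUTER JOIN RATE_CHANGES r
               ON (t.trx_date > r.effective_date)
        -- Plain SQL returns every RATE_CHANGES row whose effective_date is
        -- earlier than the transaction date, so one transaction can come
        -- back many times.  An Incorta join defined with > keeps only the
        -- greatest matching row, i.e. 1 row per transaction.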

    There is one other situation to check in your case, because you are seeing no rows returned via SQL.  I had this happen once because SQL queries handled by the PostgreSQL engine can fall back to Spark depending on their complexity.  If that occurs, the source for the data is different behind the scenes.  On the server there is a directory of data called 'compacted', located at [root]/tenants/[tenantName]/compacted/[schemaName].  When the Spark engine processes SQL it reads from the compacted directory.

    In that directory you'll see a subdirectory for each table - make sure one exists for every table.  Each subdirectory should show a created date matching the last time you ran a load (incremental or full), and make sure its name doesn't end in .0.  I once had directories with .0 extensions and was getting no results whenever external SQL fell back to the Spark engine.  I think this was caused by creating a table and only ever running incremental loads (always run a full load first!)
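
    To illustrate (the table names below are generic placeholders, not your schema), a healthy compacted directory versus the problem case looks roughly like this:

        [root]/tenants/[tenantName]/compacted/[schemaName]/
            TABLE_A/      <- expected: present, dated at the last load
            TABLE_B/      <- expected
            TABLE_C.0/    <- suspect: .0 suffix; run a full load of this table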

    -Mike

