.png)
- Article History
- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
on
12-04-2024
10:48 AM
- edited on
01-09-2025
02:50 PM
by
Tristan
Introduction
This update allows users to enable the Advanced SQL Interface without automatically activating Null Handling, preserving the behavior of existing dashboards and join calculations.
What you need to know before reading this article
- Null Handling Overview: Null Handling differentiates between nulls, zeros, empty strings, and empty dates.
- Advanced SQL Interface Overview:
- This feature connects Incorta to external BI tools (e.g., Tableau, Power BI) and optimizes query execution using Spark and the Incorta Engine.
- The Advanced SQL Interface determines the optimal execution path for queries, leveraging the Engine or Spark to enhance performance.
Understanding how these features interact will help you make informed decisions about your configuration.
Null Handling and Advanced SQL Interface Updates
Null Handling: Behavior Changes
In earlier 2024.7.x releases, the Loader Service always treated records with null values individually. During join calculations, a null value was not considered equal to another null, zero, or empty value. The Loader Service also accounted for null values when evaluating join filters.
Starting with the 2024.7.3 release, the Loader Service will respect the Null Handling flag when performing joins and evaluating join filters:
- If Null Handling is enabled, a null value will not be treated as equal to another null, zero, or empty value during join calculations. This ensures that nulls are preserved as distinct values during the join process.
- If Null Handling is disabled:
- Nulls in numeric columns will be treated as zeros.
- Nulls in string/text columns will be treated as empty strings.
- Nulls in date columns will be treated as empty dates.
Key Consideration:
- After enabling or disabling this setting, perform a load from staging or wait for the next load job to process the changes. All services must be restarted.
Advanced SQL Interface: Behavior Changes
In earlier 2024.7.x releases, enabling the Advanced SQL Interface automatically enables Null Handling.
Starting with the 2024.7.3 release, the Advanced SQL Interface can operate independently of Null Handling, offering users greater flexibility and control.
As mentioned, the Advanced SQL Interface determines the optimal execution path for queries by leveraging the Engine or Spark to optimize performance. While Spark inherently respects null values, the Engine requires Null Handling to be enabled; otherwise, nulls are interpreted as zeros or empty values, potentially leading to inconsistencies.
Recommendation:
Enable Null Handling alongside the Advanced SQL Interface to ensure consistent query results.
Use Cases of Inconsistency when Advanced SQLi is enabled, and Null Handling is disabled
Given the dataset:
ProductName | Category | Price |
Laptop |
Electronics |
1000 |
Mouse |
Accessories |
25 |
Keyboard | Accessories |
45 |
Monitor |
Electronics |
200 |
Tablet | Electronics |
NULL |
Printer |
Electronics |
150 |
Smartphone |
Electronics |
800 |
Webcam |
Accessories |
NULL |
Speaker |
Audio | 120 |
MS-Office |
Software |
100 |
MS-Office |
Software |
50 |
MS-Office |
Software |
NULL |
Example1: Spark SQL Views & Insights in Incorta:
- Edit or Create a business schema
- Create a Spark SQL View with the query below and click on execute
SELECT ProductName, avg(Price) as avgPrice, percentile(Price, 0.3) as percentilePrice
FROM productsalesschema.productdata
WHERE ProductName IN ('Webcam', 'Tablet', 'MS-Office')
GROUP BY ProductName
ORDER BY ProductName
- Result: Since the query involves the percentile function, Spark processes it, and the result will include null where appropriate:
ProductName Avg price percentilePrice MS-Office 75.00 65.00 Tablet NULL NULL Webcam NULL NULL -
Create a new listing table insight in any Dashboard using the Spark SQL View mentioned above. Then, drag and drop ProductName and avgPrice into the insight.
-
Result: Querying only ProductName and avgPrice will be processed by the Engine, which ignores null values because Null Handling is disabled.
ProductName avgPrice MS-Office 50.00 Tablet 0.00 Webcam 0.00 - Drag and drop percentilePrice to the insight and observe the result.
- Result: Adding percentilePrice to the insight triggers Spark to process the entire query, ensuring null values are respected and accurately reflected in the results.
ProductName avgPrice percentilePrice MS-Office 75.00 65.00 Tablet NULL NULL Webcam NULL NULL
Example 2: Running Queries from BI Tools:
Let’s try running the following Spark SQL queries directly from DbVisualizer:
Scenario 1:
The query below retrieves the Product Name, Average Price, and Percentile of the Price. It is processed by Spark, which ensures that null values are properly accounted for when calculating both the average and the percentile.
SELECT ProductName, avg(Price) as avgPrice,
percentile(Price, 0.3) as percentilePrice
FROM productsalesschema.productdata
WHERE ProductName IN ('Webcam', 'Tablet', 'MS-Office')
GROUP BY ProductName
ORDER BY ProductName
Result:
ProductName | avgPrice | percentilePrice |
MS-Office |
75.00 |
65.00 |
Tablet |
NULL |
NULL |
Webcam |
NULL |
NULL |
Scenario 2:
The query below retrieves the Product Name and Average Price. The Engine processes it, and since the Null Handling feature is disabled, null values will be treated as zeros.
SELECT ProductName, avg(Price) as avgPrice
FROM productsalesschema.productdata
WHERE ProductName IN ('Webcam', 'Tablet', 'MS-Office')
GROUP BY ProductName
ORDER BY ProductName
Result:
ProductName | avgPrice |
MS-Office |
50.00 |
Tablet |
0.00 |
Webcam |
0.00 |
Conclusion:
This update provides greater flexibility by allowing users to enable the Advanced SQL Interface independently of Null Handling. However, users must review their configurations to ensure query results remain consistent and accurate.