- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎10-26-2022 02:28 PM
I am trying to do a seemingly simple thing. I thought I could use a result-set approach but it did not work.
So, here is the task: group averages of numbers based on a category. An aggregate insight does that well. The problem is I want to put in a formula to calculate the difference between the averages.
Data:
Category | Average A |
Cat1 | 10 |
Cat2 | 5 |
The result is a transposed set with a difference column:
Metric | Cat1 | Cat2 | Difference |
Average A | 10 | 5 | 5 |
Any and all suggestions are appreciated.
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎10-28-2022 07:12 AM - edited ‎10-28-2022 07:12 AM
To round out the solution: I also had to use an Incorta Analyzer table to calculate the averages and have access to them from the KPI insight. The MV wouldn't allow me to do the calculations in SQL.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎10-28-2022 01:14 PM - edited ‎10-28-2022 01:19 PM
Here's how this can be solved using an MV with Spark SQL:
I'm using an 'AdventureWorks' database, if your not familiar its simply a sample database often used for testing for a retailer that has tables such as OrderDetail, OrderHeader, Products, ProductSubcategory, ProductCategory etc...
In the first screenshot you can see I have a query that joins a few tables to create a result set in the same format you have with Categories & AvgLineTotal. This will become an inner query, but I'm applying the avg here to improve query performance(reduce as early as possible)
In the screen shot below you can see the result set of the query:
Next, lets use a Pivot function in an outer query, youll notice an avg is applied in the pivot function as its required to apply an aggregate function with pivot, however this does not effect the result in this case. Also to note, most modern versions of sql support this now, so we are going to use this to swap the categories into fields as shown below:
After running this we can now see the result set getting closer to our desired outcome:
And finally, we can perform any function/mathematical operations against those category fields, in our case I chose 'Bikes-Clothing AS Bike_Clothing_Diff' in the outer query as shown below and we can see our difference:
SELECT *, Bikes-Clothing AS Bike_Clothing_Diff FROM(
SELECT pc.categoryname AS Category, avg(sod.linetotal) AS AvgLineTotal
FROM OnlineStore2.salesorderdetail sod
Left Join OnlineStore2.product p ON sod.productid = p.productid
Left Join OnlineStore2.productsubcategory ps ON p.productsubcategoryid = ps.productsubcategoryid
Left Join OnlineStore2.productcategory pc ON ps.productcategoryid = pc.productcategoryid
GROUP BY pc.categoryname
) a
PIVOT(avg(AvgLineTotal)
FOR Category IN('Bikes','Clothing','Accessories','Components'))
Hopefully this gives you an additional option, please let us know if you have any further questions or if you need any further detail anywhere in this process.
Regards,
Zach, Solutions Engineer @ Incorta
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎10-28-2022 01:41 PM
Thank you, Zach. This is very helpful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎11-04-2022 12:06 PM
I have a follow-up for you Zach. @ZachBreimayer: How would the SQL change for multiple aggregate columns?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎11-04-2022 02:17 PM - edited ‎11-04-2022 02:20 PM
Hi Marc,
Here is an example I have extended upon the previous. In the first image you can see that within the pivot function after the 'PIVOT(avg(AvgLineTotal' you can then simply add a comma and add the next aggregate 'SUM(OrderQTYTotal)' as shown in the first image below.
Doing this will create a field for each combination of aggregates and categories as shown in the next image.
SELECT *, Bikes_AvgLineTotal-Clothing_AvgLineTotal AS Bike_Clothing_AvgLineTotal_Diff, Bikes_OrderQTYTotal-Clothing_OrderQTYTotal AS Bike_Clothing_OrderQTYTotal_Diff FROM(
SELECT pc.categoryname AS Category, avg(sod.linetotal) AS AvgLineTotal, sum(OrderQTY) AS OrderQTYTotal
FROM OnlineStore2.salesorderdetail sod
Left Join OnlineStore2.product p ON sod.productid = p.productid
Left Join OnlineStore2.productsubcategory ps ON p.productsubcategoryid = ps.productsubcategoryid
Left Join OnlineStore2.productcategory pc ON ps.productcategoryid = pc.productcategoryid
GROUP BY pc.categoryname
) a
PIVOT(avg(AvgLineTotal) AS AvgLineTotal, SUM(OrderQTYTotal) AS OrderQTYTotal
FOR Category IN('Bikes','Clothing','Accessories','Components'))
Appreciate the follow up, always happy to help so let us know if you have any more questions!
Zach, Solutions Engineer @ Incorta