on 05-02-2022 09:00 AM
Simply put, a Z-Score is the number of standard deviations from the mean a data point is. In other words, Z-Score measures a particular performance against the entire field of values, accounting for both the average result and the full range of performances from top to bottom. In sports, this measure can be particularly useful for comparing athlete performances when different venues are considered. For example, shooting a 62 for a professional golfer is great, but how great is it really if the golf course is super easy or other golfers are also scoring really low on that particular day. This article from Grantland does a great job of describing this measure.
Formulaically, a z-score is calculated as:
z = (x - µ)/s
where:
µ is the mean of the population, and
s is the standard deviation of the population
I recently had to write a SQL statement against the Incorta SQL interface to produce this calculation. The trick with SQL is that you have to do a "first pass" to calculate the standard deviation. As seen below, I do this in a separate SELECT and CROSS JOIN it to my main query. I figured I would share it here in case it benefits others down the road as this is a fairly common metric.
The SQL statement below calculates the Z-Score for each AccountNumber based on how much a particular account purchased (AcctSalesTotal) as compared to the rest of the accounts in June, 2014. The formula for the Z-Score can be seen at the top of the statement, (t.AcctSalesTotal - tt.AvgAll) / tt.sdAll As zScore. The mean (AvgAll) and the standard deviation (sdAll) are calculated in the SELECT below and CROSS JOINED into our main SELECT.
SELECT t.AccountNumber,
(t.AcctSalesTotal - tt.AvgAll) / tt.sdAll As zScore
FROM
(SELECT soh.AccountNumber, SUM(sod.UnitPrice) AS AcctSalesTotal
FROM AW.salesorderdetail sod, AW.salesorderheader soh
WHERE soh.SalesOrderID = sod.SalesOrderID
AND date_part('month', soh.OrderDate) = 6
AND date_part('year', soh.OrderDate) = 2014
GROUP BY soh.AccountNumber) as t
CROSS JOIN
(
SELECT avg(tt.AcctSalesTotal) as AvgAll, stddev(tt.AcctSalesTotal) as sdAll
FROM
(SELECT soh.AccountNumber, SUM(sod.UnitPrice) AS AcctSalesTotal
FROM AW.salesorderdetail sod, AW.salesorderheader soh
WHERE soh.SalesOrderID = sod.SalesOrderID
AND date_part('month', soh.OrderDate) = 6
AND date_part('year', soh.OrderDate) = 2014
GROUP BY soh.AccountNumber) as tt
) as tt;
Happy building!