Mash's Musings


Aggregations inside window functions

Published Jul. 24, 2021

I wrote recently about the path of learning SQL and I wanted to take a little excursion into what step 6 can look like. In that recent post, I provided examples of when one can use window functions. We generally want to compare things between days or some other aggregated measure. Unfortunately, as anyone working with data knows, the source data is often at a transactional level with many many records per day. This problem is easily handled by CTEs: first we aggregate our records by day as a CTE and then we apply the window functions to the aggregated CTE. But, did you know that this can (in most SQL engines) be elegantly and concisely written without the use of a CTE? Let's take a look at an example.

Let's say that we have a table of temperature recordings and we want to compare the change in daily highs. We have a table temps that records the time and temp at given reading. And because we are sane and sensible, we record our temperature in Celsius.

time temp
2021-07-01 12:14:42 20.0
2021-07-01 17:01:14 25.0
2021-07-02 06:08:22 18.0
2021-07-02 20:00:11 23.0
2021-07-03 04:39:21 14.0
2021-07-03 11:30:00 21.0
2021-07-03 22:09:43 18.0

The first step would involve figuring out the max temperatures for each day.

SELECT date_trunc('day', time) AS day, max(temp) AS max_temp
FROM temps
GROUP BY 1
ORDER BY 1

Which gets us:

day max_temp
2021-07-01 25.0
2021-07-02 23.0
2021-07-03 21.0

Now with a little window function wizardry, we can compute our day-over-day change.

WITH daily AS (
  SELECT date_trunc('day', time) AS day, max(temp) AS max_temp
  FROM temps
  GROUP BY 1
  ORDER BY 1
)
SELECT *,
       lag(max_temp) OVER(ORDER BY day) AS yesterday_max,
       max_temp - lag(max_temp) OVER(ORDER BY day) AS dod_change
FROM daily

Which gets us:

day max_temp yesterday_max dod_change
2021-07-01 25.0
2021-07-02 23.0 25.0 -2.0
2021-07-03 21.0 23.0 -2.0

All is well and we have what we need. But we can do better than that!

SELECT date_trunc('day', time) AS day,
       max(temp) AS max_temp,
       lag(max(temp)) OVER(ORDER BY date_trunc('day', time)) AS yesterday_max,
       max(temp) - lag(max(temp)) OVER(ORDER BY date_trunc('day', time)) AS dod_change
FROM temps
GROUP BY 1
ORDER BY 1
day max_temp yesterday_max dod_change
2021-07-01 25.0
2021-07-02 23.0 25.0 -2.0
2021-07-03 21.0 23.0 -2.0

Slick! If you look carefully, all we really did was sub in max(temp) and date_trunc('day', time) into our window function where we previously had used the aliased fields max_temp and day respectively. This nesting can look quite confusing at first, especially if you start doing things like sum(count(*)) OVER() to get the total count of records by day but I think it is wonderfully clear when you understand what's happening.

Like all things, practice makes perfect. Don't expect to get this right the first time. But next time you see this pattern of aggregation then window function, stretch yourself and see if you can do it in one shot. Who knows who you can become with all the extra time saved from writing those 5 extra lines of code!?