Learning SQL is often a 6-step process:
One of the first barriers I ran into was in step 3, trying to wrap my head around window functions. Fortunately there are many great resources that provide in-depth explanations of how window functions work which certainly helped me in getting to step 3 and step 6. But I want to focus on perhaps a more important aspect of getting to step 3: when should I even use window functions? What types of problems will they help solve?
Fundamentally, when thinking through any SQL problem, one must think in
terms of records. What information does each record in the table
represent? Do I only care about certain records? Time to think of the
WHERE
. Do I need records from a different table?
FROM
clause. Do I care about records in aggregate?
GROUP BY
clause. Do I only care about specific groups?
HAVING
clause. The vast majority of the time, this basic
syntax will get me what I need. By thinking very carefully about what I
want to happen to 3 or 4 records, I can get a sense of the shape of my
final query.
However, sometimes when breaking down a problem, I run into a roadblock: what if I need information from some other records to compute information about this record? For example, let's say I have daily records of the weather, what if I want to compare today's record against yesterday's record? Running through the list of basic SQL clauses, I realize that none of them quite solve this problem. Enter window functions. The biggest clue to use window functions is when you need to pull information from other records, into this record.
Okay... Well when is that? Since that statement is a little opaque, let's think of a couple examples and break down how they fit the criteria.
lag(temp) OVER(ORDER BY day)
.
GROUP BY
week. But when I break down the problem in
terms of records, I realize that what I'm actually trying to do is for
time t, take the average of the previous six records,
t-6, and this record t. After some searching and
trial and error, I find that I need to
avg(temp) OVER(ORDER BY day ROWS 6 PRECEDING)
.
sum(steps) OVER()
the other records
to get my denominator.
sum(steps) OVER(ORDER BY day)
Hopefully these examples give you a better picture of when window functions are the right tool to use. They may not be something you use everyday but when you inevitably run into this inter-record problem space, they really make your life easier. As a fun aside, know that all window functions could be replaced by a single CTE and join, which is how these inter-record problems were solved up until recently. Once you start to feel a solid footing on step 3, try replacing your window functions with joins to start the climb to step 4.