Window functions for Linux users

Select employees with highest salary in each departmentIn our table we have two departments IT and HR , Third employee has the highest salary in him own department whereas Six employee has the highest salary in HR department .

Let’s write SQL statementselect e.

name,d.

name,e.

salary from emp e inner join dep d on e.

dep_id = d.

idwhere (e.

dep_id,e.

salary) in ( select dep_id,max(salary) from emp group by dep_id )Result 1We join two tables and then use subquery to fetch max salaries for each department.

But what if we want to select employee name and difference between max salary in department and employee salary.

select e.

name ,e.

salary ,top_salaries.

max from emp einner join ( select dep_id,max(salary) as max from emp group by dep_id )top_salaries on top_salaries.

dep_id = e.

dep_idResult 2If requirements such as max ,comparing with min , top n-th will grow then our query would become more complex to maintain.

For such situations Postgres gives us strong feature called Window functionsLet’s see the definition from official documentationWindow functions provide the ability to perform calculations across sets of rows that are related to the current query row.

See Section 3.

5 for an introduction to this feature.

The built-in window functions are listed in Table 9-49.

Note that these functions must be invoked using window function syntax; that is an OVER clause is required.

In addition to these functions, any built-in or user-defined aggregate function can be used as a window function (see Section 9.

20 for a list of the built-in aggregates).

Aggregate functions act as window functions only when an OVER clause follows the call; otherwise they act as regular aggregates.

Quite hard to understand so let’s see an examplesMake note: window functions always use the OVER() clause so if you see OVER() you're looking at a window function.

Select name , salary and sum of all salaries from employee tableselect name,salary,sum(salary) over() as total from emp ;Result 3Please notice that without window functions we can’t use aggregate functions with additional columns without group by clause.

And now let’s see an SQL equivalent without window functionselect e.

name,e.

salary,top.

sum from emp e,(select sum(salary) as sum from emp) top;Obviously , first one is more readable.

Condition clausesIn the above example, we looked at a simple window function without any additional conditions, but in many cases, you’ll want to apply some conditions in the form of additional clauses to your OVER() clause.

One is PARTITION BY which acts as the grouping mechanism for aggregations.

The other one is ORDER BY which orders the results in the window frame.

Partition by — allows us to group aggregations according to the values of the specified fields.

Do you remember the second query where we have selected employee with highest salary in him department.

Now let’s rewrite it using Partition byselect salary,name,sum(salary) over(partition by dep_id) from emp;We include column names(dep_id) to group result.

I want you to mention that we used only aggregate functions , but Postgres provides a number of window specific functions which you can find here.

One of the most popular use case of window functions is to rank each row using RANK() function .

Let’s say we want to rank each employee’s salary according to department.

We can easily do it using RANK function.

select e.

salary,e.

name,d.

name ,rank() over() from emp einner join dep don e.

dep_id = d.

id;Result 4Now all employees have the same rank because ranking query requires an order to by established,let’s fix it using ORDER BY.

select e.

salary,e.

name,d.

name ,rank() over(order by e.

salary) from emp einner join dep don e.

dep_id = d.

id;Result 5Now we have ranks for each employees but we need to divide them by department , it’s condition so we will use PARTITION BYselect e.

salary,e.

name,d.

name ,rank() over(PARTITION BY e.

dep_id order by e.

salary) from emp einner join dep don e.

dep_id = d.

id;Result 5Now if we want to see top 2 salaries for IT department we can easily do it using limit and where clauses with query aboveselect e.

salary,e.

name,d.

name ,rank() over(PARTITION BY e.

dep_id order by e.

salary) from emp einner join dep don e.

dep_id = d.

idwhere e.

dep_id = 1limit 2;Important thing about ORDER BY it’s how it works with aggregate functions .

As you remember adding PARTITION BY to aggregate function gives us the same result for all rows , but ORDER BY recalculate result depending on the number of row.

To see it we will create a table of payments historycreate table payment_history(id serial primary key,payment double precision,user_id int references emp(id));insert into payment_history (payment,user_id)values (10,(select id from emp where name like 'First')),(20,(select id from emp where name like 'First')),(-5,(select id from emp where name like 'First')),(-10,(select id from emp where name like 'First')),(2,(select id from emp where name like 'First'));We have 5 payments for First user , and now we want to see changes in balance after each payment.

Without ORDER BY result will beselect e.

name,p.

payment, sum(p.

payment) over() from emp e inner join payment_history p on e.

id = p.

user_id;Result 6Now let’s add ORDER BY to query aboveselect e.

name,p.

payment, sum(p.

payment) over(partition by p.

user_id order by p.

id) from emp e inner join payment_history p on e.

id = p.

user_id order by p.

id;Result 7As you can see we have changes in user balance after each iteration.

For example for first row sum is 10 because user has only 1 payment , for second row sum is 30 because it’s sum of first payment(10) and second payment(20)ConclusionIn conclusion I want to emphasize that we didn’t consider all window functions like lead,first_value_last_value,but I hope you have understood which problems can be solved with this strong feature.

In the second part of this article I want to describe window function it term of performance .

.

. More details

Leave a Reply