T-SQL window functions

SQL window clause

Microsoft SQL server provides ability to aggregate data based on range or group. This helps developer to get aggregate values at different levels without having to exclude any columns from the query. 

The different window functions 

  • Sum
  • Avg
  • Count
  • Lead
  • Lag
  • Rank
  • Dense_rank
  • Row_number
  • Ntile

Syntax

WINDOW window_name AS (
       [ reference_window_name ]   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROW or RANGE clause> ]  
      )  
  
<PARTITION BY clause> ::=  
PARTITION BY value_expression , ... [ n ]  
  
<ORDER BY clause> ::=  
ORDER BY order_by_expression  
    [ COLLATE collation_name ]   
    [ ASC | DESC ]   
    [ ,...n ]  
  
<ROW or RANGE clause> ::=  
{ ROWS | RANGE } <window frame extent>

SUM,COUNT, AVG 

Below is employee table

Department

Employee

Salary

Avg_Salary_By_Dept

IT

Anand

5000

 

Sales

Roy

6000

 

IT

John

4500

 

IT

Patrick

7500

 

Sales

Richard

6000

 

HR

Ganesh

6500

 

HR

Vidya

7500

 

In the above example if we need to fill the average salary for each department against each employee row then we can use the SUM function.

SELECT Department,Employee,Salary,SUM(Salary) OVER (PARTITION BY Department) AS Avg_Salary_By_Dept 
FROM Employee

Similarly you can use other aggregate functions like count, avg.

LEAD,LAG

When you need previous row value to compare with current row then you can go ahead with LAG function and if you want to have next row value along with current row then LEAD function. 
For lead and lag function order by clause is very important as based on the columns specified in order by clause the query decides current and previous/next row. Also if you want this comparison should be with specific rowset/group condition then specify the condition in partition by clause. 

Query example 

Below is Employee table

Department

Employee

Salary

Joining date

IT

Anand

5000

1-jan-2023

Sales

Roy

6000

1-jan-2023

IT

John

4500

2-jan-2023

IT

Patrick

7500

10-jan-2023

Sales

Richard

6000

30-dec-2022

HR

Ganesh

6500

11-aug-2022

HR

Vidya

7500

12-sep-2022


From the above table if we need to compare the salary of each employee within a department based on their joining date the we would write the below query.

SELECT Department, Employee, Salary, LEAD(Salary) OVER (PARTITION BY Department ORDER BY [Joining date] ) NEXT_EMP_SALARY, [Joining date] FROM Employee
 LEAD will give null for last row as there is no next row exists.

Similarly LAG will give the previous value and will be null for first row.

RANK,DESNSE_RANK,ROW_NUMBER 

Above three are used for ranking or setting order between rows based on grouping and ordering condition. 

Department

Employee

Salary

Joining date

IT

Anand

5000

1-jan-2023

Sales

Roy

6000

1-jan-2023

IT

John

4500

2-jan-2023

IT

Patrick

7500

10-jan-2023

Sales

Richard

6000

30-dec-2022

HR

Ganesh

6500

11-aug-2022

HR

Vidya

7500

12-sep-2022


If we us RANK for above query as below 

SELECT Department, Employee, Salary, [Joining date], RANK() OVER (ORDER BY [Joining date] ) RANK_JD FROM Employee 
the our output will be as below.

Department

Employee

Salary

Joining date

RANK_JD

IT

Anand

5000

1-jan-2023

4

Sales

Roy

6000

1-jan-2023

4

IT

John

4500

2-jan-2023

6

IT

Patrick

7500

10-jan-2023

7

Sales

Richard

6000

30-dec-2022

3

HR

Ganesh

6500

11-aug-2022

1

HR

Vidya

7500

12-sep-2022

2


If We use DENSE_RANK with same query 

SELECT Department, Employee, Salary, [Joining date], RANK() OVER (ORDER BY [Joining date] ) RANK_JD FROM Employee 

Output is 

Department

Employee

Salary

Joining date

RANK_JD

IT

Anand

5000

1-jan-2023

4

Sales

Roy

6000

1-jan-2023

4

IT

John

4500

2-jan-2023

5

IT

Patrick

7500

10-jan-2023

6

Sales

Richard

6000

30-dec-2022

3

HR

Ganesh

6500

11-aug-2022

1

HR

Vidya

7500

12-sep-2022

2


So if you see that RANK uses the number of rows to assign the next rank but dense_rank just uses the next consecutive number to assign value. 

ROW_NUMBER is pretty streight forward and it will assign incremental number to each rows based on condition. 

Department

Employee

Salary

Joining date

RANK_JD

IT

Anand

5000

1-jan-2023

4

Sales

Roy

6000

1-jan-2023

5

IT

John

4500

2-jan-2023

6

IT

Patrick

7500

10-jan-2023

7

Sales

Richard

6000

30-dec-2022

3

HR

Ganesh

6500

11-aug-2022

1

HR

Vidya

7500

12-sep-2022

2



Query Example and Interview Questions will be added soon (Join Page to get update)




No comments:

Post a Comment

T-SQL LEAD LAG and SUM function based query

  Query on T-SQL window clause Below is the sales table Order_Date Name Product SubCategory ...