When I wrote about using a SQL Sequence to place sequential values in a column I used the example of ranking students based upon on their test results. Looking back this may not have been the best example, as if two students had the same score they would have been given a different sequence number.
Fortunately SQL does have a function I can use to properly rank these students, based on their test scores.
Let me start with the students and their test scores:
LAST_NAME FIRST_NAME MID_INITIAL SCORE --------- ---------- ----------- ----- FUESTER MAX 25.69 ECKERT TRISTAN M 49.86 APUZZO CAROL C 14.89 TORRES JOSE 94.96 PENA JUANITA M 74.09 HONG XIA 18.54 REED STEVE J 65.83 OKEEFE VINCENT 6.02 KING CATHERINA K 18.76 WILEY CHRISTIAN J 60.77 MORALES MARIA A 65.83 |
Steve Reed and Maria Morales have the same test score, therefore, they need to have the same ranking. Let me show the SQL statement I would use to rank these eleven students by their score.
01 SELECT 02 RTRIM(LAST_NAME) || ',' || RTRIM(FIRST_NAME) || ' ' || MID_INITIAL as "Student name", 03 SCORE AS "Test score", 04 RANK() OVER(ORDER BY SCORE DESC) AS "Rank" 05 FROM MYLIB.STUDENT |
Line 2: I am concatenating the parts of the students' name into one column. Here I need to use the RTRIM to remove the trailing blanks from the last and first name columns. The double pipes ( || ) are used to concatenate these parts into one string.
Line 4: Here is the rank function. It has two parts, RANK() followed by the OVER. The OVER tells the rank how to sort and rank the columns in the results. In this case I want the results ranked (sorted) by the Score column in descending order. This will sort the results into this order, and return an integer value.
The results look like:
Student name Test score Rank ---------------- ---------- ---- TORRES,JOSE 94.96 1 PENA,JUANITA M 74.09 2 REED,STEVE J 65.83 3 MORALES,MARIA A 65.83 3 WILEY,CHRISTIAN J 60.77 5 ECKERT,TRISTAN M 49.86 6 FUESTER,MAX 25.69 7 KING,CATHERINA K 18.76 8 HONG,XIA 18.54 9 APUZZO,CAROL C 14.89 10 OKEEFE,VINCENT 6.02 11 |
We can see the Steve Reed and Maria Morales are both rank in third place. Christian Wiley is ranked as fifth as with two third rankings there is no fourth.
What if I want to have a fourth rank, no matter how many third ranks I have. To achieve this I would use the DENSE_RANK() function, as shown below:
01 SELECT 02 RTRIM(LAST_NAME) || ',' || RTRIM(FIRST_NAME) || ' ' || MID_INITIAL as "Student name", 03 SCORE AS "Test score", 04 DENSE_RANK() OVER(ORDER BY SCORE DESC) AS "Rank" 05 FROM MYLIB.STUDENT |
Line 4: The DENSE_RANK() replaces the RANK() that was used in the previous statement.
Steve Reed and Maria Morales are still ranked as third, and Christian Wiley is now ranked as number 4.
Student name Test score Rank ---------------- ---------- ---- TORRES,JOSE 94.96 1 PENA,JUANITA M 74.09 2 REED,STEVE J 65.83 3 MORALES,MARIA A 65.83 3 WILEY,CHRISTIAN J 60.77 4 ECKERT,TRISTAN M 49.86 5 FUESTER,MAX 25.69 6 KING,CATHERINA K 18.76 7 HONG,XIA 18.54 8 APUZZO,CAROL C 14.89 9 OKEEFE,VINCENT 6.02 10 |
When using RANK() and DENSE_RANK() I can have multiple values in the OVER, as this more complex examples shows.
There was a soccer tournament with four teams. They played each other twice, once at home and once away. At the end of the tournament the teams had the following records:
TEAM WINS DRAWS LOSSES GOALS_FOR GOALS_AGAINST -------- ---- ----- ------ --------- ------------- ATHLETIC 1 0 5 4 12 CITY 3 2 1 6 4 ROVERS 1 1 4 5 8 UNITED 5 1 0 15 6 |
To determine the tournament winner I need to calculate the number of points each team won, and the goal difference, difference between the goals for and against. I can use the following statement to generate a table that lists all of teams ranked by the total points, and goal difference if two teams have the same number of points.
01 SELECT TEAM AS "Team", 02 (WINS + DRAWS + LOSSES) AS "Pl", 03 WINS AS "W", 04 DRAWS AS "D", 05 LOSSES AS "L", 06 GOALS_FOR AS "GF", 07 GOALS_AGAINST AS "GA", 08 (GOALS_FOR - GOALS_AGAINST) AS "GD", 09 ((WINS * 3) + DRAWS) AS "Pts", 10 RANK() OVER(ORDER BY ((WINS * 3) + DRAWS) DESC, (GOALS_FOR - GOALS_AGAINST) DESC) AS "Rank" 11 FROM MYLIB.LEAGUE |
Line 1: First columns is the team's name.
Line 2: Next comes the number of games played, which is calculated and the sum number of games each team won, drew, and lost, Pl.
Line 3: Number of games the team won, W.
Line 4: Number of games the team drew (or tied), D.
Line 5: Number of game the team lost, L.
Line 6: Number of goals the team scored, GF.
Line 7: Number of goals the team conceded, GA.
Line 8: The goal difference is the difference between the number of goals the team scored and the number they conceded, GD.
Line 9: The three points are awarded for each win, and one point for a draw, Pts.
Line 10: The OVER part of this rank may look complicated, but it is not. The first thing to use to rank the results is the number of points, and the second is the goal difference.
The tournament winners are United.
Team Pl W D L GF GA GD Pts Rank -------- -- - - - -- -- -- --- ---- UNITED 6 5 1 0 15 6 9 16 1 CITY 6 3 2 1 6 4 2 11 2 ROVERS 6 1 1 4 5 8 -3 4 3 ATHLETIC 6 1 0 5 4 12 -8 3 4 |
Just to prove that the second part of the OVER works I inserted a bogus team, BOGUS, into the results with the same number of wins, draws, and losses as Athletic but Bogus have scored two more goals.
TEAM WINS DRAWS LOSSES GOALS_FOR GOALS_AGAINST -------- ---- ----- ------ --------- ------------- BOGUS 1 0 5 6 12 |
Now when I run the same SQL statement as before I can see that Bogus is ranked above Athletic. They both have three points, and Bogus has the better goal difference.
Team Pl W D L GF GA GD Pts Rank -------- -- - - - -- -- -- --- ---- UNITED 6 5 1 0 15 6 9 16 1 CITY 6 3 2 1 6 4 2 11 2 ROVERS 6 1 1 4 5 8 -3 4 3 BOGUS 6 1 0 5 6 12 -6 3 4 ATHLETIC 6 1 0 5 4 12 -8 3 5 |
If you need rank results now you need to decide whether to use RANK() or DENSE_RANK().
You can learn more about the RANK() and DENSE_RANK() command from the IBM website here.
This article was written for IBM i 7.3, and should work for releases 7.1 and greater.
Used this recently as the reference for both rank and denseRank. Clear, meaningful article. Thanks again.
ReplyDeleteGood! Thanks for sharing...
ReplyDeleteSimon, these are very interesting. I like the ‘rank’ command, I can see a large amount of usage with it. Great read and thanks for sharing.
ReplyDeleteVery useful, thanks Simon.
ReplyDeleteThanks, I will wiil be using the rank command in my next project!
ReplyDeleteVery useful. Thank you
ReplyDelete