I struggled with the title of this post to make it something that explains the content, without it being too long. I am not limited for length in the body of this post so I can explain in more detail my scenario: I want to have columns that are listed in my results for a Select statement that are not in the GROUP BY clause.
This is something that has been made easier with a new Scalar function added in the latest round of Technology Refreshes. First let me give you examples of my scenario.
What I am going to do is to take use my PERSON DDL Table and make a count of the places of birth of the people. As these are people from the British Isles, I will also include the columns for the county and the country (England, Ireland, Northern Ireland, Scotland, or Wales) from a second Table, TOWN.
Let me start by showing that everyone in the PERSON Table has their place of birth in the Table:
01 SELECT FIRST_NAME || ' ' || LAST_NAME AS "Name", 02 PLACE_OF_BIRTH "Birth place" 03 FROM PERSON 04 LIMIT 5 |
Line 1: I am concatenating the first and last name columns just to make, IMHO, a nicer column.
I am limiting the results to just five rows as you will understand what is going on without me showing the results from all of the Table.
The results are:
Name Birth place ---------------- ----------- REG ALLEN MARYLEBONE JACK CROMPTON HULME ROGER BYRNE GORTON JOHNNY CAREY DUBLIN THOMAS MCNULTY SALFORD |
To include the county and the country in the results I need to join the table PERSON to the table TOWN. Which is what the statement below does:
01 SELECT A.FIRST_NAME || ' ' || A.LAST_NAME AS "Name", 02 A.PLACE_OF_BIRTH "Birth place", 03 B.COUNTY AS "County", 04 B.COUNTRY AS "Country" 05 FROM PERSON A, TOWN B 06 WHERE A.PLACE_OF_BIRTH = B.TOWN_NAME 07 LIMIT 5 |
In this statement I have prefixed the columns from each Table with a letter so can identify which Table the column is in.
Lines 3 and 4: These are the two columns I want from the TOWN Table.
Line 5 and 6: I join the two Tables together using the Place of Birth column from the PERSON Table and the Town Name column from TOWN.
The results are:
Name Birth place County Country ---------------- ----------- ----------- ------- REG ALLEN MARYLEBONE MIDDLESEX ENG JACK CROMPTON HULME LANCASHIRE ENG ROGER BYRNE GORTON LANCASHIRE ENG JOHNNY CAREY DUBLIN DUBLIN IRE THOMAS MCNULTY SALFORD LANCASHIRE ENG |
Four of the people are from England (ENG) and one from the Republic of Ireland (IRE).
Now I want to know which town or city were most of the people in the PERSON Table born. Here I would use the Count scalar function and the Group by clause to group my results by place of birth:
01 SELECT PLACE_OF_BIRTH AS "Birth place", 02 COUNT(*) AS "No. people" 03 FROM PERSON 04 GROUP BY PLACE_OF_BIRTH 05 ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH 06 LIMIT 5 |
The results how that the most popular place of birth is the city of Manchester.
Birth place No. people ------------- ---------- MANCHESTER 6 SALFORD 5 GORTON 2 ALDERSHOT 1 ASHTON-UNDER-LYNE 1 |
I want to include the county and country in my results. If I had never used the Group By clause before I would have tried modifying the above statement to be:
01 SELECT PLACE_OF_BIRTH AS "Birth place", 02 COUNT(*) AS "No people", 03 COUNTY AS "County", 04 COUNTRY AS "Country" 05 FROM PERSON, TOWN 06 WHERE PLACE_OF_BIRTH = TOWN_NAME 07 GROUP BY PLACE_OF_BIRTH 08 ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH 09 LIMIT 5 |
Lines 3 and 4: I have added the County and Country columns from TOWN.
Lines 5 and 6: I have joined the two Tables together.
When I try to execute this statement, I get an error:
[SQL0122] Column COUNTY or expression in SELECT list not valid. |
This means that I cannot use the County column in the list of columns selected for the results as it is not included the Group by clause.
I can overcome this error by adding the County and Country columns to the Group By clause:
01 SELECT PLACE_OF_BIRTH AS "Birth place", 02 COUNT(*) AS "No people", 03 COUNTY AS "County", 04 COUNTRY AS "Country" 05 FROM PERSON, TOWN 06 WHERE PLACE_OF_BIRTH = TOWN_NAME 07 GROUP BY PLACE_OF_BIRTH, COUNTY, COUNTRY 08 ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH 09 LIMIT 5 |
Line 7: I have added the columns COUNTY and COUNTRY to the Group By clause.
The statement will execute and I get results:
Birth place No. people County Country ------------- ---------- ----------- ------- MANCHESTER 6 LANCASHIRE ENG SALFORD 5 LANCASHIRE ENG GORTON 2 LANCASHIRE ENG ALDERSHOT 1 HAMPSHIRE ENG ASHTON-UNDER-LYNE 1 LANCASHIRE ENG |
But I don't want to group my results by the place of birth, county and country. I just want to group by the place of birth.
This is where the new scalar function introduced in the latest round of Technology Refreshes comes into play. The ANY_VALUE scalar function allows me to include a column in the Select section and not include it in the Group by clause.
I can modify my previous statement to be:
01 SELECT PLACE_OF_BIRTH AS "Birth place", 02 COUNT(*) AS "No people", 03 ANY_VALUE(COUNTY) AS "County", 04 ANY_VALUE(COUNTRY) AS "Country" 05 FROM PERSON, TOWN 06 WHERE PLACE_OF_BIRTH = TOWN_NAME 07 GROUP BY PLACE_OF_BIRTH 08 ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH 09 LIMIT 5 |
Line 3: The County column is in the ANY_VALUE Scalar function.
Line 4: The Country column is in its own ANY_VALUE Scalar function.
Line 7: The Group by clause only includes the place of birth.
When I execute the statement, I get the following results:
Birth place No. people County Country ------------- ---------- ----------- ------- MANCHESTER 6 LANCASHIRE ENG SALFORD 5 LANCASHIRE ENG GORTON 2 LANCASHIRE ENG ALDERSHOT 1 HAMPSHIRE ENG ASHTON-UNDER-LYNE 1 LANCASHIRE ENG |
I know this is a simple example, but you can see how useful this new Scalar function is.
This article was written for IBM i 7.5 TR4 and 7.4 TR10.
Hi Simon, aren't the last results the same as your previous unwanted results?
ReplyDeleteLove your work,
Thanks,
Marc
In this simple example both the statements are basically the same.
DeleteOnce you start creating a complex statement over a large amount of data I would want to keep my GROUP BY down to a small as possible set of data. This is where the value of the ANY_VALUE comes into play. As the column(s) are not in the GROUP BY CPU time does not have to be wasted by sorting and grouping on columns I don't care about. Which will mean faster results.
Oh I'm loving this! Cheers :)
ReplyDeleteVery useful, Great
ReplyDeleteReally very interesting. Thank you.
ReplyDelete