Wednesday, September 4, 2024

Extra columns in a Select statement but not in the group by

I struggled with the title of this post to make it something that explains the content, without it being too long. I am not limited for length in the body of this post so I can explain in more detail my scenario: I want to have columns that are listed in my results for a Select statement that are not in the GROUP BY clause.

This is something that has been made easier with a new Scalar function added in the latest round of Technology Refreshes. First let me give you examples of my scenario.

What I am going to do is to take use my PERSON DDL Table and make a count of the places of birth of the people. As these are people from the British Isles, I will also include the columns for the county and the country (England, Ireland, Northern Ireland, Scotland, or Wales) from a second Table, TOWN.

Let me start by showing that everyone in the PERSON Table has their place of birth in the Table:

01  SELECT FIRST_NAME || ' ' || LAST_NAME AS "Name",
02         PLACE_OF_BIRTH "Birth place"
03    FROM PERSON
04    LIMIT 5

Line 1: I am concatenating the first and last name columns just to make, IMHO, a nicer column.

I am limiting the results to just five rows as you will understand what is going on without me showing the results from all of the Table.

The results are:

Name              Birth place
----------------  -----------
REG ALLEN         MARYLEBONE
JACK CROMPTON     HULME
ROGER BYRNE       GORTON
JOHNNY CAREY      DUBLIN
THOMAS MCNULTY    SALFORD

To include the county and the country in the results I need to join the table PERSON to the table TOWN. Which is what the statement below does:

01  SELECT A.FIRST_NAME || ' ' || A.LAST_NAME AS "Name",
02         A.PLACE_OF_BIRTH "Birth place",
03         B.COUNTY AS "County",
04         B.COUNTRY AS "Country"
05    FROM PERSON A, TOWN B
06   WHERE A.PLACE_OF_BIRTH = B.TOWN_NAME
07   LIMIT 5

In this statement I have prefixed the columns from each Table with a letter so can identify which Table the column is in.

Lines 3 and 4: These are the two columns I want from the TOWN Table.

Line 5 and 6: I join the two Tables together using the Place of Birth column from the PERSON Table and the Town Name column from TOWN.

The results are:

Name              Birth place  County       Country
----------------  -----------  -----------  -------
REG ALLEN         MARYLEBONE   MIDDLESEX    ENG
JACK CROMPTON     HULME        LANCASHIRE   ENG
ROGER BYRNE       GORTON       LANCASHIRE   ENG
JOHNNY CAREY      DUBLIN       DUBLIN       IRE
THOMAS MCNULTY    SALFORD      LANCASHIRE   ENG

Four of the people are from England (ENG) and one from the Republic of Ireland (IRE).

Now I want to know which town or city were most of the people in the PERSON Table born. Here I would use the Count scalar function and the Group by clause to group my results by place of birth:

01  SELECT PLACE_OF_BIRTH AS "Birth place", 
02         COUNT(*) AS "No. people"
03    FROM PERSON
04   GROUP BY PLACE_OF_BIRTH
05   ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH
06   LIMIT 5

The results how that the most popular place of birth is the city of Manchester.

Birth place    No. people
-------------  ----------
MANCHESTER              6
SALFORD                 5
GORTON                  2
ALDERSHOT               1
ASHTON-UNDER-LYNE       1

I want to include the county and country in my results. If I had never used the Group By clause before I would have tried modifying the above statement to be:

01  SELECT PLACE_OF_BIRTH AS "Birth place", 
02         COUNT(*) AS "No people",
03         COUNTY AS "County",
04         COUNTRY AS "Country"
05    FROM PERSON, TOWN
06   WHERE PLACE_OF_BIRTH = TOWN_NAME
07   GROUP BY PLACE_OF_BIRTH
08   ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH
09   LIMIT 5

Lines 3 and 4: I have added the County and Country columns from TOWN.

Lines 5 and 6: I have joined the two Tables together.

When I try to execute this statement, I get an error:

[SQL0122] Column COUNTY or expression in SELECT list not valid.

This means that I cannot use the County column in the list of columns selected for the results as it is not included the Group by clause.

I can overcome this error by adding the County and Country columns to the Group By clause:

01  SELECT PLACE_OF_BIRTH AS "Birth place", 
02         COUNT(*) AS "No people",
03         COUNTY AS "County",
04         COUNTRY AS "Country"
05    FROM PERSON, TOWN
06   WHERE PLACE_OF_BIRTH = TOWN_NAME
07   GROUP BY PLACE_OF_BIRTH, COUNTY, COUNTRY
08   ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH
09   LIMIT 5

Line 7: I have added the columns COUNTY and COUNTRY to the Group By clause.

The statement will execute and I get results:

Birth place    No. people  County       Country
-------------  ----------  -----------  -------
MANCHESTER              6  LANCASHIRE   ENG
SALFORD                 5  LANCASHIRE   ENG
GORTON                  2  LANCASHIRE   ENG
ALDERSHOT               1  HAMPSHIRE    ENG
ASHTON-UNDER-LYNE       1  LANCASHIRE   ENG

But I don't want to group my results by the place of birth, county and country. I just want to group by the place of birth.

This is where the new scalar function introduced in the latest round of Technology Refreshes comes into play. The ANY_VALUE scalar function allows me to include a column in the Select section and not include it in the Group by clause.

I can modify my previous statement to be:

01  SELECT PLACE_OF_BIRTH AS "Birth place", 
02         COUNT(*) AS "No people",
03         ANY_VALUE(COUNTY) AS "County",
04         ANY_VALUE(COUNTRY) AS "Country"
05    FROM PERSON, TOWN
06   WHERE PLACE_OF_BIRTH = TOWN_NAME
07   GROUP BY PLACE_OF_BIRTH
08   ORDER BY COUNT(*) DESC, PLACE_OF_BIRTH
09   LIMIT 5

Line 3: The County column is in the ANY_VALUE Scalar function.

Line 4: The Country column is in its own ANY_VALUE Scalar function.

Line 7: The Group by clause only includes the place of birth.

When I execute the statement, I get the following results:

Birth place    No. people  County       Country
-------------  ----------  -----------  -------
MANCHESTER              6  LANCASHIRE   ENG
SALFORD                 5  LANCASHIRE   ENG
GORTON                  2  LANCASHIRE   ENG
ALDERSHOT               1  HAMPSHIRE    ENG
ASHTON-UNDER-LYNE       1  LANCASHIRE   ENG

I know this is a simple example, but you can see how useful this new Scalar function is.

 

This article was written for IBM i 7.5 TR4 and 7.4 TR10.

5 comments:

  1. Hi Simon, aren't the last results the same as your previous unwanted results?
    Love your work,
    Thanks,
    Marc

    ReplyDelete
    Replies
    1. In this simple example both the statements are basically the same.
      Once you start creating a complex statement over a large amount of data I would want to keep my GROUP BY down to a small as possible set of data. This is where the value of the ANY_VALUE comes into play. As the column(s) are not in the GROUP BY CPU time does not have to be wasted by sorting and grouping on columns I don't care about. Which will mean faster results.

      Delete
  2. Oh I'm loving this! Cheers :)

    ReplyDelete
  3. Really very interesting. Thank you.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.