Wednesday, September 13, 2023

Creating a pivot table with SQL

Alas, if I wanted to create the equivalent of a pivot table using Db2 for i there is not the equivalent of the PIVOT relational operator that there is in Microsoft SQL Server.

A pivot table is taking a table's data that is in rows and "turning on its side" to become data in columns. This is best demonstrated with an example. I have a file that lists sales by color, month, and year in separator rows. I want to convert that to be one row per column and year. If I am little creative I can do pivot data with Db2 for i.

I have a SQL DDL Table, TESTTABLE that contains the data I want to pivot:

SELECT * FROM TESTTABLE

       SALES_  SALES_  SALES_
COLOR  MONTH   YEAR    NUMBER 
-----  ------  ------  ------
BLUE        1    2023      98
BLUE        2    2023      73
BLUE        3    2023      34
GREEN       1    2023      68
GREEN       2    2023      56
GREEN       3    2023      78
RED         1    2023      85
RED         2    2023      51
RED         3    2023      65

I want my results to be in a single row like:

Color Year 1st month 2nd month 3rd month Total of the 3 months

What made this simple was using a Common Table Expression, CTE. This allows me to create one or more temporary tables, that only existing in memory rather than in storage, that I can use to accomplish my result.

In my example I will create two temporary tables before I present the results.

I am going to break apart my CTE into each temporary file to make it easier to explain what is done. If you want to make a copy of this example do make sure you copy all of the three parts into one statement.

The first part does the pivot, changing the data from rows to columns:

01  WITH T0 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3)
02  AS
03  (SELECT COLOR,SALES_YEAR,
04          CASE WHEN SALES_MONTH = 1 
05            THEN SALES_NUMBER 
06            ELSE 0 
07          END,
08          CASE WHEN SALES_MONTH = 2 
09            THEN SALES_NUMBER 
10            ELSE 0 
11          END,
12          CASE WHEN SALES_MONTH = 3 
13            THEN SALES_NUMBER 
14            ELSE 0 
15          END        
16  FROM TESTTABLE
17  ORDER BY 1,2),

Line 1: The first line of a CTE always starts with WITH. This is followed by the name of the temporary table. I never give mine complicated names as they are only used within the CTE. My own rule is to call the temporary tables "T" followed by a sequential number. As this is the first I name this temporary file T0, T + zero. The table name is followed by a list of the columns that the temporary table will contain.

Line 2: AS is needed before the SQL code to create the temporary table. It does not have to be on its own line. It is personal preference that it is.

Lines 3 – 17: This is the SQL statement to create T0. The statements is contained with parentheses, or brackets depending on whether you use American- or English-English. The first two columns of T0 are the color and sales year from TESTTABLE table.

Lines 4 – 7: I am using a Case statement that checks if the row is for the first sales month. If it is it places the value from the sales number column into this new column. If it is not it places zero into the column. If I do not zero the column it will contain null, and any math performed with null always results in null.

Lines 8 – 11: The same for the second sales month.

Lines 12 – 15: For the third sales month.

Line 16: Data comes from TESTTABLE.

Line 17: I sort the results by the color and the sales year. As there is another temporary file being defined after this one, this statement must end with a comma.

The results for temporary table T0 are:

COLOR  YEAR  MONTH_1  MONTH_2  MONTH_3 
-----  ----  -------  -------  -------
RED    2023       85        0        0
RED    2023        0       51        0
RED    2023        0        0       65
GREEN  2023       68        0        0
GREEN  2023        0       56        0
GREEN  2023        0        0       78
BLUE   2023       98        0        0
BLUE   2023        0       73        0
BLUE   2023        0        0       34

I now have a series of rows, one for each color, year, and month. Now I need to combine them to generate one row per color and year combination. That is what the second part of the CTE does:

18  T1 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3)
19  AS
20  (SELECT COLOR,YEAR,
21          SUM(MONTH_1),SUM(MONTH_2),SUM(MONTH_3)
22  FROM T0
23  GROUP BY COLOR,YEAR)

Line 18: As this is not the start of the CTE I do not need the WITH. I call my second temporary table T1. What follows the table are the columns within it.

Line 19: AS on its on line again.

Lines 20 – 23: The SQL statement to combine the monthly rows into one yearly one. I do this using the SUM scalar function for each year, with a GROUP BY, line 23, with sorts and group the results by the columns listed. The results come from the temporary table T0. There is no comma at the end of this statement as this is the last temporary file definition.

If I look at what T1 contains I would see:

COLOR  YEAR  MONTH_1  MONTH_2  MONTH_3
-----  ----  -------  -------  -------
BLUE   2023      98        73       34
GREEN  2023      68        56       78
RED    2023      85        51       65

I don't have a total column, yet.

The last part of this CTE is a SQL select statement:

24  SELECT T1.*,(MONTH_1 + MONTH_2 + MONTH_3) AS "TOTAL"
25  FROM T1 
26  ORDER BY COLOR,YEAR ;

Line 24: Rather than have to list all of the columns in T1 I can just use T1.*, T1 + the wild card, to include all the columns from T1. The next column is the one for the total. It is simple sum of the three month columns. I have given this new column the name of TOTAL.

Line 25: All the data comes from the temporary table T1.

Line 26: The results will be returned sorted by color and then year.

The final result looks like:

COLOR  YEAR  MONTH_1  MONTH_2  MONTH_3  TOTAL
-----  ----  -------  -------  -------  -----
BLUE   2023      98        73       34    205
GREEN  2023      68        56       78    202
RED    2023      85        51       65    201

If I need to produce this pivot table again I would put the in a SQL View. Then anyone can use it, at any time, to get the pivoted results:

01  CREATE OR REPLACE VIEW MYLIB.PIVOT_SALES_MONTHS
02    FOR SYSTEM NAME "PIVSLSMTH"
03    (COLOR,SALES_YEAR,SALES_MONTH_1,SALES_MONTH_2,SALES_MONTH_3,SALES_TOTAL)
04  AS
05  (WITH T0 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3)
06   AS
07   (SELECT COLOR,SALES_YEAR,
08           CASE WHEN SALES_MONTH = 1 
09             THEN SALES_NUMBER 
10             ELSE 0 
11           END,
12           CASE WHEN SALES_MONTH = 2 
13             THEN SALES_NUMBER 
14             ELSE 0 
15           END,
16           CASE WHEN SALES_MONTH = 3 
17             THEN SALES_NUMBER 
18             ELSE 0 
19           END        
20      FROM TESTTABLE
21   ORDER BY 1,2),

22   T1 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3)
23     AS
24     (SELECT COLOR,YEAR,
25             SUM(MONTH_1),SUM(MONTH_2),SUM(MONTH_3)
26        FROM T0
27       GROUP BY COLOR,YEAR)

28    SELECT T1.*,(MONTH_1 + MONTH_2 + MONTH_3)
29      FROM T1) ;

Line 1: I have used the CREATE OR REPLACE VIEW as it is a View it can be recreated without any loss of data.

Line 2: As the View's name is more than ten characters I give it a short system name.

Line 3: AS on its own line.

Lines 4 – 29: Same as the first version of the CTE I showed above. The only difference is that as a View cannot have a sort of its results within the View's definition, I removed that from the Select statement in the last part of the CTE's definition.

Now if I need this information again I can just use the following statement:

01  SELECT * FROM QTEMP.PIVOT_SALES_MONTHS

I have my rows of data from TESTTABLE returned as columns.

 

This article was written for IBM i 7.5, and should work for some earlier releases too.

3 comments:

  1. Are there OLAP functions that can do the same?

    ReplyDelete
  2. Not that I am aware of. If you know of one I hope you will share.

    ReplyDelete
  3. Hi, Simon! Thanks for the insighful article!

    Had a similar problem and this solution really helped me. But i replaced a "CASE WHEN SALES_MONTH =... " part with LISTAGG aggregate function, so column set is not a hardcode but is selected from the table. So the first row constains LISTAGG of all distinct months and the folowing rows contain color and LISTAGG of SALES_NUMBER for the color and each month.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.