Alas, if I wanted to create the equivalent of a pivot table using Db2 for i there is not the equivalent of the PIVOT relational operator that there is in Microsoft SQL Server.
A pivot table is taking a table's data that is in rows and "turning on its side" to become data in columns. This is best demonstrated with an example. I have a file that lists sales by color, month, and year in separator rows. I want to convert that to be one row per column and year. If I am little creative I can do pivot data with Db2 for i.
I have a SQL DDL Table, TESTTABLE that contains the data I want to pivot:
SELECT * FROM TESTTABLE SALES_ SALES_ SALES_ COLOR MONTH YEAR NUMBER ----- ------ ------ ------ BLUE 1 2023 98 BLUE 2 2023 73 BLUE 3 2023 34 GREEN 1 2023 68 GREEN 2 2023 56 GREEN 3 2023 78 RED 1 2023 85 RED 2 2023 51 RED 3 2023 65 |
I want my results to be in a single row like:
Color | Year | 1st month | 2nd month | 3rd month | Total of the 3 months |
What made this simple was using a Common Table Expression, CTE. This allows me to create one or more temporary tables, that only existing in memory rather than in storage, that I can use to accomplish my result.
In my example I will create two temporary tables before I present the results.
I am going to break apart my CTE into each temporary file to make it easier to explain what is done. If you want to make a copy of this example do make sure you copy all of the three parts into one statement.
The first part does the pivot, changing the data from rows to columns:
01 WITH T0 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3) 02 AS 03 (SELECT COLOR,SALES_YEAR, 04 CASE WHEN SALES_MONTH = 1 05 THEN SALES_NUMBER 06 ELSE 0 07 END, 08 CASE WHEN SALES_MONTH = 2 09 THEN SALES_NUMBER 10 ELSE 0 11 END, 12 CASE WHEN SALES_MONTH = 3 13 THEN SALES_NUMBER 14 ELSE 0 15 END 16 FROM TESTTABLE 17 ORDER BY 1,2), |
Line 1: The first line of a CTE always starts with WITH. This is followed by the name of the temporary table. I never give mine complicated names as they are only used within the CTE. My own rule is to call the temporary tables "T" followed by a sequential number. As this is the first I name this temporary file T0, T + zero. The table name is followed by a list of the columns that the temporary table will contain.
Line 2: AS is needed before the SQL code to create the temporary table. It does not have to be on its own line. It is personal preference that it is.
Lines 3 – 17: This is the SQL statement to create T0. The statements is contained with parentheses, or brackets depending on whether you use American- or English-English. The first two columns of T0 are the color and sales year from TESTTABLE table.
Lines 4 – 7: I am using a Case statement that checks if the row is for the first sales month. If it is it places the value from the sales number column into this new column. If it is not it places zero into the column. If I do not zero the column it will contain null, and any math performed with null always results in null.
Lines 8 – 11: The same for the second sales month.
Lines 12 – 15: For the third sales month.
Line 16: Data comes from TESTTABLE.
Line 17: I sort the results by the color and the sales year. As there is another temporary file being defined after this one, this statement must end with a comma.
The results for temporary table T0 are:
COLOR YEAR MONTH_1 MONTH_2 MONTH_3 ----- ---- ------- ------- ------- RED 2023 85 0 0 RED 2023 0 51 0 RED 2023 0 0 65 GREEN 2023 68 0 0 GREEN 2023 0 56 0 GREEN 2023 0 0 78 BLUE 2023 98 0 0 BLUE 2023 0 73 0 BLUE 2023 0 0 34 |
I now have a series of rows, one for each color, year, and month. Now I need to combine them to generate one row per color and year combination. That is what the second part of the CTE does:
18 T1 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3) 19 AS 20 (SELECT COLOR,YEAR, 21 SUM(MONTH_1),SUM(MONTH_2),SUM(MONTH_3) 22 FROM T0 23 GROUP BY COLOR,YEAR) |
Line 18: As this is not the start of the CTE I do not need the WITH. I call my second temporary table T1. What follows the table are the columns within it.
Line 19: AS on its on line again.
Lines 20 – 23: The SQL statement to combine the monthly rows into one yearly one. I do this using the SUM scalar function for each year, with a GROUP BY, line 23, with sorts and group the results by the columns listed. The results come from the temporary table T0. There is no comma at the end of this statement as this is the last temporary file definition.
If I look at what T1 contains I would see:
COLOR YEAR MONTH_1 MONTH_2 MONTH_3 ----- ---- ------- ------- ------- BLUE 2023 98 73 34 GREEN 2023 68 56 78 RED 2023 85 51 65 |
I don't have a total column, yet.
The last part of this CTE is a SQL select statement:
24 SELECT T1.*,(MONTH_1 + MONTH_2 + MONTH_3) AS "TOTAL" 25 FROM T1 26 ORDER BY COLOR,YEAR ; |
Line 24: Rather than have to list all of the columns in T1 I can just use T1.*, T1 + the wild card, to include all the columns from T1. The next column is the one for the total. It is simple sum of the three month columns. I have given this new column the name of TOTAL.
Line 25: All the data comes from the temporary table T1.
Line 26: The results will be returned sorted by color and then year.
The final result looks like:
COLOR YEAR MONTH_1 MONTH_2 MONTH_3 TOTAL ----- ---- ------- ------- ------- ----- BLUE 2023 98 73 34 205 GREEN 2023 68 56 78 202 RED 2023 85 51 65 201 |
If I need to produce this pivot table again I would put the in a SQL View. Then anyone can use it, at any time, to get the pivoted results:
01 CREATE OR REPLACE VIEW MYLIB.PIVOT_SALES_MONTHS 02 FOR SYSTEM NAME "PIVSLSMTH" 03 (COLOR,SALES_YEAR,SALES_MONTH_1,SALES_MONTH_2,SALES_MONTH_3,SALES_TOTAL) 04 AS 05 (WITH T0 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3) 06 AS 07 (SELECT COLOR,SALES_YEAR, 08 CASE WHEN SALES_MONTH = 1 09 THEN SALES_NUMBER 10 ELSE 0 11 END, 12 CASE WHEN SALES_MONTH = 2 13 THEN SALES_NUMBER 14 ELSE 0 15 END, 16 CASE WHEN SALES_MONTH = 3 17 THEN SALES_NUMBER 18 ELSE 0 19 END 20 FROM TESTTABLE 21 ORDER BY 1,2), 22 T1 (COLOR,YEAR,MONTH_1,MONTH_2,MONTH_3) 23 AS 24 (SELECT COLOR,YEAR, 25 SUM(MONTH_1),SUM(MONTH_2),SUM(MONTH_3) 26 FROM T0 27 GROUP BY COLOR,YEAR) 28 SELECT T1.*,(MONTH_1 + MONTH_2 + MONTH_3) 29 FROM T1) ; |
Line 1: I have used the CREATE OR REPLACE VIEW as it is a View it can be recreated without any loss of data.
Line 2: As the View's name is more than ten characters I give it a short system name.
Line 3: AS on its own line.
Lines 4 – 29: Same as the first version of the CTE I showed above. The only difference is that as a View cannot have a sort of its results within the View's definition, I removed that from the Select statement in the last part of the CTE's definition.
Now if I need this information again I can just use the following statement:
01 SELECT * FROM QTEMP.PIVOT_SALES_MONTHS |
I have my rows of data from TESTTABLE returned as columns.
This article was written for IBM i 7.5, and should work for some earlier releases too.
Are there OLAP functions that can do the same?
ReplyDeleteNot that I am aware of. If you know of one I hope you will share.
ReplyDeleteHi, Simon! Thanks for the insighful article!
ReplyDeleteHad a similar problem and this solution really helped me. But i replaced a "CASE WHEN SALES_MONTH =... " part with LISTAGG aggregate function, so column set is not a hardcode but is selected from the table. So the first row constains LISTAGG of all distinct months and the folowing rows contain color and LISTAGG of SALES_NUMBER for the color and each month.