Wednesday, July 7, 2021

Combining data from the same file in multiple libraries into a view

view union joions convert number to date and varchar to char

When anyone asks for information that they are going to "pull" from the IBM i into another database or to interface to another application on the IBM i I do not want them to access the "live" files for several reasons:

  1. Control the information they can get to, don't show them what they do not need
  2. If there are calculations or other mapping that needs to be performed perform them, rather than rely on them to do it and then find they have not done so in the right manner
  3. Map data from the files and tables into the type of data format they want
  4. Prevent them from being able to insert, change, or delete data from files and tables

In my opinion the perfect tool for this is a SQL View. I can build it the way I want and tell them to use that.

A recent example was a request for the date of the latest test runs of a certain job on the development server. This is complex as there are ten environments (library lists of related libraries) and they wanted the information from all. The ERP application writes the data from each run into a DDS file. This file contains a lot more than the start and end dates selected for the job, which is not needed and I am not going to give it.

All the data that was needed was:

  • Environment designator – which is not held in the file, but was hard coded as a single character
  • From date – packed 6,0 field
  • To date – packed 6,0
  • Submitted date – date
  • Submitted time – time

In these examples I am not going to show how to combine ten environments of data. What works for three can be easily modified to work for ten.

As the layout of the file and data is all the same I can combine the data into SQL View using the UNION clause:

01  SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
02  UNION
03  SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
04  UNION
05  SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
06  ORDER BY SBMDATE DESC,SBMTIME DESC ;

By all means the first rows shown are the most recent, but the results include every row from the three files.

Top 5 results only

00001 FRMDTE  THRDTE SBMDATE    SBMTIME
----- ------  ------ ---------- --------
C      42921   43021 2021-05-01 07.55.15
A      42421   42521 2021-04-26 09.48.45
B      42121   42221 2021-04-21 13.48.34
B      41021   42021 2021-04-21 12.56.27
C      40121   41521 2021-04-16 17.39.58

My first attempt of only returning the first row from each environment was:

01  SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
02  LIMIT 1
03  UNION
04  SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
05  LIMIT 1
06  UNION
07  SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
08  LIMIT 1
09  ORDER BY SBMDATE DESC,SBMTIME DESC ;

Lines 2, 5, and 8: The LIMIT instructs the statement to only return the first record from each SQL statement.

I am performing all of these test in ACS's "Run SQL scripts" and the error returned looks like:

SQL State: 42601 
Vendor Code: -199 
Message: [SQL0199] Keyword UNION not expected.

This was easily overcome by making each part of the statement a sub-select:

01  (SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
02  LIMIT 1)
03  UNION
04  (SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
05  LIMIT 1)
06  UNION
07  (SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
08  LIMIT 1)
09  ORDER BY SBMDATE DESC,SBMTIME DESC ;

The results were exactly what was desired:

00001 FRMDTE  THRDTE SBMDATE    SBMTIME
----- ------  ------ ---------- --------
C      42921   43021 2021-05-01 07.55.15
A      42421   42521 2021-04-26 09.48.45
B      42121   42221 2021-04-21 13.48.34

Now there is one of those gotchas. The job can be run a maximum of twice per day, most of time it is only run once, but on occasion it can be run a second time. To be able to return the two most recent results from each environment only required me to change the number of records returned by the LIMIT.

01  (SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
02  LIMIT 2)
03  UNION
04  (SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
05  LIMIT 2)
06  UNION
07  (SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
08  LIMIT 2)
09  ORDER BY 1,4 DESC,5 DESC ;

Line 9: As I am just playing around the statement to get what I want rather than type in the column names in the ORDER BY I can just give the number to the place that column is in the results. In this example I am sorting by the first column, environment, fourth column, submit date, and fifth column, submit time.

The results now show two rows from each of the files:

00001 FRMDTE  THRDTE SBMDATE    SBMTIME
----- ------  ------ ---------- --------
A      42421   42521 2021-04-26 09.48.45
A      30121   33121 2021-04-02 16.10.45
B      42121   42221 2021-04-21 13.48.34
B      41021   42021 2021-04-21 12.56.27
C      42921   43021 2021-05-01 07.55.15
C      40121   41521 2021-04-16 17.39.58

Now to turn this statement into one to create a SQL View:

01  CREATE OR REPLACE VIEW MYLIB.UNION_VIEW_1
02    FOR SYSTEM NAME "UNIONVIEW1"
03  (ENVIRONMENT FOR COLUMN "ENVIR",
04   FROM_DATE FOR COLUMN "FRMDTE",
05   THRU_DATE FOR COLUMN "THRDATE",
06   SUBMIT_DATE FOR COLUMN "SBMDATE",
07   SUBMIT_TIME FOR COLUMN "SBMTIME")
08  AS
09  (
10  (SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
11   LIMIT 2)
12  UNION
13  (SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
14   LIMIT 2)
15  UNION
16  (SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
17   LIMIT 2)
18  ORDER BY 1,4 DESC,5 DESC
19  ) ;

Line 2: I always like to give my SQL Views both a long name, UNION_VIEW_1, and a system short name, UNIONVIEW1.

Lines 3 – 7: As I have created a column, Environment, in the results that is not in the definition for the file(s) used I need to define the columns for the View. I do this many times even if there are no additional columns to give View's columns relevant long and system short names.

Lines 10 – 18: This is the SQL statement that I used in "Run SQL scripts" to generate the results I want.

Alas, when I tried to create this View I received the following message:

SQL State: 428FJ
Vendor Code: -20211
Message: [SQ20211] Clause not allowed for CREATE VIEW. Cause . . . . . :
ORDER BY, FETCH FIRST n ROWS, and  OFFSET are not allowed in the
outer full select of a view.

Here it is the ORDER BY that is causing the problem. What I need to do is define it in each of the sub-selects, lines 11, 15, and 19.

01  CREATE OR REPLACE VIEW MYLIB.UNION_VIEW_1
02    FOR SYSTEM NAME "UNIONVIEW1"
03  (ENVIRONMENT FOR COLUMN "ENVIR",
04   FROM_DATE FOR COLUMN "FRMDTE",
05   THRU_DATE FOR COLUMN "THRDATE",
06   SUBMIT_DATE FOR COLUMN "SBMDATE",
07   SUBMIT_TIME FOR COLUMN "SBMTIME")
08  AS
09  (
10  (SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
11    ORDER BY SBMDATE DESC,SBMTIME DESC
12   LIMIT 2)
13  UNION
14  (SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
15    ORDER BY SBMDATE DESC,SBMTIME DESC
16   LIMIT 2)
17  UNION
18  (SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
19    ORDER BY SBMDATE DESC,SBMTIME DESC
20   LIMIT 2)
21  ) ;

Now the View is created and I can query its data with the following SQL statement:

SELECT * FROM UNION_VIEW_1 ORDER BY 1,4 DESC,5 DESC

The results are:

ENVIRONMENT FROM_DATE THRU_DATE SUBMIT_DATE SUBMIT_TIME
----------- --------- --------- ----------- -----------
A               42421     42521 2021-04-26  09.48.45
A               30121     33121 2021-04-02  16.10.45
B               42121     42221 2021-04-21  13.48.34
B               41021     42021 2021-04-21  12.56.27
C               42921     43021 2021-05-01  07.55.15
C               40121     41521 2021-04-16  17.39.58

I need to provide the View's column definitions. Fortunately I can get that from the SYSCOLUMNS View:

01  SELECT TABLE_NAME,COLUMN_NAME,SYSTEM_COLUMN_NAME,
02         DATA_TYPE,LENGTH,NUMERIC_SCALE
03    FROM QSYS2.SYSCOLUMNS
04   WHERE TABLE_SCHEMA = 'MYLIB'
05     AND TABLE_NAME = 'UNION_VIEW_1'

Which returns to me:

TABLE                    SYSTEM_     DATE_          NUMERIC
_NAME        COLUMN_NAME COLUMN_NAME TYPE    LENGTH _SCALE
------------ ----------- ----------- ------- ------ -------
UNION_VIEW_1 ENVIRONMENT ENVIR       VARCHAR      1 
UNION_VIEW_1 FROM_DATE   FRMDTE      DECIMAL      6       0
UNION_VIEW_1 THRU_DATE   THRDATE     DECIMAL      6       0
UNION_VIEW_1 SUBMIT_DATE SBMDATE     DATE         4 
UNION_VIEW_1 SUBMIT_TIME SBMTIME     TIME         3 

Personally I dislike using decimal values masquerading as dates in a View like this. I am going to redefine them as true dates. And just because I can I am going to redefine the Environment from VARCHAR to CHAR. This is going to be a long statement, but you will recognize a most of it from the other examples I have shown above.

01  CREATE OR REPLACE VIEW MYLIB.UNION_VIEW_2
02    FOR SYSTEM NAME "UNIONVIEW2"
03  (ENVIRONMENT FOR COLUMN "ENVIR",
04   FROM_DATE FOR COLUMN "FRMDTE",
05   THRU_DATE FOR COLUMN "THRDATE",
06   SUBMIT_DATE FOR COLUMN "SBMDATE",
07   SUBMIT_TIME FOR COLUMN "SBMTIME")
08  AS
09  (
10  WITH T1 (C1,C2,C3,C4,C5) AS
11  ((SELECT 'A',A.* FROM LIBRARY1.TESTFILE A
12     ORDER BY SBMDATE DESC,SBMTIME DESC
13    LIMIT 2)
14  UNION
15  (SELECT 'B',B.* FROM LIBRARY2.TESTFILE B
16    ORDER BY SBMDATE DESC,SBMTIME DESC
17   LIMIT 2)
18  UNION
19  (SELECT 'C',C.* FROM LIBRARY3.TESTFILE C
20    ORDER BY SBMDATE DESC,SBMTIME DESC
21   LIMIT 2)),

22  T2 (C1,C2,C3,C4,C5) AS
23  (SELECT CAST(C1 AS CHAR(1)),
24          DATE(TIMESTAMP_FORMAT(DIGITS(C2),'MMDDYY')),
25          DATE(TIMESTAMP_FORMAT(DIGITS(C3),'MMDDYY')),
26          C4,C5
27     FROM T1
28    ORDER BY C1,C4 DESC,C5 DESC)
   
29  SELECT * FROM T2
30 ) ;

Lines 1 – 9: The same as before, but I am calling this View a different name UNION_VIEW_2.

Lines 10 – 29: I have decided to do the part that gets the rows and format them in a Common Table Expression, CTE.

Part 1: Get the data

Line 10: All CTE start with WITH followed by a table name. As I like to keep things simple, K.I.S.S., I always call the first one T1. What follows the table name is the list of the columns within it. It does not matter what I call them, so to illustrate I am just calling the resulting columns C1 - C5.

Lines 11 – 21: This is the same SQL statement as before, but as it is being used in a temporary table, T1, it is enclosed with parenthesis ( ( ) ).

Line 21: Notice the comma at the end of this line.

Part 2: Format the data

Line 22: The second table does not need the WITH. The name of the table T2 is followed by the list of the columns in will contain.

Line 23: I am using the CAST to redefine the Environment column from VARCHAR(1) to CHAR(1).

Lines 24 and 25: Here I am converting those numbers masquerading as dates to real dates.

Lines 26: Submit date and time columns.

Line 27: The data comes from the temporary table I created earlier, T1.

Line 28: As I am with the temporary table definition I can sort the results by environment, submit date, and submit time. As this is the last temporary table definition there is no comma at the end of it.

Line 29: This Select statement "displays" the rows for the View, which will use the column names defined at the top of the View.

Why did I do the conversions in T2 rather than T1? If I had performed the conversions in T1 then all of the columns in the three TESTFILE would have been converted before the first two results would have been returned. As these files can contain large amounts of data that is a lot of processing just to return two rows. By performing the conversion in T2 I only convert the rows in just six results.

If I query the new View using:

SELECT * FROM UNION_VIEW_2

I can see that the from and through dates are now true dates:

ENVIRONMENT FROM_DATE  THRU_DATE  SUBMIT_DATE SUBMIT_TIME
----------- ---------- ---------- ----------- -----------
A           2021-04-24 2021-04-25 2021-04-26  09.48.45
A           2021-03-01 2021-03-31 2021-04-02  16.10.45
B           2021-04-21 2021-04-22 2021-04-21  13.48.34
B           2021-04-10 2021-04-20 2021-04-21  12.56.27
C           2021-04-29 2021-04-30 2021-05-01  07.55.15
C           2021-04-01 2021-04-15 2021-04-16  17.39.58

I can use the SYSCOLUMNS View to confirm that my conversion of the environment column was successful:

SELECT TABLE_NAME,COLUMN_NAME,SYSTEM_COLUMN_NAME,
       DATA_TYPE,LENGTH,NUMERIC_SCALE
  FROM QSYS2.SYSCOLUMNS
 WHERE TABLE_SCHEMA = 'MYLIB'
   AND TABLE_NAME = 'UNION_VIEW_2'

The results show that environment is now CHAR(1), and confirms that all the dates are dates.

TABLE                    SYSTEM_     DATE_        NUMERIC
_NAME        COLUMN_NAME COLUMN_NAME TYPE  LENGTH _SCALE
------------ ----------- ----------- ----- ------ -------
UNION_VIEW_2 ENVIRONMENT ENVIR       CHAR       1 
UNION_VIEW_2 FROM_DATE   FRMDTE      DATE       4 
UNION_VIEW_2 THRU_DATE   THRDATE     DATE       4 
UNION_VIEW_2 SUBMIT_DATE SBMDATE     DATE       4 
UNION_VIEW_2 SUBMIT_TIME SBMTIME     TIME       3 

I can now tell the person I am working with to use the second View as there is only the data I want to share, and is formatted in the manner we both want.

 

This article was written for IBM i 7.4, and should work for some earlier releases too.

2 comments:

  1. You might consider using UNION ALL for performance when you know you won't be unioning any duplicate rows, which is guaranteed by the environment variable in your query.

    ReplyDelete
  2. What we do is create ALIAS's over the various tables and UNION them...

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.