Wednesday, October 6, 2021

Using Lateral in SQL

sql lateral joins

In the past I have only given examples of using LATERAL in SQL Select statements to get data from Table function. As such I have only mentioned it in passing. In this post I wanted to go into more detail so you can see that it can be used for more than just getting data from a Table function.

Having searched for the best description I could find to explain what a LATERAL is, I found that IBM does best with the following:

A lateral join is essentially a foreach loop in SQL. A lateral join is represented by the keyword LATERAL with an inner subquery in the FROM clause, as shown in the following simple representation:

SELECT <columns>
  FROM <tableReference>
  LATERAL <innerSubquery>

An example of the way I have shown using a LATERAL would be to list when a SQL Index was last used. That would need to join the SYSINDEXES View with the OBJECT_STATISTICS Table function.

01  SELECT SYSTEM_INDEX_SCHEMA AS "Lib",
02         INDEX_NAME AS "Index name",
03         SYSTEM_INDEX_NAME AS "Sys name",
04         SYSTEM_TABLE_SCHEMA AS "Table lib",
05         SYSTEM_TABLE_NAME AS "Table name",
06         DATE(B.LAST_USED_TIMESTAMP) AS "Last used",
07         B.DAYS_USED_COUNT AS "Days used"
08   FROM QSYS2.SYSINDEXES,
09  LATERAL 
10  (SELECT LAST_USED_TIMESTAMP,DAYS_USED_COUNT
11     FROM TABLE(QSYS2.OBJECT_STATISTICS(
12        OBJECT_SCHEMA => INDEX_SCHEMA,
13        OBJTYPELIST => 'FILE',
14        OBJECT_NAME => SYSTEM_INDEX_NAME))) B
15  LIMIT 5 ;

Lines 1 – 5: These are the columns from the SYSINDEXES View.

Lines 6 and 7: Columns from the OBJECT_STATISTICS Table function.

Line 9: The LATERAL that allows me to include the subquery to get the data from the Table function.

Line 10: The columns I want returned from the Table function.

Line 11: Definition of the Table function.

Lines 12 – 14: The parameters used by the OBJECT_STATISTICS Table function to get the information for the object I want. The columns used are taken from the SYSINDEXES View, which ensures that only one row of results are returned from this subquery.

Line 15: I only want five results returned.

What do those results look like:

Lib  Index name       Sys name   Table lib Table name Last used  Day used
---- ---------------- ---------- --------- ---------- ---------- --------
QGPL QZG0000040_QINX1 QZG0000001 QGPL      QZG0000040 2019-10-31        1
QGPL QZG0000040_QINX2 QZG0000002 QGPL      QZG0000040             0
QGPL QZG0000040_QINX3 QZG0000003 QGPL      QZG0000040 2019-10-31        1
QGPL QZG0000040_QINX4 QZG0000004 QGPL      QZG0000040             0
QGPL QZG0000040_QINX5 QZG0000005 QGPL      QZG0000040             0

Another way a LATERAL can be used is to define a new column. For example, I have a file, no surprises for guessing it is called TESTFILE, and I want to create a new column that contains the sum of two of the other columns. I could easily do this with the following statement that includes the LATERAL.

01  SELECT A.*,B.VAL1
02    FROM TESTFILE A
03   CROSS JOIN LATERAL(
04  VALUES (FLD001 + FLD002)) B(VAL1) ;

Line 1: The new column is B.VAL1.

Line 3: To join the LATERAL to the rest of the statement I need to use a JOIN. In this case as the relationship is a one-to-one, I am using a CROSS JOIN.

Line 4: I am using the VALUES that allows me to sum the values from the columns FLD001 and FLD002. The "B" is the prefix I used to be used for the column. This could easily be replaced by another value.

The results are:

FLD001  FLD002  VAL1
------  ------  ----
     1       1     2
     2       2     4
     3       3     6

I know I could have done the same with the following statement:

SELECT A.*,(FLD001 + FLD002) AS "VAL1"
  FROM TESTFILE A ;

Imagine that my calculation is more complex. Not only do I want to show the column in my results, I also want to sort my results by the calculated results, and only include those results where the value is greater than 3. In my opinion it would be easier to create a column that contains the result and then use it in multiple places in the statement. For example:

01  SELECT A.*,B.VAL1
02    FROM RPGPGM1.TESTFILE A 
03   CROSS JOIN LATERAL(
04  VALUES (FLD001 + FLD002)) B(VAL1)
05  WHERE B.VAL1 > 3
06  ORDER BY B.VAL1 DESC  ;

Line 4: I have defined the column VAL1 to contain the result of my calculation.

Line 1: The new column is in the results.

Line 5: It is also used in the WHERE clause.

Line 6: And also in the ORDER BY clause too.

I think you will agree that this is easier to enter, and for someone else to understand rather than having the same calculation in three places. It also prevents maintenance issues, as someone may have to change the calculation and not have to make the same change in all three places.

Let me add a new file into the mix for the next few examples, NAMEFILE. I can view its contents with the following statement:

SELECT * FROM NAMEFILE ;

That returns to me:

FLD001  FLD003
------  --------
     1  ANDREW
     2  BRYAN
     3  CHARLES
     4  DAVID

I can use a lateral to join the two files together, and return the calculation's result:

01  SELECT A.*,B.VAL1
02    FROM NAMEFILE A 
03   CROSS JOIN LATERAL(
04     SELECT (FLD001 + FLD002)
05       FROM TESTFILE
06      WHERE FLD001 = A.FLD001) B(VAL1) ;

Line 1: All the columns from NAMEFILE and the calculated column will be in the results.

Line 3: The CROSS JOIN lateral makes a one-to-one join between the main query and the subquery.

Lines 4 – 6: The result from the subquery's Select is the sum of FLD001 and FLD002, line 4. Those columns come from TESTFILE, line 5, when the value of FLD001 on both files match, line 6. The result of the subquery Select is placed in the variable VAL1.

The results are:

FLD001  FLD003    VAL1
------  --------  ------
     1  ANDREW         2
     2  BRYAN          4
     3  CHARLES        6

But there is no result for David. That is because I have used a cross join, which requires for there to be matching rows in both files. Any unmatched rows are ignored.

To include David I use the LEFT JOIN with the lateral.

01  SELECT A.*,B.VAL1
02    FROM NAMEFILE A 
03    LEFT JOIN LATERAL(
04      SELECT FLD001,(FLD001 + FLD002)
05        FROM TESTFILE) B(FLD001,VAL1)
06    ON A.FLD001 = B.FLD001 ;

Line 3: Here is the LEFT JOIN lateral which will include all the results from NAMEFILE even though there may not be a matching row in TESTFILE.

Line 4: Two results need to be returned. The first is the "key", FLD001, so I can join the two files together. The second is the result of the calculation.

Line 5: The returned columns are separated by a comma ( , ).

Line 6: The ON clause is what is used to join the two files, which is why the subquery had to return FLD001 so it could be match to the field of the same name in NAMEFILE.

The results are:

FLD001  FLD003    VAL1
------  --------  ------
     1  ANDREW         2
     2  BRYAN          4
     3  CHARLES        6
     4  DAVID     <NULL>

The result of the calculation for David is null as there is no data for David in TESTFILE.

 

You can learn more about LATERAL IBM website here.

 

This article was written for IBM i 7.4 TR4 and 7.3 TR10.

6 comments:

  1. More good SQL.

    ReplyDelete
  2. Nice one Simon. I've been perplexed by derived columns (expression) and then replicating that on the WHERE clause. Never thought of that!
    Thanks!
    -Bob

    SELECT A.*,B.VAL1
    02 FROM RPGPGM1.TESTFILE A
    03 CROSS JOIN LATERAL(
    04 VALUES (FLD001 + FLD002)) B(VAL1)
    05 WHERE B.VAL1 > 3
    06 ORDER BY B.VAL1 DESC

    ReplyDelete
  3. Thank you for sharing.

    ReplyDelete
  4. Simon, this is totally new to me. Using the join functions has always been very useful to me now. we have another function to join with inner and outer.. welcome to the family lateral join.. Simon thanks for the great examples and sharing this new sql function.. Again thanks for sharing..

    ReplyDelete
  5. Nice post.

    Another common use of Lateral (at least, in my case), is to convert columns to records.

    Let's say we have a table with 3 columns, phone1, phone2, phone3.
    We can "pivot" these columns into "rows" using LATERAL.

    Example:
    create table testlateral (
    id int,
    phone1 int,
    phone2 int,
    phone3 int)
    ;

    INSERT INTO testlateral
    VALUES (1, 1111111, 1222222, 1333333),
    (3, 3111111, 3222222, 3333333),
    (2, 2111111, 2222222, 2333333);

    So, the content of the table is:
    ID PHONE1 PHONE2 PHONE3
    -- ------- ------- -------
    1 1111111 1222222 1333333
    3 3111111 3222222 3333333
    2 2111111 2222222 2333333


    Finally:
    select id, id_phone, phone_number from testlateral as s,
    lateral(values (1, S.phone1),
    (2, S.phone2),
    (3, S.phone3)) as T(id_phone, phone_number)
    ;


    And we get:
    ID ID_PHONE PHONE_NUMBER
    -- -------- ------------
    1 1 1111111
    1 2 1222222
    1 3 1333333
    3 1 3111111
    3 2 3222222
    3 3 3333333
    2 1 2111111
    2 2 2222222
    2 3 2333333

    And now, as we already have independent "rows", we can run any sql sentence over them.

    Very practical when you have to access columns with array content (extracting previously the elemnts into single columns).

    ReplyDelete
  6. I know I'm late - never too late to say thanks :-) Ran into LATERAL looking at QSYS2.SYSPINDEX. Great job, Simon!

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.