The question was: How can I break apart a hundred long field into five twenty long fields and then have them returned, one after another, as separate result rows using SQL?
Obviously substring is going to be used, but how to return the resulting rows one after another?
Fortunately the easiest solution for this scenario came in earlier Technology Refreshes for the currently supported releases of IBM i, 7.4 TR4 and 7.3 TR10.
Before I show that I need to have a file with a field that is a hundred characters long:
01 A R TESTFILER 02 A FLD001 100A |
I chose to use a file as the question asked for a file. But this will work just as well for a DDL Table:
01 CREATE TABLE MYLIB.TESTFILE 02 (FLD001 CHAR(100)) ; |
I decided to write a "quick and dirty" RPG program to fill the file.
01 **free 02 dcl-f TESTFILE usage(*output) ; 03 dcl-ds FLD001 ; 04 SF1 char(20) ; 05 SF2 char(20) ; 06 SF3 char(20) ; 07 SF4 char(20) ; 08 SF5 char(20) ; 09 end-ds ; 10 SF1 = 'Light' ; 11 SF2 = 'Pencil' ; 12 SF3 = 'Headphones' ; 13 SF4 = 'Notepad' ; 14 SF5 = 'Phone ' ; 15 write TESTFILER ; 16 SF1 = 'Push pin' ; 17 SF2 = 'Scissors' ; 18 SF3 = 'Ceramic cat' ; 19 SF4 = 'Keyboard' ; 20 SF5 = 'Mouse' ; 21 write TESTFILER ; 22 *inlr = *on ; |
Line 2: This is the file definition for the output file.
Lines 3 – 9: I have named this data structure to have the same name as the field in the output file. The five subfields make it easy for me to position data in the places I need.
Lines 10 – 21: I have created two records of things that I have on my desk.
I used SQL to check the data in the output file:
01 SELECT * FROM TESTFILE |
The results are as I expected:
FLD001 ------------------------------------------------------------------------------------- Light Pencil Headphones Notepad Phone Push pin Scissors Ceramic cat Keyboard Mouse |
I am going to use the SQL SPLIT table function to break these records/rows into five parts, and return these parts one after another.
In my experience SPLIT works best if I use a "separator character" that delimits one sub field from another. I chose to use a comma ( , ), but I could have used any other character if commas are present in the string.
I can insert a comma at the end of each subfield using the SUBSTRING, or SUBSTR, statement:
01 SELECT SUBSTR(FLD001,1,20) || ',' || 02 SUBSTR(FLD001,21,20) || ',' || 03 SUBSTR(FLD001,41,20) || ',' || 04 SUBSTR(FLD001,61,20) || ',' || 05 SUBSTR(FLD001,81,20) 06 FROM TESTFILE |
By all means the result is more than a hundred character, but that is unimportant as this is a temporary string.
I use the double pipes ( || ) to concatenate these parts into one string.
FLD001 ----------------------------------------------------------------------------------------- Light ,Pencil ,Headphones ,Notepad ,Phone Push pin ,Scissors ,Ceramic cat ,Keyboard ,Mouse |
If I wanted to I could remove all the trailing spaces when substring:
01 SELECT RTRIM(SUBSTR(FLD001,1,20)) || ',' || 02 RTRIM(SUBSTR(FLD001,21,20)) || ',' || 03 RTRIM(SUBSTR(FLD001,41,20)) || ',' || 04 RTRIM(SUBSTR(FLD001,61,20)) || ',' || 05 RTRIM(SUBSTR(FLD001,81,20)) 06 FROM TESTFILE |
Here I have used the RTRIM to perform a right sided trim. The results are more compacted than what I showed before:
00001 --------------------------------------------- Light,Pencil,Headphones,Notepad,Phone Push pin,Scissors,Ceramic cat,Keyboard,Mouse |
Now I can add the SPLIT in my statement. SPLIT table function requires two parameters:
- String to split
- Separator character
It returns two columns:
- Ordinal position – which element in the string this is
- Element – the extracted part of the string
My statement is:
01 SELECT ELEMENT,ORDINAL_POSITION 02 FROM TESTFILE, 03 TABLE(SYSTOOLS.SPLIT(SUBSTR(FLD001,1,20) || ',' || 04 SUBSTR(FLD001,21,20) || ',' || 05 SUBSTR(FLD001,41,20) || ',' || 06 SUBSTR(FLD001,61,20) || ',' || 07 SUBSTR(FLD001,81,20), 08 ',')) |
Line 1: I am really only interested in the ELEMENT column, but I have included the ORDINAL_POSITION so you can see that I have ten results, five from each record.
Lines 3 – 7: These substrings make my input string, as I have shown above. Notice that line 7 ends with a comma as that is the end of the first input parameter.
Line 8: This is the separator character, the comma.
The results are exactly what I wanted. The first five rows are from the first record, and the next five from the second.
ELEMENT ORDINAL_POSITION -------------- ---------------- Light 1 Pencil 2 Headphones 3 Notepad 4 Phone 5 Push pin 1 Scissors 2 Ceramic cat 3 Keyboard 4 Mouse 5 |
This article was written for IBM i 7.4 TR4 and 7.3 TR10.
How about a lateral cross join with a values clause instead?
ReplyDeleteIt's faster and less error prone.
SELECT
TESTFILE_SPLIT.*
FROM
(VALUES('A B C D E ')) AS TESTFILE(FLD)
CROSS JOIN LATERAL(
VALUES
(SUBSTR(TESTFILE.FLD, 1, 3)),
(SUBSTR(TESTFILE.FLD, 4, 3)),
(SUBSTR(TESTFILE.FLD, 7, 3)),
(SUBSTR(TESTFILE.FLD, 10, 3)),
(SUBSTR(TESTFILE.FLD, 13, 3))
) AS TESTFILE_SPLIT
Wow, didn't know about SYSTOOLS.SPLIT(). Very useful!
ReplyDeleteYour above approach is probably the best, unless you don't know exactly what kind of characters are in your original string. If the data has commas anywhere, for example, then our SPLIT won't work as expected. We need to know 100% a delimiter that is safe to use, but might not always have that information.
If you know that your data is always 100 long and want to split at 20, you could also do it as follows with UNION:
WITH segment AS (
SELECT ROW_NUMBER() OVER() AS rn#,
SUBSTR(fld001, 1,20) fld1,
SUBSTR(fld001,21,20) fld2,
SUBSTR(fld001,41,20) fld3,
SUBSTR(fld001,61,20) fld4,
SUBSTR(fld001,81,20) fld5
FROM TESTFILE
)
SELECT fld
FROM (
SELECT rn#, 1 piece, fld1 fld FROM segment UNION ALL
SELECT rn#, 2 piece, fld2 fld FROM segment UNION ALL
SELECT rn#, 3 piece, fld3 fld FROM segment UNION ALL
SELECT rn#, 4 piece, fld4 fld FROM segment UNION ALL
SELECT rn#, 5 piece, fld5 fld FROM segment
)
ORDER BY rn#, piece
--
The weakness of both the original SPLIT approach, and the above UNION approach is that they will both only work when you know the exact length of the original string and how many pieces you want to break it into. Because we know we will always have exactly 5 pieces, we know how many SUBSTRs to do, and how many UNIONs.
In a more general scenario, if you wanted to break out a string of variable length into equal sized pieces, you could use recursive SQL. The below example is overbuilt for the specific question where we do know it's always 100 long and we want 20 length pieces, but can definitely come in useful for other similar problems. The below SQL lets you put in the size of the piece you want on the first line (current set to 20).
WITH piecelngth(lngth) AS (VALUES (20)
), pieces(rn#, segment, fld, str) AS (
SELECT rn#, 0 segment, '' fld, str
FROM (SELECT ROW_NUMBER() OVER() rn#, fld001 AS str
FROM TESTFILE
)
UNION ALL
SELECT rn#,segment + 1 AS segment,
CASE WHEN LENGTH(TRIM(str))0
)
SELECT fld
FROM pieces
WHERE segment>0
ORDER BY rn#, segment
You can just do it in a less convoluted way; just think in a relational/functional way.
ReplyDeleteJust do:
SELECT SUBSTRING(FLD001, tmp.index, 20)
FROM MYLIB.TESTFILE
CROSS JOIN (VALUES 1,21,41,61,81 ) as tmp (index)
This correct the previous recursive definition because ordering preservation is a requirement.
ReplyDeleteWITH BASE(field, index, len, part) AS (
SELECT FLD001, 21, 20, substring(FLD001, 1, 20)
FROM TRENTINIF.TESTFILE
UNION ALL
SELECT field, index+len, len, substring(field, index, len)
FROM BASE WHERE substring(field, index, len) <> ''
)
SEARCH DEPTH FIRST BY field SET ordcol
select part from base order by ordcol