I had a project where the first step was to merge data from three files into one. I have used the word "merge" deliberately to differentiate it from "copy". With a "merge" I needed to update specific fields if a record is present in the output file with a matching key. If there is no matching key record I add a new record. I have done this many times in RPG, this time I wanted to try something different, like SQL.
After searching IBM's KnowledgeCenter I discovered the MERGE SQL statement. I do not know when it was released, but searching in the different versions of the KnowledgeCenter I can find it in the versions for IBM i 7.1, 7.2, and 7.3, but I cannot find it in the earlier releases. The MERGE does exactly what I want it to, it will update on a match and insert when there is no match.
Before I explain the MERGE I need to show the three files I will be using in this example. The files are named:
- FILE1
- FILE2
- FILE3
They all contain one common field, that I will be using to match with, and two unique fields.
Not all of the files have the records with the same matching key, see below.
Key | Present in | ||
FILE1 | FILE2 | FILE3 | |
1 | Y | Y | Y |
2 | Y | Y | |
3 | Y | Y | |
4 | Y | ||
5 | Y | ||
6 | Y |
The DDS and the content of for FILE1 looks like this:
01 A R FILE1R 02 A F1KEY1 3A 03 A F1F1 5A 04 A F1F2 10A F1KEY1 F1F1 F1F2 1 AAAAA BBBBBBBBBB 2 CCCCC DDDDDDDDDD 6 EEEEE FFFFFFFFFF |
FILE2 looks like:
01 A R FILE2R 02 A F2KEY1 R REFFLD(F1KEY1 FILE1) 03 A F2F1 10A 04 A F2F2 5P 0 F1KEY1 F2F1 F2F2 1 GGGGGGGGGG 1 2 HHHHHHHHHH 2 3 IIIIIIIIII 3 4 JJJJJJJJJJ 4 |
The reason the field F2KEY1 shows F1KEY1 when looking at the data is because the REFFLD copies the attributes of F1KEY1 from FILE1. As F1KEY1 in FILE1 does not have a column heading, COLHDG, the field name is used.
FILE3:
01 A R FILE3R 02 A F3KEY1 R REFFLD(F1KEY1 FILE1) 03 A F3F1 7P 2 04 A F3F2 5A F1KEY1 F3F1 F3F2 1 5.00 KKKKK 3 6.00 LLLLL 5 7.00 MMMMM |
I decided to put my MERGE into a RPG program, which starts like this:
01 exec sql SET OPTION COMMIT = *NONE ; 02 exec sql DROP TABLE QTEMP.TABLE1 ; 03 exec sql CREATE TABLE QTEMP.TABLE1 04 (KEY_COLUMN FOR "KEYFLD", 05 FIRST,SECOND,THIRD,FOURTH,FIFTH,SIXTH) 06 AS (SELECT A.F1KEY1,A.F1F1,A.F1F2, 07 B.F2F1,B.F2F2, 08 C.F3F1,C.F3F2 09 FROM FILE1 A CROSS JOIN 10 FILE2 B CROSS JOIN 11 FILE3 C 12 WHERE A.F1KEY1 = B.F2KEY1 13 AND A.F1KEY1 = C.F3KEY1) 14 DEFINITION ONLY 15 INCLUDING COLUMN DEFAULTS 16 RCDFMT TABLE1R ; |
Line 1: I unless I need commitment control I always add this option into my SQLRPGLE programs.
Line 2: It may be unnecessary to delete a file in QTEMP, especially when the program is running in batch. This program was not, and I ran it repeatedly changing my program. I wanted to delete the table to ensure that the one that was being created would be the way I wanted it and it would be empty. If I use a CREATE OR REPLACE on line 3 the table would not be deleted and the data remain within it.
Lines 3 – 16: I am creating my output table in a similar way to using REFFLD in DDS, defining the attributes of the columns to be the same as the fields in the files used.
Line 3: When creating a table you use the CREATE TABLE statement followed by which library you want it in and its name. I have decided to use "SQL naming convention", therefore the library and file names are separated by a period ( . ).
Line 4: I have decided to give the first column a long name, KEY_COLUMN, as well as a short name, KEYFLD. I can use either name for the column in this program.
Line 5: These are the names of the other columns.
Lines 6 – 8: I am using a SELECT statement to give the "referenced" fields that will be used to define the columns on lines 4 and 5.
Lines 9 – 11: These are the files those fields are contained within. I have used a CROSS JOIN here I am just creating an empty table, therefore, I don't have to bother with trying to match records by using some other kind of join.
Lines 12 and 13: This defines how the three files will be joined.
Line 14: I only want to define the table, in other words create an empty table, so I use the DEFINTION ONLY.
Line 15: I have found that by not including the from field defaults did cause me errors when inserting into the table. By including INCLUDING COLUMN DEFAULTS means that if a column is not defined in an INSERT the default value, taken from the DDS field, is used.
Line 16: I have given the table a record format name. I may, at some time in the future, decide to read this file in a RPG program, and by having a record format name removes the need for me to use a RENAME in the file definition.
At the end of this code I have an empty table ready to be filled. Let me start by inserting data from FILE1.
17 exec sql INSERT INTO QTEMP.TABLE1 (KEY_COLUMN,FIRST,SECOND) SELECT * FROM FILE1 ; |
When that has finished the output table contains:
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB 0 .00 2 CCCCC DDDDDDDDDD 0 .00 6 EEEEE FFFFFFFFFF 0 .00 |
Let me now merge the data from FILE2 into the output table:
18 exec sql MERGE INTO QTEMP.TABLE1 A USING FILE2 B 19 ON A.KEY_COLUMN = B.F2KEY1 20 WHEN MATCHED THEN 21 UPDATE SET A.THIRD = B.F2F1, 22 A.FOURTH = B.F2F2 23 WHEN NOT MATCHED THEN 24 INSERT (KEY_COLUMN,THIRD,FOURTH) 25 VALUES (B.F2KEY1,B.F2F1,B.F2F2) ; |
Line 18: I want to merge data into QTEMP.TABLE1 from FILE2…
Line 19: Using these "key" fields for the match.
Line 20 – 22: If the "key" fields match update the existing columns in TABLE1.
Line 23 – 25: If the "key" fields do not match insert a new row into TABLE1 using the values from the fields in FILE2 record.
The MERGE updated the rows for "key" fields 1 and 2, and inserted new rows for the "key" values of 3 and 4.
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB GGGGGGGGGG 1 .00 2 CCCCC DDDDDDDDDD HHHHHHHHHH 2 .00 3 IIIIIIIIII 3 .00 4 JJJJJJJJJJ 4 .00 6 EEEEE FFFFFFFFFF 0 .00 |
If there is a problem with the matching, for example a duplicate row in either file caused by the given "key" matching, the MERGE quits. The error is diagnostic, therefore, it does not interrupt the user, and the merge is unfinished. The only way to find if this type of error has happened is to look at the joblog when the job has completed. To stop errors like this I can add an extra line to the MERGE statement. The merge of the data from FILE3 contains this extra line.
26 exec sql MERGE INTO QTEMP.TABLE1 A USING FILE3 B 27 ON A.KEY_COLUMN = B.F3KEY1 28 WHEN MATCHED THEN 29 UPDATE SET A.FIFTH = B.F3F1, 30 A.SIXTH = B.F3F2 31 WHEN NOT MATCHED THEN 32 INSERT (KEY_COLUMN,FIFTH,SIXTH) 33 VALUES (B.F3KEY1,B.F3F1,B.F3F2) 34 NOT ATOMIC CONTINUE ON SQLEXCEPTION ; |
That "extra" line is line 34. This wonderfully named line means that if there is an error the merge must continue.
With all three files' data merged my output file looks like:
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB GGGGGGGGGG 1 5.00 KKKKK 2 CCCCC DDDDDDDDDD HHHHHHHHHH 2 .00 3 IIIIIIIIII 3 6.00 LLLLL 4 JJJJJJJJJJ 4 .00 5 0 7.00 MMMMM 6 EEEEE FFFFFFFFFF 0 .00 |
Having used the MERGE in a more complicated scenario I have to say I am impressed with its functionality and speed. No more merging data using RPG for me, I'll use SQL's MERGE instead.
You can learn more about SQL's MERGE from the IBM website here.
This article was written for IBM i 7.3, and should work for 7.2 and 7.1 too.
Good article Simon, but it would be more complete with the mention of UNION. UNION allows the merging of data from different tables into a single result-set which presents the merged rows across all tables coded in the UNION. The various tables that are UNION'd do not necessarily have to have the same column spellings. A set process (INSERT or whatever) could access a UNION TABLE and write to another file, there by doing a true merge. Keep up the good work!
ReplyDeleteHi Dan
ReplyDeleteCould you please post an example?
SELECT COLa, COLb FROM TABLEa UNION ALL SELECT COLx, COLy FROM TABLEb ORDER BY 1, 2;
ReplyDeleteAs long as the data types of COLa same as COLx and COLb same as COLy this will work.
UNION ALL means all rows from both tables are merged. UNION without ALL means only distinct rows merged.
Instead of UNION one could use INTERSECT or SUBTRACT to do all kinds ways to create sets of data.
Where would RPG be without DB2?!? LOL
I started using the SQL Merge statements in 2013. Unfortunately, my stone-age supervisor could not grasp it and asked others to investigate it as if I had somehow concocted it. He was against RPG-FREE too.
ReplyDeleteThere are a lot of managers and supervisors still stuck in that "stone age".
DeleteThat's just because they are not updated with the new techniques. Things that they don't understand are the things we should not use. That's how they think. In my county, there are too many such managers. SQL is like a taboo to them.
DeletePart of being a good manager is to learn and trust from your staff. Wanting everyone to do thing the same way you did 20 years ago hurts you, your employer, and destroys your staff's morale.
DeleteYes but they do not want to learn because they wrongly think that's going to be difficult and they do not want to risk their retirement.
DeleteHI Simon,
ReplyDeleteIs it possible to do a union when each has a host selection variable
ie Ive got something like this
mysql = 'Select feild1, field2, field3 from table
where field1 like ?
union
Select feild3, field4, field5 from table
where field3 like ? ';
// then some calcs to prepare the string to produce varout
exec sql
Prepare S1 from :mysql;
exec sql
Declare Input Cursor for S1;
exec sql
open Input using :varout;
dow '1';
exec sql
fetch Input into :Results;
if sqlstt >= SqlEof;
leave;
endif;
//do stuff with ooutput
enddo;
This doesn't work with their being 2 selection variables (does work with the one without the union though). Is there a way do this?
Regards
james
Have you tried building the SQL statement in a variable and then executing it using the EXECUTE IMMEDIATE?
DeleteSee here.
It's probably worth mentioning that one MERGE statement could potentially fire all the related insert, update and delete triggers.
ReplyDeleteRinger
Yes, Simon. It's a very powerful instruction.
ReplyDeleteIn any case merge function allows you to be much faster than using other methods. Especially if you have to built-up data export or similar.
I'm a "stone-age guy" that discovered the wheel not long time ago...
Cheers
Andrea DD
I love the merge stmt.
ReplyDeleteVery useful. Thanks Simon.
ReplyDelete