Today's post is written by David Taylor.
From time to time, we need to load data from a production system to a development system. To complicate matters, the data often resides in more than one file. Using SQL, we finally found a way to load multiple files from the same cursor. We include in the SELECT clause any fields, from any of the files in the data set, needed to refine the cursor. Once we have the right data set, we can use the final version for the multiple file loads. Keep in mind, we need the data from all three files for the process test to work correctly. We need not just any set of records; we need the data that matches the relationships and limits defined in the cursor. In the samples, each file name ends with a letter, and that same letter is the first character of the file's field names. For example, FILEA has fields like AKEYFLD1, AFIELD1, and so forth.
01 SELECT AKEYFLD1, BKEYFLD1, CKEYFLD1, AVALUE, CDATE
02 FROM PRODSERV.MYLIBRARY.FILEA
03 JOIN PRODSERV.MYLIBRARY.FILEB
04 ON AKEYFLD1 = BKEYFLD1
05 JOIN PRODSERV.MYLIBRARY.FILEC
06 ON BKEYFLD1 = CKEYFLD1
07 WHERE FILEA.AVALUE = 'VALUE'
08 AND DATE(FILEC.CDATE) = '2017-05-30'
Line 1: List all the fields you need to determine the desired data set.
Lines 2 - 6: List the files needed, using the fields required to properly join them together. The example uses the fully qualified file name from the production system, where PRODSERV is the production system's entry in the relational database directory of the system running the statement, allowing you to pull data directly from one system to another.
Lines 7 - 8: List the filters for the data set. Use as many as needed and as few as possible.
Assuming the target library on the development system is MYTSTLIB and all the data files are in the same library, only a few changes are needed to use the SQL to load the cursor data into each file.
01 INSERT INTO MYTSTLIB/FILEA
02 SELECT FILEA.*
03 FROM PRODSERV.MYLIBRARY.FILEA
04 JOIN PRODSERV.MYLIBRARY.FILEB
05 ON AKEYFLD1 = BKEYFLD1
06 JOIN PRODSERV.MYLIBRARY.FILEC
07 ON BKEYFLD1 = CKEYFLD1
08 WHERE FILEA.AVALUE = 'VALUE'
09 AND DATE(FILEC.CDATE) = '2017-05-30'
10 FETCH FIRST 100 ROWS ONLY
Line 1: Define the target file and library on the development system. Yes, you could trust the library list and leave off the library, but better safe than sorry.
Line 2: Pull all the fields from the source file for the current target. Apart from the INSERT on line 1, this is the only line that changes for each of the three files.
Lines 3 - 9: This block is the same as the original SQL and remains the same for all loads.
Line 10: To keep production data from overloading the development system, you may limit the load to XX rows, in this case 100. The XX rows selected are rows from the cursor, not rows from each file. Pulling the first XX rows from each file independently would not preserve the data relationships the cursor defines.
Simply repeat the process for the other two files by changing the file name in the INSERT clause and the SELECT clause, as shown below.
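For example, the FILEB load would look like this; only the first two lines differ from the FILEA load, and the FILEC load follows the same pattern.
INSERT INTO MYTSTLIB/FILEB
SELECT FILEB.*
FROM PRODSERV.MYLIBRARY.FILEA
JOIN PRODSERV.MYLIBRARY.FILEB
ON AKEYFLD1 = BKEYFLD1
JOIN PRODSERV.MYLIBRARY.FILEC
ON BKEYFLD1 = CKEYFLD1
WHERE FILEA.AVALUE = 'VALUE'
AND DATE(FILEC.CDATE) = '2017-05-30'
FETCH FIRST 100 ROWS ONLY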
Note well: this method works great for tables without auto-generated columns. Auto-generated fields include identity columns and system-generated timestamp columns. You can still use the basic concept; you simply cannot use the splat (otherwise known as the asterisk, or *) option to select all fields. You would need to list all non-generated fields in the INSERT and SELECT clauses. This will, of course, result in the auto-generated fields having new values. The INSERT and SELECT clauses might look like this.
01 INSERT INTO MYTSTLIB/FILEA
01.1 (AKEYFLD1, AFIELD2, AFIELD3, AFIELD4)
02 SELECT AKEYFLD1, AFIELD2, AFIELD3, AFIELD4
03 FROM PRODSERV.MYLIBRARY.FILEA
04 JOIN PRODSERV.MYLIBRARY.FILEB
05 ON AKEYFLD1 = BKEYFLD1
06 JOIN PRODSERV.MYLIBRARY.FILEC
07 ON BKEYFLD1 = CKEYFLD1
08 WHERE FILEA.AVALUE = 'VALUE'
09 AND DATE(FILEC.CDATE) = '2017-05-30'
10 FETCH FIRST 100 ROWS ONLY
Line 1.1: From the sample above, add the list of fields to insert.
Line 2: From the sample above, list the fields to select.
Once you have this idea down, you can skip SNDNETF and other ways of moving data from production to development. You can read more about identity columns here.
This article was written for IBM i 7.1 TR10, and should work for later releases too.
So this is a simple ETL application? This must be a federated database to be able to work with 2 systems in the same SQL statement. What about the Transform of sensitive data, like SSN?
This is straight SQL, not ETL. STRSQL and away you go. If you have HIPAA, PCI, or PII data, you would need to handle it according to your company's policy. You could scramble the data on the fly.
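A minimal sketch of scrambling on the fly, assuming a hypothetical character column ASSN on FILEA that holds the SSN: list the columns explicitly and swap the sensitive one for a masking expression. The joins and filters from the article would stay the same.
INSERT INTO MYTSTLIB/FILEA
  (AKEYFLD1, AVALUE, ASSN)
SELECT AKEYFLD1,
       AVALUE,
       'XXX-XX-' CONCAT RIGHT(ASSN, 4)  -- keep only the last four digits
FROM PRODSERV.MYLIBRARY.FILEA
WHERE AVALUE = 'VALUE'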
For tables with identity columns, look into "OVERRIDING SYSTEM VALUE". This tells the system to allow the insert into the column. (This is needed if you have other tables that rely on this key value.) When the insert is done, do a MAX on that column. Then do:
Alter Table TblName Alter Column ColName Restart With Max+1
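A sketch of that sequence, assuming FILEA has a hypothetical GENERATED ALWAYS identity column named AID:
INSERT INTO MYTSTLIB/FILEA
  (AID, AKEYFLD1, AFIELD2)
  OVERRIDING SYSTEM VALUE  -- allow explicit values into the identity column
SELECT AID, AKEYFLD1, AFIELD2
FROM PRODSERV.MYLIBRARY.FILEA
WHERE AVALUE = 'VALUE'

-- find the highest key value just loaded
SELECT MAX(AID) FROM MYTSTLIB/FILEA

-- restart the identity above it (plug in the actual MAX + 1)
ALTER TABLE MYTSTLIB/FILEA
  ALTER COLUMN AID RESTART WITH 101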
Cool.
You should use EXISTS instead of JOIN if you don't want duplicate records.
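A sketch of the EXISTS form for the FILEA load, assuming the keys chain as in the original joins (AKEYFLD1 = BKEYFLD1 = CKEYFLD1); each FILEA row now appears at most once, even if it matches several FILEB or FILEC rows.
INSERT INTO MYTSTLIB/FILEA
SELECT A.*
FROM PRODSERV.MYLIBRARY.FILEA A
WHERE A.AVALUE = 'VALUE'
AND EXISTS (SELECT 1 FROM PRODSERV.MYLIBRARY.FILEB B
            WHERE B.BKEYFLD1 = A.AKEYFLD1)
AND EXISTS (SELECT 1 FROM PRODSERV.MYLIBRARY.FILEC C
            WHERE C.CKEYFLD1 = A.AKEYFLD1
            AND DATE(C.CDATE) = '2017-05-30')
FETCH FIRST 100 ROWS ONLY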
Good idea. Some might suggest creating SQL views rather than using inline SELECTs. If the statements need to be run on a regular basis, another idea is to place them in source files that may be run with RUNSQLSTM.
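For example, with the three INSERT statements saved in a source member (the library, source file, and member names here are hypothetical):
RUNSQLSTM SRCFILE(MYTSTLIB/QSQLSRC) SRCMBR(LOADTEST) COMMIT(*NONE) NAMING(*SYS)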
Executing remote SQL statements that point to another IBM box implies the use of what is known as three-part SQL, where the production table is referenced as PRODUCTIONBOXNAME.PRODUCTIONLIBRARY.PRODUCTIONFILENAME. However, for this to work, the user should be aware of this: on the local system, you need to add an entry for the remote IBM box in the relational database directory, which can be done using WRKRDBDIRE or ADDRDBDIRE.
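A sketch of that entry, run once on the development system (the host name is hypothetical):
ADDRDBDIRE RDB(PRODSERV) RMTLOCNAME('prodserv.example.com' *IP)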
Please help me out here... How does this statement 'load the data into multiple files from the same cursor'? How does 'Insert into FILEA...' populate FILEB or FILEC?
Jim, for this example, you need to run the INSERT three times. For each iteration, you change FILEX in the INSERT INTO clause and in the SELECT FILEX.* clause to the next file.
Thank you David! That makes sense.