Finding fastest way to copy data – part 2 @ RPGPGM.COM

A couple of days ago I published a post about testing a friend had performed looking at the fastest way to copy data from one table to another. I thought his results were really interesting, which is why I shared them.

But there are another couple of ways I would have tested. In this post I will describe them.

With all the example programs I am not displaying the entire program, just the interesting parts.

The tests were performed over the same SQL tables as before:

TCHAR: All the columns are defined as CHAR.
TVARCHAR: All the columns are defined as VARCHAR.
TVAR80: All the columns are defined as VARCHAR with the allocation of 80% of the size of the data, not column size.
TVAR100: All the columns are defined as VARCHAR, with the allocation of 100% of the columns size.

For their definitions see the previous post.

I created four programs for each test method, one for each of the SQL tables. The only difference was the file names within the program. In these examples I am only going to show the program for the TCHAR table.

The files all contained 15 million rows.

These tests were performed on a less powerful IBM i partition than the original ones were.

The two new tests I performed were:

Copy file with buffering
SQL insert

Copy file with buffering

In an earlier post I described how I can increase the memory buffer size to speed up the Copy File command, CPYF, and by changing the From Record parameter of the command, FROMRCD. I created a new program with these within it:

01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) +
                           TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                           CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Command = 'OVRDBF FILE(TCHAR) TOFILE(MYLIB/TCHAR) +
                     OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;
04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Command = 'OVRDBF FILE(OUTPUT) TOFILE(QTEMP/OUTPUT) +
                     OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;
06  Exec sql CALL QSYS2.QCMDEXC(:Command) ;
                                                                  
07  Command = 'CPYF FROMFILE(MYLIB/TCHAR) TOFILE(QTEMP/OUTPUT) +
                      MBROPT(*ADD) FROMRCD(1)' ;

08  Exec SQL SET :START = CURRENT_TIMESTAMP ;

09  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

10  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

Lines 1 and 2: I move the statement to create the output file into the variable Command. The statement is executed by calling the QCMDEXC SQL procedure.

Lines 3 and 4: I am using the Over Database File command, OVRDBF, to change the memory buffer size to maximum allowed, 256 KB, for the input file. In this example that is TCHAR.

Lines 5 and 6: Changes the memory buffer size for the output file.

Line 7: The CPYF statement I will be using. Do notice that the FROMRCD parameter is set to "1".

Line 8: The timestamp is captured before the statement is executed.

Line 9: The CPYF is executed by the QCMDEXC SQL procedure.

Line 10: The timestamp is captured after the statement was executed.

I ran the original CPYF program 10 times, and this one 10 times, and calculated the average execution times:

Program	No. of tests	TCHAR	TVARCHAR	TVAR80	TVAR100
Original CPYF	10	429.745696	535.306755	662.906684	438.526985
New CPYF	10	205.023708	216.990078	265.254506	222.832959

The difference is very clear to see as the new CPYF program took 40-50% less time than the original CPYF program.

SQL insert

In the previous round of testing the program using the SQL multi row fetch was the fastest program. I modified this program slightly:

01  dcl-ds DataDs extname('TCHAR') qualified dim(6000);
02  end-ds ;

03  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE)
                           TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                           CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Exec SQL SET :START = CURRENT_TIMESTAMP ;

06  Exec SQL Declare C0 cursor For
      select * from MYLIB.TCHAR
        FOR READ ONLY ;

07  Exec SQL Open C0 ;

08  Dow SqlSuccessful ;
09    clear DataDs ;

10    Exec SQL FETCH C0
11         FOR :NbrOfRows ROWS Into :DataDs ;
12    SqlSuccessful = (SQLCOD <> 100) ;

13    If SqlSuccessful ;
14      Exec sql GET DIAGNOSTICS :RowsFetched = ROW_COUNT ;

15      Exec SQL INSERT into QTEMP.OUTPUT :RowsFetched ROW
             values( :DataDs ) ;
16    Endif ;
17  Enddo ;

18  Exec SQL Close C0 ;

19  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

The two changes I made to this program:

Lines 1 and 2: I only used one data structure array for both the fetch and the insert.

Line 5: I moved the line to capture the start timestamp to before the cursor definition, as I consider that part of the cursor's logic.

And now the new program with just the SQL insert statement:

01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE)
                         TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                         CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Exec SQL SET :START = CURRENT_TIMESTAMP ;

04  Exec SQL INSERT INTO QTEMP.OUTPUT SELECT * FROM MYLIB.TCHAR ;

05  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

Line 4: This is the only line that differs from the previous example. This is a very simple SQL insert statement, inserting all of the contents from the input file into the output file.

And now to the results for the two programs:

Program	No. of tests	TCHAR	TVARCHAR	TVAR80	TVAR100
SQL multi row	10	130.889327	192.517998	238.171682	161.766151
SQL insert	10	135.429933	164.223856	212.056803	148.000209

The difference between the two was not as large as I thought it would be. In fact the multi row fetch was faster than the insert statement for the CHAR columns, although 5 seconds over 15 million rows may not be a statistical difference.

Both all of SQL programs were still faster than the fastest equivalent CPYF programs.

This article was written for IBM i 7.4, and should work for some earlier releases too.

6 comments:

Marco BalgeraAugust 13, 2021 at 5:28 AM
I wnat only said to you how much i appreciate your work.

You are always one step ahead of us.
Thank you Simon
AnonymousAugust 13, 2021 at 7:42 AM
thanks Simon
that was what I wondering about earlier
very surprising sql cursor was "faster" than sql insert
JoseenocperezAugust 13, 2021 at 10:07 AM
One more things Simon let's say that the copy to file does not exist you can even create the copy to file using sql like Create Table as(Select Statement From Table(Copy From Table))
Mark FergasonAugust 13, 2021 at 2:18 PM
I was going to comment on the other post that you need to test a pure SQL solution with an "insert into select" statement! It was faster on all but 1 and that was a small difference. I think it's a good assumption that keeping the work inside the database is the best option for most situations.
Juan VasquezJanuary 4, 2022 at 12:05 PM
Thanks Simons, for sharing

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.

RPGPGM.COM - From AS400 to IBM i

Pages

Friday, August 13, 2021