
Friday, August 13, 2021

Finding the fastest way to copy data – part 2

CPYF with block size and SQL insert

A couple of days ago I published a post about tests a friend had performed to find the fastest way to copy data from one table to another. I thought his results were really interesting, which is why I shared them.

But there were a couple of other methods I would also have tested. In this post I describe them.

With all the example programs I am not displaying the entire program, just the interesting parts.
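The fragments also rely on a handful of declarations that are not shown. Something like the following would be needed; these are my assumptions, not lines copied from the original programs:

  dcl-s Command varchar(500) ;         // command string passed to QCMDEXC
  dcl-s Start timestamp ;              // captured before the copy starts
  dcl-s Finish timestamp ;             // captured after the copy ends
  dcl-s SqlSuccessful ind inz(*on) ;   // controls the fetch loop
  dcl-s NbrOfRows int(10) inz(6000) ;  // rows per multi row fetch
  dcl-s RowsFetched int(10) ;          // rows returned by the last fetch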

The tests were performed over the same SQL tables as before:

  • TCHAR:  All the columns are defined as CHAR.
  • TVARCHAR:  All the columns are defined as VARCHAR.
  • TVAR80:  All the columns are defined as VARCHAR, with an allocation of 80% of the size of the data, not of the column size.
  • TVAR100:  All the columns are defined as VARCHAR, with an allocation of 100% of the column size.

For their definitions see the previous post.
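As an illustration only, here are hypothetical single column versions of two of the definitions. The real tables have more columns; the point is the ALLOCATE clause, which reserves that many bytes in the fixed length portion of the row:

  -- Hypothetical single column examples, not the real definitions
  CREATE TABLE MYLIB.TCHAR   (COL1 CHAR(100)) ;
  CREATE TABLE MYLIB.TVAR100 (COL1 VARCHAR(100) ALLOCATE(100)) ;
  -- For TVAR80 the ALLOCATE value would be 80% of the size of
  -- the data in the column, not 80% of the column's size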

I created four programs for each test method, one for each of the SQL tables. The only difference was the file names within the program. In these examples I am only going to show the program for the TCHAR table.

The files all contained 15 million rows.

These tests were performed on a less powerful IBM i partition than the original ones were.

The two new tests I performed were:


Copy file with buffering

In an earlier post I described how I could speed up the Copy File command, CPYF, by increasing the memory buffer size and by changing the From Record parameter, FROMRCD, of the command. I created a new program with these changes in it:

01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) +
                           TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                           CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Command = 'OVRDBF FILE(TCHAR) TOFILE(MYLIB/TCHAR) +
                     OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;
04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Command = 'OVRDBF FILE(OUTPUT) TOFILE(QTEMP/OUTPUT) +
                     OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;
06  Exec sql CALL QSYS2.QCMDEXC(:Command) ;
                                                                  
07  Command = 'CPYF FROMFILE(MYLIB/TCHAR) TOFILE(QTEMP/OUTPUT) +
                      MBROPT(*ADD) FROMRCD(1)' ;

08  Exec SQL SET :START = CURRENT_TIMESTAMP ;

09  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

10  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

Lines 1 and 2: I move the statement to create the output file into the variable Command. The statement is executed by calling the QCMDEXC SQL procedure.

Lines 3 and 4: I am using the Override with Database File command, OVRDBF, to change the memory buffer size for the input file to the maximum allowed, 256 KB. In this example the input file is TCHAR.

Lines 5 and 6: Change the memory buffer size for the output file in the same way.

Line 7: The CPYF command I will be using. Notice that the FROMRCD parameter is set to "1", rather than the default *START.

Line 8: The timestamp is captured before the statement is executed.

Line 9: The CPYF is executed by the QCMDEXC SQL procedure.

Line 10: The timestamp is captured after the statement was executed.
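The execution time for a run can then be calculated from the two timestamps. A quick sketch of how that might be done; this is not from the original program:

  // %diff with *mseconds returns the difference in microseconds,
  // dividing by 1,000,000 gives seconds with six decimal places
  dcl-s Elapsed packed(20:6) ;

  Elapsed = %diff(Finish : Start : *mseconds) / 1000000 ;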

I ran the original CPYF program ten times, and this new one ten times, and calculated the average execution times, in seconds:

Program          No. of tests   TCHAR        TVARCHAR     TVAR80       TVAR100
Original CPYF    10             429.745696   535.306755   662.906684   438.526985
New CPYF         10             205.023708   216.990078   265.254506   222.832959

The difference is very clear to see: the new CPYF program took roughly 50-60% less time than the original CPYF program. For TCHAR, for example, the saving is 1 - (205.023708 / 429.745696), which is about 52%.


SQL insert

In the previous round of testing the program using the SQL multi-row fetch was the fastest. I modified this program slightly:

01  dcl-ds DataDs extname('TCHAR') qualified dim(6000);
02  end-ds ;

03  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) +
                           TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                           CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Exec SQL SET :START = CURRENT_TIMESTAMP ;

06  Exec SQL Declare C0 cursor For
      select * from MYLIB.TCHAR
        FOR READ ONLY ;

07  Exec SQL Open C0 ;

08  Dow SqlSuccessful ;
09    clear DataDs ;

10    Exec SQL FETCH C0
11         FOR :NbrOfRows ROWS Into :DataDs ;
12    SqlSuccessful = (SQLCOD <> 100) ;

13    If SqlSuccessful ;
14      Exec sql GET DIAGNOSTICS :RowsFetched = ROW_COUNT ;

15      Exec SQL INSERT into QTEMP.OUTPUT :RowsFetched ROWS
             values( :DataDs ) ;
16    Endif ;
17  Enddo ;

18  Exec SQL Close C0 ;

19  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

The two changes I made to this program:

Lines 1 and 2: I only used one data structure array for both the fetch and the insert.

Line 5: I moved the line to capture the start timestamp to before the cursor definition, as I consider that part of the cursor's logic.

And now the new program with just the SQL insert statement:

01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) +
                         TOLIB(QTEMP) NEWOBJ(OUTPUT) +
                         CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;
02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Exec SQL SET :START = CURRENT_TIMESTAMP ;

04  Exec SQL INSERT INTO QTEMP.OUTPUT SELECT * FROM MYLIB.TCHAR ;

05  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;

Line 4: This is the only line that differs from the previous example. It is a very simple SQL insert statement, inserting all of the contents of the input file into the output file.

And now to the results, in seconds, for the two programs:

Program          No. of tests   TCHAR        TVARCHAR     TVAR80       TVAR100
SQL multi row    10             130.889327   192.517998   238.171682   161.766151
SQL insert       10             135.429933   164.223856   212.056803   148.000209

The difference between the two was not as large as I thought it would be. In fact the multi-row fetch was faster than the insert statement for the CHAR columns, although a difference of about 4.5 seconds over 15 million rows may not be statistically significant.

All of the SQL programs were still faster than the fastest equivalent CPYF programs.


This article was written for IBM i 7.4, and should work for some earlier releases too.

6 comments:

  1. I want only to say how much I appreciate your work.

    You are always one step ahead of us.
    Thank you Simon

  2. Thanks Simon.
    That was what I was wondering about earlier.
    Very surprising the SQL cursor was "faster" than the SQL insert.

  3. One more thing Simon: let's say that the copy-to file does not exist. You can even create it using SQL, with something like Create Table As (Select statement from the copy-from table).

    Replies
    1. I agree. I use that statement all the time to make work files.

      Wrote about it too here.

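      For anyone unfamiliar with it, this is a minimal sketch of the statement being described, using this post's file names. WITH DATA copies the rows as well as creating the table:

        -- Create the output file and copy the rows in one statement
        CREATE TABLE QTEMP.OUTPUT AS
          (SELECT * FROM MYLIB.TCHAR)
          WITH DATA ;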
  4. I was going to comment on the other post that you need to test a pure SQL solution with an "insert into select" statement! It was faster on all but one, and that was a small difference. I think it's a good assumption that keeping the work inside the database is the best option for most situations.

