A couple of days ago I published a post about testing a friend had performed to find the fastest way to copy data from one table to another. I thought his results were really interesting, which is why I shared them.
But there were a couple of other methods I would also have tested. In this post I describe them.
For all of the example programs I am not showing the entire program, just the interesting parts.
The tests were performed over the same SQL tables as before:
- TCHAR: All the columns are defined as CHAR.
- TVARCHAR: All the columns are defined as VARCHAR.
- TVAR80: All the columns are defined as VARCHAR, with an allocation of 80% of the size of the data, not the column size.
- TVAR100: All the columns are defined as VARCHAR, with an allocation of 100% of the column size.
For their definitions see the previous post.
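As an illustration of what an allocation looks like, here is a hypothetical one-column table of my own, not one of the actual test tables. The ALLOCATE clause reserves that many bytes in the fixed-length portion of the row:

```
// Hypothetical example only, not one of the test tables:
// a VARCHAR(100) column where 80 bytes are reserved in the
// fixed length portion of the row
Exec SQL CREATE TABLE QTEMP.EXAMPLE
          (FIELD1 VARCHAR(100) ALLOCATE(80)) ;
```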
I created four programs for each test method, one for each of the SQL tables. The only difference was the file names within the program. In these examples I am only going to show the program for the TCHAR table.
The files all contained 15 million rows.
These tests were performed on a less powerful IBM i partition than the original ones were.
The two new tests I performed were:
Copy file with buffering
In an earlier post I described how I could speed up the Copy File command, CPYF, by increasing the memory buffer size and by changing the command's From Record parameter, FROMRCD. I created a new program with these changes within it:
```
01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) +
               TOLIB(QTEMP) NEWOBJ(OUTPUT) +
               CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;

02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Command = 'OVRDBF FILE(TCHAR) TOFILE(MYLIB/TCHAR) +
               OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;

04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Command = 'OVRDBF FILE(OUTPUT) TOFILE(QTEMP/OUTPUT) +
               OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)' ;

06  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

07  Command = 'CPYF FROMFILE(MYLIB/TCHAR) TOFILE(QTEMP/OUTPUT) +
               MBROPT(*ADD) FROMRCD(1)' ;

08  Exec SQL SET :START = CURRENT_TIMESTAMP ;

09  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

10  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;
```
Lines 1 and 2: I move the command to create the output file into the variable Command. The command is then executed by calling the QCMDEXC SQL procedure.
Lines 3 and 4: I use the Override with Database File command, OVRDBF, to change the memory buffer size to the maximum allowed, 256 KB, for the input file. In this example that is TCHAR.
Lines 5 and 6: Changes the memory buffer size for the output file.
Line 7: The CPYF statement I will be using. Do notice that the FROMRCD parameter is set to "1".
Line 8: The timestamp is captured before the statement is executed.
Line 9: The CPYF is executed by the QCMDEXC SQL procedure.
Line 10: The timestamp is captured after the statement was executed.
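The snippet does not show how I calculated the difference between the two timestamps. A minimal sketch of how the elapsed time, in seconds, could be derived in RPG. The variable ElapsedSeconds is my own, it is not in the original program:

```
dcl-s Start timestamp ;
dcl-s Finish timestamp ;
dcl-s ElapsedSeconds packed(20:6) ;

// %diff with *MSECONDS returns the difference in microseconds;
// dividing by 1,000,000 gives seconds to six decimal places,
// matching the precision of the results tables below
ElapsedSeconds = %Diff(Finish : Start : *MSECONDS) / 1000000 ;
```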
I ran the original CPYF program ten times, and this new one ten times, and calculated the average execution times in seconds:
| Program | No. of tests | TCHAR | TVARCHAR | TVAR80 | TVAR100 |
| --- | --- | --- | --- | --- | --- |
| Original CPYF | 10 | 429.745696 | 535.306755 | 662.906684 | 438.526985 |
| New CPYF | 10 | 205.023708 | 216.990078 | 265.254506 | 222.832959 |
The difference is very clear to see: the new CPYF program took between 49% and 60% less time than the original CPYF program, depending on the table.
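One thing worth remembering before moving on: as the overrides were added with OVRSCOPE(*JOB) they remain in effect after the program ends. A sketch of how I would remove them afterwards. This cleanup is my addition, it was not part of the test program:

```
// Delete the job scoped overrides added earlier
Command = 'DLTOVR FILE(*ALL) LVL(*JOB)' ;
Exec sql CALL QSYS2.QCMDEXC(:Command) ;
```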
SQL insert
In the previous round of testing the program using the SQL multi-row fetch was the fastest. I modified this program slightly:
```
01  dcl-ds DataDs extname('TCHAR') qualified dim(6000);
02  end-ds ;

03  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) TOLIB(QTEMP) NEWOBJ(OUTPUT) +
               CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;

04  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

05  Exec SQL SET :START = CURRENT_TIMESTAMP ;

06  Exec SQL Declare C0 cursor For
              select * from MYLIB.TCHAR
              FOR READ ONLY ;

07  Exec SQL Open C0 ;

08  Dow SqlSuccessful ;
09    clear DataDs ;

10    Exec SQL FETCH C0
11             FOR :NbrOfRows ROWS Into :DataDs ;

12    SqlSuccessful = (SQLCOD <> 100) ;

13    If SqlSuccessful ;
14      Exec sql GET DIAGNOSTICS :RowsFetched = ROW_COUNT ;
15      Exec SQL INSERT into QTEMP.OUTPUT :RowsFetched ROWS
                  values( :DataDs ) ;
16    Endif ;
17  Enddo ;

18  Exec SQL Close C0 ;

19  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;
```
The two changes I made to this program:
Lines 1 and 2: I use only one data structure array, for both the fetch and the insert.
Line 5: I moved the line that captures the start timestamp to before the cursor definition, as I consider the definition part of the cursor's logic.
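The definitions of the other host variables are not shown in the snippet. This is how I would assume they are defined; the sizes and initial values are my assumptions:

```
// Assumed definitions, these are not shown in the original program
dcl-s NbrOfRows int(10) inz(6000) ;  // rows per fetch, matching dim(6000)
dcl-s RowsFetched int(10) ;          // rows returned by the last fetch
dcl-s SqlSuccessful ind inz(*on) ;   // controls the Dow loop
```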
And now the new program with just the SQL insert statement:
```
01  Command = 'CRTDUPOBJ OBJ(TCHAR) FROMLIB(MYLIB) OBJTYPE(*FILE) TOLIB(QTEMP) NEWOBJ(OUTPUT) +
               CST(*NO) TRG(*NO) ACCCTL(*NONE)' ;

02  Exec sql CALL QSYS2.QCMDEXC(:Command) ;

03  Exec SQL SET :START = CURRENT_TIMESTAMP ;

04  Exec SQL INSERT INTO QTEMP.OUTPUT
              SELECT * FROM MYLIB.TCHAR ;

05  Exec SQL SET :FINISH = CURRENT_TIMESTAMP ;
```
Line 4: This is the only line that differs from the previous example. It is a very simple SQL Insert statement, inserting the entire contents of the input file into the output file.
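A variation I did not test, but which could be worth trying if the output table were journaled, is to run the Insert without commitment control by adding an isolation clause. This is my suggestion, not part of the tests:

```
// My untested variation: the isolation clause WITH NC runs the
// insert without commitment control
Exec SQL INSERT INTO QTEMP.OUTPUT
          SELECT * FROM MYLIB.TCHAR
          WITH NC ;
```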
And now to the results, in seconds, for the two programs:
| Program | No. of tests | TCHAR | TVARCHAR | TVAR80 | TVAR100 |
| --- | --- | --- | --- | --- | --- |
| SQL multi row | 10 | 130.889327 | 192.517998 | 238.171682 | 161.766151 |
| SQL insert | 10 | 135.429933 | 164.223856 | 212.056803 | 148.000209 |
The difference between the two was not as large as I thought it would be. In fact the multi-row fetch was faster than the Insert statement for the CHAR columns, although five seconds over 15 million rows may not be a statistically significant difference.
Both SQL programs were still faster than the fastest equivalent CPYF program.
This article was written for IBM i 7.4, and should work for some earlier releases too.
Comments

I want only to say to you how much I appreciate your work. You are always one step ahead of us. Thank you Simon.

Thanks Simon, that was what I was wondering about earlier. Very surprising the SQL cursor was "faster" than the SQL insert. One more thing, Simon: if the copy-to file does not exist you can even create it using SQL, with something like Create Table As (Select statement from the copy-from table).

Reply: I agree. I use that statement all the time to make work files. I wrote about it here too.

I was going to comment on the other post that you need to test a pure SQL solution with an "insert into select" statement! It was faster on all but one, and that was a small difference. I think it is a good assumption that keeping the work inside the database is the best option for most situations.

Thanks Simon, for sharing.