Update
Before using what is described below consider using SQL to read IFS files.
After publishing a post about reading an IFS file using RPG I received an interesting communication from Jeff Davis:
I would also say explore the fopen, fget and fclose. With these apis you don't have to scan for the crlf characters as they parse the data on those.
So I did what he suggested and investigated the C APIs he mentioned: fopen, fgets, and fclose. What I found convinced me that this is the better way to read an IFS file using RPG.
I used the same text file and IFS folder that I had in the previous post:
first record At the start
second record
third record
fourth record
fifth record
sixth record
seventh record
eighth record
ninth record
tenth and last record
Before I had used three UNIX-type APIs: open, read, and close. This time I will be using these three C APIs:
01  ctl-opt option(*srcstmt) dftactgrp(*no) ;
02  dcl-pr OpenFile pointer extproc('_C_IFS_fopen') ;
03    *n pointer value ;  //File name
04    *n pointer value ;  //File mode
05  end-pr ;
06  dcl-pr ReadFile pointer extproc('_C_IFS_fgets') ;
07    *n pointer value ;  //Retrieved data
08    *n int(10) value ;  //Data size
09    *n pointer value ;  //Misc pointer
10  end-pr ;
11  dcl-pr CloseFile extproc('_C_IFS_fclose') ;
12    *n pointer value ;  //Misc pointer
13  end-pr ;
The first thing I found is that the three C APIs are not called just fopen, fgets, and fclose. The API procedure names all start with _C_IFS_ followed by the API name.
Line 1: The usual control options as I want the program to report errors by using the source sequence number, and as I am using external APIs I cannot operate in the default activation group.
Lines 2 – 5: This is the procedure definition for the API to open the IFS file, fopen. I have decided not to use the name of the API and to call the procedure OpenFile instead, therefore, I have to give the external procedure name in the EXTPROC keyword on line 2. The pointer before the EXTPROC indicates that this procedure returns a pointer value to the calling program. I have been criticized in the past for using *N for the names of the procedure's parameters, lines 3 and 4, as they are not descriptive. To compensate for this I have added a comment after each parameter to describe what it is. For this procedure there are two parameters, both are pointers, and VALUE means that the procedure will use a copy of the data passed, and the original version will remain unchanged.
Lines 6 – 10: This definition is for the procedure that performs the read of the IFS file. To read the IFS file I use fgets, which returns a pointer. If the returned pointer is not null then a record has been read. If the pointer is null then the end of file was encountered. There are three parameters for this procedure, lines 7 – 9.
Lines 11 – 13: If we open a file we have to close it when we are done. This procedure is the one that closes our IFS file.
The next part of the example program defines the variables I will be using:
14  dcl-s PathFile char(50) ;
15  dcl-s OpenMode char(5) ;
16  dcl-s FilePtr pointer inz ;
17  dcl-s RtvData char(32767) ;
Now it is time to open the file.
18  PathFile = '/SIMON/test_read.txt' + x'00' ;
19  OpenMode = 'r' + x'00' ;
20  FilePtr = OpenFile(%addr(PathFile):%addr(OpenMode)) ;
21  if (FilePtr = *null) ;
22    dsply ('fopen unable to open file') ;
23    return ;
24  endif ;
Line 18: I struggled with this for some time. If I had coded options(*string) on the parameter line in the procedure definition the compiler would have null terminated the variable for me. But no matter what I tried on two servers running IBM i 7.2 I could not get it to work. Therefore, I have null terminated the value myself using the appropriate hexadecimal value, x'00'. This variable holds the location of our file.
Line 19: I had to null terminate this variable myself too. It determines how the IFS file is opened. I only want to open it for input, or read, which is indicated by the "r" I have placed in this variable.
Line 20: Now I open the file using the OpenFile procedure. As the parameters used by the procedure are pointers I use the %ADDR built in function to pass them as pointers to the address of the variable.
Lines 21 – 24: If the file could not be opened the returned pointer, FilePtr, will contain null. This section of the program will display an error message to the user, line 22, and then end, line 23.
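The options(*string) approach mentioned for line 18 could be sketched as below. This is only a sketch under a few assumptions: the prototype name OpenFileStr is made up for this illustration, *TRIM is added so that the trailing blanks of the fixed length PathFile are removed before the compiler adds the null terminator, and the compiler is assumed to accept fully free-form source (**free). Without **free support, drop that line and start each statement in column 8 or later.

```rpgle
**free
ctl-opt option(*srcstmt) dftactgrp(*no) ;

// OpenFileStr is a hypothetical name for this sketch.
// OPTIONS(*STRING : *TRIM) asks the compiler to trim the value and
// add the null terminator, so no x'00' is coded by hand.
dcl-pr OpenFileStr pointer extproc('_C_IFS_fopen') ;
  *n pointer value options(*string : *trim) ;  //File name
  *n pointer value options(*string : *trim) ;  //File mode
end-pr ;

// fclose returns an int, declared here as int(10)
dcl-pr CloseFile int(10) extproc('_C_IFS_fclose') ;
  *n pointer value ;  //File pointer returned by fopen
end-pr ;

dcl-s PathFile char(50) inz('/SIMON/test_read.txt') ;
dcl-s FilePtr pointer inz ;

// Character values are passed directly, %ADDR is not needed
FilePtr = OpenFileStr(PathFile : 'r') ;
if (FilePtr = *null) ;
  dsply ('fopen unable to open file') ;
  return ;
endif ;

CloseFile(FilePtr) ;
return ;
```

Without *TRIM the blanks that pad PathFile to 50 characters would be passed as part of the file name, which may explain an open that fails with options(*string) alone.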
The next part is what really won me over... the simplicity of the read:
25  dow (ReadFile(%addr(RtvData):32767:FilePtr) <> *null) ;
26    RtvData = %xlate(x'00':' ':RtvData) ;  //End of record null
27    RtvData = %xlate(x'25':' ':RtvData) ;  //Line feed (LF)
28    RtvData = %xlate(x'0D':' ':RtvData) ;  //Carriage return (CR)
29    dsply %subst(RtvData:1:52) ;
30    RtvData = ' ' ;
31  enddo ;
Line 25: The read is performed here using the ReadFile procedure. The first parameter is a pointer to the retrieved data, the second is the size I want to use for the retrieved data, and the third is the file pointer returned by OpenFile. When the end of the file is reached the procedure returns null, and the do loop ends.
When I used the read API in the previous post I received a chunk of data and I had to determine where the records within it began and ended. fgets reads the IFS file one record at a time!
Lines 26 – 28: Now I want to remove all the formatting that came from the text file. This I can do by replacing the end of record, line feed, and carriage return characters with blanks.
Line 29: I can now display what I have read:
DSPLY  first record At the start
DSPLY  second record
DSPLY  third record
DSPLY  fourth record
DSPLY  fifth record
DSPLY  sixth record
DSPLY  seventh record
DSPLY  eighth record
DSPLY  ninth record
DSPLY  tenth and last record
Line 30: If the data in a record is shorter than the previous one the "extra data" from the previous record will remain. To prevent this I clear the retrieved data variable before the next read.
Lines 32 – 33: Having read all the records in the IFS file I can now close the file and exit the program.
As I said above, what I like about these APIs is that I retrieve the records from the IFS file one record at a time. In my opinion, this makes it easier for me to write and for others to understand my code. In the future it will be fopen, fgets, and fclose for me, rather than open, read, and close.
The entire source code for the program looks like:
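The C function fclose takes the file pointer that fopen returned and gives back zero when the close worked. A minimal sketch of the close step with that result checked follows. It assumes the CloseFile prototype is changed to declare the int(10) return value, which the prototype above does not do, and that the compiler accepts fully free-form source (**free). The message text is just an example.

```rpgle
**free
ctl-opt option(*srcstmt) dftactgrp(*no) ;

dcl-pr OpenFile pointer extproc('_C_IFS_fopen') ;
  *n pointer value ;  //File name
  *n pointer value ;  //File mode
end-pr ;

// Assumption for this sketch: the int returned by fclose is declared
dcl-pr CloseFile int(10) extproc('_C_IFS_fclose') ;
  *n pointer value ;  //File pointer returned by fopen
end-pr ;

dcl-s PathFile char(50) ;
dcl-s OpenMode char(5) ;
dcl-s FilePtr pointer inz ;

PathFile = '/SIMON/test_read.txt' + x'00' ;
OpenMode = 'r' + x'00' ;

FilePtr = OpenFile(%addr(PathFile) : %addr(OpenMode)) ;
if (FilePtr = *null) ;
  dsply ('fopen unable to open file') ;
  return ;
endif ;

// The reads would go here

// Pass the pointer returned by fopen; zero means the file was closed
if (CloseFile(FilePtr) <> 0) ;
  dsply ('fclose unable to close file') ;
endif ;
return ;
```

Checking the result matters most in jobs that work through many files, as a file that was never really closed still counts towards the job's open files.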
01  ctl-opt option(*srcstmt) dftactgrp(*no) ;
02  dcl-pr OpenFile pointer extproc('_C_IFS_fopen') ;
03    *n pointer value ;  //File name
04    *n pointer value ;  //File mode
05  end-pr ;
06  dcl-pr ReadFile pointer extproc('_C_IFS_fgets') ;
07    *n pointer value ;  //Retrieved data
08    *n int(10) value ;  //Data size
09    *n pointer value ;  //Misc pointer
10  end-pr ;
11  dcl-pr CloseFile extproc('_C_IFS_fclose') ;
12    *n pointer value ;  //Misc pointer
13  end-pr ;
14  dcl-s PathFile char(50) ;
15  dcl-s OpenMode char(5) ;
16  dcl-s FilePtr pointer inz ;
17  dcl-s RtvData char(32767) ;
18  PathFile = '/SIMON/test_read.txt' + x'00' ;
19  OpenMode = 'r' + x'00' ;
20  FilePtr = OpenFile(%addr(PathFile):%addr(OpenMode)) ;
21  if (FilePtr = *null) ;
22    dsply ('fopen unable to open file') ;
23    return ;
24  endif ;
25  dow (ReadFile(%addr(RtvData):32767:FilePtr) <> *null) ;
26    RtvData = %xlate(x'00':' ':RtvData) ;  //End of record null
27    RtvData = %xlate(x'25':' ':RtvData) ;  //Line feed (LF)
28    RtvData = %xlate(x'0D':' ':RtvData) ;  //Carriage return (CR)
29    dsply %subst(RtvData:1:52) ;
30    RtvData = ' ' ;
31  enddo ;
32  CloseFile(FilePtr) ;  //Pointer returned by fopen, not %addr(PathFile)
33  return ;
For those of you still forced to use fixed format definitions, your definitions would look like:
01 H option(*srcstmt) dftactgrp(*no)
02 D OpenFile        PR              *   extproc('_C_IFS_fopen')
03 D                                 *   value
04 D                                 *   value
05 D ReadFile        PR              *   extproc('_C_IFS_fgets')
06 D                                 *   value
07 D                               10I 0 value
08 D                                 *   value
09 D CloseFile       PR                  extproc('_C_IFS_fclose')
10 D                                 *   value
11 D PathFile        S             50
12 D OpenMode        S              5
13 D FilePtr         S               *
14 D RtvData         S          32767
    /free
Addendum
I want to thank Giovanni Ramajola for sending me this. It is something I did not encounter in my testing when I wrote this article as I was not using variable length fields in the IFS file:
Because I was using the C API to make the IFS open and read a text file with a variable length and I realized that the pointer to line 20 of your pgm was always NULL.
To solve I had to add the pointer address in the pathfile:*data.
FilePtr = OpenFile(%addr(PathFile:*data):%addr(OpenMode)) ;
In this way, the length of the file name can be variable and we are sure that the data will always and only be stored in the pointer in order to open the file correctly!
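What Giovanni describes could be sketched like this. It assumes PathFile is redefined as varchar(50), which is my reading of "variable length", and that the compiler accepts fully free-form source (**free). A varying length field starts with a length prefix, so %addr(PathFile) points at the prefix, while %addr(PathFile:*data) points at the first character of the path.

```rpgle
**free
ctl-opt option(*srcstmt) dftactgrp(*no) ;

dcl-pr OpenFile pointer extproc('_C_IFS_fopen') ;
  *n pointer value ;  //File name
  *n pointer value ;  //File mode
end-pr ;

// fclose returns an int, declared here as int(10)
dcl-pr CloseFile int(10) extproc('_C_IFS_fclose') ;
  *n pointer value ;  //File pointer returned by fopen
end-pr ;

// Assumption: the path is held in a varying length field
dcl-s PathFile varchar(50) ;
dcl-s OpenMode char(5) ;
dcl-s FilePtr pointer inz ;

PathFile = '/SIMON/test_read.txt' + x'00' ;
OpenMode = 'r' + x'00' ;

// *DATA skips the length prefix, so fopen sees the path itself
FilePtr = OpenFile(%addr(PathFile : *data) : %addr(OpenMode)) ;
if (FilePtr = *null) ;
  dsply ('fopen unable to open file') ;
  return ;
endif ;

CloseFile(FilePtr) ;
return ;
```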
If your partition is IBM i 7.4 TR3, 7.3 TR9, or later, you might find it easier to read the IFS file using SQL, rather than use these C APIs.
You can learn more about this from the IBM website:
This article was written for IBM i 7.2, and should work for earlier releases too.
Hi
Regarding Line 18: Try it this way:
D open            Pr              *   ExtProc( '_C_IFS_fopen' )
D                                 *   Value Options( *String )
D                                 *   Value Options( *String )
BR,
Sam
Very nice!
Thanks for the article!
In line 30 you assign RtvData a value of ' ' (1 blank). Coming from other programming languages I would rather write "RtvData = '';", because to me "RtvData = ' ';" would suggest that RtvData has a length of one character.
Is it just a question of style or is it of advantage to use ' ' and not ''?
Markus
Markus ... RPGILE 99% of the time does not use variable length strings. So assigning RtvData a value of a null string is the same as assigning a blank. Both fill the entire 32767 characters with spaces. However if RtvData had been defined as varchar, then your suggestion would have more merit.
Simon ... Created a small test program using the C APIs - neat concept. However some IFS files are a different CCSID. Viewing the value of RtvData in debug shows trash. How can I force the C APIs to use a particular CCSID? Thanks!
The CCSID is given in the fopen API. Check the IBM documentation in the link given above.
Another great tool for the tool box.
Hello,
Great article
On line 32, shouldn't you close the file with the pointer to the FILE structure (FilePtr) returned from OpenFile() ?
CloseFile( FilePtr ) ;
Bart Wouters
Hello, Yes you are totally right, there should be close to no impact when closing with CloseFile(%addr(PathFile)), except if you try to access the document right after its generation. If you try to access it when the job is done, the file(s) are closed so no impact.
Nevertheless I had a job where I had to read more than 200 files and I used these proc. A job can only support up to 200 open files. Seen that I didn't close the files properly CloseFile(%addr(PathFile)), I had "Too many open files" error message for any and all files that I tried to read after 200-ish read. When I replaced by CloseFile( FilePtr), the files were really closed so no more errors.
Thank you for providing such in-depth examples!!!
In a matter of minutes I had migrated my program from the other IFS method to this one.
-- Scott J.
Used your example and I am getting:
dsply ('fopen unable to open file') ;
I am trying to open a xml file. Is there something else I could try?
DUDE! Thanks for this. It really helped on a project I'm working on. I love the line by line explanations. Please, keep up the great work.
I am so glad you liked this post.
Keep checking back for more.
You wrote "I have been criticized in the past for using *N for the names of the procedures parameters, lines 3 and 4, as they are not descriptive. To compensate for this I have added a comment after each parameter to describe what it is."
Why don’t you give the parameters meaningful names. I always do it to have the code document itself. Plus, in RDi/LPEX the auto completion feature (control + space) will show the meaning of the parameters.
(I asked this question before in the article "user spaces introduction", so please just delete this one, if you don’t want to answer.)
Markus
My feeling is why should I bother to waste my time coming up with names for parameters when I am not going to use those names.
I will use the parameter, probably using a variable with another name or constant.
In my experience of working with others I have found that they search the source member looking for where that parameter name is used.
Simon, it is far better to replace *n with actual names, for one huge reason: Documentation. When I see *n, I cringe, and this is not just a difference in style but rather in knowing what each parameter does. Newbies and veterans of RPG will both appreciate seeing actual parameter names. Again, great article, keep up the great work!
This is a debate I have frequently.
I add a comment after the definition, when using *N, which can be far more descriptive than what people would use in the DCL-PR.
A good descriptive name can be used in the DCL-PI, or in the line of code that calls the subprocedure.
What a fantastic article, Simon! I just stumbled upon it today...
Hi, thanks for the article. I tried this out to read XML from an IFS file. Worked fine. Second time I tried it the fgets started truncating the first 2 characters. Any ideas why? The second program does a bit more than the first so I've read it might be that I'm using up buffer space.
What happens if you run the first program twice in a row? Is the data still formatted OK?
I think it was not working if you took advantage of OPTIONS(*STRING) because your filename variable is not varying length. So the name passed to the API would have trailing blanks unless you coded %TRIM(filename) when you call the API.
ReplyDeleteAlternatively, you could code OPTIONS(*STRING:*TRIM) on the parameter, so RPG will automatically trim the parameter for you.
I'm sending 'r, o_ccsid=1208█' as my openMode but I'm getting garbage when I read the stream file.
I always use the partition's default CCSID or 37.
This was a big head start in creating my own code (including many of the comments). I switched to free-form which helped me understand all the pieces better as I converted them all manually.
One thing I did find was the %xlate functions are inefficient. I have a large file (100 MB, 160,000 lines) which was taking 53 seconds to read through with just the contents of the loop as shown. I worked out a better way.
Here are my definitions that differ.
dcl-s rtvData char(2000); // ENSURE this is longer than the longest line
dcl-s maxLen int(10) inz(%size(rtvData));
dcl-s rtvLine varchar(%size(rtvData));
dcl-s toTrim char(2) inz(X'0D25');
With that one assumption mentioned in the comment, dealing with all of the characters from the %xlate functions can be done in a single line.
rtvLine = %trimr(%str(%addr(rtvData)): toTrim); // Get actual content
The %str function gets the zero-terminated string (dropping the zero byte), then the %trimr drops the linefeed and (if present) carriage return which we now know are on the right end of the varchar.
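Put together with the rest of the program, the approach in this comment might look like the sketch below. It is an illustration rather than the commenter's actual program: the constant MAX_LINE and the counter LineCount are names invented here, options(*string:*trim) is used on the open only to keep the sketch short, and fully free-form source (**free) is assumed.

```rpgle
**free
ctl-opt option(*srcstmt) dftactgrp(*no) ;

dcl-pr OpenFile pointer extproc('_C_IFS_fopen') ;
  *n pointer value options(*string : *trim) ;  //File name
  *n pointer value options(*string : *trim) ;  //File mode
end-pr ;
dcl-pr ReadFile pointer extproc('_C_IFS_fgets') ;
  *n pointer value ;  //Retrieved data
  *n int(10) value ;  //Data size
  *n pointer value ;  //File pointer returned by fopen
end-pr ;
dcl-pr CloseFile int(10) extproc('_C_IFS_fclose') ;
  *n pointer value ;  //File pointer returned by fopen
end-pr ;

// Hypothetical constant: must be longer than the longest line
dcl-c MAX_LINE 2000 ;

dcl-s rtvData char(MAX_LINE) ;
dcl-s rtvLine varchar(MAX_LINE) ;
dcl-s toTrim char(2) inz(x'0D25') ;  //CR and LF
dcl-s FilePtr pointer inz ;
dcl-s LineCount int(10) inz(0) ;     //Hypothetical counter

FilePtr = OpenFile('/SIMON/test_read.txt' : 'r') ;
if (FilePtr = *null) ;
  dsply ('fopen unable to open file') ;
  return ;
endif ;

dow (ReadFile(%addr(rtvData) : %size(rtvData) : FilePtr) <> *null) ;
  // %STR stops at the null terminator, so data left over from a
  // longer previous record is never picked up and no clear is needed
  rtvLine = %trimr(%str(%addr(rtvData)) : toTrim) ;
  LineCount += 1 ;
enddo ;

CloseFile(FilePtr) ;
dsply ('Lines read: ' + %char(LineCount)) ;
return ;
```

Because only the characters before the null terminator are copied into rtvLine, the three %xlate calls and the clearing of the buffer between reads are not needed in this version.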