Pages

Wednesday, April 6, 2022

List rows in IFS files that contain a certain string

listing all rows in ifs files containing a string

The question was an interesting one: How is it possible, in SQL, to list all of the files in the IFS that contain a certain string?

There are a number of SQL table functions that allow me to get to information about the files in IFS folders, and others to "read" their contents. This is a good excuse to combine the two.

I am going to place three files in my personal IFS folder, MyFolder, then search those for the desired string. I created three DDL tables in my library, MYLIB. OK, I created one table and then cloned it two times to make the three tables. I populated them with just a few rows of data. As the tables are identical, I can easily combine the results from the three in one SQL statement using the UNION clause:

SELECT 'TABLE1',A.* FROM TABLE1 A
UNION
SELECT 'TABLE2',B.* FROM TABLE2 B
UNION
SELECT 'TABLE3',C.* FROM TABLE3 C 
ORDER BY 1,2

I have added an empty row after each table's results to make it easier to understand the results:

00001   FIRST  SECOND     THIRD           FOURTH
------- -----  ---------  --------------  ------------------
TABLE1      1  TEST       SOMETHING       SOMETHING MORE
TABLE1      2  SOMETHING  TEST            SOMETHING MORE
TABLE1      3  SOMETHING  SOMETHING MORE  TEST
TABLE1      4  SOMETHING  SOMETHING MORE  YET MORE SOMETHING

TABLE2      1  NOT HERE   SOMETHING       SOMETHING MORE
TABLE2      2  SOMETHING  NOT HERE        SOMETHING MORE
TABLE2      3  SOMETHING  SOMETHING MORE  NOT HERE

TABLE3      1  SOMETHING  SOMETHING MORE  YET MORE SOMETHING
TABLE3      2  SOMETHING  SOMETHING MORE  TEST
TABLE3      3  SOMETHING  TEST            SOMETHING MORE
TABLE3      4  TEST       SOMETHING       SOMETHING MORE

I am going to be searching for the string "TEST" in these three tables. It is present in various columns of TABLE1 and TABLE3, but not in TABLE2.

I copied the content of these three tables using the Copy To Import File command, CPYTOIMPF:

CPYTOIMPF FROMFILE(MYLIB/TABLE1) +
            TOSTMF('/home/MyFolder/TABLE1.csv') +
            FROMCCSID(37) +
            STMFCCSID(*PCASCII) +
            RCDDLM(*CRLF)

CPYTOIMPF FROMFILE(MYLIB/TABLE2) +
            TOSTMF('/home/MyFolder/TABLE2.csv') +
            FROMCCSID(37) +
            STMFCCSID(*PCASCII) +
            RCDDLM(*CRLF)

CPYTOIMPF FROMFILE(MYLIB/TABLE3) +
            TOSTMF('/home/MyFolder/TABLE3.csv') +
            FROMCCSID(37) +
            STMFCCSID(*PCASCII) + 
            RCDDLM(*CRLF)

Depending upon the configuration of your IBM i and the CCSID it uses you might need to change these commands to fit your needs.

I can check that the files were copied to my folder using the IFS_OBJECT_STATISTICS table function:

SELECT PATH_NAME,OBJECT_TYPE
  FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/home/MyFolder','NO','*STMF'))

I have passed the table function:

  1. The path to folder I want to list
  2. Do not list any subfolders
  3. Only return steam files. The CSV files I generated are considered stream files in the IFS

The results show me that my three tables have been successfully copied to the IFS:

PATH_NAME                    OBJECT_TYPE
--------------------------   -----------
/home/MyFolder/TABLE1.csv    *STMF
/home/MyFolder/TABLE2.csv    *STMF
/home/MyFolder/TABLE3.csv    *STMF

Now I can generate a list of files in my IFS folder, I now need to be able to "read" their contents. Fortunately there is. There are multiple similar SQL table functions to do this. The one I normally use is IFS_READ.

SELECT * FROM TABLE(QSYS2.IFS_READ('/home/MyFolder/TABLE1.csv'))

The only parameter I needed to give this table function is the path name of the file I want to read. The results look like:

LINE_NUMBER LINE
----------- ---------------------------------------------------------
          1 1     ,"TEST","SOMETHING","SOMETHING MORE"
          2 2     ,"SOMETHING","TEST","SOMETHING MORE"
          3 3     ,"SOMETHING","SOMETHING MORE","TEST"
          4 4     ,"SOMETHING","SOMETHING MORE","YET MORE SOMETHING"

The line number is generated by the table function.

I am only interested in the rows that contain the string "TEST", therefore, I can refine my search by changing the Select statement:

SELECT * FROM TABLE(QSYS2.IFS_READ('/home/MyFolder/TABLE1.csv'))
WHERE LINE LIKE '%TEST%'

By using the where clause with the wildcard "%TEST%" I am returned only those rows that contain that string:

LINE_NUMBER LINE
----------- -------------------------------------------
          1 1     ,"TEST","SOMETHING","SOMETHING MORE"
          2 2     ,"SOMETHING","TEST","SOMETHING MORE"
          3 3     ,"SOMETHING","SOMETHING MORE","TEST"

In the interest of keeping things K.I.S.S. I decided to create the final statement as a Common Table Expression, CTE:

     -- Get list of files from IFS folder
01  WITH T1 (PATH_NAME)
02  AS
03  (SELECT PATH_NAME FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/home/MyFolder',
04                                                           'NO','*STMF'))),

     -- See if any of those files contain string "TEST"
05  T2 (PATH_NAME,LINE_NUMBER)
06  AS
07  (SELECT T1.PATH_NAME,B.LINE_NUMBER FROM T1,
08  LATERAL
09  (SELECT LINE_NUMBER FROM TABLE(QSYS2.IFS_READ(T1.PATH_NAME))
10   WHERE LINE LIKE '%TEST%') B)

11  SELECT * FROM T2 ORDER BY 1,2;

Lines 1 – 4: In the first part of the CTE I am creating a virtual table that contains a list of all the files ("*STMF") in my folder, and not in any subfolders it may contain. As there is a second part of the CTE the first part must end with a comma ( , ).

Lines 5 – 10: In this second part I am taking the list of files and then checking to see if any contain the string "TEST". This I do by joining the view, T1, to the table function, IFS_READ, with a LATERAL. The results of the lateral are place in the virtual table T2. As only the SELECT line follows this part of the CTE does not end with a comma.

Line 11: Last, but not least, is the Select statement that is used to show the results from, in this case, the virtual table T2.

My results show the path name of the file that has rows that contain "TEST" and the line number of the row that does.

PATH_NAME                   LINE_NUMBER
--------------------------  -----------
/home/MyFolder/TABLE1.csv             1
/home/MyFolder/TABLE1.csv             2
/home/MyFolder/TABLE1.csv             3
/home/MyFolder/TABLE3.csv             2
/home/MyFolder/TABLE3.csv             3
/home/MyFolder/TABLE3.csv             4

 

This article was written for IBM i 7.4, and should work for some earlier releases too.

6 comments:

  1. Simon, great read and thanks for sharing. ‘grep’ in the shell tool can also give you some of the same results. It’s great to have so many tools to get the job completed. Thanks for sharing.

    ReplyDelete
    Replies
    1. And i think it would be a lot faster.
      But it's great that and how Simon shows that more and more is possible. (Although the shown technique is not always the way one should really go.)

      Delete
  2. Thank you forward

    ReplyDelete
  3. hello simon, is there an easy way to delete Files in IFS-Folder with SQL? for example alle files in IFS-Folder older than two month.

    ReplyDelete
    Replies
    1. There is not a command that allows you to delete with a date range in it.

      I would try using the QCMDEXC scalar function. If you add it to your SQL Select statement with the DEL or RMVLNK command it would delete that row in the results. Many rows = many deletes.

      Delete
  4. Very useful. I have users ask for this all the time.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.