I have been asked multiple times if there is an easy way to capture a list of jobs with the amount of CPU they, with a program. The answer is, of course, "Yes". I thought it was time to give an example of a program example of how I would do this.
My scenario is that I need to capture all the jobs in this partition with the CPU percentage they are using, every five minutes. This data needs to be captured in an output file, or in this example it's a table. The information I need is:
- CPU percent
- Job name
- Subsystem and the library it is found in
- User name
- Job type – batch, interactive, prestart, etc.
- Function and function type
- Job status
- CPU time
- Time the data was captured
Fortunately all of this can be captured from the ACTIVE_JOB_INFO table function.
First I need to create the output file. As I said before this will be a SQL DDL table. I can create a Table based on another using SQL, in this situation based on the columns returned from ACTIVE_JOB_INFO:
01 CREATE TABLE MYLIB.ACTIVE_JOB_HISTORY 02 FOR SYSTEM NAME "ACTJOBHIST" 03 (RUN_TIME,CPU_PERCENT,JOB_NAME,SUBSYSTEM,SUBSYSTEM_LIBRARY, 04 USER_NAME,JOB_TYPE,FUNCTION_TYPE,FUNCTION,JOB_STATUS,CPU_TIME) 05 AS 06 (SELECT CURRENT_TIMESTAMP,ELAPSED_CPU_PERCENTAGE,JOB_NAME,SUBSYSTEM, 07 SUBSYSTEM_LIBRARY_NAME,AUTHORIZATION_NAME,JOB_TYPE,FUNCTION_TYPE, 08 FUNCTION,JOB_STATUS,CPU_TIME 09 FROM TABLE(QSYS2.ACTIVE_JOB_INFO())) 10 DEFINITION ONLY ; |
Lines 1 and 2: Create this table with a long SQL name and a short system name.
Lines 3 and 4: List of what the columns will be called in this new Table.
Lines 6 – 9: SQL statement to capture the columns of information I want from ACTIVE_JOB_INFO. The only exception is the first column, CURRENT_TIMESTAMP.
Line 10: This means that the Table is just created, no data is written to it.
Now I have my output file I need a program to fill it:
01 **free 02 ctl-opt main(Main) option(*nodebugio:*srcstmt) dftactgrp(*no) alwnull(*usrctl) ; 03 dcl-proc Main ; 04 dcl-pr sleep int(10) extproc('sleep') ; 05 *n uns(10) value ; 06 end-pr ; 07 dcl-s Seconds int(10) inz(300) ; // 300 secs = 5 mins 08 dcl-s CurrentTime timestamp ; 09 dcl-s MessageText char(512) ; 10 dcl-s Nothing char(1) ; 11 MessageText = 'ACTIVE_JOB started ' ; 12 exec sql CALL QSYS2.SEND_MESSAGE('CPF9898',512,:MessageText,'*LIBL','QCPFMSG') ; 13 dow (*on) ; 14 CurrentTime = %timestamp() ; 15 exec sql INSERT INTO ACTIVE_JOB_HISTORY 16 SELECT :CurrentTime,ELAPSED_CPU_PERCENTAGE,JOB_NAME,SUBSYSTEM, 17 SUBSYSTEM_LIBRARY_NAME,AUTHORIZATION_NAME,JOB_TYPE,FUNCTION_TYPE, 18 FUNCTION,JOB_STATUS,CPU_TIME 19 FROM TABLE(QSYS2.ACTIVE_JOB_INFO()) 20 WHERE ELAPSED_CPU_PERCENTAGE > 0 21 ORDER BY ELAPSED_CPU_PERCENTAGE DESC ; // Reset stats 22 exec sql SELECT '1' INTO :Nothing FROM TABLE(QSYS2.ACTIVE_JOB_INFO(RESET_STATISTICS => 'YES')) LIMIT 1 ; 23 sleep(Seconds) ; 24 enddo ; 25 on-exit ; 26 MessageText = 'ACTIVE_JOB ended ' ; 27 exec sql CALL QSYS2.SEND_MESSAGE('CPF9898',512,:MessageText,'*LIBL','QCPFMSG') ; 26 end-proc ; |
Line 1: All my new code is always totally free RPG.
Line 2: My favorite control options. I also have the MAIN option as I will be using a Main procedure.
Line 3: Start of the Main procedure.
Lines 4 – 6: Procedure prototype of the external C procedure sleep. This allows us to "sleep", pause, the program for a number of seconds.
Line 7: Definition of the variable that will contain the number of seconds to sleep. It must be defined as a ten long integer. Here I am initializing the variable with 300 seconds, or 5 minutes.
Lines 8 – 10: Other variables, whose purpose I will describe later.
Lines 11 and 12: I want to send a message to the System Operator message queue that this job is starting. I am going to call this job and this program ACTIVE_JOB. I can add a message to the System Operator message queue using the SEND_MESSAGE SQL procedure. I need to pass this procedure the following parameters:
- Message id: I am using the message CPF9898 as this message has not default text, just the substitution parameter
- Message length: The substitution parameter is 512. If I pass just the length of the string for the message the rest of the message is filled with strange characters. If I use a parameter of 512 then I do not get those, and just the string I desire.
- Message text: This is where I am using the variable MessageText. On line 11 I move the string I want the message to be into the variable
- Message file library
- Message file
Line 13: Start of a never ending Do loop.
Line 14: I thought about using SQL's CURRENT_TIMESTAMP for the run time column in the output file. The danger is that the insert could take several microseconds which could result in records from the same "snapshot" having different timestamps. If I capture the timestamp in a variable I can use this variable for the run time for all the records of this "snapshot".
Lines 15 – 21: The Insert statement to add data to my output file. Things to notice are:
Line 16: I am using the RPG variable for the run time, therefore, it has to be prefixed with a colon so that SQL knows it is an RPG variable.
Line 20: I am only selecting rows from the Table function where the CPU percent is greater than zero.
Line 21: The order in which the data is returned from the Table function, and then inserted into the output file, is in CPU percent in descending order. In other words the job that is using the most CPU is the first to be inserted.
Line 22: I want my CPU percentage to be calculated for the time between this capture and the next, in five minutes time. Therefore, I need to reset the statistics. By using the RESET_STATISTICS parameter I tell the ACTIVE_JOB_INFO to reset itself.
Line 23: The program sleeps.
Line 24: End of the never ending Do loop.
I am sure you have noticed that there is no way to exit the Do loop, it just keeps going. The only way to end the program is to end the job. This is why I used a Main procedure. If I end the job in a cycle RPG program the program just ends. If a procedure ends, whether in a normal exit or when the job is ended the code in the ON-EXIT section is executed.
Line 25: Start of the ON-EXIT section, which will be executed when I cancel this job.
Lines 26 - 27: I write the message to the System Operator message queue that the job has ended.
I compile the above to create the program ACTIVE_JOB.
I can now submit it to batch so it can run and gather this information for me:
SBMJOB CMD(CALL PGM(MYLIB/ACTIVE_JOB)) JOB(ACTIVE_JOB) JOBQ(QS36EVOKE) |
Notice the job queue I have used, QS36EVOKE. Why did I use this job queue rather than one of the QBATCH* job queues?
Most people have their QBATCH* job queues to have only one active job at a time. As this job is never ending it means that in a "single threaded" job queue all the jobs submitted later would just queue up behind it waiting for this job to finish.
The job queue Q36EVOKE still submits the job to the QBATCH subsystem. But it has a no maximum number of jobs it can have running in the subsystem. By using Q36EVOKE I know that the job ACTIVE_JOB will not hold any others up.
When I end the job ACTIVE_JOB I can look in the system operator message queue to see when it started and ended.
Here I am using the MESSAGE_QUEUE_INFO the return the messages from any job that has the pattern "ACTIVE_JOB" in its job name:
01 SELECT MSG_TIME AS TIME,MSGID AS MSGID,MSG_TYPE AS TYPE, 02 CAST(MSG_TEXT AS CHAR(100) CCSID 37) AS TEXT 03 FROM QSYS2.MESSAGE_QUEUE_INFO 04 WHERE MSGQ_NAME = 'QSYSOPR' 05 AND FROM_JOB LIKE '%ACTIVE_JOB%' |
The results are:
TIME MSGID TYPE TEXT -------------------------- ------- ------------- ---------------------------------------- 2023-MM-DD 19:32:39.332633 CPF9898 INFORMATIONAL ??ACTIVE_JOB started 2023-MM-DD 19:32:54.592587 CPC1125 COMPLETION Job 999999/SIMON/ACTIVE_JOB was ended... 2023-MM-DD 19:32:54.609828 CPF9898 INFORMATIONAL ??ACTIVE_JOB ended |
Notice that the completion message comes before the job ended message. This shows that I ended the job, and the ON-EXIT group was executed afterwards.
Before I show the results I need to explain something. The first time the Do loop is executed no time has elapsed, therefore, no CPU has been used and there are no results to insert. When the Do loop loops the second time, time has elapsed and there is now data to insert into the output file.
The first five results from my testing looks like:
CPU_ SUBSYSTEM RUN_TIME PERCENT JOB_NAME SUBSYSTEM _LIBRARY -------------------------- ------- ----------------------- --------- --------- 2023-MM-DD 21:32:48.386692 44.33 833496/USER_1/SOME_JOB QBATCH QSYS 2023-MM-DD 21:32:48.386692 0.98 886083/QUSER/QZDASOINIT QUSRWRK QSYS 2023-MM-DD 21:32:48.386692 0.17 872079/QUSER/QZRCSRVS QUSRWRK QSYS 2023-MM-DD 21:32:48.386692 0.16 771163/QUSER/QZRCSRVS QUSRWRK QSYS 2023-MM-DD 21:32:48.386692 0.16 872084/QUSER/QZRCSRVS QUSRWRK QSYS USER_ JOB_ FUNCTION JOB_ CPU_ NAME TYPE _TYPE FUNCTION STATUS TIME ------ ---- -------- -------- ------ -------- USER_1 BCH PGM FUNCTION RUN 13060167 QUSER PJ <NULL> <NULL> PSRW 3808 QUSER PJ <NULL> <NULL> TIMW 20562 USER_2 PJ <NULL> <NULL> TIMW 169591 USER_3 PJ <NULL> <NULL> RUN 19325 |
I left this program running for five days and found that it never used enough CPU for it to appear in the generated output.
There you go a simple program to gather what jobs are running in your partition, and how much of the CPU they use.
This article was written for IBM i 7.5, and should work for some earlier releases too.
Instead of DoW (*On) I would suggest to write DoW not %Shtdn which allows the program to terminate nicely when performing a controlled end.
ReplyDelete