I can remember many years ago having a conversation with a non-AS400 (yes, it was that long ago) person about the programming environment we worked with. He scoffed at the AS400 stating he could never work on it as it did not support Regular Expressions. I am not going to describe in too much detail what Regular Expressions, sometimes called regex or regexp, are as others have done a good job doing that. To give a very simple explanation: it is a sequence of characters that define a search pattern, which is then used to find or find and replace parts of strings. If you are interested in learning more check the Wikipedia page about regular expressions here.
Within the Db2 for i PTFs that accompanied IBM i TR2 and 7.2 TR6 a number of regular expressions functions were introduced to SQL. I am not going to describe them all in one post, as it would be too long, as I feel they merit their own posts. To start this post is about the REGEXP_LIKE. I am just going to give a simple overview of how to use this, if you want more details or want to learn how to make complex statements I suggest you visit the page on IBM's KnowledgeCenter, link at the bottom of this post, or use your favorite search engine to search for "REGEXP_LIKE". The posts about REGEXP_LIKE for Oracle's PL/SQL I found helpful.
For these examples I built a Db2 table to contain the data I will be using:
01 CREATE OR REPLACE TABLE MYLIB.PERSON ( 02 PERSON_ID FOR "PID" INTEGER 03 GENERATED ALWAYS AS IDENTITY 04 (START WITH 1, INCREMENT BY 1, NOCYCLE), 05 FIRST_NAME FOR "FNAME" VARCHAR(20), 06 LAST_NAME FOR "LNAME" VARCHAR(30) 07 ) ; 08 LABEL ON COLUMN PERSON ( 09 PERSON_ID IS 'Person id', 10 FIRST_NAME IS 'First name', 11 LAST_NAME IS 'Last name' 12 ) ; |
Line 1: As of IBM i 7.2 I can use CREATE OR REPLACE to define my table. If I change the definition of the table I can run this statement again and it will change the table to match the new definition, and map the contents of the table to the new definition.
Lines 2 – 4: I am defining an identity column to help keep the rows unique. For more details on its definition refer to the post about Temporal tables. Personally, I think the definition is self explanatory.
Lines 5 and 6: I have defined two columns for the first and last names of the people to be included in this table.
Lines 8 – 12: I want to give this table decent column headings. These are the equivalent of DDS's COLHDG.
I added rows to the table using the INSERT statement.
INSERT INTO PERSON VALUES(DEFAULT,'ROBERT','ALLEN') ; INSERT INTO PERSON VALUES(DEFAULT,'John','Austin') ; INSERT INTO PERSON VALUES(DEFAULT,'don','bennett') ; |
Notice that I used DEFAULT for the value of the first row. As this is the identity column it will increment itself when rows are added, I do not give it value.
The names I have loaded the table with are the names of the Coventry City Football Club squad from the 1961-62 season.
Let me start with the my first Select statement.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'Y') |
I have given two of the REGEXP-LIKE parameters. The first is the name of the column to perform the test upon. The second is the character(s) to search for. There are others, but I have not used them in this statement. Here I want my result set to include all of the players whose last name has a "Y" in it.
Person First Last id name name 7 ALAN DALEY 19 BILLY MYERSCOUGH |
As I entered an upper case "Y" all of the players with that in their last name were returned. But I know I have players with a lower case "y" in the last name too. Fortunately I can use the third parameter to ignore the case.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'Y','i') |
By using a lower case 'i' as the third parameter the results will return all of the last names that contain a "Y", regardless of case. The "i" must be lower case, an upper case version will give you an error. As this is a utility taken from Unix it is case sensitive.
Person First Last id name name 7 ALAN DALEY 9 lol harvey 19 BILLY MYERSCOUGH 21 reg ryan |
I could make my search string be more than one character if I so desire.
In this next I want to find all of the rows where the last name contains a "Y" or "Z", regardless of case. I use the pipe ( | ) symbol to indicate that I want an "or".
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'y|z','i') |
The names returned are shown below. Frank is the only player with a "Z" in his name.
Person First Last id name name 7 ALAN DALEY 9 lol harvey 17 Frank Kletzenbauer 19 BILLY MYERSCOUGH 21 reg ryan |
If I wanted to test for any three characters in a name I could use the pipe, but if there were more than three it would look unwieldy. I can use the square brackets ( [ ] ) and place the values I want to search for within, think of it like a list of characters.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'f|y|z','i') SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[fyz]','i') |
Now I have Ron Framer added to the results.
Person First Last id name name 7 ALAN DALEY 8 Ron Framer 9 lol harvey 17 Frank Kletzenbauer 19 BILLY MYERSCOUGH 21 reg ryan |
I can also use a hyphen ( - ) for a range of values within the square brackets. With this statement I want all of the players whose last name contains a letter between "I" and "K".
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[i-k]','i') |
This gives me a long list of players.
Person First Last id name name 2 John Austin 4 DIETMAR BRUCK 5 Brian Caine 6 george curtis 10 RON HEWITT 11 Brian Hill 12 peter hill 14 Stuart Imlach 15 eric jones 16 MICK KEARNS 17 Frank Kletzenbauer 18 arthur lightening 20 Brian Nicholas 23 Nelson Stiffle |
I can give two ranges to only return those players whose names contain a letter from "I" – "K" followed by a letter in the range "A" – "E".
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[i-k][a-e]','i') |
The results are much smaller. "KEARNS" is returned as it has a "K" followed by an "E". "Nicholas" as the "i" is followed by "c".
Person First Last id name name 16 MICK KEARNS 20 Brian Nicholas |
If I want to check the variable I am testing starts with a letter I use the caret ( ^ ) symbol.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'^b','i') |
Which will give me.
Person First Last id name name 3 don bennett 4 DIETMAR BRUCK |
The dollar ( $ ) character is used to indicate that I want to look for the last character in the variable.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'t$|r$','i') |
I used the or ( | ) I get all of players whose name ends in "T" or "R".
Person First Last id name name 3 don bennett 8 Ron Framer 10 RON HEWITT 13 COLIN HOLDER 17 Frank Kletzenbauer |
I can also search for patterns. For example a vowel, followed by any character, and then another vowel. To indicate the "any character" I use a period/full stop ( . ).
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[aeiou].[aeiou]','i') |
Which returns the following.
Person First Last id name name 5 Brian Caine 7 ALAN DALEY 8 Ron Framer 10 RON HEWITT 15 eric jones 17 Frank Kletzenbauer 18 arthur lightening 20 Brian Nicholas |
And I just add another period/full stop if I want a list of all the players with vowels separated by two characters.
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[aeiou]..[aeiou]','i') |
Which gives me.
Person First Last id name name 1 ROBERT ALLEN 2 John Austin 3 don bennett 5 Brian Caine 6 george curtis 9 lol harvey 13 COLIN HOLDER 14 Stuart Imlach 17 Frank Kletzenbauer 20 Brian Nicholas 25 BOB WESSON |
In this last example I want to list all of the players who have two "T" in their last name. I could have "tt" in my search patter, but there is another way. If I put the character(s) I want to search for in square brackets and the number of occurrences in curly brackets ( { } ).
SELECT * FROM PERSON WHERE REGEXP_LIKE(LNAME,'[t]{2}','i') |
Only two players have "TT" in their last names.
Person First Last id name name 3 don bennett 10 RON HEWITT |
There is a lot more you can do with the REGEXP_LIKE, this is just intended as a quick introduction. For more of what you can do with it click on the link below to the KnowledgeCenter page, below. Thankfully this is POSIX compliant, therefore, examples from other SQL databases that are also POSIX compliant can be used.
Other Db2 for i regular expressions:
You can learn more about the REGEXP_LIKE command from the IBM website here.
This article was written for IBM i 7.3 TR2 and 7.2 TR6.
This is Great we just started an application using regular expression. *Note we are still on 7.1 and this is working for us.
ReplyDeleteAlso house keeping notes.
Your table is using MYLIB but your labels are using QTEMP.
Also your blog is only showing 3 inserts but your examples are using other data.
**else this is a great resource looking forward to more from this powerful tool.
Good catch! And a big oops on my part.
DeleteAll of the references to QTEMP have been removed.
I only gave three of the insert statements just as an example. Trust me the other 22 looked pretty much the same, so I did not want to bother to take up space above.
Thanks
This is a great tool for pattern matching!
ReplyDeleteThanks for your article. As an SQL novice, I found it very helpful.
ReplyDeleteExcellent article. Investigating the topic further, there are three other services that use REGEXP: SUBSTR, SCAN and REPLACE which will allow REGEXP functionality in other SQL clauses as well. For example this allowed me to find the top email domains of our customers or at least got me started:
ReplyDeleteSELECT REGEXP_SUBSTR(CONTACTFORM , '@...*') ,count(*)
FROM CUSTOMERS
WHERE REGEXP_LIKE(CONTACTFORM ,'@','i')
and DELT ='N'
group by REGEXP_SUBSTR(CONTACTFORM , '@...*')
order by REGEXP_SUBSTR(CONTACTFORM , '@...*')
All of those other REGEXP functions will be covered in future posts.
DeleteThis an excellent, clear article, Simon. Thank you.
ReplyDeleteHi Simon, I just found this great website of yours! I immediately tried the regexp_like expression, and am receiving this error: QQQSVREG in QSYS type *SRVPGM not found. Is there a beginning to this discussion (i.e. whether these commands must be used within a program, or can be used on command line SQL). I typically build my SQL statements on command line, then convert to SQLRPGLE. I found the manual, but in truth, the last time I read 2k pages, it was the Game of Thrones series, and I don't have an entire summer to spend flipping through this behemoth. Thanks again for posting these most intriguing bon bons!
ReplyDeleteIf you are missing that service program then you must not be at either the right release or not have the latest TRs loaded.
DeleteGo talk to you System Admin ask him what release you are on? If he says 7.2 or 7.3 then ask him to install the latest TR for that release.