I am sure every application has a need to validate email addresses. After finding a question on Facebook I decided to create an email validator just to see how easy, or hard, it would be.
If this was going to be something that could be used from multiple programs, it made sense to put the logic in a procedure that I can then bind into any program that needs this functionality.
For this scenario I have created two new objects:
- MODULE01: The *MODULE that contains the procedure ValidateEmail
- RPGPGM01: A *PGM that needs to call the procedure to validate various email addresses
IMHO it makes sense to start by showing and explaining the procedure. As it is long, rather than show it all at once I am going to show it in two parts. The first part:
01  **free
02  ctl-opt nomain ;
03  dcl-pr ValidateEmail char(1) ;
04    *n varchar(100) value ;
05  end-pr ;
06  dcl-proc ValidateEmail export ;
07    dcl-pi *n char(1) ;
08      EmailAddress varchar(100) value ;
09    end-pi ;
10    dcl-s Flag char(1) ;
11    dcl-s Domain varchar(100) ;
12    exec sql SET :Flag = REGEXP_COUNT(:EmailAddress,
13                         '^[[a-z0-9.!#$%&+]*+/=?^_`{|}~-]+' ||
14                         '@[a-z0-9-]+(?:\.[a-z0-9-]+)*$',
15                         'i') ;
16    if (Flag = '0') ;
17      return '1' ;  // Not valid
18    endif ;
The first part of the procedure validates that the email address is in the correct format. But first I have all the definitions:
Line 1: I say: "Free your thinking, free your (RPG) code".
Line 2: As this is external to the program I will be creating, I give this the NOMAIN control option. This informs the compiler that the module does not use the RPG cycle, and does not have a main procedure.
Lines 3 – 5: This is the procedure prototype for the procedure, ValidateEmail. It is passed a variable length character parameter of up to 100 characters, and returns a single character.
Line 6: Start of the procedure. The EXPORT is required as this procedure will be called from somewhere not within this module.
Lines 7 – 9: The procedure interface, which needs to match the procedure prototype. I never bother to give my procedure interfaces a name, therefore, I need to give it the name of "*N" for null. I do give the incoming parameter a name.
Lines 10 and 11: Definitions for the variables I will be using in this procedure.
Lines 12 – 15: This is where the "magic" happens. I am using a SQL regular expression to validate the format of the email address. I admit I did not write the "test pattern" myself, I copied it from someone who gave me permission to do so. What is it checking for?
An email address consists of four parts:
simon@email.com
- simon: Username, this can include a dot ( simon.hutchinson@email.com ) or a plus character, which is supported by Gmail ( simon+something@gmail.com )
- @: At sign
- email: Mailserver, which can be a subdomain (simon@sub.email.com)
- .com: Domain
Line 12: The first parameter is the variable that contains the email address.
Lines 13 and 14: This is the regular expression string, or pattern, which contains the characters to validate the contents of the first parameter. To fit in the space allowed I have had to split it onto two lines, and concatenate the two parts together using two pipe symbols ( || ).
Line 15: The third parameter tells the regular expression to ignore case.
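The SQL function returns the number of times the pattern is found. As the pattern is anchored to the start and the end of the string, that is '1' if the email address matches the pattern and '0' if it does not.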
Lines 16 – 18: All of the procedures I write return a '0' if it was successful, and '1' if it was not. As the regular expression returns '0' if the pattern does not match, I need to return '1' to whatever calls this procedure and leave the procedure.
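One way to try the pattern on its own, before wrapping it in the procedure, is a small test program along the lines of the one below. This is only a sketch: the procedure ShowCount and the variables Address, Matches, and Msg are names made up for this illustration and are not part of MODULE01. The pattern is copied from lines 13 and 14 above.

```rpgle
**free
// Sketch only: ShowCount, Address, Matches and Msg are illustrative
// names, they are not part of MODULE01
ctl-opt main(Main) dftactgrp(*no) ;

dcl-proc Main ;
  ShowCount('simon@email.com') ;
  ShowCount('simon@email.com ') ;   // Trailing blank
end-proc ;

dcl-proc ShowCount ;
  dcl-pi *n ;
    Address varchar(100) value ;
  end-pi ;

  dcl-s Matches int(10) ;
  dcl-s Msg char(52) ;              // DSPLY is limited to 52 characters

  // Same pattern as the procedure, returned as a number
  exec sql SET :Matches = REGEXP_COUNT(:Address,
                          '^[[a-z0-9.!#$%&+]*+/=?^_`{|}~-]+' ||
                          '@[a-z0-9-]+(?:\.[a-z0-9-]+)*$',
                          'i') ;

  // Show the count followed by the string that was tested
  Msg = %char(Matches) + ' <- ' + Address ;
  dsply Msg ;
end-proc ;
```

Displaying the raw count next to the tested string makes it easier to see which test strings the pattern accepts before the DNS check is added.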
Onto the second part of the procedure. This is where I validate that the mailserver part of the email address is a valid domain name:
19    Domain = %subst(EmailAddress : %scan('@' : EmailAddress) + 1) ;
20    dow (*on) ;
21      Flag = '0' ;
22      exec sql SELECT '1' INTO :Flag FROM TABLE(QSYS2.DNS_LOOKUP(:Domain)) LIMIT 1 ;
23      if (Flag = '1') ;
24        leave ;
25      endif ;
26      Domain = %subst(Domain : %scan('.' : Domain) + 1) ;
27      if (%scan('.' : Domain) = 0) ;
28        leave ;
29      endif ;
30    enddo ;
31    if (Flag = '1') ;
32      return '0' ;  // Valid
33    else ;
34      return '1' ;  // Not valid
35    endif ;
36  end-proc ;
IMHO validating the format of the email address is only half of what I need to do. I cannot validate the username part, but I can validate the mailserver or domain part of the email address, and that is what the second part of the procedure does.
Line 19: I need to "extract" the mailserver and domain out of the email address. I use the substring built in function, %SUBST. The first parameter is the email address. The second is where I scan for the "@", and take everything to the right of that as the domain to test.
Line 20: Start of a loop to get and validate the domain name.
Line 21: I move zero to the variable Flag before I use it in the following SQL statement. I do this because if the SQL statement fails it does not change the value of Flag.
Line 22: I use the SQL Table function DNS_LOOKUP to validate the mailserver's domain. If the Table Function finds the domain '1' is placed in the variable Flag.
Lines 23 – 25: If the domain name is valid I exit the Do-loop.
Line 26: If this is a subdomain I need to remove the subdomain and be left with just the domain. I use the %SUBST to do that. The second parameter scans for a dot, and adds one to it to hopefully return the domain name.
Lines 27 – 29: I scan for a period again. If I don’t find one I know that this variable does not contain a domain name, just the TLD, for example COM. If this is true I want to exit the Do-loop.
Lines 31 – 35: As I said before it is my personal standard to return '0' if the validation was successful, therefore, if Flag is '1' I return '0'. And if Flag is not '1', '1' is returned.
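To see which names the loop would pass to DNS_LOOKUP, the string handling on lines 19, 26, and 27 can be run on its own. The sketch below assumes a standalone test program. The variable Msg and the program structure are made up for this illustration, and no DNS lookup is performed.

```rpgle
**free
// Sketch only: traces the candidate names, no DNS lookup is performed
ctl-opt main(Main) dftactgrp(*no) ;

dcl-proc Main ;
  dcl-s EmailAddress varchar(100) inz('john.doe@us.ibm.com') ;
  dcl-s Domain varchar(100) ;
  dcl-s Msg char(52) ;              // DSPLY is limited to 52 characters

  // Everything to the right of the at sign
  Domain = %subst(EmailAddress : %scan('@' : EmailAddress) + 1) ;

  dow (*on) ;
    // This is the value that would be passed to DNS_LOOKUP
    Msg = Domain ;
    dsply Msg ;

    // Remove everything up to and including the first dot
    Domain = %subst(Domain : %scan('.' : Domain) + 1) ;

    // No dot left means only the TLD remains, so stop
    if (%scan('.' : Domain) = 0) ;
      leave ;
    endif ;
  enddo ;
end-proc ;
```

With this address the program displays us.ibm.com and then ibm.com. The final value, com, is never displayed as the loop ends when no dot is left.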
The above source member is compiled to create a module. The module is then added to the binding directory TESTBNDDIR.
Onto the program that will call this procedure:
01  **free
02  ctl-opt bnddir('*LIBL/TESTBNDDIR') dftactgrp(*no) ;
03  dcl-pr MyProc ;
04    *n varchar(100) value ;
05  end-pr ;
06  MyProc('simon@gmail.com') ;
07  MyProc('SIMON@GMAIL.COM') ;
08  MyProc('simon@gmail.com ') ;
09  MyProc('simon @ GMail . com') ;
10  MyProc('SIMON@X.X') ;
11  MyProc('simon+something@gmail.com') ;
12  MyProc('john.doe@us.ibm.com') ;
13  MyProc('simon@google.co.uk') ;
14  *inlr = *on ;
15  dcl-proc MyProc ;
16    dcl-pi *n ;
17      Email varchar(100) value ;
18    end-pi ;
19    dcl-pr ValidateEmail char(1) ;
20      *n varchar(100) value ;
21    end-pr ;
22    dcl-s Short char(25) ;
23    Short = Email + ':' ;
24    if (ValidateEmail(Email) = '0') ;
25      dsply (%trimr(Short) + ' Email address is valid') ;
26    else ;
27      dsply (%trimr(Short) + ' Email address is NOT valid') ;
28    endif ;
29  end-proc ;
Line 2: I have included the binding directory control option so that I don’t have to remember to do so when I compile the program.
Lines 3 – 5: This is the prototype for a subprocedure in this program.
Lines 6 – 13: This procedure is called multiple times with various email addresses.
Line 15: The start of the subprocedure.
Lines 16 – 18: This is the procedure interface for this subprocedure.
Lines 19 – 21: Procedure prototype for the procedure above. I placed this within the subprocedure as this is the only place where it will be called.
Lines 22 – 23: The Display operation code, DSPLY, has a limit of 52 characters, therefore, I cannot use the Email variable with it. As Email is a VARCHAR variable I don't need to right trim off any trailing blanks, I can just move its contents into Short. I also add a colon ( : ) after the email address so it will be easy to see if the email address has any trailing characters that would make it invalid.
Lines 24 – 28: I am calling the email validation procedure in the If statement, and using what it returns to condition which of two messages is shown by the DSPLY operation code.
What is shown when I call the above program:
DSPLY  simon@gmail.com: Email address is valid
DSPLY  SIMON@GMAIL.COM: Email address is valid
DSPLY  simon@gmail.com : Email address is NOT valid
DSPLY  simon @ GMail . com: Email address is NOT valid
DSPLY  SIMON@X.X: Email address is NOT valid
DSPLY  simon+something@gmail.com: Email address is valid
DSPLY  john.doe@us.ibm.com: Email address is valid
DSPLY  simon@google.co.uk: Email address is valid
The first email address is valid.
The second proves that the validation works for both upper and lower case characters.
The third is invalid as there is a blank/space at the end of the address, which should not be part of an email address.
The fourth is invalid as there are blanks/spaces between each part of the email address.
The fifth passes the regular expression test, but fails in the DNS check.
The sixth is valid.
IBM uses a country subdomain as part of their email address. The seventh result shows that my procedure can remove the subdomain and validate the mailserver's domain, "ibm.com", without it.
My process to check the domain works even for email addresses for websites that have a TLD like "co.uk".
This is a good example of moving logic that could be used in more than one program out of those programs, and into an external procedure that can then be used by any program.
This article was written for IBM i 7.5, and should work for some earlier releases too.
Hi Simon, something still not right in your domain validation. You only really need to check if everything after the @ sign resolves to an IP address using the QSYS2.DNS_LOOKUP service. You don't need to keep checking for dots. Even your checking for dots is not doing what you hope as it never updates Flag.
For the email address with the subdomain, US.IBM.COM, DNS_LOOKUP does not return a successful result. If I check for the next dot I get IBM.COM, which does.
As for the Flag variable not being updated: I test extensively so I know that it does where I need it to.