I am not sure if this is the most accurate title for what I want to do, obscure part of an email address as a hint for someone trying to signon to a web faced application. If someone has forgotten the email address they used to register, part of the process is to present them with a hint of the email address they used to sign up for the service. It obscures part of the username of the email address. For example, the email address:
john.smith@gmail.com |
Would be obscured to become:
jXXXXXXXXX@gmail.com |
At present this is being handled within a RPG program. But as part of the modernize drive the idea is to have this information available in a SQL View. When this was brought up in a meeting one of the programmers suggested creating an User Defined Function, UDF, that would still be a RPG program to do this. I explained that was not necessary as it could be done completely within SQL. This post describes my recommendation.
Before I start showing code I want to show that I have a small file with three email addresses that I want to obscure:
01 SELECT EMAIL_ADDRESS FROM TESTFILE EMAIL_ADDRESS -------------------------- john.smith@gmail.com marymoore456@hotmail.com LIZA_MADISON@CORPORATE.NET |
Below is similar to the RPG program that proposed to become the basis for the UDF:
01 **free 02 dcl-f TESTFILE alias ; 03 dcl-s Position int(5) ; 04 dcl-s Obscured like(EMAIL_ADDRESS) ; 05 dcl-c FromChars 'abcdefghijklmnopqrstuvwxyz1234567890._' ; 06 dcl-c ToChars 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' ; 07 dow (*on) ; 08 read TESTFILE ; 09 if (%eof) ; 10 leave ; 11 endif ; 12 Position = %scan('@' : EMAIL_ADDRESS : 1) ; 13 Obscured = 14 %subst(EMAIL_ADDRESS : 1 : 1) + 15 %xlate(FromChars : ToChars : 16 %lower(%subst(EMAIL_ADDRESS : 2 : Position - 2))) + 17 %subst(EMAIL_ADDRESS : Position) ; 18 enddo ; |
Line 1: If it is not free then it cannot be modern RPG.
Line 2: I have defined the File or Table to use the alias names for the fields or columns. I do this as I prefer to use the long name, rather than short system names.
Line 3: I am defining an unsigned integer variable to contain where in the email address the at sign is ( @ ).
Line 4: This variable is defined to be the same as the EMAIL_ADDRESS from the File or Table.
Lines 5 and 6: I need these two constants that I will be using in the program.
Line 7: This is a never-ending loop. The loop does not end. I have to leave it to get out of it.
Lines 8 - 11: I read the input File or Table. If "end of file" is returned I leave the loop.
Line 12: I want to have the position of the at symbol in the email address in its own variable as I will be using this number in more than one place.
Lines 13 – 17: This is where I obscure the email address.
Line 14: I want the first character of the email to not be obscured, which I concatenate to…
Lines 15 and 16: I am using the %XLATE built in function, BiF, to translate the character in the username to the be "X". The first of the parameters is the constant of lower-case characters, numbers, and special characters I am going to be translating from. Which is followed by the constant of all "X", as I must have one "X" for every character in from constant. Those are followed by the value to be translated. What I have done might look complicated, but it is not. I have used a substring to extract from the username part of the email address from the second to the character to the one before the at sign. Then convert that string to lower case. I decided to convert the string to lower case as then I would only need to have all the lower-case letters in the From Characters constant.
I finally concatenate the at sign and domain name to the email address.
When the program is run I get the following results in the Obscured variable:
jXXXXXXXXX@gmail.com mXXXXXXXXXXX@hotmail.com LXXXXXXXXXXX@CORPORATE.NET |
How could I do the same in SQL? Let me start showing how I would determine the parts before coming to the final statement.
Let me start with determining where the at sign is:
01 SELECT EMAIL_ADDRESS, 02 LOCATE_IN_STRING(EMAIL_ADDRESS, '@') 03 FROM TESTFILE |
Line 2: I am using the LOCATE_IN_STRING to find and return the location of the at sign.
The results are:
EMAIL_ADDRESS 00002 -------------------------- ------ john.smith@gmail.com 11 marymoore456@hotmail.com 13 LIZA_MADISON@CORPORATE.NET 13 |
Next step is to extract the string I will be changing, the piece from the second character to just before the at sign.
01 SELECT EMAIL_ADDRESS, 02 LOCATE_IN_STRING(EMAIL_ADDRESS, '@'), 03 SUBSTR(EMAIL_ADDRESS, 2,(LOCATE_IN_STRING(EMAIL_ADDRESS, '@') - 2)) 04 FROM TESTFILE |
Line 3: I substring from the email address, starting in the second position, to the place before the at sign.
My results are:
EMAIL_ADDRESS 00002 00003 -------------------------- ------ ----------- john.smith@gmail.com 11 ohn.smith marymoore456@hotmail.com 13 arymoore456 LIZA_MADISON@CORPORATE.NET 13 IZA_MADISON |
In the statement below I bring the parts from the above and combine them into a new statement to create the obscured email address:
01 SELECT EMAIL_ADDRESS, 02 SUBSTR(EMAIL_ADDRESS, 1, 1) || 03 REGEXP_REPLACE(LOWER(SUBSTR(EMAIL_ADDRESS, 2, (LOCATE_IN_STRING(EMAIL_ADDRESS, '@') - 2))), '[a-z]|[0-9]|[._]', 'x') 04 || SUBSTR(EMAIL_ADDRESS, LOCATE_IN_STRING(EMAIL_ADDRESS, '@')) 05 FROM TESTFILE |
Line 2: I use the substring function to extract the first character of the username of the email address, and by using the double pipe characters ( || ) concatenate it to…
Line 3: I am using the REGEXP_REPLACE function to do the obscuring of the part I want.
As I did in the RPG program I extract the string using the substring function and then convert it to lower case using the LOWER function.
On the third line is the part that tells REGEXP_REPLACE what to replace and with what. The first part instructs the function to replace the range: ("a" – "z") or ("0" – "9") or (period, underscore) with value in the next parameter. This happens to be the letter "x".
Line 4: I concatenate all of the above with remainder of the email address, from the at sign and whatever follows.
My results:
EMAIL_ADDRESS 00002 -------------------------- -------------------------- john.smith@gmail.com jxxxxxxxxx@gmail.com marymoore456@hotmail.com mxxxxxxxxxxx@hotmail.com LIZA_MADISON@CORPORATE.NET Lxxxxxxxxxxx@CORPORATE.NET |
But this is not a View, which what was wanted. I would define my View as:
01 CREATE OR REPLACE VIEW QTEMP.EMAIL_ADDRESSES 02 (EMAIL_ADDRESS,OBSCURED_EMAIL_ADDRESS) 03 AS 04 (SELECT EMAIL_ADDRESS, 05 SUBSTR(EMAIL_ADDRESS, 1, 1) || 06 REGEXP_REPLACE(LOWER(SUBSTR(EMAIL_ADDRESS, 2, (LOCATE_IN_STRING(EMAIL_ADDRESS, '@') - 2))), '[a-z]|[1-9]|[._]', 'x') || 07 SUBSTR(EMAIL_ADDRESS, LOCATE_IN_STRING(EMAIL_ADDRESS, '@')) 08 FROM TESTFILE) |
Line 1: My View will be called EMAIL_ADDRESSES.
Line 2: This is a very small View with just two columns: the email address and the obscured email address.
Lines 4 – 8: The same as the previous SQL statement.
When the View has been created I can just use the obscured email address straight from the View.
SELECT * FROM EMAIL_ADDRESSES EMAIL_ADDRESS OBSCURED_EMAIL_ADDRESS -------------------------- -------------------------- john.smith@gmail.com jxxxxxxxxx@gmail.com marymoore456@hotmail.com mxxxxxxxxxxx@hotmail.com LIZA_MADISON@CORPORATE.NET Lxxxxxxxxxxx@CORPORATE.NET |
By the end of the meeting the others agreed that it was better "to do it all in SQL", rather than use the RPG program UDF.
This article was written for IBM i 7.5, and should work for some earlier releases too.
What if there are special characters? Instead of translating specific characters, it would be better to catenate the required count of 'X's.
ReplyDeleteWhat kind of special characters are you thinking of?
DeleteAny valid special characters could be added to the FROM_CHAR constant.
Further to my previous comment, for better obscuration, inserting a set amount of 'X's instead of the exact count of letter being replaced. Eg. Regardless of the length of the email address, you could have the first letter plus something like 10 Xs, then the .domain.
ReplyDeleteTo mask at the database level (so it also applies to ODBC or file transfer) you can use sql masks. But be careful, once masked if one of the conditions is not true it is no longer possible to know the value of the field.
ReplyDeleteCREATE MASK EMAILMASK ON TESTFILE
FOR COLUMN EMAIL_ADDRESS RETURN
CASE WHEN (SESSION_USER = 'USER1') THEN
EMAIL_ADDRESS
WHEN (SESSION_USER = 'USER2') THEN
'XXX-XXX' || SUBSTR(EMAIL_ADDRESS, 8, 43)
WHEN (EMAIL_ADDRESS > 'abc') THEN
'YYYYYYYYYY' || SUBSTR(EMAIL_ADDRESS, 11, 40)
ELSE ' '
END