Sometimes I am asked a question by a colleague I think would be a good example to share in this blog. This week I was asked was there a way in a SQL statement to only return the records from a file where the last three positions are not numeric. For example:
- A2CDEF = Include
- ABC4EF = Exclude
There is an added complication as the string I need to test is contained within a ten long character field, but it can be of any length 1 - 10.
I quickly made a test file, TESTFILE, that I can use for these examples. The file contains the following records:
FLD001 ---------- ABCDEFGHIJ KLMNOP QRS1234567 TUV8W9X0Y 1Z2 |
I have explained in a previous post how to get the rightmost characters in a string, using SQL's RIGHT scalar function. Using this I can easily get the last three characters of every field:
SELECT FLD001,RIGHT(RTRIM(FLD001),3) FROM TESTFILE |
Which gives me:
FLD001 RIGHT ---------- ----- ABCDEFGHIJ HIJ KLMNOP NOP QRS1234567 567 TUV8W9X0Y X0Y 1Z2 1Z2 |
Note: The fourth result is "X"-zero-"Y"
Testing the last three characters are numeric is a rare example of where Db2 for i falls short.
Other SQL databases have a regular expression that would be perfect for this kind of a test:
RIGHT(RTRIM(FLD001),3) NOT LIKE '%[0-9]%' |
Alas, in Db2 I have to check each position in the last three characters if it is between 'A' and 'Z':
SELECT FLD001,RIGHT(RTRIM(FLD001),3) FROM TESTFILE WHERE SUBSTRING(RIGHT(RTRIM(FLD001),3),1,1) BETWEEN 'A' AND 'Z' AND SUBSTRING(RIGHT(RTRIM(FLD001),3),2,1) BETWEEN 'A' AND 'Z' AND SUBSTRING(RIGHT(RTRIM(FLD001),3),3,1) BETWEEN 'A' AND 'Z' |
Only two records are returned in the results:
FLD001 RIGHT ---------- ----- ABCDEFGHIJ HIJ KLMNOP NOP |
If I wanted to return all records where any of the last three characters contain a number I would replace the AND in the WHERE clause with OR.
SELECT FLD001,RIGHT(RTRIM(FLD001),3) FROM TESTFILE WHERE SUBSTRING(RIGHT(RTRIM(FLD001),3),1,1) BETWEEN 'A' AND 'Z' OR SUBSTRING(RIGHT(RTRIM(FLD001),3),2,1) BETWEEN 'A' AND 'Z' OR SUBSTRING(RIGHT(RTRIM(FLD001),3),3,1) BETWEEN 'A' AND 'Z' |
Which returns to me a few more rows in the results.
FLD001 RIGHT ---------- ----- ABCDEFGHIJ HIJ KLMNOP NOP TUV8W9X0Y X0Y 1Z2 1Z2 |
This article was written for IBM i 7.3, and should work for some earlier releases too.
Great job, Simon.
ReplyDeleteIt would be a bit more setup, but do you think the query is clearer if you use a user-defined function?
ReplyDeletecreate function isAlpha(c varchar(1))
returns char(1)
no external action
deterministic
contains sql
return case when upper(c) between 'A' and 'Z' then 'Y'
else 'N' end;
select fld001,
right(rtrim(fld001),3)
from testfile
where isAlpha(substring(right(trim(fld001),3),1,1)) = 'Y'
and isAlpha(substring(right(trim(fld001),3),2,1)) = 'Y'
and isAlpha(substring(right(trim(fld001),3),3,1)) = 'Y'
DB2 for i does have a regex function.
ReplyDeleteTo return only alpha characters in the last 3 positions:
SELECT FLD001,RIGHT(RTRIM(FLD001),3)
FROM TESTFILE
WHERE
REGEXP_INSTR(RTRIM(FLD001),'[0-9]',LENGTH(RTRIM(FLD001))-2) = 0
For checking whether the last 3 positions are numeric or not you can use the REGEXP_LIKE predicate:
ReplyDeleteSelect Text, Right(RTRIM(Text), 3)
from (Values('ABCDE123456'), ('XXX123'), ('AAAAAAAAA'), ('134567890ABC')) a (Text)
Where RegexP_Like(Right(Rtrim(Text), 3), '[^0-9]');
And here is another way without regular expressions to select all values that do not have numeric values on the last 3 positions:
Select Text, Right(RTRIM(Text), 3)
from (Values('ABCDE123456'), ('XXX123'), ('AAAAAAAAA'), ('134567890ABC')) a (Text)
Where Translate(Right(Rtrim(Text), 3), ' ', '1234567890') <> ''
Birgitta
with data (s) as (values
ReplyDelete('ABCDEFGHIJ'),
('KLMNOP'),
('QRS1234567'),
('TUV8W9X0Y'),
('1Z2'))
select s,
case when regexp_count(right(s, 3),'[^0-9]') = 0 then 'ok'
else 'not ok'
end
from data
==>
ABCDEFGHIJ not ok
KLMNOP not ok
QRS1234567 ok
TUV8W9X0Y not ok
1Z2 not ok
How about this?
ReplyDeleteSELECT
*
FROM
( VALUES('ABCDEFGHIA'), ('KLMNOP'), ('QRS1234567'), ('TUV8W9X0Y'), ('1Z2') ) AS dt(test_col)
WHERE
POSITION(TRANSLATE(RIGHT(test_col, 3), '@@@@@@@@@@', '0123456789'), '@') = 0;
BETWEEN 'A' AND 'Z' is not enough.
ReplyDeleteYou will miss the letters ÆØÅüöä
Instead I think you should use
NOT BETWEEN '0' AND '9'