Wednesday, February 13, 2019

Returning rows where the last few characters are not numeric

checking last 3 characters if not number

Sometimes I am asked a question by a colleague I think would be a good example to share in this blog. This week I was asked was there a way in a SQL statement to only return the records from a file where the last three positions are not numeric. For example:

  • A2CDEF = Include
  • ABC4EF = Exclude

There is an added complication as the string I need to test is contained within a ten long character field, but it can be of any length 1 - 10.

I quickly made a test file, TESTFILE, that I can use for these examples. The file contains the following records:

FLD001
----------
ABCDEFGHIJ
KLMNOP
QRS1234567
TUV8W9X0Y
1Z2

I have explained in a previous post how to get the rightmost characters in a string, using SQL's RIGHT scalar function. Using this I can easily get the last three characters of every field:

SELECT FLD001,RIGHT(RTRIM(FLD001),3)
FROM TESTFILE

Which gives me:

FLD001      RIGHT
----------  -----
ABCDEFGHIJ  HIJ
KLMNOP      NOP
QRS1234567  567
TUV8W9X0Y   X0Y
1Z2         1Z2

Note: The fourth result is "X"-zero-"Y"

Testing the last three characters are numeric is a rare example of where Db2 for i falls short.

Other SQL databases have a regular expression that would be perfect for this kind of a test:

RIGHT(RTRIM(FLD001),3) NOT LIKE '%[0-9]%'

Alas, in Db2 I have to check each position in the last three characters if it is between 'A' and 'Z':

SELECT FLD001,RIGHT(RTRIM(FLD001),3)
FROM TESTFILE
WHERE SUBSTRING(RIGHT(RTRIM(FLD001),3),1,1) BETWEEN 'A' AND 'Z'
AND SUBSTRING(RIGHT(RTRIM(FLD001),3),2,1) BETWEEN 'A' AND 'Z'
AND SUBSTRING(RIGHT(RTRIM(FLD001),3),3,1) BETWEEN 'A' AND 'Z'

Only two records are returned in the results:

FLD001      RIGHT
----------  -----
ABCDEFGHIJ  HIJ
KLMNOP      NOP

If I wanted to return all records where any of the last three characters contain a number I would replace the AND in the WHERE clause with OR.

SELECT FLD001,RIGHT(RTRIM(FLD001),3)
FROM TESTFILE
WHERE SUBSTRING(RIGHT(RTRIM(FLD001),3),1,1) BETWEEN 'A' AND 'Z'
OR SUBSTRING(RIGHT(RTRIM(FLD001),3),2,1) BETWEEN 'A' AND 'Z'
OR SUBSTRING(RIGHT(RTRIM(FLD001),3),3,1) BETWEEN 'A' AND 'Z'

Which returns to me a few more rows in the results.

FLD001      RIGHT
----------  -----
ABCDEFGHIJ  HIJ
KLMNOP      NOP
TUV8W9X0Y   X0Y
1Z2         1Z2

 

This article was written for IBM i 7.3, and should work for some earlier releases too.

7 comments:

  1. It would be a bit more setup, but do you think the query is clearer if you use a user-defined function?

    create function isAlpha(c varchar(1))
    returns char(1)
    no external action
    deterministic
    contains sql
    return case when upper(c) between 'A' and 'Z' then 'Y'
    else 'N' end;

    select fld001,
    right(rtrim(fld001),3)
    from testfile
    where isAlpha(substring(right(trim(fld001),3),1,1)) = 'Y'
    and isAlpha(substring(right(trim(fld001),3),2,1)) = 'Y'
    and isAlpha(substring(right(trim(fld001),3),3,1)) = 'Y'

    ReplyDelete
  2. DB2 for i does have a regex function.

    To return only alpha characters in the last 3 positions:
    SELECT FLD001,RIGHT(RTRIM(FLD001),3)
    FROM TESTFILE
    WHERE
    REGEXP_INSTR(RTRIM(FLD001),'[0-9]',LENGTH(RTRIM(FLD001))-2) = 0

    ReplyDelete
  3. For checking whether the last 3 positions are numeric or not you can use the REGEXP_LIKE predicate:
    Select Text, Right(RTRIM(Text), 3)
    from (Values('ABCDE123456'), ('XXX123'), ('AAAAAAAAA'), ('134567890ABC')) a (Text)
    Where RegexP_Like(Right(Rtrim(Text), 3), '[^0-9]');

    And here is another way without regular expressions to select all values that do not have numeric values on the last 3 positions:
    Select Text, Right(RTRIM(Text), 3)
    from (Values('ABCDE123456'), ('XXX123'), ('AAAAAAAAA'), ('134567890ABC')) a (Text)
    Where Translate(Right(Rtrim(Text), 3), ' ', '1234567890') <> ''

    Birgitta

    ReplyDelete
  4. with data (s) as (values
    ('ABCDEFGHIJ'),
    ('KLMNOP'),
    ('QRS1234567'),
    ('TUV8W9X0Y'),
    ('1Z2'))
    select s,
    case when regexp_count(right(s, 3),'[^0-9]') = 0 then 'ok'
    else 'not ok'
    end
    from data
    ==>
    ABCDEFGHIJ not ok
    KLMNOP not ok
    QRS1234567 ok
    TUV8W9X0Y not ok
    1Z2 not ok

    ReplyDelete
  5. How about this?

    SELECT
    *
    FROM
    ( VALUES('ABCDEFGHIA'), ('KLMNOP'), ('QRS1234567'), ('TUV8W9X0Y'), ('1Z2') ) AS dt(test_col)
    WHERE
    POSITION(TRANSLATE(RIGHT(test_col, 3), '@@@@@@@@@@', '0123456789'), '@') = 0;

    ReplyDelete
  6. BETWEEN 'A' AND 'Z' is not enough.
    You will miss the letters ÆØÅüöä

    Instead I think you should use
    NOT BETWEEN '0' AND '9'

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.