Included in the technical refreshes for IBM i 7.3 TR5 and 7.2 TR9 was a new Db2 for i built in function to convert character strings into numbers.
Why is this needed when I can just CAST the one type of a column to another?
01 SELECT COLUMN1, 02 CAST(COLUMN1 AS DECIMAL(10,2)) 03 FROM QTEMP.TESTFILE |
Numbers are problematic as I can make character representations of them in many different ways. I created a few examples of character representations of numbers, and then used the above SQL statement to display them.
COLUMN1 CAST function 12345 12,345.00 12345.67 12,345.67 12345.678 12,345.67 12,345.67 ++++++++++++++ -1234.56 1,234.56- 1234- ++++++++++++++ +1234 1,234.00 $12,345.67 ++++++++++++++ 0000001.0000 1.00 |
The lines of plus symbols ( + ) denote values that could not be translated by the CAST. All my results have two decimal places even when the original value was an integer.
The new built in function comes with two names, DECFLOAT_FORMAT and TO_NUMBER, and they both do the same things. If I add them to the previous SQL statement:
01 SELECT COLUMN1, 02 CAST(COLUMN1 AS DECIMAL(10,2)) 03 DECFLOAT_FORMAT(COLUMN1), 04 TO_NUMBER(COLUMN1) 05 FROM QTEMP.TESTFILE |
I get the following results:
COLUMN1 CAST function DECFLOAT_FORMAT TO_NUMBER 12345 12,345.00 12345 12345 12345.67 12,345.67 12345.67 12345.67 12345.678 12,345.67 12345.678 12345.678 12,345.67 ++++++++++++++ +++++++++++++++ +++++++++ -1234.56 1,234.56- -1234.56 -1234.56 1234- ++++++++++++++ +++++++++++++++ +++++++++ +1234 1,234.00 1234 1234 $12,345.67 ++++++++++++++ +++++++++++++++ +++++++++ 0000001.0000 1.00 1.0000 1.0000 |
The new built in functions cannot translate same character values that the CAST could not, but that the numbers returned are more accurate representations of the original in the character column.
Before I go any further I am going to explain why I use TO_NUMBER, rather than DECFLOAT_FORMAT. In my opinion the TO_NUMBER is more descriptive of what this built in function does, converting a character string to a number. The name DECFLOAT_FORMAT is not as clear. Therefore, for the rest of this post I will be using TO_NUMBER. Just remember that as they are the same what works/does not work with TO_NUMBER is the same with DECFLOAT_FORMAT.
These built in functions have two parameters:
- The column or string to convert.
- The format string, the format of the character representation of the number.
In the above example I did not use a format string, therefore, the built in function's default was used. I am not going to list all of them here, if you are interested click on the link at the bottom of this post.
In the below example I am saying that all of the character representations of the numbers are thousand separated and have two decimal places. I can use either 0 (zeroes) or 9 to represent the numeric characters in the string. Personally I prefer using 9.
01 SELECT COLUMN1, 02 TO_NUMBER(COLUMN1,'999,999,999.99') 03 FROM QTEMP.TESTFILE |
As you have seen above not all of the "numbers" in COLUMN1 are in that format. Thus, I receive the following error:
Query cannot be run. See lower level messages. |
When I go to the job log and look at the lower level messages I see:
Program or service program QSQSETDIT in library QSYS ended. Reason code 5. User-defined function error on member TESTFILE. Cancel reply received for message CPF503E. |
From experience I have found that if not all the "numbers" are in the given format then reason code 5 is returned by the first message. I am not going to show what reason code 5 means, as in my opinion it is confusing. Just take my word that if you see reason code 5 you have at least one "number" not in the expected format.
For the first "number" that could not be translated contains a thousand separator, 12,345.67, therefore, I would use the following format string:
01 SELECT TO_NUMBER('12,345.67','999,999,999.99') 02 FROM SYSIBM.SYSDUMMY1 TO_NUMBER 12345.67 |
The next number that I had a problem with had a minus sign at the end. Even though this number does not have thousand separators I have found the format string can. The MI indicates that the minus sign is at the end of the number:
01 SELECT TO_NUMBER('1234-','999,999,999MI') 02 FROM SYSIBM.SYSDUMMY1 TO_NUMBER -1234 |
For some reason a S is used to indicate that there is a leading minus sign, an example will be given later.
The last "number" had a currency symbol, $, and thousand separators:
01 SELECT TO_NUMBER('$12,345.67','$999,999,999.99') 02 FROM SYSIBM.SYSDUMMY1 TO_NUMBER 12345.67 |
I can put commas ( , ) anywhere in the "number" string and providing the format string matches I can get a valid number, for example:
01 SELECT TO_NUMBER('123,45,6789,0','999,99,9999,9') 02 FROM SYSIBM.SYSDUMMY1 TO_NUMBER 1234567890 |
In reality we do not get columns of data in different numeric formats, all of the values adhere to one number format. I receive a file twice a week that contains two columns of "numbers":
- Quantity
- Amount in US dollars
Using these built in functions I can convert the character "number" to a real number:
01 SELECT QTYCHAR, 02 TO_NUMBER(QTYCHAR,'S999,999,999.99'), 03 AMTCHAR, 04 TO_NUMBER(AMTCHAR,'S$999,999.99') 05 FROM QTEMP.TESTFILE QTYCHAR TO_NUMBER AMTCHAR TO_NUMBER 12,345.00 12345.00 $12,345.00 12345.00 12,345.67 12345.67 $12,345.67 12345.67 -1,234.56 -1234.56 -$1,234.56 -1234.56 123.00 123.00 $123.00 123.00 0.00 0.00 $0.00 0.00 |
The real proof is if I can take those generated numbers and insert them into decimal columns in another table. My output table just has two columns, both of them decimal type.
01 CREATE TABLE QTEMP.OUTFILE ( 02 AMOUNT DECIMAL(10,2), 03 QUANTITY DECIMAL(10,2) 04 ) ; |
The Insert statement inserts the converted amount and quantity columns in all of the row from TESTFILE into the OUTFILE. The amount column could have a leading minus sign, a currency symbol, and has thousand separators, The quantity column can have a leading minus sign, and has thousand separators.
01 INSERT INTO QTEMP.OUTFILE 02 (SELECT TO_NUMBER(AMTCHAR,'S$999,999,999.99'), 03 TO_NUMBER(QTYCHAR,'S999,999,999.99') 04 FROM QTEMP.TESTFILE) |
The results are expected, the character values have been converted to numbers and inserted.
AMOUNT QUANTITY 12,345.00 12,345.00 12,345.67 12,345.67 1,234.56- 1,234.56- 123.00 123.00 .00 .00 |
You can learn more about the TO_NUMBER and DECFLOAT_FORMAT SQL built in function from the IBM website here.
This article was written for IBM i 7.3 TR5 and 7.2 TR9.
Alternative to Built-In functions in native operation. Excellent
ReplyDeleteThis is fantastic information, always seems to be a translation issue with numbers and characters....not to mention character sets.
ReplyDeleteThank you for including comparisons and explaining the differences.
ReplyDeleteAlways informative and knowledgeable 🙌 Thanks again
ReplyDeleteThanks for the useful info Simon. These functions are always helpful.
ReplyDelete