Most of the files and tables I use contain data that is only in upper case. Occasionally I have to work with files that can contain data in mixed case, both upper and lower, or data from our overseas subsidiaries that can contain special characters: accents, umlauts, circumflexes, etc. This does present me with an issue when I need to search data in those files. I mainly use SQL to search and extract from files, and while I can use a LIKE in the SQL statement it does still compare like to like, where 'A' is not the same as 'a'.
The simple solution would appear to be to convert the incoming field/column into upper case when performing the comparison, but I have Theo Kouwnehoven to thank for an ever simpler and better solution.
For this example I have a file, TESTFILE, that contains a field FRUIT.
A R TESTFILER A FRUIT 15 |
The field contains repeating names of fruits in mixed case and, as you can see in record 9, it can contain special characters:
FRUIT --------------- 01 peach 02 APPLE 03 PineApple 04 Pear 05 pear 06 PEACH 07 BLUEBERRY 08 blackberry 09 pÉæR 10 mangO |
I could just cope with the translation of lower to upper case using the UPPER function, which is the same as UCASE.
SELECT FRUIT FROM TESTFILE WHERE UPPER(FRUIT) LIKE 'PE%' ORDER BY UPPER(FRUIT) |
This would give me the following output:
FRUIT --------------- peach PEACH Pear pear |
Using the UPPER function will result in a slight increase in the overhead used by this program, as each time a record is fetched the field FRUIT goes through the translation process. It also missed "pÉæR".
Theo Kouwnehoven came up with another solution: to change the sort sequence to *LANGIDSHR. IBM defines the sort sequence *LANGIDSHR as using "A shared-weight sort table". What this means is that it treats upper and lower case as the same, and will treat the special characters as their non-special equivalent. For example: "É" will be sorted as "E".
If I was to embed this into an RPG program it could look like:
01 dcl-s wkFRUIT char(15) ; 02 exec sql SET OPTION SRTSEQ = *LANGIDSHR, COMMIT = *NONE ; 03 exec sql DECLARE C1 CURSOR FOR SELECT FRUIT FROM TESTFILE WHERE FRUIT LIKE 'PE%' ORDER BY FRUIT FOR READ ONLY ; 04 exec sql OPEN C1 ; 05 dow (1 = 1) ; 06 exec sql FETCH C1 INTO :wkFruit ; 07 if (SQLCOD <> 0) ; 08 leave ; 09 endif ; 10 dsply wkFruit ; 11 enddo ; 12 *inlr = *on ; 13 exec sql CLOSE C1 ; |
I am not going to give an explanation on how to read a file in SQL using a cursor and fetch here. I will explain the pertinent parts of this example.
On line 2 I am using the SET OPTION to set the sort sequence to *LANGIDSHR and not to use commitment control.
I do not have to do any translation to the field FRUIT in the SELECT statement on line 3.
The FETCH on line 6 is like a read.
The output from the DSPLY would look like:
DSPLY peach DSPLY PEACH DSPLY Pear DSPLY pear DSPLY pÉæR |
This time "pÉæR" is included.
If you want to change the sort sequence in interactive SQL, STRSQL command:
- At the "Enter SQL Statements" screen press F13 (F13=Services)
- At the menu select option 1 (1. Change session attributes)
- Page down to the second screen.
- Fourth field down is "Sort sequence", change the value to *LANGIDSHR.
- Press Enter twice to update and return to the "Enter SQL Statements" screen.
You can learn more about SQL's SET OPTION on the IBM website here.
This article was written for IBM i 7.2, and it should work with earlier releases too.
Well done, amazing power in that sql
ReplyDeleteUsing sort sequences is great, but it is an all or nothing option, i.e. if changed it is used for all columns in all files.
ReplyDeleteWhen using scalar functions you can use different sequences.
In either way to get the best performance, the right indexes must be created.
When using a sort sequence only indexes created with this sortsequence can be used, otherwise table scans may be performed, which is no problem for 500 records but is a huge problem for 500 billion of records.
To ge the best performance with scalar functions, you need to create a new drived index, i.e. with one or more new columns including the scalar function.
Birgitta
Birgitta, are you suggesting that an index with an expression over that single field would be better than using *LANGIDSHR is there a scalar function that gives that result? An index with Upper would work except for the one missing value, it appears.
ReplyDeleteIts not a good habit to test SQLCOD with "SQLCOD <> 0"...it's safer to test it this way: SQLCOD < 0. This is because, as along as SQLCOD is positive, its fine...even a value of 100 (row not found) still show the embedded statement did work, even if it did not find a row. There are other embedded commands that return non-zero but positive values even when the command executed successfully.
ReplyDelete