Wednesday, October 9, 2013

DB2 for z/OS: Something You Might Not Know About Character String Literal Values

Here's something I imagine a lot of you (or programs written by you) do all the time: issue DB2 for z/OS-accessing SQL statements that contain character string literal values. Below is a query that I issued from Data Studio earlier today, targeting a table that I'd created on one of our test DB2 subsystems:

SELECT PRODUCT_ID, UNIT_SALES
  FROM PRODUCT_SALES
  WHERE PRODUCT_ID = 'A12345';


Now, a question for you: is that character string literal value, 'A12345', fixed-length or varying-length? Considering this from your point of view, you might answer, "Fixed length." After all, it's six characters, right? No more, no less. And if you look at it again a few minutes from now, it'll still be six characters. Guess what? From DB2's point of view, the value 'A12345' in the above statement is a varying-length character string. I'm not revealing any state secrets here -- check the DB2 10 for z/OS SQL Reference and you'll see, in the first sentence of the section on character string constants, the following:

"A character string constant specifies a varying-length character string."

Of course, even a seasoned mainframe DB2 professional is not likely to have committed the entire SQL Reference to memory, and in fact plenty of people are unaware of this little nugget of information. That's generally not a big deal, because it almost never matters. Why does it hardly ever matter? Because fixed-length and varying-length character strings are compatible for purposes of assignment and comparison in a DB2 for z/OS system. Take the previously referenced SELECT statement. The 'A12345' character string constant is, as noted, treated by DB2 as a varying-length value. The PRODUCT_ID column of my PRODUCT_SALES table is defined as CHAR(6) -- a fixed-length character column. Does that matter? No -- the query runs just fine. What if you have a stored procedure with an input parameter defined as CHAR(10), and you call the stored procedure with a 10-character (or shorter) literal value provided for that input parameter? Does that matter? No -- the CALL will work just fine. Same goes for a user-defined function invocation.

Alright, so I said that the fact that DB2 treats a character string constant as a varying-length value almost never matters. "Almost never" is not the same thing as "never." So, when does it matter? I'm aware of one case, which was recently brought to my attention by a DB2 for z/OS DBA who I've known for years. This individual was trying to create a column mask in a DB2 10 for z/OS new-function mode environment. His CREATE MASK statement looked something like this (I'm using table and column names other than the real ones):

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)
     THEN '999999'
    ELSE '000000'
   END
   ENABLE;


When he issued that SQL statement, it failed with a -20474 SQL code, reason code 33, indicating that "The data type of the return expression is not the same as the  data type of the column on which the column mask is defined." This didn't make sense to my DBA friend, because COL2 was defined as CHAR(6), and he'd provided 6-character literal values in the RETURN clause of the CREATE MASK statement. In this situation, the problem was related to the fact that the character string literals in the SQL statement were treated as varying-length values by DB2. Why did that matter? Because, per the DB2 10 SQL Reference, one of the rules of column mask creation is as follows:

"The result data type, null attribute, data length, subtype, encoding scheme, and CCSID of the CASE expression must be identical to those attributes of the column that is specified by column-name."

Fixed-length character strings and varying-length character strings are two different DB2 data types. Thus, the -20474 error, with reason code 33. When the DBA changed the statement as shown below, it worked:

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)

     THEN CAST('999999' AS CHAR(6))
    ELSE CAST('000000' AS CHAR(6))
   END
   ENABLE;


The statement also worked when coded as follows, because the CHAR scalar function returns a fixed-length character string representation of the argument (this from the DB2 10 SQL Reference, with emphasis added by me):

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)

     THEN CHAR('999999')
    ELSE CHAR('000000')
   END
   ENABLE;



Recap time. 1) Character string literals in SQL statements are treated by DB2 for z/OS as varying-length values. 2) That almost never matters. 3) It matters when you want to create a mask for a fixed-length character column.

12 comments:

  1. Hi Robert,
    I have a small doubt.
    I could not see any Indexspaces entries in the catalog table SYSCOPY.
    When I queried the syscopy with OTYPE=I , I am able to find the directory Indexspace entries only, Please help me on this.
    Thanks

    ReplyDelete
    Replies
    1. By default, DB2 for z/OS indexes are created with COPY NO. In such cases, they are rebuilt -- not recovered -- when the underlying table space is recovered. It is very common for SYSCOPY to contain very few rows pertaining to indexes. If a lot of indexes in a DB2 for z/OS subsystem are defined with COPY YES, and they are regularly backed up, you will see a lot of index-related rows in SYSCOPY.

      Robert

      Delete
    2. Thanks a Lot Robert...

      But my question is not only related to Image copy.
      Whether DB2 tracks for any utility that ran on an "Indexspace into SYSCOPY" ? . Since we need to know when the last reorg(or any other utility) was done on particular Indexspace.

      Delete
    3. Use the SYSIBM.SYSINDEXSPACESTATS catalog table for that purpose. Check the values in columns such as REORGLASTTIME.This and other columns of the table are described in the DB2 for z/OS SQL Reference (http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sysibmsysindexspacestatstable.htm).

      Robert

      Delete
    4. Thanks Robert ...

      Delete
  2. Recently I faced one issue in char string comparison as below :
    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY > '201406260924396848


    '
    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR;
    Count o/p is 13856.

    And DDL of table above table is :
    CREATE TABLE U.UPARM
    (C_APPL CHAR(3) NOT NULL WITH DEFAULT
    ,X_KEY CHAR(22) NOT NULL WITH DEFAULT
    ,X_TBL CHAR(200) NOT NULL WITH DEFAULT
    ) IN U030.UPARM
    ;

    here X_KEY is of CHAR(22) length.

    But when I changed the above query to :
    1) With X_KEY compared to string less than 22 char :
    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY > '201406260924396848'

    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR;
    Count o/p is 0.
    2) With X_KEY comparison to string of 22 characters :
    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY > '201406260924396848 '

    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR;
    Count o/p is 0.

    3) With X_KEY comparison to string greater than 22 characters but less than 254 characters :
    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY > '201406260924396848 '

    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR;
    Count o/p is 0.

    My concern is that for all the condition where X_KEY input is less than 22 chars and greater than 22 but less than 254 chars; o/p of query is zero.

    When I checked QBLOCK_TYPE field in PLAN_TABLE for above queries, queries with X_KEY input less than equal to 22, it is showing "PRUNED"; but when X_KEY i/p gets greater than 22 chars QBLOCK_TYPE will show as "SELECT".

    Can you help in understanding the o/p of above queries and QBLOCK_TYPE - "PRUNED" or "SELECT" values while analyzing the EXPLAIN of queries.

    Thanks in advance as always.

    ReplyDelete
    Replies
    1. I apologize for the delay in responding -- I have been very busy of late.

      One thing I don't understand: how is it that the initial query execution returned a count of 13856. You have predicates like these (I've made the strings very short for simplicity's sake):

      WHERE X_KEY > '7 ' AND X_KEY < '3'

      In other words, the literal value in the "greater than" comparison is greater than the literal value for the same column in the "less than" predicate. For any given row, the value in the X_KEY column could not be both greater than '7' and less than '3'. I would have expected the count for this query to be 0, not 13856.

      In any case, here is how string comparison works (and the following can be found online at http://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_stringcomparisionsintro.dita):

      "Two strings are compared by comparing the corresponding bytes of each string. If the strings do not have the same length, the comparison is made with a temporary copy of the shorter string that has been padded on the right with blanks so that it has the same length as the other string."

      So, when you had a predicate with a literal value of length (for example) 250, the 22-byte values in the X_KEY column were temporarily padded to the right with 228 blanks to equalize length for value comparison purposes.

      Robert

      Delete
    2. Robert,

      Thanks and very much appreciate your efforts in making the db2 concepts clear.

      I know asking a question is a easy task, but to come with a answer is that much difficult.

      May be I have not presented my query in a better way. Please find my query in more clear way as "Anonymous July 10, 2014 at 7:44 AM"

      Delete
    3. Yes, I saw your rephrasing of the question. The answer is the same: when DB2 compares two character strings of different length, the shorter of the two strings will be padded on the right with blanks to match the longer string, and then the two values will be compared. A predicate with a character string literal value that is longer than the length of the data type in the referenced column will NOT be ignored by DB2, even if the literal value's length is greater than 254 bytes and the referenced column is defined as CHAR (versus VARCHAR).

      Robert

      Delete
  3. Robert,

    Let me put above query in a more precise way.

    I got one sql query as mentioned below :
    Query:

    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY > '201406260924396848




    '
    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR

    DDL Definition :
    CREATE TABLE U.UPARM_TBL
    (C_APPL CHAR(3) NOT NULL DEFAULT
    ,X_KEY CHAR(22) NOT NULL DEFAULT
    ,X_TBL CHAR(200) NOT NULL DEFAULT
    )
    IN U030.UPARM
    ;
    Here X_KEY is CHAR(22).

    When I tried executing above query in SPUFI with first occurence of X_KEY >

    'compared with char string more than 254', guess what it executed fine. It results

    into some rows. I expected it to give error with data size mismatch.

    Then I check removing the first occurrence of X_KEY from predicate, modified query

    looks like as below :

    SELECT COUNT(*)
    FROM U.UPARM
    WHERE C_APPL = 'PS'
    AND X_KEY < '20140626090013747561'
    AND X_TBL NOT LIKE '%TRANSFER%'
    FOR READ ONLY
    WITH UR ;

    It results into same number of rows and same data.

    Does it means that DB2v10 is skipping first condition when more than 254

    characters are passed for comparison.

    Here I want to know how db2v 10 acts when :

    1) predicate CHAR column is compared with char string greater than size defined in the DDL.

    2) Predicate CHAR column is compared with char string greater than CHAR data type limit(254 char).

    Appreciate your efforts in making us to understand complex db2 concept in a easy way.

    Thanks in advance.

    ReplyDelete
  4. Hi,
    I have a question as below.We have a table with colum X defined as CHAR(1) in DDL.
    and have values 0,1,2,3,4,X in that column.
    when I ran the query "select count(1) from tab where x>'0';
    I am getting 4 as count i.e 1,2,3,4 but not X
    can some explain why is this?

    Note: when we have the same situation on netezza, it is giving 5 as count.i.e 1,2,3,4,X

    ReplyDelete
    Replies
    1. Data in a DB2 for z/OS table is typically EBCDIC-encoded. I suspect that data in a Netezza table is ASCII-encoded. In the EBCDIC sort sequence, letters come before numbers. In the ASCII sort sequence it's the other way around.

      Robert

      Delete