Monday, September 17, 2012

DB2 10 for z/OS: New Options for Trimming and Upgrading Indexes

First, some explanation as to what I mean by "trimming and upgrading" indexes. My use of "trimming" refers to reducing the number of indexes defined on tables, through elimination of indexes that aren't providing value. Having fewer indexes on a table is good for cost efficiency, as every index increases the CPU cost of every insert and delete operation targeting the table (and every update of an indexed column), increases the cost of execution of several DB2 utilities (including LOAD, REORG, and RUNSTATS), and drives up disk space utilization (though that cost can be mitigated through the index compression capability introduced with DB2 9 for z/OS, about which I blogged a couple of years ago while working as an independent DB2 consultant). That's clear enough, but you may wonder why an index serving no useful purpose got created in your environment in the first place. Perhaps an index created in anticipation that it would be needed for good application performance turned out to be unnecessary for that purpose. Another possibility is that an index that had been necessary for technical reasons became unnecessary (or potentially so) through some enhancement delivered with a recent release of DB2. I'll provide more information on both of these scenarios in the remainder of this blog entry.

The index "upgrade" concept is related to index "trimming," in that identification of an unnecessary index presents you with a couple of options: one is to eliminate the useless index (to reduce CPU and disk costs, as mentioned); the other is to replace that index with one that is useful -- probably in terms of reducing CPU and elapsed time for some important queries (or searched updates or deletes). I'd consider that action to be an index "upgrade" because you've improved application performance while holding the line with respect to the number of indexes defined on a table (one useless index out, one useful index in).

Whether your aim is index trimming, index upgrading, or both, the essential first step is identification of indexes that can be eliminated because they no longer serve a useful purpose (if indeed they ever did). The last several releases of DB2 for z/OS have made this task easier and more fruitful. I'll first provide a quick review of enhancements related to index trimming and upgrading that were introduced with DB2 V8 and V9, then I'll get into new opportunities for tightening up and/or boosting the effectiveness of your index configuration that were made available through DB2 10 for z/OS. So, the review:
  • Table-controlled table partitioning (DB2 V8). Though this enhancement was delivered years ago, a lot of mainframe DB2 shops have yet to leverage its value (as I pointed out in an entry I posted to this blog a few months ago). Basically, it comes down to this: when you convert an index-controlled partitioned table space to an table-controlled partitioned table space, the formerly partition-controlling index may end up becoming useless. There's your opportunity for index trimming or upgrading.
  • The LASTUSED column of the SYSIBM.SYSINDEXSPACESTATS catalog table (DB2 9). Those "I thought we'd need it, but maybe we don't" indexes were once pretty hard to find. Looking for package dependencies on an index will tell you when one is not used by static SQL statements, but what about dynamic SQL? The LASTUSED column of the real-time statistics table SYSIBM.SYSINDEXSPACESTATS makes identification of indexes that are just taking up space much easier than before, and that multiplies your index trimming and upgrading opportunities. I've blogged about this DB2 feature several times, most recently in an entry posted to this blog a couple of years ago.
  • Index-on-expression (DB2 9). Index upgrading is all about replacing an unnecessary index with one that does you some good. DB2 9 broadened the "does you some good" landscape by introducing a new category of indexes, that being indexes defined not just on columns, but on column expressions, such as SUBSTR(DEPT_NAME, 2, 3). With such an index, a predicate that had been non-indexable can be made indexable, leading to potentially dramatic query performance improvement. I blogged about this DB2 enhancement a few years ago.
  • The XML data type (DB2 9). Just storing XML data in a column defined with the XML data type (as opposed to, for example, the VARCHAR or CLOB data type) can deliver greatly improved performance with respect to querying said XML data. Create an index on that XML column (via XPath expressions allowed in index definitions), and you can REALLY speed things up (particularly when the XML documents are pretty large). This is another of those index upgrade opportunities -- and keep in mind that it pertains not only to retrieval of XML data, but to retrieval of non-XML data based on a predicate that references an XML column (an excellent reference for XML data in a mainframe DB2 environment is the IBM "red book" titled Extremely pureXML in DB2 10 for z/OS -- most of the information in which is also applicable to a DB2 9 system).

OK, now for the good stuff delivered through DB2 10 for z/OS that can take your index trimming and upgrading efforts to new levels of effectiveness. I'll start with what are called index INCLUDE columns. This is really big, in that it created a whole new category of indexes that could be eliminated, thereby paving the way for new index trimming and upgrading actions. The concept of index INCLUDE columns is simple, and best explained through an example. Suppose you have a table for which the combined values of columns C1 and C2 in each row have to be unique. To enable DB2 to enforce that requirement, you defined a unique constraint on (C1, C2), and created a unique index on (C1, C2). Suppose further that you wanted to provide index-only access for queries that reference only columns C1, C2, C3, and C4 of the table, or that you wanted to enable sort avoidance for queries that contain ORDER BY C1, C2, C3, C4. In a pre-DB2 10 environment, you addressed this second need by creating an additional index on columns C1, C2, C3, and C4. DB2 10 (in new-function mode) provides a new option: you can alter the unique index on (C1, C2) to INCLUDE (C3), and again to INCLUDE (C4) (for ALTER INDEX, only one column can be named in an INCLUDE specification -- n columns are included in an index through n executions of ALTER INDEX). In doing that, you'd end up with one index that could accomplish two purposes: 1) enforcement of uniqueness for (C1, C2) value pairs, and 2) query performance enhancement (through enablement of index-only access for queries referencing only columns C1, C2, C3, and C4; and sort avoidance for queries containing ORDER BY C1, C2, C3, C4 ). Voila! That index on (C1, C2, C3, C4) is now unnecessary and can be eliminated, clearing the way for index trimming or index upgrading. Note that INCLUDE is an option for CREATE INDEX as well as ALTER INDEX statements (in the case of CREATE INDEX, several columns can be listed in the INCLUDE part of the statement, so INCLUDE (C3, C4) is OK, syntax-wise). Whether it's an ALTER INDEX or a CREATE INDEX statement, the aim is the same: use one column (or set of columns) for enforcement of a UNIQUE constraint (or for a primary key), and with the same index include another column (or set of columns) that are "along for the ride" to provide better-performing query access paths. The extra columns added to an index via INCLUDE do NOT factor into uniqueness determination with regard to the "core" key on which the unique index is defined -- that's what enables dual-purpose indexing.

Here's another index trimming/upgrading opportunity provided courtesy of DB2 10: hash-organized tables. This is a feature that has generated a good bit of buzz in the DB2 community, but some folks have overlooked the tie-in to index trimming and upgrading. Here's how hash organization of data in a table works: when a table is created (in a DB2 10 new-function mode environment) with ORGANIZE BY HASH (or when an existing table is changed via ALTER TABLE to ADD ORGANIZE BY HASH), assignment of rows to table pages is not based on a clustering key; rather, the hash key value for a row (this is a unique key specified by the user) is run through a hashing algorithm whose output determines the target page. Generally speaking, when people think of hash-organized tables, they think of the performance boost that can be achieved when a query referencing the hash key in an equals predicate can get a row from DB2 with a single GETPAGE. That's great (for single-row access -- not so much for sequential data access), but here's something else to consider: if you're looking to alter an existing table to be hash-organized, and the unique key that you're going to use for hash-organization purposes is already defined as unique and has a corresponding uniqueness-enforcing index (and this applies as well to a primary key), that unique index will likely be unnecessary following the change to hash organization. Why? Because DB2 does not need an index on the hash key to ensure uniqueness of that key -- it can accomplish duplicate-value checking very efficiently using just the table (remember that the location of a row in a hash-organized table can be directly ascertained using the value of the hash key in the row). So, you can probably drop the index that had been previously created on the key that is the hash key for the hash-organized table, and there you have that new opportunity for index trimming or upgrading. Now, I say that you can probably drop that unique index, because it's possible that you'd want to keep it for queries that 1) reference only the column or columns in that index and 2) involve a scan of index values (recall that hash-organized tables are not good for the performance of data retrieval operations that involve sequential access).

[Note: technically speaking, converting a table to be hash-organized results in the creation by DB2 of a new index on the table, but this index -- the hash-overflow index, used to locate rows that were stored in the table space's overflow area because the hash-determined target page was full -- is a "sparse" index, containing entries only for rows in the overflow area. It does not have the CPU and disk space costs associated with traditional indexes.]

So, index trimming and upgrading is good for the efficiency of your DB2 for z/OS environment, and good for the performance of your DB2-accessing applications. DB2 10 continues the trend of giving you more opportunities to prune your index configuration, or to maintain a certain level, with respect to the number of indexes defined, while boosting overall index effectiveness (through replacement of unneeded indexes with value-delivering indexes). Invest some time in this area, and you're likely to see a good return.

5 comments:

  1. Hi Rob

    The INCLUDE feature on the index i tried and the INCLUDED columns are taking the order of the last column that exist on the index. why can't i specify ASC/DESC on my INCLUDE?

    ReplyDelete
    Replies
    1. You can't specify ASC or DESC for INCLUDEd columns because that specification wouldn't change anything. Think about it. Columns can be INCLUDEd only for a unique index. The unique key values will be in their specified sequence (ASC or DESC), and for that reason ASC or DESC for INCLUDED columns is meaningless. Consider a unique index, on COL1, COL2 of a 4-row table, with these key values:
      A1
      B2
      C3
      D4

      Now, suppose you want to INCLUDE column COL3 in this unique index. Suppose the value of COL3 in the row with unique key value A1 is X, and is Y for the row with unique key value B2, and X again for the row with unique key value C3, and Y again for the row with unique key value D4. Remember, the INCLUDEd column is "just along for the ride" with the unique key - it does not affect the order of the unique key values. With COL3 included in the index, you get these key values (and I put the INCLUDEd column key value in parentheses for each):
      A1(X)
      B2(Y)
      C3(X)
      D4(Y)

      Again, order of INCLUDEd columns is determined by the order of the index's unique key, BECAUSE that key is unique. If the "base" (i.e., non-INCLUDE) key were non-unique (and it can't be - INCLUDE can only be specified for a unique index), ASC or DESC for the INCLUDEd columns would be meaningful.

      Hope this helps.

      Robert

      Delete
  2. do you know if you can use include with a hash index?

    ReplyDelete
    Replies
    1. answered my own question:

      QUERY MESSAGES:
      THE STATEMENT COULD NOT BE PROCESSED BECAUSE ONE OR MORE IMPLICITLY CREATED
      OBJECTS ARE INVOLVED 5.

      Delete
    2. I'm not sure what you mean by the term "hash index." In a DB2 for z/OS environment, there are no "hash" indexes. There can be hash-organized tables, but indexes on these table are regular DB2 b-tree indexes. There is a hash overflow index on a hash-organized table (it's created by DB2), but this index is only used to speed access to rows that get stored in the overflow area of a table space holding a hash-organized table (rows that couldn't be placed in the target page as determined by the hash of the row's hash key value).

      Robert

      Delete