Wednesday, September 15, 2010

Of DB2 for z/OS Clustering Indexes, Implicit and Otherwise

Do all of your DB2 for z/OS tables have clustering indexes? Unless you have one or more tables with no indexes at all, the answer to this question is, "yes." How's that? Read on...

About three years ago, I posted an entry to my "old" blog (the one I maintained during my years as an independent DB2 consultant) in which I explained the importance of data clustering in a DB2 environment. A lot of you probably know that an index defined on a table will be used by DB2 to physically sequence that table's rows if it (the index) was created with the CLUSTER attribute (or if, sometime after being created, it was altered to have this attribute). The index with the CLUSTER attribute (and there can be only one such index for a given table) is referred to as the associated table's explicit clustering index (or, more commonly, as just the clustering index). Suppose, however, that none of a table's indexes has the CLUSTER attribute? What then? In that case, the table will have an implicit clustering index. That will be the index that was the first one defined on the table.

The effect of an implicit clustering index on DB2 processing is the same as that of an explicit clustering index:
  • When a new row is to be added to a table that lacks an explicit clustering index, the target data page for the row (i.e., the page into which it should go if optimal data clustering is to be maintained) will be determined via the table's implicit clustering index.
  • When a tablespace is reorganized using the IBM DB2 REORG utility, rows unloaded from the tablespace will be sorted based on the key of the implicit clustering index of the table (or tables) stored in the tablespace.
Note that the statement above pertaining to REORG TABLESPACE has not always been true. There was a time when the IBM DB2 REORG utility would re-sequence rows in a table only if the table had an explicit clustering index. I'm not sure when this behavior changed, but I know that at least since DB2 for z/OS Version 8, REORG will sort data rows according to a table's clustering key, regardless of whether the clustering index is explicit or implicit.

Given that a table with an index will always have a clustering index, and that an implicit clustering index works as does an explicit clustering index, you might think that defining an explicit clustering index on a table is no big deal. I'd disagree with that assessment, and here's why: relying on the status of an index as the "oldest" one on a table to ensure its use as the table's clustering index (if you go the implicit route) could lead to an unexpected situation if that index were to be accidentally dropped. See, if you re-create the dropped index according to the original DDL (which lacked the CLUSTER attribute), it will no longer be the table's implicit clustering index if there are other indexes defined on the table. Why? Because the re-created index will no longer be the oldest one on the table (it will instead be the newest). One of the other indexes on the table will be the "new oldest" one, and that one will be the table's new implicit clustering index. In my opinion, having an explicit clustering index on each of your tables that has an index is a best practice.

I'll conclude this entry with a look at two interesting (to me, anyway) scenarios. First, consider a situation in which a table's implicit clustering index is altered with the addition of a new column (an extension of ALTER INDEX functionality introduced with DB2 for z/OS Version 8). Will it still be the table's implicit clustering index? Yes. The clustering key may have changed by way of the ALTER INDEX... ADD COLUMN operation, but the index is still the oldest one defined on the table.

Scenario two: suppose that index ABC, the first one created on table XYZ, has the CLUSTER attribute. If an ALTER INDEX statement with a NOT CLUSTER specification is subsequently executed for index ABC, does that mean that the index is no longer the table's clustering index? No, that's not what it means. Until such time as an ALTER INDEX statement with a CLUSTER specification is executed for some other index on table XYZ (or until a new index with the CLUSTER attribute is defined on table XYZ), index ABC will remain the table's clustering index (albeit an implicit clustering index) by way of its status as the oldest index defined on the table.

Take out the guesswork, OK? If you are going to have a clustering index on a table (and remember, you will if there are any indexes on the table), label it as such.

10 comments:

  1. Regarding my "scenario 2" in the post above, I should point out that the key factor is not index ABC's status as the first-defined on the table; rather, it's the fact that index ABC was an explicit clustering index. Even if index ABC had been the third or fourth index defined on the table, upon being altered with a NOT CLUSTER specification, it would have continued to be table XYZ's clustering index until such time as another index on the table is altered to have the CLUSTER attribute or a new index with that attribute is defined on the table. In other words, an explicit clustering index will remain the clustering index on a table (assuming it's not dropped) until another index is made that table's explicit clustering index.

    Robert Catterall

    ReplyDelete
  2. Hi Robert,

    Thank you for your post!
    I have a question - how do I see which indexes are implicitly clustered? (DB2 v9.7.0.5)
    Situation is that _none_ of the indexes have "CLUS" value in the INDEXTYPE column when looking on the resultset of admin_get_index_info.
    Is there a way to see which columns are used for implicit clustering?
    Or do I just do "Generate DDL" on the table and see which index comes first?

    P.S.
    I am not a DBA, but facing a task of improving indexes in a database (unfortunately DBAs are not pulling the weight here..).

    Appreciate your time,
    Bogdan

    ReplyDelete
    Replies
    1. I was writing from a DB2 for z/OS perspective. With DB2 for LUW, if none of the indexes on a table has the CLUSTER designation, data in the table will not be clustered by any particular key (in other words, there are no implicit clustering indexes in a DB2 for LUW database - tables either have explicit clustering indexes or they are not clustered). If a DB2 for LUW table has no clustering index, newly inserted rows will be placed in the table based on an available-space algorithm.

      Robert

      Delete
  3. Hi Robert,
    Thank you for all the posts !!! It helps a lot.

    I want to know how the clustering will be achieved if a table doesn't have any index. I want to know the order in which data is unloaded if there is no index on the table. Will it be time saving if I unload and load the table removing all the indexes on it and rebuild it later ?? we are in the middle of a project where in we are thinking to load the table removing the indexes and rebuild it after loading. I am assuming it saves time. But I am worried about the sorting order. If I unload the table without any indexes I am not sure how the order will be determined by the DB2. Then again it may take a lot of time while rebuilding the index if the order is not proper. Please let us know your thoughts on this. Thanks for your time.

    ReplyDelete
    Replies
    1. Hey, glad to know that the blog is of use to you.

      I'd lean against the remove indexes/load table/rebuild indexes approach. The processing of indexes by IBM DB2 for z/OS utilities is way more efficient than it used to be, in part because (starting with DB2 9) a utility job can get index entries from DB2 by the block, as opposed to one at a time. Additionally, DB2-directed, dynamic allocation of utility sort work data sets is more effective than it used to be (as I pointed out in an entry posted to this blog in 2011: http://robertsdb2blog.blogspot.com/2011/04/you-do-let-db2-for-zos-allocate-utility.html). On top of that, the DB2 Sort tool can further improve the performance of utility sort operations such as you'd see for a load of a table on which indexes have been defined (I have an entry about DB2 Sort in another blog of mine: http://db2beyondtheengine.blogspot.com/2012/03/faster-more-efficient-more-resilient.html).

      I believe that the UNLOAD utility will unload data in the sequence of the table's clustering index, if it has an index. That's good for re-load into a table that has the same clustering key, because the recommendation for LOAD is to have the input data sorted in clustering sequence.

      If a table doesn't have an index, UNLOAD should unload data via table space scan, and that should mean that the unloaded data will be ordered as are the rows in the table.

      You can find some useful performance-boosting tips for UNLOAD and LOAD and other IBM DB2 for z/OS utilities in the IBM "redbook" titled, "DB2 9 for z/OS: Using the Utilities Suite." You can get a copy of this document at this URL: http://www.redbooks.ibm.com/abstracts/sg246289.html?Open

      Robert

      Delete
    2. CORRECTION: I stated in my comment above that I assumed that the IBM DB2 for z/OS UNLOAD utility would unload data from a table in the order of that table's clustering index key. That is an INCORRECT assumption on my part - it appears that UNLOAD unloads data rows in the sequence in which they are physically ordered in the target table at the time of the UNLOAD. That actually makes sense from a performance perspective: just unload data by sequentially going through the pages of the table space (a clustering key-ordered unload would potentially require a good bit of bouncing from one page to another, if the table space had become even somewhat disorganized).

      Robert

      Delete
    3. Hi Robert,

      Slightly confused with above answer.

      At beginning of blog you mentioned, Index defined on a table will be used by DB2 to physically sequence that table's rows if it (the index) was created with the CLUSTER attribute.

      Now in above conversation you are mentioning, DB2 for z/OS UNLOAD utility would unload data from a table in the order of that table's clustering index key. That is an INCORRECT assumption on my part - it appears that UNLOAD unloads data rows in the sequence in which they are physically ordered in the target table at the time of the UNLOAD.

      Hence for table with clustering index will have physical ordered sequence as per index only.So it should not be wrong to say UNLOAD is happening as per clustering index keys.

      Please correct me If am wrong.

      Delete
    4. The DB2 for z/OS UNLOAD utility unloads a table space's rows in the order in which they appear in the table space. If a table's rows are in sequence 1, 3, 4, 7, 2, 6, 5 (with the number indicating the row's order per the table's clustering index), they will be in that sequence in the UNLOAD SYSREC output data set -- UNLOAD will NOT sort the unloaded rows in the table's clustering sequence, nor will it access rows via the table's clustering index in order to unload them in clustering sequence.

      To put this another way: the rows in the UNLOAD SYSREC output data set will be in clustering order ONLY if they were already in that order in the table space that was unloaded.

      Robert

      Delete
  4. So even I've clustering index on table, I have to sort sysrec after unload to be sure that data is in order i want (order via clustering index) ?

    ReplyDelete
    Replies
    1. Apologies for the delayed response.

      If you use the DB2 for z/OS UNLOAD utility, and you want to be certain that the unloaded records are in clustering sequence, you need to do one of two things: either sort the output data set to get the records in that sequence, or ensure that the rows of the unloaded table are in clustering sequence before running UNLOAD (this could be the case if the target table space were reorganized and subsequently accessed in read-only mode prior to the UNLOAD).

      An alternative approach would be to execute the REORG utility with UNLOAD EXTERNAL specified - that will get you an output data set with rows sorted in cluster-key order. Another alternative would be to unload using the DSNTIAUL sample job, specifying the appropriate ORDER BY clause - not as CPU-efficient as a utility, but that may be OK if the table in question isn't huge. If the table is really big, you'd need enough space in the DB2 work file table spaces to accommodate the large ORDER BY sort.

      The IBM product DB2 High Performance Unload for z/OS enables unloading of data in a table's clustering sequence. Information on this product can be found at this URL: http://www-03.ibm.com/software/products/en/db2hpu-zos

      Robert

      Delete