Robert's Db2 blog: Of DB2 for z/OS Clustering Indexes, Implicit and Otherwise

Wednesday, September 15, 2010

Of DB2 for z/OS Clustering Indexes, Implicit and Otherwise

Do all of your DB2 for z/OS tables have clustering indexes? Unless you have one or more tables with no indexes at all, the answer to this question is, "yes." How's that? Read on...

About three years ago, I posted an entry to my "old" blog (the one I maintained during my years as an independent DB2 consultant) in which I explained the importance of data clustering in a DB2 environment. A lot of you probably know that an index defined on a table will be used by DB2 to physically sequence that table's rows if it (the index) was created with the CLUSTER attribute (or if, sometime after being created, it was altered to have this attribute). The index with the CLUSTER attribute (and there can be only one such index for a given table) is referred to as the associated table's explicit clustering index (or, more commonly, as just the clustering index). Suppose, however, that none of a table's indexes has the CLUSTER attribute? What then? In that case, the table will have an implicit clustering index. That will be the index that was the first one defined on the table.
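The selection rule just described can be written down as a small model. This is only a sketch of the rule as stated in this post, not DB2 code: the Index class, the created_seq field (a stand-in for creation order), and the clustering_index function are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Index:
    name: str
    created_seq: int        # hypothetical stand-in for creation order (lower = older)
    cluster: bool = False   # True if the index carries the CLUSTER attribute


def clustering_index(indexes: Sequence[Index]) -> Optional[Index]:
    """Pick the clustering index under the rule described in the post."""
    if not indexes:
        return None  # a table with no indexes has no clustering index
    explicit = [ix for ix in indexes if ix.cluster]
    if explicit:
        return explicit[0]  # only one index per table can have CLUSTER
    # No explicit clustering index: the oldest index clusters implicitly.
    return min(indexes, key=lambda ix: ix.created_seq)


indexes = [Index("IX_A", 1), Index("IX_B", 2), Index("IX_C", 3)]
print(clustering_index(indexes).name)  # IX_A (implicit: first index defined)
```

Marking IX_B with cluster=True in the list above would make IX_B the result, regardless of creation order.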

The effect of an implicit clustering index on DB2 processing is the same as that of an explicit clustering index:

When a new row is to be added to a table that lacks an explicit clustering index, the target data page for the row (i.e., the page into which it should go if optimal data clustering is to be maintained) will be determined via the table's implicit clustering index.
When a tablespace is reorganized using the IBM DB2 REORG utility, rows unloaded from the tablespace will be sorted based on the key of the implicit clustering index of the table (or tables) stored in the tablespace.

Note that the statement above pertaining to REORG TABLESPACE has not always been true. There was a time when the IBM DB2 REORG utility would re-sequence rows in a table only if the table had an explicit clustering index. I'm not sure when this behavior changed, but I know that at least since DB2 for z/OS Version 8, REORG will sort data rows according to a table's clustering key, regardless of whether the clustering index is explicit or implicit.

Given that a table with an index will always have a clustering index, and that an implicit clustering index works as does an explicit clustering index, you might think that defining an explicit clustering index on a table is no big deal. I'd disagree with that assessment, and here's why: relying on the status of an index as the "oldest" one on a table to ensure its use as the table's clustering index (if you go the implicit route) could lead to an unexpected situation if that index were to be accidentally dropped. See, if you re-create the dropped index according to the original DDL (which lacked the CLUSTER attribute), it will no longer be the table's implicit clustering index if there are other indexes defined on the table. Why? Because the re-created index will no longer be the oldest one on the table (it will instead be the newest). One of the other indexes on the table will be the "new oldest" one, and that one will be the table's new implicit clustering index. In my opinion, having an explicit clustering index on each of your tables that has an index is a best practice.
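The drop-and-re-create scenario can be walked through with a toy model. This is a sketch under stated assumptions, not a reproduction of DB2 internals: the dictionary layout, the creation counter, and the create_index and clustering_index helpers are all hypothetical, and "oldest" is modeled as the lowest counter value.

```python
from itertools import count

_seq = count(1)  # hypothetical creation counter; lower value = older index


def create_index(table, name, cluster=False):
    """Register an index on the toy table, stamping it with the next counter value."""
    table[name] = {"created": next(_seq), "cluster": cluster}


def clustering_index(table):
    """Explicit clustering index if one exists, otherwise the oldest index."""
    if not table:
        return None
    for name, ix in table.items():
        if ix["cluster"]:
            return name
    return min(table, key=lambda n: table[n]["created"])


# Implicit route: no index has the CLUSTER attribute.
implicit = {}
for name in ("IX_A", "IX_B", "IX_C"):
    create_index(implicit, name)
before = clustering_index(implicit)
del implicit["IX_A"]             # accidental drop
create_index(implicit, "IX_A")   # re-created from the original DDL
after = clustering_index(implicit)

# Explicit route: the same accident, but the DDL carries CLUSTER.
explicit = {}
create_index(explicit, "IX_A", cluster=True)
create_index(explicit, "IX_B")
create_index(explicit, "IX_C")
del explicit["IX_A"]
create_index(explicit, "IX_A", cluster=True)
explicit_after = clustering_index(explicit)

print(before, after, explicit_after)  # IX_A IX_B IX_A
```

In the implicit case the clustering role silently moves to IX_B once IX_A becomes the newest index; in the explicit case it returns to IX_A as soon as the index is re-created.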

I'll conclude this entry with a look at two interesting (to me, anyway) scenarios. First, consider a situation in which a table's implicit clustering index is altered with the addition of a new column (an extension of ALTER INDEX functionality introduced with DB2 for z/OS Version 8). Will it still be the table's implicit clustering index? Yes. The clustering key may have changed by way of the ALTER INDEX... ADD COLUMN operation, but the index is still the oldest one defined on the table.

Scenario two: suppose that index ABC, the first one created on table XYZ, has the CLUSTER attribute. If an ALTER INDEX statement with a NOT CLUSTER specification is subsequently executed for index ABC, does that mean that the index is no longer the table's clustering index? No, that's not what it means. Until such time as an ALTER INDEX statement with a CLUSTER specification is executed for some other index on table XYZ (or until a new index with the CLUSTER attribute is defined on table XYZ), index ABC will remain the table's clustering index (albeit an implicit clustering index) by way of its status as the oldest index defined on the table.
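Scenario two can also be sketched as a toy model. It follows the behavior as refined in the author's first comment in the thread below: an index that was the explicit clustering index keeps the role after NOT CLUSTER until some other index is given CLUSTER, whatever its age. The Table class and its method names are hypothetical, and the model does not try to mirror any validation DB2 performs on these statements.

```python
class Table:
    """Toy model of one table's indexes and its clustering index."""

    def __init__(self):
        self.indexes = []          # index names in creation order (oldest first)
        self.explicit = None       # index currently carrying the CLUSTER attribute
        self.last_explicit = None  # most recent explicit clustering index

    def create_index(self, name, cluster=False):
        self.indexes.append(name)
        if cluster:
            self.explicit = self.last_explicit = name

    def alter_cluster(self, name):
        # Models ALTER INDEX ... CLUSTER
        self.explicit = self.last_explicit = name

    def alter_not_cluster(self, name):
        # Models ALTER INDEX ... NOT CLUSTER; last_explicit still remembers the index
        if self.explicit == name:
            self.explicit = None

    def clustering_index(self):
        if self.explicit:
            return self.explicit
        if self.last_explicit in self.indexes:
            return self.last_explicit  # formerly explicit index keeps the role
        return self.indexes[0] if self.indexes else None  # otherwise the oldest


xyz = Table()
xyz.create_index("OTHER1")
xyz.create_index("ABC", cluster=True)  # deliberately not the oldest index
xyz.alter_not_cluster("ABC")
step1 = xyz.clustering_index()         # still ABC
xyz.alter_cluster("OTHER1")
step2 = xyz.clustering_index()         # now OTHER1
print(step1, step2)  # ABC OTHER1
```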

Take out the guesswork, OK? If you are going to have a clustering index on a table (and remember, you will if there are any indexes on the table), label it as such.

10 comments:

Robert, September 16, 2010 at 7:55 AM
Regarding my "scenario 2" in the post above, I should point out that the key factor is not index ABC's status as the first-defined on the table; rather, it's the fact that index ABC was an explicit clustering index. Even if index ABC had been the third or fourth index defined on the table, upon being altered with a NOT CLUSTER specification, it would have continued to be table XYZ's clustering index until such time as another index on the table is altered to have the CLUSTER attribute or a new index with that attribute is defined on the table. In other words, an explicit clustering index will remain the clustering index on a table (assuming it's not dropped) until another index is made that table's explicit clustering index.

Robert Catterall

Anonymous, April 25, 2013 at 9:09 AM
Hi Robert,

Thank you for your post!
I have a question - how do I see which indexes are implicitly clustered? (DB2 v9.7.0.5)
The situation is that _none_ of the indexes have a "CLUS" value in the INDEXTYPE column when I look at the result set of admin_get_index_info.
Is there a way to see which columns are used for implicit clustering?
Or do I just do "Generate DDL" on the table and see which index comes first?

P.S.
I am not a DBA, but facing a task of improving indexes in a database (unfortunately DBAs are not pulling the weight here..).

Appreciate your time,
Bogdan

Rak Rockzz, May 30, 2013 at 1:49 PM
Hi Robert,
Thank you for all the posts! They help a lot.

I want to know how clustering is achieved if a table doesn't have any index, and in what order data is unloaded in that case. Will it save time if I unload the table, remove all of its indexes, load the table, and rebuild the indexes later? We are in the middle of a project in which we are thinking of loading the table with the indexes removed and rebuilding them after the load. I am assuming it saves time, but I am worried about the sort order. If I unload the table without any indexes, I am not sure how DB2 will determine the order. Then again, rebuilding the indexes may take a lot of time if the data is not in the proper order. Please let us know your thoughts on this. Thanks for your time.

Unknown, March 16, 2016 at 1:25 PM
So even if I have a clustering index on the table, do I have to sort SYSREC after the unload to be sure that the data is in the order I want (the order of the clustering index)?