About three years ago, I posted an entry to my "old" blog (the one I maintained during my years as an independent DB2 consultant) in which I explained the importance of data clustering in a DB2 environment. A lot of you probably know that an index defined on a table will be used by DB2 to physically sequence that table's rows if it (the index) was created with the CLUSTER attribute (or if, sometime after being created, it was altered to have this attribute). The index with the CLUSTER attribute (and there can be only one such index for a given table) is referred to as the associated table's explicit clustering index (or, more commonly, as just the clustering index). Suppose, however, that none of a table's indexes has the CLUSTER attribute? What then? In that case, the table will have an implicit clustering index. That will be the index that was the first one defined on the table.
The effect of an implicit clustering index on DB2 processing is the same as that of an explicit clustering index:
- When a new row is to be added to a table that lacks an explicit clustering index, the target data page for the row (i.e., the page into which it should go if optimal data clustering is to be maintained) will be determined via the table's implicit clustering index.
- When a tablespace is reorganized using the IBM DB2 REORG utility, rows unloaded from the tablespace will be sorted based on the key of the implicit clustering index of the table (or tables) stored in the tablespace.
Given that a table with an index will always have a clustering index, and that an implicit clustering index works as does an explicit clustering index, you might think that defining an explicit clustering index on a table is no big deal. I'd disagree with that assessment, and here's why: relying on the status of an index as the "oldest" one on a table to ensure its use as the table's clustering index (if you go the implicit route) could lead to an unexpected situation if that index were to be accidentally dropped. See, if you re-create the dropped index according to the original DDL (which lacked the CLUSTER attribute), it will no longer be the table's implicit clustering index if there are other indexes defined on the table. Why? Because the re-created index will no longer be the oldest one on the table (it will instead be the newest). One of the other indexes on the table will be the "new oldest" one, and that one will be the table's new implicit clustering index. In my opinion, having an explicit clustering index on each of your tables that has an index is a best practice.
I'll conclude this entry with a look at two interesting (to me, anyway) scenarios. First, consider a situation in which a table's implicit clustering index is altered with the addition of a new column (an extension of ALTER INDEX functionality introduced with DB2 for z/OS Version 8). Will it still be the table's implicit clustering index? Yes. The clustering key may have changed by way of the ALTER INDEX... ADD COLUMN operation, but the index is still the oldest one defined on the table.
Scenario two: suppose that index ABC, the first one created on table XYZ, has the CLUSTER attribute. If an ALTER INDEX statement with a NOT CLUSTER specification is subsequently executed for index ABC, does that mean that the index is no longer the table's clustering index? No, that's not what it means. Until such time as an ALTER INDEX statement with a CLUSTER specification is executed for some other index on table XYZ (or until a new index with the CLUSTER attribute is defined on table XYZ), index ABC will remain the table's clustering index (albeit an implicit clustering index) by way of its status as the oldest index defined on the table.
Take out the guesswork, OK? If you are going to have a clustering index on a table (and remember, you will if there are any indexes on the table), label it as such.