Sunday, December 31, 2017

Db2 for z/OS: Opportunities for CPU Efficiency When Data Clustering is not a Priority

Not long ago, a veteran Db2 for z/OS DBA contacted me with an interesting question. He had, in the system he supported, a table with a clustering index, and he wanted to know how he could get Db2 to act, with regard to INSERT operations and also with regard to executions of the REORG utility, as though the table had no clustering index. In other words, the DBA had a table for which INSERT and REORG CPU efficiency mattered more than having rows physically ordered per a clustering key. That's understandable. There are certainly situations in which the "locality of reference" provided by a clustering index is not of significant value.

You might think to yourself, "This is not a question that's hard to answer - just issue an ALTER for the table's clustering index, with a NOT CLUSTER specification. Problem solved." Eh, not so fast. ALTER INDEX with NOT CLUSTER is a useful means of changing a table's explicit clustering index from IX1 (or whatever it's called) to IX2 (in that case, an ALTER INDEX IX1 NOT CLUSTER would be followed by an ALTER INDEX IX2 CLUSTER). ALTER INDEX with NOT CLUSTER is not a good way to cause a table to not have a clustering index, because if a table has any index then it will always have a clustering index.

"Hold on, there," some might say. "If IX1 is the clustering index for table T1, and I do an ALTER of IX1 with NOT CLUSTER and do not subsequently alter another index on T1 with a CLUSTER specification, table T1 has no clustering index, right?" Wrong. Given that scenario, T1 does not have an explicit clustering index. It has a clustering index of the implicit variety. And what index is that? Index IX1. "But wait," you might counter, "I just altered IX1 with NOT CLUSTER. How can IX1 still be the table's clustering index?" Because the rules of the Db2 for z/OS game are as follows: when the index that had been table T1's explicit clustering index is altered with NOT CLUSTER, that index will continue to be T1's clustering index - albeit in an implicit sense - until such time as another index on T1 is made the table's explicit clustering index by being altered with a CLUSTER specification."

"OK," says you, "The DBA should drop and re-create the table's indexes, without giving any of them the CLUSTER designation. Presto: a table with no clustering index." Wrong again (with all due respect), says I. The table still has an implicit clustering index. Which index will that be? The first one created for the table (so, if IX2 and IX3 and IX1 are created, in that order, on table T1, and CLUSTER was not specified for any of those indexes, IX2 will be T1's implicit clustering index).

"So, if a Db2 for z/OS table has any indexes at all, it will always have a clustering index?" Yes. I said that a few paragraphs ago, and I meant it. I said it a few years ago in an entry in this blog.

How, then, do you get Db2 to act as though there were no clustering index on T1, with respect to INSERT and/or REORG processing, if clustering for T1 is not a priority for you? I'll first address the REORG part of that question, then the INSERT part.

Until quite recently, you COULDN'T tell Db2 to ignore a table's clustering index when REORGing the associated table space - that index was going to be in the picture, one way or another. All you could do was tell Db2 how the clustering index would affect REORG processing: either the table's data would be unloaded by REORG in the order prescribed by the table's clustering index, or it would be sorted in clustering sequence after having been unloaded in "as-is" sequence (the latter approach is the utility's default - the former is used when REORG is executed with the SORTDATA NO option specified). Db2 11 for z/OS delivered an enhancement whereby REORG can be executed with no regard for a table's clustering index. That enhancement will be in effect if REORG is run with SORTDATA NO in combination with the new (with Db2 11) option RECLUSTER NO. With SORTDATA NO and RECLUSTER NO specified, REORG will run as quickly and efficiently as possible when re-clustering is not the motivation for executing REORG. And when might data re-clustering NOT be a motivation for running REORG? How about when REORG is executed with the DISCARD option to remove a lot of rows from a table in a very CPU-efficient manner (as was the case for the DBA I mentioned in the opening part of this blog entry)? How about when REORG is run to materialize a pending DDL change for a table space, such as a change in page or segment size (and each release of Db2 for z/OS adds things to the list of what can be non-disruptively changed for a table or table space or index via pending DDL - changes effected with an ALTER followed by an online REORG)?

OK, so REORG can be executed in a way that does not take a table's clustering key into account. How about INSERTs? Can they be executed without regard to a table's clustering index? The answer to that question is definitely, "Yes." How is that done? Pretty easy: specify APPEND YES when you create the table (or alter an existing table with APPEND YES). On top of that, if the table is in a universal table space, MEMBER CLUSTER YES should be in effect (through either specification of MEMBER CLUSTER YES in the CREATE TABLESPACE statement, or with MEMBER CLUSTER YES specified for an ALTER of an existing table space).

And here's an added bonus associated with MEMBER CLUSTER YES: it sets you up to exploit the new, ultra-high-efficiency and ultra-high-throughput insert algorithm introduced with Db2 12 for z/OS - an enhancement known as "Insert Algorithm 2." How does that work? I'll let you know in the next entry I write for this blog, which I expect to post in January (and I mean January 2018 - happy new year, folks).

In the spirit of "tell 'em what you told 'em," here's the quick reiteration of the main points made in this blog entry:

  • Any Db2 for z/OS table with at least one index has a clustering index, period.
  • If you want to run REORG as efficiently as you can, and re-clustering data in a table is not important for you, run REORG with SORTDATA NO and RECLUSTER NO.
  • If you want to maximize the efficiency of INSERT operations, and row-clustering is not important for you, use APPEND YES for the table in question (and MEMBER CLUSTER YES for the associated table space, if that table space is of the universal variety).
On to 2018!