Tuesday, June 28, 2016

DB2 for z/OS: Indexes and Foreign Keys

The functionality of DB2 for z/OS is enhanced, in ways small and big, on a regular basis. There's plenty to keep up with, and it is understandable that some product improvements could fly under the radar of a database administrator and escape notice. I encountered this situation a few weeks ago. During a presentation on new features and functions of DB2 11, delivered to an audience comprised mostly of DB2 DBAs, I was asked, "When will we see a lifting of the restriction that requires an index on a foreign key to be defined on just that key in order to be useful for improving referential integrity performance? When will an index that begins with a foreign key's columns, but includes one or more additional columns, be usable as an RI performance booster?"

Do you know what the answer is to that question? It's not actually answerable in an "It will happen..." way, because the dropping of the referenced restriction is something that did happen -- and happened some time ago, at that.

Before going further with this, a little level-setting is in order. Referential integrity (RI) is, of course, a data integrity protection mechanism. Its aim is to prevent "orphan" rows from showing up in the tables of a relational database management system like DB2; so, if you have a CUSTOMERS table and an ORDERS table, and CUST_NUMBER is a column of both tables, RI helps to avoid a situation in which the CUST_NUMBER column of ORDERS contains a value that does not appear in the corresponding column of the CUSTOMERS table. In this example. CUSTOMERS is the parent table in the RI relationship, and ORDERS is the child table (also referred to as a dependent table). CUST_NUMBER in ORDERS is a foreign key. CUST_NUMBER in CUSTOMERS would be a unique key (and maybe that table's primary key).

DB2-managed RI does what it does in part by preventing deletion of a row in the CUSTOMERS table (continuing with the example introduced previously) if that row's CUST_NUMBER value appears in the CUST_NUMBER column of the ORDERS table (assuming that the behavior specified for the referential constraint is DELETE RESTRICT versus DELETE CASCADE). The "child table check" required to enforce a DELETE RESTRICT rule will perform much better if DB2 can verify existence of a matching foreign key value using an index on the child table, as opposed to having to scan the child table itself. And here is where the aforementioned enhancement comes in. For a long time, the usability of an index on a foreign key for "child check" efficiency depended on the index being defined ONLY on the foreign key. In other words, if you defined on the ORDERS table an index on (CUST_NUMBER, ORDER_AMOUNT), that index would not be usable for improving the performance of a CUST_NUMBER "child check" associated with a DELETE targeting a row in the parent CUSTOMERS table. Here's the wording that appeared in the DB2 for z/OS Version 8 Application Programming and SQL Guide, under the heading, "Indexes on foreign keys" (with emphasis added by me):

To let an index on the foreign key be used on the dependent table for a delete operation on a parent table, the columns of the index on the foreign key must be identical to and in the same order as the columns in the foreign key.

That restriction caused some organizations to have more indexes on certain tables than they wanted. If you had, for example, a table T1, with a foreign key COL1, and you needed an index on COL1, COL2 and also wanted to have an index that would improve the performance of the "child check" of a COL1 value associated with a requested DELETE of a row in the parent table of the RI relationship, you needed an index on COL1 (exact match of the foreign key) as well as the index on COL1, COL2. Kind of a bummer, because every index you define on a table makes all DELETEs and all INSERTs involving the table more expensive, and ups the cost of index-processing utilities such as REORG, RUNSTATS, and LOAD (and of of course ups disk space usage).

Lo and behold, starting with DB2 9 for for z/OS (yeah, that far back), things changed in a good way regarding indexes and foreign keys. Here's the wording from the DB2 9 Application Programming and SQL Guide, under the heading, "Defining a foreign key" (again, emphasis has been added by me):

The index on the foreign key can be used on the dependent table for delete operations on a parent table. For the index to qualify, the leading columns of the index must be identical to and in the same order as all columns in the foreign key. The index can include additional columns, but the leading columns match the definition of the foreign key.

And here's a link to a page containing the same information in the DB2 11 Knowledge Center on the Web: http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.perf/src/tpc/db2z_createindexri.html

If this DB2 enhancement made the scene without you realizing it, don't worry -- you're not alone. What's important is what you do with this information, if it's news to you. First, for a new table to be defined in your DB2 for z/OS environment, note the foreign keys (if any). Then, in determining what indexes should be defined on the table, keep in mind that an index on (COL1, COL2, COL3, ...) will be effective as a performance booster for "child checks" for foreign key COL1 (or foreign key COL1, COL2 -- what matters is that the leading column or columns of the index match the column or columns of the foreign key in question) -- you don't have to define a separate index just on the foreign key to get efficient "child checking."

For existing tables, this could be an opportunity for some index pruning that would save disk space and enhance the CPU efficiency of INSERT and DELETE and utility operations (and UPDATEs of indexed columns). Note a table's foreign keys, see if there are indexes on the table that exactly match those foreign keys, and see if there are other indexes on the table with leading columns that match a foreign key. If you find such a situation, consider dropping the index that exactly matches the foreign key if due diligence shows that doing so would not cause any problems; so, if table T1 has a foreign key COL1, COL2 and an index on (COL1, COL2) and another index on (COL1, COL2, COL3), do some analysis. See if any packages are dependent on the index on (COL1, COL2). If any are, could those packages use the index on (COL1, COL2, COL3) and still get good performance? Is there any other reason why the "exact match" index on the foreign key COL1, COL2 is needed? If not, get rid of it and realize CPU and disk space savings. You might well find that this situation exists for multiple of your tables that were created when the "exact match" rule regarding indexes on foreign keys was still in effect.

Bottom line: the indexes you have should be the indexes you need. What's "needed" can change over time. Take that knowledge and use it.