Friday, April 26, 2019

Db2 12 for z/OS Statistics Profiles: Just What the Optimizer Ordered

Here is an interesting story for you: not long ago, an analytics-type query was executed on a Db2 for z/OS system that had recently been migrated to Db2 12. The query was cancelled after it had run for 23 hours. A DBA noticed that there was a row for a table targeted by the query in SYSIBM.SYSTABLES_PROFILES in the Db2 catalog. He ran RUNSTATS for the table, specifying USE PROFILE so that the utility would pick up the options specified in the aforementioned statistics profile. The DBA observed that the RUNSTATS job "ran a little longer than usual," an indication that the utility was doing something it hadn't previously done in generating statistics for the target table. The formerly very-long-running query was subsequently re-submitted, and it successfully ran to completion in 5 minutes. Do I have your attention? Good - read on.

First, the dramatic improvement in query response time described above is not about magic. It's about statistics. Db2 for z/OS has a very sophisticated query optimizer (IBM invented cost-based SQL statement optimization more than 35 years ago), but to do its job well that optimizer needs the right statistics related to tables accessed by queries. How important are those statistics? This important: the team at the IBM Support Center that works on problems involving query performance has said for years that over 75% of these problems are resolved not by rewriting the query, not by creating or modifying an index, but by providing the Db2 optimizer with the statistics it needs to generate a well-performing access plan.

Knowing that, you might think, "OK, so how do I know what statistics the optimizer needs in order to generate the access plan that will deliver the best-possible performance for a query?" And there, traditionally, has been the rub: it can be tricky to get this right, especially for a complex query. If you have the right skills and experience, you can pore over a query access plan as rendered by a visual EXPLAIN tool, and make a good determination of the statistics, currently absent from the catalog, that would appear to be useful to the optimizer for the case at hand. Alternatively, you could "go big" on statistics-gathering for your tables, and instruct RUNSTATS (or LOAD or REORG when inline statistics-gathering is requested) to generate every possible item of statistical data for your tables, but that would substantially increase the CPU cost of utility execution. There's also the option of using a tool that has a statistics recommendation feature (an example of a product in that space is IBM's Db2 Query Workload Tuner for z/OS). Or - and this is new with Db2 12 - you can let Db2 take care of this.

Let me expand on what Db2 12 has introduced with regard to getting catalog statistics right. Db2 11 for z/OS introduced a new catalog table, called SYSIBM.SYSSTATFEEDBACK, into which Db2 places information about statistics seen by the optimizer, in the course of generating query access plans, as being absent or inconsistent (regarding the latter, consider that one can insert statistical information into catalog tables on one's own, perhaps causing an inconsistency - a value in one column that should reflect the value in another column, but doesn't). That's fine, but sometimes that statistics feedback data would just sit there - you could look at it and use it (if you correctly interpreted what you saw), or a tool could read and act on it, but Db2 itself wouldn't do anything with SYSSTATFEEDBACK data. That changed with Db2 12: in a Db2 12 system (assuming that the new STATFDBK_PROFILE parameter in ZPARM is set to the default value of YES), when data is written to SYSIBM.SYSSTATFEEDBACK, data is also written to SYSIBM.SYSTABLES_PROFILES - either a statistics profile row will be added for a table, or an existing statistics profile row will be updated. That new or updated statistics profile row provides the RUNSTATS specifications that will gather the statistics that the Db2 query optimizer itself has indicated that it needs to do what it does to best possible effect.

AND, when I say, "RUNSTATS specifications," don't think that a Db2 12-generated statistics profile is usable only with RUNSTATS. In a Db2 12 environment you can also specify USE PROFILE when executing LOAD or REORG - something you couldn't do in a Db2 11 system.

What if you want to see what kind of statistics profiles Db2 12 is generating? That information is in the PROFILE_TEXT column of SYSIBM.SYSTABLES_PROFILES. The data type for the PROFILE_TEXT column is CLOB. You can view data in a CLOB column via a query issued through IBM Data Studio or IBM Data Server Manager (two no-charge, download-able tools); alternatively, you could view PROFILE_TEXT data by issuing a query through SPUFI, if you cast the CLOB column as VARCHAR. In the example below (a query I've executed using SPUFI), I've gone with VARCHAR(500) - you can specify whatever length you need in your environment.

SELECT CAST(SCHEMA AS CHAR(30)) AS SCHEMA,    
  CAST(TBNAME AS CHAR(30)) AS TBNAME,         
  CAST(PROFILE_TEXT AS VARCHAR(500)) AS PROFILE_OPTIONS
  FROM SYSIBM.SYSTABLES_PROFILES
  WHERE...

Given the enhancement described in this blog entry, I think it would be a very good idea to utilize the USE PROFILE option of RUNSTATS (and of LOAD and REORG, as appropriate) in a Db2 12 system. What if you do that when executing RUNSTATS (or LOAD or REORG) for a table, and there is no row for the table in SYSIBM.SYSTABLES_PROFILES? Not to worry: in that case, COLUMN(ALL) INDEX(ALL) will be used for the RUNSTATS (or LOAD or REORG) job (if a RUNSTATS or LOAD or REORG job is executed without a table being named, and if no row(s) for the table(s) in the target table space exist in SYSIBM.SYSTABLES_PROFILES, the job will be executed using options TABLE(ALL) and INDEX(ALL)). What if you have already populated SYSIBM.SYSTABLES_PROFILES via the use of SET PROFILE (or UPDATE PROFILE) in RUNSTATS jobs? Again, not a problem. Those user-developed statistics profiles can still be used in a Db2 12 environment. Keep in mind that Db2 12 will create or update a statistics profile row in SYSTABLES_PROFILES only when the Db2 optimizer determines that statistics needed to properly optimize a query are missing and/or inconsistent. If a statistics profile you generated yourself gathers the statistics that the optimizer needs, Db2 will not change that profile in the catalog.

For a long time, when confronted with a poorly-performing query, Db2 DBAs have slogged through query analysis and tuning, often a time-consuming task. If your Db2 for z/OS system is at Version 12 (and STATFDBK_PROFILE is set to YES in ZPARM), I'd recommend a different approach in the face of a query performance problem: first, see if a row (or rows) for the table (or tables) accessed by the query exist in SYSIBM.SYSTABLES_PROFILES. If there is such a row (or rows) in that catalog table, execute RUNSTATS for the table(s) with a USE PROFILE specification. Then re-submit the query. If performance is still not what it needs to be, engage in the usual query analysis process. If RUNSTATS with USE PROFILE does the trick for you, you've saved yourself a lot of time and effort.

Oh, and one more thing: when the query I referenced at the beginning of this blog entry was initially resubmitted following execution of RUNSTATS with USE PROFILE for the target table, its performance did not improve at all. Why? Because the previously-prepared form of the query was still in the Db2 subsystem's dynamic statement cache. The DBA I mentioned ran RUNSTATS with INVALIDATECACHE YES to invalidate that prepared form of the statement, and when the query was subsequently submitted it was reoptimized using the statistics generated through the Db2-generated profile, and that's when runtime went from 23-hours-then-cancel to 5 minutes.

Gather catalog statistics using profiles that Db2 generates based on optimizer feedback, and you, too, could realize some nice performance results.