Thursday, June 18, 2015

Are You Using DB2 for z/OS Real-Time Statistics? DB2 Is

DB2 for z/OS is making increased use of the information that it keeps in the real-time statistics tables in the DB2 catalog. You should be doing likewise. Read on to learn more.

For a long time, the only statistics to be found in DB2 "system" tables were those generated through execution of the RUNSTATS utility. Starting with DB2 Version 7 (around the 2000 time frame), DB2 could automatically and periodically (every 30 minutes, by default) write a variety of statistics about table spaces and index spaces to a pair of tables (one for table space statistics, and one for index space statistics), but only if you created those tables. With DB2 9 in new-function mode, the real-time statistics information was moved from user-created tables to a couple of new catalog tables, SYSIBM.SYSTABLESPACESTATS and SYSIBM.INDEXSPACESTATS (I wrote about that change a few years ago, in an entry posted to the blog that I maintained while working as an independent DB2 consultant, prior to re-joining IBM). In those days, real-time statistics were mainly used by DBAs, via vendor tools or by way of user-written programs (the latter often written in REXX, and sometimes involving invocation of the DB2-supplied stored procedure called DSNACCOX), to make more intelligent decisions regarding the execution of utilities such as REORG and COPY. That's still a very good use of DB2 real-time statistics, but what has interested me more of late -- and what I'm writing about here -- is DB2's increasing use of these statistics in support of new autonomic functions, and a number of newer columns in SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS that serve useful purposes beyond smarter utility management. I'll cover these topics now, in that order.

Increased use of real-time statistics by DB2 itself

Rather than try to provide an exhaustive list of each and every usage by DB2 itself of its own real-time statistics, I'll highlight a few that are particularly interesting to me:
  • DB2-managed allocation of utility sort work data sets -- This was a terrific enhancement, in my opinion, and I blogged about it back in 2011. By way of this new feature, built into the DB2 10 base code and retrofitted to DB2 Versions 8 and 9 via PTFs, DB2 utilities such as REORG, LOAD, REBUILD INDEX, and RUNSTATS could dynamically, automatically, and intelligently (using real-time statistics) manage the allocation of the sort work data sets required for their execution. At a growing number of DB2 for z/OS sites, utility JCL and DB2 ZPARM parameter values have been changed to put the DB2 utilities in charge of sort work data set allocation, and every such site that has communicated with me about that change has reported positive results in terms of both utility job reliability (a higher percentage of utility jobs successfully run to completion) and efficiency. That said, at too many sites utility sort work data sets are still being allocated the old way. If that's true where you work, read that aforementioned blog entry and make the changes described therein. It would be a good investment of your time.
  • Automatic calculation by REORG of the appropriate fixed hash space for hash-organized table spaces. DB2 10 for z/OS introduced hash-organized tables -- a row-organizing mechanism that enables super-efficient data access for a query that issues a singleton SELECT containing an "equals" predicate that references a particular unique key of the target table. For hash organization of data to be effective, a table space's fixed hash area must be of a sufficient size; however, you don't want to overdo the sizing of this area, because that would waste space (on disk and in memory). A good way to get a "just right" fixed hash area for a hash-organized table space is to let DB2 figure that out, and that's just what the REORG utility will do -- using real-time statistics -- when you run REORG with AUTOESTSPACE YES (the default) for the table space in question.
  • DB2-calculated space to accommodate table row length increases caused by UPDATEs. Starting with DB2 11 for z/OS, you can set aside space in a table space's pages that will be reserved exclusively for the accommodation of row length-increasing UPDATE statements (previously, space in a table space's pages could be set aside only for accommodation of INSERTs). If in a CREATE TABLESPACE or ALTER TABLESPACE statement you specify PCTFREE FOR UPDATE -1, for the associated table space the initial amount of space set aside in each page (following a REORG or a LOAD REPLACE utility operation) for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 (for subsequent REORG or LOAD REPLACE jobs) based on real-time statistics information (setting the new ZPARM parameter PCTFREE_UPD to AUTO will make -1 the default value of PCTFREE FOR UPDATE in CREATE TABLESPACE and ALTER TABLESPACE statements).
  • Identification of indexes in need of pseudo-deleted index entry clean-up. Starting with DB2 11 for z/OS, DB2 gets much better at physically deleting index entries that had been marked as pseudo-deleted in the course of row-delete operations. To determine which indexes are most in need of pseudo-deleted index entry clean-up (done by a background process), DB2 periodically checks relevant values in the SYSIBM.SYSINDEXSPACESTATS catalog table. You can read more about this in section 4.2 of the IBM redbook titled, "IBM DB2 11 for z/OS Performance Topics" (downloadable at http://www.redbooks.ibm.com/abstracts/sg248222.html?Open).
 
As DB2 itself is making ever-greater use of real-time statistics, so should you -- and not just to better manage utility operations (though that, as mentioned, continues to be a good use of real-time stats). Below I'll call attention to some of my favorite columns in the two real-time statistics table spaces.

Some of my favorite SYSTABLESPACESTATS columns
  • DATASIZE -- The number of bytes occupied by rows in a table space.
  • REORGCLUSTERSENS -- Why REORG a table space to reestablish clustering sequence, if that isn't important for the table space? This column provides information to let you make that determination.
  • UPDATESIZE -- This column (new with DB2 11) lets you know of the extent to which a table space is growing (or shrinking) as a result of length-increasing (or length-decreasing) UPDATE operations.
  • LASTDATACHANGE -- When was data in the table space last modified?
  • REORGNEARINDREF and REORGFARINDREF -- If you set aside space in a table space's pages for length-increasing UPDATE operations (a new option, as previously noted, with DB2 11), did that action reduce the occurrence of indirect references that are the result of row relocations caused by update-related row-length increases?
 
Some of my favorite SYSINDEXSPACESTATS columns
  • LASTUSED -- A great way to identify indexes that aren't doing you any good, and could therefore be dropped (following some due diligence on your part) to free up disk space and to reduce the CPU cost of insert and delete (and some update) and utility operations.
  • REORGNUMLEVELS -- Are levels being added to, or removed from, any of your indexes?
  • DRIVETYPE -- Same information as the identically-named column in SYSTABLESPACESTATS, but for index spaces.
  • REORGLEAFFAR -- Are index page splits causing index leaf pages to be located far from where they optimally should be in an index? If so, would a larger page size make sense for this index?

I encourage you to look over all of the columns of SYSTABLESPACESTATS and SYSINDEXSPACESTATS. I'm pretty sure that you'll find some that will be helpful to you. Take advantage of this information. DB2 is.

21 comments:

  1. Robert,
    Z/OS V11 -
    Was wonder if you had any thoughts on this.

    Have a 7 partition tablespace where the the number of rows will not get populated in the SYSTABLESPACESTAS table.

    Some of the partitions have the correct number of rows but a couple of them are not even close. This makes out REORGD fail because DB2 is trying to allocated such small datasets.

    Running full Runstats against Tablespace does not update these couple of partitions in SYSTABLESPACESTATS.

    Any idea?

    Thanks,
    Tom Prus Farmers Insurance

    ReplyDelete
    Replies
    1. It's possible, Tom, that this table is populated via INSERTs, as opposed to the LOAD utility. In that case, it may be that TOTALROWS in SYSTABLESPACESTATS will only show correct values for a partition if a "base" value has been established via execution of REORG or RUNSTATS. you indicate that REORGs have failed to complete successfully (at least for some of the partitions of the PBG table space). I believe that RUNSTATS will provide the correct TOTALROWS value if it is executed with UPADTE ALL (the default) and if it is run with SHRLEVEL REFERENCE without sampling (that SHRLEVEL REFERENCE part may not be feasible in your environment).

      A thought: manually update the TOTALROWS value, via UPDATE, for rows where the values is way off. That should allow a REORG to then run successfully, as sort work data sets will be properly sized, and the successful REORG will establish the needed TOTALROWS base value.

      Two useful documentation links: how utilities affect real-time stats (https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_howutilitesaffectrts.html), and how SQL data-changing statements affect real-time stats (https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_sqlrealtimestats.html).

      Hope this helps.

      Robert

      Delete
  2. I forgot to mention - I enjoyed meeting you at MDUG conference when you spoke

    ReplyDelete
  3. Thanks so much for the advice. Changing the SHRLEVEL to reference updated the values in SYSTABLESPACESTAS. I wouldn't have guessed that or found that! Tom

    ReplyDelete
  4. Hello Robert,
    i have a question about the REORGLEAFFAR value in the RTS.
    Do you think this value is also important with SSD?

    Thx
    Alex

    ReplyDelete
    Replies
    1. Maybe slightly less important, Alex, if you are using solid-state drives. The thing is, in my experience Db2 for z/OS synchronous reads are typically serviced out of disk controller cache in a spinning-disk storage subsystem, so they're already handled quite quickly. I've seen relatively small improvements in average wait time per Db2 synchronous read when SSD technology is used versus spinning-disk. The new zHyperLink technology is what can REALLY take wait time per Db2 synchronous read way down - potentially into the 20-40 microsecond range.

      On top of this, a focus on synchronous read wait time can overlook the effect on CPU time of increased GETPAGE activity resulting from far-out-of-place index leaf pages. Those extra GETPAGEs are there regardless of the type of storage subsystem technology used.

      Robert

      Delete
  5. Robert can you talk some about SQL materialization during an sql statement, when does and does not materialize the data for a result set?

    ReplyDelete
    Replies
    1. Kind of a broad topic, there. The short answer is that Db2 for z/OS does materialization when it needs to. Suppose, for example, that execution of a query will involve a sort operation, perhaps to satisfy an ORDER BY specification for which a matching index does not exist. In that case, the result set rows will be materialized because they have to be sorted, and they will be fetched from the sorted set of rows.

      Some merge, outer and star joins can require materialization, as can non-correlated subqueries. Likewise, table expressions and views sometimes have to be materialized. Information on materialization related to views and table expressions can be found in the Db2 for z/OS online documentation, at https://www.ibm.com/docs/en/db2-for-zos/12?topic=access-materialization. Information on how to use EXPLAIN to determine when a query's access plan involves materialization can be found at https://www.ibm.com/docs/en/db2-for-zos/12?topic=access-using-explain-determine-when-materialization-occurs.

      Robert

      Delete
  6. I observed mentioning keycard freqval after update all keyword in runstats card gices rc8 in my job.Why is that?
    RUNSTATS TABLESPACE LIST LIST1
    TABLE ALL
    INDEX ALL
    UPDATE ALL
    KEYCARD FREQVAL NUMCOLS 1 COUNT 15
    INVALID OPERAND 'KEYCARD' FOR KEYWORD 'RUNSTATS'
    FREQVAL NUMCOLS 2 COUNT 15
    SHRLEVEL CHANGE
    UTILITY EXECUTION TERMINATED, HIGHEST RETURN CODE=8

    ReplyDelete
    Replies
    1. I don't know why you got the error, but the KEYCARD keyword of RUNSTATS has been deprecated - you get KEYCARD functionality whether you specify KEYCARD or not, so don't specify it.

      Robert

      Delete
  7. Hello,

    I ran runstats like below for an object ,and later when i checked rts ,it is not reflecting the date and timestamp of below execution runtime.Will only the plain runstats gets recorded/captured in rts? or is it due to invalidate cache option the rundatst is ignored to make it in rts?
    LISTDEF LS INCLUDE TABLESPACE *******

    RUNSTATS TABLESPACE LIST LS INVALIDATECACHE NO
    TABLE (ALL)
    INDEX (ALL)
    REPORT YES
    UPDATE NONE

    ReplyDelete
  8. Thanks Rob.Is the default option -update all?.From manual i see ,update all keyword at top-https://www.ibm.com/docs/en/db2-for-zos/12?topic=runstats-tablespace-syntax-options .I assume if update option not mentioned the default is update all.

    ReplyDelete
    Replies
    1. Yes. Another way that you can make this determination: you will see, in the description of UPDATE on the RUNSTATS syntax and options documentation page, you will see that ALL is underlined. This means that ALL is the default specification for UPDATE.

      Robert

      Delete
    2. Great!! thanks

      Delete
  9. When I ran runstats for index with shrlevel reference.I see below messages in log.
    DSNU124I + 347 07:15:46.43 DSNUSIIX - REGISTER NO SPECIFICATION IS IGNORED
    Is this parm introduced recently? because I have not seen this message in o/p earlier.
    But per manual-https://www.ibm.com/docs/en/db2-for-zos/12?topic=runstats-index-syntax-options,it states only when shrlevel change is in effect this keyword is used.But I used shrlevel ref, then how come it takes register yes keyword?
    RUNSTATS INDEX TXS1.IXS1
    SHRLEVEL REFERENCE

    ReplyDelete
    Replies
    1. https://datasheet.datasheetarchive.com/originals/crawler/redbooks.ibm.com/5adc5a528137a5778b56ecf466edeede.pdf
      "A new keyword called REGISTER for RUNSTATS utility is enabled to be used if SHRLEVEL
      CHANGE is specified. It controls whether to register pages read by RUNSTATS utility with
      SHRLEVEL CHANGE to Coupling Facility in a data sharing environment".Looks like even from other ibm sites,it states that it is enabled for shrlevel chnage.

      Delete
    2. You have executed RUNSTATS INDEX. Per my understanding, REGISTER NO is only valid for RUNSTATS TABLESPACE.

      Robert

      Delete
    3. But what about shrlevel option ? That is another thing to ponder.I used shrlevel reference but this option is valid only for shrlevel change. May be does this varies shop wise/instalation setting in each account?

      Delete
    4. Assuming that you're talking about the SHRLEVEL option of RUNSTATS, that will typically be CHANGE, because 1) that's the default and 2) people don't want RUNSTATS to affect data-change access to the object(s) for which RUNSTATS is being executed.

      Robert

      Robert

      Delete