For a long time, the only statistics to be found in DB2 "system" tables were those generated through execution of the RUNSTATS utility. Starting with DB2 Version 7 (around the 2000 time frame), DB2 could automatically and periodically (every 30 minutes, by default) write a variety of statistics about table spaces and index spaces to a pair of tables (one for table space statistics, and one for index space statistics), but only if you created those tables. With DB2 9 in new-function mode, the real-time statistics information was moved from user-created tables to a couple of new catalog tables, SYSIBM.SYSTABLESPACESTATS and SYSIBM.INDEXSPACESTATS (I wrote about that change a few years ago, in an entry posted to the blog that I maintained while working as an independent DB2 consultant, prior to re-joining IBM). In those days, real-time statistics were mainly used by DBAs, via vendor tools or by way of user-written programs (the latter often written in REXX, and sometimes involving invocation of the DB2-supplied stored procedure called DSNACCOX), to make more intelligent decisions regarding the execution of utilities such as REORG and COPY. That's still a very good use of DB2 real-time statistics, but what has interested me more of late -- and what I'm writing about here -- is DB2's increasing use of these statistics in support of new autonomic functions, and a number of newer columns in SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS that serve useful purposes beyond smarter utility management. I'll cover these topics now, in that order.
Increased use of real-time statistics by DB2 itself
Rather than try to provide an exhaustive list of each and every usage by DB2 itself of its own real-time statistics, I'll highlight a few that are particularly interesting to me:
- DB2-managed allocation of utility sort work data sets -- This was a terrific enhancement, in my opinion, and I blogged about it back in 2011. By way of this new feature, built into the DB2 10 base code and retrofitted to DB2 Versions 8 and 9 via PTFs, DB2 utilities such as REORG, LOAD, REBUILD INDEX, and RUNSTATS could dynamically, automatically, and intelligently (using real-time statistics) manage the allocation of the sort work data sets required for their execution. At a growing number of DB2 for z/OS sites, utility JCL and DB2 ZPARM parameter values have been changed to put the DB2 utilities in charge of sort work data set allocation, and every such site that has communicated with me about that change has reported positive results in terms of both utility job reliability (a higher percentage of utility jobs successfully run to completion) and efficiency. That said, at too many sites utility sort work data sets are still being allocated the old way. If that's true where you work, read that aforementioned blog entry and make the changes described therein. It would be a good investment of your time.
- Automatic calculation by REORG of the appropriate fixed hash space for hash-organized table spaces. DB2 10 for z/OS introduced hash-organized tables -- a row-organizing mechanism that enables super-efficient data access for a query that issues a singleton SELECT containing an "equals" predicate that references a particular unique key of the target table. For hash organization of data to be effective, a table space's fixed hash area must be of a sufficient size; however, you don't want to overdo the sizing of this area, because that would waste space (on disk and in memory). A good way to get a "just right" fixed hash area for a hash-organized table space is to let DB2 figure that out, and that's just what the REORG utility will do -- using real-time statistics -- when you run REORG with AUTOESTSPACE YES (the default) for the table space in question.
- DB2-calculated space to accommodate table row length increases caused by UPDATEs. Starting with DB2 11 for z/OS, you can set aside space in a table space's pages that will be reserved exclusively for the accommodation of row length-increasing UPDATE statements (previously, space in a table space's pages could be set aside only for accommodation of INSERTs). If in a CREATE TABLESPACE or ALTER TABLESPACE statement you specify PCTFREE FOR UPDATE -1, for the associated table space the initial amount of space set aside in each page (following a REORG or a LOAD REPLACE utility operation) for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 (for subsequent REORG or LOAD REPLACE jobs) based on real-time statistics information (setting the new ZPARM parameter PCTFREE_UPD to AUTO will make -1 the default value of PCTFREE FOR UPDATE in CREATE TABLESPACE and ALTER TABLESPACE statements).
- Identification of indexes in need of pseudo-deleted index entry clean-up. Starting with DB2 11 for z/OS, DB2 gets much better at physically deleting index entries that had been marked as pseudo-deleted in the course of row-delete operations. To determine which indexes are most in need of pseudo-deleted index entry clean-up (done by a background process), DB2 periodically checks relevant values in the SYSIBM.SYSINDEXSPACESTATS catalog table. You can read more about this in section 4.2 of the IBM redbook titled, "IBM DB2 11 for z/OS Performance Topics" (downloadable at http://www.redbooks.ibm.com/abstracts/sg248222.html?Open).
As DB2 itself is making ever-greater use of real-time statistics, so should you -- and not just to better manage utility operations (though that, as mentioned, continues to be a good use of real-time stats). Below I'll call attention to some of my favorite columns in the two real-time statistics table spaces.
Some of my favorite SYSTABLESPACESTATS columns
- DATASIZE -- The number of bytes occupied by rows in a table space.
- REORGCLUSTERSENS -- Why REORG a table space to reestablish clustering sequence, if that isn't important for the table space? This column provides information to let you make that determination.
- DRIVETYPE -- Got any solid-state drives?
- UPDATESIZE -- This column (new with DB2 11) lets you know of the extent to which a table space is growing (or shrinking) as a result of length-increasing (or length-decreasing) UPDATE operations.
- LASTDATACHANGE -- When was data in the table space last modified?
- REORGNEARINDREF and REORGFARINDREF -- If you set aside space in a table space's pages for length-increasing UPDATE operations (a new option, as previously noted, with DB2 11), did that action reduce the occurrence of indirect references that are the result of row relocations caused by update-related row-length increases?
Some of my favorite SYSINDEXSPACESTATS columns
- LASTUSED -- A great way to identify indexes that aren't doing you any good, and could therefore be dropped (following some due diligence on your part) to free up disk space and to reduce the CPU cost of insert and delete (and some update) and utility operations.
- REORGNUMLEVELS -- Are levels being added to, or removed from, any of your indexes?
- DRIVETYPE -- Same information as the identically-named column in SYSTABLESPACESTATS, but for index spaces.
- REORGLEAFFAR -- Are index page splits causing index leaf pages to be located far from where they optimally should be in an index? If so, would a larger page size make sense for this index?
I encourage you to look over all of the columns of SYSTABLESPACESTATS and SYSINDEXSPACESTATS. I'm pretty sure that you'll find some that will be helpful to you. Take advantage of this information. DB2 is.
Z/OS V11 -
Was wonder if you had any thoughts on this.
Have a 7 partition tablespace where the the number of rows will not get populated in the SYSTABLESPACESTAS table.
Some of the partitions have the correct number of rows but a couple of them are not even close. This makes out REORGD fail because DB2 is trying to allocated such small datasets.
Running full Runstats against Tablespace does not update these couple of partitions in SYSTABLESPACESTATS.
Tom Prus Farmers Insurance
It's possible, Tom, that this table is populated via INSERTs, as opposed to the LOAD utility. In that case, it may be that TOTALROWS in SYSTABLESPACESTATS will only show correct values for a partition if a "base" value has been established via execution of REORG or RUNSTATS. you indicate that REORGs have failed to complete successfully (at least for some of the partitions of the PBG table space). I believe that RUNSTATS will provide the correct TOTALROWS value if it is executed with UPADTE ALL (the default) and if it is run with SHRLEVEL REFERENCE without sampling (that SHRLEVEL REFERENCE part may not be feasible in your environment).Delete
A thought: manually update the TOTALROWS value, via UPDATE, for rows where the values is way off. That should allow a REORG to then run successfully, as sort work data sets will be properly sized, and the successful REORG will establish the needed TOTALROWS base value.
Two useful documentation links: how utilities affect real-time stats (https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_howutilitesaffectrts.html), and how SQL data-changing statements affect real-time stats (https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_sqlrealtimestats.html).
Hope this helps.
I forgot to mention - I enjoyed meeting you at MDUG conference when you spokeReplyDelete
Thanks so much for the advice. Changing the SHRLEVEL to reference updated the values in SYSTABLESPACESTAS. I wouldn't have guessed that or found that! TomReplyDelete
i have a question about the REORGLEAFFAR value in the RTS.
Do you think this value is also important with SSD?
Maybe slightly less important, Alex, if you are using solid-state drives. The thing is, in my experience Db2 for z/OS synchronous reads are typically serviced out of disk controller cache in a spinning-disk storage subsystem, so they're already handled quite quickly. I've seen relatively small improvements in average wait time per Db2 synchronous read when SSD technology is used versus spinning-disk. The new zHyperLink technology is what can REALLY take wait time per Db2 synchronous read way down - potentially into the 20-40 microsecond range.Delete
On top of this, a focus on synchronous read wait time can overlook the effect on CPU time of increased GETPAGE activity resulting from far-out-of-place index leaf pages. Those extra GETPAGEs are there regardless of the type of storage subsystem technology used.
Robert can you talk some about SQL materialization during an sql statement, when does and does not materialize the data for a result set?ReplyDelete
Kind of a broad topic, there. The short answer is that Db2 for z/OS does materialization when it needs to. Suppose, for example, that execution of a query will involve a sort operation, perhaps to satisfy an ORDER BY specification for which a matching index does not exist. In that case, the result set rows will be materialized because they have to be sorted, and they will be fetched from the sorted set of rows.Delete
Some merge, outer and star joins can require materialization, as can non-correlated subqueries. Likewise, table expressions and views sometimes have to be materialized. Information on materialization related to views and table expressions can be found in the Db2 for z/OS online documentation, at https://www.ibm.com/docs/en/db2-for-zos/12?topic=access-materialization. Information on how to use EXPLAIN to determine when a query's access plan involves materialization can be found at https://www.ibm.com/docs/en/db2-for-zos/12?topic=access-using-explain-determine-when-materialization-occurs.