Thursday, March 30, 2023

Db2 for z/OS: If Index FTB Functionality is Disabled at Your Site, It's Time to Reconsider That

Over the course of the past three weeks, I reviewed ZPARM settings (i.e., configuration parameter values) for three different production Db2 12 for z/OS environments at three different sites, and I noticed that index FTB (fast traverse block) functionality had been disabled in all three cases. I recommended to all three associated Db2-using organizations that they change the relevant ZPARM setting to re-enable FTB functionality, after first validating that the fixes for a set of related Db2 APARs have been applied to their Db2 12 code (the changes made by the fixes are part of the base Db2 13 code). My recommendation for you, if the FTB feature of Db2 has been "turned off" at your site, is to do the same: turn it on, after doing the aforementioned check of Db2 software maintenance if you're using Db2 12. In this blog entry, I'll explain what FTB functionality is, why it was disabled at some sites, and why it's time to go from "disabled" to "enabled" where feature deactivation has happened. I'll also provide information about the fixes (PTFs) that should be on your system to ensure the robust functioning of FTB-supporting Db2 code (again, if we're talking about Db2 12 - the base Db2 13 code has the solidified FTB functionality provided by the Db2 12 fixes).


The FTB raison d'etre: efficient use of non-leaf information in Db2 indexes

Db2 for z/OS indexes serve various purposes, such as assisting with maintenance of a desired ordering of rows in a table and ensuring uniqueness of key values for which duplicates would be problematic, but for the most part indexes in a Db2 system are there to speed the execution of queries (and of non-query SQL statements that contain predicates, aka search clauses). Indexes deliver this performance boost by enabling identification of query result set rows without a laborious row-by-row examination of values. It can be said that Db2 indexes provide shortcuts that get you to your destination (a query's result set) faster than would otherwise be possible.

The information in a Db2 index is arranged in what is known as a B-tree structure. The logical representation of this structure has something of the appearance of an upside-down tree: you have the root page at the top, and the leaf pages at the bottom. In-between the root page and the leaf pages of an index (unless the underlying Db2 table is quite small), you will have one or more levels of non-leaf pages. Finding a row in a table by way of an index on the table involves what is known as an index probe operation: Db2 starts at the root page and navigates down through the other non-leaf levels to reach the leaf page that contains the searched-for key value and the ID of the row (i.e., the row ID, or RID in Db2 parlance) or rows in which the key value can be found.

OK, so what is the value of index fast traverse blocks? Well, an index probe involves GETPAGE activity. A GETPAGE is a Db2 request to examine the contents of a page in an index or a table space (when the page in question is not already in a Db2 buffer pool in memory, the GETPAGE drives a read I/O request). The more rows a table has, the more levels an index on the table can have. More index levels means more GETPAGE activity associated with use of the index, and that matters because GETPAGE activity is one of the main determinants of the CPU cost of executing a query. Index fast traverse block functionality, introduced by Db2 12 for z/OS, improves CPU efficiency for query execution by reducing index-related GETPAGE activity.

An FTB reduces index GETPAGE activity by providing Db2 with a way to get to the leaf page of an index in which a query-predicate-matching key value is found without having to perform a top-to-bottom index probe. How that works: when Db2 builds an FTB structure in memory that is based on a given index, Db2 puts in that FTB structure the information in the non-leaf pages of the index (note that this is NOT just a matter of caching the index's non-leaf pages in memory - the FTB structure has a space requirement that is considerably smaller than what would be required to cache the index's non-leaf pages in an as-is manner); furthermore, navigation through an FTB structure does not require GETPAGE activity. Yes, FTB navigation does involve some instruction path length, but less than would be needed for the index GETPAGEs that would otherwise be required to get to a target leaf page. Let's say that an index on a large table has five levels. Retrieving a table row via the index will require six GETPAGEs - five for the index and one for the table space. If, on the other hand, Db2 has built an FTB structure from the index, when a query having a predicate that matches on the index's key is executed then Db2 can go to the FTB structure with the key value referenced in the predicate, and the FTB will tell Db2, "This is the leaf page in which you'll find that key value." Db2 then does one GETPAGE to examine that leaf page's contents, finds the key value and the associated RID, and does one more GETPAGE to access the row in the table space. Thanks to the FTB, we've gone from six GETPAGEs (five for the index and one for the table space) to two GETPAGEs (one for the index leaf page, one for the table space). Pretty good.


How is FTB functionality turned off, and why would anyone do that?

The FTB "on/off switch" is the ZPARM parameter INDEX_MEMORY_CONTROL. The default value for that parameter is AUTO. When INDEX_MEMORY_CONTROL is set to AUTO, Db2 notes the size of the subsystem's buffer pool configuration (i.e., the aggregate size of the subsystem's buffer pools) and says (figuratively speaking), "OK, I can create FTB structures from indexes, and the maximum amount of in-memory space I'll use for those FTB structures is equivalent to 20% of the size of the buffer pool configuration." Note that this is not space taken away from the buffer pools - it's net additional use of the z/OS LPAR's real storage by Db2. Consider an example: Db2 subsystem DB2P has 50 GB of buffer pools. If INDEX_MEMORY_CONTROL for DB2P is set to AUTO, DB2P can use up to 10 GB (20% times 50 GB) of memory for index FTBs. The size of the DB2P buffer pool configuration is not affected by FTBs - it remains at 50 GB. Got it?

Besides AUTO, another acceptable value for INDEX_MEMORY_CONTROL is an integer between 10 and 200,000. That would set the FTB memory usage limit in terms of megabytes. Using the previous example, if the z/OS LPAR in which subsystem DB2P is running is generously configured with memory, the organization might decide to set INDEX_MEMORY_CONTROL to 20000 if they want Db2 to be able to use up to about 20 GB of memory for index FTBs, versus the 10 GB limit established via the AUTO setting (20% of the 50 GB size of the buffer pool configuration assumed for the example). If, on the other hand, the z/OS LPAR's memory resource is quite limited, the organization might opt to set INDEX_MEMORY_CONTROL to 1000, to restrict DB2P's use of memory for index FTBs to about 1 GB (I say, "about," because 1 GB is actually 1024 MB).

INDEX_MEMORY_CONTROL can also be set to DISABLE. That has the effect of turning FTB functionality off. Why would someone disable a CPU efficiency-boosting Db2 feature? Well, relatively early on in the lifecycle of Db2 12 for z/OS (which became generally available in October of 2016), a few sites encountered some issues related to index FTB functionality. In some cases, use of an FTB was seen to cause a query to return incorrect output. These situations were pretty uncommon (recall that index FTB functionality is on by default, and most Db2 12 sites with INDEX_MEMORY_CONTROL set to AUTO encountered no problems in leveraging the technology), but they were real. Some organizations heard that other organizations had had some problems related to FTB usage, so they disabled the feature as a preemptive measure. I get it.


Why using FTB functionality makes sense now

In response to the FTB-related issues mentioned above, the IBM Db2 for z/OS development team created a number of code fixes that addressed the problems reported by Db2-using organizations. These fixes and their associated APARs (an APAR is an official description of a software problem for which IBM commits to providing corrective service) are noted in a blog entry, written by members of the Db2 development organization, that can be viewed at https://community.ibm.com/community/user/datamanagement/blogs/paul-mcwilliams1/2020/10/08/new-look-ftb-db2-12. If INDEX_MEMORY_CONTROL is set to DISABLE at your site, and if you are using Db2 12 for z/OS, check to see if the PTFs listed in this blog entry have been applied to your Db2 code. If they have been applied (or if you are using Db2 13), you can use index FTB functionality with confidence. If you are using Db2 12 and the fixes have not been applied in your environment, my recommendation is to get them applied, perhaps as part of a roll-out of a new and more-current level of z/OS software maintenance at your site.

The confidence that the IBM Db2 for z/OS development team has in FTB functionality, with the corrective maintenance applied, is evidenced by a couple of things. First, Db2 12 function level 508 extended FTB functionality to non-unique indexes (it had originally been limited to unique indexes). Second, Db2 13 for z/OS makes FTB functionality available for a larger set of indexes by doubling the key-length limit for FTB-qualifying indexes - from 64 bytes to 128 bytes for unique indexes, and from 56 bytes to 120 bytes for non-unique indexes (as previously mentioned, the code corrections made for Db2 12 by the FTB-related fixes listed in the above-referenced blog entry are part of the Db2 13 base code). The Db2 development team would not have made FTB functionality available for a wider range of indexes if they were anything less than highly confident in the quality of the FTB-supporting code.

Note that if you have INDEX_MEMORY_CONTROL set to DISABLE, and you're interested in turning FTB functionality on but would like to do so in a more-controlled and more-limited way before going to a setting of AUTO, that option is available to you. As noted in the blog entry for which I provided the link, above, and in the Db2 12 and Db2 13 online documentation, you can tell Db2, via a specification of (SELECTED, AUTO) or (SELECTED, n) for INDEX_MEMORY_CONTROL (where n would be a user-designated limit, in MB, on the memory that Db2 can use for FTB structures), that FTB structures can be built only for indexes that you have identified as FTB candidates by way of the SYSINDEXCONTROL table in the Db2 catalog.

In summary, if you have the FTB-solidifying fixes applied in your Db2 12 environment, or if you are running with Db2 13, and you have INDEX_MEMORY_CONTROL set to DISABLE, you should rethink that. The current FTB code is very robust, and if you don't leverage the functionality then you're leaving CPU savings on the table. I'd prefer to see you realize those CPU savings.

Thursday, February 23, 2023

Two RMF (z/OS Monitor) Reports with which Db2 for z/OS People Should be Familiar

When it comes to analyzing the performance of a Db2 for z/OS system and associated applications, I think of a set of concentric circles. The outermost circle represents the operational environment in which Db2 is processing work - that would be the z/OS LPAR (logical partition) in which the Db2 subsystem runs. The next circle within that outer one represents the Db2 subsystem itself - its buffer pool configuration, EDM pool, RID pool, lock manager, recovery log, etc. The innermost of these concentric circles relates to the applications that access Db2-managed data. With the three circles in mind, I take an "outside-in" approach to Db2 system and application tuning. In other words, I begin with a look at the z/OS system within which Db2 is running, then I turn to the Db2 subsystem itself and lastly I analyze application-centric information. The reason for this approach? If the z/OS system in which Db2 is running is constrained in some way, there's a good chance that Db2 subsystem and application tuning actions will yield little positive impact. Similarly, if the Db2 subsystem is operating in a constrained fashion then application tuning actions may not do much good.

So, if assessing the operation of a z/OS system is important prior to turning to Db2 subsystem and/or application performance analysis, how do you determine whether the z/OS LPAR in question is running in a constrained or an unconstrained way? I do this based on examination of information in two RMF reports: the Summary report and the CPU Activity report. If you support a Db2 for z/OS system, you should be familiar with the content of these reports - in particular, some key metrics provided by the reports. In this blog entry I'll point out those key metrics and explain how I use them.

[Note: I am referring to reports generated by IBM's RMF z/OS monitor because RMF is the z/OS monitor with which I am most familiar. If your organization uses another vendor's z/OS monitor, that monitor might be able to generate reports similar to the RMF reports that are the subject of this blog entry - if need be, check with the vendor on that.]


The RMF CPU Activity report

RMF, by default (you can change this), carves the time period covered by a CPU Activity report into 15-minute intervals (so, if you requested a report for a one-hour time period you will see within that report four sub-reports, each providing information for a 15-minute part of the overall one-hour time period). Within a given 15-minute interval you will see, for the z/OS LPAR for which the report was requested, information that looks something like this (I highlighted two important values in red):


---CPU---    ---------------- TIME % ----------------
NUM  TYPE    ONLINE    LPAR BUSY    MVS BUSY   PARKED
 0    CP     100.00    87.03        86.85        0.00
 1    CP     100.00    77.76        77.68        0.00
 2    CP     100.00    83.88        83.78        0.00
 3    CP     100.00    87.07        86.91        0.00
 4    CP     100.00    76.23        76.14        0.00
 5    CP     100.00    76.79        76.71        0.00
 6    CP     100.00    80.45        80.35        0.00
 7    CP     100.00    73.29        73.24        0.00
 8    CP     100.00    63.83        69.22        0.00
 9    CP     100.00    57.78        62.95        0.00
 A    CP     100.00    35.28        48.33       17.01
TOTAL/AVERAGE          72.67        75.16
12    IIP    100.00    66.63        58.68        0.00
                                    46.30        0.00
13    IIP    100.00    26.70        23.42        0.00
                                    18.24        0.00
14    IIP    100.00     9.21         8.07        0.00
                                     6.42        0.00
3E    IIP    100.00     0.00        -----      100.00
                                    -----      100.00
TOTAL/AVERAGE          25.64        26.86


Here is an explanation of what you see in the report snippet above:
  • NUM - This is the ID of a given "engine" (processor core) configured for the LPAR.
  • TYPE - CP is short for central processor (typically referred to as a "general-purpose engine"); IIP is short for integrated information processor (typically referred to as a "zIIP engine").
  • LPAR BUSY - Engine utilization from the LPAR perspective.
  • MVS BUSY - I think of this as the busy-ness of the physical engine - if the engine is used exclusively (or almost exclusively) by the LPAR in question, the LPAR busy and MVS busy numbers should be very similar.
  • PARKED - This is the extent to which an engine's capacity is NOT available to the LPAR during the 15-minute interval (so, if the engine is seen to be 75% parked then the LPAR has access to 25% of that engine's processing capacity). When engines in a mainframe "box" (sometimes called a CEC - short for central electronic complex) are shared between several LPAR's it's not unusual to see a non-zero parked value for at least some of an LPAR's engines).
Note that for this LPAR, there are two MVS BUSY values for each zIIP engine. Why is that? Well, it indicates that the zIIP engines are running in SMT2 mode. SMT2 is short for simultaneous multi-threading 2, with the "2" meaning that z/OS can dispatch two pieces of work simultaneously to the one zIIP core. Running a zIIP engine in SMT2 mode does not double the engine's capacity (each of the two pieces of work dispatched to the one zIIP core will not run as fast as would be the case if the zIIP engine were running in "uni-thread" mode), but for a transactional workload SMT2 can enable a zIIP engine to deliver around 25-40% more throughput versus uni-thread mode (think of a one-way, single-lane road with a speed limit of 60 miles per hour versus a one-way, 2-lane road with a speed limit of 45 miles per hour - the latter will get more cars from A to B in a given time period if there's enough traffic to take advantage of the two lanes). For more information on SMT2 mode for zIIPs, see the entry on that topic that I posted to this blog a few years ago.

OK, to the values highlighted in red in the report snippet:
  • TOTAL/AVERAGE MVS BUSY for the general-purpose engines (75.16 in the report snippet) - As a general rule, application performance - especially for transactional applications (e.g., CICS-Db2, IMS TM-Db2, Db2 DDF) - will be optimal if average MVS busy for an LPAR's general-purpose engines does not exceed 80%. When average MVS busy for the general-purpose engines exceeds 90%, you can see a negative impact on the performance of Db2-accessing applications in the form of what is labeled "not accounted for" time in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information. Not-accounted-for time is in-Db2 elapsed time that is not CPU time and not one of the "known" Db2 wait times (those are so-called class 3 wait times, such as wait for database synchronous read, wait for lock/latch, wait for update/commit processing, etc.). It's literally elapsed time, related to SQL statement execution, for which Db2 cannot account. In my experience, in-Db2 not-accounted-for time is most often a reflection of wait-for-dispatch time, which itself is indicative of CPU contention. I'm generally not too concerned about not-accounted-for time as long as it's less than 10% of in-Db2 elapsed time for an application workload - particularly when it's a higher-priority transactional workload (you might tolerate a higher percentage of not-accounted-for time for a lower-priority batch workload). If not-accounted-for time exceeds 10% of in-Db2 elapsed time (again, especially for a higher-priority transactional workload), that would be a matter of concern for me, indicating that CPU contention is negatively impacting application throughput.
  • TOTAL/AVERAGE MVS BUSY for the zIIP engines (26.86 in the report snippet) - How "hot" can you run zIIP engines before zIIP engine contention becomes a concern? That depends on how many zIIP engines the LPAR has (and, to a lesser extent, whether or not the zIIPs are running in SMT2 mode). The more zIIP engines an LPAR has, the higher the average MVS busy figure for those engines can go before zIIP contention becomes an issue (in the example shown above, the LPAR has three zIIP engines that are running in SMT2 mode - in such a situation average MVS busy for the zIIP engines could probably go to 40-50% without zIIP contention becoming an issue). And when does zIIP contention become an issue? When the zIIP spill-over percentage gets too high, as explained in an entry I posted a few years ago to this blog. [Note: the report snippet shows four zIIP engines, but the fourth - the one identified as processor number 3E - is 100% parked from the LPAR's perspective. That means the LPAR had no access to zIIP processor 3E's capacity, so in effect the LPAR had three zIIP engines during the time interval.]
Below the information shown in the report snippet above, you'll see a sideways bar chart that looks something like this (again, I've highlighted some key information in red):


-----------------------DISTRIBUTION OF IN-READY WORK UNIT QUEUE-
 NUMBER OF              0    10   20   30   40   50   60   70   
 WORK UNITS     (%)     |....|....|....|....|....|....|....|....

<=  N          55.9     >>>>>>>>>>>>>>>>>>>>>>>>>>>>
 =  N +   1     3.5     >>
 =  N +   2     3.1     >>
 =  N +   3     3.5     >>
<=  N +   5     5.5     >>>
<=  N +  10    10.9     >>>>>>
<=  N +  15     5.7     >>>
<=  N +  20     4.2     >>>
<=  N +  30     3.1     >>
<=  N +  40     1.5     >
<=  N +  60     1.3     >
<=  N +  80     0.4     >
<=  N + 100     0.2     >
<=  N + 120     0.1     >
<=  N + 150     0.2     >
>   N + 150     0.2     >

N = NUMBER OF PROCESSORS ONLINE UNPARKED (16.8 ON AVG)


With regard to the report snippet above, the first thing to which I want to draw your attention is the bottom-line information about "N". We see that, for this LPAR during this 15-minute interval, N = 16.8. What does that mean? It means that the LPAR had "16.8 processor targets to which pieces of work could be dispatched." Why do I use the phrase "processor targets" instead of "processors?" It's because we tend to think of "mainframe processors" as meaning "mainframe engines," and that's not quite the case here. This report snippet goes with the first one we looked at (the second snippet appears right after the first one in the source RMF CPU Activity report), and you might recall that the first snippet showed that the LPAR's three zIIP engines are running in SMT2 mode. For that reason, those three zIIP engines are counted as six processor targets to which pieces of work can be dispatched. Thus, when the report shows that N = 16.8, we can say that 6 of the 16.8 relate to the LPAR's zIIP engines. That leaves 10.8 (16.8 - 6 = 10.8). We've accounted for the zIIP engines, so the 10.8 number relates to general-purpose engines. Where does that 10.8 come from? Refer again to the first report snippet. You'll see that the LPAR had 10 general-purpose processors that were not at all parked (i.e. that were 0% parked from the LPAR's perspective). An 11th general-purpose engine, identified as processor number A, was 17.01% parked during the interval. That means that 83% of the capacity of general-purpose processor number A (that's a hexadecimal A) was available to the LPAR during the time interval. That 83% is equivalent to 0.83, and RMF rounds 0.83 down to 0.8, and that's where the ".8" of N = 16.8 comes from. So, then, the LPAR had 6 zIIP "targets" to which work could be dispatched (3 engines, each running in SMT2 mode), and 10.8 general-purpose targets to which work could be dispatched, and that's why we have N = 16.8.

With N now understood, we can turn our attention to the other bit of information I highlighted in red: <= N 55.9. What does that mean? It means that for 55.9% of the time in the 15-minute report interval, the number of "in and ready" tasks (i.e., the number of tasks ready for dispatch) was less than or equal to the number of processor targets to which pieces of work in the LPAR could be dispatched. When that is true - when the number of in-and-ready tasks is <= N - there is essentially nothing in the way of CPU constraint, because an in-and-read task won't have to wait in line to get dispatched to a processor. In my experience, when the <= N figure is above 80%, the LPAR is very unconstrained in terms of processing capacity. A figure between 50% and 80% is indicative of what I'd call moderate CPU constraint, and performance (particularly in terms of throughput) is likely not impacted much by a lack of processing capacity. When the figure is below 50%, I'd say that CPU constraint could be impacting throughput in a noticeable way, and if it's below 10% the performance impact of CPU constraint for the LPAR could be severe. As previously mentioned, the Db2 performance impact of a CPU-constrained system is typically apparent in elevated levels of in-Db2 not-accounted-for time, as seen in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information.

One more thing about an RMF CPU Activity report: the number of engines configured for an LPAR - something that is shown in the report - should be balanced by an adequate amount of memory (aka real storage) so that the LPAR's processing power can be fully exploited to maximize application performance. For a z/OS LPAR in which a production Db2 workload runs, my rule of thumb, based on years of analyzing system and application performance data, is this: the LPAR should have at least 20 GB of memory per engine with which it is configured. The first report snippet included above shows that the LPAR has 13.8 engines: 10.8 general-purpose engines (as previously mentioned, the ".8" relates to an engine that is about 20% parked from the LPAR's perspective) and 3 zIIP engines (and for balanced-configuration purposes, I count physical zIIP cores - I don't double-count a zIIP engine because it is running in SMT2 mode). I'd round the 13.8 to 14 (the nearest integer) and say that on that basis the LPAR should have at least 14 X 20 GB = 280 GB of memory. If that seems like a lot to you, it shouldn't - mainframe memory sizes are getting bigger all the time, and real storage resources in the hundreds of GB are no longer unusual for production z/OS LPARs, especially those in which Db2 workloads run (the biggest real storage size I've personally seen for a z/OS LPAR is about 1100 GB).


The RMF Summary report

An RMF Summary report is smaller in size than a CPU Activity report - typically, one line of information for each 15 minute interval within the report time period. An RMF Summary report for a one-hour period could look something like what you see below (I removed some columns so that I could use a big-enough-to-read font size - the really important column is the one on the far right, with the numbers highlighted in green):

 
NUMBER OF INTERVALS 4      TOTAL LENGTH OF INTERVALS 00.59.58
-DATE   TIME     INT  ... JOB  JOB  TSO  TSO  STC ... SWAP DEMAND
 MM/DD HH.MM.SS MM.SS ... MAX  AVE  MAX  AVE  MAX ... RATE PAGING
 11/03 09.15.00 15.00 ...  83   72   96   92  371 ... 0.00   0.00
 11/03 09.30.00 14.59 ...  85   68   98   95  369 ... 0.00   0.00
 11/03 09.45.00 15.00 ...  75   68   95   92  363 ... 0.00   0.00
 11/03 10.00.00 14.59 ...  82   70   94   91  365 ... 0.00   0.00
-TOTAL/AVERAGE        ...  85   69   98   93  371 ... 0.00   0.00


So, what's the LPAR's demand paging rate? It's the rate, per second, at which pages that had been moved by z/OS from real to auxiliary storage (to make room for other pages that needed to be brought into real storage) were brought back into real storage on-demand (i.e., because some process needs to access the page). Why is the demand paging rate important? Here's why: it is, in my opinion, the best indicator of whether or not memory usage can be expanded without putting too much pressure on the LPAR's real storage resource. Here's what I mean by that: suppose you have a Db2 buffer pool that has a total read I/O rate (synchronous reads + sequential prefetch reads + list prefetch reads + dynamic prefetch reads, per second) that's higher than you like - maybe the total read I/O rate for the pool is north of 1000 per second, and you want to bring that down substantially to boost application performance and CPU efficiency (every read I/O eliminated saves CPU and elapsed time). The best way to lower a buffer pool's total read I/O rate is to make the pool larger. Can you do that without putting too much pressure on the LPAR's real storage resource? Here's what I'd say: If the LPAR's demand paging rate is consistently less than 1 per second, you have a green light for using more memory to get a performance boost. If the LPAR's demand paging rate is consistently greater than 1 per second, I'd hold off on using more memory until the LPAR is configured with additional real storage. This goes for any action that would increase memory usage by DB2 - besides enlarging a buffer pool, that could be a RID pool or a sort pool or an EDM pool size increase, or increasing the use of RELEASE(DEALLOCATE) packages with threads that persist through commits, or whatever. Before doing something that will increase memory usage, check the LPAR's demand paging rate.

That's it for now. If you haven't had a look at these two RMF reports before, get them for an LPAR of interest to you and give them a look-see - a z/OS systems programmer at your site would probably be happy to generate the reports for you. Knowing the key utilization and configuration characteristics of the z/OS LPAR in which a Db2 subsystem runs is an important part of effective performance management of the Db2 environment.

Friday, January 20, 2023

Db2 for z/OS: What is "Wait for Other Read" Time, and What Can You Do About It?

A recent conversation I had with some folks who support a large Db2 for z/OS system reminded me of the importance of something called "wait for other read time." In this blog entry I want to make clear to people what Db2 wait-for-other-read time is, why it's important, how to monitor it and what to do about it if it becomes an issue.


What is Db2 for z/OS "wait for other read" time?

In Db2 performance monitoring parlance, time associated with SQL statement execution is known as "in-Db2" time. It's also called "class 2" time, because it is recorded, for monitoring purposes, in records that are generated when Db2 accounting trace class 2 is active. Class 2 elapsed time (elapsed time pertaining to SQL statement execution) has two main components: CPU time (some of which is consumed on so-called general-purpose processors - aka "engines" - of a mainframe server, and some of which might be consumed on what are known as zIIP engines) and suspend time (on a busy system there can be another component of in-Db2 time, called "not accounted for" time, that generally reflects wait-for-dispatch time). In-Db2 suspend time is also known as "class 3" time, because it is recorded in Db2 accounting trace records when accounting trace class 3 is active. Class 3 time is broken out in a number of categories, and these show up in an accounting long report that might be generated by your Db2 monitor, or by a Db2 monitor's online display of thread detail information.

In a Db2 monitor-generated accounting long report, class 3 suspend times are shown as "average" values. Average per what? Well, if you're looking at information for a Db2-accessing batch workload (referring to jobs that run in z/OS JES initiator address spaces and access Db2 by way of Db2's call attachment facility or TSO attachment facility), it'll be average per batch job (generally speaking, activity for one batch job will be recorded in one Db2 accounting trace record). If you're looking at a transactional workload (e.g., a CICS-Db2 workload, or a Db2-accessing IMS transactional workload, or a DDF client-server workload), the "average" values seen in a Db2 monitor-generated accounting long report will typically be average per transaction.

In many cases, the majority of in-Db2 time for a batch or a transactional workload will be class 3 suspend time (it is a little unusual, but certainly not unheard of, for a Db2 workload's in-Db2 time to be mostly CPU time). More often than not, the largest component of in-Db2 class 3 suspension time will be wait-for-synchronous-database-read time. Another wait-for-read time is labeled "wait for other read." What's that? Well, if it's "other than" synchronous read wait time, it must be asynchronous read time, right? Right, indeed. And what are asynchronous reads? Those are prefetch reads: read I/Os driven by Db2 in anticipation that the pages read into memory in bulk in this way will be requested by the process (such as an application process) that prompted Db2 to issue the prefetch read requests. Well, if a prefetch read I/O operation is executed because Db2 is aiming to get pages into a buffer pool in memory before they are requested by (for example) an application process, why would there be such a thing as a process having to wait for a prefetch read to complete?

Wait-for-prefetch read (reported as "wait for other read") happens because there are usually lots of Db2-accessing processes active in a system at one time. Let's call two of these processes process A and process B, and let's say that Db2 is driving prefetch reads (these could be sequential, list or dynamic prefetch reads - more on that in a moment) for process A. We'll further suppose that Db2 needs to access page 123 of table space TS1 on behalf of process B (i.e., Db2 issues a GETPAGE request for page 123 of table space TS1). If page 123 of table space TS1 is not already in the buffer pool to which TS1 is assigned, Db2 will drive a synchronous read request to get that page into memory, right? Not necessarily. It could be that page 123 of TS1 is already scheduled to be brought into memory via a prefetch read that is being executed on behalf of process A. If that is the case then process B will wait for that in-flight prefetch read to complete, and that wait time will be recorded as "wait for other read time" for process B. [It is also possible that process A has gotten to the point that it needs to access page 123 of TS1, and the prefetch read that will bring that page into memory is currently in-flight, and that would end up causing wait-for-other-read time for process A related to the prefetch request being driven on behalf of process A, but I think it's more likely that wait-for-other-read time will be associated with one process waiting on completion of a prefetch read operation that is being executed on behalf of another process.]


Why is wait-for-other-read time important?

Usually, wait-for-other-read time is a relatively small percentage of total class 3 suspend time for a process (it's typically much smaller than wait-for-synchronous-read time), but that's not always the case. In some situations, wait-for-other-read time is a major component of overall in-Db2 suspend time. The performance impact of elevated wait-for-other-read time can be especially significant for batch applications, as these Db2 processes are often particularly reliant on prefetch to achieve elapsed time objectives. If wait-for-other-read time gets too large then service levels could degrade, leading to user dissatisfaction.


How can wait-for-other-read time be monitored?

As mentioned previously, wait-for-other read time is recorded in accounting long (i.e., accounting detail) reports that can be generated by a Db2 performance monitor; so, you can track that for a process or a workload over time and note trends. Besides wait-for-other-read time itself, are there any other related fields in Db2 monitor-generated reports that you should keep your eye on to help ensure that a wait-for-other-read time problem does not sneak up on you? Yes, as explained below.

The "other related fields" that I'd recommend checking out are found in a Db2 monitor-generated statistics long report (i.e., statistics detail report). In such a report you would see, for each buffer pool, a set of fields like those shown below (this is a snippet of a statistics long report generated by the IBM OMEGAMON for Db2 for z/OS performance monitor - I've added some A, B, C labels that I'll subsequently use in referencing various of these fields):


BP1 READ OPERATIONS          QUANTITY  /SECOND
---------------------------  --------  -------

SEQUENTIAL PREFETCH REQUEST   5622.00     3.23   A
SEQUENTIAL PREFETCH READS     5587.00     3.21   B
PAGES READ VIA SEQ.PREFETCH  52950.00    30.43   C
S.PRF.PAGES READ/S.PRF.READ      9.48            D
LIST PREFETCH REQUESTS       47394.00    27.24   E
LIST PREFETCH READS           5876.00     3.38   F
PAGES READ VIA LIST PREFTCH    154.9K    89.03   G
L.PRF.PAGES READ/L.PRF.READ     26.36            H
DYNAMIC PREFETCH REQUESTED     378.3K   217.42   I
DYNAMIC PREFETCH READS         157.6K    90.59   J
PAGES READ VIA DYN.PREFETCH   3110.6K  1787.68   K
D.PRF.PAGES READ/D.PRF.READ     19.73            L

By way of explanation, I'll first point out that what you see above are three repeating sets of fields (4 fields in each set) that pertain to sequential, list and dynamic prefetch activity. Here are thumbnail definitions of these prefetch types:

  • Sequential - Generally speaking, this is the prefetch mode used for table space scans or for non-matching index scans. In other words, if Db2 determines that a front-to-back scan of a table space or index will be required, sequential prefetch will be used (assuming that the table or index in question is not super-small, in which case prefetch of any kind would usually not make sense).
  • List - This is the prefetch type used when Db2 is retrieving table rows based on a list of row IDs (RIDs) that have been retrieved from an index (or from more than one index, if index ANDing or index ORing is part of the access plan for the query). List prefetch can be efficient if the clustering sequence of rows in the target table is substantially uncorrelated with respect to the order of entries in the index in question (the list of RIDs obtained from the index is sorted in ascending RID sequence and then the sorted RID list is used to prefetch pages of associated rows from the target table). The hybrid method of joining tables is another driver of list prefetch activity.
  • Dynamic - This prefetch method is dynamically initiated at statement execution time when Db2 recognizes a sequential pattern of data access as it retrieves rows. Matching index scans are often drivers of dynamic prefetch activity.
OK, so here are a couple of things to keep an eye on, if you want to avoid a surprise situation involving elevated levels of wait-for-other-read time for processes that use prefetch to access pages of objects assigned to a given buffer pool:

  • Prefetch reads relative to prefetch requests - This tends to be more important for list and dynamic prefetch (less so for sequential prefetch, owing to locality of reference being less of a factor in that case). For list and dynamic prefetch, then, compare the number of prefetch reads to the number of prefetch requests (i.e., compare F to E, and J to I, using the letter-labels I added to the statistics report snippet shown above). What's this about? Well, a prefetch request is just that - a request to read a certain chunk of pages from a table space or an index into the assigned buffer pool. Suppose the prefetch request is for 32 pages (the most common quantity), and suppose that all 32 of those pages are already in the buffer pool. In that case, the prefetch request will not drive a prefetch read I/O operation. The larger the number of buffers allocated for a pool, the greater the likelihood that all pages associated with a prefetch request will already be in memory, thereby reducing prefetch reads as a percentage of prefetch requests. If you see the percentage of prefetch reads relative to prefetch requests going up over time for a pool, especially for list and/or dynamic prefetch, that's an indication that elevated levels of wait-for-other-read time could be in the offing. Why? Because more prefetch reads will generally mean more waiting for prefetch reads to complete.
  • The number of pages read per prefetch read - These are the fields labeled D, H and L in the example statistics report snippet. If you see that number going up for one or more prefetch types (sequential, list, dynamic), it could be an early-warning sign of higher wait-for-other-read times. Why? Because a prefetch read that will bring 25 pages into memory is likely to take longer than a prefetch read that will bring 5 pages into memory (recall that a prefetch read I/O is driven to bring into memory the pages, associated with a prefetch request, that are not already in the buffer pool). When prefetch reads take longer to complete, it is likely that application processes will see higher levels of wait-for-other-read time.
At this point you may have put two and two together, and are thinking, "Hmm. It seems to me that a growing number of prefetch reads relative to prefetch requests combined with an increase in the number of pages read into memory per prefetch read would really be a flashing yellow light with respect to wait-for-other-read time." Right you are. In that case, two things are happening, and both have negative implications for wait-for-other-read time: there are more prefetch reads (because there are fewer cases in which all pages associated with a prefetch request are already in memory) and each prefetch read, on average, is taking longer to complete (because each read, on average, is brining more pages into memory). If too much of that goes on, you could "hit the curve of the hockey stick" and see a sharp and sudden increase in applications' wait-for-other-read times. Better to take a corrective action before that happens. But what?

Glad you asked...


What can you do to reduce (or head off an increase in) wait-for-other-read time?

If wait-for-other-read time has become problematic, or if you see the warning signs and want to take a preemptive action, what can you do? Here are some possibilities:

  • Increase the size of the buffer pool in question - Simple: more buffers in a pool leads to increased page residency time, and that leads to 1) more prefetch requests NOT leading to prefetch reads (because all pages associated with a request are already in memory) and 2) fewer pages, on average, per prefetch read. Fewer prefetch reads + quicker execution of prefetch reads that do occur = less wait-for-other-read time. Obvious related question: "Can I make a buffer pool bigger? I don't want to put too much pressure on the z/OS LPAR's real storage resource." My response: check the LPAR's demand paging rate (available via an RMF Summary Report for the LPAR). If the demand paging rate is zero or a very small non-zero value (i.e., less than 1 per second), there is little to no pressure on the real storage resource, and you have a green light for making the buffer pool bigger. If the demand paging rate is 2-3 per second or more, and you don't want it to go higher than that (I wouldn't want it to go higher than that), consider reducing the size of a buffer pool that has low GETPAGE-per-second and read-I/O-per-second values, and increase the size of the buffer pool of concern by a like amount (so, the overall size of the buffer pool configuration remains the same). In my experience, plenty of Db2 for z/OS-using organization under-utilize the real storage resources of production z/OS LPARs.
  • Change some query access plans - If it looks as though sequential prefetch reads are the primary contributor to higher wait-for-other read times, you can consider taking actions that would reduce table space scan and/or non-matching index scan activity. For that, you could potentially use a query monitor to identify longer-running queries that access objects assigned to the buffer pool in question, and examine EXPLAIN output for those queries to see if any of them have table space scans and/or non-matching index scans in their access plans that involve objects assigned to the buffer pool. Then, consider whether it would be worth it to create a new index or indexes to eliminate such scans (there are cost factors associated with new indexes - you want the benefit to outweigh the cost), or whether simply adding a column to an existing index might reduce scan activity (there is a cost associated with that, too, but it's not as high as the cost of a new index). For dynamic prefetch, keep in mind that this is often related to matching index scans. You can sometimes reduce that activity by enabling Db2 to do more result set row filtering at the index level, and that often involves trying to increase MATCHOLS values for one or more predicates of longer-running and/or more-frequently-executed queries (referring to the name of a column in the PLAN_TABLE in which EXPLAIN output is found). Boosting MATCHCOLS can involve things such as changing an index (add a column, or change the order of columns in a key - keeping in mind that the latter change could benefit some queries and negatively impact others), or maybe re-coding some non-indexable predicates to make them index-able, or maybe adding a predicate that does not change a query's result set. For list prefetch, keep in mind that this often has to do with rows in a table being clustered in a sequence that is very different from the sequence of the index used in the list prefetch operation. You might consider whether a table's clustering key is what it should be - the clustering key of a table can always be changed, and sometimes it makes sense to do that. Also, when index ANDing is driving a lot of list prefetch activity, increasing index-level filtering can help (maybe by adding a column to an index involved in the ANDing, to increase the column-match number, or adding an index that would take the number of indexes AND-ed from n to n+1).
  • Take a look at RID pool activity - List prefetch operations involve use of the Db2 subsystem's RID pool. If the RID pool can't be used for a RID processing operation, Db2 will fall back to a table space scan for the target table, and that can drive sequential prefetch numbers up. RID pools these days are MUCH larger than they used to be (the default RID pool size in a Db2 12 or Db2 13 system is 1 GB), so incidences of RID processing "failed (or not used) due to lack of storage" - something that is indicated in Db2 monitor-generated accounting long as well as statistics long reports - are now quite rare. What you could potentially see, however, in the RID processing block of a statistics long report is a relatively large number of occurrences of "failed due to RDS limit exceeded." What this RID-pool-not-used counter means: If Db2 is executing a query, and a RID list processing action commences, and Db2 determines that more than 25% of the RIDs in the index being accessed will be qualified by the predicate in question, Db2 will abandon the RID list processing action in favor of a table space scan. Can you do anything about this? Maybe. In the case of a static SQL statement, it is my recollection that this RDS limit value is embedded in the associated Db2 package; so, at bind time, Db2 notes the number of RIDs that would exceed 25% of the RIDs in an index that is to be used as part of a RID-list-utilizing access plan action. Why is this potentially important? Because an index could grow substantially in the months (or years) following the most recent bind or rebind of a package. What this means: if you have an application process for which there are many occurrences of "RID list processing failed - RDS limit exceeded," check how long it has been since the package (or packages) associated with the process were last bound or rebound. If it's been a long time, and if you think that relevant indexes have grown substantially since then, consider rebinding the packages - if that rebind results in new and larger "this is the RDS limit threshold for this index" values being embedded in the package, that value increase might be enough to reduce incidences of "RID list processing failed - RDS limit exceeded" for the package.
OK, that's what I've got on this topic. As I mentioned up front: in-Db2 wait-for-other-read time is usually not a matter of concern for Db2 application performance. In some cases, it can be an issue. This blog entry is aimed at helping you should such a case arise at your site (or even better, to help ensure that it doesn't become an issue for you).

Wednesday, December 21, 2022

Db2 13 for z/OS: Setting Lock Timeout Limit and Deadlock Priority at the Application Level

Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.


Application-level lock timeout

First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. How can those disparate concerns be addressed?

They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:

  • The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value if -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained immediately (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).
  • If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.
  • If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.
  • The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, and not just for DDF-using applications (more information on this is provided below).

Application-level deadlock priority

A deadlock, of course, happens when process A holds a lock that process B needs in order to proceed, and process B holds a lock that process A needs in order to proceed. With both processes in a not-able-to-proceed state, Db2 detects the deadlock and chooses a "winner" and a "loser." The "loser" process is rolled back, causing it to release locks it had held, and that enables the other process (the "winner") to acquire the lock for which it had been waiting.

All well and good, except for the fact that one traditionally has been able to do little to nothing to influence Db2's choice of winner and loser in deadlock situations. That changes starting with function level 501 of Db2 13, thanks to a new built-in global variable named DEADLOCK_RESOLUTION_PRIORITY.

Before providing some helpful (I hope) items of information about DEADLOCK_RESOLUTION_PRIORITY, let me point out a very important difference between this new feature and the previously-described CURRENT LOCK TIMEOUT: the latter is a special register, while the former is a global variable. Why is that notable? One simple reason: any process can set the value of a special register, but a process must have permission to set the value of a global variable. The rationale for making DEADLOCK_RESOLUTION_PRIORITY a global variable may already be clear to you: if a value for DEADLOCK_RESOLUTION_PRIORITY could be set by any process, one could imagine everyone setting the global variable to its maximum value ("I always want to be the winner in a deadlock situation"), and that would defeat the purpose of the new capability (as the bad guy, Syndrome, in the movie The Incredibles put it, "When everyone is super, no one will be"). The permission-only nature of DEADLOCK_RESOLUTION_PRIORITY means that (for example) a Db2 DBA can assign the max-priority value to a "must-complete" database administration process, and know that under almost any circumstances (exceptions noted below) the process will be the winner in case of a deadlock. The same could be done - with permission received from a DBA - for a high-priority application process.

OK, items of related information that might be good for you to know:
  • The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).
  • If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).
  • I mentioned previously that there are exceptions to the "255 always wins" rule. Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.

Setting the lock timeout limit or deadlock priority automatically for an application

What if you want an application to have a certain lock timeout limit or a certain deadlock priority, but you don't want the application to have to issue a SET CURRENT LOCK TIMEOUT or a SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY statement in order to accomplish the objective (SYSIBMADM is the schema for user-set-able built-in Db2 global variables - SYSIBM is the schema for built-in global variables that are set by Db2)? No problem: you can get that done using the Db2 profile tables. "Yeah," you might say, "but I want to do this for a local-to-Db2 application, and the profile tables can be used to set special register or built-in global variable values only for DDF-using applications." Actually, with Db2 13 that statement is no longer entirely true. Db2 13 allows the setting of CURRENT LOCK TIMEOUT (starting with function level 500) and DEADLOCK_RESOLUTION_PRIORITY (starting with function level 501) via profile table entries for local-to-Db2 as well as for DDF-using applications (for other special registers and built-in global variables, value-setting by way of the profile tables remains do-able only for DDF-using applications).

For a DDF-using application, the profile-defining specification (what you put in SYSIBM.DSN_PROFILE_TABLE) can be in a Db2 13 environment what it could be in a Db2 12 environment (the auth ID an application uses when connecting to the Db2 system is one example; the IP address of an application server is another example). For a local-to-Db2 application, the profile-defining specification can be auth ID and/or role, or collection name and/or package name, or the value of CLIENT_APPLNAME or CLIENT_USERID or CLIENT_WRKSTNNAME. The online Db2 13 for z/OS documentation provides additional details on using the profile tables to set values for special registers and for built-in global variables.

And there you have it. Db2 13 provides more control - and more-granular control - over two important aspects of application execution. I hope that these new capabilities will be useful at your site.

Friday, November 11, 2022

A Case Study: Using Db2 for z/OS Monitor Reports to Zero In on a Performance Problem

I had an interesting exchange recently with a Db2 for z/OS systems programmer. This individual had been asked to assist in determining the cause of a performance problem impacting a Db2-accessing application. The sysprog shared with me a Db2 monitor-generated accounting report showing activity for the application process, and a Db2 monitor statistics report covering the same time period for the same Db2 subsystem. In those two reports were the clues that pointed to the source of the application performance problem, and with the probable problem source identified the associated mitigating actions were readily determined. In this blog entry, I'll take you through my analysis of the Db2 monitor-provided accounting and statistics information, and the rationale behind my recommended steps for resolving the problem. My aim is not only to shed light on a particular performance-affecting issue and related remediating moves, but also to illustrate a methodical approach for analyzing Db2 application performance issues in general.


Input for analysis

I find Db2 monitor generated accounting and statistics reports to be extremely useful for analysis of application performance problems. In both cases, what you want is the detailed form of the report. In the case of IBM's OMEGAMON for Db2 performance monitor (the one with which I'm most familiar), you're talking about the ACCOUNTING REPORT - LONG and the STATISTICS REPORT - LONG (for other Db2 monitor products, these reports might have titles like, "detailed summary of accounting information" or "statistics detail report"). For the accounting report, your preference is to see activity pertaining exclusively to the application process for which the performance issue has arisen. This is usually done by using the monitor's data-filtering capabilities to include, for report-generation purposes, only the Db2 accounting records of interest (an accounting report is basically information from Db2 accounting trace records, formatted for readability). Db2 accounting trace records have all kinds of identifier fields, so you can get pretty specific. In the case about which I'm writing here, the relevant filtering criteria were the authorization ID of the application process of interest, and the FROM and TO times that bracketed the period during which the performance problem occurred.

As for the statistics report, what you want is one that covers the same time period as the accounting report (same FROM and TO times), for the same Db2 subsystem.


Looking for clues

What the Db2 sysprog had been told by the development team is that the application in question started out performing well, and then slowed down a lot (important input). The sysprog and I looked first at the accounting report, and in doing that we focused initially on the "class 3 suspension" information (this information, obtained from Db2 accounting trace class 3 records, has to do with "known" wait events, as opposed to "not accounted for" time, about which I'll comment momentarily). Why did we look there first? Because (in my experience), when a Db2-accessing process starts out performing well and then slows way down, it's often due to a substantial increase in one or more of the "wait times" captured by accounting trace class 3 (yes, an application slowdown could be related to a significant CPU time increase, but I've seen that less often than I've seen large wait time increases).

Looking at the class 3 times for the application process, what jumped out was a very large value for the average DB2 LATCH time ("average" is average per accounting trace record, which typically equates to average per transaction or average per batch job, depending on the workload type). Here, I mean "large" in terms of average DB2 LATCH time being a large percentage of average TOTAL CLASS 3 time. Usually, DB2 LATCH time is a very small percentage of TOTAL CLASS 3 time, with "wait time" categories such as SYNCHRON DATABASE I/O and OTHER READ I/O accounting for the bulk of TOTAL CLASS 3 time. A (proportionately) really large DB2 LATCH time is usually an indicator that something's not right.

The first thing I look at when I see unusually large DB2 LATCH time for a Db2-accessing process is the "in-Db2 not-accounted-for time" for that process. Several Db2 monitor products calculate that for you - in an IBM OMEGAMON for Db2 accounting report, the field is labeled NOTACC, and it's shown, on a sideways bar chart at the top of an accounting report, as a percentage of average in-Db2 elapsed time. If you need to calculate this figure for yourself, the denominator is average in-Db2 elapsed time (aka "class 2" elapsed time), and the numerator is average in-Db2 elapsed time minus in-Db2 CPU time (general-purpose plus zIIP, or "specialty engine," CPU time) minus TOTAL CLASS 3 time. In other words, it's the percentage of in-Db2 elapsed time that is not CPU time and not "identifiable" wait time. For a transactional application process (as was the case for the situation about which I'm writing), as a general rule you want in-Db2 not-accounted-for time to be less than 10%. If that figure is substantially greater than 10%, it's indicative of a CPU-constrained environment, and if the environment is highly CPU-constrained then DB2 LATCH time can get really large (as pointed out in an entry I posted to this blog a few years ago).

Well, in this particular case the average in-Db2 not-accounted-for time for the application process was 11% of in-Db2 elapsed time - a little on the high side for a transactional process, but not high enough to explain a really large DB2 LATCH time. With that cause of elevated DB2 LATCH time pretty much eliminated, it was time to turn to the breakdown of latch wait events for different latch categories, and that's where the statistics report comes in. In a Db2 monitor-generated statistics long report, the latch suspend count information will likely look something like this (what you see below is from an OMEGAMON for Db2 statistics long report, but it is NOT from the report I reviewed with the Db2 systems programmer - we jointly viewed that report in a web meeting, and I do not have a copy of the report):

LATCH CNT   /SECOND   /SECOND   /SECOND   /SECOND
---------  --------  --------  --------  --------
LC01-LC04      0.00      0.00      0.00      0.00
LC05-LC08      0.00      0.74      0.00      0.41
LC09-LC12      0.00      0.02      0.00      0.32
LC13-LC16      0.00     12.89      0.00      0.00
LC17-LC20      0.00      0.00      0.01      0.00
LC21-LC24      0.04      0.00      1.96      2.84
LC25-LC28      0.12      0.02      0.01      0.00
LC29-LC32      0.06      0.04      0.00      0.28
LC254          0.00

What I saw in the report I reviewed in the web meeting with the Db2 sysprog (and again, that's NOT what you see above - the snippet above is provided so that you can see what the latch suspend count information looks like in a statistics report) was a particularly high value for latch class 6 suspend events (that would be in the position highlighted in green in the sample report snippet above). What is latch class 6? It has to do with index page split activity in a Db2 data sharing environment (by the way, a handy page for seeing the activities associated with various Db2 latch classes is this one from the OMEGAMON for Db2 documentation).

Let's unpack that. An index page split occurs when Db2 has to insert an entry in an index page because of an insert (or an update of an indexed column) and that page is full. In that situation, a portion of the entries in the page will be moved to what had been an empty page in the index, so that there will be room in the formerly-full page for the new entry. What does Db2 data sharing have to do with this (and in the environment about which I'm writing, Db2 is running in data sharing mode)? In a data sharing system (versus a standalone Db2 subsystem), an index page split action has a greater impact on throughput because it forces a log-write operation.

Seeing the high level of index page split activity suggested by the numerous latch class 6 wait events, we turned again to the accounting report to see the average number of insert operations executed by the performance-impacted application. Sure enough, we saw that this was an insert-intensive process - more by the average number of rows inserted, versus the number of INSERT statements executed (the average number of rows inserted, per the accounting report, was about 100 times larger than the average number of INSERT statements executed, indicating use of block-level inserts by the application).

The elevated count of latch class 6 suspend events (related to index page splits) and the insert-intensive nature of the process also dove-tailed with the observed "started out fine, then slowed down" behavior of the application: in all probability, when the process started there was a pretty good amount of free space in leaf pages of indexes on the table(s) into which rows were being inserted. After a while these "holes" in index leaf pages filled up, and that resulted in a large number of index page split actions to make space for new index entries, and THAT - partly due to the fact that this was a Db2 data sharing system - had a majorly negative impact on application performance (the keys of affected indexes were clearly not of the continuously-ascending variety, because index page split actions are not required for an index defined on a continuously-ascending key).

With the source of the performance problem identified, the next matter to consider was...


What to do about it?

The Db2 sysprog and I discussed two problem-mitigating actions - one an "absolutely do" and the other a "maybe do." The "absolutely do" step was to increase the amount of free space in indexes to accommodate new entries. That step, in turn, was comprised of two sub-steps, each of which is applicable to indexes defined on non-continuously-ascending keys. Sub-step one: increase the index's PCTFREE value. Whereas the default PCTFREE value for an index is 10, a value of 20 or 25 might make more sense for an index defined on a non-continuously-ascending key for an insert-heavy table. Sub-step two: increase the index's FREEPAGE value. The default FREEPAGE value is 0. Here's why boosting the FREEPAGE value - for example, to 5 (in which case there would be an empty index page after every 5 pages containing index entries) - can be helpful for an index defined on a non-continuously-ascending key for an insert-heavy table: as previously mentioned, when an index page is split a portion of that page's entries are moved to a previously empty page in the index. If FREEPAGE 0 (the default) is in effect, the only empty index pages will be at the very end of the index - potentially a long way from the page that was split. That situation creates a drag on performance through degradation of the index's organization (reflected in the LEAFDIST value for the index - or index partition, in the case of a partitioned index - in the SYSIBM.SYSINDEXPART catalog table). With a non-zero and relatively low value for FREEPAGE (meaning, an empty page following each relatively-low-number of populated pages), when an index split does occur then there should be a "nearby" empty page into which entries from the full page can be moved.

Both PCTFREE and FREEPAGE can be changed for an index via an ALTER INDEX statement, and both take effect when the index is subsequently reorganized (or loaded). The larger PCTFREE value will reduce index page split activity between REORGs, and the non-zero FREEPAGE value will reduce the impact of page splits if they do occur.

And what about the "maybe do" step? That would be an increase in the size of the index's pages, from the default of 4 KB to maybe the maximum of 32 KB. How can that help? Here's how: because a 32 KB index page (for example) can hold 8 times as many entries as a 4 KB page, going to 32 KB-sized pages for an index (via an ALTER INDEX statement that assigns the index to a 32K buffer pool) can potentially result in an 87.5% reduction (seven eighths) in page split activity for an index, other things being equal (e.g., same rate of inserts for the underlying table). Why is this a "maybe do" thing versus an "absolutely do" thing? Because if access to table rows through the index (e.g., for queries) is truly random in nature with respect to key values, 32 KB-sized pages could mean somewhat less-effective use of buffer pool resources versus 4 KB-sized pages. It's a matter, then, of what's more important in a particular situation: is it minimizing index page split activity, or maximizing the effectiveness of buffer pool resources for random patterns of row access by applications?


In conclusion...

I hope that this entry has provided you with some "news you can use." Utilize Db2 monitor accounting and statistics reports to determine the source of an application performance problem, then take appropriate remedial action.

One more thing: Db2 13 for z/OS made it a lot easier to verify that index page splits are an issue, through enhanced instrumentation (new IFCID 396, associated with statistics trace class 3, which is on by default, indicates when an index page split operation takes more than 1 second, which would be unusually long) and through the new REORGTOTALSPLITS, REORGSPLITTIME and REORGEXCSPLITS columns of the SYSIBM.SYSINDEXSPACESTATS real-time statistics table in the Db2 catalog.


Friday, October 28, 2022

Getting Ready to Migrate to Db2 13 for z/OS from Db2 12

It could be said that preparation for migration of a Db2 12 for z/OS system to Db2 13 comes down to one thing: activate Db2 12 function level 510. That's a pretty simple-looking migration plan, but there's more to it than meets the eye - as I'll explain in this blog entry.

First, let's consider function level 510 itself. Unique among Db2 12 function levels, 510 provides no new functionality in the traditional sense - there's nothing new there that a DBA or a developer would use, or that could make an application execute with greater efficiency. The purpose of function level 510 is simply this: to validate that a Db2 12 system is technically ready for migration to Db2 13 (here, "system" refers to a standalone Db2 subsystem or a Db2 data sharing group).

If activating function level 510 gets a Db2 12 system ready for migration to Db2 13, what makes a Db2 12 system ready for activation of function level 510? Three things:

  1. The system's code level has to be 510. That can be easily verified by issuing the Db2 command -DISPLAY GROUP: in the command output, look for the value 121510 in a column labeled DB2 LVL (if your Db2 system's maintenance level is relatively current, the code level is likely to be 510 already - the PTF that takes a Db2 system's code to the 510 level came out in April of 2021).
  2. The catalog level has to be V12R1M509 - again, -DISPLAY GROUP output tells the tale.
  3. There can't be any packages in the system, used within the past 18 months, that were last bound or rebound prior to Db2 11.
Let me expand on items 2 and 3 in that list.


Getting to the right catalog level

First, you might be wondering, "How is it that I need to get to catalog level V12R1M509 before I can activate function level V12R1M510? Wouldn't the catalog level have to be V12R1M510?" No. There is no catalog level V12R1M510. Why is that? Because function level 510 has no catalog dependencies (i.e., no changes to a catalog at the V12R1M509 level are necessary to support function level 510). This is not at all unprecedented. Several of the Db2 12 function levels had no catalog dependencies. For example, function level 504 can be activated when the catalog level is V12R1M503 - there is no V12R1M504 catalog level because function level 504 did not require any changes to a catalog at the 503 level.

Second, suppose your Db2 catalog level is, say, V12R1M500. Can you take that catalog right to the 509 level? YES. Assuming your code level is at least 121509, you can execute the CATMAINT utility with a specification of UPDATE LEVEL(V12R1M509). That one execution of CATMAINT will make the changes to the catalog associated with the 509 catalog level, and it will make all the changes associated with all the other catalog levels between 500 and 509.

About those old packages...

"What's with that?" you might wonder. "Why would the existence of one or more packages, that we've used in the past 18 months and that were last bound or rebound prior to Db2 11, keep me from being able to activate function level 510?" Short answer: it's for your own good. Longer answer: packages are executable code. The code generated for a package by a Db2 version prior to V11 cannot be executed in a Db2 13 system. If there is a request to execute a pre-Db2 11 package in a Db2 13 environment, that package will be auto-rebound so that Db2 13 can generate for that package code that can be executed in the Db2 13 system. "OK," you say, "I get that auto-rebinds of packages can be a little disruptive with respect to my application workload, but it's not THAT big of a deal - the auto-rebind gets done and we go trucking on." Not so fast, I'd say. What if, as a result of the auto-rebind of a package, there's an access path change that negatively impacts - perhaps majorly - the performance of the associated program? Your possible response: "Again, not a huge deal. We run with PLANMGMT=EXTENDED in ZPARM, and so we'd have the prior copy of that package available, and we can just execute a REBIND with SWITCH(PREVIOUS) to restore the previous better-performing copy of the package." WRONG! You CAN'T restore the previous copy of that package, because the previous copy was generated prior to Db2 11, and that means the previous copy can't be executed in a Db2 13 system. You're STUCK with that poor-performing package. Sure, you can take steps to try to correct the performance regression - maybe update catalog statistics or take an index action and then rebind and hope for a better performance outcome - but do you want to do that while some critically important production program is performing in an unacceptable way and you're phone is going off non-stop because upper management wants to know WHEN YOU'RE GOING TO GET THIS FIXED? Probably not; so, we're going to prevent that scenario from happening by not letting you go to Db2 13 if you still have any pre-Db2 11 packages that have been used within the past 18 months (the thinking of the Db2 for z/OS development team: if a package was last used more than 18 months ago, it's highly likely that it's a package that's just not used anymore in your environment - it's still in SYSPACKAGE simply because no one has FREE-ed the old package).

This "keep you out of trouble" action taken by the Db2 for z/OS development team is based on the painful experience some organizations had when they migrated to Db2 12 from Db2 11. In that situation, we made it clear in the documentation that pre-Db2 10 packages would need to be rebound prior to going to Db2 12 because pre-Db2 10 packages could not be executed in a Db2 12 environment. Well, some Db2 for z/OS people either didn't see that warning, or saw it and decided to ignore it and take their chances, and in a few cases the problem described in the preceding paragraph was encountered. At some sites, the problem's impact was severe enough to warrant falling back to Db2 11, at which point people would rebind the pre-Db2 10 packages (as had been strongly encouraged by us) and then re-migrate to Db2 12. Not wanting to see reoccurrences of those difficulties, with Db2 13 we're basically saying, "We are not going to let you get into the potentially ugly situation you could see if a package that cannot be executed in a Db2 13 system is requested for execution in that environment - you cannot go to Db2 13 if you have pre-Db2 11 packages that might still be in use at your site.

By the way, if you want to see if you have any packages that would prevent successful activation of function level V12R1M510, you can execute this query on a Db2 12 system (and note that this query is also provided in the Db2 12 documentation):

SELECT * FROM SYSIBM.SYSPACKAGE 
  WHERE LASTUSED >= DATE(DAYS(CURRENT DATE) - 548) 
  AND RELBOUND NOT IN ('P','Q',’R’)
  AND VALID <> ‘N’ 
  AND OPERATIVE <> ‘N’;

One more thing: as previously mentioned, it's highly likely that a package that has not been executed in a Db2 system within the past 18 months will not be executed at some future time in that Db2 environment. That said, maybe you're concerned that, for some reason, a package in your environment is executed every two years (24 months). The chances of that being true are almost certainly very small, but perhaps non-zero. If that's bugging you, disregard the "18 months" window and rebind ANY pre-Db2 11 package in your system prior to going to Db2 13.


If you're nervous about a "big jump" to Db2 12 function level 510...

Consider this hypothetical situation: you're running with Db2 12 function level 500 activated and you're contemplating going to function level 510 to prepare for migration to Db2 13. That's actually not so hypothetical - a good number of Db2 12 systems are running with an activated function level of V12R1M500. If that looks familiar to you, there might be a couple of thoughts running through your mind:
  • Function level 500 to 510 looks like a really big jump to me. How do I get that done with a minimized risk of complications? The key here is the APPLCOMPAT specification for your packages. Maybe you're concerned that making a big jump up in the activated function level for your Db2 12 systems could lead to programs being impacted by a "SQL incompatibility" (basically, that's a SQL behavioral change: same SQL, same data, different result - these incompatibilities are pretty few and far between, and they often affect either few or none of your programs, but they can indeed arise on occasion). If you're indeed worried about that, you can guard against that by leaving the APPLCOMPAT values for your packages where they are when you activate a higher function level of Db2 12 for z/OS. If you have a package bound with, for example, APPLCOMPAT(V12R1M500), and you activate function level 510, SQL issued through the package bound with APPLCOMPAT(V12R1M500) will still get the SQL behavior associated with function level 500. You can find lots more information about APPLCOMPAT in the part 1 and part 2 posts of the 2-part entry on APPLCOMPAT that I added to this blog in 2019.
  • If function level 510 is a good ways beyond where my Db2 12 system is at present, could I maybe go from where we are to some intermediate function level, and later to level 510? Of course you can. "If I decide to do that," you might be thinking, "what would a good intermediate function level be for us?" That's really up to you. My advice: go to the Db2 12 function levels "main page" in the Db2 12 online documentation, and check out the features and functions introduced with each function level between 510 and where you are now. If there's a function level that provides an enhancement that would be particularly helpful in your environment, go to that one, and later, at a time of your choosing, go from that level to level 510 (I'll tell you that a pretty popular "intermediate" Db2 12 function level is 505 - this because a lot of Db2 DBAs really like the "rebind phase-in" functionality that was introduced via function level 505).
OK, that about wraps it up for this blog entry. I hope that this information will be helpful for you as you plan for your site's migration from Db2 12 to Db2 13.

P.S. In addition to activating Db2 12 function level 510, don't forget to apply the "fallback SPE" to your Db2 12 systems prior to migrating to Db2 13 - that PTF allows you to fallback from Db2 13 to Db2 12, in the unlikely event that that should be deemed necessary for you. The APAR associated with this fallback SPE is PH37108.