Thursday, February 23, 2023

Two RMF (z/OS Monitor) Reports with which Db2 for z/OS People Should be Familiar

When it comes to analyzing the performance of a Db2 for z/OS system and associated applications, I think of a set of concentric circles. The outermost circle represents the operational environment in which Db2 is processing work - that would be the z/OS LPAR (logical partition) in which the Db2 subsystem runs. The next circle within that outer one represents the Db2 subsystem itself - its buffer pool configuration, EDM pool, RID pool, lock manager, recovery log, etc. The innermost of these concentric circles relates to the applications that access Db2-managed data. With the three circles in mind, I take an "outside-in" approach to Db2 system and application tuning. In other words, I begin with a look at the z/OS system within which Db2 is running, then I turn to the Db2 subsystem itself and lastly I analyze application-centric information. The reason for this approach? If the z/OS system in which Db2 is running is constrained in some way, there's a good chance that Db2 subsystem and application tuning actions will yield little positive impact. Similarly, if the Db2 subsystem is operating in a constrained fashion then application tuning actions may not do much good.

So, if assessing the operation of a z/OS system is important prior to turning to Db2 subsystem and/or application performance analysis, how do you determine whether the z/OS LPAR in question is running in a constrained or an unconstrained way? I do this based on examination of information in two RMF reports: the Summary report and the CPU Activity report. If you support a Db2 for z/OS system, you should be familiar with the content of these reports - in particular, some key metrics provided by the reports. In this blog entry I'll point out those key metrics and explain how I use them.

[Note: I am referring to reports generated by IBM's RMF z/OS monitor because RMF is the z/OS monitor with which I am most familiar. If your organization uses another vendor's z/OS monitor, that monitor might be able to generate reports similar to the RMF reports that are the subject of this blog entry - if need be, check with the vendor on that.]


The RMF CPU Activity report

RMF, by default (you can change this), carves the time period covered by a CPU Activity report into 15-minute intervals (so, if you requested a report for a one-hour time period you will see within that report four sub-reports, each providing information for a 15-minute part of the overall one-hour time period). Within a given 15-minute interval you will see, for the z/OS LPAR for which the report was requested, information that looks something like this (I highlighted two important values in red):


---CPU---    ---------------- TIME % ----------------
NUM  TYPE    ONLINE    LPAR BUSY    MVS BUSY   PARKED
 0    CP     100.00    87.03        86.85        0.00
 1    CP     100.00    77.76        77.68        0.00
 2    CP     100.00    83.88        83.78        0.00
 3    CP     100.00    87.07        86.91        0.00
 4    CP     100.00    76.23        76.14        0.00
 5    CP     100.00    76.79        76.71        0.00
 6    CP     100.00    80.45        80.35        0.00
 7    CP     100.00    73.29        73.24        0.00
 8    CP     100.00    63.83        69.22        0.00
 9    CP     100.00    57.78        62.95        0.00
 A    CP     100.00    35.28        48.33       17.01
TOTAL/AVERAGE          72.67        75.16
12    IIP    100.00    66.63        58.68        0.00
                                    46.30        0.00
13    IIP    100.00    26.70        23.42        0.00
                                    18.24        0.00
14    IIP    100.00     9.21         8.07        0.00
                                     6.42        0.00
3E    IIP    100.00     0.00        -----      100.00
                                    -----      100.00
TOTAL/AVERAGE          25.64        26.86


Here is an explanation of what you see in the report snippet above:
  • NUM - This is the ID of a given "engine" (processor core) configured for the LPAR.
  • TYPE - CP is short for central processor (typically referred to as a "general-purpose engine"); IIP is short for integrated information processor (typically referred to as a "zIIP engine").
  • LPAR BUSY - Engine utilization from the LPAR perspective.
  • MVS BUSY - I think of this as the busy-ness of the physical engine - if the engine is used exclusively (or almost exclusively) by the LPAR in question, the LPAR busy and MVS busy numbers should be very similar.
  • PARKED - This is the extent to which an engine's capacity is NOT available to the LPAR during the 15-minute interval (so, if the engine is seen to be 75% parked then the LPAR has access to 25% of that engine's processing capacity). When engines in a mainframe "box" (sometimes called a CEC - short for central electronic complex) are shared between several LPAR's it's not unusual to see a non-zero parked value for at least some of an LPAR's engines).
Note that for this LPAR, there are two MVS BUSY values for each zIIP engine. Why is that? Well, it indicates that the zIIP engines are running in SMT2 mode. SMT2 is short for simultaneous multi-threading 2, with the "2" meaning that z/OS can dispatch two pieces of work simultaneously to the one zIIP core. Running a zIIP engine in SMT2 mode does not double the engine's capacity (each of the two pieces of work dispatched to the one zIIP core will not run as fast as would be the case if the zIIP engine were running in "uni-thread" mode), but for a transactional workload SMT2 can enable a zIIP engine to deliver around 25-40% more throughput versus uni-thread mode (think of a one-way, single-lane road with a speed limit of 60 miles per hour versus a one-way, 2-lane road with a speed limit of 45 miles per hour - the latter will get more cars from A to B in a given time period if there's enough traffic to take advantage of the two lanes). For more information on SMT2 mode for zIIPs, see the entry on that topic that I posted to this blog a few years ago.

OK, to the values highlighted in red in the report snippet:
  • TOTAL/AVERAGE MVS BUSY for the general-purpose engines (75.16 in the report snippet) - As a general rule, application performance - especially for transactional applications (e.g., CICS-Db2, IMS TM-Db2, Db2 DDF) - will be optimal if average MVS busy for an LPAR's general-purpose engines does not exceed 80%. When average MVS busy for the general-purpose engines exceeds 90%, you can see a negative impact on the performance of Db2-accessing applications in the form of what is labeled "not accounted for" time in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information. Not-accounted-for time is in-Db2 elapsed time that is not CPU time and not one of the "known" Db2 wait times (those are so-called class 3 wait times, such as wait for database synchronous read, wait for lock/latch, wait for update/commit processing, etc.). It's literally elapsed time, related to SQL statement execution, for which Db2 cannot account. In my experience, in-Db2 not-accounted-for time is most often a reflection of wait-for-dispatch time, which itself is indicative of CPU contention. I'm generally not too concerned about not-accounted-for time as long as it's less than 10% of in-Db2 elapsed time for an application workload - particularly when it's a higher-priority transactional workload (you might tolerate a higher percentage of not-accounted-for time for a lower-priority batch workload). If not-accounted-for time exceeds 10% of in-Db2 elapsed time (again, especially for a higher-priority transactional workload), that would be a matter of concern for me, indicating that CPU contention is negatively impacting application throughput.
  • TOTAL/AVERAGE MVS BUSY for the zIIP engines (26.86 in the report snippet) - How "hot" can you run zIIP engines before zIIP engine contention becomes a concern? That depends on how many zIIP engines the LPAR has (and, to a lesser extent, whether or not the zIIPs are running in SMT2 mode). The more zIIP engines an LPAR has, the higher the average MVS busy figure for those engines can go before zIIP contention becomes an issue (in the example shown above, the LPAR has three zIIP engines that are running in SMT2 mode - in such a situation average MVS busy for the zIIP engines could probably go to 40-50% without zIIP contention becoming an issue). And when does zIIP contention become an issue? When the zIIP spill-over percentage gets too high, as explained in an entry I posted a few years ago to this blog. [Note: the report snippet shows four zIIP engines, but the fourth - the one identified as processor number 3E - is 100% parked from the LPAR's perspective. That means the LPAR had no access to zIIP processor 3E's capacity, so in effect the LPAR had three zIIP engines during the time interval.]
Below the information shown in the report snippet above, you'll see a sideways bar chart that looks something like this (again, I've highlighted some key information in red):


-----------------------DISTRIBUTION OF IN-READY WORK UNIT QUEUE-
 NUMBER OF              0    10   20   30   40   50   60   70   
 WORK UNITS     (%)     |....|....|....|....|....|....|....|....

<=  N          55.9     >>>>>>>>>>>>>>>>>>>>>>>>>>>>
 =  N +   1     3.5     >>
 =  N +   2     3.1     >>
 =  N +   3     3.5     >>
<=  N +   5     5.5     >>>
<=  N +  10    10.9     >>>>>>
<=  N +  15     5.7     >>>
<=  N +  20     4.2     >>>
<=  N +  30     3.1     >>
<=  N +  40     1.5     >
<=  N +  60     1.3     >
<=  N +  80     0.4     >
<=  N + 100     0.2     >
<=  N + 120     0.1     >
<=  N + 150     0.2     >
>   N + 150     0.2     >

N = NUMBER OF PROCESSORS ONLINE UNPARKED (16.8 ON AVG)


With regard to the report snippet above, the first thing to which I want to draw your attention is the bottom-line information about "N". We see that, for this LPAR during this 15-minute interval, N = 16.8. What does that mean? It means that the LPAR had "16.8 processor targets to which pieces of work could be dispatched." Why do I use the phrase "processor targets" instead of "processors?" It's because we tend to think of "mainframe processors" as meaning "mainframe engines," and that's not quite the case here. This report snippet goes with the first one we looked at (the second snippet appears right after the first one in the source RMF CPU Activity report), and you might recall that the first snippet showed that the LPAR's three zIIP engines are running in SMT2 mode. For that reason, those three zIIP engines are counted as six processor targets to which pieces of work can be dispatched. Thus, when the report shows that N = 16.8, we can say that 6 of the 16.8 relate to the LPAR's zIIP engines. That leaves 10.8 (16.8 - 6 = 10.8). We've accounted for the zIIP engines, so the 10.8 number relates to general-purpose engines. Where does that 10.8 come from? Refer again to the first report snippet. You'll see that the LPAR had 10 general-purpose processors that were not at all parked (i.e. that were 0% parked from the LPAR's perspective). An 11th general-purpose engine, identified as processor number A, was 17.01% parked during the interval. That means that 83% of the capacity of general-purpose processor number A (that's a hexadecimal A) was available to the LPAR during the time interval. That 83% is equivalent to 0.83, and RMF rounds 0.83 down to 0.8, and that's where the ".8" of N = 16.8 comes from. So, then, the LPAR had 6 zIIP "targets" to which work could be dispatched (3 engines, each running in SMT2 mode), and 10.8 general-purpose targets to which work could be dispatched, and that's why we have N = 16.8.

With N now understood, we can turn our attention to the other bit of information I highlighted in red: <= N 55.9. What does that mean? It means that for 55.9% of the time in the 15-minute report interval, the number of "in and ready" tasks (i.e., the number of tasks ready for dispatch) was less than or equal to the number of processor targets to which pieces of work in the LPAR could be dispatched. When that is true - when the number of in-and-ready tasks is <= N - there is essentially nothing in the way of CPU constraint, because an in-and-read task won't have to wait in line to get dispatched to a processor. In my experience, when the <= N figure is above 80%, the LPAR is very unconstrained in terms of processing capacity. A figure between 50% and 80% is indicative of what I'd call moderate CPU constraint, and performance (particularly in terms of throughput) is likely not impacted much by a lack of processing capacity. When the figure is below 50%, I'd say that CPU constraint could be impacting throughput in a noticeable way, and if it's below 10% the performance impact of CPU constraint for the LPAR could be severe. As previously mentioned, the Db2 performance impact of a CPU-constrained system is typically apparent in elevated levels of in-Db2 not-accounted-for time, as seen in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information.

One more thing about an RMF CPU Activity report: the number of engines configured for an LPAR - something that is shown in the report - should be balanced by an adequate amount of memory (aka real storage) so that the LPAR's processing power can be fully exploited to maximize application performance. For a z/OS LPAR in which a production Db2 workload runs, my rule of thumb, based on years of analyzing system and application performance data, is this: the LPAR should have at least 20 GB of memory per engine with which it is configured. The first report snippet included above shows that the LPAR has 13.8 engines: 10.8 general-purpose engines (as previously mentioned, the ".8" relates to an engine that is about 20% parked from the LPAR's perspective) and 3 zIIP engines (and for balanced-configuration purposes, I count physical zIIP cores - I don't double-count a zIIP engine because it is running in SMT2 mode). I'd round the 13.8 to 14 (the nearest integer) and say that on that basis the LPAR should have at least 14 X 20 GB = 280 GB of memory. If that seems like a lot to you, it shouldn't - mainframe memory sizes are getting bigger all the time, and real storage resources in the hundreds of GB are no longer unusual for production z/OS LPARs, especially those in which Db2 workloads run (the biggest real storage size I've personally seen for a z/OS LPAR is about 1100 GB).


The RMF Summary report

An RMF Summary report is smaller in size than a CPU Activity report - typically, one line of information for each 15 minute interval within the report time period. An RMF Summary report for a one-hour period could look something like what you see below (I removed some columns so that I could use a big-enough-to-read font size - the really important column is the one on the far right, with the numbers highlighted in green):

 
NUMBER OF INTERVALS 4      TOTAL LENGTH OF INTERVALS 00.59.58
-DATE   TIME     INT  ... JOB  JOB  TSO  TSO  STC ... SWAP DEMAND
 MM/DD HH.MM.SS MM.SS ... MAX  AVE  MAX  AVE  MAX ... RATE PAGING
 11/03 09.15.00 15.00 ...  83   72   96   92  371 ... 0.00   0.00
 11/03 09.30.00 14.59 ...  85   68   98   95  369 ... 0.00   0.00
 11/03 09.45.00 15.00 ...  75   68   95   92  363 ... 0.00   0.00
 11/03 10.00.00 14.59 ...  82   70   94   91  365 ... 0.00   0.00
-TOTAL/AVERAGE        ...  85   69   98   93  371 ... 0.00   0.00


So, what's the LPAR's demand paging rate? It's the rate, per second, at which pages that had been moved by z/OS from real to auxiliary storage (to make room for other pages that needed to be brought into real storage) were brought back into real storage on-demand (i.e., because some process needs to access the page). Why is the demand paging rate important? Here's why: it is, in my opinion, the best indicator of whether or not memory usage can be expanded without putting too much pressure on the LPAR's real storage resource. Here's what I mean by that: suppose you have a Db2 buffer pool that has a total read I/O rate (synchronous reads + sequential prefetch reads + list prefetch reads + dynamic prefetch reads, per second) that's higher than you like - maybe the total read I/O rate for the pool is north of 1000 per second, and you want to bring that down substantially to boost application performance and CPU efficiency (every read I/O eliminated saves CPU and elapsed time). The best way to lower a buffer pool's total read I/O rate is to make the pool larger. Can you do that without putting too much pressure on the LPAR's real storage resource? Here's what I'd say: If the LPAR's demand paging rate is consistently less than 1 per second, you have a green light for using more memory to get a performance boost. If the LPAR's demand paging rate is consistently greater than 1 per second, I'd hold off on using more memory until the LPAR is configured with additional real storage. This goes for any action that would increase memory usage by DB2 - besides enlarging a buffer pool, that could be a RID pool or a sort pool or an EDM pool size increase, or increasing the use of RELEASE(DEALLOCATE) packages with threads that persist through commits, or whatever. Before doing something that will increase memory usage, check the LPAR's demand paging rate.

That's it for now. If you haven't had a look at these two RMF reports before, get them for an LPAR of interest to you and give them a look-see - a z/OS systems programmer at your site would probably be happy to generate the reports for you. Knowing the key utilization and configuration characteristics of the z/OS LPAR in which a Db2 subsystem runs is an important part of effective performance management of the Db2 environment.

Friday, January 20, 2023

Db2 for z/OS: What is "Wait for Other Read" Time, and What Can You Do About It?

A recent conversation I had with some folks who support a large Db2 for z/OS system reminded me of the importance of something called "wait for other read time." In this blog entry I want to make clear to people what Db2 wait-for-other-read time is, why it's important, how to monitor it and what to do about it if it becomes an issue.


What is Db2 for z/OS "wait for other read" time?

In Db2 performance monitoring parlance, time associated with SQL statement execution is known as "in-Db2" time. It's also called "class 2" time, because it is recorded, for monitoring purposes, in records that are generated when Db2 accounting trace class 2 is active. Class 2 elapsed time (elapsed time pertaining to SQL statement execution) has two main components: CPU time (some of which is consumed on so-called general-purpose processors - aka "engines" - of a mainframe server, and some of which might be consumed on what are known as zIIP engines) and suspend time (on a busy system there can be another component of in-Db2 time, called "not accounted for" time, that generally reflects wait-for-dispatch time). In-Db2 suspend time is also known as "class 3" time, because it is recorded in Db2 accounting trace records when accounting trace class 3 is active. Class 3 time is broken out in a number of categories, and these show up in an accounting long report that might be generated by your Db2 monitor, or by a Db2 monitor's online display of thread detail information.

In a Db2 monitor-generated accounting long report, class 3 suspend times are shown as "average" values. Average per what? Well, if you're looking at information for a Db2-accessing batch workload (referring to jobs that run in z/OS JES initiator address spaces and access Db2 by way of Db2's call attachment facility or TSO attachment facility), it'll be average per batch job (generally speaking, activity for one batch job will be recorded in one Db2 accounting trace record). If you're looking at a transactional workload (e.g., a CICS-Db2 workload, or a Db2-accessing IMS transactional workload, or a DDF client-server workload), the "average" values seen in a Db2 monitor-generated accounting long report will typically be average per transaction.

In many cases, the majority of in-Db2 time for a batch or a transactional workload will be class 3 suspend time (it is a little unusual, but certainly not unheard of, for a Db2 workload's in-Db2 time to be mostly CPU time). More often than not, the largest component of in-Db2 class 3 suspension time will be wait-for-synchronous-database-read time. Another wait-for-read time is labeled "wait for other read." What's that? Well, if it's "other than" synchronous read wait time, it must be asynchronous read time, right? Right, indeed. And what are asynchronous reads? Those are prefetch reads: read I/Os driven by Db2 in anticipation that the pages read into memory in bulk in this way will be requested by the process (such as an application process) that prompted Db2 to issue the prefetch read requests. Well, if a prefetch read I/O operation is executed because Db2 is aiming to get pages into a buffer pool in memory before they are requested by (for example) an application process, why would there be such a thing as a process having to wait for a prefetch read to complete?

Wait-for-prefetch read (reported as "wait for other read") happens because there are usually lots of Db2-accessing processes active in a system at one time. Let's call two of these processes process A and process B, and let's say that Db2 is driving prefetch reads (these could be sequential, list or dynamic prefetch reads - more on that in a moment) for process A. We'll further suppose that Db2 needs to access page 123 of table space TS1 on behalf of process B (i.e., Db2 issues a GETPAGE request for page 123 of table space TS1). If page 123 of table space TS1 is not already in the buffer pool to which TS1 is assigned, Db2 will drive a synchronous read request to get that page into memory, right? Not necessarily. It could be that page 123 of TS1 is already scheduled to be brought into memory via a prefetch read that is being executed on behalf of process A. If that is the case then process B will wait for that in-flight prefetch read to complete, and that wait time will be recorded as "wait for other read time" for process B. [It is also possible that process A has gotten to the point that it needs to access page 123 of TS1, and the prefetch read that will bring that page into memory is currently in-flight, and that would end up causing wait-for-other-read time for process A related to the prefetch request being driven on behalf of process A, but I think it's more likely that wait-for-other-read time will be associated with one process waiting on completion of a prefetch read operation that is being executed on behalf of another process.]


Why is wait-for-other-read time important?

Usually, wait-for-other-read time is a relatively small percentage of total class 3 suspend time for a process (it's typically much smaller than wait-for-synchronous-read time), but that's not always the case. In some situations, wait-for-other-read time is a major component of overall in-Db2 suspend time. The performance impact of elevated wait-for-other-read time can be especially significant for batch applications, as these Db2 processes are often particularly reliant on prefetch to achieve elapsed time objectives. If wait-for-other-read time gets too large then service levels could degrade, leading to user dissatisfaction.


How can wait-for-other-read time be monitored?

As mentioned previously, wait-for-other read time is recorded in accounting long (i.e., accounting detail) reports that can be generated by a Db2 performance monitor; so, you can track that for a process or a workload over time and note trends. Besides wait-for-other-read time itself, are there any other related fields in Db2 monitor-generated reports that you should keep your eye on to help ensure that a wait-for-other-read time problem does not sneak up on you? Yes, as explained below.

The "other related fields" that I'd recommend checking out are found in a Db2 monitor-generated statistics long report (i.e., statistics detail report). In such a report you would see, for each buffer pool, a set of fields like those shown below (this is a snippet of a statistics long report generated by the IBM OMEGAMON for Db2 for z/OS performance monitor - I've added some A, B, C labels that I'll subsequently use in referencing various of these fields):


BP1 READ OPERATIONS          QUANTITY  /SECOND
---------------------------  --------  -------

SEQUENTIAL PREFETCH REQUEST   5622.00     3.23   A
SEQUENTIAL PREFETCH READS     5587.00     3.21   B
PAGES READ VIA SEQ.PREFETCH  52950.00    30.43   C
S.PRF.PAGES READ/S.PRF.READ      9.48            D
LIST PREFETCH REQUESTS       47394.00    27.24   E
LIST PREFETCH READS           5876.00     3.38   F
PAGES READ VIA LIST PREFTCH    154.9K    89.03   G
L.PRF.PAGES READ/L.PRF.READ     26.36            H
DYNAMIC PREFETCH REQUESTED     378.3K   217.42   I
DYNAMIC PREFETCH READS         157.6K    90.59   J
PAGES READ VIA DYN.PREFETCH   3110.6K  1787.68   K
D.PRF.PAGES READ/D.PRF.READ     19.73            L

By way of explanation, I'll first point out that what you see above are three repeating sets of fields (4 fields in each set) that pertain to sequential, list and dynamic prefetch activity. Here are thumbnail definitions of these prefetch types:

  • Sequential - Generally speaking, this is the prefetch mode used for table space scans or for non-matching index scans. In other words, if Db2 determines that a front-to-back scan of a table space or index will be required, sequential prefetch will be used (assuming that the table or index in question is not super-small, in which case prefetch of any kind would usually not make sense).
  • List - This is the prefetch type used when Db2 is retrieving table rows based on a list of row IDs (RIDs) that have been retrieved from an index (or from more than one index, if index ANDing or index ORing is part of the access plan for the query). List prefetch can be efficient if the clustering sequence of rows in the target table is substantially uncorrelated with respect to the order of entries in the index in question (the list of RIDs obtained from the index is sorted in ascending RID sequence and then the sorted RID list is used to prefetch pages of associated rows from the target table). The hybrid method of joining tables is another driver of list prefetch activity.
  • Dynamic - This prefetch method is dynamically initiated at statement execution time when Db2 recognizes a sequential pattern of data access as it retrieves rows. Matching index scans are often drivers of dynamic prefetch activity.
OK, so here are a couple of things to keep an eye on, if you want to avoid a surprise situation involving elevated levels of wait-for-other-read time for processes that use prefetch to access pages of objects assigned to a given buffer pool:

  • Prefetch reads relative to prefetch requests - This tends to be more important for list and dynamic prefetch (less so for sequential prefetch, owing to locality of reference being less of a factor in that case). For list and dynamic prefetch, then, compare the number of prefetch reads to the number of prefetch requests (i.e., compare F to E, and J to I, using the letter-labels I added to the statistics report snippet shown above). What's this about? Well, a prefetch request is just that - a request to read a certain chunk of pages from a table space or an index into the assigned buffer pool. Suppose the prefetch request is for 32 pages (the most common quantity), and suppose that all 32 of those pages are already in the buffer pool. In that case, the prefetch request will not drive a prefetch read I/O operation. The larger the number of buffers allocated for a pool, the greater the likelihood that all pages associated with a prefetch request will already be in memory, thereby reducing prefetch reads as a percentage of prefetch requests. If you see the percentage of prefetch reads relative to prefetch requests going up over time for a pool, especially for list and/or dynamic prefetch, that's an indication that elevated levels of wait-for-other-read time could be in the offing. Why? Because more prefetch reads will generally mean more waiting for prefetch reads to complete.
  • The number of pages read per prefetch read - These are the fields labeled D, H and L in the example statistics report snippet. If you see that number going up for one or more prefetch types (sequential, list, dynamic), it could be an early-warning sign of higher wait-for-other-read times. Why? Because a prefetch read that will bring 25 pages into memory is likely to take longer than a prefetch read that will bring 5 pages into memory (recall that a prefetch read I/O is driven to bring into memory the pages, associated with a prefetch request, that are not already in the buffer pool). When prefetch reads take longer to complete, it is likely that application processes will see higher levels of wait-for-other-read time.
At this point you may have put two and two together, and are thinking, "Hmm. It seems to me that a growing number of prefetch reads relative to prefetch requests combined with an increase in the number of pages read into memory per prefetch read would really be a flashing yellow light with respect to wait-for-other-read time." Right you are. In that case, two things are happening, and both have negative implications for wait-for-other-read time: there are more prefetch reads (because there are fewer cases in which all pages associated with a prefetch request are already in memory) and each prefetch read, on average, is taking longer to complete (because each read, on average, is brining more pages into memory). If too much of that goes on, you could "hit the curve of the hockey stick" and see a sharp and sudden increase in applications' wait-for-other-read times. Better to take a corrective action before that happens. But what?

Glad you asked...


What can you do to reduce (or head off an increase in) wait-for-other-read time?

If wait-for-other-read time has become problematic, or if you see the warning signs and want to take a preemptive action, what can you do? Here are some possibilities:

  • Increase the size of the buffer pool in question - Simple: more buffers in a pool leads to increased page residency time, and that leads to 1) more prefetch requests NOT leading to prefetch reads (because all pages associated with a request are already in memory) and 2) fewer pages, on average, per prefetch read. Fewer prefetch reads + quicker execution of prefetch reads that do occur = less wait-for-other-read time. Obvious related question: "Can I make a buffer pool bigger? I don't want to put too much pressure on the z/OS LPAR's real storage resource." My response: check the LPAR's demand paging rate (available via an RMF Summary Report for the LPAR). If the demand paging rate is zero or a very small non-zero value (i.e., less than 1 per second), there is little to no pressure on the real storage resource, and you have a green light for making the buffer pool bigger. If the demand paging rate is 2-3 per second or more, and you don't want it to go higher than that (I wouldn't want it to go higher than that), consider reducing the size of a buffer pool that has low GETPAGE-per-second and read-I/O-per-second values, and increase the size of the buffer pool of concern by a like amount (so, the overall size of the buffer pool configuration remains the same). In my experience, plenty of Db2 for z/OS-using organization under-utilize the real storage resources of production z/OS LPARs.
  • Change some query access plans - If it looks as though sequential prefetch reads are the primary contributor to higher wait-for-other read times, you can consider taking actions that would reduce table space scan and/or non-matching index scan activity. For that, you could potentially use a query monitor to identify longer-running queries that access objects assigned to the buffer pool in question, and examine EXPLAIN output for those queries to see if any of them have table space scans and/or non-matching index scans in their access plans that involve objects assigned to the buffer pool. Then, consider whether it would be worth it to create a new index or indexes to eliminate such scans (there are cost factors associated with new indexes - you want the benefit to outweigh the cost), or whether simply adding a column to an existing index might reduce scan activity (there is a cost associated with that, too, but it's not as high as the cost of a new index). For dynamic prefetch, keep in mind that this is often related to matching index scans. You can sometimes reduce that activity by enabling Db2 to do more result set row filtering at the index level, and that often involves trying to increase MATCHOLS values for one or more predicates of longer-running and/or more-frequently-executed queries (referring to the name of a column in the PLAN_TABLE in which EXPLAIN output is found). Boosting MATCHCOLS can involve things such as changing an index (add a column, or change the order of columns in a key - keeping in mind that the latter change could benefit some queries and negatively impact others), or maybe re-coding some non-indexable predicates to make them index-able, or maybe adding a predicate that does not change a query's result set. For list prefetch, keep in mind that this often has to do with rows in a table being clustered in a sequence that is very different from the sequence of the index used in the list prefetch operation. You might consider whether a table's clustering key is what it should be - the clustering key of a table can always be changed, and sometimes it makes sense to do that. Also, when index ANDing is driving a lot of list prefetch activity, increasing index-level filtering can help (maybe by adding a column to an index involved in the ANDing, to increase the column-match number, or adding an index that would take the number of indexes AND-ed from n to n+1).
  • Take a look at RID pool activity - List prefetch operations involve use of the Db2 subsystem's RID pool. If the RID pool can't be used for a RID processing operation, Db2 will fall back to a table space scan for the target table, and that can drive sequential prefetch numbers up. RID pools these days are MUCH larger than they used to be (the default RID pool size in a Db2 12 or Db2 13 system is 1 GB), so incidences of RID processing "failed (or not used) due to lack of storage" - something that is indicated in Db2 monitor-generated accounting long as well as statistics long reports - are now quite rare. What you could potentially see, however, in the RID processing block of a statistics long report is a relatively large number of occurrences of "failed due to RDS limit exceeded." What this RID-pool-not-used counter means: If Db2 is executing a query, and a RID list processing action commences, and Db2 determines that more than 25% of the RIDs in the index being accessed will be qualified by the predicate in question, Db2 will abandon the RID list processing action in favor of a table space scan. Can you do anything about this? Maybe. In the case of a static SQL statement, it is my recollection that this RDS limit value is embedded in the associated Db2 package; so, at bind time, Db2 notes the number of RIDs that would exceed 25% of the RIDs in an index that is to be used as part of a RID-list-utilizing access plan action. Why is this potentially important? Because an index could grow substantially in the months (or years) following the most recent bind or rebind of a package. What this means: if you have an application process for which there are many occurrences of "RID list processing failed - RDS limit exceeded," check how long it has been since the package (or packages) associated with the process were last bound or rebound. If it's been a long time, and if you think that relevant indexes have grown substantially since then, consider rebinding the packages - if that rebind results in new and larger "this is the RDS limit threshold for this index" values being embedded in the package, that value increase might be enough to reduce incidences of "RID list processing failed - RDS limit exceeded" for the package.
OK, that's what I've got on this topic. As I mentioned up front: in-Db2 wait-for-other-read time is usually not a matter of concern for Db2 application performance. In some cases, it can be an issue. This blog entry is aimed at helping you should such a case arise at your site (or even better, to help ensure that it doesn't become an issue for you).