Sunday, November 29, 2015

DB2 for z/OS: How I Use -DISPLAY BUFFERPOOL Output

A couple of weeks ago, I was going over the output of the DB2 for z/OS command -DISPLAY BUFFERPOOL with a group of people, and one of the meeting participants said words to this effect: "You've got some useful formulas and rules of thumb here. Could you write them down for us?" I did that, and it occurred to me that the information would be useful to others in the larger DB2 for z/OS community; so, I've packaged it in this blog entry.

First, enter the command the right way

Based on my experience, the most useful form of the -DISPLAY BUFFERPOOL command is as follows:

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL

You actually want to enter the command twice, one hour apart. Why? Because the command output shows activity since the buffer pools were last allocated (which was probably when the target DB2 subsystem was last "bounced"), or since the command was last issued. If you issue the command once, then issue it again one hour later, the output of the second issuance of the command will show the preceding hour's worth of activity. That being the case, you can divide the activity counters by 3600 to get per-second figures, and those figures are very useful for performance monitoring and tuning work. [Note that the command output includes a DSNB409I message for each buffer pool, the text of which reads, "INCREMENTAL STATISTICS SINCE timestamp-value." Check that timestamp in the output of the second of the two issuances of the command, to verify that the command captured approximately one hour of activity.]
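As an illustration, here is a minimal Python sketch of that interval check and the per-second arithmetic. The timestamps and the counter value are made up for this example -- check your own DSNB409I output for the exact timestamp format:

from datetime import datetime

# Hypothetical timestamps copied from the DSNB409I message in the
# output of each of the two command issuances.
t1 = datetime.strptime("09:00:12 NOV 29, 2015", "%H:%M:%S %b %d, %Y")
t2 = datetime.strptime("10:00:15 NOV 29, 2015", "%H:%M:%S %b %d, %Y")

interval_seconds = (t2 - t1).total_seconds()  # should be close to 3600

# Any activity counter from the second command's output, divided by
# the interval length, becomes a per-second figure.
random_sync_reads = 2_700_000                 # made-up counter value
print(random_sync_reads / interval_seconds)   # about 749 per second here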

It might be a good idea to use -DISPLAY BUFFERPOOL to capture activity for a busy "online" hour (often a mid-morning or mid-afternoon hour) and for an hour during which a batch workload is particularly heavy -- one period might see more of a load on your buffer pool configuration than the other.

Something else to consider: some organizations have written programs that issue a pair of -DISPLAY BUFFERPOOL commands (again, separated by one hour) one or more times per day, along with a REXX routine that parses the output of the second issuance of a pair of commands, extracts useful information, and inserts those values into a DB2 table or a file. That way, trends can be tracked and responded to with adjustments to the buffer pool configuration.
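If you'd rather not write REXX, here is a minimal Python sketch of such a parse-and-extract step. It assumes the command output has been captured to a flat file (the file name is hypothetical), and the patterns are placeholders -- the exact layout of the DSNB messages varies by DB2 version, so adapt them to what your subsystem actually produces:

import csv
import re

# Placeholder patterns -- adjust to the DSNB message layout that your
# DB2 version generates.
POOL_PAT = re.compile(r"BUFFERPOOL NAME (BP\w+)")
FIELD_PAT = re.compile(r"([A-Z][A-Z /()-]+?)\s*=\s*(\d+)")

INTERVAL_SECONDS = 3600  # the hour between the two command issuances

pools = {}
current = None
with open("dispbpool.out") as f:  # hypothetical capture of the output
    for line in f:
        m = POOL_PAT.search(line)
        if m:
            current = pools.setdefault(m.group(1), {})
        elif current is not None:
            for name, value in FIELD_PAT.findall(line):
                current[name.strip()] = int(value)

# Write per-second rates to a file (an insert into a DB2 table via the
# tool of your choice would work just as well).
with open("bpool_rates.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["pool", "counter", "per_second"])
    for pool, fields in sorted(pools.items()):
        for name, value in sorted(fields.items()):
            w.writerow([pool, name, round(value / INTERVAL_SECONDS, 2)])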

OK, with the output of the second issuance of a pair of one-hour-apart -DISPLAY BUFFERPOOL commands in hand, here's how you can put the information to good use:

  • Add up the sizes of all the buffer pools (in megabytes), to get the total size of the buffer pool configuration for the DB2 subsystem. That's easily done by multiplying the number of buffers allocated for a pool by the pool's page size; so, a pool with 20,000 buffers of 4KB apiece is sized at 80 MB (a short Python sketch following this list illustrates this arithmetic, along with the read I/O rate calculation described in the next item). If you have a single production DB2 subsystem running in a z/OS LPAR, I would look to move towards a situation in which the total size of the subsystem's buffer pool configuration is 30-40% of the size of the z/OS LPAR's memory resource. For example, if a z/OS LPAR with 100 GB of central storage holds one production DB2 subsystem, I'd aim for a buffer pool configuration size (that is, the aggregate size of all buffer pools allocated for the subsystem) that is in the range of 30-40 GB (if a z/OS LPAR holds more than one production DB2 subsystem, I'd want the total size of all the subsystems' buffer pool configurations to be not much more than 50% of the LPAR's real storage resource). If you're not in that 30-40% of memory "zone" at present, there's no need to get there in one giant step from where you are. A series of steps (but pretty good-sized steps, not a lot of baby steps) would be fine. You should be most aggressive in growing the pools that have the highest total read I/O rates (see the next item). By the way, the aim here is to have a buffer pool configuration that makes really good use of an LPAR's memory for enhanced performance, while avoiding a situation in which LPAR memory is under too much pressure -- if the LPAR's demand paging rate, which can be obtained from an RMF (or equivalent z/OS monitor) CPU activity report, is in the low single digits or less per second, the memory resource is not under too much pressure.
  • For each pool, calculate the total read I/O rate. That's the sum of five numbers (for each pool) from the command output -- random synchronous reads, sequential synchronous reads, sequential prefetch reads, list prefetch reads, and dynamic prefetch reads -- divided by the number of seconds in the interval between the first and second issuance of the -DISPLAY BUFFERPOOL command (if that interval was, as suggested, an hour, the number of seconds would be 3600). Focus your attention first on the pools (if any) with read I/O rates in excess of 1000 per second, and see if you can get the I/O rate for those pools below 1000 per second by making the buffer pools larger. You can then turn your attention to pools with a read I/O rate in the hundreds per second, to see if the read I/O rate for those pools can be brought below 100 per second. If the read I/O rate for each of your buffer pools is below 100 per second, you are making exceedingly good use of LPAR memory.
  • The -DISPLAY BUFFERPOOL output shows the specifications for each buffer pool, and you should examine those. All high-I/O pools (total read I/O rate > 100 per second) should be defined with PGFIX(YES), and those page-fixed pools should be backed by 1 MB page frames (check to see that the value of the LFAREA parameter in the IEASYSxx member of the system's PARMLIB data set is large enough for this purpose). If your demand paging rate is very low (as noted previously, that would be low single digits per second or less), consider going with PGFIX(YES) even for low-I/O pools that have a lot of GETPAGE activity (more than 1000 GETPAGEs per second). Page-fixing these pools will deliver a CPU benefit if the pools are backed by 1 MB page frames. (Example -ALTER BUFFERPOOL commands for this and the other specification changes mentioned below appear after this list.)
  • Are any of the DB2 subsystem's pools used to "pin" objects in memory (i.e., to cache table spaces and/or indexes in memory in their entirety)? If yes, such pools should be defined with PGSTEAL(NONE), so that DB2 will know that they are intended to be "pinning" pools (the PGSTEAL(NONE) specification was introduced with DB2 10 for z/OS). Additionally, the target read I/O rate for a "pinning" pool is zero. If a "pinning" pool has a read I/O rate that is greater than zero, consider whether the pool needs to be enlarged so that all pages of all objects assigned to the pool can be kept in memory.
  • The counter for the number of times that the data manager threshold (DMTH) was reached should be zero for all pools. If that threshold is hit, either the buffer pool is way too small or the deferred write thresholds are way too high. [For a pool dedicated to 4KB-page or 32KB-page work file table spaces, the deferred write thresholds -- DWQT and VDWQT -- can be set to values higher than the respective defaults of 30 and 5, and doing this can result in some CPU savings, but don't take that idea too far. DWQT/VDWQT settings of 70/40 or even 80/50 should be OK for a work file-dedicated pool, but some sites have gone to 90 for DWQT for such pools, and have ended up hitting the data manager threshold as a result. Hitting DMTH causes the CPU cost of accessing data in a pool to jump, so make sure that you're staying under that threshold.]
  • The "prefetch disabled" counters -- no buffer, and no read engine -- should ideally be zero for all pools. If either is non-zero for a pool, either the pool is too small, or the VPSEQT threshold (default value is 80) is too low (so there are too few buffers available to hold pages read via prefetch, leading to elevated levels of prefetch read activity), or the deferred write queue threshold (DWQT) is too high (this would generally apply to a work file-dedicated pool, as mentioned in the preceding item).
  • For each pool, the number of asynchronous write operations should be significantly larger than the number of synchronous write operations. If this is not the case, the vertical deferred write queue threshold (VDWQT), or the horizontal deferred write queue threshold (DWQT), or both, should be lowered.
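To tie together the first two calculations in the list above, here is a minimal Python sketch of the configuration-size and read-I/O-rate arithmetic. The pool names and counter values are made up for illustration; in practice, they would come from the command output (perhaps extracted by a routine like the one sketched earlier):

INTERVAL_SECONDS = 3600  # one hour between the two command issuances

pools = {
    # name: number of buffers, page size (KB), and the five read
    # counters -- random sync, sequential sync, sequential prefetch,
    # list prefetch, dynamic prefetch (all values here are made up)
    "BP1": {"vpsize": 200_000, "page_kb": 4,
            "reads": [2_880_000, 144_000, 720_000, 36_000, 360_000]},
    "BP8K0": {"vpsize": 25_000, "page_kb": 8,
              "reads": [36_000, 1_800, 18_000, 0, 3_600]},
}

total_mb = sum(p["vpsize"] * p["page_kb"] for p in pools.values()) / 1024
print(f"Total buffer pool configuration: {total_mb:,.0f} MB")

for name, p in pools.items():
    # Sum of the five read counters, divided by the interval length.
    rate = sum(p["reads"]) / INTERVAL_SECONDS
    action = "grow this pool first" if rate > 1000 else "OK for now"
    print(f"{name}: {rate:,.0f} read I/Os per second ({action})")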
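As for the specification changes discussed above, those are made via the -ALTER BUFFERPOOL command. The pool names and values below are placeholders, not recommendations for your system (note that a PGFIX change takes effect the next time the pool is allocated):

-ALTER BUFFERPOOL(BP5) PGFIX(YES)
-ALTER BUFFERPOOL(BP9) PGSTEAL(NONE)
-ALTER BUFFERPOOL(BP7) DWQT(70) VDWQT(40,0)

The last of these shows the kind of higher DWQT/VDWQT settings discussed above for a work file-dedicated pool; for most other pools, the defaults of 30 and 5 are reasonable starting points.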

I hope that you find this information to be useful. Stay tuned for a companion entry in which I'll document the ways in which I use -DISPLAY GROUPBUFFERPOOL information (relevant to DB2 data sharing systems).

2 comments:

  1. Hi Robert,
    Thank you for this article - very informative (as usual).
    I have a question regarding the "prefetch disabled - no buffer" condition.
    I recently noticed that one of my buffer pools on a specific LPAR (we have 8-way data sharing) is getting thousands and thousands of "prefetch disabled - no buffer" conditions. In fact, for some intervals, it's reaching over 100K such occurrences per hour.
    The buffer pool was at 110K buffers. I increased it to 150K buffers, but I don't see much of a positive effect. There is some reduction in the number of "prefetch disabled - no buffer" conditions, but it's not substantial.
    This buffer pool is used for index data sets only. VPSEQT is set at 85% (VDWQT is at 10, and DWQT is at 30%).
    I cannot increase this BP any further (no more available storage on the LPAR).
    Could you recommend anything else to address this issue?
    Regards, Ilya

    Replies
    1. Sorry about the delay in responding, Ilya.

      You could try reducing VDWQT to the default value of 5, versus 10. That should trigger more deferred write activity, and the result could be more available buffers for receiving pages brought into memory via prefetch.

      You indicate that you can't make the buffer pool any larger because of a lack of real storage in the z/OS LPAR. Could you make the buffer pool larger by decreasing the size of another buffer pool (or pools) by a like amount (so that the overall size of the buffer pool configuration would remain the same)? Check the total read I/O rate (all synchronous reads plus all prefetch reads, per second) for each buffer pool allocated for the DB2 subsystem. Do you see that one or more pools have a low total read I/O rate (less than 100 per second)? If so, consider taking buffers from that pool (or pools) by reducing VPSIZE, and adding the same number of buffers to the pool in question (the one for which you're seeing lots in the way of PREFETCH DISABLED - NO BUFFER).
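      For example (pool names and sizes hypothetical), if BP3 is a low-I/O pool with a VPSIZE of 150000 and BP4 is your constrained pool at 150000, you could shift 50,000 buffers like so:

      -ALTER BUFFERPOOL(BP3) VPSIZE(100000)
      -ALTER BUFFERPOOL(BP4) VPSIZE(200000)

      Because both pools have the same page size, the overall real storage footprint of the buffer pool configuration stays the same.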

      Longer term, push for more memory on the system. That's a way to boost throughput and CPU efficiency for DB2-accessing applications, without increasing software license costs (since z Systems software is not licensed on a server memory basis).

      Robert
