Thursday, May 28, 2020

Db2 12 for z/OS Buffer Pools - Recommendations for PGSTEAL, PGFIX and FRAMESIZE

Not too long ago, I received a note from the leader of a Db2 for z/OS database administration team, asking for my take on a Db2 12 buffer pool configuration set-up that his team had devised. An interesting characteristic of the configuration was the use of the options PGSTEAL(NONE) and FRAMESIZE(2G) for most of the Db2 subsystem's buffer pools. The team's reasoning regarding this choice was as follows:

  • The z/OS LPAR in which the Db2 subsystem runs has a very large real storage resource, and most of the Db2 subsystem's buffer pools are very large, so why not use 2 GB real storage page frames?
  • Most of the Db2 subsystem's buffer pools are so large that the total read I/O rate for each pool is very low. A low read I/O rate for a pool suggests a low rate of buffer stealing. When that is the case then why not use PGSTEAL(NONE) for the pool, since PGSTEAL(NONE) does not preclude buffer stealing when that is necessary (as it likely will be if the pool is not large enough to hold all pages of all objects assigned to the pool)?

I explained to the DBA team leader that the buffer pool configuration he'd shared with me was likely sub-optimal, owing largely to over-use of the PGSTEAL(NONE) option. I provided some buffer pool configuration recommendations for his consideration. Those recommendations are shown below, in "thumbnail" (i.e., highly condensed) form. Following the list of recommendations is the rationale for each one. I hope that this information will be helpful to you for optimizing the buffer pool configuration in your Db2 12 for z/OS environment.


The list

  • If ALL of the pages of ALL the objects assigned to a buffer pool will fit into 90% of the pool's buffers*, use PGSTEAL(NONE), PGFIX(YES) and FRAMESIZE(1M) for the pool (* for a pool that has a VPSIZE greater than or equal to 64000, this recommendation applies if all pages of all objects assigned to the pool will fit into VPSIZE minus 6400 buffers).
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is at least 20 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(2G) for the pool.
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is less than 2 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(1M) for the pool.
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is between 2 GB and 20 GB, use PGSTEAL(LRU) and PGFIX(YES) for the pool. For the pool's real storage frame size, use either FRAMESIZE(1M) or FRAMESIZE(2G) - your choice.
  • If it is expected that the rate of buffer stealing for a pool will be non-zero but VERY LOW (maybe less than 10 buffer steal actions per minute), you MIGHT consider using PGSTEAL(FIFO) for the pool.

The rationale

Recommendation: If ALL of the pages of ALL the objects assigned to a buffer pool will fit into 90% of the pool's buffers*, use PGSTEAL(NONE), PGFIX(YES) and FRAMESIZE(1M) for the pool (* for a pool that has a VPSIZE greater than or equal to 64000, this recommendation applies if all pages of all objects assigned to the pool will fit into VPSIZE minus 6400 buffers).

Rationale: Why recommend that PGSTEAL(NONE) be used only for Db2 12 buffer pools for which not more than 90% of the buffers (or not more than VPSIZE minus 6400 buffers, when VPSIZE is greater than 64000) will be filled by pages of objects assigned to the pool, when buffer stealing for such a pool can be done, when required? Here's why: Db2 12 manages a PGSTEAL(NONE) buffer pool differently versus Db2 11. In a Db2 12 environment, when the pages assigned to a PGSTEAL(NONE) buffer pool are read into that pool (as they will be when the object is first accessed after the pool has been allocated), they will effectively be arranged in memory as they are arranged on disk. This is why a PGSTEAL(NONE) pool is also referred to as a "contiguous buffer pool" in a Db2 12 system. The in-memory-as-on-disk arrangement of pages reduces the CPU cost of accessing the pages in the pool, because every access to a page in the contiguous part of a PGSTEAL(NONE) buffer pool (there is also a non-contiguous part, as explained below) is a direct access - Db2 knows exactly where a given page in the contiguous part of the pool is located, without having to deal with the hash and LRU chains that are associated with buffer management in a standard pool. Now, a PGSTEAL(NONE) buffer pool still has to allow for buffer stealing in case that is required. How can you enable buffer stealing and still have pages of table spaces or indexes assigned to the pool arranged in memory as they are arranged on disk?

The fact of the natter is, you can't have it both ways - not for the whole pool, at least. You can't both allow for buffer stealing and maintain an in-memory-as-on-disk arrangement of pages in all of the pool's buffers. Here's how Db2 12 addresses this issue: it preserves an in-memory-as-on-disk arrangement of pages for 90% of the buffers of a PGSTEAL(NONE) pool (or for VPSIZE minus 6400 buffers, when VPSIZE is greater than 64000). That is the contiguous part of the pool. Any buffer stealing that has to happen will involve buffers in the remainder of the pool, which is called the overflow area. The number of buffers in the overflow area will be 10% of the pool's VPSIZE if VPSIZE is 64000 or less, and will be 6400 buffers if the pool's VPSIZE is greater than 64000 (for a very small pool, with a VPSIZE smaller than 500, the number of buffers in the overflow area will be 50). What this means is that the maximum CPU efficiency benefit of a Db2 12 PGSTEAL(NONE) buffer pool is achieved not just when there is no buffer stealing for the pool, but when the pool will not get more than 90% full of pages (or when not more than VPSIZE minus 6400 buffers will be occupied by pages, when the pool's VPSIZE is greater than 64000). Why? Because when Db2 has to put pages in the overflow area of a PGSTEAL(NONE) pool, it is no longer arranging all pages in the pool as they are arranged on disk - pages in the overflow area are not managed that way (they can't be, because that's where buffer stealing, if required, will happen). Suppose that a Db2 12 PGSTEAL(NONE) pool has 10,000 buffers. The maximum CPU efficiency benefit for that pool will be realized if not more than 9000 of the pool's buffers are used (i.e., when the pool's overflow area is not used). Another example: if VPSIZE for a PGSTEAL(NONE) pool is 100000, the maximum CPU efficiency benefit for the pool will be realized if not more than 93,600 of the pool's buffers (100,000 minus 6400) are used. To the extent that the overflow area is used, there will be a slight reduction in CPU efficiency delivered via the pool, because access to pages In the overflow area will not be quite as CPU efficient as access to pages in the contiguous part of the pool (where pages are arranged in memory as they are arranged on disk). If the overflow area fills up and some buffer stealing happens, there will be yet additional reduction in the CPU efficiency delivered by way of the PGSTEAL(NONE) pool. [Note that buffer stealing - if required - in the overflow area of a PGSTEAL(NONE) buffer pool will be handled via the FIFO (first in, first out) algorithm.]

What about the FRAMESIZE(1M) recommendation for a PGSTEAL(NONE) buffer pool in a Db2 12 system? Why not go with FRAMESIZE(2G) if a PGSTEAL(NONE) pool will be at least 2 GB in size? Here's why: in a Db2 12 environment, FRAMESIZE(2G) will not be honored for a PGSTEAL(NONE) pool. The reason for this: if pages are going to be arranged in the contiguous part a PGSTEAL(NONE) pool as they are on disk for maximum efficiency of page access (and that's the case, as noted above), we need to restrict a given real storage page frame in the contiguous part of the pool to pages belonging to one and only one database object (i.e., one table space or index). That being the case, if we honored FRAMESIZE(2G) for a PGSTEAL(NONE) pool, there would be the potential for a large amount of wasted space in some of those very large page frames. If you specify FRAMESIZE(2G) for a Db2 12 PGSTEAL(NONE) buffer pool, the pool will end up being backed by 4K page frames. Large page frames can definitely provide a CPU efficiency benefit for Db2 buffer pools. How can you get large page frames for a PGSTEAL(NONE) pool? By specifying FRAMESIZE(1M) for the pool. If a page frame used for the contiguous part of a Db2 12 PGSTEAL(NONE) buffer pool can hold pages belonging to only one database object (as mentioned), wouldn't we potentially waste space in some 1 MB page frames if we were to use them for a PGSTEAL(NONE) pool? Yes, but some wasted space in some 1 MB frames is generally not going to be a big deal in a z/OS LPAR that has (as is more and more often the case) hundreds of thousands of MB of real storage. [Note, by the way, that a 1 MB page frame used for the overflow area of a Db2 12 PGSTEAL(NONE) buffer pool can hold pages belonging to more than one database object - that is one reason for the drop-off in CPU efficiency for page access we get when the page in question is in the overflow area of a Db2 12 PGSTEAL(NONE) pool versus being in the contiguous part of the pool.]

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is at least 20 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(2G) for the pool.

Rationale: The CPU efficiency advantage provided by using 2 GB page frames for a buffer pool is generally not very significant (versus the use of 1 MB frames) unless the buffer pool is really large (say, 20 GB or larger in size).

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is less than 2 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(1M) for the pool.

Rationale: As you might expect, FRAMESIZE(2G) is not going to be honored for a pool that is smaller than 2 GB in size (though we will use a 2 GB frame for a pool if FRAMESIZE(2G) is specified and the pool's size is only a little bit smaller than 2 GB).

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is between 2 GB and 20 GB, use PGSTEAL(LRU) and PGFIX(YES) for the pool. For the pool's real storage frame size, use either FRAMESIZE(1M) or FRAMESIZE(2G) - your choice.

Rationale: For a pool whose size is between 2 GB and 20 GB, use of 2 GB page frames will likely not deliver performance that is significantly better than what you'd see with FRAMESIZE(1M). With that said, use of 2 GB frames for a pool that is smaller than 20 GB will not have a negative impact on performance versus 1 MB frames; so, I say that this is your choice, and you'll be fine either way.

Recommendation: If it is expected that the rate of buffer stealing for a pool will be non-zero but VERY LOW (maybe less than 10 buffer steal actions per minute), you MIGHT consider using PGSTEAL(FIFO) for the pool.

Rationale: I see PGSTEAL(FIFO) as being very much a niche-type specification. Yes, it means that we won't consume CPU cycles in tracking least-recently-used information for buffers, but it could negatively impact performance if the buffer stolen via FIFO (the one that, of all the buffers in the pool, holds the page read into the pool the longest time ago) happens to be frequently referenced - in that case, that buffer will likely have to be read right back into the pool. With this in mind, I think that PGSTEAL(FIFO) might deliver a bit of a performance advantage over PGSTEAL(LRU) if the rate of buffer stealing is a really low non-zero value (like, maybe single digits of buffer steals per minute). If the buffer stealing rate is zero, then as previously mentioned the pool could be a good candidate for PGSTEAL(NONE).

5 comments:

  1. Note: a relatively new Db2 12 for z/OS APAR, PH22469, changes things (see https://www.ibm.com/support/pages/apar/PH22469 for more information). With the fix for this APAR applied, if a buffer pool is defined with PGSTEAL(NONE) and FRAMESIZE(2G), Db2 will honor the FRAMESIZE(2G) specification (assuming that 2 GB real storage page frames are available in the system) but will not honor the PGSTEAL(NONE) specification - PGSTEAL(LRU) will be used, instead. Prior to the fix for APAR PH22469 being applied, in that same scenario (buffer pool defined with PGSTEAL(NONE) and FRAMESIZE(2G)) Db2 will honor the PGSTEAL(NONE) specification but will back the pool with 4 KB page frames instead of 2 GB page frames.

    Robert

    ReplyDelete
    Replies
    1. Interesting and very informative. I would check out a additional new PTF - UI69081 which deals with wasted LFAREA memory for 1M frames under Db2 12 for z/OS.

      Delete
  2. Hi Robert, excellent contributions. Thanks for everything.
    I'll tell you a situation that we are observing.
    We have a buffer pool with PGFIX YES with FRAME SIZE 2G and 1M for a few months.
    After recycling the db2 on Sunday nights, an improvement in performance is observed on Mondays during the online period. The next day after the batch is normal. What could be the reason for this behavior?
    Thank you in advance

    ReplyDelete
    Replies
    1. Hard to say. Here's one possibility: a too-large percentage of the z/OS LPAR's real storage resource is being managed in 1M and 2G page frames. That might lead to a situation, not long after a system re-cycle, in which there is a shortage of 4K real storage page frames (many z/OS-based processes can only use 4K page frames). That might in turn cause z/OS to break down many of the 2G and/or 1M frames into 4K frames to boost the supply of the latter. The resulting loss of 2G and 1M page frame resources could reduce the CPU efficiency benefit that had been delivered by those resources.

      I don't know if this is happening on your system. As noted, this is just a possibility that I'm putting forward.

      Robert

      Delete
    2. Hi Robert,
      Thanks for the quick reply.
      Our case is not lack of memory for bufferpool pages.
      I'll look for other reasons, maybe something that shows that the night shift is changing the scene.

      Delete