Again and again I see it: a mainframe server (or logical partition thereof) that has a dozen or more gigabytes of central storage, and a production DB2 subsystem -- the only one on that z/OS instance -- that has a a buffer pool configuration with an aggregate size of a a gigabyte or less. Sometimes, pages of large and very heavily accessed tables and/or indexes are cached in a pool that has 80,000 buffers, or 40,000, or 15,000, or maybe just 10,000 buffers. The result? Way high rates of I/O activity, as in thousands of disk reads per second. That's a big drag on application performance, negatively impacting both throughput and CPU efficiency. When you have the server memory available to eliminate this undesirable situation, USE IT.
So, first things first: get a handle on your buffer pool I/O situation. You can use a DB2 monitor to do this, but my preference is to use the output of the command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL. More specifically, I like to issue this command once, and then again an hour later. That way I get, in the output of the second issuance of the command, one hour's worth of buffer pool activity data. For each pool referenced in the output of the second command, I sum the numbers in five fields:
- SYNC READ I/O (R)
- SYNC READ I/O (S)
- PREFETCH I/O (under SEQUENTIAL PREFETCH)
- PREFETCH I/O (under LIST PREFETCH)
- PREFETCH I/O (under DYNAMIC PREFETCH)
If you have a really high rate of disk read I/O activity for a pool and you want to make it larger, can you? That depends on whether or not the z/OS system in question has enough memory to accommodate a buffer pool size increase without negatively impacting other work on the server. How do you know that? A good way is to check the demand paging rate on the system. This figure indicates the number of times per second that a page, previously moved from central storage to auxiliary storage, is brought back into central storage to satisfy a program request. It's available by way of a z/OS monitor. The lower the demand paging rate, the less pressure there is on the system's central storage resource. What you want is for the demand paging rate to be less than 10 per second. You also want it to be more than zero, because a super-low demand paging rate means that the mainframe memory your organization has paid for is probably not being leveraged as it should be for system performance. If the demand paging rate is less than one per second, memory on your system is likely being underutilized. Enlarging one or more of your DB2 buffer pools is often a very good way to put underutilized server memory to productive use. In my experience, if there is one production DB2 subsystem on a z/OS instance, and if the size of that DB2 subsystem's buffer pool configuration is less than 10% of the amount of memory available to the z/OS instance, it's almost certain that the demand paging rate is very low. If you make a DB2 buffer pool (or pools) larger, check the demand paging rate again after the fact and make sure that it's still less than 10 per second (and note that an occasional incident such as a dump can cause the demand paging rate to briefly spike -- that's not a big deal).
Something else: if a buffer pool's I/O rate is really high (as in a few thousand per second), and the pool is pretty small (e.g., 20,000 buffers or less for a 4K pool), and the system's demand paging rate is really low, don't just dink and dunk your way to a larger buffer pool configuration. In other words, don't add 1000 buffers to a 15,000-buffer pool. Go for way bigger. Think in terms of doubling the pool's current size, or tripling it, or even quadrupling it. Later, after it's bigger (like, 80,000 buffers or more), you can think about growing it in smaller chunks, percentage-wise (e.g., maybe make it 50% larger). Whenever you take any action to enlarge a buffer pool, follow that with the -DISPLAY BUFFERPOOL commands that I mentioned previously, to gauge the effect of the size increase on the pool's disk read I/O rate. When you do that, in addition to checking on read I/Os per second, look to see if the number of of synchronous reads associated with sequential access (SYNC READ I/O(S)) went down. When a buffer pool is really undersized relative to the volume of requests for pages in objects assigned to the pool, it may be that when a set of 32 pages is brought into the pool via a prefetch read I/O, some of those pages are flushed from the pool before the requesting application process can access them. That drives the number of synchronous reads related to sequential access higher (because those flushed-out pages, when requested, will be read into memory individually). With a larger pool, page residency time increases, and it's less likely that prefetched pages will get flushed before they are accessed for the requesting application process.
Another thing: once an individual buffer pool (as opposed to the whole buffer pool configuration) gets to be pretty big (say, a gigabyte in size), before making it larger still consider the possibility of creating a new pool (maybe 25% or half the size of the big one) and moving some of the highest-activity objects from the really big pool to the new one. This is NOT a technical requirement, as a single 4K buffer pool can grow to a terabyte in size; rather, it's an opportunity to split some objects out in a way that will provide you with more granular buffer pool statistics and a chance to more finely tune the overall buffer pool configuration. Just a thought.
Finally, if your DB2 environment operates in data sharing mode on a parallel sysplex mainframe cluster, keep in mind that an increase in the size of a buffer pool (and you usually want a given pool to be the same size on all members) may make an enlargement of the associated group buffer pool highly advisable -- this mainly to ensure that the group buffer pool will have enough directory entries to prevent directory entry reclaims, which get in the way of top performance. A long time ago, when the earliest users of data sharing asked for a formula to help with group buffer pool sizing, I came up with a pretty simple one: add up the size (in MB) of the local pools, and divide that sum by three (this assumes the default ratio of five directory entries for every data entry in a group buffer pool). Fifteen years later, that simple formula still works really well. So if, in a four-way data sharing group, you want to grow BP5 to 120,000 buffers (480 MB) on each member, a good size for GBP5 would be:
(480 MB X 4 members) / 3 = 1920 MB / 3 = 640 MB
If GBP5 is currently smaller than that, take it up to 640 MB (or more) before taking BP5 to 120,000 buffers.
I hope that this information will help you to assess your buffer pool configuration from a performance perspective, and I hope that you'll grow your DB2 buffer pools to improve throughput and CPU efficiency (assuming that you have enough memory for this on your system -- and I'm sure that a lot of you do).
Hi Robert!
ReplyDeleteWe followed your advise and doubled our buffer
pools BP1 & BP2 and global buffer pools GBP1 & GBP2:
BP1 from 117Mb to 234Mb
BP2 from 117Mb to 234Mb
GBP1 from 120Mb to 160Mb (Maxsize 200Mb)
GBP2 from 120Mb to 160Mb (Maxsize 200Mb)
Our feeling is that we have more or less the
same activity...
Any suggestions!?!?
Thanks a lot!
José
You indicate that you have a "feeling" that you have more or less the same activity. Do you have numbers? Do you know what the total read I/O rates were for these pools before and after your pool enlargement actions?
ReplyDeleteHi Robert!
ReplyDeleteI have the numbers but it's dificult to post it here! Any other place where I can send it?
I'm only interested at present in 4 numbers:
ReplyDelete1) The total read I/O rate per second (random synchronous reads + sequential synchronous reads + sequential prefetch reads + list prefetch reads + dynamic prefetch reads) for BP1 before it was enlarged.
2) The total read I/O rate per second for BP1 after it was enlarged.
3) The total read I/O rate for BP2 before it was enlarged.
4) The total read I/O rate per second for BP2 after it was enlarged.
You can provide these four numbers for each member of your data sharing group. If you'd like to send them via e-mail, you can, to rfcatterall@gmail.com.
Hi Robert,
ReplyDeleteFirstly thanks for sharing such information .
I have DB2 v10 in my mainframe(MVS=z/OS 01.13.00) and few points to discuss here :
1) As per the calculation shared by you, I did the same check on my buffer pool(BP20) which is coming more than 1000.
And the comments shared you on this "If it's north of 1000, I will definitely want to make that pool larger -- maybe much larger (the largest number I've seen for disk read I/Os associated with a DB2 buffer pool is a little over 8000 per second -- I'm sure there are larger numbers out there)."
And same time I see :
READY
DSN SYSTEM(DP61)
DSN
-DISPLAY BUFFERPOOL(BP20) DETAIL
DSNB401I !DP61 BUFFERPOOL NAME BP20, BUFFERPOOL ID 20, USE COUNT 2749
DSNB402I !DP61 BUFFER POOL SIZE = 300000 BUFFERS AUTOSIZE = NO
ALLOCATED = 300000 TO BE DELETED = 0
IN-USE/UPDATED = 1799
DSNB406I !DP61 PGFIX ATTRIBUTE -
CURRENT = YES
PENDING = YES
PAGE STEALING METHOD = LRU
DSNB404I !DP61 THRESHOLDS -
VP SEQUENTIAL = 50
DEFERRED WRITE = 50 VERTICAL DEFERRED WRT = 10, 0
PARALLEL SEQUENTIAL =0 ASSISTING PARALLEL SEQT= 0
which indicates that they are many buffer which are not being used by looking in-use/updated and allocated numeric figures.
Do you still recommend to go for the increase in buffer pool size.
Thanks in advance.
The "IN-USE/UPDATED" figure in the output of the DISPLAY BUFFERPOOL command is often misinterpreted. In your case, you could see the value 1799 in that field, and think, "I have 300,000 buffers in this pool, and only 1799 of then are in use? The pool is way over-allocated!" That would not be a correct conclusion. In fact, it's highly likely that every one of the 300,000 buffers in your BP20 holds a page of an object assigned to the pool. The value 1799 in the IN-USE/UPDATED field does not mean that there are 300,000 - 1799 = 298,201 empty buffers in the pool. It means that 1799 of the pool's buffers are currently non-stealable, either because they are being used right now, or they hold pages that have been updated but not yet externalized (either to disk or, in a DB2 data sharing system, to a group buffer pool in a coupling facility LPAR). It is normal and good for only a small percentage of a pool's buffers to be non-stealable at a given point in time. The fact that the total read I/O rate for this pool is (per your calculations) in excess of 1000 per second indicates that significantly enlarging the pool (assuming that the z/OS LPAR's central storage resource is large enough to accommodate enlargement of the pool) could improve system performance by reducing I/O activity.
DeleteRobert
Thanks Robert for your quick response.
DeleteMy team have been given a task to identify the ways which can really improve the CPU utilization or application performance. And let me tell you, I keep following your blog for the same.
There are observational data, which I cannot share on the public platform. If you could let me know the other way to share the data, I would like to extend this discussion further more.
Thanks.
You can contact me at this e-mail address: rfcatterall@gmail.com.
DeleteRobert
I have sent you a mail sharing the details with subject "Thoughts on DB2 for z/OS Buffer Pool Sizing - Blog" on 19 May 2014.
DeleteCan you check and reply on the same.
Thanks