Sunday, July 21, 2013

DB2 for z/OS Data Sharing: The Evolution of a GBP Sizing Formula

Recently I had an e-mail exchange with my friend, colleague, and fellow DB2 blogger Willie Favero, on the subject of DB2 for z/OS group buffer pool sizing. One of my messages was a little wordy (imagine that), and Willie responded with, "[This] reads like it could be your next blog post." And so it is. In this entry I'll share with you what I think is an interesting story, along with some information that I hope will help you with group buffer pool monitoring and tuning.

For those of you who aren't familiar with the term "group buffer pool," a bit of introduction: DB2 data sharing is technology that enables multiple DB2 for z/OS subsystems to share concurrent read/write access to a single database. The DB2 subsystems are members of a data sharing group that runs on a cluster of z/OS systems called a Parallel Sysplex. Within the Parallel Sysplex are resources known as coupling facilities. These are essentially shared memory devices. Within a coupling facility one would find several structures, and among these would be the DB2 group buffer pools, or GBPs. Basically, group buffer pools are used for two things:
  • Page registration -- When a DB2 member subsystem reads into a local buffer pool a page belonging to a GBP-dependent data set (i.e., a table space or index -- or a partition of a partitioned table space or index -- in which there is inter-DB2 read/write interest), it registers that page in a directory entry of the corresponding GBP (so, a page read into buffer pool BP1 would be registered in GBP1). DB2 member X registers a locally cached page of a GBP-dependent data set so that it can be informed if that page is changed by a process running on DB2 member Y. Such a change effected on another member of the data sharing group would cause the copy of the page cached locally in a buffer pool of DB2 member X to be marked invalid. On re-referencing the page, DB2 member X would see that its copy of the page is invalid, and would then look for the current version of the page in the associated GBP (and would retrieve that current version from disk if it were not found in the GBP).
  • Caching of changed pages -- If a process running on DB2 member Y of the data sharing group changes a page belonging to a GBP-dependent data set, that member will write the changed page to the associated GBP in a coupling facility LPAR (usually at commit time, but sometimes before a commit). The changed page will remain in the GBP -- from whence it can be retrieved by a member DB2 subsystem in a few microseconds -- for some time following the update. Eventually the page will be overwritten by another changed page, but before that happens the GBP data entry occupied by the page will be made stealable via a process called castout, through which changed pages written to a GBP are externalized to disk.

OK, now for the story: back in the mid-1990s, when data sharing was introduced with DB2 Version 4 for z/OS, an organization that was one of the early adopters of the technology wanted to know how large a GBP ought to be. I was working at the time in IBM's DB2 for z/OS national technical support group, and I fielded this question and thought that "as large as possible" was an unacceptably imprecise answer; so, I set about trying to determine the right size for a GBP in a quantitative way. I focused on GBPs with 4K-sized data entries because buffer pools that hold 4K-sized pages were (and to a somewhat lesser extent still are) dominant versus buffer pools that hold 8K, 16K, or 32K-sized pages. Further, I assumed that the default ratio of five GBP directory entries for every one data entry would be in effect. I also knew that a GBP directory entry occupied about 200 bytes of space (that was then -- I'll get to the current size of a GBP directory entry in a bit).

The last piece of the puzzle was a GBP sizing objective. A GBP sized right would be beneficial because... what? That "what," I decided, should be 1) avoidance of GBP write failures due to lack of storage (something that can occur if a GBP is so small that casting out of changed pages to disk -- required to make GBP data entries stealable -- cannot keep up with the volume of writes of changed pages to the GBP) and 2) avoidance of GBP directory entry reclaims (if a page must be registered in a GBP but all of that GBP's directory entries are in use, a directory entry has to be reclaimed, and when that happens the copies of the page cached in local buffer pools of member DB2 subsystems have to be preemptively marked invalid). Knowing that GBP write failures due to lack of storage are most likely to occur for a way-undersized GBP, I decided to concentrate on determining a GBP size that would virtually eliminate the possibility of directory entry reclaims, my thinking being that a GBP so sized would also be large enough to avoid GBP write failures due to lack of storage (and that has proven to be the case, in my experience).

How, then, to avoid GBP directory entry reclaims? I realized that the key to accomplishing that goal was having at least as many GBP directory entries as there were "slots" into which different pages of table spaces and indexes could be cached. These slots would be the buffers of local pools, PLUS the data entries in the corresponding GBP. If, for example, a data sharing group had two member DB2 subsystems, and if BP1 on each member had 30,000 buffers, you'd have 60,000 page slots right there. 60,000 directory entries in GBP1 would cover that base, right? Yeah, but with the standard 5:1 ratio of GBP directory entries to data entries, those 60,000 directory entries would bring along 12,000 data entries in GBP1. That's 12,000 additional slots in which different pages could be cached, so you'd need 12,000 more directory entries to eliminate the possibility of directory entry reclaims. Well, those 12,000 GBP directory entries would bring along another 2400 GBP data entries (with the aforementioned 5:1 ratio in effect). Providing 2400 more directory entries to account for those page slots would give you 480 more GBP data entries, so you'd need 480 additional directory entries, and so on. I came up with a formula into which I plugged numerical values (200 bytes for a directory entry, 4K for a data entry, 5 directory entries for every 1 data entry), and saw that I had a converging sequence. Expressed as a fraction of the combined size of the local pools, the resulting GBP size converged to approximately 0.3125, meaning that adding up the sizes (in megabytes) of the local BPn buffer pools and multiplying that figure by 0.3125 would give you a size for GBPn that would virtually ensure zeroes for directory entry reclaims and for GBP write failures due to lack of storage (something that could be verified using the output of the command -DISPLAY GROUPBUFFERPOOL(GBPn) GDETAIL(*)).
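The convergence described above can be checked numerically. What follows is a minimal sketch, not the original calculation: the function name is hypothetical, and it assumes that a 4K data entry is counted as 4000 bytes, which is what makes the result come out to 0.3125.

```python
def gbp_to_local_ratio(dir_entry_bytes, page_bytes, dir_per_data, terms=50):
    """Return GBP bytes needed per byte of local buffer pool.

    Works from a single local buffer and sums the series from the text:
    each batch of new directory entries brings along data entries, and
    those data entries need directory entries of their own.
    """
    directory_entries = 0.0
    new_slots = 1.0  # start with one local buffer (one page slot)
    for _ in range(terms):
        directory_entries += new_slots
        # data entries that come along with the directory entries just added
        new_slots /= dir_per_data
    data_entries = directory_entries / dir_per_data
    gbp_bytes = directory_entries * dir_entry_bytes + data_entries * page_bytes
    return gbp_bytes / page_bytes

# 200-byte directory entries, 4K pages (taken as 4000 bytes), 5:1 ratio
print(round(gbp_to_local_ratio(200, 4000, 5), 4))  # 0.3125
```

With 4096-byte pages the same calculation gives a slightly smaller figure (about 0.311), so the choice of units does not change the rule of thumb.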

Nice, but I was concerned that 0.3125 was a number that people would not be able to readily bring to mind, and I really wanted a GBP sizing formula that a person could carry around in his or her head; so, I nudged 0.3125 up to 0.33, and thus arrived at the GBP sizing rule of thumb that came to be known as "add 'em up and divide by 3" (example: given a 4-way DB2 data sharing group, with a BP3 sized at 30,000 buffers on each member, and the default 5:1 ratio of directory entries to data entries in the GBP, a good initial size for GBP3 would be one-third of the combined size of the BP3 pools, or (30,000 X 4KB X 4) / 3, which is 160 MB). That formula was widely and successfully used at DB2 data sharing sites, and it continued to work well even as the size of a GBP directory entry increased (with newer versions of coupling facility control code). Now that the size of a directory entry is about 400 bytes, "add 'em up and divide by 3" has become "add 'em up and multiply by 0.375." So, given that prior example of a 4-way data sharing group with BP3 sized at 30,000 buffers on each member, a GBP3 size that would virtually eliminate the possibility of directory entry reclaims (because it would give you, with the default 5:1 directory-to-data entry ratio, a number of directory entries about equal to the number of "slots" into which different pages could be cached in the local BP3 buffer pools and in GBP3) would be:

(30,000 X 4KB X 4) X 0.375 = 180 MB
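The rule of thumb is easy to wrap in a small helper. This is an illustrative sketch only: the function name and parameter names are made up for the example, and it uses decimal units (1 MB = 1000 KB) to match the arithmetic above.

```python
def rule_of_thumb_gbp_mb(buffers_per_member, members, page_kb=4, factor=0.375):
    """Aggregate local pool size in MB, multiplied by the sizing factor.

    Assumes the default 5:1 directory-to-data-entry ratio and 4K pages,
    which is what the 0.375 factor is derived for.
    """
    aggregate_mb = buffers_per_member * page_kb * members / 1000
    return aggregate_mb * factor

# 4-way group, BP3 with 30,000 buffers on each member
print(rule_of_thumb_gbp_mb(30_000, 4))  # 180.0
```

Passing factor=1/3 reproduces the older "add 'em up and divide by 3" figure of about 160 MB for the same configuration.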

Now, I've mentioned multiple times that the formula I derived was based on two assumptions: a local buffer pool with 4K-sized buffers, and a GBP with five directory entries for every one data entry. What about buffer pools with larger buffer sizes (e.g., 8K or 16K)? What about GBP directory-to-data-entry ratios (which can be changed via the -ALTER GROUPBUFFERPOOL command) other than the default 5:1? Regardless of the size of the buffers and the directory-to-data-entry ratio in effect for a GBP, the basic goals remain the same: avoid directory entry reclaims and avoid write failures due to lack of storage. Because accomplishing the first of these goals is likely to result in achievement of the second objective as well, you focus on sizing a GBP to eliminate directory entry reclaims. Directory entries won't be reclaimed if there is at least one directory entry for every "slot" in which a different page could be cached (i.e., you need to have a number of directory entries that is at least as great as the number of buffers in the local buffer pools PLUS the number of data entries in the GBP).
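The condition in the preceding paragraph can also be written as algebra, which gives a closed form. The symbols are introduced here purely for illustration: N is the total number of local buffers across all members, r is the directory-to-data-entry ratio (r > 1), D and E are the numbers of GBP directory and data entries, d is the size of a directory entry, and p is the page size.

```latex
\begin{align*}
D &\ge N + E, \qquad E = \frac{D}{r} \\
D &\ge \frac{r}{r-1}\,N, \qquad E \ge \frac{N}{r-1} \\
S_{\text{GBP}} &\approx D\,d + E\,p = \frac{N\,(r\,d + p)}{r-1}
\end{align*}
```

With r = 5, d = 400 bytes, and p taken as 4000 bytes, this gives a GBP size of about 1500 N bytes, or 0.375 times the 4000 N bytes of local buffers, which agrees with the multiplier given earlier. The closed form ignores rounding to whole entries, so an iteration that rounds up at each step will land a few entries higher.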

To see how this GBP sizing approach can be used, suppose that you have a 2-way DB2 data sharing group, with a BP16K1 that has 10,000 buffers on each member. Suppose further that you have reason to want to use a 7:1 directory-to-data-entry ratio for GBP16K1 instead of the default 5:1 ratio. In that case, how should GBP16K1 be sized so as to prevent directory entry reclaims? Well, to begin with, you know that you'll need a directory entry in GBP16K1 for every local BP16K1 buffer. That's 20,000 directory entries. Given the 7:1 directory-to-data-entry ratio in effect for GBP16K1, along with the 20,000 GBP directory entries you'll have 2858 data entries (20,000 divided by 7, and rounded up to the nearest integer value). To cover those additional 2858 page "slots," you'll need another 2858 directory entries in the GBP. Again with the 7:1 directory-to-data-entry ratio in mind, the 2858 additional directory entries will come with another 409 data entries (2858 divided by 7, rounded up). You'll need another 409 directory entries to cover those 409 data entries, but that means an additional 59 data entries. The 59 directory entries needed to cover those data entries will bring along 9 more data entries, and the 9 directory entries needed to cover those data entries will mean 2 more data entries, for which you'll need 2 more directory entries. Add one more data entry to go along with those last 2 directory entries (2 divided by 7, rounded up), and you've taken the sequence far enough.

Now, just add up the directory entries you got with each iteration of the sequence, starting with the initial 20,000:

20,000 + 2858 + 409 + 59 + 9 + 2 = 23,337

Divide that figure by 7 (and round up to the nearest integer) to get a number of data entries for the GBP:

23,337 / 7 = 3333.86, or 3334 when rounded up

Now you can size GBP16K1 by multiplying the number of directory entries by 400 bytes and the number of data entries by 16KB:

(23,337 X 400 bytes) + (3334 X 16KB) = 62.7 MB

I'd round that up to 63 MB for good measure. And there you have it: a GBP16K1 sized at 63 MB (given our use of a 7:1 directory-to-data-entry ratio for this GBP) should result in your seeing zero directory entry reclaims for the GBP (and zero write failures due to lack of storage, to boot). As mentioned previously, you can check the directory entry reclaim and write failure counts using the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBP16K1) GDETAIL(*) (using GBP16K1 in this case). Dealing with a different buffer size? A different GBP directory-to-data-entry ratio? Doesn't matter. You can use the approach laid out above for any buffer size and any GBP directory-to-data-entry ratio.
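The iteration walked through above can be sketched as a short function. This is an illustration under stated assumptions rather than a definitive tool: the function name is hypothetical, the stopping rule (quit once only one leftover data entry remains) simply mirrors the worked example, ratios below 2:1 are rejected because that stopping rule would not terminate for them, and 16KB is taken as 16,000 bytes to match the 62.7 MB figure.

```python
import math

def size_gbp(local_buffers, dir_per_data, page_bytes, dir_entry_bytes=400):
    """Return (directory entries, data entries, GBP size in bytes)."""
    if dir_per_data < 2:
        raise ValueError("this sketch only handles ratios of 2:1 or higher")
    directory_entries = 0
    new_slots = local_buffers  # page slots that still lack a directory entry
    # Stop once only a single leftover data entry remains, as in the text.
    while new_slots > 1:
        directory_entries += new_slots
        # data entries (rounded up) that come with the entries just added
        new_slots = math.ceil(new_slots / dir_per_data)
    data_entries = math.ceil(directory_entries / dir_per_data)
    total_bytes = directory_entries * dir_entry_bytes + data_entries * page_bytes
    return directory_entries, data_entries, total_bytes

# 2-way group, BP16K1 with 10,000 buffers per member, 7:1 ratio
dirs, data, size = size_gbp(20_000, 7, 16_000)
print(dirs, data, round(size / 1_000_000, 1))  # 23337 3334 62.7
```

Changing the arguments covers other page sizes and ratios, for example size_gbp(60_000, 5, 4_000) for the two-member BP1 case with 30,000 buffers on each member.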

I hope that this information will be useful to you in right-sizing your group buffer pools, if you're running DB2 for z/OS in data sharing mode (if you're not, consider the use of this technology at your site -- you can't beat it for availability and scalability).

18 comments:

  1. Great post.. thank you for sharing.

    Replies
    1. Appreciate the kind words, Troy.

      Robert

  2. Thank you for sharing ..

  3. Robert, this is great.. have a couple of questions though.

    1. Is a data page read by a DB2 member always registered in a directory entry in the GBP, or only when it is GBP-dependent? I was thinking always.

    2. What's the typical size of a directory entry? Is it 200 KB or 432 KB?

    Replies
    1. 1) Page registration occurs only for GBP-dependent page sets (or partitions, if you're talking about a partitioned table space or index). Even when a page set is GBP-dependent, if data sharing group member DB2A is the only member that has the page set open for read/write access (other members are accessing the page set in a read-only way), DB2A does not have to register pages that it reads. Why? Because the purpose of registration is basically to enable a member to receive a notification from the coupling facility if a page cached in a local buffer pool is changed by a process running on another member of the group (the locally cached page in that case is marked as invalid). If DB2A is the only member that is changing pages of a page set (via UPDATE and/or DELETE and/or INSERT) that is being accessed in read mode by other group members (thereby making the page set GBP-dependent), DB2A will not need any coupling facility notifications of changed pages belonging to the page set (it knows what's changing because it's doing the changing). DB2A therefore will not register pages as it reads them from disk.

      2) I believe that the size of a directory entry is about 250 bytes.

      Robert

  4. Thanks for the interesting information. I can't follow all of the points, though. You're assuming that because each member has a local buffer pool, the group buffer pool in the coupling facility should cover the sum of all the matching local pools. So the fiction is that every page could be group-buffer-pool-dependent. Sure, it could be, but in practice it never will be. It also depends on the applications that are running. In our installation, with up to 6 members in a data sharing group, I defined a trickle size.
    Also interesting for me was the comparison between the -DISPLAY GROUPBUFFERPOOL command output and the output of the SMF PM report. If you look at a 32K group buffer pool, you will see that SMF counts in 4096-byte units, so dividing the directory entry figure by 32 and multiplying it by 4 (for a 4K page) gives the entry figure shown in the display command output.

    Replies
    1. I don't think that "fiction" is the right term for what I wrote. "Theoretical possibility" is a more accurate term. The important point is this: organizations running DB2 for z/OS data sharing groups wanted a GBP sizing approach that is virtually guaranteed to result in zero directory entry reclaims for their group buffer pools. If you want to guarantee that outcome, you'll have a directory entry in GBPn for every local BPn buffer and every GBPn data entry. If you have fewer than that number of directory entries, will you have directory entry reclaims? That depends on how many fewer directory entries you're talking about, and on the particular data access patterns of your DB2 data sharing system. Plenty of organizations with which I work have enough memory in their CF LPARs to allow them to size GBPs using the "guaranteed zero directory entry reclaims" approach. If an organization does NOT have enough CF LPAR memory to do that, smaller GBPs can still deliver zero-directory-entry-reclaim results, but as the GBP size gets smaller and smaller relative to the size you'd have in following the guaranteed-zero-directory-entry-reclaim approach, the probability that some directory entry reclaims will occur will increase. Of course, a few directory entry reclaims will probably not have a significant negative impact on system performance.

      With regard to GBP directory entry numbers, I'm not familiar with the "SMF PM" report you mentioned in your comment. I've looked at data in RMF Coupling Facility Activity Reports. What is an "SMF PM" report?

      Robert

  5. Is it possible to use a directory-to-data-entry ratio of 1:1? If so, what would the formula for calculating the size of the group buffer pool be?

    Replies
    1. Yes, that is possible. If you were to have a 1:1 directory-to-data entry ratio for a GBP, I think that the key to avoiding directory entry reclaims would be to have a number of data entries in the GBP that is larger than the number of buffers in the associated local pools. So, if you had a 2-member data sharing group, and BP5 had 1000 buffers on both member subsystems, that would be a total of 2000 local buffers. If GBP5 had, for example, 2500 data entries, it would also have 2500 directory entries. That would be enough directory entries to register 2000 different pages, if every buffer in both local pools held a different page. The extra 500 directory entries would be for changed pages written to the GBP and not yet cast out to disk.

      If my example BP5 had assigned to it only one object (a table space or index), and if that object had 1000 pages (i.e., if the purpose of BP5 were to completely cache the object in memory), you shouldn't need more than 1000 directory entries (and 1000 data entries) in GBP5.

      Robert

  6. I'm starting to review our GBP sizing now, and am wondering how you determine the appropriate entry to page ratio. What factors would prompt you to change from the apparent 5:1 default to a 7:1 you use in one example?

    Thanks a bunch for all this information, by the way!

    Replies
    1. Hey, Don. I'd start with the standard 5:1 directory-to-data-entry ratio. Using the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(gbp-name) GDETAIL, check to see if you are getting directory entry reclaims for a GBP. If you are, first see if you can make the GBP bigger. If you don't have the CF LPAR memory for that action, consider raising the directory-to-data-entry ratio for the pool somewhat. If you are using a buffer pool to "pin" objects in memory (i.e., to cache them in the buffer pool in their entirety), you should be able to get by with a number of directory entries in the associated GBP that is only a little larger than the number of buffers in one of those local "pinning" pools. You might be able to reduce the directory-to-data-entry ratio for the GBP to increase the number of data entries in the pool (good for boosting GBP hit ratios) while still retaining enough directory entries to account for each different page that could be cached in the "pinning" pools associated with the GBP.

      Hope this helps.

      Robert

  7. Thank you for sharing. Please share your mail to provide the RATIO details.

    Replies
    1. You can contact me at rfcatterall@gmail.com.

  8. Hi Robert,
    Thank you for your article. Interesting approach in calculating the GBP size.
    However, I don't see how your approach reflects write activity.

    The formula provided by IBM to calculate the number of data entries is as follows:
    Data entries = U * D * R
    where U is the degree of data sharing (from 0.5 to 1),
    D is the number of pages written per second for all members during peak activity, and
    R is the average page residency time (in seconds).
    Then Data portion (MB) = Data entries * Page size (KB) / 1024
    This data portion is the largest component of the total GBP size (GBP size = Data + Directory).

    In this formula, the total size of the local buffer pools is not even considered.
    So, for example, using a simplified formula (your approach is close to this) to calculate GBP size:
    Local BP = 50,000 pages; page size = 4K; number of sharing members = 4; medium sharing = 30%
    GBP size = (50,000 X 4 members) X 4 KB page size X 0.3 = 240 MB

    However, if you consider write activity, and if this activity is pretty heavy, let's say 10,000 pages per second (I took this number from my installation; sometimes we have even higher numbers), the formula for calculating the data portion will look like this:

    Data entries = 0.7 (medium sharing) X 10,000 X 60 sec (default page residency time) = 420,000, and 420,000 X 4 / 1024 = 1,640 MB, which is not even considering directory entries

    And this is a much bigger value than the one from the simplified formula.
    I think write activity has a much bigger impact on GBP sizing, and should be taken into consideration.

    Your comments would be appreciated.

    Regards Ilya

    Replies
    1. Sorry about the delay in responding, Ilya.

      First, you point out that the simplified formula you used to arrive at a recommended GBP size of 240 MB, given a local BP size of 50,000 4K pages in a 4-way group (with, presumably, the default 5:1 ratio of directory to data entries in effect) is "close" to mine. To clarify things, my current simplified formula for that scenario would look like this:

      ((50,000 X 4K) X 4) X 0.4 = 320 MB

      So, you get 320 MB, versus 240 MB. I don't use a light/medium/heavy data sharing factor. I just take the aggregate size (in MB) of the local pools (e.g., combined size of BPn on all members) and multiply that by 40% to get a recommended size for GBPn. Originally, I multiplied aggregate local pool size by 33%, but I later had to bump that up to 40% because GBP directory entries got larger over time. The blog entry then goes on to lay out an approach for sizing GBPs when the page size is other than 4K and/or when the directory-to-data-entry ratio is other than 5:1.

      You make a good point in that my approach is one of several that could be used to size DB2 for z/OS group buffer pools in a data sharing environment. Yes, my approach is based solely on the size of local buffer pools, and does not take GBP write activity into account. I have no problem with people using a GBP sizing formula that takes GBP write activity into account. People should use a GBP sizing approach that works for them and with which they feel comfortable.

      My aim in coming up with my GBP sizing approach was to help people to avoid a mistake that I saw often in the early years of DB2 data sharing (the second half of the 1990s): undersizing GBPs and getting, as a result, page invalidations due to directory entry reclaims (a drag on performance) and, sometimes, GBP write failures due to lack of storage (not good for availability -- could cause some pages to land on the logical page list and be, therefore, inaccessible until recovered). I worked out an approach that would result in a GBP having a number of directory entries that would virtually guarantee that no directory entry reclaims would occur. In my experience, I have seen that with a GBP so sized it is highly unlikely that there will be GBP write failures due to lack of storage.

      So, here's an interesting question: if you use my approach to GBP sizing and you get a recommended size of X for GBPn, what might prompt you to make GBPn larger than that size? One factor would be the occurrence of GBP write failures due to lack of storage. As I've mentioned, in my experience that's unlikely to happen, but if it does (and GBP write failures due to lack of storage are reported by DB2 monitors and also show up in the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBPn) GDETAIL), enlarging the GBP (assuming you have available memory in the coupling facility LPAR) would be a good response.

      Another reason to make a GBP larger than the size arrived at using my approach would be to increase page residency time in a GBP. If my approach yields a size of X for GBPn, and you've sized all your other CF structures and you still have a lot of spare memory in your CF LPARs (and I use LPARs in plural form because you DEFINITELY want to duplex your GBPs), and GBPn is a particularly active GBP in terms of page read requests, consider making GBPn larger so that a larger percentage of the GBPn read requests will return the desired page -- that beats having to get the page from disk, because a GBP read will likely complete a good two orders of magnitude faster than would a read from disk.

      Bottom line: my GBP sizing approach is geared towards giving you a recommended MINIMUM size for a GBP. A GBP larger than that recommended minimum size, like a larger local buffer pool, can boost performance in a DB2 data sharing system.

      Robert

  9. NOTE: I updated this entry on August 11, 2015, to reflect the fact that the current size of a group buffer pool directory entry is about 400 bytes. I had stated that the size of a GBP directory entry had grown to about 250 bytes from the original size of roughly 200 bytes. That was true for a while, but the GBP directory entry size continued to grow, so that it is now about 400 bytes. I adjusted calculations and formulas in the blog entry accordingly.

    Robert

  10. Hi Robert,
    I compared the result from your formula with the size calculated by CFSizer.
    First, CFSizer doesn't take the directory-to-data-entry ratio into account.
    For a total VPSIZE of 865K pages, CFSizer gives an INITSIZE of 839 MB. With your formula, I get a size of 206836 KB (with a ratio of 9:1), which is nearly 1/4 of the size provided by CFSizer.
    Can you comment on this difference? (I've tried the default ratio of 5:1 and get 327666 with your formula, which is still far from the CFSizer result.)
    Thank you very much
    Duc

    Replies
    1. I really can't comment on this disparity because I do not know the formula that CFSizer uses to calculate a group buffer pool size for a given set of user-provided input values.

      I will say this: the chief aim of the GBP sizing approach about which I wrote in this blog entry is to have enough directory entries so as to virtually eliminate the possibility of directory entry reclaims. If your coupling facility LPARs have enough memory to accommodate GBP sizes that are larger than those yielded by my approach, go ahead and define larger GBPs. A larger GBP will increase residency time for changed pages written to the GBP, and that will improve performance by increasing the "XI" GBP read hit ratio. I wrote of the XI GBP read hit ratio in the blog entry viewable at this URL:

      http://robertsdb2blog.blogspot.com/2015/07/db2-for-zos-group-buffer-pools.html
