Thursday, April 12, 2012

DB2 for z/OS: Trading Memory for MIPS (Part 2)

In part one of this three-part entry on leveraging System z memory to reduce CPU consumption for a DB2-accessing application workload, I wrote about getting your DB2 buffer pool configuration house in order by doing things to improve CPU efficiency without increasing the total amount of virtual storage allocated for page caching (e.g., adjusting certain thresholds for work file-dedicated pools, making appropriate use of buffer page-fixing, choosing the right buffer-steal algorithm for a pool, and judiciously shifting buffers from low-I/O to high-I/O pools). After taking these actions you might find that you still have high read I/O rates for some of your buffer pools. That's when you'd take the next step (if feasible, and I'll get to that momentarily) and increase the size of your buffer pool configuration to drive down disk read activity and associated CPU consumption. In a nutshell, that is the subject of this part two post. Within the next couple of weeks I'll post my part three entry, in which I'll look at memory-for-MIPS actions that involve DB2 virtual storage pools other than the buffer pools (e.g., the EDM pool, the RID pool, and the sort pool).

Before going further, I'd like to address a topic that's come up a number of times in conversations I've had with DB2 for z/OS DBAs and systems programmers -- that being the difference between GETPAGE requests and DB2 read I/O requests. Plenty of people have the idea that a bigger buffer pool means fewer GETPAGEs. In fact, that is generally NOT the case. Driving down GETPAGEs is a worthwhile objective, since GETPAGE requests -- more than anything else, in my experience -- are the key determinant of the CPU cost of SQL statement execution; however, growing your buffer pool configuration is not the way to reduce GETPAGE activity. Instead, you'd attack that objective through means such as physical database design changes (adjusting table clustering, adding or modifying indexes, etc.) and SQL statement coding (e.g., changing a non-indexable predicate to an indexable predicate). Enlarging your buffer pool configuration is about reducing I/O requests (primarily read requests, which tend to greatly outnumber write I/O requests) and the CPU cycles consumed in driving those requests. Is there a connection between GETPAGE requests and read I/O activity for a buffer pool? Of course there is: when DB2 needs to examine a page from a table space or an index -- usually on behalf of an application program -- it will issue a GETPAGE request for that page. If the page in question is NOT already in the buffer pool to which the table space or index is assigned, the GETPAGE request will result in DB2 initiating a read I/O operation to bring the page into memory (unless the page is to be read into memory via an already-underway prefetch I/O, in which case the requesting program will wait for the completion of said prefetch I/O). A bigger buffer pool, then, doesn't mean fewer GETPAGEs, as DB2 will still need to look at the same number of table and/or index pages in executing the same SQL statement (all other things being equal). What a bigger buffer pool DOES mean is that fewer of the GETPAGE requests will result in read I/O operations, since the likelihood of a requested page already being in memory will tend to increase as a buffer pool's size increases.
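
To put some numbers on that relationship, here's a quick back-of-envelope sketch in Python (the figures are invented for illustration, not taken from any real system): the GETPAGE count is set by the SQL workload, and read I/Os are, roughly speaking, the GETPAGEs that don't find their page already in the pool, so enlarging a pool shrinks the miss rate -- and the read I/O count -- while leaving GETPAGEs alone.

    # Back-of-envelope illustration (invented numbers): a bigger buffer pool
    # reduces read I/Os, not GETPAGEs.

    getpages_per_second = 50_000   # fixed by the SQL workload, not by pool size

    def read_ios_per_second(getpages, miss_rate):
        """Read I/Os are (roughly) the GETPAGEs that don't find their page in the pool."""
        return getpages * miss_rate

    # Hypothetical miss rates before and after enlarging the pool
    before = read_ios_per_second(getpages_per_second, miss_rate=0.02)   # 1,000 read I/Os per second
    after  = read_ios_per_second(getpages_per_second, miss_rate=0.005)  #   250 read I/Os per second

    print(f"GETPAGEs per second unchanged at {getpages_per_second}")
    print(f"Read I/Os per second: {before:.0f} before enlargement, {after:.0f} after")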

There is an exception to this rule: if a buffer pool is way too small relative to the number of pages belonging to objects assigned to the pool and the volume of requests for those pages, DB2 could end up hitting the data manager threshold for the pool. That threshold, also known by the acronym DMTH, is reached when 95% of a pool's buffers are non-stealable (either currently in use or updated and not yet externalized to disk or to a coupling facility in a DB2 data sharing system). When that happens, DB2 will issue a GETPAGE for every row retrieved from a page, so that, for a page from which 20 rows are retrieved for a program, 20 GETPAGE requests will be issued, versus one GETPAGE had DMTH not been hit for the pool. Obviously, a big spike in GETPAGE activity will cause a spike in CPU consumption, so keep an eye on DMTH and make sure that you're not hitting that threshold for any of your buffer pools (your DB2 monitor will provide this information, as will the output of the DB2 command -DISPLAY BUFFERPOOL(xxx) DETAIL, where "xxx" could be the name of an individual pool, such as BP4, or the word ACTIVE if you want to see information for all active buffer pools). If DMTH is being hit for a pool, it's likely that the pool is significantly undersized.

On now to making a buffer pool configuration larger. While there's no need to apologize for that (I'll expand on this point in a moment), you want to do it for a reason, and the reason, as just pointed out, is to reduce disk read I/O activity (which in turn reduces CPU consumption). So, your first order of business is to check on the rate of read I/O activity for each of your buffer pools. As pointed out near the end of my part one memory-for-MIPS post, what you're looking for is the TOTAL rate of read I/O requests (synchronous and asynchronous) per second for your buffer pools. Also as pointed out in the part one post, you can get that figure from a DB2 monitor statistics detail report or online display of buffer pool activity, or from the output of the DB2 command -DISPLAY BUFFERPOOL DETAIL (and I describe in the part one post how to use -DISPLAY BUFFERPOOL DETAIL output to get the desired information). If the total read I/O rate for a pool is less than 100 per second, making that pool larger won't hurt anything (assuming the presence of sufficient real storage to back that enlargement, about which I'll have something to say in a bit), but it likely won't move the CPU consumption needle much (though you might want a zero-read-I/O situation associated with "pinning" a DB2 object in memory -- something made more straightforward and effective in a DB2 10 environment via the new PGSTEAL(NONE) buffer pool option, described in the part one post). What should draw your attention are any pools for which the read I/O rate is in the hundreds -- or even thousands -- per second (the highest I've seen with my own eyes at a DB2 for z/OS site is 9000 read I/Os per second for a pool). Taking a big chunk out of these numbers through buffer pool enlargement can result in a significant improvement in CPU efficiency for your DB2 workload.
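
By the way, if you'd rather do the read I/O rate arithmetic yourself than read it off a monitor report, it's just synchronous reads plus asynchronous (prefetch-driven) reads for the interval, divided by the interval length in seconds. Here's a small Python sketch with made-up counter values -- map the inputs to your own monitor's fields, or to the deltas between two -DISPLAY BUFFERPOOL DETAIL outputs (the names below are placeholders, not actual DB2 output labels):

    # Sketch: total read I/O rate per pool = (sync reads + async reads) / interval seconds.
    # Counter values are made up; plug in deltas from your own statistics interval.

    pools = {
        # pool name: (sync_read_ios, async_read_ios) accumulated over the interval
        "BP1": (540_000, 1_260_000),
        "BP4": (90_000, 30_000),
    }
    interval_seconds = 3600  # e.g., a one-hour statistics interval

    for name, (sync_ios, async_ios) in pools.items():
        rate = (sync_ios + async_ios) / interval_seconds
        flag = "a candidate for enlargement" if rate >= 100 else "a lower priority"
        print(f"{name}: {rate:.0f} total read I/Os per second ({flag})")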

Now, taking a "big chunk" out of a high rate of disk read I/Os is not likely to happen via itty-bitty steps, by which I mean that you probably won't see much of an impact from adding 1000 buffers to an 80,000-buffer pool. Big-step buffer pool enlargements will often involve increasing a pool's size by 25% or even 50% (a pool that's currently quite small -- say, 10,000 4K buffers in a system with a lot of real storage -- might be doubled in size -- or more -- in a single step, depending on the particular circumstances of the situation). I tend to think big, in terms of buffer pool enlargement, because I so often see buffer pools with high read I/O rates that are way smaller than they could be, given the abundance of memory available on the associated z/OS LPAR. Here's the deal: since 64-bit addressing was introduced in the System z environment some years ago, and the cost of memory was reduced for new mainframe models, it's become more and more common for production z/OS LPARs to be configured with 20 or 40 GB or more of real storage (and sometimes you see a z/OS LPAR with north of 100 GB of real storage -- a z196 server can be configured with up to 3 TB of memory); yet, in these big lakes of memory you find production DB2 subsystems with a total aggregate buffer pool size (meaning, the sum of the sizes of each allocated pool) of a measly 1 GB -- or even less. HELLO! IS ANYONE UNDER THE IMPRESSION THAT YOU GET BONUS POINTS FOR NOT USING A HARDWARE RESOURCE -- MAINFRAME MEMORY -- FOR WHICH YOUR ORGANIZATION PAID MONEY? Come on, folks. That stuff is there to enable your system to perform better, and no subsystem will use more memory to better effect than DB2.
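
If you're curious what your buffer pool configuration actually costs in storage terms, the arithmetic is simple: buffers times page size, summed over the allocated pools. Here's a little Python sketch with made-up pool sizes, showing the aggregate configuration size and what a 50% "big step" enlargement of one pool would add:

    # Sketch: a pool's size = buffer count x page size; the aggregate configuration
    # size is the sum across allocated pools. Pool names and counts are made up.

    GB = 1024 ** 3

    pools = {
        # pool name: (buffer_count, page_size_in_bytes)
        "BP0":   (20_000, 4096),
        "BP1":   (80_000, 4096),
        "BP8K0": (10_000, 8192),
    }

    aggregate_bytes = sum(count * page for count, page in pools.values())
    print(f"Aggregate buffer pool configuration size: {aggregate_bytes / GB:.2f} GB")

    # A "big step" enlargement of BP1: 50% more buffers
    bp1_count, bp1_page = pools["BP1"]
    added_bytes = int(bp1_count * 0.5) * bp1_page
    print(f"Adding 50% to BP1 takes about {added_bytes / GB:.2f} GB more real storage to back it")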

That brings us to the "don't apologize" part. Some DB2 for z/OS people -- especially those who've worked with DB2 for 15 or 20 years or more -- seem to have some kind of a guilt complex over their subsystem's use of mainframe memory resources. "Golly, I'm using a GIGABYTE just for my buffer pools. The other subsystems on the production LPAR don't use that much memory. I should be satisfied with what I have." WRONG. Other z/OS subsystems that handle things like transaction management and security and networking don't NEED the memory that a relational database management system needs, and -- more importantly -- they don't benefit from getting lots more memory the way DB2 does. DB2 for z/OS was architected, from the get-go, to perform better with a large real storage resource. Give DB2 more memory, and -- unless your system is storage-constrained, and I'll get to that -- throughput goes up and CPU consumption (most likely) goes down (though if more memory for DB2 lets you get a lot more work done in a unit of time, CPU utilization could increase, but that's usually OK if it means that, for example, a critical batch run completes in much less time than before). Some DB2 sites -- and more as time goes on -- have figured this out. They've leveraged big mainframe memory resources to greatly increase DB2 buffer pool sizes, with very positive effects on application performance. The biggest buffer pool configuration I've seen for a single DB2 subsystem is approximately 42 GB. Are the people at that shop crazy? Of course not. They have a z/OS system with a large real storage resource (way bigger than 42 GB), and they are taking advantage of that to deliver excellent performance for a high-volume, mission-critical application that accesses a very large database. The key is to go big when there's big memory available, without going too big.

And that's the rub, you might think. How big is too big, in terms of mainframe memory allocated to a DB2 subsystem? Actually, that's not too hard to monitor. The main thing here is to avoid a situation in which mainframe memory is over-committed, and the metric to keep an eye on in that regard is the z/OS LPAR's demand paging rate. This figure, available via a z/OS monitor, is the rate (usually reported as a per-second number) at which z/OS has to bring into real storage, on demand, pages that were paged out to auxiliary storage (i.e., page data sets on disk). What should that rate be? I'll tell you one thing: it shouldn't be zero -- not in my book. If the z/OS system's demand paging rate is zero, that tells me you have an underutilized resource that could be used to improve the performance of your DB2 workload, and you can change that "underutilized" situation by enlarging buffer pools that have high read I/O rates (or by pinning some really performance-critical DB2 objects in memory). Now, while I think that a demand paging rate that's greater than zero is generally a good thing, you don't want that number to go too high. I'm personally very comfortable with a demand paging rate that's between 1 and 10 per second during busy times (i.e., times of high-volume workload processing). Low single digits I like more than high single digits, but I wouldn't be concerned about the memory resource being under more-than-desired pressure unless the demand paging rate gets into 10+ per second territory (and here I'm not counting the occasional memory-intensive event, such as a dump, that could cause the demand paging rate to temporarily spike to a higher level). Should your aim be a demand paging rate of 10 per second? No. I'm just saying that a rate of less than 1 per second (quite common these days) means that the system's real storage resource is not under any kind of significant pressure, and there is "head room" available for increasing virtual storage usage -- through enlarging buffer pools, for example -- without negatively impacting non-DB2 work on the system.
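
If it helps to have that guidance written down somewhere, here it is as a trivial bit of Python. To be clear, these bands are my own comfort levels, not an IBM-documented threshold:

    # Sketch: classifying an LPAR's demand paging rate per the rule of thumb above
    # (the bands are the author's comfort levels, not an IBM-documented threshold).

    def assess_demand_paging(rate_per_second: float) -> str:
        if rate_per_second < 1:
            return "real storage essentially unpressured -- headroom to grow DB2 buffer pools"
        if rate_per_second <= 10:
            return "acceptable -- memory is being put to work; keep monitoring"
        return "more pressure than desired -- hold off on further buffer pool growth"

    for rate in (0.0, 0.4, 3.0, 15.0):
        print(f"{rate:>5.1f}/sec: {assess_demand_paging(rate)}")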

So, if the demand paging rate on your system is less than 1 per second and you have high read I/O buffer pools and/or you want to pin some DB2 objects in memory, what should you do? How big should you go, buffer pool-wise? Here's a rule of thumb that's worked for me: in my experience, when a single production DB2 subsystem is running on a z/OS LPAR, that LPAR's memory resource is very unlikely to be under an undesirable level of pressure if the DB2 subsystem's buffer pool configuration size is up to 25% of the z/OS LPAR's real storage size. So, if you have a z/OS LPAR with 20 GB of real storage, a production DB2 subsystem buffer pool configuration size of 5 GB (again, meaning the aggregate size of all of the subsystem's active buffer pools) should pose no problem in terms of leaving plenty of memory for address spaces other than DB2's, such as those associated with TSO, CICS, batch, RACF, etc. (keep in mind that when the maximum real storage size was 2 GB, not too many years ago, it was not uncommon for sites to use 50% or even 75% of that for DB2 buffer pools). If you're well south of that 25% level, you should be able to grow your DB2 buffer pool configuration considerably without hurting overall system performance -- just keep an eye, as I've said, on that demand paging rate, and see that it stays in the aforementioned single digits per second range. Already have a DB2 buffer pool configuration size that's 25% (or even more) of the z/OS LPAR's real storage size? That doesn't mean you have to stop there. Just be deliberate about further buffer pool size increases, and don't overextend the memory resource of the system. Also, don't overdo it with respect to the use of PGFIX(YES) for your buffer pools, as I noted in my part one memory-for-MIPS post. And one more thing: if you have more than one production DB2 subsystem on an LPAR (test and development subsystems usually have buffer pool configurations that are smaller than that of a production subsystem), be aware of the TOTAL buffer pool configuration size of ALL the subsystems. Taking this to an extreme, if you have 4 production DB2 subsystems on one z/OS LPAR, don't size the buffer pool configuration of each to be 25% of the LPAR's memory -- that would likely result in a demand paging problem. You might want the TOTAL size of ALL of the subsystems' buffer pool configurations to be no more than half of the amount of real storage assigned to the LPAR.
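
Here's that rule of thumb as a quick calculation (the LPAR size is illustrative, and the percentages are my guidance, not hard limits):

    # Sketch: the 25%-of-real-storage rule of thumb for one production DB2 subsystem,
    # and a roughly-half-of-real-storage combined cap when several production
    # subsystems share an LPAR. Numbers are illustrative.

    GB = 1024 ** 3

    def single_subsystem_ceiling(lpar_real_storage_bytes: int) -> int:
        """Rule-of-thumb ceiling for one production subsystem's aggregate buffer pool size."""
        return int(lpar_real_storage_bytes * 0.25)

    def all_subsystems_ceiling(lpar_real_storage_bytes: int) -> int:
        """Rule-of-thumb combined ceiling for ALL production subsystems on the LPAR."""
        return int(lpar_real_storage_bytes * 0.50)

    lpar = 20 * GB
    print(f"One production subsystem: up to about {single_subsystem_ceiling(lpar) / GB:.0f} GB of buffer pools")
    print(f"All production subsystems combined: up to about {all_subsystems_ceiling(lpar) / GB:.0f} GB")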

If you make a buffer pool larger, check that read I/O rate again. If it's gone down, good. If it hasn't, you might want to make the pool larger still, but if you do that and the read I/O rate still doesn't budge, you may want to think in terms of taking some of the more performance-critical objects in that pool and assigning them to a new pool.

Finally, if you use DB2 in data sharing mode, keep in mind that enlarging a local pool on the member subsystems may necessitate an enlargement of the corresponding group buffer pool (GBP) in the coupling facility. What you want to do here is keep the GBP large enough, relative to the aggregate size of the corresponding local pools, to prevent page invalidations due to directory entry reclaims, as such invalidations require member DB2 subsystems to re-read into memory pages that were likely "clean" when invalidated, and that's a drag on system performance (output from the DB2 command -DISPLAY GROUPBUFFERPOOL GDETAIL will show you whether such invalidations are occurring or not -- look for the field labeled "CROSS INVALIDATIONS DUE TO DIRECTORY RECLAIMS"). To properly size GBPs in this regard, you can make use of the IBM CFSizer tool, available on the Web at http://www-947.ibm.com/systems/support/z/cfsizer/index.html. There's also a quick sizing formula for 4K group buffer pools with the default 5:1 ratio of directory entries to data entries that has long worked well for me (I used it effectively at a DB2 data sharing site just last month): add up the sizes of the local pools and divide that by three, and you have a good GBP size. For example: if you have two member subsystems and BP4 has 15,000 buffers, or 60 MB, on each member, a GBP4 sized at (60 MB + 60 MB) / 3 = 40 MB should not cause page invalidations due to directory entry reclaims (I worked out that formula years ago for 4K GBPs, and haven't worked on a similar one for 8K or 16K or 32K GBPs, but I could do that for those GBP types without too much difficulty).
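
And here's that divide-by-three rule as a two-line calculation, using the example numbers above (my rule of thumb for 4K GBPs with the default 5:1 directory-to-data entry ratio -- not an IBM-published formula, so use the CFSizer when in doubt):

    # Sketch: the divide-by-three rule of thumb for sizing a 4K group buffer pool
    # (assumes the default 5:1 directory-entry-to-data-entry ratio).

    # BP4 on each of two data sharing members: 15,000 4K buffers, roughly 60 MB apiece
    local_bp4_sizes_mb = [60, 60]

    gbp4_size_mb = sum(local_bp4_sizes_mb) / 3
    print(f"Suggested GBP4 size: {gbp4_size_mb:.0f} MB")  # 40 MB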

Whew. More words than usual for one of my blog entries, but I was on a roll and went with it. Hope the information is useful to you. As previously mentioned, I'll post part three of this three-part memory-for-MIPS entry within the next couple of weeks.
