Recently, a DB2 for z/OS professional asked me about something he'd occasionally seen in a DB2 monitor display: a non-zero value for CONDITIONAL GETPAGE FAILURES for a buffer pool. Now, his DB2 subsystem appeared to be running just fine, but we are rather conditioned (so to speak) to associate the word FAILURE with the meaning NOT GOOD, and he wanted to know if these FAILURES indicated something with which he should be concerned. The short answer to this question is, "No -- that's just an FYI number." In the remainder of this entry I'll provide some background on conditional GETPAGEs and explain why associated "failures" are nothing to lose sleep over.
I expect that you all know what a DB2 GETPAGE is: a request by DB2 for z/OS to access a particular page in a table or an index. For a "regular" GETPAGE (the vast majority are of this type), a "miss" (i.e., a situation in which the requested page is not found in the DB2 buffer pool) will result in DB2 issuing a synchronous read I/O to get the page into the buffer pool from disk storage. In essence, a DB2 synchronous read I/O is the result of a "regular" GETPAGE "failure." Are you concerned about those "failures?" Of course not -- that's just DB2 business as usual: "I (DB2) need to access page 123 of tablespace XYZ. If it's not in the buffer pool, suspend the application process on behalf of which I'm requesting this page, read the page into memory from disk right away, and resume the application process when the read I/O is completed."
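To make the "regular" GETPAGE flow concrete, here is a toy Python model. It is not how DB2 is implemented. `ToyBufferPool`, its fields, and the dict-based "disk" are hypothetical stand-ins, there is no page stealing, and the simulated read delay is arbitrary. It shows only the behavior described above: a miss turns into a synchronous read that the requester has to wait for.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyBufferPool:
    """Hypothetical toy model of a buffer pool (not DB2 code; no page stealing)."""

    disk: dict                                  # page_id -> contents, standing in for disk storage
    read_delay: float = 0.0                     # simulated I/O wait in seconds (arbitrary)
    pages: dict = field(default_factory=dict)   # pages currently in the pool
    getpages: int = 0
    sync_reads: int = 0

    def _read_from_disk(self, page_id):
        time.sleep(self.read_delay)  # the requesting process is suspended for this wait
        return self.disk[page_id]

    def getpage(self, page_id):
        """Regular GETPAGE: on a miss, read the page synchronously, then return it."""
        self.getpages += 1
        if page_id not in self.pages:
            # The miss (a regular GETPAGE "failure") becomes a synchronous read I/O.
            self.sync_reads += 1
            self.pages[page_id] = self._read_from_disk(page_id)
        return self.pages[page_id]


# "I need page 123 of tablespace XYZ."
disk = {("XYZ", n): f"XYZ page {n}" for n in range(200)}
bp = ToyBufferPool(disk)
bp.getpage(("XYZ", 123))  # miss: one synchronous read
bp.getpage(("XYZ", 123))  # hit: no I/O
print(bp.getpages, bp.sync_reads)  # 2 1
```

The second request is a buffer pool hit, so only one synchronous read is counted, which is exactly the "business as usual" case.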
So it is with conditional GETPAGE "failures" -- no worries (as the Aussies say -- I love that phrase). Here's what's different about a conditional GETPAGE versus a "regular" GETPAGE: for the former, a "miss" (reported by a DB2 monitor as a "failure") will result in DB2 driving an asynchronous read I/O to get the page into memory and subsequently issuing a "regular" GETPAGE for the page. "Huh?" you might be thinking. "An asynchronous I/O for one page? I thought that asynchronous I/Os (generally associated with prefetch reads) were issued for multiple pages (typically, up to 32 at a time)." This all has to do with WHY DB2 issues conditional GETPAGE requests: they enable parallel I/O operations under a single task. Before DB2 10, conditional GETPAGEs were issued when DB2 used query I/O parallelism to access table or index pages (since DB2 for z/OS V4, most folks equate "query parallelism" with query CPU parallelism -- I/O parallelism is what we had before that). A single task can have only one synchronous I/O in flight at a time. So if DB2 needs a page and it's not in the buffer pool, it can instead issue an asynchronous I/O (performed under a DB2 task, versus the application process's task) and move on to the next page that it wants (from another tablespace partition, for example). When it's ready to work on the page for which the asynchronous I/O was driven, DB2 will issue a regular GETPAGE for it, and there's a good chance that the page will be in the buffer pool by then.
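The toy Python model below (hypothetical names throughout, not DB2 internals) contrasts a conditional GETPAGE with a regular one. A `ThreadPoolExecutor` stands in for the DB2 task that performs the asynchronous read, so the requesting task can move on instead of waiting; the later regular GETPAGE waits only if that read hasn't finished yet. The `cond_failures` counter corresponds to what a monitor would show as conditional GETPAGE failures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyBufferPool:
    """Hypothetical toy model (not DB2 internals) of conditional vs. regular GETPAGEs."""

    def __init__(self, disk, read_delay=0.01):
        self.disk = disk                # page_id -> contents, standing in for disk storage
        self.read_delay = read_delay    # simulated I/O wait in seconds (arbitrary)
        self.pages = {}                 # pages currently in the pool
        self.pending = {}               # page_id -> Future for an in-flight asynchronous read
        # Worker threads stand in for the DB2 task that performs asynchronous reads.
        self.io_engine = ThreadPoolExecutor(max_workers=4)
        self.cond_getpages = 0
        self.cond_failures = 0
        self.sync_reads = 0

    def _read_from_disk(self, page_id):
        time.sleep(self.read_delay)
        return self.disk[page_id]

    def conditional_getpage(self, page_id):
        """Return the page if it is buffered; otherwise start an async read and return None."""
        self.cond_getpages += 1
        if page_id in self.pages:
            return self.pages[page_id]
        self.cond_failures += 1  # what a monitor reports as a conditional GETPAGE "failure"
        if page_id not in self.pending:
            self.pending[page_id] = self.io_engine.submit(self._read_from_disk, page_id)
        return None  # the caller moves on instead of waiting

    def getpage(self, page_id):
        """Regular GETPAGE: use an in-flight async read if one exists, else read synchronously."""
        if page_id in self.pages:
            return self.pages[page_id]
        future = self.pending.pop(page_id, None)
        if future is not None:
            self.pages[page_id] = future.result()  # often already complete by now
        else:
            self.sync_reads += 1
            self.pages[page_id] = self._read_from_disk(page_id)
        return self.pages[page_id]


# One task wants the first page of each of four partitions.
disk = {(part, n): f"partition {part} page {n}" for part in range(4) for n in range(10)}
bp = ToyBufferPool(disk)
wanted = [(part, 0) for part in range(4)]
for page_id in wanted:
    bp.conditional_getpage(page_id)  # all four misses start async reads; no waiting
rows = [bp.getpage(page_id) for page_id in wanted]  # pick up the pages those reads brought in
bp.io_engine.shutdown()
print(bp.cond_getpages, bp.cond_failures, bp.sync_reads)  # 4 4 0
```

All four conditional GETPAGEs "fail," yet no synchronous read is ever issued: in this sketch a regular GETPAGE never starts its own read when an asynchronous one is already in flight; at worst it waits for that read to complete.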
So, as I mentioned, I/O parallelism hasn't exactly been front and center in the minds of mainframe DB2 people since query CPU parallelism made the scene with DB2 V4. With DB2 10, there's something new: index I/O parallelism for inserts into a table in a non-segmented tablespace on which multiple indexes are defined. When there's an insert into such a table (generally, one having three or more indexes), DB2 will of course issue "regular" GETPAGEs (for root, non-root, and leaf pages) for the table's clustering index, because that's how it identifies the target page in the table for the insert. When that's done and the row is inserted (into the target page of the table, or into a page near that one, if the target page is full or locked and space is available in a nearby page), it's time to update the table's indexes with the row's physical location. The clustering index leaf page needing updating is already in memory. DB2 goes to the next index and issues a conditional GETPAGE for the page it needs to access. If there's a "miss" (aka a "failure"), it moves on and issues a conditional GETPAGE for the next index page it needs. Thus the inserting task can stay busy while asynchronous read I/O requests are processed in the background. DB2 will issue "regular" GETPAGEs for the "last" of the table's indexes that it's updating, as it did for the clustering index, but again the objective is for DB2 to work in the background to bring pages from the other indexes on the table into memory asynchronously. When the inserting task is ready to update those index pages, the hope is that those pages will already be in memory, thanks to the asynchronous I/Os. Index I/O parallelism for inserts can result in significant reductions in elapsed time for insert operations into tables in non-segmented tablespaces in a DB2 10 environment.
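The insert-time pattern just described can be sketched as a function that lists the GETPAGE requests issued for one insert's index updates. This is an illustration under stated assumptions, not DB2's actual algorithm: the function name, the request labels, the simplification to one leaf page per index, and the cutoff at three indexes (taken from the "generally" above) are all hypothetical, and the real ordering of work is internal to DB2.

```python
from typing import List, Set, Tuple

Request = Tuple[str, str]  # (kind of GETPAGE request, index name)


def insert_getpage_sequence(indexes: List[Tuple[str, str]], buffered: Set[str]) -> List[Request]:
    """Hypothetical sketch (not DB2 code) of the GETPAGE pattern for one insert's index updates.

    indexes  -- (index name, leaf page id) pairs, clustering index first
    buffered -- leaf page ids currently in the buffer pool
    """
    # Clustering index: regular GETPAGEs (root, non-root, leaf) locate the target table page,
    # so its leaf page is already in memory when index maintenance starts.
    requests: List[Request] = [("regular", indexes[0][0])]
    others = indexes[1:]
    if len(indexes) < 3:
        # Assumption: with fewer than three indexes, this sketch models no index I/O parallelism.
        return requests + [("regular", name) for name, _ in others]
    deferred = []
    for name, leaf in others[:-1]:
        if leaf in buffered:
            requests.append(("conditional hit", name))
        else:
            # Miss: an asynchronous read is started and the task moves on to the next index.
            requests.append(("conditional failure, async read started", name))
            deferred.append(name)
    requests.append(("regular", others[-1][0]))  # the "last" index gets a regular GETPAGE
    # Return to the missed pages; by now their asynchronous reads have likely completed.
    requests.extend(("regular", name) for name in deferred)
    return requests


for req in insert_getpage_sequence(
    [("IX_CLUST", "p10"), ("IX_A", "p20"), ("IX_B", "p30"), ("IX_C", "p40")],
    buffered={"p30"},
):
    print(req)
# ('regular', 'IX_CLUST')
# ('conditional failure, async read started', 'IX_A')
# ('conditional hit', 'IX_B')
# ('regular', 'IX_C')
# ('regular', 'IX_A')
```

In this example the one conditional GETPAGE "failure" (for `IX_A`) is what lets the task update `IX_B` and `IX_C` while the read for `IX_A` proceeds in the background.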
Besides boosting performance, DB2 10 index I/O parallelism for inserts can result in some conditional GETPAGE "failures," as these are what trigger the asynchronous read I/Os for pages not found in the buffer pool. That, of course, is no sweat whatsoever, whether we're talking about DB2 10 or a prior release (in which case, as noted, conditional GETPAGEs are associated with query I/O parallelism). These "failures" are just buffer pool misses, and those happen all the time in a DB2 environment (unless you have a really big buffer pool configuration and a really small database). So when you're looking at a DB2 monitor display of activity for a buffer pool, and you see a non-zero value for conditional GETPAGE failures, think, "OK," and move on to other fields in the display. Hang loose, dude.
Good explanation, thanks, Wojtek
Glad you found it to be useful, Wojtek.
Robert
Thanks Robert. I appreciate it.
Please let us know if you will be doing any presentations in Columbus or the Midwest region. Thanks.
Hi Robert,
There is a high number of "Conditional Sequential Getpage failures" on one of our buffer pools, and these are shown as warnings in the CA SYSVIEW monitor. Is this something to be worried about? The buffer pool is sized at 2000 pages, and the application is reporting slowness.
As noted in the blog entry, conditional GETPAGE failures are generally not a big deal (to my knowledge, they're not even recorded in an IBM OMEGAMON for Db2 for z/OS statistics report). That said, a buffer pool with only 2000 buffers is very small. Check the pool's total read I/O rate, which is the sum of synchronous read I/Os, sequential prefetch read I/Os, list prefetch read I/Os, and dynamic prefetch read I/Os, per second. If that rate is greater than 1000 per second, substantially increase the size of the buffer pool, as long as the z/OS LPAR's real storage resource is not under too much pressure (if the LPAR's demand paging rate is less than 1 per second, the real storage resource is not under too much pressure).
Robert
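The rule of thumb in the reply above is simple arithmetic, sketched below in Python. The class, field, and function names are hypothetical (they are not OMEGAMON or SYSVIEW field names), and the counts in the example are made-up illustrative numbers; in practice they would come from your monitor's statistics for an interval.

```python
from dataclasses import dataclass


@dataclass
class PoolIntervalStats:
    """Read I/O counts for one buffer pool over a statistics interval (hypothetical field names)."""

    sync_reads: int
    seq_prefetch_reads: int
    list_prefetch_reads: int
    dyn_prefetch_reads: int
    interval_seconds: float


def total_read_io_rate(s: PoolIntervalStats) -> float:
    """Total read I/Os per second: synchronous plus sequential, list, and dynamic prefetch reads."""
    total = s.sync_reads + s.seq_prefetch_reads + s.list_prefetch_reads + s.dyn_prefetch_reads
    return total / s.interval_seconds


def should_enlarge_pool(s: PoolIntervalStats, lpar_demand_paging_per_sec: float) -> bool:
    """Rule of thumb: more than 1000 read I/Os per second, and LPAR demand paging below 1 per second."""
    return total_read_io_rate(s) > 1000 and lpar_demand_paging_per_sec < 1


# Made-up numbers for a 30-minute interval.
stats = PoolIntervalStats(sync_reads=1_500_000, seq_prefetch_reads=300_000,
                          list_prefetch_reads=60_000, dyn_prefetch_reads=300_000,
                          interval_seconds=1800)
print(total_read_io_rate(stats))  # 1200.0
print(should_enlarge_pool(stats, lpar_demand_paging_per_sec=0.2))  # True
```

Note that conditional GETPAGE failures play no part in this check; only the pool's actual read I/O rate and the LPAR's demand paging rate do.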
Thank you