Wednesday, October 31, 2018

Db2 for z/OS Buffer Pools: Clearing the Air Regarding PREFETCH DISABLED - NO READ ENGINE

Has this ever happened to you? You're looking at the output of a Db2 for z/OS -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command, or maybe at a statistics report produced by your Db2 monitor, or maybe your Db2 monitor's online display of a subsystem's buffer pool activity, and you see for one of your buffer pools something like this:

PREF.DISABLED-NO READ ENG   350.00

And you think, "Uh-oh. Prefetch is a good thing, so disabled prefetch must be a bad thing. And it happened 350 times for BP2 in one hour! AHHHHH! CODE RED! WE'RE OUT OF READ ENGINES!"

Hey - take a few deep breaths, and calm down. Chances are, this is not a big deal. In this blog entry, I'll explain why.


First: what does it mean?

OK, a lot of people know that prefetch operations - sequential, list, and dynamic - are handled by Db2 tasks. That is to say, the CPU  cost of processing prefetch requests is charged to Db2 (specifically, to the Db2 database services address space, also known as DBM1), as opposed to being charged to the application process on behalf of which Db2 is executing the prefetch operations. We sometimes refer to Db2's prefetch-handling tasks as "prefetch read engines." They are represented in a z/OS system by what are known as preempt-able SRBs (service request blocks - a type of z/OS control block), and that's what makes the work that they do zIIP-eligible (100% zIIP-eligible, in fact - a major reason why, in most systems, the bulk of CPU time charged to DBM1 is zIIP engine time).

There are - surprise! - a finite number of prefetch read engines associated with a given Db2 subsystem. That number is 600 in a Db2 11 environment, and 900 in a Db2 12 system. If all 600 Db2 11 prefetch read engines (or all 900 Db2 12 engines) are busy handling prefetch requests, and another prefetch request comes along, that prefetch request will be abandoned and the PREFETCH DISABLED - NO READ ENGINE counter will be incremented (externalized via the QBSTREE field of trace record IFCID 0002, written when Db2 statistics trace class 1 is active, and also part of the output of the Db2 command -DISPLAY BUFFERPOOL with DETAIL). And what does it mean when a prefetch request is abandoned because there was not a prefetch read engine available to process the request? It could mean that pages that would have been brought into memory via the abandoned prefetch request will subsequently be read into a buffer pool by way of a synchronous (i.e., single-page, on-demand) read operation, but that is not necessarily the case. Read on.


Second: do you care?

One thing to know up front: if you see a non-zero value for PREFETCH DISABLED - NO READ ENGINE for one or more of your Db2 buffer pools, you're not alone. This situation is not super-common, but neither is it highly unusual. In fact, we've been seeing more incidences lately of non-zero values for PREFETCH DISABLED - NO READ ENGINE. Why? Several reasons. One is the trend of Db2 data sharing group member consolidation. As IBM Z servers have become more powerful and as Db2's vertical scalability has been enhanced, organizations running Db2 in data sharing mode on Parallel Sysplex clusters have found that they can support an application workload with (for example) 4 members in a data sharing group versus 6. When work that had been spread across n data sharing members now runs on a smaller number of members, that can mean more concurrently active prefetch read requests for a given member, and that can lead in some cases to prefetch read requests exceeding the number of prefetch read engines available on a subsystem. Other factors driving increased levels of prefetch read activity include Db2 query optimizer changes that drive, in particular, more list prefetch-related access path selection; the I/O parallelism for index updates that Db2 10 introduced; and row-level sequential detection, also introduced with Db2 10, which enables Db2 to continue to use dynamic prefetch even as a table space's data rows become somewhat disorganized as a result of data-change activity.

So, if you see non-zero numbers for PREFETCH DISABLED - NO READ ENGINE for one or more of your buffer pools, should you try to do something about it? Maybe, maybe not. First, check a related counter: PREFETCH DISABLED - NO BUFFER. If that value is also non-zero, it's likely that the buffer pool in question is too small, and/or that VPSEQT (the virtual pool sequential threshold) has been changed from its default value of 80 to a too-small percentage of the pool's buffers. Respond to that situation by enlarging the pool, if the z/OS system's real storage resource is not constrained (system memory is not constrained if the z/OS LPAR's demand paging rate - available, among other places, in an IBM RMF CPU summary report - is zero or a very small non-zero value, like less than 1 per second), and/or by taking the pool's VPSEQT value from a too-small number (if applicable) to something closer to 80. Also consider enlarging the buffer pool if the pool's total read I/O rate is in the hundreds or more per second. How can a larger buffer pool reduce incidences of PREFETCH DISABLED - NO READ ENGINE? Here's how: a larger buffer pool that caches more pages makes it more likely that a prefetch request will NOT drive an associated prefetch read I/O operation. Examine information for your buffer pools, using a Db2 monitor or the -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command. Look at (for example) the dynamic prefetch numbers for the various buffer pools. You will likely see, for at least one of your pools, a situation in which the number of dynamic prefetch requests is larger - perhaps considerably larger - than the associated number of dynamic prefetch reads. How can that be? Simple: let's say that there's a dynamic prefetch request to read 32 pages of a table space or index into memory. What if all 32 of those pages are ALREADY IN MEMORY? In that case, the dynamic prefetch request will not result in a dynamic prefetch read operation. Now, the dynamic prefetch request will still occupy a prefetch read engine, but the request will tie up the prefetch read engine for a much shorter time if it does not involve a prefetch read I/O operation. When prefetch read engines are freed up more quickly thanks to prefetch requests that do not drive prefetch reads, it becomes less likely that all of a Db2 subsystem's prefetch read engines will be busy at the same time.

Now, let's say that you have a buffer pool that is large enough to have a relatively low total read I/O rate, and you see for that pool no occurrences of PREFETCH DISABLED - NO BUFFER, but you still see a non-zero value for PREFETCH DISABLED - NO READ ENGINE. Is that a cause for concern? Quite possibly not. To understand this, consider a scenario: some application process is accessing a table space or index in a sequential fashion. A dynamic prefetch request (for example) is initiated by Db2 on behalf of the application process, but all of the subsystem's prefetch read engines are busy handling other requests. The dynamic prefetch request is abandoned. That will lead to synchronous read requests, right? Not necessarily. Suppose the aforementioned application process (or another process accessing the same database object) needs a page that would have been read into memory via the dynamic prefetch request that was abandoned due to a "no read engine" situation. Will a synchronous read operation be required for that page access? NO, if that page is already in the buffer pool. If the requested page is indeed already in memory, there was essentially NO penalty associated with the abandoned dynamic prefetch request - only the very small amount of CPU consumption associated with initiating and subsequently abandoning the prefetch request (so small that I think it would be hard to measure). Here's something you can check, using your Db2 monitor or the output of -DISPLAY BUFFERPOOL(ACTIVE) DETAIL: for a buffer pool that shows, at least sometimes, some occurrences of PREFETCH DISABLED - NO READ ENGINE, look at the total number of synchronous read I/Os, and look also at the total number of synchronous read I/Os that are labeled SYNCHRONOUS READS - SEQUENTIAL. What is a "sequential synchronous read?" It's a synchronous read I/O driven by a process that is accessing data in a sequential fashion. Why might there be synchronous reads for a process accessing data in a sequential manner? One reason: a data row (for example) that is to be FETCHed by a process is way out of place, in a clustering sense. That being the case, the row is not in the quantity of table space pages recently prefetched into memory on behalf of the application process. When the process needs the out-of-place row, the associated page will be synchronously read into the buffer pool, and that action will be recorded as a SYNCHRONOUS READ - SEQUENTIAL. Another reason: an application process accessing data in a sequential fashion has to drive a synchronous read I/O because a page that would have been prefetched into memory was not because the prefetch request was abandoned due to NO READ ENGINE.

If you see PREFETCH DISABLED - NO READ ENGINE go from zero to non-zero for a pool, or if that number goes from n to something considerably larger than n, and sequential synchronous reads do not increase as a percentage of total synchronous reads for the buffer pool, it is likely that the incidences of PREFETCH DISABLED - NO READ ENGINE did not materially impact application performance. In other words, it is likely that pages related to prefetch requests that were abandoned due to NO READ ENGINE were found to be already in memory when subsequently needed by application processes. In that case, occurrences of PREFETCH DISABLED - NO READ ENGINE are really nothing about which you should be concerned.

One more thing: if you have a buffer pool that is already big enough to have a relatively low total read I/O rate, and the pool has nothing in the way of PREFETCH DISABLED - NO BUFFER, and you see for this pool some occurrences of PREFETCH DISABLED - NO READ ENGINE, and you'd really like to try to take that to zero, you could look at the possibility of taking some objects in the pool that are a) not humongous and b) are frequently accessed via prefetch, and reassigning them to a pool defined with PGSTEAL(NONE). If the PGSTEAL(NONE) buffer pool is large enough to hold all pages of all objects assigned to the pool (and that's the objective for a PGSTEAL(NONE) pool), there should be few, if any, prefetch requests associated with the pool, and that decrease in prefetch request activity might reduce the incidence of PREFETCH DISABLED - NO READ ENGINE you've seen for other pools. Something to consider.

So, don't freak out if you see some occurrences of PREFETCH DISABLED - NO READ ENGINE for a buffer pool. It may be having little - if any - impact on application performance. If you do think that non-zero values for PREFETCH DISABLED - NO READ ENGINE are leading to increased synchronous read activity (check for increases in sequential synchronous reads), consider enlarging the buffer pool in question, and maybe consider moving some frequently-accessed, small- to medium-sized objects to one or more PGSTEAL(NONE) pools. No need to panic, in any case.