Thursday, May 25, 2023

OUTBUFF: A Db2 for z/OS ZPARM You Really Ought to Check

Over the past year or so, I've seen more and more situations in which a too-small Db2 for z/OS log output buffer is negatively impacting system and application performance. The Db2 development team took aggressive action to remedy that situation via a change to the default value of OUTBUFF (the relevant ZPARM parameter) with Db2 13, but if you are not yet on Db2 13 you should make this change yourself in your Db2 12 environment (and, if you are on Db2 13, you should definitely be using the new OUTBUFF default, or an even higher value). In this blog entry I'll provide information that I hope will make all of this clear and meaningful for you.

The Db2 for z/OS log output buffer

Db2, of course, logs changes made to database objects (aside from the situation in which a table is defined with the NOT LOGGED attribute - unusual, in my experience). This is data integrity 101 - data changes have to be logged so that they can be rolled back if a unit of work fails before completing, and so that database objects can be recovered when that is required, and so that a Db2 subsystem can be restarted and restored to a consistent state after an abnormal termination, etc., etc.

Db2 data changes are physically written to the active log data sets, which are made reusable (i.e. made available for further use after being filled) via the log archive process. Prior to being written to the current pair of active log data sets (you ALWAYS want to do dual-logging, at least in a production Db2 environment), data changes are written to the log output buffer in memory. Information in the log output buffer is externalized (i.e., written to the disk subsystem) when the log output buffer is full, and also when a data-changing unit of work commits.

The size of the log output buffer is specified by way of the OUTBUFF parameter in the Db2 ZPARM module (think of the ZPARM module as the configuration parameter settings for a Db2 subsystem). For Db2 12, the default value for OUTBUFF is 4000 KB (that became the default value for OUTBUFF starting with Db2 10 for z/OS). With Db2 13, the OUTBUFF default value was changed to 102400 KB. Yeah, that's a 25X increase (when I communicated that in writing to the Db2 for z/OS team at a certain site recently, the Db2 systems programmer on the team asked me, "Is that a typo?"). Why this major increase in the OUTBUFF default value? Two reasons:

  1. It's eminently do-able for the vast majority of production Db2 subsystems that I have seen. Yes, in a relative sense a 25X increase in a ZPARM parameter's default value may seem to be a really aggressive move, but in absolute terms the increase - about 98 MB - is a drop in the bucket for a z/OS LPAR with a large real storage resource. Many production z/OS LPARs these days are generously configured with memory, because mainframe memory keeps getting cheaper on a per-gigabyte basis, and because leveraging that memory can be very good for system performance. It's increasingly common for production z/OS LPARs to have multiple hundreds of GB - or more - of central storage.
  2. It can be very helpful for Db2 system and application performance, as noted below.
From a performance perspective, a larger Db2 log output buffer has two main benefits. They are...

Larger OUTBUFF benefit 1: reduced application log write wait time

Db2 accounting trace data - specifically, data provided by accounting trace class 3 (which, along with accounting trace classes 1 and 2, is almost always active at all times for a production Db2 subsystem) - records (among other things) the time that Db2-accessing applications wait for log write operations to complete. Average wait-for-log-write-I/O time (available via a Db2 monitor-generated accounting long report) is generally quite small - often less than 1% of total in-Db2 wait time (i.e., class 3 time) for a Db2 application workload. In some cases, however, this can be a significantly larger percentage of in-Db2 wait time. Now, there are multiple factors that can contribute to elevated wait-for-log-write-I/O time, but one of these factors can be a too-small log output buffer. If you see higher levels of wait-for-log-write-I/O time for your Db2 application workload, check the value of the field labeled UNAVAILABLE OUTPUT LOG BUFF (or something similar to that - different Db2 monitor products sometimes label the same field in slightly different ways) in a Db2 monitor-generated statistics long report (the field will be in a section of the report with the heading LOG ACTIVITY, or something similar to that). In my experience the value of this field is usually 0, but if the field has a non-zero value then it could be a good idea to set OUTBUFF to a larger value for the Db2 subsystem in question. Even if the value of UNAVAILABLE OUTPUT LOG BUFF is 0, if your Db2 subsystem has a log output buffer that's on the small side then making it larger via an increase in the OUTBUFF value could help to make Db2 log write operations more efficient, thereby potentially contributing to a decrease in wait-for-log-write-I/O time for your Db2-accessing applications.

Larger OUTBUFF benefit 2: better log read performance

The importance of good Db2 log write performance should be obvious: Db2 is writing to its log all the time, so getting that work done quickly and efficiently is good for any Db2 data-changing application. Can log read performance be important for a Db2-accessing process? YES - and that's especially true for a Db2 data-change-replication process.

It's not unusual for Db2 for z/OS-managed data to be replicated to some other location for some purpose. The data replication tools, from IBM and other vendors, that capture Db2 for z/OS data changes and send them in near-real time to another location generally do their data change capture work by issuing requests to the log manager component of Db2 to retrieve data change information (this is done using a Db2 trace record, IFCID 306, that can be requested synchronously by a process such as a data replication tool). Especially when the volume of changes made to data in a replicated Db2 table is high, you REALLY want the Db2 log manager to be able to retrieve the requested data change information from the log output buffer in memory, versus having to read the information from the log data sets on disk. If the log output buffer is too small, you can see a high percentage of log read requests that require access to the log data sets on disk. The volume of such log data set read I/Os can be very high - like, thousands per second. That chews up CPU cycles and adds to data replication latency (this latency refers to the time between a change being made to data on the source Db2 for z/OS system and that change being reflected in the corresponding data at the replication target location) - both things you'd rather avoid.

How can you check on this? Again, go to a statistics long report generated by your Db2 monitor, and again go to the section under the heading, LOG ACTIVITY (or something similar to that). Check the fields labeled READS SATISFIED-OUTP.BUF(%) and READS SATISFIED-ACTV.LOG(%). What you want to see (what I'd certainly like to see) is a value for READS SATISFIED-OUTP.BUF(%) that is well north of 90, and - conversely - a value for READS SATISFIED-ACTV.LOG(%) that is in the single digits (ideally, low single digits). If you see a lower than desired value for the percentage of log reads satisfied from the log output buffer, make OUTBUFF larger if you can.

Can you make OUTBUFF larger, and if so, how high should you go?

The answer to the first part of that question (assuming that the value of OUTBUFF is not already at the maximum of 400000 KB) depends on the pressure (or lack thereof) on the real storage resource of the z/OS LPAR in which the Db2 subsystem of interest is running. My favorite indicator of real storage constraint is the LPAR's demand paging rate, available from a z/OS monitor-generated summary report. If the LPAR's demand paging rate is less than 1 per second, the real storage resource is not at all constrained, and you have (as far as I'm concerned) a green light for increasing the OUTBUFF value. If the LPAR's demand paging rate is over 1 per second, you might want to see if more memory can be configured for the system, or if some memory can be freed up (perhaps by shrinking a Db2 buffer pool that is larger than it needs to be), prior to making the value of OUTBUFF significantly larger than its existing value.

If the z/OS LPAR's real storage is not constrained (as described above), and you want to make a Db2 subsystem's OUTBUFF value larger, how high should you go? First of all, I would highly recommend setting OUTBUFF at least to the new-with-Db2-13 default value of 102400 KB. Should you go higher than that? Well, I would if the value of READS SATISFIED-OUTP.BUF(%) is less than 90. One thing to keep in mind here: OUTBUFF is not an online-updatable ZPARM. That means you have to recycle a Db2 subsystem (i.e., stop and restart it) in order to put a new OUTBUFF value into effect. In a Db2 data sharing system, that may not be a big deal (application work can continue to process on other members of the data sharing group as the one member is recycled), and even for some standalone Db2 subsystems there are regular opportunities to "bounce" the subsystem. On the other hand, at some sites where Db2 runs in standalone mode there are only a few times per year when a production Db2 subsystem can be recycled. If that's your situation, you might want to consider going to the maximum OUTBUFF value of 400000 KB (again, if the LPAR's memory is not constrained - and it's not if the LPAR's demand paging rate is less than 1 per second).

And that's what I have to say about OUTBUFF. Check yours, and check the relevant information in Db2 monitor-generated accounting and statistics reports to see if an OUTBUFF increase would be good for your system.

Friday, April 28, 2023

Migrating to Db2 13 for z/OS When You Have Old (pre-11.1) Db2 Client Code on Your App Servers

Not long ago, I had a talk with a Db2 for z/OS systems programmer who works at a pretty big site. In a somewhat dramatized form, our conversation went something like this:

Me: "When are you guys going to migrate your production Db2 for z/OS systems to Db2 13?"

Db2 sysprog: "Later than I'd like."

Me: "Why's that?"

Db2 sysprog: "We have some old Db2 client code on some of our application servers."

Me: "So?"

Db2 sysprog: "So, I can't take APPLCOMPAT for our NULLID packages above V12R1M500."

Me: "No prob. Just leave the APPLCOMPAT value for the NULLID packages at V12R1M500, and go ahead and activate function level V12R1M510, and then migrate the systems to Db2 13."

Db2 sysprog: "I can do that?"

Me: "YES."

The very next week, I had a very similar exchange with another Db2 for z/OS administrator at a different site. It seems clear to me that there's some misunderstanding in this area out there, with people thinking that way-old Db2 client code represents a roadblock on the way from Db2 12 for z/OS to Db2 13. NOT TRUE, as I hope to make clear in this blog entry.

Terminology: "Db2 client code"

This term refers to the piece of IBM code that runs on a remote (from the Db2 for z/OS perspective) server that enables an application on that server to be a DRDA requester (DRDA is short for distributed relational database architecture - the protocol used for Db2 distributed database processing). A DRDA requester application is one that sends SQL statements to Db2 by way of a driver such as IBM's JDBC or ODBC driver. Most often, the Db2 client code is the IBM Data Server Driver Package (for which entitlement is related to an organization's license for IBM Db2 Connect). Sometimes, it's something like the IBM Db2 Connect Runtime Client. In any case, the Db2 client code is considered to be part of the Db2 for Linux/UNIX/Windows (LUW) product family, and it will have a version that corresponds to a Db2 for LUW version.

What is "old Db2 client code" in this context?

Short answer: any version prior to 11.1. Now, to explain that short answer: some would say (understandably) that "old code" means out-of-support code. The 11.1 version of Db2 client code is out of support (and has been since April of 2022 - see Why, then, do I refer to pre-11.1 Db2 client code as being "old," implying that 11.1 Db2 client code, though out of support, is not "old?" It all has to do with context, and the context in this case is a Db2 12 for z/OS system that is the DRDA server for DRDA requester applications.

APPLCOMPAT is a Db2 for z/OS package bind parameter that specifies the Db2 application compatibility level that will be in effect when the package is executed (for more information about APPLCOMPAT, see the part 1 and part 2 blog entries I posted on that topic a few years ago). With the possibility of a few exceptions, every Db2 for z/OS package will have an APPLCOMPAT value, and that is true for the packages in the package collection called NULLID. NULLID is the "home" collection for the Db2 for z/OS packages that are executed when a DRDA requester application accesses the Db2 for z/OS system.

Here's the crux of the matter at hand: if the APPLCOMPAT value for the NULLID packages is taken above V12R1M500, DRDA requester applications will get an error when trying to connect to the Db2 for z/OS system if they are using pre-11.1 Db2 client code.

Why the preceding sentence does not amount to a Db2 13 migration roadblock

Before you can migrate to Db2 13 for z/OS from Db2 12, you have to activate Db2 12 function level V12R1M510 (the last of the Db2 12 function levels). What the Db2 for z/OS systems programmer I referenced at the beginning of this blog entry thought, and what apparently a number of other Db2 for z/OS people think, is that the APPLCOMPAT value for the NULLID packages (and maybe for other Db2 for z/OS packages, as well) has to be V12R1M510 before you can migrate a Db2 12 system to Db2 13. THAT IS NOT TRUE. Can you have, in a Db2 13 system, packages in the NULLID collection (and in other collections) that have an APPLCOMPAT value of V12R1M500? YES, YOU CAN. In fact, APPLCOMPAT values as low as V10R1 are supported in a Db2 13 environment. So, if your NULLID packages are at APPLCOMPAT(V12R1M500), and old (as defined above) Db2 client code is keeping you from upping that APPLCOMPAT value for your NULLID packages, leave the NULLID packages at APPLCOMPAT(V12R1M500), and activate function level V12R1M510 (when your Db2 code and catalog are at the right level), and then migrate the Db2 12 system to Db2 13. There is NOTHING about having NULLID packages at APPLCOMPAT(V12R1M500) that gets in the way of your doing this.

But what if you really want to take APPLCOMPAT for your NULLID packages to a higher level?

First, why might you want to do this? Best answer, I'd say: because you want developers of DRDA applications in your environment to be able to use the latest Db2 for z/OS SQL syntax and functionality (one particularly noteworthy example: the new built-in AI functions of Db2 13 for z/OS, part of that version's SQL Data Insights feature, which can be used via packages with an APPLCOMPAT value of V13R1M500 or higher). If there's pre-11.1 Db2 client code on some of your application servers, and you really want to take APPLCOMPAT higher than V12R1M500 for your NULLID packages (I would), you have a couple of options:

  • One option: update your Db2 client code. This would be for many people the ideal approach. Get the Db2 client code to the current level, which is 11.5, and you get two benefits: 1) you're actually using Db2 client code that is supported by IBM (always nice), and 2) you can take APPLCOMPAT for your Db2 for z/OS NULLID packages as high as you want. Of course, updating the Db2 client code will likely require working in concert with application server administrators in your organization that can perform the code update.
  • Another option: leave the old Db2 client code out there, and raise the APPLCOMPAT value for your NULLID packages anyway. This might be the required approach, at least in the near term, if your application server administrators are not presently able to help update old versions of Db2 client code within your IT infrastructure. How can you do this without causing connection errors for DRDA requester applications that are using old Db2 client code? You do that with the Db2 profile tables, together with an "alternate" collection for the IBM Data Server Driver / Db2 Connect packages, as explained below.

Creating (and, more importantly, using) an alternate collection for the IBM Data Server Driver / Db2 Connect packages

Step 1 for this approach is to create the alternate collection for the packages whose "home" collection is NULLID. This is pretty easy to do: just BIND COPY the packages in the NULLID collection into a collection with some other name (I'll go with OLD_COLL for this example), and in doing that specify APPLCOMPAT(V12R1M500). DRDA requester applications using pre-11.1 Db2 client code will not get connection errors when they use the packages in that OLD_COLL collection. Ah, but how do you get those applications to use the OLD_COLL collection when they will, by default, be looking to use packages in the NULLID collection (NULLID is the default Db2 for z/OS package collection for DRDA requester applications)? This is where the Db2 profile tables come in.

You can use SYSIBM.DSN_PROFILE_TABLE to identify a component of your DDF workload for which you want Db2 to take some action. The component of the DDF workload of interest here is the DRDA requester applications that are using pre-11.1 Db2 client code. How can you identify that DDF workload component as a profile? Easy: use the PRDID (short for product identifier) column of DSN_PROFILE_TABLE (see How do you know which product ID(s) to use? You can get that information via output of the Db2 command -DISPLAY LOCATION (see In the PRDID column of the command output, you'll see the product IDs associated with requesters, and there you'll see the version and release of the Db2 client code that a requester is using (see Using the PRDID information provided via -DISPLAY LOCATION, insert a row (or rows) in DSN_PROFILE_TABLE for the pre-11.1 Db2 client code that is used in your environment. Having done that, for that row (or rows) in DSN_PROFILE_TABLE, insert a corresponding row (or rows) in SYSIBM.DSN_PROFILE_ATTRIBUTES to tell Db2 what you want it to do when one of the DRDA requesters using pre-11.1 Db2 client code requests a connection to the Db2 for z/OS system. And what do you want Db2 to do? You want Db2 to issue SET CURRENT PACKAGE PATH = OLD_COLL (using my example name for the collection into which you BIND COPY-ed the NULLID packages with a specification of APPLCOMPAT(V12R1M500)). This will make OLD_COLL the default collection for the DRDA requester applications using pre-11.1 Db2 client code. Having done this, you can take APPLCOMPT for the NULLID packages higher than V12R1M500, to the benefit of DRDA requester applications that are using 11.1-or-higher versions of the Db2 client code (note that the SET CURRENT PACKAGE PATH = OLD_COLL will happen at application connection time, so after doing the BIND COPY and profile table work you may need to have someone recycle the application servers on which old Db2 client code is running, so they'll get new connections to the Db2 for z/OS system and will be pointed to the OLD_COLL package collection). There is additional information on this approach in an entry I posted to this blog a few years ago (that entry concerns an alternate collection of IBM Data Server Driver / Db2 Connect packages used to get high-performance DBAT functionality, but the collection redirection technique is the same).

And there you have it. I hope you don't have old Db2 client code on your application servers, but if you do, don't worry about that being an impediment to getting to Db2 13, because it isn't.

Thursday, March 30, 2023

Db2 for z/OS: If Index FTB Functionality is Disabled at Your Site, It's Time to Reconsider That

Over the course of the past three weeks, I reviewed ZPARM settings (i.e., configuration parameter values) for three different production Db2 12 for z/OS environments at three different sites, and I noticed that index FTB (fast traverse block) functionality had been disabled in all three cases. I recommended to all three associated Db2-using organizations that they change the relevant ZPARM setting to re-enable FTB functionality, after first validating that the fixes for a set of related Db2 APARs have been applied to their Db2 12 code (the changes made by the fixes are part of the base Db2 13 code). My recommendation for you, if the FTB feature of Db2 has been "turned off" at your site, is to do the same: turn it on, after doing the aforementioned check of Db2 software maintenance if you're using Db2 12. In this blog entry, I'll explain what FTB functionality is, why it was disabled at some sites, and why it's time to go from "disabled" to "enabled" where feature deactivation has happened. I'll also provide information about the fixes (PTFs) that should be on your system to ensure the robust functioning of FTB-supporting Db2 code (again, if we're talking about Db2 12 - the base Db2 13 code has the solidified FTB functionality provided by the Db2 12 fixes).

The FTB raison d'etre: efficient use of non-leaf information in Db2 indexes

Db2 for z/OS indexes serve various purposes, such as assisting with maintenance of a desired ordering of rows in a table and ensuring uniqueness of key values for which duplicates would be problematic, but for the most part indexes in a Db2 system are there to speed the execution of queries (and of non-query SQL statements that contain predicates, aka search clauses). Indexes deliver this performance boost by enabling identification of query result set rows without a laborious row-by-row examination of values. It can be said that Db2 indexes provide shortcuts that get you to your destination (a query's result set) faster than would otherwise be possible.

The information in a Db2 index is arranged in what is known as a B-tree structure. The logical representation of this structure has something of the appearance of an upside-down tree: you have the root page at the top, and the leaf pages at the bottom. In-between the root page and the leaf pages of an index (unless the underlying Db2 table is quite small), you will have one or more levels of non-leaf pages. Finding a row in a table by way of an index on the table involves what is known as an index probe operation: Db2 starts at the root page and navigates down through the other non-leaf levels to reach the leaf page that contains the searched-for key value and the ID of the row (i.e., the row ID, or RID in Db2 parlance) or rows in which the key value can be found.

OK, so what is the value of index fast traverse blocks? Well, an index probe involves GETPAGE activity. A GETPAGE is a Db2 request to examine the contents of a page in an index or a table space (when the page in question is not already in a Db2 buffer pool in memory, the GETPAGE drives a read I/O request). The more rows a table has, the more levels an index on the table can have. More index levels means more GETPAGE activity associated with use of the index, and that matters because GETPAGE activity is one of the main determinants of the CPU cost of executing a query. Index fast traverse block functionality, introduced by Db2 12 for z/OS, improves CPU efficiency for query execution by reducing index-related GETPAGE activity.

An FTB reduces index GETPAGE activity by providing Db2 with a way to get to the leaf page of an index in which a query-predicate-matching key value is found without having to perform a top-to-bottom index probe. How that works: when Db2 builds an FTB structure in memory that is based on a given index, Db2 puts in that FTB structure the information in the non-leaf pages of the index (note that this is NOT just a matter of caching the index's non-leaf pages in memory - the FTB structure has a space requirement that is considerably smaller than what would be required to cache the index's non-leaf pages in an as-is manner); furthermore, navigation through an FTB structure does not require GETPAGE activity. Yes, FTB navigation does involve some instruction path length, but less than would be needed for the index GETPAGEs that would otherwise be required to get to a target leaf page. Let's say that an index on a large table has five levels. Retrieving a table row via the index will require six GETPAGEs - five for the index and one for the table space. If, on the other hand, Db2 has built an FTB structure from the index, when a query having a predicate that matches on the index's key is executed then Db2 can go to the FTB structure with the key value referenced in the predicate, and the FTB will tell Db2, "This is the leaf page in which you'll find that key value." Db2 then does one GETPAGE to examine that leaf page's contents, finds the key value and the associated RID, and does one more GETPAGE to access the row in the table space. Thanks to the FTB, we've gone from six GETPAGEs (five for the index and one for the table space) to two GETPAGEs (one for the index leaf page, one for the table space). Pretty good.

How is FTB functionality turned off, and why would anyone do that?

The FTB "on/off switch" is the ZPARM parameter INDEX_MEMORY_CONTROL. The default value for that parameter is AUTO. When INDEX_MEMORY_CONTROL is set to AUTO, Db2 notes the size of the subsystem's buffer pool configuration (i.e., the aggregate size of the subsystem's buffer pools) and says (figuratively speaking), "OK, I can create FTB structures from indexes, and the maximum amount of in-memory space I'll use for those FTB structures is equivalent to 20% of the size of the buffer pool configuration." Note that this is not space taken away from the buffer pools - it's net additional use of the z/OS LPAR's real storage by Db2. Consider an example: Db2 subsystem DB2P has 50 GB of buffer pools. If INDEX_MEMORY_CONTROL for DB2P is set to AUTO, DB2P can use up to 10 GB (20% times 50 GB) of memory for index FTBs. The size of the DB2P buffer pool configuration is not affected by FTBs - it remains at 50 GB. Got it?

Besides AUTO, another acceptable value for INDEX_MEMORY_CONTROL is an integer between 10 and 200,000. That would set the FTB memory usage limit in terms of megabytes. Using the previous example, if the z/OS LPAR in which subsystem DB2P is running is generously configured with memory, the organization might decide to set INDEX_MEMORY_CONTROL to 20000 if they want Db2 to be able to use up to about 20 GB of memory for index FTBs, versus the 10 GB limit established via the AUTO setting (20% of the 50 GB size of the buffer pool configuration assumed for the example). If, on the other hand, the z/OS LPAR's memory resource is quite limited, the organization might opt to set INDEX_MEMORY_CONTROL to 1000, to restrict DB2P's use of memory for index FTBs to about 1 GB (I say, "about," because 1 GB is actually 1024 MB).

INDEX_MEMORY_CONTROL can also be set to DISABLE. That has the effect of turning FTB functionality off. Why would someone disable a CPU efficiency-boosting Db2 feature? Well, relatively early on in the lifecycle of Db2 12 for z/OS (which became generally available in October of 2016), a few sites encountered some issues related to index FTB functionality. In some cases, use of an FTB was seen to cause a query to return incorrect output. These situations were pretty uncommon (recall that index FTB functionality is on by default, and most Db2 12 sites with INDEX_MEMORY_CONTROL set to AUTO encountered no problems in leveraging the technology), but they were real. Some organizations heard that other organizations had had some problems related to FTB usage, so they disabled the feature as a preemptive measure. I get it.

Why using FTB functionality makes sense now

In response to the FTB-related issues mentioned above, the IBM Db2 for z/OS development team created a number of code fixes that addressed the problems reported by Db2-using organizations. These fixes and their associated APARs (an APAR is an official description of a software problem for which IBM commits to providing corrective service) are noted in a blog entry, written by members of the Db2 development organization, that can be viewed at If INDEX_MEMORY_CONTROL is set to DISABLE at your site, and if you are using Db2 12 for z/OS, check to see if the PTFs listed in this blog entry have been applied to your Db2 code. If they have been applied (or if you are using Db2 13), you can use index FTB functionality with confidence. If you are using Db2 12 and the fixes have not been applied in your environment, my recommendation is to get them applied, perhaps as part of a roll-out of a new and more-current level of z/OS software maintenance at your site.

The confidence that the IBM Db2 for z/OS development team has in FTB functionality, with the corrective maintenance applied, is evidenced by a couple of things. First, Db2 12 function level 508 extended FTB functionality to non-unique indexes (it had originally been limited to unique indexes). Second, Db2 13 for z/OS makes FTB functionality available for a larger set of indexes by doubling the key-length limit for FTB-qualifying indexes - from 64 bytes to 128 bytes for unique indexes, and from 56 bytes to 120 bytes for non-unique indexes (as previously mentioned, the code corrections made for Db2 12 by the FTB-related fixes listed in the above-referenced blog entry are part of the Db2 13 base code). The Db2 development team would not have made FTB functionality available for a wider range of indexes if they were anything less than highly confident in the quality of the FTB-supporting code.

Note that if you have INDEX_MEMORY_CONTROL set to DISABLE, and you're interested in turning FTB functionality on but would like to do so in a more-controlled and more-limited way before going to a setting of AUTO, that option is available to you. As noted in the blog entry for which I provided the link, above, and in the Db2 12 and Db2 13 online documentation, you can tell Db2, via a specification of (SELECTED, AUTO) or (SELECTED, n) for INDEX_MEMORY_CONTROL (where n would be a user-designated limit, in MB, on the memory that Db2 can use for FTB structures), that FTB structures can be built only for indexes that you have identified as FTB candidates by way of the SYSINDEXCONTROL table in the Db2 catalog.

In summary, if you have the FTB-solidifying fixes applied in your Db2 12 environment, or if you are running with Db2 13, and you have INDEX_MEMORY_CONTROL set to DISABLE, you should rethink that. The current FTB code is very robust, and if you don't leverage the functionality then you're leaving CPU savings on the table. I'd prefer to see you realize those CPU savings.

Thursday, February 23, 2023

Two RMF (z/OS Monitor) Reports with which Db2 for z/OS People Should be Familiar

When it comes to analyzing the performance of a Db2 for z/OS system and associated applications, I think of a set of concentric circles. The outermost circle represents the operational environment in which Db2 is processing work - that would be the z/OS LPAR (logical partition) in which the Db2 subsystem runs. The next circle within that outer one represents the Db2 subsystem itself - its buffer pool configuration, EDM pool, RID pool, lock manager, recovery log, etc. The innermost of these concentric circles relates to the applications that access Db2-managed data. With the three circles in mind, I take an "outside-in" approach to Db2 system and application tuning. In other words, I begin with a look at the z/OS system within which Db2 is running, then I turn to the Db2 subsystem itself and lastly I analyze application-centric information. The reason for this approach? If the z/OS system in which Db2 is running is constrained in some way, there's a good chance that Db2 subsystem and application tuning actions will yield little positive impact. Similarly, if the Db2 subsystem is operating in a constrained fashion then application tuning actions may not do much good.

So, if assessing the operation of a z/OS system is important prior to turning to Db2 subsystem and/or application performance analysis, how do you determine whether the z/OS LPAR in question is running in a constrained or an unconstrained way? I do this based on examination of information in two RMF reports: the Summary report and the CPU Activity report. If you support a Db2 for z/OS system, you should be familiar with the content of these reports - in particular, some key metrics provided by the reports. In this blog entry I'll point out those key metrics and explain how I use them.

[Note: I am referring to reports generated by IBM's RMF z/OS monitor because RMF is the z/OS monitor with which I am most familiar. If your organization uses another vendor's z/OS monitor, that monitor might be able to generate reports similar to the RMF reports that are the subject of this blog entry - if need be, check with the vendor on that.]

The RMF CPU Activity report

RMF, by default (you can change this), carves the time period covered by a CPU Activity report into 15-minute intervals (so, if you requested a report for a one-hour time period you will see within that report four sub-reports, each providing information for a 15-minute part of the overall one-hour time period). Within a given 15-minute interval you will see, for the z/OS LPAR for which the report was requested, information that looks something like this (I highlighted two important values in red):

---CPU---    ---------------- TIME % ----------------
 0    CP     100.00    87.03        86.85        0.00
 1    CP     100.00    77.76        77.68        0.00
 2    CP     100.00    83.88        83.78        0.00
 3    CP     100.00    87.07        86.91        0.00
 4    CP     100.00    76.23        76.14        0.00
 5    CP     100.00    76.79        76.71        0.00
 6    CP     100.00    80.45        80.35        0.00
 7    CP     100.00    73.29        73.24        0.00
 8    CP     100.00    63.83        69.22        0.00
 9    CP     100.00    57.78        62.95        0.00
 A    CP     100.00    35.28        48.33       17.01
TOTAL/AVERAGE          72.67        75.16
12    IIP    100.00    66.63        58.68        0.00
                                    46.30        0.00
13    IIP    100.00    26.70        23.42        0.00
                                    18.24        0.00
14    IIP    100.00     9.21         8.07        0.00
                                     6.42        0.00
3E    IIP    100.00     0.00        -----      100.00
                                    -----      100.00
TOTAL/AVERAGE          25.64        26.86

Here is an explanation of what you see in the report snippet above:
  • NUM - This is the ID of a given "engine" (processor core) configured for the LPAR.
  • TYPE - CP is short for central processor (typically referred to as a "general-purpose engine"); IIP is short for integrated information processor (typically referred to as a "zIIP engine").
  • LPAR BUSY - Engine utilization from the LPAR perspective.
  • MVS BUSY - I think of this as the busy-ness of the physical engine - if the engine is used exclusively (or almost exclusively) by the LPAR in question, the LPAR busy and MVS busy numbers should be very similar.
  • PARKED - This is the extent to which an engine's capacity is NOT available to the LPAR during the 15-minute interval (so, if the engine is seen to be 75% parked then the LPAR has access to 25% of that engine's processing capacity). When engines in a mainframe "box" (sometimes called a CEC - short for central electronic complex) are shared between several LPAR's it's not unusual to see a non-zero parked value for at least some of an LPAR's engines).
Note that for this LPAR, there are two MVS BUSY values for each zIIP engine. Why is that? Well, it indicates that the zIIP engines are running in SMT2 mode. SMT2 is short for simultaneous multi-threading 2, with the "2" meaning that z/OS can dispatch two pieces of work simultaneously to the one zIIP core. Running a zIIP engine in SMT2 mode does not double the engine's capacity (each of the two pieces of work dispatched to the one zIIP core will not run as fast as would be the case if the zIIP engine were running in "uni-thread" mode), but for a transactional workload SMT2 can enable a zIIP engine to deliver around 25-40% more throughput versus uni-thread mode (think of a one-way, single-lane road with a speed limit of 60 miles per hour versus a one-way, 2-lane road with a speed limit of 45 miles per hour - the latter will get more cars from A to B in a given time period if there's enough traffic to take advantage of the two lanes). For more information on SMT2 mode for zIIPs, see the entry on that topic that I posted to this blog a few years ago.

OK, to the values highlighted in red in the report snippet:
  • TOTAL/AVERAGE MVS BUSY for the general-purpose engines (75.16 in the report snippet) - As a general rule, application performance - especially for transactional applications (e.g., CICS-Db2, IMS TM-Db2, Db2 DDF) - will be optimal if average MVS busy for an LPAR's general-purpose engines does not exceed 80%. When average MVS busy for the general-purpose engines exceeds 90%, you can see a negative impact on the performance of Db2-accessing applications in the form of what is labeled "not accounted for" time in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information. Not-accounted-for time is in-Db2 elapsed time that is not CPU time and not one of the "known" Db2 wait times (those are so-called class 3 wait times, such as wait for database synchronous read, wait for lock/latch, wait for update/commit processing, etc.). It's literally elapsed time, related to SQL statement execution, for which Db2 cannot account. In my experience, in-Db2 not-accounted-for time is most often a reflection of wait-for-dispatch time, which itself is indicative of CPU contention. I'm generally not too concerned about not-accounted-for time as long as it's less than 10% of in-Db2 elapsed time for an application workload - particularly when it's a higher-priority transactional workload (you might tolerate a higher percentage of not-accounted-for time for a lower-priority batch workload). If not-accounted-for time exceeds 10% of in-Db2 elapsed time (again, especially for a higher-priority transactional workload), that would be a matter of concern for me, indicating that CPU contention is negatively impacting application throughput.
  • TOTAL/AVERAGE MVS BUSY for the zIIP engines (26.86 in the report snippet) - How "hot" can you run zIIP engines before zIIP engine contention becomes a concern? That depends on how many zIIP engines the LPAR has (and, to a lesser extent, whether or not the zIIPs are running in SMT2 mode). The more zIIP engines an LPAR has, the higher the average MVS busy figure for those engines can go before zIIP contention becomes an issue (in the example shown above, the LPAR has three zIIP engines that are running in SMT2 mode - in such a situation average MVS busy for the zIIP engines could probably go to 40-50% without zIIP contention becoming an issue). And when does zIIP contention become an issue? When the zIIP spill-over percentage gets too high, as explained in an entry I posted a few years ago to this blog. [Note: the report snippet shows four zIIP engines, but the fourth - the one identified as processor number 3E - is 100% parked from the LPAR's perspective. That means the LPAR had no access to zIIP processor 3E's capacity, so in effect the LPAR had three zIIP engines during the time interval.]
Below the information shown in the report snippet above, you'll see a sideways bar chart that looks something like this (again, I've highlighted some key information in red):

 NUMBER OF              0    10   20   30   40   50   60   70   
 WORK UNITS     (%)     |....|....|....|....|....|....|....|....

<=  N          55.9     >>>>>>>>>>>>>>>>>>>>>>>>>>>>
 =  N +   1     3.5     >>
 =  N +   2     3.1     >>
 =  N +   3     3.5     >>
<=  N +   5     5.5     >>>
<=  N +  10    10.9     >>>>>>
<=  N +  15     5.7     >>>
<=  N +  20     4.2     >>>
<=  N +  30     3.1     >>
<=  N +  40     1.5     >
<=  N +  60     1.3     >
<=  N +  80     0.4     >
<=  N + 100     0.2     >
<=  N + 120     0.1     >
<=  N + 150     0.2     >
>   N + 150     0.2     >


With regard to the report snippet above, the first thing to which I want to draw your attention is the bottom-line information about "N". We see that, for this LPAR during this 15-minute interval, N = 16.8. What does that mean? It means that the LPAR had "16.8 processor targets to which pieces of work could be dispatched." Why do I use the phrase "processor targets" instead of "processors?" It's because we tend to think of "mainframe processors" as meaning "mainframe engines," and that's not quite the case here. This report snippet goes with the first one we looked at (the second snippet appears right after the first one in the source RMF CPU Activity report), and you might recall that the first snippet showed that the LPAR's three zIIP engines are running in SMT2 mode. For that reason, those three zIIP engines are counted as six processor targets to which pieces of work can be dispatched. Thus, when the report shows that N = 16.8, we can say that 6 of the 16.8 relate to the LPAR's zIIP engines. That leaves 10.8 (16.8 - 6 = 10.8). We've accounted for the zIIP engines, so the 10.8 number relates to general-purpose engines. Where does that 10.8 come from? Refer again to the first report snippet. You'll see that the LPAR had 10 general-purpose processors that were not at all parked (i.e. that were 0% parked from the LPAR's perspective). An 11th general-purpose engine, identified as processor number A, was 17.01% parked during the interval. That means that 83% of the capacity of general-purpose processor number A (that's a hexadecimal A) was available to the LPAR during the time interval. That 83% is equivalent to 0.83, and RMF rounds 0.83 down to 0.8, and that's where the ".8" of N = 16.8 comes from. So, then, the LPAR had 6 zIIP "targets" to which work could be dispatched (3 engines, each running in SMT2 mode), and 10.8 general-purpose targets to which work could be dispatched, and that's why we have N = 16.8.

With N now understood, we can turn our attention to the other bit of information I highlighted in red: <= N 55.9. What does that mean? It means that for 55.9% of the time in the 15-minute report interval, the number of "in and ready" tasks (i.e., the number of tasks ready for dispatch) was less than or equal to the number of processor targets to which pieces of work in the LPAR could be dispatched. When that is true - when the number of in-and-ready tasks is <= N - there is essentially nothing in the way of CPU constraint, because an in-and-read task won't have to wait in line to get dispatched to a processor. In my experience, when the <= N figure is above 80%, the LPAR is very unconstrained in terms of processing capacity. A figure between 50% and 80% is indicative of what I'd call moderate CPU constraint, and performance (particularly in terms of throughput) is likely not impacted much by a lack of processing capacity. When the figure is below 50%, I'd say that CPU constraint could be impacting throughput in a noticeable way, and if it's below 10% the performance impact of CPU constraint for the LPAR could be severe. As previously mentioned, the Db2 performance impact of a CPU-constrained system is typically apparent in elevated levels of in-Db2 not-accounted-for time, as seen in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information.

One more thing about an RMF CPU Activity report: the number of engines configured for an LPAR - something that is shown in the report - should be balanced by an adequate amount of memory (aka real storage) so that the LPAR's processing power can be fully exploited to maximize application performance. For a z/OS LPAR in which a production Db2 workload runs, my rule of thumb, based on years of analyzing system and application performance data, is this: the LPAR should have at least 20 GB of memory per engine with which it is configured. The first report snippet included above shows that the LPAR has 13.8 engines: 10.8 general-purpose engines (as previously mentioned, the ".8" relates to an engine that is about 20% parked from the LPAR's perspective) and 3 zIIP engines (and for balanced-configuration purposes, I count physical zIIP cores - I don't double-count a zIIP engine because it is running in SMT2 mode). I'd round the 13.8 to 14 (the nearest integer) and say that on that basis the LPAR should have at least 14 X 20 GB = 280 GB of memory. If that seems like a lot to you, it shouldn't - mainframe memory sizes are getting bigger all the time, and real storage resources in the hundreds of GB are no longer unusual for production z/OS LPARs, especially those in which Db2 workloads run (the biggest real storage size I've personally seen for a z/OS LPAR is about 1100 GB).

The RMF Summary report

An RMF Summary report is smaller in size than a CPU Activity report - typically, one line of information for each 15 minute interval within the report time period. An RMF Summary report for a one-hour period could look something like what you see below (I removed some columns so that I could use a big-enough-to-read font size - the really important column is the one on the far right, with the numbers highlighted in green):

 11/03 09.15.00 15.00 ...  83   72   96   92  371 ... 0.00   0.00
 11/03 09.30.00 14.59 ...  85   68   98   95  369 ... 0.00   0.00
 11/03 09.45.00 15.00 ...  75   68   95   92  363 ... 0.00   0.00
 11/03 10.00.00 14.59 ...  82   70   94   91  365 ... 0.00   0.00
-TOTAL/AVERAGE        ...  85   69   98   93  371 ... 0.00   0.00

So, what's the LPAR's demand paging rate? It's the rate, per second, at which pages that had been moved by z/OS from real to auxiliary storage (to make room for other pages that needed to be brought into real storage) were brought back into real storage on-demand (i.e., because some process needs to access the page). Why is the demand paging rate important? Here's why: it is, in my opinion, the best indicator of whether or not memory usage can be expanded without putting too much pressure on the LPAR's real storage resource. Here's what I mean by that: suppose you have a Db2 buffer pool that has a total read I/O rate (synchronous reads + sequential prefetch reads + list prefetch reads + dynamic prefetch reads, per second) that's higher than you like - maybe the total read I/O rate for the pool is north of 1000 per second, and you want to bring that down substantially to boost application performance and CPU efficiency (every read I/O eliminated saves CPU and elapsed time). The best way to lower a buffer pool's total read I/O rate is to make the pool larger. Can you do that without putting too much pressure on the LPAR's real storage resource? Here's what I'd say: If the LPAR's demand paging rate is consistently less than 1 per second, you have a green light for using more memory to get a performance boost. If the LPAR's demand paging rate is consistently greater than 1 per second, I'd hold off on using more memory until the LPAR is configured with additional real storage. This goes for any action that would increase memory usage by DB2 - besides enlarging a buffer pool, that could be a RID pool or a sort pool or an EDM pool size increase, or increasing the use of RELEASE(DEALLOCATE) packages with threads that persist through commits, or whatever. Before doing something that will increase memory usage, check the LPAR's demand paging rate.

That's it for now. If you haven't had a look at these two RMF reports before, get them for an LPAR of interest to you and give them a look-see - a z/OS systems programmer at your site would probably be happy to generate the reports for you. Knowing the key utilization and configuration characteristics of the z/OS LPAR in which a Db2 subsystem runs is an important part of effective performance management of the Db2 environment.

Friday, January 20, 2023

Db2 for z/OS: What is "Wait for Other Read" Time, and What Can You Do About It?

A recent conversation I had with some folks who support a large Db2 for z/OS system reminded me of the importance of something called "wait for other read time." In this blog entry I want to make clear to people what Db2 wait-for-other-read time is, why it's important, how to monitor it and what to do about it if it becomes an issue.

What is Db2 for z/OS "wait for other read" time?

In Db2 performance monitoring parlance, time associated with SQL statement execution is known as "in-Db2" time. It's also called "class 2" time, because it is recorded, for monitoring purposes, in records that are generated when Db2 accounting trace class 2 is active. Class 2 elapsed time (elapsed time pertaining to SQL statement execution) has two main components: CPU time (some of which is consumed on so-called general-purpose processors - aka "engines" - of a mainframe server, and some of which might be consumed on what are known as zIIP engines) and suspend time (on a busy system there can be another component of in-Db2 time, called "not accounted for" time, that generally reflects wait-for-dispatch time). In-Db2 suspend time is also known as "class 3" time, because it is recorded in Db2 accounting trace records when accounting trace class 3 is active. Class 3 time is broken out in a number of categories, and these show up in an accounting long report that might be generated by your Db2 monitor, or by a Db2 monitor's online display of thread detail information.

In a Db2 monitor-generated accounting long report, class 3 suspend times are shown as "average" values. Average per what? Well, if you're looking at information for a Db2-accessing batch workload (referring to jobs that run in z/OS JES initiator address spaces and access Db2 by way of Db2's call attachment facility or TSO attachment facility), it'll be average per batch job (generally speaking, activity for one batch job will be recorded in one Db2 accounting trace record). If you're looking at a transactional workload (e.g., a CICS-Db2 workload, or a Db2-accessing IMS transactional workload, or a DDF client-server workload), the "average" values seen in a Db2 monitor-generated accounting long report will typically be average per transaction.

In many cases, the majority of in-Db2 time for a batch or a transactional workload will be class 3 suspend time (it is a little unusual, but certainly not unheard of, for a Db2 workload's in-Db2 time to be mostly CPU time). More often than not, the largest component of in-Db2 class 3 suspension time will be wait-for-synchronous-database-read time. Another wait-for-read time is labeled "wait for other read." What's that? Well, if it's "other than" synchronous read wait time, it must be asynchronous read time, right? Right, indeed. And what are asynchronous reads? Those are prefetch reads: read I/Os driven by Db2 in anticipation that the pages read into memory in bulk in this way will be requested by the process (such as an application process) that prompted Db2 to issue the prefetch read requests. Well, if a prefetch read I/O operation is executed because Db2 is aiming to get pages into a buffer pool in memory before they are requested by (for example) an application process, why would there be such a thing as a process having to wait for a prefetch read to complete?

Wait-for-prefetch read (reported as "wait for other read") happens because there are usually lots of Db2-accessing processes active in a system at one time. Let's call two of these processes process A and process B, and let's say that Db2 is driving prefetch reads (these could be sequential, list or dynamic prefetch reads - more on that in a moment) for process A. We'll further suppose that Db2 needs to access page 123 of table space TS1 on behalf of process B (i.e., Db2 issues a GETPAGE request for page 123 of table space TS1). If page 123 of table space TS1 is not already in the buffer pool to which TS1 is assigned, Db2 will drive a synchronous read request to get that page into memory, right? Not necessarily. It could be that page 123 of TS1 is already scheduled to be brought into memory via a prefetch read that is being executed on behalf of process A. If that is the case then process B will wait for that in-flight prefetch read to complete, and that wait time will be recorded as "wait for other read time" for process B. [It is also possible that process A has gotten to the point that it needs to access page 123 of TS1, and the prefetch read that will bring that page into memory is currently in-flight, and that would end up causing wait-for-other-read time for process A related to the prefetch request being driven on behalf of process A, but I think it's more likely that wait-for-other-read time will be associated with one process waiting on completion of a prefetch read operation that is being executed on behalf of another process.]

Why is wait-for-other-read time important?

Usually, wait-for-other-read time is a relatively small percentage of total class 3 suspend time for a process (it's typically much smaller than wait-for-synchronous-read time), but that's not always the case. In some situations, wait-for-other-read time is a major component of overall in-Db2 suspend time. The performance impact of elevated wait-for-other-read time can be especially significant for batch applications, as these Db2 processes are often particularly reliant on prefetch to achieve elapsed time objectives. If wait-for-other-read time gets too large then service levels could degrade, leading to user dissatisfaction.

How can wait-for-other-read time be monitored?

As mentioned previously, wait-for-other read time is recorded in accounting long (i.e., accounting detail) reports that can be generated by a Db2 performance monitor; so, you can track that for a process or a workload over time and note trends. Besides wait-for-other-read time itself, are there any other related fields in Db2 monitor-generated reports that you should keep your eye on to help ensure that a wait-for-other-read time problem does not sneak up on you? Yes, as explained below.

The "other related fields" that I'd recommend checking out are found in a Db2 monitor-generated statistics long report (i.e., statistics detail report). In such a report you would see, for each buffer pool, a set of fields like those shown below (this is a snippet of a statistics long report generated by the IBM OMEGAMON for Db2 for z/OS performance monitor - I've added some A, B, C labels that I'll subsequently use in referencing various of these fields):

---------------------------  --------  -------

SEQUENTIAL PREFETCH READS     5587.00     3.21   B
PAGES READ VIA SEQ.PREFETCH  52950.00    30.43   C
S.PRF.PAGES READ/S.PRF.READ      9.48            D
LIST PREFETCH REQUESTS       47394.00    27.24   E
LIST PREFETCH READS           5876.00     3.38   F
PAGES READ VIA LIST PREFTCH    154.9K    89.03   G
L.PRF.PAGES READ/L.PRF.READ     26.36            H
DYNAMIC PREFETCH READS         157.6K    90.59   J
PAGES READ VIA DYN.PREFETCH   3110.6K  1787.68   K
D.PRF.PAGES READ/D.PRF.READ     19.73            L

By way of explanation, I'll first point out that what you see above are three repeating sets of fields (4 fields in each set) that pertain to sequential, list and dynamic prefetch activity. Here are thumbnail definitions of these prefetch types:

  • Sequential - Generally speaking, this is the prefetch mode used for table space scans or for non-matching index scans. In other words, if Db2 determines that a front-to-back scan of a table space or index will be required, sequential prefetch will be used (assuming that the table or index in question is not super-small, in which case prefetch of any kind would usually not make sense).
  • List - This is the prefetch type used when Db2 is retrieving table rows based on a list of row IDs (RIDs) that have been retrieved from an index (or from more than one index, if index ANDing or index ORing is part of the access plan for the query). List prefetch can be efficient if the clustering sequence of rows in the target table is substantially uncorrelated with respect to the order of entries in the index in question (the list of RIDs obtained from the index is sorted in ascending RID sequence and then the sorted RID list is used to prefetch pages of associated rows from the target table). The hybrid method of joining tables is another driver of list prefetch activity.
  • Dynamic - This prefetch method is dynamically initiated at statement execution time when Db2 recognizes a sequential pattern of data access as it retrieves rows. Matching index scans are often drivers of dynamic prefetch activity.
OK, so here are a couple of things to keep an eye on, if you want to avoid a surprise situation involving elevated levels of wait-for-other-read time for processes that use prefetch to access pages of objects assigned to a given buffer pool:

  • Prefetch reads relative to prefetch requests - This tends to be more important for list and dynamic prefetch (less so for sequential prefetch, owing to locality of reference being less of a factor in that case). For list and dynamic prefetch, then, compare the number of prefetch reads to the number of prefetch requests (i.e., compare F to E, and J to I, using the letter-labels I added to the statistics report snippet shown above). What's this about? Well, a prefetch request is just that - a request to read a certain chunk of pages from a table space or an index into the assigned buffer pool. Suppose the prefetch request is for 32 pages (the most common quantity), and suppose that all 32 of those pages are already in the buffer pool. In that case, the prefetch request will not drive a prefetch read I/O operation. The larger the number of buffers allocated for a pool, the greater the likelihood that all pages associated with a prefetch request will already be in memory, thereby reducing prefetch reads as a percentage of prefetch requests. If you see the percentage of prefetch reads relative to prefetch requests going up over time for a pool, especially for list and/or dynamic prefetch, that's an indication that elevated levels of wait-for-other-read time could be in the offing. Why? Because more prefetch reads will generally mean more waiting for prefetch reads to complete.
  • The number of pages read per prefetch read - These are the fields labeled D, H and L in the example statistics report snippet. If you see that number going up for one or more prefetch types (sequential, list, dynamic), it could be an early-warning sign of higher wait-for-other-read times. Why? Because a prefetch read that will bring 25 pages into memory is likely to take longer than a prefetch read that will bring 5 pages into memory (recall that a prefetch read I/O is driven to bring into memory the pages, associated with a prefetch request, that are not already in the buffer pool). When prefetch reads take longer to complete, it is likely that application processes will see higher levels of wait-for-other-read time.
At this point you may have put two and two together, and are thinking, "Hmm. It seems to me that a growing number of prefetch reads relative to prefetch requests combined with an increase in the number of pages read into memory per prefetch read would really be a flashing yellow light with respect to wait-for-other-read time." Right you are. In that case, two things are happening, and both have negative implications for wait-for-other-read time: there are more prefetch reads (because there are fewer cases in which all pages associated with a prefetch request are already in memory) and each prefetch read, on average, is taking longer to complete (because each read, on average, is brining more pages into memory). If too much of that goes on, you could "hit the curve of the hockey stick" and see a sharp and sudden increase in applications' wait-for-other-read times. Better to take a corrective action before that happens. But what?

Glad you asked...

What can you do to reduce (or head off an increase in) wait-for-other-read time?

If wait-for-other-read time has become problematic, or if you see the warning signs and want to take a preemptive action, what can you do? Here are some possibilities:

  • Increase the size of the buffer pool in question - Simple: more buffers in a pool leads to increased page residency time, and that leads to 1) more prefetch requests NOT leading to prefetch reads (because all pages associated with a request are already in memory) and 2) fewer pages, on average, per prefetch read. Fewer prefetch reads + quicker execution of prefetch reads that do occur = less wait-for-other-read time. Obvious related question: "Can I make a buffer pool bigger? I don't want to put too much pressure on the z/OS LPAR's real storage resource." My response: check the LPAR's demand paging rate (available via an RMF Summary Report for the LPAR). If the demand paging rate is zero or a very small non-zero value (i.e., less than 1 per second), there is little to no pressure on the real storage resource, and you have a green light for making the buffer pool bigger. If the demand paging rate is 2-3 per second or more, and you don't want it to go higher than that (I wouldn't want it to go higher than that), consider reducing the size of a buffer pool that has low GETPAGE-per-second and read-I/O-per-second values, and increase the size of the buffer pool of concern by a like amount (so, the overall size of the buffer pool configuration remains the same). In my experience, plenty of Db2 for z/OS-using organization under-utilize the real storage resources of production z/OS LPARs.
  • Change some query access plans - If it looks as though sequential prefetch reads are the primary contributor to higher wait-for-other read times, you can consider taking actions that would reduce table space scan and/or non-matching index scan activity. For that, you could potentially use a query monitor to identify longer-running queries that access objects assigned to the buffer pool in question, and examine EXPLAIN output for those queries to see if any of them have table space scans and/or non-matching index scans in their access plans that involve objects assigned to the buffer pool. Then, consider whether it would be worth it to create a new index or indexes to eliminate such scans (there are cost factors associated with new indexes - you want the benefit to outweigh the cost), or whether simply adding a column to an existing index might reduce scan activity (there is a cost associated with that, too, but it's not as high as the cost of a new index). For dynamic prefetch, keep in mind that this is often related to matching index scans. You can sometimes reduce that activity by enabling Db2 to do more result set row filtering at the index level, and that often involves trying to increase MATCHOLS values for one or more predicates of longer-running and/or more-frequently-executed queries (referring to the name of a column in the PLAN_TABLE in which EXPLAIN output is found). Boosting MATCHCOLS can involve things such as changing an index (add a column, or change the order of columns in a key - keeping in mind that the latter change could benefit some queries and negatively impact others), or maybe re-coding some non-indexable predicates to make them index-able, or maybe adding a predicate that does not change a query's result set. For list prefetch, keep in mind that this often has to do with rows in a table being clustered in a sequence that is very different from the sequence of the index used in the list prefetch operation. You might consider whether a table's clustering key is what it should be - the clustering key of a table can always be changed, and sometimes it makes sense to do that. Also, when index ANDing is driving a lot of list prefetch activity, increasing index-level filtering can help (maybe by adding a column to an index involved in the ANDing, to increase the column-match number, or adding an index that would take the number of indexes AND-ed from n to n+1).
  • Take a look at RID pool activity - List prefetch operations involve use of the Db2 subsystem's RID pool. If the RID pool can't be used for a RID processing operation, Db2 will fall back to a table space scan for the target table, and that can drive sequential prefetch numbers up. RID pools these days are MUCH larger than they used to be (the default RID pool size in a Db2 12 or Db2 13 system is 1 GB), so incidences of RID processing "failed (or not used) due to lack of storage" - something that is indicated in Db2 monitor-generated accounting long as well as statistics long reports - are now quite rare. What you could potentially see, however, in the RID processing block of a statistics long report is a relatively large number of occurrences of "failed due to RDS limit exceeded." What this RID-pool-not-used counter means: If Db2 is executing a query, and a RID list processing action commences, and Db2 determines that more than 25% of the RIDs in the index being accessed will be qualified by the predicate in question, Db2 will abandon the RID list processing action in favor of a table space scan. Can you do anything about this? Maybe. In the case of a static SQL statement, it is my recollection that this RDS limit value is embedded in the associated Db2 package; so, at bind time, Db2 notes the number of RIDs that would exceed 25% of the RIDs in an index that is to be used as part of a RID-list-utilizing access plan action. Why is this potentially important? Because an index could grow substantially in the months (or years) following the most recent bind or rebind of a package. What this means: if you have an application process for which there are many occurrences of "RID list processing failed - RDS limit exceeded," check how long it has been since the package (or packages) associated with the process were last bound or rebound. If it's been a long time, and if you think that relevant indexes have grown substantially since then, consider rebinding the packages - if that rebind results in new and larger "this is the RDS limit threshold for this index" values being embedded in the package, that value increase might be enough to reduce incidences of "RID list processing failed - RDS limit exceeded" for the package.
OK, that's what I've got on this topic. As I mentioned up front: in-Db2 wait-for-other-read time is usually not a matter of concern for Db2 application performance. In some cases, it can be an issue. This blog entry is aimed at helping you should such a case arise at your site (or even better, to help ensure that it doesn't become an issue for you).

Wednesday, December 21, 2022

Db2 13 for z/OS: Setting Lock Timeout Limit and Deadlock Priority at the Application Level

Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.

Application-level lock timeout

First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. How can those disparate concerns be addressed?

They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:

  • The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value if -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained immediately (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).
  • If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.
  • If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.
  • The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, and not just for DDF-using applications (more information on this is provided below).

Application-level deadlock priority

A deadlock, of course, happens when process A holds a lock that process B needs in order to proceed, and process B holds a lock that process A needs in order to proceed. With both processes in a not-able-to-proceed state, Db2 detects the deadlock and chooses a "winner" and a "loser." The "loser" process is rolled back, causing it to release locks it had held, and that enables the other process (the "winner") to acquire the lock for which it had been waiting.

All well and good, except for the fact that one traditionally has been able to do little to nothing to influence Db2's choice of winner and loser in deadlock situations. That changes starting with function level 501 of Db2 13, thanks to a new built-in global variable named DEADLOCK_RESOLUTION_PRIORITY.

Before providing some helpful (I hope) items of information about DEADLOCK_RESOLUTION_PRIORITY, let me point out a very important difference between this new feature and the previously-described CURRENT LOCK TIMEOUT: the latter is a special register, while the former is a global variable. Why is that notable? One simple reason: any process can set the value of a special register, but a process must have permission to set the value of a global variable. The rationale for making DEADLOCK_RESOLUTION_PRIORITY a global variable may already be clear to you: if a value for DEADLOCK_RESOLUTION_PRIORITY could be set by any process, one could imagine everyone setting the global variable to its maximum value ("I always want to be the winner in a deadlock situation"), and that would defeat the purpose of the new capability (as the bad guy, Syndrome, in the movie The Incredibles put it, "When everyone is super, no one will be"). The permission-only nature of DEADLOCK_RESOLUTION_PRIORITY means that (for example) a Db2 DBA can assign the max-priority value to a "must-complete" database administration process, and know that under almost any circumstances (exceptions noted below) the process will be the winner in case of a deadlock. The same could be done - with permission received from a DBA - for a high-priority application process.

OK, items of related information that might be good for you to know:
  • The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).
  • If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).
  • I mentioned previously that there are exceptions to the "255 always wins" rule. Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.

Setting the lock timeout limit or deadlock priority automatically for an application

What if you want an application to have a certain lock timeout limit or a certain deadlock priority, but you don't want the application to have to issue a SET CURRENT LOCK TIMEOUT or a SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY statement in order to accomplish the objective (SYSIBMADM is the schema for user-set-able built-in Db2 global variables - SYSIBM is the schema for built-in global variables that are set by Db2)? No problem: you can get that done using the Db2 profile tables. "Yeah," you might say, "but I want to do this for a local-to-Db2 application, and the profile tables can be used to set special register or built-in global variable values only for DDF-using applications." Actually, with Db2 13 that statement is no longer entirely true. Db2 13 allows the setting of CURRENT LOCK TIMEOUT (starting with function level 500) and DEADLOCK_RESOLUTION_PRIORITY (starting with function level 501) via profile table entries for local-to-Db2 as well as for DDF-using applications (for other special registers and built-in global variables, value-setting by way of the profile tables remains do-able only for DDF-using applications).

For a DDF-using application, the profile-defining specification (what you put in SYSIBM.DSN_PROFILE_TABLE) can be in a Db2 13 environment what it could be in a Db2 12 environment (the auth ID an application uses when connecting to the Db2 system is one example; the IP address of an application server is another example). For a local-to-Db2 application, the profile-defining specification can be auth ID and/or role, or collection name and/or package name, or the value of CLIENT_APPLNAME or CLIENT_USERID or CLIENT_WRKSTNNAME. The online Db2 13 for z/OS documentation provides additional details on using the profile tables to set values for special registers and for built-in global variables.

And there you have it. Db2 13 provides more control - and more-granular control - over two important aspects of application execution. I hope that these new capabilities will be useful at your site.

Friday, November 11, 2022

A Case Study: Using Db2 for z/OS Monitor Reports to Zero In on a Performance Problem

I had an interesting exchange recently with a Db2 for z/OS systems programmer. This individual had been asked to assist in determining the cause of a performance problem impacting a Db2-accessing application. The sysprog shared with me a Db2 monitor-generated accounting report showing activity for the application process, and a Db2 monitor statistics report covering the same time period for the same Db2 subsystem. In those two reports were the clues that pointed to the source of the application performance problem, and with the probable problem source identified the associated mitigating actions were readily determined. In this blog entry, I'll take you through my analysis of the Db2 monitor-provided accounting and statistics information, and the rationale behind my recommended steps for resolving the problem. My aim is not only to shed light on a particular performance-affecting issue and related remediating moves, but also to illustrate a methodical approach for analyzing Db2 application performance issues in general.

Input for analysis

I find Db2 monitor generated accounting and statistics reports to be extremely useful for analysis of application performance problems. In both cases, what you want is the detailed form of the report. In the case of IBM's OMEGAMON for Db2 performance monitor (the one with which I'm most familiar), you're talking about the ACCOUNTING REPORT - LONG and the STATISTICS REPORT - LONG (for other Db2 monitor products, these reports might have titles like, "detailed summary of accounting information" or "statistics detail report"). For the accounting report, your preference is to see activity pertaining exclusively to the application process for which the performance issue has arisen. This is usually done by using the monitor's data-filtering capabilities to include, for report-generation purposes, only the Db2 accounting records of interest (an accounting report is basically information from Db2 accounting trace records, formatted for readability). Db2 accounting trace records have all kinds of identifier fields, so you can get pretty specific. In the case about which I'm writing here, the relevant filtering criteria were the authorization ID of the application process of interest, and the FROM and TO times that bracketed the period during which the performance problem occurred.

As for the statistics report, what you want is one that covers the same time period as the accounting report (same FROM and TO times), for the same Db2 subsystem.

Looking for clues

What the Db2 sysprog had been told by the development team is that the application in question started out performing well, and then slowed down a lot (important input). The sysprog and I looked first at the accounting report, and in doing that we focused initially on the "class 3 suspension" information (this information, obtained from Db2 accounting trace class 3 records, has to do with "known" wait events, as opposed to "not accounted for" time, about which I'll comment momentarily). Why did we look there first? Because (in my experience), when a Db2-accessing process starts out performing well and then slows way down, it's often due to a substantial increase in one or more of the "wait times" captured by accounting trace class 3 (yes, an application slowdown could be related to a significant CPU time increase, but I've seen that less often than I've seen large wait time increases).

Looking at the class 3 times for the application process, what jumped out was a very large value for the average DB2 LATCH time ("average" is average per accounting trace record, which typically equates to average per transaction or average per batch job, depending on the workload type). Here, I mean "large" in terms of average DB2 LATCH time being a large percentage of average TOTAL CLASS 3 time. Usually, DB2 LATCH time is a very small percentage of TOTAL CLASS 3 time, with "wait time" categories such as SYNCHRON DATABASE I/O and OTHER READ I/O accounting for the bulk of TOTAL CLASS 3 time. A (proportionately) really large DB2 LATCH time is usually an indicator that something's not right.

The first thing I look at when I see unusually large DB2 LATCH time for a Db2-accessing process is the "in-Db2 not-accounted-for time" for that process. Several Db2 monitor products calculate that for you - in an IBM OMEGAMON for Db2 accounting report, the field is labeled NOTACC, and it's shown, on a sideways bar chart at the top of an accounting report, as a percentage of average in-Db2 elapsed time. If you need to calculate this figure for yourself, the denominator is average in-Db2 elapsed time (aka "class 2" elapsed time), and the numerator is average in-Db2 elapsed time minus in-Db2 CPU time (general-purpose plus zIIP, or "specialty engine," CPU time) minus TOTAL CLASS 3 time. In other words, it's the percentage of in-Db2 elapsed time that is not CPU time and not "identifiable" wait time. For a transactional application process (as was the case for the situation about which I'm writing), as a general rule you want in-Db2 not-accounted-for time to be less than 10%. If that figure is substantially greater than 10%, it's indicative of a CPU-constrained environment, and if the environment is highly CPU-constrained then DB2 LATCH time can get really large (as pointed out in an entry I posted to this blog a few years ago).

Well, in this particular case the average in-Db2 not-accounted-for time for the application process was 11% of in-Db2 elapsed time - a little on the high side for a transactional process, but not high enough to explain a really large DB2 LATCH time. With that cause of elevated DB2 LATCH time pretty much eliminated, it was time to turn to the breakdown of latch wait events for different latch categories, and that's where the statistics report comes in. In a Db2 monitor-generated statistics long report, the latch suspend count information will likely look something like this (what you see below is from an OMEGAMON for Db2 statistics long report, but it is NOT from the report I reviewed with the Db2 systems programmer - we jointly viewed that report in a web meeting, and I do not have a copy of the report):

---------  --------  --------  --------  --------
LC01-LC04      0.00      0.00      0.00      0.00
LC05-LC08      0.00      0.74      0.00      0.41
LC09-LC12      0.00      0.02      0.00      0.32
LC13-LC16      0.00     12.89      0.00      0.00
LC17-LC20      0.00      0.00      0.01      0.00
LC21-LC24      0.04      0.00      1.96      2.84
LC25-LC28      0.12      0.02      0.01      0.00
LC29-LC32      0.06      0.04      0.00      0.28
LC254          0.00

What I saw in the report I reviewed in the web meeting with the Db2 sysprog (and again, that's NOT what you see above - the snippet above is provided so that you can see what the latch suspend count information looks like in a statistics report) was a particularly high value for latch class 6 suspend events (that would be in the position highlighted in green in the sample report snippet above). What is latch class 6? It has to do with index page split activity in a Db2 data sharing environment (by the way, a handy page for seeing the activities associated with various Db2 latch classes is this one from the OMEGAMON for Db2 documentation).

Let's unpack that. An index page split occurs when Db2 has to insert an entry in an index page because of an insert (or an update of an indexed column) and that page is full. In that situation, a portion of the entries in the page will be moved to what had been an empty page in the index, so that there will be room in the formerly-full page for the new entry. What does Db2 data sharing have to do with this (and in the environment about which I'm writing, Db2 is running in data sharing mode)? In a data sharing system (versus a standalone Db2 subsystem), an index page split action has a greater impact on throughput because it forces a log-write operation.

Seeing the high level of index page split activity suggested by the numerous latch class 6 wait events, we turned again to the accounting report to see the average number of insert operations executed by the performance-impacted application. Sure enough, we saw that this was an insert-intensive process - more by the average number of rows inserted, versus the number of INSERT statements executed (the average number of rows inserted, per the accounting report, was about 100 times larger than the average number of INSERT statements executed, indicating use of block-level inserts by the application).

The elevated count of latch class 6 suspend events (related to index page splits) and the insert-intensive nature of the process also dove-tailed with the observed "started out fine, then slowed down" behavior of the application: in all probability, when the process started there was a pretty good amount of free space in leaf pages of indexes on the table(s) into which rows were being inserted. After a while these "holes" in index leaf pages filled up, and that resulted in a large number of index page split actions to make space for new index entries, and THAT - partly due to the fact that this was a Db2 data sharing system - had a majorly negative impact on application performance (the keys of affected indexes were clearly not of the continuously-ascending variety, because index page split actions are not required for an index defined on a continuously-ascending key).

With the source of the performance problem identified, the next matter to consider was...

What to do about it?

The Db2 sysprog and I discussed two problem-mitigating actions - one an "absolutely do" and the other a "maybe do." The "absolutely do" step was to increase the amount of free space in indexes to accommodate new entries. That step, in turn, was comprised of two sub-steps, each of which is applicable to indexes defined on non-continuously-ascending keys. Sub-step one: increase the index's PCTFREE value. Whereas the default PCTFREE value for an index is 10, a value of 20 or 25 might make more sense for an index defined on a non-continuously-ascending key for an insert-heavy table. Sub-step two: increase the index's FREEPAGE value. The default FREEPAGE value is 0. Here's why boosting the FREEPAGE value - for example, to 5 (in which case there would be an empty index page after every 5 pages containing index entries) - can be helpful for an index defined on a non-continuously-ascending key for an insert-heavy table: as previously mentioned, when an index page is split a portion of that page's entries are moved to a previously empty page in the index. If FREEPAGE 0 (the default) is in effect, the only empty index pages will be at the very end of the index - potentially a long way from the page that was split. That situation creates a drag on performance through degradation of the index's organization (reflected in the LEAFDIST value for the index - or index partition, in the case of a partitioned index - in the SYSIBM.SYSINDEXPART catalog table). With a non-zero and relatively low value for FREEPAGE (meaning, an empty page following each relatively-low-number of populated pages), when an index split does occur then there should be a "nearby" empty page into which entries from the full page can be moved.

Both PCTFREE and FREEPAGE can be changed for an index via an ALTER INDEX statement, and both take effect when the index is subsequently reorganized (or loaded). The larger PCTFREE value will reduce index page split activity between REORGs, and the non-zero FREEPAGE value will reduce the impact of page splits if they do occur.

And what about the "maybe do" step? That would be an increase in the size of the index's pages, from the default of 4 KB to maybe the maximum of 32 KB. How can that help? Here's how: because a 32 KB index page (for example) can hold 8 times as many entries as a 4 KB page, going to 32 KB-sized pages for an index (via an ALTER INDEX statement that assigns the index to a 32K buffer pool) can potentially result in an 87.5% reduction (seven eighths) in page split activity for an index, other things being equal (e.g., same rate of inserts for the underlying table). Why is this a "maybe do" thing versus an "absolutely do" thing? Because if access to table rows through the index (e.g., for queries) is truly random in nature with respect to key values, 32 KB-sized pages could mean somewhat less-effective use of buffer pool resources versus 4 KB-sized pages. It's a matter, then, of what's more important in a particular situation: is it minimizing index page split activity, or maximizing the effectiveness of buffer pool resources for random patterns of row access by applications?

In conclusion...

I hope that this entry has provided you with some "news you can use." Utilize Db2 monitor accounting and statistics reports to determine the source of an application performance problem, then take appropriate remedial action.

One more thing: Db2 13 for z/OS made it a lot easier to verify that index page splits are an issue, through enhanced instrumentation (new IFCID 396, associated with statistics trace class 3, which is on by default, indicates when an index page split operation takes more than 1 second, which would be unusually long) and through the new REORGTOTALSPLITS, REORGSPLITTIME and REORGEXCSPLITS columns of the SYSIBM.SYSINDEXSPACESTATS real-time statistics table in the Db2 catalog.