Thursday, May 25, 2023

OUTBUFF: A Db2 for z/OS ZPARM You Really Ought to Check

Over the past year or so, I've seen more and more situations in which a too-small Db2 for z/OS log output buffer is negatively impacting system and application performance. The Db2 development team took aggressive action to remedy that situation via a change to the default value of OUTBUFF (the relevant ZPARM parameter) with Db2 13, but if you are not yet on Db2 13 you should make this change yourself in your Db2 12 environment (and, if you are on Db2 13, you should definitely be using the new OUTBUFF default, or an even higher value). In this blog entry I'll provide information that I hope will make all of this clear and meaningful for you.

The Db2 for z/OS log output buffer

Db2, of course, logs changes made to database objects (aside from the situation in which a table is defined with the NOT LOGGED attribute - unusual, in my experience). This is data integrity 101 - data changes have to be logged so that they can be rolled back if a unit of work fails before completing, and so that database objects can be recovered when that is required, and so that a Db2 subsystem can be restarted and restored to a consistent state after an abnormal termination, etc., etc.

Db2 data changes are physically written to the active log data sets, which are made reusable (i.e. made available for further use after being filled) via the log archive process. Prior to being written to the current pair of active log data sets (you ALWAYS want to do dual-logging, at least in a production Db2 environment), data changes are written to the log output buffer in memory. Information in the log output buffer is externalized (i.e., written to the disk subsystem) when the log output buffer is full, and also when a data-changing unit of work commits.

The size of the log output buffer is specified by way of the OUTBUFF parameter in the Db2 ZPARM module (think of the ZPARM module as the configuration parameter settings for a Db2 subsystem). For Db2 12, the default value for OUTBUFF is 4000 KB (that became the default value for OUTBUFF starting with Db2 10 for z/OS). With Db2 13, the OUTBUFF default value was changed to 102400 KB. Yeah, that's a 25X increase (when I communicated that in writing to the Db2 for z/OS team at a certain site recently, the Db2 systems programmer on the team asked me, "Is that a typo?"). Why this major increase in the OUTBUFF default value? Two reasons:

  1. It's eminently do-able for the vast majority of production Db2 subsystems that I have seen. Yes, in a relative sense a 25X increase in a ZPARM parameter's default value may seem to be a really aggressive move, but in absolute terms the increase - about 98 MB - is a drop in the bucket for a z/OS LPAR with a large real storage resource. Many production z/OS LPARs these days are generously configured with memory, because mainframe memory keeps getting cheaper on a per-gigabyte basis, and because leveraging that memory can be very good for system performance. It's increasingly common for production z/OS LPARs to have multiple hundreds of GB - or more - of central storage.
  2. It can be very helpful for Db2 system and application performance, as noted below.
From a performance perspective, a larger Db2 log output buffer has two main benefits. They are...

Larger OUTBUFF benefit 1: reduced application log write wait time

Db2 accounting trace data - specifically, data provided by accounting trace class 3 (which, along with accounting trace classes 1 and 2, is almost always active at all times for a production Db2 subsystem) - records (among other things) the time that Db2-accessing applications wait for log write operations to complete. Average wait-for-log-write-I/O time (available via a Db2 monitor-generated accounting long report) is generally quite small - often less than 1% of total in-Db2 wait time (i.e., class 3 time) for a Db2 application workload. In some cases, however, this can be a significantly larger percentage of in-Db2 wait time. Now, there are multiple factors that can contribute to elevated wait-for-log-write-I/O time, but one of these factors can be a too-small log output buffer. If you see higher levels of wait-for-log-write-I/O time for your Db2 application workload, check the value of the field labeled UNAVAILABLE OUTPUT LOG BUFF (or something similar to that - different Db2 monitor products sometimes label the same field in slightly different ways) in a Db2 monitor-generated statistics long report (the field will be in a section of the report with the heading LOG ACTIVITY, or something similar to that). In my experience the value of this field is usually 0, but if the field has a non-zero value then it could be a good idea to set OUTBUFF to a larger value for the Db2 subsystem in question. Even if the value of UNAVAILABLE OUTPUT LOG BUFF is 0, if your Db2 subsystem has a log output buffer that's on the small side then making it larger via an increase in the OUTBUFF value could help to make Db2 log write operations more efficient, thereby potentially contributing to a decrease in wait-for-log-write-I/O time for your Db2-accessing applications.

Larger OUTBUFF benefit 2: better log read performance

The importance of good Db2 log write performance should be obvious: Db2 is writing to its log all the time, so getting that work done quickly and efficiently is good for any Db2 data-changing application. Can log read performance be important for a Db2-accessing process? YES - and that's especially true for a Db2 data-change-replication process.

It's not unusual for Db2 for z/OS-managed data to be replicated to some other location for some purpose. The data replication tools, from IBM and other vendors, that capture Db2 for z/OS data changes and send them in near-real time to another location generally do their data change capture work by issuing requests to the log manager component of Db2 to retrieve data change information (this is done using a Db2 trace record, IFCID 306, that can be requested synchronously by a process such as a data replication tool). Especially when the volume of changes made to data in a replicated Db2 table is high, you REALLY want the Db2 log manager to be able to retrieve the requested data change information from the log output buffer in memory, versus having to read the information from the log data sets on disk. If the log output buffer is too small, you can see a high percentage of log read requests that require access to the log data sets on disk. The volume of such log data set read I/Os can be very high - like, thousands per second. That chews up CPU cycles and adds to data replication latency (this latency refers to the time between a change being made to data on the source Db2 for z/OS system and that change being reflected in the corresponding data at the replication target location) - both things you'd rather avoid.

How can you check on this? Again, go to a statistics long report generated by your Db2 monitor, and again go to the section under the heading, LOG ACTIVITY (or something similar to that). Check the fields labeled READS SATISFIED-OUTP.BUF(%) and READS SATISFIED-ACTV.LOG(%). What you want to see (what I'd certainly like to see) is a value for READS SATISFIED-OUTP.BUF(%) that is well north of 90, and - conversely - a value for READS SATISFIED-ACTV.LOG(%) that is in the single digits (ideally, low single digits). If you see a lower than desired value for the percentage of log reads satisfied from the log output buffer, make OUTBUFF larger if you can.

Can you make OUTBUFF larger, and if so, how high should you go?

The answer to the first part of that question (assuming that the value of OUTBUFF is not already at the maximum of 400000 KB) depends on the pressure (or lack thereof) on the real storage resource of the z/OS LPAR in which the Db2 subsystem of interest is running. My favorite indicator of real storage constraint is the LPAR's demand paging rate, available from a z/OS monitor-generated summary report. If the LPAR's demand paging rate is less than 1 per second, the real storage resource is not at all constrained, and you have (as far as I'm concerned) a green light for increasing the OUTBUFF value. If the LPAR's demand paging rate is over 1 per second, you might want to see if more memory can be configured for the system, or if some memory can be freed up (perhaps by shrinking a Db2 buffer pool that is larger than it needs to be), prior to making the value of OUTBUFF significantly larger than its existing value.

If the z/OS LPAR's real storage is not constrained (as described above), and you want to make a Db2 subsystem's OUTBUFF value larger, how high should you go? First of all, I would highly recommend setting OUTBUFF at least to the new-with-Db2-13 default value of 102400 KB. Should you go higher than that? Well, I would if the value of READS SATISFIED-OUTP.BUF(%) is less than 90. One thing to keep in mind here: OUTBUFF is not an online-updatable ZPARM. That means you have to recycle a Db2 subsystem (i.e., stop and restart it) in order to put a new OUTBUFF value into effect. In a Db2 data sharing system, that may not be a big deal (application work can continue to process on other members of the data sharing group as the one member is recycled), and even for some standalone Db2 subsystems there are regular opportunities to "bounce" the subsystem. On the other hand, at some sites where Db2 runs in standalone mode there are only a few times per year when a production Db2 subsystem can be recycled. If that's your situation, you might want to consider going to the maximum OUTBUFF value of 400000 KB (again, if the LPAR's memory is not constrained - and it's not if the LPAR's demand paging rate is less than 1 per second).

And that's what I have to say about OUTBUFF. Check yours, and check the relevant information in Db2 monitor-generated accounting and statistics reports to see if an OUTBUFF increase would be good for your system.