Monday, September 30, 2019

Db2 12 for z/OS - Let's Talk About MAXDBAT in ZPARM

"ZPARMs" is a reference to the parameters in the Db2 for z/OS module called DSNZPARM - these are essentially the configuration parameters for a Db2 subsystem. Two of the ZPARMs that are closely related to each other are CONDBAT and MAXDBAT. CONDBAT specifies the maximum number of connections that network-attached applications (i.e., applications that access the Db2 system via the Db2 distributed data facility, aka DDF) can have with the Db2 system at any one time. MAXDBAT is, essentially, the number of those application connections that can be "in-use" at one time. ["In-use" has to do with the processing of transactions that are originated by the connected applications - for such a transaction to be processed by Db2, the associated connection has to be paired with a type of Db2 "thread" called a database access thread, or DBAT (think of a DBAT as a DDF thread).]

As a rule, you don't want a Db2 system's CONDBAT limit to be reached. Why? Because when that happens, the next connection request from a DDF-using application will fail with an error code. How can you tell if a Db2 system's CONDBAT limit has been reached (aside from noting that an application encountered a connection failure)? You can see that in a statistics detail report generated by your Db2 monitor (depending on the monitor, that might be called a statistics long report). In that report, you'd see a section with DDF-related information, and a field labeled CONN REJECTED-MAX CONNECTED (the second line in the sample below), or something similar:

GLOBAL DDF ACTIVITY          QUANTITY
---------------------------  --------
DBAT/CONN QUEUED-MAX ACTIVE      0.00
CONN REJECTED-MAX CONNECTED      0.00

If the CONN REJECTED-MAX CONNECTED value is non-zero, the CONDBAT limit for the Db2 system was hit during the time interval captured in the report. If I saw that CONDBAT was hit on my Db2 system, I'd increase the CONDBAT value. I wouldn't make the CONDBAT value way higher than it needs to be (it can be as high as 150,000 for a Db2 subsystem), but neither would I be stingy in this department. An application connection that is not in an "in-use" state is placed in an inactive status by Db2; an inactive connection has a very small virtual storage footprint, and snapping it back to active status when needed is a very low-overhead operation.

Simple enough, but what about MAXDBAT? Do you also want that limit to not be reached? The answer to this question is a little less straightforward than in the CONDBAT situation. I'd say that in most cases you'd want the MAXDBAT value to be high enough so as not to be reached, but that's not necessarily so in all cases. First, how can you see that MAXDBAT has or has not been reached for a Db2 system? One way would be to check the aforementioned statistics detail (or statistics long) report generated by your Db2 monitor - the relevant field is DBAT/CONN QUEUED-MAX ACTIVE (the first line in the sample below):

GLOBAL DDF ACTIVITY          QUANTITY
---------------------------  --------
DBAT/CONN QUEUED-MAX ACTIVE      0.00
CONN REJECTED-MAX CONNECTED      0.00

A non-zero value in the DBAT/CONN QUEUED-MAX ACTIVE field indicates that the MAXDBAT limit was reached for the Db2 subsystem during the time interval captured in the report. Another way to check on this would be to issue the Db2 command -DISPLAY DDF DETAIL. In the output of that command you'd see this field and an accompanying value:

QUEDBAT=      0

The QUEDBAT value indicates the cumulative number of times that the MAXDBAT limit was hit for the Db2 subsystem since DDF was last started (which probably would be when the Db2 subsystem was last "bounced," or stopped and restarted).
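Because QUEDBAT is cumulative, a single reading does not tell you when the limit was hit; comparing two captures of the command output does. Here is a minimal Python sketch of that comparison. It assumes you have already saved the -DISPLAY DDF DETAIL output as text by some means of your own, and the function names (parse_ddf_counters, maxdbat_hits_between) are hypothetical helpers, not part of any Db2 tooling. The only output field it relies on is the QUEDBAT= field shown above.

```python
import re
from typing import Dict

# Matches NAME= <integer> pairs such as "QUEDBAT=      0".
_FIELD = re.compile(r"\b([A-Z]+)=\s*(\d+)")


def parse_ddf_counters(display_output: str) -> Dict[str, int]:
    """Return every NAME=<integer> pair found in the saved command output."""
    return {name: int(value) for name, value in _FIELD.findall(display_output)}


def maxdbat_hits_between(earlier: str, later: str) -> int:
    """Number of times MAXDBAT was hit between two captures of the output.

    QUEDBAT counts from the last DDF start. If the later value is smaller
    than the earlier one, assume DDF was restarted in between, so the later
    value is itself the count since that restart.
    """
    before = parse_ddf_counters(earlier)["QUEDBAT"]
    after = parse_ddf_counters(later)["QUEDBAT"]
    return after - before if after >= before else after


if __name__ == "__main__":
    # Illustrative values only; these are not from a real system.
    morning = "QUEDBAT=      0"
    evening = "QUEDBAT=     17"
    print(maxdbat_hits_between(morning, evening))
```

Capturing the output at regular intervals and differencing it this way gives a rough picture of when DBAT queueing occurred, which a lone cumulative value cannot.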

What happens when MAXDBAT is hit? In that case, the request for a DBAT (required in order to service a transaction request coming via an application connection) is queued. No error, at least not right away, but if the request waits too long then the application server might time the transaction out, and we likely don't want that; so, you don't want MAXDBAT to be hit, right? If it is hit, you want to increase the MAXDBAT value, right? Probably right, but not always. Here's one of the "not always" scenarios: suppose you have a DDF workload characterized by the occasional surge of transaction volume (such as might occur during a certain part of a month). You could make the MAXDBAT value large enough to accommodate that surge, and that would be OK if the Db2 system's processing capacity is sufficient to effectively process the surge of transactions. What if that is not the case? What if the surge of transactions, if allowed to flow right into the Db2 system, would max the system out, taking the utilization of the Z server's general-purpose "engines" (processors) to something close to 100%? The z/OS system won't fail (z/OS is famous for staying up in extreme processing situations), but work could get really backed up, so much so that response times could soar, leading to performance complaints from application users (and maybe to monetary penalties if a service level agreement is violated).

If you have a situation in which a DDF transaction surge overwhelms a server's processing capacity, you might be better off with a MAXDBAT value that induces some transaction queueing during surges. If, for example, the number of concurrently-executing DDF transactions on a system is almost always below 2000, and surges above that have severely impacted response times, a MAXDBAT value of 2000 could be beneficial. Yes, if a surge comes along and MAXDBAT is hit then transactions will start queueing up, waiting for a DBAT to come free (when an in-process transaction completes), but the system, shielded from the negative impact of transaction overload, will continue to process work quickly and efficiently. That, in turn, will cause in-use DBATs to free up quickly, and that could mean that the time a DBAT-awaiting transaction spends in the queue will be very small. While there would be some response time elongation due to some transactions having to queue up for a DBAT, the performance impact could be reduced versus the "let 'em all in when they arrive" situation.
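To judge where a figure like 2000 sits relative to your normal workload, you need some estimate of how many DDF transactions are typically in flight at once. One rough way to get that, sketched below in Python, is Little's law: average concurrency equals arrival rate multiplied by average time in the system. This is my own illustration, not something Db2 computes for you. The function name and the sample numbers are hypothetical, and the estimate is an average over the interval, not a peak. It also assumes that a DBAT is held only for the duration of a transaction, which is not the case for high-performance DBATs.

```python
def estimated_concurrent_dbats(transactions: int,
                               interval_seconds: float,
                               avg_elapsed_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate * average elapsed time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    arrival_rate = transactions / interval_seconds  # transactions per second
    return arrival_rate * avg_elapsed_seconds


if __name__ == "__main__":
    # Hypothetical workload: 3,600,000 transactions in one hour,
    # each taking 0.5 seconds on average.
    estimate = estimated_concurrent_dbats(3_600_000, 3600, 0.5)
    maxdbat = 2000  # the example MAXDBAT value used in the text
    print(f"estimated average in-use DBATs: {estimate:.0f}")
    print(f"headroom below MAXDBAT: {maxdbat - estimate:.0f}")
```

The transaction count and average elapsed time would come from your own monitor data for the interval of interest. An average well under the limit says nothing about short surges, so treat the result as a starting point for choosing a value, not as the answer.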

Bottom line: if a Z server has the processing capacity to efficiently handle "peak of the peak" DDF transaction volumes, make the MAXDBAT value high enough to avoid transaction wait-for-DBAT queueing. If, on the other hand, the occasional really-big DDF transaction surge causes the system to get severely bogged down so that transaction service times shoot way beyond the acceptable level, you could actually improve performance and throughput via a MAXDBAT value that maintains efficient processing by forcing a degree of transaction queueing. Note, too, that if you want to induce surge-time queuing only for transactions associated with a particular DDF-using application or applications, you can accomplish that via the Db2 profile tables (SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES), which enable the setting up of DBAT limits (and/or connection limits and/or idle thread timeout values) in a granular fashion (as described in an entry I posted to this blog a couple of years ago).

I hope that this information will be helpful for you.

4 comments:

  1. I am now thinking... when you retroactively realize MAXDBAT was reached (by reviewing system statistics)... how can one effectively review the accounting history for that moment in time to confirm which applications were running at that moment? This would help confirm which application contributed the most to using all those DBATs...

    1. You could use your Db2 monitor (IBM's is called OMEGAMON Performance Expert for Db2 on z/OS) to generate an accounting long report for the time period of interest. In doing that, you can tell your Db2 monitor to include in the report only activity related to the DRDA connection type - that way you'll only be looking at activity associated with DDF-using applications.

      Robert

    2. I do use OMEGAMON. I have all the accounting loaded into a perf db table of DB2PMFACCT_GENERAL. And for the time period in question... I cannot find any DBAT thread history. Am I looking wrong?

    3. This can be a little tricky. A Db2 monitor accounting report doesn’t provide a “high water mark for DBATs used” figure. It will show you, at whatever aggregation level you’ve chosen (e.g., at the primary auth ID level), things like number of commits (basically, that’s the number of transactions) and average class 1 elapsed time (average transaction duration), and with those items of information you might be able to make inferences as to which DDF-using applications probably used the largest numbers of DBATs at one time.

      Another thought: via the Db2 for z/OS profile tables, you can specify warning thresholds for DBAT usage for different DDF applications – Db2 would issue messages when these thresholds are crossed, and with some tries and adjustments you might be able to see which DDF-using apps are really going high on concurrent DBAT usage.

      The IBM product Db2 AI for z/OS has, I believe, the ability to show high-water-mark DBAT usage at the auth ID and/or IP address (of app server) level.

      If, when MAXDBAT was hit for this subsystem, the general-purpose engines of the z/OS LPAR were not overly busy (i.e., if average MVS busy for those engines, per an RMF CPU Activity report for the LPAR, was less than 80%), consider raising your MAXDBAT value (maybe from 400 to 450 or 500) – that would allow more in-flight DDF transactions at one time, but you would have verified that the system has the processing capacity to handle that situation. If, on the other hand, the z/OS LPAR’s general-purpose engines were really busy at the time you hit MAXDBAT (e.g., if average MVS busy for those engines was over 90%), you might want to add processing capacity for the LPAR before taking MAXDBAT to a higher level.

      Also, note that the use of high-performance DBATs can lead to more DBATs being in an in-use state at one time, as explained in the blog entry at https://robertsdb2blog.blogspot.com/2013/12/db2-for-zos-want-to-use-high.html. Implementing hi-perf DBAT functionality often necessitates an increase in a Db2 subsystem’s MAXDBAT value.

      Robert
