Sunday, December 29, 2013

DB2 for z/OS: Want to use High-Performance DBATs? Check your MAXDBAT Value

Of the features introduced with DB2 10 for z/OS, high-performance DBATs is one of my favorites. It enabled (finally) DDF-using applications to get the CPU efficiency benefit that comes from combining thread reuse with the RELEASE(DEALLOCATE) package bind option -- a performance tuning action that has long been leveraged for CICS-DB2 workloads. Implementing high-performance DBATs is pretty easy: in a DB2 10 (or 11) environment, when a package bound with RELEASE(DEALLOCATE) is executed by way of a DBAT (i.e., a database access thread -- the kind used for DRDA requesters that connect to DB2 via the distributed data facility), that thread becomes a high-performance DBAT (if it isn't one already). Before jumping into this, however, you should consider some things that are impacted by the use of high-performance DBATs. One of those things is the DBAT pool. That's where the MAXDBAT parameter of ZPARM comes in, and that's what this blog entry is about.

The value of MAXDBAT determines the maximum number of DBATs that can be concurrently active for a DB2 subsystem. The default value is 200, and at many sites that value, or one that's a little larger, has effectively supported a much greater number of DB2 client-server application connections (the default value for CONDBAT in ZPARM -- the maximum number of connections through DDF to a DB2 subsystem -- is 10,000). How so? Well, if your system is set up to allow for inactive connections (CMTSTAT = INACTIVE has been the default in ZPARM since DB2 V8), when a DDF transaction completes the associated connection will go into an inactive state (a very low-overhead transition, as is the transition back to the active state) and the DBAT used for the transaction will go into the DBAT pool, ready to service another transaction. That can happen because a "regular" DBAT is only associated with a particular DB2 connection while it is being used to execute a request from said connection. Because it is common for only a small percentage of DDF connections to a DB2 subsystem to be active (i.e., associated with in-flight transactions) at any given moment, a large ratio of connections to DBATs has historically been no problem at all.

Bring high-performance DBATs into the picture, and things change. In particular, a high-performance DBAT, once instantiated, will remain dedicated to the connection through which it was instantiated until it's been reused by 200 units of work (at which point it will be terminated, so as to free up resources allocated to the thread). That high-performance DBAT, therefore, will NOT go into the DBAT pool when a transaction using the thread completes. When a request associated with another connection comes in (i.e., from a connection other than the one through which the high-performance DBAT was instantiated), the high-performance DBAT won't be available to service that request. Some other DBAT will have to be used, and guess what? If that DBAT isn't a high-performance DBAT, it will become one if the package associated with the incoming request (and that could be a DB2 Connect or IBM Data Server Driver package) was bound with RELEASE(DEALLOCATE). The DBAT pool thus becomes progressively smaller as high-performance DBATs are instantiated. Know what else happens? The number of active DBATs goes up -- maybe sharply. Why? Because a "regular" DBAT is active only while it is being used to execute a DDF transaction. A high-performance DBAT, on the other hand, is considered to be active as long as it exists -- that will be 200 units of work, as mentioned previously, and when a high-performance DBAT is waiting to be reused, it's an active DBAT.

This last point -- about the number of active DBATs potentially rising sharply when high-performance DBATs are utilized -- is illustrated by some information I recently received from a DB2 professional. At this person's shop, high-performance DBATs were "turned on" for a DB2 subsystem (the PKGREL option of the -MODIFY DDF command can be used as a "switch," telling DB2 to either honor RELEASE(DEALLOCATE) for packages executed via DBATs -- thereby enabling instantiation of high-performance DBATs -- or not), and the number of active DBATs for the subsystem went from the usual 60 or so to about 700. Because the MAXDBAT value for the DB2 subsystem was already at 750, these folks didn't run out of DBATs, but the pool of "regular" DBATs got pretty small. In response to the big increase in active DBATs seen when high-performance DBAT functionality was enabled, the MAXDBAT value for the DB2 system in question was increased to 2000. Was this OK? Yes: When packages are bound or rebound in a DB2 10 for z/OS environment, almost all thread-related virtual storage goes above the 2 GB "bar" in the DBM1 address space, and that allows for a 5- to 10-times increase in the number of threads that can be concurrently active for the DB2 subsystem.

So, if you're thinking about using high-performance DBATs (and you should), check your subsystem's MAXDBAT value, and consider making that value substantially larger than it is now. Additionally, take steps to enable selective use of high-performance DBATs by your network-attached, DB2-accessing applications. For programs that contain embedded SQL statements and, therefore, have their own packages (e.g., DB2 stored procedures -- both external and native), use RELEASE(DEALLOCATE) for the most frequently executed of these packages. For the packages associated with DB2 Connect and/or the IBM Data Server Driver, use two collections: The default NULLID collection, into which you'd bind the DB2 Connect and/or IBM Data Server Driver packages with RELEASE(COMMIT), and another collection (named as you want) into which you'd bind these packages with RELEASE(DEALLOCATE). Then, by way of a data source or connection string specification on the client side, direct DDF-using applications to NULLID or the other collection name, depending on whether or not you want high-performance DBATs to be used for a given application.

To keep an eye on DBAT usage for a DB2 subsystem, periodically issue the command -DISPLAY DDF DETAIL. In the output of that command you'll see a field, labeled QUEDBAT, that shows the number of times (since the DB2 subsystem was last started) that requests were delayed because the MAXDBAT limit had been reached. If the value of this field is non-zero, consider increasing MAXDBAT for the subsystem. You might also want to look at the value of the field DSCDBAT in the output of the -DISPLAY DDF DETAIL command. This value shows you the current number of DBATs in the pool for the subsystem. As I've pointed out, maintaining the "depth" of the DBAT pool as high-performance DBAT functionality is put to use might require increasing MAXDBAT for your DB2 subsystem.

DDF activity can also be tracked by way of your DB2 monitor. I particularly like to use a DB2 monitor-generated Statistics Long Report to see if the connection limit for a DB2 subsystem (specified via the CONDBAT parameter in ZPARM) is sufficiently high. In the section of the report under the heading "Global DDF Activity," I'll check the value of the field labeled CONN REJECTED-MAX CONNECTED (or something similar -- fields in reports generated by different DB2 monitors might be labeled somewhat differently). A non-zero value in this field is an indication that the CONDBAT limit has been hit, and in that case you'd probably want to set CONDBAT to a larger number to allow more connections to the DB2 subsystem.

So there you go. Using high-performance DBATs can improve the CPU efficiency of your DB2 for z/OS client-server workload, but if you do leverage high-performance DBAT functionality then you might need to boost the DBAT limit for your DB2 subsystem in order to maintain the depth of your DBAT pool, because as high-performance DBATs increase in number, pooled DBATs decrease in number (unless you've upped your MAXDBAT value to compensate for this effect). Boosting MAXDBAT in a DB2 10 (or 11) environment is OK, as thread-related virtual storage in such an environment is almost entirely above the 2 GB "bar" in the DBM1 address space (assuming that packages have been bound or rebound with DB2 at the Version 10 or 11 level). Of course, you need real storage to back virtual storage, so if you increase the MAXDBAT value keep an eye on the z/OS LPAR's demand paging rate and make sure that this doesn't get out of hand (if the demand paging rate is in the low single digits or less per second, it's not out of hand).

27 comments:

  1. Robert, please elaborate on how it's possible that active DBATS "went from the usual 60 or so to about 700. Because the MAXDBAT value for the DB2 subsystem was already at 750 . . .".

    High Performance DBATs are limited to half of MAXDBATs, so in this case 750/2 = 365, so I would expect worst case 365 + 60 = 425, so much less than 700.

    Did they ignore the documented recommendations to restrict high performance DBATs to high volume, light transaction connections that disconnect when they don't have SQL work?

    Did they ignore, by thread reuse and RELEASE(DEALLOCATE) analogy, the CICS-DB2 recommendations to limit the the use of CICS protected threads to high volume, light transactions?

    But even if they did, the half of MAXDBAT cap should have limited the high performance DBATs to a maximum of 375.

    ReplyDelete
  2. Robert, I should have written 750/2 = 375, so I would expect worst case
    375 + 60 = 435, so much less than 700.

    ReplyDelete
    Replies
    1. You state that "High Performance DBATs are limited to half of MAXDBATs." On what are you basing this claim? That is NOT how high-performance DBATs were implemented. I know for a fact that it's possible for 100% of the DBATs in a DB2 for z/OS system to be of the high-performance variety.

      You also state that restricting "high performance DBATs to high volume, light transaction connections that DISCONNECT when they don't have SQL work" (emphasis added by me) is a "documented recommendation." To what documentation are you referring here? I ask because it is NOT important for a network-attached, DB2-accessing application to DISCONNECT following transaction completion if it's to benefit from the use of high-performance DBATs. This is an important point because it's VERY COMMON for network-attached, DB2-accessing applications to connect to a DB2 for z/OS location and then STAY CONNECTED to that DB2 location for a long time. High-performance DBATs can be used to very-much positive effect with such applications. What's important in such situations is that the application's transactions complete in a "clean" fashion. By that I mean that they complete and commit with no "hanging resources" (such as open WITH HOLD cursors, non-dropped declared temporary tables, etc.). When an application's transactions complete cleanly, it's totally OK to use high-performance DBATs for that application, even if it is characterized by long-duration connections to the DB2 server.

      Your point about frequently executed "light" transactions being good candidates for high-performance DBATs is a good one (assuming that by "light" you mean relatively low in-DB2 CPU time, on average, per transaction). For such transactions, the cost of constantly releasing and reacquiring the same resources (such as table space locks and package sections) affected by the RELEASE bind option is proportionately higher than it would be for a "heavy" transaction, so you can get some real bang for your buck in using high-performance DBATs for transactions of that type. For a transaction with a higher average in-DB2 CPU time, high-performance DBATs won't move the server-side CPU consumption needle much, so they should just use "regular" DBATs. Similarly, transactions that are infrequently executed won't use high-performance DBATs to much benefit. They should also use "regular" DBATs.

      Robert

      Delete
  3. Robert, thanks for your answers. The basis for my claim are some IBM presentations including page 41 of the March 13, 2012, SHARE 2012
    Session 10996, "DB2 for z/OS Distributed Access – Best Practices and Updates" by Adrian Burke, which states for DB2 10:

    "If # of Hi-Perf DBATs exceed 50% of MAXDBAT threshold

    • DBATs will be pooled at commit and package resources copied/allocated as RELEASE(COMMIT)"

    Which I interpreted, based on the assumption that transactions are committed, that MAXDBAT/2 was the effective cap. Hence my request for elaboration of your scenario. Have I misinterpreted the bullet? Is the actual implementation different?

    ReplyDelete
    Replies
    1. The information in that presentation bullet is not correct. High-performance DBAT functionality was implemented without a 50%-of-MAXDBAT limitation.

      Sometimes, early presentations about a new release of DB2 contain some items of information related to code that is not yet finalized in terms of implementation. Such informational items may describe design ideas that end up not being implemented. It is possible for an informational item related to a not-implemented design idea to persist for a while in various iterations of presentation material. That may be what happened in this case.

      Delete
    2. Robert, word is sometimes slow to spread if explicit corrections are not explicitly made in later material, even though there are disclaimers.

      This March 20, 2013 IDUG India presentation by IBM has the same not-implemented cap:

      DB2 Migration Planning and Best Practises
      Session Number: 3011
      Primary Presenter: Chris Crone [Ibm]
      Time(s):
      Wed, Mar 20, 2013 (04:00 PM - 05:00 PM)
      Wed, Mar 20, 2013 (05:00 PM - 06:00 PM)

      http://www.idug.org/e/in/eid=31&req=info&s=1586&all=1

      Delete
    3. Acknowledged. It can take a while to get things like this cleaned up.

      Robert

      Delete
    4. Robert, I noticed that page 41 of the Adrian Burke presentation says, "After the Hi-Perf DBAT has been reused 200 times
      * DBAT will be purged and client connection will then go inactive." Page 18 of 1Q2013 Technology Summit Session 1 Track 2 presentation says "Connection turns inactive after 200 times to free up DBAT."

      While page 30 of your recent presentation only references the fate of the DBAT, "After being used for 200 units of work, high-performance DBAT will be terminated to free up resources"

      What happens to the client connection, what impact does it have and is there a way to monitor its occurrence? Which of the wordings is most complete and accurate? Or does some other wording work better?

      Delete
    5. A client connection going inactive is not something about which you should be concerned. It happens all the time in a DB2 DDF application environment. Business as usual. No impact on the client side of things -- the client is still effectively connected to the DB2 for z/OS subsystem. On the DB2 server side, there is very little impact, as 1) an inactive connection takes up only a small amount of virtual storage, and 2) the CPU overhead involved in moving a client connection to the inactive state (and back to the active state when a request from the client needs to be processed) is very low.

      As stated, an inactive client connection can be very quickly and cheaply restored to an active state, and this will happen when the client next sends a request to the DB2 subsystem. In returning to the active state, the connection will be provided with a DBAT to get the DB2 request processed (and if the request is associated with a package bound with RELEASE(DEALLOCATE), the DBAT will become a high-performance DBAT; otherwise, when the request has been processed the DBAT will go back into the DBAT pool and the client connection will again go into an inactive state, remaining in that state until anther request from the client is to be processed.

      I think that the wording in Adrian's presentation that you've cited is solid. It's a good, net description of what happens.

      Robert

      Delete
    6. Rober, again, thank you for your answers.

      Delete
    7. Robert, I meant "Robert" in my most recent comment. Sorry.

      Delete
  4. Robert, thank you for explaining about the presentation. Too bad that it wasn't implemented, I thought it was an excellent idea and safeguard. Do any other things like this come to mind (presented after GA but not implemented or backed out)?

    I was also wrong earlier to write "documented recommendations" and wrong not to limit my comments to Connection Pooling, with an alternative being commit oriented Connection Concentrator.

    Looking again at pages 6-10 of the the Adrian Burke presentation that mentioned previously, I see that I read in my own interpretation and recommendations into the text.

    ReplyDelete
    Replies
    1. No other situations such as this one come to my mind (referring to pre-GA code changes not being reflected in post-GA presentations).

      Robert

      Delete
  5. Robert, in a recent presentation of yours you wrote:

    "Best uses:
    –Higher-volume transactions – especially those with lower SQL
    statement execution cost (for these transactions, CPU cost of release
    and reacquisition of resources at COMMIT is proportionately higher)
    –For batch programs that issue a lot of commits"

    Which applies for RELEASE(DEALLOCATE) usage in general.

    "Have higher-volume client-server transactions use that second collection to gain high-performance DBAT performance benefits (collection name can be specified as a data source property on the client side)"

    Which would work best for well-design and implemented Connection Pooling and Connection Concentrator configurations.

    When you wrote:

    "Because high-performance DBATs do not go back into the DBAT pool, you may want to increase value of MAXDBAT in ZPARM to compensate"

    You use of "may want to" suggests that you weren't expecting active DBATs to increase from 60 to 700 in most places.

    ReplyDelete
    Replies
    1. An increase from 60 to 700 is greater than I would expect, but results will be different at different sites due to DDF workload differences. As I get more in the way of data points I may be able to get a better idea as to what a "typical" increase in active DBATs is, once high-performance DBAT support is activated.

      Robert

      Delete
    2. Robert, thank you for all your answers, especially pointing out that you can have up to MAXDBAT high performance DBATs.

      Sorry for all the typos and errors in my comments.

      Delete
  6. Robert - in a CMTSTAT=INACTIVE setting, is there a periodic clean-up of the INACTIVE connections (i.e, if the app servers do not close out connections/Websphere settings are not robust enough to clean the connections in a timely manner)

    ReplyDelete
    Replies
    1. With respect to connections in an inactive state, the answer to your question is, "not really." If a connection is active (and, therefore, associated with a DBAT), and it sits idle for X number of seconds (whatever was specified for IDTHTOIN in ZPARM) then DB2 can terminate the DBAT and connection; however, if a connection is in the inactive state then it will remain in that state indefinitely if the connection is not closed or terminated on the client side (or if the application server is not shut down). If a connection is inactive and communication with the client has been disrupted, that situation will be detected via TCP/IP KeepAlive functionality and DB2 will get rid of the connection (if the ZPARM TCPKPALV is set to n seconds then the TCP/IP stack on the z/OS LPAR will attempt to reach out to the client every n seconds, and if it gets no response back it will inform DB2 of this and DB2 will get rid of the connection).

      So, if a client connects to DB2 and does not disconnect (and does not shut down, so it continues to respond to TCP/IP KeepAlive probes) then DB2 connections in an inactive state will persist indefinitely. That's usually not going to be a big deal, as one DB2 subsystem can support up to 150,000 connections, and an inactive connection has a very small virtual storage footprint in the DB2 DDF address space.

      Robert

      Delete
  7. Hi Robert, I am running my DB2 Connect on the Linux system connecting to DB2 on mainframes. If I run a db2 batch program on linux, a new thread are created on mainframes after every commit point.

    If I take ten commits the there are 10 threads on Linux but when i run the same on mainframes I get only single thread.

    Please could you let me know how to run application programs under single thread on distributed systems??

    ReplyDelete
    Replies
    1. I assume that you are talking about a batch program that runs under Linux and accesses a DB2 for z/OS-managed database. If that's the case, consider limiting the application in question to a single connection to DB2 for z/OS and a single DB2 thread (i.e., a single DB2 DBAT) by way of the DB2 profile tables. This capability is described in the DB2 for z/OS Knowledge Center on the Web (see http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.perf/src/tpc/db2z_monitorthreadsconnectionsprofiles.dita).

      Robert

      Delete
  8. Hello Robert!

    First, let me thank you for your detailed, explained and very understandable blog posts!

    I read somewhere that the storage consumption of DBATs in the DBM1 address space is about ~200KB and that inactive connection storage consumption is about ~7.5KB in the DIST address space.

    Can you verify this information? Can i assume it's correct?

    Thanks!
    Mark

    ReplyDelete
    Replies
    1. You may have read that in an IBM "redbook" titled, "DB2 9 for z/OS: Distributed Functions" (downloadable from http://www.redbooks.ibm.com/abstracts/sg246952.html?Open). On page 23 of that document, in the first paragraph, you'll see these words: "Each DDF connection only consumes approximately 7.5 K of memory inside the DDF address space, whereas each active DBAT consumes approximately 200 K of memory at a minimum, depending on SQL activity."

      Robert

      Delete
    2. Thanks a lot :)

      Delete
  9. Hi Robert
    I have a situation where CONDBAT=300 and INACONN reached upto 297 thus causing new connections to fail. we found that suddenly one of the application making large no of connections to Db2 . I want to understand how these inactive connections get terminated by DB2 ? or how to terminate them explicitly if needed ? or application should write logic to close the conenction properly to avoid this situation?

    ReplyDelete
    Replies
    1. Hello, Saidulu.

      First of all, a CONDBAT value of 300 is - for a production Db2 for z/OS subsystem, at least - very small. Whatever the MAXDBAT value is for that subsystem, the CONDBAT would reasonably be at least 4-5 times larger. Why? Because at any moment in time most of the connections that network-connected applications have with the Db2 subsystem will not be actively in-use for an in-flight transaction. The other connections will be in an inactive state from the Db2 perspective.

      Active versus inactive connections are a Db2 thing - it's a feature that supports Db2's scalability as a data server for client-server applications. From an application perspective, once a connection has been established to the Db2 subsystem (what could be considered a logical connection, from the Db2 perspective), it's there until it's terminated (more on that in a moment), and as long as it's there the application has no clue (and no need to know) that the connection is in an active or an inactive state on the Db2 side.

      So, let's say that a certain connection from an application to a Db2 for z/OS subsystem is an inactive state (again, that's invisible to the application). The application sends a transaction (i.e., the first Db2-targeting SQL statement in a transaction) to Db2 via the connection. This incoming SQL request will cause Db2 to move the connection from an inactive to an active state and pair the now-active connection with a DBAT (i.e., a DDF thread). When the transaction completes, the connection will be separated from the DBAT and switched from the active to the inactive state, and the DBAT will go back into the DBAT pool, ready to be paired with some other connection to service some other transaction.

      Having a large number of inactive connections is no big deal. An inactive connection has a very small real and virtual storage "footprint" on the Db2 for z/OS system, and switching a connection from the inactive to the active state and back involves very little cPU overhead.

      Db2 will generally not terminate a connection. Obviously, if you shut down a Db2 subsystem then connections to that subsystem will be terminated as part of -STOP DB2 processing. If a DDF transaction appears to Db2 to be in a hung state for the period of time specified via the ZPARM parameter IDTHTOIN (idle thread timeout), Db2 will terminate the transaction, but even then you have a choice as to whether the application connection through which the transaction got to Db2 will be terminated or preserved. You can also use TCP/IP commands on the z/OS side to terminate connections to a Db2 subsystem.

      Usually, a connection established by an application to a Db2 subsysystem will be terminated only when the application terminates the connection (that could also be done by the application server in which the application is running).

      Robert

      Delete
    2. I faced a similar situation, there are 49 inactive connections present from a single application. But even after the recycle of DB2, those 49 connections are seen, just the LUW unit id is different. How does that work ? Dont the inactive connections get terminated during DB2 recycle ?

      Delete
    3. That is a strange situation you have there. When a STOP DB2 command is issued for a Db2 for z/OS subsystem, that will also drive a STOP DDF. As part of that shutdown processing, Db2 will shut down the SQL listener. Db2 will then close all inactive connections. If you stopped and then restarted a Db2 for z/OS subsystem and you see, after the restart. some inactive connections, the only explanation I'd have for that is that the application in question reestablished its connections to the subsystem after the subsystem was restarted. Those reestablished connections would be in an inactive state if they are not being used for in-flight transactions.

      If you're saying that inactive connections were not terminated when the Db2 subsystem was shutdown (prior to the subsystem being restarted), something would appear to be not right. You might need to open a case with IBM support to see if that team can figure out how the Db2 subsystem was able to shut down without terminating inactive connections.

      Robert

      Delete