Sometimes, a DBA will email me a question about Db2 for z/OS, and I'll respond in a pretty comprehensive way, and I'll look at that outbound message and think to myself, "Hmm. I may have written most of a blog entry there." This is one of those cases. I recently got a couple of questions about Db2 database access threads, or DBATs (threads associated with access to a Db2 subsystem through the distributed data facility, aka DDF), and I think the questions and answers might be of interest to a lot of people in the Db2 community. I am therefore packaging them in this blog entry. Off we go:
Question: When does a DBAT go from being active to idle, so that it is subject to the idle thread timeout setting for the Db2 subsystem?
Answer: OK, the first thing to understand here is that a DBAT is always active. Even when it's in the DBAT pool, a DBAT is active - it's just in a so-called disconnected state. A DBAT is idle when it is in-use (i.e., paired with a connection, which happens when a transaction comes along by way of that connection) and it's not doing anything (or, more accurately, the transaction that is using the DBAT appears to Db2 to be doing nothing). It's normal for there to be some idle thread time for a DDF transaction - a client-side program issues a SQL statement, the result is sent back to that program, and the DBAT is briefly idle until the transaction's next SQL statement is issued. No big deal there. It's when idle time becomes really elongated that a DBAT might be affected by the Db2 subsystem's idle thread timeout value. That timeout value is specified via the IDTHTOIN parameter in the Db2 DSNZPARM module. The default value for IDTHTOIN is 120 seconds (check out the value on your Db2 system, and see if it's set to something other than 120).
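If you'd like to check the IDTHTOIN value from a query tool rather than by digging through the subsystem parameter job, one possibility - and this is just a sketch, assuming the SYSPROC.ADMIN_INFO_SYSPARM stored procedure is set up on your system; verify the exact parameter list and result set layout in the Db2 documentation - is something along these lines, issued from a client tool that can call stored procedures:

   -- Sketch only: retrieve subsystem parameter (ZPARM) values via the
   -- ADMIN_INFO_SYSPARM stored procedure (assumed to be available on the
   -- Db2 system). The first parameter identifies the data sharing member
   -- (null for a non-data-sharing subsystem); the other two are output
   -- parameters (return code and message). The procedure returns a result
   -- set of parameter names and values - look for the IDTHTOIN row.
   CALL SYSPROC.ADMIN_INFO_SYSPARM(?, ?, ?);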
Normally, at end-of-transaction there is a commit, and at that time the DBAT that had been used in processing the transaction goes back to the DBAT pool and the connection with which the DBAT had been paired goes back to an inactive state (inactive connections, a key contributor to Db2's connection scalability, are a server-side thing, invisible to a connected application - an inactive connection will go back to an active state when the next transaction associated with the connection begins). Can a DBAT in the pool be affected by the Db2 system's idle thread timeout value? No, but it is subject to a limit specified by another ZPARM parameter called POOLINAC (more on that to come).
Let's say that a DDF transaction starts but then never commits. That could happen because of a problem on the client application side, or it could be that the developer of the transaction program decided that a commit is not necessary because the transaction is read-only in nature (that in fact would be a not-good decision - every DDF transaction needs to commit, because even a read-only transaction will hold one or more table space or partition locks and one or more claims on database objects, and those locks and claims will not be released without a commit). Because the transaction has not committed, it is perceived by Db2 to be still in-flight, and for that reason the transaction's DBAT can't be separated from the associated connection and returned to the DBAT pool. The apparently (to Db2) in-flight transaction continues to do nothing, and the related DBAT remains idle for a longer and longer period of time. Eventually the IDTHTOIN limit will be reached for the idle thread (unless IDTHTOIN is set to 0, which means a DBAT can remain indefinitely idle), and Db2 terminates the DBAT and the associated connection.
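To make that point concrete, here is a minimal sketch of what a well-behaved read-only DDF transaction looks like from a SQL perspective (the table and column names are made up for illustration) - it's the COMMIT at the end that lets Db2 release the claims and parent locks, separate the DBAT from the connection, and return the DBAT to the pool:

   -- Hypothetical read-only unit of work coming in through DDF
   -- (MYSCHEMA.ORDERS and its columns are made-up names).
   SELECT ORDER_ID, ORDER_STATUS, ORDER_TOTAL
     FROM MYSCHEMA.ORDERS
    WHERE CUSTOMER_ID = 12345;

   -- Nothing was changed, but the SELECT still acquired one or more claims
   -- (and a table space or partition lock) that are released only at commit.
   -- With the commit done, the DBAT can go back to the pool and the
   -- connection can go inactive.
   COMMIT;

(Client-side data access frameworks running with autocommit in effect do this for you after each statement; the trouble tends to arise when autocommit is turned off and the application never issues a commit of its own.)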
So, to recap: first, a DBAT does not go from active to idle, because a DBAT is always considered to be active - it's an in-use DBAT, as opposed to an in-the-pool DBAT, that can be idle. Second, an in-use DBAT will typically have at least some idle time (time when it seems to Db2 that the transaction associated with the DBAT is not doing anything - or, at least, not doing any SQL-related thing); it's when that "nothing SQL-related is happening" time gets long that the Db2 idle thread timeout limit can be reached for a DDF transaction and its DBAT.
[By the way, I mentioned earlier that when Db2 terminates a DDF transaction and its DBAT due to the idle thread timeout limit being reached, Db2 also terminates the connection with which the DDF transaction had been associated. If you'd prefer for Db2 to preserve the connection while terminating the transaction and the DBAT, you can get that behavior thanks to an enhancement introduced with Db2 12 for z/OS. The enhancement is enabled via specification of EXCEPTION_ROLLBACK as an attribute of a MONITOR IDLE THREADS row in the Db2 table SYSIBM.DSN_PROFILE_ATTRIBUTES. You can find more information about this enhancement in the Db2 for z/OS online documentation, at https://www.ibm.com/docs/en/db2-for-zos/12?topic=threads-monitoring-idle-by-using-profile-tables.]
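For what it's worth, here is a sketch of what that profile setup could look like. I'm going from the documentation page linked above - treat the table and column names, attribute values, and filtering scope shown here as assumptions to verify against the documentation for your Db2 version:

   -- Sketch only: define a monitoring profile and give it the
   -- EXCEPTION_ROLLBACK flavor of idle thread monitoring. Verify names and
   -- values against the Db2 documentation before using anything like this.

   -- 1) Create a profile row (hypothetically scoped to one client auth ID;
   --    the PROFILEID value of 101 is just an example).
   INSERT INTO SYSIBM.DSN_PROFILE_TABLE
     (PROFILEID, AUTHID, PROFILE_ENABLED)
   VALUES (101, 'APPUSER1', 'Y');

   -- 2) Attach the idle thread monitoring attribute to that profile:
   --    ATTRIBUTE1 = 'EXCEPTION_ROLLBACK' (roll back the work and terminate
   --    the DBAT, but preserve the connection), ATTRIBUTE2 = idle time
   --    threshold in seconds (600 is just an example).
   INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
     (PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)
   VALUES (101, 'MONITOR IDLE THREADS', 'EXCEPTION_ROLLBACK', 600);

   -- The profile rows take effect when profiles are started (or re-started)
   -- via the Db2 -START PROFILE command.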
Question: We had a DDF transaction surge, and as a result the number of DBATs went way up. The surge passed, and several minutes later I checked on the number of DBATs and it was still way high. What's with that?
Answer: This has to do with POOLINAC, the ZPARM parameter I mentioned earlier. A DBAT that has gone back to the pool and then sits there unused for longer than the POOLINAC number of seconds (the default value is 120) becomes a candidate for termination - that's the mechanism by which the DBAT count drifts back down after a surge has passed. Two things to keep in mind about the way Db2 purges those "too-long-in-the-pool-without-reuse" DBATs (a bit of arithmetic after this list shows why the count can stay elevated for several minutes):
- Db2 went from checking every two minutes for DBATs with "too-long-in-the-pool-without-reuse" status to doing that check every 15 seconds.
- In a given purge cycle (again, now once every 15 seconds), Db2 will terminate a maximum of 50 DBATs in the "too-long-in-the-pool-without-reuse" category.
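Just to make the arithmetic concrete (these are made-up numbers): suppose a surge left a system with 1,200 more DBATs than the post-surge workload needs. Each of those now-surplus DBATs first has to sit in the pool, unused, for longer than the POOLINAC number of seconds (120, by default) before it is even a candidate for termination; after that, with at most 50 DBATs terminated per 15-second purge cycle, the excess comes down by at most 200 DBATs per minute. That works out to roughly 6 minutes, on top of the initial POOLINAC wait, before the DBAT count is back to its pre-surge level - so a still-elevated DBAT count several minutes after a surge has passed can simply be this purge mechanism working as designed.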
Thank you, Robert, for this useful post. I have followed your posts for years and appreciate that you are among the few who go well beyond the manuals. These two Q&As address my two misunderstandings about DBATs. I am testing high-performance DBATs, and I think that because of this, the savings are less interesting than the savings from CICS thread reuse, since the DBAT is "already" reused …
Actually, implementing high-performance DBAT functionality can be a significant performance-booster for a Db2 DDF workload (particularly from a CPU efficiency perspective). True, DDF threads (DBATs) are "already reused," even without utilization of hi-perf DBATs, in the sense that they tend to be reused from the DBAT pool versus Db2 having to create new DBATs to service new transactions (in a given time period, the number of DBATs created by Db2 will usually be way smaller than the number of re-uses of existing DBATs). However, hi-perf DBATs take thread re-use further: the RELEASE(DEALLOCATE) packages executed via a DBAT, and the table space or partition locks acquired in the execution of those packages, stay allocated to the DBAT until thread termination (generally after 200 re-uses of the thread). Without hi-perf DBAT functionality being in effect, those packages and associated "parent" locks are released at the end of each transaction. Because DDF transactions often have micro-service characteristics that include very low average in-Db2 CPU time per transaction, the proportionate cost of constantly releasing and reacquiring packages and parent locks can be fairly substantial, and eliminating that cost by way of hi-perf DBATs can shave quite a bit off of average per-transaction in-Db2 CPU time.
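One thing you could do to see where you stand, by the way, is check the RELEASE bind option of the packages your DDF transactions execute. This is just a sketch, using the NULLID collection (the default collection for the IBM data server driver / Db2 Connect packages) as an example - substitute the collection(s) relevant to your environment:

   -- Sketch: check the RELEASE bind option of packages in a collection
   -- (NULLID is used here as an example). A value of 'D' in the RELEASE
   -- column indicates RELEASE(DEALLOCATE) - execution of a package bound
   -- that way is what allows a DBAT to become a high-performance DBAT.
   -- The column name is quoted because RELEASE is a reserved word.
   SELECT COLLID, NAME, "RELEASE"
     FROM SYSIBM.SYSPACKAGE
    WHERE COLLID = 'NULLID'
    ORDER BY NAME;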
Robert
Regarding DBAT termination (as mentioned in the above post): Once created, why does a DBAT eventually need to be broken down (i.e., relegated to POOLINAC "status" and ultimately terminated)? What's the downside of keeping DBATs around indefinitely (i.e., keeping them inactive, or reusing them)?
My understanding is that it has to do with maximizing CPU and virtual storage efficiency for a DDF workload. Suppose a DDF transaction surge takes the number of DBATs up to 1500, and the surge passes and you're back to a transaction volume that can be well-served by 200 DBATs. Getting the number of DBATs trimmed down from 1500 to 200 reduces CPU overhead (managing more threads requires more CPU) and Db2's virtual storage footprint (that can be particularly important with regard to the use of common virtual storage areas such as ECSA, which gets some consumption related to DBATs). If you can automatically and non-disruptively get the number of DBATs reduced (after a transaction surge) to a smaller figure while still having enough for the DDF workload, why not do that? I believe that was the thinking of the DDF development team.
Robert