Tuesday, December 13, 2011

DB2 for z/OS: Driving Thread Reuse

I recently provided some assistance to an organization that was experiencing throughput issues in their CICS-DB2 online transaction environment. This company was migrating a large application from a non-IBM, non-relational DBMS to DB2 for z/OS, and the initial cut-over to DB2 placed a significant load on a mainframe system that didn't have a whole lot of spare cycles to begin with. The transaction slowdowns were most pronounced during periods of very high CPU utilization. In such situations, there are some mitigating actions that one can take if one has the time to see them through. For example, a business can add processing capacity to a CPU-constrained system, but the procurement cycle might be rather drawn out in the case of a large-scale, enterprise-class server. In the particular case I've described, application re-architecture work would likely yield major CPU savings, as phase one of the database migration project was not exploitative of the set-oriented nature of SQL (what we'd think of as a multi-row result set is presently being retrieved via a series of singleton SELECTs -- much more costly versus OPEN CURSOR/FETCH/FETCH/FETCH...). The application development leads at the company of which I'm writing are aware of this, but the SQL-exploiting phase of the application migration (as opposed to the "compatibility" phase) will take a while to implement.

In the meantime, there's this response time problem that needs attention now. What can be done quickly? A clue as to the appropriate performance tuning action came from side-by-side analysis of CICS and DB2 monitor information. The CICS monitor was reporting elevated wait-for-DB2 times during periods of degraded throughput, while at the same time the DB2 monitor was showing very good performance. Such a situation is often indicative of a problem in between CICS and DB2, and that is likely to have something to do with CICS transactions waiting to get the threads that are necessary for DB2 data access. The CICS monitor sees wait-for-thread time as wait-for-DB2 time, whereas a DB2 monitor doesn't "see" the CICS transaction until it gets a thread (and things might move along lickety-split once a thread is acquired -- thus the picture of good performance presented by the DB2 monitor).

Thankfully, for a problem of this nature there is a corrective action that both cuts down on DB2 thread create/terminate activity AND boosts CPU efficiency. It's called thread reuse. That's a relatively old concept, but it gets lost in the shuffle sometimes and isn't given much thought until a situation arises in which it's needed. Thread reuse has relevance in a transaction processing context, as it involves multiple transactions (i.e., multiple units of work, with the ratio of transactions to DB2 units of work typically being 1:1) accessing DB2 data, one after another, using the same DB2 thread; so, that thread which is reused persists across commits, as opposed to being terminated at end-of-transaction (the thread used by a batch job to access DB2 data is by its nature persistent until end-of-job: the batch job might commit ten thousand times in the course of its execution, but it will keep using the same thread).

When the DB2-accessing transactions are managed by CICS, there are basically two ways to drive up the rate of thread reuse. The first of these, which I don't much like, requires restricting the number of threads available between a CICS region and a DB2 subsystem. That restriction on the supply of threads is needed because this approach depends on transaction queuing to drive thread reuse. Here's what I mean by that: typically, when a CICS-DB2 transaction associated with DB2 plan ABC completes, the thread used for the execution of the transaction will be immediately terminated unless it is reused. The thread will be reused IF a transaction associated with the same plan is queued, waiting for a thread (that queued transaction will also have to be associated with the same CICS DB2ENTRY -- if any -- as the transaction that last used the thread). Wait-for-thread transaction queuing will only happen if the number of threads between the CICS region and the DB2 subsystem has reached its user-specified limit. In that case, new threads can't be created, so incoming transactions wait until in-use threads are freed up for reuse. [Note: I am using CICS-DB2 interface specification terms, such as DB2ENTRY, that are associated with CICS's Resource Definition Online (aka RDO) functionality. Some years ago, RDO replaced the old way of defining the interface between a CICS application-owning region (AOR) and a DB2 subsystem: a macro called the RCT (short for Resource Control Table). Information on DB2ENTRY, DB2CONN, and other DB2-related CICS resource definitions can be found in a manual called the CICS Transaction Server for z/OS DB2 Guide.]

Because I don't like to see CICS transactions queued up waiting for DB2 threads to come free, I favor an alternative means of promoting thread reuse: exploiting protected entry threads. I like this approach because it doesn't depend on forcing wait-for-thread transaction queuing. Unlike non-protected threads (those being pool threads or non-protected entry threads), a protected thread (a type of entry thread) will stick around for an average of 45 seconds following completion of the transaction that last used the thread (this time period is based on the setting of the CICS-DB2 thread purge cycle, which has a default value of 30 seconds -- a protected thread will be terminated if it is not reused within two of these purge cycles). If a transaction associated with the same DB2ENTRY and the same DB2 plan comes along in that 45-second period, the protected thread will be reused (keep in mind that several transactions can be defined for one DB2ENTRY by way of a wildcard character in the transaction name, and additional transactions can be associated with a DB2ENTRY via a DB2TRAN resource definition).

The approach to protected thread utilization that I recommend involves starting with the most frequently executed CICS-DB2 transactions. Set up a DB2ENTRY (or DB2ENTRYs) for these, with a THREADLIMIT value that is greater than zero and a PROTECTNUM value that is equal to the THREADLIMIT value (you could make PROTECTNUM smaller than THREADLIMIT, but I don't see much value in having entry threads that aren't protected). Also specify THREADWAIT(POOL) for the DB2ENTRY(s), so that transactions will overflow to the pool instead of waiting if the thread limit for the DB2ENTRY has been reached. Start with a moderate number of protected threads for an entry, and monitor the impact on thread reuse. If the thread reuse rate is not what you want it to be, increase the number of protected threads for one or more DB2ENTRYs. Note that in adding entry threads (whether protected or not) you may need to increase the value of TCBLIMIT for the CICS region's DB2CONN resource definition -- this to help ensure that you don't unintentionally hit the TCBLIMIT on the number of tasks using DB2 threads between the CICS region and the target DB2 subsystem (example: specifying 100 entry threads for a CICS region won't help you if the value of TCBLIMIT for the region is 50).

So, how do you monitor CICS-DB2 thread reuse in your DB2 environment? I like to use the accounting detail report generated by a DB2 monitor, with the data in the report grouped either by connection type (so that you'll see a report reflecting activity for your overall CICS-DB2 workload) or by connection ID (this would show activity at the CICS AOR level). [Be aware that, based on the particular DB2 monitor in use at your site, what I call an accounting detail report might be referred to as an accounting long report, and what I call data grouping (a SYSIN-specified option in the batch job that generates the report) may be termed data ordering.]

In the accounting detail report, look for a set of fields under the heading NORMAL TERM (or something like that -- headings and field names differ somewhat from one monitor to another). This is where thread reuse data is found. The field DEALLOCATION under the heading NORMAL TERM shows the number of times that a thread was deallocated (i.e., not reused). NEW USER indicates the number of times that a CICS-DB2 thread was reused with a different authorization ID. The RESIGNON field under the NORMAL TERM heading shows the number of times that a thread was reused without an authorization ID change (DB2CONN and DB2ENTRY resource definitions specify the DB2 authorization ID to be used for transactions, and this can optionally be set to a non-changing value, such as a character string or the ID of the CICS region). The rate of thread reuse is:

(NEW USER + RESIGNON) / (NEW USER + RESIGNON + DEALLOCATION)

At one DB2 for z/OS-using organization that I support, this rate is a little over 99%.

You can also use a DB2 monitor statistics detail report (your monitor might refer to this as a statistics long report) to check on the CPU consumption of the DB2 address spaces. A good bit of the work done by the DB2 system services address space (also known as MSTR) has to do with thread creation and termination. As your rate of thread reuse goes up, you might see MSTR CPU consumption go down (though this isn't a really big deal, as MSTR usually consumes only a small amount of a system's processing resource). Rising thread reuse also reduces the average CPU cost for CICS-DB2 transactions (this would be shown as average class 2 CPU time -- also known as in-DB2 CPU time -- per transaction in an accounting detail report of CICS-DB2 activity).

Want to really maximize the CPU savings achievable with greater reuse of CICS-DB2 threads? Bind DB2 packages executed frequently by thread-reusing transactions with RELEASE(DEALLOCATE). That will cause DB2 to retain table space-level locks (these are almost always IS or IX locks, which are not exclusive) and EDM pool entries needed for the execution of the packages until threads through which the packages are executed are deallocated, as opposed to releasing these resources at commit (i.e., at end of transaction) and then reacquiring them when the packages are executed again. The combination of thread reuse and RELEASE(DEALLOCATE), while good for CPU efficiency, will increase utilization of the part of the EDM pool in which package table (PT) sections go (a statistics detail report generated by your DB2 monitor might refer to this as the RDS pool, under the heading EDM POOL). Keep an eye on this, and if the number of free pages in the RDS part of the EDM pool drops below 10% of the total number of pages used for that part of the pool, add pages (you can up the value of the EDMPOOL parameter in ZPARM, and activate the change using the DB2 command -SET SYSPARM). DB2 10, by the way, moves PT sections above the 2 GB bar (for packages bound and rebound on the DB2 10 system), giving you much more room, virtual storage-wise, for enlarging the EDM pool.

Driving thread reuse is not just a CICS-DB2 thing. DB2 10 made it a DDF thing, too, thanks to high performance DBATs. I blogged about that DB2 10 enhancement a few months ago.

And what about that organization I mentioned at the start of this entry -- the one where CICS-DB2 transaction throughput degraded during times of very high CPU utilization? They set up some protected entry threads, and CICS-DB2 thread reuse went from zero to a much higher percentage, with attendant performance benefits. Look into this at your shop, if you're not already using protected entry threads. Your efforts could yield a nice performance payoff.