Tuesday, December 13, 2011

DB2 for z/OS: Driving Thread Reuse

I recently provided some assistance to an organization that was experiencing throughput issues in their CICS-DB2 online transaction environment. This company was migrating a large application from a non-IBM, non-relational DBMS to DB2 for z/OS, and the initial cut-over to DB2 placed a significant load on a mainframe system that didn't have a whole lot of spare cycles to begin with. The transaction slowdowns were most pronounced during periods of very high CPU utilization. In such situations, there are some mitigating actions that one can take if one has the time to see them through. For example, a business can add processing capacity to a CPU-constrained system, but the procurement cycle might be rather drawn out in the case of a large-scale, enterprise-class server. In the particular case I've described, application re-architecture work would likely yield major CPU savings, as phase one of the database migration project was not exploitative of the set-oriented nature of SQL (what we'd think of as a multi-row result set is presently being retrieved via a series of singleton SELECTs -- much more costly versus OPEN CURSOR/FETCH/FETCH/FETCH...). The application development leads at the company of which I'm writing are aware of this, but the SQL-exploiting phase of the application migration (as opposed to the "compatibility" phase) will take a while to implement.

In the meantime, there's this response time problem that needs attention now. What can be done quickly? A clue as to the appropriate performance tuning action came from side-by-side analysis of CICS and DB2 monitor information. The CICS monitor was reporting elevated wait-for-DB2 times during periods of degraded throughput, while at the same time the DB2 monitor was showing very good performance. Such a situation is often indicative of a problem in between CICS and DB2, and that is likely to have something to do with CICS transactions waiting to get the threads that are necessary for DB2 data access. The CICS monitor sees wait-for-thread time as wait-for-DB2 time, whereas a DB2 monitor doesn't "see" the CICS transaction until it gets a thread (and things might move along lickety-split once a thread is acquired -- thus the picture of good performance presented by the DB2 monitor).

Thankfully, for a problem of this nature there is a corrective action that both cuts down on DB2 thread create/terminate activity AND boosts CPU efficiency. It's called thread reuse. That's a relatively old concept, but it gets lost in the shuffle sometimes and isn't given much thought until a situation arises in which it's needed. Thread reuse has relevance in a transaction processing context, as it involves multiple transactions (i.e., multiple units of work, with the ratio of transactions to DB2 units of work typically being 1:1) accessing DB2 data, one after another, using the same DB2 thread; so, that thread which is reused persists across commits, as opposed to being terminated at end-of-transaction (the thread used by a batch job to access DB2 data is by its nature persistent until end-of-job: the batch job might commit ten thousand times in the course of its execution, but it will keep using the same thread).

When the DB2-accessing transactions are managed by CICS, there are basically two ways to drive up the rate of thread reuse. The first of these, which I don't much like, requires restricting the number of threads available between a CICS region and a DB2 subsystem. That restriction on the supply of threads is needed because this approach depends on transaction queuing to drive thread reuse. Here's what I mean by that: typically, when a CICS-DB2 transaction associated with DB2 plan ABC completes, the thread used for the execution of the transaction will be immediately terminated unless it is reused. The thread will be reused IF a transaction associated with the same plan is queued, waiting for a thread (that queued transaction will also have to be associated with the same CICS DB2ENTRY -- if any -- as the transaction that last used the thread). Wait-for-thread transaction queuing will only happen if the number of threads between the CICS region and the DB2 subsystem has reached its user-specified limit. In that case, new threads can't be created, so incoming transactions wait until in-use threads are freed up for reuse. [Note: I am using CICS-DB2 interface specification terms, such as DB2ENTRY, that are associated with CICS's Resource Definition Online (aka RDO) functionality. Some years ago, RDO replaced the old way of defining the interface between a CICS application-owning region (AOR) and a DB2 subsystem: a macro called the RCT (short for Resource Control Table). Information on DB2ENTRY, DB2CONN, and other DB2-related CICS resource definitions can be found in a manual called the CICS Transaction Server for z/OS DB2 Guide.]

Because I don't like to see CICS transactions queued up waiting for DB2 threads to come free, I favor an alternative means of promoting thread reuse: exploiting protected entry threads. I like this approach because it doesn't depend on forcing wait-for-thread transaction queuing. Unlike non-protected threads (those being pool threads or non-protected entry threads), a protected thread (a type of entry thread) will stick around for an average of 45 seconds following completion of the transaction that last used the thread (this time period is based on the setting of the CICS-DB2 thread purge cycle, which has a default value of 30 seconds -- a protected thread will be terminated if it is not reused within two of these purge cycles). If a transaction associated with the same DB2ENTRY and the same DB2 plan comes along in that 45-second period, the protected thread will be reused (keep in mind that several transactions can be defined for one DB2ENTRY by way of a wildcard character in the transaction name, and additional transactions can be associated with a DB2ENTRY via a DB2TRAN resource definition).

The approach to protected thread utilization that I recommend involves starting with the most frequently executed CICS-DB2 transactions. Set up a DB2ENTRY (or DB2ENTRYs) for these, with a THREADLIMIT value that is greater than zero and a PROTECTNUM value that is equal to the THREADLIMIT value (you could make PROTECTNUM smaller than THREADLIMIT, but I don't see much value in having entry threads that aren't protected). Also specify THREADWAIT(POOL) for the DB2ENTRY(s), so that transactions will overflow to the pool instead of waiting if the thread limit for the DB2ENTRY has been reached. Start with a moderate number of protected threads for an entry, and monitor the impact on thread reuse. If the thread reuse rate is not what you want it to be, increase the number of protected threads for one or more DB2ENTRYs. Note that in adding entry threads (whether protected or not) you may need to increase the value of TCBLIMIT for the CICS region's DB2CONN resource definition -- this to help ensure that you don't unintentionally hit the TCBLIMIT on the number of tasks using DB2 threads between the CICS region and the target DB2 subsystem (example: specifying 100 entry threads for a CICS region won't help you if the value of TCBLIMIT for the region is 50).

So, how do you monitor CICS-DB2 thread reuse in your DB2 environment? I like to use the accounting detail report generated by a DB2 monitor, with the data in the report grouped either by connection type (so that you'll see a report reflecting activity for your overall CICS-DB2 workload) or by connection ID (this would show activity at the CICS AOR level). [Be aware that, based on the particular DB2 monitor in use at your site, what I call an accounting detail report might be referred to as an accounting long report, and what I call data grouping (a SYSIN-specified option in the batch job that generates the report) may be termed data ordering.]

In the accounting detail report, look for a set of fields under the heading NORMAL TERM (or something like that -- headings and field names differ somewhat from one monitor to another). This is where thread reuse data is found. The field DEALLOCATION under the heading NORMAL TERM shows the number of times that a thread was deallocated (i.e., not reused). NEW USER indicates the number of times that a CICS-DB2 thread was reused with a different authorization ID. The RESIGNON field under the NORMAL TERM heading shows the number of times that a thread was reused without an authorization ID change (DB2CONN and DB2ENTRY resource definitions specify the DB2 authorization ID to be used for transactions, and this can optionally be set to a non-changing value, such as a character string or the ID of the CICS region). The rate of thread reuse is:

(NEW USER + RESIGNON) / (NEW USER + RESIGNON + DEALLOCATION)

At one DB2 for z/OS-using organization that I support, this rate is a little over 99%.

You can also use a DB2 monitor statistics detail report (your monitor might refer to this as a statistics long report) to check on the CPU consumption of the DB2 address spaces. A good bit of the work done by the DB2 system services address space (also known as MSTR) has to do with thread creation and termination. As your rate of thread reuse goes up, you might see MSTR CPU consumption go down (though this isn't a really big deal, as MSTR usually consumes only a small amount of a system's processing resource). Rising thread reuse also reduces the average CPU cost for CICS-DB2 transactions (this would be shown as average class 2 CPU time -- also known as in-DB2 CPU time -- per transaction in an accounting detail report of CICS-DB2 activity).

Want to really maximize the CPU savings achievable with greater reuse of CICS-DB2 threads? Bind DB2 packages executed frequently by thread-reusing transactions with RELEASE(DEALLOCATE). That will cause DB2 to retain table space-level locks (these are almost always IS or IX locks, which are not exclusive) and EDM pool entries needed for the execution of the packages until threads through which the packages are executed are deallocated, as opposed to releasing these resources at commit (i.e., at end of transaction) and then reacquiring them when the packages are executed again. The combination of thread reuse and RELEASE(DEALLOCATE), while good for CPU efficiency, will increase utilization of the part of the EDM pool in which package table (PT) sections go (a statistics detail report generated by your DB2 monitor might refer to this as the RDS pool, under the heading EDM POOL). Keep an eye on this, and if the number of free pages in the RDS part of the EDM pool drops below 10% of the total number of pages used for that part of the pool, add pages (you can up the value of the EDMPOOL parameter in ZPARM, and activate the change using the DB2 command -SET SYSPARM). DB2 10, by the way, moves PT sections above the 2 GB bar (for packages bound and rebound on the DB2 10 system), giving you much more room, virtual storage-wise, for enlarging the EDM pool.

Driving thread reuse is not just a CICS-DB2 thing. DB2 10 made it a DDF thing, too, thanks to high performance DBATs. I blogged about that DB2 10 enhancement a few months ago.

And what about that organization I mentioned at the start of this entry -- the one where CICS-DB2 transaction throughput degraded during times of very high CPU utilization? They set up some protected entry threads, and CICS-DB2 thread reuse went from zero to a much higher percentage, with attendant performance benefits. Look into this at your shop, if you're not already using protected entry threads. Your efforts could yield a nice performance payoff.

11 comments:

  1. Robert,

    That's fine as far as it goes, but in one DB2 subsystem I support they went almost totally PROTECTED ENTRY and the CICS threads "sit" in DB2 consuming 90% of the active threads at both normal and peak workloads. Since this system is starting to show signs of "storage stress" the possible over utilization of this feature is potentially putting DB2 at risk. I think this is a case of "Be Careful" what you ask for -- more is not necessarily better.

    KenH

    ReplyDelete
    Replies
    1. Point well taken, Ken - you can overdo almost anything, and that includes use of protected entry threads. Protected entry threads are best used for your highest-volume CICS-DB2 transactions. When used judiciously, a relatively small number of protected entry threads for higher-volume transactions will often lead to a high level of CICS-DB2 thread reuse. For transactions that are not among the most frequently executed in a CICS-DB2 environment, it can be a good idea to just use pool threads (in the DB2ENTRY for such a transaction, you'd specify THREADLIMIT(0) and THREADWAIT(POOL)).

      What you DON'T want is a bunch of threads (and CICS TCBs) that are mostly sitting there doing nothing, and that's what you can get when you overuse protected entry threads.

      Delete
  2. HI Robert,
    About the savings on cpu consumption related to protected threads, i've read some different writings. In the "IBM CICS performance series :CICS, DB2, and Thread Safety" Redpaper, at page 14, the authors said : "The cost of thread creation or termination is not reflected in the DB2 accounting records. Analyze the CICS monitoring records(SMF110) to determine the cost savings" . In a presentation from GSE Nordic: "Best practicefor performance tuning with cics and db2 " by IBMer Miao Zheng , "the benefit of protected thread" section shows almost no saving in CICS MIPS, whereas in DB2 MIPS we go from 13 to 9.
    For me there is a part of truth from both documents, savings are noticed in the CICS part, and in the DB2 part (seen in MSTR for the thread deallocation and in DB2 accounting for the allocatio ?)

    What's your opinion about this ?

    Thank you
    Duc

    ReplyDelete
    Replies
    1. Some more precisions : the figures in the second document(from Miao Zheng) presents a reduction of 3% in CICS part CPU and 30% in DB2 part. 3% seems OK to me , but how to explain a 30% decrease ? (Bind option release deallocate is used in both case)

      Delete
    2. I don't know that thread reuse has zero impact on in-DB2 CPU time (i.e., DB2 accounting trace class 2 CPU time) for an application, but I do think that the bulk of the CPU savings associated with thread reuse will be seen in reduced CPU consumption by the DB2 system services address space (also known as MSTR) -- much of the CPU consumption of this address space is associated with thread creation and termination (of course, MSTR CPU consumption is usually pretty small to begin with).

      A 30% decrease in in-DB2 CPU time seen with thread reuse looks to me to be unusually high, but I guess it's conceivable. In that case, it's not so much the thread reuse by itself that made the big difference -- it's the fact that thread reuse enabled realization of the CPU efficiency benefit of the RELEASE(DEALLOCATE) package bind option. If CICS-DB2 threads are not reused, RELEASE(DEALLOCATE) will work pretty much as RELEASE(COMMIT) does.

      RELEASE(DEALLOCATE) combined with CICS-DB2 thread reuse could potentially result in a substantial decrease in in-DB2 CPU time for a CICS-DB2 transaction if that transaction has a low in-DB2 CPU cost anyway (if just a few low-cost SQL statements are issued by the transaction, the cost of deallocating and reallocating table space locks and package sections from and to a thread will be proportionately higher than would be the case for a transaction that issues higher-cost SQL statements, so the reduction in in-DB2 CPU time resulting from the combination of RELEASE(DEALLOCATE) and thread reuse would be greater, percentage-wise, for the transaction issuing a few low-cost SQL statements).

      Even so, 30% savings in in-DB2 CPU consumption looks high to me. I tend to see in-DB2 CPU time reductions that are closer to 10% when RELEASE(DEALLOCATE) is combined with CICS-DB2 thread reuse.

      Robert

      Delete
    3. This comment has been removed by the author.

      Delete
    4. Hello Robert,
      Thanks for the explanation.
      I came back the other day from a session by John Campbell, and he also happened to indicate that saving won't be accounted for DB2 Accounting trace, rather you'll have to go to the CICS 110 records.

      My question is, if I were to do an apple-to-apple comparison, after having changed my application to leverage Therad Reuse(together with REBIND with RELEASE-DEALLOCATE), how would I quantify the benefit at a transaction level to my boss?

      Thanks in advance.
      Cheers, Vibin

      Delete
    5. John may have been referring to CPU time savings associated solely with thread re-use, as opposed to CPU time savings associated with the combination of thread re-use and the RELEASE(DEALLOCATE) package bind option.

      If I were wanting to quantify CPU savings associated with the combination of thread re-use and the RELEASE(DEALLOCATE) package bind option for a CICS transaction, here's what I would do. Let's say that the transaction in question is called TRN1. I would use my DB2 monitor to generate an accounting long report, for only transaction TRN1 (if the monitor is IBM's OMEGAMON for DB2, that would be done via the option INCLUDE(CORRNAME(TRN1)) - if you use another vendor's DB2 monitor, there should be a similar option). Specify FROM and TO times for the report that capture, say, an hour during which the transaction is frequently executed (and make the reporting INTERVAL, in that case, 60 minutes, so as to have one report showing 60 minutes of activity, as opposed to something like 4 reports each showing 15 minutes of activity). The information in this report would be your baseline. Then, provide a few protected entry threads for the transaction to get a high level of thread re-use, and bind the transaction's DB2 package (or packages, if several packages are executed via transaction TRN1, focusing on the most frequently executed of those packages) with RELEASE(DEALLOCATE). Then generate another DB2 monitor accounting long report, just like the first (include only the correlation name TRN1, and use the same FROM and TO times). Then compare output in the "before" and "after" reports. First, verify in the section of the report with a heading like NORMAL TERM (that's the heading used in an OMEGAMON report - it's short for "normal termination") that the transaction is getting a high degree of thread reuse (ideally, well over 90%) - thread reuse is indicated by values in the NEW USER and/or RESIGNON fields under the heading NORMAL TERM. Having verified that, check the value in the CP CPU TIME field in the "DB2 (CLASS 2)" column on the first page of the report. I would expect this time to be less in the "after" report versus the "before" report. That would be average in-DB2 CPU savings "per occurrence," which is per transaction if your system is set up (as many are) to get one CICS-DB2 transaction per DB2 accounting trace record. Total savings would then be per-transaction savings times number of transactions executed during the period of interest.

      Robert

      Delete
  3. Hi Robert , With the DB2 Stats, there is 2 fields : Signon (Q3STSIGN) and Terminate (Q3STTERM).
    The information i've found related to Q3STSIGN is "This field has meaning only when applied to CICS and IMS. It represents the number of times an existing thread has gone through new user signon reusing an existing thread." (from BMC manual)
    So this field seems to be a good indicator for thread reuse ? i know that the accounting is more precise, but just want to have an indicator.
    So, i have this figure :
    From 9 to 10 am : 1379352 of signon, and 1717357 of Terminate.
    But when i check the CICS stats (SMF110 using DFHSTUP) then the count of thread reuse is 0 on all the interval from 9 to 10.
    So obviously the saying of BMC manual is wrong ?
    Thank you for your clarification (if you've gone thru this)
    Duc

    ReplyDelete
  4. Hi Robert, got my answer in Omegamon documentation now. "If the number of signons is greater than the number of create thread
    occurrences, some threads have been reused"

    ReplyDelete
    Replies
    1. Sorry about the delayed response, Duc. To be honest with you, I never use DB2 statistics trace information (as presented by a DB2 monitor) to gauge CICS-DB2 thread reuse. I only use accounting information as shown in a DB2 monitor-generated report for a CICS-DB2 workload (or for an individual CICS region or for an individual CICS-DB2 transaction).

      Robert

      Delete