Wednesday, August 8, 2012

When CICS and DB2 for z/OS Monitors Disagree (Part 1)

Today I'm writing about a situation that I first encountered almost 20 years ago, and which I most recently saw about three months ago: an organization with a CICS transaction workload accessing a DB2 for z/OS database reports a performance problem. The company's CICS people point to monitor data that indicate higher "wait for DB2" times when CICS transactions run long. Meanwhile, the folks on the DB2 team serve up monitor numbers of their own, showing that in-DB2 times for CICS transactions are not higher than normal during CICS slow-down events. What's to be made of this seeming contradiction? How can it be that for a given period, a CICS monitor shows unusually high "wait for DB2" times while DB2 monitor data for the same period show consistently good in-DB2 elapsed times for CICS transactions? I've found that this kind of "are too!" / "am not!" situation is usually the result of CICS transactions waiting longer than usual for DB2 threads. "Wait for thread" goes in the "wait for DB2" bucket from the CICS monitor perspective, but it doesn't inflate in-DB2 elapsed times because the DB2 monitor doesn't "see" a transaction until it has a thread. "Wait for thread" time is, in essence, a waiting that is "between" CICS and DB2, and that's why the CICS and DB2 monitors see it differently.

OK, so why would CICS transactions have to wait longer than usual for DB2 threads? In my experience, this happens for one of two reasons: not enough threads, or not enough priority. In the remainder of this blog entry I'll expand on the first of these two factors: not enough threads (in some cases, this is actually a matter of not having enough TCBs, as explained below). In a part 2 entry, I'll address the priority issue.

Back in the 1980s and 1990s, the connection between a CICS application-owning region (AOR) and a local DB2 for z/OS subsystem was defined by way of a macro called the RCT, or resource control table. RCT gave way to RDO (CICS resource definition online) towards the end of the 1990s. One of the CICS-DB2 connection values, specified in the definition of a resource called DB2CONN, is TCBLIMIT -- the maximum number of TCBs (task control blocks) that can be used to connect transactions running in the CICS region to a target DB2 subsystem. Another of the CICS-DB2 connection set-up parameters, THREADLIMIT, appears in a DB2CONN resource definition (indicating the maximum number of pool threads for CICS-DB2 transactions) and can also appear in a DB2ENTRY resource definition (indicating the maximum number of entry threads for transactions associated with the DB2ENTRY resource). The sum of all THREADLIMIT values (for pool threads and entry threads) for a given CICS region should be less than the value of TCBLIMIT for the region, and people generally start out that way; however, over time folks may increase THREADLIMIT values -- to accommodate a growing CICS-DB2 transaction workload -- without adjusting TCBLIMIT accordingly; thus, the sum of all THREADLIMIT values for a CICS region could end up being greater than the TCBLIMIT value, and that could result in a situation in which threads are available for transactions, but TCBs needed to use those threads to connect to DB2 are insufficient in number to avoid elongated wait times. You should check your system for this possibility, and, if the sum of THREADLIMIT values exceeds TCBLIMIT for a region, either adjust TCBLIMIT upwards or adjust THREADLIMIT values downwards. I'd generally lean towards a larger TCBLIMIT value in that case, but if there were a lot of protected entry threads defined (PROTECTNUM > 0 and THREADLIMIT > 0 for DB2ENTRY resources), I'd consider reducing PROTECTNUM and THREADLIMIT values for those DB2ENTRY resources.

Here's another scenario: TCBLIMIT is greater than the sum of all THREADLIMIT values for a CICS region, but wait-for-thread time is still relatively high. This can happen when the following are true:
  • THREADLIMIT is set to zero (or a very small number) for DB2ENTRY resources
  • THREADWAIT(POOL) is specified for these same DB2ENTRY resources
  • THREADLIMIT is a too-small number for pool threads (i.e., THREADLIMIT has a small value in the region's DB2CONN resource definition)

In that case, it may be that lots of transactions are overflowing to the pool, but the number of pool threads is not large enough for the associated transaction volume. This situation could be indicated by a non-zero value in the W/P column for the POOL row in the output of a DSNC DISPLAY STATISTICS command for the region in question (this is an attachment command issued through a CICS-supplied transaction called DSNC). If you see such a value, bump up the THREADLIMIT number in the DB2CONN resource definition for the region (and in doing that, check, as previously mentioned, to see if a TCBLIMIT increase might be needed, as well).

Of course, you could also have elongated wait-for-thread times if a DB2ENTRY resource definition has a very small (but non-zero) THREADLIMIT value and a specification of THREADWAIT(YES). Before messing with that, check to see if this is an intentional aspect of the DB2ENTRY definition: it's conceivable that a certain transaction must be limited with respect to concurrent execution (maybe even single-threaded, via THREADLIMIT(1) and THREADWAIT(YES) for a DB2ENTRY resource) in order to prevent a contention problem.

Here's one more possibility (though it's not something that I've actually seen): you may have high CICS wait-for-thread times because the DB2 limit on the number of threads for local (i.e., not network-attached) programs is too small. The ZPARM parameter CTHREAD specifies this limit, and it's for all local threads: for CICS transactions, batch jobs, TSO users, etc. If you have an indication of high CICS wait-for-thread times, and you have a pretty small CTHREAD value, consider adjusting CTHREAD upwards. Note that CTHREAD can have a much larger value in a DB2 10 environment versus prior versions of DB2 for z/OS -- this because almost all thread-related virtual storage usage goes above the 2 GB "bar" when packages are bound (or rebound) in a DB2 10 system. Whereas the sum of CTHREAD and MAXDBAT formerly had to be less than 2000, in a DB2 10 environment that sum has to be less than 20,000.

You can help to avoid high CICS wait-for-thread times by ensuring that you have enough threads (and enough CICS TCBs) for your CICS-DB2 transactions. Check back in a week or so for part 2 of this two-part entry, in which I'll explain how high wait-for-thread times can be a matter of priorities (dispatching, that is).

No comments:

Post a Comment