Tuesday, August 28, 2012

The New IBM zEnterprise EC12: Big Iron Gets Bigger (and Better)

When I say "bigger," I don't mean bigger in a "footprint" sense -- the new flagship system in IBM's mainframe server product line is very close, size-wise, to its predecessor, the z196, and has similar power draw and heat load characteristics. What I mean by "bigger"is that the zEnterprise EC12 provides the ability to get more work done, in less time, more efficiently than ever before. That's good news for organizations that need best-of-breed enterprise computing capabilities. It's also good news for those of us who comprise the DB2 for z/OS community: the horse that our favorite DBMS rides is, in the form of the EC12, a thoroughbred like no other.

Continuing with the equine analogy, let's start with a look at the raw horsepower numbers. The EC12's engines run at 5.5 GHz, the highest clock frequency in the industry (a z196 engine runs at 5.2 GHz). That's impressive enough, but the EC12 delivers a per-core processing capacity advantage over the z196 -- up to 25% -- that is significantly greater than the difference in engine speed alone would suggest. Very fast processors are only part of the EC12 performance story. There's also a doubling of the cache memory on each processor chip, which reduces delays caused by having to go "off chip" for instructions and data.

Not only are there faster engines that can be kept busy more effectively; there are also more engines: a single EC12 server can have up to 120 cores, up to 101 of which can be configured for client use (versus a maximum of 80 configurable cores on a z196 server). Put it all together (a larger number of faster engines), and a fully configured EC12 can deliver 50% more processing capacity than a top-end z196 server.

The total amount of memory available on an EC12 server -- 3 TB -- is the same as for the z196, but something new and very cool is being done with EC12 memory: it can be managed in 2 GB page frames. Yes, that is 2 GB -- 2,048 times the size of the 1 MB page frames available on z196 servers. What does that mean for DB2? Think about a page-fixed buffer pool of 2 GB in size (that's 524,288 4 KB buffers) fitting into ONE real storage page frame. Virtual-to-real storage translation will be more efficient than ever, and that will translate into CPU savings. If you've ever wondered, as a mainframe DB2 person, whether or not the age of Big Memory has really begun, wonder no more. It's here. Get ready to exploit it, if you haven't already.
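
If you want to see that frame-count arithmetic spelled out, here's a minimal sketch in Python (the pool and frame sizes are the ones discussed above; nothing about the code itself is EC12-specific):

    # Hypothetical illustration of the frame-count arithmetic: how many real
    # storage page frames are needed to back a buffer pool of a given size at
    # different frame sizes.

    KB, MB, GB = 1024, 1024**2, 1024**3

    def frames_needed(pool_size_bytes, frame_size_bytes):
        """Number of page frames required to back a pool of the given size (ceiling division)."""
        return -(-pool_size_bytes // frame_size_bytes)

    pool_size = 2 * GB  # a 2 GB page-fixed buffer pool (524,288 4 KB buffers)

    for label, frame_size in [("4 KB", 4 * KB), ("1 MB", MB), ("2 GB", 2 * GB)]:
        print(f"{label} frames needed: {frames_needed(pool_size, frame_size):,}")

    # Prints:
    #   4 KB frames needed: 524,288
    #   1 MB frames needed: 2,048
    #   2 GB frames needed: 1

Fewer, bigger frames covering the same pool means less virtual-to-real translation overhead -- and that's where the CPU savings come from.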

You might find yourself thinking, "What would my company do with a server that has that much memory and that much processing capacity?" In that case, I have one word for you: virtualization. Don't limit your thinking about EC12 exploitation to one or just a few system images. Think LOTS of system images. Lots of z/OS LPARs on an EC12? Maybe, but I'm thinking more along Linux lines when I think of lots of virtual systems running on a single EC12 box. If your organization isn't running Linux on System z, or thinking seriously about doing so, you're missing a train on which a whole lot of people are traveling. Installation of IFL engines on zEnterprise servers (these are specialty engines that run the Linux operating system) is soaring, and with good reason. The mainframe is a great virtual-system platform (has been for decades, since long before "cloud" meant anything other than coalesced water vapor), and if you have a big and growing DB2 for z/OS client-server workload (as many organizations do), what better place is there for your DB2-accessing application servers than in a Linux image on the same System z server in which DB2 is running? Ever heard of HiperSockets? Network performance doesn't get any better than that. And think of the benefits in terms of network simplification (and security) when app servers run under Linux on System z, instead of on outboard boxes. With the EC12, the already great System z virtualization story gets better still.

Virtualization's hot, and so is analytics. How about analytics on z? I'm not talking just about having the data accessed by your analytics tools housed in a mainframe database -- we already know that DB2 for z/OS is a great solution there. I'm talking about having the analytics tools themselves running on System z -- in a Linux image or a z/OS LPAR. More and more organizations are doing just that, and the EC12 will provide a related momentum boost. How's that? Well, as you get into progressively higher-value types of analytics -- from "what happened" reporting to "what will happen" predictive modeling -- the associated processing gets to be more and more CPU-intensive. The EC12 delivers here, with significant improvements in performance for compute-intensive and floating-point applications. C and C++ on z? Java on z? Yes. The EC12 is the best mainframe server yet for business analytics.

Excellent performance across a wide range of applications is great, but we all know that a system has to be up if it's to deliver the processing power your organization needs. System z has always been the high-availability champ, and the EC12 takes high availability to a whole new level with zAware (short for System z Advanced Workload Analysis Reporter). zAware provides what one of my colleagues has termed "integrated expert system diagnostics," constantly monitoring OPERLOG messages (which at some sites can number in the millions per day) and presenting related information via an easy-to-interpret graphical user interface. zAware can help mainframe operations personnel to quickly determine when an EC12 system is not behaving normally -- thereby enabling corrective action to be taken before a situation reaches user-impacting proportions.

I've covered a lot in this blog post, but there's more to learn about the zEnterprise EC12, and I encourage you to dig deeper. Use the hyperlink I provided at the top of this entry, check out the IBM EC12 Technical Introduction and EC12 Technical Guide Redbooks, and look for presentations at national conferences and local events. The more you know, the more you'll realize that Big Iron never looked better.

Monday, August 20, 2012

When CICS and DB2 for z/OS Monitors Disagree (Part 2)

About a week ago, in part 1 of this two-part blog entry, I described a situation I've encountered several times at mainframe DB2 sites over the past 20 years: CICS users complain of elongated response times for DB2-accessing transactions, and these complaints are backed up by CICS monitor data that identify elevated "wait for DB2" times as the cause of the performance degradation. Concurrently, DB2 systems people point to their own data, from a DB2 monitor, showing that in-DB2 time for CICS transactions is NOT significantly higher than normal during the periods when the CICS programs are running long. Finger-pointing can ensue, and frustration can build in the face of a seemingly intractable problem: how can it be that the CICS monitor shows higher "wait for DB2" times while, according to DB2 monitor data, the performance picture looks good? I noted in the aforementioned part 1 entry that this situation can occur when there is either a shortage of DB2 threads available for CICS-DB2 transactions or a shortage of TCBs through which the DB2 threads are utilized. In this part 2 entry I'll shine a light on another possibility: the disagreement between CICS and DB2 monitors can be caused by inappropriate priority specifications for DB2 address spaces and/or CICS transactions.

The priority to which I'm referring is dispatching priority -- the kind that determines which of several "in and ready" tasks in a z/OS system will get CPU cycles first. Dispatching priorities for address spaces are specified in the WLM policy of the z/OS LPAR in which they run. The busier a system is (with respect to CPU utilization), the more important it is to get the address space dispatching priorities right. DB2's IRLM address space (the lock manager) should have a very high priority, and in fact should be assigned to the SYSSTC service class (for ultra-high-priority address spaces). Most sites get this right. The mistake that I've seen more than once is giving the DB2 system services and database services address spaces (aka MSTR and DBM1) a priority that is lower than that of the CICS application-owning regions (AORs) in the z/OS LPAR. The thinking here is often something like this: "If the DB2 MSTR and DBM1 address spaces have a higher priority than the CICS AORs, they will get in the way of CICS transactions and performance will be the worse for it." Not so. The DB2 "system" address spaces generally consume little in the way of CPU time (DDF can be a different animal in this regard, as I pointed out in an entry posted to this blog last month), so "getting in the way" of CICS regions is not an issue. In fact, it's when CICS regions "get in the way" of the DB2 system address spaces (as they can in a busy system when they have a higher-than-DB2 priority) that CICS transaction performance can go downhill.

How's that? Well, it can come down to a thread-availability issue. It's common for CICS-DB2 threads to be created and terminated with great frequency as transactions start and complete. The DB2 address space that handles thread creation and termination is MSTR, the system services address space. If CICS address spaces are ahead of MSTR in the line for CPU cycles (i.e., if the CICS AORs have a higher priority than MSTR), and if the z/OS LPAR is really busy (think CPU utilization north of 90%), MSTR may not be readily dispatched when it has work to do -- and the work MSTR needs to do may be related to CICS-DB2 thread creation. If DB2 threads aren't made available to CICS transactions in a timely manner, the transactions will take longer to complete, and "wait for DB2" will be seen -- via a CICS monitor -- as the reason for the not-so-good performance. Check DB2 monitor data at such times and you'll likely see that things look fine. This is because DB2 doesn't "see" a CICS transaction until it gets a thread. Thus, as I pointed out in part 1 of this two-part entry, the "wait for DB2" time reported by the CICS monitor can be time spent in between CICS and DB2, and that's why the different monitors paint different pictures of the same performance scene: the CICS monitor indicates that transactions are waiting longer for DB2, and that's true, but the DB2 monitor shows that when a transaction does get the thread it needs, it performs just fine. If you see this kind of situation in your environment, check the dispatching priorities of the DB2 address spaces and of the CICS AORs. The priority of the DB2 MSTR, DBM1, and DDF address spaces should be a little higher than that of the CICS AORs in the LPAR. [Don't sweat giving DDF a relatively high dispatching priority. The vast majority of DDF CPU utilization is associated with the execution of SQL statements that come from network-attached clients, and these execute at the priority of the application processes that issue the SQL statements -- not at the priority of DDF.] Oh, and in addition to ensuring that DB2 MSTR can create threads quickly when they are needed for CICS transaction execution, think about taking some of the thread creation pressure off of MSTR by increasing the rate at which CICS-DB2 threads are reused -- I blogged on this topic a few months ago.
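
To pull those priority recommendations together, here's a small Python sketch -- the address space labels and numeric rankings are hypothetical, and this is not WLM policy syntax -- that encodes the ordering described above and flags the mistaken setup:

    # Hypothetical sketch (not WLM syntax): encode the relative dispatching-
    # priority ordering described above and check a configuration against it.
    # Lower number = higher dispatching priority.

    def check_priorities(prio):
        """prio maps an address space label to a relative priority (lower = higher)."""
        issues = []
        if prio["IRLM"] > min(prio.values()):
            issues.append("IRLM should have the highest priority (SYSSTC)")
        for db2 in ("DB2 MSTR", "DB2 DBM1", "DB2 DDF"):
            if prio[db2] >= prio["CICS AOR"]:
                issues.append(f"{db2} should be a little higher than the CICS AORs")
        return issues

    # The mistaken setup described above: CICS AORs ahead of the DB2 system address spaces.
    mistaken = {"IRLM": 1, "DB2 MSTR": 3, "DB2 DBM1": 3, "DB2 DDF": 3, "CICS AOR": 2}
    print(check_priorities(mistaken))     # flags MSTR, DBM1, and DDF

    # The recommended setup: IRLM highest, DB2 system address spaces just above the AORs.
    recommended = {"IRLM": 1, "DB2 MSTR": 2, "DB2 DBM1": 2, "DB2 DDF": 2, "CICS AOR": 3}
    print(check_priorities(recommended))  # prints [] -- no issues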

One other thing to check is the priority of the CICS transactions themselves -- more precisely, the priority of the TCBs used for CICS transaction execution. This is specified through the value given to PRIORITY in the definition of a CICS DB2ENTRY resource, or in the DB2CONN definition for transactions that utilize pool threads. The default value of the PRIORITY attribute is HIGH, and this means that the tasks associated with entry threads (or pool threads, as the case may be) will have a dispatching priority that is a little higher than that of the CICS AOR's main task. HIGH is an OK specification if the z/OS LPAR isn't too busy -- it helps to get transactions through the system quickly. If the z/OS LPAR is very busy, however, PRIORITY(HIGH) may lead to a throughput issue, because not only the DB2 address spaces but also the CICS AORs themselves could end up waiting behind transactions to get dispatched. In that case, going with PRIORITY(LOW) could actually improve CICS-DB2 transaction throughput. I have seen this myself. Bear in mind that PRIORITY(LOW) doesn't mean batch low -- it means that the transactions will have a priority that is a little lower than that of the CICS region's main task.
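
As a quick illustration of that relative positioning (a Python sketch of the relationship just described, not a statement about CICS internals):

    # Hypothetical illustration (not CICS internals): the PRIORITY attribute
    # positions the TCBs used for CICS-DB2 transactions relative to the AOR's
    # main task. Here, a higher number means "dispatched ahead."

    AOR_MAIN_TASK = 100  # arbitrary reference value for the AOR main task

    def transaction_tcb_priority(priority_attr, main_task=AOR_MAIN_TASK):
        """Relative position of the transaction TCBs versus the AOR main task."""
        return {
            "HIGH": main_task + 1,  # a little higher than the AOR main task
            "LOW": main_task - 1,   # a little lower than the AOR main task (not "batch low")
        }[priority_attr]

    print(transaction_tcb_priority("HIGH"))  # 101 -- transactions can crowd out the main task on a busy LPAR
    print(transaction_tcb_priority("LOW"))   # 99  -- the AOR main task is dispatched ahead of the transaction TCBs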

Bottom line: dealing with (or better, preventing) CICS-DB2 performance problems is sometimes just a matter of getting your priorities in order.

Wednesday, August 8, 2012

When CICS and DB2 for z/OS Monitors Disagree (Part 1)

Today I'm writing about a situation that I first encountered almost 20 years ago, and which I most recently saw about three months ago: an organization with a CICS transaction workload accessing a DB2 for z/OS database reports a performance problem. The company's CICS people point to monitor data that indicate higher "wait for DB2" times when CICS transactions run long. Meanwhile, the folks on the DB2 team serve up monitor numbers of their own, showing that in-DB2 times for CICS transactions are not higher than normal during CICS slow-down events. What's to be made of this seeming contradiction? How can it be that for a given period, a CICS monitor shows unusually high "wait for DB2" times while DB2 monitor data for the same period show consistently good in-DB2 elapsed times for CICS transactions? I've found that this kind of "are too!" / "am not!" situation is usually the result of CICS transactions waiting longer than usual for DB2 threads. "Wait for thread" goes in the "wait for DB2" bucket from the CICS monitor perspective, but it doesn't inflate in-DB2 elapsed times because the DB2 monitor doesn't "see" a transaction until it has a thread. "Wait for thread" time is, in essence, time spent "between" CICS and DB2, and that's why the CICS and DB2 monitors see it differently.

OK, so why would CICS transactions have to wait longer than usual for DB2 threads? In my experience, this happens for one of two reasons: not enough threads, or not enough priority. In the remainder of this blog entry I'll expand on the first of these two factors: not enough threads (in some cases, this is actually a matter of not having enough TCBs, as explained below). In a part 2 entry, I'll address the priority issue.

Back in the 1980s and 1990s, the connection between a CICS application-owning region (AOR) and a local DB2 for z/OS subsystem was defined by way of a macro called the RCT, or resource control table. The RCT gave way to RDO (CICS resource definition online) towards the end of the 1990s. One of the CICS-DB2 connection values, specified in the definition of a resource called DB2CONN, is TCBLIMIT -- the maximum number of TCBs (task control blocks) that can be used to connect transactions running in the CICS region to a target DB2 subsystem. Another of the CICS-DB2 connection set-up parameters, THREADLIMIT, appears in a DB2CONN resource definition (indicating the maximum number of pool threads for CICS-DB2 transactions) and can also appear in a DB2ENTRY resource definition (indicating the maximum number of entry threads for transactions associated with the DB2ENTRY resource). The sum of all THREADLIMIT values (for pool threads and entry threads) for a given CICS region should be less than the value of TCBLIMIT for the region, and people generally start out that way. Over time, however, folks may increase THREADLIMIT values -- to accommodate a growing CICS-DB2 transaction workload -- without adjusting TCBLIMIT accordingly. The sum of all THREADLIMIT values for a CICS region can then end up being greater than the TCBLIMIT value, leaving you in a situation in which threads are available for transactions, but there aren't enough TCBs through which those threads can be used to connect to DB2, and wait times get elongated. You should check your system for this possibility and, if the sum of THREADLIMIT values exceeds TCBLIMIT for a region, either adjust TCBLIMIT upwards or adjust THREADLIMIT values downwards. I'd generally lean towards a larger TCBLIMIT value in that case, but if there were a lot of protected entry threads defined (PROTECTNUM > 0 and THREADLIMIT > 0 for DB2ENTRY resources), I'd consider reducing PROTECTNUM and THREADLIMIT values for those DB2ENTRY resources.
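
Here's a minimal sketch of that check, in Python, with made-up resource names and values:

    # Hypothetical sketch of the THREADLIMIT-versus-TCBLIMIT check, using
    # made-up resource names and values for one CICS region.

    db2conn = {"name": "DB2CONN1", "TCBLIMIT": 60, "pool_THREADLIMIT": 12}

    db2entries = [  # entry-thread definitions for this region (hypothetical)
        {"name": "ENTRY01", "THREADLIMIT": 20},
        {"name": "ENTRY02", "THREADLIMIT": 25},
        {"name": "ENTRY03", "THREADLIMIT": 15},
    ]

    total_threadlimit = db2conn["pool_THREADLIMIT"] + sum(e["THREADLIMIT"] for e in db2entries)

    if total_threadlimit > db2conn["TCBLIMIT"]:
        print(f"Sum of THREADLIMIT values ({total_threadlimit}) exceeds TCBLIMIT "
              f"({db2conn['TCBLIMIT']}): threads may be available to transactions, "
              f"but there aren't enough TCBs to use them -- raise TCBLIMIT or trim "
              f"THREADLIMIT (and PROTECTNUM) values.")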

Here's another scenario: TCBLIMIT is greater than the sum of all THREADLIMIT values for a CICS region, but wait-for-thread time is still relatively high. This can happen when the following are true:
  • THREADLIMIT is set to zero (or a very small number) for DB2ENTRY resources
  • THREADWAIT(POOL) is specified for these same DB2ENTRY resources
  • THREADLIMIT for pool threads is too small (i.e., THREADLIMIT has a small value in the region's DB2CONN resource definition)

In that case, it may be that lots of transactions are overflowing to the pool, but the number of pool threads is not large enough for the associated transaction volume. This situation could be indicated by a non-zero value in the W/P column for the POOL row in the output of a DSNC DISPLAY STATISTICS command for the region in question (this is an attachment command issued through a CICS-supplied transaction called DSNC). If you see such a value, bump up the THREADLIMIT number in the DB2CONN resource definition for the region (and in doing that, check, as previously mentioned, to see if a TCBLIMIT increase might be needed, as well).
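
And a small Python sketch of the overflow scenario just described -- the resource names, THREADLIMIT values, and peak-concurrency estimates are all made up:

    # Hypothetical sketch of the overflow-to-the-pool scenario: DB2ENTRYs
    # defined with THREADLIMIT(0) and THREADWAIT(POOL) send all of their
    # transactions to the pool, so the pool THREADLIMIT has to be sized for
    # that combined demand. The peak-concurrency figures are made-up estimates.

    pool_threadlimit = 10  # THREADLIMIT in the region's DB2CONN definition (hypothetical)

    db2entries = [
        {"name": "ENTRY01", "THREADLIMIT": 0, "THREADWAIT": "POOL", "peak_concurrent_tx": 15},
        {"name": "ENTRY02", "THREADLIMIT": 0, "THREADWAIT": "POOL", "peak_concurrent_tx": 10},
    ]

    overflow_demand = sum(e["peak_concurrent_tx"] for e in db2entries
                          if e["THREADLIMIT"] == 0 and e["THREADWAIT"] == "POOL")

    if overflow_demand > pool_threadlimit:
        print(f"Peak overflow demand ({overflow_demand}) exceeds the pool THREADLIMIT "
              f"({pool_threadlimit}) -- expect wait-for-thread time, and a non-zero "
              f"W/P count for the POOL row in DSNC DISPLAY STATISTICS output.")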

Of course, you could also have elongated wait-for-thread times if a DB2ENTRY resource definition has a very small (but non-zero) THREADLIMIT value and a specification of THREADWAIT(YES). Before messing with that, check to see if this is an intentional aspect of the DB2ENTRY definition: it's conceivable that a certain transaction must be limited with respect to concurrent execution (maybe even single-threaded, via THREADLIMIT(1) and THREADWAIT(YES) for a DB2ENTRY resource) in order to prevent a contention problem.

Here's one more possibility (though it's not something that I've actually seen): you may have high CICS wait-for-thread times because the DB2 limit on the number of threads for local (i.e., not network-attached) programs is too small. The ZPARM parameter CTHREAD specifies this limit, and it's for all local threads: for CICS transactions, batch jobs, TSO users, etc. If you have an indication of high CICS wait-for-thread times, and you have a pretty small CTHREAD value, consider adjusting CTHREAD upwards. Note that CTHREAD can have a much larger value in a DB2 10 environment versus prior versions of DB2 for z/OS -- this because almost all thread-related virtual storage usage goes above the 2 GB "bar" when packages are bound (or rebound) in a DB2 10 system. Whereas the sum of CTHREAD and MAXDBAT formerly had to be less than 2000, in a DB2 10 environment that sum has to be less than 20,000.
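
For what it's worth, here's that arithmetic as a tiny Python sketch (the CTHREAD and MAXDBAT values are hypothetical):

    # Hypothetical check of the CTHREAD + MAXDBAT ceiling mentioned above, with
    # made-up ZPARM values: under 2,000 before DB2 10, under 20,000 with DB2 10.

    def thread_limit_headroom(cthread, maxdbat, db2_version=10):
        """Remaining headroom under the applicable CTHREAD + MAXDBAT ceiling."""
        ceiling = 20_000 if db2_version >= 10 else 2_000
        return ceiling - (cthread + maxdbat)

    print(thread_limit_headroom(cthread=800, maxdbat=500, db2_version=9))   # 700
    print(thread_limit_headroom(cthread=800, maxdbat=500, db2_version=10))  # 18700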

You can help to avoid high CICS wait-for-thread times by ensuring that you have enough threads (and enough CICS TCBs) for your CICS-DB2 transactions. Check back in a week or so for part 2 of this two-part entry, in which I'll explain how high wait-for-thread times can be a matter of priorities (dispatching, that is).