Tuesday, July 30, 2024

Db2 for z/OS: What Makes for a Balanced Mainframe Configuration?

What I'm writing about today is something I've been talking about for years, thought often in a somewhat indirect way. It has to do with mainframe memory (often referred to as "real storage" by z/OS people, to distinguish it from virtual storage) - more specifically, what it takes, memory-wise, to have what I'd call a balanced configuration for a z/OS LPAR (a z/OS LPAR, or logical partition, is a z/OS system; Db2 for z/OS runs within a z/OS LPAR, and a given mainframe "box" could house one or more z/OS LPARs). After working this year with a couple of situations involving z/OS LPARs with not-well-balanced configurations, I think it's time to address the topic directly.

I'll tell you right off that the gist of the matter is this: to realize the full productive potential of a z/OS LPAR's processing capacity, that capacity has to be balanced by an appropriate amount of system memory.

It's at this point that an IT person who administers so-called distributed systems servers (Linux, UNIX or Windows servers) could be expected to say, "Duh." See, distributed systems administrators have talked about "balanced configuration units" (aka BCUs) for many years (when I was in the IT organization of a financial services company back in the early 2000s, I regularly saw references to BCUs). The BCU concept was simple and straightforward: a distributed systems server with X number of processor cores should have at least Y amount of memory (BCU, in this context, often extended to disk storage, as well - that's not so much the case with z/OS systems, owing in large part to the architecture of the z/OS I/O subsystem).

Oddly enough, I generally didn't hear much talk about balanced configurations among z/OS people. In fact, back around 2008 or so things started to get more un-balanced at some z/OS sites. That was when IBM delivered the z10 line of mainframe servers, a line that pretty dramatically increased processing power relative to its predecessor. What then happened in more than a few instances is that z/OS LPAR processing capacity raced ahead of real storage resource growth. Why did this happen? I'm not certain, but one possibility is the fact that IBM Z (the official product designation for mainframe servers) and z/OS were a bit late to the 64-bit party, referring to 64-bit addressing, which hugely expanded virtual storage resources (to 16 exabytes, from the 2 gigabyte limit imposed by 31-bit addressing) and offered, as well, access to vastly larger real storage resources. z/OS system administrators had for years worked hard to fit huge mainframe workloads into virtual and real storage spaces that could not exceed 2 GB, and when enormously large real storage resources became a possibility, it seemed to me that there was almost a reluctance on the part of some z/OS systems people to ask for more memory than they'd used before. Maybe people felt that requesting a lot more memory for a z/OS system was akin to "taking the easy way out" by just "throwing hardware at the problem" of keeping up with application workload demands.

Distributed systems people had no such qualms about requesting lots of memory to go with lots of processing power.

Anyway, at a number of z/OS sites things got pretty rough with regard to delivering required levels of application performance and throughput because of a lack of real storage resources, and these issues could be particularly acute for z/OS systems supporting Db2 for z/OS workloads. Why is that? Because Db2 for z/OS, from the beginning, was architected to take advantage of large virtual and real storage resources. See, when Db2 for MVS (as it was originally known) was announced back in 1983 (as I recall - it was the second year of my first stint with IBM), the MVS/ESA operating system was right around the corner - and with it, Db2 for MVS/ESA. MVS/ESA provided 31-bit addressing, taking the old 16 megabyte virtual and real storage limit (can you believe that?) associated with 24-bit addressing up to a then-astounding 2 gigabytes. That great big increase in addressability allowed the buffering of large amounts of Db2 data in memory, and that was a huge factor in generating the performance that helped Db2 become a tremendously popular DBMS for z/OS systems (the "We have liftoff" for Db2 moment came when its performance capability matched the huge programmer productivity boost associated with SQL and the relational database model, both of which IBM invented).

Fast-forward lots of years, and you get to that time - starting around the late 2000s, by my reckoning - when, as noted, z/OS processing capacity got seriously out in front of real storage resources at a number of Db2 for z/OS sites. What this sometimes meant for those sites: either the Db2 buffer pool configuration was much smaller than it should have been, due to the lack of real storage for the z/OS LPAR, resulting in very high read I/O rates that impeded application throughput and increased CPU consumption; or, the Db2 buffer pool configuration - while still perhaps on the small side - was too big relative to the undersized real storage resource, leading to elevated levels of demand paging in the z/OS system, with resultant adverse impacts on performance (a production z/OS LPAR's demand paging rate, available via an IBM RMF CPU Summary report, should be less than 1 per second). Plenty of organizations have acted in recent years to get real storage in line with processing capacity for their Db2-containing z/OS LPARs, but quite a few still have production Db2 for z/OS subsystems running in z/OS LPARs that are under-configured from a real storage perspective. It's a good time to achieve balance for Db2-related z/OS LPARs that currently have too little system memory.

OK, so what does a balanced configuration look like for a z/OS LPAR in which a production Db2 subsystem runs? My rule of thumb, based on years of reviewing performance data for production Db2 for z/OS systems, is this: the z/OS LPAR should have at least 20 GB of real storage per engine (i.e., per processor) - and that's regardless of the mix of general-purpose and zIIP engines configured for the LPAR. Here's an example: suppose you have a z/OS LPAR, in which a production Db2 subsystem runs, that is configured with 5 general-purpose and 3 zIIP engines. I'd say that to effectively balance that LPAR's processing capacity with real storage, you'd want the LPAR to have at least 160 GB of real storage (8 engines X 20 GB - at least - per engine). I'm emphasizing "at least" because you don't need to stop at 20 GB of real storage per engine. Some Db2 for z/OS-using organizations have production z/OS systems with 30 GB, 40 GB, 50 GB or more of real storage per engine. Is that over-doing it? Not in my view. These organizations have done things like super-sizing buffer pools; going big for other performance-relevant Db2 memory areas such as the package cache, the database descriptor cache, the prepared dynamic statement cache, the sort pool and the RID (row ID) pool; and boosting the use of RELEASE(DEALLOCATE) packages combined with persistent threads (i.e., Db2 threads that persist across commits) - a move that, like the others, ups Db2's memory usage and drives down per-transaction CPU consumption. The point: give Db2 for z/OS enough memory, and it will perform very well. Give it even more memory, and it will perform even better.

[Here's an important cost factor to consider: upping the real storage resource for a z/OS LPAR will not cause software costs to increase for the LPAR. z/OS software costs are based on general-purpose processor utilization, not real storage size.]

You don't have to take my word on the importance of large memory resources for the performance of a Db2 for z/OS system - just look at the trend lines:

  • z/OS LPAR real storage sizes are in fact getting steadily larger over time, driven in large part by positive results achieved through leveraging big memory for improved Db2 for z/OS workload performance and CPU efficiency (and fueled as well by the continuing drop in the cost of mainframe memory on a per-gigabyte basis). Reference point: the largest real storage size I've seen with my own eyes for a z/OS LPAR in the real world (versus an IBM performance benchmark system) is 2.4 terabytes. The Db2 for z/OS subsystem running in that LPAR has a buffer pool configuration size (aggregate size of all allocated buffer pools) of 1.7 TB, and the LPAR's demand paging rate is zero (the 700 GB of real storage beyond the Db2 buffer pool configuration size is more than enough for the rest of the system's storage requirements).
  • z/OS 3.1 (the current version of the operating system) provides support for up to 16 terabytes of real storage for one z/OS LPAR (up from 4 TB previously). Caveat: with the current packaging of memory for the IBM z16 line of mainframe servers, it's recommended that you not go beyond 10 TB of real storage for a single z/OS LPAR - that restriction will likely be gone at a future time.
  • Since Db2 12 for z/OS (the current version is Db2 13), a single Db2 subsystem can have a buffer pool configuration size of up to 16 TB (that's a prep-for-the-future thing - you generally want your buffer pool configuration size to be less than the real storage size of the associated z/OS LPAR, and as noted that size is currently limited to 16 TB, with a recommended maximum of 10 TB for an LPAR on a z16 server).
  • Multiple virtual storage-related (and, therefore, real storage-related) Db2 for z/OS configuration parameters (known as ZPARMs) have default values for Db2 13 that are substantially greater than the corresponding Db2 12 values. These parameters include those that specify the size of the package cache, the database descriptor cache, the sort pool and the log output buffer. The substantially larger default values for these parameters in a Db2 13 environment reflect the awareness of the IBM Db2 for z/OS development team that z/OS LPAR real storage sizes are getting larger in an ongoing way, and a reaffirmation that leveraging larger real storage resources is a winner for Db2 performance and CPU efficiency.
If you have a Db2-housing z/OS LPAR that is under-configured in terms of real storage, do what you can to change that situation. Yes, when 2 GB was the real and virtual storage limit (the latter referring to the address space size limit) for z/OS systems, Db2 for z/OS people went to heroic lengths to push huge transaction rates through LPARs that were very much memory-constrained. That was then. These days, no one gets a medal for "doing the best you can" with a Db2 for z/OS system that is sorely lacking in the real storage department. Take a cue from your peers on the distributed systems side of the house. Memory matters.