Robert's Db2 blog: 2024

Tuesday, December 17, 2024

Db2 13 for z/OS: Granular Control of Security Requirements for Client-Server Applications

The Db2 profile tables - SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES - have long been very useful in managing a Db2 for z/OS client-server application workload (i.e., a Db2 system's DDF workload). In particular, organizations have appreciated the ability, provided via the profile tables, to set thread limits and/or connection limits and/or an idle thread timeout value for particular DDF-using applications, versus having to rely solely on the subsystem-wide thread limit, connection limit and idle thread timeout values specified by way of the ZPARM parameters MAXDBAT, CONDBAT and IDTHTOIN, respectively. Db2 13 for z/OS recently delivered a new "more granular than ZPARM" DDF workload management option that can be exercised through the profile tables. This time, the more fine-grained control pertains to security requirements for Db2 for z/OS client-server applications. You can read on to learn more about this enhancement.

The challenge for which a solution was needed

Historically, a system-wide security requirement (i.e., one that would apply to all requesters) for a Db2 for z/OS system was specified via the TCPALVER parameter in ZPARM. The need for a more-granular client-server security requirement specification can be readily seen if you think about a new requirement that you'd like to eventually apply to the entire DDF workload for a Db2 for z/OS system. Suppose, for example, that your Db2 client-server applications now authenticate to the Db2 server with a password, and you'd like to eventually have all requesters authenticate to Db2 instead with a digital certificate; or, maybe you'd like for all requester applications accessing the Db2 system to eventually be required to utilize SSL encryption (aka AT/TLS encryption, referring to encryption of data "on the wire" that flows between Db2 for z/OS and requester applications); or, maybe you'd like to eventually have all requester applications use multi-factor authentication when requesting a connection to the Db2 system, or use RACF-generated authentication tokens. With all these situations, the tricky thing has been the "eventually" part of the objective. That's been problematic because Db2 for z/OS client-server security requirements have often required implementation in a "big switch" way (as in, throwing a big switch that suddenly routes all traffic to track A versus track B).

Consider, for example, a desire to have all Db2 for z/OS client-server applications use SSL encryption. One way to do that - available since April of 2019, via Db2 12 APAR PH08188 - is to make it so that the only SQL listener port available for the Db2 subsystem in question is a secure port (when a client application requests a connection to a Db2 for z/OS system via the Db2 server's secure port, Db2 will reject the connection if it determines that the requester cannot support SSL encryption). Let's say you go this route. OK, objective achieved - but, at what cost? Setting up SSL encryption for a Db2 client-server application can be a little tricky if you haven't done that before. There are multiple steps that have to be taken - correctly - on the Db2 and z/OS side, and on the client application side. It's pretty easy to get something wrong, and it could take a few tries before you get it right. If the Db2 for z/OS system's only SQL listener port is of the secure variety (i.e., you've thrown that particular "big switch"), any of your applications that aren't properly set up for SSL encryption will not be able to connect to the Db2 system (and if part of the server-side set-up for SSL encryption was not done properly, NO applications will be able to connect to the Db2 system). This could end up causing substantial disruption for one or more of your Db2 for z/OS client-server applications - something you'd certainly want to avoid.

It's a similar story for things such as certificate-based (versus password-based) client authentication, multi-factor authentication, etc. There are "big switch" ways in which you can lay down the law, but those approaches carry with them an elevated risk of application disruption because of their "big switch" nature. Wouldn't it be nice if you could roll out a Db2 for z/OS client-server security requirement in an incremental way - start small, and then expand once you've gotten it right for a little piece of the client-server workload? In fact, you can take just that approach in a Db2 13 for z/OS environment, thanks to APARs PH48764 (March 2023) and PH57811 (January 2024).

A new way to use the Db2 profile tables

APAR PH48764 added a new profile table attribute: MONITOR _______ CONNECTIONS FOR SECURITY. You'd fill in that blank after MONITOR with a category of the Db2 client-server application workload, with the category choices being:

REST (for applications connecting to the Db2 for z/OS system's REST interface)
JDBC
CLI (short for the call-level interface - this will most commonly apply to applications using an ODBC driver to access the Db2 system)
DB2CONNECT (this applies to applications that connect to the Db2 system through a Db2 Connect "gateway" server, as opposed to getting a direct application server-to-Db2 connection)
DSN (this refers to requesters that are other Db2 for z/OS systems)

[You can also fill in the blank with an * - that profile would apply to any requester not covered by one of the more-specific client-server application categories listed above. Also, note that if there is a requester for which no profile applies, the security requirement specified via TCPALVER in ZPARM will be in effect.]

Importantly, APAR PH57811 made it possible to get very specific within a given client-server application category: you can have a profile that applies to a subnet of TCP/IP addresses, or even to a specific IP address (referring to the IP address of an application server).

With a profile defined (a category of the Db2 client-server workload, possibly as granular as a single application server), you can provide the associated attribute row in the DSN_PROFILE_ATTRIBUTES table. In that row, you can indicate the appropriate security requirement - choices include password-based cleint authentication, certificate-based client authentication, multi-factor authentication, and whether or not SSL encryption is required (the full range of choices can be found in the Db2 for z/OS documentation, and note that a requirement for certificate-based client authentication carries with it a requirement for SSL encryption).

OK, so let's look again at that scenario involving making SSL encryption a requirement for - eventually - all of your Db2 for z/OS client-server applications. This time, the "eventually" bit will NOT be a problem, because we'll leverage this new profile table-based granular client-server security requirement specification capability. Suppose we decide to start with applications that access the Db2 system using a JDBC driver. We can specify a profile for that DDF workload category, and we can make the profile applicable solely to a single server on which a JDBC-using application runs. If the application running on that server is running as well on one or more other servers, we can flow most all transaction work through those other servers and have just a few transactions (perhaps submitted by people on the IT team, as opposed to being user-submitted transactions) going through the server acting as our "canary in the coal mine." We issue the -START PROFILE command to load the profile table information into Db2 memory, and now the JDBC-using application running on the one app server will have to successfully complete an "SSL handshake" with the Db2 for z/OS server in order to establish a connection. What if the first connection request from the app server fails with a security error related to our MONITOR JDBC CONNECTIONS FOR SECURITY row in DSN_PROFILE_ATTRIBUTES? Not a huge deal - the application is interacting with Db2 just fine from servers other than the one on which we're working. We examine the connection error data, and determine where we need to make an adjustment or adjustments - maybe Db2 for z/OS-side, maybe app server-side, maybe a little of both. After making appropriate changes on whatever side, we try again and, WOO-HOO - successful SSL connection, followed by successful execution of some transactions flowing from the one app server. Now, we have a template for success - we know how things need to be set up in order for SSL encryption to work for our JDBC-using applications. We can subsequently expand the location-scope of our profile to include other servers hosting JDBC-using applications that access the Db2 system (maybe specifying, for example, a TCP/IP address that covers a subnet that includes several of these app servers), and eventually go with a generic location value (which could be specified in various ways, including * or 0.0.0.0) to indicate that the profile applies to all JDBC-using applications that access the Db2 for z/OS system.

Having gotten to where we want to be with the JDBC-using applications, we can take the same approach with (for example) applications that access the Db2 system using an ODBC driver - start with (if we want) a single app server, and stay with that very limited location scope until we get it right and have, as a result, the template for success with ODBC-using applications. Then we expand the location scope and eventually have the SSL requirement in force for all ODBC-uisng applications; then, on to the next category of interest to us (maybe REST clients). This "start small, get it right, then expand the location scope using what we've learned" approach can greatly reduce the risk and reach of application disruption when we want to implement a new Db2 for z/OS client-server security requirement. A game-changing Db2 enhancement, in my view.

Don't forget about profile table "warning" behavior - great for discovery purposes

Before closing out this blog entry, I want to highlight a Db2 profile table characteristic that I feel is sometimes overlooked by DBA teams: attributes (those that specify a requirement such as a security rule, or that specify a limit on application consumption of resources such as Db2 connections or threads) can be implemented with warning behavior ("let me know if the requirement was not met or the limit was exceeded, but otherwise take no action") as well as with exception behavior ("take application-impacting action when the requirement is not met or the limit is exceeded"). Warning behavior can be great for "getting the lay of the land" when you want to specify a new security requirement or set a new limit for some part of a Db2 subsystem's DDF workload. Let's again consider the SSL encryption requirement. Before establishing a "comply or suffer the consequences" SSL encryption requirement, even for a single app server, you might do well to activate one or more profiles and associated attributes in warning mode, just to see how big of a lift your team's going to be facing in establishing the new security requirement. How about JDBC-using applications? Is it the case that most of them are already using SSL encryption when accessing the Db2 system (not so big a lift, in that case), or is the reality more on the other end of the range of possibilities, with few or no JDBC-using applications using SSL when accessing the Db2 system (in which case, get ready to roll up your sleeves)? You can get the answer to that question by specifying an "SSL required" attribute for JDBC-using applications that will function in warning mode. No application work is impacted, and the messaging provided by Db2 when the SSL requirement is not met will provide you with valuable clarity regarding the current situation.

In conclusion...

In this blog entry, I have highlighted two of the beneficial applications of MONITOR _______ CONNECTIONS FOR SECURITY functionality that most appeal to me:

It gives you the ability to "start small, succeed and then expand" when rolling out a security requirement that you'd like to eventually be in effect for your entire Db2 for z/OS client-server application workload, such as use of SSL encryption or certificate-based client authentication or multi-factor authentication. This approach can substantially reduce the risk of application disruption that could occur if you were to implement the new security requirement by "throwing a big switch."
The non-application-impacting information provided by Db2 when profile table requirements or resource limits are implemented in warning mode can provide a very useful "lay of the land" view of one or more aspects of a Db2 client-server application workload. That intelligence, in turn, can help you to assess the scope of effort that would be required to implement the proposed requirement or resource limit in exception mode.

There are plenty of other ways that MONITOR _______ CONNECTIONS FOR SECURITY functionality could be helpful for your organization - including, of course, the establishment of security requirements that are intentionally different for different categories of your Db2 client-server application workload. Consider how your organization could advantageously use this Db2 13 enhancement, and get ready to put it to work.

Friday, November 22, 2024

Db2 13 for z/OS: Partition-Level Locking for Insert into a PBG Table Space

Db2 13 for z/OS delivered an enhancement to reduce the likelihood that an insert targeting a table in a partition-by-growth (PBG) table space will fail due to partition-level lock contention. In this blog entry I'll describe that enhancement, and I'll provide some additional information that I hope will help you understand how partition-level locking affects insert actions involving PBG table spaces.

First, an important distinction between PBG and partition-by-range (PBR) table spaces: for a row-insert targeting a table in a PBR table space, the row in question must be inserted into a particular partition, based on the value of the partitioning key of the to-be-inserted row. In the case of an insert into a table in a PBG table space, the new row can go into any of the table space's partitions. Yes, a particular partition will be preferred for the new row (information on that follows), but the insert operation will not be restricted to that preferred partition.

OK, so how is the preferred partition of a PBG table space determined for an insert operation? Simple: that will be the partition containing the target page for the insert. What's the target page? It's the page that, per the table's clustering index, would best preserve row-clustering for the table if the new row were to go into that page.

Now, you might think that with the target page for the new row identified, Db2's next step would be to get an X-lock on the page (or for the row, if row-level locking is in effect), once it has been determined that the page has room for the row. In fact, that doesn't happen - not yet, anyway. See, before Db2 can request a child lock (i.e., a lock on a page or a row), it has to obtain the associated parent lock (for a universal table space, whether PBG or PBR, the parent lock will be on the partition within which a new row is to be inserted). If you're concerned about an entire partition being locked just for an insert of a row, relax - almost always, the partition lock acquired for an insert (or for an update or a delete) will be an intent lock (specifically, for a data-change action such as insert, an IX lock - short for intent exclusive), and intent locks do not conflict with each other.

There's an interesting aspect of the partition lock request Db2 makes for the PBG partition containing the target page: it will be a conditional lock request. What does that mean? It means that if Db2 can't get the requested partition lock right away, it will move on to another of the PBG table space's partitions (if the PBG table space has more than one partition). In other words, Db2 will not wait for up to the subsystem's lock timeout limit to obtain the requested partition lock (that lock timeout limit is the value of the ZPARM parameter IRLMRWT, which has a default value of 30 seconds).

"But wait," you might be thinking. "Didn't you just say that partition-level locks are non-exclusive?" I said that they're almost always non-exclusive. In unusual circumstances, some process may have an exclusive lock on the partition containing the target page. In that case, because the lock on the partition requested by Db2 for the insert action was conditional in nature, Db2 will effectively say, "Oh, I can't get an IX lock on this partition? No prob. I'll just request a conditional lock on the next partition." What's the "next" partition? Well, it could be the next one forward from the one containing the target page, or the next one going backward from that partition (Db2 mixes up the direction of partition progression in these situations, to kind of even things out). Let's say that in this case the PBG table space has 10 partitions, and the target page is in partition 5, and the direction of partition progression is "forward." Db2 requests a conditional lock on partition 6, because the conditional lock request for partition 5 was a no-go. If the conditional lock request for partttion 6 is unsuccessful, Db2 will try again with partition 7.If Db2 gets to partition 10 and the conditional lock request for that partition is unsuccessful, it will wrap around to partition 1 and try for that one.

Suppose this progression continues, and again and again the conditional requests for partition-level locks are unsuccessful, and Db2 ends up back where it started - partition 5. Here's where the Db2 13 enhancement comes in. In a Db2 12 environment, if Db2 tried and tried the conditional partition lock requests and those were unsuccessful every time and Db2 wrapped around the PBG table space's partitions and got back to the partition containing the target page, the insert would fail at that time with a -904 reason code (resource unavailable) and a reason code 00C90090 (partition lock failure). In a Db2 13 environment, Db2 will NOT fail the isert when it has "wrapped back around to" (in our example) partition 5; instead, Db2 13 at that point will retry up to 5 of the conditional partition lock requests that had been unsuccessful the first time around. Given that exclusive partition locks (which block the intent-exclusive partition locks that Db2 has been requesting for the insert) tend to be of relatively short duration, there is a pretty good chance that one of those retries will be successful, and the insert operation can then proceed.

If the (up to 5) conditional partition lock request re-tries are all unsuccessful, will Db2 13 fail the insert? Not yet. At that point, Db2 13 will issue a "regular" (i.e., not conditional) request for a partition-level lock (for the partition containing the target page), and will wait for up to the IRLMRWT-specified timeout period for that lock.

If the "regular" partition lock request times out, will Db2 13 fail the insert? Maybe not yet. Here, there is a dependency on whether or not the PBG table space in question has reached its MAXPARTITIONS limit. I've been using the example of a PBG table space that has 10 partitions. If the MAXPARTITIONS value for the table space is greater than 10, here's what will happen if the conditional partition lock requests (original and retry requests) and the "regular" (non-conditional) partition lock request all fail: Db2 13 will check to see if all of the table space's partitions are full. If they are, Db2 will add a new partition to the table space, and the row-insert into the newly added partition should be successful. If Db2 sees that the table space's partitions are not all full, Db2 will fail the insert with a 00C90090 reason code (partition lock failure). Let's change the scenario slightly, so that the table space's MAXPARTITIONS value is 10 (equal to the current number of partitions for the table space). If that is the case then after all the partition lock requests (the conditional requests - including retries - and the non-conditional request) have been unsuccessful, Db2 will fail the insert with the aforementioned 00C90090 reason code.

A word on the "check for partition full" action just mentioned: this is a "speed check," using cached-in-memory information, and that cached-in-memory information might be a little off from reality. To explain: when Db2 searches a partition for space for a new row (and Db2 will do this if it was able to get the partition-level lock required for an insert), and finds that the partition in question is actually full, it will record that finding in memory. If there are then some deletes that free up space in the partition, will Db2 adjust the cached-in-memory information indicating "partition full?" No - doing that would add undesirable overhead and contention for delete operations. When will the cached-in-memory "partition full/not full" information be updated? After the next search for space in a given partition; so, at a particular time there could be a slight difference between what the cached-in-memory information says about the full-ness of partitions, and the reality of the full-ness of those partitions. What the "sometimes slightly fuzzy" cached-in-memory information enables is a very quick check on partition full-ness after Db2 has been unable to obtain the partition lock needed for an insert. This, I think, is probably what you want, After Db2 has spent time trying (maybe trying and trying and trying and...) to get a partition lock in order to get an insert done, you probably don't want Db2 to add the time needed for a comprehensive search for space to be added to the process. Without getting a partition lock, no insert is going to happen anyway - the real quick partition full-ness check using the "pretty accurate but maybe kind of fuzzy" cached-in-memory information is done just to see if adding a new partition is warranted following failure by Db2 to get a lock on any of the table's existing partitions.

One more thing: if getting a the partition lock needed for a row-insert is not a problem (and usually, this will in fact not be a problem), then Db2 might find that all of the table space's partitions are actually full (done by actually searching the partitions for space - not based on a check of the cached-in-memory partition full/not full information). In that case, Db2 will add a new partition for the table space if the table space's MAXPARTITIONS limit has not been reached. If the MAXPARTITIONS limit has been reached, the "partitions actually all full" situation will cause Db2 to fail the insert with a -904 (resource unavailable) and a 00C9009C reason code (partition full).

I hope that this look at some insert scenarios involving Db2 for z/OS partition-by-growth table spaces has provided you with some useful information. As always, thanks for visiting the blog.

Monday, October 28, 2024

Db2 for z/OS: Avoiding Splitting Headaches (the Index Page Kind)

This blog entry is about a Db2 for z/OS-internal process that has long been a headache-inducer for DBAs, and about some Db2 13 enhancements that can provide effective relief for those headaches. The Db2-internal process to which I'm referring is index page splitting.

Index page splitting: what it is and why it matters

Db2 for z/OS can support very high levels of row-insert throughput. One reason why this is so: a to-be-inserted row does not have to go into a particular page of a table's table space. Yes, if there is a clustering index on the table (and there will be such an index, if the table has any indexes) then there will be a "target page" for a to-be-inserted row - the page into which the new row should be inserted in order to best maintain row-clustering for the table; however, if that target page is locked at the time by some other process or is found to be full, Db2 will insert the new row into some other page - ideally, into a page relatively close (physically) to the target page, but into a "farther away" page if need be. The point here: insert throughput will not be slowed due to contention on, or fullness of, the table space page into which a new row should ideally go.

It's a different story for indexes on Db2 tables. When a new entry is to be added to an index (as a result of an insert operation), that new entry must go into a particular page, because Db2 maintains physical ordering of index entries according to index key value. If the page into which a new index entry must go is full, Db2 will do something called index page splitting. That operation involves moving a portion of the entries in the page to an empty page of the index, so that the formerly-full page will have room for the new entry (Db2 provides a pointer from the formerly-full page to the page into which relocated entries were placed, and a pointer from that page to the one that is next in terms of logical sequence).

The good about index page splitting is that it preserves physical ordering of index key values. As for the not-so-good effects of index page splitting...

It causes the index in question to be less well-organized in terms of physical versus logical page sequencing. The index entries relocated as a result of an index page split can't just go into some page of the index that has some room - they have to go into an empty page. If all of the pages in an index structure are populated, the relocated entries will have to be placed in a new page at the "far end" of the physical index space. As more and more such relocations happen, the performance of index-based data access paths such as "matching index scan" degrades, because Db2 has to "jump" - maybe quite a long ways - from index page n ("n" referring to logical sequence) to page n+1, because logical page n+1 may be physically quite far from the physical location of logical page n (and then Db2 has to jump back to page n+2 in the logical sequence). An index REORG is required to get the physical sequence of pages in line with their logical sequence.
Especially for indexes on tables that are the target of high-volume insert processes, index page splitting can be a real drag on performance, because an index page split operation is serialized by Db2 to protect the data integrity of the index, and that causes processes that need to access the affected part of the index tree structure to wait for the index page split operation to complete. This impediment to application throughput can be more pronounced when Db2 is running in data sharing mode in a Parallel Sysplex cluster of z/OS systems, because of the extra logging that accompanies index page split operations.

Given these negative impacts of index page splitting, you'd think that Db2 DBAs would want to be on the lookout for processes affected by index page splits, and for indexes getting a lot of index page split activity, and you'd be right in that thinking. Just one problem there: actually zeroing in on index page splitting hot spots was challenging (before Db2 13, that is).

A longstanding issue (prior to Db2 13): pinpointing index page split problems

For a long time, positively identifying processes driving index page splitting, and indexes getting a lot if index page split activity, was no easy thing. The issues:

Identifying split-heavy processes - There was a trace record, IFCID 359 (associated with performance trace class 4), that could be used to monitor index page split activity, but this trace was not often activated at Db2 sites. One reason for that lack of use: IFCID 359 is a relatively high-overhead trace - it causes a trace record to be generated for every index page split operation that occurs in a Db2 system. In some cases, there can be a lot of index page splitting going on - especially if there is a high volume of insert activity for tables that have indexes defined on keys that are not continuously-ascending (index page splits do not occur for indexes defined on continuously-ascending keys - i.e., keys for which every new entry has a higher key value than all previous entries in the index). Overhead issues aside, IFCID 359 trace records also lack some information that would be helpful in identifying processes driving index page split activity: it does not provide the unit of recovery (UR) ID of the process that is causing an index page split to happen, and in a Db2 data sharing environment it does not provide the ID of the member subsystem that is performing the index page split.
Identifying indexes with high levels of index page split activity - You could check the value of the REORGLEAFFAR column of the row for an index (or index partition) in the SYSINDEXSPACESTATS real-time statistics table in the Db2 catalog, and get maybe a sense that the level of index page splitting could be high for the index in question, but it was hard to be sure about this - index page splitting is not the only action that causes this counter to be incremented. On top of that, you couldn't determine if these (possible) index page splits for the index were taking a lot of time to be processed.

Bottom line: at best you could get kind of a fuzzy picture of indexes that might be seeing a lot of index page split activity, and you basically didn't have any clear indication of the processes driving index page split activity. Hard to take decisive mitigating actions when you don't have really good intel. Fortunately, Db2 13 provides relief for DBA headaches related to index page splitting.

Db2 13 for z/OS: more helpful (and more CPU-efficient) tracing, more real-time stats

First, the good news on the tracing front: Db2 13 introduces a new trace record, IFCID 396, that is associated with statistics class 3 and is, therefore, active by default (Db2 statistics trace classes 1, 3, 4, 5, and 6 are on by default, per the SMFSTAT parameter in the Db2 ZPARM module). Does this make the CPU overhead of statistics class 3 a matter of concern? Nope - stats class 3 remains a very low-overhead trace. How so? Well, the new IFCID 396 trace record is generated not for every index page split in a Db2 system, but only for those that take an unusually long time to complete (specifically, more than one second). That's generally what you want a spotlight on - more than on all the index page splits that get done lickety-split. What's more, an IFCID 396 trace record provides the UR ID of the process driving the unusually-long-running index page split operation, and (in a Db2 data sharing environment) the ID of the Db2 member that processed the index page split. Result: identifying processes that are driving unusually high levels of unusually long-time-to-complete index page split operations is now a lot more straightforward than it was before.

The real-time statistics situation is also improved with Db2 13, thanks to three columns that are added to SYSIBM.SYSINDEXSPACESTATS when the catalog level goes to V13R1M501 (and the catalog can be taken to that level once function level V13R1M500 has been activated):

REORGTOTALSPLITS - Since the index (or index partition) was last REORG-ed, how many index page splits have happened?
REORGSPLITTIME - What has been the total amount of time consumed in processing all the splits indicated by the REORGTOTALSPLITS value?
REORGEXCSPLITS - Of the number of splits indicated by the REORGTOTALSPLITS value, how many took an exceptionally long time (more than 1 second) to be processed?

With these new real-time stats columns, identifying indexes for which page splitting is problematic is a snap. With such an index identified, you can take an action (or combination of actions) to address the issue - for example:

Go to a larger index page size to reduce the incidence of page splitting (it takes more index entry inserts to fill a larger index page).
Increase the PCTFREE value for the index, so that after a REORG there will be more space in each page to hold newly-inserted entries.
Increase the frequency of REORGs for the index.

Here's the key takeaway for Db2 for z/OS DBAs: whereas before you were kind of shooting in the dark in trying to identify and deal with index page split issues, Db2 13 gives you actionable intelligence that you can use to effectively and efficiently focus your index-page-split-mitigating actions.

Wednesday, September 25, 2024

Migrating to Db2 13 for z/OS: What About Non-Universal Table Spaces?

I recently communicated with a Db2 for z/OS system administrator who had some concerns about non-universal table spaces in the context of his organization's planned migration from Db2 12 to Db2 13. This person asked important questions about the presence of non-universal table spaces in his Db2 environment and how those database objects might affect his team's plans for Db2 12-to-13 migration. Looking over that communications thread, I found myself thinking (as I often have over the years) that it could be the basis for a blog entry that would be helpful for people in the wider Db2 for z/OS user community; so, here we go.

Terminology: universal and non-universal table spaces

Logically speaking, Db2 for z/OS-managed data appears in tabular form - i.e., data records appear as rows in tables, and the data fields of a given set of records are columns of the associated table (that is kind of relational database 101 information). Db2 tables are physically instantiated in what are called table spaces (and table spaces map to VSAM data sets - VSAM being the primary file system of the z/OS operating system).

Table spaces fall into the realm of Db2 physical database design, which means they are basically invisible to application programs (a given Db2 table will look the same to a program, regardless of the type of table space in which it resides). DBAs decide on the type of table space that will be used for a given Db2 table.

For many years, there were three Db2 table space types: simple, segmented and range-partitioned (for the latter, partition ranges were initially defined by way of an index - starting with Db2 Version 8 for z/OS, range-partition specifications could be made at the table level). Db2 9 for z/OS (2007) introduced a new category of table space, known as universal. There are two universal table space types: partition-by-growth (PBG) and partition-by-range (PBR). For a PBG table space, a DBA determines an appropriate partition size (known as a data set size) and an appropriate maximum number of partitions (which can later be adjusted if need be), and when the first partition fills up then Db2 automatically allocates another partition, and when that one fills up another partition is allocated, and so on. For a PBR table space, a DBA determines the appropriate number of partitions and the partitioning key (comprised of one or more of tha associated table's columns), and the limit key value for each partition - Db2 then assigns rows to partitions accordingly.

Some noteworthy characteristics of universal table spaces:

They are always partitioned (though a PBG table space might never grow beyond its first partition).
They are segmented (referring to an aspect of management of the space within the table space - this is why I often use the phrase "traditional segmented" to refer to the segmented table space type that existed before universal table spaces were introduced).
A given universal table space can hold one and only one table.

What does Db2 13 have to do with this?

For a Db2 12 system to be migrate-able to Db2 13, function level V12R1M510 has to be activated for the Db2 12 system. Activating function level N for a Db2 for z/OS system means that functionality associated with function levels prior to N will also be activated (if the prior function levels had not been explicitly activated previously). That means that activation of Db2 12 function level 510 will mean activation of the functionality of function level V12R1M504, and there's the rub - or so the aforementioned Db2 administrator thought. What this administrator knew - and knew rightly - is that, with function level V12R1M504 or higher activated, non-universal table spaces go into "deprecated" status (meaning that there will no longer be functional enhancements related to those non-universal table space types, and that the use of such table space types could eventually become not-possible - an eventuality likely to be far in the future and preceded by way-in-advance notice). Of more immediate concern to the Db2 administrator: he was under the impression that non-universal table spaces could not be created in a Db2 13 environment, and given that impression he was thinking that non-universal table spaces in his environment had to be converted to universal table spaces before Db2 12 systems could be migrated to Db2 13.

I had some good news for the Db2 administrator: non-universal table spaces CAN be created in a Db2 13 environment (this actually applies to traditional segmented table spaces - simple table spaces, while still usable even in a Db2 13 environment, have not been CREATE-able since Db2 9). To create a traditional segmented non-universal table space in a Db2 13 system (or in a Db2 12 system with function level V12R1M504 or higher activated), all one has to do is issue the CREATE TABLESPACE statement via a Db2 package executing with an application compatibility level of V12R1M503 or lower (as explained in an entry I posted to this blog a few years ago). The really important point here is that the presence of non-universal table spaces in your Db2 for z/OS environment should in no way impact your Db2 13 migration plans. Is it good to convert non-universal table spaces to the universal variety? Yes (as explained below); however, you do NOT need to complete that table space comversion work prior to migrating your Db2 12 systems to Db2 13 - you can continue that work in a Db2 13 environment.

Incentives for converting non-universal table spaces to universal table spaces

Avoiding use of deprecated table space types is a good reason to move away from simple and traditional segmented table spaces. There are, in addition, positive incentives for making this move, in the form of Db2 for z/OS features that apply only to universal table spaces (and tables therein):

Pending DDL - Many changes to database objects can be accomplished via ALTER and a subsequent online REORG of the related table space.
LOB in-lining - For a table with one or more LOB (large object) columns, this refers to the ability to store a portion of LOB values (up to a specified length for a given LOB column) in the base table, with the rest of the value (if any) going in the appropriate auxiliary table in a LOB table space.
XML multi-versioning - For a table with one or more XML columns, this Db2 feature provides better concurrency for XML data access, and supports the XMLMODIFY built-in function, enabling changes that affect only part of a stored XML data value.
ALTER TABLE with DROP COLUMN - This is a pending DDL change (see the first item in this list).
Ability to insert a new partition in the middle of a range-partitioned table space (versus having to add the new partition at the end of the table space).
The option of making ALTER COLUMN actions pending DDL changes (as opposed to requiring that these be immediate changes).
Relative page numbering for range-partitioned table spaces.

How disruptive is it to go from a non-universal to a universal table space?

Answer: about as minimally disruptive as could be. It's an ALTER + online REORG action, so the only period of data unavailability related to a non-universal-to-universal table space conversion would be during the "switch" phase at the end of an online REORG, and that can be as brief as a few seconds. Dependent Db2 packages (related to programs that issue so-called static SQL statements targeting a table in a table space converted from non-universal to a universal type) will be invalidated by the online REORG that materializes the table space conversion - you would subsequently either rebind those invalidated packages with explicit REBIND PACKAGE commands, or you would let Db2 auto-bind them (auto-bind is triggered by the first request by a program to execute an invalidated package).

Converting a table space from non-universal to universal is really easy when the non-universal table space holds a single table. If the non-universal table space is of the simple or traditional segmented variety, the process is: 1) ALTER the table space with a MAXPARTITIONS value, and 2) execute an online REORG of the table space. Following completion of the online REORG, the table that had been in a simple or traditional segmented table space will be in a universal partition-by-growth table space. If the non-universal table space is of the range-partitioned type, the process is similarly simple: 1) ALTER the table space, this time with a SEGSIZE value, and 2) execute an online REORG of the table space. Following completion of the online REORG, the table that had been in a non-universal range-partitioned table space will be in a universal partition-by-range table space.

For a simple or traditional segmented table space that contains multiple tables, conversion to universal is a somewhat more involved process, but it still comes down basically to ALTER and online REORG. The process, introduced with Db2 12 function level 508, is described in an entry I posted to this blog not long ago.

After a table space has been converted from non-universal to a universal type, is it necessary to change the APPLCOMPAT value for Db2 packages that are dependent on the table space (or the table therein)?

Answer: NO. The requirement for a Db2 application compatibility level of V12R1M503 or lower is ONLY relevant for the package (maybe a SPUFI or a DSNTEP2 package, or a package associated with another program or tool used by Db2 DBAs) through which a CREATE TABLESPACE statement is issued, when there is a need to create a traditional segmented (i.e., non-universal) table space in a Db2 13 environment or a Db2 12 environment in which the activated function level is V12R1M504 or higher. Db2 packages having any supported APPLCOMPAT value can access a table in a universal table space (the lowest supported APPLCOMPAT value is V10R1, and the highest possible APPLCOMPAT value is equivalent to the current activated function level of the Db2 for z/OS system in which the package executes).

In a nutshell...

Converting non-universal table spaces to universal types is something you ought to be doing, but you DO NOT have to get that conversion work completed prior to migrating a Db2 12 system to Db2 13. The conversion process - available in Db2 12 and Db2 13 environments - varies somewhat depending on whether the non-universal table space is range-partitioned or not, and on whether the non-universal table space (if not range-partitioned) holds one table or multiple tables, but in any case it is pretty straightforward, minimally disruptive and application-transparent. So, yes, work on this, but at a pace that works for you and for your organization.

Friday, August 23, 2024

After Migrating to a New Version of Db2 for z/OS, Rebind All Your Plans and Packages

At this writing, a lot of Db2 for z/OS-using organizations are actively engaged in migrating to Db2 13 from Db2 12, and plenty of sites have recently gotten to Db2 13. That being the case, it seems like a good time to remind people of an important step to take after migrating to Db2 13, while still at the V13R1M100 function level (the initial Db2 13 function level when you've migrated from Db2 12): rebind all your plans and packages.

First: why do this? For a couple of reasons. One would be to get a CPU efficiency benefit. The other has to do with the benefit of keeping package code current (I'll explain that "code" reference momentarily). Let me expand on both these rebind motivations.

Package REBIND motivation #1: improved CPU efficiency

Rebinding packages when you get to a new version of Db2 for z/OS can deliver CPU efficiency benefits in two ways. First (for packages associated with programs that issue static SQL statements), it lets the Db2 optimizer generate access paths for queries (and for "searched" UPDATEs and DELETEs - i.e., UPDATE and DELETE statements that have predicates) using options that may not have been available to the optimizer in the previous version of Db2. In some cases, new access path options can substantially improve a query's performance.

What if you're happy with the current access plan for a package (very often the case)? That brings us to the second source of CPU efficiency gain typically associated with a package rebind in a new-version Db2 environment: even when access paths don't change as a result of a REBIND (and they generally won't when you issue REBIND PACKAGE with the option APREUSE(WARN) or APREUSE(ERROR), which tells Db2 to re-use existing access paths for the package's SQL statements), it's quite likely that the rebind of a package in a new-version Db2 environment will yield at least some improvement in the CPU efficiency of package execution. How is that so? Well, as briefly mentioned above, a Db2 package contains executable code. For a package related to a program that utilizes static SQL, part of the package's executable code will be, in essence, the compiled form of the static SQL statements. See, there is no z/OS or IBM Z instruction called (for example) SELECT. A SELECT statement (or any other SQL statement) embedded in a Db2-accessing program has to be turned into code that can be executed in the z/OS system. That executable-form SQL is found in the program's Db2 package (itself originally generated from a BIND of the program's static SQL statements). And here's the thing: the IBM Db2 for z/OS development team is always working to "tighten up" executable-form SQL (i.e., to make the executable form of SQL statements ever more CPU-efficient). When you let Db2 13 re-generate the code in a package by rebinding that package, you typically will get some CPU efficiency benefit from the resultant "tightened up" code that Db2 13 can deliver - again, even when access paths for the package's SQL statements don't change.

Package REBIND motivation #2: package code currency

Efficiency benefits of tightened-up package code aside, there is another reason to let a new version of Db2 for z/OS regenerate the code for your packages via REBIND: it makes the package code more current in the sense that it is associated with the newer version of Db2. Why is that important? Because at some point too-old package code (i.e., code generated for a package via BIND or REBIND with an older version of Db2) will become un-executable for a newer version of Db2. That was the case in a Db2 12 for z/OS environment for packages last bound or rebound in a pre-Db2 10 system, and it is true in a Db2 13 environment for packages last bound or rebound in a pre-Db2 11 system (that is why activation of Db2 12 function level 510 - a prerequisite for migration to Db2 13 - will not be successful if you have any pre-Db2 11 packages still in use in your Db2 12 environment). If you keep your Db2 package code current by rebinding packages when you go to a new version of Db2, this issue of eventually-not-executable package code will be something about which you won't have to worry.

Good Db2 13 news regarding REBIND PACKAGE with APREUSE(WARN) or APREUSE(ERROR)

As I mentioned previously, it's often the case that a Db2 DBA team is fine with package access paths being what they are. In such situations, when doing a large-scale rebind of packages upon migrating to a new version of Db2, it makes sense to specify APREUSE(WARN) or APREUSE(ERROR) in the REBIND PACKAGE commands. Both forms of APREUSE tell Db2 to reuse existing access paths for a package's SQL statements, versus generating new access paths. The difference? With APREUSE(WARN) in effect, if Db2 cannot re-use an access path for one or more of a package's SQL statements, the rebind will go forward and Db2 will issue an information message about the access path change and will write information to EXPLAIN tables that will allow you to see just what changed. With APREUSE(ERROR) in effect, a REBIND PACKAGE action will fail if Db2 cannot reuse all access paths associated with a package's SQL statements.

Here's the good Db2 13 news regarding APREUSE(WARN) and APREUSE(ERROR):

The CPU efficiency of REBIND PACKAGE when APREUSE(WARN/ERROR) is specified has been improved, so a large-scale rebind with these specifications will consume less CPU time.
The rate of warnings, when APREUSE(WARN) is used, or errors, when APREUSE(ERROR) is used, should be lower in a Db2 13 system versus previous Db2 environments. In other words, the rate of success for access path reuse in a Db2 13 environment should be higher versus previous Db2 environments.

A word about APPLCOMPAT

When you do a large-scale rebind of packages after migrating to Db2 13, do you need to change the APPLCOMPAT value for your packages? No, you do not need to do this. As is the case for a Db2 12 system, in a Db2 13 environment APPLCOMPAT values as low as V10R1 are OK. Additionally, you should know that the CPU benefit of package rebind that I described previously does NOT require going to a higher APPLCOMPAT value when rebinding packages; and, if you want to let the Db2 13 optimizer utilize new access path options, by not using APREUSE(WARN) or APREUSE(ERROR) in a REBIND PACKAGE command, this does NOT require specification of a higher APPLCOMPAT value for the package in question - query optimization is unrelated to a package's APPLCOMPAT value.

If you'd prefer to take package APPLCOMPAT values higher when rebinding packages in a Db2 13 environment, go ahead and do that - just know that this is an option, and not a technical requirement.

A little more information here:

When a package is rebound and the REBIND PACKAGE command does not include an APPLCOMPAT specification, by default Db2 will just preserve the package's existing APPLCOMPAT value.
You can learn more about APPLCOMPAT via a part 1 and part 2 entry I posted to this blog a few years ago.

Remember that rebind phase-in will help you out

Rebind phase-in functionality, introduced with Db2 12 function level 505 and part of the Db2 13 base code, can be a BIG help when you're looking to rebind all of your packages following migration to Db2 13. How so? By eliminating REBIND contention with in-use packages. Prior to Db2 12 FL505, you couldn't rebind a package if the package was in-use at the time (i.e., if the package was allocated to a thread for execution), and this caused many REBIND PACKAGE commands to time out (especially if the target package was bound with RELEASE(DEALLOCATE) and allocated to one or more threads of the type that persist through COMMITs). With rebind phase-in functionality in effect (and this is automatic in a Db2 12 FL505 or higher environment, or in a Db2 13 system, with no special REBIND specification required), a REBIND package action will succeed even if the specified package is in-use at the time, and the rebind itself will not disrupt a related application workload.

Rebind phase-in functionality can be particularly helpful for rebinding packages that are very high-use in nature, as is often the case for (among others) the IBM Data Server Driver / Db2 Connect packages (referring to the packages whose default collection is the one named NULLID). At sites with high-volume, round-the-clock Db2 DDF (i.e., client-server) workloads, rebinding the NULLID packages could be a real challenge. Rebind phase-in removed that challenge.

NOTE: rebind phase-in functionality requires that the value of the Db2 ZPARM parameter PLANMGMT be set to EXTENDED. As that has long been the default value for PLANMGMT, it's probably the setting for your Db2 for z/OS systems, but you might want to double-check that.

Rebind plans, too - and note some distinctions versus packages

Yes, plans - as well as packages - should be rebound following a migration to Db2 13, but there are some difference versus packages that you should consider.

First, plan rebinds apply exclusively to your "local-to-Db2" applications, such as CICS-Db2 or IMS-Db2 transactions, or Db2-accessing batch jobs - DDF-using applications are not associated with Db2 plans in a technical sense (the DISTSERV plan related to DDF-using applications is for reporting and instrumentation purposes - it's not a Db2 plan in the technical sense).

Second, rebinding of plans when you get to Db2 13, while recommended, is not as high-priority an action as is rebinding of packages. Plans don't contain code that provides the executable form of SQL statements - it has been a LONG time since you were able to bind SQL statements directly into plans, and existing plans containing compiled SQL statement code had to be converted to package-related plans years ago (a Db2 plan points to one or more collections of packages that are executed by applications that utilize the plan). This being the case, you'll be more in take-your-time mode when it comes to rebinding plans in the Db2 13 environment, and that relatively diminished level of urgency is helpful in light of the fact that there is currently no plan-related equivalent of the phase-in functionality used for package rebinds.

Given that there can be contention when a REBIND PLAN command targets an in-use plan, timing can be important. For rebind of a plan used by a batch application, it generally won't be hard to find a time window during which the batch application is not running. Even if you have a round-the-clock transaction (CICS or IMS TM) that accesses Db2 for z/OS, you should be able to successfully execute a REBIND PLAN if you issue the command during a time of lower transaction volume and if the plan in question is not bound with the RELEASE(DEALLOCATE) option (in my experience, RELEASE(DEALLOCATE) is typically used for certain packages, and not for plans). If RELEASE(DEALLOCATE) is in effect for a plan you want to rebind, and that plan is associated with a round-the-clock transaction workload, and the plan is allocated to persistent threads (e.g., CICS protected threads or IMS WFI or pseudo-WFI threads), you might need to look at briefly going with non-persistent threads for the workload to enable successful execution of REBIND PLAN.

Summing it up

Rebinding your Db2 plans and packages after going to Db2 13 (while still at the V13R1M100 function level) should be part of your overall Db2 13 migration plan. Taking this action should yield some CPU efficiency gains (even when access paths for static SQL packages don't change), and the boost in package and plan code currency will help to facilitate migration to the version of Db2 that follows V13 (whenever "Vnext" makes the scene).

Tuesday, July 30, 2024

Db2 for z/OS: What Makes for a Balanced Mainframe Configuration?

What I'm writing about today is something I've been talking about for years, thought often in a somewhat indirect way. It has to do with mainframe memory (often referred to as "real storage" by z/OS people, to distinguish it from virtual storage) - more specifically, what it takes, memory-wise, to have what I'd call a balanced configuration for a z/OS LPAR (a z/OS LPAR, or logical partition, is a z/OS system; Db2 for z/OS runs within a z/OS LPAR, and a given mainframe "box" could house one or more z/OS LPARs). After working this year with a couple of situations involving z/OS LPARs with not-well-balanced configurations, I think it's time to address the topic directly.

I'll tell you right off that the gist of the matter is this: to realize the full productive potential of a z/OS LPAR's processing capacity, that capacity has to be balanced by an appropriate amount of system memory.

It's at this point that an IT person who administers so-called distributed systems servers (Linux, UNIX or Windows servers) could be expected to say, "Duh." See, distributed systems administrators have talked about "balanced configuration units" (aka BCUs) for many years (when I was in the IT organization of a financial services company back in the early 2000s, I regularly saw references to BCUs). The BCU concept was simple and straightforward: a distributed systems server with X number of processor cores should have at least Y amount of memory (BCU, in this context, often extended to disk storage, as well - that's not so much the case with z/OS systems, owing in large part to the architecture of the z/OS I/O subsystem).

Oddly enough, I generally didn't hear much talk about balanced configurations among z/OS people. In fact, back around 2008 or so things started to get more un-balanced at some z/OS sites. That was when IBM delivered the z10 line of mainframe servers, a line that pretty dramatically increased processing power relative to its predecessor. What then happened in more than a few instances is that z/OS LPAR processing capacity raced ahead of real storage resource growth. Why did this happen? I'm not certain, but one possibility is the fact that IBM Z (the official product designation for mainframe servers) and z/OS were a bit late to the 64-bit party, referring to 64-bit addressing, which hugely expanded virtual storage resources (to 16 exabytes, from the 2 gigabyte limit imposed by 31-bit addressing) and offered, as well, access to vastly larger real storage resources. z/OS system administrators had for years worked hard to fit huge mainframe workloads into virtual and real storage spaces that could not exceed 2 GB, and when enormously large real storage resources became a possibility, it seemed to me that there was almost a reluctance on the part of some z/OS systems people to ask for more memory than they'd used before. Maybe people felt that requesting a lot more memory for a z/OS system was akin to "taking the easy way out" by just "throwing hardware at the problem" of keeping up with application workload demands.

Distributed systems people had no such qualms about requesting lots of memory to go with lots of processing power.

Anyway, at a number of z/OS sites things got pretty rough with regard to delivering required levels of application performance and throughput because of a lack of real storage resources, and these issues could be particularly acute for z/OS systems supporting Db2 for z/OS workloads. Why is that? Because Db2 for z/OS, from the beginning, was architected to take advantage of large virtual and real storage resources. See, when Db2 for MVS (as it was originally known) was announced back in 1983 (as I recall - it was the second year of my first stint with IBM), the MVS/ESA operating system was right around the corner - and with it, Db2 for MVS/ESA. MVS/ESA provided 31-bit addressing, taking the old 16 megabyte virtual and real storage limit (can you believe that?) associated with 24-bit addressing up to a then-astounding 2 gigabytes. That great big increase in addressability allowed the buffering of large amounts of Db2 data in memory, and that was a huge factor in generating the performance that helped Db2 become a tremendously popular DBMS for z/OS systems (the "We have liftoff" for Db2 moment came when its performance capability matched the huge programmer productivity boost associated with SQL and the relational database model, both of which IBM invented).

Fast-forward lots of years, and you get to that time - starting around the late 2000s, by my reckoning - when, as noted, z/OS processing capacity got seriously out in front of real storage resources at a number of Db2 for z/OS sites. What this sometimes meant for those sites: either the Db2 buffer pool configuration was much smaller than it should have been, due to the lack of real storage for the z/OS LPAR, resulting in very high read I/O rates that impeded application throughput and increased CPU consumption; or, the Db2 buffer pool configuration - while still perhaps on the small side - was too big relative to the undersized real storage resource, leading to elevated levels of demand paging in the z/OS system, with resultant adverse impacts on performance (a production z/OS LPAR's demand paging rate, available via an IBM RMF CPU Summary report, should be less than 1 per second). Plenty of organizations have acted in recent years to get real storage in line with processing capacity for their Db2-containing z/OS LPARs, but quite a few still have production Db2 for z/OS subsystems running in z/OS LPARs that are under-configured from a real storage perspective. It's a good time to achieve balance for Db2-related z/OS LPARs that currently have too little system memory.

OK, so what does a balanced configuration look like for a z/OS LPAR in which a production Db2 subsystem runs? My rule of thumb, based on years of reviewing performance data for production Db2 for z/OS systems, is this: the z/OS LPAR should have at least 20 GB of real storage per engine (i.e., per processor) - and that's regardless of the mix of general-purpose and zIIP engines configured for the LPAR. Here's an example: suppose you have a z/OS LPAR, in which a production Db2 subsystem runs, that is configured with 5 general-purpose and 3 zIIP engines. I'd say that to effectively balance that LPAR's processing capacity with real storage, you'd want the LPAR to have at least 160 GB of real storage (8 engines X 20 GB - at least - per engine). I'm emphasizing "at least" because you don't need to stop at 20 GB of real storage per engine. Some Db2 for z/OS-using organizations have production z/OS systems with 30 GB, 40 GB, 50 GB or more of real storage per engine. Is that over-doing it? Not in my view. These organizations have done things like super-sizing buffer pools; going big for other performance-relevant Db2 memory areas such as the package cache, the database descriptor cache, the prepared dynamic statement cache, the sort pool and the RID (row ID) pool; and boosting the use of RELEASE(DEALLOCATE) packages combined with persistent threads (i.e., Db2 threads that persist across commits) - a move that, like the others, ups Db2's memory usage and drives down per-transaction CPU consumption. The point: give Db2 for z/OS enough memory, and it will perform very well. Give it even more memory, and it will perform even better.

[Here's an important cost factor to consider: upping the real storage resource for a z/OS LPAR will not cause software costs to increase for the LPAR. z/OS software costs are based on general-purpose processor utilization, not real storage size.]

You don't have to take my word on the importance of large memory resources for the performance of a Db2 for z/OS system - just look at the trend lines:

z/OS LPAR real storage sizes are in fact getting steadily larger over time, driven in large part by positive results achieved through leveraging big memory for improved Db2 for z/OS workload performance and CPU efficiency (and fueled as well by the continuing drop in the cost of mainframe memory on a per-gigabyte basis). Reference point: the largest real storage size I've seen with my own eyes for a z/OS LPAR in the real world (versus an IBM performance benchmark system) is 2.4 terabytes. The Db2 for z/OS subsystem running in that LPAR has a buffer pool configuration size (aggregate size of all allocated buffer pools) of 1.7 TB, and the LPAR's demand paging rate is zero (the 700 GB of real storage beyond the Db2 buffer pool configuration size is more than enough for the rest of the system's storage requirements).
z/OS 3.1 (the current version of the operating system) provides support for up to 16 terabytes of real storage for one z/OS LPAR (up from 4 TB previously). Caveat: with the current packaging of memory for the IBM z16 line of mainframe servers, it's recommended that you not go beyond 10 TB of real storage for a single z/OS LPAR - that restriction will likely be gone at a future time.
Since Db2 12 for z/OS (the current version is Db2 13), a single Db2 subsystem can have a buffer pool configuration size of up to 16 TB (that's a prep-for-the-future thing - you generally want your buffer pool configuration size to be less than the real storage size of the associated z/OS LPAR, and as noted that size is currently limited to 16 TB, with a recommended maximum of 10 TB for an LPAR on a z16 server).
Multiple virtual storage-related (and, therefore, real storage-related) Db2 for z/OS configuration parameters (known as ZPARMs) have default values for Db2 13 that are substantially greater than the corresponding Db2 12 values. These parameters include those that specify the size of the package cache, the database descriptor cache, the sort pool and the log output buffer. The substantially larger default values for these parameters in a Db2 13 environment reflect the awareness of the IBM Db2 for z/OS development team that z/OS LPAR real storage sizes are getting larger in an ongoing way, and a reaffirmation that leveraging larger real storage resources is a winner for Db2 performance and CPU efficiency.

If you have a Db2-housing z/OS LPAR that is under-configured in terms of real storage, do what you can to change that situation. Yes, when 2 GB was the real and virtual storage limit (the latter referring to the address space size limit) for z/OS systems, Db2 for z/OS people went to heroic lengths to push huge transaction rates through LPARs that were very much memory-constrained. That was then. These days, no one gets a medal for "doing the best you can" with a Db2 for z/OS system that is sorely lacking in the real storage department. Take a cue from your peers on the distributed systems side of the house. Memory matters.

Wednesday, June 26, 2024

Db2 for z/OS: Getting From Function Level 500 to 510 in a Db2 12 Environment

It's pretty well known that getting the activated function level of a Db2 12 for z/OS system to V12R1M510 is required prior to migrating that Db2 12 system to Db2 13 (I blogged about that a couple of years ago). At present (June, 2024), there are still a good number of sites running Db2 12 for z/OS with an activated function level of V12R1M500. I sometimes get, from people at such sites, a question like this: "How should we go from Db2 12 function level 500 to function level 510?" Generally speaking, what these folks are wanting is for me to opine on going straight from Db2 12 function level 500 to function level 510, versus getting from FL500 to FL510 by first going to some "in-between" function level and then going to 510. In this blog entry, I'll tell you what I tell people who ask me this "getting from A to B" question.

Right off the bat, I'll say that I'd be comfortable with either approach - in other words, there is not a "right" and a "wrong" way to go from Db2 12 FL500 to FL510. Let's consider first the "one fell swoop" approach of going straight from Db2 12 FL500 to FL510. What is required, in a technical sense, to do this? The code level of the Db2 12 system (which you can check using the output of the Db2 command -DISPLAY GROUP) has to be 121510. The catalog level (also check-able with -DISPLAY GROUP) has to be V12R1M509 (there is no 510 catalog level; and, you can use the CATMAINT utility to take the catalog level to V12R1M509 if it is not already there). Additionally, you can't have any packages that have been executed within the past 1.5 years that were last bound or rebound prior to Db2 11 (this "no pre-Db2 11 still-in-use packages" requirement is explained in the aforementioned blog entry about preparing to migrate from Db2 12 to Db2 13).

Let's say those technical prerequisites are in-place. Given that you could, in that case, go straight from Db2 12 FL500 to FL510, why wouldn't you? What would hold you back? In my experience, the chief concern is often something like this: "If we make the big jump from Db2 12 FL500 to FL510, I'm afraid that will mess up some of our Db2-accessing applications."

That concern, though understandable, is essentially unfounded if you don't change package APPLCOMPAT values when making the big jump in the activated function level of the Db2 12 system. I've written plenty about APPLCOMPAT, including a part 1 and part 2 entry in this blog. The important thing to remember about APPLCOMPAT, in relation to a change in the activated function level of a Db2 system, is this: a package's APPLCOMPAT value can serve to insulate an associated application program from SQL behavioral changes that could be introduced via a change to a higher activated function level in a Db2 for z/OS environment. Let's unpack that statement. A SQL behavioral change - referred to in the Db2 for z/OS documentation as a "SQL incompatibility" - can be succinctly described thusly: same SQL, same data, different result. That might sound kind of alarming. Here's the two-pronged good news: 1) SQL incompatibilities are quite few and far between (they are documented whenever they are introduced by a new Db2 for z/OS version or function level), and 2) they tend to affect either very few or none of an organization's Db2-accessing applications (they are usually associated with highly specialized, often unusual scenarios).

Now, while it's highly unlikely that any of an organization's Db2-accessing programs would be negatively impacted by a SQL incompatibility associated with a newer Db2 function level, I can't say that the risk of application impact related to a change in activated Db2 function level is zero if applications are exposed to SQL behavioral changes. That's where APPLCOMPAT comes in. See, if the Db2 package used by an application program has an APPLCOMPAT value of (for example) V12R1M100, SQL behavior for that program will be that of a Db2 12 system with function level 100 activated, even if the Db2 environment in which the program is executing has an activated function level of V12R1M510 (or V13R1M100 or some other Db2 13 function level, after the system has been migrated to Db2 13). That is how APPLCOMPAT insulates a Db2 application program from SQL behavioral changes that might be introduced when a higher function level is activated for the Db2 system in which the program's SQL statements execute. So, make a jump in activated function level from 500 to 510 in your Db2 environment, and leave package APPLCOMPAT values as they are, and your Db2 application programs will be (conceptually speaking) unaware of the environmental change.

OK, so I'd be plenty comfortable in taking a Db2 12 system's activated function level from 500 right to 510, and I hope you'd be similarly comfortable with such a move. You might find, however, that "selling" the powers that be on a plan for getting to Db2 12 FL510 from FL500 would be easier if that plan included a "rest stop" at a function level between 500 and 510. If that were the case, you could go that route and not lose any honor in my eyes; and, yours wouldn't be the first Db2-using organization to take the 2-step approach for getting from Db2 12 FL500 to FL510. I can tell you that a pretty popular intermediate stop on the way from Db2 12 FL500 to FL510 is FL505. This is true for a couple of reasons: stopping for a while (for however long makes people comfortable: a few weeks, a couple of months - whatever) right between FL500 and FL510 appeals to some folks. Additionally, Db2 12 FL505 makes available to you one of my favorite Db2 12 features: rebind phase-in.

Alright, to sum up: if your organization has been sitting at Db2 12 FL500 for a long time, and you want to get to FL510 so you can go from there to Db2 13, consider taking one of two routes from A to B:

Straight through - If your code level and catalog level and no-still-in-use-pre-Db2-11-packages ducks are all in a row, go from Db2 12 FL500 straight to FL510, keeping in mind that leaving package APPLCOMPAT values as they are is a way to insulate Db2-accessing programs from any impact related to the Db2 environmental change.
Go part way, sit a spell, then go the rest of the way. As noted, Db2 12 FL505 is a popular rest stop en route from FL500 to FL510.

The main point is, GET THIS DONE. Db2 12 goes out of service after December 31, 2025. You still have plenty of time to get to Db2 12 FL510 and then on to Db2 13, but if you sit back too long you could end up feeling a little anxious about the whole thing. Who needs that? I hope information in this blog entry will help you get your organization's Db2 12-to-13 show on the road.

Thursday, May 30, 2024

Db2 for z/OS: Really Big Buffer Pools are Great, but Don't Stop There

Back in 2018, I reviewed an organization's production Db2 for z/OS environment, and saw at that time the largest z/OS LPAR real storage size I'd ever seen: 1100 GB. The Db2 subsystem running in that LPAR had (not surprisingly) the largest buffer pool configuration I'd ever seen: 879 GB (referring to the aggregate size of all of the Db2 subsystem's buffer pools). Within that buffer pool configuration was (again, not surprising) the lagest single Db2 buffer pool I'd ever seen: approximately 262 GB (66,500,000 4K buffers).

Those "biggest I've ever seen" figures sustained that status until just recently, when I reviewed another production Db2 for z/OS system. The new "biggest I've ever seen" numbers are way larger than the previous maximums:

A z/OS LPAR with 2.4 terabytes of real storage
A Db2 subsystem with 1.7 terabytes of buffer pool resources.
An individual buffer pool of approximately 560 GB in size (140,000,000 4K buffers)

What I saw is great, and it's the future, folks. That humongous buffer pool configuration is eliminating a humongous number of read I/Os (synchronous and asynchronous), and that, in turn, boosts CPU efficiency and throughput for the Db2-accessing application workload. z/OS LPAR real storage sizes are getting bigger and bigger, in part because mainframe memory keeps getting less expensive on a per-GB basis, and in part because z/OS can utilize ever-larger amounts of real storage (z/OS 3.1 supports up to 16 TB of memory for one LPAR - though with the way memory is currently packaged on an IBM Z server, you're advised to limit the size of one LPAR's real storage resource to 10 TB), and in part because loading up on real storage does not increase the cost of software that runs in a z/OS system (the cost of that software is based on mainframe general-purpose CPU utilization - not real storage size).

Organizations that run production Db2 subsystems in z/OS LPARs that have large amounts of real storage are increasingly going with really large Db2 buffer pool configurations. That's a smart thing to do. Db2 allows for up to 16 TB of buffer pool space for a single subsystem, and going big is a definite plus for Db2 workload performance and CPU efficiency (just be sure that you don't over-burden a z/OS LPAR's real storage resource: you want to keep the LPAR's demand paging rate - available via an RMF Summary report - below 1 per second).

Here's the point I want to make with this blog entry: while really big buffer pools are great for Db2 application performance, don't focus solely on the size of a Db2 subsystem's buffer pool configuration. In addition to going big, take other steps to maximize the positive performance benefits of a large Db2 buffer pool configuration:

Use large real storage page frames for the busier buffer pools - In my view, a buffer pool's GETPAGE rate (obtainable from a Db2 monitor-generated statistics long report) is the best indicator of a pool's busy-ness. My recommendation is to use large real storage page frames for every pool that has a GETPAGE rate (during a busy hour of the processing day) in excess of 1000 per second. Large real storage page frames enhance CPU efficiency for page access by reducing the CPU cost of translating virtual storage addresses to real storage addresses. Some related things to note:

Page-fixing of buffers in a pool is a prerequisite for using large real storage page frames - This is done for a pool via a specification (in an -ALTER BUFFERPOOL command) of PGFIX(YES). Note that actually changing from not-page-fixed to page-fixed requires deallocation and reallocation of a pool - this usually happens as a consequence of recycling (i.e., stopping and then restarting) the associated Db2 subsystem. That deallocation and reallocation is also required to go from the default real storage frame size of 4K to a larger frame size.
Use the appropriate real storage frame size - For a buffer pool that is at least 20 GB in size (a little over 5,000,000 buffers, if we're talking about a 4K pool), I recommend the use of 2 GB page frames (this is accomplished via a specification of FRAMESIZE(2G) in an -ALTER BUFFERPOOL command). For a pool that is smaller than 20 GB, I recommend the use of 1 MB page frames (FRAMESIZE(1M)). Note that if a pool is defined with PGSTEAL(NONE) (see below for information about such pools, which are known as "contiguous" pools), 2 GB frames cannot be used - you need to go with FRAMESIZE(1M) for a PGSTEAL(NONE) pool, unless objects assigned to the pool are quite small (e.g., less than 100 4K pages), in which case the default 4K frame size would be appropriate (for a PGSTEAL(NONE) pool, Db2 will use a given large real storage page frame for buffers that hold pages of one object assigned to the pool).
Set the value of the z/OS parameter LFAREA appropriately - LFAREA, a parameter in the IEASYSnn member of the z/OS data set named SYS1.PARMLIB, specifies the amount of a z/OS LPAR's real storage resource that is to be managed in 1 MB (and, possibly, 2 GB) frames. I generally like to see an LFAREA specification that provides enough 1 MB (and 2 GB, when appropriate) frames to fully back FRAMESIZE(1M) (and, if relevant, FRAMESIZE(2G)) pools, but not much more than that. Why not much more? Because a lot of processes in a z/OS system can only use 4 KB page frames. If you think you might want to later enlarge some Db2 buffer pools that have a FRAMESIZE(1M) or a FRAMESIZE(2G) specification, you can make the LFAREA value for the frame size in question larger than you need at present - just don't go overboard with that.
Use the output of the Db2 command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL to verify that pools that have a preferred frame size of 1M or 2G are fully backed by frames of the desired size - If you see, in the output of that command, that a pool with a VPSIZE of 100,000 buffers and a preferred frame size of 1M has 70,000 buffers allocated in 1M frames and 30,000 buffers allocated in 4K frames, it means that there aren't enough 1 MB frames in the system to fully back the pool. Note that for a pool that has a preferred frame size of 2G, you might see some buffers allocated in 1M frames, even if the system has enough 2 GB frames to "fully" back the pool. Why might that be the case? Well, in the interest of not wasting a lot of space in a 2 GB page frame, Db2 won't use such a frame if it can't fill at least about 95% of the frame with buffers. Suppose you have a pool that is about 41 GB in size. Db2 will use twenty 2 GB frames for that pool, and the remaining 1 GB of buffers will be allocated in 1 MB frames, As far as I'm concerned, that is not at all a big deal. No problem.

If you have any PGSTEAL(NONE) buffer pools, aim for zero activity in the overflow area of those pools - A PGSTEAL(NONE) buffer pool is also known as a "contiguous" buffer pool. Such a pool is intended to be used to completely cache in memory the objects assigned to the pool. A contiguous buffer pool can optimize efficiency for page access, but some of that advantage is lost if there is any activity in the pool's overflow area, which is where any buffer stealing would occur (if the pool were in fact not large enough to hold every page of every object assigned to the pool). Output of the Db2 command -DISPLAY BUFFERPOOL (ACTIVE) DETAIL will show if there is any activity in the overflow area of a PGSTEAL(NONE) pool. If you see that there is activity in the overflow area of a PGSTEAL(NONE) pool (unless it's a really small amount of activity), either take one or more objects out of the pool (by reassigning them to other pools) or make the pool larger (always keeping in mind that you want the demand paging rate of the z/OS LPAR to be less than 1 per second - don't over-burden the LPAR's real storage resource).

And there you have it. For a Db2 for z/OS buffer pool configuration, really big is really great, but take the time to go the extra mile by optimally configuring the pools in the configuration.

Monday, April 29, 2024

Db2 13 for z/OS: Utility Execution History

A few months ago, I got this question from a Db2 for z/OS DBA: "Is there a way we can check to see if the UNLOAD utility has been executed for certain of our tables?" The environment in question was a Db2 12 for z/OS system, and because of that I had to provide an answer that was less-than-great from an ease-of-use persepective: "Yes, you can do that in a Db2 12 environment, but the process is going to be a bit cumbersome. You could set up a Db2 audit policy that would record utility execution. What that will do is cause Db2 to write certain trace records when utilities are executed. You'd need to use your Db2 monitor to format that trace information into a human-readable report (using IBM's OMEGAMON monitor for Db2 for z/OS, that would be by way of a Record Trace report); and, you'd have trace records written for EVERY utility execution during the reporting time interval - potentially, a lot of data to sift through; and, there would be trace records generated for every utility executed for ANY database object - not just the tables in which you're interested (more data to sift through)."

"Alternatively, you could set up a different audit policy that would limit trace output to the tables of interest, but that policy would cause Db2 to generate trace records for ALL operations performed involving that set of tables, whether associated with a utility or with an application process - again, potentially a lot of trace data to sift through."

Not the idea situation, but along comes Db2 13 for z/OS to make things better - in two steps.

Step 1: Once you've activated function level V13R1M501, you can take advantage of a new ZPARM parameter, UTILITY_HISTORY. The default value for that parameter is NONE - not because we (IBM) want to discourage you from using a new Db2 capability; rather, this is in keeping with the traditional approach for a new ZPARM: the initial default value preserves existing behavior. "Existing behavior" in this case is "no utility history," because utility history functionality was not available prior to Db2 13 function level 501. If you set UTILITY_HISTORY to UTILITY, something nice happens: every time a utility executes, Db2 will insert a row into the SYSIBM.SYSUTILITIES table (newly added when CATMAINT is executed to take the catalog to the V13R1M501 level). That row for a utility in SYSUTILITIES provides a lot of useful information, such as:

The name of the utility (e.g., UNLOAD, or COPY, or REORG, or ...)
The name of the job through which the utility was executed
The user ID that invoked the utility
The starting point of the utility, both timestamp-wise and logpoint-wise
The timestamp at which the utility completed processing
The elapsed time for the utility job
The general-purpose CPU time and zIIP CPU time for the utility job
The portion of CPU time (both general-purpose and zIIP) consumed with sort activity related to the utility job
The utility job's return code
Whether the utility job was terminated (and, if so, whether it was restarted)

Step 2: After you've activated function level V13R1M504, you can change the value of UTILITY_HISTORY in ZPARM to OBJECT. When you do that, execution of a utility will cause Db2 to insert into SYSUTILITIES a row with the useful informational items noted above, and in addition to that Db2 will insert into SYSIBM.SYSOBJEVENTS a row for each object (non-partitioned table space or index, or a partition of a table space or index) processed by the utility (SYSOBJEVENTS makes its debut when the catalog is taken to the V13R1M504 level).

[Note: prior to activation of function level V13R1M504, there is some capability to obtain information about the object(s) processed by a utility by matching SYSUTILITIES rows with SYSCOPY rows based on values in the EVENTID column which is found in both tables. I say, "some capability," because not every utility drives SYSCOPY insert activity - that's what makes SYSOBJEVENTS important.]

Back to the question, referenced at the start of this blog entry, about tracking UNLOAD activity for certain tables. In a Db2 13 environment with function level V13R1M504 (or higher) activated, and with UTILITIES_HISTORY set to OBJECT in ZPARM, this question can be answered simply by querying SYSUTILITIES with a WHERE NAME = 'UNLOAD' predicate, and joining with SYSOBJEVENTS on the EVENTID column to get the related table space names (and of course if you are wanting to track UNLOAD activity for particular tables you can have a WHERE clause for the SYSOBJEVENTS table that references the name of the table space or table spaces of interest).

[I recognize that UNLOAD can be executed at a table level, and that a table space could have multiple tables, but I'd hope that your sensitive-data tables are in single-table table spaces because 1) it's best to have tables in universal table spaces, and a universal table space is related to a single table; and 2) I think that data protection can be maximized when a sensitive-data table is isolated in its own table space.]

Needless to say, tracking UNLOAD activity is just one of many ways in which Db2 13 utility history data can be beneficially utilized. In section 8.1.7 of the IBM "redbook" titled, IBM Db2 13 for z/OS and More, you'll find examples of queries of SYSUTILITIES that can be used to gather all kinds of helpful information, such as:

All the utilities that executed between midnight and 8:00 AM this morning
All the utilities that are currently active or stopped
All the utilities that finished with a return code of 8 or higher
The top general-purpose CPU-consuming LOAD jobs over the past 7 days
And more...

Those sample queries can help you get started, and soon you and your teammates will come up with your own queries that yield valuable info on utility execution at your site (and keep in mind that the queries in the above-referenced redbook don't include SYSOBJEVENTS because that catalog table was not yet present when the redbook was written - you can of course extend the queries with joins to SYSOBJEVENTS).

This is a Db2 13 enhancement that I like a lot. I think you'll like it, too. Check it out.