<p><span style="font-family: arial;"><b>Robert's Db2 blog</b> - This is the blog of Robert Catterall, an IBM Db2 for z/OS specialist. The opinions expressed herein are the author's, and should not be construed as reflecting official positions of the IBM Corporation.</span></p><p><span style="font-family: arial;"><b>Db2 13 for z/OS: Now You Can Dynamically Remove, as Well as Add, Active Log Data Sets</b> (February 27, 2024)</span></p><p><span style="font-family: arial;">Db2 10 for z/OS (which came out back in 2010) provided a new capability related to management of a Db2 subsystem's active log data sets (known, along with the archive log data sets, as the subsystem's "log inventory"). The enhancement: the NEWLOG option of the Db2 command -SET LOG. With this added functionality, a Db2 for z/OS system administrator could add new active log data sets to a Db2 subsystem's log inventory, <i>without having to bring the subsystem down</i> (the system administrator would probably, in fact, add new <u>pairs</u> of active log data sets, as you always want to use dual logging to avoid a single point of failure for system and data recovery operations). Prior to this Db2 10 enhancement, adding active log data sets to a subsystem's log inventory could only be accomplished through execution of the DSNJU003 utility (also referred to as the "change log inventory" utility), and DSNJU003 can only be executed when the target Db2 subsystem is down.</span></p><p><span style="font-family: arial;">The ability to dynamically add pairs of active log data sets to a Db2 subsystem's log inventory was welcomed by many Db2 people, and you can probably imagine why. A Db2 subsystem's active log data sets can be thought of, logically, as a ring of data sets around the Db2 subsystem. Suppose there are 20 pairs of active log data sets in this logical ring. Db2 writes information to pair #1; when that pair of data sets is filled up, information is written to pair #2; when that pair is filled up, information is written to pair #3; and so on around the ring. Meanwhile, not long after the filling up of active log data set pair #1, the information written to that pair of data sets will be copied to a pair of archive log data sets, and that action will make active log data set pair #1 reusable, so that new information can be written to that pair of data sets when Db2 comes back around the ring to them. The same archive operation is performed for other active log data set pairs after they have been filled up, making them reusable when their turn in the rotation comes up again to be the current active log data set pair.</span></p><p><span style="font-family: arial;">All well and good - unless something goes wrong with the archive log write process. If filled-up active log data set pairs can't be archived, they can't be made reusable, and when Db2 has gone around the ring and comes back to the not-reusable active log data set pairs, logging will stop - and when logging stops, just about everything stops.
In a pre-Db2 10 environment, you could add active log data set pairs to a subsystem's log inventory to buy more time (by providing more space for logging) as you worked to fix whatever was impeding the log archiving process, but at the cost of stopping the Db2 subsystem in order to execute the DSNJU003 utility. Not good. Being able to buy extra fix-the-archiving-problem time by <u>dynamically</u> adding new pairs of active log data sets to a Db2 subsystem's log inventory, while the subsystem was still up and running, made for a much better situation.</span></p><p><span style="font-family: arial;">Fast-forward to Db2 13 for z/OS, and now we get (once function level V13R1M500 has been activated) the ability to dynamically <u>remove</u> active log data set pairs, thanks to the new REMOVELOG option of the -SET LOG command. The value of dynamic (i.e., while the Db2 subsystem is up and running) removal of active log data set pairs is as a complement to the dynamic-add functionality we've had since Db2 10. Together, the NEWLOG and REMOVELOG options of the -SET LOG command provide a capability that can be very useful - namely, online <u>replacement</u> of a Db2 subsystem's active log data set pairs with <i>better</i> data set pairs.</span></p><p><span style="font-family: arial;">"Better?" How so? Well, usually this will mean bigger and/or encrypted. Let's take the data set size case. Suppose you have a production Db2 subsystem that has 20 pairs of active log data sets, each data set being 2 GB in size. You're going through those active log data sets faster than you'd like - maybe filling up three or four (or more) pairs in an hour when the system is busy. You'd rather have active log data sets that are 8 GB apiece, versus 2 GB (Db2 12 for z/OS took the maximum size of an active log data set from 4 GB to 768 GB). Can you go from 2 GB active log data sets to 8 GB active log data sets without stopping the Db2 subsystem? With Db2 13, you can. 
Here's how that would work:</span></p><ol style="text-align: left;"><li><span style="font-family: arial;">You dynamically add 20 pairs of active log data sets that are sized at 8 GB apiece, using the NEWLOG option of the -SET LOG command (a Db2 subsystem can have up to 93 pairs of active log data sets).</span></li><li><span style="font-family: arial;">After the older and smaller active log data sets have been archived, dynamically remove them from the Db2 subsystem's log inventory via the new (with Db2 13) REMOVELOG option of the -SET LOG command.</span></li></ol><p><span style="font-family: arial;">Now you have 20 pairs of active log data sets, each sized at 8 GB, when before you had 20 pairs of active log data sets sized at 2 GB apiece - and in getting from A to B you never had to stop the Db2 subsystem.</span></p><p><span style="font-family: arial;">The same approach could be used to go from 20 pairs (for example) of unencrypted active log data sets to 20 pairs of encrypted active log data sets in an online way (referring here to exploitation of the data set encryption feature of z/OS):</span></p><ol style="text-align: left;"><li><span style="font-family: arial;">Dynamically add 20 pairs of active log data sets with which an encryption key label was associated at data set creation time.</span></li><li><span style="font-family: arial;">When the older unencrypted data sets have been archived, dynamically remove them from the Db2 subsystem's log inventory.</span></li></ol><p><span style="font-family: arial;">In these example use cases, I've used the phrase, "when the older (smaller and/or unencrypted) active log data sets have been archived, dynamically remove them." That suggests that trying to dynamically remove a not-yet-archived active log data set could be problematic. Do you need to worry about this? No. Why not? Because Db2 won't let you accidentally shoot yourself in the foot when using the REMOVELOG option of -SET LOG. Specifically:</span></p><ul style="text-align: left;"><li><span style="font-family: arial;">Db2 won't let you remove an active log data set to which it is currently writing information.</span></li><li><span style="font-family: arial;">Db2 won't let you remove a log data set in the pair that is next in line for the writing of log information.</span></li><li><span style="font-family: arial;">Db2 won't let you remove an active log data set that has not been archived (i.e., an active log data set that is not in REUSABLE status).</span></li><li><span style="font-family: arial;">Db2 won't let you remove an active log data set that is currently in use (for example, an active log data set that is being read by a RECOVER utility job).</span></li></ul><p><span style="font-family: arial;">If you try to dynamically remove an active log data set to which Db2 is currently writing, or one that is next in line for writing, or one that has not been archived (i.e., is not in the REUSABLE state), the -SET LOG command will fail with message DSNJ391I. If the active log data set you're trying to dynamically remove has none of these characteristics but is currently in use by some process, that data set will be marked as REMOVAL PENDING, and message DSNJ393I will be issued. In that case, you can remove the data set from the log inventory by issuing -SET LOG with REMOVELOG again when the data set is no longer in use.</span></p>
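<p><span style="font-family: arial;">To make the two-step swap a bit more concrete, here's a rough sketch of what the commands might look like for the first new pair and the first old pair (the data set names are made up, and I'm assuming the new log data sets have already been defined - and, ideally, preformatted with the DSNJLOGF utility - before the NEWLOG commands are issued):</span></p><p><span style="font-family: courier;">-SET LOG NEWLOG(DSNP.LOGCOPY1.DS21) COPY(1)<br />-SET LOG NEWLOG(DSNP.LOGCOPY2.DS21) COPY(2)<br />...<br />-SET LOG REMOVELOG(DSNP.LOGCOPY1.DS01) COPY(1)<br />-SET LOG REMOVELOG(DSNP.LOGCOPY2.DS01) COPY(2)</span></p><p><span style="font-family: arial;">And so on for the remaining pairs. If a REMOVELOG command targets a data set that some process is still reading, no harm done - as noted above, the data set just goes to REMOVAL PENDING, and you can re-issue the command later. 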
Alternatively, if the Db2 subsystem is standalone in nature (as opposed to being a member of a Db2 data sharing group), the data set will be removed from the log inventory automatically when the subsystem is next recycled (in a data sharing environment, subsequent re-issuance of -SET LOG with the REMOVELOG option is required to remove a REMOVAL PENDING data set from the log inventory). Note that if an active log data set has been marked as REMOVAL PENDING, it will not be used again by Db2 for read or write purposes. Note also that information about an active log data set that is in REMOVAL PENDING status can be checked via the output of the Db2 command -DISPLAY LOG DETAIL (the DETAIL option was added with function level 500 of Db2 13). When you see, in the output of -DISPLAY LOG DETAIL, that an active log data set in REMOVAL PENDING status has 0 readers, you know that it is no longer in use and can be physically removed from the log inventory with another issuance of -SET LOG with REMOVELOG.</span></p><p><span style="font-family: arial;">One more thing: I have been referring to removal of an active log data set "from the log inventory" of a Db2 subsystem. In the Db2 documentation, you'll see references to removal of an active log data set "from the BSDS" of a Db2 subsystem. The documentation is saying the same thing I'm saying. The BSDS - short for bootstrap data set - contains information about a Db2 subsystem's active and archive log data sets.</span></p><p><span style="font-family: arial;">OK, there you have it. If you want to upgrade your active log data sets in one or more ways - maybe bigger than they are now, maybe encrypted versus unencrypted - then the REMOVELOG option of -SET LOG (thanks, Db2 13), together with the NEWLOG option (thanks, Db2 10), is your ticket for getting that done without having to stop the Db2 subsystem in question. Just another way that Db2 for z/OS enables you to take high availability higher than ever before.</span></p><p><span style="font-family: arial;"><b>Db2 for z/OS: Stop Making APPLCOMPAT in ZPARM More Important Than It Is</b> (January 31, 2024)</span></p><p><span style="font-family: arial;">The APPLCOMPAT option of the Db2 for z/OS BIND and REBIND PACKAGE commands is really important - that's why I posted <a href="https://robertsdb2blog.blogspot.com/2019/06/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 1</span></a> and <a href="https://robertsdb2blog.blogspot.com/2019/07/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 2</span></a> blog entries on the topic back in 2019. The APPLCOMPAT parameter in ZPARM, on the other hand (referring to DSNZPARM, the load module that contains a Db2 subsystem's configuration parameter settings), is less important. I pointed this out in part 1 of the aforementioned two-part blog entry on APPLCOMPAT, but I still find that plenty of Db2 for z/OS people ascribe significance to the ZPARM parameter APPLCOMPAT that just doesn't jibe with reality.
That being the case, I am writing this blog entry in the hope that it will help to drive home the point that the ZPARM parameter called APPLCOMPAT should (generally speaking) not be the main focus of your APPLCOMPAT-related concerns.</span></p><p><span style="font-family: arial;">To illustrate the point that plenty of people continue to over-inflate the importance of the APPLCOMPAT parameter in ZPARM, I'll share with you a question that a Db2 for z/OS person sent to me by way of one of my colleagues. The question was, basically, "We are getting ready to activate Db2 12 function level V12R1M510 (a prerequisite for migration from Db2 12 to Db2 13 for z/OS). Can we be pretty much assured that doing this will not cause SQL behavioral changes <i><b>if we leave the value of APPLCOMPAT in ZPARM unchanged</b> </i>(they had this ZPARM parameter set to V10R1)?" In responding to this question, I explained that in advancing a Db2 system's active function level, one can indeed protect application programs from the risk of SQL behavioral changes (I'll explain what that means in a moment), but, I noted, this SQL behavioral change protection is provided by the APPLCOMPAT <u>package bind specification</u>, NOT by the APPLCOMPAT parameter in ZPARM. You can take a Db2 system's active function level as high as you want, and that will not lead to application-affecting SQL behavioral changes <u>as long as you don't change the APPLCOMPAT value of your applications' Db2 packages</u>. The value of the APPLCOMPAT parameter in ZPARM is only somewhat relevant to this discussion.</span></p><p><span style="font-family: arial;">OK, what's all this about "SQL behavioral changes?" The term refers to this situation: same SQL statement, same data, <b><i>different result</i></b>. You might think, "How could that happen?" Well, every now and then, the Db2 for z/OS development team decides that the behavior of a given SQL statement should change, for one reason or another (and it's well-considered - these changes are not effected lightly). That change can be introduced with a new version or a new function level of Db2. My favorite example of a Db2 for z/OS SQL behavioral change is one that happened with Db2 11. In a Db2 10 environment, you could use a SQL statement to cast an eight-byte store clock value (a time value that a program can obtain from the z/OS operating system) as a Db2 timestamp value. In a Db2 11 system, that same SQL statement - cast an eight-byte store clock value as a Db2 timestamp - would fail with a -180 SQL error code. Same SQL statement, same data, different result in a Db2 11 versus a Db2 10 environment.</span></p><p><span style="font-family: arial;">Here's one reason I really like this example: how many programs do <u>you</u> have that need to cast an eight-byte store clock value as a Db2 timestamp? Probably none - <i>and this is typically the case for Db2 SQL behavioral changes - they usually affect either zero or very few of an organization's Db2-accessing programs.</i> Alright, but what if you <u>did</u> have programs that needed the Db2 10 behavior of the Db2 TIMESTAMP function? Would you have been up a creek when your Db2 system went from Db2 10 to Db2 11? No - you would have been fine in that case, because you could just bind the Db2 packages used by those programs with APPLCOMPAT(V10R1), and that would mean that the programs would execute with Db2 10 SQL behavior, and <u>that</u> would mean that those programs could cast an eight-byte store clock value as a Db2 timestamp. 
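</span></p><p><span style="font-family: arial;">Just to make that concrete, a rebind along these lines - a sketch, with a made-up collection and package name - would pin the package to Db2 10 SQL behavior:</span></p><p><span style="font-family: courier;">REBIND PACKAGE(COLL1.PGM_STCK) APPLCOMPAT(V10R1)</span></p><p><span style="font-family: arial;">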
See - it's the APPLCOMPAT <u>package bind</u> specification that provides protection (when needed) from Db2 SQL behavioral changes.</span></p><p><span style="font-family: arial;">[By the way, in the Db2 for z/OS documentation, what I have been calling "SQL behavioral changes" are referred to as "SQL incompatibilities." <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=zos-application-compatibility-levels-in-db2"><span style="color: #2b00fe;">These are documented</span></a> for each Db2 application compatibility level, going back to V10R1 (that's as far back as Db2 application compatibility goes).]</span></p><p><span style="font-family: arial;">So, I said up front that the APPLCOMPAT parameter in ZPARM is not as important as the APPLCOMPAT specification for your Db2 packages. Does that mean that the ZPARM has no significance? No. What is the purpose of the APPLCOMPAT parameter in ZPARM? It's this: the ZPARM parameter provides the default value for a package's APPLCOMPAT setting when the BIND PACKAGE command is issued without an APPLCOMPAT specification. That's it. I tell people to think of APPLCOMPAT in ZPARM as being like a cubbyhole. A BIND PACKAGE command may be issued without an APPLCOMPAT specification. The package in question needs an APPLCOMPAT value. Where is Db2 going to get that value, when the value was not provided via the BIND PACKAGE command? Db2 in that case is going to look in the cubbyhole labeled APPLCOMPAT in ZPARM. In that cubbyhole is a piece of paper (figuratively speaking) on which (for example) V12R1M509 is written. OK, that will be the package's APPLCOMPAT value.</span></p><p><span style="font-family: arial;">[You might wonder: what if <u>REBIND</u> PACKAGE is issued without an APPLCOMPAT specification? Will the rebound package in that case get the APPLCOMPAT value to which the ZPARM parameter APPLCOMPAT has been set? Probably not. Why not? Because it is very likely that a package being rebound already has an APPLCOMPAT value, and in that case if the REBIND PACKAGE command is issued without an APPLCOMPAT specification then the package's current APPLCOMPAT value will be retained. For REBIND PACKAGE, then, the APPLCOMPAT parameter in ZPARM is relevant only when the REBIND PACKAGE command is issued without an APPLCOMPAT specification <u>and</u> the package in question does not already have an APPLCOMPAT value (again, unlikely, though not impossible - you can check on this via a query of the SYSIBM.SYSPACKAGE catalog table, which has a column named APPLCOMPAT).]</span></p><p><span style="font-family: arial;">Given that APPLCOMPAT in ZPARM simply provides the default value for APPLCOMPAT when BIND PACKAGE is issued without an APPLCOMPAT specification, what should the value of this ZPARM parameter be? There isn't a right or wrong answer to this question - it's up to you. Personally, I'd lean towards making the value of APPLCOMPAT in ZPARM as high as it can be, which would be equal to the currently active function level in a Db2 system. Why would I want that? Because APPLCOMPAT, in addition to providing protection (when needed) from Db2 SQL incompatibilities, also enables use of newer SQL syntax and functionality.</span></p>
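<p><span style="font-family: arial;">(If you're curious about where your packages currently stand on this front, a simple catalog query - just a sketch - will show you the distribution of APPLCOMPAT values across your packages:)</span></p><p><span style="font-family: courier;">SELECT APPLCOMPAT, COUNT(*) AS PKG_COUNT<br />  FROM SYSIBM.SYSPACKAGE<br />  GROUP BY APPLCOMPAT<br />  ORDER BY APPLCOMPAT;</span></p><p><span style="font-family: arial;">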
If I have APPLCOMPAT in ZPARM set to, for example, V10R1, and BIND PACKAGE at my site is typically issued without an APPLCOMPAT specification, I am limiting application programmers to SQL syntax and functionality only up to the Db2 10 level - can't use newer built-in functions such as LISTAGG and PERCENTILE_CONT, can't use Db2 global variables, can't use Db2 arrays, can't use newer special registers such as CURRENT LOCK TIMEOUT, etc. Is that what you want? Sure, if a program using one of those newer SQL capabilities fails at bind time because of the default V10R1 APPLCOMPAT value, you can fix that problem by issuing BIND PACKAGE a second time with an APPLCOMPAT specification that is current enough to support the desired functionality, but again, is that what you want?</span></p><p><span style="font-family: arial;">At some Db2 for z/OS sites, APPLCOMPAT in ZPARM is indeed set at V10R1. Why so low? One reason may be the misunderstanding (to which I've referred) of the purpose of the ZPARM parameter. Alternatively, maybe APPLCOMPAT in ZPARM is set at V10R1 because of concern about BIND PACKAGE issued for programs that aren't net new but rather have had a bit of an SQL change (which would then require a new precompile and BIND in the case of a static SQL-issuing program, as opposed to a REBIND). A person might think, "What if there's an existing program with 20 static SQL statements, and a programmer changes just one of those statements? When there is a BIND PACKAGE (with ADD or REPLACE, as the case may be) for that program's Db2 package, and the BIND PACKAGE is issued without an APPLCOMPAT specification, I want a from-the-ZPARM-parameter default APPLCOMPAT value that will have the 19 SQL statements that weren't changed behaving as they always have." OK. I get that. Like I said, it's up to you. Just keep in mind that the risk of adverse impact on your programs from Db2 SQL incompatibilities is usually very low - these incompatibilities are relatively few and far between, and they tend to affect few if any of your Db2-accessing programs.</span></p><p><span style="font-family: arial;">The main point I want to make is this: when you change the value of the APPLCOMPAT parameter in ZPARM, that action is <u>not</u>, in and of itself, going to cause Db2-accessing programs in your environment to suddenly start behaving differently. All you've done with the ZPARM parameter update is change the APPLCOMPAT value that a package will get if a BIND PACKAGE command is issued without an APPLCOMPAT specification. Consider the ZPARM value change in that light, and act accordingly.</span></p><p><span style="font-family: arial;"><b>Db2 for z/OS: Code Level, Catalog Level, Function Level, and More</b> (December 27, 2023)</span></p><p><span style="font-family: arial;">In a Db2 for z/OS context, the terms "code level," "catalog level," and "function level" were introduced when the Db2 for z/OS development team went to the continuous delivery mechanism for delivering new product functionality in-between the availability dates of new versions of the DBMS. That was a little over 7 years ago, referring to the general availability of Db2 12 - the first continuous-delivery version of Db2 for z/OS. And yet, there remains a good bit of misunderstanding among some in the Db2 for z/OS user community regarding basic concepts that are of foundational importance in a continuous-delivery sense.
That being the case, I'll try to shed some clarifying light on the subject via this blog entry.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Think about an upside-down hierarchy</b></span></p><p><span style="font-family: arial;">By "upside-down" hierarchy, I mean one that goes bottom-up from a dependency perspective. At the bottom - the foundation - you have the Db2 system's code level. This has to do largely with the currency of the code in the Db2 system's load library (or libraries, if this is a Db2 data sharing group - it's a best practice in that case for each member subsystem to have its own load library). This is the code that is loaded into memory when the Db2 system is started. You obviously don't have availability of Db2 functionality if that functionality is not present in the Db2 system's code; so, if a Db2 system's code level is 121505 (indicating code that includes functionality delivered up through Db2 version 12 function level 505), you can't create a stored procedure with <a href="https://robertsdb2blog.blogspot.com/2021/07/create-or-replace-agile-deployment-of.html"><span style="color: #2b00fe;">CREATE OR REPLACE</span></a> syntax because that syntax was introduced with Db2 12 function level (FL) 507 - by definition, a Db2 12 FL505 code level does not include functionality first delivered by Db2 12 FL507 code.</span></p><p><span style="font-family: arial;">I mentioned that a Db2 for z/OS system's code level is generally reflective of the currency of the Db2 code in question. Here's what that means: over the course of time, it's normal for the code of a Db2 system (and for other subsystems in a z/OS LPAR and for the z/OS LPAR itself) to be taken to a more-current maintenance level - ideally, this will be done 2-4 times per year, and often the aim is to take the code in the z/OS LPAR (Db2 code included) to a higher RSU level (RSU - short for <a href="https://www.ibm.com/docs/en/zos/2.5.0?topic=guide-recommended-service-upgrade-rsu"><span style="color: #2b00fe;">Recommended Service Upgrade</span></a> - is a packaging of z/OS and z/OS-related software maintenance that facilitates upgrading the service currency of a z/OS system). This process involves application of PTFs ("fixes," in z/OS parlance) to code in a z/OS system, including Db2 code. Maybe, in the course of one of these service-upgrade procedures, the fix for APAR PH33727 is applied to the system's Db2 code (that which a fix "fixes" is described via the associated APAR, i.e., the APAR describes what is changed or enhanced by the fix). APAR PH33727</span><span style="font-family: arial;"> </span><span style="font-family: arial;">is the one associated with </span><a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=d1fl-function-level-510-v12r1m510-activation-enabled-by-apar-ph33727-april-2021" style="font-family: arial;"><span style="color: #2b00fe;">Db2 12 function level 510</span></a><span style="font-family: arial;">, and when the corresponding PTF gets applied to a Db2 system's code then that system's Db2 code level will go to 121510. Does that mean that functionality delivered through Db2 12 function level 510 is now available in the system? 
No - there are further dependencies in the bottom-up hierarchy.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Next Db2 level up from code: catalog</b></span></p><p><span style="font-family: arial;">The Db2 catalog is the set of tables that basically contain metadata - "data about the data," and about the related Db2 structures (e.g., tables, table spaces, indexes) and other associated database objects (e.g., packages, routines). Sometimes, a Db2 function level introduces new Db2 features that have catalog dependencies - in other words, these are new features that cannot be used until some Db2 catalog changes that support the new features have been effected. Take, for example, Db2 12 function level 509. That function level introduced the ability to specify a data compression type at the individual table space level, or at the partition level for a range-partitioned table space (two data compression types are available in a Db2 for z/OS system - one, which is based on the Lempel-Ziv compression algorithm, is referred to as fixed-length, and the other is Huffman compression). For a Db2 DBA to be able to utilize this feature, the first requirement is the ability to specify COMPRESS YES FIXEDLENGTH or COMPRESS YES HUFFMAN in a CREATE or ALTER TABLESPACE statement. That ability is provided in the Db2 code starting with code level 121509; however, the new forms of the COMPRESS YES clause can't be used unless Db2 can record in the catalog the fact that fixed-length or Huffman compression is used for a given table space or table space partition. That cataloging capability is provided by the COMPRESS_USED column that is added to the catalog table SYSIBM.SYSTABLEPART when the Db2 catalog level goes to V12R1M509 - hence, getting the catalog level to V12R1M509 is required for compression-type specification at the table space or partition level in a Db2 12 system (by the way, "fixed length," in a Db2 data compression context, does not refer to the length of rows in a table - it refers to the length of substitution values in a compression dictionary).</span></p><p><span style="font-family: arial;">When there is a requirement to take a Db2 catalog to a higher level, that change is accomplished via execution of the Db2 utility called CATMAINT, with a specification of (for example) UPDATE LEVEL(V12R1M509). Note that if a Db2 system's catalog is currently at, say, the V12R1M500 level, it can be taken straight to the V12R1M509 level with one execution of CATMAINT - that one execution of the utility would make the catalog changes associated with level 509, and also the changes associated with other catalog levels between 500 and 509.</span></p><p><span style="font-family: arial;">Sometimes, a Db2 function level introduces new capabilities that do not require catalog changes. In such cases, the catalog only has to be at the level related to the last preceding function level that did require catalog changes. 
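</span></p><p><span style="font-family: arial;">Before moving on, here's a sketch of what the CATMAINT execution mentioned above might look like in practice - a job step using the IBM-supplied DSNUPROC procedure (the job step name, Db2 subsystem name and utility ID are placeholders):</span></p><p><span style="font-family: courier;">//CATMNT  EXEC DSNUPROC,SYSTEM=DB1P,UID='CATMNT'<br />//SYSIN   DD *<br />  CATMAINT UPDATE LEVEL(V12R1M509)<br />/*</span></p><p><span style="font-family: arial;">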
Coming back to function levels that have no catalog dependencies: Db2 12 function level 510 is an example - its features require no catalog changes; thus, there is no 510 catalog level, and use of Db2 12 FL510 functionality can be available when the Db2 system's catalog level is V12R1M509 (the description of a function level in the Db2 for z/OS documentation always lets you know if the function level requires catalog changes).</span></p><p><span style="font-family: arial;">I mentioned in the preceding sentence that Db2 12 FL510 functionality "can be" available when the Db2 system's catalog level is V12R1M509. Does that mean that something other than a catalog level change can be required to use the features of a Db2 function level? Yep - that's exactly what that means.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Next level up: activated function level</b></span></p><p><span style="font-family: arial;">For the continuous delivery mechanism for Db2 new-function delivery to work in a practical sense, the "turning on" of a function level's new features had to be made an asynchronous event with respect to the up-leveling of Db2 code that would introduce the features to the Db2 subsystem's load library. If this were not the case - if, instead, a Db2 code level's new features were instantly available once present from a load library perspective - then Db2 for z/OS systems programmers might hesitate to upgrade the maintenance level of a Db2 system out of concern about readiness to provide support and guidance in the use of the new features. That would not be a good thing - z/OS and z/OS subsystems function best when they are at a relatively current level of maintenance.</span></p><p><span style="font-family: arial;">The means through which adding new features to Db2 code is made asynchronous to having that new code be usable in a Db2 system is the Db2 command -ACTIVATE FUNCTION LEVEL; so, a Db2 system's code level might be 121509, and the system's Db2 catalog level might be V12R1M509, but the previously-mentioned ability to issue ALTER TABLESPACE (or CREATE TABLESPACE) with a COMPRESS YES HUFFMAN specification won't be there until a Db2 administrator has issued the command -ACTIVATE FUNCTION LEVEL (V12R1M509). Thanks to the -ACTIVATE FUNCTION LEVEL command, a Db2-using organization can decide when they want the features introduced in a Db2 code level to be usable in their Db2 environment.</span></p><p><span style="font-family: arial;">So, does the -ACTIVATE FUNCTION LEVEL command put us at the top of our upside-down Db2 continuous delivery hierarchy? Not quite. One to go...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The last level: application compatibility</b></span></p><p><span style="font-family: arial;">In a typical production Db2 for z/OS system, there's a lot going on - lots of different applications accessing Db2 for z/OS-managed data, lots of DBA activity related to administering the system, lots of new-program deployment action, etc. In light of that fact, the -ACTIVATE FUNCTION LEVEL command is a pretty big switch. What if the immediate need that an organization has for a given Db2 function level is related to exploitation of a new feature for a single application, or for one particular database administration task? Db2 application compatibility levels provide a way to very selectively exercise functionality that has been newly activated in a Db2 system.
Db2 application compatibility levels are managed primarily through a Db2 package bind parameter called APPLCOMPAT (you might want to check out the <a href="https://robertsdb2blog.blogspot.com/2019/06/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 1</span></a> and <a href="https://robertsdb2blog.blogspot.com/2019/07/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 2</span></a> entries on APPLCOMPAT that I posted to this blog a few years ago). Returning to the previously used example, let's say that a Db2 DBA wants to alter a table space to use Huffman compression. Is it enough for the Db2 system's code level to be 121509, and for the catalog level to be V12R1M509, and for V12R1M509 to be the activated function level? No - that's not enough. The DBA will issue the ALTER TABLESPACE statement with a COMPRESS YES HUFFMAN specification by way of a Db2 package (there is <u>always</u> a package associated with execution of a Db2 SQL statement). That package might be related to one of the Db2-provided programs often used by DBAs to do their work - maybe SPUFI, or DSNTEP2. The package, like all packages, will have an APPLCOMPAT specification. For the ALTER TABLESPACE with COMPRESS YES HUFFMAN to execute successfully, the package through which the statement is issued - DSNTEP2, let's say - must have an APPLCOMPAT specification of not less than V12R1M509.</span></p><p><span style="font-family: arial;">As this example suggests, a package's APPLCOMPAT value enables a program that issues SQL through the package to utilize SQL syntax that was introduced with a given Db2 function level. That is one purpose of the APPLCOMPAT package bind specification. The other purpose of APPLCOMPAT is to enable a program to get the SQL <u>behavior</u> of an <u>earlier</u> version and function level of Db2 for z/OS, if that older SQL behavior is needed. See, there are times when, going from one version or function level of Db2 to another, the behavior of a SQL statement will change. What does that mean? It means same SQL statement, same data, different result. This kind of change is referred to in the Db2 for z/OS documentation as a SQL incompatibility. There are times when a program executing in a Db2 system with function level X activated needs the behavior that a SQL statement had with a Db2 version or function level that is older than X. APPLCOMPAT can deliver, for this program, that older Db2 behavior. Here's an example: suppose that a DBA named Steve needs to create a non-universal table space in a Db2 system that he administers, and let's say that the activated function level for this system is V12R1M510. It's a fact that, starting with function level V12R1M504, a CREATE TABLESPACE statement can only create a universal table space. Is Steve stuck? No. Steve can create the needed non-universal table space by using a program (we'll say the Db2-provided DSNTEP2) whose package has an APPLCOMPAT value of V12R1M503. What if the DSNTEP2 package at Steve's shop has an APPLCOMPAT value of V12R1M504 or higher? No problem: Steve just needs to make sure that the first SQL statement issued by his DSNTEP2 job is SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503'; then, a CREATE TABLESPACE statement can be issued to create a non-universal table space (this scenario is described in <a href="https://robertsdb2blog.blogspot.com/2020/04/clearing-air-regarding-db2-12-for-zos.html"><span style="color: #2b00fe;">an entry I posted to this blog</span></a> in 2020). 
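</span></p><p><span style="font-family: arial;">Here's a sketch of what the SQL input for Steve's DSNTEP2 job might look like (the database and table space names are made up; a SEGSIZE specification without NUMPARTS or MAXPARTITIONS is what yields a segmented, non-universal table space at that compatibility level):</span></p><p><span style="font-family: courier;">SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';<br />CREATE TABLESPACE TSLEGACY IN DBTEST01 SEGSIZE 32;</span></p><p><span style="font-family: arial;">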
Note that SET CURRENT APPLICATION COMPATIBILITY can be used (with a dynamic SQL-issuing program) to dynamically take a program's application compatibility level to something <u>below</u> - but not above - the APPLCOMPAT level of the program's Db2 package.</span></p><p><span style="font-family: arial;">For many Db2 DDF-using client-server applications (applications known as DRDA requesters in Db2-speak), the Db2 packages used will be those related to the IBM Data Server Driver. These packages reside, by default, in a collection named NULLID. If the NULLID packages have an APPLCOMPAT value of X, and some DRDA requester application requires an APPLCOMPAT value that is higher or lower than X, this need is often satisfied by BIND COPY-ing the packages in NULLID into an alternate collection (with the required APPLCOMPAT specification), and then using the Db2 profile tables to automatically direct the DRDA requester application in question to the alternate IBM Data Server Driver package collection. This technique is described in <a href="https://robertsdb2blog.blogspot.com/2018/07/db2-for-zos-using-profile-tables-to.html?m=0"><span style="color: #2b00fe;">an entry I posted to this blog</span></a> a while back, and while that entry refers to IBM Data Server Driver packages BIND COPY-ed into an alternate collection with an alternate RELEASE specification, the same thing can be done for IBM Data Server Driver packages that have an alternate APPLCOMPAT specification.</span></p><p><span style="font-family: arial;">OK, so remember this bottom-up thing:</span></p><p></p><ol style="text-align: left;"><li><span style="font-family: arial;">First, the feature you want to use needs to be present in your Db2 system's code - that's the <b><span style="color: #16970b;">code level</span></b>.</span></li><li><span style="font-family: arial;">The feature you want to use may have a catalog level requirement - that's the <b><span style="color: #16970b;">catalog level</span></b>. You can't take the catalog level to X (via execution of the Db2 CATMAINT utility) unless the system's Db2 code level is at least X.</span></li><li><span style="font-family: arial;">When the code and catalog levels are right for the Db2 feature you want to use, you need to make sure that the appropriate function level has been activated on the Db2 system - that's the <span style="color: #16970b;"><b>activated function level</b></span>. Function level X cannot be activated unless the Db2 system's code level is at least X and the Db2 catalog level is at least X (or, if function level X has no catalog dependencies, the catalog level has to be at least the level of the last preceding function level that did have catalog dependencies).</span></li><li><span style="font-family: arial;">For your program to use the Db2 feature of interest, the <b><span style="color: #16970b;">application compatibility level</span></b> has to be set as needed - that's done via the APPLCOMPAT value of the program's Db2 package (or by execution of SET CURRENT APPLICATION COMPATIBILITY, if the program issues dynamic SQL statements and if you need to take the application compatibility level lower than the package's APPLCOMPAT value). 
A package's application compatibility level cannot be set to X unless the activated function level of the Db2 system is at least X.</span></li></ol><p><span style="font-family: arial;"><b>Checking on all this in your Db2 for z/OS environment</b></span></p><p><span style="font-family: arial;">To see the upside-down hierarchical lay of the land in your Db2 environment, issue the Db2 command -DISPLAY GROUP. The output will look something like this (and don't be misled by the word GROUP in the command - this is applicable for a standalone Db2 subsystem as well as to a Db2 data sharing group):</span></p><p><span style="font-family: courier;">*** BEGIN DISPLAY OF GROUP(DSNPROD ) CATALOG LEVEL(<b><span style="color: #ff6f00;">V13R1M100</span></b>) <br /> CURRENT FUNCTION LEVEL(<b><span style="color: #800180;">V13R1M100</span></b>) <br /> HIGHEST ACTIVATED FUNCTION LEVEL(V13R1M100) <br /> HIGHEST POSSIBLE FUNCTION LEVEL(V13R1M500) <br /> PROTOCOL LEVEL(2) <br /> GROUP ATTACH NAME(DSNP) <br /> ----------------------------------------------------------------- <br /> DB2 SUB DB2 SYSTEM IRLM <br /> MEMBER ID SYS CMDPREF STATUS LVL NAME SUBSYS IRLMPROC <br /> ------- -- ---- -------- -------- ------ -------- ---- -------- <br /> DB1P 1 DB1P DB1P ACTIVE <b><span style="color: #16970b;">131503</span></b> SYS1 IR1P DB1PIRLM <br /> DB2P 2 DB2P DB2P ACTIVE <b><span style="color: #16970b;">131503</span></b> SYS2 IR2P DB2PIRLM <br /> ----------------------------------------------------------------- <br /> SCA STRUCTURE SIZE: 36864 KB, STATUS= AC, SCA IN USE: 1 % <br /> LOCK1 STRUCTURE SIZE: 17408 KB <br /> NUMBER LOCK ENTRIES: 2097152 <br /> NUMBER LIST ENTRIES: 21415, LIST ENTRIES IN USE: 3 <br /> SPT01 INLINE LENGTH: 32138 <br /> *** END DISPLAY OF GROUP(DSNPROD ) </span></p><p><span style="font-family: arial;">For this 2-member Db2 data sharing system, the code level, highlighted in green, is <b><span style="color: #16970b;">131503</span></b> (Db2 for z/OS Version 13, function level 503). The catalog level, highlighted in orange, is <b><span style="color: #ff6f00;">V13R1M100</span></b>. The activated function level, highlighted in purple, is <span style="color: #800180;"><b>V13R1M100</b></span>. As for a package's APPLCOMPAT level, you can see that via a query of the Db2 catalog table <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-syspackage"><span style="color: #2b00fe;">SYSIBM.SYSPACKAGE</span></a> (check the value in the APPLCOMPAT column for the package's row in the table).</span></p><p><span style="font-family: arial;">I hope that this information will be useful for you. The end of 2023 is around the corner. I'll post more in '24.</span></p><p><span style="font-family: arial;"><b>Db2 13 for z/OS: Autobind Phase-In</b> (November 29, 2023)</span></p><p><span style="font-family: arial;">Db2 13 <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=levels-function-level-504-apar-ph54919-october-2023"><span style="color: #2b00fe;">function level 504</span></a> became available last month (October 2023), via the fix for APAR PH54919. One of the new capabilities delivered with FL504 is something called autobind phase-in.
I like that new feature a lot, and I think you will, too - especially if you're a Db2 for z/OS DBA. In this blog entry I'll explain what autobind phase-in is, why it's a very welcome addition to Db2 functionality, and how you can get ready to leverage the feature, even before you've activated function level V13R1M504.</span></p><p><span style="font-family: arial;">First, a shout-out to my coworker Dengfeng Gao, a member of the IBM Db2 for z/OS development team. Dengfeng had a lot to do with making autobind phase-in a reality, and she provided me with much of the information I'm now passing along to you. Thanks, Dengfeng!</span></p><p><span style="font-family: arial;">OK, to begin the story...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The way things were</b></span></p><p><span style="font-family: arial;">Way, way back (early 1990s, as I recall), Db2 for z/OS introduced packages. For a Db2-accessing program that issues static SQL statements, you can think of the associated package as being, in essence, the compiled and executable form of the program's SQL statements (what distinguishes static SQL statements: they are prepared for execution prior to being issued by a program, via a Db2 process known as bind). When a static SQL-issuing program executes (this could be, for example, a CICS transaction, or a batch job, or a stored procedure, or a Db2 REST service), the program's package is allocated to the Db2 thread being used by the application process, and the part of the package corresponding to a particular SQL statement is executed when it's time to run that SQL statement.</span></p><p><span style="font-family: arial;">SQL statements, by and large, reference Db2 tables; thus, packages are dependent on the tables referenced by SQL statements associated with the packages. Packages are also dependent on database objects that are not referenced in SQL statements (indexes are a prime example), when those objects are part of a SQL statement's access plan (i.e., the paths and mechanisms by which data targeted by a SQL statement will be accessed - for example, via a nested loop join that will employ certain indexes on the outer and inner tables). The dependencies of packages on database objects (tables, table spaces, views, indexes, etc.) are recorded in the <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-syspackdep"><span style="color: #2b00fe;">SYSIBM.SYSPACKDEP</span></a> table in the Db2 catalog.</span></p><p><span style="font-family: arial;">Sometimes, a database object on which a package depends is changed in a way that requires regeneration of the package; or, an object on which the package depends (such as an index) might be dropped; or, a privilege needed by the package's owner might be revoked. In such situations, the package in question is marked as <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=applications-changes-that-invalidate-packages"><span style="color: #2b00fe;">"invalid"</span></a> by Db2 (such a package will have a value of 'N' in the VALID column of the <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-syspackage"><span style="color: #2b00fe;">SYSIBM.SYSPACKAGE</span></a> catalog table). When that happens (in a pre-Db2 13 FL504 environment), the package cannot be executed again until it is regenerated by Db2. 
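</span></p><p><span style="font-family: arial;">(A quick aside: if you want to see which packages in a system are currently in the invalid state, a catalog query along the lines of this sketch will surface them:)</span></p><p><span style="font-family: courier;">SELECT COLLID, NAME, VERSION, LASTUSED<br />  FROM SYSIBM.SYSPACKAGE<br />  WHERE VALID = 'N'<br />  ORDER BY COLLID, NAME;</span></p><p><span style="font-family: arial;">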
That regeneration could be accomplished through a REBIND command issued for the package by a DBA (or issued from a batch job coded by a DBA); but, what if there is a request to execute an invalidated package before that package has been regenerated through a REBIND command? In that case, Db2 will automatically regenerate the package, and that process is called autobind (it's sometimes referred to as auto-rebind).</span></p><p><span style="font-family: arial;">When autobind happens for a package (again, we're talking about a pre-Db2 13 FL504 environment), it can be disruptive for the application(s) that drive execution of the package. This disruption can take several forms:</span></p><ul style="text-align: left;"><li><span style="font-family: arial;">The application process whose request for execution of an invalidated package triggered the autobind has to wait until the autobind completes.</span></li><li><span style="font-family: arial;">If another application process also requests execution of the invalidated package before the first autobind completes, that will result in a second attempt to autobind the package. That second attempt will have to wait, because it requires a lock on the package that is held by the first autobind process; thus, this second requester of the package will also sit and wait (if the first autobind finishes successfully and the second requester has not timed out in the meantime, the second requester will use the package as regenerated by the initial autobind process).</span></li><li><span style="font-family: arial;">If the autobind fails (this could happen as the result of an authorization issue, among other things), the package will be marked as "inoperative" by Db2 (indicated by the value 'N' in the OPERATIVE column of the package's row in SYSIBM.SYSPACKAGE). In that case, any attempt to execute the package will fail until the package is explicitly rebound (usually, by a DBA).</span></li></ul><p><span style="font-family: arial;">This not-good situation changes dramatically (for the better) with Db2 13 FL504 autobind phase-in functionality. Before getting to that, I'll cover some prep work in which DBAs will want to engage.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Laying the autobind phase-in groundwork: a new catalog table, and a new BIND/REBIND option</b></span></p><p><span style="font-family: arial;">When function level 500 has been activated in a Db2 13 system, the CATMAINT utility can be executed to take the catalog level to V13R1M501. When that happens, some new tables get added to the catalog. One of those new catalog tables is SYSIBM.SYSPACKSTMTDEP. As the name implies, Db2 will use this table to record static SQL dependencies on database objects at the <i>statement level</i>. Does that just happen? Nope - and this is where DBA action comes in.</span></p><p><span style="font-family: arial;">When function level V13R1M502 has been activated, new packages can be bound - and existing packages can be rebound - with the new DEPLEVEL option.</span></p>
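<p><span style="font-family: arial;">For example, opting a package into statement-level dependency recording could look like this (collection and package names made up):</span></p><p><span style="font-family: courier;">REBIND PACKAGE(COLL_A.MYPKG) DEPLEVEL(STATEMENT)</span></p><p><span style="font-family: arial;">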
If you bind or rebind a package with DEPLEVEL(STATEMENT) then Db2 will record statement-level dependency information in the SYSPACKSTMTDEP catalog table, in addition to recording package-level dependency information in SYSPACKDEP (if you bind or rebind with a specification of DEPLEVEL(PACKAGE), it'll be business as usual - only package-level dependency information will be recorded in the catalog).</span></p><p><span style="font-family: arial;">Would you like to make DEPLEVEL(STATEMENT) the default for package BIND and REBIND actions? If so, set the value of the ZPARM parameter PACKAGE_DEPENDENCY_LEVEL to STATEMENT.</span></p><p><span style="font-family: arial;">Is there any downside to binding or rebinding packages with DEPLEVEL(STATEMENT)? A small one (in my opinion): because of the extra work of recording statement-level dependency information in SYSPACKSTMTDEP, binding or rebinding with DEPLEVEL(STATEMENT) will somewhat increase elapsed and CPU time for the bind/rebind operation. My expectation is that in a typical production Db2 for z/OS system, CPU consumption related to BIND and REBIND activity is a very small fraction of total CPU consumption.</span></p><p><span style="font-family: arial;">Is there an upside to binding and rebinding packages with DEPLEVEL(STATEMENT)? Oh, yeah...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Enter autobind phase-in</b></span></p><p><span style="font-family: arial;">Once a Db2 13 system's activated function level is V13R1M504 or higher, this is what happens when a package <u>that has been bound or rebound with DEPLEVEL(STATEMENT)</u> is invalidated: the first request to execute the package following invalidation (assuming the invalidated package wasn't explicitly rebound before that execution request) will trigger an autobind of the package.</span></p><p><span style="font-family: arial;">"Wait," you might think. "Isn't that the same thing that happened before the advent of autobind phase-in?" Yes, but the autobind itself takes a very different path, and has a very different workload impact, versus the prior process. To wit:</span></p><ul style="text-align: left;"><li><span style="font-family: arial;">The package will be regenerated <i>in the background</i>. The process that requested execution of the invalidated package <u>will be allowed to execute the package - it will not have to wait for the autobind to complete</u>.</span></li><li><span style="font-family: arial;">When the invalidated package is executed, <i>statements that were not invalidated by the action that invalidated the package (e.g., an ALTER of a table that is referenced by some - but not all - of the package's statements) will continue to execute as they did before the invalidation of the package.</i></span></li><li><span style="font-family: arial;">Also when the invalidated package is executed, <i>statements that were invalidated by the (for example) ALTER action will be incrementally bound when issued by the associated program</i>.
This means that they will be dynamically prepared for execution, and that will mean a temporary additional CPU cost (temporary until the in-the-background autobind completes the regeneration of the package that had been invalidated), <i>but the statements will be executed</i>.</span></li><li><span style="font-family: arial;">And if, before the in-the-background autobind completes, there is a second request to execute the invalidated package, will that trigger a second autobind action? Nope - the one autobind is for any and all package requesters. That second requester will be allowed to execute the invalidated package, just as was the case for the requester that triggered the in-the-background autobind - still-valid statements will execute as usual, and invalidated statements will be incrementally bound <i>and then executed</i>.</span></li><li><span style="font-family: arial;">When the in-the-background autobind has finished its work, the newly regenerated package will be <i>phased into use</i>, in much the same way that the <a href="https://robertsdb2blog.blogspot.com/2020/12/db2-for-zos-what-do-you-know-about.html"><span style="color: #2b00fe;">rebind phase-in</span></a> functionality introduced with Db2 12 FL505 phases a newly rebound package into use: the first request for execution of the package following completion of the in-the-background autobind will get the regenerated (and now valid) package. Eventually, processes that had been executing the previous instance of the package (the instance that had been invalidated) will be done with that, and all processes will be using the regenerated package when they request its execution.</span></li><li><span style="font-family: arial;">If the in-the-background autobind fails, will the invalidated package be marked inoperative (with attendant error situations for processes that request execution of the package)? Nope. In that case, the package will be marked with rebind-advisory status ('R' in the OPERATIVE column for the package's row in SYSPACKAGE). The package can still be executed (as described above: as-usual for not-invalidated statements, incremental bind for invalidated statements), but an explicit REBIND is recommended to get the package regenerated and back into a valid status.</span></li></ul><span style="font-family: arial;"><p><span style="font-family: arial;"><br /></span></p>Bottom line: with autobind phase-in, autobind activity will have hugely less impact on throughput and service levels for applications that execute packages that have been invalidated.</span><p></p><p><span style="font-family: arial;">Note that the above-described much-better autobind process applies <u>only to packages that have been bound or rebound with DEPLEVEL(STATEMENT)</u> - and you can start doing that (as previously mentioned) once you've activated function level 502 in a Db2 13 system.</span></p><p><span style="font-family: arial;">One other item of information: the in-the-background autobind done for autobind phase-in will execute in access-path-reuse mode - in other words, Db2 will reuse the access paths previously utilized for the package's SQL statements, if that can be done (it of course could not be done for all statements if, for example, package invalidation resulted from the dropping of an index on which some of the package's SQL statements depended). 
The same goes for the incremental bind of invalidated SQL statements when the invalidated package is requested for execution before the in-the-background autobind has completed - access paths will be reused if possible.</span></p><p><span style="font-family: arial;">OK, so if you've gotten to Db2 13 at your site, give major consideration to rebinding packages (and binding new packages) with DEPLEVEL(STATEMENT), once function level V13R1M502 or higher has been activated; and, look forward to a much more application workload-friendly autobind process when you get function level V13R1M504 activated.</span></p><p><span style="font-family: arial;"><b>Db2 13 for z/OS: A New Means for Managing RELEASE(DEALLOCATE) Packages</b> (October 30, 2023)</span></p><p><span style="font-family: arial;">It has long been understood by many Db2 for z/OS DBAs that a combination of the RELEASE(DEALLOCATE) package bind specification (especially for frequently executed Db2 packages that consume little CPU time per execution) and persistent threads can significantly reduce in-Db2 CPU time for Db2-accessing applications (a "persistent thread" is one that persists through commits - examples include CICS-Db2 protected threads, the threads between IMS wait-for-input regions and Db2, the Db2 threads associated with batch jobs, and high-performance database access threads, aka high-performance DBATs). The CPU efficiency benefit of RELEASE(DEALLOCATE) + persistent threads comes mainly from avoiding the cost of constantly releasing the package in question (i.e., separating it from the thread) at a commit point, only to reallocate it to the thread when it is subsequently (often, very soon - maybe a fraction of a second later) again requested for execution (some additional CPU savings are achieved via retention across commits of "parent" locks acquired in the execution of the package - these are table space- or partition-level locks, and they are almost always non-exclusive in nature).</span></p><p><span style="font-family: arial;">Always nice to get a Db2 workload CPU efficiency boost, but those CPU savings came at one time with several "flip side" concerns. One of those concerns - conflict with Db2 utilities caused by retained parent locks - was addressed a <u>long</u> time ago (back in the mid-1990s) with the advent of the drain locking mechanism that utilities can use to gain exclusive access to a database object. Another concern from days past had to do with virtual storage constraint - the combination of RELEASE(DEALLOCATE) and persistent threads causes said threads to consume more virtual storage, and when that virtual storage was within the quite-limited confines of the EDM pool, there was a real risk of that getting filled up and causing application failures if one were not careful. Thankfully, that virtual storage-related risk was eliminated with Db2 10, when the virtual storage space used for allocation of packages to threads for execution moved from the EDM pool to above-the-bar agent local pool space, of which there is a very large quantity.</span></p><p><span style="font-family: arial;">That left us with one more operational challenge associated with the RELEASE(DEALLOCATE) + persistent threads combo: conflict between in-use packages and processes that need to either rebind or invalidate a package.
See, a package can't be rebound or invalidated when it is in-use, and a RELEASE(DEALLOCATE) package allocated for execution to a persistent thread is considered by Db2 to be <u>continuously</u> in-use until the thread is terminated, and that could be a while. This last operational challenge was partly addressed with the <a href="https://robertsdb2blog.blogspot.com/2020/12/db2-for-zos-what-do-you-know-about.html"><span style="color: #2b00fe;">rebind phase-in functionality</span></a> that was introduced with function level 505 of Db2 12 for z/OS. OK, great - you can successfully and non-disruptively rebind a package even when the package is in-use at the time of the issuance of the REBIND PACKAGE command; but, what about the situation in which a package needs to be invalidated, perhaps as a result of execution of an ALTER statement targeting an object on which the package is dependent (or as a result of an online REORG that materializes a pending DDL change)? Db2 11, in new-function mode, provided an assist in this case: Db2 gained the ability to detect when a package-invalidating action was being blocked by a dependent package bound with RELEASE(DEALLOCATE) and allocated to a persistent thread - detecting that situation, Db2 could automatically, dynamically and temporarily change the package's behavior to RELEASE(COMMIT) at its next commit point. This enhancement, though welcome, did not fully eliminate the problem.</span></p><p><span style="font-family: arial;">Why did Db2 11's automatic detection and reaction to a RELEASE(DEALLOCATE) package being allocated to a persistent thread not totally resolve the blocked package invalidation problem? Two reasons:</span></p><p></p><ol style="text-align: left;"><li><span style="font-family: arial;">The persistent thread to which the RELEASE(DEALLOCATE) package is allocated may be outside of Db2 - being, perhaps, in a between-the-last-and-the-next-transaction situation (this can particularly be an issue for high-performance DBATs).</span></li><li><span style="font-family: arial;">There might be several different dependent packages bound with RELEASE(DEALLOCATE), and copies of these various packages could be allocated to a large number of threads, such that Db2 is not able to separate all the packages from all those threads in time to keep the package-invalidating action from timing out.</span></li></ol><span style="font-family: arial;">For these situations, Db2 13 provides the solution in the form of a new option that can be exercised via the profile tables (SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES). The new option, available when Db2 13 function level 500 has been activated, comes in the form of a new keyword for the DSN_PROFILE_ATTRIBUTES table: RELEASE_PACKAGE. This keyword has one available associated ATTRIBUTE1 value: COMMIT, and specification of that value will cause Db2 to override RELEASE(DEALLOCATE) behavior for the package (or packages) associated with a profile to RELEASE(COMMIT) behavior, until further notice. And, this new profile table option can be used for local-to-Db2 applications (e.g., CICS-Db2 or IMS TM-Db2 or batch Db2) as well as for DDF-using applications (the profile tables had previously, essentially, been relevant only to DDF-using applications).</span><p></p><p><span style="font-family: arial;">Let's consider a use-case scenario to illustrate exploitation of this Db2 13-delivered capability.
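</span></p><p><span style="font-family: arial;">Before stepping through the scenario, here is a rough sketch of the SQL at its heart - the profile ID and collection name match the scenario below, the column lists are abbreviated, and you should check the Db2 documentation pages linked in the scenario for the full table layouts:</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-- identify an enabled profile for collection COLL_A<br />INSERT INTO SYSIBM.DSN_PROFILE_TABLE<br />(PROFILEID, COLLID, PROFILE_ENABLED)<br />VALUES (5, 'COLL_A', 'Y');<br /><br />-- override RELEASE(DEALLOCATE) with RELEASE(COMMIT) behavior<br />-- for local-to-Db2 processes (that's the ATTRIBUTE2 value of 1)<br />INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES<br />(PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)<br />VALUES (5, 'RELEASE_PACKAGE', 'COMMIT', 1);<br /><br />-- and then, via a Db2 command, load the profiles into memory<br />-START PROFILE</span></p></blockquote><p><span style="font-family: arial;">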
Suppose you're a Db2 DBA, and you need to issue an ALTER TABLE statement that will invalidate several packages, in collection COLL_A, that are bound with RELEASE(DEALLOCATE) and are executed by way of persistent threads. Not wanting this ALTER TABLE statement to time out due to conflict with the dependent RELEASE(DEALLOCATE) packages, you take this approach (and we'll assume that this is a Db2 13 system with function level 500 or higher activated):</span></p><p></p><ol style="text-align: left;"><li><span style="font-family: arial;">Sometime prior to the time you want to issue the ALTER TABLE statement (maybe 15 minutes ahead of that time, as an example), you insert a row in <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-dsn-profile-table"><span style="color: #2b00fe;">DSN_PROFILE_TABLE</span></a> to identify a profile that is associated with the collection COLL_A (you could also do this for a specific package in COLL_A, but we'll do this for the whole collection in this case). In doing this, you put the value 'Y' in the PROFILE_ENABLED column, to let Db2 know that this profile (and its associated attributes) is "live" (i.e., in effect).</span></li><li><span style="font-family: arial;">The just-created profile, like all profiles, has an ID associated with it (an integer value). We'll say that the ID for this COLL_A-related profile is 5. In the <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-dsn-profile-attributes"><span style="color: #2b00fe;">DSN_PROFILE_ATTRIBUTES</span></a> table, you insert a row for profile 5. In that row, you specify 'RELEASE_PACKAGE' for the KEYWORDS column value, and 'COMMIT' for the ATTRIBUTE1 value. For the ATTRIBUTE2 column you specify a value of 1, because (in this example) you want the RELEASE(DEALLOCATE) override action to apply only to local-to-Db2 processes (a value of NULL in the ATTRIBUTE2 column would indicate that the RELEASE(DEALLOCATE) override action is to be taken only for DDF-using processes, and a value of 2 would mean, "Take this action for all related processes, whether local-to-Db2 or involving DDF").</span></li><li><span style="font-family: arial;">You issue the Db2 command -START PROFILE, so that Db2 will load the information in the profile tables into memory. Db2 sees that profile 5 (among, possibly, others) is enabled, and takes action: every time a package in COLL_A is loaded for execution, it will be treated as though bound with RELEASE(COMMIT), even if RELEASE(DEALLOCATE) had been specified for the most recent bind or rebind of the package.</span></li><li><span style="font-family: arial;">Because you took these steps 15 minutes prior to the time for issuing the ALTER TABLE statement, Db2 had plenty of time to switch to RELEASE(COMMIT) behavior for every instance of a RELEASE(DEALLOCATE) package in COLL_A that is allocated for execution to a persistent thread. You issue the ALTER TABLE statement, and it succeeds because there are no dependent packages bound with RELEASE(DEALLOCATE) and allocated to persistent threads to block execution of the statement. Note that the application workload associated with the RELEASE(DEALLOCATE) packages continues to execute - just with RELEASE(COMMIT) behavior in effect for those packages. 
That means you temporarily do without the CPU efficiency benefit of RELEASE(DEALLOCATE) for the associated application(s).</span></li><li><span style="font-family: arial;">With the ALTER TABLE statement having been successfully executed, you update the row for profile 5 in DSN_PROFILE_TABLE to have 'N' in the PROFILE_ENABLED column, to show Db2 that this profile (and its attributes) is no longer in effect. A subsequent issuance of the -START PROFILE command lets Db2 know of this profile status change by re-loading the profile table information into memory.</span></li></ol><span style="font-family: arial;">And that's it. A problem that might have been standing in your way is taken care of, easily and non-disruptively.</span><p></p><p><span style="font-family: arial;">Of course, this story is not yet over. With the ALTER TABLE statement having been successfully executed, packages dependent on the table are invalidated. What happens after that? The invalidated packages will be auto-rebound by Db2 when next requested for execution, if you don't explicitly rebind them before that happens. Db2 13 function level 504, which came out just a few days ago, delivers big news on the auto-rebind front. I'll post a blog entry on that enhancement within the next few weeks.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-9957220169320625142023-09-27T12:54:00.000-07:002023-09-27T12:54:36.013-07:00Db2 for z/OS: Two Stories of Temporary Tables<p><span style="font-family: arial;">Db2 for z/OS provides, for your use, two different temporary table types: declared and created. I recently worked with Db2 for z/OS people at a couple of different sites, providing assistance in understanding and effectively utilizing Db2 temporary tables, and I think information related to these experiences might be useful to folks in the larger user community - thus this blog entry. The first story points up an advantage of declared global temporary tables, while the second illustrates advantageous use of a created global temporary table. I hope that what follows will be of interest to you.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Db2 declared global temporary tables - the index advantage</b></span></p><p><span style="font-family: arial;">A Db2 <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=statements-declare-global-temporary-table"><span style="color: #2b00fe;">declared global temporary table</span></a> (DGTT) is so-called because it comes into existence when it is declared - usually in an application program. There can be plenty of reasons for a program to declare a temporary Db2 table. For example, an application might need to get a preliminary set of result set rows from Db2 and then operate further on that data - easily done by inserting the result set rows into a declared temporary table and subsequently issuing various SQL statements that target the DGTT.</span></p><p><span style="font-family: arial;">A Db2 for z/OS DBA sent me the other day some information about a use case at his site that made a declared global temporary table the right choice. 
B</span><span style="font-family: arial;">ecause the temporary table in this case was going to be fairly large, and because DELETE statements targeting the table would be coded with predicates, there would be a performance advantage to the use of a declared versus a created global temporary table (CGTT): an index can be defined on a DGTT, but not on a CGTT </span><span style="font-family: arial;">(this and other </span><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-distinctions-between-db2-base-temporary" style="font-family: arial;"><span style="color: #2b00fe;">differences between declared and created global temporary tables are listed in the Db2 for z/OS online documentation</span></a><span style="font-family: arial;">).</span></p><p><span style="font-family: arial;">So, with the available option of indexing, DGTTs will generally be preferred over CGTTs, right? Not necessarily...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>When a CGTT can majorly reduce elapsed and CPU time for a process</b></span></p><p><span style="font-family: arial;">The second Db2 temporary table story I'll relate came to me by way of a Db2 for z/OS systems programmer who works for a financial services organization. In this case, a z/OS-based COBOL batch program was retrieving data from a VSAM file and inserting related information into a Db2 temporary table. Rows were then SELECTed from the temporary table for further processing, after which the rows in the temporary table were deleted (via a predicate-less DELETE). These insert/retrieve/delete actions involving the temporary table were repeated about 300,000 times in an execution of the batch job. At first, the COBOL program declared and utilized a declared global temporary table. The problem? That approach did not perform as needed: the job consumed around 20 minutes of CPU time, and the SQL activity drove approximately 23 million GETPAGE requests (a GETPAGE happens when Db2 needs to examine the contents of a page of a database object, and GETPAGE activity is a major factor in the CPU cost of SQL statement execution). The Db2 sysprog also noticed that the batch process generated a lot of PREPARE activity (referring to the dynamic preparation of SQL statements for execution by Db2 - something that can significantly add to an application program's CPU consumption).<br /><br />To try to reduce the CPU cost of the batch job, the Db2 team at this financial services organization switched from using a DGTT to a "permanent" (i.e., a "regular") Db2 table. Performance indeed got way better: GETPAGE requests dropped by over 70%, and CPU time for the job went from about 20 minutes to about 4 minutes. Why the big drop in GETPAGEs and CPU time? Probably this had to do with elimination of expensive SQL statement preparation activity. 
See, you might think that the SQL statements hard-coded in your COBOL (for example) program are all static ("static" SQL statements are prepared by Db2 for execution prior to program execution, via a process known as BIND), but when those statements refer to a DGTT they have to be dynamically prepared for execution when issued by the program because there is no definition of a DGTT in the Db2 catalog.</span></p><p><span style="font-family: arial;">This big improvement in CPU efficiency notwithstanding, there was a side-effect of the switch from a DGTT to a permanent table that did not sit well with the Db2 team at the financial services company: as previously noted, the set of SQL statements targeting (initially) the DGTT and then the permanent Db2 table involved a delete of all rows in the table, and that happened 300,000 times in the execution of the batch job. When a permanent table was used in place of the DGTT, these 300,000 mass DELETEs (a mass DELETE is a DELETE without predicates) caused 300,000 rows to be inserted for the permanent table in the SYSCOPY table in the Db2 system's catalog. Running a MODIFY RECOVERY utility job to clear those rows out of SYSCOPY, and having to take an image copy of the permanent table to keep it from going into COPY-pending status, were viewed as significant hassles by the Db2 team. Was a still-better way forward available to them?</span></p><p><span style="font-family: arial;">Indeed so. I suggested going with a created global temporary table. [Like a permanent table, a CGTT is defined by way of a <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=statements-create-global-temporary-table"><span style="color: #2b00fe;">CREATE statement</span></a>, and there is information about a CGTT in the Db2 catalog. When a program references the CGTT it gets its own instance of the CGTT which (like a DGTT) is physically provisioned in the Db2 work file database.] The Db2 team did that, and the results were very positive. CPU time for the job - originally about 20 minutes with the DGTT and then about 4 minutes with the permanent table, went down to just over 2 minutes with the CGTT (as with the permanent table, no SQL statement dynamic preparation was needed, thanks to the CGTT being defined in the Db2 catalog); <u>and</u>, there were no inserts into SYSCOPY in association with the repeated mass DELETEs (same as with the DGTT); <u>and</u>, there was no need for an image copy of the CGTT because the instance of the table goes away automatically when the process using the table completes (same as with the DGTT). So, the CGTT in this case provided the advantages of a DGTT and of a permanent Db2 table, minus the relative disadvantages of those options (the dynamic statement preparation costs of the DGTT, and the mass-DELETE-related SYSCOPY inserts and the image copy requirements of the permanent table).</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The bottom-line message</b></span></p><p><span style="font-family: arial;">Declared and created global temporary tables both have their places in a Db2 for z/OS application environment. When considering the use of a Db2 temporary table for an application, be careful not to jump too quickly to a DGTT versus a CGTT decision (though sometimes there will be only one choice - as when, for example, UPDATE access to the temporary table is needed - something that can only be done with a DGTT). 
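</span></p><p><span style="font-family: arial;">To make the comparison concrete, here are minimal sketches of the two definitions - the table, column and index names are hypothetical:</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-- DGTT: declared by the program, and index-able<br />DECLARE GLOBAL TEMPORARY TABLE SESSION.WORK_ACCTS<br />(ACCT_ID BIGINT, BAL DECIMAL(11,2))<br />ON COMMIT PRESERVE ROWS;<br />CREATE INDEX SESSION.WORK_IX ON SESSION.WORK_ACCTS (ACCT_ID);<br /><br />-- CGTT: defined once in the Db2 catalog, so static SQL that<br />-- references it does not have to be dynamically prepared<br />CREATE GLOBAL TEMPORARY TABLE WORK_ACCTS<br />(ACCT_ID BIGINT, BAL DECIMAL(11,2));</span></p></blockquote><p><span style="font-family: arial;">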
Consider the DGTT versus CGTT choice in light of the particulars of the use case, and choose accordingly. A thoughtful choice can yield a substantial performance advantage - so use your head.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com10tag:blogger.com,1999:blog-4516533711330247058.post-10119370983563641272023-08-31T15:03:00.000-07:002023-08-31T15:03:07.396-07:00Db2 for z/OS: An Important Difference Between Data in Memory and Data on Disk<p><span style="font-family: arial;">For the past several years, I've tried to post an entry per month to this blog. Sometimes, it will be very near the end of a month, and I haven't posted anything since the previous month, and I don't have any good ideas for a new entry. Then, I'll have an exchange with someone - could be via email, a phone call, a face-to-face discussion, whatever - and BOOM! Right there I'll find the inspiration for a blog entry. That happened again yesterday - the day before the last day of the month. Whew!</span></p><p><span style="font-family: arial;">Here's what happened: in an email message, an IT professional recounted to me a situation that had her perplexed. The scenario: 11,000 rows of data were loaded into a table using the IBM Db2 for z/OS LOAD utility. Immediately after the completion of that load job, a program that updated 8 of those just-loaded rows executed and ran successfully to completion. Right after that, an unload job for the table in question was executed. This unload was performed using an IBM software product called Db2 High Performance Unload for z/OS, or HPU, for short (see <a href="https://www.ibm.com/docs/en/dhpufz/5.1.0?topic=documentation-db2-high-performance-unload-overview"><span style="color: #2b00fe;">https://www.ibm.com/docs/en/dhpufz/5.1.0?topic=documentation-db2-high-performance-unload-overview</span></a>). HPU has two modes of operation: it can perform an unload by operating directly on the VSAM data set(s) associated with the target table, or it can do the unload through Db2, in which case the data is accessed in memory (i.e., in the buffer pool to which the table's table space is assigned). This unload was done in the former of these modes - operating directly on the VSAM data set(s) associated with the table's table space. The result of the unload surprised the person who emailed me. How so? Well, the unload was done using a predicate (the WHERE clause that you might see in a query), and the update program that ran between the load (of the 11,000 rows) and the unload changed values in a way that should have caused 8 of the 11,000 loaded rows to be filtered out by the unload process's predicate (the other 10,992 rows would be qualified by the predicate). The person who emailed me expected 10,992 records in the unload data set, but there were in fact 11,000 rows in that data set. The updates that should have caused 8 rows to be not-qualified by the unload process's predicate were committed before the unload job ran, so why was this update action not reflected in the contents of the unload data set? 
Consternation increased when another unload of the table, executed a few hours</span><span style="font-family: arial;"> later</span><span style="font-family: arial;"> (again, with the unload process using a predicate and operating directly on the table's associated VSAM data set(s)), generated an unload data set that </span><u style="font-family: arial;">did</u><span style="font-family: arial;"> contain the expected 10,992 rows.</span></p><p><span style="font-family: arial;">What in the world was going on here?</span></p><p><span style="font-family: arial;">Here's what was going on: this all has to do with a big difference between a <i>committed</i> data change (which has relevance for Db2 data-in-memory) and an <i>externalized</i> data change (which relates to Db2 data-on-disk). What's important to know is that Db2 for z/OS does <u>not</u> externalize data changes (i.e., does not write changed data to the associated VSAM data set) as part of commit processing. Instead, </span><span style="font-family: arial;">database write I/O operations (to externalize data changes to VSAM data sets on disk) are done in a deferred way (and usually asynchronously, at that). This aspect of Db2's operation is critically important to scalability when it comes to data-change operations (e.g., INSERT, UPDATE and DELETE). If Db2 had to write changed pages to disk at commit time, data-change throughput would be majorly impacted in a negative way. In the scenario described above, the first unload generated by HPU (done right after the programmatic update of 8 of the 11,000 rows previously LOAD-ed into the table), operating directly on the table space's underlying VSAM data set(s), did not reflect the post-LOAD update of the 8 rows because the page(s) changed by the updating program were not written to the underlying VSAM data set(s) at commit time. The changed page(s) were externalized later by Db2 via deferred write processing, and that is why the second unload process, also operating directly on the table space's VSAM data set(s), reflected the aforementioned updates of 8 of the 11,000 table rows.</span></p><p><span style="font-family: arial;">If Db2 deferred write action did eventually get the changed pages (associated with the updating of 8 of the table's rows) written to the associated VSAM data sets on disk - and it did - then what caused that deferred write action to happen? Usually Db2 deferred write operations for a given buffer pool are driven by one of two deferred write thresholds being reached for the pool. The deferred write queue threshold (abbreviated as DWQT) is expressed as a percentage of the total number of buffers allocated for a pool that are occupied by changed-but-not-externalized pages (the default value is 30), and the vertical deferred write queue threshold (VDWQT) is expressed as a percentage of the pool's buffers that are occupied by changed-but-not-externalized pages that belong to a particular data set (the default value is 5). Whenever either of those limits is reached (and it's usually the VDWQT limit), deferred write activity is triggered. The deferred write I/Os, by the way, are generally multi-page in nature (multiple pages written to disk in one I/O operation), and that is good for CPU-efficiency on a per-page basis. 
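</span></p><p><span style="font-family: arial;">If you want to see where these thresholds are set for a given pool, or to adjust them, Db2 commands along these lines do the job (the buffer pool name and threshold values here are purely illustrative):</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-DISPLAY BUFFERPOOL(BP2) DETAIL<br />-ALTER BUFFERPOOL(BP2) DWQT(10) VDWQT(1)</span></p></blockquote><p><span style="font-family: arial;">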
The CPU time associated with database writes (usually not much - I/Os don't require many CPU cycles) is charged to the Db2 database services address space (aka the DBM1 address space).</span></p><p><span style="font-family: arial;">What about synchronous database write actions? Are they also deferred? Yes, they are - they're just triggered by something besides deferred write queue thresholds being reached. In the case of synchronous writes, the trigger is Db2 checkpoint processing. How that works: when Db2 executes a system checkpoint (which it does, by default, every 3 minutes), it notes all pages in each buffer pool that are in changed-but-not-yet-externalized status. When the next system checkpoint rolls around, Db2 checks to see if any of the changed-but-not-yet-externalized pages noted at the last checkpoint have still not been written to disk. If there are any such pages then they will be synchronously written to disk as part of checkpoint processing. Here, "synchronous" means that Db2 will immediately start writing those pages to disk, and it will continue to do that until they are all externalized.</span></p><p><span style="font-family: arial;">OK, back to the story that prompted this blog entry. Is there a way that the initial HPU unload (the one executed very shortly after the programmatic update of 8 of the 11,000 rows LOAD-ed into the table) could have generated an unload data set with the desired 10,992 rows? Yes. In fact, there were at least two options for getting that done. One option would be to execute the Db2 QUIESCE utility for the table's table space prior to running the HPU unload. This would cause Db2 to write all changed-but-not-yet-externalized pages of the table's table space to disk, and then an HPU unload operating directly on the table space's VSAM data sets would have reflected the update of the 8 rows.</span></p><p><span style="font-family: arial;">The second option would be to have HPU do the unload through Db2, as opposed to operating directly on the table space's underlying VSAM data sets - this is something that can be done through an HPU keyword. That, in turn, would have caused the HPU unload to be accomplished using data in memory (i.e., in the table space's assigned buffer pool) - any of the to-be-unloaded pages that were not already in memory would have been read into memory as part of the unload process. This approach would have reflected the programmatic update of the 8 rows because those updates had been committed, and Db2 data-in-memory is always in a transactionally consistent state (any in-memory data that is not transactionally consistent because of an in-flight - that is, not-yet-completed - data change operation is blocked from access by X-type locks, taken at a page or a row level, that are not released until the data changes in question are committed).</span></p><p><span style="font-family: arial;">Which of these options would you choose? It would depend on what is most important for you. The QUIESCE option would allow the HPU unload to operate directly on the VSAM data set(s) associated with the table space, and that would yield a CPU efficiency benefit, but the QUIESCE itself could be at least somewhat disruptive for applications accessing the target table.
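</span></p><p><span style="font-family: arial;">For reference, the utility control statement for the QUIESCE option is a one-liner (the database and table space names here are hypothetical):</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">QUIESCE TABLESPACE MYDB.MYTS</span></p></blockquote><p><span style="font-family: arial;">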
The "through Db2" option would not disrupt any application processes that were accessing the table at the time, but it would cause the HPU unload operation to consume some additional CPU time.</span></p><p><span style="font-family: arial;">By the way, if you're wondering, "If committed data changes are written to disk in a deferred way, how is loss of committed data changes prevented in the event of an abnormal termination (i.e., a "crash") of the Db2 subsystem that happens when there are changed-but-not-yet-externalized pages in memory?" Worry not - data recorded in the Db2 transaction log is used to process those "pending writes" as part of the "roll-forward" phase of Db2 restart processing following a subsystem failure.</span></p><p><span style="font-family: arial;">One more thing: the mechanics of all this are different in a Db2 data sharing environment (involving group buffer pool writes and associated castout operations to eventually get changed pages written to VSAM data sets on disk), but the net effect is the same.</span></p><p><span style="font-family: arial;">And there you have it. I'm thankful for your visiting this blog, and I'm thankful for interesting questions that come in when I'm trying to figure out what I'm going to blog about.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com2tag:blogger.com,1999:blog-4516533711330247058.post-10684690284719604672023-07-28T16:11:00.000-07:002023-07-28T16:11:55.056-07:00Db2 for z/OS: What I Would Say to Application Developers (Part 2)<p><span style="font-family: arial;">In the <a href="http://robertsdb2blog.blogspot.com/2023/06/db2-for-zos-what-i-would-say-to.html"><span style="color: #2b00fe;">part 1</span></a> of this 2-part entry (posted last month), I emphasized what I consider to be job one for a developer coding a Db2 for z/OS-targeted query (write a SQL statement that will retrieve the data your program requires, and don't worry too much about the statement's performance - that's mostly taken care of by Db2), while also noting ways in which a developer can effectively work in partnership with a DBA to enhance the performance of a Db2-based application. In this part 2 entry I will focus on leveraging application-enabling features of Db2 for z/OS.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>First, what interface is right for a Db2-accessing application?</b></span></p><p><span style="font-family: arial;">For a long time, this was not a very meaningful question, as there was essentially one application interface to Db2 for z/OS: the SQL interface. Sure, there could be discussions around dynamic versus static SQL (i.e., SQL statements prepared by Db2 for execution when initially issued by a program, versus statements that are pre-prepared for execution via a Db2 process known as "bind"), or about the use of "generic" (i.e., non-DBMS-specific) SQL forms such as JDBC and ODBC, but in any case you were talking about SQL, period. That changed with Db2 12 for z/OS, which introduced <a href="http://robertsdb2blog.blogspot.com/2021/02/are-you-using-rest-interface-to-db2-for.html"><span style="color: #2b00fe;">Db2's REST interface</span></a>. Using Db2's REST interface does not involve compromising performance, security or <a href="http://robertsdb2blog.blogspot.com/2021/06/db2-for-zos-rest-services-scalability.html"><span style="color: #2b00fe;">scalability</span></a>, so I'd say use it when it makes sense. When might it make sense? 
Here are some considerations:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Client-side programming language flexibility</i> - The IBM Data Server Driver, which supports the use of SQL forms such as JDBC and ODBC with Db2 for z/OS, can be used with applications written in a number of programming languages (generally speaking, programs coded with embedded static SQL statements don't require a driver), but suppose your team wants to use a language for which the IBM Data Server Driver is not applicable? Well, does the language allow a program to issue a REST request? If the answer to that question is, "Yes" (often the case), programs written in that language can access Db2 via its REST interface.</span></li><li><span style="font-family: arial;"><i>Total abstraction of the particulars of the service-providing system</i> - If you're using a generic SQL form such as JDBC or ODBC, you don't have to know the specifics of the relational database management system being accessed, but you still know that your program is accessing a relational DBMS (or something that through virtualization software appears to be a relational DBMS). Maybe you don't want to have to know that (even if you have strong SQL skills) - you just want to request some service and have it performed as expected by some system, and you don't care a whit about what that system is and how it does what it does. In that case, the REST interface to Db2 looks really good.</span></li><li><span style="font-family: arial;"><i>Separation of programming duties</i> - When your client-side program accesses Db2 using REST requests, your program isn't issuing SQL statements - you're instead coding REST requests that invoke server-side SQL statements that were <a href="https://robertsdb2blog.blogspot.com/2021/05/the-rest-interface-db2-for-zos-as.html"><span style="color: #2b00fe;">likely written by someone else</span></a>. That separation of programming duties - client-side developers code programs that issue REST requests and process any returned results, and server-side developers write the Db2 SQL statements and/or stored procedures that will be REST-invoked - might suit you (and your organization's IT leadership) just fine.</span></li></ul><span style="font-family: arial;">So, think this over for a new application that will access Db2 for z/OS, and make the appropriate choice.</span><p></p><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>Let Db2 for z/OS do work so you don't have to</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">There are a number of Db2 features that can provide useful functionality for an application, and when you leverage one of these features that's functionality that you don't have to provide via program code. 
Here are some of the features in this category, delivered through recent versions of Db2 for z/OS:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Temporal data support (introduced with Db2 10 for z/OS)</i> - This capability, through which a time dimension can be added to data in a table, comes in two forms (both can be implemented for a given table, if desired, or one or the other can be used):</span></li><ul><li><span style="font-family: arial;"><i>System-time temporal (also known as row versioning)</i> - The way this works: when a row in table T1 is deleted or updated, the "before" image of that row (the row as it existed prior to the update or delete operation) is automatically inserted by Db2 into the "history" table associated with T1, and Db2 also updates timestamp values in the history table row indicating when that "version" of the row first became the current version (i.e., when the row was first inserted into T1, or when it was updated to that version) and when the row stopped being the current version (i.e., when it was deleted from T1, or when it was updated to a new version). Here's what this means: using pretty simple Db2 temporal query syntax, your application can easily find out from Db2 what the row for a given entity (a bank account, an insurance policy, a customer record, whatever) looked like <u>at a previous point in time</u>. For example, when it comes to insurance claim adjudication, what's important is not what the policy holder's coverage is <u>now</u> - what's important is the coverage that was in effect <u>when the event prompting the claim occurred</u>. That's easy to determine with Db2 row versioning functionality. Also easy is seeing how a given entity's row in a table changed over a period of time, and <a href="http://robertsdb2blog.blogspot.com/2015/04/a-db2-11-for-zos-temporal-data.html"><span style="color: #2b00fe;">who made changes to the row</span></a>.</span></li><li><span style="font-family: arial;"><i>Business-time temporal</i> - This form of Db2 temporal functionality lets you add <u>future</u> data changes to a table with an indication of when that change will go into effect and how long it will be in effect (if not indefinitely). For example, you could through a business-time temporal UPDATE statement indicate (and this will be reflected in the target table) that the price of product XYZ is going to change from $10 to $12 on May 1 of next year. Updates of this nature will not impact programs that, by default, are accessing rows that, from a business perspective, are currently in effect. 
Having future prices (for example) in a product table provides at least a couple of benefits: 1) it ensures that price changes will actually go into effect when scheduled, and 2) it allows financial analysts to issue queries that will show what revenue and profits will be with prices that <u>will be in effect</u> at a future date.</span></li></ul><li><span style="font-family: arial;"><i><a href="http://robertsdb2blog.blogspot.com/2015/03/the-db2-managed-data-archiving-feature.html"><span style="color: #2b00fe;">Db2 transparent archiving</span></a> (introduced with Db2 11 for z/OS)</i> - This feature can be helpful, especially for performance, in this scenario: table T1 has 20 years (for example) of historical transaction data (maybe related to account activity), but the vast majority of accesses to the table target rows that have been added in the past 90 days; further, because the table's row-clustering key is not continuously-ascending and because row inserts far outnumber row deletes, the "popular" rows in the table (the ones not more than 90 days old) are physically separated from each other by ever-larger numbers of "old and cold" rows (the ones rarely accessed by programs). In that case the performance for access to popular rows will get progressively worse, and the cost of administering the table (e.g., backing it up, periodically reorganizing it, etc.) will steadily increase. When Db2 transparent archiving is activated for the table T1 (easily done by a DBA), T1 will end up holding only the most recent 90 days of data (the popular rows), while all of the "old and cold" rows are physically stored in the "archive" table that - in an "under the covers" way - is associated with T1 (result: substantially better performance for access to the popular rows, because they are concentrated in a smaller table); <u>and</u>, for query purposes Db2 makes the base table and its associated archive table appear logically as a single table, so a query referencing only the base table can retrieve archived rows as needed; <u>and</u>, when a row is deleted from the base table (after it's been there for - in this example - 90 days), that row is automatically inserted by Db2 in the associated archive table.</span></li><li><span style="font-family: arial;"><i><a href="http://robertsdb2blog.blogspot.com/2017/07/db2-12-for-zos-sql-enhancements-result.html"><span style="color: #2b00fe;">Result set pagination</span></a> (introduced with Db2 12 for z/OS)</i> - Db2 12 made a new clause, OFFSET, available for a SELECT statement. 
OFFSET, used in combination with the FETCH FIRST n ROWS clause, makes it programmatically easier to return a multi-row result set in "pages" that a user can scroll through.</span></li><li><span style="font-family: arial;"><i><a href="http://robertsdb2blog.blogspot.com/2017/06/db2-12-for-zos-sql-enhancements-piece.html"><span style="color: #2b00fe;">"Piece-wise" DELETE</span></a> (introduced with Db2 12 for z/OS)</i> - Starting with Db2 12, you can use the FETCH FIRST n ROWS clause with a DELETE statement, and that makes it really easy to write a program that will remove a large number of rows in a table in a series of small units of work (so that your data-purge program will not acquire too many locks at one time - nice for concurrency of access with other programs targeting the table).</span></li><li><span style="font-family: arial;"><i>Newer built-in functions</i> - Db2 12 for z/OS added some nice built-in functions, including:</span></li><ul><li><span style="font-family: arial;"><i><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=functions-percentile-cont"><span style="color: #2b00fe;">PERCENTILE_CONT</span></a> and <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=functions-percentile-disc"><span style="color: #2b00fe;">PERCENTILE_DISC</span></a></i> - These functions (the former treats values in a column as points in a continuous distribution of values, while the latter treats column values as discrete data values) are useful for writing a query that answers a question such as, "Show me the value that is the 90th percentile with regard to salaries of employees in department A01."</span></li><li><span style="font-family: arial;"><i><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=functions-listagg"><span style="color: #2b00fe;">LISTAGG</span></a></i> - This function makes it easy to have a comma-separated list of values (e.g., last names of employees who have more than 10 years of service with the company, for a given department) as a column of a query result set.</span></li><li><span style="font-family: arial;"><i><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=sf-hash-crc32-hash-md5-hash-sha1-hash-sha256"><span style="color: #2b00fe;">HASH_MD5</span></a></i> - With this function, you can use a SQL statement to get an MD5 hash of a value before inserting that value into a table (and there are three related built-in functions associated with other popular hashing algorithms).</span></li></ul><li><span style="font-family: arial;"><i>Application-specific lock timeout value (introduced with Db2 13 for z/OS)</i> - Db2 13 provided the new <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=registers-current-lock-timeout"><span style="color: #2b00fe;">CURRENT LOCK TIMEOUT</span></a> special register, through which an application can set a lock timeout value that is different from the default lock timeout value for the Db2 system. Suppose, for example, that the system's default lock timeout value is 30 seconds. Maybe the development team for a mobile app that will drive accesses to a Db2 for z/OS database wants to make sure that a user will never have to look for 30 seconds at a spinning colored wheel on a mobile phone screen if a lock required by the app can't be readily obtained. 
The application team might decide (probably rightly) that it would be better to have a lock timeout value of 3 seconds for this app, have a condition handler in the Db2-accessing program for a lock timeout error, and in the event of a lock timeout (noticed in 3 seconds, versus 30 seconds) send an "Oops! Something went wrong - please try again" message to the user. I as a user would prefer that to looking at the spinning colored wheel for 30 seconds. Similarly, the development team for a long-running, mission-critical batch application might not want their job to time out unless a lock wait exceeds 10 minutes. Easily done with a SET CURRENT LOCK TIMEOUT statement.</span></li><li><span style="font-family: arial;"><i><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=running-ai-queries-sql-data-insights"><span style="color: #2b00fe;">SQL Data Insights</span></a> (introduced with Db2 13 for z/OS)</i> - This feature represents the first embedding of advanced machine learning technology in the Db2 for z/OS "engine." It's easy for a DBA to set up and easy for a developer (or a user utilizing a query tool) to use, because the data science work was done by IBM development teams. SQL Data Insights is made usable in the form of three new built-in Db2 functions (more will be delivered via future Db2 13 function levels): AI_SIMILARITY, AI_SEMANTIC_CLUSTER and AI_ANALOGY. These new functions allow for the asking of what I like to call "fuzzy" queries. Here's one example: suppose the fraud analysis team at an insurance company finally caught someone who had been submitting fraudulent claims (and had been very good at covering his tracks). Among the company's several million other policy holders, who else might be engaging in hard-to-detect fraudulent activity? Via the AI_SIMILARITY built-in function of Db2, you can (using standard query syntax for a built-in function) easily code a query that will do this: "Hey, Db2. Here is the ID of a policy holder. Show me the IDs of the 20 other policy holders who are most like this one." And here's the kicker: in coding that query, <i>you don't have to tell Db2 what you mean by "like this one."</i> Db2 will detect patterns of similarity in the data in the specified table - patterns that a human being might be challenged to discern - and return the rows with the highest "similarity scores" in relation to the policy holder ID provided as input to the function. You can turn that list over to the fraud detection team and say, "Hey, guys. Do a deep-dive analysis of activity for the policy holders associated with these 20 IDs - they are the ones most similar to the fraudster that we recently caught."</span></li></ul><span style="font-family: arial;">And there are more application-enabling Db2 features where these came from. Again, let Db2 do work that you might otherwise have to do programmatically. Not only will that save you time and effort - it's likely that Db2 will get the work done more CPU-efficiently than program code would.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>Take advantage of Db2 global variable functionality</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">Db2 11 for z/OS introduced <a href="http://robertsdb2blog.blogspot.com/2015/07/how-will-you-use-db2-for-zos-global.html"><span style="color: #2b00fe;">global variables</span></a>. 
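</span></div><div><span style="font-family: arial;">A minimal sketch of the idea, using the GLOBVAR name from the discussion that follows (the data type is arbitrary, and table T1 and column COL1 are hypothetical):</span></div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-- done once, typically by a DBA:<br />CREATE VARIABLE GLOBVAR VARCHAR(10);<br />GRANT READ, WRITE ON VARIABLE GLOBVAR TO PUBLIC;<br /><br />-- in application process A (process B gets its own instance):<br />SET GLOBVAR = 'cat';<br />SELECT COL1 FROM T1 WHERE COL1 = GLOBVAR;</span></p></blockquote><div><span style="font-family: arial;">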
Unlike a traditional host variable, which you have to define in your program code, a Db2 global variable is created by a Db2 DBA. Once a global variable has been created, it can be used by an application process (a Db2 DBA just has to grant to the ID of the process the privilege to use the global variable). When an application references a Db2 global variable in a SQL statement, it gets its own instance of that global variable (in other words, if there is a Db2 global variable called GLOBVAR, and application process A puts 'cat' in the global variable and application process B puts 'dog' in the global variable, when the two processes look at the value of GLOBVAR then A will see 'cat' and B will see 'dog'). A Db2 global variable makes it really easy to get a value from a Db2 table and pass it via the global variable to a subsequent SQL statement (when the subsequent SQL statement references the global variable in, say, a query predicate, it is as though the predicate were referencing the value previously placed in the global variable).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Another Db2 global variable use case: a global variable makes it really easy to get a value from a Db2 advanced trigger (that's a trigger that has as part of its definition a SQL procedure language routine - more on SQL PL, below): the trigger just puts the value in a global variable, and when the trigger's processing is done and control is returned to the program that caused the trigger to fire, the program looks in the global variable and, voila - there's the value placed by the trigger.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>Leverage SQL procedure language, and manage and deploy Db2 for z/OS SQL routines in an agile way</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;"><a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=sql-procedural-language-pl"><span style="color: #2b00fe;">SQL procedure language</span></a> (SQL PL) lets you code Db2 routines (stored procedures, user-defined functions and advanced triggers) using only SQL statements. This is do-able thanks to a class of Db2 SQL statements called control statements (a reference to logic flow control). These statements have names such as LOOP, ITERATE, WHILE and GOTO - you get the picture.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">SQL PL has become really popular over the past 10-15 years (it was introduced with Db2 9 for z/OS). When it comes to Db2 data processing routines, those written in SQL PL can have both functional and performance advantages over routines written in other languages (<a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=procedures-autonomous"><span style="color: #2b00fe;">autonomous procedures</span></a> are just one example of a functional benefit of stored procedures written in SQL PL). If you decide to use SQL PL routines for an application, I encourage you to manage and deploy these routines in an agile way. In terms of SQL PL routine management, consider how associated source code will be managed. The source code of a Db2 for z/OS "native SQL procedure" (i.e., a stored procedure written in SQL PL) is the CREATE PROCEDURE statement that defines the stored procedure. 
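</span></div><div><span style="font-family: arial;">For example, the source of a simple native SQL procedure might look like this - the procedure name, parameters, table and logic are all hypothetical (note the CREATE OR REPLACE form, which comes up again in a moment):</span></div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">CREATE OR REPLACE PROCEDURE ACCT.SET_STATUS<br />(IN P_ACCT_ID BIGINT, IN P_STATUS CHAR(1))<br />LANGUAGE SQL<br />BEGIN<br />UPDATE ACCOUNTS<br />SET STATUS = P_STATUS<br />WHERE ACCT_ID = P_ACCT_ID;<br />END</span></p></blockquote><div><span style="font-family: arial;">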
How should you manage this source code? <i>The same way you'd manage any other source code</i> - don't get thrown off by the fact that this source code has a CREATE in it. Does your organization use an open-source source code management (SCM) tool such as GitLab? OK, fine: use GitLab to manage the source for your Db2 native SQL procedures - you wouldn't be the first to do that.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">How about deployment of Db2 SQL PL routines - especially native SQL procedures? To do that in the most agile way possible, use <a href="http://robertsdb2blog.blogspot.com/2021/07/create-or-replace-agile-deployment-of.html"><span style="color: #2b00fe;">CREATE OR REPLACE syntax</span></a> when coding these routines. This is the best fit for a unified DevOps pipeline (i.e., an application deployment pipeline used for all of your organization's applications, regardless of the platform(s) on which application programs will run).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>The bottom line</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">In the minds of the IBM folks who develop Db2 for z/OS, application developers are a tremendously important constituency. There are all kinds of Db2 features and functions that were expressly designed to make life easier for application developers working on programs that will access Db2 data servers. Learn about this stuff and take advantage of it. And work with your organization's Db2 for z/OS DBAs. They can help you leverage Db2's application-enabling capabilities.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Code on!</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-83362832021219795202023-06-30T07:48:00.000-07:002023-06-30T07:48:39.210-07:00Db2 for z/OS: What I Would Say to Application Developers (Part 1)<p><span style="font-family: arial;">Not long ago, I received a request to deliver a couple of Db2 for z/OS-focused webcasts for an organization's application developers. The person who asked about this initially gave me the impression that the purpose of the webcasts would be to help developers write "efficient SQL." This did not have much appeal for me (as I'll explain below), and I communicated as much back to the requester. Subsequently, this individual rephrased the request, indicating that the aim of the webcasts would be to provide "insights for developers to increase their confidence and skills around Db2 [for z/OS] in both development and problem analysis." "OK," I thought to myself, "Now we're talking." This ask gave me an opportunity to think about what I'd like to say to people who write (or might write in the future) application programs that involve accessing Db2 for z/OS-managed data. I'll share these thoughts of mine in a two-part blog entry. In this first part I'll talk about application performance - but maybe not in the way you'd expect. 
In the part two entry, which I'll likely post in the next 2-3 weeks, I'll focus on application enablement from a Db2 for z/OS perspective.</span></p><p><span style="font-family: arial;">OK, why will a request to talk to developers about "writing efficient SQL" generally rub me the wrong way? Two reasons: first, something I heard a few years ago. I was at a big Db2-related conference, sitting in the audience for a session delivered by the person who was at that time the leader of the optimizer team in the IBM Db2 for z/OS development organization (Db2's optimizer parses a query and generates for that query the access plan that it estimates will produce the requested result set at the lowest cost and with the best performance). The presenter said (in words to this effect, and with emphasis added by me), "As the leader of the Db2 for z/OS optimizer team, my message for application developers is this: <b><i>job one for you is to write the query that will retrieve the data that your program needs.</i></b> If that query could be written differently so as to retrieve the same result faster, we'll take care of that." What he was specifically referring to is the Db2 optimizer's ability to re-write a query under the covers so that the same result will be generated faster (more on that re-write capability momentarily). That statement by the optimizer team leader made a huge impression on me, and I think his words were absolutely spot-on.</span></p><p><span style="font-family: arial;">I feel that it's very important for an application developer, when writing SQL targeting a Db2 for z/OS database, to focus on a query's <i style="font-weight: bold;">objective</i>, versus its <i style="font-weight: bold;">form</i>. Why? For one thing, job one really is to get the right data. If a query returns incorrect or incomplete data to a program, who cares if the query runs quickly? A bad result that is returned in a short time is still a bad result. Nothing is more important than retrieving the data that a program requires. Secondly, I believe it's very important for a developer writing Db2-targeting SQL to <b><i>not have to think about the fact</i></b> that the target DBMS is Db2 for z/OS. All you as an application developer should really have to think about is that the target DBMS is relational in nature. If you have to stop and think, "Oh, let's see - the data I'm going after is in a Db2 for z/OS database. That means I have to do X, Y and Z in order to get good performance," that's going to negatively impact your productivity, assuming that you're also called on to write SQL that targets other relational DBMSs. As far as I'm concerned, when Db2 for z/OS is the target DBMS you should just think, "relational DBMS," and go from there.</span></p><p><span style="font-family: arial;">Here's another reason that a request to "tell developers how to write efficient Db2 for z/OS SQL" raises my hackles: too many Db2 for z/OS DBAs, in my opinion, just assume that the average application developer writes inefficient SQL. It's kind of like complaining about the food at college just because it's college food, regardless of whether or not it's actually tasty. That's not a helpful attitude. I've advised Db2 for z/OS DBAs that they should think of themselves as partners with developers when it comes to getting new applications and new application functionality into production. 
Similarly, I would advise developers to be partners with Db2 for z/OS DBAs when it comes to analyzing and addressing performance issues related to Db2 for z/OS-accessing applications.</span></p><p><span style="font-family: arial;">How can a developer be a partner when it comes to taking action to resolve performance issues related to Db2 for z/OS-targeted queries? Some thoughts on that matter:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i style="font-weight: bold;">Learn some of the lingo.</i> Sometimes, Db2 for z/OS DBAs will say things like, "This SQL statement isn't performing well because it has this stage 2 predicate?" Huh? OK, here's what that means: predicates (the result set row-qualifying parts of a query, such as WHERE ACCOUNT_NUM = 1234) in Db2 for z/OS SQL statements can be either stage 1 or stage 2 in nature. These terms refer, respectively, to two components of Db2 for z/OS: the data manager (stage 1) and the relational data system (stage 2). A stage 1 predicate can be evaluated by the Db2 data manager, while a stage 2 predicate has to be processed by the Db2 relational data system. Stage 2 predicates generally require more CPU time for processing than stage 1 predicates; furthermore, stage 1 predicates are usually index-able, while stage 2 predicates are almost never index-able. That index-able versus non-index-able characteristic of a query might result in a requirement for a table space scan when the query is processed, and that could <u>really</u> slow down query execution, especially when the table in question is really large. On the other hand, a stage 2 predicate in a query might not be a big deal if the query has another predicate or predicates that are highly filtering (i.e., that are evaluated as "true" for only a small number of a table's rows) and index-able.</span></li></ul><p></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div style="text-align: left;"><span style="font-family: arial;">If you're interested, you can read about stage 1 and stage 2 and index-able and non-index-able predicates on <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=efficiently-summary-predicate-processing"><span style="color: #2b00fe;">this page</span></a> of the Db2 for z/OS online documentation. Do you need to sweat a lot about stage 1 versus stage 2 predicates when writing SQL statements for an application? I'd say, not necessarily. Remember that job one is to write a query that returns the data that your program needs. On top of that, Db2 for z/OS, especially over the most recent several versions, has gained more and more query re-write capabilities (as I mentioned previously). 
Suppose, for example, that you need to get from a Db2 for z/OS table all customers whose accounts were opened in 2010, and you write a predicate like this one to get those rows:</span> </div><p></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div style="text-align: left;"><span style="font-family: arial;">WHERE YEAR(DATE_OPENED) = 2010</span></div><p></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">That predicate is stage 2 and non-index-able; however, Db2 for z/OS, in preparing your query for execution, can automatically re-write that predicate in this form, which is stage 1 and index-able:</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">WHERE DATE_OPENED BETWEEN '2010-01-01' AND '2010-12-31'</span></p></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i style="font-weight: bold;">Know something about what's possible for improving a query's performance.</i> If a query processed by Db2 for z/OS is not performing as it needs to, re-writing the query in some way is one possible solution, but there may be other performance-boosting actions that could be taken instead. One such action could be the creation of a Db2 for z/OS index on an expression - something do-able since Db2 Version 9 (as of the date of this blog post, the current Db2 for z/OS version is 13). Suppose, for example, that your program needs rows selected from a Db2 for z/OS table based on an upper-case comparison of values in column COL1 with a provided character string constant. Your query might have a predicate that looks like this:</span></li></ul><p></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div style="text-align: left;"><span style="font-family: arial;">WHERE UPPER(COL1, 'EN_US') = 'ABCDE'</span></div><p></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">That predicate is stage 2 and non-index-able; however, it will become index-able if an index is defined on that expression, as shown below (assume that COL1 is a column of table T1):</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><span style="font-family: arial;">CREATE INDEX UPPER_VAL ON T1<br /></span><span style="font-family: arial;">(UPPER(COL1, 'EN_US'))</span></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Another possible query tuning action is to provide the Db2 optimizer with the catalog statistics that it needs to choose a better-performing access plan for the query. In fact, colleagues of mine who are part of the team in IBM Support that works on cases (i.e., problem tickets) related to Db2 for z/OS query performance have said that <i style="font-weight: bold;">the large majority</i> of query performance issues on which they work are ultimately resolved in this manner. 
Here's the deal: the primary input to Db2 for z/OS access path selection is statistics about objects related to the query - e.g., tables, and indexes on those tables, and columns in tables - that are periodically collected (often by a Db2 for z/OS utility called RUNSTATS) and stored in the Db2 catalog. The richer and more complete the catalog statistics are, the better the Db2 optimizer can do in generating a well-performing access plan for a query. The tricky part is this: what statistics should be gathered for tables, columns, indexes, etc. to enable the optimizer to choose a well-performing access path for a particular query? Would histogram statistics on a given column help? How about frequent-value percentage information for a column? How about correlation statistics for a certain pair of table columns? Telling RUNSTATS to generate every possible statistic on everything would make that utility too costly to execute, so the utility is often executed with a specification that generates what you might call a good "base" of statistics (TABLE(ALL) INDEX(ALL) is typically the specification used for this purpose). How do you know when the optimizer needs additional statistics - and which additional statistics - in order to generate a well-performing access plan for a query that is currently performing poorly? Fortunately, starting with Db2 12 for z/OS the optimizer answers this question for you in the form of statistics profiles, as described in <a href="http://robertsdb2blog.blogspot.com/2019/04/db2-12-for-zos-statistics-profiles-just.html"><span style="color: #2b00fe;">an entry I posted to this blog a few years ago</span></a>. I'd say, if a query you wrote is not performing as it needs to then before trying to re-write the query or asking a DBA to add or alter an index to address the situation, see if Db2 has inserted a statistics profile in the SYSTABLES_PROFILES catalog table for one or more of the tables accessed by your query (as described in the aforementioned blog entry). If there is such a statistics profile or profiles, work with a DBA to get RUNSTATS executed using the profile(s) and then let Db2 re-optimize the query using the statistics added to the catalog by that RUNSTATS job (for a so-called static query, re-optimization would be accomplished via a REBIND of the associated Db2 package; for a query that is dynamic in the Db2 sense of that word, re-optimization is triggered by invalidating the previous prepared form of the query in Db2's dynamic statement cache). In plenty of cases this will resolve a query performance issue.</span></p></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i style="font-weight: bold;">Know something about EXPLAIN.</i> EXPLAIN is a Db2 SQL statement (also an option of the BIND and REBIND PACKAGE commands, for static SQL statements) through which you can get information about the access path selected by the optimizer for a query (you can read about the EXPLAIN statement in the <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=statements-explain"><span style="color: #2b00fe;">online Db2 for z/OS documentation</span></a>). EXPLAIN-generated access path information, in its traditional form, is written to the EXPLAIN tables (these are Db2 tables, as you might expect). The most important of these tables is the one called <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=tables-plan-table"><span style="color: #2b00fe;">PLAN_TABLE</span></a>. 
Information in this table shows, among other things, the order in which the parts of a query are executed (for example, the order of table access when a statement involves a multi-table join), whether data in a table is accessed via a table space scan or through an index (and, if the latter, which index), the number of columns in an index key that are a match for columns referenced in a query predicate (MATCHCOLS - a higher number is generally a good thing), and the type of join method used when tables are joined (e.g., nested loop or merge scan). If you know something about the information in PLAN_TABLE, you'll be better equipped to partner with a Db2 for z/OS DBA to see how execution of a query that is not performing as desired could potentially be sped up.</span></li></ul><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: arial;">A lot of veteran Db2 for z/OS DBAs are very familiar with EXPLAIN information in its PLAN_TABLE form. If you're interested in viewing EXPLAIN information in a different form, consider using the <a href="https://ibm.github.io/db2forzosdeveloperextension-about/docs/tuning-sql-queries/viewing-a-graphical-explanation-of-access-plans"><span style="color: #2b00fe;">Visual Explain</span></a> feature of the <a href="https://marketplace.visualstudio.com/items?itemName=IBM.db2forzosdeveloperextension"><span style="color: #2b00fe;">IBM </span></a></span><span style="font-family: arial;"><a href="https://marketplace.visualstudio.com/items?itemName=IBM.db2forzosdeveloperextension"><span style="color: #2b00fe;">Db2 for z/OS Developer Extension for Visual Studio Code</span></a> (a no-charge IBM software tool designed to facilitate development of applications that access Db2 for z/OS data). Visual Explain (as the feature's name implies) provides a visual representation of the access path selected by the Db2 optimizer for a query; and, it's not just pretty pictures - hovering over or clicking on a part of the displayed access path provides very useful related information. Among other things, you can see the number of result set rows that Db2 thinks there will be following execution of that part of the access plan. For a query that is not performing as desired, you might see such information and think, "That's not right. After accessing that table the result set should have way more (or way fewer) rows than indicated by this estimate." The implication here is that you know something about the data that Db2 doesn't know (thus the Db2 optimizer's off-the-mark estimate concerning refinement of the result set as the query's access plan is processed). That, in turn, could suggest that catalog statistics might need to be augmented to provide Db2 with a clearer view of the characteristics of data in a target table (as mentioned in the reference, above, to Db2's SYSTABLES_PROFILES catalog table); or, that might prompt you to consider adding or modifying a query predicate to provide Db2 with a different form of the data request - one that might generate the required result set more quickly.</span></div></blockquote><p><span style="font-family: arial;">The bottom line is this: while your primary focus in writing Db2 for z/OS SQL statements should be on retrieving the data your program requires (and I'm focusing on queries because INSERT, UPDATE and DELETE operations are usually more straightforward from a performance perspective), there could be a situation in which a query you coded needs to execute with greater speed and efficiency. 
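</span></p><p><span style="font-family: arial;">When that happens, a couple of the diagnostic steps described above can be taken pretty quickly. What follows is a sketch - not a recipe - of what those steps might look like: the database, table space, table and column names are made-up values, and the SYSTABLES_PROFILES column names should be verified against the Db2 documentation for your version:</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-- See whether the optimizer has recommended a statistics profile<br />-- for a table accessed by the query (names hypothetical):<br />SELECT SCHEMA, TBNAME, PROFILE_TEXT<br />FROM SYSIBM.SYSTABLES_PROFILES<br />WHERE SCHEMA = 'PRODSCHM' AND TBNAME = 'ACCOUNTS';<br /><br />-- If a profile is found, have RUNSTATS executed using it<br />-- (a utility control statement, not SQL):<br />RUNSTATS TABLESPACE PRODDB.ACCTTS TABLE(PRODSCHM.ACCOUNTS) USE PROFILE<br /><br />-- Capture the query's access path information via EXPLAIN,<br />-- and then look at the key PLAN_TABLE columns:<br />EXPLAIN PLAN SET QUERYNO = 100 FOR<br />SELECT ACCOUNT_NUM, DATE_OPENED<br />FROM PRODSCHM.ACCOUNTS<br />WHERE YEAR(DATE_OPENED) = 2010;<br /><br />SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,<br />ACCESSTYPE, MATCHCOLS, ACCESSNAME<br />FROM PLAN_TABLE<br />WHERE QUERYNO = 100<br />ORDER BY QBLOCKNO, PLANNO;</span></p></blockquote><p><span style="font-family: arial;">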
Query performance tuning is something to which many Db2 for z/OS DBAs are accustomed, but success in that endeavor can be accelerated and made more likely when developers and DBAs work on this as a team. Don't worry about knowing as much about Db2 for z/OS as your DBA teammate - that's not your job; but, realize that your understanding of</span><span style="font-family: arial;"> your application's data requirements, and of the data the application is accessing in Db2, can be a big help when it comes to tuning a query's performance. It's definitely a case in which 1 (your specialized knowledge as a developer) plus 1 (the DBA's specialized knowledge of Db2 for z/OS) is greater than 2.</span></p><p><span style="font-family: arial;">In part 2 of this blog entry I'll have some things to say about application enablement in a Db2 for z/OS context - that is, about ways that you can leverage Db2 functionality to get more feature-rich applications developed more quickly.</span></p><p></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-53906041516334802472023-05-25T14:15:00.000-07:002023-05-25T14:15:14.938-07:00OUTBUFF: A Db2 for z/OS ZPARM You Really Ought to Check<p><span style="font-family: arial;">Over the past year or so, I've seen more and more situations in which a too-small Db2 for z/OS log output buffer is negatively impacting system and application performance. The Db2 development team took aggressive action to remedy that situation via a change to the default value of OUTBUFF (the relevant ZPARM parameter) with Db2 13, but if you are not yet on Db2 13 you should make this change yourself in your Db2 12 environment (and, if you are on Db2 13, you should definitely be using the new OUTBUFF default, or an even higher value). In this blog entry I'll provide information that I hope will make all of this clear and meaningful for you.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The Db2 for z/OS log output buffer</b></span></p><p><span style="font-family: arial;">Db2, of course, logs changes made to database objects (aside from the situation in which a table is defined with the NOT LOGGED attribute - unusual, in my experience). This is data integrity 101 - data changes have to be logged so that they can be rolled back if a unit of work fails before completing, and so that database objects can be recovered when that is required, and so that a Db2 subsystem can be restarted and restored to a consistent state after an abnormal termination, etc., etc.</span></p><p><span style="font-family: arial;">Db2 data changes are physically written to the active log data sets, which are made reusable (i.e., made available for further use after being filled) via the log archive process. Prior to being written to the current pair of active log data sets (you ALWAYS want to do dual-logging, at least in a production Db2 environment), data changes are written to the log output buffer in memory. Information in the log output buffer is externalized (i.e., written to the disk subsystem) when the log output buffer is full, and also when a data-changing unit of work commits.<br /><br />The size of the log output buffer is specified by way of the OUTBUFF parameter in the Db2 ZPARM module (think of the ZPARM module as the configuration parameter settings for a Db2 subsystem). 
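</span></p><p><span style="font-family: arial;">If you want to see where that value lives, OUTBUFF is coded - in KB - on the DSN6LOGP macro in the ZPARM assembly job, DSNTIJUZ. A tiny, illustrative excerpt (the value shown is just an example, not a recommendation):</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: courier;">DSN6LOGP OUTBUFF=51200, ...</span></p></blockquote><p><span style="font-family: arial;">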
For Db2 12, the default value for OUTBUFF is 4000 KB (that became the default value for OUTBUFF starting with Db2 10 for z/OS). With Db2 13, the OUTBUFF default value was changed to 102400 KB. Yeah, that's a 25X increase (when I communicated that in writing to the Db2 for z/OS team at a certain site recently, the Db2 systems programmer on the team asked me, "Is that a typo?"). Why this major increase in the OUTBUFF default value? Two reasons:</span></p><p></p><ol style="text-align: left;"><li><span style="font-family: arial;">It's eminently do-able for the vast majority of production Db2 subsystems that I have seen. Yes, in a relative sense a 25X increase in a ZPARM parameter's default value may seem to be a really aggressive move, but in absolute terms the increase - about 98 MB - is a drop in the bucket for a z/OS LPAR with a large real storage resource. Many production z/OS LPARs these days are generously configured with memory, because mainframe memory keeps getting cheaper on a per-gigabyte basis, and because leveraging that memory can be very good for system performance. It's increasingly common for production z/OS LPARs to have multiple hundreds of GB - or more - of central storage.</span></li><li><span style="font-family: arial;">It can be very helpful for Db2 system and application performance, as noted below.</span></li></ol><span style="font-family: arial;">From a performance perspective, a larger Db2 log output buffer has two main benefits. They are...</span><p></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Larger OUTBUFF benefit 1: reduced application log write wait time</b></span></p><p><span style="font-family: arial;">Db2 accounting trace data - specifically, data provided by accounting trace class 3 (which, along with accounting trace classes 1 and 2, is almost always active at all times for a production Db2 subsystem) - records (among other things) the time that Db2-accessing applications wait for log write operations to complete. Average wait-for-log-write-I/O time (available via <a href="http://robertsdb2blog.blogspot.com/2011/03/monitoring-db2-for-zos-whats-in-your.html"><span style="color: #2b00fe;">a Db2 monitor-generated accounting long report</span></a>) is generally quite small - often less than 1% of total in-Db2 wait time (i.e., class 3 time) for a Db2 application workload. In some cases, however, this can be a significantly larger percentage of in-Db2 wait time. Now, there are multiple factors that can contribute to elevated wait-for-log-write-I/O time, but one of these factors can be a too-small log output buffer. If you see higher levels of wait-for-log-write-I/O time for your Db2 application workload, check the value of the field labeled UNAVAILABLE OUTPUT LOG BUFF (or something similar to that - different Db2 monitor products sometimes label the same field in slightly different ways) in a Db2 monitor-generated statistics long report (the field will be in a section of the report with the heading LOG ACTIVITY, or something similar to that). In my experience the value of this field is usually 0, but if the field has a non-zero value then it could be a good idea to set OUTBUFF to a larger value for the Db2 subsystem in question. 
Even if the value of UNAVAILABLE OUTPUT LOG BUFF is 0, if your Db2 subsystem has a log output buffer that's on the small side then making it larger via an increase in the OUTBUFF value could help to make Db2 log write operations more efficient, thereby potentially contributing to a decrease in wait-for-log-write-I/O time for your Db2-accessing applications.</span></p><p><span style="font-family: arial;"><br /></span></p><p><b style="font-family: arial;">Larger OUTBUFF benefit 2: better log read performance</b></p><p><span style="font-family: arial;">The importance of good Db2 log write performance should be obvious: Db2 is writing to its log all the time, so getting that work done quickly and efficiently is good for any Db2 data-changing application. Can log <u>read</u> performance be important for a Db2-accessing process? YES - and that's especially true for a Db2 data-change-replication process.</span></p><p><span style="font-family: arial;">It's not unusual for Db2 for z/OS-managed data to be replicated to some other location for some purpose. The data replication tools, from IBM and other vendors, that capture Db2 for z/OS data changes and send them in near-real time to another location generally do their data change capture work by issuing requests to the log manager component of Db2 to retrieve data change information (this is done using a Db2 trace record, IFCID 306, that can be requested synchronously by a process such as a data replication tool). Especially when the volume of changes made to data in a replicated Db2 table is high, you REALLY want the Db2 log manager to be able to retrieve the requested data change information from the log output buffer in memory, versus having to read the information from the log data sets on disk. If the log output buffer is too small, you can see a high percentage of log read requests that require access to the log data sets on disk. The volume of such log data set read I/Os can be very high - like, thousands per second. That chews up CPU cycles and adds to data replication latency (this latency refers to the time between a change being made to data on the source Db2 for z/OS system and that change being reflected in the corresponding data at the replication target location) - both things you'd rather avoid.</span></p><p><span style="font-family: arial;">How can you check on this? Again, go to a statistics long report generated by your Db2 monitor, and again go to the section under the heading, LOG ACTIVITY (or something similar to that). Check the fields labeled READS SATISFIED-OUTP.BUF(%) and READS SATISFIED-ACTV.LOG(%). What you want to see (what I'd certainly like to see) is a value for </span><span style="font-family: arial;">READS SATISFIED-OUTP.BUF(%) that is well north of 90, and - conversely - a value for </span><span style="font-family: arial;">READS SATISFIED-ACTV.LOG(%) that is in the single digits (ideally, low single digits). 
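</span></p><p><span style="font-family: arial;">To make that concrete, here is roughly what the relevant lines of the LOG ACTIVITY section might look like for a healthy system (as noted, the field labels can vary a bit from one monitor product to another, and the figures below are made up purely for illustration):</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: courier;">LOG ACTIVITY                    QUANTITY<br />READS SATISFIED-OUTP.BUF        34205.00<br />READS SATISFIED-OUTP.BUF(%)        98.32<br />READS SATISFIED-ACTV.LOG          584.00<br />READS SATISFIED-ACTV.LOG(%)         1.68<br />UNAVAILABLE OUTPUT LOG BUFF         0.00</span></p></blockquote><p><span style="font-family: arial;">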
If you see a lower than desired value for the percentage of log reads satisfied from the log output buffer, make OUTBUFF larger if you can.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Can you make OUTBUFF larger, and if so, how high should you go?</b></span></p><p><span style="font-family: arial;">The answer to the first part of that question (assuming that the value of OUTBUFF is not already at the maximum of 400000 KB) depends on the pressure (or lack thereof) on the real storage resource of the z/OS LPAR in which the Db2 subsystem of interest is running. My favorite indicator of real storage constraint is the LPAR's demand paging rate, available from <a href="http://robertsdb2blog.blogspot.com/2023/02/two-rmf-zos-monitor-reports-with-which.html"><span style="color: #2b00fe;">a z/OS monitor-generated summary report</span></a>. If the LPAR's demand paging rate is less than 1 per second, the real storage resource is not at all constrained, and you have (as far as I'm concerned) a green light for increasing the OUTBUFF value. If the LPAR's demand paging rate is over 1 per second, you might want to see if more memory can be configured for the system, or if some memory can be freed up (perhaps by shrinking a Db2 buffer pool that is larger than it needs to be), prior to making the value of OUTBUFF significantly larger than its existing value.</span></p><p><span style="font-family: arial;">If the z/OS LPAR's real storage is not constrained (as described above), and you want to make a Db2 subsystem's OUTBUFF value larger, how high should you go? First of all, I would highly recommend setting OUTBUFF at least to the new-with-Db2-13 default value of 102400 KB. Should you go higher than that? Well, I would if the value of </span><span style="font-family: arial;">READS SATISFIED-OUTP.BUF(%) is less than 90. One thing to keep in mind here: OUTBUFF is not an online-updatable ZPARM. That means you have to recycle a Db2 subsystem (i.e., stop and restart it) in order to put a new OUTBUFF value into effect. In a Db2 data sharing system, that may not be a big deal (application work can continue to process on other members of the data sharing group as the one member is recycled), and even for some standalone Db2 subsystems there are regular opportunities to "bounce" the subsystem. On the other hand, at some sites where Db2 runs in standalone mode there are only a few times per year when a production Db2 subsystem can be recycled. If that's your situation, you might want to consider going to the maximum OUTBUFF value of 400000 KB (again, if the LPAR's memory is not constrained - and it's not if the LPAR's demand paging rate is less than 1 per second).</span></p><p><span style="font-family: arial;">And that's what I have to say about OUTBUFF. Check yours, and check the relevant information in Db2 monitor-generated accounting and statistics reports to see if an OUTBUFF increase would be good for your system.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-55372045529149834912023-04-28T15:23:00.000-07:002023-04-28T15:23:42.402-07:00Migrating to Db2 13 for z/OS When You Have Old (pre-11.1) Db2 Client Code on Your App Servers<p><span style="font-family: arial;">Not long ago, I had a talk with a Db2 for z/OS systems programmer who works at a pretty big site. 
In a somewhat dramatized form, our conversation went something like this:</span></p><p><span style="font-family: arial;"><b>Me:</b> "When are you guys going to migrate your production Db2 for z/OS systems to Db2 13?"</span></p><p><span style="font-family: arial;"><b>Db2 sysprog:</b> "Later than I'd like."</span></p><p><span style="font-family: arial;"><b>Me:</b> "Why's that?"</span></p><p><span style="font-family: arial;"><b>Db2 sysprog:</b> "We have some old Db2 client code on some of our application servers."</span></p><p><span style="font-family: arial;"><b>Me:</b> "So?"</span></p><p><span style="font-family: arial;"><b>Db2 sysprog:</b> "So, I can't take APPLCOMPAT for our NULLID packages above V12R1M500."</span></p><p><span style="font-family: arial;"><b>Me:</b> "No prob. Just leave the APPLCOMPAT value for the NULLID packages at V12R1M500, and go ahead and activate function level V12R1M510, and then migrate the systems to Db2 13."</span></p><p><span style="font-family: arial;"><b>Db2 sysprog:</b> "I can do that?"</span></p><p><span style="font-family: arial;"><b>Me:</b> "YES."</span></p><p><span style="font-family: arial;">The very next week, I had a very similar exchange with another Db2 for z/OS administrator at a different site. It seems clear to me that there's some misunderstanding in this area out there, with people thinking that way-old Db2 client code represents a roadblock on the way from Db2 12 for z/OS to Db2 13. NOT TRUE, as I hope to make clear in this blog entry.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Terminology: "Db2 client code"</b></span></p><p><span style="font-family: arial;">This term refers to the piece of IBM code that runs on a remote (from the Db2 for z/OS perspective) server that enables an application on that server to be a DRDA requester (DRDA is short for Distributed Relational Database Architecture - the protocol used for Db2 distributed database processing). A DRDA requester application is one that sends SQL statements to Db2 by way of a driver such as IBM's JDBC or ODBC driver. Most often, the Db2 client code is the IBM Data Server Driver Package (for which entitlement is related to an organization's license for IBM Db2 Connect). Sometimes, it's something like the IBM Db2 Connect Runtime Client. In any case, the Db2 client code is considered to be part of the Db2 for Linux/UNIX/Windows (LUW) product family, and it will have a version that corresponds to a Db2 for LUW version.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>What is "old Db2 client code" in this context?</b></span></p><p><span style="font-family: arial;">Short answer: any version prior to 11.1. Now, to explain that short answer: some would say (understandably) that "old code" means out-of-support code. The 11.1 version of Db2 client code is out of support (and has been since April of 2022 - see <a href="https://www.ibm.com/support/pages/db2-distributed-end-support-eos-dates"><span style="color: #2b00fe;">https://www.ibm.com/support/pages/db2-distributed-end-support-eos-dates</span></a>). Why, then, do I refer to pre-11.1 Db2 client code as being "old," implying that 11.1 Db2 client code, though out of support, is not "old"? 
It all has to do with context, and the context in this case is a Db2 12 for z/OS system that is the DRDA server for DRDA requester applications.</span></p><p><span style="font-family: arial;">APPLCOMPAT is a Db2 for z/OS package bind parameter that specifies the Db2 application compatibility level that will be in effect when the package is executed (for more information about APPLCOMPAT, see the <a href="https://robertsdb2blog.blogspot.com/2019/06/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 1</span></a> and <a href="http://robertsdb2blog.blogspot.com/2019/07/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 2</span></a> blog entries I posted on that topic a few years ago). With the possibility of a few exceptions, every Db2 for z/OS package will have an APPLCOMPAT value, and that is true for the packages in the package collection called NULLID. NULLID is the "home" collection for the Db2 for z/OS packages that are executed when a DRDA requester application accesses the Db2 for z/OS system.</span></p><p><span style="font-family: arial;">Here's the crux of the matter at hand: if the APPLCOMPAT value for the NULLID packages is taken above V12R1M500, DRDA requester applications will get an error when trying to connect to the Db2 for z/OS system if they are using pre-11.1 Db2 client code.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Why the preceding sentence does not amount to a Db2 13 migration roadblock</b></span></p><p><span style="font-family: arial;">Before you can <a href="http://robertsdb2blog.blogspot.com/2022/10/getting-ready-to-migrate-to-db2-13-for.html"><span style="color: #2b00fe;">migrate to Db2 13 for z/OS</span></a> from Db2 12, you have to activate Db2 12 function level V12R1M510 (the last of the Db2 12 function levels). What the Db2 for z/OS systems programmer I referenced at the beginning of this blog entry thought, and what apparently a number of other Db2 for z/OS people think, is that the APPLCOMPAT value for the NULLID packages (and maybe for other Db2 for z/OS packages, as well) has to be V12R1M510 before you can migrate a Db2 12 system to Db2 13. THAT IS NOT TRUE. Can you have, in a Db2 13 system, packages in the NULLID collection (and in other collections) that have an APPLCOMPAT value of V12R1M500? YES, YOU CAN. In fact, APPLCOMPAT values as low as V10R1 are supported in a Db2 13 environment. So, if your NULLID packages are at APPLCOMPAT(V12R1M500), and old (as defined above) Db2 client code is keeping you from upping that APPLCOMPAT value for your NULLID packages, <i>leave the NULLID packages at APPLCOMPAT(V12R1M500), and activate function level V12R1M510 (when your Db2 code and catalog are at the right level), and then migrate the Db2 12 system to Db2 13.</i> There is NOTHING about having NULLID packages at APPLCOMPAT(V12R1M500) that gets in the way of your doing this.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>But what if you really want to take APPLCOMPAT for your NULLID packages to a higher level?</b></span></p><p><span style="font-family: arial;">First, why might you want to do this? 
Best answer, I'd say: because you want developers of DRDA applications in your environment to be able to use the latest Db2 for z/OS SQL syntax and functionality (one particularly noteworthy example: the new built-in AI functions of Db2 13 for z/OS, part of that version's SQL Data Insights feature, which can be used via packages with an APPLCOMPAT value of V13R1M500 or higher). If there's pre-11.1 Db2 client code on some of your application servers, and you really want to take APPLCOMPAT higher than V12R1M500 for your NULLID packages (I would), you have a couple of options:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">One option: update your Db2 client code. This would be for many people the ideal approach. Get the Db2 client code to the current level, which is 11.5, and you get two benefits: 1) you're actually using Db2 client code that is supported by IBM (always nice), and 2) you can take APPLCOMPAT for your Db2 for z/OS NULLID packages as high as you want. Of course, updating the Db2 client code will likely require working in concert with application server administrators in your organization that can perform the code update.</span></li><li><span style="font-family: arial;">Another option: leave the old Db2 client code out there, and raise the APPLCOMPAT value for your NULLID packages anyway. This might be the required approach, at least in the near term, if your application server administrators are not presently able to help update old versions of Db2 client code within your IT infrastructure. How can you do this without causing connection errors for DRDA requester applications that are using old Db2 client code? You do that with the Db2 profile tables, together with an "alternate" collection for the IBM Data Server Driver / Db2 Connect packages, as explained below.</span></li></ul><br /><p></p><p><span style="font-family: arial;"><b>Creating (and, more importantly, using) an alternate collection for the IBM Data Server Driver / Db2 Connect packages</b></span></p><p><span style="font-family: arial;">Step 1 for this approach is to create the alternate collection for the packages whose "home" collection is NULLID. This is pretty easy to do: just BIND COPY the packages in the NULLID collection into a collection with some other name (I'll go with OLD_COLL for this example), and in doing that specify APPLCOMPAT(V12R1M500). DRDA requester applications using pre-11.1 Db2 client code will not get connection errors when they use the packages in that OLD_COLL collection. Ah, but how do you get those applications to use the OLD_COLL collection when they will, by default, be looking to use packages in the NULLID collection (NULLID is the default Db2 for z/OS package collection for DRDA requester applications)? This is where the Db2 profile tables come in.</span></p><p><span style="font-family: arial;">You can use SYSIBM.DSN_PROFILE_TABLE to identify a component of your DDF workload for which you want Db2 to take some action. The component of the DDF workload of interest here is the DRDA requester applications that are using pre-11.1 Db2 client code. How can you identify that DDF workload component as a profile? Easy: use the PRDID (short for product identifier) column of DSN_PROFILE_TABLE (see <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=tables-dsn-profile-table"><span style="color: #2b00fe;">https://www.ibm.com/docs/en/db2-for-zos/12?topic=tables-dsn-profile-table</span></a>). How do you know which product ID(s) to use? 
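</span></p><p><span style="font-family: arial;">Before answering that question, here is a rough sketch of what the two steps just described - the BIND COPY, and the profile table rows - might look like. Treat this strictly as an illustration: the package name, profile ID and product ID shown are made-up values, and the profile table column names should be verified against the DDL in your environment:</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">BIND PACKAGE(OLD_COLL) COPY(NULLID.SYSLH200) -<br />APPLCOMPAT(V12R1M500) ACTION(REPLACE)<br /><br />INSERT INTO SYSIBM.DSN_PROFILE_TABLE<br />(PROFILEID, PRDID, PROFILE_ENABLED)<br />VALUES (101, 'SQL10051', 'Y');<br /><br />INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES<br />(PROFILEID, KEYWORDS, ATTRIBUTE1)<br />VALUES (101, 'SPECIAL_REGISTER',<br />'SET CURRENT PACKAGE PATH = OLD_COLL');</span></p></blockquote><p><span style="font-family: arial;">A -START PROFILE command puts new or changed profile table rows into effect. Now, back to the product ID question: 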
You can get that information via output of the Db2 command -DISPLAY LOCATION (see <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=work-displaying-information-about-connections-other-locations"><span style="color: #2b00fe;">https://www.ibm.com/docs/en/db2-for-zos/12?topic=work-displaying-information-about-connections-other-locations</span></a>). In the PRDID column of the command output, you'll see the product IDs associated with requesters, and there you'll see the version and release of the Db2 client code that a requester is using (see <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=work-product-identifier-prdid-values-in-db2-zos"><span style="color: #2b00fe;">https://www.ibm.com/docs/en/db2-for-zos/12?topic=work-product-identifier-prdid-values-in-db2-zos</span></a>). Using the PRDID information provided via -DISPLAY LOCATION, insert a row (or rows) in DSN_PROFILE_TABLE for the pre-11.1 Db2 client code that is used in your environment. Having done that, for that row (or rows) in DSN_PROFILE_TABLE, insert a corresponding row (or rows) in SYSIBM.DSN_PROFILE_ATTRIBUTES to tell Db2 what you want it to do when one of the DRDA requesters using pre-11.1 Db2 client code requests a connection to the Db2 for z/OS system. And what do you want Db2 to do? You want Db2 to issue SET CURRENT PACKAGE PATH = OLD_COLL (using my example name for the collection into which you BIND COPY-ed the NULLID packages with a specification of APPLCOMPAT(V12R1M500)). This will make OLD_COLL the default collection for the DRDA requester applications using pre-11.1 Db2 client code. Having done this, you can take APPLCOMPAT for the NULLID packages higher than V12R1M500, to the benefit of DRDA requester applications that are using 11.1-or-higher versions of the Db2 client code (note that the SET CURRENT PACKAGE PATH = OLD_COLL will happen at application connection time, so after doing the BIND COPY and profile table work you may need to have someone recycle the application servers on which old Db2 client code is running, so they'll get new connections to the Db2 for z/OS system and will be pointed to the OLD_COLL package collection). There is additional information on this approach in <a href="http://robertsdb2blog.blogspot.com/2018/07/db2-for-zos-using-profile-tables-to.html"><span style="color: #2b00fe;">an entry I posted to this blog a few years ago</span></a> (that entry concerns an alternate collection of IBM Data Server Driver / Db2 Connect packages used to get high-performance DBAT functionality, but the collection redirection technique is the same).</span></p><p><span style="font-family: arial;">And there you have it. I hope you don't have old Db2 client code on your application servers, but if you do, don't worry about that being an impediment to getting to Db2 13, because it isn't.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-53232948696055327752023-03-30T14:30:00.000-07:002023-03-30T14:30:06.141-07:00Db2 for z/OS: If Index FTB Functionality is Disabled at Your Site, It's Time to Reconsider That<p><span style="font-family: arial;">Over the course of the past three weeks, I reviewed ZPARM settings (i.e., configuration parameter values) for three different production Db2 12 for z/OS environments at three different sites, and I noticed that index FTB (fast traverse block) functionality had been disabled in all three cases. 
I recommended to all three associated Db2-using organizations that they change the relevant ZPARM setting to re-enable FTB functionality, after first validating that the fixes for a set of related Db2 APARs have been applied to their Db2 12 code (the changes made by the fixes are part of the base Db2 13 code). My recommendation for you, if the FTB feature of Db2 has been "turned off" at your site, is to do the same: turn it on, after doing the aforementioned check of Db2 software maintenance if you're using Db2 12. In this blog entry, I'll explain what FTB functionality is, why it was disabled at some sites, and why it's time to go from "disabled" to "enabled" where feature deactivation has happened. I'll also provide information about the fixes (PTFs) that should be on your system to ensure the robust functioning of FTB-supporting Db2 code (again, if we're talking about Db2 12 - the base Db2 13 code has the solidified FTB functionality provided by the Db2 12 fixes).</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The FTB raison d'etre: efficient use of non-leaf information in Db2 indexes</b></span></p><p><span style="font-family: arial;">Db2 for z/OS indexes serve various purposes, such as assisting with maintenance of a desired ordering of rows in a table and ensuring uniqueness of key values for which duplicates would be problematic, but for the most part indexes in a Db2 system are there to speed the execution of queries (and of non-query SQL statements that contain predicates, aka search clauses). Indexes deliver this performance boost by enabling identification of query result set rows without a laborious row-by-row examination of values. It can be said that Db2 indexes provide shortcuts that get you to your destination (a query's result set) faster than would otherwise be possible.</span></p><p><span style="font-family: arial;">The information in a Db2 index is arranged in what is known as a B-tree structure. The logical representation of this structure has something of the appearance of an upside-down tree: you have the root page at the top, and the leaf pages at the bottom. In-between the root page and the leaf pages of an index (unless the underlying Db2 table is quite small), you will have one or more levels of non-leaf pages. Finding a row in a table by way of an index on the table involves what is known as an index probe operation: Db2 starts at the root page and navigates down through the other non-leaf levels to reach the leaf page that contains the searched-for key value and the ID of the row (i.e., the row ID, or RID in Db2 parlance) or rows in which the key value can be found.</span></p><p><span style="font-family: arial;">OK, so what is the value of index fast traverse blocks? Well, an index probe involves GETPAGE activity. A GETPAGE is a Db2 request to examine the contents of a page in an index or a table space (when the page in question is not already in a Db2 buffer pool in memory, the GETPAGE drives a read I/O request). The more rows a table has, the more levels an index on the table can have. More index levels means more GETPAGE activity associated with use of the index, and that matters because GETPAGE activity is one of the main determinants of the CPU cost of executing a query. 
Index fast traverse block functionality, introduced by Db2 12 for z/OS, improves CPU efficiency for query execution by reducing index-related GETPAGE activity.</span></p><p><span style="font-family: arial;">An FTB reduces index GETPAGE activity by providing Db2 with a way to get to the leaf page of an index in which a query-predicate-matching key value is found without having to perform a top-to-bottom index probe. How that works: when Db2 builds an FTB structure in memory that is based on a given index, Db2 puts in that FTB structure the information in the non-leaf pages of the index (note that this is NOT just a matter of caching the index's non-leaf pages in memory - the FTB structure has a space requirement that is considerably smaller than what would be required to cache the index's non-leaf pages in an as-is manner); furthermore, navigation through an FTB structure does not require GETPAGE activity. Yes, FTB navigation does involve some instruction path length, but less than would be needed for the index GETPAGEs that would otherwise be required to get to a target leaf page. Let's say that an index on a large table has five levels. Retrieving a table row via the index will require six GETPAGEs - five for the index and one for the table space. If, on the other hand, Db2 has built an FTB structure from the index, when a query having a predicate that matches on the index's key is executed then Db2 can go to the FTB structure with the key value referenced in the predicate, and the FTB will tell Db2, "This is the leaf page in which you'll find that key value." Db2 then does <u>one</u> GETPAGE to examine that leaf page's contents, finds the key value and the associated RID, and does one more GETPAGE to access the row in the table space. Thanks to the FTB, we've gone from six GETPAGEs (five for the index and one for the table space) to two GETPAGEs (one for the index leaf page, one for the table space). Pretty good.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>How is FTB functionality turned off, and why would anyone do that?</b></span></p><p><span style="font-family: arial;">The FTB "on/off switch" is the ZPARM parameter INDEX_MEMORY_CONTROL. The default value for that parameter is AUTO. When INDEX_MEMORY_CONTROL is set to AUTO, Db2 notes the size of the subsystem's buffer pool configuration (i.e., the aggregate size of the subsystem's buffer pools) and says (figuratively speaking), "OK, I can create FTB structures from indexes, and the maximum amount of in-memory space I'll use for those FTB structures is equivalent to 20% of the size of the buffer pool configuration." Note that this is not space taken <u>away</u> from the buffer pools - it's net additional use of the z/OS LPAR's real storage by Db2. Consider an example: Db2 subsystem DB2P has 50 GB of buffer pools. If INDEX_MEMORY_CONTROL for DB2P is set to AUTO, DB2P can use up to 10 GB (20% times 50 GB) of memory for index FTBs. The size of the DB2P buffer pool configuration is not affected by FTBs - it remains at 50 GB. Got it?</span></p><p><span style="font-family: arial;">Besides AUTO, another acceptable value for INDEX_MEMORY_CONTROL is an integer between 10 and 200,000. That would set the FTB memory usage limit in terms of megabytes. 
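</span></p><p><span style="font-family: arial;">As an aside: whichever approach you take, you can check on actual FTB usage - which indexes currently have FTB structures, and how much memory is being used for them - by issuing the Db2 command shown below (a quick sketch; see the Db2 command reference for the details of the output):</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">-DISPLAY STATS(INDEXMEMORYUSAGE)</span></p></blockquote><p><span style="font-family: arial;">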
Using the previous example, if the z/OS LPAR in which subsystem DB2P is running is generously configured with memory, the organization might decide to set INDEX_MEMORY_CONTROL to 20000 if they want Db2 to be able to use up to about 20 GB of memory for index FTBs, versus the 10 GB limit established via the AUTO setting (20% of the 50 GB size of the buffer pool configuration assumed for the example). If, on the other hand, the z/OS LPAR's memory resource is quite limited, the organization might opt to set INDEX_MEMORY_CONTROL to 1000, to restrict DB2P's use of memory for index FTBs to about 1 GB (I say, "about," because 1 GB is actually 1024 MB).</span></p><p><span style="font-family: arial;">INDEX_MEMORY_CONTROL can also be set to DISABLE. That has the effect of turning FTB functionality off. Why would someone disable a CPU efficiency-boosting Db2 feature? Well, relatively early on in the lifecycle of Db2 12 for z/OS (which became generally available in October of 2016), a few sites encountered some issues related to index FTB functionality. In some cases, use of an FTB was seen to cause a query to return incorrect output. These situations were pretty uncommon (recall that index FTB functionality is on by default, and most Db2 12 sites with INDEX_MEMORY_CONTROL set to AUTO encountered no problems in leveraging the technology), but they were real. Some organizations heard that other organizations had had some problems related to FTB usage, so they disabled the feature as a preemptive measure. I get it.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Why using FTB functionality makes sense now</b></span></p><p><span style="font-family: arial;">In response to the FTB-related issues mentioned above, the IBM Db2 for z/OS development team created a number of code fixes that addressed the problems reported by Db2-using organizations. These fixes and their associated APARs (</span><span style="font-family: arial;">an APAR is an official description of a software problem for which IBM commits to providing corrective service) are noted in a blog entry, written by members of the Db2 development organization, that can be viewed at </span><span style="font-family: arial;"><a href="https://community.ibm.com/community/user/datamanagement/blogs/paul-mcwilliams1/2020/10/08/new-look-ftb-db2-12"><span style="color: #2b00fe;">https://community.ibm.com/community/user/datamanagement/blogs/paul-mcwilliams1/2020/10/08/new-look-ftb-db2-12</span></a>. If INDEX_MEMORY_CONTROL is set to DISABLE at your site, and if you are using Db2 12 for z/OS, check to see if the PTFs listed in this blog entry have been applied to your Db2 code. If they have been applied (or if you are using Db2 13), you can use index FTB functionality with confidence. If you are using Db2 12 and the fixes have not been applied in your environment, my recommendation is to get them applied, perhaps as part of a roll-out of a new and more-current level of z/OS software maintenance at your site.</span></p><p><span style="font-family: arial;">The confidence that the IBM Db2 for z/OS development team has in FTB functionality, with the corrective maintenance applied, is evidenced by a couple of things. First, Db2 12 function level 508 extended FTB functionality to non-unique indexes (it had originally been limited to unique indexes). 
Second, Db2 13 for z/OS makes FTB functionality available for a larger set of indexes by doubling the key-length limit for FTB-qualifying indexes - from 64 bytes to 128 bytes for unique indexes, and from 56 bytes to 120 bytes for non-unique indexes (as previously mentioned, the code corrections made for Db2 12 by the FTB-related fixes listed in the above-referenced blog entry are part of the Db2 13 base code). The Db2 development team would not have made FTB functionality available for a wider range of indexes if they were anything less than highly confident in the quality of the FTB-supporting code.</span></p><p><span style="font-family: arial;">Note that if you have INDEX_MEMORY_CONTROL set to DISABLE, and you're interested in turning FTB functionality on but would like to do so in a more-controlled and more-limited way before going to a setting of AUTO, that option is available to you. As noted in the blog entry for which I provided the link, above, and in the <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=dsodp2-index-memory-control-field-index-memory-control-subsystem-parameter"><span style="color: #2b00fe;">Db2 12</span></a> and <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=dsodp2-index-memory-control-field-index-memory-control-subsystem-parameter"><span style="color: #2b00fe;">Db2 13</span></a> online documentation, you can tell Db2, via a specification of (SELECTED, AUTO) or (SELECTED, n) for INDEX_MEMORY_CONTROL (where n would be a user-designated limit, in MB, on the memory that Db2 can use for FTB structures), that FTB structures can be built only for indexes that you have identified as FTB candidates by way of the SYSINDEXCONTROL table in the Db2 catalog.</span></p><p><span style="font-family: arial;">In summary, if you have the FTB-solidifying fixes applied in your Db2 12 environment, or if you are running with Db2 13, and you have INDEX_MEMORY_CONTROL set to DISABLE, you should rethink that. The current FTB code is very robust, and if you don't leverage the functionality then you're leaving CPU savings on the table. I'd prefer to see you realize those CPU savings.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com3tag:blogger.com,1999:blog-4516533711330247058.post-16197970469199004082023-02-23T15:16:00.000-08:002023-02-23T15:16:07.628-08:00Two RMF (z/OS Monitor) Reports with which Db2 for z/OS People Should be Familiar<p><span style="font-family: arial;">When it comes to analyzing the performance of a Db2 for z/OS system and associated applications, I think of a set of concentric circles. The outermost circle represents the operational environment in which Db2 is processing work - that would be the z/OS LPAR (logical partition) in which the Db2 subsystem runs. The next circle within that outer one represents the Db2 subsystem itself - its buffer pool configuration, EDM pool, RID pool, lock manager, recovery log, etc. The innermost of these concentric circles relates to the applications that access Db2-managed data. With the three circles in mind, I take an "outside-in" approach to Db2 system and application tuning. In other words, I begin with a look at the z/OS system within which Db2 is running, then I turn to the Db2 subsystem itself and lastly I analyze application-centric information. The reason for this approach? If the z/OS system in which Db2 is running is constrained in some way, there's a good chance that Db2 subsystem and application tuning actions will yield little positive impact. 
Similarly, if the Db2 subsystem is operating in a constrained fashion then application tuning actions may not do much good.</span></p><p><span style="font-family: arial;">So, if assessing the operation of a z/OS system is important prior to turning to Db2 subsystem and/or application performance analysis, how do you determine whether the z/OS LPAR in question is running in a constrained or an unconstrained way? I do this based on examination of information in two RMF reports: the Summary report and the CPU Activity report. If you support a Db2 for z/OS system, you should be familiar with the content of these reports - in particular, some key metrics provided by the reports. In this blog entry I'll point out those key metrics and explain how I use them.</span></p><p><span style="font-family: arial;">[Note: I am referring to reports generated by IBM's RMF z/OS monitor because RMF is the z/OS monitor with which I am most familiar. If your organization uses another vendor's z/OS monitor, that monitor might be able to generate reports similar to the RMF reports that are the subject of this blog entry - if need be, check with the vendor on that.]</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The RMF CPU Activity report</b></span></p><p><span style="font-family: arial;">RMF, by default (you can change this), carves the time period covered by a CPU Activity report into 15-minute intervals (so, if you requested a report for a one-hour time period you will see within that report four sub-reports, each providing information for a 15-minute part of the overall one-hour time period). Within a given 15-minute interval you will see, for the z/OS LPAR for which the report was requested, information that looks something like this (I highlighted two important values in <span style="color: red;"><b>red</b></span>):</span></p><span style="font-family: courier;"><br />---CPU--- ---------------- TIME % ----------------<br />NUM TYPE ONLINE LPAR BUSY MVS BUSY PARKED<br /> 0 CP 100.00 87.03 86.85 0.00<br /> 1 CP 100.00 77.76 77.68 0.00<br /> 2 CP 100.00 83.88 83.78 0.00<br /> 3 CP 100.00 87.07 86.91 0.00<br /> 4 CP 100.00 76.23 76.14 0.00<br /> 5 CP 100.00 76.79 76.71 0.00<br /> 6 CP 100.00 80.45 80.35 0.00<br /> 7 CP 100.00 73.29 73.24 0.00<br /> 8 CP 100.00 63.83 69.22 0.00<br /> 9 CP 100.00 57.78 62.95 0.00<br /> A CP 100.00 35.28 48.33 17.01<br />TOTAL/AVERAGE <span>72.67</span> <span style="color: red;"><b>75.16</b></span><br />12 IIP 100.00 66.63 58.68 0.00<br /> 46.30 0.00<br />13 IIP 100.00 26.70 23.42 0.00<br /> 18.24 0.00<br />14 IIP 100.00 9.21 8.07 0.00<br /> 6.42 0.00<br />3E IIP 100.00 0.00 ----- 100.00<br /> ----- 100.00<br />TOTAL/AVERAGE <span>25.64</span> <span style="color: red;"><b>26.86</b></span></span><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: arial;">Here is an explanation of what you see in the report snippet above:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">NUM - This is the ID of a given "engine" (processor core) configured for the LPAR.</span></li><li><span style="font-family: arial;">TYPE - CP is short for central processor (typically referred to as a "general-purpose engine"); IIP is short for integrated information processor (typically referred to as a "zIIP engine").</span></li><li><span style="font-family: arial;">LPAR BUSY - Engine utilization from the LPAR 
perspective.</span></li><li><span style="font-family: arial;">MVS BUSY - I think of this as the busy-ness of the physical engine - if the engine is used exclusively (or almost exclusively) by the LPAR in question, the LPAR busy and MVS busy numbers should be very similar.</span></li><li><span style="font-family: arial;">PARKED - This is the extent to which an engine's capacity is NOT available to the LPAR during the 15-minute interval (so, if the engine is seen to be 75% parked then the LPAR has access to 25% of that engine's processing capacity). When engines in a mainframe "box" (sometimes called a CEC - short for central electronic complex) are shared among several LPARs, it's not unusual to see a non-zero parked value for at least some of an LPAR's engines.</span></li></ul><span style="font-family: arial;">Note that for this LPAR, there are two MVS BUSY values for each zIIP engine. Why is that? Well, it indicates that the zIIP engines are running in SMT2 mode. SMT2 is short for simultaneous multi-threading 2, with the "2" meaning that z/OS can dispatch two pieces of work simultaneously to the one zIIP core. Running a zIIP engine in SMT2 mode does not double the engine's capacity (each of the two pieces of work dispatched to the one zIIP core will not run as fast as would be the case if the zIIP engine were running in "uni-thread" mode), but for a transactional workload SMT2 can enable a zIIP engine to deliver around 25-40% more throughput versus uni-thread mode (think of a one-way, single-lane road with a speed limit of 60 miles per hour versus a one-way, 2-lane road with a speed limit of 45 miles per hour - the latter will get more cars from A to B in a given time period if there's enough traffic to take advantage of the two lanes). For more information on SMT2 mode for zIIPs, see <a href="http://robertsdb2blog.blogspot.com/2018/02/db2-for-zos-ddf-ziip-engines-and-smt2.html"><span style="color: #2b00fe;">the entry on that topic</span></a> that I posted to this blog a few years ago.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">OK, to the values highlighted in <span style="color: red;">red</span> in the report snippet:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">TOTAL/AVERAGE MVS BUSY for the general-purpose engines (75.16 in the report snippet) - As a general rule, application performance - especially for transactional applications (e.g., CICS-Db2, IMS TM-Db2, Db2 DDF) - will be optimal if average MVS busy for an LPAR's general-purpose engines does not exceed 80%. When average MVS busy for the general-purpose engines exceeds 90%, you can see a negative impact on the performance of Db2-accessing applications in the form of what is labeled "not accounted for" time in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information. Not-accounted-for time is in-Db2 elapsed time that is not CPU time and not one of the "known" Db2 wait times (those are so-called class 3 wait times, such as wait for database synchronous read, wait for lock/latch, wait for update/commit processing, etc.). It's literally elapsed time, related to SQL statement execution, for which Db2 cannot account. In my experience, in-Db2 not-accounted-for time is most often a reflection of wait-for-dispatch time, which itself is indicative of CPU contention. 
I'm generally not too concerned about not-accounted-for time as long as it's less than 10% of in-Db2 elapsed time for an application workload - particularly when it's a higher-priority transactional workload (you might tolerate a higher percentage of not-accounted-for time for a lower-priority batch workload). If not-accounted-for time exceeds 10% of in-Db2 elapsed time (again, especially for a higher-priority transactional workload), that would be a matter of concern for me, indicating that CPU contention is negatively impacting application throughput.</span></li><li><span style="font-family: arial;">TOTAL/AVERAGE MVS BUSY for the zIIP engines (26.86 in the report snippet) - How "hot" can you run zIIP engines before zIIP engine contention becomes a concern? That depends on how many zIIP engines the LPAR has (and, to a lesser extent, whether or not the zIIPs are running in SMT2 mode). The more zIIP engines an LPAR has, the higher the average MVS busy figure for those engines can go before zIIP contention becomes an issue (in the example shown above, the LPAR has three zIIP engines that are running in SMT2 mode - in such a situation average MVS busy for the zIIP engines could probably go to 40-50% without zIIP contention becoming an issue). And when does zIIP contention become an issue? When the zIIP spill-over percentage gets too high, as explained in <a href="http://robertsdb2blog.blogspot.com/2014/09/db2-for-zos-avoiding-ziip-engine.html"><span style="color: #2b00fe;">an entry I posted a few years ago</span></a> to this blog. [Note: the report snippet shows four zIIP engines, but the fourth - the one identified as processor number 3E - is 100% parked from the LPAR's perspective. That means the LPAR had no access to zIIP processor 3E's capacity, so in effect the LPAR had three zIIP engines during the time interval.]</span></li></ul><span style="font-family: arial;">Below the information shown in the report snippet above, you'll see a sideways bar chart that looks something like this (again, I've highlighted some key information in <b><span style="color: red;">red</span></b>):</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: courier;"><div style="font-size: small;"><br /></div><div>-----------------------DISTRIBUTION OF IN-READY WORK UNIT QUEUE-</div><div> NUMBER OF 0 10 20 30 40 50 60 70 </div><div> WORK UNITS (%) |....|....|....|....|....|....|....|....</div><div><br /></div><div><b><span style="color: red;"><= N 55.9</span></b> >>>>>>>>>>>>>>>>>>>>>>>>>>>></div><div> = N + 1 3.5 >></div><div> = N + 2 3.1 >></div><div> = N + 3 3.5 >></div><div><= N + 5 5.5 >>></div><div><= N + 10 10.9 >>>>>></div><div><= N + 15 5.7 >>></div><div><= N + 20 4.2 >>></div><div><= N + 30 3.1 >></div><div><= N + 40 1.5 ></div><div><= N + 60 1.3 ></div><div><= N + 80 0.4 ></div><div><= N + 100 0.2 ></div><div><= N + 120 0.1 ></div><div><= N + 150 0.2 ></div><div>> N + 150 0.2 ></div><div><br /></div><div><b><span style="color: red;">N = NUMBER OF PROCESSORS ONLINE UNPARKED (16.8 ON AVG)</span></b></div><div style="font-size: small;"><b><span style="color: red;"><br /></span></b></div></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">With regard to the report snippet above, the first thing to which I want to draw your attention is the bottom-line information about "N". We see that, for this LPAR during this 15-minute interval, N = 16.8. What does that mean? 
It means that the LPAR had "16.8 processor targets to which pieces of work could be dispatched." Why do I use the phrase "processor targets" instead of "processors?" It's because we tend to think of "mainframe processors" as meaning "mainframe engines," and that's not quite the case here. This report snippet goes with the first one we looked at (the second snippet appears right after the first one in the source RMF CPU Activity report), and you might recall that the first snippet showed that the LPAR's three zIIP engines are running in SMT2 mode. For that reason, those <u>three</u> zIIP engines are counted as <u>six</u> processor targets to which pieces of work can be dispatched. Thus, when the report shows that N = 16.8, we can say that 6 of the 16.8 relate to the LPAR's zIIP engines. That leaves 10.8 (16.8 - 6 = 10.8). We've accounted for the zIIP engines, so the 10.8 number relates to general-purpose engines. Where does that 10.8 come from? Refer again to the first report snippet. You'll see that the LPAR had 10 general-purpose processors that were not at all parked (i.e., that were 0% parked from the LPAR's perspective). An 11th general-purpose engine, identified as processor number A, was 17.01% parked during the interval. That means that 83% of the capacity of general-purpose processor number A (that's a hexadecimal A) was available to the LPAR during the time interval. That 83% is equivalent to 0.83, and RMF rounds 0.83 down to 0.8, and that's where the ".8" of N = 16.8 comes from. So, then, the LPAR had 6 zIIP "targets" to which work could be dispatched (3 engines, each running in SMT2 mode), and 10.8 general-purpose targets to which work could be dispatched, and that's why we have N = 16.8.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">With N now understood, we can turn our attention to the other bit of information I highlighted in red: <span style="color: red;"><= N 55.9</span>. What does that mean? It means that for 55.9% of the time in the 15-minute report interval, the number of "in and ready" tasks (i.e., the number of tasks ready for dispatch) was less than or equal to the number of processor targets to which pieces of work in the LPAR could be dispatched. When that is true - when the number of in-and-ready tasks is <= N - there is essentially nothing in the way of CPU constraint, because an in-and-ready task won't have to wait in line to get dispatched to a processor. In my experience, when the <= N figure is above 80%, the LPAR is very unconstrained in terms of processing capacity. A figure between 50% and 80% is indicative of what I'd call moderate CPU constraint, and performance (particularly in terms of throughput) is likely not impacted much by a lack of processing capacity. When the figure is below 50%, I'd say that CPU constraint could be impacting throughput in a noticeable way, and if it's below 10% the performance impact of CPU constraint for the LPAR could be severe.
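</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">By the way, if you'd like to generate these RMF reports yourself, they come out of the RMF post-processor (program ERBRMFPP), which reads SMF data. Below is a minimal sketch of a post-processor step - the SMF data set name is a made-up placeholder, output report DD statements are omitted, and you'd adjust the DATE, RTOD (time-of-day range) and report options to suit your situation (a z/OS systems programmer at your site can point you to the right SMF data sets):</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: courier;">//RMFPP    EXEC PGM=ERBRMFPP<br />//* SMF data containing the RMF records - placeholder name<br />//MFPINPUT DD DISP=SHR,DSN=SYS1.SMFDATA.DAILY<br />//SYSIN    DD *<br />  DATE(11032022,11032022)<br />  RTOD(0900,1000)<br />  SUMMARY(INT)<br />  REPORTS(CPU)<br />/*</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The SUMMARY(INT) statement requests the interval-level Summary report discussed later in this entry, and REPORTS(CPU) requests the CPU Activity report.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">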
As previously mentioned, the Db2 performance impact of a CPU-constrained system is typically apparent in elevated levels of in-Db2 not-accounted-for time, as seen in a Db2 monitor-generated accounting long report or an online monitor display of Db2 thread detail information.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">One more thing about an RMF CPU Activity report: the number of engines configured for an LPAR - something that is shown in the report - should be balanced by an adequate amount of memory (aka real storage) so that the LPAR's processing power can be fully exploited to maximize application performance. For a z/OS LPAR in which a production Db2 workload runs, my rule of thumb, based on years of analyzing system and application performance data, is this: the LPAR should have at least 20 GB of memory per engine with which it is configured. The first report snippet included above shows that the LPAR has 13.8 engines: 10.8 general-purpose engines (as previously mentioned, the ".8" relates to an engine that is about 20% parked from the LPAR's perspective) and 3 zIIP engines (and for balanced-configuration purposes, I count physical zIIP cores - I don't double-count a zIIP engine because it is running in SMT2 mode). I'd round the 13.8 to 14 (the nearest integer) and say that on that basis the LPAR should have at least 14 X 20 GB = 280 GB of memory. If that seems like a lot to you, it shouldn't - mainframe memory sizes are getting bigger all the time, and real storage resources in the hundreds of GB are no longer unusual for production z/OS LPARs, especially those in which Db2 workloads run (the biggest real storage size I've personally seen for a z/OS LPAR is about 1100 GB).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>The RMF Summary report</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">An RMF Summary report is smaller in size than a CPU Activity report - typically, one line of information for each 15-minute interval within the report time period. An RMF Summary report for a one-hour period could look something like what you see below (I removed some columns so that I could use a big-enough-to-read font size - the really important column is the one on the far right, with the numbers highlighted in <span style="color: #40bc13;">green</span>):</span></div><div><br /></div><div><div><span style="font-family: courier;"> </span></div><div><span style="font-family: courier;">NUMBER OF INTERVALS 4 TOTAL LENGTH OF INTERVALS 00.59.58</span></div><div><span style="font-family: courier;">-DATE TIME INT ... JOB JOB TSO TSO STC ... SWAP DEMAND</span></div><div><span style="font-family: courier;"> MM/DD HH.MM.SS MM.SS ... MAX AVE MAX AVE MAX ... RATE PAGING</span></div><div><span style="font-family: courier;"> 11/03 09.15.00 15.00 ... 83 72 96 92 371 ... 0.00 <span style="color: #40bc13;"><b>0.00</b></span></span></div><div><span style="font-family: courier;"> 11/03 09.30.00 14.59 ... 85 68 98 95 369 ... 0.00 <span style="color: #40bc13;"><b>0.00</b></span></span></div><div><span style="font-family: courier;"> 11/03 09.45.00 15.00 ... 75 68 95 92 363 ... 0.00 <span style="color: #40bc13;"><b>0.00</b></span></span></div><div><span style="font-family: courier;"> 11/03 10.00.00 14.59 ... 82 70 94 91 365 ... 
0.00 <span style="color: #40bc13;"><b>0.00</b></span></span></div><div><span style="font-family: courier;">-TOTAL/AVERAGE ... 85 69 98 93 371 ... 0.00 <span style="color: #40bc13;"><b>0.00</b></span></span></div></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">So, what's the LPAR's demand paging rate? It's the rate, per second, at which pages that had been moved by z/OS from real to auxiliary storage (to make room for other pages that needed to be brought into real storage) were brought back into real storage on-demand (i.e., because some process needs to access the page). Why is the demand paging rate important? Here's why: it is, in my opinion, the best indicator of whether or not memory usage can be expanded without putting too much pressure on the LPAR's real storage resource. Here's what I mean by that: suppose you have a Db2 buffer pool that has a total read I/O rate (synchronous reads + sequential prefetch reads + list prefetch reads + dynamic prefetch reads, per second) that's higher than you like - maybe the total read I/O rate for the pool is north of 1000 per second, and you want to bring that down substantially to boost application performance and CPU efficiency (every read I/O eliminated saves CPU and elapsed time). The best way to lower a buffer pool's total read I/O rate is to make the pool larger. Can you do that without putting too much pressure on the LPAR's real storage resource? Here's what I'd say: If the LPAR's demand paging rate is consistently less than 1 per second, you have a green light for using more memory to get a performance boost. If the LPAR's demand paging rate is consistently greater than 1 per second, I'd hold off on using more memory until the LPAR is configured with additional real storage. This goes for any action that would increase memory usage by Db2 - besides enlarging a buffer pool, that could be a RID pool or a sort pool or an EDM pool size increase, or increasing the use of RELEASE(DEALLOCATE) packages with threads that persist through commits, or whatever. Before doing something that will increase memory usage, check the LPAR's demand paging rate.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">That's it for now. If you haven't had a look at these two RMF reports before, get them for an LPAR of interest to you and give them a look-see - a z/OS systems programmer at your site would probably be happy to generate the reports for you. Knowing the key utilization and configuration characteristics of the z/OS LPAR in which a Db2 subsystem runs is an important part of effective performance management of the Db2 environment.</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com8tag:blogger.com,1999:blog-4516533711330247058.post-27515741955774096382023-01-20T11:41:00.000-08:002023-01-20T11:41:28.952-08:00Db2 for z/OS: What is "Wait for Other Read" Time, and What Can You Do About It?<p><span style="font-family: arial;">A recent conversation I had with some folks who support a large Db2 for z/OS system reminded me of the importance of something called "wait for other read time." 
In this blog entry I want to make clear to people what Db2 wait-for-other-read time is, why it's important, how to monitor it and what to do about it if it becomes an issue.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>What is Db2 for z/OS "wait for other read" time?</b></span></p><p><span style="font-family: arial;">In Db2 performance monitoring parlance, time associated with SQL statement execution is known as "in-Db2" time. It's also called "class 2" time, because it is recorded, for monitoring purposes, in records that are generated when Db2 accounting trace class 2 is active. Class 2 elapsed time (elapsed time pertaining to SQL statement execution) has two main components: CPU time (some of which is consumed on so-called general-purpose processors - aka "engines" - of a mainframe server, and some of which might be consumed on what are known as zIIP engines) and suspend time (on a busy system there can be another component of in-Db2 time, called "not accounted for" time, that generally reflects wait-for-dispatch time). In-Db2 suspend time is also known as "class 3" time, because it is recorded in Db2 accounting trace records when accounting trace class 3 is active. Class 3 time is broken out in a number of categories, and these show up in an accounting long report that might be generated by your Db2 monitor, or by a Db2 monitor's online display of thread detail information.</span></p><p><span style="font-family: arial;">In a Db2 monitor-generated accounting long report, class 3 suspend times are shown as "average" values. Average per what? Well, if you're looking at information for a Db2-accessing batch workload (referring to jobs that run in z/OS JES initiator address spaces and access Db2 by way of Db2's call attachment facility or TSO attachment facility), it'll be average per batch job (generally speaking, activity for one batch job will be recorded in one Db2 accounting trace record). If you're looking at a transactional workload (e.g., a CICS-Db2 workload, or a Db2-accessing IMS transactional workload, or a DDF client-server workload), the "average" values seen in a Db2 monitor-generated accounting long report will typically be average per transaction.</span></p><p><span style="font-family: arial;">In many cases, the majority of in-Db2 time for a batch or a transactional workload will be class 3 suspend time (it is a little unusual, but certainly not unheard of, for a Db2 workload's in-Db2 time to be mostly CPU time). More often than not, the largest component of in-Db2 class 3 suspension time will be wait-for-synchronous-database-read time. Another wait-for-read time is labeled "wait for other read." What's that? Well, if it's "other than" synchronous read wait time, it must be asynchronous read time, right? Right, indeed. And what are asynchronous reads? Those are prefetch reads: read I/Os driven by Db2 <i>in anticipation</i> that the pages read into memory in bulk in this way <i>will be</i> requested by the process (such as an application process) that prompted Db2 to issue the prefetch read requests. 
Well, if a prefetch read I/O operation is executed because Db2 is aiming to get pages into a buffer pool in memory <u>before</u> they are requested by (for example) an application process, why would there be such a thing as a process having to <u>wait</u> for a prefetch read to complete?</span></p><p><span style="font-family: arial;">Wait-for-prefetch read (reported as "wait for other read") happens because there are usually lots of Db2-accessing processes active in a system at one time. Let's call two of these processes process A and process B, and let's say that Db2 is driving prefetch reads (these could be sequential, list or dynamic prefetch reads - more on that in a moment) for process A. We'll further suppose that Db2 needs to access page 123 of table space TS1 on behalf of process B (i.e., Db2 issues a GETPAGE request for page 123 of table space TS1). If page 123 of table space TS1 is not already in the buffer pool to which TS1 is assigned, Db2 will drive a synchronous read request to get that page into memory, right? <i>Not necessarily.</i> It could be that page 123 of TS1 <i>is already scheduled to be brought into memory via a prefetch read that is being executed on behalf of process A.</i> If that is the case then process B will wait for that in-flight prefetch read to complete, and that wait time will be recorded as "wait for other read time" for process B. [It is also possible that process A has gotten to the point that it needs to access page 123 of TS1, and the prefetch read that will bring that page into memory is currently in-flight, and that would end up causing wait-for-other-read time for process A related to the prefetch request being driven on behalf of process A, but I think it's more likely that wait-for-other-read time will be associated with one process waiting on completion of a prefetch read operation that is being executed on behalf of another process.]</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Why is wait-for-other-read time important?</b></span></p><p><span style="font-family: arial;">Usually, wait-for-other-read time is a relatively small percentage of total class 3 suspend time for a process (it's typically much smaller than wait-for-synchronous-read time), but that's not always the case. In some situations, wait-for-other-read time is a major component of overall in-Db2 suspend time. The performance impact of elevated wait-for-other-read time can be especially significant for batch applications, as these Db2 processes are often particularly reliant on prefetch to achieve elapsed time objectives. If wait-for-other-read time gets too large then service levels could degrade, leading to user dissatisfaction.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>How can wait-for-other-read time be monitored?</b></span></p><p><span style="font-family: arial;">As mentioned previously, wait-for-other read time is recorded in accounting long (i.e., accounting detail) reports that can be generated by a Db2 performance monitor; so, you can track that for a process or a workload over time and note trends. Besides wait-for-other-read time itself, are there any other related fields in Db2 monitor-generated reports that you should keep your eye on to help ensure that a wait-for-other-read time problem does not sneak up on you? 
Yes, as explained below.</span></p><p><span style="font-family: arial;">The "other related fields" that I'd recommend checking out are found in a Db2 monitor-generated statistics long report (i.e., statistics detail report). In such a report you would see, </span><span style="font-family: arial;">for each buffer pool,</span><span style="font-family: arial;"> a set of fields like those shown below (this is a snippet of a statistics long report generated by the IBM OMEGAMON for Db2 for z/OS performance monitor - I've added some A, B, C labels that I'll subsequently use in referencing various of these fields):</span></p><span style="font-family: courier;"><br /><b><span style="color: #8e1d92;">BP1 READ OPERATIONS QUANTITY /SECOND</span><br /><span style="color: #8e1d92;">--------------------------- -------- -------</span><br /><br /><span style="color: #8e1d92;">SEQUENTIAL PREFETCH REQUEST 5622.00 3.23 </span><span style="color: #249c2c;">A</span><br /><span style="color: #8e1d92;">SEQUENTIAL PREFETCH READS 5587.00 3.21 </span><span style="color: #249c2c;">B</span><br /><span style="color: #8e1d92;">PAGES READ VIA SEQ.PREFETCH 52950.00 30.43 </span><span style="color: #249c2c;">C</span><br /><span style="color: #8e1d92;">S.PRF.PAGES READ/S.PRF.READ 9.48 </span><span style="color: #249c2c;">D</span><br /><span style="color: #8e1d92;">LIST PREFETCH REQUESTS 47394.00 27.24 </span><span style="color: #249c2c;">E</span><br /><span style="color: #8e1d92;">LIST PREFETCH READS 5876.00 3.38 </span><span style="color: #249c2c;">F</span><br /><span style="color: #8e1d92;">PAGES READ VIA LIST PREFTCH 154.9K 89.03 </span><span style="color: #249c2c;">G</span><br /><span style="color: #8e1d92;">L.PRF.PAGES READ/L.PRF.READ 26.36 </span><span style="color: #249c2c;">H</span><br /><span style="color: #8e1d92;">DYNAMIC PREFETCH REQUESTED 378.3K 217.42 </span><span style="color: #249c2c;">I</span><br /><span style="color: #8e1d92;">DYNAMIC PREFETCH READS 157.6K 90.59 </span><span style="color: #249c2c;">J</span><br /><span style="color: #8e1d92;">PAGES READ VIA DYN.PREFETCH 3110.6K 1787.68 </span><span style="color: #249c2c;">K</span><br /><span style="color: #8e1d92;">D.PRF.PAGES READ/D.PRF.READ 19.73 </span><span style="color: #249c2c;">L</span></b></span><div><span style="font-family: courier;"><br /></span><p><span style="font-family: arial;">By way of explanation, I'll first point out that what you see above are three repeating sets of fields (4 fields in each set) that pertain to sequential, list and dynamic prefetch activity. Here are thumbnail definitions of these prefetch types:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Sequential</i> - Generally speaking, this is the prefetch mode used for table space scans or for non-matching index scans. In other words, if Db2 determines that a front-to-back scan of a table space or index will be required, sequential prefetch will be used (assuming that the table or index in question is not super-small, in which case prefetch of any kind would usually not make sense).</span></li><li><span style="font-family: arial;"><i>List</i> - This is the prefetch type used when Db2 is retrieving table rows based on a list of row IDs (RIDs) that have been retrieved from an index (or from more than one index, if index ANDing or index ORing is part of the access plan for the query). 
List prefetch can be efficient if the clustering sequence of rows in the target table is substantially uncorrelated with respect to the order of entries in the index in question (the list of RIDs obtained from the index is sorted in ascending RID sequence and then the sorted RID list is used to prefetch pages of associated rows from the target table). The hybrid method of joining tables is another driver of list prefetch activity.</span></li><li><span style="font-family: arial;"><i>Dynamic</i> - This prefetch method is dynamically initiated at statement execution time when Db2 recognizes a sequential pattern of data access as it retrieves rows. Matching index scans are often drivers of dynamic prefetch activity.</span></li></ul><span style="font-family: arial;">OK, so here are a couple of things to keep an eye on, if you want to avoid a surprise situation involving elevated levels of wait-for-other-read time for processes that use prefetch to access pages of objects assigned to a given buffer pool:</span><p></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Prefetch reads relative to prefetch requests</i> - This tends to be more important for list and dynamic prefetch (less so for sequential prefetch, owing to locality of reference being less of a factor in that case). For list and dynamic prefetch, then, compare the number of prefetch reads to the number of prefetch requests (i.e., compare F to E, and J to I, using the letter-labels I added to the statistics report snippet shown above). What's this about? Well, a prefetch request is just that - a request to read a certain chunk of pages from a table space or an index into the assigned buffer pool. Suppose the prefetch request is for 32 pages (the most common quantity), and suppose that all 32 of those pages are already in the buffer pool. <i>In that case, the prefetch request will <u>not</u> drive a prefetch read I/O operation.</i> The larger the number of buffers allocated for a pool, the greater the likelihood that all pages associated with a prefetch request will already be in memory, thereby reducing prefetch reads as a percentage of prefetch requests. If you see the percentage of prefetch reads relative to prefetch requests going up over time for a pool, especially for list and/or dynamic prefetch, that's an indication that elevated levels of wait-for-other-read time could be in the offing. Why? Because more prefetch reads will generally mean more waiting for prefetch reads to complete.</span></li><li><span style="font-family: arial;"><i>The number of pages read per prefetch read</i> - These are the fields labeled D, H and L in the example statistics report snippet. If you see that number going up for one or more prefetch types (sequential, list, dynamic), it could be an early-warning sign of higher wait-for-other-read times. Why? Because a prefetch read that will bring 25 pages into memory is likely to take longer than a prefetch read that will bring 5 pages into memory (recall that a prefetch read I/O is driven to bring into memory the pages, associated with a prefetch request, that are not already in the buffer pool). When prefetch reads take longer to complete, it is likely that application processes will see higher levels of wait-for-other-read time.</span></li></ul><span style="font-family: arial;">At this point you may have put two and two together, and are thinking, "Hmm. 
It seems to me that a growing number of prefetch reads relative to prefetch requests <u>combined with</u> an increase in the number of pages read into memory per prefetch read would <u>really</u> be a flashing yellow light with respect to wait-for-other-read time." Right you are. In that case, two things are happening, and both have negative implications for wait-for-other-read time: there are more prefetch reads (because there are fewer cases in which all pages associated with a prefetch request are already in memory) <u>and</u> each prefetch read, on average, is taking longer to complete (because each read, on average, is bringing more pages into memory). If too much of that goes on, you could "hit the curve of the hockey stick" and see a sharp and sudden increase in applications' wait-for-other-read times. Better to take a corrective action before that happens. But what?</span><p></p><p><span style="font-family: arial;">Glad you asked...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>What can you do to reduce (or head off an increase in) wait-for-other-read time?</b></span></p><p><span style="font-family: arial;">If wait-for-other-read time has become problematic, or if you see the warning signs and want to take a preemptive action, what can you do? Here are some possibilities:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Increase the size of the buffer pool in question</i> - Simple: more buffers in a pool lead to increased page residency time, and that leads to 1) more prefetch requests NOT leading to prefetch reads (because all pages associated with a request are already in memory) and 2) fewer pages, on average, per prefetch read. Fewer prefetch reads + quicker execution of prefetch reads that do occur = less wait-for-other-read time. Obvious related question: "<u>Can</u> I make a buffer pool bigger? I don't want to put too much pressure on the z/OS LPAR's real storage resource." My response: check the LPAR's demand paging rate (available via an RMF Summary Report for the LPAR). If the demand paging rate is zero or a very small non-zero value (i.e., less than 1 per second), there is little to no pressure on the real storage resource, and you have a green light for making the buffer pool bigger. If the demand paging rate is 2-3 per second or more, and you don't want it to go higher than that (I wouldn't want it to go higher than that), consider reducing the size of a buffer pool that has low GETPAGE-per-second and read-I/O-per-second values, and increase the size of the buffer pool of concern by a like amount (so, the overall size of the buffer pool configuration remains the same). In my experience, plenty of Db2 for z/OS-using organizations under-utilize the real storage resources of production z/OS LPARs.</span></li><li><span style="font-family: arial;"><i>Change some query access plans</i> - If it looks as though sequential prefetch reads are the primary contributor to higher wait-for-other-read times, you can consider taking actions that would reduce table space scan and/or non-matching index scan activity. For that, you could potentially use a query monitor to identify longer-running queries that access objects assigned to the buffer pool in question, and examine EXPLAIN output for those queries to see if any of them have table space scans and/or non-matching index scans in their access plans that involve objects assigned to the buffer pool.
Then, consider whether it would be worth it to create a new index or indexes to eliminate such scans (there are cost factors associated with new indexes - you want the benefit to outweigh the cost), or whether simply adding a column to an existing index might reduce scan activity (there is a cost associated with that, too, but it's not as high as the cost of a new index). For dynamic prefetch, keep in mind that this is often related to matching index scans. You can sometimes reduce that activity by enabling Db2 to do more result set row filtering at the index level, and that often involves trying to increase MATCHCOLS values for one or more predicates of longer-running and/or more-frequently-executed queries (referring to the name of a column in the PLAN_TABLE in which EXPLAIN output is found). Boosting MATCHCOLS can involve things such as changing an index (add a column, or change the order of columns in a key - keeping in mind that the latter change could benefit some queries and negatively impact others), or maybe re-coding some non-indexable predicates to make them indexable, or maybe adding a predicate that does not change a query's result set. For list prefetch, keep in mind that this often has to do with rows in a table being clustered in a sequence that is very different from the sequence of the index used in the list prefetch operation. You might consider whether a table's clustering key is what it should be - the clustering key of a table can always be changed, and sometimes it makes sense to do that. Also, when index ANDing is driving a lot of list prefetch activity, increasing index-level filtering can help (maybe by adding a column to an index involved in the ANDing, to increase the column-match number, or adding an index that would take the number of indexes AND-ed from n to n+1).</span></li><li><span style="font-family: arial;"><i>Take a look at RID pool activity</i> - List prefetch operations involve use of the Db2 subsystem's RID pool. If the RID pool can't be used for a RID processing operation, Db2 will fall back to a table space scan for the target table, and that can drive sequential prefetch numbers up. RID pools these days are MUCH larger than they used to be (the default RID pool size in a Db2 12 or Db2 13 system is 1 GB), so incidences of RID processing "failed (or not used) due to lack of storage" - something that is indicated in Db2 monitor-generated accounting long as well as statistics long reports - are now quite rare. What you could potentially see, however, in the RID processing block of a statistics long report is a relatively large number of occurrences of "failed due to RDS limit exceeded." What this RID-pool-not-used counter means: If Db2 is executing a query, and a RID list processing action commences, and Db2 determines that more than 25% of the RIDs in the index being accessed will be qualified by the predicate in question, Db2 will abandon the RID list processing action in favor of a table space scan. Can you do anything about this? Maybe. In the case of a static SQL statement, it is my recollection that this RDS limit value is embedded in the associated Db2 package; so, at bind time, Db2 notes the number of RIDs that would exceed 25% of the RIDs in an index that is to be used as part of a RID-list-utilizing access plan action. Why is this potentially important? Because an index could grow substantially in the months (or years) following the most recent bind or rebind of a package.
What this means: if you have an application process for which there are many occurrences of "RID list processing failed - RDS limit exceeded," check how long it has been since the package (or packages) associated with the process were last bound or rebound. If it's been a long time, and if you think that relevant indexes have grown substantially since then, consider rebinding the packages - if that rebind results in new and larger "this is the RDS limit threshold for this index" values being embedded in the package, that value increase might be enough to reduce incidences of "RID list processing failed - RDS limit exceeded" for the package.</span></li></ul><span style="font-family: arial;">OK, that's what I've got on this topic. As I mentioned up front: in-Db2 wait-for-other-read time is <u>usually</u> not a matter of concern for Db2 application performance. In some cases, it can be an issue. This blog entry is aimed at helping you should such a case arise at your site (or even better, to help ensure that it <i>doesn't</i> become an issue for you).</span><p></p></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com5tag:blogger.com,1999:blog-4516533711330247058.post-60366672938872209692022-12-21T08:43:00.000-08:002022-12-21T08:43:43.816-08:00Db2 13 for z/OS: Setting Lock Timeout Limit and Deadlock Priority at the Application Level<p><span style="font-family: arial;">Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Application-level lock timeout</b></span></p><p><span style="font-family: arial;">First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. 
How can those disparate concerns be addressed?</span></p><p><span style="font-family: arial;">They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value of -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained <u>immediately</u> (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).</span></li><li><span style="font-family: arial;">If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.</span></li><li><span style="font-family: arial;">If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.</span></li><li><span style="font-family: arial;">The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, <u>and not just for DDF-using applications</u> (more information on this is provided below).</span></li></ul><br /><p></p><div><span style="font-family: arial;"><b>Application-level deadlock priority</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">A deadlock, of course, happens when process A holds a lock that process B needs in order to proceed, and process B holds a lock that process A needs in order to proceed. With both processes in a not-able-to-proceed state, Db2 detects the deadlock and chooses a "winner" and a "loser." The "loser" process is rolled back, causing it to release locks it had held, and that enables the other process (the "winner") to acquire the lock for which it had been waiting.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">All well and good, except for the fact that one traditionally has been able to do little to nothing to influence Db2's choice of winner and loser in deadlock situations. That changes starting with function level 501 of Db2 13, thanks to a new built-in global variable named DEADLOCK_RESOLUTION_PRIORITY.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Before providing some helpful (I hope) items of information about DEADLOCK_RESOLUTION_PRIORITY, let me point out a very important difference between this new feature and the previously-described CURRENT LOCK TIMEOUT: the latter is a special register, while the former is a global variable. Why is that notable?
One simple reason: <i>any process can set the value of a special register, but a process must have <u>permission</u> to set the value of a global variable</i>. The rationale for making DEADLOCK_RESOLUTION_PRIORITY a global variable may already be clear to you: if a value for DEADLOCK_RESOLUTION_PRIORITY could be set by any process, one could imagine everyone setting the global variable to its maximum value ("I always want to be the winner in a deadlock situation"), and that would defeat the purpose of the new capability (as the bad guy, Syndrome, in the movie <i>The Incredibles</i> put it, "When everyone is super, no one will be"). The permission-only nature of DEADLOCK_RESOLUTION_PRIORITY means that (for example) a Db2 DBA can assign the max-priority value to a "must-complete" database administration process, and know that under almost any circumstances (exceptions noted below) the process will be the winner in case of a deadlock. The same could be done - with permission received from a DBA - for a high-priority application process.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">OK, items of related information that might be good for you to know:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).</span></li><li><span style="font-family: arial;">If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).</span></li><li><span style="font-family: arial;">I mentioned previously that there are exceptions to the "255 always wins" rule. 
Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.</span></li></ul><br /></div><div><span style="font-family: arial;"><b>Setting the lock timeout limit or deadlock priority automatically for an application</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">What if you want an application to have a certain lock timeout limit or a certain deadlock priority, but you don't want the application to have to issue a SET CURRENT LOCK TIMEOUT or a SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY statement in order to accomplish the objective (SYSIBMADM is the schema for user-set-able built-in Db2 global variables - SYSIBM is the schema for built-in global variables that are set by Db2)? No problem: you can get that done using the Db2 profile tables. "Yeah," you might say, "but I want to do this for a local-to-Db2 application, and the profile tables can be used to set special register or built-in global variable values only for DDF-using applications." Actually, with Db2 13 that statement is no longer entirely true. Db2 13 allows the setting of CURRENT LOCK TIMEOUT (starting with function level 500) and DEADLOCK_RESOLUTION_PRIORITY (starting with function level 501) via profile table entries for <u>local-to-Db2</u> as well as for DDF-using applications (for other special registers and built-in global variables, value-setting by way of the profile tables remains do-able only for DDF-using applications).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">For a DDF-using application, the profile-defining specification (what you put in SYSIBM.DSN_PROFILE_TABLE) can be in a Db2 13 environment what it could be in a Db2 12 environment (the auth ID an application uses when connecting to the Db2 system is one example; the IP address of an application server is another example). For a local-to-Db2 application, the profile-defining specification can be auth ID and/or role, or collection name and/or package name, or the value of CLIENT_APPLNAME or CLIENT_USERID or CLIENT_WRKSTNNAME. The online Db2 13 for z/OS documentation provides additional details on using the profile tables to set values <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=connections-setting-special-registers-by-using-profile-tables"><span style="color: #2b00fe;">for special registers</span></a> and <a href="https://www.ibm.com/docs/en/db2-for-zos/13?topic=mcdic-setting-built-in-global-variables-by-using-profile-tables"><span style="color: #2b00fe;">for built-in global variables</span></a>.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">And there you have it. Db2 13 provides more control - and more-granular control - over two important aspects of application execution. 
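</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">To make the profile table mechanism a bit more concrete, here is a sketch of what "automatically give an application a lock timeout limit of its own" could look like. Treat this as illustrative rather than something to run as-is - the PROFILEID, auth ID and timeout values are made up, and the documentation links above cover the full set of profile-filtering columns and rules:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: courier;">-- Profile: match work that comes in under auth ID BATCHID1<br />INSERT INTO SYSIBM.DSN_PROFILE_TABLE<br />  (PROFILEID, PROFILE_ENABLED, AUTHID)<br />  VALUES (101, 'Y', 'BATCHID1');<br /><br />-- Attribute: a lock timeout limit of 7200 seconds for that work<br />INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES<br />  (PROFILEID, KEYWORDS, ATTRIBUTE1)<br />  VALUES (101, 'SPECIAL_REGISTER',<br />          'SET CURRENT LOCK TIMEOUT = 7200');</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The profile rows take effect once they have been loaded into memory by way of the Db2 command -START PROFILE. For deadlock priority, the same pattern applies, with a KEYWORDS value of GLOBAL_VARIABLE and an ATTRIBUTE1 value along the lines of SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY = 200.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">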
I hope that these new capabilities will be useful at your site.</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com6tag:blogger.com,1999:blog-4516533711330247058.post-57394199683416496252022-11-11T13:21:00.000-08:002022-11-11T13:21:05.723-08:00A Case Study: Using Db2 for z/OS Monitor Reports to Zero In on a Performance Problem <p><span style="font-family: arial;">I had an interesting exchange recently with a Db2 for z/OS systems programmer. This individual had been asked to assist in determining the cause of a performance problem impacting a Db2-accessing application. The sysprog shared with me a Db2 monitor-generated accounting report showing activity for the application process, and a Db2 monitor statistics report covering the same time period for the same Db2 subsystem. In those two reports were the clues that pointed to the source of the application performance problem, and with the probable problem source identified the associated mitigating actions were readily determined. In this blog entry, I'll take you through my analysis of the Db2 monitor-provided accounting and statistics information, and the rationale behind my recommended steps for resolving the problem. My aim is not only to shed light on a particular performance-affecting issue and related remediating moves, but also to illustrate a methodical approach for analyzing Db2 application performance issues in general.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Input for analysis</b></span></p><p><span style="font-family: arial;">I find Db2 monitor generated accounting and statistics reports to be extremely useful for analysis of application performance problems. In both cases, what you want is the detailed form of the report. In the case of IBM's OMEGAMON for Db2 performance monitor (the one with which I'm most familiar), you're talking about the ACCOUNTING REPORT - LONG and the STATISTICS REPORT - LONG (for other Db2 monitor products, these reports might have titles like, "detailed summary of accounting information" or "statistics detail report"). For the accounting report, your preference is to see activity pertaining exclusively to the application process for which the performance issue has arisen. This is usually done by using the monitor's data-filtering capabilities to include, for report-generation purposes, only the Db2 accounting records of interest (an accounting report is basically information from Db2 accounting trace records, formatted for readability). Db2 accounting trace records have all kinds of identifier fields, so you can get pretty specific. In the case about which I'm writing here, the relevant filtering criteria were the authorization ID of the application process of interest, and the FROM and TO times that bracketed the period during which the performance problem occurred.</span></p><p><span style="font-family: arial;">As for the statistics report, what you want is one that covers the same time period as the accounting report (same FROM and TO times), for the same Db2 subsystem.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Looking for clues</b></span></p><p><span style="font-family: arial;">What the Db2 sysprog had been told by the development team is that the application in question started out performing well, and then slowed down a lot (important input). 
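</span></p><p><span style="font-family: arial;">(A quick aside on mechanics: with IBM's OMEGAMON for Db2 monitor, the kind of filtering mentioned above is specified through batch report control statements. What follows is a rough sketch of the idea - the auth ID and times are made up, and you should check your monitor's documentation for the exact syntax it supports:</span></p><p><span style="font-family: courier;">ACCOUNTING<br />  REPORT<br />    LAYOUT(LONG)<br />    INCLUDE(PRIMAUTH(APPID1))<br />    FROM(11/03/22,09:00:00)<br />    TO(11/03/22,10:00:00)</span></p><p><span style="font-family: arial;">A STATISTICS REPORT request with the same FROM and TO times would produce the companion statistics report.)</span></p><p><span style="font-family: arial;">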
The sysprog and I looked first at the accounting report, and in doing that we focused initially on the "class 3 suspension" information (this information, obtained from Db2 accounting trace class 3 records, has to do with "known" wait events, as opposed to "not accounted for" time, about which I'll comment momentarily). Why did we look there first? Because (in my experience), when a Db2-accessing process starts out performing well and then slows way down, it's often due to a substantial increase in one or more of the "wait times" captured by accounting trace class 3 (yes, an application slowdown could be related to a significant CPU time increase, but I've seen that less often than I've seen large wait time increases).</span></p><p><span style="font-family: arial;">Looking at the class 3 times for the application process, what jumped out was a very large value for the average DB2 LATCH time ("average" is average per accounting trace record, which typically equates to average per transaction or average per batch job, depending on the workload type). Here, I mean "large" in terms of average DB2 LATCH time being a large percentage of average TOTAL CLASS 3 time. Usually, DB2 LATCH time is a very small percentage of TOTAL CLASS 3 time, with "wait time" categories such as SYNCHRON DATABASE I/O and OTHER READ I/O accounting for the bulk of TOTAL CLASS 3 time. A (proportionately) really large DB2 LATCH time is usually an indicator that something's not right.</span></p><p><span style="font-family: arial;">The first thing I look at when I see unusually large DB2 LATCH time for a Db2-accessing process is the "in-Db2 not-accounted-for time" for that process. Several Db2 monitor products calculate that for you - in an IBM OMEGAMON for Db2 accounting report, the field is labeled NOTACC, and it's shown, on a sideways bar chart at the top of an accounting report, as a percentage of average in-Db2 elapsed time. If you need to calculate this figure for yourself, the denominator is average in-Db2 elapsed time (aka "class 2" elapsed time), and the numerator is average in-Db2 elapsed time minus in-Db2 CPU time (general-purpose plus zIIP, or "specialty engine," CPU time) minus TOTAL CLASS 3 time. In other words, it's the percentage of in-Db2 elapsed time that is not CPU time and not "identifiable" wait time. For a transactional application process (as was the case for the situation about which I'm writing), as a general rule you want in-Db2 not-accounted-for time to be less than 10%. If that figure is substantially greater than 10%, it's indicative of a CPU-constrained environment, and if the environment is highly CPU-constrained then DB2 LATCH time can get really large (as pointed out in <a href="http://robertsdb2blog.blogspot.com/2019/10/db2-for-zos-potential-performance.html"><span style="color: #2b00fe;">an entry I posted to this blog</span></a> a few years ago).</span></p><p><span style="font-family: arial;">Well, in this particular case the average in-Db2 not-accounted-for time for the application process was 11% of in-Db2 elapsed time - a little on the high side for a transactional process, but not high enough to explain a really large DB2 LATCH time. With that cause of elevated DB2 LATCH time pretty much eliminated, it was time to turn to the breakdown of latch wait events for different latch categories, and that's where the statistics report comes in. 
In a Db2 monitor-generated statistics long report, the latch suspend count information will likely look something like this (what you see below is from an OMEGAMON for Db2 statistics long report, but it is NOT from the report I reviewed with the Db2 systems programmer - we jointly viewed that report in a web meeting, and I do not have a copy of the report):</span></p><div style="text-align: left;"><span style="color: #800180; font-family: courier;">LATCH CNT /SECOND /SECOND /SECOND /SECOND<br /></span><span style="color: #800180; font-family: courier;">--------- -------- -------- -------- --------<br /></span><span style="color: #800180; font-family: courier;">LC01-LC04 0.00 0.00 0.00 0.00<br /></span><span style="font-family: courier;"><span style="color: #800180;">LC05-LC08 0.00 </span><b><span style="color: #07b931;">0.74</span></b><span style="color: #800180;"> 0.00 0.41</span><br /></span><span style="color: #800180; font-family: courier;">LC09-LC12 0.00 0.02 0.00 0.32<br /></span><span style="color: #800180; font-family: courier;">LC13-LC16 0.00 12.89 0.00 0.00<br /></span><span style="color: #800180; font-family: courier;">LC17-LC20 0.00 0.00 0.01 0.00<br /></span><span style="color: #800180; font-family: courier;">LC21-LC24 0.04 0.00 1.96 2.84<br /></span><span style="color: #800180; font-family: courier;">LC25-LC28 0.12 0.02 0.01 0.00<br /></span><span style="color: #800180; font-family: courier;">LC29-LC32 0.06 0.04 0.00 0.28<br /></span><span style="color: #800180; font-family: courier;">LC254 0.00</span></div><p><span style="font-family: arial;">What I saw in the report I reviewed in the web meeting with the Db2 sysprog (and again, that's NOT what you see above - the snippet above is provided so that you can see what the latch suspend count information looks like in a statistics report) was a particularly high value for latch class 6 suspend events (that would be in the <u>position</u> highlighted in green in the sample report snippet above). What is latch class 6? It has to do with index page split activity in a Db2 data sharing environment (by the way, a handy page for seeing the activities associated with various Db2 latch classes is <a href="https://www.ibm.com/docs/en/om-db2-pe/5.4.0?topic=blocks-latch-counters"><span style="color: #2b00fe;">this one</span></a> from the OMEGAMON for Db2 documentation).</span></p><p><span style="font-family: arial;">Let's unpack that. An index page split occurs when Db2 has to insert an entry in an index page because of an insert (or an update of an indexed column) and that page is full. In that situation, a portion of the entries in the page will be moved to what had been an empty page in the index, so that there will be room in the formerly-full page for the new entry. What does Db2 data sharing have to do with this (and in the environment about which I'm writing, Db2 is running in data sharing mode)? In a data sharing system (versus a standalone Db2 subsystem), an index page split action has a greater impact on throughput because it forces a log-write operation.</span></p><p><span style="font-family: arial;">Seeing the high level of index page split activity suggested by the numerous latch class 6 wait events, we turned again to the accounting report to see the average number of insert operations executed by the performance-impacted application. 
Sure enough, we saw that this was an insert-intensive process - more in terms of the average number of rows inserted than the number of INSERT statements executed (the average number of rows inserted, per the accounting report, was about 100 times larger than the average number of INSERT statements executed, indicating use of block-level inserts by the application).</span></p><p><span style="font-family: arial;">The elevated count of latch class 6 suspend events (related to index page splits) and the insert-intensive nature of the process also dovetailed with the observed "started out fine, then slowed down" behavior of the application: in all probability, when the process started there was a pretty good amount of free space in leaf pages of indexes on the table(s) into which rows were being inserted. After a while these "holes" in index leaf pages filled up, and that resulted in a large number of index page split actions to make space for new index entries, and THAT - partly due to the fact that this was a Db2 data sharing system - had a majorly negative impact on application performance (the keys of affected indexes were clearly not of the continuously-ascending variety, because index page split actions are not required for an index defined on a continuously-ascending key).</span></p><p><span style="font-family: arial;">With the source of the performance problem identified, the next matter to consider was...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>What to do about it?</b></span></p><p><span style="font-family: arial;">The Db2 sysprog and I discussed two problem-mitigating actions - one an "absolutely do" and the other a "maybe do." The "absolutely do" step was to increase the amount of free space in indexes to accommodate new entries. That step, in turn, comprised two sub-steps, each of which is applicable to indexes defined on non-continuously-ascending keys. Sub-step one: increase the index's PCTFREE value. Whereas the default PCTFREE value for an index is 10, a value of 20 or 25 might make more sense for an index defined on a non-continuously-ascending key for an insert-heavy table. Sub-step two: increase the index's FREEPAGE value. The default FREEPAGE value is 0. Here's why boosting the FREEPAGE value - for example, to 5 (in which case there would be an empty index page after every 5 pages containing index entries) - can be helpful for an index defined on a non-continuously-ascending key for an insert-heavy table: as previously mentioned, when an index page is split a portion of that page's entries are moved to a <i>previously empty</i> page in the index. If FREEPAGE 0 (the default) is in effect, the only empty index pages <i>will be at the very end of the index</i> - potentially a long way from the page that was split. That situation creates a drag on performance through degradation of the index's organization (reflected in the LEAFDIST value for the index - or index partition, in the case of a partitioned index - in the SYSIBM.SYSINDEXPART catalog table). 
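</span></p><p><span style="font-family: arial;">(To make those two sub-steps concrete, here are a couple of hedged examples - the creator and index names are hypothetical placeholders, and the PCTFREE and FREEPAGE values are illustrations, not recommendations. The first statement is a quick way to spot disorganization building in an index via LEAFDIST; the second implements the two sub-steps:)</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><div><span style="color: #800180; font-family: courier;">-- the higher the LEAFDIST value, the more disorganized the index<br />SELECT IXNAME, PARTITION, LEAFDIST<br /> FROM SYSIBM.SYSINDEXPART<br /> WHERE IXCREATOR = 'MYSCHEMA'<br /> ORDER BY LEAFDIST DESC;<br /><br />-- more free space within, and "nearby" empty pages among, leaf pages<br />ALTER INDEX MYSCHEMA.IX_MYTABLE_1<br /> PCTFREE 25<br /> FREEPAGE 5;</span></div></blockquote><p><span style="font-family: arial;">Again: treat 25 and 5 as starting points for analysis, not magic numbers.</span></p><p><span style="font-family: arial;">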
With a non-zero and relatively low value for FREEPAGE (meaning, an empty page following each relatively-low-number of populated pages), when an index split does occur then there should be a "nearby" empty page into which entries from the full page can be moved.</span></p><p><span style="font-family: arial;">Both PCTFREE and FREEPAGE can be changed for an index via an ALTER INDEX statement, and both take effect when the index is subsequently reorganized (or loaded). The larger PCTFREE value will reduce index page split activity between REORGs, and the non-zero FREEPAGE value will reduce the impact of page splits if they do occur.</span></p><p><span style="font-family: arial;">And what about the "maybe do" step? That would be an increase in the size of the index's pages, from the default of 4 KB to maybe the maximum of 32 KB. How can that help? Here's how: because a 32 KB index page (for example) can hold 8 times as many entries as a 4 KB page, going to 32 KB-sized pages for an index (via an ALTER INDEX statement that assigns the index to a 32K buffer pool) can potentially result in an 87.5% reduction (seven eighths) in page split activity for an index, other things being equal (e.g., same rate of inserts for the underlying table). Why is this a "maybe do" thing versus an "absolutely do" thing? Because if access to table rows through the index (e.g., for queries) is truly random in nature with respect to key values, 32 KB-sized pages could mean somewhat less-effective use of buffer pool resources versus 4 KB-sized pages. It's a matter, then, of what's more important in a particular situation: is it minimizing index page split activity, or maximizing the effectiveness of buffer pool resources for random patterns of row access by applications?</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>In conclusion...</b></span></p><p><span style="font-family: arial;">I hope that this entry has provided you with some "news you can use." Utilize Db2 monitor accounting and statistics reports to determine the source of an application performance problem, then take appropriate remedial action.</span></p><p><span style="font-family: arial;">One more thing: Db2 13 for z/OS made it a lot easier to verify that index page splits are an issue, through enhanced instrumentation (n</span><span style="font-family: arial;">ew IFCID 396, associated with statistics trace class 3, which is on by default, indicates when an index page split operation takes more than 1 second, which would be unusually long) and through the new REORGTOTALSPLITS, REORGSPLITTIME and REORGEXCSPLITS columns of the SYSIBM.SYSINDEXSPACESTATS real-time statistics table in the Db2 catalog.</span></p><div><br /></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com18tag:blogger.com,1999:blog-4516533711330247058.post-51797138143602424192022-10-28T13:39:00.000-07:002022-10-28T13:39:42.881-07:00Getting Ready to Migrate to Db2 13 for z/OS from Db2 12<p><span style="font-family: arial;">It could be said that preparation for migration of a Db2 12 for z/OS system to Db2 13 comes down to one thing: activate Db2 12 function level 510. That's a pretty simple-looking migration plan, but there's more to it than meets the eye - as I'll explain in this blog entry.</span></p><p><span style="font-family: arial;">First, let's consider function level 510 itself. 
Unique among Db2 12 function levels, 510 provides no new functionality in the traditional sense - there's nothing new there that a DBA or a developer would use, or that could make an application execute with greater efficiency. The purpose of function level 510 is simply this: to validate that a Db2 12 system is technically ready for migration to Db2 13 (here, "system" refers to a standalone Db2 subsystem or a Db2 data sharing group).</span></p><p><span style="font-family: arial;">If activating function level 510 gets a Db2 12 system ready for migration to Db2 13, what makes a Db2 12 system ready for activation of function level 510? Three things:</span></p><p></p><ol style="text-align: left;"><li><span style="font-family: arial;">The system's code level has to be 510. That can be easily verified by issuing the Db2 command -DISPLAY GROUP: in the command output, look for the value 121510 in a column labeled DB2 LVL (if your Db2 system's maintenance level is relatively current, the code level is likely to be 510 already - the <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=d1fl-function-level-510-v12r1m510-activation-enabled-by-apar-ph33727-april-2021"><span style="color: #2b00fe;">PTF that takes a Db2 system's code to the 510 level</span></a> came out in April of 2021).</span></li><li><span style="font-family: arial;">The catalog level has to be V12R1M509 - again, -DISPLAY GROUP output tells the tale.</span></li><li><span style="font-family: arial;">There can't be any packages in the system, used within the past 18 months, that were last bound or rebound prior to Db2 11.</span></li></ol><span style="font-family: arial;">Let me expand on items 2 and 3 in that list.</span><p></p><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>Getting to the right catalog level</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">First, you might be wondering, "How is it that I need to get to catalog level V12R1M509 before I can activate function level V12R1M510? Wouldn't the catalog level have to be V12R1M510?" No. There is no catalog level V12R1M510. Why is that? Because function level 510 has no catalog dependencies (i.e., no changes to a catalog at the V12R1M509 level are necessary to support function level 510). This is not at all unprecedented. Several of the Db2 12 function levels had no catalog dependencies. For example, function level 504 can be activated when the catalog level is V12R1M503 - there is no V12R1M504 catalog level because function level 504 did not require any changes to a catalog at the 503 level.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Second, suppose your Db2 catalog level is, say, V12R1M500. Can you take that catalog right to the 509 level? YES. Assuming your code level is at least 121509, you can execute the CATMAINT utility with a specification of UPDATE LEVEL(V12R1M509). That one execution of CATMAINT will make the changes to the catalog associated with the 509 catalog level, <u>and</u> it will make all the changes associated with all the other catalog levels between 500 and 509.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>About those old packages...</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">"What's with that?" you might wonder. 
"Why would the existence of one or more packages, that we've used in the past 18 months and that were last bound or rebound prior to Db2 11, keep me from being able to activate function level 510?" Short answer: it's for your own good. Longer answer: packages are executable code. The code generated for a package by a Db2 version prior to V11 <u>cannot be executed in a Db2 13 system</u>. If there is a request to execute a pre-Db2 11 package in a Db2 13 environment, that package will be auto-rebound so that Db2 13 can generate for that package code that can be executed in the Db2 13 system. "OK," you say, "I get that auto-rebinds of packages can be a little disruptive with respect to my application workload, but it's not THAT big of a deal - the auto-rebind gets done and we go trucking on." Not so fast, I'd say. What if, as a result of the auto-rebind of a package, there's an access path change that negatively impacts - perhaps majorly - the performance of the associated program? Your possible response: "Again, not a huge deal. We run with PLANMGMT=EXTENDED in ZPARM, and so we'd have the prior copy of that package available, and we can just execute a REBIND with SWITCH(PREVIOUS) to restore the previous better-performing copy of the package." <b>WRONG!</b> You CAN'T restore the previous copy of that package, because the previous copy was generated prior to Db2 11, <i>and that means the previous copy can't be executed in a Db2 13 system.</i> You're STUCK with that poor-performing package. Sure, you can take steps to try to correct the performance regression - maybe update catalog statistics or take an index action and then rebind and hope for a better performance outcome - but do you want to do that while some critically important production program is performing in an unacceptable way and you're phone is going off non-stop because upper management wants to know WHEN YOU'RE GOING TO GET THIS FIXED? Probably not; so, we're going to prevent that scenario from happening by not letting you go to Db2 13 if you still have any pre-Db2 11 packages that have been used within the past 18 months (the thinking of the Db2 for z/OS development team: if a package was last used more than 18 months ago, it's highly likely that it's a package that's just not used anymore in your environment - it's still in SYSPACKAGE simply because no one has FREE-ed the old package).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This "keep you out of trouble" action taken by the Db2 for z/OS development team is based on the painful experience some organizations had when they migrated to Db2 12 from Db2 11. In that situation, we made it clear in the documentation that pre-Db2 10 packages would need to be rebound prior to going to Db2 12 because pre-Db2 10 packages could not be executed in a Db2 12 environment. Well, some Db2 for z/OS people either didn't see that warning, or saw it and decided to ignore it and take their chances, and in a few cases the problem described in the preceding paragraph was encountered. At some sites, the problem's impact was severe enough to warrant falling back to Db2 11, at which point people would rebind the pre-Db2 10 packages (as had been strongly encouraged by us) and then re-migrate to Db2 12. 
<div><span style="font-family: arial;">Not wanting to see recurrences of those difficulties, with Db2 13 we're basically saying, "We are not going to let you get into the potentially ugly situation you could see if a package that cannot be executed in a Db2 13 system is requested for execution in that environment - you <u>cannot go to Db2 13 if you have pre-Db2 11 packages that might still be in use at your site</u>."</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">By the way, if you want to see if you have any packages that would prevent successful activation of function level V12R1M510, you can execute this query on a Db2 12 system (and note that this query is also provided <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=d1fl-function-level-510-activation-enabled-by-apar-ph33727-april-2021"><span style="color: #2b00fe;">in the Db2 12 documentation</span></a>):</span></div><div><span style="font-family: arial;"><br /></span></div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><div><span style="color: #800180; font-family: courier;">SELECT * FROM SYSIBM.SYSPACKAGE </span></div><div><span style="color: #800180; font-family: courier;"> WHERE LASTUSED >= DATE(DAYS(CURRENT DATE) - 548) </span></div><div><span style="color: #800180; font-family: courier;"> AND RELBOUND NOT IN ('P','Q','R')</span></div><div><span style="color: #800180; font-family: courier;"> AND VALID <> 'N' </span></div><div><span style="color: #800180; font-family: courier;"> AND OPERATIVE <> 'N';</span></div></blockquote><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">One more thing: as previously mentioned, it's highly likely that a package that has not been executed in a Db2 system within the past 18 months will not be executed at some future time in that Db2 environment. That said, maybe you're concerned that, for some reason, a package in your environment is executed every two years (24 months). The chances of that being true are almost certainly very small, but perhaps non-zero. If that's bugging you, disregard the "18 months" window and rebind ANY pre-Db2 11 package in your system prior to going to Db2 13.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>If you're nervous about a "big jump" to Db2 12 function level 510...</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">Consider this hypothetical situation: you're running with Db2 12 function level 500 activated and you're contemplating going to function level 510 to prepare for migration to Db2 13. That's actually not so hypothetical - a good number of Db2 12 systems are running with an activated function level of V12R1M500. If that looks familiar to you, there might be a couple of thoughts running through your mind:</span></div><div style="text-align: left;"><ul style="text-align: left;"><li><span style="font-family: arial;"><i>Function level 500 to 510 looks like a really big jump to me. How do I get that done with a minimized risk of complications?</i> The key here is the APPLCOMPAT specification for your packages. 
Maybe you're concerned that making a big jump up in the activated function level for your Db2 12 systems could lead to programs being impacted by a "SQL incompatibility" (basically, that's a SQL behavioral change: same SQL, same data, different result - these incompatibilities are pretty few and far between, and they often affect either few or none of your programs, but they can indeed arise on occasion). If you're indeed worried about that, you can guard against it <i>by leaving the APPLCOMPAT values for your packages where they are when you activate a higher function level of Db2 12 for z/OS. </i>If you have a package bound with, for example, APPLCOMPAT(V12R1M500), and you activate function level 510, <i>SQL issued through the package bound with APPLCOMPAT(V12R1M500) will still get the SQL behavior associated with <u>function level 500</u>.</i> You can find lots more information about APPLCOMPAT in the <a href="http://robertsdb2blog.blogspot.com/2019/06/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 1</span></a> and <a href="http://robertsdb2blog.blogspot.com/2019/07/db2-for-zos-talking-about-applcompat.html"><span style="color: #2b00fe;">part 2</span></a> posts of the 2-part entry on APPLCOMPAT that I added to this blog in 2019.</span></li><li><span style="font-family: arial;"><i>If function level 510 is a good ways beyond where my Db2 12 system is at present, could I maybe go from where we are to some intermediate function level, and later to level 510?</i> Of course you can. "If I decide to do that," you might be thinking, "what would a good intermediate function level be for us?" That's really up to you. My advice: go to the Db2 12 function levels <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=12-db2-function-levels"><span style="color: #2b00fe;">"main page"</span></a> in the Db2 12 online documentation, and check out the features and functions introduced with each function level between where you are now and 510. If there's a function level that provides an enhancement that would be particularly helpful in your environment, go to that one, and later, at a time of your choosing, go from that level to level 510 (I'll tell you that a pretty popular "intermediate" Db2 12 function level is 505 - this because a lot of Db2 DBAs really like the <a href="http://robertsdb2blog.blogspot.com/2020/12/db2-for-zos-what-do-you-know-about.html"><span style="color: #2b00fe;">"rebind phase-in"</span></a> functionality that was introduced via function level 505).</span></li></ul></div><div><span style="font-family: arial;">OK, that about wraps it up for this blog entry. I hope that this information will be helpful for you as you plan for your site's migration from Db2 12 to Db2 13.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">P.S. In addition to activating Db2 12 function level 510, don't forget to apply the "fallback SPE" to your Db2 12 systems prior to migrating to Db2 13 - that PTF allows you to fall back from Db2 13 to Db2 12, in the unlikely event that falling back should be deemed necessary. 
The APAR associated with this fallback SPE is <a href="https://www.ibm.com/support/pages/apar/PH37108"><span style="color: #2b00fe;">PH37108</span></a>.</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-77782907514641336722022-09-28T07:10:00.001-07:002022-09-28T07:10:40.779-07:00Db2 for z/OS: GETPAGEs, Predicates, SELECTs and DELETEs <p><span style="font-family: arial;">About 15 years ago, I posted to the blog I maintained while working as an independent consultant (prior to rejoining IBM in 2010) <a href="http://catterallconsulting.blogspot.com/2007/09/most-important-db2-for-zos-performance.html"><span style="color: #2b00fe;">an entry</span></a> in which I named GETPAGEs the most important factor with regard to the CPU consumption of a Db2 for z/OS application workload. GETPAGEs, which are requests by Db2 (typically on behalf of application or user processes) to access pages of table spaces or indexes, are indeed a prime driver of SQL statement execution cost. That being the case, people who work on tuning Db2 query performance often aim to reduce GETPAGE activity associated with executing the query. That work, in turn, tends to focus on a query's predicates (the clauses of the query that determine which rows will qualify for the query's result set). Can an index be added or modified to enable index-level (versus table-level) row filtering for a given predicate? Can a non-index-able predicate be rewritten to be index-able? Can a predicate be added to further refine the query's result set? And so on.</span></p><p><span style="font-family: arial;">The thing is, the predicate focus of query tuning could lead one to believe that the same GETPAGE-reducing actions could yield similar results for any SQL statement containing predicates, regardless of whether the statement is a SELECT or, say, a DELETE. That is not the case, especially for DELETEs versus queries, and the difference basically boils down to one thing: indexes. The same indexes that reduce GETPAGE activity for a query can make GETPAGE counts stubbornly high for a DELETE statement, in spite of tuning actions related to the DELETE statement's predicates. A Db2 SQL programmer recently ran up against this reality. He asked me about it, and I think his situation could be instructive for others.</span></p><p><span style="font-family: arial;">The SQL programmer (I'll refer to him as R - the first letter of his first name) was analyzing the performance of a row-deleting process that would remove a few hundred thousand rows from a table in a given execution. Thinking that reducing GETPAGE activity would lower the CPU cost of the process, and approaching the tuning effort for the process's DELETE statement as one would approach tuning the performance of a query (thinking about an analogous query that would have the same predicates as the DELETE statement of interest), R had a DBA create on the target table a new index with a key comprised of the five columns referenced in a series of ANDed "equals" predicates in the DELETE statement (in other words, the DELETE had the series of predicates WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?, and the new index had the key C1 | C2 | C3 | C4 | C5). 
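</span></p><p><span style="font-family: arial;">(To picture the scenario - a hedged sketch, with hypothetical schema, table and index names:)</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><div><span style="color: #800180; font-family: courier;">DELETE FROM MYSCHEMA.MYTABLE<br /> WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?;<br /><br />-- the new index, created in hopes of cutting GETPAGE activity<br />CREATE INDEX MYSCHEMA.IX_MYTABLE_C1C5<br /> ON MYSCHEMA.MYTABLE (C1, C2, C3, C4, C5);</span></div></blockquote><p><span style="font-family: arial;">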
That would make for a MATCHCOLS value of 5 for the DELETE statement, right (referring to a column in the Db2 access path-explaining <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=tables-plan-table"><span style="color: #2b00fe;">PLAN_TABLE</span></a>)? And, that should reduce GETPAGE activity for the DELETE by enabling a greater degree of row filtering at the index level, right? Well, not exactly, as things turned out: R was surprised to see that the new index had very little impact on the GETPAGE count for the DELETE statement. What was going on here?<br /></span></p><p><span style="font-family: arial;">The index-impact result that surprised R comes down to essentially one thing, that being the fundamentally different nature of DELETE versus SELECT statements. Yes, both can have predicates, but that doesn't mean that a DELETE with predicates can be thought of as a "query" - DELETEs and SELECTs are like apples and oranges when it comes to their execution characteristics. A SELECT returns values (specifically, rows of values), and that's it. A certain index might sharply reduce GETPAGE activity for a query by reducing the need for Db2 to examine table space pages in order to find qualifying rows (in fact, an index could virtually eliminate table space GETPAGEs for a query, if all columns referenced in the query - in the query's select-list and in its predicates - are part of the index's key). A DELETE, on the other hand, <u>changes pages</u>, many of which - and this is of critical importance - will be pages in index spaces (excepting the unusual, but not unheard-of, situation in which a table has no indexes).</span></p><p><span style="font-family: arial;">Given this aspect of DELETE processing, not only will adding an index to a table potentially have little impact on GETPAGE activity for a DELETE statement - it might even <u>increase</u> GETPAGE activity for the DELETE. Think about it: for every row removed from a table by a DELETE statement, an entry has to be removed <i>from each and every index defined on the table </i>(yes, it's actually a "pseudo-delete" of index entries, with the relevant entries just marked for later physical deletion, but this still involves index GETPAGEs)<i>.</i> Not only that, but Db2 very often can't just march through an index's leaf pages, deleting index entries as corresponding rows are deleted from the underlying table - not when a given index's keys have a way-different sequence relative to the sequence in which table rows are being deleted. Maybe, because of matches on predicate columns, a DELETE statement is guided to rows qualifying for deletion by index X, but index Y, on the same table, may have a key whose ordering is very different from that of index X's (i.e., the two indexes' key values correlate little, if at all, on a row-by-row basis). In that case, finding the entry in index Y to delete as part of deleting a table row could well require an index probe operation (i.e., a top-down traversal of index Y, from root page to leaf page). If that kind of thing is happening for several indexes on the table, the number of GETPAGEs for a DELETE statement could be several times larger than the number of rows deleted; <i>and, that's not because the DELETE has a "bad" access path - it's because the statement is a DELETE and not a SELECT.</i></span></p><p><span style="font-family: arial;">Bottom line: comparing GETPAGE counts between SELECT and DELETE statements is not useful or meaningful, even if the statements have identical predicates. 
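</span></p><p><span style="font-family: arial;">(To put rough numbers on that point - the figures below are hypothetical, purely for illustration:)</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><div><span style="color: #800180; font-family: courier;">rows deleted: 200,000<br />indexes on the table: 5<br />-- suppose 4 of the 5 indexes require, for each deleted row, a<br />-- probe of a 3-level index (root, non-leaf and leaf pages):<br />index GETPAGEs = 200,000 x 4 x 3 = 2,400,000<br />-- 12 times the number of rows deleted, before any table space<br />-- GETPAGEs are even counted</span></div></blockquote><p><span style="font-family: arial;">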
If you have a row-delete process that is consuming more CPU than you'd like, what can you do about it? Here are a couple of possibilities:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><i><u>Reduce</u> - don't increase - the number of indexes on the target table</i>. As I've mentioned, it's a general truism that while indexes can boost query performance, they tend to increase CPU and elapsed time for DELETE statements. That being the case, a good way to boost DELETE performance is often to find and remove indexes on the target table that are not doing any good. I call this "index trimming," and I provided some information on that topic in <a href="http://robertsdb2blog.blogspot.com/2012/09/db2-10-for-zos-new-options-for-trimming.html"><span style="color: #2b00fe;">an entry I posted to this blog</span></a> some years ago (I'd ignore the part of that blog entry that deals with hash-organized tables - that Db2 feature was deprecated with function level 504 of Db2 12 for z/OS).</span></li><li><span style="font-family: arial;"><i>Consider using the DISCARD option for online REORG</i>. Particularly when the row-delete criterion (or criteria) is not particularly complex, and can be expressed in the form of a predicate or predicates referencing the target table, executing an online REORG of the table's table space with the DISCARD option can be an attractive way to efficiently remove a large number of rows from the table with minimal disruption of application access to the data (there will always be at least a brief - sometimes just a few seconds - period of no data access, when a REORG job enters the SWITCH phase near end-of-execution).</span></li></ul><span style="font-family: arial;">I hope that this information will be useful for you. Don't confuse DELETE-statement tuning with SELECT-statement tuning. Two different animals, as we say.</span><p></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com9tag:blogger.com,1999:blog-4516533711330247058.post-70921213803183840742022-08-23T16:40:00.000-07:002022-08-23T16:40:13.901-07:00What Db2 for z/OS People Should Know About Data Fabric <p><span style="font-family: arial;">"Data fabric" is an increasingly hot topic in IT circles, and with good reason - an effectively implemented data fabric can deliver significant dividends by enabling an organization to get more value from its data assets. Db2 for z/OS people should have some familiarity with the data fabric concept and associated technology, not only as preparation for participating in data fabric-related discussions but also because data fabric is of major strategic importance for Db2 for z/OS (and for other z/OS-based data sources). In this blog entry I'll provide information on data fabric that I hope will be helpful to Db2 for z/OS people.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>What is "data fabric," anyway?</b></span></p><p><span style="font-family: arial;">Essentially, data fabric is an architecture that brings uniformity and consistency to data originating in a disparate collection of sources - sources which could be (likely would be) housed in a mix of on-premise and in-cloud systems (and, especially for larger enterprises, "in cloud" would involve several different public cloud providers and perhaps some private cloud environments). 
That uniformity and consistency are manifest in multiple aspects of data interaction via the data fabric, including data access, discovery, utilization, cataloging, protection and governance; further, a data fabric is likely to have a "smart" dimension, with AI and machine learning technology leveraged to provide intelligent automation of data management tasks.</span></p><p><span style="font-family: arial;">I mentioned that the data fabric payoff is increased value gained from an organization's data assets. How does data fabric deliver that payoff? Basically, by eliminating friction that would otherwise impede data access, discovery, utilization and integration - and doing that without compromising data security. The promise of a data fabric can be largely summed up in this way: it provides an environment in which <u>the right data</u> (i.e., data that is current, trusted, understood and complete) is available to <u>the right people</u> (people who know the data, people who know what data they need, people who know what they want to do with data) at <u>the right time</u> (i.e., when the data is needed).</span></p><p><span style="font-family: arial;">In thinking about the value of the consistency and uniformity that a data fabric brings to what would otherwise be a disjointed data landscape, it can be helpful to consider a cake-baking analogy. Suppose you are tasked with baking a cake, and suppose further that the ingredients must be ordered from different countries, and you have to communicate with suppliers using the primary language of each source country and you have to remunerate the suppliers using source-specific modes of payment. Here's how that might go (and in your mind, substitute any countries you want for the ones I mention - I'm not picking on anyone):</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">The eggs for the cake are to come from Japan, but there is a delay in procurement because you don't speak Japanese.</span></li><li><span style="font-family: arial;">The butter is to come from Australia, but the supplier will only send the butter after having received payment in coins that were sent via sailboat.</span></li><li><span style="font-family: arial;">The flour will come from a supplier in Germany. Your German is a little rusty but pretty good, so there's not much of a delay there.</span></li><li><span style="font-family: arial;">The sugar is to be sourced from Brazil, but your lack of familiarity with the ingredient-ordering user interface results in your being unable to locate a supplier.</span></li><li><span style="font-family: arial;">This all leads to your getting a late start in baking the cake, and on top of that the eggs went bad while you were waiting for the butter, and you never got the sugar. The people who were looking forward to consuming your confection had to wait a frustratingly long time to get a very un-tasty cake. Not good.</span></li></ul><span style="font-family: arial;">Now imagine a different scenario, in which a cake-ingredient-ordering front end abstracts the particulars of the ingredient suppliers (such as native language) and provides uniformity for payment and shipment. 
Using that front end, you get the ingredients you need - and <u>all</u> the ingredients you need - in a timely manner, and your cake consumers are delighted with the product of your kitchen, which satisfied their sweet-tooth needs and arrived at the right time.</span><p></p><p><span style="font-family: arial;">So it is with a data fabric: different data elements from different data sources are the “ingredients” that provide a complete (sometimes called a “360”) view of a subject of interest - be that customers, processes, supply chains, products, whatever. And here's the upshot: when the right (and <u>all</u> the right) data ingredients get to the right people at the right time, the result is <i><b>better</b></i>: better decisions, better and more timely applications, better outcomes.</span></p><p><span style="font-family: arial;">There is technology that can make the promise of data fabric a reality, but before getting into that I want to emphasize that data fabric is NOT just a matter of leveraging technology. I'd go so far as to say...</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Data fabric is culture</b></span></p><p><span style="font-family: arial;">There were people who said the same thing a few years ago about DevOps, and for the same reason: full and effective implementation of a data fabric can require new organizational roles and new ways of thinking about and managing data. To appreciate this assertion, consider the "personas" (i.e., the people-roles) associated with individuals who would work with, and in relation to, a data fabric. That exercise is facilitated if you think of </span><span style="font-family: arial;">a data fabric as something that enables a “data store,” in which people “shop for data.” For a traditional retail store, relevant personas include the following:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">Consumers acquire products from the store.</span></li><li><span style="font-family: arial;">Suppliers provide products for the store.</span></li><li><span style="font-family: arial;">A store manager decides which products should go on which shelves.</span></li><li><span style="font-family: arial;">A sales associate puts the right products on the right shelves.</span></li></ul><span style="font-family: arial;">OK, so what are the personas that have a relationship with the "data store" enabled by a data fabric? Some are listed below.</span><p></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">A <i>data consumer</i> might be a developer working on a new application, or a business analyst researching the viability of a proposed new product.</span></li><li><span style="font-family: arial;">A <i>database administrator</i> oversees a data source that supplies the data store.</span></li><li><span style="font-family: arial;">A <i>data curator</i> might make decisions on what data will be available through the data store, and to whom.</span></li><li><span style="font-family: arial;">A <i>data steward</i> might “stock the shelves” of the data store, based on decisions made by a data curator.</span></li></ul><span style="font-family: arial;">Look again at those last two personas in the list above - data curator and data steward. I can tell you for a fact that those roles exist today in multiple organizations - are they present in your workplace? 
And note: a data fabric's impact goes beyond new organizational roles - it involves new ways of thinking about data management. Here's what I mean: historically, data was often thought of in relation to where it was stored. That manner of thinking led to “silo” situations, and the difficulty of working with data in a “cross-silo” way interfered with organizations’ extracting maximum value from their data assets. By contrast, a data fabric will deliver the greatest benefit when it supports a data management approach that focuses more on <i>data itself</i>, and less on where data is stored. One implication of a data-centric (versus a data-source-centric) approach to data management is that data access decisions (i.e., who can access what data, and in what form) are made by <i>data</i> professionals (e.g., data curators), as opposed to being made by <i>database</i> professionals (e.g., DBAs). In such an environment, data <i>source</i> administrators are implementers of data access decisions made by data curators.</span><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">If a data fabric puts <i>data administration</i> (versus database administration) responsibility on data professionals (e.g., data curators), does that diminish the role of a Db2 for z/OS DBA? I would say it does not. I see this as being part of an ongoing evolution of the Db2 for z/OS DBA role to be more engaged in <i>application development </i>(for distributed systems DBAs, this role shift became widespread some years ago). This is a good thing. I am convinced (and more importantly, so are a lot of IT leaders at Db2 for z/OS sites) that the value a mainframe Db2 DBA delivers to an organization goes <u>up</u> when that DBA's work has more of an application-enabling focus.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Let me shift now from organizational impact to enabling technology.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>IBM's foundational data fabric-enabling technology</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">Multiple IBM offerings have a connection with data fabric, but the most foundationally important of these is called Cloud Pak for Data. Cloud Pak for Data's importance has a lot to do with IBM's point of view regarding data fabric implementation. We believe that a data fabric is most effectively implemented as an abstraction layer extended over <i>an existing data landscape</i>. Such an implementation approach acknowledges the significance of “data gravity” - the idea that data usage actions should flow to the data, rather than vice versa. A data fabric enabled via Cloud Pak for Data is characterized by “in-place” access to data on systems of origin. 
This approach delivers multiple benefits, including:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">Minimization of data replication costs.</span></li><li><span style="font-family: arial;">Protection of data security and consistency.</span></li><li><span style="font-family: arial;">Optimized performance.</span></li></ul><span style="font-family: arial;">Cloud Pak for Data itself can be thought of as a set of software-powered services that relate to access, governance and usage of data. Cloud Pak for Data can be deployed anywhere Red Hat OpenShift (a Kubernetes container platform) can be deployed: on-premise, in a private cloud or in a variety of public cloud environments (it is also available in a fully managed, as-a-service form). Cloud Pak for Data can be used with a wide range of data sources on Linux, UNIX, Windows and z/OS systems, and those data sources can be on-premise and/or in-cloud.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">How would Cloud Pak for Data be used by people in an organization? Here's one scenario: let's say that Val leads a development team that will soon begin work on a new application. </span><span style="font-family: arial;">To support this work, Val’s team will need access to some data (which happens to be in a Db2 for z/OS database) and associated metadata (data about the data). Val sends a request to this effect to Steve, a data curator. Steve is very familiar with the data that the new application will process. He logs in to Cloud Pak for Data's user interface and creates a project that will provide Val’s team with the data and metadata they need. Db2 for z/OS is one of <u>many</u> data sources supported by Cloud Pak for Data, and Steve creates a connection to the relevant Db2 system. Steve selects the particular tables holding the data that the new application will process and associates them with the project he created for Val's team. Steve also imports metadata for the selected tables, and enriches that metadata with statistical values, data quality scores and business terms. Finally, Steve creates a masking rule for sensitive data in a column of one of the selected Db2 tables - Val's team will be able to reference the column in their program code, but they will only see masked values when they view the column's contents. With the project created and the associated data assets published to a catalog to which Val and her teammates have access, the developers will be able to easily view the data and the related metadata, and this will enable them to move ahead quickly and productively with coding and testing.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The point I really want to make here is not so much, "Look what the data curator can do for the application development team." 
Even more important to me is the fact that, had Val's team needed access to data (and with it, associated metadata) in a Db2 for Linux/UNIX/Windows database, or a SQL Server database, or an Oracle database, or Apache Cassandra, or Amazon S3, or MariaDB or one of the myriad other data sources supported by Cloud Pak for Data, <i>the actions of the data curator would have been largely the same.</i> And, that would be the case for all kinds of other Cloud Pak for Data usage scenarios - a data scientist needing to develop and train a predictive model, a business person wanting to create a report with accompanying data visualizations, a data curator implementing new rules and policies concerning access to certain data assets, a data administrator virtualizing non-relational data to make it more easily accessible and consumable, whatever. <u>That</u>, as much as anything, is the "secret sauce" of a Cloud Pak for Data-enabled data fabric: it makes all kinds of data sources more easily accessible and effectively consumable by all kinds of people, without sacrificing data governance and security. And when more of an organization’s data assets are used more easily and effectively by more people, the organization <i>works better</i>.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>Data fabric is strategically really important for z/OS as a data-serving platform</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">The uniformity brought to a data landscape by a data fabric is of outsized importance in the context of z/OS as a data-serving platform. How so? Think about it. What gets in the way of z/OS-based data being more effectively - and more widely - used by people in an organization? Often, it's the perceived “other-ness” of the mainframe - the sense non-mainframe people have that z/OS-based data is inherently harder to access, understand and use than data on other platforms. Truth be told, that perception has, historically, been at least partly fact-based - it <u>has</u> been harder for many people to access and use z/OS-based data versus off-mainframe data. The great value, then, of an effectively implemented data fabric, from a z/OS perspective, is not so much that it makes z/OS-based data easier to access and use versus off-mainframe data; rather, it’s the fact that the data fabric makes z/OS-based data <u>as easy to access and use</u> as off-mainframe data. Why that's so powerful: while mainframe systems have been recognized for decades as being unmatched in terms of reliability, security, scalability, efficiency and performance, there have been plenty of people who would say, "Yeah, <b>but</b> mainframe-based data is hard to access and use." An effective data fabric eliminates that "yeah, but..."</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Let that sink in: by making discovery, understanding, consumption and usage of data in z/OS systems as easy as it is for data on other platforms, a data fabric makes IBM zSystems an even higher-value platform for an organization's most valuable data assets.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">If your organization has not yet looked at implementing an enterprise data fabric, now could be a good time to start down that path. 
And, the "in-place access to data on systems of origin" that characterizes a data fabric implemented with IBM's Cloud Pak for Data could well be the approach that will deliver maximum benefits in your environment. Give it some thought, and get engaged.</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-56672925263613356032022-07-26T19:29:00.000-07:002022-07-26T19:29:33.260-07:00What Should a Modern Db2 for z/OS Client-Server Application Environment Look Like?<p><span style="font-family: arial;">The distributed data facility (aka DDF) is the component of Db2 for z/OS that enables data access by applications that connect to Db2 via TCP/IP communication links. DDF has been around for over 30 years, but especially during the past 10 years or so DDF workloads have become very large at many Db2 for z/OS sites, with individual Db2 subsystems processing DDF transactions at sustained rates of over 4000 per second (and many more than that for Db2 data sharing groups running on Parallel Sysplex clusters of mainframe servers). For an ever-larger number of Db2-using organizations, the DDF workload is the largest - and the fastest-growing - component of the overall Db2 workload. Given the importance of DDF in the Db2 workload mix, it's worthwhile to consider what a modern Db2 client-server application environment should look like. In looking over the Db2 DDF scene in recent years I've seen a lot of things that I like. In this blog entry I'll share Db2 DDF application environment characteristics that get a thumbs-up from me.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Straight from the app servers to Db2 (for DRDA requesters)</b></span></p><p><span style="font-family: arial;">For a number of years, organizations have been migrating away from the use of Db2 Connect "gateway servers" (application server to Db2 Connect gateway server, Db2 Connect gateway server to Db2 for z/OS system) in favor of direct connections from application servers to Db2 for z/OS systems by way of the IBM Data Server Driver (at many sites this transition is already complete). When access to Db2 for z/OS from DRDA requester applications is accomplished through the IBM Data Server Driver, "Db2 Connect" becomes, essentially, just a product licensing term, versus an actual product used - this because entitlement to use the IBM Data Server Driver is provided through an organization's Db2 Connect license (so, if an organization is licensed for Db2 Connect Unlimited Edition for System z, that organization can deploy the IBM Data Server Driver in an unlimited way for applications that access the mainframe system(s) associated with the Db2 Connect license).</span></p><p><span style="font-family: arial;">There are several advantages to going with the direct connection to Db2 for z/OS versus going through a Db2 Connect gateway server. One is performance: with the "hop" to a Db2 Connect gateway server eliminated, better response times and throughput can be achieved. Another direct-connection benefit is <a href="http://robertsdb2blog.blogspot.com/2015/02/of-db2-connect-gateway-servers-and-db2.html"><span style="color: #2b00fe;">improved problem diagnosis capabilities</span></a> - error messages have more-specific meaning when the network-connected server that is "adjacent" to Db2 for z/OS is an application server, versus a Db2 Connect gateway server. 
The direct connection approach also tends to make Db2 client configuration and upgrade work more straightforward.</span></p><p><span style="font-family: arial;">Note my mention that this Db2 Connect gateway versus direct Db2 for z/OS connection matter is relevant for DRDA requester applications. It is not pertinent to clients that utilize the REST interface to Db2 for z/OS, as such interactions do not involve the DRDA protocol. See below for more information about Db2 REST clients.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Greater use of the Db2 for z/OS REST interface (which is part of DDF functionality)</b></span></p><p><span style="font-family: arial;">By "greater use" I don't mean to suggest that the REST interface to Db2 for z/OS is somehow "better" than the DRDA interface (which I like to call the SQL interface to Db2, as a DRDA requester application issues Db2-targeting SQL statements). The REST interface is different versus the SQL interface, and sometimes that difference makes it a good choice for a Db2-accessing client-server application. I wrote <a href="http://robertsdb2blog.blogspot.com/2018/09/the-two-paths-to-db2-for-zos.html"><span style="color: #2b00fe;">a blog entry a few years ago</span></a> with a lot of compare-and-contrast information about the REST and SQL interfaces to Db2, and I won't repeat all that here. To Db2 for z/OS DBAs, I'd say this: 1) make sure your application developers know that Db2 has a built-in REST interface, and 2) be ready to help support use of the REST interface when that is the choice of a development team. Sometimes, developers - even those who have strong SQL skills - have a preference for the REST architectural style, often because it so fully abstracts the particulars of service-providing systems.</span></p><p><span style="font-family: arial;">If you do make use of Db2's REST interface, and think you might expand on that in the future, consider what <a href="https://www.ibm.com/products/zos-connect"><span style="color: #2b00fe;">IBM z/OS Connect</span></a> could do for your organization. When Db2 for z/OS is accessed through z/OS Connect, it's still Db2's REST interface that's being used (Db2 in that case is a REST provider to z/OS Connect), but z/OS Connect provides some important benefits: it makes creation of z/OS-based REST services easier, it provides richer "service discovery" information to client application developers, it adds flexibility to the formatting of service-output JSON documents, and it provides a single access point through which all manner of z/OS-based programmatic assets can be invoked through REST requests - not only Db2 SQL statements and stored procedures, but also CICS and IMS transactions (which might or might not involve access to Db2) and JES batch jobs.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Leveraging SQL procedure language (SQL PL)</b></span></p><p><span style="font-family: arial;">SQL PL is for Db2 (for z/OS and for Linux/UNIX/Windows) what T-SQL is for SQL Server and what PL/SQL is for Oracle - a way to write data-processing programs using only SQL statements. SQL PL makes that do-able via a set of SQL statements called control statements - "control" being short for "logic flow control." Among these statements are ITERATE, WHILE, GOTO, IF and LOOP - you get the idea. 
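</span></p><p><span style="font-family: arial;">(To give you the flavor, here is a minimal native SQL procedure sketch - the names and the business logic are hypothetical, and the routine exists purely to show a couple of SQL PL control statements in action:)</span></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><div><span style="color: #800180; font-family: courier;">CREATE PROCEDURE MYSCHEMA.ADJUST_CREDIT_LIMITS<br /> (IN P_PCT DECIMAL(5,2))<br /> LANGUAGE SQL<br />BEGIN<br /> DECLARE V_PCT DECIMAL(5,2);<br /> SET V_PCT = P_PCT;<br /> IF V_PCT > 10.00 THEN -- cap the increase at 10 percent<br />  SET V_PCT = 10.00;<br /> END IF;<br /> UPDATE MYSCHEMA.ACCOUNTS<br />  SET CREDIT_LIMIT = CREDIT_LIMIT * (1 + V_PCT / 100);<br />END</span></div></blockquote><p><span style="font-family: arial;">That is the entire program - no compile-and-link of COBOL, Java or any other external-language code is involved.</span></p><p><span style="font-family: arial;">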
There are all kinds of reasons for using SQL PL, one being related to cost-of-computing: when a SQL PL routine (such as a native SQL procedure) is invoked through Db2's distributed data facility - whether through a SQL CALL issued by a DRDA requester or via a REST request - its execution is up to 60% offload-able to zIIP engines (mainframe processors that cost less than general-purpose processors and do not factor into the determination of z/OS software charges).</span></p><p><span style="font-family: arial;">Besides the economic advantage of SQL PL for DDF-using applications, there are functional advantages. For example, a native SQL procedure (a stored procedure written in SQL PL) - and only a native SQL procedure - can be created (or altered) with the AUTONOMOUS option, which means that if the calling transaction fails and is rolled back by Db2, the data-changing actions (e.g., INSERT/UPDATE/DELETE) performed by the autonomous procedure <i>will not be rolled back</i> (this can make autonomous procedures very useful for "transaction initiation audit trail" purposes - you can use an autonomous procedure to record the fact that a transaction got started, and that information will be preserved even if the transaction ends up failing). SQL PL routines can also accept Db2-defined arrays as input, whereas external Db2 routines (written in languages such as COBOL) cannot.</span></p><p><span style="font-family: arial;">Something else to consider: if you're using SQL PL only for stored procedure programs, you're missing out. <a href="http://robertsdb2blog.blogspot.com/2013/09/db2-10-for-zos-take-advantage-of-native.html"><span style="color: #2b00fe;">SQL PL can also be used to write user-defined functions</span></a>, and a SQL PL routine can be included in the body of an <a href="http://robertsdb2blog.blogspot.com/2017/08/db2-12-for-zos-sql-enhancements.html"><span style="color: #2b00fe;">advanced trigger</span></a> (advanced triggers were introduced with Db2 12 for z/OS).</span></p><p><span style="font-family: arial;"><u>And</u>, you should take note of how the <a href="http://robertsdb2blog.blogspot.com/2021/07/create-or-replace-agile-deployment-of.html"><span style="color: #2b00fe;">CREATE OR REPLACE PROCEDURE</span></a> syntax introduced with Db2 12 function level 507 can enable greater agility when it comes to deploying Db2 stored procedure programs, especially those written in SQL PL.</span></p><p><span style="font-family: arial;"><u>And</u>, you should be managing SQL PL source code (that would be, in the case of native SQL procedures, the associated CREATE PROCEDURE statements) with a source code management (SCM) tool - the SYSROUTINES table in the Db2 catalog is not intended to be a SQL PL SCM. Which SCM? Whichever one(s) your organization's developers use to manage their source code - that could be a vendor-supplied SCM or an open-source tool such as Git.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Using the right (no-charge) Db2 SQL development tool</b></span></p><p><span style="font-family: arial;">If you (or others in your organization) are using IBM Data Studio for Db2 for z/OS SQL testing and for SQL PL routine development and debugging, it's time for a change. 
IBM's strategic replacement for Data Studio is the (also no-charge) <a href="http://robertsdb2blog.blogspot.com/2021/10/update-on-no-charge-ibm-tools-for-db2.html"><span style="color: #2b00fe;">Db2 for z/OS Developer Extension for Visual Studio Code</span></a> (also available for the Eclipse Theia IDE).</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Leveraging the Db2 profile tables</b></span></p><p><span style="font-family: arial;">The <a href="http://robertsdb2blog.blogspot.com/2017/01/are-you-using-system-profile-monitoring.html"><span style="color: #2b00fe;">Db2 for z/OS profile tables</span></a> - SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES - can be very helpful for a DRDA requester application workload. For one thing, they can be used to specify <i>application-specific</i> limits on concurrent DBAT (DDF thread) usage and/or connections established with a Db2 system and/or idle thread time - handy when the system-wide DBAT, connection and idle thread limits established via the ZPARM parameters MAXDBAT, CONDBAT and IDTHTOIN are not as granular as you need them to be. The Db2 profile tables can also be used to set the value of a number of Db2 special registers and/or built-in global variables, automatically when an application connects to the Db2 system. <a href="http://robertsdb2blog.blogspot.com/2018/07/db2-for-zos-using-profile-tables-to.html"><span style="color: #2b00fe;">One example of this kind of profile table usage</span></a> is setting the value of the CURRENT PACKAGE PATH special register to point a DRDA requester application to a collection in which the IBM Data Server Driver packages are bound with RELEASE(DEALLOCATE), as a means of getting high-performance DBAT functionality for the application.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Db2 for z/OS DBAs being heavily involved in application development</b></span></p><p><span style="font-family: arial;">In way too many cases, DRDA requester or Db2 REST interface-using applications are developed with little Db2 DBA involvement, until late in the game when a production deadline is looming and physical implementation of tables is done in a rushed and sub-optimal way. Logical database design may also have happened with little DBA input, with negative consequences down the road. This situation is typically not a result of application developers giving Db2 DBAs the cold shoulder. Rather, my observation has been that some Db2 for z/OS DBAs view developers as a nuisance or an irritant - as "them." Wrong mindset. Way wrong. Db2 for z/OS DBAs maximize the value they deliver to an organization when they team with developers at the very early stages of an application development project. Not only can that help to ensure a logical and physical database design that will deliver optimal benefits for the application (and for application users), it also provides an opportunity for DBAs to ensure that developers are aware of Db2 features - temporal data support, transparent archiving, the REST interface, autonomous native SQL procedures, non-traditional data types (e.g., XML), global variables, newer built-in functions (e.g., LISTAGG), advanced triggers, whatever - that could enable and accelerate development of functionality of importance for an application. My advice is for Db2 for z/OS DBAs to think of themselves as part of the extended development team for Db2-accessing applications. 
That approach can be especially effective for modern Db2 client-server applications.</span></p><p><span style="font-family: arial;">I hope that the information in this blog entry will be useful for you. As always, thanks for stopping by.</span></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-65817952238208614362022-06-26T20:14:00.000-07:002022-06-26T20:14:15.704-07:00Db2 for z/OS: The Online Path from a Partition-by-Growth Table Space to Partition-by-Range<p><span style="font-family: arial;">Last month, <a href="http://robertsdb2blog.blogspot.com/2022/05/db2-for-zos-online-path-from-multi.html"><span style="color: #2b00fe;">I posted to this blog an entry</span></a> on the long-awaited capability to migrate tables, in an online way, from a multi-table segmented or simple table space to multiple single-table partition-by-growth table spaces (a capability delivered with function level 508 of Db2 12 for z/OS). This month, I'll describe an even newer Db2 feature that also enables online migration of tables from one table space type to another. This feature, introduced with function level 500 of Db2 13 for z/OS, allows a DBA to migrate a table from a partition-by-growth table space to a partition-by-range table space with an ALTER TABLE statement and a subsequent online REORG. Read on to learn more.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The motivation</b></span></p><p><span style="font-family: arial;">When the universal partition-by-growth (PBG) table space type was introduced with Db2 9 for z/OS, the response among a lot of DBAs was very enthusiastic, and there was good reason for this: it enabled a table to grow beyond 64 GB in size without having to be range-partitioned. Range-partitioning a table, after all, requires a good bit of up-front analysis. What should be the table's partitioning key? How many partitions should the table have? What should be the limit key value for each partition? By contrast, a PBG table space has more of a "set it and forget it" quality - you just determine the appropriate DSSIZE value for the table space (the maximum size for a partition of the table space), and a maximum number of partitions (the MAXPARTITIONS specification - easily changed at a later time if need be), and you're done. If the table space's DSSIZE value is, for example, 16G (i.e., 16 GB), when partition 1 reaches that size then Db2 will automatically add a second partition for the table space, and when that one hits 16 GB then a third partition will be added by Db2, and so on. Easy.</span></p><p><span style="font-family: arial;">Ah, but there came to be some "buyer's remorse" at more than a few Db2 for z/OS sites as certain PBG table spaces got larger and larger. Why? Because the larger a table gets, the more advantageous it can be to have the table in a partition-by-range (PBR) table space. I described these PBR versus PBG advantages (for large tables) <a href="http://robertsdb2blog.blogspot.com/2015/05/for-large-db2-for-zos-table-should-you.html"><span style="color: #2b00fe;">in an entry I posted to this blog</span></a> a few years ago. They include potentially greater (maybe much greater) insert throughput, thanks to the ability to have multiple row-inserting processes execute concurrently for different partitions of the PBR table space; great suitability for data rows managed on a time-series basis; and maximum partition-level utility independence. 
Here was the dilemma, though: prior to Db2 13, the only way to get a table from a PBG to a PBR table space was to unload the table, drop the table, re-create the table in a PBR table space, and reload the table's data. You had, then, this irritating situation: the advantages of PBR versus PBG would be more pronounced as a table got larger, but changing from PBG to PBR was more challenging as a table got larger, due to the unload/drop/re-create/re-load requirement.</span></p><p><span style="font-family: arial;">Enter Db2 13, and this situation changes, big-time.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>The Db2 13 difference</b></span></p><p><span style="font-family: arial;">Let's say you have table BIGTAB in a PBG table space, and you'd really like for BIGTAB to be in a PBR table space. In a Db2 13 system (Db2 13 became generally available on May 31 of this year), with function level 500 (or higher) activated, you can issue the following SQL statement (I have highlighted the new syntax in green, and I am assuming that the ACCT_NUM column of BIGTAB is the desired partitioning key):</span></p><p><span style="font-family: courier;"><span style="color: #800180;">ALTER TABLE BIGTAB</span><br /> <span style="color: #1bb456;">ALTER PARTITIONING TO PARTITION BY RANGE (ACCT_NUM)<br /> (PARTITION 1 ENDING AT (199),<br /> PARTITION 2 ENDING AT (299),<br /> PARTITION 3 ENDING AT (399),<br /> PARTITION 4 ENDING AT (MAXVALUE));</span><br /></span><br /><span style="font-family: arial;">That ALTER is a pending change. When an online REORG is subsequently executed for BIGTAB's table space, coming out of that online REORG the BIGTAB table will be in a PBR table space. Done. The table will have the same indexes that it had before, and it'll be immediately available for access by users and programs.</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Some additional information, and considerations</b></span></p><p><span style="font-family: arial;">Here are a few things to keep in mind:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">The partitions of the new PBR table space will initially have the same DSSIZE as the PBG table space that's being migrated to PBR, and that's kind of important. Here's why: when you're determining the partitioning scheme for the new PBR table space you need to consider whether all the rows that belong to a given partition (per the partitioning scheme of the PBR table space) will fit in the partition, given the DSSIZE. Suppose, for example (and continuing with the BIGTAB table referenced previously), that the DSSIZE value for BIGTAB's PBG table space is 4G, and the number of rows in BIGTAB with an ACCT_NUM value greater than 199 and less than or equal to 299 (i.e., rows that would go into partition 2 of the new PBR table space) will not fit into a 4 GB data set. In that case the online REORG after the ALTER will fail. To avoid that failure, you'd need to either change the partitioning scheme so that the rows assigned to a given partition will fit in a 4 GB data set, or change the DSSIZE value of BIGTAB's PBG table space to something larger than 4G. 
If you decide on the latter action (increase the DSSIZE value for BIGTAB's PBG table space), understand that you'll need to issue that ALTER for the table space (to go to a larger DSSIZE value) and then execute an online REORG to materialize that change, and <u>then</u> issue the ALTER to change from PBG to PBR and execute another online REORG to materialize that pending change. Why two online REORGs? Because, when you take action to change a PBG table space to PBR the online way, there can't be any other outstanding (i.e., not yet materialized) pending changes for the PBG table space - the change to PBR has to be the only pending change for the PBG table space.</span></li></ul><div style="text-align: left;"><ul style="text-align: left;"><li><span style="font-family: arial;">The new PBR table space will use relative page numbering (RPN), which was introduced with Db2 12 for z/OS. This is a very good thing. To see why, check out <a href="http://robertsdb2blog.blogspot.com/2020/08/db2-12-for-zos-rpn-table-spaces-new.html"><span style="color: #2b00fe;">the blog entry I wrote about RPN</span></a> a couple of years ago.</span></li></ul><div><ul style="text-align: left;"><li><span style="font-family: arial;">As is very often the case when a pending change is materialized, the online REORG that changes a PBG table space to PBR will invalidate packages dependent on the associated table. You can identify those dependent packages by querying the SYSPACKDEP table in the Db2 catalog.</span></li></ul></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">This ALTER + online REORG route from PBG to PBR is not available for a table that has an XML or a LOB column.</span></li></ul><br /></div><div><span style="font-family: arial;"><b>A closing thought</b></span></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;">If you are on Db2 12 and you have some PBG table spaces that you'd like to change - easily, and in an online way - to PBR, the enhancement I've described herein could be a good reason for getting your Db2 13 migration project going.</span></div></div><p></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-85339227454303141932022-05-30T19:45:00.000-07:002022-05-30T19:45:08.337-07:00Db2 for z/OS: The Online Path from a Multi-Table Table Space to Universal Table Spaces<p><span style="font-family: arial;">Back in 2014, I posted an entry to this blog on the topic of <a href="https://robertsdb2blog.blogspot.com/2014/06/db2-for-zos-getting-to-universal-table.html"><span style="color: #2b00fe;">getting to universal table spaces</span></a> from non-universal table spaces. In that entry, I noted that there was an online path (ALTER followed by online REORG) for getting from a "classic" partitioned table space to a universal partition-by-range (PBR) table space, and for getting from a <u>single-table</u> simple table space or traditional segmented table space to a universal partition-by-growth (PBG) table space. 
I also pointed out that no such online path to universal table spaces existed for a multi-table table space: <i>"For a simple or segmented table space containing multiple tables, you'd have to either go the unload/drop/create/re-load route (with one PBG universal table space created for each table in the multi-table simple or segmented table space), or wait and see if a future release of Db2 provides a non-disruptive universal table space conversion process for multi-table simple and segmented table spaces (this is a known requirement)."</i> The wait for the hoped-for Db2 enhancement ended in October of 2020, when <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=d1fl-function-level-508-activation-enabled-by-apar-ph29392-october-2020"><span style="color: #2b00fe;">Db2 12 function level 508</span></a> became available (via the fix for APAR PH29392). In this blog entry, I will describe how a capability introduced with Db2 12 function level 508 enables <u>online</u> migration of tables from multi-table table spaces to universal PBG table spaces.</span></p><p><span style="font-family: arial;">For illustrative purposes, let's say that you have a traditional segmented table space containing four tables (I say, "<i>traditional</i> segmented table space" because universal table spaces are also segmented). The tables are named T1, T2, T3 and T4. You have function level 508 (or later) activated on your Db2 12 system (or you have a Db2 13 system). How do you get tables T1, T2, T3 and T4 from the traditional segmented table space into universal PBG table spaces, in an online way? Here's how:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">First, create a table space for each of the tables T1, T2 and T3 (I'll get to T4 momentarily). Here's the form of the CREATE TABLESPACE statement you should use for this purpose (let's assume that the new table spaces will be named TS1, TS2, etc.):</span></li></ul><br /><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><span style="color: #800180; font-family: courier;"><b>CREATE TABLESPACE TS1<br /> IN <i>dbname</i><br /> ...<br /> MAXPARTITIONS 1<br /> DEFINE NO<br /> DSSIZE xxx;</b></span></blockquote><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p><span style="font-family: arial;">Note: the database for each of these new PBG table spaces will need to be the same as the database of the multi-table table space of interest (similarly, the CCSID of each of the new table spaces will have to be the same as the CCSID of the multi-table table space of interest). Also, MAXPARTITIONS 1 is required, at least initially (you can change the MAXPARTITIONS value later if desired). DEFINE NO is also required (the table space data sets will be created by Db2 later). 
DSSIZE can be any value appropriate for the tables that will be moved to the new table spaces (consider that 64G would work for any table, since a traditional segmented table space cannot exceed 64 GB in size).</span></p></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;">Next, issue the following ALTER statement for the multi-table traditional segmented table space (the new ALTER TABLESPACE option introduced with Db2 12 function level 508 is highlighted in green):</span></li></ul><p></p><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><span style="font-family: courier;"><b><span style="color: #800180;">ALTER TABLESPACE <i>dbname.source-table-space-name</i></span><br /> <span style="color: #25b416;">MOVE TABLE T1 TO TABLESPACE <i>dbname</i>.TS1;</span></b></span></blockquote><blockquote style="border: none; margin: 0 0 0 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Note that this is a pending change for the source table space - the actual table move will be effected via a subsequent online REORG of the source table space, as explained below. The statement above would be executed as well for tables T2 and T3 (I haven't forgotten about table T4 - I'll get to that). Keep in mind that, starting with Db2 12, the APPLCOMPAT package bind specification applies to DDL as well as DML statements. What does that mean for the ALTER TABLESPACE statement shown above? It means that the package through which the ALTER is issued (e.g., a DSNTEP2 package, or a SPUFI package) needs to have an APPLCOMPAT value of V12R1M508 or higher.</span></p></blockquote><p style="text-align: left;"></p><ul style="text-align: left;"><li><span style="font-family: arial;">OK, online REORG time. An online REORG executed for the source table space (the one in which tables T1, T2, T3 and T4 had been located) will cause each table for which an ALTER TABLESPACE with MOVE TABLE has been executed to be relocated to its designated PBG table space. When the online REORG has completed, each relocated table will be ready to use - it will have its indexes and everything.</span></li></ul><div style="text-align: left;"><span style="font-family: arial;">So, what about table T4? You have a choice here. One option would be to do for table T4 what you did for tables T1, T2 and T3: create a new PBG table space for the table, and execute an ALTER TABLESPACE with MOVE TABLE T4. And the other option? Well, consider the situation after you've moved tables T1, T2 and T3 to their respective PBG table spaces. The source table space, which formerly held four tables, now holds only one table: T4. What does that mean? It means that you can alter the source table space with a MAXPARTITIONS value and then online REORG it to convert it to a PBG table space - you've been able to do that for a single-table traditional segmented table space or a single-table simple table space since Db2 10 for z/OS.</span></div><div style="text-align: left;"><span style="font-family: arial;"><br /></span></div><div style="text-align: left;"><span style="font-family: arial;">Here are a few things to keep in mind with regard to online migration of tables from a multi-table table space to PBG table spaces:</span></div><div style="text-align: left;"><ul style="text-align: left;"><li><span style="font-family: arial;">Moving tables from multi-table table spaces to PBG table spaces is likely to mean an increase in the number of data sets for the Db2 system. 
Given that a table's database can't change when it goes from a multi-table table space to a PBG table space, you'll want to make sure that the number of OBIDs (object identifiers) for the database in question will not exceed the limit of 32,767. <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=design-identifying-databases-that-might-exceed-obid-limit"><span style="color: #2b00fe;">A page in the Db2 for z/OS documentation</span></a> contains a query that you can issue to identify databases that might be getting close to the OBID limit.</span></li><li><span style="font-family: arial;">More data sets could also lead you to increase the size of the DBD cache in the EDM pool (the associated ZPARM parameter is EDMDBDC), and/or to increase the Db2 subsystem's DSMAX value. Regarding the DBD cache, you generally want the ratio of "DBD requests" to "DBD not found" (referring to fields in a Db2 monitor-generated statistics long report, or in an online display of EDM pool activity) to be at least in the tens of thousands to one. As for DSMAX, you usually want that value to be sufficiently high so as to either not be reached (per your Db2 monitor) or so that only a few data sets per hour are closed as a result of hitting the DSMAX limit (again, per your Db2 monitor).</span></li><li><span style="font-family: arial;">An online REORG that materializes a MOVE TABLE pending change will invalidate packages that depend on the table or tables being moved, so plan for rebinding those packages (the SYSPACKDEP catalog table contains information to identify dependent packages).</span></li><li><span style="font-family: arial;">Additionally, an online REORG that materializes a MOVE TABLE pending change will operate on both the source and target table spaces; accordingly, inline image copies will be created for those table spaces. These will establish a recovery base for the objects, but note that after execution of the table-moving online REORG you will not be able to recover the source table space to a point in time prior to the table-moving online REORG.</span></li><li><span style="font-family: arial;">Don't worry if a source table space holds hundreds of tables (as some do in the real world) - there's no requirement that all the tables be moved to PBG table spaces in one fell swoop. You can move a few at a time, no problem. Just keep in mind that an online REORG of the source table space will move every table that has not already been moved and for which there is a pending MOVE TABLE change.</span></li></ul><span style="font-family: arial;">There you have it. Universal table spaces are what you want, and you now have an online way to get there for your multi-table table spaces. Happy moving.</span></div><p></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-79110634795393349162022-04-28T12:29:00.000-07:002022-04-28T12:29:32.876-07:00Db2 for z/OS: Answering Some Questions About DBATs (i.e., DDF Threads)<p><span style="font-family: arial;">Sometimes, a DBA will email me a question about </span><span style="font-family: arial;">Db2 for z/OS, and I'll respond in a pretty comprehensive way, and I'll look at that outbound message and think to myself, "Hmm. I may have written most of a blog entry there." This is one of those cases. 
I recently got a couple of questions about Db2 database access threads, or DBATs (threads associated with access to a Db2 subsystem through the distributed data facility, aka DDF), and I think the questions and answers might be of interest to a lot of people in the Db2 community. I am therefore packaging them in this blog entry. Off we go:</span></p><p><span style="font-family: arial;"><b><br /></b></span></p><p><span style="font-family: arial;"><b>Question: When does a DBAT go from being <u>active</u> to <u>idle</u>, so that it is subject to the idle thread timeout setting for the Db2 subsystem?</b></span></p><p><span style="font-family: arial;"><i>Answer:</i> OK, the first thing to understand here is that a DBAT <i>is always active</i>. Even when it's in the DBAT pool, a DBAT is active - it's just in a so-called disconnected state. A DBAT is <u>idle</u> when it is in-use (i.e., paired with a connection, which happens when a transaction comes along by way of that connection) <u>and</u> it's not doing anything (or, more accurately, the transaction that is using the DBAT appears to Db2 to be doing nothing). It's normal for there to be some idle thread time for a DDF transaction - a client-side program issues a SQL statement, the result is sent back to that program, and the DBAT is briefly idle until the transaction's next SQL statement is issued. No big deal there. It's when idle time becomes really elongated that a DBAT might be affected by the Db2 subsystem's idle thread timeout value. That timeout value is specified via the IDTHTOIN parameter in the Db2 DSNZPARM module. The default value for IDTHTOIN is 120 seconds (check out the value on your Db2 system, and see if it's set to something other than 120).</span></p><p><span style="font-family: arial;">Normally, at end-of-transaction there is a commit, and at that time the DBAT that had been used in processing the transaction goes back to the DBAT pool and the connection with which the DBAT had been paired goes back to an inactive state (inactive connections, a key contributor to Db2's connection scalability, are a server-side thing, invisible to a connected application - an inactive connection will go back to an active state when the next transaction associated with the connection begins). Can a DBAT in the pool be affected by the Db2 system's idle thread timeout value? No, but it is subject to a limit specified by another ZPARM parameter called POOLINAC (more on that to come).</span></p><p><span style="font-family: arial;">Let's say that a DDF transaction starts but then never commits. That could happen because of a problem on the client application side, or it could be that the developer of the transaction program decided that a commit is not necessary because the transaction is read-only in nature (that in fact would be a not-good decision - every DDF transaction needs to commit, because even a read-only transaction will hold one or more table space or partition locks and one or more claims on database objects, and those locks and claims will not be released without a commit). Because the transaction has not committed, it is perceived by Db2 to be still in-flight, and for that reason the transaction's DBAT can't be separated from the associated connection and returned to the DBAT pool. The apparently (to Db2) in-flight transaction continues to do nothing, and the related DBAT remains idle for a longer and longer period of time. 
Eventually the IDTHTOIN limit will be reached for the idle thread </span><span style="font-family: arial;">(unless IDTHTOIN is set to 0, which means a DBAT can remain indefinitely idle)</span><span style="font-family: arial;">, and Db2 terminates the DBAT and the associated connection.</span></p><p><span style="font-family: arial;">So, to recap: first, a DBAT does not go from active to idle, because a DBAT is always considered to be active - it's an in-use DBAT, as opposed to an in-the-pool DBAT, that can be idle. Second, an in-use DBAT will typically have at least some idle time (time when it seems to Db2 that the transaction associated with the DBAT is not doing anything - or, at least, not doing any SQL-related thing); it's when that "nothing SQL-related is happening" time gets long that the Db2 idle thread timeout limit can be reached for a DDF transaction and its DBAT.</span></p><p><span style="font-family: arial;">[By the way, I mentioned earlier that when Db2 terminates a DDF transaction and its DBAT due to the idle thread timeout limit being reached, Db2 also terminates the connection with which the DDF transaction had been associated. If you'd prefer for Db2 to preserve the connection while terminating the transaction and the DBAT, you can get that behavior thanks to an enhancement introduced with Db2 12 for z/OS. The enhancement is enabled via specification of </span><span style="font-family: arial;">EXCEPTION_ROLLBACK as an attribute of a MONITOR IDLE THREADS row in the </span><span style="font-family: arial;">Db2 table SYSIBM.DSN_PROFILE_ATTRIBUTES. You can find more information about this enhancement in the Db2 for z/OS online documentation, at </span><span style="font-family: arial;"><a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=threads-monitoring-idle-by-using-profile-tables"><span style="color: #2b00fe;">https://www.ibm.com/docs/en/db2-for-zos/12?topic=threads-monitoring-idle-by-using-profile-tables</span></a>.]</span></p><p><span style="font-family: arial;"><br /></span></p><p><span style="font-family: arial;"><b>Question: We had a DDF transaction surge, and as a result the number of DBATs went way up. The surge passed, and several minutes later I checked on the number of DBATs and it was still way high. What's with that</b></span><span style="font-family: arial;"><b>?</b></span></p><div><span style="font-family: arial;"><i>Answer: </i>There were (I'm pretty sure) two factors involved here. First, the POOLINAC value. That's a ZPARM parameter. If a DBAT in the pool has gone a POOLINAC number of seconds without being reused for a transaction, that DBAT will be subject to termination by Db2. For the Db2 subsystem looked after by the DBA who asked me this question, the POOLINAC value was 900 seconds, considerably higher than the default value of 120 seconds (I personally favor setting POOLINAC to the default value of 120 and leaving it there unless there's a good reason to make a change). 
A high POOLINAC value will definitely slow down the trimming of the number of pooled DBATs after the passing of a DDF transaction surge, but I think something else was going on, as well.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">That "something else" was a change in Db2's termination of "too-long-in-the-pool-without-reuse" DBATs, effected by the application of the fix for Db2 APAR <a href="https://www.ibm.com/support/pages/apar/PH36114"><span style="color: #2b00fe;">PH36114</span></a> (that fix came out in June of 2021, and I say that "I think" the fix was involved in this situation, rather than "I know," because I did not verify that the fix was applied to the Db2 subsystem in question - I'm inferring that based on the behavior reported by the DBA). Here's the deal: prior to the change associated with PH36114, Db2 would check the DBAT pool every two minutes to see if any DBATs in the pool had been there for a POOLINAC number of seconds without being reused. All of the "too-long-in-the-pool-without-reuse" DBATs found in a given check were terminated by Db2. If a lot of DBATs went back to the pool at around the same time following the rapid subsidence of a DDF transaction surge, Db2 might find in a subsequent pooled DBAT purge cycle that a lot of DBATs needed to be terminated at one time due to the POOLINAC limit being exceeded. With the ZPARM parameter REALSTORAGE_MANAGEMENT set to AUTO or ON (and AUTO is the default), terminating a lot of DBATs at one time could put a good bit of pressure on the z/OS LPAR's ESQA resource, which in turn could cause spill-over into ECSA, which in turn could be bad news for an LPAR with only a small cushion of unused ECSA space.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">With the fix for PH36114 applied, two things changed in this picture:</span></div><div><ol style="text-align: left;"><li><span style="font-family: arial;">Db2 went from checking every two minutes for DBATs with "too-long-in-the-pool-without-reuse" status to doing that every 15 seconds.</span></li><li><span style="font-family: arial;">In a given purge cycle (again, now once every 15 seconds), Db2 will terminate a maximum of 50 DBATs in the "too-long-in-the-pool-without-reuse" category.</span></li></ol><span style="font-family: arial;">What this means: a big pile of pooled DBATs left over from a since-passed DDF transaction surge will be worked down more frequently <u>and</u> more gradually. That could somewhat elongate the process of finalizing the right-sizing of the DBAT pool for a now-back-to-normal volume of DDF transactions, but it will avoid the pressure on ESQA that could result from the more-aggressive purging of "too-long-in-the-pool-without-reuse" DBATs that Db2 did prior to the PH36114 fix. It's a good trade-off, in my opinion.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Maybe you now know a few things about DBATs that you didn't know before. 
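Before I close, here is a hedged sketch of the profile table set-up for the Db2 12 idle-thread enhancement I mentioned earlier (the EXCEPTION_ROLLBACK option) - the profile ID, IP address and timeout value below are made up purely for illustration, so check the Db2 documentation for the particulars:</span></div><div><span style="font-family: courier;">-- Illustrative only: a profile for requests from one client IP address<br />INSERT INTO SYSIBM.DSN_PROFILE_TABLE<br /> (PROFILEID, LOCATION, PROFILE_ENABLED)<br /> VALUES (101, '192.0.2.10', 'Y');<br /><br />-- Time out idle threads after 300 seconds, rolling back the transaction<br />-- and terminating the DBAT, but preserving the connection<br />INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES<br /> (PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)<br /> VALUES (101, 'MONITOR IDLE THREADS', 'EXCEPTION_ROLLBACK', 300);<br /><br />-- Put the profile into effect<br />-START PROFILE</span></div><div><span style="font-family: arial;">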
Thanks for visiting the blog, and I hope you'll return sometime.</span></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com4tag:blogger.com,1999:blog-4516533711330247058.post-18926674214467931362022-03-09T18:57:00.000-08:002022-03-09T18:57:37.816-08:00Thoroughly Assessing Data Security in a Db2 for z/OS Environment - Part 2<p><span style="font-family: arial;">In <a href="https://robertsdb2blog.blogspot.com/2022/02/thoroughly-assessing-data-security-in.html"><span style="color: #2b00fe;">part 1</span></a> of this two-part blog entry on thoroughly assessing data security in a Db2 for z/OS environment, I covered four aspects of Db2 data protection: privilege management, client authentication, data encryption and column masks/row permissions. In this part 2 entry we'll take a look at auditing, application architecture, test data management and RACF (or equivalent) management of Db2-internal security.</span></p><p><span style="font-family: arial;">Off we go:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Auditing -</b> You can manage Db2 privileges in a careful and responsible way, but at the end of the day users have to have <u>some</u> privileges in order to perform assigned duties, and some users are going to require extensive privileges. Privileges that were properly granted can be abused. The best defense against that possibility is effective auditing of users' data access activities. What you want in this case is to identify improper data access activity so that steps can be taken to shut it down. An important assist in this area was delivered with Db2 10 for z/OS, which introduced audit policy functionality. Db2's audit policy capability enables you to monitor (among other things):</span></li><ul><li><span style="font-family: arial;"><i>Occurrences of access actions that failed due to inadequate authorization</i> (if a particular user is getting a pretty good number of these, that could indicate attempts to probe for "holes" in your organization's data protection measures).</span></li><li><span style="font-family: arial;"><i>Occurrences of a user changing his or her SQL ID </i>(there are times when this is a legitimate action, and times when it is not)</span></li><li><span style="font-family: arial;"><i>Occurrences of tables being altered</i> (depending on the ALTER action, this could be an attempt to circumvent a data protection measure).</span></li><li><span style="font-family: arial;"><i>Occurrences of a particular table being accessed in read or data-change mode</i> (is a table holding sensitive data values being accessed at odd hours?).</span></li><li><span style="font-family: arial;"><i>Utility execution</i> (could someone be trying to use a Db2 utility as a "back door" means of data access?).</span></li><li><span style="font-family: arial;"><i>Incidences of privileges being granted or revoked</i> (inappropriate granting of Db2 privileges can be a warning sign).</span></li><li><span style="font-family: arial;"><i>Use of system administration "super-user" privileges: install SYSADM, install SYSOPR, SYSOPR, SYSCTRL, or SYSADM</i> (to quote a line from several "Spider-Man" movies: "With great power comes great responsibility").</span></li><li><span style="font-family: arial;"><i>Use of database and security administration "super-user" privileges: DBMAINT, DBCTRL, DBADM, PACKADM, SQLADM, system DBADM, DATAACCESS, ACCESSCTRL, or SECADM</i> (see the "Spider-Man" quote above).</span></li></ul></ul><p></p><blockquote 
style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div><span style="font-family: arial;">Db2 12 function level 509 introduced an important audit policy enhancement: tamper-proof audit policies. With that enhancement, an audit policy can be set up so that it can be changed only with the permitting action of a person outside the Db2 team (specifically, a RACF administrator).</span></div><p></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">You can read all about developing, activating and using audit policies in the <a href="https://www.ibm.com/docs/en/db2-for-zos/12?topic=db2-audit-policy"><span style="color: #2b00fe;">Db2 for z/OS online documentation</span></a>.</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">If you're interested in a data access monitoring solution that can span all of your enterprise's data stores, both on-prem and cloud-based, check out <a href="https://www.ibm.com/products/ibm-guardium-data-protection"><span style="color: #2b00fe;">I</span></a></span><span style="font-family: arial;"><a href="https://www.ibm.com/products/ibm-guardium-data-protection"><span style="color: #2b00fe;">BM Security Guardium Data Protection</span></a>.</span></p></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Application architecture -</b> Can application architecture enhance Db2 data protection? Absolutely. Consider, for example, the security advantage of static versus dynamic SQL statements. If an application process will access data in table T1 via a dynamic query, the authorization ID of the application process will require the SELECT privilege on T1. If, on the other hand, the application process will access data in T1 by way of a static query, the application's ID will not need any table access privileges; instead, the ID will need only the EXECUTE privilege on the Db2 package associated with the static query. Reducing the granting of table-access privileges to application and/or user IDs can strengthen Db2 data security.</span></li></ul><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: arial;">OK. But what about applications of the client-server variety, particularly those that access Db2 for z/OS data from network-connected Linux or UNIX or Windows servers? For such applications, use of client-issued static SQL statements is often either not possible or, if possible, not favored by client-side programmers (for example, a Java program can issue static SQL statements in SQLJ form, but in my experience Java programmers overwhelmingly prefer JDBC to SQLJ, and JDBC means dynamic SQL on the Db2 side). In those cases, two convenient ways to utilize static SQL are 1) Db2 stored procedures and 2) Db2 REST services.</span></div></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Lots of application developers who prefer the JDBC and ODBC forms of SQL (to name two very popular forms of non-DBMS-specific SQL) are plenty happy with a stored procedure approach, as stored procedures are widely used with relational database management systems such as Db2. 
A programmer can use (for example) JDBC statements to call Db2 stored procedures and to retrieve rows from associated query result sets (when a stored procedure declares and opens a cursor). The stored procedure calls will be dynamic on the Db2 side, but the "table-touching" SQL statements issued by the stored procedures will be static, and that means that the application's ID will require only the EXECUTE privilege on the called stored procedures - not table access privileges.</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Db2's <a href="http://robertsdb2blog.blogspot.com/2021/02/are-you-using-rest-interface-to-db2-for.html"><span style="color: #2b00fe;">built-in REST interface</span></a> is another way to make static SQL easy to use from a client-side programmer's perspective. By way of this interface, which is an extension of the Db2 distributed data facility, a static SQL statement can be invoked via a REST request. The static SQL statement associated with a Db2 REST service can be a SELECT, an INSERT, an UPDATE, a DELETE, a TRUNCATE or a CALL (of a stored procedure).</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Here's another security advantage of client-side programs invoking Db2 server-side static SQL statements, whether through the use of stored procedures or the Db2 REST interface (which can itself be used, as noted, to invoke stored procedures): when this approach is used, client-side programmers do not have to know anything about table or column names - that knowledge is needed only by the people who code the server-side static SQL statements. How does this shielding of database schema information enhance data security? Well, the fewer the people who know stuff like table and column names, the less likely it is that a database will be hacked by bad guys.</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">[Note: when a stored procedure is to be invoked through the Db2 distributed data facility, either through a SQL call or a REST request, that stored procedure will get up to 60% zIIP offload when executed IF the stored procedure is written in SQL PL (i.e., if it is a so-called native SQL procedure). A stored procedure written in a language other than SQL PL will get little to no zIIP offload when called through DDF.]</span></p></blockquote><p style="text-align: left;"></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Test data management -</b> Let's say you have a production Db2 database in which some sensitive data values are stored (e.g., credit card numbers). Perhaps you have taken a number of steps to protect those sensitive data values. Great. But now an application team wants tables in their Db2 development environment populated with data from the production system. You could copy data over from the production to the development system, but will the sensitive data values be protected in the development environment as they are in production? 
Even if the data protection measures in the development environment are as strong as those in place for the production Db2 system, creating another copy of data that includes sensitive data values will still involve some data-security risk because the data copy increases what a security auditor might call the "threat area" - do you want to accept that risk?</span></li></ul><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: arial;">Often, the best approach to use in this situation is to mask or otherwise obscure the sensitive data values before (or as part of) copying production data to a development or test system. How would you accomplish that? You could do it on your own, but that can be a time-consuming effort and the "roll-your-own" data masking could impact the CPU and elapsed times of a production-to-test data copy operation. An easier (and often better-performing) way to get this done would be to use a software tool designed for the purpose. Two options in this space that are available from IBM are <a href="https://www.ibm.com/docs/en/iotdmfz/11.7?topic=overview-optim-test-data-management-solution-zos"><span style="color: #2b00fe;">IBM </span></a></span><span style="font-family: arial;"><a href="https://www.ibm.com/docs/en/iotdmfz/11.7?topic=overview-optim-test-data-management-solution-zos"><span style="color: #2b00fe;">InfoSphere Optim Test Data Management Solution for z/OS</span></a> and <a href="https://www.ibm.com/docs/en/db2-cloning-tool/3.2.0"><span style="color: #2b00fe;">IBM Db2 Cloning Tool for z/OS</span></a> (the former is useful for copying a referentially complete subset of data rows from one Db2 </span><span style="font-family: arial;">system to another, while the latter is more appropriate for copying entire table spaces and indexes - or even an entire subsystem's data - from one Db2 system to another). Both of those tools have data masking capabilities, to prevent sensitive data values from being copied "as-is" from a production environment to a test or development system.</span></div></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>RACF (or equivalent) management of Db2-internal security -</b> It is very common for RACF (or an equivalent z/OS security management subsystem) to be <a href="http://robertsdb2blog.blogspot.com/2019/12/db2-for-zos-and-racf-part-1-external.html"><span style="color: #2b00fe;">used for external Db2 security purposes</span></a> - that is, to control which application processes and/or users can connect to a Db2 subsystem, and how. Once an application process or a user has successfully connected to a Db2 subsystem, what happens next is a matter of Db2-internal security: does the ID of the application or user have the Db2 privileges needed to (for example) read data in a table, or update data in a table, or create an index or bind a package? In my experience, Db2-internal security is most often managed within Db2 by the Db2 administration team, who use the SQL statements GRANT and REVOKE to provide privileges for, or remove privileges from, various authorization IDs. 
It is possible to use RACF (or equivalent) to manage Db2-internal security as well as Db2-external security, and a growing number of organizations are doing just that.</span></li></ul><p></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div><span style="font-family: arial;">I posted an entry to this blog a couple of years ago with a good bit of information about <a href="http://robertsdb2blog.blogspot.com/2020/01/db2-for-zos-and-racf-part-2-db2.html"><span style="color: #2b00fe;">using RACF to manage Db2-internal security</span></a>. I won't repeat that content here; rather, I'll provide some thoughts and observations on this topic:</span></div></blockquote><p></p><ul style="text-align: left;"><ul><li><span style="font-family: arial;"><i>Why</i> <i>do organizations do this? </i>Quite often (in my experience) it's because someone - perhaps a security auditor - told them that it has to be done. Why might that pronouncement be made? Well, in the minds of many security people, it's a good thing for a single group of people to manage all aspects of security for a database management system. Because RACF can be used to manage both Db2-external and Db2-internal security, while Db2's security features apply mainly to internal security, if one team is going to manage all aspects of Db2 for z/OS security then it's going to be the RACF team.</span></li><li><span style="font-family: arial;"><i>Db2 11 for z/OS eliminated what were just about the last two hassles that were formerly associated with RACF management of Db2-internal security.</i> Prior to Db2 11, auto-rebinds could fail with authorization errors when RACF was used to manage Db2-internal security. Why? Because when an auto-rebind occurs you generally want Db2 to do that based on the privileges held by the ID of the <u>owner</u> of the package. It used to be that when RACF managed Db2-internal security, the authorization check for an auto-rebind looked at the privileges held by the ID of the application process that prompted the auto-rebind by requesting execution of a package that had been marked invalid by Db2, and that ID rarely has the privileges needed for a successful auto-rebind. Db2 11 fixed that problem by enabling RACF to check the ID of a package owner for auto-rebind authorization. The other nagging problem fixed by Db2 11 concerned caches of authorization information that Db2 maintains in memory. Information in those caches was formerly not updated to reflect security changes effected through RACF, the result being a frustrating lag between some RACF-side changes and enforcement of same in Db2. Db2 11 fixed that problem by having Db2 listen for ENF signals (referring to the z/OS event notification facility) sent by RACF when authorization changes are made.</span></li><li><span style="font-family: arial;"><i>Organizations that have gone to RACF management of Db2-internal security are pretty happy with the arrangement, and that includes the Db2 for z/OS DBAs.</i> Yes, there is a good bit of set-up work involved in making this transition, and that can seem more challenging than it really is because Db2 DBAs and RACF administrators speak different languages in a technical sense, but once things are set up and the transition has been completed, people find that it really works as advertised. Ask a Db2 DBA at a site that has gone to RACF management of Db2-internal security if he or she is OK with the change, and you'll likely get a thumbs-up. 
I haven't found many (any, actually) DBAs in these organizations that pine for the days when they had to issue GRANTs and REVOKEs to manage Db2-internal security. Letting the RACF team handle Db2-internal security lets the DBAs focus on database administration tasks (e.g., performance tuning, application enablement) that they generally find to be more satisfying.</span></li></ul></ul><span style="font-family: arial;">OK, that's what I've got. Consider these areas, and those I covered in <span style="color: #2b00fe;"><a href="http://robertsdb2blog.blogspot.com/2022/02/thoroughly-assessing-data-security-in.html">part 1</a></span> of this two-part entry, and I think you'll be able to comprehensively evaluate the Db2 security set-up you have at your site. I hope that this information will be helpful for you.</span><p></p><p></p><p></p>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com0tag:blogger.com,1999:blog-4516533711330247058.post-3312613139986180592022-02-24T17:12:00.000-08:002022-02-24T17:12:32.190-08:00Thoroughly Assessing Data Security in a Db2 for z/OS Environment - Part 1<p><span style="font-family: arial;">I regularly get questions from Db2 for z/OS people that pertain to data security. Most of the time, these questions have a pretty narrow focus - a DBA, for example, wants to know more about Db2 roles and trusted contexts, or about SECADM authority, or about "at-rest" encryption of Db2 data on disk. Recently, I had a meeting with some people from a client's mainframe IT staff, and they wanted to know what a comprehensive Db2 for z/OS data security review would look like. For me, that was a refreshingly wide-scope question. What areas would one want to examine, if one wanted to thoroughly assess the data security posture of a Db2 for z/OS system? In this part one of a two-part blog entry I will give you my take on the matter, starting with four areas of Db2 data protection: privilege management, client authentication, data encryption, and column masks and row permissions. In the part two entry, which I hope to post in about two weeks, I will cover four other areas of Db2 data protection: auditing, application architecture, test data management and RACF (or equivalent) management of Db2-internal security.</span></p><p><span style="font-family: arial;">Onward, then, to the first four areas of Db2 data protection I would recommend considering as part of a comprehensive Db2 security assessment:</span></p><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Privilege management -</b> This is about the Db2 privileges that have been granted to user, group and application IDs. With regard to user privileges, the best-practice approach is to grant to a given individual only those privileges that minimally enable the individual to do his or her job. One area where there has been a lot of cracking down in recent years concerns the granting of SYSADM authority. That's basically super-user status, and years ago it was common for organizations to give the SYSADM authority level to quite a few people on the Db2 support team. Why did that practice become problematic? Mainly because someone with SYSADM authority can look at (and even change) the data in any table. You could say, "It's OK - no one on my Db2 team is a bad actor," but that argument is not likely to sway security auditors these days. 
For some organizations, the solution to over-granting of SYSADM is to change that authority, for many or even most of the people on the Db2 team, to DBADM WITHOUT DATAACCESS ON SYSTEM. That move can spark protest from someone who loses SYSADM authority, but in fact many DBAs can do the large majority of things they need to do with system DBADM authority. If access to data in a specific table is required for a DBA with system DBADM WITHOUT DATAACCESS authority, the SELECT privilege on the table can be granted and then revoked when the task requiring access to the table's data has been completed.</span></li></ul><p></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p></p><div style="text-align: left;"><span style="font-family: arial;">Here's another way in which super-user authority has been reined in lately: organizations can set the value of the Db2 ZPARM parameter SEPARATE_SECURITY to YES. What does that do? It removes from SYSADM authority the ability to create and manage security objects (e.g., roles, trusted contexts, row permissions and column masks), and the ability to grant privileges to others (unless the ID with SYSADM authority holds the privilege in question WITH GRANT OPTION, or owns the object on which a privilege is being granted). How do those things get done, if a SYSADM can't do them? They are done by an ID with SECADM authority (more information in this area can be found <a href="http://robertsdb2blog.blogspot.com/2021/09/db2-for-zos-separatesecurity-and-secadm.html"><span style="color: #2b00fe;">in an entry I posted to this blog last year</span></a>).</span></div></blockquote><span style="font-family: arial;"><div><span style="font-family: arial;"><br /></span></div></span><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px; text-align: left;"><span style="font-family: arial;">What about the group IDs that I mentioned above? Those have been around for a long time. A group ID is so called because it refers to a RACF (or equivalent z/OS security manager) group to which individual IDs can be connected. When the Db2-supplied sample connection and sign-on exits are used by an organization (very widely done), the RACF group IDs to which your primary authorization ID is connected become your secondary authorization IDs in Db2, and the ability to execute most Db2 SQL statements and commands depends on the privileges held by your primary authorization ID <i>and by any of your secondary authorization IDs</i>. This can make Db2 privilege management much simpler, especially if a set of privileges tailor-made to enable execution of a certain set of Db2 actions is fairly complex - you just grant that set of privileges to a RACF group ID, and then connect to that group ID the IDs of individuals who need to perform the associated set of Db2 actions.</span></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">Some security administrators and auditors worry - understandably - about the privileges granted to the ID of an application that issues dynamic SQL statements, especially when that application connects to Db2 via TCP/IP communication links and through the Db2 distributed data facility (DDF). Why the worry? Well, for a dynamic SQL statement, such as a SELECT, to execute successfully, the associated Db2 authorization ID needs to have the SELECT privilege on the target table. 
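In concrete terms - using a hypothetical table name and application authorization ID - that means a grant along these lines has to be in place:</span></p><p style="text-align: left;"><span style="font-family: courier;">-- Names are illustrative<br />GRANT SELECT ON TABLE T1 TO APPID1;</span></p><p style="text-align: left;"><span style="font-family: arial;">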
A DDF-using application that issues SQL statements in JDBC or ODBC form (these will be dynamic SQL statements on the Db2 side) usually connects to the Db2 system using a certain ID and an associated password. What if someone who knows that ID and password tries to use those credentials to connect to the Db2 system from a PC, and then view data in tables that the application can access? An effective defense against that scenario can be implemented using Db2 roles and trusted contexts, as described <a href="http://robertsdb2blog.blogspot.com/2019/03/a-case-study-implementing-db2-for-zos.html"><span style="color: #2b00fe;">in a blog entry I posted a few years ago</span></a>.</span></p></blockquote><p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Client authentication -</b> I just now referred to applications that access Db2 systems via TCP/IP communication links and through the Db2 distributed data facility, and there are often individual users that do the same thing, perhaps using a workstation-based query and reporting tool. How are these applications and users authenticated at connection time? As noted above, this is usually done by way of a password. Typically, an organization requires a user to change his or her password on a regular basis - for example, every three months. What about the password associated with an application's ID? There was a time when it was quite common for such a password to be of the "never expire" type. That kind of password is increasingly deemed unacceptable by security auditors, who insist that the password associated with an application's ID be regularly changed, just as is done for passwords associated with user IDs. That is in fact a good policy from a security perspective, but it can lead to authentication-related connection errors when an application's password is changed. What if an application's password is changed in RACF before it is changed on the app server side, or vice versa? The strategy I've seen employed for non-disruptively changing a Db2 client-server application's password involves having two IDs for a given application. Shortly before the password for ID1 is set to expire, the application starts connecting to Db2 using ID2 (whose password will be good for the next three months or whatever). Once all instances of the application have switched over to ID2, the password for ID1 can be updated (and maybe that doesn't happen until a little before ID2's password is set to expire). As long as the application is running on at least two app servers, the switch from the one ID to the other can be accomplished with no application outage. What makes this work on the Db2 side is the fact that all privileges needed for the application's dynamic SQL statements to execute successfully are granted to ID1 and to ID2.</span></li></ul><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">What if you'd prefer for an application or a user ID to be authenticated by some means other than a password? Are there alternatives? Yes. One alternative is to use RACF PassTickets - you can find more information about RACF PassTickets in the <a href="https://www.ibm.com/docs/en/zos/2.4.0?topic=guide-using-passtickets"><span style="color: #2b00fe;">online z/OS documentation</span></a>. Another option is to use certificate-based authentication. 
<p></p><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Client authentication -</b> I just now referred to applications that access Db2 systems via TCP/IP communication links and through the Db2 distributed data facility, and there are often individual users that do the same thing, perhaps using a workstation-based query and reporting tool. How are these applications and users authenticated at connection time? As noted above, this is usually done by way of a password. Typically, an organization requires a user to change his or her password on a regular basis - for example, every three months. What about the password associated with an application's ID? There was a time when it was quite common for such a password to be of the "never expire" type. That kind of password is increasingly deemed unacceptable by security auditors, who insist that the password associated with an application's ID be regularly changed, just as is done for passwords associated with user IDs. That is in fact a good policy from a security perspective, but it can lead to authentication-related connection errors when an application's password is changed. What if an application's password is changed in RACF before it is changed on the app server side, or vice versa? The strategy I've seen employed for non-disruptively changing a Db2 client-server application's password involves having two IDs for a given application. Shortly before the password for ID1 is set to expire, the application starts connecting to Db2 using ID2 (whose password will be good for the next three months or whatever). Once all instances of the application have switched over to ID2, the password for ID1 can be updated (and maybe that doesn't happen until a little before ID2's password is set to expire). As long as the application is running on at least two app servers, the switch from one ID to the other can be accomplished with no application outage. What makes this work on the Db2 side is the fact that all privileges needed for the application's dynamic SQL statements to execute successfully are granted to ID1 and to ID2.</span></li></ul><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">What if you'd prefer for an application or a user ID to be authenticated by some means other than a password? Are there alternatives? Yes. One alternative is to use RACF PassTickets - you can find more information about RACF PassTickets in the <a href="https://www.ibm.com/docs/en/zos/2.4.0?topic=guide-using-passtickets"><span style="color: #2b00fe;">online z/OS documentation</span></a>. Another option is to use certificate-based authentication. Often, when one thinks about certificates in a Db2 for z/OS context, it is in relation to SSL encryption for communication between Db2 and a network-connected client application (more on encryption to come - see below); however, in my experience Db2 SSL encryption typically involves use of a <i>server</i> certificate versus <i>client</i> certificates (the Db2 host's certificate - or the certificate of the authority that signed it - is added ahead of time to the client system's truststore; at connection time the Db2 host presents its certificate, the client system recognizes it, and the "SSL handshake" can proceed to successful completion). That said, it is also possible for a client system to present its own certificate as a means of authentication when requesting a connection to a Db2 system. A good source of information on certificate-based authentication for a Db2 client is a document titled "Db2 for z/OS: Configuring TLS/SSL for Secure Client/Server Communications," which can be downloaded from the <a href="http://www.redbooks.ibm.com/abstracts/redp4799.html?Open"><span style="color: #2b00fe;">IBM redbooks Web site</span></a>. Refer to the information under the heading "Client access to Db2 using TLS/SSL client authentication," on page 61 of the document.</span></p></blockquote><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div style="text-align: left;"><span style="font-family: arial;">One other thing regarding client authentication. There is a parameter in the Db2 DSNZPARM module called TCPALVER. The default value for that parameter is NO, and that is almost certainly the value you want. If TCPALVER is set to YES in your Db2 environment, it means that Db2 assumes that a process wanting to connect to the Db2 system via TCP/IP is already verified. That being the case, the Db2 system will accept a TCP/IP client connection request that provides an ID but no authentication credential - no password or PassTicket or client certificate is required. Now, if you see that TCPALVER is set to YES for a Db2 subsystem, don't panic - almost certainly, RACF (or equivalent) will block a connection request that lacks an authentication credential; still, in the interest of having "belt and suspenders" security safeguards (a good idea), you'll probably want to change the TCPALVER value from YES to NO in the very near future. Before making that change, consider that a setting of TCPALVER=YES might have been put in place a long time ago, when the only clients connecting to the Db2 system via TCP/IP were other Db2 for z/OS systems. When that kind of communication was first happening (again, a long time ago), a requester Db2 for z/OS system might not have been sending a password when requesting a connection to a server Db2 for z/OS system, the thinking being that user authentication had already happened on the requester Db2 for z/OS side. If you have Db2 for z/OS systems communicating with other Db2 for z/OS systems using DRDA, without passwords being sent, those connection requests will fail after you change the TCPALVER value from YES to NO. To avoid that problem, update the Db2 communications database on the Db2 for z/OS requester side to have a password sent with a connection request to a DRDA server, and THEN change the TCPALVER setting from YES to NO.</span></div></blockquote>
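<blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">In rough terms, that requester-side change might look like the sketch below. Consider this a hedged illustration, not a recipe: the link name, IDs and password are made up, the right SECURITY_OUT and USERNAMES values depend on your configuration, and you should check the communications database documentation before making changes (note, too, that the DSNLEUSR stored procedure can be used to store the SYSIBM.USERNAMES values in encrypted form, rather than inserting them in the clear as shown here):</span></p><pre>
-- On the requester Db2 system: send a password on outbound requests
-- that flow over the (hypothetical) link named SERVER1...
UPDATE SYSIBM.IPNAMES
   SET SECURITY_OUT = 'P', USERNAMES = 'O'
 WHERE LINKNAME = 'SERVER1';

-- ...and supply the outbound ID and password to be sent:
INSERT INTO SYSIBM.USERNAMES (TYPE, AUTHID, LINKNAME, NEWAUTHID, PASSWORD)
  VALUES ('O', 'BATCHID', 'SERVER1', 'BATCHID', 'SECRET01');
</pre></blockquote>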
<ul style="text-align: left;"><li><span style="font-family: arial;"><b>Data encryption -</b> I posted an entry to this blog in 2020 <a href="https://robertsdb2blog.blogspot.com/2020/06/the-various-aspects-of-db2-for-zos-data.html"><span style="color: #2b00fe;">about data encryption in a Db2 for z/OS context</span></a>, and I won't repeat that content here. Instead, I'll add a few extra nuggets of information:</span></li><ul><li><span style="font-family: arial;">APAR PH08188 (April of 2019) made it possible to configure a Db2 for z/OS system so that its <u>only</u> SQL listener port is a "secure" port (i.e., one that requires a requester to use SSL encryption). When a Db2 system is set up that way, it is not possible for a client application to establish a non-SSL connection to the Db2 server.</span></li><li><span style="font-family: arial;">Db2's leveraging of the data set encryption feature of z/OS for encryption of Db2 table space and index data "at rest" (i.e., on disk) involves associating an encryption key label with a data set and RACF-permitting use of that key label. This has caused some people to be concerned about the need to provide RACF permission for various encryption key labels to various application and user IDs. That concern is unfounded. When a user or an application issues a SQL statement that targets a Db2 table, and table space and/or index data sets on disk are accessed as a result, from the z/OS perspective it is not the user or application accessing the data sets - it's Db2 accessing the data sets (and the same is true for the "online" IBM Db2 utilities, such as LOAD and COPY and REORG - they access database objects through Db2). That being the case, only the IDs of the Db2 database services and system services address spaces need to be RACF-permitted to use the key labels associated with encrypted Db2 data sets (if a "standalone" Db2 utility, such as DSN1PRNT, is to be run for an encrypted database object, the ID of that utility job will need RACF permission for the object's key label, as the standalone utilities operate outside of Db2).</span></li><li><span style="font-family: arial;">Plenty of people get data encryption and data masking mixed up. They are two different things (see below).</span></li></ul></ul><ul style="text-align: left;"><li><span style="font-family: arial;"><b>Column masks and row permissions -</b> This is another area I covered pretty thoroughly <a href="http://robertsdb2blog.blogspot.com/2013/04/db2-for-zos-goodbye-security-views.html"><span style="color: #2b00fe;">in a previous blog post</span></a>, and I won't repeat that content in this entry. What I will do is try to clear up some misunderstandings I've encountered over the years since Db2 introduced column mask and row permission functionality (there's a sketch of the SQL following this list):</span></li><ul><li><span style="font-family: arial;">Data masking and data encryption really are two different things. One difference is that encryption is reversible (if you have access to the encryption key) while a data masking transformation can be irreversible (if, for example, a column mask changes a credit card number to XXXXXXXXXXXX1234, there is no way for a user or a program to reverse those X's back to their pre-masked values).</span></li><li><span style="font-family: arial;">A Db2 column mask <u>changes no values in a table</u>; instead, the mask transforms the values in a column before they are returned to a user or a program; so, the actual unmasked values are in the table, but a column mask prevents a user (or a set of users, if it checks for a certain group ID) from being able to see the unmasked values. The masking is accomplished by a SQL CASE expression - the one specified in the CREATE MASK statement - that Db2 automatically incorporates into any query that references the column, once column access control has been activated for the table; the checks of the query-issuing process's ID (or group ID) are part of that CASE logic, and they determine whether the query gets masked or unmasked values.</span></li><li><span style="font-family: arial;">One of the nice things about a column mask is that it <u>doesn't</u> change values in a column of a table. That means a column mask will not affect a query's predicates (including join predicates) - those predicates will be evaluated using the unmasked values in the referenced column. The mask is applied (as previously noted) when the column in question appears in a query's select-list. If the mask changed values in a column, it could really throw query results out of whack in a bad way.</span></li><li><span style="font-family: arial;">Column masks and row permissions really are a very robust way to prevent access to certain data values (a row permission controls which rows in a table a process with a certain ID or group ID can access, through addition of the row-filtering predicate specified in the CREATE PERMISSION statement to queries that target the table). Here's what I mean by that: the CASE expression associated with a column mask, and the predicate associated with a row permission, will be <i>automatically added to ANY query - static OR dynamic - that references the table, no matter what ID issued it</i>. Does your ID (which I'll call SMITH) have SYSADM authority? Doesn't matter - if there's a column mask or a row permission that in effect says that SMITH (or, maybe, any ID <u>other than</u>, for example, XYZ) cannot see certain rows in a table, or unmasked values in a certain column, you're not going to be able to access those rows or those unmasked column values. The Db2 privileges held by your ID are irrelevant.</span></li></ul></ul>
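<blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;"><span style="font-family: arial;">To make the mechanics above a little more concrete, here's a hedged sketch - the table, column and group names are invented, and it assumes CARDNUM is a 16-character column. The mask's CASE expression returns unmasked values to IDs connected to the (hypothetical) PAYTEAM RACF group and masked values to everyone else; the row permission lets members of the (hypothetical) USEAST group access the 'EAST' rows:</span></p><pre>
CREATE MASK CARDMASK ON PROD.CUSTOMER
  FOR COLUMN CARDNUM RETURN
    CASE WHEN VERIFY_GROUP_FOR_USER(SESSION_USER, 'PAYTEAM') = 1
         THEN CARDNUM
         ELSE 'XXXXXXXXXXXX' || SUBSTR(CARDNUM, 13, 4)
    END
  ENABLE;

CREATE PERMISSION EASTROWS ON PROD.CUSTOMER
  FOR ROWS WHERE VERIFY_GROUP_FOR_USER(SESSION_USER, 'USEAST') = 1
             AND REGION = 'EAST'
  ENFORCED FOR ALL ACCESS
  ENABLE;

-- Neither object has any effect until access control is activated for
-- the table (and once row access control is active, rows not covered
-- by some permission are inaccessible):
ALTER TABLE PROD.CUSTOMER ACTIVATE COLUMN ACCESS CONTROL;
ALTER TABLE PROD.CUSTOMER ACTIVATE ROW ACCESS CONTROL;
</pre></blockquote>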
<p></p><p><span style="font-family: arial;">And that's a wrap for this part 1 blog entry. Check back in a couple of weeks for part 2, which will cover auditing, application architecture, test data management and RACF (or equivalent) management of Db2-internal security.</span></p><div><br /></div>Roberthttp://www.blogger.com/profile/02058625981006623480noreply@blogger.com5