Robert's Db2 blog: 2021

Wednesday, December 29, 2021

Db2 for z/OS Data Sharing: is your Lock Structure the Right Size?

Recently I've encountered some situations in which organizations running Db2 for z/OS in data sharing mode had lock structures that were not sized for maximum benefit. In this blog entry, my aim is to shed some light on lock structure sizing, and to suggest actions that you might take to assess lock structure sizing in your environment and to make appropriate adjustments.

First, as is my custom, I'll provide some background information.

One structure, two purposes

A lot of you are probably familiar with Db2 for z/OS data sharing. That is a technology, leveraging an IBM Z (i.e., mainframe) cluster configuration called a Parallel Sysplex, that allows multiple Db2 subsystems (referred to as members of the data sharing group) to share read/write access to a single instance of a database. Because the members of a Db2 data sharing group can (and typically do) run in several z/OS LPARs (logical partitions) that themselves run (usually) in several different IBM Z servers, Db2 data sharing can provide tremendous scalability (up to 32 Db2 subsystems can be members of a data sharing group) and tremendous availability (the need for planned downtime can be virtually eliminated, and the impact of unplanned outages can be greatly reduced).

One of the things that enables Db2 data sharing technology to work is what's called global locking. The concept is pretty simple: if an application process connected to member DBP1 of a 4-way (for example) Db2 data sharing group changes data on page P1 of table space TS1, a "local" X-lock (the usual kind of Db2 lock associated with a data-change action) on the page keeps other application processes connected to DBP1 from accessing data on the page until the local X-lock is released by way of the application commit that "hardens" the data-change action. All well and good and normal, but what about application processes connected to the other members of the 4-way data sharing group? How do they know that data on page P1 of table space TS1 is not to be accessed until the application process connected to member DBP1 commits its data-changing action? Here's how: the applications connected to the other members of the data sharing group know that they have to wait on a commit by the DBP1-connected application because in addition to the local X-lock on the page in question there is also a global lock on the page, and that global lock is visible to all application processes connected to other members of the data sharing group.

Where does this global X-lock on page P1 of table space TS1 go? It goes in what's called the lock structure of the data sharing group. That structure - one of several that makes Db2 data sharing work, others being the shared communications area and group buffer pools - is located in a shared-memory LPAR called a coupling facility, and the contents of the structure are visible to all members of the data sharing group because all the members are connected to the coupling facility LPAR (and, almost certainly, to at least one other CF LPAR - a Parallel Sysplex will typically have more than one coupling facility LPAR so as to preclude a single-point-of-failure situation).

Here's something kind of interesting: a global lock actually goes to two places in the lock structure (if it's an X-lock, associated with a data-change action, versus an S-lock, which is associated with a data-read request). Those two places are the two parts of the lock structure: the lock table and the lock list:

The lock table can be thought of as a super-fast global lock contention detector. How it works: when a global X-lock is requested on a page (or on a row, if the table space in question is defined with row-level locking), a component of the z/OS operating system for the LPAR in which the member Db2 subsystem runs takes the identifier of the resource to be locked (a page, in this example) and runs it through a hashing algorithm. The output of this hashing algorithm relates to a particular entry in the lock table - basically, the hashing algorithm says, "To see if an incompatible global lock is already held by a member Db2 on this resource, check this entry in the lock table." The lock table entry is checked, and in a few microseconds the requesting Db2 member gets its answer - the global lock it wants on the page can be acquired, or it can't (at least not right away - see the description of false contention near the end of this blog entry). This global lock contention check is also performed for S-lock requests that are associated with data-read actions.
The lock list is, indeed (in essence), a list of locks - specifically, of currently-held global X-locks, associated with data-change actions. What is this list for? Well, suppose that member DBP1 of a 4-way data sharing group terminates abnormally (i.e., fails - and that could be a result of the Db2 subsystem failing by itself, or terminating abnormally as a result of the associated z/OS LPAR or IBM Z server failing). It's likely that some application processes connected to DBP1 were in the midst of changing data at the time of the subsystem failure, and that means that some data pages (or maybe rows) were X-locked at the time of the failure. Those outstanding X-locks prevent access to data that is in an uncommitted state (because the associated units of work were in-flight at the time of the failure of DBP1), but that blocking of access to uncommitted data is only effective if the other members of the data sharing group are aware of the retained page (or row) X-locks (they are called "retained locks" because they will be held until the failed Db2 subsystem can be restarted to release them - restart of a failed Db2 subsystem is usually automatic and usually completes quite quickly). The other members of the data sharing group are aware of DBP1's retained X-locks thanks to the information in the lock list.

For the lock structure, size matters - but how?

If you're implementing a new data sharing group, 128 MB is often a good initial size for a lock structure in a production environment (assuming that coupling facility LPAR memory is sufficient). Suppose that you have an existing lock structure. Is it the right size? To answer that question, you have to consider the two aforementioned parts of the lock structure: the lock list and the lock table. If the lock list is too small, the effect will often be quite apparent: data-changing programs will fail because they can't get the global X-locks they need, owing to the fact that the lock list is full (the SQL error code in that case would be a -904, indicating "resource unavailable," and the accompanying reason code will be 00C900BF). Certainly, you'd like to make the lock list part of the lock structure larger before it fills up and programs start abending. To stay ahead of the game in this regard, you can look for instances of the Db2 (actually, IRLM) message DXR142E, which shows that the lock list is X% full, with "X" being 80, 90 or 100. Needless to say, 100% full is not good. 90% full will also likely result in at least some program failures. Suffice it to say that if this message is issued on your system, 80% is the only "in-use" value that you want to see, and if you see "80% in-use" you'll want to make the lock list bigger (you can also issue, periodically, the Db2 command -DISPLAY GROUP, or generate and review an RMF Coupling Facility Activity report - in either case, see what percentage of the list entries in the lock structure are in-use).

If you want or need to make the lock list part of a lock structure bigger, how can you do that? Really easily, if the maximum size of the lock structure (indicated by the value of the SIZE specification for the structure in the coupling facility resource management - aka CFRM - policy) is larger than the structure's currently allocated size (as seen in the output of the Db2 -DISPLAY GROUP command, or in an RMF Coupling Facility Activity Report). When that is true, a z/OS SETXCF command can be issued to dynamically increase the lock structure size, and all of the space so added will go towards enlarging the lock list part of the structure. If the lock structure's current size is the same as the maximum size for the structure, lock list enlargement will require a change in the structure's specifications in the CFRM policy, followed by a rebuild of the structure (that rebuild will typically complete in a short time, but it can impact Db2-accessing application processes, so take the action at a time when system activity is at a relatively low level). More information on lock list size adjustments can be found in an entry I posted to this blog back in 2013.

How about the lock table part of the lock structure? Is yours large enough? In contrast to the lock structure, the lock table can't "run out of space" and cause programs to fail. How, then, do you know if a lock table size increase would be advisable? The key here is the "false contention" rate for the data sharing group (or for a member or members of the group - both a Db2 monitor statistics report and an RMF Coupling Facility Activity report can be used to see false contention at either the group or a member level). What is "false contention?" It's global lock contention that is initially perceived but later found to be not real. How does that happen? Recall the previous reference to the lock table and the associated hashing algorithm. A lock table has a certain number of entries, determined by the size of the table and the size of the lock entries therein. The size of a lock table entry will vary according to the number of members in a data sharing group, but for my example of a 4-way group the lock entry size would be 2 bytes. If the lock structure size is 128 MB, the lock table size will be 64 MB (the lock table part of the lock structure will always have a size that is a power of 2). That 64 MB lock table will accommodate a little over 33 million two-byte lock entries. 33 million may sound like a lot, but there are probably way more than 33 million lock-able things in the data sharing group's database (think about the possibility of row-level locking for a 100 million-row table, and you're already way past 33 million lock-able things).

How do you manage global lock contention detection with a 33 million-entry lock table when there are more (probably WAY more) than 33 million lock-able things in the database? That's where the hashing algorithm comes in. That algorithm will cause several different lock-able things to hash to a single entry in the lock table, and that reality makes false contention a possibility. Suppose an application process connected to member DBP1 of the 4-way data sharing group needs a global X-lock on page P1 of table space TS1. DBP1 propagates that request to the lock table, and receives in response an indication that contention is detected. Is it real contention? The z/OS LPARs in the Sysplex can communicate with each other to make that determination, and it may be found that the initial indication of contention was in fact false, meaning that incompatible lock requests (i.e., an X and and X, or an X and an S) for two different resources hashed to the same entry in the lock table. When the initially indicated contention is seen to be false, the application that requested the global lock (the request for which contention was initially indicated) gets the lock and keeps on trucking. That's a good thing, but resolving false contention involves some overhead that you'd like to minimize. Here's where lock table sizing comes in. As a general rule, you like for false contention to account for less than half of the global lock contention in the data sharing group (the total rate of global lock contention and the rate of false contention can both be seen via a Db2 monitor statistics report or an RMF coupling facility activity report).

What if the rate of false contention in your system is higher than you want it to be? In that case, consider doubling the size of the lock table (remember, the lock table size will always be a power of 2), or even quadrupling the lock table size, if the rate of false contention is way higher than you want and the CF LPAR holding the structure has sufficient memory to accommodate the larger lock structure size. If you double the lock table size, you'll double the number of lock entries, and that will mean that fewer lock-able things will hash to the same lock table entry, and THAT should result in a lower false contention rate, boosting the CPU efficiency of the data sharing group's operation.

Here's something you need to keep in mind with regard to increasing the size of a Db2 data sharing group's lock table: this cannot be done without a rebuild of the associated lock structure. As previously mentioned, a lock structure's size can be dynamically increased via a SETXCF command (assuming the structure is not already at its maximum size, per the CFRM policy), but all of that space dynamically added will go towards increasing lock list space, not lock table space. A bigger lock table will require a change in the lock structure's specifications in the CFRM policy, followed by a rebuild of the structure to put the change in effect. If the lock structure's current size is 128 MB, you might change its INITSIZE in the CFRM policy (the initial structure size) to 256 MB, and rebuild the structure to take the lock table size from 64 MB to 128 MB. A lock structure size that's a power of 2 is often reasonable - it will cause lock structure space to be divided 50-50 between the lock table and the lock list. Note that if the lock structure INITSIZE is not a power of two, as a general rule the lock table size will be the power of 2 that will result in the space division between lock table and lock list being as equal as possible.

And that wraps up this overview of Db2 lock structure sizing. I hope that the information will be useful for you.

Monday, November 29, 2021

Db2 for z/OS: -DISPLAY DDF DETAIL output, explained

The output of the Db2 for z/OS command -DISPLAY DDF DETAIL provides a lot of very useful information in a compact form. That said, I've found that quite a number of Db2 people are not so clear on the meaning of various fields in -DISPLAY DDF DETAIL output. In this blog entry I will aim to shed light on key information items provided via -DISPLAY DDF DETAIL, and explain how that information can be put to use.

To begin with, a sample of -DISPLAY DDF DETAIL output is shown below (I have anonymized identifiers such as IP addresses and subsystem names). The fields highlighted in red are those that I have found to be the most useful - they will be subsequently explained.

DSNL080I -DBP2 DSNLTDDF DISPLAY DDF REPORT FOLLOWS:
DSNL081I STATUS=STARTD
DSNL082I LOCATION LUNAME GENERICLU
DSNL083I DBP2 ABCD1234.DBP2 ABCD1234.DBP0GLU
DSNL084I TCPPORT=4100 SECPORT=0 RESPORT=4102 IPNAME=-NONE
DSNL085I IPADDR=::1.2.3.4
DSNL086I SQL DOMAIN=b4vipa.acme.com
DSNL086I RESYNC DOMAIN=b4vipa.acme.com
DSNL089I MEMBER IPADDR=::1.2.3.4
DSNL090I DT=I CONDBAT= 5000 MDBAT= 550
DSNL091I MCONQN= 0 MCONQW= 0
DSNL092I ADBAT= 67 QUEDBAT= 0 INADBAT= 0 CONQUED= 0
DSNL093I DSCDBAT= 60 INACONN= 1786 IUDBAT= 7
DSNL094I WLMHEALTH=100 CLSDCONQN= 0 CLSDCONQW= 0
DSNL100I LOCATION SERVER LIST:
DSNL101I WT IPADDR IPADDR
DSNL102I 32 ::1.2.3.4
DSNL102I 32 ::1.2.3.4
DSNL105I CURRENT DDF OPTIONS ARE:
DSNL106I PKGREL = COMMIT
DSNL106I SESSIDLE = 001440
DSNL099I DSNLTDDF DISPLAY DDF REPORT COMPLETE

OK, starting from the top and working down through the sample command output, following are my explanations of the fields highlighted in red:

TCPPORT, SECPORT and RESPORT - The first of these fields is the Db2 subsystem's "standard" SQL listener port. A client application (a DRDA requester or a REST client) that will interact with Db2 over a TCP/IP link will specify this port in a connection request (or in a REST request) if there is not a need for SSL encryption for the connection (i.e., encryption of data "on the wire"). If SSL encryption is required, the client specifies the Db2 subsystem's secure port. In the command output shown above, the value of 0 for SECPORT indicates that a secure SQL port has not been defined for the subsystem. That doesn't mean that SSL encryption cannot be used for a connection to the subsystem - that can still happen even when the standard SQL listener port is specified for the connection request. The difference is this: when the secure SQL listener port is specified at connection time then Db2 requires the use of SSL encryption, and rejects the connection request when SSL encryption is not used. Note that with the fix for APAR PH08188 applied (it came out in April of 2019), a Db2 subsystem's one and only SQL listener port can be its secure port - if that is made so then any application that would access the Db2 subsystem via DDF (whether a DRDA requester or a REST client) cannot do so unless SSL encryption is used for the application's connection(s) to the subsystem.

RESPORT is short for re-sync port. What is the purpose of this port? Well, in a DDF two-phase commit situation (i.e., when an application that is DDF-connected to a Db2 subsystem changes data at that location and at another Db2 location in the same unit of work), a failure (application-side or data server-side) could leave one or more associated DDF transactions in an in-doubt state. For those transactions to be resolved the application needs to tell the Db2 subsystem to either commit or abort the transactions, and when the application communicates to the Db2 subsystem the action it is to take for these transactions (commit or abort), it uses the subsystem's re-sync port. This is particularly important in a Db2 data sharing environment, in which all member subsystems have the same SQL listener port (and the same secure port, if applicable). When a DDF transaction is in an in-doubt state in that kind of environment, the associated application needs to communicate the commit or abort action to the particular member subsystem that was servicing the transaction - the member's unique re-sync port makes that possible.

CONDBAT and MDBAT - These refer to two parameters, CONDBAT and MAXDBAT, in the Db2 subsystem's ZPARM module (think of the ZPARMs as Db2 for z/OS configuration parameters). CONDBAT specifies the maximum number of connections that DDF-using applications can have with the Db2 subsystem. MAXDBAT is the maximum number of those connections that can be active at one time. It is typical for a subsystem's CONDBAT value to be much higher than its MAXDBAT value. Why is this so? Because at any given time, it's likely that most connections to the Db2 subsystem are not active, and an inactive connection doesn't need a DBAT (DBAT, by the way, is short for database access thread - that's what we call a DDF thread). Having a bunch of inactive connections is no big deal - the virtual storage "footprint" of a Db2 inactive connection is very small, and the CPU cost of switching an inactive connection to active (when a new transaction associated with the connection comes along) and back again (when said transaction has completed) is very small, as well. Generally speaking, you want the CONDBAT value to be high enough to avoid reaching the connection limit, because once that limit has been reached then the next attempt by an application to connect to the subsystem via DDF will be rejected by Db2. A statistics long report generated by your Db2 monitor (depending on the monitor product, that may be called a statistics detail report) will show you if the CONDBAT limit has been reached for a subsystem.

The MAXDBAT value, as stated previously, is the limit on the number of DDF connections that can be concurrently active for the subsystem. How so? Because by definition an active connection is one that is paired with a DBAT (a transaction that comes in by way of a connection needs a DBAT in order to be processed). It's long been thought that you want the MAXDBAT value for a subsystem to be large enough so as not to be reached, and that's still the case for most Db2 systems that I've seen, but there could be a situation in which you WANT the MAXDBAT value to be reached. Suppose, for example, that there are occasional huge surges of transactions that come in from DDF-using applications. It's possible that letting all of those transactions into the system at once would overwhelm the z/OS LPAR's processing capacity, leading to big-time performance problems. In that case, a MAXDBAT value that would be hit would cause some DDF transactions to queue up waiting for a DBAT to come free, but by keeping the system's processing capacity from being overwhelmed, the MAXDBAT limit would preserve a high level of transaction throughput, so a DDF transaction waiting for a DBAT to come free shouldn't have to wait long.

Another MAXDBAT-related matter to consider: high-performance DBATs. A high-performance DBAT, once instantiated, stays dedicated to the instantiating connection until it has been used to process 200 transactions. That means that a larger number of high-performance DBATs leads to fewer DBATs in the DBAT pool, all other things being equal, and THAT means you should almost certainly increase a Db2 subsystem's MAXDBAT value prior to implementing high-performance DBAT functionality.

ADBAT - ADBAT is short for "active DBATs," and it indicates the total number of DBATs that exist at the time of the issuance of the -DISPLAY DDF DETAIL command. It indicates the total number of DBATs because ALL DBATs - whether in the DBAT pool or currently paired with connections - are "active."

QUEDBAT - This field shows the number of times that the subsystem's MAXDBAT limit has been reached since DDF was last started (which was probably when the Db2 subsystem was last started). See the information about MAXDBAT, above, to get an idea as to how one should interpret a non-zero value in this field.

DSCDBAT - This is short for "disconnected DBATs," and it indicates the number of DBATs in the pool. DBATs in the DBAT pool are active, as previously noted, but they are in a disconnected state - a DBAT in the pool is there because it is not tied to a particular connection. When a connection needs a DBAT (for the servicing of a transaction associated with the connection), it gets one from the DBAT pool.

INACONN - This is short for "inactive connections." As previously mentioned, it is very common to see a large number of DDF connections in the inactive state - it just means that there are no currently in-flight transactions associated with the connections.

IUDBAT - This is short for "in-use" DBATs. It's a relatively new field in the output of -DISPLAY DDF DETAIL, added via the fix for APAR PH30222, which came out about a year ago. In-use DBATs are those that are currently being used to process transactions. A DBAT is also considered to be in-use when it is a high-performance DBAT.

PKGREL - This is a DDF configuration setting, and it can be thought of as the high-performance DBAT "on/off switch." When PKGREL is set to COMMIT, as seen in the example command output shown above, high-performance DBAT functionality is "off" for the Db2 subsystem. When PKGREL is set to BNDOPT, high-performance DBAT functionality is "on," in the sense that it is enabled, but for a DBAT to become the high-performance type it has to be used to execute at least one package that has been bound with RELEASE(DEALLOCATE). There is a third PKGREL specification, BNDPOOL, that in my experience is not commonly used - it can be a reasonable choice when an application that uses high-performance DBATs does not stay connected for very long to the Db2 subsystem (typically, a DDF-using application will connect to a Db2 for z/OS server and stay connected to that Db2 system for a long time).

The PKGREL setting can be changed using the Db2 command -MODIFY DDF.

And that's that. I hope that this explanation of -DISPLAY DDF DETAIL output will be useful for you.

Friday, October 29, 2021

Update on No-Charge IBM Tools for Db2 for z/OS

Recently there has been important news regarding no-charge IBM tools for application developers and for database administrators who work with Db2 for z/OS. This news has to do with strategic replacements for the development and administration features of IBM Data Studio and IBM Data Server Manager in a Db2 for z/OS environment. I want to use this blog entry to highlight two newer offerings in the no-charge IBM Db2 for z/OS tools space.

First, some background.

The original (and still there) no-charge IBM tooling for Db2 for z/OS

I've been working with Db2 for z/OS since about 1987 (Version 1 Release 2), and as far back as I can remember there has been no-charge IBM-provided tooling for users of the DBMS. I say "tooling" instead of "tools" because I'm referring here not to separate products but to the tooling that was - and still is - provided with Db2 for z/OS itself. Probably the best known of this "comes with Db2" tooling is the set of capabilities provided by DB2I (short for "Db2 interactive"), the panel-driven feature set that runs in a TSO/ISPF environment and is accessed via a 3270 terminal emulator (or, back in the day, by way of an actual 3270 terminal). DB2I functions that have long been useful to Db2 administrators and to application developers working with Db2 include SPUFI (SQL processor using file input), an interactive means of issuing dynamic SQL statements and reviewing associated output; DCLGEN (declarations generator), for generating structure definitions (for C and PL/I programs) and record descriptions (for COBOL programs) associated with Db2 tables and views that will be referenced in program code; binding and rebinding of plans and packages; a Db2 utility invocation mechanism; and an interface for interactive issuance of Db2 commands (though not commands of the DSN variety).

To this day I use DB2I regularly, but it was apparent some time ago that a newer generation of developers and administrators wanted new ways of working with Db2 for z/OS, and that some no-charge tooling with basic capabilities was still required in addition to the advanced-function for-fee tools available from IBM and other vendors. An initial response to this "new way, no-charge" need was IBM Data Studio, launched a little over 10 years ago.

No-charge GUI tools: IBM Data Studio, and then IBM Data Server Manager

IBM Data Studio is a download-able, workstation-based tool built on the Eclipse platform. It provided a GUI for Db2 for z/OS that delivered a number of features useful to developers and to administrators. Of particular value to many developers was Data Studio's debug functionality for routines written in SQL Procedure Language, aka SQL PL (examples of such routines are Db2 native SQL procedures, compiled SQL scalar functions and advanced triggers). I personally found Data Studio to be very helpful for testing stored procedure calls, and for retrieving non-traditional Db2 data values such as CLOBs and XML documents (I really appreciated Data Studio's nice formatting of XML documents returned from Db2). Data Studio also evolved to be the GUI front-end for many of IBM's fee-based Db2 for z/OS tools, and for the Db2 Analytics Accelerator, and the resulting look-and-feel commonality was a productivity booster.

As helpful as Data Studio was, the tool came to be more popular with developers than Db2 administrators. In particular, the Eclipse-based interface, familiar to many application programmers, didn't win a lot of fans among DBAs. Over time, it became more clear that another no-charge GUI tool for Db2 for z/OS, more purpose-built for administrators, was needed. That administrator-first tool was IBM Data Server Manager, or DSM. In addition to the admin-friendly interface, DSM had the advantage of browser-based user interaction, simplifying product installation and management.

Even with no-charge tools aimed at the developer and administrator constituencies, it still seemed that we weren't quite where we wanted to be in the Db2 for z/OS space. That changed with the advent of IBM Db2 for z/OS Developer Extension for Visual Studio Code, and IBM Db2 Administration Foundation for z/OS.

A newer no-charge tool for developers: IBM Db2 Developer Extension for VS Code

Visual Studio Code (aka VS Code) is a no-charge and very widely used IDE (integrated development environment), and it's the base for the IBM Db2 for z/OS Developer Extension (actually, one base - the Db2 Developer Extension is also available for Eclipse Theia, and other popular IDEs will likely follow). There are two ways to get the Db2 Developer Extension for VS Code:

From the Visual Studio Marketplace on the Web.
From within VS Code itself: in a VS Code session, go to the Extension view and type in “ibm z” - the Db2 Developer Extension should be at the top of the resulting list (see the screen shot below).

The Db2 Developer Extension has a lot of features that can be helpful for people writing SQL statements and routines that will target Db2 for z/OS-managed data, including:

SQL statement formatting
SQL syntax checking (including XML syntax checking)
SQL coding accelerators, like code completion and code snippets
SQL PL routine debugging (I highlighted this one because it's really important to a lot of people - this functionality was previously only available with Data Studio)

And there's more, and more to come. I encourage you to download and use the Db2 Developer. Extension, and I also recommend checking out the Db2 Developer Extension repository on GitHub - there, you can get lots more information about the Developer Extension, submit requests for enhancements and check out enhancement requests submitted by others (many of which have already been acted upon by the IBM Db2 Developer Extension team).

A newer no-charge tool for administrators: IBM Db2 Administration Foundation for z/OS

The Admin Foundation came out just this past summer. It's a licensed IBM product (the product ID is 5698-ADF), but there is no charge associated with the license (there's an optional service and support charge). Admin Foundation runs entirely in z/OS - it has a browser-based interface that leverages the Zowe virtual desktop.

Key features of IBM Db2 Administration Foundation for z/OS include the following:

Object dependency information
Generate DDL for an object from information in the Db2 catalog
SQL editor
Visual explain
Single-query tuning services (including statistics advisor)
Issue Db2 commands - including DSN commands (I highlighted this one because plenty of GUI tools let you issue SQL statements but not Db2 for z/OS commands, and Admin Foundation does DB2I one better by enabling issuance of Db2 commands of the DSN variety)

The screen shot below shows the object hierarchy information that can be displayed via Db2 Administration Foundation:

Here's the bottom line: Developer Extension for VS Code and Administration Foundation are great additions to the no-charge IBM Db2 for z/OS tools scene. Check them out, and ride the new wave.

Tuesday, September 28, 2021

Db2 for z/OS: SEPARATE_SECURITY and SECADM

In the Db2 for z/OS module called DSNZPARM (ZPARM, for short), which provides the configuration settings for a Db2 subsystem, there is a parameter named SEPARATE_SECURITY. This ZPARM was introduced with Db2 10, and its purpose is to enable a Db2 for z/OS-using organization to separate security administration from system administration in a Db2 environment (and here I'm talking about Db2 internal security, which is concerned with what a user or process can do once the user or process has successfully connected to a Db2 subsystem). The default value for SEPARATE_SECURITY is NO, and that's the setting that's in effect at most Db2 for z/OS sites. A Db2 team will generally go with SEPARATE_SECURITY=YES when they've been informed that this is a requirement for a Db2 system. The Db2 folks at one site were recently presented with this directive, and I was asked for some guidance on implementing the change. The back-and-forth exchanges I had with people on this team included information that I think could be of interest to the larger Db2 for z/OS community, so I'm packaging it in this blog entry.

In a nutshell, SEPARATE_SECURITY=YES makes a Db2 authorization level called SECADM really important. Does this mean that there's no such thing as SECADM authority in an environment in which SEPARATE_SECURITY=NO? No, that's not what it means. When SEPARATE_SECURITY=NO, an ID with SYSADM authority has implicit SECADM authority (with one exception: the ability to execute the SQL statement TRANSFER OWNERSHIP requires explicit SECADM authority, unless the statement is issued by the ID away from which a database object's ownership is being transferred). Why is SECADM authority really important when SEPARATE_SECURITY is set to YES? Because in that case there are important security-related things that can ONLY be done by someone with SECADM authority; for example:

Create, alter, activate or deactivate a column mask or a row permission.
Grant a privilege (unless an ID holds the privilege in question WITH GRANT OPTION, or owns the object on which the privilege is being granted).
Revoke the CREATE-SECURE_OBJECT privilege.

Now, here's something interesting about SECADM authority: it can't be granted. How, then, is that authority conferred? It's conferred by way of two ZPARM parameters, SECADM1 and SECADM2. This being so, the decision regarding values for those ZPARMs should be made thoughtfully. One option, of course, would be to supply two user IDs for SECADM1 and SECADM2, but that would be pretty restrictive - quite possibly more restrictive than you want. Another option is to make SECADM1 or SECADM2 a RACF group ID (or RACF-equivalent, if you're using some other z/OS security management subsystem). In a typical Db2 environment, if user ID ABC is connected to RACF group ID XYZ then XYZ will be a Db2 secondary authorization ID for ABC. That means that privileges granted to the RACF group ID are available to users whose auth IDs are connected to the group ID (in some cases, successful execution of a SQL statement or command requires that a privilege be held by the SQL ID of the associated process, and that can necessitate issuance of a SET CURRENT SQLID statement to change one's SQL ID from the primary auth ID to a secondary auth ID). Setting SECADM1 or SECADM2 to a RACF group ID offers some flexibility by enabling one to then connect the user IDs of people who will have security administration responsibility in the Db2 environment to the RACF group ID associated with the ZPARM parameter.

Another way to get flexibility for Db2 security administration when SEPARATE_SECURITY=YES is in effect is to specify a Db2 role for SECADM1 or SECADM2. This gets kind or interesting because it's a use of a Db2 role that differs from what in my experience has been the more common case: utilization of a role (and an associated trusted context) to help prevent misuse of a Db2 DDF-using application's ID and password (I blogged about that more-common use of roles and trusted contexts a couple of years ago). A bit of background: a role can be created by a Db2 DBA, and a privilege or privileges can be granted to that role, and a trusted context (also created by a DBA) specifies the conditions in which the privileges granted to the role can be used. When the role is used to help prevent misuse of a DDF application's ID and password, the associated trusted context will likely specify that the privileges of the role (the privileges required for the DDF application to execute its Db2-targeting SQL statements) can be used by the application that connects to Db2 using auth ID ABC (the application's auth ID) from a certain set of IP addresses (the addresses of the servers on which the application runs). What if the role instead were intended to enable a certain group of people to perform Db2 security administration tasks, with the understanding that these administrators will be locally connected to the Db2 system (i.e., not accessing the system through the distributed data facility)?

For the "local users" case, the set-up would be as follows. First, a Db2 role is created for the purpose. We'll call it SECROLE. Then, a trusted context is created to specify the circumstances in which the Db2 privilege held by the role (it will have SECADM authority) can be used. Db2 security administration actions will be performed by way of a batch job, and the trusted context associated with SECROLE will reference that job, as shown below (I'll explain the color-highlighted parts of the CREATE TRUSTED CONTEXT statement):

CREATE TRUSTED CONTEXT ADMIN_CTX
BASED UPON CONNECTION USING SYSTEM AUTHID ADMNP01
ATTRIBUTES (JOBNAME 'PRDADMN')
DEFAULT ROLE SECROLE WITH ROLE AS OBJECT OWNER AND QUALIFIER

WITH USE FOR SALLY, FRED, DAVID

ENABLE;

OK, here are some notes on the color-coded parts of the above statement:

ADMNP01 - This is the value of the USER parameter of the JOB statement in the batch job's JCL.
PRDADMN - This is the job name in the JOB statement in the batch job's JCL.
SECROLE - This is the role whose privileges can be used when the connection to Db2 is by way of the PRDADMN job with ADMNP01 as USER.
WITH ROLE AS OBJECT OWNER AND QUALIFIER - This is important given the purpose of the role SECROLE. That role will have SECADM authority in a SEPARATE_SECURITY=YES environment, and so will be needed to issue dynamic GRANT statements and, likely, some dynamic DDL statements such as CREATE and ALTER (especially for security objects such as column masks and row permissions). That makes SECROLE different versus a role created for a DDF-using application. In the latter case, the expectation is that the DDF application will be issuing dynamic SQL DML statements such as SELECT and INSERT, and for such dynamic statements the set of privileges checked by Db2 is the union of the privileges (if any) held by the application's auth ID and (if the application has a trusted connection to Db2) the privileges of the role associated with the trusted connection. When the SQL statements to be executed are dynamic GRANTs or dynamic DDL statements, as expected for the security administrators who will be using the batch job associated with the ADMIN_CTX trusted context, the Db2 privileges checked will be either those held by the primary auth ID of the process or those held by the role associated with the process. If it's the role's privileges that we want to be in effect (and we do want that in this case) then the trusted context needs to be defined WITH ROLE AS OBJECT OWNER (the additional AND QUALIFIER clause makes the role name the default value for the CURRENT SCHEMA special register when the batch job is executed).
WITH USE FOR SALLY, FRED, DAVID - We can optionally limit use of the role associated with the trusted context to a set of user IDs. What this means in a practical sense: in this particular case, the batch job with the name PRDADMN is going to invoke the Db2 DSN command processor, and is going to execute the program DSNTEP2 (could just as well be the DSNTEP4 program), through which SQL statements (e.g., GRANT, CREATE, ALTER) will be issued. It has been decided that the privileges held by the role SECROLE will be in effect only when the PRDADMN job, with USER name ADMNP01, is executed by one of the user IDs SALLY, FRED or DAVID. SALLY (or FRED or DAVID) can provide her user ID by way of the ASUSER option of the DSN command, when she executes the PRDADMN job. Here's an example of what I'm talking about:

//SYSTSIN DD *
DSN SYSTEM(xxxx) ASUSER(SALLY)
RUN PROGRAM(DSNTEP2)
//SYSIN DD *
...

With the role and trusted context created, the value of SECADM1 (or SECADM2) in ZPARM can be set to SECROLE (and the value of SECADM1_TYPE (or SECADM2_TYPE) in ZPARM would be set to ROLE).

If you set SECADM1 or SECADM2 to a role name, and you want to test out the functionality of the role and its trusted context before changing SEPARATE_SECURITY to YES, consider issuing the SQL statement TRANSFER OWNERSHIP via the batch job referenced in the trusted context. That statement should only succeed if the privileges of the role are in effect, since only SECADM has the authority to execute TRANSFER OWNERSHIP, even when SEPARATE_SECURITY is set to NO (a TRANSFER OWNERSHIP statement that transfers ownership of a database object away from ID ABC will be successful when issued by ID ABC, so don't use that ID in your test).

And there you have it. If you are going to go to SEPARATE_SECURITY=YES, have a plan in place for SECADM1 and SECADM2. It would probably be a good idea for one of the two SECADMs to be an authorization ID - either a user ID or a RACF (or equivalent) group ID. The other SECADM could be a different authorization ID, or it could be a role. The main point: give this some thought, and have your ducks in a row before you throw the SEPARATE_SECURITY=YES switch.

Monday, August 30, 2021

Db2 for z/OS: Package-Related Things of Which You Should be Aware

Db2 for z/OS packages have been around for a long time - since, I believe, V2R3, back in the early 1990s. Even so, I've found myself talking more about packages to people lately than I have in quite some years. These discussions - conducted in meetings, in phone calls and through email - have highlighted for me some related topics and items of information of which more folks should be aware. I've packaged (no pun intended) these informational nuggets in this blog entry, and I hope you'll find them to be useful. Without further ado, and in no particular order...

Many Db2 for z/OS packages are associated with static SQL, but "static SQL" is a term not universally understood by people who work with Db2. Sure, Db2 DBAs are familiar with the concept of static SQL. So, too, are application developers who code SQL statements in certain programs - typically, programs that 1) execute in the same z/OS LPAR as the target Db2 system, 2) are written in COBOL, and 3) execute as batch jobs or CICS transactions (though they could be stored procedure programs, and might in that case be written in SQL PL). Application developers who code Db2-accessing SQL statements in JDBC or ODBC form may not be so familiar with the static SQL concept. For those folks: a static SQL statement is one that has already been prepared for execution, via the Db2 BIND process, before it is ever even issued for execution by a program. A Db2-related SQL statement that is not static is referred to as dynamic - it is prepared for execution when Db2 first sees it coming in from an application program (statement preparation can be relatively expensive, so the prepared form of a dynamic SQL statement is stored in-memory by Db2 in its dynamic statement cache - that prepared form of the statement can be reused if the same dynamic statement comes along again). SQL in JDBC or ODBC form is dynamic from the Db2 perspective. If a Db2-accessing program contains static SQL statements, that program has an associated package, and the package in that case can be thought of as the compiled, executable form of the program's static SQL statements.

Dynamic SQL statements are also associated with packages. Any SQL statement executed in a Db2 for z/OS system will always have a package associated with it. If it's a dynamic SQL statement, the associated package will likely be one of the IBM Data Server Driver packages (sometimes referred to as the Db2 Connect packages - entitlement to use the IBM Data Server Driver is by way of a license for Db2 Connect).

Db2 packages are associated with collections. As far as I'm concerned, this is one of the best ways to conceptualize a collection: a package's collection is kind of like a table's schema name (i.e., its high-level qualifier). Does that mean that the same package can be in more than one collection? Yes - and hold onto that thought.

SQL statements of the DDL variety are also associated with packages. As previously mentioned, any Db2-targeting SQL statement will be associated with a package. That being the case, a DDL statement (data definition language, including ALTER and CREATE) will be issued by way of a package, same as a DML statement (data manipulation language, including SELECT, INSERT and UPDATE). Here's where this sometimes trips up even an experienced Db2 DBA: starting with Db2 12 for z/OS, a package's APPLCOMPAT value affects DDL as well as DML statements (with Db2 11, APPLCOMPAT was relevant only for DML statements). Want to ALTER a table with a KEY LABEL specification, to leverage the data set encryption feature of z/OS? Make sure that the APPLCOMPAT value of the package through which the ALTER TABLE statement will be issued (that might be a SPUFI or a DSNTEP2 package) has an APPLCOMPAT value of V12R1M502 or higher, because the KEY LABEL option of ALTER TABLE (and ALTER STOGROUP) was introduced with Db2 12 function level 502.

Access path changes: don't forget the PLANMGMT safety net, and don't cut a hole through that safety net if you don't have to. For a static SQL statement, access path selection (e.g., will a certain index be used in the evaluation of a predicate, will two tables be joined using the nested loop or merge scan or hybrid join method, etc.) is done at package bind time (for a dynamic SQL statement, access path selection is done when the statement is issued by an application program, unless the statement was previously prepared and its prepared form is still in Db2's dynamic statement cache). If a static SQL-containing package is rebound, access paths for the package's SQL statements could change (the APREUSE option of REBIND PACKAGE can be used to tell Db2 to stick with a package's existing SQL access paths, if possible). If a rebind action results in access path changes, the expectation is that the changes will be beneficial from a performance perspective, but that is not always the case. Sometimes, an access path change will have a negative impact on performance (incomplete or inaccurate catalog statistics, for example, could cause the Db2 query optimizer to choose an access path that appears to be good for performance but in fact is not). In such a case, the plan management feature of Db2, which is "on" by default (check the value of PLANMGMT in ZPARM), provides a safety net: it causes Db2 to retain the previous instance of a package when the package is rebound. In the event of a performance degradation caused by a rebind-related change in one or more query access paths, the previous and better-performing instance of the package can be quickly restored via another REBIND PACKAGE action with SWITCH(PREVIOUS) specified. That's great, but sometimes an organization will cut a big hole in this safety net. How? By unnecessarily doing BIND REPLACE for packages, versus REBIND. BIND REPLACE is needed when a program's static SQL statements are modified, because that necessitates a new precompile of the source program, which generates a new DBRM (a file containing the program's static SQL statements) that subsequently has to be bound to replace the prior version of the related package. If a BIND REPLACE results in a performance problem that is of an access path nature, REBIND with SWITCH(PREVIOUS) is not an option, because that restores the previous instance of the same version of the package, and the BIND REPLACE (assuming it was preceded by a Db2 precompile of the program source) generates a new version of the package. Here's where the "unnecessary" comes in: quite often, an organization will have a process whereby a static SQL-containing program will be re-precompiled and BIND REPLACE-ed whenever a program's source is changed, even when the source change does not change the program's static SQL statements. When source code other then SQL is changed, a new precompile and an associated BIND REPLACE is not necessary. Moral of the story: don't do program re-precompiles and BIND REPLACEs when you don't need to do that. And while I'm on the topic of unnecessary package-related actions...

Don't do unnecessary rebinds of packages. At many Db2 for z/OS sites, certain Db2 table spaces are REORGed quite frequently (maybe on a weekly basis), and catalog statistics are regenerated for the table spaces, and packages with static SQL statements targeting tables in the table spaces are rebound. DON'T DO THAT (the package rebinds, I mean), absent a really good reason. Here's the thing: if you have a package of well-performing static SQL statements, keep in mind that one of the really good things about static SQL is that the good performance is locked in, from an access path perspective. Why mess with that? Leave your well-performing packages alone. REORGing and RUNSTATing table spaces from time to time? Good idea. Rebinding dependent packages after running those utilities? Not such a good idea. Am I saying that you shouldn't ever rebind packages? Of course not. Just do it when it makes sense. When does it make sense? Well, it's a good idea to rebind all of your packages (and plans, too) soon after migrating to a new version of Db2, while still at the 100 function level (what we used to call conversion mode), because a) the new version's optimizer enhancements are available at that time, and b) even when access paths don't change, the package code generated by the new version of Db2 (packages are executable code) is likely to be at least a little more CPU-efficient than the code generated for the same package by the previous version of Db2. It also makes sense to rebind a package when you want a query associated with the package to use a newly created index on a table (that would of course require query re-optimization, which a rebind does). And certainly you'd rebind if you wanted or needed to change a package characteristic such as its RELEASE setting (COMMIT or DEALLOCATE) or its isolation level (e.g., cursor stability or uncommitted read) or its APPLCOMPAT value. Bottom line: have a reason for rebinding a package or packages (and "Because we just REORGed and RUNSTATed a table space on which the package depends" is usually not a good reason).

Remember that EXPLAIN information for a package is always available. Do you need to get access path information for a package that was bound without a specification of EXPLAIN(YES)? No problem: just use the EXPLAIN PACKAGE statement (this will extract access path information from the package itself - that information has been part of a package's structure since Db2 9 for z/OS). Want to know what a package's access path would be, if the package were to be rebound? No problem: just rebind the package with EXPLAIN(ONLY) - you'll get the access path information, but Db2 won't actually regenerate the package. More information about EXPLAIN PACKAGE and rebind with EXPLAIN(ONLY) can be found in an entry I posted to this blog a few months ago.

Get different behaviors for the same package by using different bind specifications and different collections. I mentioned previously that a collection is somewhat akin to a schema name for a package. Just as tables that are logically identical and have the same unqualified name can be in different schemas, so packages that are generated from the same SQL statements can be in different collections. This can be very handy when you want different behaviors for the same package in different situations. For example: suppose that in some cases you want queries associated with package PROGXYZ to be executed with cursor stability isolation, while in other cases you want the same queries to be executed with uncommitted read isolation. No prob: bind the package with ISOLATION(CS) into collection CSCOLL (or whatever name you want to use for the collection) and with ISOLATION(UR) into collection URCOLL (and that second bind action might be a BIND COPY from CSCOLL into URCOLL, with the different ISOLATION value specified). How do you "point" an application that uses the PROGXYZ package to the CSCOLL or the URCOLL collection, as needed? For a static SQL package used with (for example) a batch job or a CICS transaction, you could do this with a plan's PKLIST. You could also let the program itself operate the "switch in the tracks" by using the SET CURRENT PACKAGESET special register; or, for applications that are DRDA requesters you could point to one collection or another by way of the Db2 profile tables (and here we could be talking about alternate collections of the IBM Data Server Driver / Db2 Connect packages). You could turn statement concentration (automatic parameterizing by Db2 of dynamic query predicates that reference literal values) on or off for different applications in this way, or use different APPLCOMPAT values, or provide a different high-level qualifier for unqualified table names in SQL statements, or do a number of other things (to get the ideas flowing, check out the list of package bind options in the online Db2 documentation).

OK, that's enough package stuff for one blog entry. I hope that this information will be helpful for you. As always, thanks for stopping by.

Friday, July 30, 2021

CREATE OR REPLACE: Agile Deployment of Db2 for z/OS Stored Procedures

A relatively recent Db2 for z/OS enhancement that might have escaped your notice may well merit your consideration. Here's what I'm talking about: Db2 12 function level 507 (available since June of 2020, related to APAR PH24371) introduced CREATE OR REPLACE syntax for stored procedures. In a Db2 for z/OS context, CREATE OR REPLACE syntax was initially provided for Db2 12 advanced triggers, and it eliminated what had been a real irritant with regard to modifying a trigger. Extending CREATE OR REPLACE to stored procedures delivers a big benefit that I can sum up in two words: deployment agility.

Agility in development and deployment of new application functionality has very much grown in importance over the past few years, to no one's surprise. In an age of increasing enterprise digitalization, getting new capabilities and services into the hands of your customers and users in a timely and frequent way can be major competitive differentiator (or, in the case of a public sector organization, a source of citizen delight). Anyone who works with and advocates for Db2 for z/OS wants that DBMS to be fully plugged into an enterprise's DevOps pipeline, and that makes agility of deployment of the Db2 for z/OS objects associated with an application a priority. Suppose a Db2 for z/OS-accessing application utilizes some Db2 stored procedures, and a new iteration of the application is going to require modification of some of those stored procedures. Does that create any issues that could cause some friction on the deployment front? Quite possibly, yes, prior to Db2 12 function level 507.

How could a need to modify some Db2 stored procedures put some sand in the deployment gears? It comes down to the pre-function-level-507 options for deploying a new or modified stored procedure in "the next-higher-level" environment (e.g., dev to test, test to prod). Let's say that we're deploying to the production Db2 for z/OS environment, and let's say further that the stored procedures are of the native SQL variety (i.e., they're written in SQL PL), because it's for such procedures that CREATE OR REPLACE syntax is particularly helpful. OK, how do you get those modified native SQL procedures deployed to production in a pre-function-level-507 environment? You could get the job done using the BIND PACKAGE command with the DEPLOY option, but that approach is less than ideal from an agility perspective. Why? Because it relies on a Db2 command (BIND PACKAGE) versus a SQL statement. A Db2 for z/OS DBA might say, "What's the diff? I can use a Db2 command as easily as I can use a SQL statement." Sure you can, if you're using a tool like SPUFI (the TSO/ISPF-based interactive interface that comes with Db2), but DevOps engineers are not Db2 DBAs, and they're not SPUFI users. DevOps engineers like to use tools that work with all of the systems and servers across which an application's components are deployed. In other words, they have in mind a unified DevOps pipeline that can be used for all of an enterprise's applications. The tools generally used for such a pipeline are much more suited to issuing SQL statements than DBMS commands.

OK, BIND PACKAGE with DEPLOY is not your only option in a pre-function-level-507 Db2 system. You could go the drop and re-create route, couldn't you? Drop the already-there stored procedures in the production environment, and re-create them with the desired modifications. DROP PROCEDURE and CREATE PROCEDURE are SQL statements, not Db2 commands, so this approach satisfies the agility requirement, right? Eh, not really. Why not? Well, here's one problem: after you drop and re-create the Db2 stored procedures, you have to re-grant associated authorizations (referring to the EXECUTE privilege that one or more IDs need on the stored procedures). That can be a hassle. Another problem with the drop and re-create approach: if a stored procedure that you want to drop is called by a SQL PL routine (such as a native SQL procedure), the DROP will fail. You could drop the calling SQL PL routine that was blocking the drop of the stored procedure, but then you'd have to re-create that routine in addition to creating the dropped stored procedure. And, what if the DROP-blocking SQL PL routine is itself called by another SQL PL routine? You'd have to drop and re-create that routine, too. As you can see, the situation can get pretty messy pretty fast.

Enter CREATE OR REPLACE functionality, and things get easy peasy. I'll illustrate with an example. Let's say that we have this native SQL procedure already in the production Db2 environment:

CREATE PROCEDURE MYPROC1

( IN P1 CHAR(5),

OUT P2 DECIMAL(15,2) )

BEGIN

SELECT AVG(SALARY) INTO P2

FROM DSN8C10.EMP

WHERE WORKDEPT = P1;

END

OK, now I'm going to deploy a slightly modified form of this stored procedure (slight modification highlighted in orange). Function level 507 (or higher) is activated in my Db2 12 environment, and the package through which I'm issuing SQL statements (an IBM Data Server Driver package, or whatever) has an APPLCOMPAT value of V12R1M507 (or higher), so I can use CREATE OR REPLACE syntax, as shown below.

CREATE OR REPLACE PROCEDURE MYPROC1

( IN P1 CHAR(5),

OUT P2 DECIMAL(15,2) )

BEGIN

SELECT AVG(SALARY + 1000) INTO P2

FROM DSN8C10.EMP

WHERE WORKDEPT = P1;

END

What will Db2 do in this case? Db2 will check to see if a stored procedure named MYPROC1 already exists in the production system. One does (we created it earlier), so Db2 replaces the existing MYPROC1 procedure with the modified MYPROC1. After that replace action has been accomplished, do I need to re-grant authorizations for the MYPROC1 procedure? No! Why not? Because you didn't drop MYPROC1 (the word "DROP" appears nowhere in the SQL statement above). What if MYPROC1 is called by a SQL PL routine? Will that block the replace action? No! Why not? Because the SQL PL caller of MYPROC1 would block a DROP of MYPROC1, and you didn't drop MYPROC1 (you replaced MYPROC1, and a procedure replace action is not blocked by a SQL PL routine that calls the procedure being replaced).

Go forward in time a bit more, and we need to deploy yet another modified form of MYPROC1. This time, I want to add a version V2 of MYPROC1, as opposed to replacing the existing version of the procedure, so I specify VERSION V2 in my CREATE OR REPLACE statement (I can do that if the stored procedure is a native SQL procedure - can't if it's an external stored procedure written in a language such as COBOL or C):

CREATE OR REPLACE PROCEDURE MYPROC1

( IN P1 CHAR(5),

OUT P2 DECIMAL(15,2) )

VERSION V2

BEGIN

SELECT AVG(SALARY + 5000) INTO P2

FROM DSN8C10.EMP

WHERE WORKDEPT = P1;

END

What will Db2 do this time? It will check to see if there is already a version V2 of MYPROC1 in this production system. There isn't, so Db2 creates version V2 of MYPROC1.

One more illustrative change: I want to modify version V2 of MYPROC1. To do that, I issue a CREATE OR REPLACE PROCEDURE statement for MYPROC1 that includes a reference to version V2 of the procedure:

CREATE OR REPLACE PROCEDURE MYPROC1

( IN P1 CHAR(5),

OUT P2 DECIMAL(15,2) )

VERSION V2

BEGIN

SELECT AVG(SALARY + 9000) INTO P2

FROM DSN8C10.EMP

WHERE WORKDEPT = P1;

END

Db2 will check to see if there is already a version V2 of MYPROC1 in the production system. There is, so Db2 will replace version V2 of the procedure.

And there you have it: a SQL statement-driven (not command-driven) means of creating or replacing stored procedures in "the next-higher-level" environment that avoids the hassles associated with a drop and re-create approach. Big plus. Organizations have already been doing agile things with Db2 for z/OS stored procedures (especially the native SQL kind), like using their source code management tool of choice (GitLab, for example) to manage stored procedure source code (which, for a native SQL procedure, is the procedure's CREATE PROCEDURE statement), and using the IBM Db2 for z/OS Developer Extension for Visual Studio Code to give developers an assist in creating and debugging native SQL procedures. Add CREATE OR REPLACE PROCEDURE to that picture, and you take development and deployment agility for Db2 stored procedures (especially the native SQL kind) to a whole new level. Check it out.

Wednesday, June 30, 2021

Db2 for z/OS: REST Services Scalability

I received an email the other day from a Db2 for z/OS system administrator who had recently attended a workshop on the Db2 REST interface. He mostly liked what he heard, but he had a concern about the scalability of Db2 REST services. Knowing that REST requests are stateless in nature, he anticipated that a large volume of such requests targeting a Db2 for z/OS server would drive a large amount of activity involving connection processing, thread processing and user authentication and authorization. Wouldn't those areas of system operation end up being bottlenecks that would throttle transaction throughput for client applications?

The Db2 system administrator's concerns were certainly understandable, but a Db2 for z/OS system can in fact handle a very large volume of REST requests - several thousand per second should be do-able, assuming sufficient server-side processing capacity. In this blog entry, I'll provide information on how scalability is achieved for Db2 for z/OS REST services, referring to the three areas of associated server-side processing mentioned by the Db2 system administrator.

Connection processing

This is where the single most important step was taken by the IBM Db2 for z/OS developers responsible for the Db2 REST interface: they leveraged Db2's inactive connection functionality. A LONG time ago, Db2 got this functionality via the ZPARM parameter CMTSTAT. Starting with Db2 V8 for z/OS, the default value for CMTSTAT became INACTIVE, and here's what that means: say an application server establishes a connection to a Db2 for z/OS subsystem via that Db2 system's distributed data facility (DDF - the Db2 address space that handles communication with network-attached applications). A transaction associated with the connection is initiated, and the connection from the application to Db2 is paired with a DBAT (a DDF thread) for the purpose of SQL statement processing. The transaction completes, and the DBAT goes back in the DBAT pool, and the connection goes into an inactive state. This "inactive state" of the transaction is strictly a Db2-side thing - it is not visible to the application, which in a logical sense continues to see that it is connected to the Db2 system. When the next transaction associated with the connection is submitted, the connection on the Db2 side is "snapped back" to an active state, and paired with a DBAT for SQL processing, and then the transaction completes and the connection is again separated from the DBAT and placed back in an inactive state, and so on and so on.

Why was inactive connection functionality built into Db2? Originally, this was done largely to enable a single Db2 subsystem to support a very large number of connections from network-attached applications - there can be up to 150,000 such connections to one Db2 subsystem (n times 150,000 for a Db2 data sharing group with n members). No way can one Db2 subsystem handle anything like 150,000 concurrently-executing DDF transactions (never mind the processing power you'd need - there wouldn't be enough "below-the-2GB-bar" virtual storage in the Db2 database services address space for that number of threads), but Db2 doesn't have to be able to process as many in-flight DDF transactions as there are connections to Db2, because at any moment in time the majority - often, the large majority. - of those connections will be in a not-being-used-right-now state. This is why the value of the Db2 ZPARM parameter CONDBAT (maximum connections to a Db2 subsystem via DDF) is typically several times the value of MAXDBAT (maximum number of connections for which transactions can be concurrently in-flight). Db2 inactive connection functionality makes all this feasible because the CPU cost of flipping a connection from the inactive to the active state and back is very small, and the "footprint" of an inactive connection (virtual storage used, etc.) is very small.

OK, what does this have to do with the scalability of Db2 REST services? Well, you may be aware that REST requests are "stateless" in nature - each and every REST request is its own independent transaction (one unit of work, from a Db2 perspective), with no expectation or assurance of a related follow-on request from a given client-side application. With that being the case, you might think - like the Db2 system administrator to whom I previously referred - that there would be a whole ton of Db2 connect and disconnect activity associated with a large volume of REST requests. This isn't, after all, like the typical Db2 DRDA application scenario (DRDA - distributed relational database architecture - being Db2's protocol for distributed database processing) in which an application connects to a Db2 subsystem and stays connected to that subsystem for a long time (hours, maybe days). Connect and disconnect processing is kind of expensive. Wouldn't that cost put the kibosh on Db2 REST service scalability? It might have, but for an enhancement delivered a few years ago via the fix for Db2 APAR PI86868: when a client application sends a REST request to a Db2 subsystem, the associated connection will be retained by the Db2 subsystem, in an inactive state, for up to 15 seconds, so that it can be used again to process another request sent via that connection. When that next REST request associated with the connection (i.e., from the same client IP address and ephemeral port) comes along - assuming that that happens within the aforementioned 15-second window - the connection will be snapped from the inactive to the active state. The request will be processed, and the connection will again be flipped to the inactive state, where it can remain for up to another 15 seconds while Db2 waits for another REST request associated with the connection. If the inactive connection remains unused for 15 seconds it will be terminated by Db2.

In a high-volume REST request situation - exactly the scenario that would otherwise cause us concern - it is highly likely that Db2's temporary retention of connections from REST request-issuing applications will result in a lot of connection reuse (it doesn't take a whole lot of volume to get tp one request every 15 seconds or less from one connection), and that is a key factor in the scalability of Db2 REST services.

Thread processing

In this area, Db2 REST service scalability is boosted by something that's been around (as I recall) since the early 1990s: the DBAT pool. When Db2 creates a DBAT to service a DDF transaction, it doesn't just chuck that DBAT when the transaction completes. Instead, it puts the DBAT in the DBAT pool so that it can be used again to service another transaction (a DBAT can stay in the pool for a number of seconds specified by the ZPARM parameter POOLINAC, which has a default value of 120 - if the DBAT has not been reused in that time, it will be terminated). If you use your Db2 monitor to generate a statistics long report (depending on the monitor product used, that might be called a statistics detail report), you should see a section in the report with a heading like GLOBAL DDF ACTIVITY. In that section you should see a couple of fields with names like DBATS CREATED and POOL DBATS REUSED. If we call the former field A and the latter B, the DBAT reuse rate is B / (A + B). It is very common, in a production Db2 environment with a significant DDF workload, to see a DBAT reuse rate that exceeds 99.9%. High rates of DBAT reuse have long been part of the Db2 DDF efficiency story at many sites, and they are an important contributor to Db2 REST service efficiency and scalability.

[Fun fact: though you might think of "active" DBATs as those currently being used to service DDF transactions, and 'inactive" DBATs as those in the pool, that is not an accurate conception. In fact, all DBATs, including those in the pool, are active. What you have are DBATs that are in a connected state, servicing DDF transactions, and DBATs that are in a disconnected state - the latter are the ones in the pool.]

Keep in mind that once a DBAT is provided for use in processing a REST request, the package associated with the requested service has to be allocated to the DBAT. Here, you can enhance efficiency by making it very likely that the package in question will be in-memory, in the oddly named skeleton pool (the package cache - part of the Db2 EDM pool). How can you check on this? Generate a statistics long report, using your Db2 monitor, and look for the section in which EDM pool activity is provided. In that report section, you'll see a value for something like PT REQUESTS (PT is short for package table - this is a request for a package) and another value for something like PT NOT FOUND (meaning, not found in the skeleton pool, in which case the package has to be retrieved from the Db2 directory on disk). The ratio of PT REQUESTS to PT NOT FOUND should be at least several thousand to one, and ideally multiple tens of thousands to one - if it isn't (and if the z/OS LPAR is not short on real storage), make the skeleton pool larger (the ZPARM parameter to adjust is EDM_SKELETON_POOL).

User authentication and authorization

Let me cover authorization first. What this involves is checking to see if the ID associated with a REST request has the Db2 EXECUTE privilege on the package that will be executed to provide the requested service. That's it - a good bit simpler versus checking the various privileges that might be required for execution of dynamic SQL statements (e.g., the SELECT privilege on table T1, the UPDATE privilege on table T2, etc.). This straightforward EXECUTE-on-package check is made particularly efficient by an in-memory cache of package authorization information maintained by Db2. I have seen on a number of occasions that a Db2 subsystem's package authorization cache, sized by way of the ZPARM parameter CACHEPAC, is smaller than it should be, leading to a sometimes high volume of accesses to the SYSPACKAUTH table in the Db2 catalog to check package execution authorization. That too-small situation will soon be (maybe already is, for your system) a thing of the past, thanks to the fix for Db2 APAR PH28280, which removes CACHEPAC from ZPARM and sets its value internally to the maximum size of 10,485,760 bytes.

The value of package authorization caching is particularly high when a relatively limited number of IDs are checked for package execution authorization, and that is often the case in a DDF application environment, whether the client applications are issuing REST requests or SQL statements - it is quite common for the ID in question to be an application's ID, as opposed to an end-user's ID (in such a case, the application performs authentication for end users on its end, and then interacts with Db2 on behalf of end users).

The user authentication story is much the same. When processing a DDF transaction (REST-related or otherwise), Db2 authenticates the associated ID (and accompanying password or certificate) by invoking the RACF macro RACROUTE REQUEST=VERIFY (and I recognize that you might use in your environment a different z/OS security manager from another vendor). If the ID is successfully authenticated, Db2 caches the credentials in-memory for up to three minutes. If, during that interval, the same ID needs to again be authenticated then Db2 will reuse the cached credentials and there will not be a need for another invocation of the RACROUTE REQUEST=VERIFY macro. This mechanism can majorly enhance user authentication efficiency and scalability related to Db2 REST requests, just as Db2's package authorization cache enhances authorization scalability for such requests.

And there you have it. With regard to connection processing, thread processing, and ID authentication and authorization processing, Db2 for z/OS provides mechanisms to deliver scalability and efficiency for REST service requests. You can use Db2's REST interface with confidence, even for high-volume applications.

Friday, May 28, 2021

The REST Interface: Db2 for z/OS as a Utility, and the Role of SQL Programmers

Recently, I was out on the West Coast of the United States to celebrate my son's graduation from college. At a party with some students and their parents, I conversed for a while with one of my son's good friends. This young man was about to get his degree in computer science. We talked shop, and I asked him about the code he'd typically write to interact with a data server. Perhaps he used a generic (i.e., non-DBMS-specific) SQL interface, such as JDBC or ODBC? No, he said, he really didn't know SQL that well. I was a little surprised (on the inside) to hear that. I mean, SQL is kind of fundamental, isn't it? Then I reframed my question: "OK - I guess you'd rather just invoke a data service via something like a REST API, and let that service do what you want done, without having to worry about how the service works or being concerned with any of the particulars of the back-end system on which the service executes?" "Yeah," he said. "More like that."

I know this friend of my son is a smart and hard-working guy. It's not at all like he's too lazy to write data-access code like SQL. He doesn't care to do that because he doesn't need to. He's part of a new generation of developers who see a universe of services of all kinds. Sometimes they'll be involved in the creation of new services. Often they'll look to discover and invoke existing services - data-related and otherwise - to provide application functionality needed by organizations and end-users. This change in the application-development mindset is gaining momentum, and it will have some fairly profound implications for people who administer Db2 for z/OS and other data-serving platforms. Earlier this year I posted a blog entry that focused on the nuts and bolts of Db2's REST interface. In this entry, I'm going to shift from the "how" of the Db2 REST interface to the "so what?"

A really big "so what" for Db2 for z/OS people will be the change in how the data server they support is perceived by application developers - particularly, developers on the client side of client-server applications. For growing numbers of such developers, Db2 for z/OS itself will be seen as, essentially, a utility. And I'm not using "utility" in the technical Db2 sense (e.g., the REORG utility); rather, I mean "utility" as in, "electric utility." Think about that. When you flip the light switch in a room, you expect electricity to flow and lights to come on. You probably don't think about where the electricity came from (maybe a power plant located some miles from your home), or how it was generated (maybe a natural gas-powered generator, maybe a wind farm, maybe a hydroelectric installation by a dam). Your "electricity interface" - the light switch - is easy to use.

if you go from your home to someone else's home, or to a hotel room or to a conference room in an office building, you easily locate the light switch because it looks like the ones in your home; and, you benefit from "consistency of electricity invocation": whether you're at home or in an office or wherever, you flip the switch up and electricity flows. Flip the switch down, flow stops and the lights go out.

Suppose you have an electrical outlet hooked to a switch. You have all manner of electrical devices - a music player, a blender, a printer. They can all be utilized through the same standard interface (the outlet). If you take these devices to another room in another building, the interface is the same: plug 'em in and use 'em. Your thinking is about what you'll do with the electricity you can invoke through the standard interface - not about the way in which the electricity was generated for your use.

OK, your a client-side application programmer, coding in accordance with the REST architectural style. You have a consistent interface by which you can invoke services, and some of those services might be data-related. You (or, more likely, your program will) "flip the switch" by issuing a REST request, and the requested data service (retrieve such-and-such data associated with thus-and-so input, or persist this value, or update this value, or whatever) will get done, as you expect (the lights come on), and you (or your program will) move on to some other action. How did that data service get performed? Where did that data service get performed? DON'T KNOW. DON'T CARE. It worked, didn't it? Do you care about that? Probably not. Why? Because the result - successful execution of the requested service - was expected. When you flip the light switch in a room, do you think, "Cool! THE LIGHTS CAME ON!" No? Neither do I.

Shift the perspective now to that of the people - like DBAs - who work at the "data-services utility," which they know as a Db2 for z/OS system. That system has such cool stuff, doesn't it? Partition-by-growth table spaces! Automatic clean-up of pseudo-deleted index entries! Relative page numbering! LOB in-lining! OLAP expressions! Global locking on a Parallel Sysplex! We geek out about this stuff at conferences. Wouldn't client-side developers like to know about that cool stuff? No, they would not, and that is especially true if their way of interacting with your data-serving utility - invoking REST APIs - completely abstracts your world from their point of view.

Is this "don't know, don't care" mind set of client-side developers regarding the particulars of the system you support a reason to slump your shoulders and feel down about what you do? Of course not. I know plenty of people who work for electric utilities. You can take pride in doing what they do for their consumers: you provide a service (execution of data-processing services) that is easy to invoke via a consistent interface (REST requests). The provided service does WHAT is requested WHEN it is requested (any time, day or night) and as expected (no surprises). Interestingly, the BETTER your data-services utility is at doing what it does, the LESS your consumers (client-side developers) will think about the goings-on at your end. Freak-out time is when the switch is flipped and the lights DON'T come on, or they come on in an unsteady, flicker-y kind of way. You keep delivering service that is boringly (from the consumer perspective) excellent, and when those consumers DO have reason to think about your utility, it will be because they need a data-server for a new application and they want to use that data-services utility that has that rock-steady quality - the one you support.

And let's not forget, in our survey of personas in this new client-server world, the group of people who are key colleagues of the DBAs and the systems programmers at the Db2 for z/OS utility: the SQL programmers. That's right. Remember, when the REST architectural style is employed there is no client-side SQL. Any SQL statements executed at your site will be those invoked via REST requests issued by client-side developers. The SQL - whether an individual statement or a stored procedure - that performs a service associated with a REST request has to be developed by someone, and that someone is generally going to be, from the client-side developer's perspective, a person who works at the data-service utility. A client-side developer may need a new service (as opposed to the situation in which you REST-enable an existing service), and that service may be performed on the utility side through execution of SQL. That detail doesn't matter to the client-side developer, who just says, "I want to send this input and have this action performed and receive this output." Someone on the utility side will analyze the specifications of the required service and will code SQL to get it down. The SQL will be REST-enabled, and away things go: service is invoked (switch is flipped) on the client side, and on the utility side the associated SQL statement (about which the consumer likely has no clue) is executed, and it executes quickly and efficiently and reliably because the DBAs and systems programmers have constructed and maintain an environment that delivers those qualities of service. Thus it is that the SQL developers' role will come to overlap more and more with the role of the DBAs, and vice versa. They're on the same team, and together they provide the so-good-it's-not-noticed level of data service to the client-side consumers.

I've been doing what I do for the better part of 35 years, and this emerging application development landscape has me feeling young again. Client-side developers, I am A-OK with you having little to no visibility with regard to the the technology I support and what I do to support it. When you flip the switch, and my utility is responsible for the lights coming on right away and as expected so that you (your program) can proceed to do something that needs doing, I'll be happy even if you don't give it another thought. Trust me on that one.