Monday, December 30, 2019

Db2 for z/OS and RACF, Part 1: External Db2 Security

In a Db2 for z/OS context, RACF (the IBM security management subsystem for z/OS systems) can serve two important purposes, providing security for a Db2 environment in an external and an internal sense. Db2 external security is about determining which users/processes can connect to a Db2 subsystem, and how. Internal security concerns the Db2-related things that a user/process can do once the user/process has connected to a Db2 subsystem. External security for a Db2 subsystem is almost entirely a RACF matter, whereas internal security can be managed either through Db2 (largely via GRANT and REVOKE statements) or through RACF (via a Db2-supplied exit provided for this purpose). The overall subject is a little too broad for a single blog entry, so I'll tackle it in two parts. This first entry will focus on RACF in a Db2 external security sense. Part two, which I'll post within the next 3-4 weeks, will cover RACF management of Db2-internal security (which, again, is an alternative to Db2 management of Db2-internal security).

[Note: in this blog entry (and the next) I will focus exclusively on RACF, versus alternative z/OS security management products available from other vendors. If you use such an alternative product, you can check with the appropriate vendor to see how the RACF features and functions that I describe herein map to the alternate product's capabilities.]

With regard to RACF-provided Db2 external security, there is first of all a fundamental area of RACF functionality of which most every Db2-using organization makes use. Here, I am referring to the ability RACF provides to create (and authenticate) user profiles - basically, the ability to define and manage authorization IDs in a z/OS system. These are often referred to as "user IDs," but a user ID may or may not be associated with a user who is a person - it may be an ID associated with an application process. When a user or an application process wants to connect to a Db2 for z/OS subsystem, it presents an ID and an associated authentication credential (that credential is typically a password, but it could be a certificate - more on the latter below). When the ID and password are presented, they are usually processed through one of two Db2-supplied exits, called the sample connection exit (named DSN3SATH) and the sample sign-on exit (named DSN3SSGN). These exits are referred to as "sample" because they are provided in source form, and that source code can be modified if desired or required (note that when the sample connection and sign-on exits are used, from an execution perspective they are named DSN3@ATH and DSN3@SGN, respectively). The connection exit is driven when a subsystem/started task (e.g., CICS or IMS) or a user or a process wants to connect to a Db2 subsystem, and the sign-on exit is driven when a process (usually a CICS or an IMS transaction) wants to access Db2 through a subsystem/started task (e.g., CICS or IMS) that is already connected to the Db2 subsystem in question.

[As noted, passwords are typically used to authenticate presented IDs because that is a very easy thing for a security administrator to set up. If, however, there is a desire or a need to use certificates instead of passwords for ID authentication, that support has been there since Db2 10 for z/OS. Information on using certificates for ID authentication in a Db2 context can be found in an IBM "red paper" titled, DB2 for z/OS: Configuring TLS/SSL for Secure Client/Server Communications (that paper can be downloaded from the IBM Web page at http://www.redbooks.ibm.com/abstracts/redp4799.html?Open).]

When the Db2 sample connection or sign-on exit is driven, two main things happen. First, Db2 calls RACF to verify that the presented ID is valid and that the accompanying authentication credential (password or certificate) is in fact the one associated with the ID - if that verification fails, the connection or sign-on request is rejected. Second, Db2 requests that RACF provide a list of all the group IDs (if any) to which the presented user ID is connected. Those RACF group IDs become the Db2 secondary authorization IDs associated with the primary authorization ID. Secondary authorization IDs are very helpful in managing Db2 privileges: when a privilege is granted to a RACF group ID, it is available to all of the user IDs connected to that group, saving the trouble of granting the privilege separately to each individual user ID - as in the example below.
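
To illustrate, here is a minimal sketch (the group, user and table names are hypothetical): if user IDs SMITH and JONES are connected to RACF group DBAPPGRP, a single GRANT to the group covers both of them.

GRANT SELECT, INSERT, UPDATE, DELETE ON PRODDB.ORDERS TO DBAPPGRP;

When SMITH connects to the Db2 subsystem, the connection (or sign-on) exit obtains DBAPPGRP from RACF as one of SMITH's secondary authorization IDs, so the privileges granted to DBAPPGRP are available to SMITH - no individual GRANT to SMITH is needed.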

OK, that's basic RACF external security for Db2: no valid user ID (or lack of the required password or certificate), no connection to the Db2 subsystem. Suppose, though, that you need connection-permission security to be handled in a more granular fashion. For example, you may have a need for ID SMITH to be able to connect to a Db2 subsystem called DBP1, but only as a DRDA requester, or only as a REST requester, or only by way of TSO. In other words, SMITH can connect to DBP1 this way, but not that way. If such a need exists, RACF can do the job, but a little more set-up is required. For one thing, a RACF general resource class called DSNR has to be active. The DSNR resource class pertains specifically to Db2 connection permissions from a RACF perspective, and activating that resource class enables one to be more granular in the management of Db2-external security.

With the DSNR resource class activated, a security administrator can set up various profiles that enable management of Db2 connection permissions based on particular "come-from" environments. The available profiles of this nature are listed below (and one would substitute for "ssn" the name of the Db2 subsystem for which connection permission is being managed):
  • ssn.MASS for IMS (MASS is a reference to multi-address space security - access to Db2 via IMS typically involves several IMS address spaces) 
  • ssn.SASS for CICS (SASS is a reference to single-address space security)
  • ssn.RRSAF for the Resource Recovery Services attachment facility (RRSAF)
  • ssn.BATCH for the TSO attachment facility, the call attachment facility (aka CAF) and Db2 utilities
  • ssn.DIST for the DDF address space
  • ssn.REST for access via Db2's REST interface

Once a protected access profile has been defined for a Db2 subsystem, IDs can be permitted (in RACF terminology) to access the named Db2 subsystem by way of the specified access profile. To get a better handle on how all this comes together, consider a situation in which a user with ID SMITH wants to access a Db2 subsystem called DBP1 (a sketch of the associated RACF commands follows this list):
  • If the RACF resource class DSNR has not been activated then from a RACF perspective SMITH can access DBP1 via any type of "come-from" environment (e.g., via TSO, a batch job, as a DRDA requester, etc.), though of course SMITH will not be able to get into the z/OS system itself unless RACF has determined that SMITH is a valid ID and that the required password (or certificate) has been provided for or by SMITH.
  • If the RACF resource class DSNR is active and a profile named DBP1.BATCH has been defined, SMITH will not be able to access DBP1 via TSO or a batch job or a Db2 utility unless the ID SMITH has been permitted to use the DBP1.BATCH profile (alternatively, SMITH could access DBP1 via TSO or a batch job or a utility if SMITH is connected to a RACF group ID that has been permitted to use the DBP1.BATCH profile).
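
Pulling this together, here is a minimal sketch of the RACF commands that might be involved for the DBP1 example (the UACC and ACCESS settings shown are illustrative - an installation's own standards would govern the details):

SETROPTS CLASSACT(DSNR)
RDEFINE DSNR DBP1.BATCH UACC(NONE)
RDEFINE DSNR DBP1.DIST  UACC(NONE)
PERMIT DBP1.BATCH CLASS(DSNR) ID(SMITH) ACCESS(READ)
SETROPTS RACLIST(DSNR) REFRESH

With DSNR active and these profiles defined, SMITH can get to DBP1 via TSO, a batch job or a utility (thanks to the PERMIT for DBP1.BATCH), but not as a DRDA requester (DBP1.DIST is defined with UACC(NONE), and SMITH has not been permitted to use that profile).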

A little something extra to note about the ssn.DIST profile: if user SMITH can access DBP1 via a batch job (because SMITH has been RACF-permitted to do so), and SMITH submits a batch job that references a table in Db2 subsystem DBP2, that access to the table in the DBP2 subsystem will require the ID SMITH to be RACF-permitted to use the DBP2.DIST profile on the z/OS system in which DBP2 runs (assuming that the DSNR resource class is active on that z/OS system, and assuming that DBP1's Db2 communications database has been set up so that DBP1 can be a DRDA requester to DBP2). In this case SMITH will not require RACF permission to use the DBP1.DIST profile, even though SMITH is using DBP1's DDF address space to get to the table in the DBP2 subsystem. That is because the DBP1.DIST profile is checked at connection-request time, and SMITH is already connected to DBP1 by way of a batch job; from a connection-request perspective, it is the DDF address space of DBP2 that is involved here, which is why SMITH needs RACF permission to use the DBP2.DIST profile.

And yet one more thing about access to a Db2 subsystem via DDF: suppose you want to restrict SMITH's access to DBP1 so that connection is only permitted via DBP1's DDF address space (i.e., not batch, TSO, etc.) and only from certain IP addresses. Getting granular to that degree for connection-permission management requires some additional set-up actions. In particular, the RACF SERVAUTH class will need to be used, in combination with z/OS TCP/IP Network Access Control. Information on doing this can be found on this page in the Db2 for z/OS Knowledge Center on IBM's Web site: https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/seca/src/tpc/db2z_grantaccess2remotereq.html.
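
As a rough illustration of the RACF side of that set-up (the system name, TCP/IP procedure name and security zone name used here are hypothetical): the z/OS TCP/IP Network Access Control definitions - the NETACCESS block in the TCP/IP profile - map client IP address ranges to a named security zone, and a SERVAUTH profile is then defined for that zone, with SMITH permitted to use it:

SETROPTS CLASSACT(SERVAUTH)
RDEFINE SERVAUTH EZB.NETACCESS.SYSA.TCPIP.APPZONE UACC(NONE)
PERMIT EZB.NETACCESS.SYSA.TCPIP.APPZONE CLASS(SERVAUTH) ID(SMITH) ACCESS(READ)

Combined with a DSNR-class PERMIT for DBP1.DIST (and no PERMITs for DBP1.BATCH and the other DBP1 profiles), the effect is that SMITH can reach DBP1 only through DDF, and only from the IP addresses mapped to the APPZONE security zone.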

Information on other matters I have covered in this blog entry, such as defining protected access profiles for a Db2 subsystem and permitting IDs to use those profiles, can be found here: https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/seca/src/tpc/db2z_establishracf4db2.html.

Information on the Db2 connection and sign-on exits can be found here: https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/seca/src/tpc/db2z_specifyconnectsignonroutine.html.

I hope that this information will be useful for you. Tune in next month for a look at how RACF can be used to manage Db2-internal security.

Tuesday, November 26, 2019

Clearing Some Things Up Regarding the CICS-Db2 for z/OS Interface

Not long ago, a very experienced Db2 for z/OS person contacted me with some questions about the CICS-Db2 interface. He mentioned that he did this because he was having trouble finding the information he was seeking in the product documentation. I provided this person with answers to his questions, and then thought to myself, "If a veteran Db2 for z/OS technical professional was challenged in getting a clear picture of the CICS-Db2 interface, perhaps other people are in the same boat." In this blog entry, I'll try to explain how the piece parts of the relationship between a CICS region and a Db2 subsystem fit together, and in doing that I'll aim as well to clarify some of the terminology one might encounter in this domain.

The specific questions

There were a couple of questions that sparked my exchange with the Db2 person. First, he noted that when he and a CICS colleague of his displayed information about CICS-Db2 threads for a certain CICS region (using either a performance monitor or the Db2 command -DISPLAY THREAD), they'd see that a lot of the threads were in IDENTIFY status, and they weren't sure as to what that meant.

Second, they saw that the total number of threads appeared to equal the value of the TCBLIMIT parameter for the CICS region in question (more on TCBLIMIT to come), and they also observed that the TCBLIMIT values for the CICS regions that access a Db2 subsystem added up to a number that exceeded the CTHREAD value in the Db2 subsystem's ZPARM module, and this caused them some concern (CTHREAD is the maximum number of "local to Db2" threads - i.e., not DDF threads - that can be allocated at one time).

Where is the interface between a CICS region and a Db2 subsystem defined?

The veteran Db2 person who directed these questions to me referenced a parameter in "the RCT." I knew exactly what he meant, and "RCT" is an acronym often used in describing a CICS-Db2 set-up; however, it is a technically outdated term. Here's the deal: RCT, which is short for resource control table, refers to a macro that, long ago, provided the specifications for the interface between a CICS region and a Db2 subsystem. More than 20 years ago, the specifications pertaining to this interface moved from the RCT to CICS RDO, which is short for resource definition online. You have a DB2CONN resource in which things like the maximum number of CICS-Db2 subtask TCBs ("subtask TCBs" is explained below) and the maximum number of pool threads for the CICS region are defined, and one or more DB2ENTRY resources in which a plan can be associated with a CICS transaction (or a group of transactions) and in which the number of entry threads for a transaction (or group of transactions) - including, perhaps, protected entry threads - is specified. Another CICS resource, called DB2TRAN, can be used to associate a CICS transaction with a DB2ENTRY resource when that transaction is not explicitly specified in the DB2ENTRY resource. A sketch of such definitions appears below.
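
Here is a hedged sketch of what such RDO definitions might look like (the resource, group, plan and transaction names, and the limit values, are purely illustrative):

CEDA DEFINE DB2CONN(DB2CDBP1) GROUP(CICSDB2)
     DB2ID(DBP1) TCBLIMIT(60)
     PLAN(POOLPLAN) THREADLIMIT(10) THREADWAIT(YES)

CEDA DEFINE DB2ENTRY(XC05) GROUP(CICSDB2)
     TRANSID(XC05) PLAN(TESTP05)
     THREADLIMIT(5) PROTECTNUM(3) THREADWAIT(POOL)

In the DB2CONN definition, THREADLIMIT applies to the region's pool threads; in the DB2ENTRY definition, THREADLIMIT and PROTECTNUM govern the (protected) entry threads for the transaction(s) associated with that entry, and THREADWAIT(POOL) lets the transaction overflow to the pool if all of its entry threads are in use.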

Answering the questions: IDENTIFY status

In spite of being something you see in a Db2 monitor display of thread information, or in the output of the Db2 command -DISPLAY THREAD, IDENTIFY status is something I think of more in terms of tasks than of threads. Here, I'm referring specifically to the subtask TCBs that are used by CICS-Db2 transactions. When such a task is in IDENTIFY status, I see it as a task that is known to Db2 - known, I would say, essentially as a connection from a particular CICS region to the Db2 subsystem. When a CICS-Db2 transaction executes, it uses a subtask TCB, and that TCB in turn uses a Db2 thread.

When a CICS-Db2 transaction finishes, the Db2 thread it used goes away (unless it was a protected thread), but the subtask TCB it used remains. In my view, a CICS subtask TCB used for CICS-Db2 transactions is in IDENTIFY status when it is not being used. Note that when a CICS-Db2 transaction uses a protected entry thread, the thread persists when the transaction completes, and remains associated with the transaction's subtask TCB, so that the thread can be reused for another execution of the transaction (more information on how protected threads drive CICS thread reuse can be found in an entry I posted to this blog a few years ago).

Answering the questions: CICS's TCBLIMIT and Db2's CTHREAD

In the DB2CONN resource for a CICS region, there is a value, TCBLIMIT, that indicates the maximum number of CICS-Db2 subtask TCBs that can exist for the CICS region. If all of those subtask TCBs are being used by transactions and another transaction request is received, that request will be queued to wait for a subtask TCB to free up.


I referred previously to the CTHREAD parameter in the ZPARM module of a Db2 subsystem. The value of CTHREAD, as noted, specifies the maximum number of "local to Db2" threads (i.e., not DDF threads) that can be allocated at one time. Threads used by CICS-Db2 transactions are among those considered to be "local to Db2."

Can the sum of the TCBLIMIT values for all of the CICS regions connected to a given Db2 subsystem exceed the Db2 subsystem's CTHREAD limit? Yes, because at any given time there could be a number of subtask TCBs that aren't being used by transactions, and those subtask TCBs don't need threads (though, as mentioned, a protected thread will remain associated with a subtask TCB as long as the protected thread exists).

How many Db2 threads can a particular CICS region use at once? That would be the sum of the THREADLIMIT values for all the DB2ENTRY resources and for the DB2CONN resource for the region (THREADLIMIT for the DB2CONN resource refers to CICS pool threads), plus the value of COMTHREADLIMIT in the DB2CONN resource (this value pertains to CICS command threads). Note that the TCBLIMIT value in a CICS region's DB2CONN resource should not be smaller than that sum of THREADLIMIT values plus the COMTHREADLIMIT value - having enough CICS-Db2 threads for a CICS region is not of much value if there are not enough subtask TCBs that can be used for CICS-Db2 transactions (it should be OK if the value of TCBLIMIT is a little higher than the sum of the THREADLIMIT values plus the COMTHREADLIMIT value). A small worked example follows.
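
To make that concrete with a hypothetical example: if a region's DB2ENTRY THREADLIMIT values add up to 40, the DB2CONN (pool) THREADLIMIT is 10 and the COMTHREADLIMIT value is 3, the region could use up to 40 + 10 + 3 = 53 Db2 threads at one time, so TCBLIMIT for that region should be at least 53 (a value somewhat above 53 would be fine).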

Another thing to consider in this area: have enough CICS-Db2 pool threads if you have Db2 transactions that always use pool threads, or transactions that can overflow to the pool when all of their entry threads are in use. I recommend issuing the CICS command DSNC DISPLAY STATISTICS and examining the output, which might look something like this:


DFHDB2014 07/09/98 14:35:45 IYK4Z2G1 Statistics report follows for RCTJT
accessing DB2 DB3A
                                                     -----COMMITS----
DB2ENTRY PLAN     CALLS   AUTHS   W/P HIGH   ABORTS  1-PHASE  2-PHASE
*COMMAND             1       1     1    1       0         0       0
*POOL    ********    4       1     0    1       0         2       0
XC05     TESTP05    22       1    11    2       0         7       5
XP05     ********    5       2     0    1       0         1       1

DFHDB2020 01/17/98 15:45:27 IYKA4Z2G1 The display command is complete.

The *POOL row in that output is the one for pool threads. In that row, check the number in the W/P column, which shows the number of times there was a wait or an overflow to the pool. For pool threads, overflow is not relevant (you're already in the pool), so a non-zero number under W/P in the *POOL row indicates waits, and that suggests that the demand for pool threads exceeded the supply at times. In that case, it might be a good idea to increase THREADLIMIT for the DB2CONN resource (and doing that could potentially lead you to increase the TCBLIMIT value in DB2CONN).

As always, thanks for stopping by. I hope that this information will be helpful to you.

Wednesday, October 30, 2019

Db2 for z/OS: The Potential Performance Impact of 100% CPU Utilization

Twice during the current calendar year, I've worked on a Db2 for z/OS performance issue that I've encountered at various times over the past couple of decades: a transactional Db2 for z/OS workload (versus batch - more on that in a moment) has periods characterized by major increases in response times, with related symptoms being high in-Db2 not-accounted-for time and high - sometimes very high - in-Db2 latch wait times (more on those metrics to come). Analysis of z/OS monitor data shows that the general-purpose "engines" (processors) of the z/OS system in question are running at or near 100% busy during the time of the performance slow-down. At that point a disagreement might arise, with some people (me among them) saying that the 100%-busy CPU situation is the root cause of the response time elongation, and others saying that it is not, because a z/OS system, as they see it, should be able to deliver good application performance even with the general-purpose engines running at or near 100% busy.

In this blog entry, I am aiming to explain why you do not want to have your z/OS system's general-purpose engines running at 100% busy for a Db2 for z/OS transactional workload (zIIP engine utilization should be managed differently versus general-purpose engine utilization, as noted in an entry I posted to this blog a few years ago).

First, let's distinguish between "transactional" and "batch" workloads

When I refer to a transactional workload in this blog entry, I most fundamentally mean, "not batch," and with "batch" I am referring to a traditional (in a z/OS system) JES batch workload. This workload distinction is important for the following reason: in a situation in which a Db2 for z/OS workload is entirely (or almost entirely) of a batch nature, as might be the case during certain overnight hours for certain organizations, it can be OK to try to drive general-purpose engine utilization towards 100%. Why? Because for a traditional z/OS batch workload, maximizing general-purpose engine utilization can be a key factor in getting the batch jobs done within a certain time window (and that can be beneficial in a financial sense: your organization might have service level agreements with clients that stipulate that files sent to you by those clients will be processed before a certain time, with failure to achieve that objective potentially triggering financial penalties).

The transactional workloads to which I refer in this blog entry are those that have two main characteristics: 1) the units of work are relatively small in scale, versus the often-large units of work that are common for batch jobs; and 2) response time for individual transactions is of prime importance. For a batch workload, "response time" is not often part of the conversation - throughput is key (referring to getting a certain amount of work done within a certain time frame). For a transactional workload, it is often very important that individual transactions have very good response time, and when that is a performance goal then you do NOT want the general-purpose engines in your z/OS system running at or near 100% busy.

Second, there is 100% busy, and there is 100% busy

OK, so what am I saying with that odd-looking phrase? I'm saying that there may be a particular level of transactional work on your system that would take general-purpose engine utilization for the LPAR just to 100% busy, and it may be that performance for the transactional workload will be OK in that situation. Trouble is, you're quite likely to blow right past that workload level, and general-purpose engine utilization is still at 100% (it can't go higher) and performance (elapsed time for individual transactions) starts to get pretty bad.

Consider this example: suppose that, at 1000 transactions per second, the general-purpose (GP) engines of the z/OS LPAR are running at 90% busy, and performance is good. Suppose that the transaction rate goes to 1070 per second and GP utilization goes to 98% and performance is still OK. Maybe a transaction rate of 1090 per second takes GP utilization right to 100%, and performance is still OK. What happens when the transaction arrival rate (meaning the rate at which transactions come into the system) goes to 1100 per second, or 1150 per second, or 1200 per second? GP utilization is still 100% (again, it can't go higher), but performance (response times for individual transactions) is likely degrading, and it will keep getting worse as the transaction arrival rate continues to climb. The point is this: to say that a Db2 for z/OS transactional workload can get good performance with GP engines running at 100% is often, in a practical sense, what you might call an academic point, versus a real-world truism, because it is quite unlikely that a transaction arrival rate that just takes GP engine utilization to 100% will stay constant - either it will soon decrease to a level that takes GP utilization below 100% or it will soon increase to a level that will cause performance problems.

What does the performance data tell you?

If you encounter a Db2 transactional application performance problem (unacceptably high response times), and you think that an overloaded system might be the cause, check the utilization of GP engines during the period of the performance problem. This can be done using your z/OS monitor. If that monitor is IBM's RMF product, look at an RMF CPU Activity Report for the z/OS LPAR in which the Db2 subsystem is running (you can see an example of this report on a page in the z/OS Knowledge Center on IBM's Web site). The report typically shows activity in 15-minute intervals of time. For a given 15-minute interval, find the section with CPU utilization numbers. The processors identified as "CP" under the "TYPE" heading are general-purpose engines (zIIP engines are identified by "IIP" under the "TYPE" heading). For the general-purpose engines (TYPE=CP), look at the figure under the heading  "MVS BUSY" in the "TOTAL/AVERAGE" row. See if that value is at or near 100%.

If you see that GP utilization was at or close to 100% during the period of the Db2 transactional performance problem, look next at figures in an accounting report generated by your Db2 monitor. If your monitor is IBM's OMEGAMON for Db2, the report is called an Accounting Long report. You can see an example of that report on this page in the Db2 for z/OS Knowledge Center. [A quick word about this example: it shows the standard aggregation level for data summarization, which is primary authorization ID within plan name. That aggregation level can be useful at times, but my preference is aggregation at the connection type level, which you get when you specify ORDER (CONNTYPE) in the report control statement. When data is ordered by connection type, you get, within the larger report, a sub-report showing all activity in the FROM and TO time period for each connection type used during the period - for example, one sub-report showing all CICS-Db2 activity for the subsystem, one showing all call-attach activity for the subsystem, one showing all DRDA activity (the DDF workload) for the subsystem, etc. I like that aggregation level because if I'm investigating, say, a DDF workload performance problem then I look at the sub-report for the DRDA connection type.] Anyway, if you're looking at the example report pointed to by the preceding link, go to page 1-5 of that sample report. You'll see, at the top of that page, a couple of sideways bar charts. One of the fields of the right-side bar chart is NOTACC. That's in-Db2 not-accounted-for time. For the transactional workload for which there was a performance problem, check that value. If it's greater than 10%, that's an indication of an elevated level of wait-for-dispatch time in the z/OS system, and that, in turn, is an indication of overly-busy GP engines (for more information on in-Db2 not-accounted-for time, you can look at an entry I posted to the Db2 blog I maintained while working as an independent consultant, prior to rejoining IBM in 2010).
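
In case it's helpful, if the monitor is OMEGAMON for Db2, the accounting report control statement for that connection-type-ordered report might look something like the sketch below - the exact form can vary with the monitor release, but the essential specification is the ORDER(CONNTYPE) keyword mentioned above:

ACCOUNTING
  REPORT
    LAYOUT(LONG)
    ORDER(CONNTYPE)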

Something else to check in that accounting report: Db2 latch wait time, one of the so-called "class 3" suspension times provided in a Db2 monitor accounting report. Normally, latch wait time is a very small percentage of in-Db2 elapsed time for a transactional workload, and a big increase in latch wait time is an indicator of an overloaded z/OS system. In my experience, when highly elevated latch wait times are seen during a period in which Db2 transactional performance is degraded due to an overloaded z/OS system, the data is often misinterpreted. People will see, in a Db2 monitor-generated statistics long report (that may be called a statistics detail report, depending on the monitor used), a big increase in the number of what is labeled "latch counts" for one or more latch classes during the period of elongated transactional response times, and they'll say, "Wow - why are we getting so many more latch class 14 (or 23 or 25 or whatever) events during this period?" and they'll look for things that might be done to reduce those seemingly high latch counts. The thing is, what's being shown in the statistics report is not the count of latch requests, per se - it's the count of latch suspends, meaning, the number of times that a process was suspended while waiting to get a required latch (Db2 uses what are called latches internally to ensure serialization of certain activities that have to be serialized - an example is the latch required when Db2 splits an index page). See, when a z/OS system is not overloaded, a required Db2 latch is often acquired right away, with no need to suspend the associated application process. Conversely, when a z/OS system is overloaded (GP engines running at or near 100%), a single latch request can lead to several suspensions; thus, a big increase in "latch counts" does NOT necessarily mean that more latch requests occurred - it could well be that about the same amount of latch activity was accompanied by many more latch-related suspensions. The problem, in other words, is not more latch activity - it's more suspends related to latch activity, and the increase in latch suspends can very much be a result of an overloaded z/OS system.

Dealing with an overloaded z/OS system

I'd like you to be in a situation in which utilization of the GP engines of your z/OS system doesn't exceed 90% during periods of peak Db2 for z/OS transactional traffic, but maybe that's not the case at your site. What if you are dealing with GP engine utilization that gets to 100% or very close to that when you have a lot of transactional work going through the system, and you see during those times the often-related Db2 performance symptoms: elongated transaction response times, high in-Db2 not-accounted-for time, high Db2 latch suspension time and high "latch count" numbers? What can be done? Some thoughts:

  • Maybe you can reduce CPU consumption through performance tuning actions. Do you have a lot of memory on the system? Maybe you can reduce CPU by enlarging and page-fixing buffer pools, and backing those pools with large page frames. Maybe you can utilize high-performance DBATs via the Db2 profile tables. Maybe other performance tuning actions would get GP engine utilization below 100%. Worth a try.
  • Maybe you can keep transactions flowing quickly through the system by inducing some degree of transaction queuing during peak processing times. One way to do that would be to set MAXDBAT to a value that will keep the Db2 system from getting flooded with DDF transactions (that can be done in a more granular fashion - for particular DDF-using applications - by way of the Db2 profile tables).
  • Maybe you can adjust some dispatching priorities. Getting a z/OS system's workload manager (WLM) policy right is especially important when the system is really busy. I have some advice for priorities that pertain to a DDF workload.
  • Maybe you can add processing capacity. It's likely that, at various times in the past, your organization has increased mainframe processing capacity as application workloads have increased. Maybe it's time to do that again.

Can you run a transactional Db2 workload on a z/OS system on which GP engine utilization is at or near 100%? Yes, in the sense that the system won't fall over. Do you want to run a transactional Db2 workload on a z/OS system on which GP utilization is at or near 100%? I would say, "No." Tuning actions, efficient application program code, proper workload prioritization - all are important for good performance. So is processing capacity - have enough of it.

Monday, September 30, 2019

Db2 12 for z/OS - Let's Talk About MAXDBAT in ZPARM

"ZPARMs" is a reference to the parameters in the Db2 for z/OS module called DSNZPARM - these are essentially the configuration parameters for a Db2 subsystem. Two of the ZPARMs that are closely related to each other are CONDBAT and MAXDBAT. CONDBAT specifies the maximum number of connections that network-attached applications (i.e., applications that access the Db2 system via the Db2 distributed data facility, aka DDF) can have with the Db2 system at any one time. MAXDBAT is, essentially, the number of those application connections that can be "in-use" at one time. ["In-use" has to do with the processing of transactions that are originated by the connected applications - for such a transaction to be processed by Db2, the associated connection has to be paired with a type of Db2 "thread" called a database access thread, or DBAT (think of a DBAT as a DDF thread).]

As a rule, you don't want a Db2 system's CONDBAT limit to be reached. Why? Because when that happens, the next connection request from a DDF-using application will fail with an error code. How can you tell if a Db2 system's CONDBAT limit has been reached (aside from noting that an application encountered a connection failure)? You can see that in a statistics detail report generated by your Db2 monitor (depending on the monitor, that might be called a statistics long report). In that report, you'd see a section with DDF-related information, and a field labeled CONN REJECTED-MAX CONNECTED (or something similar), as shown in the second line of the excerpt below:

GLOBAL DDF ACTIVITY          QUANTITY
---------------------------  --------
DBAT/CONN QUEUED-MAX ACTIVE      0.00
CONN REJECTED-MAX CONNECTED      0.00

If that CONN REJECTED-MAX CONNECTED number is non-zero, the CONDBAT limit for the Db2 system was hit during the time interval captured in the report. If I saw that CONDBAT was hit on my Db2 system, I'd increase the CONDBAT value. I wouldn't make the CONDBAT value way higher than it needs to be (it can be as high as 150,000 for a Db2 subsystem), but neither would I be stingy in this department (an application connection not in an "in-use" state is placed in an inactive status by Db2, and an inactive connection has a very small virtual storage footprint, and snapping it back to active status when needed is a very low-overhead operation).

Simple enough, but what about MAXDBAT? Do you also want that limit to not be reached? The answer to this question is a little less straightforward versus the CONDBAT situation. I'd say that in most cases you'd want the MAXDBAT value to be high enough so as not to be reached, but that's not necessarily so in all cases. First, how can you see whether MAXDBAT has been reached for a Db2 system? One way would be to check the aforementioned statistics detail (or statistics long) report generated by your Db2 monitor - the relevant field is DBAT/CONN QUEUED-MAX ACTIVE, the first line of the excerpt below:

GLOBAL DDF ACTIVITY          QUANTITY
---------------------------  --------
DBAT/CONN QUEUED-MAX ACTIVE      0.00
CONN REJECTED-MAX CONNECTED      0.00

A non-zero value in the DBAT/CONN QUEUED-MAX ACTIVE field indicates that the MAXDBAT limit was reached for the Db2 subsystem during the time interval captured in the report. Another way to check on this would be to issue the Db2 command -DISPLAY DDF DETAIL. In the output of that command you'd see this field and an accompanying value:

QUEDBAT=      0

The QUEDBAT value indicates the cumulative number of times that the MAXDBAT limit was hit for the Db2 subsystem since DDF was last started (which probably would be when the Db2 subsystem was last "bounced," or stopped and restarted).

What happens when MAXDBAT is hit? In that case, the request for a DBAT (required in order to service a transaction request coming via an application connection) is queued. No error, at least not right away, but if the request waits too long then the application server might time the transaction out, and we likely don't want that; so, you don't want MAXDBAT to be hit, right? If it is hit, you want to increase the MAXDBAT value, right? Probably right, but not always. Here's one of the "not always" scenarios: suppose you have a DDF workload characterized by the occasional surge of transaction volume (such as might occur during a certain part of a month). You could make the MAXDBAT value large enough to accommodate that surge, and that would be OK if the Db2 system's processing capacity is sufficient to effectively process the surge of transactions. What if that is not the case? What if the surge of transactions, if allowed to flow right into the Db2 system, would max the system out, taking the utilization of the Z server's general-purpose "engines" (processors) to something close to 100%? The z/OS system won't fail (z/OS is famous for staying up in extreme processing situations), but work could get really backed up, so much so that response times could soar, leading to performance complaints from application users (and maybe to monetary penalties if a service level agreement is violated).

If you have a situation in which a DDF transaction surge overwhelms a server's processing capacity, you might be better off with a MAXDBAT value that induces some transaction queueing during surges. If, for example, the number of concurrently-executing DDF transactions on a system is almost always below 2000, and surges above that have severely impacted response times, a MAXDBAT value of 2000 could be beneficial. Yes, if a surge comes along and MAXDBAT is hit then transactions will start queueing up, waiting for a DBAT to come free (when an in-process transaction completes), but the system, shielded from the negative impact of transaction overload, will continue to process work quickly and efficiently. That, in turn, will cause in-use DBATs to free up quickly, and that could mean that the time a DBAT-awaiting transaction spends in the queue will be very small. While there would be some response time elongation due to some transactions having to queue up for a DBAT, the performance impact could be reduced versus the "let 'em all in when they arrive" situation.

Bottom line: if a Z server has the processing capacity to efficiently handle "peak of the peak" DDF transaction volumes, make the MAXDBAT value high enough to avoid transaction wait-for-DBAT queueing. If, on the other hand, the occasional really-big DDF transaction surge causes the system to get severely bogged down so that transaction service times shoot way beyond the acceptable level, you could actually improve performance and throughput via a MAXDBAT value that maintains efficient processing by forcing a degree of transaction queueing. Note, too, that if you want to induce surge-time queuing only for transactions associated with a particular DDF-using application or applications, you can accomplish that via the Db2 profile tables (SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES), which enable the setting up of DBAT limits (and/or connection limits and/or idle thread timeout values) in a granular fashion (as described in an entry I posted to this blog a couple of years ago).
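
As a rough sketch of that granular approach (the profile ID, auth ID and limit value below are hypothetical, and there are more qualifiers and keywords available than are shown here), a DBAT limit for one DDF-using application that connects with auth ID APP1ID might be set up like this:

INSERT INTO SYSIBM.DSN_PROFILE_TABLE
  (PROFILEID, AUTHID, PROFILE_ENABLED)
  VALUES (101, 'APP1ID', 'Y');

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
  (PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)
  VALUES (101, 'MONITOR THREADS', 'EXCEPTION', 200);

The Db2 command -START PROFILE then activates (or refreshes) the monitoring profiles; with that 'EXCEPTION' setting, requests for DBATs beyond the 200 allowed for that application are queued, without affecting other DDF-using applications.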

I hope that this information will be helpful for you.

Thursday, August 29, 2019

The Two Aspects of Db2 for z/OS Support for JSON Data

JSON (JavaScript Object Notation) is a form of data representation that is popular with people who develop what you might call "very modern" applications, including those that are cloud-based and those that have a mobile front-end. One reason this is so: a JSON document is easy for a program (or a person) to parse, as you'll see from examples I've included a bit further down in this blog entry.

As interest in JSON has grown, more folks have asked about the support that Db2 for z/OS provides for data represented in this form. I'm going to tell you about that support, starting with the fact that it comes in two flavors (which can be used separately or together). First, there is the JSON support that is associated with Db2's REST interface. The other flavor of Db2 JSON support has to do with storing JSON documents, as such, in a column of a Db2 table. I'll cover these two JSON-related Db2 capabilities in the order in which I've mentioned them.


JSON and the REST interface to Db2 for z/OS

Through Db2's REST interface, you can enable execution of a static SQL statement via a REST request (for those not familiar with Db2 "static" SQL: the term refers to a statement that has been pre-coded and pre-prepared for execution - it's basically the compiled, executable form of a SQL statement). That SQL statement could be a SELECT, INSERT, UPDATE, DELETE or TRUNCATE, or a CALL to a stored procedure (the called stored procedure could then issue any number of different SQL statements). Let's consider a situation in which a SELECT that retrieves information about a bank account has been REST-enabled (sensitive information, yes, but Db2's REST interface utilizes connection protection and authentication services provided by the RACF z/OS security manager, along with Db2's own user authorization functionality). If this SELECT-based service is named getAcctInfo, the associated REST request issued by a client program could look like this:

POST http://mybank.com:4711/services/BANKING/getAcctInfo/v1

And in the body of the request you might see this input:

{
 “AcctID”: 123456789
}

That input is in the form of a JSON document - a very small JSON document, comprised of just a single name-value pair. Without being a rocket scientist, you can look at that JSON document and figure out that 123456789 is someone's account ID. That ease-of-interpretation is one of the appealing characteristics of JSON versus, say, XML. XML tags are sometimes kind of obtuse, owing in part to the fact that there often has to be broad agreement as to the meaning of a given tag (something that makes XML a very robust form of representation for data that might be exchanged between different companies, such as an automobile manufacturer and a parts supplier). With regard to the data payload of a REST request in JSON form, there just has to be agreement on meaning between client and server, and that flexibility enables use of meaningful names in name-value pairs.

OK, so that REST request, with its input in JSON form, comes into a Db2 for z/OS system (the system associated with mybank.com:4711 in the HTTP request). Assuming that the request's credentials check out (that would be ID and password, or ID and certificate), and the ID has been granted EXECUTE authority for the Db2 package that is the executable form of the REST-enabled SQL statement, the provided input will get plugged into the SELECT and the statement will be executed. Let's say the SELECT - with the provided input added - looks like this:

SELECT C.FIRSTNAME, C.LASTNAME, A.BALANCE, A.LIMIT
FROM ACCOUNTS A, CUSTOMERS C
WHERE A.ID=123456789
AND A.CUSTNO = C.CUSTNO

Suppose that the query's output looks like this, in traditional SQL result set form:

FIRSTNAME    LASTNAME    BALANCE    LIMIT
---------    --------    -------    -------
John         Smith       1982.42    3000.00

Db2 will take that result set and send it back to the REST client in the form of a JSON document, as seen below (if the REST-enabled query had a multi-row result set, you'd see multiple sets of associated name-value pairs, one after another in the JSON output document):

{
  “FIRSTNAME” : “John”,
  “LASTNAME”  : “Smith”,
  “BALANCE”   : 1982.42,
  “LIMIT”     : 3000.00
}
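
In case you're wondering how a SELECT like the one above becomes a REST-invokable service in the first place: one way is the BIND SERVICE subcommand of DSN. A rough sketch (the BANKING collection and getAcctInfo service names come from the URL above; the DSNSTMT DD name is made up):

BIND SERVICE(BANKING) NAME("getAcctInfo") -
     SQLDDNAME(DSNSTMT)

Here, the DSNSTMT DD statement points to the SQL statement, coded with a named parameter marker that corresponds to the name-value pair in the JSON input shown earlier:

SELECT C.FIRSTNAME, C.LASTNAME, A.BALANCE, A.LIMIT
  FROM ACCOUNTS A, CUSTOMERS C
 WHERE A.ID = :AcctID
   AND A.CUSTNO = C.CUSTNO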

That, then, is the one aspect of Db2 for z/OS support for data in JSON form: as a provider of data services invoked via REST requests, Db2 accepts input in JSON form and sends output to a requester in JSON form. Now for a look at the other aspect of Db2 for z/OS support for JSON data.


Storing JSON documents in Db2 tables

Suppose that a client application sends data in JSON form to a z/OS server, and there is a desire to store those JSON documents as values in a column of a Db2 table. Db2 provides support for that usage scenario by way of some user-defined functions (UDFs) supplied via a no-charge software offering called the IBM Db2 Accessories Suite for z/OS (Version 4.1 of which is the one that goes with Db2 12 for z/OS - the product number is 5697-Q05). The most important of these UDFs are JSON2BSON, BSON2JSON and JSON_VAL. Here is what they do:

  • JSON2BSON - Converts a JSON document to BSON (a standard binary form for JSON data), so that it can be stored in a Db2 table column defined with the BLOB data type.
  • BSON2JSON - This UDF can be used to retrieve a JSON document in its entirety from a Db2 BLOB column in which the document is stored in BSON form. Output of the function is the text form of the retrieved JSON document.
  • JSON_VAL - This function can be used to retrieve - or to refer to - an individual data item (e.g., LASTNAME) in a JSON document. In addition to being usable in a query's select-list, JSON_VAL can be referenced in a predicate of a SELECT, INSERT or DELETE (you could, for example, retrieve data from a table based on the value of a particular item in a JSON document). JSON_VAL can also be used in a case expression or an ORDER BY clause; and, the function can be utilized to create an index on a table column that holds JSON documents (similar in concept to an index on an XML column - you might want an index on customer number values in JSON documents in a column).

Other UDFs that pertain to JSON data stored in a Db2 table are JSON_LEN, JSON_TYPE and JSON_TABLE. You can get additional information about all of the JSON-related UDFs in the Db2 for z/OS Knowledge Center on the Web.
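
Here is a small, hedged sketch of how these UDFs might be used (the table and column names are made up, and I'm assuming the UDFs are referenced via the SYSTOOLS schema, as delivered by the Accessories Suite):

CREATE TABLE CUSTJSON
  (CUSTNO   INTEGER NOT NULL,
   JSONDATA BLOB(1M));

-- store a JSON document (converted to BSON) in the BLOB column
INSERT INTO CUSTJSON (CUSTNO, JSONDATA)
  VALUES (1001,
          SYSTOOLS.JSON2BSON('{"LASTNAME":"Smith","BALANCE":1982.42}'));

-- retrieve the whole document in text (JSON) form
SELECT SYSTOOLS.BSON2JSON(JSONDATA)
  FROM CUSTJSON
 WHERE CUSTNO = 1001;

-- retrieve rows based on the value of an individual item in the document
SELECT CUSTNO
  FROM CUSTJSON
 WHERE SYSTOOLS.JSON_VAL(JSONDATA, 'LASTNAME', 's:30') = 'Smith';

In the JSON_VAL reference, 's:30' indicates that the LASTNAME value is to be returned as a character string of up to 30 bytes.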

And there you have it. As a data services provider for applications that use the REST architectural style, Db2 for z/OS can build a JSON document from data in a Db2 database, with said JSON document then used to return data to a REST client (and a REST client invoking a Db2-provided service can send associated input data in a JSON document). Db2 for z/OS also provides (in conjunction with the IBM Db2 Accessories Suite for z/OS) the ability to store JSON documents in Db2 tables, and to work with those documents (and the data values therein) using SQL. Two different areas of JSON-related functionality, each of which can be valuable for addressing different requirements that involve data in JSON form. I encourage you to explore and evaluate these capabilities.

Wednesday, July 31, 2019

Db2 for z/OS - Talking About APPLCOMPAT (Part 2)

Last month, I posted to this blog part 1 of a 2-part entry on the APPLCOMPAT option of the Db2 for z/OS BIND PACKAGE command. In that entry I covered several aspects of APPLCOMPAT. In this part 2 piece, I will focus on one particular matter pertaining to APPLCOMPAT - namely, the management of this option for packages associated with the IBM Data Server Driver (or its predecessor, Db2 Connect) in a Db2 12 for z/OS environment.

The IBM Data Server Driver is used by client application programs that issue SQL statements - typically through non-DBMS-specific interfaces such as JDBC, ODBC and ADO.NET - targeting a Db2 for z/OS system to which the application has a TCP/IP connection. The Data Server Driver packages belong, by default, to a collection called NULLID, and in that collection the packages have a certain APPLCOMPAT value. That APPLCOMPAT value determines the SQL functionality that is available to application programs that use the Data Server Driver packages in the NULLID collection. Let's say that function level V12R1M500 is active on a Db2 12 system. That M500 function level of Db2 12 is analogous to what was called new-function mode for Db2 versions that came before Db2 12, in that it makes available to programs a whole boat-load of new features and functions. But are those functions - such as piece-wise DELETE and query result set pagination - really available to DRDA requester applications (i.e., applications accessing Db2 via the IBM Data Server Driver and Db2's distributed data facility, aka DDF)? They are NOT available to those applications if the APPLCOMPAT value of the IBM Data Server Driver packages being used is below V12R1M500. [It's true that a dynamic SQL-issuing application could change the in-effect application compatibility level by changing the value of the CURRENT APPLICATION COMPATIBILITY special register, but starting with a Db2 12 system for which the activated function level is V12R1M500 or higher, a program can't set CURRENT APPLICATION COMPATIBILITY to a value greater than the APPLCOMPAT value of the package associated with the program.]

OK, so it would be a good idea, in a Db2 12 environment for which function level V12R1M500 had been activated, to have the IBM Data Server Driver packages in the NULLID collection bound with an APPLCOMPAT value of V12R1M500. That way, DRDA requester programs will be able to use the rich SQL functionality available at the M500 function level (I am thinking here of DRDA requester programs that do not have all of their "table-touching" SQL statements issued by Db2 stored procedures - the SQL functionality available to a stored procedure is determined by the APPLCOMPAT value of the stored procedure's package). But what about functionality beyond the V12R1M500 function level? Suppose, for example, that a DRDA requester program needs to issue a query with the LISTAGG function introduced with function level V12R1M501? In that case, you'd need the IBM Data Server Driver packages to be bound with APPLCOMPAT(V12R1M501) or higher. The question is: should you bind the packages in the NULLID collection with APPLCOMPAT(V12R1M501) or higher?

My answer to that question is, "No." Once I had my NULLID packages bound with APPLCOMPAT(V12R1M500), which I'd likely do not long after activating function level V12R1M500 for the Db2 system, I'd leave them at that application compatibility level. Here's why: if you keep rebinding the NULLID packages with ever-higher APPLCOMPAT values as you activate ever-higher function levels for a Db2 12 system, the result would be that DRDA requester programs would, by default, use higher and higher levels of Db2 application compatibility. What's wrong with that? Well, it could cause a DRDA requester application to "stumble into" a SQL incompatibility problem (as I mentioned in part 1 of this 2-part entry, a SQL incompatibility is a situation in which the same SQL statement operating on the same data yields a different result). Suppose, for example, that function level M508 introduces a SQL incompatibility (this is theoretical - as of the date of my posting this blog entry the latest available function level is M505). If you activate function level V12R1M508 for your Db2 12 system and thereafter rebind the NULLID packages with APPLCOMPAT(V12R1M508), it means that DRDA requester programs will by default be executing with the application compatibility level at V12R1M508. Suppose the (theoretical) SQL incompatibility ends up "biting" a DRDA requester application, causing it to fail? The big problem here is that the developers who worked on that application were blindsided by the SQL incompatibility. They didn't know that their program was going to execute with application compatibility set to V12R1M508 - that just happened because the NULLID packages (the IBM Data Server Driver packages that are used by default by DRDA requester applications) were bound with APPLCOMPAT(V12R1M508). No one likes unpleasant surprises.

OK, so you keep the NULLID packages with APPLCOMPAT(V12R1M500), as I'm recommending. How are we supposed to let developers of DRDA requester applications use SQL functionality introduced with function levels beyond M500? Here's how: with alternate IBM Data Server Driver package collections (meaning, collections with names other than NULLID) to which the NULLID packages have been BIND COPY-ed with a desired APPLCOMPAT value that is greater than V12R1M500. And how do you point a DRDA requester application to such an alternate IBM Data Server Driver collection? That can be done on the client side by way of a data source property, but the way that I see as being better (in that it doesn't introduce a hassle for client-side developers or application server administrators) is to use the Db2 profile tables to automatically point a particular DDF-using application to the desired alternate IBM Data Server Driver package collection (this is done via a profile attribute that sets the value of the CURRENT PACKAGE PATH special register to the name of the alternate IBM Data Server Driver package collection, as described in an entry I posted last year to this blog).
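
Here's a hedged sketch of the two pieces of that approach (the alternate collection name, package name, profile ID and auth ID are all illustrative). First, the NULLID packages are copied into the alternate collection with the higher APPLCOMPAT value - one BIND PACKAGE ... COPY per package, along the lines of:

BIND PACKAGE(NULLID_M501) COPY(NULLID.SYSLH200) -
     APPLCOMPAT(V12R1M501) ACTION(REPLACE)

Then a Db2 profile points the application (identified here by its auth ID) at that collection:

INSERT INTO SYSIBM.DSN_PROFILE_TABLE
  (PROFILEID, AUTHID, PROFILE_ENABLED)
  VALUES (201, 'APPAUTH', 'Y');

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
  (PROFILEID, KEYWORDS, ATTRIBUTE1)
  VALUES (201, 'SPECIAL_REGISTER',
          'SET CURRENT PACKAGE PATH = ''NULLID_M501''');

A -START PROFILE command (or a profile refresh) then puts the change into effect.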

Here's how this scenario might play out in your environment, with a developer named Steve communicating a need to a Db2 for z/OS DBA named Sarah (again, I'm using a theoretical M508 future function level):

Steve: Hey, Sarah. Your team told us to let you know if we need Db2 SQL functionality beyond what's available at the M500 function level. I think I have that need. I want to use the XYZ built-in Db2 function for a new client-server application that will access Db2 for z/OS-based data, and it seems that the function isn't available with M500 because I get an error in our development environment when I try to use it.

Sarah: Yeah, function XYZ was delivered with function level M508, and our default function level for Db2 client-server applications is M500. We activated function level M508 for our Db2 for z/OS systems a few months ago, and we can get your application set up to work at that application compatibility level, no problem. All I need from you is a way to identify your client-server application, because we'll make a Db2-side change that will key off of that identifier. It could be the auth ID that the application uses to connect to the Db2 system, or the IP addresses of the servers on which the application will run, or a workstation name provided by the application, or...

Steve: Auth ID would make the most sense. This particular application on which we're working will connect to the Db2 system using auth ID 1234.

Sarah: OK, great. We'll get a Db2 profile set up for that auth ID, and that profile will enable your application to use M508 functionality. By the way, the M508 function level introduced a SQL incompatibility: if you provide a timestamp value as the argument for the ABC built-in Db2 function, the result at the M508 application compatibility level is thus-and-such, whereas before M508 the result would be so-and-so. Would that SQL incompatibility impact the application on which you're working?

Steve: No, we aren't using the ABC built-in function - but thanks for that heads-up.

Sarah: Sure thing. We should have the new Db2-side profile ready for you in the development system by end-of-day tomorrow - we'll notify you when that work is done. You should then be able to use the Db2 XYZ function with no problem.

Steve: Thanks!

OK, get it? Several very good things happened here. First, Steve let Sarah know about a new Db2 SQL function that his team wants to use - it's always nice when the Db2 team gets word about new SQL functionality that developers want to use. Second, Sarah, by way of a Db2 profile, will quickly enable Steve's team to use the desired new SQL functionality without having to do anything special (such as change a data source property) on the client side. Finally, this exchange gave Sarah the opportunity to inform Steve of a SQL incompatibility introduced with (the theoretical) M508 function level of Db2 12. It turned out that this SQL incompatibility was not going to impact the application on which Steve is working, but if that had not been the case - if the SQL incompatibility were going to be problematic for the new application - then knowledge of the incompatibility would give Steve and his teammates a chance to work around it by way of a code change.

I hope that this blog entry has provided you with what you'll see as being a reasonable way of managing APPLCOMPAT values beyond V12R1M500 for the IBM Data Server Driver packages in a Db2 12 for z/OS environment.

One more thing: for a time, starting last year, we (IBM) required DRDA client-side programmers to appropriately set the value of a client-side parameter called clientApplCompat if use of SQL functionality beyond the V12R1M500 function level were desired. Db2 teams at a number of sites were not keen on that requirement, as they want to make working with Db2 for z/OS-based data as hassle-free as possible for their application development colleagues. IBM Db2 for z/OS development heard the voice of the customer, and responded with APAR PH08482, the fix for which removed the need to set clientApplCompat in order to use Db2 12 SQL functionality beyond the M500 function level.

Sunday, June 30, 2019

Db2 for z/OS - Talking About APPLCOMPAT (Part 1)

The Db2 for z/OS package bind option APPLCOMPAT is not new - it was introduced with Db2 11. Even so, I think that now is a good time to spend some time thinking about APPLCOMPAT, for two reasons: 1) I've encountered some misunderstandings pertaining to APPLCOMPAT, and I'd like to clear those up; and 2) Db2 12 has introduced some changes that relate to APPLCOMPAT, and folks should be aware of those changes.

In this first of a two-part blog entry, I'll cover several aspects of APPLCOMPAT. In the part 2 entry, which I'll post within the next 30 days or so, I'll focus on one particular issue, that being the APPLCOMPAT value used for IBM Data Server Driver / Db2 Connect packages in a Db2 12 for z/OS environment.


What does APPLCOMPAT do?

APPLCOMPAT has a twofold purpose. First, it enables the use of SQL functionality introduced with a Db2 version and/or (in a Db2 12 environment) function level. Consider, for example, the LISTAGG built-in function delivered with Db2 12 function level V12R1M501 (LISTAGG is pretty slick, making it easy to have, as a column in a query result set, a comma-separated list of values). Suppose that an application developer wants to use LISTAGG in a query issued by his program. If the program's package is bound with APPLCOMPAT(V12R1M500) - or any other value below V12R1M501 - then the query that includes LISTAGG will get a SQL error code at execution time (most likely a -4743, ATTEMPT TO USE NEW FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL). The developer will need to change the query so that LISTAGG is not invoked, or the program's package will have to be rebound with an APPLCOMPAT value that is equal to or greater than V12R1M501 (the APPLCOMPAT value in a Db2 12 system can be as high as the system's currently-activated function level).
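
To make that concrete, here is a small example of the kind of LISTAGG query I'm talking about (the table and column names are hypothetical) - it returns, for each department, a single row with a comma-separated list of employee last names:

SELECT WORKDEPT,
       LISTAGG(LASTNAME, ', ') WITHIN GROUP (ORDER BY LASTNAME) AS EMPLOYEES
  FROM EMP
 GROUP BY WORKDEPT;

If the package of the program issuing that query is bound with APPLCOMPAT(V12R1M500), the statement fails with the -4743 error code mentioned above; with APPLCOMPAT(V12R1M501) or higher, it works.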

APPLCOMPAT's other purpose is to provide a shield that protects a program from what the Db2 documentation calls a "SQL incompatibility." I prefer the phrase "SQL behavioral change," but whichever wording you use, what you're talking about is a situation in which the same SQL statement, executed with the same data, yields a different result after activation of the functionality provided by a new Db2 version (or a new Db2 12 function level). Here's an example of a SQL incompatibility: while in a Db2 10 system a SQL statement can successfully cast an 8-byte store clock value AS TIMESTAMP, in a Db2 11 new-function mode system that same SQL statement, operating on the same 8-byte store clock value, will get a -180 SQL error code - this because in a Db2 11 NFM environment a store clock value is no longer a valid input for a CAST(... AS TIMESTAMP) operation. Suppose you have a program with a SQL statement that casts a store clock value as a timestamp, and suppose changing that SQL statement (to avoid the -180 error code in a Db2 11 NFM system) is not feasible for some reason. In that case, the program's package could be bound with APPLCOMPAT(V10R1), and as a result the SQL statement will get, in a Db2 11 NFM system (or a Db2 12 system), the same result that it would get in a Db2 10 system, because APPLCOMPAT(V10R1) indicates that behavior for the package's SQL statements is to be what it would be in a Db2 10 system, regardless of the fact that the package is executing in a Db2 11 NFM (or Db2 12) environment. (We always document SQL incompatibilities - the change related to casting a store clock value as a timestamp is one of the Db2 11-introduced incompatibilities.)
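
As a sketch of that particular incompatibility (the table and column names are hypothetical), the statement below casts an 8-byte store clock value as a timestamp:

   SELECT CAST(STCK_VALUE AS TIMESTAMP) AS EVENT_TS
   FROM MYSCHEMA.EVENT_LOG;

With the package bound at APPLCOMPAT(V10R1), the CAST gets Db2 10 behavior and succeeds; with an APPLCOMPAT value of V11R1 or higher, the same statement gets the -180 error code.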


Where does a package's APPLCOMPAT value come from?

As already mentioned, APPLCOMPAT is a package bind option. It is also a ZPARM parameter. That ZPARM has one and only one purpose: it provides the default value of APPLCOMPAT, when needed, for a BIND PACKAGE or REBIND PACKAGE operation. When is the ZPARM-provided default needed? For BIND PACKAGE, it is needed when an APPLCOMPAT value is not specified in the command. For REBIND PACKAGE, it is needed when an APPLCOMPAT value is not specified in the command and not already there for the package being rebound. I'll explain what I mean by "not already there." When a package is rebound and an APPLCOMPAT value is not specified in the REBIND PACKAGE command, the package's existing APPLCOMPAT value, if it has one, will carry forward through the REBIND operation (so, if a package with an APPLCOMPAT value of V10R1 is rebound and an APPLCOMPAT value is not specified in the REBIND PACKAGE command, the rebound package will retain the V10R1 APPLCOMPAT value). "If it has one" implies that the package may not have an explicit APPLCOMPAT value prior to the rebind, and that could indeed be the case (this situation would be indicated by a blank value in the APPLCOMPAT column of the package's row in the SYSIBM.SYSPACKAGE catalog table). When that is so, the value used for the REBIND PACKAGE operation will be the value of the APPLCOMPAT parameter in ZPARM (as is the case when BIND PACKAGE is issued sans an APPLCOMPAT specification).
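
If you want to see which packages lack an explicit APPLCOMPAT value (and would therefore pick up the ZPARM default at their next REBIND issued without an APPLCOMPAT specification), a catalog query along these lines can help (a sketch - you'd likely add predicates to narrow the scope to collections of interest):

   SELECT COLLID, NAME, APPLCOMPAT
   FROM SYSIBM.SYSPACKAGE
   WHERE APPLCOMPAT = ' '
   ORDER BY COLLID, NAME;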

"How about dynamic SQL?" you may ask. "Can't a dynamic SQL-issuing program change the in-effect application compatibility level via the SQL statement SET CURRENT APPLICATION COMPATIBILITY?" Yes, that can be done, but starting with a Db2 12 system for which function level V12R1M500 or higher has been activated, the value specified for the SET CURRENT APPLICATION COMPATIBILITY special register cannot be greater than the APPLCOMPAT value of the program's Db2 package. Any dynamic SQL-issuing program has an associated package, which could be an IBM Data Server Driver / Db2 Connect package. If the package used for a dynamic SQL-issuing program is bound with APPLCOMPAT(V12R1M500) then V12R1M500 will be the initial value of the CURRENT APPLICATION COMPATIBILITY special register when the program executes, and the program can set the value of that special register to something less than V12R1M500 but not to a value greater than V12R1M500.


An optimizer-related misconception

Sometimes people think that APPLCOMPAT does more than it actually does. Case in point: SQL statement optimization. Lots of folks know that it's a good idea to rebind all plans and packages in a Db2 for z/OS system soon after going to a new version of Db2 (one reason: the package code generated by Db2 version n - packages are executable code, you know - is likely to be more CPU-efficient than the code generated for the same package by version n-1). Some of those same people think, however, that to take advantage of new access path choices available with the version n optimizer, the rebind operations have to specify an APPLCOMPAT value associated with the new Db2 version. Nope. If you rebind a package with APPLCOMPAT(V11R1) in a Db2 12 system, that package can still get the performance benefit of access path choices introduced with Db2 12 (assuming that you did not go with the APREUSE option, which tells Db2 to reuse the existing access paths for the package's SQL statements).
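
For example (the collection and package names are made up), a post-migration rebind like this one keeps the package's application compatibility behavior at the Db2 11 level while still allowing the Db2 12 optimizer to choose new access paths:

   REBIND PACKAGE(COLL1.PKG1) APPLCOMPAT(V11R1)

It is the APREUSE option - e.g., APREUSE(ERROR) or APREUSE(WARN) added to that command - that would tell Db2 to try to reuse existing access paths; APPLCOMPAT has no say in the matter.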

Remember: APPLCOMPAT can enable use of new Db2 functionality, and APPLCOMPAT can serve as a shield to insulate a program from a SQL incompatibility introduced by a new version (or new function level) of Db2. APPLCOMPAT does not tell the Db2 optimizer what to do.


Db2 12 and the DDL dimension

In a Db2 11 system, APPLCOMPAT affects SQL DML statements (e.g., SELECT, INSERT, UPDATE and DELETE), but not DDL statements (e.g., CREATE, ALTER). That being the case, in a Db2 11 new-function mode environment a program whose package is bound with APPLCOMPAT(V10R1) cannot reference a global variable in a SELECT statement (global variables were introduced with Db2 11), but that same program can create a global variable (because DDL is not affected by APPLCOMPAT in a Db2 11 system).
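
A sketch of that asymmetry (the variable name is made up), for a package bound with APPLCOMPAT(V10R1) in a Db2 11 NFM system:

   -- DDL: allowed, because APPLCOMPAT does not govern DDL in a Db2 11 system
   CREATE VARIABLE MYSCHEMA.MY_COUNTER INTEGER DEFAULT 0;

   -- DML: gets an error, because referencing a global variable is
   -- Db2 11-introduced SQL functionality and the package is at V10R1
   SELECT MYSCHEMA.MY_COUNTER FROM SYSIBM.SYSDUMMY1;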

Starting with Db2 12, APPLCOMPAT affects both DML and DDL statements; so, in a Db2 12 system, a program whose package has an APPLCOMPAT value of V12R1M500 cannot issue an ALTER TABLE statement that includes a KEY LABEL specification (indicating that data in the table is to be encrypted using the key associated with the specified label), because KEY LABEL became an ALTER TABLE option starting with function level V12R1M502, and using that new DDL option requires that the package of the program issuing the ALTER TABLE statement be bound with an APPLCOMPAT value of V12R1M502 or greater.
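
For illustration (the table and key label names are hypothetical), the DDL in question would look something like this, and issuing it would require a package APPLCOMPAT value of V12R1M502 or higher:

   ALTER TABLE MYSCHEMA.CUSTOMER
     ADD KEY LABEL CUSTTBKEYLABEL01;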

This DDL dimension of APPLCOMPAT in a Db2 12 system gets particularly interesting starting with function level V12R1M504. With that function level (or higher) activated, a program whose package is bound with APPLCOMPAT(V12R1M504) or higher cannot use static SQL to create a traditional segmented table space or a range-partitioned table space that is not universal partition-by-range (in other words, that program cannot use static SQL to create a table space that is not either a partition-by-growth or a partition-by-range universal table space). Why is this so? Because plenty of Db2-using organizations, cognizant of the advantages of universal table spaces, do not want non-universal table spaces to be even accidentally created. Note also that a program whose package has an APPLCOMPAT value of V12R1M504 or higher cannot use static SQL to create a hash-organized table or a synonym (in the latter case one would create an alias instead).

Why did I repeatedly emphasize the phrase "cannot use static SQL" in the paragraph above? See the paragraph below.

What if you have a Db2 12 system with function level V12R1M504 or higher activated, and you need to create a traditional segmented table space? You have a couple of options for getting that done. One option would be to use a dynamic SQL-issuing program (an example being the Db2-supplied program called DSNTEP2) whose package has an APPLCOMPAT value of V12R1M504 or higher, and have that program first issue the SQL statement SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503' and then issue the CREATE for the traditional segmented table space (recall that in a Db2 12 environment in which function level V12R1M500 or higher has been activated, a dynamic SQL-issuing program can set the in-effect application compatibility level to something below the APPLCOMPAT value of its package, but not to something above that APPLCOMPAT value). The other option: create the traditional segmented table space via a program whose package has an APPLCOMPAT value of V12R1M503 or lower.
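
Here is a sketch of that first option (the database, table space, and buffer pool names are made up), as it might be run through DSNTEP2 or a similar dynamic SQL-issuing program whose package has an APPLCOMPAT value of V12R1M504 or higher:

   -- Dial the in-effect application compatibility level back below V12R1M504
   SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';

   -- Now the traditional (non-universal) segmented table space can be created
   CREATE TABLESPACE TSLEGACY
     IN DBLEGACY
     SEGSIZE 32
     BUFFERPOOL BP1;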

That's all for now. Tune in in about a month to see part 2 of this blog entry and get some guidance on specifying APPLCOMPAT for the IBM Data Server Driver / Db2 Connect packages in a Db2 12 system.

Friday, May 31, 2019

Db2 for z/OS: IIPHONORPRIORITY and zIIP offload

There is something I have encountered a couple of times (most recently, just a month ago) that can have a negative performance impact on a Db2 for z/OS system, and it's something of which quite a few Db2 people are, I think, unaware. I'm referring here to the setting of a z/OS parameter called IIPHONORPRIORITY, and I want to use this blog entry to provide information on how the value of that parameter can affect the processing of a Db2 workload.

IIPHONORPRIORITY is a parameter in the IEAOPTxx member of the z/OS data set called SYS1.PARMLIB. The value of the parameter can be set to NO or YES. Essentially, those two settings have the following meanings:

  • YES - Honor the priority of zIIP-eligible tasks (per the z/OS LPAR's workload manager policy): if such a task is ready for dispatch and the LPAR's zIIP engines are busy, allow the task to be dispatched to a general-purpose engine so that its wait-for-dispatch time will not be too high.
  • NO - Regardless of the priority of zIIP-eligible tasks, limit the processing of such tasks to zIIP engines. If a zIIP-eligible task is ready for dispatch and the LPAR's zIIP engines are busy, require that the task wait until a zIIP engine becomes available.
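
For reference, this parameter is coded in the IEAOPTxx member as a simple keyword assignment - something along the lines of the line below (other IEAOPTxx parameters not shown; YES is, as far as I know, the default):

   IIPHONORPRIORITY=YES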

Now, at first consideration a setting of IIPHONORPRIORITY=NO might appear to be an attractive alternative. After all, the raison d'ĂȘtre for zIIP engines is reduced cost of computing on a z/OS platform. Why not maximize that cost benefit by forcing all zIIP-eligible work to be processed by zIIP engines? In fact, however, if Db2 for z/OS is in the picture then specifying IIPHONORPRIORITY=NO can lead to significant performance degradation for an application workload (especially a DDF workload) and can even reduce offloading of work to zIIP engines. I have seen these effects with my own eyes, and I will describe them in more detail below.

Negative impact on application performance. Just about a month ago, as part of a review of an organization's production Db2 for z/OS environment, I looked through a Db2 monitor-generated accounting long report for the subsystem of interest. Data in the report, showing activity for a busy hour of the day, was ordered by connection type (meaning that the data was aggregated by connection type used by Db2-accessing programs: DDF, CICS, call attachment facility, etc.). In the DDF section of the report (connection type: DRDA), I saw that not-accounted-for time was 65% of in-Db2 elapsed time for DDF-using applications. In-Db2 not-accounted-for time is time associated with SQL statement execution that is not CPU time and not "known" wait time (the latter term referring to Db2 accounting trace class 3 wait times such as wait for synchronous read, wait for lock, wait for latch, etc.). In my experience, in-Db2 not-accounted-for time is primarily reflective of wait-for-dispatch time, and for a transactional workload (like the typical DDF workload) it should be a relatively small percentage of average in-Db2 elapsed time - under 10% is good, between 10 and 20% is marginally acceptable. 65% is an extremely elevated level of not-accounted-for time as a percentage of in-Db2 elapsed time.
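
In other words, in terms of the Db2 accounting numbers, the figure is derived more or less like this:

   in-Db2 not-accounted-for time =
      in-Db2 (class 2) elapsed time
      - in-Db2 (class 2) CPU time (general-purpose plus zIIP)
      - total class 3 ("known") suspend time

and the percentage I cited is that figure expressed as a share of average in-Db2 elapsed time.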

I was initially perplexed by the very high level of in-Db2 not-accounted-for time seen for the DDF workload. The level of utilization of the system's general-purpose and zIIP engines, as seen in a z/OS monitor-generated CPU activity report, was not high enough to make that a prime culprit. Later, seeing unusual numbers in a Db2 monitor-generated statistics long report (more on that to come) prompted me to ask the question, "Do you folks by chance have IIPHONORPRIORITY set to NO?" It turned out that that was indeed the case, and was very likely the root cause of the very high percentage of in-Db2 not-accounted-for time observed for DDF-using applications: zIIP-eligible tasks servicing requests from Db2 client programs were spending a considerable amount of time queued up waiting for a zIIP engine to become available, with redirection to a general-purpose engine having been removed as an option via the IIPHONORPRIORITY=NO specification. A much better approach would be to have IIPHONORPRIORITY set to YES, and to have zIIP engine capacity sufficient to keep "spill-over" of zIIP-eligible work to general-purpose engines at an acceptably low level (information on monitoring spill-over of zIIP-eligible work to general-purpose engines can be found in a blog entry I posted on that topic); and, keep in mind that running zIIPs in SMT2 mode can help to minimize the zIIP spill-over rate.

Allowing some zIIP-eligible work to be directed to general-purpose engines when an LPAR's zIIP engines are busy (something that should not happen often when the LPAR is configured with an adequate amount of zIIP capacity) is what IIPHONORPRIORITY=YES makes possible, and it is a safety valve that enables consistently good performance for a zIIP-heavy workload such as the one associated with DDF-using applications.

Potential non-maximization of zIIP offload. You might wonder, "How could setting IIPHONORPRIORITY to NO possibly have a reductive effect on offloading of work to zIIP engines?" I'll tell you how: when IIPHONORPRIORITY=NO is in effect, "Db2 does not allow system tasks to become eligible for zIIP processing." I have that in quotes because the words are taken directly from a page in the Db2 for z/OS Knowledge Center on IBM's Web site (on that page, scroll down to the part under the heading, "IIPHONORPRIORITY parameter"). To understand what the quoted phrase means, consider that zIIP-eligible Db2 tasks can be broadly divided into two categories: user tasks and system tasks. zIIP-eligible user tasks (which can also be thought of as application tasks) include those under which SQL statements issued by DRDA requesters - and SQL statements issued by native SQL procedures called through DDF - execute. Those zIIP-eligible tasks must be processed by zIIP engines when IIPHONORPRIORITY is set to NO. zIIP-eligible system tasks include those under which operations such as prefetch reads and database writes execute, and those tasks cannot run on zIIP engines when IIPHONORPRIORITY is set to NO (because in that case Db2 makes those tasks non-zIIP-eligible).

So, think about it: if you have sufficient zIIP capacity to keep spill-over of zIIP-eligible work to general-purpose engines at a low level, which should be the case in a production Db2 for z/OS environment (and I consider "low level" to be less than 1%), and you have IIPHONORPRIORITY set to NO, you might actually be causing total zIIP offload of Db2-related work to be lower than it otherwise would be, because prefetch reads and database writes, which are 100% zIIP-eligible when IIPHONORPRIORITY is set to YES, are 0% zIIP-eligible when IIPHONORPRIORITY=NO is in effect. That's no small thing - in some Db2 systems, there is a great deal of prefetch read activity. I mentioned previously that a telltale sign that IIPHONORPRIORITY is set to NO in a z/OS system in which Db2 runs can be found in a Db2 monitor-generated statistics long report. In such a report, scroll down to the section in which CPU consumption attributed to the various Db2 address spaces is shown. In that report section, check the CPU times for the Db2 database services address space. If IIPHONORPRIORITY is set to YES, it is highly likely that a majority - often, a very large majority - of that address space's CPU time is in the zIIP column (the column in the report labeled PREEMPT IIP SRB, or something similar to that) versus the general-purpose engine column (labeled CP CPU TIME, or something similar). This is a reflection of the fact that prefetch reads and database writes very often constitute most of the work done by the Db2 database services address space, in terms of CPU resources consumed. If, on the other hand, IIPHONORPRIORITY is set to NO, it is quite likely that a majority - and maybe a large majority - of the Db2 database services address space's CPU time will be in the general-purpose engine column versus the zIIP column.

Bottom line: if you want to actually maximize offloading of Db2-related work to zIIP engines, do two things: 1) ensure that IIPHONORPRIORITY is set to YES, and 2) have enough zIIP capacity to keep the zIIP spill-over rate (the percentage of zIIP-eligible work that ends up being executed on general-purpose engines) below 1%. Refer to the blog entry to which I provided a link, above, to see how to calculate the zIIP spill-over rate using numbers from a Db2 monitor-generated accounting long report.
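
If you don't have that entry handy, the calculation boils down to this ratio, using figures from the CPU times portion of the accounting report (field names vary somewhat from one Db2 monitor to another):

   zIIP spill-over rate (%) =
      100 x (CPU time of zIIP-eligible work that ran on general-purpose engines)
          / (that same CPU time + CPU time of zIIP-eligible work that ran on zIIP engines)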

Is there any situation in which going with IIPHONORPRIORITY=NO might be reasonable? I'd say, "Maybe," if you're talking about a test or development environment in which optimal DDF application performance is not very important. If the z/OS system in question is one in which a production Db2 subsystem runs, I'd be hard pressed to come up with a justification for IIPHONORPRIORITY=NO. For a production Db2 system, you want IIPHONORPRIORITY=YES. You might want to check on this at your site.