Sunday, March 31, 2019

A Case Study: Implementing a Db2 for z/OS Implicit Trusted Connection

I've been interested in the role and trusted context features of Db2 for z/OS for a long time (I posted a two-part blog entry on the topic back in 2011). Theory is nice, but what about practice? Recently, I had an opportunity to assist a DBA in implementing implicit trusted connection functionality in his Db2 for z/OS environment (more on the implicit part in a moment). Today I'll share some of what the DBA and I learned through that endeavor.

First, the backstory: the DBA's organization wanted to tighten Db2 for z/OS data security by preventing unauthorized utilization of the authentication credentials used by an application to connect to a Db2 system. The kind of application we're talking about here is one that accesses a Db2 for z/OS server through a TCP/IP link via the IBM Data Server Driver (or its predecessor, Db2 Connect). In requesting a connection to a Db2 system, the application provides an authorization ID and an associated password (assuming authentication using an ID and a password, versus an ID and a certificate). Often, the ID and password in question are known by several of the application's developers. What is to keep someone from using the application's authorization credentials to connect to the Db2 system from, say, a laptop PC, thereby gaining access to data that should be accessible only through the application?

Enter roles and trusted contexts. These Db2 features work together to shut down the potential security exposure just described. How does this work? First of all, any Db2 privileges needed by the application (e.g., the SELECT privilege on table T1) are NOT granted to the application's Db2 authorization ID. Those privileges are instead granted to a Db2 role, which I'll call APP1ROLE. Next, a Db2 trusted context is created to specify the circumstances under which the Db2 privileges granted to the role can be used by an application. If the trusted context is named TRCON1, and if the application authenticates to the Db2 system using authorization ID APP1ID and the application runs on the two application servers at IP addresses 1.2.3.4 and 1.2.3.5, the Db2 DDL statement defining the trusted context might look something like this:

CREATE TRUSTED CONTEXT TRCON1
BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
DEFAULT ROLE APP1ROLE
ATTRIBUTES (ADDRESS '1.2.3.4', ADDRESS '1.2.3.5')
ENABLE;

Here's what the above DDL statement is essentially saying: "The Db2 privileges granted to role APP1ROLE can be used ONLY by the application that connects to Db2 using authorization ID APP1ID, and ONLY when that application connects to this Db2 system from either IP address 1.2.3.4 or IP address 1.2.3.5." Following creation of the role and trusted context, suppose that someone who knows the application's authentication credentials uses them to connect to the Db2 system from a laptop PC. Assuming a successful connection from the laptop PC, is that an "Uh-oh" situation? No. Why not? Because no Db2 privileges have been granted to the application's authorization ID (connecting to a Db2 for z/OS system doesn't get you anywhere if you can't do anything after connecting). But what about the privileges granted to role APP1ROLE, which we've associated with the application's authorization ID? Those privileges are not available to the person who used the application's credentials to connect from the laptop PC, because the laptop's IP address is not one of the two specified in the definition of the trusted context (those being the addresses of the two application servers on which the application runs). What has been accomplished through the creation of the role and trusted context is a vast reduction in what security people sometimes call the "threat area": instead of worrying about the application's credentials being misused from any IP address, we need only secure the two IP addresses referenced in the trusted context definition (in other words, we need to ensure that an individual cannot make an unauthorized connection to the Db2 system from the two application servers on which the application of interest runs).
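
In terms of DDL, the privilege setup that goes with this arrangement is simple. Here is a minimal sketch, using the example table T1 and role APP1ROLE from above:

GRANT SELECT ON T1 TO ROLE APP1ROLE;

Note what is absent: there is no GRANT of SELECT ON T1 to authorization ID APP1ID - the application's ID gets its privileges only through the role, and only by way of the trusted context.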

Back to my DBA friend. His first attempt at successful use of a Db2 role and trusted context didn't work: the SELECT privilege on table T1 was granted to a role but not to authorization ID APP1ID (using the example ID previously referenced), and an attempt to execute SELECT COL1 FROM T1 by way of a trusted context, using ID APP1ID, failed with a Db2 "ID not authorized to perform this action" error code. We decided to check to make sure that the IP address used in defining the trusted context was the right one. To help with that verification, we 1) turned auto-commit off in the tool being used by the developer who was helping us with the test, and 2) granted SELECT ON T1 to APP1ID. Why did we do these two things? So that we could see the IP address of the developer's Db2 thread in the output of the Db2 -DISPLAY THREAD command: granting SELECT ON T1 to APP1ID ensured that the thread information wouldn't disappear due to the aforementioned authorization error, and turning auto-commit off ensured that a commit wasn't issued right after the successful SELECT - a commit would have put the thread into a disconnected state, and its information would likewise have disappeared from the command's output.

After we'd taken those two actions to keep the developer's thread in a connected state following the SELECT from T1, we had the developer try again. Sure enough, we saw information about the developer's still-connected Db2 thread in -DISPLAY THREAD(*) DETAIL output, but the client IP address information was missing. That information was absent because -DISPLAY THREAD(*) DETAIL is an overly generic form of the -DISPLAY THREAD command. When we switched to the form -DISPLAY THREAD(*) LOCATION(*) DETAIL, which tells Db2 that you want to see information about DDF threads, we got additional DDF-related information in the command's output, including the client IP address associated with the developer's transaction (that address is in the V445 section of the output). The client address we saw was the one used in the definition of the trusted context, so why didn't the trusted context work for us? Because in this case the developer's connection to the Db2 system was through a Db2 Connect "gateway" server (versus a direct connection from the client via the IBM Data Server Driver), and in that case the client IP address that Db2 "sees" is the address of the Db2 Connect gateway server, not the address of the client that is "upstream" from that gateway server.
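
For reference, here are the two command forms side by side (the annotations are mine):

-DISPLAY THREAD(*) DETAIL
  (generic form - client IP address information for DDF threads not included)

-DISPLAY THREAD(*) LOCATION(*) DETAIL
  (DDF threads - client IP address appears in the V445 section of the output)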

OK, so we addressed that problem by re-defining the trusted context, this time using the IP address of the Db2 Connect gateway server. Well, not exactly. We actually re-defined the trusted context by specifying the domain name of the Db2 Connect gateway server instead of its IP address. That can certainly be done in a CREATE (or ALTER) TRUSTED CONTEXT statement, and such a specification can be useful when (as was the case for my DBA friend's organization) the IP address of the Db2 Connect gateway server used by an application can change from time to time: if the application server references the gateway server's domain name, a change in the gateway server's IP address does not require a client-side change - the domain name is simply mapped to the new IP address. We tried again with the redefined trusted context AND IT STILL DIDN'T WORK. ARRGGGHHH. What was going on? Further investigation revealed the problem: the domain name we were given by a network administration colleague of the DBA was actually associated with a pair of IP addresses (corresponding to a pair of Db2 Connect gateway servers), and for a domain name specification in the ATTRIBUTES part of a CREATE (or ALTER) TRUSTED CONTEXT statement to work properly, it has to be associated with a single IP address.

Once again, the DBA re-defined the trusted context, this time using two domain names as address attributes, with each domain name mapping to the single IP address of one of the two Db2 Connect gateway servers in the cluster. He revoked SELECT ON T1 from the APP1ID authorization ID, and had the developer try again with a SELECT from T1 using APP1ID, AND IT WORKED!! The successful execution of the SELECT from T1 showed that the privilege granted to the role APP1ROLE was being used, and indeed we saw in the output of a -DISPLAY THREAD(*) LOCATION(*) DETAIL command that the developer's thread was using the established trusted context (that information is in the V485 section of the command's output).
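
Putting it all together, the working trusted context definition looked something like the sketch below (the gateway server domain names are hypothetical stand-ins for the real ones):

CREATE TRUSTED CONTEXT TRCON1
BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
DEFAULT ROLE APP1ROLE
ATTRIBUTES (ADDRESS 'db2gw1.example.com', ADDRESS 'db2gw2.example.com')
ENABLE;

The key lesson: each ADDRESS value, whether an IP address or a domain name, must correspond to a single IP address.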

Now, about the "implicit" nature of the trusted connection, referenced way up at the top of this blog entry (and in the entry's title): I wanted to make sure to emphasize the implicit nature of the trusted connection used in this example, because so much of the documentation on Db2 for z/OS trusted contexts pertains to explicit trusted connections. What's the difference? In the explicit case, a client application explicitly requests a trusted connection to the Db2 server. In the implicit case, the connection is just a "regular" connection from the client application's perspective, and it becomes a trusted connection because a trusted context matching the application's authorization ID and IP address (or domain name, or whatever other attribute is specified) has been defined on the Db2 side. If you can get a trusted connection to a Db2 server (one that enables use of the Db2 privileges granted to a role associated with the trusted connection) without having to explicitly request such a connection, why would you go the explicit route? You'd go that route if you needed functionality that is ONLY available for explicit trusted connections. An example of that functionality: an application, having requested and obtained an explicit trusted connection to a Db2 for z/OS server, can then tell Db2 that it is "switching" use of the connection from its ID (the ID the application used to connect to the Db2 server) to the ID of an end user of the application. That switching from one ID to another (and the associated flowing, by the application, of an actual end user's ID to the Db2 server system) is one of the things that has to happen if you want to make use of (for example) the enterprise identity mapping capability provided by RACF (whereby an individual's "network" ID can be mapped to a RACF ID). Note that a trusted context defined for the purpose of supporting explicit trusted connection functionality would likely include the WITH USE FOR clause in the CREATE TRUSTED CONTEXT statement - that clause can be used to reference end-user IDs passed to Db2 by an application (a sketch follows below). In the situation I've described in this blog entry, the DBA did not require explicit trusted connection functionality, so setting up implicit trusted connection capability was appropriate.
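
Here is a minimal sketch of what a trusted context supporting explicit trusted connections might look like (TRCON2, ENDUSR1 and ENDUSR2 are hypothetical names; WITH AUTHENTICATION means that authentication credentials must be supplied for the switched-to ID):

CREATE TRUSTED CONTEXT TRCON2
BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
DEFAULT ROLE APP1ROLE
ATTRIBUTES (ADDRESS '1.2.3.4')
ENABLE
WITH USE FOR ENDUSR1 WITHOUT AUTHENTICATION,
             ENDUSR2 WITH AUTHENTICATION;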

I hope that the information I've provided here will be useful to you if you want to utilize implicit trusted connection functionality in your Db2 for z/OS environment.

Monday, February 25, 2019

Db2 for z/OS: Isolating REST Client Activity in Db2 Monitor Accounting Reports

In an entry posted to this blog some years ago, I described the form of a Db2 monitor-generated accounting report that I most like to use to get an overall view of application activity in a Db2 for z/OS environment: an accounting long report with data ordered by connection type (here, I am using terminology associated with the Db2 monitor with which I'm most familiar: IBM's Tivoli OMEGAMON XE for Db2 Performance Expert on z/OS). I like to see data in an accounting report aggregated at the connection type level because within such a report there is a sub-report with detailed accounting information for each connection type through which the target Db2 subsystem was accessed: a sub-report showing all CICS-related work for the subsystem, one showing all TSO attach-related work, one showing all DDF-related work, etc.

Now, the thing about the DDF-related activity in this form of an accounting report is that it is presently - thanks to a recent and important Db2 for z/OS enhancement - kind of mislabeled. The DDF-related work in an accounting long report with data ordered by connection type is distinguished by the label CONNTYPE: DRDA. How is that a bit misleading? Well, the native REST interface to Db2 for z/OS (the enhancement to which I referred previously - designed into Db2 12 and retrofitted to Db2 11) is an extension of the Db2 distributed data facility (DDF's long name). What that means - as pointed out in a blog entry I posted a few months ago - is that there are now two paths into DDF: the DRDA path (which application developers might think of as the SQL path) and the REST path. Thus, CONNTYPE: DRDA is a little off, because what you see in that part of an order-by-connection-type Db2 monitor accounting report is all DDF-related activity on the Db2 subsystem of interest - activity associated with DRDA requesters and activity associated with REST clients.

What should the designator of DDF-related activity in an order-by-connection-type Db2 monitor accounting report be? CONNTYPE: DRDA+REST? CONNTYPE: DDF? Or, should there be a CONNTYPE: DRDA for work that is strictly related to actual DRDA requesters and a CONNTYPE: REST for activity related to REST clients?

I'll let the Db2 monitor developers figure that out. Meanwhile, if you have a Db2 subsystem on which there is a good bit of activity related to both DRDA requesters and REST clients, and you want to see, in one Db2 monitor accounting report, just the REST-related activity, how can you do that? If all the REST clients authenticated to the Db2 subsystem using only two distinct authorization IDs - for example, ABCID and XYZID - then you could get your REST-only accounting report by telling the Db2 monitor to include, in the accounting report, only activity related to those primary authorization IDs. The way you'd convey that request to OMEGAMON for Db2 would be through this specification in the report control statement:

INCLUDE (PRIMAUTH(ABCID XYZID))

Obviously, that approach becomes unwieldy if there are, say, a hundred or more IDs used by Db2 REST clients.

Is there a better way? Yes, there is. It so happens that a Db2 accounting trace record associated with a REST client interaction will include, in the correlation ID field (the field name is QWHCCV), the value 'DB2_REST'. Thus, one can easily get a Db2 monitor accounting report showing all REST-related activity for a subsystem, and only that activity, by informing the monitor that the report should include data only for DDF-related requests for which the correlation ID value is DB2_REST. Again using OMEGAMON for Db2 report command syntax (which is what I know), that would be done with this specification:

INCLUDE(CONNTYPE(DRDA)
        CORRNAME(DB2_REST))

And there you have it. I hope that this information will be useful to you, should your site start seeing a significant amount of Db2 for z/OS access via the REST interface (and I think that a substantial number of sites will see just that in the near future - the REST interface has generated a lot of interest among application developers).

Friday, January 4, 2019

A Case Study: Measuring the Impact of Db2 for z/OS Buffer Pool Changes

Not long ago, a Db2 for z/OS administrator contacted me with a request. His team had made some changes to the buffer pool configuration in their production Db2 environment (the overall size of the buffer pool configuration was increased to reduce read I/O activity, and greater use was made of large real storage page frames), and they were not sure as to how the performance impact of these changes could be measured. Could I assist with that analysis?

Sure, I said. I asked the Db2 admin to use his Db2 performance monitor tool to generate detailed accounting reports (depending on the monitor used, such a report might be titled ACCOUNTING REPORT - LONG or ACCOUNTING SUMMARY - LONG) capturing activity during periods preceding and following the buffer pool changes - same FROM and TO times, same day of the week for both reports (to make the comparison as "apples to apples" as possible). I also requested that the reports be generated so as to have data aggregated at the connection type level (again, terminology can vary by monitor product - your instruction to the monitor might be to "order" the data by connection type, or to "group" the data by connection type). Why do I like for data in a detailed accounting report to be aggregated in this way? Well, I find the default aggregation specification for Db2 monitor-generated accounting reports - authorization ID within plan name - to be too fine-grained for effective analysis of a Db2 system in an overall sense; conversely, aggregation at the subsystem ID level is generally too coarse-grained for my liking. When data in a detailed accounting report is aggregated at the connection type level, you get within the larger report a sub-report for each connection type used in the Db2 for z/OS system of interest; for example: a sub-report showing all CICS-Db2 work in the system, one showing all batch work that uses the call attachment facility, one showing all batch work using the TSO attachment facility, one showing all DDF-related work (connection type: DRDA), etc. With such a report in hand (or on your screen), you can quickly and easily identify the largest Db2 workload components (with "component" here referring to work associated with a given connection type) and focus your analysis accordingly.
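
For reference, a request for that kind of report might look something like this (I'm showing OMEGAMON for Db2 control statement syntax, as that is the monitor I know best - the equivalent specification for another monitor would differ):

ACCOUNTING
  REPORT
    LAYOUT(LONG)
    ORDER(CONNTYPE)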

I showed the Db2 admin and one of his colleagues how to put the aforementioned "before" and "after" Db2 monitor accounting reports to use in measuring the impact of the effected buffer pool configuration changes, and I'll provide that knowledge transfer here in the remainder of this blog entry.

First, I look over the sub-reports (by connection type) within the larger "before" and "after" detailed accounting reports to find the workload components (connection types) that dominate in terms of size. "Size" refers in this case to aggregate in-Db2 CPU cost of SQL statement execution, and that's easy to determine: for a given workload component (a given connection type), first find the average in-Db2 (i.e., class 2) CPU time (done by summing two values: average class 2 CP CPU time, which is general-purpose CPU time, and average class 2 SE CPU time, which is "specialty engine," or zIIP, CPU time). Next, take that average class 2 CPU time for the connection type and multiply it by the number of occurrences (typically, the number of accounting records), because "average" is "average per occurrence." Here is an example of what you might see in one of the sub-reports (this one for a DRDA workload), showing the numbers I've referenced:

CONNTYPE: DRDA

AVERAGE      DB2 (CL.2)   HIGHLIGHTS
-----------  ----------   ----------------------
CP CPU TIME    0.000169   #OCCURRENCES : 3620821
SE CPU TIME    0.000223

So in this case, aggregate CPU cost of SQL statement execution (in-Db2, or class 2, accounting time is time associated with SQL statement execution) for the DRDA workload component is (0.000169 + 0.000223) * 3620821 = approximately 1419 CPU seconds.

Performing this simple calculation showed that for the overall Db2 workload I was analyzing, the two biggest components by far were the DRDA and CICS-Db2 connection types; so, I didn't bother with the other much-smaller components (call attachment facility, TSO attachment facility, etc.).

For the two major workload components (DRDA and CICS-Db2), I compared "before" and "after" figures for average class 2 CPU time. That is the key metric for evaluating the impact of a Db2 system or application performance tuning action. Too often, Db2 for z/OS people effect a change aimed at improving workload performance and look, for evidence of improvement, at the total class 2 CPU time for the workload of interest (and by the way, I focus on class 2 CPU time because that is what a Db2 person can influence - class 1 CPU time includes time outside of Db2, over which a Db2 person may not have much control). "Before" and "after" figures for total class 2 CPU time for a workload may show little or nothing in the way of difference, leading a Db2 person to think that a performance tuning action accomplished nothing. In fact, it may be that the tuning change was indeed successful. How could that be, if total in-Db2 CPU time did not decrease? Simple: look for a change in throughput (the number of commits is a pretty good indicator of "pieces of Db2 work" completed on a system). A Db2 tuning change might reduce average in-Db2 CPU time per transaction (or, more broadly, per unit of work), and that might result in greater throughput (units of work completed in a given time period), and that (more transactions completed, at less in-Db2 CPU time per transaction) might result in a non-decrease in the total in-Db2 CPU time for the workload. In my experience, greater throughput with no increase in total in-Db2 CPU time for a workload is a GOOD thing. If you see increased throughput following a Db2 performance tuning change and your organization actually wants to keep throughput at the lower "before" level, that can be accomplished by limiting the availability of Db2 threads for the workload in question - essentially, you keep a lid on throughput by introducing some queuing for Db2 threads.
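
To make that concrete with some purely hypothetical numbers (total class 2 CPU time = average class 2 CPU time per occurrence x number of occurrences):

Before tuning:  0.000400 sec/commit x 3,000,000 commits = 1200 CPU seconds
After tuning:   0.000320 sec/commit x 3,750,000 commits = 1200 CPU seconds

Total in-Db2 CPU time is unchanged, but the per-unit-of-work CPU cost dropped by 20% and throughput rose by 25% - a successful tuning outcome that a totals-only comparison would completely miss.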

OK, so your focus as a Db2 person should be on reducing average class 2 CPU time for a workload, as indicated in a Db2 monitor-generated accounting report (or an online monitor display of thread activity on the system - but I prefer to look at accounting reports). In tracking this, make sure that you look at "average total" class 2 CPU time, meaning the sum of average class 2 general-purpose CPU time (usually labeled CP CPU TIME) and average class 2 zIIP CPU time (usually labeled SE CPU TIME). Always check the average class 2 zIIP CPU time, even for non-DDF-related applications that you might not think of as drivers of zIIP utilization, because that zIIP time figure could be non-zero: for example, if some queries in a batch job are parallelized by Db2, some of the batch job's class 2 CPU time will be consumed on zIIP engines, because the "pieces" of queries parallelized by Db2 are zIIP-eligible. As indicated by the average-class-2-CPU-time yardstick, how did the Db2 team with which I worked do with regard to improving application workload performance?

They did quite well:

                      Avg cl 2 CPU (GP + zIIP) - seconds
Workload component      Before     After        Change
------------------    --------  --------  ------------
CICS-Db2              0.001322  0.001245   6% decrease
DRDA                  0.000392  0.000259  34% decrease

That 6% improvement in average class 2 CPU time for CICS-Db2 transactions is close to what I am accustomed to seeing as a result of buffer pool actions such as those taken by the Db2 team in this case: larger buffer pools, and greater use of page-fixed pools backed by large real storage page frames, tend to shave a few percentage points off of average class 2 CPU times reported by a Db2 monitor. A modest improvement of that nature is generally welcome, as it requires no application code changes and positively impacts a large number of programs (it's the kind of change that I put in the "rising tide lifts all boats" category).

What about the 34% improvement in class 2 CPU efficiency seen for the DRDA workload as a result of the buffer pool changes? Why such a large improvement? Here's my theory: as is true in many cases, the SQL statements associated with the DRDA workload at this site are overwhelmingly dynamic in nature (that is commonly seen when DDF transactions do not make heavy use of static SQL-issuing stored procedures). Dynamic SQL statements can be automatically re-optimized on a regular basis, particularly when a "bounce" (stop and restart) of a Db2 subsystem requires total re-population of the Db2 dynamic statement cache in memory (where the prepared-for-execution forms of dynamic SQL statements are stored for re-use). The Db2 subsystem on which this blog entry is focused was indeed bounced on the weekend during which the buffer pool changes were put into effect, so ALL of the dynamic SQL statements coming into the system following that Db2 stop/restart were optimized at first execution. And here's the thing: it's not just indexes and catalog statistics that influence SQL statement access path selection. Db2 memory resources - particularly buffer pool resources - are another factor. I have a feeling that when the dynamic SQL statements (especially queries) associated with the DRDA workload were re-optimized following the significant changes made to the Db2 subsystem's buffer pool configuration, some new and better-performing access paths were generated. Further evidence of positive access path changes: average GETPAGE activity per transaction for the DDF workload decreased by 24% following the buffer pool changes. GETPAGE activity is a major determinant of the CPU cost of SQL statement execution, and a substantial decrease in such activity is often indicative of access path changes.
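
As an aside, if you want to see what is in a subsystem's dynamic statement cache, one way to do that - a sketch, assuming the EXPLAIN tables (including DSN_STATEMENT_CACHE_TABLE) exist for the ID issuing the statement - is:

EXPLAIN STMTCACHE ALL;

That statement writes a row of information about each statement in the dynamic statement cache to the DSN_STATEMENT_CACHE_TABLE.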

This kind of begs the question: would a rebind of the packages associated with the CICS-Db2 workload (a workload for which, at this site and many other sites, almost 100% of the SQL statements are static) following the buffer pool changes have led to per-transaction CPU efficiency gains going beyond the 6% improvement that we saw? Quite possibly. That's something to think about. If you do decide to rebind static SQL packages following some significant buffer pool changes, keep a couple of things in mind. First, rebinding with APCOMPARE(WARN) will cause Db2 to let you know (via message DSNT285I) when access paths change for one or more statements associated with a package (information about access path changes is provided in the REMARKS column of the PLAN_TABLE when a package is rebound with EXPLAIN(YES)). Second, if access paths change and package performance gets worse as a result (not likely, but possible), you can quickly and easily put the previous instance of the package back into use through a REBIND with SWITCH(PREVIOUS), assuming that plan management is in effect for your Db2 subsystem (and it is "on" by default - check the value of the PLANMGMT parameter in the ZPARM module). Together, these Db2 features are aimed at "taking the fear out of rebinding," as a former colleague of mine so aptly put it.
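
If you go the rebind route, a minimal sketch of the relevant commands follows (COLL1.PKGA is a hypothetical collection-and-package name):

REBIND PACKAGE(COLL1.PKGA) APCOMPARE(WARN) EXPLAIN(YES)

And if, in that unlikely event, performance worsens following the rebind:

REBIND PACKAGE(COLL1.PKGA) SWITCH(PREVIOUS)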

So, there you go. If you make Db2 changes aimed at boosting performance for a workload, measure the result using the average class 2 CPU time figures provided in Db2 monitor-generated accounting detail reports that show activity from "before" and "after" time periods. Buffer pool changes (larger pools, greater use of page-fixed pools backed by large real storage page frames) usually reduce average class 2 CPU time by a few percentage points. Greater performance gains could be seen if the buffer pool changes enable Db2 to choose new and better-performing access paths for SQL statements. Those statement re-optimization actions tend to be automatic for dynamic SQL statements (especially if a Db2 subsystem's dynamic statement cache has to be re-populated). For static SQL, package rebind is needed for statement re-optimization. Rebinding with APCOMPARE(WARN) and EXPLAIN(YES) will provide information about access path changes, and in the unlikely event of a negative performance result following re-optimization, rebinding with SWITCH(PREVIOUS) puts the previous instance of the package back into use.