Robert's Db2 blog: March 2013

Monday, March 25, 2013

Bringing Control to a DB2 for z/OS Client-Server Workload (Part 2)

In part one of this two-part blog entry, I pointed out that the notion that one cannot control a DB2 for z/OS client-server workload as one can a local-to-DB2 application is quite outdated, and I described a couple of the mechanisms (native SQL procedures, and roles and trusted contexts) by which a DB2 DDF workload can be controlled. In this part two entry, I'll cover some of the other available options for conforming DB2 client-server activity to organizational requirements.

The z/OS Workload Manager, for application priority control

WLM is nothing new -- it's been around for a long time. Still, there are people who seem to feel that WLM cannot be used for precision control of a DB2 client-server workload. What really galls me is the belief -- still, apparently, held by some individuals -- that all work processed through DDF executes at the priority of the DDF address space. That was once the case, long ago. What's true now, and has been for years, is that DDF's dispatching priority applies only to work done by what you might call the address space's "main" or "system" tasks -- and that work is likely to account for only a very small percentage of the total CPU consumption associated with a DB2 client-server workload (as I pointed out in a blog entry on this topic that I posted last year). The vast majority of DDF-related CPU consumption is directly tied to the execution of SQL statements executed via database access threads (DBATs), and THAT work can most definitely be controlled, in a dispatching priority sense, by way of a WLM policy. In fact, it's important to explicitly assign priorities to components of your DDF workload, because if you don't then your DB2 client-server transactions will execute, by default, not at the priority of the DDF address space but as "discretionary" work. That's a low priority, and not at all good for performance in a busy system (and z/OS LPARs are commonly quite busy, typically running with relatively high levels of CPU utilization).

If you have a "mixed" DDF application environment, with some high-volume, structured transactions, some lower-volume, less-structured decision-support queries, some long-running report-generating processes, etc., can you distinguish these different types of work from a z/OS dispatching priority perspective? Of course you can. Through a WLM policy you have the ability to set up multiple service classes for your DB2 client-server workload, and to map various components of the workload to appropriate service classes using a variety of identifiers. These identifiers, which can be provided by client information APIs or through client-side data source specifications, include -- but are not limited to -- the following:

Accounting information
Collection name (i.e., the name of the collection for the first package used by a DRDA requester in a unit of work)
Stored procedure name (the name of the stored procedure initially called in a unit of work)
Application process name
User ID (the primary DB2 authorization ID associated with an application)

A broad range of WLM options are available for managing DDF workload prioritization, including "period aging" for longer-running processes. A really good write-up of DB2 client-server application priority control using WLM can be found in the IBM "red book" titled, "DB2 9 for z/OS: Distributed Functions" (this information is applicable to DB2 10 systems, as well). Check out section 3.3.2 in this document: "Managing DDF work with WLM."

Profiles for application-level control of DB2 connection resources

Some mainframe DB2 people have wished for a while for a more granular, server-side means of managing client connections to a DB2 for z/OS system -- something more refined than the ZPARM parameters CONDBAT (maximum connections to a DB2 subsystem from network-attached clients), MAXDBAT (maximum number of concurrently active database access threads for a DB2 subsystem), and IDTHTOIN (the maximum amount of time that an active DBAT can be idle before being cancelled). Those folks got what they wanted when, with DB2 10, the two tables SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES, originally introduced with DB2 9 for SQL statement monitoring purposes, were made the enablers of an application-level connection management capability. Using these tables, one can define multiple scopes of DB2 connection control, based on criteria such as the IP address of an application server, the DB2 authorization ID or role associated with an application, or the name of a collection or a package used by an application. For a client-server workload component identified by one of these criteria, one can define and manage controls in relation to several thresholds:

The maximum number of connections (including both active and inactive connections) to the DB2 subsystem from an IP address or domain name.
The maximum number of concurrently active DBATs used for a DDF workload component identified by IP address, authorization ID, role, collection, or client application name, among other filtering criteria.
The active DBAT idle timeout limit for a DDF workload component (as for concurrent active DBAT limits, there are multiple criteria available for defining a particular workload component). Note that the specified active DBAT idle timeout limit can be greater or less than the subsystem-wide active DBAT idle timeout limit indicated by the value of IDTHTOIN in ZPARM.

For any of these connection control profiles, you can instruct DB2 to just inform you (via messages) when thresholds have been exceeded, or take action when limits are reached (e.g., reject new incoming remote connection requests from an IP address when the connection limit for that IP address has been reached). You can find a lot more information about managing DDF connections and threads using profile tables in the DB2 10 for z/OS Information Center.

The query repository, for access path and execution option control at the SQL statement level

As I mentioned in the part one entry that preceded this part two, DB2 client-server applications very often involve the execution of dynamic SQL statements. This fact presented some control challenges that were, formerly, particularly acute in situations in which the dynamic SQL statements in question could not be modified (think about client-server applications purchased from vendors, or queries generated by business intelligence tools). I say "formerly" because DB2 10 delivered a significant enhancement in the area of dynamic SQL statement control: the query repository (more formally known as the access path repository). That term refers to a set of new catalog tables that can be used for two important purposes: access path stabilization and statement execution option selectivity. In both cases, one begins by binding the individual SQL statement in question, using a DSN_USERQUERY_TABLE (the schema of which would be your DB2 authorization ID) and the new BIND QUERY command. With that action, information is inserted into the new SYSIBM.SYSQUERY catalog table. Two other new catalog tables SYSIBM.SYSQUERYPLAN and SYSIBM.SYSQUERYOPTS, can also be populated with information when you issue the BIND QUERY command, depending on what you want to do:

Influence a query's access path using a hint, WITHOUT having to modify the query itself. I refer to this as "hintless hints," to distinguish the capability from the more traditional DB2 hint. The problem with the latter is that it often requires modification of the SQL statement in question to add a QUERYNO clause, and as I've noted that is frequently not possible. With DB2 10, a hint can be applied to a query based on a matching of that query's text with that of a query in the SYSQUERY catalog table; thus, no modification of the query for which you want the hint to apply is required. What's more, you can specify that a hint of this type is to apply based only on query text-matching, or on a combination of text-matching and a particular schema or collection or package.
Specify certain execution options for a particular SQL statement. Sometimes, you want a particular dynamic SQL statement to be a candidate for query parallelization, without making ALL dynamic queries candidates for parallelization through a specification of ANY for the CDSSRDEF parameter in ZPARM. Similarly, you might want a REOPT option (such as ONCE or AUTO) to apply to a certain statement, versus all statements associated with a given package (a useful capability when a whole lot of dynamic statements are associated with, say, one of the IBM Data Server Driver or DB2 Connect packages). The query repository lets you exert that kind of specific control for query parallelization, reoptimization, and other execution options (such as those pertaining to star join processing).

The DB2 10 Information Center is a great source of additional information on creating statement-level optimization hints and parameters.

And there's more...

How about control over retention of certain resources needed for execution of statements associated with a given package? Prior to DB2 10, RELEASE(COMMIT) was the only option for packages executed through DDF (a package been bound with RELEASE(DEALLOCATE) would be treated as though bound with RELEASE(COMMIT), if executed via a DBAT). DB2 10 high-performance DBATs changed that. Do you run DB2 in data sharing mode? If so, do you want to control the data sharing group members on which requests from certain client-server applications can be processed? If you do, member subsetting is your ticket. Want even more control over the execution of a client-server workload in a DB2 data sharing system? Check out the IBM's InfoSphere Optim Configuration Manager.

Client-server applications are a major driver -- perhaps the major driver -- of DB2 for z/OS workload growth these days. Be a part of that story, and know that this is work that you CAN control.

Saturday, March 16, 2013

Bringing Control to a DB2 for z/OS Client-Server Workload (Part 1)

At System z sites all over the world, DB2 for z/OS workloads are growing -- often at a brisk clip. You'd think that this would be unqualified good news for the folks who support mainframe DB2 subsystems, but in fact some people view a lot of this new DB2 activity with trepidation. Why? Because the lion's share of application work being implemented these days on a DB2 for z/OS foundation is of the client-server variety. In other words, it involves data access through DB2's distributed data facility, aka DDF. OK, so what's the problem with that? The problem has to do with the belief of some individuals that a DB2 DDF workload is one that they cannot control.

I'll tell you up front that this notion of a DB2 client-server workload as being a chaotic swirl of data requests that execute without regard to rules is NOT grounded in facts -- at least, not in today's facts. And there you have a root cause of a lot of misunderstanding: knowledge of DDF that was current 20 years ago but which hasn't kept pace with enhancements that have been delivered with each succeeding release of DB2. In this blog post and in the part 2 post that will follow, I will provide information on several mechanisms that can be used to establish various controls on a DB2 for z/OS client-server application workload.

Let me start by saying that I don't consider people who are concerned about managing a DB2 DDF workload to be wound too tightly, as we say. No, they're being good mainframers. They support a platform that is known for rock-solid reliability, and a pillar of that reliability is control. In a DB2 sense, that means control over data access privileges, over SQL statement access paths, over workload classification and and management, and over DB2 connection resources provided to particular applications. It's true that in the early years of DDF (back in the 1990s), some controls that you'd like to have weren't there. Now they are. In the paragraphs below I'll describe a few of the techniques that can be utilized to bring control to a DB2 for z/OS client-server workload. More information will be provided in another week or so, in the companion to this part 1 entry.

Native SQL procedures, for data access control and SQL statement access path stability

Now, in truth, the SQL statement access path and data access control benefits of stored procedures are available whether external or native SQL procedures are utilized; however, in a client-server application context I'm particularly keen on native SQL procedures (about which I first started blogging while working as an independent DB2 consultant), for a couple of reasons:

All kinds of people can write them. This includes people who have worked predominantly with DB2 for LUW, or even with relational database management systems other than DB2. If you can write SQL, you can code a native SQL procedure (and it's even easier when you use Data Studio).
They give you lots of zIIP engine offload when invoked through DDF. Unlike an external DB2 stored procedure, which runs under a TCB in a WLM-managed stored procedure address space, a native SQL procedure runs under the task of the calling program. When the CALL comes from a DRDA requester, that task is a preemptible SRB in the DB2 DDF address space, and so the processing associated with the native SQL procedure's execution can be largely offloaded (to the tune of about 60%) to an available zIIP engine. Native SQL procedures are pretty CPU-efficient anyway (and more so in a DB2 10 environment). Toss in the high degree of zIIP eligibility in a client-server setting, and native SQL procedures look even more attractive from a cost-of-computing standpoint.

The data access control and access path stability benefits of native SQL procedures owe to the fact that the SQL statements issued by such a procedure are static (or typically so -- one can, of course, code PREPARE statements in native SQL procedures). Data security is enhanced when static SQL is used because the only authorization required for successful execution of SQL statements in that case is the EXECUTE privilege on the package into which the statements were bound (plus, in the case of a stored procedure, the EXECUTE privilege on the stored procedure). By contrast, successful execution of dynamic SQL DML statements requires that the authorization ID of the process issuing the statements have the requisite access privileges (SELECT and/or INSERT and/or UPDATE and/or DELETE) on the tables targeted by the statements. Stored procedures further enhance data security by abstracting from client-side developers the details of the database schema (table names, column names, etc.).

The stable access paths that characterize static SQL statements help to ensure consistent performance when the statements are executed. Static SQL also minimizes CPU consumption: even with a high "hit" percentage in the DB2 dynamic statement cache, dynamic SQL might approach -- but will not equal -- the CPU efficiency of static SQL.

So, static SQL is good, but do you need to utilize stored procedures to get the benefits of static SQL for DB2 client-server applications? Not necessarily. If your client-side code is written in Java, those programs can issue static SQL statements by way of SQLJ; however, while SQLJ is a standard at some sites, it is often the case that Java programmers will have a strong preference for using JDBC, and JDBC means dynamic SQL at the DB2 data server. That's what makes stored procedures a great fit in a DB2 client-server application environment: they provide a means whereby static SQL can be dynamically invoked. If your client-side developers prefer a database interface such as JDBC or ODBC, fine -- let them use JDBC or ODBC to call DB2 stored procedures and to process output parameters and/or result sets returned by those stored procedures.

Roles and trusted contexts for data access control

I won't repeat here everything that I wrote in a blog entry on roles and trusted contexts that I posted a couple of years ago. Let me instead focus on setting the stage as to why these features, introduced with DB2 9 for z/OS, are so valuable when it comes to controlling data access in a DB2 client-server context.

Sometimes, DB2 client-server applications are going to issue dynamic SQL DML statements, whether you like it or not. That will very often be case when the application in question is vendor-supplied (e.g., a purchased enterprise resource planning or human resources application). You'll be dealing with dynamic SQL, as well, if statements are generated by an end-user query or reporting tool. You could have dynamic SQL statements coming from an in-house-developed application, if the aforementioned stored procedure route (or the SQLJ route, for Java programs) is not taken. Whatever the reason for dynamic SQL DML statements, they create a potential data security exposure. As mentioned previously, successful execution of a dynamic SELECT, INSERT, UPDATE, or DELETE statement requires that the application's DB2 authorization ID have the corresponding privilege (i.e., SELECT, INSERT, UPDATE, DELETE) on the targeted table. Now, it's likely that an application issuing dynamic SQL DML statements will connect to DB2 using an authorization ID and password that are unique to the application, but anyone who knows that authorization ID and password would seem to be free to use the table access privileges granted to the ID to access data -- maybe very sensitive data -- outside of the application. That's kind of a scary thought.

Enter roles and trusted contexts. In essence, here's how these two DB2 features work together to tighten data access control for client-server applications that issue dynamic SQL DML statements: first, the table access privileges needed for an application to work are granted not to an authorization ID, but to a role (something created by way of the CREATE ROLE statement). Next, the conditions of the use of the role's privilege set are defined through the creation of a trusted context. In plain English, the CREATE TRUSTED CONTEXT statement says something like this: "OK, I will allow the privileges granted to role XYZ to be used by an application that connects to the DB2 subsystem using authorization ID ABC, but only when that connection is established from the application server with IP address 9.30.131.203." If someone who knows the application's authorization ID and password attempts to use these credentials to connect to the DB2 subsystem from his personal workstation, the connection might be successful but no security exposure will exist because no DB2 privileges were granted to the authorization ID -- they were granted to role XYZ. The unauthorized user of the application's ID will not be able to use the table privileges granted to role XYZ, because he has connected to the DB2 subsystem from an IP address other than the one specified in the CREATE TRUSTED CONTEXT statement. If the rogue user isn't permitted to log in to the app server at (in this example) IP address 9.30.131.203, he can't do anything, DB2-wise, with the application's authorization ID and password. If needs be, a trusted context can further tighten data access control by specifying that only user IDs SMITH, JONES, and MILLER can use the privileges granted to role XYZ, and again only when the "come from" IP address is 9.30.131.203. With roles and trusted contexts, people responsible for safeguarding DB2 data can rest easier, even when client-server applications are issuing dynamic SQL DML statements.

And more to come...

In my part 2 entry, I'll provide information about additional mechanisms -- namely, the z/OS workload manager, DB2 10 profiles, and the DB2 10 query repository -- that you can use to control a DB2 client-server workload.