Wednesday, September 7, 2011

Managing DB2 for z/OS CPU Parallelism for Dynamic SQL

DB2 for z/OS query CPU parallelism (hereinafter referred to simply as "query parallelism"), introduced with DB2 Version 4 in the mid-1990s, enables DB2 to split a query into multiple tasks that can execute concurrently on a mainframe server. The primary benefit of query parallelism is improved performance from an elapsed time perspective (query parallelism tends to increase statement execution CPU time somewhat -- this because DB2 needs to evaluate a candidate statement to determine whether or not parallelism would improve the query's run time, and because DB2 has to manage a parallelized query's "child" tasks). That said, for locally-originating queries (i.e., those issued by programs running on the mainframe server on which the target DB2 subsystem is running), a sometimes equally-appealing benefit is redirection of work to the server's zIIP engines, if the server is configured with zIIPs (queries that come from remote DRDA requesters, processed through DB2's distributed data facility, already benefit, cost-wise, from substantial zIIP offload).

One matter that sometimes comes up when an organization is considering the use of query parallelism concerns management of the DB2 feature's scope-of-use. In other words, people may want some, but not all, queries executed in a given DB2 environment to be candidates for parallelism. This some-but-not-all approach makes sense in certain situations, particularly when there is a desire to utilize query parallelism in a system on which high-volume OLTP and/or batch applications are already running. Mainframe servers that handle such operational workloads often run at high levels of utilization during peak processing hours (sometimes 90% busy or more). On that type of system, the CPU cost of evaluation by DB2 of all queries for estimated parallelism performance benefit, and of management by DB2 of child tasks for queries that are parallelized, could push the server's utilization rate to a level that would negatively impact the performance of the existing operational applications. [Note: for a DB2 subsystem that is dedicated to business intelligence usage (an example being a data warehousing system), it might be appropriate to make all dynamic queries -- and dynamic SQL tends to dominate in BI environments -- candidates for parallelism by changing the value of the DB2 ZPARM parameter CDSSRDEF to ANY from its default value of 1.]

So, if you want some, but not all, of your queries to be candidates for parallelization, how do you go about putting that aim into action? For static SQL, it's easy: identify the program that issues the SELECTs that you'd like DB2 to parallelize, and rebind that package with a specification of DEGREE(ANY). For dynamic SQL, selective implementation of query parallelism might be a little less straightforward, depending on the particulars of the situation. If dynamic queries are issued by a program that an organization's developers have coded, making that program's queries candidates for parallelization would simply require that the program issue the SQL statement SET CURRENT DEGREE = 'ANY' prior to issuing the queries. In other cases, the SET CURRENT DEGREE mechanism may not be applicable. You might have a situation in which dynamic queries are generated and issued by an end-user data query tool, and that tool might not provide a means of setting the value of the CURRENT DEGREE special register to 'ANY'. Or, the dynamic queries you want DB2 to parallelize might be issued by a purchased application for which the vendor does not supply source code, thereby making it impossible to add a SET CURRENT DEGREE statement to a program (and even if you have the source code for a purchased application, modifying that source code could put you in violation of the vendor's support agreement).

What can you do if the SET CURRENT DEGREE = 'ANY' option is not available to you? Here, you have some options with respect to implementing limited-scope parallelism for dynamic queries (i.e., implementing parallelism for queries in a "some-but-not-all" fashion), as explained below. Note that for all three of the parallelism management approaches described below, the value of the CDSSRDEF parameter in ZPARM is assumed to be ANY (in other words, it's assumed that you've opened the door for all-dynamic-query parallelism on that subsystem, and you'll then take one or more steps to close that door for all but a portion of the dynamic query workload).  

Limit query parallelism by table -- Suppose that you want to make dynamic queries targeting table ABC candidates for parallelism, while preventing parallelism for dynamic queries accessing other tables. This could be done pretty easily be assigning the table space associated with table ABC to its own buffer pool, and leaving the value of that pool's VPPSEQT parameter at the default of 50 (or specify a value greater than 50, if you want -- more on this momentarily). Set the value of VPPSEQT to zero for your other buffer pools, and parallelism for queries targeting tables in table spaces assigned to those pools will be disabled as a result.

Now, a bit more on VPPSEQT (short for Virtual Pool Parallel SEQuential Threshold): this is one of several DB2 buffer pool parameter specifications that can be altered for a particular pool via execution of an ALTER BUFFERPOOL command. The default value for VPPSEQT, as noted, is 50. This is actually a percentage of the VPSEQT (Virtual Pool SEQuential Threshold) value, which is 80 by default. That 80 is a percentage of the associated pool's buffers, so by default 50% X 80% = 40% of a pool's buffers are available to hold table and/or index pages that are read from disk as a result of parallel-mode query execution. If you change the VPPSEQT threshold for a pool to 60 and leave the VPSEQT threshold at 80 then, for that pool, 60% X 80% = 48% of the buffers will be available to hold pages read into memory in support of parallel-mode query processing. If VPPSEQT is set to zero, no buffers will be available to support parallel-mode query processing, so queries targeting tables in table spaces assigned to that pool will not be candidates for parallelization.

Limit query parallelism by package, collection, etc. -- A lot of people think of DB2's Resource Limit Facility (RLF) only as a means of limiting the amount of CPU time that a dynamic SQL statement can accumulate in execution before it is terminated. That is indeed the most common usage of RLF, and it is actuated through the use of what's called an RLMT table (the name of this table is actually of the form DSNRLMTxx, with xx being a user-specified two-character alphanumeric value). You could alternatively activate an RLST table (DSNRLSTxx), which would allow you to (among other things) deactivate query parallelism for dynamic queries associated with workload components qualified by certain identifiers (options include authorization ID, collection name, and package name -- these correspond to columns of the RLST table). Remember, your specification of ANY for CDSSRDEF in ZPARM opened the door for parallelism candidacy to all dynamic queries -- through the RLST table (via the value '4' in the RLFUNC column) you are closing that door for workloads OTHER than the one for which you want dynamic queries to be candidates for parallelism. Here's an interesting and easy-to-specify set-up: with one row in the RLST table, having '4' in column RLFUNC and no values in any other columns, parallelism will be disabled for dynamic queries that are local to the DB2 subsystem (e.g., issued via SPUFI or QMF or batch jobs), while allowing parallelism for queries issued from remote DRDA requesters (this thanks to a blank value in the LUNAME column of the RLST table, which represents the local location). You can find more information about the RLST table in the DB2 Version 8 Administration Guide, the DB2 9 Performance Monitoring and Tuning Guide, or the DB2 10 Managing Performance manual.

Limit query parallelism by DB2 data sharing group member -- I really like this option, but it does, of course, require that you have DB2 running in data sharing mode on a Parallel Sysplex. The concept here is pretty simple: you take one or more of the members of the data sharing group, and you set the value of the ZPARM parameter CDSSRDEF to ANY on those subsystems (leaving the value of CDSSRDEF at 1 on the other members of the group -- and it's assumed that you'll run your high-volume OLTP and batch processes on the CDSSRDEF = 1 subsystems). If you have "local" programs that issue dynamic queries that you want DB2 to parallelize, run those on the subsystem(s) that has CDSSRDEF = ANY. If you have users and/or applications that issue dynamic queries that access host data via DRDA and DB2's distributed data facility, have these users and apps connect to the CDSSRDEF = ANY subsystem(s). For maximum availability as it pertains to programs and users issuing dynamic queries that are to be candidates for parallelism, it's best to have at least two member DB2 subsystems set up for parallelism-by-default (i.e., with CDSSRDEF = ANY). That way, if one is down due to a planned (e.g., for maintenance) or unplanned outage, the other member(s) can continue to parallelize dynamic queries as desired. This kind of configuration for high availability was made MUCH easier for DRDA requesters when DB2 for z/OS Version 8 introduced a member-subsetting capability. Setting this up would involve creating an alias location name for the DB2 members for which you have CDSSRDEF = ANY, assigning a TCP/IP port to that alias, and updating the DDF record in the respective members' bootstrap data sets with this information (done via DSNJU003, the change log inventory utility). Combine a location alias with the use of dynamic virtual IP addresses (DVIPAs) for DB2 members, and you'll have a high-availability configuration that is simple from the perspective of users and applications connecting to the CDSSRDEF = ANY DB2 subsystems (from a CONNECT TO point of view, there is no need to know the names of individual DB2 members associated with the location alias). All of this is described quite clearly in an excellent IBM "red book" titled DB2 9 for z/OS: Distributed Functions.

And there you have it. You can implement parallelism for dynamic queries in a limited fashion, and you have options when it comes to setting this up. Assess your situation, and go the route that makes sense for your organization.

1 comment: