Sunday, June 30, 2019

Db2 for z/OS - Talking About APPLCOMPAT (Part 1)

The Db2 for z/OS package bind option APPLCOMPAT is not new - it was introduced with Db2 11. Even so, I think that now is a good time to spend some time thinking about APPLCOMPAT, for two reasons: 1) I've encountered some misunderstandings pertaining to APPLCOMPAT, and I'd like to clear those up; and 2) Db2 12 has introduced some changes that relate to APPLCOMPAT, and folks should be aware of those changes.

In this first of a two-part blog entry, I'll cover several aspects of APPLCOMPAT. In the part 2 entry, which I'll post within the next 30 days or so, I'll focus on one particular issue, that being the APPLCOMPAT value used for IBM Data Server Driver / Db2 Connect packages in a Db2 12 for z/OS environment.


What does APPLCOMPAT do?

APPLCOMPAT has a twofold purpose. First, it enables the use of SQL functionality introduced with a Db2 version and/or (in a Db2 12 environment) function level. Consider, for example, the LISTAGG built-in function delivered with Db2 12 function level V12R1M501 (LISTAGG is pretty slick, making it easy to have, as a column in a query result set, a comma-separated list of values). Suppose that an application developer wants to use LISTAGG in a query issued by his program. If the program's package is bound with APPLCOMPAT(V12R1M500) - or any other value below V12R1M501 - then the query that includes LISTAGG will get a SQL error code at execution time (most likely a -4743, ATTEMPT TO USE NEW FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL). The developer will need to change the query so that LISTAGG is not invoked, or the program's package will have to be rebound with an APPLCOMPAT value that is equal to or greater than V12R1M501 (the APPLCOMPAT value in a Db2 12 system can be as high as the system's currently-activated function level).
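To make this concrete, here is a hedged illustration of the kind of query involved, using the classic EMP sample table (the table and column names are just stand-ins for whatever you'd use in your environment):

SELECT WORKDEPT,
       LISTAGG(LASTNAME, ', ') WITHIN GROUP (ORDER BY LASTNAME) AS DEPT_EMPLOYEES
  FROM EMP
  GROUP BY WORKDEPT;

Issued through a package bound with APPLCOMPAT(V12R1M500), a query like this would get the -4743 error code; with the package bound at V12R1M501 or higher (up to the activated function level), it would execute successfully.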

APPLCOMPAT's other purpose is to provide a shield that protects a program from what the Db2 documentation calls a "SQL incompatibility." I prefer the phrase "SQL behavioral change," but whichever wording you use, what you're talking about is a situation in which the same SQL statement, executed with the same data, yields a different result after activation of the functionality provided by a new Db2 version (or a new Db2 12 function level). Here's an example of a SQL incompatibility: while in a Db2 10 system a SQL statement can successfully cast an 8-byte store clock value AS TIMESTAMP, in a Db2 11 new-function mode system that same SQL statement, operating on the same 8-byte store clock value, will get a -180 SQL error code - this because in a Db2 11 NFM environment a store clock value is no longer a valid input for a CAST(... AS TIMESTAMP) operation. Suppose you have a program with a SQL statement that casts a store clock value as a timestamp, and suppose changing that SQL statement (to avoid the -180 error code in a Db2 11 NFM system) is not feasible for some reason. In that case, the program's package could be bound with APPLCOMPAT(V10R1), and as a result the SQL statement will get, in a Db2 11 NFM system (or a Db2 12 system), the same result that it would get in a Db2 10 system, because APPLCOMPAT(V10R1) indicates that behavior for the package's SQL statements is to be what it would be in a Db2 10 system, regardless of the fact that the package is executing in a Db2 11 NFM (or Db2 12) environment (SQL incompatibilities are always documented - the change related to casting a store clock value as a timestamp is one of the Db2 11-introduced incompatibilities).
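If a package needs to be tied to Db2 10 SQL behavior in this way, the rebind could look something like the following (the collection and package names here are hypothetical):

REBIND PACKAGE(COLL1.PROG1) APPLCOMPAT(V10R1)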


Where does a package's APPLCOMPAT value come from?

As already mentioned, APPLCOMPAT is a package bind option. It is also a ZPARM parameter. That ZPARM has one and only one purpose: it provides the default value of APPLCOMPAT, when needed, for a BIND PACKAGE or REBIND PACKAGE operation. When is the ZPARM-provided default needed? For BIND PACKAGE, it is needed when an APPLCOMPAT value is not specified in the command. For REBIND PACKAGE, it is needed when an APPLCOMPAT value is not specified in the command and "not already there" for the package being rebound. I'll explain what I mean by "not already there." When a package is rebound and an APPLCOMPAT value is not specified in the REBIND PACKAGE command, the package's existing APPLCOMPAT value, if any, will carry forward through the REBIND operation (so, if a package with an APPLCOMPAT value of V10R1 is rebound and an APPLCOMPAT value is not specified in the REBIND PACKAGE command, the rebound package will retain the V10R1 APPLCOMPAT value). "If any" implies that the package may not have an explicit APPLCOMPAT value prior to the rebind, and that could indeed be the case (this situation would be indicated by a blank value in the APPLCOMPAT column of the package's row in the SYSIBM.SYSPACKAGE catalog table). When that is so, the value used for the REBIND PACKAGE operation will be the value of the APPLCOMPAT parameter in ZPARM (as is the case when BIND PACKAGE is issued sans an APPLCOMPAT specification).
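If you want to see which packages lack an explicit APPLCOMPAT value (and would therefore pick up the ZPARM default at the next REBIND issued without an APPLCOMPAT specification), a simple catalog query along these lines can help - this is just a sketch, so adjust the predicates for your environment:

SELECT COLLID, NAME, APPLCOMPAT
  FROM SYSIBM.SYSPACKAGE
  WHERE APPLCOMPAT = ' '
  ORDER BY COLLID, NAME;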

"How about dynamic SQL?" you may ask. "Can't a dynamic SQL-issuing program change the in-effect application compatibility level via the SQL statement SET CURRENT APPLICATION COMPATIBILITY?" Yes, that can be done, but starting with a Db2 12 system for which function level V12R1M500 or higher has been activated, the value specified for the SET CURRENT APPLICATION COMPATIBILITY special register cannot be greater than the APPLCOMPAT value of the program's Db2 package. Any dynamic SQL-issuing program has an associated package, which could be an IBM Data Server Driver / Db2 Connect package. If the package used for a dynamic SQL-issuing program is bound with APPLCOMPAT(V12R1M500) then V12R1M500 will be the initial value of the CURRENT APPLICATION COMPATIBILITY special register when the program executes, and the program can set the value of that special register to something less than V12R1M500 but not to a value greater than V12R1M500.


An optimizer-related misconception

Sometimes people think that APPLCOMPAT does more than it actually does. Case in point: SQL statement optimization. Lots of folks know that it's a good idea to rebind all plans and packages in a Db2 for z/OS system soon after going to a new version of Db2 (one reason: the package code generated by Db2 version n - packages are executable code, you know - is likely to be more CPU-efficient than the code generated for the same package by version n-1). Some of those same people think, however, that to take advantage of new access path choices available with the version n optimizer, APPLCOMPAT(n) - where n would be a value associated with the new Db2 version - has to be specified for the rebind operations. Nope. If you rebind a package with APPLCOMPAT(V11R1) in a Db2 12 system, you will be able to get for that package the performance benefit of access path choices introduced with Db2 12 (assuming that you did not go with the APREUSE option, which tells Db2 to reuse the existing access paths for the package's SQL statements).

Remember: APPLCOMPAT can enable use of new Db2 functionality, and APPLCOMPAT can serve as a shield to insulate a program from a SQL incompatibility introduced by a new version (or new function level) of Db2. APPLCOMPAT does not tell the Db2 optimizer what to do.


Db2 12 and the DDL dimension

In a Db2 11 system, APPLCOMPAT affects SQL DML statements (e.g., SELECT, INSERT, UPDATE and DELETE), but not DDL statements (e.g., CREATE, ALTER). That being the case, in a Db2 11 new-function mode environment a program whose package is bound with APPLCOMPAT(V10R1) cannot reference a global variable in a SELECT statement (global variables were introduced with Db2 11), but that same program can create a global variable (because DDL is not affected by APPLCOMPAT in a Db2 11 system).

Starting with Db2 12, APPLCOMPAT affects both DML and DDL statements; so, in a Db2 12 system, a program whose package has an APPLCOMPAT value of V12R1M500 cannot issue an ALTER TABLE statement that includes a KEY LABEL specification (indicating that data in the table is to be encrypted using the key associated with the specified label), because KEY LABEL became an ALTER TABLE option starting with function level V12R1M502, and using that new DDL option would require that the package of the program issuing the ALTER TABLE statement be bound with an APPLCOMPAT value of V12R1M502 or greater.
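For illustration, the DDL in question would look something like this (the table and key label names are made up):

ALTER TABLE MYSCHEMA.CUST_DATA
  KEY LABEL CUSTKEYLBL;

With the issuing program's package bound at APPLCOMPAT(V12R1M500), a statement like this would be rejected; at V12R1M502 or higher (with that function level activated), it would be accepted.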

This DDL dimension of APPLCOMPAT in a Db2 12 system gets particularly interesting starting with function level V12R1M504. With that function level (or higher) activated, a program whose package is bound with APPLCOMPAT(V12R1M504) or higher cannot use static SQL to create a traditional segmented table space or a range-partitioned table space that is not universal partition-by-range (in other words, that program cannot use static SQL to create a table space that is not either a partition-by-growth or a partition-by-range universal table space). Why is this so? Because plenty of Db2-using organizations, cognizant of the advantages of universal table spaces, do not want non-universal table spaces to be even accidentally created. Note also that a program whose package has an APPLCOMPAT value of V12R1M504 or higher cannot use static SQL to create a hash-organized table or a synonym (in the latter case one would create an alias instead).

Why did I repeatedly emphasize the phrase "cannot use static SQL" in the paragraph above? See the paragraph below.

What if you have a Db2 12 system with function level V12R1M504 or higher activated, and you need to create a traditional segmented table space? You have a couple of options for getting that done. One option would be to use a dynamic SQL-issuing program (an example being the Db2-supplied program called DSNTEP2) whose package has an APPLCOMPAT value of V12R1M504 or higher, and have that program first issue the SQL statement SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503' and then issue the CREATE for the traditional segmented table space (recall that in a Db2 12 environment in which function level V12R1M500 or higher has been activated, a dynamic SQL-issuing program can set the in-effect application compatibility level to something below the APPLCOMPAT value of its package, but not to something above that APPLCOMPAT value). The other option: create the traditional segmented table space via a program whose package has an APPLCOMPAT value of V12R1M503 or lower.
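Here is a hedged sketch of that first option, as the statements might be issued through DSNTEP2 or SPUFI (the database, table space and buffer pool names are placeholders; it is the SEGSIZE specification without MAXPARTITIONS or NUMPARTS that makes this a traditional segmented table space):

SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';

CREATE TABLESPACE TS1 IN DB1
  SEGSIZE 64
  BUFFERPOOL BP0;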

That's all for now. Tune in in about a month to see part 2 of this blog entry and get some guidance on specifying APPLCOMPAT for the IBM Data Server Driver / Db2 Connect packages in a Db2 12 system.

Friday, May 31, 2019

Db2 for z/OS: IIPHONORPRIORITY and zIIP offload

There is something I have encountered a couple of times (most recently, just a month ago) that can have a negative performance impact on a Db2 for z/OS system, and it's something of which quite a few Db2 people are, I think, unaware. I'm referring here to the setting of a z/OS parameter called IIPHONORPRIORITY, and I want to use this blog entry to provide information on how the value of that parameter can affect the processing of a Db2 workload.

IIPHONORPRIORITY is a parameter in the IEAOPTxx member of the z/OS data set called SYS1.PARMLIB. The value of the parameter can be set to NO or YES. Essentially, those two settings have the following meanings:

  • YES - Honor the priority of zIIP-eligible tasks (per the z/OS LPAR's workload manager policy), so that in the event that such a task is ready for dispatch and the LPAR's zIIP engines are busy, allow the task to be dispatched to a general-purpose engine so that wait-for-dispatch time for the task will not be too high.
  • NO - Regardless of the priority of zIIP-eligible tasks, limit the processing of such tasks to zIIP engines. If a zIIP-eligible task is ready for dispatch and the LPAR's zIIP engines are busy, require that the task wait until a zIIP engine becomes available.

Now, at first consideration a setting of IIPHONORPRIORITY=NO might appear to be an attractive alternative. After all, the raison d'ĂȘtre for zIIP engines is reduced cost of computing on a z/OS platform. Why not maximize that cost benefit by forcing all zIIP-eligible work to be processed by zIIP engines? In fact, however, if Db2 for z/OS is in the picture then specifying IIPHONORPRIORITY=NO can lead to significant performance degradation for an application workload (especially a DDF workload) and can even reduce offloading of work to zIIP engines. I have seen these effects with my own eyes, and I will describe them in more detail below.

Negative impact on application performance. Just about a month ago, as part of a review of an organization's production Db2 for z/OS environment, I looked through a Db2 monitor-generated accounting long report for the subsystem of interest. Data in the report, showing activity for a busy hour of the day, was ordered by connection type (meaning that the data was aggregated by connection type used by Db2-accessing programs: DDF, CICS, call attachment facility, etc.). In the DDF section of the report (connection type: DRDA), I saw that not-accounted-for time was 65% of in-Db2 elapsed time for DDF-using applications. In-Db2 not-accounted-for time is time associated with SQL statement execution that is not CPU time and not "known" wait time (the latter term referring to Db2 accounting trace class 3 wait times such as wait for synchronous read, wait for lock, wait for latch, etc.). In my experience, in-Db2 not-accounted-for time is primarily reflective of wait-for-dispatch time, and for a transactional workload (like the typical DDF workload) it should be a relatively small percentage of average in-Db2 elapsed time - under 10% is good, between 10 and 20% is marginally acceptable. 65% is an extremely elevated level of not-accounted-for time as a percentage of in-Db2 elapsed time.
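For reference, in-Db2 not-accounted-for time is typically derived from accounting trace fields roughly as follows (your monitor may compute and label it a bit differently), and the percentage I cite is that value divided by average in-Db2 (class 2) elapsed time:

not-accounted-for time = (class 2 elapsed time) - (class 2 CPU time) - (class 3 suspend time)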

I was initially perplexed by the very high level of in-Db2 not-accounted-for time seen for the DDF workload. The level of utilization of the system's general-purpose and zIIP engines, as seen in a z/OS monitor-generated CPU activity report, was not high enough to make that a prime culprit. Later, seeing unusual numbers in a Db2 monitor-generated statistics long report (more on that to come) prompted me to ask the question, "Do you folks by chance have IIPHONORPRIORITY set to NO?" It turned out that that was indeed the case, and was very likely the root cause of the very high percentage of in-Db2 not-accounted-for time observed for DDF-using applications: zIIP-eligible tasks servicing requests from Db2 client programs were spending a considerable amount of time queued up waiting for a zIIP engine to become available, with redirection to a general-purpose engine having been removed as an option via the IIPHONORPRIORITY=NO specification. A much better approach would be to have IIPHONORPRIORITY set to YES, and to have zIIP engine capacity sufficient to keep "spill-over" of zIIP-eligible work to general-purpose engines at an acceptably low level (information on monitoring spill-over of zIIP-eligible work to general-purpose engines can be found in a blog entry I posted on that topic); and, keep in mind that running zIIPs in SMT2 mode can help to minimize the zIIP spill-over rate.

Allowing some zIIP-eligible work to be directed to general-purpose engines when an LPAR's zIIP engines are busy (which should not often be the case when the LPAR is configured with an adequate amount of zIIP capacity), made possible via IIPHONORPRIORITY=YES, is a safety valve that enables consistently good performance for a zIIP-heavy workload such as that associated with DDF-using applications.

Potential non-maximization of zIIP offload. You might wonder, "How could setting IIPHONORPRIORITY to NO possibly have a reductive effect on offloading of work to zIIP engines?" I'll tell you how: when IIPHONORPRIORITY=NO is in effect, "Db2 does not allow system tasks to become eligible for zIIP processing." I have that in quotes because the words are taken directly from a page in the Db2 for z/OS Knowledge Center on IBM's Web site (on that page, scroll down to the part under the heading, "IIPHONORPRIORITY parameter"). To understand what the quoted phrase means, consider that zIIP-eligible Db2 tasks can be broadly divided into two categories: user tasks and system tasks. zIIP-eligible user tasks (which can also be thought of as application tasks) include those under which SQL statements issued by DRDA requesters - and SQL statements issued by native SQL procedures called through DDF - execute. Those zIIP-eligible tasks must be processed by zIIP engines when IIPHONORPRIORITY is set to NO. zIIP-eligible system tasks include those under which operations such as prefetch reads and database writes execute, and those tasks cannot run on zIIP engines when IIPHONORPRIORITY is set to NO (because in that case Db2 makes those tasks non-zIIP-eligible).

So, think about it: if you have sufficient zIIP capacity to keep spill-over of zIIP-eligible work to general-purpose engines at a low level, which should be the case in a production Db2 for z/OS environment (and I consider "low level" to be less than 1%), and you have IIPHONORPRIORITY set to NO, you might actually be causing total zIIP offload of Db2-related work to be lower than it otherwise would be, because prefetch reads and database writes, which are 100% eligible when IIPHONORPRIORITY is set to YES, are 0% zIIP-eligible when IIPHONORPRIORITY=NO is in effect. That's no small thing - in some Db2 systems, there is a great deal of prefetch read activity. I mentioned previously that a telltale sign that IIPHONORPRIORITY is set to NO in a z/OS system in which Db2 runs can be found in a Db2 monitor-generated statistics long report. In such a report, scroll down to the section in which CPU consumption attributed to the various Db2 address spaces is shown. In that report section, check the CPU times for the Db2 database services address space. If IIPHONORPRIORITY is set to YES, it is highly likely that a majority - often, a very large majority - of that address space's CPU time is in the zIIP column (the column in the report labeled PREEMPT IIP SRB, or something similar to that) versus the general-purpose engine column (labeled CP CPU TIME, or something similar). This is a reflection of the fact that prefetch reads and database writes very often constitute most of the work done by the Db2 database services address space, in terms of CPU resources consumed. If, on the other hand, IIPHONORPRIORITY is set to NO, it is quite likely that a majority - and maybe a large majority - of the Db2 database services address space's CPU time will be in the general-purpose engine column versus the zIIP column.

Bottom line: if you want to actually maximize offloading of Db2-related work to zIIP engines, do two things: 1) ensure that IIPHONORPRIORITY is set to YES, and 2) have enough zIIP capacity to keep the zIIP spill-over rate (the percentage of zIIP-eligible work that ends up being executed on general-purpose engines) below 1%. Refer to the blog entry to which I provided a link, above, to see how to calculate the zIIP spill-over rate using numbers from a Db2 monitor-generated accounting long report.
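For quick reference, the spill-over calculation is conceptually along these lines - field labels vary by monitor, and in the accounting reports I use the zIIP-eligible time that ran on general-purpose engines is typically labeled something like SECP CPU:

zIIP spill-over rate = (zIIP-eligible CPU time consumed on general-purpose engines)
                       / (zIIP-eligible CPU time consumed on general-purpose engines + CPU time consumed on zIIP engines)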

Is there any situation in which going with IIPHONORPRIORITY=NO might be reasonable? I'd say, "Maybe," if you're talking about a test or development environment in which optimal DDF application performance is not very important. If the z/OS system in question is one in which a production Db2 subsystem runs, I'd be hard pressed to come up with a justification for IIPHONORPRIORITY=NO. For a production Db2 system, you want IIPHONORPRIORITY=YES. You might want to check on this at your site.

Friday, April 26, 2019

Db2 12 for z/OS Statistics Profiles: Just What the Optimizer Ordered

Here is an interesting story for you: not long ago, an analytics-type query was executed on a Db2 for z/OS system that had recently been migrated to Db2 12. The query was cancelled after it had run for 23 hours. A DBA noticed that there was a row for a table targeted by the query in SYSIBM.SYSTABLES_PROFILES in the Db2 catalog. He ran RUNSTATS for the table, specifying USE PROFILE so that the utility would pick up the options specified in the aforementioned statistics profile. The DBA observed that the RUNSTATS job "ran a little longer than usual," an indication that the utility was doing something it hadn't previously done in generating statistics for the target table. The formerly very-long-running query was subsequently re-submitted, and it successfully ran to completion in 5 minutes. Do I have your attention? Good - read on.

First, the dramatic improvement in query response time described above is not about magic. It's about statistics. Db2 for z/OS has a very sophisticated query optimizer (IBM invented cost-based SQL statement optimization more than 35 years ago), but to do its job well that optimizer needs the right statistics related to tables accessed by queries. How important are those statistics? This important: the team at the IBM Support Center that works on problems involving query performance has said for years that over 75% of these problems are resolved not by rewriting the query, not by creating or modifying an index, but by providing the Db2 optimizer with the statistics it needs to generate a well-performing access plan.

Knowing that, you might think, "OK, so how do I know what statistics the optimizer needs in order to generate the access plan that will deliver the best-possible performance for a query?" And there, traditionally, has been the rub: it can be tricky to get this right, especially for a complex query. If you have the right skills and experience, you can pore over a query access plan as rendered by a visual EXPLAIN tool, and make a good determination of the statistics, currently absent from the catalog, that would appear to be useful to the optimizer for the case at hand. Alternatively, you could "go big" on statistics-gathering for your tables, and instruct RUNSTATS (or LOAD or REORG when inline statistics-gathering is requested) to generate every possible item of statistical data for your tables, but that would substantially increase the CPU cost of utility execution. There's also the option of using a tool that has a statistics recommendation feature (an example of a product in that space is IBM's Db2 Query Workload Tuner for z/OS). Or - and this is new with Db2 12 - you can let Db2 take care of this.

Let me expand on what Db2 12 has introduced with regard to getting catalog statistics right. Db2 11 for z/OS introduced a new catalog table, called SYSIBM.SYSSTATFEEDBACK, into which Db2 places information about statistics seen by the optimizer, in the course of generating query access plans, as being absent or inconsistent (regarding the latter, consider that one can insert statistical information into catalog tables on one's own, perhaps causing an inconsistency - a value in one column that should reflect the value in another column, but doesn't). That's fine, but sometimes that statistics feedback data would just sit there - you could look at it and use it (if you correctly interpreted what you saw), or a tool could read and act on it, but Db2 itself wouldn't do anything with SYSSTATFEEDBACK data. That changed with Db2 12: in a Db2 12 system (assuming that the new STATFDBK_PROFILE parameter in ZPARM is set to the default value of YES), when data is written to SYSIBM.SYSSTATFEEDBACK, data is also written to SYSIBM.SYSTABLES_PROFILES - either a statistics profile row will be added for a table, or an existing statistics profile row will be updated. That new or updated statistics profile row provides the RUNSTATS specifications that will gather the statistics that the Db2 query optimizer itself has indicated that it needs to do what it does to best possible effect.

AND, when I say, "RUNSTATS specifications," don't think that a Db2 12-generated statistics profile is usable only with RUNSTATS. In a Db2 12 environment you can also specify USE PROFILE when executing LOAD or REORG - something you couldn't do in a Db2 11 system.
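By way of illustration, USE PROFILE shows up in utility control statements along these lines (the database, table space and table names are hypothetical, and the exact placement of the option is per the utility syntax for your Db2 12 level):

RUNSTATS TABLESPACE DB1.TS1 TABLE(MYSCHEMA.MYTABLE) USE PROFILE

REORG TABLESPACE DB1.TS1 STATISTICS TABLE(MYSCHEMA.MYTABLE) USE PROFILE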

What if you want to see what kind of statistics profiles Db2 12 is generating? That information is in the PROFILE_TEXT column of SYSIBM.SYSTABLES_PROFILES. The data type for the PROFILE_TEXT column is CLOB. You can view data in a CLOB column via a query issued through IBM Data Studio or IBM Data Server Manager (two no-charge, download-able tools); alternatively, you could view PROFILE_TEXT data by issuing a query through SPUFI, if you cast the CLOB column as VARCHAR. In the example below (a query I've executed using SPUFI), I've gone with VARCHAR(500) - you can specify whatever length you need in your environment.

SELECT CAST(SCHEMA AS CHAR(30)) AS SCHEMA,    
  CAST(TBNAME AS CHAR(30)) AS TBNAME,         
  CAST(PROFILE_TEXT AS VARCHAR(500)) AS PROFILE_OPTIONS
  FROM SYSIBM.SYSTABLES_PROFILES
  WHERE...

Given the enhancement described in this blog entry, I think it would be a very good idea to utilize the USE PROFILE option of RUNSTATS (and of LOAD and REORG, as appropriate) in a Db2 12 system. What if you do that when executing RUNSTATS (or LOAD or REORG) for a table, and there is no row for the table in SYSIBM.SYSTABLES_PROFILES? Not to worry: in that case, COLUMN(ALL) INDEX(ALL) will be used for the RUNSTATS (or LOAD or REORG) job (if a RUNSTATS or LOAD or REORG job is executed without a table being named, and if no row(s) for the table(s) in the target table space exist in SYSIBM.SYSTABLES_PROFILES, the job will be executed using options TABLE(ALL) and INDEX(ALL)). What if you have already populated SYSIBM.SYSTABLES_PROFILES via the use of SET PROFILE (or UPDATE PROFILE) in RUNSTATS jobs? Again, not a problem. Those user-developed statistics profiles can still be used in a Db2 12 environment. Keep in mind that Db2 12 will create or update a statistics profile row in SYSTABLES_PROFILES only when the Db2 optimizer determines that statistics needed to properly optimize a query are missing and/or inconsistent. If a statistics profile you generated yourself gathers the statistics that the optimizer needs, Db2 will not change that profile in the catalog.

For a long time, when confronted with a poorly-performing query, Db2 DBAs have slogged through query analysis and tuning, often a time-consuming task. If your Db2 for z/OS system is at Version 12 (and STATFDBK_PROFILE is set to YES in ZPARM), I'd recommend a different approach in the face of a query performance problem: first, see if a row (or rows) for the table (or tables) accessed by the query exist in SYSIBM.SYSTABLES_PROFILES. If there is such a row (or rows) in that catalog table, execute RUNSTATS for the table(s) with a USE PROFILE specification. Then re-submit the query. If performance is still not what it needs to be, engage in the usual query analysis process. If RUNSTATS with USE PROFILE does the trick for you, you've saved yourself a lot of time and effort.

Oh, and one more thing: when the query I referenced at the beginning of this blog entry was initially resubmitted following execution of RUNSTATS with USE PROFILE for the target table, its performance did not improve at all. Why? Because the previously-prepared form of the query was still in the Db2 subsystem's dynamic statement cache. The DBA I mentioned ran RUNSTATS with INVALIDATECACHE YES to invalidate that prepared form of the statement, and when the query was subsequently submitted it was reoptimized using the statistics generated through the Db2-generated profile, and that's when runtime went from 23-hours-then-cancel to 5 minutes.
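If you just want to invalidate cached statements without re-gathering statistics, a RUNSTATS control statement along these lines should do it (object names are again placeholders):

RUNSTATS TABLESPACE DB1.TS1 TABLE(MYSCHEMA.MYTABLE)
  UPDATE NONE REPORT NO INVALIDATECACHE YES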

Gather catalog statistics using profiles that Db2 generates based on optimizer feedback, and you, too, could realize some nice performance results.

Sunday, March 31, 2019

A Case Study: Implementing a Db2 for z/OS Implicit Trusted Connection

I've been interested in the role and trusted context features of Db2 for z/OS for a long time (I posted a two-part blog entry on the topic back in 2011). Theory is nice, but what about practice? Recently, I had an opportunity to assist a DBA in implementing implicit trusted connection functionality in his Db2 for z/OS environment (more on the implicit part in a moment). Today I'll share some of what the DBA and I learned through that endeavor.

First, the backstory: the DBA's organization wanted to tighten Db2 for z/OS data security by preventing unauthorized utilization of the authentication credentials used by an application to connect to a Db2 system. The kind of application we're talking about here is one that accesses a Db2 for z/OS server through a TCP/IP link via the IBM Data Server Driver (or its predecessor, Db2 Connect). In requesting a connection to a Db2 system, the application provides an authorization ID and an associated password (assuming authentication using an ID and a password, versus an ID and a certificate). Often, the ID and password in question are known by several of the application's developers. What is to keep someone from using the application's authorization credentials to connect to the Db2 system from, say, a laptop PC, thereby gaining access to data that should be accessible only through the application?

Enter roles and trusted contexts. These Db2 features work together to shut down the potential security exposure just described. How does this work? First of all, any Db2 privileges needed by the application (e.g., the SELECT privilege on table T1) are NOT granted to the application's Db2 authorization ID. Those privileges are instead granted to a Db2 role, which I'll call APP1ROLE. Next, a Db2 trusted context is created to specify the circumstances under which the Db2 privileges granted to the role can be used by an application. If the trusted context is named TRCON1, and if the application authenticates to the Db2 system using authorization ID APP1ID and the application runs on the two application servers at IP addresses 1.2.3.4 and 1.2.3.5, the Db2 DDL statement defining the trusted context might look something like this:

CREATE TRUSTED CONTEXT TRCON1
BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
DEFAULT ROLE APP1ROLE
ATTRIBUTES (ADDRESS '1.2.3.4', ADDRESS '1.2.3.5')
ENABLE;

Here's what the above DDL statement is essentially saying: "The Db2 privileges granted to role APP1ROLE can be used ONLY by the application that connects to Db2 using authorization ID APP1ID, and ONLY when that application connects to this Db2 system from either IP address 1.2.3.4 or IP address 1.2.3.5." Following creation of the role and trusted context, suppose that someone who knows the application's authentication credentials uses them to connect to the Db2 system from a laptop PC. Assuming a successful connection from the laptop PC, is that an "Uh-oh" situation? No. Why not? Because no Db2 privileges have been granted to the application's authorization ID (connecting to a Db2 for z/OS system doesn't get you anywhere if you can't do anything after connecting). But what about the privileges granted to role APP1ROLE, which we've associated with the application's authorization ID? Those privileges are not available to the person who used the application's credentials to connect to the Db2 system from a laptop PC because the laptop's IP address is not one of the two specified in the definition of the trusted context (those being the addresses of the two application servers on which the application runs). What has been accomplished through the creation of the role and trusted context is a vast reduction in what security people sometimes call the "threat area": instead of worrying about the application's credentials being misused from any IP address, we just need to secure the two IP addresses referenced in the trusted context definition (in other words, we need to ensure that an individual cannot make an unauthorized connection to the Db2 system from the two application servers on which the application of interest runs).

Back to my DBA friend. His first attempt at successful use of a Db2 role and trusted context didn't work: the SELECT privilege on table T1 was granted to a role but not to authorization ID APP1ID (using the example ID previously referenced), and an attempt to execute SELECT COL1 FROM T1 by way of a trusted context, using ID APP1ID, failed with a Db2 "ID not authorized to perform this action" error code. We decided to check to make sure that the IP address used in defining the trusted context was the right one. To help with that verification, we 1) turned auto-commit off on the tool being used by the developer who was helping us with the test, and 2) granted SELECT ON T1 to APP1ID. Why did we do these two things? So that we could see the IP address of the developer's Db2 thread in Db2 -DISPLAY THREAD command output (granting SELECT ON T1 to APP1ID ensured that the thread information wouldn't disappear due to the aforementioned authorization error, and making sure that a commit wasn't issued after the successful SELECT similarly preserved the thread's information, which would have disappeared when the thread went into a disconnected state following a commit).

After we'd taken those two actions to keep the developer's thread in a connected state following the SELECT FROM T1, we had the developer try again. Sure enough, we saw information about the developer's still-connected Db2 thread in -DISPLAY THREAD(*) DETAIL output, but the client IP address information was missing. That information was absent because -DISPLAY THREAD(*) DETAIL was an overly-generic form of the command -DISPLAY THREAD. When we switched to the form -DISPLAY THREAD(*) LOCATION(*) DETAIL, which tells Db2 that you only want to see information about DDF threads, we got additional DDF-related information in the command output, including the client IP address associated with the developer's transaction (that address is in the V445 section of the command's output). The client address we saw was the one used in the definition of the trusted context, so why didn't the trusted context work for us? Because in this case the developer's connection to the Db2 system was through a Db2 Connect "gateway" server (versus a direct connection from the client via the IBM Data Server Driver), and in that case the client IP address that Db2 "sees" is the address of the Db2 Connect gateway server, not the address of the client that is "upstream" from the Db2 Connect gateway server.

OK, so we addressed that problem by re-defining the trusted context, this time using the IP address of the Db2 Connect gateway server. Well, not exactly. We actually re-defined the trusted context by specifying the domain name of the Db2 Connect gateway server, instead of its IP address. That can certainly be done in a CREATE (or ALTER) TRUSTED CONTEXT statement, and such a specification can be useful when (as was the case for my DBA friend's organization) the IP address of the Db2 Connect gateway server used by an application can change from time to time (if the application server references the Db2 Connect gateway server's domain name, a change in the gateway server's IP address does not require a client-side change - the domain name is simply mapped to the new gateway server's IP address versus the address of the old gateway server). We tried again with the redefined trusted context AND IT STILL DIDN'T WORK. ARRGGGHHH. What was going on? Further investigation revealed the problem: the domain name we were given by a network administration colleague of the DBA was actually associated with a pair of IP addresses (corresponding to a pair of Db2 Connect gateway servers), and for a domain name specification in the ATTRIBUTES part of a CREATE (or ALTER) TRUSTED CONTEXT statement to work properly, it has to be associated with a single IP address.

Once again, the DBA re-defined the trusted context, this time using two domain names as address attributes, with each domain name being specific to the one IP address of one of the two Db2 Connect gateway servers in a cluster. He revoked SELECT ON T1 from the APP1ID authorization ID, and had the developer try again with a SELECT from T1 using APP1ID, AND IT WORKED!! The successful execution of the SELECT from T1 showed that the privilege granted to the role APP1ROLE was being used, and indeed we saw in the output of a -DISPLAY THREAD(*) LOCATION(*) DETAIL command that the developer's thread was using the established trusted context (that information is in the V485 section of the command's output).
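The final, working definition would have looked something like this (the domain names below are stand-ins for the organization's actual Db2 Connect gateway server names):

CREATE TRUSTED CONTEXT TRCON1
  BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
  DEFAULT ROLE APP1ROLE
  ATTRIBUTES (ADDRESS 'db2cgw1.example.com', ADDRESS 'db2cgw2.example.com')
  ENABLE;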

Now, about the "implicit" nature of the trusted connection, referenced way up at the top of this blog entry (and in the entry's title): I wanted to make sure to emphasize the implicit nature of the trusted connection used in this example, because so much of the documentation on Db2 for z/OS trusted contexts pertains to explicit trusted connections. What's the difference? Well, in the latter case a client application explicitly requests a trusted connection to the Db2 server, whereas in the former case the connection is just a "regular" connection from the client application's perspective, and it becomes a trusted connection because a trusted context matching the application's authorization ID and IP address (or domain name or whatever other attribute is specified) has been defined on the Db2 side. If you can get a trusted connection to a Db2 server (one that enables use of the Db2 privileges granted to a role associated with the trusted connection) without having to explicitly request such a connection, why would you go the explicit route? You'd go that route if you needed functionality that is ONLY available for explicit trusted connections. An example of that functionality: an application, having requested and obtained an explicit trusted connection to a Db2 for z/OS server, can then tell Db2 that it is "switching" use of the explicit trusted connection from its ID (the ID the application used to connect to the Db2 server) to the ID of an end user of the application. That switching from one ID to another (and the associated flowing by the application of an actual end-user's ID to the Db2 server system) is one of the things that has to happen if you want to make use of (for example) the enterprise identity mapping capability provided by RACF (whereby an individual's "network" ID can be mapped to a RACF ID). Note that a trusted context defined for the purpose of supporting explicit trusted connection functionality would likely include the WITH USE FOR clause in the CREATE TRUSTED CONTEXT statement - that clause can be used to reference end-user IDs passed to Db2 by an application. In the situation I've described in this blog entry, the DBA did not require explicit trusted connection functionality, so setting up implicit trusted connection capability was appropriate.
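For contrast, a trusted context intended to support explicit trusted connections with ID switching might be defined along these lines (the end-user ID and address here are purely illustrative):

CREATE TRUSTED CONTEXT TRCON2
  BASED UPON CONNECTION USING SYSTEM AUTHID APP1ID
  DEFAULT ROLE APP1ROLE
  ATTRIBUTES (ADDRESS '1.2.3.4')
  ENABLE
  WITH USE FOR JSMITH WITHOUT AUTHENTICATION;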

I hope that the information I've provided here will be useful to you if you want to utilize implicit trusted connection functionality in your Db2 for z/OS environment.

Monday, February 25, 2019

Db2 for z/OS: Isolating REST Client Activity in Db2 Monitor Accounting Reports

In an entry posted to this blog some years ago, I described the form of a Db2 monitor-generated accounting report that I most like to use to get an overall view of application activity in a Db2 for z/OS environment: an accounting long report with data ordered by connection type (here, I am using terminology associated with the Db2 monitor with which I'm most familiar: IBM's Tivoli OMEGAMON XE for Db2 Performance Expert on z/OS). I like to see data in an accounting report aggregated at the connection type level because within such a report there is a sub-report with detailed accounting information for each connection type through which the target Db2 subsystem was accessed: a sub-report showing all CICS-related work for the subsystem, one showing all TSO attach-related work, one showing all DDF-related work, etc.

Now, the thing about the DDF-related activity in this form of an accounting report is that it is presently - thanks to a recent and important Db2 for z/OS enhancement - kind of mis-labeled. The DDF-related work in an accounting long report with data ordered by connection type is distinguished by the label CONNTYPE: DRDA. How is that a bit misleading? Well, the native REST interface to Db2 for z/OS (the enhancement to which I referred previously - designed into Db2 12 and retrofitted to Db2 11) is an extension of the Db2 distributed data facility (DDF's long name). What that means - as pointed out in a blog entry I posted a few months ago - is that there are now two paths into DDF: the DRDA path (which application developers might think of as the SQL path) and the REST path; thus, CONNTYPE: DRDA is a little off because what you see in that part of an order-by-connection-type Db2 monitor accounting report is all DDF-related activity on the Db2 subsystem of interest - activity associated with DRDA requesters and activity associated with REST clients.

What should the designator of DDF-related activity in an order-by-connection-type Db2 monitor accounting report be? CONNTYPE: DRDA+REST? CONNTYPE: DDF? Or, should there be a CONNTYPE: DRDA for work that is strictly related to actual DRDA requesters and a CONNTYPE: REST for activity related to REST clients?

I'll let the Db2 monitor developers figure that out. Meanwhile, if you have a Db2 subsystem on which there is a good bit of activity related to both DRDA requesters and REST clients, and you want to see, in one Db2 monitor accounting report, just the REST-related activity, how can you do that? If all the REST clients authenticated to the Db2 subsystem using only two distinct authorization IDs - for example, ABCID and XYZID - then you could get your REST-only accounting report by telling the Db2 monitor to include, in the accounting report, only activity related to those primary authorization IDs. The way you'd convey that request to OMEGAMON for Db2 would be through this specification in the report control statement:

INCLUDE (PRIMAUTH(ABCID XYZID))

Obviously, that approach becomes unwieldy if there are, say, a hundred or more IDs used by Db2 REST clients.

Is there a better way? Yes, there is. It so happens that a Db2 accounting trace record associated with a REST client interaction will include, in the correlation ID field (the field name is QWHCCV), the value 'DB2_REST'. Thus, one can easily get a Db2 monitor accounting report showing all REST-related activity for a subsystem, and only that activity, by informing the monitor that the report should include data only for DDF-related requests for which the correlation ID value is DB2_REST. Again using OMEGAMON for Db2 report command syntax (which is what I know), that would be done with this specification:

INCLUDE(CONNTYPE(DRDA)
        CORRNAME(DB2_REST))

And there you have it. I hope that this information will be useful to you, should your site start seeing a significant amount of Db2 for z/OS access via the REST interface (and I think that a substantial number of sites will see just that in the near future - the REST interface has generated a lot of interest among application developers).

Friday, January 4, 2019

A Case Study: Measuring the Impact of Db2 for z/OS Buffer Pool Changes

Not long ago, a Db2 for z/OS administrator contacted me with a request. His team had made some changes to the buffer pool configuration in their production Db2 environment (the overall size of the buffer pool configuration was increased to reduce read I/O activity, and greater use was made of large real storage page frames), and they were not sure as to how the performance impact of these changes could be measured. Could I assist with that analysis?

Sure, I said. I asked the Db2 admin to use his Db2 performance monitor tool to generate detailed accounting reports (depending on the monitor used, such a report might be titled ACCOUNTING REPORT - LONG or ACCOUNTING SUMMARY - LONG) capturing activity during periods preceding and following the buffer pool changes - same FROM and TO times, same day of the week for both reports (to make the comparison as "apples to apples" as possible). I also requested that the reports be generated so as to have data aggregated at the connection type level (again, terminology can vary by monitor product - your instruction to the monitor might be to "order" the data by connection type, or to "group" the data by connection type). Why do I like for data in a detailed accounting report to be aggregated in this way? Well, I find the default aggregation specification for Db2 monitor-generated accounting reports - authorization ID within plan name - to be too fine-grained for effective analysis of a Db2 system in an overall sense; conversely, aggregation at the subsystem ID level is generally too coarse-grained for my liking. When data in a detailed accounting report is aggregated at the connection type level, you get within the larger report a sub-report for each connection type used in the Db2 for z/OS system of interest; for example: a sub-report showing all CICS-Db2 work in the system, one showing all batch work that uses the call attachment facility, one showing all batch work using the TSO attachment facility, one showing all DDF-related work (connection type: DRDA), etc. With such a report in hand (or on your screen), you can quickly and easily identify the largest Db2 workload components (with "component" here referring to work associated with a given connection type) and focus your analysis accordingly.
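For the monitor I know best (OMEGAMON), the report request would be something along these lines - treat this as a sketch, since the exact command syntax varies by monitor product and version:

ACCOUNTING
  REPORT
    LAYOUT(LONG)
    ORDER(CONNTYPE)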

I showed the Db2 admin and one of his colleagues how to put the aforementioned "before" and "after" Db2 monitor accounting reports to use in measuring the impact of the effected buffer pool configuration changes, and I'll provide that knowledge transfer here in the remainder of this blog entry.

First, I look over the sub-reports (by connection type) within the larger "before" and "after" detailed accounting reports to find the workload components (connection types) that dominate in terms of size. "Size" refers in this case to aggregate in-Db2 CPU cost of SQL statement execution, and that's easy to determine: for a given workload component (a given connection type), first find the average in-Db2 (i.e., class 2) CPU time (done by summing two values: average class 2 CP CPU time, which is general-purpose CPU time, and average class 2 SE CPU time, which is "specialty engine," or zIIP, CPU time). Next, take that average class 2 CPU time for the connection type, and multiply it by the number of occurrences (typically, number of accounting records), because "average" is "average per occurrence." Here is an example of what you might see in one of the sub-reports (this for a DRDA workload), with the numbers I've referenced shown below (the average class 2 CP and SE CPU times, and the number of occurrences):

CONNTYPE: DRDA

AVERAGE      DB2 (CL.2)                   HIGHLIGHTS
-----------  ----------                   -----------------------
                                          #OCCURRENCES  : 3620821
CP CPU TIME    0.000169
SE CPU TIME    0.000223

So in this case, aggregate CPU cost of SQL statement execution (in-Db2, or class 2, accounting time is time associated with SQL statement execution) for the DRDA workload component is (0.000169 + 0.000223) * 3620821 = 1419 CPU seconds.

Performing this simple calculation showed that for the overall Db2 workload I was analyzing, the two biggest components by far were the DRDA and CICS-Db2 connection types; so, I didn't bother with the other much-smaller components (call attachment facility, TSO attachment facility, etc.).

For the two major workload components (DRDA and CICS-Db2), I compared "before" and "after" figures for average class 2 CPU time. That is the key metric for evaluating the impact of a Db2 system or application performance tuning action. Too often, Db2 for z/OS people effect a change aimed at improving workload performance and look, for evidence of improvement, at the total class 2 CPU time for the workload of interest (and by the way, I focus on class 2 CPU time because that is what a Db2 person can influence - class 1 CPU time includes time outside of Db2, over which a Db2 person may not have much control). "Before" and "after" figures for total class 2 CPU time for a workload may show little or nothing in the way of difference, leading a Db2 person to think that a performance tuning action accomplished nothing. In fact, it may be that the tuning change was indeed successful. How could that be, if total in-Db2 CPU time did not decrease? Simple: look for a change in throughput (the number of commits is a pretty good indicator of "pieces of Db2 work" completed on a system). A Db2 tuning change might reduce average in-Db2 CPU time per transaction (or, more broadly, per unit of work), and that might result in greater throughput (units of work completed in a given time period), and that (more transactions completed, at less in-Db2 CPU time per transaction) might result in a non-decrease in the total in-Db2 CPU time for the workload. In my experience, greater throughput with no increase in total in-Db2 CPU time for a workload is a GOOD thing. If you see increased throughput following a Db2 performance tuning change and your organization actually wants to keep throughput at the lower "before" level, that can be accomplished by limiting the availability of Db2 threads for the workload in question - essentially, you keep a lid on throughput by introducing some queuing for Db2 threads.

OK, so your focus as a Db2 person should be on reducing average class 2 CPU time for a workload, as indicated in a Db2 monitor-generated accounting report (or an online monitor display of thread activity on the system - but I prefer to look at accounting reports); and, in tracking this, make sure that you look at "average total" class 2 CPU time, meaning the sum of average class 2 general-purpose CPU time (usually labeled CP CPU time) and average class 2 zIIP CPU time (usually labeled SE CPU time) - always check the average class 2 zIIP CPU time, even for non-DDF-related applications that you might not think of as drivers of zIIP utilization, because that zIIP time figure could be non-zero (for example, if some queries in a batch job are parallelized by Db2, some of the batch job's class 2 CPU time will be consumed on zIIP engines because the "pieces" of queries parallelized by Db2 are zIIP-eligible). As indicated by the average-class-2-CPU-time yardstick, how did the Db2 team with which I worked do, with regard to improving application workload performance?

They did quite well:

                      Avg cl 2 CPU (GP + zIIP) - seconds
Workload component      Before     After        Change
------------------    --------  --------  ------------
CICS-Db2              0.001322  0.001245   6% decrease
DRDA                  0.000392  0.000259  34% decrease

That 6% improvement in average class 2 CPU time for CICS-Db2 transactions is close to what I am accustomed to seeing as a result of buffer pool actions such as those taken by the Db2 team in the case of which I'm writing: larger buffer pools, and greater use of page-fixed pools backed by large real storage page frames, tend to shave a few percentage points off of average class 2 CPU times reported by a Db2 monitor. A modest improvement of that nature is generally welcome, as it requires no application code changes and positively impacts a large number of programs (it's the kind of change that I put in the "rising tide lifts all boats" category).

What about the 34% improvement in class 2 CPU efficiency seen for the DRDA workload as a result of the buffer pool changes? Why such a large improvement? Here's my theory: as is true in many cases, the SQL statements associated with the DRDA workload at this site are overwhelmingly dynamic in nature (that is commonly seen when DDF transactions do not make heavy use of static SQL-issuing stored procedures). Dynamic SQL statements can be automatically re-optimized on a regular basis, particularly when a "bounce" (stop and restart) of a Db2 subsystem requires total re-population of the Db2 dynamic statement cache in memory (where the prepared-for-execution form of dynamic SQL statements is stored for re-use). The Db2 subsystem on which this blog entry is focused was indeed bounced on the weekend during which the buffer pool changes were put into effect, so ALL of the dynamic SQL statements coming into the system following that Db2 stop/restart were optimized at first execution. And here's the thing: it's not just indexes and catalog statistics that influence SQL statement access path selection. Db2 memory resources - particularly buffer pool resources - are another factor. I have a feeling that when the dynamic SQL statements (especially queries) associated with the DRDA workload were re-optimized following the significant changes made to the Db2 subsystem's buffer pool configuration, some new and better-performing access plans were generated. Further evidence of positive access path changes: average GETPAGEs per transaction for the DDF workload decreased by 24% following the buffer pool changes. GETPAGE activity is a major determinant of the CPU cost of SQL statement execution, and a substantial decrease in such activity is often indicative of access path changes.

This kind of begs the question: would a rebind of the packages associated with the CICS-Db2 workload (a workload for which, at this site and many other sites, almost 100% of the SQL statements are static) following the buffer pool changes have led to per-transaction CPU efficiency gains going beyond the 6% improvement that we saw? Quite possibly. That's something to think about. If you do decide to rebind static SQL packages following some significant buffer pool changes, keep a couple of things in mind. First, rebinding with APCOMPARE(WARN) will cause Db2 to let you know (via message DSNT285I) when access paths change for one or more statements associated with a package (information about access path changes is provided in the REMARKS column of the PLAN_TABLE when a package is rebound with EXPLAIN(YES)). Second, if access paths change and package performance gets worse as a result (not likely, but possible), you can quickly and easily put the previous instance of the package back into use through a REBIND with SWITCH(PREVIOUS), assuming that plan management is in effect for your Db2 subsystem (and it is "on" by default - check the value of the PLANMGMT parameter in the ZPARM module). Together, these Db2 features are aimed at "taking the fear out of rebinding," as a former colleague of mine so aptly put it.
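In practice, those rebinds might look something like the following (the collection and package names are hypothetical):

REBIND PACKAGE(COLL1.PROG1) APCOMPARE(WARN) EXPLAIN(YES)

REBIND PACKAGE(COLL1.PROG1) SWITCH(PREVIOUS)

The first form gives you the access path change information mentioned above; the second puts the previous instance of the package back into use if the new access paths turn out to be a problem.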

So, there you go. If you make Db2 changes aimed at boosting performance for a workload, measure the result using the average class 2 CPU time figures provided in Db2 monitor-generated accounting detail reports that show activity from "before" and "after" time periods. Buffer pool changes (larger pools, greater use of page-fixed pools backed by large real storage page frames) usually reduce average class 2 CPU time by a few percentage points. Greater performance gains could be seen if the buffer pool changes enable Db2 to choose new and better-performing access paths for SQL statements. Those statement re-optimization actions tend to be automatic for dynamic SQL statements (especially if a Db2 subsystem's dynamic statement cache has to be re-populated). For static SQL, package rebind is needed for statement re-optimization. Rebinding with APCOMPARE(WARN) and EXPLAIN(YES) will provide information about access path changes, and in the unlikely event of a negative performance result following re-optimization, rebinding with SWITCH(PREVIOUS) puts the previous instance of the package back into use.

Friday, December 28, 2018

Db2 for z/OS Partitioned Table Spaces: How the Number of Partitions can Impact Performance

This past summer, I delivered a couple of presentations at a meeting of a regional Db2 users group in the USA. During the lunch break, I shared a table with a member of the Db2 for z/OS support team at a large financial institution. He told me an interesting story. At his site, the Db2 for z/OS team had decided that universal partition-by-range (PBR) would be a better table space type for a large table currently housed in a partition-by-growth table space (PBG). To prove out this assessment, the team created a copy of the table of interest in a PBR table space, and ran some batch jobs that targeted the two variants of the table - the same jobs (some that changed data, others that only read data), accessing data that was identical in a logical sense (i.e., same data records in the tables in the two different table spaces). The performance characteristics of the PBG-accessing and the PBR-accessing jobs were compared.

The findings? The PBG-accessing jobs showed consistently better elapsed and CPU times versus the PBR-accessing jobs. That was not a result I would have expected. I've blogged before about reasons for favoring PBR over PBG for large Db2 tables, and some of those reasons are performance-related. Would I have been surprised by comparative results showing roughly equivalent performance for PBG-accessing and PBR-accessing jobs? Maybe not. But better performance for the PBG-accessing jobs? That had me scratching my head.

I asked the Db2 person at the user group meeting to send me Db2 performance monitor accounting "long" reports capturing activity for the PBR-accessing and PBG-accessing jobs, and he did that not long after getting back to his workplace. Analysis was focused on the package-level accounting data for a particular package that accessed only the table with the PBR and PBG variants (other packages associated with the batch jobs used for comparison accessed the PBR-and-PBG-variant table and several other related tables as well). Sure enough, for the "this table only" package the CPU time was about 6% higher for the PBR-housed table versus the PBG-housed table. Not a huge difference, but statistically significant and, as previously mentioned, unexpectedly favoring the PBG table space.

I started asking questions, looking for a possible explanation for the better performance result seen for the PBG-accessing package:

  • Was the SEGSIZE specification the same for both table spaces (a larger SEGSIZE value can result in better CPU efficiency for large-scale page scans)? Yes, same SEGSIZE for both table spaces.
  • Was the PBR table space in fact universal, as opposed to being a non-universal table-controlled partitioned table space (I wanted to make sure that we were comparing universal to universal)? Yes, the PBR table space was indeed of the universal variety.
  • Were the PBG-housed and PBR-housed tables clustered by the same key (clustering affects locality of reference for set-level query and update and delete operations)? Yes, both variants of the table had the same clustering key.
  • Were there indexes defined on the same keys of both tables (obviously, indexes have a lot to do with query access paths, and they affect the CPU cost of insert and delete and some update operations)? For both the PBG-housed and the PBR-housed table, there was only one index, and it was defined on the same key.
  • Were the PCTFREE and FREEPAGE specifications the same for the one index on the two table variants (a significant difference here could affect index GETPAGE counts)? Yes, for the one index defined on the same key of both table variants, the PCTFREE and FREEPAGE values were identical.
  • Was the one index on the PBR-housed table a partitioned index or a non-partitioned index (NPI)? It was a partitioned index, and by the way the PBR table space had 318 partitions (me, on getting that piece of information: "Hmmmm").

OK, so what prompted my "Hmmmm" on learning the number of the PBR table space's partitions? That number got me thinking, "318 is a pretty substantial number of data sets. Depending on the number of data sets associated with the PBG table space, that could be an important differentiating factor between the two." Indeed, it turned out that the PBG table space had far fewer data sets versus its PBR cousin. How could that influence CPU time for an application process? In this case, a clue came from a number in the aforementioned Db2 monitor accounting reports - the commit count for the batch process that accessed the table of interest: 204,004 over the 4-hour, 50-minute reporting interval (a little over 700 per minute). What does that have to do with the job's CPU time? Well, when the RELEASE specification for a package is COMMIT (the default), any and all "parent" locks acquired in executing the package will be released every time the application process driving the package's execution issues a commit (the package itself is also released from the application process's thread at each commit). We tend to think of "parent" locks as being table space-level locks, but in fact when a table space is partitioned the parent locks are at the partition level. If the application process's SQL statements are such that a large number of partitions are accessed in each unit of work, that means a lot of partition-level locks are acquired within each unit of work and released at commit time. That parent lock acquisition and release cost can be higher still in a data sharing environment (and such was the case in the situation I'm describing), because a significant percentage of global locks tend to be of the parent type.
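By the way, a quick way to see how many partitions a table space has - and, therefore, how many partition-level parent locks could potentially be in play within a unit of work - is a simple catalog query; SYSIBM.SYSTABLEPART has one row per partition. A sketch, with hypothetical database and table space names:

SELECT COUNT(*) AS PARTITION_COUNT
FROM SYSIBM.SYSTABLEPART
WHERE DBNAME = 'MYDB'
AND   TSNAME = 'MYPBRTS';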

I relayed to the Db2 support person my thinking about the impact that the number of PBR versus PBG partitions might have on the application process's CPU time. He got together with the larger Db2 team at his site, and they decided to repartition the PBR table space in a way that dropped the number of partitions from 318 to 58, and voila - the application process's CPU time dropped to a level that was several percentage points below that seen for the PBG table space. Success!

Now, it's important to note that in this particular situation, a re-partitioning of the PBR table space was not a problematic change. But suppose you have a situation like the one I've described here, and re-partitioning your PBR table space is not a viable option. In that case, the in-Db2 CPU cost of an application process (likely a batch job) that accesses a large number of a table space's partitions within each unit of work could be reduced (especially if the job issues a large number of commits, resulting in a large number of units of work) via a rebind of the associated Db2 package(s) with RELEASE(DEALLOCATE). That package bind option would cause the partition-level locks acquired as the batch job progresses to be retained until thread deallocation time (i.e., until end-of-job). That, in turn, would eliminate the overhead of releasing many partition-level locks at each commit (locks that would likely be re-acquired within the next unit of work), thereby reducing CPU time (I posted an entry on RELEASE(DEALLOCATE) considerations to this blog a few years ago - the part under the heading "Batch" is most relevant to the blog entry you're reading now). [Note: table space- and partition-level locks are almost always of the "intent" variety. Such locks are non-exclusive in nature, so retaining them across commits should not be a concern from a concurrency perspective.]
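Here's a sketch of that rebind action (the collection and package names are, again, made up for illustration). If you want to first check how a package is currently bound, the RELEASE column of SYSIBM.SYSPACKAGE shows 'C' for COMMIT and 'D' for DEALLOCATE:

SELECT COLLID, NAME, RELEASE
FROM SYSIBM.SYSPACKAGE
WHERE COLLID = 'BATCHCOL' AND NAME = 'BATCHPKG';

REBIND PACKAGE(BATCHCOL.BATCHPKG) RELEASE(DEALLOCATE)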

The bottom-line message here is NOT that you should always go for fewer rather than more partitions for a partitioned table space; rather, it is that the number of a table space's partitions can be a factor that affects CPU efficiency for some applications. A type of application for which this can matter is one that a) uses RELEASE(COMMIT) packages, b) is characterized by many units of work in the life of a thread, and c) accesses a large percentage of a table space's partitions in a typical unit of work. For such an application, CPU efficiency could potentially be improved by going with a smaller rather than a larger number of partitions for the target table space. When significant partition reduction is not an option for a table space, CPU time for an application process such as the one described in this blog entry could potentially be reduced by rebinding the application's Db2 packages with the RELEASE(DEALLOCATE) option.