Tuesday, September 30, 2014

DB2 for z/OS: DDF, Stored Procedures, and SET CURRENT PACKAGESET

Recently, I was talking with people from a company's IT department about modernizing their mainframe application architecture. Among other things, we discussed increasing the company's use of stored procedures for DB2 for z/OS data access. That prompted some questions around the use of the DB2 SQL statement SET CURRENT PACKAGESET in stored procedures called by DRDA requesters (these being calls that would get to DB2 via the distributed data facility, or DDF). Why did this issue come up? Because the organization's DB2 database had been divided into several schemas (sets of tables) along customer lines, with information pertaining to one set of customers being stored in tables in schema A (e.g., in a table called A.ORDERS), data for another customer set going into schema B tables (such as B.ORDERS), data for yet another customer set going into schema C tables, and so on.

This is not an unusual approach in environments characterized by very large amounts of data that can be relatively easily separated according to some identifier such as customer or account number. Such a mode of data organization can deliver some scalability and availability benefits, and if the number of database "instances" (referring to the different schemas, distinguished by high-level qualifier) doesn't get too large (I would want it to stay in the single digits), the cost in terms of operational complexity is generally not too high.

Does the presence of "n" database instances -- same tables, structurally, in each instance, but with different data -- require the coding of "n" versions of each database-accessing program? No. Programs are coded using unqualified table names in SQL statements, and the packages associated with those programs are bound into "n" collections, one for each database instance, with the appropriate high-level qualifier provided via the QUALIFIER option of the BIND PACKAGE command. At execution time, application programs can then very easily and efficiently navigate between database instances by pointing to an instance's package collection using the SQL statement SET CURRENT PACKAGESET.
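
Here is a minimal sketch of that setup, using hypothetical names -- a program PGM1, instance collections COLL_A and COLL_B, and schemas A and B (the actual bind options would of course follow a site's own standards):

BIND PACKAGE(COLL_A) MEMBER(PGM1) QUALIFIER(A) ACTION(REPLACE)
BIND PACKAGE(COLL_B) MEMBER(PGM1) QUALIFIER(B) ACTION(REPLACE)

At run time, a program reaches the schema A instance by issuing

EXEC SQL SET CURRENT PACKAGESET = 'COLL_A';

after which an unqualified reference to ORDERS in the PGM1 package resolves to A.ORDERS, thanks to the QUALIFIER value used when the package was bound into collection COLL_A.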

Nothing new here -- this kind of thing has been done for years at plenty of sites. The concern raised during the aforementioned meeting had to do with the use of SET CURRENT PACKAGESET in DB2 for z/OS stored procedures called through DDF. One of the DBAs with the company to which I've referred expressed doubt that SET CURRENT PACKAGESET could be used in this situation. The reason for this DBA's skepticism? He was pretty sure that the value used in a SET CURRENT PACKAGESET statement had to be a collection name in a plan's package list (often called the PKLIST). For DDF-using applications, the plan name always defaults to DISTSERV, and that plan doesn't have a package list; therefore (the DBA believed), SET CURRENT PACKAGESET, when issued from a stored procedure called by a DRDA requester, will fail.

This DBA even had what appeared to be solid evidence backing his contention: he created a stored procedure (it happened to be of the native SQL variety -- more on that to come), called ST_PROC_X, through which a SET CURRENT PACKAGESET = 'COLL_A' statement was issued. When that stored procedure was called through DDF at DB2 location DB2LOCN, the SET CURRENT PACKAGESET failed with a -805 SQL error code (I've of course changed location, collection, and other names from the actual values used at the DBA's site):

DBRM OR PACKAGE NAME DB2LOCN.COLL_A.ST_PROC_X.0123456789ABCDEF NOT FOUND IN PLAN DISTSERV. REASON 02.

There, see? "Not found in plan DISTSERV." That means the program's DBRM wasn't bound directly into the DISTSERV plan, nor was its package in a collection in DISTSERV's package list. And of course that must be the case, because a DBRM can't be bound directly into a plan in a DB2 10 (or later) environment (and even before DB2 10 you couldn't bind any DBRMs into plan DISTSERV), and you can't specify a package list for DISTSERV; therefore, SET CURRENT PACKAGESET is doomed to fail for DRDA requesters. Case closed, right?

Umm, no. I will say, however, that the DBA's conclusion was very understandable, given the text that accompanied the -805 SQL error code. This is one of those cases in which the words associated with a DB2 error code are a bit misleading. The error code text suggests that a -805 is always caused by a DBRM not being found in a plan, or a package not being found in a collection named in a plan's PKLIST. In fact, that is one -- not the only -- cause of a -805. If you look up the -805 in the DB2 for z/OS Codes manual, you'll see another reason for the error that's possible when the '02' reason code is indicated:

"The CURRENT PACKAGESET special register was not set correctly by the application."

I figured that this was the problem, and I suspected it had to do with the fact that the stored procedure in question was a native SQL procedure. How's that? Well, an external stored procedure's package can be bound into any collection via the COLLID option of the CREATE PROCEDURE statement. For a native SQL procedure, COLLID is not an option of the CREATE PROCEDURE statement. Where, then, does a native SQL procedure's package go? It goes into a collection that has the same name as the native SQL procedure's schema. The native SQL procedure ST_PROC_X referenced above was created with a high-level qualifier of HLQ_X (again, not the real identifier), so its package was bound into collection HLQ_X. The stored procedure issued SET CURRENT PACKAGESET = 'COLL_A'. That would cause DB2 to look in collection COLL_A for package ST_PROC_X (same name as the stored procedure) when executing subsequent SQL statements issued by the stored procedure. Package ST_PROC_X was not in collection COLL_A, and that's why the -805 was issued, right?

Wrong on my part. The DBA told me that the ST_PROC_X package had, in fact, been bound into each of the database instance-aligned collections. OK, how had that been accomplished? Via BIND PACKAGE with the DEPLOY option, said the DBA.

And, BINGO, that was the problem. See, the DEPLOY option of BIND PACKAGE is typically used to migrate a native SQL procedure from one DB2 environment to another (e.g., from a test to a production environment). To get a native SQL procedure's package into various collections in the same DB2 environment as the package's "root" collection (i.e., the collection that has the same name as the native SQL procedure's schema), one should use BIND PACKAGE with the COPY option, not the DEPLOY option. When BIND PACKAGE was executed with DEPLOY by the DBA with whom I was working, new consistency tokens for the packages were generated by DB2; thus, the search for package ST_PROC_X in collection COLL_A failed at the consistency token level (the fourth-level qualifier of the fully qualified package name: location.collection.package.token). The DBA freed the packages that had been added to the database instance-aligned collections via BIND PACKAGE with DEPLOY, then copied the ST_PROC_X package from its root collection to the database instance-aligned collections using BIND PACKAGE with COPY, then called the stored procedure, and presto: everything worked like a champ.
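
For reference, here is a sketch of that corrective sequence for one of the database instance-aligned collections (COLL_A), using the names from this example -- the QUALIFIER value shown (A) is hypothetical, and the same steps would be repeated for the other instance collections:

FREE PACKAGE(COLL_A.ST_PROC_X)
BIND PACKAGE(COLL_A) COPY(HLQ_X.ST_PROC_X) QUALIFIER(A) ACTION(ADD)

The COPY option takes the package from the native SQL procedure's "root" collection (HLQ_X) without generating a new consistency token -- which is why the token-level package search then succeeds -- while the QUALIFIER value points the copy's unqualified table references at that instance's schema. If the procedure exists in more than one version, the COPYVER option of BIND PACKAGE can be used to indicate the version to be copied.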

So, SET CURRENT PACKAGESET is absolutely valid in a stored procedure (native or external) that is called by a DRDA requester. Just make sure that the stored procedure's package has been placed in any collection named in a SET CURRENT PACKAGESET statement, and that the package got into that collection (or those collections) the right way -- and that means via BIND PACKAGE with COPY when you're replicating a package across several collections within a given DB2 environment.

Friday, September 26, 2014

DB2 for z/OS: Avoiding zIIP Engine Contention Issues

In an entry posted a few months ago to this blog, I touched on several matters pertaining to mainframe zIIP engines and the use of these processors by DB2 for z/OS. Based on recent experience, I feel that it's important to further highlight a particular zIIP-related issue: zIIP engine contention. Through this post, I want to explain what zIIP contention is, why you want to avoid it, how you can spot it, and what you can do about it.

As most mainframers probably know by now, zIIPs (System z Integrated Information Processors) are "specialty engines" aimed at reducing the cost of mainframe computing. They do this in two ways: 1) they cost less than general-purpose engines, and 2) they do not factor into the pricing of mainframe software. Over time, as IBM has made more of the work done on System z servers zIIP-eligible, and as existing zIIP-eligible workloads have grown (particularly those associated with applications that access DB2 for z/OS data through network connections -- commonly referred to as DDF or DRDA workloads), zIIP engine utilization rates have climbed at many sites. That's a good thing, but only up to a certain point. If zIIP engines become too busy, DB2 performance can be negatively impacted. I'll first expand on that point.

What zIIP contention is, and why you want to avoid it

Consider a z/OS LPAR configured with one or more zIIPs. If a zIIP engine is not available when an item of zIIP-eligible work in the system is ready for dispatch (because the zIIPs are busy at that time with other work), that piece of work will be dispatched to a general-purpose engine. There is a slight delay involved in dispatching zIIP-eligible work to a general-purpose engine. That doesn't matter much if only a small amount of zIIP-eligible work ends up being redirected to general-purpose engines. If, however, the degree of what I call "zIIP spill-over" reaches too high a level, application performance degradation can result.

Indicators of performance degradation caused by zIIP contention

zIIPs have been around for quite a few years now (they were introduced in 2006). Why is the issue of zIIP contention only now coming to the fore? Two reasons: 1) it took a while for zIIP utilization at many sites to reach a level at which contention can occur, and 2) the type of work that can utilize zIIP resources recently changed in an important way. This latter point refers to the introduction, with DB2 10 for z/OS, of zIIP eligibility for prefetch read and database write I/Os. Here's why that's important: it can be a pretty good chunk of work (often accounting for the large majority of CPU time charged to the DB2 database services address space, aka DBM1), and it's a type of system task that is very important for the performance of some DB2 applications. If zIIP contention causes delays in the dispatching of prefetch read tasks, run times for prefetch-intensive workloads (such as batch jobs and data warehousing/business intelligence applications) can become elongated. If that were to happen, you might see the effect as a significant increase in a class 3 wait time (so called because the information comes from records generated when DB2 accounting trace class 3 is active) that is labeled as suspension due to "other read I/O" in an IBM OMEGAMON for DB2 accounting long report (the field might be somewhat differently labeled in accounting reports generated by other vendors' DB2 monitor products).

Prefetch slowdown isn't the only thing to look out for. Lately, I've seen data that suggests a link between higher levels of "zIIP spill-over" and higher-than-desired in-DB2 not-accounted-for time for DB2-accessing applications. Not-accounted-for time is a DB2 class 2 time (it's captured in DB2 accounting trace class 2 records) that is probably calculated for you in an accounting long report generated by your DB2 monitor. If your monitor does not calculate this number for you, you can easily figure it yourself: just take average in-DB2 (i.e., class 2) elapsed time for an application, and subtract from that the average in-DB2 CPU time (keeping in mind that this is two numbers: in-DB2 general-purpose CPU time and in-DB2 specialty engine CPU time). Next, subtract out total class 3 suspend time. What's left is in-DB2 not-accounted-for time. Generally speaking, a not-accounted-for time that is less than 10% of total in-DB2 elapsed time for a higher-priority transactional workload (such as many DDF and CICS workloads) does not concern me (a higher level of not-accounted-for time for a batch workload is usually not such a big deal). In times past, a higher-than-desired in-DB2 not-accounted-for time (i.e., more than 10% of in-DB2 elapsed time for a higher-priority transactional workload) was usually an indication that a z/OS LPAR's general-purpose engines were over-committed (as might be the case if they are routinely running at utilization levels of 95% or more). Nowadays, it appears that higher-than-desired in-DB2 not-accounted-for time might also be caused by elevated levels of zIIP contention.
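
Put as a simple formula, using the class 2 and class 3 quantities just described:

in-DB2 not-accounted-for time = (in-DB2 elapsed time)
                                - (in-DB2 general-purpose CPU time + in-DB2 specialty engine CPU time)
                                - (total class 3 suspend time)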

An early-warning indicator

Obviously, you'd like to spot and head off a potential zIIP contention-related performance problem. How could you do that? First and foremost, I'd keep an eye on what I call the zIIP spill-over rate. That's the percentage of zIIP-eligible work that ends up running on general-purpose engines (as mentioned previously, this spill-over occurs when zIIP engines are busy when zIIP-eligible work is ready for dispatch). Here's how to calculate the zIIP spill-over rate: first, use your DB2 monitor to generate an "accounting report - long" (that's what IBM's OMEGAMON for DB2 calls it -- other monitors call the report something like "accounting summary, long"), and make sure that the data in the report is ordered by connection type. With data ordered by connection type, there will be, within the larger accounting report, several sub-reports -- one for each connection type used to access the DB2 subsystem (for example, one sub-report will show all CICS-DB2 work, another will show all work associated with applications that access the subsystem using the DB2 call attachment facility, etc.). In the accounting report, go to the sub-report for the DRDA connection type (this sub-report shows all work associated with applications that access the DB2 subsystem through the distributed data facility). In this sub-report, look for the following fields (I'm using labels and headings found in an OMEGAMON for DB2 report -- some monitors put this information under the heading "average appl(class 1)"):

MEASURED/ELIG TIMES  APPL (CL1)
-------------------  ----------
CP CPU TIME            0.001935  (C)
 ELIGIBLE FOR SECP     0.000002  (A)
SE CPU TIME            0.000781  (B)

The number labeled (C) is the average general-purpose CPU time for the workload (average per accounting record, which is usually close to average per transaction). The number labeled (B) is the average zIIP engine CPU time. The number labeled (A) is the average amount of general-purpose CPU time consumed in doing zIIP-eligible work.  The zIIP spill-over rate is calculated as follows:

A / (A + B)

The figures above show a zIIP spill-over rate of well under 1%: 0.000002 / (0.000002 + 0.000781) comes to about 0.0026, or roughly 0.3%. Such a small non-zero value would not cause me concern. What would cause me some concern is a rate that is over 5%. With that much zIIP-eligible work running on general-purpose engines, you could see the beginnings of a negative impact on application performance.

So, at what level of zIIP engine utilization would you see a potentially performance-impacting zIIP spill-over rate? That depends on the number of zIIP engines configured in a z/OS LPAR. If an LPAR has only one zIIP engine, you could see a higher-than-desired rate of zIIP spill-over with that one engine running somewhere in the 30s, with respect to percentage of utilization. With four zIIP engines, utilization could probably go to around 50% with little spill-over of zIIP-eligible work to general-purpose engines. I saw data for one system in which average zIIP utilization (during a reporting interval, referring to an IBM RMF CPU Activity Report) was 70%, and zIIP spill-over was around 1% - that z/OS LPAR was configured with 9 zIIP engines. The more zIIP engines you have, the hotter you can run them while still keeping zIIP spill-over to general-purpose engines at a very low rate.

Doing something about zIIP contention

If you observe a higher-than-desired level of zIIP spill-over in one of your z/OS LPARs, what can you do about it? One obvious response would be to add zIIP capacity to the system (keep in mind that EC12 and BC12 servers can have up to two zIIPs for every general-purpose engine). If that is not feasible in the near term, see if you can take some steps to reduce zIIP utilization. An example of such a step would be a significant enlargement of one or more DB2 buffer pools (assuming that you have sufficient memory to back larger pools). This could reduce zIIP utilization by reducing synchronous reads associated with a DDF workload, and by reducing prefetch read activity (something that became zIIP-relevant starting with DB2 10). Using high-performance DBATs could reduce the overall CPU cost (and thus the zIIP consumption) of DDF-connected applications.
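
As a rough sketch of what those actions might look like (the buffer pool name, the VPSIZE value, and the package name below are hypothetical -- plug in values appropriate for your own system):

-ALTER BUFFERPOOL(BP2) VPSIZE(400000)

REBIND PACKAGE(COLL1.PKG1) RELEASE(DEALLOCATE)
-MODIFY DDF PKGREL(BNDOPT)

The first command enlarges a buffer pool (again, assuming there is real storage available to back the larger pool). The REBIND with RELEASE(DEALLOCATE), together with the MODIFY DDF PKGREL(BNDOPT) command, is what causes packages executed via DDF to get high-performance DBAT treatment.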

Summing it up

The moral of this story is pretty simple: do NOT think that you can run zIIPs as "hot" as you can general-purpose mainframe engines without encountering application performance issues. In doing capacity planning, anticipate growth in zIIP-eligible work on your z/OS systems (not only because of overall workload growth, but also because new types of work become zIIP-eligible through new releases -- and sometimes new maintenance levels -- of subsystem and operating system software), and aim to keep zIIP capacity ahead of zIIP demand so as to keep the rate of zIIP spill-over low. Calculate the zIIP spill-over rate for your production DB2 for z/OS systems, and track that number -- make it one of your key metrics for monitoring the performance of mainframe DB2 data servers.

Driving up zIIP utilization can be an effective way of enhancing the price performance of a z/OS system -- just don't take it so far as to be counterproductive.