Friday, September 23, 2016

DB2 for z/OS: Using PGFIX(YES) Buffer Pools? Don't Forget About Large Page Frames

Not long ago, I was reviewing an organization's production DB2 for z/OS environment, and I saw something I very much like to see: a REALLY BIG buffer pool configuration. In fact, it was the biggest buffer pool configuration I'd ever seen for a single DB2 subsystem: 162 GB (that's the combined size of all the buffer pools allocated for the subsystem). Is that irresponsibly large -- so large as to negatively impact other work in the system by putting undue pressure on the z/OS LPAR's central storage resource? No. A great big buffer pool configuration is fine if the associated z/OS LPAR has a lot of memory, and the LPAR in question here was plenty big in that regard, having 290 GB of memory. The 128 GB of memory beyond the DB2 buffer pool configuration size easily accommodated other application and subsystem memory needs within the LPAR, as evidenced by the fact that the LPAR's demand paging rate was seen, in a z/OS monitor report, to be zero throughout the day and night (I'll point out that the DB2 subsystem with the great big buffer pool configuration is the only one of any size running in its LPAR -- if multiple DB2 subsystems in the LPAR had very large buffer pool configurations, real storage could be considerably stressed).

A couple of details pertaining to this very large buffer pool configuration were particularly interesting to me: 1) the total read I/O rate for each individual buffer pool (total synchronous reads plus total asynchronous reads, per second) was really low (below 100 per second for all pools, and below 10 per second for all but one of the pools), and 2) every one of the buffer pools was defined with PGFIX(YES), indicating that the buffers were fixed in real storage (i.e., not subject to being paged out by z/OS). And here's the deal: BECAUSE the buffer pools all had very low total read I/O rates, page-fixing the buffers in memory was doing little to improve the CPU efficiency of the DB2 subsystem's application workload. Why? Because all of the pools were exclusively using 4K page frames.

Consider how it is that page-fixing buffer pools reduces the CPU cost of DB2 data access. When the PGFIX(YES) option of -ALTER BUFFERPOOL was introduced with DB2 Version 8 for z/OS, the ONLY CPU efficiency gain it offered was cheaper I/O operations. Reads and writes, whether involving disk volumes or -- in the case of a DB2 data sharing configuration on a Parallel Sysplex -- coupling facilities, previously had to be bracketed by page-fix and page-release actions, performed by z/OS, so that the buffer (or buffers) involved would not be paged out in the midst of the I/O operation. With PGFIX(YES) in effect for a buffer pool, those I/O-bracketing page-fix and page-release requests are not required (because the buffers are already fixed in memory), and that means reduced instruction pathlength for DB2 reads and writes (whether synchronous or asynchronous).

DB2 10 extended the CPU efficiency benefits of page-fixed buffer pools via support for 1 MB page frames. By default, in a DB2 10 (or 11) environment, a PGFIX(YES) buffer pool will be backed by 1 MB page frames if these large frames are available in the LPAR in which the DB2 subsystem runs. How does the use of 1 MB page frames save CPU cycles? By improving the hit ratio in the translation lookaside buffer, leading to more cost-effective translation of virtual storage addresses to corresponding real storage addresses for buffer pool-accessing operations. DB2 11 super-sized this concept by allowing one to request, via the new FRAMESIZE option for the -ALTER BUFFERPOOL command, that a page-fixed pool be backed by 2 GB page frames (note that 2 GB page frames may not save much more CPU than 1 MB frames, unless the size of the buffer pool with which they are used is 20 GB or more).

Having described the two potential CPU-saving benefits of page-fixed buffer pools, I can make the central point of this blog entry: if you have a PGFIX(YES) buffer pool that has a low total read I/O rate, and that pool is backed by 4 KB page frames, the PGFIX(YES) specification is not doing you much good because the low read I/O rate makes cheaper I/Os less important, and the 4 KB page frames preclude savings from more-efficient virtual-to-real address translation.

This being the case, I hope you'll agree that it's important to know whether a page-fixed buffer pool with a low read I/O rate is backed by large page frames. In a DB2 11 environment, that is very easy to do: just issue the command -DISPLAY BUFFERPOOL, for an individual pool or all of a subsystem's buffer pools (in that latter case, I generally recommend issuing the command in the form -DISPLAY BUFFERPOOL(ACTIVE)). You'll see in the output for a given pool one or more instances of a message, DSNB546I. That message information might look like this:


What would this information tell you? It would tell you that DB2 wanted this pool to be backed with 1 MB page frames (the default preference for a PGFIX(YES) pool), but the pool ended up using only 4 KB frames. Why? Because there weren't 1 MB frames available to back the pool (more on this momentarily). What you'd rather see, for a PGFIX(YES) pool that is smaller than 2 GB (or a pool larger than 2 GB for which 2 GB page frames have not been requested), is something like this:


(This information is also available in a DB2 10 environment, though in a somewhat convoluted way as described in an entry I posted to this blog a couple of years ago.)

So, what if you saw that a PGFIX(YES) pool is backed only by 4 KB page frames, and not by the preferred larger frames (which, as noted above, are VERY much preferred for a pool that has a low total read I/O rate)? Time then for a chat with your friendly z/OS systems programmer. That person could tell you if the LPAR has been set up to have some portion of the real storage resource managed in 1 MB (and maybe also 2 GB) page frames. Large frames are made available by way of the LFAREA parameter of the IEASYSxx member of the z/OS data set SYS1.PARMLIB. Ideally, the LFAREA specification for a z/OS LPAR should provide 1 MB page frame-managed space sufficient to allow PGFIX(YES) buffer pools to be backed to the fullest extent possible by 1 MB frames (and/or by 2 GB frames as desired). It may be that DB2 is the one major user of large real storage page frames in a z/OS LPAR, and if that is the case then the amount of 1 MB (and maybe 2 GB) page frame-managed space could reasonably be set at just the amount needed to back page-fixed DB2 buffer pools (in the case of 1 MB frames, I'd determine the amount needed to back PGFIX(YES) buffer pools, and increases that by about 5% to cover some smaller-scale uses of these frames in a z/OS environment). If WebSphere Application Server (WAS) is running in the same z/OS LPAR as DB2, keep in mind that WAS can use 1 MB page frames for Java heap memory -- your z/OS systems programmer should take that into account when determining the LFAREA specification for the system.

There you have it. To maximize the CPU efficiency advantages of page-fixed buffer pools, make sure they are backed by large page frames. This is particularly true for pools with a low total read I/O rate. The more active a buffer pool is (and the GETPAGE rate is a good measure of activity -- it can be thousands per second for a buffer pool), the greater the CPU cost reduction effect delivered by large page frames.

And don't go crazy with this. Don't have a buffer pool configuration that's 80% of an LPAR's memory resource, and all page-fixed. That would likely lead to a high level of demand paging, and that would be bad for overall system performance. Know your system's demand paging rate, and strive to keep it in the low single digits per second or less, even during times of peak application activity. Leveraging z Systems memory for better performance is a good thing, but like many good things, it can be overdone.

Monday, August 29, 2016

DB2 for z/OS: Clearing Up Some Matters Pertaining to Database Access Threads

I have recently received a number of questions pertaining to DB2 for z/OS database access threads, or DBATs. DBATs are threads used in the execution of SQL statements that are sent to DB2 from network-attached applications (i.e., from DRDA requesters that access DB2 for z/OS by way of DB2's distributed data facility, also known as DDF). Thinking that these questions (and associated answers) might be of interest to a good many people in the DB2 for z/OS community, I'm packaging them in this blog entry. Without further ado, here are the Qs and As.

Question: What makes a DBAT a high-performance DBAT?

Answer: This question actually come to me in reverse form, as (and I'm paraphrasing), "What makes a DBAT a 'regular' DBAT versus a high-performance DBAT?" The answer's pretty simple: a "regular" DBAT is one that is not a high-performance DBAT, and a high-performance DBAT is one to which at least one package bound with RELEASE(DEALLOCATE) has been allocated for execution; therefore, if a DBAT does not have a RELEASE(DEALLOCATE) package allocated to it then it is a "regular" DBAT. Note that this answer presupposes that high-performance DBAT functionality has been enabled on the DB2 subsystem in question (more on this below) -- if that has not happened then all DBATs will be of the "regular" variety because there cannot be any high-performance DBATs.

Question: Can I get high-performance DBATs by binding the IBM Data Server Driver (or DB2 Connect, if that's what you use) packages with RELEASE(DEALLOCATE)?

Answer: Yes, but you'll want to do that the right way. This question is of greatest interest to organizations having DDF application workloads characterized by little (if any) in the way of static SQL statements. Execution of a static SQL statement, of course, involves execution of code in a DB2 package, and packages can be bound with RELEASE(DEALLOCATE), and, as noted above, executing a RELEASE(DEALLOCATE) package by way of a DBAT makes the DBAT a high-performance DBAT, if it wasn't such already. For a DDF-using application, static SQL-related packages could belong to stored procedures called by the application, or they might be associated with static SQL statements issued by the application itself -- for example, SQLJ statements embedded in a Java program. What if a DDF-using application uses dynamic SQL exclusively? That's quite common, as SQL statements issued through non-DBMS-specific interfaces such as JDBC and ODBC (both very widely used) are dynamically prepared and executed on the target DB2 for z/OS server. Are such applications blocked from the CPU efficiency gains that can be delivered via high-performance DBATs? No, they are not. Why not? Because even when a DDF-using application issues SQL statements that are exclusively dynamic from the DB2 perspective, packages are still used. Which packages? IBM Data Server Driver -- or, alternatively, IBM DB2 Connect -- packages (your license for the latter entitles you to use the former, and it's recommended that you use the former). Those packages are bound, by default, into a collection called NULLID; so, should you bind all the packages in the NULLID collection with RELEASE(DEALLOCATE), so as to make your DBATs high-performance DBATs? No, you should not do that. Why? Because that would cause ALL of your DBATs to be of the high-performance variety, and that would not be an optimal use of this functionality. You want to be smart about your use of RELEASE(DEALLOCATE) -- for DDF-using applications and for "local-to-DB2" applications, as well. That means, for a DDF workload, utilizing high-performance DBATs for frequently executed transactions that have a relatively small average in-DB2 CPU time (for such transactions, the proportional cost of repeatedly acquiring, releasing, and re-acquiring the same table space-level locks and package sections -- greatly reduced via high-performance DBATs -- is relatively significant versus transactions with higher in-DB2 times). You want, then, to use high-performance DBATs selectively. How can you do that? By binding the IBM Data Server Driver (or DB2 Connect) packages into the default NULLID collection with RELEASE(COMMIT), and into a collection not named NULLID with RELEASE(DEALLOCATE). THEN you can selectively use high-performance DBATs by pointing an application, by way of a client-side data source property, to the collection into which the Data Server Driver (or DB2 Connect) packages were bound with RELEASE(DEALLOCATE), when you want that application to use high-performance DBATs. And remember, by the way, to give consideration to upping your MAXDBAT value before starting to use high-performance DBATs.

Question: I'm using RELEASE(DEALLOCATE) packages with my DBATs, but my DB2 monitor indicates that I'm not getting high-performance DBAT usage. Why is that?

Answer: First of all, about this monitoring thing: check a DB2 monitor-generated statistics long report for your subsystem (or an online display of DDF activity provided by your monitor), and in the DDF activity section of the report, look for the field that is labeled HWM ACTIVE DBATS-BND DEALLC (or something like that -- field names can differ slightly from one DB2 monitor to another). If that field shows zero, your DDF applications are not using high-performance DBATs. If you see a zero in that field and think that you shouldn't, because you have RELEASE(DEALLOCATE) packages being executed via DBATs, issue the command -DISPLAY DDF DETAIL on the DB2 subsystem of interest. In the output of that command, look for the line with message DSNL106I. If what you see there is PKGREL = COMMIT, you're not getting high-performance DBATs because your DDF is not enabled for high-performance DBAT functionality. To change that, issue the command -MODIFY DDF PKGREL(BNDOPT). As a result of this command being executed, DDF will honor the RELEASE(DEALLOCATE) specification for packages allocated to DBATs for execution, and you'll then see instantiation of high-performance DBATs in the system. Note that it is possible for certain BIND/REBIND, ALTER, and pending DDL-materializing online REORG operations to be blocked by a RELEASE(DEALLOCATE) package that is allocated to a persistent thread such as a high-performance DBAT. Even though DB2 11 delivered a RELEASE(DEALLOCATE) "break-in" feature to enable BIND/REBIND and ALTER and pending DDL-materializing online REORG operations to proceed in the face of RELEASE(DEALLOCATE) packages that would otherwise be in the way, it still may be necessary to issue -MODIFY DDF PKGREL(COMMIT) to temporarily "turn off" high-performance DBATs when you need to perform these operations. When you're done with the database administration actions, issue -MODIFY DDF PKGREL(BNDOPT) to turn high-performance DBAT functionality back on.

Question: I'm seeing more idle thread timeouts for my DBATs than I'd like. What can I do about that?

Answer: The best way to prevent DBATs from timing out due to inactivity is to let the connections that use DBATs go into an inactive state when DDF transactions complete (and I'm referring here to "regular" DBATs -- a high-performance DBAT will remain tied to the connection through which it was instantiated until it terminates, typically as a result of being reused 200 times). It is important to realize that an application's connection to DB2 going inactive at the end of a transaction is a GOOD THING -- it's a big part of the DB2 for z/OS client-server scalability story. The virtual storage footprint of an inactive connection is very small, and the CPU cost of switching a connection from an active to an inactive state and back again is very small. When an application's connection to DB2 goes inactive, the DBAT that it was using goes into a "disconnected" state (NOT inactive), which is to say that it goes into the DBAT pool, ready to be called up to service another transaction coming from some other connection. Pooled DBATs are not subject to the idle thread timeout limit (specified via the DB2 ZPARM parameter IDTHTOIN), and inactive connections don't time out. How do you get inactive connection functionality? First, the DB2 ZPARM parameter CMTSTAT needs to be set to INACTIVE (that's been the default value for a LONG time). Second, transactions using DBATs need to finish with a "clean" commit. Basically, this means that nothing is allocated to the DBAT when a transaction using the DBAT completes -- so, no locks of any kind, no WITH HOLD cursors left un-closed, no declared global temporary tables left with data in them, etc. The most important thing that goes into getting "clean" commits is the COMMIT itself. Some client-side application developers think that a COMMIT isn't needed for a transaction that only reads data. Not true. If a read-only DDF transaction doesn't issue a COMMIT, the associated DBAT will hold one or more table space-level locks, and that would be enough to keep the connection from going inactive and the associated DBAT from going back into the DBAT pool -- if the DBAT stays in a connected state because no COMMIT has been issued, and if the amount of time specified in the ZPARM parameter IDTHTOIN goes by with no activity for the thread, it will time out. Oh, and another thing: if you have client-side developers occasionally issuing SQL statements like SELECT 1 FROM SYSIBM.SYSDUMMY1 in a well-intentioned effort to keep connections to DB2 "alive," TELL THEM TO STOP DOING THAT. Such actions can keep DB2 connections from going inactive when transactions complete, and as I stated previously it is GOOD for DB2 connections to go inactive when DDF transactions finish processing.

Question: Can the KEEPDYNAMIC(YES) package bind specification lead to idle thread timeouts for DBATs?

Answer: Yes. Here's the deal: with KEEPDYNAMIC(YES) in effect, prepared dynamic SQL statements are kept in a thread-specific local cache. Reusing one of these locally cached dynamic SQL statements is even more CPU efficient than getting a hit in the global dynamic statement cache in the DB2 EDM pool. That's good, but here's the flip side: with KEEPDYNAMIC(YES) in effect, you'll have prepared dynamic statements allocated locally to the thread, and as I noted above, a DBAT cannot go into a disconnected state (into the DBAT pool) at the end of a transaction if something is allocated to the DBAT. With the DBAT thus prevented from going into the DBAT pool (because of the prepared statements allocated locally to the DBAT), it is subject to hitting the idle thread timeout limit and being cancelled for that reason. Now, in a couple of situations you get some relief from this stricture: if either Sysplex workload balancing or seamless failover is in effect (these are DB2 Connect or IBM Data Server Driver configuration options) then at commit time, if the only thing keeping a DBAT in the connected state is the local cache of prepared dynamic SQL statements that exists because KEEPDYNAMIC(YES) is in effect, the DBAT can go unused for 20 minutes before the DB2 idle thread checker will drive a cancellation of the thread. Also, when either of the aforementioned DB2 client configuration options is in effect (Sysplex workload balancing or seamless failover), and a DBAT associated with a KEEPDYNAMIC(YES) package is being repeatedly used (so that idle thread timeout does not occur), the DBAT can stay in the connected state for an hour before being terminated (this so that resources allocated to the DBAT will be periodically freed up). Given all this, my thinking on KEEPDYNAMIC(YES) can be summed up as follows: if you have a DDF-using application that is specifically designed to take advantage of the KEEPDYNAMIC(YES) bind option, you can certainly consider going that route to achieve related CPU savings; otherwise, my inclination would be to use KEEPDYNAMIC(NO), go for hits in the global dynamic statement cache (not quite as good, efficiency-wise, as reuse of a locally cached prepared statement, but still a whale of a lot better than full PREPAREs of dynamic statements), and use high-performance DBATs to achieve additional CPU savings.

That's all for now. I hope that this information will be of use to you.

Thursday, July 28, 2016

DB2 for z/OS: Clearing the Air Regarding CLOSE YES for Table Spaces and Indexes

This is another of my DB2 for z/OS blog entries motivated by a desire to clear up a matter about which there is some misunderstanding in the DB2 community. In this case, the misunderstanding concerns the CLOSE option of the CREATE and ALTER statements for DB2 for z/OS table spaces and indexes.

The confusion here, as I've encountered it, is mainly related to differences between "soft close" and "hard close" of DB2 data sets (table spaces or indexes, or partitions of same if the objects are partitioned). "Soft close" is the term that some people use to describe pseudo-close processing, while "hard close" refers to the physical closing of a data set that had been open and allocated to DB2. The primary point I want to make can be summed up simply: specification of CLOSE YES or CLOSE NO affects "hard close" activity, not "soft close" activity. Information provided below is intended to flesh out and provide context for that point. 

I'll start by explaining what pseudo-close (aka "soft close") is about. Pseudo-close is done largely for the purpose of updating a table in the DB2 for z/OS directory called SYSLGRNX. This table contains information about the time periods (expressed as ranges within the DB2 transaction log) during which DB2 data sets are open and in a read/write state. Why does DB2 record this information? To speed up RECOVER utility jobs. Typically, when RECOVER is run for a table space or index (or partition of same, if partitioned), the job involves recovery "to currency" -- that is, to what would be the current state of the object. In that case, the RECOVER utility restores the most recent image copy backup of the target object, and applies data changes made subsequent to that backup to the object to bring it to currency. These data changes are obtained from the transaction log, and therein we find the benefit of pseudo-close processing. [Note that even if recovery will be to a prior point in time, versus "to currency," if that point in time is "later" than the time of the restored image copy, post-copy changes will be applied as necessary from the transaction log. RECOVER with the BACKOUT option also involves transaction log access, but no image copy restoration.]  

When a DB2 data set that is open for read/write access has gone for a certain interval of time without any data changes, it will be pseudo-closed. That interval of time is determined by the value of two DB2 ZPARM parameters, PCLOSEN and PCLOSET. The former parameter refers to a number of checkpoints (the default is ten), and the latter to a number of minutes (the default is ten), and the pseudo-close action is taken when the first of those thresholds is reached; so, if PCLOSEN and PCLOSET are at their default values, a DB2 data set open for read/write will be pseudo-closed if it goes for 10 DB2 system checkpoints or 10 minutes (whichever happens first) without being updated (with the default pseudo-close parameters in effect, 10 minutes will typically pass before 10 DB2 checkpoints have occurred -- it's unusual for DB2 checkpoints to be separated by less than a minute). When a data set is pseudo-closed, DB2 switches the data set's state to read-only, and records that action (in terms of log RBA, or relative byte address, and -- in a data sharing environment -- log record sequence number, or LRSN, information) in the SYSLGRNX table in the DB2 directory. Though the data set, after being pseudo-closed, is in a read-only state, it is still available for update, and the state will be switched back to read/write from read-only when the next data-change operation (e.g., INSERT, UPDATE, DELETE) targeting the data set comes along -- and that (the switching of the data set's state from read-only back to read/write) is also recorded in SYSLGRNX. If the data set is recovered via the RECOVER utility at some time, RECOVER will get information pertaining to the data set that will indicate portions of the log that can be skipped over during change-apply processing -- if SYSLGRNX information indicates that the data set was in a read-only state between points X and Y (RBA or LRSN values) in the log, there's no need for RECOVER to process that part of the log because there's no way that data-change operations involving the data set will be recorded in that log range. The more of the log that RECOVER can skip over in recovering an object, the sooner the job will complete.

By the way, you can monitor pseudo-close activity for a DB2 subsystem by way of a DB2 monitor-generated statistics long report (or an online display provided by your monitor). In a statistics long report, find the section of information with the heading OPEN/CLOSE ACTIVITY. In that section, find the field with a label like DSETS CONVERTED R/W -> R/O. That shows the rate of pseudo-close actions. As a very rough rule of thumb, if the number of open data sets for the subsystem is somewhere in the vicinity of 10,000, I'm comfortable with a pseudo-close rate of around 20-40 per minute (you'd adjust that range proportionately for smaller or larger numbers of open data sets). 

That, then, is pseudo-close, and it is NOT affected by the specification of CLOSE YES or CLOSE NO for a DB2 data set. The CLOSE specification for a data set does affect "hard close" processing -- that is, the physical closing of data sets that had been open and allocated to DB2. These physical close actions can happen for two reasons -- one of which applies only to a DB2 data sharing environment (and I'm not talking about the data set close actions that are a normal part of shutdown processing for a DB2 subsystem). First, every DB2 subsystem has a limit on the number of data sets that can be open and allocated to the subsystem at one time. That limit is determined by the value of the ZPARM parameter DSMAX. If the number of data sets open and allocated to a DB2 subsystem reaches the DSMAX limit, DB2 will physically close some of those open data sets, starting with those that are a) defined with CLOSE YES and b) have gone the longest time without being accessed. If the DSMAX limit is reached for a subsystem and there are no open data sets defined with CLOSE YES, or if DB2 needs to close a number of open data sets beyond those defined with CLOSE YES, some data sets defined with CLOSE NO that have gone a long time without being referenced will be physically closed. Thus, CLOSE YES provides a means whereby you can indicate to DB2 that, in the event of the DSMAX open data set limit being reached, certain data sets are to be candidates for physical closing before other data sets are so considered. With this said, I'll note that my preference is to have DSMAX set to a value that is either not reached, or reached infrequently. More information about DSMAX, including how associated "hard close" activity can be monitored for a DB2 subsystem, can be found in an entry I posted to this blog a few months ago.

In a DB2 data sharing environment, there is an interesting relationship between "soft close" and "hard close" activity. Imagine that you have a 2-way DB2 data sharing group comprised of subsystems DB2A and DB2B. Suppose that data set X is being updated ("update" meaning "changed" -- via INSERT, UPDATE, and/or DELETE) by processes running on both DB2A and DB2B. If data set X goes for a pseudo-close interval without being changed on DB2B, it will be pseudo-closed on that member, and its state will be changed to read-only. If another pseudo-close interval goes by without any access at all (no read, no nothing) to data set X from DB2B, data set X will be physically closed on DB2B if it is defined with CLOSE YES. This physical closing of data set X on DB2B could result in DB2A getting an exclusive page set P-lock (physical lock -- used for data coherency versus access concurrency purposes) on the data set, causing it to become non-group-buffer-pool-dependent. That would reduce data sharing overhead by eliminating, from the system on which DB2A is running, group buffer pool-related coupling facility requests that would otherwise be required as part of accessing data set X.

So, in a DB2 data sharing environment, should objects be defined with CLOSE YES or CLOSE NO? There is, as you might expect, no one-size-fits-all answer to this question. CLOSE YES could be a good choice for an object if you think it likely that there will be significant stretches of time during which the object (or a partition thereof, if it is a partitioned table space or index) will be accessed exclusively by a process or processes running on a single member of the DB2 data sharing group (meaning, no access from other group members). In that case, CLOSE YES could enable acquisition by the one DB2 member of an exclusive page set P-lock on the object (or partition thereof). During the time interval in which that exclusive page set P-lock is held by the DB2 member, a lot of group buffer pool accesses could be eliminated, and that would be good from a CPU efficiency perspective. On the other hand, if you think that it would be unlikely for objects to be accessed for any significant period of time solely from one member of the data sharing group, CLOSE NO could be a better choice. Why? Because if there is a lot of pseudo-close activity going on in the data sharing group -- that is, if data sets are quite frequently being switched by DB2 members to a read-only state and then, very shortly, back to a read-write state -- then CLOSE YES could result in data sets frequently going from group buffer pool-dependent to non-group buffer pool-dependent and back again, and that could drive a lot of coupling facility activity (due to things like large-scale page registration and group buffer pool write activity). Ultimately, the CLOSE YES / CLOSE NO decision in a data sharing environment comes down to knowing how a given database object is likely to be accessed by processes running on various members of the DB2 data sharing group.

And there you have it. CLOSE YES has an impact on DB2 data set "hard close" (physical close) activity, but not on "soft close" (pseudo-close) activity, though in a data sharing environment CLOSE YES can lead to the physical closing of a data set as a result of a pseudo-close action. I hope that this information will be helpful to you.

Tuesday, June 28, 2016

DB2 for z/OS: Indexes and Foreign Keys

The functionality of DB2 for z/OS is enhanced, in ways small and big, on a regular basis. There's plenty to keep up with, and it is understandable that some product improvements could fly under the radar of a database administrator and escape notice. I encountered this situation a few weeks ago. During a presentation on new features and functions of DB2 11, delivered to an audience comprised mostly of DB2 DBAs, I was asked, "When will we see a lifting of the restriction that requires an index on a foreign key to be defined on just that key in order to be useful for improving referential integrity performance? When will an index that begins with a foreign key's columns, but includes one or more additional columns, be usable as an RI performance booster?"

Do you know what the answer is to that question? It's not actually answerable in an "It will happen..." way, because the dropping of the referenced restriction is something that did happen -- and happened some time ago, at that.

Before going further with this, a little level-setting is in order. Referential integrity (RI) is, of course, a data integrity protection mechanism. Its aim is to prevent "orphan" rows from showing up in the tables of a relational database management system like DB2; so, if you have a CUSTOMERS table and an ORDERS table, and CUST_NUMBER is a column of both tables, RI helps to avoid a situation in which the CUST_NUMBER column of ORDERS contains a value that does not appear in the corresponding column of the CUSTOMERS table. In this example. CUSTOMERS is the parent table in the RI relationship, and ORDERS is the child table (also referred to as a dependent table). CUST_NUMBER in ORDERS is a foreign key. CUST_NUMBER in CUSTOMERS would be a unique key (and maybe that table's primary key).

DB2-managed RI does what it does in part by preventing deletion of a row in the CUSTOMERS table (continuing with the example introduced previously) if that row's CUST_NUMBER value appears in the CUST_NUMBER column of the ORDERS table (assuming that the behavior specified for the referential constraint is DELETE RESTRICT versus DELETE CASCADE). The "child table check" required to enforce a DELETE RESTRICT rule will perform much better if DB2 can verify existence of a matching foreign key value using an index on the child table, as opposed to having to scan the child table itself. And here is where the aforementioned enhancement comes in. For a long time, the usability of an index on a foreign key for "child check" efficiency depended on the index being defined ONLY on the foreign key. In other words, if you defined on the ORDERS table an index on (CUST_NUMBER, ORDER_AMOUNT), that index would not be usable for improving the performance of a CUST_NUMBER "child check" associated with a DELETE targeting a row in the parent CUSTOMERS table. Here's the wording that appeared in the DB2 for z/OS Version 8 Application Programming and SQL Guide, under the heading, "Indexes on foreign keys" (with emphasis added by me):

To let an index on the foreign key be used on the dependent table for a delete operation on a parent table, the columns of the index on the foreign key must be identical to and in the same order as the columns in the foreign key.

That restriction caused some organizations to have more indexes on certain tables than they wanted. If you had, for example, a table T1, with a foreign key COL1, and you needed an index on COL1, COL2 and also wanted to have an index that would improve the performance of the "child check" of a COL1 value associated with a requested DELETE of a row in the parent table of the RI relationship, you needed an index on COL1 (exact match of the foreign key) as well as the index on COL1, COL2. Kind of a bummer, because every index you define on a table makes all DELETEs and all INSERTs involving the table more expensive, and ups the cost of index-processing utilities such as REORG, RUNSTATS, and LOAD (and of of course ups disk space usage).

Lo and behold, starting with DB2 9 for for z/OS (yeah, that far back), things changed in a good way regarding indexes and foreign keys. Here's the wording from the DB2 9 Application Programming and SQL Guide, under the heading, "Defining a foreign key" (again, emphasis has been added by me):

The index on the foreign key can be used on the dependent table for delete operations on a parent table. For the index to qualify, the leading columns of the index must be identical to and in the same order as all columns in the foreign key. The index can include additional columns, but the leading columns match the definition of the foreign key.

And here's a link to a page containing the same information in the DB2 11 Knowledge Center on the Web:

If this DB2 enhancement made the scene without you realizing it, don't worry -- you're not alone. What's important is what you do with this information, if it's news to you. First, for a new table to be defined in your DB2 for z/OS environment, note the foreign keys (if any). Then, in determining what indexes should be defined on the table, keep in mind that an index on (COL1, COL2, COL3, ...) will be effective as a performance booster for "child checks" for foreign key COL1 (or foreign key COL1, COL2 -- what matters is that the leading column or columns of the index match the column or columns of the foreign key in question) -- you don't have to define a separate index just on the foreign key to get efficient "child checking."

For existing tables, this could be an opportunity for some index pruning that would save disk space and enhance the CPU efficiency of INSERT and DELETE and utility operations (and UPDATEs of indexed columns). Note a table's foreign keys, see if there are indexes on the table that exactly match those foreign keys, and see if there are other indexes on the table with leading columns that match a foreign key. If you find such a situation, consider dropping the index that exactly matches the foreign key if due diligence shows that doing so would not cause any problems; so, if table T1 has a foreign key COL1, COL2 and an index on (COL1, COL2) and another index on (COL1, COL2, COL3), do some analysis. See if any packages are dependent on the index on (COL1, COL2). If any are, could those packages use the index on (COL1, COL2, COL3) and still get good performance? Is there any other reason why the "exact match" index on the foreign key COL1, COL2 is needed? If not, get rid of it and realize CPU and disk space savings. You might well find that this situation exists for multiple of your tables that were created when the "exact match" rule regarding indexes on foreign keys was still in effect.

Bottom line: the indexes you have should be the indexes you need. What's "needed" can change over time. Take that knowledge and use it.

Wednesday, May 25, 2016

DB2 for z/OS: When Did You Last Check Your Checkpoint Frequency?

I have written in this blog a few entries about various DB2 for z/OS ZPARM parameters, including a post on DSMAX and another on EDM_SKELETON_POOL. Time for another such entry, this time focused on the parameters that determine when a DB2 subsystem will perform checkpoint processing.

Checkpointing is an availability-enhancing mechanism that is related to DB2's deferred write functionality. DB2 externalizes changed pages to disk in a deferred manner so as to boost the efficiency of insert, update, and delete operations (the aim is to allow multiple changes to be made to a given page in memory before that page is written to a table space or index on disk). That's great for scalability, but it has an availability implication: if the DB2 subsystem terminates abnormally (i.e., "crashes"), pages in memory that were updated (and associated with committed data changes) and not yet externalized at the time of the crash will have to be identified (using data in the DB2 log) and processed in the course of DB2 restart (meaning, the corresponding table and index pages on disk will have to be updated accordingly) so that the data on disk will be in a consistent state before the subsystem resumes serving requests from applications. The more of these "pending writes" there are at the time of a DB2 failure, the longer it will take to complete the subsequent restart of the subsystem; thus, there is goodness in getting changed pages externalized to disk in a timely manner.

This is where checkpointing comes in. At regular intervals (based on a user-supplied criterion or criteria) DB2 will identify changed-but-not-externalized pages in the buffer pools and will schedule those pages for externalization via asynchronous write actions. At the time of the next checkpoint, DB2 will determine whether any of the updated-but-not-externalized pages identified at the time of the previous checkpoint are still un-externalized. Those pages (if any) will be written synchronously to disk. And so it goes, one checkpoint after another after another.

What makes for a "good" DB2 checkpoint frequency? That would be one that strikes a proper balance between greater CPU efficiency on the one hand -- allow pages to be updated at least a few times before being externalized to disk, and keep the consumption of resources expended in checkpointing at a reasonable level -- and faster restart on the other hand.

OK, how do you effect a desired frequency of checkpoint processing? For a long time, the only way to influence DB2 checkpoint frequency was to specify (by way of a ZPARM parameter) the number of records that would be written to the DB2 transaction log between checkpoints. The default value for this parameter used to be (if memory serves me correctly) 50,000, and if you went with that value for a subsystem then DB2 would go through checkpoint processing every time 50,000 records had been written to the log since the time of the last checkpoint. For many organizations, this default log-record setting resulted in a pretty reasonable checkpoint frequency.

As time went by, there came to be some problems with both the default value for log-records-between-checkpoints, and for the whole log-based means of regulating checkpoint frequency. The problem with the 50,000 log records threshold is that it came to be way too low at a growing number of sites -- a result of DB2 data-change workloads getting dramatically larger (sometimes thousands of insert/update/delete operations per second). At 50,000 log records between checkpoints, some systems would have DB2 checkpoints occurring every few seconds -- not where you want to be with respect to achieving a good balance balance between CPU efficiency and quick DB2 restart-ability (checkpointing every few seconds would be pretty expensive). The other problem with the log write-based checkpoint triggering threshold was variability -- sometimes extreme variability. At a given DB2 for z/OS site there could be times of the day when a read-dominated workload would lower log write activity to a point that caused lots of time to go by between checkpoints, and other times when high-volume insert jobs (for example) would drive log write levels to the point that checkpointing occurred with too much frequency.

DB2 for z/OS Version 7 delivered a very useful enhancement that enabled the management of checkpoint frequency based on an explicitly time-based threshold: you could tell a DB2 subsystem to checkpoint every X number of minutes, regardless of the volume of log write operations. DB2 10 for z/OS allowed one to specify that checkpointing is to occur after X minutes have elapsed, or after Y records have been written to the DB2 log, since the last checkpoint -- the threshold that's reached first triggers the next checkpoint.

So, where does all this leave us? First, you want to know how many checkpoints are driven on your DB2 subsystems (especially the production subsystems) during busy hours of the day and/or night. This can be easily done using a statistics long report generated by your DB2 monitor. I'd suggest creating a report for a busy daytime period of 1 or 2 hours, and a report for a busy nighttime period of 1 or 2 hours. In the report, find the section under the heading SUBSYSTEM SERVICES (or something similar -- different DB2 monitors might use slightly different wording for headings and fields). In that report section, find the field with a label like SYSTEM EVENT CHECKPOINT, and see how many checkpoints occurred during the time period for which the report was generated. Generally speaking, what you want to see is a checkpoint every 2 to 5 minutes (there is not a "right" or "wrong" place to be in that range -- closer to 5 minutes would make for a somewhat more CPU-efficient DB2 environment, and closer to 2 minutes would enable somewhat faster DB2 restart following an abnormal subsystem termination). If you're out of that range (on either side), an adjustment would likely be warranted.

While you can still manage DB2 checkpointing on a log-records-between-checkpoints basis, and also using both a number of log records and a number of minutes between checkpoints (whichever occurs first since the last checkpoint), my preference is to control checkpoint frequency solely by way of a minutes-between-checkpoints threshold -- it's simple and straightforward, and I like a fixed regularity of checkpoints per hour throughout the day. If you go with a minutes-between-checkpoints threshold, you'll have SINGLE for the value of the CHKTYPE parameter in ZPARM, and the desired number of minutes (again, something in the range of 2 to 5, inclusive, is recommended) specified as the value of the CHKFREQ parameter. By the way, if you go this route -- say, with CHKTYPE=SINGLE and CHKFREQ=5 if you want a DB2 checkpoint every 5 minutes -- then you will have a value of NOTUSED for the CHKLOGR and CHKMINS parameters in ZPARM, as these parameters are used only when checkpoint frequency is managed on both a minutes-between and a log-records-between basis (i.e., when you have CHKTYPE=BOTH). Also, if you see a value of something like 100,000 for the CHKFREQ parameter in ZPARM, that does not mean a checkpoint every 100,000 minutes -- it means a checkpoint after every 100,000 records have been written to the active log (when CHKTYPE is set to SINGLE, a CHKFREQ value between 1 and 60, inclusive, means that checkpoint frequency is time-based, and a CHKFREQ value between 1,000 and 16,000,000, inclusive, means that checkpoint frequency is based on that number of records being written to the DB2 log between checkpoints).

So, check your DB2 system's checkpoint frequency if you haven't in a while, and if you're getting fewer than 2 or more than 5 checkpoints per hour then consider making a change to get checkpoint frequency into the 2- to 5-minute range. Yes, it's technically possible to use log-records-between-checkpoints as a checkpointing threshold, but I'd prefer to make checkpoint frequency a fixed-number-of-minutes-between-checkpoints thing. I like simple, I like straightforward, and I like regularity.

Friday, April 29, 2016

DB2 for z/OS: Should You Do Some Buffer Pool Consolidation?

19, 19, 23, 26.

Know what these numbers have in common? They indicate the number of 4K buffer pools allocated for four different production DB2 for z/OS subsystems that I analyzed over the past couple of years. I believe that there are quite a few sites where a plethora of 4K buffer pools have been defined for a given DB2 subsystem. That's not illegal or anything (DB2 allows you to have as many as 50 different 4K pools for a single subsystem), but it does make the task of managing and monitoring a buffer pool configuration more complex and time-consuming. When I'm in a meeting with an organization's DB2 for z/OS DBAs, and we're looking over information for a DB2 environment, and I see lots of 4K buffer pools defined and I ask about that, I might see sheepish grins and a few eye-rolls, and hear words like, "Well, you know, we just added a buffer pool here and a buffer pool there over the years, for this reason and that, and we ended up with what you see." Recently, I've been pleased to see DBAs at more than one company engaged in consolidation of 4K buffer pools (the number of 8K, 16K, and 32K buffer pools is usually not a matter of concern -- you can only have as many as 10 of each, and most systems I've seen have fewer than that number allocated). These DBAs are reassigning database objects (table spaces and indexes) from lots of smaller 4K pools to a smaller number of larger 4K pools (moving objects from one buffer pool to another got easier with DB2 10, especially for organizations running DB2 in data sharing mode, as I pointed out in a blog entry on the topic I wrote a few years ago). At one site I visited earlier this year, they've taken an interesting approach to managing the size of some of the larger 4K pools they are using for consolidation purposes: they are allocating buffers for these pools in chunks of 524,288. Why? Because that's the number of 4K buffers that can fit into a 2 GB page frame (DB2 11 allows 2 GB page frames to be requested for pools defined with PGFIX(YES), with those very large frames being made available through a specification for the LFAREA parameter in the IEASYSxx member of a z/OS system's SYS1.PARMLIB data set). When the DBAs enlarge one of these pools, it will be enlarged by 524,288 buffers (or a multiple thereof), so as to get one more 2 GB page frame for the pool (or a few more, depending on the size increase).

So, if you're looking to rationalize and simplify the 4K buffer pool arrangement for a DB2 subsystem (or if your organization is just getting started with DB2 for z/OS and you're looking for a reasonable initial set-up -- I got this question a few days ago from a DBA at such a site), what kind of 4K buffer pool configuration might make sense for you? Here are my thoughts on the subject:

  • Ideally, the only database objects (table spaces and indexes) in buffer pool BP0 will be those associated with the DB2 catalog and directory.
  • You should dedicate a 4K buffer pool to the 4K table spaces in the work file database (just as you should dedicate a 32K buffer pool to the 32K table spaces in the work file database). Organizations very often use BP7 for this purpose, because (in a non-data sharing DB2 system) the work file database is named DSNDB07. The "7" in BP7 is then a handy reminder of the role of this buffer pool.
  • You should have a default 4K buffer pool for table spaces used by applications, and a default pool for indexes defined on those table spaces.
  • You might want to have a 4K buffer pool that you would use to "pin" objects in memory (i.e., cache objects in memory in their entirety). You might decide to do this for certain table spaces and indexes that a) are accessed very frequently by important programs and b) are not particularly large (it would take a lot of buffers to pin a really big table space or index in memory). Note that a "pinning" buffer pool should be defined with PGSTEAL(NONE), so as to let DB2 know that you want to use the pool to completely cache objects assigned to it.
  • You might want to have a 4K pool that you would use for monitoring and diagnostic purposes. Suppose, for example, that this pool is BP12, and you have a table space for which you want access activity information. You could then assign that table space temporarily to BP12 and know that the associated information provided by your DB2 monitor or via the DB2 command -DISPLAY BUFFERPOOL(BP12) DETAIL pertains to that one table space. That is a cheaper (in terms of CPU cost) and easier way to get pretty detailed object-specific access activity information versus turning on one or more DB2 performance trace classes.
  • Beyond this basic set-up, you could also consider assigning table spaces that are accessed in a mostly random fashion to a buffer pool that's different from a pool used for table spaces that see a lot of large page scans (and associated indexes would likewise go in two different buffer pools). This would depend, of course, on your being able to determine this division of objects based on predominant access patterns.

All told, you might end up with something like 6 to 8 different 4K buffer pools. I'd prefer that to having 15 or 20 or 25 or more 4K pools. There is often goodness in simplicity.

Thursday, March 31, 2016

Observations on DB2 for z/OS Address Space CPU Utilization

Have you ever looked at a DB2 Statistics Long report generated by your DB2 monitor? If so, you might have seen a section containing information about the CPU consumption of the various DB2 address spaces. It would look something like the example below, which reflects activity over a one-hour period of time (similar information might be available as well via an online display provided by your DB2 monitor). Note that I’ve abbreviated address space names and reduced the precision of CPU time figures (from microseconds to milliseconds) to enable the information to fit within the width of this blog space.

---------  --------  -----------  --------------  ---------------
SYS SVCS     23.203    13:18.791          10.464           55.127
DB SVCS      58.468     1:26.687          17.225        13:49.714
IRLM          0.006        0.000        6:00.679            0.000
DDF           5.741     9:36.358           2.596        12:23.363

An address space's total CPU time for the reporting interval is the sum of the numbers in the corresponding row of the tabular display; so, for example, the total CPU consumption for the DB2 database services address space, based on data shown above, is:

58.468 + 1:26.687 + 17.225 + 13:49.714 = 16:29.094 (16 minutes, 29.094 seconds)

The first column of numbers, labeled TCB TIME, shows the CPU consumption of address space processes represented in the system by TCBs, or task control blocks. TCB CPU time is always consumed on general-purpose processors (aka general-purpose "engines"). The second column, labeled PREEMPT SRB, shows the CPU time, for address space processes represented by preemptible SRBs (Service Request Blocks), that was consumed on general-purpose engines. Work done under preemptible SRBs is generally zIIP-eligible (i.e., eligible for execution by a zIIP engine - a specialty processor that provides relatively lower-cost MIPS), to varying degrees depending on the type of work being done. Work, done under preemptible SRBs, that is not executed by zIIP engines is executed by general-purpose engines. The third column, labeled NONPREEMPT SRB, shows CPU consumption of address space processes represented by non-preemptible SRBs. This work, like work done under TCBs, is always executed by general-purpose engines. The fourth column, labeled PREEMPT IIP SRB, shows the CPU time, associated with processes running under preemptible SRBs, that was consumed on zIIP engines.

In this blog entry, I want to point you towards some observations about DB2 address space CPU utilization figures that you might see for DB2 subsystems at your site.

IRLM - lean and mean

The IRLM address space - responsible for lock management in a DB2 for z/OS environment - typically consumes a very small amount of CPU resource, even when the volume of lock and unlock request activity is very high. For the system from which the CPU times shown above came, the six minutes of IRLM CPU time fueled execution of 128 million lock and unlock requests during the one-hour reporting interval. This great CPU efficiency is a key reason why you shouldn't hesitate to put IRLM where it belongs, priority-wise, in a z/OS LPAR's WLM policy: in the super-high-priority SYSSTC service class. IRLM doesn't use much CPU, but when it needs an engine it needs one RIGHT AWAY; otherwise, lock acquisition and release actions are delayed, and the whole DB2 workload slows down. [By the way, I would not recommend assigning DB2 address spaces other than IRLM to the SYSSTC service class. The other DB2 address spaces - MSTR, DBM1, DIST, and any stored procedure address spaces - should all be given the same priority, and that priority should be below SYSSTC and above address spaces, such as CICS regions, in which application programs run.]

System services (aka MSTR) - the thread factory

The DB2 systems services address space handles various functions. Of these, a major driver of CPU utilization is thread creation and termination. On the DB2 system for which address space CPU times are shown above, the dominant component of the workload during the reporting interval was CICS-DB2 (a little over 500 transactions per second, on average, during the one-hour time period). For this DB2 workload component, it so happens that there was very little in the way of CICS-DB2 thread reuse (the thread reuse rate was about 2%). That being the case, with the high transaction rate the MSTR address space was kept pretty busy creating and terminating hundreds of CICS-DB2 threads per second. If the CICS-DB2 thread reuse rate were to be made considerably higher through the use of a few protected entry threads for the most frequently executed transactions, it's likely that the CPU time for the DB2 system services address space, which was 14 minutes and 47.585 seconds for the system portrayed in the report snippet above, would have been a considerably smaller value.

Database services (aka DBM1) - readin' and writin'

Like the system services address space, the DB2 database services address space performs a variety of functions. Two functions that account for a lot of the address space's CPU consumption are prefetch reads and database writes. Prefetch read operations (referring to the combined total of sequential, list, and dynamic prefetch reads) often greatly outnumber database writes - sometimes by 10 to 1, sometimes by 20 to 1 - in a transactional application environment, as transactional work is often read-heavy (batch workloads are sometimes relatively more write-heavy), so your main leverage point in reducing DBM1 CPU consumption will typically be reducing the rate of prefetch reads in the system. That goal, in turn, is generally achieved via enlargement of buffer pools that see a lot of prefetch reads. Note that the bulk of DBM1's CPU consumption is associated with zIIP processors (in the report snippet above, that's the 13 minutes and 49.714 seconds seen for the database services address space in the column with the heading PREEMPT IIP SRB - about 84% of DBM1's total CPU time). This is so because, starting with DB2 10 for z/OS, prefetch read and database write operations became 100% zIIP-eligible. Because reducing prefetch reads will reduce zIIP engine utilization, does that mean it's not important? No, it doesn't mean that. Reducing zIIP engine utilization can be important, especially as a means of avoiding zIIP engine contention problems.

DDF - SQL-driven

The DDF address space (also known as DIST) is interesting in that its CPU consumption is largely driven by execution of SQL statements that get to DB2 by way of DDF. Referring to the report snippet above, the CPU times for DDF in the TCB TIME and NONPREEMPT SRB columns - about 8 seconds of the address space's total of a little over 22 minutes of CPU time - reflect activity performed by DDF "system" tasks. The rest of the DDF CPU time, consumed by processes represented by preemptible SRBs, is associated with execution of SQL statements issued by network-attached DB2-accessing applications (and that includes sending query result sets back to clients). The more SQL that gets to DB2 through DDF, the higher DDF's CPU consumption will be (just as the CPU time of a CICS region is affected by the cost of executing SQL statements that get to DB2 via that address space). Here's something else to note: the CPU time split between general-purpose engines and zIIP engines for DDF work done under preemptible SRBs. Using numbers from the report snippet above, you can see that this split is about 56% zIIP and 44% general-purpose-engine time (the figure for the zIIP offload percentage for the DDF address space is the time under the PREEMPT IIP SRB column for DDF divided by the sum of the times in the PREEMPT SRB and PREEMPT IIP SRB columns). Execution of SQL statements running under preemptible SRBs in the DDF address space is up to 60% zIIP-offload-able, and I'd say that a zIIP offload percentage in the 55-60% range is good. If you see a split such that the DDF CPU time associated with work done under preemptible SRBs is less than 55% zIIP time (i.e., if PREEMPT IIP SRB time for DDF divided by the sum of PREEMPT SRB and PREEMPT IIP SRB time is less than 55%), check to see if you have a zIIP engine contention issue (check the blog entry on zIIP engine conetntion pointed to by the hyperlink above).

So, looked lately at DB2 address space CPU times in your environment? Check 'em out, and see what conclusions you can draw. I hope that the information provided via this blog entry will be useful to you.