Monday, February 25, 2019

Db2 for z/OS: Isolating REST Client Activity in Db2 Monitor Accounting Reports

In an entry posted to this blog some years ago, I described the form of a Db2 monitor-generated accounting report that I most like to use to get an overall view of application activity in a Db2 for z/OS environment: an accounting long report with data ordered by connection type (here, I am using terminology associated with the Db2 monitor with which I'm most familiar: IBM's Tivoli OMEGAMON XE for Db2 Performance Expert on z/OS). I like to see data in an accounting report aggregated at the connection type level because within such a report there is a sub-report with detailed accounting information for each connection type through which the target Db2 subsystem was accessed: a sub-report showing all CICS-related work for the subsystem, one showing all TSO attach-related work, one showing all DDF-related work, etc.

Now, the thing about the DDF-related activity in this form of an accounting report is that it is presently - thanks to a recent and important Db2 for z/OS enhancement - kind of mis-labeled. The DDF-related work in an accounting long report with data ordered by connection type is distinguished by the label CONNTYPE: DRDA. How is that a bit misleading? Well, the native REST interface to Db2 for z/OS (the enhancement to which I referred previously - designed into Db2 12 and retrofitted to Db2 11) is an extension of the Db2 distributed data facility (DDF's long name). What that means - as pointed out in a blog entry I posted a few months ago - is that there are now two paths into DDF: the DRDA path (which application developers might think of as the SQL path) and the REST path; thus, CONNTYPE: DRDA is a little off because what you see in that part of an order-by-connection-type Db2 monitor accounting report is all DDF-related activity on the Db2 subsystem of interest - activity associated with DRDA requesters and activity associated with REST clients.

What should the designator of DDF-related activity in an order-by-connection-type Db2 monitor accounting report be? CONNTYPE: DRDA+REST? CONNTYPE: DDF? Or, should there be a CONNTYPE: DRDA for work that is strictly related to actual DRDA requesters and a CONNTYPE: REST for activity related to REST clients?

I'll let the Db2 monitor developers figure that out. Meanwhile, if you have a Db2 subsystem on which there is a good bit of activity related to both DRDA requesters and REST clients, and you want to see, in one Db2 monitor accounting report, just the REST-related activity, how can you do that? If all the REST clients authenticated to the Db2 subsystem using only two distinct authorization IDs - for example, ABCID and XYZID - then you could get your REST-only accounting report by telling the Db2 monitor to include, in the accounting report, only activity related to those primary authorization IDs. The way you'd convey that request to OMEGAMON for Db2 would be through this specification in the report control statement:

INCLUDE (PRIMAUTH(ABCID XYZID))

Obviously, that approach becomes unwieldy if there are, say, a hundred or more IDs used by Db2 REST clients.

Is there a better way? Yes, there is. It so happens that a Db2 accounting trace record associated with a REST client interaction will include, in the correlation ID field (the field name is QWHCCV), the value 'DB2_REST'. Thus, one can easily get a Db2 monitor accounting report showing all REST-related activity for a subsystem, and only that activity, by informing the monitor that the report should include data only for DDF-related requests for which the correlation ID value is DB2_REST. Again using OMEGAMON for Db2 report command syntax (which is what I know), that would be done with this specification:

INCLUDE(CONNTYPE(DRDA)
        CORRNAME(DB2_REST))

And there you have it. I hope that this information will be useful to you, should your site start seeing a significant amount of Db2 for z/OS access via the REST interface (and I think that a substantial number of sites will see just that in the near future - the REST interface has generated a lot of interest among application developers).

Friday, January 4, 2019

A Case Study: Measuring the Impact of Db2 for z/OS Buffer Pool Changes

Not long ago, a Db2 for z/OS administrator contacted me with a request. His team had made some changes to the buffer pool configuration in their production Db2 environment (the overall size of the buffer pool configuration was increased to reduce read I/O activity, and greater use was made of large real storage page frames), and they were not sure as to how the performance impact of these changes could be measured. Could I assist with that analysis?

Sure, I said. I asked the Db2 admin to use his Db2 performance monitor tool to generate detailed accounting reports (depending on the monitor used, such a report might be titled ACCOUNTING REPORT - LONG or ACCOUNTING SUMMARY - LONG) capturing activity during periods preceding and following the buffer pool changes - same FROM and TO times, same day of the week for both reports (to make the comparison as "apples to apples" as possible). I also requested that the reports be generated so as to have data aggregated at the connection type level (again, terminology can vary by monitor product - your instruction to the monitor might be to "order" the data by connection type, or to "group" the data by connection type). Why do I like for data in a detailed accounting report to be aggregated in this way? Well, I find the default aggregation specification for Db2 monitor-generated accounting reports - authorization ID within plan name - to be too fine-grained for effective analysis of a Db2 system in an overall sense; conversely, aggregation at the subsystem ID level is generally too coarse-grained for my liking. When data in a detailed accounting report is aggregated at the connection type level, you get within the larger report a sub-report for each connection type used in the Db2 for z/OS system of interest; for example: a sub-report showing all CICS-Db2 work in the system, one showing all batch work that uses the call attachment facility, one showing all batch work using the TSO attachment facility, one showing all DDF-related work (connection type: DRDA), etc. With such a report in hand (or on your screen), you can quickly and easily identify the largest Db2 workload components (with "component" here referring to work associated with a given connection type) and focus your analysis accordingly.
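
By way of illustration, here is roughly what such a report request could look like in OMEGAMON for Db2 terms - treat this as a sketch (the FROM/TO values are obviously made up, and the exact command syntax should be checked against your monitor's reporting reference):

ACCOUNTING
  REPORT
    LAYOUT(LONG)
    ORDER(CONNTYPE)
    FROM(01/04/19,08:00:00.00)
    TO(01/04/19,10:00:00.00)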

I showed the Db2 admin and one of his colleagues how to put the aforementioned "before" and "after" Db2 monitor accounting reports to use in measuring the impact of the effected buffer pool configuration changes, and I'll provide that knowledge transfer here in the remainder of this blog entry.

First, I look over the sub-reports (by connection type) within the larger "before" and "after" detailed accounting reports to find the workload components (connection types) that dominate in terms of size. "Size" refers in this case to aggregate in-Db2 CPU cost of SQL statement execution, and that's easy to determine: for a given workload component (a given connection type), find first the average in-Db2 (i.e., class 2) CPU time (done by summing two values: average class 2 CP CPU time, which is general-purpose CPU time, and average class 2 SE CPU time, which is "specialty engine," or zIIP, CPU time). Next, take that average class 2 CPU time for the connection type, and multiply it by the number of occurrences (typically, number of accounting records), because "average" is "average per occurrence." Here is an example of what you might see in one of the sub-reports (this for a DRDA workload), with the numbers I've referenced highlighted in red:

CONNTYPE: DRDA

AVERAGE      DB2 (CL.2)                   HIGHLIGHTS
-----------  ----------                   -----------------------
                                          #OCCURRENCES  : 3620821
CP CPU TIME    0.000169
SE CPU TIME    0.000223

So in this case, aggregate CPU cost of SQL statement execution (in-Db2, or class 2, accounting time is time associated with SQL statement execution) for the DRDA workload component is (0.000169 + 0.000223) * 3620821 = 1419 CPU seconds.

Performing this simple calculation showed that for the overall Db2 workload I was analyzing, the two biggest components by far were the DRDA and CICS-Db2 connection types; so, I didn't bother with the other much-smaller components (call attachment facility, TSO attachment facility, etc.).

For the two major workload components (DRDA and CICS-Db2), I compared "before" and "after" figures for average class 2 CPU time. That is the key metric for evaluating the impact of a Db2 system or application performance tuning action. Too often, Db2 for z/OS people effect a change aimed at improving workload performance and look, for evidence of improvement, at the total class 2 CPU time for the workload of interest (and by the way, I focus on class 2 CPU time because that is what a Db2 person can influence - class 1 CPU time includes time outside of Db2, over which a Db2 person may not have much control). "Before" and "after" figures for total class 2 CPU time for a workload may show little or nothing in the way of difference, leading a Db2 person to think that a performance tuning action accomplished nothing. In fact, it may be that the tuning change was indeed successful. How could that be, if total in-Db2 CPU time did not decrease? Simple: look for a change in throughput (the number of commits is a pretty good indicator of "pieces of Db2 work" completed on a system). A Db2 tuning change might reduce average in-Db2 CPU time per transaction (or, more broadly, per unit of work), and that might result in greater throughput (units of work completed in a given time period), and that (more transactions completed, at less in-Db2 CPU time per transaction) might result in a non-decrease in the total in-Db2 CPU time for the workload. In my experience, greater throughput with no increase in total in-Db2 CPU time for a workload is a GOOD thing. If you see increased throughput following a Db2 performance tuning change and your organization actually wants to keep throughput at the lower "before" level, that can be accomplished by limiting the availability of Db2 threads for the workload in question - essentially, you keep a lid on throughput by introducing some queuing for Db2 threads.

OK, so your focus as a Db2 person should be on reducing average class 2 CPU time for a workload, as indicated in a Db2 monitor-generated accounting report (or an online monitor display of thread activity on the system - but I prefer to look at accounting reports). In tracking this, make sure that you look at "average total" class 2 CPU time, meaning the sum of average class 2 general-purpose CPU time (usually labeled CP CPU time) and average class 2 zIIP CPU time (usually labeled SE CPU time). Always check the average class 2 zIIP CPU time, even for non-DDF-related applications that you might not think of as drivers of zIIP utilization, because that zIIP time figure could be non-zero (for example, if some queries in a batch job are parallelized by Db2, some of the batch job's class 2 CPU time will be consumed on zIIP engines, because the "pieces" of queries parallelized by Db2 are zIIP-eligible). As indicated by the average-class-2-CPU-time yardstick, how did the Db2 team with which I worked do, with regard to improving application workload performance?

They did quite well:

                      Avg cl 2 CPU (GP + zIIP) - seconds
Workload component      Before     After        Change
------------------    --------  --------  ------------
CICS-Db2              0.001322  0.001245   6% decrease
DRDA                  0.000392  0.000259  34% decrease

That 6% improvement in average class 2 CPU time for CICS-Db2 transactions is close to what I am accustomed to seeing as a result of buffer pool actions such as those taken by the Db2 team in the case of which I'm writing: larger buffer pools, and greater use of page-fixed pools backed by large real storage page frames, tend to shave a few percentage points off of average class 2 CPU times reported by a Db2 monitor. A modest improvement of that nature is generally welcome, as it requires no application code changes and positively impacts a large number of programs (it's the kind of change that I put in the "rising tide lifts all boats" category).

What about the 34% improvement in class 2 CPU efficiency seen for the DRDA workload as a result of the buffer pool changes? Why such a large improvement? Here's my theory: as is true in many cases, the SQL statements associated with the DRDA workload at this site are overwhelmingly dynamic in nature (that is commonly seen when DDF transactions do not make heavy use of static SQL-issuing stored procedures). Dynamic SQL statements can be automatically re-optimized on a regular basis, particularly when a "bounce" (stop and restart) of a Db2 subsystem requires total re-population of the Db2 dynamic statement cache in memory (where the prepared-for-execution forms of dynamic SQL statements are stored for re-use). The Db2 subsystem on which this blog entry is focused was indeed bounced on the weekend during which the buffer pool changes were put into effect, so ALL of the dynamic SQL statements coming into the system following that Db2 stop/restart were optimized at first execution. And here's the thing: it's not just indexes and catalog statistics that influence SQL statement access path selection. Db2 memory resources - particularly buffer pool resources - are another factor. I have a feeling that when the dynamic SQL statements (especially queries) associated with the DRDA workload were re-optimized following the significant changes made to the Db2 subsystem's buffer pool configuration, some new and better-performing access paths were generated. Further evidence of positive access path changes: average GETPAGEs per transaction for the DDF workload decreased by 24% following the buffer pool changes. GETPAGE activity is a major determinant of the CPU cost of SQL statement execution, and a substantial decrease in such activity is often indicative of access path changes.

This raises a question: would a rebind of the packages associated with the CICS-Db2 workload (a workload for which, at this site and many other sites, almost 100% of the SQL statements are static) following the buffer pool changes have led to per-transaction CPU efficiency gains going beyond the 6% improvement that we saw? Quite possibly. That's something to think about. If you do decide to rebind static SQL packages following some significant buffer pool changes, keep a couple of things in mind. First, rebinding with APCOMPARE(WARN) will cause Db2 to let you know (via message DSNT285I) when access paths change for one or more statements associated with a package (information about access path changes is provided in the REMARKS column of the PLAN_TABLE when a package is rebound with EXPLAIN(YES)). Second, if access paths change and package performance gets worse as a result (not likely, but possible), you can quickly and easily put the previous instance of the package back into use through a REBIND with SWITCH(PREVIOUS), assuming that plan management is in effect for your Db2 subsystem (and it is "on" by default - check the value of the PLANMGMT parameter in the ZPARM module). Together, these Db2 features are aimed at "taking the fear out of rebinding," as a former colleague of mine so aptly put it.
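
A quick sketch of those two rebinds (the collection and package names are hypothetical):

REBIND PACKAGE(MYCOLL.MYPKG) APCOMPARE(WARN) EXPLAIN(YES)

REBIND PACKAGE(MYCOLL.MYPKG) SWITCH(PREVIOUS)

The first rebind flags and documents any access path changes; the second would be issued only if a changed access path degraded the package's performance.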

So, there you go. If you make Db2 changes aimed at boosting performance for a workload, measure the result using the average class 2 CPU time figures provided in Db2 monitor-generated accounting detail reports that show activity from "before" and "after" time periods. Buffer pool changes (larger pools, greater use of page-fixed pools backed by large real storage page frames) usually reduce average class 2 CPU time by a few percentage points. Greater performance gains could be seen if the buffer pool changes enable Db2 to choose new and better-performing access paths for SQL statements. Those statement re-optimization actions tend to be automatic for dynamic SQL statements (especially if a Db2 subsystem's dynamic statement cache has to be re-populated). For static SQL, package rebind is needed for statement re-optimization. Rebinding with APCOMPARE(WARN) and EXPLAIN(YES) will provide information about access path changes, and in the unlikely event of a negative performance result following re-optimization, rebinding with SWITCH(PREVIOUS) puts the previous instance of the package back into use.

Friday, December 28, 2018

Db2 for z/OS Partitioned Table Spaces: How the Number of Partitions can Impact Performance

This past summer, I delivered a couple of presentations at a meeting of a regional Db2 users group in the USA. During the lunch break, I shared a table with a member of the Db2 for z/OS support team at a large financial institution. He told me an interesting story. At his site, the Db2 for z/OS team had decided that universal partition-by-range (PBR) would be a better table space type for a large table currently housed in a partition-by-growth table space (PBG). To prove out this assessment, the team created a copy of the table of interest in a PBR table space, and ran some batch jobs that targeted the two variants of the table - the same jobs (some that changed data, others that only read data), accessing data that was identical in a logical sense (i.e., same data records in the tables in the two different table spaces). The performance characteristics of the PBG-accessing and the PBR-accessing jobs were compared.

The findings? The PBG-accessing jobs showed consistently better elapsed and CPU times versus the PBR-accessing jobs. That was not a result I would have expected. I've blogged before about reasons for favoring PBR over PBG for large Db2 tables, and some of those reasons are performance-related. Would I have been surprised by comparative results showing roughly equivalent performance for PBG-accessing and PBR-accessing jobs? Maybe not. But better performance for the PBG-accessing jobs? That had me scratching my head.

I asked the Db2 person at the user group meeting to send me Db2 performance monitor accounting "long" reports capturing activity for the PBR-accessing and PBG-accessing jobs, and he did that not long after getting back to his workplace. Analysis was focused on the package-level accounting data for a particular package that accessed only the table with the PBR and PBG variants (other packages associated with the batch jobs used for comparison accessed the PBR-and-PBG-variant table and several other related tables as well). Sure enough, for the "this table only" package the CPU time was about 6% higher for the PBR-housed table versus the PBG-housed table. Not a huge difference, but statistically significant and, as previously mentioned, unexpectedly favoring the PBG table space.

I started asking questions, looking for a possible explanation for the better performance result seen for the PBG-accessing package:

  • Was the SEGSIZE specification the same for both table spaces (a larger SEGSIZE value can result in better CPU-efficiency for large-scale page scans)? Yes, same SEGSIZE for both table spaces.
  • Was the PBR table space in fact universal, as opposed to being a non-universal table-controlled partitioned table space (I wanted to make sure that we were comparing universal to universal)? Yes, the PBR table space was indeed of the universal variety.
  • Were the PBG-housed and PBR-housed tables clustered by the same key (clustering affects locality of reference for set-level query and update and delete operations)? Yes, both variants of the table had the same clustering key.
  • Were there indexes defined on the same keys of both tables (obviously, indexes have a lot to do with query access paths, and they affect the CPU cost of insert and delete and some update operations)? For both the PBG-housed and the PBR-housed table, there was only one index, and it was defined on the same key.
  • Were the PCTFREE and FREEPAGE specifications the same for the one index on the two table variants (a significant difference here could affect index GETPAGE counts)? Yes, for the one index defined on the same key of both table variants, the PCTFREE and FREEPAGE values were identical.
  • Was the one index on the PBR-housed table a partitioned index or a non-partitioned index (NPI)? It was a partitioned index, and by the way the PBR table space had 318 partitions (me, on getting that piece of information: "Hmmmm").

OK, so what prompted my "Hmmmm" on learning the number of the PBR table space's partitions? That number got me thinking, "318 is a pretty substantial number of data sets. Depending on the number of data sets associated with the PBG table space, that could be an important differentiating factor between the two." Indeed, it turned out that the PBG table space had far fewer data sets versus its PBR cousin. How could that influence CPU time for an application process? In this case, a clue came from a number in the aforementioned Db2 monitor accounting reports - the commit count for the batch process that accessed the table of interest: 204,004 over the 4-hour, 50-minute reporting interval (a little over 700 per minute). What does that have to do with the job's CPU time? Well, when the RELEASE specification for a package is COMMIT (the default), any and all "parent" locks acquired in executing the package will be released every time the application process driving the package's execution issues a commit (the package itself is also released from the application process's thread at each commit). We tend to think of "parent" locks as being table space-level locks, but in fact when a table space is partitioned the parent locks are at the partition level. If the application process's SQL statements are such that a large number of partitions are accessed in each unit of work, that means a lot of partition-level locks are acquired within each unit of work and released at commit time. That parent lock acquisition and release cost can be yet a bit higher in a data sharing environment (and such was the case in the situation I'm describing), because a significant percentage of global locks tend to be of the parent type.

I relayed to the Db2 support person my thinking about the impact that the number of PBR versus PBG partitions might have on the application process's CPU time. He got together with the larger Db2 team at his site, and they decided to repartition the PBR table space in a way that dropped the number of partitions from 318 to 58, and voila - the application process's CPU time dropped to a level that was several percentage points below that seen for the PBG table space. Success!

Now, it's important to note that in this particular situation, a re-partitioning of the PBR table space was not a problematic change. What if you have a situation like the one I've described here, and re-partitioning your PBR table space is not a viable option? In that case, the in-Db2 CPU cost of an application process (likely a batch job) that accesses a large number of a table space's partitions within each unit of work could be reduced (especially if the job issues a large number of commits, resulting in a large number of units of work) via a rebinding of the associated Db2 package(s) with RELEASE(DEALLOCATE). That package bind option would cause the partition-level locks acquired as the batch job progresses to be retained until thread deallocation time (i.e., until end-of-job). That, in turn, would eliminate the overhead of releasing many partition-level locks at each commit (which locks would likely be re-acquired within the next unit of work), thereby reducing CPU time (I posted an entry on RELEASE(DEALLOCATE) considerations to this blog a few years ago - the part under the heading "Batch" is most relevant to the blog entry you're reading now). [Note: table space- and partition-level locks are almost always of the "intent" variety. Such locks are non-exclusive in nature, so retaining them across commits should not be a concern from a concurrency perspective.]
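
In concrete terms, such a rebind could look like this (the collection and package names are hypothetical):

REBIND PACKAGE(BATCHCOL.BIGBATCH) RELEASE(DEALLOCATE)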

The bottom-line message here is NOT that you should always go for fewer rather than more partitions for a partitioned table space; rather, it is that the number of a table space's partitions can be a factor that affects CPU efficiency for some applications. A type of application for which this can matter is one that a) uses RELEASE(COMMIT) packages, b) is characterized by many units of work in the life of a thread, and c) accesses a large percentage of a table space's partitions in a typical unit of work. For such an application, CPU efficiency could potentially be improved by going with a smaller rather than a larger number of partitions for the target table space. When significant partition reduction is not an option for a table space, CPU time for an application process such as the one described in this blog entry could potentially be reduced by rebinding the application's Db2 packages with the RELEASE(DEALLOCATE) option.

Friday, November 30, 2018

Db2 for z/OS Global Variables: What is a "Session"?

Has this ever happened to you? You're looking at the Db2 for z/OS SQL Reference (or in the Db2 for z/OS Knowledge Center on the Web), and you see something like this (and the quote below is from the description of the CREATE VARIABLE statement in the SQL Reference, with highlighting added by me):

Global variables have a session scope. Although they are available for use to all sessions that are active at the current server, the value of the global variable is private for each session.

You think to yourself, "OK. That's good to know." And then, "But wait - what's a 'session'?" You scour the Db2 documentation for the definition of "session", and you come up empty. You're left scratching your head. "Is a 'session' a thread? A transaction? Something else?" I am writing today to remove this particular cause of head-scratching (your scalp can thank me later). I will start by giving you the short answer, and then I'll elaborate.

Short answer: the meaning of "session", in a Db2 for z/OS context, depends on the nature of an application's connection to a Db2 server. For "local-to-Db2" applications (meaning, applications that execute in the same z/OS LPAR as the target Db2 system, and are not network-connected to Db2 - examples include CICS transactions and batch jobs), "session" equates to "thread" (referring to the Db2 thread used by the application). For DRDA requesters (applications that access Db2 for z/OS via the Db2 DDF address space, using the DRDA protocol by way of an IBM driver such as the IBM Data Server Driver or Db2 Connect), "session" equates to the logical connection an application has established with the Db2 server. For a REST client (an application interacting with a Db2 system via the REST interface to the DDF address space), "session" equates to transaction.

You may wonder, "Why these differences between local-to-Db2, DRDA requesters and REST clients, with regard to the Db2 meaning of 'session'?" Read on for explanatory information.

Local-to-Db2 applications

Let's start with an easy case: a Db2-accessing batch job (and I'm not talking about a Java batch job that could use a type 4 JDBC driver and therefore be network-connected to Db2, from Db2's perspective). This job will get a Db2 thread at the first issuance of a SQL statement, and that thread will persist until end-of-job. The job might issue multiple COMMITs, but it's always the same thread that's being used for the life of the job. If the batch job involves use of a Db2 global variable, the initial value of the global variable will be its default value (for a user-created Db2 global variable, the default value will be null unless a different default was specified when the variable was created). If the job sets the value of the global variable to X, that will be the global variable's value for the duration of the job, unless it is subsequently changed by the job to some other value. If another, concurrently executing batch job sets the same global variable to value Y, the other batch job sees Y in the global variable, not X, because the two batch jobs are associated with two different Db2 sessions. Pretty simple.
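
Here is a minimal SQL sketch of the behavior just described (the schema and variable names are hypothetical):

CREATE VARIABLE MYSCHEMA.GVAR VARCHAR(20) DEFAULT NULL;

-- In batch job 1:
SET MYSCHEMA.GVAR = 'X';
SELECT MYSCHEMA.GVAR FROM SYSIBM.SYSDUMMY1;   -- returns 'X' for the life of this job's thread

-- In concurrently executing batch job 2 (a different session):
SET MYSCHEMA.GVAR = 'Y';
SELECT MYSCHEMA.GVAR FROM SYSIBM.SYSDUMMY1;   -- returns 'Y', not 'X'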

How about a CICS or an IMS transaction? Still simple, absent thread reuse (I'll get to the thread reuse situation momentarily). The transaction, like a batch job, gets a Db2 thread at the first issuance of a SQL statement, and the thread persists until end-of-transaction. If the transaction involves use of a Db2 global variable, the global variable will initially have its default value. If the transaction sets the global variable to some value, that value will persist, unless subsequently changed by the transaction, until the transaction completes processing and its Db2 thread is deallocated. If two concurrently executing transactions set the value of the same Db2 global variable to X and Y, respectively, the one transaction will see the value X in the global variable and the other transaction will see the value Y in the global variable, because the two transactions are associated with two different Db2 sessions. Again, pretty simple.

Things get a little more interesting in the case of thread reuse by CICS or IMS transactions. Thread reuse is good for transaction CPU efficiency (especially when paired with Db2 packages bound with RELEASE(DEALLOCATE)), but it changes the effective meaning of "session" in a Db2 for z/OS context. Why? Because absent thread reuse, there is a one-to-one correspondence between transactions and Db2 threads. In a transactional thread reuse situation, there is a many-to-one relationship between transactions and threads - in other words, multiple transactions reuse the same CICS-Db2 or IMS-Db2 thread (that's the point of thread reuse). What does this mean for a given transaction program? Well, let's say that one transaction sets the value of Db2 global variable GVAR to X. The transaction completes, and another transaction reuses the same Db2 thread. With regard to that second transaction, what will be the initial value of the Db2 global variable GVAR? Will it be GVAR's default value? NO - it will be X, the value placed in GVAR by the transaction that previously used the thread. Why is this so? Because the scope of a Db2 global variable is a session, and for a local-to-Db2 application, "session" equates to "thread", and in a thread reuse situation multiple transactions will use a particular thread. The moral to this story: if you have a transaction program that puts sensitive information in a Db2 global variable, and this is (or could be) a thread reuse situation, you'd better make sure that you re-set the global variable to its default value before end-of-transaction; otherwise, a transaction that subsequently uses the same thread will be able to see the sensitive information your transaction placed in the global variable.
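
Using the hypothetical variable from the earlier sketch, that defensive re-set is a one-liner (the DEFAULT keyword restores the variable's default value):

SET MYSCHEMA.GVAR = DEFAULT;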

DRDA requesters

This situation gets really interesting, because the whole application-thread relationship is quite a bit different versus the CICS-Db2 or IMS-Db2 case. A DRDA client application will establish a connection to a Db2 for z/OS server, and very typically the application will stay connected to the Db2 system for a considerable period of time (and it may establish several connections to the Db2 server). Getting more specific, it is the "logical connection" that matters here - the connection that the application perceives that it has with the Db2 server. I say "logical connection" because physical connectivity to the Db2 server involves things like "transports" provided by the IBM Data Server Driver (or Db2 Connect) - something that makes connection pooling possible through the servicing of n logical connections to a Db2 server with fewer-than-n physical connections (connection pooling is good for client-server application scalability and efficiency).

How about threads? Well, a given logical connection between a DRDA client application and a Db2 server will likely involve use of any number of Db2 threads. How so? Consider the "regular" DBAT case (DBATs - database access threads - are the kind used for applications accessing Db2 via the DDF address space, and by "regular" I mean DBATs that are not of the high-performance variety). A DRDA client application has established a logical connection to a Db2 server, and a transaction uses that connection. To service the transaction, Db2 takes the application's logical connection from the inactive state (its state when not being used by a transaction) to the active state, and assigns to the now-active connection a DBAT from the DBAT pool. The transaction completes, the DBAT is separated from the connection and goes back to the DBAT pool, and the connection goes back to the inactive state until it's needed for another transaction.

OK, how about the high-performance DBAT case? A high-performance DBAT is instantiated when a Db2 package bound with RELEASE(DEALLOCATE) is allocated to a "regular" DBAT for execution (in other words, the "regular" DBAT becomes a high-performance DBAT in that situation). Once instantiated, the high-performance DBAT will NOT go back into the DBAT pool at end-of-transaction; instead, it will remain dedicated to the connection through which it was instantiated (a connection identified with an application server IP address and an ephemeral port), and will be reused, up to 200 times, by transactions associated with the connection (thus, high-performance DBATs deliver performance benefits similar to those provided by CICS-Db2 protected entry threads and threads between IMS wait-for-input (WFI) and pseudo-WFI regions and a Db2 system).
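
As an aside: whether RELEASE(DEALLOCATE) packages actually instantiate high-performance DBATs can be controlled at the Db2 subsystem level via the -MODIFY DDF command. A sketch (the annotations are mine):

-MODIFY DDF PKGREL(BNDOPT)   <- honor RELEASE(DEALLOCATE), so high-performance DBATs can be used
-MODIFY DDF PKGREL(COMMIT)   <- treat packages as RELEASE(COMMIT), so high-performance DBATs are not used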

In a Db2 data sharing system, in which something called Sysplex workload balancing is active (and it's active by default with the IBM Data Server Driver and Db2 Connect), the situation gets even more interesting: a DRDA client application establishes a connection to the Db2 data sharing group (at least it should, versus connecting to an individual group member), and different transactions associated with the same logical connection to the Db2 system end up executing on different members of the data sharing group, using DBATs (regular and/or high-performance) provided by the individual member Db2 subsystems.

I've provided, above, "behind the curtain" information on the workings of Db2 for z/OS as a DRDA server, and now I'll tell you that all this DBAT stuff is essentially irrelevant to the meaning of "session" in a DRDA requester context. The stuff that Db2 for z/OS does with regular and high-performance DBATs, and with active and inactive connections, is invisible to a DRDA client application. It's kind of like the difference between logical and physical database design: a Db2 table looks the same to an application program, regardless of whether it's in a traditional segmented or a universal partition-by-growth or a range-partitioned (universal or otherwise) table space. That's Db2 plumbing. Because a given logical connection to a Db2 server is what it is from the DRDA client application perspective, regardless of the number and/or type of DBATs used to service the transactions associated with the connection, it absolutely would not make sense to equate "session" with "thread" in this case. If a DRDA transaction using a logical connection to a Db2 server sets the value of Db2 global variable GVAR to X, that value will remain X (unless subsequently changed) for the life of the logical connection. In that sense, the situation is similar to the one described above for CICS-Db2 or IMS-Db2 thread reuse: if you do not want the value placed in a global variable by a DRDA transaction to be seen by a subsequent transaction associated with the same logical connection to the Db2 system, reset the value of the global variable before end-of-transaction.

If two transactions associated with two different logical connections between a DRDA client application (or applications) and a Db2 system place the values X and Y, respectively, in the same Db2 global variable, the one transaction will see X in the global variable and the other will see Y, because the scope of a global variable is a session, and two different logical connections to a Db2 server system are two different sessions.

REST clients

This one's pretty easy: for a client transaction using Db2's native REST interface, "session" equates to transaction. Period. Why so comparatively simple? Because REST is a so-called "stateless" model. Using the REST interface, each and every interaction with Db2 gets a "logical fresh start" on the Db2 side, and that means, among other things, automatic resetting of any global variable set by a REST-using transaction back to its default value at end-of-transaction. If Db2 global variable GVAR is set to X by a REST-using transaction, a subsequent REST-using transaction will not see X in GVAR at transaction start time - it will see GVAR's default value. If two different, concurrently executing REST-using transactions put X and Y, respectively, in the same Db2 global variable, the one transaction will see X in the global variable and the other will see Y, because the scope of a global variable is a session, and each and every REST-using transaction is associated with a different Db2 session.

In summary...

Keep the simple answer to the Db2 session question in mind:
  • For local-to-Db2, non-DDF-using applications: session = thread
  • For DRDA client applications: session = logical connection to Db2 server
  • For REST clients: session = transaction
I hope that this information will be helpful to you, and that you can take "Db2 session" off of your head-scratcher list.

Wednesday, October 31, 2018

Db2 for z/OS Buffer Pools: Clearing the Air Regarding PREFETCH DISABLED - NO READ ENGINE

Has this ever happened to you? You're looking at the output of a Db2 for z/OS -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command, or maybe at a statistics report produced by your Db2 monitor, or maybe your Db2 monitor's online display of a subsystem's buffer pool activity, and you see for one of your buffer pools something like this:

PREF.DISABLED-NO READ ENG   350.00

And you think, "Uh-oh. Prefetch is a good thing, so disabled prefetch must be a bad thing. And it happened 350 times for BP2 in one hour! AHHHHH! CODE RED! WE'RE OUT OF READ ENGINES!"

Hey - take a few deep breaths, and calm down. Chances are, this is not a big deal. In this blog entry, I'll explain why.


First: what does it mean?

OK, a lot of people know that prefetch operations - sequential, list, and dynamic - are handled by Db2 tasks. That is to say, the CPU cost of processing prefetch requests is charged to Db2 (specifically, to the Db2 database services address space, also known as DBM1), as opposed to being charged to the application process on behalf of which Db2 is executing the prefetch operations. We sometimes refer to Db2's prefetch-handling tasks as "prefetch read engines." They are represented in a z/OS system by what are known as preempt-able SRBs (service request blocks - a type of z/OS control block), and that's what makes the work that they do zIIP-eligible (100% zIIP-eligible, in fact - a major reason why, in most systems, the bulk of CPU time charged to DBM1 is zIIP engine time).

There are - surprise! - a finite number of prefetch read engines associated with a given Db2 subsystem. That number is 600 in a Db2 11 environment, and 900 in a Db2 12 system. If all 600 Db2 11 prefetch read engines (or all 900 Db2 12 engines) are busy handling prefetch requests, and another prefetch request comes along, that prefetch request will be abandoned and the PREFETCH DISABLED - NO READ ENGINE counter will be incremented (externalized via the QBSTREE field of trace record IFCID 0002, written when Db2 statistics trace class 1 is active, and also part of the output of the Db2 command -DISPLAY BUFFERPOOL with DETAIL). And what does it mean when a prefetch request is abandoned because there was not a prefetch read engine available to process the request? It could mean that pages that would have been brought into memory via the abandoned prefetch request will subsequently be read into a buffer pool by way of a synchronous (i.e., single-page, on-demand) read operation, but that is not necessarily the case. Read on.
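
If you want to check this counter for a pool yourself - BP2, in the hypothetical example below - a command like this will do it (DETAIL(INTERVAL) reports activity since the previous such display for the pool; DETAIL(*) reports activity since the pool was last allocated):

-DISPLAY BUFFERPOOL(BP2) DETAIL(INTERVAL)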


Second: do you care?

One thing to know up front: if you see a non-zero value for PREFETCH DISABLED - NO READ ENGINE for one or more of your Db2 buffer pools, you're not alone. This situation is not super-common, but neither is it highly unusual. In fact, we've been seeing more incidences lately of non-zero values for PREFETCH DISABLED - NO READ ENGINE. Why? Several reasons. One is the trend of Db2 data sharing group member consolidation. As IBM Z servers have become more powerful and as Db2's vertical scalability has been enhanced, organizations running Db2 in data sharing mode on Parallel Sysplex clusters have found that they can support an application workload with (for example) 4 members in a data sharing group versus 6. When work that had been spread across n data sharing members now runs on a smaller number of members, that can mean more concurrently active prefetch read requests for a given member, and that can lead in some cases to prefetch read requests exceeding the number of prefetch read engines available on a subsystem. Other factors driving increased levels of prefetch read activity include Db2 query optimizer changes that drive, in particular, more list prefetch-related access path selection; the I/O parallelism for index updates that Db2 10 introduced; and row-level sequential detection, also introduced with Db2 10, which enables Db2 to continue to use dynamic prefetch even as a table space's data rows become somewhat disorganized as a result of data-change activity.

So, if you see non-zero numbers for PREFETCH DISABLED - NO READ ENGINE for one or more of your buffer pools, should you try to do something about it? Maybe, maybe not. First, check a related counter: PREFETCH DISABLED - NO BUFFER. If that value is also non-zero, it's likely that the buffer pool in question is too small, and/or that VPSEQT (the virtual pool sequential threshold) has been changed from its default value of 80 to a too-small percentage of the pool's buffers. Respond to that situation by enlarging the pool, if the z/OS system's real storage resource is not constrained (system memory is not constrained if the z/OS LPAR's demand paging rate - available, among other places, in an IBM RMF CPU summary report - is zero or a very small non-zero value, like less than 1 per second), and/or by taking the pool's VPSEQT value from a too-small number (if applicable) to something closer to 80. Also consider enlarging the buffer pool if the pool's total read I/O rate is in the hundreds or more per second. How can a larger buffer pool reduce incidences of PREFETCH DISABLED - NO READ ENGINE? Here's how: a larger buffer pool that caches more pages makes it more likely that a prefetch request will NOT drive an associated prefetch read I/O operation. Examine information for your buffer pools, using a Db2 monitor or the -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command. Look at (for example) the dynamic prefetch numbers for the various buffer pools. You will likely see, for at least one of your pools, a situation in which the number of dynamic prefetch requests is larger - perhaps considerably larger - than the associated number of dynamic prefetch reads. How can that be? Simple: let's say that there's a dynamic prefetch request to read 32 pages of a table space or index into memory. What if all 32 of those pages are ALREADY IN MEMORY? In that case, the dynamic prefetch request will not result in a dynamic prefetch read operation. Now, the dynamic prefetch request will still occupy a prefetch read engine, but the request will tie up the prefetch read engine for a much shorter time if it does not involve a prefetch read I/O operation. When prefetch read engines are freed up more quickly thanks to prefetch requests that do not drive prefetch reads, it becomes less likely that all of a Db2 subsystem's prefetch read engines will be busy at the same time.
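
The pool-adjustment actions just described are accomplished with the -ALTER BUFFERPOOL command. A sketch, with a hypothetical pool name and buffer count (keep in mind that VPSIZE is expressed in buffers, so 300,000 buffers in a 4K pool is roughly 1.2 GB of space):

-ALTER BUFFERPOOL(BP2) VPSIZE(300000)
-ALTER BUFFERPOOL(BP2) VPSEQT(80)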

Now, let's say that you have a buffer pool that is large enough to have a relatively low total read I/O rate, and you see for that pool no occurrences of PREFETCH DISABLED - NO BUFFER, but you still see a non-zero value for PREFETCH DISABLED - NO READ ENGINE. Is that a cause for concern? Quite possibly not. To understand this, consider a scenario: some application process is accessing a table space or index in a sequential fashion. A dynamic prefetch request (for example) is initiated by Db2 on behalf of the application process, but all of the subsystem's prefetch read engines are busy handling other requests. The dynamic prefetch request is abandoned. That will lead to synchronous read requests, right? Not necessarily. Suppose the aforementioned application process (or another process accessing the same database object) needs a page that would have been read into memory via the dynamic prefetch request that was abandoned due to a "no read engine" situation. Will a synchronous read operation be required for that page access? NO, if that page is already in the buffer pool. If the requested page is indeed already in memory, there was essentially NO penalty associated with the abandoned dynamic prefetch request - only the very small amount of CPU consumption associated with initiating and subsequently abandoning the prefetch request (so small that I think it would be hard to measure). Here's something you can check, using your Db2 monitor or the output of -DISPLAY BUFFERPOOL(ACTIVE) DETAIL: for a buffer pool that shows, at least sometimes, some occurrences of PREFETCH DISABLED - NO READ ENGINE, look at the total number of synchronous read I/Os, and look also at the total number of synchronous read I/Os that are labeled SYNCHRONOUS READS - SEQUENTIAL. What is a "sequential synchronous read?" It's a synchronous read I/O driven by a process that is accessing data in a sequential fashion. Why might there be synchronous reads for a process accessing data in a sequential manner? One reason: a data row (for example) that is to be FETCHed by a process is way out of place, in a clustering sense. That being the case, the row is not in the quantity of table space pages recently prefetched into memory on behalf of the application process. When the process needs the out-of-place row, the associated page will be synchronously read into the buffer pool, and that action will be recorded as a SYNCHRONOUS READ - SEQUENTIAL. Another reason: an application process accessing data in a sequential fashion has to drive a synchronous read I/O because a page that would have been prefetched into memory was not because the prefetch request was abandoned due to NO READ ENGINE.

If you see PREFETCH DISABLED - NO READ ENGINE go from zero to non-zero for a pool, or if that number goes from n to something considerably larger than n, and sequential synchronous reads do not increase as a percentage of total synchronous reads for the buffer pool, it is likely that the incidences of PREFETCH DISABLED - NO READ ENGINE did not materially impact application performance. In other words, it is likely that pages related to prefetch requests that were abandoned due to NO READ ENGINE were found to be already in memory when subsequently needed by application processes. In that case, occurrences of PREFETCH DISABLED - NO READ ENGINE are really nothing about which you should be concerned.

One more thing: if you have a buffer pool that is already big enough to have a relatively low total read I/O rate, and the pool has nothing in the way of PREFETCH DISABLED - NO BUFFER, and you see for this pool some occurrences of PREFETCH DISABLED - NO READ ENGINE, and you'd really like to try to take that to zero, you could look at the possibility of taking some objects in the pool that are a) not humongous and b) frequently accessed via prefetch, and reassigning them to a pool defined with PGSTEAL(NONE). If the PGSTEAL(NONE) buffer pool is large enough to hold all pages of all objects assigned to the pool (and that's the objective for a PGSTEAL(NONE) pool), there should be few, if any, prefetch requests associated with the pool, and that decrease in prefetch request activity might reduce the incidence of PREFETCH DISABLED - NO READ ENGINE you've seen for other pools. Something to consider.
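
A sketch of that kind of reassignment follows (the pool, database, and table space names are hypothetical; ALTER INDEX works similarly for indexes, and the buffer pool change for an object takes effect per the usual ALTER TABLESPACE/ALTER INDEX rules):

-ALTER BUFFERPOOL(BP9) PGSTEAL(NONE)

ALTER TABLESPACE MYDB.SMALLTS BUFFERPOOL BP9;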

So, don't freak out if you see some occurrences of PREFETCH DISABLED - NO READ ENGINE for a buffer pool. It may be having little - if any - impact on application performance. If you do think that non-zero values for PREFETCH DISABLED - NO READ ENGINE are leading to increased synchronous read activity (check for increases in sequential synchronous reads), consider enlarging the buffer pool in question, and maybe consider moving some frequently-accessed, small- to medium-sized objects to one or more PGSTEAL(NONE) pools. No need to panic, in any case. 

Friday, September 21, 2018

The Two Paths to the Db2 for z/OS Distributed Data Facility

It's been almost two years since the general availability date of Db2 12 for z/OS and, by way of that new version, the availability of Db2's native REST interface (retrofitted to Db2 11 for z/OS via APARs PI66828 and PI70477). While the REST interface to Db2 is relatively new, the foundation on which it was built is not: Db2's native REST interface is an extension of the Db2 distributed data facility, also known as DDF. DDF has been around for more than 25 years, providing data services for high-volume, operational, mission-critical applications deployed by organizations in all kinds of industries, all over the world. Given the industrial-strength underpinnings of Db2's REST interface, it's important that this interface be seen as, essentially, a second path into DDF for applications that access Db2 for z/OS via TCP/IP connections. That's the focus of this blog entry.

The two paths into DDF

If the REST interface is one path to DDF, what is the other? Why, the SQL path, of course. Hearing that, a Db2 for z/OS DBA or systems programmer might ask, "Oh, you mean the DRDA path?" Yes, that's technically what I mean, but I think that it would be a good idea to use the words "SQL path" instead of "DRDA path," because the latter term is not relevant to a lot of application developers. DRDA - distributed relational database architecture - is Db2's distributed database protocol, but a client-side developer doesn't see DRDA. The developer sees SQL statements, perhaps in JDBC or ODBC form. Those JDBC or ODBC statements are processed by the IBM Data Server Driver (or Db2 Connect), and are in DRDA form when they get to Db2 for z/OS. So, if an application with a TCP/IP connection to a Db2 for z/OS system wants to access Db2-managed data, it can do that by issuing SQL statements (which, again, might be in the form of JDBC or ODBC statements) or by issuing REST calls.

On the topic of the SQL path to DDF, for a long time that path involved going through a Db2 Connect "gateway" server en route to the target Db2 for z/OS system. Over the past several years, we (IBM) have been recommending to Db2 for z/OS-using organizations that they move away from Db2 Connect gateway servers in favor of the IBM Data Server Driver, which runs on the same server as an application program that is utilizing the SQL path (over TCP/IP) to Db2. The IBM Data Server Driver is lighter weight than Db2 Connect, it has the functionality you want (things like connection pooling, Sysplex workload balancing, etc.), and it delivers performance and system management benefits (both largely owing to the fact that the "hop" formerly required to a Db2 Connect gateway server is eliminated - you go straight in to the Db2 for z/OS system from the application server). Note that the IBM Data Server Driver is not licensed as such: your entitlement to use the IBM Data Server Driver is through your Db2 Connect license. If you have, for example, a Db2 Connect Unlimited Edition for System z license for a Db2 for z/OS system, you are entitled to deploy the IBM Data Server Driver in an unlimited fashion for applications that target said Db2 for z/OS system.

As there are two paths to DDF - SQL and REST - there are also two ways a client program can access Db2's native REST interface: the program could directly access that interface, or get to it through z/OS Connect (when Db2's REST interface is accessed by a client program through z/OS Connect, we say that Db2 for z/OS is acting as a REST provider for z/OS Connect). Given that a client program using the REST path to DDF will be getting to Db2's REST interface regardless of whether or not z/OS Connect is in the picture, you might wonder why your organization would want z/OS Connect to be in the picture. z/OS Connect does, in fact, add a good bit of value when client-side programmers want to go the RESTful route to access z/OS-based data services. Some of the benefits delivered by z/OS Connect:

  • Client-side programmers can code more-intuitive REST calls. When going directly to Db2's native REST interface, you have to use the HTTP verb POST in your REST calls. That doesn't limit what you can do, Db2-wise, in response to a REST call, but a client-side developer might want to use GET in a REST call if the requested service is of a data-retrieval nature, or PUT if the call will cause some data values to be persisted at the data server. With z/OS Connect in the picture, all of the HTTP verbs are available for use in REST calls.
  • Easier creation of RESTful services from SQL statements. Db2's REST interface enables invocation of a single static SQL statement (which could be a data manipulation statement such as SELECT, INSERT, UPDATE or DELETE; or a CALL to a Db2 stored procedure) by way of a REST call. Db2 enables creation of a RESTful service through the Db2 command BIND SERVICE, which can be issued from a batch job (a sketch of a BIND SERVICE invocation appears just after this list), or through DB2ServiceManager, a Db2-provided RESTful service that creates RESTful services from SQL statements. That command and that service certainly work, but creating RESTful services from static SQL statements is even easier using the GUI tooling that comes with z/OS Connect.
  • Swagger-based service description. Client-side developers like to be able to "discover" the RESTful services that are available from a host system. The RESTful service-creation mechanisms provided by Db2 for z/OS (the aforementioned BIND SERVICE command and the DB2ServiceManager RESTful service) allow a service-creator to provide a description of a service in the form of a comment (e.g., "This service returns information about a customer based on a provided customer number"). That's helpful, but with z/OS Connect, service information can be provided to client-side developers in Swagger format - Swagger being an industry-standard specification for describing RESTful services.
  • More-comprehensive capabilities around the management, monitoring, securing, and auditing of RESTful services. Db2 for z/OS provides functionality that addresses all of these areas. z/OS Connect enriches and enhances that functionality.
  • A single entry point for REST-enablement of all kinds of z/OS-based programmatic assets. With z/OS Connect, not only can you REST-enable Db2 SQL statements (which, again, could be stored procedure calls) - you can also REST-enable CICS transactions, IMS transactions, WebSphere Application Server for z/OS transactions, and batch jobs.
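
To make the batch-job route mentioned above a bit more concrete, here is a rough sketch of a BIND SERVICE invocation, issued via the DSN command processor (the service name, the DD name pointing to the SQL statement, and the description are hypothetical, and the exact parameter set should be verified against the Db2 documentation):

BIND SERVICE(SYSIBMSERVICE) -
     NAME("getCustomerInfo") -
     SQLDDNAME(DDSQLIN) -
     DESCRIPTION('Return customer information for a provided customer number')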


Which path to DDF? The SQL path, or the REST path?

Keep in mind that this is not an either/or choice for a Db2 for z/OS system. I expect that, going forward, at many sites you'll see a mix of SQL and REST access to Db2 systems. The question is more relevant, then, in the context of appropriateness/attractiveness for a given application that will access Db2-managed data via TCP/IP connections. Here are some factors that might cause you to lean towards the SQL path:

  • You have client-side developers with a lot of SQL coding experience and expertise, and you want to leverage that capability. In particular, lots and lots of client-side developers know JDBC and/or ODBC (and/or ADO.NET) very well.
  • You want to take advantage of scalability and/or workload management capabilities provided by the IBM Data Server Driver (or Db2 Connect). These capabilities include connection pooling and Sysplex workload balancing functionality.
  • You want client-side programs to be able to dynamically form SQL statements that are then sent to Db2 for z/OS for execution. Oftentimes, SQL statements are hard-coded in client-side programs; sometimes, that is not the case. While it would be technically possible to dynamically form a SQL statement and then pass it as input to a REST-enabled Db2 stored procedure for preparation and execution, issuing dynamically formed SQL statements targeting a Db2 for z/OS system is more easily done via the SQL path to DDF.
  • You need or want client-side control over the scope of transactions. If you want a client-side program to be able to do something like issue SQL | issue SQL | issue SQL | commit (sketched just after this list), the SQL path is the way to go. When using the REST path, every interaction with the server is, from Db2's perspective, a separate unit of work. That's OK for some applications, not OK for others.
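
To put some flesh on that last bullet item's "issue SQL | issue SQL | issue SQL | commit" pattern, here is a simple sketch of a multi-statement unit of work (table and column names are hypothetical):

INSERT INTO ORDER_HEADER (ORDER_NO, CUST_NO) VALUES (:ordno, :custno)
INSERT INTO ORDER_DETAIL (ORDER_NO, ITEM_NO, QTY) VALUES (:ordno, :itemno, :qty)
UPDATE INVENTORY SET QTY_ON_HAND = QTY_ON_HAND - :qty WHERE ITEM_NO = :itemno
COMMIT

Via the SQL path, the three data-changing statements succeed or fail as one transaction, because the client program decides when the commit happens. Via the REST path, each REST call is its own Db2 unit of work, so those three changes could be committed atomically only if they were packaged together - in a single REST-enabled stored procedure, for example.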


How about factors that might cause you to favor use of the REST path to DDF? Among those could be:

  • There is no need for Db2 client code on the application requester side. In some cases, it may not be desirable or feasible to have the IBM Data Server Driver (or Db2 Connect) installed on the client-side application server.
  • There is no need for client-side programmers to know anything about the particulars of a back-end data server. When a client-side programmer codes SQL statements, he or she knows that the back-end data server is a relational database management system (or something that at least looks like a relational DBMS). Maybe the client-side developers for a given application project don't have much in the way of SQL skills; or, maybe they do have SQL skills, but they prefer the high level of back-end system abstraction provided by the REST architectural style (in other words, they like the data-as-a-service programming model). A sketch of a REST-style service invocation follows this list.
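
As for what the REST path looks like from the client side, the sketch below shows a hypothetical invocation of a Db2 RESTful service (host name, port, collection name, service name, and input parameter are all made up; the general URL pattern - /services/collection/service - is the one used by Db2's native REST interface):

POST https://mydb2host.example.com:4711/services/CUSTSVCS/getCustomer
Content-Type: application/json

{ "custno": "1000123" }

All the client needs is the ability to issue an HTTP request and process a JSON response - no Db2 client code, and no SQL.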


Next time a development team is getting ready to build a new application that will access Db2 for z/OS-managed data via TCP/IP connections, or when an existing Db2 for z/OS-based application is going to be reengineered along client-server lines, keep in mind that two paths to the Db2 distributed data facility from such applications are available: the SQL path and the REST path. Work with the development team and determine which path would be best for the project at hand.

Wednesday, August 29, 2018

How Big is Big? (2018 Update - Db2 for z/OS Buffer Pools and DDF Activity)

Almost 5 years ago, I posted to this blog an entry on the question, "How big is big?" in a Db2 for z/OS context. Two areas that I covered in that blog entry are buffer pool configuration size and DDF transaction volume. Quite a lot has changed since October 2013, and it's time for a Db2 "How big is big?" update. In particular, I want to pass on some more-current information regarding buffer pool sizing and DDF activity.

How big is big? (buffer pools)

Back in 2013, when I posted the aforementioned blog entry, the largest buffer pool configuration I'd seen (i.e., the aggregate size of all buffer pools allocated for a single Db2 for z/OS subsystem) was 46 GB. That Db2 subsystem ran in a z/OS LPAR with 180 GB of real storage. Fast-forward to August 2018, and my, how z/OS LPAR memory resources - and exploitation of same via large buffer pools - have grown. The biggest buffer pool configuration for a single Db2 subsystem that I've seen to date? How about 879 GB, in a z/OS LPAR that has 1104 GB (almost 1.1 TB) of central storage. A single pool in that configuration has 66,500,000 buffers of 4 KB each - that's over 253 GB of space in one pool. Does that humongous amount of buffer pool space - over 600 GB of which is page-fixed in memory - put undue pressure on the z/OS LPAR's real storage? No. The demand paging rate for that system (a busy data server, processing about 14,000 SQL statements per second during peak times) is a big fat zero. That's because the 225 GB of memory not used for Db2 buffer pools is plenty for the other real storage requirements in the LPAR.
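
If you want to check my math on that single-pool figure, here it is:

66,500,000 buffers x 4096 bytes per buffer = 272,384,000,000 bytes = about 253.7 GB
1104 GB of real storage - 879 GB of buffer pools = 225 GB for everything else in the LPAR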

What does the organization with the great big Db2 buffer pool configuration get in return for using lots and lots of memory for data and index page caching? It gets tremendous suppression of disk subsystem read I/Os: I saw that for an hour during which the volume of data access activity on the system was really high, the highest total read I/O rate for any of the buffer pools was 48 per second (very low). During that peak-busy time, two of the other buffer pools had total read I/O rates of 15 and 13 per second (very, very low), five pools had total read I/O rates between 2 and 5 per second (super-low), and the other nine active pools had total read I/O rates of less than 1 per second, or even 0 (super-duper low). And what are the payoffs from tremendous suppression of disk read I/Os? CPU savings (every I/O - synch or asynch - consumes CPU time) and improved transaction and batch job elapsed times (in Db2 monitor accounting-long reports, wait time related to synchronous and asynchronous database reads - the latter is labeled "wait for other read" - becomes a very small percentage of in-Db2 elapsed time).

z/OS LPAR memory sizes are getting larger all the time. If you've got it, use it - and using it for big (maybe really, really big) buffer pools can be a great move on your part. In doing that, don't forget to leverage fixed-in-memory buffer pools, large page frames, and maybe "pinning" pools (the latter are used to cache associated database objects in memory in their entirety, and should be defined with PGSTEAL(NONE)).
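
In command terms, that tuning advice might look something like the sketch below (pool names and sizes are hypothetical; PGFIX(YES) page-fixes a pool's buffers in memory, FRAMESIZE(1M) requests 1 MB page frames for the pool, and PGSTEAL(NONE) designates a "pinning" pool):

-ALTER BUFFERPOOL(BP8) VPSIZE(2000000) PGFIX(YES) FRAMESIZE(1M)
-ALTER BUFFERPOOL(BP9) VPSIZE(50000) PGSTEAL(NONE) PGFIX(YES)

For a pinning pool, make VPSIZE at least as large as the total size of the objects assigned to the pool, so that they can be cached in their entirety.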


How big is big? (DDF transaction volume)

In the above-cited "How big is big" blog entry, I noted that the highest DDF transaction rate I'd seen for a single Db2 subsystem (average over a 1-hour period) was 786 per second. A few months ago, I saw data from a Db2 subsystem that was processing 3057 DDF transactions per second (again, that's the average over a 1-hour period) - almost 4 times the highest DDF transaction rate I'd seen back in 2013. [It's easy to calculate a DDF transaction rate: in a Db2 monitor accounting-long report with data grouped by connection type, in the section on the DRDA connection type, divide the commit count by the number of seconds in the reporting interval, and there's your transaction rate.]
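
To make that arithmetic concrete, with made-up numbers that line up with the rate cited above: if the DRDA connection type section of the report showed 11,005,200 commits for a one-hour (3600-second) reporting interval, the DDF transaction rate would be:

11,005,200 commits / 3600 seconds = 3057 transactions per second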

I have seen that an ever-growing percentage of overall Db2 for z/OS workloads - and a really big percentage of new Db2-accessing application workloads - involve Db2 access via the distributed data facility. That trend, combined with the increasing processing capacity delivered by new generations of IBM Z servers, plus DDF-related performance features such as high-performance DBATs and the SMT2 mode in which zIIP engines can operate, adds up to substantial growth in DDF transaction volumes at many Db2 for z/OS sites (the organization with the DDF transaction rate in excess of 3000 per second for a Db2 subsystem runs the zIIP engines in the associated z/OS LPAR in SMT2 mode). DDF transaction rates are likely to get a further boost as companies take advantage of Db2's built-in REST interface, since that interface provides a second path to the Db2 distributed data facility (the other path being the SQL path, which could also be called the DRDA path - more on that in the next entry I'll post to this blog).


Big is likely to get bigger

These upward trends, regarding Db2 buffer pool configuration sizes and DDF transaction volumes, are two that I like to see. The former reflects growing recognition that large IBM Z server real storage resources can be very effectively leveraged to turbocharge Db2 application performance, and the latter shows that Db2 for z/OS is an increasingly popular choice as the data server for modern, multi-tiered, client-server applications that access data through standard interfaces (e.g., JDBC, ODBC, ADO.NET, REST). How big will Db2 buffer pool configurations get in the years to come? How high will DDF transaction rates go? We'll see. My take is that big - even really big - is going to get a lot bigger still.