Sunday, December 29, 2013

DB2 for z/OS: Want to use High-Performance DBATs? Check your MAXDBAT Value

Of the features introduced with DB2 10 for z/OS, high-performance DBATs is one of my favorites. It enabled (finally) DDF-using applications to get the CPU efficiency benefit that comes from combining thread reuse with the RELEASE(DEALLOCATE) package bind option -- a performance tuning action that has long been leveraged for CICS-DB2 workloads. Implementing high-performance DBATs is pretty easy: in a DB2 10 (or 11) environment, when a package bound with RELEASE(DEALLOCATE) is executed by way of a DBAT (i.e., a database access thread -- the kind used for DRDA requesters that connect to DB2 via the distributed data facility), that thread becomes a high-performance DBAT (if it isn't one already). Before jumping into this, however, you should consider some things that are impacted by the use of high-performance DBATs. One of those things is the DBAT pool. That's where the MAXDBAT parameter of ZPARM comes in, and that's what this blog entry is about.

The value of MAXDBAT determines the maximum number of DBATs that can be concurrently active for a DB2 subsystem. The default value is 200, and at many sites that value, or one that's a little larger, has effectively supported a much greater number of DB2 client-server application connections (the default value for CONDBAT in ZPARM -- the maximum number of connections through DDF to a DB2 subsystem -- is 10,000). How so? Well, if your system is set up to allow for inactive connections (CMTSTAT = INACTIVE has been the default in ZPARM since DB2 V8), when a DDF transaction completes the associated connection will go into an inactive state (a very low-overhead transition, as is the transition back to the active state) and the DBAT used for the transaction will go into the DBAT pool, ready to service another transaction. That can happen because a "regular" DBAT is only associated with a particular DB2 connection while it is being used to execute a request from said connection. Because it is common for only a small percentage of DDF connections to a DB2 subsystem to be active (i.e., associated with in-flight transactions) at any given moment, a large ratio of connections to DBATs has historically been no problem at all.

Bring high-performance DBATs into the picture, and things change. In particular, a high-performance DBAT, once instantiated, will remain dedicated to the connection through which it was instantiated until it's been reused by 200 units of work (at which point it will be terminated, so as to free up resources allocated to the thread). That high-performance DBAT, therefore, will NOT go into the DBAT pool when a transaction using the thread completes. When a request associated with another connection comes in (i.e., from a connection other than the one through which the high-performance DBAT was instantiated), the high-performance DBAT won't be available to service that request. Some other DBAT will have to be used, and guess what? If that DBAT isn't a high-performance DBAT, it will become one if the package associated with the incoming request (and that could be a DB2 Connect or IBM Data Server Driver package) was bound with RELEASE(DEALLOCATE). The DBAT pool thus becomes progressively smaller as high-performance DBATs are instantiated. Know what else happens? The number of active DBATs goes up -- maybe sharply. Why? Because a "regular" DBAT is active only while it is being used to execute a DDF transaction. A high-performance DBAT, on the other hand, is considered to be active as long as it exists -- that will be 200 units of work, as mentioned previously, and when a high-performance DBAT is waiting to be reused, it's an active DBAT.

This last point -- about the number of active DBATs potentially rising sharply when high-performance DBATs are utilized -- is illustrated by some information I recently received from a DB2 professional. At this person's shop, high-performance DBATs were "turned on" for a DB2 subsystem (the PKGREL option of the -MODIFY DDF command can be used as a "switch," telling DB2 to either honor RELEASE(DEALLOCATE) for packages executed via DBATs -- thereby enabling instantiation of high-performance DBATs -- or not), and the number of active DBATs for the subsystem went from the usual 60 or so to about 700. Because the MAXDBAT value for the DB2 subsystem was already at 750, these folks didn't run out of DBATs, but the pool of "regular" DBATs got pretty small. In response to the big increase in active DBATs seen when high-performance DBAT functionality was enabled, the MAXDBAT value for the DB2 system in question was increased to 2000. Was this OK? Yes: When packages are bound or rebound in a DB2 10 for z/OS environment, almost all thread-related virtual storage goes above the 2 GB "bar" in the DBM1 address space, and that allows for a 5- to 10-times increase in the number of threads that can be concurrently active for the DB2 subsystem.
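
If you want to flip that "switch" yourself, the command forms look like this (a quick sketch: BNDOPT tells DB2 to honor the RELEASE option with which a package executed via a DBAT was bound, thereby enabling high-performance DBATs, while COMMIT tells DB2 to treat such packages as though they'd been bound with RELEASE(COMMIT), thereby shutting off high-performance DBAT instantiation):

-MODIFY DDF PKGREL(BNDOPT)
-MODIFY DDF PKGREL(COMMIT)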

So, if you're thinking about using high-performance DBATs (and you should), check your subsystem's MAXDBAT value, and consider making that value substantially larger than it is now. Additionally, take steps to enable selective use of high-performance DBATs by your network-attached, DB2-accessing applications. For programs that contain embedded SQL statements and, therefore, have their own packages (e.g., DB2 stored procedures -- both external and native), use RELEASE(DEALLOCATE) for the most frequently executed of these packages. For the packages associated with DB2 Connect and/or the IBM Data Server Driver, use two collections: The default NULLID collection, into which you'd bind the DB2 Connect and/or IBM Data Server Driver packages with RELEASE(COMMIT), and another collection (named as you want) into which you'd bind these packages with RELEASE(DEALLOCATE). Then, by way of a data source or connection string specification on the client side, direct DDF-using applications to NULLID or the other collection name, depending on whether or not you want high-performance DBATs to be used for a given application.
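
Here's a sketch of the alternate-collection bind (the collection name HIPERF is made up for illustration, and SYSSH200 is just one of the IBM Data Server Driver packages found in NULLID -- you'd bind each package in the set this way):

BIND PACKAGE(HIPERF) COPY(NULLID.SYSSH200) ACTION(REPLACE) RELEASE(DEALLOCATE)

A client-side data source or connection string pointing to collection HIPERF would then get high-performance DBATs; one pointing to NULLID would not.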

To keep an eye on DBAT usage for a DB2 subsystem, periodically issue the command -DISPLAY DDF DETAIL. In the output of that command you'll see a field, labeled QUEDBAT, that shows the number of times (since the DB2 subsystem was last started) that requests were delayed because the MAXDBAT limit had been reached. If the value of this field is non-zero, consider increasing MAXDBAT for the subsystem. You might also want to look at the value of the field DSCDBAT in the output of the -DISPLAY DDF DETAIL command. This value shows you the current number of DBATs in the pool for the subsystem. As I've pointed out, maintaining the "depth" of the DBAT pool as high-performance DBAT functionality is put to use might require increasing MAXDBAT for your DB2 subsystem.
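
Here's an illustrative snippet of -DISPLAY DDF DETAIL output (the message IDs and field labels are as I recall them; the counts are made up), showing where the MAXDBAT value (MDBAT), QUEDBAT, and DSCDBAT appear:

DSNL090I  DT=I  CONDBAT=    10000 MDBAT=   750
DSNL092I  ADBAT=  700 QUEDBAT=       0 INADBAT=      0 CONQUED=      0
DSNL093I  DSCDBAT=     25 INACONN=    5000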

DDF activity can also be tracked by way of your DB2 monitor. I particularly like to use a DB2 monitor-generated Statistics Long Report to see if the connection limit for a DB2 subsystem (specified via the CONDBAT parameter in ZPARM) is sufficiently high. In the section of the report under the heading "Global DDF Activity," I'll check the value of the field labeled CONN REJECTED-MAX CONNECTED (or something similar -- fields in reports generated by different DB2 monitors might be labeled somewhat differently). A non-zero value in this field is an indication that the CONDBAT limit has been hit, and in that case you'd probably want to set CONDBAT to a larger number to allow more connections to the DB2 subsystem.

So there you go. Using high-performance DBATs can improve the CPU efficiency of your DB2 for z/OS client-server workload, but if you do leverage high-performance DBAT functionality then you might need to boost the DBAT limit for your DB2 subsystem in order to maintain the depth of your DBAT pool, because as high-performance DBATs increase in number, pooled DBATs decrease in number (unless you've upped your MAXDBAT value to compensate for this effect). Boosting MAXDBAT in a DB2 10 (or 11) environment is OK, as thread-related virtual storage in such an environment is almost entirely above the 2 GB "bar" in the DBM1 address space (assuming that packages have been bound or rebound with DB2 at the Version 10 or 11 level). Of course, you need real storage to back virtual storage, so if you increase the MAXDBAT value keep an eye on the z/OS LPAR's demand paging rate and make sure that this doesn't get out of hand (if the demand paging rate is in the low single digits or less per second, it's not out of hand).

Wednesday, December 18, 2013

DB2 for z/OS: Monitoring Prefetch Read Activity

Not long ago, a colleague sent me some DB2 for z/OS buffer pool activity data that a DB2-using organization had sent to him. The DB2 people at this site were alarmed by some of the prefetch-related numbers in the data, and they had requested a review of the information. My analysis of the figures suggested that the system administrators were concerned by numbers that were in fact positive in nature; however, I also saw prefetch items that really were troublesome, and I subsequently learned that these were likely due to a buffer pool configuration change that had been made to improve application performance but ended up having the opposite effect. In this blog entry I'll explain why buffer pool statistics related to prefetch activity are sometimes misinterpreted, and how "tuning" actions intended to bolster synchronous read performance can have unexpectedly negative consequences.

First, the misinterpreted figures. A system administrator at the aforementioned DB2 for z/OS site was doing a good thing: reviewing the output of the DB2 command -DISPLAY BUFFERPOOL(bpname) DETAIL for two of the key buffer pools allocated in a production subsystem. He calculated some ratios, and was particularly concerned about a couple of the calculated values. For both of the pools, the ratio of pages read via dynamic prefetch to the number of dynamic prefetch requests was quite low: about 2 to 1 for one pool, and only 0.13 to 1 for the other pool. To the system administrator, this looked like a problem, as he knew that a dynamic prefetch request would generally be for 32 pages.

In fact, these dynamic prefetch requests versus pages read via dynamic prefetch numbers were just fine, but such figures are often viewed incorrectly. One factor that contributes to these incorrect readings of -DISPLAY BUFFERPOOL data is that many people look at prefetch requests when they should be focusing instead on prefetch I/Os. The really important thing to understand here is that a prefetch request is just that: a request to read from disk a particular set of pages (typically 32 contiguous table space or index pages, in the case of dynamic prefetch). If all of the pages associated with a prefetch request are already in the buffer pool, the request will not result in a prefetch I/O. What that should tell you is that a high ratio of prefetch requests to prefetch reads is a good thing -- it means that pages are being effectively cached in the buffer pool, so that when a chunk of pages is asked for by way of a prefetch request, an associated I/O will often not be necessary because all of the requested pages are already in memory. Sure, a prefetch I/O will often be preferred to a synchronous I/O: the latter always results in application wait time, while in the case of the former it is hoped that the prefetch I/O operation will complete before the application process (the one on behalf of which the I/O is being driven) asks for rows (or index entries, as the case may be) from the pages being read from disk. Even so, a "non-I/O" is preferable to a prefetch I/O.

Pages read via prefetch, then, should be compared to prefetch I/Os, not to prefetch requests. Looked at in that light, pages read via dynamic prefetch for the buffer pools of interest here might still appear to be on the low side: about 13 pages per prefetch I/O for the one buffer pool and about 14 pages per prefetch I/O for the other pool. It would be better if those figures were closer to 32, the size of a typical dynamic prefetch request, right? WRONG. It's the same thing as a high ratio of prefetch requests to prefetch read I/Os: fewer pages read into memory per prefetch I/O is a good thing. It means that (again) pages are being effectively cached in the buffer pool, so that when a prefetch request for 32 pages (for example) is issued, far fewer than 32 pages will have to be read into memory from disk because close to 20 of those 32 pages (on average for these particular buffer pools during a particular time period) are already in the buffer pool.
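
To make the arithmetic concrete, here's a made-up set of numbers for one pool, consistent with the ratios described above:

Dynamic prefetch requests:        1,000,000
Dynamic prefetch read I/Os:         150,000
Pages read via dynamic prefetch:  2,000,000

Pages read per request  = 2,000,000 / 1,000,000 = 2     (low -- and that's good)
Pages read per read I/O = 2,000,000 / 150,000   = 13.3  (well under 32 -- also good)

Only 150,000 of the 1,000,000 requests actually drove a read I/O, and each of those I/Os had to bring in only about 13 of its 32 requested pages -- the others were already in the pool.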

Thus it is that the high ratio of dynamic prefetch requests to dynamic prefetch read I/Os seen by the system administrator for the two buffer pools in his DB2 system, and the low ratio of pages read via dynamic prefetch to dynamic prefetch read I/Os, are not "uh-oh" numbers -- they are "Yes!" numbers. At the same time, however, there were indeed some "uh-oh" numbers in the -DISPLAY BUFFERPOOL output that caught my eye. Specifically, I saw that prefetch had been disabled for one of the pools, due to a lack of buffer resources, 164 times during the 28 minutes of activity captured in the command output. [The output of -DISPLAY BUFFERPOOL DETAIL contains a timestamp for each buffer pool listed, showing the start of the activity-capture time period (the end of the period is the time at which the command was issued). This timestamp will indicate either the time at which a pool was last allocated, or when the -DISPLAY BUFFERPOOL command was previously and most recently issued for the pool; so, if the command is issued once for a pool and then issued again for the pool an hour later, the timestamp value in the output of the second issuance of the command will be one hour before the time at which the command was issued for the second time.] That's not good. Because prefetch reads, when appropriate, are generally preferred over synchronous reads (this because, as previously mentioned, they are "anticipatory" reads, intended to bring pages into memory before they are needed by an application process), you really don't want to see prefetch disabled because of a shortage of buffers available to support prefetch reads.

Why would there be such a buffer shortage for this pool? I had a suspicion, and it was confirmed when I saw that the sequential threshold for the pool (also known as VPSEQT, short for the virtual pool sequential threshold) was set at 25, versus the default value of 80. What did that mean? It meant that instead of the usual 80% of the pool's buffers being available to hold pages brought into memory via prefetch, only 25% of the buffers were available for this purpose (the other 75% of the buffers in the pool were reserved exclusively for pages read into memory via synchronous reads). Why had the sequential threshold been lowered for the pool? The thinking had been that this action would reduce synchronous read activity (and we like to reduce that because, as I pointed out, a synchronous read always involves application wait time) by providing more synchronous-read-only buffers in the pool. Sounds logical, right? And it might have had the desired effect if there hadn't been a lot of prefetch requests for the pool. In fact, there were more than 800 prefetch requests per second for this pool, and dramatically reducing the buffer resources available to support the associated prefetch I/Os ended up driving synchronous read activity up -- precisely the opposite of the hoped-for effect.

Here's the connection between a non-zero value for PREFETCH DISABLED - NO BUFFER in -DISPLAY BUFFERPOOL output and synchronous read activity: when prefetch is disabled, the pages that would have been read into memory via a prefetch read I/O will likely still be sitting on disk when an application process needs them. What happens then? They are synchronously read into the buffer pool (while the application process waits). Thus, incidents of prefetch being disabled can increase synchronous read activity; furthermore, with far fewer buffers (versus the default) in this pool being available for holding prefetch-read pages, pages that had been brought into memory via prefetch were quickly being flushed out of the pool to make room for other prefetch-read pages, and that meant that they couldn't be re-referenced without driving more I/Os, and that also worked to drive up synchronous read activity for the pool.

So, I told these folks (through their local IBM technical specialist) to significantly increase the VPSEQT setting for this buffer pool, and explained the rationale behind the recommendation. They ended up changing the sequential threshold for the pool back to the default value of 80. The results (shown in the buffer pool section of a DB2 monitor statistics long report for the DB2 subsystem) were quite positive and, I think, interesting: the frequency of prefetch being disabled due to a lack of buffer resources went from about 6 per minute to zero, and synchronous read activity dropped by 43%, even though the percentage of the pool's buffers set aside exclusively for synchronous reads went from 75% to 20%. With the VPSEQT change, more pages could be read into memory in an anticipatory fashion, and those pages, once prefetched into the pool, were likely to stay resident in the pool for a longer period of time, and the combination of these effects reduced the need to synchronously read pages from disk.
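
For reference, a sequential threshold change of the kind described above is made with the -ALTER BUFFERPOOL command (BP8 here is a stand-in for the actual pool name), and the new VPSEQT value takes effect immediately:

-ALTER BUFFERPOOL(BP8) VPSEQT(80)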

Here are what I see as the key take-aways from this case:
  • Do not discount the importance of prefetch reads. Some DB2 for z/OS people are overly focused on synchronous read activity, to the point that they will do things that negatively impact prefetch read operations for a buffer pool. That can be a mistake, leading in some cases to increased synchronous read activity.
  • Be careful about reducing VPSEQT for a buffer pool. Such an adjustment might have an overall positive performance impact for a pool that has relatively little prefetch activity. Here, I'd look at the volume of prefetch requests (sequential, list, and dynamic) per second for the pool of interest. If that number is high (e.g., hundreds or more per second), DB2 has determined that asynchronous reads are the right choice for bringing many of the pages of objects assigned to the pool that are needed by programs into memory. If you squeeze down the resources with which DB2 can work in servicing prefetch requests, the result could be an undesirable increase in synchronous read activity. Even if you do think that you have a pool for which a lowering of the VPSEQT setting could be beneficial, don't go overboard. Don't change from VPSEQT=80 to some much-lower value in one fell swoop -- make more modest adjustments to VPSEQT and monitor results as you go along.
  • Monitor buffer pool activity in an ongoing fashion, using information from your DB2 monitor and/or the output of the -DISPLAY BUFFERPOOL DETAIL command (a form of this command that I like to use is -DISPLAY BUFFERPOOL(ACTIVE) DETAIL, and I like to issue that command once and then again in an hour to have an hour's worth of activity captured in the output of the second issuance of the command -- see the sketch following this list). Look for incidents of PREFETCH DISABLED - NO BUFFER and PREFETCH DISABLED - NO READ ENGINE (the latter can also be related to a lack of buffer resources, if that lack of resources leads to elevated levels of prefetch read I/O activity). If you see non-zero values in these fields for a buffer pool, first check the VPSEQT value for the pool, and consider changing that value to 80 if it is less than 80. If you already have VPSEQT=80 for the pool, consider making the pool larger, if you have the real storage in the z/OS LPAR to back a larger pool.
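
Here's the monitoring sequence described in that last take-away, sketched out (the DSNB415I message ID and field labels are as I recall them; the counts are made up):

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL

(one hour later)

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL

In the output of the second issuance of the command, check this line for each pool:

DSNB415I  PREFETCH DISABLED - NO BUFFER =  164   NO READ ENGINE =    0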

If you're a DB2 person, prefetch is your friend. Don't hobble it.

Friday, November 29, 2013

A Great Analytics on System z Use Case: The Operational Data Store

A few months ago, I posted an entry to this blog on the subject of "moving queries to data," the idea being that when data to be analyzed for decision support purposes originates in a DB2 for z/OS database (as a great deal of the world's corporate and governmental data does), a very good case can be made for analyzing that data on its platform of origin -- System z -- as opposed to copying and moving the data to a different platform for querying and reporting. Quite recently I've had some communications with various individuals on this very topic, and these exchanges have led to my highlighting a particularly good use case for DB2 and System z as a data server for business intelligence applications: the operational data store, or ODS. The ODS is not a new concept, but I've found that it's not universally understood and leveraged. In this blog entry, I'll provide some information that I hope will clarify what an ODS is, for those not very familiar with the idea, and show why DB2 for z/OS can be an outstanding environment for an ODS, particularly when the operational data in question "lives" in a DB2 for z/OS system to begin with.

First, I think that it can be useful to contrast an ODS with a data warehouse. Keep in mind that having an ODS versus a data warehouse is not an either/or proposition -- an organization can make effective use of both. Speaking, then, in pretty broad-brush terms, here are some of the things that tend to differentiate these two types of analytics data stores (and when I use the term "operational database" hereafter, I'm referring to a source database that is accessed by operational, "run the business" online and/or batch applications):
  • Whereas the design of a data warehouse database will usually differ from the design of the operational database(s) from which the data warehouse data is sourced (for example, the data warehouse might feature so-called fact and dimension tables arranged in what's known as a star schema, to facilitate certain types of online analytical processing), an ODS is very often an exact duplicate of an operational database (or at least a duplicate of a subset of the tables in the operational database) -- "duplicate" referring to table and column names, and table structure (i.e., column order and column data types). In some cases an ODS will differ from the associated operational database in terms of data retention (a company might decide to keep data for X amount of time in the operational database, and for a longer period of time in an ODS, for historical reporting purposes -- an arrangement that can work especially well if the older data is rarely, if ever, updated).
  • Whereas a data warehouse is often sourced from several databases (which may be housed on different server platforms), an ODS commonly has one associated source database.
  • Whereas data in a data warehouse is sometimes aggregated in some way, data in an ODS is typically "atomic-level" with regard to detail.
  • Whereas there is often some tolerance for delay with respect to source data updates being reflected in a data warehouse (data extract/transform/load, aka ETL, processes that run on a nightly basis are not unusual for a data warehouse), users of an ODS often expect and may demand extremely low latency in this regard -- the requirement could even be for near-real-time ODS updates with respect to the updating of corresponding records in the source operational database.
  • Whereas data in a data warehouse is often transformed in some way, to accomplish things such as making data fields (which might be code values in an operational source table) more user-friendly, data is typically NOT transformed when added to an ODS.

Given these differences, in the case of an ODS you will regularly find that instead of ETL, you simply have EL (extract/load, from the source database to the ODS), and that EL process may involve near-real-time replication. That being the case, when the source operational database is managed by DB2 for z/OS, it can make all kinds of sense to use DB2 for z/OS as well for the ODS. Different organizations do exactly that, in different ways. A company might have an ODS in the same DB2 for z/OS subsystem that is used for the operational database, with the ODS tables being in a different schema (that is to say, the tables in the ODS would have different high-level qualifiers versus the tables in the operational database). More commonly (in my experience), the ODS will be set up in a different DB2 subsystem. The DB2 subsystem used for the ODS could be in the same z/OS LPAR as the DB2 subsystem used for the operational database, but I more frequently find that it's in a different z/OS LPAR.

When an ODS sourced from a DB2 for z/OS database is itself managed by DB2 for z/OS, very low-latency replication of source data changes to the ODS is facilitated, as is management of the ODS environment itself (the DBA team that takes care of the DB2 for z/OS operational database will generally have a pretty easy time managing an ODS database that essentially mirrors the source database). This ease of management extends to matters related to data security -- the same DB2 for z/OS and RACF (or equivalent) security controls used to lock down data in the operational database can be leveraged to protect ODS data from unauthorized access. Another factor that contributes to the goodness of fit of DB2 for z/OS with an ODS is the nature of the typical ODS workload: some longer-running, complex and/or data-intensive queries, yes, but also a relatively high volume of quick-running, in-and-out "transactional queries" -- users wanting to see, immediately, the information pertaining to this particular customer, or that particular transaction, in the course of their data analytics work. DB2 for z/OS and System z have long excelled at handling such mixed workloads, while some other platforms used for analytics work can bog down in processing a large number of concurrently executing queries. If you want to make your DB2 for z/OS system even more highly capable with regard to "operational analytics" workloads, take a look at the DB2 Analytics Accelerator for z/OS -- technology that can preserve excellent performance for high-volume, transactional queries while dramatically speeding up the execution of more complex and data-intensive queries.

Here's another thought: when the idea of minimizing inter-platform data movement is taken to its extreme, you don't move the data at all -- you allow analytics users to query the actual operational tables in the production DB2 for z/OS database. This approach, while not commonplace, is utilized in some cases, and successfully. It is certainly technically feasible, and more so on the z/OS platform than others, thanks to the highly advanced workload management capabilities of z/OS. One interesting option in this area is available to organizations that run DB2 for z/OS in data sharing mode on a Parallel Sysplex: you route the analytics queries through a subset of the data sharing group members, and route the operational transactions and batch jobs through the other members of the group. All the SQL statements hit the same DB2 tables, but because the analytics and operational SQL statements run in different DB2 subsystems (which can be in different z/OS LPARs on different System z servers in the Sysplex), the analytics and operational workloads don't compete with each other for server memory or processing resources. I have some personal experience with such a set-up, and I wrote a few blog entries that provide related information: a part 1 and part 2 entry on running OLTP and business intelligence workloads on the same DB2 for z/OS system (written while I was working as an independent DB2 consultant), and an entry that describes technology that can be used to limit different workloads to different subsets of the members of a DB2 data sharing group.

Does your organization use DB2 for z/OS to manage and protect your most valuable data: the data generated by your run-the-business operational applications -- the data that you own? If so, do you have a DB2 for z/OS-based ODS that provides users with secure, high-performance access to a current version of that data with atomic-level detail? If you don't have such an ODS, consider how a data store of this nature, managed by DB2 for z/OS -- the same DBMS in which your "gold" data is housed -- could provide users throughout your business with a resource that would enhance decision making effectiveness and improve outcomes. Lots of organizations have taken this path. It may be one that your company should take, as well.

Tuesday, November 26, 2013

DB2 for z/OS Work: the Task's the Thing

To understand how DB2 work is handled in a z/OS system, you need to have some understanding of the tasks used to manage that work. Here, "tasks" doesn't refer to things done or to be done. It refers instead to the control blocks used to represent, for dispatching purposes, application and system processes in a z/OS LPAR. I have found that several misconceptions related to DB2 for z/OS workload processing are rooted in misunderstandings of the tasks behind the workload. In particular, I've seen situations in which folks have the wrong idea about DB2 DDF address space CPU consumption, about batch- versus online-issued SQL statements, and about zIIP eligibility of DB2 work, and in each case clarity came from an understanding of the tasks associated with the DB2 activity in question. In this blog entry, I'll try to provide you with some information that I hope will be helpful to you in understanding why DB2 for z/OS work is processed as it is.

DB2 DDF address space CPU consumption

With respect to control blocks used to manage DB2 work in a z/OS system, there are at least a couple of relevant dichotomies. First, you have "system" tasks and "user" tasks. The system tasks are associated with DB2 address spaces, and they generally have to do with work of a housekeeping variety (e.g., writes of changed data and index pages to disk), as well as things pertaining to the SQL statement execution environment (e.g., thread creation and termination). Where things can get a little confusing is in the area of DB2 DDF CPU utilization (DDF being the distributed data facility -- the DB2 address space through which SQL statements from network-attached application servers and workstations flow, and through which results flow in the other direction). The confusion to which I refer stems from the fact that the DDF address space has associated with it both system tasks and user tasks, as opposed to having only the former. For SQL statements issued by "local" applications (those that attach directly to a DB2 subsystem in the same z/OS LPAR), the user tasks -- those under which SQL statements execute and to which the large majority of CPU time consumed in SQL statement execution is charged -- belong to "allied" address spaces (i.e., the address spaces in which the SQL-issuing programs run). So, for example, the user task under which a SQL statement issued by a local CICS transaction program executes is the CICS subtask TCB associated with that transaction, and the user task under which a SQL statement issued by a local batch job executes is the TCB of the batch address space (more on TCBs in a moment). Compared to the user tasks of allied address spaces connected to a local DB2 subsystem, the system tasks of the DB2 IRLM (lock manager), MSTR (system services), and DBM1 (database services) address spaces consume relatively little CPU time.

In the case of the DDF address space, you have to keep in mind that the programs issuing the SQL statements aren't running in the z/OS LPAR with DB2, and yet they must be represented locally in the z/OS LPAR so that they can be properly prioritized and dispatched by the operating system. The local representation of a remote DB2-accessing program is a so-called preemptable SRB in the DDF address space (more on SRBs to come), and because that task is the local representation of the remote program, most of the CPU time associated with execution of SQL statements issued by that program will be charged to the DDF preemptable SRB. That is why DDF CPU consumption will be significant if there are a lot of SQL statements flowing to DB2 through the DDF address space -- it's analogous to a CICS region consuming a lot of CPU time if a lot of SQL statements are sent to DB2 by transactions running in that CICS region. The DDF user tasks are charged with a lot of the CPU time related to SQL statement execution, while the DDF system tasks consume only a small amount of CPU time. You can find more information about DDF CPU consumption in an entry I posted to this blog last year.

Batch- versus online-issued SQL statements

Not long ago, I got a note from a software analyst who was concerned that SQL statements issued by batch programs would get in the way of SQL statements issued by concurrently executing CICS transactions. He was specifically worried about "inefficient" batch-issued SQL statements (which could be thought of as long-running SQL statements, though long-running SQL statements are not necessarily inefficient in any way) getting in line for CPU cycles ahead of CICS-issued SQL statements. The main issue in this person's mind was the dispatching priority of the DB2 address spaces: a priority that was somewhat higher (as recommended) than that of the CICS regions in his system. If DB2 has a really high priority in the z/OS LPAR, won't long-running, batch-issued SQL statements negatively impact the performance of CICS transactions?

The answer to that question (unless the site's workload manager policy is unusual) is, "No" (or at least, "No, CICS transactions will not be behind these long-running, batch-issued SQL statements in the line for CPU cycles"). Again, the task's the thing. A SQL statement (long-running or otherwise) issued by a batch job runs under that job's task (though it executes in the DB2 database services address space, as do all SQL statements), and it therefore has the priority of that task. A SQL statement issued by a CICS transaction runs under that transaction's task in the associated CICS region, and so it executes with the priority of the transaction's task. Assuming that batch jobs in your environment have a lower priority than CICS transactions, SQL statements issued by batch jobs will have a lower priority relative to SQL statements issued by CICS transactions.

The priority of DB2 address spaces (which should be really high) does NOT impact the priority of SQL statements that access DB2 data. Why, then, is it important for the DB2 address spaces to have a high priority in the system? Because a LITTLE BIT of the work related to SQL statement execution is done under DB2 tasks (examples include data set open and close, and lock management), and if this work doesn't get done RIGHT AWAY as needed, the whole DB2-accessing workload can get seriously gummed up. That's why giving the DB2 address spaces a higher priority (doesn't have to be way higher) than DB2-connected CICS regions is good for CICS-DB2 throughput: it enables DB2 to very quickly take care of the little bit of work done by DB2 tasks in a very timely manner, so that the bulk of SQL statement processing (which, again, happens at the priority of SQL statement-issuing programs) won't get bogged down waiting for locks to be released or threads to be created or whatever. More information on DB2 and CICS address space priority recommendations can be found in a blog entry on the topic that I posted last year.

zIIP eligibility of DB2 work

I mentioned in the second paragraph of this entry that there are a couple of dichotomies with regard to the control blocks that are used for tracking and dispatching work in a z/OS system. The first of these -- system tasks and user tasks -- I covered already. The second dichotomy I have in mind is TCBs and SRBs, or, more formally, task control blocks and service request blocks. For many years, people associated TCBs with user tasks, and SRBs with system tasks. That thinking wasn't far off the mark until the 1990s, when, needing a mechanism to manage work such as that processed through the DB2 DDF address space, IBM z/OS developers delivered a new type of SRB -- an SRB that, as said by a then-DB2 DDF developer, "acts like a TCB." This was the enclave SRB, and in particular, the preemptable SRB.

It's important to keep in mind that SQL statements that get to DB2 via DDF execute under preemptable SRBs. Here's why that's important: work that runs under such tasks is zIIP eligible (zIIPs are System z Integrated Information Processors -- the "specialty engines" that provide very economical computing capacity for certain types of work). In the case of DDF-routed SQL, the zIIP offload percentage tends to be about 60% (queries parallelized by DB2 also run under preemptable SRBs, and so are zIIP eligible, as I pointed out in a blog entry I posted back in 2010).

In my experience, people are most likely to get confused about zIIP eligibility of DB2 work when they are thinking about native SQL stored procedures. First, they may wonder why a native SQL procedure is zIIP-eligible when called by a DRDA requester (i.e., when invoked via a CALL that goes through the DDF address space), but not zIIP-eligible when called by a local-to-DB2 program (such as a CICS transaction). People can also be a little unclear on the reason why a native SQL procedure called by a DRDA requester is zIIP-eligible, while an external stored procedure (such as one written in COBOL or C) called by a DRDA requester is not zIIP-eligible. To get things straight in both of these cases, remember (again) that the task's the thing. A native SQL procedure runs under the task of the application process through which it was invoked, while an external stored procedure runs under a TCB in a WLM-managed stored procedure address space; thus, a native SQL procedure, when called by a DRDA requester, will run under a preemptable SRB in the DDF address space (as will any SQL statement issued by a DRDA client program), and that is why the native SQL procedure will be zIIP eligible in that situation. When the same native SQL procedure is called (for example) by a CICS transaction program, it will run under that program's task. Because that task is a TCB (in the associated CICS region), the native SQL procedure will not be zIIP eligible when so called. Similarly, even if an external stored procedure is called by a DRDA requester, it will run under a TCB (in a WLM-managed stored procedure address space) and so will not be zIIP eligible.

If you'll keep in mind the type of task -- user or system, TCB or SRB -- involved in getting various types of DB2 work done, you'll have a much better understanding of how that work is managed and prioritized, and whether or not it is zIIP-eligible. I hope that the information in this blog entry will be useful to you in that regard.

Friday, October 25, 2013

DB2 for z/OS: How Big is Big?

For several decades I've enjoyed reading the comic strip, B.C., about a bunch of cavemen. In one strip published long ago (I may be assigning quotes to the wrong characters, but you'll get the point), B.C. asks one of his compadres, "Thor, how far can you see?" Thor responds, "Well, I can see that cave over there, and it's about a mile away, so I guess I can see about a mile." B.C. then happens upon Peter, and asks the same question. Peter's reply: "I can see the sun, so the answer to your question is 93 million miles." Thus, B.C. learns something: how far is far? It depends on who you ask.

That truism applies to matters of size as well as distance. How big is big? More specifically, how big is big in a DB2 for z/OS context? What is a big table? What is a big buffer pool configuration? What is a big data sharing group? Ask different people, and you'll get different answers. In this blog entry I'll give you my answers to these and some other questions in the same vein. Most of the information you'll find below is based on my observations of actual, real-world DB2 for z/OS workloads. Some of the data was conveyed to me by DB2 users and by IBM colleagues. Keep in mind, then, that what I'm reporting is not necessarily "biggest in the world." It's "biggest that I've seen," or "biggest that I've heard of." I haven't been to every DB2 for z/OS site around the world, nor have I spoken with everyone in the DB2 community, so I imagine that other people could provide numbers that top the ones I've provided in this blog post. Feel free to communicate "I can beat that" information via a comment.

Without further ado, here is my take on "big" as it pertains to DB2 for z/OS.

Buffer pool configuration size. This is an area of much interest to me (it was the focus of part 2 of my 3-part "memory for MIPS" blog entry, posted in the spring of last year). The biggest buffer pool configuration I've seen (combined size of all pools allocated for a single DB2 subsystem) is 46 GB. And get this: every one of the buffer pools comprising this configuration is defined with PGFIX(YES). And get THIS: the demand paging rate for the associated z/OS LPAR is ZERO. How can you use so much page-fixed memory for DB2 buffer pools and still not have any paging? Easy: just have a lot of real storage. The z/OS LPAR on which this DB2 subsystem runs has about 180 GB of memory.

I'll tell you, my definition of "big" as it pertains to a DB2 buffer pool configuration has changed considerably over the past few years. These days, I consider a buffer pool configuration of less than 5 GB for a production DB2 subsystem to be on the small side. Between 5 and 20 GB? I'd call that medium-sized. Big is over 20 GB.

Total disk read I/O rate for one buffer pool. I don't pay much attention to hit ratios when I look at buffer pool activity. I'm much more concerned with a buffer pool's total disk read I/O rate. That rate is the sum of synchronous and asynchronous read I/Os per second for a pool. If you get the numbers from a DB2 monitor, you'll probably see one figure for synchronous read I/Os per second, and three fields showing asynchronous read I/Os per second: sequential prefetch reads, list prefetch reads, and dynamic prefetch reads. Add all these together to get the total rate. You can also get the data from the output of the DB2 command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL. If you go that route, issue the command once, then wait an hour and issue it again. The output of the SECOND issuance of the command will show one hour of activity (verify that by checking the value of the timestamp in the "INCREMENTAL STATISTICS SINCE" message in the command's output). Add up the synchronous reads (random and sequential) and the asynchronous reads (sequential, list, and dynamic), and divide the total by 3600 to get a per-second rate. Whether looking at DB2 monitor-generated information or -DISPLAY BUFFERPOOL output, do NOT make the mistake of using prefetch REQUEST numbers. You want the prefetch READ numbers (if all of the pages included in one prefetch request are already in the buffer pool, there will be no read associated with that request).
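
As a made-up example of that calculation, using one hour of activity for one pool:

Synchronous reads (random + sequential):  1,800,000
Sequential prefetch reads:                  900,000
List prefetch reads:                        300,000
Dynamic prefetch reads:                     600,000
Total reads:                              3,600,000

Total read I/O rate = 3,600,000 reads / 3600 seconds = 1000 per second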

The highest total read I/O rate I've seen for one buffer pool is 9000 per second. My preference is to see a rate that's below 1000 per second for each pool. That may or may not be possible at your site, depending on the amount of server memory available for backing buffer pools.

Sort pool size. I wrote about the sort pool (and other DBM1 pools besides the buffer pools) in part 3 of the aforementioned 3-part "memory for MIPS" blog entry. The largest sort pool setting I've seen in a production DB2 environment is 48,876 KB (the default value is 10,000 KB in a DB2 10 system). On this particular subsystem, there was very little activity in the buffer pools dedicated to work file table spaces. That might be due at least in part to lots of sorts getting done in the large in-memory sort work area. Keep in mind that the sort pool value as specified in ZPARM is the maximum sort work area for an individual SQL-related sort. With lots of concurrent sorts and a large SRTPOOL value, you could see quite a bit of DBM1 virtual storage utilized for sort work space. That could be OK if you have a lot of memory on the z/OS LPAR on which the DB2 subsystem is running.

DDF transaction rate. Back in the 1990s, when DDF (the distributed data facility, through which network-attached application servers access DB2 for z/OS data) was still relatively young, it wasn't thought of as a conduit for high-volume transaction processing. Various technology enhancements -- including block fetch, static SQL support (via DRDA), stored procedures, and dynamic statement caching -- had a positive effect on DDF application throughput, and nowadays lots of organizations have high-volume DB2 for z/OS client-server workloads. The highest DDF transaction rate I've seen for a single DB2 subsystem is 786 per second. That's the average rate over a one-hour period, not a burst of one or two minutes' duration (the figure came from a DB2 monitor accounting long report, with data ordered by connection type -- in the DRDA section of the report, I found the number of commits and divided that by 3600, the number of seconds in the report time period). The highest DDF transaction rate I've seen for a multi-member DB2 data sharing group is 1110 per second -- also an average over a one-hour time period.
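
Expressed as arithmetic (the commit count shown is simply the one consistent with the rate I cited):

DRDA commits in one-hour report period:  2,829,600
DDF transaction rate = 2,829,600 / 3600 seconds = 786 per second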

DDF share of overall DB2 workload. If you think of an overall DB2 for z/OS workload in pie chart terms, there are various ways to calculate the size of the slice that represents a component of the workload. Indeed, there are different ways to define "workload component." Personally, I like to divvy up an overall DB2 workload by connection type: there's the DRDA piece (the DDF workload), the CICS-DB2 piece, the call attach piece (one type of batch interface to DB2), the TSO piece (another mostly batch interface), etc. If you look at things from this perspective, one way to size the pie slices is to measure the aggregate in-DB2 cost of SQL statement execution for each connection type. This is easily done if you have a DB2 monitor accounting long report, with data in the report ordered by connection type: you just go to a section of the report (e.g., the CICS section), find the average class 2 CPU time (and make sure that you add "specialty engine" CPU time, if any -- this would typically be zIIP engine CPU time -- to "central processor" CPU time) and multiply that by the "number of occurrences" (this will typically be a field in the upper right of the first page of the report section for a connection type, under a heading of "Highlights" or something similar). You can decide whether you want the report to cover a particularly busy one- or two-hour time period for your system, or a 24-hour period.
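
Here's a made-up example of the slice-sizing calculation (all numbers purely illustrative):

CICS slice: (0.002100 sec CP + 0.000000 sec zIIP) x 1,500,000 occurrences = 3,150 CPU seconds
DRDA slice: (0.001200 sec CP + 0.001800 sec zIIP) x 2,000,000 occurrences = 6,000 CPU seconds

In this hypothetical case, the DRDA slice of the in-DB2 CPU pie would be nearly twice the size of the CICS slice.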

I've observed over the years that the DRDA slice of the overall DB2 workload pie is getting larger and larger at many sites, and for some subsystems it's already the largest workload component -- bigger than the CICS-DB2 slice, bigger than the batch DB2 slices (call attach and TSO). Recently, a DB2 for z/OS DBA told me that his organization has a production subsystem for which 95% of the work flows through the DDF address space.

Number of rows in one table. Considering just the DB2 for z/OS-using organizations with which I've directly worked, the largest table of which I'm aware has about 18.9 billion rows. I recall hearing of a company that has about 90 billion rows of data in one DB2 table. The theoretical maximum number of rows in one DB2 for z/OS table is currently 1 trillion.

Members in a DB2 data sharing group. The largest DB2 data sharing group of which I'm aware has 22 members. DB2 for z/OS data sharing and Parallel Sysplex technology (the latter being the mainframe cluster configuration that provides the infrastructure on which DB2 data sharing runs) are architected to support up to 32 DB2 subsystems in one data sharing group.

That's all for now. I hope that this blog entry has provided you with information of interest. Maybe some ammo that you can use in "Can we do that?" discussions at your site. Maybe validation of plans you've made or actions you've taken for a DB2 environment that you support. Maybe something that gives you a sense of superiority ("You call THAT big?"). In any case, I'll probably return to this topic at some future date, because one thing I know about "big" is that it keeps getting bigger.

Wednesday, October 9, 2013

DB2 for z/OS: Something You Might Not Know About Character String Literal Values

Here's something I imagine a lot of you (or programs written by you) do all the time: issue DB2 for z/OS-accessing SQL statements that contain character string literal values. Below is a query that I issued from Data Studio earlier today, targeting a table that I'd created on one of our test DB2 subsystems:

SELECT PRODUCT_ID, UNIT_SALES
  FROM PRODUCT_SALES
  WHERE PRODUCT_ID = 'A12345';


Now, a question for you: is that character string literal value, 'A12345', fixed-length or varying-length? Considering this from your point of view, you might answer, "Fixed length." After all, it's six characters, right? No more, no less. And if you look at it again a few minutes from now, it'll still be six characters. Guess what? From DB2's point of view, the value 'A12345' in the above statement is a varying-length character string. I'm not revealing any state secrets here -- check the DB2 10 for z/OS SQL Reference and you'll see, in the first sentence of the section on character string constants, the following:

"A character string constant specifies a varying-length character string."

Of course, even a seasoned mainframe DB2 professional is not likely to have committed the entire SQL Reference to memory, and in fact plenty of people are unaware of this little nugget of information. That's generally not a big deal, because it almost never matters. Why does it hardly ever matter? Because fixed-length and varying-length character strings are compatible for purposes of assignment and comparison in a DB2 for z/OS system. Take the previously referenced SELECT statement. The 'A12345' character string constant is, as noted, treated by DB2 as a varying-length value. The PRODUCT_ID column of my PRODUCT_SALES table is defined as CHAR(6) -- a fixed-length character column. Does that matter? No -- the query runs just fine. What if you have a stored procedure with an input parameter defined as CHAR(10), and you call the stored procedure with a 10-character (or shorter) literal value provided for that input parameter? Does that matter? No -- the CALL will work just fine. Same goes for a user-defined function invocation.

Alright, so I said that the fact that DB2 treats a character string constant as a varying-length value almost never matters. "Almost never" is not the same thing as "never." So, when does it matter? I'm aware of one case, which was recently brought to my attention by a DB2 for z/OS DBA whom I've known for years. This individual was trying to create a column mask in a DB2 10 for z/OS new-function mode environment. His CREATE MASK statement looked something like this (I'm using table and column names other than the real ones):

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)
     THEN '999999'
    ELSE '000000'
   END
   ENABLE;


When he issued that SQL statement, it failed with a -20474 SQL code, reason code 33, indicating that "The data type of the return expression is not the same as the data type of the column on which the column mask is defined." This didn't make sense to my DBA friend, because COL2 was defined as CHAR(6), and he'd provided 6-character literal values in the RETURN clause of the CREATE MASK statement. In this situation, the problem was related to the fact that the character string literals in the SQL statement were treated as varying-length values by DB2. Why did that matter? Because, per the DB2 10 SQL Reference, one of the rules of column mask creation is as follows:

"The result data type, null attribute, data length, subtype, encoding scheme, and CCSID of the CASE expression must be identical to those attributes of the column that is specified by column-name."

Fixed-length character strings and varying-length character strings are two different DB2 data types. Thus, the -20474 error, with reason code 33. When the DBA changed the statement as shown below, it worked:

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)
     THEN CAST('999999' AS CHAR(6))
    ELSE CAST('000000' AS CHAR(6))
   END
   ENABLE;


The statement also worked when coded as follows, because the CHAR scalar function returns a FIXED-LENGTH character string representation of the argument (this from the DB2 10 SQL Reference, with emphasis added by me):

CREATE MASK ABC.COL2_MASK
 ON ABC.MY_TABLE
 FOR COLUMN COL2 RETURN
   CASE
    WHEN (1 = 1)
     THEN CHAR('999999')
    ELSE CHAR('000000')
   END
   ENABLE;


Recap time. 1) Character string literals in SQL statements are treated by DB2 for z/OS as varying-length values. 2) That almost never matters. 3) It matters when you want to create a mask for a fixed-length character column.

Monday, September 30, 2013

DB2 for z/OS Hash-Organized Data: an Interesting Performance Story

A lot of DB2 for z/OS people know about the hash mode of organizing data in a table -- something introduced with DB2 10 (I've blogged about this topic multiple times, most recently in an entry I posted a few months ago). The basics of hash-organization of data (this as opposed to traditional cluster-based data organization) are pretty straightforward: you select for a table a hash key (which could be comprised of a single column or a concatenation of multiple columns, as long as each key value is unique), and you tell DB2 to hash-organize the data in the table using the designated key (this can be done with a CREATE TABLE or an ALTER TABLE statement -- in the latter case a subsequent online REORG of the associated table space changes the mode of data organization from cluster-based to hash-based). Thereafter, when a row is inserted into the table, DB2 will determine the target page for the row by running the value of the hash key for the row through a hashing algorithm. The really good part, performance-wise, comes when a row is retrieved from the table. If a SELECT statement includes an "equals" predicate that references the table's unique hash key, DB2 can run that hash key through the aforementioned hashing algorithm to identify the page into which the row was assigned on insert, and voila -- the row is returned to the requesting application process with a single GETPAGE (possibly some rows in a hash-organized table will be placed in the overflow area of the table space at insert if the hash-identified target page is full, but the percentage of such rows in the table should be small if the table space was properly sized). In contrast, if DB2 were using a unique index to access a row then multiple GETPAGE operations would be necessary (e.g., three GETPAGEs for a three-level index, followed by one more GETPAGE for the table space page identified via the row ID found in the index leaf page containing the key value referenced in an "equals" predicate in a query). Because GETPAGEs are a key determinant of the CPU cost of executing a SQL statement, the cost of accessing a row in a hash-organized table will, in some cases, be less than the cost of accessing the same row in a cluster-organized table.
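
In DDL terms, hash organization is specified as sketched below (the table and column names echo an example used earlier in this blog, and the HASH SPACE amount is purely illustrative -- it should be sized based on the table's anticipated data volume; note, too, that a hash-organized table must reside in a universal table space):

CREATE TABLE PRODUCT_SALES
  (PRODUCT_ID CHAR(6) NOT NULL,
   UNIT_SALES INTEGER NOT NULL)
  ORGANIZE BY HASH UNIQUE (PRODUCT_ID)
  HASH SPACE 64 M;

For an existing table, ALTER TABLE PRODUCT_SALES ADD ORGANIZE BY HASH UNIQUE (PRODUCT_ID) HASH SPACE 64 M, followed by the online REORG mentioned above, accomplishes the same thing.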

When will a hash-organized table be tops in terms of efficiency of data access? Your initial response to that question might be, "A hash-organized table wins the CPU efficiency game when a query retrieves a single row qualified by an 'equals' predicate that references the table's hash key." That answer will generally be right, but it won't always be right, as illustrated by an interesting situation recently brought to my attention by a DB2 for z/OS DBA. This DBA informed me of a test he ran in a DB2 10 new-function mode environment. The test involved a program that issued a query that targeted a hash-organized table and contained an "equals" predicate referencing the table's hash key. The DBA ran the same program using the same data, with the only difference being cluster-based organization of the data in test B instead of the hash organization used in test A. Guess what? The program's elapsed and CPU times were lower when the target table was cluster-organized. Huh? How could that be?

The key to what at first appears to be a strange result is this: the program executed by the DBA issued the singleton SELECT with the hash key-referencing "equals" predicate in a loop, with the value plugged into the "equals" predicate picked up, for each execution of the query, from a file -- a very common batch scenario in the mainframe DB2 world. On top of that, the file of key values was sorted in a way that pretty much lined up with the clustering sequence of the target table (referring to the case in which the program was executed with a cluster-organized table). This is also very common in DB2 for z/OS systems. With the input file of key values sorted in this way, execution of the program with a cluster-organized table resulted in a sequential page access pattern: each successive table space page accessed was "ahead of" (with respect to physical order) and "nearby" (generally speaking, within 16 pages of) the previously accessed page. Through the mechanism known as sequential detection, DB2 recognized this sequential data access pattern and activated dynamic sequential prefetch for the program, significantly reducing elapsed time, and saving CPU time, as well, relative to the random synchronous read activity attending the execution of the program with the hash-organized table. Additionally, repeated execution of the singleton SELECT with the cluster-organized table allowed DB2 to utilize index look-aside to dramatically reduce index-related GETPAGE activity, thereby largely negating the GETPAGE minimization advantage that would otherwise be seen with access to a hash-organized table (note that, for a batch program that issues many COMMITs, the effectiveness of sequential detection and index look-aside is maximized when the associated packages are bound with RELEASE(DEALLOCATE)). Thus, in this case, two old DB2 performance features (I believe that both sequential detection and index look-aside were introduced with DB2 V2.3, in the early 1990s) trumped one very new one (hash-organized data).
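
To make the scenario concrete, here's a rough sketch of the batch pattern in question -- embedded SQL, COBOL-style, with all table, column, and host variable names hypothetical:

*  Executed once per record of the input key file:
   EXEC SQL
     SELECT COL_A, COL_B
       INTO :HV-COL-A, :HV-COL-B
       FROM CUST_TABLE
      WHERE CUST_ID = :HV-CUST-ID
   END-EXEC.

When the input file is sorted in (or nearly in) the target table's clustering sequence, successive executions of this statement touch table pages in ascending physical order (feeding sequential detection) and probe the same index leaf pages over and over (feeding index look-aside). Sort the input file randomly, and both benefits evaporate.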

Now, this isn't the end of the story. The DBA ran the test again, this time with the input file of key values sorted in a random fashion that didn't line up at all with the sequencing of rows in the cluster-organized table. The results of this second test were the reverse of what was seen the first time around: the performance of the program was better when it executed with the hash-organized table. No surprise there. With sequential detection and index look-aside out of the picture, the one-GETPAGE-per-row aspect of hash-organized data access beat out the many top-to-bottom index probes and random reads of table and index pages that came with access to the cluster-organized table.

So, in weighing whether a particular table should be hash- or cluster-organized, don't check only to see that the table is primarily accessed by singleton SELECTs that qualify rows with an "equals" predicate referencing what would be the table's hash key. Consider, as well, where those singleton SELECTs come from. Are they mostly issued by online transactions that retrieve just one row (or a very small number of rows) from the table with each execution, or are they chiefly issued in do-loop fashion by batch programs that pull lots of rows from the table, using files of input keys that can be sorted to match (or highly correlate with) what is (or would be) the table's clustering key? If the latter holds true, the oldie-but-goodie batch performance optimization features, sequential detection and index look-aside, might provide even greater CPU efficiency (and lower elapsed times) than you could get with a hash-organized table. If the singleton SELECTs are, successively speaking, more random in terms of data access, hash-organizing the data could be your best choice for optimal performance. What's true now has been true as long as I've been working with DB2 for z/OS (a long time): good physical database design decisions proceed from knowledge of your data and of how that data is accessed. 'Nuff said.

Wednesday, September 25, 2013

DB2 10 for z/OS: Take Advantage of "Native" SQL User-Defined Functions

Lots and lots of mainframe DB2 people know about the native SQL procedure technology delivered with DB2 9 for z/OS. That was, in my opinion, the most important enhancement of this aspect of DB2 functionality since stored procedures were introduced in the mid-1990s with DB2 for z/OS Version 4 (I've blogged multiple times about native SQL procedures, starting with an entry I posted a few years ago to the blog I maintained while working as an independent DB2 consultant). Many DB2-using organizations around the world are reaping the benefits of native SQL procedure utilization, but plenty of folks have overlooked a related enhancement provided by DB2 10 for z/OS: "native" SQL user-defined functions (UDFs). I have "native" in quotes because officially these UDFs are not known by that designation -- they are simply referred to in the DB2 10 SQL Reference as SQL scalar UDFs and SQL table UDFs; however, they share with native SQL procedures several characteristics that make them a great choice for packaging server-side data manipulation logic, so calling DB2 10 SQL UDFs "native" appeals to me. In this entry, I'll explain why native SQL UDFs (not gonna put quotes around native from here on out) are not only an advance in terms of programming productivity, but also, in some cases, a path to improved application performance versus traditional, "external" UDFs.

First, a little review. UDFs, like stored procedures, provide (as noted) a means of packaging data access logic on the DB2 for z/OS server. One of the main differences between UDFs and stored procedures is the way in which they are invoked. Stored procedures, of course, are invoked via a CALL statement, like this one:

CALL MYPROC(:V1, :V2);

UDFs, on the other hand, are invoked as a result of appearing in an SQL DML statement -- usually a SELECT statement. For example, if I have a simple UDF, called KM_MILES, that converts kilometers to miles, it could be executed by way of a statement like the one below:

SELECT KM_MILES(10) FROM SYSIBM.SYSDUMMY1;

This difference in the means of invocation is one reason why a person might choose to package some data manipulation logic in the form of a UDF as opposed to a stored procedure -- sometimes it's preferable to invoke the packaged functionality via a SELECT (or other SQL DML statement), versus a CALL statement (and sometimes the reverse is true, and a stored procedure will be favored over a UDF).

Now, one of the great things about native SQL procedures is that they significantly expanded the population of people who could develop stored procedures in a DB2 for z/OS environment: if you could write SQL, you could develop a native SQL procedure -- you didn't have to know COBOL or Java or one of the other languages used to develop traditional external DB2 stored procedures (external SQL procedures, introduced with DB2 Version 7 for z/OS, similarly expanded the stored procedure development population, but external SQL procedures execute as C language stored procedures, and many organizations prefer the "in-DB2" nature of native SQL procedures). Development-oriented DBAs, in particular, got heavily involved with stored procedure development at numerous sites, thanks to native SQL procedure technology. In the same way, native SQL UDFs extended DB2 for z/OS UDF development capabilities to people who know SQL but don't know COBOL or Java or C or one of the other external UDF languages.

Here, I have to make an acknowledgement. It is true that UDFs could be written in SQL prior to DB2 10 for z/OS; however, there were significant restrictions placed on such UDFs. Basically, in a pre-DB2 10 system the functionality of a SQL UDF was limited to what you could code in the UDF's RETURN statement. And the SQL expression in that RETURN statement couldn't reference a column name. And, it couldn't contain a scalar fullselect (i.e., you couldn't generate the returned value via a SELECT statement). Also, data-changing SQL statements (INSERT, UPDATE, DELETE) were not allowed (you could specify READS SQL DATA in the CREATE FUNCTION statement, but MODIFIES SQL DATA was not an option). If you wanted to declare variables and assign values to them in your SQL UDF, you were out of luck. Logic control SQL statements, such as GOTO, IF, and WHILE? Nope. Oh, and SQL table UDFs, which return a set of rows versus a single value? Forget about it (or, as my friends up New York City way might say, "fuhgeddaboudit").
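
To illustrate, about the most you could do with a pre-DB2 10 SQL scalar UDF was a single-expression RETURN, along the lines of the KM_MILES function mentioned earlier (a minimal sketch):

CREATE FUNCTION KM_MILES(KM DOUBLE)
  RETURNS DOUBLE
  LANGUAGE SQL
  RETURN KM * 0.621371;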

With DB2 10 (in new-function mode), those restrictions went away. A DB2 10 SQL UDF can call a stored procedure, consume a result set generated by a called stored procedure, declare variables and assign values to those variables, loop through a set of statements, provide condition handlers, and more (and, DB2 10 allows for the coding of row-set-returning table UDFs in SQL). So DB2 10 not only expands the population of people who can develop UDFs (because UDFs can be written in SQL), it also enables SQL UDF developers to code routines that can do all kinds of useful things -- the possibilities have been hugely extended versus the DB2 9 situation. Programming productivity is further enhanced by the fact that DB2 10 SQL UDFs (like native SQL procedures) do not have the external-to-DB2 artifacts and processes associated with external routines -- there is no need for precompile, compile, and linkedit steps; no object libraries; no load libraries; no WLM execution environment (though you can specify a WLM environment to be used for running a SQL UDF in debug mode).
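
Here's a hedged sketch of what the lifted restrictions make possible -- a DB2 10 SQL scalar UDF with a declared variable and logic control (the function name and logic are invented for illustration; note, too, that if you create a function like this via SPUFI, you'll need a statement terminator other than the semicolon):

CREATE FUNCTION TEMP_LABEL(DEGREES_C DOUBLE)
  RETURNS VARCHAR(10)
  LANGUAGE SQL
BEGIN
  DECLARE LABEL VARCHAR(10);
  IF DEGREES_C IS NULL THEN
    RETURN NULL;
  END IF;
  IF DEGREES_C < 0 THEN
    SET LABEL = 'FREEZING';
  ELSEIF DEGREES_C < 25 THEN
    SET LABEL = 'MILD';
  ELSE
    SET LABEL = 'HOT';
  END IF;
  RETURN LABEL;
END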

And what about the performance angle? There are several benefits here. For a client-server workload involving access to DB2 via the DDF address space from DRDA requesters, there's the fact that a SQL UDF runs under the task of the process invoking the UDF, as opposed to running under a TCB in a WLM-managed address space. When the application process is a DRDA requester, the task in the z/OS system is an enclave SRB in the DDF address space, and that means substantial zIIP offload for SQL UDF-related processing. Also related to DDF-using applications: the package associated with a SQL UDF can be bound with RELEASE(DEALLOCATE), providing the CPU efficiency boost associated with high-performance DBATs when the UDF is invoked through a SQL statement issued by a DRDA requester.
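
On that last point: a SQL UDF, like a native SQL procedure, has an associated DB2 package, and to my understanding the RELEASE(DEALLOCATE) specification would be put into effect with a package rebind along these lines (the collection and package names here are hypothetical -- the package for a SQL routine is typically found in a collection matching the routine's schema, with a name based on the routine's name, so verify the particulars on your system first):

REBIND PACKAGE(MYSCHEMA.TEMP_LABEL) RELEASE(DEALLOCATE)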

Another performance benefit has to do with the elimination of "thread-task switching" when SQL UDFs are utilized instead of external UDFs. Here's what I mean by "thread-task switching": when an external UDF is invoked, the DB2 thread of the application process has to be switched from the task of the application process to the task of the external UDF. When the UDF has completed its processing, the DB2 thread is switched back to the task of the UDF-invoking application process. When the UDF is a SQL UDF, there is no "other" task involved, because (as mentioned) the SQL UDF runs under the task of the invoking application process (and runs in the DB2 DBM1 address space); thus, there is no need to switch a DB2 thread from one task to another and back. If a UDF is invoked once in the course of the invoking SQL statement's execution, the lack of thread-task switching in the SQL UDF case may not be noticeable. But what if the UDF is invoked LOTS of times in the course of an SQL statement's execution? I heard recently of an interesting real-world situation of this nature. A DB2 for z/OS DBA told me of a SELECT statement at his site that included a COBOL-coded external UDF in the select-list and in the WHERE clause of the statement. One execution of this statement could drive over 100,000 executions of the external UDF. Statement elapsed time could go to several minutes. Much of that time was UDF time, and the vast majority of UDF elapsed time was shown by a monitor to be TCB wait time. In response to this performance issue, the DBA replaced the COBOL external UDF with a functionally equivalent SQL UDF (the organization is running DB2 10 for z/OS in new-function mode). The result? UDF-related TCB wait time was eliminated, and response time for the query went way down.

Believe me, folks, the ability to code functionally rich, native SQL UDFs is one of the real gems provided with DB2 10 for z/OS. It's a gem that has not received a lot of attention (understandable, given the many "wow" features of DB2 10, such as temporal data support, hash-organized tables, and LOB inlining), but it's one that you shouldn't overlook. Consider how you could put this technology to work in your shop.

Friday, August 30, 2013

DB2 for z/OS: GETPAGEs are not ALWAYS the Key Determinant of SQL Statement CPU Time

In my DB2 for z/OS-related work over the past 25 years, I've done a lot in the way of performance analysis, looking at individual queries and at DB2-accessing application programs (both transactional and batch in nature). Time and time again, sometimes in reviewing trend data and sometimes in examining DB2 monitor output to gauge the impact of performance tuning actions, I've seen that the CPU time consumed in SQL statement execution rises or falls with the associated level of GETPAGE activity (a GETPAGE is essentially a request by DB2 to examine a page from a table or an index). The correlation I observed between CPU cost and GETPAGEs for SQL statements was so high that I came to see GETPAGE activity as the number one determinant of a query's CPU consumption in a mainframe DB2 system (and I declared as much in a blog entry that I wrote a few years ago while working as an independent DB2 consultant).

Well, a lot of rules have exceptions, and that turned out to be the case for this rule of thumb. I got some information from a DB2 for z/OS DBA that turned my notion of the relationship between GETPAGEs and CPU cost for SQL statements on its head. The data I saw seemed at first to be very strange, but as I dug deeper it started to make sense. I still see GETPAGEs as being, more often than not, the most important factor with regard to a query's CPU cost, but it's not the only factor, and sometimes other aspects of SQL statement execution exert a greater influence on CPU consumption. In this blog entry I'll show you why GETPAGEs are not always the strongest flavor in the mix of determinants that influence CPU consumption for DB2 for z/OS-accessing queries.

Interestingly, the case about which I'm writing involved data compression. That's a DB2 for z/OS feature that's been around a LONG time (around 20 years or so), and which is VERY widely used. Data compression is super-popular at mainframe DB2 sites because it saves a ton of disk space and it's virtually free, overhead-wise. Why is DB2 for z/OS data compression so CPU-efficient? Two reasons: first, it gets a hardware assist from System z servers (a big difference versus a software-only implementation of data compression). The second reason is GETPAGE-related: because compression often allows you to get three to four times as many rows in a page versus a non-compressed table space, the number of GETPAGEs required to access data frequently drops considerably when a table space is compressed. Yes, even hardware-assisted data compression has a cost, but many DB2 for z/OS-using organizations have found that the reduction in GETPAGE activity that goes along with compression largely offsets the CPU cost of compressing and decompressing data rows; thus, the near-zero overhead effect of DB2 compression observed in many systems.
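
As a side note, turning compression on is a one-line DDL change -- a minimal sketch, with hypothetical object names:

ALTER TABLESPACE MYDB.MYTS COMPRESS YES;
-- A subsequent online REORG of MYDB.MYTS builds the compression
-- dictionary and compresses the rows already in the table space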

Imagine my surprise, then, when the aforementioned DBA presented me with DB2 monitor data showing the effect of data compression on a DB2 for z/OS-accessing query. I saw that the number of GETPAGEs associated with execution of the query dropped by 65% after the target table space was compressed. No surprise there. I also saw that the query's elapsed time dropped by a whopping 92%. Not a huge surprise there, either. The lion's share of elapsed time for many a query is wait-for-I/O time, and with many fewer pages having to be read from disk to obtain the query's result set, a much-reduced run time would be expected. Then came the unexpected result: the query's CPU time went up by 31% following compression of the target table space.

What?!? How could that be? How could you have a 65% reduction in GETPAGEs and a 31% increase in CPU time? Sure, I wouldn't expect CPU time to go down by the percentage of GETPAGE reduction, because some CPU cost does get added when COMPRESS YES is put into effect for a table space. Still, though, I would have expected either a modest decrease in the query's CPU cost, or at worst a break-even situation, with the lower level of GETPAGE activity cancelling out the CPU expense of decompressing result set rows. This just didn't make sense to me.

My confusion started to dissipate as I learned more about the query in question, and thought more about what DB2 was doing in executing the query. The query targeted a table with about 3 million rows, and 600,000 of those rows had to be examined in order to generate the result set. The large number of rows needing evaluation was due to the fact that the primary search argument -- the one that really filtered out a lot of rows -- was a non-indexable LIKE predicate used to find a particular character string that could occur anywhere within a certain column of the table (the leading % in the LIKE pattern made the predicate non-indexable; an index on the column could have been used in executing the query had the pattern contained only a trailing % -- that is, LIKE 'string%' as opposed to the LIKE '%string%' form that enabled location of the character string anywhere in the referenced column). Another predicate in the query -- an indexable one -- filtered out 80% of the table's rows, but the remaining 600,000 rows had to be individually examined for the existence of the required character string in the referenced column, as noted.
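
The indexability difference comes down to where the wildcard sits in the LIKE pattern. A hypothetical illustration (table and column names are made up):

-- Non-indexable: the leading % means the search string can begin
-- anywhere in the column value, so an index on NOTE_TEXT can't be
-- used to narrow the search
SELECT ORDER_ID FROM ORDER_NOTES WHERE NOTE_TEXT LIKE '%WIDGET%';

-- Indexable: with only a trailing %, the match is anchored at the
-- start of the column value, and a matching index can be used
SELECT ORDER_ID FROM ORDER_NOTES WHERE NOTE_TEXT LIKE 'WIDGET%';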

Here's where the compression effect kicked in. See, when rows in a compressed page are evaluated against a query search argument, they are individually decompressed, one at a time. Sometimes (very often, in fact), an index can be used to identify the small number of qualifying rows that must be examined in a given table page (or simply retrieved from the page, if result set row qualification can be fully determined at the index level), and decompressing those few rows in the page will be a relatively small component of the SQL statement's overall CPU cost. Other times (and this was the case for the query that the DBA discussed with me), an index can be used to reduce the number of table pages requiring access in query execution, but in a given page there might be 30 or 40 or more rows that have to be decompressed and evaluated (and maybe retrieved). When decompression cost is driven by the expansion of 30 or 40 rows in a page, versus 3 or 4 rows, that cost can rise to the point that it exceeds the cost reduction associated with diminished GETPAGE activity for a compressed table space. That's exactly what happened in the situation I've described. Was the compression performance result reported by the DBA good or bad? That depends on whether the performance objective for the query was improved response time (in which case the big reduction seen for the query's elapsed time with the compressed table space would be a positive outcome) or lower CPU consumption (this particular query consumed less CPU time when the target table space was not compressed).

It all goes to show that rules of thumb are best thought of as guidelines, as opposed to absolutes. You still have to use your brain in analyzing query performance, and you have to be ready to dig below the surface of a situation. GETPAGE activity is indeed a very important factor in determining a query's CPU cost in a DB2 for z/OS system, and tuning actions that reduce GETPAGEs are generally beneficial with regard to SQL statement CPU efficiency. What you have to remember is that GETPAGE activity is one of several determinants of query performance. When DB2 has to do something to a lot of rows in each page accessed in the execution of a query -- things like decompression, value transformation via scalar functions, casting of one data type to another, arithmetic operations, etc. -- the CPU cost of GETPAGEs can become a relatively smaller part of the query performance big picture. Consider what DB2 has to do with the rows on a page -- and with how many of the rows on that page -- once it gets the page. There will be times when improving the CPU efficiency of a query is not just a matter of reducing the query's GETPAGE count. Also, have a good understanding of your query tuning objective: is it reduced elapsed time, or reduced CPU consumption, or both?

Rules of thumb can be useful in guiding your query tuning efforts. Don't let them become blinders that prevent you from thinking outside of the box.

Friday, August 9, 2013

DB2 for z/OS: Get Your DDF-Related Dispatching Priorities Right

The DB2 for z/OS Distributed Data Facility (DDF), the component of DB2 that provides data services for network-attached client applications, is not a new kid on the block -- it's been around for more than two decades. In recent years, however, DDF has gained in importance for many organizations as client-server activity has grown as a percentage of the overall DB2 workload (in a blog entry posted a couple of years ago, I wrote about a way in which a DB2 monitor accounting report can be used to determine the relative size of a DDF workload in a DB2 system). At numerous sites, DDF-related activity is the fastest growing component of the overall DB2 for z/OS workload, and in more and more cases it is already the largest DB2 workload component on a mainframe system (a DBA recently told me that more than 95% of the SQL processing on his company's production DB2 for z/OS system is DDF-related).

As a DDF workload grows, it becomes more important that the associated execution environment be set up for optimal performance and throughput. A key aspect of this effort is the proper assignment of dispatching priorities (by way of a z/OS WLM policy) to DDF-related tasks in the system, and by "tasks" I'm referring both to the "main tasks" of address spaces (the DDF address space and, if applicable, stored procedure address spaces) and to tasks associated with application processes sending SQL requests to DB2 via DDF. Lots of organizations have done this in a way that I'd consider to be spot-on. That said, I've seen that some companies have their DDF-related dispatching priorities set up in a less-than-ideal way. In this blog entry I'll give you my recommendations on assigning priorities to DDF-related tasks in your system.

First, let's consider the DDF address space itself (it's also known as the DIST address space). Too often, I see that people have assigned a too-low priority to DDF. The fact of the matter is that the DDF address space should have the same priority as the other DB2 address spaces (those being the database services and system services address spaces, also known as DBM1 and MSTR, respectively). The DB2 DBM1 and MSTR address spaces are typically assigned to a service class with a name like STCHIGH or STCHI, and that is where the DIST address space should be, too (the IRLM address space should be assigned to the SYSSTC service class). Why is DDF often assigned to a service class with a priority below that of the service class used for DBM1 and MSTR? I believe that it has to do with two things: 1) a belief that SQL getting to DB2 via DDF will execute at the priority of the DDF address space, and 2) a misunderstanding of DDF CPU utilization in general. With regard to that first factor (DDF-related SQL executing at DDF's priority), this is something that was once true but stopped being true a LONG time ago. For years and years it's been the case that SQL coming through DDF executes under tasks called enclave SRBs (more specifically, preemptible SRBs), and the priority of these tasks is determined according to the service class to which network-attached application processes have been mapped in the active WLM policy (more on this momentarily) -- NOT by the priority of the DDF address space.

As for the general misunderstanding of DDF CPU utilization on the part of some folks, that's a topic on which I blogged last year. There are people who see a high level of DDF CPU utilization in an environment with a large amount of DB2 client-server activity, and don't know what to make of that. Were they to do a little investigation using DB2 monitor data (as I described in the aforementioned blog entry), they'd see that only a very small portion of that DDF CPU time is associated with DDF "system" tasks (those that run at the priority of the DDF address space). The vast majority of DDF CPU time simply reflects the cost of executing the SQL statements that get to DB2 via DDF (just as the CPU cost of SQL statements issued by CICS-DB2 transactions is charged to the associated CICS region), and those statements, as noted in the preceding paragraph, execute at a priority dictated by the service class (or classes) to which network-attached application requesters have been mapped in the system's WLM policy. The DDF "system" tasks use very little CPU, but when those tasks need CPU time they need it RIGHT AWAY, and if they can't be dispatched in a very timely manner then the throughput of the DDF application workload could be negatively impacted in a busy system.

So, do the right thing and assign the DDF address space to the same high-priority service class that you use for the DB2 DBM1 and MSTR address spaces; and, while we're on the subject of address spaces, I'll tell you that your WLM-managed stored procedure address spaces (if you use external DB2 stored procedures) should also have the same priority as DBM1 and MSTR, and for the same reason that I make this recommendation for DDF: the "main tasks" of these stored procedure address spaces use very little CPU time, and they need to be very quickly dispatched when they have work to do if you want optimal stored procedure performance. A stored procedure that executes in one of these WLM-managed address spaces inherits the dispatching priority of the application task that called it -- it does NOT run at the priority of the stored procedure address space (a native SQL procedure, which runs in the DB2 DBM1 address space, also inherits the priority of the task of the application process that called it).

Now, about the DDF application tasks (versus the DDF "system" tasks): these, as I mentioned previously, are preemptible enclave SRBs in the DDF address space. We've established that the priority of these tasks is NOT tied to the priority of the DDF address space; rather, it is determined according to the service class (or classes) to which network-attached application processes are assigned in the system's WLM policy. But what if your WLM policy does no such mapping? What then? Well, in that case, your network-attached application processes will, by default, be mapped to a service class called SYSOTHER. That service class has a priority called "discretionary." That's a nice way of saying "bottom feeder." It is a LOW priority -- almost certainly NOT the one you want for your DDF transactions (unless you like providing DDF users with poor performance). At the very least, you should have a "catch-all" service class for DDF-connected applications that has a modestly high dispatching priority. Of course, if you have a large and varied DDF workload (some business intelligence queries, some high-volume transactions associated with a vendor-supplied ERP application, some processes related to a master data management application, etc.), you can -- and probably should -- map different parts of the DDF workload to different service classes. There are multiple attributes that you can use to tie various network-attached application processes to different service classes, including collection name, procedure name, and primary DB2 authorization ID; and, for a given service class you can use period aging to progressively lower a task's priority as its execution time increases (particularly useful for workloads characterized by long-running queries). In all cases, I recommend keeping the priority of DDF application tasks at least a few notches below the priority of the DB2 address spaces (DBM1, MSTR, DIST, and -- if applicable -- stored procedure address spaces). You don't want application code running at a priority higher than that of system code. Really useful information on setting up a WLM policy for a DB2 DDF workload can be found in section 3.3 ("Workload Manager setup") of the IBM redbook titled "DB2 9 for z/OS: Distributed Functions" (very much applicable to a DB2 10 system).
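
To make that a bit more concrete, here is a rough, hypothetical sketch of DDF classification rules as they might appear in the WLM ISPF application. The qualifier types shown -- SI (DB2 subsystem instance), CN (collection name), and UI (primary authorization ID) -- are among those available for subsystem type DDF, but the names, service classes, and nesting are invented for illustration:

Subsystem type: DDF

  #   Qualifier   Qualifier   Service   Report
      type        name        class     class
  1   SI          DB2P        DDFDEF
  2     CN        ERPCOLL*    DDFERP    RERPDDF
  2     UI        BIUSER*     DDFBI     RBIDDF

In this sketch, DDF requests hitting the DB2P subsystem land in the catch-all DDFDEF service class unless a nested level-2 rule matches first, with ERP transactions (identified by collection name) and BI queries (identified by authorization ID) routed to their own service classes -- and a class like DDFBI would be a natural candidate for period aging.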

There you have it. Make sure that you understand how dispatching priorities are assigned to DDF system and application tasks, and make sure that your WLM policy is set up to help deliver the kind of DB2 data-serving performance that your client-server application users want.

Thursday, July 25, 2013

DB2 for z/OS: Clearing Up Some Matters Pertaining to LOB Inlining

Plenty of folks know that LOB inlining is one of the really great enhancements delivered with DB2 10 for z/OS (it's something about which I blogged last year). I've noticed lately, however, that a fair number of people have some misconceptions regarding two important aspects of LOB inlining, namely the relationship between inlining and LOB table spaces, and scenarios in which LOB inlining is and is not a good idea from an application performance perspective. In this entry I'll aim to clear up these misunderstandings.

LOB table spaces: You need them. Period (but DB2 can help)

Some DBAs have this idea that LOB inlining can eliminate the requirement that there be an auxiliary table (and associated table space and index) for every LOB column in a table definition (and that's actually one auxiliary table per LOB column per partition, if the base table space is partitioned). This line of thinking is understandable: If the longest value that will go into a LOB column is no longer than the inline length specified for that column, so that all of the column's values can be 100% inlined, there's no need for an auxiliary table for that column, right?

WRONG. You MUST have an auxiliary table (and associated LOB table space and index) if you are going to have a LOB column in a base table, even if the values in the LOB column will always be 100% inlined. If these objects do not exist, you will not be able to insert any data into the table because the definition of the table will be considered by DB2 to be incomplete. So, avoiding the need for an auxiliary table and a LOB table space and an index is NOT a reason to go with LOB inlining, because LOB inlining does nothing to change the requirement that these objects exist before the table with the LOB column can be used.

Now, creating one measly auxiliary table, and one LOB table space to hold that table, and one index on the auxiliary table (used to quickly access a LOB value associated with a particular base table row) is not exactly a tall order, but recall that you need one set of these objects per partition and per column if the base table is partitioned and has multiple LOB columns. Do the math, and the DDL work can start to look kind of intimidating. What if a base table has 1000 partitions and two LOB columns? Ready to create 2000 auxiliary tables, and the same number of LOB table spaces and indexes?

Before you get all wound up about such a prospect, consider that DB2 can automatically create required LOB-related objects for you when you create a table with one or more LOB columns. DB2 will do that if EITHER of the following is true:
  • The CREATE TABLE statement for the base table (the table with the LOB column(s)) does NOT include an "in database-name.table-space-name" clause. In that case, DB2 will implicitly create the database for the base table space, the base table space itself, and all other objects needed to make the base table usable (e.g., a unique index on the table's primary key if the CREATE TABLE statement designated a primary key, and all objects needed for LOB data if the CREATE TABLE statement included one or more LOB columns).
  • The CREATE TABLE statement for the base table DOES include an "in database-name.table-space-name" clause, and the value of the DB2 special register CURRENT RULES is 'STD' at the time of the execution of the CREATE TABLE statement.
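
Here's a minimal sketch of the first route (all names are hypothetical). Because there is no "in database-name.table-space-name" clause, DB2 implicitly creates the database, the base table space, the primary key index, and -- for the LOB column -- the LOB table space, auxiliary table, and auxiliary index:

CREATE TABLE CLAIM
  (CLAIM_ID  INTEGER  NOT NULL PRIMARY KEY,
   CLAIM_DOC CLOB(1M) INLINE LENGTH 2000);
-- INLINE LENGTH 2000 keeps the first 2000 bytes of each document
-- in the base table (more on inlining below)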

Performance: When LOB inlining helps, and when it doesn't

As I see it, the primary ways in which LOB inlining delivers benefits in terms of application performance and resource-efficiency are as follows:
  • Disk space savings, if a high percentage of a table's LOB values can be 100% inlined in the base table. In such a situation, the disk space requirement for LOB data is reduced in two ways: 1) Compression. Data in a LOB table space cannot be compressed by DB2; however, inlined LOB data values will be compressed by DB2, along with non-LOB data in the base table space. 2) More efficient use of data page space. In a LOB table space, a given page can hold data belonging to only one LOB value. If, for example, a LOB table space has 8 KB-sized pages, and a particular LOB value is 9 KB in length, the first 8 KB of the LOB value will be in one page and the last 1 KB will be in a second page in the LOB table space. The rest of that second page (7 KB of space) will remain empty because it cannot be used to hold data for any other LOB value. In a base table space, of course, there is no such rule, so inlined LOB data can lead to more efficient use of space in data pages.
  • Improved performance of INSERT and SELECT operations (for SELECTs retrieving LOB data), when most LOB values can be 100% inlined in the base table. The performance gains here can be quite significant versus the non-inlining case.
  • Ability to create an index on an expression on the inlined portion of a LOB column. Such an index would be created using an expression based on the SUBSTR function. This could be very useful if, for example, you store a type of document as a CLOB and a value of interest (maybe a department number) always appears in characters 10 through 14 of the document. You could build an index on a SUBSTR expression on the inlined portion of the LOB, and therefore be able to very quickly zero in on rows containing documents pertaining to department 'AB123' (I posted an entry about DB2's index-on-expression capability -- introduced with DB2 9 for z/OS -- to the blog I maintained while I was working as an independent DB2 consultant prior to re-joining IBM).
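
Continuing with the hypothetical CLAIM table sketched earlier, such an index might look like this (the SUBSTR reference has to fall entirely within the column's inline length):

CREATE INDEX IX_CLAIM_DEPT
  ON CLAIM (SUBSTR(CLAIM_DOC, 10, 5));

A predicate such as WHERE SUBSTR(CLAIM_DOC, 10, 5) = 'AB123' could then be resolved via the index.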

Clearly, LOB inlining can be used very advantageously in some cases. In other cases, LOB inlining could negatively impact application performance. Here are some potential disadvantages of LOB inlining:
  • Performance degradation for INSERT and SELECT operations (for SELECTs retrieving LOB data) when most LOB values cannot be 100% inlined in the base table. Performance would be negatively impacted because DB2 would have to go to both the base table and the auxiliary table for most inserts and retrievals of LOB data.
  • Performance degradation for SELECTs that DO NOT retrieve LOB data. When you inline LOB data, you make the base table rows longer (sometimes much longer). As a result, you'll have fewer base table rows per page, and because of that you'll get a lower buffer pool hit ratio for the base table. That means more disk I/Os, and that will impact elapsed time.

Here's the bottom line: If LOB data will be stored in a table but only rarely retrieved, inlining probably isn't a good idea unless it is very important to improve INSERT (or LOAD) performance (and that won't happen unless most LOB values can be 100% inlined in the base table). If you go with inlining for this reason, consider enlarging the buffer pool to which the base table space is assigned so the buffer pool hit ratio won't be negatively impacted. LOB inlining can have a very positive impact on the performance of queries that retrieve LOB data, so if that is important to you then inlining can be a very good move when most of the LOB values can be 100% inlined in the base table. Again, consider enlarging the base table space's buffer pool so that queries that do not retrieve LOB data won't be negatively impacted by a reduced buffer pool hit ratio.
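
Enlarging a buffer pool is an online change made with a simple command -- a hedged example, assuming (hypothetically) that the base table space is assigned to BP8 and you want to grow that pool to 100,000 buffers:

-ALTER BUFFERPOOL(BP8) VPSIZE(100000)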

DB2 10 is by far the best release of DB2 in terms of LOB data management capabilities, and LOB inlining is an important part of that story. The important thing to keep in mind is that LOB inlining is not a universally applicable DB2 feature. Figure out first if inlining makes sense for your particular situation.