Friday, August 30, 2013

DB2 for z/OS: GETPAGEs are not ALWAYS the Key Determinant of SQL Statement CPU Time

In my DB2 for z/OS-related work over the past 25 years, I've done a lot in the way of performance analysis, looking at individual queries and at DB2-accessing application programs (both transactional and batch in nature). Time and time again, sometimes in reviewing trend data and sometimes in examining DB2 monitor output to gauge the impact of performance tuning actions, I've seen that the CPU time consumed in SQL statement execution rises or falls with the associated level of GETPAGE activity (a GETPAGE is essentially a request by DB2 to examine a page from a table or an index). The correlation I observed between CPU cost and GETPAGEs for SQL statements was so high that I came to see GETPAGE activity as the number one determinant of a query's CPU consumption in a mainframe DB2 system (and I declared as much in a blog entry that I wrote a few years ago while working as an independent DB2 consultant).

Well, a lot of rules have exceptions, and that turned out to be the case for this rule of thumb. I got some information from a DB2 for z/OS DBA that turned my notion of the relationship between GETPAGEs and CPU cost for SQL statements on its head. The data I saw seemed at first to be very strange, but as I dug deeper it started to make sense. I still see GETPAGEs as being, more often than not, the most important factor with regard to a query's CPU cost, but it's not the only factor, and sometimes other aspects of SQL statement execution exert a greater influence on CPU consumption. In this blog entry I'll show you why GETPAGEs are not always the strongest flavor in the mix of determinants that influence CPU consumption for DB2 for z/OS-accessing queries.

Interestingly, the case about which I'm writing involved data compression. That's a DB2 for z/OS feature that's been around a LONG time (around 20 years or so), and which is VERY widely used. Data compression is super-popular at mainframe DB2 sites because it saves a ton of disk space and it's virtually free, overhead-wise. Why is DB2 for z/OS data compression so CPU-efficient? Two reasons: first, it gets a hardware assist from System z servers (big difference versus a software-only implementations of data compression). The second reason is GETPAGE-related: because compression often allows you to get three to four times as many rows in a page versus a non-compressed table space, the number of GETPAGEs required to access data frequently drops considerably when a table space is compressed. Yes, even hardware-assisted data compression has a cost, but many DB2 for z/OS-using organizations have found that the reduction in GETPAGE activity that goes along with compression largely offsets the CPU cost of compressing and decompressing data rows; thus, the near-zero overhead effect of DB2 compression observed in many systems.

Imagine my surprise, then, when the aforementioned DBA presented me with DB2 monitor data showing the effect of data compression on a DB2 for z/OS-accessing query. I saw that the number of GETPAGEs associated with execution of the query dropped by 65% after the target table space was compressed. No surprise there. I also saw that the query's elapsed time dropped by a whopping 92%. Not a huge surprise there, either. The lion's share of elapsed time for many a query is wait-for-I/O time, and with many fewer pages having to be read from disk to obtain the query's result set, a much-reduced run time would be expected. Then came the not-expected result: the query's CPU time went up by 31% following compression of the target table space.

What?!? How could that be? How could you have a 65% reduction in GETPAGEs and and a 31% increase in CPU time? Sure, I wouldn't expect CPU time to go down by the percentage of GETPAGE reduction, because some CPU cost does get added when COMPRESS YES is put into effect for a table space. Still, though, I would have expected either a modest decrease in the query's CPU cost, or at worst a break-even situation, with the lower level of GETPAGE activity cancelling out the CPU expense of decompressing result set rows. This just didn't make sense to me.

My confusion started to dissipate as I learned more about the query in question, and thought more about what DB2 was doing in executing the query. The query targeted a table with about 3 million rows, and 600,000 of those rows had to be examined in order to generate the result set. The large number of rows needing evaluation was due to the fact that the primary search argument -- the one that really filtered out a lot of rows -- was a non-indexable LIKE predicate used to find a particular character string that could occur anywhere within a certain column of the table (the % in front of the LIKE made the predicate non-indexable; an index on the column could have been used in executing the query had there been only a % after the LIKE, versus the %LIKE% that enabled location of the character string anywhere in the referenced column). Another predicate in the query -- an indexable one -- filtered out 80% of the table's rows, but the remaining 600,000 rows had to be individually examined for the existence of the required character string in the referenced column, as noted.

Here's where the compression effect kicked in. See, when rows in a compressed page are evaluated against a query search argument, they are individually decompressed, one at a time. Sometimes (very often, in fact), an index can be used to identify the small number of qualifying rows that must be examined in a given table page (or simply retrieved from the page, if result set row qualification can be fully determined at the index level), and decompressing those few rows in the page will be a relatively small component of the SQL statement's overall CPU cost. Other times (and this was the case for the query that the DBA discussed with me), an index can be used to reduce the number of table pages requiring access in query execution, but in a given page there might be 30 or 40 or more rows that have to be decompressed and evaluated (and maybe retrieved). When decompression cost is driven by the expansion of 30 or 40 rows in a page, versus 3 or 4 rows, that cost can rise to the point that it exceeds the cost reduction associated with diminished GETPAGE activity for a compressed table space. That's exactly what happened in the situation I've described. Was the compression performance result reported by the DBA good or bad? That depends on whether the performance objective for the query was improved response time (in which case the big reduction seen for the query's elapsed time with the compressed table space would be a positive outcome) or lower CPU consumption (this particular query consumed less CPU time when the target table space was not compressed).

It all goes to show that rules of thumb are best thought of as guidelines, as opposed to absolutes. You still have to use your brain in analyzing query performance, and you have to be ready to dig below the surface of a situation. GETPAGE activity is indeed a very important factor in determining a query's CPU cost in a DB2 for z/OS system, and tuning actions that reduce GETPAGEs are generally beneficial with regard to SQL statement CPU efficiency. What you have to remember is that GETPAGE activity is one of several determinants of query performance. When DB2 has to do something to a lot of rows in each page accessed in the execution of a query -- things like decompression, value transformation via scalar functions, casting of one data type to another, arithmetic operations, etc. -- the CPU cost of GETPAGEs can become a relatively smaller part of the query performance big picture. Consider what DB2 has to do with the rows on a page -- and with how many of the rows on that page -- once it gets the page. There will be times when improving the CPU efficiency of a query is is not just a matter of reducing the query's GETPAGE count. Also, have a good understanding of your query tuning objective: is it reduced elapsed time, or reduced CPU consumption, or both?

Rules of thumb can be useful in guiding your query tuning efforts. Don't let them become blinders that prevent you from thinking outside of the box.

Friday, August 9, 2013

DB2 for z/OS: Get Your DDF-Related Dispatching Priorities Right

The DB2 for z/OS Distributed Data Facility (DDF), the component of DB2 that provides data services for network-attached client applications, is not a new kid on the block -- it's been around for more than two decades. In recent years, however, DDF has gained in importance for many organizations as client-server activity has grown as a percentage of the overall DB2 workload (in a blog entry posted a couple of years ago, I wrote about a way in which a DB2 monitor accounting report can be used to determine the relative size of a DDF workload in a DB2 system). At numerous sites, DDF-related activity is the fastest growing component of the overall DB2 for z/OS workload, and in more and more cases it is already the largest DB2 workload component on a mainframe system (a DBA recently told me that more than 95% of the SQL processing on his company's production DB2 for z/OS system is DDF-related).

As a DDF workload grows, it becomes more important that the associated execution environment be set up for optimal performance and throughput. A key aspect of this effort is the proper assignment of dispatching priorities (by way of a z/OS WLM policy) to DDF-related tasks in the system, and by "tasks" I'm referring both to the "main tasks" of address spaces (the DDF address space and, if applicable, stored procedure address spaces) and to tasks associated with application processes sending SQL requests to DB2 via DDF. Lots of organizations have done this in a way that I'd consider to be spot-on. That said, I've seen that some companies have their DDF-related dispatching priorities set up in a less-than-ideal way. In this blog entry I'll give you my recommendations on assigning priorities to DDF-related tasks in your system.

First, let's consider the DDF address space itself (it's also known as the DIST address space). Too often, I see that people have assigned a too-low priority to DDF. The fact of the matter is that the DDF address space should have the same priority as the other DB2 address spaces (those being the database services and system services address spaces, also known as DBM1 and MSTR, respectively). The DB2 DBM1 and MSTR address spaces are typically assigned to a service class with a name like STCHIGH or STCHI, and that is where the DIST address space should be, too (the IRLM address space should be assigned to the SYSSTC service class). Why is DDF often assigned to a service class with a priority below that of the service class used for DBM1 and MSTR? I believe that it has to do with two things: 1) a belief that SQL getting to DB2 via DDF will execute at the priority of the DDF address space, and 2) a misunderstanding of DDF CPU utilization in general. With regard to that first factor (DDF-related SQL executing at DDF's priority), this is something that was once true but stopped being true a LONG time ago. For years and years its been the case that SQL coming through DDF executes under tasks called enclave SRBs (more specifically, preemptible SRBs), and the priority of these tasks is determined according to the service class to which network-attached application processes have been mapped in the active WLM policy (more on this momentarily) -- NOT by the priority of the DDF address space.

As for the general misunderstanding of DDF CPU utilization on the part of some folks, that's a topic on which I blogged last year. There are people who see a high level of DDF CPU utilization in an environment with a large amount of DB2 client-server activity, and don't know what to make of that. Were they to do a little investigation using DB2 monitor data (as I described in the aforementioned blog entry), they'd see that only a very small portion of that DDF CPU time is associated with DDF "system" tasks (those that run at the priority of the DDF address space). The vast majority of DDF CPU time simply reflects the cost of executing the SQL statements that get to DB2 via DDF (just as the CPU cost of SQL statements issued by CICS-DB2 transactions is charged to the associated CICS region), and those statements, as noted in the preceding paragraph, execute at a priority dictated by the service class (or classes) to which network-attached application requesters have been mapped in the system's WLM policy. The DDF "system" tasks use very little CPU, but when those tasks need CPU time they need it RIGHT AWAY, and if they can't be dispatched in a very timely manner then the throughput of the DDF application workload could be negatively impacted in a busy system.

So, do the right thing and assign the DDF address space to the same high-priority service class that you use for the DB2 DBM1 and MSTR address spaces; and, while we're on the subject of address spaces, I'll tell you that your WLM-managed stored procedure address spaces (if you use external DB2 stored procedures) should also have the same priority as DBM1 and MSTR, and for the same reason that I make this recommendation for DDF: the "main tasks" of these stored procedure address spaces use very little CPU time, and they need to be very quickly dispatched when they have work to do if you want optimal stored procedure performance. A stored procedure that executes in one of these WLM-managed address spaces inherits the dispatching priority of the application task that called it -- it does NOT run at the priority of the stored procedure address space (a native SQL procedure, which runs in the DB2 DBM1 address space, also inherits the priority of the task of the application process that called it).

Now, about the DDF application tasks (versus the DDF "system" tasks): these, as I mentioned previously, are preemptible enclave SRBs in the DDF address space. We've established that the priority of these tasks is NOT tied to the priority of the DDF address space; rather, they are determined according to the service class (or classes) to which network-attached application processes are assigned in the system's WLM policy. But what if your WLM policy does no such mapping? What then? Well, in that case, your network-attached application processes will, by default, be mapped to a service class called SYSOTHER. That service class has a priority called "discretionary." That's a nice way of saying "bottom feeder." It is a LOW priority -- almost certainly NOT the one you want for your DDF transactions (unless you like providing DDF users with poor performance). At the very least, you should have a "catch-all" service class for DDF-connected applications that has at least a modestly high dispatching priority. Of course, if you have a large and varied DDF workload (some business intelligence queries, some high-volume transactions associated with a vendor-supplied ERP application, some processes related to a master data management application, etc.), you can -- and probably should -- map different parts of the DDF workload to different service classes. There are multiple attributes that you can use to tie various network-attached application processes to different service classes, including collection name, procedure name, and primary DB2 authorization ID; and, for a given service class you can use period aging to progressively lower a task's priority as its execution time increases (particularly useful for workloads characterized by long-running queries). In all cases, I recommend keeping the priority of DDF application tasks at least a few notches below the priority of the DB2 address spaces (DBM1, MSTR, DIST, and -- if applicable -- stored procedure address spaces). You don't want application code running at a priority higher than that of system code. Really useful information on setting up a WLM policy for a DB2 DDF workload can be found in section 3.3 ("Workload Manager setup") of the IBM redbook titled "DB2 9 for z/OS: Distributed Functions" (very much applicable to a DB2 10 system).

There you have it. Make sure that you understand how dispatching priorities are assigned to DDF system and application tasks, and make sure that your WLM policy is set up to help deliver the kind of DB2 data-serving performance that your client-server application users want.