Monday, April 21, 2014

DB2 for z/OS: the Importance of Historical Performance Data

Just within the past couple of weeks, I've received questions of a similar nature from two different DB2 for z/OS people. In each message, a DBA described a change that he had made in a production DB2 environment (in one case a size increase for several buffer pools, and in the other a change from an external stored procedure to a native SQL procedure). The questions asked pertained to assessing the impact of the change on system and/or application performance.

I've been fielding questions of this nature for a long time. In every such instance, my first question is pretty much the same: do you have accounting and statistics trace information for the DB2 subsystem in question from before the time of the change? If the answer to that question is, "No," I know that assessing the impact of the DB2 system or application change will be less easy -- maybe a lot less easy -- than it otherwise would have been. In this blog entry, I hope to convey to you the importance of having historical DB2 for z/OS performance information on hand (ideally, information that is based on DB2 accounting and statistics trace information), and to provide you with some related guidance and recommendations.

Why you need "before" performance data

In a nutshell: because the value of your "after" data (meaning, from after the change that you made and want to assess) is considerably diminished if there's no "before" data that can be used for comparative purposes. This is true not only for situations in which you want to determine the performance impact of an alteration of a DB2 setting (e.g., a change to PGFIX(YES) for a buffer pool), or of some aspect of a database object's physical design (e.g., going to LOCKSIZE ROW for a table space) or of DB2-accessing application code (e.g., going to multi-row INSERT for a program) -- it's also applicable to situations in which something's gone wrong, performance-wise (perhaps a transaction's response time has quadrupled, or a batch job is running past a time specified in an SLA), and you need to get to the root of the problem. Online monitoring is great for seeing things from a "now" perspective, but sometimes you need the "then" view of things, too, to avoid going down dead-end streets in your pursuit of a solution.

Your historical data raw materials

First, you need to have DB2 writing trace records to data sets on disk -- not just to the in-memory buffers that are used by performance monitors. In my experience, DB2 trace records are most often written to SMF data sets (GTF is another option). As for the trace classes to have active, I'd go with accounting trace classes 1, 2, and 3, and statistics trace classes 1, 3, 4, 5, and 6 (class 5 is applicable to a DB2 data sharing system). The combined overhead of running all these trace classes should be less than 5%. There are organizations that like to have DB2 accounting trace classes 7 and 8 on, too, to provide package-level information. I agree that package-level figures can be very useful, but at some sites there is concern about the volume of trace records generated via accounting trace classes 7 and 8. If that's a concern at your shop, you could leave classes 7 and 8 off most of the time, and activate them on-demand if you have a need for package-level accounting data. Here's another idea: leave accounting trace classes 7 and 8 on, and use the new (with DB2 10) ZPARM parameter SMFCOMP to activate compression of DB2-generated SMF trace records. You're likely to get a disk space reduction of 60-80% for these compressed SMF records, and the CPU overhead associated with the compression work is likely to be very small (like 1% or less). Before compressing SMF trace records, verify that your DB2 monitor can handle SMF records in compressed form (IBM's OMEGAMON DB2 monitor can) -- if your monitor can't, you can use DB2 sample program DSNTSMFD, provided via APAR PM27872, to decompress DB2 SMF trace records that have been compressed).

Options for keeping historical performance data in a space-efficient way

With DB2 trace records being written to SMF data sets, you then need to decide how long to keep the data around. This is going to have something to do with the volume of trace data produced on a daily basis (workload dependent -- lots more DB2 trace data will be generated at some sites versus others), and the amount of space available for the data in your disk subsystem. Personally, I'd like to have at least 60 days of DB2 accounting and statistics trace data available to me -- 30 days at a minimum. What if you'd like to keep a rolling 60 days of DB2 trace data but don't have the space needed to hold that much in its raw SMF form (even with SMF data compression turned on)? In that case, you have at least a couple of options, which I'll briefly describe below.

Option 1: print to disk. Have a batch process kick off once each night to generate (by way of your DB2 monitor's batch reporting function) a standard set of accounting and statistics reports, and "print" those reports to disk. At a DB2-using site at which I worked in the early- and mid-2000s, we had such reports written to a set of GDGs, with each GDG having 60 data sets (thus, an automatically maintained rolling 60 days of reports). If you go with the "print to disk" approach, I'd recommend generating, each night (using the previous days' SMF records), the following reports:
  • At least one statistics long report for each production DB2 subsystem. That report should have FROM and TO times that will capture 24 hours of system activity (midnight-to-midnight), and the data should be summarized at 1-hour intervals (so, one report would consist of twenty-four 1-hour subreports). If you are running DB2 in data sharing mode, in addition to member-level statistics long reports generate one group-level statistics long report for each production data sharing group.
  • At least one accounting long report for each production DB2 subsystem. If you just generate one report, I'd go with a report that captures 24 hours of data (as with the previously mentioned statistics long report, this should be midnight-to-midnight, with data summarized at 1-hour intervals), and I'd have the data ordered by connection type (i.e., aggregated at the connection type level, so that I could see, for a given time period, the CICS-DB2 activity, the DRDA activity, the call attach activity, etc.). If you want to slice and dice things a little more finely, consider generating at least a couple more accounting long reports each night: 1) for your CICS-DB2 and batch-DB2 activity, consider a report that includes only the CICS and call attach connection types (and/or TSO attach, depending on how your batch jobs connect to DB2), and that orders information by correlation name (batch job names and CICS transaction names are correlation names in DB2 accounting lingo), so that you'll see detailed accounting data for each DB2-accessing batch job and CICS transaction; and 2) for your DDF activity, generate an accounting detail report that includes only the DRDA connection type, with data ordered by primary authorization ID (that should then show you information at the DDF-using application level if, as is typically the case, each such application has its own ID with which it connects to the DB2 server).

Option 2: a performance database. Here, you'd use two functions of your monitor (if your DB2 monitor has these functions): a "reduce" function (probably, to filter and/or summarize information contained in the raw SMF trace records) and a "save" or "file" function. The "save" or "file" function would write data to a file, and you could subsequently use that file to load the DB2 tables that would comprise your performance database. If you go this route, you'll want to follow the directions provided in your monitor's documentation to create and populate the tables in the performance database. The performance database approach might be more involved than the print-to-disk option (in terms of set-up, that is), but once you get the prep work done and the data loaded, you'll have a lot of flexibility in terms of how you view and report on your historical DB2 performance data.

Information is power

When you have historical DB2 performance data at your fingertips, you'll be able to much more effectively point to the effects of your DB2 tuning changes, and say, "Our team did that." What was the impact of a buffer pool change on DB2 DBM1 address space CPU time? How did thread reuse increase when you set up some CICS-DB2 protected entry threads, and how did average in-DB2 CPU time change for transactions that use those protected threads to execute DB2 packages bound with RELEASE(DEALLOCATE)? How did the change to row-level locking for a particular set of table spaces affect deadlock and timeout totals, and how was per-transaction in-DB2 CPU time impacted? How did an increase in the value of the SRTPOOL parameter in ZPARM affect activity in the buffer pools used for the work file table spaces? How did an increase in the page size of a set of indexes affect the throughput of a certain batch job? How did leveraging of DB2 10 high-performance DBATs impact per-transaction in-DB2 CPU times for a DDF-using application? How did a change from an external stored procedure to a native SQL procedure affect zIIP offload for a DDF-using application? How did migration to DB2 11 from DB2 10 affect average in-DB2 CPU times for your applications? How did the archive and purge of "old and cold" data in DB2 tables affect GETPAGE counts in the system?

And so on, and so on. Yes, your DB2 monitor's "what's happening now" capability makes it a powerful tool, but combine that with some historical data, and watch that power grow exponentially. The cost is low, and the return is high. If you're not there, get busy.

No comments:

Post a Comment