Sunday, March 13, 2011

Monitoring DB2 for z/OS: What's in YOUR Subsystem?

First of all, if your organization uses DB2 for z/OS, I hope that you have a DB2 monitor on the system. While several of the DB2 DISPLAY commands provide information that's useful for monitoring activity on a DB2 subsystem (a favorite of mine is -DISPLAY BUFFERPOOL(ACTIVE) DETAIL), for an in-depth view into what's going on you really want the capabilities of a monitor tool at your disposal. Multiple vendors provide these products -- IBM's offering is Tivoli OMEGAMON XE for DB2 Performance Monitor on z/OS (as that's a bit of a mouthful, I'll hereafter refer to it in this blog post as OMEGAMON for DB2).

I'll tell you something interesting that I've observed regarding the use of DB2 monitors out in the real world: LOTS of folks use only the online monitoring capability of whatever tool they have on their system. Online monitoring is definitely useful when it comes to checking out what's happening right now in a DB2 subsystem -- something you may need to do, and quickly, if a problem pops up. That said, when it comes to deep-dive analysis of the performance of a DB2 subsystem, nothing beats the REPORTS that a DB2 monitor can provide. Plenty of DB2 professionals have NEVER used the report-generation capabilities of their monitor product. That's a bummer, because there's so much good stuff in those reports, and they are fantastic for trend analysis. If you have never used your DB2 monitor to produce reports, do yourself a favor and figure out how to do that. It's not hard: check your monitor's report command reference (or batch reporting user's guide -- different monitoring tools have differently titled manuals), and you'll see JCL requirements (one key item: pointing to the SMF data set that will provide the desired input to the monitor) and examples of SYSIN control statements (these specify report type, reporting time frame, and report interval, among other things). Run some of these reports, and check out the wealth of information therein.
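
To make that concrete, here's a rough sketch of what such a batch report job can look like. I'm using OMEGAMON for DB2 conventions here (the batch report program is FPECMAIN, the SMF input goes to the INPUTDD DD, and the accounting report comes out on ACRPTDD) -- but the data set names below are placeholders, and the program name, DD names, and command syntax vary by monitor product and release, so treat this strictly as an illustration and verify the particulars against your own product's documentation:

   //DB2RPT   JOB (ACCT),'DB2 MONITOR RPT'
   //REPORT   EXEC PGM=FPECMAIN
   //STEPLIB  DD  DISP=SHR,DSN=OMPE.LOADLIB          <- placeholder library name
   //INPUTDD  DD  DISP=SHR,DSN=SMF.DB2.TRACE.DATA    <- SMF data with DB2 trace records
   //ACRPTDD  DD  SYSOUT=*                           <- accounting report output
   //SYSPRINT DD  SYSOUT=*
   //SYSOUT   DD  SYSOUT=*
   //SYSIN    DD  *
     ACCOUNTING
       REPORT
         LAYOUT(LONG)
         FROM(12/01/10,09:00)
         TO(12/01/10,11:00)
     EXEC
   /*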

I find that the two most useful DB2 monitor report types are the Accounting Report - Long and the Statistics Report - Long (that's what OMEGAMON for DB2 calls 'em -- for some other monitoring products these reports are referred to as Accounting Detail and Statistics Detail). The input to these reports consists of records generated by the standard DB2 accounting and statistics trace classes that most folks have running all the time (e.g., accounting trace classes 1, 2, and 3, plus classes 7 and 8 if you're interested in package-level accounting). I'll tell you about my favorite flavor of the Accounting Report - Long in just a moment, but first I want to mention a practice implemented at my shop when I worked in the IT department of a DB2-using organization: every day, an Accounting Report - Long and a Statistics Report - Long (each covering the previous day's 24 hours of activity) were "printed" to a data set on disk (more specifically, to a generation of a GDG, which made it easy to keep a rolling 60 days of reports). Those 60 days of reports were kept online, available for browsing via TSO/ISPF. Having these around was super for a couple of reasons: 1) if a problem situation started to crop up, we could look back over several weeks of data to see how things had been trending, and what might have changed; and 2) with the reports on-hand, we didn't have to keep 60 days of SMF records online (the reports summarize and greatly reduce those records).
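
If you'd like to set up something similar, the mechanics aren't complicated. Here's a hypothetical sketch (all names are placeholders): define a GDG base with a limit of 60 generations, and point the daily report job's output DD at a new generation instead of at SYSOUT:

   //DEFGDG   EXEC PGM=IDCAMS
   //SYSPRINT DD  SYSOUT=*
   //SYSIN    DD  *
     DEFINE GDG (NAME(DB2PROD.ACCTRPT) LIMIT(60) SCRATCH)
   /*

Then, in the daily report job, the report output DD would look something like this (the DCB attributes shown are typical for print-format output -- match whatever your monitor requires):

   //ACRPTDD  DD  DSN=DB2PROD.ACCTRPT(+1),DISP=(NEW,CATLG,DELETE),
   //             UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE),
   //             DCB=(RECFM=FBA,LRECL=133)

With LIMIT(60) and SCRATCH specified, the system deletes the oldest generation as each new one is cataloged, giving you that rolling 60 days of reports with no housekeeping required.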

Now, the Statistics Report - Long is great, and I may write more about that report in a future post, but for now I want to talk a little about the Accounting Report - Long (statistics reports show activity from the DB2 subsystem perspective, while accounting reports provide information from an application or workload point of view). An important specification when generating an Accounting Report - Long is the desired grouping of the accounting data. For an OMEGAMON for DB2 report, this data grouping is determined via the ORDER subcommand (this would be part of the SYSIN input to a report generation job). The default grouping for OMEGAMON for DB2 (and for other monitors I've seen) is primary authorization ID within plan name. I generally don't want data in the report grouped that way. What I most often like to see is a report generated with ORDER(CONNTYPE) (using OMEGAMON for DB2 lingo -- the terminology might be slightly different for other monitors). With ORDER(CONNTYPE), your Accounting Report - Long will have several sections, with one report component for each DB2 connection type. In other words, one part of the report will show all the CICS-DB2 activity on the subsystem, another part will have all the DRDA activity (that which comes through the DB2 DDF address space), another will show activity for all programs linked with the Call Attach Facility (typically, these are batch jobs), and so on.
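
Plugging that into the SYSIN sketch from earlier in this post, the control statements would look something like the following (again, this is OMEGAMON for DB2 syntax -- other monitors use their own keywords for the same idea):

     ACCOUNTING
       REPORT
         LAYOUT(LONG)
         ORDER(CONNTYPE)
     EXEC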

If you do generate an Accounting Report - Long with data grouped by connection type, do a little exercise for me -- it could be something of an eye-opener for you. Take, for each part of the report (i.e., the CICS-DB2 part, the DRDA part, the CAF part, etc.), two fields (maybe three, as I'll explain), and get the product of these. The fields of interest are (and again, I'm using OMEGAMON for DB2 terminology -- this might vary somewhat from product to product) #OCCURRENCES in the HIGHLIGHTS section, and CP CPU TIME in the AVERAGE section (the latter under the "class 2" column in that report section, so called because it shows information from DB2 accounting trace class 2 records). Before getting the product of those two fields, check to see if the SE CPU TIME field in the "class 2" column in the AVERAGE section contains a non-zero value (and note that depending on your monitor and your release of DB2, "SE CPU TIME" may be labeled "IIP CPU TIME"). If it does, that's CPU time on a zIIP engine (an SE, or "specialty engine"), and it's NOT included in CP CPU TIME (which is just CPU time on general-purpose engines, or central processors -- CPs for short). Still with me? OK, so take the average class 2 CPU time per occurrence (that is, per accounting trace record), which is general-purpose CP CPU time plus zIIP, or SE, CPU time, and multiply that by the number of occurrences -- I'll work through the arithmetic with numbers from a sample report in a moment. What this gives you: the total application-chargeable CPU time consumed in SQL statement execution for the connection type. [I say "application chargeable" because some SQL execution-related CPU time is consumed by things such as prefetch reads and database writes, which are charged to DB2's address spaces and not to so-called allied address spaces -- but the "application chargeable" CPU time is almost always the large majority of total SQL statement-related CPU consumption.]

[If you have a DB2 data sharing group, you'll want to sum the SQL statement CPU consumption figures for each member of the group. In other words, add total CICS-DB2 class 2 CPU time for member DB2A to total CICS-DB2 class 2 CPU time for member DB2B to the total for DB2C, and so on, for each connection type.]

To help clarify things, here is an excerpt from an OMEGAMON for DB2 Accounting Report - Long, showing the fields I've mentioned (#OCCURRENCES over in the HIGHLIGHTS section, and CP CPU TIME and SE CPU TIME in the DB2 (CL.2) column of the AVERAGE section):

   SUBSYSTEM: DB2A   ORDER: CONNTYPE  INTERVAL FROM: 12/01/10 09:00:00.00
 DB2 VERSION: V9     SCOPE: MEMBER    TO: 12/01/10 11:00:00.00
 
 CONNTYPE: DRDA
 
  AVERAGE       APPL(CL.1)  DB2 (CL.2) ... HIGHLIGHTS
 ------------  ----------  ----------  ... --------------------------
 ELAPSED TIME    0.031811    0.015938  ... #OCCURRENCES    :  2813092
  NONNESTED      0.031361    0.015523  ... #ALLIEDS        :     7790
  STORED PROC    0.000313    0.000277  ... #ALLIEDS DISTRIB:        0
  UDF            0.000000    0.000000  ... #DBATS          :  2805272
  TRIGGER        0.000138    0.000138  ... #DBATS DISTRIB. :       30
                                       ... #NO PROGRAM DATA:        0
 CP CPU TIME     0.004754    0.004685  ... #NORMAL TERMINAT:    71402
  AGENT          0.004754    0.004685  ... #DDFRRSAF ROLLUP:   276108
   NONNESTED     0.004614    0.004551  ... #ABNORMAL TERMIN:        0
   STORED PRC    0.000120    0.000114  ... #CP/X PARALLEL. :        0
   UDF           0.000000    0.000000  ... #IO PARALLELISM :        0
   TRIGGER       0.000020    0.000020  ... #INCREMENT. BIND:     1299
  PAR.TASKS      0.000000    0.000000  ... #COMMITS        :  2823659
                                       ... #ROLLBACKS      :    74630
  SECP CPU       0.000682         N/A  ... #SVPT REQUESTS  :        0
                                       ... #SVPT RELEASE   :        0
 SE CPU TIME     0.003992    0.004067  ... #SVPT ROLLBACK  :        0

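By the way, here's the exercise worked through with the DRDA numbers from that excerpt (the average class 2 times are in seconds). CP CPU TIME of 0.004685 plus SE CPU TIME of 0.004067 comes to 0.008752 seconds of class 2 CPU time per occurrence, so:

   0.008752 seconds x 2,813,092 occurrences = about 24,620 CPU seconds

That's roughly 6.8 hours of engine time consumed in SQL statement execution by the DRDA-connected workload during the two-hour report interval -- and note that nearly half of it ran on zIIP engines. Do that same multiplication for each connection type (summing across members if you're running in data sharing mode, as mentioned above), and you've got your workload breakdown.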

I'll tell you why this information is of such interest to me: it shows the in-DB2 CPU cost (i.e., the CPU cost of SQL statement execution) of the DB2 workload by component, and that can be news to people. I could ask a DB2 person, "What's the largest component of your DB2 workload?" and that person might say, "Batch," or "CICS," because he or she thinks that's the case. Then we generate and take a look at a DB2 monitor Accounting Report - Long with data grouped by connection type, and we do the numbers in the aforementioned way (average class 2 CPU time per occurrence -- ensuring that class 2 zIIP CPU time, if any, is added in -- times number of occurrences, for each connection type). People are sometimes surprised by what they see. Maybe the batch DB2 workload isn't king of the hill, after all. What often makes the biggest impression is the relative size of the DRDA workload (which I often refer to as the client-server DB2 workload). It's not unusual for this to be the fastest-growing part of an organization's overall DB2 for z/OS workload, and sometimes it's the largest component of the overall workload. One factor here: at more and more sites, the bulk of new DB2 for z/OS-related application development work involves applications running on off-mainframe application servers, directing SQL statements to a DB2 for z/OS database via DRDA and the DB2 DDF (these statements may take the form of JDBC or ODBC calls, and more and more frequently they include calls -- maybe LOTS of calls -- to DB2 stored procedures).

Another client-server DB2 workload growth factor: a growing number of organizations are providing users with query and reporting tools and allowing them to use these to access data in production, operational DB2 databases. That's right -- BI (business intelligence) work targeting production DB2 for z/OS tables. Yes, this can be done while OLTP and batch programs go against the same data. I've seen it, and it works (it can work particularly well when DB2 is running in data sharing mode on a Parallel Sysplex, and one or two members of the group are dedicated to decision support applications). The DDF connection? The query and reporting tools typically send SQL statements to DB2 using the DRDA protocol (done through DB2 Connect or via one of the IBM Data Server Driver packages).

Whatever the breakdown of your workload, knowing it can help you to see where the demand for SQL statement execution capacity is coming from, and that can help you to deliver support where it's needed most (suggestion: try tracking the workload breakdown over time, perhaps presenting the trends graphically in a line chart, with different-colored lines for the different components of the overall DB2 workload, or in a series of pie charts, the latter with slices for the different workload components).

I'll try to post more entries in the future on other uses of the information that can be found in a DB2 monitor Accounting Report - Long and Statistics Report - Long. For now, look these reports over at your shop. Don't run 'em yet? Get started. You'll be glad you did.