Friday, March 30, 2012

DB2 for z/OS: Trading Memory for MIPS (Part 1)

If you are a DB2 for z/OS DBA or systems programmer, chances are that a good bit of what you do is focused on improving the CPU efficiency of your organization's DB2-accessing application workload. There are various ways in which people go about reducing DB2-related CPU consumption, including SQL statement tuning, physical database design changes (e.g., adding or modifying indexes), shifting work to zIIP engines (for example, by going with native SQL procedures versus external stored procedures for data access via DDF), and exploiting new DB2 features and functions (such as hash organization of data in a table space -- a potential winner for tables for which single-row access via a unique key is the dominant data access mode). If you're doing these things, good -- keep doing them; but, let me tell you something: I've been working with DB2 on the mainframe platform since the late 1980s, and to this day I am still taken aback by the number of sites at which one of the best DB2 efficiency-boosters of all is not fully leveraged (and sometimes is hardly leveraged at all). I'm talking about z/OS system memory. You should really be looking to use this resource for all it's worth, because a) System z servers these days often are configured with LOTS of the stuff (you can get 3 terabytes on a z196), b) mainframe memory costs a lot less than it did a few years ago, c) adding memory to your system doesn't affect the cost of your System z software, and d) thoughtful memory utilization offers a variety of ways to enable DB2 to do more while consuming less in the way of processor cycles. In this, the first of a three-part entry, I'll look at maximizing the effectiveness of your DB2 buffer pools without making them larger. In part two, I'll examine the issue of configuring DB2 to use more memory for buffer pools in an unapologetic way -- meaning, when you want more memory resource for your DB2 subsystem, don't ask for it as though you're Oliver Twist ("Please, sir, I want some more"). In part three, I'll look at ways you can leverage mainframe memory for performance purposes via pools other than the buffer pools (e.g., the EDM pool, the sort pool, and the RID pool).

I want to start out this memory-for-MIPS series with information on using the buffer pool virtual storage resource that you already have more beneficially, from a performance perspective, for three reasons: 1) the buffer pool configuration is typically the primary DB2 consumer of a system's memory resource, 2) you can really do some good by tuning what you've got, and 3) before asking for more memory for DB2 buffers, you ought to be making good use of the memory you've got.

Following, then, are some of my favorite hints and tips for getting your DB2 buffer pool house in order:

A buffer pool dedicated to the work file table spaces is a different animal. Let me start by saying that it's a REALLY good idea to have buffer pools that are dedicated to your work file table spaces (one for the 4K table spaces, and one for the 32K table spaces -- the latter being, often, MUCH more heavily utilized in a DB2 9 or 10 environment, versus a DB2 V8 system). In configuring these buffer pools, keep a couple of things in mind:
  • Almost all of the reads of data in work file table spaces will be of the prefetch variety. Why should you care about this? Because the default value for the virtual pool sequential steal threshold, VPSEQT, is 80. Basically, that setting means that only about 80% of the buffers in a pool dedicated to work file table spaces will get a lot of use, with the other 20% being reserved for relatively infrequent random (single-page) reads. Raise that threshold value significantly, and it's like getting more memory for free (you should see fewer prefetch read I/Os for the pool in question, and fewer read I/Os means less CPU spent in driving I/O operations). I've often seen a VPSEQT value of 95-99% for a work file-dedicated buffer pool used to good effect.
  • The motivation for externalizing updated work file table space pages to disk is not the same as it is for application table spaces and indexes. For application data (in table spaces and indexes), you don't want too much time to pass between the updating of a page and the externalization of that changed page to disk. Why? Because if the DB2 subsystem should fail, restart processing will be elongated if a large number of pending writes have to be processed (part of restart following abnormal termination involves DB2 making sure that committed updates made to pages that were in memory and not yet externalized at the time of the failure are reflected in the data on disk; otherwise, the data on disk would be inconsistent). It's a different story for pages belonging to work file table spaces. Those table spaces are conceptually like scratch pads and don't have to be processed during restart as do application table spaces and indexes and the DB2 catalog. So, for the work file table spaces, page externalization is about ensuring that the associated buffer pools don't get too full of non-stealable pages. That being the case, the deferred write and vertical deferred write queue thresholds (DWQT and VDWQT) can be raised for work file-dedicated buffer pools without adversely affecting DB2 restart performance. Raising these thresholds for the pools used for work file table spaces reduces disk write activity and that reduces CPU consumption. Some folks specify 70 and 40 for DWQT and VDWQT, respectively, some go with 80 and 50, and some go with even higher values (the defaults are 30 for DWQT and 5 for VDWQT). A word of caution: at one site I saw that DWQT and VDWQT for a work file-dedicated buffer pool were both set to 90, and the data manager threshold (indicating that 95% of a pool's buffers are unavailable) was hit a great many times per hour for that pool. That is REALLY BAD for performance (big spike in GETPAGE activity), so watch that you don't go overboard with choosing elevated values for the deferred write thresholds for your work file buffer pools (a DB2 monitor will tell you if the data manager threshold has been reached for a buffer pool; so, too, will output of the DB2 command -DISPLAY BUFFERPOOL DETAIL).

Steal smart. By default, DB2 utilizes a least-recently-used (LRU) algorithm in identifying buffers that are to be stolen to accommodate new pages being brought into memory from disk. That's the right algorithm to use in most cases; however, if you have a DB2 9 (or earlier) system and you "pin" some objects in memory by assigning those objects to a pool that has as many or more buffers than the objects have pages (e.g., you assign index A, with 200 pages, and index B, with 300 pages, to a pool that has 500 or more buffers) then you should use the first-in, first-out (FIFO) page steal algorithm for the pin-pool. Why? Because FIFO is a simpler, and therefore less costly (CPU-wise), page steal algorithm versus LRU. For a pool used for object pinning, little to no buffer stealing will occur (depending on whether or not you actually have enough buffers in the pool to hold all the pages of the objects assigned to the pool), so why go with the more expensive page steal algorithm?

In a DB2 10 environment you have a new choice here: PGSTEAL(NONE). When that option is specified for a buffer pool, here's what happens:
  1. When an object assigned to a PGSTEAL(NONE) pool is first accessed, DB2 will, in the background, prefetch every page of that object into the pool (the application process that first accessed the object doesn't have to wait for this prefetch-all process to finish -- it gets the pages that it needs and goes on from there).
  2. For further access to the now-fully-in-memory object, the DB2 optimizer will assume that no I/Os will be required, and this assumption will factor into SQL statement access path selection.
  3. If in fact the pool does not have enough buffers to hold all of the pages of an object assigned to the pool, buffers will be stolen as needed based on the FIFO page steal algorithm.

Use page-fixing of buffers appropriately. Since DB2 V8, you've had the option of fixing a pool's buffers in memory via the PGFIX(YES) option of the -ALTER BUFFERPOOL command. Page-fixing a pool's buffers (meaning that the page frames holding those buffers can't be stolen by z/OS) delivers CPU savings by eliminating the need to fix in memory and then release a buffer every time it receives a page read in from disk or provides a page to be written out to disk (those page fix and release actions become unnecessary when the buffers are always in a fixed-in-memory state). Page-fixing then, can boost a DB2 system's CPU efficiency when it's used for pools that have a high level of I/O activity (and the focus here is usually on read I/O activity). If there are 10 or 20 read I/O operations per second for a buffer pool, fixing that pool's buffers in memory won't have much of an impact on CPU consumption. If the rate of read I/O activity for a pool is hundreds or thousands per second, page-fixing buffers can deliver a nice CPU savings benefit. You might think that it would be easier to specify PGFIX(YES) for all of a DB2 subsystem's buffer pools, but unless you're really awash in mainframe memory I wouldn't recommend that approach. I think that it's a good idea to have some pools -- those with lower rates of read I/O activity -- for which buffers are not page-fixed. That approach provides z/OS with something of a safety valve, referring to stealable page frames holding DB2 buffers, that can be utilized if the system memory resource becomes overtaxed.

You may have heard that DB2 10 will use 1 MB page frames (if available) for page-fixed buffer pools. That's true, and this feature of DB2 10 further boosts the CPU savings effect of fixing buffers in memory by improving the efficiency of virtual-to-real storage address translation (versus a 4 KB page frame usage scenario). Keep this in mind, though: DB2 9 can't use 1 MB page frames for buffer pools, whether page-fixed or not. What this means is that you don't want to configure an overly large portion of your z/OS system's real storage resource to be managed using 1 MB page frames (that's done via the LFAREA parameter of the IEASYSxx member of PARMLIB) until you've migrated to DB2 10 and are likely past the point of needing to fall back to DB2 9. If you want to be conservative, migrate to DB2 10 and THEN configure your z/OS system for 1 MB page frames.

Consider robbing Peter to pay Paul, if Peter has more than he needs and Paul doesn't have enough. As I mentioned up front, this first of my three-part memory-for-MIPS blog entry is focused on making the best use of the memory already allocated for your DB2 buffer pools. That doesn't mean that you can't make a pool larger -- you can, if you decrease the size of another pool by the same amount. Suppose that buffer pools BP2 and BP3 are both sized at 40,000 buffers each in your system. If the read I/O rate for BP2 is 20 per second and the rate for BP3 is 2000 per second, I'd seriously consider taking a big chunk of BP2's buffers away and giving them to BP3 (for example, you might take BP2 from 40,000 down to 20,000 buffers, and take BP3 from 40,000 buffers to 60,000. Such a move might put a serious dent in the high read I/O rate seen for BP3 without substantially increasing the read I/O rate associated with BP2.

As for these read I/O rates I've mentioned, you can get those from a DB2 monitor statistics detail report, or perhaps from a monitor's online display of buffer pool activity. You can also get this information from the output of the DB2 command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL (and I specify ACTIVE in the command to get information for pools that are actually being used). To get per-second figures using the command output, do the following:
  1. Issue -DISPLAY BUFFERPOOL(ACTIVE) DETAIL at the beginning of some busy hour of the day (referring to workload being processed by the DB2 subsystem). The output of this issuance of the command does not have to be retained.
  2. Issue -DISPLAY BUFFERPOOL(ACTIVE) DETAIL a second time, one hour after issuing the command the first time. Retain the output from this issuance of the command.
  3. A few minutes after issuing the command the second time, issue it a third time. All you want from the output of this issuance of the command is the timestamp value that appears in the DSNB409I part of the output (e.g., "INCREMENTAL STATISTICS SINCE 10:00:48 MAR 19, 2012"). That's the "to" time for the interval of activity captured in the second issuance of the -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command. The "from" time for this interval is the timestamp value found in the output of the second issuance of the command. Subtract the "from" timestamp value from the "to" value, and you have the number of seconds in the interval. Divide the read I/O figures in the output of the second issuance of the -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command by this number of seconds, and you have per-second rates of activity.

Whatever the source of your numbers (DB2 monitor or DB2 command output), what you want, for each pool, is the TOTAL read I/O rate per second. That's the sum of synchronous read I/Os (random + sequential) and asynchronous read I/Os (sequential prefetch + list prefetch + dynamic prefetch). This rate is your guide to appropriate use of buffer page-fixing, and your measuring stick to gauge the impact of, for example, increasing VPSEQT for a work file-dedicated pool, or increasing the size of a high-read-I/O pool by adding buffers taken from a low-read-I/O pool. In part two of my three-part memory-for-MIPS blog entry, we'll revisit read I/O rates in the context of boosting the aggregate size of a buffer pool configuration. Look for that next entry to be posted in a week or two.

13 comments:

  1. if i have a table on db2 mainframe and the java application hosted on unix servers , when the application queries the table , do the query account for MIPS and associated costs ?

    ReplyDelete
  2. In that case, some of the CPU cycles will be consumed on the application server, and some on the mainframe DB2 database server. On the DB2 side, the CPU consumption will be that required to receive and execute the SQL statement(s) issued by the program, and to send results (messages and/or query result set rows) to the SQL statement-issuing application. In my experience, it's typical for the bulk of mainframe CPU time for a client-server DB2 application workload to be attributable to SQL statement execution, with relatively less CPU time consumed in receive and send activity. I would expect the application server CPU consumption related to the issuance of DB2-accessing SQL statements to be largely driven by what the SQL-issuing programs do with data received from DB2. A DB2 monitor will show the mainframe CPU cost of execution of SQL statements received from network-attached applications (this is known in DB2 for z/OS lingo as the DDF or DRDA workload).

    Robert

    ReplyDelete
  3. I have a question regarding Update Query. My application is on UNIX box and db is DB2. I have a DB record with 4 columns.
    I have scenario where I am updating that record.

    Out of 4, only one column has diff data and remaining column data is same which I am setting in UPDATE query.

    My Question is MIPS comparison between an UPDATE query which is setting only one COLUMN value and an UPDATE query which is setting all 4 columns but 3 columns' values are same as DB record.

    ReplyDelete
  4. I assume that you're talking about DB2 for z/OS. I would expect the UPDATE that references 4 columns in the SET clause and changes the value in 1 column to be marginally more expensive than the statement that references (and changes) just one column in the SET clause, because for the former statement DB2 has to do value comparisons to see that the values of columns 2, 3, and 4 are in fact not changed by the UPDATE. This CPU difference should be quite small, especially if updating the one column requires expense-adding things like index update and RI checking (these operations would make the CPU difference associated with the extra non-changing columns being referenced in the UPDATE proportionately smaller).

    Robert

    ReplyDelete
  5. Is part II of this article going to address the impact of increasing buffer pool sizes on the operating system itself? These increases have a systemic effect. Especially if a pool is fixing frames.

    ReplyDelete
    Replies
    1. Sorry - forgot to hit "reply" when replying to your question, so it came out as a new comment versus a reply to your comment. Anyway, see my comment below.

      Delete
  6. Yes. The "part 2" entry, which I posted in April of last year, does indeed address the question, "How big is too big?" as it pertains to enlarging a buffer pool configuration. The URL for the "part 2" entry is:

    http://robertsdb2blog.blogspot.com/2012/04/db2-for-zos-trading-memory-for-mips.html

    ReplyDelete
  7. Robert,
    do you know of anything like SQL Server DMVs, but for DB2 z/OS, to aid in analyzing bufferpool usage? I.E., a poor man's method to analyze bufferpools if you don't have Omegamon or the Bufferpool Analysis tool?

    Bob

    ReplyDelete
    Replies
    1. Bob,

      When I don't have access to DB2 for z/OS monitor data, I get buffer pool activity via the output of the DB2 command -DISPLAY BUFFERPOOL. In particular, I like to do the following: 1) issue -DISPLAY BUFFERPOOL(ACTIVE) DETAIL, 2) wait an hour, 3) issue -DISPLAY BUFFERPOOL ACTIVE(DETAIL) again.

      The output of that second issuance of the command will show one hour's worth of buffer pool activity (verify that by checking the value of the timestamp in the "INCREMENTAL STATISTICS SINCE" part of the output -- it should be one hour prior to the time of the second issuance of the command). Divide the activity counters by 3600 to get per-second figures. Use the read I/O numbers (synchronous plus asynchronous reads) to gauge the effect of buffer pool size increases (larger pools should lead to fewer read I/Os).

      Robert

      Delete
    2. Thank you, Robert. Your articles always help me to see the intent of features in DB2 z/OS in the real world.

      I was hoping that DB2 z/OS had information, say in RTS, that could be gleaned from something like DB2 Administrative Stored Procedures.

      Delete
    3. Oh, sure, there's that. You asked initially about buffer pool monitoring, and that's not done via the real-time statistics tables in the DB2 catalog.

      DB2 for z/OS provides a stored procedure, called DSNACCOX, that accesses real-time statistics data to provide recommendations aimed at helping you to maintain your DB2 databases. Here is a link to DSNACCOX information in the online version of the DB2 10 for z/OS Managing Performance manual: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db2z10.doc.perf/src/tpc/db2z_sp_dsnaccox.htm

      Robert

      Delete
  8. Hallo Robert,

    we have just defined a bufferpool with Pagesteal(NONE). The Bufferpool is large enough to hold all the Tablespaces defined to use it. We still see synchronous I/O, both randon and sequential. We have a Sysplex with 3 Members. Would you know of a reason, why these I/Os could be taking place? We have run the command -ACCESS DATABASE(XXXXX01) SPACE(*) MODE(OPEN), to preload all the objects into the tablespace.

    Thanks

    ReplyDelete
    Replies
    1. I would not expect -ACCESS DATABASE(xxxxx) SPACE(*) MODE(OPEN) to cause preloading of data into a PGSTEAL(NONE) buffer pool. That command should only accomplish the physical open of the data sets associated with target table spaces and/or index spaces. My understanding is that DB2's loading of pages from an object into a PGSTEAL(NONE) buffer pool is accomplished when the object is first accessed by an application process. At that time, DB2 should get -- right away -- any pages needed to satisfy the application process's request (and that could account for the synchronous reads you're seeing), and will follow that action with a load (via prefetch) of all of the other pages of the object into the buffer pool.

      If you'd like, you can issue the command -ALTER BUFFERPOOL(bpname) DETAIL, and then issue the command again 1 hour later, and then send the output of the SECOND issuance of the command to me at rfcatterall@gmail.com.

      Robert

      Delete