Thursday, July 25, 2013

DB2 for z/OS: Clearing Up Some Matters Pertaining to LOB Inlining

Plenty of folks know that LOB inlining is one of the really great enhancements delivered with DB2 10 for z/OS (it's something about which I blogged last year). I've noticed lately, however, that a fair number of people have some misconceptions regarding two important aspects of LOB inlining, namely the relationship between inlining and LOB table spaces, and scenarios in which LOB inlining is and is not a good idea from an application performance perspective. In this entry I'll aim to clear up these misunderstandings.

LOB table spaces: You need them. Period (but DB2 can help)

Some DBAs have this idea that LOB inlining can eliminate the requirement that there be an auxiliary table (and associated table space and index) for every LOB column in a table definition (and that's actually one auxiliary table per LOB column per partition, if the base table space is partitioned). This line of thinking is understandable: If the largest (length-wise) value that will go into a LOB column is no larger than the inline length specified for that column, so that all of the column's values can be 100% inlined, there's no need for an auxiliary table for that column, right?

WRONG. You MUST have an auxiliary table (and associated LOB table space and index) if you are going to have a LOB column in a base table, even if the values in the LOB column will always be 100% inlined. If these objects do not exist, you will not be able to insert any data into the table because the definition of the table will be considered by DB2 to be incomplete. So, avoiding the need for an auxiliary table and a LOB table space and an index is NOT a reason to go with LOB inlining, because LOB inlining does nothing to change the requirement that these objects exist before the table with the LOB column can be used.

Now, creating one measly auxiliary table, and one LOB table space to hold that table, and one index on the auxiliary table (used to quickly access a LOB value associated with a particular base table row) is not exactly a tall order, but recall that you need one set of these objects per partition and per column if the base table is partitioned and has multiple LOB columns. Do the math, and the DDL work can start to look kind of intimidating. What if a base table has 1000 partitions and two LOB columns? Ready to create 2000 auxiliary tables, and the same number of LOB table spaces and indexes?
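To give a sense of what just one set of these objects involves, here is a rough sketch of the DDL, assuming a base table MYSCHEMA.DOC_TABLE in database MYDB with a CLOB column named DOC_TEXT (all names hypothetical, and options such as buffer pool assignment omitted):

  CREATE LOB TABLESPACE DOCLOBTS IN MYDB;
  CREATE AUXILIARY TABLE MYSCHEMA.DOC_AUX
    IN MYDB.DOCLOBTS
    STORES MYSCHEMA.DOC_TABLE COLUMN DOC_TEXT;
  CREATE UNIQUE INDEX MYSCHEMA.DOC_AUX_IX ON MYSCHEMA.DOC_AUX;

Multiply that by the number of partitions and the number of LOB columns, and you can see how the DDL piles up.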

Before you get all wound up about such a prospect, consider that DB2 can automatically create required LOB-related objects for you when you create a table with one or more LOB columns. DB2 will do that if EITHER of the following is true:
  • The CREATE TABLE statement for the base table (the table with the LOB column(s)) does NOT include an "in database-name.table-space-name" clause. In that case, DB2 will implicitly create the database for the base table space, the base table space itself, and all other objects needed to make the base table usable (e.g., a unique index on the table's primary key if the CREATE TABLE statement designated a primary key, and all objects needed for LOB data if the CREATE TABLE statement included one or more LOB columns).
  • The CREATE TABLE statement for the base table DOES include an "in database-name.table-space-name" clause, and the value of the DB2 special register CURRENT RULES is 'STD' at the time of the execution of the CREATE TABLE statement.
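As an illustration of the second of those scenarios, a CREATE TABLE along these lines (hypothetical names; the INLINE LENGTH clause is how a column-specific inline length is specified, with the LOB_INLINE_LENGTH subsystem parameter supplying the default when the clause is omitted) would cause DB2 to implicitly create the LOB table space, auxiliary table, and auxiliary index for the DOC_TEXT column, even though the base table space is explicitly named:

  SET CURRENT RULES = 'STD';
  CREATE TABLE MYSCHEMA.DOC_TABLE
    (DOC_ID   INTEGER NOT NULL,
     DOC_TEXT CLOB(1M) INLINE LENGTH 1000)
    IN MYDB.DOCTS;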

Performance: When LOB inlining helps, and when it doesn't

As I see it, the primary ways in which LOB inlining delivers benefits in terms of application performance and resource-efficiency are as follows:
  • Disk space savings, if a high percentage of a table's LOB values can be 100% inlined in the base table. In such a situation, the disk space requirement for LOB data is reduced in two ways: 1) Compression. Data in a LOB table space cannot be compressed by DB2; inlined LOB values, on the other hand, will be compressed along with the non-LOB data in the base table space (assuming the base table space is defined with COMPRESS YES). 2) More efficient use of data page space. In a LOB table space, a given page can hold data belonging to only one LOB value. If, for example, a LOB table space has 8 KB-sized pages, and a particular LOB value is 9 KB in length, the first 8 KB of the LOB value will be in one page and the last 1 KB will be in a second page in the LOB table space. The rest of that second page (7 KB of space) will remain empty because it cannot be used to hold data for any other LOB value. In a base table space there is no such restriction, so inlined LOB data makes for more efficient use of space in data pages.
  • Improved performance of INSERT and SELECT operations (for SELECTs retrieving LOB data), when most LOB values can be 100% inlined in the base table. The gains come from DB2 not having to access the auxiliary table (and its table space and index) for fully inlined values, and they can be quite significant versus the non-inlining case.
  • Ability to create an index on an expression on the inlined portion of a LOB column. Such an index would be created using an expression based on the SUBSTR function. This could be very useful if, for example, you store a type of document as a CLOB and a value of interest (maybe a department number) always appears in characters 10 through 14 of the document. You could build an index on a SUBSTR expression on the inlined portion of the LOB, and therefore be able to very quickly zero in on rows containing documents pertaining to department 'AB123' (I posted an entry about DB2's index-on-expression capability -- introduced with DB2 9 for z/OS -- to the blog I maintained while I was working as an independent DB2 consultant prior to re-joining IBM).
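Using the department number example, such an index might look something like the following (hypothetical names, and assuming the inline length of the DOC_TEXT column is at least 14, so that characters 10 through 14 are always in the inlined portion of the values):

  CREATE INDEX MYSCHEMA.DOC_DEPT_IX
    ON MYSCHEMA.DOC_TABLE
    (SUBSTR(DOC_TEXT, 10, 5));

A query with a predicate like WHERE SUBSTR(DOC_TEXT, 10, 5) = 'AB123' could then use this index to quickly locate the qualifying rows.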

Clearly, LOB inlining can be used very advantageously in some cases. In other cases, LOB inlining could negatively impact application performance. Here are some potential disadvantages of LOB inlining:
  • Performance degradation for INSERT and SELECT operations (for SELECTs retrieving LOB data) when most LOB values cannot be 100% inlined in the base table. Performance would be negatively impacted because DB2 would have to go to both the base table and the auxiliary table for most inserts and retrievals of LOB data.
  • Performance degradation for SELECTs that DO NOT retrieve LOB data. When you inline LOB data, you make the base table rows longer (sometimes much longer). As a result, you'll have fewer base table rows per page, and because of that you'll get a lower buffer pool hit ratio for the base table. That means more disk I/Os, and that will impact elapsed time.

Here's the bottom line: If LOB data will be stored in a table but only rarely retrieved, inlining probably isn't a good idea unless it is very important to improve INSERT (or LOAD) performance (and that won't happen unless most LOB values can be 100% inlined in the base table). If you go with inlining for this reason, consider enlarging the buffer pool to which the base table space is assigned so the buffer pool hit ratio won't be negatively impacted. LOB inlining can have a very positive impact on the performance of queries that retrieve LOB data, so if that is important to you then inlining can be a very good move when most of the LOB values can be 100% inlined in the base table. Again, consider enlarging the base table space's buffer pool so that queries that do not retrieve LOB data won't be negatively impacted by a reduced buffer pool hit ratio.

DB2 10 is by far the best release of DB2 in terms of LOB data management capabilities, and LOB inlining is an important part of that story. The important thing to keep in mind is that LOB inlining is not a universally applicable DB2 feature. Figure out first if inlining makes sense for your particular situation.

Sunday, July 21, 2013

DB2 for z/OS Data Sharing: The Evolution of a GBP Sizing Formula

Recently I had an e-mail exchange with my friend, colleague, and fellow DB2 blogger Willie Favero, on the subject of DB2 for z/OS group buffer pool sizing. One of my messages was a little wordy (imagine that), and Willie responded with, "[This] reads like it could be your next blog post." And so it is. In this entry I'll share with you what I think is an interesting story, along with some information that I hope will help you with group buffer pool monitoring and tuning.

For those of you who aren't familiar with the term "group buffer pool," a bit of introduction: DB2 data sharing is technology that enables multiple DB2 for z/OS subsystems to share concurrent read/write access to a single database. The DB2 subsystems are members of a data sharing group that runs on a cluster of z/OS systems called a Parallel Sysplex. Within the Parallel Sysplex are resources known as coupling facilities. These are essentially shared memory devices. Within a coupling facility one would find several structures, and among these would be the DB2 group buffer pools, or GBPs. Basically, group buffer pools are used for two things:
  • Page registration -- When a DB2 member subsystem reads into a local buffer pool a page belonging to a GBP-dependent data set (i.e., a table space or index -- or a partition of a partitioned table space or index -- in which there is inter-DB2 read/write interest), it registers that page in a directory entry of the corresponding GBP (so, a page read into buffer pool BP1 would be registered in GBP1). DB2 member X registers a locally cached page of a GBP-dependent data set so that it can be informed if that page is changed by a process running on DB2 member Y. Such a change effected on another member of the data sharing group would cause the copy of the page cached locally in a buffer pool of DB2 member X to be marked invalid. On re-referencing the page, DB2 member X would see that its copy of the page is invalid, and would then look for the current version of the page in the associated GBP (and would retrieve that current version from disk if it were not found in the GBP).
  • Caching of changed pages -- If a process running on DB2 member Y of the data sharing group changes a page belonging to a GBP-dependent data set, that member will write the changed page to the associated GBP in a coupling facility LPAR (usually at commit time, but sometimes before a commit). The changed page will remain in the GBP -- from which it can be retrieved by a member DB2 subsystem in a few microseconds -- for some time following the update. Eventually the page will be overwritten by another changed page, but before that happens the GBP data entry occupied by the page will be made stealable via a process called castout, through which changed pages written to a GBP are externalized to disk.

OK, now for the story: back in the mid-1990s, when data sharing was introduced with DB2 Version 4 for z/OS, an organization that was one of the early adopters of the technology wanted to know how large a GBP ought to be. I was working at the time in IBM's DB2 for z/OS national technical support group, and I fielded this question and thought that "as large as possible" was an unacceptably imprecise answer; so, I set about trying to determine the right size for a GBP in a quantitative way. I focused on GBPs with 4K-sized data entries because buffer pools that hold 4K-sized pages were (and to a somewhat lesser extent still are) dominant versus buffer pools that hold 8K, 16K, or 32K-sized pages. Further, I assumed that the default ratio of five GBP directory entries for every one data entry would be in effect. I also knew that a GBP directory entry occupied about 200 bytes of space (that was then -- I'll get to the current size of a GBP directory entry in a bit).

The last piece of the puzzle was a GBP sizing objective. A GBP sized right would be beneficial because... what? That "what," I decided, should be 1) avoidance of GBP write failures due to lack of storage (something that can occur if a GBP is so small that casting out of changed pages to disk -- required to make GBP data entries stealable -- cannot keep up with the volume of writes of changed pages to the GBP) and 2) avoidance of GBP directory entry reclaims (if a page must be registered in a GBP but all of that GBP's directory entries are in use, a directory entry has to be reclaimed, and when that happens the copies of the page cached in local buffer pools of member DB2 subsystems have to be preemptively marked invalid). Knowing that GBP write failures due to lack of storage are most likely to occur for a way-undersized GBP, I decided to concentrate on determining a GBP size that would virtually eliminate the possibility of directory entry reclaims, my thinking being that a GBP so sized would also be large enough to avoid GBP write failures due to lack of storage (and that has proven to be the case, in my experience).

How, then, to avoid GBP directory entry reclaims? I realized that the key to accomplishing that goal was having at least as many GBP directory entries as there were "slots" into which different pages of table spaces and indexes could be cached. These slots would be the buffers of local pools, PLUS the data entries in the corresponding GBP. If, for example, a data sharing group had two member DB2 subsystems, and if BP1 on each member had 30,000 buffers, you'd have 60,000 page slots right there. 60,000 directory entries in GBP1 would cover that base, right? Yeah, but with the standard 5:1 ratio of GBP directory entries to data entries, those 60,000 directory entries would bring along 12,000 data entries in GBP1. That's 12,000 additional slots in which different pages could be cached, so you'd need 12,000 more directory entries to eliminate the possibility of directory entry reclaims. Well, those 12,000 GBP directory entries would bring along another 2400 GBP data entries (with the aforementioned 5:1 ratio in effect). Providing 2400 more directory entries to account for those page slots would give you 480 more GBP data entries, so you'd need 480 additional directory entries, and so on. I came up with a formula into which I plugged numerical values (200 bytes for a directory entry, 4K for a data entry, 5 directory entries for every 1 data entry), and saw that I had a converging sequence. That sequence converged to approximately 0.3125, meaning that adding up the sizes (in megabytes) of the members' local BPn buffer pools and multiplying that total by 0.3125 would give you a size for GBPn that would virtually ensure zeroes for directory entry reclaims and for GBP write failures due to lack of storage (something that could be verified using the output of the command -DISPLAY GROUPBUFFERPOOL(GBPn) GDETAIL(*)).
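For those who want to see where that number comes from, here is the arithmetic in condensed form. If B is the total number of buffers in the members' local BPn pools, the directory entry requirement is the sum of a geometric series:

B + B/5 + B/25 + B/125 + ... = 1.25 X B directory entries

Those directory entries bring along (1.25 X B) / 5 = 0.25 X B data entries. At about 200 bytes per directory entry and 4 KB per data entry, the GBP needs roughly

(1.25 X B X 200 bytes) + (0.25 X B X 4 KB) = 1274 X B bytes

which is about 0.31 times the 4096 X B bytes occupied by the local BPn pools -- hence the "approximately 0.3125" figure.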

Nice, but I was concerned that 0.3125 was a number that people would not be able to readily bring to mind, and I really wanted a GBP sizing formula that a person could carry around in his or her head; so, I nudged 0.3125 up to 0.33, and thus arrived at the GBP sizing rule of thumb that came to be known as "add 'em up and divide by 3" (example: given a 4-way DB2 data sharing group, with a BP3 sized at 30,000 buffers on each member, and the default 5:1 ratio of directory entries to data entries in the GBP, a good initial size for GBP3 would be one-third of the combined size of the BP3 pools, or (30,000 X 4KB X 4) / 3, which is 160 MB). That formula was widely and successfully used at DB2 data sharing sites, and it continued to work well even as the size of a GBP directory entry increased (with newer versions of coupling facility control code). Now that the size of a directory entry is about 400 bytes, "add 'em up and divide by 3" has become "add 'em up and multiply by 0.375." So, given that prior example of a 4-way data sharing group with BP3 sized at 30,000 buffers on each member, a GBP3 size that would virtually eliminate the possibility of directory entry reclaims (because it would give you, with the default 5:1 directory-to-data entry ratio, a number of directory entries about equal to the number of "slots" into which different pages could be cached in the local BP3 buffer pools and in GBP3) would be:

(30,000 X 4KB X 4) X 0.375 = 180 MB
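The same arithmetic explains the new multiplier: with a directory entry now taking about 400 bytes (and B again being the total number of local BPn buffers across the members),

(1.25 X B X 400 bytes) + (0.25 X B X 4 KB) = 1524 X B bytes

which is roughly 0.37 times the size of the local pools -- close enough to 0.375 to keep the rule of thumb easy to remember.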

Now, I've mentioned multiple times that the formula I derived was based on two assumptions: a local buffer pool with 4K-sized buffers, and a GBP with five directory entries for every one data entry. What about buffer pools with larger buffer sizes (e.g., 8K or 16K)? What about GBP directory-to-data-entry ratios (which can be changed via the -ALTER GROUPBUFFERPOOL command) other than the default 5:1? Regardless of the size of the buffers and the directory-to-data-entry ratio in effect for a GBP, the basic goals remain the same: avoid directory entry reclaims and avoid write failures due to lack of storage. Because accomplishing the first of these goals is likely to result in achievement of the second objective as well, you focus on sizing a GBP to eliminate directory entry reclaims. Directory entries won't be reclaimed if there is at least one directory entry for every "slot" in which a different page could be cached (i.e., you need to have a number of directory entries that is at least as great as the number of buffers in the local buffer pools PLUS the number of data entries in the GBP).
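As a point of reference, changing a GBP's ratio from the default is done with a command along the lines of -ALTER GROUPBUFFERPOOL(GBP16K1) RATIO(7) (using the GBP from the example that follows); keep in mind that a changed ratio takes effect the next time the group buffer pool is allocated or rebuilt.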

To see how this GBP sizing approach can be used, suppose that you have a 2-way DB2 data sharing group, with a BP16K1 that has 10,000 buffers on each member. Suppose further that you have reason to want to use a 7:1 directory-to-data-entry ratio for GBP16K1 instead of the default 5:1 ratio. In that case, how should GBP16K1 be sized so as to prevent directory entry reclaims? Well, to begin with, you know that you'll need a directory entry in GBP16K1 for every local BP16K1 buffer. That's 20,000 directory entries. Given the 7:1 directory-to-data-entry ratio in effect for GBP16K1, along with the 20,000 GBP directory entries you'll have 2858 data entries (20,000 divided by 7, and rounded up to the nearest integer value). To cover those additional 2858 page "slots," you'll need another 2858 directory entries in the GBP. Again with the 7:1 directory-to-data-entry ratio in mind, the 2858 additional directory entries will come with another 409 data entries (2858 divided by 7, rounded up). You'll need another 409 directory entries to cover those 409 data entries, but that means an additional 59 data entries. The 59 directory entries needed to cover those data entries will bring along 9 more data entries, and the 9 directory entries needed to cover those data entries will mean 2 more data entries, for which you'll need 2 more directory entries. Add one more data entry to go along with those last 2 directory entries (2 divided by 7, rounded up), and you've taken the sequence far enough.

Now, just add up the directory entries you got with each iteration of the sequence, starting with the initial 20,000:

20,000 + 2858 + 409 + 59 + 9 + 2 = 23,337

Divide that figure by 7 (and round up to the nearest integer) to get a number of data entries for the GBP:

23,337 / 7 = 3333.86, or 3334 when rounded up

Now you can size GBP16K1 by multiplying the number of directory entries by 400 bytes and the number of data entries by 16KB:

(23,337 X 400 bytes) + (3334 X 16KB) = 62.7 MB

I'd round that up to 63 MB for good measure. And there you have it: a GBP16K1 sized at 63 MB (given our use of a 7:1 directory-to-data-entry ratio for this GBP) should result in your seeing zero directory entry reclaims for the GBP (and zero write failures due to lack of storage, to boot). As mentioned previously, you can check directory entry reclaim and write failure counts using the output of the DB2 command (using GBP16K1 in this case) -DISPLAY GROUPBUFFERPOOL(GBP16K1) GDETAIL(*). Dealing with a different buffer size? A different GBP directory-to-data-entry ratio? Doesn't matter. You can use the approach laid out above for any buffer size and any GBP directory-to-data-entry ratio.
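If you'd rather not walk through the iteration each time, it can be collapsed into a closed form (my generalization of the approach above, ignoring the per-step rounding). With B being the total number of local buffers across the members, R the directory-to-data-entry ratio, and P the page size:

directory entries needed = B X R / (R - 1)
data entries that come with them = B / (R - 1)
GBP size = (directory entries X 400 bytes) + (data entries X P)

Plugging in B = 20,000, R = 7, and P = 16KB gives about 23,334 directory entries and 3,334 data entries -- essentially the numbers arrived at step by step above -- so compute the size, round it up a bit, and you're done.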

I hope that this information will be useful to you in right-sizing your group buffer pools, if you're running DB2 for z/OS in data sharing mode (if you're not, consider the use of this technology at your site -- you can't beat it for availability and scalability).