Tuesday, December 29, 2015

Thoughts on DB2 for z/OS Index Compression

Not long ago, a DB2 for z/OS DBA asked me for some guidance in planning his organization's initial use of index compression (a capability introduced with DB2 9 for z/OS). He asked some good questions, and with the thought that my responses might be useful to others in the DB2 community, I offer this blog entry.

First, strive to get good "bang" from your index compression "buck." "Bang for the buck" is an expression here in the USA (and perhaps elsewhere in the English-speaking world) that refers to getting a good return on investment. At this DBA's site, there were many thousands of indexes defined on DB2 tables (in two large production environments and multiple development and test systems). The DBA understandably viewed compression of every last one of these indexes as a daunting prospect. My advice: don't compress all the indexes. Generally speaking, at a DB2 for z/OS site a relatively small number of really big tables will hold the lion's share of data rows, and the indexes defined on these really big tables will be the ones that occupy the large majority of disk space used for indexes. Compress this proportionally small number of indexes, and you'll achieve maximum disk space savings with minimal effort and overhead.

I told the DBA to identify the largest indexes in a given DB2 system (and by "system," I'm referring to the objects for which information can be found in a particular DB2 catalog). He did that with a query similar to this one (the grouping and summing aggregates information pertaining to the partitions of a partitioned index):

SELECT
 DBNAME, NAME
,SUM(TOTALENTRIES) AS TOT_ENT
,SUM(SPACE) AS TOT_SPC_KB
FROM SYSIBM.SYSINDEXSPACESTATS
GROUP BY DBNAME, NAME
WITH UR;


We speak sometimes of the "80/20" rule, whereby 80% of results are tied to 20% of the things acted upon. In the case of the DB2 systems examined by the DBA with whom I was working, 80/20 would have understated the outcome: in one of the production DB2 environments, 1% of the indexes (the top 1% in terms of number of entries) occupied 85% of the total disk space used for indexes (and results were similar for the other DB2 systems at this site). By compressing a relatively small number of indexes, then, you can achieve a large decrease in disk space consumption (index compression can often deliver a 50-70% reduction in the disk space occupied by a given index). That's a very good bang for the index compression buck.
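
If you want the catalog query itself to surface those top space consumers, a variation along these lines (a sketch -- the FETCH FIRST value of 100 is arbitrary) ranks index spaces by total allocated space:

SELECT
 DBNAME, NAME
,SUM(TOTALENTRIES) AS TOT_ENT
,SUM(SPACE) AS TOT_SPC_KB
FROM SYSIBM.SYSINDEXSPACESTATS
GROUP BY DBNAME, NAME
ORDER BY TOT_SPC_KB DESC
FETCH FIRST 100 ROWS ONLY
WITH UR;

The top of that result set is your candidate list for compression.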

Second, choose the right page size for an index that is to be compressed. When a DB2 for z/OS table space is compressed, data is in compressed form in memory as well as on disk (data is compressed when inserted into a page, and decompressed when accessed on behalf of a program), so a 4KB page (for example) of compressed data in memory also occupies 4KB of space on disk.
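
For reference, enabling table space compression is itself a simple change -- a minimal sketch, with placeholder database and table space names, is shown below:

-- Placeholder names; once the compression dictionary has been built, data in
-- this table space will be in compressed form in memory and on disk
ALTER TABLESPACE MYDB.MYTS COMPRESS YES;

A subsequent REORG (or LOAD) of the table space builds the compression dictionary and stores the existing rows in compressed form.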

Not so with index compression. Index pages in memory are always uncompressed -- they are in compressed form only on disk. That being the case, index compression is achieved by squeezing the contents of a 32KB, 16KB, or 8KB index page in memory into a 4KB page on disk. Compressing an index, then, involves two actions: 1) change the index's page size to something larger than 4KB (if the current page size is 4KB) via an ALTER INDEX with a BUFFERPOOL specification that references an 8K, 16K, or 32K buffer pool; and 2) compress the index via a second ALTER INDEX with a COMPRESS YES specification. [The first ALTER INDEX, specifying an 8K, 16K, or 32K buffer pool, will place the index in AREOR status, indicating a pending DDL change -- a subsequent online REORG of the index will change the index's page size. The second ALTER INDEX, with the COMPRESS YES specification, will place the index in rebuild-pending status.] Knowing this, you might think, "Hey, I'll change the page size for all the indexes I want to compress to 32KB. That way, I'll get an 87% reduction in disk space for these indexes!"

That would not be a good move. Here's why: if a compressed index has 32KB-sized pages, DB2 will stop inserting entries into a page in memory if it determines that the page already holds the maximum amount of information that can be squeezed into a 4KB-sized page on disk; thus, going with 32KB pages for an index could result in space being wasted in the in-memory pages of the index. Does that mean that 8KB would be a better page-size choice for indexes that you want to compress? Not necessarily -- with 8KB-size pages, you might be missing out on disk space consumption reduction that could be achieved with a larger page size. What to do?

Thankfully, DB2 supplies a utility, DSN1COMP, that provides a nice assist with respect to choosing the right page size for an index that you want to compress. When you run DSN1COMP for an index, the output will show you, for each page size (8, 16, and 32 KB), the estimated amount of disk space savings that could be achieved with compression, and the estimated amount of space that would be wasted in the in-memory pages of the index. For a given index that you want to compress, the right page size will be the one -- per the DSN1COMP output -- that delivers the most disk space savings with the least amount of in-memory space wastage.
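
Once you have settled on a page size, the two ALTER INDEX changes described earlier might look like the following for a hypothetical index (the index name and the choice of a 16K buffer pool are placeholders -- this is just a sketch of the sequence):

-- Step 1: assign the index to a 16K buffer pool to change its page size.
-- This places the index in AREOR (pending DDL) status; a subsequent online
-- REORG of the index materializes the new page size.
ALTER INDEX MYSCHEMA.BIG_TABLE_IX1
  BUFFERPOOL BP16K0;

-- Step 2: compress the index.
-- This places the index in rebuild-pending status, so follow it with a
-- REBUILD INDEX (or another action that rebuilds the index).
ALTER INDEX MYSCHEMA.BIG_TABLE_IX1
  COMPRESS YES;

If DSN1COMP points you toward 8K or 32K pages instead, the only difference is the buffer pool referenced in the first ALTER.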

Finally, leverage z/OS LPAR memory resources to minimize the CPU overhead of index compression. As noted above, pages of a compressed DB2 for z/OS table space are compressed in memory as well as on disk. Consequently, data compression CPU overhead is incurred when a row is inserted into a table in a compressed table space (or an existing row is updated), and when a row is retrieved from a compressed page in memory. Because pages of compressed indexes are compressed only on disk, index compression CPU overhead is incurred when a compressed page is decompressed upon being read into memory, and when an in-memory index page is compressed as it is written to disk. In other words, the CPU cost of index compression is paid at read and write I/O time -- once an index page is in memory, there is no compression-related cost to access that page, because the page is in memory in uncompressed form.

Since the cost of index compression is associated with index I/O operations, it stands to reason that as these operations decrease, the CPU cost of index compression will go down. And what makes index I/O rates go down? A larger buffer pool. If you compress index XYZ, a subsequent enlargement of the buffer pool to which XYZ is assigned should reduce the rate of I/Os (especially read I/Os) for the index, and that should translate to reduced compression overhead for index XYZ. Where would you see this overhead reduction? That depends on the type of I/Os that are being reduced via buffer pool enlargement. To the extent that the rate of synchronous read I/Os is lowered by way of buffer pool enlargement, the in-DB2 (aka class 2) CPU time for programs using a compressed index should be positively impacted. Reduced prefetch read and database write activity should reduce the CPU consumption of the DB2 database services address space (aka DBM1) -- and recall that prefetch reads and database writes became 100% zIIP eligible starting with DB2 10, so a lower volume of that activity will cut down on associated zIIP engine consumption.
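
As an illustration (the buffer pool name and sizes here are just for the sake of example), going from 100,000 to 200,000 buffers for the 16K pool to which a compressed index is assigned would be a matter of issuing the DB2 -ALTER BUFFERPOOL command:

-ALTER BUFFERPOOL(BP16K0) VPSIZE(200000)

Keep the memory math in mind: 200,000 16KB buffers amounts to a bit more than 3 GB of virtual storage, so make sure the z/OS LPAR has the real storage needed to fully back the enlarged pool.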

Index compression has been an option for some years now in a DB2 for z/OS environment, but I have the impression that it's a DB2 feature that is not yet used as often as it should be. If you're not yet using index compression, think about putting the capability to work at your site; and, if you decide to compress some of your indexes, do it the right way.

10 comments:

  1. Hi Robert,
    Thank you for the article.
    Would you clarify this sentence:
    "And what makes index I/O rates go down? A larger buffer pool"

    Do you mean a larger buffer pool in terms of page size (like 8K, 16K, etc.), or a larger buffer pool in terms of space allocation?
    Regards, Ilya Kurepa

    Replies
    1. I mean a larger buffer pool in terms of more buffers. Example: a buffer pool with 200,000 buffers instead of 100,000 buffers.

      Robert

  2. Thanks Robert.

    But how could the rate of sync I/O go down? I understand that if pages get reused, then with a bigger buffer pool they will obviously stay in the pool longer, but what if I am reading the index in a skip-sequential manner (prefetch is ineffective in that case) and bringing new pages into the pool, thus incurring a high number of sync I/Os? How can index compression help me even with bigger buffers?

    Regards, Ilya

    Replies
    1. First of all, prefetch can be effective in a skip-sequential read situation. List prefetch might be used in that case (list prefetch, formerly used exclusively for data page access, can be used for index page retrieval starting with DB2 10).

      Second, of course when a page (index or table space) is first accessed and brought into memory, that will involve an I/O (synchronous or asynchronous). If the buffer pool into which the page is read is larger instead of smaller, the page should remain in memory for a longer period of time, and that will increase the chances of the page being in memory when it is next referenced, and that would reduce read I/O activity.

      Index compression should not be expected to reduce I/O activity - it is done to reduce disk space utilization. Larger buffer pools, by reducing index-related I/O activity, can help to reduce the CPU cost of index compression, because a large percentage of that cost is incurred when index pages are read from or written to disk.

      Robert

  3. Hi Robert,

    Thanks for the clarification.

    One more question. You're saying:
    "When a DB2 for z/OS table space is compressed, data is in compressed form in memory as well as on disk (data is compressed when inserted into a page, and decompressed when accessed on behalf of a program)"

    So, data pages stay compressed in memory, too. Does that apply to local buffer pools only? What about group buffer pools -- are the pages still in compressed form there?
    Regards

    Replies
    1. Compressed data pages are cached in group buffer pools in compressed form. There would be no purpose in decompressing a data page in a GBP, for the same reason there is no need to decompress such a page on disk: it is processed at the page level, versus at the row level, when it is in a GBP or on disk. Decompression of the contents of a compressed data page is done only when rows in the page need to be accessed.

      Robert

  4. Hi Rob,

    It would be useful if you could explain/write a separate page about virtual indexes.

    Replies
    1. Do you have a specific question about Db2 for z/OS virtual index functionality? Documentation on that functionality can be found on the Db2 for z/OS Knowledge Center (https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/usrtab/src/tpc/db2z_dsnvirtualindexes.html).

      Robert

  5. Hi Robert, I have a quick question: if DSN1COMP shows a 51% compression rate for 8K, 73% for 16K, and 73% for 32K, does picking 32K benefit us versus 16K, and why? Thanks for your help. Rachel

    Replies
    1. Hey, Rachel. In addition to estimating disk space savings for each possible index page size, DSN1COMP should provide you with an estimate of the percentage of space that would be wasted in in-memory index pages, for each page size (for a larger page size, especially, there might be some space wasted in in-memory pages, because Db2 will not put more entries in a page if doing so would prevent the page from being compressed down to 4K when written to disk). You indicate that disk space savings would be about the same for 16K and 32K-sized index pages. If it is estimated that 16K-sized pages would waste less space in-memory versus 32K-sized pages, go with the 16K-sized pages. Exception to that rule: if the index in question gets a whole lot of page split activity, and if that is hampering performance for processes that insert rows into the underlying table, and if DSN1COMP indicates that not much in-memory space would be wasted with 32K-sized pages, consider going that route to maximize reduction of index page split activity.

      Robert
