Wednesday, September 30, 2020

Db2 for z/OS Character Columns: Thoughts on CHAR versus VARCHAR

Recently, a Db2 for z/OS DBA asked me a question: for a table column that will hold character string values, are there any recommendations concerning the use of CHAR (fixed-length) versus VARCHAR (variable-length) for the column's data type?

That question has been asked, I think, many times by many different people over the 35+ years since Db2 made the scene. Even so, I find it to be an interesting query to this day. Why? Because various Db2 developments - some relatively recent and some from way back - have caused the answer to change over time. You can read on and see if you agree with me.

Now, it is of course true that in at least one case the question has a very simple answer: if a value to be stored in the column can exceed 255 bytes in length, CHAR cannot be the data type because 255 bytes is the maximum length you can specify for a fixed-length Db2 character string column. Longer than that and it has to be VARCHAR, right? Probably right, but maybe not - you could go with CLOB. Do you think that CLOB is the right choice only when the length of a character string value that will be stored in a column might exceed 32,704 bytes - the maximum length of a VARCHAR column? I'd say that that is not always true - there could be a situation in which you'd choose CLOB over VARCHAR even if no value longer than 32,704 bytes will go in the column. Suppose that the maximum length of a character string value that will be stored in the column is "only" 20,000 bytes, but the column will very rarely be referenced in a query's select-list and maybe never (or almost never) referenced in a query predicate. In that case, CLOB could be a good choice for the column's data type. Why? Because CLOB values (unless in-lined, and you wouldn't inline CLOB values that are rarely retrieved and rarely referenced in predicates) are physically stored in an auxiliary table in a LOB table space. The LOB table space could (should) be assigned to a buffer pool other than the one used for the base table's table space, and THAT would allow a lot more of the base table's pages (sans the LOB values) to be cached in a given amount of buffer pool space. You'd also have more rows in each base table page. More pages in memory and more rows per page means fewer GETPAGEs, fewer read I/Os, better CPU efficiency and better throughput. CLOB can be the right choice for a character string column, even when VARCHAR is technically usable.
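
To make that a little more concrete, here is a minimal sketch of the kind of set-up just described. All of the object names, the database, the buffer pool and the CLOB length are made up for illustration, and the base table space is assumed to already exist - the point is simply that the CLOB values live in their own LOB table space, which can be assigned to its own buffer pool.

-- Base table with a rarely referenced large character column (hypothetical names)
CREATE TABLE ORDER_HIST
  (ORDER_ID   INTEGER   NOT NULL,
   ORDER_DATE DATE      NOT NULL,
   ORDER_DOC  CLOB(20K))                 -- rarely retrieved, rarely referenced in predicates
  IN DB1.TS_ORDHIST;

-- The CLOB values live in their own LOB table space, assigned to a buffer pool other
-- than the one used for the base table's table space
CREATE LOB TABLESPACE TS_ORDLOB IN DB1
  BUFFERPOOL BP8K1;

CREATE AUXILIARY TABLE ORDER_DOC_AUX IN DB1.TS_ORDLOB
  STORES ORDER_HIST COLUMN ORDER_DOC;

CREATE UNIQUE INDEX XORDDOCAUX ON ORDER_DOC_AUX;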

How about the other end of the column-length spectrum: super-short character strings? Here again, the best choice is often pretty clear, owing to the two bytes of "baggage" that come along with a variable-length character column. For many years, you knew what that two bytes of baggage was: it was the length indicator for the VARCHAR column. These days, it could be a length indicator, or it could be an offset indicator. It will be one or the other depending on whether basic row format is used for the table's table space or partition (in which case it will be a length indicator) or reordered row format is in effect (in which case it is an offset indicator). I'll have more to say about basic versus reordered row format in a moment, but either way you're looking at two extra bytes for a varying-length value. Because of those two extra bytes, going with VARCHAR for a column that will hold values with a maximum length of one or two bytes does not make the best use of disk or buffer pool space - for optimal efficiency you should use CHAR(1) or CHAR(2) for a column that will hold values that will be, respectively, one byte or not more than two bytes in length.

Having considered values longer than 255 bytes and shorter than 3 bytes, we are left with the byte-length range 3-255 to ponder. Within that range, you could go with CHAR or VARCHAR. What might influence your decision? Here are some thoughts:

  • What is the variability in length of values that will go in the table column? The greater the variability, the more space that could be saved via use of VARCHAR versus CHAR (though table space compression, which is widely used, offsets much of this advantage of VARCHAR).
  • How important is it to make access to the data values in a column absolutely as efficient as possible? Maximum efficiency of access is generally associated with fixed-length columns, though reordered row format (RRF) makes access to VARCHAR columns more efficient than it used to be. How does reordered row format do that? It does that by physically storing variable-length columns at the end of a table's rows (logical column order is not affected by reordered versus basic row format) and preceding those variable-length columns with a series of 2-byte offset indicators - one for each variable-length column. The offset indicator for a given variable-length column shows where in the row the first byte of that column can be found. This is good for performance because to access a value in a variable-length column, Db2 goes straight to that column's offset indicator (easily done, because Db2 knows which of those 2-byte indicators it is going to, and the offset indicators are preceded by whatever fixed-length columns are in the row). Having gotten efficiently to the right offset indicator, Db2 sees exactly where to go to get to the variable-length column's first byte - boom-boom, done. Contrast that with the basic row format (BRF) situation, in which physical column order is the same as logical order in a row. If Db2 wants to access the fourth variable-length column in a BRF row, it can't get there without checking the length of each of the three variable-length columns ahead of the column of interest. Now, does RRF make access to variable-length columns just as efficient as access to fixed length columns? Not quite, but RRF does reduce the access-efficiency gap between fixed-length and variable-length columns.
  • How important is it to minimize the volume of data written to the Db2 log? Where UPDATE operations are concerned, Db2 tends to log fewer bytes when fixed-length character columns are updated versus updating of values in variable-length columns. That said, as reordered row format reduces the access-efficiency penalty associated with variable-length columns (as explained above), so too does RRF reduce the logging-volume penalty that goes along with such columns. How so? Well, when one fixed-length column in a row is updated, just that column change is logged. When a variable-length column is updated and that update changes the row's length (typically the case), data is logged from the first changed byte to the end of the row. Because RRF physically moves variable-length columns to the end of data rows, there will likely be fewer bytes between a variable-length column and the end of its row, and that means fewer bytes written to the log when that variable-length column is updated in a way that changes the column's (and thus, the row's) length. I have to say here that data compression muddies the water a bit, since with compression in effect an update of even a fixed-length column could change a row's length, leading to more bytes logged due to update activity (though that effect can be mitigated by the fact that data associated with changes to compressed tables is logged in compressed form). Still, when all is said and done, it is likely that maximum efficiency from a log-write perspective would be achieved by use of fixed-length versus variable-length columns, particularly for tables in uncompressed table spaces. You might have - or might want to have - heavily updated tables in uncompressed table spaces when those tables are not too large (the value of compression is a little less compelling for tables that are not so big).
  • Do you want to avoid row-location changes that result from UPDATEs that increase a row's length? When an UPDATE of a variable-length column increases a row's length so that the row will no longer fit in its page, Db2 will relocate the row in the table and will put in the old location a pointer to the row's new location. That's called an indirect reference, and those can have a negative impact on data-access efficiency (REORGs clean those indirect references up). Use of fixed-length character columns tends to result in fewer indirect references for a table (with data compression being a bit of a muddying factor, as it is for log-write volumes, and for the same reason: an update of a fixed-length column in a compressed row can cause that row to get longer). That said, the indirect-reference penalty associated with variable-length columns can be mitigated through use of the PCTFREE FOR UPDATE table space specification that was introduced with Db2 11 for z/OS (I posted an entry on PCTFREE FOR UPDATE to this blog a few years ago).
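
For illustration, here is what that PCTFREE FOR UPDATE specification could look like (the table space name and the percentage values are hypothetical; as with any free space specification, it takes effect for existing data when the data is next loaded or reorganized):

-- Reserve 5% of each page for inserts and an additional 10% for length-increasing updates
ALTER TABLESPACE DB1.TS_CUST
  PCTFREE 5 FOR UPDATE 10;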

And with that, I've provided just about everything I can think of on the subject of CHAR versus VARCHAR for character string values. I'd sum it up this way: the CHAR versus VARCHAR choice is most interesting for character string values that will be between 3 and 255 bytes in length. For such columns, there are situations that would favor CHAR versus VARCHAR, and vice versa. Even so, Db2 features and functions such as table space compression, reordered row format, and PCTFREE FOR UPDATE can reduce the negative effects associated with the use of CHAR or VARCHAR in these various situations. My advice to you: take advantage of Db2 functionality, and make the CHAR and VARCHAR decisions that seem reasonable to you given your application and data environment.

Friday, August 28, 2020

Db2 12 for z/OS RPN Table Spaces: New Possibilities for Range-Partitioned Tables

Sometimes, I assume that I've blogged about some Db2 for z/OS-related topic when in fact I have not. RPN table spaces is one such topic. In thinking about something to post this month, I briefly considered RPN table spaces and thought, "Nah. I've probably already blogged on that subject, given that it's a big deal." Lo and behold, I haven't blogged about RPN table spaces. I've spoken of them numerous times in presentations over the past few years, but written nothing about them in this blog (save for a brief mention in a 2017 response to a comment associated with an entry on universal table spaces). Time to rectify that situation.

RPN table spaces (I'll explain the acronym in a moment) were introduced with Db2 12 for z/OS (with function level 500 or higher activated). An RPN table space is essentially a new type of universal range-partitioned table space. Before getting into the details, I'll provide a few examples of gripes and concerns that Db2 DBAs have regarding traditional universal partition-by-range (PBR) table spaces:

  • "I have a table in a PBR table space that is really big and getting bigger. I'm starting to sweat about only being able to put 16 TB of data in the table. Yeah, I could get up to 128 TB of data in the table if I went with a page size of 32K, but we had good reason to go with 4K pages for this table space."
  • "I don't like the fact that my choice for DSSIZE for a PBR table space impacts the maximum number of partitions for the table space. 256G is the DSSIZE we want for a particular PBR table space, but the limit on partitions for the table space would then be 512 - and that only if we go with 32K for the page size (if we go with 4K for the page size and 256G for DSSIZE, I can only have 64 partitions)."
  • "Ugh. I have a big table with 1000 partitions in a PBR table space. DSSIZE is 8G, and that has been just fine for 999 of the partitions. One partition is about to hit the 8G size limit. I'm going to have to go to 16G for DSSIZE for THE WHOLE TABLE SPACE, even though ONLY ONE PARTITION NEEDS THE LARGER DSSIZE VALUE."

The RPN table space feature of Db2 12 addresses all of those issues, and more. It does that by way of a very important change versus traditional PBR table spaces. This change pertains to the numbering of pages in a range-partitioned table space. In a traditional PBR table space, every page of the table space has a unique number. In an RPN table space, we number the pages in partition 1, and then for partition 2 we start over again with regard to numbering pages. For partition 3, same thing - start over again with page numbering. Same for partition 4, partition 5, and so on. What this means: if I have an RPN table space with 1000 partitions, the first page in each and every partition has the same page number (referring to the number stored in the data page itself). You read that right: in the table space, there will be 1000 first-in-the-partition pages that all have the same page number. There will also be 1000 second-in-the-partition pages that all have the same page number (assuming that each partition has at least 2 pages). Does this sound like something that would lead to chaos? In fact there is no chaos because in an RPN table space it is still very much possible to uniquely identify a page. The twist, versus a traditional PBR table space, is that in an RPN table space a page is uniquely identified by the combination of its page number and its partition number. That page numbering scheme is called relative page numbering, and that gives us the acronym RPN (the type of page numbering used for a traditional PBR table space is called absolute page numbering).

OK, so what can I do with an RPN table space that I can't do with a traditional PBR table space? Plenty:

  • You can put up to 4 petabytes (that's 4096 terabytes) of data into one table in an RPN table space, and that's true for any page size you choose for the table space.
  • Row-capacity-wise, you can put up to 280 trillion rows (yes, that's "trillion," with a "t") into one table in an RPN table space, if you use 4K pages for the table space (if you go with 32K pages, you can put "only" 35 trillion rows into a table in an RPN table space).
  • With an RPN table space, you can have up to 4096 partitions of any allowable size - in other words, the limit on the number of table space partitions does not go down as DSSIZE goes up; AND, the maximum DSSIZE is 1024G (versus 256G for a traditional PBR table space); AND, DSSIZE for an RPN table space can be nG, with "n" being any integer between 1 and 1024 (in other words, "n" no longer has to be a power of 2, as is the case for a traditional PBR table space - if you want to go with a DSSIZE of, for example, 131G for an RPN table space, go for it).
  • Different partitions of one RPN table space can have different DSSIZE specifications. Consider the scenario I described previously, in which a DSSIZE of 8G is fine for all but one partition in a 1000-partition table space. Let's say the one partition about to hit the 8G size limit is partition 327, and a DBA wants to take DSSIZE for that partition to 17G. With an RPN table space (which makes 17G a valid value for DSSIZE, as noted above), the DBA alters only partition 327 to have DSSIZE 17G; and, get this: that ALTER is an immediate change - partition 327 can immediately get larger than its former 8G DSSIZE, without even a REORG of the partition required to put the DSSIZE change into effect (if the DBA had altered DSSIZE for the partition to a value smaller than the previous value, it would be a pending change and a REORG of the partition would be required to put the smaller DSSIZE value into effect). A sketch of that partition-level ALTER appears just after this list.
  • The DSSIZE flexibility provided by an RPN table space extends to partitioned indexes defined on the table in the table space: DSSIZE for partitions of such an index can be nG, with "n" being any integer between 1 and 1024, and different DSSIZE values can be used for different partitions of an index. Speaking of indexes, note that an index on a table in an RPN table space will be slightly larger than an index defined on the same table in a traditional PBR table space. That is so because the size of a RID goes from 5 bytes to 7 bytes when data is in an RPN table space versus a traditional PBR table space.
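
Here is a sketch of that partition-level ALTER, using the hypothetical object names and partition number from the scenario above:

-- Increase the maximum size of just one partition of an RPN table space - an immediate
-- change, with no REORG required to put it into effect
ALTER TABLESPACE DB_SALES.TS_SALES
  ALTER PARTITION 327
  DSSIZE 17 G;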

That's some good stuff, eh? Wanna know how you can have RPN table spaces in your Db2 12 system? I'll tell you. First, for a new table space, in the CREATE TABLESPACE statement you can include the option PAGENUM RELATIVE. You can also create an RPN table space without even specifying PAGENUM RELATIVE. How? By setting the new (with Db2 12) ZPARM parameter PAGESET_PAGENUM to RELATIVE. Doing that will make RPN the default for a new universal PBR table space created in the system (whether PAGESET_PAGENUM is set to RELATIVE or to its default value of ABSOLUTE, that specification can be overridden at the individual table space level via use of the PAGENUM option of CREATE TABLESPACE).
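
Here is a minimal sketch of a CREATE TABLESPACE statement for an RPN table space (the names and values shown are made up for illustration):

-- A 1000-partition range-partitioned table space that uses relative page numbering
CREATE TABLESPACE TS_SALES IN DB_SALES
  NUMPARTS 1000
  PAGENUM RELATIVE
  DSSIZE 8 G
  BUFFERPOOL BP2;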

How about an existing universal PBR table space of the traditional variety? Can it be changed to an RPN table space? Yep - just ALTER the table space with PAGENUM RELATIVE, and then do an online REORG of the table space to materialize the change (yes, you have to REORG the table space in its entirety to get to RPN, but the good news is that once the conversion is done you should have fewer situations that would necessitate a REORG of the whole table space - that's part of the RPN value proposition).

Can a "classic" range-partitioned table space be converted to RPN ("classic" referring to a table space that is not universal and is associated with a table that uses table-controlled partitioning versus index-controlled partitioning)? Yes. First, you ALTER the table space with a SEGSIZE specification to indicate that you want to convert it from non-universal to universal PBR (a value of 64 for SEGSIZE is generally appropriate). After that, issue a second ALTER for the table space, this time specifying PAGENUM RELATIVE. After both of those ALTER TABLESPACE statements have been executed, a single online REORG will make the table space both universal and RPN.

If I were administering a Db2 12 system, I think I'd want to make extensive use of relative page numbering for my range-partitioned table spaces, for the DSSIZE flexibility (and the ability to immediately increase a partition's maximum size) and for enhanced scalability. I encourage you to put this new Db2 feature to work for the benefit of your organization.

Friday, July 31, 2020

A Case Study - Db2 for z/OS High-Performance DBATs

Over the years, I have posted several entries to this blog that have covered various aspects of Db2 for z/OS high-performance DBATs (examples are entries posted in 2016 and 2013). Over the past couple of weeks, I have been working with a Db2 for z/OS administrator who is engaged in implementing high-performance DBAT functionality for the first time at his site. That implementation effort has involved a good bit of back-and-forth communication between me and the Db2 administrator, and it occurred to me that it would make for a good case study that touches on many of the issues associated with high-performance DBAT set-up, monitoring and usage. Accordingly, I decided to use this case study as the basis for a blog entry.

OK, so first there is the matter of getting high-performance DBAT functionality in the first place. How do such DBATs come into being (and a DBAT, or database access thread, is the kind of thread used for an application process that accesses Db2 data by way of the distributed data facility, or DDF)? There are two requirements here:
  1. A package or packages bound with RELEASE(DEALLOCATE). The CPU efficiency boost provided by high-performance DBATs is attributable to RELEASE(DEALLOCATE) packages that stay allocated to such a thread, and to parent locks (table space- or partition-level locks) that, once acquired, also stay allocated to the thread, until the thread is terminated after having been reused by 200 transactions (parent locks acquired in the execution of dynamic SQL statements issued through a RELEASE(DEALLOCATE) package are retained until thread termination if dynamic statement caching is in effect). If the same packages would be used for all - or at least most - of the 200 transactions that reuse a high-performance DBAT, and if the same table spaces would be accessed by all - or at least most - of the thread-reusing transactions, what we save is the CPU that would be consumed in releasing and reacquiring for the thread the same packages and table space locks, over and over again when one transaction completes and another reuses the thread (keep in mind that parent locks are almost always of the "intent" variety, and intent locks do not conflict with each other, so we do not worry about retaining them for a thread as transactions reusing a thread come and go).
  2. A DDF PKGREL setting of BNDOPT. The PKGREL parameter of a Db2 subsystem's distributed data facility is the "on/off switch" for high-performance DBATs. The parameter's value is set via execution of a -MODIFY DDF command, and when PKGREL = BNDOPT then the high-performance DBAT switch is "on," and when PKGREL = COMMIT the switch is in the "off" position (another PKGREL setting, BNDPOOL, can be appropriate when application servers disconnect from a Db2 system soon after connecting, versus the more common scenario in which servers connect to Db2 and stay connected for a long time). Why this on/off switch? Because it is possible that a RELEASE(DEALLOCATE) package allocated to a high-performance DBAT could get in the way of a database administration task such as an ALTER of a database object on which the package is dependent. If that situation were to present itself, a Db2 DBA could issue -MODIFY DDF PKGREL(COMMIT), wait for the existing high-performance DBATs to go away (that should not take more than about 4 minutes), perform the database administration function while DDF transactions continue to execute using "regular" DBATs, and then turn high-performance DBAT functionality back on by issuing -MODIFY DDF PKGREL(BNDOPT).

Next question: which package or packages should be bound with RELEASE(DEALLOCATE) for the purpose of getting high-performance DBAT functionality? If you want a certain DDF application to use high-performance DBATs, and that application calls Db2 stored procedures, one option would be to bind the packages of those stored procedures (or at least the ones that are called most frequently) with RELEASE(DEALLOCATE). The DDF application workload at the site I referenced above involves very little in the way of stored procedure calls, so the Db2 administrator decided to get high-performance DBAT functionality by binding the IBM Data Server Driver (formerly Db2 Connect) packages with RELEASE(DEALLOCATE). The default collection for those packages is called NULLID. Did the Db2 administrator bind the packages in the NULLID collection with RELEASE(DEALLOCATE)? No. Why? Because doing that would cause all DDF applications (more specifically, all such applications that are DRDA requesters, as opposed to applications that use the REST interface to DDF) to use high-performance DBATs, and that would likely be a sub-optimal situation. What you want is to utilize high-performance DBATs for selected DDF applications. How do you do that? You do what the Db2 administrator did: you BIND COPY the IBM Data Server Driver packages into another collection, specifying RELEASE(DEALLOCATE) in so doing, and then you point a DDF application for which you want high-performance DBAT usage to the alternate collection of IBM Data Server Driver packages using the Db2 profile tables (this technique is described in an entry I posted to this blog a couple of years ago).
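
Pulling those steps together, here is a rough sketch of what the set-up could look like. The collection name, profile ID and authorization ID are made up for illustration, SYSSH200 is shown as just one representative Data Server Driver package (you would BIND COPY each NULLID package of interest), and the profile shown is scoped by authorization ID only - see the earlier blog entry and the Db2 documentation on the profile tables for the full range of options.

-- BIND COPY a Data Server Driver package from NULLID into an alternate collection,
-- specifying RELEASE(DEALLOCATE)
BIND PACKAGE(COLL_HIPERF) COPY(NULLID.SYSSH200) ACTION(REPLACE) RELEASE(DEALLOCATE)

-- Point a DDF application (identified here by its authorization ID) to that collection
INSERT INTO SYSIBM.DSN_PROFILE_TABLE (PROFILEID, AUTHID, PROFILE_ENABLED)
  VALUES (101, 'APPAUTH1', 'Y');

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES (PROFILEID, KEYWORDS, ATTRIBUTE1)
  VALUES (101, 'SPECIAL_REGISTER', 'SET CURRENT PACKAGE PATH = COLL_HIPERF');

-- Then activate the profile by issuing the Db2 command -START PROFILE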

Time for a test! The Db2 administrator verified that PKGREL was set to BNDOPT in a development Db2 environment, and then he got some DDF activity going in that environment, with applications accessing Db2 using the IDs for which profiles had been created to point the applications to the collection in which the IBM Data Server Driver packages had been bound with RELEASE(DEALLOCATE). He used his Db2 monitor to check on high-performance DBAT usage, and saw that there was no high-performance DBAT activity whatsoever on the system - evidenced by the big fat zero in the DDF statistics field labeled by his monitor as "Active DBATs Bound with RELEASE(DEALLOCATE)" (note that different Db2 monitors give this field a slightly different label). What was going on? Why was there no high-performance DBAT activity? There was no such activity because the Db2 administrator had overlooked one thing: after one or more profiles have been defined using the Db2 profile tables, they have to be activated if you want them to be in effect. The Db2 administrator issued the command -START PROFILE to load the contents of the profile tables into memory, thereby activating all profiles for which PROFILE_ENABLED had been set to Y in DSN_PROFILE_TABLE, and after doing that he saw that high-performance DBATs were indeed being used in the development Db2 system (note that profiles can be automatically activated by setting the ZPARM parameter PROFILE_AUTOSTART to YES - the default value for that ZPARM is NO).

OK, on to the production environment. The Db2 administrator BIND COPY-ed the IBM Data Server Driver packages from NULLID into an alternate collection, with a specification of RELEASE(DEALLOCATE); set up the appropriate Db2 profiles and attributes (and activated them!); made sure that the DDF PKGREL setting was BNDOPT (verified in the output of a -DISPLAY DDF DETAIL command); and... saw no use of high-performance DBATs. Huh? What was going on? Why were high-performance DBATs not used? Here's why: when an application connects to a Db2 system via DDF, it picks up its default package collection at connection time, and in the production environment at this site - as at many sites - DDF-using applications connect to Db2 and stay connected for a long time. Those applications were already connected to the Db2 system when the alternate IBM Data Server Driver package collection was established and profiles were set up and activated to point applications to that collection. The applications that the Db2 administrator wanted to direct to the alternate IBM Data Server Driver package collection would not use packages in that collection until they established new connections to the Db2 system. The Db2 administrator knew that the Db2 system was going to be "bounced" (stopped and restarted) during the upcoming weekend, and he figured - rightly - that when applications subsequently reconnected to the Db2 system they would be directed to the alternate IBM Data Server Driver package collection (the one with the packages bound with RELEASE(DEALLOCATE)) thanks to the Db2 profiles he has set up and activated.

One more plot twist. The Db2 administrator had bound the packages in his alternate IBM Data Server Driver package collection with RELEASE(DEALLOCATE) and with KEEPDYNAMIC(YES). I explained to him that I generally view those two specifications as either/or options, versus both-together options. In other words, I would bind the IBM Data Server Driver packages in an alternate collection with either RELEASE(DEALLOCATE) or KEEPDYNAMIC(YES), and with that choice before me I would go with RELEASE(DEALLOCATE) in a large majority of cases (the packages in the NULLID collection should be bound with RELEASE(COMMIT) and KEEPDYNAMIC(NO)). KEEPDYNAMIC(YES) has some baggage associated with it (as explained in an entry I posted to this blog in 2016 - scroll down near the bottom of that entry), and getting maximum performance benefit from KEEPDYNAMIC(YES) requires programs to be coded with that option in mind - and most are not. On top of that, I really like the on/off switch that we have for high-performance DBATs (provided, as previously mentioned, by the Db2 DDF PKGREL parameter).

The Db2 administrator agreed that the packages in his alternate IBM Data Server Driver package collection should be bound with RELEASE(DEALLOCATE) and with KEEPDYNAMIC(NO), and he wanted to rebind the packages with those specifications but found that he could not do that because the packages were in use (note: the rebind phase-in functionality provided by Db2 12 function level 505 can take care of that situation). This "in-use" situation surprised him, because he had set PKGREL for the Db2 system to COMMIT to turn off high-performance DBAT functionality (he planned to turn high-performance DBAT functionality back on after rebinding the packages in the alternate IBM Data Server Driver package collection), and following the PKGREL change he saw that the number of high-performance DBATs did indeed go to zero for the system. Why, then, were the packages in what he thought of as his high-performance DBAT collection still being used? I told the administrator that PKGREL does NOT affect the package collection that a DDF application uses; rather, it determines whether the RELEASE specification of packages allocated to DBATs for execution will be honored, if that specification is DEALLOCATE. If a DDF application is pointed to collection COLL_XYZ, and the packages in that collection are bound with RELEASE(DEALLOCATE), those packages will be used if PKGREL is set to COMMIT. In that case, however, the packages will be treated by Db2 as though they had been bound with RELEASE(COMMIT). If the PKGREL setting is changed to BNDOPT, the RELEASE(DEALLOCATE) bind option (for which BNDOPT is short) will be honored.

And there you have it. One high-performance DBAT-related case study that touches on so many different aspects of this Db2 for z/OS feature. I hope that the information provided in this blog entry will be useful for you.

Monday, June 29, 2020

The Various Aspects of Db2 for z/OS Data Encryption

Just last week, a person on the Db2 for z/OS team at an organization asked me for some guidance regarding data encryption in a Db2 context. In responding to that request, I noted that Db2 for z/OS data encryption is a multifaceted topic, and I proceeded to describe various categories of Db2 data encryption and the Db2 features and functions that pertain to those categories. Thinking that this information might be of interest to a larger audience, I'll provide it in this blog entry.

The categories of Db2 for z/OS data encryption I will address herein are as follows:
  • Application-transparent encryption of Db2 data "at rest" (i.e., on disk)
  • Application-transparent encryption of Db2 for z/OS data in memory (i.e., in a Db2 buffer pool) as well as on disk
  • Application-aware encryption of data in a column of a Db2 table, on-disk and in-memory
  • Application-transparent encryption of Db2 data "on the wire" (i.e., in-motion on a TCP/IP link between a client application and a Db2 for z/OS server)

Application-transparent encryption of Db2 for z/OS data "at rest" (i.e., on disk)

One way to get this done is via a feature of your disk subsystem, if the disk subsystem that you use with Db2 for z/OS has that feature.

Another way to get this done is through the use of a z/OS feature called, simply, z/OS data set encryption. This z/OS capability can be used with Db2 11 or Db2 12 for z/OS, but it got quite a bit easier to use with Db2 12, at function level M502 or higher (more on that to come). With z/OS data set encryption, data is encrypted automatically when written to disk, and it is decrypted automatically when read from disk. z/OS data set encryption is enabled by associating an encryption key label with a data set (it is of course very important to secure and protect an encryption key, so instead of referencing the key itself we reference its label, which you can think of as the key's "handle"). Here is the difference between a Db2 environment that is at V12 with function level M502 or higher activated, and one that is not (function levels, by the way, are related to the "continuous delivery" mode by which new Db2 12 functionality is provided to users):
  • Db2 12, at function level M502 or higher: A DBA can issue an ALTER TABLE or ALTER STOGROUP statement with a new option, KEY LABEL, to assign an encryption key label to a set of Db2 data sets (for ALTER TABLE, the specified encryption key label will also apply to index and auxiliary table data sets). Function level M502 also provides a new ZPARM parameter, ENCRYPTION_KEYLABEL, through which a key label can be specified for the Db2 catalog and directory data sets, and for archive log data sets (when the latter are written to disk - these will remain encrypted if they are later HSM-migrated to tape). When a key label has been associated with a database object, actual encryption of that object's data can be accomplished via execution of the Db2 online REORG utility (or any other Db2 utility that involves creation of a new data set or data sets, such as LOAD REPLACE or RECOVER or REBUILD INDEX). Note that the KEY LABEL option is valid for CREATE TABLE and CREATE STOGROUP statements, as well as for ALTER statements. A brief sketch of this approach appears just after this list.
  • Db2 12, at function level M501 or lower, or Db2 11: A key label is associated with a Db2 data set either via a RACF data set profile or by way of an SMS data class specification or through an explicit IDCAMS definition (in the case of a user-managed data set). As noted above, once a key label has been associated with an existing Db2 data set, actual encryption of the data in that data set can be accomplished via execution of a Db2 utility (such as online REORG) that will involve creation of a new data set (or data sets).
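
As an illustration of the M502-or-higher approach, here is a sketch of a key label assignment followed by the utility execution that actually encrypts the existing data. The table, table space and key label names are all hypothetical, the key label itself must already exist in ICSF, and other utility options are omitted.

-- Assign an encryption key label to a table (the label will also apply to the table's
-- index and auxiliary table data sets)
ALTER TABLE PROD.CLAIM_DETAIL
  KEY LABEL PRODKEYLABEL01;

-- The existing data is actually encrypted when the underlying data sets are newly
-- allocated - for example, by an online REORG
REORG TABLESPACE DBCLAIM.TSCLAIM SHRLEVEL CHANGE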

One more thing about z/OS data set encryption: for data decryption to occur when an encrypted data set is accessed, the ID of the accessing process has to be RACF-permitted to access the data set's key label. For Db2 data sets, the only IDs that need this RACF permission to access the associated key labels are the IDs of the Db2 database services and system services address spaces (i.e., the Db2 DBM1 and MSTR address spaces). This is so because, when user SMITH issues a SELECT for table T1, z/OS does not perceive that SMITH is accessing the table's data set(s) - it perceives that Db2 is accessing the data set(s).


Application-transparent encryption of Db2 for z/OS data in memory (i.e., in a buffer pool) as well as on disk

This can be accomplished using Db2 edit procedures (EDITPROCs), for encryption of entire rows in tables, or via Db2 field procedures (FIELDPROCs) for encryption of data in individual columns. An IBM software product, called Guardium Data Encryption for Db2 and IMS Databases, makes it much easier to use Db2 EDITPROCs and/or FIELDPROCs for this purpose.

Note that CPU overhead for this type of Db2 data encryption will likely be higher than for z/OS data set encryption, because with the latter data is decrypted when read into memory, whereas with EDITPROC- or FIELDPROC-based encryption, data is encrypted in memory and is decrypted every time it is accessed by an application process.


Application-aware encryption of data in a column of a Db2 table, on-disk and in-memory

This has been do-able for some time by way of a built-in Db2 scalar function called ENCRYPT_TDES. Db2 12 function level M505 provided a new built-in scalar function, ENCRYPT_DATAKEY, that is even more robust than ENCRYPT_TDES. Several factors contribute to the stronger column-level data protection provided by ENCRYPT_DATAKEY versus ENCRYPT_TDES:
  • ENCRYPT_DATAKEY uses 256-bit encryption, while ENCRYPT_TDES uses 128-bit encryption.
  • With ENCRYPT_TDES, an application provides a password when data is encrypted, and that same password must be provided when the data is to be decrypted. What if that password becomes known to someone who shouldn't know it? With ENCRYPT_DATAKEY, access to encrypted data requires specification of the right key label, and on top of that the ID of the application process must be RACF-permitted to access that key label. If someone knows the key label that has to be provided to decrypt data encrypted via the ENCRYPT_DATAKEY function, that person will nonetheless be unable to access decrypted data if he or she is using an ID that has not been RACF-permitted to access the key label.
  • ENCRYPT_DATAKEY offers an option whereby duplicate values in a column will have unique encrypted values (in other words, two values in the column that are duplicates when decrypted will be unique when encrypted). This option makes the data encryption even harder to defeat. 
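
To show the application-aware nature of this kind of encryption, here is a small sketch using the older ENCRYPT_TDES and DECRYPT_CHAR functions (the table and column names and the password are made up, and the encrypted column would need to be defined as a sufficiently long VARCHAR FOR BIT DATA column). ENCRYPT_DATAKEY is used in a similar application-aware fashion, with a key label instead of a password - see the SQL Reference for its exact syntax.

-- Encrypt a value as it is inserted (hypothetical table and column)
INSERT INTO PROD.CUSTOMER (CUST_ID, TAX_ID_ENC)
  VALUES (1001, ENCRYPT_TDES('123-45-6789', 'Pa55word'));

-- Decrypt the value on retrieval - the application must supply the same password
SELECT DECRYPT_CHAR(TAX_ID_ENC, 'Pa55word')
  FROM PROD.CUSTOMER
  WHERE CUST_ID = 1001;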

Application-transparent encryption of Db2 data "on the wire" (i.e., in-motion between a client application and a Db2 for z/OS server)

This is done by way of SSL encryption (or, more precisely, via AT-TLS, which is short for application-transparent, transport-layer security). When a client application - either a DRDA requester or a program utilizing the REST interface to Db2 - interfaces with the Db2 distributed data facility (DDF) and specifies the Db2 system's secure SQL listener port, Db2 will require the use of SSL encryption. The client application will then need the Db2 server's digital certificate (i.e., a certificate obtained from the target z/OS system) in its key store, so that the SSL "handshake" with the z/OS system can complete successfully. The appropriate AT-TLS policy, defined on the z/OS side, will be applied to enable SSL communication between the client and the Db2 server.

The process of initially setting up SSL encryption between a client process and a Db2 for z/OS system can be a bit involved because several pieces of work have to fit together. There is client-side work to be done: the server certificate from the target z/OS system needs to be obtained and placed properly in a key store database, and the right connection string has to be used (and/or the right information has to be specified in the IBM Data Server Driver's db2dsdriver.cfg file) to request an SSL connection to the Db2 server. There is Db2 work to be done: a secure SQL listener port needs to be specified for the Db2 system. There is z/OS work to be done: an AT-TLS policy has to be defined and implemented by way of a policy agent. The best reference I've seen that covers all these pieces of work is an IBM "red paper" titled, DB2 for z/OS: Configuring TLS/SSL for Secure Client/Server Communications (download-able from http://www.redbooks.ibm.com/abstracts/redp4799.html?Open). If you get the right people in your organization involved, with the right skills and knowledge (client-side, Db2 for z/OS, z/OS Communications Server, RACF), you should be successful in getting SSL communications set up between Db2 for z/OS and network-connected applications.

That concludes this overview of Db2 for z/OS data encryption in its various forms. I hope that this information will be useful for you.

Thursday, May 28, 2020

Db2 12 for z/OS Buffer Pools - Recommendations for PGSTEAL, PGFIX and FRAMESIZE

Not too long ago, I received a note from the leader of a Db2 for z/OS database administration team, asking for my take on a Db2 12 buffer pool configuration set-up that his team had devised. An interesting characteristic of the configuration was the use of the options PGSTEAL(NONE) and FRAMESIZE(2G) for most of the Db2 subsystem's buffer pools. The team's reasoning regarding this choice was as follows:

  • The z/OS LPAR in which the Db2 subsystem runs has a very large real storage resource, and most of the Db2 subsystem's buffer pools are very large, so why not use 2 GB real storage page frames?
  • Most of the Db2 subsystem's buffer pools are so large that the total read I/O rate for each pool is very low. A low read I/O rate for a pool suggests a low rate of buffer stealing. When that is the case, why not use PGSTEAL(NONE) for the pool, since PGSTEAL(NONE) does not preclude buffer stealing when that is necessary (as it likely will be if the pool is not large enough to hold all pages of all objects assigned to the pool)?

I explained to the DBA team leader that the buffer pool configuration he'd shared with me was likely sub-optimal, owing largely to over-use of the PGSTEAL(NONE) option. I provided some buffer pool configuration recommendations for his consideration. Those recommendations are shown below, in "thumbnail" (i.e., highly condensed) form. Following the list of recommendations is the rationale for each one. I hope that this information will be helpful to you for optimizing the buffer pool configuration in your Db2 12 for z/OS environment.


The list

  • If ALL of the pages of ALL the objects assigned to a buffer pool will fit into 90% of the pool's buffers*, use PGSTEAL(NONE), PGFIX(YES) and FRAMESIZE(1M) for the pool (* for a pool that has a VPSIZE greater than or equal to 64000, this recommendation applies if all pages of all objects assigned to the pool will fit into VPSIZE minus 6400 buffers).
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is at least 20 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(2G) for the pool.
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is less than 2 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(1M) for the pool.
  • If it is expected that there will be some buffer stealing for a pool, and the size of the pool is between 2 GB and 20 GB, use PGSTEAL(LRU) and PGFIX(YES) for the pool. For the pool's real storage frame size, use either FRAMESIZE(1M) or FRAMESIZE(2G) - your choice.
  • If it is expected that the rate of buffer stealing for a pool will be non-zero but VERY LOW (maybe less than 10 buffer steal actions per minute), you MIGHT consider using PGSTEAL(FIFO) for the pool.
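
Expressed as -ALTER BUFFERPOOL commands, those recommendations could look something like the sketches below (the buffer pool names and VPSIZE values are made up, and the size arithmetic assumes 4K-page pools):

-- A pool whose objects will all fit into 90% of its buffers (no expected buffer stealing)
-ALTER BUFFERPOOL(BP10) VPSIZE(50000) PGSTEAL(NONE) PGFIX(YES) FRAMESIZE(1M)

-- A large pool (6,000,000 4K buffers is roughly 24 GB) with some expected buffer stealing
-ALTER BUFFERPOOL(BP11) VPSIZE(6000000) PGSTEAL(LRU) PGFIX(YES) FRAMESIZE(2G)

-- A smaller pool (400,000 4K buffers is roughly 1.6 GB) with some expected buffer stealing
-ALTER BUFFERPOOL(BP12) VPSIZE(400000) PGSTEAL(LRU) PGFIX(YES) FRAMESIZE(1M)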

The rationale

Recommendation: If ALL of the pages of ALL the objects assigned to a buffer pool will fit into 90% of the pool's buffers*, use PGSTEAL(NONE), PGFIX(YES) and FRAMESIZE(1M) for the pool (* for a pool that has a VPSIZE greater than or equal to 64000, this recommendation applies if all pages of all objects assigned to the pool will fit into VPSIZE minus 6400 buffers).

Rationale: Why recommend that PGSTEAL(NONE) be used only for Db2 12 buffer pools for which not more than 90% of the buffers (or not more than VPSIZE minus 6400 buffers, when VPSIZE is greater than 64000) will be filled by pages of objects assigned to the pool, when buffer stealing for such a pool can be done, when required? Here's why: Db2 12 manages a PGSTEAL(NONE) buffer pool differently versus Db2 11. In a Db2 12 environment, when the pages assigned to a PGSTEAL(NONE) buffer pool are read into that pool (as they will be when the object is first accessed after the pool has been allocated), they will effectively be arranged in memory as they are arranged on disk. This is why a PGSTEAL(NONE) pool is also referred to as a "contiguous buffer pool" in a Db2 12 system. The in-memory-as-on-disk arrangement of pages reduces the CPU cost of accessing the pages in the pool, because every access to a page in the contiguous part of a PGSTEAL(NONE) buffer pool (there is also a non-contiguous part, as explained below) is a direct access - Db2 knows exactly where a given page in the contiguous part of the pool is located, without having to deal with the hash and LRU chains that are associated with buffer management in a standard pool. Now, a PGSTEAL(NONE) buffer pool still has to allow for buffer stealing in case that is required. How can you enable buffer stealing and still have pages of table spaces or indexes assigned to the pool arranged in memory as they are arranged on disk?

The fact of the matter is, you can't have it both ways - not for the whole pool, at least. You can't both allow for buffer stealing and maintain an in-memory-as-on-disk arrangement of pages in all of the pool's buffers. Here's how Db2 12 addresses this issue: it preserves an in-memory-as-on-disk arrangement of pages for 90% of the buffers of a PGSTEAL(NONE) pool (or for VPSIZE minus 6400 buffers, when VPSIZE is greater than 64000). That is the contiguous part of the pool. Any buffer stealing that has to happen will involve buffers in the remainder of the pool, which is called the overflow area. The number of buffers in the overflow area will be 10% of the pool's VPSIZE if VPSIZE is 64000 or less, and will be 6400 buffers if the pool's VPSIZE is greater than 64000 (for a very small pool, with a VPSIZE smaller than 500, the number of buffers in the overflow area will be 50). What this means is that the maximum CPU efficiency benefit of a Db2 12 PGSTEAL(NONE) buffer pool is achieved not just when there is no buffer stealing for the pool, but when the pool will not get more than 90% full of pages (or when not more than VPSIZE minus 6400 buffers will be occupied by pages, when the pool's VPSIZE is greater than 64000). Why? Because when Db2 has to put pages in the overflow area of a PGSTEAL(NONE) pool, it is no longer arranging all pages in the pool as they are arranged on disk - pages in the overflow area are not managed that way (they can't be, because that's where buffer stealing, if required, will happen). Suppose that a Db2 12 PGSTEAL(NONE) pool has 10,000 buffers. The maximum CPU efficiency benefit for that pool will be realized if not more than 9000 of the pool's buffers are used (i.e., when the pool's overflow area is not used). Another example: if VPSIZE for a PGSTEAL(NONE) pool is 100000, the maximum CPU efficiency benefit for the pool will be realized if not more than 93,600 of the pool's buffers (100,000 minus 6400) are used. To the extent that the overflow area is used, there will be a slight reduction in CPU efficiency delivered via the pool, because access to pages in the overflow area will not be quite as CPU efficient as access to pages in the contiguous part of the pool (where pages are arranged in memory as they are arranged on disk). If the overflow area fills up and some buffer stealing happens, there will be a further reduction in the CPU efficiency delivered by way of the PGSTEAL(NONE) pool. [Note that buffer stealing - if required - in the overflow area of a PGSTEAL(NONE) buffer pool will be handled via the traditional LRU (least recently used) algorithm.]

What about the FRAMESIZE(1M) recommendation for a PGSTEAL(NONE) buffer pool in a Db2 12 system? Why not go with FRAMESIZE(2G) if a PGSTEAL(NONE) pool will be at least 2 GB in size? Here's why: in a Db2 12 environment, FRAMESIZE(2G) will not be honored for a PGSTEAL(NONE) pool. The reason for this: if pages are going to be arranged in the contiguous part of a PGSTEAL(NONE) pool as they are on disk for maximum efficiency of page access (and that's the case, as noted above), we need to restrict a given real storage page frame in the contiguous part of the pool to pages belonging to one and only one database object (i.e., one table space or index). That being the case, if we honored FRAMESIZE(2G) for a PGSTEAL(NONE) pool, there would be the potential for a large amount of wasted space in some of those very large page frames. If you specify FRAMESIZE(2G) for a Db2 12 PGSTEAL(NONE) buffer pool, the pool will end up being backed by 4K page frames. Large page frames can definitely provide a CPU efficiency benefit for Db2 buffer pools. How can you get large page frames for a PGSTEAL(NONE) pool? By specifying FRAMESIZE(1M) for the pool. If a page frame used for the contiguous part of a Db2 12 PGSTEAL(NONE) buffer pool can hold pages belonging to only one database object (as mentioned), wouldn't we potentially waste space in some 1 MB page frames if we were to use them for a PGSTEAL(NONE) pool? Yes, but some wasted space in some 1 MB frames is generally not going to be a big deal in a z/OS LPAR that has (as is more and more often the case) hundreds of thousands of MB of real storage. [Note, by the way, that a 1 MB page frame used for the overflow area of a Db2 12 PGSTEAL(NONE) buffer pool can hold pages belonging to more than one database object - that is one reason for the drop-off in CPU efficiency for page access we get when the page in question is in the overflow area of a Db2 12 PGSTEAL(NONE) pool versus being in the contiguous part of the pool.]

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is at least 20 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(2G) for the pool.

Rationale: The CPU efficiency advantage provided by using 2 GB page frames for a buffer pool is generally not very significant (versus the use of 1 MB frames) unless the buffer pool is really large (say, 20 GB or larger in size).

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is less than 2 GB, use PGSTEAL(LRU) and PGFIX(YES) and FRAMESIZE(1M) for the pool.

Rationale: As you might expect, FRAMESIZE(2G) is not going to be honored for a pool that is smaller than 2 GB in size (though we will use a 2 GB frame for a pool if FRAMESIZE(2G) is specified and the pool's size is only a little bit smaller than 2 GB).

Recommendation: If it is expected that there will be some buffer stealing for a pool, and the size of the pool is between 2 GB and 20 GB, use PGSTEAL(LRU) and PGFIX(YES) for the pool. For the pool's real storage frame size, use either FRAMESIZE(1M) or FRAMESIZE(2G) - your choice.

Rationale: For a pool whose size is between 2 GB and 20 GB, use of 2 GB page frames will likely not deliver performance that is significantly better than what you'd see with FRAMESIZE(1M). With that said, use of 2 GB frames for a pool that is smaller than 20 GB will not have a negative impact on performance versus 1 MB frames; so, I say that this is your choice, and you'll be fine either way.

Recommendation: If it is expected that the rate of buffer stealing for a pool will be non-zero but VERY LOW (maybe less than 10 buffer steal actions per minute), you MIGHT consider using PGSTEAL(FIFO) for the pool.

Rationale: I see PGSTEAL(FIFO) as being very much a niche-type specification. Yes, it means that we won't consume CPU cycles in tracking least-recently-used information for buffers, but it could negatively impact performance if the buffer stolen via FIFO (the one that, of all the buffers in the pool, holds the page read into the pool the longest time ago) happens to be frequently referenced - in that case, that buffer will likely have to be read right back into the pool. With this in mind, I think that PGSTEAL(FIFO) might deliver a bit of a performance advantage over PGSTEAL(LRU) if the rate of buffer stealing is a really low non-zero value (like, maybe single digits of buffer steals per minute). If the buffer stealing rate is zero, then as previously mentioned the pool could be a good candidate for PGSTEAL(NONE).

Wednesday, April 29, 2020

Clearing the Air Regarding Db2 12 for z/OS and "Deprecated Objects"

On a number of occasions over the past several months, I have seen that there is some misunderstanding in the Db2 for z/OS community regarding support for certain types of database objects in a Db2 12 environment. Via this blog entry, I want to clear up these misconceptions.

The misunderstandings to which I alluded are, most often, related to traditional segmented table spaces. By "traditional segmented," I mean table spaces that are not simple, not partitioned and not universal (I say "traditional segmented" instead of just "segmented" because universal table spaces are themselves segmented). In its most extreme form, the misconception is expressed in this way (and in the quoted phrase below, "segmented" refers to what I call a "traditional segmented" table space):

"I hear that segmented table spaces are not supported in a Db2 12 environment."

Wrong, wrong, wrong.

Where did this misconception come from? Most likely, from a change introduced with Db2 12 function level V12R1M504 (or just M504, for short). OK, quick shot of background information: you are probably aware that with Version 12, Db2 for z/OS has gone to a "continuous delivery" mechanism for getting new functionality out to users. What does that mean? It means that, several times per year, new functionality for Db2 12 is introduced by way of the product's maintenance stream. In other words, Db2 12 fixes come out, and applying one of these fixes takes the code in a Db2 12 subsystem's load library to a new function level (fixes are tied to what IBM calls APARs, and indeed every Db2 12 function level from M501 upwards is associated with an APAR number, as indicated on the "What's new in Db2 12 function levels" page on IBM's Web site). Does that mean you need to be careful about applying Db2 12 fixes, out of concern that you might inadvertently take your system's Db2 code to a function level you're not ready to support? No, you don't need to worry about that. The reason: applying a fix that takes a Db2 12 subsystem's code to a new level doesn't change anything with regard to the subsystem's functionality. A new code level's functionality is activated only when - at a time of your choosing - someone issues an -ACTIVATE FUNCTION LEVEL command that specifies the function level in question.

Back to function level M504. A good bit of the information that went out about that function level - in presentations and such - made mention of "deprecated objects." That, in turn, caused a lot of people to think that (in particular) traditional segmented table spaces could not be used in a Db2 12 system with function level M504 (or higher) activated. As mentioned previously, some people even thought that traditional segmented table spaces could not be used at all in a Db2 12 environment, regardless of the activated function level.

Part of this confusion likely results from misinterpretation of that term, "deprecated." "Deprecated" does not mean gone; instead, it means that the deprecated thing (a feature, a ZPARM parameter, a type of database object, whatever) is a) not going to be enhanced going forward, and b) is likely to go away eventually. That "eventually" can be a long time, in some cases. Example: you lost the ability to create so-called simple table spaces (not segmented, not universal, not partitioned) back in the Db2 9 time frame, but existing simple table spaces can still be used, even in a Db2 12 system.

Here's the straight skinny regarding Db2 12 function level M504 and deprecated objects: when SQL DDL statements are issued via a package for which application compatibility level V12R1M504 or higher is in effect, non-universal table spaces cannot be created. [When a package's application compatibility level is V12R1M504 or higher, that also precludes creation of synonyms (we've been telling you for years to use aliases, instead) and hash-organized tables (those haven't been used by many organizations, and their benefits in a modern Db2 system tend to be marginal, at best).] Boom. That's it.

A package's application compatibility level is specified via the package BIND (or REBIND) parameter APPLCOMPAT, about which I wrote in an entry I posted to this blog last year. When you activate Db2 12 function level V12R1M504, does that mean you can no longer create non-universal table spaces (such as traditional segmented table spaces)? No. Remember, what matters here is the APPLCOMPAT value of the package through which a CREATE TABLESPACE statement is issued (that could be, for example, a SPUFI package or a DSNTEP2 package - common avenues for the issuance of Db2 DDL statements). You could have a Db2 12 system on which function level V12R1M506 is activated (that's the latest function level available as of the posting of this blog entry), but if you're issuing CREATE TABLESPACE statements via a package bound with, for example, APPLCOMPAT(V12R1M503), you can create traditional segmented table spaces. When the package (or packages) through which you typically issue Db2 DDL statements is bound with APPLCOMPAT(V12R1M504) or higher (possible only if function level V12R1M504 or higher has been activated, as the APPLCOMPAT value for a package cannot be higher than the Db2 12 system's currently activated function level), does that mean that the package can no longer be used to create traditional segmented table spaces? No, that is not what it means. If a package (say a DSNTEP2 package) is bound with APPLCOMPAT(V12R1M504) or higher, and you need to use that package to create a traditional segmented table space, you can do that. How? Simple: prior to issuing the CREATE TABLESPACE statement, issue this statement:

SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';

Then, issue the CREATE TABLESPACE statement that creates the traditional segmented table space (a dynamic SQL-issuing program can use the SET CURRENT APPLICATION COMPATIBILITY statement to take its application compatibility level to a value below - but not above - the APPLCOMPAT value of the associated package).
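
Put together, the sequence could look like this (the table space and database names are hypothetical; specifying SEGSIZE without NUMPARTS or MAXPARTITIONS is what yields a traditional segmented table space):

SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';

-- With the lower application compatibility level in effect, a traditional segmented
-- table space can still be created
CREATE TABLESPACE TS_LEGACY IN DB_LEGACY
  SEGSIZE 32
  BUFFERPOOL BP1;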

And there you have it. Existing traditional segmented table spaces can still be used in a Db2 12 system, and they can be created in a Db2 12 system via a package with an APPLCOMPAT value of V12R1M503 or lower, or via a package with an APPLCOMPAT value of V12R1M504 or higher if, prior to issuance of the CREATE TABLESPACE statement, the package's application compatibility level has been taken below V12R1M504 via a SET CURRENT APPLICATION COMPATIBILITY statement.

Hope this helps.

Monday, March 30, 2020

Db2 for z/OS Online LOAD REPLACE - Minimizing Impact on Transactions

A couple of years ago, IBM introduced an important enhancement for the Db2 for z/OS LOAD utility, via the fix for an APAR (the APARs are PI69095 for Db2 12, and PI67793 for Db2 11). This enhancement provided a capability that is popularly known as "online LOAD REPLACE," but is more formally referred to as LOAD REPLACE SHRLEVEL REFERENCE.  I'll provide an overview of this functionality in a moment, but I'll first get to the impetus for my writing this blog entry: recently I've worked with Db2 people at a pair of sites, helping them to answer this question: How can I execute an online LOAD REPLACE with minimal impact on transactions that concurrently access the target table? The answer to that question involves utility control options that will likely be recognized by people who are familiar with IBM Db2 online REORG processing.

OK, the scenario that the online LOAD REPLACE enhancement was designed to address: suppose you have a need to periodically "switch out" data in a table. In other words, you want the set of rows now in table T1 to be replaced by another set of rows. This might be done, for example, on a quarterly basis if table T1 contains data for the most recently completed quarter. You could of course use traditional LOAD REPLACE functionality to get the task done, but with that approach the table will be inaccessible to application programs for the duration of the LOAD REPLACE job, and if the amount of data you're putting into the table via LOAD REPLACE is large, and if several indexes are defined on the target table, that period of table inaccessibility could be unacceptably long.

How about using clone table functionality for this purpose? Yes, that could be an appropriate option. Clone table functionality, introduced with Db2 9 for z/OS, allows you to create a clone of a given table (via a DDL statement of the form ALTER TABLE... ADD CLONE). Once a clone has been created for table Tab1 (I'll call the clone Tab1_CLONE, though there is no need for "clone" to be in the object's name), you can put data in it any way you want - via LOAD or INSERT - with no impact whatsoever on access to Tab1. Once Tab1_CLONE contains the data that is to replace the data now in Tab1, you issue the SQL statement EXCHANGE DATA for the target table. As soon as Db2 can get a drain on that table (which should not take long if transactions accessing the table are committing frequently), the switch happens, and the next time a program issues a SQL statement referencing table Tab1, that table name will resolve to the physical object that had been called Tab1_CLONE prior to execution of the EXCHANGE DATA statement (and the physical object that had been called Tab1 prior to execution of the EXCHANGE DATA statement will subsequently be called, in this example, Tab1_CLONE). To put this another way, from a transaction program's perspective table Tab1 contained a certain set of rows a few seconds ago, and now (following execution of the EXCHANGE DATA statement) Tab1 contains a different set of rows.
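
In SQL terms, the clone approach boils down to something like the sketch below. The table names are hypothetical, and the statements are shown in skeleton form - check the SQL reference for the full EXCHANGE DATA BETWEEN TABLE syntax and options that apply in your environment:

ALTER TABLE TAB1 ADD CLONE TAB1_CLONE;
-- ...populate TAB1_CLONE via LOAD or INSERT (a separate, hypothetical step)...
EXCHANGE DATA BETWEEN TABLE TAB1 AND TAB1_CLONE;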

Problem solved, right? Not necessarily. You see, a lot of Db2 for z/OS DBAs don't much like clone tables, and I get that. The concern boils down to this: once a clone has been created for a certain table, there are a number of things you can't do with the table without first dropping the clone, necessitating a re-creation of the clone afterwards. A prime example of this hassle factor: what would be a simple ALTER of a table becomes a more convoluted process if said table has a clone (I wrote about such a scenario in an entry I posted to this blog some years ago). Trying to avoid clone-related complications, some DBA teams came up with data switch-out processes that involved two different tables and use of the SQL statement RENAME TABLE. The main problem with that approach, aside from complexity, is the fact that RENAME TABLE leads to invalidation of dependent packages. Ugh.

Enter online LOAD REPLACE functionality, which has been described as "clone table functionality without the hassles of clone tables." When you run LOAD for a table using the REPLACE option and SHRLEVEL REFERENCE, the table will be placed in read-only mode for application access (programs that access tables for which a data "switch-out" is periodically required are typically of the data-retrieval variety, anyway), and the input data for the utility will be loaded into "shadow" objects corresponding to the table, its table space and its indexes. When the shadow objects are all loaded, online LOAD REPLACE does what Db2 online REORG does - it goes through the SWITCH phase, after which references to what had been the original table (and table space and indexes) resolve to the "new" original objects (which, prior to the SWITCH phase of the utility, were the shadow objects). The SWITCH phase of online LOAD REPLACE generally completes very quickly.

So there you have it, right? An easy, quick (from an application program's perspective) way to switch out data in a table for other data? Well, in fact some contention issues can arise when an online LOAD REPLACE job is executed. Twice over the past few months I have had Db2 for z/OS people contact me with questions about application disruption related to execution of the LOAD utility with REPLACE and SHRLEVEL REFERENCE. Similar points came up in these discussions, and I want to share pertinent information with you in the remainder of this blog entry.

The problem the aforementioned Db2 people faced was this: when an online LOAD REPLACE job was executed, some transactions would time out. The "why" of those timeouts has to do with Db2 claims and drains. Every program process that accesses Db2 for z/OS data has to get a claim (read or write) on every table space or index it accesses. Claims acquired by an application process are released when the process commits. When a Db2 utility requires exclusive access to an object (as a LOAD REPLACE SHRLEVEL REFERENCE job does when it is time to do the data set switch between the "original" and shadow data sets), it requests a drain on that object. Once a drain has been requested, in-flight units of work associated with programs accessing the target object are allowed to run to a commit point, but new units of work that will access the object are prevented from acquiring the necessary claim, effectively holding up their execution. If these held-up transactions have to wait too long on a claim, they will time out. Will that happen because a Db2 lock timeout value is reached? Possibly, but more probably the concern is a timeout value enforced by an application server. In this era of transactions that can be initiated at any time by someone using a smartphone app, application-side timeout values can be quite stringent - 10 seconds, for example, is not unusual (if a transaction accesses a data server such as Db2 for z/OS, and the application server waits for 10 seconds without getting a response, the application server will time the transaction out).

Why might it take more than a second or two for a Db2 utility, such as an online LOAD REPLACE that is ready for its SWITCH phase, to get the drain it needs to proceed? Won't in-flight transactions reach a commit point very quickly, since they tend to be sub-second in terms of elapsed time? Sure, most - maybe very nearly all - of the in-flight transactions accessing a table space or index (or a partition of same, if the object is partitioned) on which a utility wants a drain will get to a commit point very quickly, but all it takes is one transaction that's taking its time before committing to cause other transactions to be held up long enough to be timed out. As long as any ONE claim on a Db2 page set or partition (table space or index) has not been released via commit, the drain requested by a utility such as LOAD REPLACE SHRLEVEL REFERENCE (at SWITCH phase time) will not be acquired. When a not-committing-any-time-soon process is in the picture, something's gotta give. Either the utility will give up its drain request (or will fail because it waited too long to get the drain), or transactions held up because of the drain will time out.

Here is where, in the case of an online LOAD REPLACE, you have to answer a question: what is your priority? Is it getting data in the target table switched out by time X, no matter what? Or, if you have at least some wiggle room regarding the time at which the data switch-out for the table is to be accomplished (at least a few minutes, let's say), is the priority on getting the data switch-out done with minimal (preferably zero) application impact in the form of transaction time-outs? If the priority is getting the data switch-out done by time X, no matter what, the probability of at least a few transactions timing out is going to be greater. If, on the other hand, the data switch-out for the target table needs to be accomplished not more than 10 minutes (for example) after time X, and transaction time-outs must be minimized (ideally, eliminated), you can provide the online LOAD REPLACE job with some flexibility regarding drain acquisition when it's time for the SWITCH phase.

How is that flexibility implemented? By way of the same options available when you're executing a Db2 online REORG utility. I'm talking about the utility options DRAIN_WAIT, RETRY, and RETRY_DELAY (a sketch of how these might appear in a LOAD control statement follows the list):

  • DRAIN_WAIT - If transaction time-out limits are really stringent (and again, that is very often enforced at the application server level) and time-outs must be avoided if at all possible, you might want to go really low with this value - maybe something like 8 seconds. If you do that, and an online LOAD REPLACE requests a drain on a target object in order to perform a data set switch with the associated shadow object, and the drain has not been acquired within 8 seconds (perhaps because some transaction that can run long but is infrequently executed is accessing the object and hasn't committed), the utility will give up on trying to get the drain. The thinking here is that we may be able to avoid transaction time-outs if transactions will be held up by a utility-driven drain request for no more than 8 seconds. Will giving up on a drain request cause the utility to fail? No - not if the utility can try again (see the next item, below).
  • RETRY - If you make the drain wait limit really short, thereby increasing the chance that the drain won't be acquired within the specified wait time, use RETRY to allow the utility to try again for the drain a little later. A RETRY value of 20, for example, would give the online LOAD REPLACE job quite a few chances to acquire the drain needed for SWITCH processing. How long will the utility wait to try again for the drain, if at first it doesn't succeed? That's determined by the setting of the next item, below.
  • RETRY_DELAY - If the online LOAD REPLACE tries and fails to get the drain it needs to get the SWITCH phase done, trying again immediately for the drain might not make sense. Better to give the process that prevented drain acquisition a little time to complete and get out of the way; but, don't wait too long - we haven't got all day to get the job done. Maybe waiting 20 seconds between successive attempts to get the drain needed for utility completion would make sense (in that case you'd specify RETRY_DELAY 20).
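
Putting those three options together, an online LOAD REPLACE control statement might look something like the sketch below. The input DD name, table name, and option values are hypothetical, and the exact keyword placement should be verified against the LOAD utility syntax for your Db2 level:

LOAD DATA INDDN SYSREC REPLACE SHRLEVEL REFERENCE
  DRAIN_WAIT 8 RETRY 20 RETRY_DELAY 20
  INTO TABLE MYSCHEMA.TAB1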

OK, so online LOAD REPLACE is a very nice Db2 for z/OS utility enhancement. That said, it doesn't do what it does using pixie dust. When the shadow table space and indexes are loaded and ready to be switched with the corresponding "original" objects, the online LOAD REPLACE job needs to have - briefly - exclusive access to the "original" objects that are perhaps being concurrently accessed by transactions. Exclusive access is achieved via a drain request, and that drain request will hold up incoming transactions that are going to access the target objects. Transactions can be timed out if they are held up too long. Minimizing transaction impact, then, can necessitate a really short drain wait time for the utility. That short drain wait time increases the odds of drain acquisition failure, but let the utility try again for the drain a few times, with a reasonable time between successive drain requests, and you can have your cake and eat it, too: switch-out of data in a table (even involving a whole lot of data) accomplished easily, with minimal - quite possibly zero - time-out impact on transactions concurrently accessing the target table.

Friday, February 28, 2020

Db2 for z/OS: Clearing Up Some Matters Pertaining to 10-Byte RBA and LRSN Values

At this time, many Db2 for z/OS-using organizations have already completed their migration to Db2 12. Others are still in the process of accomplishing that migration, or are finalizing migration plans. Especially at sites at which migration to Db2 12 is not yet a fait accompli, I have seen that there is often some degree of confusion pertaining to "extended" RBA and LRSN values, particularly with regard to what needs to be accomplished prior to migrating from Db2 11 to Db2 12, and what actions should be taken in which order. Through this blog entry I hope to bring a measure of clarity to the topic.


Background

As is typical for a database management system, Db2 for z/OS records information about database changes in its log. This is done to enable things like recovery of objects in a database, and rollback of data changes made by a transaction that terminates abnormally. While Db2's log exists physically in some number of active and archive log data sets, logically it is essentially a single large (maybe VERY large) file. In that single logical log file, Db2 records the location of individual records by their position in the log - a position known as the relative byte address, or RBA. That is literally the number of bytes into the (potentially very large) single logical log file at which a record can be found.

For years and years, log RBA values were stored in a six-byte field. Six bytes allowed for 256 TB of data to be recorded in the Db2 log, and way back around 1983, when Db2 for z/OS (then called Db2 for MVS) came to market, that seemed like plenty of capacity. Db2 ended up being used for a lot of very large databases that were characterized by very large volumes of data-change activity, and it eventually became clear that the Db2 log needed more than 256 TB of space for records - a LOT more. Db2 11 addressed that need by enabling a transition from 6-byte to 10-byte RBA and LRSN values (LRSN values pertain to Db2 data sharing groups - more on that in a moment). A 10-byte RBA value provides 1 yottabyte of log capacity (2 to the 80th power). Log space problem solved, once you transition to the 10-byte values.
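
For the arithmetic-minded, those capacity figures fall straight out of the field sizes (shown here as a LaTeX-style formula):

\[ 2^{6 \times 8} = 2^{48}\ \text{bytes} = 256\ \text{TB} \qquad\qquad 2^{10 \times 8} = 2^{80}\ \text{bytes} = 1\ \text{YB} \]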

And how do you make that transition? There are two aspects of the task - what you might call a system aspect and a database aspect. The system aspect involves getting a Db2 subsystem to use 10-byte RBA values in the records it writes to the log. That is done by running a Db2-provided utility, called DSNJCNVT, that converts the Db2 bootstrap data set, or BSDS, to accommodate 10-byte RBA (and LRSN) values (the BSDS contains, among other things, information about a Db2 subsystem's log). This BSDS conversion utility can be executed any time after the associated Db2 11 subsystem is in new-function mode (NFM). Here we come to an important point, and to a clarification I want to communicate: the BSDS must be converted to accommodate 10-byte RBA (and LRSN) values before you migrate a Db2 11 subsystem to Db2 12, and that is the only action pertaining to extended RBA (and LRSN) values that must be completed prior to migrating to Db2 12. In other words, the system part of getting to 10-byte RBA (and LRSN) values must be taken care of prior to migrating from Db2 11 to Db2 12, but the database part of transition to extended RBA (and LRSN) values can be addressed following migration to Db2 12.

Why I mentioned "clarification" in the preceding paragraph: I've found that a number of Db2 people think that both the system and database parts of extended RBA (and LRSN) transition have to be accomplished prior to migrating a Db2 11 subsystem to Db2 12. NOT TRUE - it's just the system part (BSDS conversion) that has to be done before you can migrate to Db2 12 from Db2 11.

OK, so that should make things pretty straightforward, right? You get your BSDS converted to accommodate 10-byte RBA (and LRSN) values, and then you can tackle the database part of extended RBA (and LRSN) transition at a time of your choosing, either before or after migrating to Db2 12, right? Well, maybe not so fast there. There is an important consideration regarding the timing of BSDS conversion versus database conversion that is relevant in a standalone Db2 environment and not relevant in a Db2 data sharing environment. I'll explain this critical distinction below.


The database part of extended RBA (and LRSN) transition, and the data sharing difference

First, what is this "database part" of extended RBA (and LRSN) transition to which I've been referring? Simple: in every page of every database object (table space or index), Db2 records the point in the log at which data in the page was last updated (among other things, that page-level information is used by Db2 for lock avoidance purposes). The particular field in a table space or index page in which this information is recorded is called the PGLOGRBA field. Even if Db2 is using 10-byte RBA (and LRSN) values in records it writes to the log, if the PGLOGRBA field is using 6-byte values, and we reach the limit of those 6-byte values, the data in the database objects can't be updated. Not good. That means that we have to change the database objects so that the PGLOGRBA field can contain 10-byte values. That change is generally accomplished via online REORGs of table spaces and indexes (it can also be accomplished by other utilities that involve creation of new data sets to replace the old, such as LOAD REPLACE and REBUILD INDEX). Db2 provides a job, DSNTIJCV, that will convert the catalog and directory objects to accommodate 10-byte PGLOGRBA values, and you can get this done for user and application table spaces and indexes by executing online REORG (or LOAD REPLACE or REBUILD INDEX) with the RBALRSN_CONVERSION EXTENDED option (or with the ZPARM parameter UTILITY_OBJECT_CONVERSION set to EXTENDED or NOBASIC). Note that if you want to "keep score," with respect to progress made in converting table spaces and indexes to accommodate 10-byte PGLOGRBA values, you can do that via the RBA_FORMAT column of the SYSTABLEPART and SYSINDEXPART tables in the Db2 catalog.
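
Concretely, the two pieces of the database part - converting an object and tracking progress - might look something like the sketch below. The first line is a utility control statement and the rest is a catalog query; the database name, table space name, and predicate are hypothetical, and an RBA_FORMAT value of 'E' indicates an object already in the extended 10-byte format:

REORG TABLESPACE MYDB.MYTS SHRLEVEL CHANGE RBALRSN_CONVERSION EXTENDED

SELECT DBNAME, TSNAME, PARTITION, RBA_FORMAT
  FROM SYSIBM.SYSTABLEPART
  WHERE RBA_FORMAT <> 'E';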

So, what about that "timing" thing I mentioned previously? That has to do with converting the BSDS to accommodate 10-byte RBA (and LRSN) values (the "system" part of transition to 10-byte RBA/LRSN values) and converting database objects to accommodate 10-byte values in the PGLOGRBA field in table space and index pages (the "database" part of transition to extended values). Which should you tackle first? Well, the system part, right, because that's required prior to migration to Db2 12, while the database part can be taken care of either before or after migration to Db2 12 (or some before and some after migration), right? In fact, the proper way to proceed, with regard to the order in which you address the system and database parts of transition to extended RBA/LRSN values, depends in large part on whether the Db2 subsystem in question is operating in standalone mode or as a member of a Db2 data sharing group.

What does Db2 data sharing have to do with this? Here is the key distinction: in a Db2 data sharing environment, RBA values are NOT stored in the PGLOGRBA field of a table space or index page; instead, LRSN values are stored in that field. LRSN stands for log record sequence number. It's a timestamp-based value, and we need to use that for log record location in a Db2 data sharing system because in such an environment it is very common for a given database object to be updated concurrently by processes running on different members of the data sharing group. Each Db2 group member writes records to its own log data sets for the database changes made by processes running on that member, and each member is still writing RBA values to its log; but different members of the data sharing group will have different (often WAY different) RBA values, so what's needed is a COMMON expression of a log point, and LRSN values provide that consistency of log point references across members of the group. Thus, in a data sharing group a Db2 member will write LRSN values as well as RBA values to its log, whereas the PGLOGRBA field in table space and index pages in a data sharing environment will contain LRSN values instead of RBA values. Make sure that you understand this: in a Db2 data sharing environment, log RBA values are NOT found in the pages of database objects, whereas in a standalone Db2 system log RBA values ARE recorded in the PGLOGRBA field of table space and index pages.

With that established, here's the implication for Db2 systems operating in standalone mode: when the BSDS is converted to use 10-byte RBA values, the pace at which the RBA value advances will accelerate. Why? Because the RBA values being written to the log are now 4 bytes longer than they were previously. Will the change in the pace of increase for RBA values be pretty small, or fairly large? That depends on the nature of the log records. If data-change activity in a Db2 system primarily involves inserts, and if inserted rows are rather long, the addition to the length of the typical log record due to the extra 4 bytes of the RBA value will be relatively small. If, on the other hand, data change activity in the system is primarily related to updates, and if the associated log records are pretty short (Db2 might be logging just the change of a single column caused by an update), the increase in the size of the log records caused by the extra 4 bytes of the RBA value can be quite significant.

Alright, so the pace of RBA value-increase quickens after conversion of the BSDS. Is that something about which you should be concerned, assuming that you're running Db2 in standalone mode (versus data sharing mode)? Yes, if your system's RBA value is getting pretty close to the 6-byte limit. You can check the latest log RBA value on your system via the Db2 command -DISPLAY LOG, and if the high-order value of the 6-byte RBA field is 'C' or less (hexadecimal format), you likely have a good bit of headroom. If, on the other hand, you see that the high-order value of the 6-byte RBA field is 'D' or higher, you'd best get cracking on avoiding problems related to hitting the 6-byte limit. In that situation, converting database objects to accommodate 10-byte values in the PGLOGRBA field (the database part of transition to extended values) BEFORE converting the BSDS could be a good idea, because you'll maximize the time available to you to accomplish the conversion of database objects (because you haven't sped up the advance in RBA values by converting the BSDS). If your system's 6-byte RBA value is a good ways off from the limit, you could convert the BSDS prior to converting table spaces and indexes, because you're not looking at a crunch-time situation.

In a data sharing environment, go ahead and convert the BSDS first. Yes, that will cause the advancement of RBA values for the Db2 member subsystem to accelerate, but so what? RBA values are not recorded in the PGLOGRBA field of table space and index pages (LRSN values are there, instead), and the conversion to 10-byte RBA values means that the system's logging capacity has been enormously increased.

Even if you are running Db2 in data-sharing mode, or even if (in a standalone Db2 environment) you're a long way from hitting the 6-byte RBA limit, you should get the database part of the transition to 10-byte RBA/LRSN values done within a reasonable time frame - like, maybe within the next year or so. Why? Because eventually that will have to get done, and you don't want the task to be put on the back burner and subsequently forgotten. Get catalog and directory objects converted (using the Db2-supplied job to which I referred above), and use online REORG (or LOAD REPLACE or REBUILD INDEX) to convert user and application table spaces and indexes, and track progress (as previously noted) via the RBA_FORMAT column of the SYSTABLEPART and SYSINDEXPART catalog tables. Once you're done, you'll never have to worry about this again - and that would be a good thing.

I hope that the information I've provided in this blog entry will be useful for you. For more information on extended RBA and LRSN values, and on the transition to same, see the excellent and comprehensive write-up in section 3.1 of the IBM Redbook titled IBM DB2 11 for z/OS Technical Overview (downloadable from http://www.redbooks.ibm.com/abstracts/sg248180.html?Open).