Wednesday, October 24, 2012

DB2 for z/OS: Have You Checked Your Log RBA Value Lately?

I'm in Las Vegas this week for the IBM Information on Demand 2012 conference. At this year's IOD event I've particularly enjoyed learning more about DB2 11 for z/OS, the beta program for which is expected to begin in March of 2013. DB2 11 will deliver a lot in the way of new features and capabilities. One of the big news items here is the extended RBA that will be available in DB2 11 systems running in new-function mode (the RBA, or relative byte address, is an identifier for records written by a DB2 subsystem to its transaction log). Whereas the current RBA is a six-byte hexadecimal value, providing 2^48 bytes (256 TB) of log capacity, the DB2 11 extended RBA is a ten-byte value, allowing for the logging of a yottabyte of data-change records. Not familiar with the term, "yottabyte?" Neither was I until a couple of days ago. It's 1000 zetabytes, or a trillion terabytes. That's a lot of logging capacity.

A larger maximum RBA value is important because a few DB2 for z/OS sites have bumped up against the 2^48-byte limit of the current RBA. Actually, what these folks bump up against is a practical RBA value limit that's a little less than 2^48. See, the RBA value would be thoretically maxed out at x'FFFFFFFFFFFF' (in hexadecimal notation); however, actually reaching that value would prevent a DB2 subsystem from doing much of anything, and that would not be good for data integrity protection. So, what happens is that DB2 will automatically shut down if the log RBA value reaches x'FFFF00000000'. This gives the DB2 subsystem a little room at the end of the log that it will use, when restarted, in resolving any incomplete units of recovery and otherwise ensuring that data is left in a consistent state (an attempt to restart the subsystem in "regular" mode after the automatic shutdown will result in DB2 going through "restart light" processing and shutting down again, after which it could be restarted again with ACCESS(MAINT) for any further clean-up tasks).

That sort of automatic shutdown is of course something you'd want to avoid, and fortunately you can. First of all, DB2 will issue message DSNJ032I, with an "alert level" of WARNING, when the log RBA value reaches x'F00000000000'. This would give you some time to plan for dealing with the approaching RBA limit (DSNJ032I is also issued, with an alert level of CRITICAL, when the RBA value reaches x'FFFF00000000' -- the value that triggers automatic shutdown). Here is a piece of advice for you: do NOT wait until your DB2 subsystem issues DSNJ032I with the WARNING alert level. Instead, check the RBA value for each of your DB2 for z/OS subsystems now. This can be quickly and easily done by issuing the command -DISPLAY LOG on each subsystem. In the command output you'll see a field, H/W RBA, that shows the highest RBA value written by the DB2 subsystem to its transaction log. Chances are, the value that you see in this field will be way lower than x'F00000000000', the WARNING threshold. A colleague of mine who has a lot of knowledge in this area, Paul Bartak, has said that a high-RBA value of x'C00000000000' or greater should be a trigger that gets you into RBA limit planning mode. If what you see for H/W RBA in -DISPLAY LOG output for a subsystem is less than x'C00000000000', just keep an eye on the situation through periodic issuance of -DISPLAY LOG.

If the high RBA value for a DB2 subsystem in your environment is north of x'C00000000000' then as my co-worker Paul suggested you should start developing a plan of action. What you'll end up doing will depend on whether you run DB2 for z/OS in standalone mode or in a data sharing configuration on a Parallel Sysplex. In the standalone case, you will follow a procedure that will result in the DB2 subsystem's RBA value being reset to zero. This procedure is documented in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA value in a non-data sharing environment" (you can access the DB2 for z/OS product documentation, including the Administration Guide, in either PDF or HTML form, at http://www-01.ibm.com/support/docview.wss?rs=64&uid=swg27011656). A number of mainframe DB2 sites have successfully gone through this documented RBA-reset process -- it works.

If you have a DB2 subsystem that is approaching the log RBA limit, and that subsystem is a member of a data sharing group, there are a couple of options available to you. One alternative is to reset the member subsystem's RBA -- a less involved process in a data sharing versus a non-data sharing environment. The other choice involves starting up a new member of the data sharing group and permanently quiescing the member with the elevated RBA value (with DB2 10 in new-function mode, one can delete a member from the data sharing group -- until then, the subsystem ID of the permanently quiesced member will continue to show up in the text of some DB2 messages, but that's generally not a big deal). The DB2 for z/OS Data Sharing Planning and Administration Guide, under the heading, "Removing members from the data sharing group," provides information on permanently quiescing a member subsystem. Note that if the new member -- the one that will replace the subsystem that is to be permanently quiesced -- will be the eighth to join the data sharing group, entries in the lock table (part of the lock structure in a coupling facility) will have to be 4 bytes apiece, versus 2 bytes each. To maintain the same number of lock table entries in that case (good for holding the line on false global lock contention), you'd want to increase the size of the lock structure -- which should already be at a power of 2, such as 64 MB -- to the next higher power of 2.

Whether your DB2 subsystem is operating in standalone or data sharing mode, there is information in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA," on using output from the print log map utility (aka DSNJU004) to estimate the amount of time you have before a subsystem's RBA value reaches the critical threshold.

Something else for those of you running DB2 in a data sharing group: do NOT think that RBA values no longer matter, just because LRSN values (short for log record sequence number) are used for data recovery operations. RBA values continue to go up for data sharing members as they write records to their log data sets, and the automatic shutdown occurs for a data sharing member just as it does for a non-data sharing subsystem when the RBA value reaches x'FFFF00000000'. I don't want you to be caught napping in this regard.

Speaking of LRSN values, those are also extended beginning with DB2 11 in new-function mode, to the tune of giving you something like another 30,000 years of value growth (LRSN values are derived from the store clock timestamp, and the current LRSN field can take you to the year 2042). Given the 30 years of value-increase that can be accommodated by the current-format LRSN, will any organizations be in a hurry to implement the DB2 11 extended LRSN (done, as for the extended RBA, through a conversion of the bootstrap data set and online REORGs of database objects)? Some will be. Why? Because on occasion a company had to add a delta to the LRSN of the originating member of a DB2 data sharing group in order to have a starting LRSN value that was higher than the already-elevated RBA value of that originating DB2 member. You can look at print log map utility output to see of you have such an LRSN delta in your environment. If you don't -- and that's probably the case -- then the LRSN value limit is still a long ways off for you. If there is a delta, you'll be more interested in getting to the DB2 11 extended LRSN sooner rather than later.

So, to recap: you should periodically check the RBA values of your DB2 subsystems, to ensure that they are well below the critical threshold. If the RBA value for a subsystem is seen to be high (with a 'C' in the high-order position), DON'T PANIC. Prepare to head off the RBA limit situation before it occurs, and get ready to put all this in your rear-view mirror once you get to DB2 11 new-function mode.  

6 comments:

  1. Nice Article on DB2 version 11..

    ReplyDelete
  2. hello Robert , thanks for the article .

    We close to reaching the limit and have seen the warning messages in the log. We have set a plan in motion to get the RBA RESET . Let's hope all goes well :-)

    ReplyDelete
    Replies
    1. Did you face any issues?.I am about to reach this value for a customer and planing to reset the RBA

      Delete
  3. we ran out of RBA but we did not see the message DSNJ032I where would that be found? I have been scanning Z/OS system logs

    ReplyDelete
    Replies
    1. DSNJ032I should go to the z/OS console. And, DSNJ032I is a non-scrollable message - it should remain, highlighted, on the z/OS console until it has been responded to. I believe that the specific z/OS console to which the message would go is the console from which -START DB2 was issued for the Db2 subsystem in question. It is possible to alter the routing of Db2 messages via a specification for parameter ROUTCDE in ZPARM.

      Robert

      Delete