Monday, December 17, 2012

DB2 for z/OS: Clone Tables and "Active Versioning"

A DB2 for z/OS DBA contacted me a few days ago and posed a seemingly simple question, the answer to which ended up being a little more involved than might be expected. The DBA's query: how can one alter a table that has a clone?

The specific need in this case was to add a column to a table for which a clone had been created. As is true of other ALTER TABLE operations, addition of a table column cannot be accomplished while the clone relationship exists. The in-a-nutshell solution is to drop the clone table (via ALTER TABLE DROP CLONE), perform the desired alteration of the base table (such as ALTER TABLE ADD COLUMN), and then re-create the clone table (ALTER TABLE ADD CLONE). The thing is, the aforementioned DBA was doing this, and it wasn't working. In this blog entry, I'll explain why that was so, and how we determined the procedural steps required for success.

First, a brief level-set regarding DB2 clone tables: this refers to a feature, introduced with DB2 9 for z/OS in new-function mode, whereby one can create a "clone" of a DB2 table. If you have a base table named TBX, an associated clone could be created with a statement that would look like this:

ALTER TABLE TBX
ADD CLONE TBX_CLONE;

[You don't have to have the word "clone" in the name of the clone table, but having such an identifier is helpful when it comes to accessing a clone table -- it helps to ensure that you're accessing a clone, rather than the base table, when clone access is what you want.]

The clone table created by the statement above would look, in a structural sense, just like the base table: same columns, in the same order, with the same names and data types; same indexes; same check constraints (if any); etc. The one difference between the clone and the base table (other than the table name) is the data in the two tables, and this difference is the essence of a clone table's raison d'etre. Initially, a clone table is empty (and it resides in another "instance" of the base table's table space, distinguished by the fifth-level qualifier in the name of the underlying data set, which at first will be I0002 for the clone, versus I0001 for the base). After being created, the clone table can be populated in the usual way: through SQL inserts or the LOAD utility. Adding data to, and changing data in, the clone table has no effect on the users and programs accessing data in the base table. Once the clone data contains data as desired, that data can be effectively "switched" with the data in the base table -- quickly and programmatically -- by way of the SQL statement EXCHANGE DATA. Users and applications continue to access table TBX (to use my example table-name), but now the data they're accessing is the data that -- before the "switch" -- was the data in TBX_CLONE. In effect, through the EXCHANGE DATA statement, the table that had been the clone in the relationship becomes the new base table and takes on the base table's name, while the table that had been the base table becomes the new clone and takes on the name of the clone table. To learn a little more about what happens when EXCHANGE DATA is executed, you can check out an entry that I posted a few years ago to the blog that I maintained while working as an independent DB2 consultant.

OK, back to the question asked by the DBA. He knew that an ALTER TABLE table statement (other than an ALTER TABLE DROP CLONE) could not be successfully executed for a target table with a clone; so, he dropped the clone and successfully executed an ALTER TABLE ADD COLUMN statement. After doing that, however, the DBA was not able to re-create a clone for the table -- the ALTER TABLE ADD CLONE statement failed with a -148 error code, reason code 9, indicating that the ALTER failed because "active versioning" was in effect for the table.

DB2 uses table space versioning to ensure data integrity and preserve data availability through database schema changes effected by way of ALTER TABLE statements. A new version for a table space is created by DB2 when (for example) a new column is added to a table in the table space via ALTER TABLE ADD COLUMN, and the table space is placed in an advisory REORG-pending state (AREO*). This state will be removed, and the associated schema change will be applied (i.e., physically implemented) when the table space in question is reorganized. You might think that versioning would no longer be "active" for the table space following a reorganization, but post-REORG, the ALTER TABLE ADD CLONE statement to which I've referred still didn't work for the DBA who contacted me for help: again, the error code received was -148, condition code 9 (failure due to active versioning).

Faced with this situation, the DBA ran the REPAIR utility for the affected table space, with the VERSIONS option. Well, that reset the version information for the table space in the SYSIBM.SYSTABLESPACE catalog table (the values of the OLDEST_VERSION and CURRENT_VERSION columns for the table space's row in SYSTABLESPACE were both changed to 0, indicating that versioning had NEVER been active for the table space), and after that the ALTER TABLE ADD CLONE statement could be successfully executed; however, there was a problem: the column added to the table was of the type INTEGER WITH DEFAULT NULL, but the null indicator was not set for the column (a SELECT for the table with an IS NULL predicate referencing the new column returned no results). On top of that, the value sometimes found in the new column (which was supposed to have NULL as its default value) was -2147483648.

That's pretty weird, but weird things can happen when you use REPAIR to "turn off" indicators (such as active versioning) that DB2 uses for data integrity purposes. To be sure, REPAIR has its place as a tool for DBAs, and indeed it is sometimes the ONE way to address an issue impacting data availability. Still, it's a utility with which you can shoot yourself in the foot, and when I hear of a situation in which REPAIR is used to "fix" a DB2 database problem, I am always motivated to find a means by which the problematic object indicator can be "turned off" or reset through actual removal of the circumstances prompting DB2 to set said indicator, versus sledgehammering the indicator with REPAIR. The presence of "active versioning," as indicated by non-equal values for OLDEST_VERSION and CURRENT_VERSION for a table space's row in SYSIBM.SYSTABLESPACE, is addressed by two actions: reorganizing the table space that is in AREO* status and deleting from SYSCOPY any records associated with image copies of the table space that were taken when a version other than the table's current version was in effect. My DBA friend took these two steps for the table space holding the altered table (image copy records associated with older versions of the table space were removed from SYSCOPY via execution of the MODIFY RECOVERY utility), and everything then worked as he wanted it to: a clone could be created for the the altered table, and there were none of the unexpected data retrieval results for the newly added column that he'd seen after using REPAIR to force a resetting of the table space's version number.

So, if you have a table with a clone, and you want to alter that table in some way (e.g., by adding a column to the table), do the following:
  1. Drop the clone table with ALTER TABLE DROP CLONE.
  2. Alter the table as needed (e.g., issue an ALTER TABLE ADD COLUMN statement).
  3. REORG the table space to apply the schema change specified in the preceding ALTER TABLE statement, and take a full image copy in the process (an inline copy is taken anyway if you run an online REORG). The REORG will ensure that only one version is associated with the data in the table space, and the full image copy will allow you to delete older image copy records (those taken when prior versions were in effect for the table space).
  4. Run MODIFY RECOVERY for the table space, with a specification that will remove from SYSCOPY records for image copies taken when a prior version was in effect for the table space. A specification of RETAIN LAST(1) should preserve the record for the image copy taken during the REORG (step 3 above) while deleting records for older image copies. Note that this is a special case: it's generally recommended that you retain SYSCOPY records for at least the two most recent full image copies of a table space. So, after running this MODIFY RECOVERY job with RETAIN LAST(1), for subsequent SYSCOPY "housekeeping" go back to a MODIFY RECOVERY specification that will retain records for two or more full image copies of the table space.
  5. With versioning no longer "active" for the table space, you can issue an ALTER TABLE ADD CLONE statement to create a clone for the table you altered in step 2.

And leave REPAIR out of this picture -- you don't need to run it to be able to create a clone for an altered table.

12 comments:

  1. Hi,
    I tried your recommended strategy but was not successful. The reason was an Incremental Copy taken in the early morning of the same day before the Alter Table Statement.
    Modify Recovery with Retain Last(1) deletes only the ICs taken the day before or earlier.
    So you have to ensure that no IC ist taken on this day prior the Alter Table Statement

    ReplyDelete
    Replies
    1. Good point. RETAIN LAST(n) does indeed work on a date boundary, versus a timestamp bounday; so, if two image copies were taken on date X, abd RETAIN LAST(1) were issued later in date X, both of those image copy records will be retained. If one of the image copies was taken when a previous version of the table space was active, you could run into the situation you've mentioned.

      Thanks for sharing your experience via the comment.

      Delete
  2. here's what i'd like to do:
    1. unload a base table
    2. add a non-indexed column to the base table
    3. clone the base table
    4. append new col data to unload file
    5. load replace unload file to clone table without building indexes
    6. execute an exchange that points to the underlying version 2 table files, but renames the underlying version 1 index files to version 2.

    if 5 and 6 were a reality, this would be an efficient way to do this type of maintenance on very large tables with multiple NPIs.

    thoughts?

    ReplyDelete
    Replies
    1. Item 5 in your list (populate a clone table via LOAD REPLACE) is, of course, a reality. Only item 6 in the list is not possible. I don't see that becoming possible any time soon, if ever. Taking existing indexes on a base table that has a clone, and somehow causing them to be associated with the clone table instead of the base table, would, I think, create a number of data integrity exposures.

      Robert

      Delete
  3. Thanks for the response Robert, and Happy New Year!

    Could you elaborate on the risk you referred to? If no additional indexed data was introduced in the clone, shouldn't an index built from the base table service the clone? Such an option on the Exchange would allow minor alterations to very large tables with only the overhead of a Load Replace that builds the partitioning index.

    Thanks again,
    Rick

    ReplyDelete
  4. First and foremost, I'd be concerned about row IDs changing between the base and clone tables. If the table space is compressed, the rows in the input file for LOAD into the clone table might compress differently versus the rows in the base table which was altered to have the new column, and that could affect row positioning within the clone table versus the base table, and then the RIDs in the indexes you want to "switch" would be off, and that would be a bad thing.

    Of course, my opinion really doesn't matter much here. If you want this proposed capability to be added to DB2 for z/OS, submit a requirement via the RFE (Request For Enhancements) application on IBM's Web site (http://www.ibm.com/developerworks/rfe/?BRAND_ID=184). That way, you'll be dealing with the people whose opinions DO matter here: the IBM DB2 for z/OS development team. Entering the request will also allow other DB2 for z/OS users to indicate their support for the proposed enhancement.

    Robert

    ReplyDelete
    Replies
    1. once again, thanks for taking the time to respond.

      i'm approaching this from an applications architecture perspective, which can conveniently ignore some devils until database experts, such as yourself, expose them in the details.

      i think i will throw this against the wall you referred me to, just in case it hasn't already been discussed and dismissed during the design of the Exchange process.

      you've been very helpful.

      thanks again,
      rick

      Delete
  5. Hi Robert,

    "This state will be removed, and the associated schema change will be applied (i.e., physically implemented) when the table space in question is reorganized".....

    I have a doubt which is related to the above statement. If i am performing "ALTER TABLE ADD COLUMN" then DB2 would logically make the changes and Set the TS in ADVISORY pending status. So REORG is necessary to make it physically implemented in DB2 . But if there is the case that i want to withdraw my changes (Logical changes/want to drop the column that i tried to add) before performing REORG as it has not implemented physically. Is it possible? Will Advisory pending status restricts doing any SQL on the TS?

    ReplyDelete
    Replies
    1. In a pre-DB2 11 for z/OS environment, the only way to "undo" an ALTER TABLE ADD COLUMN is to unload the table, drop the table, re-create the table without the column in question, and re-load the table. With DB2 11 running in new-function mode, there is a new DROP COLUMN option for the ALTER TABLE statement. ALTER TABLE DROP COLUMN is a pending change, and a subsequent online REORG of the associated table space would put the change into effect. Note that the target table must be in a universal table space.

      Robert

      Delete
  6. Question: Would the conditions you describe still occur if the tablespace that held the CLONE table were DROPped with/or instead of only the table?
    Wouldn't IC's on this tablespace Incremental or Full be deleted from SYSIBM.SYSCOPY?

    ReplyDelete
    Replies
    1. Sorry about the delay in responding.

      Are you asking for confirmation that a DROP of a table space will remove SYSCOPY records pertaining to the table space? If so, then I can tell you that, yes, dropping a table space results in deletion of all records - image copy-related and otherwise - pertaining to the table space in SYSCOPY (and in SYSLGRNX).

      Hope that helps.

      Robert

      Delete