Monday, March 30, 2020

Db2 for z/OS Online LOAD REPLACE - Minimizing Impact on Transactions

A couple of years ago, IBM introduced an important enhancement for the Db2 for z/OS LOAD utility, via the fix for an APAR (the APARs are PI69095 for Db2 12, and PI67793 for Db2 11). This enhancement provided a capability that is popularly known as "online LOAD REPLACE," but is more formally referred to as LOAD REPLACE SHRLEVEL REFERENCE.  I'll provide an overview of this functionality in a moment, but I'll first get to the impetus for my writing this blog entry: recently I've worked with Db2 people at a pair of sites, helping them to answer this question: How can I execute an online LOAD REPLACE with minimal impact on transactions that concurrently access the target table? The answer to that question involves utility control options that will likely be recognized by people who are familiar with IBM Db2 online REORG processing.

OK, the scenario that the online LOAD REPLACE enhancement was designed to address: suppose you have a need to periodically "switch out" data in a table. In other words, you want the set of rows now in table T1 to be replaced by another set of rows. This might be done, for example, on a quarterly basis if table T1 contains data for the most recently completed quarter. You could of course use traditional LOAD REPLACE functionality to get the task done, but with that approach the table will be inaccessible to application programs for the duration of the LOAD REPLACE job, and it the amount of data you're putting into the table via LOAD REPLACE is large, and if several indexes are defined on the target table, that period of table inaccessibility could be unacceptably long.

How about using clone table functionality for this purpose? Yes, that could be an appropriate option. Clone table functionality, introduced with Db2 9 for z/OS, allows you to create a clone of a given table (via a DDL statement of the form ALTER TABLE... ADD CLONE). Once a clone has been created for table Tab1 (I'll call the clone Tab1_CLONE, though there is no need for "clone" to be in the object's name), you can put data in it any way you want - via LOAD or INSERT - with no impact whatsoever on access to Tab1. Once Tab1_CLONE contains the data for which you want the data now in Tab1 to be switched out, you issue the SQL statement EXCHANGE DATA for the target table, and as soon as Db2 can get a drain on that table (which should not take long if transactions accessing the table are committing frequently), the switch happens and the next time a program issues a SQL statement referencing table Tab1, that table name will resolve to the physical object that had been called Tab1_CLONE prior to execution of the EXCHANGE DATA statement (and the physical object that had been called Tab1 prior to execution of the EXCHANGE DATA statement will subsequently be called, in this example, Tab1_CLONE). To put this another way, from a transaction program's perspective table Tab1 contained a certain set of rows a few seconds ago, and now (following execution of the EXCHANGE DATA statement) Tab1 contains a different set of rows.

Problem solved, right? Not necessarily. You see, a lot of Db2 for z/OS DBAs don't much like clone tables, and I get that. The concern boils down to this: once a clone has been created for a certain table, there are a number of things you can't do with the table without first dropping the clone, necessitating a re-creation of the clone afterwards. A prime example of this hassle factor: what would be a simple ALTER of a table becomes a more convoluted process if said table has a clone (I wrote about such a scenario in an entry I posted to this blog some years ago). Trying to avoid clone-related complications, some DBA teams came up with data switch-out processes that involved two different tables and use of the SQL statement RENAME TABLE. The main problem with that approach, aside from complexity, is the fact that RENAME TABLE leads to invalidation of dependent packages. Ugh.

Enter online LOAD REPLACE functionality, which has been described as "clone table functionality without the hassles of clone tables." When you run LOAD for a table using the REPLACE option and SHRLEVEL REFERENCE, the table will be placed in read-only mode for application access (programs that access tables for which a data "switch-out" is periodically required are typically of the data-retrieval variety, anyway), and the input data for the utility will be loaded into "shadow" objects corresponding to the table, its table space and its indexes. When the shadow objects are all loaded, online LOAD REPLACE does what Db2 online REORG does - it goes through the SWITCH phase, after which references to what had been the original table (and table space and indexes) resolve to the "new" original objects (which, prior to the SWITCH phase of the utility, were the shadow objects). The SWITCH phase of online LOAD REPLACE generally completes very quickly.

So there you have it, right? An easy, quick (from an application program's perspective) way to switch out data in a table for other data, right? Well, in fact some contention issues can arise when an online LOAD REPLACE job is executed. Twice over the past few months I have had Db2 for z/OS people contact me with questions about application disruption related to execution of the LOAD utility with REPLACE and SHRLEVEL REFERENCE. Similar points came up in these discussions, and I want to share pertinent information with you in the remainder of this blog entry.

The problem the aforementioned Db2 people faced was this: when an online LOAD REPLACE job was executed, some transactions would time out. The "why" of those timeouts has to do with Db2 claims and drains. Every program process that accesses Db2 for z/OS data has to get a claim (read or write) on every table space or index it accesses. Claims acquired by an application process are released when the process commits. When a Db2 utility requires exclusive access to an object (as a LOAD REPLACE SHRLEVEL REFERENCE job does when it is time to do the data set switch between the "original" and shadow data sets), it requests a drain on that object. Once a drain has been requested, in-flight units of work associated with programs accessing the target object are allowed to run to a commit point, but new units of work that will access the object are prevented from acquiring the necessary claim, effectively holding up their execution. If these held-up transactions have to wait too long on a claim, they will time out. Will that happen because a Db2 lock timeout value is reached? Possibly, but more probably the concern is a timeout value enforced by an application server. In this era of transactions that can be initiated at any time by someone using a smartphone app, application-side timeout values can be quite stringent - 10 seconds, for example, is not unusual (if a transaction accesses a data server such as Db2 for z/OS, and the application server waits for 10 seconds without getting a response, the application server will time the transaction out).

Why might it take more than a second or two for a Db2 utility, such as an online LOAD REPLACE that is ready for its SWITCH phase, to get the drain it needs to proceed? Won't in-flight transactions reach a commit point very quickly, since they tend to be sub-second in terms of elapsed time? Sure, most - maybe very nearly all - of the in-flight transactions accessing a table space or index (or a partition of same, if the object is partitioned) on which a utility wants a drain will get to a commit point very quickly, but all it takes is one transaction that's taking its time before committing to cause other transactions to be held up long enough to be timed out. As long as any ONE claim on a Db2 page set or partition (table space or index) has not been released via commit, the drain requested by a utility such as LOAD REPLACE SHRLEVEL REFERENCE (at SWITCH phase time) will not be acquired. When a not-committing-any-time-soon process is in the picture, something's gotta give. Either the utility will give up it's drain request (or will fail because it waited too long to get the drain), or transactions held up because of the drain will time out.

Here is where, in the case of an online LOAD REPLACE, you have to answer a question: what is your priority? Is it getting data in the target table switched out by time X, no matter what? Or, if you have at least some wiggle room regarding the time at which the data switch-out for the table is to be accomplished (at least a few minutes, let's say), is the priority on getting the data switch-out done with minimal (preferably zero) application impact in the form of transaction time-outs? If the priority is getting the data switch-out done by time X, no matter what, the probability of at least a few transactions timing out is going to be greater. If, on the other hand, the data switch-out for the target table needs to be accomplished not more than 10 minutes (for example) after time X, and transaction time-outs must be minimized (ideally, eliminated), you can provide the online LOAD REPLACE job with some flexibility regarding drain acquisition when it's time for the SWITCH phase.

How is that flexibility implemented? By way of the same options available when you're executing a Db2 online REORG utility. I'm talking about the utility options DRAIN_WAIT, RETRY, and RETRY_DELAY:

  • DRAIN_WAIT - If transaction time-out limits are really stringent (and again, that is very often enforced at the application server level) and time-outs must be avoided if at all possible, you might want to go really low with this value - maybe something like 8 seconds. If you do that, and an online LOAD REPLACE requests a drain on a target object in order to perform a data set switch with the associated shadow object, and the drain has not been acquired within 8 seconds (perhaps because some transaction that can run long but is infrequently executed is accessing the object and hasn't committed), the utility will give up on trying to get the drain. The thinking here is that we may be able to avoid transaction time-outs if transactions will be held up by a utility-driven drain request for no more than 8 seconds. Will giving up on a drain request cause the utility to fail? No - not if the utility can try again (see the next item, below).
  • RETRY - If you make the drain wait limit really short, thereby increasing the chance that the drain won't be acquired within the specified wait time, use RETRY to allow the utility to try again for the drain a little later. A RETRY value of 20, for example, would give the online LOAD REPLACE job quite a few chances to acquire the drain needed for SWITCH processing. How long will the utility wait to try again for the drain, if at first it doesn't succeed? That's determined by the setting of the next item, below.
  • RETRY_DELAY - If the online LOAD REPLACE tries and fails to get the drain it needs to get the SWITCH phase done, trying again immediately for the drain might not make sense. Better to give the process that prevented drain acquisition a little time to complete and get out of the way; but, don't wait too long - we haven't got all day to get the job done. Maybe waiting 20 seconds between successive attempts to get the drain needed for utility completion would make sense (in that case you'd specify RETRY_DELAY 20).

OK, so online LOAD REPLACE is a very nice Db2 for z/OS utility enhancement. That said, it doesn't do what it does using pixie dust. When the shadow table space and indexes are loaded and ready for switching for the corresponding "original" objects, the online LOAD REPLACE job needs to have - briefly - exclusive access to the "original" objects that are perhaps being concurrently accessed by transactions. Exclusive access is achieved via a drain request, and that drain request will hold up incoming transactions that are going to access the target objects. Transactions can be timed out if they are held up too long. Minimizing transaction impact, then, can necessitate a really short drain wait time for the utility. That short drain wait time increases odds of drain acquisition failure, but let the utility try again for a the drain a few times, with a reasonable time between successive drain requests, and you can have your cake and eat it, too: switch-out of data in a table (even involving a whole lot of data) accomplished easily, with minimal - quite possibly zero - time-out impact on transactions concurrently accessing the target table.

75 comments:

  1. I have a question , should the online Load replace task allow new transactions to come into the target table, once the Switch phase is started .When I tried this online Load replace , a new SELECT transaction on the target table was allowed to start and run ,once the online Load utility was in Switch phase , I was expecting that Switch should not be allowing new transaction on target table

    ReplyDelete
    Replies
    1. Per my understanding, that should not have happened - the switch phase of a LOAD REPLACE SHRLEVEL REFERENCE should be preceded by a "drain all." You might want to open a case with IBM Support to investigate this situation.

      Robert

      Delete
    2. Hi Rob, I understood its function correctly now , actually,Though the switch phase had begun , Drain all was not acquired still ,mean time , another thread came into the target table , which the utility allowed .So at that time , Utility had not aquired exclusive access on the target table , so it works correctly . Thank you

      Delete
    3. Certainly - you're welcome.

      Robert

      Delete
  2. Hi Rob,
    Application team wants to match prod table rows as that of sandbox.when I did load of a table,it reported with rows mismatch ,upon checking I found that there are six parent tables to it and among that two tables are already loaded(in random order).let me know how to proceed so that there are no records mismatch

    ReplyDelete
    Replies
    1. If you want to copy production Db2 for z/OS data into a test/dev environment, and you want that data to be referentially complete (i.e., no "orphan" rows in tables that are dependent in RI relationships), you will of course need to copy data from the whole set of referentially-related tables. One way to do that pretty easily is with Db2 tools. If you want to copy referentially related table spaces (and associated indexes) from one Db2 system to another in their entirety, that can be easily and efficiently and quickly done using IBM's Db2 Cloning Tool for z/OS (see https://www.ibm.com/docs/en/db2-cloning-tool/3.2.0?topic=documentation-db2-cloning-tool-overview). If you want to copy a subset of table space rows (as opposed to entire table spaces), in a referentially complete way, from production to dev/test, that can be easily done using IBM Optim Test Data Management (see https://www.ibm.com/docs/en/iotdmfz/11.7?topic=overview-optim-test-data-management-solution-zos).

      If you want to do this on your own, without using tools such as those just mentioned, you could try this approach:
      1) Determine the set of referentially related objects, using the Db2 REPORT utility (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=utilities-report).
      2) Do a redirected recovery of that set of objects (see https://community.ibm.com/community/user/datamanagement/blogs/kate-wheat1/2020/11/05/redirected-recovery-with-recover-u).
      3) Unload from the table spaces for which you did the redirected recovery, and load that data into table spaces in the test/dev environment.
      4) Create the needed indexes on the table spaces in the test/dev system.

      Robert

      Delete
  3. Hello ,One of my load replace job had message stating 30 rows deleted from this table as parent rows were not matching .looks like whichever rows not in sync with that of parent has been deleted right away in same job.Is it possible to prevent that deletion of rows from this table? and how do i collect the deleted rows/can I store the deleted rows for me to look back?

    ReplyDelete
    Replies
    1. Running LOAD with ENFORCE NO instead of the default ENFORCE CONSTRAINTS should prevent the row deletion you saw, but it will leave the target table space in check-pending status. If you stay with ENFORCE CONSTRAINTS, Db2 will place deleted rows in the data set named via the DISCARDDN option (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-syntax-options-control-statement).

      Robert

      Delete
    2. If I dint include discardddn explicitly in my job ,Will the job create some default discard dsn implicitly while running & place the deleted rows ? or only if we include discardddn explicitly in job,will i able to collect the deleted rows ?

      Delete
    3. Please refer to the online Db2 12 for z/OS documentation page previously cited (https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-syntax-options-control-statement). On that page, in the description of the DISCARDDN option, you will see this sentence: "If you omit the DISCARDDN option, the utility application program saves discarded records only if a SYSDISC DD statement is in the JCL input."

      The online Db2 for z/OS documentation is a very comprehensive informational resource.

      Robert

      Delete
  4. Hello Robert,
    I was running three loads in same job .1st step was load replace(with nocopypend option),2nd step load resume (without nocopypend) 3rd step-load resume (without nocopypend).After 2nd step,objects were placed in copypending.Next load resume ran on same object which was in copypend status.Isnt it restricted status? I'm surprised how load resume was allowed and many subsequent loads resume also ran fine after that.Any advise.

    ReplyDelete
    Replies
    1. Sorry about the delay in responding.

      My understanding is that COPY-pending status prevents changing data in a table space via SQL (e.g., INSERT, UPDATE, DELETE). The LOAD utility (unless executed with RESUME YES and SHRLEVEL CHANGE) does not use the SQL interface.

      Robert

      Delete
  5. Inspite of having log no replace nocopypend keyword in load card.The job still shows messages that it placed part 1 in cope pending state.Why that happens?
    LOAD DATA INDDN TBL LOG NO REPLACE NOCOPYPEND STATISTICS TABLE(ALL) INTO TABLE DB1.TB1
    message:
    TABLESPACE xxxx PARTITION 1 IS IN COPY PENDING

    ReplyDelete
    Replies
    1. Only two possibilities come to my mind:
      A) The object was already in copy-pending status prior to execution of the LOAD job (NOCOPYPEND does not change that),
      -or-
      B) The table space is defined with the NOT LOGGED attribute (in which case the NOCOPYPEND option would be ignored).

      If neither A nor B is true, I do not know why the object is in a copy-pending state. In that case, you might need to open a case with IBM Support.

      Robert

      Delete
    2. Right.I see that there was an load to this table(logged attribute) without nocopypend keyword which placed in copy pending state.Shouldn't this prevent from any further loads to be happening for same table without clearing copy pend ? I see another load replace with nocopypend was attempted for same table and that still dint remove the copy pend state as you said.

      Delete
    3. Refer to this page in the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-syntax-options-control-statement. On that page, look at the description of the LOG option for LOAD. In the description of LOG NO, you will see that when a table space is placed in COPY-pending status, "No table or partition in the table space can be updated by SQL until the restriction is removed." The operative phrase there is, "updated by SQL." COPY-pending status prevents updating of data via SQL, but does not prevent execution of LOAD.

      Robert

      Delete
  6. I get below when I run load using dsnuproc.UTILITY BATCH MEMORY EXECUTION ABENDED, REASON=X'00E40002'.
    load job:
    //JOBLIB DD DSN=DBXX.SDSNLOAD,DISP=SHR
    //*
    //PROCLIB JCLLIB ORDER=xxxxxxx
    //*
    //LOAD0 EXEC DSNUPROC,UID='LOAD1',UTPROC='',SYSTEM='Dxx'
    //SYSIN DD DISP=SHR,DSN=xxxxxxxxxxx
    // *
    Note:even if i add steplib and remove proclib step.Still I get this error.Any advise?

    ReplyDelete
    Replies
    1. Typo-I have tried adding steplib in above jcl (but dint work) and but not made any chnages to proclib.

      Delete
    2. I can't add anything to what is in the "response" information in the documentation on the 00E40002 reason code (https://www.ibm.com/docs/en/db2-for-zos/12?topic=codes-00e40002). If that information is not sufficient to resolve the problem, you might need to open a case with IBM Support.

      Robert

      Delete
  7. I get below error when I attempted to do load.
    DSNU398I DSNURWBF - UNEXPECTED PROCESSING ERROR, REASON=X'00E40318' ON TABLE
    DSNT500I DSNUGBAC - RESOURCE UNAVAILABLE
    REASON 00D70014
    TYPE 00000220
    DSNU017I DSNUGBAC - UTILITY DATA BASE SERVICES MEMORY EXECUTIon ABENDED, REASON=X'00E40347'
    CAUSE=X'00E40318'
    Per reason code-00D70014 it states an attempt to extend dataset failed.I tried to give more space in SYS* DD cards,but still it is failing .How can this be resolved?

    ReplyDelete
    Replies
    1. I do not know why the data set in question could not be extended. You will likely need to open a case with IBM Support to get this issue resolved.

      Robert

      Delete
  8. I have read manuals about override(identity) and ignorefiled(yes) in load card but still I'm not able to conceptually understand the use of it & when I need to use it?
    Any idea?

    ReplyDelete
    Replies
    1. Sometimes you want to load values into a column defined with GENERATED ALWAYS. The OVERRIDE option of LOAD causes the values you want to be loaded into the GENERATED ALWAYS column to be loaded, versus Db2 generating its own values for the column (as GENERATED ALWAYS indicates). Basically what you're overriding is the GENERATED ALWAYS characteristic of the column.

      There is a scenario for using IGNOREFIELDS on this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-loading-tables-special-column-types-by-using-generated-statements.

      Robert

      Delete
  9. Robert,
    We have ri in table(parent table with three child table),should we give check data exception for a table listing all child table underneath? or should we give check data exception for child table with corresponding exception table for child table? to remove orpan/mismtached records with that of parent?

    I see both the below options are used in my shop involving ri.So I'm confused which is used for what?
    option 1:
    check data ts " parent tsname" for exception in " parent table name" use "exception table name of allchild tables" deletes yes;
    option 2:
    check data ts " child tsname" for exception in " child table name" use "exception table name of same table" delete yes;

    ReplyDelete
  10. My load job is failed with REASON 00C90099 TYPE 00000210
    .Per manual it states ,partition is full.I'm unloading from prod and loading into dev.PCTFREE & FREEPAGE is 5.The partition structure is same across system.Wondering why would it fail? And what can be the possile ways to resolve this ?

    ReplyDelete
    Replies
    1. The first possibility that comes to my mind is the one mentioned in the description of the 00C90099 reason code (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=codes-00c90099): the source table sees INSERT activity, and the inserts executed since the last REORG or LOAD REPLACE of the source partition largely filled up that partition's free space. In that case, the LOAD into the target partition might exhaust available space by adding the free space specified (via PCTFREE and FREEPAGE) to the data being loaded. If this is indeed what happened, the LOAD of the target partition should succeed if the target table space is defined with 0 free space (i.e., PCTFREE 0 and FREEPAGE 0).

      Robert

      Delete
  11. Pctfree and freepage is same for both source & target.If that is the case,then inserts are accepted/accumulated in source according to free space.when same thing is loaded to target(including inserted records) why dont the freespace accept that(as that is what source has as well as target freespace is available to) ?

    ReplyDelete
    Replies
    1. Think about it. Let's say the source partition size is X (doesn't matter what X is). Suppose the partition is reorganized. After the REORG, the partition has some amount of free space (specified via PCTFREE and/or FREEPAGE). Imagine that inserts of new rows subsequent to the REORG fill up all of the partition's PCTFREE and/or FREEPAGE space, and not only that - the post-REORG inserts almost take space utilization in the partition up to the limit of X - say that 99% of the X amount of space is filled.

      Now, you unload the production partition and attempt to load the data into a test system partition that also has a size limit of X. You're going to need 99% of X just to hold all the data unloaded from the production partition, but Db2 is also going to try to reserve 5% free space in each data-holding page and and leave an empty page after every 5 data-holding pages. All told, providing that specified level of free space is going to require 0.99 * X (to hold the rows unloaded from the production LPAR) plus about 0.2 * X (for the free space requested via the PCTFREE and FREEPAGE specifications). That.s (0.99 + 0.2) * X, or 1.19 * X, and that takes you past the X partition size limit and the LOAD job fails. Either make the test system partition's DSSIZE larger or change the FREEPAGE and PCTFREE values for the test partition LOAD to 0. If you do that latter, you'll just need .99 * X for that data rows, and they will fit within the X partition size limit (just barely, but they will fit).

      Robert

      Delete
  12. My load job failed with jcl error .When I checked the job log,there was no sysprint dd/sysprint log.There was only jesmsglog,jesjcl,jesysmsg.In jesysmsg-I saw there was dataset not found messages(it was all sysrec,syspunch datasets).
    So probably load did not run as datasets dint exist.But why it was shown in jesysmsg rather than sysprint.Sysprint was not even printed.My thought was, it should have printed in sysprint.

    ReplyDelete
  13. I see a column termed as "TIME" in RC query tool .Is that start time of load or end time/completion of load ?HOw sgould we consider it?
    DATABASE OPERATION DATE TIME
    TS1 LOAD REPLACE 11-11-2023 22:58:25

    ReplyDelete
    Replies
    1. RC Query is not an IBM product. You'll have to ask Broadcom/CA (the vendor) about that.

      Robert

      Delete
  14. As you said,rename invalidates the packages.Other thing I noticed is-As per manual,syntax of rename is rename table source tablename to new table name.But when I execute rename statement without table keyword still it executes fine whereas rename index does not execute without index keyword,how is that possible for rename to execute without table keyword and how rename index strictly expects index keyword?

    ReplyDelete
    Replies
    1. First, look at the diagram under the heading, "Syntax," on the Db2 for z/OS online documentation page that describes the RENAME statement (that page is at https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-rename). You will see that TABLE is "above the line" in the syntax diagram, and INDEX is "below the line." Next, look at the information in the online documentation on how to read SQL statement syntax diagrams (https://www.ibm.com/docs/en/db2-for-zos/12?topic=zos-how-read-syntax-diagrams). You will see that when an item is above the line and another item in the same "stack" is below the line, the above-the-line item is the default specification; Thus TABLE is the default for the RENAME statement, and TABLE is assumed unless you specify INDEX instead.

      Robert

      Delete
    2. Wow! thanks for pointing out that.

      Delete
  15. One of distributed application reported us -SQLCODE=-904, SQLSTATE=57011, SQLERRMC=00C900A3;00000210;xxxxxxx.0000001, DRIVER=4.19.49"
    }
    I checked syscopy and db2 rc tools to know if there is any load happneed to all parent & child tables related to this table.But i dont see any load or alter to table.and no modify recovery too.
    It's very strange to see check pending.

    ReplyDelete
    Replies
    1. LOAD and ALTER TABLE are not the only actions that can cause a page set or partition to go into check-pending status (to my knowledge, MODIFY RECOVERY does not cause check-pending status to be set). The various actions that can cause a page set or partition to go into check-pending status are listed on this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=states-check-pending-status.

      Robert

      Delete
    2. Thanks for article rob.But none of listed options was performed on this table.Interesting.No alter,no load,no recover,no check lob,no reorg discard, etc .I checked syscopy to find the operations on this table as well the parent table(this table had few parent tables).only ic and quiesce was performed.But onething I noticed is,this parent tables in turn had few parent attached to it,but Im not sure if we need to check whole tree to see if above operations was all done.I just checked only the direct related parent table to this child table.Wondering what would have caused this? would an failed utlity (i think failed utlity will not be recorded in syscopy) cause this kind of error?

      Delete
    3. I don't see that a failed utility would put a partition or page set into check pending status.

      I suppose a "cascading check pending" situation could explain what you're seeing. If table T1 is a parent of T2, and T2 is a parent of T3, and T2's table space is in check-pending status, that is an indication that there might be "orphan rows" in T2. If that's true - if there are rows in T2 that should not be there - then that could mean that there are rows in T3 that should not be there.

      A case opened with IBM Support might help you determine how the partition in question got into check-pending status.

      Robert

      Delete
    4. Thanks for the quick response Rob!
      "I don't see that a failed utility would put a partition or page set into check pending status."-Lets says a load failed during enforce phase for some reason or during any phase .Do you mean to say that it will place in check ending state ?

      Delete
    5. Good point. Check "Table 1" on this page in the Db2 for z/OS online documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-termination. You will see that in the BUILD phase, the target page set or partition will be placed in check-pending status; thus, if the utility fails in that phase or any subsequent phase, the page set or partition will be in check-pending status. Resetting of check-pending status requires successful completion of the ENFORCE phase of LOAD.

      Robert

      Delete
  16. I have a tablespace with "not logged attribute" put in "Icopy status" .Having to find the reason for icopy state,when I was checking syscopy ,I see the an load replace(log no) was run on the table.Would that be the cause of icopy?if so why it places in icopy and what's the implication of it?
    1)Only the tablespace and table is in icopy state.Ideally the index should also be in icopy right?.But here the index was not in icopy.clue?
    2)log keyword in load card and logged/not logged attribute in tablespace .which one will take precedence?
    3)If we include nocopypend keyword in load ,would an icopy be ignored?
    4)what is the prefernce/why do we prefer "not logged attribute" in tablespace.what is the use of it and disadvantage of it.

    Appreciate your help on this

    ReplyDelete
    Replies
    1. I will not be able to directly address your questions - there are too many of them, and they are too general in nature. The best I can do for you is point you to the Db2 for z/OS documentation on informational copy-pending status (https://www.ibm.com/docs/en/db2-for-zos/12?topic=states-informational-copy-pending-status), on the NOT LOGGED attribute of CREATE TABLESPACE (search on "logged" on this page: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-create-tablespace), and on the NOCOPYPEND option of LOAD (search on "nocopypend" on this page: https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-syntax-options-control-statement).

      Regarding indexes, I can tell you that image copies are not applicable unless the index is defined with COPY YES, and the default value for an index is COPY NO (search on "copy" on this page: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-create-index).

      Robert

      Delete
  17. Rob,
    I ran below check data for an tablespace(which has only 1 table) that was placed in check pending.

    //CHK EXEC PGM=DSNUTILB,PARM='DB2O'
    //UTPRINT DD SYSOUT=*
    //SYSPRINT DD SYSOUT=*
    //SYSIN DD *
    CHECK DATA TABLESPACE DB1.TS1 SCOPE PENDING

    Eventhough I have specified to run on entire tablespace.I'm interested to know why it says entire ts must be checked in output?

    DSNU050I 337 05:47:04.77 DSNUGUTC - CHECK DATA TABLESPACE DB1.TS1
    DSNU726I -DB2R 337 05:47:04.78 DSNUKINP - SCOPE PENDING IS ACTIVE, BUT THE ENTIRE
    TABLE SPACE 'DB1.TS1 ' MUST BE CHECKED

    ReplyDelete
    Replies
    1. I do not know why this message would be issued for a single-table table space. You might need to open a case with IBM Support to get clarification on the meaning of the DSNU726I message.

      Robert

      Delete
  18. What are all the utilities/dml/ddl can be run on a table/tablespaces placed in check pending.I see ic ,quiesce and modify recovery allowed for an ts (placed in check pending).I dont find an page which tells that ddl/dml/db2 utlities allowed on check ending ts.Appreciate for an help.

    ReplyDelete
    Replies
    1. I am not aware that such a page exists in the Db2 for z/OS documentation. Pretty much all that is stated in the doc is that access to data in a table space in check-pending status is "severely restricted." From a DML perspective, I suspect that access is zero. For DDL, I suppose that DROP would work - I don't know that anything else would. As for utilities, I don't know that anything other that REPAIR would work (aside, of course, from the CHECK DATA utility).

      Bottom line: if a table space is in check-pending status, in my mind it is effectively inaccessible from a SQL and a utility perspective. You need to act to get that table space out of check-pending status.

      Robert

      Delete
  19. Hi,I have only run check data with exceptable table for child table but I see few shops run check data with exception for parent table along with listing child tables.Which of the below format is correct?
    CHECK DATA TABLESPACE parent tablespace name
    FOR EXCEPTION
    IN parent table name USE exp1
    IN child 1 table name USE exp2
    IN child 2 table name USE exp3
    DELETE YES

    or

    CHECK DATA TABLESPACE parent tablespace name
    FOR EXCEPTION
    IN child 1 table name USE exp2
    DELETE YES

    ReplyDelete
    Replies
    1. Neither is "correct" as opposed to "incorrect" - they represent two options: you can run CHECK DATA on a single table space, or on several table spaces (which could be in the same set of referentially-related table spaces).

      Robert

      Delete
    2. In my first example,parent and child was on different tablespace.Several tablespaces in same check statment? can you me an example please .

      Delete
    3. Refer to Example 1 on this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=data-sample-check-control-statements.

      Robert

      Delete
    4. Thanks for the article.As you said,if this in the same set of referentially-related table spaces.Does that mean parent tablespaces or child tablespaces to be all listed in same check statement.As per example 1- looks like any tablespaces can be listed in same job.Is that all parent tablespaces and associated parent tables? or only child tablespaces(or child which is parent of another child) and asscoated child tables?

      Delete
    5. As far as I know you can specify any table spaces you want for an execution of CHECK DATA - doesn't matter if the tables in the specified table spaces are child or parent in the context of a referential constraint.

      Robert

      Delete
  20. Hello Rob,I'm doing an load for table which has index (defined with copy yes) so I guess thats why after load is done it automatically places index in informational copy pending state.How can we prevent this from happening? nocopypend keyword does not seem to be rempving informational copy pending.

    ReplyDelete
    Replies
    1. You mention that, with regard to an index defined with COPY YES going into informational copy-pending status via execution of LOAD for the associated table space, the NOCOPYPEND keyword does not seem to be removing informational copy-pending." Check this page in the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=load-syntax-options-control-statement. In the description of the NOCOPYPEND keyword on that documentation page you will see the following: "A NOCOPYPEND specification does not... change any informational COPY-pending (ICOPY) status for indexes." Here's what that means: when LOAD is executed for a table, an index on the table that is defined with COPY YES will be placed in informational copy-pending (ICOPY) status (as noted on this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=copy-preparing-recovery-by-using-utility). Specifying NOCOPYPEND for that LOAD job will not change this.

      You ask, "How can we prevent this from happening?" You can prevent this from happening by altering the index with a specification of COPY NO. If you tell Db2, by including COPY YES in an index's definition, that you want to be able to image-copy the index and use those backups for index-recovery purposes, Db2 will tell you, via ICOPY status for the index, when you would be well-advised to create a new image copy of the index (such as following execution of LOAD for the table on which the index is defined). That's the deal. If you are willing to rely on REBUILD INDEX to recover an index when necessary, issue an ALTER for the index with COPY NO, and you will no longer see ICOPY status for that index.

      Robert

      Delete
    2. Thanks for the response

      Delete
  21. I ran a load replace on a table(which is only child to a parent table) and noticed that it is placed in check pending.
    Child tables will be in check pending only when the parent tables data gets loaded,isnt'it ? how come the load to child itself place table in check pending? Im trying to understand the reason behind it

    ReplyDelete
    Replies
    1. Think about the objective of referential integrity: it is to ensure that there will be no "orphan" rows in a child table (an "orphan" row being a row with a foreign key value that has no matching value in the referenced key of the parent table). If new rows are loaded in a parent table and those new rows have no matching rows in the child table, so what? Those are just parent rows with no corresponding child rows - no violation of referential integrity there. On the other hand, if you load rows into a child table with ENFORCE NO specified for the LOAD job, how does Db2 know if you've loaded child table rows with no match in the parent table? Answer: it doesn't know that, so it places the table space in check-pending status so you can execute CHECK DATA to determine if the child-load operation violated referential integrity constraints.

      Refer to this page in the Db2 for z/OS online documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=states-check-pending-status. On that page, you'll see this sentence: "A table space or partition is placed in CHKP status in any of the following situations that might introduce table check violations:" One of the listed situations is this one: "The LOAD utility is run with ENFORCE NO, and check constraints are defined on the table." Could the phrase "check constraints are defined on the table" apply to a parent table? No. A referential integrity constraint is ALWAYS defined only on the child table of a parent-child table relationship.

      Robert

      Delete
    2. But I ran the load without any enforce keyword-So by default as per ibm manual "enforce constraints" is applied.If that is the case,then how come the table I loaded (which is the only child table) gets into check pending?

      Delete
    3. What did the utility control statement look like for this execution of LOAD?

      Robert

      Delete
    4. LOAD DATA INDDN SYSRECAA LOG NO NOCOPYPEND RESUME NO REPLACE
      SORTKEYS SORTDEVT SYSDA SORTNUM 10
      STATISTICS TABLE(ALL) INDEX(ALL) KEYCARD FREQVAL NUMCOLS 10 COUNT 15
      INTO TABLE TBLNAME

      Delete
    5. Hmm. Given that LOAD utility control statement, the default of ENFORCE CONSTRAINTS should have been in effect, and if that were the case then the LOAD job itself should not have (per my understanding) put the table into check-pending status. Is it possible that some other process (e.g., a LOAD REPLACE of a parent table that has this table as a child) caused the table on which you're focused to go into check-pending status? If that did not happen, I am at a loss here. You might need to open a case with IBM Support to get your question answered.

      Robert

      Delete
    6. Yes load did not happen to parent table.

      Delete
    7. I do not know why the child table space was put in check-pending status. If you open a case with IBM Support, those folks might be able to determine what happened.

      Robert

      Delete
  22. Hello,
    On running copy for set of tablespaces with explicit specification of shrlevel change,I get below error.How does the shrlevel change affect/relate to logging attribute of tablespace ?
    DSNUBAII - COPY SHRLEVEL CHANGE OF TABLESPACE xxxxx.xxxx DSNUM 1 IS NOT ALLOWED
    BECAUSE IT HAS A LOGGING ATTRIBUTE OF NOT LOGGED

    ReplyDelete
    Replies
    1. Plese see this page in the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=copy-syntax-options-control-statement. On that page, in the description of the SHRLEVEL CHANGE option, you will find the following: "SHRLEVEL CHANGE is not allowed for a table space that is defined as NOT LOGGED unless it is a LOB table space and its base table space has the LOGGED attribute."

      Robert

      Delete
    2. Thanks for pointing that out.I'm trying to understand what would be the concept/logic behind this "not logged attribute and shrlevel change" why does shrlevel change not allowed for not logged ts's?

      Delete
    3. Probably because an image copy made with SHRLEVEL CHANGE in effect could not be relied upon as input for a RECOVER job that would recover a NOT LOGGED table space to a point of consistency. A SHRLEVEL CHANGE image copy is likely to be "fuzzy" (i.e., inconsistent), owing to data changes that could be made while the image copy is being generated. A fuzzy image copy, when restored as part of a RECOVER operation, can be taken to a point of consistency via roll-forward application of logged data changes, but that process is not an option for a NOT LOGGED table space.

      Robert

      Delete
  23. I came across below phrase from IBM manual -https://www.ibm.com/docs/en/db2-for-zos/13?topic=copy-copies-lists-objects

    "If you use COPY with the SHRLEVEL(CHANGE) option:
    If you specify OPTIONS EVENT(ITEMERROR,SKIP):
    Each object in the list is placed in UTRW status and the read claim class is held only while the object is being copied"
    When an object is placed in rw status it allows both read and write already,why does it talk about read claim class separetely?

    ReplyDelete
    Replies
    1. Hey, Doug. The STATE of a Db2 for z/OS object and CLAIM activity related to that object are two different things. Let's say that a database object - for example, a table space - is in a read/write state. That means (as you probably know) that the object can be accessed in a data-read way or in a data-change way by an application process. Are there any claims on the object? Maybe, maybe not. Any time a process accesses a Db2 database object, that process has to get a claim on the object (a read claim if the object will only be accessed for read purposes, a write claim if the process will - or at least, might - change data in the object, perhaps by way of INSERT, UPDATE or DELETE processing). When a process commits, any claims it might have on any database objects are released, and any such claim that the process still needs (e.g., if it's a batch job that issues multiple commits while executing) will have to be re-acquired after that commit.

      What is the purpose of claims? Basically, this has to do with Db2 for z/OS drain processing, which is primarily used by Db2 utilities. Take online REORG, for example. Near the end of an online REORG, we can't switch from the "original" data sets to the shadow data sets (which, after the switch, will be the object's "new original" data sets) if some process is currently "on" one or more of the original data sets - in other words, we're not going to jerk the data set "chair" out from under some process while the process is sitting on that chair. REORG will request a drain lock on the original data sets. That means no new claims will be allowed on the data sets while REORG tries to get that drain. If there are any clains on the data sets, the drain request will have to wait for those claims to be released (via commit), and if the drain has to wait too long then the drain request will time out. If REORG gets the drain it requested, it knows that no process is currently "sitting" on the data sets, and it can swap the old original data sets for the shadow data sets (new original data sets) while processes are waiting to take their seats on the data sets, and when the data set switch is complete the waiting processes get the claims they've requested, and they can take their data set seats and everyone's happy and no one ended up with their backside on the floor because of an unorderly data set switcheroo.

      Hope this helps.

      Robert

      Delete
    2. So Probably it tells that when an object is in utrw state it will allow only for read claim (not for write claim) while it is doing an copy? with below quote:
      "Each object in the list is placed in UTRW status and the read claim class is held only while the object is being copied"

      Delete
    3. UTRW status basically means that a utility is operating on the object, and it is generally available for read and write access from other processes during that time, but there could be some restrictions owing to the fact that the utility is operating on the object (the long form of UTRW is "Utility restrictive state, read-write access allowed."

      The "read claim" in the part of the documentation you quoted refers, I believe, to the read claim of the COPY utility itself: COPY will have a read claim on an object that is being copied, but that read claim will be released when COPY has completed the backup of the object in question.

      Robert

      Delete