Thursday, December 2, 2010

REORG and DB2 for z/OS Partition-by-Growth Table Spaces

A DB2 DBA friend of mine recently told me of a problem he'd encountered in reorganizing a single partition of a partition-by-growth (PBG) table space: the online REORG job failed with an out-of-space condition pertaining to the partition's shadow data set. The topic of partition-level REORGs of PBG table spaces is one that I find interesting; thus, this post, in which I'll explain why my friend hit that out-of-space situation and how he resolved it.

I blogged a couple of times about PBG table spaces (introduced with DB2 9 for z/OS) while I was working as an independent DB2 consultant, in one entry comparing them with partition-by-range (PBR) table spaces and in another briefly mentioning them among several new physical database design options provided by DB2 9. One advantage of a PBG table space versus a traditional segmented table space (PBG table spaces are also segmented) is the ability to run utilities at a partition level (the exception to this rule being LOAD, which has to run at the table space level for a PBG table space). On the surface, the option of REORGing a single partition of a PBG table space looks pretty attractive, but there is a catch, and it does have to do with space, as my DBA friend discovered.

Here's the deal: by definition, a new partition for a PBG table space won't be allocated unless the table space's existing partitions are full. So, if you run an online REORG for, say, partition 4 of a ten-partition PBG table space, that partition is likely to be quite full (depending on delete activity, which might have created some "holes" in pages of the partition, and on the table's clustering key, which would influence placement of newly inserted rows -- more on that momentarily). If the table space was created with non-zero values for PCTFREE and/or FREEPAGE, the REORG utility will attempt to reestablish that free space, and if the partition's data rows and the reestablished free space will not fit in the partition's shadow data set, the REORG job will fail. That was the situation that my friend brought to my attention. What did he do? He resolved the problem by setting the value of a relatively new ZPARM parameter, REORG_IGNORE_FREESPACE (introduced last fall via the fix for APAR PK83397), to YES. When the value of REORG_IGNORE_FREESPACE is YES (the default is NO), REORG TABLESPACE will ignore PCTFREE and FREEPAGE values when reloading data rows into a PBG tablespace if either of the following is true:
  • The utility is reorganizing a subset of the table space's partitions
  • The table in the PBG table space has one or more LOB columns

If free space is not reestablished, the rows unloaded from a partition will almost certainly fit into the shadow data set during a REORG. [I say "almost" because for a table with varying-length rows (and rows in a compressed table space are varying-length even if there are no VARCHAR columns), I can imagine a very unusual scenario in which rows unloaded from a partition might not fit back in after being sorted in clustering key sequence, even with free space parameter values ignored. So, REORG_IGNORE_FREESPACE = YES should take chances of an out-of-space condition on REORG or a partition of a PBG tablespace way down, if not all the way to zero.]

Now, you might be thinking, "Why is there even the possibility of this out-of-space condition when one partition (or a subset of partitions) of a PBG table space is being reorganized? Can't REORG take rows that won't fit back into the target partition after reestablishing free space and put them in other partitions that have unused space?" The fact of the matter is that REORG doesn't work that way -- it doesn't move rows from one PBG partition to another unless the whole PBG table space is being REORGed.

As a coda, I'll tell you that the DBA to whom I've referred repeatedly in this entry ended up asking whether it was even worth it to reorganize partitions of the table in question. That is a very good question to ask. It's easy to get into a REORG-if-it's-unclustered mindset, but in fact some tablespaces don't need to be REORGed. The one with which the DBA was dealing gets plenty of inserts but is rarely read, and when it is read, performance just has to be OK -- it doesn't have to be great. CPU resources in such a case can be saved by cutting way back on REORG frequency (if not eliminating REORGs altogether) and optimizing insert performance either by switching to a continuously-ascending clustering key or by altering the table with the APPEND YES option introduced with DB2 9 (this causes DB2 to ignore clustering when inserting or loading data into the table). Bottom line: don't REORG a table space (or partitions thereof) "just because." Make sure that the benefits of reorganizing an object justify the CPU (and disk space) used during REORG processing. While keeping an object well organized is usually a good idea, in some situations disorganization really isn't a big deal. Think it through, and act accordingly.

53 comments:

  1. Excellent article, Robert! The staff at SSWUG.org has featured it on the site.

    We hope you continue to supply informative posts like this one!

    ReplyDelete
  2. Thanks for the positive feedback, Sam. I'll try to keep 'em coming.

    Robert

    ReplyDelete
  3. this was very useful since there isn't a lot of information on utilities and UTS.

    Dale

    ReplyDelete
  4. Glad it was a good read for you, Dale.

    Robert

    ReplyDelete
  5. Very useful article and it helped! Thank you

    ReplyDelete
  6. Happy to hear that the information was of use to you, Cuneyt.

    Robert

    ReplyDelete
  7. Robert,

    It would be very nice to be able to activate the APAR you mention above - REORG_IGNORE_FREESPACE (introduced last fall via the fix for APAR PK83397) as a PARM on the execute statement of the REORG.

    ReplyDelete
  8. That would be an attractive option. You might want to pass your functionality enhancement idea to your local IBM DB2 technical support rep. He or she could then enter the request into our product requirements application, and you'd get a response back from DB2 for z/OS development.

    Robert

    ReplyDelete
  9. Hi Robert,
    While performing reorg on one of our TS it failed with "RELOADING OF DATASET 'dsnxxx.xxxx.dbnam.tsnam.jooo1.a002' WILL EXCEED THE MAXIMUM SIZE OF ******* PAGES"

    and the control card it is using "PART 2".
    My observations are it is a PBG(Since Maxparts is there) and i can see currently 3 partitions for that TS.

    I reran the Reorg again with removing the "PART 2" from the control card and it was successful.
    Was this approach correct ?
    Please let me know why the Initial Reorg Failed.

    ReplyDelete
    Replies
    1. First of all, partition 3 would not have been added to the table space unless partition 2 had filled up. Assuming that there have not been many deletions of rows from partition 2, a partition-level REORG for that partition probably failed because with free space reestablished (assuming non-zero values for PCTFREE and/or FREEPAGE for the partition), the data would not fit back into the partition. When a partition-by-growth table space is REORGed in its entirety, DB2 can distribute rows across the table space's partitions so that all will fit, and if DB2 needs to add a partition to the table space to accommodate all rows plus reestablished free space, it will do so.

      Delete
  10. Hi Robert, first of all, great article!

    I was thinking, what if you reorg a UTS PGB with 5 partitions on part 4:5 where all partitions are on the verge of being full, you use PCTFREE 10 / FREEPAGE 0 / MAXPARTITIONS 10, and run on Db2 v11. Will this result in
    1) reorg fail due to the reorged data not being able to fit back into part 4 and/or 5.
    2) Db2 reorgs and recognize that the data wont fit and also that the current last existing partition is in the reorg and therefore adds a new partition for the overflowing data. (This would be cool but I guess it's just wishful thinking).

    Regards
    Johan

    ReplyDelete
    Replies
    1. Sorry about the delay in responding, Johan.

      In a Db2 11 for z/OS environment, I would expect the REORG job you've described to fail, unless you have the ZPARM parameter REORG_IGNORE_FREESPACE set to YES.

      In a Db2 12 system, the REORG job should complete successfully, with Db2 adding a new partition to the table space to hold rows that would not fit back into partitions 4 and 5 (with PCTFREE free space re-established); and, this success would not be dependent on one of the partitions targeted by the job being the table space's last partition. (in other words, given your example table space, a REORG PART 2:3 should complete successfully in a Db2 12 system, with a new partition 6 added to hold the rows that would not fit back into partitions 2 and 3 with free space re-established).

      Robert

      Delete
  11. Hi Rob,

    I see inline copies are taken after all the phases are completed in an reorg (both shrlevel change/none/ref) but in load I believe it is taken after reload phase.Why such difference? why inline copy is allowed only at the end/completion of all the phases
    ? Can you explain

    ReplyDelete
    Replies
    1. I'm not certain as to what you're asking. For both the LOAD and REORG utilities, inline image copies are created as part of the RELOAD phase of the utility.

      Robert

      Delete
  12. Hi Rob,I would like get understanding on the parm -maxro,longlog,delay.The website information states these keywords related to last iterations of log processing.But I would like to get your help to get clarity on this.

    ReplyDelete
    Replies
    1. MAXRO, which is relevant to REORG with SHRLEVEL CHANGE, specifies the limit on how long the target table space or partition can be in a read-only state while REORG does the last iteration of applying data changes picked up from the Db2 log - this is when the shadow data sets on which REORG has been operating get "synched-up" with the "original" data sets, from a data currency perspective. The larger the MAXRO value, the more likely it is that the next iteration of application of data changes picked up from the log (the process of "catching up" to the currency of data in the "original" data sets, which with SHRLEVEL CHANGE can be updated as REORG is executing) will be the last one, after which the SWITCH phase can execute and REORG completes. A smaller MAXRO value means that REORG is likely to run longer, as it waits for a time period when the rate of data changes for the original data sets drops to a level such that the last catch-up application of logged data changes can be completed within the specified MAXRO interval.

      LONGLOG tells REORG what to do if it determines that it's not making significant process in getting the shadow data sets caught up to the original data sets in terms of currency - in that case, REORG can continue processing, terminate or force completion by draining processes with write claims on the original data sets.

      DELAY is relevant if LONGLOG is set to TERM or DRAIN - if DELAY is n seconds, the termination or drain action will be taken in not less than n seconds from the time of the associated LONGLOG message.

      A fairly common approach is to execute a SHRLEVEL CHANGE online REORG with MAXRO DEFER and LONGLOG CONTINUE - that basically tells REORG, "keep doing iterations of catch-up processing until further notice." Then, at a time when data-change activity targeting the table space or partition(s) being REORGed drops off, an -ALTER UTILITY command is issued for the utility to change MAXRO from DEFER to a fairly large interval. The large MAXRO interval makes it likely that the next iteration of catch-up processIng will be the last one. As an alternative to submitting an -ALTER UTILITY command for this purpose, the REORG job can be submitted with a SWITCHTIME specification that essentially says, "at such-and-such time, or after the REORG job has been running for X amount of time, change MAXRO to the value specified by NEWMAXRO.

      Delete
  13. I se a PBR table,which has 25 parts defined .when I keep on adding part using "alter table add part" it allows it.How do I know what is the maximum part allowed in pbr?

    ReplyDelete
    Replies
    1. For a PBR table space that uses relative page numbering, the maximum number of partitions is 4096. For a PBR table space that uses absolute page numbering, the maximum number of partitions depends on the page size and the DSSIZE used for the table space. See the information in tables 1, 2, 3 and 4 on this page in the Db2 for z/OS online documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-alter-tablespace#db2z_sql_altertablespace__dlentry-maxtssize.

      Robert

      Delete
    2. thanks for quick response

      Delete
  14. Hello,I'm getting 00E40347 in an reorg job
    DSNU3350I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06:24746409 RECORDS SORTED, ESTIMATED 24746404, VARIATION 0 PERCENT
    DSNU3351I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: AVERAGE RECORD LENGTH PROCESSED 224, ESTIMATED 223, VARIATION 0
    DSNU3352I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: USED DFSORT
    DSNU3354I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: MEMORY BELOW THE BAR: OPTIMAL 24 MB, USED 15 MB
    DSNU017I 343 12:36:32.98 DSNUGSAT - UTILITY DATA BASE SERVICES MEMORY EXECUTION ABENDED, REASON=X'00E40347'
    CAUSE=X'00E40347'
    Per few sites it states it is due to space issue probably.I would like to know what correction can be made in my job so that it runs fine next time .Below is my control statement
    //REORG EXEC PGM=DSNUTILB
    //SYSPRINT DD SYSOUT=*
    //SYSUDUMP DD SYSOUT=*
    //SYSERR DD DSN=&&SYSERR,
    // DISP=(MOD,DELETE,CATLG),
    // SPACE=(CYL,(10,15)),
    // UNIT=SYSDA
    //SYSREC DD DSN=&&SYSREC,
    // DISP=(NEW,DELETE,DELETE),
    // UNIT=(SYSDA,9),
    // SPACE=(CYL,(5000,500),,,ROUND)
    //SYSUT1 DD DSN=&&SYSUT1,
    // DISP=(NEW,DELETE,DELETE),
    // UNIT=(SYSDA,59),
    // SPACE=(CYL,(5000,500),,,ROUND)
    //UTPRINT DD SYSOUT=*

    ReplyDelete
    Replies
    1. I can't help you here. That's a pretty non-specific error code ("a Db2 subtask terminated abnormally"). You'll probably need to open a case with IBM Support to get more specific cause and resolution information.

      Robert

      Delete
  15. Hello,
    I have below reorg card used in my job.But inspite of specifying "aux yes" .I see this message in job log.Why is this aux yes ignored?

    DSNU124I DSNURFIT - AUX YES SPECIFICATION IS IGNORED AND AUX NO IS IN EFFECT FOR CURRENT UTILITY
    EXECUTION

    TEMPLATE COPYDSN DSN DSNNAME
    UNIT CTAPE BUFNO 20 RETPD 7 VOLCNT(59)
    DISP (NEW,CATLG,DELETE) STACK YES TRTCH COMP

    LISTDEF TEMPDEF
    INCLUDE TABLESPACE TS1
    REORG TABLESPACE LIST TEMPDEF
    KEEPDICTIONARY
    NOSYSREC
    AUX YES
    SORTDEVT SYSDA SORTNUM 20
    COPYDDN COPYDSN
    TIMEOUT TERM MAXRO 5 DRAIN_WAIT 10 RETRY 10 RETRY_DELAY 5
    SHRLEVEL CHANGE FASTSWITCH YES

    ReplyDelete
    Replies
    1. If the table space in question is a partition-by-growth (PBG) table space, the issue is probably the STACK YES in your TEMPLATE statement. Refer to this page in the Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=tablespace-syntax-options-reorg-control-statement. In the blue "Restrictions" box under the description of the AUX keyword, you will see that AUX YES and STACK YES are not a compatible combination when REORG is executed for a PBG table space that has one or more LOB columns.

      Robert

      Delete
    2. But I found out that the tablespace that is reorged has no LOB's but due to existing job setup it had "aux yes" included.So eventhough tablespace did not have lob's associated still does the stack and aux combination restriction apply?

      Delete
    3. If there are no LOB columns in a table, AUX YES will be ignored because it has no meaning for a table space holding a table that has no LOB columns. AUX YES is only relevant if the base table space has one or more LOB table spaces associated with it, and no LOB columns means no LOB table spaces.

      Robert

      Delete
    4. So probably in this case," AUX YES SPECIFICATION IS IGNORED" message appeared in the log due to no LOb tablespaces unde base or because "stack yes" keyword? which takes precedance?

      Delete
    5. The fact that the table in the target table space has no LOB columns is the reason AUX YES is ignored.

      Robert

      Delete
  16. When I run reorg on list,I get error related to "AUX".When I explicitly specify AUX yes why does it say "AUX YES" is ignored and "aux no" in effect?

    DSNU1033I 067 12:41:27.27 DSNUGULM - PROCESSING LIST ITEM: TABLESPACE RAPDB033.RAPTS033
    DSNU124I -DBM1 067 12:41:27.27 DSNURFIT - AUX YES SPECIFICATION IS
    IGNORED AND AUX NO IS IN EFFECT FOR CURRENT UTILITY
    EXECUTION
    reorg stmt:

    LISTDEF TEMP
    INCLUDE TABLESPACE RAPDB033.*
    REORG TABLESPACE LIST TEMP
    KEEPDICTIONARY
    NOSYSREC
    AUX YES
    SORTDEVT SYSDA SORTNUM 10
    COPYDDN COPYDSN
    TIMEOUT TERM MAXRO 10 DRAIN_WAIT 15 RETRY 10 RETRY_DELAY 5
    SHRLEVEL CHANGE FASTSWITCH YES

    ReplyDelete
    Replies
    1. It could be that the LISTDEF-defined list of objects includes one or more LOB table spaces. AUX YES is not applicable to a LOB table space (see the Db2 for z/OS online documentation page at https://www.ibm.com/docs/en/db2-for-zos/13?topic=tablespace-reorganization-lob-table-space).

      Robert

      Delete
    2. Listdef has combination of lob and pbg,pbr tablespaces

      Delete
    3. and "aux yes ignored" message got listed for pbg and pbr tablespaces as well.

      Delete
    4. It could be that PBG and PBR table spaces holding tables with no LOB columns were in the list. AUX YES does not apply to a base table space when the table in that table space has. no LOB columns.

      Robert

      Delete
  17. I Ran reorg on tablespace and job failed with below message:
    DSNU1033I 093 03:02:30.35 DSNUGULM - PROCESSING LIST ITEM: TABLESPACE XX
    DSNU1152I -DB2D 093 03:02:30.36 DSNURFIT - REORG SHRLEVEL CHANGE OF TABLESPACE xx IS NOT ALLOWED
    DSNU012I 093 03:02:30.36 DSNUGBAC - UTILITY EXECUTION TERMINATED, HIGHEST RETURN CODE=8

    As per DSN code,I see that tablespace was "not logged " tablespace that is the reason for shrlevel change not accepted ,but when I ran with shrlevel reference I got same error .Also ,when I removed shrlevel keyword ,job ran fine (probably it ran with shrlevel none)

    Why shrlevel reference is also not allowed ?
    when the job was running as shrlevel none. I checked for display locks on this object.I saw the lockinfo as H-IX,W,A->which is write intentions(IX) with write drain lock(W).But shrlevel none will not do read/write which is only done by shrlevel change/reference per my reading.What could be the reason for holding write locks?

    ReplyDelete
    Replies
    1. In the description of the DSNU1152I message in the Db2 for z/OS documentation (see https://www.ibm.com/docs/en/db2-for-zos/13?topic=messages-dsnu1152i), you will see that this error message will be generated when "SHRLEVEL REFERENCE was specified when REORG TABLESPACE with PART was run on a NOT LOGGED table space on which nonpartitioned indexes are defined." This suggests that a) the PART keyword was specified for your REORG, and b) there were one or more non-partitioned indexes on the table in the table space.

      Yes, when SHRLEVEL is not specified for a REORG TABLESPACE job, it defaults to SHRLEVEL NONE.

      As for locks and drain requests, absolutely REORG does reading and writing when it operates on a table space, regardless of the SHRLEVEL option. It reads rows and moves them around to reestablish clustering - that involves read and write access. SHRLEVEL NONE means that processes OTHER THAN the REORG job cannot read from or write to the table space (or associated indexes) while it is being reorganized.

      Robert

      Delete
  18. Hello,Why do reorg on an tablespace fail due to resource unavailability on sysutilx? Why reorg refers sysutilx?

    DSNU1033I 094 03:06:34.52 DSNUGULM - PROCESSING LIST ITEM: TABLESPACE EE01EDB.EE211TS
    -DSNT500I 094 03:12:56.60 DSNUGBAC - RESOURCE UNAVAILABLE
    REASON 00C9008E
    TYPE 00000302
    NAME DSNDB01 .SYSUTILX.X'00001B'
    -DSNU017I 094 03:12:56.60 DSNUGBAC - UTILITY DATA BASE SERVICES MEMORY EXECUTION ABENDED, REASON=X'00E4000D'
    CAUSE=X'00C9008E'

    ReplyDelete
    Replies
    1. SYSUTILX is used by Db2 for z/OS to record periods (portions of the Db2 transaction log) during which page sets or partitions were or were not being updated (i.e., periods during which data in the page set or partition was or was not being changed). REORG makes changes to the object on which it is operating, and that can require a SYSUTILX update.

      Robert

      Delete
    2. But why would my job states contention/timeout with sysutilx update? sysutilx should be able to hold as many updates that happen in entire system right.Also,only my job was running reorg on that object at that time.

      Delete
    3. REORG typically executes with no issues related to SYSUTILX contention. I do not know why your job encountered the SYSUTILX contention issue. To get to the bottom of that, you may need to open a case with IBM Support.

      Robert

      Delete
  19. Hello,Does reorg of an tablespace -reorg's index as well?
    Reorg of tablespace has build phase -does building of an index means index is reorgd as well?

    ReplyDelete
    Replies
    1. Please refer to this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=utilities-reorg-tablespace. In Table 1 on that page, you will see the following description of the result of executing REORG TABLESPACE: "Reorganizes all data and all indexes."

      Robert

      Delete
    2. Thankyou for pointing that out!

      Delete
  20. If a reorg job has both NOSYSREC as well as UNLDDN SYSREC .which will take precedence? I couldnt find in manual

    ReplyDelete
    Replies
    1. I'd expect NOSYSREC to take precedence. An UNLDDN specification doesn't mean you WILL unload data - it just says that IF data is to be unloaded, the unload data set information is provided via the DD name specified for UNLDDN.

      Robert

      Delete
    2. Correct. For a SHRLEVEL CHANGE online REORG, restart is only relevant for the SWITCH phase (as indicated in the table you referenced in the documentation).

      Robert

      Delete
    3. But under NOSYSREC option(https://www.ibm.com/docs/sk/db2-for-zos/13?topic=tablespace-syntax-options-reorg-control-statement) it is specified -
      when you omit the unload data set by specifying NOSYSREC"If an error occurs during reloading, you must restart the utility at the beginning of the UNLOAD phase."
      How is this possible? Restart is not allowed(shrlevel change) even if we specify sysrec/unlddn.Kindly clarify this

      Delete
    4. I believe that an additional phrase (which I have put in all-caps for emphasis) would clarify what's in the documentation: "If an error occurs during reloading, you must restart the utility at the beginning of the UNLOAD phase IN SITUATIONS IN WHICH RESTART AT THE BEGINNING OF THAT PHASES IS POSSIBLE." A REORG executed with SHRLEVEL CHANGE is not such a situation.

      Robert

      Delete
    5. Thanks for the clarification !

      Delete
  21. Hello,
    I ran a reorg on a "NOT LOGGED" tablespace with shrlevel reference.The job failed due to reason DSNU1152I.
    Per manual,"
    partition-level REORG for a NOT LOGGED table space on which at least one non-partitioned index defined will not work for shrlevel reference"
    I understand the reason behind "NOT LOGGED " .What could be the logic/concept behind NPI involved in this?

    ReplyDelete
    Replies
    1. First, note that this restriction, in addition to being stated in the DSNU1152I message, is also described in the documentation for the REORG utility. On the documentation page at https://www.ibm.com/docs/en/db2-for-zos/13?topic=tablespace-syntax-options-reorg-control-statement, under the description of the SHRLEVEL REFERENCE option, in a red box labeled "restriction," you will see this: "You cannot specify SHRLEVEL REFERENCE when REORG TABLESPACE with PART is run on a NOT LOGGED table space on which nonpartitioned indexes are defined."

      The reason for the restriction: when you do a partition-level online REORG for a table space on which one or more non-partitioned indexes are defined, those NPIs will be reorganized in their entirety. SHRLEVEL REFERENCE at the table space partition level cannot be imposed on the NPIs in their entirety - entries in the NPIs related to partitions not covered by the SHRLEVEL CHANGE state of the target partitions can be changed. That is an inherent conflict, at the NPI level, with the SHRLEVEL REFERNCE specified for the target partitions. NOT LOGGED specified for a table space applies to related indexes, as well. Allowing changes to the NPIs (which is allowed when there is a SHRLEVEL REFERENCE REORG of a subset of a table space's partitions) when NOT LOGGED is in effect would lead to an unrecoverable situation, and Db2 will not allow that to happen.

      Robert

      Delete