Thursday, December 2, 2010

REORG and DB2 for z/OS Partition-by-Growth Table Spaces

A DB2 DBA friend of mine recently told me of a problem he'd encountered in reorganizing a single partition of a partition-by-growth (PBG) table space: the online REORG job failed with an out-of-space condition pertaining to the partition's shadow data set. The topic of partition-level REORGs of PBG table spaces is one that I find interesting; thus, this post, in which I'll explain why my friend hit that out-of-space situation and how he resolved it.

I blogged a couple of times about PBG table spaces (introduced with DB2 9 for z/OS) while I was working as an independent DB2 consultant, in one entry comparing them with partition-by-range (PBR) table spaces and in another briefly mentioning them among several new physical database design options provided by DB2 9. One advantage of a PBG table space versus a traditional segmented table space (PBG table spaces are also segmented) is the ability to run utilities at a partition level (the exception to this rule being LOAD, which has to run at the table space level for a PBG table space). On the surface, the option of REORGing a single partition of a PBG table space looks pretty attractive, but there is a catch, and it does have to do with space, as my DBA friend discovered.

Here's the deal: by definition, a new partition for a PBG table space won't be allocated unless the table space's existing partitions are full. So, if you run an online REORG for, say, partition 4 of a ten-partition PBG table space, that partition is likely to be quite full (depending on delete activity, which might have created some "holes" in pages of the partition, and on the table's clustering key, which would influence placement of newly inserted rows -- more on that momentarily). If the table space was created with non-zero values for PCTFREE and/or FREEPAGE, the REORG utility will attempt to reestablish that free space, and if the partition's data rows and the reestablished free space will not fit in the partition's shadow data set, the REORG job will fail. That was the situation that my friend brought to my attention. What did he do? He resolved the problem by setting the value of a relatively new ZPARM parameter, REORG_IGNORE_FREESPACE (introduced last fall via the fix for APAR PK83397), to YES. When the value of REORG_IGNORE_FREESPACE is YES (the default is NO), REORG TABLESPACE will ignore PCTFREE and FREEPAGE values when reloading data rows into a PBG tablespace if either of the following is true:
  • The utility is reorganizing a subset of the table space's partitions
  • The table in the PBG table space has one or more LOB columns

If free space is not reestablished, the rows unloaded from a partition will almost certainly fit into the shadow data set during a REORG. [I say "almost" because for a table with varying-length rows (and rows in a compressed table space are varying-length even if there are no VARCHAR columns), I can imagine a very unusual scenario in which rows unloaded from a partition might not fit back in after being sorted in clustering key sequence, even with free space parameter values ignored. So, REORG_IGNORE_FREESPACE = YES should take chances of an out-of-space condition on REORG or a partition of a PBG tablespace way down, if not all the way to zero.]

Now, you might be thinking, "Why is there even the possibility of this out-of-space condition when one partition (or a subset of partitions) of a PBG table space is being reorganized? Can't REORG take rows that won't fit back into the target partition after reestablishing free space and put them in other partitions that have unused space?" The fact of the matter is that REORG doesn't work that way -- it doesn't move rows from one PBG partition to another unless the whole PBG table space is being REORGed.

As a coda, I'll tell you that the DBA to whom I've referred repeatedly in this entry ended up asking whether it was even worth it to reorganize partitions of the table in question. That is a very good question to ask. It's easy to get into a REORG-if-it's-unclustered mindset, but in fact some tablespaces don't need to be REORGed. The one with which the DBA was dealing gets plenty of inserts but is rarely read, and when it is read, performance just has to be OK -- it doesn't have to be great. CPU resources in such a case can be saved by cutting way back on REORG frequency (if not eliminating REORGs altogether) and optimizing insert performance either by switching to a continuously-ascending clustering key or by altering the table with the APPEND YES option introduced with DB2 9 (this causes DB2 to ignore clustering when inserting or loading data into the table). Bottom line: don't REORG a table space (or partitions thereof) "just because." Make sure that the benefits of reorganizing an object justify the CPU (and disk space) used during REORG processing. While keeping an object well organized is usually a good idea, in some situations disorganization really isn't a big deal. Think it through, and act accordingly.

28 comments:

  1. Excellent article, Robert! The staff at SSWUG.org has featured it on the site.

    We hope you continue to supply informative posts like this one!

    ReplyDelete
  2. Thanks for the positive feedback, Sam. I'll try to keep 'em coming.

    Robert

    ReplyDelete
  3. this was very useful since there isn't a lot of information on utilities and UTS.

    Dale

    ReplyDelete
  4. Glad it was a good read for you, Dale.

    Robert

    ReplyDelete
  5. Very useful article and it helped! Thank you

    ReplyDelete
  6. Happy to hear that the information was of use to you, Cuneyt.

    Robert

    ReplyDelete
  7. Robert,

    It would be very nice to be able to activate the APAR you mention above - REORG_IGNORE_FREESPACE (introduced last fall via the fix for APAR PK83397) as a PARM on the execute statement of the REORG.

    ReplyDelete
  8. That would be an attractive option. You might want to pass your functionality enhancement idea to your local IBM DB2 technical support rep. He or she could then enter the request into our product requirements application, and you'd get a response back from DB2 for z/OS development.

    Robert

    ReplyDelete
  9. Hi Robert,
    While performing reorg on one of our TS it failed with "RELOADING OF DATASET 'dsnxxx.xxxx.dbnam.tsnam.jooo1.a002' WILL EXCEED THE MAXIMUM SIZE OF ******* PAGES"

    and the control card it is using "PART 2".
    My observations are it is a PBG(Since Maxparts is there) and i can see currently 3 partitions for that TS.

    I reran the Reorg again with removing the "PART 2" from the control card and it was successful.
    Was this approach correct ?
    Please let me know why the Initial Reorg Failed.

    ReplyDelete
    Replies
    1. First of all, partition 3 would not have been added to the table space unless partition 2 had filled up. Assuming that there have not been many deletions of rows from partition 2, a partition-level REORG for that partition probably failed because with free space reestablished (assuming non-zero values for PCTFREE and/or FREEPAGE for the partition), the data would not fit back into the partition. When a partition-by-growth table space is REORGed in its entirety, DB2 can distribute rows across the table space's partitions so that all will fit, and if DB2 needs to add a partition to the table space to accommodate all rows plus reestablished free space, it will do so.

      Delete
  10. Hi Robert, first of all, great article!

    I was thinking, what if you reorg a UTS PGB with 5 partitions on part 4:5 where all partitions are on the verge of being full, you use PCTFREE 10 / FREEPAGE 0 / MAXPARTITIONS 10, and run on Db2 v11. Will this result in
    1) reorg fail due to the reorged data not being able to fit back into part 4 and/or 5.
    2) Db2 reorgs and recognize that the data wont fit and also that the current last existing partition is in the reorg and therefore adds a new partition for the overflowing data. (This would be cool but I guess it's just wishful thinking).

    Regards
    Johan

    ReplyDelete
    Replies
    1. Sorry about the delay in responding, Johan.

      In a Db2 11 for z/OS environment, I would expect the REORG job you've described to fail, unless you have the ZPARM parameter REORG_IGNORE_FREESPACE set to YES.

      In a Db2 12 system, the REORG job should complete successfully, with Db2 adding a new partition to the table space to hold rows that would not fit back into partitions 4 and 5 (with PCTFREE free space re-established); and, this success would not be dependent on one of the partitions targeted by the job being the table space's last partition. (in other words, given your example table space, a REORG PART 2:3 should complete successfully in a Db2 12 system, with a new partition 6 added to hold the rows that would not fit back into partitions 2 and 3 with free space re-established).

      Robert

      Delete
  11. Hi Rob,

    I see inline copies are taken after all the phases are completed in an reorg (both shrlevel change/none/ref) but in load I believe it is taken after reload phase.Why such difference? why inline copy is allowed only at the end/completion of all the phases
    ? Can you explain

    ReplyDelete
    Replies
    1. I'm not certain as to what you're asking. For both the LOAD and REORG utilities, inline image copies are created as part of the RELOAD phase of the utility.

      Robert

      Delete
  12. Hi Rob,I would like get understanding on the parm -maxro,longlog,delay.The website information states these keywords related to last iterations of log processing.But I would like to get your help to get clarity on this.

    ReplyDelete
    Replies
    1. MAXRO, which is relevant to REORG with SHRLEVEL CHANGE, specifies the limit on how long the target table space or partition can be in a read-only state while REORG does the last iteration of applying data changes picked up from the Db2 log - this is when the shadow data sets on which REORG has been operating get "synched-up" with the "original" data sets, from a data currency perspective. The larger the MAXRO value, the more likely it is that the next iteration of application of data changes picked up from the log (the process of "catching up" to the currency of data in the "original" data sets, which with SHRLEVEL CHANGE can be updated as REORG is executing) will be the last one, after which the SWITCH phase can execute and REORG completes. A smaller MAXRO value means that REORG is likely to run longer, as it waits for a time period when the rate of data changes for the original data sets drops to a level such that the last catch-up application of logged data changes can be completed within the specified MAXRO interval.

      LONGLOG tells REORG what to do if it determines that it's not making significant process in getting the shadow data sets caught up to the original data sets in terms of currency - in that case, REORG can continue processing, terminate or force completion by draining processes with write claims on the original data sets.

      DELAY is relevant if LONGLOG is set to TERM or DRAIN - if DELAY is n seconds, the termination or drain action will be taken in not less than n seconds from the time of the associated LONGLOG message.

      A fairly common approach is to execute a SHRLEVEL CHANGE online REORG with MAXRO DEFER and LONGLOG CONTINUE - that basically tells REORG, "keep doing iterations of catch-up processing until further notice." Then, at a time when data-change activity targeting the table space or partition(s) being REORGed drops off, an -ALTER UTILITY command is issued for the utility to change MAXRO from DEFER to a fairly large interval. The large MAXRO interval makes it likely that the next iteration of catch-up processIng will be the last one. As an alternative to submitting an -ALTER UTILITY command for this purpose, the REORG job can be submitted with a SWITCHTIME specification that essentially says, "at such-and-such time, or after the REORG job has been running for X amount of time, change MAXRO to the value specified by NEWMAXRO.

      Delete
  13. I se a PBR table,which has 25 parts defined .when I keep on adding part using "alter table add part" it allows it.How do I know what is the maximum part allowed in pbr?

    ReplyDelete
    Replies
    1. For a PBR table space that uses relative page numbering, the maximum number of partitions is 4096. For a PBR table space that uses absolute page numbering, the maximum number of partitions depends on the page size and the DSSIZE used for the table space. See the information in tables 1, 2, 3 and 4 on this page in the Db2 for z/OS online documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-alter-tablespace#db2z_sql_altertablespace__dlentry-maxtssize.

      Robert

      Delete
    2. thanks for quick response

      Delete
  14. Hello,I'm getting 00E40347 in an reorg job
    DSNU3350I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06:24746409 RECORDS SORTED, ESTIMATED 24746404, VARIATION 0 PERCENT
    DSNU3351I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: AVERAGE RECORD LENGTH PROCESSED 224, ESTIMATED 223, VARIATION 0
    DSNU3352I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: USED DFSORT
    DSNU3354I 343 12:36:10.84 DSNUGSRT - SORT TASK DA06: MEMORY BELOW THE BAR: OPTIMAL 24 MB, USED 15 MB
    DSNU017I 343 12:36:32.98 DSNUGSAT - UTILITY DATA BASE SERVICES MEMORY EXECUTION ABENDED, REASON=X'00E40347'
    CAUSE=X'00E40347'
    Per few sites it states it is due to space issue probably.I would like to know what correction can be made in my job so that it runs fine next time .Below is my control statement
    //REORG EXEC PGM=DSNUTILB
    //SYSPRINT DD SYSOUT=*
    //SYSUDUMP DD SYSOUT=*
    //SYSERR DD DSN=&&SYSERR,
    // DISP=(MOD,DELETE,CATLG),
    // SPACE=(CYL,(10,15)),
    // UNIT=SYSDA
    //SYSREC DD DSN=&&SYSREC,
    // DISP=(NEW,DELETE,DELETE),
    // UNIT=(SYSDA,9),
    // SPACE=(CYL,(5000,500),,,ROUND)
    //SYSUT1 DD DSN=&&SYSUT1,
    // DISP=(NEW,DELETE,DELETE),
    // UNIT=(SYSDA,59),
    // SPACE=(CYL,(5000,500),,,ROUND)
    //UTPRINT DD SYSOUT=*

    ReplyDelete
    Replies
    1. I can't help you here. That's a pretty non-specific error code ("a Db2 subtask terminated abnormally"). You'll probably need to open a case with IBM Support to get more specific cause and resolution information.

      Robert

      Delete
  15. Hello,
    I have below reorg card used in my job.But inspite of specifying "aux yes" .I see this message in job log.Why is this aux yes ignored?

    DSNU124I DSNURFIT - AUX YES SPECIFICATION IS IGNORED AND AUX NO IS IN EFFECT FOR CURRENT UTILITY
    EXECUTION

    TEMPLATE COPYDSN DSN DSNNAME
    UNIT CTAPE BUFNO 20 RETPD 7 VOLCNT(59)
    DISP (NEW,CATLG,DELETE) STACK YES TRTCH COMP

    LISTDEF TEMPDEF
    INCLUDE TABLESPACE TS1
    REORG TABLESPACE LIST TEMPDEF
    KEEPDICTIONARY
    NOSYSREC
    AUX YES
    SORTDEVT SYSDA SORTNUM 20
    COPYDDN COPYDSN
    TIMEOUT TERM MAXRO 5 DRAIN_WAIT 10 RETRY 10 RETRY_DELAY 5
    SHRLEVEL CHANGE FASTSWITCH YES

    ReplyDelete
    Replies
    1. If the table space in question is a partition-by-growth (PBG) table space, the issue is probably the STACK YES in your TEMPLATE statement. Refer to this page in the Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=tablespace-syntax-options-reorg-control-statement. In the blue "Restrictions" box under the description of the AUX keyword, you will see that AUX YES and STACK YES are not a compatible combination when REORG is executed for a PBG table space that has one or more LOB columns.

      Robert

      Delete
    2. But I found out that the tablespace that is reorged has no LOB's but due to existing job setup it had "aux yes" included.So eventhough tablespace did not have lob's associated still does the stack and aux combination restriction apply?

      Delete
    3. If there are no LOB columns in a table, AUX YES will be ignored because it has no meaning for a table space holding a table that has no LOB columns. AUX YES is only relevant if the base table space has one or more LOB table spaces associated with it, and no LOB columns means no LOB table spaces.

      Robert

      Delete
    4. So probably in this case," AUX YES SPECIFICATION IS IGNORED" message appeared in the log due to no LOb tablespaces unde base or because "stack yes" keyword? which takes precedance?

      Delete
    5. The fact that the table in the target table space has no LOB columns is the reason AUX YES is ignored.

      Robert

      Delete