Wednesday, May 13, 2015

For a Large DB2 for z/OS Table, Should You Go With Partition-by-Range or Partition-by-Growth?

There are several aspects to this question: What do I mean by "large?" Is the table in question new, or does it exist already? What is the nature of the data in the table, and how will that data be accessed and maintained? I'll try to cover these various angles in this blog entry, and I hope that you will find the information provided to be useful.

What is a "large" table?

Why even ask this question? Because the relevance of the partition-by-range (PBR) versus partition-by-growth (PBG) question is largely dependent on table size. If a table is relatively small, the question is probably moot because it is unlikely that range-partitioning a smaller table will deliver much value. Partitioning by growth would, in that case, be the logical choice (for many smaller tables, given the default DSSIZE of 4G, a PBG table space will never grow beyond a single partition).

OK, so what is "smaller" and what is "larger" when you're talking about a DB2 for z/OS table? There is, of course, no hard and fast rule here. In my mind, a larger DB2 for z/OS table is one that has 1 million or more rows. That's not to say that a table with fewer than 1 million rows would never be range-partitioned -- it's just that the benefits of range-partitioning are likely to be more appealing for a table that holds (or will hold) millions of rows (or more).

When the table in question is a new one

This, to me, is the most interesting scenario, because it is the one in which the options are really wide open. I'll start be saying that you definitely want to go with a universal table space here, primarily because a number of recently delivered DB2 features and functions require the use of universal table spaces. But should the table space be PBR or PBG? A partition-by-growth table space can be as large as a partition-by-range table space, so that's not a differentiator. What, then, would be your criteria?

To me, the appeal of a PBG table space is mostly a factor of it being a labor-saving device for DB2 for z/OS DBAs. PBG table spaces have an almost "set it and forget it" quality. There is no need to identify a partitioning key, no need to determine partition limit key values, no worries about one partition getting to be much larger than others in a table space. You just choose reasonable DSSIZE and MAXPARTITION values, and you're pretty much done -- you might check back on the table space once in a while, to see if the MAXPARTITION value should be bumped up, but that's about it. Pretty sweet deal if you're a DBA.

On the other hand, PBR can deliver some unique benefits, and these should not be dismissed out of hand. Specifically:
  1. A PBR table space provides maximum partition independence from a utility perspective. You can even run the LOAD utility at the partition level for PBR table space -- something you can't do with a PBG table space. You can also create data-partitioned secondary indexes (DPSIs) on a PBR table space (not do-able for a PBG table space), and that REALLY maximizes utility-related partition independence (though it should be noted that DPSIs can negatively impact the performance of queries that do not reference a PBR table space's partitioning key).
  2. PBR table spaces enable the use of page-range screening, a technique whereby the DB2 for z/OS optimizer can limit the partitions that have to be scanned to generate a result set when a query has a predicate that references a range-partitioned table space's partitioning key (or at least the lead column or columns thereof). Page-range screening doesn't apply to PBG table spaces, because a particular row in such a table space could be in any of the table space's partitions.
  3. A PBR table space can be a great choice for a table that would be effectively partitioned on a time-period basis. Suppose, for example, that the rows most recently inserted into a table are those most likely to be retrieved from the table. In that case, date-based partitioning (e.g., having each partition hold data for a particular week) would have the effect of concentrating a table's most "popular" rows in the pages of the most current partition(s), thereby reducing GETPAGE activity associated with retaining sets of these rows. Date-based partitioning also enables very efficient purging of a partition's data (when the purge criterion is age-of-data) via a partition-level LOAD REPLACE operation with a dummy input data set (the partition's data could be first unloaded and archived, if desired).
  4. A PBR table space tends to maximize the effectiveness of parallel processing, whether of the DB2-driven query parallelization variety or in the form of user-managed parallel batch jobs. This optimization of parallel processing can be particularly pronounced for joins of tables that are partitioned on the same key and by the same limit key values.
Those are some attractive benefits, I'd say. Still, the previously mentioned DBA labor-saving advantages of PBG table spaces are not unimportant. That being the case, this is my recommendation when it comes to evaluating PBR versus PBG for a large, new table: consider first whether the advantages of PBR, listed above, are of significant value for the table in question. If they are, lean towards the PBR option. If they are not, PBG could be the right choice for the table's table space. In particular, PBG can make sense for a large table for which access will be mostly through transactions (as opposed to batch jobs), especially if those transactions will retrieve small result sets via queries for which most row filtering will occur at the index level. In that case, the advantages of range-partitioning could be of limited value.

When the table space in question is an existing one

Here, the assumption is that the table space is not currently of the universal type. When that is true, and the aim is (as it should be) to convert the table space from non-universal to universal, the PBR-or-PBG decision will usually be pretty straightforward and will be based on the easiest path to universal: you'll go with universal PBR for an existing non-universal range-partitioned table space (if it is a table-controlled, versus an index-controlled, partitioned table space), because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a SEGSIZE for the table space) followed by an online REORG (if you are have DB2 10 running in new-function mode, or DB2 11). Similarly, for an existing non-partitioned table space (segmented or simple, as long as it contains only one table), you'll go with universal PBG because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a MAXPARTITIONS value for the table space) followed by an online REORG (again, if your DB2 environment is Version 10 in new-function mode, or DB2 11).

I recently encountered an exception to this rule: if you have a non-universal, range-partitioned table space, with almost all of the data in the last of the table space's partitions (something that could happen, depending on how partition limit keys were initially set), you might decide not to go for the non-disruptive change to universal PBR, because then you'd have a PBR table space with almost all of the data in the last of the table space's partitions. Yes, with enough ALTER TABLE ALTER PARTITION actions, you could get the table's rows to be spread across many partitions (and with DB2 11, alteration of partition limit key values is a non-disruptive change), but that would involve a lot of work. You might in that case just opt to go to a PBG table space through an unload/drop/re-create/re-load process.

To sum things up: PBR and PBG have their respective advantages and disadvantages. In choosing between these flavors of universal table space, the most important thing is to put some thought into your decision. Give careful consideration to what PBR might deliver for a table, and think also of how useful PBG might be for the same table. If you weigh your options, the decision at which you ultimately arrive will likely be the right one.

11 comments:

  1. With regard to partitioning smaller tables -- this might also be done (PGR here nand not PBG) to promote parallell processing in batch, running multiple jobs in batch in parallell. More of an operational consideration.

    ReplyDelete
  2. The first question I ask when determining PBG vs. PBR is whether there is a good partitioning key. Sadly where I work the data modellers are surrogate key happy with that key based upon some monotonic value such as a sequence. These do not make good partitioning keys.

    Michael Harper/TD Bank

    ReplyDelete
    Replies
    1. That's one of the great things about partition-by-growth table spaces (and apologies for the late response): they extend many of the benefits of partitioned table spaces (data capacity, portion-level utility execution, etc.) to tables that do not have a good partitioning key.

      Robert

      Delete
  3. So lets say you have a table that is PBG (no good partitioning key for PBR) and you anticipate the data will occupy 50 - 60 Gig. Would you recommend allocating the tablespace with DSSIZE of 64G, so that all the data fits in one dataset? Or would you allocate something smaller, say DSSIZE of 4G and 16 or more parts to try and get some parallelism benefits (or maybe some other reason I haven't thought of)? Of course I'm anticipating a "depends" answer :-)

    ReplyDelete
    Replies
    1. Apologies for the delay in responding, Jim.

      That's a good question. I suppose it would depend (as you anticipated) on a number of factors. Some thoughts that come to my mind:
      * One factor, as you mentioned, could be the desire (or not) for parallelism (of the query variety or the utility variety). More partitions would up the parallelism factor, but that may or may not be of value to your organization.
      * Another factor could be the particulars of the disk subsystem in which the table would be located. At some sites, not all of the mainframe-attached volumes are set up to be able to hold extra-large (greater than 4 GB) data sets. At such a site, 4G data sets might make more of the disk volumes available for holding the table space's partitions.
      * Another disk-related factor would be average utilization of mainframe-attached volumes. At some organizations there is a desire to drive disk volume utilization to very high levels. That can mean small chunks of available space on volumes, and THAT could conceivably lead to a situation in which extension of a really big (e.g., 64G) data set fails due to hitting the maximum number of volumes across which a data set can be spread (I believe that this limit is 57 or something like that). It would be easier to go with a larger data set size at a site at which disk volume utilization is not driven to a really high level (moderately high disk space utilization, versus very high utilization, might mean spending a little more on storage, but it provides greater operational flexibility).
      * MAYBE the DB2 for z/OS DSMAX limit could be a consideration. You'd think that this wouldn't normally be the case, as DSMAX can be up to 200,000, but of course that's a theoretical limit. Below-the-bar virtual storage can be in shorter supply at some sites than at others, and that might require a not-so-big DSMAX value, and if that value is already being bumped up against with some frequency (resulting in physical data set close activity, which can be a drag on performance when these data sets later have to be re-opened), you might opt to have fewer, larger data sets.

      That's about all I can think of at this time. Hope this helps.

      Robert

      Delete
  4. HI Robert,

    We have a LOB table which is nearing the maximum capacity of 1 TB space. The base tablespace is defined as simple. The DB2 version is 11. Still we are in process of converting the classic partition and segmented and simple to UTS. Please clarify below things

    1) In order to reclaim the space of the deleted LOBS does reorg with AUX YES is required?
    2) If we want to convert the table to UTS ,which type of UTS will be suitable. (in our case i believe we can only convert it to PBG as the base tablespace is simple).
    3) If we convert to PBG do we face any performance issues
    4) If want to convert to PBR ,is it possible to do without drop and recreate the table. As the table is gigantic we are not ready to drop and recreate the table
    5) While converting the table to UTS ,do we need issue the alter maxpart and dssize command for both base and auxialiary table

    ReplyDelete
    Replies
    1. I apologize for the delay in responding - recently back from a vacation.

      Regarding your questions:

      1) AUX YES would be specified only if you wanted to reorganize the base and LOB table spaces together with a single REORG job. If you just want to REORG the LOB table space and reclaim space therein, execute REORG TABLESPACE just for the LOB table space in question, by specifying the name of that table space (qualified, of course, by the associated database name) in the REORG TABLESPACE utility control statement.

      2) You're right: to go from simple (or traditional segmented) in an online way, your only universal table space choice is partition-by-growth (PBG). Going from simple to universal partition-by-range (PBR) would require unload/drop/re-create/reload. With PBG, it's just ALTER TABLESPACE with a MAXPARTITIONS specification, followed by an online REORG to materialize the pending change.

      3) Nope. Performance should not be an issue.

      4) As pointed out in my response to question 2, above, at this time there is no online way to get from simple to universal PBR. Drop and re-create would be necessary.

      5) Nope. In fact, those options of ALTER TABLESPACE (and CREATE TABLESPACE) are not relevant for a LOB table space. If your new PBG table space goes to multiple partitions (that will depend on the amount of data in the base table space and on the DSSIZE value specified), Db2 will implicitly create the LOB-related objects (LOB table space, auxiliary table, index) needed for each partition past the first one.

      Robert

      Delete
  5. Hi Robert,
    We have a very large table (over 2 Billion rows now) that was originally in a segmented tablespace. We converted it to UTS PBG a couple years ago, but I don't know that PBG is the best for this table, and here's why.
    The primary key is the account number, and records are mostly inserted and rarely deleted from the table. Inserts could be anywhere in the table (not a continuously increasing key). As time goes by, partitions fill up and DB2 creates a new partition, which is great for availability. But what about maintenance and performance over the long term? If I reorg a partition (or several partitions) to get the rows in key sequence and to add back freespace, the rows do not all fit and DB2 creates a new partition for the overflow. So let's say I reorg parts 1-10 and during the reorg DB2 added part 35. After the reorg I now I have partitions 1-10 in key sequence, then it skips to part 35, then back to part 11, 12, etc.... Over time, data can be all over the place. The table is too large to reorg the entire thing at once to get all of the data back in key sequence. And I don't know how to tell what keys are in what partitions since with PBG I can't specify the partition number on an unload...
    I'm considering changing to a UTS PBR but that would be an unload, drop, create, load.... (db2 v12)
    Anything I'm missing? I think I saw where in DB2 v13 that there may be some relief on the change from PBG to PBR but v13 is probably a long way off for us...

    Thanks for your help,
    Steve

    ReplyDelete
    Replies
    1. Actually, the data clustering situation is worse than you think: I'm pretty sure that rows are moved from one partition of a PBG table space to another ONLY when the table space is REORG-ed in its entirety. That being the case, in your example of REORG-ing partitions 1-10 of a PBG table space the utility will NOT re-establish clustering for those 10 partitions as a whole - instead, rows will be re-clustered only WITHIN the partitions in which they were located prior to executing the utility. In other words, a row located in partition 10 that "should" (per its clustering key value) be in partition 1 will still be in partition 10 after the REORG of partitions 1-10 completes.

      Yes, Db2 13 helps out here by enabling ONLINE migration of a table space from PBG to PBR (see the blog entry at http://robertsdb2blog.blogspot.com/2022/06/db2-for-zos-online-path-from-partition.html). This requires a full-table space REORG.

      You indicate that "the table is too large to reorg the entire thing at once." How is that? What would actually make it not-possible for you to do a full REORG of the table space in your environment?

      Robert

      Delete
  6. Hi Robert,
    Thanks for the quick reply. Yes that clustering situation does sound worse than I thought.

    Regarding my inability to reorg the entire tablespace at once, I guess I need to revisit that again. It seems it would take huge amount of space to sort 2 billion rows of data, and if I remember correctly that sort space all has to be on DASD, and cannot be on VTS or native tape...
    But looking at the manual just now I see the option to specify SORTDATA NO, so the Reorg utility will unload the data in the order of the clustering index and not have to sort all that data. As time permits I will try to replicate the table in a test environment and run a full tablespace reorg.

    Thanks, as always, for the great articles and feedback.

    Steve

    ReplyDelete