Robert's Db2 blog: For a Large DB2 for z/OS Table, Should You Go With Partition-by-Range or Partition-by-Growth?

Wednesday, May 13, 2015


There are several aspects to this question: What do I mean by "large?" Is the table in question new, or does it exist already? What is the nature of the data in the table, and how will that data be accessed and maintained? I'll try to cover these various angles in this blog entry, and I hope that you will find the information provided to be useful.

What is a "large" table?

Why even ask this question? Because the relevance of the partition-by-range (PBR) versus partition-by-growth (PBG) question is largely dependent on table size. If a table is relatively small, the question is probably moot because it is unlikely that range-partitioning a smaller table will deliver much value. Partitioning by growth would, in that case, be the logical choice (for many smaller tables, given the default DSSIZE of 4G, a PBG table space will never grow beyond a single partition).
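The effect of DSSIZE on the number of partitions is simple arithmetic. The Python sketch below estimates how many partitions a PBG table space would occupy for a given data volume (in GB). The function name is hypothetical and is not a Db2 interface. The sketch assumes that each partition fills all the way to DSSIZE before the next one is added. Free space and other factors push real-world numbers somewhat higher.

```python
import math

def pbg_partitions_needed(data_gb: float, dssize_gb: int) -> int:
    """Rough count of PBG partitions needed to hold data_gb of table data.

    Simplifying assumption: each partition fills all the way to DSSIZE
    before Db2 adds the next one, so the result is a lower bound.
    """
    if data_gb <= 0:
        return 1  # assume the table space starts with a single partition
    return math.ceil(data_gb / dssize_gb)

print(pbg_partitions_needed(1.5, 4))  # 1: a smaller table stays in one partition
print(pbg_partitions_needed(60, 4))   # 15
print(pbg_partitions_needed(60, 64))  # 1
```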

OK, so what is "smaller" and what is "larger" when you're talking about a DB2 for z/OS table? There is, of course, no hard and fast rule here. In my mind, a larger DB2 for z/OS table is one that has 1 million or more rows. That's not to say that a table with fewer than 1 million rows would never be range-partitioned -- it's just that the benefits of range-partitioning are likely to be more appealing for a table that holds (or will hold) millions of rows (or more).

When the table in question is a new one

This, to me, is the most interesting scenario, because it is the one in which the options are really wide open. I'll start by saying that you definitely want to go with a universal table space here, primarily because a number of recently delivered DB2 features and functions require the use of universal table spaces. But should the table space be PBR or PBG? A partition-by-growth table space can be as large as a partition-by-range table space, so that's not a differentiator. What, then, would be your criteria?

To me, the appeal of a PBG table space stems mostly from its being a labor-saving device for DB2 for z/OS DBAs. PBG table spaces have an almost "set it and forget it" quality. There is no need to identify a partitioning key, no need to determine partition limit key values, no worries about one partition getting to be much larger than others in a table space. You just choose reasonable DSSIZE and MAXPARTITIONS values, and you're pretty much done -- you might check back on the table space once in a while, to see if the MAXPARTITIONS value should be bumped up, but that's about it. Pretty sweet deal if you're a DBA.
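The occasional check-up amounts to comparing the data volume against the table space's ceiling. For a PBG table space, that ceiling is DSSIZE multiplied by MAXPARTITIONS. Below is a minimal Python sketch. The function name is hypothetical, and the 80% alert threshold is arbitrary; it is not a Db2 recommendation.

```python
def pbg_capacity_check(data_gb: float, dssize_gb: int, maxpartitions: int,
                       alert_at: float = 0.8):
    """Return (fraction of maximum capacity in use, whether to act).

    Maximum PBG capacity is modeled as DSSIZE x MAXPARTITIONS. The 80%
    alert_at default is an arbitrary, hypothetical threshold.
    """
    capacity_gb = dssize_gb * maxpartitions
    used = data_gb / capacity_gb
    return used, used >= alert_at

print(pbg_capacity_check(34, 4, 10))  # (0.85, True): time to raise MAXPARTITIONS
print(pbg_capacity_check(10, 4, 10))  # (0.25, False)
```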

On the other hand, PBR can deliver some unique benefits, and these should not be dismissed out of hand. Specifically:

A PBR table space provides maximum partition independence from a utility perspective. You can even run the LOAD utility at the partition level for a PBR table space -- something you can't do with a PBG table space. You can also create data-partitioned secondary indexes (DPSIs) on a PBR table space (not doable for a PBG table space), and that REALLY maximizes utility-related partition independence (though it should be noted that DPSIs can negatively impact the performance of queries that do not reference a PBR table space's partitioning key).
PBR table spaces enable the use of page-range screening, a technique whereby the DB2 for z/OS optimizer can limit the partitions that have to be scanned to generate a result set when a query has a predicate that references a range-partitioned table space's partitioning key (or at least the lead column or columns thereof). Page-range screening doesn't apply to PBG table spaces, because a particular row in such a table space could be in any of the table space's partitions.
A PBR table space can be a great choice for a table that would be effectively partitioned on a time-period basis. Suppose, for example, that the rows most recently inserted into a table are those most likely to be retrieved from the table. In that case, date-based partitioning (e.g., having each partition hold data for a particular week) would have the effect of concentrating a table's most "popular" rows in the pages of the most current partition(s), thereby reducing GETPAGE activity associated with retrieving sets of these rows. Date-based partitioning also enables very efficient purging of a partition's data (when the purge criterion is age-of-data) via a partition-level LOAD REPLACE operation with a dummy input data set (the partition's data could be first unloaded and archived, if desired).
A PBR table space tends to maximize the effectiveness of parallel processing, whether of the DB2-driven query parallelization variety or in the form of user-managed parallel batch jobs. This optimization of parallel processing can be particularly pronounced for joins of tables that are partitioned on the same key and by the same limit key values.
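Page-range screening and date-based partitioning can be made concrete with a small Python model. This is not Db2 code, and the function names and weekly scheme are hypothetical. The model assumes ascending limit keys: each partition holds keys greater than the previous partition's limit key and less than or equal to its own. A binary search then finds the partitions a range predicate must touch, whereas a PBG table space offers no such shortcut. The same function works on date keys, which shows why a predicate on recent dates touches only the most current partitions of a weekly-partitioned table. It also shows how the oldest weeks can be identified for archiving and purging.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import List, Sequence

def pbr_partitions_to_scan(limit_keys: Sequence, low, high) -> List[int]:
    """1-based partitions that a predicate 'low <= key <= high' must touch.

    Assumes ascending limit keys: partition i holds keys greater than the
    limit key of partition i-1 and <= its own limit key. Keys above the
    last limit key are assumed not to exist.
    """
    if low > high:
        return []
    first = bisect_left(limit_keys, low)  # index of the partition holding `low`
    if first == len(limit_keys):          # `low` is above every limit key
        return []
    last = min(bisect_left(limit_keys, high), len(limit_keys) - 1)
    return list(range(first + 1, last + 2))

def pbg_partitions_to_scan(num_partitions: int) -> List[int]:
    """PBG: a row could be in any partition, so none can be skipped."""
    return list(range(1, num_partitions + 1))

def weekly_limit_keys(first_day: date, weeks: int) -> List[date]:
    """Hypothetical scheme: partition n holds the 7 days starting at
    first_day + 7*(n-1), so its limit key is the last of those days."""
    return [first_day + timedelta(days=7 * n - 1) for n in range(1, weeks + 1)]

def partitions_to_purge(limit_keys: Sequence, today: date,
                        keep_weeks: int) -> List[int]:
    """Partitions older than the `keep_weeks` most recent weeks (current
    week included): candidates for unload/archive, then purge."""
    current = min(bisect_left(limit_keys, today), len(limit_keys) - 1) + 1
    return list(range(1, max(current - keep_weeks, 0) + 1))

limits = [100, 200, 300, 400]
print(pbr_partitions_to_scan(limits, 150, 250))  # [2, 3]
print(pbg_partitions_to_scan(len(limits)))       # [1, 2, 3, 4]

weeks = weekly_limit_keys(date(2015, 1, 5), 20)
print(pbr_partitions_to_scan(weeks, date(2015, 5, 1), date(2015, 5, 13)))  # [17, 18, 19]
print(partitions_to_purge(weeks, date(2015, 5, 13), keep_weeks=4))  # partitions 1 through 15
```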
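The parallelism point for joins can be illustrated with a toy model. Suppose two tables are range-partitioned on the same key with the same limit key values. A row in partition i of one table can then only match rows in partition i of the other, so each pair of partitions is an independent unit of work. The Python sketch below joins each pair in a thread pool, using in-memory lists as stand-ins for partitions. All names are hypothetical. It shows the shape of the work, not how Db2 actually schedules parallel tasks, and Python threads will not speed up CPU-bound work because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]       # (join key, payload) in this toy model
Partition = List[Row]

def join_partition(pair: Tuple[Partition, Partition]) -> List[Tuple[int, str, str]]:
    """Hash-join one pair of co-located partitions on their shared key."""
    left, right = pair
    by_key: Dict[int, List[str]] = {}
    for key, payload in left:
        by_key.setdefault(key, []).append(payload)
    return [(key, lpay, rpay)
            for key, rpay in right
            for lpay in by_key.get(key, [])]

def partition_wise_join(parts_a: Sequence[Partition], parts_b: Sequence[Partition],
                        workers: int = 4) -> List[Tuple[int, str, str]]:
    """Join two tables partitioned on the same key with the same limit keys.

    Partition i of A can only match partition i of B, so each pair is an
    independent task that can run concurrently with the others.
    """
    if len(parts_a) != len(parts_b):
        raise ValueError("tables must have the same number of partitions")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(join_partition, zip(parts_a, parts_b)))
    return [row for part in results for row in part]

orders = [[(10, "o1"), (20, "o2")], [(150, "o3")]]       # limit keys 100, 200
customers = [[(10, "Ann"), (20, "Bob")], [(150, "Cy")]]  # same limit keys
print(partition_wise_join(orders, customers))
# [(10, 'o1', 'Ann'), (20, 'o2', 'Bob'), (150, 'o3', 'Cy')]
```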

Those are some attractive benefits, I'd say. Still, the previously mentioned DBA labor-saving advantages of PBG table spaces are not unimportant. That being the case, this is my recommendation when it comes to evaluating PBR versus PBG for a large, new table: consider first whether the advantages of PBR, listed above, are of significant value for the table in question. If they are, lean towards the PBR option. If they are not, PBG could be the right choice for the table's table space. In particular, PBG can make sense for a large table for which access will be mostly through transactions (as opposed to batch jobs), especially if those transactions will retrieve small result sets via queries for which most row filtering will occur at the index level. In that case, the advantages of range-partitioning could be of limited value.

When the table space in question is an existing one

Here, the assumption is that the table space is not currently of the universal type. When that is true, and the aim is (as it should be) to convert the table space from non-universal to universal, the PBR-or-PBG decision will usually be pretty straightforward and will be based on the easiest path to universal. For an existing non-universal range-partitioned table space (if it is a table-controlled, versus an index-controlled, partitioned table space), you'll go with universal PBR, because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a SEGSIZE for the table space) followed by an online REORG (if you have DB2 10 running in new-function mode, or DB2 11). Similarly, for an existing non-partitioned table space (segmented or simple, as long as it contains only one table), you'll go with universal PBG, because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a MAXPARTITIONS value for the table space) followed by an online REORG (again, if your DB2 environment is Version 10 in new-function mode, or DB2 11).
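The conversion rules in the preceding paragraph can be summarized as a small decision function. This Python sketch uses hypothetical labels for the table space kinds and illustrative default values (SEGSIZE 32, MAXPARTITIONS 10). It only produces the pending ALTER TABLESPACE statement. In DB2 10 new-function mode or DB2 11, an online REORG would then materialize that change. Cases the paragraph does not cover (index-controlled partitioning, multi-table table spaces) are simply rejected.

```python
from typing import Tuple

def uts_conversion(db: str, ts: str, kind: str, tables: int = 1,
                   table_controlled: bool = True, segsize: int = 32,
                   maxpartitions: int = 10) -> Tuple[str, str]:
    """Pick the target UTS type and the pending ALTER for an existing table space.

    kind is a hypothetical label: "partitioned", "segmented" or "simple".
    The ALTER is a pending change; an online REORG materializes it.
    """
    if kind == "partitioned":
        if not table_controlled:
            raise ValueError("index-controlled partitioning is not covered by this path")
        return "PBR", f"ALTER TABLESPACE {db}.{ts} SEGSIZE {segsize}"
    if kind in ("segmented", "simple"):
        if tables != 1:
            raise ValueError("multi-table table spaces are not covered by this path")
        return "PBG", f"ALTER TABLESPACE {db}.{ts} MAXPARTITIONS {maxpartitions}"
    raise ValueError(f"unknown table space kind: {kind!r}")

print(uts_conversion("DB1", "TS1", "partitioned"))
# ('PBR', 'ALTER TABLESPACE DB1.TS1 SEGSIZE 32')
print(uts_conversion("DB1", "TS2", "simple"))
# ('PBG', 'ALTER TABLESPACE DB1.TS2 MAXPARTITIONS 10')
```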

I recently encountered an exception to this rule. Suppose you have a non-universal, range-partitioned table space with almost all of the data in the last of its partitions (something that could happen, depending on how the partition limit keys were initially set). You might decide not to go for the non-disruptive change to universal PBR, because the resulting PBR table space would have the same lopsided distribution of data. Yes, with enough ALTER TABLE ALTER PARTITION actions, you could get the table's rows spread across many partitions (and with DB2 11, alteration of partition limit key values is a non-disruptive change), but that would involve a lot of work. In that case, you might just opt to go to a PBG table space through an unload/drop/re-create/re-load process.
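Before choosing between those paths, it may help to quantify how lopsided the existing partitions are. Here is a Python sketch. The per-partition row counts would have to come from catalog or real-time statistics. The 90% threshold is an arbitrary, hypothetical cutoff, not a Db2 rule.

```python
from typing import Sequence

def last_partition_share(rows_per_partition: Sequence[int]) -> float:
    """Fraction of all rows that sit in the last partition."""
    total = sum(rows_per_partition)
    return rows_per_partition[-1] / total if total else 0.0

def looks_skewed(rows_per_partition: Sequence[int], threshold: float = 0.9) -> bool:
    """Hypothetical rule of thumb: flag when the last partition holds
    more than `threshold` of all rows."""
    return last_partition_share(rows_per_partition) > threshold

counts = [1_000, 800, 1_200, 2_000_000]
print(round(last_partition_share(counts), 4))  # 0.9985
print(looks_skewed(counts))                    # True
```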

To sum things up: PBR and PBG have their respective advantages and disadvantages. In choosing between these flavors of universal table space, the most important thing is to put some thought into your decision. Give careful consideration to what PBR might deliver for a table, and think also of how useful PBG might be for the same table. If you weigh your options, the decision at which you ultimately arrive will likely be the right one.

11 comments:

Anonymous, May 14, 2015 at 7:54 AM
With regard to partitioning smaller tables: this might also be done (PBR here, and not PBG) to promote parallel processing in batch, running multiple batch jobs in parallel. More of an operational consideration.
Anonymous, May 20, 2015 at 9:43 AM
The first question I ask when determining PBG vs. PBR is whether there is a good partitioning key. Sadly, where I work the data modellers are surrogate-key happy, with that key based on some monotonically increasing value such as a sequence. These do not make good partitioning keys.

Michael Harper/TD Bank
Unknown, November 24, 2015 at 2:06 PM
So let's say you have a table that is PBG (no good partitioning key for PBR) and you anticipate the data will occupy 50-60 GB. Would you recommend allocating the table space with a DSSIZE of 64G, so that all the data fits in one data set? Or would you allocate something smaller, say a DSSIZE of 4G and 16 or more partitions, to try to get some parallelism benefits (or for some other reason I haven't thought of)? Of course I'm anticipating an "it depends" answer :-)
Anonymous, May 17, 2021 at 4:36 AM
Hi Robert,

We have a LOB table that is nearing its maximum capacity of 1 TB. The base table space is defined as simple, and the DB2 version is 11. We are still in the process of converting our classic partitioned, segmented, and simple table spaces to UTS. Please clarify the following:

1) To reclaim the space of deleted LOBs, is a REORG with AUX YES required?
2) If we want to convert the table to UTS, which type of UTS would be suitable? (In our case I believe we can only convert it to PBG, as the base table space is simple.)
3) If we convert to PBG, will we face any performance issues?
4) If we want to convert to PBR, is it possible to do so without dropping and re-creating the table? The table is gigantic, so we are not ready to drop and re-create it.
5) While converting the table to UTS, do we need to issue ALTER with MAXPARTITIONS and DSSIZE for both the base and the auxiliary table?
Steve, November 15, 2022 at 11:12 AM
Hi Robert,
We have a very large table (over 2 billion rows now) that was originally in a segmented table space. We converted it to UTS PBG a couple of years ago, but I'm not sure PBG is the best choice for this table, and here's why.
The primary key is the account number, and rows are mostly inserted and rarely deleted. Inserts could land anywhere in the table (the key is not continuously increasing). As time goes by, partitions fill up and DB2 creates a new partition, which is great for availability. But what about maintenance and performance over the long term? If I reorg a partition (or several partitions) to get the rows back in key sequence and to restore free space, the rows do not all fit, and DB2 creates a new partition for the overflow. So let's say I reorg parts 1-10 and during the reorg DB2 adds part 35. After the reorg I have partitions 1-10 in key sequence, then it skips to part 35, then back to parts 11, 12, etc. Over time, data can be all over the place. The table is too large to reorg in its entirety to get all of the data back in key sequence. And I don't know how to tell which keys are in which partitions, since with PBG I can't specify the partition number on an unload...
I'm considering changing to UTS PBR, but that would mean an unload, drop, create, and load (Db2 12).
Anything I'm missing? I think I saw that Db2 13 may offer some relief for the change from PBG to PBR, but Db2 13 is probably a long way off for us...

Thanks for your help,
Steve
Steve, November 15, 2022 at 3:06 PM
Hi Robert,
Thanks for the quick reply. Yes, that clustering situation does sound worse than I thought.

Regarding my inability to reorg the entire table space at once, I guess I need to revisit that. It seems it would take a huge amount of space to sort 2 billion rows of data, and if I remember correctly, that sort space all has to be on DASD and cannot be on VTS or native tape...
But looking at the manual just now, I see the option to specify SORTDATA NO, so the REORG utility will unload the data in the order of the clustering index and not have to sort all that data. As time permits, I will try to replicate the table in a test environment and run a full table space reorg.

Thanks, as always, for the great articles and feedback.

Steve