The findings? The PBG-accessing jobs showed consistently better elapsed and CPU times versus the PBR-accessing jobs. That was not a result I would have expected. I've blogged before about reasons for favoring PBR over PBG for large Db2 tables, and some of those reasons are performance-related. Would I have been surprised by comparative results showing roughly equivalent performance for PBG-accessing and PBR-accessing jobs? Maybe not. But better performance for the PBG-accessing jobs? That had me scratching my head.
I asked the Db2 person at the user group meeting to send me Db2 performance monitor accounting "long" reports capturing activity for the PBR-accessing and PBG-accessing jobs, and he did that not long after getting back to his workplace. Analysis was focused on the package-level accounting data for a particular package that accessed only the table with the PBR and PBG variants (other packages associated with the batch jobs used for comparison accessed the PBR-and-PBG-variant table and several other related tables as well). Sure enough, for the "this table only" package the CPU time was about 6% higher for the PBR-housed table versus the PBG-housed table. Not a huge difference, but statistically significant and, as previously mentioned, unexpectedly favoring the PBG table space.
I started asking questions, looking for a possible explanation for the better performance result seen for the PBG-accessing package:
- Was the SEGSIZE specification the same for both table spaces (a larger SEGSIZE value can result in better CPU-efficiency for large-scale page scans)? Yes, same SEGSIZE for both table spaces.
- Was the PBR table space in fact universal, as opposed to being a non-universal table-controlled partitioned table space (I wanted to make sure that we were comparing universal to universal)? Yes, the PBR table space was indeed of the universal variety.
- Were the PBG-housed and PBR-housed tables clustered by the same key (clustering affects locality of reference for set-level query and update and delete operations)? Yes, both variants of the table had the same clustering key.
- Were there indexes defined on the same keys of both tables (obviously, indexes have a lot to do with query access paths, and they affect the CPU cost of insert and delete and some update operations)? For both the PBG-housed and the PBR-housed table, there was only one index, and it was defined on the same key.
- Were the PCTFREE and FREEPAGE specifications the same for the one index on the two table variants (a significant difference here could affect index GETPAGE counts)? Yes, for the one index defined on the same key of both table variants, the PCTFREE and FREEPAGE values were identical.
- Was the one index on the PBR-housed table a partitioned index or a non-partitioned index (NPI)? It was a partitioned index, and by the way the PBR table space had 318 partitions (me, on getting that piece of information: "Hmmmm").
OK, so what prompted my "Hmmmm" on learning the number of the PBR table space's partitions? That number got me thinking, "318 is a pretty substantial number of data sets. Depending on the number of data sets associated with the PBG table space, that could be an important differentiating factor between the two." Indeed, it turned out that the PBG table space had far fewer data sets than its PBR cousin. How could that influence CPU time for an application process? In this case, a clue came from a number in the aforementioned Db2 monitor accounting reports: the commit count for the batch process that accessed the table of interest was 204,004 over the 4-hour, 50-minute reporting interval (a little over 700 per minute).
What does that have to do with the job's CPU time? Well, when the RELEASE specification for a package is COMMIT (the default), any and all "parent" locks acquired in executing the package will be released every time the application process driving the package's execution issues a commit (the package itself is also released from the application process's thread at each commit). We tend to think of "parent" locks as being table space-level locks, but in fact when a table space is partitioned the parent locks are at the partition level. If the application process's SQL statements are such that a large number of partitions are accessed in each unit of work, that means a lot of partition-level locks are acquired within each unit of work and released at commit time. That parent lock acquisition and release cost can be somewhat higher still in a data sharing environment (and such was the case in the situation I'm describing), because a significant percentage of global locks tend to be of the parent type.
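To put rough numbers on that reasoning, here is a back-of-the-envelope sketch in Python. The commit count, reporting interval, and partition count are the figures cited above; the function names and the `partitions_touched_per_uow` parameter are illustrative assumptions of mine (the accounting reports did not say how many partitions each unit of work actually touched), so treat the result as an upper bound under a simplified model, not as a measurement.

```python
# Back-of-the-envelope model, not a Db2 measurement.
COMMITS = 204_004                 # commit count from the accounting report
INTERVAL_MINUTES = 4 * 60 + 50    # 4-hour, 50-minute reporting interval
PBR_PARTITIONS = 318              # partitions in the PBR table space


def commits_per_minute(commits: int, minutes: int) -> float:
    """Average commit rate over the reporting interval."""
    return commits / minutes


def parent_lock_requests(commits: int, partitions_touched_per_uow: int) -> int:
    """Partition-level lock acquisitions under RELEASE(COMMIT).

    Simplifying assumption: every unit of work acquires one parent lock per
    partition it touches, and all of those locks are released at commit.
    """
    return commits * partitions_touched_per_uow


if __name__ == "__main__":
    rate = commits_per_minute(COMMITS, INTERVAL_MINUTES)
    print(f"Commits per minute: {rate:.1f}")
    # Hypothetical worst case: every unit of work touches every partition.
    worst = parent_lock_requests(COMMITS, PBR_PARTITIONS)
    print(f"Upper bound on partition lock acquisitions: {worst:,}")
```

Under those assumptions the job averages roughly 703 commits per minute and, in the worst case, about 64.9 million partition-level lock acquisitions (each with a matching release) over the life of the job. The real figure depends on how many partitions a typical unit of work touches, which is exactly why the partition count matters.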
I relayed to the Db2 support person my thinking about the impact that the number of PBR versus PBG partitions might have on the application process's CPU time. He got together with the larger Db2 team at his site, and they decided to repartition the PBR table space in a way that dropped the number of partitions from 318 to 58, and voila - the application process's CPU time dropped to a level that was several percentage points below that seen for the PBG table space. Success!
Now, it's important to note that in this particular situation, repartitioning the PBR table space was not a problematic change. But suppose you have a situation like the one I've described here, and repartitioning your PBR table space is not a viable option. In that case, the in-Db2 CPU cost of an application process (likely a batch job) that accesses a large number of a table space's partitions within each unit of work could be reduced by rebinding the associated Db2 package(s) with RELEASE(DEALLOCATE), especially if the job issues a large number of commits, resulting in a large number of units of work. That package bind option would cause the partition-level locks acquired as the batch job progresses to be retained until thread deallocation time (i.e., until end-of-job). That, in turn, would eliminate the overhead of releasing many partition-level locks at each commit (locks that would likely be re-acquired within the next unit of work), thereby reducing CPU time (I posted an entry on RELEASE(DEALLOCATE) considerations to this blog a few years ago - the part under the heading "Batch" is most relevant to the blog entry you're reading now). [Note: table space- and partition-level locks are almost always of the "intent" variety. Such locks are non-exclusive in nature, so retaining them across commits should not be a concern from a concurrency perspective.]
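A toy Python model can make the difference between the two RELEASE options concrete. The commit count and the partition counts (318 and 58) are the ones from this case; the function `partition_lock_acquisitions` and its assumption that every unit of work touches the same set of partitions are hypothetical simplifications of mine, meant only to show the shape of the effect.

```python
# Toy model of parent-lock traffic; illustrative only, not a Db2 measurement.
COMMITS = 204_004  # commit count reported for the batch process


def partition_lock_acquisitions(commits: int, partitions_touched: int, release: str) -> int:
    """Estimate partition-level lock acquisitions over the life of one thread.

    Simplifying assumption: every unit of work touches the same
    `partitions_touched` partitions.
    """
    if release == "COMMIT":
        # Locks are released at each commit and re-acquired in the next unit of work.
        return commits * partitions_touched
    if release == "DEALLOCATE":
        # Locks are acquired once and held until thread deallocation.
        return partitions_touched
    raise ValueError(f"unknown RELEASE option: {release!r}")


if __name__ == "__main__":
    scenarios = [
        ("318 partitions, RELEASE(COMMIT)", 318, "COMMIT"),
        ("58 partitions, RELEASE(COMMIT)", 58, "COMMIT"),
        ("318 partitions, RELEASE(DEALLOCATE)", 318, "DEALLOCATE"),
    ]
    for label, partitions, release in scenarios:
        count = partition_lock_acquisitions(COMMITS, partitions, release)
        print(f"{label}: {count:,}")
```

In this worst-case model, cutting the partition count from 318 to 58 shrinks the lock traffic from about 64.9 million acquisitions to about 11.8 million, while RELEASE(DEALLOCATE) shrinks it to 318 regardless of the commit count. The model says nothing about the CPU cost of each lock request, so it indicates direction rather than the size of the saving.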
The bottom-line message here is NOT that you should always go for fewer rather than more partitions for a partitioned table space; rather, it is that the number of a table space's partitions can be a factor that affects CPU efficiency for some applications. A type of application for which this can matter is one that a) uses RELEASE(COMMIT) packages, b) is characterized by many units of work in the life of a thread, and c) accesses a large percentage of a table space's partitions in a typical unit of work. For such an application, CPU efficiency could potentially be improved by going with a smaller rather than a larger number of partitions for the target table space. When significant partition reduction is not an option for a table space, CPU time for an application process such as the one described in this blog entry could potentially be reduced by rebinding the application's Db2 packages with the RELEASE(DEALLOCATE) option.