Wednesday, December 21, 2022

Db2 13 for z/OS: Setting Lock Timeout Limit and Deadlock Priority at the Application Level

Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.

Application-level lock timeout

First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. How can those disparate concerns be addressed?

They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:

  • The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value of -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained immediately (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).
  • If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.
  • If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.
  • The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, and not just for DDF-using applications (more information on this is provided below).
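To make this concrete, here is a minimal sketch of how the two developers described earlier might set their respective limits (the statements use the standard SET CURRENT LOCK TIMEOUT syntax; the particular values are just illustrative):

```sql
-- Online transaction: give up quickly rather than keep a mobile user waiting
SET CURRENT LOCK TIMEOUT = 3;

-- Long-running batch job: wait indefinitely rather than be timed out
SET CURRENT LOCK TIMEOUT = -1;

-- "Conditional lock request" behavior: fail immediately if a lock
-- can't be obtained right away
SET CURRENT LOCK TIMEOUT = 0;
```

Each process sets the special register for itself, so the online application and the batch job can coexist on the same Db2 subsystem with very different timeout behaviors - which is exactly the point of the enhancement.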

Application-level deadlock priority

A deadlock, of course, happens when process A holds a lock that process B needs in order to proceed, and process B holds a lock that process A needs in order to proceed. With both processes in a not-able-to-proceed state, Db2 detects the deadlock and chooses a "winner" and a "loser." The "loser" process is rolled back, causing it to release locks it had held, and that enables the other process (the "winner") to acquire the lock for which it had been waiting.

All well and good, except for the fact that one traditionally has been able to do little to nothing to influence Db2's choice of winner and loser in deadlock situations. That changes starting with function level 501 of Db2 13, thanks to a new built-in global variable named DEADLOCK_RESOLUTION_PRIORITY.

Before providing some helpful (I hope) items of information about DEADLOCK_RESOLUTION_PRIORITY, let me point out a very important difference between this new feature and the previously-described CURRENT LOCK TIMEOUT: the latter is a special register, while the former is a global variable. Why is that notable? One simple reason: any process can set the value of a special register, but a process must have permission to set the value of a global variable. The rationale for making DEADLOCK_RESOLUTION_PRIORITY a global variable may already be clear to you: if a value for DEADLOCK_RESOLUTION_PRIORITY could be set by any process, one could imagine everyone setting the global variable to its maximum value ("I always want to be the winner in a deadlock situation"), and that would defeat the purpose of the new capability (as the bad guy, Syndrome, in the movie The Incredibles put it, "When everyone is super, no one will be"). The permission-only nature of DEADLOCK_RESOLUTION_PRIORITY means that (for example) a Db2 DBA can assign the max-priority value to a "must-complete" database administration process, and know that under almost any circumstances (exceptions noted below) the process will be the winner in case of a deadlock. The same could be done - with permission received from a DBA - for a high-priority application process.

OK, items of related information that might be good for you to know:
  • The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).
  • If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).
  • I mentioned previously that there are exceptions to the "255 always wins" rule. Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.
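As an example, a DBA (or a process granted WRITE authority on the global variable) could do something like the following - again, the specific values are illustrative:

```sql
-- Requires WRITE permission on the built-in global variable.
-- 255 = maximum priority: this process should win almost any deadlock
-- (subject to the exceptions noted above)
SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY = 255;

-- A "should-complete" process might start with a lower value, to be
-- tuned later based on IFCID 172 information
SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY = 150;
```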

Setting the lock timeout limit or deadlock priority automatically for an application

What if you want an application to have a certain lock timeout limit or a certain deadlock priority, but you don't want the application to have to issue a SET CURRENT LOCK TIMEOUT or a SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY statement in order to accomplish the objective (SYSIBMADM is the schema for user-set-able built-in Db2 global variables - SYSIBM is the schema for built-in global variables that are set by Db2)? No problem: you can get that done using the Db2 profile tables. "Yeah," you might say, "but I want to do this for a local-to-Db2 application, and the profile tables can be used to set special register or built-in global variable values only for DDF-using applications." Actually, with Db2 13 that statement is no longer entirely true. Db2 13 allows the setting of CURRENT LOCK TIMEOUT (starting with function level 500) and DEADLOCK_RESOLUTION_PRIORITY (starting with function level 501) via profile table entries for local-to-Db2 as well as for DDF-using applications (for other special registers and built-in global variables, value-setting by way of the profile tables remains do-able only for DDF-using applications).

For a DDF-using application, the profile-defining specification (what you put in SYSIBM.DSN_PROFILE_TABLE) can be in a Db2 13 environment what it could be in a Db2 12 environment (the auth ID an application uses when connecting to the Db2 system is one example; the IP address of an application server is another example). For a local-to-Db2 application, the profile-defining specification can be auth ID and/or role, or collection name and/or package name, or the value of CLIENT_APPLNAME or CLIENT_USERID or CLIENT_WRKSTNNAME. The online Db2 13 for z/OS documentation provides additional details on using the profile tables to set values for special registers and for built-in global variables.
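As a sketch of how a profile-table entry for a local-to-Db2 application might look (the PROFILEID and collection name below are made-up values, and the timeout value is illustrative - check the Db2 13 documentation for the exact columns and keywords applicable at your function level):

```sql
-- Define a profile that matches a local application by collection name
INSERT INTO SYSIBM.DSN_PROFILE_TABLE
  (PROFILEID, COLLID, PROFILE_ENABLED)
VALUES (17, 'BATCHCOLL', 'Y');

-- Attach an attribute that sets the special register for matching processes
INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
  (PROFILEID, KEYWORDS, ATTRIBUTE1)
VALUES (17, 'SPECIAL_REGISTER', 'SET CURRENT LOCK TIMEOUT = 600');

-- Profiles take effect when started via the Db2 command: -START PROFILE
```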

And there you have it. Db2 13 provides more control - and more-granular control - over two important aspects of application execution. I hope that these new capabilities will be useful at your site.

Friday, November 11, 2022

A Case Study: Using Db2 for z/OS Monitor Reports to Zero In on a Performance Problem

I had an interesting exchange recently with a Db2 for z/OS systems programmer. This individual had been asked to assist in determining the cause of a performance problem impacting a Db2-accessing application. The sysprog shared with me a Db2 monitor-generated accounting report showing activity for the application process, and a Db2 monitor statistics report covering the same time period for the same Db2 subsystem. In those two reports were the clues that pointed to the source of the application performance problem, and with the probable problem source identified the associated mitigating actions were readily determined. In this blog entry, I'll take you through my analysis of the Db2 monitor-provided accounting and statistics information, and the rationale behind my recommended steps for resolving the problem. My aim is not only to shed light on a particular performance-affecting issue and related remediating moves, but also to illustrate a methodical approach for analyzing Db2 application performance issues in general.

Input for analysis

I find Db2 monitor-generated accounting and statistics reports to be extremely useful for analysis of application performance problems. In both cases, what you want is the detailed form of the report. In the case of IBM's OMEGAMON for Db2 performance monitor (the one with which I'm most familiar), you're talking about the ACCOUNTING REPORT - LONG and the STATISTICS REPORT - LONG (for other Db2 monitor products, these reports might have titles like, "detailed summary of accounting information" or "statistics detail report"). For the accounting report, your preference is to see activity pertaining exclusively to the application process for which the performance issue has arisen. This is usually done by using the monitor's data-filtering capabilities to include, for report-generation purposes, only the Db2 accounting records of interest (an accounting report is basically information from Db2 accounting trace records, formatted for readability). Db2 accounting trace records have all kinds of identifier fields, so you can get pretty specific. In the case about which I'm writing here, the relevant filtering criteria were the authorization ID of the application process of interest, and the FROM and TO times that bracketed the period during which the performance problem occurred.

As for the statistics report, what you want is one that covers the same time period as the accounting report (same FROM and TO times), for the same Db2 subsystem.

Looking for clues

What the Db2 sysprog had been told by the development team is that the application in question started out performing well, and then slowed down a lot (important input). The sysprog and I looked first at the accounting report, and in doing that we focused initially on the "class 3 suspension" information (this information, obtained from Db2 accounting trace class 3 records, has to do with "known" wait events, as opposed to "not accounted for" time, about which I'll comment momentarily). Why did we look there first? Because (in my experience), when a Db2-accessing process starts out performing well and then slows way down, it's often due to a substantial increase in one or more of the "wait times" captured by accounting trace class 3 (yes, an application slowdown could be related to a significant CPU time increase, but I've seen that less often than I've seen large wait time increases).

Looking at the class 3 times for the application process, what jumped out was a very large value for the average DB2 LATCH time ("average" is average per accounting trace record, which typically equates to average per transaction or average per batch job, depending on the workload type). Here, I mean "large" in terms of average DB2 LATCH time being a large percentage of average TOTAL CLASS 3 time. Usually, DB2 LATCH time is a very small percentage of TOTAL CLASS 3 time, with "wait time" categories such as SYNCHRON DATABASE I/O and OTHER READ I/O accounting for the bulk of TOTAL CLASS 3 time. A (proportionately) really large DB2 LATCH time is usually an indicator that something's not right.

The first thing I look at when I see unusually large DB2 LATCH time for a Db2-accessing process is the "in-Db2 not-accounted-for time" for that process. Several Db2 monitor products calculate that for you - in an IBM OMEGAMON for Db2 accounting report, the field is labeled NOTACC, and it's shown, on a sideways bar chart at the top of an accounting report, as a percentage of average in-Db2 elapsed time. If you need to calculate this figure for yourself, the denominator is average in-Db2 elapsed time (aka "class 2" elapsed time), and the numerator is average in-Db2 elapsed time minus in-Db2 CPU time (general-purpose plus zIIP, or "specialty engine," CPU time) minus TOTAL CLASS 3 time. In other words, it's the percentage of in-Db2 elapsed time that is not CPU time and not "identifiable" wait time. For a transactional application process (as was the case for the situation about which I'm writing), as a general rule you want in-Db2 not-accounted-for time to be less than 10%. If that figure is substantially greater than 10%, it's indicative of a CPU-constrained environment, and if the environment is highly CPU-constrained then DB2 LATCH time can get really large (as pointed out in an entry I posted to this blog a few years ago).
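Expressed as a formula (using the quantities described above, with "class 2 CPU" including both general-purpose and zIIP time):

```latex
\text{not-accounted-for \%} \;=\; 100 \times
\frac{\text{class 2 elapsed} \;-\; \text{class 2 CPU} \;-\; \text{TOTAL CLASS 3}}
     {\text{class 2 elapsed}}
```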

Well, in this particular case the average in-Db2 not-accounted-for time for the application process was 11% of in-Db2 elapsed time - a little on the high side for a transactional process, but not high enough to explain a really large DB2 LATCH time. With that cause of elevated DB2 LATCH time pretty much eliminated, it was time to turn to the breakdown of latch wait events for different latch categories, and that's where the statistics report comes in. In a Db2 monitor-generated statistics long report, the latch suspend count information will likely look something like this (what you see below is from an OMEGAMON for Db2 statistics long report, but it is NOT from the report I reviewed with the Db2 systems programmer - we jointly viewed that report in a web meeting, and I do not have a copy of the report):

LATCH CNT   /SECOND   /SECOND   /SECOND   /SECOND
---------  --------  --------  --------  --------
LC01-LC04      0.00      0.00      0.00      0.00
LC05-LC08      0.00      0.74      0.00      0.41
LC09-LC12      0.00      0.02      0.00      0.32
LC13-LC16      0.00     12.89      0.00      0.00
LC17-LC20      0.00      0.00      0.01      0.00
LC21-LC24      0.04      0.00      1.96      2.84
LC25-LC28      0.12      0.02      0.01      0.00
LC29-LC32      0.06      0.04      0.00      0.28
LC254          0.00

What I saw in the report I reviewed in the web meeting with the Db2 sysprog (and again, that's NOT what you see above - the snippet above is provided so that you can see what the latch suspend count information looks like in a statistics report) was a particularly high value for latch class 6 suspend events (that would be the second value in the LC05-LC08 row of the sample report snippet above, where 0.74 appears). What is latch class 6? It has to do with index page split activity in a Db2 data sharing environment (by the way, a handy reference for the activities associated with the various Db2 latch classes can be found in the OMEGAMON for Db2 documentation).

Let's unpack that. An index page split occurs when Db2 has to insert an entry in an index page because of an insert (or an update of an indexed column) and that page is full. In that situation, a portion of the entries in the page will be moved to what had been an empty page in the index, so that there will be room in the formerly-full page for the new entry. What does Db2 data sharing have to do with this (and in the environment about which I'm writing, Db2 is running in data sharing mode)? In a data sharing system (versus a standalone Db2 subsystem), an index page split action has a greater impact on throughput because it forces a log-write operation.

Seeing the high level of index page split activity suggested by the numerous latch class 6 wait events, we turned again to the accounting report to see the average number of insert operations executed by the performance-impacted application. Sure enough, we saw that this was an insert-intensive process - more by the average number of rows inserted, versus the number of INSERT statements executed (the average number of rows inserted, per the accounting report, was about 100 times larger than the average number of INSERT statements executed, indicating use of block-level inserts by the application).

The elevated count of latch class 6 suspend events (related to index page splits) and the insert-intensive nature of the process also dove-tailed with the observed "started out fine, then slowed down" behavior of the application: in all probability, when the process started there was a pretty good amount of free space in leaf pages of indexes on the table(s) into which rows were being inserted. After a while these "holes" in index leaf pages filled up, and that resulted in a large number of index page split actions to make space for new index entries, and THAT - partly due to the fact that this was a Db2 data sharing system - had a majorly negative impact on application performance (the keys of affected indexes were clearly not of the continuously-ascending variety, because index page split actions are not required for an index defined on a continuously-ascending key).

With the source of the performance problem identified, the next matter to consider was...

What to do about it?

The Db2 sysprog and I discussed two problem-mitigating actions - one an "absolutely do" and the other a "maybe do." The "absolutely do" step was to increase the amount of free space in indexes to accommodate new entries. That step, in turn, consisted of two sub-steps, each of which is applicable to indexes defined on non-continuously-ascending keys. Sub-step one: increase the index's PCTFREE value. Whereas the default PCTFREE value for an index is 10, a value of 20 or 25 might make more sense for an index defined on a non-continuously-ascending key for an insert-heavy table. Sub-step two: increase the index's FREEPAGE value. The default FREEPAGE value is 0. Here's why boosting the FREEPAGE value - for example, to 5 (in which case there would be an empty index page after every 5 pages containing index entries) - can be helpful for an index defined on a non-continuously-ascending key for an insert-heavy table: as previously mentioned, when an index page is split a portion of that page's entries are moved to a previously empty page in the index. If FREEPAGE 0 (the default) is in effect, the only empty index pages will be at the very end of the index - potentially a long way from the page that was split. That situation creates a drag on performance through degradation of the index's organization (reflected in the LEAFDIST value for the index - or index partition, in the case of a partitioned index - in the SYSIBM.SYSINDEXPART catalog table). With a non-zero and relatively low value for FREEPAGE (meaning, an empty page following each relatively-low number of populated pages), when an index split does occur there should be a "nearby" empty page into which entries from the full page can be moved.

Both PCTFREE and FREEPAGE can be changed for an index via an ALTER INDEX statement, and both take effect when the index is subsequently reorganized (or loaded). The larger PCTFREE value will reduce index page split activity between REORGs, and the non-zero FREEPAGE value will reduce the impact of page splits if they do occur.
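A sketch of those two sub-steps in SQL (the index name and the specific values are illustrative):

```sql
-- More free space per leaf page, plus an empty page after every
-- 5 populated pages - values chosen for an insert-heavy table with a
-- non-continuously-ascending index key
ALTER INDEX MYSCHEMA.IX_ORDERS_CUSTNO
  PCTFREE 25
  FREEPAGE 5;

-- The new values take effect at the next reorganization, e.g.:
-- REORG INDEX MYSCHEMA.IX_ORDERS_CUSTNO
```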

And what about the "maybe do" step? That would be an increase in the size of the index's pages, from the default of 4 KB to maybe the maximum of 32 KB. How can that help? Here's how: because a 32 KB index page (for example) can hold 8 times as many entries as a 4 KB page, going to 32 KB-sized pages for an index (via an ALTER INDEX statement that assigns the index to a 32K buffer pool) can potentially result in an 87.5% reduction (seven eighths) in page split activity for an index, other things being equal (e.g., same rate of inserts for the underlying table). Why is this a "maybe do" thing versus an "absolutely do" thing? Because if access to table rows through the index (e.g., for queries) is truly random in nature with respect to key values, 32 KB-sized pages could mean somewhat less-effective use of buffer pool resources versus 4 KB-sized pages. It's a matter, then, of what's more important in a particular situation: is it minimizing index page split activity, or maximizing the effectiveness of buffer pool resources for random patterns of row access by applications?
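The "maybe do" change is also a simple ALTER (index and buffer pool names again illustrative) - an index's page size follows from the buffer pool to which it is assigned:

```sql
-- Assign the index to a 32K buffer pool to get 32 KB index pages;
-- the change is materialized when the index is subsequently reorganized
ALTER INDEX MYSCHEMA.IX_ORDERS_CUSTNO
  BUFFERPOOL BP32K1;
```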

In conclusion...

I hope that this entry has provided you with some "news you can use." Utilize Db2 monitor accounting and statistics reports to determine the source of an application performance problem, then take appropriate remedial action.

One more thing: Db2 13 for z/OS made it a lot easier to verify that index page splits are an issue, through enhanced instrumentation (new IFCID 396, associated with statistics trace class 3, which is on by default, indicates when an index page split operation takes more than 1 second, which would be unusually long) and through the new REORGTOTALSPLITS, REORGSPLITTIME and REORGEXCSPLITS columns of the SYSIBM.SYSINDEXSPACESTATS real-time statistics table in the Db2 catalog.
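In a Db2 13 system, a quick (and purely illustrative) way to look for indexes with heavy split activity via those new real-time statistics columns might be:

```sql
-- Which index spaces have seen the most page splits since the last REORG?
-- REORGTOTALSPLITS, REORGSPLITTIME and REORGEXCSPLITS are new with Db2 13
SELECT DBNAME, INDEXSPACE, PARTITION,
       REORGTOTALSPLITS, REORGSPLITTIME, REORGEXCSPLITS
FROM SYSIBM.SYSINDEXSPACESTATS
ORDER BY REORGTOTALSPLITS DESC
FETCH FIRST 20 ROWS ONLY;
```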

Friday, October 28, 2022

Getting Ready to Migrate to Db2 13 for z/OS from Db2 12

It could be said that preparation for migration of a Db2 12 for z/OS system to Db2 13 comes down to one thing: activate Db2 12 function level 510. That's a pretty simple-looking migration plan, but there's more to it than meets the eye - as I'll explain in this blog entry.

First, let's consider function level 510 itself. Unique among Db2 12 function levels, 510 provides no new functionality in the traditional sense - there's nothing new there that a DBA or a developer would use, or that could make an application execute with greater efficiency. The purpose of function level 510 is simply this: to validate that a Db2 12 system is technically ready for migration to Db2 13 (here, "system" refers to a standalone Db2 subsystem or a Db2 data sharing group).

If activating function level 510 gets a Db2 12 system ready for migration to Db2 13, what makes a Db2 12 system ready for activation of function level 510? Three things:

  1. The system's code level has to be 510. That can be easily verified by issuing the Db2 command -DISPLAY GROUP: in the command output, look for the value 121510 in a column labeled DB2 LVL (if your Db2 system's maintenance level is relatively current, the code level is likely to be 510 already - the PTF that takes a Db2 system's code to the 510 level came out in April of 2021).
  2. The catalog level has to be V12R1M509 - again, -DISPLAY GROUP output tells the tale.
  3. There can't be any packages in the system, used within the past 18 months, that were last bound or rebound prior to Db2 11.
Let me expand on items 2 and 3 in that list.

Getting to the right catalog level

First, you might be wondering, "How is it that I need to get to catalog level V12R1M509 before I can activate function level V12R1M510? Wouldn't the catalog level have to be V12R1M510?" No. There is no catalog level V12R1M510. Why is that? Because function level 510 has no catalog dependencies (i.e., no changes to a catalog at the V12R1M509 level are necessary to support function level 510). This is not at all unprecedented. Several of the Db2 12 function levels had no catalog dependencies. For example, function level 504 can be activated when the catalog level is V12R1M503 - there is no V12R1M504 catalog level because function level 504 did not require any changes to a catalog at the 503 level.

Second, suppose your Db2 catalog level is, say, V12R1M500. Can you take that catalog right to the 509 level? YES. Assuming your code level is at least 121509, you can execute the CATMAINT utility with a specification of UPDATE LEVEL(V12R1M509). That one execution of CATMAINT will make the changes to the catalog associated with the 509 catalog level, and it will make all the changes associated with all the other catalog levels between 500 and 509.
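The utility control statement for that single catalog-level jump would look like this:

```sql
-- CATMAINT utility control statement: one execution applies all catalog
-- changes from the current level (here, 500) through level 509
CATMAINT UPDATE LEVEL(V12R1M509)
```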

About those old packages...

"What's with that?" you might wonder. "Why would the existence of one or more packages, that we've used in the past 18 months and that were last bound or rebound prior to Db2 11, keep me from being able to activate function level 510?" Short answer: it's for your own good. Longer answer: packages are executable code. The code generated for a package by a Db2 version prior to V11 cannot be executed in a Db2 13 system. If there is a request to execute a pre-Db2 11 package in a Db2 13 environment, that package will be auto-rebound so that Db2 13 can generate for that package code that can be executed in the Db2 13 system. "OK," you say, "I get that auto-rebinds of packages can be a little disruptive with respect to my application workload, but it's not THAT big of a deal - the auto-rebind gets done and we go trucking on." Not so fast, I'd say. What if, as a result of the auto-rebind of a package, there's an access path change that negatively impacts - perhaps majorly - the performance of the associated program? Your possible response: "Again, not a huge deal. We run with PLANMGMT=EXTENDED in ZPARM, and so we'd have the prior copy of that package available, and we can just execute a REBIND with SWITCH(PREVIOUS) to restore the previous better-performing copy of the package." WRONG! You CAN'T restore the previous copy of that package, because the previous copy was generated prior to Db2 11, and that means the previous copy can't be executed in a Db2 13 system. You're STUCK with that poor-performing package. Sure, you can take steps to try to correct the performance regression - maybe update catalog statistics or take an index action and then rebind and hope for a better performance outcome - but do you want to do that while some critically important production program is performing in an unacceptable way and you're phone is going off non-stop because upper management wants to know WHEN YOU'RE GOING TO GET THIS FIXED? 
Probably not; so, we're going to prevent that scenario from happening by not letting you go to Db2 13 if you still have any pre-Db2 11 packages that have been used within the past 18 months (the thinking of the Db2 for z/OS development team: if a package was last used more than 18 months ago, it's highly likely that it's a package that's just not used anymore in your environment - it's still in SYSPACKAGE simply because no one has FREE-ed the old package).

This "keep you out of trouble" action taken by the Db2 for z/OS development team is based on the painful experience some organizations had when they migrated to Db2 12 from Db2 11. In that situation, we made it clear in the documentation that pre-Db2 10 packages would need to be rebound prior to going to Db2 12 because pre-Db2 10 packages could not be executed in a Db2 12 environment. Well, some Db2 for z/OS people either didn't see that warning, or saw it and decided to ignore it and take their chances, and in a few cases the problem described in the preceding paragraph was encountered. At some sites, the problem's impact was severe enough to warrant falling back to Db2 11, at which point people would rebind the pre-Db2 10 packages (as had been strongly encouraged by us) and then re-migrate to Db2 12. Not wanting to see reoccurrences of those difficulties, with Db2 13 we're basically saying, "We are not going to let you get into the potentially ugly situation you could see if a package that cannot be executed in a Db2 13 system is requested for execution in that environment - you cannot go to Db2 13 if you have pre-Db2 11 packages that might still be in use at your site.

By the way, if you want to see if you have any packages that would prevent successful activation of function level V12R1M510, you can execute this query on a Db2 12 system (and note that this query is also provided in the Db2 12 documentation):

  AND VALID <> 'N'

One more thing: as previously mentioned, it's highly likely that a package that has not been executed in a Db2 system within the past 18 months will not be executed at some future time in that Db2 environment. That said, maybe you're concerned that, for some reason, a package in your environment is executed every two years (24 months). The chances of that being true are almost certainly very small, but perhaps non-zero. If that's bugging you, disregard the "18 months" window and rebind ANY pre-Db2 11 package in your system prior to going to Db2 13.
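For any such package, the rebind itself is a one-line operation (the collection and package names below are made up, and the APPLCOMPAT and PLANMGMT specifications are illustrative):

```sql
-- Rebind on the Db2 12 system so that the package's generated code can
-- execute in a Db2 13 environment; APPLCOMPAT can stay at a lower level
-- to preserve the SQL behavior the program expects
REBIND PACKAGE(OLDCOLL.OLDPKG) APPLCOMPAT(V12R1M500) PLANMGMT(EXTENDED)
```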

If you're nervous about a "big jump" to Db2 12 function level 510...

Consider this hypothetical situation: you're running with Db2 12 function level 500 activated and you're contemplating going to function level 510 to prepare for migration to Db2 13. That's actually not so hypothetical - a good number of Db2 12 systems are running with an activated function level of V12R1M500. If that looks familiar to you, there might be a couple of thoughts running through your mind:
  • Function level 500 to 510 looks like a really big jump to me. How do I get that done with a minimized risk of complications? The key here is the APPLCOMPAT specification for your packages. Maybe you're concerned that making a big jump up in the activated function level for your Db2 12 systems could lead to programs being impacted by a "SQL incompatibility" (basically, that's a SQL behavioral change: same SQL, same data, different result - these incompatibilities are pretty few and far between, and they often affect either few or none of your programs, but they can indeed arise on occasion). If so, you can guard against that by leaving the APPLCOMPAT values for your packages where they are when you activate a higher function level of Db2 12 for z/OS. If you have a package bound with, for example, APPLCOMPAT(V12R1M500), and you activate function level 510, SQL issued through the package bound with APPLCOMPAT(V12R1M500) will still get the SQL behavior associated with function level 500. You can find lots more information about APPLCOMPAT in the part 1 and part 2 posts of the 2-part entry on APPLCOMPAT that I added to this blog in 2019.
  • If function level 510 is a good ways beyond where my Db2 12 system is at present, could I maybe go from where we are to some intermediate function level, and later to level 510? Of course you can. "If I decide to do that," you might be thinking, "what would a good intermediate function level be for us?" That's really up to you. My advice: go to the Db2 12 function levels "main page" in the Db2 12 online documentation, and check out the features and functions introduced with each function level between where you are now and 510. If there's a function level that provides an enhancement that would be particularly helpful in your environment, go to that one, and later, at a time of your choosing, go from that level to level 510 (I'll tell you that a pretty popular "intermediate" Db2 12 function level is 505 - this because a lot of Db2 DBAs really like the "rebind phase-in" functionality that it introduced).
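To make the APPLCOMPAT point above concrete, here is a sketch (collection and package names are illustrative) of keeping a package's SQL behavior where it is after activating function level 510, and then moving it up at a time of your choosing:

```sql
-- After activating V12R1M510, a package keeps the SQL behavior of its
-- bound APPLCOMPAT level. Rebind with the same APPLCOMPAT to stay put:
REBIND PACKAGE(MYCOLL.MYPKG) APPLCOMPAT(V12R1M500)

-- Later, when ready to take on function level 510 SQL behavior:
REBIND PACKAGE(MYCOLL.MYPKG) APPLCOMPAT(V12R1M510)
```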
OK, that about wraps it up for this blog entry. I hope that this information will be helpful for you as you plan for your site's migration from Db2 12 to Db2 13.

P.S. In addition to activating Db2 12 function level 510, don't forget to apply the "fallback SPE" to your Db2 12 systems prior to migrating to Db2 13 - that PTF allows you to fall back from Db2 13 to Db2 12, in the unlikely event that that should be deemed necessary. The APAR associated with this fallback SPE is PH37108.

Wednesday, September 28, 2022

Db2 for z/OS: GETPAGEs, Predicates, SELECTs and DELETEs

About 15 years ago, I posted to the blog I maintained while working as an independent consultant (prior to rejoining IBM in 2010) an entry in which I named GETPAGEs the most important factor with regard to the CPU consumption of a Db2 for z/OS application workload. GETPAGEs, which are requests by Db2 (typically on behalf of application or user processes) to access pages of table spaces or indexes, are indeed a prime driver of SQL statement execution cost. That being the case, people who work on tuning Db2 query performance often aim to reduce GETPAGE activity associated with executing the query. That work, in turn, tends to focus on a query's predicates (the clauses of the query that determine which rows will qualify for the query's result set). Can an index be added or modified to enable index-level (versus table-level) row filtering for a given predicate? Can a non-index-able predicate be rewritten to be index-able? Can a predicate be added to further refine the query's result set? And so on.
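As a simple illustration of the predicate-rewrite idea (table and column names are mine, not from any particular system): absent an index on expression, a predicate that wraps a column in a function is not index-able, but the same filtering can often be expressed in an index-able way.

```sql
-- Non-index-able form: ORDER_DATE is inside a function, so Db2 cannot
-- use matching access on an index keyed on ORDER_DATE.
SELECT ORDER_ID FROM ORDERS WHERE YEAR(ORDER_DATE) = 2022;

-- Index-able rewrite: the same filtering expressed directly on the
-- column, enabling matching index access.
SELECT ORDER_ID FROM ORDERS
 WHERE ORDER_DATE BETWEEN '2022-01-01' AND '2022-12-31';
```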

The thing is, the predicate focus of query tuning could lead one to believe that the same GETPAGE-reducing actions could yield similar results for any SQL statement containing predicates, regardless of whether the statement is a SELECT or, say, a DELETE. That is not the case, especially for DELETEs versus queries, and the difference basically boils down to one thing: indexes. The same indexes that reduce GETPAGE activity for a query can make GETPAGE counts stubbornly high for a DELETE statement, in spite of tuning actions related to the DELETE statement's predicates. A Db2 SQL programmer recently ran up against this reality. He asked me about it, and I think his situation could be instructive for others.

The SQL programmer (I'll refer to him as R - the first letter of his first name) was analyzing the performance of a row-deleting process that would remove a few hundred thousand rows from a table in a given execution. Thinking that reducing GETPAGE activity would lower the CPU cost of the process, and approaching the tuning effort for the process's DELETE statement as one would approach tuning the performance of a query (thinking about an analogous query that would have the same predicates as the DELETE statement of interest), R had a DBA create on the target table a new index with a key composed of the five columns referenced in a series of ANDed "equals" predicates in the DELETE statement (in other words, the DELETE had the series of predicates WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?, and the new index had the key C1 | C2 | C3 | C4 | C5). That would make for a MATCHCOLS value of 5 for the DELETE statement, right (referring to a column in the Db2 access path-explaining PLAN_TABLE)? And, that should reduce GETPAGE activity for the DELETE by enabling a greater degree of row filtering at the index level, right? Well, not exactly, as things turned out: R was surprised to see that the new index had very little impact on the GETPAGE count for the DELETE statement. What was going on here?
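In sketch form (with illustrative object names), here is the situation R set up:

```sql
-- New index on the five columns referenced by the DELETE's predicates:
CREATE INDEX IX_T1_C1TO5 ON T1 (C1, C2, C3, C4, C5);

-- The DELETE statement being tuned:
DELETE FROM T1
 WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?;

-- PLAN_TABLE showed MATCHCOLS = 5 for the DELETE - yet GETPAGE activity
-- stayed stubbornly high, because every deleted row also requires entry
-- removal from every index defined on T1.
```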

The index-impact result that surprised R comes down to essentially one thing, that being the fundamentally different nature of DELETE versus SELECT statements. Yes, both can have predicates, but that doesn't mean that a DELETE with predicates can be thought of as a "query" - DELETEs and SELECTs are like apples and oranges when it comes to their execution characteristics. A SELECT returns values (specifically, rows of values), and that's it. A certain index might sharply reduce GETPAGE activity for a query by reducing the need for Db2 to examine table space pages in order to find qualifying rows (in fact, an index could virtually eliminate table space GETPAGEs for a query, if all columns referenced in the query - in the query's select-list and in its predicates - are part of the index's key). A DELETE, on the other hand, changes pages, many of which - and this is of critical importance - will be pages in index spaces (excepting the unusual, but not unheard-of, situation in which a table has no indexes).

Given this aspect of DELETE processing, not only will adding an index to a table potentially have little impact on GETPAGE activity for a DELETE statement - it might even increase GETPAGE activity for the DELETE. Think about it: for every row removed from a table by a DELETE statement, an entry has to be removed from each and every index defined on the table (yes, it's actually a "pseudo-delete" of index entries, with the relevant entries just marked for later physical deletion, but this still involves index GETPAGEs). Not only that, but Db2 very often can't just march through an index's leaf pages, deleting index entries as corresponding rows are deleted from the underlying table - not when a given index's keys have a way-different sequence relative to the sequence in which table rows are being deleted. Maybe, because of matches on predicate columns, a DELETE statement is guided to rows qualifying for deletion by index X, but index Y, on the same table, may have a key whose ordering is very different from that of index X's (i.e., the two indexes' key values correlate little, if at all, on a row-by-row basis). In that case, finding the entry in index Y to delete as part of deleting a table row could well require an index probe operation (i.e., a top-down traversal of index Y, from root page to leaf page). If that kind of thing is happening for several indexes on the table, the number of GETPAGEs for a DELETE statement could be several times larger than the number of rows deleted; and, that's not because the DELETE has a "bad" access path - it's because the statement is a DELETE and not a SELECT.

Bottom line: comparing GETPAGE counts between SELECT and DELETE statements is not useful or meaningful, even if the statements have identical predicates. If you have a row-delete process that is consuming more CPU than you'd like, what can you do about it? Here are a couple of possibilities:

  • Reduce - don't increase - the number of indexes on the target table. As I've mentioned, it's a general truism that while indexes can boost query performance, they tend to increase CPU and elapsed time for DELETE statements. That being the case, a good way to boost DELETE performance is often to find and remove indexes on the target table that are not doing any good. I call this "index trimming," and I provided some information on that topic in an entry I posted to this blog some years ago (I'd ignore the part of that blog entry that deals with hash-organized tables - that Db2 feature was deprecated with function level 504 of Db2 12 for z/OS).
  • Consider using the DISCARD option for online REORG. Especially when the row-delete criterion (or criteria) is not overly complex and can be expressed in the form of a predicate or predicates referencing the target table, executing an online REORG of the table's table space with the DISCARD option can be an attractive way to efficiently remove a large number of rows from the table with minimal disruption of application access to the data (there will always be at least a brief - sometimes just a few seconds - period of no data access, when a REORG job enters the SWITCH phase near end-of-execution).
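Here's a sketch of what a REORG-with-DISCARD control statement might look like (object names and the WHEN condition are illustrative; check your Db2 release's utility documentation regarding which SHRLEVEL options are supported in combination with DISCARD):

```sql
REORG TABLESPACE MYDB.MYTS
  SHRLEVEL CHANGE
  DISCARD FROM TABLE MYSCHEMA.MYTAB
    WHEN (STATUS = 'CLOSED' AND LAST_ACTIVITY < '2021-01-01')
-- Rows satisfying the WHEN condition are removed from the table as part
-- of the REORG, rather than via individually executed DELETEs.
```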
I hope that this information will be useful for you. Don't confuse DELETE-statement tuning with SELECT-statement tuning. Two different animals, as we say.

Tuesday, August 23, 2022

What Db2 for z/OS People Should Know About Data Fabric

"Data fabric" is an increasingly hot topic in IT circles, and with good reason - an effectively implemented data fabric can deliver significant dividends by enabling an organization to get more value from its data assets. Db2 for z/OS people should have some familiarity with the data fabric concept and associated technology, not only as preparation for participating in data fabric-related discussions but also because data fabric is of major strategic importance for Db2 for z/OS (and for other z/OS-based data sources). In this blog entry I'll provide information on data fabric that I hope will be helpful to Db2 for z/OS people.

What is "data fabric," anyway?

Essentially, data fabric is an architecture that brings uniformity and consistency to data originating in a disparate collection of sources - sources which could be (likely would be) housed in a mix of on-premise and in-cloud systems (and, especially for larger enterprises, "in cloud" would involve several different public cloud providers and perhaps some private cloud environments). That uniformity and consistency is manifest in multiple aspects of data interaction via the data fabric, including data access, discovery, utilization, cataloging, protection and governance; further, a data fabric is likely to have a "smart" dimension, with AI and machine learning technology leveraged to provide intelligent automation of data management tasks.

I mentioned that the data fabric payoff is increased value gained from an organization's data assets. How does data fabric deliver that payoff? Basically, by eliminating friction that would otherwise impede data access, discovery, utilization and integration - and doing that without compromising data security. The promise of a data fabric can be largely summed up in this way: it provides an environment in which the right data (i.e., data that is current, trusted, understood and complete) is available to the right people (people who know the data, people who know what data they need, people who know what they want to do with data) at the right time (i.e., when the data is needed).

In thinking about the value of the consistency and uniformity that a data fabric brings to what would otherwise be a disjointed data landscape, it can be helpful to consider a cake-baking analogy. Suppose you are tasked with baking a cake, and suppose further that the ingredients must be ordered from different countries, and you have to communicate with suppliers using the primary language of each source country and you have to remunerate the suppliers using source-specific modes of payment. Here's how that might go (and in your mind, substitute any countries you want for the ones I mention - I'm not picking on anyone):

  • The eggs for the cake are to come from Japan, but there is a delay in procurement because you don't speak Japanese.
  • The butter is to come from Australia, but the supplier will only send the butter after having received payment in coins that were sent via sailboat.
  • The flour will come from a supplier in Germany. Your German is a little rusty but pretty good, so there's not much of a delay there.
  • The sugar is to be sourced from Brazil, but your lack of familiarity with the ingredient-ordering user interface results in your being unable to locate a supplier.
  • This all leads to your getting a late start in baking the cake, and on top of that the eggs went bad while you were waiting for the butter, and you never got the sugar. The people who were looking forward to consuming your confection had to wait a frustratingly long time to get a very un-tasty cake. Not good.
Now imagine a different scenario, in which a cake-ingredient-ordering front end abstracts the particulars of the ingredient suppliers (such as native language) and provides uniformity for payment and shipment. Using that front end, you get the ingredients you need - and all the ingredients you need - in a timely manner, and your cake consumers are delighted with the product of your kitchen, which satisfied their sweet-tooth needs and arrived at the right time.

So it is with a data fabric: different data elements from different data sources are the “ingredients” that provide a complete (sometimes called a “360”) view of a subject of interest - be that customers, processes, supply chains, products, whatever. And here's the upshot: when the right (and all the right) data ingredients get to the right people at the right time, the result is better: better decisions, better and more timely applications, better outcomes.

There is technology that can make the promise of data fabric a reality, but before getting into that I want to emphasize that data fabric is NOT just a matter of leveraging technology. I'd go so far as to say...

Data fabric is culture

There were people who said the same thing a few years ago about DevOps, and for the same reason: full and effective implementation of a data fabric can require new organizational roles and new ways of thinking about and managing data. To appreciate this assertion, consider the "personas" (i.e., the people-roles) associated with individuals who would work with, and in relation to, a data fabric. That exercise is facilitated if you think of a data fabric as something that enables a “data store,” in which people “shop for data.” For a traditional retail store, relevant personas include the following:

  • Consumers acquire products from the store.
  • Suppliers provide products for the store.
  • A store manager decides which products should go on which shelves.
  • A sales associate puts the right products on the right shelves.
OK, so what are the personas that have a relationship with the "data store" enabled by a data fabric? Some are listed below.

  • A data consumer might be a developer working on a new application, or a business analyst researching the viability of a proposed new product.
  • A database administrator oversees a data source that supplies the data store.
  • A data curator might make decisions on what data will be available through the data store, and to whom.
  • A data steward might “stock the shelves” of the data store, based on decisions made by a data curator.
Look again at those last two personas in the list above - data curator and data steward. I can tell you for a fact that those roles exist today in multiple organizations - are they present in your workplace? And note: a data fabric's impact goes beyond new organizational roles - it involves new ways of thinking about data management. Here's what I mean: historically, data was often thought of in relation to where it was stored. That manner of thinking led to “silo” situations, and the difficulty of working with data in a “cross-silo” way interfered with organizations’ extracting maximum value from their data assets. By contrast, a data fabric will deliver the greatest benefit when it supports a data management approach that focuses more on data itself, and less on where data is stored. One implication of a data-centric (versus a data-source-centric) approach to data management is that data access decisions (i.e., who can access what data, and in what form) are made by data professionals (e.g., data curators), as opposed to being made by database professionals (e.g., DBAs). In such an environment, data source administrators are implementers of data access decisions made by data curators.

If a data fabric puts data administration (versus database administration) responsibility on data professionals (e.g., data curators), does that diminish the role of a Db2 for z/OS DBA? I would say it does not. I see this as part of an ongoing evolution of the Db2 for z/OS DBA role toward greater engagement in application development (for distributed systems DBAs, this role shift became widespread some years ago). This is a good thing. I am convinced (and more importantly, so are a lot of IT leaders at Db2 for z/OS sites) that the value a mainframe Db2 DBA delivers to an organization goes up when that DBA's work has more of an application-enabling focus.

Let me shift now from organizational impact to enabling technology.

IBM's foundational data fabric-enabling technology

Multiple IBM offerings have a connection with data fabric, but the most foundationally important of these is called Cloud Pak for Data. Cloud Pak for Data's importance has a lot to do with IBM's point of view regarding data fabric implementation. We believe that a data fabric is most effectively implemented as an abstraction layer extended over an existing data landscape. Such an implementation approach acknowledges the significance of “data gravity” - the idea that data usage actions should flow to the data, rather than vice versa. A data fabric enabled via Cloud Pak for Data is characterized by "in-place” access to data on systems of origin. This approach delivers multiple benefits, including:
  • Minimization of data replication costs.
  • Protection of data security and consistency.
  • Optimized performance.
Cloud Pak for Data itself can be thought of as a set of software-powered services that relate to access, governance and usage of data. Cloud Pak for Data can be deployed anywhere Red Hat OpenShift (a Kubernetes container platform) can be deployed: on-premise, in a private cloud or in a variety of public cloud environments (it is also available in a fully managed, as-a-service form). Cloud Pak for Data can be used with a wide range of data sources on Linux, UNIX, Windows and z/OS systems, and those data sources can be on-premise and/or in-cloud.

How would Cloud Pak for Data be used by people in an organization? Here's one scenario: let's say that Val leads a development team that will soon begin work on a new application. To support this work, Val’s team will need access to some data (which happens to be in a Db2 for z/OS database) and associated metadata (data about the data). Val sends a request to this effect to Steve, a data curator. Steve is very familiar with the data that the new application will process. He logs in to Cloud Pak for Data's user interface and creates a project that will provide Val’s team with the data and metadata they need. Db2 for z/OS is one of many data sources supported by Cloud Pak for Data, and Steve creates a connection to the relevant Db2 system. Steve selects the particular tables holding the data that the new application will process and associates them with the project he created for Val's team. Steve also imports metadata for the selected tables, and enriches that metadata with statistical values, data quality scores and business terms. Finally, Steve creates a masking rule for sensitive data in a column of one of the selected Db2 tables - Val's team will be able to reference the column in their program code, but they will only see masked values when they view the column's contents. With the project created and the associated data assets published to a catalog to which Val and her teammates have access, the developers will be able to easily view the data and the related metadata, and this will enable them to move ahead quickly and productively with coding and testing.

The point I really want to make here is not so much, "Look what the data curator can do for the application development team." Even more important to me is the fact that, had Val's team needed access to data (and with it, associated metadata) in a Db2 for Linux/UNIX/Windows database, or a SQL Server database, or an Oracle database, or Apache Cassandra, or Amazon S3, or MariaDB or one of the myriad other data sources supported by Cloud Pak for Data, the actions of the data curator would have been largely the same. And, that would be the case for all kinds of other Cloud Pak for Data usage scenarios - a data scientist needing to develop and train a predictive model, a business person wanting to create a report with accompanying data visualizations, a data curator implementing new rules and policies concerning access to certain data assets, a data administrator virtualizing non-relational data to make it more easily accessible and consumable, whatever. That, as much as anything, is the "secret sauce" of a Cloud Pak for Data-enabled data fabric: it makes all kinds of data sources more easily accessible and effectively consumable by all kinds of people, without sacrificing data governance and security. And when more of an organization’s data assets are used more easily and effectively by more people, the organization works better.

Data fabric is strategically really important for z/OS as a data-serving platform

The uniformity brought to a data landscape by a data fabric is of outsized importance in the context of z/OS as a data-serving platform. How so? Think about it. What gets in the way of z/OS-based data being more effectively - and more widely - used by people in an organization? Often, it's the perceived “other-ness” of the mainframe – the sense non-mainframe people have that z/OS-based data is inherently harder to access, understand and use than data on other platforms. Truth be told, that perception has, historically, been at least partly fact-based – it has been harder for many people to access and use z/OS-based data versus off-mainframe data. The great value, then, of an effectively implemented data fabric, from a z/OS perspective, is not so much that it makes z/OS-based data easier to access and use versus off-mainframe data; rather, it’s the fact that the data fabric makes z/OS-based data as easy to access and use as off-mainframe data. Why that's so powerful: while mainframe systems have been recognized for decades as being unmatched in terms of reliability, security, scalability, efficiency and performance, there have been plenty of people who would say, "Yeah, but mainframe-based data is hard to access and use." An effective data fabric eliminates that "yeah, but..."

Let that sink in: by making discovery, understanding, consumption and usage of data in z/OS systems as easy as it is for data on other platforms, a data fabric makes IBM zSystems an even higher-value platform for an organization's most valuable data assets.

If your organization has not yet looked at implementing an enterprise data fabric, now could be a good time to start down that path. And, the "in-place access to data on systems of origin" that characterizes a data fabric implemented with IBM's Cloud Pak for Data could well be the approach that will deliver maximum benefits in your environment. Give it some thought, and get engaged.

Tuesday, July 26, 2022

What Should a Modern Db2 for z/OS Client-Server Application Environment Look Like?

The distributed data facility (aka DDF) is the component of Db2 for z/OS that enables data access by applications that connect to Db2 via TCP/IP communication links. DDF has been around for over 30 years, but especially during the past 10 years or so DDF workloads have become very large at many Db2 for z/OS sites, with individual Db2 subsystems processing DDF transactions at sustained rates of over 4000 per second (and many more than that for Db2 data sharing groups running on Parallel Sysplex clusters of mainframe servers). For an ever-larger number of Db2-using organizations, the DDF workload is the largest - and the fastest-growing - component of the overall Db2 workload. Given the importance of DDF in the Db2 workload mix, it's worthwhile to consider what a modern Db2 client-server application environment should look like. In looking over the Db2 DDF scene in recent years I've seen a lot of things that I like. In this blog entry I'll share Db2 DDF application environment characteristics that get a thumbs-up from me.

Straight from the app servers to Db2 (for DRDA requesters)

For a number of years, organizations have been migrating away from the use of Db2 Connect "gateway servers" (application server to Db2 Connect gateway server, Db2 Connect gateway server to Db2 for z/OS system) in favor of direct connections from application servers to Db2 for z/OS systems by way of the IBM Data Server Driver (at many sites this transition is already complete). When access to Db2 for z/OS from DRDA requester applications is accomplished through the IBM Data Server Driver, "Db2 Connect" becomes, essentially, just a product licensing term, versus an actual product used - this because entitlement to use the IBM Data Server Driver is provided through an organization's Db2 Connect license (so, if an organization is licensed for Db2 Connect Unlimited Edition for System z, that organization can deploy the IBM Data Server Driver in an unlimited way for applications that access the mainframe system(s) associated with the Db2 Connect license).

There are several advantages to going with the direct connection to Db2 for z/OS versus going through a Db2 Connect gateway server. One is performance: with the "hop" to a Db2 Connect gateway server eliminated, better response times and throughput can be achieved. Another direct-connection benefit is improved problem diagnosis capabilities - error messages have more-specific meaning when the network-connected server that is "adjacent" to Db2 for z/OS is an application server, versus a Db2 Connect gateway server. The direct connection approach also tends to make Db2 client configuration and upgrade work more straightforward.

Note my mention that this Db2 Connect gateway versus direct Db2 for z/OS connection matter is relevant for DRDA requester applications. It is not pertinent to clients that utilize the REST interface to Db2 for z/OS, as such interactions do not involve the DRDA protocol. See below for more information about Db2 REST clients.

Greater use of the Db2 for z/OS REST interface (which is part of DDF functionality)

By "greater use" I don't mean to suggest that the REST interface to Db2 for z/OS is somehow "better" than the DRDA interface (which I like to call the SQL interface to Db2, as a DRDA requester application issues Db2-targeting SQL statements). The REST interface is different versus the SQL interface, and sometimes that difference makes it a good choice for a Db2-accessing client-server application. I wrote a blog entry a few years ago with a lot of compare-and-contrast information about the REST and SQL interfaces to Db2, and I won't repeat all that here. To Db2 for z/OS DBAs, I'd say this: 1) make sure your application developers know that Db2 has a built-in REST interface, and 2) be ready to help support use of the REST interface when that is the choice of a development team. Sometimes, developers - even those who have strong SQL skills - have a preference for the REST architectural style, often because it so fully abstracts the particulars of service-providing systems.

If you do make use of Db2's REST interface, and think you might expand on that in the future, consider what IBM z/OS Connect could do for your organization. When Db2 for z/OS is accessed through z/OS Connect, it's still Db2's REST interface that's being used (Db2 in that case is a REST provider to z/OS Connect), but z/OS Connect provides some important benefits: it makes creation of z/OS-based REST services easier, it provides richer "service discovery" information to client application developers, it adds flexibility to the formatting of service-output JSON documents, and it provides a single access point through which all manner of z/OS-based programmatic assets can be invoked through REST requests - not only Db2 SQL statements and stored procedures, but also CICS and IMS transactions (which might or might not involve access to Db2) and JES batch jobs.

Leveraging SQL procedure language (SQL PL)

SQL PL is for Db2 (for z/OS and for Linux/UNIX/Windows) what T-SQL is for SQL Server and what PL/SQL is for Oracle - a way to write data-processing programs using only SQL statements. SQL PL makes that do-able via a set of SQL statements called control statements - "control" being short for "logic flow control." Among these statements are ITERATE, WHILE, GOTO, IF and LOOP - you get the idea. There are all kinds of reasons for using SQL PL, one being related to cost-of-computing: when a SQL PL routine (such as a native SQL procedure) is invoked through Db2's distributed data facility - whether through a SQL CALL issued by a DRDA requester or via a REST request - its execution is up to 60% offload-able to zIIP engines (mainframe processors that cost less than general-purpose processors and do not factor into the determination of z/OS software charges).
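A minimal native SQL procedure, just to show a couple of SQL PL control statements in action (names and logic are purely illustrative):

```sql
CREATE PROCEDURE MYSCHEMA.SUM_TO_N (IN P_N INTEGER, OUT P_SUM INTEGER)
  LANGUAGE SQL
BEGIN
  DECLARE V_I INTEGER DEFAULT 1;
  SET P_SUM = 0;
  -- WHILE and IF are two of SQL PL's logic flow control statements
  WHILE V_I <= P_N DO
    SET P_SUM = P_SUM + V_I;
    SET V_I = V_I + 1;
  END WHILE;
  IF P_N < 0 THEN
    SET P_SUM = 0;  -- treat a negative input as zero
  END IF;
END
```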

Besides the economic advantage of SQL PL for DDF-using applications, there are functional advantages. For example, a native SQL procedure (a stored procedure written in SQL PL) - and only a native SQL procedure - can be created (or altered) with the AUTONOMOUS option, which means that if the calling transaction fails and is rolled back by Db2, the data-changing actions (e.g., INSERT/UPDATE/DELETE) performed by the autonomous procedure will not be rolled back (this can make autonomous procedures very useful for "transaction initiation audit trail" purposes - you can use an autonomous procedure to record the fact that a transaction got started, and that information will be preserved even if the transaction ends up failing). SQL PL routines can also accept Db2-defined arrays as input, whereas external Db2 routines (written in languages such as COBOL) cannot.
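Here's a sketch of the "transaction initiation audit trail" idea (table and procedure names are illustrative): because the procedure is created with the AUTONOMOUS option, its INSERT is committed in its own unit of work and survives a rollback of the calling transaction.

```sql
CREATE PROCEDURE MYSCHEMA.LOG_TXN_START (IN P_TXN_ID VARCHAR(32))
  LANGUAGE SQL
  AUTONOMOUS
BEGIN
  -- This insert persists even if the caller's work is rolled back
  INSERT INTO MYSCHEMA.TXN_AUDIT (TXN_ID, START_TS)
    VALUES (P_TXN_ID, CURRENT TIMESTAMP);
END
```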

Something else to consider: if you're using SQL PL only for stored procedure programs, you're missing out. SQL PL can also be used to write user-defined functions, and a SQL PL routine can be included in the body of an advanced trigger (advanced triggers were introduced with Db2 12 for z/OS).

And, you should take note of how the CREATE OR REPLACE PROCEDURE syntax introduced with Db2 12 function level 507 can enable greater agility when it comes to deploying Db2 stored procedure programs, especially those written in SQL PL.
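For example (illustrative names; assumes function level 507 or higher is activated and the statement's APPLCOMPAT level allows it), re-issuing the statement below with a modified body replaces the existing procedure in place - no DROP, re-CREATE and re-GRANT dance required:

```sql
CREATE OR REPLACE PROCEDURE MYSCHEMA.GET_GREETING (OUT P_MSG VARCHAR(40))
  LANGUAGE SQL
BEGIN
  SET P_MSG = 'Hello from the latest deployment';
END
```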

And, you should be managing SQL PL source code (that would be, in the case of native SQL procedures, the associated CREATE PROCEDURE statements) with a source code management (SCM) tool - the SYSROUTINES table in the Db2 catalog is not intended to be a SQL PL SCM. Which SCM? Whichever one(s) your organization's developers use to manage their source code - that could be a vendor-supplied SCM or an open-source tool such as Git.

Using the right (no-charge) Db2 SQL development tool

If you (or others in your organization) are using IBM Data Studio for Db2 for z/OS SQL testing and for SQL PL routine development and debugging, it's time for a change. IBM's strategic replacement for Data Studio is the (also no-charge) Db2 for z/OS Developer Extension for Visual Studio Code (also available for the Eclipse Theia IDE).

Leveraging the Db2 profile tables

The Db2 for z/OS profile tables - SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES - can be very helpful for a DRDA requester application workload. For one thing, they can be used to specify application-specific limits on concurrent DBAT (DDF thread) usage and/or connections established with a Db2 system and/or idle thread time - handy when the system-wide DBAT, connection and idle thread limits established via the ZPARM parameters MAXDBAT, CONDBAT and IDTHTOIN are not as granular as you need them to be. The Db2 profile tables can also be used to set the value of a number of Db2 special registers and/or built-in global variables, automatically when an application connects to the Db2 system. One example of this kind of profile table usage is setting the value of the CURRENT PACKAGE PATH special register to point a DRDA requester application to a collection in which the IBM Data Server Driver packages are bound with RELEASE(DEALLOCATE), as a means of getting high-performance DBAT functionality for the application.
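Here's a sketch of the CURRENT PACKAGE PATH scenario just mentioned (profile ID, authorization ID and collection name are illustrative). Note that profiles take effect when the -START PROFILE command is issued on the Db2 system.

```sql
-- Define a profile that matches connections using a given auth ID:
INSERT INTO SYSIBM.DSN_PROFILE_TABLE
  (PROFILEID, AUTHID, PROFILE_ENABLED)
VALUES (101, 'APPUSER1', 'Y');

-- Have Db2 set CURRENT PACKAGE PATH automatically for matching
-- connections, pointing them at a collection in which the IBM Data
-- Server Driver packages are bound with RELEASE(DEALLOCATE):
INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
  (PROFILEID, KEYWORDS, ATTRIBUTE1)
VALUES (101, 'SPECIAL_REGISTER',
        'SET CURRENT PACKAGE PATH = ''HIPERF_COLL''');
```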

Db2 for z/OS DBAs being heavily involved in application development

In way too many cases, DRDA requester or Db2 REST interface-using applications are developed with little Db2 DBA involvement, until late in the game when a production deadline is looming and physical implementation of tables is done in a rushed and sub-optimal way. Logical database design may also have happened with little DBA input, with negative consequences down the road. This situation is typically not a result of application developers giving Db2 DBAs the cold shoulder. Rather, my observation has been that some Db2 for z/OS DBAs view developers as a nuisance or an irritant - as "them." Wrong mindset. Way wrong. Db2 for z/OS DBAs maximize the value they deliver to an organization when they team with developers at the very early stages of an application development project. Not only can that help to ensure a logical and physical database design that will deliver optimal benefits for the application (and for application users), it also provides an opportunity for DBAs to ensure that developers are aware of Db2 features - temporal data support, transparent archiving, the REST interface, autonomous native SQL procedures, non-traditional data types (e.g., XML), global variables, newer built-in functions (e.g., LISTAGG), advanced triggers, whatever - that could enable and accelerate development of functionality of importance for an application. My advice is for Db2 for z/OS DBAs to think of themselves as part of the extended development team for Db2-accessing applications. That approach can be especially effective for modern Db2 client-server applications.

I hope that the information in this blog entry will be useful for you. As always, thanks for stopping by.

Sunday, June 26, 2022

Db2 for z/OS: The Online Path from a Partition-by-Growth Table Space to Partition-by-Range

Last month, I posted to this blog an entry on the long-awaited capability to migrate tables, in an online way, from a multi-table segmented or simple table space to multiple single-table partition-by-growth table spaces (a capability delivered with function level 508 of Db2 12 for z/OS). This month, I'll describe an even newer Db2 feature that also enables online migration of tables from one table space type to another. This feature, introduced with function level 500 of Db2 13 for z/OS, allows a DBA to migrate a table from a partition-by-growth table space to a partition-by-range table space with an ALTER TABLE statement and a subsequent online REORG. Read on to learn more.

The motivation

When the universal partition-by-growth (PBG) table space type was introduced with Db2 9 for z/OS, the response among a lot of DBAs was very enthusiastic, and there was good reason for this: it enabled a table to grow beyond 64 GB in size without having to be range-partitioned. Range-partitioning a table, after all, requires a good bit of up-front analysis. What should be the table's partitioning key? How many partitions should the table have? What should be the limit key value for each partition? By contrast, a PBG table space has more of a "set it and forget it" quality - you just determine the appropriate DSSIZE value for the table space (the maximum size for a partition of the table space), and a maximum number of partitions (the MAXPARTITIONS specification - easily changed at a later time if need be), and you're done. If the table space's DSSIZE value is, for example, 16G (i.e., 16 GB), when partition 1 reaches that size then Db2 will automatically add a second partition for the table space, and when that one hits 16 GB then a third partition will be added by Db2, and so on. Easy.

Ah, but there came to be some "buyer's remorse" at more than a few Db2 for z/OS sites as certain PBG table spaces got larger and larger. Why? Because the larger a table gets the more advantageous it can be to have the table in a partition-by-range (PBR) table space. I described these PBR versus PBG advantages (for large tables) in an entry I posted to this blog a few years ago. They include potentially greater (maybe much greater) insert throughput, thanks to the ability to have multiple row-inserting processes execute concurrently for different partitions of the PBR table space; great suitability for data rows managed on a time-series basis; and maximum partition-level utility independence. Here was the dilemma, though: prior to Db2 13, the only way to get a table from a PBG to a PBR table space was to unload the table, drop the table, re-create the table in a PBR table space, and reload the table's data. You had, then, this irritating situation: the advantages of PBR versus PBG would be more pronounced as a table got larger, but changing from PBG to PBR was more challenging as a table got larger, due to the unload/drop/re-create/re-load requirement.

Enter Db2 13, and this situation changes, big-time.

The Db2 13 difference

Let's say you have table BIGTAB in a PBG table space, and you'd really like for BIGTAB to be in a PBR table space. In a Db2 13 system (Db2 13 became generally available on May 31 of this year), with function level 500 (or higher) activated, you can issue an SQL statement of the form shown below (the new syntax is the ALTER PARTITIONING TO PARTITION BY RANGE clause, and I am assuming that the ACCT_NUM column of BIGTAB is the desired partitioning key):

  ALTER TABLE BIGTAB
    ALTER PARTITIONING TO
    PARTITION BY RANGE (ACCT_NUM)
     (PARTITION 1 ENDING AT (199),
      PARTITION 2 ENDING AT (299),
      ...
      PARTITION n ENDING AT (MAXVALUE));

That ALTER is a pending change. When an online REORG is subsequently executed for BIGTAB's table space, coming out of that online REORG the BIGTAB table will be in a PBR table space. Done. The table will have the same indexes that it had before, and it'll be immediately available for access by users and programs.
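The materializing REORG is just a standard online REORG of the table space. In utility-control-statement terms, it would be something like the following sketch (the database and table space names here are made up for illustration):

```sql
-- Online REORG that materializes the PBG-to-PBR pending change
-- (database and table space names are hypothetical)
REORG TABLESPACE DB1.TSBIGTAB SHRLEVEL CHANGE
```

SHRLEVEL CHANGE is what makes this an online REORG - the data remains available for read and write access while the utility does its work.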

Some additional information, and considerations

Here are a few things to keep in mind:

  • The partitions of the new PBR table space will initially have the same DSSIZE as the PBG table space that's being migrated to PBR, and that's kind of important. Here's why: when you're determining the partitioning scheme for the new PBR table space you need to consider whether all the rows that belong to a given partition (per the partitioning scheme of the PBR table space) will fit in the partition, given the DSSIZE. Suppose, for example (and continuing with the BIGTAB table referenced previously), that the DSSIZE value for BIGTAB's PBG table space is 4G, and the number of rows in BIGTAB with an ACCT_NUM value greater than 199 and less than or equal to 299 (i.e., rows that would go into partition 2 of the new PBR table space) will not fit into a 4 GB data set. In that case the online REORG after the ALTER will fail. To avoid that failure, you'd need to either change the partitioning scheme so that the rows assigned to a given partition will fit in a 4 GB data set, or change the DSSIZE value of BIGTAB's PBG table space to something larger than 4G. If you decide on the latter action (increase the DSSIZE value for BIGTAB's PBG table space), understand that you'll need to issue the ALTER for the table space (to go to a larger DSSIZE value), execute an online REORG to materialize that change, then issue the ALTER to change from PBG to PBR, and execute another online REORG to materialize that pending change. Why two online REORGs? Because, when you take action to change a PBG table space to PBR the online way, there can't be any other outstanding (i.e., not yet materialized) pending changes for the PBG table space - the change to PBR has to be the only pending change for the PBG table space.
  • The new PBR table space will use relative page numbering (RPN), which was introduced with Db2 12 for z/OS. This is a very good thing. To see why, check out the blog entry I wrote about RPN a couple of years ago.
  • As is very often the case when a pending change is materialized, the online REORG that changes a PBG table space to PBR will invalidate packages dependent on the associated table. You can identify those dependent packages by querying the SYSPACKDEP table in the Db2 catalog.
  • This ALTER + online REORG route from PBG to PBR is not available for a table that has an XML or a LOB column.
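For that package-identification task, a catalog query along these lines can do the job (the table owner and name here are hypothetical - substitute your own):

```sql
-- Find packages dependent on table BIGTAB (owner BIGSCHEMA),
-- so rebinds can be planned ahead of the materializing online REORG.
-- BTYPE = 'T' restricts the result to table dependencies.
SELECT DCOLLID, DNAME
  FROM SYSIBM.SYSPACKDEP
 WHERE BQUALIFIER = 'BIGSCHEMA'
   AND BNAME = 'BIGTAB'
   AND BTYPE = 'T';
```

Running a query like this before the REORG lets you line up the rebinds as part of the change window, rather than being surprised afterwards.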

A closing thought

If you are on Db2 12 and you have some PBG table spaces that you'd like to change - easily, and in an online way - to PBR, the enhancement I've described herein could be a good reason for getting your Db2 13 migration project going.

Monday, May 30, 2022

Db2 for z/OS: The Online Path from a Multi-Table Table Space to Universal Table Spaces

Back in 2014, I posted an entry to this blog on the topic of getting to universal table spaces from non-universal table spaces. In that entry, I noted that there was an online path (ALTER followed by online REORG) for getting from a "classic" partitioned table space to a universal partition-by-range (PBR) table space, and for getting from a single-table simple table space or traditional segmented table space to a universal partition-by-growth (PBG) table space. I also pointed out that no such online path to universal table spaces existed for a multi-table table space: "For a simple or segmented table space containing multiple tables, you'd have to either go the unload/drop/create/re-load route (with one PBG universal table space created for each table in the multi-table simple or segmented table space), or wait and see if a future release of Db2 provides a non-disruptive universal table space conversion process for multi-table simple and segmented table spaces (this is a known requirement)." The wait for the hoped-for Db2 enhancement ended in October of 2020, when Db2 12 function level 508 became available (via the fix for APAR PH29392). In this blog entry, I will describe how a capability introduced with Db2 12 function level 508 enables online migration of tables from multi-table table spaces to universal PBG table spaces.

For illustrative purposes, let's say that you have a traditional segmented table space containing four tables (I say, "traditional segmented table space" because universal table spaces are also segmented). The tables are named T1, T2, T3 and T4. You have function level 508 (or later) activated on your Db2 12 system (or you have a Db2 13 system). How do you get tables T1, T2, T3 and T4 from the traditional segmented table space into universal PBG table spaces, in an online way? Here's how:

  • First, create a table space for each of the tables T1, T2 and T3 (I'll get to T4 momentarily). Here's the form of the CREATE TABLESPACE statement you should use for this purpose (let's assume that the new table spaces will be named TS1, TS2, etc.):

  CREATE TABLESPACE TS1
  IN dbname
  MAXPARTITIONS 1
  DSSIZE xxx
  DEFINE NO;

Note: the database for each of these new PBG table spaces will need to be the same as the database of the multi-table table space of interest (similarly, the CCSID of each of the new table spaces will have to be the same as the CCSID of the multi-table table space of interest). Also, MAXPARTITIONS 1 is required, at least initially (you can change the MAXPARTITIONS value later if desired). DEFINE NO is also required (the table space data sets will be created by Db2 later). DSSIZE can be any value appropriate for the tables that will be moved to the new table spaces (consider that 64G would work for any table, since a traditional segmented table space cannot exceed 64 GB in size).

  • Next, issue the following ALTER statement for the multi-table traditional segmented table space (the new ALTER TABLESPACE option introduced with Db2 12 function level 508 is the MOVE TABLE clause):

  ALTER TABLESPACE dbname.source-table-space-name
    MOVE TABLE T1
    TO TABLESPACE dbname.TS1;

Note that this is a pending change for the source table space - the actual table move will be effected via a subsequent online REORG of the source table space, as explained below. The statement above would be executed as well for tables T2 and T3 (I haven't forgotten about table T4 - I'll get to that). Keep in mind that, starting with Db2 12, the APPLCOMPAT package bind specification applies to DDL as well as DML statements. What does that mean for the ALTER TABLESPACE statement shown above? It means that the package through which the ALTER is issued (e.g., a DSNTEP2 package, or a SPUFI package) needs to have an APPLCOMPAT value of V12R1M508 or higher.
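If you're not sure about the APPLCOMPAT level of, say, your SPUFI packages, a catalog query like this sketch can tell you (DSNESPCS is a commonly used SPUFI collection name - adjust for your site):

```sql
-- Check the APPLCOMPAT level of packages in a collection
-- (collection name is an assumption - use your own)
SELECT COLLID, NAME, APPLCOMPAT
  FROM SYSIBM.SYSPACKAGE
 WHERE COLLID = 'DSNESPCS';
```

If the APPLCOMPAT value comes back lower than V12R1M508, a REBIND with APPLCOMPAT(V12R1M508) or higher is needed before the MOVE TABLE ALTER can be issued through that package.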

  • OK, online REORG time. An online REORG executed for the source table space (the one in which tables T1, T2, T3 and T4 had been located) will cause each table for which an ALTER TABLESPACE with MOVE TABLE has been executed to be relocated to its designated PBG table space. When the online REORG has completed, each relocated table will be ready to use - it will have its indexes and everything.

So, what about table T4? You have a choice here. One option would be to do for table T4 what you did for tables T1, T2 and T3: create a new PBG table space for the table, and execute an ALTER TABLESPACE with MOVE TABLE T4. And the other option? Well, consider the situation after you've moved tables T1, T2 and T3 to their respective PBG table spaces. The source table space, which formerly held four tables, now holds only one table: T4. What does that mean? It means that you can alter the source table space with a MAXPARTITIONS value and then online REORG it to convert it to a PBG table space - you've been able to do that for a single-table traditional segmented table space or a single-table simple table space since Db2 10 for z/OS.
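In DDL terms, that second option for T4 could look like the following (database and table space names are made up for illustration; the subsequent online REORG materializes the pending change):

```sql
-- Convert the now-single-table source table space to PBG
-- (names are hypothetical)
ALTER TABLESPACE DB1.TSSOURCE
  MAXPARTITIONS 1;

-- Then materialize the pending change:
-- REORG TABLESPACE DB1.TSSOURCE SHRLEVEL CHANGE
```

Either way, you end up with all four tables in their own PBG table spaces - the choice mostly comes down to whether you want T4's table space to keep its current name and data set characteristics.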

Here are a few things to keep in mind with regard to online migration of tables from a multi-table table space to PBG table spaces:
  • Moving tables from multi-table table spaces to PBG table spaces is likely to mean an increase in the number of data sets for the Db2 system. Given that a table's database can't change when it goes from a multi-table table space to a PBG table space, you'll want to make sure that the number of OBIDs (object identifiers) for the database in question will not exceed the limit of 32,767. A page in the Db2 for z/OS documentation contains a query that you can issue to identify databases that might be getting close to the OBID limit.
  • More data sets could also lead you to increase the size of the DBD cache in the EDM pool (the associated ZPARM parameter is EDMDBDC), and/or to increase the Db2 subsystem's DSMAX value. Regarding the DBD cache, you generally want the ratio of "DBD requests" to "DBD not found" (referring to fields in a Db2 monitor-generated statistics long report, or in an online display of EDM pool activity) to be at least in the tens of thousands to one. As for DSMAX, you usually want that value to be sufficiently high so as to either not be reached (per your Db2 monitor) or so that only a few data sets per hour are closed as a result of hitting the DSMAX limit (again, per your Db2 monitor).
  • An online REORG that materializes a MOVE TABLE pending change will invalidate packages that depend on the table or tables being moved, so plan for rebinding those packages (the SYSPACKDEP catalog table contains information to identify dependent packages).
  • Additionally, an online REORG that materializes a MOVE TABLE pending change will operate on both the source and target table spaces; accordingly, inline image copies will be created for those table spaces. These will establish a recovery base for the objects, but note that after execution of the table-moving online REORG you will not be able to recover the source table space to a point in time prior to the table-moving online REORG.
  • Don't worry if a source table space holds hundreds of tables (as some do in the real world) - there's no requirement that all the tables be moved to PBG table spaces in one fell swoop. You can move a few at a time, no problem. Just keep in mind that an online REORG of the source table space will move every table that has not already been moved and for which there is a pending MOVE TABLE change.
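On the package invalidation point: besides checking SYSPACKDEP ahead of time, you can find already-invalidated packages after the fact with a query like this sketch (add predicates for the collections of interest at your site):

```sql
-- Packages marked invalid (VALID = 'N') - candidates for rebind
SELECT COLLID, NAME, LASTUSED
  FROM SYSIBM.SYSPACKAGE
 WHERE VALID = 'N';
```

Invalidated packages will generally be rebound automatically by Db2 on next execution (per the ABIND setting in ZPARM), but an explicit, planned rebind lets you control the timing.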

There you have it. Universal table spaces are what you want, and you now have an online way to get there for your multi-table table spaces. Happy moving.