Wednesday, December 21, 2022

Db2 13 for z/OS: Setting Lock Timeout Limit and Deadlock Priority at the Application Level

Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.


Application-level lock timeout

First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. How can those disparate concerns be addressed?

They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:

  • The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value if -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained immediately (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).
  • If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.
  • If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.
  • The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, and not just for DDF-using applications (more information on this is provided below).

Application-level deadlock priority

A deadlock, of course, happens when process A holds a lock that process B needs in order to proceed, and process B holds a lock that process A needs in order to proceed. With both processes in a not-able-to-proceed state, Db2 detects the deadlock and chooses a "winner" and a "loser." The "loser" process is rolled back, causing it to release locks it had held, and that enables the other process (the "winner") to acquire the lock for which it had been waiting.

All well and good, except for the fact that one traditionally has been able to do little to nothing to influence Db2's choice of winner and loser in deadlock situations. That changes starting with function level 501 of Db2 13, thanks to a new built-in global variable named DEADLOCK_RESOLUTION_PRIORITY.

Before providing some helpful (I hope) items of information about DEADLOCK_RESOLUTION_PRIORITY, let me point out a very important difference between this new feature and the previously-described CURRENT LOCK TIMEOUT: the latter is a special register, while the former is a global variable. Why is that notable? One simple reason: any process can set the value of a special register, but a process must have permission to set the value of a global variable. The rationale for making DEADLOCK_RESOLUTION_PRIORITY a global variable may already be clear to you: if a value for DEADLOCK_RESOLUTION_PRIORITY could be set by any process, one could imagine everyone setting the global variable to its maximum value ("I always want to be the winner in a deadlock situation"), and that would defeat the purpose of the new capability (as the bad guy, Syndrome, in the movie The Incredibles put it, "When everyone is super, no one will be"). The permission-only nature of DEADLOCK_RESOLUTION_PRIORITY means that (for example) a Db2 DBA can assign the max-priority value to a "must-complete" database administration process, and know that under almost any circumstances (exceptions noted below) the process will be the winner in case of a deadlock. The same could be done - with permission received from a DBA - for a high-priority application process.

OK, items of related information that might be good for you to know:
  • The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).
  • If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).
  • I mentioned previously that there are exceptions to the "255 always wins" rule. Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.

Setting the lock timeout limit or deadlock priority automatically for an application

What if you want an application to have a certain lock timeout limit or a certain deadlock priority, but you don't want the application to have to issue a SET CURRENT LOCK TIMEOUT or a SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY statement in order to accomplish the objective (SYSIBMADM is the schema for user-set-able built-in Db2 global variables - SYSIBM is the schema for built-in global variables that are set by Db2)? No problem: you can get that done using the Db2 profile tables. "Yeah," you might say, "but I want to do this for a local-to-Db2 application, and the profile tables can be used to set special register or built-in global variable values only for DDF-using applications." Actually, with Db2 13 that statement is no longer entirely true. Db2 13 allows the setting of CURRENT LOCK TIMEOUT (starting with function level 500) and DEADLOCK_RESOLUTION_PRIORITY (starting with function level 501) via profile table entries for local-to-Db2 as well as for DDF-using applications (for other special registers and built-in global variables, value-setting by way of the profile tables remains do-able only for DDF-using applications).

For a DDF-using application, the profile-defining specification (what you put in SYSIBM.DSN_PROFILE_TABLE) can be in a Db2 13 environment what it could be in a Db2 12 environment (the auth ID an application uses when connecting to the Db2 system is one example; the IP address of an application server is another example). For a local-to-Db2 application, the profile-defining specification can be auth ID and/or role, or collection name and/or package name, or the value of CLIENT_APPLNAME or CLIENT_USERID or CLIENT_WRKSTNNAME. The online Db2 13 for z/OS documentation provides additional details on using the profile tables to set values for special registers and for built-in global variables.

And there you have it. Db2 13 provides more control - and more-granular control - over two important aspects of application execution. I hope that these new capabilities will be useful at your site.

6 comments:

  1. Hi Rob,while running reorg shrlevel reference job,job failed with error stating sysrec dd card is required.I dont see in any manual,stating sysrec is mandatory one.whereas shrlevel change job runs fine without sysrec dd card.can you share your understanding on this

    ReplyDelete
    Replies
    1. I don't see that SHRLEVEL REFERENCE versus SHRLEVEL CHANGE would have anything to do with the need (or lack of need) for an unload data set when executing REORG TABLESPACE. What was the actual error message generated when the REORG job failed?

      With REORG TABLESPACE, it's not so much "Do I need a SYSREC, or not?" The more important question is, "Do I need an unload data set, or not?" The documentation (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=tablespace-syntax-options-reorg-control-statement) specifies when you can (and can't) use the NOSYSREC option, which tells REORG that you don't want the utility to use an unload data set (in that case, the output of sorting is the input to reload). If you don't specify NOSYSREC, you'll need to have an unload data set. That data set might or might not be named 'SYSREC'. SYSREC is just the default name for the unload data set. You could provide a different name for the unload data set via the UNLDDN option of REORG TABLESPACE.

      Robert

      Delete
  2. Hi rob,

    Good morning.
    1) to load table which has child table attached to it
    2)load a table which has parent tables attached to it.
    Should i unload and load all ri tables,but isn't time consuming if there are more ri tables?What is the best approach /possible solutions

    ReplyDelete
    Replies
    1. The "best" solution depends on the situation. If you are certain that the LOAD operations on the parent and dependent table spaces will not lead to any RI constraint violations (you might, for example, already have the data in "test" tables and you have verified that there are no RI constraints with those tables, and you are just going to unload from the "test" tables and then load the RI-verified data into the "real" parent and dependent tables) then you can load both table spaces at the same time and specify NOCHECKPEND for both LOAD jobs, and you're done.

      If there is any doubt as to whether the LOAD operations could lead to "orphan" rows in the dependent table, you can still load both table spaces concurrently with ENFORCE NO. The parent table will be fine, but the dependent table will be in check-pending status. You can then run CHECK DATA for the dependent table to see if there are any "orphan" rows in that table. CHECK DATA can delete orphan rows that it identifies, or you can delete those rows in a subsequent operation (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=utility-check-pending-status-after-running-load).

      If the parent table is itself a dependent table in one or more RI relationships, you can still load both table spaces concurrently with ENFORCE NO, but in that case they will both be marked check-pending, and you'd run CHECK DATA for the table spaces to clear that status (or to identify rows in violation of constraints).

      Hope this helps.

      Robert

      Delete
    2. Hi rob,thanks for your response.
      I understand that running chk data will automatically remove orphan rows which need not have delete yes/exception keyword to have this deleted.
      You have mentioned che k data will delete invalid rows,will it get stored somewhere implicitly.or only if I mention exception with delete yes ,it will get stored?
      You have mentioned orphan rows can be deleted by chk data or delete by subsequent operation-what is that? How can I do that.is it manual delete?

      Delete
    3. If you go to the Db2 for z/OS online documentation page for the CHECK DATA utility (https://www.ibm.com/docs/en/db2-for-zos/12?topic=utilities-check-data), and you look at the text under the heading, "Output," you will see this: "CHECK DATA SHRLEVEL REFERENCE optionally copies and deletes rows that violate referential or table check constraints. The utility copies each row that violates one or more constraints to an exception table."

      Robert

      Delete