Db2 13 for z/OS, which became generally available about seven months ago, introduced two interesting features that are similar in some ways but differ in one important aspect (about which I'll comment momentarily). These new features allow an application (or, more broadly, a "process") to set its own lock timeout limit and/or its own deadlock priority. With this blog entry I aim to provide related information that will be useful for you.
Application-level lock timeout
First, let's establish the need for this Db2 13 enhancement. Historically, there has been one lock timeout limit - specified via the IRLMRWT parameter in ZPARM - that applies to all processes interacting with a Db2 subsystem. While IRLMRWT is still there in a Db2 13 environment, it became apparent some time ago that "one size fits all" will often NOT be ideal when it comes to lock timeout in a Db2 system. Think about it. Suppose the value of IRLMRWT is at the default of 30 seconds for a production Db2 system at your site. You might have a developer of a Db2-accessing online application say, "What? NO! This app has a mobile front-end and users can get VERY frustrated if they have to wait more than a few seconds for a transaction to complete. It would be TERRIBLE to have a transaction sit and wait for 30 seconds to get a Db2 lock. We need the lock timeout value to be WAY lower than 30 seconds." At the same time, a developer of a long-running batch application might say, "What? NO! This job HAS to complete once it gets started or we miss SLAs and have angry customers. The job typically runs for five hours, and maybe it's been running for four hours and you want to time it out because it's been waiting for a lock for 30 seconds? 30 seconds is NOTHING as far as this job's concerned. The Db2 lock timeout value should be SUBSTANTIALLY greater than 30 seconds." Both of the developers are expressing legit concerns. How can those disparate concerns be addressed?
They can be addressed via the new (with Db2 13) special register named CURRENT LOCK TIMEOUT (available for use when Db2 13 function level 500 has been activated). Here are some things to know about CURRENT LOCK TIMEOUT:
- The value of the special register - expressed in seconds - can be anything between -1 and 32767 (or a site-specified upper bound - see the next item in this list). A value if -1 means that the process will not be timed out if it ends up waiting for a lock - it will wait until it gets the requested lock or becomes deadlocked with some other process. A value of 0 means that the process does not want to wait at all for a lock - it wants to get an error message if a requested lock can't be obtained immediately (this basically makes available for application use a formerly Db2-internal mechanism known as a conditional lock request).
- If the default upper-limit value of 32767 seconds is deemed by a Db2-using organization to be too high, a different max value can be provided via the new (with Db2 13) ZPARM parameter SPREG_LOCK_TIMEOUT_MAX. If you set that value to (for example) 1800, no process will be able to set the CURRENT LOCK TIMEOUT special register to a value greater than 1800 seconds.
- If a lock timeout occurs and an application-level timeout limit was in effect for the lock requester and/or for the lock holder, that will be reflected in the information provided via the DSNT376I lock timeout message generated by Db2.
- The value of the CURRENT LOCK TIMEOUT special register can be set automatically for an application by way of the Db2 profile tables, and not just for DDF-using applications (more information on this is provided below).
- The maximum value for DEADLOCK_RESOLUTION_PRIORITY is 255 (the range of acceptable values for the global variable is 0-255).
- If you're a Db2 DBA, you might think, "I have a process that I think of as 'should-complete,' versus 'must-complete.' I want that process to generally be the winner in a deadlock situation, but I don't want it to get in the way of a 'must-complete' process. If 255 is a good DEADLOCK_RESOLUTION_PRIORITY value for a 'must-complete' process, what would be a reasonable priority value for a 'should-complete' process?" There is not a totally straightforward answer to that question. What you could do is this: start with some value for the 'should complete' process (maybe 150, or maybe 200, for example), and see if it ends up becoming the loser in a deadlock situation. If that happens, you can see how the priority of the "winner" process compared to the priority that you assigned to your 'should-complete' process, and potentially adjust your process's priority accordingly. How could you see the deadlock priority of a process that "beat" your process? That information is available via the IFCID 172 Db2 trace record. Activating IFCID 172 should involve very little overhead, as the trace record captures information about deadlocks, and deadlocks tend to be unusual in most Db2 systems I've seen. By the way, you should be able to use your Db2 monitor to generate a report with formatted information from IFCID 172 trace records (if you use IBM's OMEGAMON for Db2 monitor, the report to use for this purpose is called the Record Trace Report - that report can format the information in most any Db2 trace record).
- I mentioned previously that there are exceptions to the "255 always wins" rule. Even if DEADLOCK_RESOLUTION_PRIORITY has been set to 255 for a process, that process could be the loser if it gets deadlocked with a process that is changing data in a table space defined with NOT LOGGED (hard for Db2 to roll back a unit of work when there are no associated undo records in the log), or if it gets deadlocked with a rollback or an abort or a backout process.
Hi Rob,while running reorg shrlevel reference job,job failed with error stating sysrec dd card is required.I dont see in any manual,stating sysrec is mandatory one.whereas shrlevel change job runs fine without sysrec dd card.can you share your understanding on this
ReplyDeleteI don't see that SHRLEVEL REFERENCE versus SHRLEVEL CHANGE would have anything to do with the need (or lack of need) for an unload data set when executing REORG TABLESPACE. What was the actual error message generated when the REORG job failed?
DeleteWith REORG TABLESPACE, it's not so much "Do I need a SYSREC, or not?" The more important question is, "Do I need an unload data set, or not?" The documentation (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=tablespace-syntax-options-reorg-control-statement) specifies when you can (and can't) use the NOSYSREC option, which tells REORG that you don't want the utility to use an unload data set (in that case, the output of sorting is the input to reload). If you don't specify NOSYSREC, you'll need to have an unload data set. That data set might or might not be named 'SYSREC'. SYSREC is just the default name for the unload data set. You could provide a different name for the unload data set via the UNLDDN option of REORG TABLESPACE.
Robert
Hi rob,
ReplyDeleteGood morning.
1) to load table which has child table attached to it
2)load a table which has parent tables attached to it.
Should i unload and load all ri tables,but isn't time consuming if there are more ri tables?What is the best approach /possible solutions
The "best" solution depends on the situation. If you are certain that the LOAD operations on the parent and dependent table spaces will not lead to any RI constraint violations (you might, for example, already have the data in "test" tables and you have verified that there are no RI constraints with those tables, and you are just going to unload from the "test" tables and then load the RI-verified data into the "real" parent and dependent tables) then you can load both table spaces at the same time and specify NOCHECKPEND for both LOAD jobs, and you're done.
DeleteIf there is any doubt as to whether the LOAD operations could lead to "orphan" rows in the dependent table, you can still load both table spaces concurrently with ENFORCE NO. The parent table will be fine, but the dependent table will be in check-pending status. You can then run CHECK DATA for the dependent table to see if there are any "orphan" rows in that table. CHECK DATA can delete orphan rows that it identifies, or you can delete those rows in a subsequent operation (see https://www.ibm.com/docs/en/db2-for-zos/12?topic=utility-check-pending-status-after-running-load).
If the parent table is itself a dependent table in one or more RI relationships, you can still load both table spaces concurrently with ENFORCE NO, but in that case they will both be marked check-pending, and you'd run CHECK DATA for the table spaces to clear that status (or to identify rows in violation of constraints).
Hope this helps.
Robert
Hi rob,thanks for your response.
DeleteI understand that running chk data will automatically remove orphan rows which need not have delete yes/exception keyword to have this deleted.
You have mentioned che k data will delete invalid rows,will it get stored somewhere implicitly.or only if I mention exception with delete yes ,it will get stored?
You have mentioned orphan rows can be deleted by chk data or delete by subsequent operation-what is that? How can I do that.is it manual delete?
If you go to the Db2 for z/OS online documentation page for the CHECK DATA utility (https://www.ibm.com/docs/en/db2-for-zos/12?topic=utilities-check-data), and you look at the text under the heading, "Output," you will see this: "CHECK DATA SHRLEVEL REFERENCE optionally copies and deletes rows that violate referential or table check constraints. The utility copies each row that violates one or more constraints to an exception table."
DeleteRobert