Friday, September 29, 2017

Db2 for z/OS: Lock Avoidance

Not long ago, I served as part of the teach team for a Db2 12 for z/OS Technology Workshop that IBM offered in a city in the USA. During that workshop, I covered, among other things, a Db2 12 enhancement that can improve the CPU efficiency of a Db2 data sharing system by reducing locking activity in that environment. During a break, one of the workshop attendees told me (in words to this effect), "That lock avoidance enhancement is good news, but I think a fair number of Db2 for z/OS people are a little uncertain as to what lock avoidance is - and how it works - in the first place." I think this person was right on that score, and that's why I'm writing this blog entry: to facilitate understanding of the fundamentals of Db2 lock avoidance. In a follow-on entry that I'll try to post within the next couple of weeks, I'll cover the aforementioned Db2 12 data sharing-related lock avoidance enhancement.

OK, let's start with the "why" of Db2 lock avoidance. In support of a large application workload, Db2 does a lot of stuff. One thing that Db2 does about as much as anything else is issue lock and unlock requests. To see what I mean, take a look at a statistics long (aka statistics detail) report, generated by your Db2 monitor, covering a period of high activity on the system (or check the monitor's online display of lock activity). I looked at a report just now, reflecting activity in a large real-world production Db2 for z/OS subsystem, and what does it show? 106 million lock requests and 23 million unlock requests in one hour. That's about 30,000 lock requests per second, and about 6400 unlock requests per second, on average (there are more lock than unlock requests because multiple X-locks, acquired by an application process in the course of modifying data values, are released at a commit point with one unlock request). Now, the workload associated with all that lock activity is a big one, driving in excess of 700 million SQL data manipulation statements during the hour between the report's FROM and TO times, but that kind of volume is by no means unusual for a large Db2 site. Why the big lock request number? Two words: data integrity. Locks acquired on data pages or rows (depending on the granularity of locking in effect for a table space) help to ensure that data consistency is maintained in a Db2 for z/OS system, and that programs retrieve only committed data from the database (i.e., values that have been "hardened" in the database in the wake of data-change operations).

One lock request consumes very little in the way of CPU cycles, but tens of thousands of lock requests per second? That's another matter. When the execution volume for a given Db2 operation is really high, there is CPU-efficiency value in reducing the frequency of said operation. The Db2 development team had that in mind when, more than two decades ago, they delivered a capability called lock avoidance. Essentially, lock avoidance enables Db2 to reduce lock activity for an application workload by issuing S-lock requests (i.e., data-read locks associated with execution of queries) only when such locks are needed to ensure retrieval of committed data values.

How does Db2 know when an S-lock on.a page or row is needed to guarantee retrieval of committed data values (i.e., to avoid returning uncommitted data changes to a program)? It knows through the use of two lock-avoidance indicators: CLSNs and PUNC bits.

  • CLSNs - This acronym stands for "commit log sequence number." In every page of every Db2 page set, Db2 records the log point corresponding to the most recent update of the page's contents. Additionally, for every page set or partition (for partitioned objects), Db2 keeps track of the starting log point of the oldest still-in-flight unit of work that is changing data in the page set or partition. That latter log point is the commit log sequence number of the page set or partition. When going to retrieve data in a page, Db2 can check the page's last-updated log point and compare that to the CLSN of the page set or partition. Using simple numbers for example, suppose that a page's last-updated log point (information that, again, is stored in the page) is 20, and the CLSN of the page set or partition in which the page is located is 30. Because the starting log point of the oldest still-in-flight unit of work affecting the page set or partition is higher than (meaning, is later in time than) the log point of the last update of the page, Db2 knows that all the data in the page is in a committed state, so guaranteed-committed data can be retrieved from the page with no need for S-lock acquisition on the page (or on rows therein). When the  oldest still-in-flight data-changing unit of work affecting a page set or partition commits, the CLSN of the page set or partition moves up to the starting log point of what had previously been the next oldest still-in-flight data-changing unit of work affecting the page set or partition. Even if (using simple numbers as before) the last-updated log point of a page is seen to be 50, and the CLSN of the associated page set or partition is 40 (meaning that data in the page might have been updated by a still-in-flight data-changing unit of work), lock avoidance may still be possible for a data retrieval operation targeting the page, thanks to PUNC bits.  
  • PUNC bits - PUNC is short for "possibly uncommitted." Included in the control information on every Db2 data page and every index page are bits that indicate whether or not a row contains possibly uncommitted data (in the case of an index page, the PUNC bits are associated with the RIDs, or row IDs, that point to rows in the underlying Db2 table). When a row is changed, its PUNC bit is set. That being the case, when Db2 examines a row (or its RID in an index) and sees that the PUNC bit is NOT set, Db2 knows that the data in the row is committed, and the data can be retrieved without the need for an S-lock on the data page or row to ensure data committed-ness. So, why is this indicator called the "possibly" uncommitted bit, as opposed to the "for sure uncommitted" bit? Because Db2 does not synchronously reset a "turned on" PUNC bit when the data change that caused the bit to be set is committed - doing that would have an unacceptable overhead cost. Instead, PUNC bits that are turned on as a result of data-change activity are reset asynchronously, in the background, when certain events happen (one such event is execution of the REORG utility for a table space; another is when more than 25% of the rows in a page have their PUNC bits turned on and the page set's or partition's CLSN advances). Because of the asynchronous nature of PUNC bit resetting, relative to the turning on of a PUNC bit, the turned-on PUNC bit setting can only be interpreted as meaning, "Maybe the data in this row is committed and the PUNC bit hasn't been reset, or maybe the data is in fact not committed." When an application program wants only committed data values to be retrieved by queries (i.e., when isolation level UR, short for "uncommitted read," is not in effect for the program), "maybe" isn't good enough, and Db2 will request an S-lock on the row or page to ensure the committed state of data values (successful acquisition of an S-lock means that the page or row is not X-locked, and that means the row or page contains only committed data).

I want to impart now a couple more items of information pertaining to Db2 for z/OS lock avoidance. First, lock avoidance can be utilized to the maximum extent possible when a Db2 package is bound with ISOLATION(CS) - short for cursor stability - and CURRENTDATA(NO). Second, what I have described in this entry is lock avoidance as it occurs in a non-data sharing Db2 system. Lock avoidance in a Db2 data sharing environment - including the Db2 12 enhancement referenced at the start of this entry - will be the subject of my next post to this blog. I hope to have that written within the next two weeks, so check back in if you use - or are interested in - Db2 data sharing.

11 comments:

  1. What do we mean by transaction locks ? This link tells me about lock size,duration ,lock mode but I do not see anything about transaction lock in specific.
    https://www.ibm.com/docs/hr/db2-for-zos/12?topic=locks-transaction

    ReplyDelete
    Replies
    1. That's just a term that broadly applies to locks that are acquired in the processing of a Db2 for z/OS-accessing transaction.

      Robert

      Delete
    2. When we issue "-DIS DB() TS() CLAIMERS " we see locks acquired for resource.
      what does it mean by
      "Claimer's are different from transaction locks" .How is it different?

      Delete
    3. Locks control concurrency of access to data. Claims indicate intent of data access ("I intend to read data in this page set or partition," or "I intend to change data in this page set or partition"). Claims (and drains, an associated Db2 for z/OS concept) are explained on this page of the Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=concurrency-claims-drains.

      Robert

      Delete
    4. Thanks for the info

      Delete
  2. For locksize ANY defined in table structure-How /on what basis DB2 decide if it should take page or row. how do this gets controlled?

    ReplyDelete
    Replies
    1. I'm not aware that that information is documented. My understanding is that LOCKSIZE(ANY) will cause Db2 to choose page-level locking for child rows, but Db2 "reserves the right" in that case to choose row-level locking. I do not know of a scenario that would cause Db2 to choose row-level locking when LOCKSIZE(ANY) is in effect.

      Robert

      Delete
    2. When I issue DIS DB() TS() LOCKS command ,we only see below entry

      NAME PART TYPE STATUS CONNID CORRID LCKINFO
      TS1 01 TS RW TSO USER1 H-IX,P,C

      From this info,how do we know which is parent lock and child lock .
      How should we interpret this?


      Delete
    3. -DISPLAY DATABASE with the LOCKS option will show parent lock information. The meaning of the information in the output is explained on this page in the Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=messages-dsnt361i#dsnt361i__lock. On that page, if you do a search on the string 'status-lock', that will take you right to the part of the page that explains the meaning of data in the LOCKINFO part of the output.

      Robert

      Delete
    4. If "LOCK STATUS" is the parent lock info ,where can we find the child lock info?
      Is this something that can be found by doing computations from lock modes and compatibility table related to this parent lock?

      Delete
    5. You won't find child lock information in the output of a Db2 -DISPLAY command - that would be unwieldy, as a given process could hold thousands of child locks at one time. There might be a Db2 performance trace record that would provide child lock information, but that would probably be a trace with significant CPU time that you would not want to incur.

      Child lock type is generally quite straightforward, and is based on the SQL DML statement in question: data-changing statements (INSERT, UPDATE, DELETE) will get X child locks for changed pages or rows; read-only statements (e.g., read-only SELECT statements) will get S child locks on pages or rows accessed; a cursor defined with FOR UPDATE will get U child locks on rows or pages accessed, and those U locks will be converted to X locks if rows are changed by way of the cursor. The detailed information about child locks acquired for various SQL DML statements, with various LOCKSIZE and ISOLATION values in effect, is found in the tables on this page in the Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/13?topic=types-locks-acquired-sql-statements.

      Robert

      Delete