Tuesday, January 22, 2013

DB2 for z/OS Data Sharing: the Lock List Portion of the Lock Structure

With DB2's data sharing capability having been introduced back in the mid-1990s with DB2 for z/OS Version 4, a great many people -- including lots of folks whose organizations do not run DB2 in data sharing mode -- are quite familiar with the technology (for people new to this subject area, I posted, a few years ago, a "What is DB2 Data Sharing?" entry to the blog that I maintained while working as an independent DB2 consultant). Still, there are some aspects of DB2 data sharing technology that are more widely understood than others. It's recently come to my attention that among mainframe DB2 people there is a bit of a knowledge gap with regard to a particular component of a DB2 data sharing configuration. The component to which I refer here is the lock list portion of the DB2 lock structure, and with this blog entry my aim is to fill that knowledge gap to some degree.

The lock structure is of course one of three coupling facility structure types that DB2 uses to make data sharing work (the other two structure types are the Shared Communications Area and the group buffer pools). Within the lock structure you have the lock table and the lock list. The lock table is, conceptually, at work all the time in a DB2 data sharing environment. It's essentially a mechanism through which global lock contention is detected: when a process running on a member of a DB2 data sharing group (I'll call this member DB2A) wants a lock on something (e.g., a data page or row) that might be locked by a process running on another member of the group, the item on which the lock is being requested is mapped by way of a hashing algorithm to an entry in the lock table. That lock entry is checked for the presence of an incompatible lock held by a process running on another member of the group, and if that turns out to be the case (e.g., a process running on member DB2B has an X-lock on a page on which the process running on DB2A wants an S-lock) then the contention indication is sent back from the coupling facility holding the lock structure to the z/OS LPAR on which member DB2A is running. [Because there will almost certainly be many more lockable "things" in the database than entries in the lock table, some different lockable things will map to the same lock table entry; thus, contention can be falsely detected. Typically, "false contention" is the initial result for a very small percentage of global lock requests, and when this occurs the contention situation is determined to be false in a quick and efficient manner.]

Because the lock table is vitally important for the moment-by-moment operation of a DB2 data sharing group, it gets a lot of attention. By comparison, the lock list portion of the lock structure is usually not front-of-mind when people thing about DB2 data sharing. It's important, sure -- but in an insurance policy kind of way: you're glad it's there, but you'd be pleased if it were never needed. Here's what I mean by this analogy: the purpose of the lock list is to store information about X-type global locks held by members of the data sharing group on behalf of processes executing on the various members ("X-type" refers to locks associated with data-changing actions, and "global" refers to locks of which members other than the lock-acquiring member might need to be aware). In the event of the abnormal termination of a member DB2 subsystem, information about the X-type global locks held by that member at the time of failure will be, in effect, downloaded from the lock list to the surviving members of the group. These locks will be treated by the surviving members as "retained" locks, and they will continue to be retained (and the associated locked items, such as data pages and/or rows, will remain unavailable) until the failed DB2 subsystem releases the locks in the course of restart processing (restart of a failed DB2 member will usually be initiated automatically by the system, and will often complete in less than 2 minutes). Knowing about these retained locks is important for data integrity-preservation purposes: they keep surviving subsystems from accessing data pages or rows that were in the process of being changed through another member when that member failed. The retained locks "seal off" affected pages and/or rows (depending on the granularity of locking used for a table space) until the failed member can, as part of restart processing, roll back the units of work that were in-flight in the subsystem at the time of failure and write to disk those committed data changes that had not yet been externalized at the time of the failure event.

So, what does the lock list need in order to do its job? It needs enough space to hold information about all the X-type global locks held at a given time by the members of the DB2 data sharing group. Often people supporting DB2 data sharing systems don't think much about this, and usually they don't need to: these folks take the recommended approach of making the size of the DB2 lock structure a power of two (e.g., 64 MB, specified by way of the INITSIZE parameter for the structure in the system's coupling facility resource management policy, also known as the CFRM policy). When that's done, half the space in the lock structure will be used for the lock table, and half will be used for the lock list (the lock table size MUST be a power of two). In most cases, this approach provides plenty of lock list space (and that's good -- don't get stingy here).

Suppose your particular situation falls outside of the "most cases" norm? I recently encountered an interesting set of circumstances at a DB2 for z/OS site that made the "power of two" standard for lock structure size inappropriate. At this site, in a test environment, the lock list in a 327 MB lock structure (that's really big) got full. Let me put this in perspective for you: 327 MB is not a power of two. So, how big was the lock list in this lock structure? As mentioned previously, the lock table portion of a lock structure MUST be sized at a power of two. When the lock structure INITSIZE is not a power of two, upon initial structure build (or subsequent rebuild) the lock table portion of the structure will be sized at the power of two that will result in as close to a "50-50" split of space as possible between the lock table and the lock list. For a 327 MB lock structure that power of two for the lock table will be 128 MB (versus 64 MB or 256 MB). Subtract the 128 MB for the lock table from the 327 MB for the overall lock structure, and what you have left is 199 MB for the lock list. Will a lock list of that size have enough room to store a lot of global X-lock information? Yes: at a little over 300 bytes per lock list entry, a 199 MB lock list can accommodate over 600,000 list entries. That kind of capacity would typically be more than enough in a DB2 data sharing system, but in the situation to which I'm referring it fell short of the need. Why? Because at this site the migration of an application from a non-DB2 database to DB2 is requiring, at least for a time, some very large units of work (with respect to the number of update locks acquired by a program between commits). The ZPARM parameters NUMLKTS and NUMLKUS had been set to pretty high values to accommodate this requirement, but as mentioned the lock list in the test system did fill up and programs needing additional global X-locks abended as a result.

At this site, the lock list needs to be made larger (in fact, considerably larger), and that's going to happen -- but not in the way you might suppose. You might think that in this situation you'd just make the lock structure big enough (referring to INITSIZE) so that the lock list will be as big as needed, keeping in mind that, as noted, the size of the lock table part of the structure will be the power of two that gets you as close as possible to a 50-50 space split between the lock table and the lock list. In fact, that is NOT what this organization needs to do, for this simple reason: the lock table portion of the current lock structure, at 128 MB, is already plenty big and doesn't need to get any bigger (the adequacy of the lock table's size is indicated by a very low rate of false contention, which I briefly described above). The solution here is to keep the lock structure's INITSIZE as-is, and to substantially boost the value of the SIZE parameter for the structure in the CFRM policy. The SIZE value specifies the size up to which a coupling facility structure can be dynamically increased (by way of a SETXCF command) after it has been initially built (or rebuilt). Now, here's the really important point in the context of the referenced organization's need for substantially more lock list space: when a lock structure is made dynamically larger by way of a SETXCF command (and again, this can only be done if the value of SIZE exceeds the value of INITSIZE for the structure in the CFRM policy), ALL of the added space is used for the lock list -- the size of the lock table does not change (a larger INITSIZE and a structure rebuild would be needed to increase the size of the lock table). So, if INITSIZE for a lock structure is, say, 256 MB, and SIZE for the structure is 512 MB, and if the structure is initially built and then 100 MB is dynamically added to the structure's size through execution of a SETXCF command, the effect will be a 100 MB increase in the size of the lock list (which should give you room for an extra 320,000 or so additional lock list entries).

Another thing about dynamically increasing the size of the lock list: suppose, as in the example at the end of the preceding paragraph, you had a lock structure with an INITSIZE of 256 MB and a SIZE of 512 MB, and after initial structure build you dynamically increased the size of the structure by 100 MB. That would leave you with another 156 MB of available dynamic-size-increase capability. How would you know when to use that? IRLM (DB2's lock manager component) will tell you. If the lock list gets pretty full (50% full), IRLM will issue message DXR170I. That message will be issued again if the lock list becomes 60% full, and again if it gets to 70% full. If the lock list gets really full (80%), IRLM will issue message DXR142E. That message will be issued again if the lock list gets to 90% full, and again if it gets to 100% full, but you'd want to act on the 80% full message because at 90% full you're likely to start getting application program abends.

In summary: the lock list portion of the lock structure serves its purpose when a DB2 subsystem in a data sharing group fails, and that tends to be a pretty rare event. Still, you need to make sure that you have enough space in your lock list to hold all the global X-locks that might accumulate in your DB2 data sharing system, because running short on this space could result in application program abend events. In my experience, the lock list space that people have is usually plenty big for their needs, but there are situations in which the accumulation of a really large number of X-type global locks across a data sharing group can be expected. If that's the case at your site, be ready with a big-enough lock list that you can dynamically enlarge if needed. In other words, make sure that this insurance policy is in order and reflects the characteristics of your DB2 application environment.

2 comments:

  1. Hi Robert, We are facing issues with rebuilding SCA when we migrate our DB2 V11 system(on z/OS 2.2) from ENFM to NFM. Reason ABND=04E-00F70601.

    DSN7508I -DBN1 DSN7LRW1
    ACCESS TO THE SCA STRUCTURE DSNDBN0_SCA FAILED.
    MVS IXLLIST RETURN CODE = 0000000C,
    MVS IXLLIST REASON CODE = 0C1C0C17.

    Do we have any limitations with the size of SCA as we have only 6MB in our sandbox system and its only 10% used.

    Thanks,
    Ram R.

    ReplyDelete
    Replies
    1. Might be a good idea to open a PMR for this issue with the IBM Support Center.

      The 0C1C0C17 reason code from MVS IXLLIST indicates that the structure could not be accessed because it is full. If you think that you should not be getting this error (e.g., if you display information about the structure and see that in fact it is not full), one possible cause for the problem could be a SIZE for the structure in the CFRM policy that is more than twice the value of INITSIZE (see info APAR II14807: http://www-01.ibm.com/support/docview.wss?uid=isg1II14807).

      Robert

      Delete