Tuesday, November 26, 2013

DB2 for z/OS Work: the Task's the Thing

To understand how DB2 work is handled in a z/OS system, you need to have some understanding of the tasks used to manage that work. Here, "tasks" doesn't refer to things done or to be done. It refers instead to the control blocks used to represent, for dispatching purposes, application and system processes in a z/OS LPAR. I have found that several misconceptions related to DB2 for z/OS workload processing are rooted in misunderstandings of the tasks behind the workload. In particular, I've seen situations in which folks have the wrong idea about DB2 DDF address space CPU consumption, about batch- versus online-issued SQL statements, and about zIIP eligibility of DB2 work, and in each case clarity came from an understanding of the tasks behind the DB2 activity in question. In this blog entry, I'll try to provide some information that I hope will help you understand why DB2 for z/OS work is processed as it is.

DB2 DDF address space CPU consumption

With respect to control blocks used to manage DB2 work in a z/OS system, there are at least a couple of relevant dichotomies. First, you have "system" tasks and "user" tasks. The system tasks are associated with DB2 address spaces, and they generally have to do with work of a housekeeping variety (e.g., writes of changed data and index pages to disk), as well as things pertaining to the SQL statement execution environment (e.g., thread creation and termination). Where things can get a little confusing is in the area of DB2 DDF CPU utilization (DDF being the distributed data facility -- the DB2 address space through which SQL statements from network-attached application servers and workstations flow, and through which results flow in the other direction). The confusion to which I refer stems from the fact that the DDF address space has associated with it both system tasks and user tasks, as opposed to having only the former. For SQL statements issued by "local" applications (those that attach directly to a DB2 subsystem in the same z/OS LPAR), the user tasks -- those under which SQL statements execute and to which the large majority of CPU time consumed in SQL statement execution is charged -- belong to "allied" address spaces (i.e., the address spaces in which the SQL-issuing programs run). So, for example, the user task under which a SQL statement issued by a local CICS transaction program executes is the CICS subtask TCB associated with that transaction, and the user task under which a SQL statement issued by a local batch job executes is the TCB of the batch address space (more on TCBs in a moment). Compared to the user tasks of allied address spaces connected to a local DB2 subsystem, the system tasks of the DB2 IRLM (lock manager), MSTR (system services), and DBM1 (database services) address spaces consume relatively little CPU time.

In the case of the DDF address space, you have to keep in mind that the programs issuing the SQL statements aren't running in the z/OS LPAR with DB2, and yet they must be represented locally in that LPAR so that they can be properly prioritized and dispatched by the operating system. The local representation of a remote DB2-accessing program is a so-called preemptable SRB in the DDF address space (more on SRBs to come), and because that task represents the remote program locally, most of the CPU time associated with execution of SQL statements issued by the program will be charged to the DDF preemptable SRB. That is why DDF CPU consumption will be significant if a lot of SQL statements flow to DB2 through the DDF address space -- it's analogous to a CICS region consuming a lot of CPU time when transactions running in that region send a lot of SQL statements to DB2. The DDF user tasks are charged with most of the CPU time related to SQL statement execution, while the DDF system tasks consume only a small amount of CPU time. You can find more information about DDF CPU consumption in an entry I posted to this blog last year.

Batch- versus online-issued SQL statements

Not long ago, I got a note from a software analyst who was concerned that SQL statements issued by batch programs would get in the way of SQL statements issued by concurrently executing CICS transactions. He was specifically worried about "inefficient" batch-issued SQL statements (which could be thought of as long-running SQL statements, though long-running SQL statements are not necessarily inefficient in any way) getting in line for CPU cycles ahead of CICS-issued SQL statements. The main issue in this person's mind was the dispatching priority of the DB2 address spaces: a priority that was somewhat higher (as recommended) than that of the CICS regions in his system. If DB2 has a really high priority in the z/OS LPAR, won't long-running, batch-issued SQL statements negatively impact the performance of CICS transactions?

The answer to that question (unless the site's workload manager policy is unusual) is, "No" (or at least, "No, CICS transactions will not be behind those long-running, batch-issued SQL statements in the line for CPU cycles"). Again, the task's the thing. A SQL statement (long-running or otherwise) issued by a batch job runs under that job's task (though it executes in the DB2 database services address space, as do all SQL statements), and it therefore has the priority of that task. A SQL statement issued by a CICS transaction runs under that transaction's task in the associated CICS region, and so it executes with the priority of the transaction's task. Assuming that batch jobs in your environment have a lower priority than CICS transactions, SQL statements issued by batch jobs will have a lower priority than SQL statements issued by CICS transactions.

The priority of DB2 address spaces (which should be really high) does NOT impact the priority of SQL statements that access DB2 data. Why, then, is it important for the DB2 address spaces to have a high priority in the system? Because a LITTLE BIT of the work related to SQL statement execution is done under DB2 tasks (examples include data set open and close, and lock management), and if this work doesn't get done RIGHT AWAY as needed, the whole DB2-accessing workload can get seriously gummed up. That's why giving the DB2 address spaces a higher priority (it doesn't have to be way higher) than DB2-connected CICS regions is good for CICS-DB2 throughput: it enables DB2 to take care of the little bit of work done under DB2 tasks in a very timely manner, so that the bulk of SQL statement processing (which, again, happens at the priority of the SQL statement-issuing programs) won't get bogged down waiting for locks to be released or threads to be created or whatever. More information on DB2 and CICS address space priority recommendations can be found in a blog entry on the topic that I posted last year.

zIIP eligibility of DB2 work

I mentioned in the second paragraph of this entry that there are a couple of dichotomies with regard to the control blocks used for tracking and dispatching work in a z/OS system. The first of these -- system tasks and user tasks -- I covered already. The second dichotomy I have in mind is TCBs and SRBs, or, more formally, task control blocks and service request blocks. For many years, people associated TCBs with user tasks, and SRBs with system tasks. That thinking wasn't far off the mark until the 1990s, when, needing a mechanism to manage work such as that processed through the DB2 DDF address space, IBM z/OS developers delivered a new type of SRB -- one that, as a DB2 DDF developer put it at the time, "acts like a TCB." This was the enclave SRB and, in particular, the preemptable SRB.

It's important to keep in mind that SQL statements that get to DB2 via DDF execute under preemptable SRBs. Here's why that's important: work that runs under such tasks is zIIP eligible (zIIPs are System z Integrated Information Processors -- the "specialty engines" that provide very economical computing capacity for certain types of work). In the case of DDF-routed SQL, the zIIP offload percentage tends to be about 60%. Queries parallelized by DB2 also run under preemptable SRBs, and so are also zIIP eligible, as I pointed out in a blog entry I posted back in 2010.
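
To make the parallelism point a bit more concrete, here is a minimal sketch (the table and column names are made up). For dynamic SQL, a query can be made a candidate for parallelism via the CURRENT DEGREE special register; for static SQL, the analogous control is the DEGREE(ANY) bind option. Whether DB2 actually splits the query into parallel tasks depends on the access path chosen by the optimizer:

  -- Make dynamic queries in this session candidates for parallelism
  SET CURRENT DEGREE = 'ANY';

  -- Hypothetical query: if DB2 parallelizes it, the parallel tasks run under
  -- preemptable SRBs, and a portion of the CPU time becomes zIIP eligible
  SELECT ACCT_ID, SUM(TXN_AMOUNT) AS TOTAL_AMOUNT
    FROM BANK.ACCT_TXN
   GROUP BY ACCT_ID;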

In my experience, people are most likely to get confused about zIIP eligibility of DB2 work when they are thinking about native SQL stored procedures. First, they may wonder why a native SQL procedure is zIIP-eligible when called by a DRDA requester (i.e., when invoked via a CALL that goes through the DDF address space), but not zIIP-eligible when called by a local-to-DB2 program (such as a CICS transaction). People can also be a little unclear on the reason why a native SQL procedure called by a DRDA requester is zIIP-eligible, while an external stored procedure (such as one written in COBOL or C) called by a DRDA requester is not zIIP-eligible. To get things straight in both of these cases, remember (again) that the task's the thing. A native SQL procedure runs under the task of the application process through which it was invoked, while an external stored procedure runs under a TCB in a WLM-managed stored procedure address space; thus, a native SQL procedure, when called by a DRDA requester, will run under a preemptable SRB in the DDF address space (as will any SQL statement issued by a DRDA client program), and that is why the native SQL procedure will be zIIP eligible in that situation. When the same native SQL procedure is called (for example) by a CICS transaction program, it will run under that program's task. Because that task is a TCB (in the associated CICS region), the native SQL procedure will not be zIIP eligible when so called. Similarly, even if an external stored procedure is called by a DRDA requester, it will run under a TCB (in a WLM-managed stored procedure address space) and so will not be zIIP eligible.
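
To make the contrast concrete, here is a hedged sketch of the two kinds of stored procedure DDL (the procedure, table, column, and WLM environment names are made up, and the non-default statement terminator needed to create the SQL procedure body through a tool such as SPUFI is omitted):

  -- Native SQL procedure: the body is SQL, there is no external load module,
  -- and it runs under the calling task (a preemptable SRB in DDF when the
  -- caller is a DRDA requester)
  CREATE PROCEDURE MYSCHEMA.GET_BALANCE
    (IN  P_ACCT_ID  INTEGER,
     OUT P_BALANCE  DECIMAL(11,2))
    LANGUAGE SQL
  BEGIN
    SELECT ACCT_BALANCE
      INTO P_BALANCE
      FROM MYSCHEMA.ACCT
     WHERE ACCT_ID = P_ACCT_ID;
  END

  -- External stored procedure: always runs under a TCB in a WLM-managed
  -- stored procedure address space, so it is not zIIP eligible even when
  -- called by a DRDA requester
  CREATE PROCEDURE MYSCHEMA.GET_BALANCE_EXT
    (IN  P_ACCT_ID  INTEGER,
     OUT P_BALANCE  DECIMAL(11,2))
    LANGUAGE COBOL
    EXTERNAL NAME GETBAL
    PARAMETER STYLE GENERAL
    WLM ENVIRONMENT WLMENV1;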

If you'll keep in mind the type of task -- user or system, TCB or SRB -- involved in getting various types of DB2 work done, you'll have a much better understanding of how that work is managed and prioritized, and whether or not it is zIIP-eligible. I hope that the information in this blog entry will be useful to you in that regard.

5 comments:

  1. Great blog. TCB CPU time is not zIIP-eligible, and some SRB CPU time is zIIP-eligible.
    Can a z/OS task on the same LPAR do a remote access to a DB2 subsystem and get some of the CPU to go off to a zIIP?
    Bob S

    1. The only such access situation of which I'm aware concerns IBM's WebSphere Application Server (WAS) running in the same z/OS LPAR as a target DB2 subsystem. In that case, you can use either the type 2 JDBC driver (local access to the DB2 subsystem, via the Resource Recovery Services -- aka RRS -- attach facility) or the type 4 JDBC driver (involves going into the z/OS system's TCP/IP stack and into DB2 via the distributed data facility, aka DDF). The type 4 driver will likely mean more path length in getting to DB2 versus the type 2 driver, but because SQL statements will execute under enclave SRBs in the DB2 DDF address space, the type 4 driver might reduce general-purpose CPU utilization by increasing zIIP offload for SQL statement execution. CICS, IMS, and batch programs have to connect to a local DB2 subsystem. WAS for z/OS can connect to DB2 locally or remotely.

      Robert

  2. Super article, Robert, but I am still not sure about zIIP-eligible workload. Let's say I have a COBOL program that connects to DB2 via CAF, and then I perform some FETCHes in a loop to retrieve data from the database. Will these SQL statements run on a zIIP?

    George

  3. As noted in the blog post, George, zIIP eligibility of SQL statements depends on the type of task under which the SQL is executing. For a COBOL program accessing a local DB2 subsystem, SQL issued by the program will execute under the program's TCB and so will not be zIIP eligible. To be zIIP eligible, a SQL statement has to run under a preemptable SRB. That will be the case if the SQL is issued by a program that is a DRDA requester (i.e., a DDF-using program), or issued by a native SQL procedure called from a DRDA requester, or if a query is parallelized by DB2.

    Robert

  4. Thanks for the explanation, Robert. I think I understand it now.

    Regards,
    George
