Tuesday, January 31, 2017

Are You Using System Profile Monitoring to Manage Your DB2 for z/OS DDF Workload? Perhaps You Should

Here's a scenario that might sound familiar to you: you have a particular transaction, which I'll call TRNX, that is the source of quite a lot of deadlock activity in a DB2 for z/OS system. It seems that whenever more than one instance of TRNX is executing at the same time, a deadlock situation is highly likely. You went with row-level locking for the table spaces accessed by TRNX, but the trouble persisted. It is conceivable that rewriting the program code associated with TRNX might eliminate the problem, but the task would be challenging, the development team has limited bandwidth to accomplish the recommended modifications, and it could take months -- or longer -- for the fix to get into production. What can you do?

Well, as I pointed out in an entry posted to this blog a few years ago, sometimes the right approach in a case such as this one is to single-thread TRNX. Though it may at first seem counter-intuitive, there are circumstances for which transactional throughput can be increased through a decrease in the degree of transactional multi-threading, and that approach can be particularly effective when the rate of transaction arrival is not particularly high (i.e., not hundreds or thousands per second), transaction elapsed time is short (ideally, well under a second), and probability of a DB2 deadlock is high if more than one instance of the transaction is executing at the same time.

Lots of people know how to single thread a CICS-DB2 or IMS-DB2 transaction, but what about a DDF transaction (i.e., a transaction associated with a DRDA requester, which would be an application that accesses DB2 for z/OS by way of TCP/IP network connections)? Is there a means by which a DDF transaction can be single-threaded?

The answer to that question is, "Yes," and the means is called system profile monitoring, and DDF transaction single-threading is just one of many useful applications of this DB2 for z/OS capability. I'll provide a general overview of DB2 system profile monitoring, and then I will cover transaction single-threading and a couple of other use cases.

System profile monitoring is put into effect by way of two DB2 tables, SYSIBM.DSN_PROFILE_TABLE, and SYSIBM.DSN_PROFILE_ATTRIBUTES. Those tables were introduced with DB2 9 for z/OS, and DB2 10 enabled their use in managing a DDF workload in a more granular fashion than was previously possible. Prior to DB2 10, the number of connections from DRDA requesters allowed for a DB2 subsystem, and the maximum number of those connections that could be concurrently in-use, and the maximum time that an in-use (i.e., non-pooled) DBAT (database access thread -- in other words, a DDF thread) could sit idle without being timed out, could be controlled only at the DB2 subsystem level via the ZPARM parameters CONDBAT, MAXDBAT, and IDTHTOIN, respectively. What if you want to exert control over a part of a DDF workload in a very specific way? With system profile monitoring, that is not at all hard to do.

A row inserted into SYSIBM.DSN_PROFILE_TABLE indicates the scope of a particular DDF workload managemnt action, and the corresponding row (or rows) in SYSIBM.DSN_PROFILE_ATTRIBUTES indicates what you want to manage for this component of your DDF workload (number of connections, number of active connections, idle thread timeout, or two of the three or all three) and how you want that management function to be effected (e.g., do you want DB2 to take action when a specified limit is exceeded, or just issue a warning message). The columns of the two tables, and their function and allowable values, are well described in the DB2 for z/OS documentation, and I won't repeat all that information here (the DB2 11 information is available online at http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_monitorthreadsconnectionsprofiles.html, and you can easily go from there to the DB2 10 or DB2 12 information, if you'd like). What I will do is take you through a few use cases, starting with the single-threading scenario previously referenced.

To single-thread a DDF transaction, you would first need to identify that transaction by way of a row inserted into the SYSIBM.DSN_PROFILE_TABLE. You have multiple options here. You might identify the transaction by workstation name (a string that is easily set-able on the client side of a DDF-using application, as described in a blog entry I wrote back in 2014); or, you might identify the transaction via package name, if, for example, it involves execution of a particular stored procedure; or, you might use collection name [Collection name can be specified as a client-side data source property, and it is increasingly used to manage applications that use only IBM Data Server Driver (or DB2 Connect) packages -- these packages, which belong by default in the NULLID collection, can be copied into other collections, and in that way a DDF-using application can be singled out by way of the name of the Data Server Driver (or DB2 Connect) package collection to which it is pointed.] And, there are multiple additional identifier choices available to you -- check the DB2 documentation to which I provided the link above.

In your SYSIBM.DSN_PROFILE_TABLE row used to identify the transaction you want to single-thread, you provide a profile ID. That ID serves as the link to an associated row (or rows) in SYSIBM.DSN_PROFILE_ATTRIBUTES. In a row in that latter table, you would provide the ID of the profile you'll use to single-thread the transaction, 'MONITOR THREADS' in the KEYWORD column, 1 in the ATTRIBUTE2 column (to show that you will allow one active DBAT for the identified transaction), and 'EXCEPTION' in the ATTRIBUTE1 column to indicate that DB2 is to enforce the limit you've specified, as opposed to merely issuing a warning message (you could also specify 'EXCEPTION_DIAGLEVEL2' if you'd like the console message issued by DB2 in the event of an exceeded threshold to be more detailed versus the message issued with EXCEPTION, or its equivalent, EXCEPTION_DIAGLEVEL1, in effect). Then you'd activate the profile with the DB2 command -START PROFILE, and bingo -- you have put single-threading in effect for the DDF transaction in question.

Something to note here: Suppose you have set up single-threading in this way for transaction TRNX, and an instance of TRNX is executing, using the one thread you've made available for the transaction. Suppose another request to execute TRNX arrives. What then? In that case, the second-on-the-scene request for TRNX will be queued until the first-arriving TRNX completes (if TRNX typically executes in, say, less than a second, the wait shouldn't be long). What if a third request for TRNX comes in, while the second request is still queued because the first TRNX has not yet completed? In that case, the third TRNX request will fail with a -30041 SQLCODE. This is so because DB2 will queue requests only up to the value of the threshold specified. If you specify 1 active thread for a transaction, DB2 will queue up to 1 request for that transaction. If you specify a maximum of 4 active threads for a transaction, DB2 will queue up to 4 requests for the transaction if the 4 allowable active threads are busy. With this in mind, you'd want to have the TRNX program code handle the -30041 SQLCODE and retry the request in the event of that SQLCODE being received. Would you like to be able to request a "queue depth" that is greater than your specified threshold value? So would some other folks. That enhancement request has been communicated to the DB2 for z/OS development team.

Something else to note here: What if you are running DB2 in data sharing mode. Does a limit specified via SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES apply to the whole data sharing group? No -- it applies to each member of the group. How, then, could you truly single-thread a DDF transaction in a data sharing environment? Not too hard. You'd set up the profile and associated threshold as described above, and you'd start the profile on just one member of the group (-START PROFILE is a member-scope command). On the client side, you'd have the application associated with TRNX connect to a location alias, versus connecting to the group's location, and that alias would map to the one member for which the single-threading profile has been started (I wrote about location aliases in an entry posted to this blog a few years ago -- they are easy to set up and change). If the one member is down, have the profile ready to go for another member of the group (you could have leave the GROUP_MEMBER column blank in the DSN_PROFILE_TABLE row to show that the profile applies to all members of the group, or you could have two rows, one for the "primary" member for the transaction in question, and one for an alternate member, in case the "primary" member for the transaction is not available), and start the profile on that member. You would also change the location alias that maps to the one member, so that it maps instead to the other member (or you could map the alias to two members, and only start the alias on one member at any given time -- location aliases can be dynamically added, started, and stopped by way of the DB2 command -MODIFY DDF); so, no client-side changes would be needed to move single-threading for a transaction from one data sharing member to another.

A couple more use cases. What else can be accomplished via DB2 system profile monitoring? There are many possibilities. Consider this one: you have a certain DDF-using application for which you want to allow, say, 50 active threads. Easily done: if the application connects to DB2 using a particular authorization ID -- very commonly the case -- then set up a profile that is associated with the application's ID, and in the associated DSN_PROFILE_ATTRIBUTES row indicate that you want to MONITOR THREADS, that the threshold is 50, and the action is EXCEPTION. Note, then, that up to 50 requests associated with the application could be queued, if the 50 allotted DBATs are all in-use.

Or how about this: there are hundreds (maybe thousands) of people employed by your organization that can connect to one of your DB2 for z/OS systems directly from their laptop PCs. You know that a single individual could establish a large number of connections to the DB2 system, and you are concerned that, were that to happen, your system could hit its CONDBAT limit, to the detriment of other DDF users and applications (and maybe that's actually happened at your shop -- you wouldn't be the first to encounter this situation). How could you limit individuals' laptop PCs to, say, no more than 5 host connections apiece? Would you have to enter hundreds (or thousands) of rows in DSN_PROFILE_TABLE, each specifying a different user ID (or IP address or whatever)? That is what you WOULD have had to do, before a very important system profile monitoring enhancement was delivered with DB2 12 for z/OS (and retrofitted to DB2 11 via the fix for APAR PI70250). That enhancement, in a word: wildcarding. By leveraging this enhancement (explained below), you could limit EACH AND EVERY "laptop-direct" user to no more than 5 connections to the DB2 for z/OS subsystem by way of a single row in DSN_PROFILE_TABLE (and an associated MONITOR CONNECTIONS row in DSN_PROFILE_ATTRIBUTES).

More on wildcarding support for system profile monitoring: with DB2 12 (or DB2 11 with the fix for the aforementioned APAR applied), you can use an asterisk ('*') in the AUTHID or the PRDID column of DSN_PROFILE_TABLE (the latter can identify the type of client from which a request has come); so, an AUTHID value of 'PRD*' would apply to all authorization IDs beginning with the characters PRD (including 'PRD' by itself), and an asterisk by itself would apply to ALL authorization IDs (with regard to rows in DSN_PROFILE_TABLE, a DRDA request will map to the profile that matches it most specifically, so if there were a profile row for auth ID 'PROD123' and another row for auth ID '*', the former would apply to requests associated auth ID PROD123 because that is the more specific match).

You can also use wildcarding for the IP address in the LOCATION column of a row in SYSIBM.DSN_PROFILE_TABLE, but in a different form. For an IPv4 TCP/IP address, a wildcard-using entry would be of the form address/mm where mm is 8, 16, or 24. Those numbers refer to bits in the IP address. Here's what that means: think of an IPv4 address as being of the form A.B.C.D. Each of those four parts of the address consists of a string of 8 bits. If you want to wildcard an IPv4 address in the LOCATION column of a DSN_PROFILE_TABLE row, so that the row will apply to all addresses that start with A.B.C but have any possible value (1-254) for the D part of the address, the specification would look like this (if A, B, and C were 9, 30, and 222, respectively):

9.30.222.0/24

And note that a specification of 0.0.0.0 applies to all IP addresses from which requests could come for the target DB2 for z/OS system. A similar convention is used for IPv6 addresses -- you can read about that in the text of the APAR for which I provided a link, above.

Why use this convention, instead of something like 9.30.222.* for addresses 9.30.222.1 through 9.30.222.254, or an * by itself for all IP addresses? Because the convention used is already prevalent in the TCP/IP world, and in this case it made sense to go with the flow.

So, that's what I have to say about DB2 system profile monitoring. It's a great way to manage a DB2 for z/OS DDF workload in a more granular way than is offered by the ZPARMs CONDBAT, MAXDBAT, and IDTHTOIN (though those ZPARM values remain in effect in an overall sense when system profile monitoring is in effect). If you aren't yet using this functionality, think of what it could do for your system. If you are using system profile monitoring but haven't used the new wildcard support, consider how that enhancement could provide benefits for you. In an age of ever-growing DB2 DDF application volumes, system profile monitoring is a really good thing.