Robert's Db2 blog: February 2012

Sunday, February 26, 2012

DB2 for z/OS: of Stored Procedures and the DB2 MQListener

As a mainframe DB2 guy, I'm bullish on stored procedures -- I've presented on the topic at international conferences and regional user group meetings, written about stored procedures for the old DB2 Magazine, and posted many a related entry to my DB2 blogs, both this blog (an example being an entry from August of last year) and the one I maintained while working as an independent DB2 consultant (included among these is part 1 of a 3-part entry on stored procedure usage from an application architecture perspective). I'm an advocate of DB2 stored procedure technology because I'm very big on the use of DB2 for z/OS as an enterprise data server for multi-tiered, client-server applications, and stored procedures have an important role to play there.

I'm also a proponent of asynchronous data processing, particularly as it pertains to DB2 data-changing operations (e.g., database inserts and updates). In this context, "asynchronous" means that the processing of a back-end data change is "unhooked," time-wise, from the client-side process that supplied the input to the data change operation (this input could come from an end-user clicking "submit" on a screen of a Web-based application). The means of accomplishing this "unhooking" of front-end input and back-end insert or update is often a message queue placed in-between the two processes. In many cases, that message queue will be managed by MQ. Asynchronous processing of DB2 data change operations can be attractive for several reasons, including better end-user response time (the data-input transaction is complete once the data is placed on the MQ queue) and improved application resiliency (if the back-end database is temporarily unavailable for some reason, input messages simply accumulate on the MQ queue, and are processed when the database is brought back online).

A handy piece of code called the DB2 MQListener enables the bringing together of these two favorite technologies of mine -- DB2 for z/OS stored procedures and asynchronous transaction processing -- and that's what this blog entry is about.

Here's the deal: if you want information in a message that an application process has placed on an MQ queue to be input to a DB2 INSERT or UPDATE (or, conceivably, a DELETE), something has to take that message off the queue and initiate the DB2 data change operation. That "something" can be the DB2 MQListener. It is provided with DB2 (starting, I believe, with DB2 for z/OS V8), and you can read about it in the DB2 for z/OS Application Programming and SQL Guide (this manual -- and other DB2 manuals -- are available in PDF and HTML form for DB2 Versions 8, 9, and 10 at http://www-01.ibm.com/support/docview.wss?rs=64&uid=swg27011656).

The DB2 MQListener runs as a daemon (a background process, in the parlance of Linux/UNIX) under the UNIX System Services component of z/OS. Conceptually, what it does is pretty simple: it calls a DB2 for z/OS stored procedure in response to a message arriving on an MQ queue. What stored procedure does it call? That's up to you. See, the DB2 MQListener is a multi-threaded process, and each thread is connected to one MQ queue. When a message arrives on a particular queue, the DB2 MQListener calls the stored procedure that is associated with the queue (this association between a queue and a stored procedure is established by way of MQListener commands, and the configuration information is stored in a DB2 table named SYSMQL.LISTENERS). If you want several different message types to be processed by different DB2 stored procedures (i.e., message type A drives an insert to DB2 table X, message type B is input to an update of table Y, etc.), set up several input queues, and associate with each the stored procedure that is to operate on a given message type. Then, have the input-providing application programs direct each message type to its designated queue.

What, exactly, is passed to a stored procedure by the DB2 MQListener when a message arrives on a queue? That's easy: the message itself -- all of it. Each arriving message generates one stored procedure call. A stored procedure will do with a message what you've programmed it to do. If you want to keep things really simple, you can have a stored procedure insert a message, in its entirety, into a column of a DB2 table. If the message contains several items of information and you want these various items to go into different columns of a table, you can write a stored procedure to do that. This would be pretty straightforward if the message parts were of a fixed and consistent length (e.g., positions 1 through 4 contain item A, positions 5 through 12 contain item B, etc.). Disassembling a message into its component parts would be a little more involved if the parts were of varying length, but this could certainly be done -- you just have to be able to find the end of one item in the message and the beginning of the next item (the parts of the message might be separated by commas, for example). Want to drive an update of information in a table, based on the content of a message? Write a stored procedure to do that. Have a maybe insert, maybe update situation? Let the message be input to a MERGE statement in a stored procedure. You get the picture: the DB2 MQListener is going to pass a message, in its entirety, to the stored procedure associated with an input queue. What the stored procedure does with that message is your call. [Note that, in addition to the one input parameter (a message taken from a queue), the DB2 MQListener will specify a single output parameter when calling a stored procedure. You can have your stored procedure assign a value to this output parameter, but you don't have to -- it can be NULL.]

Just as you decide what a stored procedure is to do with a message supplied as input by the DB2 MQListener, so, too, are you in control of the type of stored procedure that will be used with the MQListener. Want it to be an external stored procedure, perhaps one that's written in COBOL? That's fine. Want to go with a native SQL procedure (these were introduced with DB2 9 in new-function mode)? That's OK, too. The right choice is the one that make sense in your environment and provides the processing functionality that you need (though I will say that I'm a big proponent of native SQL procedures).

Herein I've provided just a brief overview of the DB2 MQListener and the ways in which you can use it. As I mentioned, you can get a lot more information from the DB2 for z/OS Application Programming and SQL Guide (including examples of MQListener set-up and operation and DB2 stored procedure coding). Check it out. DB2 and MQ are a great combination, and the DB2 MQListener can help you take asynchronous invocation of stored procedures from concept to reality.

Saturday, February 11, 2012

Got LOBs? Get DB2 10 for z/OS (Part 2)

So, last week I posted a blog entry describing one of the two really important (in my opinion) DB2 10 for z/OS enhancements related to LOB (large object) data management -- that being the ability to "in-line" a portion (or even all) of a LOB column's data values in a base table space, alongside the associated table's non-LOB data values (this as opposed to having to store all of every LOB value in a LOB table space that -- while logically transparent -- is physically distinct from a table's base table space). This week, I'll cover the other of my two favorite DB2 10 LOB-related enhancements: support for the variable-blocked spanned record format for the SYSREC data set of LOAD and UNLOAD utility jobs. I'll also provide some information about other improvements in LOB data management delivered with DB2 10. [And before going further, I'll give props here to Jeffrey Berger, a member of IBM's DB2 for z/OS development team, who's helped me to build up my knowledge of LOBs.]

At first glance, being able to use the variable-blocked spanned (VBS) record format for the LOAD and UNLOAD SYSREC data set may not seem like a big deal. A little background, then: SYSREC is the default DD name for the data set into which data is unloaded from a table via the UNLOAD utility, and it's the default DD name for the data set that holds records that are loaded into a table space through the LOAD utility. Prior to DB2 10, the LOAD or UNLOAD SYSREC data set had to use the variable-blocked record format (RECFM=VB). That's fine in most cases, but often a problem when LOB data is involved. Why? Because the maximum record length for RECFM=VB is 32,760, and as we all know a LOB data value can exceed 32 KB in length. How, then, were LOB values handled with respect to UNLOAD output and LOAD input? Here's how: when a table with a LOB column is unloaded with the UNLOAD utility, the non-LOB data goes into the one SYSREC data set, but each individual LOB data value goes into a separate file (the same is true, in reverse, for the LOAD utility: individual LOB data values are loaded from separate files). The name of the file into which a LOB value is placed (for UNLOAD) or from which it is retrieved (for LOAD) goes into what's called a file reference variable, or FRV, and it's the FRV values that are found in the SYSREC data set, along with the non-LOB data processed by UNLOAD and LOAD.

With regard to the file type used for LOB data UNLOAD and LOAD in a pre-DB2 10 environment, you have choices, but these come with associated trade-offs. One option is to unload LOB data values into (or load LOB data from) members of a PDS or PDSE. The goodness in that alternative is the fact that most DB2 for z/OS people are very familiar with these data set types. The negatives are related to scalability (i.e., accommodating a large amount of LOB data). Specifically:

A PDS or PDSE is limited to one disk volume.
The size of a PDS is limited to 65,536 tracks.
A PDSE can have no more than 524,236 members.

LOB data can also be unloaded to, or loaded from, HFS files (referring to a file system available via the UNIX System Services component of z/OS, also known as USS). This USS file system doesn't have the above-noted space limitations of PDS and PDSE data sets, and it delivers a performance advantage, to boot; however, a lot of mainframe DB2 people are not very familiar with HFS, so there can be a learning curve to deal with when this file system is used for DB2 LOB data. On top of that, for UNLOAD an FRV cannot refer to a sequential file (DSORG=PS), which is the type used for data sets on tape (this is true whether HFS files or members of a PDS or PDSE are used to hold unloaded LOB values). Bummer.

Enter DB2 10, and these limitations and hassles are removed because the variable-blocked spanned record format (RECFM=VBS) is supported for SYSREC data sets (use of such a data set is indicated via the new SPANNED option of the LOAD and UNLOAD utility control statements). Thanks to this enhancement,

...a table's LOB and non-LOB data can be unloaded to (or loaded from) the SAME data set! LOB values no longer have to be in separate files!
...the UNLOAD output file (or LOAD input file) can be on tape -- not just disk!
...the UNLOAD output file (or LOAD input file) can span multiple volumes (thus overcoming a PDS/PDSE restriction)!

Those exclamation points may come across as a little juvenile, but I really am psyched about this DB2 10 feature. Add to the benefits listed above a MAJOR performance boost, especially for smaller LOBs: an IBM test of an UNLOAD of 16,000-byte LOB values showed an 80% reduction in elapsed time when a VBS SYSREC data set was used versus HFS file reference variables, and a 99% elapsed time improvement for a VBS SYSREC data set as compared to PDSE FRVs.

Truly, if you are storing LOB values in a DB2 for z/OS database, or thinking of doing that, DB2 10 is the release for you. As if LOB in-lining (written about, as previously noted, in an entry I posted to this blog last week) and VBS SYSREC data sets weren't enough to convince you of that, consider these additional LOB-related goodies delivered with DB2 10:

LOAD REPLACE of data in a table containing LOB values is substantially faster with DB2 10 than with prior DB2 releases, thanks to the use of a write operation called format write for LOB table spaces (LOAD previously used format writes only for non-LOB table spaces).
INSERT of data from a DRDA client into a table with a LOB column can take considerably less time when LOB values are large, because DB2 10's distributed data facility (DDF) does not have to materialize the entire to-be-inserted LOB value before passing to to the database services address space (DDF will materialize up to 2 MB of an incoming LOB value before starting to pass the value to the database manager, and it will continue passing chunks of the value until it's all been received from the client). In addition to reducing elapsed time for network-driven inserts of larger LOBs, this feature can reduce CPU time and virtual storage consumption associated with such insert operations.
DB2 10 supports online REORG of a LOB table space with SHRLEVEL CHANGE, and it will reclaim disk space (if less is needed, as would be the case if some LOB values had been remove via DELETE operations). Previously, SHRLEVEL CHANGE could not be specified for a REORG of a LOB table space, and the utility would not reclaim space occupied by the LOB table space.
The new AUX YES option of REORG will cause the utility to reorganize LOB table spaces associated with partitions of a partitioned table space, as these base table space partitions are reorganized. Additionally, when a partitioned table space holding a table with LOB data is reorganized in a DB2 10 environment, REORG can move rows from one partition of the base table space to another -- something not done in prior releases of DB2. This functionality enables REBALANCE to be specified for a REORG of a range-partitioned table space holding a table that contains LOB data, and it also comes into play when a partition-by-growth table space associated with a LOB-containing table is reorganized (in that case, rows could be moved from one partition to another to reestablish the table's clustering sequence).
DEFINE NO can be specified on the creation of a LOB table space, so that the table space's data set(s) will not be created until data is inserted into the table space (this option was previously ignored when specified for a LOB table space). DEFINE NO can be particularly useful when DB2 objects are being defined as part of the installation of a packaged software application that will not be immediately put to use.

All good stuff. Get to DB2 10 for z/OS (if you're not there already), and reap these LOB-related benefits. If you're not yet storing LOBs in DB2 tables, get to DB2 10 and re-think that situation. More than ever, it's LOB time in mainframe DB2 land.