Thursday, June 26, 2014

DB2 for z/OS: the Functional Advantages of Native SQL Procedures

I have been a big fan of DB2 for z/OS native SQL procedures ever since the functionality was introduced with DB2 9, back in 2007. I have posted quite a few blog entries on the topic, the first in 2008 (written while I was working as an independent consultant) and the most recent just last month. In these blog entries, and in presentations I've delivered over the past several years (and in discussions generally), I've focused largely on performance aspects of native SQL procedures versus external stored procedures. These performance pluses (native SQL procedures run in the DB2 database services address space, they run under the task of the calling application process, and they are zIIP-eligible when called by a DRDA requester) are important, but lately I've found myself thinking, more and more, about the functional advantages of native SQL procedures. That's the thrust of this blog post.

Right out of the gate with DB2 9 for z/OS (new-function mode) there was some space, functionality-wise, between native and external SQL procedures: a native SQL procedure could include a nested compound statement, while an external SQL procedure could not (a compound SQL statement, typically the heart of a SQL procedure, is a group of SQL statements, set off by BEGIN and END, in which there can be variable declarations and associated assignments, along with SQL logic flow control statements such as IF, WHILE, and ITERATE). What does support for a compound SQL statement within another compound SQL statement mean to a developer? It means that he or she can create SQL procedures that have multi-statement condition handlers, for more sophisticated processing of exception and/or error conditions that might occur in the execution of the procedure.
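
To illustrate, here is a minimal sketch of a native SQL procedure with a multi-statement condition handler (the procedure, table, and column names are hypothetical):

CREATE PROCEDURE UPDATE_SALARY
  (IN P_EMPNO CHAR(6), IN P_RAISE DECIMAL(9,2), OUT P_STATUS VARCHAR(40))
  LANGUAGE SQL
BEGIN
  DECLARE V_SQLCODE INTEGER DEFAULT 0;
  -- The handler body is itself a compound statement (BEGIN ... END),
  -- so more than one statement can execute when an exception occurs
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
      GET DIAGNOSTICS EXCEPTION 1 V_SQLCODE = DB2_RETURNED_SQLCODE;
      SET P_STATUS = 'FAILED, SQLCODE = ' || CHAR(V_SQLCODE);
    END;
  UPDATE EMP SET SALARY = SALARY + P_RAISE WHERE EMPNO = P_EMPNO;
  SET P_STATUS = 'OK';
END

An external SQL procedure could have a handler, but only a single-statement one; the nested compound statement is what lets the handler above both capture diagnostic information and set a status indicator.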

The functional advantage of native SQL procedures grew with DB2 10 for z/OS (new-function mode), which allowed a native SQL procedure -- and only a native SQL procedure -- to have input and output parameters of the XML data type (and to specify the XML data type for variable declarations). Before DB2 10 (and even in a DB2 10 or DB2 11 environment, using anything other than a native SQL procedure), getting an XML data value to a stored procedure required serializing the XML document into a character string or a CLOB (depending on its size) and passing it to the stored procedure (and working with it in the stored procedure) in that form. Yuck.
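
As a minimal sketch (the procedure, table, and column names are hypothetical), XML can now be used directly:

CREATE PROCEDURE STORE_ORDER (IN P_ORDER XML)
  LANGUAGE SQL
BEGIN
  -- The XML value is used as-is -- no serialization to CHAR or CLOB needed
  INSERT INTO ORDERS (ORDER_DOC) VALUES (P_ORDER);
END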

DB2 11 for z/OS (in new-function mode) delivers two really cool SQL-procedure-only functional enhancements: array parameters and autonomous procedures. More on each of these enhancements follows.

Array parameters

An array parameter is a parameter that contains, essentially, a "stack" of values. Before passing an array parameter to a native SQL procedure (or receiving an array as an output parameter of a native SQL procedure, or declaring and using an array variable in a native SQL procedure), you first have to create the array. Why? Because an array in a DB2 for z/OS context is a user-defined data type (UDT). In creating an array, you have two choices: ordinary and associative. In an ordinary array, elements are referenced by their ordinal position within the array (for example, the third element added to an ordinary array would be referenced as value 3 of that array). Elements in an associative array are referenced by user-provided index values; so, if I assigned an index value of 'Home' to a data value in an associative array containing phone numbers, I could reference a person's home number by using the 'Home' index value.

Here is the CREATE statement for an ordinary array:

CREATE TYPE EMPL_NUMS AS CHAR(6) ARRAY[20];

In the above example statement, CHAR(6) refers to the data type of the values placed in the array (an example would be employee number '089234'), and 20 is the maximum number of values that the array can hold (had [] been specified instead of [20] after the keyword ARRAY, the maximum cardinality would have defaulted to the largest positive INTEGER value, which is 2147483647).

The CREATE statement for an associative array would look like this:

CREATE TYPE SCHOOLS_ATTENDED AS VARCHAR(40) ARRAY[VARCHAR(30)];

As with the ordinary array, the data type after the AS keyword refers to the data values that will be stored in the array ('Eastern State University' could be an example). The second data type specified for the array (and that second data type is your indication that it's an associative array) refers to index values for the array ('Graduate school - Masters' could be one such value). Note that an associative array, unlike an ordinary array, does not have an explicitly or implicitly specified maximum cardinality -- the cardinality of an associative array is based on the number of unique index values used when elements are assigned to the array.
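
In SQL PL, elements of an associative array are assigned and referenced via index values. Here is a sketch, assuming a variable V_SCHOOLS declared with the SCHOOLS_ATTENDED type and a VARCHAR(40) variable V_SCHOOL (both hypothetical):

SET V_SCHOOLS['Graduate school - Masters'] = 'Eastern State University';
SET V_SCHOOL = V_SCHOOLS['Graduate school - Masters'];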

As mentioned, a DB2 for z/OS stored procedure defined with array-type input and/or output parameters (or with references to array variables in the procedure body) must be a native SQL procedure; furthermore, the CALL that invokes such a SQL procedure can come from only two types of program: another SQL procedure language routine (SQL PL is the language in which SQL procedures are coded, and in which compiled SQL scalar functions can be coded) or a Java program that accesses the DB2 for z/OS server via the IBM Data Server Driver for JDBC and SQLJ type 4 driver. That second program type is really important in my eyes. Java programmers regularly work with arrays, and plenty of those folks had requested, prior to DB2 11's debut, the ability to pass an array to, or receive an array from, a DB2 for z/OS stored procedure.
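
Here is a sketch of a native SQL procedure that receives an array of the EMPL_NUMS type created earlier (the procedure, table, and column names are hypothetical):

CREATE PROCEDURE FLAG_EMPLOYEES (IN P_EMPLOYEES EMPL_NUMS)
  LANGUAGE SQL
BEGIN
  -- UNNEST turns the array into a table that a query can reference
  UPDATE EMP SET REVIEW_FLAG = 'Y'
    WHERE EMPNO IN (SELECT ID FROM UNNEST(P_EMPLOYEES) AS T(ID));
END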

Autonomous procedures

Consider this scenario: you have a transaction for which you want to record some information for each execution, even if the transaction fails before completion and is rolled back by DB2. A rollback would undo any data changes made by the transaction, right? So, how do you persist some information associated with the transaction? With DB2 11, you can do that with an autonomous procedure, which is a type of native SQL procedure. How would this work? Well, the transaction would call the autonomous procedure, and that SQL procedure would do its thing -- inserting, for example, some data into DB2 table T1. Control would then pass back to the caller, and the transaction would do its thing -- updating, let's say, data in table T2. If the transaction fails after updating T2, what happens? DB2 will back out the transaction's change of T2 data, but the insert into T1 performed by the autonomous procedure will not be backed out. Cool, eh?

What makes a DB2 11 native SQL procedure an autonomous procedure? Technically, it's the specification of the AUTONOMOUS option in the associated CREATE PROCEDURE (or ALTER PROCEDURE) statement. AUTONOMOUS would be used in place of the COMMIT ON RETURN option. Completion of a called autonomous procedure will drive a commit, but that commit will "harden" only the data changes made by the autonomous procedure -- it will have NO EFFECT on any data changes made up to that point by the calling program. This is made possible by the fact that the autonomous procedure's DB2 unit of work is independent from that of the calling application process. Because of this independence, locks acquired by DB2 for an application process are not shared with locks acquired for an autonomous procedure called by the application process. It is therefore theoretically possible that an autonomous procedure will encounter lock contention vis-a-vis its caller. That possibility might influence decisions you'd make about the locking granularity that you'd like DB2 to use for a table space (e.g., row versus page), if an autonomous procedure and its caller will change data in the same table.
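
A minimal sketch of an autonomous procedure definition (the procedure and column names are hypothetical; T1 is the table from the scenario above):

CREATE PROCEDURE LOG_TRAN_EVENT (IN P_TRAN_ID CHAR(8))
  LANGUAGE SQL
  AUTONOMOUS
BEGIN
  -- This insert is committed when the procedure returns, in a unit of
  -- work independent of the caller's -- a later rollback by the caller
  -- will not undo it
  INSERT INTO T1 (TRAN_ID, EVENT_TS)
    VALUES (P_TRAN_ID, CURRENT TIMESTAMP);
END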

The future?

Will future versions of DB2 for z/OS introduce other enhancements that fall into the "SQL procedures only" category? We'll have to wait and see about that, but it sure has been interesting to see the progressive augmentation of native SQL procedure functionality just over the past three DB2 versions. I'll be looking for more of the same, and that's one of the reasons that I'm a native SQL procedure advocate: they get better and better.

Friday, June 13, 2014

DB2 for z/OS: Getting to Universal Table Spaces

Often, there is a bit of a time lag between the introduction of a DB2 for z/OS feature and the widespread adoption of the new technology. Take universal table spaces, for example. These were introduced with DB2 9 for z/OS (about seven years ago), but some organizations are only now beginning to convert non-universal table spaces to the universal variety. In this blog post I want to go over the incentives for undertaking a conversion to universal table spaces, highlight the important way in which DB2 10 eased the conversion process, and raise some matters one should consider in the course of effecting table space conversions.

Why universal?

As I see it, there are two main reasons to convert non-universal table spaces to the universal kind. First, it's the only way in which you can leverage the benefits of partition-by-growth table spaces, one of two varieties of universal table space (the other being partition-by-range). A partition-by-growth (PBG) table space -- as the name implies -- is one that is partitioned as needed to accommodate a table's growth: if, in creating a table space (or altering an existing table space), one specifies a DSSIZE (data set size) of 2G (2 gigabytes), then when the table space reaches 2 GB in size DB2 will add a second partition to it. If that partition fills up (i.e., when it, too, reaches 2 GB in size), a third partition will be added to the table space, and so on, up to the maximum number of partitions specified for the table space (that being MAXPARTITIONS, an alterable value).
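
To make that concrete, here is a sketch of the DDL (the database and table space names are hypothetical):

-- A PBG table space: starts with one partition, and can grow to
-- 16 partitions of 2 GB each
CREATE TABLESPACE PBGTS1 IN MYDB
  MAXPARTITIONS 16
  DSSIZE 2G
  SEGSIZE 32;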

What's good about this? Well, first and foremost, it eliminates the 64 GB size limit that previously existed for table spaces that are not range-partitioned -- a PBG table space, like a range-partitioned table space, can reach a size of 128 TB (and that's for the non-LOB data in the table -- with LOB data the data capacity of a table space can far exceed 128 TB).

Of course, you might think of a table in your DB2 for z/OS environment that would never approach 64 GB in size, and wonder, "Why should I convert THAT table's table space to universal PBG?" That would be a good question, if size were the only incentive for converting a non-universal table space to universal. There are, in fact, quite a few non-size-related reasons for getting these conversions done. I'll get to those presently, but first I want to clear up a misconception. Some folks think that PBG table spaces are not appropriate for small tables because of the "P" (for partition) in PBG: we've historically thought of partitioning as a means of getting more than 64 GB of data into a table, and so we equate "partition" with "big" and write off PBG for smaller tables. Time to change that thinking. Is PBG a good choice for a table that will never hold more than, say, 20 KB of data (this could be a reference or code table), even with 1G being the smallest allowable DSSIZE value? Sure it is. Will that little table's table space be 1 GB in size, with 20 KB of the space used and the rest wasted? Of course not. The table space's physical size will be determined by its PRIQTY and SECQTY specifications (primary and secondary space allocation, respectively). If those specifications are chosen appropriately, the table with 20 KB of data will occupy about 20 KB of disk space. The 1G DSSIZE specification means that IF the table space's size reaches 1 GB, THEN DB2 will add another partition to it. If the table space's size never reaches 1 GB, it will stay at one partition. Got it?

OK, on to the non-size-related incentives for going universal. A growing list of DB2 features can ONLY be used in conjunction with universal table spaces. These include (and I've indicated the DB2 release with which each feature was introduced):
  • Clone tables (DB2 9)
  • Hash-organized tables (DB2 10)
  • Inline LOBs (DB2 10)
  • Currently committed locking behavior (DB2 10)
  • Pending DDL (DB2 10)
  • ALTER TABLE with DROP COLUMN (DB2 11)
It is likely that future releases of DB2 for z/OS will introduce more features with a universal table space requirement.

Getting there got a lot easier starting with DB2 10

In my universal table space incentive list above I mentioned pending DDL. This is a capability that became available with DB2 10 running in new-function mode. In broad terms, pending DDL made possible the non-disruptive alteration of a number of aspects of the definition of a table space, table, or index. Want to change the SEGSIZE of a table space? The page size of an index? The DSSIZE (data set size) of a table space? No problem. Thanks to pending DDL, you just issue the appropriate ALTER statement and then materialize the change by way of an online REORG. Between the ALTER and the online REORG, is application access to the target object compromised? No. The affected table space or index is placed in the new (with DB2 10) and non-restrictive AREOR state, which simply means that a pending DDL change has been issued for the object but has not yet been materialized via online REORG.
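
For instance, each of the statements below is a pending change (the object names are hypothetical, and the table space is assumed to already be universal); none takes effect until a subsequent online REORG materializes it:

ALTER TABLESPACE MYDB.TS1 SEGSIZE 64;
ALTER TABLESPACE MYDB.TS1 DSSIZE 8G;
ALTER INDEX MYIX2 BUFFERPOOL BP8K0;  -- moves the index to 8K pages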

So, what does this have to do with conversion of non-universal to universal table spaces? Well, it so happens that the only pending DDL change possible for a non-universal table space is a change that would, when materialized, result in the non-universal table space becoming universal. For a simple or segmented table space containing a single table, that change is an ALTER TABLESPACE that adds a MAXPARTITIONS specification to the object's definition. Issue such an ALTER, and after a follow-on online REORG the table space will be a universal partition-by-growth (PBG) table space. For a table-controlled partitioned table space, the change is an ALTER TABLESPACE that adds a SEGSIZE specification to the object's definition. Run an online REORG for the object after the ALTER, and voila -- you have a universal partition-by-range (PBR) table space. Easy.
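
In DDL terms, the two conversion paths look like this (the database and table space names are hypothetical sketches):

-- Single-table simple or segmented table space to PBG universal:
ALTER TABLESPACE MYDB.SEGTS1 MAXPARTITIONS 8;

-- Table-controlled partitioned table space to PBR universal:
ALTER TABLESPACE MYDB.PARTTS1 SEGSIZE 32;

-- Either change is then materialized by an online REORG, e.g.:
-- REORG TABLESPACE MYDB.SEGTS1 SHRLEVEL CHANGE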

Some considerations

As you plan for the conversion of your non-universal table spaces to the universal variety, there are things to which you should give some thought:
  • The non-disruptive process described above for converting simple and segmented table spaces to PBG universal table spaces (ALTER with MAXPARTITIONS, followed by online REORG) is available, as I mentioned, only for single-table simple and segmented table spaces. For a simple or segmented table space containing multiple tables, you'd have to either go the unload/drop/create/re-load route (with one PBG universal table space created for each table in the multi-table simple or segmented table space), or wait and see if a future release of DB2 provides a non-disruptive universal table space conversion process for multi-table simple and segmented table spaces (this is a known requirement).
  • The non-disruptive process for converting traditional range-partitioned table spaces to PBR universal table spaces (ALTER with SEGSIZE, followed by online REORG) is available, as I mentioned, only for table-controlled partitioned table spaces. For an index-controlled partitioned table space, you'll first need to accomplish the conversion to table-controlled partitioning. That's most easily done by issuing an ALTER INDEX statement with NOT CLUSTER for the table space's partitioning index (as described in a blog post I wrote a couple of years ago, and as sketched after this list).
  • Materialization of the change from non-universal to universal table space (via online REORG after the appropriate ALTER statement) will invalidate packages that depend on the table space. These packages will be automatically rebound (by default) when the associated programs are next executed, or you can rebind them explicitly. Plan management should be active on your system by default (i.e., the value of the PLANMGMT parameter in ZPARM should be EXTENDED or BASIC). That being the case, in the unlikely event that a package rebind leads to performance degradation (performance will typically be the same or better following a rebind), you can very quickly switch back to the previous instance of the package via a REBIND PACKAGE command with the SWITCH(PREVIOUS) option (also sketched after this list).
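
Sketches of the two command-level actions mentioned in the list above (the index, collection, and package names are hypothetical):

-- Convert index-controlled to table-controlled partitioning by dropping
-- the clustering attribute of the partitioning index:
ALTER INDEX MYIX1 NOT CLUSTER;

-- Fall back to the previous instance of a package after a rebind:
REBIND PACKAGE(COLL1.PKG1) SWITCH(PREVIOUS)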

There you have it. If you've not already started the process of converting your non-universal table spaces to universal PBG or PBR table spaces, it's time to get that effort going. Your organization will benefit from your actions.