Robert's Db2 blog: DB2 10 for z/OS: Take Advantage of "Native" SQL User-Defined Functions

Wednesday, September 25, 2013

DB2 10 for z/OS: Take Advantage of "Native" SQL User-Defined Functions

Lots and lots of mainframe DB2 people know about the native SQL procedure technology delivered with DB2 9 for z/OS. That was, in my opinion, the most important enhancement of this aspect of DB2 functionality since stored procedures were introduced in the mid-1990s with DB2 for z/OS Version 4 (I've blogged multiple times about native SQL procedures, starting with an entry I posted a few years ago to the blog I maintained while working as an independent DB2 consultant). Many DB2-using organizations around the world are reaping the benefits of native SQL procedure utilization, but plenty of folks have overlooked a related enhancement provided by DB2 10 for z/OS: "native" SQL user-defined functions (UDFs). I have "native" in quotes because officially these UDFs are not known by that designation -- they are simply referred to in the DB2 10 SQL Reference as SQL scalar UDFs and SQL table UDFs; however, they share with native SQL procedures several characteristics that make them a great choice for packaging server-side data manipulation logic, so calling DB2 10 SQL UDFs "native" appeals to me. In this entry, I'll explain why native SQL UDFs (not gonna put quotes around native from here on out) are not only an advance in terms of programming productivity, but also, in some cases, a way to improved application performance versus traditional, "external" UDFs.

First, a little review. UDFs, like stored procedures, provide (as noted) a means of packaging data access logic on the DB2 for z/OS server. One of the main differences between UDFs and stored procedures is the way in which they are invoked. Stored procedures, of course, are invoked via a CALL statement, like this one:

CALL MYPROC(:V1, :V2);

UDFs, on the other hand, are invoked as a result of appearing in an SQL DML statement -- usually a SELECT statement. For example, if I have a simple UDF, called KM_MILES, that converts kilometers to miles, it could be executed by way of a statement like the one below:

SELECT KM_MILES(10) FROM SYSIBM.SYSDUMMY1;

This difference in the means of invocation is one reason why a person might choose to package some data manipulation logic in the form of a UDF as opposed to a stored procedure -- sometimes it's preferable to invoke the packaged functionality via a SELECT (or other SQL DML statement), versus a CALL statement (and sometimes the reverse is true, and a stored procedure will be favored over a UDF).

Now, one of the great things about native SQL procedures is that they significantly expanded the population of people who could develop stored procedures in a DB2 for z/OS environment: if you could write SQL, you could develop a native SQL procedure -- you didn't have to know COBOL or Java or one of the other languages used to develop traditional external DB2 stored procedures (external SQL procedures, introduced with DB2 Version 7 for z/OS, similarly expanded the stored procedure development population, but external SQL procedures execute as C language stored procedures, and many organizations prefer the "in-DB2" nature of native SQL procedures). Development-oriented DBAs, in particular, got heavily involved with stored procedure development at numerous sites, thanks to native SQL procedure technology. In the same way, native SQL UDFs extended DB2 for z/OS UDF development capabilities to people who know SQL but don't know COBOL or Java or C or one of the other external UDF languages.

Here, I have to make an acknowledgement. It is true that UDFs could be written in SQL prior to DB2 10 for z/OS; however, there were significant restrictions placed on such UDFs. Basically, in a pre-DB2 10 system the functionality of a SQL UDF was limited to what you could code in the UDF's RETURN statement. And the SQL expression in that RETURN statement couldn't reference a column name. And, it couldn't contain a scalar fullselect (i.e., you couldn't generate the returned value via a SELECT statement). Also, data-changing SQL statements (INSERT, UPDATE, DELETE) were not allowed (you could specify READS SQL DATA in the CREATE FUNCTION statement, but MODIFIES SQL DATA was not an option). If you wanted to declare variables and assign values to same in your SQL UDF, you were out of luck. Logic control SQL statements, such as GOTO, IF, and WHILE? Nope. Oh, and SQL table UDFs, which return a set of rows versus a single value? Forget about it (or, as my friends up New York City way might say, "fuhgeddaboudit").

With DB2 10 (in new-function mode), those restrictions went away. A DB2 10 SQL UDF can call a stored procedure, consume a result set generated by a called stored procedure, declare variables and assign values to those variables, loop through a set of statements, provide condition handlers, and more (and, DB2 10 allows for the coding of row-set-returning table UDFs in SQL). So DB2 10 not only expands the population of people who can develop UDFs (because UDFs can be written in SQL), it also enables SQL UDF developers to code routines that can do all kinds of useful things -- the possibilities have been hugely extended versus the DB2 9 situation. Programming productivity is further enhanced by the fact that DB2 10 SQL UDFs (like native SQL procedures) do not have the external-to-DB2 artifacts and processes associated with external routines -- there is no need for precompile, compile, and linkedit steps; no object libraries; no load libraries; no WLM execution environment (though you can specify a WLM environment to be used for running a SQL UDF in debug mode).

And what about the performance angle? There are several benefits here. For a client-server workload involving access to DB2 via the DDF address space from DRDA requesters, there's the fact that a SQL UDF runs under the task of the process invoking the UDF, as opposed to running under a TCB in a WLM-managed address space. When the application process is a DRDA requester, the task in the z/OS system is an enclave SRB in the DDF address space, and that means substantial zIIP offload for SQL UDF-related processing. Also related to DDF-using applications: the package associated with a SQL UDF can be bound with RELEASE(DEALLOCATE), providing the CPU efficiency boost associated with high-performance DBATs when the UDF is invoked through a SQL statement issued by a DRDA requester.

Another performance benefit has to do with the elimination of "thread-task switching" when SQL UDFs are utilized instead of external UDFs. Here's what I mean by "thread-task switching": when an external UDF is invoked, the DB2 thread of the application process has to be switched from the task of the application process to the task of the external UDF. When the UDF has completed its processing, the DB2 thread is switched back to the task of the UDF-invoking application process. When the UDF is a SQL UDF, there is no "other" task involved, because (as mentioned) the SQL UDF runs under the task of the invoking application process (and runs in the DB2 DBM1 address space); thus, there is no need to switch a DB2 thread from one task to another and back. If a UDF is invoked once in the course of the invoking SQL statement's execution, the lack of thread-task switching in the SQL UDF case may not be noticeable. But what if the UDF is invoked LOTS of times in the course of an SQL statement's execution? I heard recently of an interesting real-world situation of this nature. A DB2 for z/OS DBA told me of a SELECT statement at his site that included a COBOL-coded external UDF in the select-list and in the WHERE clause of the statement. One execution of this statement could drive over 100,000 executions of the external UDF. Statement elapsed time could go to several minutes. Much of that time was UDF time, and the vast majority of UDF elapsed time was shown by a monitor to be TCB wait time. In response to this performance issue, the DBA replaced the COBOL external UDF with a functionally equivalent SQL UDF (the organization is running DB2 10 for z/OS in new-function mode). The result? UDF-related TCB wait time was eliminated, and response time for the query went way down.

Believe me, folks, the ability to code functionally rich, native SQL UDFs is one of the real gems provided with DB2 10 for z/OS. It's a gem that has not received a lot of attention (understandable, given the many "wow" features of DB2 10, such as temporal data support, hash-organized tables, and LOB inlining), but it's one that you shouldn't overlook. Consider how you could put this technology to work in your shop.

12 comments:

AnonymousDecember 14, 2013 at 8:09 AM
Robert, you wrote, "the package associated with a SQL UDF can be bound with RELEASE(DEALLOCATE)." How is this accomplished?

CREATE/ALTER FUNCTION for an SQL scalar UDF has several bind options but not one for RELEASE, which is unlike CREATE/ALTER PROCEDURE for a native SP.

This suggests that an extra REBIND PACKAGE is required, but why was the option left out, seemingly deliberately?

ALTER FUNCTION states:

"REGENERATE is different than the REBIND PACKAGE command. REBIND PACKAGE rebinds the SQL statements (usually to generate better access paths for those statement) but the SQL control statements in the function definition are not rebound."

What are these "control statements" and what are the implications of their not being "rebound" in the DEALLOCATE case?
ReplyDelete
Replies
AnonymousDecember 14, 2013 at 8:15 AM
Robert, you wrote, "vast majority of UDF elapsed time was shown by a monitor to be TCB wait time." Does this mean Class 7 elapsed time - Class 8 CPU time for the UDF package in the Accounting report? How did you know what was causing the wait time?

Why was a UDF being used in the SELECT instead of using a common routine in the calling program?
ReplyDelete
Replies
AnonymousDecember 14, 2013 at 8:28 AM
Robert, CREATE FUNCTION for SQL table UDFs does not seem to indicate support for the procedure language unlike SQL scalar UDFs and is missing the bind options available for SQL scalar UDFs. From the examples in the manual it appears that SQL table UDFs are limited to a single SELECT in the RETURN with no bind options. Is this a correct interpretation? Can REBIND PACKAGE be used to get the desired bind options like EXPLAIN? Why were they deliberately left out?

It is not evident what is allowed for the RETURN-statement.
ReplyDelete
Replies
AnonymousDecember 15, 2013 at 11:36 AM
Robert, in a previous post I should have written:
Class 7 elapsed time - Class 8 total suspension time for the UDF package in the Accounting report?

Or were you referring to the Class 7 NOT ACCOUNTED time?
ReplyDelete
Replies
AnonymousDecember 17, 2013 at 8:43 AM
Robert, thank you for your answers.
ReplyDelete
Replies

Add comment

Note: Only a member of this blog may post a comment.