Thursday, July 9, 2015

How Will You Use DB2 for z/OS Global Variables?

About seven years ago, in an entry posted to the blog I maintained while working as an independent DB2 consultant (prior to rejoining IBM), I wrote of the evolution of DB2 for z/OS SQL from merely a data manipulation language to a programming language. That evolution continues with DB2 Version 11, in part through the introduction in that DB2 release of global variable functionality. If you want to know more about this new application-enabling feature of DB2 for z/OS, read on.

As SQL (in a DB2 for z/OS context) has become an ever-richer language, the scenarios in which lines of code written in some other language (e.g., COBOL or C or Java) have to be interspersed with SQL statements to accomplish some objective have decreased in number. A few more such scenarios were eliminated thanks to global variables. In particular, consider the situation in which you want to use one query to obtain a value from a DB2 table, which you then plug into a predicate of a second query. You could certainly write a simple routine (in COBOL or some other programming language) that declares a host variable, populates it with a value retrieved by a query of a table, and then issues a second query with the host variable referenced in a predicate. The pseudo-code would look something like this:

DECLARE VARIABLE :hvar AS CHARACTER STRING
SELECT COL1 INTO :hvar FROM T1 WHERE COL2 = '0123'
SELECT COL3 FROM T2 WHERE COL4 = :hvar

Now, you might say, "I don't need COBOL (or Java or whatever) to do that. I can write a native SQL procedure that gets this done using nothing but SQL statements." True enough, but there are scenarios in which use of a SQL PL routine would not be appropriate -- or maybe not even possible. Suppose, for example, that you're working with a tool (Data Studio, perhaps, or the SPUFI interface that comes with DB2) through which one interactively issues SQL statements for execution by a DB2 subsystem. What then? That's where a DB2 11 global variable (if you're in a DB2 11 new-function mode environment) comes in handy. Here are some statements that I executed on one of our IBM test DB2 systems, by way of SPUFI:

CREATE VARIABLE ROBSBLOG VARCHAR(30) DEFAULT 'HELLO';

SET ROBSBLOG = 'PRODUCT_SALES';

SELECT CAST(NAME AS VARCHAR(20)) AS NAME, COLNO
FROM SYSIBM.SYSCOLUMNS                        
WHERE TBNAME = ROBSBLOG                       
ORDER BY COLNO; 


And the result of that SELECT is:
                       
---------+---------+---------+---------+-------
NAME                   COLNO                  
---------+---------+---------+---------+-------
PRODUCT_ID                 1                  
MONTH                      2                  
UNIT_SALES                 3            
      


Easy breezy. And, no coding required beyond the SQL statements.

Now, there are a few informational items of which you should be aware regarding the use of DB2 global variables:

The value of a global variable is associated with the thread of an application process; thus, if the value of a global variable is set by a transaction program, the global variable's value will be discarded when the transaction ends and its thread is deallocated. If a global variable's value is set in a batch job, the variable's value will persist across COMMITs issued by the batch job (though it can of course be subsequently changed by the batch job), and will be discarded when the job completes and its thread is deallocated. In the case of my SPUFI exercise mentioned above, to make those SQL statements work as desired I had to set AUTOCOMMIT in my SPUFI session to NO. Had I not done that, the auto-COMMIT after my SET statement (referring to the statement that assigned the value 'PRODUCT_SALES' to the global variable) would have resulted in the assigned value being discarded, and then the statement

SELECT ROBSBLOG FROM SYSIBM.SYSDUMMY1;

would have returned the value 'HELLO' (the default value specified in my CREATE VARIABLE statement).

A DB2 global variable, once created, can be used by any authorization ID or role with the requisite privilege. An ID or role can be granted the READ or WRITE privilege (or both) for a given global variable. The owner of a global variable has, implicitly, all privileges on that variable.
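
In DDL terms, that looks like the following (the ID USER1 and the role APP_ROLE are hypothetical):

GRANT READ ON VARIABLE ROBSBLOG TO USER1;

GRANT WRITE ON VARIABLE ROBSBLOG TO ROLE APP_ROLE;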

Though a created global variable can be used by any authorization ID or role with the requisite privilege, a global variable's value does not extend beyond a given session (i.e., a given thread). So, if I create a global variable called VAR_X and set that variable to the value 'ABC' (assuming that it's a character string variable), another application process (with the right privilege) can use VAR_X, but the value 'ABC' that I assigned to VAR_X will not be picked up in any other session that references VAR_X. In other words, a global variable itself is globally available within a DB2 subsystem, but a value assigned to the global variable within a session does not extend to any other session.

Some global variables come with DB2 11 for z/OS. Besides global variables that you might create and use, there are several that are already there in a DB2 11 system. These built-in global variables are:

SYSIBM.CLIENT_IPADDR
SYSIBMADM.GET_ARCHIVE
SYSIBMADM.MOVE_TO_ARCHIVE

The first of these provides the IP address of a network-connected DB2 client application, or of a requester DB2 subsystem that establishes a connection to a remote DB2 host (for a local-to-DB2 application, the value of SYSIBM.CLIENT_IPADDR will be NULL). The other two built-in global variables are for use with the DB2-managed archiving (aka transparent archiving) feature of DB2 11.
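
Want to see the value of the first one for your own connection? A statement like this should do it (issued through a local-to-DB2 interface such as SPUFI, it will return a NULL):

SELECT SYSIBM.CLIENT_IPADDR FROM SYSIBM.SYSDUMMY1;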

There you have it. Yet another in a long line of DB2 for z/OS enhancements that have truly made SQL a programming language. When you get to DB2 11 in new-function mode (or if you're already there), give global variables a try-out, and think about how you might put them to work at your site.

Thursday, June 18, 2015

Are You Using DB2 for z/OS Real-Time Statistics? DB2 Is

DB2 for z/OS is making increased use of the information that it keeps in the real-time statistics tables in the DB2 catalog. You should be doing likewise. Read on to learn more.

For a long time, the only statistics to be found in DB2 "system" tables were those generated through execution of the RUNSTATS utility. Starting with DB2 Version 7 (around the 2000 time frame), DB2 could automatically and periodically (every 30 minutes, by default) write a variety of statistics about table spaces and index spaces to a pair of tables (one for table space statistics, and one for index space statistics), but only if you created those tables. With DB2 9 in new-function mode, the real-time statistics information was moved from user-created tables to a couple of new catalog tables, SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS (I wrote about that change a few years ago, in an entry posted to the blog that I maintained while working as an independent DB2 consultant, prior to re-joining IBM). In those days, real-time statistics were mainly used by DBAs, via vendor tools or by way of user-written programs (the latter often written in REXX, and sometimes involving invocation of the DB2-supplied stored procedure called DSNACCOX), to make more intelligent decisions regarding the execution of utilities such as REORG and COPY. That's still a very good use of DB2 real-time statistics, but what has interested me more of late -- and what I'm writing about here -- is DB2's increasing use of these statistics in support of new autonomic functions, and a number of newer columns in SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS that serve useful purposes beyond smarter utility management. I'll cover these topics now, in that order.

Increased use of real-time statistics by DB2 itself

Rather than try to provide an exhaustive list of each and every usage by DB2 itself of its own real-time statistics, I'll highlight a few that are particularly interesting to me:
  • DB2-managed allocation of utility sort work data sets -- This was a terrific enhancement, in my opinion, and I blogged about it back in 2011. By way of this new feature, built into the DB2 10 base code and retrofitted to DB2 Versions 8 and 9 via PTFs, DB2 utilities such as REORG, LOAD, REBUILD INDEX, and RUNSTATS could dynamically, automatically, and intelligently (using real-time statistics) manage the allocation of the sort work data sets required for their execution. At a growing number of DB2 for z/OS sites, utility JCL and DB2 ZPARM parameter values have been changed to put the DB2 utilities in charge of sort work data set allocation, and every such site that has communicated with me about that change has reported positive results in terms of both utility job reliability (a higher percentage of utility jobs successfully run to completion) and efficiency. That said, at too many sites utility sort work data sets are still being allocated the old way. If that's true where you work, read that aforementioned blog entry and make the changes described therein. It would be a good investment of your time.
  • Automatic calculation by REORG of the appropriate fixed hash space for hash-organized table spaces. DB2 10 for z/OS introduced hash-organized tables -- a row-organizing mechanism that enables super-efficient data access for a program that issues a singleton SELECT containing an "equals" predicate that references a particular unique key of the target table. For hash organization of data to be effective, a table space's fixed hash area must be of a sufficient size; however, you don't want to overdo the sizing of this area, because that would waste space (on disk and in memory). A good way to get a "just right" fixed hash area for a hash-organized table space is to let DB2 figure that out, and that's just what the REORG utility will do -- using real-time statistics -- when you run REORG with AUTOESTSPACE YES (the default) for the table space in question.
  • DB2-calculated space to accommodate table row length increases caused by UPDATEs. Starting with DB2 11 for z/OS, you can set aside space in a table space's pages that will be reserved exclusively for the accommodation of row length-increasing UPDATE statements (previously, space in a table space's pages could be set aside only for accommodation of INSERTs). If in a CREATE TABLESPACE or ALTER TABLESPACE statement you specify PCTFREE FOR UPDATE -1, for the associated table space the initial amount of space set aside in each page (following a REORG or a LOAD REPLACE utility operation) for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 (for subsequent REORG or LOAD REPLACE jobs) based on real-time statistics information (setting the new ZPARM parameter PCTFREE_UPD to AUTO will make -1 the default value of PCTFREE FOR UPDATE in CREATE TABLESPACE and ALTER TABLESPACE statements).
  • Identification of indexes in need of pseudo-deleted index entry clean-up. Starting with DB2 11 for z/OS, DB2 gets much better at physically deleting index entries that had been marked as pseudo-deleted in the course of row-delete operations. To determine which indexes are most in need of pseudo-deleted index entry clean-up (done by a background process), DB2 periodically checks relevant values in the SYSIBM.SYSINDEXSPACESTATS catalog table. You can read more about this in section 4.2 of the IBM redbook titled, "IBM DB2 11 for z/OS Performance Topics" (downloadable at http://www.redbooks.ibm.com/abstracts/sg248222.html?Open).
 
As DB2 itself is making ever-greater use of real-time statistics, so should you -- and not just to better manage utility operations (though that, as mentioned, continues to be a good use of real-time stats). Below I'll call attention to some of my favorite columns in the two real-time statistics tables.

Some of my favorite SYSTABLESPACESTATS columns
  • DATASIZE -- The number of bytes occupied by rows in a table space.
  • REORGCLUSTERSENS -- Why REORG a table space to reestablish clustering sequence, if that isn't important for the table space? This column provides information to let you make that determination.
  • UPDATESIZE -- This column (new with DB2 11) lets you know of the extent to which a table space is growing (or shrinking) as a result of length-increasing (or length-decreasing) UPDATE operations.
  • LASTDATACHANGE -- When was data in the table space last modified?
  • REORGNEARINDREF and REORGFARINDREF -- If you set aside space in a table space's pages for length-increasing UPDATE operations (a new option, as previously noted, with DB2 11), did that action reduce the occurrence of indirect references that are the result of row relocations caused by update-related row-length increases?
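
By way of example, a query like this one surfaces size and recent-change information for the table spaces in a database (the database name MYDB is hypothetical; this is just a sketch):

SELECT DBNAME, NAME, PARTITION, DATASIZE, LASTDATACHANGE
FROM SYSIBM.SYSTABLESPACESTATS
WHERE DBNAME = 'MYDB'
ORDER BY DATASIZE DESC;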
 
Some of my favorite SYSINDEXSPACESTATS columns
  • LASTUSED -- A great way to identify indexes that aren't doing you any good, and could therefore be dropped (following some due diligence on your part) to free up disk space and to reduce the CPU cost of insert and delete (and some update) and utility operations (see the sample query following this list).
  • REORGNUMLEVELS -- Are levels being added to, or removed from, any of your indexes?
  • DRIVETYPE -- An indication of whether an object's underlying disk volume is a spinning-disk drive (HDD) or a solid-state drive (SSD). Same information as the identically-named column in SYSTABLESPACESTATS, but for index spaces.
  • REORGLEAFFAR -- Are index page splits causing index leaf pages to be located far from where they optimally should be in an index? If so, would a larger page size make sense for this index?
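
About that LASTUSED query: it could be as simple as the following sketch (the one-year threshold is just an illustrative choice):

SELECT CREATOR, NAME, LASTUSED
FROM SYSIBM.SYSINDEXSPACESTATS
WHERE LASTUSED < CURRENT DATE - 1 YEAR
ORDER BY LASTUSED;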

I encourage you to look over all of the columns of SYSTABLESPACESTATS and SYSINDEXSPACESTATS. I'm pretty sure that you'll find some that will be helpful to you. Take advantage of this information. DB2 is.

Thursday, May 28, 2015

DB2 11 for z/OS: Setting Aside Space for Updates

A DB2 table space is well-organized when rows are physically located where they should be per the clustering index of the associated table. Because a high degree of "clustered-ness" is a desirable quality of a table space (mostly for performance reasons), a DB2 DBA will often take steps to preserve row sequencing between REORGs. One such step that is very commonly executed is the setting aside of space in a table space's pages to accommodate inserts of new rows (relevant for a table that is clustered by a key that is NOT continuously ascending -- in the case of a continuously-ascending clustering key, new rows will be inserted at the "end" of the table). This insert-accommodating space set-aside is accomplished by way of the PCTFREE option of CREATE and ALTER TABLESPACE.

PCTFREE has always been a nice tool in the DB2 DBA's physical database design tool kit, but it had a shortcoming: it didn't apply to UPDATE operations that caused a row's length to increase. If a row in a DB2 for z/OS table space becomes longer as a result of an UPDATE, and if that longer row will no longer fit into the page in which it had been stored, it will be moved to another page in the table space. Will the row's location, as indicated by the RID (row ID) for the row in entries of indexes defined on the table, be changed to reflect the update-driven move of the row to a new page? No. Index entries associated with the row will continue to point to the row's former location. When DB2, in using an index to build a query result set that will include the row that was moved because of a length-increasing UPDATE, arrives at the "original home" page to retrieve the row, it will find, in effect, a "We've moved!" sign, along with a pointer to the row's new page. And, if that moved row is again lengthened by an UPDATE and again is moved (if it became too long to go back into its "second home" page), another "We've moved!" sign will be posted, along with a pointer to the row's "third home" page. And so on.

These "We've moved!" signs related to update-induced row relocations are known, in DB2 parlance, as indirect references. Why are things done this way? Because updating a row's RID in a table's indexes every time a row-lengthening UPDATE caused a row move would add too much cost to UPDATE operations (the affected RIDs will be corrected with the next table space REORG). Obviously, rows with varying-length columns (such as VARCHAR columns) can be lengthened (and potentially relocated) as a result of UPDATE operations. So, too, can rows containing only fixed-length columns. How's that? Compression, that's how -- a row in a COMPRESS YES table space might compress differently (and with more length) with different column values resulting from an UPDATE.

Because indirect references are a drag on query performance (because of the hops to "new home" pages to which lengthened-beyond-fitting rows are relocated), one would like to reduce their occurrence. But how? There wasn't a good answer to this question until DB2 11 came along and provided the new PCTFREE FOR UPDATE option for CREATE and ALTER TABLESPACE. Here's how it works: if you set PCTFREE FOR UPDATE for a table space to, say, 10, then 10% of the space in the table space's pages (following a REORG or a LOAD REPLACE) will be set aside for the sole purpose of accommodating length-increasing UPDATEs. In other words, that space will NOT be available for newly inserted rows. Can you still set aside space for to-be-inserted rows in a table space's pages? Of course you can. If you wanted 15% of the space in pages of a table space to be set aside (again, following a REORG or a LOAD REPLACE) for new INSERTs, and 10% set aside for length-increasing UPDATEs, you'd specify the following:

PCTFREE 15 FOR UPDATE 10
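
In a full DDL statement, that specification would look like this (the database and table space names are hypothetical):

ALTER TABLESPACE MYDB.MYTS PCTFREE 15 FOR UPDATE 10;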

Would it be appropriate to specify a PCTFREE FOR UPDATE value for each and every one of your DB2 tables? Probably not. An unneeded, non-zero PCTFREE FOR UPDATE value will result in wasted space in a table space's pages. It's best to use this DB2 enhancement for tables that are the targets of length-increasing UPDATE operations. Is it easy to identify table spaces that are associated with such tables? Sure it is -- just check the new (with DB2 11) UPDATESIZE column of the SYSIBM.SYSTABLESPACESTATS real-time statistics table in the DB2 catalog. The value in this column shows the extent to which a table space grew (or shrank) as a result of length-changing UPDATE operations since the most recent REORG or LOAD REPLACE. Speaking of real-time stats, here's an interesting option: if you specify PCTFREE FOR UPDATE -1 for a table space, the initial space set aside in the table space's pages for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 based on certain real-time statistics values. This is one of a growing number of examples of DB2 using its own real-time statistics to enhance efficiency of operation -- I'll be writing more about that soon.
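
Here's a sketch of that UPDATESIZE check -- a query that surfaces the table spaces that have grown the most by way of length-increasing UPDATEs since their last REORG or LOAD REPLACE:

SELECT DBNAME, NAME, PARTITION, UPDATESIZE
FROM SYSIBM.SYSTABLESPACESTATS
WHERE UPDATESIZE > 0
ORDER BY UPDATESIZE DESC;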

One more thing: there is a new (with DB2 11) ZPARM, PCTFREE_UPD, that provides the default value for PCTFREE FOR UPDATE for ALTER and CREATE TABLESPACE statements. Setting PCTFREE_UPD to AUTO would make PCTFREE FOR UPDATE -1 (described above) the default.

How do you know if you've done good by specifying a non-zero PCTFREE FOR UPDATE value for a table space? Check to see that the incidence of indirect references (i.e., "We've moved!" signs) has decreased. Indirect references for table spaces can be tracked via the NEARINDREF and FARINDREF columns of the SYSIBM.SYSTABLEPART table, and the REORGNEARINDREF and REORGFARINDREF columns of SYSIBM.SYSTABLESPACESTATS. [An indirect reference is "near" if the associated row was moved to a page within 16 pages (or SEGSIZE/2, for a segmented table space -- and universal table spaces are segmented) of its previous "home" page; otherwise, the indirect reference is considered to be "far".]
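
A query along these lines (again, just a sketch) would let you track those indirect-reference counts:

SELECT DBNAME, NAME, PARTITION, REORGNEARINDREF, REORGFARINDREF
FROM SYSIBM.SYSTABLESPACESTATS
WHERE REORGNEARINDREF > 0 OR REORGFARINDREF > 0
ORDER BY REORGFARINDREF DESC;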

By the way, PCTFREE FOR UPDATE is expected to be particularly useful for tables that have VARCHAR columns that are initially set to NULL and are later updated to have non-null values -- that is a scenario in which you'd expect rows to become significantly longer as a result of UPDATEs.

And there you have it. Nice feature, eh? When you get to DB2 11 (or if you're already there), take advantage of it.

Wednesday, May 13, 2015

For a Large DB2 for z/OS Table, Should You Go With Partition-by-Range or Partition-by-Growth?

There are several aspects to this question: What do I mean by "large?" Is the table in question new, or does it exist already? What is the nature of the data in the table, and how will that data be accessed and maintained? I'll try to cover these various angles in this blog entry, and I hope that you will find the information provided to be useful.

What is a "large" table?

Why even ask this question? Because the relevance of the partition-by-range (PBR) versus partition-by-growth (PBG) question is largely dependent on table size. If a table is relatively small, the question is probably moot because it is unlikely that range-partitioning a smaller table will deliver much value. Partitioning by growth would, in that case, be the logical choice (for many smaller tables, given the default DSSIZE of 4G, a PBG table space will never grow beyond a single partition).

OK, so what is "smaller" and what is "larger" when you're talking about a DB2 for z/OS table? There is, of course, no hard and fast rule here. In my mind, a larger DB2 for z/OS table is one that has 1 million or more rows. That's not to say that a table with fewer than 1 million rows would never be range-partitioned -- it's just that the benefits of range-partitioning are likely to be more appealing for a table that holds (or will hold) millions of rows (or more).

When the table in question is a new one

This, to me, is the most interesting scenario, because it is the one in which the options are really wide open. I'll start by saying that you definitely want to go with a universal table space here, primarily because a number of recently delivered DB2 features and functions require the use of universal table spaces. But should the table space be PBR or PBG? A partition-by-growth table space can be as large as a partition-by-range table space, so that's not a differentiator. What, then, would be your criteria?

To me, the appeal of a PBG table space is mostly a factor of its being a labor-saving device for DB2 for z/OS DBAs. PBG table spaces have an almost "set it and forget it" quality. There is no need to identify a partitioning key, no need to determine partition limit key values, no worries about one partition getting to be much larger than others in a table space. You just choose reasonable DSSIZE and MAXPARTITIONS values, and you're pretty much done -- you might check back on the table space once in a while, to see if the MAXPARTITIONS value should be bumped up, but that's about it. Pretty sweet deal if you're a DBA.
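
For a new table, the whole thing can be as simple as this sketch (the database and table space names and the values shown are hypothetical):

CREATE TABLESPACE PBGTS IN MYDB
  MAXPARTITIONS 32
  DSSIZE 8 G
  SEGSIZE 32;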

On the other hand, PBR can deliver some unique benefits, and these should not be dismissed out of hand. Specifically:
  1. A PBR table space provides maximum partition independence from a utility perspective. You can even run the LOAD utility at the partition level for a PBR table space -- something you can't do with a PBG table space. You can also create data-partitioned secondary indexes (DPSIs) on a PBR table space (not do-able for a PBG table space), and that REALLY maximizes utility-related partition independence (though it should be noted that DPSIs can negatively impact the performance of queries that do not reference a PBR table space's partitioning key).
  2. PBR table spaces enable the use of page-range screening, a technique whereby the DB2 for z/OS optimizer can limit the partitions that have to be scanned to generate a result set when a query has a predicate that references a range-partitioned table space's partitioning key (or at least the lead column or columns thereof). Page-range screening doesn't apply to PBG table spaces, because a particular row in such a table space could be in any of the table space's partitions.
  3. A PBR table space can be a great choice for a table that would be effectively partitioned on a time-period basis. Suppose, for example, that the rows most recently inserted into a table are those most likely to be retrieved from the table. In that case, date-based partitioning (e.g., having each partition hold data for a particular week) would have the effect of concentrating a table's most "popular" rows in the pages of the most current partition(s), thereby reducing GETPAGE activity associated with retrieving sets of these rows. Date-based partitioning also enables very efficient purging of a partition's data (when the purge criterion is age-of-data) via a partition-level LOAD REPLACE operation with a dummy input data set (the partition's data could be first unloaded and archived, if desired).
  4. A PBR table space tends to maximize the effectiveness of parallel processing, whether of the DB2-driven query parallelization variety or in the form of user-managed parallel batch jobs. This optimization of parallel processing can be particularly pronounced for joins of tables that are partitioned on the same key and by the same limit key values.
Those are some attractive benefits, I'd say. Still, the previously mentioned DBA labor-saving advantages of PBG table spaces are not unimportant. That being the case, this is my recommendation when it comes to evaluating PBR versus PBG for a large, new table: consider first whether the advantages of PBR, listed above, are of significant value for the table in question. If they are, lean towards the PBR option. If they are not, PBG could be the right choice for the table's table space. In particular, PBG can make sense for a large table for which access will be mostly through transactions (as opposed to batch jobs), especially if those transactions will retrieve small result sets via queries for which most row filtering will occur at the index level. In that case, the advantages of range-partitioning could be of limited value.
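
If the PBR benefits win out for a given table, the DDL might look something like the following sketch (the names, columns, and limit key values are hypothetical; the partitioning scheme is date-based, per item 3 above):

CREATE TABLESPACE SALESTS IN MYDB
  NUMPARTS 3
  SEGSIZE 32
  DSSIZE 8 G;

CREATE TABLE SALES
  (SALE_DATE DATE NOT NULL,
   ACCT_ID   INTEGER NOT NULL,
   AMOUNT    DECIMAL(11,2) NOT NULL)
  IN MYDB.SALESTS
  PARTITION BY RANGE (SALE_DATE)
    (PARTITION 1 ENDING AT ('2015-01-31'),
     PARTITION 2 ENDING AT ('2015-02-28'),
     PARTITION 3 ENDING AT ('2015-03-31'));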

When the table space in question is an existing one

Here, the assumption is that the table space is not currently of the universal type. When that is true, and the aim is (as it should be) to convert the table space from non-universal to universal, the PBR-or-PBG decision will usually be pretty straightforward and will be based on the easiest path to universal: you'll go with universal PBR for an existing non-universal range-partitioned table space (if it is a table-controlled, versus an index-controlled, partitioned table space), because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a SEGSIZE for the table space) followed by an online REORG (if you have DB2 10 running in new-function mode, or DB2 11). Similarly, for an existing non-partitioned table space (segmented or simple, as long as it contains only one table), you'll go with universal PBG because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a MAXPARTITIONS value for the table space) followed by an online REORG (again, if your DB2 environment is Version 10 in new-function mode, or DB2 11).

I recently encountered an exception to this rule: if you have a non-universal, range-partitioned table space, with almost all of the data in the last of the table space's partitions (something that could happen, depending on how partition limit keys were initially set), you might decide not to go for the non-disruptive change to universal PBR, because then you'd have a PBR table space with almost all of the data in the last of the table space's partitions. Yes, with enough ALTER TABLE ALTER PARTITION actions, you could get the table's rows to be spread across many partitions (and with DB2 11, alteration of partition limit key values is a non-disruptive change), but that would involve a lot of work. You might in that case just opt to go to a PBG table space through an unload/drop/re-create/re-load process.

To sum things up: PBR and PBG have their respective advantages and disadvantages. In choosing between these flavors of universal table space, the most important thing is to put some thought into your decision. Give careful consideration to what PBR might deliver for a table, and think also of how useful PBG might be for the same table. If you weigh your options, the decision at which you ultimately arrive will likely be the right one.

Wednesday, April 29, 2015

DB2 for z/OS: Busting a Myth About Dynamic SQL

Twice in the past month, I've encountered a misunderstanding pertaining to dynamic SQL statements issued by applications that access DB2 for z/OS via network connections (these could also be called DDF-using applications, or DRDA requesters). Now seems as good a time as any to clear things up. I'll use this blog entry for that purpose.

The misunderstanding of which I speak: some people are under the impression that dynamic SQL statements issued by DDF-connected applications are zIIP-eligible when executed, while static SQL statements issued by DDF-connected applications are not zIIP-eligible.

This is not true. zIIP eligibility is not a dynamic or static SQL thing. It is a task thing (as described in a blog entry I wrote about 15 months ago). Here, "task" refers to the type of task in a z/OS system under which an SQL statement executes. zIIP-eligible work executes under a type of task called an enclave SRB (also referred to as a preemptible SRB); thus, when a SQL statement -- dynamic or static -- runs under an enclave SRB, it is zIIP-eligible. If it runs under a TCB, it is not zIIP-eligible. When does a SQL statement run under an enclave SRB in a z/OS system? Three scenarios come to mind:

When the SQL statement is issued by a DDF-connected application (i.e., by a DRDA requester). In that case, the SQL statement will run under an enclave SRB (again, that's a preemptible SRB) in the DB2 DDF address space. The statement's execution will be off-loadable -- as much as 60% so -- to a zIIP engine (if significantly less than 60% zIIP offload is seen for SQL statements issued by DDF-connected applications, the cause could well be zIIP engine contention in the LPAR). Note that static versus dynamic is a non-issue here -- either way, the statement runs under an enclave SRB and so is zIIP-eligible.

When the SQL statement is issued by a native SQL procedure that is called by a DDF-connected application. I stress "native SQL procedure" because a SQL statement (again, static or dynamic) issued by an external stored procedure (such as an external SQL procedure, or a stored procedure written in COBOL or C or Java) will not be zIIP-eligible, regardless of whether it is called by a DDF-connected application or by a local-to-DB2 program (such as a CICS transaction or a batch job). A SQL statement issued by an external DB2 stored procedure will not be zIIP eligible because such a stored procedure always runs under a TCB in a stored procedure address space, no matter what type of application -- local to DB2, or remote -- issues the CALL to invoke the stored procedure. Conversely, a native SQL procedure always runs under the task of the calling application process. If that process is a DDF-connected application, the application's z/OS task will be, as pointed out above, an enclave SRB in the DDF address space. That being the case, a native SQL procedure called by a DDF-connected application will run under the DDF enclave SRB representing that application in the z/OS LPAR, and the SQL statements issued by the native SQL procedure (and all statements of that type of stored procedure are SQL statements -- it's a stored procedure written in SQL) will execute under that enclave SRB and so will be (as previously noted) up to 60% zIIP-eligible.

You might think, "Hey, isn't a Java stored procedure zIIP-eligible?" Yes, the Java part of that stored procedure program will be zIIP eligible, but SQL is not Java, and the SQL statements issued by a Java stored procedure will run under a TCB in a stored procedure address space and so will not be zIIP-eligible (actually, they might be a little zIIP-eligible, because the Java stored procedure might "hold on" to a zIIP processor for just a bit after a SQL statement issued by the stored procedure starts executing).

When the SQL statement is parallelized by DB2. A query can be a candidate for parallelization by DB2 if: it is static and the associated package was bound with DEGREE(ANY); it is dynamic and the value of the CURRENT DEGREE special register is 'ANY'; or there is a row for the query in the SYSIBM.SYSQUERY and SYSIBM.SYSQUERYOPTS catalog tables (introduced with DB2 10 to enable statement-level control over execution behaviors such as parallelization and degree of parallelization, for static and dynamic SQL). If a query is parallelized by DB2, the "pieces" of the split query will run under enclave SRBs and so will be zIIP-eligible (up to 80%).
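
To put the first two of those switches in concrete terms (the collection and package names here are hypothetical), a dynamic SQL-issuing program would execute, ahead of the query:

SET CURRENT DEGREE = 'ANY';

and, for static SQL, the package would be bound (or rebound) with the DEGREE(ANY) option, as in this DSN subcommand:

REBIND PACKAGE(COLL1.PKG1) DEGREE(ANY)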

And one more thing... Since DB2 10 for z/OS, prefetch read operations have been 100% zIIP-eligible; thus, even if a query is running under a TCB and is therefore not zIIP-eligible, prefetch reads executed by DB2 on behalf of the query are zIIP-eligible. Prefetch read CPU time, as always, shows up in the DB2 database services address space (DBM1), not in the address space associated with the DB2-accessing application process (e.g., a CICS region, or the DB2 DDF address space, or a batch initiator address space).

So there you have it. To repeat a point made up front: zIIP eligibility of SQL statement execution is a task thing, not a dynamic versus static SQL thing. A static SQL statement issued by a DDF-connected application (i.e., by a DRDA requester) will be zIIP-eligible because it will run under an enclave SRB (i.e., a preemptible SRB). A dynamic SQL statement issued by a CICS transaction program will not be zIIP-eligible, because it will execute under a TCB. Clear? I hope so.

Wednesday, April 22, 2015

A DB2 11 for z/OS Temporal Data Enhancement You Might Have Missed

When DB2 10 for z/OS introduced temporal data functionality about five years ago, one of the first use cases that jumped to mind for many people was data-change auditing: a table could be created with (or an existing table altered to add) the "system time" characteristic, and thereafter one would have, in the history table associated with the base table, a record of row changes resulting from UPDATE and DELETE statements targeting the base table.

That's nice, but suppose you want to see more than WHAT a row looked like before it was changed. Suppose you also want to see WHO changed a row, and by what means (i.e., INSERT, UPDATE, or DELETE). I was recently contacted by a programmer, working on a new application in a DB2 10 for z/OS environment, who wanted to do just that. He and his team had created a table with the system time property, and in this table they had two columns to capture the identity of a data-changer and the nature of the change: one column to record the ID of the user who added a row to the table, and another column to record the ID of any user who subsequently updated the row. The rub, as this programmer saw it, concerned delete activity. How could he capture the ID of the user who deleted a row in the table? The delete operation would cause DB2 (by way of its system time temporal capability) to move a copy of the deleted row to the base table's history table, but that "pre-delete" image of the row would contain no information about the ID of the user associated with the delete operation. The programmer thought about updating a row before deleting it, just to capture (via the UPDATE) the ID of the user that would subsequently drive the row-delete action. That didn't seem like a desirable solution to the developer, but what else could he do? On top of this problem, there was the matter of not being able to easily determine whether a DELETE or an UPDATE caused a "before" image of a row to be placed in the history table. Not a good situation.

I'll tell you, I like to give people good news, and I had good news for this guy. The good news, I told him, was that his organization was about to migrate their DB2 for z/OS subsystems to DB2 11, and new functionality in that release would address his "who did what?" requirements while also allowing his team to simplify their application code.

I'm actually talking here about capabilities added to DB2 11 after its general availability, by way of several APARs and their respective PTFs. Key among these APARs is PM99683 (the text of this APAR references the related APARs that, together with PM99683, provide the new functionality I'm about to describe). The first goody here is a new type of generated column specification, GENERATED ALWAYS AS (CURRENT SQLID). That enables code simplification: there's no need to programmatically place the ID of a data-changer in a column of a row -- DB2 11 will do it for you (and note that CURRENT SQLID is one of several special registers that can now be used with GENERATED ALWAYS -- you can read more about this in the section of the DB2 11 SQL Reference that covers CREATE TABLE).

There's more: you can also have in a table a column that is GENERATED ALWAYS AS (DATA CHANGE OPERATION). What's that? It's just a 1-character indication of the nature of a data change operation: I for INSERT, U for UPDATE, D for DELETE. Isn't that cool?

I'm still not done. In addition to the new GENERATED ALWAYS AS (CURRENT SQLID) and GENERATED ALWAYS AS (DATA CHANGE OPERATION) options of CREATE TABLE (and ALTER TABLE), there is a very handy clause that can now be added to the ALTER TABLE statement used to "turn on" versioning (i.e., system time) for a table: ON DELETE ADD EXTRA ROW. When system time activation for a table includes this clause, DB2 will add an extra row to the base table's history table when a row is deleted. That is to say, you'll get (as usual) the "pre-delete" image of the row (with the "row end" timestamp showing when the row was made non-current by the DELETE), and you'll ALSO get ANOTHER version of the row added to the history table -- this one with a 'D' in your GENERATED ALWAYS AS (DATA CHANGE OPERATION) column, and the ID of the deleting user in your GENERATED ALWAYS AS (CURRENT SQLID) column.
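
Putting these pieces together, the DDL might look something like the following sketch (the table and column names are hypothetical; note the TIMESTAMP(12) columns that system time requires, and that the history table could alternatively be created with an explicit column list):

CREATE TABLE ACCT_INFO
  (ACCT_ID    INTEGER NOT NULL,
   BALANCE    DECIMAL(11,2),
   CHANGER_ID VARCHAR(128) GENERATED ALWAYS AS (CURRENT SQLID),
   CHANGE_OP  CHAR(1) GENERATED ALWAYS AS (DATA CHANGE OPERATION),
   SYS_START  TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
   SYS_END    TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
   TRANS_ID   TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
   PERIOD SYSTEM_TIME (SYS_START, SYS_END));

CREATE TABLE ACCT_INFO_HIST LIKE ACCT_INFO;

ALTER TABLE ACCT_INFO
  ADD VERSIONING USE HISTORY TABLE ACCT_INFO_HIST
  ON DELETE ADD EXTRA ROW;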

A little more information about this "extra row" that's added to the history table for a base table DELETE when ON DELETE ADD EXTRA ROW is in effect: first, the "row begin" and "row end" timestamps in the extra row are the same, and are equal to the "row end" value in the "as usual" row placed in the history table as a result of the DELETE (by "as usual" I mean the "before-change" row image that's always been placed in a history table when a base table row is deleted). Second, "extra rows" in the history table resulting from base table DELETEs with ON DELETE ADD EXTRA ROW in effect are NOT part of a base table query result set when that query has a FOR SYSTEM_TIME period specification, no matter what that specification is. If you want to see the extra rows added to a history table by way of ON DELETE ADD EXTRA ROW functionality, you'll need to query the history table explicitly.

The text of APAR PM99683, referenced a few paragraphs up from here, provides a set of SQL DDL and DML statements that very effectively illustrate the use and effects of the enhancements about which I've written in this blog entry. I encourage you to try these statements (or variations of them) on a DB2 11 test or development system at your site, to see for yourself what the new capabilities can do for you.

Temporal data support was a gem when it was introduced with DB2 10. That gem just got shinier.

Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not real clear as to what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it makes it easier and simpler to implement a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy (a sketch of the statements involved follows the list):
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table.
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of the deleted row into the archive table because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE, or of any DB2 global variable, for that matter, has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or a user-created global variable). It affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
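
Pulling those four steps together, here's a sketch of the statements involved (the table and column names are hypothetical):

-- Step 1: create the archive table
CREATE TABLE ACCT_ACTIVITY_AR LIKE ACCT_ACTIVITY;

-- Step 2: enable DB2-managed archiving for the base table
ALTER TABLE ACCT_ACTIVITY ENABLE ARCHIVE USE ACCT_ACTIVITY_AR;

-- Step 3: in the "mover" program, a DELETE becomes a move
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ACCT_ACTIVITY
WHERE ACTIVITY_DATE < CURRENT DATE - 3 MONTHS;

-- Step 4: in a retrieval program bound with ARCHIVESENSITIVE(YES),
-- flip the "switch" when archived rows are wanted, too
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT * FROM ACCT_ACTIVITY
WHERE ACCT_ID = 12345;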
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.