Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not entirely clear on what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it makes it much easier to implement a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.
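
To make that do-it-yourself approach concrete, here is roughly the kind of query the modified programs would issue. The table and column names (ORDER_TB, ORDER_AR_TB, ACCT_NO, and so on) are hypothetical, used just for illustration:

-- Pre-DB2 11 approach: the program itself queries both the base table
-- and the archive table, and glues the two result sets together
SELECT ORDER_NO, ORDER_DATE, ORDER_AMT
  FROM ORDER_TB
  WHERE ACCT_NO = 12345
UNION ALL
SELECT ORDER_NO, ORDER_DATE, ORDER_AMT
  FROM ORDER_AR_TB
  WHERE ACCT_NO = 12345;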

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table. (The whole 4-step flow is sketched in the code example that follows this list.)
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. Moving "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of a deleted row into the archive table, because DB2 takes care of that when, as mentioned, SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing base table rows, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE -- or of any DB2 global variable, for that matter -- has effect only for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that such a variable is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can reference a given built-in or user-created global variable); it still affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests account activity information beyond the past three months (less than 10% of the time, in this example scenario), the same retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y', DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
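
To pull the four steps together, here is a minimal sketch of the whole flow. The table, column, and predicate names (ACCT_ACTIVITY, ACCT_ACTIVITY_AR, ACTIVITY_DATE, ACCT_NO) are hypothetical:

-- Step 1: create an archive table with the same column layout as the base table
CREATE TABLE ACCT_ACTIVITY_AR LIKE ACCT_ACTIVITY;

-- Step 2: associate the archive table with the base table
ALTER TABLE ACCT_ACTIVITY
  ENABLE ARCHIVE USE ACCT_ACTIVITY_AR;

-- Step 3: in the "mover" program, a DELETE becomes a move to the archive table
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ACCT_ACTIVITY
  WHERE ACTIVITY_DATE < CURRENT DATE - 3 MONTHS;

-- Step 4: in a retrieval program bound with ARCHIVESENSITIVE(YES), this
-- SELECT touches only the base table...
SET SYSIBMADM.GET_ARCHIVE = 'N';
SELECT ACTIVITY_DATE, ACTIVITY_AMT
  FROM ACCT_ACTIVITY
  WHERE ACCT_NO = 12345;

-- ...and when the "switch" is flipped, DB2 drives the same SELECT against
-- the archive table, too, combining the results with a UNION ALL
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT ACTIVITY_DATE, ACTIVITY_AMT
  FROM ACCT_ACTIVITY
  WHERE ACCT_NO = 12345;
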
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.

Friday, March 20, 2015

DB2 for z/OS: Non-Disruptively Altering a Native SQL Procedure

Not long ago, I was contacted by a DB2 for z/OS DBA who wanted to run a situation by me. On one of his organization's mainframe DB2 systems -- a system with very high transaction volumes and very stringent application availability requirements -- there was a native SQL procedure for which the ASUTIME specification had to be changed. The DBA wanted to see if I could help him find a non-disruptive way to effect this stored procedure modification.

[Background: a native SQL procedure is a DB2 stored procedure, written in SQL procedure language (SQL PL), for which the associated package is the procedure's only executable (a native SQL procedure does not have an external-to-DB2 load module). ASUTIME is an option that can be included in a CREATE or ALTER statement for a DB2 stored procedure, and it indicates the maximum amount of mainframe processor time, in CPU service units, that can be consumed in one execution of the stored procedure. The default value for ASUTIME is NO LIMIT, and in this case the DBA needed to set an ASUTIME limit for a stored procedure because it was sometimes running too long.]
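
For reference, here is what a minimal native SQL procedure definition looks like, with the ASUTIME option spelled out (here at its default of NO LIMIT). The procedure name, parameters, and body are hypothetical, chosen just to show where the option goes:

-- A native SQL procedure: an SQL PL body and no external load module
CREATE PROCEDURE PRODSCHM.UPDATE_BALANCE
  (IN P_ACCT_NO INTEGER,
   IN P_AMT     DECIMAL(11,2))
  VERSION V1
  LANGUAGE SQL
  ASUTIME NO LIMIT
BEGIN
  UPDATE ACCOUNT_TB
    SET BALANCE = BALANCE + P_AMT
    WHERE ACCT_NO = P_ACCT_NO;
END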

One stored procedure change mechanism that you really want to avoid, if at all possible, is drop and re-create. At some DB2 for z/OS sites, particularly when use of stored procedures is at an early stage, it is not uncommon for stored procedure changes to be accomplished through a drop-and-re-create process. That approach, aside from being relatively disruptive, can become infeasible once an organization starts using nested stored procedures (stored procedures that are called by other stored procedures). Native SQL procedures, in particular, make the drop-and-re-create means of stored procedure modification problematic, because an attempt to drop a stored procedure (whether native or external) will fail if that stored procedure is called by a native SQL procedure.

So, ALTER PROCEDURE is the way to go, versus drop and re-create, unless the change you want to make cannot be accomplished with ALTER (several parameter-related changes come to mind here -- ALTER PROCEDURE can be used to change the names of a native SQL procedure's parameters, but not to change the number of parameters for a stored procedure, or a parameter's usage (e.g., from IN to OUT), or a parameter's data type). You need to keep in mind, however, that changing some options of the current version of a native SQL procedure via ALTER PROCEDURE will cause the packages of programs that call the altered SQL procedure to be invalidated. Other ALTER PROCEDURE changes -- again, when it is the current version of the procedure that is modified -- cause the package of the native SQL procedure itself to be invalidated (and some current-version changes do both: they invalidate the SQL procedure's package and the packages of programs that call the SQL procedure). A table in the DB2 for z/OS SQL Reference shows the package invalidation effects of changing various options of the current version of a native SQL procedure via ALTER PROCEDURE. Here is the URL for information on ALTER PROCEDURE for a native SQL procedure, from the DB2 10 for z/OS Knowledge Center on IBM's Web site:

http://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_alterproceduresqlnative.dita

When you go to the Web page pointed to by the above URL, scroll down to Table 2, and there you will find the information pertaining to the package invalidation effects of ALTER PROCEDURE changes applied to the current version of a native SQL procedure. You will see in that table that changing ASUTIME via ALTER PROCEDURE does not invalidate (i.e., does not drive an implicit rebind or regeneration of) the package of the target native SQL procedure. This change does, however, invalidate the package of any program that calls the altered native SQL procedure. The DBA who brought his situation to my attention could have identified dependent packages (i.e., packages of programs that invoke the native SQL procedure for which the ASUTIME would be changed) via a query of the SYSIBM.SYSPACKDEP table in the DB2 catalog, and then could have issued the ALTER PROCEDURE statement and rebound the affected packages, given a brief window of time during which the related programs could be offline. That approach was not attractive to the DBA, given the fact that scheduled application outages at his site were hard to come by.
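
By way of illustration, a dependent-package query along these lines would have done the job. The schema and procedure names are hypothetical, and in practice you would also want to filter on the BTYPE column, using the value documented for stored procedure dependencies in the catalog reference for your DB2 release:

-- Find packages that depend on the stored procedure -- i.e., packages of
-- programs that CALL it, which an ALTER of the current version could invalidate
SELECT DCOLLID, DNAME, DCONTOKEN
  FROM SYSIBM.SYSPACKDEP
  WHERE BQUALIFIER = 'PRODSCHM'
    AND BNAME = 'UPDATE_BALANCE';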

Putting our heads together, the DBA and I came up with an alternative, non-disruptive process for effecting the desired change in the definition of the native SQL procedure. This approach relies on two ALTER PROCEDURE statements, not one, to get the job done. The first is an ALTER PROCEDURE with the ADD VERSION clause, along with the new ASUTIME value (this ALTER PROCEDURE statement would also include the list of parameters for the procedure, any options that will have non-default values, and the body of the procedure). Because the ASUTIME change applies to the new, added version of the SQL procedure, packages associated with callers of the current version of the stored procedure are not invalidated when the ALTER PROCEDURE with ADD VERSION is executed. A second ALTER PROCEDURE statement with the ACTIVATE VERSION clause is issued to make the just-added procedure version (the one with the updated ASUTIME value) the current version, and the change process is complete. Callers of the native SQL procedure are not disrupted by the ALTER PROCEDURE statement with ACTIVATE VERSION, because that statement just causes subsequent CALLs naming the SQL procedure to use the version of the procedure that was added in step 1 of this 2-step process.
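
Here is a sketch of the two statements, using the same hypothetical procedure as in the earlier example, with a limit of 20 CPU service units standing in for the real new ASUTIME value:

-- Step 1: add a new version carrying the changed ASUTIME value. ADD VERSION
-- repeats the parameter list, the non-default options, and the full body.
ALTER PROCEDURE PRODSCHM.UPDATE_BALANCE
  ADD VERSION V2
  (IN P_ACCT_NO INTEGER,
   IN P_AMT     DECIMAL(11,2))
  ASUTIME LIMIT 20
BEGIN
  UPDATE ACCOUNT_TB
    SET BALANCE = BALANCE + P_AMT
    WHERE ACCT_NO = P_ACCT_NO;
END;

-- Step 2: make the new version the current version. Subsequent CALLs pick
-- it up, and no packages are invalidated along the way.
ALTER PROCEDURE PRODSCHM.UPDATE_BALANCE
  ACTIVATE VERSION V2;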

The DBA tested this 2-step native SQL procedure change process during a period of heavy activity on a DB2 for z/OS system, and as expected, no contention issues were seen (even so, his plan is to use this process during periods when the system is less busy, and that is probably a good idea).

So, if you're willing to issue two ALTER PROCEDURE statements versus one, you can non-disruptively change ASUTIME and most other characteristics of a native SQL procedure. An extra ALTER PROCEDURE statement seems to me to be a small price to pay for enhanced application availability.