If you are not real clear as to what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.
What I most want to make clear about DB2-managed data archiving is this: it makes it easier and simpler to implement a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.
Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.
OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.
As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.
Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
- A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy, statement, but in any case the archive table needs to have the same column layout as the base table.
- The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
- To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of the deleted row into the archive table because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE, or of any DB2 global variable, for that matter, has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or a user-created global variable). It affects only the session in which it is set.
- If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.