Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not entirely clear as to what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it makes it easier and simpler to implement a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table.
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of the deleted row into the archive table because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE, or of any DB2 global variable, for that matter, has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or a user-created global variable). It affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
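The four steps above can be sketched in SQL as follows. The table and column names here (ACCT_ACTIVITY, ACCT_ACTIVITY_AR, and so on) are hypothetical, used purely for illustration:

```sql
-- Step 1: create the archive table with the same column layout as the base table
CREATE TABLE ACCT_ACTIVITY_AR LIKE ACCT_ACTIVITY;

-- Step 2: enable DB2-managed archiving for the base table
ALTER TABLE ACCT_ACTIVITY ENABLE ARCHIVE USE ACCT_ACTIVITY_AR;

-- Step 3: a "mover" program relocates "old and cold" rows simply by deleting
-- them from the base table - DB2 takes care of inserting them into the archive
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ACCT_ACTIVITY
  WHERE ACTIVITY_DATE < CURRENT DATE - 3 MONTHS;

-- Step 4: a program bound with ARCHIVESENSITIVE(YES) flips the "switch" when
-- older data is requested - DB2 drives the same SELECT against the archive
-- table, too, and combines the results with a UNION ALL
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT ACTIVITY_DATE, ACTIVITY_DESC
  FROM ACCT_ACTIVITY
  WHERE ACCT_NO = ?;
```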
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.

40 comments:

  1. Hello Robert, Thanks for sharing the details.

    Is there any way I can get a list of the top 20 or 30 largest tables by using an SQL query in DB2 for z/OS?

  2. Sorry about the delayed response.

    Largest by what measure? Number of rows? Physical size of table?

    Robert

  3. Can I initially populate the archive table via DB2 LOAD?

    Replies
    1. Sorry about the delayed response, Jim.

      Yes, you can use LOAD for a table space containing an archive table. Here is an interesting item from the DB2 11 for z/OS Utility Guide and Reference, in the section on LOAD: "LOAD REPLACE is not allowed on an archive-enabled table. (LOAD REPLACE is allowed on the table space that contains the archive table.)" So, no LOAD REPLACE for an archive-enabled table, but LOAD REPLACE is OK for the archive table that is associated with an archive-enabled table.
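      As a sketch, an initial population of the archive table via LOAD might look like this (the input ddname and table name are hypothetical):

```
LOAD DATA INDDN SYSREC RESUME YES
  INTO TABLE ACCT_ACTIVITY_AR
```

      Per the restriction quoted above, LOAD REPLACE would also be acceptable here, since the target is the archive table rather than the archive-enabled base table.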

      Robert

  4. Robert, is this feature available on DB2 V11 CM mode or NFM?

    Thanks!

    Jean Fang from Visa.

    Replies
    1. New-function mode, Jean.

      As a general rule, when use of a new DB2 for z/OS feature involves using new SQL syntax (such as ALTER TABLE... ENABLE ARCHIVE), that feature will only be available when the DB2 system is running in new-function mode.

      Robert

  5. Can you run REORG DISCARD on an archive enabled table?
    Thanks
    Henrik

    Replies
     1. You should be able to do that, Henrik, and that would be an OK thing to do IF YOUR INTENTION IS TO REMOVE THE DISCARDED ROWS from the one "logical" table that is the combination of an archive-enabled table and its associated archive table. If what you want instead is to remove the discarded rows from the archive-enabled table and place them in the associated archive table, REORG with DISCARD would not do this the way a DELETE from the archive-enabled table would if the SYSIBMADM.MOVE_TO_ARCHIVE global variable were set to 'Y'. In that case, the deleted rows are not removed from the "one logical table" - they are simply relocated from the archive-enabled table to the archive table.

      If you want to do that (relocate rows) using REORG DISCARD (perhaps for efficiency reasons, if we are talking about a really large number of rows), I'd think that you would want to do it within a window in which you have blocked user and application access to the table. If you have such a window, you can REORG the archive-enabled table's table space with DISCARD, and then load the discarded rows into the archive table during the window. With that approach, the rows would be in the "one logical table" prior to entering the window, and they'd be there (having been relocated) on exiting the window. If you go the REORG DISCARD + LOAD route while preserving user and application access to the table (which would require, on the archive table side, either INSERT or LOAD with SHRLEVEL CHANGE), there would be a period during which the DISCARDed rows are missing from the "one logical table" (because it would be a 2-step operation). That, in turn, could cause problems for users and/or applications accessing the "one logical table."

      Hope this helps.

      Robert
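      The two-step, in-window approach described above can be sketched as follows. The database, table space, and table names, and the DISCARD criterion, are all hypothetical:

```
--  Step 1, within the access-blocked window: REORG the base table's
--  table space and discard the "old and cold" rows to a data set
REORG TABLESPACE DB1.TS1
  DISCARD FROM TABLE ACCT_ACTIVITY
    WHEN (ACTIVITY_DATE < '2015-01-01')
  DISCARDDN SYSDISC

--  Step 2, still within the window: load the discarded rows
--  into the associated archive table
LOAD DATA INDDN SYSDISC RESUME YES
  INTO TABLE ACCT_ACTIVITY_AR
```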

     2. Thanks, Robert, for a very detailed answer. The reason for my question was exactly that: to do an efficient relocation of a huge number of rows from the archive-enabled table to the archive table.

    3. Robert -

      I have an application where we do quarterly REORGs with DISCARD in a small window, involving tens of millions of rows per table. History tables exist. Unloads are done on the active tables before the REORGs, and the unloaded data is LOAD RESUMEd into the history tables. In reading your response from 1/18/2016, it sounds like an archive-enabled table with a REORG with DISCARD run against it will not automatically move the discarded rows over to the archive table. Is that correct? I was hoping the transparent archive process would allow me to eliminate the UNLOAD/LOAD RESUME process I am currently doing and just do the REORG with DISCARD process with transparent archiving.

      Thank you.

    4. For movement of rows from an archive-enabled base table to its associated archive table to be accomplished automatically by DB2, the mechanism has to be DELETE from the base table by a process for which the SYSIBMADM.MOVE_TO_ARCHIVE built-in global variable has been set to Y. Since REORG with DISCARD does not use the SQL interface (i.e., does not involve execution of DELETE), the MOVE_TO_ARCHIVE global variable does not apply.

      You could potentially take advantage of DB2-managed archiving by moving rows from the base to the associated archive table (using DB2-managed archiving) on a more-frequent basis, so that DELETE could be used to accomplish the move, versus REORG with DISCARD. You could even do this move daily, so that (for example) the most current three months of data is kept in the base table, with the rest being in the associated archive table. With DB2-managed archiving, programs accessing the data do not have to be aware of the existence of the archive table - all data appears to be in the base table if programs are bound with ARCHIVESENSITIVE(YES) and if the value of the built-in global variable SYSIBMADM.GET_ARCHIVE is set to Y.

      You could request that a MOVE_TO_ARCHIVE-like option be added for REORG with DISCARD. Such a request would be formally submitted via the Request for Enhancements (RFE) application on IBM's Web site (the DB2 for z/OS page for RFE is at https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=184&PROD_ID=450&x=11&y=9).

      Robert

  6. Hi Robert, Does the archive table have to be in the same schema or can it have a different schema?

    Replies
    1. Sorry about the delayed response.

      No, the schema name (the qualifier of the table name) for an archive table does not have to be the same as that of the associated base table. That said, I'd think that making the schema name the same would be helpful - it would be an indicator that the tables are related to each other in some way - but it's not a technical requirement.

      Robert

  7. Hi Robert, Are there limits on the size of the archive table? Is it possible to have multiple archive tables associated to the base table i.e. a yearly archive?

    Replies
    1. Sorry about the delay in responding.

      A given base table can have one and only one associated archive table. The limit on the size of an archive table is the same as the limit on the size of any DB2 for z/OS table (assuming that the table is in a partitioned table space): 128 TB in a DB2 11 environment, 4 PB (4,096 TB) in a DB2 12 system (if the archive table is in a table space that leverages the relative page numbering enhancement introduced with DB2 12).

      Those size limits are for non-LOB table spaces. A LOB column can hold a very large amount of data, as pointed out in the entry in this blog that is viewable at http://robertsdb2blog.blogspot.com/2012/01/note-on-data-capacity-of-db2-for-zos.html.

      Robert

  8. Robert, do I understand correctly that in order to get a performance improvement you have to set the SYSIBMADM.GET_ARCHIVE global variable to 'N' so that a query will go only against the base table? Otherwise, since it goes against both tables, the performance is worse?

    Replies
    1. I believe that you would also get the performance benefit of more-efficient access to more-current rows concentrated in a base table by binding a program's package with ARCHIVESENSITIVE(NO). Doing that takes the GET_ARCHIVE global variable out of the picture.

      Robert

  9. Hi Robert,

    I am getting a -551 SQLCODE when I set SYSIBMADM.MOVE_TO_ARCHIVE = 'Y'. Could you please help me overcome this?

    Thanks

    Replies
    1. Sorry about the delay in responding. -551 is an authorization error. Your DB2 authorization ID does not have the privilege needed to change the value of a global variable. Someone will need to GRANT to your ID the WRITE privilege (or ALL PRIVILEGES, which is a combination of READ and WRITE) on the global variable. More information about that GRANT statement can be found online in the DB2 for z/OS Knowledge Center (see https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_grantvariableprivileges.html).

      Robert
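      As a sketch, with a hypothetical authorization ID, the GRANT would look like this:

```sql
GRANT WRITE ON VARIABLE SYSIBMADM.MOVE_TO_ARCHIVE TO APPUSER1;
```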

  10. Does SET SYSIBMADM.GET_ARCHIVE require any kind of special permission? What access should be granted to developers? As we give very specific access to programmers (not DBA access), we don't want to expose a wide range of access to programmers.

    Replies
    1. If you want a process to be able to change the value of a global variable (whether SYSIBMADM.GET_ARCHIVE or any other global variable, built-in or user-defined), the Db2 authorization ID of the process has to have been granted the WRITE privilege (or ALL PRIVILEGES, which includes READ and WRITE) on the global variable in question. See https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_grantvariableprivileges.html

      Robert

  11. SYSIBMADM.GET_ARCHIVE and SYSIBMADM.MOVE_TO_ARCHIVE are global variables. Let's say I have 10 archive tables, and we set these global variables to 'Y' or 'N'. Are they set for all archive tables, or for the one we are interested in - whether bringing back rows from the archive table or starting/stopping archiving when deleting rows?

    Replies
    1. Global variables - whether built-in or user-defined - are best thought of as being associated with application processes versus database tables. The scope of a global variable's setting is a "session" of an application process (I describe what is meant by "session" in the blog entry at https://robertsdb2blog.blogspot.com/2018/11/db2-for-zos-global-variables-what-is.html).

      Let's say that, for a particular session, SYSIBMADM.GET_ARCHIVE is set to 'Y'. That means that for ANY of the archive-enabled tables accessed in that session, queries will target the archive table and the base table IF the package in use was bound with ARCHIVESENSITIVE(YES).

      Similarly, if the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y' for a given session, a delete from ANY archive-enabled table will result in the deleted row being moved from the base table to the associated archive table.

      Robert

  12. Hello Robert,

    I have created a base table and its associated archive table, bound the package, and, after setting the global variable, data is inserted into the archive table when it is deleted from my base table.
    Now my question is, how can I prune the archive table? The archive table should be pruned depending upon the date on which data was deleted from the base table. Just as an example, I want to purge/delete data that is more than 7 years old.

    Please advise.

    Replies
    1. A periodic date-based purge from the archive table will be tricky if you do not have a date or timestamp column in that table. My advice on accomplishing your objective: add a row-change-timestamp column to the base and archive tables. Just an ALTER TABLE ADD COLUMN for the base table. For the archive table, I'd suggest unload/drop/re-create/re-load. Here's why: if you do add the row-change-timestamp column to the base and archive tables, it could be a very good idea to make the archive table a range-partitioned table that is partitioned on the row-change-timestamp column, and that would mean re-creating the archive table. Note that partitioning the archive table on the row-change-timestamp column would NOT require that the base table be partitioned in this way. The base table would in fact not have to even be range-partitioned. The base and archive tables only have to be LOGICALLY equivalent (same column types, in same order, with same names), and range-partitioning is a matter of PHYSICAL database design.

      Why I like the idea of partitioning the archive table on a row-change-timestamp: let's say you partition the archive table by week. When you delete a row from the base table (to move it to the archive table, assuming that the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to Y for the delete process), Db2 will put in the row-change-timestamp column the timestamp of the delete operation. That row will then go into the partition of the archive table that holds rows moved to the archive table (via delete from the base) this week. Delete a row from the base table next week (for the purpose of moving the row to the archive table) and that row will go into the partition of the archive table that receives rows deleted from the base table next week, and so on. 7 years later, purge data by emptying out the partition holding data that has been (per the row-change-timestamp value) in the archive table for 7 years.

      Robert
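      A sketch of the approach described above follows. All names are hypothetical, the column list is abbreviated to two data columns for illustration (the real archive table would carry all of the base table's columns), and the partition boundaries shown are just the first two of a weekly series:

```sql
-- Add the row-change-timestamp column to the base table
ALTER TABLE ACCT_ACTIVITY
  ADD COLUMN ROW_CHG_TS TIMESTAMP NOT NULL
    GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;

-- Re-create the archive table range-partitioned on that column, by week
CREATE TABLE ACCT_ACTIVITY_AR
  (ACCT_NO    CHAR(10)     NOT NULL,
   ACTIVITY   VARCHAR(100) NOT NULL,
   ROW_CHG_TS TIMESTAMP    NOT NULL
     GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP)
  PARTITION BY RANGE (ROW_CHG_TS)
   (PARTITION 1 ENDING AT ('2016-01-08-00.00.00'),
    PARTITION 2 ENDING AT ('2016-01-15-00.00.00'));
```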

  13. Hello Robert,

    We are looking into DB2 11 archive tables for archiving LOB data. Our LOB data are timestamp-keyed in DB2 11, and we're hoping that the older/less frequently referenced LOBs can be archived and then offloaded from archive tables onto tapes, while maintaining the ability to load the offloaded LOBs back for reference should that be needed. Reading up on DB2 archive table requirements, aux tables do not seem to qualify. Is that true?

    Replies
     1. It is my understanding that a table with one or more LOB columns can be archive-enabled. What you have seen may be indicating that an auxiliary table in a LOB table space cannot be archive-enabled apart from the associated base table. If you archive-enable a base table with a LOB column, the associated archive table will have the same LOB column and will have its own auxiliary table and LOB table space related to that LOB column. The LOB data would be archived together with, as opposed to separately from, the base-table data; so, when a base table row is moved to the associated archive table, its LOB value should be moved to the archive table's auxiliary table and LOB table space.

      Robert

    2. Hello Robert,

      I could also see, among the restrictions for archive tables, that the archive table cannot be an auxiliary or XML table. Can you explain this?

     3. A table that has an XML column or a LOB column can be archive-enabled, and that means an archive table can have an XML and/or a LOB column. The restriction you mentioned simply means that you can't take an existing auxiliary table (which holds LOB values) or an implicitly created table that holds XML values and make that table an archive table or an archive-enabled table. You can, however, make the associated BASE table an archive-enabled table or an archive table.

      Robert

  14. Hi robert,

    How can I find/determine which inactive/old/cold data can be moved from the base table to the archive table?

    Replies
    1. You have to know the application and know the data rows that are most frequently accessed by users. If certain rows (maybe more recently inserted into a table, or pertaining to a certain region or a certain set of products) are far more frequently accessed than other rows in a table, those "other" rows can potentially be moved to an archive table to enhance performance for retrieval of the "in-demand" rows. If you are a Db2 DBA and don't know about these data access patterns, communicate about this with someone on the team that owns the application.

      Robert

  15. How can I find rows that are frequently accessed and rows that are far more frequently accessed?

    Replies
     1. As I have mentioned previously, the best way to make this determination is by consulting with people who know the applications that access a given Db2 table for which you are considering the use of Db2 transparent archiving (the best candidate tables would be large ones, particularly those clustered on a not-continuously-ascending key, if you are thinking that frequency of access might be skewed towards rows inserted relatively recently into the table). If the application people (at least the people who own the applications that most frequently access the Db2 table in question) have no sense of which rows in the table might be more frequently accessed than others, it may be that there is not a notable skew in terms of frequency of access to rows in the table, and in that case use of Db2 transparent archiving for the table might not be beneficial from a performance perspective.

      Robert

  16. Hi, can we create an archive table for an already-created base table?

    Replies
    1. Yes. If the existing base table is named T1 then you would create an archive table that is logically equivalent to T1 (same number of columns, with same names and data types). Suppose that table is named T1_AR. You'd then "turn on" transparent archiving for T1 with a statement like this one:

      ALTER TABLE T1 ENABLE ARCHIVE USE T1_AR;

      Note that while the archive table has to be logically equivalent to the base table, it can be physically different (for example, the base table might be in a partition-by-growth table space, whereas the associated archive table could be in a partition-by-range table space).

      Robert

  17. Hi Robert, thanks for this informative article. One question I have is about referential integrity. If all related tables have DELETE CASCADE and all related tables are archive-enabled, and you delete from the parent, does the cascaded delete propagate to the respective archive tables?

    I'm guessing it does but I can't find anywhere to confirm this.

    Thanks in advance

    Anthony

    Replies
    1. With regard to RI, the parent and child tables can be archive-enabled tables, but they cannot be archive tables. Here is my understanding of what that means:

      1) An RI delete rule, such as DELETE CASCADE, is relevant only for the archive-enabled parent and child tables - not for the archive tables associated with the base, archive-enabled parent and child tables.

      2) If the delete rule is DELETE CASCADE and a parent table row is deleted, Db2 will check to see if there are any associated rows in the dependent base table (but not in the archive table associated with the dependent table). If an associated row in the base dependent table is found, that row will be deleted from that base table. The deleted row should be moved to the dependent table's archive table IF the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y' for the process that deleted the parent table row; otherwise, the row deleted from the base dependent table will be actually deleted (meaning, it will be gone following the delete operation).

      Robert
