Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not entirely clear as to what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it makes it easier and simpler to implement a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table.
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of the deleted row into the archive table because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE, or of any DB2 global variable, for that matter, has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or a user-created global variable). It affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
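The four steps above can be sketched in SQL as follows. The table and column names here (ACCT_ACTIVITY, ACCT_ACTIVITY_AR, and so on) are hypothetical, used purely for illustration:

```sql
-- Step 1: create the archive table with the same column layout as the base table
CREATE TABLE ACCT_ACTIVITY_AR LIKE ACCT_ACTIVITY;

-- Step 2: enable DB2-managed archiving for the base table
ALTER TABLE ACCT_ACTIVITY ENABLE ARCHIVE USE ACCT_ACTIVITY_AR;

-- Step 3: a "mover" program relocates "old and cold" rows simply by deleting
-- them from the base table - DB2 takes care of inserting them into the archive
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ACCT_ACTIVITY
  WHERE ACTIVITY_DATE < CURRENT DATE - 3 MONTHS;

-- Step 4: a program bound with ARCHIVESENSITIVE(YES) flips the "switch" when
-- older data is requested - DB2 drives the same SELECT against the archive
-- table, too, and combines the results with a UNION ALL
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT ACTIVITY_DATE, ACTIVITY_DESC
  FROM ACCT_ACTIVITY
  WHERE ACCT_NO = ?;
```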
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.

40 comments:

  1. Hello Robert, Thanks for sharing the details.

    Is there any way I can get a list of the top 20 or 30 largest tables by using an SQL query in DB2 for z/OS?

  2. Sorry about the delayed response.

    Largest by what measure? Number of rows? Physical size of table?

    Robert

  3. Can I initially populate the archive table via DB2 LOAD?

    Replies
    1. Sorry about the delayed response, Jim.

      Yes, you can use LOAD for a table space containing an archive table. Here is an interesting item from the DB2 11 for z/OS Utility Guide and Reference, in the section on LOAD: "LOAD REPLACE is not allowed on an archive-enabled table. (LOAD REPLACE is allowed on the table space that contains the archive table.)" So, no LOAD REPLACE for an archive-enabled table, but LOAD REPLACE is OK for the archive table that is associated with an archive-enabled table.
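      As a sketch, an initial population of the archive table via LOAD might look like this (the input ddname and table name are hypothetical):

```
LOAD DATA INDDN SYSREC RESUME YES
  INTO TABLE ACCT_ACTIVITY_AR
```

      Per the restriction quoted above, LOAD REPLACE would also be acceptable here, since the target is the archive table rather than the archive-enabled base table.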

      Robert

  4. Robert, is this feature available on DB2 V11 CM mode or NFM?

    Thanks!

    Jean Fang from Visa.

    Replies
    1. New-function mode, Jean.

      As a general rule, when use of a new DB2 for z/OS feature involves using new SQL syntax (such as ALTER TABLE... ENABLE ARCHIVE), that feature will only be available when the DB2 system is running in new-function mode.

      Robert

  5. Can you run REORG DISCARD on an archive enabled table?
    Thanks
    Henrik

    Replies
     1. You should be able to do that, Henrik, and that would be an OK thing to do IF YOUR INTENTION IS TO REMOVE THE DISCARDED ROWS from the one "logical" table that is the combination of an archive-enabled table and its associated archive table. If what you want instead is to remove the discarded rows from the archive-enabled table and place them in the associated archive table, REORG with DISCARD would not do this the way a DELETE from the archive-enabled table would if the SYSIBMADM.MOVE_TO_ARCHIVE global variable were set to 'Y'. In that case, the deleted rows are not removed from the "one logical table" - they are simply relocated from the archive-enabled table to the archive table.

      If you want to do that (relocate rows) using REORG DISCARD (perhaps for efficiency reasons, if we are talking about a really large number of rows), I'd think that you would want to do it within a window in which you have blocked user and application access to the table. If you have such a window, you can REORG the archive-enabled table's table space with DISCARD, and then load the discarded rows into the archive table during the window. With that approach, the rows would be in the "one logical table" prior to entering the window, and they'd be there (having been relocated) on exiting the window. If you go the REORG DISCARD + LOAD route while preserving user and application access to the table (which would require, on the archive table side, either INSERT or LOAD with SHRLEVEL CHANGE), there would be a period during which the DISCARDed rows are missing from the "one logical table" (because it would be a 2-step operation). That, in turn, could cause problems for users and/or applications accessing the "one logical table."

      Hope this helps.

      Robert
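      The two-step, in-window approach described above can be sketched as follows. The database, table space, and table names, and the DISCARD criterion, are all hypothetical:

```
--  Step 1, within the access-blocked window: REORG the base table's
--  table space and discard the "old and cold" rows to a data set
REORG TABLESPACE DB1.TS1
  DISCARD FROM TABLE ACCT_ACTIVITY
    WHEN (ACTIVITY_DATE < '2015-01-01')
  DISCARDDN SYSDISC

--  Step 2, still within the window: load the discarded rows
--  into the associated archive table
LOAD DATA INDDN SYSDISC RESUME YES
  INTO TABLE ACCT_ACTIVITY_AR
```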

     2. Thanks, Robert, for a very detailed answer. The reason for my question was exactly that: to do an efficient relocation of a huge number of rows from the archive-enabled table to the archive table.

    3. Robert -

      I have an application where we do quarterly REORGs with DISCARD in a small window, involving tens of millions of rows per table. History tables exist. Unloads are done on the active tables before the REORGs, and the unloaded data is LOAD RESUMEd into the history tables. In reading your response from 1/18/2016, it sounds like an archive-enabled table with a REORG with DISCARD run against it will not automatically move the discarded rows over to the archive table. Is that correct? I was hoping the transparent archive process would allow me to eliminate the UNLOAD/LOAD RESUME process I am currently doing and just do the REORG with DISCARD process with transparent archiving.

      Thank you.

    4. For movement of rows from an archive-enabled base table to its associated archive table to be accomplished automatically by DB2, the mechanism has to be DELETE from the base table by a process for which the SYSIBMADM.MOVE_TO_ARCHIVE built-in global variable has been set to Y. Since REORG with DISCARD does not use the SQL interface (i.e., does not involve execution of DELETE), the MOVE_TO_ARCHIVE global variable does not apply.

      You could potentially take advantage of DB2-managed archiving by moving rows from the base to the associated archive table (using DB2-managed archiving) on a more-frequent basis, so that DELETE could be used to accomplish the move, versus REORG with DISCARD. You could even do this move daily, so that (for example) the most current three months of data is kept in the base table, with the rest being in the associated archive table. With DB2-managed archiving, programs accessing the data do not have to be aware of the existence of the archive table - all data appears to be in the base table if programs are bound with ARCHIVESENSITIVE(YES) and if the value of the built-in global variable SYSIBMADM.GET_ARCHIVE is set to Y.

      You could request that a MOVE_TO_ARCHIVE-like option be added for REORG with DISCARD. Such a request would be formally submitted via the Request for Enhancements (RFE) application on IBM's Web site (the DB2 for z/OS page for RFE is at https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=184&PROD_ID=450&x=11&y=9).

      Robert

  6. Hi Robert, Does the archive table have to be in the same schema or can it have a different schema?

    Replies
    1. Sorry about the delayed response.

      No, the schema name (the qualifier of the table name) for an archive table does not have to be the same as that of the associated base table. That said, I'd think that making the schema name the same would be helpful - it would be an indicator that the tables are related to each other in some way - but it's not a technical requirement.

      Robert

  7. Hi Robert, Are there limits on the size of the archive table? Is it possible to have multiple archive tables associated to the base table i.e. a yearly archive?

    Replies
    1. Sorry about the delay in responding.

      A given base table can have one and only one associated archive table. The limit on the size of an archive table is the same as the limit on the size of any DB2 for z/OS table (assuming that the table is in a partitioned table space): 128 TB in a DB2 11 environment, 4 PB (4,096 TB) in a DB2 12 system (if the archive table is in a table space that leverages the relative page numbering enhancement introduced with DB2 12).

      Those size limits are for non-LOB table spaces. A LOB column can hold a very large amount of data, as pointed out in the entry in this blog that is viewable at http://robertsdb2blog.blogspot.com/2012/01/note-on-data-capacity-of-db2-for-zos.html.

      Robert

  8. Robert, do I understand correctly that in order to get a performance improvement you have to set the SYSIBMADM.GET_ARCHIVE global variable to 'N' so that a query will go only against the base table? Otherwise, since it goes against both tables, the performance is worse?

    Replies
    1. I believe that you would also get the performance benefit of more-efficient access to more-current rows concentrated in a base table by binding a program's package with ARCHIVESENSITIVE(NO). Doing that takes the GET_ARCHIVE global variable out of the picture.

      Robert

  9. Hi Robert,

    I am getting a -551 SQLCODE when I set SYSIBMADM.MOVE_TO_ARCHIVE = 'Y'. Could you please help me overcome this?

    Thanks

    Replies
    1. Sorry about the delay in responding. -551 is an authorization error. Your DB2 authorization ID does not have the privilege needed to change the value of a global variable. Someone will need to GRANT to your ID the WRITE privilege (or ALL PRIVILEGES, which is a combination of READ and WRITE) on the global variable. More information about that GRANT statement can be found online in the DB2 for z/OS Knowledge Center (see https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_grantvariableprivileges.html).

      Robert
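      As a sketch, with a hypothetical authorization ID, the GRANT would look like this:

```sql
GRANT WRITE ON VARIABLE SYSIBMADM.MOVE_TO_ARCHIVE TO APPUSER1;
```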

  10. Does SET SYSIBMADM.GET_ARCHIVE require any kind of special permission? What access should be granted to developers? As we give very specific access to programmers (not DBA access), we don't want to expose a wide range of access to programmers.

    Replies
    1. If you want a process to be able to change the value of a global variable (whether SYSIBMADM.GET_ARCHIVE or any other global variable, built-in or user-defined), the Db2 authorization ID of the process has to have been granted the WRITE privilege (or ALL PRIVILEGES, which includes READ and WRITE) on the global variable in question. See https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_grantvariableprivileges.html

      Robert

  11. SYSIBMADM.GET_ARCHIVE and SYSIBMADM.MOVE_TO_ARCHIVE are global variables. Let's say I have 10 archive tables, and we set these global variables to 'Y' or 'N'. Are they set for all archive tables, or for the one we are interested in - whether bringing back rows from the archive table or starting/stopping archiving when deleting rows?

    Replies
    1. Global variables - whether built-in or user-defined - are best thought of as being associated with application processes versus database tables. The scope of a global variable's setting is a "session" of an application process (I describe what is meant by "session" in the blog entry at https://robertsdb2blog.blogspot.com/2018/11/db2-for-zos-global-variables-what-is.html).

      Let's say that, for a particular session, SYSIBMADM.GET_ARCHIVE is set to 'Y'. That means that for ANY of the archive-enabled tables accessed in that session, queries will target the archive table and the base table IF the package in use was bound with ARCHIVESENSITIVE(YES).

      Similarly, if the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y' for a given session, a delete from ANY archive-enabled table will result in the deleted row being moved from the base table to the associated archive table.

      Robert

  12. Hello Robert,

    I have created a base table and its associated archive table, bound the package, and, after setting the global variable, data is inserted into the archive table when it is deleted from my base table.
    Now my question is, how can I prune the archive table? The archive table should be pruned depending upon the date on which data was deleted from the base table. Just as an example, I want to purge/delete data that is more than 7 years old.

    Please advise.

    Replies
    1. A periodic date-based purge from the archive table will be tricky if you do not have a date or timestamp column in that table. My advice on accomplishing your objective: add a row-change-timestamp column to the base and archive tables. Just an ALTER TABLE ADD COLUMN for the base table. For the archive table, I'd suggest unload/drop/re-create/re-load. Here's why: if you do add the row-change-timestamp column to the base and archive tables, it could be a very good idea to make the archive table a range-partitioned table that is partitioned on the row-change-timestamp column, and that would mean re-creating the archive table. Note that partitioning the archive table on the row-change-timestamp column would NOT require that the base table be partitioned in this way. The base table would in fact not have to even be range-partitioned. The base and archive tables only have to be LOGICALLY equivalent (same column types, in same order, with same names), and range-partitioning is a matter of PHYSICAL database design.

      Why I like the idea of partitioning the archive table on a row-change-timestamp: let's say you partition the archive table by week. When you delete a row from the base table (to move it to the archive table, assuming that the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to Y for the delete process), Db2 will put in the row-change-timestamp column the timestamp of the delete operation. That row will then go into the partition of the archive table that holds rows moved to the archive table (via delete from the base) this week. Delete a row from the base table next week (for the purpose of moving the row to the archive table) and that row will go into the partition of the archive table that receives rows deleted from the base table next week, and so on. 7 years later, purge data by emptying out the partition holding data that has been (per the row-change-timestamp value) in the archive table for 7 years.

      Robert
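      A sketch of the approach described above follows. All names are hypothetical, the column list is abbreviated to two data columns for illustration (the real archive table would carry all of the base table's columns), and the partition boundaries shown are just the first two of a weekly series:

```sql
-- Add the row-change-timestamp column to the base table
ALTER TABLE ACCT_ACTIVITY
  ADD COLUMN ROW_CHG_TS TIMESTAMP NOT NULL
    GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;

-- Re-create the archive table range-partitioned on that column, by week
CREATE TABLE ACCT_ACTIVITY_AR
  (ACCT_NO    CHAR(10)     NOT NULL,
   ACTIVITY   VARCHAR(100) NOT NULL,
   ROW_CHG_TS TIMESTAMP    NOT NULL
     GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP)
  PARTITION BY RANGE (ROW_CHG_TS)
   (PARTITION 1 ENDING AT ('2016-01-08-00.00.00'),
    PARTITION 2 ENDING AT ('2016-01-15-00.00.00'));
```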

  13. Hello Robert,

    We are looking into DB2 11 archive tables for archiving LOB data. Our LOB data are timestamp-keyed in DB2 11, and we're hoping that the older/less frequently referenced LOBs can be archived and then offloaded from archive tables onto tapes, while maintaining the ability to load the offloaded LOBs back for reference should that be needed. Reading up on DB2 archive table requirements, aux tables do not seem to qualify. Is that true?

    Replies
     1. It is my understanding that a table with one or more LOB columns can be archive-enabled. What you have seen may be indicating that an auxiliary table in a LOB table space cannot be archive-enabled apart from the associated base table. If you archive-enable a base table with a LOB column, the associated archive table will have the same LOB column and will have its own auxiliary table and LOB table space related to that LOB column. The LOB data would be archived together with, as opposed to separately from, the base-table data; so, when a base table row is moved to the associated archive table, its LOB value should be moved to the archive table's auxiliary table and LOB table space.

      Robert

    2. Hello Robert,

      I could also see, among the restrictions for archive tables, that the archive table cannot be an auxiliary or XML table. Can you explain this?

     3. A table that has an XML column or a LOB column can be archive-enabled, and that means an archive table can have an XML and/or a LOB column. The restriction you mentioned simply means that you can't take an existing auxiliary table (which holds LOB values) or an implicitly created table that holds XML values and make that table an archive table or an archive-enabled table. You can, however, make the associated BASE table an archive-enabled table or an archive table.

      Robert

  14. Hi robert,

    How can I find/determine which inactive/old/cold data can be moved from the base table to the archive table?

    Replies
    1. You have to know the application and know the data rows that are most frequently accessed by users. If certain rows (maybe more recently inserted into a table, or pertaining to a certain region or a certain set of products) are far more frequently accessed than other rows in a table, those "other" rows can potentially be moved to an archive table to enhance performance for retrieval of the "in-demand" rows. If you are a Db2 DBA and don't know about these data access patterns, communicate about this with someone on the team that owns the application.

      Robert

  15. How can I find rows that are frequently accessed and rows that are far more frequently accessed?

    Replies
     1. As I have mentioned previously, the best way to make this determination is by consulting with people who know the applications that access a given Db2 table for which you are considering the use of Db2 transparent archiving (the best candidate tables would be large ones, particularly those clustered on a not-continuously-ascending key, if you are thinking that frequency of access might be skewed towards rows inserted relatively recently into the table). If the application people (at least the people who own the applications that most frequently access the Db2 table in question) have no sense of which rows in the table might be more frequently accessed than others, it may be that there is not a notable skew in terms of frequency of access to rows in the table, and in that case use of Db2 transparent archiving for the table might not be beneficial from a performance perspective.

      Robert

  16. Hi, can we create an archive table for an already-created base table?

    Replies
    1. Yes. If the existing base table is named T1 then you would create an archive table that is logically equivalent to T1 (same number of columns, with same names and data types). Suppose that table is named T1_AR. You'd then "turn on" transparent archiving for T1 with a statement like this one:

      ALTER TABLE T1 ENABLE ARCHIVE USE T1_AR;

      Note that while the archive table has to be logically equivalent to the base table, it can be physically different (for example, the base table might be in a partition-by-growth table space, whereas the associated archive table could be in a partition-by-range table space).

      Robert

  17. Hi Robert, thanks for this informative article. One question I have is about referential integrity. If all related tables have DELETE CASCADE and all related tables are archive-enabled, and you delete from the parent, does the cascaded delete propagate to the respective archive tables?

    I'm guessing it does but I can't find anywhere to confirm this.

    Thanks in advance

    Anthony

    Replies
    1. With regard to RI, the parent and child tables can be archive-enabled tables, but they cannot be archive tables. Here is my understanding of what that means:

      1) An RI delete rule, such as DELETE CASCADE, is relevant only for the archive-enabled parent and child tables - not for the archive tables associated with the base, archive-enabled parent and child tables.

      2) If the delete rule is DELETE CASCADE and a parent table row is deleted, Db2 will check to see if there are any associated rows in the dependent base table (but not in the archive table associated with the dependent table). If an associated row in the base dependent table is found, that row will be deleted from that base table. The deleted row should be moved to the dependent table's archive table IF the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y' for the process that deleted the parent table row; otherwise, the row deleted from the base dependent table will be actually deleted (meaning, it will be gone following the delete operation).

      Robert
