Thursday, January 21, 2016

DB2 for z/OS-Managed Archiving, or System-Time Temporal?

Recently I had the opportunity to talk to a group of application developers and DBAs about the temporal data support that was introduced with DB2 10 for z/OS. One of the meeting attendees asked an interesting question: for data-archive purposes, should one use system-time temporal functionality (by which DB2 retains "before" images of rows that are affected by update or delete operations), or the DB2-managed archiving capability delivered with DB2 11 for z/OS?

At first glance, that might look like a question with an obvious answer: because the question is about data archiving, the answer should be, "Use DB2-managed archiving," right? In actuality, one really could go either way, depending on an organization's requirements regarding the data in question. In certain circumstances, system-time temporal functionality can be an appropriate data archiving mechanism, despite the fact that the word "archiving" is not part of the formal name of that DB2 feature.

Before proceeding further, I'll provide a brief description of DB2's system-time temporal support (DB2-managed arching is described in the blog entry to which I provided a hyperlink, above). System-time refers to one of two "flavors" of DB2 temporal data functionality (the other flavor, known as business-time temporal, provides a means whereby future data changes -- such as a price adjustment for a product or service -- can be reflected in a DB2 table's data without affecting programs that, by default, access data that is in effect, from a business perspective, now). When a DB2 table is enabled for system-time temporal support (either when it is created, or, for an existing table, via ALTER operations), it has a couple of extra columns, both of the timestamp variety (there is actually a third added timestamp column, called a "transaction-start ID" column, that appears to be largely unused at present but may get a role to play in a future release of DB2 for z/OS). These two columns indicate the time at which a row became current (by being inserted into the table, or by way of an update operation targeting an existing row), and the time at which the row became non-current (by being deleted, or changed via update -- rows that are "currently current" have a high value of midnight, December 31, 9999 in the "time the row became non-current" column).

A table enabled for system-time temporal functionality, in addition to having "row became current" and "row became non-current" timestamp columns, also has an associated table called a history table. The history table holds the "before" images of rows that were either deleted from the "base" table or were changed in the base table via update; so, all the rows in the history table are non-current, and all the rows in the base table are "currently current." If a row for product XYZ is inserted into a system-time temporal-enabled PRODUCTS table at 9:27 AM on January 22, 2016, that timestamp (which will in fact have picosecond precision) will go into the row's "became current" column (which is actually called the "row begin" column), and the "became non-current" column (officially, the "row end" column) will contain the high value for a timestamp (as previously mentioned, that's midnight on December 31, 9999). If the price of product XYZ is changed from $10 to $15 via an UPDATE at 3:30 PM on March 10, 2016, the row in the base table will show the $15 price, a "row begin" time of 3:30 PM on March 10, 2016 (the time of the UPDATE), and a "row end" time of the previously noted high-timestamp value. DB2 will place in the base table's history table the "before" image of the row (before the update, that is), with a price of $10 (the pre-UPDATE price), a "row begin" time of 9:27 AM on January 22, 2016 (the time the row was first inserted in the base table with the $10 price), and a "row end" time of 3:30 PM on March 10, 2016 (the time of the update that changed the price for product XYZ to $15).

Here's what's really cool about this system-time temporal thing: through extensions to SQL that accompanied the introduction of temporal functionality, I can (or a program can) ask DB2 this question: "What was the price of product XYZ on February 17, 2016?" DB2 will retrieve information from the row that was current as of the date specified, and provide the answer: $10. Importantly, that temporal query does not have to reference the history table associated with the system-time temporal-enabled base table -- it just references the PRODUCTS table (the base table in this example), and DB2 takes care of looking for the row, if needs be, in the history table that has been paired with the PRODUCTS table. Besides allowing data retrieval on a prior POINT in time basis, DB2 system-time temporal support allows data retrieval on a prior RANGE of time basis -- a program or user can ask DB2, "Show me any and all rows for product XYZ that were current for at least some time between December 1, 2015 and April 30, 2016." DB2 will retrieve that row or rows (in our example, it would be two rows for product XYZ -- one from the base table and one from the history table), with again no need for the history table to be referenced in the query. By examining the values in the "row begin" and "row end" columns of the retrieved rows, one can see how data for (in this example) a product changed, and when those changes were effected.

What system-time temporal support and DB2-managed archiving have in common is the notion of a "single logical table" that in fact consists of two physical tables (a base and a history table, in the case of system-time temporal, and for DB2-managed archiving a base and an archive table). In both cases, application programs do not have to reference the "associate" table. Data access references are to the base table, and DB2 takes care of pulling in data from the "associate" table, as needed.

Back now to data archiving. Various DB2 for z/OS-using organizations have seen system-time temporal support as a data archiving solution, and they are not wrong in looking at it that way. The choice between system-time temporal and DB2-managed archiving (and an either-or choice it is, because these capabilities cannot both be used with a single base table) comes down to the nature of data in a table that an organization wishes to retain over some long period of time (and long-term retention of historical data is my simple conception of archiving). It may be that data in a table, once inserted, is never updated, and that rows deleted are either discarded completely (thrown in "the bit bucket," if you will) or moved to offline media. In that case, if interest is only in rows that are "currently current" (i.e., still valid and true, even if quite old), and if rows have to be kept for a long time, and there is a desire to physically separate older from newer rows to boost access performance for the newer rows (the assumption here being that newer rows are the ones most frequently accessed), DB2-managed archiving offers a nice solution that combines optimal performance for newer-row retrieval with large-capacity historical data retention, without complicating SQL coding for queries that might need to access older as well as newer rows (thanks to the fact that, as noted, programs need only reference the base table, even if the associated archive table needs to be searched for data). Even if rows in the table are updated after having been inserted, DB2-managed archiving can be a good historical data retention solution if there is interest only in currently-valid rows -- if there is no interest in what updated rows "used to look like" before they were updated.

What if, on the other hand, there is an interest in what rows looked like at some prior point in time, even if the rows look different now because of updates, or even if the rows were deleted from the base table? If the need to retain large amounts of historical data includes a requirement -- or at least a desire -- to maintain accessibility to non-current (i.e., not in effect now), "prior versions" of rows made non-current through DELETEs and/or UPDATEs, DB2's system-time temporal functionality (sometimes referred to as "row versioning") could be just the ticket.

System-time temporal data support and DB2-managed archiving can both be thought of as data archiving solutions, even though only the latter feature has "archiving" in its name. The decision to use one or the other will depend on an organization's requirements for data in a given table, and one DB2-using company might well make use of both solutions for different tables (keeping in mind that a single table cannot be both archive-enabled and system-time temporal-enabled). Remember that system-time temporal functionality is about holding onto "what was true" data associated with a table, even as "what is true" is changed by way update and/or delete operations. DB2-managed archiving is the right tool for the job when the only data to be retained is of the "is true now" variety, and when a large quantity of historical data and programs' propensity to access fairly "new" data make physical separation of older and newer rows -- without complicating data access logic -- advantageous from a performance perspective.

In about a week or so, I'll post a companion piece to this entry, with some matters for DB2 DBAs to consider as they ponder physical database design options for history tables (used with system-time temporal-enabled base tables) and archive tables (used with archive-enabled base tables).

5 comments:

  1. Hi Robert,

    Is DB2-Managed Data Archiving Feature in DB2 v11 similar to Archive process in IBM InfoSphere Optim for z/OS?
    If not, can you please provide the differences between the two.

    Thanks,
    Bharath Nunepalli.

    ReplyDelete
  2. They are different, largely in terms of comprehensiveness and sophistication. The Optim Archive solution, for example, can archive referentially complete sets of data, not only from multiple different tables in a DB2 for z/OS database, but also, if applicable, from multiple different sources on multiple different platforms. It also supports a multi-tiered archive infrastructure. Optim Archive can automatically dispose of data when that is legally permissible (and an organization's legal departments is often very keen on that capability). Optim Archive supports a wide range of access mechanisms for archived data. Optim Archive provides data archive capabilities that can work in concert with application retirement procedures. Optim Archive provides a data compression capability that can achieve even higher compression ratios than DB2's data compression feature. Optim Archive has features that support regulatory compliance. Optim Archive archives metadata along with business data. Optim Archive provides an additional layer of data security with respect to access controls for archived data.

    In short, DB2-11 managed archiving is a nice feature, built into DB2 for z/OS, that is chiefly beneficial as a means of improving performance for access to "newer" rows in a table (when a table is very large and when newer rows are more frequently accessed than older rows), while not complicating application access to data. Optim Archive, on the other hand, is a very feature-rich and sophisticated, complete data archiving solution for a wide range of data requirements.

    More information about Optim Archive (including Optim Archive for z/OS), can be found online at http://www-03.ibm.com/software/products/en/infosphere-optim-archive.

    Robert

    ReplyDelete
    Replies
    1. Thanks for the response and details.


      Bharath Nunepalli.

      Delete
  3. Hi Robert,
    Do archive tables have to be universal? and what are the consequences if not?
    Thanks,
    Natalie E.

    ReplyDelete
    Replies
    1. I apologize for the long delay in responding - I somehow overlooked your comment until just now.

      No, archive tables do not have to be in universal table spaces. The consequences of an archive table not being in a universal table space? Similar to the consequences of ANY table not being in a universal table space: you would not be able to take advantage of DB2 features and functions that are dependent on a table space being of the universal variety. For example, if you wanted to hash-organize an archive table, that would not be possible if the table were not in a universal table space. You couldn't use partition-by-growth for an archive table not in a universal table space. You couldn't change the DSSIZE or page size of segment size for the table space via pending DDL (ALTER followed by online REORG) if it were not a universal table space. And there are other examples. I think that I would always prefer to put a new table - regardless of whether or not it is an archive table or an archive-enabled table - in a universal table space. That's the way of the future.

      Robert

      Delete