Robert's Db2 blog: A Great Analytics on System z Use Case: The Operational Data Store

A few months ago, I posted an entry to this blog on the subject of "moving queries to data," the idea being that when data to be analyzed for decision support purposes originates in a DB2 for z/OS database (as a great deal of the world's corporate and governmental data does), a very good case can be made for analyzing that data on its platform of origin -- System z -- as opposed to copying and moving the data to a different platform for querying and reporting. Quite recently I've had some communications with various individuals on this very topic, and these exchanges have led to my highlighting a particularly good use case for DB2 and System z as a data server for business intelligence applications: the operational data store, or ODS. The ODS is not a new concept, but I've found that it's not universally understood and leveraged. In this blog entry, I'll provide some information that I hope will clarify what an ODS is, for those not very familiar with the idea, and show why DB2 for z/OS can be an outstanding environment for an ODS, particularly when the operational data in question "lives" in a DB2 for z/OS system to begin with.

First, I think that it can be useful to contrast an ODS with a data warehouse. Keep in mind that having an ODS versus a data warehouse is not an either/or proposition -- an organization can make effective use of both. Speaking, then, in pretty broad-brush terms, here are some of the things that tend to differentiate these two types of analytics data stores (and when I use the term "operational database" hereafter, I'm referring to a source database that is accessed by operational, "run the business" online and/or batch applications):

Whereas the design of a data warehouse database will usually differ from the design of the operational database(s) from which the data warehouse data is sourced (for example, the data warehouse might feature so-called fact and dimension tables arranged in what's known as a star schema, to facilitate certain types of online analytical processing), an ODS is very often an exact duplicate of an operational database (or at least a duplicate of a subset of the tables in the operational database) -- "duplicate" referring to table and column names, and table structure (i.e., column order and column data types). In some cases an ODS will differ from the associated operational database in terms of data retention (a company might decide to keep data for X amount of time in the operational database, and for a longer period of time in an ODS, for historical reporting purposes -- an arrangement that can work especially well if the older data is rarely, if ever, updated).
Whereas a data warehouse is often sourced from several databases (which may be housed on different server platforms), an ODS commonly has one associated source database.
Whereas data in a data warehouse is sometimes aggregated in some way, data in an ODS is typically "atomic-level" with regard to detail.
Whereas there is often some tolerance for delay with respect to source data updates being reflected in a data warehouse (data extract/transform/load, aka ETL, processes that run on a nightly basis are not unusual for a data warehouse), users of an ODS often expect and may demand extremely low latency in this regard -- the requirement could even be for near-real-time ODS updates with respect to the updating of corresponding records in the source operational database.
Whereas data in a data warehouse is often transformed in some way, to accomplish things such as making data fields (which might be code values in an operational source table) more user-friendly, data is typically NOT transformed when added to an ODS.

Given these differences, in the case of an ODS you will regularly find that instead of ETL, you simply have EL (extract/load, from the source database to the ODS), and that EL process may involve near-real-time replication. That being the case, when the source operational database is managed by DB2 for z/OS, it can make all kinds of sense to use DB2 for z/OS as well for the ODS. Different organizations do exactly that, in different ways. A company might have an ODS in the same DB2 for z/OS subsystem that is used for the operational database, with the ODS tables being in a different schema (that is to say, the tables in the ODS would have different high-level qualifiers versus the tables in the operational database). More commonly (in my experience), the ODS will be set up in a different DB2 subsystem. The DB2 subsystem used for the ODS could be in the same z/OS LPAR as the DB2 subsystem used for the operational database, but I more frequently find that it's in a different z/OS LPAR.

When an ODS sourced from a DB2 for z/OS database is itself managed by DB2 for z/OS, very low-latency replication of source data changes to the ODS is facilitated, as is management of the ODS environment itself (the DBA team that takes care of the DB2 for z/OS operational database will generally have a pretty easy time managing an ODS database that essentially mirrors the source database). This ease of management extends to matters related to data security -- the same DB2 for z/OS and RACF (or equivalent) security controls used to lock down data in the operational database can be leveraged to protect ODS data from unauthorized access. Another factor that contributes to the goodness of fit of DB2 for z/OS with an ODS is the nature of the typical ODS workload: some longer-running, complex and/or data-intensive queries, yes, but also a relatively high volume of quick-running, in-and-out "transactional queries" -- users wanting to see, immediately, the information pertaining to this particular customer, or that particular transaction, in the course of their data analytics work. DB2 for z/OS and System z have long excelled at handling such mixed workloads, while some other platforms used for analytics work can bog down in processing a large number of concurrently executing queries. If you want to make your DB2 for z/OS system even more highly capable with regard to "operational analytics" workloads, take a look at the DB2 Analytics Accelerator for z/OS -- technology that can preserve excellent performance for high-volume, transactional queries while dramatically speeding up the execution of more complex and data-intensive queries.

Here's another thought: when the idea of minimizing inter-platform data movement is taken to its extreme, you don't move the data at all -- you allow analytics users to query the actual operational tables in the production DB2 for z/OS database. This approach, while not commonplace, is utilized in some cases, and successfully. It is certainly technically feasible, and more so on the z/OS platform than others, thanks to the highly advanced workload management capabilities of z/OS. One interesting option in this area is available to organizations that run DB2 for z/OS in data sharing mode on a Parallel Sysplex: you route the analytics queries through a subset of the data sharing group members, and route the operational transactions and batch jobs through the other members of the group. All the SQL statements hit the same DB2 tables, but because the analytics and operational SQL statements run in different DB2 subsystems (which can be in different z/OS LPARs on different System z servers in the Sysplex), the analytics and operational workloads don't compete with each other for server memory or processing resources. I have some personal experience with such a set-up, and I wrote a few blog entries that provide related information: a part 1 and part 2 entry on running OLTP and business intelligence workloads on the same DB2 for z/OS system (written while I was working as an independent DB2 consultant), and an entry that describes technology that can be used to limit different workloads to different subsets of the members of a DB2 data sharing group.

Does your organization use DB2 for z/OS to manage and protect your most valuable data: the data generated by your run-the-business operational applications -- the data that you own? If so, do you have a DB2 for z/OS-based ODS that provides users with secure, high-performance access to a current version of that data with atomic-level detail? If you don't have such an ODS, consider how a data store of this nature, managed by DB2 for z/OS -- the same DBMS in which your "gold" data is housed -- could provide users throughout your business with a resource that would enhance decision making effectiveness and improve outcomes. Lots of organizations have taken this path. It may be one that your company should take, as well.

Robert's Db2 blog

Friday, November 29, 2013

A Great Analytics on System z Use Case: The Operational Data Store

No comments:

Post a Comment