Now, having data is one thing, but to get maximum value from this resource, you have to use it -- to identify new market opportunities, to optimize product mix, to manage risk, to better understand customer needs and buying behaviors. Thus the importance of business analytics, which can be thought of as the means whereby information is made to yield insight. For quite a few years, the conventional wisdom held that business analytics was best done on distributed systems platforms, so organizations wishing to analyze data stored in DB2 for z/OS databases for decision support purposes routinely copied that data to databases running under Linux, UNIX, or Windows on x86- or RISC-based servers. The copied data would then be accessed by various business intelligence tools.
The argument for performing business analytics in this fashion was largely cost-based. People acknowledged that mainframe systems were great as data-of-record repositories -- the super-sized, super-reliable data vaults on which run-the-business applications were built -- but they assumed that System z was overkill, cost-wise, as a platform for query and reporting applications. Interestingly, as Linux-, UNIX-, and Windows-based analytics systems grew and proliferated within enterprises, a funny thing happened: the "inexpensive" alternative started to become notably more expensive, as costs mounted in several areas:
- Source data extract and change replication. Aside from the costs associated with extract and load processing to build an initial query/reporting database on a distributed systems platform, there is the cost of keeping the copied data up to date, as demanded by BI application users. At some sites, growth in the number of query/reporting databases resulted in a change data replication infrastructure that looked something like a spider web when viewed diagrammatically. This infrastructure could end up being quite resource-intensive from a support perspective.
- Environmental factors. Floor space, electricity, and data center cooling costs mount as x86- and RISC-based analytics servers multiply.
- Data security and integrity. As more and more copies of data sourced from a DB2 for z/OS database pop up in the enterprise, control and management of the data becomes increasingly problematic. Data copies can get out of sync with the source, and data security safeguards present on the mainframe system may be lacking on some query and reporting platforms.
Faced with the escalating costs of a "move the data to the query systems" approach, a growing number of companies have opted to go a different route: they are moving the queries to the data. In other words, enterprise data is analyzed where it lives: on the mainframe.
"Moving the queries to the data" is a beautifully simple concept that is grounded in advanced technology and open standards. It starts with the fact that the leading query and reporting tools -- from IBM and from other vendors -- absolutely can be used with DB2 for z/OS. If your preferred business analytics software utilizes JDBC or ODBC to access relational databases (very often the case), you're in good shape, as IBM provides DB2 drivers for those interfaces (and others, as well).
Even if you query mainframe data on the mainframe (rather than copying it to some other platform for analytics purposes), you might still decide to extract data from a source operational database into a query and reporting database that also resides on a System z server. Why? There are several potential reasons:
- "Data stability" during daytime query hours. Sometimes, users want to get the same result for a given query whether it is executed at 9 AM or at 4 PM on a given day -- in other words, they don't want the data "changing underneath them." In such cases, updates of data in the query and reporting database are processed on a nightly basis.
- Database design differences. Whereas the data in the source operational tables is likely organized in traditional third normal form, data to be queried via business intelligence tools might need to be arranged in a star schema.
- Data transformation requirements. Data values might need to be transformed in some way before being loaded or inserted into query and reporting tables (the sketch following this list shows a simple example of such a load).
- Workload isolation. Particularly in a non-data sharing DB2 for z/OS environment, directing analytics users to a set of tables separate from the production operational tables might be done to increase the degree of physical separation between the business intelligence workload and the "run the business" transactional and batch workload. The separate tables could be assigned to their own buffer pools to reduce contention for server memory resources (again, see the sketch following this list). They could even be placed in a different DB2 subsystem on another z/OS LPAR, perhaps on the same mainframe server, to add CPU isolation to that memory isolation (different processors on a System z server can be dedicated to different z/OS LPARs on the server).
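To make those last two items a bit more concrete, here is a minimal SQL sketch of what such a set-up could look like. Everything here is hypothetical -- the QRYDB database, the BP16 buffer pool choice, and the table and column names are illustrative, not a prescription:

```sql
-- Assign the query table's table space to its own buffer pool (BP16 here),
-- so the BI workload does not compete with operational work for the same
-- DB2 buffer resources.
CREATE TABLESPACE QRYTS IN QRYDB BUFFERPOOL BP16;

CREATE TABLE DWH.SALES_FACT
  (SALE_DATE   DATE          NOT NULL,
   STORE_ID    INTEGER       NOT NULL,
   PRODUCT_ID  INTEGER       NOT NULL,
   REVENUE     DECIMAL(11,2) NOT NULL)
  IN QRYDB.QRYTS;

-- A nightly refresh could transform and copy the prior day's operational
-- rows. Because source and target are both DB2 for z/OS tables, this is a
-- single INSERT ... SELECT, not a cross-platform extract-and-load job.
INSERT INTO DWH.SALES_FACT
  SELECT O.ORDER_DATE,
         O.STORE_ID,
         O.PRODUCT_ID,
         O.QUANTITY * O.UNIT_PRICE    -- transformation: derive revenue
  FROM PROD.ORDER_LINE O
  WHERE O.ORDER_DATE = CURRENT DATE - 1 DAY;
```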
Whatever the reason for copying operational DB2 for z/OS data into query and reporting tables, that task is made simpler and more efficient when the source and target tables are both managed by DB2 for z/OS.
In some cases, organizations point query and reporting tools directly at DB2 for z/OS operational tables -- in other words, data copying is not part of the picture. Is that really feasible? Yes, it is, thanks largely to the highly advanced workload management capabilities of z/OS, an operating system that for decades has handled highly differentiated workloads (e.g., transactional and batch) concurrently in a single system image. Having analytics users query the same tables as operational applications can be a particularly attractive option when DB2 for z/OS is running in data sharing mode on a Parallel Sysplex cluster. In that setting, a location alias could be used to restrict business analytics users to (for example) two members of a six-member data sharing group, with the operational applications directed to the other four members of the group -- by way of another location alias for DRDA requesters, via the subgroup attach feature introduced with DB2 10 for batch jobs, and through the placement of CICS regions for a CICS-DB2 workload. With this type of set-up, the operational and analytics applications could access the same DB2 tables with no contention for memory or CPU resources (assuming that the DB2 members used for the query and reporting workload run on LPARs in the Parallel Sysplex separate from the LPARs hosting the members that handle the operational applications).
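For those who like to see the mechanics, here is a hedged sketch of how such a location alias might be defined using the -MODIFY DDF command that came along with DB2 10 (the alias name DWPROD and port 5447 are made up for illustration). The commands would be issued on each of the two members set aside for the analytics workload; BI users' DRDA requesters would then connect to the DWPROD alias rather than to the group's location name:

```
-MODIFY DDF ALIAS(DWPROD) ADD
-MODIFY DDF ALIAS(DWPROD) PORT(5447)
-MODIFY DDF ALIAS(DWPROD) START
```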
Here's another interesting configuration option for mainframe business analytics: when BI queries involve particularly large scans of data and/or are particularly complex, dramatic reductions in query elapsed times can be achieved through the use of an IBM DB2 Analytics Accelerator. These servers extend the capabilities of a DB2 for z/OS system running a business intelligence workload, delivering "shockingly fast" performance for formerly long-running queries (borrowing a phrase used by an IT manager friend of mine at a site that recently started using DB2 Analytics Accelerator technology), while requiring nothing in the way of changes on the client side of the BI application: users continue to direct their queries to the same DB2 location, and the DB2 optimizer determines whether a given query should be processed locally (i.e., on the DB2 for z/OS front end) or routed to the Analytics Accelerator. This is way cool technology about which I'll write more in a subsequent entry.
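In the meantime, a small illustration of that routing point: query acceleration behavior is governed by the QUERY ACCELERATION setting. A system-level default can be established via the QUERY_ACCELERATION ZPARM (which is why client code need not change), and the behavior can also be set per session through a special register, as sketched below (the SALES_FACT table is hypothetical):

```sql
-- ENABLE tells DB2 to route eligible queries to an attached accelerator;
-- the optimizer still decides whether a given query qualifies.
SET CURRENT QUERY ACCELERATION = ENABLE;

-- A large-scan aggregation like this one is the kind of query that could
-- be sent to the Accelerator rather than run on the DB2 front end.
SELECT PRODUCT_ID, SUM(REVENUE) AS TOTAL_REVENUE
FROM DWH.SALES_FACT
GROUP BY PRODUCT_ID;
```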
Topping it all off are the functionality enhancements, delivered in each succeeding release of DB2 for z/OS, that are especially useful in a business analytics context. Just in the last couple of releases (Versions 9 and 10), we've gotten OLAP specifications such as rank, dense rank, and moving aggregates; index-on-expression, which can make predicates that reference expressions on columns (scalar functions, for example) indexable; temporal data support, which enables one to add a time dimension to data in a table; the access path repository, a set of new DB2 10 catalog tables (including SYSQUERYOPTS) that enables, among other things, DB2 server-side control of parallelization for dynamic queries at the statement level; a slew of new built-in functions (among them LPAD, MONTHS_BETWEEN, OVERLAY, and TRIM); new data types such as BIGINT, DECFLOAT, and XML; and real-time scoring, which allows newly inserted or updated database values to be evaluated against an SPSS predictive model as part of a transaction's flow, with virtually no impact on transaction elapsed time. And there's more to come with DB2 11 for z/OS, now in beta test.
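To make a couple of those items concrete, here is a small sketch (table and column names are, again, hypothetical) of the OLAP specifications and of index-on-expression in action:

```sql
-- Rank each month's revenue within its region, and compute a three-month
-- moving average of revenue by region (rank arrived with DB2 9, moving
-- aggregates with DB2 10).
SELECT REGION,
       SALE_MONTH,
       REVENUE,
       RANK() OVER (PARTITION BY REGION
                    ORDER BY REVENUE DESC) AS REGION_RANK,
       AVG(REVENUE) OVER (PARTITION BY REGION
                          ORDER BY SALE_MONTH
                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
         AS MOVING_AVG_3M
FROM DWH.MONTHLY_SALES;

-- An index on an expression (a DB2 9 enhancement) can make a predicate
-- such as WHERE YEAR(SALE_DATE) = 2012 indexable.
CREATE INDEX DWH.SALES_YEAR_IX
  ON DWH.SALES_FACT (YEAR(SALE_DATE));
```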
Is DB2 for z/OS the source DBMS for much of your company's "of record" data? Are you copying this data out to various other platforms for query and reporting purposes? Have you stopped to consider whether moving the queries to the data -- i.e., running your business analytics queries on the source DB2 for z/OS system -- might be a less complicated, more efficient, more secure, and more cost-effective approach? Leveraging DB2 for z/OS and System z as a platform for business analytics has been a winning strategy for many organizations. It might be for yours, as well.