Friday, October 29, 2010

DB2 Dispatch from Vegas: IOD, Day Four

I only got to a couple of sessions on this, the last day of IBM's 2010 Information on Demand conference (held in Las Vegas), but they were good ones. Below are some of the highlights of each.

Back to the future: early experiences with temporal data processing in DB2 10 for z/OS. In this session, Shawn Gravelle, IT architect at State Farm (a large insurance company), talked about State Farm's work in testing the new temporal data support delivered by DB2 10. This feature enables DB2 to automatically manage data based on system time (basically, maintaining prior versions of a table's rows as well as the current version), and to facilitate the management of data with a business time perspective (referring to a time period during which data in a row is in effect -- a period that could be in the future). System time and business time can be combined for a table to provide bitemporal data management capabilities.
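To make this a bit more concrete, here is a minimal sketch of what a bitemporal table definition might look like in DB2 10 for z/OS. The table and column names are made up for illustration, and the statements reflect my understanding of the new syntax -- check the DB2 10 SQL Reference before lifting any of it.

   -- Hypothetical policy table with both a BUSINESS_TIME and a SYSTEM_TIME period
   CREATE TABLE POLICY
     (POLICY_ID  INTEGER       NOT NULL,
      PREMIUM    DECIMAL(10,2) NOT NULL,
      BUS_START  DATE          NOT NULL,
      BUS_END    DATE          NOT NULL,
      SYS_START  TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
      SYS_END    TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
      TRANS_ID   TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
      PERIOD BUSINESS_TIME (BUS_START, BUS_END),
      PERIOD SYSTEM_TIME (SYS_START, SYS_END),
      PRIMARY KEY (POLICY_ID, BUSINESS_TIME WITHOUT OVERLAPS));

   -- A separate history table with a matching column layout (minus the
   -- GENERATED attributes) holds prior row versions for system time
   CREATE TABLE POLICY_HIST
     (POLICY_ID  INTEGER       NOT NULL,
      PREMIUM    DECIMAL(10,2) NOT NULL,
      BUS_START  DATE          NOT NULL,
      BUS_END    DATE          NOT NULL,
      SYS_START  TIMESTAMP(12) NOT NULL,
      SYS_END    TIMESTAMP(12) NOT NULL,
      TRANS_ID   TIMESTAMP(12));

   -- Turning on system-time versioning links the base table to its history table
   ALTER TABLE POLICY
     ADD VERSIONING USE HISTORY TABLE POLICY_HIST;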

Among the items of information provided by Shawn during his session:
  • State Farm has a LOT of data in their system. As in, more than 5000 terabytes on SAN storage. As in, more data than you'll find in the Library of Congress (for readers outside the US: that's a VERY big library).
  • Temporal data support is largely about productivity. It doesn't let you do anything you couldn't do before. Instead, it allows you to maintain row history and to account for business dates (again, that's "in effect as of such-and-such a date" information) with a lot less SQL -- in application code and/or triggers and/or stored procedures -- than before. That means that functionality can be implemented more quickly than before -- and that's good for business. [Note that temporal data support simplifies SQL used for data retrieval, as well as SQL used to change data values -- there's a sketch of some example statements after this list.]
  • Temporal data support can also be a performance-booster. Shawn mentioned that some processes ran quite a bit more quickly -- in one case, twice as fast -- when DB2 10 temporal data capabilities were used versus having the same functions handled by user code (implemented in the form of triggers and such).
  • Implementing system-time support is easy. That's true whether the table in question is new or an existing object.
  • Implementing business-time support can be a more challenging endeavor. It's a more complicated concept. As mentioned above, a row could be inserted into a table with an "in effect" business date that's sometime in the future. The row is in one sense current (it's the most recent version, from a system perspective), but in a business sense, it isn't (in other words, the information in the row is not yet active with respect to a policy or a promotion or whatever it represents). Business time can have implications for logical and physical database design, primary keys, and the organization's data archive strategy. With all this said, business time is a powerful capability -- you just want to put plenty of thought into using it. Another point: business time can be implemented for existing tables, but it can be easier to work it into a new application database design.
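To give a feel for the SQL simplification Shawn talked about, below is a sketch of the kinds of statements DB2 10 supports against a bitemporal table. I'm reusing the hypothetical POLICY table from the earlier example, so treat the syntax as illustrative rather than definitive.

   -- Row version that was current, from the system's perspective, on a past date
   SELECT POLICY_ID, PREMIUM
     FROM POLICY
     FOR SYSTEM_TIME AS OF '2010-06-30-00.00.00'
     WHERE POLICY_ID = 12345;

   -- Premium that will be in effect, in the business sense, on a future date
   SELECT POLICY_ID, PREMIUM
     FROM POLICY
     FOR BUSINESS_TIME AS OF '2011-01-01'
     WHERE POLICY_ID = 12345;

   -- Change the premium for just part of a business-time period; DB2 splits
   -- rows as needed -- work that previously had to be hand-coded
   UPDATE POLICY
     FOR PORTION OF BUSINESS_TIME FROM '2011-01-01' TO '2011-07-01'
     SET PREMIUM = 150.00
     WHERE POLICY_ID = 12345;

Without the temporal support, each of these operations would require explicit date-range predicates (and, for the update, insert/update logic to split rows) coded in the application, in triggers, or in stored procedures.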

The IBM Smart Analytics Optimizer query engine. This session was delivered by IBM's Vijayshankar Raman. The Smart Analytics Optimizer (SAO) is, as Vijay put it, "a network-attached accelerator for DB2 for z/OS." What does it accelerate? Queries -- particularly OLAP-type queries (referring to online analytical processing) that involve aggregation and maybe roll-ups, and which scan a lot of data. It attaches to a mainframe system running DB2 for z/OS. A DB2 DBA defines a mart (a portion of the larger DB2 database), tables in the mart are copied to the SAO, and the accelerator does the rest.
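To give a sense of the target workload, here's a made-up example of the kind of query the SAO is aimed at: a scan of a large fact table joined to comparatively small dimension tables, with grouping and aggregation. The table and column names are hypothetical.

   -- A typical star-schema aggregation query; the SAO would scan the (encoded)
   -- fact table data in memory, across blades, rather than relying on indexes
   SELECT d.SALES_REGION,
          t.SALES_QUARTER,
          SUM(f.SALES_AMOUNT) AS TOTAL_SALES
     FROM SALES_FACT f
          JOIN REGION_DIM d ON f.REGION_ID = d.REGION_ID
          JOIN TIME_DIM   t ON f.TIME_ID   = t.TIME_ID
     WHERE t.SALES_YEAR = 2010
     GROUP BY d.SALES_REGION, t.SALES_QUARTER;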

Some of the points made by Vijay during his session:
  • The SAO can speed the execution of many queries by 10 to 100 times. This thing really screams. One of the keys to the SAO's ability to majorly accelerate "big query" execution is the fact that it operates on compressed data. I'm not talking about compressed as in DB2 compressed tablespaces -- rows from tables in such tablespaces are decompressed for query processing. Vijay noted that in the SAO, "most operations are performed on encoded [i.e., compressed] values." And the degree of compression achieved is enormous -- generally in the range of 8 to 40 times with respect to space reduction.
  • The SAO delivers consistent response times for complex, large-scale queries. Most queries directed to the SAO (and that routing happens automatically) will complete in 10 to 20 seconds.
  • The SAO is disruptive technology that is not disruptive to the DB2 query environment. In putting the SAO to use, there will often be no need to change SQL or applications currently running on the mainframe DB2 system.
  • Lots of engines, lots of memory. The SAO hardware is a set of blades installed in IBM BladeCenter chassis. There can be up to 56 blades in one SAO, and each blade has 48 GB of main memory. In-memory processing (the SAO is a "cache-aware" system) is another part of the SAO performance picture.
  • The star schema angle: Generally speaking, the tables in the data mart copied to the SAO will be arranged in a so-called star schema. If the dimension tables in a star schema are particularly large relative to the associated fact table, query acceleration may be a little less dramatic than it would be for a star schema with dimension tables that are relatively small compared to the fact table.
  • Not all queries are supported. In many cases, well over 50% of the queries targeting the tables in a mart copied to the SAO will be supported for execution on the SAO (others would be executed by DB2 on the mainframe -- recall that the mart is copied to the SAO, as opposed to being relocated to the SAO). IBM has a tool that can be run on a mainframe DB2 system to estimate the percentage of existing queries that could execute on an SAO.
  • The SAO does its thing without the need for a "performance layer" over the tables in the mart. By "performance layer," Vijay meant indexes and materialized query tables. The SAO just scans data in tables, but it does so very smartly (using techniques such as partition elimination and simultaneous application of all of a query's predicates), and in a highly parallel way that leverages in-memory processing. The SAO will rewrite a join, by the way, as multiple scans.

I'm certainly hoping to get the chance to work with a Smart Analytics Optimizer myself in the near future. And, I'm looking forward to next year's Information on Demand conference. I hope that a lot of you will be able to make it to that event.
