Wednesday, October 27, 2010

DB2 Dispatch from Vegas: IOD, Day Three

Today was a little different for me versus days one and two of IBM's Information on Demand conference in Las Vegas. I attended only one session -- the one in which I presented. Aside from that, my time was spent in meetings and in the Expo Hall. Tomorrow (the last day of the conference) it'll be back to regular session attendance for me.

This entry will be a little on the brief side. I'll provide some of the key points I made in my session, and I'll share information from a meeting I had that may be of interest to people. The overall theme of today's activity for me was business intelligence.

Mainframe DB2 data warehousing. I started my session by going over five factors that I believe are behind the increased incidence of organizations running data warehouses on a DB2 for z/OS platform:
  1. Business intelligence applications have become mission-critical. Companies rely more and more on their data warehouse systems to fuel their decision making. These systems have to be highly available, and System z and z/OS set the standard for up time (the mainframe availability story is more impressive still when several systems are configured in a DB2 data sharing group on a parallel sysplex cluster.
  2. "Operational" BI is on the rise. Here I'm referring to a data warehouse workload that is a mix of complex, long-running queries and simpler data requests that users expect to complete in a few seconds. Effectively managing performance for such a mixed workload is a challenge on some platforms, but not on the mainframe. The workload management capabilities of the z/OS operating system are second to none.
  3. Data warehouses are being updated more frequently. The old pattern of query by day, ETL (extract/transform/load) by night no longer holds true at a number of sites. To an increasing extent, data warehouse users want source data changes to be more quickly reflected in the data warehouse database (sometimes the requirement is for data warehouse updates to be made in close to real time with respect to source data changes). When more-frequent updating is required, it can make sense to locate the data warehouse closer to the primary source system, and very often that primary source system is a mainframe.
  4. Highly robust data access controls are a must. Data warehouses generally contain a lot of very sensitive data, and protecting this data from unauthorized access is an imperative. Take the security features of z/OS, the advanced capabilities of the RACF security management subsystem (or an equivalent third-party product), and DB2's own controls (made more comprehensive in DB2 9 with roles and trusted contexts, and with the new authority levels introduced with DB2 10), and put them together, and you have a platform that stands out from the crowd in terms of its data lock-down capabilities.
  5. Mainframe people and processes complete the picture. Generally speaking, at sites around the world, the people who support mainframe systems are a highly experienced, "been there / done that" crowd. The processes they've developed over the years around change management, capacity planning, performance monitoring and tuning, and business continuity are rock-solid. Companies trust their mainframe support staffs with their most critical applications and databases, and that's driving BI work to the System z platform.

I then proceeded to a discussion of threes, covering my three favorite features of DB2 9 for z/OS (the release that most sites are running) in the areas of data warehouse performance, data warehouse cost efficiency, and data warehouse functionality. The performance list is as follows:
  1. Global query optimization. This refers to DB2's ability to do some major transformation, under the covers, of queries that contain subquery predicates. To boost a query's performance, DB2 might change a correlated subquery to a non-correlated subquery (or vice versa). DB2 can also treat the result set of a subquery as a virtual table and join it to an "actual" table in the query's FROM list. The result: potentially large performance improvements with no need to alter the query's SQL code.
  2. Indexes on column expressions. It used to be that a predicate containing a column expression such as WHERE UPPER(LAST_NAME) = 'JOHNSON' or WHERE SALARY + BONUS < 100000 were non-indexable. DB2 9 enables the creation of indexes on such expressions, resulting in sometimes-huge reductions in query run times.
  3. Smarter sequential prefetch. Much of the prefetch activity in a DB2 9 environment shifts from "pure" sequential prefetch (now used only for tablespace scans) to dynamic sequential prefetch. Dynamic prefetch can be used for both forward and backward scans (pure sequential prefetch is forward-only), and it tends to make better use of buffer pool resources. With more dynamic prefetch, queries can be expected to run in less time and to consume less in the way of CPU resources.

My favorite DB2 9 features for data warehouse cost-efficiency are:
  1. Universal tablespaces. Compared to the space map pages in a traditional partitioned tablespace, the space map pages in a universal tablespace (UTS) provide more information about "holes" in data pages that can accommodate to-be-inserted rows that are of varying length (and rows in compressed tablespaces are varying-length, regardless of whether or not they contain any VARCHAR columns) and which need to be inserted "in the middle" of the target table (versus at the end) to maintain clustering sequence. Because of this additional information about space in data pages, insert processes can execute with reduced CPU cost when populating tables in universal tablespaces.
  2. Index compression. At some sites, indexes take up more disk space than tables. DB2 9 index compression can reduce disk space requirements for an index by 50-70%, with relatively little cost in the way of CPU overhead (often in the low single digits of percent).
  3. Unused index detection. Dynamic SQL dominates in most DB2 for z/OS data warehouse systems, and it used to be very difficult to identify indexes that are not being used to improve the performance of any queries. DB2 9 introduced a new column, LASTUSED, in the real-time statistics table SYSIBM.SYSINDEXSPACESTATS. This column shows the last time that an index was used by a SELECT or FETCH statement, or by an UPDATE or a DELETE statement with predicates, or to check on a referential integrity constraint. Indexes that have not been used for any such purpose will have the default value or 1/1/0001 in the LASTUSED column. These are indexes that can probably be deleted to reduce disk space consumption and lower CPU costs (unneeded indexes drive up the cost of data change operations).

And, my top-three favorite DB2 9 data warehouse functionality features:
  1. INSTEAD OF triggers. Many views, such as those on multi-table joins, are read-only. INSTEAD OF triggers can be used to enable updates through such views, allowing developers to reference the same object for both data retrieval and data change operations.
  2. INTERSECT and EXCEPT. These two new set operators (added to the UNION operator, which has been around for a long time) make it much easier to get result sets based on the comparison of two different query result sets. INTERSECT instructs DB2 to retrieve rows from result set A that also appear in result set B. EXCEPT can be used to get rows from result set A that do not appear in result set B.
  3. RANK, DENSE_RANK, and ROW_NUMBER. These so-called OLAP functions make it easy to assign numeric values to result set rows based on user-specified ordering criteria (which could be quite different from the ORDER BY sequence specified at the "end" of a query). Consider a set of rows to be numbered based on descending unit sales amounts. RANK could have gaps in the number values if there are "ties" (if row one has a unit sales figure of 1000 and both rows two and three have a unit sales figure of 900, rows two and three will both have a RANK or 2 and row four, with a unit sales figure of 800, will have a RANK of 4). DENSE_RANK leaves no gaps in the value of assigned numbers (given the example in the previous parenthetical phrase, rows two and three would have a DENSE_RANK value of 2 and row four would have a DENSE_RANK value of 3). ROW_NUMBER assigns numbers in sequence with no gaps and no repeats (so rows two and three as previously described with get ROW_NUMBER values of  2 and 3, respectively, even though they both have the same unit sales value of 900).

I concluded my presentation with some performance monitoring and tuning hints and tips:
  • Make DB2 buffer pools big if you have a server with a lot of memory. This may sound like a statement of the obvious, but a lot of folks have pretty small buffer pools in spite of having DB2 running on a mainframe server with tens of gigabytes of memory. I blogged on buffer pool sizing a few weeks ago.
  • Page-fix the buffers in heavily used buffer pools. This move can deliver some nice CPU savings. I blogged on this topic earlier this year.
  • Take advantage of tools for query performance analysis. I'm a big fan of IBM's free and downloadable Optimization Service Center for DB2 for z/OS.
  • Try executing a long-running, complex query in "chunks" to isolate a performance problem. Suppose that a query with a number of local (i.e., non-join) subquery predicates runs in 30 minutes. Try executing the query minus one of these subquery predicates. Then try it with that predicate back in the picture but without another of the subquery predicates. Keep doing this to see if the query's run time drops significantly after one of the subquery predicate has been removed. If that happens, focus your tuning efforts on that subquery predicate.

The strength of the System z platform, the convenience and simplicity of a BI "appliance." The key meeting on my schedule today involved a discussion of IBM's Smart Analytics System 9600. Demand is growing for preconfigured, optimized, full-function systems that accelerate data warehouse implementations by providing an end-to-end solution "out of the box." The Smart Analytics System 9600 brings this popular approach to the super-scalable, super-secure, super-highly-available System z platform. Everything you need to get productive with business intelligence -- the mainframe server, the z/OS operating system, DB2 for z/OS, InfoSphere Warehouse for DB2 for z/OS (providing, among other things, cubing services), Cognos BI software for advanced analytics, and implementation services -- comes with the System 9600, all at a price that is substantially lower than what you'd pay to get the components separately. Check it out -- it could be a good fit for your organization.

No comments:

Post a Comment