Tuesday, September 25, 2012

Linux on System z: a Better Place for Your DB2 for z/OS Client-Server Applications

In DB2 for z/OS circles, there has been an evolution of attitudes concerning Linux on System z. Back around 2003, when I was on the user side of the DB2 community, I was part of a team charged with developing a plan for the simplification and streamlining of my company's IT infrastructure. One of the mainframers on this team had a suggestion: why not add a few IFL engines (more on this to come) to our System z servers and use those to consolidate a lot of our Linux systems (which were running primarily on x86 boxes)? I'll confess that this suggestion seemed a little weird to me, a little "out there" -- Linux on a mainframe? Really? The person who was advocating running Linux on System z made some good points, but the idea didn't get far and was set aside. Linux was kind of new to us at the time, and while we were ready to increase the use of this OS in our environment (and we did), we weren't ready to even seriously consider running it on mainframes.

Fast forward to 2010. A few months after rejoining IBM (I'd previously worked for Big Blue from 1982 to 2000), I was working with people at an insurance company where Linux on System z was being put through a trial run. What they were willing to run under Linux on System z was a DB2 Connect gateway, not a more high-profile workload such as an application server or a business analytics tool -- an indication of the iffiness of the Linux-on-z solution as seen by the firm's IT leaders.

Jump now to the present, and it seems to me that a corner has been turned. Over the past couple of years especially, the thinking of many IT people -- both mainframers and Linux specialists -- towards Linux on System has gone from cautious consideration to genuine enthusiasm. All kinds of folks have recently come to the realization that Linux on a mainframe is not some kind of sideways, six-of-one/half-dozen-of-the-other move, in terms of where Linux runs in the enterprise; rather, it's a move that upgrades the IT infrastructure in multiple ways, delivering enhanced performance, cost efficiency, availability, and security. A big chunk of installed System z MIPS, right now, is IFL capacity, and that slice of the pie is growing at a rapid clip (IFLs -- short for Integrated Facility for Linux -- are System z processors that are dedicated to Linux workloads). If I were you, I'd be looking to ride that wave.

In terms of factors that are driving the upsurge in Linux on System z usage, two really stand out: server consolidation and front-ending DB2 for z/OS. That first factor applies to Linux systems that may or may not have any connection to DB2 for z/OS subsystems. A wide variety of Linux workloads can be more cost-effectively run on mainframes versus distributed systems servers. Want to know what one of the big money savers is here? Think software licensing costs: the price of commercial software that runs under Linux is usually based on the number of processors on the system (or the partition) on which the OS is running, and plenty of organizations have found that moving Linux workloads to mainframes results in a decreases in the number of engines needed for those workloads, and that allows for a reduction in licensing costs for vendor-supplied software running in the Linux images.

That reduced-cost advantage also comes into play with regard to the second Linux-on-z driver, DB2 for z/OS front-ending. So do advantages in the areas of performance and security. When a Linux workload -- perhaps an application server running Java programs, or a vendor-supplied ERP or CRM package, or a business analytics tool -- is the client to a DB2 for z/OS server (i.e., if the Linux-based application is a DB2 DRDA requester), that back-and-forth network traffic can flow via HiperSockets. That's memory-to-memory data transfer, folks, and it screams. This "internal LAN" set-up also offers a security benefit, as less of your network is physically exposed to those who might want hack it (as Jim Elliott, one of IBM's prominent Linux-on-z experts, likes to put it, "The best LAN is one with no wires").

At this point, I would be remiss if I didn't bring z/VM into the picture. Can you run Linux on System z without z/VM? Absolutely, but z/VM delivers tremendous virtualization, management, availability, and security capabilities for your Linux-on-z environment. On the security front, RACF for z/VM brings the same peerless validation and auditing features that have made RACF for z/OS famous (note, too, that the cryptographic hardware on System z servers can be exploited for Linux workloads). High availability can be taken higher via the live guest relocation capability of a clustered z/VM configuration, which enables you to move a running Linux image from one z/VM LPAR to another.

Back to dollar (or your local currency) savings: I haven't yet given you the whole picture with regard to the cost-efficiency advantages of Linux on System z. Those IFLs I mentioned are not only great performers, allowing you to do more Linux work with fewer engines -- they are also very attractively priced (and of course they have no impact on the licensing costs for z/OS software running on the same System z server). Want to get even more bang for your bucks? Check out the IBM System z Solution Edition for Enterprise Linux, an offering for existing mainframe systems that combines IFLs, memory, z/VM, and maintenance in a package that's a terrific deal.

Are you wondering who in your shop would administer Linux systems on a mainframe? That's easy: your current Linux system administrators would do the job just fine. During a Linux on System z proof of concept on which I worked a couple of years ago, I recall that a Linux pro in the client's IT department said something to this effect: "It's just Linux." Bingo -- and it's available in distributions from SUSE and Red Hat. What if your organization decides to bring in z/VM with Linux for System z (good move)? Who will take care of that OS, if it's new to your company? Answer: your z/OS systems programmers. First, there's a good chance that one or more of them have some VM experience (many mainframers do). Second, VM has been significantly simplified in recent years (for a "simple" install of z/VM, the instructions take up all of one page).

I'll close by reiterating the fact that Linux on System z is not some Johnny-come-lately solution. It was announced in December of 1999, and it's been regularly enhanced since then. Yes, it took a while for this System z configuration option to catch fire, but it is certainly hot now. If you have Linux systems that connect to DB2 for z/OS, running those Linux images on the same mainframe where the DB2 for z/OS data server runs can take the core of your IT infrastructure to new levels of efficiency, performance, scalability, security, and manageability. Get smart, and check it out.

Monday, September 17, 2012

DB2 10 for z/OS: New Options for Trimming and Upgrading Indexes

First, some explanation as to what I mean by "trimming and upgrading" indexes. My use of "trimming" refers to reducing the number of indexes defined on tables, through elimination of indexes that aren't providing value. Having fewer indexes on a table is good for cost efficiency, as every index increases the CPU cost of every insert and delete operation targeting the table (and every update of an indexed column), increases the cost of execution of several DB2 utilities (including LOAD, REORG, and RUNSTATS), and drives up disk space utilization (though that cost can be mitigated through the index compression capability introduced with DB2 9 for z/OS, about which I blogged a couple of years ago while working as an independent DB2 consultant). That's clear enough, but you may wonder why an index serving no useful purpose got created in your environment in the first place. Perhaps an index created in anticipation that it would be needed for good application performance turned out to be unnecessary for that purpose. Another possibility is that an index that had been necessary for technical reasons became unnecessary (or potentially so) through some enhancement delivered with a recent release of DB2. I'll provide more information on both of these scenarios in the remainder of this blog entry.

The index "upgrade" concept is related to index "trimming," in that identification of an unnecessary index presents you with a couple of options: one is to eliminate the useless index (to reduce CPU and disk costs, as mentioned); the other is to replace that index with one that is useful -- probably in terms of reducing CPU and elapsed time for some important queries (or searched updates or deletes). I'd consider that action to be an index "upgrade" because you've improved application performance while holding the line with respect to the number of indexes defined on a table (one useless index out, one useful index in).

Whether your aim is index trimming, index upgrading, or both, the essential first step is identification of indexes that can be eliminated because they no longer serve a useful purpose (if indeed they ever did). The last several releases of DB2 for z/OS have made this task easier and more fruitful. I'll first provide a quick review of enhancements related to index trimming and upgrading that were introduced with DB2 V8 and V9, then I'll get into new opportunities for tightening up and/or boosting the effectiveness of your index configuration that were made available through DB2 10 for z/OS. So, the review:
  • Table-controlled table partitioning (DB2 V8). Though this enhancement was delivered years ago, a lot of mainframe DB2 shops have yet to leverage its value (as I pointed out in an entry I posted to this blog a few months ago). Basically, it comes down to this: when you convert an index-controlled partitioned table space to an table-controlled partitioned table space, the formerly partition-controlling index may end up becoming useless. There's your opportunity for index trimming or upgrading.
  • The LASTUSED column of the SYSIBM.SYSINDEXSPACESTATS catalog table (DB2 9). Those "I thought we'd need it, but maybe we don't" indexes were once pretty hard to find. Looking for package dependencies on an index will tell you when one is not used by static SQL statements, but what about dynamic SQL? The LASTUSED column of the real-time statistics table SYSIBM.SYSINDEXSPACESTATS makes identification of indexes that are just taking up space much easier than before, and that multiplies your index trimming and upgrading opportunities. I've blogged about this DB2 feature several times, most recently in an entry posted to this blog a couple of years ago.
  • Index-on-expression (DB2 9). Index upgrading is all about replacing an unnecessary index with one that does you some good. DB2 9 broadened the "does you some good" landscape by introducing a new category of indexes, that being indexes defined not just on columns, but on column expressions, such as SUBSTR(DEPT_NAME, 2, 3). With such an index, a predicate that had been non-indexable can be made indexable, leading to potentially dramatic query performance improvement. I blogged about this DB2 enhancement a few years ago.
  • The XML data type (DB2 9). Just storing XML data in a column defined with the XML data type (as opposed to, for example, the VARCHAR or CLOB data type) can deliver greatly improved performance with respect to querying said XML data. Create an index on that XML column (via XPath expressions allowed in index definitions), and you can REALLY speed things up (particularly when the XML documents are pretty large). This is another of those index upgrade opportunities -- and keep in mind that it pertains not only to retrieval of XML data, but to retrieval of non-XML data based on a predicate that references an XML column (an excellent reference for XML data in a mainframe DB2 environment is the IBM "red book" titled Extremely pureXML in DB2 10 for z/OS -- most of the information in which is also applicable to a DB2 9 system).

OK, now for the good stuff delivered through DB2 10 for z/OS that can take your index trimming and upgrading efforts to new levels of effectiveness. I'll start with what are called index INCLUDE columns. This is really big, in that it created a whole new category of indexes that could be eliminated, thereby paving the way for new index trimming and upgrading actions. The concept of index INCLUDE columns is simple, and best explained through an example. Suppose you have a table for which the combined values of columns C1 and C2 in each row have to be unique. To enable DB2 to enforce that requirement, you defined a unique constraint on (C1, C2), and created a unique index on (C1, C2). Suppose further that you wanted to provide index-only access for queries that reference only columns C1, C2, C3, and C4 of the table, or that you wanted to enable sort avoidance for queries that contain ORDER BY C1, C2, C3, C4. In a pre-DB2 10 environment, you addressed this second need by creating an additional index on columns C1, C2, C3, and C4. DB2 10 (in new-function mode) provides a new option: you can alter the unique index on (C1, C2) to INCLUDE (C3), and again to INCLUDE (C4) (for ALTER INDEX, only one column can be named in an INCLUDE specification -- n columns are included in an index through n executions of ALTER INDEX). In doing that, you'd end up with one index that could accomplish two purposes: 1) enforcement of uniqueness for (C1, C2) value pairs, and 2) query performance enhancement (through enablement of index-only access for queries referencing only columns C1, C2, C3, and C4; and sort avoidance for queries containing ORDER BY C1, C2, C3, C4 ). Voila! That index on (C1, C2, C3, C4) is now unnecessary and can be eliminated, clearing the way for index trimming or index upgrading. Note that INCLUDE is an option for CREATE INDEX as well as ALTER INDEX statements (in the case of CREATE INDEX, several columns can be listed in the INCLUDE part of the statement, so INCLUDE (C3, C4) is OK, syntax-wise). Whether it's an ALTER INDEX or a CREATE INDEX statement, the aim is the same: use one column (or set of columns) for enforcement of a UNIQUE constraint (or for a primary key), and with the same index include another column (or set of columns) that are "along for the ride" to provide better-performing query access paths. The extra columns added to an index via INCLUDE do NOT factor into uniqueness determination with regard to the "core" key on which the unique index is defined -- that's what enables dual-purpose indexing.

Here's another index trimming/upgrading opportunity provided courtesy of DB2 10: hash-organized tables. This is a feature that has generated a good bit of buzz in the DB2 community, but some folks have overlooked the tie-in to index trimming and upgrading. Here's how hash organization of data in a table works: when a table is created (in a DB2 10 new-function mode environment) with ORGANIZE BY HASH (or when an existing table is changed via ALTER TABLE to ADD ORGANIZE BY HASH), assignment of rows to table pages is not based on a clustering key; rather, the hash key value for a row (this is a unique key specified by the user) is run through a hashing algorithm whose output determines the target page. Generally speaking, when people think of hash-organized tables, they think of the performance boost that can be achieved when a query referencing the hash key in an equals predicate can get a row from DB2 with a single GETPAGE. That's great (for single-row access -- not so much for sequential data access), but here's something else to consider: if you're looking to alter an existing table to be hash-organized, and the unique key that you're going to use for hash-organization purposes is already defined as unique and has a corresponding uniqueness-enforcing index (and this applies as well to a primary key), that unique index will likely be unnecessary following the change to hash organization. Why? Because DB2 does not need an index on the hash key to ensure uniqueness of that key -- it can accomplish duplicate-value checking very efficiently using just the table (remember that the location of a row in a hash-organized table can be directly ascertained using the value of the hash key in the row). So, you can probably drop the index that had been previously created on the key that is the hash key for the hash-organized table, and there you have that new opportunity for index trimming or upgrading. Now, I say that you can probably drop that unique index, because it's possible that you'd want to keep it for queries that 1) reference only the column or columns in that index and 2) involve a scan of index values (recall that hash-organized tables are not good for the performance of data retrieval operations that involve sequential access).

[Note: technically speaking, converting a table to be hash-organized results in the creation by DB2 of a new index on the table, but this index -- the hash-overflow index, used to locate rows that were stored in the table space's overflow area because the hash-determined target page was full -- is a "sparse" index, containing entries only for rows in the overflow area. It does not have the CPU and disk space costs associated with traditional indexes.]

So, index trimming and upgrading is good for the efficiency of your DB2 for z/OS environment, and good for the performance of your DB2-accessing applications. DB2 10 continues the trend of giving you more opportunities to prune your index configuration, or to maintain a certain level, with respect to the number of indexes defined, while boosting overall index effectiveness (through replacement of unneeded indexes with value-delivering indexes). Invest some time in this area, and you're likely to see a good return.