Tuesday, October 11, 2011

Even a "One-Box" DB2 Data Sharing Group Can Boost Availability

Back in the mid-1990s, when organizations were first implementing DB2 for z/OS data sharing groups on Parallel Sysplex mainframe clusters, there were ALWAYS multiple server "boxes" involved. For one thing, the only type of coupling facility (CF) available was the external type -- a box of its own. And, you'd have two of those, to allow for things like lock structure rebuild in CF2 in case of the failure of, or loss of connectivity to, CF1. So, that's two boxes already, and we haven't gotten to the mainframe servers yet. You'd almost certainly have at least two of these, as well, because the mainframes that IBM was selling at the time -- the first generation built with CMOS microprocessors versus bipolar chip sets -- maxed out at 50 MIPS of processing power (5 MIPS per engine -- I'm not kidding -- with up to 10 engines per box). Parallel Sysplexes with four or five or more boxes -- coupling facilities and mainframe servers -- were the norm.

Not too many years later, the number of boxes in Parallel Sysplex / DB2 data sharing configurations started going down. Two key factors were at play here. First, IBM came out with internal coupling facilities (ICFs) -- basically allowing an organization to dedicate one or more mainframe engines and some mainframe memory to one or more LPARs that would run Coupling Facility Control Code instead of z/OS. That reduced Parallel Sysplex server "footprints," and saved companies some money, to boot (internal CFs are less expensive than their external cousins). The other development that reduced the number of boxes in the typical Parallel Sysplex was the introduction of ever-more-powerful mainframe servers -- machines featuring more engines and more MIPS per processor. Organizations found that they no longer needed five or eight or ten System z servers to get whopping huge amounts of processing capacity. The current flagship of the System z line, the z196, can have up to 80 general purpose processors with an aggregate capacity of more than 52,000 MIPS in ONE footprint (up to 16 additional processors can be configured as specialty engines, examples of which include internal coupling facility engines and zIIPs). Lash just a couple of these bad boys together in a 'Plex, and you've got one very highly scalable computing platform.

All that said about internal coupling facilities and super-powerful mainframes, plenty of organizations want a minimum of three boxes in a Parallel Sysplex / DB2 data sharing configuration. A three-box set-up does indeed provide the ultimate in high availability, as it enables you to avoid a scenario in which the failure of one box results in the simultaneous loss of 1) a DB2 data sharing member and 2) the coupling facility lock structure and/or the coupling facility shared communications area (that simultaneous loss would cause a data sharing group failure, which would then require a group restart). This scenario, sometimes called the "double failure" scenario, can also be avoided in a two-box Parallel Sysplex if the coupling facility lock structure and the coupling facility shared communications area are duplexed. I wrote a good bit about the "double failure" scenario and the pros and cons of lock structure and SCA duplexing in an entry posted last year to the blog I maintained while working as an independent DB2 consultant.
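
To make the "double failure" reasoning concrete, here's a small illustrative sketch in Python. It's purely a thought experiment, not a z/OS configuration tool: the box names and the member/structure placement model are hypothetical, invented for this example.

```python
# Toy model of the "double failure" scenario. A group restart is forced
# when one box failure simultaneously takes down a DB2 member AND the
# only copy of the CF lock structure / SCA.

def group_restart_required(failed_box, member_boxes, structure_boxes, duplexed):
    """Return True if losing failed_box costs a DB2 member and the
    lock structure/SCA at the same time."""
    member_lost = failed_box in member_boxes
    # With duplexing, a surviving copy of each structure exists on
    # another box, so losing the primary copy is not fatal.
    structures_lost = (failed_box in structure_boxes) and not duplexed
    return member_lost and structures_lost

# Two-box sysplex: DB2 members on boxes A and B, lock structure/SCA on A.
members = {"A", "B"}
structures = {"A"}
print(group_restart_required("A", members, structures, duplexed=False))  # True: double failure
print(group_restart_required("A", members, structures, duplexed=True))   # False: duplexing avoids it

# Three-box sysplex: members on A and B, structures on a dedicated box C.
print(group_restart_required("A", members, {"C"}, duplexed=False))  # False: structures survive
print(group_restart_required("C", members, {"C"}, duplexed=False))  # False: no member lost
```

Note the last two checks: with the structures on their own box, no single failure removes both a member and the structures, which is exactly why the three-box layout delivers the ultimate in availability.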

Ultimate DB2 data sharing availability, then, is delivered by a multi-box Parallel Sysplex. Now, suppose your organization runs DB2 for z/OS on one System z server, and suppose that single-box situation is unlikely to change anytime soon. Could your company still realize value from the implementation of a DB2 data sharing group, even if that group were to run on a one-box Parallel Sysplex? YES. Here's why: even on a one-box Sysplex, DB2 data sharing delivers a very important availability benefit: the ability to apply DB2 maintenance, and even to migrate to a new release of DB2 (as I pointed out in a previous post to this blog), without the need for a maintenance window. The procedure is a round-robin: you apply fixes to the load library of one member of the DB2 data sharing group, quiesce application traffic to that member, stop and start the member to activate the maintenance, resume the flow of application requests to the member, and then do the same for the other member -- or members -- of the group. To put it another way: even when the Parallel Sysplex infrastructure (at least two z/OS LPARs and at least two internal coupling facilities) is configured on one mainframe, a DB2 data sharing group enables you to apply DB2 maintenance (and to perform DB2 version upgrades) without having to stop the application workload. At a time when unplanned DB2 outages are more and more rare (thanks to ever-more-reliable hardware and software), the opportunity to virtually eliminate planned outages can be a very big deal for an organization.
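
To illustrate the round-robin flow described above, here's a minimal Python sketch. It's a conceptual outline only: the member names (DBP1, DBP2) and the helper functions are hypothetical stand-ins -- in a real shop, traffic draining and command execution would be handled by your routing set-up (e.g., Sysplex Distributor) and by console or automation tooling.

```python
# Sketch of rolling DB2 maintenance across data sharing members, one at a
# time, so the rest of the group keeps serving the application workload.

MEMBERS = ["DBP1", "DBP2"]  # hypothetical member names / command prefixes

def quiesce_traffic(member):
    """Stop routing new application work to this member (stand-in)."""
    print(f"Draining application traffic from {member}...")

def issue_command(command):
    """Issue a DB2 command (stand-in for console/automation)."""
    print(f"Issuing: {command}")

def resume_traffic(member):
    """Route application work to this member again (stand-in)."""
    print(f"Resuming application traffic to {member}")

def roll_maintenance(members):
    """Activate maintenance already applied to each member's load library,
    restarting members one at a time -- no group-wide outage needed."""
    for member in members:
        quiesce_traffic(member)
        issue_command(f"-{member} STOP DB2 MODE(QUIESCE)")
        # The fixes were applied to this member's load library beforehand;
        # restarting the member is what activates them.
        issue_command(f"-{member} START DB2")
        # In real life: verify the member is healthy before moving on.
        resume_traffic(member)

roll_maintenance(MEMBERS)
```

The same pattern covers a DB2 version upgrade: point each member at the new-release libraries and roll through the group in the same way.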

And even though the frequency of unplanned DB2 outages is diminishing, in an increasingly online world the cost of unexpected downtime is higher than ever (i.e., tolerance for unplanned outages is lower than ever). A DB2 data sharing group running on a one-box Parallel Sysplex can greatly reduce the scope of an abnormal DB2 subsystem termination: if such a failure occurs in a data sharing system, only the data and index pages (or rows, if row-level locking is used) changed by units of work that were in-flight on the failing DB2 subsystem become unavailable until the failed subsystem is restarted -- as opposed to the whole database becoming inaccessible if a standalone DB2 subsystem terminates abnormally. On top of that, DB2 restart processing tends to complete more quickly in a data sharing environment than in a standalone DB2 environment (externalization of changed pages to group buffer pools at commit means that the roll-forward phase of restart requires less time).

Of course, a data sharing group running on a one-box Parallel Sysplex can't provide data access if the one mainframe server fails. Maintaining application availability in the event of such a failure (or of a planned server outage) would require a multi-box Sysplex. The point I want to make is this: you can get a lot -- though not all -- of the availability benefits of DB2 data sharing even if your Parallel Sysplex is contained within one System z server (consider: mainframes are extremely reliable). There are organizations out there right now that have boosted uptime for DB2-accessing applications by implementing data sharing groups on single-box Parallel Sysplexes. Want ultimate data availability? Do data sharing on a multi-box Sysplex. If the multi-box route is not an option for your company, don't assume that DB2 data sharing can't work for you. It can. 
