Robert's Db2 blog: November 2011

Wednesday, November 23, 2011

DB2 Data Sharing: What do you Know about Member Subsetting?

DB2 for z/OS data sharing is great technology. One of the things that makes it great is the ability of a data sharing system to automatically spread work across DB2 members so as to optimize load balancing, throughput, and availability. For this we can thank the z/OS workload manager (WLM), the DB2 group attach capability (for local-to-DB2 applications such as batch programs and utilities), and the Sysplex awareness built into DB2 Connect, the IBM Data Server Driver, and other DB2 clients (for remote data requesters).

Nice as it is to let the system spread work around the Parallel Sysplex, there are times when you'd like a particular workload to run on a subset of the members of the data sharing group. Here's a real-world example from a DB2-using organization that I support: the company has a business intelligence, or BI, workload (fairly complex data analysis queries) that targets tables also accessed by "run the business" OLTP and batch programs. To keep the BI queries from interfering with the highly performance-sensitive operational application programs, the organization restricted the analytic workload to a single member of their 6-way data sharing group. This workload isolation was easily implemented: the BI queries were issued by remote requesters, and the client systems connected to the one DB2 member to which the analytic SQL statements were restricted, versus connecting via the location name of the data sharing group (connecting to the group's location name causes requests to be spread across all of the group's DB2 members).

All well and good -- to a point. Yes, the high-volume online and batch programs with stringent performance requirements were insulated from the more dynamic BI application workload, but the analytics users were not protected from system outages as were the other users of the DB2 data sharing system. Here's what I mean by that statement: suppose that DBS1 (a made-up name) were the subsystem to which users of the BI application were required to connect. If that subsystem were to be unavailable for some reason, the BI users would get connection failures and might be out of luck for a while. In contrast to that situation, the operational application users (whether connecting locally via the DB2 group attach name, or remotely over the network using the group's location name) were assured of successful connections as long as any of the member DB2 subsystems were available (thanks, in the network-attached case, to the use of dynamic virtual IP addressing, which I briefly described in a blog entry from my independent DB2 consulting days, and about which you can read more in the IBM "red book" titled "DB2 9 for z/OS: Distributed Functions" -- most all of which is applicable to DB2 10, as well).

So, how to keep a given application workload from running on all members of a DB2 data sharing group without taking from that application's users the super-high-availability benefits conferred by DB2's shared-data technology? The solution to that challenge is something called member subsetting, and it was introduced -- for network-attached requesters -- with DB2 Version 8 (I'll cover member subsetting for locally-attached applications momentarily). Implementation of member subsetting for DRDA requesters involves creating one or more location aliases that map to subsets of the DB2 subsystems that comprise the data sharing group. For example, suppose that application ABC is to be restricted to members DBP1 and DBP2 of a 5-way data sharing group that has a location name of LOCDB2Y. In that case, a location alias associated only with members DBP1 and DBP2 could be created. Let's say that the location alias so defined is named LOCDB2X. If application ABC connects to location LOCDB2X, SQL statements issued through the application will execute only on subsystems DBP1 and DBP2 (there's your workload isolation); further, requests to connect to location alias LOCDB2X will succeed as long as one of the two associated subsystems is available -- so, if DBP1 is down due to a failure situation or a planned outage (perhaps a software maintenance upgrade), LOCDB2X connection requests will be handled by subsystem DBP2 (there's your high-availability set-up). Pretty sweet.

Setting up a location alias in a DB2 Version 8 or DB2 9 environment involves updating the communication record in the DB2 bootstrap data set (BSDS). That's done by executing the DB2 change log inventory utility (aka DSNJU003) with an input statement that would look something like this (referring to names I've used in my example, this statement would be executed via DSNJU003 on member DBP1 -- the statement for member DBP2 would look the same, but with a different RESPORT value):

DDF LOCATION=LOCDB2Y,PORT=1237,RESPORT=1238,ALIAS=LOCDB2X:8002

The location alias would have the same IP address as the group's location name. All members of the group listen on port 1237, but only members DBP1 and DBP2 would listen for requests directed to port 8002. Sysplex workload balancing for the subsystems to which location alias LOCDB2X works as it does for the overall group, except that the list of available subsystems returned to the DB2 client (e.g., the IBM Data Server or DB2 Connect), sorted in descending order of processing capacity as assessed by WLM, includes only DBP1 and DBP2.

Very good stuff, here, with one hitch: DSNJU003 can only be executed when DB2 is down. DB2 10 delivered an online means of defining a location alias via the new command -MODIFY DDF. Again using the names referred to previously, a location alias mapping to subsystem DBP1 would be dynamically created through the issuance of this command on DBP1 (the command would also be issued on DBP2 to map the alias to that subsystem):

-MODIFY DDF ALIAS(LOCDB2X) ADD

The -MODIFY DDF command would be executed again to specify a port number for LOCDB2X -- one option pertaining to a location alias can be specified per issuance of the command. Dynamically added location aliases are initially in a stopped state by default, so the last command issued to activate the fully-defined location alias would be (and again, this command would be issued on each subsystem associated with the alias):

-MODIFY DDF ALIAS(LOCDB2X) START

And, the story keeps getting better: DB2 10 provided a way to implement member subsetting for locally-attached applications (those being applications, such as batch jobs, that run on the same Parallel Sysplex as the DB2 data sharing members). This DB2 10 feature is called subgroup attach. A subgroup attach name for a DB2 10 subsystem can be specified via the SUBGRP ATTACH field of installation CLIST panel DSNTIPK. It can also be added by modifying the IEFSSNxx member of SYS1.PARMLIB and the IPLing the associated z/OS LPAR (details can be found in the DB2 10 Installation and Migration Guide, available online at http://www-01.ibm.com/support/docview.wss?uid=swg27019288#manuals).

So, if your organization has a DB2 data sharing group and you want to achieve both application workload isolation AND super-high availability for users of said application, implement member subsetting via a location alias or a subgroup attach. It's like having your cake and eating it, too.

Wednesday, November 2, 2011

IOD Dispatch (3) - Gleanings from the DB2 for z/OS "Coffee Track"

IBM's 2011 Information on Demand conference took place last week in Las Vegas. I attended a number of the event's technical sessions, and summarized what I heard in some of those sessions in a couple of entries posted to this blog. In this entry, my last focused on IOD 2011, I'll provide some of the more interesting items of information that I picked up in the "coffee track" -- that being the term often used to describe discussions that take place during coffee breaks and otherwise outside of the formal presentations. I don't know about you, but the "coffee track" has always been for me one of the most valuable aspects of conferences such as IOD. I really enjoy informal conversations with IBM DB2 developers and with professionals in the DB2 user community. Even when these talks are brief, they often result in my seeing a DB2 feature or function in a new light, and thus my understanding of DB2 technology is broadened. So, without further ado, here are some of the highlights from my "coffee track" time last week:

A lot of mainframe DB2 people still have a lot to learn about native SQL procedures. On Tuesday, I delivered a presentation on DB2 for z/OS stored procedures. During that hour I spent a lot of time talking about native SQL procedures, and some related questions that I got from a session attendee were illuminating. Native SQL procedures, by the way, were introduced with DB2 9 in new-function mode (I blogged about them a few years ago, when DB2 9 was new and I was working as an independent DB2 consultant). The functionality enables the development of stored procedures that are written entirely in SQL, and unlike external SQL procedures (introduced with DB2 Version 7), which are turned into C language programs that run in a stored procedure address space, native SQL procedures run in the DB2 database services address space and have no associated external-to-DB2 executable. Anyway, the aforementioned session attendee asked me about deployment of native SQL procedures from one DB2 environment to another (e.g., from test to production). My answer didn't quite hit the mark, and the question was asked again in a slightly different form. We went back and forth in this way for a while, until the session attendee finally asked, "With native SQL procedures, there's just one 'piece', right?" Right.

With the phrase "one piece," this person was contrasting native SQL procedures with traditional external stored procedures, which have two "pieces": an external-to-DB2 program written in a language such as COBOL or C or Java, and that program's DB2 package. In that application environment (one with which many mainframe DB2 people are very familiar), stored procedure deployment is mostly about deployment of the external-to-DB2 program, and the DB2 "piece" -- the package -- is dragged along in that process. In the case of COBOL stored procedures, for example, the external-to-DB2 executable exists in the form of a load module. Organizations have procedures, often involving the use of software tools, that are followed to get a load module deployed into a given system, and these procedures ensure that the associated DB2 package is also made available in the target system. But what if the only "piece" of a stored procedure is the DB2 package? If the DB2 package has previously been "along for the ride" as an external-to-DB2 program is deployed in, say, the production system, what's to be done when the package is in the driver's seat, so to speak, and the car (to continue the vehicular analogy) is otherwise empty (i.e., there's not a second "piece" to the stored procedure)? That is indeed something new for DB2 people, and dealing with that situation means learning about the new (with DB2 9) DEPLOY option of the BIND PACKAGE command, and about extensions to the ALTER PROCEDURE statement, such as the ACTIVATE VERSION option (useful for "backing out" a version of a native SQL procedure that is causing errors). So, if you're looking to take advantage of native SQL procedure functionality (and there can be multiple benefits of doing so), get familiar not only with development of these procedures, but with deployment, as well. I have some introductory information in an entry I posted a couple of years ago to the blog I maintained while working as an independent consultant, and you can find more detailed information in the DB2 Command Reference (for BIND PACKAGE) and the DB2 SQL Reference (for ALTER PROCEDURE) -- both the DB2 9 and the DB2 10 manuals are available online at IBM's Web site.

DB2 for z/OS for data warehousing -- when the application "cannot fail." I had a talk with a couple of IOD attendees from a large company. Their organization had recently spent months in developing go-forward plans for a mission-critical business intelligence application. The basic decision was this: leave the application's database on the mainframe DB2 platform, where it had been for some time, or move it to a non-mainframe, non-DB2 system. The company chose to stay with DB2 for z/OS, mainly for two reasons: 1) they are confident that DB2 on System z can scale to handle the big increase in database size and access activity expected over the coming years, and 2) they place a high value on the rock-solid reliability of the mainframe DB2 platform (as one of the people with whom I spoke put it: "The [decision support application] database cannot fail"). In expanding on that second point, these folks went beyond System z's and DB2's well-earned reputation for high availability, and talked up the excellence of their DB2 for z/OS support staff -- systems programmers and DBAs.

Sometimes, we take for granted the deep expertise possessed by a lot of companies' mainframe DB2 teams. DB2 for z/OS has been around for more than 25 years (IBM invented relational database technology), and many DB2 for z/OS professionals have two decades or more of experience in working with the product. They know how to configure DB2 systems and design DB2 databases for high availability, and they know what to do if a data recovery situation arises. On top of that, these teams tend to have have processes, developed over a number of years, around things such as change management, capacity planning, and performance monitoring and tuning, that are very robust and very effective. Combine a super-reliable hardware/software platform with a veteran and highly knowledgeable support staff, and you get a data-serving system that organizational leaders trust to deliver information as needed, whenever it's needed.

DB2 10 for z/OS: it's about time. There was a lot of buzz at IOD 2011 about the temporal data support provided by DB2 10 for z/OS, and with good reason: this is functionality that would be a bear to provide via your own application code (something I know from first-hand experience), and even if you went that route you couldn't match the performance of the built-in DB2 capability.

There are two flavors of temporal data support available in a DB2 10 new-function mode environment (and they can be employed separately or together for a given table): system time (this has to do with automatic saving of "before" images of rows targeted by UPDATE and DELETE operations in a history table, with information indicating the timestamps between which a row was in the "current" versus the "history" state) and business time (this enables, among other things, a "future insert" capability, whereby a row with, for example, next year's price for a product can be added to a table without affecting programs querying "currently valid" rows).

I appreciated the value of system time temporal data support as soon as I learned of it, having done, in the mid-1990s, some consulting work for an insurance company that had written "row change history" logic into their application code -- this to enable determination of a policy holder's coverage at the time that an automobile accident (for example) occurred. With regard to business time, however, I'll admit that I initially had some trouble envisioning use cases. I heard of an interesting one at the conference last week: a company is utilizing DB2 10 business time to enable analysts to do future forecasting of profit margins. Very cool. I expect that as time goes by, organizations will come up with new and innovative ways to derive value from the system time and business time capabilities of DB2 10.

I had a great time last week, and I'm already looking forward to IOD 2012 -- hope to see you there.