Earlier this year, some of my colleagues and I who work with IBM information management software for the System z platform were invited to write essays addressing the question, "Why is IBM System z essential to business?" I decided to act on this invitation, and to get my creative juices flowing I imagined an impromptu conversation with a skeptical CEO on the business value of System z. A co-worker requested that I post my essay on my DB2 blog, so I'm doing that now. You'll find it below. I hope that it will provide you with some ideas that you can use in discussions that you might have at your site about the value of System z as an enterprise data-serving platform.
On a recent business trip, I got an upgrade to first class on the flight home. The person seated next to me turned out to be the CEO of a retail company. When I told her that I work for IBM, she said, “Ah, yes. Big Blue. Big Iron.” And then, “When are you folks going to get out of the mainframe business and into the modern world?”
Smiling, I replied, “Since that’s a two-part question, I’ll give you a two-part answer. First, I suppose we’ll get out of the mainframe business when our customers give up on the platform, and our customers are doing nothing of the sort. Second, organizations that utilize mainframes, which in the context of our product line are called IBM System z servers, know that it’s not a choice between using mainframes and moving into the ‘modern world,’ as you put it, because System z technology is always moving forward – has been for almost fifty years. A server line doesn’t stick around that long if it’s standing still.”
The CEO smiled back at me, warming to the debate. “OK, so mainframes, or (air quotes) ‘System z servers,’ can do what they’ve done for a long time, and they can do it faster than before. But doing the same thing faster doesn’t mean they’re doing the right thing. Mainframes run legacy applications just fine, but newer applications have to be more flexible and adaptable – my CIO talks a lot about ‘agile’ applications. Frankly, I think of mainframes as rigid and ponderous – not the light-on-their-feet systems that we need in today’s world.”
Back to me. “I hear what you’re saying. In addition to ‘agile,’ I imagine that your CIO uses terms such as ‘service-oriented,’ ‘multi-tiered,’ and ‘loosely coupled’ in describing your newer application systems.”
“Those words sound familiar,” said the CEO.
“Your CIO is thinking the right way,” I said, “but there’s no reason that mainframes can’t be part of his thinking. Organizations all over the world are developing applications with all of the characteristics cited by your CIO, and these applications are running on mainframes, or they’re running on other platforms – and accessing data stored on mainframes. Either way, System z servers are the foundation on which these very modern applications are built.”
The CEO shook her head. “Even if building those kinds of applications on a mainframe is technically possible, it’s an expensive proposition based on what I know.”
“Look,” I said, “as the captain of a big ship, you understand that the tip of an iceberg is not the big concern – the whole of the iceberg is what matters. Similarly, people who say that mainframes are expensive tend to focus only on a part of the cost-of-computing iceberg – namely, the part that is the initial cost to acquire the server hardware and software. Why don’t you have your CIO take you through your data center to see all the other costs? Get him to show you where the mainframes are and compare that to where the non-mainframe servers are. How much floor space do those other servers take up? How much electricity do they consume? How much heat do they pump out, adding to your data center cooling costs? After you’ve surveyed your computer server scene, get the CIO to go over his org chart with you. How many people support the System z servers? How many support the non-mainframe servers? I think you’ll find that your System z servers handle a big workload at a total cost of computing that compares very favorably to that of your non-mainframe servers.”
I could see that the CEO’s gears were turning. “You’ve got me curious. I think I’ll take that data center tour. But I think that you’re leaving out an important cost factor: the people who develop our applications. The programmers we hire out of college know languages like Java. Mainframe programs are written mostly in COBOL, right? The only people who know that language these days are folks who’ve been programming since the 1970s, and a lot of them are pretty expensive.”
“It’s true,” I said, “that many programs running on mainframes are written in COBOL – largely because COBOL programs that were written years ago still run great on mainframes. They still keep the trucks rolling, the shelves stocked, and the customer invoices flowing. But application developers today – including the twenty-somethings – are accessing mainframe databases using Java, and even languages with amusing names such as Python, Ruby, Perl, PHP, and C#. These are often modern, web-based applications with slick user interfaces.”
“I don’t know,” said the CEO. “I hear that ‘open systems’ are the way to go. That leaves mainframes out, right?”
“Not as I see it. A computing platform can be ‘closed’ – in the sense that one company owns the design and development of the associated hardware and operating system – and still be an excellent fit in a heterogeneous application infrastructure. The key is to have open interfaces to the platform. When a Java programmer uses something called Java Database Connectivity, or JDBC – an industry-standard interface – to access records in a relational database, his work isn’t any more challenging just because he’s targeting a database on a System z server.”
“Well, if the mainframe database looks to a programmer like databases on other servers, what’s the advantage of having the data on a mainframe?” asked the CEO.
“Good question. To the programmer, it may well look like a database on any other server. The difference shows up in how it runs: it’s always up, it never gets hacked, and it never gets in the way of your company’s growth.”
Now the CEO cracked a pretty good smile. “Nice pitch. I’m going to follow up on it when I get back to the office.”
“You should. Remember: our job is to help you win in the marketplace. Some of your most effective weapons have ‘IBM’ and ‘System z’ stamped on them. Use them for all they’re worth, and kick some butt.”
“Will do.” Then, turning to the crossword puzzle in the in-flight magazine: “Do you know a three-letter word for ‘essential company’?”
Tuesday, June 5, 2012
The ONE Key Aspect of DB2 for z/OS Table-Controlled Partitioning that People Seem Most Likely to Forget
Sometimes, an important DB2 for z/OS feature can be out there for years, and yet not be fully appreciated by lots of people in the DB2 user community. Table-controlled partitioning is such a feature. In this blog entry I want to generally set the record straight about table-controlled partitioning, and in particular I want to make sure that mainframe DB2 folks understand a key aspect of this physical database design option that for some reason eludes many users to this day.
First, the basics: table-controlled partitioning refers to a means of range-partitioning the data in a table, so that, for example, sales records for retail store numbers 1-25 go into table space partition 1, records for store numbers 26-50 go into partition 2, records for store numbers 51-75 go into partition 3, and so on. Table-controlled partitioning is so called to distinguish it from index-controlled partitioning, which for years and years was the only way to assign a table's rows to table space partitions based on the value of a user-specified partitioning key: you created a partitioning index for a table, and in the definition of that index you associated table space partitions with ranges of the partitioning key.
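To put that in concrete terms, here is a minimal sketch of a table-controlled partitioned table definition matching the store-number example above. The table, column, database, and table space names are made up for illustration:

  CREATE TABLE SALES
    (STORE_NUM  SMALLINT      NOT NULL,
     SALE_DATE  DATE          NOT NULL,
     AMOUNT     DECIMAL(11,2) NOT NULL)
    IN SALESDB.SALESTS
    PARTITION BY (STORE_NUM)          -- partitioning is specified on the table, not on an index
     (PARTITION 1 ENDING AT (25),
      PARTITION 2 ENDING AT (50),
      PARTITION 3 ENDING AT (75));

Note that no partitioning index is required here; any indexes you create on the table exist for access path, uniqueness, or clustering purposes.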
That changed with DB2 Version 8 for z/OS, and this brings me to another point of clarification: some people get table-controlled partitioning and universal table spaces a little mixed up, in that they believe that table-controlled partitioning was introduced along with universal table spaces. If you've been under that impression, you're off by one DB2 release. DB2 V8 delivered table-controlled partitioning. DB2 9 delivered universal table spaces. There are two types of universal table space, and one of these, called a partition-by-range universal table space, is conceptually very similar to a DB2 V8 table-controlled partitioned table space (the other type of universal table space is known as partition-by-growth). Those who lump DB2 V8-introduced table-controlled partitioning in with DB2 9-introduced universal table spaces may be under the impression that you cannot convert an index-controlled partitioned table space to a table-controlled partitioned table space until you're running with DB2 10, because that's the release in which a smooth path to universal table spaces from other table space types was provided. In fact, in a DB2 V8 or DB2 9 (or DB2 10) environment you can easily convert an index-controlled partitioned table space to a table-controlled partitioned table space. How is that done? It's as simple as this: for the partitioning index of the index-controlled partitioned table space, issue an ALTER INDEX statement with the NOT CLUSTER option. Bingo -- you've now got a table-controlled partitioned table space.
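In statement form, assuming a hypothetical partitioning index named SALES_PIX, the entire conversion is:

  ALTER INDEX SALES_PIX NOT CLUSTER;
  -- The table space is now table-controlled partitioned; the partition ranges are
  -- carried with the table's definition rather than with the index's definition.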
This simple means of converting an index-controlled partitioned table space to a table-controlled partitioned table space highlights the really key feature of table-controlled partitioning that -- as I mentioned up front -- has escaped the notice of many a DB2 person: data in a table-controlled partitioned table space can be partitioned by one key and clustered within partitions by another key. For a number of mainframe DB2-using organizations, the fact that the partitioning index of an index-controlled partitioned table space had to also be the table's clustering index was a pain. I worked in the early 2000s in the IT department of a financial services company, and I well recall that we had, in our DB2 for z/OS V7 database, a partitioned table space that presented us with an unwelcome trade-off: we could go with a key that would be good for partitioning (benefiting the performance of high-volume insert operations) but lousy for clustering (hurting query result set retrieval performance), or vice versa. When DB2 for z/OS V8 was announced, and with it the ability to specify data partitioning at the table level (versus the index level) and -- as a consequence -- to specify a clustering index with no dependence on the partitioning scheme, we jumped on that like a cat on a mouse. In my mind, this feature ranked right up there with 64-bit addressing as the MOST valuable capability provided via DB2 V8.
Obviously, not everyone in the DB2 user community shared this assessment of mine, as evidenced by the fact that I still find people who are under the impression that the partitioning key for a table-controlled partitioned table space has to be the table's clustering key. Again, NOT SO. Suppose you have a table for which it is important, for data load and offload purposes, that rows be partitioned by date (so, for example, one week's data goes into partition 1, the next week's into partition 2, and so on). For that same table, good query performance may depend on rows within the date-based partitions being clustered by account number. FINE. NO PROBLEM. Define the table with a specification of the date column as the partitioning key, and create an index, with the CLUSTER option specified, on the account number column. What if this table exists today in index-controlled partitioned form, with the partitioning/clustering index defined on the date column (good for load/offload, not good for result set retrieval)? In that case, you could first issue, as previously mentioned, an ALTER INDEX statement, with the NOT CLUSTER option, targeting the existing partitioning/clustering index on the table. Because the partitioning index on an index-controlled partitioned table space must also be the clustering index on the table, execution of this ALTER INDEX statement will have the effect of changing the table space's partitioning scheme from index-controlled to table-controlled (it will still be partitioned by the date column, but that partitioning information will now be tied to the table definition instead of being tied to an index's definition). Next, if you already have an index defined on the account number column, issue an ALTER INDEX statement, with the CLUSTER option specified, for that index (if an index on the account number column doesn't exist already, create one and include CLUSTER in the CREATE INDEX statement).
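As a sketch, with hypothetical names (a table named ORDER_HIST, an existing partitioning/clustering index ORDHIST_DATE_IX on the date column, and ORDHIST_ACCT_IX for the index on the account number column), the sequence would look like this:

  -- Step 1: convert from index-controlled to table-controlled partitioning
  ALTER INDEX ORDHIST_DATE_IX NOT CLUSTER;   -- still partitioned by date, now via the table definition

  -- Step 2, option A: an index on the account number column already exists
  ALTER INDEX ORDHIST_ACCT_IX CLUSTER;

  -- Step 2, option B: no such index exists, so create one with the CLUSTER option
  CREATE INDEX ORDHIST_ACCT_IX ON ORDER_HIST (ACCT_NUM) CLUSTER;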
Here's another potential benefit you can get by going from index-controlled to table-controlled partitioning for a table space: you might be able to get rid of an index that is useful ONLY for partitioning purposes (i.e., it provides no query access path performance benefit, isn't defined on a foreign key, etc.). For the purpose of illustrating this point, consider an index-controlled partitioned table space that is partitioned on a column, called PARTITION_KEY, that is populated with a value by the program that inserts rows into the table. Not only will users never search on this column, it's probably a stinko clustering key, to boot. Issue the handy-dandy ALTER INDEX statement with the NOT CLUSTER option for this index, and alter some other index on the table to make it the clustering index. After that (and after checking that the index on the PARTITION_KEY column, now no longer needed for partitioning, is not needed for any other reason), drop the index on PARTITION_KEY. You'll save the disk space that had been occupied by the index, and insert and delete operations targeting the table will be more CPU-efficient (due to your having one less index on the table that would have to be maintained by DB2). Remember, the table space is still partitioned by the PARTITION_KEY column -- it's just that you no longer need an index to effect that partitioning scheme.
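Again as a sketch, with hypothetical index names PARTKEY_IX (the index on the PARTITION_KEY column) and ORDER_NUM_IX (some other index on the table that makes sense as the clustering index):

  ALTER INDEX PARTKEY_IX NOT CLUSTER;   -- table space becomes table-controlled partitioned
  ALTER INDEX ORDER_NUM_IX CLUSTER;     -- designate a clustering index that actually helps queries

  -- After confirming that PARTKEY_IX serves no other purpose (no query access paths,
  -- no uniqueness requirement, not supporting a foreign key), drop it:
  DROP INDEX PARTKEY_IX;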
There you go. Nice capability, eh? Could you take advantage of it? Have you taken advantage of it? I hope so. Thanks for reading.