Friday, December 28, 2012
A few days ago I received a question that I believe reflects a rather widely held misconception about DB2 for z/OS native SQL procedures. In the interest of clearing the air, I'll provide in this entry the information that I communicated to my question-asking friend.
Native SQL procedures were introduced with DB2 9 for z/OS in new-function mode (or with DB2 10 in new-function mode, for those organizations that migrated or are migrating to DB2 10 from DB2 V8). Native SQL procedures changed the DB2 for z/OS stored procedure game in a big way, as I pointed out in an entry posted several years ago to the blog I maintained while working as an independent DB2 consultant. One of the important differences between a native SQL procedure and an external stored procedure (such as a COBOL stored procedure) is what I'd call the execution locale. Whereas an external stored procedure executes at least partially in a WLM-managed stored procedure address space (and I'll explain that "partially" in a moment), a native SQL procedure executes entirely in the DB2 database services address space (also known as DBM1). That difference prompted the question to which I referred in my opening: to what address space is the CPU time associated with execution of a native SQL procedure charged?
The person who asked this question was thinking that because a native SQL procedure executes in the DB2 DBM1 address space, the CPU cost of executing a native SQL procedure would show up in the CPU consumption of DBM1. While I can see how one could arrive at that conclusion, it is not correct. The CPU cost of executing a native SQL procedure is in fact charged to the address space through which the stored procedure call came to DBM1; so, if a native SQL procedure is called by a network-attached DRDA requester, the CPU cost of executing the native SQL procedure will be charged to the DB2 DDF address space. If the caller is a CICS transaction program, the CPU cost of executing a called native SQL procedure will be charged to the CICS region in which the transaction executes.
If you think about it, this makes sense. There are two key factors to keep in mind here. First, though a native SQL procedure executes in the DB2 database services address space, it executes under the task of the application process that issued the stored procedure call (e.g., a DDF enclave SRB or a CICS subtask TCB or a batch address space's TCB). This is true of any SQL statement: the statement executes in DBM1, but under the task of the statement-issuing process, and associated CPU consumption is charged back to the address space through which the statement came to DBM1. [There is a slight exception to this rule: if a SQL statement is issued by an external stored procedure, the CPU cost of executing the statement will be charged to the address space through which the stored procedure call came to DBM1, as opposed to being charged to the WLM-managed address space in which the SQL-issuing stored procedure runs.]
The second key factor is this: a native SQL procedure is comprised wholly of SQL statements. Some of these statements will likely be data manipulation statements (e.g., SELECT, INSERT, DELETE, UPDATE), and some will likely be "logic" statements (e.g., LOOP, IF, WHILE), but it's all SQL. Yes, a native SQL procedure executes in the DBM1 address space, but SQL statements ALWAYS execute in the DBM1 address space. That's why I stated earlier that an external DB2 stored procedure generally executes "partially" in a WLM-managed stored procedure address space: if that external DB2 stored procedure issues SQL statements (and a stored procedure program doesn't have to issue SQL statements), those SQL statements will execute -- as is true of all SQL statements -- in the DB2 DBM1 address space; so, it could be said that the external stored procedure will execute partially in its assigned WLM-managed stored procedure address space (for execution of program code other than SQL statements) and partially in the DB2 DBM1 address space (for execution of SQL statements issued by the stored procedure program). A native SQL procedure executes entirely in DBM1 because it is comprised entirely of SQL statements. And where does the cost of SQL statement execution show up? In the address space through which the SQL statements come to DBM1 (with the aforementioned slight exception related to external stored procedures). And that's why the CPU cost of native SQL procedure execution gets charged to the address space through which the associated CALL statement comes to DB2.
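To make that concrete, here is a minimal sketch of a native SQL procedure (the procedure, table, and column names are hypothetical -- the point is that the body is nothing but SQL, the "logic" statements and the data manipulation statement alike):

CREATE PROCEDURE ORDER_COUNT
  (IN P_CUSTNO INTEGER,
   OUT P_COUNT INTEGER)
LANGUAGE SQL
BEGIN
  -- both the "logic" SQL (IF, SET) and the data manipulation SQL
  -- (SELECT) execute in DBM1, under the task of the calling process
  IF P_CUSTNO IS NULL THEN
    SET P_COUNT = 0;
  ELSE
    SELECT COUNT(*)
      INTO P_COUNT
      FROM ORDER_TBL
     WHERE CUSTNO = P_CUSTNO;
  END IF;
END

A CALL of that procedure arriving through DDF would have its CPU cost -- all of it -- charged to the DDF address space; the same CALL from a CICS transaction would be charged to the CICS region.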
I hope that this information will be useful to you. Best wishes to all for a successful 2013.
Monday, December 17, 2012
DB2 for z/OS: Clone Tables and "Active Versioning"
A DB2 for z/OS DBA contacted me a few days ago and posed a seemingly simple question, the answer to which ended up being a little more involved than might be expected. The DBA's query: how can one alter a table that has a clone?
The specific need in this case was to add a column to a table for which a clone had been created. As is true of other ALTER TABLE operations, addition of a table column cannot be accomplished while the clone relationship exists. The in-a-nutshell solution is to drop the clone table (via ALTER TABLE DROP CLONE), perform the desired alteration of the base table (such as ALTER TABLE ADD COLUMN), and then re-create the clone table (ALTER TABLE ADD CLONE). The thing is, the aforementioned DBA was doing this, and it wasn't working. In this blog entry, I'll explain why that was so, and how we determined the procedural steps required for success.
First, a brief level-set regarding DB2 clone tables: this refers to a feature, introduced with DB2 9 for z/OS in new-function mode, whereby one can create a "clone" of a DB2 table. If you have a base table named TBX, an associated clone could be created with a statement that would look like this:
ALTER TABLE TBX
ADD CLONE TBX_CLONE;
[You don't have to have the word "clone" in the name of the clone table, but having such an identifier is helpful when it comes to accessing a clone table -- it helps to ensure that you're accessing a clone, rather than the base table, when clone access is what you want.]
The clone table created by the statement above would look, in a structural sense, just like the base table: same columns, in the same order, with the same names and data types; same indexes; same check constraints (if any); etc. The one difference between the clone and the base table (other than the table name) is the data in the two tables, and this difference is the essence of a clone table's raison d'être. Initially, a clone table is empty (and it resides in another "instance" of the base table's table space, distinguished by the fifth-level qualifier in the name of the underlying data set, which at first will be I0002 for the clone, versus I0001 for the base). After being created, the clone table can be populated in the usual way: through SQL inserts or the LOAD utility. Adding data to, and changing data in, the clone table has no effect on the users and programs accessing data in the base table. Once the clone table contains data as desired, that data can be effectively "switched" with the data in the base table -- quickly and programmatically -- by way of the SQL statement EXCHANGE DATA. Users and applications continue to access table TBX (to use my example table-name), but now the data they're accessing is the data that -- before the "switch" -- was the data in TBX_CLONE. In effect, through the EXCHANGE DATA statement, the table that had been the clone in the relationship becomes the new base table and takes on the base table's name, while the table that had been the base table becomes the new clone and takes on the name of the clone table. To learn a little more about what happens when EXCHANGE DATA is executed, you can check out an entry that I posted a few years ago to the blog that I maintained while working as an independent DB2 consultant.
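Using the example names above, the switch would look like this:

EXCHANGE DATA BETWEEN TABLE TBX AND TBX_CLONE;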
OK, back to the question asked by the DBA. He knew that an ALTER TABLE statement (other than an ALTER TABLE DROP CLONE) could not be successfully executed for a target table with a clone; so, he dropped the clone and successfully executed an ALTER TABLE ADD COLUMN statement. After doing that, however, the DBA was not able to re-create a clone for the table -- the ALTER TABLE ADD CLONE statement failed with a -148 error code, reason code 9, indicating that the ALTER failed because "active versioning" was in effect for the table.
DB2 uses table space versioning to ensure data integrity and preserve data availability through database schema changes effected by way of ALTER TABLE statements. A new version for a table space is created by DB2 when (for example) a new column is added to a table in the table space via ALTER TABLE ADD COLUMN, and the table space is placed in an advisory REORG-pending state (AREO*). This state will be removed, and the associated schema change will be applied (i.e., physically implemented) when the table space in question is reorganized. You might think that versioning would no longer be "active" for the table space following a reorganization, but post-REORG, the ALTER TABLE ADD CLONE statement to which I've referred still didn't work for the DBA who contacted me for help: again, the error code received was -148, reason code 9 (failure due to active versioning).
Faced with this situation, the DBA ran the REPAIR utility for the affected table space, with the VERSIONS option. Well, that reset the version information for the table space in the SYSIBM.SYSTABLESPACE catalog table (the values of the OLDEST_VERSION and CURRENT_VERSION columns for the table space's row in SYSTABLESPACE were both changed to 0, indicating that versioning had NEVER been active for the table space), and after that the ALTER TABLE ADD CLONE statement could be successfully executed; however, there was a problem: the column added to the table was of the type INTEGER WITH DEFAULT NULL, but the null indicator was not set for the column (a SELECT for the table with an IS NULL predicate referencing the new column returned no results). On top of that, the value sometimes found in the new column (which was supposed to have NULL as its default value) was -2147483648.
That's pretty weird, but weird things can happen when you use REPAIR to "turn off" indicators (such as active versioning) that DB2 uses for data integrity purposes. To be sure, REPAIR has its place as a tool for DBAs, and indeed it is sometimes the ONE way to address an issue impacting data availability. Still, it's a utility with which you can shoot yourself in the foot, and when I hear of a situation in which REPAIR is used to "fix" a DB2 database problem, I am always motivated to find a means by which the problematic object indicator can be "turned off" or reset through actual removal of the circumstances prompting DB2 to set said indicator, versus sledgehammering the indicator with REPAIR. The presence of "active versioning," as indicated by non-equal values for OLDEST_VERSION and CURRENT_VERSION for a table space's row in SYSIBM.SYSTABLESPACE, is addressed by two actions: reorganizing the table space that is in AREO* status and deleting from SYSCOPY any records associated with image copies of the table space that were taken when a version other than the table's current version was in effect. My DBA friend took these two steps for the table space holding the altered table (image copy records associated with older versions of the table space were removed from SYSCOPY via execution of the MODIFY RECOVERY utility), and everything then worked as he wanted it to: a clone could be created for the altered table, and there were none of the unexpected data retrieval results for the newly added column that he'd seen after using REPAIR to force a resetting of the table space's version number.
So, if you have a table with a clone, and you want to alter that table in some way (e.g., by adding a column to the table), do the following (a sketch of the complete sequence appears after the list):
1. Drop the clone table with ALTER TABLE DROP CLONE.
2. Alter the table as needed (e.g., issue an ALTER TABLE ADD COLUMN statement).
3. REORG the table space to apply the schema change specified in the preceding ALTER TABLE statement, and take a full image copy in the process (an inline copy is taken anyway if you run an online REORG). The REORG will ensure that only one version is associated with the data in the table space, and the full image copy will allow you to delete older image copy records (those taken when prior versions were in effect for the table space).
4. Run MODIFY RECOVERY for the table space, with a specification that will remove from SYSCOPY records for image copies taken when a prior version was in effect for the table space. A specification of RETAIN LAST(1) should preserve the record for the image copy taken during the REORG (step 3 above) while deleting records for older image copies. Note that this is a special case: it's generally recommended that you retain SYSCOPY records for at least the two most recent full image copies of a table space. So, after running this MODIFY RECOVERY job with RETAIN LAST(1), for subsequent SYSCOPY "housekeeping" go back to a MODIFY RECOVERY specification that will retain records for two or more full image copies of the table space.
5. With versioning no longer "active" for the table space, you can issue an ALTER TABLE ADD CLONE statement to create a clone for the table you altered in step 2.
And leave REPAIR out of this picture -- you don't need to run it to be able to create a clone for an altered table.
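To make the five steps concrete, here is a sketch of the whole sequence, using hypothetical object names (table TBX in table space DB1.TS1, with a new INTEGER column); the ALTER statements are SQL, while the REORG and MODIFY RECOVERY control statements would run through your usual utility jobs (with whatever inline-copy data set allocations your shop uses):

-- steps 1 and 2: drop the clone, then alter the base table
ALTER TABLE TBX DROP CLONE;
ALTER TABLE TBX ADD COLUMN NEW_COL INTEGER WITH DEFAULT NULL;

-- step 3: online REORG, with an inline full image copy
REORG TABLESPACE DB1.TS1 SHRLEVEL REFERENCE

-- step 4: remove SYSCOPY records for copies tied to prior versions
MODIFY RECOVERY TABLESPACE DB1.TS1 RETAIN LAST(1)

-- step 5: re-create the clone
ALTER TABLE TBX ADD CLONE TBX_CLONE;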
Friday, November 30, 2012
Some Information About DB2 10 for z/OS Hash-Organized Tables
A DB2 for z/OS DBA recently asked me about the pros and cons of hash-organized tables -- a physical database design option introduced with DB2 10 for z/OS running in new-function mode (NFM). Assuming that others in the mainframe DB2 community are also pondering this issue, I've decided to provide what I hope will be some useful information in this blog entry.
For as long as I can remember, there have been dichotomies with respect to DB2 for z/OS table spaces: partitioned and non-partitioned, single-table and multi-table, segmented and non-segmented, compressed and non-compressed, etc. In my mind, we really didn't have such dichotomies in the realm of tables: DB2 tables were just DB2 tables (with allowances for some specific and distinguishing usage scenarios such as clone tables and materialized query tables). Now, with DB2 10 NFM, we have what I see as a bona fide dichotomy in DB2 table-world: there are hash-organized tables and there are... Are what? "Non-hash-organized tables?" That seems to me to be a pretty clunky term. I prefer to describe a table that is not hash-organized as "cluster-organized," because that speaks to the gist of the difference between the two table types. Whereas rows in a cluster-organized table are arranged within the table according to a designated clustering key (with a requisite clustering index), rows in a hash-organized table will be physically placed according to the value generated when a row's hash key is run through a hashing algorithm. By way of the hashing algorithm, rows are mapped to the table's pages in such a way as to make the physical location of rows within a hash-organized table essentially random with respect to the unhashed key values; so, if an ORDER table has a unique key called ORDERNO (and a unique key can of course be comprised of more than one column), and if that table is clustered by the ORDERNO column, then the rows for ORDERNO 1122 and ORDERNO 1123 are likely to be on the same page (assuming a well-organized table). If, on the other hand, the same table is defined to be hash-organized with ORDERNO as the hash key (a hash key must be unique), it is very unlikely that the rows for ORDERNO 1122 and ORDERNO 1123 will be located on the same page.
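In DDL terms, hash organization is requested via the ORGANIZE BY HASH clause. Here's a minimal sketch using the ORDER example (the column definitions and the hash space size are hypothetical, and in real DDL the table would need a name that isn't a reserved word, or a delimited identifier):

CREATE TABLE ORDER
  (ORDERNO INTEGER NOT NULL,
   CUSTNAME VARCHAR(40) NOT NULL,
   ORDER_DATE DATE NOT NULL)
  ORGANIZE BY HASH UNIQUE (ORDERNO)
  HASH SPACE 64 M;

An existing table can also be converted to hash organization via ALTER TABLE with ADD ORGANIZE BY HASH, followed by a REORG of the table space to put the change into effect.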
Now, this idea of rows being purposely (and very) non-clustered might seem to fly in the face of traditional DB2 for z/OS physical database design wisdom, and indeed it does. Lots of us have known for years about the importance (in terms of query performance) of "locality of reference" -- the notion that rows in a set to be retrieved by a query should be located near each other in the target table. What, then, is the upside to hash-organization of rows in a table? Well, if the set of rows to be retrieved by a query in fact consists of only one row, locality of reference doesn't matter. What does matter? Getting to that one row as efficiently as possible. And how do we measure efficiency here? GETPAGEs, my friend. In mainframe DB2 parlance, a GETPAGE is a request by DB2 to examine a table or an index page. GETPAGEs are very important when it comes to the CPU cost of query execution (as I noted in a blog entry on the topic that I posted a few years ago while working as an independent DB2 consultant): enable generation of a query's result set with a smaller number of GETPAGEs, and it's a pretty safe bet that you've reduced the CPU cost of that query's execution. So, how low can you go when it comes to GETPAGE minimization? How about one GETPAGE? And how do you get to one GETPAGE for a query? With a hash-organized table, WHEN the query retrieves a single row that is qualified by an "equals" predicate that references the table's hash key. Using the example from the preceding paragraph, if you had a query like this one:
SELECT CUSTNAME, DATE, ...
FROM ORDER
WHERE ORDERNO = 1005;
You'd be likely to see one GETPAGE if the query targeted a hash-organized table (assuming that ORDERNO is the table's hash key), and maybe four GETPAGEs if the query targeted a cluster-organized table (3 index GETPAGEs if the table had a three-level index on ORDERNO, and 1 table space GETPAGE). How does DB2 get the row requested by the query with only one GETPAGE when the target table is hash-organized? Simple: it takes the hash key value referenced in the query's predicate, runs it through the hashing algorithm, and thereby obtains the row's location in the table. Due largely to the difference in GETPAGE activity, the query targeting the hash-organized table would consume less CPU time when executed. How much less will depend on several factors, but you can expect the CPU efficiency of a query with a 1-row result set with an "equals" predicate referencing a hash-organized table's hash key to be even better than retrieval of the same row from a cluster-organized table via index-only access (heretofore thought of as the cheapest way to get data from a table, and sometimes pursued in the extreme as an access path, with virtually every column in a table being included in an index key for this purpose). And, speaking of indexes, you don't even need to have an index on a hash-organized table's hash key, as I pointed out in an entry posted to this blog a couple of months ago.
Now, in the first line of the paragraph above this one, I mentioned that you're likely to see one GETPAGE for the query targeting the hash-organized table. That suggests that in some cases you could see more than one GETPAGE. When would that happen? When the target row is in fact not located in the page mapped to via the hash of the hash key value. That would be the case if on insert of the row the hash-mapped page turned out to be full. In such a situation the row is placed in the table's hash overflow area, and it would be located when needed via a special hash overflow index that contains entries only for rows located in the hash overflow area. Note that even retrieval of overflow rows should be quite efficient, as the overflow index should have a very small number of levels (since it should be pretty sparse).
This overflow business gets at one of the downsides of hash-organized tables: they'll generally take up 20-50% more disk space versus equivalent cluster-organized tables. This generosity of space is necessary to allow the large majority of rows to be located in pages identified via the hashing algorithm, and for the overflow area that accommodates rows that can't be so located. The DB2 REORG utility offers you an option -- a good one to take, by the way -- whereby DB2 will calculate, based on information in the real-time statistics tables in the catalog, the size of a hash-organized table's hash area (this is the fixed-size area into which row locations are mapped via the hashing algorithm). When REORG does this, it goes for a hash space size that will result in 5-10% of the table's rows going into the overflow area. If you want even fewer rows in the overflow area, you can increase the size of the table's hash space via ALTER TABLE (and a REORG to put the change into effect), but this will of course consume more disk space. You can use the real-time stats tables yourself to keep an eye on overflow space usage, and schedule REORGs accordingly (TOTALROWS in SYSTABLESPACESTATS tells you how many rows are in the hash-organized table, and TOTALENTRIES in SYSINDEXSPACESTATS tells you how many entries there are in the hash-organized table's overflow index).
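For example (the database, table space, and index identifiers below are placeholders -- the overflow index is system-generated, so you'd plug in its actual name), two simple queries against the real-time statistics tables provide the inputs for an overflow percentage:

SELECT TOTALROWS
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DB1'
   AND NAME = 'TS1';

SELECT TOTALENTRIES
  FROM SYSIBM.SYSINDEXSPACESTATS
 WHERE CREATOR = 'SCHEMA1'
   AND NAME = 'HASH_OVFL_IX';

If TOTALENTRIES grows beyond 5-10% of TOTALROWS, that's a sign that the hash space is undersized and that a REORG (perhaps preceded by an ALTER to enlarge the hash space) is in order.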
The other downside to hash-organized tables is degraded performance for sequential data access. This is to be expected, since rows in the table are unclustered in the extreme if you think of them in terms of some clustering key. As efficient sequential access tends to be important for the performance of queries that have relatively large result sets, hash-organization of data is clearly not the best choice for all tables. In fact, my thinking at this early stage of the game is that cluster-organization of data will continue to be the right choice for most of your tables, with hash organization being appropriate for selected tables for which optimal performance for queries with single-row result sets qualified by "equals" predicates referencing a particular unique key is of prime importance (and this performance benefit extends to searched updates and deletes, as well). Hash organization of data in a table is not a DB2 physical database design panacea, but it sure is great to have in your tool box. Check it out and decide where (or if) this capability could be beneficially leveraged in your environment.
Friday, November 9, 2012
DB2 for z/OS Data Sharing: Why CF LPARs need DEDICATED Engines
I was recently involved in an interesting DB2 for z/OS performance analysis effort. An organization had moved a DB2 data sharing group from one Parallel Sysplex mainframe cluster to another (if you aren't familiar with DB2 data sharing, you can check out an introductory blog entry that I posted a few years ago while working as an independent consultant). Right after that move, the CPU consumption associated with the DB2 workload on the system jumped by about 30%. DBAs diligently searched for the cause of this DB2 CPU cost spike. Was it related to an increase in the volume of DB2-accessing transactions executing in the data sharing group? No -- the size of the workload was about the same after the relocation of the DB2 subsystems as it had been before. Had SQL access paths changed, causing an increase in GETPAGE activity (something else I blogged about in my independent consultant days)? No -- GETPAGEs per transaction had stayed steady. A friend of mine supporting this system asked me: what could it be? How is it that the same workload is chewing so many more CPU cycles?
I had a hunch, and asked to see coupling facility activity reports (generated by z/OS monitor software) from before and after the relocation of the DB2 data sharing group. Sure enough, service times for synchronous requests to the DB2 lock structure and group buffer pools increased dramatically from the "before" to the "after" period -- I'm talking 300% - 500% increases. Here's why that matters: DB2 data sharing technology delivers major benefits in the areas of availability and scalability, but those benefits are not free. One cost factor is the increase in CPU consumption related to SQL statement execution in a data sharing system versus a standalone DB2 environment. That cost, which can be thought of as the CPU overhead of data sharing, is generally rather low -- typically about 10% in a well-tuned system. A very important characteristic of a well-tuned DB2 data sharing system is extremely rapid servicing of synchronous requests to the DB2 lock structure and the group buffer pools (GBPs) that are housed in coupling facility (CF) LPARs (another CF structure that supports DB2 data sharing, called the shared communications area, is accessed much less frequently than the lock structure and the group buffer pools, and so is usually not discussed in a performance context). The vast majority of requests to the lock structure and the GBPs are synchronous, and for performance purposes it is imperative that these requests be serviced with great speed, because when it's synchronous CF requests we're talking about, elapsed time directly impacts CPU time.
"How is that?" you might ask. "Why would the elapsed time of a CF request affect the CPU cost of that request?" I'll tell you. A synchronous CF request is not like a disk I/O request (or an asynchronous CF request), in that a mainframe engine driving an I/O request can go and do other things while waiting for the I/O request to complete (an interrupt will let the system know when an I/O operation has been completed, and any task that was suspended awaiting the completion of the I/O will then be eligible for re-dispatching). In the case of a synchronous CF request (such as a request for a global X-lock on a table page that is to be updated), the mainframe engine driving the CF will do nothing until the request has completed. The waiting mainframe engine is conceptually similar to a runner in a relay race awaiting the hand-off of the baton from a teammate. This special type of wait situation is known as processor "dwell" time. The cost of "dwell" time -- the waiting engine is, for all practical purposes, busy while it's dwelling -- is charged to the application process for which the synchronous CF request is executed. The result is higher CPU consumption for SQL statement execution. Fortunately, as previously mentioned, this added CPU cost is typically quite modest. Data sharing CPU overhead tends to be low in large part because synchronous requests to the DB2 lock structure and GBPs are usually serviced VERY quickly -- it's common to see service times for synchronous requests to GBPs that average in the low double digits of microseconds, and synchronous lock structure requests are sometimes serviced in the single digits of microseconds, on average. That's a good two orders of magnitude better than the service time for a DB2 disk read -- even a read from disk controller cache.
So, you can imagine what would happen if synchronous CF request service times for a DB2 data sharing group were to jump by several hundred percent: processor dwell time would shoot up, and so would the CPU consumption of the DB2 workload. This is what happened in the situation to which I've referred in this blog entry. Having established that, I then looked to identify the cause of the elongated synchronous CF request service times. Again on a hunch, I looked in the CF activity reports that I'd received for the section with information about the configuration of the CF LPARs on the Parallel Sysplex. There it was. In the "before" report, for each of the two CF LPARs in the Sysplex, I saw the following:
LOGICAL PROCESSORS: DEFINED 1 EFFECTIVE 1.0
SHARED 0 AVG WEIGHT 0.0
And this is what the same part of the "after" report looked like:
LOGICAL PROCESSORS: DEFINED 1 EFFECTIVE 0.9
SHARED 1 AVG WEIGHT 90.0
Bingo. Now, why is that slight change such a big deal? The CF LPARs were both running at a low average rate of utilization (under 10%), so wouldn't it be OK to "split" an engine between production and test CF LPARs, with the production CF LPAR getting 90% of that engine's processing capacity? NO. z/OS LPARs (and z/VM LPARs) can share mainframe engines very effectively with minimal overhead. Not so coupling facility control code (CFCC), the operating system of a CF LPAR. To get good performance in a high-volume environment (and with production DB2 for z/OS data sharing groups you're likely to see thousands of synchronous CF requests per second), it is VERY important that the CF LPARs have dedicated engines. I communicated this message to the DB2 and z/OS teams at the organization that had been grappling with the problem of elevated DB2-related CPU utilization, and they implemented a change that effectively gave the production CF LPARs one DEDICATED engine apiece. The result? CPU utilization associated with the DB2 application workload dropped dramatically -- to a level that was even lower than it had been prior to the relocation of the DB2 data sharing group from the one Parallel Sysplex to the other (lower rather than equal, because with the relocation of the data sharing group there was also a move to internal versus external CF LPARs, and a large percentage of the synchronous CF requests were between a z/OS LPAR and a CF LPAR on the same mainframe "box" -- these requests flowed over virtual CF links that deliver tremendous performance).
The moral to this story: for your production CF LPARs, make sure that engines are DEDICATED, not shared.
With all this said, I'll offer one caveat, of which Gary King, one of IBM's top System z performance experts, recently reminded me. IF a test Parallel Sysplex has a REALLY LOW level of CF request activity, it MIGHT be possible to let production CF LPARs share engines with the test CF LPARs and still get good synchronous CF request service times in the production environment (though still not as good as you'd see with dedicated CF LPAR engines) IF you turn dynamic CF dispatching (DCFD) ON for the test CF LPARs and leave DCFD OFF (the default) for the production CF LPARs (DCFD is a CF configuration option). In that case, the test CF LPARs would generally use around 1% of the CF engines (even if you had the engine weightings split 90% for production and 10% for test). The test CF LPARs would probably get lousy service times for synchronous CF requests, but that could be OK in a low-volume test environment. There is a risk associated with this approach: if the test Parallel Sysplex got to be fairly busy, so that the test CF LPARs more fully utilized their 10% slice of the shared CF engines, the production CF LPARs would be negatively impacted, performance-wise. The safest route, then, for a production Parallel Sysplex is to go with dedicated engines for the CF LPARs (by the way, internal coupling facility engines, or ICFs, do not impact the cost of your z/OS software running on the same mainframe).
Lots of us were taught as kids that sharing is a good thing. As I've pointed out just now, there is at least one exception to that rule.
Wednesday, October 24, 2012
DB2 for z/OS: Have You Checked Your Log RBA Value Lately?
I'm in Las Vegas this week for the IBM Information on Demand 2012 conference. At this year's IOD event I've particularly enjoyed learning more about DB2 11 for z/OS, the beta program for which is expected to begin in March of 2013. DB2 11 will deliver a lot in the way of new features and capabilities. One of the big news items here is the extended RBA that will be available in DB2 11 systems running in new-function mode (the RBA, or relative byte address, is an identifier for records written by a DB2 subsystem to its transaction log). Whereas the current RBA is a six-byte hexadecimal value, providing 2^48 bytes (256 TB) of log capacity, the DB2 11 extended RBA is a ten-byte value, allowing for the logging of a yottabyte of data-change records. Not familiar with the term "yottabyte"? Neither was I until a couple of days ago. It's 1000 zettabytes, or a trillion terabytes. That's a lot of logging capacity.
A larger maximum RBA value is important because a few DB2 for z/OS sites have bumped up against the 2^48-byte limit of the current RBA. Actually, what these folks bump up against is a practical RBA value limit that's a little less than 2^48. See, the RBA value would theoretically max out at x'FFFFFFFFFFFF' (in hexadecimal notation); however, actually reaching that value would prevent a DB2 subsystem from doing much of anything, and that would not be good for data integrity protection. So, what happens is that DB2 will automatically shut down if the log RBA value reaches x'FFFF00000000'. This gives the DB2 subsystem a little room at the end of the log that it will use, when restarted, in resolving any incomplete units of recovery and otherwise ensuring that data is left in a consistent state (an attempt to restart the subsystem in "regular" mode after the automatic shutdown will result in DB2 going through "restart light" processing and shutting down again, after which it could be restarted again with ACCESS(MAINT) for any further clean-up tasks).
That sort of automatic shutdown is of course something you'd want to avoid, and fortunately you can. First of all, DB2 will issue message DSNJ032I, with an "alert level" of WARNING, when the log RBA value reaches x'F00000000000'. This would give you some time to plan for dealing with the approaching RBA limit (DSNJ032I is also issued, with an alert level of CRITICAL, when the RBA value reaches x'FFFF00000000' -- the value that triggers automatic shutdown). Here is a piece of advice for you: do NOT wait until your DB2 subsystem issues DSNJ032I with the WARNING alert level. Instead, check the RBA value for each of your DB2 for z/OS subsystems now. This can be quickly and easily done by issuing the command -DISPLAY LOG on each subsystem. In the command output you'll see a field, H/W RBA, that shows the highest RBA value written by the DB2 subsystem to its transaction log. Chances are, the value that you see in this field will be way lower than x'F00000000000', the WARNING threshold. A colleague of mine who has a lot of knowledge in this area, Paul Bartak, has said that a high-RBA value of x'C00000000000' or greater should be a trigger that gets you into RBA limit planning mode. If what you see for H/W RBA in -DISPLAY LOG output for a subsystem is less than x'C00000000000', just keep an eye on the situation through periodic issuance of -DISPLAY LOG.
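For example, from the console (assuming a hypothetical command prefix of -DB2A):

-DB2A DISPLAY LOG

In the output, compare the H/W RBA value, as a hexadecimal number, against the x'C00000000000' and x'F00000000000' thresholds mentioned above.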
If the high RBA value for a DB2 subsystem in your environment is north of x'C00000000000', then, as my co-worker Paul suggested, you should start developing a plan of action. What you'll end up doing will depend on whether you run DB2 for z/OS in standalone mode or in a data sharing configuration on a Parallel Sysplex. In the standalone case, you will follow a procedure that will result in the DB2 subsystem's RBA value being reset to zero. This procedure is documented in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA value in a non-data sharing environment" (you can access the DB2 for z/OS product documentation, including the Administration Guide, in either PDF or HTML form, at http://www-01.ibm.com/support/docview.wss?rs=64&uid=swg27011656). A number of mainframe DB2 sites have successfully gone through this documented RBA-reset process -- it works.
If you have a DB2 subsystem that is approaching the log RBA limit, and that subsystem is a member of a data sharing group, there are a couple of options available to you. One alternative is to reset the member subsystem's RBA -- a less involved process in a data sharing versus a non-data sharing environment. The other choice involves starting up a new member of the data sharing group and permanently quiescing the member with the elevated RBA value (with DB2 10 in new-function mode, one can delete a member from the data sharing group -- until then, the subsystem ID of the permanently quiesced member will continue to show up in the text of some DB2 messages, but that's generally not a big deal). The DB2 for z/OS Data Sharing Planning and Administration Guide, under the heading, "Removing members from the data sharing group," provides information on permanently quiescing a member subsystem. Note that if the new member -- the one that will replace the subsystem that is to be permanently quiesced -- will be the eighth to join the data sharing group, entries in the lock table (part of the lock structure in a coupling facility) will have to be 4 bytes apiece, versus 2 bytes each. To maintain the same number of lock table entries in that case (good for holding the line on false global lock contention), you'd want to increase the size of the lock structure -- which should already be at a power of 2, such as 64 MB -- to the next higher power of 2.
Whether your DB2 subsystem is operating in standalone or data sharing mode, there is information in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA," on using output from the print log map utility (aka DSNJU004) to estimate the amount of time you have before a subsystem's RBA value reaches the critical threshold.
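If you've not run print log map before, it's a simple one-step batch job. Here's a minimal JCL sketch (the load library and BSDS data set names are placeholders -- substitute your own):

//PRTLOGMP EXEC PGM=DSNJU004
//STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
//SYSUT1   DD DISP=SHR,DSN=DBP1.BSDS01
//SYSPRINT DD SYSOUT=*

SYSUT1 points to the subsystem's bootstrap data set, and the log map report is written to SYSPRINT.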
Something else for those of you running DB2 in a data sharing group: do NOT think that RBA values no longer matter, just because LRSN values (short for log record sequence number) are used for data recovery operations. RBA values continue to go up for data sharing members as they write records to their log data sets, and the automatic shutdown occurs for a data sharing member just as it does for a non-data sharing subsystem when the RBA value reaches x'FFFF00000000'. I don't want you to be caught napping in this regard.
Speaking of LRSN values, those are also extended beginning with DB2 11 in new-function mode, to the tune of giving you something like another 30,000 years of value growth (LRSN values are derived from the store clock timestamp, and the current LRSN field can take you to the year 2042). Given the 30 years of value increase that can be accommodated by the current-format LRSN, will any organizations be in a hurry to implement the DB2 11 extended LRSN (done, as for the extended RBA, through a conversion of the bootstrap data set and online REORGs of database objects)? Some will be. Why? Because on occasion a company has had to add a delta to the LRSN of the originating member of a DB2 data sharing group in order to have a starting LRSN value that was higher than the already-elevated RBA value of that originating DB2 member. You can look at print log map utility output to see if you have such an LRSN delta in your environment. If you don't -- and that's probably the case -- then the LRSN value limit is still a long way off for you. If there is a delta, you'll be more interested in getting to the DB2 11 extended LRSN sooner rather than later.
So, to recap: you should periodically check the RBA values of your DB2 subsystems, to ensure that they are well below the critical threshold. If the RBA value for a subsystem is seen to be high (with a 'C' in the high-order position), DON'T PANIC. Prepare to head off the RBA limit situation before it occurs, and get ready to put all this in your rear-view mirror once you get to DB2 11 new-function mode.
Sunday, October 14, 2012
So, Where do You Specify the IP Address of YOUR DB2 for z/OS Subsystem?
If your answer to the question posed in the title of this blog entry is, "In a PORT reservation statement in our z/OS LPAR's TCP/IP profile data set," that's NOT what I want to hear. It's true that for a long time this was the standard way to assign an IP address to a DB2 for z/OS subsystem, and it would look something like this:
PORT 446 TCP DBP1DIST BIND 111.222.333.444 ; DRDA SQL PORT FOR DBP1
This was the way to go because DB2 didn't give you much of a choice in this regard. You can also set things up so that a DB2 subsystem's IP address is specified in a domain name server, but that's not the answer I'm looking for, either. No, if I ask you where the IP address of your DB2 subsystem is specified, what I want to hear is this: "In the bootstrap data set, Mr. Catterall." Of course I'm kidding about the "Mr. Catterall" bit, but the BSDS is indeed where DB2's IP address should be. I'll tell you why momentarily.
Specifying the IP address of a DB2 subsystem in the associated BSDS is something that only recently (relatively speaking) became possible -- the capability was delivered with DB2 9 for z/OS. In a DB2 9 environment, an IP address can be added to a subsystem's communications record in the BSDS by way of the change log inventory utility, aka DSNJU003. The input statement for an execution of DSNJU003 that would place an IP address in the BSDS would be of this form:
DDF IPV4=111.222.333.444
Note that the address of the subsystem could also be specified in the BSDS in IPv6 form (colon hexadecimal). Note, too, that in a data sharing system you can place in the BSDS the IP address of the data sharing group as well as the address of the individual member subsystem. Also, in a data sharing environment it is highly recommended that the address of a DB2 subsystem be a dynamic virtual IP address, or DVIPA (that way, if a failed DB2 member is restarted on a different z/OS LPAR in the Parallel Sysplex, clients utilizing DRDA two-phase commit protocol will be able to re-connect to the member to resolve in-doubt threads).
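For a data sharing member, then, the DSNJU003 input statement might look something like the following (placeholder addresses, in keeping with the example above; the member's DVIPA goes in IPV4 and the group's DVIPA in GRPIPV4 -- check the change log inventory utility documentation for the full set of DDF statement keywords):

DDF IPV4=111.222.333.1,GRPIPV4=111.222.333.100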
If you put a DB2 subsystem's IP address in the associated BSDS, a PORT reservation statement for the subsystem in the LPAR's TCP/IP profile data set would look like this:
PORT 446 TCP DBP1DIST ; DRDA SQL PORT FOR DBP1
That's right: no IP address for the DB2 subsystem in the PORT statement. When a DB2 subsystem's IP address is recorded in that subsystem's BSDS, placing the address as well in a PORT statement for the DB2 subsystem is not only unnecessary, it's ill-advised. Why? Because if the IP address is recorded in both places (the BSDS and the PORT statement), the PORT statement wins. Why is that important? Because if z/OS Communications Server gets a DB2 subsystem's IP address from a PORT reservation statement, you miss out on the benefits of INADDR_ANY functionality. INADDR_ANY is a TCP/IP capability whereby, in a DB2 for z/OS context, a DB2 subsystem can accept requests to its port on any IP address. What that gives you is configuration and operational flexibility. This is particularly advantageous in a DB2 data sharing environment, and when DB2 Connect is running under Linux on System z and utilizing HiperSockets for access to DB2 for z/OS.
There are other goodies associated with recording a DB2 subsystem's IP address in the BSDS instead of in a PORT reservation statement: you don't have to define a domain name for the DB2 subsystem to TCP/IP, and you can use SSL (Secure Sockets Layer) for DRDA communications (not possible with BIND-specific TCP/IP PORT statements).
You've probably done a lot over the years to modernize your DB2 for z/OS environment. Getting a DB2 subsystem's IP address into the BSDS and out of the PORT reservation statement is another such move. Take this action and position your DB2 subsystem for enhanced client-server application availability, agility, and functionality.
Tuesday, September 25, 2012
Linux on System z: a Better Place for Your DB2 for z/OS Client-Server Applications
In DB2 for z/OS circles, there has been an evolution of attitudes concerning Linux on System z. Back around 2003, when I was on the user side of the DB2 community, I was part of a team charged with developing a plan for the simplification and streamlining of my company's IT infrastructure. One of the mainframers on this team had a suggestion: why not add a few IFL engines (more on this to come) to our System z servers and use those to consolidate a lot of our Linux systems (which were running primarily on x86 boxes)? I'll confess that this suggestion seemed a little weird to me, a little "out there" -- Linux on a mainframe? Really? The person who was advocating running Linux on System z made some good points, but the idea didn't get far and was set aside. Linux was kind of new to us at the time, and while we were ready to increase the use of this OS in our environment (and we did), we weren't ready to even seriously consider running it on mainframes.
Fast forward to 2010. A few months after rejoining IBM (I'd previously worked for Big Blue from 1982 to 2000), I was working with people at an insurance company where Linux on System z was being put through a trial run. What they were willing to run under Linux on System z was a DB2 Connect gateway, not a more high-profile workload such as an application server or a business analytics tool -- an indication of the iffiness of the Linux-on-z solution as seen by the firm's IT leaders.
Jump now to the present, and it seems to me that a corner has been turned. Over the past couple of years especially, the thinking of many IT people -- both mainframers and Linux specialists -- toward Linux on System z has gone from cautious consideration to genuine enthusiasm. All kinds of folks have recently come to the realization that Linux on a mainframe is not some kind of sideways, six-of-one/half-dozen-of-the-other move, in terms of where Linux runs in the enterprise; rather, it's a move that upgrades the IT infrastructure in multiple ways, delivering enhanced performance, cost efficiency, availability, and security. A big chunk of installed System z MIPS, right now, is IFL capacity, and that slice of the pie is growing at a rapid clip (IFLs -- short for Integrated Facility for Linux -- are System z processors that are dedicated to Linux workloads). If I were you, I'd be looking to ride that wave.
In terms of factors that are driving the upsurge in Linux on System z usage, two really stand out: server consolidation and front-ending DB2 for z/OS. That first factor applies to Linux systems that may or may not have any connection to DB2 for z/OS subsystems. A wide variety of Linux workloads can be more cost-effectively run on mainframes versus distributed systems servers. Want to know what one of the big money savers is here? Think software licensing costs: the price of commercial software that runs under Linux is usually based on the number of processors on the system (or the partition) on which the OS is running, and plenty of organizations have found that moving Linux workloads to mainframes results in a decrease in the number of engines needed for those workloads, and that allows for a reduction in licensing costs for vendor-supplied software running in the Linux images.
That reduced-cost advantage also comes into play with regard to the second Linux-on-z driver, DB2 for z/OS front-ending. So do advantages in the areas of performance and security. When a Linux workload -- perhaps an application server running Java programs, or a vendor-supplied ERP or CRM package, or a business analytics tool -- is the client to a DB2 for z/OS server (i.e., if the Linux-based application is a DB2 DRDA requester), that back-and-forth network traffic can flow via HiperSockets. That's memory-to-memory data transfer, folks, and it screams. This "internal LAN" set-up also offers a security benefit, as less of your network is physically exposed to those who might want to hack it (as Jim Elliott, one of IBM's prominent Linux-on-z experts, likes to put it, "The best LAN is one with no wires").
At this point, I would be remiss if I didn't bring z/VM into the picture. Can you run Linux on System z without z/VM? Absolutely, but z/VM delivers tremendous virtualization, management, availability, and security capabilities for your Linux-on-z environment. On the security front, RACF for z/VM brings the same peerless validation and auditing features that have made RACF for z/OS famous (note, too, that the cryptographic hardware on System z servers can be exploited for Linux workloads). High availability can be taken higher via the live guest relocation capability of a clustered z/VM configuration, which enables you to move a running Linux image from one z/VM LPAR to another.
Back to dollar (or your local currency) savings: I haven't yet given you the whole picture with regard to the cost-efficiency advantages of Linux on System z. Those IFLs I mentioned are not only great performers, allowing you to do more Linux work with fewer engines -- they are also very attractively priced (and of course they have no impact on the licensing costs for z/OS software running on the same System z server). Want to get even more bang for your bucks? Check out the IBM System z Solution Edition for Enterprise Linux, an offering for existing mainframe systems that combines IFLs, memory, z/VM, and maintenance in a package that's a terrific deal.
Are you wondering who in your shop would administer Linux systems on a mainframe? That's easy: your current Linux system administrators would do the job just fine. During a Linux on System z proof of concept on which I worked a couple of years ago, I recall that a Linux pro in the client's IT department said something to this effect: "It's just Linux." Bingo -- and it's available in distributions from SUSE and Red Hat. What if your organization decides to bring in z/VM with Linux on System z (good move)? Who will take care of that OS, if it's new to your company? Answer: your z/OS systems programmers. First, there's a good chance that one or more of them have some VM experience (many mainframers do). Second, VM has been significantly simplified in recent years (for a "simple" install of z/VM, the instructions take up all of one page).
I'll close by reiterating the fact that Linux on System z is not some Johnny-come-lately solution. It was announced in December of 1999, and it's been regularly enhanced since then. Yes, it took a while for this System z configuration option to catch fire, but it is certainly hot now. If you have Linux systems that connect to DB2 for z/OS, running those Linux images on the same mainframe where the DB2 for z/OS data server runs can take the core of your IT infrastructure to new levels of efficiency, performance, scalability, security, and manageability. Get smart, and check it out.
Monday, September 17, 2012
DB2 10 for z/OS: New Options for Trimming and Upgrading Indexes
First, some explanation as to what I mean by "trimming and upgrading" indexes. My use of "trimming" refers to reducing the number of indexes defined on tables, through elimination of indexes that aren't providing value. Having fewer indexes on a table is good for cost efficiency, as every index increases the CPU cost of every insert and delete operation targeting the table (and every update of an indexed column), increases the cost of execution of several DB2 utilities (including LOAD, REORG, and RUNSTATS), and drives up disk space utilization (though that cost can be mitigated through the index compression capability introduced with DB2 9 for z/OS, about which I blogged a couple of years ago while working as an independent DB2 consultant). That's clear enough, but you may wonder why an index serving no useful purpose got created in your environment in the first place. Perhaps an index created in anticipation that it would be needed for good application performance turned out to be unnecessary for that purpose. Another possibility is that an index that had been necessary for technical reasons became unnecessary (or potentially so) through some enhancement delivered with a recent release of DB2. I'll provide more information on both of these scenarios in the remainder of this blog entry.
The index "upgrade" concept is related to index "trimming," in that identification of an unnecessary index presents you with a couple of options: one is to eliminate the useless index (to reduce CPU and disk costs, as mentioned); the other is to replace that index with one that is useful -- probably in terms of reducing CPU and elapsed time for some important queries (or searched updates or deletes). I'd consider that action to be an index "upgrade" because you've improved application performance while holding the line with respect to the number of indexes defined on a table (one useless index out, one useful index in).
Whether your aim is index trimming, index upgrading, or both, the essential first step is identification of indexes that can be eliminated because they no longer serve a useful purpose (if indeed they ever did). The last several releases of DB2 for z/OS have made this task easier and more fruitful. I'll first provide a quick review of enhancements related to index trimming and upgrading that were introduced with DB2 V8 and V9, then I'll get into new opportunities for tightening up and/or boosting the effectiveness of your index configuration that were made available through DB2 10 for z/OS. So, the review:
- Table-controlled table partitioning (DB2 V8). Though this enhancement was delivered years ago, a lot of mainframe DB2 shops have yet to leverage its value (as I pointed out in an entry I posted to this blog a few months ago). Basically, it comes down to this: when you convert an index-controlled partitioned table space to a table-controlled partitioned table space, the formerly partition-controlling index may end up becoming useless. There's your opportunity for index trimming or upgrading.
- The LASTUSED column of the SYSIBM.SYSINDEXSPACESTATS catalog table (DB2 9). Those "I thought we'd need it, but maybe we don't" indexes were once pretty hard to find. Looking for package dependencies on an index will tell you when one is not used by static SQL statements, but what about dynamic SQL? The LASTUSED column of the real-time statistics table SYSIBM.SYSINDEXSPACESTATS makes identification of indexes that are just taking up space much easier than before, and that multiplies your index trimming and upgrading opportunities (a sample query follows this list). I've blogged about this DB2 feature several times, most recently in an entry posted to this blog a couple of years ago.
- Index-on-expression (DB2 9). Index upgrading is all about replacing an unnecessary index with one that does you some good. DB2 9 broadened the "does you some good" landscape by introducing a new category of indexes, that being indexes defined not just on columns, but on column expressions, such as SUBSTR(DEPT_NAME, 2, 3). With such an index, a predicate that had been non-indexable can be made indexable, leading to potentially dramatic query performance improvement (an example follows this list). I blogged about this DB2 enhancement a few years ago.
- The XML data type (DB2 9). Just storing XML data in a column defined with the XML data type (as opposed to, for example, the VARCHAR or CLOB data type) can deliver greatly improved performance with respect to querying said XML data. Create an index on that XML column (via XPath expressions allowed in index definitions), and you can REALLY speed things up (particularly when the XML documents are pretty large). This is another of those index upgrade opportunities -- and keep in mind that it pertains not only to retrieval of XML data, but to retrieval of non-XML data based on a predicate that references an XML column (an excellent reference for XML data in a mainframe DB2 environment is the IBM "red book" titled Extremely pureXML in DB2 10 for z/OS -- most of the information in which is also applicable to a DB2 9 system).
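Two quick illustrations of the DB2 9 items above. First, a LASTUSED query of the sort I have in mind (column names are as I recall them from the DB2 10 catalog, and the 365-day threshold is just an example -- verify both against your system):

SELECT DBNAME, CREATOR, NAME, LASTUSED
  FROM SYSIBM.SYSINDEXSPACESTATS
  WHERE LASTUSED < CURRENT DATE - 365 DAYS
  ORDER BY LASTUSED
  WITH UR;

Keep in mind that a never-used index will show the default LASTUSED date of January 1, 0001, so those will float to the top -- and that an "unused" index might still be needed for uniqueness enforcement, so check before you drop. Second, an index-on-expression sketch using the SUBSTR expression mentioned above (table and index names invented):

CREATE INDEX XDEPT1
  ON DEPT (SUBSTR(DEPT_NAME, 2, 3));

With that index in place, a predicate such as WHERE SUBSTR(DEPT_NAME, 2, 3) = 'ABC' becomes indexable.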
OK, now for the good stuff delivered through DB2 10 for z/OS that can take your index trimming and upgrading efforts to new levels of effectiveness. I'll start with what are called index INCLUDE columns. This is really big, in that it created a whole new category of indexes that could be eliminated, thereby paving the way for new index trimming and upgrading actions. The concept of index INCLUDE columns is simple, and best explained through an example. Suppose you have a table for which the combined values of columns C1 and C2 in each row have to be unique. To enable DB2 to enforce that requirement, you defined a unique constraint on (C1, C2), and created a unique index on (C1, C2). Suppose further that you wanted to provide index-only access for queries that reference only columns C1, C2, C3, and C4 of the table, or that you wanted to enable sort avoidance for queries that contain ORDER BY C1, C2, C3, C4. In a pre-DB2 10 environment, you addressed this second need by creating an additional index on columns C1, C2, C3, and C4. DB2 10 (in new-function mode) provides a new option: you can alter the unique index on (C1, C2) to INCLUDE (C3), and again to INCLUDE (C4) (for ALTER INDEX, only one column can be named in an INCLUDE specification -- n columns are included in an index through n executions of ALTER INDEX). In doing that, you'd end up with one index that could accomplish two purposes: 1) enforcement of uniqueness for (C1, C2) value pairs, and 2) query performance enhancement (through enablement of index-only access for queries referencing only columns C1, C2, C3, and C4; and sort avoidance for queries containing ORDER BY C1, C2, C3, C4 ). Voila! That index on (C1, C2, C3, C4) is now unnecessary and can be eliminated, clearing the way for index trimming or index upgrading. Note that INCLUDE is an option for CREATE INDEX as well as ALTER INDEX statements (in the case of CREATE INDEX, several columns can be listed in the INCLUDE part of the statement, so INCLUDE (C3, C4) is OK, syntax-wise). Whether it's an ALTER INDEX or a CREATE INDEX statement, the aim is the same: use one column (or set of columns) for enforcement of a UNIQUE constraint (or for a primary key), and with the same index include another column (or set of columns) that are "along for the ride" to provide better-performing query access paths. The extra columns added to an index via INCLUDE do NOT factor into uniqueness determination with regard to the "core" key on which the unique index is defined -- that's what enables dual-purpose indexing.
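To make the C1/C2 example concrete, here's a sketch of the statements involved (table and index names invented -- and do check the DB2 10 SQL Reference for the fine points of the syntax):

-- At index creation time, the include columns can be specified together:
CREATE UNIQUE INDEX IX1 ON T1 (C1, C2) INCLUDE (C3, C4);

-- Upgrading an existing unique index on (C1, C2), one column per ALTER:
ALTER INDEX IX1 ADD INCLUDE COLUMN (C3);
ALTER INDEX IX1 ADD INCLUDE COLUMN (C4);

Note that altering an index in this way may leave it in a pending state that calls for a REBUILD INDEX or REORG before it's fully usable, so plan the change accordingly.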
Here's another index trimming/upgrading opportunity provided courtesy of DB2 10: hash-organized tables. This is a feature that has generated a good bit of buzz in the DB2 community, but some folks have overlooked the tie-in to index trimming and upgrading. Here's how hash organization of data in a table works: when a table is created (in a DB2 10 new-function mode environment) with ORGANIZE BY HASH (or when an existing table is changed via ALTER TABLE to ADD ORGANIZE BY HASH), assignment of rows to table pages is not based on a clustering key; rather, the hash key value for a row (this is a unique key specified by the user) is run through a hashing algorithm whose output determines the target page. Generally speaking, when people think of hash-organized tables, they think of the performance boost that can be achieved when a query referencing the hash key in an equals predicate can get a row from DB2 with a single GETPAGE. That's great (for single-row access -- not so much for sequential data access), but here's something else to consider: if you're looking to alter an existing table to be hash-organized, and the unique key that you're going to use for hash-organization purposes is already defined as unique and has a corresponding uniqueness-enforcing index (and this applies as well to a primary key), that unique index will likely be unnecessary following the change to hash organization. Why? Because DB2 does not need an index on the hash key to ensure uniqueness of that key -- it can accomplish duplicate-value checking very efficiently using just the table (remember that the location of a row in a hash-organized table can be directly ascertained using the value of the hash key in the row). So, you can probably drop the index that had been previously created on the key that is the hash key for the hash-organized table, and there you have that new opportunity for index trimming or upgrading. Now, I say that you can probably drop that unique index, because it's possible that you'd want to keep it for queries that 1) reference only the column or columns in that index and 2) involve a scan of index values (recall that hash-organized tables are not good for the performance of data retrieval operations that involve sequential access).
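Here's a sketch of both routes to a hash-organized table (names and the HASH SPACE value are invented for illustration):

-- At table creation time:
CREATE TABLE ORDER_HIST
  (ORDER_NO   CHAR(10) NOT NULL,
   ORDER_DATE DATE)
  ORGANIZE BY HASH UNIQUE (ORDER_NO)
  HASH SPACE 4 G;

-- Alternatively, for an existing (non-hash) table of the same shape
-- (the change is materialized by a subsequent online REORG):
ALTER TABLE ORDER_HIST
  ADD ORGANIZE BY HASH UNIQUE (ORDER_NO)
  HASH SPACE 4 G;

Size HASH SPACE generously enough to keep rows out of the overflow area; the real-time statistics can help you tune it after the fact.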
[Note: technically speaking, converting a table to be hash-organized results in the creation by DB2 of a new index on the table, but this index -- the hash-overflow index, used to locate rows that were stored in the table space's overflow area because the hash-determined target page was full -- is a "sparse" index, containing entries only for rows in the overflow area. It does not have the CPU and disk space costs associated with traditional indexes.]
So, index trimming and upgrading is good for the efficiency of your DB2 for z/OS environment, and good for the performance of your DB2-accessing applications. DB2 10 continues the trend of giving you more opportunities to prune your index configuration, or to maintain a certain level, with respect to the number of indexes defined, while boosting overall index effectiveness (through replacement of unneeded indexes with value-delivering indexes). Invest some time in this area, and you're likely to see a good return.
Tuesday, August 28, 2012
The New IBM zEnterprise EC12: Big Iron Gets Bigger (and Better)
When I say "bigger," I don't mean bigger in a "footprint" sense -- the new flagship system in IBM's mainframe server product line is very close, size-wise, to its predecessor, the z196, and has similar power draw and heat load characteristics. What I mean by "bigger"is that the zEnterprise EC12 provides the ability to get more work done, in less time, more efficiently than ever before. That's good news for organizations that need best-of-breed enterprise computing capabilities. It's also good news for those of us who comprise the DB2 for z/OS community: the horse that our favorite DBMS rides is, in the form of the EC12, a thoroughbred like no other.
Continuing with the equine analogy, let's start with a look at the raw horsepower numbers. The EC12's engines run at 5.5 GHz, the highest clock frequency in the industry (the frequency of a z196 engine is 5.2 GHz). That's impressive enough, but the EC12 delivers a per-core processing capacity advantage over the z196 -- up to 25% -- that is significantly greater than suggested just by the difference in engine speed. Very fast processors are only part of the EC12 performance story. There's also a doubling of the cache memory on each processor chip, which reduces delays caused by having to go "off chip" for instructions and data.
Not only are there faster engines that can be more effectively kept busy, there are more engines: a single EC12 server can have up to 120 cores, up to 101 of which can be configured for client use (versus a maximum of 80 configurable cores on a z196 server). Put it all together (a larger number of faster engines), and a fully configured EC12 can deliver 50% more processing capacity than a top-end z196 server.
The total amount of memory available on an EC12 server -- 3 TB -- is the same as for the z196, but something new and very cool is being done with EC12 memory: it can be managed in 2 GB page frames. Yes, that is 2 GB, as in 2,048 times the size of the 1 MB page frames available on z196 servers. What does that mean for DB2? Think about a page-fixed buffer pool of 2 GB in size (e.g., 500,000 4K buffers) fitting into ONE real storage page frame. Virtual-to-real storage translation will be more efficient than ever, and that will translate into CPU savings. If you've ever wondered, as a mainframe DB2 person, whether or not the age of Big Memory has really begun, wonder no more. It's here. Get ready to exploit it, if you haven't already.
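As a refresher, page-fixing a buffer pool is done with a command along these lines (BP10 and the VPSIZE value are arbitrary examples; the PGFIX change takes effect when the pool is next allocated, and backing by large real storage frames additionally depends on the DB2 release and on the LFAREA specification for the z/OS LPAR):

-ALTER BUFFERPOOL(BP10) VPSIZE(500000) PGFIX(YES)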
You might find yourself thinking, "What would my company do with a server that has that much memory and that much processing capacity?" In that case, I have one word for you: virtualization. Don't limit your thinking of EC12 exploitation to one or just a few system images. Think LOTS of system images. Lots of z/OS LPARs on an EC12? Maybe, but I'm thinking more along Linux lines when I think of lots of virtual systems running on a single EC12 box. If your organization isn't running Linux on System z, or thinking seriously about doing so, you're missing a train on which a whole lot of people are traveling. Installation of IFL engines on zEnterprise servers (these are specialty engines that run the Linux operating system) is soaring, and with good reason. The mainframe is a great virtual-system platform (has been for decades, since long before "cloud" meant anything other than coalesced water vapor), and if you have a big and growing DB2 for z/OS client-server workload (as many organizations do), what better place is there for your DB2-accessing application servers than in a Linux image on the same System z server in which DB2 is running? Ever heard of HiperSockets? Network performance doesn't get any better than that. And, think of the benefits in terms of network simplification (and security) when app servers run under Linux on System z, instead of on outboard boxes. With the EC12, the already great System z virtualization story gets better still.
Virtualization's hot, and so is analytics. How about analytics on z? I'm not talking just about having the data accessed by your analytics tools housed in a mainframe database -- we already know that DB2 for z/OS is a great solution there. I'm talking about having the analytics tools themselves running on System z -- in a Linux image or a z/OS LPAR. More and more organizations are doing just that, and the EC12 will provide a related momentum boost. How's that? Well, as you get into progressively higher-value types of analytics -- from "what happened" reporting to "what will happen" predictive modeling -- the associated processing gets to be more and more CPU-intensive. The EC12 delivers here with significant improvements in performance for compute-intensive and floating-point applications. C and C++ on z? Java on z? Yes. The EC12 is the best mainframe server yet for business analytics.
Excellent performance across a wide range of applications is great, but we all know that a system has to be up if it's to deliver the processing power your organization needs. System z has always been the high-availability champ, and the EC12 takes high availability to a whole new level with zAware (short for System z Advanced Workload Analysis Reporter). zAware provides what one of my colleagues has termed "integrated expert system diagnostics," constantly monitoring OPERLOG messages (which at some sites can number in the millions per day) and presenting related information via an easy-to-interpret graphical user interface. zAware can help mainframe operations personnel to quickly determine when an EC12 system is not behaving normally -- thereby enabling corrective action to be taken before a situation reaches user-impacting proportions.
I've covered a lot in this blog post, but you can learn more about the zEnterprise EC12 and I encourage you to do so. Use the hyperlink I provided at the top of this entry, check out the IBM EC12 Technical Introduction and EC12 Technical Guide "red books," and look for presentations at national conferences and local events. The more you know, the more you'll realize that Big Iron never looked better.
Continuing with the equine analogy, let's start with a look at the raw horsepower numbers. The EC12's engines run at 5.5 GHz, the highest clock frequency in the industry (the frequency of a z196 engine is 5.2 GHz). That's impressive enough, but the EC12 delivers a per-core processing capacity advantage over the z196 -- up to 25% -- that is significantly greater than suggested just by the difference in engine speed. Very fast processors are only part of the EC12 performance story. There's also a doubling of the cache memory on each processor chip, which reduces delays caused by having to go "off chip" for instructions and data.
Not only are there faster engines that can be more effectively kept busy, there are more engines: a single EC12 server can have up to 120 cores, up to 101 of which can be configured for client use (versus a maximum of 80 configurable cores on a z196 server). Put it all together (a larger number of faster engines), and a fully configured EC12 can deliver 50% more processing capacity than a top-end z196 server.
The total amount of memory available on an EC12 server -- 3 TB -- is the same as for the z196, but something new and very cool is being done with EC12 memory: it can be managed in 2 GB page frames. Yes, that is 2 GB, as in 2000 times the size of the 1 MB page frames available on z196 servers. What does that mean for DB2? Think about a page-fixed buffer pool of 2 GB in size (e.g., 500,000 4K buffers) fitting into ONE real storage page frame. Virtual-to-real storage translation will be more efficient than ever, and that will translate into CPU savings. If you've ever wondered, as a mainframe DB2 person, whether or not the age of Big Memory has really begun, wonder no more. It's here. Get ready to exploit it, if you haven't already.
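If you want to be positioned for large page frames, a page-fixed buffer pool is the prerequisite. Here's a minimal sketch, using a hypothetical pool, BP8, sized per the example above (with DB2 10, a PGFIX(YES) pool is backed by 1 MB frames when large-frame storage is reserved via the LFAREA parameter in the IEASYSxx member of PARMLIB, and 2 GB frame exploitation would presumably work through the same basic mechanism):

-ALTER BUFFERPOOL(BP8) VPSIZE(500000) PGFIX(YES)
-DISPLAY BUFFERPOOL(BP8) DETAIL

Keep in mind that PGFIX(YES) takes effect the next time the pool is allocated, and that page-fixed pools should be sized with the LPAR's real storage in mind -- fixed buffers are never paged out.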
You might find yourself thinking, "What would my company do with a server that has that much memory and that much processing capacity?" In that case, I have one word for you: virtualization. Don't limit your thinking about EC12 exploitation to one or just a few system images. Think LOTS of system images. Lots of z/OS LPARs on an EC12? Maybe, but I'm thinking more along Linux lines when I think of lots of virtual systems running on a single EC12 box. If your organization isn't running Linux on System z, or thinking seriously about doing so, you're missing a train on which a whole lot of people are traveling. Installation of IFL engines on zEnterprise servers (these are specialty engines that run the Linux operating system) is soaring, and with good reason. The mainframe is a great virtual-system platform (has been for decades, since long before "cloud" meant anything other than coalesced water vapor), and if you have a big and growing DB2 for z/OS client-server workload (as many organizations do), what better place is there for your DB2-accessing application servers than in a Linux image on the same System z server in which DB2 is running? Ever heard of HiperSockets? Network performance doesn't get any better than that. And, think of the benefits in terms of network simplification (and security) when app servers run under Linux on System z, instead of on outboard boxes. With the EC12, the already great System z virtualization story gets better still.
Virtualization's hot, and so is analytics. How about analytics on z? I'm not talking just about having the data accessed by your analytics tools housed in a mainframe database -- we already know that DB2 for z/OS is a great solution there. I'm talking about having the analytics tools themselves running on System z -- in a Linux image or a z/OS LPAR. More and more organizations are doing just that, and the EC12 will provide a related momentum boost. How's that? Well, as you get into progressively higher-value types of analytics -- from "what happened" reporting to "what will happen" predictive modeling -- then you find that the associated processing gets to be more and more CPU-intensive. The EC12 delivers here with significant improvements in performance for compute-intensive and floating-point applications. C and C++ on z? Java on z? Yes. The EC12 is the best mainframe server yet for business analytics.
Excellent performance across a wide range of applications is great, but we all know that a system has to be up if it's to deliver the processing power your organization needs. System z has always been the high-availability champ, and the EC12 takes high availability to a whole new level with zAware (short for System z Advanced Workload Analysis Reporter). zAware provides what one of my colleagues has termed "integrated expert system diagnostics," constantly monitoring OPERLOG messages (which at some sites can number in the millions per day) and presenting related information via an easy-to-interpret graphical user interface. zAware can help mainframe operations personnel to quickly determine when an EC12 system is not behaving normally -- thereby enabling corrective action to be taken before a situation reaches user-impacting proportions.
I've covered a lot in this blog post, but there's more to learn about the zEnterprise EC12, and I encourage you to dig deeper. Use the hyperlink I provided at the top of this entry, check out the IBM EC12 Technical Introduction and EC12 Technical Guide Redbooks, and look for presentations at national conferences and local events. The more you know, the more you'll realize that Big Iron never looked better.
Monday, August 20, 2012
When CICS and DB2 for z/OS Monitors Disagree (Part 2)
About a week ago, in part 1 of this two-part blog entry, I described a situation I've encountered several times at mainframe DB2 sites over the past 20 years: CICS users complain of elongated response times for DB2-accessing transactions, and these users' complaints are backed up by CICS monitor data that indicate elevated "wait for DB2" times as being the cause of the performance degradation. Concurrently, DB2 systems people point to their own data, from a DB2 monitor, showing that in-DB2 time for CICS transactions is NOT significantly higher than normal during times when the CICS programs are running long. Finger-pointing can ensue, and frustration can build in the face of a seemingly intractable problem: how can it be that the CICS monitor shows higher "wait for DB2" times, while according to DB2 monitor data the performance picture looks good? I noted in the aforementioned part 1 blog entry that this situation can occur when there is either a shortage of DB2 threads available for CICS-DB2 transactions or a shortage of TCBs through which the DB2 threads are utilized. In this part 2 entry I'll shine a light on another possibility: the disagreement between CICS and DB2 monitors can be caused by inappropriate priority specifications for DB2 address spaces and/or CICS transactions.
The priority to which I'm referring is dispatching priority -- the kind that determines which of several "in and ready" tasks in a z/OS system will get CPU cycles first. Dispatching priorities for address spaces are specified in the WLM policy of the z/OS LPAR in which they run. The busier a system is (with respect to CPU utilization), the more important it is to get the address space dispatching priorities right. DB2's IRLM address space (the lock manager) should have a very high priority, and in fact should be assigned to the SYSSTC service class (for ultra-high-priority address spaces). Most sites get this right. The mistake that I've seen more than once is giving the DB2 system services and database services address spaces (aka MSTR and DBM1) a priority that is lower than that of the CICS application-owning regions (AORs) in the z/OS LPAR. The thinking here is often something like this: "If the DB2 MSTR and DBM1 address spaces have a higher priority than the CICS AORs, they will get in the way of CICS transactions and performance will be the worse for it." Not so. The DB2 "system" address spaces generally consume little in the way of CPU time (DDF can be a different animal in this regard, as I pointed out in an entry posted to this blog last month), so "getting in the way" of CICS regions is not an issue. In fact, it's when CICS regions "get in the way" of the DB2 system address spaces (as they can in a busy system when they have a higher-than-DB2 priority), that CICS transaction performance can go downhill.
How's that? Well, it can come down to a thread-availability issue. It's common for CICS-DB2 threads to be created and terminated with great frequency as transactions start and complete. The DB2 address space that handles thread creation and termination is MSTR, the system services address space. If CICS address spaces are ahead of MSTR in the line for CPU cycles (i.e., if the CICS AORs have a higher priority than MSTR), and if the z/OS LPAR is really busy (think CPU utilization north of 90%), MSTR may not be readily dispatched when it has work to do -- and the work MSTR needs to do may be related to CICS-DB2 thread creation. If DB2 threads aren't made available to CICS transactions in a timely manner, the transactions will take longer to complete, and "wait for DB2" will be seen -- via a CICS monitor -- as the reason for the not-so-good performance. Check DB2 monitor data at such times and you'll likely see that things look fine. This is because DB2 doesn't "see" a CICS transaction until it gets a thread. Thus, as I pointed out in part 1 of this two-part entry, the "wait for DB2" time reported by the CICS monitor can be time spent in between CICS and DB2, and that's why the different monitors paint different pictures of the same performance scene: the CICS monitor indicates that transactions are waiting longer for DB2, and that's true, but the DB2 monitor shows that when a transaction does get the thread it needs it performs just fine. If you see this kind of situation in your environment, check the dispatching priorities of the DB2 address spaces and of the CICS AORs. The priority of the DB2 MSTR, DBM1, and DDF address spaces should be a little higher than that of the CICS AORs in the LPAR. [Don't sweat giving DDF a relatively high dispatching priority. The vast majority of DDF CPU utilization is associated with the execution of SQL statements that come from network-attached clients, and these execute at the priority of the application processes that issue the SQL statements -- not at the priority of DDF.] Oh, and in addition to ensuring that DB2 MSTR can create threads quickly when they are needed for CICS transaction execution, think about taking some of the thread creation pressure off of MSTR by increasing the rate at which CICS-DB2 threads are reused -- I blogged on this topic a few months ago.
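To make that recommendation concrete, here's one reasonable shape for the priority picture in a WLM policy -- the service class groupings shown are illustrative, and what matters is the relative ordering:

SYSSTC:        IRLM
Importance 1:  DB2 MSTR, DBM1, DDF (a notch above the AORs)
Importance 2:  CICS AORs
Below that:    batch, TSO, and other work

Again, giving the DB2 address spaces the higher spot costs you almost nothing in CPU terms, and it keeps thread creation out of the line of fire when the LPAR gets busy.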
One other thing to check is the priority of the CICS transactions themselves -- more precisely, the priority of the TCBs used for CICS transaction execution. This is specified through the value given to PRIORITY in the definition of a CICS DB2ENTRY resource, or in the DB2CONN definition for transactions that utilize pool threads. The default value of the PRIORITY attribute is HIGH, and this means that the tasks associated with entry threads (or pool threads, as the case may be) will have a dispatching priority that is a little higher than that of the CICS AOR's main task. HIGH is an OK specification if the z/OS LPAR isn't too busy -- it helps to get transactions through the system quickly; however, if the z/OS LPAR is very busy, PRIORITY(HIGH) may lead to a throughput issue -- this because not only the DB2 address spaces, but the CICS AORs, as well, could end up waiting behind transactions to get dispatched. In that case, going with PRIORITY(LOW) could actually improve CICS-DB2 transaction throughput. I have seen this myself. Bear in mind that PRIORITY(LOW) doesn't mean batch low -- it means that the transactions will have a priority that is a little lower than that of the CICS region's main task.
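Changing the attribute is a small bit of CICS resource definition work. A sketch, with hypothetical resource and group names (the change takes effect when the definition is reinstalled):

CEDA ALTER DB2ENTRY(PAYR) GROUP(DB2RGRP) PRIORITY(LOW)

The same PRIORITY attribute exists on the DB2CONN definition, where it governs transactions that use pool threads.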
Bottom line: dealing with (or better, preventing) CICS-DB2 performance problems is sometimes just a matter of getting your priorities in order.
Wednesday, August 8, 2012
When CICS and DB2 for z/OS Monitors Disagree (Part 1)
Today I'm writing about a situation that I first encountered almost 20 years ago, and which I most recently saw about three months ago: an organization with a CICS transaction workload accessing a DB2 for z/OS database reports a performance problem. The company's CICS people point to monitor data that indicate higher "wait for DB2" times when CICS transactions run long. Meanwhile, the folks on the DB2 team serve up monitor numbers of their own, showing that in-DB2 times for CICS transactions are not higher than normal during CICS slow-down events. What's to be made of this seeming contradiction? How can it be that for a given period, a CICS monitor shows unusually high "wait for DB2" times while DB2 monitor data for the same period show consistently good in-DB2 elapsed times for CICS transactions? I've found that this kind of "are too!" / "am not!" situation is usually the result of CICS transactions waiting longer than usual for DB2 threads. "Wait for thread" goes in the "wait for DB2" bucket from the CICS monitor perspective, but it doesn't inflate in-DB2 elapsed times because the DB2 monitor doesn't "see" a transaction until it has a thread. "Wait for thread" time is, in essence, time spent "between" CICS and DB2, and that's why the CICS and DB2 monitors see it differently.
OK, so why would CICS transactions have to wait longer than usual for DB2 threads? In my experience, this happens for one of two reasons: not enough threads, or not enough priority. In the remainder of this blog entry I'll expand on the first of these two factors: not enough threads (in some cases, this is actually a matter of not having enough TCBs, as explained below). In a part 2 entry, I'll address the priority issue.
Back in the 1980s and 1990s, the connection between a CICS application-owning region (AOR) and a local DB2 for z/OS subsystem was defined by way of a macro called the RCT, or resource control table. RCT gave way to RDO (CICS resource definition online) towards the end of the 1990s. One of the CICS-DB2 connection values, specified in the definition of a resource called DB2CONN, is TCBLIMIT -- the maximum number of TCBs (task control blocks) that can be used to connect transactions running in the CICS region to a target DB2 subsystem. Another of the CICS-DB2 connection set-up parameters, THREADLIMIT, appears in a DB2CONN resource definition (indicating the maximum number of pool threads for CICS-DB2 transactions) and can also appear in a DB2ENTRY resource definition (indicating the maximum number of entry threads for transactions associated with the DB2ENTRY resource). The sum of all THREADLIMIT values (for pool threads and entry threads) for a given CICS region should be less than the value of TCBLIMIT for the region, and people generally start out that way; however, over time folks may increase THREADLIMIT values -- to accommodate a growing CICS-DB2 transaction workload -- without adjusting TCBLIMIT accordingly; thus, the sum of all THREADLIMIT values for a CICS region could end up being greater than the TCBLIMIT value, and that could result in a situation in which threads are available for transactions, but TCBs needed to use those threads to connect to DB2 are insufficient in number to avoid elongated wait times. You should check your system for this possibility, and, if the sum of THREADLIMIT values exceeds TCBLIMIT for a region, either adjust TCBLIMIT upwards or adjust THREADLIMIT values downwards. I'd generally lean towards a larger TCBLIMIT value in that case, but if there were a lot of protected entry threads defined (PROTECTNUM > 0 and THREADLIMIT > 0 for DB2ENTRY resources), I'd consider reducing PROTECTNUM and THREADLIMIT values for those DB2ENTRY resources.
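A quick worked example of the check I'm describing, with hypothetical resource names and numbers:

DB2CONN  DB2C  TCBLIMIT(100)  THREADLIMIT(30)   <- pool threads
DB2ENTRY PAYR  THREADLIMIT(40)
DB2ENTRY ORDR  THREADLIMIT(60)

Sum of THREADLIMIT values: 30 + 40 + 60 = 130, versus a TCBLIMIT of 100

In that situation I'd lean, as noted, towards raising TCBLIMIT (e.g., CEDA ALTER DB2CONN(DB2C) GROUP(DB2RGRP) TCBLIMIT(150)), unless protected entry threads were plentiful enough to justify trimming the THREADLIMIT values instead.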
Here's another scenario: TCBLIMIT is greater than the sum of all THREADLIMIT values for a CICS region, but wait-for-thread time is still relatively high. This can happen when the following are true:
- THREADLIMIT is set to zero (or a very small number) for DB2ENTRY resources
- THREADWAIT(POOL) is specified for these same DB2ENTRY resources
- THREADLIMIT is a too-small number for pool threads (i.e., THREADLIMIT has a small value in the region's DB2CONN resource definition)
In that case, it may be that lots of transactions are overflowing to the pool, but the number of pool threads is not large enough for the associated transaction volume. This situation could be indicated by a non-zero value in the W/P column for the POOL row in the output of a DSNC DISPLAY STATISTICS command for the region in question (this is an attachment command issued through a CICS-supplied transaction called DSNC). If you see such a value, bump up the THREADLIMIT number in the DB2CONN resource definition for the region (and in doing that, check, as previously mentioned, to see if a TCBLIMIT increase might be needed, as well).
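For illustration, here's roughly what that check looks like -- the report layout is abbreviated and the numbers are made up, but a non-zero W/P figure on the *POOL row is the tip-off:

DSNC DISPLAY STATISTICS

DB2ENTRY  PLAN      CALLS   AUTHS   W/P   HIGH
*POOL     *******   25310      44   212     30
PAYR      PAYRPLN    8200      12     0     18

In this made-up output, transactions using pool threads hit a wait condition 212 times -- a sign that the DB2CONN THREADLIMIT value deserves a look.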
Of course, you could also have elongated wait-for-thread times if a DB2ENTRY resource definition has a very small (but non-zero) THREADLIMIT value and a specification of THREADWAIT(YES). Before messing with that, check to see if this is an intentional aspect of the DB2ENTRY definition: it's conceivable that a certain transaction must be limited with respect to concurrent execution (maybe even single-threaded, via THREADLIMIT(1) and THREADWAIT(YES) for a DB2ENTRY resource) in order to prevent a contention problem.
Here's one more possibility (though it's not something that I've actually seen): you may have high CICS wait-for-thread times because the DB2 limit on the number of threads for local (i.e., not network-attached) programs is too small. The ZPARM parameter CTHREAD specifies this limit, and it's for all local threads: for CICS transactions, batch jobs, TSO users, etc. If you have an indication of high CICS wait-for-thread times, and you have a pretty small CTHREAD value, consider adjusting CTHREAD upwards. Note that CTHREAD can have a much larger value in a DB2 10 environment versus prior versions of DB2 for z/OS -- this because almost all thread-related virtual storage usage goes above the 2 GB "bar" when packages are bound (or rebound) in a DB2 10 system. Whereas the sum of CTHREAD and MAXDBAT formerly had to be less than 2000, in a DB2 10 environment that sum has to be less than 20,000.
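CTHREAD lives in the DSN6SYSP macro of the ZPARM assembly job, DSNTIJUZ. A fragment, with hypothetical values and with continuation details omitted:

DSN6SYSP CTHREAD=700,
         MAXDBAT=500,
         ...

Remember the DB2 10 rule just mentioned: the sum of CTHREAD and MAXDBAT has to come in under 20,000 (the 700 + 500 = 1,200 shown here leaves plenty of headroom).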
You can help to avoid high CICS wait-for-thread times by ensuring that you have enough threads (and enough CICS TCBs) for your CICS-DB2 transactions. Check back in a week or so for part 2 of this two-part entry, in which I'll explain how high wait-for-thread times can be a matter of priorities (dispatching, that is).
Sunday, July 29, 2012
DB2 10 (and 9) for z/OS: I Need 128 GB of WHAT?
The other day, I got a note from a mainframe DB2 DBA who'd seen a system requirement for DB2 10 that had him concerned. This DBA, whose company was preparing for the migration of their DB2 9 subsystems to DB2 10, had read the following in the text of IBM APAR II14564 (an "info" APAR that documents some DB2 9-to-10 migration and fallback considerations):
REAL STORAGE CONSIDERATION
Ensure that you have defined a high enough value for HVSHARE... to satisfy all requests on this LPAR. DB2 Version 10 requires 128GB of contiguous 64-bit shared private storage above the 2GB bar for each DB2 subsystem... If not enough contiguous storage for... HVSHARE exists, the DB2 [subsystem] will abend with S0DC2 RC8A001721 at the first starting.
128 GB is a lot of space. That and the "for each subsystem" bit really jumped out at the DBA. As he pointed out in his note, "We have four subsystems on an LPAR and that would be half a terabyte."
I'll tell anyone who's concerned about this what I told the DBA who contacted me: RELAX. Most important point: this is VIRTUAL storage we're talking about, not real storage. Second most important point: this HVSHARE value (a parameter in the IEASYSxx member of PARMLIB) establishes the BOUNDARY of an area of z/OS virtual storage -- it demarcates the part of z/OS virtual storage that is available for use by subsystems in the form of something called shared private storage. How can something be both shared and private? I'll get to that in a moment.
The 128 GB of HVSHARE space required by DB2 is, more than likely, just a small portion of the total HVSHARE space defined on your system: the DEFAULT value of HVSHARE is 510 TERABYTES. How can any system support 510 TB of HVSHARE virtual storage? Easy: shared private storage does not have to be backed by real storage when it's allocated -- it only has to be backed by real storage as it's USED. The amount of HVSHARE space used by DB2 and other subsystems will probably be a small fraction of what's allocated on the z/OS LPAR. You might see a few hundred megabytes, or maybe a gigabyte or two, of shared private storage used by a production DB2 subsystem -- nothing remotely close to 128 GB (DB2 10 uses more shared private storage than does DB2 9, for reasons noted a little further on in this entry). If you have the current version of IBM's Tivoli OMEGAMON XE for DB2 Performance Expert on z/OS monitor, you can see, in a statistics report or online display, the amount of shared private storage used by a DB2 subsystem. Are you concerned that the default of 510 TB of z/OS virtual storage available for use as shared private storage will leave little virtual storage space for other uses? No need to worry about that: 64-bit addressing enables access to 16 million terabytes of virtual storage. Setting aside 510 of those 16 million terabytes for shared private storage is not a big deal.
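By the way, you don't have to go hunting through PARMLIB to check HVSHARE on your system -- a z/OS operator command will show the setting, along with how much shared private storage has actually been allocated. The output looks something like this (shape approximate, numbers made up):

D VIRTSTOR,HVSHARE

IAR019I  DISPLAY VIRTSTOR
SOURCE = DEFAULT
TOTAL SHARED = 522240G
SHARED RANGE = 2048G-524288G
SHARED ALLOCATED = 131072M

That TOTAL SHARED figure is the 510 TB default (510 x 1024 GB), and the 128 GB shared memory object created by a DB2 subsystem shows up under SHARED ALLOCATED -- and again, allocated is not the same as backed by real storage.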
Now, how does DB2 use shared private memory? That actually depends on the release of DB2 to which you're referring. You see, DB2 9 was actually the first DB2 for z/OS release to exploit shared private memory (a technology delivered with z/OS 1.5 in the mid-2000s). Here's the story: a lot of data is transferred between certain DB2 address spaces (for example, between the DDF and DBM1 address spaces, and between utility address spaces and DBM1). If said data can be exchanged by way of an area of virtual storage that is part of both of the participating address spaces, the CPU cost of data transfer operations is reduced. An option here would be to use common storage, which is part of every address space. One problem with that approach is that making common storage bigger to accommodate large inter-address space exchanges of data reduces the private portion of virtual storage for all address spaces. Why make the private part of every address space smaller when the data exchanges we're talking about involve just a few DB2 address spaces? Shared private storage offered a better solution: a section of virtual storage in which shared memory objects could be created -- these being areas of virtual storage that can be shared by address spaces, but only by address spaces that register to use a particular shared memory object (that's what makes the virtual storage both shared and private). Beginning with DB2 9, the DDF, DBM1, and MSTR address spaces (and, as mentioned, address spaces in which some DB2 utilities execute) are registered to use the DB2 shared memory object, which is created at DB2 start-up time in the portion of virtual storage reserved for this purpose by way of the HVSHARE specification.
So, if DB2 9 is the release that first exploited shared private storage, how is it that DB2 10 has more people talking about this z/OS resource? The answer, in a word, is threads. One of the big benefits provided by DB2 10 for z/OS is constraint relief as it pertains to virtual storage below the 2 GB "bar." This big free-up of space was achieved by moving almost all thread- and stack-related storage above the bar ("stack" storage is essentially working storage). Most of this thread and stack storage is of the shared private variety; so, the amount of shared private storage usage in a DB2 10 environment will depend largely on the number of threads that are concurrently active in the system. I'll point out here that you need to rebind packages in a DB2 10 environment to get this thread-related virtual storage constraint relief. When you do that, space used for package tables (aka PTs -- the sections of packages copied out of the skeleton package table for each package executed by a particular thread) is moved not only above the bar, but out of the EDM pool (the skeleton package table remains in the EDM pool). Thus is delivered the added benefit of eliminating contention caused by the serialization of updates to EDM pool control blocks used in the execution of packages allocated to threads (this contention in previous DB2 releases was associated with latch class 24).
There you have it. Yes, DB2 10 requires (as does DB2 9) 128 GB of HVSHARE space, so that the shared memory object used by DB2 address spaces can be created when DB2 starts up. No, you don't need to get worked up about this, because 1) the amount of shared private storage used by DB2 is going to be WAY less than 128 GB, and 2) a z/OS LPAR, by default, will have 510 terabytes of HVSHARE space. There are plenty of things in life you can worry about. This shouldn't be one of them.
Thursday, July 12, 2012
DB2 for z/OS: What's With DDF CPU Time?
Recently I got a question from a mainframer who was seeing rather high CPU times for the DB2 DDF address space. DDF, of course, is the distributed data facility -- the DB2 component that handles communications with client systems (usually application servers) that send requests to the DB2 for z/OS server using the DRDA protocol (DRDA being Distributed Relational Database Architecture). Anyway, the questioner wanted to know what he and his colleagues could do to reduce DDF CPU consumption.
This query prompted me to write a blog entry about DDF CPU utilization, because there remains today -- even 20 years after DDF was introduced with DB2 V2.2 -- a good bit of misunderstanding concerning this subject. I believe that the confusion comes from people looking at the DDF address space as they do the other DB2 "system" address spaces (the database services address space, also known as DBM1; the system services address space, or MSTR; and the internal resource lock manager, or IRLM). Generally speaking, these address spaces show very little in the way of CPU consumption (this is particularly true of the IRLM and MSTR address spaces). Here is some information from a real-world mainframe DB2 environment, as seen in a DB2 monitor statistics detail report covering a busy two-hour time period:
TOTAL TIME
------------
SYSTEM SERVICES 2:16.529524
DATABASE SERVICES 55:38.969257
IRLM 21.249774
So, on this system (a big one, with multiple engines, running a large DB2 workload), the IRLM address space consumed 21 seconds of CPU time and the DB2 system services address space consumed just over 2 minutes and 16 seconds of CPU time. The DB2 database services address space consumed considerably more CPU time than these other two (a little more than 55 minutes' worth), but that's not unusual when there's a lot of prefetch and database write I/O activity on a system (true in this case), as the CPU time for such I/O operations is charged to DBM1. That the CPU consumption of these address spaces is relatively low on a system with a great deal of DB2 data access activity is not at all unusual: overall CPU consumption for a DB2 workload is associated primarily with SQL statement execution, and the vast majority of that time is charged to the address spaces (e.g., CICS regions and batch initiators) through which data access requests get to DB2.
Now, during the same two-hour period on the same system on which the numbers shown above were seen, CPU consumption of the DB2 DDF address space was as follows:
TOTAL TIME
--------------
DDF ADDRESS SPACE 3:16:34.642514
"Three hours and sixteen minutes?!? What's with that? I thought that the DB2 address spaces are supposed to show smaller numbers for CPU consumption! Is there a problem in my system? What should I do?!?"
First of all, you should relax. You can't equate DDF CPU consumption with that of the other DB2 "system" address spaces, because it really is a different animal. IRLM, MSTR, and DBM1 are about data access. DDF is about transaction management (specifically, client-server transactions) -- and that understanding should point you to the reason for sometimes-high DDF CPU utilization: it's driven by execution of SQL statements that get to DB2 through the DDF address space (typically, SQL statements issued from programs running in network-attached application servers, or from DB2 stored procedures called by such programs). Recall that I mentioned, a couple of paragraphs up, that the vast majority of the CPU time associated with SQL statement execution is charged to the "come from" address space. If the statement comes from a CICS transaction program, that time is charged to the CICS region in which the program executed. If the statement comes from a DRDA requester, the CPU time is going to be charged to the DDF address space (and stored procedure usage doesn't change this picture: CPU time consumed by a stored procedure is charged to the address space through which the stored procedure CALL got to DB2). So, in a system in which there is a lot of client-server DB2 activity, you'll see higher numbers for DDF CPU consumption (at plenty of sites, DDF-related activity is the fastest-growing component of the overall DB2 for z/OS workload, and at more and more locations it is already the largest component of the DB2 workload).
Back to the DB2 monitor statistics detail report referenced earlier: the breakdown of address space CPU time tells the story with respect to DDF CPU consumption. Here's the additional information from the DDF address space line that I didn't show before (I'm showing times down to the millisecond level instead of the microsecond level to save space):
TCB TIME PREEMPT SRB NONPREEMPT SRB TOTAL TIME
-------- ----------- -------------- -----------
DDF 1:02.659 3:13:21.699 2:10.283 3:16:34.642
You can see that almost all of the DDF address space CPU time (about 98%) is in the "preemptible SRB" category. This is "user" CPU time, associated with the execution of SQL statements issued by DRDA requesters (work tied to database access threads -- aka DBATs -- is represented by preemptible SRBs, which you can think of as "dispatchable" SRBs). The other 2% of the DDF CPU time is "system" time, related to work done under DDF's own tasks on behalf of the aforementioned "user" tasks. So, DDF is in fact a very CPU-efficient manager of DB2-accessing client-server transactions.
Now, what would you do to reduce the CPU consumption of a DDF address space? You'd take actions to reduce the execution cost of SQL statements that get to DB2 through DDF. Those actions might include the following:
- Convert external stored procedures to native SQL procedures. This is actually more about reducing the monetary cost of CPU cycles consumed by a DB2 client-server workload, versus reducing the cycles consumed. When a native SQL procedure (introduced with DB2 9 in new-function mode, and available in a DB2 10 new-function mode environment if you're migrating to DB2 10 from DB2 V8) is executed through DDF, around 60% of the associated CPU time will be zIIP eligible (the same is not true for external stored procedures). This zIIP offload will reduce the cost of your DB2 client-server workload, because zIIP MIPS are less expensive than general-purpose CPU MIPS.
- Take greater advantage of DB2 dynamic statement caching. Often, a lot of dynamic SQL statements are executed through DDF. If you are getting a low hit ratio in your dynamic statement cache (check this via your DB2 monitor), consider making the dynamic statement cache larger (done via a ZPARM change). This assumes that you have enough real storage on your system to back a larger dynamic statement cache. (See the first sketch following this list for a way to examine what's in the cache.)
- Go to DB2 10 (if you're not already there) and take advantage of high-performance DBATs. This is done by binding packages executed via database access threads with RELEASE(DEALLOCATE) (the enabling commands are sketched following this list). I blogged on this topic last year.
- Improve SQL statement CPU efficiency by tuning your DB2 buffer pool configuration. This, of course, would beneficially affect all SQL statements -- not just those executed via DDF. I have "part 1" and "part 2" blog entries on this subject.
- Turn dynamic SQL statements executed via DDF into static SQL statements. We have a nice tool, IBM Optim pureQuery Runtime, that can help you to do this without having to change client-side program code. pureQuery Runtime can also be used to "parameterize" dynamic SQL statements that were coded with references to literal values in predicates -- something that will boost the effectiveness of the DB2 dynamic statement cache. I recently wrote about pureQuery Runtime on my DB2 for z/OS tools blog.
- Tune your higher-cost DDF-related SQL statements. We have a tool that can help you to do this in a proactive, productive way. It's called the IBM InfoSphere Optim Query Workload Tuner. You can find out more about this product in an entry I posted to my DB2 for z/OS tools blog just last week.
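A couple of sketches related to items in the list above. First, to see what's in your dynamic statement cache beyond the hit ratio, you can externalize the cache contents with EXPLAIN and query the result table (the USERID qualifier is hypothetical, and note that the STAT_* columns are populated only when the statement cache statistics trace, IFCID 318, is active):

EXPLAIN STMTCACHE ALL;

SELECT STAT_EXEC, STAT_CPU, STMT_TEXT
  FROM USERID.DSN_STATEMENT_CACHE_TABLE
  ORDER BY STAT_CPU DESC
  FETCH FIRST 20 ROWS ONLY;

Second, the enabling actions for high-performance DBATs look like this (collection and package names hypothetical):

REBIND PACKAGE(COLLID1.PGM1) RELEASE(DEALLOCATE)
-MODIFY DDF PKGREL(BNDOPT)

The MODIFY DDF command tells DDF to honor the RELEASE bind option for packages executed via database access threads; PKGREL(COMMIT) switches the behavior back.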
Now you know what's behind the DDF address space CPU times that you're seeing on your system, and what you can do to reduce DDF address space CPU consumption. I hope that this information will be useful to you.