Robert's Db2 blog: DB2 for z/OS: What Might "Template" DDL for CREATE TABLESPACE and CREATE INDEX Look Like?

Wednesday, February 27, 2013

DB2 for z/OS: What Might "Template" DDL for CREATE TABLESPACE and CREATE INDEX Look Like?

Recently, a DB2 for z/OS DBA sent me a question about a particular clause of the SQL DDL (data definition language) statements CREATE TABLESPACE and CREATE INDEX. In his note containing the question he also sent what you might call his "template" CREATE TABLESPACE and CREATE INDEX statements -- the statements that serve as a pattern that he uses when creating these objects in his DB2 environment. Having such template DDL can be handy, in that it gives you a starting point for coding the statement that will be used to create a particular object. You might vary a clause specification or two to suit a given need, but the general pattern will be followed -- and that to your organization's benefit, if you've chosen the specifications in your template DDL thoughtfully. In this blog entry I'll give you my thoughts on what you might want to have in your template CREATE TABLESPACE and CREATE INDEX statements.

[Note that my use of the word template as it pertains to DDL should not be confused with the TEMPLATE control statement that can be used to facilitate data set allocation when executing some IBM DB2 utilities. Note, too, that I'm writing about application table spaces and indexes, not work file or system (e.g., catalog and directory) objects.]

Let me tell you up front that I'm not going to comment here on every single clause of the CREATE TABLESPACE and CREATE INDEX statements -- I don't have time for that, and even if I did it wouldn't add a lot of value. I'm fine with the default values for many of these clauses (e.g., LOGGED and TRACKMOD YES for CREATE TABLESPACE, and CLOSE YES and GBPCACHE CHANGED for both CREATE TABLESPACE and CREATE INDEX). If you are also OK with the default values for these clauses, you can leave them out of your template DDL (in order to accept the default values), or you can specify some or all of them if you want to see in the DDL a reminder of just what the default values for the clauses are. What I'll provide from here on out are my thoughts concerning some CREATE TABLESPACE and CREATE INDEX clauses about which there is some apparent misunderstanding within the mainframe DB2 community, and/or for which I have some opinions regarding specified values.

For CREATE TABLESPACE and CREATE INDEX

USING STOGROUP -- You might think that this would go without saying -- that the use of DB2-managed data sets (indicated by USING STOGROUP) versus user-defined data sets (indicated by USING VCAT) is understood by everyone as being the way to go. I hope that's the case, but I'm including this recommendation anyway, just to make sure. Who wants to mess with z/OS Access Method Services statements to create the data sets that DB2 will use for table spaces and indexes? Let DB2 do it. Note, by the way, that this convention has moved beyond application-related objects: starting with DB2 9, work file table spaces can be DB2-managed, and with DB2 10 the catalog and directory objects become DB2-managed (I have some information about DB2-managed work file table spaces in an entry that I posted to this blog last spring, and information about DB2- and SMS-management of catalog and directory objects in a DB2 10 environment in another entry, posted in the fall of 2011). Note also that, starting with DB2 9 (in new-function mode), you can specify SMS data class, management class, and storage class names in a CREATE STOGROUP statement. That could reduce the need for automatic class selection (ACS) routines used for DB2 data sets at your site.
PRIQTY and SECQTY -- Leave them off of your CREATE TABLESPACE and CREATE INDEX statements, or specify them with a value of -1 (i.e., PRIQTY -1 and SECQTY -1). Either way, what this means is that you're going with a default primary quantity for the object's underlying data set(s), and you're letting DB2 handle allocation of secondary space as needed for the data set(s). The default primary quantity is specified via two ZPARM parameters (TSQTY for table spaces, and IXQTY for indexes -- both with a default value of one cylinder), and DB2 manages secondary space allocation by way of what's known as the "sliding scale" algorithm (this when the ZPARM parameter MGEXTSZ is set to YES). Letting DB2 manage space allocation for table space and index data sets makes all kinds of sense: it works, it helps to keep you out of trouble (in terms of avoiding data set extend failures), and it frees up DBA time for more high-value tasks. I wrote about DB2 management of data set space allocation in an entry that I posted to the blog that I maintained while working as an independent DB2 consultant.

For CREATE TABLESPACE

MAXPARTITIONS -- What I'm getting at here is the idea of a default table space type for your environment. Your first decision is universal versus non-universal. You might want to give serious consideration to making universal partition-by-growth your default table space type (that would be indicated by having the MAXPARTITIONS and SEGSIZE clauses in the CREATE TABLESPACE statement). The primary reason for this recommendation: to an increasing extent, exploitation of new DB2 features will require the use of universal table spaces. With DB2 9 (the version with which universal table spaces were introduced), it was clone tables. With DB2 10, it's hash-organized tables, LOB-inlining, and the CURRENTLY COMMITTED mode of data access. I fully expect that this trend will continue with future versions of DB2. Using universal table spaces gives you maximum flexibility when it comes to leveraging DB2 technology. If you're going to go universal, the right choice for most table spaces is likely to be partition-by-growth (PBG) versus partition-by-range (PBR). Partition-by-range has its advantages, but I believe that PBR is best used for certain larger tables in a database (e.g., tables holding at least one million rows) that have columns or column sets that would make for good partitioning keys. Is non-universal ever the right choice for a table space? It could be if you want to assign several tables (perhaps a number of small-ish reference tables) to a single table space -- something you might do to keep a lid on the number of data sets in a DB2 system, and to simplify the backing up of said set of tables (via the COPY utility, which operates at the table space level). A traditional segmented table space can hold several tables, whereas one universal table space can hold one table. Bear in mind that the MEMBER CLUSTER table space definition option, required at some sites to support very high volumes of insert processing, is valid for universal table spaces in a DB2 10 new-function mode environment (it could not be used with universal table spaces in a DB2 9 system).
SEGSIZE -- Most all of your tables should be segmented -- either traditional segmented table spaces or universal table spaces (the latter are also segmented). An exception to this rule: in a DB2 9 environment, you might choose a traditional range-partitioned table space over a PBR universal table space if the volume of inserts into the associated table will be very high, as the former table space type will likely deliver a performance advantage over the latter (with DB2 10, high-volume insert performance for a universal PBR table space is pretty much equivalent to what you get with a traditional range-partitioned table space). Aside from that case, segmented -- whether universal or non-universal -- is the way to go. The default SEGSIZE value is 4, and that might make sense for a non-universal segmented table space to which several really small tables are assigned. For single-table table spaces, the defualt SEGSIZE of 4 is probably not the best choice. I'd be more inclined to go with something like SEGSIZE 32, with an eye towards boosting prefetch efficiency.
LOCKSIZE -- The default value here is ANY, which allows DB2 to use any type of locking granularity when accessing the table space. Yes, DB2 will typically use page-level locking for a table space that is defined with LOCKSIZE ANY, but I like to make the locking granularity choice myself. I prefer to go with LOCKSIZE PAGE as a default for table spaces. I'd change that to LOCKSIZE ROW for a table space only if page-level locking resulted in an unacceptable level of lock contention. Some people believe that you can't use row-level locking in a DB2 data sharing environment, or that doing so would be a big mistake. In fact, selective use of row-level locking in a DB2 data sharing system can be quite beneficial in relieving lock contention issues, as I pointed out in a blog entry that I wrote a few years ago. Another thing about a change to row-level locking from page-level locking for a table space: doing that could cause some data-changing programs that target the table space to acquire considerably more locks between commits, and THAT could conceivably cause said programs to hit the limits you've established (via ZPARMs) on the number of locks held by one process across all the tables that it is accessing, and/or the number of locks acquired on a single table space (the latter put into effect for a table space by way of the LOCKMAX SYSTEM clause of CREATE TABLESPACE). To avoid a surprise in this department, consider raising the values of the ZPARM parameters NUMLKUS and NUMLKTS if you change from page-level to row-level locking for one or more of your table spaces. Also keep in mind that if you change the locking granularity for a table space from page-level to row-level (or vice versa), and you want the new locking granularity to be utilized for static SQL statements that target the table (or tables) in the table space, you'll need to rebind the associated packages after altering the LOCKSIZE specification.
COMPRESS -- DB2 for z/OS data compression is very CPU-efficient, thanks to a hardware assist, and it is therefore widely used. I'm a big fan of COMPRESS YES myself, but I don't think that it should be used for every table in a database. A compression dictionary takes up 64 KB of space in memory, so it's possible that compressing a really small table space could lead to that table space actually having a somewhat larger virtual and real storage "footprint" than it would have were it not compressed. I'd certainly lean towards compressing table spaces with 1000 or more pages.
PCTFREE -- For a table space, the default value for PCTFREE is 5 (that's 5% of the space in a page). If a table's rows are not clustered by a continuously-ascending key, it would be good to have some "holes" in pages (following a REORG or a LOAD) into which newly-inserted rows could go -- the better to maintain good data organization between REORGs. In that case, PCTFREE 5 might be too small (5% of a 4K page is about 200 bytes, and that may or may not be enough to hold a table row -- and remember to consider compressed row length if you specified COMPRESS YES for the table space). A PCTFREE value of 10 might be better for maintaining data organization than PCTFREE 5. Of course, if a table's rows are clustered by a continuously-ascending key, a non-zero PCTFREE value would probably result in wasted space.

For CREATE INDEX

FREEPAGE -- The default value of FREEPAGE is 0. Many indexes are defined on keys that do not have continuously-ascending values. For these indexes, new entries will go into the "middle" of the index. Some index leaf page split activity has to be expected, and when that happens it would be nice to be able to put the "split out" index entries into a page that is not too far from the page that was split. For that to happen you need to have some empty pages throughout the index (following an index REORG or a LOAD of the underlying table), because index entries that are relocated as a result of a leaf page split have to go into an empty page. If FREEPAGE 0 is specified for an index, the only empty pages will be at the end of the index, and that's where "split out" entries will go when page splits occur. The result could be fairly rapid disorganization for an index defined on a non-continuously-ascending key, if a lot of rows are inserted into the underlying table. For such an index, consider a FREEPAGE value of between 5 and 10 (FREEPAGE 0 is appropriate for an index on a continuously-ascending key).
BUFFERPOOL -- Starting with DB2 9 in new-function mode, indexes could be assigned to buffer pools used for pages larger than 4K in size. Going with 8K, 16K, or 32K pages for an index is required if you want to utilize the index compression capability introduced with DB2 9, but even if you aren't interested in compressing an index, a larger page size could be a good choice. In particular, if the index in question is defined on a non-continuously-ascending key, and the volume of inserts into the underlying table is high, a large index page size (32K, for example) could substantially reduce the occurrence of index page splits, and that could deliver a nice performance dividend.

Do you have template DDL that you use as a starting point for CREATE TABLESPACE and CREATE INDEX statements in your DB2 environment? If not, perhaps you should. If you already have DDL that you use as a pattern for CREATE TABLESPACE and CREATE INDEX statements, how long ago was that pattern set up? Has the template DDL kept pace with DB2 changes that provide new options for table spaces and indexes? If not, a review and an update of the template DDL could be in order. It's all part of being ready to put the latest DB2 technology to the best use at your site.

16 comments:

AnonymousMarch 6, 2013 at 9:14 AM
Re: "e.g., tables holding at least one million rows)"

Dislike this . . . This strategy ignores simple laws of physics. A table defined with a row size that allows the maximum rows/page (255) will consume ~16 MB – which is great …. I have an active project that will have a CLOB whose size will average 1.5 MB. At 1,000,000 rows that table’s partition would be 1.5 TB – that a T, not a G or M. Recovering this puppy (if a 1.5 TB partition was even allowed) would not be nice."

Most shops I am familiar with (in cluding where I work) have changed to a partition-by-size strategy - i.e. every 1 GB or 2 GB.

Michael Harper, TD Bank
ReplyDelete
Replies
AnonymousMarch 6, 2013 at 9:22 AM
A great feature to use is ISPF Models. A few *cough* decades ago I coded DB2 DDL and Utility command templates as models making it simple for people to enter in edit commands such as MODEL DB2UTIL REORG. Models liberally sprinkled with ")CM" are wonderful productivity tools.

Another method I have see is to store the templates in an edit macro.

Michael Harper, TD Bank

ReplyDelete
Replies
AnonymousMarch 11, 2013 at 6:48 AM
Robert and Michael, do you have observations about how sites using data modeling design tools that generate DB2 for z/OS DDL adapt the Template, Model or Mask techniques?

Sites might consider a DDL checklist, integrated with site standards and well commented best practices examples, that are repeatedly updated to reflect current knowledge and APARs. Along with a change control process that checks for reasons for deviations which may or may not be very justified by particular processing demands.
ReplyDelete
Replies
AnonymousJuly 10, 2013 at 2:03 PM
Hey Robert, I notice you didn't really mention what you thought the template should be for LOCKMAX. Most places I have been at we would use LOCKMAX 0 to prevent escalation of a lock. In the case a thread hits the maximum, they then fail and our response was "you aren't using the commit routine?" However, in my most recent assignment I am seeing the standard as LOCKMAX SYSTEM. I see quite a few tables where the locksize is escalating multiple times per day and I see quite a few -911's as well. I, for one am having a hard time convincing them the benefits they would see by changing their default here.

Dave Nance
ReplyDelete
Replies

Add comment

Note: Only a member of this blog may post a comment.