Tuesday, November 24, 2020

Db2 for z/OS EXPLAIN: What an Access Path Is, and What it Would Be

The EXPLAIN functionality of Db2 for z/OS, by which the query optimizer's access path determinations are externalized, has been around as long as I can remember, and Db2 people, by and large, are familiar with the concept. In recent weeks, however, I discovered that there is some lack of awareness of EXPLAIN enhancements delivered through more recent versions of Db2 - specifically, Db2 Version 8, Db2 10 and Db2 12. These enhancements, which provide ways to see what a statement's access path is, and what it would be, can be very helpful in certain scenarios. I will describe the newer EXPLAIN capabilities and related use cases in this blog entry.


Db2 Version 8: using EXPLAIN to get "access path is" information for cached dynamic statements

"Db2 Version 8?" you might ask. "That came out 16 years ago. Are you telling me there's something about Db2 V8 that I still might not know?" Yes, that's what I'm telling you. And it's not because you aren't smart. It's because you're a human being, and there's only so much knowledge that one can put into one's brain and be able to pull out, intact, on-demand. Db2 V8, like most Db2 for z/OS versions, was packed with a lot of Very Important New Features, such as 64-bit addressing; table-controlled range-partitioning; a type 4 JDBC driver; and the ability to change a table column's data type (within a family of data types, such as character string or numeric) with an ALTER, versus unload/drop/re-create/re-load - and that's just to name a few of the Db2 V8-introduced capabilities that were a Big Deal. With all that "wow" stuff, no surprise that some of the less-Big - but still really helpful - new things escaped some peoples' notice (or were noticed and later forgotten).

One of those not-huge-but-really-helpful Db2 V8 enhancements concerned EXPLAIN, in the context of the dynamic statement cache. Now, we got dynamic statement caching well before Db2 V8, and it was very important in making Db2 for z/OS an outstanding data server for applications - very often client-server applications, and often provided by vendors - that exclusively (or almost exclusively) issue SQL statements that are dynamically prepared by Db2. Sometimes, though, there could be a performance analysis challenge presented by the dynamic statement cache: what if you needed to know the access path for a dynamic query that had been prepared and cached? Sure, you could take the text of the query and make that input to an EXPLAIN statement, but the output of that EXPLAIN would tell you what the query's access path would be if it were generated right now. That's not what you want, in this case. What you want is the access path that was generated when the query was initially prepared and cached. Well, depending on the size of the statement cache and the frequency of the cached statement's execution (when a statement has to be removed from the cache to make room for another statement, that is done on a least-recently-used, or LRU, basis), initial preparation of the statement might have happened hours or days ago - who's to say that an access path generated for the statement NOW would be what it was THEN?

Db2 V8 addressed that scenario by providing a new EXPLAIN option which can be utilized as shown in the example below (the Db2 V8 enhancement is highlighted in red):

EXPLAIN STMTCACHE STMTID 1287;

Now, plenty of Db2 people are familiar with the STMTCACHE option of EXPLAIN, but only when that option is used with the keyword ALL, as in, EXPLAIN STMTCACHE ALL. Executing that statement - referred to by many as, "snapping the cache" - will cause Db2 to insert, into the DSN_STATEMENT_CACHE_TABLE qualified by your ID, a row for each statement in the dynamic statement cache. Snapping the cache can be a very handy way to get some very useful information about cached dynamic statements. When EXPLAIN STMTCACHE is issued together with STMTID xxxx, where "xxxx" could be an integer value, as shown in my red-highlighted example, above, or a host variable into which a statement ID had been placed, what Db2 provides is the access path information that was generated for the statement associated with the ID (and statement ID is part of what you get in the output of EXPLAIN STMTCACHE ALL) when that statement was initially prepared and placed in the dynamic statement cache. This is so because EXPLAIN STMTCACHE STMTID does not generate EXPLAIN information - it extracts EXPLAIN information. See, when Db2 prepares a dynamic statement and caches the prepared form of the statement in the statement cache, it also stores access path information for the statement in a compressed form, and that information is what's accessed and expanded into human-usable form when EXPLAIN STMTCACHE is issued with the STMTID option.

OK, challenge effectively addressed. This Db2 V8 enhancement laid the groundwork for a similar enhancement, for static SQL statements, that was delivered with Db2 10.


Db2 10: getting "what is" access path information for a package when you neglected to ask for it at BIND (or REBIND) time, and getting "what would be" access path information for a package

First, getting "what is" access path information for statements in a package: you might think, "That's easy enough - just specify EXPLAIN(YES) when you bind (or rebind) the package." Okay... but what if you didn't do that at bind or rebind time (the default value for the EXPLAIN option of BIND and REBIND PACKAGE is NO)? EXPLAIN-ing the statements in the package won't be very helpful, as that will give you "would be now" access path information for the statements, versus the "was at bind (or rebind) time" access path information you want. You're out of luck, right?

No, you're not out of luck. That compact-form access path information I mentioned above in describing EXPLAIN STMTCACHE STMTID? Db2 started making that part of a package's structure with Db2 9, and Db2 10 provided a way to extract the information from a package (for packages generated in a Db2 9 environment or beyond). Again, an example of the new (with Db2 10) EXPLAIN statement syntax, with the new part highlighted in red:

EXPLAIN PACKAGE COLLECTION ’COLLA’ PACKAGE ’PKG1234’ COPY ’CURRENT’;

As with EXPLAIN STMTCACHE STMTID, execution of the statement shown above would extract, not generate, access path information for statements in package PKG1234 in collection COLLA, and what you would get is information about the access paths generated at the time the package was last bound or rebound (absent a COPY specification, such as the COPY 'CURRENT' in the example above, the EXPLAIN PACKAGE statement would give you access path information for all available copies of the package, which could be the current, previous and original copies if PLANMGMT(EXTENDED) functionality is in effect).

Let's turn the package scenario around. Suppose you actually want "what would be" access path information for the package's statements. In other words, you want to know what the access paths for the statements would be if they were generated now. Why would you want to know that? Well, maybe there's been a change in the Db2 environment that would prompt you to rebind a package (maybe many packages) in order to get a performance boost - maybe some new indexes have been defined on tables accessed by statements in the package, or maybe catalog statistics have been enriched so as to get better-performing access paths for some of the package's statements (and those statistics-enrichment actions may have been suggested by Db2 itself), or maybe you've migrated to a new version of Db2. You don't want to issue REBIND package with EXPLAIN(YES), because that would actually cause the package to be rebound, and maybe you don't want to do that before seeing what the access paths for the package's statements would be as a result of the rebind. You could issue EXPLAIN for individual statements of the package, but that could be pretty laborious, especially for a package with quite a few statements. What to do?

Simple: you rebind the package with EXPLAIN(ONLY), an option introduced with Db2 10 (and available for BIND PACKAGE as well as for REBIND PACKAGE commands). When EXPLAIN(ONLY) is specified for a BIND or REBIND PACKAGE command, a new package is not generated by Db2; instead, Db2 only generates EXPLAIN information for the package's statements, and it does that as though you had issued EXPLAIN for the individual statements - that is to say, Db2 goes through standard EXPLAIN processing for the package's statements, showing you what the access paths for those statements would be if they were generated right now. With that information in hand, you can make your decision as to whether or not you want Db2 to actually generate a new instance of the package via a REBIND PACKAGE command.


Db2 12: getting "what is" information for stabilized dynamic statements

Db2 12 introduced an important new feature, called dynamic plan stability, whereby Db2 can persist the prepared form of a cached dynamic SQL statement to a new (with Db2 12) catalog table called SYSIBM.SYSDYNQRY. What that means: you can stabilize the access path for a dynamic query, even across stop/starts of a Db2 subsystem, because if a stabilized query comes into the system and its prepared form is not in the dynamic statement cache (as would be the case if this were the first issuance of the query following a stop and start of the subsystem, with the dynamic statement cache initially being empty after the -START DB2), Db2 will not re-prepare the statement - instead, Db2 will retrieve the prepared form of the statement from the SYSDYNQRY catalog table and load that structure into the dynamic statement cache.

That's great, when you want to avoid the performance instability that can result from reoptimization of a query, but it also would seem to present an access path analysis challenge. We know that access path information is available (via EXPLAIN STMTCACHE STMTID) for a statement in the dynamic statement cache, but it is possible (as just pointed out) that a stabilized dynamic statement is not in the dynamic statement cache. What then? How can we, in that case, obtain access path information for the stabilized query?

No problem. You get that information using this statement (with the new part, as before, highlighted in red):

EXPLAIN STABILIZED DYNAMIC QUERY STMTID 179 COPY 'CURRENT';

Here, the STMTID is not the ID assigned by Db2 to the statement when it is inserted into the dynamic statement cache - it is, rather, the persistent statement ID generated for the statement by Db2 when the statement is stabilized by way of the new (with Db2 12) command -START DYNQUERYCAPTURE (this is the value placed in the SDQ_STMT_ID column of the query's row in the SYSDYNQRY catalog table). Execution of the EXPLAIN statement shown above extracts, from the SYSDYNQRY catalog table (actually, from an associated auxiliary table in a LOB table space), the access path information for the query as it was when the query was initially stabilized (that being the access path information stored with the prepared form of the statement in the dynamic statement cache when the query was last optimized - the information that would be extracted via EXPLAIN STMTCACHE STMTID for a statement currently in the dynamic statement cache).

And there you have it. The access path information you want, Db2 has (when you want "access path is" information) or can generate (when you want "what would be" information). You just need to know how to get it. And now you do.