Friday, June 30, 2023

Db2 for z/OS: What I Would Say to Application Developers (Part 1)

Not long ago, I received a request to deliver a couple of Db2 for z/OS-focused webcasts for an organization's application developers. The person who asked about this initially gave me the impression that the purpose of the webcasts would be to help developers write "efficient SQL." This did not have much appeal for me (as I'll explain below), and I communicated as much back to the requester. Subsequently, this individual rephrased the request, indicating that the aim of the webcasts would be to provide "insights for developers to increase their confidence and skills around Db2 [for z/OS] in both development and problem analysis." "OK," I thought to myself, "Now we're talking." This ask gave me an opportunity to think about what I'd like to say to people who write (or might write in the future) application programs that involve accessing Db2 for z/OS-managed data. I'll share these thoughts of mine in a two-part blog entry. In this first part I'll talk about application performance - but maybe not in the way you'd expect. In the part two entry, which I'll likely post in the next 2-3 weeks, I'll focus on application enablement from a Db2 for z/OS perspective.

OK, why will a request to talk to developers about "writing efficient SQL" generally rub me the wrong way? Two reasons: first, something I heard a few years ago. I was at a big Db2-related conference, sitting in the audience for a session delivered by the person who was at that time the leader of the optimizer team in the IBM Db2 for z/OS development organization (Db2's optimizer parses a query and generates for that query the access plan that it estimates will produce the requested result set at the lowest cost and with the best performance). The presenter said (in words to this effect, and with emphasis added by me), "As the leader of the Db2 for z/OS optimizer team, my message for application developers is this: job one for you is to write the query that will retrieve the data that your program needs. If that query could be written differently so as to retrieve the same result faster, we'll take care of that." What he was specifically referring to is the Db2 optimizer's ability to re-write a query under the covers so that the same result will be generated faster (more on that re-write capability momentarily). That statement by the optimizer team leader made a huge impression on me, and I think his words were absolutely spot-on.

I feel that it's very important for an application developer, when writing SQL targeting a Db2 for z/OS database, to focus on a query's objective, versus its form. Why? For one thing, job one really is to get the right data. If a query returns incorrect or incomplete data to a program, who cares if the query runs quickly? A bad result that is returned in a short time is still a bad result. Nothing is more important than retrieving the data that a program requires. Secondly, I believe it's very important for a developer writing Db2-targeting SQL to not have to think about the fact that the target DBMS is Db2 for z/OS. All you as an application developer should really have to think about is that the target DBMS is relational in nature. If you have to stop and think, "Oh, let's see - the data I'm going after is in a Db2 for z/OS database. That means I have to do X, Y and Z in order to get good performance," that's going to negatively impact your productivity, assuming that you're also called on to write SQL that targets other relational DBMSs. As far as I'm concerned, when Db2 for z/OS is the target DBMS you should just think, "relational DBMS," and go from there.

Here's another reason that a request to "tell developers how to write efficient Db2 for z/OS SQL" raises my hackles: too many Db2 for z/OS DBAs, in my opinion, just assume that the average application developer writes inefficient SQL. It's kind of like complaining about the food at college just because it's college food, regardless of whether or not it's actually tasty. That's not a helpful attitude. I've advised Db2 for z/OS DBAs that they should think of themselves as partners with developers when it comes to getting new applications and new application functionality into production. Similarly, I would advise developers to be partners with Db2 for z/OS DBAs when it comes to analyzing and addressing performance issues related to Db2 for z/OS-accessing applications.

How can a developer be a partner when it comes to taking action to resolve performance issues related to Db2 for z/OS-targeted queries? Some thoughts on that matter:

  • Learn some of the lingo. Sometimes, Db2 for z/OS DBAs will say things like, "This SQL statement isn't performing well because it has this stage 2 predicate?" Huh? OK, here's what that means: predicates (the result set row-qualifying parts of a query, such as WHERE ACCOUNT_NUM = 1234) in Db2 for z/OS SQL statements can be either stage 1 or stage 2 in nature. These terms refer, respectively, to two components of Db2 for z/OS: the data manager (stage 1) and the relational data system (stage 2). A stage 1 predicate can be evaluated by the Db2 data manager, while a stage 2 predicate has to be processed by the Db2 relational data system. Stage 2 predicates generally require more CPU time for processing than stage 1 predicates; furthermore, stage 1 predicates are usually index-able, while stage 2 predicates are almost never index-able. That index-able versus non-index-able characteristic of a query might result in a requirement for a table space scan when the query is processed, and that could really slow down query execution, especially when the table in question is really large. On the other hand, a stage 2 predicate in a query might not be a big deal if the query has another predicate or predicates that are highly filtering (i.e., that are evaluated as "true" for only a small number of a table's rows) and index-able.

If you're interested, you can read about stage 1 and stage 2 and index-able and non-index-able predicates on this page of the Db2 for z/OS online documentation. Do you need to sweat a lot about stage 1 versus stage 2 predicates when writing SQL statements for an application? I'd say, not necessarily. Remember that job one is to write a query that returns the data that your program needs. On top of that, Db2 for z/OS, especially over the most recent several versions, has gained more and more query re-write capabilities (as I mentioned previously). Suppose, for example, that you need to get from a Db2 for z/OS table all customers whose accounts were opened in 2010, and you write a predicate like this one to get those rows: 

WHERE YEAR(DATE_OPENED) = 2010

That predicate is stage 2 and non-index-able; however, Db2 for z/OS, in preparing your query for execution, can automatically re-write that predicate in this form, which is stage 1 and index-able:

WHERE DATE_OPENED BETWEEN '2010-01-01' AND '2010-12-31'

  • Know something about what's possible for improving a query's performance. If a query processed by Db2 for z/OS is not performing as it needs to, re-writing the query in some way is one possible solution, but there may be other performance-boosting actions that could be taken instead. One possible solution could be a Db2 for z/OS index created on an expression - something do-able since Db2 Version 9 (as of the date of this blog post, the current Db2 for z/OS version is 13). Suppose, for example, that your program needs rows selected from a Db2 for z/OS table based on an upper-case comparison of values in column COL1 with a provided character string constant. Your query might have a predicate that looks like this:

WHERE UPPER(COL1, 'EN_US') = 'ABCDE'

That predicate is stage 2 and non-index-able; however, it will become index-able if an index is defined on that expression, as shown below (assume that COL1 is a column of table T1):

CREATE INDEX UPPER_VAL ON T1
(UPPER(COL1, 'EN_US'))

Another possible query tuning action is to provide the Db2 optimizer with the catalog statistics that it needs to choose a better-performing access plan for the query. In fact, colleagues of mine who are part of the team in IBM Support that works on cases (i.e., problem tickets) related to Db2 for z/OS query performance have said that the large majority of query performance issues on which they work are ultimately resolved in this manner. Here's the deal: the primary input to Db2 for z/OS access path selection is statistics about objects related to the query - e.g., tables, and indexes on those tables, and columns in tables - that are periodically collected (often by a Db2 for z/OS utility called RUNSTATS) and stored in the Db2 catalog. The richer and more complete the catalog statistics are, the better the Db2 optimizer can do in generating a well-performing access plan for a query. The tricky part is this: what statistics should be gathered for tables, columns, indexes, etc. to enable the optimizer to choose a well-performing access path for a particular query? Would histogram statistics on a given column help? How about frequent-value percentage information for a column? How about correlation statistics for a certain pair of table columns? Telling RUNSTATS to generate every possible statistic on everything would make that utility too costly to execute, so the utility is often executed with a specification that generates what you might call a good "base" of statistics (TABLE(ALL) INDEX(ALL) is typically the specification used for this purpose). How do you know when the optimizer needs additional statistics - and which additional statistics - in order to generate a well-performing access plan for a query that is currently performing poorly? Fortunately, starting with Db2 12 for z/OS the optimizer answers this question for you in the form of statistics profiles, as described in an entry I posted to this blog a few years ago. I'd say, if a query you wrote is not performing as it needs to then before trying to re-write the query or asking a DBA to add or alter an index to address the situation, see if Db2 has inserted a statistics profile in the SYSTABLES_PROFILES catalog table for one or more of the tables accessed by your query (as described in the aforementioned blog entry). If there is such a statistics profile or profiles, work with a DBA to get RUNSTATS executed using the profile(s) and then let Db2 re-optimize the query using the statistics added to the catalog by that RUNSTATS job (for a so-called static query, re-optimization would be accomplished via a REBIND of the associated Db2 package; for a query that is dynamic in the Db2 sense of that word, re-optimization is triggered by invalidating the previous prepared form of the query in Db2's dynamic statement cache). In plenty of cases this will resolve a query performance issue.

  • Know something about EXPLAIN. EXPLAIN is a Db2 SQL statement (also an option of the BIND and REBIND PACKAGE commands, for static SQL statements) through which you can get information about the access path selected by the optimizer for a query (you can read about the EXPLAIN statement in the online Db2 for z/OS documentation). EXPLAIN-generated access path information, in its traditional form, is written to the EXPLAIN tables (these are Db2 tables, as you might expect). The most important of these tables is the one called PLAN_TABLE. Information in this table shows, among other things, the order in which the parts of a query are executed (for example, the order of table access when a statement involves a multi-table join), how data in a table is accessed (e.g., through an index and, if so, which index), the number of columns in an index key that are a match for columns referenced in a query predicate (MATCHCOLS - a higher number is generally a good thing), and the type of join method used when tables are joined (e.g., nested loop or merge scan). If you know something about the information in PLAN_TABLE, you'll be better equipped to partner with a Db2 for z/OS DBA to see how execution of a query that is not performing as desired could potentially be sped up.
A lot of veteran Db2 for z/OS DBAs are very familiar with EXPLAIN information in its PLAN_TABLE form. If you're interested in viewing EXPLAIN information in a different form, consider using the Visual Explain feature of the IBM Db2 for z/OS Developer Extension for Visual Studio Code (a no-charge IBM software tool designed to facilitate development of applications that access Db2 for z/OS data). Visual Explain (as the feature's name implies) provides a visual representation of the access path selected by the Db2 optimizer for a query; and, it's not just pretty pictures - hovering over or clicking on a part of the displayed access path provides very useful related information. Among other things, you can see the number of result set rows that Db2 thinks there will be following execution of that part of the access plan. For a query that is not performing as desired, you might see such information and think, "That's not right. After accessing that table the result set should have way more (or way fewer) rows than indicated by this estimate." The implication here is that you know something about the data that Db2 doesn't know (thus the Db2 optimizer's off-the-mark estimate concerning refinement of the result set as the query's access plan is processed). That, in turn, could suggest that catalog statistics might need to be augmented to provide Db2 with a clearer view of the characteristics of data in a target table (as mentioned in the reference, above, to Db2's SYSTABLES_PROFILES catalog table); or, that might prompt you to consider adding or modifying a query predicate to provide Db2 with a different form of the data request - one that might generate the required result set more quickly.

The bottom line is this: while your primary focus in writing Db2 for z/OS SQL statements should be on retrieving the data your program requires (and I'm focusing on queries because INSERT, UPDATE and DELETE operations are usually more straightforward from a performance perspective), there could be a situation in which a query you coded needs to execute with greater speed and efficiency. Query performance tuning is something to which many Db2 for z/OS DBAs are accustomed, but success in that endeavor can be accelerated and made more likely when developers and DBAs work on this as a team. Don't worry about knowing as much about Db2 for z/OS as your DBA teammate - that's not your job; but, realize that your understanding of your application's data requirements, and of the data the application is accessing in Db2, can be a big help when it comes to tuning a query's performance. It's definitely a case in which 1 (your specialized knowledge as a developer) plus 1 (the DBA's specialized knowledge of Db2 for z/OS) is greater than 2.

In the part 2 of this blog entry I'll have some things to say about application enablement in a Db2 for z/OS context - that is, about ways that you can leverage Db2 functionality to get more feature-rich applications developed more quickly.