Wednesday, September 28, 2022

Db2 for z/OS: GETPAGEs, Predicates, SELECTs and DELETEs

About 15 years ago, I posted to the blog I maintained while working as an independent consultant (prior to rejoining IBM in 2010) an entry in which I named GETPAGEs the most important factor with regard to the CPU consumption of a Db2 for z/OS application workload. GETPAGEs, which are requests by Db2 (typically on behalf of application or user processes) to access pages of table spaces or indexes, are indeed a prime driver of SQL statement execution cost. That being the case, people who work on tuning Db2 query performance often aim to reduce the GETPAGE activity associated with executing a query. That work, in turn, tends to focus on a query's predicates (the clauses of the query that determine which rows will qualify for the query's result set). Can an index be added or modified to enable index-level (versus table-level) row filtering for a given predicate? Can a non-indexable predicate be rewritten to be indexable? Can a predicate be added to further refine the query's result set? And so on.

The thing is, the predicate focus of query tuning could lead one to believe that the same GETPAGE-reducing actions would yield similar results for any SQL statement containing predicates, regardless of whether the statement is a SELECT or, say, a DELETE. That is not the case, especially for DELETEs versus queries, and the difference basically boils down to one thing: indexes. The same indexes that reduce GETPAGE activity for a query can make GETPAGE counts stubbornly high for a DELETE statement, in spite of tuning actions related to the DELETE statement's predicates. A Db2 SQL programmer recently ran up against this reality. He asked me about it, and I think his situation could be instructive for others.

The SQL programmer (I'll refer to him as R - the first letter of his first name) was analyzing the performance of a row-deleting process that would remove a few hundred thousand rows from a table in a given execution. Thinking that reducing GETPAGE activity would lower the CPU cost of the process, and approaching the tuning effort for the process's DELETE statement as one would approach tuning the performance of a query (thinking about an analogous query that would have the same predicates as the DELETE statement of interest), R had a DBA create on the target table a new index with a key comprised of the five columns referenced in a series of ANDed "equals" predicates in the DELETE statement (in other words, the DELETE had the series of predicates WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?, and the new index had the key C1 | C2 | C3 | C4 | C5). That would make for a MATCHCOLS value of 5 for the DELETE statement (MATCHCOLS being a column of the access path-explaining Db2 PLAN_TABLE), right? And that should reduce GETPAGE activity for the DELETE by enabling a greater degree of row filtering at the index level, right? Well, not exactly, as things turned out: R was surprised to see that the new index had very little impact on the GETPAGE count for the DELETE statement. What was going on here?
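
To make that concrete, here is a sketch of the statement and the index (the table and index names are invented for illustration; the column names come from R's example):

    -- The DELETE statement being tuned (hypothetical table name):
    DELETE FROM MYSCHEMA.TARGET_TABLE
     WHERE C1 = ? AND C2 = ? AND C3 = ? AND C4 = ? AND C5 = ?;

    -- The new index R had the DBA create, keyed on the five predicate columns:
    CREATE INDEX MYSCHEMA.IX_C1_TO_C5
        ON MYSCHEMA.TARGET_TABLE (C1, C2, C3, C4, C5);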

The index-impact result that surprised R comes down to essentially one thing, that being the fundamentally different nature of DELETE versus SELECT statements. Yes, both can have predicates, but that doesn't mean that a DELETE with predicates can be thought of as a "query" - DELETEs and SELECTs are like apples and oranges when it comes to their execution characteristics. A SELECT returns values (specifically, rows of values), and that's it. A certain index might sharply reduce GETPAGE activity for a query by reducing the need for Db2 to examine table space pages in order to find qualifying rows (in fact, an index could virtually eliminate table space GETPAGEs for a query, if all columns referenced in the query - in the query's select-list and in its predicates - are part of the index's key). A DELETE, on the other hand, changes pages, many of which - and this is of critical importance - will be pages in index spaces (excepting the unusual, but not unheard-of, situation in which a table has no indexes).
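
As a hypothetical sketch of that difference (names invented): with an index on columns (C1, C2), the SELECT below can be satisfied entirely from the index, while a DELETE with the identical predicate cannot:

    -- Every column this query references is in the (C1, C2) index key,
    -- so it can run index-only (INDEXONLY = Y in PLAN_TABLE) -
    -- potentially zero table space GETPAGEs:
    SELECT C2
      FROM MYSCHEMA.TARGET_TABLE
     WHERE C1 = ?;

    -- A DELETE with the same predicate can never be "index-only":
    -- qualifying rows must be removed from the table's pages, and the
    -- corresponding entries must be removed (pseudo-deleted) from EVERY
    -- index defined on the table.
    DELETE FROM MYSCHEMA.TARGET_TABLE
     WHERE C1 = ?;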

Given this aspect of DELETE processing, not only will adding an index to a table potentially have little impact on GETPAGE activity for a DELETE statement - it might even increase GETPAGE activity for the DELETE. Think about it: for every row removed from a table by a DELETE statement, an entry has to be removed from each and every index defined on the table (yes, it's actually a "pseudo-delete" of index entries, with the relevant entries just marked for later physical deletion, but this still involves index GETPAGEs). Not only that, but Db2 very often can't just march through an index's leaf pages, deleting index entries as corresponding rows are deleted from the underlying table - not when a given index's keys have a way-different sequence relative to the sequence in which table rows are being deleted. Maybe, because of matches on predicate columns, a DELETE statement is guided to rows qualifying for deletion by index X, but index Y, on the same table, may have a key whose ordering is very different from that of index X's (i.e., the two indexes' key values correlate little, if at all, on a row-by-row basis). In that case, finding the entry in index Y to delete as part of deleting a table row could well require an index probe operation (i.e., a top-down traversal of index Y, from root page to leaf page). If that kind of thing is happening for several indexes on the table, the number of GETPAGEs for a DELETE statement could be several times larger than the number of rows deleted; and, that's not because the DELETE has a "bad" access path - it's because the statement is a DELETE and not a SELECT.
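
To put rough numbers on that (a purely hypothetical illustration): suppose 200,000 rows are deleted from a table with three indexes, and for two of those indexes the key sequence correlates poorly with the sequence of row deletion, so removing each entry requires a probe of a 4-level index (one GETPAGE per level):

    200,000 deletes x 2 indexes x 4 GETPAGEs per probe = 1,600,000 index GETPAGEs

That is eight times the number of rows deleted, before counting GETPAGEs for the index that guided the DELETE or for the table space itself.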

Bottom line: comparing GETPAGE counts between SELECT and DELETE statements is not useful or meaningful, even if the statements have identical predicates. If you have a row-delete process that is consuming more CPU than you'd like, what can you do about it? Here are a couple of possibilities:

  • Reduce - don't increase - the number of indexes on the target table. As I've mentioned, it's a truism that while indexes can boost query performance, they tend to increase CPU and elapsed time for DELETE statements. That being the case, a good way to boost DELETE performance is often to find and remove indexes on the target table that are not doing any good. I call this "index trimming," and I provided some information on that topic in an entry I posted to this blog some years ago (I'd ignore the part of that blog entry that deals with hash-organized tables - that Db2 feature was deprecated with function level 504 of Db2 12 for z/OS). A catalog query that can help identify trimming candidates is sketched after this list.
  • Consider using the DISCARD option for online REORG. Especially when the row-delete criterion (or criteria) is not overly complex, and can be expressed in the form of a predicate or predicates referencing the target table, executing an online REORG of the table's table space with the DISCARD option can be an attractive way to efficiently remove a large number of rows from the table with minimal disruption of application access to the data (there will always be at least a brief - sometimes just a few seconds - period of no data access, when a REORG job enters the SWITCH phase near end-of-execution). A sample control statement is sketched after this list.
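
Regarding index trimming, Db2's real-time statistics can show when an index was last used. Here is a minimal sketch, assuming the target table is in a database named MYDB (that name and the 180-day threshold are arbitrary illustrations):

    -- Indexes in a given database that Db2 has not used since a cutoff
    -- date - candidates for investigation, not for automatic removal
    -- (LASTUSED is maintained in the real-time statistics catalog table):
    SELECT CREATOR, NAME, LASTUSED
      FROM SYSIBM.SYSINDEXSPACESTATS
     WHERE DBNAME = 'MYDB'
       AND LASTUSED < CURRENT DATE - 180 DAYS;

And here is a sketch of a REORG with DISCARD (a utility control statement; all names and the WHEN condition are hypothetical):

    REORG TABLESPACE MYDB.MYTS
      SHRLEVEL REFERENCE
      DISCARD FROM TABLE MYSCHEMA.TARGET_TABLE
        WHEN (STATUS = 'EXPIRED')

Rows that satisfy the WHEN condition are discarded as part of the REORG, so the row removal piggybacks on work the utility is doing anyway.
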
I hope that this information will be useful for you. Don't confuse DELETE-statement tuning with SELECT-statement tuning. Two different animals, as we say.

9 comments:

  1. Robert,

    Thank you so much for the post!!

    Thanks,
    Ravikumar

  2. Robert,

    Thank you so much for creating this post in response to my previous request!!

    Based on my post https://www.idug.org/discussion/db2-help-needed-to-understand-getpage-activity-in-a-delete-query,

    I am trying to understand the GETPAGEs against the "new index" alone. Please note that I am not talking about the total GETPAGEs of the DELETE query, which include the GETPAGEs against the primary index, the new index and the table space.


    Without new index "run" - number of GETPAGEs
    --------------------------------------------
    PRIMFEK0 index - 1,339,424
    TEMT_ITM_ABC_TEMP table - 18,382

    With new index "run" - number of GETPAGEs
    --------------------------------------------
    SECDKEY1 index - 1,345,556
    PRIMFEK0 index - 25,182
    TEMT_ITM_ABC_TEMP table - 20,472

    From Plan_Table
    ---------------
    In "without new index" run,
    ACCESSTYPE = I
    MATCHCOLS = 1
    ACCESSNAME = PRIMFEK0
    INDEXONLY = Y
    PREFETCH = D

    In "with new index" run
    ACCESSTYPE = I
    MATCHCOLS = 5
    ACCESSNAME = SECDKEY1
    INDEXONLY = N
    PREFETCH = L

    Both the primary index and the new index have 6 levels. "81" DELETE statements were executed in the test runs.
    I understand that in the "with new index" run, Db2 has to delete entries from both indexes as well as from data pages, so that run will experience more GETPAGEs.

    Number of rows deleted in both the runs is 238,035.

    Since the DELETE query matches the first 5 columns of the new SECDKEY1 index, I was expecting reduced GETPAGEs against the new index alone. Please note that I am not talking about the total GETPAGEs of the DELETE query.


    In the worst case, I was expecting 238,035 GETPAGEs against leaf pages of the "new index," plus some GETPAGEs against non-leaf pages. But the GETPAGE count against the SECDKEY1 index was 1,345,556. This is what I am trying to understand.

    Thanks,
    Ravikumar

  3. 1,345,556 divided by 238,035 is almost 6 (actually, a little less than 5.7). That suggests a very large number of top-to-bottom traversals of the 6-level index - almost one such traversal (known as an "index probe") for each entry deleted. The list prefetch access indicated may be a contributor to this behavior. Db2 accesses the index to find the row IDs (RIDs) of qualifying rows (i.e., rows to be deleted), then sorts that list in ascending RID order, then prefetches table pages using the sorted RID list and proceeds to delete rows. A table row is always physically deleted first, and then the associated entry in each index is accessed and marked for later deletion (that's known as "pseudo-delete" of index entries - they'll be asynchronously and automatically physically deleted later). If an index's key order is significantly different from the table's clustering key order, deletion (pseudo-deletion) of entries in the index can require a lot of index probes when rows are deleted in physical order (as they will be when accessed via a sorted RID list).

    Adding an index to a table will often do little to reduce the cost of a DELETE operation. It can help if locating to-be-deleted rows would otherwise require scanning a lot of index and/or table pages. If the first column of the PRIMFEK0 index is of relatively high cardinality, that index may already have been useful in doing a lot of row-qualifying filtering at the index level.

    Robert

    Replies
    1. Robert,

      Thank you so much!!..That helps me!!

      Thanks,
      Ravikumar

  4. Robert,

    The DELETE statements were executed right after loading the table space, to make sure no rows had been deleted before the testing, and I was the only person using the table during the testing.

    Only "81" DELETE statements were executed during the test run.

    I assume top-to-bottom traversals of the 6-level new "SECDKEY1" index would have happened 81 times.

    Further, I assume that the leaf pages are in sorted key order and that each leaf page has a pointer to the next leaf page.

    During the execution of each DELETE statement, once the first matching leaf page of the "SECDKEY1" index is located via a top-to-bottom traversal, Db2 may sequentially prefetch subsequent leaf pages until the key value changes.


    Since 238,035 rows were deleted, I was expecting far fewer GETPAGEs against the "SECDKEY1" index alone. But in reality, 1,345,556 GETPAGEs were issued against the SECDKEY1 index alone.

    This is what I am trying to understand.

    Thanks,
    Ravikumar

    Replies
    1. The number of DELETE statements does not matter. The number of rows deleted matters. The SECDKEY1 index appears to have been used to find the row IDs (RIDs) of the rows to be deleted. Those RIDs were sorted in ascending RID sequence and the pages with qualifying rows were list-prefetched. For each row deleted, Db2 has to find and delete the associated entry in the SECDKEY1 index. That will typically require a top-down traversal of the SECDKEY1 index FOR EACH ROW DELETED if either or both of the following are true:
      * The order of keys in the SECDKEY1 index is not closely correlated with the order of the table's clustering index.
      * There are quite a few rows between each row deleted and the next row deleted (i.e., a good number of pages are between each deleted row and the next).

      Robert

    2. Robert,

      Thank you so much!!..That helps me!!

      Thanks,
      Ravikumar

  5. REORG with DISCARD is great as a simple "clean up" tool, with a couple of "gotchas":

    The table should not be a parent in an RI (referential integrity) relationship; discarding parent rows will throw the child tables into CHECK PENDING status, which could affect availability.

    The table should not have LOBs > 32K, because the discard data set cannot be defined as spanned. I have personally suggested to Haakon that the UNLOAD code for handling VBS data sets be copied to REORG, and I entered an enhancement request some years ago, but no dice.

    If a pure delete of rows with LOBs is required (as opposed to wanting to save the deleted data for history or archive), there is a JCL trick to get this to work, but it is a trick that UNLOAD could kibosh at any time (it hasn't so far).

    Michael Harper, TD Bank
