Tuesday, November 16, 2010

An Update on the LASTUSED DB2 Catalog Column

In a couple of entries that I posted to the blog I maintained while working as an independent DB2 consultant (one written this past summer, and the other in the spring of 2009), I mentioned the LASTUSED column of the DB2 for z/OS catalog table SYSIBM.SYSINDEXSPACESTATS. This column was introduced with DB2 9 for z/OS (indeed, the SYSINDEXSPACESTATS table itself was introduced with DB2 9 -- it's the moved-to-the-catalog version of the formerly user-defined INDEXSPACESTATS table, which holds index-related real-time statistics). LASTUSED provided -- finally -- an easy way to identify indexes that are just taking up space and which, therefore, should be candidates for being dropped (and potentially replaced with indexes that actually help query performance). You don't think that finding useless indexes was a challenging task before LASTUSED? It wasn't, if all of your SQL was static -- you'd just look at the SYSPACKDEP catalog table to see if any packages were dependent on a given index. That technique, however, didn't help in environments in which dynamic as well as static SQL statements were executed. How could you determine whether or not an index helped the performance of dynamic SQL statements? Believe me, this was no easy thing. LASTUSED changed that situation. It shows the last date on which an index was used by an SQL DML statement (a static or dynamic SELECT, FETCH or "searched" UPDATE or DELETE -- the latter term referring to UPDATE and DELETE statements containing one or more predicates) or for referential integrity constraint enforcement purposes (you wouldn't want to be searching for foreign key matches without an index on said foreign key). The column has a default value, and if you see that default value in a row for an index in SYSINDEXSPACESTATS, chances are that index is not boosting the performance of any SQL statements.

It's that default value that I want to talk about. See, a whole lot of mainframe DB2 people (including myself, up until about two days ago) think that the default value of the LASTUSED column is '0001-01-01' (i.e., January 1 of the year 1). Believe it or not, the default value of LASTUSED has NEVER been '0001-01-01'. It is, and always has been, NULL. So, if you're running DB2 9 in New Function Mode, and you do a SELECT on SYSIBM.SYSINDEXSPACESTATS, and you see '----------' in the LASTUSED column for an index (a string of dashes is the typical representation of the null value when you query the catalog using a tool such as SPUFI), that's an index that hasn't been used lately to performance-enhancing effect (but give it a little more time, and then do a little checking around before dropping the seemingly useless index, just to make sure that it really isn't needed).

I'll close this entry with an interesting piece of related information: DB2 9.7 for Linux, UNIX, and Windows delivered the LASTUSED catalog column for users of that DB2 platform (it's a column of several catalog views, including SYSCAT.INDEXES). The default value of the column? '0001-01-01'. Gotta love it.

6 comments:

  1. This is very useful information to reveal not used indexes using 'lastused' column value as null.The column ‘lastused’ is updated every time a select/fetch, update/delete is done. However, the column is not updated for inserts and loads.

    ReplyDelete
  2. Hello, Mohit.

    The LASTUSED column doesn't have to be updated for INSERTs and LOAD operations, because indexes aren't USED for those operations (aside from the clustering index, obviously, being "used" to find the target data page for a row insert). On INSERT and LOAD, indexes are MAINTAINED versus being USED, if by "used" you mean "utilized to speed up identification of predicate-qualified rows, or to speed the checking of referential integrity constraints," and that's the meaning of the LASTUSED column about which I wrote.

    Hope this helps.

    Robert

    ReplyDelete
  3. I should add that LASTUSED is updated as a result of INSERT activity, when the insert causes DB2 to use an index to check on a referential integrity constraint (e.g., to check on the existence of a parent table primary key value when a row is inserted into a child table).

    Robert

    ReplyDelete
  4. Thanks ! Robert for sharing this information.
    I meant the last time that an index
    was used to process an SQL statement.

    ReplyDelete
  5. Robert,
    A co-worker asked me whether there is ANY use of an index that wouldn't show up in LASTUSED. I suggested 2: enforcing uniqueness, and supporting a primary key (admittedly, it's pretty hard to imagine a primary key index that's never used). Are these two cases accurate? Do you know of any others?

    Thanks!
    Kathy

    ReplyDelete
  6. Kathy,

    Apologies for the very late response to your comment -- I was laid up for a while.

    I ran your question by one of the IBM DB2 for z/OS developers and received in response some interesting information (so, I'm addressing your question from a DB2 for z/OS perspective).

    You're right in that two of the reasons to NOT drop a DB2 index, even though the value of LASTUSED for that index in the SYSINDEXSPACESTATS catalog table is NULL, are uniqueness enforcement and primary key enablement. The DB2 developer brought up another couple of issues that had not occurred to me:

    1) When checking on LASTUSED values for indexes, take index creation date into consideration. In some packaged application environments, database objects (including indexes) are dropped and recreated on a regular basis. When an index is dropped and recreated, the LASTUSED value is reset. So, make sure that a LASTUSED = NULL situation does not exist simply because the index was recently dropped and recreated (again, this may happen more than you'd think in DB2 systems used with vendor-supplied packaged applications).

    2) The mere presence of an index will, by default, cause the DB2 RUNSTATS utility to gather some column statistics that could be very important for the performance of some queries (i.e., the stats are a very important factor in the generation by the DB2 optimizer of well-performing access paths for certain queries), even if the index itself is not used for data access. Consider, for example, that a DEFAULT correlation statistics specification for the RUNSTATS utility (and the same is true with respect to the inline stats you can get as part of the execution of a LOAD or REORG job) is:

    FREQVAL NUMCOLS 1 COUNT 10 MOST

    This will cause RUNSTATS to gather column value distribution stats for the first column of EVERY index key associated with the target table (specifically, RUNSTATS will determine the percentage of table rows that contain each the 10 most frequently occurring of the column's values -- this obviously is relevant only for non-unique columns).

    If you drop an index, those stats will go away. Will the removal of the stats (not the index) negatively impact performance for any queries? If so, you'll need to modify your RUNSTATS (or LOAD or REORG) control statement to specify (assuming that the lead column of the dropped index is COL1):

    COLGROUP(COL1) FREQVAL COUNT 10 MOST

    The same goes for duplicate value distribution stats that you might have requested explicitly (as opposed to their being generated by default) on utility control statements. Suppose, for example, that you have a correlation stats specification of:

    FREQVAL NUMCOLS 3 COUNT 10 MOST

    In that case, if the index is dropped and the resultant removal of those stats causes some query performance degradation, you'll need to replace that correlation statistics specification with a COLGROUP specification.

    The above consideration might also apply to the FIRSTKEYCARDF and FULLKEYCARDF statistics that are gathered by default when you run RUNSTATS (or LOAD or REORG with inline stats).

    Hope this helps.

    Robert

    ReplyDelete