Thursday, July 25, 2013

DB2 for z/OS: Clearing Up Some Matters Pertaining to LOB Inlining

Plenty of folks know that LOB inlining is one of the really great enhancements delivered with DB2 10 for z/OS (it's something about which I blogged last year). I've noticed lately, however, that a fair number of people have some misconceptions regarding two important aspects of LOB inlining, namely the relationship between inlining and LOB table spaces, and scenarios in which LOB inlining is and is not a good idea from an application performance perspective. In this entry I'll aim to clear up these misunderstandings.

LOB table spaces: You need them. Period (but DB2 can help)

Some DBAs have this idea that LOB inlining can eliminate the requirement that there be an auxiliary table (and associated table space and index) for every LOB column in a table definition (and that's actually one auxiliary table per LOB column per partition, if the base table space is partitioned). This line of thinking is understandable: If the largest (length-wise) value that will go into a LOB column is smaller than the inline length specified for that column, so that all of the column's values can be 100% inlined, there's no need for an auxiliary table for that column, right?

WRONG. You MUST have an auxiliary table (and associated LOB table space and index) if you are going to have a LOB column in a base table, even if the values in the LOB column will always be 100% inlined. If these objects do not exist, you will not be able to insert any data into the table because the definition of the table will be considered by DB2 to be incomplete. So, avoiding the need for an auxiliary table and a LOB table space and an index is NOT a reason to go with LOB inlining, because LOB inlining does nothing to change the requirement that these objects exist before the table with the LOB column can be used.

Now, creating one measly auxiliary table, and one LOB table space to hold that table, and one index on the auxiliary table (used to quickly access a LOB value associated with a particular base table row) is not exactly a tall order, but recall that you need one set of these objects per partition and per column if the base table is partitioned and has multiple LOB columns. Do the math, and the DDL work can start to look kind of intimidating. What if a base table has 1000 partitions and two LOB columns? Ready to create 2000 auxiliary tables, and the same number of LOB table spaces and indexes?
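To put numbers on that "do the math" step, here is a minimal sketch of the arithmetic in Python. The function name and its parameters are my own, chosen for illustration; the only rule it encodes is the one stated above, namely one auxiliary table, one LOB table space and one auxiliary index per LOB column per partition.

```python
def lob_object_counts(partitions: int, lob_columns: int) -> dict:
    """Count the LOB-related objects a base table needs.

    Assumes one set of objects (auxiliary table, LOB table space,
    auxiliary index) per LOB column per partition. For a base table
    space that is not partitioned, pass partitions=1.
    """
    sets_needed = partitions * lob_columns
    return {
        "auxiliary_tables": sets_needed,
        "lob_table_spaces": sets_needed,
        "auxiliary_indexes": sets_needed,
    }


# The example from the text: 1000 partitions, two LOB columns.
counts = lob_object_counts(partitions=1000, lob_columns=2)
print(counts)
```

For the 1000-partition, two-column case, each of the three counts comes out to 2000, which is 6000 objects in total.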

Before you get all wound up about such a prospect, consider that DB2 can automatically create required LOB-related objects for you when you create a table with one or more LOB columns. DB2 will do that if EITHER of the following is true:
  • The CREATE TABLE statement for the base table (the table with the LOB column(s)) does NOT include an "in database-name.table-space-name" clause. In that case, DB2 will implicitly create the database for the base table space, the base table space itself, and all other objects needed to make the base table usable (e.g., a unique index on the table's primary key if the CREATE TABLE statement designated a primary key, and all objects needed for LOB data if the CREATE TABLE statement included one or more LOB columns).
  • The CREATE TABLE statement for the base table DOES include an "in database-name.table-space-name" clause, and the value of the DB2 special register CURRENT RULES is 'STD' at the time of the execution of the CREATE TABLE statement.

Performance: When LOB inlining helps, and when it doesn't

As I see it, the primary ways in which LOB inlining delivers benefits in terms of application performance and resource-efficiency are as follows:
  • Disk space savings, if a high percentage of a table's LOB values can be 100% inlined in the base table. In such a situation, the disk space requirement for LOB data is reduced in two ways: 1) Compression. Data in a LOB table space cannot be compressed by DB2; however, inlined LOB data values will be compressed by DB2, along with non-LOB data in the base table space (provided that the base table space is defined with COMPRESS YES). 2) More efficient use of data page space. In a LOB table space, a given page can hold data belonging to only one LOB value. If, for example, a LOB table space has 8 KB-sized pages, and a particular LOB value is 9 KB in length, the first 8 KB of the LOB value will be in one page and the last 1 KB will be in a second page in the LOB table space. The rest of that second page in the LOB table space (7 KB of space) will remain empty because it cannot be used to hold data for any other LOB value. In a base table space, of course, there is no such rule, so inlined LOB data can lead to more efficient use of space in data pages.
  • Improved performance of INSERT and SELECT operations (for SELECTs retrieving LOB data), when most LOB values can be 100% inlined in the base table. The performance gains here can be quite significant versus the non-inlining case.
  • Ability to create an index on an expression on the inlined portion of a LOB column. Such an index would be created using an expression based on the SUBSTR function. This could be very useful if, for example, you store a type of document as a CLOB and a value of interest (maybe a department number) always appears in characters 10 through 14 of the document. You could build an index on a SUBSTR expression on the inlined portion of the LOB, and therefore be able to very quickly zero in on rows containing documents pertaining to department 'AB123' (I posted an entry about DB2's index-on-expression capability -- introduced with DB2 9 for z/OS -- to the blog I maintained while I was working as an independent DB2 consultant prior to re-joining IBM).
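The page-space point in the first item above is easy to check with a little arithmetic. What follows is a rough sketch in Python, not a model of DB2's actual page format: the function name is hypothetical, and the calculation ignores page headers and other internal overhead. It assumes only the rule stated above, that a page in a LOB table space holds data for one LOB value.

```python
import math


def lob_pages_and_unused_kb(lob_length_kb: int, page_size_kb: int) -> tuple:
    """Return (pages used, KB left unused) for one LOB value.

    Simplifying assumption: a LOB page is never shared between LOB
    values, and page header overhead is ignored.
    """
    pages = math.ceil(lob_length_kb / page_size_kb)
    unused_kb = pages * page_size_kb - lob_length_kb
    return pages, unused_kb


# The example from the text: a 9 KB LOB value in 8 KB pages.
pages, unused_kb = lob_pages_and_unused_kb(lob_length_kb=9, page_size_kb=8)
print(pages, unused_kb)
```

For the 9 KB value in 8 KB pages, this gives two pages with 7 KB unused, matching the example in the list.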

Clearly, LOB inlining can be used very advantageously in some cases. In other cases, LOB inlining could negatively impact application performance. Here are some potential disadvantages of LOB inlining:
  • Performance degradation for INSERT and SELECT operations (for SELECTs retrieving LOB data) when most LOB values cannot be 100% inlined in the base table. Performance would be negatively impacted because DB2 would have to go to both the base table and the auxiliary table for most inserts and retrievals of LOB data.
  • Performance degradation for SELECTs that DO NOT retrieve LOB data. When you inline LOB data, you make the base table rows longer (sometimes much longer). As a result, you'll have fewer base table rows per page, and because of that you'll get a lower buffer pool hit ratio for the base table. That means more disk I/Os, and that will impact elapsed time.
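To get a feel for the rows-per-page effect described in the second item, consider this back-of-the-envelope sketch in Python. All of the numbers are made up for illustration (a 4 KB page, 200-byte rows, 1000 bytes of inlined LOB data per row), the function name is hypothetical, and the calculation ignores page and row overhead, so treat the results as rough indications only.

```python
def rows_per_page(usable_page_bytes: int, row_length_bytes: int) -> int:
    """Approximate rows per page, ignoring page and row overhead."""
    return usable_page_bytes // row_length_bytes


PAGE_BYTES = 4096      # hypothetical 4 KB page, overhead ignored
BASE_ROW_BYTES = 200   # hypothetical row length without inlined LOB data
INLINED_BYTES = 1000   # hypothetical inlined LOB bytes per row

rows_before = rows_per_page(PAGE_BYTES, BASE_ROW_BYTES)
rows_after = rows_per_page(PAGE_BYTES, BASE_ROW_BYTES + INLINED_BYTES)
print(rows_before, rows_after)
```

With these assumed figures, a page that held about 20 rows holds about 3 after inlining, so a buffer pool of the same size caches far fewer rows of the base table.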

Here's the bottom line: If LOB data will be stored in a table but only rarely retrieved, inlining probably isn't a good idea unless it is very important to improve INSERT (or LOAD) performance (and that won't happen unless most LOB values can be 100% inlined in the base table). If you go with inlining for this reason, consider enlarging the buffer pool to which the base table space is assigned so the buffer pool hit ratio won't be negatively impacted. LOB inlining can have a very positive impact on the performance of queries that retrieve LOB data, so if that is important to you then inlining can be a very good move when most of the LOB values can be 100% inlined in the base table. Again, consider enlarging the base table space's buffer pool so that queries that do not retrieve LOB data won't be negatively impacted by a reduced buffer pool hit ratio.

DB2 10 is by far the best release of DB2 in terms of LOB data management capabilities, and LOB inlining is an important part of that story. The important thing to keep in mind is that LOB inlining is not a universally applicable DB2 feature. Figure out first if inlining makes sense for your particular situation.

9 comments:

  1. Great article. I was just having this conversation with a developer yesterday.

    1. I'm glad to know that you're talking with folks about this, Troy.

      Robert

  2. I have an index on expression on an NPI, and the table is not in need of a REORG. But I am consistently seeing multiple read operations (via an I/O trace), in this case 3, on an insert into this unique index. As you can imagine, 3 read I/Os on one index starts to consume a lot of DB2 time. I'm averaging about 70 inserts per second, which seems extremely low. Question: If it's not in need of a REORG, are PCTFREE and/or FREEPAGE my only options? The buffer pool hit ratio is low, but I cannot justify increasing the pool size for one job.

    1. Three read I/Os from one index for each insert seems pretty high. First, if the index has three levels, that suggests that you are getting ZERO buffer pool read hits for pages of the index when performing table inserts. That, in turn, suggests a way-undersized buffer pool (as it would indicate that even the index's root page can't stay cached in the pool). Even if the index has four levels, it would appear that your buffer pool (the one to which the index is assigned) is way too small, as it would seem in that case that no pages other than the index's root page are held for any length of time in the buffer pool. Either make the pool significantly larger or create a new pool of significant size and reassign the index to that new pool (feasibility of these options would depend on the system having sufficient real storage resources to support a larger buffer pool configuration).

      Second, if the table is not in need of a REORG, the index might be. Consider running an index-only REORG for the index.

      Third, larger FREEPAGE and PCTFREE values for the index could improve performance by reducing the incidence of index page splits resulting from insert operations.

      Finally, I don't know that you need to be running I/O traces to analyze this situation. Using a DB2 monitor, the standard DB2 for z/OS accounting and statistics traces should give you the information you need for analysis of the insert process in question.

      Robert

  3. We have a small table with only 3 rows, but the application program has been running late since we introduced a CLOB column in that table. Could you please suggest what the possible cause of this performance degradation might be?

    1. I do not have any suggestions, aside from checking your DB2 performance monitor to determine the nature of the slowdown.

      Robert

  4. I created a table with a column defined as blob_image blob, and by default inline length(0) gets added. What is this inline length? Why is it getting added automatically? I'm not able to understand the inline length concept given in the articles.

    1. LOB inlining is explained on this page of the online Db2 for z/OS documentation: https://www.ibm.com/docs/en/db2-for-zos/12?topic=performance-improving-lob-data. The information on that documentation page also explains that the default inline length for a LOB column is 0.

      Robert

    2. That was very useful. Thank you!
