In the part 1 of this 2-part entry (posted last month), I emphasized what I consider to be job one for a developer coding a Db2 for z/OS-targeted query (write a SQL statement that will retrieve the data your program requires, and don't worry too much about the statement's performance - that's mostly taken care of by Db2), while also noting ways in which a developer can effectively work in partnership with a DBA to enhance the performance of a Db2-based application. In this part 2 entry I will focus on leveraging application-enabling features of Db2 for z/OS.
First, what interface is right for a Db2-accessing application?
For a long time, this was not a very meaningful question, as there was essentially one application interface to Db2 for z/OS: the SQL interface. Sure, there could be discussions around dynamic versus static SQL (i.e., SQL statements prepared by Db2 for execution when initially issued by a program, versus statements that are pre-prepared for execution via a Db2 process known as "bind"), or about the use of "generic" (i.e., non-DBMS-specific) SQL forms such as JDBC and ODBC, but in any case you were talking about SQL, period. That changed with Db2 12 for z/OS, which introduced Db2's REST interface. Using Db2's REST interface does not involve compromising performance, security or scalability, so I'd say use it when it makes sense. When might it make sense? Here are some considerations:
- Client-side programming language flexibility - The IBM Data Server Driver, which supports the use of SQL forms such as JDBC and ODBC with Db2 for z/OS, can be used with applications written in a number of programming languages (generally speaking, programs coded with embedded static SQL statements don't require a driver), but suppose your team wants to use a language for which the IBM Data Server Driver is not applicable? Well, does the language allow a program to issue a REST request? If the answer to that question is, "Yes" (often the case), programs written in that language can access Db2 via its REST interface.
- Total abstraction of the particulars of the service-providing system - If you're using a generic SQL form such as JDBC or ODBC, you don't have to know the specifics of the relational database management system being accessed, but you still know that your program is accessing a relational DBMS (or something that through virtualization software appears to be a relational DBMS). Maybe you don't want to have to know that (even if you have strong SQL skills) - you just want to request some service and have it performed as expected by some system, and you don't care a whit about what that system is and how it does what it does. In that case, the REST interface to Db2 looks really good.
- Separation of programming duties - When your client-side program accesses Db2 using REST requests, your program isn't issuing SQL statements - you're instead coding REST requests that invoke server-side SQL statements that were likely written by someone else. That separation of programming duties - client-side developers code programs that issue REST requests and process any returned results, and server-side developers write the Db2 SQL statements and/or stored procedures that will be REST-invoked - might suit you (and your organization's IT leadership) just fine.
- Temporal data support (introduced with Db2 10 for z/OS) - This capability, through which a time dimension can be added to data in a table, comes in two forms (both can be implemented for a given table, if desired, or one or the other can be used):
- System-time temporal (also known as row versioning) - The way this works: when a row in table T1 is deleted or updated, the "before" image of that row (the row as it existed prior to the update or delete operation) is automatically inserted by Db2 into the "history" table associated with T1, and Db2 also updates timestamp values in the history table row indicating when that "version" of the row first became the current version (i.e., when the row was first inserted into T1, or when it was updated to that version) and when the row stopped being the current version (i.e., when it was deleted from T1, or when it was updated to a new version). Here's what this means: using pretty simple Db2 temporal query syntax, your application can easily find out from Db2 what the row for a given entity (a bank account, an insurance policy, a customer record, whatever) looked like at a previous point in time. For example, when it comes to insurance claim adjudication, what's important is not what the policy holder's coverage is now - what's important is the coverage that was in effect when the event prompting the claim occurred. That's easy to determine with Db2 row versioning functionality. Also easy is seeing how a given entity's row in a table changed over a period of time, and who made changes to the row.
- Business-time temporal - This form of Db2 temporal functionality lets you add future data changes to a table with an indication of when that change will go into effect and how long it will be in effect (if not indefinitely). For example, you could through a business-time temporal UPDATE statement indicate (and this will be reflected in the target table) that the price of product XYZ is going to change from $10 to $12 on May 1 of next year. Updates of this nature will not impact programs that, by default, are accessing rows that, from a business perspective, are currently in effect. Having future prices (for example) in a product table provides at least a couple of benefits: 1) it ensures that price changes will actually go into effect when scheduled, and 2) it allows financial analysts to issue queries that will show what revenue and profits will be with prices that will be in effect at a future date.
- Db2 transparent archiving (introduced with Db2 11 for z/OS) - This feature can be helpful, especially for performance, in this scenario: table T1 has 20 years (for example) of historical transaction data (maybe related to account activity), but the vast majority of accesses to the table target rows that have been added in the past 90 days; further, because the table's row-clustering key is not continuously-ascending and because row inserts far outnumber row deletes, the "popular" rows in the table (the ones not more than 90 days old) are physically separated from each other by ever-larger numbers of "old and cold" rows (the ones rarely accessed by programs). In that case the performance for access to popular rows will get progressively worse, and the cost of administering the table (e.g., backing it up, periodically reorganizing it, etc.) will steadily increase. When Db2 transparent archiving is activated for the table T1 (easily done by a DBA), T1 will end up holding only the most recent 90 days of data (the popular rows), while all of the "old and cold" rows are physically stored in the "archive" table that - in an "under the covers" way - is associated with T1 (result: substantially better performance for access to the popular rows, because they are concentrated in a smaller table); and, for query purposes Db2 makes the base table and its associated archive table appear logically as a single table, so a query referencing only the base table can retrieve archived rows as needed; and, when a row is deleted from the base table (after it's been there for - in this example - 90 days), that row is automatically inserted by Db2 in the associated archive table.
- Result set pagination (introduced with Db2 12 for z/OS) - Db2 12 made a new clause, OFFSET, available for a SELECT statement. OFFSET, used in combination with the FETCH FIRST n ROWS clause, makes it programmatically easier to return a multi-row result set in "pages" that a user can scroll through.
- "Piece-wise" DELETE (introduced with Db2 12 for z/OS) - Starting with Db2 12, you can use the FETCH FIRST n ROWS clause with a DELETE statement, and that makes it really easy to write a program that will remove a large number of rows in a table in a series of small units of work (so that your data-purge program will not acquire too many locks at one time - nice for concurrency of access with other programs targeting the table).
- Newer built-in functions - Db2 12 for z/OS added some nice built-in functions, including:
- PERCENTILE_CONT and PERCENTILE_DISC - These functions (the former treats values in a column as points in a continuous distribution of values, while the latter treats column values as discrete data values) are useful for writing a query that answers a question such as, "Show me the value that is the 90th percentile with regard to salaries of employees in department A01."
- LISTAGG - This function makes it easy to have a comma-separated list of values (e.g., last names of employees who have more than 10 years of service with the company, for a given department) as a column of a query result set.
- HASH_MD5 - With this function, you can use a SQL statement to get an MD5 hash of a value before inserting that value into a table (and there are three related built-in functions associated with other popular hashing algorithms).
- Application-specific lock timeout value (introduced with Db2 13 for z/OS) - Db2 13 provided the new CURRENT LOCK TIMEOUT special register, through which an application can set a lock timeout value that is different from the default lock timeout value for the Db2 system. Suppose, for example, that the system's default lock timeout value is 30 seconds. Maybe the development team for a mobile app that will drive accesses to a Db2 for z/OS database wants to make sure that a user will never have to look for 30 seconds at a spinning colored wheel on a mobile phone screen if a lock required by the app can't be readily obtained. The application team might decide (probably rightly) that it would be better to have a lock timeout value of 3 seconds for this app, have a condition handler in the Db2-accessing program for a lock timeout error, and in the event of a lock timeout (noticed in 3 seconds, versus 30 seconds) send an "Oops! Something went wrong - please try again" message to the user. I as a user would prefer that to looking at the spinning colored wheel for 30 seconds. Similarly, the development team for a long-running, mission-critical batch application might not want their job to time out unless a lock wait exceeds 10 minutes. Easily done with a SET CURRENT LOCK TIMEOUT statement.
- SQL Data Insights (introduced with Db2 13 for z/OS) - This feature represents the first embedding of advanced machine learning technology in the Db2 for z/OS "engine." It's easy for a DBA to set up and easy for a developer (or a user utilizing a query tool) to use, because the data science work was done by IBM development teams. SQL Data Insights is made usable in the form of three new built-in Db2 functions (more will be delivered via future Db2 13 function levels): AI_SIMILARITY, AI_SEMANTIC_CLUSTER and AI_ANALOGY. These new functions allow for the asking of what I like to call "fuzzy" queries. Here's one example: suppose the fraud analysis team at an insurance company finally caught someone who had been submitting fraudulent claims (and had been very good at covering his tracks). Among the company's several million other policy holders, who else might be engaging in hard-to-detect fraudulent activity? Via the AI_SIMILARITY built-in function of Db2, you can (using standard query syntax for a built-in function) easily code a query that will do this: "Hey, Db2. Here is the ID of a policy holder. Show me the IDs of the 20 other policy holders who are most like this one." And here's the kicker: in coding that query, you don't have to tell Db2 what you mean by "like this one." Db2 will detect patterns of similarity in the data in the specified table - patterns that a human being might be challenged to discern - and return the rows with the highest "similarity scores" in relation to the policy holder ID provided as input to the function. You can turn that list over to the fraud detection team and say, "Hey, guys. Do a deep-dive analysis of activity for the policy holders associated with these 20 IDs - they are the ones most similar to the fraudster that we recently caught."
No comments:
Post a Comment