Monday, August 30, 2021

Db2 for z/OS: Package-Related Things of Which You Should be Aware

Db2 for z/OS packages have been around for a long time - since, I believe, V2R3, back in the early 1990s. Even so, I've found myself talking more about packages to people lately than I have in quite some years. These discussions - conducted in meetings, in phone calls and through email - have highlighted for me some related topics and items of information of which more folks should be aware. I've packaged (no pun intended) these informational nuggets in this blog entry, and I hope you'll find them to be useful. Without further ado, and in no particular order...

Many Db2 for z/OS packages are associated with static SQL, but "static SQL" is a term not universally understood by people who work with Db2.  Sure, Db2 DBAs are familiar with the concept of static SQL. So, too, are application developers who code SQL statements in certain programs - typically, programs that 1) execute in the same z/OS LPAR as the target Db2 system, 2) are written in COBOL, and 3) execute as batch jobs or CICS transactions (though they could be stored procedure programs, and might in that case be written in SQL PL).  Application developers who code Db2-accessing SQL statements in JDBC or ODBC form may not be so familiar with the static SQL concept. For those folks: a static SQL statement is one that has already been prepared for execution, via the Db2 BIND process, before it is ever even issued for execution by a program. A Db2-related SQL statement that is not static is referred to as dynamic - it is prepared for execution when Db2 first sees it coming in from an application program (statement preparation can be relatively expensive, so the prepared form of a dynamic SQL statement is stored in-memory by Db2 in its dynamic statement cache - that prepared form of the statement can be reused if the same dynamic statement comes along again). SQL in JDBC or ODBC form is dynamic from the Db2 perspective. If a Db2-accessing program contains static SQL statements, that program has an associated package, and the package in that case can be thought of as the compiled, executable form of the program's static SQL statements.

Dynamic SQL statements are also associated with packages. Any SQL statement executed in a Db2 for z/OS system will always have a package associated with it. If it's a dynamic SQL statement, the associated package will likely be one of the IBM Data Server Driver packages (sometimes referred to as the Db2 Connect packages - entitlement to use the IBM Data Server Driver is by way of a license for Db2 Connect).

Db2 packages are associated with collections. As far as I'm concerned, this is one of the best ways to conceptualize a collection: a package's collection is kind of like a table's schema name (i.e., its high-level qualifier). Does that mean that the same package can be in more than one collection? Yes - and hold onto that thought.

SQL statements of the DDL variety are also associated with packages. As previously mentioned, any Db2-targeting SQL statement will be associated with a package. That being the case, a DDL statement (data definition language, including ALTER and CREATE) will be issued by way of a package, same as a DML statement (data manipulation language, including SELECT, INSERT and UPDATE). Here's where this sometimes trips up even an experienced Db2 DBA: starting with Db2 12 for z/OS, a package's APPLCOMPAT value affects DDL as well as DML statements (with Db2 11, APPLCOMPAT was relevant only for DML statements). Want to ALTER a table with a KEY LABEL specification, to leverage the data set encryption feature of z/OS? Make sure that the APPLCOMPAT value of the package through which the ALTER TABLE statement will be issued (that might be a SPUFI or a DSNTEP2 package) has an APPLCOMPAT value of V12R1M502 or higher, because the KEY LABEL option of ALTER TABLE (and ALTER STOGROUP) was introduced with Db2 12 function level 502.
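An easy way to check whether you're exposed to this particular gotcha is to see what APPLCOMPAT value your SPUFI packages carry, and rebind if needed. Here's a hedged sketch (the SPUFI package is typically named DSNESM68, in collections such as DSNESPCS and DSNESPRR - verify the names at your site):

SELECT COLLID, NAME, APPLCOMPAT
FROM SYSIBM.SYSPACKAGE
WHERE NAME = 'DSNESM68';

REBIND PACKAGE(DSNESPCS.DSNESM68) APPLCOMPAT(V12R1M502)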

Access path changes: don't forget the PLANMGMT safety net, and don't cut a hole through that safety net if you don't have to. For a static SQL statement, access path selection (e.g., will a certain index be used in the evaluation of a predicate, will two tables be joined using the nested loop or merge scan or hybrid join method, etc.) is done at package bind time (for a dynamic SQL statement, access path selection is done when the statement is issued by an application program, unless the statement was previously prepared and its prepared form is still in Db2's dynamic statement cache). If a static SQL-containing package is rebound, access paths for the package's SQL statements could change (the APREUSE option of REBIND PACKAGE can be used to tell Db2 to stick with a package's existing SQL access paths, if possible). If a rebind action results in access path changes, the expectation is that the changes will be beneficial from a performance perspective, but that is not always the case. Sometimes, an access path change will have a negative impact on performance (incomplete or inaccurate catalog statistics, for example, could cause the Db2 query optimizer to choose an access path that appears to be good for performance but in fact is not). In such a case, the plan management feature of Db2, which is "on" by default (check the value of PLANMGMT in ZPARM), provides a safety net: it causes Db2 to retain the previous instance of a package when the package is rebound. In the event of a performance degradation caused by a rebind-related change in one or more query access paths, the previous and better-performing instance of the package can be quickly restored via another REBIND PACKAGE action with SWITCH(PREVIOUS) specified. That's great, but sometimes an organization will cut a big hole in this safety net. How? By unnecessarily doing BIND REPLACE for packages, versus REBIND. BIND REPLACE is needed when a program's static SQL statements are modified, because that necessitates a new precompile of the source program, which generates a new DBRM (a file containing the program's static SQL statements) that subsequently has to be bound to replace the prior version of the related package. If a BIND REPLACE results in a performance problem that is of an access path nature, REBIND with SWITCH(PREVIOUS) is not an option, because that restores the previous instance of the same version of the package, and the BIND REPLACE (assuming it was preceded by a Db2 precompile of the program source) generates a new version of the package. Here's where the "unnecessary" comes in: quite often, an organization will have a process whereby a static SQL-containing program will be re-precompiled and BIND REPLACE-ed whenever a program's source is changed, even when the source change does not change the program's static SQL statements. When source code other then SQL is changed, a new precompile and an associated BIND REPLACE is not necessary. Moral of the story: don't do program re-precompiles and BIND REPLACEs when you don't need to do that. And while I'm on the topic of unnecessary package-related actions...
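(A quick illustration of the two REBIND forms just mentioned, before getting to that - the collection and package names below are placeholders. The first asks Db2 to stick with existing access paths if possible; the second restores the previous, better-performing instance of the package.)

REBIND PACKAGE(COLL1.PROGABC) APREUSE(WARN)

REBIND PACKAGE(COLL1.PROGABC) SWITCH(PREVIOUS)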

Don't do unnecessary rebinds of packages. At many Db2 for z/OS sites, certain Db2 table spaces are REORGed quite frequently (maybe on a weekly basis), and catalog statistics are regenerated for the table spaces, and packages with static SQL statements targeting tables in the table spaces are rebound. DON'T DO THAT (the package rebinds, I mean), absent a really good reason. Here's the thing: if you have a package of well-performing static SQL statements, keep in mind that one of the really good things about static SQL is that the good performance is locked in, from an access path perspective. Why mess with that? Leave your well-performing packages alone. REORGing and RUNSTATing table spaces from time to time? Good idea. Rebinding dependent packages after running those utilities? Not such a good idea. Am I saying that you shouldn't ever rebind packages? Of course not. Just do it when it makes sense. When does it make sense? Well, it's a good idea to rebind all of your packages (and plans, too) soon after migrating to a new version of Db2, while still at the 100 function level (what we used to call conversion mode), because a) the new version's optimizer enhancements are available at that time, and b) even when access paths don't change, the package code generated by the new version of Db2 (packages are executable code) is likely to be at least a little more CPU-efficient than the code generated for the same package by the previous version of Db2. It also makes sense to rebind a package when you want a query associated with the package to use a newly created index on a table (that would of course require query re-optimization, which a rebind does). And certainly you'd rebind if you wanted or needed to change a package characteristic such as its RELEASE setting (COMMIT or DEALLOCATE) or its isolation level (e.g., cursor stability or uncommitted read) or its APPLCOMPAT value. Bottom line: have a reason for rebinding a package or packages (and "Because we just REORGed and RUNSTATed a table space on which the package depends" is usually not a good reason).
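In other words, tie a rebind to an objective. For example (names and values below are placeholders), a rebind done specifically to change package characteristics might look like this:

REBIND PACKAGE(COLL1.PROGABC) RELEASE(DEALLOCATE) APPLCOMPAT(V12R1M510)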

Remember that EXPLAIN information for a package is always available. Do you need to get access path information for a package that was bound without a specification of EXPLAIN(YES)? No problem: just use the EXPLAIN PACKAGE statement (this will extract access path information from the package itself - that information has been part of a package's structure since Db2 9 for z/OS). Want to know what a package's access path would be, if the package were to be rebound? No problem: just rebind the package with EXPLAIN(ONLY) - you'll get the access path information, but Db2 won't actually regenerate the package. More information about EXPLAIN PACKAGE and rebind with EXPLAIN(ONLY) can be found in an entry I posted to this blog a few months ago.
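A hedged sketch of the two approaches (collection and package names are placeholders - check the exact EXPLAIN PACKAGE syntax for your Db2 level):

EXPLAIN PACKAGE COLLECTION 'COLL1' PACKAGE 'PROGABC' COPY 'CURRENT';

REBIND PACKAGE(COLL1.PROGABC) EXPLAIN(ONLY)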

Get different behaviors for the same package by using different bind specifications and different collections. I mentioned previously that a collection is somewhat akin to a schema name for a package. Just as tables that are logically identical and have the same unqualified name can be in different schemas, so packages that are generated from the same SQL statements can be in different collections. This can be very handy when you want different behaviors for the same package in different situations. For example: suppose that in some cases you want queries associated with package PROGXYZ to be executed with cursor stability isolation, while in other cases you want the same queries to be executed with uncommitted read isolation. No prob: bind the package with ISOLATION(CS) into collection CSCOLL (or whatever name you want to use for the collection) and with ISOLATION(UR) into collection URCOLL (and that second bind action might be a BIND COPY from CSCOLL into URCOLL, with the different ISOLATION value specified). How do you "point" an application that uses the PROGXYZ package to the CSCOLL or the URCOLL collection, as needed? For a static SQL package used with (for example) a batch job or a CICS transaction, you could do this with a plan's PKLIST. You could also let the program itself operate the "switch in the tracks" by using the SET CURRENT PACKAGESET special register; or, for applications that are DRDA requesters you could point to one collection or another by way of the Db2 profile tables (and here we could be talking about alternate collections of the IBM Data Server Driver / Db2 Connect packages). You could turn statement concentration (automatic parameterizing by Db2 of dynamic query predicates that reference literal values) on or off for different applications in this way, or use different APPLCOMPAT values, or provide a different high-level qualifier for unqualified table names in SQL statements, or do a number of other things (to get the ideas flowing, check out the list of package bind options in the online Db2 documentation).
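Here is a hedged sketch of that isolation-level scenario, using the made-up names from the example above - a BIND COPY into the second collection, followed by the program-side "switch in the tracks":

BIND PACKAGE(URCOLL) COPY(CSCOLL.PROGXYZ) ISOLATION(UR) ACTION(REPLACE)

SET CURRENT PACKAGESET = 'URCOLL';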

OK, that's enough package stuff for one blog entry. I hope that this information will be helpful for you. As always, thanks for stopping by.

Friday, July 30, 2021

CREATE OR REPLACE: Agile Deployment of Db2 for z/OS Stored Procedures

A relatively recent Db2 for z/OS enhancement that might have escaped your notice may well merit your consideration. Here's what I'm talking about: Db2 12 function level 507 (available since June of 2020, related to APAR PH24371) introduced CREATE OR REPLACE syntax for stored procedures. In a Db2 for z/OS context, CREATE OR REPLACE syntax was initially provided for Db2 12 advanced triggers, and it eliminated what had been a real irritant with regard to modifying a trigger. Extending CREATE OR REPLACE to stored procedures delivers a big benefit that I can sum up in two words: deployment agility.

Agility in development and deployment of new application functionality has very much grown in importance over the past few years, to no one's surprise. In an age of increasing enterprise digitalization, getting new capabilities and services into the hands of your customers and users in a timely and frequent way can be a major competitive differentiator (or, in the case of a public sector organization, a source of citizen delight). Anyone who works with and advocates for Db2 for z/OS wants that DBMS to be fully plugged into an enterprise's DevOps pipeline, and that makes agility of deployment of the Db2 for z/OS objects associated with an application a priority. Suppose a Db2 for z/OS-accessing application utilizes some Db2 stored procedures, and a new iteration of the application is going to require modification of some of those stored procedures. Does that create any issues that could cause some friction on the deployment front? Quite possibly, yes, prior to Db2 12 function level 507.

How could a need to modify some Db2 stored procedures put some sand in the deployment gears? It comes down to the pre-function-level-507 options for deploying a new or modified stored procedure in "the next-higher-level" environment (e.g., dev to test, test to prod). Let's say that we're deploying to the production Db2 for z/OS environment, and let's say further that the stored procedures are of the native SQL variety (i.e., they're written in SQL PL), because it's for such procedures that CREATE OR REPLACE syntax is particularly helpful. OK, how do you get those modified native SQL procedures deployed to production in a pre-function-level-507 environment? You could get the job done using the BIND PACKAGE command with the DEPLOY option, but that approach is less than ideal from an agility perspective. Why? Because it relies on a Db2 command (BIND PACKAGE) versus a SQL statement. A Db2 for z/OS DBA might say, "What's the diff? I can use a Db2 command as easily as I can use a SQL statement." Sure you can, if you're using a tool like SPUFI (the TSO/ISPF-based interactive interface that comes with Db2), but DevOps engineers are not Db2 DBAs, and they're not SPUFI users. DevOps engineers like to use tools that work with all of the systems and servers across which an application's components are deployed. In other words, they have in mind a unified DevOps pipeline that can be used for all of an enterprise's applications. The tools generally used for such a pipeline are much more suited to issuing SQL statements than DBMS commands.
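For reference, the command-based approach looks roughly like this (the location, collection, package and version names below are placeholders), and it's precisely this reliance on a command, rather than a SQL statement, that can be awkward for a generic DevOps toolchain:

BIND PACKAGE(PRODLOC.PRODCOLL) DEPLOY(TESTCOLL.MYPROC) COPYVER(V1)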

OK, BIND PACKAGE with DEPLOY is not your only option in a pre-function-level-507 Db2 system. You could go the drop and re-create route, couldn't you? Drop the already-there stored procedures in the production environment, and re-create them with the desired modifications. DROP PROCEDURE and CREATE PROCEDURE are SQL statements, not Db2 commands, so this approach satisfies the agility requirement, right? Eh, not really. Why not? Well, here's one problem: after you drop and re-create the Db2 stored procedures, you have to re-grant associated authorizations (referring to the EXECUTE privilege that one or more IDs need on the stored procedures). That can be a hassle. Another problem with the drop and re-create approach: if a stored procedure that you want to drop is called by a SQL PL routine (such as a native SQL procedure), the DROP will fail. You could drop the calling SQL PL routine that was blocking the drop of the stored procedure, but then you'd have to re-create that routine in addition to creating the dropped stored procedure. And, what if the DROP-blocking SQL PL routine is itself called by another SQL PL routine? You'd have to drop and re-create that routine, too. As you can see, the situation can get pretty messy pretty fast.
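And don't forget the re-grant part of the hassle: after a drop and re-create, a statement along these lines (procedure and authorization ID names here are hypothetical) would have to be re-issued for every affected procedure and every ID that needs to call it:

GRANT EXECUTE ON PROCEDURE MYSCHEMA.MYPROC TO APPAUTHID;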

Enter CREATE OR REPLACE functionality, and things get easy peasy. I'll illustrate with an example. Let's say that we have this native SQL procedure already in the production Db2 environment:

CREATE PROCEDURE MYPROC1
   ( IN  P1 CHAR(5),  
     OUT P2 DECIMAL(15,2) ) 
  BEGIN 
     SELECT AVG(SALARY) INTO P2 
       FROM DSN8C10.EMP
      WHERE WORKDEPT = P1;
  END

OK, now I'm going to deploy a slightly modified form of this stored procedure (the slight modification: AVG(SALARY + 1000) instead of AVG(SALARY) in the SELECT). Function level 507 (or higher) is activated in my Db2 12 environment, and the package through which I'm issuing SQL statements (an IBM Data Server Driver package, or whatever) has an APPLCOMPAT value of V12R1M507 (or higher), so I can use CREATE OR REPLACE syntax, as shown below.

CREATE OR REPLACE PROCEDURE MYPROC1
   ( IN  P1 CHAR(5),  
     OUT P2 DECIMAL(15,2) ) 
  BEGIN 
     SELECT AVG(SALARY + 1000) INTO P2 
       FROM DSN8C10.EMP
      WHERE WORKDEPT = P1;
  END

What will Db2 do in this case? Db2 will check to see if a stored procedure named MYPROC1 already exists in the production system. One does (we created it earlier), so Db2 replaces the existing MYPROC1 procedure with the modified MYPROC1. After that replace action has been accomplished, do I need to re-grant authorizations for the MYPROC1 procedure? No! Why not? Because I didn't drop MYPROC1 (the word "DROP" appears nowhere in the SQL statement above). What if MYPROC1 is called by a SQL PL routine? Will that block the replace action? No! Why not? Because the SQL PL caller of MYPROC1 would block a DROP of MYPROC1, and I didn't drop MYPROC1 (I replaced MYPROC1, and a procedure replace action is not blocked by a SQL PL routine that calls the procedure being replaced).

Go forward in time a bit more, and we need to deploy yet another modified form of MYPROC1. This time, I want to add a version V2 of MYPROC1, as opposed to replacing the existing version of the procedure, so I specify VERSION V2 in my CREATE OR REPLACE statement (I can do that if the stored procedure is a native SQL procedure - can't if it's an external stored procedure written in a language such as COBOL or C):

CREATE OR REPLACE PROCEDURE MYPROC1
   ( IN  P1 CHAR(5),  
     OUT P2 DECIMAL(15,2) ) 
  VERSION V2
  BEGIN 
     SELECT AVG(SALARY + 5000) INTO P2 
       FROM DSN8C10.EMP
      WHERE WORKDEPT = P1;
  END

What will Db2 do this time? It will check to see if there is already a version V2 of MYPROC1 in this production system. There isn't, so Db2 creates version V2 of MYPROC1.
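A related note, in case you try this yourself: if the newly added version V2 is not the procedure's currently active version (the active version is the one Db2 uses when a CALL comes in without a version override), you can make it the active version with a statement like the following - a hedged sketch:

ALTER PROCEDURE MYPROC1 ACTIVATE VERSION V2;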

One more illustrative change: I want to modify version V2 of MYPROC1. To do that, I issue a CREATE OR REPLACE PROCEDURE statement for MYPROC1 that includes a reference to version V2 of the procedure:

CREATE OR REPLACE PROCEDURE MYPROC1
   ( IN  P1 CHAR(5),  
     OUT P2 DECIMAL(15,2) ) 
  VERSION V2
  BEGIN 
     SELECT AVG(SALARY + 9000) INTO P2 
       FROM DSN8C10.EMP
      WHERE WORKDEPT = P1;
  END

Db2 will check to see if there is already a version V2 of MYPROC1 in the production system. There is, so Db2 will replace version V2 of the procedure.

And there you have it: a SQL statement-driven (not command-driven) means of creating or replacing stored procedures in "the next-higher-level" environment that avoids the hassles associated with a drop and re-create approach. Big plus. Organizations have already been doing agile things with Db2 for z/OS stored procedures (especially the native SQL kind), like using their source code management tool of choice (GitLab, for example) to manage stored procedure source code (which, for a native SQL procedure, is the procedure's CREATE PROCEDURE statement), and using the IBM Db2 for z/OS Developer Extension for Visual Studio Code to give developers an assist in creating and debugging native SQL procedures. Add CREATE OR REPLACE PROCEDURE to that picture, and you take development and deployment agility for Db2 stored procedures (especially the native SQL kind) to a whole new level. Check it out.

Wednesday, June 30, 2021

Db2 for z/OS: REST Services Scalability

I received an email the other day from a Db2 for z/OS system administrator who had recently attended a workshop on the Db2 REST interface. He mostly liked what he heard, but he had a concern about the scalability of Db2 REST services. Knowing that REST requests are stateless in nature, he anticipated that a large volume of such requests targeting a Db2 for z/OS server would drive a large amount of activity involving connection processing, thread processing and user authentication and authorization. Wouldn't those areas of system operation end up being bottlenecks that would throttle transaction throughput for client applications?

The Db2 system administrator's concerns were certainly understandable, but a Db2 for z/OS system can in fact handle a very large volume of REST requests - several thousand per second should be do-able, assuming sufficient server-side processing capacity. In this blog entry, I'll provide information on how scalability is achieved for Db2 for z/OS REST services, referring to the three areas of associated server-side processing mentioned by the Db2 system administrator.


Connection processing

This is where the single most important step was taken by the IBM Db2 for z/OS developers responsible for the Db2 REST interface: they leveraged Db2's inactive connection functionality. A LONG time ago, Db2 got this functionality via the ZPARM parameter CMTSTAT. Starting with Db2 V8 for z/OS, the default value for CMTSTAT became INACTIVE, and here's what that means: say an application server establishes a connection to a Db2 for z/OS subsystem via that Db2 system's distributed data facility (DDF - the Db2 address space that handles communication with network-attached applications). A transaction associated with the connection is initiated, and the connection from the application to Db2 is paired with a DBAT (a DDF thread) for the purpose of SQL statement processing. The transaction completes, and the DBAT goes back in the DBAT pool, and the connection goes into an inactive state. This "inactive state" of the connection is strictly a Db2-side thing - it is not visible to the application, which in a logical sense continues to see that it is connected to the Db2 system. When the next transaction associated with the connection is submitted, the connection on the Db2 side is "snapped back" to an active state, and paired with a DBAT for SQL processing, and then the transaction completes and the connection is again separated from the DBAT and placed back in an inactive state, and so on and so on.

Why was inactive connection functionality built into Db2? Originally, this was done largely to enable a single Db2 subsystem to support a very large number of connections from network-attached applications - there can be up to 150,000 such connections to one Db2 subsystem (n times 150,000 for a Db2 data sharing group with n members). No way can one Db2 subsystem handle anything like 150,000 concurrently-executing DDF transactions (never mind the processing power you'd need - there wouldn't be enough "below-the-2GB-bar" virtual storage in the Db2 database services address space for that number of threads), but Db2 doesn't have to be able to process as many in-flight DDF transactions as there are connections to Db2, because at any moment in time the majority - often, the large majority - of those connections will be in a not-being-used-right-now state. This is why the value of the Db2 ZPARM parameter CONDBAT (maximum connections to a Db2 subsystem via DDF) is typically several times the value of MAXDBAT (maximum number of connections for which transactions can be concurrently in-flight). Db2 inactive connection functionality makes all this feasible because the CPU cost of flipping a connection from the inactive to the active state and back is very small, and the "footprint" of an inactive connection (virtual storage used, etc.) is very small.

OK, what does this have to do with the scalability of Db2 REST services? Well, you may be aware that REST requests are "stateless" in nature - each and every REST request is its own independent transaction (one unit of work, from a Db2 perspective), with no expectation or assurance of a related follow-on request from a given client-side application. With that being the case, you might think - like the Db2 system administrator to whom I previously referred - that there would be a whole ton of Db2 connect and disconnect activity associated with a large volume of REST requests. This isn't, after all, like the typical Db2 DRDA application scenario (DRDA - distributed relational database architecture - being Db2's protocol for distributed database processing) in which an application connects to a Db2 subsystem and stays connected to that subsystem for a long time (hours, maybe days). Connect and disconnect processing is kind of expensive. Wouldn't that cost put the kibosh on Db2 REST service scalability? It might have, but for an enhancement delivered a few years ago via the fix for Db2 APAR PI86868: when a client application sends a REST request to a Db2 subsystem, the associated connection will be retained by the Db2 subsystem, in an inactive state, for up to 15 seconds, so that it can be used again to process another request sent via that connection. When that next REST request associated with the connection (i.e., from the same client IP address and ephemeral port) comes along - assuming that that happens within the aforementioned 15-second window - the connection will be snapped from the inactive to the active state. The request will be processed, and the connection will again be flipped to the inactive state, where it can remain for up to another 15 seconds while Db2 waits for another REST request associated with the connection. If the inactive connection remains unused for 15 seconds it will be terminated by Db2.

In a high-volume REST request situation - exactly the scenario that would otherwise cause us concern - it is highly likely that Db2's temporary retention of connections from REST request-issuing applications will result in a lot of connection reuse (it doesn't take a whole lot of volume to get to one request every 15 seconds or less from one connection), and that is a key factor in the scalability of Db2 REST services.


Thread processing

In this area, Db2 REST service scalability is boosted by something that's been around (as I recall) since the early 1990s: the DBAT pool. When Db2 creates a DBAT to service a DDF transaction, it doesn't just chuck that DBAT when the transaction completes. Instead, it puts the DBAT in the DBAT pool so that it can be used again to service another transaction (a DBAT can stay in the pool for a number of seconds specified by the ZPARM parameter POOLINAC, which has a default value of 120 - if the DBAT has not been reused in that time, it will be terminated). If you use your Db2 monitor to generate a statistics long report (depending on the monitor product used, that might be called a statistics detail report), you should see a section in the report with a heading like GLOBAL DDF ACTIVITY. In that section you should see a couple of fields with names like DBATS CREATED and POOL DBATS REUSED. If we call the former field A and the latter B, the DBAT reuse rate is B / (A + B). It is very common, in a production Db2 environment with a significant DDF workload, to see a DBAT reuse rate that exceeds 99.9%. High rates of DBAT reuse have long been part of the Db2 DDF efficiency story at many sites, and they are an important contributor to Db2 REST service efficiency and scalability.
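To put some made-up numbers on that: if a statistics report showed DBATS CREATED = 500 and POOL DBATS REUSED = 2,000,000 for the reporting interval, the DBAT reuse rate would be 2,000,000 / (500 + 2,000,000) = 99.975%.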

[Fun fact: though you might think of "active" DBATs as those currently being used to service DDF transactions, and "inactive" DBATs as those in the pool, that is not an accurate conception. In fact, all DBATs, including those in the pool, are active. What you have are DBATs that are in a connected state, servicing DDF transactions, and DBATs that are in a disconnected state - the latter are the ones in the pool.]

Keep in mind that once a DBAT is provided for use in processing a REST request, the package associated with the requested service has to be allocated to the DBAT. Here, you can enhance efficiency by making it very likely that the package in question will be in-memory, in the oddly named skeleton pool (the package cache - part of the Db2 EDM pool). How can you check on this? Generate a statistics long report, using your Db2 monitor, and look for the section in which EDM pool activity is provided. In that report section, you'll see a value for something like PT REQUESTS (PT is short for package table - this is a request for a package) and another value for something like PT NOT FOUND (meaning, not found in the skeleton pool, in which case the package has to be retrieved from the Db2 directory on disk). The ratio of PT REQUESTS to PT NOT FOUND should be at least several thousand to one, and ideally multiple tens of thousands to one - if it isn't (and if the z/OS LPAR is not short on real storage), make the skeleton pool larger (the ZPARM parameter to adjust is EDM_SKELETON_POOL).
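Again with made-up numbers: PT REQUESTS of 60,000,000 against PT NOT FOUND of 1,500 over a reporting interval would be a ratio of 40,000 to 1 - comfortably in the "multiple tens of thousands to one" range.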


User authentication and authorization

Let me cover authorization first. What this involves is checking to see if the ID associated with a REST request has the Db2 EXECUTE privilege on the package that will be executed to provide the requested service. That's it - a good bit simpler versus checking the various privileges that might be required for execution of dynamic SQL statements (e.g., the SELECT privilege on table T1, the UPDATE privilege on table T2, etc.). This straightforward EXECUTE-on-package check is made particularly efficient by an in-memory cache of package authorization information maintained by Db2. I have seen on a number of occasions that a Db2 subsystem's package authorization cache, sized by way of the ZPARM parameter CACHEPAC, is smaller than it should be, leading to a sometimes high volume of accesses to the SYSPACKAUTH table in the Db2 catalog to check package execution authorization. That too-small situation will soon be (maybe already is, for your system) a thing of the past, thanks to the fix for Db2 APAR PH28280, which removes CACHEPAC from ZPARM and sets its value internally to the maximum size of 10,485,760 bytes.

The value of package authorization caching is particularly high when a relatively limited number of IDs are checked for package execution authorization, and that is often the case in a DDF application environment, whether the client applications are issuing REST requests or SQL statements - it is quite common for the ID in question to be an application's ID, as opposed to an end-user's ID (in such a case, the application performs authentication for end users on its end, and then interacts with Db2 on behalf of end users).

The user authentication story is much the same. When processing a DDF transaction (REST-related or otherwise), Db2 authenticates the associated ID (and accompanying password or certificate) by invoking the RACF macro RACROUTE REQUEST=VERIFY (and I recognize that you might use in your environment a different z/OS security manager from another vendor). If the ID is successfully authenticated, Db2 caches the credentials in-memory for up to three minutes. If, during that interval, the same ID needs to again be authenticated then Db2 will reuse the cached credentials and there will not be a need for another invocation of the RACROUTE REQUEST=VERIFY macro. This mechanism can majorly enhance user authentication efficiency and scalability related to Db2 REST requests, just as Db2's package authorization cache enhances authorization scalability for such requests.

And there you have it. With regard to connection processing, thread processing, and ID authentication and authorization processing, Db2 for z/OS provides mechanisms to deliver scalability and efficiency for REST service requests. You can use Db2's REST interface with confidence, even for high-volume applications.

Friday, May 28, 2021

The REST Interface: Db2 for z/OS as a Utility, and the Role of SQL Programmers

Recently, I was out on the West Coast of the United States to celebrate my son's graduation from college. At a party with some students and their parents, I conversed for a while with one of my son's good friends. This young man was about to get his degree in computer science. We talked shop, and I asked him about the code he'd typically write to interact with a data server. Perhaps he used a generic (i.e., non-DBMS-specific) SQL interface, such as JDBC or ODBC? No, he said, he really didn't know SQL that well. I was a little surprised (on the inside) to hear that. I mean, SQL is kind of fundamental, isn't it? Then I reframed my question: "OK - I guess you'd rather just invoke a data service via something like a REST API, and let that service do what you want done, without having to worry about how the service works or being concerned with any of the particulars of the back-end system on which the service executes?" "Yeah," he said. "More like that."

I know this friend of my son is a smart and hard-working guy. It's not at all like he's too lazy to write data-access code like SQL. He doesn't care to do that because he doesn't need to. He's part of a new generation of developers who see a universe of services of all kinds. Sometimes they'll be involved in the creation of new services. Often they'll look to discover and invoke existing services - data-related and otherwise - to provide application functionality needed by organizations and end-users. This change in the application-development mindset is gaining momentum, and it will have some fairly profound implications for people who administer Db2 for z/OS and other data-serving platforms. Earlier this year I posted a blog entry that focused on the nuts and bolts of Db2's REST interface. In this entry, I'm going to shift from the "how" of the Db2 REST interface to the "so what?"

A really big "so what" for Db2 for z/OS people will be the change in how the data server they support is perceived by application developers - particularly, developers on the client side of client-server applications. For growing numbers of such developers, Db2 for z/OS itself will be seen as, essentially, a utility. And I'm not using "utility" in the technical Db2 sense (e.g., the REORG utility); rather, I mean "utility" as in, "electric utility." Think about that. When you flip the light switch in a room, you expect electricity to flow and lights to come on. You probably don't think about where the electricity came from (maybe a power plant located some miles from your home), or how it was generated (maybe a natural gas-powered generator, maybe a wind farm, maybe a hydroelectric installation by a dam). Your "electricity interface" - the light switch - is easy to use.

If you go from your home to someone else's home, or to a hotel room or to a conference room in an office building, you easily locate the light switch because it looks like the ones in your home; and, you benefit from "consistency of electricity invocation": whether you're at home or in an office or wherever, you flip the switch up and electricity flows. Flip the switch down, flow stops and the lights go out.

Suppose you have an electrical outlet hooked to a switch. You have all manner of electrical devices - a music player, a blender, a printer. They can all be utilized through the same standard interface (the outlet). If you take these devices to another room in another building, the interface is the same: plug 'em in and use 'em. Your thinking is about what you'll do with the electricity you can invoke through the standard interface - not about the way in which the electricity was generated for your use.

OK, you're a client-side application programmer, coding in accordance with the REST architectural style. You have a consistent interface by which you can invoke services, and some of those services might be data-related. You (or, more likely, your program) will "flip the switch" by issuing a REST request, and the requested data service (retrieve such-and-such data associated with thus-and-so input, or persist this value, or update this value, or whatever) will get done, as you expect (the lights come on), and you (or your program) will move on to some other action. How did that data service get performed? Where did that data service get performed? DON'T KNOW. DON'T CARE. It worked, didn't it? Do you care about that? Probably not. Why? Because the result - successful execution of the requested service - was expected. When you flip the light switch in a room, do you think, "Cool! THE LIGHTS CAME ON!" No? Neither do I.

Shift the perspective now to that of the people - like DBAs - who work at the "data-services utility," which they know as a Db2 for z/OS system. That system has such cool stuff, doesn't it? Partition-by-growth table spaces! Automatic clean-up of pseudo-deleted index entries! Relative page numbering! LOB in-lining! OLAP expressions! Global locking on a Parallel Sysplex! We geek out about this stuff at conferences. Wouldn't client-side developers like to know about that cool stuff? No, they would not, and that is especially true if their way of interacting with your data-serving utility - invoking REST APIs - completely abstracts your world from their point of view.

Is this "don't know, don't care" mind set of client-side developers regarding the particulars of the system you support a reason to slump your shoulders and feel down about what you do? Of course not. I know plenty of people who work for electric utilities. You can take pride in doing what they do for their consumers: you provide a service (execution of data-processing services) that is easy to invoke via a consistent interface (REST requests). The provided service does WHAT is requested WHEN it is requested (any time, day or night) and as expected (no surprises). Interestingly, the BETTER your data-services utility is at doing what it does, the LESS your consumers (client-side developers) will think about the goings-on at your end. Freak-out time is when the switch is flipped and the lights DON'T come on, or they come on in an unsteady, flicker-y kind of way. You keep delivering service that is boringly (from the consumer perspective) excellent, and when those consumers DO have reason to think about your utility, it will be because they need a data-server for a new application and they want to use that data-services utility that has that rock-steady quality - the one you support.

And let's not forget, in our survey of personas in this new client-server world, the group of people who are key colleagues of the DBAs and the systems programmers at the Db2 for z/OS utility: the SQL programmers. That's right. Remember, when the REST architectural style is employed there is no client-side SQL. Any SQL statements executed at your site will be those invoked via REST requests issued by client-side developers. The SQL - whether an individual statement or a stored procedure - that performs a service associated with a REST request has to be developed by someone, and that someone is generally going to be, from the client-side developer's perspective, a person who works at the data-service utility. A client-side developer may need a new service (as opposed to the situation in which you REST-enable an existing service), and that service may be performed on the utility side through execution of SQL. That detail doesn't matter to the client-side developer, who just says, "I want to send this input and have this action performed and receive this output." Someone on the utility side will analyze the specifications of the required service and will code SQL to get it done. The SQL will be REST-enabled, and away things go: service is invoked (switch is flipped) on the client side, and on the utility side the associated SQL statement (about which the consumer likely has no clue) is executed, and it executes quickly and efficiently and reliably because the DBAs and systems programmers have constructed and maintain an environment that delivers those qualities of service. Thus it is that the SQL developers' role will come to overlap more and more with the role of the DBAs, and vice versa. They're on the same team, and together they provide the so-good-it's-not-noticed level of data service to the client-side consumers.

I've been doing what I do for the better part of 35 years, and this emerging application development landscape has me feeling young again. Client-side developers, I am A-OK with you having little to no visibility with regard to the technology I support and what I do to support it. When you flip the switch, and my utility is responsible for the lights coming on right away and as expected so that you (your program) can proceed to do something that needs doing, I'll be happy even if you don't give it another thought. Trust me on that one.

Thursday, April 29, 2021

Db2 for z/OS: Dealing With Application Compatibility Errors (-4743 SQL Error Code)

The notion of "application compatibility," in a Db2 for z/OS context, has been around for a while - it was introduced with Db2 11, which came out in 2013; still, in my experience it's been only recently that a number of Db2 for z/OS administrators have found themselves dealing with problems in this area. The incidents typically involve dynamic SQL statements. A person - maybe a Db2 DBA, maybe an application developer, maybe a user - issues a dynamic SQL statement via some tool (Db2's built-in SPUFI, the Db2-supplied DSNTEP2 program, a query tool such as IBM QMF, whatever) and execution of the statement fails, with Db2 returning a -4743 SQL error code: "ATTEMPT TO USE NEW FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL." Quite often, it's the first time that a Db2 DBA has encountered this error code, and he or she is left wondering, "Huh? What does THAT mean, and what am I supposed to do about it?" I'll try to provide some helpful information through this blog entry.


Why now?

If the -4743 Db2 SQL error code has been around since 2013 (and it has), why is it that a number of people have encountered problems of this nature just within the last year or so? I'd say that's largely due to a couple of factors:

  • Very helpful new SQL syntax and functionality introduced with Db2 12 for z/OS, with function level 500 or higher activated. Wait - hasn't Db2 12 itself been out for over four years? Yes, but there's often a lag between a new Db2 feature being introduced and people acting to take advantage of the new feature. Db2 12 delivered a number of important SQL-related enhancements - piece-wise DELETE, result set pagination and advanced triggers being just a few examples - and as people try these enhancements out they find that the appropriate application compatibility level needs to be in effect. Want to use the LISTAGG function, introduced with Db2 12 function level 501? Application compatibility level V12R1M501 (or higher) needs to be in effect (an example LISTAGG query follows this list).

  • In a Db2 12 environment, application compatibility level affects DDL statements (e.g., CREATE and ALTER) as well as DML statements such as SELECT and DELETE. When the Db2 for z/OS concept of application compatibility level was introduced with Db2 11, it applied only to SQL DML statements (DML is short for data manipulation language); so, if you tried to execute, in a Db2 11 new-function mode environment, a query that referenced a Db2 global variable (something introduced with Db2 11), and the application compatibility level in effect were V10R1, the statement would fail with a -4743 SQL error code. If on the other hand you issued, in that same Db2 11 environment, a CREATE VARIABLE statement, with application compatibility level V10R1 again being in effect, that statement would execute successfully (assuming you had the Db2 privileges needed to create a global variable). Why would creating a global variable be OK in a Db2 11 environment with the application compatibility level being V10R1, while in the same Db2 environment with that same application compatibility level being in effect a query referencing a global variable would fail with a -4743 error code? The reason: CREATE VARIABLE is an SQL DDL statement, and SELECT is a DML statement, and as noted the application compatibility level affected execution of DML statements but not DDL statements in a Db2 11 system. In a Db2 12 system, the application compatibility level affects both DDL and DML statements.
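To illustrate the first point in the list above: a query like the one below (a sketch, using the Db2 sample EMP table) is perfectly good SQL in a Db2 12 system, but it will fail with a -4743 error code if the application compatibility level in effect is below V12R1M501, because that is the level at which LISTAGG was introduced.

SELECT WORKDEPT,
       LISTAGG(LASTNAME, ', ') WITHIN GROUP (ORDER BY LASTNAME) AS DEPT_EMPLOYEES
FROM DSN8C10.EMP
GROUP BY WORKDEPT;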

Examples of -4743 error situations in a Db2 12 environment

A few weeks back, a Db2 for z/OS DBA, working in a Db2 12 environment with function level 500 activated, contacted me to see why the following rather straightforward-looking CREATE LOB TABLESPACE statement was failing with - yep - a -4743 error code (I have masked the actual names of the table space, database, etc.):

CREATE LOB TABLESPACE XXXXXXXX
IN YYYYYYYY
USING STOGROUP ZZZZZZZ
PRIQTY 5760 SECQTY 576
ERASE NO
LOGGED
DSSIZE 4G
BUFFERPOOL BP32K
LOCKSIZE ANY
LOCKMAX 0
CLOSE NO
COMPRESS NO
DEFINE YES;

"What application or tool did you use to issue the CREATE statement?" I asked. The DBA's response: "SPUFI." "OK," I said, "Your SPUFI packages probably have an APPLCOMPAT value of V10R1 or V11R1. Rebind them with APPLCOMPAT(V12R1M500)." The DBA rebound the SPUFI packages with the suggested APPLCOMPAT value, and the CREATE LOB TABLESPACE statement worked just fine.

What messed things up initially? The COMPRESS NO specification. Why? Because the ability to compress a LOB table space (done using a mainframe compression feature called zEDC) was introduced with Db2 12 with function level 500 activated. "So what?" you might say. "The DBA didn't specify COMPRESS YES for the LOB table space. He specified COMPRESS NO. He wasn't requesting compression for the LOB table space." True, but you can't include a COMPRESS specification - YES or NO - for a LOB table space until function level 500 has been activated for your Db2 12 system. That means application compatibility level V12R1M500 (or higher) needs to be in effect when you issue the CREATE LOB TABLESPACE statement with a COMPRESS specification, and THAT'S where APPLCOMPAT comes in. I've repeatedly referred to a Db2 application compatibility level being "in effect." What determines the application compatibility level that is in effect when a SQL statement is issued? It's the APPLCOMPAT value of the package associated with the SQL statement; and, there is ALWAYS a package associated with a Db2 SQL statement, whether the statement is static or dynamic, DML or DDL or whatever. If the SQL statement is dynamic, the package might be a SPUFI package (as in this case), or a DSNTEP2 package, or maybe an IBM Data Server Driver / Db2 Connect package, if the SQL statement is issued by a DRDA requester application or tool.

OK, example number 2. A Db2 for z/OS team at an organization here in the USA had recently activated function level 500 for a Db2 12 system, and they were eager to try out the way-improved MERGE statement enabled via that function level. They gave it a shot, and the statement failed with a -4743 SQL error code. In a meeting with the team, I asked what tool or application had been used to issue the MERGE statement. They told me: a workstation-based tool that used IBM's JDBC driver to access the target Db2 for z/OS system. Probably an issue with the APPLCOMPAT value of the IBM Data Server Driver packages, I explained; but, these folks were ahead of me. "We already rebound all of those packages with APPLCOMPAT(V12R1M500)." Oh. Still, Db2 was telling us via the -4743 that the application compatibility level in effect when the MERGE statement was issued was not what it needed to be. I requested that someone use the workstation-based tool to issue this statement:

SELECT CURRENT APPLICATION COMPATIBILITY
FROM SYSIBM.SYSDUMMY1;

Here's what the query returned:

V10R1

Aha! OK, some explanation: CURRENT APPLICATION COMPATIBILITY is one of the Db2 special registers. Where does its value come from? Well, the special register's initial value for a Db2 session comes from the APPLCOMPAT value of the package being used. The V10R1 returned by the query meant that the workstation-based tool was using a package bound with an APPLCOMPAT value of V10R1. Using that piece of information, and knowing the names of the IBM Data Server Driver packages, one of the DBAs did a quick query of the Db2 system's SYSPACKAGE catalog table, selecting packages having an APPLCOMPAT value of V10R1. Sure enough, there was a set of the IBM Data Server Driver packages bound with APPLCOMPAT(V10R1) in a collection other than the default (for those packages) NULLID collection, and the workstation-based tool was using a package in that alternate collection (these days, it is increasingly common for organizations to have several collections of the IBM Data Server Driver packages, with the packages in the various collections differentiated by bind specifications). The DBA rebound the IBM Data Server Driver packages in the alternate collection with APPLCOMPAT(V12R1M500), the MERGE was issued again via the workstation-based tool, and it worked like a champ.
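By the way, the catalog query mentioned above was along these lines (a sketch - the DBA's actual predicates may well have been more specific):

SELECT COLLID, NAME, VERSION, APPLCOMPAT
FROM SYSIBM.SYSPACKAGE
WHERE APPLCOMPAT = 'V10R1'
ORDER BY COLLID, NAME;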

"OK, I understand this," you might be thinking, "But in the case of a tool or application used with dynamic SQL statements, couldn't you just issue SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M500' (or whatever higher application compatibility level is needed), instead of rebinding the Db2 package being used?" To answer that question: you can indeed use a SET CURRENT APPLICATION COMPATIBILITY statement to change the Db2 application compatibility level in effect for subsequent SQL statements that you or a program might issue, but you can only change that level to a value BELOW the APPLCOMPAT value of the package being used - you can't dynamically go to a level higher than the package's APPLCOMPAT value. If you're wondering, "What good would dynamically going to a LOWER application compatibility level do you?" I'll cite the example of needing to create a traditional segmented table space using a package bound with an APPLCOMPAT value of V12R1M504 (or higher). You can get the details of that scenario in an entry I posted to this blog exactly one year ago.

And that wraps up this blog entry. I hope that it will help you to avoid -4743 SQL errors (or help you to get past those errors if they show up at your site).


Wednesday, March 31, 2021

Db2 for z/OS: Clearing Up Some Matters Pertaining to PGSTEAL(NONE) Buffer Pools

I have written a few entries for this blog in which I provided information about Db2 for z/OS buffer pools defined with PGSTEAL(NONE), including one from a few years ago in which I differentiated the PGSTEAL and PGFIX buffer pool specifications, and another from just a few months ago in which I explained - among other things - why a PGSTEAL(NONE) buffer pool is referred to as a "contiguous buffer pool" in a Db2 12 environment. With z/OS LPAR real storage sizes getting larger and larger and Db2 people looking - rightly - for more ways to exploit "big memory" to gain performance advantages, PGSTEAL(NONE) buffer pools are getting more attention than ever. That said, I have noticed that there continues to be some misunderstanding among Db2 users about a couple of matters pertaining to PGSTEAL(NONE) pools: the use of such pools for read/write versus read-only tables, and the appropriate use of large real storage page frames for PGSTEAL(NONE) pools. In this blog entry I'll try to clear things up on those two points.

First, a little background (for more background information you can check one or both of the blog entries for which I provided links, above). The PGSTEAL(NONE) specification does NOT mean that Db2 cannot steal buffers in such a pool - Db2 will always steal a buffer in a pool when it needs to, and it needs to when the pool's buffers are all occupied and a page of an object assigned to the pool needs to be read from disk storage into memory. PGSTEAL(NONE), rather, is your way of telling Db2, "There should not be a need to steal buffers in this pool, because I have made it (or I think I have made it) large enough to hold all pages of all objects assigned to the pool." A PGSTEAL(NONE) buffer pool, then, is intended for use as a "pinning" pool (i.e., as a buffer pool in which assigned database objects - table spaces and/or indexes - are "pinned" in memory, which is to say, cached in memory in their entirety). Of course, you could set up a pinning pool long before we got the PGSTEAL(NONE) buffer pool specification with Db2 10 for z/OS - all you had to do was make the pool large enough to completely cache the assigned object or objects in memory. Why, then, was the PGSTEAL(NONE) option provided?

PGSTEAL(NONE) was introduced so that Db2 would know that the pool in question is intended for the pinning of objects in memory. You see, when Db2 realizes that the pool is to be used for completely caching objects in memory, it will pitch in to help in achieving your no-read-I/O goal (no read I/Os after an object is fully in memory, that is). When an object assigned to a PGSTEAL(NONE) buffer pool is first accessed after the pool has been allocated, Db2 will immediately get for the associated process (e.g., an application program) the needed page or pages, so that the process can get on with whatever it's doing; then, Db2 will asynchronously (i.e., via prefetch) read into the buffer pool all of the remaining pages of the object. Voila! The object is now completely in-memory. Db2 12 goes a step further and enhances the efficiency of access to an object's pages in a PGSTEAL(NONE) buffer pool by arranging the pages in a contiguous fashion in the pool, as explained in the second of the two blog entries for which I provided links, above.
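As a reference point, setting up such a pool and steering an object to it might look something like this (the buffer pool number, pool size and object names below are placeholders - and see the frame-size discussion later in this entry before deciding on a FRAMESIZE value for the pool):

-ALTER BUFFERPOOL(BP25) VPSIZE(50000) PGSTEAL(NONE)

ALTER TABLESPACE MYDB01.MYTS01 BUFFERPOOL BP25;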

That is the gist of PGSTEAL(NONE) buffer pool functionality. On now to the two misunderstandings I want to clear up via this blog entry.


Read/write versus read-only tables

This misunderstanding regarding PGSTEAL(NONE) buffer pools has persisted for quite some time. Somehow, a lot of Db2 for z/OS people got the idea that a PGSTEAL(NONE) buffer pool is appropriate only for objects that are read-only (or very nearly so) in nature. WRONG. Read/write versus read-only access will generally not be an important factor when deciding whether an object would be a good candidate for a PGSTEAL(NONE) buffer pool. So what if a page of an object cached in a PGSTEAL(NONE) buffer pool gets updated? The updated page will be externalized (to disk, probably in an asynchronous fashion, or to a group buffer pool, likely in a synchronous manner as part of commit processing, when the Db2 subsystem is part of a data sharing group) as needed and in the usual way. No problem.

There are basically two things to consider when you're looking for database objects (table spaces or indexes) that would be good candidates for a PGSTEAL(NONE) buffer pool: size and frequency of access. Let's take a closer look at these two factors:

  • Size - That this is a factor should be obvious. Suppose you have a table with a billion rows of data (not unusual - an organization with which I often work has more than 50 Db2 for z/OS tables that each hold more than 1 billion rows of data). That table's rows might take up several hundred gigabytes of space. Could you cache that table in memory in its entirety? Sure you could - an individual Db2 buffer pool can be multiple terabytes in size. Do you want to use that much memory for a single database object? Probably not. With that said, when you think about size as a factor in evaluating candidates for a PGSTEAL(NONE) buffer pool, you don't have to think, "Tiny." As I mentioned in my opening paragraph, z/OS LPAR real storage sizes are getting larger all the time (I have seen plenty of LPARs with several hundred gigabytes of real storage - the largest real storage size for an LPAR that I've personally seen in the real world is 1.1 TB). Db2 buffer pool configurations are growing along with z/OS LPAR real storage sizes (I've seen a real-world Db2 for z/OS subsystem with an 879 GB buffer pool configuration; I've seen an individual buffer pool sized at 253 GB). With a large real storage resource available on a system, could you reasonably consider assigning to a PGSTEAL(NONE) buffer pool an object that takes up several hundred megabytes of space? Maybe a gigabyte or more? Yes, you could. The size-appropriateness, then, of an object for a PGSTEAL(NONE) buffer pool is going to depend in large part on the size of the relevant z/OS LPAR's real storage resource and the degree to which that resource can be used for the Db2 subsystem (even a large real storage resource can be over-utilized - I like to see a z/OS LPAR's demand paging rate at either zero or less than 1 per second).

[Something else to think about when considering the size of an object that you might assign to a PGSTEAL(NONE) buffer pool: online REORG. If you execute an online REORG for an object in a PGSTEAL(NONE) buffer pool, that buffer pool will also be used - understandably - for the object's shadow data set(s). That, in turn, means that you'll either size the PGSTEAL(NONE) buffer pool to accommodate the shadow data set(s), or you'll be prepared for a lot of I/O activity for the buffer pool when the REORG job is running, or you'll think about not REORGing the object (and that may not be so far-fetched - some objects truly do not have to be reorganized).]

  • Frequency of access - Yes, z/OS LPAR real storage sizes are getting bigger and bigger, but mainframe memory isn't free. You want to use it effectively - maximize its value. Can you put a lot of dinky objects in a PGSTEAL(NONE) pool? Yes. If those objects are not accessed particularly frequently, would that be a good use of mainframe memory? Probably not. What you want in a PGSTEAL(NONE) buffer pool are objects that are accessed very frequently - that will give you the biggest bang for your buck, as we say. You may already know which not-too-big (see the discussion of size, above) table spaces and/or indexes in your system get a great deal of access activity. If you don't know that, information in the real-time statistics tables in the Db2 catalog can provide clarity (for example, the GETPAGES column of SYSTABLESPACESTATS and SYSINDEXSPACESTATS).

So, consider object size, and consider frequency of access, when looking for good candidates for a PGSTEAL(NONE) buffer pool. Type of access, in the read/write versus read-only sense, is generally not going to be an important factor (yes, read/write access could impact the need for online REORG, and that has an implication that I noted in the [bracketed] paragraph above).
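
If you're not sure which of your not-too-big objects are the busiest, a query against the real-time statistics tables can point you in the right direction. Below is a sketch of such a query - the GETPAGES and NACTIVE thresholds are arbitrary placeholders that you'd adjust for your environment (250,000 pages of 4 KB apiece is roughly 1 GB), and a similar query could be run against SYSIBM.SYSINDEXSPACESTATS for indexes:

SELECT DBNAME, NAME, PARTITION, NACTIVE, GETPAGES
  FROM SYSIBM.SYSTABLESPACESTATS
  WHERE GETPAGES > 100000000   -- heavily accessed (placeholder threshold)
    AND NACTIVE < 250000       -- not too big (placeholder: about 1 GB of 4 KB pages)
  ORDER BY GETPAGES DESC;

You might also compare GETPAGES values captured at two points in time, to get a sense of the rate of access rather than just a total.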


Large real storage page frames

I've often recommended to Db2 for z/OS-using organizations that they take steps to ensure that busy buffer pools - those, I'd say, for which the GETPAGE rate is greater than 1000 per second - are fully backed by large real storage page frames. As I explained in an entry posted to this blog a few years ago, large real storage page frames improve page-access performance by making the translation of virtual storage to real storage addresses more CPU-efficient.

It's likely that a PGSTEAL(NONE) buffer pool in your system will be quite busy, in a GETPAGEs per second sense (I just got through explaining why frequency of access is an important consideration when evaluating candidate objects for a PGSTEAL(NONE) pool). That being the case, you should ensure that such a pool is backed by large real storage page frames, right? Well, maybe - maybe not.

Why the equivocation? Here's the deal. As I explained in my blog entry covering the "contiguous" nature of a PGSTEAL(NONE) buffer pool in a Db2 12 for z/OS system, a given real storage page frame used for a PGSTEAL(NONE) pool will hold pages belonging to one and only one database object. Suppose you want to assign to a PGSTEAL(NONE) buffer pool a pretty good number of small, very frequently accessed table spaces and/or indexes. Fine. That could be a very good use of a PGSTEAL(NONE) pool. Should that pool be backed by 1 MB real storage page frames, for a performance boost? Maybe not. Consider that it takes 256 buffers of 4 KB each to exactly fill a single 1 MB page frame. What if you assign, to a 4K pool defined with PGSTEAL(NONE) and FRAMESIZE(1M), a frequently accessed "code" or "look-up" table whose table space consists of only, say, 20 pages? Those 20 pages will go in one 1 MB page frame, and none of the rest of the space in that page frame will be usable for any other object assigned to the pool. What if there are quite a few such small objects assigned to the pool? That could mean that the actual size of the pool will have to be quite a bit larger than you'd first thought, owing to the unusable space in the 1 MB frames that each are limited to holding the few pages of a single object. Are you OK with that? Maybe you'll decide to just go with 4 KB page frames for this PGSTEAL(NONE) pool, to make more efficient use of memory, and I'd find that decision to be very much understandable. What if objects assigned to a PGSTEAL(NONE) pool have, on average, a few thousand pages apiece? Might you still have some "wasted" space in 1 MB page frames if it's a FRAMESIZE(1M) pool? Yes, but in a proportionate sense this won't be such a big deal. What if an object assigned to the pool has 5000 pages? Those pages will exactly fill nineteen 1 MB page frames, and the 136 pages left over will go in another 1 MB frame, and the remaining space in that frame, which could hold another 120 pages of 4 KB each, will not be usable for another object. That would be nineteen 1 MB page frames fully utilized, and about 47% of the twentieth 1 MB page frame "wasted," or a little over 2% wastage when you consider all of the twenty 1 MB page frames used for the object. Personally, I wouldn't lose any sleep over that.

Here's something that follows from what I've just explained: if you have identified a set of objects that you want to assign to a PGSTEAL(NONE) pool, and a number of those objects are quite small (with several having maybe fewer than 50 pages apiece) and others are larger (maybe a few thousand pages apiece), use two different PGSTEAL(NONE) pools. Specify FRAMESIZE(1M) for one of the pools, and assign the larger objects to that pool, and assign the smaller objects to the pool for which you're using the default frame size of 4K (assuming we're talking about objects with 4 KB-sized pages). That will give you some performance boost from large frames for the one pool, while ensuring efficient use of memory for the other pool.
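
In ALTER BUFFERPOOL terms, that two-pool approach could look something like the sketch below (the pool names and VPSIZE values are made up for illustration; note, too, that Db2 uses 1 MB page frames only for page-fixed pools, hence the PGFIX(YES) on the first command):

-ALTER BUFFERPOOL(BP21) VPSIZE(300000) PGSTEAL(NONE) PGFIX(YES) FRAMESIZE(1M)
-ALTER BUFFERPOOL(BP22) VPSIZE(10000) PGSTEAL(NONE) FRAMESIZE(4K)

You would then assign the larger objects (those with a few thousand pages apiece) to BP21, and the small "code" and "look-up" table spaces and indexes to BP22, via ALTER TABLESPACE and ALTER INDEX.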

By the way, I'm not talking about FRAMESIZE(2G) for a PGSTEAL(NONE) pool because Db2 does not allow 2 GB page frames to be used for a PGSTEAL(NONE) pool. If you want to back a PGSTEAL(NONE) pool with large frames, go with FRAMESIZE(1M).

I hope that this blog entry will be helpful for you. Thanks for being a reader of the blog.

Wednesday, February 24, 2021

Are You Using the REST Interface to Db2 for z/OS?

Db2 for z/OS got a built-in REST interface about four years ago. Have you put it to work? Perhaps you should.

Some information about REST

REST is short for representational state transfer, an architectural style that enables invocation of services through a straightforward, consistent mechanism. It is increasingly popular among application developers in large part because it is lightweight, easy to use, and provides a high degree of abstraction with respect to the particulars of a service-providing system. That can be really helpful when an application's client-side programs have to interact with a number of disparate back-end systems - if the service requests your programs issue look largely the same in form, even though the responding systems are quite different from each other, that is likely to be a big boost to your coding productivity, versus a scenario in which you have to interact in different ways (maybe very different) with the various systems that provide services your programs consume.

Let me show you what I mean. A REST request to get information related to account ID 123456789, issued by a client program and fielded by a Db2 for z/OS system, might look something like this:

POST http://mybank.com:4711/services/ACCOUNTS/getBalance

Body: { "ID": 123456789 }

The response received by the client from the Db2 for z/OS system could look like this:

{
  "FIRSTNAME" : "John",
  "LASTNAME"  : "Smith",
  "BALANCE"   : 1982.42,
  "LIMIT"     : 3000.00
}

There is nothing in the REST request issued by the client program, and nothing in the request response received by the program, that gives any indication that the request was processed by a Db2 for z/OS system, and that's the point. This type of programming model, sometimes referred to as data-as-a-service, eliminates the need for a client-side developer to know the technical particulars of a request-processing system. Is it a z/OS system? A Linux or a Windows system? Will a relational database management system be involved? Will data be retrieved from flat files, or maybe from a Hadoop data store or a NoSQL database? For the developer of the REST request-issuing program, the answer to these questions can be, "Don't know, don't care - all that stuff's plumbing, as far as I'm concerned." What matters to the developer is that a requested service - retrieval of data for a given account ID, in the example above - worked as expected.

When Db2 for z/OS is the service-providing system, what happens when the REST request is received, and how did that REST service come to be, in the first place? Read on.

What Db2 does with a REST request

So, let's say that the first part of the example REST request seen above, mybank.com:4711, resolves to a Db2 for z/OS system (in that case, 4711 would be the Db2 system's SQL listener port - it could be the system's secure SQL port, and if that is true then Db2 will require SSL encryption for the transaction). The request would be received by the Db2 distributed data facility (DDF), as Db2's REST interface is an extension of DDF functionality (that brings a cost-of-computing advantage: as many Db2 people know, when a SQL statement is executed via DDF, up to 60% of the CPU cost of SQL statement execution can be offloaded to a zIIP engine). Db2 parses the HTTP and JSON associated with the request (JSON is short for JavaScript Object Notation, and I'm referring to the request's input of { "ID": 123456789 } - more on that in a moment), and then does security checks. Security check number one is for user authentication purposes. Db2 requires that a REST request be associated with either an ID and a password or an ID and a certificate (if a password is supplied, it will be in the HTTP header of the request). The ID and password (or certificate) will be validated by RACF (or an equivalent z/OS security manager), and if authentication is successful then we move on to security check number two (note that, additionally, a RACF profile for REST access to the target Db2 subsystem can be defined, giving you the ability to permit an ID to access the Db2 subsystem via the REST interface but not through another interface such as CICS, or vice versa).


That second security check is of an authorization nature: does the ID associated with the REST request have the EXECUTE privilege on the Db2 package associated with the requested service? If that required Db2 privilege has indeed been granted to the ID (or to a group ID to which the ID associated with the REST request is connected), it's package execution time: the package is allocated for execution to the REST request's Db2 thread (a DDF thread, also known as a DBAT - short for database access thread), and the corresponding SQL statement is executed (with the input that accompanied the request, if input is required and supplied).
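
To make that concrete: assuming the package behind the example service sits in the ACCOUNTS collection with the name getBalance (matching the /services/ACCOUNTS/getBalance part of the example URL), and assuming a made-up authorization ID of REST_APP1, the GRANT could look like this (the quotes preserve the mixed-case package name):

GRANT EXECUTE ON PACKAGE ACCOUNTS."getBalance" TO REST_APP1;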


The Db2 REST service example I have used involves execution of this SQL statement:

SELECT C.FIRSTNAME, C.LASTNAME, A.BALANCE, A.LIMIT
  FROM ACCOUNTS A, CUSTOMERS C
  WHERE A.ID = ?
    AND A.CUSTNO = C.CUSTNO

The input value supplied with the REST request, account ID 123456789, is substituted for the parameter marker (the question mark) in the query text, and the result set is sent, in JSON format, to the REST client. That output document, as we saw before, looks like this:

{
  "FIRSTNAME" : "John",
  "LASTNAME"  : "Smith",
  "BALANCE"   : 1982.42,
  "LIMIT"     : 3000.00
}

This JSON document shows what a one-row query result set would look like when sent to the REST client. If it were a multi-row result set, the output JSON document would contain several sets of name-value pairs - one such set for each result set row.


A word about these JSON documents used for Db2 REST service input and output "payloads": this is another aspect of the REST model that appeals to a lot of developers. Though it is not technically required that input and output data associated with REST requests be in the form of JSON documents, that tends to be the case, and with good reason: JSON documents, as you can see, are really easy to form and to parse (by programs and by people). That's not always true for, say, XML documents. JSON also tends to be "lighter weight" versus XML - fewer bytes are transmitted between client and server.


With regard to the name-value pairs seen in the JSON documents associated with a Db2 REST service, let's consider first the input. We saw, for my example, { "ID": 123456789 }. Where did that "ID" come from? It's the name of the Db2 table column referenced in the query predicate (referring to the SELECT statement associated with the REST service) that is coded with a parameter marker. "ID" is a pretty intuitive column name. What if that were not the case? What if you wanted to have a Db2 REST service for retrieval of information related to a product ID, and the table column had a cryptic name like PRID? In that case, the predicate of the query to be REST-enabled could be coded with a host variable instead of a parameter marker, and the host variable name could then be used in the input document for the REST request instead of the column name (so, if the query predicate were of the form WHERE PRID = :PRODUCT_ID, the input JSON document could be { "PRODUCT_ID": 56789 } instead of { "PRID": 56789 }).

For an output JSON document that contains data from a query result set, the names in the name-value pairs are, by default, the names of the columns in the query's select-list. As noted above, you could have a table column with a not-very-intuitive name. That situation can be addressed by renaming the column in the query select-list. If, for example, you want to return a mobile phone number from a column named MONUM, you could code the query that you're going to REST-enable in this way: SELECT MONUM AS MOBILE_NUMBER, ... Having done that, you'll have an output JSON document with "MOBILE_NUMBER" : "111-222-3333" instead of "MONUM" : "111-222-3333".
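
Putting those two techniques together, a query that you plan to REST-enable might look something like the sketch below (PRODUCT_CONTACTS and PRNAME are made-up names, building on the PRID and MONUM examples above):

SELECT MONUM AS MOBILE_NUMBER,
       PRNAME AS PRODUCT_NAME
  FROM PRODUCT_CONTACTS
  WHERE PRID = :PRODUCT_ID

With that statement, the input JSON document for the service would use "PRODUCT_ID" as the name in its name-value pair, and the output document would carry the names "MOBILE_NUMBER" and "PRODUCT_NAME" rather than the cryptic column names.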


How Db2 for z/OS REST services are created


I've referred to packages associated with Db2 REST services, and that might cause you to think that what you're REST-enabling is static SQL, and you'd be right about that: the Db2 capability I'm talking about enables invocation of a pre-coded, server-side, static SQL statement in response to receipt of a properly coded (and security-checked) REST request. And yes, I do mean a single statement. Sometimes that's all you need for a given Db2 REST service, and in such a case the single SQL statement can be a SELECT, INSERT, UPDATE, DELETE or TRUNCATE. What if you want a Db2 REST service to involve execution of several SQL statements? Can that be done? Absolutely, because the single static SQL statement for which you create a REST service can be a CALL of a Db2 stored procedure, and the called stored procedure could issue any number of SQL statements (if you REST-enable a CALL of a stored procedure, I'd recommend going with a native SQL procedure because execution of such a stored procedure, when the call is through DDF, is up to 60% zIIP-offload-able).
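
As a small illustration of the multi-statement case, the single statement that you REST-enable could be a CALL of a native SQL procedure like the hypothetical one sketched below (the procedure, the ACCOUNT_HISTORY table and the parameter names are made up for illustration):

CREATE PROCEDURE ACCOUNTS.UPDATE_BALANCE
  (IN P_ACCT_ID BIGINT, IN P_AMOUNT DECIMAL(11,2))
  LANGUAGE SQL
BEGIN
  -- several SQL statements behind a single REST-enabled CALL
  UPDATE ACCOUNTS
    SET BALANCE = BALANCE + P_AMOUNT
    WHERE ID = P_ACCT_ID;
  INSERT INTO ACCOUNT_HISTORY (ID, CHANGE_AMOUNT, CHANGE_TS)
    VALUES (P_ACCT_ID, P_AMOUNT, CURRENT TIMESTAMP);
END

The statement you would then REST-enable is simply CALL ACCOUNTS.UPDATE_BALANCE(:ACCT_ID, :AMOUNT), and the input JSON document for the service would supply values for ACCT_ID and AMOUNT.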


With a SQL statement that you want to REST-enable, you have two service-creation options. One option is to use the Db2-supplied REST service called DB2ServiceManager (a REST service that can be used to create a REST service from a SQL statement). Information on using DB2ServiceManager to create a REST service from a Db2 SQL statement (including an example) can be found in the Db2 for z/OS Knowledge Center on IBM's Web site.


Another option for creating a REST service from a SQL statement is the Db2 command BIND SERVICE, which can be used in a batch job such as the one shown below:


//CR8SRVC EXEC PGM=IKJEFT01,DYNAMNBR=20
//STEPLIB  DD  DSN=DB2A.SDSNEXIT,DISP=SHR
//         DD  DSN=DB2A.SDSNLOAD,DISP=SHR
//SYSTSPRT DD  SYSOUT=*
//SYSPRINT DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//DSNSTMT  DD  DSN=SYSADM.SERVICE.SQL(SELECT1),
//             DISP=SHR
//SYSTSIN  DD  *
 DSN SYSTEM(DB2A)

 BIND SERVICE(SYSIBMSERVICE) -
  NAME("simpleSelect1") -
  SQLENCODING(1047) -
  DESCRIPTION('return a list of deptname-
 based on input location')
/*


More information about BIND SERVICE can be found in the Db2 Knowledge Center. Also in the Knowledge Center is additional information on invoking a Db2 REST service. And, check out the series of helpful videos on creating and invoking Db2 REST services (in - yes - the Db2 Knowledge Center).

Db2 REST service capabilities, made even better

As I noted at the beginning of this blog entry, the REST interface is built into Db2 for z/OS: if you have Db2, you have what you need to create and use Db2 REST services. With that said, if you want to have even greater ease of use in creating Db2 REST services, have your Db2 REST services described using Swagger (a standard specification with which many developers are familiar), have more flexibility in invoking Db2 REST services, and have more options for formatting the JSON output document associated with a Db2 REST service, you'll want to take a look at an IBM product called z/OS Connect. In an entry I'll post to this blog in the near future, I'll expand on what z/OS Connect can do to make Db2 REST services even more valuable for your organization.