Wednesday, February 24, 2021

Are You Using the REST Interface to Db2 for z/OS?

Db2 for z/OS got a built-in REST interface about four years ago. Have you put it to work? Perhaps you should.

Some information about REST

REST is short for representational state transfer, an architectural style that enables invocation of services through a straightforward, consistent mechanism. It is increasingly popular among application developers, in large part because it is lightweight, easy to use, and provides a high degree of abstraction with respect to the particulars of a service-providing system. That can be really helpful when an application's client-side programs have to interact with a number of disparate back-end systems. If the service requests your programs issue look largely the same in form, even though the responding systems are quite different from each other, that is likely to be a big boost to your coding productivity, versus a scenario in which you have to interact in different ways (maybe very different ways) with the various systems that provide the services your programs consume.

Let me show you what I mean. A REST request to get information related to account ID 123456789, issued by a client program and fielded by a Db2 for z/OS system, might look something like this:

POST http://mybank.com:4711/services/ACCOUNTS/getBalance

Body: { "ID": 123456789 }

 

The response received by the client from the Db2 for z/OS system could look like this:

 

{
  "FIRSTNAME" : "John",
  "LASTNAME"  : "Smith",
  "BALANCE"   : 1982.42,
  "LIMIT"     : 3000.00
}

There is nothing in the REST request issued by the client program, and nothing in the request response received by the program, that gives any indication that the request was processed by a Db2 for z/OS system, and that's the point. This type of programming model, sometimes referred to as data-as-a-service, eliminates the need for a client-side developer to know the technical particulars of a request-processing system. Is it a z/OS system? A Linux or a Windows system? Will a relational database management system be involved? Will data be retrieved from flat files, or maybe from a Hadoop data store or a NoSQL database? For the developer of the REST request-issuing program, the answer to these questions can be, "Don't know, don't care - all that stuff's plumbing, as far as I'm concerned." What matters to the developer is that a requested service - retrieval of data for a given account ID, in the example above - worked as expected.
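By the way, if you're wondering what the "whole" request looks like on the wire: it's just a plain HTTP request. Here's a sketch of the example invocation with typical headers filled in (the header names are standard HTTP; the Authorization header shown assumes ID-and-password authentication, which I'll get to shortly):

POST http://mybank.com:4711/services/ACCOUNTS/getBalance
Content-Type: application/json
Accept: application/json
Authorization: Basic <base64 of userID:password>

{ "ID": 123456789 }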

When Db2 for z/OS is the service-providing system, what happens when the REST request is received, and how did that REST service come to be, in the first place? Read on.

What Db2 does with a REST request

So, let's say that the first part of the example REST request seen above, mybank.com:4711, resolves to a Db2 for z/OS system (in that case, 4711 would be the Db2 system's SQL listener port - it could be the system's secure SQL port, and if that is true then Db2 will require SSL encryption for the transaction). The request would be received by the Db2 distributed data facility (DDF), as Db2's REST interface is an extension of DDF functionality (that brings a cost-of-computing advantage: as many Db2 people know, when a SQL statement is executed via DDF, up to 60% of the CPU cost of SQL statement execution can be offloaded to a zIIP engine). Db2 parses the HTTP and JSON associated with the request (JSON is short for JavaScript Object Notation, and I'm referring to the request's input of { "ID": 123456789 } - more on that in a moment), and then does security checks.

Security check number one is for user authentication purposes. Db2 requires that a REST request be associated with either an ID and a password or an ID and a certificate (if a password is supplied, it will be in the HTTP header of the request). The ID and password (or certificate) will be validated by RACF (or an equivalent z/OS security manager), and if authentication is successful then we move on to security check number two (note that, additionally, a RACF profile for REST access to the target Db2 subsystem can be defined, giving you the ability to permit an ID to access the Db2 subsystem via the REST interface but not through another interface such as CICS, or vice versa).
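To illustrate that last point about a RACF profile for REST access: the subsystem-access profiles live in the RACF DSNR class, and the usual naming convention is subsystem-name.environment. Here's a sketch of the relevant RACF commands, assuming a Db2 subsystem named DB2A and a hypothetical application ID of APPID1 (verify the exact profile name and conventions for your environment):

  RDEFINE DSNR (DB2A.REST) UACC(NONE)
  PERMIT DB2A.REST CLASS(DSNR) ID(APPID1) ACCESS(READ)
  SETROPTS RACLIST(DSNR) REFRESH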


That second security check is of an authorization nature: does the ID associated with the REST request have the EXECUTE privilege on the Db2 package associated with the requested service? If that required Db2 privilege has indeed been granted to the ID (or to a group ID to which the ID associated with the REST request is connected), it's package execution time: the package is allocated for execution to the REST request's Db2 thread (a DDF thread, also known as a DBAT - short for database access thread), and the corresponding SQL statement is executed (with the input that accompanied the request, if input is required and supplied).
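For the running example, the grant of that privilege might look like the statement below. This is just a sketch: it assumes the service was created with collection ACCOUNTS and name getBalance (making ACCOUNTS."getBalance" the name of the associated package), and APPID1 is a hypothetical authorization ID.

GRANT EXECUTE ON PACKAGE ACCOUNTS."getBalance" TO APPID1;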

 

The Db2 REST service example I have used involves execution of this SQL statement:


SELECT C.FIRSTNAME,
       C.LASTNAME,
       A.BALANCE,
       A.LIMIT
FROM   ACCOUNTS A,
       CUSTOMERS C
WHERE  A.ID = ?
AND    A.CUSTNO = C.CUSTNO

 

The input value supplied with the REST request, account ID 123456789, is substituted for the parameter marker (the question mark) in the query text, and the result set is sent, in JSON format, to the REST client. That output document, as we saw before, looks like this:

 

{
  "FIRSTNAME" : "John",
  "LASTNAME"  : "Smith",
  "BALANCE"   : 1982.42,
  "LIMIT"     : 3000.00
}

 

This JSON document shows what a one-row query result set would look like when sent to the REST client. If it were a multi-row result set, the output JSON document would contain several sets of name-value pairs - one such set for each result set row.
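For instance, a two-row result set might come back looking conceptually like this (purely illustrative - the second row's values are made up, and the exact JSON structure Db2 wraps around multi-row output may differ a bit from what is shown here):

[
  { "FIRSTNAME" : "John", "LASTNAME" : "Smith", "BALANCE" : 1982.42, "LIMIT" : 3000.00 },
  { "FIRSTNAME" : "Jane", "LASTNAME" : "Jones", "BALANCE" :  250.00, "LIMIT" : 1500.00 }
]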

 

A word about these JSON documents used for Db2 REST service input and output "payloads": this is another aspect of the REST model that appeals to a lot of developers. Though it is not technically required that input and output data associated with REST requests be in the form of JSON documents, that tends to be the case, and with good reason: JSON documents, as you can see, are really easy to form and to parse (by programs and by people). That's not always true for, say, XML documents. JSON also tends to be "lighter weight" versus XML - fewer bytes are transmitted between client and server.

 

With regard to the name-value pairs seen in the JSON documents associated with a Db2 REST service, let's consider first the input. We saw, for my example, { "ID": 123456789 }. Where did that "ID" come from? It's the name of the Db2 table column referenced in the query predicate (referring to the SELECT statement associated with the REST service) that is coded with a parameter marker. "ID" is a pretty intuitive column name. What if that were not the case? What if you wanted to have a Db2 REST service for retrieval of information related to a product ID, and the table column had a cryptic name like PRID? In that case, the predicate of the query to be REST-enabled could be coded with a host variable instead of a parameter marker, and the host variable name could then be used in the input document for the REST request instead of the column name (so, if the query predicate were of the form WHERE PRID = :PRODUCT_ID, the input JSON document could be { "PRODUCT_ID": 56789 } instead of { "PRID": 56789 }).
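Put together, the host-variable approach might look like this (PRODUCTS, PRNAME and PRPRICE are hypothetical names used only for illustration; PRID and PRODUCT_ID are from the scenario just described):

SELECT PRNAME,
       PRPRICE
FROM   PRODUCTS
WHERE  PRID = :PRODUCT_ID

Input document:  { "PRODUCT_ID": 56789 }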

 

For an output JSON document that contains data from a query result set, the names in the name-value pairs are, by default, the names of the columns in the query's select-list. As noted above, you could have a table column with a not-very-intuitive name. That situation can be addressed by renaming the column in the query select-list. If, for example, you want to return a mobile phone number from a column named MONUM, you could code the query that you're going to REST-enable in this way: SELECT MONUM AS MOBILE_NUMBER, ... Having done that, you'll have an output JSON document with "MOBILE_NUMBER" : "111-222-3333" instead of "MONUM" : "111-222-3333".
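In SQL terms, something like this (a sketch - CUSTOMERS and CUSTNO are borrowed from the earlier example query, MONUM is the column just mentioned, and the rest is illustrative):

SELECT C.MONUM AS MOBILE_NUMBER,
       C.LASTNAME
FROM   CUSTOMERS C
WHERE  C.CUSTNO = ?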

 

How Db2 for z/OS REST services are created

 

I've referred to packages associated with Db2 REST services, and that might cause you to think that what you're REST-enabling is static SQL, and you'd be right about that: the Db2 capability I'm talking about enables invocation of a pre-coded, server-side, static SQL statement in response to receipt of a properly coded (and security-checked) REST request. And yes, I do mean a single statement. Sometimes that's all you need for a given Db2 REST service, and in such a case the single SQL statement can be a SELECT, INSERT, UPDATE, DELETE or TRUNCATE. What if you want a Db2 REST service to involve execution of several SQL statements? Can that be done? Absolutely, because the single static SQL statement for which you create a REST service can be a CALL of a Db2 stored procedure, and the called stored procedure could issue any number of SQL statements (if you REST-enable a CALL of a stored procedure, I'd recommend going with a native SQL procedure because execution of such a stored procedure, when the call is through DDF, is up to 60% zIIP-offload-able).
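Here's a sketch of that pattern. Every name below other than ACCOUNTS, ID and LIMIT (which come from the running example) is hypothetical - the procedure, the LIMIT_AUDIT table and its columns are just for illustration:

-- a minimal native SQL procedure that issues more than one SQL statement
CREATE PROCEDURE SET_CREDIT_LIMIT
  (IN ACCT_ID   INTEGER,
   IN NEW_LIMIT DECIMAL(11,2))
  LANGUAGE SQL
BEGIN
  UPDATE ACCOUNTS
     SET LIMIT = NEW_LIMIT
   WHERE ID = ACCT_ID;
  INSERT INTO LIMIT_AUDIT (ACCT_ID, NEW_LIMIT, CHANGED_AT)
    VALUES (ACCT_ID, NEW_LIMIT, CURRENT TIMESTAMP);
END

-- and the single statement you would then REST-enable:
CALL SET_CREDIT_LIMIT(?, ?)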

 

With a SQL statement that you want to REST-enable, you have two service-creation options. One option is to use the Db2-supplied REST service called DB2ServiceManager - a service that, when invoked, creates a new REST service from a SQL statement. Information on using DB2ServiceManager this way (including an example) can be found in the Db2 for z/OS Knowledge Center on IBM's Web site.
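To give you a rough idea of the shape of such a request: a DB2ServiceManager invocation is itself just a REST call with a JSON input document that carries the SQL statement and the name of the service to be created. The sketch below is based on my recollection of the documented example - treat the field names as illustrative and check the Knowledge Center for the exact URL and property names before using them (the SQL statement shown references a sample DEPT table):

POST http://mybank.com:4711/services/DB2ServiceManager
Content-Type: application/json

{
  "requestType"  : "createService",
  "sqlStmt"      : "SELECT DEPTNAME FROM DEPT WHERE LOCATION = :LOCATION",
  "collectionID" : "SYSIBMSERVICE",
  "serviceName"  : "simpleSelect1",
  "description"  : "return a list of deptnames based on input location"
}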


Another option for creating a REST service from a SQL statement is the Db2 command BIND SERVICE, which can be used in a batch job such as the one shown below:


//CR8SRVC EXEC PGM=IKJEFT01,DYNAMNBR=20
//STEPLIB  DD  DSN=DB2A.SDSNEXIT,DISP=SHR
//         DD  DSN=DB2A.SDSNLOAD,DISP=SHR
//SYSTSPRT DD  SYSOUT=*
//SYSPRINT DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//DSNSTMT  DD  DSN=SYSADM.SERVICE.SQL(SELECT1),
//             DISP=SHR
//SYSTSIN  DD  *
 DSN SYSTEM(DB2A)
 BIND SERVICE(SYSIBMSERVICE) -
  NAME("simpleSelect1") -
  SQLENCODING(1047) -
  DESCRIPTION('return a list of deptnames based on input location')
/*


More information about BIND SERVICE can be found in the Db2 Knowledge Center. Also in the Knowledge Center is additional information on invoking a Db2 REST service. And, check out the series of helpful videos on creating and invoking Db2 REST services (in - yes - the Db2 Knowledge Center).
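And, for completeness, invoking the service created by the BIND SERVICE job above would follow the same URL pattern we saw earlier - /services/<collection>/<service-name> - so it might look like the request below (the input name and value are illustrative; they depend on how the SQL statement in SYSADM.SERVICE.SQL(SELECT1) is coded):

POST http://mybank.com:4711/services/SYSIBMSERVICE/simpleSelect1
Content-Type: application/json

{ "LOCATION": "DALLAS" }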

Db2 REST service capabilities, made even better

As I noted at the beginning of this blog entry, the REST interface is built into Db2 for z/OS: if you have Db2, you have what you need to create and use Db2 REST services. With that said, if you want to have even greater ease of use in creating Db2 REST services, have your Db2 REST services described using Swagger (a standard specification with which many developers are familiar), have more flexibility in invoking Db2 REST services, and have more options for formatting the JSON output document associated with a Db2 REST service, you'll want to take a look at an IBM product called z/OS Connect. In an entry I'll post to this blog in the near future, I'll expand on what z/OS Connect can do to make Db2 REST services even more valuable for your organization.

Wednesday, January 27, 2021

Db2 for z/OS: Why You Might See a Lot of Catalog Access for Package Authorization Checks

From time to time I will receive a note from a Db2 for z/OS DBA, asking for help in understanding why there is a larger-than-expected amount of Db2 catalog access activity associated with package authorization checks, and what can be done to address that situation. In my experience there are two primary causes of this observed Db2 behavior, one more straightforward than the other. In this blog entry I'll describe these two drivers of head-scratchingly high numbers for catalog accesses related to package authorization, and I'll provide associated mitigating actions that you can take.

Before proceeding, a couple of points of clarification. When I mention "package authorization," what I'm talking about, in the context of this blog entry, is authorization to execute a package (not authorization to create a package by way of a BIND PACKAGE operation - that's a different subject). Also, this whole discussion is relevant primarily to packages that are the compiled and executable form of what Db2 people refer to as static SQL statements (there are other packages that essentially enable preparation and execution of dynamic SQL statements). Such a static SQL package could be associated with a Db2 stored procedure, or with a CICS-Db2 transaction program, or with a batch job. If the application process requesting execution of a static SQL package is local to the Db2 subsystem (i.e., it is not accessing the Db2 system by way of a network connection and the Db2 distributed data facility), the package will be executed in conjunction with a Db2 plan, with which it will be associated by way of the plan's package list (more on that to come).

OK, on to the first of the two aforementioned drivers of unexpectedly high catalog access activity related to package authorization checking.


A too-small package authorization cache

In a Db2 system in which the workload involves execution of a lot of static SQL statements, there can be a LOT of package authorization checks - like, hundreds per second. To help make this authorization checking as efficient as it can be, Db2 has, in its database services address space, an in-memory cache of package authorization information. When Db2 checks to see if ID ABC is authorized to execute package XYZ, it will check the package authorization cache first, to see if the authorization information is there. If that information is not found in the cache, Db2 checks the catalog (the SYSPACKAUTH table, for example) to see if ID ABC is authorized to execute package XYZ. The information found in the catalog is then placed by Db2 in the package authorization cache in memory, so that the next time a check of ID ABC's authorization to execute package XYZ is needed, the check can be very quickly and efficiently accomplished via the cache.
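(If you're curious about what's actually in the catalog for a given package, a query along these lines will show the relevant grants - COLL1 and PKG1 are placeholder names, and the columns referenced are the ones you'd expect to find in SYSPACKAUTH:)

SELECT GRANTEE, COLLID, NAME, EXECUTEAUTH
FROM   SYSIBM.SYSPACKAUTH
WHERE  COLLID = 'COLL1'
  AND  NAME   = 'PKG1';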

All well and good, but sometimes there's a bit of a problem here, and it has to do with the size of the in-memory package authorization cache. If it's too small, there could be a lot of cache "misses" for package authorization checks, and those misses will drive catalog accesses. How can you tell if this is happening? Use your Db2 monitor to generate a statistics long report for the Db2 subsystem of interest, covering a busy hour of a busy day (depending on the Db2 monitor in use at your site, this report might be called a statistics detail report). In that report, you should see a set of fields under the heading AUTHORIZATION MANAGEMENT, such as those shown in the report snippet below (headings and field names might vary slightly from one Db2 monitor to another).

AUTHORIZATION MANAGEMENT     QUANTITY  /SECOND
---------------------------  --------  -------
PKG-AUTH SUCC-W/O CATALOG     5885.1K  1634.76
PKG-AUTH UNSUCC-CACHE        43464.00    12.07

These numbers look good. Why do I say that? Because more than 99% of the time, a package authorization check was completed without a requirement for catalog access (1634.76 + 12.07 package authorization checks per second, with only 12.07 of those requiring catalog access). There were very few package authorization cache "misses" (as indicated by the smallness of the values in the PKG-AUTH UNSUCC-CACHE row relative to the values in the PKG-AUTH SUCC-W/O CATALOG row).

Sometimes, the numbers in this part of a Db2 monitor statistics long report don't look so good. I've seen that the value in the PKG-AUTH UNSUCC-CACHE row can be as high as several hundred per second (even over 1000 per second). When that's the case, it can be an indication that the package authorization cache is way smaller than it should be. Why might that be the case? Well, the size of the cache is determined by the value of the CACHEPAC parameter in ZPARM, and prior to Db2 10 the default CACHEPAC value was only 100 KB. Sometimes, an existing ZPARM value is carried forward when a Db2 system is migrated to a new version, and that old and small CACHEPAC value did indeed go forward at quite a few sites; so, check the value of CACHEPAC for your Db2 system. If it is anything less than the max value of 10M (meaning, 10 MB), change the value to 10M.

Having said all this, I'll tell you that depending on the maintenance level of your Db2 system, you might look for CACHEPAC in your ZPARM listing and NOT SEE IT. Why is that? It's because the fix for APAR PH28280, which came out in the latter part of 2020, removed CACHEPAC from ZPARM and set its value internally to 10M (that fix did the same for CACHERAC, the parameter that specifies the size of the routine authorization cache, used to check authorization to execute Db2 routines such as stored procedures and user-defined functions).

OK, so a too-small package authorization cache can lead to high levels of catalog access activity related to package authorization checks. You can make the size of that cache what it should be, which is 10M (or IBM will do that for you, via the fix for the aforementioned APAR), but there's something else that could cause a lot of catalog access for package authorization checks...


Location asterisks in a plan's PKLIST

I noted earlier that a static SQL package used by a local-to-Db2 application will be executed in conjunction with a Db2 plan (remote client applications that access Db2 via the Db2 distributed data facility are associated, by default, with plan DISTSERV, which is not a plan in the traditional sense). The plan for a local-to-Db2 application has something called a PKLIST, which is short for package list. In a plan's PKLIST are the packages that can be executed by way of the plan. You can make use of asterisks in a plan's PKLIST in a couple of ways. One way, which is very common and helpful, is to use an asterisk for the packages in a given package collection; so, if packages in collection COLL1 are to be executed through plan PLAN1, the PKLIST for PLAN1 can have the entry COLL1.*. That covers all the packages in collection COLL1, and it's a lot more convenient than putting each individual package in COLL1 in the PKLIST for PLAN1.

Another use of asterisks in a plan's PKLIST can be problematic. I'm referring here to the location of a collection. You could have, in the PKLIST for PLAN1, the entry *.COLL1.*, and here's what that high-order asterisk means: it means that PLAN1 can be used to execute packages in collection COLL1 at the local Db2 system and at any remote location at which a collection named COLL1 might exist. Sometimes, there actually is a collection COLL1 at a remote server, and there is a need for packages in the remote COLL1 collection to be executable through PLAN1. Usually, that is not the case - the plan is only going to be used with local packages. "So what?" you might say, "If PLAN1 is only going to execute packages in the local COLL1 collection, it can do that if it has an *.COLL1.* entry in its PKLIST." True, but here's the deal: if there is an asterisk in the location-name part of an entry in a plan's PKLIST, there will be a package authorization check done every time a package in that collection is executed via the plan. That, in turn, could lead to a lot of associated catalog access activity. If, instead, the entry in the plan's PKLIST is COLL1.* (i.e., nothing - not even an asterisk - is specified for the collection's location), that will be as though the local Db2 had been explicitly specified as the collection's location, and that single-location specificity will mean that authorization to execute the packages in COLL1 is checked at BIND PLAN time and will NOT be checked when packages in COLL1 are subsequently executed via the plan (the authorization checked at BIND PLAN time is that of the ID of the process issuing the BIND PLAN command - for example, that ID may have been granted EXECUTE ON PACKAGE COLL1.*).

With this in mind, if for a local-to-Db2 application involving static SQL execution you see an unexpectedly high level of catalog accesses (specifically, accesses to catalog tables such as SYSPACKAUTH), check the entries in the plan's PKLIST. If you see an entry of the form *.COLL1.*, you might want to do one of two things: remove the asterisk in front of the collection's name if the collection is local; or, if packages in a collection with the specified name at remote location LOC2 will be executed via the plan, change an entry of the form *.COLL1.* to LOC2.COLL1.*. What if the plan will be used to execute packages both in a local collection named COLL1 and in a COLL1 collection at location LOC2? In that case, have a COLL1.* entry in the plan's PKLIST for the local collection COLL1, and a LOC2.COLL1.* entry for the collection COLL1 at location LOC2. Yes, this is more of a hassle than just putting an *.COLL1.* entry in the plan's PKLIST, but it can result in elimination of execution-time package authorization checks (and associated catalog access activity).
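In BIND command terms, the two-collection case just described might look like this (a sketch using the names from the example; ACTION(REPLACE) assumes the plan already exists):

BIND PLAN(PLAN1) -
     PKLIST(COLL1.*, -
            LOC2.COLL1.*) -
     ACTION(REPLACE)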

And there you have it. To make package authorization checking as efficient as it can be, make the size of the package authorization cache as large as it can be, which is 10M (if you have not already done this, or if Db2 has not already done it for you via the fix for APAR PH28280), and - for local-to-Db2 applications using static SQL packages - don't use an asterisk for the location of the packages' collection in the relevant plan's PKLIST (instead, don't specify anything - not even an asterisk - for the location of a local collection, and explicitly specify the remote location name of a not-local collection).

I hope this information will be useful for you.