Wednesday, June 30, 2021

Db2 for z/OS: REST Services Scalability

I received an email the other day from a Db2 for z/OS system administrator who had recently attended a workshop on the Db2 REST interface. He mostly liked what he heard, but he had a concern about the scalability of Db2 REST services. Knowing that REST requests are stateless in nature, he anticipated that a large volume of such requests targeting a Db2 for z/OS server would drive a large amount of activity involving connection processing, thread processing and user authentication and authorization. Wouldn't those areas of system operation end up being bottlenecks that would throttle transaction throughput for client applications?

The Db2 system administrator's concerns were certainly understandable, but a Db2 for z/OS system can in fact handle a very large volume of REST requests - several thousand per second should be do-able, assuming sufficient server-side processing capacity. In this blog entry, I'll provide information on how scalability is achieved for Db2 for z/OS REST services, referring to the three areas of associated server-side processing mentioned by the Db2 system administrator.


Connection processing

This is where the single most important step was taken by the IBM Db2 for z/OS developers responsible for the Db2 REST interface: they leveraged Db2's inactive connection functionality. A LONG time ago, Db2 got this functionality via the ZPARM parameter CMTSTAT. Starting with Db2 V8 for z/OS, the default value for CMTSTAT became INACTIVE, and here's what that means: say an application server establishes a connection to a Db2 for z/OS subsystem via that Db2 system's distributed data facility (DDF - the Db2 address space that handles communication with network-attached applications). A transaction associated with the connection is initiated, and the connection from the application to Db2 is paired with a DBAT (a DDF thread) for the purpose of SQL statement processing. The transaction completes, and the DBAT goes back in the DBAT pool, and the connection goes into an inactive state. This "inactive state" of the transaction is strictly a Db2-side thing - it is not visible to the application, which in a logical sense continues to see that it is connected to the Db2 system. When the next transaction associated with the connection is submitted, the connection on the Db2 side is "snapped back" to an active state, and paired with a DBAT for SQL processing, and then the transaction completes and the connection is again separated from the DBAT and placed back in an inactive state, and so on and so on.

Why was inactive connection functionality built into Db2? Originally, this was done largely to enable a single Db2 subsystem to support a very large number of connections from network-attached applications - there can be up to 150,000 such connections to one Db2 subsystem (n times 150,000 for a Db2 data sharing group with n members). No way can one Db2 subsystem handle anything like 150,000 concurrently-executing DDF transactions (never mind the processing power you'd need - there wouldn't be enough "below-the-2GB-bar" virtual storage in the Db2 database services address space for that number of threads), but Db2 doesn't have to be able to process as many in-flight DDF transactions as there are connections to Db2, because at any moment in time the majority - often, the large majority. - of those connections will be in a not-being-used-right-now state. This is why the value of the Db2 ZPARM parameter CONDBAT (maximum connections to a Db2 subsystem via DDF) is typically several times the value of MAXDBAT (maximum number of connections for which transactions can be concurrently in-flight). Db2 inactive connection functionality makes all this feasible because the CPU cost of flipping a connection from the inactive to the active state and back is very small, and the "footprint" of an inactive connection (virtual storage used, etc.) is very small.

OK, what does this have to do with the scalability of Db2 REST services? Well, you may be aware that REST requests are "stateless" in nature - each and every REST request is its own independent transaction (one unit of work, from a Db2 perspective), with no expectation or assurance of a related follow-on request from a given client-side application. With that being the case, you might think - like the Db2 system administrator to whom I previously referred - that there would be a whole ton of Db2 connect and disconnect activity associated with a large volume of REST requests. This isn't, after all, like the typical Db2 DRDA application scenario (DRDA - distributed relational database architecture - being Db2's protocol for distributed database processing) in which an application connects to a Db2 subsystem and stays connected to that subsystem for a long time (hours, maybe days). Connect and disconnect processing is kind of expensive. Wouldn't that cost put the kibosh on Db2 REST service scalability? It might have, but for an enhancement delivered a few years ago via the fix for Db2 APAR PI86868: when a client application sends a REST request to a Db2 subsystem, the associated connection will be retained by the Db2 subsystem, in an inactive state, for up to 15 seconds, so that it can be used again to process another request sent via that connection. When that next REST request associated with the connection (i.e., from the same client IP address and ephemeral port) comes along - assuming that that happens within the aforementioned 15-second window - the connection will be snapped from the inactive to the active state. The request will be processed, and the connection will again be flipped to the inactive state, where it can remain for up to another 15 seconds while Db2 waits for another REST request associated with the connection. If the inactive connection remains unused for 15 seconds it will be terminated by Db2.

In a high-volume REST request situation - exactly the scenario that would otherwise cause us concern - it is highly likely that Db2's temporary retention of connections from REST request-issuing applications will result in a lot of connection reuse (it doesn't take a whole lot of volume to get tp one request every 15 seconds or less from one connection), and that is a key factor in the scalability of Db2 REST services.


Thread processing

In this area, Db2 REST service scalability is boosted by something that's been around (as I recall) since the early 1990s: the DBAT pool. When Db2 creates a DBAT to service a DDF transaction, it doesn't just chuck that DBAT when the transaction completes. Instead, it puts the DBAT in the DBAT pool so that it can be used again to service another transaction (a DBAT can stay in the pool for a number of seconds specified by the ZPARM parameter POOLINAC, which has a default value of 120 - if the DBAT has not been reused in that time, it will be terminated). If you use your Db2 monitor to generate a statistics long report (depending on the monitor product used, that might be called a statistics detail report), you should see a section in the report with a heading like GLOBAL DDF ACTIVITY. In that section you should see a couple of fields with names like DBATS CREATED and POOL DBATS REUSED. If we call the former field A and the latter B, the DBAT reuse rate is B / (A + B). It is very common, in a production Db2 environment with a significant DDF workload, to see a DBAT reuse rate that exceeds 99.9%. High rates of DBAT reuse have long been part of the Db2 DDF efficiency story at many sites, and they are an important contributor to Db2 REST service efficiency and scalability.

[Fun fact: though you might think of "active" DBATs as those currently being used to service DDF transactions, and 'inactive" DBATs as those in the pool, that is not an accurate conception. In fact, all DBATs, including those in the pool, are active. What you have are DBATs that are in a connected state, servicing DDF transactions, and DBATs that are in a disconnected state - the latter are the ones in the pool.]

Keep in mind that once a DBAT is provided for use in processing a REST request, the package associated with the requested service has to be allocated to the DBAT. Here, you can enhance efficiency by making it very likely that the package in question will be in-memory, in the oddly named skeleton pool (the package cache - part of the Db2 EDM pool). How can you check on this? Generate a statistics long report, using your Db2 monitor, and look for the section in which EDM pool activity is provided. In that report section, you'll see a value for something like PT REQUESTS (PT is short for package table - this is a request for a package) and another value for something like PT NOT FOUND (meaning, not found in the skeleton pool, in which case the package has to be retrieved from the Db2 directory on disk). The ratio of PT REQUESTS to PT NOT FOUND should be at least several thousand to one, and ideally multiple tens of thousands to one - if it isn't (and if the z/OS LPAR is not short on real storage), make the skeleton pool larger (the ZPARM parameter to adjust is EDM_SKELETON_POOL).


User authentication and authorization

Let me cover authorization first. What this involves is checking to see if the ID associated with a REST request has the Db2 EXECUTE privilege on the package that will be executed to provide the requested service. That's it - a good bit simpler versus checking the various privileges that might be required for execution of dynamic SQL statements (e.g., the SELECT privilege on table T1, the UPDATE privilege on table T2, etc.). This straightforward EXECUTE-on-package check is made particularly efficient by an in-memory cache of package authorization information maintained by Db2. I have seen on a number of occasions that a Db2 subsystem's package authorization cache, sized by way of the ZPARM parameter CACHEPAC, is smaller than it should be, leading to a sometimes high volume of accesses to the SYSPACKAUTH table in the Db2 catalog to check package execution authorization. That too-small situation will soon be (maybe already is, for your system) a thing of the past, thanks to the fix for Db2 APAR PH28280, which removes CACHEPAC from ZPARM and sets its value internally to the maximum size of 10,485,760 bytes.

The value of package authorization caching is particularly high when a relatively limited number of IDs are checked for package execution authorization, and that is often the case in a DDF application environment, whether the client applications are issuing REST requests or SQL statements - it is quite common for the ID in question to be an application's ID, as opposed to an end-user's ID (in such a case, the application performs authentication for end users on its end, and then interacts with Db2 on behalf of end users).

The user authentication story is much the same. When processing a DDF transaction (REST-related or otherwise), Db2 authenticates the associated ID (and accompanying password or certificate) by invoking the RACF macro RACROUTE REQUEST=VERIFY (and I recognize that you might use in your environment a different z/OS security manager from another vendor). If the ID is successfully authenticated, Db2 caches the credentials in-memory for up to three minutes. If, during that interval, the same ID needs to again be authenticated then Db2 will reuse the cached credentials and there will not be a need for another invocation of the RACROUTE REQUEST=VERIFY macro. This mechanism can majorly enhance user authentication efficiency and scalability related to Db2 REST requests, just as Db2's package authorization cache enhances authorization scalability for such requests.

And there you have it. With regard to connection processing, thread processing, and ID authentication and authorization processing, Db2 for z/OS provides mechanisms to deliver scalability and efficiency for REST service requests. You can use Db2's REST interface with confidence, even for high-volume applications.