Tuesday, August 23, 2022

What Db2 for z/OS People Should Know About Data Fabric

"Data fabric" is an increasingly hot topic in IT circles, and with good reason - an effectively implemented data fabric can deliver significant dividends by enabling an organization to get more value from its data assets. Db2 for z/OS people should have some familiarity with the data fabric concept and associated technology, not only as preparation for participating in data fabric-related discussions but also because data fabric is of major strategic importance for Db2 for z/OS (and for other z/OS-based data sources). In this blog entry I'll provide information on data fabric that I hope will be helpful to Db2 for z/OS people.


What is "data fabric," anyway?

Essentially, data fabric is an architecture that brings uniformity and consistency to data originating in a disparate collection of sources - sources which could be (likely would be) housed in a mix of on-premise and in-cloud systems (and, especially for larger enterprises, "in-cloud" would involve several different public cloud providers and perhaps some private cloud environments). That uniformity and consistency are manifest in multiple aspects of data interaction via the data fabric, including data access, discovery, utilization, cataloging, protection and governance; further, a data fabric is likely to have a "smart" dimension, with AI and machine learning technology leveraged to provide intelligent automation of data management tasks.

I mentioned that the data fabric payoff is increased value gained from an organization's data assets. How does data fabric deliver that payoff? Basically, by eliminating friction that would otherwise impede data access, discovery, utilization and integration - and doing that without compromising data security. The promise of a data fabric can be largely summed up in this way: it provides an environment in which the right data (i.e., data that is current, trusted, understood and complete) is available to the right people (people who know the data, people who know what data they need, people who know what they want to do with data) at the right time (i.e., when the data is needed).

In thinking about the value of the consistency and uniformity that a data fabric brings to what would otherwise be a disjointed data landscape, it can be helpful to consider a cake-baking analogy. Suppose you are tasked with baking a cake, and suppose further that the ingredients must be ordered from different countries, that you have to communicate with each supplier in the primary language of its source country, and that you have to pay each supplier using a source-specific mode of payment. Here's how that might go (and in your mind, substitute any countries you want for the ones I mention - I'm not picking on anyone):

  • The eggs for the cake are to come from Japan, but there is a delay in procurement because you don't speak Japanese.
  • The butter is to come from Australia, but the supplier will only send the butter after having received payment in coins that were sent via sailboat.
  • The flour will come from a supplier in Germany. Your German is a little rusty, but pretty good so there's not much of a delay there.
  • The sugar is to be sourced from Brazil, but your lack of familiarity with the ingredient-ordering user interface results in your being unable to locate a supplier.
  • This all leads to your getting a late start in baking the cake, and on top of that the eggs went bad while you were waiting for the butter, and you never got the sugar. The people who were looking forward to consuming your confection had to wait a frustratingly long time to get a very un-tasty cake. Not good.
Now imagine a different scenario, in which a cake-ingredient-ordering front end abstracts the particulars of the ingredient suppliers (such as native language) and provides uniformity for payment and shipment. Using that front end, you get the ingredients you need - and all the ingredients you need - in a timely manner, and your cake consumers are delighted with the product of your kitchen, which satisfies their sweet-tooth needs and arrives at the right time.

So it is with a data fabric: different data elements from different data sources are the “ingredients” that provide a complete (sometimes called a “360”) view of a subject of interest - be that customers, processes, supply chains, products, whatever. And here's the upshot: when the right (and all the right) data ingredients get to the right people at the right time, the result is better: better decisions, better and more timely applications, better outcomes.

There is technology that can make the promise of data fabric a reality, but before getting into that I want to emphasize that data fabric is NOT just a matter of leveraging technology. I'd go so far as to say...


Data fabric is culture

There were people who said the same thing a few years ago about DevOps, and for the same reason: full and effective implementation of a data fabric can require new organizational roles and new ways of thinking about and managing data. To appreciate this assertion, consider the "personas" (i.e., the people-roles) associated with individuals who would work with, and in relation to, a data fabric. That exercise is facilitated if you think of a data fabric as something that enables a “data store,” in which people “shop for data.” For a traditional retail store, relevant personas include the following:

  • Consumers acquire products from the store.
  • Suppliers provide products for the store.
  • A store manager decides which products should go on which shelves.
  • A sales associate puts the right products on the right shelves.
OK, so what are the personas that have a relationship with the "data store" enabled by a data fabric? Some are listed below.

  • A data consumer might be a developer working on a new application, or a business analyst researching the viability of a proposed new product.
  • A database administrator oversees a data source that supplies the data store.
  • A data curator might make decisions on what data will be available through the data store, and to whom.
  • A data steward might “stock the shelves” of the data store, based on decisions made by a data curator.
Look again at those last two personas in the list above - data curator and data steward. I can tell you for a fact that those roles exist today in multiple organizations - are they present in your workplace? And note: a data fabric's impact goes beyond new organizational roles - it involves new ways of thinking about data management. Here's what I mean: historically, data was often thought of in relation to where it was stored. That manner of thinking led to “silo” situations, and the difficulty of working with data in a “cross-silo” way interfered with organizations’ extracting maximum value from their data assets. By contrast, a data fabric will deliver the greatest benefit when it supports a data management approach that focuses more on data itself, and less on where data is stored. One implication of a data-centric (versus a data-source-centric) approach to data management is that data access decisions (i.e., who can access what data, and in what form) are made by data professionals (e.g., data curators), as opposed to being made by database professionals (e.g., DBAs). In such an environment, data source administrators are implementers of data access decisions made by data curators.
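
To make that division of responsibility concrete, here is a minimal sketch - with entirely hypothetical table, column and group names - of how a Db2 for z/OS DBA might implement a data curator's decision that members of an EU_ANALYSTS group may see only European customers' rows, using Db2's native row permission capability:

  -- Hypothetical objects throughout: CUSTDB.CUSTOMER table, REGION column,
  -- EU_ANALYSTS RACF group. The curator decided who should see what; the
  -- DBA expresses that decision as a row permission.
  CREATE PERMISSION EU_ROW_ACCESS ON CUSTDB.CUSTOMER
    FOR ROWS WHERE VERIFY_GROUP_FOR_USER(SESSION_USER, 'EU_ANALYSTS') = 1
               AND REGION = 'EU'
    ENFORCED FOR ALL ACCESS
    ENABLE;

  -- The permission takes effect only when row access control is activated
  -- for the table; after activation, rows not covered by an enabled
  -- permission are not returned to any user.
  ALTER TABLE CUSTDB.CUSTOMER
    ACTIVATE ROW ACCESS CONTROL;

The particulars will vary from source to source, but the division of labor is the point: the data curator decides who should see what, and the data source administrator implements that decision using whatever mechanism the data source provides.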

If a data fabric puts data administration (versus database administration) responsibility on data professionals (e.g., data curators), does that diminish the role of a Db2 for z/OS DBA? I would say it does not. I see this as being part of an ongoing evolution of the Db2 for z/OS DBA role toward greater engagement in application development (for distributed systems DBAs, this role shift became widespread some years ago). This is a good thing. I am convinced (and more importantly, so are a lot of IT leaders at Db2 for z/OS sites) that the value a mainframe Db2 DBA delivers to an organization goes up when that DBA's work has more of an application-enabling focus.

Let me shift now from organizational impact to enabling technology.


IBM's foundational data fabric-enabling technology

Multiple IBM offerings have a connection with data fabric, but the most foundationally important of these is called Cloud Pak for Data. Cloud Pak for Data's importance has a lot to do with IBM's point of view regarding data fabric implementation. We believe that a data fabric is most effectively implemented as an abstraction layer extended over an existing data landscape. Such an implementation approach acknowledges the significance of "data gravity" - the idea that data usage actions should flow to the data, rather than vice versa. A data fabric enabled via Cloud Pak for Data is characterized by "in-place" access to data on systems of origin. This approach delivers multiple benefits, including:
  • Minimization of data replication costs.
  • Protection of data security and consistency.
  • Optimized performance, with data-processing actions executed close to the data.
Cloud Pak for Data itself can be thought of as a set of software-powered services that relate to access, governance and usage of data. Cloud Pak for Data can be deployed anywhere Red Hat OpenShift (a Kubernetes container platform) can be deployed: on-premise, in a private cloud or in a variety of public cloud environments (it is also available in a fully managed, as-a-service form). Cloud Pak for Data can be used with a wide range of data sources on Linux, UNIX, Windows and z/OS systems, and those data sources can be on-premise and/or in-cloud.

How would Cloud Pak for Data be used by people in an organization? Here's one scenario: let's say that Val leads a development team that will soon begin work on a new application. To support this work, Val’s team will need access to some data (which happens to be in a Db2 for z/OS database) and associated metadata (data about the data). Val sends a request to this effect to Steve, a data curator. Steve is very familiar with the data that the new application will process. He logs in to Cloud Pak for Data's user interface and creates a project that will provide Val’s team with the data and metadata they need. Db2 for z/OS is one of many data sources supported by Cloud Pak for Data, and Steve creates a connection to the relevant Db2 system. Steve selects the particular tables holding the data that the new application will process and associates them with the project he created for Val's team. Steve also imports metadata for the selected tables, and enriches that metadata with statistical values, data quality scores and business terms. Finally, Steve creates a masking rule for sensitive data in a column of one of the selected Db2 tables - Val's team will be able to reference the column in their program code, but they will only see masked values when they view the column's contents. With the project created and the associated data assets published to a catalog to which Val and her teammates have access, the developers will be able to easily view the data and the related metadata, and this will enable them to move ahead quickly and productively with coding and testing.
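
In that scenario, the masking rule Steve defines is enforced at the fabric layer by Cloud Pak for Data. As a point of comparison, here is a minimal sketch of what an analogous protection looks like when implemented natively in Db2 for z/OS as a column mask; every object and group name in the sketch is hypothetical:

  -- Hypothetical objects throughout: CUSTDB.CUSTOMER table, TAX_ID column
  -- (assumed CHAR(11), format 'nnn-nn-nnnn'), PAYROLL RACF group. Users
  -- outside PAYROLL see only the last four characters of the tax ID, but
  -- the column remains fully referenceable in their SQL.
  CREATE MASK TAXID_MASK ON CUSTDB.CUSTOMER
    FOR COLUMN TAX_ID RETURN
      CASE WHEN VERIFY_GROUP_FOR_USER(SESSION_USER, 'PAYROLL') = 1
           THEN TAX_ID
           ELSE 'XXX-XX-' CONCAT SUBSTR(TAX_ID, 8, 4)
      END
    ENABLE;

  -- The mask takes effect only when column access control is activated.
  ALTER TABLE CUSTDB.CUSTOMER
    ACTIVATE COLUMN ACCESS CONTROL;

Whether the masking is enforced at the fabric layer or natively at the data source, the effect for Val's team is the same: they can code against the column without ever seeing the sensitive values in clear form.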

The point I really want to make here is not so much, "Look what the data curator can do for the application development team." Even more important to me is the fact that, had Val's team needed access to data (and with it, associated metadata) in a Db2 for Linux/UNIX/Windows database, or a SQL Server database, or an Oracle database, or Apache Cassandra, or Amazon S3, or MariaDB or one of the myriad other data sources supported by Cloud Pak for Data, the actions of the data curator would have been largely the same. And, that would be the case for all kinds of other Cloud Pak for Data usage scenarios - a data scientist needing to develop and train a predictive model, a business person wanting to create a report with accompanying data visualizations, a data curator implementing new rules and policies concerning access to certain data assets, a data administrator virtualizing non-relational data to make it more easily accessible and consumable, whatever. That, as much as anything, is the "secret sauce" of a Cloud Pak for Data-enabled data fabric: it makes all kinds of data sources more easily accessible and effectively consumable by all kinds of people, without sacrificing data governance and security. And when more of an organization’s data assets are used more easily and effectively by more people, the organization works better.
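
To give a flavor of that last scenario - the virtualization of non-relational data - here is a hypothetical sketch of what a data consumer might do once a virtualization layer has surfaced, say, a VSAM data set as a relational table; all of the schema, table and column names below are made up:

  -- Hypothetical: DV.ORDERS_VSAM is a VSAM data set surfaced as a
  -- relational table by the data virtualization layer, and DV.CUSTOMER is
  -- a virtualized Db2 table. A consumer joins the two with ordinary SQL,
  -- with no knowledge of the underlying storage formats.
  SELECT c.CUST_NAME,
         SUM(o.ORDER_AMOUNT) AS TOTAL_SPEND
  FROM DV.CUSTOMER c
  JOIN DV.ORDERS_VSAM o
    ON c.CUST_ID = o.CUST_ID
  GROUP BY c.CUST_NAME;

The consumer sees tables and columns; the fabric takes care of where and how the data is actually stored.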


Data fabric is of major strategic importance for z/OS as a data-serving platform

The uniformity brought to a data landscape by a data fabric is of outsized importance in the context of z/OS as a data-serving platform. How so? Think about it. What gets in the way of z/OS-based data being more effectively - and more widely - used by people in an organization? Often, it's the perceived “other-ness” of the mainframe – the sense non-mainframe people have that z/OS-based data is inherently harder to access, understand and use than data on other platforms. Truth be told, that perception has, historically, been at least partly fact-based – it has been harder for many people to access and use z/OS-based data versus off-mainframe data. The great value, then, of an effectively implemented data fabric, from a z/OS perspective, is not so much that it makes z/OS-based data easier to access and use versus off-mainframe data; rather, it’s the fact that the data fabric makes z/OS-based data as easy to access and use as off-mainframe data. Why that's so powerful: while mainframe systems have been recognized for decades as being unmatched in terms of reliability, security, scalability, efficiency and performance, there have been plenty of people who would say, "Yeah, but mainframe-based data is hard to access and use." An effective data fabric eliminates that "yeah, but..."

Let that sink in: by making discovery, understanding, consumption and usage of data in z/OS systems as easy as it is for data on other platforms, a data fabric makes IBM zSystems an even higher-value platform for an organization's most valuable data assets.

If your organization has not yet looked at implementing an enterprise data fabric, now could be a good time to start down that path. And, the "in-place access to data on systems of origin" that characterizes a data fabric implemented with IBM's Cloud Pak for Data could well be the approach that will deliver maximum benefits in your environment. Give it some thought, and get engaged.