The Role of Mainframe Data in a Hybrid Environment

Craig Mullins outlines the various types of mainframe data and how hybrid applications use them

TechChannel Data Management

More and more organizations are adopting a hybrid approach to their IT environment. A hybrid environment, in the context of computing and IT infrastructure, combines multiple types of resources or technologies, typically involving a mix of on-premises infrastructure, private cloud and public cloud services. It allows organizations to leverage the benefits of both on-premises and cloud-based solutions to meet their specific requirements.

This approach is especially appealing to larger organizations with a mainframe footprint because of the strong legacy of important applications that run on mainframes. Mainframes are designed to handle large-scale, high-performance, mission-critical workloads, and they excel in processing and managing vast amounts of data and transactions. So, organizations that have invested in mainframes rely on the high reliability, availability and security offered by the platform.

At the same time, cloud computing offers tremendous benefits for new development of web and mobile applications on a cost-effective platform. The cloud is well suited for a wide range of applications, particularly those that benefit from scalability, flexibility, cost-efficiency and accessibility.

This means that organizations are keeping and extending their mainframe applications, while also building out new applications using cloud services. Such an approach is called a hybrid environment, or sometimes hybrid cloud computing.

Types of Mainframe Data

Data plays a crucial role in modern development practices, enabling organizations to make informed decisions, optimize processes and deliver better user experiences. For example, data is at the core of AI and Machine Learning (ML) development as ML models are trained on data sets to learn patterns and make predictions. And this is only one example: Truly, all types of development require and produce data. Given its rich heritage, the mainframe is a phenomenal source of useful data for all types of modern development.

But it can be difficult to access mainframe data out of context. Consider the numerous different types and formats of mainframe data that exist.

Data may be stored in many different database management systems (DBMSes) on the mainframe. Although Db2 for z/OS is the leading mainframe DBMS today, several other popular DBMSes are used to power mainframe applications and store critical data, including IMS, IDMS, Adabas and Datacom. These DBMSes all store data differently and use several different data models, including relational, network (where relationships between records are defined through sets and pointers), hierarchical (where data is organized in a tree-like structure with parent-child relationships) and others.

Mainframe data need not be stored in a DBMS, though. A lot of mainframe data is stored in flat files, also known as QSAM or physical sequential files. Flat files contain records with no structured relationships and require additional knowledge to interpret their content (for example, a COBOL copybook that contains a file description including fields, data types and lengths).
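To make this concrete, here is a minimal sketch of interpreting one flat-file record using a field layout derived from a copybook. The copybook, field names and offsets are hypothetical illustrations, not taken from any real application; the EBCDIC decoding uses Python's built-in `cp037` code page.

```python
# Hypothetical record layout, as it might be derived from a COBOL copybook:
#   01 CUSTOMER-REC.
#      05 CUST-ID      PIC X(6).
#      05 CUST-NAME    PIC X(20).
#      05 CUST-BALANCE PIC 9(7)V99.
# Field name -> (offset, length). Offsets must be known in advance because
# flat-file records carry no delimiters or embedded metadata.
LAYOUT = {
    "cust_id": (0, 6),
    "cust_name": (6, 20),
    "cust_balance": (26, 9),  # 9(7)V99: implied decimal point, not stored
}

def parse_record(raw: bytes) -> dict:
    """Slice one fixed-length record and decode EBCDIC (code page 037)."""
    rec = {}
    for field, (offset, length) in LAYOUT.items():
        rec[field] = raw[offset:offset + length].decode("cp037").strip()
    # Re-insert the implied decimal point described by the V99 clause.
    rec["cust_balance"] = int(rec["cust_balance"]) / 100
    return rec

# Build a sample EBCDIC record for demonstration.
sample = ("000042" + "ACME CORP".ljust(20) + "000123450").encode("cp037")
print(parse_record(sample))
```

Without the layout (and the knowledge that the bytes are EBCDIC rather than ASCII), the raw record is effectively opaque, which is why copybooks or equivalent metadata must travel with flat-file extracts.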

Mainframe data sets can also be partitioned; a partitioned data set is often referred to simply as a PDS. A PDS consists of a directory and members. The directory holds the address of each member and enables each member to be accessed directly. Each member consists of sequentially stored records.

Another very popular type of mainframe data is VSAM, or Virtual Storage Access Method. This is a methodology for the indexed or sequential processing of records on direct access devices. There are three ways to access data in a VSAM file: random (or direct), sequential and skip-sequential. As with flat files, VSAM files require an external file definition in order to be accessed, as there is no embedded description of the data other than perhaps a key.

Another type of mainframe data that may be useful exists in log files. Log data is used with DBMSes to manage and record changing data, but it can also be used by transaction processing systems (such as CICS and IMS/TM) and other system software. The operating system, z/OS, also writes log data to multiple locations, such as the SYSLOG, the job log, the OPERLOG, the console and more. All of these logs are formatted differently and can be difficult to interpret without additional context and documentation. Nevertheless, log data can be a useful tool for system management, as well as uncovering useful operational information.

We also need to acknowledge that not all mainframe data need be stored on disk. Mainframes often utilize magnetic tape storage for archival purposes. Tape data can only be accessed sequentially.

Obviously, the mainframe is a rich source of data. But how can we access this data in a hybrid environment from applications that may not be running on the mainframe itself?

How to Access Mainframe Data

There are many different ways for hybrid applications to utilize mainframe data. The key is to enable the application to understand and access the data in a way that makes the most sense for the operations it will undertake.

One approach is to use application programming interfaces (APIs) or web services to interact with mainframe data. These APIs enable authorized access to specific mainframe functions, data repositories or transactions. Cloud applications can use standard protocols and interfaces such as RESTful APIs, Simple Object Access Protocol (SOAP) or messaging through IBM MQ to communicate with mainframe systems and exchange data.
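As a sketch of the API approach, the snippet below prepares an authorized REST call against a mainframe API gateway. The host name, URL path and bearer-token scheme are all assumptions for illustration; a real deployment would use whatever endpoints and authentication its API layer (for example, a REST gateway in front of CICS or Db2) actually exposes.

```python
import urllib.request

# Hypothetical base URL for a REST gateway fronting mainframe services.
BASE_URL = "https://mainframe-gw.example.com/api/v1"

def build_account_request(account_id: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authorized GET for one account record."""
    return urllib.request.Request(
        url=f"{BASE_URL}/accounts/{account_id}",
        headers={
            "Authorization": f"Bearer {token}",  # gateway-issued credential
            "Accept": "application/json",        # mainframe data as JSON
        },
        method="GET",
    )

req = build_account_request("000042", "example-token")
print(req.full_url)
# urllib.request.urlopen(req) would perform the call; the JSON body could
# then be parsed with json.loads(response.read()).
```

The point of the gateway is that the calling application never sees copybooks, EBCDIC or VSAM keys; it sees an ordinary authenticated HTTP resource.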

Another popular mechanism is to deploy middleware and integration platforms to act as intermediaries between cloud applications and mainframe systems. They provide connectors, adapters or APIs specifically designed to interface with mainframe environments. These platforms facilitate seamless data integration, transformation and messaging between cloud applications and mainframe data sources.

Message queues and event streams are also a way for cloud applications to access mainframe data. Messaging technologies like IBM MQ or Apache Kafka can be used to exchange messages or events with mainframe systems. Mainframe applications can publish messages to the queue or subscribe to event streams, allowing cloud applications to consume and process the data in near real time.
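The consuming side of such a stream can be sketched as follows. The event shape (`op`, `key`, `after` fields) is a hypothetical change-event format, not any specific product's schema, and the broker loop is simulated with an in-memory list; a real consumer would poll messages from IBM MQ or a Kafka topic via its client library.

```python
import json

def apply_change(event_bytes: bytes, table: dict) -> dict:
    """Apply one insert/update/delete event to an in-memory replica table."""
    event = json.loads(event_bytes)
    key = event["key"]
    if event["op"] in ("insert", "update"):
        table[key] = event["after"]   # store the new row image
    elif event["op"] == "delete":
        table.pop(key, None)          # drop the replicated row
    return table

replica = {}
# Simulated messages; a real consumer would receive these from the broker.
events = [
    b'{"op": "insert", "key": "000042", "after": {"balance": 100.0}}',
    b'{"op": "update", "key": "000042", "after": {"balance": 250.0}}',
]
for msg in events:
    apply_change(msg, replica)
print(replica)
```

Because each event carries the full row image, the cloud-side replica stays current without the application ever querying the mainframe directly.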

It is also possible for cloud applications to connect directly to mainframe databases using the appropriate database connectivity protocols. For example, Db2 on z/OS supports industry-standard database connectivity options like Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), and SQL in Java (SQLJ) for accessing mainframe data. Cloud applications can use these interfaces to directly query, update or manipulate mainframe databases.
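The connection identifiers for direct Db2 access follow well-documented forms, sketched below. The host name and location name are placeholders; port 446 is the conventional DRDA listener port, but any given subsystem may use another.

```python
def db2_jdbc_url(host: str, port: int, location: str) -> str:
    """Type-4 JDBC URL format used by the IBM Data Server Driver for JDBC."""
    return f"jdbc:db2://{host}:{port}/{location}"

def db2_cli_conn_str(host: str, port: int, location: str,
                     user: str, password: str) -> str:
    """Keyword connection string format accepted by ODBC/CLI (and ibm_db)."""
    return (f"DATABASE={location};HOSTNAME={host};PORT={port};"
            f"PROTOCOL=TCPIP;UID={user};PWD={password};")

url = db2_jdbc_url("zos.example.com", 446, "DB2LOC1")
print(url)
# With the ibm_db driver installed, the CLI string would be used roughly as:
#   conn = ibm_db.connect(db2_cli_conn_str(...), "", "")
#   stmt = ibm_db.exec_immediate(conn, "SELECT ...")
```

From the application's perspective this is ordinary SQL connectivity; the fact that the database happens to live on z/OS is hidden behind the driver.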

Another possibility is to move the data from the mainframe to the cloud using extract, transform, load (ETL). ETL procedures can be deployed to extract data from mainframe systems, transform it into a suitable format and load it into cloud-based data storage or analytics platforms. This approach involves extracting mainframe data through various methods such as file transfers, database queries or APIs. The data is then transformed and loaded into cloud data warehouses, data lakes or other storage solutions for further processing and analysis.
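A toy version of such a pipeline is sketched below: extract fixed-width records (assumed already transferred and converted to ASCII), transform them into typed rows, and load them as JSON lines, a common landing format for cloud data lakes. The record layout and sample data are illustrative assumptions.

```python
import io
import json

def extract(lines):
    """Extract: split each fixed-width record into raw fields."""
    for line in lines:
        yield {"id": line[0:6], "region": line[6:9], "amount": line[9:18]}

def transform(rows):
    """Transform: trim padding and convert types for downstream analytics."""
    for row in rows:
        yield {
            "id": row["id"].strip(),
            "region": row["region"].strip(),
            "amount": int(row["amount"]) / 100,  # implied two decimal places
        }

def load(rows, out):
    """Load: write one JSON document per line."""
    for row in rows:
        out.write(json.dumps(row) + "\n")

source = ["000042EUR000123450", "000043USA000009900"]
sink = io.StringIO()  # stands in for a cloud object-store upload
load(transform(extract(source)), sink)
print(sink.getvalue())
```

Production ETL tools add scheduling, error handling and bulk throughput, but the extract-transform-load shape is the same.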

Yet another approach is to use replication and synchronization mechanisms to continuously capture changes made to mainframe data and propagate them to corresponding cloud data repositories in near real time. Examples of this technology include IBM Data Replication and change data capture (CDC). Another option is IBM Data Gate, which can be used to synchronize data from Db2 for z/OS to the hybrid cloud. These approaches enable cloud applications to work with up-to-date mainframe data without directly accessing the mainframe systems.

Finally, virtualization and emulation techniques may be employed to create an abstraction layer between cloud applications and mainframe systems. This involves running mainframe operating systems or applications on virtualized environments or emulators within the cloud infrastructure. Cloud applications can then interact with the emulated mainframe environment using standard networking protocols or APIs.

The specific method chosen depends on factors such as the nature of the mainframe data, security considerations, performance requirements, existing mainframe infrastructure, latency considerations and compatibility with cloud technologies.

Organizations often leverage a combination of these approaches to integrate cloud applications with mainframe data, enabling modernization, data access and seamless interoperability between mainframe and cloud environments.

The Bottom Line

Today’s systems are decidedly hybrid in nature and, as such, larger organizations with significant investment in mainframe applications must consider how to integrate mainframe data into these hybrid systems. There are multiple approaches, as we discussed, but the mainframe must be a vital component of your ongoing mission-critical applications and systems in a hybrid environment.
