
An Application View of Data Is Important for a Point-in-Time Recovery

Hardly a week goes by without another report of data being stolen, lost or corrupted. Black hats working for organized crime or unfriendly governments often perpetrate these events, but many are the result of poor planning, carelessness or outright neglect. While most data centers with mainframe footprints have a disaster recovery (DR) strategy, complete with a plan and periodic testing, today's threats require more.

Ransomware attacks, with colorful names like WannaCry and Locky, can be leveled against any platform. In an attack:

  1. A server is infected with a piece of code that searches out important data
  2. When important data are found, the code encrypts them, overwriting and destroying the original copy on disk
  3. This process is repeated until all of your important data are encrypted

Once all of the important data are encrypted, an installation without a clean copy to restore from must pay the ransom to recover them.

Applicability to the Mainframe World

The scope and value of the assets on most mainframes dwarf those on the systems penetrated by recent attacks. If a z/OS version of ransomware attacked even a single LPAR, the results could be devastating.

A mainframe attack could come from compromised transactions or jobs originating on distributed systems connected to IBM Z.

Data can also be corrupted by non-malevolent means, such as an erroneous change to an application. Although the data may remain readable, such an error can leave them incorrect and might make it difficult to recover a consistent set of application data.

To understand the scope of the problem, consider three views of information assets: storage, data and application.

Storage View

The storage view focuses on the media where information assets are stored (e.g., DASD/disk drives, tape, virtual tape and flash) and on how the integrity of that media is maintained. Its aim is a replica of the current state of storage that can be brought into production use quickly, minimizing disruption to business activities.

Solutions in this space are storage-controller based: recovery is managed through the configuration of the installation's storage controllers and connectivity network. These solutions copy data as it's written from the primary storage controllers to alternate storage controllers, so that a reasonably up-to-date copy is maintained locally or at metro or global distances. A synchronous connection between the primary and alternate copies eliminates data loss and minimizes the recovery time objective (RTO), but the latency it adds to every write limits how far apart the copies can be. Asynchronous replication over longer distances is therefore more frequently used: it gives better isolation between the primary and alternate copies while accepting a level of data loss within the recovery point objective (RPO).
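The trade-off between synchronous and asynchronous replication can be made concrete with a toy model. The Python sketch below is a hypothetical illustration, not any vendor's replication interface: `ReplicatedVolume`, `max_in_flight` and the block layout are assumptions. In synchronous mode a write completes only after both copies hold it; in asynchronous mode up to `max_in_flight` updates can be acknowledged on the primary before reaching the alternate, and those are what a disaster would lose.

```python
from __future__ import annotations

from collections import deque


class ReplicatedVolume:
    """Hypothetical primary/alternate pair; a toy model, not a vendor replication API."""

    def __init__(self, synchronous: bool, max_in_flight: int = 0) -> None:
        self.primary: dict[int, str] = {}
        self.alternate: dict[int, str] = {}
        self.synchronous = synchronous
        # Asynchronous mode only: updates acknowledged but not yet applied remotely.
        self.max_in_flight = max_in_flight
        self.in_flight = deque()

    def write(self, block: int, value: str) -> None:
        self.primary[block] = value
        if self.synchronous:
            # The write is not complete until the alternate copy holds it too.
            self.alternate[block] = value
            return
        # Queue the update, then apply the oldest ones to the alternate, in order.
        self.in_flight.append((block, value))
        while len(self.in_flight) > self.max_in_flight:
            old_block, old_value = self.in_flight.popleft()
            self.alternate[old_block] = old_value

    def writes_lost_if_primary_fails(self) -> int:
        # Updates that exist on the primary but never reached the alternate.
        return len(self.in_flight)


sync_pair = ReplicatedVolume(synchronous=True)
async_pair = ReplicatedVolume(synchronous=False, max_in_flight=3)
for i in range(10):
    sync_pair.write(i, f"record-{i}")
    async_pair.write(i, f"record-{i}")

print(sync_pair.writes_lost_if_primary_fails())   # 0
print(async_pair.writes_lost_if_primary_fails())  # 3
```

In this model, a larger `max_in_flight` stands in for a longer, higher-latency link: more updates can be in flight at once, so more would be lost, which is the data loss the RPO has to accommodate.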

Data View

The data view examines the ability to recover the logical constructs that are maintained on storage (e.g., volumes, data sets and files). Solutions for recovering the data view are based on system software, storage controller firmware and hardware.

By understanding and managing the constructs within the storage controllers, software can make point-in-time copies. However, recovering a complex system requires more context to understand how different backups are related to the dependent applications.
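One common way to make a point-in-time copy without pausing the workload is copy-on-write: taking the copy is nearly instantaneous, and a block's old contents are saved only when that block is first overwritten afterward. The Python sketch below illustrates that general technique; the class and method names are hypothetical and do not describe how any particular storage controller or z/OS facility implements it.

```python
_ABSENT = object()  # marks a block that did not exist when a copy was taken


class CopyOnWriteVolume:
    """Toy volume with point-in-time copies via copy-on-write (hypothetical, illustrative)."""

    def __init__(self, blocks: dict) -> None:
        self.blocks = dict(blocks)
        self.copies = []  # one dict per copy: block -> contents saved before first overwrite

    def point_in_time_copy(self) -> int:
        # Nothing is copied yet; blocks are preserved lazily on first overwrite.
        self.copies.append({})
        return len(self.copies) - 1

    def write(self, block, value) -> None:
        # Save the current contents for every copy that has not yet preserved this block.
        for saved in self.copies:
            if block not in saved:
                saved[block] = self.blocks.get(block, _ABSENT)
        self.blocks[block] = value

    def read_copy(self, copy_id: int) -> dict:
        # Unchanged blocks come from the live volume; changed ones from the saved originals.
        image = dict(self.blocks)
        for block, original in self.copies[copy_id].items():
            if original is _ABSENT:
                image.pop(block)
            else:
                image[block] = original
        return image


vol = CopyOnWriteVolume({0: "MASTER v1", 1: "INDEX v1"})
monday = vol.point_in_time_copy()
vol.write(0, "MASTER v2")
vol.write(2, "NEW EXTENT")
print(vol.read_copy(monday))  # {0: 'MASTER v1', 1: 'INDEX v1'}
```

A copy like this is exact for the blocks it covers, but it says nothing about which other data sets must be restored to the same point, which is the context the application view supplies.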

Application View

The application view adds the context of how the data are used by the client's applications. From this view, a client can see how applications are sequenced, which ones are running at a given point in time, which data each one accesses and which data are in use at that point.

An application may need to access hundreds or thousands of data sets generated by other applications or by its own previous executions. It may in turn generate hundreds or thousands of new data sets that are used by other applications or by subsequent executions of the same application.
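These producer and consumer relationships form a dependency graph. The Python sketch below models a hypothetical three-job batch application (the job and data set names are invented for illustration) and derives two things a recovery needs: every data set a given output depends on, and an order in which the jobs could be rerun. It uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical batch application: job -> (data sets read, data sets written).
JOBS = {
    "EXTRACT": ({"PROD.RAW.TXNS"}, {"PROD.TXNS.CLEAN"}),
    "POSTING": ({"PROD.TXNS.CLEAN", "PROD.LEDGER.PREV"}, {"PROD.LEDGER.CURR"}),
    "REPORT": ({"PROD.LEDGER.CURR"}, {"PROD.DAILY.RPT"}),
}


def producers(jobs):
    """Map each data set to the job that writes it."""
    return {ds: job for job, (_, outputs) in jobs.items() for ds in outputs}


def upstream(target, jobs):
    """Every data set `target` depends on, directly or indirectly."""
    made_by = producers(jobs)
    needed, stack = set(), [target]
    while stack:
        job = made_by.get(stack.pop())
        if job is None:
            continue  # no job in this application creates it
        for source in jobs[job][0]:
            if source not in needed:
                needed.add(source)
                stack.append(source)
    return needed


def rerun_order(jobs):
    """Jobs ordered so that every producer runs before its consumers."""
    made_by = producers(jobs)
    preds = {job: {made_by[ds] for ds in inputs if ds in made_by}
             for job, (inputs, _) in jobs.items()}
    return list(TopologicalSorter(preds).static_order())


deps = upstream("PROD.DAILY.RPT", JOBS)
external = {ds for ds in deps if ds not in producers(JOBS)}
print(sorted(external))   # ['PROD.LEDGER.PREV', 'PROD.RAW.TXNS']
print(rerun_order(JOBS))  # ['EXTRACT', 'POSTING', 'REPORT']
```

Data sets with no producer inside the application (here, the raw input and the previous ledger) can only come from backups. In a real environment this graph would be discovered from the running workload rather than written by hand.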

The application view is critical to recoverability because it adds what the storage and data views lack: when data were created and used, and which executions depend on which. At best, the storage view can maintain the integrity of “the now” (or “the just before now”), and the data view can provide point-in-time backups of individual data sets. An application-centric solution can take backups at key points in an application's processing and restore the data needed to restart from a consistent state.

Devise a Plan

For maximum resiliency, many organizations rely on storage replication as the primary approach for DR and business continuity. The data are divided into consistency groups (i.e., sets of related data), typically by application. The primary and alternate storage controllers communicate changes to the data while preserving the order of dependent writes, and with it the integrity of the data, within each consistency group. The client and vendor plan the placement of data into consistency groups and configure the flow of changes to achieve the necessary RTO and RPO.
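The property a consistency group protects is often described as dependent-write consistency: a copy must never contain a write without also containing the earlier writes it depends on. The Python sketch below is a toy model of that idea, with hypothetical volume names and a three-step update. It checks whether a copy of several volumes matches the state of all of them after one common prefix of the ordered write stream.

```python
# Hypothetical ordered stream of dependent writes for one application.
WRITES = [
    ("LOGVOL", "intent: debit account 42"),   # 1. log the intent
    ("DBVOL", "account 42 balance -= 100"),   # 2. update the database
    ("LOGVOL", "commit: debit account 42"),   # 3. record completion
]


def copy_volumes(writes, cuts):
    """Copy each volume as it stood after its own number of writes (`cuts`)."""
    return {vol: [data for v, data in writes[:cut] if v == vol]
            for vol, cut in cuts.items()}


def dependent_write_consistent(writes, copy):
    """True if the copy equals the state of all volumes after one common prefix."""
    return any(copy == copy_volumes(writes, {vol: cut for vol in copy})
               for cut in range(len(writes) + 1))


# All volumes in the group copied at the same point: consistent.
grouped = copy_volumes(WRITES, {"LOGVOL": 2, "DBVOL": 2})
# Volumes copied independently, at different moments: inconsistent.
independent = copy_volumes(WRITES, {"LOGVOL": 3, "DBVOL": 1})
print(dependent_write_consistent(WRITES, grouped))      # True
print(dependent_write_consistent(WRITES, independent))  # False
```

The inconsistent copy records a committed debit that never reached the database, a state that a restart cannot trust.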

Data replication doesn't protect you from a ransomware attack or application corruption because a storage controller replicates all data as it's written. It can't distinguish an “appropriate write” from an “inappropriate write,” so the alternate copies end up as corrupt as the production data. Making an application or set of applications recoverable requires a backup copy (“snapshot”) of all application data that is consistent at critical points in time, plus a backup copy of the application as it was running at those points.
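The difference between a replica and a snapshot taken at a critical point fits in a few lines. In this hypothetical Python sketch, the replica receives every write, appropriate or not, while a snapshot taken at the end of a batch cycle is a separate frozen copy. `MirroredStore` and the data set names are illustrative assumptions, and the in-memory snapshot merely stands in for a copy kept on isolated media.

```python
import copy


class MirroredStore:
    """Hypothetical production store with a mirror and point-in-time snapshots."""

    def __init__(self) -> None:
        self.primary = {}
        self.replica = {}
        self.snapshots = {}

    def write(self, data_set: str, contents: str) -> None:
        self.primary[data_set] = contents
        # Replication cannot tell a legitimate update from a malicious one.
        self.replica[data_set] = contents

    def take_snapshot(self, label: str) -> None:
        # An independent copy: later writes do not touch it.
        self.snapshots[label] = copy.deepcopy(self.primary)

    def restore(self, label: str) -> None:
        # Rebuild both production and replica from the chosen snapshot.
        self.primary = copy.deepcopy(self.snapshots[label])
        self.replica = copy.deepcopy(self.snapshots[label])


store = MirroredStore()
store.write("PROD.CUSTOMER.MASTER", "customer records")
store.write("PROD.LEDGER.CURR", "ledger balances")
store.take_snapshot("end-of-batch")

for name in list(store.primary):  # the corruption is mirrored along with everything else
    store.write(name, "<encrypted>")
print(store.replica["PROD.LEDGER.CURR"])  # <encrypted>

store.restore("end-of-batch")
print(store.primary["PROD.LEDGER.CURR"])  # ledger balances
```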

To protect snapshot copies from corruption, keep them on media that are isolated from primary production storage and secured differently.

Understanding all of the applications and data sets that are critical to your environment is a daunting task. Even small environments might have hundreds of jobs accessing thousands of related data sets. Large environments can have tens or hundreds of thousands of jobs and millions of data sets. Aside from the initial complexity, the environment is continually changing.

The size and complexity of the problem demand a solution that interrogates and understands the application flow, data usage and the relationship of that data to all of the jobs within that application. The solution should perform backups at critical times and maintain an inventory of the backed-up data.

When recovery becomes necessary, the solution should be able to use that inventory to inform administrators of the complete sets of recoverable data, minimizing the need for complex human analysis of the environment.
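Once backups are recorded in an inventory, finding a complete recovery point is a set computation. This Python sketch assumes a hypothetical inventory format, a list of (recovery point, data set) pairs, and an application whose required data sets are already known. It lists the recovery points at which every required data set was backed up, newest first, and reports what is missing from the others.

```python
from collections import defaultdict
from datetime import datetime


def recovery_points(inventory, required):
    """Split recovery points into complete ones and incomplete ones with their gaps."""
    backed_up = defaultdict(set)
    for point, data_set in inventory:
        backed_up[point].add(data_set)
    complete = sorted((p for p, sets in backed_up.items() if required <= sets), reverse=True)
    missing = {p: required - sets for p, sets in backed_up.items() if not required <= sets}
    return complete, missing


REQUIRED = {"PROD.RAW.TXNS", "PROD.LEDGER.PREV", "PROD.CUSTOMER.MASTER"}
INVENTORY = [  # hypothetical entries
    (datetime(2024, 5, 1, 2, 0), "PROD.RAW.TXNS"),
    (datetime(2024, 5, 1, 2, 0), "PROD.LEDGER.PREV"),
    (datetime(2024, 5, 1, 2, 0), "PROD.CUSTOMER.MASTER"),
    (datetime(2024, 5, 2, 2, 0), "PROD.RAW.TXNS"),
    (datetime(2024, 5, 2, 2, 0), "PROD.LEDGER.PREV"),
]

complete, missing = recovery_points(INVENTORY, REQUIRED)
print(complete[0])  # 2024-05-01 02:00:00
for point, gaps in missing.items():
    print(point, sorted(gaps))  # 2024-05-02 02:00:00 ['PROD.CUSTOMER.MASTER']
```

Here the newer point is unusable because one backup is missing, so the most recent complete recovery point is the earlier one; reporting that automatically spares administrators the manual analysis.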

Finally, the solution must adapt the recovery strategy to account for changes in the environment. With an application-centric solution, your organization can recover its applications more reliably and restore service more quickly after data corruption.