5 Points to Consider When Preparing for Backup and Recovery

Planning for IT outages, whether instigated by Mother Nature, human error, system issues or a sly hacker, is no small feat for businesses. While the causes of unplanned outages vary, a strategy to manage outages can be standardized.

Harry Batten, executive IT architect, IBM, lists five common points companies should consider while crafting a backup and recovery plan:

  1. The importance of the data on your system
  2. How often the data changes
  3. The lifecycle of this data
  4. Whether this data is required for system recovery
  5. When this data should be backed up

Identify Data Importance

According to Batten, an important first step toward achieving a successful backup and recovery plan is to understand your data and its various unique requirements. While this may sound daunting, he says it’s not a challenging task for clients who truly understand their businesses.

He recommends examining the data from a failure perspective: Clients should first consider what would happen if their data became corrupted or lost. What’s needed to get the data back online, and what is the recovery point objective? Logged data, for example, could require restoring the last backup, applying log files and performing a forward recovery that places the data at a point just before the outage occurred. The loss of non-logged data, by contrast, could require restoring to the last known successful backup, which may have occurred several days prior to the outage.
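
To make the distinction concrete, here is a minimal sketch of a forward recovery, assuming the logged data lives in a Db2 for z/OS table space; the subsystem, table space and log RBA are hypothetical, and a real job would take the RBA from the log inventory:

```
//* Hypothetical sketch: forward-recover a Db2 table space to a point
//* just before the outage. RECOVER restores the last image copy and
//* then applies log records up to the specified RBA.
//RECOVER  EXEC DSNUPROC,SYSTEM=DSN1,UID='PITREC'
//SYSIN    DD *
  RECOVER TABLESPACE PAYDB.PAYTS TORBA X'00000551BE7D'
/*
```

Non-logged data has no log records to apply, so its recovery point is simply the time of its last good backup.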

“It’s a good idea even in a static environment to look at what is being backed up, as well as how and where it’s stored on an annual basis at least.”

—Harry Batten, executive IT architect, IBM

Much has changed since the early days of computing, when anything and everything was regularly backed up. With the recent explosion of data, companies must now tag their data to indicate whether it should be backed up and, if so, which method is most appropriate. Once it’s decided the data should be backed up, it’s necessary to determine whether the backup is for compliance or recovery purposes. The government dictates the number of years companies must store compliance data, and requires that this data be stored using a method that can’t be easily modified.
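
As a simple illustration of the compliance case, a retention period can be coded when the compliance copy is written; the data set names below are hypothetical, and real compliance setups typically add WORM media or a dedicated archive product on top of this:

```
//ARCHIVE  EXEC PGM=IEBGENER
//* Hypothetical sketch: copy a ledger to tape with a 10-year
//* retention period (RETPD=3650) so it can't be casually scratched.
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DSN=PROD.LEDGER.Y2016,DISP=SHR
//SYSUT2   DD DSN=COMPLY.LEDGER.Y2016,DISP=(NEW,CATLG),
//            UNIT=TAPE,RETPD=3650,DCB=*.SYSUT1
//SYSIN    DD DUMMY
```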

“In most [IBM] z Systems* environments, most of the critical data is already vaulted via a disaster recovery plan,” Batten says. “The backups we are talking about here are for easy recovery of day-to-day operations. Compliance data is self-explanatory—the government tells us that we need to retain this information for x number of years. Not all remaining data sets need to be backed up; however, there are almost always system data sets such as work files. A question we ask is, ‘What would be the result of losing or corrupting this data set?’ If we aren’t sure whether it should be backed up, it’s fairly easy to put it in a management class that will migrate it, then delete it after x days of non-usage.”
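
A minimal sketch of that approach, written as an SMS automatic class selection (ACS) fragment with hypothetical class and pattern names (the migrate-and-expire behavior itself would be defined on the management class in ISMF):

```
PROC MGMTCLAS
  /* Hypothetical ACS fragment: route work, temp and test data  */
  /* sets to MCWORK, a class defined to migrate them and then   */
  /* delete them after a set number of days of non-usage.       */
  FILTLIST WORKDSN INCLUDE(**.WORK.**,**.TEMP.**,TEST.**)
  SELECT
    WHEN (&DSN = &WORKDSN)
      SET &MGMTCLAS = 'MCWORK'
    OTHERWISE
      SET &MGMTCLAS = 'MCSTD'
  END
END
```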

Lifecycle Considerations

Clients must consider whether their data is static (i.e., little chance exists of this data changing on a regular basis) or dynamic (i.e., records within the data are regularly added, modified or deleted). The type of data dictates the type of backup methodology that should be used. “For more static data,” Batten says, “we could use a generation data group, while dynamic data would benefit from an incremental method.”
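
For the static case, a generation data group (GDG) is defined once with IDCAMS, and each backup cycle then writes a new generation; the name and limit below are hypothetical:

```
//DEFGDG   EXEC PGM=IDCAMS
//* Hypothetical sketch: keep the seven most recent backup
//* generations; the oldest is scratched as new ones are cataloged.
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE GDG(NAME(PROD.RATES.BACKUP) LIMIT(7) SCRATCH)
/*
```

A backup job then writes to PROD.RATES.BACKUP(+1). Dynamic data, by contrast, is better served by DFSMShsm incremental backup, which copies only data sets that have changed since the last backup.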

While the actual exercise of backing up data may be relatively simple, Batten notes that understanding when backups are no longer current—or even usable—is a more challenging but imperative task. Data that was backed up years ago and is no longer pertinent may be taking up valuable storage space on backup disk or tape. Identifying this data and understanding its lifecycle is a step toward automating the backup and recovery process.

“Once the lifecycle is understood, it’s a good idea to use automated methods, such as the Data Facility Storage Management Subsystem (DFSMS) with properly defined management classes and automatic class selection (ACS) routines,” Batten says. “The currently available tooling for products like hierarchical storage management and tape management will allow the user to produce reports that can show when a particular data set was backed up and when it was last accessed. Even a simple listing of a backup volume can show this information.”
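
For example, a batch TSO step can pull that information from DFSMShsm; the data set name below is hypothetical, and exact command options vary by release:

```
//HSMLIST  EXEC PGM=IKJEFT01
//* Hypothetical sketch: list DFSMShsm backup versions (BCDS) and
//* migration status (MCDS) for one data set, with dates.
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  HLIST DATASETNAME('PROD.RATES.MASTER') BOTH
/*
```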

A successful disaster recovery plan identifies and accounts for the data that’s required for system recovery. Batten says many z Systems clients use synchronous or asynchronous data replication to ensure all current data is present at a remote site in the event of a disaster. In other words, clients must know not only which data is necessary for system recovery, but also where it’s stored. For example, a client may have data migrated to tape that’s needed to restore to a point in time. In that case, the recovery plan should note that the tape must also be replicated and available at the remote site.

GDPS* is designed to guarantee the consistency of z Systems data and to automate the entire recovery process. GDPS utilizes IBM remote copy solutions such as Metro Mirror, Global Mirror and z/OS* Global Mirror. The automation achieved with GDPS is based upon Tivoli* System Automation (SA) for z/OS, which is the only automation product designed to exploit the Parallel Sysplex* environment. Tivoli SA provides the ability to manage the whole sysplex from a central point.

Data should be backed up at a time that will cause the least amount of disruption to users. Batten says modern technologies have allowed for more dynamic data backup; certain databases or data set types can be backed up while remaining open. In other cases, the backup software could lock the data and cause delays in application programs. “For data sets that are updated perhaps via a batch job, backup should be triggered by a scheduler after satisfactory completion of the job,” he says.
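
Here is a minimal sketch of that batch pattern, with hypothetical program and data set names; in practice a job scheduler would submit the backup on successful completion of the update job, but the same effect can be sketched in one job with conditional JCL:

```
//NIGHTLY  JOB (ACCT),'UPDATE+BACKUP',CLASS=A,MSGCLASS=X
//UPDATE   EXEC PGM=PAYUPDT
//MASTER   DD DSN=PROD.PAYROLL.MASTER,DISP=OLD
//* Dump the master file only if the update ended cleanly (RC=0).
//         IF (UPDATE.RC = 0) THEN
//BACKUP   EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//TAPE     DD DSN=BKUP.PAYROLL.MASTER(+1),DISP=(NEW,CATLG),
//            UNIT=TAPE
//SYSIN    DD *
  DUMP DATASET(INCLUDE(PROD.PAYROLL.MASTER)) -
       OUTDDNAME(TAPE)
/*
//         ENDIF
```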

Challenges and Safeguards

Batten notes clients should consider revisiting their backup and recovery plans periodically, but especially after the adoption of new technologies (e.g., encryption) or when new data-compliance regulations are put into effect.

“It’s a good idea even in a static environment to look at what is being backed up, as well as how and where it’s stored on an annual basis at least,” Batten says. “It’s not uncommon for applications to be retired or rewritten and the old data never removed from storage.”

He also advises clients to develop robust naming conventions for their data sets, which ensures each backup file has a unique identifier. Publishing a list of existing management classes and their triggers likewise allows clients to quickly determine the details of any backup. When a new application is introduced, this list helps clients determine whether the application fits into an existing class or whether a new one should be created.
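
Such a published list can be quite short; a hypothetical example:

```
MCPROD    Production masters    Daily incremental backup; never expires
MCWORK    Work/temp data sets   Migrate after 7 days non-usage; delete after 30
MCTEST    Development and test  Migrate after 14 days non-usage; delete after 60
MCCOMPLY  Compliance archives   No migration; retain per regulation on WORM media
```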

According to Batten, z Systems data audits have uncovered a common error clients make when planning for backup and recovery. While production systems are clean and controlled, he says, development and test environments tend to be messy and consume valuable space. When application programmers develop a new program, for example, they create test data. Because this data will be changed by the application during the test, programmers create another copy from which to recover and test again. Once testing is complete, all of this test data remains stored. Unless the developer complied with the same standards used to back up production systems, this development data may never be deleted. “It’s imperative that a backup plan extends to these environments as well, even if to just enforce the data life cycle,” Batten explains.

According to Batten, one of the biggest challenges companies face while developing a backup and recovery plan is an inability to standardize and control their data. To ensure all backups and migrations happen when required and the proper data lifecycle is enforced, he suggests using the automation tools available to z Systems clients.

Plan for the Worst

An effective backup and recovery plan assumes nothing is safe from failure: hardware, software and even applications may contribute to unplanned outages. Batten says it’s paramount that clients understand their data and how it relates to achieving business continuity. He suggests they try to identify every possible data-loss scenario and then ask themselves what steps they would take to recover.