Hot, Warm and Cold Data Find a Home With Storage Groups
Storage groups provide a user-friendly way to provision storage based on business requirements.
By Dan Gibson, 09/30/2012
Your data has a temperature. In fact, it might have several. Knowing its temperature will help you know how to manage it. Multitemperature data management classifies data by how frequently it’s accessed and places it on storage to match. The classifications are often referred to as hot, warm and cold (see Table 1). Hot data is accessed frequently and stored on faster storage, warm data is accessed less frequently and stored on slightly slower storage, and cold data is rarely accessed and stored on even slower storage.
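As a rough illustration of the classification, the sketch below labels data by how often it is accessed. The function name `classify_temperature` and both thresholds are assumptions made up for this example, not values from any product; real cutoffs would come from your business requirements.

```python
def classify_temperature(accesses_per_day, hot_threshold=100, warm_threshold=10):
    """Label data as hot, warm or cold from its access frequency.

    The thresholds are illustrative assumptions, not product defaults.
    """
    if accesses_per_day < 0:
        raise ValueError("accesses_per_day must not be negative")
    if accesses_per_day >= hot_threshold:
        return "hot"
    if accesses_per_day >= warm_threshold:
        return "warm"
    return "cold"


# Hypothetical access counts for three table partitions.
for partition, accesses in [("sales_2012q3", 500), ("sales_2012q1", 25), ("sales_2009q4", 0)]:
    print(partition, classify_temperature(accesses))
```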
Each type of storage has an associated cost, which is dependent on several factors, such as the type of storage or the size of the storage devices. Costs are also influenced by environmental factors, such as rack space, floor space, the amount of power required, the number of power supplies, and redundancy and recovery capabilities. Costs can further be impacted by additional features incorporated in your storage, such as the amount of memory cache and the use of certain algorithms to assist with performance, error checking or error correction.
Business requirements can help you decide what type of storage to use for different types of data, with the temperature of the data as part of the decision-making process. One way to put those decisions into practice is to use storage groups.
What Are Storage Groups?
A storage group is a set of storage paths that manage storage allocation for table spaces. Different storage technologies can be defined in different storage groups, thus allowing table spaces to be created using the most effective storage type based on business requirements, such as service-level objectives, recovery requirements (e.g., RAID definition or table recovery requirements) and cost. Some examples of storage group configurations that can be defined are:
- 144 1 TB drives, 7.5k rpm, RAID 1
- 128 500 GB drives, 10k rpm, RAID 5
- 5 500 GB solid state drives, RAID 6
- 256 500 GB drives, 10k rpm, RAID 3 with disk replication
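In DB2, a storage group is defined with the CREATE STOGROUP statement, which names the group and lists its storage paths. The following is a minimal sketch of a helper that builds such statements; the helper name, the storage group names and the mount points are all hypothetical, and optional storage group attributes are left out.

```python
def create_stogroup_sql(name, paths):
    """Build a CREATE STOGROUP statement for the given storage paths."""
    if not paths:
        raise ValueError("a storage group needs at least one storage path")
    # Quote each path as an SQL string literal, doubling embedded quotes.
    path_list = ", ".join("'{}'".format(p.replace("'", "''")) for p in paths)
    return "CREATE STOGROUP {} ON {}".format(name, path_list)


# Hypothetical names and mount points; each set of paths is assumed to sit
# on a different class of storage (for example SSD, 10k rpm and 7.5k rpm).
statements = [
    create_stogroup_sql("sg_hot", ["/db2/ssd01", "/db2/ssd02"]),
    create_stogroup_sql("sg_warm", ["/db2/fast01", "/db2/fast02"]),
    create_stogroup_sql("sg_cold", ["/db2/slow01"]),
]

for statement in statements:
    print(statement)
```

The RAID level and drive type aren't part of the statement; they're properties of whatever storage sits behind each path.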
Storage Groups and Multitemperature Storage
Storage groups are an excellent fit for the implementation and management of a multitemperature storage allocation scheme. Figure 1 shows data that’s range partitioned in table spaces defined using the storage groups “hot,” “warm” and “cold”—each defined with a different type of storage.
You can have as few or as many storage groups with the same or different characteristics as you wish. Based on requirements, different types of storage (and therefore storage groups) will be best suited for different applications, workloads and business needs. For example:
- Privacy laws for storing medical data may require all medical data be stored in a storage group that supports disk encryption
- To use the least expensive storage with acceptable write performance during materialized query table (MQT) maintenance, all MQTs will be placed in table spaces that reside in a single storage group built on that storage. The assumption is that read performance will be sufficient because MQTs are usually aggregates of base table data and thus contain significantly fewer rows.
- The deep analytics team has no change data capture process and performs a full refresh of its tables nightly. As such, it has no business requirement for anything but RAID 1 as the recovery plan is simply to reload the data. However, it does require performance. One option being discussed for sustained performance is to use numerous inexpensive 500 GB disks, thereby providing a sufficient number of spindles for disk performance.
Moving Data to Different Storage Groups
Table spaces can be moved from one storage group to another. This process is performed online using the ALTER TABLESPACE statement with all data remaining fully accessible as the data in the table space is moved. Referring to Figure 1, this means you can move table spaces to different storage media as required.
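The move itself is expressed with the USING STOGROUP clause of ALTER TABLESPACE. As a sketch, the helper below compares where table spaces currently live with where they should live and emits a statement only for those that need to move. The function names, table space names and storage group names are hypothetical.

```python
def move_tablespace_sql(tablespace, target_stogroup):
    """Build the statement that moves one table space to another storage group."""
    return "ALTER TABLESPACE {} USING STOGROUP {}".format(tablespace, target_stogroup)


def plan_moves(current, desired):
    """Return move statements for table spaces whose storage group changes.

    Both arguments map a table space name to a storage group name.
    """
    return [
        move_tablespace_sql(ts, desired[ts])
        for ts in sorted(desired)
        if current.get(ts) != desired[ts]
    ]


# Hypothetical layout: the oldest "hot" quarter has cooled and moves to warm storage.
current = {"ts_2012q1": "sg_hot", "ts_2011q4": "sg_warm"}
desired = {"ts_2012q1": "sg_warm", "ts_2011q4": "sg_warm"}

for statement in plan_moves(current, desired):
    print(statement)
```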
It’s also possible to add storage media to or remove it from storage groups. As expected, adding or removing storage is performed without any interruption of service.
While it may be easy to envision moving data to different storage groups as data “cools,” here are a few other scenarios:
- A set of tables containing historical data needs to be accessed to create a set of reports for an audit. To provide adequate performance, you’ll move the table spaces containing that data to a storage group that will provide faster disk I/O.
- The XML content in a set of tables has become static and will never be updated but will still be frequently accessed in the near future. You’ll temporarily move all table spaces containing the XML content to solid state storage for the next six months until the next content-generation cycle takes place. At that time, you’ll move the data to cold storage and then eventually to an archive.
- After encountering a series of disk failures for a particular storage group, you’ve decided to move all data to a different storage group.
Figure 2 depicts table spaces being moved to different storage groups.
Intelligent Storage Provisioning Strategy
Also referred to as thin provisioning or dynamic provisioning, intelligent storage provisioning allocates space as it’s required so that allocated storage capacity isn’t wasted. Examples are:
- A project requires only 500 GB of disk space in the beginning as a means to load data, and only after further design decisions are made will the project require additional storage.
- An existing project requires additional storage but only for certain table spaces. As such, the additional storage added is to be used specifically for those table spaces only.
- The company’s chargeback policy charges not for space used or allocated but for space allotted. As such, managers need a much more granular unit of measure when storage capacity is increased.
Because a storage group consists of a set of storage paths, you can add space, or storage, to a storage group and only the table spaces in that storage group are able to take advantage of that space. The DB2 database doesn’t start using the new space immediately; it uses it only when required because existing space is running out. Alternatively, any table space in that storage group can start using the new space right away if you alter the table space to rebalance its data across the storage paths.
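The two steps described here, adding a path to a storage group and then rebalancing only the table spaces that should use it right away, can be sketched as follows. The sketch builds ALTER STOGROUP ... ADD and ALTER TABLESPACE ... REBALANCE statements; the helper name, storage group, path and table space names are hypothetical.

```python
def add_storage_sql(stogroup, new_paths, tablespaces_to_rebalance=()):
    """Build statements that add paths to a storage group and rebalance selected table spaces.

    Table spaces that aren't listed get no REBALANCE statement here.
    """
    if not new_paths:
        raise ValueError("at least one new storage path is required")
    # Quote each path as an SQL string literal, doubling embedded quotes.
    path_list = ", ".join("'{}'".format(p.replace("'", "''")) for p in new_paths)
    statements = ["ALTER STOGROUP {} ADD {}".format(stogroup, path_list)]
    for tablespace in tablespaces_to_rebalance:
        statements.append("ALTER TABLESPACE {} REBALANCE".format(tablespace))
    return statements


# Hypothetical example: grow the warm storage group and rebalance one table space.
for statement in add_storage_sql("sg_warm", ["/db2/warm03"], ["ts_sales_2011"]):
    print(statement)
```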
Storage groups provide a user-friendly way to provision storage based on business requirements. Knowing your data’s temperature can help you make informed decisions about the type of storage and configuration that best fits your data.
Dan Gibson is a 20-year veteran of IBM specializing in very large databases (VLDB), data warehousing and business analytical solutions across the globe.