IBM Storage Technology Shifts to Accommodate Cloud and Cognitive Demands
The IT industry is celebrated for its embrace of rapid change and disruptive breakthroughs, but storage is the outlier. If all of the disciplines of large-scale computing were invited to a party, storage would be the designated driver. After all, it’s the custodian of the handfuls of electrons and the tiny, fragile magnetic fields that represent people’s net wealth, tell surgeons what to fix, distribute works of art and predict the weather. The discipline does this with amazing precision, at immense scale and without a safety net; if digital data is lost or corrupted, the recovery must be from other digital data.
Rock-solid reliability at scale isn’t easy to achieve, and users are appropriately wary of risk, so major shifts in storage technology don’t happen often. Revolutions in storage require a confluence of new technology, new business needs that demand it, and the conviction among component vendors, systems vendors and end users that the required investment is sound.
By my own count of revolutions, we had the disk drive, introduced by IBM in 1956 with the 350 Disk Storage Unit; storage networking in the late 1990s with the introduction of the Fibre Channel SAN; and small incremental steps in between. We now find ourselves in the midst of another revolution, driven by the combined forces of cloud services, cognitive computing and a new set of storage technologies that have come of age.
New Data and Storage Needs
Cloud computing presents a new deployment model, in which storage is abstracted and presented to end users and applications as a service. Services can vary in service level, speed, cost, availability, disaster tolerance and so on, but users of applications see only the service attributes, not the physical storage itself. Some physical properties of storage do show through, including the number and geographic locations of copies, and the underlying speed and cost of the infrastructure. But overall, users can ignore the question of how things are done and simply choose from a service catalog and pay per use. The cloud service providers must then deliver the required capabilities, at the required scale and at a cost that permits an adequate margin for their organization.
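To make the service-catalog idea concrete, here is a minimal Python sketch. The tier names, prices and attributes are hypothetical, chosen only to show how a tenant selects on service attributes while the provider maps each entry to physical infrastructure behind the scenes.

```python
# Illustrative sketch (not an actual IBM or cloud-provider API): a catalog
# entry captures only the attributes a tenant sees; the provider maps each
# entry to physical infrastructure and bills per use.
from dataclasses import dataclass

@dataclass
class StorageServiceTier:
    name: str                   # e.g., "standard", "performance", "archive"
    iops_target: int            # service-level objective, not a physical property
    copies: int                 # number of geographically separate copies
    price_per_gb_month: float   # pay-per-use billing basis

CATALOG = [
    StorageServiceTier("archive",        50, 2, 0.004),
    StorageServiceTier("standard",    1_000, 2, 0.02),
    StorageServiceTier("performance", 50_000, 3, 0.10),
]

def choose_tier(min_iops: int, max_price: float) -> StorageServiceTier:
    """Pick the cheapest catalog entry that meets the requested service level."""
    candidates = [t for t in CATALOG
                  if t.iops_target >= min_iops and t.price_per_gb_month <= max_price]
    if not candidates:
        raise ValueError("No catalog entry satisfies the request")
    return min(candidates, key=lambda t: t.price_per_gb_month)
```

A tenant asking, say, `choose_tier(min_iops=500, max_price=0.05)` would be placed on the "standard" tier without ever seeing the hardware that backs it.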
New storage requirements arise out of the cloud model. A network data portal is needed that can ingest data uploads from tenants’ own IT operations, as well as data feeds from distributed operations or the Internet of Things. Storage infrastructure must be shared by tenants, able to start small at low cost and quickly scale as large as demand requires. Cloud tenants must be securely isolated: not only must their data be separated, but excessive activity by one tenant must not slow down others. Above all, the routine functions of storage operation must be highly automated. User onboarding, provisioning of services selected from the catalog and even troubleshooting should be handled programmatically, because the systems are too large and complex for administrators to be touching regularly. Fortunately, the cloud model enforces limited patterns of user behavior: users can have any storage they want, as long as it’s in the service catalog.
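The following sketch suggests what "handled programmatically" can look like in practice. The REST endpoint, fields and token are hypothetical; the point is that onboarding creates an isolated, quota- and QoS-limited space for each tenant without an administrator touching the system.

```python
# Minimal sketch of programmatic tenant onboarding against a hypothetical
# provisioning REST API. The endpoint, payload fields and token are
# illustrative only, not a real product interface.
import requests

API = "https://storage.example.com/api/v1"      # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}   # placeholder credential

def onboard_tenant(tenant_id: str, tier: str, quota_gb: int, iops_limit: int) -> dict:
    """Create an isolated volume for a tenant with a capacity quota and a QoS
    cap, so one tenant's activity cannot starve the others."""
    resp = requests.post(
        f"{API}/tenants/{tenant_id}/volumes",
        headers=HEADERS,
        json={"tier": tier, "quota_gb": quota_gb, "iops_limit": iops_limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: onboard_tenant("acme", tier="standard", quota_gb=500, iops_limit=2000)
```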
Right behind the idea of clouds run by service providers is the pattern of hybrid clouds. An emerging trend is for established businesses with their own IT shops to use service providers to complement their own facilities. An enterprise could use a cloud for backup or archive of inactive data, to offload applications during peak demand, or to provide a complete business process (e.g., email and collaboration for a mobile workforce). Hybrid clouds require management oversight that spans both on-premises and in-cloud infrastructure and allows for application and data mobility between the facilities. The need to cooperate with cloud services that become part of enterprise operations pushes cloud technology into traditional IT.
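As one illustration of the archive use case, a hybrid-cloud policy might push cold data from an on-premises file system out to a cloud tier. This is a simplified sketch under stated assumptions; the `upload_to_cloud` helper is hypothetical and would wrap an S3- or Swift-style client in practice.

```python
# Illustrative hybrid-cloud archive policy: files that have not been read for
# `days_idle` days are handed to a cloud upload function. The upload helper
# is hypothetical; a real implementation would also leave a stub or catalog
# entry behind so the data stays discoverable on premises.
import time
from pathlib import Path

def find_archive_candidates(root: str, days_idle: int = 90):
    """Yield files under `root` whose last access time is older than the cutoff."""
    cutoff = time.time() - days_idle * 86400
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path

def archive_inactive(root: str, upload_to_cloud, days_idle: int = 90) -> int:
    """Copy cold files to the cloud tier; return how many were archived."""
    count = 0
    for path in find_archive_candidates(root, days_idle):
        upload_to_cloud(path)   # push the inactive data to the cloud service
        count += 1
    return count
```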
At the same time that clouds are changing the ownership model for storage, cognitive computing is changing the way applications use data and the storage services that would benefit them. The attributes of cognitive systems are that they:
- Understand a variety of data sources and imagery the way humans do
- Reason and extract underlying principles by forming hypotheses based on observation of data
- Learn by taking actions, observing results and refining hypotheses
- Interact with humans in a natural way
Taking the first two of these, we find that many cognitive systems require large collections of unstructured data, presented in a variety of formats and operated on by a variety of algorithms. This is quite different from traditional rules-based systems, which accept a limited number of inputs and then execute a fixed set of steps.
Somewhat less exotic than fully cognitive systems is the rapidly growing collection of analytic or big data systems that seek insights from large collections of data. Data scientists have become an important presence in many enterprises, and they deploy a range of tools such as MongoDB, Spark, Cloudant*, Couchbase and others to derive business insights from data. New applications can usually get by using whatever storage facilities are available, but as these become mission-critical and grow in scale, storage requirements specific to the workloads begin to emerge. If the applications are important enough, and the benefits large enough, enterprises seeking competitive advantage through data will adopt new storage patterns.
New Storage Technologies Step Up
Fortunately, a new set of advanced storage technologies that meet new needs for performance, cost and function has reached maturity and is ready for use. For performance and cost, there are Flash-based solid-state storage and phase-change memory, along with the non-volatile memory express (NVMe), NVMe over Fabric, Coherent Accelerator Processor Interface (CAPI) and zHyperLink attachments. For cost and function, there is software-defined storage (SDS), including the software for object stores, which are key to cloud services. Finally, for function, IBM sees a new role for data management through metadata, which will be important for well-run analytic and cognitive systems.
Storage performance at the media level has been highly problematic for nearly 20 years. Disk drives have been essentially unchanged in I/O performance, measured in I/Os per second (IOPS), while all other IT technologies improved by orders of magnitude. NAND Flash media isn’t a perfect fit for enterprise applications, but with software for redundancy and for eliminating garbage-collection delays, it’s now highly reliable. Flash is faster than disk by up to 1,000x, and is now denser: Flash SSDs reach 15 TB in the same space as a 12 TB HDD. With the added effect of hardware compression, Flash is now cheaper than 15K and 10K RPM HDDs, and dramatically cheaper in cost per IOPS. IBM delivers Flash in standard form factor SSDs as well as in a highly optimized 2U Flash enclosure, with new models reaching 180 TB before compression.
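A rough back-of-the-envelope calculation shows why the cost-per-IOPS comparison favors Flash so heavily. The figures below are illustrative assumptions for the sake of the arithmetic, not measured or quoted prices.

```python
# Back-of-the-envelope cost-per-IOPS comparison. All figures are assumed
# placeholders to illustrate the arithmetic, not vendor data.
drives = {
    #               (assumed $ per drive, assumed IOPS per drive)
    "15K RPM HDD":  (400,       300),
    "Flash SSD":    (2000,  100_000),
}

for name, (price, iops) in drives.items():
    print(f"{name}: ${price / iops:.4f} per IOPS")

# Even if the SSD costs several times more per drive, its per-IOPS cost comes
# out orders of magnitude lower because it delivers so many more I/Os per second.
```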
Faster storage media requires faster interfaces. Network speeds have improved steadily, but the workhorse storage protocol, the Small Computer System Interface (SCSI), has become a performance bottleneck for fast devices. New attachment protocols offer much more efficient data exchanges: For x86 systems, there are NVMe and NVMe over Fabric; for Power*, it’s CAPI; and on IBM z14*, it’s zHyperLink.
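One way to see the difference is command parallelism. The NVMe specification allows up to roughly 64K I/O queues of up to 64K commands each, while a legacy SCSI/SAS device typically presents a single queue with a depth in the low hundreds; the snippet below simply works out those assumed figures.

```python
# Rough illustration of why NVMe removes the protocol bottleneck. The queue
# figures are approximate values drawn from the respective specifications and
# typical device behavior, used here only for comparison.
scsi_outstanding = 1 * 254              # one queue, typical maximum depth
nvme_outstanding = 65_535 * 65_536      # queues x entries per queue

print(f"SCSI outstanding commands: {scsi_outstanding:,}")
print(f"NVMe outstanding commands: {nvme_outstanding:,}")
```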
SDS Leader
For cloud environments, the need to create storage services on commodity hardware has driven investment in SDS. Multiple types of storage services exist: file, block or object storage. A cloud service provider or a modern enterprise administrator can create service capacity by launching additional software on appropriate hardware.
IBM has been the leader in SDS since its inception; the IBM SAN Volume Controller with IBM Spectrum Virtualize* software is the industry-leading SDS product that earns IBM its No. 1 ranking (ibm.co/2y3yUNl). SDS is also an enabler of the hybrid cloud model. To achieve easy mobility of data between on-premises operations and a cloud service, compatible storage services must span both. The SDS model allows enterprise and cloud operations to share data exchange and management interfaces, making the composite operation nearly seamless.
Within the SDS category, object storage deserves special mention. The idea of object stores has been around for decades, but until hyperscale cloud providers appeared, it saw little use. As public clouds were built, a need emerged to provide clients with an interface for data upload and retrieval. The interface had to support arbitrary networks (e.g., the internet), accommodate millions of clients, provide efficient data transfer and include a metadata facility so data can be identified and searched. All of this matches the design of object storage when used with interface protocols such as S3 or Swift. Again, IBM has stepped up to this opportunity with the acquisition of Cleversafe*, now IBM Cloud Object Storage, a leading object storage software vendor.
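A brief sketch of the S3-style interface makes the upload-plus-metadata pattern concrete. This uses the open-source boto3 client; the endpoint, bucket, key and credentials are placeholders, and S3-compatible object stores (including IBM Cloud Object Storage) accept calls of this general shape.

```python
# Minimal sketch of the S3-style object interface, using the boto3 client.
# The endpoint URL, bucket name, object key and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.cloud",    # placeholder endpoint
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Upload an object over HTTP(S), attaching user metadata so it can later be
# identified and searched without reading the object itself.
with open("vibration.csv", "rb") as body:
    s3.put_object(
        Bucket="sensor-archive",
        Key="2017/10/turbine-007/vibration.csv",
        Body=body,
        Metadata={"source": "turbine-007", "schema": "vibration-v2"},
    )

# Retrieve the metadata alone with a lightweight HEAD request.
head = s3.head_object(Bucket="sensor-archive",
                      Key="2017/10/turbine-007/vibration.csv")
print(head["Metadata"])
```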
Meeting Requirements
The unique requirements of cognitive and analytic systems can be summarized as fast performance, hybrid cloud capability and a scalable, data-centric design. The performance and cloud capabilities are met by Flash and NVMe, and by the attributes of SDS. The additional requirement of data-centric design calls for facilities with rapid ingest and data indexing. Data from enterprise applications or IoT must be brought in, cataloged and formatted appropriately for the applications that may use it, and metadata-based indexes that represent content must be built.
When cognitive applications run, they must quickly discover what data is available and relevant, and determine which objects to inspect more deeply. Without this capability, enormous repositories become unusable data junkyards that can’t produce timely results. IBM Research has demonstrated the MetaOcean project, a data curation layer for cognitive and analytic computing. MetaOcean manages an ingest and indexing facility that gives data scientists a quick way to find the correct version of the data they want to analyze, and provides information on the source and validity of that data.
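The ingest-then-discover pattern described in the last two paragraphs can be sketched in a few lines of Python. This is not MetaOcean code; the catalog, index and field names are hypothetical, intended only to show how metadata captured at ingest time lets an application find relevant, correctly versioned data without scanning the repository.

```python
# Hypothetical sketch of metadata-driven ingest and discovery (not MetaOcean
# code). At ingest time each dataset is cataloged and indexed by its metadata;
# at analysis time applications query the index instead of scanning objects.
from collections import defaultdict

catalog = {}               # object_id -> metadata record
index = defaultdict(set)   # (field, value) -> set of object_ids

def ingest(object_id: str, location: str, **metadata):
    """Catalog a newly arrived dataset and index every metadata field."""
    catalog[object_id] = {"location": location, **metadata}
    for field, value in metadata.items():
        index[(field, str(value))].add(object_id)

def discover(**criteria):
    """Return the cataloged objects whose metadata matches every criterion."""
    matches = None
    for field, value in criteria.items():
        ids = index.get((field, str(value)), set())
        matches = ids if matches is None else matches & ids
    return [catalog[i] for i in (matches or set())]

ingest("obj-001", "s3://sensor-archive/turbine-007/vibration.csv",
       source="turbine-007", kind="vibration", version="2", validated="yes")
ingest("obj-002", "s3://sensor-archive/turbine-007/vibration-old.csv",
       source="turbine-007", kind="vibration", version="1", validated="no")

# A data scientist asks for the validated, current version only.
print(discover(kind="vibration", version="2", validated="yes"))
```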
Storage as a discipline takes careful steps, due to its responsibility for data integrity. But as compelling new business opportunities arise, the demand for greater scale, higher performance, lower cost and application efficiency overcomes inertia and accelerates the pace of technology adoption.
As a leader in the cloud, cognitive and storage businesses, IBM understands these synergies and delivers the best possible system value.