Christian Jacobi, Chief Architect for IBM Z Processors, on the New IBM Telum Chip
The innovative IBM Telum chip enables users to grow and optimize workloads with a new cache infrastructure
Christian Jacobi, Distinguished Engineer and Chief Architect for IBM Z Processors at IBM, says the pandemic has accelerated many digital transformations, driving more throughput on IBM clients’ systems. The Telum chip accommodates this increased activity while maintaining the performance and scale of the systems.
“The new cache architecture is a very innovative design that provides 1.5x the cache capacity per core and at the same time provides faster access to the data,” says Jacobi. “The new architecture provides a strong performance foundation for Telum-based systems, and a new platform from which we can optimize for future generations.”
Clients have come to expect this kind of innovation from IBM, and the IBM Telum chip’s new cache design and AI capabilities do not disappoint.
Evolving Technology
The IBM Telum chip evolved from past technology in a way that many view as both efficient and revolutionary. IBM first created a cache design based on an older technology called embedded DRAM, a very high-density memory element that could be put on processor chips. This “traditional cache hierarchy,” as Jacobi describes it, consisted of four levels of cache. Each cache level was inclusive of the level below, which allowed for efficient tracking and exchange of data, but required replication of the data.
Since embedded DRAM is no longer available in 7 nm, the ground-breaking Telum chip was created to maximize the effectiveness of each memory cell on the chip and forgo data replication across the different cache levels.
In the IBM Telum chip, each core has its own very large private L2 cache of 32 megabytes. The Telum chip implements a cache line exchange mechanism that forms virtual L3 and L4 caches from underutilized space in the neighboring L2 caches.
This innovative architecture decreases data replication and delivers more cache efficiency out of every memory cell on the Telum chip.
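To make the idea of a virtual L3 built from neighboring L2 caches more concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not a description of IBM's actual mechanism: the class names `L2Cache` and `Chip`, the LRU eviction policy, the "spill to the neighbor with the most free space" rule, and the tiny capacities are all hypothetical choices made for illustration. It models only the L2 and virtual L3 levels and ignores coherence, timing, and the virtual L4.

```python
from collections import OrderedDict


class L2Cache:
    """A tiny LRU cache standing in for one core's private L2 (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, least recently used first

    def free_slots(self):
        return self.capacity - len(self.lines)


class Chip:
    """Hypothetical chip: one private L2 per core, no dedicated L3."""

    def __init__(self, cores, lines_per_l2):
        self.l2 = [L2Cache(lines_per_l2) for _ in range(cores)]

    def access(self, core, addr):
        """Return where the line was found: 'L2', 'virtual L3', or 'memory'."""
        local = self.l2[core]
        if addr in local.lines:
            local.lines.move_to_end(addr)  # refresh LRU position
            return "L2"
        source = "memory"
        for other in self.l2:
            if other is not local and addr in other.lines:
                del other.lines[addr]  # move the line, do not replicate it
                source = "virtual L3"
                break
        self._install(core, addr)
        return source

    def _install(self, core, addr):
        local = self.l2[core]
        if local.free_slots() == 0:
            victim, _ = local.lines.popitem(last=False)  # evict LRU line
            self._spill(core, victim)
        local.lines[addr] = True

    def _spill(self, core, victim):
        """Park an evicted line in the neighbor L2 with the most free space."""
        neighbors = [c for i, c in enumerate(self.l2) if i != core]
        if not neighbors:
            return
        target = max(neighbors, key=L2Cache.free_slots)
        if target.free_slots() > 0:
            target.lines[victim] = True
        # Otherwise the line is simply dropped and must be refetched from memory.


chip = Chip(cores=2, lines_per_l2=2)
for addr in ("A", "B", "C"):      # the third access evicts "A" into core 1's L2
    chip.access(0, addr)
print(chip.access(0, "A"))        # virtual L3
print(chip.access(0, "A"))        # L2
```

In this toy, a line lives in exactly one place at a time, which mirrors the article's point about avoiding replication across cache levels: idle capacity in a neighbor's L2 serves as overflow space rather than holding a second copy.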
A Real-Time Approach
The ability to perform analysis in real time is becoming more important than ever. The IBM Telum chip addresses this imperative by enabling IBM clients to leverage AI technologies for things like fraud detection to ultimately improve security.
The IBM Telum chip is designed to help stop a fraudulent credit card transaction by detecting fraud before the transaction completes. Other chip technologies detect fraud with a delay, flagging it only after an online transaction has already occurred. The Telum chip operates on data in the cache as a credit card is being processed, so the AI accelerator has direct access to the data in real time to help organizations intercept fraud.
“The technology can deliver on that minimized latency to enable [clients] to use high accuracy models for their analysis embedded directly into the workload,” says Jacobi.
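The difference between scoring a transaction inline and flagging it after the fact can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the `Transaction` fields, the rule-based `fraud_score` function (a stand-in for a trained model, not a real one), the threshold of 0.8, and the two handler functions. The sketch shows only the ordering of "score" and "commit", not how Telum performs inference.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str
    amount: float
    country: str
    home_country: str


def fraud_score(tx):
    """Hypothetical stand-in for a trained model; returns a score in [0, 1]."""
    score = 0.0
    if tx.amount > 5000:
        score += 0.5
    if tx.country != tx.home_country:
        score += 0.4
    return min(score, 1.0)


def process_inline(tx, ledger, threshold=0.8):
    """Score first, then commit only if the score is below the threshold."""
    if fraud_score(tx) >= threshold:
        return "declined"
    ledger.append(tx)
    return "approved"


def process_then_review(tx, ledger, review_queue, threshold=0.8):
    """Commit first, flag later: the transaction has already gone through."""
    ledger.append(tx)
    if fraud_score(tx) >= threshold:
        review_queue.append(tx)
    return "approved"


suspicious = Transaction("card-1", 9000.0, "FR", "US")

inline_ledger = []
print(process_inline(suspicious, inline_ledger))   # declined

late_ledger, queue = [], []
print(process_then_review(suspicious, late_ledger, queue))   # approved
print(len(queue))                                  # 1
```

Inline scoring is only practical when the model's latency fits inside the transaction's time budget, which is the constraint the article says the on-chip accelerator is meant to address.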
IBM wanted to optimize for latency in a hybrid workload by implementing a centralized on-chip accelerator into the IBM Telum chip. This is technologically different from many other platforms that build an AI accelerator as an add-on to every core on the chip, diluting how much compute each one can have when applications perform AI tasks.
By putting more than 6 teraflops in one central area of the chip, IBM’s Telum chip can use the entire capacity of the accelerator to achieve low latencies when a core switches, within a hybrid workload, from database or transaction processing to AI.
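A rough back-of-the-envelope calculation illustrates why concentrating the compute matters for single-request latency. This is an idealized, compute-only sketch: the core count of 8, the model cost of 3 billion floating-point operations per inference, and the function name `inference_latency_us` are assumptions made for illustration, and the arithmetic ignores memory bandwidth, data movement, and contention between cores for the shared accelerator.

```python
def inference_latency_us(model_flops, available_tflops):
    """Idealized compute-only latency in microseconds."""
    return model_flops / (available_tflops * 1e12) * 1e6


TOTAL_TFLOPS = 6.0   # "more than 6 teraflops" in the article, rounded down
CORES = 8            # assumption for illustration
MODEL_FLOPS = 3e9    # hypothetical cost of one inference

# One core gets the whole centralized accelerator for its request.
centralized = inference_latency_us(MODEL_FLOPS, TOTAL_TFLOPS)

# The same total compute split evenly into per-core accelerators.
per_core = inference_latency_us(MODEL_FLOPS, TOTAL_TFLOPS / CORES)

print(f"centralized: {centralized:.0f} us")   # about 500 us
print(f"per-core:    {per_core:.0f} us")      # about 4000 us
```

Under these assumptions, the request served by the centralized accelerator finishes roughly eight times sooner, at the cost of cores having to share one unit when several want to run inference at once.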
Inference in the AI Stack
The on-chip accelerator in the IBM Telum chip was designed to allow clients to do both training and inference on the IBM Z platform, if so desired. Training creates AI models from historical data, and inference applies a model to a concrete production situation using live data.
This comprehensive architecture was designed to help prevent the security and latency issues that arise when clients move their sensitive data over a network link or use x86-based AI technologies. With the IBM Telum chip, clients can conduct the training process and enable inference across the entire AI stack using real-time data. They do not need to perform inference on another platform and accept the associated latency, which makes the Telum capabilities extremely valuable.
Clients can do their training on IBM Z, on IBM Power Systems, in a data lake or in the cloud. This allows them to use any training infrastructure and makes it easier to get started with already familiar tools. The trained model can then be converted into an Open Neural Network Exchange (ONNX) format model and compiled for execution on the Telum AI accelerator. This enables the model to be deployed directly in a Z software stack, such as a z/OS stack or a Linux stack.
The model can be used at various levels in the stack for use cases like intelligent infrastructure, workload placements, database query optimization, credit card fraud or client behavior prediction.
Fulfilling Client Needs
With every new technology comes challenges that must be overcome to reach a high level of innovation. The designers of the IBM Telum chip experienced this as they navigated technical hurdles while working from home during the COVID-19 pandemic.
Despite many challenges, the IBM team was resilient, and Jacobi says the team made many conscious decisions about how to work collaboratively on the Telum chip in the time of COVID-19. IBM has a long history of co-creating with clients and understanding their needs, so when a shift toward AI occurred, the IBM team was prepared with the desired infrastructure.
Jacobi is proud of his team of engineers for all the hard work that went into the new chip design. The public’s positive reaction to the IBM Telum chip has also been extremely rewarding for him.
“Seeing how the innovation resonates and how people who understand processor design write very positive stories about that innovation and are somewhat awed by what we’re doing. That’s obviously something that as an engineer makes me proud,” Jacobi says.
Brianna Boecker is an intern for MSPC.