A POWER9 Roadmap
Rob McNelly explores Jeff Stuecheli's POWER9 presentation from January's AIX Virtual User Group meeting.
By Rob McNelly | 03/21/2017
- The slide on page 2 shows a roadmap with POWER9 appearing in the second half of 2017 and into 2018, with POWER10 appearing in the 2020 timeframe.
- Page 3 covers different workloads that POWER9 has been designed for.
- This is from page 4:
• Increased execution bandwidth efficiency for a range of workloads including commercial, cognitive and analytics
• Sophisticated instruction scheduling and branch prediction for unoptimized applications and interpretive languages
• Adaptive features for improved efficiency and performance especially in lower memory bandwidth systems
- This is from page 5:
• Enhanced pipeline efficiency with modular execution and intelligent pipeline control
• Increased pipeline utilization with symmetric data-type engines: Fixed, Float, 128b, SIMD
• Shared compute resource optimizes data-type interchange
- From page 8: There will be two ways to attach memory. You can either attach it directly, or you can use buffered memory in the scale-up systems.
- Page 10 shows a matrix of what you will be able to expect from the two-socket vs. multi-socket systems.
- Page 11 shows the socket performance you can expect from POWER9 vs. POWER8.
- Page 13 covers data capacity and throughput.
- Page 15 covers the bandwidth improvements between CECs on the large systems, and page 17 examines the different accelerators that will be incorporated.
- This is from page 18:
• Coherent Memory and Virtual Addressing Capability for all Accelerators
• OpenPOWER Community Enablement – Robust Accelerated Compute Options
– PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
– 25Gb/s Common Link x 48 lanes – 300 GB/s duplex bandwidth
• Robust Accelerated Compute Options with OPEN standards
– On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2
– CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4
– NVLink 2.0 – Next generation of GPU/CPU bandwidth and integration using 25G
– Open CAPI 3.0 – High bandwidth, low latency and open interface using 25G
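The duplex bandwidth figures on page 18 follow directly from lane count and per-lane signaling rate. As a sanity check, here's a small sketch of that arithmetic (my own illustration, not from the slides; it uses raw signaling rates and ignores encoding overhead, which trims real PCIe throughput slightly):

```python
# Back-of-the-envelope check of the slide's duplex bandwidth figures.
# "Duplex" here means both directions summed; rates are raw line rates.

def duplex_gbytes_per_sec(lanes, gbits_per_lane):
    """Total duplex bandwidth in GB/s for a given lane count and line rate."""
    per_direction = lanes * gbits_per_lane / 8  # Gb/s -> GB/s, one direction
    return 2 * per_direction

# PCIe Gen 4: 48 lanes at 16 GT/s per lane -> 192 GB/s duplex
print(duplex_gbytes_per_sec(48, 16))  # 192.0

# 25G common link: 48 lanes at 25 Gb/s per lane -> 300 GB/s duplex
print(duplex_gbytes_per_sec(48, 25))  # 300.0
```

Both results match the quoted figures, which confirms the slides are citing raw signaling bandwidth across the full lane count.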
- This is from page 19:
• Coherent memory sharing
• Enhanced virtual address translation
• Data interaction with reduced SW & HW overhead
Broader Application of Heterogeneous Compute
• Designed for efficient programming models
• Accelerate complex analytic/cognitive applications
- Page 23 covers OpenCAPI 3.0 features. This is from page 26:
• New Core Optimized for Emerging Algorithms to Interpret and Reason
• Bandwidth, Scale, and Capacity, to Ingest and Analyze
Processor Family with Scale-Out and Scale-Up Optimized Silicon
• Enabling a Range of Platform Optimizations – from HSDC Clusters to Enterprise Class Systems
• Extreme Virtualization Capabilities for the Cloud
Premier Acceleration Platform
• Heterogeneous Compute Options to Enable New Application Paradigms
• State of the Art I/O
• Engineered to be Open
These are the things that stood out to me, but obviously you'll get more from listening to the replay.
Rob McNelly is a senior AIX solutions architect doing pre-sales and post-sales support for IBM Premier Business Partner Meridian IT Inc.