IBM’s Next-Gen Z Processor Detailed: Telum Chip Based on 7nm Process, 22.5 Billion Transistors, 8 Cores Running Beyond 5 GHz Clocks
Feb 13, 2026 2:10 AM

IBM has detailed its next-generation Telum chip, part of the Z processor lineup, at Hot Chips 33. Telum features a brand-new core architecture that is geared toward AI acceleration.

IBM's Next-Gen Z Processor: 7nm Telum Chip With 22.5 Billion Transistors, 8 Cores, 5 GHz+ Clocks & 6+ TFLOPs AI Acceleration

According to IBM, the newly optimized Z core, together with its brand-new cache and multi-chip fabric hierarchy, enables over 40% per-socket performance growth. The Telum chip comprises 8 cores, each with its own dedicated L2 cache. With SMT2, that gives 16 threads per chip, while a maximum configuration of 32 cores and 64 threads is possible with a 4-drawer system.

Clock speeds are said to be higher than 5 GHz, and the Telum chip comes with a redesigned branch predictor with an integrated 1st/2nd-level BTB, dynamic BTB entry reconfiguration, and more than 270K branch target table entries. The private L2 cache has a size of 32 MB and a 19-cycle load-use latency (~3.8 ns, including TLB access).
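The cycle count and the nanosecond figure quoted for the L2 cache can be cross-checked with a quick conversion. This is only a sketch: the helper name `cycles_to_ns` is made up for illustration, and the clock is assumed to be exactly 5.0 GHz, whereas the article only says "higher than 5 GHz".

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds.

    At 1 GHz one cycle lasts 1 ns, so latency_ns = cycles / clock_ghz.
    """
    return cycles / clock_ghz


# Assumption: exactly 5.0 GHz (the article only states "higher than 5 GHz").
ASSUMED_CLOCK_GHZ = 5.0
L2_LOAD_USE_CYCLES = 19  # figure quoted in the article

l2_latency_ns = cycles_to_ns(L2_LOAD_USE_CYCLES, ASSUMED_CLOCK_GHZ)
print(f"L2 load-use latency: {l2_latency_ns:.1f} ns")
```

At the assumed 5.0 GHz this works out to the ~3.8 ns quoted above; a higher actual clock would make the same 19 cycles slightly shorter in wall-clock time.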

Moving on to the L3 and L4 caches, which are shared across the 8 cores, the IBM Z Telum chip packs a virtual on-chip 256 MB L3 cache and a virtual 2 GB L4 cache across up to 8 chips. The L2 cache uses a 320 GB/s dual-direction ring interconnect topology, whereas the L3 cache is distributed through L2 cooperation and has an average latency of 12 ns. The virtual L3 and L4 caches provide 1.5x the cache per core.
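The virtual cache sizes line up with the per-core L2 figure if one assumes the virtual L3 is simply the pooled L2 capacity of one chip and the virtual L4 is the pooled L3 capacity of 8 chips. That pooling model is an assumption made here for the arithmetic, not a description of IBM's actual implementation, and the constant names are illustrative.

```python
# Figures quoted in the article.
CORES_PER_CHIP = 8
L2_PER_CORE_MB = 32
CHIPS_SHARING_L4 = 8

# Assumption: virtual L3 = sum of the private L2 caches on one chip.
virtual_l3_mb = CORES_PER_CHIP * L2_PER_CORE_MB

# Assumption: virtual L4 = sum of the virtual L3 caches across 8 chips.
virtual_l4_mb = CHIPS_SHARING_L4 * virtual_l3_mb
virtual_l4_gb = virtual_l4_mb / 1024

print(f"Virtual L3 per chip: {virtual_l3_mb} MB")
print(f"Virtual L4 across {CHIPS_SHARING_L4} chips: {virtual_l4_gb:.0f} GB")
```

Under these assumptions the totals come to 256 MB and 2 GB, matching the quoted L3 and L4 capacities.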

AI acceleration performance is rated at over 6 TFLOPs per chip and over 200 TFLOPs in a 4-drawer system that packs 32 Telum chips. The internal Matrix Array features 128 tiles with 8-way FP16 SIMD and high-density multiply-and-accumulate FPUs, while the Activation Array is composed of 32 tiles with 8-way FP16/FP32 SIMD. A dual-chip configuration yields 116,000 inferences per second at 1.1 ms latency, while a 32-chip configuration yields 3,600,000 inferences per second at 1.2 ms latency.
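As a rough sanity check on the system-level figure, the per-chip rating can be scaled to the 32-chip configuration mentioned above. The sketch assumes perfectly linear scaling across chips, which is an idealization, and treats 6 TFLOPs as a lower bound because the article says "over 6".

```python
PER_CHIP_TFLOPS_FLOOR = 6.0   # article: "over 6 TFLOPs per chip"
SYSTEM_TFLOPS_CLAIM = 200.0   # article: "over 200 TFLOPs"
CHIPS_IN_SYSTEM = 32          # the 32-chip configuration cited in the article

# Assumption: throughput scales linearly with chip count.
system_floor_tflops = PER_CHIP_TFLOPS_FLOOR * CHIPS_IN_SYSTEM

# Per-chip rate needed to reach the system claim under the same assumption.
required_per_chip_tflops = SYSTEM_TFLOPS_CLAIM / CHIPS_IN_SYSTEM

print(f"32 chips at the 6 TFLOPs floor: {system_floor_tflops:.0f} TFLOPs")
print(f"Per-chip rate needed for 200 TFLOPs: {required_per_chip_tflops:.2f} TFLOPs")
```

The floor alone gives 192 TFLOPs, so the "over 200 TFLOPs" figure implies each chip delivers at least 6.25 TFLOPs, which is consistent with "over 6".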

IBM Z Telum chips can be scaled up for even more performance as there are both single-chip and dual-chip modular designs. The 2-chip configuration features a chiplet design with 2 Telum chips and offers 16 cores, 32 threads, and 512 MB of cache.

The AI accelerator on the IBM Z Telum chip provides:

Very low and consistent inference latency
Compute capacity for utilization at scale
Variety of AI models ranging from traditional ML to RNNs and CNNs
Security - provide enterprise-grade memory virtualization and protection
Extensibility with future firmware and hardware updates

The IBM Z Telum chip will be fabricated on Samsung's 7nm process node and will feature a die size of 530 mm². The chip will house 22.5 billion transistors and is aimed at enterprise and embedded workloads.
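The transistor count and die size together imply an average transistor density, which is a common way to compare chips on different nodes. The short calculation below uses only the two figures quoted above; the variable names are illustrative, and the result is a whole-die average rather than a statement about any particular logic or cache region.

```python
TRANSISTORS = 22.5e9   # article: 22.5 billion transistors
DIE_AREA_MM2 = 530.0   # article: 530 mm² die

# Average density in millions of transistors per square millimetre (MTr/mm²).
density_mtr_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"Average density: {density_mtr_per_mm2:.1f} MTr/mm²")
```

This works out to roughly 42.5 million transistors per mm² averaged over the die.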
