During Dell EMC's presentation, AMD CTO Mark Papermaster confirmed that the company will introduce its next-generation CDNA architecture-based Radeon Instinct MI100 accelerator during the second half of 2020.
AMD's Radeon Instinct MI100 CDNA Architecture Based Discrete GPU Accelerator Arriving in 2H 2020
The AMD Radeon Instinct MI100, internally referred to as 'Arcturus', will be a next-gen HPC part featuring an enhanced version of the 7nm Vega architecture. AMD had never officially mentioned the accelerator until now. The GPU appears to be the top HPC part for 2020 in AMD's first-generation CDNA portfolio, and Mark confirmed that the discrete GPU will be introduced in the second half of 2020.
Mark Papermaster confirmed MI100 Discrete GPU accelerator for 2H 2020 pic.twitter.com/P6KTrm0B2S
— Hassan Mujtaba (@hms1193) June 17, 2020
The following is Mark's quote from the Q&A session:
Like our multi-generational commitment to the Zen roadmap in x86 CPUs, we have done the same with our DNA architectures for GPUs - RDNA for gaming and visualization, and CDNA for compute & AI. RDNA is driving gains in AMD's graphics share and is deployed in the upcoming Sony and Microsoft game consoles, and for CDNA you will see the MI100 discrete GPU, both in the 2nd half of 2020.
The ROCm software stack creates an alternative for GPU compute with easy portability and enabling competition. - AMD CTO, Mark Papermaster
Based on what we have learned from various prototype leaks, the Radeon Instinct MI100 'Arcturus' GPU will come in several variants. The flagship is the D34303 SKU, which makes use of the XL variant of the chip. The info for this part is based on a test board, so final specifications may differ, but here are the key points:
- Based on Arcturus XL GPU
- Test board has a TDP of 200W
- Up to 32 GB HBM2 memory
- HBM2 memory clocks reported between 1000-1200 MHz
The Radeon Instinct MI100 test board has a TDP of 200W and is based on the XL variant of AMD's Arcturus GPU. The card also features 32 GB of HBM2 memory with pin speeds of 1.0 - 1.2 GHz. In comparison, the MI60 has 64 CUs with a TDP of 300W, clock speeds reported at 1200 MHz (base clock), and memory operating at 1.0 GHz across a 4096-bit bus interface, pumping out 1 TB/s of bandwidth. There's a good chance that the final design of the Arcturus GPU will feature Samsung's latest HBM2E 'Flashbolt' memory, which offers 3.2 Gbps pin speeds for up to 1.5 TB/s of bandwidth.
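To see where the 1 TB/s figure for the MI60 comes from, peak HBM2 bandwidth is just the bus width times the effective per-pin data rate (the memory is double data rate, so a 1.0 GHz clock moves 2.0 Gbps per pin). A minimal sketch of that arithmetic, with the helper function name being our own:

```python
# Back-of-the-envelope HBM2 bandwidth check (illustrative, not vendor data).
# Peak bandwidth (GB/s) = bus width (bits) x per-pin rate (Gbps) / 8 bits-per-byte.
def hbm_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak theoretical memory bandwidth in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8

# MI60: 4096-bit bus, 1.0 GHz memory clock -> 2.0 Gbps effective per pin (DDR)
print(hbm_bandwidth_gbps(4096, 2.0))  # 1024.0 GB/s, i.e. ~1 TB/s
```

The same formula with a 3.2 Gbps HBM2E pin rate shows why Flashbolt pushes a 4096-bit design well past the 1 TB/s mark.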
AMD Radeon Instinct Accelerators
Accelerator Name | AMD Instinct MI400 | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CPU Architecture | Zen 5 (Exascale APU) | N/A | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
GPU Architecture | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
GPU Process Node | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
GPU Chiplets | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
GPU Cores | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6656 | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
GPU Clock Speed | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
INT8 Compute | TBD | 2614 TOPS | 1961 TOPS | 383 TOPS | 362 TOPS | 181 TOPS | 92.3 TOPS | N/A | N/A | N/A | N/A | N/A |
FP16 Compute | TBD | 1.3 PFLOPs | 980.6 TFLOPs | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP32 Compute | TBD | 163.4 TFLOPs | 122.6 TFLOPs | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP64 Compute | TBD | 81.7 TFLOPs | 61.3 TFLOPs | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
VRAM | TBD | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
Infinity Cache | TBD | 256 MB | 256 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Memory Clock | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
Memory Bus | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
Memory Bandwidth | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
Form Factor | TBD | OAM | APU SH5 Socket | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
Cooling | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP (Max) | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |