AMD Instinct MI300X & MI300A AI Accelerators Detailed: CDNA 3 & Zen 4 Come Together In An Advanced Packaging Marvel-February 2024-www.yitit.com

The AMD Instinct MI300X & MI300A are among the most anticipated accelerators in the AI segment and will launch next month. There's a lot of anticipation surrounding AMD's first full-fledged AI masterpiece, so today we are giving you a roundup of what to expect from this technical marvel.

AMD Instinct MI300X Is Designed For GPU-Accelerated AI Workloads While MI300A Tackles HPC With The Most Technically Advanced APU Package

On the 6th of December, AMD will host its "Advancing AI" keynote, where one of the main agenda items is the full unveiling of the next-gen Instinct accelerator family codenamed MI300. This new family of GPU- and CPU-accelerated products will lead AMD's AI segment, which is the company's No. 1 strategic priority right now, as it finally rolls out a product that is not only advanced but also designed to meet the industry's critical AI requirements. The MI300 class of AI accelerators will be another chiplet powerhouse, making use of advanced packaging technologies from TSMC, so let's see what's under the hood of these AI monsters.

AMD Instinct MI300X - Challenging NVIDIA's AI Supremacy With CDNA 3 & Huge Memory

The AMD Instinct MI300X is definitely the chip that will be highlighted the most since it is clearly targeted at NVIDIA's Hopper and Intel's Gaudi accelerators within the AI segment. This chip is built solely on the CDNA 3 architecture and hosts a mix of 5nm and 6nm IPs, all combining to deliver up to 153 billion transistors.

AMD Instinct MI300X Accelerator.

Starting with the design, the main interposer is laid out with a passive die which houses the interconnect layer using a next-gen Infinity Fabric solution. The interposer carries a total of 28 dies: eight HBM3 packages, 16 dummy dies between the HBM packages, and four active dies, each of which is topped with two compute dies.

Each GCD based on the CDNA 3 GPU architecture features a total of 40 compute units, which equals 2,560 cores. There are eight compute dies (GCDs) in total, so that gives us 320 compute units and 20,480 cores. For yields, AMD will be scaling back a small portion of these cores and we will be getting more details on exact configurations a month from now.
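
To make the core math concrete, here is a minimal sketch of the arithmetic. The 64-cores-per-compute-unit figure is not stated outright in the article; it is inferred from "40 compute units equals 2,560 cores". The 19,456-core figure comes from the spec table at the end of this article, and reading it as a harvested configuration is an assumption.

```python
# Figures quoted in the article for the full (un-harvested) MI300X layout.
CUS_PER_GCD = 40                      # compute units per CDNA 3 compute die
GCDS = 8                              # compute dies on the package
CORES_PER_CU = 2560 // CUS_PER_GCD    # inferred: 2,560 cores / 40 CUs = 64

full_cus = CUS_PER_GCD * GCDS         # 320 compute units
full_cores = full_cus * CORES_PER_CU  # 20,480 cores

# The spec table lists 19,456 cores for the MI300X. Assuming the same
# 64 cores per CU, that corresponds to a slightly cut-down configuration.
table_cores = 19_456
table_cus = table_cores // CORES_PER_CU   # enabled compute units
disabled_cus = full_cus - table_cus       # compute units held back

print(full_cus, full_cores)
print(table_cus, disabled_cus)
```

Under these assumptions, the table's core count works out to 304 enabled compute units, or 16 fewer than the full die layout.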

AMD Instinct MI300X Accelerator with CDNA 3 dies.

Memory is another area where you will see a huge upgrade, with the MI300X boasting 50% more memory capacity than its predecessor, the MI250X (128 GB of HBM2e). To achieve a memory pool of 192 GB, AMD is equipping the MI300X with eight HBM3 stacks. Each stack is 12-Hi and uses 16 Gb ICs, which gives us 2 GB per IC or 24 GB per stack.
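
The capacity figures above follow directly from the stack layout. A small sketch of that arithmetic, using only the numbers quoted in the paragraph (the variable names are ours):

```python
# HBM3 layout quoted for the MI300X.
GBIT_PER_IC = 16        # density of each DRAM IC, in gigabits
ICS_PER_STACK = 12      # a 12-Hi stack holds 12 ICs
STACKS = 8              # HBM3 stacks on the package
MI250X_GB = 128         # predecessor capacity, for comparison

gb_per_ic = GBIT_PER_IC // 8                 # 16 Gb = 2 GB
gb_per_stack = gb_per_ic * ICS_PER_STACK     # 24 GB per stack
total_gb = gb_per_stack * STACKS             # 192 GB per accelerator
uplift_pct = (total_gb - MI250X_GB) * 100 // MI250X_GB  # gain over MI250X

print(gb_per_ic, gb_per_stack, total_gb, uplift_pct)
```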

The memory will offer up to 5.3 TB/s of bandwidth alongside 896 GB/s of Infinity Fabric bandwidth. For comparison, NVIDIA's upcoming H200 AI accelerator offers 141 GB of memory while Intel's Gaudi 3 will offer 144 GB. Large memory pools matter a lot for LLMs, which are mostly memory bound, and AMD can definitely show its AI prowess by leading in the memory department. For comparison:
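
As a rough sanity check on the bandwidth figure, peak memory bandwidth can be estimated as bus width times per-pin data rate. The sketch below uses the 8192-bit bus and 5.2 Gbps pin speed listed in the spec table at the end of this article; the helper name is hypothetical, and the result is a theoretical peak rather than a measured figure.

```python
def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bits per transfer times
    transfers per second, divided by 8 to convert bits to bytes."""
    return bus_width_bits * pin_rate_gbps / 8

# Values taken from the spec table for the MI300X.
BUS_WIDTH_BITS = 8192
PIN_RATE_GBPS = 5.2
STACKS = 8

total_gbs = peak_bandwidth_gbs(BUS_WIDTH_BITS, PIN_RATE_GBPS)
per_stack_gbs = peak_bandwidth_gbs(BUS_WIDTH_BITS // STACKS, PIN_RATE_GBPS)

print(f"{total_gbs:.1f} GB/s total, {per_stack_gbs:.1f} GB/s per stack")
print(f"{total_gbs / 1000:.1f} TB/s")
```

With these inputs the estimate lands at roughly 5.3 TB/s, which lines up with the bandwidth row in the table.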

Instinct MI300X - 192 GB HBM3
Gaudi 3 - 144 GB HBM3
H200 - 141 GB HBM3e
MI300A - 128 GB HBM3
MI250X - 128 GB HBM2e
H100 - 96 GB HBM3
Gaudi 2 - 96 GB HBM2e

In terms of power consumption, the AMD Instinct MI300X is rated at 750W, which is a 50% increase over the 500W of the Instinct MI250X and 50W more than the NVIDIA H200.

AMD Instinct MI300A - Densely Packaged Exascale APUs Now A Reality

We have waited for years for AMD to finally deliver on the promise of an Exascale-class APU, and the day is nearing as we move closer to the launch of the Instinct MI300A. The packaging on the MI300A is very similar to the MI300X, except that it uses TCO-optimized memory capacities and adds Zen 4 cores.

AMD Instinct MI300A Accelerator.

One of the active dies has its two CDNA 3 GCDs removed and replaced with three Zen 4 CCDs, which bring their own separate pool of cache and core IPs. You get 8 cores and 16 threads per CCD, so that's a total of 24 cores and 48 threads on the active die. There's also 24 MB of L2 cache (1 MB per core) and a separate pool of L3 cache (32 MB per CCD, 96 MB in total). Note that the CDNA 3 GCDs have their own separate L2 cache as well.
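
The CPU-side totals can be tallied the same way. This is a small sketch built from the per-CCD figures in the paragraph above; treating the 32 MB per-CCD pool as L3 cache and assuming two threads per core (SMT) are our reading of those figures.

```python
# Per-CCD figures quoted for the Zen 4 portion of the MI300A.
CCDS = 3                 # Zen 4 chiplets on the active die
CORES_PER_CCD = 8
THREADS_PER_CORE = 2     # assumed SMT, matching "8 cores and 16 threads"
L2_MB_PER_CORE = 1
L3_MB_PER_CCD = 32       # the "separate pool of cache", assumed to be L3

cores = CCDS * CORES_PER_CCD          # 24 cores
threads = cores * THREADS_PER_CORE    # 48 threads
l2_mb = cores * L2_MB_PER_CORE        # 24 MB of L2 in total
l3_mb = CCDS * L3_MB_PER_CCD          # 96 MB across the three CCDs

print(cores, threads, l2_mb, l3_mb)
```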

AMD Instinct MI300A Accelerator with CDNA 3 & Zen 4 dies.

Rounding up some of the highlighted features of the AMD Instinct MI300 Accelerators, we have:

First Integrated CPU+GPU Package
Aiming Exascale Supercomputer Market
AMD MI300A (Integrated CPU + GPU)
AMD MI300X (GPU Only)
153 Billion Transistors
Up To 24 Zen 4 Cores
CDNA 3 GPU Architecture
Up To 192 GB HBM3 Memory
Up To 8 Chiplets + 8 Memory Stacks (5nm + 6nm process)

Bringing all of these together, AMD will work with its ecosystem enablers and partners to offer MI300 AI accelerators in 8-way configurations built on OAM modules (the form factor listed for the MI300X in the table below) that connect to the mainboard through mezzanine connectors. It will be interesting to see what sort of configurations these will be offered in, and while OAM boards are a given, we can also expect a few variants in PCIe form factors.

One configuration showcased by Gigabyte as part of its G383-R80 rack server features a motherboard with four SH5 sockets that are designed to support the Instinct MI300A accelerators. The board features eight PCIe Gen5 x16 slots that can support four dual-slot and four FHFL cards or a total of 12 FHFL cards (4 x16 / 4 x8 speeds).

For now, AMD should know that its competitors are also going full steam ahead on the AI craze, with NVIDIA already teasing some huge figures for its 2024 Blackwell GPUs and Intel prepping its Gaudi 3 and Falcon Shores GPUs for launch in the coming years too. One thing is for sure at the moment: AI customers will gobble up almost anything they can get, and everyone is going to take advantage of that. But AMD has a very formidable solution that aims to be not just an alternative to NVIDIA but a leader in the AI segment, and we hope that MI300 can help the company achieve that success.

AMD Radeon Instinct Accelerators

Accelerator Name	AMD Instinct MI400	AMD Instinct MI300X	AMD Instinct MI300A	AMD Instinct MI250X	AMD Instinct MI250	AMD Instinct MI210	AMD Instinct MI100	AMD Radeon Instinct MI60	AMD Radeon Instinct MI50	AMD Radeon Instinct MI25	AMD Radeon Instinct MI8	AMD Radeon Instinct MI6
CPU Architecture	Zen 5 (Exascale APU)	N/A	Zen 4 (Exascale APU)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
GPU Architecture	CDNA 4	Aqua Vanjaram (CDNA 3)	Aqua Vanjaram (CDNA 3)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Arcturus (CDNA 1)	Vega 20	Vega 20	Vega 10	Fiji XT	Polaris 10
GPU Process Node	4nm	5nm+6nm	5nm+6nm	6nm	6nm	6nm	7nm FinFET	7nm FinFET	7nm FinFET	14nm FinFET	28nm	14nm FinFET
GPU Chiplets	TBD	8 (MCM)	8 (MCM)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)
GPU Cores	TBD	19,456	14,592	14,080	13,312	6656	7680	4096	3840	4096	4096	2304
GPU Clock Speed	TBD	2100 MHz	2100 MHz	1700 MHz	1700 MHz	1700 MHz	1500 MHz	1800 MHz	1725 MHz	1500 MHz	1000 MHz	1237 MHz
INT8 Compute	TBD	2614 TOPS	1961 TOPS	383 TOPS	362 TOPS	181 TOPS	92.3 TOPS	N/A	N/A	N/A	N/A	N/A
FP16 Compute	TBD	1.3 PFLOPs	980.6 TFLOPs	383 TFLOPs	362 TFLOPs	181 TFLOPs	185 TFLOPs	29.5 TFLOPs	26.5 TFLOPs	24.6 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP32 Compute	TBD	163.4 TFLOPs	122.6 TFLOPs	95.7 TFLOPs	90.5 TFLOPs	45.3 TFLOPs	23.1 TFLOPs	14.7 TFLOPs	13.3 TFLOPs	12.3 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP64 Compute	TBD	81.7 TFLOPs	61.3 TFLOPs	47.9 TFLOPs	45.3 TFLOPs	22.6 TFLOPs	11.5 TFLOPs	7.4 TFLOPs	6.6 TFLOPs	768 GFLOPs	512 GFLOPs	384 GFLOPs
VRAM	TBD	192 GB HBM3	128 GB HBM3	128 GB HBM2e	128 GB HBM2e	64 GB HBM2e	32 GB HBM2	32 GB HBM2	16 GB HBM2	16 GB HBM2	4 GB HBM1	16 GB GDDR5
Infinity Cache	TBD	256 MB	256 MB	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Memory Clock	TBD	5.2 Gbps	5.2 Gbps	3.2 Gbps	3.2 Gbps	3.2 Gbps	1200 MHz	1000 MHz	1000 MHz	945 MHz	500 MHz	1750 MHz
Memory Bus	TBD	8192-bit	8192-bit	8192-bit	8192-bit	4096-bit	4096-bit	4096-bit	4096-bit	2048-bit	4096-bit	256-bit
Memory Bandwidth	TBD	5.3 TB/s	5.3 TB/s	3.2 TB/s	3.2 TB/s	1.6 TB/s	1.23 TB/s	1 TB/s	1 TB/s	484 GB/s	512 GB/s	224 GB/s
Form Factor	TBD	OAM	APU SH5 Socket	OAM	OAM	Dual Slot Card	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Half Length	Single Slot, Full Length
Cooling	TBD	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling
TDP (Max)	TBD	750W	760W	560W	500W	300W	300W	300W	300W	300W	175W	150W
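
The FP64 row for the CDNA 2 and CDNA 3 parts can be roughly reproduced from the core counts and clocks in the same table. The sketch below assumes each core retires one fused multiply-add (2 FLOPs) per clock for FP64 vector math; that assumption is ours, not something the article states, and it does not hold for every older entry in the table. The function name is hypothetical.

```python
def peak_fp64_tflops(cores: int, clock_mhz: int, flops_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs: cores x FLOPs per clock x clock rate.
    flops_per_clock=2 assumes one fused multiply-add per core per cycle."""
    return cores * flops_per_clock * clock_mhz / 1e6

# (cores, clock in MHz) pairs copied from the table above.
table_entries = {
    "MI300X": (19_456, 2100),
    "MI300A": (14_592, 2100),
    "MI250X": (14_080, 1700),
}

estimates = {
    name: round(peak_fp64_tflops(cores, mhz), 1)
    for name, (cores, mhz) in table_entries.items()
}
print(estimates)
```

Under that assumption the estimates come out close to the table's FP64 figures for these three parts, and the listed FP32 values are twice as high.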