NVIDIA Volta GV100 12nm FinFET GPU Detailed – Tesla V100 Specifications Include 21 Billion Transistors, 5120 CUDA Cores, 16 GB HBM2 With 900 GB/s Bandwidth-February 2024-www.yitit.com

NVIDIA Volta has just been announced at GTC 2017 and boy it's a beast. The next-generation graphics processing unitis the world's first chip that will make use of the industry leading TSMC 12nm FinFET process, so let's cover every detail of this compute powerhouse.

NVIDIA Volta GV100 Unveiled - Tesla V100 With 5120 CUDA Cores, 16 GB HBM2 and 12nm FinFET Process

Last GTC, NVIDIA announced the Pascal based GP100 GPU. It was back then, the fastest graphics chip designed for supercomputers. This year, NVIDIA is taking the next leap in graphics performance and announced their Volta based GV100 GPU. We are going to take a very deep look at the next-generation GPU designed for AI Deep Learning.

"Artificial intelligence is driving the greatest technology advances in human history," said Jensen Huang, founder and chief executive officer of NVIDIA, who unveiled Volta at his GTC keynote. "It will automate intelligence and spur a wave of social progress unmatched since the industrial revolution.

"Deep learning, a groundbreaking AI approach that creates computer software that learns, has insatiable demand for processing power. Thousands of NVIDIA engineers spent over three years crafting Volta to help meet this need, enabling the industry to realize AI's life-changing potential," he said.

Volta, NVIDIA's seventh-generation GPU architecture, is built with 21 billion transistors and delivers the equivalent performance of 100 CPUs for deep learning.

It provides a 5x improvement over Pascal, the current-generation NVIDIA GPU architecture, in peak teraflops, and 15x over the Maxwell architecture, launched two years ago. This performance surpasses by4x the improvements that Moore's law would have predicted.

via NVIDIA

First of all, we need to talk about the workloads this specific chip is designed to handle. The NVIDIA Volta GV100 GPU isdesigned to power the most computationally intensive HPC, AI, and graphics workloads.

The GV100 GPU includes 21.1 billion transistors with a die size of 815 mm2. It is fabricated on a new TSMC 12 nm FFN high performance manufacturing process customized for NVIDIA.The GPU is much bigger than the 610mm2 Pascal GP100 GPU. NVIDIA Volta GV100 delivers considerably more compute performance, and adds many new features compared to its predecessor, the Pascal GP100 GPU and its architecture family. Further simplifying GPU programming and application porting, GV100 also improves GPU resource utilization. GV100 is an extremely power-efficient processor, delivering exceptional performance per watt.

The chip itself is a behometh, featuring a brand new chip architecture that is just insane in terms of raw specifications. The NVIDIA Volta GV100 GPU is composed of six GPC (Graphics Processing Clusters). It has a total of 84 Volta streaming multiprocessor units, 42 TPCs (each including two SMs). The 84 SMs come with 64 CUDA cores per SM so we are looking at a total of 5376 CUDA cores on the complete die. All of the 5376 CUDA Cores can be used for FP32 and INT32 programming instructions while there are also a total of 2688 FP64 (Double Precision) cores. Aside from these, we are looking at 672 Tensor processors, 336 Texture Units.

The memory architecture is updated with eight 512-bit memory controllers. This rounds up to a total of 4096-bit bus interface that supports up to 16 GB of HBM2 VRAM. The bandwidth is boosted with speeds of 878MHz, which delivers increased transfer rates of 900 GB/s compared to 720 GB/s on Pascal GP100. Each memory controller is attached to 768 KB of L2 cache which totals to 6 MB of L2 cache for the entire chip.

NVIDIA Tesla Graphics Cards Comparison:

Tesla Graphics Card Name	NVIDIA Tesla M2090	NVIDIA Tesla K40	NVIDIA Telsa K80	NVIDIA Tesla P100	NVIDIA Tesla V100
GPU Architecture	Fermi	Kepler	Maxwell	Pascal	Volta
GPU Process	40nm	28nm	28nm	16nm	12nm
GPU Name	GF110	GK110	GK210 x 2	GP100	GV100
Die Size	520mm2	561mm2	561mm2	610mm2	815mm2
Transistor Count	3.00 Billion	7.08 Billion	7.08 Billion	15 Billion	21.1 Billion
CUDA Cores	512 CCs (16 CUs)	2880 CCs (15 CUs)	2496 CCs (13 CUs) x 2	3840 CCs	5120 CCs
Core Clock	Up To 650 MHz	Up To 875 MHz	Up To 875 MHz	Up To 1480 MHz	Up To 1455 MHz
FP32 Compute	1.33 TFLOPs	4.29 TFLOPs	8.74 TFLOPs	10.6 TFLOPs	15.0 TFLOPs
FP64 Compute	0.66 TFLOPs	1.43 TFLOPs	2.91 TFLOPs	5.30 TFLOPs	7.50 TFLOPs
VRAM Size	6 GB	12 GB	12 GB x 2	16 GB	16 GB
VRAM Type	GDDR5	GDDR5	GDDR5	HBM2	HBM2
VRAM Bus	384-bit	384-bit	384-bit x 2	4096-bit	4096-bit
VRAM Speed	3.7 GHz	6 GHz	5 GHz	737 MHz	878 MHz
Memory Bandwidth	177.6 GB/s	288 GB/s	240 GB/s	720 GB/s	900 GB/s
Maximum TDP	250W	300W	235W	300W	300W