yitit
NVIDIA 16nm Pascal Based Tesla P100 With GP100 GPU Unveiled – World's First GPU With HBM2 and 10.6 TFLOPs of Compute On A Single Chip
Feb 12, 2026 6:32 AM

NVIDIA has officially unveiled the Pascal-based Tesla P100 GPU, its fastest GPU to date. The Pascal GP100 chip is NVIDIA's first GPU built on TSMC's 16nm FinFET process node, which TSMC says delivers 65 percent higher speed, around 2 times the transistor density, or 70 percent less power than its 28HPM technology. The new FinFET process allows NVIDIA to gain up to a 2x performance-per-watt improvement on Pascal compared to the Maxwell GPUs.


NVIDIA Pascal Tesla P100 Unveiled - 15.3 Billion Transistors on a 610 mm² 16nm Die - 16 GB HBM2 Memory With Insane Compute

The NVIDIA Pascal Tesla P100 GPU brings back strong double precision (FP64) compute on NVIDIA chips, something the Maxwell generation of cards largely lacked. Maxwell put NVIDIA in its most competitive position yet, with a lineup of graphics cards that led not only in performance per watt but also in price to performance. NVIDIA has built a large ecosystem around its Maxwell cards, most visibly under the GeForce brand.

With Pascal, NVIDIA is aiming not only at the GeForce brand but also at the high-performance Tesla market, the lineup where its biggest chips are deployed. NVIDIA has seen huge demand for next-generation chips in this market and has prepared a range of next-gen chips specifically for HPC.


The GP100 GPU used in Tesla P100 incorporates multiple revolutionary new features and unprecedented performance. Key features of Tesla P100 include:

- Extreme performance: powering HPC, deep learning, and many more GPU computing areas
- NVLink: NVIDIA's new high-speed, high-bandwidth interconnect for maximum application scalability
- HBM2: fastest, high-capacity, extremely efficient stacked GPU memory architecture
- Unified Memory and Compute Preemption: significantly improved programming model
- 16nm FinFET: enables more features, higher performance, and improved power efficiency

NVIDIA's current 28nm products have been in the Tesla market since 2012, when NVIDIA started shipping the GK110 GPUs used to build the Titan supercomputer. The Tesla K20X powered Titan, then the fastest supercomputer in the world. When Maxwell arrived, NVIDIA still relied on Kepler parts for high double precision compute, a capability Tesla Maxwell cards lacked. While NVIDIA did later launch Maxwell-based Tesla cards aimed at the cloud and virtualization sectors, its top FP64-crunching Tesla cards return with the new Pascal-based Tesla lineup.

Pascal GPU roadmap slides from GTC 2015 showcasing the architecture updates on the latest GPU: 10x Maxwell, memory bandwidth, memory capacity, mixed precision, and performance per watt.

The new Pascal GP100 GPU, aimed first at the Tesla market, features three key technologies: NVLink, FP16 and HBM2. These go hand in hand with the architectural improvements in NVIDIA's latest CUDA architecture.

NVIDIA Pascal GP100 With 10.6 TFLOPs Single and 5.3 TFLOPs Double Precision Compute On A Single Graphics Card

NVIDIA Pascal GP100 GPU Architecture - The Building Blocks of NVIDIA's HPC Accelerator Chip - 3840 CUDA Cores, Preemption and Return of Double Precision With a Bang

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. GP100 achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total). The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units.
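The counts in this paragraph multiply out as follows. This is a minimal arithmetic sketch that uses only figures quoted in this article; the 56 enabled SMs of the shipping Tesla P100 come from the spec table further down.

```python
# Full GP100 configuration, using the figures quoted in the paragraph above.
GPCS = 6
FULL_SMS = 60              # "up to 60 SMs" on the full chip
CUDA_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 512

total_cuda_cores = FULL_SMS * CUDA_CORES_PER_SM                    # 3840
total_texture_units = FULL_SMS * TEXTURE_UNITS_PER_SM              # 240
memory_bus_width_bits = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER   # 4096
sms_per_gpc = FULL_SMS // GPCS                                     # 10

# Tesla P100 ships with 56 of the 60 SMs enabled (per the spec table).
P100_SMS = 56
p100_cuda_cores = P100_SMS * CUDA_CORES_PER_SM                     # 3584
p100_texture_units = P100_SMS * TEXTURE_UNITS_PER_SM               # 224

print(total_cuda_cores, total_texture_units, memory_bus_width_bits, sms_per_gpc)
print(p100_cuda_cores, p100_texture_units)
```

The gap between 3840 and 3584 cores is simply the four SMs left disabled on the Tesla P100.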

NVIDIA GP100 Block Diagram

Pascal GP100 Has Insane Clock Speeds - Near 1.5 GHz Boost Clocks

On the Tesla P100, the Pascal GP100 runs at insane clock speeds of 1328 MHz base and 1480 MHz boost, a big leap over the 1114 MHz boost clock of the Maxwell-based Tesla M40. Smaller Pascal chips should scale even higher, so we can expect to see Pascal GPUs clocked at around 1500 MHz or more on the consumer market.
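Clock speed feeds directly into peak throughput. The sketch below uses a common back-of-the-envelope convention, assumed here rather than taken from the article: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, and FP16 runs at twice the FP32 rate, as the article's FP16 figures imply. With the Tesla P100's 3584 FP32 and 1792 FP64 cores at the 1480 MHz boost clock, it reproduces the quoted 10.6, 5.3 and 21.2 TFLOPs.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs: cores x FLOPs per core per clock x clock.

    The default of 2 counts one fused multiply-add (FMA) as two FLOPs.
    """
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12

# Tesla P100 (SXM2) figures from this article.
FP32_CORES = 3584
FP64_CORES = 1792
BASE_MHZ = 1328
BOOST_MHZ = 1480

fp32_boost = peak_tflops(FP32_CORES, BOOST_MHZ)
fp64_boost = peak_tflops(FP64_CORES, BOOST_MHZ)
# Assumption: FP16 runs at twice the FP32 rate (4 FLOPs per core per clock).
fp16_boost = peak_tflops(FP32_CORES, BOOST_MHZ, flops_per_core_per_clock=4)
fp32_base = peak_tflops(FP32_CORES, BASE_MHZ)

print(f"FP32 @ boost: {fp32_boost:.1f} TFLOPs")
print(f"FP64 @ boost: {fp64_boost:.1f} TFLOPs")
print(f"FP16 @ boost: {fp16_boost:.1f} TFLOPs")
print(f"FP32 @ base:  {fp32_base:.1f} TFLOPs")
```

These are theoretical ceilings that assume every core issues an FMA on every cycle at the stated clock; real workloads land below them.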

GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.
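Keeping the register file the same size while halving the cores per SM doubles the registers available to each core relative to Maxwell. The sketch below assumes 64K 32-bit registers (256 KB) per SM, NVIDIA's published figure for GK110, GM200 and GP100; that number is not stated in this article.

```python
# Assumption: 64K 32-bit registers (256 KB) per SM, NVIDIA's published figure
# for GK110, GM200 and GP100 alike. Not stated in the article itself.
REGISTERS_PER_SM = 64 * 1024

# FP32 CUDA cores per SM, from the paragraph above.
cores_per_sm = {"Kepler GK110": 192, "Maxwell GM200": 128, "Pascal GP100": 64}

# Integer division: Kepler's 65536 / 192 = 341.33 is truncated to 341.
regs_per_core = {arch: REGISTERS_PER_SM // cores for arch, cores in cores_per_sm.items()}

for arch, regs in regs_per_core.items():
    print(f"{arch}: {cores_per_sm[arch]} cores/SM -> {regs} registers per core")
```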

NVIDIA Tesla Specs (Kepler, Maxwell, Pascal and Volta):

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 12nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² | 815 mm² |
| SMs | 15 | 24 | 56 | 56 | 80 | 80 | 80 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 40 |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 320 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 31.4 TFLOPs | 32.8 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 9.3 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s or 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 32 GB HBM2 @ 1134 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
| TDP | 235W | 250W | 250W | 300W | 250W | 300W | 250W |
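The memory rows show that HBM2 wins on bandwidth through width rather than per-pin speed. Dividing peak bandwidth by bus width gives the effective data rate per pin; the sketch below is plain arithmetic on the table's figures.

```python
def per_pin_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Effective data rate per memory pin (Gbit/s) = bandwidth (GB/s) x 8 bits / bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits

# (peak bandwidth in GB/s, bus width in bits), from the table above.
memory_configs = {
    "Tesla K40 (384-bit GDDR5)": (288, 384),
    "Tesla P100 SXM2 (4096-bit HBM2)": (732, 4096),
    "Tesla V100 SXM2 (4096-bit HBM2)": (900, 4096),
    "Tesla V100S (4096-bit HBM2)": (1134, 4096),
}

for card, (bandwidth, width) in memory_configs.items():
    print(f"{card}: {per_pin_gbps(bandwidth, width):.2f} Gbps per pin")
```

GDDR5 on the K40 works out to 6 Gbps per pin over a narrow 384-bit bus, while HBM2 on the P100 runs at roughly 1.4 Gbps per pin but across more than ten times as many pins.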

GP100’s SM has the same number of registers as Maxwell GM200 and Kepler GK110 SMs, but the entire GP100 GPU has far more SMs, and thus many more registers overall. This means threads across the GPU have access to more registers, and GP100 supports more threads, warps, and thread blocks in flight compared to prior GPU generations.
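Scaled by SM count, an equal per-SM register file gives a much larger total. The sketch assumes a 256 KB register file per SM (NVIDIA's published figure, not stated in this article) and uses the enabled SM counts of the shipping Tesla cards from the spec table.

```python
# Assumption: 256 KB register file (64K x 32-bit registers) per SM on all three
# chips, NVIDIA's published figure; the article only says the per-SM size is equal.
REGISTER_FILE_KB_PER_SM = 256

# Enabled SM counts of the shipping Tesla cards, from the spec table.
enabled_sms = {"Tesla K40 (GK110)": 15, "Tesla M40 (GM200)": 24, "Tesla P100 (GP100)": 56}

total_register_kb = {card: sms * REGISTER_FILE_KB_PER_SM for card, sms in enabled_sms.items()}

for card, kb in total_register_kb.items():
    print(f"{card}: {kb} KB of registers ({kb / 1024:.2f} MB)")

growth = total_register_kb["Tesla P100 (GP100)"] / total_register_kb["Tesla M40 (GM200)"]
print(f"P100 vs M40: {growth:.2f}x")
```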

Pascal GP100

Overall shared memory across the GP100 GPU is also increased due to the higher SM count, and aggregate shared memory bandwidth is effectively more than doubled. A higher ratio of shared memory, registers, and warps per SM in GP100 allows the SM to execute code more efficiently: there are more warps for the instruction scheduler to choose from, more loads to initiate, and more shared memory bandwidth per thread.

On the compute side, Pascal takes the next step with double precision performance rated at 5.3 TFLOPs, more than double what the last generation of FP64-enabled GPUs offered. Single precision performance breaks past the 10 TFLOPs barrier, reaching 10.6 TFLOPs. The chip comes with 4 MB of L2 cache. The GPU is in volume production and will be arriving in HPC markets very soon. For mixed precision workloads, the Tesla P100 achieves up to 21.2 TFLOPs of FP16 compute, twice its FP32 throughput, in exchange for the lower precision of half-precision numbers.
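The trade-off behind FP16 is precision, not speed: IEEE 754 half precision keeps an 11-bit significand against 24 bits for FP32. Python's standard `struct` module can pack both formats (`'e'` for half, `'f'` for single), which makes the rounding easy to see. This illustrates the number formats only, not how the GPU executes them.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Store value in an IEEE 754 format and read it back.

    fmt is a struct format character: 'e' = binary16 (FP16), 'f' = binary32 (FP32).
    """
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]

for x in (4097.0, 0.1):
    fp16 = round_trip(x, "e")
    fp32 = round_trip(x, "f")
    print(f"{x!r}: FP16 -> {fp16!r}, FP32 -> {fp32!r}")
```

Between 4096 and 8192, FP16 can only represent every fourth integer, so 4097 comes back as 4096, while FP32 stores it exactly. Workloads such as deep learning training often tolerate this loss, which is why the doubled FP16 rate matters there.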

Because of the importance of high-precision computation for technical computing and HPC codes, a key design goal for Tesla P100 is high double-precision performance. Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput. Compared to the 3:1 ratio in Kepler GK110 GPUs, this allows Tesla P100 to process FP64 workloads more efficiently.
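The single-to-double ratio follows directly from the per-SM core counts in the spec table. A minimal sketch:

```python
# (FP32 cores per SM, FP64 cores per SM), from the spec table in this article.
sm_core_mix = {
    "Kepler GK110": (192, 64),
    "Maxwell GM200": (128, 4),
    "Pascal GP100": (64, 32),
}

sp_to_dp = {arch: fp32 / fp64 for arch, (fp32, fp64) in sm_core_mix.items()}

for arch, ratio in sp_to_dp.items():
    # A ratio of N:1 means FP64 peak is 1/N of FP32 peak at the same clock.
    print(f"{arch}: {ratio:g}:1 (FP64 = {100 / ratio:.1f}% of FP32)")
```

The Maxwell row (32:1) shows why Maxwell Tesla cards were a poor fit for FP64-heavy HPC codes, and why GP100's 2:1 ratio is the headline change.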


NVIDIA Pascal is Built on TSMC's 16nm FinFET Process Node

The chip is built on the 16nm FinFET process, which brings efficiency improvements and better performance per watt, and with Pascal, double precision compute returns with a bang. Maxwell, NVIDIA's current-generation architecture, made serious gains in performance per watt, and Pascal is expected to carry that tradition forward.

TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value. via TSMC

| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
|---|---|---|---|---|
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| Flagship Chip | GF110 | GK210 | GM200 | GP100 |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
| Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | 610 mm² |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
| FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | 10.6 TFLOPs (Tesla P100) |
| FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | 5.3 TFLOPs (Tesla P100) |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 732 GB/s |
| Maximum TDP | 244W | 250W | 250W | 300W |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 (Tesla P100) |
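TSMC's "around 2 times the density" claim can be sanity-checked against the transistor counts and die sizes in the table above. This is a rough sketch: whole-die averages mix logic, SRAM, I/O and analog areas, so they are only a proxy for process density.

```python
# Transistor counts (billions) and die sizes (mm²) taken from the table above.
chips = {
    "GF110 (40nm)": (3.00, 520),
    "GK210 (28nm)": (7.08, 561),
    "GM200 (28nm)": (8.00, 601),
    "GP100 (16nm FinFET)": (15.3, 610),
}

def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors_billion * 1000 / die_mm2

densities = {name: density_mtr_per_mm2(t, a) for name, (t, a) in chips.items()}
for name, d in densities.items():
    print(f"{name}: {d:.1f} MTr/mm²")

# GP100 vs GM200: two flagship dies of nearly the same size.
ratio = densities["GP100 (16nm FinFET)"] / densities["GM200 (28nm)"]
print(f"GP100 / GM200 density ratio: {ratio:.2f}x")
```

The ratio comes out at roughly 1.9x, in line with the density gain TSMC quotes for 16FF+ over 28HPM.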

Copyright 2023-2026 - www.yitit.com All Rights Reserved