Nvidia Unveils Pascal Tesla P100 With Over 20 TFLOPS Of FP16 Performance – Powered By GP100 GPU With 15 Billion Transistors & 16GB Of HBM2
Feb 12, 2026 4:53 PM

Nvidia has just unveiled its fastest GPU yet here at GTC 2016, a brand new graphics chip based on the company's next-generation Pascal architecture. The GP100 is Nvidia's most advanced GPU to date, powering the company's next-generation compute monster, the Tesla P100.

Nvidia claims that GP100 is the largest FinFET GPU ever made, measuring about 610mm² and packing over 15 billion transistors. The Tesla P100 features a slightly cut-back GP100 GPU and delivers 5.3 TFLOPS of double-precision compute, 10.6 TFLOPS of single-precision compute and 21.2 TFLOPS of half-precision (FP16) compute. Keeping this massive GPU fed are 4MB of L2 cache and a whopping 14MB of register files.

The entire Tesla P100 package comprises many chips, not just the GPU, which collectively add up to over 150 billion transistors, and it features 16GB of stacked HBM2 VRAM delivering a total of 720GB/s of bandwidth. Nvidia's CEO and co-founder Jen-Hsun Huang confirmed that this behemoth of a graphics card is already in volume production, with samples already delivered to customers, who will begin announcing their products in Q4 and shipping them in Q1 2017.

NVIDIA Tesla P100 Quotes

Pascal GP100 Architecture & Specs

Nvidia Press Release

Five Architectural Breakthroughs

The Tesla P100 delivers its unprecedented performance, scalability and programming efficiency based on five breakthroughs:

NVIDIA Pascal architecture for exponential performance leap -- A Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell™-based solution.

NVIDIA NVLink for maximum application scalability -- The NVIDIA NVLink™ high-speed GPU interconnect scales applications across multiple GPUs, delivering a 5x acceleration in bandwidth compared to today's best-in-class solution. Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximize application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication.

16nm FinFET for unprecedented energy efficiency -- With 15.3 billion transistors built on 16 nanometer FinFET fabrication technology, the Pascal GPU is the world's largest FinFET chip ever built. It is engineered to deliver the fastest performance and best energy efficiency for workloads with near-infinite computing needs.

CoWoS with HBM2 for big data workloads -- The Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design, Chip on Wafer on Substrate (CoWoS) with HBM2, provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the Maxwell architecture.

New AI algorithms for peak performance -- New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning.
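
As a rough cross-check of the memory figures, the 720GB/sec bandwidth quoted above can be combined with the 4096-bit memory interface listed in this article to estimate the per-pin data rate of the HBM2 stacks. The sketch below is only arithmetic on those two numbers; the function name is hypothetical, and the result is an implied figure, not an official Nvidia specification.

```python
# Implied HBM2 per-pin data rate, derived only from figures quoted in the article.
BUS_WIDTH_BITS = 4096    # memory interface width quoted for GP100
BANDWIDTH_GB_S = 720     # aggregate memory bandwidth quoted for the Tesla P100


def per_pin_gbit_s(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Return the per-pin data rate (Gbit/s) implied by a bandwidth and bus width."""
    # Bytes per second -> bits per second, spread evenly across every data pin.
    return bandwidth_gb_s * 8 / bus_width_bits


pin_rate = per_pin_gbit_s(BANDWIDTH_GB_S, BUS_WIDTH_BITS)
print(f"Implied per-pin data rate: {pin_rate:.2f} Gbit/s")
```

The very wide bus is what lets HBM2 reach this bandwidth at a modest per-pin rate.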

The GP100 GPU comprises 3840 CUDA cores, 240 texture units and a 4096-bit memory interface. The 3840 CUDA cores are arranged in six Graphics Processing Clusters, or GPCs for short, each of which has 10 Pascal streaming multiprocessors. As mentioned earlier in the article, the Tesla P100 features a cut-down GP100 GPU. This cut-back version has 3584 CUDA cores and 224 texture mapping units.
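
The core and texture unit counts follow directly from the SM layout. The sketch below reproduces them, assuming the 64 FP32 cores per SM quoted in this article and assuming texture units are spread evenly across SMs (an inference from 240 units over 60 SMs, not something Nvidia states here).

```python
# Core-count arithmetic for the full GP100 and the cut-down Tesla P100.
GPCS = 6                   # Graphics Processing Clusters on the full die
SMS_PER_GPC = 10           # streaming multiprocessors per GPC
CORES_PER_SM = 64          # FP32 CUDA cores per Pascal SM
FULL_TEXTURE_UNITS = 240   # texture units on the full GP100
P100_CORES = 3584          # CUDA cores enabled on the Tesla P100

full_sms = GPCS * SMS_PER_GPC                   # SMs on the full die
full_cores = full_sms * CORES_PER_SM            # CUDA cores on the full die
tex_per_sm = FULL_TEXTURE_UNITS // full_sms     # assumed even split per SM

p100_sms = P100_CORES // CORES_PER_SM           # SMs enabled on the Tesla P100
p100_texture_units = p100_sms * tex_per_sm
disabled_sms = full_sms - p100_sms

print(f"Full GP100: {full_sms} SMs, {full_cores} CUDA cores")
print(f"Tesla P100: {p100_sms} SMs, {p100_texture_units} texture units, "
      f"{disabled_sms} SMs disabled")
```

The result matches the 56 SMs and 224 texture units listed for the Tesla P100 in the comparison table.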

Pascal Tesla P100 GPU Board

Each Pascal streaming multiprocessor includes 64 FP32 CUDA cores, half that of Maxwell. Within each Pascal streaming multiprocessor there are two 32 CUDA core partitions, each with two dispatch units, a warp scheduler and a fairly large instruction buffer, matching that of Maxwell.

Pascal GP100

The massive GP100 GPU has significantly more Pascal streaming multiprocessors, or CUDA core blocks, than Maxwell. Each of these has access to a register file that is the same size as that of Maxwell's 128 CUDA core SMM, which means that each Pascal CUDA core has access to twice the register file capacity. In turn, we should expect even more performance out of each Pascal CUDA core compared to Maxwell.
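
The register file claim can be checked with simple division. This is a minimal sketch using the 256 KB per-SM register file and the SM counts given in this article; the variable names are illustrative only.

```python
# Register file capacity per CUDA core, Maxwell SMM vs Pascal SM.
REGISTER_FILE_KB_PER_SM = 256   # same size on Maxwell and Pascal
MAXWELL_CORES_PER_SM = 128
PASCAL_CORES_PER_SM = 64
P100_SMS = 56                   # SMs enabled on the Tesla P100

maxwell_kb_per_core = REGISTER_FILE_KB_PER_SM / MAXWELL_CORES_PER_SM
pascal_kb_per_core = REGISTER_FILE_KB_PER_SM / PASCAL_CORES_PER_SM
ratio = pascal_kb_per_core / maxwell_kb_per_core

# Total register file across every enabled SM on the Tesla P100.
p100_register_file_kb = P100_SMS * REGISTER_FILE_KB_PER_SM

print(f"Maxwell: {maxwell_kb_per_core:.0f} KB per core")
print(f"Pascal:  {pascal_kb_per_core:.0f} KB per core ({ratio:.0f}x Maxwell)")
print(f"Tesla P100 total: {p100_register_file_kb} KB "
      f"({p100_register_file_kb / 1024:.0f} MB)")
```

The total works out to the 14MB of register files quoted earlier in the article.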

NVIDIA GP100 Block Diagram

Nvidia Press Release

Tesla P100 Specifications

Specifications of the Tesla P100 GPU accelerator include:

5.3 teraflops double-precision performance, 10.6 teraflops single-precision performance and 21.2 teraflops half-precision performance with NVIDIA GPU BOOST™ technology

160GB/sec bi-directional interconnect bandwidth with NVIDIA NVLink

16GB of CoWoS HBM2 stacked memory

720GB/sec memory bandwidth with CoWoS HBM2 stacked memory

Enhanced programmability with page migration engine and unified memory

ECC protection for increased reliability

Server-optimized for highest data center throughput and reliability
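
The headline teraflops figures can be reproduced from the core counts and the 1480 MHz boost clock given elsewhere in this article. The sketch below assumes each core completes one fused multiply-add (two floating-point operations) per clock and that FP16 runs at twice the FP32 rate; the function name is hypothetical.

```python
# Peak throughput estimate for the Tesla P100 at its boost clock.
FLOPS_PER_CORE_PER_CLOCK = 2   # assumption: one fused multiply-add per clock
BOOST_CLOCK_HZ = 1480e6        # GPU Boost clock quoted for the P100
FP32_CORES = 3584
FP64_CORES = 1792


def peak_tflops(cores: int, clock_hz: float) -> float:
    """Return peak throughput in TFLOPS for a given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_hz / 1e12


fp32 = peak_tflops(FP32_CORES, BOOST_CLOCK_HZ)
fp64 = peak_tflops(FP64_CORES, BOOST_CLOCK_HZ)
fp16 = 2 * fp32   # assumption: half precision at twice the FP32 rate

print(f"FP64: {fp64:.1f} TFLOPS")
print(f"FP32: {fp32:.1f} TFLOPS")
print(f"FP16: {fp16:.1f} TFLOPS")
```

These are theoretical peaks at the boost clock, which is why the press release ties them to GPU Boost.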

Tesla P100 Boosts To Nearly 1.5GHz

Perhaps one of the most exciting, yet perhaps predictable, revelations about the GP100 Pascal flagship GPU is that it can achieve clocks even higher than Maxwell. Despite Nvidia opting for very conservative clock speeds on its professional GPUs like the Tesla and Quadro products, the P100 has a base clock speed of 1328MHz and a boost clock speed of 1480MHz.

Considering that GPU Boost 2.0 allows these cards to operate at even higher clock speeds than the nominal boost clock, we're looking at actual frequencies of upwards of 1500MHz on the GeForce equivalent of the P100, which is inevitably going to launch as the next GTX Titan. This means boost clocks of upwards of 1600MHz on factory-overclocked models, and perhaps 2GHz+ manual overclocks. This should be extremely exciting news to all GeForce fans.

Tesla Products | Tesla K40 | Tesla M40 | Tesla P100
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal)
SMs | 15 | 24 | 56
TPCs | 15 | 24 | 28
FP32 CUDA Cores / SM | 192 | 128 | 64
FP32 CUDA Cores / GPU | 2880 | 3072 | 3584
FP64 CUDA Cores / SM | 64 | 4 | 32
FP64 CUDA Cores / GPU | 960 | 96 | 1792
Base Clock | 745 MHz | 948 MHz | 1328 MHz
GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz
Compute Performance - FP32 | 5.04 TFLOPS | 6.82 TFLOPS | 10.6 TFLOPS
Compute Performance - FP64 | 1.68 TFLOPS | 0.21 TFLOPS | 5.3 TFLOPS
Texture Units | 240 | 192 | 224
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2
Memory Size | Up to 12 GB | Up to 24 GB | 16 GB
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB
Register File Size / SM | 256 KB | 256 KB | 256 KB
Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB
TDP | 235 Watts | 250 Watts | 300 Watts
Transistors | 7.1 billion | 8 billion | 15.3 billion
GPU Die Size | 551 mm² | 601 mm² | 610 mm²
Manufacturing Process | 28-nm | 28-nm | 16-nm
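
To put the generational comparison in perspective, the table's figures can be turned into two derived metrics: peak FP32 throughput per watt of TDP and transistor density. The sketch below is only arithmetic on the numbers listed above; TDP is a rated limit, so the per-watt values are a rough indicator and not measured efficiency.

```python
# Derived metrics from the comparison table: (FP32 TFLOPS, TDP in watts,
# transistors in billions, die size in mm^2) for each Tesla product.
SPECS = {
    "Tesla K40": (5.04, 235, 7.1, 551),
    "Tesla M40": (6.82, 250, 8.0, 601),
    "Tesla P100": (10.6, 300, 15.3, 610),
}


def derived_metrics(tflops, tdp_w, transistors_bn, die_mm2):
    """Return (peak GFLOPS per watt, million transistors per mm^2)."""
    gflops_per_watt = tflops * 1000 / tdp_w
    density = transistors_bn * 1000 / die_mm2
    return gflops_per_watt, density


results = {name: derived_metrics(*spec) for name, spec in SPECS.items()}

for name, (gflops_per_watt, density) in results.items():
    print(f"{name}: {gflops_per_watt:.1f} GFLOPS/W, "
          f"{density:.1f} M transistors/mm^2")
```

On these numbers the move from 28-nm to 16-nm FinFET shows up most clearly in transistor density, which nearly doubles while the die size barely changes.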
