NVIDIA seems to have provided more information to the press regarding its GeForce RTX 30 series graphics cards and the Ampere GPUs that they utilize. The information is part of a deep-dive NDA'd session which takes a closer look at both GA102 and GA104 Gaming Ampere GPUs which will land in the gaming market in the coming weeks.
NVIDIA GeForce RTX 30 Series Graphics Cards Specs, Performance & GA102/GA104 GPUs Further Detailed in Deep-Dive
The deep-dive session includes information on the NVIDIA GeForce RTX 30 series, some of which we have already seen during the official unveil on 1st September and some of which is new and gives us a more detailed look at the Ampere gaming GPUs. NVIDIA also shared a small amount of information during its Reddit Q&A session, where it talked about the new SM design for its Ampere GPUs. But before that, let's take a look at the GPUs powering NVIDIA's brand new GeForce RTX 30 series lineup.
NVIDIA GA102 GPU - The Flagship Ampere Gaming GPU For GeForce RTX 3090 & RTX 3080
The NVIDIA GA102 GPU is the flagship gaming chip, featuring a die size of 628 mm² and packing a total of 28 billion transistors. According to NVIDIA, the full GA102 GPU comprises 7 GPCs (Graphics Processing Clusters), each with 6 TPCs (Texture Processing Clusters), and each TPC holds two SMs. The GA102 GPU on the RTX 3090 makes use of 41 TPCs or 82 SMs, while the GeForce RTX 3080 makes use of 34 TPCs or 68 SMs. Each SM on the Ampere GPU features 128 CUDA cores along with a redesigned structure which we will detail in a bit. The GA102 GPU on the RTX 3090 features a total of 10,496 cores, while the one on the RTX 3080 features 8,704 cores.
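The core counts quoted above follow directly from the unit hierarchy. Here is a minimal sketch of that arithmetic, assuming two SMs per TPC (which the 41 TPC / 82 SM figures imply); the helper name is made up for illustration and is not anything NVIDIA provides.

```python
# Illustrative arithmetic only; cuda_cores_from_tpcs is a hypothetical helper.
SMS_PER_TPC = 2          # assumption implied by 41 TPCs -> 82 SMs
CUDA_CORES_PER_SM = 128  # per-SM figure quoted in the article


def cuda_cores_from_tpcs(tpcs: int) -> int:
    """Return the CUDA core count for a given number of enabled TPCs."""
    return tpcs * SMS_PER_TPC * CUDA_CORES_PER_SM


rtx_3090_cores = cuda_cores_from_tpcs(41)  # 41 TPCs enabled on the RTX 3090
rtx_3080_cores = cuda_cores_from_tpcs(34)  # 34 TPCs enabled on the RTX 3080
print(rtx_3090_cores, rtx_3080_cores)
```

Both results line up with the 10,496 and 8,704 core figures NVIDIA quotes for the two cards.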
In terms of GPU density, the GA102 GPU is nearly twice as dense as the Turing TU102 GPU, with 44.56 million transistors per square millimeter versus 24.67 million transistors per square millimeter on Turing, and that's all on the Samsung 8nm process node.
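The density figures can be reproduced with a quick calculation. This is a rough sketch: the GA102 numbers (28 billion transistors, 628.4 mm²) come from the spec table in this article, while the TU102 inputs (18.6 billion transistors, 754 mm²) are the commonly cited Turing figures and are an assumption here, since the article does not list them. The function name is hypothetical.

```python
# Illustrative only; density_mtr_per_mm2 is a made-up helper name.
def density_mtr_per_mm2(transistors_billion: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / die_area_mm2


ga102 = density_mtr_per_mm2(28.0, 628.4)  # figures from the article's table
tu102 = density_mtr_per_mm2(18.6, 754.0)  # assumed TU102 figures, not from the article

print(f"GA102: {ga102:.2f} MTr/mm^2")
print(f"TU102: {tu102:.2f} MTr/mm^2")
print(f"Ratio: {ga102 / tu102:.2f}x")
```

Under these inputs the ratio comes out at roughly 1.8x, which is where the "nearly twice as dense" description comes from.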

Each SM consists of four Tensor cores and one RT core. The GA102 GPU features a shared L2 cache, which is 6 MB on the GeForce RTX 3090 and 5 MB on the RTX 3080. The specific GPU block diagram that's been shared shows a total of ten 32-bit memory controllers for the GeForce RTX 3080, which deliver a 320-bit bus. The GeForce RTX 3090 will feature a total of twelve 32-bit memory controllers for a 384-bit bus interface.
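The bus widths are simply the controller count multiplied by the 32-bit width of each controller. A small sketch of that relationship, with a hypothetical helper name:

```python
# Illustrative only; bus_width_bits is a made-up helper name.
CONTROLLER_WIDTH_BITS = 32  # each memory controller is 32 bits wide, per the article


def bus_width_bits(controllers: int) -> int:
    """Return the total memory bus width for a number of 32-bit controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


for card, controllers in {"RTX 3080": 10, "RTX 3090": 12}.items():
    print(f"{card}: {controllers} x 32-bit = {bus_width_bits(controllers)}-bit bus")
```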
NVIDIA GA104 GPU - The Efficiency and Gaming Optimized GPU For The GeForce RTX 3070
At the heart of the NVIDIA GeForce RTX 3070 graphics card lies the GA104 GPU. The GA104 is one of the many Ampere GPUs that we will be getting in the gaming segment, and it is the second-fastest Ampere chip in the stack. The GPU is based on Samsung's 8nm (N8) process node. It measures 395.2 mm² and features 17.4 billion transistors, which is roughly 93% of the transistor count of the TU102 GPU. At the same time, the GA104 GPU is almost half the size of the TU102 GPU, which makes for an impressive jump in density.



For the GeForce RTX 3070, NVIDIA has enabled a total of 46 SM units on the GA104 GPU, which results in a total of 5888 CUDA cores. In addition to the CUDA cores, NVIDIA's GeForce RTX 3070 also comes packed with next-generation RT (Ray-Tracing) cores, Tensor cores, and brand new SM or streaming multiprocessor units. The GPU features a total of 184 Tensor cores and 46 RT cores. There's a good chance that the full-fat GA104 GPU comes with a 6144 core configuration, which could launch in a future graphics card variant. The GA104 GPU features a 4 MB shared L2 cache and has a total of eight 32-bit memory controllers for a 256-bit wide bus interface.
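All of the RTX 3070 unit counts fall out of the SM count and the per-SM figures quoted earlier (128 CUDA cores, four Tensor cores, one RT core). The sketch below uses a hypothetical `SMConfig` class to show this, and also works backwards from the rumored 6144 core configuration to the SM count it would imply; that 48 SM figure is derived here and is not stated by NVIDIA in this article.

```python
from dataclasses import dataclass

# Per-SM figures quoted in the article.
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4
RT_CORES_PER_SM = 1


@dataclass(frozen=True)
class SMConfig:
    """Hypothetical container for an Ampere gaming GPU's enabled SM count."""
    sms: int

    @property
    def cuda_cores(self) -> int:
        return self.sms * CUDA_CORES_PER_SM

    @property
    def tensor_cores(self) -> int:
        return self.sms * TENSOR_CORES_PER_SM

    @property
    def rt_cores(self) -> int:
        return self.sms * RT_CORES_PER_SM


rtx_3070 = SMConfig(sms=46)
# Derived assumption: SM count implied by a 6144 core full GA104.
full_ga104 = SMConfig(sms=6144 // CUDA_CORES_PER_SM)

print(rtx_3070.cuda_cores, rtx_3070.tensor_cores, rtx_3070.rt_cores)
print(full_ga104.sms, full_ga104.cuda_cores)
```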
NVIDIA GeForce RTX 30 Series 'Ampere' Graphics Card Specifications:
| Graphics Card Name | NVIDIA GeForce RTX 3060 | NVIDIA GeForce RTX 3060 Ti | NVIDIA GeForce RTX 3070 | NVIDIA GeForce RTX 3080 | NVIDIA GeForce RTX 3090 |
|---|---|---|---|---|---|
| GPU Name | Ampere GA106-300 | Ampere GA104-200 | Ampere GA104-300 | Ampere GA102-200 | Ampere GA102-300 |
| Process Node | Samsung 8nm | Samsung 8nm | Samsung 8nm | Samsung 8nm | Samsung 8nm |
| Die Size | TBC | 395.2 mm² | 395.2 mm² | 628.4 mm² | 628.4 mm² |
| Transistors | TBC | 17.4 Billion | 17.4 Billion | 28 Billion | 28 Billion |
| CUDA Cores | 3584 | 4864 | 5888 | 8704 | 10496 |
| TMUs / ROPs | 112 / 64 | 152 / 80 | 184 / 96 | 272 / 96 | 328 / 112 |
| Tensor / RT Cores | 112 / 28 | 152 / 38 | 184 / 46 | 272 / 68 | 328 / 82 |
| Base Clock | 1320 MHz | 1410 MHz | 1500 MHz | 1440 MHz | 1400 MHz |
| Boost Clock | 1780 MHz | 1665 MHz | 1730 MHz | 1710 MHz | 1700 MHz |
| FP32 Compute | 13 TFLOPs | 16 TFLOPs | 20 TFLOPs | 30 TFLOPs | 36 TFLOPs |
| RT TFLOPs | 25 TFLOPs | 32 TFLOPs | 40 TFLOPs | 58 TFLOPs | 69 TFLOPs |
| Tensor-TOPs | 101 TOPs | 129 TOPs | 163 TOPs | 238 TOPs | 285 TOPs |
| Memory Capacity | 12 GB GDDR6 | 8 GB GDDR6 | 8 GB GDDR6 | 10 GB GDDR6X | 24 GB GDDR6X |
| Memory Bus | 192-bit | 256-bit | 256-bit | 320-bit | 384-bit |
| Memory Speed | 16 Gbps | 14 Gbps | 14 Gbps | 19 Gbps | 19.5 Gbps |
| Bandwidth | 384 GB/s | 448 GB/s | 448 GB/s | 760 GB/s | 936 GB/s |
| TGP | 170W | 175W | 220W | 320W | 350W |
| Price (MSRP / FE) | $329 US | $399 US | $499 US | $699 US | $1499 US |
| Launch (Availability) | 25th February 2021 | 2nd December 2020 | 29th October 2020 | 17th September 2020 | 24th September 2020 |
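
The FP32 compute and bandwidth rows in the table can be derived from the other rows. The sketch below assumes the usual convention of two FP32 operations per CUDA core per clock (one fused multiply-add), which the article does not state explicitly, and computes bandwidth in GB/s as bus width times per-pin speed divided by eight. The function names and the `cards` dictionary are made up for illustration.

```python
# Illustrative only; helper names and the cards dict are hypothetical.
def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs per core per clock (one FMA)."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


def bandwidth_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_bits * gbps_per_pin / 8


# (CUDA cores, boost clock in MHz, bus width in bits, memory speed in Gbps),
# all taken from the table above.
cards = {
    "RTX 3060": (3584, 1780, 192, 16.0),
    "RTX 3060 Ti": (4864, 1665, 256, 14.0),
    "RTX 3070": (5888, 1730, 256, 14.0),
    "RTX 3080": (8704, 1710, 320, 19.0),
    "RTX 3090": (10496, 1700, 384, 19.5),
}

for name, (cores, boost, bus, speed) in cards.items():
    print(f"{name}: {fp32_tflops(cores, boost):.1f} TFLOPs, "
          f"{bandwidth_gb_per_s(bus, speed):.0f} GB/s")
```

Rounded to whole numbers, the results match the FP32 Compute and Bandwidth rows for all five cards.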









