Last week was one of the more exciting for AMD and its fans, as the company took the covers off its highly anticipated Vega graphics architecture. Luckily we were there and got to see it in action and in the flesh. Yes, we got to see the actual Vega graphics card that ran all of those impressive 4K game demos. On top of that, the company also gave us its first comprehensive overview of the Vega architecture and its new features and technologies.
Perhaps the most intriguing and exciting of all the new bells and whistles that Vega brings to the table is its unique memory architecture and High Bandwidth Cache. The new memory architecture allows Vega GPUs to do a number of exciting new things that their predecessors can't. One of its features in particular is impressive enough to warrant having its own discussion.
Besides handling memory traffic in a vastly more efficient fashion, it also significantly cuts back on wasteful memory allocations. We go into a lot of detail on how it works and why it's quite revolutionary in our Vega graphics architecture piece, where we break it all down. We're not going to dive into the details here, so if you want to read more about it we'd highly recommend checking out that article.
AMD's New Vega High Bandwidth Cache Controller Will Double Your Usable Graphics Memory Capacity In Games

That's right. An 8GB Vega graphics card, just as an example, will effectively have as much usable memory as a 16GB graphics card. It's all thanks to the company's brand new High Bandwidth Cache Controller at the heart of every Vega graphics chip, and the way it works is quite clever. And who better to explain it all than AMD's top graphics man and beloved nerd, Raja Koduri?
Raja Koduri – Chief Architect Radeon Technologies Group, AMD
With regards to the High Bandwidth Cache from a gaming perspective. We looked at all the modern games, the big games that push memory hard, and one of the things we noticed is the VRAM - graphics memory - utilization. We look at how much of the VRAM that the game allocates. So if the game say needs 4GB of memory when we looked at actually how much of that memory is actually used to render pixels we found that many games, actually most games, don't use more than 50% of what they allocate.
That's because the current/old GPU architecture doesn't give you flexibility to move memory in fine granularity. So with Vega and with the High Bandwidth Cache and the HBC controller, for games it will utilize the amount of frame-buffer you have much more efficiently. So effectively you can think of it as Vega will be doubling your memory capacity for games.
Brad Chacos – Senior Editor, PC World
So basically a game that says it uses 4GB of VRAM right now, is in actuality using 2 and with Vega, you're saying, it will actually allocate 2.
Raja Koduri – Chief Architect Radeon Technologies Group, AMD
Exactly
Wccftech.com transcript, PCWorld interview. Video Timestamp 1:57
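The "doubling" claim in the exchange above comes down to simple arithmetic. Here is a minimal sketch of it, assuming the roughly 50% utilization figure Koduri quotes; the function names are ours and purely illustrative, not anything from AMD's driver or tooling.

```python
def effective_capacity_gb(physical_gb: float, touched_fraction: float) -> float:
    """Largest allocation that fits if only `touched_fraction` of it must be resident."""
    if not 0.0 < touched_fraction <= 1.0:
        raise ValueError("touched_fraction must be in (0, 1]")
    return physical_gb / touched_fraction


def resident_gb(allocated_gb: float, touched_fraction: float) -> float:
    """Memory that actually has to sit in VRAM for a given allocation."""
    return allocated_gb * touched_fraction


# An 8GB card behaves like a 16GB one when only half of each allocation is touched.
print(effective_capacity_gb(8, 0.5))  # 16.0
# A game that allocates 4GB but touches half of it keeps 2GB resident.
print(resident_gb(4, 0.5))            # 2.0
```

The multiplier depends entirely on the touched fraction: a game that genuinely uses 80% of what it allocates would see an effective gain of only 1.25x under the same assumption.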
Vega's High Bandwidth Cache In Action - Lower Memory Utilization & Faster Multi-Tasking
AMD gave us two examples of the High Bandwidth Cache Controller cutting back on wasteful memory allocations by half. To our surprise they were both triple-A gaming titles where developers have actually done a lot of optimization work to minimize the memory utilization footprint. The games in question are The Witcher 3 from CD Projekt Red and Fallout 4 from Bethesda Game Studios.

The Radeon Technologies Group found that in most titles today, including the two above, only half of all the memory allocated is actually accessed and used. Raja explains that this is the result of game developers working around the quirks of old GPU architectures, where swapping data in and out of the frame-buffer is very expensive in terms of latency and performance. This in turn forces game developers to guard themselves by allocating more than they need at any given time, to avoid running into a situation where the game needs to swap in data from outside the graphics memory.
With Vega, the High Bandwidth Cache Controller is clever enough to know beforehand which data is actually useful and load it into the cache, and which data isn't and leave it out. That would not only cut the amount of memory allocated by games in half, it would also make things like alt-tabbing out of games significantly faster, because the frame buffer isn't clogged up with all of these wasteful data allocations.
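To make the difference between whole-allocation residency and fine-grained residency concrete, here is a toy model. It is only a sketch under loud assumptions: the 1MB page size, the least-recently-used eviction policy and the `PageCache` and `simulate` names are all hypothetical, and it loads pages on first touch rather than predicting them ahead of time. AMD has not described the controller's actual policy in this briefing.

```python
from collections import OrderedDict

PAGE_MB = 1  # hypothetical page size, chosen only to keep the numbers readable


class PageCache:
    """Toy LRU cache of fixed-size pages standing in for on-card memory."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.misses = 0

    def touch(self, page_id: int) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        self.pages[page_id] = True


def simulate(allocated_mb: int, physical_mb: int, touched_fraction: float, frames: int):
    """Return (resident MB, page misses) when only touched pages are brought in."""
    total_pages = allocated_mb // PAGE_MB
    hot_pages = int(total_pages * touched_fraction)
    cache = PageCache(capacity_pages=physical_mb // PAGE_MB)
    for _ in range(frames):
        for page in range(hot_pages):        # each frame touches the same hot set
            cache.touch(page)
    return len(cache.pages) * PAGE_MB, cache.misses


# A game that allocates 4GB on a 4GB card but touches half of it each frame.
resident_mb, misses = simulate(allocated_mb=4096, physical_mb=4096,
                               touched_fraction=0.5, frames=3)
print(resident_mb, misses)  # 2048 2048
```

In this model a whole-allocation scheme would keep all 4096MB resident, while the page-granular cache settles at 2048MB after the first frame and takes no further misses. That spare room is what would make returning from an alt-tab cheaper.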
Brad Chacos – Senior Editor, PC World
On day one, will games that already exist consume less memory?
Raja Koduri – Chief Architect Radeon Technologies Group, AMD
For example say a game is built for 4GB and say you have a 4GB card it all plays well but when you swap in, for example you alt-tab out of the game and go into a browser or something or do something quick and you come back, it takes a long time. Because the whole thing was swapped out and swapped in.
So with Vega you will see that stuff become much more efficient. Because it didn't really... like I said it wasn't using all 4GB it was only using a portion of it. So we didn't actually load that up all inside your precious cache. So you will see those kinds of benefits. But, let's say you have a game that wants to push 8GB when you turn high details on and so on, it will run much more efficiently in a 4GB configuration.
Wccftech.com transcript, PCWorld interview. Video Timestamp 9:02
AMD's next generation family of Radeon graphics cards featuring the Vega graphics architecture will officially launch in the first half of this year. The company hasn't given us a specific date yet but promises to reveal more in the coming weeks and months.
AMD Vega Lineup
| Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon Vega Pro | Radeon RX Vega (Gaming) | Radeon RX Vega Pro Duo |
|---|---|---|---|---|---|---|
| GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 | Vega 10 | 2x Vega 10 |
| Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET |
| Stream Processors | 4096 | 2304 | 4096 | 3584 | 4096 (?) | Up to 8192 |
| Performance | 8.6 TFLOPS / 8.6 TFLOPS (FP16) | 5.8 TFLOPS / 5.8 TFLOPS (FP16) | ~13 TFLOPS / ~25 TFLOPS (FP16) | 11 TFLOPS / 22 TFLOPS (FP16) | >13 TFLOPS / >25 TFLOPS (FP16) | TBA |
| Memory | 4GB HBM | 8GB GDDR5 | 16GB HBM2 | TBA | TBA | TBA |
| Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 4096-bit |
| Bandwidth | 512GB/s | 256GB/s | 480GB/s | 400GB/s | TBA | TBA |
| TDP | 275W | 150W | TBA | TBA | TBA | TBA |
| Launch | 2015 | 2016 | June 2017 | June 2017 | July 2017 | TBA |
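For readers wondering how the performance and bandwidth rows relate to the other specs, the figures for the two already-released cards can be reproduced with the usual back-of-the-envelope formulas. This is a rough sketch: the clock speeds (1.05GHz for the Fury X, 1.266GHz boost for the RX 480) and per-pin data rates (1Gbps for HBM, 8Gbps for GDDR5) are not in the table and are supplied here as assumptions from the cards' published specifications.

```python
def peak_tflops(stream_processors: int, clock_ghz: float) -> float:
    # Each stream processor can retire one fused multiply-add (2 FLOPs) per clock.
    return 2 * stream_processors * clock_ghz / 1000.0


def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    # Bus width times per-pin data rate gives gigabits per second; divide by 8 for bytes.
    return bus_width_bits * gbps_per_pin / 8.0


# Clocks and data rates below are assumed values, not taken from the table.
print(round(peak_tflops(4096, 1.05), 1))   # 8.6  (Radeon R9 Fury X)
print(round(peak_tflops(2304, 1.266), 1))  # 5.8  (Radeon RX 480)
print(bandwidth_gb_s(4096, 1.0))           # 512.0 (Fury X, HBM)
print(bandwidth_gb_s(256, 8.0))            # 256.0 (RX 480, GDDR5)
```

The same formulas can be run in reverse on the Vega columns to estimate what clock or memory speed a listed figure would imply, though those entries remain unconfirmed.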









