A couple of weeks ago, NVIDIA published a vademecum of DirectX 12 Do’s and Don’ts that went largely unnoticed. However, it contains some interesting tips that NVIDIA gave developers on how to best use Microsoft's new lower-level API with their existing architecture.

A couple of them, for instance, seem to confirm two stories we reported last month about Maxwell problems with Asynchronous Compute. In case you don't recall, the reference is to AMD's Robert Hallock saying that Maxwell can't perform Async Compute without heavy reliance on slow context switching; a few days later, Tech Report's David Kanter mentioned that, according to Oculus employees, preemption context switching was potentially catastrophic for Maxwell GPUs.
Now, under the Pipeline State Objects (PSOs) section, they were very clear:
Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
  This is still a heavyweight switch to make
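
To make the advice more concrete, here is a minimal sketch of our own (not NVIDIA's code, and not the Direct3D 12 API) that models a single command queue as a list of work items, counts how often it flips between graphics and compute, and shows how batching by type reduces that count. The names `WorkKind`, `countToggles`, `batchByKind` and `example` are hypothetical, and the sketch assumes the work items have no cross-type dependencies, so reordering them is legal; in a real renderer that has to be verified first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work-item kinds for illustration; these are NOT Direct3D 12 types.
enum class WorkKind { Graphics, Compute };

// Count how often consecutive items on one queue change kind.
// Each change stands in for one graphics/compute switch on the GPU.
std::size_t countToggles(const std::vector<WorkKind>& queue) {
    std::size_t toggles = 0;
    for (std::size_t i = 1; i < queue.size(); ++i) {
        if (queue[i] != queue[i - 1]) ++toggles;
    }
    return toggles;
}

// Group all graphics items before all compute items, preserving the
// relative order inside each group.
// Assumption: no item depends on an item of the other kind.
std::vector<WorkKind> batchByKind(std::vector<WorkKind> queue) {
    std::stable_partition(queue.begin(), queue.end(),
                          [](WorkKind k) { return k == WorkKind::Graphics; });
    return queue;
}

// Usage example: an interleaved order toggles three times,
// while the batched order toggles only once.
inline void example() {
    const std::vector<WorkKind> interleaved = {
        WorkKind::Graphics, WorkKind::Compute,
        WorkKind::Graphics, WorkKind::Compute};
    assert(countToggles(interleaved) == 3);
    assert(countToggles(batchByKind(interleaved)) == 1);
}
```

The model says nothing about how expensive each switch actually is on a given GPU; it only illustrates why submission order matters when the switch is described as heavyweight.
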
That's not all they had to say about compute and graphics tasks. Under the Work Submission – Command Lists & Bundles section, NVIDIA warned developers as follows:
Check carefully if the use of a separate compute command queues really is advantageous
  Even for compute tasks that can in theory run in parallel with graphics tasks, the actual scheduling details of the parallel work on the GPU may not generate the results you hope for
  Be conscious of which asynchronous compute and graphics workloads can be scheduled together
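
In practice, "check carefully" means measuring both paths. The following is a rough sketch of our own, under stated assumptions, of how such a comparison could be decided: it takes frame times (in milliseconds) captured with and without a separate compute queue and accepts the async path only if its median is better by some margin. The function names and the `minGainMs` threshold are hypothetical, and how the frame times are captured is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of frame times in milliseconds.
// The vector is taken by value so the caller's data is not reordered.
double medianMs(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Returns true only if the separate-compute-queue path beats the
// single-queue path by at least minGainMs (a hypothetical threshold
// that each project would have to choose for itself).
bool asyncQueueIsWorthIt(const std::vector<double>& singleQueueMs,
                         const std::vector<double>& asyncQueueMs,
                         double minGainMs) {
    return medianMs(singleQueueMs) - medianMs(asyncQueueMs) >= minGainMs;
}
```

Using the median rather than the mean is a design choice made here to keep a single slow frame from deciding the outcome; other statistics may suit other workloads better.
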
Finally, NVIDIA also gave some advice on how to best use Maxwell and DirectX 12 hardware features. They recommend using Conservative Rasterization, which right now is only available on Maxwell cards, while they are a bit more cautious about Raster Order Views, the other DX12_1 level feature.
Use hardware conservative raster for full-speed conservative rasterization
  No need to use a GS to implement a ‘slow’ software base conservative rasterization - See https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization
Make use of NvAPI (when available) to access other Maxwell features
  Advanced Rasterization features:
    Bounding box rasterization mode for quad based geometry
    New MSAA features like post depth coverage mask and overriding the coverage mask for routing of data to sub-samples
    Programmable MSAA sample locations
  Fast Geometry Shader features:
    Render to cube maps in one geometry pass without geometry amplifications
    Render to multiple viewports without geometry amplifications
    Use the fast pass-through geometry shader for techniques that need per-triangle data in the pixel shader
  New interlocked operations
  Enhanced blending ops
  New texture filtering ops
Don’t use Raster Order View (ROV) techniques pervasively
  Guaranteeing order doesn’t come for free
  Always compare with alternative approaches like advanced blending ops and atomics
For more about DirectX 12, you can check our Fable Legends benchmark results, Lionhead's statement on the DX12 features used in Fable Legends and our own analysis on Async Compute in the game.









