To say there’s a bit of excitement for DirectX 12 and other low-level APIs is probably an understatement. A big understatement. With DirectX 12 ramping up for a release later this year, Mantle 1.0 already in pseudo-release, and its successor Vulkan under active development, the world of graphics APIs is changing in a way not seen since the earliest days, when APIs such as Direct3D, OpenGL, and numerous vendor proprietary APIs were first released. From a consumer standpoint this change will still take a number of years, but from a development standpoint 2015 is going to be the year that everything changed for PC graphics programming.

So far much has been made about the benefits of these APIs, the potential performance improvements, and ultimately what can be done and what new things can be achieved with them. The true answer to those questions are that this is going to be a multi-generational effort; until games are built from the ground-up for these APIs, developers won’t be able to make full use of their capabilities. Even then, the coolest tricks will take some number of years to develop, as developers become better acquainted with these new APIs, their idiosyncrasies, and the capabilities of the underlying hardware when interfaced with these APIs. In other words, right now we’re just scratching the surface.

The first DirectX 12 games are expected towards the end of the year, and in the meantime Microsoft and their hardware partners have been ramping up the DirectX 12 ecosystem, hammering out the API implementation in Windows 10 while the hardware vendors write and debug their WDDM 2.0 drivers. Meanwhile as this has been going on, we’ve seen a slow release of software released designed to showcase DirectX 12 features in a proof of concept manner. A number of various internal demos exist, and we saw the first semi-public DirectX 12 software release last month with our look at Star Swarm.

This week the benchmarking gurus over at Futuremark are releasing their own first run at a DirectX 12 test with their latest update for the 3DMark benchmark. Futuremark has been working away at DirectX 12 for some time – in fact they were the first partner to show DirectX 12 code in action at Microsoft’s 2014 DX12 unveiling – and now they are releasing their first DirectX 12 project.

In keeping with the general theme of the demos we’ve seen so far, Futuremark’s new DirectX 12 release is another proof of concept test. Dubbed the 3DMark API Overhead Feature Test, this benchmark is a purely synthetic benchmark designed to showcase the draw call benefits of the new API even more strongly than earlier benchmarks. Whereas Star Swarm was a best-case scenario test within the confines of a realistic graphics workload, the API Overhead Feature Test is a proper synthetic benchmark that is designed to test one thing and one thing only: how many draw calls a system can handle. The end result, as we’ll see, showcases just how great the benefits of DirectX 12 are in this situation, allowing for an order of magnitude’s improvement, if not more.

To do this, Futuremark has written a relatively simple test that draws out a very simple scene with an ever-increasing number of objects in order to measure how many draw calls a system can handle before it becomes saturated. As expected for a synthetic test, the underlying rendering task is very simple – render an immense amount of building-like objections at both the top and bottom of the screen – and the bottleneck is in processing the draw calls. Generally speaking, under this test you should either be limited by the number of draw calls you can generate (CPU limited) or limited by the number of draw calls you can consume (GPU’s command processor limited), and not the GPU’s actual rendering capabilities. The end result is that the API Overhead Feature Test can push an even larger number of draw calls than Star Swarm could.

To showcase the difference between various APIs, this test is available with DirectX 12 and Mantle, but also two different DirectX 11 modes. Standard DirectX 11 single-threading is one mode, alongside support for DirectX 11 multi-threading. The latter has a checkered history – it never did work as well in the real world as initially hoped – and in practice only NVIDIA supports it to any decent degree. But regardless, as we’ll see DirectX 12’s throughput will put even DX11MT to shame.

FutureMark’s complete technical description is posted below:

The test is designed to make API overhead the performance bottleneck. The test scene contains a large number of geometries. Each geometry is a unique, procedurally-generated, indexed mesh containing 112 -127 triangles.

The geometries are drawn with a simple shader, without post processing. The draw call count is increased further by drawing a mirror image of the geometry to the sky and using a shadow map for directional light.

The scene is drawn to an internal render target before being scaled to the back buffer. There is no frustum or occlusion culling to ensure that the API draw call overhead is always greater than the application side overhead generated by the rendering engine.

Starting from a small number of draw calls per frame, the test increases the number of draw calls in steps every 20 frames, following the figures in the table below.

To reduce memory usage and loading time, the test is divided into two parts. The second part starts at 98304 draw calls per frame and runs only if the first part is completed at more than 30 frames per second.

Draw calls per frame Draw calls per frame increment per step Accumulated duration in frames
192 – 384 12 320
384 – 768 24 640
768 – 1536 48 960
1536 – 3072 96 1280
3072 – 6144 192 1600
6144 – 12288 384 1920
12288 – 24576 768 2240
24576 – 49152 1536 2560
49152 – 98304 3072 2880
98304 – 196608 6144 3200
196608 – 393216 12288 3520
Other Notes & The Test
Comments Locked


View All Comments

  • silverblue - Saturday, March 28, 2015 - link

    Well, varying results aside, I've heard of scores in the region of eight million. That would theoretically (if other results are anything to go off) put it around the level of a mildly-overclocked i3 (stock about 7.5m). Definitely worth bearing in mind the more-than-six-cores scaling limitation showcased by this test - AMD's own tests show this happening to the 8350, meaning that the Mantle score - which can scale to more cores - should be higher. Incidentally, the DX11 scores seem to be in the low 600,000s with a slight regression in the MT test. I saw these 8350 figures in some comments somewhere but forgot where so I do apologise for not being able to back them up, however the Intel results can be found here:

    I suppose it's all hearsay until a site actually does a CPU comparison involving both Intel and AMD processors. Draw calls are also just a synthetic; I can't see AMD's gaming performance leaping through the stratosphere overnight, and Intel stands to benefit a lot here as well.
  • silverblue - Saturday, March 28, 2015 - link

    Sorry, stock i3 about 7.1m.
  • oneb1t - Saturday, March 28, 2015 - link

    my fx-8320@4.7ghz + R9 290x does 14.4mil :) in mantle
  • Laststop311 - Friday, March 27, 2015 - link

    I think AMD APU's are the biggest winner here. Since draw calls help lift cpu bottlenecks and the apu's have 4 weaker cores the lack of dx11 to be able to really utilize multi core for draw calls means the weak single threaded performance of the apus could really hold things back here. DX12 will be able to shift the bottleneck back to the igpu of the apu's for a lot of games and really help make more games playable at 1080p with higher settings or at least same settings and smoother.

    If only AMD would release an updated version of the 20 cu design for the ps4 using GCN 1.3 cores + 16GB of 2nd generation 3d HBM memory directly on top that the cpu or gpu could use, not only would you have a rly fast 1080p capable gaming chip you could design radically new motherboards that omit ram slots entirely. Could have new mini itx boards that have room for more sata ports and usb headers and fan headers and more room available for vrm's and cool it with good water cooling like the thermaltake 3.0 360mm rad AIO and good TIM like the coollaboratory liquid metal ultra. Or you could even take it the super compact direction and even create a smaller board than mini-itx and turn it into an ultimate htpc. And as well as the reduced size your whole system would benefit from the massive bandwidth (1.2TB/sec) and reduced latency. The memory pool could respond in real time to add more space for the gpu as necessary and since apu's are really only for 1080p that will never be a problem. I know this will probably never happen but if it did i would 100% build my htpc with an apu like that
  • Laststop311 - Saturday, March 28, 2015 - link

    As a side question, Is there some contractual agreement that will not allow AMD to sell these large 20 cu designed APU's on the regular pc market? Does sony have exclusive rights to the chip and the techniques used to make such a large igpu? Or is it die size and cost that scares AMD from making the chip for the PC market as their would be a much higher price compared to current apu's? I'm sure 4 excavator cores cant be much bigger than 8 jaguar so if its doable with 8 jaguar it should be doable with 4 excavator, especially if they put it on the 16/14nm finfet node?
  • silverblue - Saturday, March 28, 2015 - link

    I'm sure Sony would only be bothered if AMD couldn't fulfill their orders. A PC built to offer exactly the same as the PS4 would generally cost more anyway.

    They can't very well go from an eight FPU design to one with two/four depending on how you look at it, even if the clocks are much higher. I think you'd need to wait for the next generation of consoles.
  • FriendlyUser - Saturday, March 28, 2015 - link

    I really hope the developers put this to good use. I am also particularly excited about multicore scaling, since single threaded performance has stagnated (yes, even in the Intel camp).
  • jabber - Saturday, March 28, 2015 - link

    I think this shows that AMD has got a big boost from being the main partner with Microsoft on the Xbox. It's meant that AMD got a major seat at the top DX12 table from day one for a change. I hope to see some really interesting results now that it appears finally AMD hardware has been given some optimisation love other than Intel.
  • Tigran - Saturday, March 28, 2015 - link

    >>> Finally with 2 cores many of our configurations are CPU limited. The baseline changes a bit – DX11MT ceases to be effective since 1 core must be reserved for the display driver – and the fastest cards have lost quite a bit of performance here. None the less, the AMD cards can still hit 10M+ draw calls per second with just 2 cores, and the GTX 980/680 are close behind at 9.4M draw calls per second. Which is again a minimum 6.7x increase in draw call throughput versus DirectX 11, showing that even on relatively low performance CPUs the draw call gains from DirectX 12 are substantial. <<<

    Can you please explain how can it be? I thought the main advantage of new APIs is the workload of all CPU cores (instead of one in DX11). If so, should't the performance double in 2-core mode?Why there is 6.7x increase in draw call instead of 2x ?
  • Tigran - Saturday, March 28, 2015 - link

    Just to make it clear: I know there such advantage of Mantle and DX12 as direct addressing GPU, w/o CPU. But this test is about draw calls, requested from CPU to GPU. How can we boost the number of draw calls apart from using additional CPU core?

Log in

Don't have an account? Sign up now