Tahiti: The First Direct3D 11.1 GPU

One of the many changes coming in Windows 8 next year will be the next iteration of Direct3D, which will be Direct3D 11.1. More so than any other version of Direct3D so far, D3D11.1 is best summed up as a housekeeping release. There will be some new features, but compared to even past point releases such as 10.1 and 9c it’s a small release that’s going to focus more on improving the API itself, particularly interoperability with SoC GPUs for Windows 8, than on introducing new features. This is largely a consequence of the growing length of development cycles for both hardware and software. By the time Windows 8 ships Direct3D 11 will be 3 years old, but these days that’s shorter than the development period for some AAA games. Direct3D 11/11.1 will continue to be the current Windows 3D API for quite some time to come.

With regard to backward compatibility in D3D11.1, there’s one new feature in particular that requires new hardware to support it: Target Independent Rasterization. As a result AMD’s existing D3D11 GPUs cannot fully support D3D11.1, thereby making Tahiti the first D3D11.1 GPU to be released. In practice this means that the hardware is once again ahead of the API, even more so than what we saw with G80 + D3D10 or Cypress (5870) + D3D11, since D3D11.1 isn’t due to arrive for roughly another year. For the time being Tahiti’s hardware supports it, but AMD won’t enable this functionality until later; the first driver with D3D11.1 support will be a beta driver for Windows 8, which we expect to see alongside the Windows 8 beta next year.
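The compatibility rule described here can be pictured as a simple set check: a GPU qualifies for an API level only if its hardware covers every feature that level requires. The sketch below is illustrative only. The feature names and the `supports_level` helper are hypothetical placeholders rather than Direct3D identifiers, and the feature lists are deliberately incomplete.

```python
# Illustrative model only: feature names and sets are hypothetical placeholders,
# not real Direct3D identifiers, and the lists are far from complete.
REQUIRED_FEATURES = {
    "D3D11": {"tessellation", "compute_shader_5_0"},
    "D3D11.1": {"tessellation", "compute_shader_5_0",
                "target_independent_rasterization"},
}


def supports_level(hardware_features, level):
    """Return True if the hardware covers every feature the level requires."""
    return REQUIRED_FEATURES[level] <= set(hardware_features)


# A hypothetical pre-Tahiti D3D11 part: lacks target independent rasterization.
older_gpu = {"tessellation", "compute_shader_5_0"}
# A hypothetical Tahiti-class part: adds the one new hardware feature.
tahiti_like = older_gpu | {"target_independent_rasterization"}

print(supports_level(older_gpu, "D3D11"))      # True
print(supports_level(older_gpu, "D3D11.1"))    # False
print(supports_level(tahiti_like, "D3D11.1"))  # True
```

A single missing hardware feature is enough to fail the check, which is why older D3D11 parts cannot be brought up to full D3D11.1 support with a driver update alone.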

So what does D3D11.1 bring to the table? The biggest end user feature is going to be the formalization of Stereo 3D support into the D3D API. Currently S3D is achieved by either partially going around D3D to present a quad buffer to games and applications that directly support S3D, or in the case of driver/middleware enhancement manipulating the rendering process itself to get the desired results. Formalizing S3D won’t remove the need for middleware to enable S3D on games that choose not to implement it, but for games that do choose to directly implement it such as Deus Ex, it will now be possible to do this through Direct3D.
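For readers unfamiliar with the term, a quad buffer is simply a front/back buffer pair for each eye, with both pairs swapped together. The toy model below sketches that idea; the `QuadBuffer` class and its method names are hypothetical and are not part of Direct3D or any vendor API.

```python
# Toy model of quad buffering: one front/back pair per eye, swapped together.
# Class and method names are hypothetical; this is not the Direct3D API.
class QuadBuffer:
    def __init__(self):
        # Front buffers are displayed; back buffers are being rendered to.
        self.front = {"left": None, "right": None}
        self.back = {"left": None, "right": None}

    def render(self, eye, image):
        """Draw one eye's view into that eye's back buffer."""
        self.back[eye] = image

    def present(self):
        """Swap both eyes at once so the displayed pair stays in sync."""
        self.front, self.back = self.back, self.front
        return self.front


qb = QuadBuffer()
qb.render("left", "frame0-left")
qb.render("right", "frame0-right")
shown = qb.present()
print(shown)  # {'left': 'frame0-left', 'right': 'frame0-right'}
```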

S3D related sales have never been particularly spectacular, and no doubt the fragmentation of the market is partially to blame, so this may be the push in the right direction that the S3D market needs, if the wider consumer base is ready to accept it. At a minimum this should remove the need for any fragmentation/customization when it comes to games that directly support S3D.

With S3D out of the way, the rest of the D3D11.1 feature set isn’t going to be nearly as visible. Interoperability between graphics, video, and compute is going to be greatly improved, allowing video via Media Foundation to be sent through pixel and compute shaders, among other things. Meanwhile target independent rasterization and some new buffer commands should give developers a few more tricks to work with, while double precision (FP64) support will be coming to pixel shaders on hardware that has FP64 support.

Finally, looking at things at a lower level, D3D11.1 will be released alongside DXGI 1.2 and WDDM 1.2, the full combination of which will continue Microsoft’s long-term goal of making the GPU more CPU-like. One of Microsoft’s goals has been to push GPU manufacturers to improve the granularity of GPU preemption, both for performance and reliability purposes. Things have gotten better since XP: Vista introduced GPU Timeout Detection and Recovery (TDR) to reset hung GPUs, and a finer level of granularity has been introduced to allow multiple games/applications to share a GPU without stomping all over each other. However, preemption and context switches are still expensive on a GPU compared to a CPU (there are a lot of registers to deal with), which impacts performance and reliability.

To that end preemption is being given a bit more attention, as WDDM 1.2 will be introducing some new API commands to help manage it while encouraging hardware developers to support finer grained preemption. Meanwhile to improve reliability TDR is getting a major addition by being able to do a finer grained reset of the GPU. Currently with Windows 7 a TDR triggers a complete GPU reset, but with Windows 8 and WDDM 1.2 the GPU will be compartmentalized into “engines” that can be individually reset. Only the games/applications using a reset engine will be impacted while everything else is left untouched, and while most games and applications can already gracefully handle a reset, this will further reduce the problems a reset creates by resetting fewer programs.
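The difference in reset scope can be sketched with a small model. Everything here is an assumption made for illustration: the engine names, the application names, and the two helper functions are hypothetical, and the article does not say how a real GPU will be divided into engines.

```python
# Toy model of reset scope. Engine and application names are hypothetical;
# the article does not say how a real GPU is divided into engines.
APP_ENGINES = {
    "game": {"3d"},
    "video_player": {"video_decode"},
    "compute_job": {"compute"},
    "editor": {"3d", "video_decode"},
}


def affected_by_full_reset(app_engines):
    """Windows 7 style: a TDR resets the whole GPU, so every app is hit."""
    return set(app_engines)


def affected_by_engine_reset(app_engines, hung_engine):
    """WDDM 1.2 style: only apps using the hung engine are hit."""
    return {app for app, engines in app_engines.items() if hung_engine in engines}


print(sorted(affected_by_full_reset(APP_ENGINES)))
# ['compute_job', 'editor', 'game', 'video_player']
print(sorted(affected_by_engine_reset(APP_ENGINES, "video_decode")))
# ['editor', 'video_player']
```

In this sketch a hang in the video decode engine leaves the game and the compute job untouched, which is the behavior the finer-grained reset is meant to deliver.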

 


292 Comments


  • CeriseCogburn - Thursday, March 8, 2012 - link

    Interesting, amd finally copied nvidia...
    " This problem forms the basis of this benchmark, and the NQueen test proves once more that AMD's Radeon HD 7970 tremendously benefits from leaving behind the VLIW architecture in complex workloads. Both the HD 7970 and the GTX 580 are nearly twice as fast as the older Radeons. "

    When we show diversity we should also show that amd radeon has been massively crippled for a long time except when "simpleton" was the key to speed. "Superior architecture" actually means "simple and stupid" - hence "fast" at repeating simpleton nothings, but unable to handle "complex tasks".
    LOL - the dumb gpu by amd has finally "evolved".
  • chizow - Thursday, December 22, 2011 - link

    ....unfortunately its going to be pitted against Kepler for the long haul.

    There's a lot to like about Southern Islands but I think its going to end up a very similar situation as Evergreen vs. Fermi, where Evergreen released sooner and took the early lead, but Fermi ultimately won the generation. I expect similar with Tahiti holding the lead for the next 3-6 months until Kepler arrives, but Kepler and its refresh parts winning this 28nm generation once they hit the streets.

    Overall the performance and changes AMD made with Tahiti look great compared to Northern Islands, but compared to Fermi parts, its just far less impressive. If you already owned an AMD NI or Evergreen part, there'd be a lot of reason to upgrade, but if you own a Fermi generation Nvidia card there's just far less reason to, especially at the asking price.

    I do like how AMD opened up the graphics pipeline with Tahiti though, 384-bit bus, 3GB framebuffer, although I wonder if holding steady with ROPs hurts them compared to Kepler. It would've also been interesting to see how the 3GB GTX 580 compared at 2560 since the 1.5GB model tended to struggle even against 2GB NI parts at that resolution.
  • ravisurdhar - Thursday, December 22, 2011 - link

    My thoughts exactly. Can't wait to see what Kepler can do.

    Also...4+B transistors? mind=blown. I remember when we were ogling over 1B. Moore's law is crazy.... :D
  • johnpombrio - Wednesday, December 28, 2011 - link

    Exactly. If you look at all the changes that AMD did on the card, I would have expected better results: the power consumption decrease with the Radeon 7970 is mainly due to the die shrink to 28nm. NVidia is planning on a die shrink of their existing Fermi architecture before Kepler is released:

    http://news.softpedia.com/news/Nvidia-Kepler-Is-On...

    Another effect of the die shrink is that clock speed usually increases as there is less heat created at the lower voltage needed with a smaller transistor.

    The third change that is not revolutionary is the bump of the 7970's memory bus to 384 bits (matching the 580) from the 6970's 256 bits, along with 3GB of GDDR5 memory vs the GTX580's 1.5GB and the 6970's 2GB.

    The final non revolutionary change is bumping the number of stream processors by 33% from 1,536 to 2,048.

    Again, breaking out my calculator, the 33% bump in the number of stream processors ALONE accounts for the benchmark difference between the 7970 and the 6970.

    The higher benchmark, however, does not show ANY OTHER large speed bumps that SHOULD HAVE OCCURRED due to the increase in the memory bus size, the higher amount of memory, compute performance, texture fill rate, or finally the NEW ARCHITECTURE.

    If I add up all the increases in the technology, I would have expected benchmarks in excess of 50-60% over the previous generation. Perhaps I am naive in how much to expect but, hell, a doubling of transistor count should have produced a lot more than a 35% increase. Add the new architecture, smaller die size, and more memory and I am underwhelmed.
  • CeriseCogburn - Thursday, March 8, 2012 - link

    Well, we can wait for their 50%+ driver increase package+ hotfixes - because after reading that it appears they are missing the boat in drivers by a wide margin.
    Hopefully a few months after Kepler blows them away, and the amd fans finally allow themselves to complain to the proper authorities and not blame it on Nvida, they will finally come through with a "fix" like they did when the amd (lead site review mastas) fans FINALLY complained about crossfire scaling....
  • KaarlisK - Thursday, December 22, 2011 - link

    What is the power consumption with multiple monitors? Previously, you could not downclock GDDR5, so the resulting consumption was horrible.
  • Ryan Smith - Thursday, December 22, 2011 - link

    "On that note, for anyone who is curious about idle clockspeeds and power consumption with multiple monitors, it has not changed relative to the 6970. When using a TMDS-type monitor along with any other monitor, AMD has to raise their idle clockspeeds from 350MHz core and 600Mhz memory to 350MHz core and the full 5.5GHz speed for memory, with the power penalty for that being around 30W. Matched timing monitors used exclusively over DisplayPort will continue to be the only way to be able to use multiple monitors without incurring an idle penalty."
  • KaarlisK - Thursday, December 22, 2011 - link

    Thank you for actually replying :)
    I am so sorry for having missed this.
  • ltcommanderdata - Thursday, December 22, 2011 - link

    Great review.

    Here's hoping that AMD will implement 64-bit FP support across the whole GCN family and not just the top-end model. Seeing AMD's mobile GPUs don't use the highest-end chip, settling for the 2nd highest and lower, there hasn't been 64-bit FP support in AMD mobile GPUs since the Mobility HD4800 series. I'm interested in this because I can then dabble in some 64-bit GPGPU programming on the go. It also has implications for Apple since their iMacs stick to mobile GPUs, so would otherwise be stuck without 64-bit FP support which presumably could be useful for some of their professional apps.

    In regards to hardware accelerated Megatexture, is it directly applicable to id Tech 5's OpenGL 3.2 solution? i.e. will id Tech 5 games see an immediate speed-up with no recoding needed? Or does Partially Resident Texture support require a custom AMD specific OpenGL extension? If it's the latter, I can't see it going anywhere unless nVidia agrees to make it a multivendor EXT extension.
  • Ryan Smith - Thursday, December 22, 2011 - link

    Games will need to be specifically coded for PRT; it won't benefit any current games. And you are correct in that it will require an AMD OpenGL extension to use (it won't be accessible from D3D at this time).
