DirectX11 Redux

With the launch of the 5800 series, AMD is quite proud of the position they’re in. They have a DX11 card launching a month before DX11 is dropped on to consumers in the form of Win7, and the slower timing of NVIDIA means that AMD has had silicon ready far sooner. This puts AMD in the position of Cypress being the de facto hardware implementation of DX11, a situation that is helpful for the company in the long term as game development will need to begin on solely their hardware (and programmed against AMD’s advantages and quirks) until such a time that NVIDIA’s hardware is ready. This is not a position that AMD has enjoyed since 2002 with the Radeon 9700 and DirectX 9.0, as DirectX 10 was anchored by NVIDIA due in large part to AMD’s late hardware.

As we have already covered DirectX 11 in-depth with our first look at the standard nearly a year ago, this is going to be a recap of what DX11 is bringing to the table. If you’d like to get the entire inside story, please see our in-depth DirectX 11 article.

DirectX 11, as we have previously mentioned, is a pure superset of DirectX 10. Rather than being the massive overhaul of DirectX that DX10 was compared to DX9, DX11 builds on DX10 without throwing away the old ways. The result is easy to see in the hardware of the 5870: as features were added to the Direct3D pipeline, they were added to the RV770 pipeline in its transformation into Cypress.

New to the Direct3D pipeline for DirectX 11 are the tessellation system, which is divided up into 3 parts, and the Compute Shader. Starting at the very top of the tessellation stack, we have the Hull Shader. The Hull Shader is responsible for taking in patches and control points (tessellation directions) to prepare a piece of geometry to be tessellated.

Next up is the tessellator proper, which is a rather significant piece of fixed function hardware. The tessellator’s sole job is to take geometry and to break it up into more complex portions, in effect creating additional geometric detail where there was none. As setting up geometry at the start of the graphics pipeline is comparatively expensive, this is a very cool hack to get more geometric detail out of an object without the need to fully deal with what amounts to “eye candy” polygons.

As the tessellator is not programmable, it simply tessellates whatever it is fed. This is what makes the Hull Shader so important, as it serves as the programmable input side of the tessellator.

Once the tessellator is done, it hands its work off to the Domain Shader, and the Hull Shader hands off its original inputs to the Domain Shader as well. The Domain Shader is responsible for any further manipulations of the tessellated data that need to be made, such as applying displacement maps, before passing it along to other parts of the GPU.
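
To make the division of labor between the three stages concrete, here is a toy model in Python that tessellates a single line segment. It is only a sketch: real Hull and Domain Shaders are written in HLSL, and every function name, the distance-based factor policy, and the sine-shaped displacement are assumptions made for illustration. The one detail taken from the API is the clamp of the tessellation factor to a maximum of 64.

```python
# Toy model of the three DX11 tessellation stages on one line segment.
# All names here are hypothetical; this is not how the stages are programmed.
import math

def hull_shader(control_points, distance_to_camera):
    """Programmable stage: decide how finely the patch should be split."""
    # Assumed policy: closer patches get more subdivisions, clamped to 1..64.
    factor = max(1, min(64, round(64 / max(distance_to_camera, 1.0))))
    return control_points, factor

def tessellator(factor):
    """Fixed-function stage: emit parametric coordinates, not positions."""
    return [i / factor for i in range(factor + 1)]

def domain_shader(control_points, u, displacement):
    """Programmable stage: turn a parametric coordinate into a displaced vertex."""
    (x0, y0), (x1, y1) = control_points
    x = x0 + (x1 - x0) * u
    y = y0 + (y1 - y0) * u
    return (x, y + displacement(u))

def tessellate(control_points, distance_to_camera, displacement):
    # The Hull Shader's inputs travel to the Domain Shader alongside the
    # tessellator's output, mirroring the hand-off described above.
    points, factor = hull_shader(control_points, distance_to_camera)
    return [domain_shader(points, u, displacement) for u in tessellator(factor)]

def bump(u):
    """Stand-in for a displacement map lookup."""
    return 0.1 * math.sin(math.pi * u)

vertices = tessellate([(0.0, 0.0), (1.0, 0.0)], distance_to_camera=16.0, displacement=bump)
print(len(vertices), "vertices generated from 2 control points")
```

Note that the tessellator function never sees a position or a displacement map. It only subdivides, which is why the programmable stages on either side of it matter so much.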

The tessellator is very much AMD’s baby in DX11. They’ve been playing with tessellators since as early as 2001, only for them to never gain traction on the PC. The tessellator has seen use in the Xbox 360, where the AMD-designed Xenos GPU has one (albeit much simpler than DX11’s), but when that same tessellator was brought over and put in the R600 and successive hardware, it was never used since it was not a part of the DirectX standard. Now that tessellation is finally part of that standard, we should expect to see it picked up and used by a large number of developers. For AMD, it’s vindication for all the work they’ve put into tessellation over the years.

The other big addition to the Direct3D pipeline is the Compute Shader, which allows programs to access the hardware of a GPU and treat it like a regular data processor rather than a graphical rendering processor. The Compute Shader is open for use by games and non-games alike, although when it’s used outside of the Direct3D pipeline it’s usually referred to as DirectCompute rather than the Compute Shader.

For its use in games, the big thing AMD is pushing right now is Order Independent Transparency, which uses the Compute Shader to sort transparent textures in a single pass so that they are rendered in the correct order. This isn’t something that was previously impossible using other methods (e.g. pixel shaders), but using the Compute Shader is much faster.
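
The core idea is simple enough to sketch on the CPU. The Python below is not AMD's implementation; it assumes each pixel has collected a list of transparent fragments as (depth, color, alpha) tuples, that a larger depth means farther from the camera, and that standard "over" blending is used. The function names are hypothetical.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard 'over' blend of one transparent fragment onto the color behind it."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d for s, d in zip(src_color, dst))

def resolve_pixel(fragments, background):
    """fragments: list of (depth, rgb, alpha) in arbitrary submission order."""
    color = background
    # Sort far-to-near (assumption: larger depth = farther), then composite
    # back to front so nearer fragments end up on top.
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = blend_over(color, rgb, alpha)
    return color

red_near = (0.2, (1.0, 0.0, 0.0), 0.5)
blue_far = (0.8, (0.0, 0.0, 1.0), 0.5)
black = (0.0, 0.0, 0.0)

# Submission order no longer matters: both calls produce the same pixel.
print(resolve_pixel([red_near, blue_far], black))
print(resolve_pixel([blue_far, red_near], black))
```

Without the sort, the two submission orders would give different colors, which is exactly the artifact order independent transparency is meant to remove.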

Other features finding their way into Direct3D include some significant changes for textures, in the name of improving image quality. Texture sizes are being bumped up to 16K x 16K (that’s a 256MP texture) which for all practical purposes means that textures can be of an unlimited size given that you’ll run out of video memory before being able to utilize such a large texture.
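
The arithmetic behind that claim is worth spelling out. This quick Python check assumes an uncompressed texture with 4 bytes per texel (32-bit RGBA) and no mipmaps; both are our assumptions for illustration rather than anything mandated by DX11.

```python
def texture_bytes(width, height, bytes_per_texel):
    """Memory footprint of a single uncompressed texture level."""
    return width * height * bytes_per_texel

side = 16 * 1024                          # "16K" texels per side
megatexels = (side * side) // (1024 * 1024)
size_gib = texture_bytes(side, side, 4) / 2**30

print(megatexels, "megatexels")           # the 256MP figure quoted above
print(size_gib, "GiB at 4 bytes per texel")
```

Under those assumptions a single maximum-size texture occupies a full gigabyte, as much as the entire memory of a 1GB video card.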

The other change to textures is the addition of two new texture compression schemes, BC6H and BC7. These new schemes are another one of AMD’s pet projects, as AMD developed them and pushed for their inclusion in DX11. BC6H is the first texture compression method dedicated to compressing HDR textures, which previously compressed very poorly using even less-lossy schemes like BC3/DXT5. It can compress textures at a lossy 6:1 ratio. Meanwhile BC7 is for use with regular textures, and is billed as a replacement for BC3/DXT5. It has the same 3:1 compression ratio for RGB textures.
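
Those ratios fall out of the block layout. Both formats store each 4x4 block of texels in 16 bytes, so the ratio depends only on how large the source texels are. The short Python check below assumes 16-bit floating point RGB for the HDR source (6 bytes per texel) and 8-bit RGB for the regular source (3 bytes per texel); the source formats are our assumptions.

```python
BLOCK_TEXELS = 4 * 4   # BC6H and BC7 both compress 4x4 texel blocks
BLOCK_BYTES = 16       # each compressed block is 128 bits

def compression_ratio(source_bytes_per_texel):
    """Uncompressed block size divided by compressed block size."""
    return (BLOCK_TEXELS * source_bytes_per_texel) / BLOCK_BYTES

print("BC6H, RGB16F source:", compression_ratio(6))   # HDR, 3 x 16-bit channels
print("BC7, RGB8 source:   ", compression_ratio(3))   # 3 x 8-bit channels
print("BC7, RGBA8 source:  ", compression_ratio(4))   # with an alpha channel
```

Since BC3 also uses 16 bytes per 4x4 block, BC7 takes exactly the same space and spends the bits differently, which is why it can be pitched as a drop-in quality upgrade.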

We’re actually rather excited about these new texture compression schemes, as better ways to compress textures directly lead to better texture quality. Compressing HDR textures allows for larger/better textures due to the space saved, and using BC7 in place of BC3 is an outright quality improvement in the same amount of space, given an appropriate texture. Better compression and tessellation stand to be the biggest contributors towards improving the base image quality of games by leading to better textures and better geometry.

We had been hoping to supply some examples of these new texture compression methods in action with real textures, but we have not been able to secure the necessary samples in time. In the meantime we have Microsoft’s examples from GameFest 2008, which drive the point home well enough in spite of being synthetic.

Moving beyond the Direct3D pipeline, the next big feature coming in DirectX 11 is better support for multithreading. By allowing multiple threads to simultaneously create resources, manage states, and issue draw commands, it will no longer be necessary to have a single thread do all of this heavy lifting. As this is an optimization focused on better utilizing the CPU, it stands to reason that graphics performance in GPU-limited situations will gain little. Rather, this is going to help the CPU in CPU-limited situations better utilize the graphics hardware. Technically this feature does not require DX11 hardware support (it’s a high-level construct available for use with DX10/10.1 cards too) but it’s still a significant technology being introduced with DX11.
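
The pattern is record in parallel, submit in order. The Python sketch below is an analogy only and does not use the Direct3D API (which exposes this through deferred contexts and command lists); the function names, the scene layout, and the command tuples are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def record_command_list(scene_chunk):
    """Worker thread: build a list of commands without touching the GPU."""
    commands = []
    for obj in scene_chunk:
        commands.append(("set_state", obj["material"]))
        commands.append(("draw", obj["name"]))
    return commands

def render_frame(scene_chunks, submit):
    """Record one command list per chunk in parallel, then submit them in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(scene_chunks))) as pool:
        # map() returns results in input order, so frame ordering is preserved
        # no matter which worker finishes first.
        command_lists = list(pool.map(record_command_list, scene_chunks))
    # Only this final loop stands in for the single thread that talks to the GPU.
    for command_list in command_lists:
        for command in command_list:
            submit(command)

submitted = []
render_frame(
    [
        [{"name": "terrain", "material": "rock"}],
        [{"name": "water", "material": "glass"}],
    ],
    submitted.append,
)
print(submitted)
```

The expensive part, walking the scene and building state, is spread across cores, while the cheap part, replaying finished lists, stays on one thread. That is also why the gains show up in CPU-limited situations rather than GPU-limited ones.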

Last but not least, DX11 is bringing with it High Level Shader Language 5.0, which in turn is bringing several new instructions that are primarily focused on speeding up common tasks, and some new features that make it more C-like. Classes and interfaces will make an appearance here, which will make shader code development easier by allowing for easier segmentation of code. This will go hand-in-hand with dynamic shader linkage, which helps to clean up code by only linking in shader code suitable for the target device, taking the management of that task out of the hands of the coder.
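
As a rough analogy for what interfaces plus dynamic linkage buy the coder, consider the Python below. It is not HLSL and does not reflect the actual linkage mechanism; the class names, the capability flag, and the lighting math are hypothetical. The point is that the shading routine is written once against an interface, and the concrete implementation is chosen at bind time rather than by hand-maintained code paths.

```python
from abc import ABC, abstractmethod

class Light(ABC):
    """Interface the shading code is written against."""
    @abstractmethod
    def intensity(self, n_dot_l):
        ...

class DiffuseLight(Light):
    def intensity(self, n_dot_l):
        return max(n_dot_l, 0.0)

class AmbientLight(Light):
    def intensity(self, n_dot_l):
        return 0.2  # constant fallback for less capable targets (assumed value)

def shade(albedo, n_dot_l, light):
    """Written once; never needs to know which Light it was linked with."""
    return tuple(c * light.intensity(n_dot_l) for c in albedo)

def link_for_device(supports_per_pixel_lighting):
    """Stand-in for choosing an implementation suited to the target device."""
    return DiffuseLight() if supports_per_pixel_lighting else AmbientLight()

print(shade((1.0, 0.5, 0.0), 0.5, link_for_device(True)))
print(shade((1.0, 0.5, 0.0), 0.5, link_for_device(False)))
```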


327 Comments


  • erple2 - Tuesday, September 29, 2009 - link

    What the heck are you talking about? Are you saying that electricity consumed by a device divided by the "volume" of the device is the only way to measure the heat output of the device? Every single Engineering class I took tells me that's wrong, and I'm right. I think you need to take some basic courses in Electrical Engineering and/or Thermodynamics.

    (simplified)
    power consumed = work + waste

    You're looking for the waste heat generated by the device. If something can completely convert every watt of electricity that passes through it to do some type of work (light a light bulb, turn a motor, make some calculation on a GPU etc), then it's not going to heat up. As a result, you HAVE to take into consideration how inefficient the particular device is before you can make any claim about how much the device heats up.

    I'll bet that if you put a Liquid Nitrogen cooler on every ATI card, and used the standard air coolers on every NVidia card, that the ATI cards are going to run crazy cooler than the NVidia cards.

    Ultimately the temperature of the GPUs depends a significant amount on the efficiency of the cooler, and how much heat the GPU is generating as waste. My point is that we don't have enough data to determine whether the ATI die runs hot because the coolers are less than ideal, Nvidia ones are closer to ideal, the die is smaller, or whatever you have. You have to look at a combination of the efficiency of the die (how well it converts input power to "work done"), the efficiency of the cooler (how well it removes heat from its heat source), and the combination of the two.

    I'd posit that the ATI card is more efficient than the NVidia card (at least in WoW, the only thing we have actual numbers of the "work done" and "input power consumed").

    Now, if you look at the measured temperature of the core as a means of comparing the worthiness of one GPU over another, I think you're making just as meaningful a comparison as comparing the worthiness of the GPU based on the size of the retail box that it comes in.
  • SiliconDoc - Friday, September 25, 2009 - link

    You simply repeated my claim about watts, and replaced core size, with fps, and created a framerate per watt chart, that has near nothing to do with actual heat inside the die, since the SIZE of the die, vs the power traversing through it is the determining factor, affected by fan quality (ram size as well).
    Your argument is "framerate power efficiency", as in watts per framerate, and has nothing to do with core temperature (modified by fan cooling of course to some degree), that the article indeed posts except for the two failed ati cards.
    The problem with your flawed "science" that turns it into hokum, is that no matter what outputs on the screen, the HEAT generated by the power consumption of the card itself, remains in the card, and is not "pumped through the videoport to the screen".
    If you'd like to claim "wattage vs framerate" efficiency for 5870, fine I've got no problem, but claiming that proves core temps are not dependent on power consumption vs die size ( modified by the rest of the card *mem size/power usage/ and the fan heatsink* ) is RIDICULOUS.
    ---
    The cards are generally equivalent manufacturing and component additions, so you take the wattage consumed (by the core) and divide by core size, for heat density.
    Hence, ATI cards, smaller cores and similar power consumption, wind up hotter.
    That's what the charts show, that's what should be stated, that is the rule, and that's the way it plays in the real world, too.
    ---
    The only modification to that is heatsink fan efficiency, and I don't find you fellas claiming stock NVIDIA fans and heatsinks are way better than the ATI versions, hence 66C for NVIDIA, 75C, 85C, etc, and only higher for ATI, in all their cards listed.
    Would you like to try that one on for size ? Should I just make it known that NVIDIA fans and heatsinks are superior to ATI ?
    What is true is a larger surface area (die side squared) dissipates the same amount of heat easier, and that of course is what is going on.
    ATI dies are smaller ( by a marked surface area as has so often been pointed out), and have similar power consumption, and a higher DENSITY of heat generation, and therefore run hotter.
  • erple2 - Friday, September 25, 2009 - link

    Oops, "milliwatt" should be "kilowatt". I got the decimal place mixed up - I used kilowatt since I thought it was easier to see than 0.247, 0.140, 0.137, 0.181...
  • SiliconDoc - Wednesday, September 23, 2009 - link

    Let's take that LOAD TEMP chart and the article's comments. Right above it, it is stated a good cooler includes the 4850 that IDLE TEMPs in at around 40C (it's actually 42C the highest of those mentioned).
    "The floor for a good cooler looks to be about 40C, with the GTS 250(39C), 3870(41C), and 4850 all turning in temperatures around here"
    OK, so the 4850 has a good cooler, as well as the 3870... then right below is the LOAD TEMP.. and the 4850 is @ 90C -OBVIOUSLY that good cooler isn't up to keeping that tiny hammered core cool...

    3870 is at 89C, 4870 is at 88C, 5870 is at 89C ALL ati....
    but then, nvidia...
    250, 216, 285, 275 all come in much lower at 66C to 85C.... but "temps are all over the place".
    NOT only that crap, BUT the 4890 and 4870x2 are LISTED but with no temps - and take the "coolest position" on the chart!
    Well we KNOW they are in the 90C range or higher...
    So, you NEVER MENTION why 4870x2 and 4890 are "no load temp shown in the chart" - you give them the WINNING SPOTS anyway, you fail to mention the 260's 65C lowest LOAD WIN and instead mention GTX275 at 75C...LOL

    The bias is SO THICK it's difficult to imagine how anyone came up with that CRAP, frankly.
    So the superhot 4890 and 4870x2 are given #1 and #2 spots respectively, a free ride, the other Nvidia cards KICK BUTT in lower load temps EXCEPT the 295, but it makes sure to mention the 8800GT while leaving the 4890 and 4870x2 LOAD TEMP spots blank ?
    roflmao
    ---
    What were you saying about "why" ? If why the 8800GT was included is TRUE, then comment on the gigantic LOAD TEMP bias... tell me WHY.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    AND, you don't take temps from WOW to use for those two, which no doubt even though it is NOT gpu stressing much, will yield the 90C for those two cards 4870x2 and 4890, anyway.
    So they FAIL the OCCT, but you have NOTHING on them, which would if listed put EVERY SINGLE ATI CARD @ near 90C LOAD, PERIOD...
    ---
    And we just CANNOT have that stark FACT revealed, can we ? I mean I've seen this for well over a year here now.
    LET's FINALLY SAY IT.
    ---
    LOAD TEMPS ON THE ATI CARDS ARE ALL, EVERY SINGLE ONE NEAR 90c, much higher than almost ALL of the Nvidia cards.
  • pksta - Thursday, September 24, 2009 - link

    I just want to know...With this much zeal about videocards and more specifically the bias that you see, doesn't it make you sound biased too? Can you say that you have owned the cards you are bashing and seen the differences firsthand? I can say I did. I had an 8800 GT and it was running in the upper 80s under load. I switched to my 4850 with the worst cooler I think I've ever seen mind you, and it stays in the mid to upper 60s under load. The cooler on the 8800 gt was the dual-slot design that was the original reference design. The 4850 had the most pathetic fan I've ever seen. It was similar to the fan and heatsink Intel used on the first Core2 stuff. It was the really cheap aluminum with a tiny copper circle that made contact with the die itself. Now, don't get me wrong I love ATI...But I also love nVidia...Anything that keeps games getting better and prices getting better. I honestly don't think, though, that the article is too biased. I think maybe a little for ATI but nothing to rage on and on about. Besides...Calm down. You know nVidia will have a response for this.
  • SiliconDoc - Sunday, September 27, 2009 - link

    1. Who cares what you think about how you perceive me ? Unless you have a fact to refute, who cares ? What is biased ? There has been quite a DISSSS on PhysX for quite some time here, but the haters have no equal alternative - NOTHING that even comes close. Just ASK THEM. DEAD SILENCE. So, strangely, the very best there is, is BAD.

    Now ask yourself again who is biased, won't you? Ask yourself who is spewing out the endless strings... Do yourself a favor and figure it out. Most of them have NEVER tried PhysX ! They slip up and let it be known, when they are slamming away. Then out comes their PC hate the greedy green rage, and more, because they have to, to fit in the web PC code, instead of thinking for themselves.

    2. Yes, I own more cards currently than you will in your entire life. I started retail computer well over a decade ago.

    3. And now, the standard red rooster tale. It sounds like you were running in 2d clocks 100% of the time, probably on a brand board like a DELL. Happens a lot with red cards. Users have no idea.
    4850 with The worst fan in the World ! ( quick call Keith Olbermann) and it was ice cold, a degree colder than anything else in the review. ROFLMAO
    Once again, the red shorts Pinocchio tale. Forgive me while I laugh, again !
    ROFLMAO
    Did you ever put your finger on the HS under load ? You should have. Did you check your 3D mhz..
    http://forums.anandtech.com/messageview.aspx?catid...
    Not like 90C is offbase, not like I made up that forum thread.

    4. I could care less if nvidia has a response or not. Point is, STOP LYING. Or don't. I certainly have noticed many of the lies I've complained about over a year or so have gone dead silent, they won't pull it anymore, and in at least one case, used in reverse for more red bias, unfortunately, before it became the accepted idea.

    So, I do a service, at the very least people are going to think, and be helped, even if they hate me.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    Well of course that's the excuse, but I'll keep my conclusion considering how the last 15 reviews on the top videocards were done, along with the TEXT that is pathetically biased for ati, that I pointed out. (Even though Derek was often the author).
    --
    You want to tell me how it is that ONLY the GTX295 is near or at 90C, but ALL the ati cards ARE, and we're told "temperatures are all over the place" ?
    Can you really explain that, sir ?
  • 529th - Wednesday, September 23, 2009 - link

    holy shit, a full review is up already!
  • bill3 - Wednesday, September 23, 2009 - link

    Does the article keep referring to Cypress as "too big"? If Cypress is too big, what the hell is GT200 at 480mm^2 or whatever it was? Are you guys serious with that crap?

    I've heard that the "sweet spot" talk from AMD was a bit of a misdirection from the start anyway. IMO if AMD is going to compete for the performance crown or come reasonably close (and frankly, performance is all video card buyers really care about, as we see with all the forum posts only mentioning that GT300 will supposedly be faster than 58XX and not anything else about it) then they're going to need slightly bigger dies. So Cypress being bigger is a great thing. If anything it's too small. Imagine the performance a 480mm^2 Cypress would have! Yes, Cypress is far too small, period.

    Personally it's wonderful to see AMD engineer two chips this time, a bigger performance one and smaller lower end one. This works out far better all around.

    The price is also great. People expecting 299 are on crack.
