Derek Gets Technical Again: Of Warps, Wavefronts and SPMD

From our GT200 review, we learned a little about thread organization and scheduling on NVIDIA hardware. In speaking with AMD, we discovered that sometimes it just makes sense to approach a problem in similar ways. Like NVIDIA, AMD schedules threads in groups (called wavefronts by AMD) that execute over 4 cycles. Since each RV770 SIMD core processes 16 threads at a time on its 16 5-wide SPs (each of which handles one "stream" or thread or whatever you want to call it), and because AMD said so, we can conclude that AMD organizes 64 threads into one wavefront, all of which must execute in parallel. From GT200, we learned that NVIDIA further groups warps into thread blocks; we've now learned that there are two more levels of organization in AMD hardware as well.
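The arithmetic behind those group sizes can be sketched as a toy calculation (the function name is ours, and the 8-SP, 32-thread warp figures come from NVIDIA's GT200 architecture rather than this paragraph):

```python
# Toy arithmetic (our own construction, not vendor code): threads per
# scheduling group = SIMD lanes x cycles each instruction is held for.
def group_size(simd_lanes, issue_cycles):
    return simd_lanes * issue_cycles

amd_wavefront = group_size(16, 4)  # 16 SPs per RV770 SIMD core, 4 cycles
nv_warp = group_size(8, 4)         # 8 SPs per GT200 SM, 4 cycles

print(amd_wavefront, nv_warp)  # 64 32
```

The same relationship is what lets us work backwards from AMD's 16-wide SIMD cores to the 64-thread wavefront.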

Like NVIDIA, AMD maintains context per wavefront: register space, instruction stream, global constants, and local store space are shared between all threads running in a wavefront, and data sharing and synchronization can be done within a thread block. The larger grouping of thread blocks enables global data sharing using the global data store, but we didn't actually get a name or specification for it. On RV770, one VLIW instruction (up to 5 operations) is broadcast to each of the SPs, each of which runs it on its own unique set of data and its own subset of the register file.
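A minimal toy model of that broadcast (our own illustration, with made-up names): one bundle of up to 5 operations is issued once, and every SP applies it to its own private slice of the register file.

```python
# Toy model (ours, not AMD's) of one VLIW bundle broadcast to the
# 16 SPs of an RV770 SIMD core: the same operations run everywhere,
# but each SP works on its own slice of the register file.
def run_vliw_bundle(bundle, register_slices):
    assert len(bundle) <= 5          # a bundle holds up to 5 operations
    for regs in register_slices:     # one slice per SP
        for op in bundle:
            op(regs)

# 16 SPs, each with a tiny private register slice.
slices = [{"r0": i, "r1": 2} for i in range(16)]
run_vliw_bundle([lambda r: r.update(r0=r["r0"] * r["r1"])], slices)
print(slices[3]["r0"])  # 6 -- same instruction, different data per SP
```

The point of the sketch is simply that one instruction stream drives many data sets, which is the property the SPMD discussion below builds on.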

To put it side by side with NVIDIA's architecture, we've put together a table with what we know about resources per SM / SIMD array.

NVIDIA/AMD Feature                   NVIDIA GT200        AMD RV770
Registers per SM/SIMD Core           16K x 32-bit        16K x 128-bit
Registers on Chip                    491,520 (1.875MB)   163,840 (2.5MB)
Local Store                          16KB                16KB
Global Store                         None                16KB
Max Threads on Chip                  30,720              16,384
Max Threads per SM/SIMD Core         1,024               > 1,000
Max Threads per Warp/Wavefront       32                  64
Max Warps/Wavefronts on Chip         960                 256 (with 64 reserved)
Max Thread Blocks per SM/SIMD Core   8                   AMD Won't Tell Us

That's right, AMD has 2.5MB of register space.

We love that we have all this data, and both NVIDIA's CUDA programming guide and the documentation that comes with AMD's CAL SDK offer some great low-level info. But the problem is that hardcore code tuners really need more information to properly tune their applications. To some extent, graphics takes care of itself, as there are a lot of different things that need to happen in different ways. It's the GPGPU crowd, the pioneers of GPU computing, that will need much more low-level data on how resource allocation impacts thread issue rates and how to properly fetch and prefetch data to make the best use of external and internal memory bandwidth.

But for now, these details are the ones we have, and we hope that programmers used to programming massively data parallel code will be able to get under the hood and do something with these architectures even before we have an industry standard way to take advantage of heterogeneous computing on the desktop.

Which brings us to an interesting point.

NVIDIA wanted us to push some ridiculous acronym for their SM's architecture: SIMT (single instruction multiple thread). First off, this is a confusing descriptor based on the normal understanding of instructions and threads. But more to the point, there already exists a programming model that nicely fits what NVIDIA and AMD are both actually doing in hardware: SPMD, or single program multiple data. This description is most often attached to distributed memory systems and large scale clusters, but it really is actually what is going on here.

Modern graphics architectures process multiple data sets (such as a vertex or a pixel and its attributes) with single programs (a shader program in graphics, or a kernel if we're talking GPU computing) that are run both independently on multiple "cores" and in groups within a "core". Functionally, we maintain one instruction stream (program) per context and apply it to multiple data sets, layered with the fact that multiple contexts can be running the same program independently. As with distributed SPMD systems, not all copies of the program are running in lockstep: multiple warps or wavefronts may be at different stages of execution within the same program, with barrier synchronization available to coordinate them.
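The model described above can be sketched in a few lines (our own illustration; the kernel and batch sizes are made up): one program applied to many data elements, with the elements grouped into wavefront-sized batches that each march through the same program on their own.

```python
# Rough SPMD sketch (ours): a single "kernel" (program) applied to
# many data elements, grouped into wavefront-sized batches.
WAVEFRONT = 64  # threads per wavefront, per the article

def kernel(x):
    # stand-in for a shader program or compute kernel
    return x * x + 1

data = list(range(256))
wavefronts = [data[i:i + WAVEFRONT] for i in range(0, len(data), WAVEFRONT)]

# Every wavefront runs the same program on its own batch; a real GPU
# interleaves them in time, a Python loop just runs them in turn.
results = [kernel(x) for wf in wavefronts for x in wf]
print(len(wavefronts), results[:3])  # 4 [1, 2, 5]
```

What the sketch cannot show is the scheduling: on hardware, those four batches would be resident simultaneously and swapped in and out to hide latency, which is exactly where the resource limits in the table above start to matter.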

For more information on the SPMD programming model, Wikipedia has a good page on the subject, even though it doesn't talk about how GPUs would fit into SPMD quite yet.

GPUs take advantage of a property of SPMD that distributed systems do not (explicitly, anyway): fine-grained resource sharing with SIMD processing where data comes from multiple threads. Threads running the same code can physically share the same instruction and data caches and can have high-speed access to each other's data through a local store. This is in contrast to larger systems, where each node gets a copy of everything to handle in its own way, with its own data, at its own pace (and in which messaging and communication become more asynchronous, critical, and complex).

AMD offers an advantage in the SPMD paradigm in that it maintains a global store (present since RV670) where all threads can share result data globally if they need to (this is something that NVIDIA does not support). This feature allows more flexibility in algorithm implementation and can offer performance benefits in some applications.
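A toy contrast between the two kinds of store (our own construction; the SIMD-core count and the partial-sum workload are assumptions for illustration, not from the article):

```python
# Toy contrast (ours) between per-core local stores and RV770's
# chip-wide global data store described above.
NUM_CORES = 10  # assumed SIMD-core count, for illustration only

local_stores = [{} for _ in range(NUM_CORES)]  # visible within one core
global_store = {}                              # visible to all wavefronts

def reduce_partial_sums(core_id, values):
    # Each core reduces its own values in its local store...
    local_stores[core_id]["sum"] = sum(values)
    # ...then publishes the partial result chip-wide, the kind of
    # cross-wavefront sharing a global store makes possible.
    global_store[core_id] = local_stores[core_id]["sum"]

for core in range(NUM_CORES):
    reduce_partial_sums(core, [core] * 4)

print(sum(global_store.values()))  # 180
```

Without a global store, that final cross-core combine would have to round-trip through off-chip memory, which is where the flexibility and performance benefits mentioned above come from.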

In short, the reality of GPGPU computing has been the implementation in hardware of the ideal machine to handle the SPMD programming model. Bits and pieces are borrowed from SIMD, SMT, TMT, and other micro-architectural features to build architectures that we submit should be classified as SPMD hardware in honor of the programming model they natively support. We've already got enough acronyms in the computing world, and it's high time we consolidate where it makes sense and stop making up new terms for the same things.

215 Comments

  • FITCamaro - Wednesday, June 25, 2008 - link

    Yes, I noticed it used quite a bit at idle as well. But its load numbers were lower. And as the other guy said, they're probably just still finalizing the drivers for the new cards. I'd expect both performance and idle power consumption to improve in the next month or two.
  • derek85 - Wednesday, June 25, 2008 - link

    I think ATI is still fixing/finalizing PowerPlay; it should be much lower when the new Catalyst comes out.
  • shadowteam - Wednesday, June 25, 2008 - link

    If a $200 card can play all your games @ 30+fps, does a $600 card even make sense, knowing it'll do no better to your eyes? I see quite a few NV-biased elements in your review this time around, and what's all that about the biggest die TSMC's ever produced? The GTX's die may be huge, but compared to AMD's, it's only half as efficient. Your review title, I think, was a bit harsh toward AMD. By limiting AMD's victory to a price point of $299, you're essentially telling consumers that NV's GTX 2xx series is actually worth the money, which is terribly biased consumer advice in my opinion. From a $600 GX2 to a $650 GTX 280, Nvidia's actually gone backwards. You know when we talk about AMD's financial struggle, and that the company might go bust in the next few years... part of the reason that may happen is because media fanatics try to keep things on an even keel, and in doing so they completely forget what consumers actually want. No offence to AT, but I've been in media myself, and I can tell when even professionals sound biased.
  • paydirt - Wednesday, June 25, 2008 - link

    You're putting words into the reviewers' mouths and you know it. I am pretty sure most readers know that bigger isn't better in the computing world; AnandTech never said big was good, they are simply pointing out the difference, duh. YOU need to keep in mind that nVidia hasn't done a die shrink yet with the GTX 2xx...

    I also did not read anything in the review that said it was worth it (or "good") to pay $600 on a GPU, did you? Nope. Thought so. Quit trying to fight the world and life might be different for you.

    I'm grateful that both companies make solid cards that are GPGPU-capable and affordable, and that we have sites like AnandTech to break down the numbers for us.

  • shadowteam - Wednesday, June 25, 2008 - link

    Are you speaking on behalf of the reviewers? You've obviously misunderstood the whole point I was trying to make. When you say in your other post that AT is a reviews site and not a product promoter, I feel terribly sorry for you, because review sites are THE best product promoters around, including AT, and Derek pointed this out earlier: AT's too influential for companies to ignore. Well, if that is truly the case, why not type in block letters how NV's trying to rip us off, for consumers' sake? Maybe just for once do it; it'll definitely teach Nvidia a lesson.
  • DaveninCali - Wednesday, June 25, 2008 - link

    I completely agree. Anand, the GTX 260/280 are a complete waste of money. You are not providing adequate conclusions. Your data speaks for itself. I know you have to be "friendly" in your conclusions so that you don't arouse the ire of nVidia but the launch of the 260/280 is on the order of the FX series.

    I mean, you can barely test the cards in SLI mode due to the huge power constraints, and the price is ABSOLUTELY ridiculous. $1300 for SLI GTX 280. $1300!!!! You can get FOUR 4870 cards for less than this. FOUR OF THEM!!!! You should be screaming about how poor a value the GTX 280/260 cards are at these performance numbers and price points.

    The 4870 beats the GTX 260 in all but one benchmark at $100 less. Not to mention the 4870 consumes less power than the GTX 280. Hell, the 4870 even beats the GTX 280 in some benchmarks. For $350 more, there shouldn't even be ONE game that the 4870 is better at than the GTX 280. Not even one, at more than double the price.

    I'm not quite sure what you are trying to convey in this article, but at least the readers at AnandTech are smart enough to read the graphs for themselves. Given what has been written on the conclusion page (3/4 of it GPGPU jargon that is totally unnecessary), could you please leave the page blank instead?

    I mean come on. Seriously! $1300 compared to $600 with much more performance coming from the 4870 SLI. COME ON!! Now I'm too angry to go to bed. :(
  • DaveninCali - Wednesday, June 25, 2008 - link

    Oh, and one other thing. I thought Anandtech was a review site for the consumer. How can you not warn consumers against spending $650, much less $1300, on a piece of hardware that isn't much faster, and in some cases not faster at all, than another piece of hardware priced at $300 ($600 in SLI)? It's borderline scam.

    When you can't show SLI numbers because you can't even find a power supply that can provide the power, at least an ounce of criticism should be noted to try and stop someone from wasting all that money.

    Don't you think that consumers should be getting some better advice than this? $1300 for less performance. I feel so sad now. Time to go to sleep.
  • shadowteam - Wednesday, June 25, 2008 - link

    It reminds me of that NV scam from yesteryears... I'm forgetting a good part of it, but apparently NV and "some company" racked up some forum/blog gurus to promote their BS, including a guy on AT forums who was eventually gotten rid of due to his extremely biased posts. If AT can do biased reviews, I can pretty much assure you the rest of the reviewers out there are nothing more than misinformed, over-arrogant media puppets. To those who disagree w/ me or the poster above, let me ask you this... if you were sent $600 hardware every other week, or in AT's case, every other day (GTX 280s from NV board partners), would you rather delightfully, and rightfully, piss NV off, or shut your big mouth to keep the hardware, and cash, flowing in?
  • DerekWilson - Wednesday, June 25, 2008 - link

    Wow ...

    I'm completely surprised that you reacted the way you did.

    In our GT200 review we were very hard on NVIDIA for providing less performance than a cheaper high end part, and this time around we pointed out the fact that the 4870 actually leads the GTX 260 at 3/4 of the price.

    We have no qualms about saying anything warranted about any part no matter who makes it. There's no need to pull punches, as what we really care about are the readers and the technology. NVIDIA really can't bring anything compelling to the table in terms of price / performance or value right now. I think we did a good job of pointing that out.

    We have mixed feelings about CrossFire, as it doesn't always scale well and isn't as flexible as SLI -- hopefully this will change when R700 hits, but for now there are still limitations. When CrossFire does work, it works really well, and I hope AMD works this out.

    NVIDIA absolutely needs to readjust the pricing of most of their lineup in order to compete. If they don't, then AMD's hardware will continue to get our recommendation.

    We are here because we love understanding hardware and we love talking about the hardware. Our interest is in reality and the truth of things. Sometimes we can get overly excited about some technology (just like any enthusiast can), but our recommendations always come down to value and what our readers can get from their hardware today.

    I know I can speak for Anand when I say this (cause he actually did it before his site grew into what it is today) -- we would be doing this even if we weren't being paid for it. Understanding and teaching about hardware is our passion and we put our heart and soul into it.

    There is no amount of money that could buy a review from us. No hardware vendor is off limits.

    In the past, companies have tried to stop sending us hardware because they didn't like what we said. We just go out and buy it ourselves. But that's not likely to be an issue at this point.

    The size and reach of AnandTech today is such that no matter how much we piss off anyone, Intel, AMD, NVIDIA, or any of the OEMs, they can't afford to ignore us and they can't afford to not send us hardware -- they are the ones who want and need us to review their products, whether we say great or horrible things about them.

    Beyond that, I'm 100% sure NVIDIA is pissed off with this review. It is glowingly in favor of the 4870 and ... like I said ... it really shocks me that anyone would think otherwise.

    We don't favor level playing fields or being nice to companies for no reason. We'll recommend the parts that best fit a need at a price if it makes sense. Right now that's the 4870 if you want to spend between $300 and $600 (for 2).

    While it's really, really not worth the money, GTX 280 SLI is the fastest thing out there and some people do want to light their money on fire. Whatever.

    I'm sorry you guys feel the way you do. Maybe after a good night's sleep you'll come back refreshed and see the article in a new light ...
  • formulav8 - Wednesday, June 25, 2008 - link

    Even in the review you claim the 4870 is a $400 performer. So why don't you reflect that in the article's title by adding it after the $300 price?? It would be better to do so, I think anyways. :)

    Maybe say the 4870 wins up to the $400 price point, and likewise the 4850 up to the $250 price that you claimed in the article...

    This tweak could be helpful to some buyers out there with a specific budget and could help save them some money in the process. :)


    Jason
