A Quick Refresher on the RV770

Because Cypress is a direct evolution of the RV770 design, we’re going to start with a quick recap of RV770’s internal workings before we talk about what’s new. Understanding how RV770 was built is necessary to understand what Cypress changes, so if you’re completely unfamiliar with RV770, please take a look at our expanded discussion of RV770 from last year. For the rest of you, let’s get started.

At the center of the RV770 is the Stream Processing Unit (SPU), a single arithmetic logic unit. The RV770 has 800 of these, packaged together in groups of 5 to form what we call a Streaming Processor (SP). An SP contains a register file, a branch predictor, and the aforementioned 5 SPUs, with the 5th SPU being a more complex unit capable of transcendental functions along with the base functions of an ALU. The SP is the smallest unit that can do individual work; every SPU in an SP must execute the same instruction.

AMD groups every 16 SPs together with texture units, L1 cache, shared memory, and controlling logic. This combined block is what AMD calls a SIMD, and RV770 has 10 of them. These 10 SIMDs form the core computational power of the RV770, and they work alongside various specialized units such as ROPs, rasterizers, L2 cache, and tessellators to form a complete chip.
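As a quick sanity check, the unit counts quoted above can be multiplied out. This is only arithmetic on the figures in the text; the variable names are our own.

```python
# Unit counts quoted in the text for RV770.
SPUS_PER_SP = 5     # SPUs packaged into one Streaming Processor
SPS_PER_SIMD = 16   # SPs grouped into one SIMD
SIMDS = 10          # SIMDs on the chip

sps_total = SPS_PER_SIMD * SIMDS        # SPs on the whole chip
spus_total = sps_total * SPUS_PER_SP    # individual ALUs on the whole chip
complex_spus = sps_total                # one transcendental-capable "5th SPU" per SP

print(sps_total, spus_total, complex_spus)
```

The totals agree with the 800 SPUs stated earlier, and they show that only one in five of those SPUs is the more complex unit.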

To utilize the computational power of the hardware, instruction threads are issued to the SPs. These threads are grouped into wavefronts of 64 threads each. To maximize the utilization of the GPU, threads need to be organized so that they can feed all 5 SPUs in an SP an instruction every clock cycle. Doing this requires extracting instruction level parallelism (ILP) out of the programs being passed to the GPU, which is the difficult task of AMD’s compiler.

If SPUs go unused, then the performance of the chip suffers due to underutilization. This design gives AMD a great deal of theoretical computational power, but it is always a challenge to fully exploit it.
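To make the utilization problem concrete, here is a toy model of packing operations into 5-wide groups. It is a minimal sketch and not AMD's compiler: the greedy `pack` function, the `utilization` helper, and the op names are all hypothetical, and the model ignores the special role of the 5th SPU, register pressure, and everything else a real compiler has to consider.

```python
SLOTS = 5  # SPUs per SP, per the text


def pack(deps):
    """Greedily pack ops into bundles of at most SLOTS ops (toy model).

    deps maps each op name to the set of op names it depends on.
    An op may be issued only once all of its dependencies sit in
    earlier bundles.
    """
    done = set()
    remaining = dict(deps)  # preserves insertion order
    bundles = []
    while remaining:
        # Ops whose dependencies are all satisfied, capped at SLOTS.
        ready = [op for op, d in remaining.items() if d <= done][:SLOTS]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependency")
        bundles.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return bundles


def utilization(bundles):
    """Fraction of SPU slots that hold an op across all bundles."""
    if not bundles:
        return 0.0
    return sum(len(b) for b in bundles) / (SLOTS * len(bundles))


# Five independent ops: plenty of ILP, every SPU gets work.
independent = {f"op{i}": set() for i in range(5)}

# Five ops in a dependency chain: no ILP, one SPU busy per bundle.
chain = {
    "op0": set(),
    "op1": {"op0"},
    "op2": {"op1"},
    "op3": {"op2"},
    "op4": {"op3"},
}

print(utilization(pack(independent)))
print(utilization(pack(chain)))
```

In this model the independent ops fill a single bundle, while the dependent chain needs five bundles with four of the five slots empty in each. That gap is the kind of underutilization described above.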


327 Comments

  • Ryan Smith - Wednesday, September 23, 2009 - link

    The load temp is the same as a single card.
  • ilnot1 - Wednesday, September 23, 2009 - link

    Does anyone have a link to any review that compares 4850's, 4870's, and 4890's in Crossfire against the 5870 & 5870 CF setup?
  • T2k - Wednesday, September 23, 2009 - link

    FWIW: http://www.techpowerup.com/reviews/AMD/HD_5870_PCI...
  • T2k - Wednesday, September 23, 2009 - link

    Ehh, I meant: http://www.techpowerup.com/reviews/ATI/Radeon_HD_5...
  • ilnot1 - Wednesday, September 23, 2009 - link

    Thanks T2k, but the only cards that are in Crossfire in that review are the 58XX's. There are no other comparisons to cards in CF or SLI. Since Ryan included some of the most recent nVidia cards in SLI I was hoping to find the 48XX's in CF.
  • T2k - Thursday, September 24, 2009 - link

    Basically the rule of thumb seems to be that at 1920x1200 a single 5870 is still slightly slower than 4870X2 and probably slightly faster than a 4850X2 2GB.
    I own the latter so I will wait this time - either they lower the initial price of the 5870X2 or they release a 5850X2, otherwise I'll pass because single 5870 is simply OVERPRICED as it is already.

  • T2k - Wednesday, September 23, 2009 - link

    Seriously: we get a very nice technical background section - then you top it with this more than idiotic collection of games for testing, leaving out 4850X2 2GB, 5850, using TWO stupid CryEngine-based PoS from Crytek, the most un-optimized code producers or WoW, of which even you admit it's CPU-bounded but no CoD:WaW, no Clear Sky, no UT3 or rather a single current Unreal Engine-based game?

    Benchmarking part is ALMOST WORTHLESS, the only useful info is that unless you go above 1920x1200 the 4870X2 pretty much owns 5870's @ss as of now.
  • Ryan Smith - Wednesday, September 23, 2009 - link

    For what it's worth, Batman: Arkham Asylum is UE3 engine based.
  • T2k - Wednesday, September 23, 2009 - link

    OK, I missed that (probably because I found the game shots ugly and became uninterested.)
    But how about ET:QW? Yes, it's not the best looking game but it is still popular, let alone World at War which is both great looking and crazy popular, let alone Clear Sky which is a very demanding DX10.1 game? Where is Fallout 3? Where is Modern Warfare?
    FFS the most demanding are the quick action-shooters and we, FPS players are the first one to upgrade to new cards...
  • Werelds - Thursday, September 24, 2009 - link

    How would ET:QW be a good benchmark? Last I checked, it's still limited to the 30 FPS animations, which makes running it at more than 30 FPS pointless because everything will look jerky.

    I agree something like the CoD games should be included for comparison's sake, but they're hardly a good benchmark or taxing on a system. QW does not fall into the same category though, it has a smaller active playerbase than even L4D which lost a lot of players due to the lack of updates.
