Rambus in Cell

Rambus just proudly announced that their XDR memory interface will be used in the elusive Cell processor, being announced today at the International Solid-State Circuits Conference (ISSCC) in San Francisco.

There's not much surprise that Rambus was selected to be involved with the Cell project, given their previous history with Sony and the PlayStation 2, as well as their ability to deliver extremely high bandwidth memory devices on very low pin counts.  Sony and Toshiba also signed a licensing agreement with Rambus back at the start of 2003 to work on the Cell project.

For years, Rambus has been telling us that they've been working with GPU manufacturers on getting their high-bandwidth designs into future GPU architectures, and this design win with Sony may just be the key to getting XDR onto PC graphics cards as well - especially since NVIDIA is handling GPU design for the PlayStation 3.

The other interesting part of Rambus' announcement is that they are also responsible for the Cell processor's I/O interfaces - its connection to the outside world (or to other Cell processors).  Rambus has had a serial processor bus interface in their IP repertoire for quite some time now, called FlexIO, and it is being used as the processor interface standard for Cell.

FlexIO implements two very important features - what Rambus calls FlexPhase, and DRSL (Differential Rambus Signaling Level).  Normally, when traces (wires on a PCB) are laid out, they have to be arranged in such a way that all of the traces going to the same chip have equivalent lengths.  As buses get wider and board designs become more complex, trace routing becomes a very serious engineering problem.  Because of the need to match trace lengths, you'll often see traces wrapped around themselves or laid out in artificially long paths to make sure that the signals they carry don't arrive sooner than they should.  FlexPhase is a technology that allows for on-chip data and clock alignment for signals that don't all arrive at the same time, removing the requirement that traces be matched in length on the PCB.  FlexPhase does introduce some added latency, since the chip must compensate for clock and data signals that arrive out of phase, but the idea is that what you lose in latency due to FlexPhase, you make up for in design simplicity, potentially allowing for higher data rates.
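
To make that idea concrete, here is a minimal sketch in Python of how a receiver might deskew unmatched lanes, assuming a simple training-pattern calibration. The pattern, the method, and every name below are illustrative assumptions on our part, not Rambus' actual FlexPhase circuit: each lane's arrival skew is measured once against a known pattern, and the early lanes are then delayed to line up with the slowest one.

```python
# A minimal sketch of per-lane deskew, assuming a simple training-pattern
# calibration; the pattern, method, and names are illustrative guesses,
# not Rambus' actual FlexPhase design.

TRAINING_PATTERN = [0, 1, 0, 1, 1, 0, 1, 0]

def measure_skew(lane_bits, pattern=TRAINING_PATTERN):
    """Find the shift (in bit times) where this lane best matches the
    known training pattern, i.e. how late its data arrives."""
    scores = [
        sum(a == b for a, b in zip(lane_bits[s:s + len(pattern)], pattern))
        for s in range(len(lane_bits) - len(pattern) + 1)
    ]
    return scores.index(max(scores))

def align_lanes(lanes, skews):
    """Delay the early lanes so every lane lines up with the slowest one;
    the slowest lane sets the latency penalty mentioned above."""
    max_skew = max(skews)
    return [lane[s:len(lane) - (max_skew - s)] for lane, s in zip(lanes, skews)]

# Two lanes carrying the same pattern; the second trace is longer, so its
# bits show up two bit-times later than the first lane's.
lane_short = [0] * 1 + TRAINING_PATTERN + [0] * 3
lane_long  = [0] * 3 + TRAINING_PATTERN + [0] * 1

skews = [measure_skew(lane_short), measure_skew(lane_long)]  # -> [1, 3]
aligned = align_lanes([lane_short, lane_long], skews)
assert aligned[0] == aligned[1]  # both lanes now sample identically
```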

The next technology that FlexIO enables is DRSL with LVDS (Low Voltage Differential Signaling), which is similar to a technology Intel uses in the Pentium 4 to reduce the power consumption of its high-speed ALUs.  We will explain the technology in greater detail later this week in unrelated coverage, but the basic idea is as follows: normally, the lower the voltage at which you run your interfaces, the more difficult it becomes to distinguish an electrical "high" from an electrical "low."  It is quite easy to tell a 5V signal from a 0V signal, but telling a 0.9V signal from a 0V signal is much more difficult.  DRSL instead runs two voltage lines with a very small voltage difference between them and uses that difference for signaling.  By using low signal voltages, you can ensure that even though you may have a high speed bus, power consumption is kept to a minimum.  The technology isn't quite sophisticated enough to make the transition to the mobile world, but with some additional circuitry to dynamically enable/disable interface pins, it would be quite easy to apply FlexIO to mobile applications of the Cell architecture.
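
As a rough picture of why a differential scheme tolerates such low voltages, consider this toy Python model. The 0.2V swing and the noise magnitude are made-up demo values, not DRSL's actual electrical spec: noise that couples equally onto both wires cancels out the moment the receiver takes the difference.

```python
import random

# Toy model of differential signaling in the spirit of DRSL/LVDS; the
# voltage swing and noise magnitude are hypothetical demo values.
SWING = 0.2  # assumed 200mV swing between the two lines

def drive(bit):
    """Encode one bit as a (line_p, line_n) voltage pair."""
    return (SWING, 0.0) if bit else (0.0, SWING)

def receive(line_p, line_n):
    """Recover the bit from the sign of the difference between the lines."""
    return 1 if (line_p - line_n) > 0 else 0

errors = 0
for _ in range(10_000):
    bit = random.randint(0, 1)
    p, n = drive(bit)
    # Common-mode noise hits both wires equally -- even noise larger than
    # the swing itself cancels out in the difference.
    noise = random.gauss(0, 0.5)
    if receive(p + noise, n + noise) != bit:
        errors += 1

print(errors)  # 0 - common-mode noise never flips a decoded bit
```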

The culmination of these two features is that FlexIO offers up to 8.0GHz data rates based on a 400 - 800MHz interface clock.  It is worth noting that multiplying a few hundred megahertz up to such high per-pin data rates inherently requires some pretty sophisticated clocking technology to implement.
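
A quick back-of-envelope on those figures shows why. At 8.0Gbps per pin, the interface must serialize a double-digit number of bits on each cycle of the 400 - 800MHz clock; the multiplier below is our arithmetic on the quoted numbers, not a published Rambus spec.

```python
# Back-of-envelope check on the quoted figures; the bits-per-clock
# multiplier is derived from the article's numbers, not a Rambus spec.
data_rate_gbps = 8.0
for clock_mhz in (400, 800):
    bits_per_clock = data_rate_gbps * 1000 / clock_mhz
    print(f"{clock_mhz}MHz clock -> {bits_per_clock:.0f} bits per pin, per clock")

# 400MHz clock -> 20 bits per pin, per clock
# 800MHz clock -> 10 bits per pin, per clock
```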

Because Rambus is providing both the memory and processor I/O interfaces for Cell, it's not too surprising that 90% of the Cell's signaling pins use Rambus interfaces.  On any modern day microprocessor, the biggest use of signaling pins goes to things like multiprocessor support, a chipset interface, and a memory interface (obviously varying based on the type of processor we're talking about) - so Rambus' statistic makes sense.

There are still some unanswered questions - mainly whether or not FlexIO will be used to interface with NVIDIA's graphics core (which we're guessing it will) and whether or not XDR will be used for the GPU's local memory (which we're also guessing it will).  Given the negative impression of Rambus amongst PC enthusiasts, a successful implementation in PS3 and with NVIDIA's GPU could mean a virtual second chance for Rambus in the PC market.

Comments

  • Idoxash - Monday, April 11, 2005 - link

    I'm glad to see RAMBUS back in the fight :) Maybe this time they will prove themselves to everyone and show who has the better tech that's not dino old.

    --Idoxash--
  • Viditor - Monday, February 14, 2005 - link

    Derek - RDRAM is higher latency than SDRAM for anything less than 800MHz (which was essentially unavailable due to poor yields). Also, DDR had far lower latency than even the 800MHz...
    So, yes...the RDRAM roadmap would have been bad for AMD.
    Even with the on-die memory controller, latency appears to be far more critical to Athlon's performance than bandwidth (and of course the opposite is true for Netburst chips).
  • ceefka - Monday, February 14, 2005 - link

    #17 Would Rambus dare to sue Sony? That'll be the day.
  • srg - Monday, February 14, 2005 - link

    At first, I just wasn't impressed with Cell, but now with Rambus on the case, I'm positively against it.

    srg
  • DerekWilson - Monday, February 14, 2005 - link

    Though the discussion is probably over by now, I'm gonna add my two cents and say that AMD would definitely have benefited from RDRAM at the time. When it came out it was higher bandwidth and lower latency than PC100 or PC133. Offering those two things at a time when AMD still relied on the northbridge as a memory interface would have increased performance -- it's one more link in the chain that's stronger (or faster and wider as the case may be).

    If you don't think I'm right, look at nForce2: improved bandwidth to the system increased performance. At the time, latency was also a larger issue, but now that AMD has an integrated memory controller, it's not as much of a problem.

    Saying RDRAM would have been "bad" for AMD in terms of performance is probably not true. As far as business goes, AMD made a good decision not to support RDRAM.

    Going back to what many have said before, with RAMBUS, the technology was not the problem (it was very good) as much as their business philosophy (which was horrifically bad).

    But what about XDR + K10 ??

    It'll never happen (and thankfully so, if I do say so myself), and the bandwidth would likely be overkill for even the next AMD solution. Still, it's interesting to think about.
  • Viditor - Thursday, February 10, 2005 - link

    Thanks for the reply Jarred.

    I absolutely agree with you that RDRAM was a FAR better fit for the Netburst architecture (which is why AMD never embraced it; it would have been terrible for the Athlon architecture).
    On price however...IIRC, the price never came down until Intel began subsidizing it (I believe they spent ~$500 million doing so). The inherent problem wasn't market acceptance, it was:
    1. DDR is made on the same fab lines as SDRAM, and manufacturers could actually determine which kind of memory they wanted at the last stage of assembly
    2. RDRAM required all new testing equipment, while DDR could continue using SDRAM testing equipment
    3. The bin-splits for higher clocked RDRAM (800MHz) were extremely poor (~15% IIRC), and the lower clocked RDRAM wasn't as good as DDR.

    These are all the main reasons (IMHO) that Intel abandoned RDRAM, because from a business standpoint (all things being equal), RDRAM was perfect for them and bad for AMD.

    As for DDR2, yields are still a bit low while they ramp up. But I don't disagree that they are milking it...

    Your point on 1MB/1066 is well taken, and I was quite surprised that Intel went with the 2MB cache choice (a VERY expensive decision!). I can only assume that they have been running into production problems...
    All that said, I don't see Intel being very competitive on the performance side until next year (JMHO) when Conroe is released. My impression is that they are (wisely) pushing that release as hard as they can and I wouldn't be surprised if it's quite early.

    Cheers, mate!
  • retrospooty - Thursday, February 10, 2005 - link

    RAMBUS = CACA ;)
  • JarredWalton - Thursday, February 10, 2005 - link

    11 - Sorry to not get back to you earlier on this, Viditor. What I said about Rambus and Pentium 3 not going well together is very accurate. Forget the price for a minute. The P3 could only have something like 2 outstanding (unfulfilled) RAM requests at the same time. I think the chipsets could also only support 4 open banks of memory at a time, so the fact that RDRAM could support up to 32 open banks went completely unused.

    P4, on the other hand, could handle more open banks/pages, more outstanding requests, and it had deeper buffers. Up until the 875P chipset, none of the DDR chipsets were actually able to surpass 850E for performance - and even then not in all areas. If Intel had stuck with RDRAM, PC1200 and even PC1600 would have surfaced, and it would be interesting to see performance of a P4 system with PC1600 RDRAM instead of PC3200 DDR.

    If you look at historical price trends, once production of RDRAM ramped up, there was actually a brief period where it was slightly cheaper than DDR memory. Then Intel released DDR chipsets and abandoned RDRAM, demand for RDRAM dried up, and the prices climbed back up. Anyway, look at DDR2 and tell me that memory manufacturers aren't milking new technologies for all they can.

    Shoulda, coulda, woulda... I don't hold any ill will towards Rambus, and if they can actually design a product that noticeably outperforms competitors, more power to them! In reality, of course, caches and such make the memory subsystem less of an impact on performance in a lot of applications. That's why FSB1066 is not doing much for Intel right now: the only official support is with CPUs that have 2MB of cache. I think a 1MB (or 512K) cache with FSB1066 would show more of a benefit. Maybe not enough to make it truly worthwhile, but more than the 3% or so that we saw with the P4XE 3.46.
  • retrospooty - Thursday, February 10, 2005 - link

    ICE 9

    All roads still lead to Rambus? You ain't been around long, have you?

    As I said before... We have been hearing this for years. R&D and unreleased products mean nothing. Rambus is full of it, and cannot be believed until there is a shipping product, it's independently benchmarked, and it isn't 10x more expensive than the competition.

    It's one thing to have specs and partnerships; it's totally another thing to ship working product at a price that consumers will be able to buy in mass quantities.

    Rambus has proven inept at the latter.
  • Viditor - Thursday, February 10, 2005 - link

    Ice9 - Answer truthfully now, are you a Rambus shareholder? :-)
