FSB Impact on Performance

We've alluded to FSB bandwidth being a fundamental limitation in Intel's multiprocessor architecture, and now we're here to address the issue a bit further.
A major downside to Intel's reliance on an external North Bridge is that implementing multiple high-speed FSB interfaces becomes both very expensive and a difficult engineering problem once you grow beyond 2-way configurations. Unfortunately, Intel's solution isn't a very elegant one: regardless of whether you're running 1, 2 or 4 Xeon processors, they all share the same 64-bit FSB connection to the North Bridge.

The following diagram should help illustrate the bottleneck:

In the case of a 4-way Xeon MP system with a 400MHz FSB, the four processors share a single 3.2GB/s bus, so when all of them are contending for it, each processor can be offered only about 800MB/s of bandwidth to the North Bridge. If you try running a single-processor Pentium 4 3.0GHz with a 400MHz FSB, you'll note a significant performance decrease, and that's while still giving the processor a full 3.2GB/s of FSB bandwidth; cut that down to 800MB/s and the performance of the processor would suffer tremendously.
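
The arithmetic behind these figures can be sketched in a few lines of Python. This is a back-of-the-envelope model only: it assumes the bus is split evenly among processors that are all contending for it at once, and the function names are ours, chosen purely for illustration.

```python
def fsb_peak_bandwidth_mb_s(effective_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak FSB bandwidth in MB/s: transfers per second times bytes per transfer."""
    return effective_mhz * (bus_width_bits / 8)


def per_cpu_share_mb_s(effective_mhz: float, cpus: int) -> float:
    """Even split of the shared bus (assumption: all CPUs contend equally)."""
    return fsb_peak_bandwidth_mb_s(effective_mhz) / cpus


if __name__ == "__main__":
    # The 64-bit, 400MHz FSB discussed above, shared by 1, 2 or 4 CPUs.
    for cpus in (1, 2, 4):
        share = per_cpu_share_mb_s(400, cpus)
        print(f"{cpus}-way, 400MHz FSB: {share:.0f} MB/s per CPU")
```

In practice a lone busy processor could use more than its even share while the others sit idle, so the per-CPU number is best read as what each gets when every processor is hammering the bus at once.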

It is because of this limitation that Intel must rely on larger on-die L3 caches to hide the FSB bottleneck; the more information that can be stored locally in the Xeon's on-die cache, the less frequently the Xeon must request data over the heavily trafficked FSB.

What's even worse about this shared FSB is that the problem grows larger as you increase the number of CPUs and their clock speed. A 2-way Xeon system won't experience the negative effects of this FSB bottleneck as much as a 4-way Xeon MP, and a 4-way Xeon MP running at 3GHz will be hurting even more than a 4-way 2.0GHz Xeon MP. It's not a nice situation to be in, and there's nothing you can do to skirt the issue, which is where AMD's solution begins to look much more appealing.

First, remember that each Opteron has its own on-die North Bridge and memory controller, so there are no external chipsets to deal with. Each Opteron CPU features three point-to-point HyperTransport links, each delivering 3.2GB/s of bandwidth in each direction (6.4GB/s full duplex). The advantage is clear: as you scale the number of CPUs in an Opteron server, there are no FSB bottlenecks to worry about. Scalability on the Opteron is king, which is the result of designing the platform first and foremost for enterprise-level server applications.
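
To put the HyperTransport numbers next to the shared-FSB case, here is a small Python sketch using only the figures quoted in this article. It assumes every link runs at its full quoted rate and ignores topology, hop counts and memory-controller bandwidth, so treat it as an illustration of how the per-CPU totals scale rather than a performance model. The constant and function names are hypothetical.

```python
LINKS_PER_OPTERON = 3        # point-to-point HyperTransport links per CPU
LINK_GB_S_EACH_WAY = 3.2     # quoted bandwidth per link, per direction
SHARED_FSB_GB_S = 3.2        # 64-bit, 400MHz FSB shared by all Xeons


def opteron_per_cpu_link_gb_s() -> float:
    """Full-duplex link bandwidth per Opteron; does not depend on CPU count."""
    return LINKS_PER_OPTERON * LINK_GB_S_EACH_WAY * 2


def xeon_per_cpu_fsb_gb_s(cpus: int) -> float:
    """Even share of the single FSB (assumption: all CPUs contend equally)."""
    return SHARED_FSB_GB_S / cpus


if __name__ == "__main__":
    for cpus in (1, 2, 4):
        print(
            f"{cpus}-way: Xeon FSB share {xeon_per_cpu_fsb_gb_s(cpus):.1f} GB/s per CPU, "
            f"Opteron links {opteron_per_cpu_link_gb_s():.1f} GB/s per CPU"
        )
```

The two columns are not a like-for-like comparison, since the Xeon's FSB also carries memory traffic while the Opteron's memory controller is separate from its links. The point is only that one figure shrinks as processors are added and the other does not.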

Intel may be able to add 64-bit extensions to their Xeon MPs, but the performance bottlenecks that exist today will continue to plague the Xeon line until there's a fundamental architecture change.


58 Comments

  • Jason Clark - Wednesday, March 3, 2004

    Pumpkin, not really. My point is that we used a standard shipping Opteron system. I'm not questioning that Opterons support DDR400 or that if you wanted to "tweak" out a server (which is rarely done) you could. My point is that currently quad Opterons are shipping with DDR333 (what we tested). I'm sure (as I said) that down the road DDR400 will be a reality for the boxed/packaged systems, but obviously right now it is not. All 4 systems that were shipped to us came with DDR333, not DDR400.

    L8r
  • Jeff7181 - Wednesday, March 3, 2004

    I'm surprised nobody has speculated about who the corporation was that helped do the testing.

    I'll speculate that it was newegg.com =)
  • Jeff7181 - Wednesday, March 3, 2004

    I 2nd #15 motion for pics =)
  • DBBoy - Wednesday, March 3, 2004

    Taken from an article on the new 4MB L3 products.

    The new 3-GHz Xeon MP with 4 Mbytes of cache is listed by Intel as available for $3,692 each in quantities of 1,000.
  • Tessel8 - Wednesday, March 3, 2004

    Why do all of the benchmark results pages refer to "Potomac" as the 2-way Xeon 3.2GHz processor? This is absolutely not correct (maybe you are referring to Prestonia?).

    Ex. The results are split up into two categories: 2-way and 4-way setups. Remember that the 3.2GHz Potomac based Xeon is only available in 2-way configurations and is thus absent from the 4-way graphs.

    I believe the last paragraph on the last page is the only one referring to the correct Potomac processor.
  • Pumpkinierre - Wednesday, March 3, 2004

    #30 Jason, Your statement would be in conflict with your previous server comparison article(http://www.anandtech.com/IT/showdoc.html?i=1935&am...

    "Just recently, the x48 parts were launched, and with them, the Opteron gained support for DDR400 memory. Support for DDR400 has trickled down to all members of the Opteron family, but only certain revisions of the CPUs support DDR400"

    I certainly thought they released 4 new DDR400 Opterons late last year, covering all configs. At any rate it is the 2-way that is in question and you had 2-way 533MHz Xeons so, by rights, you should have used Opteron 248s, as this would be what a customer interested in this configuration would buy. The price of these is half again of the 848, making them even more attractive:

    http://www.amd.com/us-en/Corporate/VirtualPressRoo...

    You had two 248s in that last server article but again used DDR333. The photo on Pg 2 showed one of the Opterons as an "AM" revision which, you state in the article, qualifies for DDR400 support. Of course, if these CPUs, DDR400 Reg. modules or an enabled mobo were not on hand then it can't be helped and, as you say, the DDR333 setup still shows the Xeon memory structural problem.

    Sante
  • TrogdorJW - Tuesday, March 2, 2004

    Wow... given that the 533 FSB on the 2-way Xeons easily makes up for the difference in cache size, I'm amazed that Intel hasn't actually validated an 800 FSB Xeon solution. Then again, Intel is *SO* cautious with introducing advancements in technology, especially in the server/enterprise markets. Not only would they have to validate the faster CPU, but the motherboard and chipset validation would probably take them a year at least. (Who knows... they might be working on this as we speak.) Too bad the P4EE aren't dual-CPU capable (I think) - that would be interesting to see benchmarks. Not that any real corporation would dare to go that route, but still, interesting.

    It will be interesting to see what happens with the Nocona cores (and later Potomac). The 1 MB L2 cache can help out in desktop applications and more or less overcome the longer pipeline, but on Xeons where you're already running 2 MB L3 cache, I don't know that it will be as useful. Then again, the 800 FSB will probably more than make up for the deeper pipeline.

    Needless to say, Intel definitely has some work to do. I'm waiting for them to migrate the Pentium M (P6 core with improvements) back to the desktop. Heheheh....
  • lneves - Tuesday, March 2, 2004

    Can you guys share the "SQL Loader" benchmark tool and the scripts used?
    Thanks.
  • Jason Clark - Tuesday, March 2, 2004

    Grayswan, each proc had 1 gb as that is how it has to be configured.

    More thoughts on DDR400. After doing a bit more reading I've confirmed that most all quad opterons ship with ddr333 so our tests conformed to what was available at the time of testing. Testing something that isn't a standard shipping configuration doesn't help people making a buying decision now. Most all quad opterons won't be hand built by an organization, they will be ordered as complete systems. Maybe later on this year we'll see a shift to ddr 400 and we can run some numbers.

    Examples:
    http://www.swt.com/qo.html
    http://www.appro.com/product/server_4144h_2.asp
  • Grayswan - Tuesday, March 2, 2004

    What was the memory organization on the Opterons? All memory on 1 proc? 2 modules on each proc? Also, the 4-way Opteron diagram on P.3 shows each proc only using 2 interconnects. I believe all 3 are used, so the diagram should be "crossbar"ish.