FSB Impact on Performance

We've alluded to FSB bandwidth being a fundamental limitation of Intel's multiprocessor architecture; now we'll address the issue in more detail.
A major downside of Intel's reliance on an external North Bridge is that implementing multiple high-speed FSB interfaces is both expensive and a difficult engineering problem once you grow beyond 2-way configurations. Unfortunately, Intel's solution isn't an elegant one: whether you're running 1, 2 or 4 Xeon processors, they all share the same 64-bit FSB connection to the North Bridge.

The following diagram should help illustrate the bottleneck:

In the case of a 4-way Xeon MP system with a 400MHz FSB, the 3.2GB/s of bus bandwidth (8 bytes per transfer at 400 million transfers per second) is shared by all four processors, so each one can be offered a maximum of 800MB/s of bandwidth to the North Bridge. If you run a single-processor Pentium 4 3.0GHz on a 400MHz FSB instead of its usual 800MHz FSB, you'll note a significant performance decrease, and that's while still giving the processor the full 3.2GB/s of FSB bandwidth. Cut that down to 800MB/s and the processor's performance would suffer tremendously.
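The per-processor figure is simply the shared bus's peak bandwidth divided among the CPUs. A minimal sketch of that arithmetic, assuming an idealized even split with no protocol, arbitration or coherency overhead (the function names are hypothetical, chosen only for illustration):

```python
def fsb_peak_mb_s(effective_mhz: int, width_bits: int = 64) -> int:
    """Peak bus bandwidth in MB/s: million transfers per second times bytes per transfer."""
    return effective_mhz * width_bits // 8


def per_cpu_share_mb_s(effective_mhz: int, n_cpus: int) -> float:
    """Idealized share per CPU when every CPU contends equally for one shared bus."""
    return fsb_peak_mb_s(effective_mhz) / n_cpus


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n}-way on a 400MHz FSB: {per_cpu_share_mb_s(400, n):.0f} MB/s per CPU")
```

For a 400MHz FSB this gives 3200, 1600 and 800 MB/s per CPU for 1-, 2- and 4-way systems. These are peak figures, so they are upper bounds on what each processor actually sees.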

It is because of this limitation that Intel must rely on larger on-die L3 caches to hide the FSB bottleneck: the more data that can be kept locally in the Xeon's on-die cache, the less often the Xeon has to request data over the heavily trafficked FSB.
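A back-of-the-envelope traffic model shows why cache size matters so much here: every cache miss pulls a line across the FSB, so bus demand scales directly with the miss rate. The sketch below assumes a 64-byte line, counts only miss fills (no writebacks or coherency traffic) and uses a made-up request rate; none of these figures come from Intel, and the function name is hypothetical.

```python
LINE_BYTES = 64  # assumption: bytes fetched over the FSB per cache miss


def fsb_demand_mb_s(requests_per_s: float, miss_rate: float,
                    line_bytes: int = LINE_BYTES) -> float:
    """Bus traffic in MB/s generated by miss fills alone (writebacks ignored)."""
    return requests_per_s * miss_rate * line_bytes / 1e6


if __name__ == "__main__":
    share = 800.0  # per-CPU share of a 400MHz FSB in a 4-way system
    # Hypothetical workload: 400 million requests per second reaching the last-level cache.
    for miss_rate in (0.04, 0.02, 0.01):
        demand = fsb_demand_mb_s(400e6, miss_rate)
        verdict = "fits within" if demand <= share else "exceeds"
        print(f"miss rate {miss_rate:.0%}: {demand:.0f} MB/s ({verdict} the 800MB/s share)")
```

Under these assumptions, halving the miss rate from 4% to 2% drops demand from about 1024MB/s, which overruns the 800MB/s share, to about 512MB/s, which fits. That is the effect a larger L3 cache is meant to buy.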

What's even worse about this shared FSB is that the problem grows as you increase the number of CPUs and their clock speed, since a faster core consumes data faster while its share of the bus stays the same. A 2-way Xeon system won't feel the effects of this FSB bottleneck as much as a 4-way Xeon MP, and a 4-way Xeon MP running at 3GHz will be hurting even more than a 4-way 2.0GHz Xeon MP. It's not a nice situation to be in, and there's nothing you can do to skirt the issue, which is where AMD's solution begins to look much more appealing:

First, remember that each Opteron has its own on-die North Bridge and memory controller, so there is no external chipset to deal with. Each Opteron CPU features three point-to-point HyperTransport links, with every link delivering 3.2GB/s of bandwidth in each direction (6.4GB/s full duplex). The advantage is clear: as you scale the number of CPUs in an Opteron server, there is no shared FSB bottleneck to worry about. Scalability is king on the Opteron, the result of designing the platform first and foremost for enterprise-level server applications.
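A rough sketch of the scaling difference, using the link figures quoted above. It compares each CPU's own interface bandwidth: on the Xeon side the shared 3.2GB/s FSB divided among the CPUs, on the Opteron side the sum of its three HyperTransport links in one direction. Treat it as a trend illustration only, not a like-for-like comparison: the Opteron's memory traffic goes through its on-die controller rather than over HyperTransport, links may carry I/O as well as CPU-to-CPU traffic, and system topology is ignored. The function names are hypothetical.

```python
HT_LINKS_PER_CPU = 3
HT_LINK_MB_S = 3200   # per link, per direction, as quoted above
XEON_FSB_MB_S = 3200  # one 64-bit, 400MHz bus shared by every Xeon


def xeon_per_cpu_mb_s(n_cpus: int) -> float:
    # Every CPU contends for the same bus, so the share shrinks as CPUs are added.
    return XEON_FSB_MB_S / n_cpus


def opteron_per_cpu_mb_s(n_cpus: int) -> int:
    # Each CPU brings its own links, so adding CPUs does not dilute this figure;
    # n_cpus is accepted only to keep the two functions symmetric.
    del n_cpus
    return HT_LINKS_PER_CPU * HT_LINK_MB_S


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n}-way: Xeon {xeon_per_cpu_mb_s(n):.0f} MB/s, "
              f"Opteron {opteron_per_cpu_mb_s(n)} MB/s per CPU (one direction)")
```

The Xeon column falls from 3200 to 800MB/s as the system grows from one to four CPUs, while the Opteron column stays at 9600MB/s because every added processor brings its own links.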

Intel may be able to add 64-bit extensions to their Xeon MPs, but the performance bottlenecks that exist today will continue to plague the Xeon line until there's a fundamental architecture change.

58 Comments

  • Blackbrrd - Wednesday, March 3, 2004

    Hmm... the site below has some info about NUMA (non-uniform memory access), and it looks like the OS you're using isn't NUMA-enabled... Is this correct? Is there any real-world benefit from NUMA with Opteron?

    http://www.gamepc.com/labs/view_content.asp?id=opt...
  • zarjad - Wednesday, March 3, 2004

    Could you speculate which way the advantage would go in a BI benchmark (say, a TPC-H-type test)? These are long-running queries against gigabyte-sized tables.
  • Jason Clark - Wednesday, March 3, 2004

    We started playing around with a couple of MySQL benchmarks a few weeks ago, namely OSDB and some new multithreaded benchmarks from MySQL themselves. We're hoping to get some valid tests that produce real results in the future.

    Cheers.
  • Jason Clark - Wednesday, March 3, 2004

    In fact, we did some recent testing to start our 64-bit Linux testing, and MySQL 4.0.17 on SuSE 64 had a segmentation fault on startup ;) It's a known issue for MySQL as well... ;) ;)
  • Jason Clark - Wednesday, March 3, 2004

    Steveoc, it hardly runs like a dog. Let's not turn this into a one-sided OS war :) The tests make sense as they are, but a 64-bit article is on the books for later. We've already been playing around with SuSE 64-bit and some others, and whether you agree or not, 64-bit is still immature, period, full stop. Support is there, but it has some maturing to do.
  • steveoc - Wednesday, March 3, 2004

    All these tests show is that Opteron, running Windows, runs like a dog. As if we couldn't predict that result already...

    The tests will only make sense once you are running 64-bit Linux. In fact, I'd love to see a test of Dual Xeon + Win2003 + MSSQL vs Dual Opteron + 64-bit Gentoo + 64-bit MySQL... that would be very interesting indeed.

    For anyone out there claiming that '64bit software has a looong way to go', that is only true for Windows. Unix (and Linux) have been running 64-bit for a long time now, and AMD64 has very good support under Linux.
  • dweigert - Wednesday, March 3, 2004

    Seeing the difference depending on whether NUMA is used or not would be *VERY* interesting. Comparing against other NUMA-aware OSes (a Linux 2.6.3 or newer kernel, or whatever) would be a good test too.
  • hirschma - Wednesday, March 3, 2004

    #25 - Seems that it is not for sale to the general public, not that I could find. If anyone knows where/how to get one, please let me know.

    I have an application that is quite expensive and is licensed by the box, no matter how many CPUs it has ;) I'm guessing that building a low-end quad would give me more throughput per $$ than a second license/second box.

    Jonathan
  • Jason Clark - Wednesday, March 3, 2004

    We're also looking at some 64bit .NET benchmarks as we're real close to having a real-world application that we can hammer.
  • Jason Clark - Wednesday, March 3, 2004

    An interesting article would be the effect of NUMA on enterprise-level applications. GamePC did a bit of a write-up on it, but it was limited to desktop and synthetic benchmarks. Would any of you be interested in seeing the effects of NUMA on and off in the SQL tests?
