The Problem with Intel's Approach

The major issue with Intel's approach to dual core designs is that the two cores must compete with one another for bandwidth across Intel's 64-bit NetBurst FSB. To make matters worse, the x-series line of dual core CPUs is currently slated only for use with an 800MHz FSB rather than Intel's soon-to-be-announced 1066MHz FSB. The reduced bandwidth will hurt performance scalability, and we continue to wonder why Intel is reluctant to move more of their CPUs to the 1066MHz FSB, especially the dual core chips that clearly need it.

With only a 64-bit FSB running at 800MHz, a single x40 processor will have just 6.4GB/s of bandwidth to the rest of the system (8 bytes per transfer at 800 million transfers per second). That 6.4GB/s is fine for a single core CPU, but with two cores on one x40 the bandwidth requirements go up significantly while the available bandwidth stays the same.
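The FSB figures above follow from bus width multiplied by transfer rate. The short Python sketch below reproduces that arithmetic for the 800MHz and 1066MHz buses. It assumes the quoted FSB speeds are effective transfer rates (millions of transfers per second) and, as a deliberate simplification, that a saturated bus is split evenly between two cores; real contention depends on cache hit rates and access patterns, so the per-core numbers are an illustration, not a measurement.

```python
# Peak FSB bandwidth = bus width in bytes x effective transfer rate.
# Assumptions: "800MHz"/"1066MHz" are effective transfer rates (MT/s), and a
# saturated bus is shared evenly by the cores (a simplification).

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s (decimal units, 1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9

def per_core_share_gbs(total_gbs: float, cores: int) -> float:
    """Even split of a shared bus among cores that all want bandwidth at once."""
    return total_gbs / cores

for fsb in (800, 1066):
    total = peak_bandwidth_gbs(64, fsb)
    print(f"{fsb}MHz FSB: {total:.2f}GB/s total, "
          f"{per_core_share_gbs(total, 2):.2f}GB/s per core with two cores")
```

Run as-is, it prints 6.40GB/s total (3.20GB/s per core) for the 800MHz bus and 8.53GB/s total (4.26GB/s per core) for the 1066MHz bus.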

AMD's Strategy

While Intel's current roadmap appears to place dual core on the desktop before it makes its way to the enterprise (other than with Itanium), AMD's strategy is reversed - with dual core appearing in workstations and on servers before making a splash on the desktop.

Overall, AMD's approach simply makes more sense, since outside of a few specific applications and usage patterns, dual core will offer little performance benefit on the desktop. With most desktop applications still single threaded, dual core will have to wait for more application support before it is truly useful on the desktop. Heavy multitaskers and those running workstation applications will appreciate the benefits of dual core, but gamers and most other users will find higher clocked single core chips better suited to their needs.

The scenario is exactly the opposite in the workstation and server space, where applications already see huge benefits from multiple processors thanks to their multithreaded nature.
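One standard way to see why multithreaded workstation and server code benefits while single threaded desktop code does not is Amdahl's law, added here as an illustration rather than something from the original discussion: if a fraction p of the runtime can be spread across n cores, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below evaluates it for two cores. The workload fractions are hypothetical, and the model ignores the shared bus and memory bandwidth limits discussed on this page, so real gains would be lower.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime scales across cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical workloads: the fractions are illustrative, not measurements.
for name, p in [("single-threaded game", 0.0),
                ("lightly threaded desktop app", 0.5),
                ("multithreaded server workload", 0.95)]:
    print(f"{name}: {amdahl_speedup(p, 2):.2f}x on two cores")
```

With these made-up fractions it prints 1.00x, 1.33x and 1.90x respectively: a purely single threaded program gains nothing from the second core, no matter how fast that core is.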

When AMD mentions that their K8 architecture was designed for multicore operation from the start, they weren't lying. Each Socket-939 or Socket-940 K8 chip, whether it's an Athlon 64, Athlon 64 FX or Opteron, features three HyperTransport links (whether they are all operational is another question). In order to create a dual core version of a K8 based chip, you simply remove a single pair of HyperTransport PHYs, one from each chip, and fuse the two HyperTransport links together, thus creating a direct path of communication between the two cores capable of transmitting data at up to 8GB/s (at 1GHz).

Update: There is some debate as to how AMD implements dual core in their K8 architecture. The above description was provided by AMD in an earlier discussion, but many readers have emailed to point out that the two cores are connected at the SRQ (System Request Queue) level. We are awaiting official confirmation from AMD as to exactly how their dual core technology is implemented.

Update 2: While AMD never got back to us with an official response, unofficially they did confirm that the two cores on a single dual core Opteron die communicate at full speed and are not connected at the HT level. We apologize for the error.
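The SRQ-level arrangement that readers describe in the comments, quoting AMD's 2001 material, can be pictured as a small graph. The sketch below is a hypothetical, heavily simplified block diagram: the block names SRQ, XBar and MCT are taken from that reader-quoted description, while caches, the APIC and clock domains are omitted. A breadth-first search shows which blocks a request passes through. It illustrates the topology only and is not a model of AMD's actual implementation or timing.

```python
from collections import deque

# Hypothetical, simplified dual core K8 block diagram: both cores attach to one
# System Request Queue (SRQ), which feeds the crossbar (XBar); the XBar routes to
# the memory controller (MCT) and three HyperTransport links (HT0-HT2).
LINKS = [
    ("core0", "SRQ"), ("core1", "SRQ"),
    ("SRQ", "XBar"),
    ("XBar", "MCT"), ("XBar", "HT0"), ("XBar", "HT1"), ("XBar", "HT2"),
]

def build_graph(links):
    """Undirected adjacency sets from a list of (block, block) connections."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the blocks a request passes through, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

graph = build_graph(LINKS)
print(shortest_path(graph, "core0", "core1"))  # ['core0', 'SRQ', 'core1']
print(shortest_path(graph, "core0", "MCT"))    # ['core0', 'SRQ', 'XBar', 'MCT']
```

Under this simplified model, core-to-core traffic never touches a HyperTransport link, consistent with AMD's unofficial confirmation, while both cores reach memory through the same SRQ, crossbar and memory controller, which is why they end up sharing one pool of memory bandwidth.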

AMD's performance limitation here will be memory bandwidth, with the two K8 cores sharing the 128-bit DDR memory bus. While we currently don't see a huge performance increase from going from a single channel 64-bit interface to a 128-bit memory bus, the move to dual core will definitely make greater use of the available memory bandwidth.

AMD continues to list the second half of 2005 as the introduction timeframe for their dual core CPUs, with Opteron coming first, followed by Athlon 64 FX. As with all release dates, nothing is set in stone, but right now it looks like both AMD and Intel are planning to have dual core on the desktop in the same general timeframe.

AMD has yet to reveal the official specifications of their upcoming dual core desktop products, but based on roadmaps and what we've seen, the first dual core desktop parts appear to be based on two 90nm Athlon 64 FX cores with a shared memory controller. Internally, AMD is referring to this CPU as "Toledo", as we've previously reported.


59 Comments


  • Dasoo - Tuesday, November 2, 2004

    Has anyone heard anything about possible implications of the move to dual-core on memory? While I would guess that there would be little impact, I'm wondering if dual-core systems will use more memory or if dual-core will require different memory performance characteristics.

    Thanks
  • Speedo - Sunday, October 31, 2004

    #55, "Right...unless you also happen to be running another application. For example "Windows" with 26 services..."

    Yea, but do these services, in a normal gaming computer installation, really take enough CPU time to show an improvement in games?

    For example, taking a look at Task Manager right now shows I have 99% (or more) of CPU resources free.
  • dak - Monday, October 25, 2004

    #31, "Hang on -- to all those that say dual threads are crap -- what exactly are you running -- AMD 64 maybe? they'res no software that can take advantage of the 64 bit, so its virtually the same thing no? "

    Sorry mate, I've got 2 amd64 boxes running 64 bit. It's called Linux you dolt. Windows ain't the only thing going on out there. And I can't wait for dual core, it'll be great for source based linux distros like gentoo....
  • knitecrow - Monday, October 25, 2004

    The only people raving about dual-core equals double the performance would be Intel spin doctors and computer noobs.

    Having a multithreaded application is not a simple matter of a linear increase in programming time/skill/effort/debug/validation ... it’s a geometric increase.

    This makes multithreaded apps inherently expensive, with longer development cycles.

    Furthermore, poorly written multithread apps can run far worse than single threads.

    The windows OS is quite dumb when it comes to multithreading; while it may suffice for 2P and 4P... it becomes less appealing when you scale to 8P and outright useless after that. No wonder UNIX remains top choice for multiprocessor supercomputers.

    Please consider REALITY before raving about dual processor.

  • Audiophile1980 - Sunday, October 24, 2004

    "In a single threaded application, no they will not be any faster. In a game for example, two 3.2GHz cores will not be faster than a single 3.2GHz core."

    Right...unless you also happen to be running another application. For example "Windows" with 26 services...
  • eachus - Saturday, October 23, 2004

    "When AMD mentions that their K8 architecture was designed for multicore operation from the start, they weren't lying. Each Socket-939 or Socket-940 K8 chip, whether it's an Athlon 64, Athlon 64 FX or Opteron, features three Hyper Transport links (whether they are all operational is another question). In order to create a dual core version of a K8 based chip, you simply remove a single pair of Hyper Transport PHYs, one from each chip, and fuse the two Hyper Transport links together - thus creating a direct path of communication between the two cores, capable of transmitting data at up to 8GB/s (at 1GHz) between the two chips."

    That is not how AMD does it. Hammer chips have a cross-bar switch with connections to memory, Hypertransport links, and up to TWO CPU cores with cache. Dual core chips have one copy of the crossbar and memory controller, and both CPU cores connect to it. All done. The crossbar works at core-clock not memory speeds. The only case where the cross-bar could be a bottleneck is if both CPU cores have >50% cache hit rates on the other core's cache.
  • stephenbrooks - Saturday, October 23, 2004

    #27:

    --[There are several reasons why games aren't written multithreaded: 1. multithreaded apps have more overhead so they run slower on single CPU systems.]--

    I never said they'd use multiple threads when running on single CPU systems. There's a very simple call in Windows you can do to determine how many processors there are, and you can decide how many threads you produce based on that. For instance if you have to detect collisions with 400 objects, you could do 100 in each of 4 threads, or 200 in 2 or 400 in the original thread.

    --[2. most gaming systems are single CPU.]--

    Yes, _right now_. If we end up having 4 or 8-core chips by 2010, single-threaded games are going to look rather silly.

    --[3. the threads need to communicate with each other to get the frames drawn. Since the threads have critical sections, running them on a single CPU will make the critical sections que up causing major lag and drop in framerate.]--

    The game would scale down to 1 thread on a 1 CPU (non-HT) system.

    I think the main problem is that since there aren't many multi-processor SMP systems out there, developers just think in terms of one thread all the time. It will take dual-cores etc. becoming commonplace to change that.

    Finally, will everyone who assumes "different threads have to be doing qualitatively different things" please stop it? That's complete pants. Often you get the best (near-linear) scaling when you just have a lot of repetitive (non-mutually-relying) things to do and you can split them equally between a thread for each CPU.

    It's certainly true that _when no apps are multiprocessor-aware_ the different threads you have will be doing different things, but when the programmers know about how many CPUs there are, it's a whole different ball game.
  • douglar - Saturday, October 23, 2004

    From aces hardware--

    "According to AMD documentation, both cores in a dual-core chip are connected to one shared SRQ (System Request Queue). The SRQ has ports for CPU0 and CPU1. The links between the two cores and the SRQ runs at core frequency with 64-bit data paths. The SRQ is connected to the XBar (crossbar) which handles signal routing between the SRQ, MCT (Memory controller) and up to three HyperTransport Links. The SRQ is also connected to a APIC (Advanced Priority Interrupt Controller) that services both cores (dual Int ports).

    The important thing here is that the cores are connected before the crossbar, not after it, as Anand suggests. Hence the cores in a dual-core chip will share all the HyperTransport links and the memory controller.

    See slide 26 of Fred Weber's MPF Presentation, 2001:

    http://www.amd.com/us-en/assets/content_type/Downl... "

  • Briggsy - Saturday, October 23, 2004

    The following is complete and utter bullcrap (from page 2): "Each Socket-939 or Socket-940 K8 chip, whether it's an Athlon 64, Athlon 64 FX or Opteron, features three Hyper Transport links (whether they are all operational is another question). In order to create a dual core version of a K8 based chip, you simply remove a single pair of Hyper Transport PHYs, one from each chip, and fuse the two Hyper Transport links together - thus creating a direct path of communication between the two cores, capable of transmitting data at up to 8GB/s (at 1GHz) between the two chips."

    NO NO NO NO NO NO NO

    This has been described in detail by AMD since 2001. AMD DO NOT 'fuse' together two Hypertransport links to make a dual core processor.

    AMD's processor incorporates an integrated Northbridge, which is a crossbar that attaches to the memory controller, hypertransport controllers, and the processor interface, called the SysReq. The SysReq can connect to TWO cores, and this was designed as a capability from the very beginning. AMD's dual core simply adds another CPU core and attaches it to the currently unused port on the SysReq.

    If you get a simple, well explained, detail like that wrong, I can only assume the rest of the article isn't very reliable either.
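Reader stephenbrooks' suggestion in the thread above, sizing the number of threads to the number of processors and splitting a batch of 400 collision checks evenly among them, can be sketched as follows. This is a minimal Python illustration, not code from any game: `check_collisions` is a hypothetical stand-in for real per-object work (here it just counts objects at or past x = 100), and `os.cpu_count()` plays the role of the "how many processors" query. Note that CPython's global interpreter lock keeps pure-Python threads from running CPU-bound work in parallel, so a real speedup would need `ProcessPoolExecutor` or a language without a GIL; the point here is the partitioning.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def split_evenly(n_items: int, n_workers: int) -> list[range]:
    """Divide indices 0..n_items-1 into n_workers contiguous, nearly equal chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(range(start, start + size))
        start += size
    return chunks

def check_collisions(positions, index_range):
    """Hypothetical per-chunk work: count objects at or beyond a wall at x = 100."""
    return sum(1 for i in index_range if positions[i] >= 100)

def parallel_collisions(positions, workers=None):
    """One chunk per worker; falls back to the calling thread when only one CPU."""
    workers = workers or os.cpu_count() or 1
    chunks = split_evenly(len(positions), workers)
    if workers == 1:
        return check_collisions(positions, chunks[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: check_collisions(positions, r), chunks))
```

For example, `split_evenly(400, 4)` yields four chunks of 100, `split_evenly(400, 3)` yields chunks of 134, 133 and 133, and `parallel_collisions(list(range(400)), workers=4)` returns 300, the same answer as the single-threaded path.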
