Introduction

In our first article, we explained how dynamic power, leakage power, the memory wall and wire delay have forced CPU designers to rethink the methods that they use to achieve higher performance CPUs.

In Part 2, we will investigate the advantages and disadvantages of the new market trend: multi-core CPUs. Will dual core enhance your gaming experience? Tim Sweeney, the lead developer behind the Unreal 3 engine, was kind enough to give concise answers to our questions about multi-threaded development. There is more: in the third part of this series, we will investigate what future multi-core and single core architectures will bring, examine whether the stories about "the new era of multi-threaded, multi-core CPUs" are true, and ask whether or not this will really benefit the consumer.

Should you care?

Should you care whether or not we are moving to multi-core and multi-threaded CPUs? After all, for the past decades, we have consistently been getting more performance for lower prices. It is far from clear, however, that multi-core will benefit all consumers; we will explain why in more detail below. Last spring's IDF was all about multi-core CPUs, but there was very little information on how this is going to benefit the consumer. Let us take a critical look at this new direction that desktop CPUs have taken.

Multi-core, multi-expensive?

Dual cores are expensive to manufacture. Yields (the percentage of working chips on one wafer) are roughly inversely proportional to die size: larger, dual core chips will always have lower yields than smaller, single core chips on the same process technology. But that is only a small problem. A bigger and more obvious problem is that you get only half the number of chips per wafer (actually, even slightly less). So, dual cores built from two separate dies (such as Presler) cost at least twice as much to manufacture as a single core chip, and monolithic dual cores (such as Yonah and the Pentium-D) most likely cost even more.

Dual and multi-cores might not increase the thermal density (dissipated power per mm²), but they do increase the total power. Granted, from the viewpoint of a heat sink designer, it is not much harder to cool a 112 mm² Prescott chip that dissipates about 90 W than a theoretical 206 mm² Pentium-D that dissipates 180 W. However, making sure that those 180 W do not cook all the components inside the case is an almost impossible task for a system designer who wants to build a relatively silent PC. The result is that multi-core CPUs will run at lower clockspeeds than their single core counterparts: the Pentium-D, the dual core Prescott, is limited to 130 W and 3.2 GHz, while the current Prescott dissipates up to 115 W and runs at 3.8 GHz.

Last but not least, dual core CPUs need more bandwidth than a single core to make a difference, and they increase the "CPU perceived" latency: keeping the caches coherent and arbitrating access to the same memory bus both increase the total latency that each core sees, and thus lower performance.
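
To put a rough number on the manufacturing cost argument above, consider a simple, illustrative yield model: cost per good die is proportional to die area divided by yield, and yield falls off exponentially with area. The sketch below is ours, not a foundry model, and the defect density is an assumption chosen only for the sake of the example.

    // A minimal sketch of the die cost argument, assuming a simple
    // Poisson yield model: yield = exp(-defect density * area).
    // The defect density below is an illustrative assumption.
    public class DieCost {
        static double yield(double defectsPerMm2, double areaMm2) {
            return Math.exp(-defectsPerMm2 * areaMm2);
        }

        public static void main(String[] args) {
            double d = 0.005;       // assumed defects per mm^2
            double single = 112.0;  // Prescott-class single core, mm^2
            double dual = 206.0;    // theoretical monolithic dual core, mm^2

            // Cost per good die ~ area / yield: the dual core pays twice
            // for the silicon, and again for the extra yield loss.
            double ratio = (dual / yield(d, dual)) / (single / yield(d, single));
            System.out.printf("dual/single cost ratio: %.2f%n", ratio);
        }
    }

With these assumed figures, the monolithic dual core costs almost three times as much per good die, which is why "at least twice as much" is, if anything, optimistic.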

Multi-core, multi-performance?

In the server market, the advantages of multi-core and multi-threaded CPUs far outweigh the disadvantages. Because most server applications spawn many threads and processes, performance scales close to linearly as more cores are added to the die. This is in sharp contrast with the superscalar CPU, where increasingly complex designs require exponentially more transistors and power, yet show diminishing returns, especially in server applications where the IPC can drop below 1. And while dual core CPUs are more expensive to manufacture, they are far easier to design than turning a single core into an even wider-issue, more complex CPU. Development costs for a new CPU design are astronomically high. So, it does not surprise us at all that server CPU manufacturers have turned en masse towards multi-core designs: significant performance gains with a fraction of the time and money invested. The same can be said about a big part of the HPC market.

For a good example of how well server applications can scale with more CPUs, refer to our DB2 tests, which showed up to a 96% performance increase going from one CPU to two, and a boost of up to 89% when we increased the number of Opterons from two to four. Most desktop and many workstation applications are single-threaded, however. Or, more accurately, they might be multithreaded to be more responsive, but there is only one thread that really needs CPU power.
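
As a back-of-the-envelope check, that DB2 scaling is consistent with Amdahl's law, speedup(n) = 1 / ((1 - p) + p/n), for a parallel fraction p of roughly 0.98. The sketch below derives p from our measured dual CPU result; the formula is standard, but treating our benchmark as an ideal Amdahl workload is of course a simplification.

    // Back-of-the-envelope Amdahl's law check of the DB2 scaling above.
    public class Amdahl {
        static double speedup(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        public static void main(String[] args) {
            // Solving 1.96 = 1 / ((1 - p) + p/2) for p gives p = 2 * (1 - 1/1.96).
            double p = 2.0 * (1.0 - 1.0 / 1.96);  // ~0.98
            System.out.printf("parallel fraction p = %.3f%n", p);
            // Predicted gain going from two to four CPUs; we measured 1.89x.
            System.out.printf("predicted 2 -> 4 gain: %.2fx%n",
                              speedup(p, 4) / speedup(p, 2));
        }
    }

The predicted two-to-four gain of about 1.92x is close to the 89% that we actually measured.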

Even some workstation applications that are supposed to be prime examples of multi-threaded software are not as multi-core friendly as they appear to be. I ran a lot of Adobe Premiere benchmarks with different video formats, and found that the second CPU offered a meagre 10% to 40% speed increase in video editing (rendering). 3DSMax shows big increases only when you use very complex scenes; with a relatively light animation scene, the second CPU adds about 20% to 50%. One of the best cases, the architecture scene of the SPEC test, shows an 89% increase when adding a second Opteron, but going from two to four Opterons already shows diminishing returns: performance increased by only 72%.

Multitasking scenarios might be another way to use the power of dual and multi-cores. However, many of the CPU-heavy applications that desktop and workstation users like to run in the background - archiving, encoding - also hammer the hard disk. And despite the merits of NCQ (Native Command Queuing), higher rotation speeds, and lower seek times, disk-heavy tasks, especially multithreaded ones, can bring a whole system to a crawl when there is too much hard disk activity. So, it is clear that there are big challenges ahead before multi-core CPUs will really bring benefits to most consumers and employees.

Comments

  • ChronoReverse - Tuesday, March 15, 2005 - link

    Eh? 20% speed reduction? The dual-core sample in the new post was running at 2.4GHz (FX-53). Sure it's not FX-55 speeds but it's still faster than most everything.
  • kmmatney - Monday, March 14, 2005 - link

    edit - I just read some of the above posts. Yes, I agree that dual core can be more efficient than dual CPU. However, you have about a 20% reduction in core speed, which the dual core optimizations will have to overcome when compared to a single core CPU.
  • kmmatney - Monday, March 14, 2005 - link

    For starters, why would dual core be any different from dual CPU? One of the Quake games (Quake 3?) was able to make use of a second CPU, and the gain was very minimal. I'm not even sure Id bothered with dual CPU support for Doom 3. If everybody has dual core CPUs, then obviously more work would be done to make use of them, but we've had dual CPU motherboards for a long time already.
  • Verdant - Monday, March 14, 2005 - link

    there is no one (who has a clue) who doubts that an ever increasing number of cores will provide an ever increasing level of performance; in fact i would not be surprised if the MHz races of the 90s become the "number of cores" races of this decade.

    but i think the one line that really hit the nail on the head is the one about a lack of developer tools.

    writing a lower level multi-threaded application is extremely difficult. game developers aren't using tools like Java or C#, where it is a matter of enclosing a section of code in a synchronized/lock block, throwing in a few wait() calls and launching their new thread - the performance of these platforms just isn't there.

    for consideration - a basic 2 thread bounded buffer program in C is easily 200 lines of code, while it can easily be done in a language like C# in about 20.
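
    to illustrate, here is a rough sketch in Java (which has the synchronized/wait primitives i mentioned) of roughly the 20-line version i had in mind; the C equivalent with raw mutexes and condition variables balloons quickly:

        import java.util.ArrayDeque;
        import java.util.Deque;

        class BoundedBuffer<T> {
            private final Deque<T> items = new ArrayDeque<>();
            private final int capacity;

            BoundedBuffer(int capacity) { this.capacity = capacity; }

            synchronized void put(T item) throws InterruptedException {
                while (items.size() == capacity) wait(); // block while full
                items.addLast(item);
                notifyAll();                             // wake waiting consumers
            }

            synchronized T take() throws InterruptedException {
                while (items.isEmpty()) wait();          // block while empty
                T item = items.removeFirst();
                notifyAll();                             // wake waiting producers
                return item;
            }
        }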

    developers are going to need to either move to one of these new languages/platforms and take the performance hit, or develop a new specialized platform/language - otherwise they will most likely go bankrupt trying to keep up with the old tools.

    the other thing that may have some merit is a compiler that can generate multi-threaded code from single threaded code. however, to have any sort of real effect, it will need an enormous amount of research poured into it, as automatically working out which parts of a serial program can safely run in parallel is a huge AI problem. Intel's current compiler obviously is many years away from the sort of thing i am talking about.
  • Doormat - Monday, March 14, 2005 - link

    #20/#29:

    The AMD architecture is different from Intel's dual core architecture.

    AMD will have a separate HTT link between chips (phy layer only) for intercore communication, and a separate link to the memory arbiter/access unit.

    Whereas Intel (when they opt for two separate cores, two separate pieces of silicon) will have a link between the two processors, but it is a bus, not point-to-point, and it will also share that bus with all traffic out to the northbridge/MCH: memory traffic, non-DMA I/O traffic, etc.

    In other words, AMD has a dedicated intercore comm channel via HTT while Intel does not. This will affect heavily interconnected threads.
  • saratoga - Monday, March 14, 2005 - link

    "Unless you hit a power and/or heat output wall.

    Tell nVidia that parallell GPUs are bad, they alreay sell their SLI solution for dual-GPU computers."

    Multicore doesn't make much sense for GPUs because it's not cost effective, and because GPUs do not have the same problems as CPUs. With a GPU, you can just double the number of pipelines and your throughput more or less doubles (though bandwidth can be an issue here), and for a fraction of the cost of two discrete boards or two separate GPUs. That approach doesn't work well with CPUs, hence the interest in dual core CPUs.

    "Isn't a high IPC-count also a form of parallelism? If so, then beyond a certain count won't it be just as hard to take advantage of a high IPC-count."

    Yup. High IPC means you have a high degree of instruction level parallelism. Easily multithreaded code means you have a high degree of thread level parallelism. They each represent part of the parallelism in a piece of code/algorithm, etc.
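
    a toy illustration of the difference (made-up names and numbers, just to make the distinction concrete):

        public class Parallelism {
            static long work(long a, long b) { return a * b; }

            public static void main(String[] args) throws InterruptedException {
                long a = 3, b = 4, c = 5, d = 6;

                // ILP: no data dependency between these two statements, so a
                // superscalar core can issue both in the same cycle.
                long x = a * b;
                long y = c * d;

                // TLP: the same independence, expressed as threads that a
                // multi-core CPU can schedule on separate cores.
                Thread t1 = new Thread(() -> work(a, b));
                Thread t2 = new Thread(() -> work(c, d));
                t1.start(); t2.start();
                t1.join();  t2.join();

                System.out.println(x + " " + y);
            }
        }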
  • Fricardo - Monday, March 14, 2005 - link

    "While Dual core CPUs are more expensive to manufacture, they are far more easier to design than turning a single core CPU into an even more wider complex CPU issue."

    Nice grammer ;)

    Informative article though. Good work.
  • suryad - Monday, March 14, 2005 - link

    Dang...good thing I have not bought a new machine yet. I am going to stick with my Inspiron XPS Gen1 for a good 3-4 years, until my warranty runs out, before I run out and buy another top of the line laptop and a desktop.

    It will be extremely interesting how these things turn out. Things had been slowing down quite a lot on the technology front last year, but AMD with its FX line of processors was giving me hope...now dual cores...I want an 8 core AMD FX setup. I think beyond 8, the performance increases will be zip.

    I am sure that by the end of 2006, we will have experienced quite a massive paradigm shift, with multi-core systems and software taking advantage of them. I am sure the MS DirectX developers for WinFX or DirectX Next or WGF 1.0 or whatever the heck it is called are not going to sit on their thumbs and leave unfixed the overheads associated with the current Direct3D drivers, as mentioned in the article.

    Good stuff. And as far as threads versus processes go, I would take threads: lightweight...that's the main thing. Threading issues are a pain in the rear, but I am quite confident that problem will be taken care of sooner or later. Interesting stuff.

    Great article, by the way. Tim Sweeney seems quite humble for a guy with such knowhow. I wonder if Doom's next engine will be multithreaded; I am sure John Carmack is not going to let UE 3.0 steal all the limelight. What I would love to see is the next Splinter Cell game based on the UE 3.0 engine. I think that would be the bomb!!
  • stephenbrooks - Monday, March 14, 2005 - link

    In the conclusion - some possibly bad wording:

    --[The easiest part of multithreading is using threads that are running completely independent, that don't share any data. But this source of threading is probably already being used almost to the fullest.]--

    It'll still provide large performance increases when you go to multi-cores, though. You can't "already use" the concept of little-interacting threads when you don't have multiple cores to run them on! This is probably actually one of the more exciting increases we'll see from multi-core.

    The stuff that needs a lot of synchronising will necessarily be a bit of a compromise.
  • Matthew Daws - Monday, March 14, 2005 - link

    #26: I don't think that's true:

    http://www.anandtech.com/tradeshows/showdoc.aspx?i...

    This suggests (and I'm certain I've read this for a fact elsewhere) that each *core* has its own cache: this means that cache contention will still be an issue, as it is in dual-CPU systems. I'm not sure about the increased interconnection speed: it would certainly seem that this *should* increase, but I've also read that Intel's first dual-core chips, in particular, will be a real hack in this regard.

    In the future, sure, dual-core should be much better than dual-cpu.

    --Matt
