  • hzmonte - Tuesday, October 04, 2005 - link

    For everyone's convenience, here is Part 3:
  • bmayer - Friday, March 25, 2005 - link

    About the automatic parallelization:
    It can be fairly easy to do. I work on some Cray X1 and X1es. A little bit about the X1. The processors (called Multi-Streaming Processors (MSP)) are made up of 4 Single Streaming Processors (SSP). They are vector units with a length of 64 or 32 (can't remember, the point is they are good sized vector units). The processors are clocked at 800MHz for the X1 or 1.3GHz if you have an X1e.

    Ok so what do we see? These CPUs *suck* if your code is not vectorized and running in parallel. Guess what the Cray compiler does? Automatically vectorizes, streams (takes advantage of a full MSP instead of a single SSP), and parallelizes.

    They lay out very clearly what the conditions are where the compiler can NOT optimize, and give you directives where you can force it to do so. You can also get a listing of why it did not do a given optimization for any given line. Actually it gives you all information by default which combined with grep is nice.

    OK so there are different types of parallelism, and the one I have just talked about is different than what they are trying to do. I have been talking about speeding up the execution of some inner loop, which is very different from doing two different things at the same time (AI module and sound module running at the same time). BUT this can still be used to great effect. When the inner loops execute in half the time they take on a single core/CPU machine, we have more time to do other things, and thus see a speed improvement.
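
    The two kinds of parallelism contrasted above can be sketched in a few lines. This is only an illustration in Python, not Cray compiler output: `scale_chunk`, `update_ai` and `mix_sound` are hypothetical stand-ins, and in CPython the threads show the structure of the split rather than a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, factor):
    # Inner-loop work: the same operation applied to every element.
    return [x * factor for x in chunk]


def parallel_scale(data, factor, workers=2):
    # Loop-level (data) parallelism: one loop is split across workers.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(scale_chunk, chunks, [factor] * len(chunks)))
    return [x for part in parts for x in part]


def update_ai(state):
    # Hypothetical stand-in for an AI module.
    return state + 1


def mix_sound(samples):
    # Hypothetical stand-in for a sound module.
    return sum(samples)


def run_modules(state, samples):
    # Task-level parallelism: two unrelated jobs run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        ai = pool.submit(update_ai, state)
        sound = pool.submit(mix_sound, samples)
        return ai.result(), sound.result()
```

    `parallel_scale` is the "speed up the inner loop" case; `run_modules` is the "AI module and sound module at the same time" case.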

    I have thought that Sony/IBM should get in touch with Cray to supply compiler tech for the Cell processor. If the Cell is as easy to write parallel code for as the X1 is we will have some very awesome games, and clusters of PS3s.

    If you want to see a very nice overview of processor history and some of the crazy things people are proposing to do with the multi-cores check out

    I agree that parallel C++ is just not happening very well. There are languages like UPC which are starting to gain hold in the HPC market, which *could* find some use in the game market. But as the state of the art stands it is Fortran which is really great for automatically generating parallel code. But who could seriously say that someone outside of engineering writes code in Fortran?

    Great article, lots to think about!
  • blckgrffn - Friday, March 18, 2005 - link

    Loved it. All of it. Especially the interview with Sweeney - it is always nice to hear where the future *will* lie with regards to at least one major application/game. Now, just get an interview with Carmack, and I will be happy for a long time... :D

  • Caleb Jasin - Friday, March 18, 2005 - link


    Sorry for the late reply.

    And yeah pthreads are not threads really. They are processes. When you call the pthread_create() function you create a new process ;)
  • ravedave - Wednesday, March 16, 2005 - link

    Sorry to double post. You could easily make a benchmark that saw a 1000000000% speed increase. Take one application, give it high priority and have it loop for 5 days. It would lock everything else up. Throw in a second processor and you no longer have that problem, hence a huge speedup in the other processes. I don't trust any numbers from any manufacturer.
  • ravedave - Wednesday, March 16, 2005 - link

    Excellent article. Extremely excellent. I like the fact that you mentioned GUI updates, most people forget that almost all applications are multi-threaded as far as GUI/core go. I really think that Microsoft is on the right track with .NET though. I believe .NET 2 or 2.5 will really take multithreading to the next level.
  • RockHydra11 - Tuesday, March 15, 2005 - link

    My fear is that instead of creating new architectures for their processors to increase performance, they will just shove more cores on it and pass it off on people.
  • Verdant - Tuesday, March 15, 2005 - link


    i do see a shift to something like C# but anything that brings a "performance" hit, is likely to scare away developers, especially since on non-windows platforms the hit is pretty huge atm.

    you are right that a compiler probably isn't an answer, i was merely stating that if the industry was dedicated to creating a "deserializing" compiler it would be possible, extremely complex, and probably technically more than a "compiler" but still possible...

    also you are thinking of UNIX, the linux kernel has supported threads ever since i can remember, take a look at the pthreads and linuxthreads (glibc2) libraries
  • Caleb Jasin - Tuesday, March 15, 2005 - link


    Yeah agreed, from my own experience C# threading is much easier than threading in C. And I would say it is the same for most code. Developing in C# is generally much faster than in C or C++. And the tests I have seen show about a 10% performance hit between optimal C# and C++ code. So I think it is just a question of time before we see games coded primarily in C#. In the end, the time saved could be used to write more optimal code I guess, so maybe the performance hit would be negligible.

    However, I don't think that we will see compilers that are smart enough to multithread code any time soon. I wrote a very simple compiler for a very simple language in university and coding compilers is extremely complicated. As Tim says, they aren't threading game logic and that wouldn't make much sense either because there are too many dependencies. And even though threading takes a lot of time, there is a lot of relatively easily parallelizable code in games.

    Btw, there is a small error in the article. It says that Linux has thread support. It really doesn't. A thread in Linux is a process. There is no difference at kernel level between starting a new thread and forking a new process.
  • melgross - Tuesday, March 15, 2005 - link

    Don't forget that the idea behind these game engines is the reusability of the code. What I mean is that they will first tackle the problems that Sweeney thought most important, and easier, and then, one by one, the harder problems will be resolved. This might take years, but performance increases are always going to be appreciated. Competing products are always going to put pressure on each other.

    Ten years from now the discussion will be about how they accomplished all of this.

    While dual-cored GPUs have never been used, since that is just now becoming a viable technology, dual and quad GPUs have been used for many years now on the high end boards. Not the gamer boards that we see for $500 and below.
  • ChronoReverse - Tuesday, March 15, 2005 - link

    Eh? 20% speed reduction? The dual-core sample in the new post was running at 2.4GHz (FX-53). Sure it's not FX-55 speeds but it's still faster than most everything.
  • kmmatney - Monday, March 14, 2005 - link

    edit - I just read some of the above posts. Yes, I agree that dual core can be more efficient than dual cpu. However you have about a 20% reduction in core speed which the dual core optimizations will have to overcome, when compared to a single core cpu.
  • kmmatney - Monday, March 14, 2005 - link

    For starters, why would dual core be any different than dual cpu? One of the Quake games (quake 3?) was able to make use of a second cpu, and the gain was very minimal. I'm not even sure Id bothered with dual cpu use for Doom3. If everybody has dual core cpu's, then obviously more work would be done to make use of it, but we've had dual cpu motherboards for a long time already.
  • Verdant - Monday, March 14, 2005 - link

    there is no one who (has a clue) doubts that you will see an ever increasing level of cores provide an ever increasing level of performance, in fact i would not be surprised if the MHz races of the 90s become the "number of cores" races of this decade.

    but i think the one line that really hit the nail on the head is the one about a lack of developer tools.

    writing a lower level multi-threaded application is extremely difficult, game developers aren't using tools like java or c# where it is a matter of enclosing a section of code in a synchronized/lock block, throwing a few wait() calls in and launching their new thread. - the performance of these platforms just isn't there.

    for consideration - a basic 2 thread bounded buffer program in C is easily 200 lines of code, while it can easily be done in a language like C# in about 20.

    developers are going to need to either: move to one of these new languages/platforms and take the performance hit, develop a new specialized platform/language, or they will most likely go bankrupt with the old tools.

    the other thing that may have some merit - is a compiler that can generate multi-threaded code from single thread code, however to have any sort of real effect it will need to have an enormous amount of research poured into it, as automatically deciding un-serializable tasks is a huge AI task. Intel's current compiler obviously is many years away from the sort of thing i am talking about.
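
    As a rough illustration of the "two thread bounded buffer" mentioned above, here is a minimal sketch in a high-level language. It is written in Python rather than C# or Java, so treat it as a stand-in for the lock/wait pattern described, not as the code the line counts refer to. `BoundedBuffer` and `run_demo` are names made up for this example.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO shared by a producer and a consumer thread."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        # One condition variable guards the queue and signals state changes.
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            # Block while the buffer is full.
            while len(self.items) >= self.capacity:
                self.cond.wait()
            self.items.append(item)
            self.cond.notify_all()

    def get(self):
        with self.cond:
            # Block while the buffer is empty.
            while not self.items:
                self.cond.wait()
            item = self.items.popleft()
            self.cond.notify_all()
            return item


def run_demo(count=100, capacity=4):
    buf = BoundedBuffer(capacity)
    received = []

    def producer():
        for i in range(count):
            buf.put(i)

    def consumer():
        for _ in range(count):
            received.append(buf.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

    With one producer and one consumer the items come out in the order they went in; the `while` loops around `wait()` are what keep the buffer within its bounds.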
  • Doormat - Monday, March 14, 2005 - link


    The AMD architecture is different than Intel's dual core architecture.

    AMD will have a separate HTT link between chips (phy layer only) for intercore communication, and a separate link to the memory arbiter/access unit.

    Whereas Intel (when they opt for two separate cores, two separate pieces of silicon) will have a link between the two processors, but it is a bus, and not point-to-point, and it will also share that bus with all traffic out to the northbridge/mch. Memory traffic, non-DMA I/O traffic, etc.

    In other words, AMD has a dedicated intercore comm channel via HTT while Intel does not. This will affect heavily interconnected threads.
  • saratoga - Monday, March 14, 2005 - link

    "Unless you hit a power and/or heat output wall.

    Tell nVidia that parallel GPUs are bad, they already sell their SLI solution for dual-GPU computers."

    Multicore doesn't make much sense for GPUs because it's not cost effective, and because GPUs do not have the same problems as CPUs. With a GPU you can just double the number of pipelines and your throughput more or less doubles (though bandwidth can be an issue here), and for a fraction of the cost of two discrete boards or two separate GPUs. That approach doesn't work well with CPUs, hence the interest in dual core CPUs.

    "Isn't a high IPC-count also a form of parallelism? If so, then beyond a certain count won't it be just as hard to take advantage of a high IPC-count."

    Yup. High IPC means you have a high degree of instruction level parallelism. Easily multithreaded code means you have a high degree of thread level parallelism. They each represent part of the parallelism in a piece of code/algorithm, etc.
  • Fricardo - Monday, March 14, 2005 - link

    "While Dual core CPUs are more expensive to manufacture, they are far more easier to design than turning a single core CPU into an even more wider complex CPU issue."

    Nice grammer ;)

    Informative article though. Good work.
  • suryad - Monday, March 14, 2005 - link

    Dang...good thing I have not bought a new machine yet. I am going to stick with my Inspiron XPS Gen1 for a good 3-4 years when my warranty runs out before I go run out and buy another top of the line laptop and a desktop.

    It will be extremely interesting how these things turn out. Things had been slowing down quite a lot in the technology envelope front last year but AMD with its FX line of processors were giving me dual cores...I want an 8 cored AMD FX setup. I think beyond 8 the performance increases will be zip.

    I am sure by the end of 2006 we will have experienced quite a massive paradigm shift with multi cored systems and software taking advantage of it. I am sure the MS DirectX developers for WinFX or DirectX Next or WGF 1.0 or whatever the heck it is called are not going to be sitting on their thumbs and not fixing up the overheads associated as mentioned in the article with the current Direct3D drivers. So IMHO we are going to see a paradigm shift.

    Good stuff. And as far as threads over processes, I would take threads, lightweight...that's the main thing. Threading issues are a pain in the rear though but I am quite confident that problem will be taken care of sooner or later. Interesting stuff.

    Great article by the way. Tim Sweeney seems quite humble for a guy with such knowhow. I wonder if Doom's next engine will be multithreaded. John Carmack I am sure is not going to let the UE 3.0 steal all the limelight. What I would love to see is the next Splinter Cell game based on the UE 3.0 engine. I think that would be the bomb!!
  • stephenbrooks - Monday, March 14, 2005 - link

    In the conclusion - some possibly bad wording:

    --[The easiest part of multithreading is using threads that are running completely independent, that don't share any data. But this source of threading is probably already being used almost to the fullest.]--

    It'll still provide large performance increases when you go to multi-cores, though. You can't "already use" the concept of little-interacting threads when you don't have multiple cores to run them on! This is probably actually one of the more exciting increases we'll see from multi-core.

    The stuff that needs a lot of synchronising will necessarily be a bit of a compromise.
  • Matthew Daws - Monday, March 14, 2005 - link

    #26: I don't think that's true:

    This suggests (and I'm certain I've read this for a fact elsewhere) that each *core* has its own cache: this means that cache contention will still be an issue, as it is in dual-CPU systems. I'm not sure about the increased interconnection speed: it would certainly seem that this *should* increase, but I've also read that, in particular, Intel's first dual-core chips will be a real hack in regards to this.

    In the future, sure, dual-core should be much better than dual-cpu.

  • NullSubroutine - Monday, March 14, 2005 - link

    I think there are a few things that most people overlook when looking at multi-cpu/multi-core: almost all benchmarks that I have seen are written and tested on systems with clean installs, and have no other programs running (anti-virus, aim, msn, teamspeak, IRC, p2p software, firewall, decode human genome :b, etc). I would think that most people do leave many programs open, such as those above, when playing games.

    With this in mind, people will find an increase of system performance when leaving multiple programs running. It won't be an increase in performance for benchmark testbeds so much as an increase in real world performance.

    So basically it won't increase speed in these circumstances, but limit the decrease of fps while running many different programs.
  • fitten - Monday, March 14, 2005 - link

    Article: "Be warned that Intel was already showing performance increases, which are not realistic "up to 124%"."

    #5, there's another explanation as well, but it's a more rare condition. Suppose you had two processors (doesn't even have to be dual core), each with 1M L2 cache. Suppose you also had a problem that has data that is 1.5M in size and is very coarse grained (very parallelizable). One processor cannot fit all the data into L2 cache so it will have to run at main memory speeds most/all of the time. With two processors, each gets 768K, which can easily fit into its L2 cache, which enables each processor to run at L2 cache speeds. This would show up as a superlinear speedup (two cores = more than 2X as fast). This is an extreme example, but one I expect to find in published marketing propaganda.
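
    The cache argument above can be put into numbers with a deliberately crude model. The sketch below assumes an all-or-nothing cache (data either fits in L2 or every access goes to main memory) and made-up latencies of 5 ns and 100 ns; only the 1M cache and 1.5M data set sizes come from the example above.

```python
def run_time(working_set_kb, cache_kb, accesses, cache_ns=5.0, memory_ns=100.0):
    # All-or-nothing model: a working set that fits in L2 runs at cache speed,
    # one that does not fit runs at main-memory speed. Latencies are assumed.
    per_access = cache_ns if working_set_kb <= cache_kb else memory_ns
    return accesses * per_access


def speedup(total_kb=1536, cache_kb=1024, processors=2, accesses=1_000_000):
    # One processor handles the whole data set; with several processors
    # each one handles an equal share of the data and of the accesses.
    one = run_time(total_kb, cache_kb, accesses)
    many = run_time(total_kb / processors, cache_kb, accesses / processors)
    return one / many
```

    With the defaults, each of the two processors works on 768K, which fits in its 1M cache, so `speedup()` comes out well above 2. With a data set that already fits in one cache (for example `speedup(total_kb=512)`) the same model gives an ordinary 2x.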

    #13 " A though! I still think threads are rubbish, that processes and better schedulers are the way forward. "

    Well, with threads you get shared memory for "free", if you've ever written processes that use shared memory, well, there you are. However, since a threaded kernel and a process based kernel are pretty much the same when a process has only one thread, there's little difference between the two for single-threaded executables and you can continue to use your multi-process model without any problems.

    As with #17... like it or not, multi-core/multi-processor boxes are what's coming. You can choose to use what resources are available to you or you can stick to one process programming. Some groups will choose to use what resources are available and some won't. The marketplace will sort out the winners/losers based on which solution is better.

    #18 The PPU is just another form of multiprocessing (just like GPUs are). It's just Asymmetric Multiprocessing (AMP) instead of Symmetric Multiprocessing (SMP). It's not new or anything. I do agree, though, that the PPU has a lot of potential and, just out of my own preferences, goes by the idea that adding specialized hardware (cheaply) usually is a bigger win than adding more generalized hardware. Just think of graphics cards today. Adding a relatively cheap graphics card will make your game run much better/prettier than adding another P4 or Opteron.

    Basically, my thoughts are this: The gaming industry has already gone "multi-threaded" in an asymmetric way simply because of 3D video cards. They already have solved some problems by abstracting parts of their problem. This is simply adding more resources that they can take advantage of, or not, as they see fit. Having dual-core or dual processor systems doesn't prevent them from writing as they've done today. The main issue, for the short term, is that they will need to know whether or not they are on a dual core machine and write accordingly. The main reason that multithreaded games haven't really caught on as of yet is because 99% (or more) of the target audience has only one core. Spending the amount of time/effort to optimize for dual processors for less than 1% of your target market doesn't make sense. If 90% of the market had dual processors, then it would probably be worth the effort to plan to use the resources available. Since both major CPU houses are going dual core and it looks like that's the "way it's gonna be", there will be a rocky period for a while while dual core machines are rare, but they will get more common until the point where they are in the majority. At that time, it will make sense to consider single core machines as the degenerate case and, basically, make single cores the exception instead of the rule.
  • Calin - Monday, March 14, 2005 - link

    #20, a multicore implementation could have shared cache, and also have very fast inter processor communications. You could write a program with small interdependent threads that wait to end both and update parts of some common data. The data used stays in the common cache, and every update is made extremely fast.
    Compare this to a dual processor, that must maintain its caches in synchronization. After a fraction of a millisecond (or less) of work, the processors update different portions of the common data. And there goes: invalidation of cache lines, writing of modified cache lines to memory, the processors must fight for a single FSB (the case with Intel Pentium processors), and so on. You can see that there are some cases (even if somehow artificial) when a proper implementation of dual core can be much faster than multiprocessor.
    The best advantage the multicore will have over multiprocessor would be in numerical tasks like weather prediction, and other highly interdependent computation tasks.
  • hzmonte - Tuesday, October 04, 2005 - link

    "a multicore implementation could have shared cache and also have very fast inter processor communications... Compare this to a dual processor, that must maintain its caches in synchronization." Is this the real reason that multi-core multiprocessing is better than multi-chip multiprocessing (the traditional SMP)? A multi-core chip can have dedicated caches (per core) too, and that requires synchronization. And multi-chip SMP could also have shared cache and fast inter-chip communication. Well, you may argue that it is easier to make inter-core communication faster than inter-chip communication. But is this really the fundamental reason why multicore is better than multichip? Could someone explain why a processor manufacturer and a consumer would prefer making/buying a multicore than multichip processors? As far as power consumption and leakage are concerned, isn't it true that multichip is more manageable? In a paper "Planning Considerations for Multicore Processor Technology" by John Fruehe (May 2005) in, the author compares the effective performance level of multicore and multichip processors. (But he does not address my question.) Without giving a reason, he assumes that the core-to-core scalability is 70% (that is, the second core delivers 70% of its processing power due to overhead) whereas the estimated socket-to-socket (i.e. chip-to-chip) scalability is 80% (that is, the dual processors achieve 180% of the processing power of one). That is kind of interesting. I really want to see a comparison between multi-core multiprocessing vs. multi-chip multiprocessing.
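
    The two scalability figures quoted from the paper translate into relative throughput as follows. This is just the arithmetic from the comment above written out; `dual_throughput` is a name made up for the sketch, and nothing here says how the paper arrived at 70% and 80%.

```python
def dual_throughput(scalability):
    """Throughput of two units relative to one, when the second unit
    contributes only `scalability` of a full unit."""
    return 1.0 + scalability


CORE_TO_CORE = 0.70      # figure quoted for a second core on the same chip
SOCKET_TO_SOCKET = 0.80  # figure quoted for a second chip in another socket

dual_core = dual_throughput(CORE_TO_CORE)        # two cores, one socket
dual_socket = dual_throughput(SOCKET_TO_SOCKET)  # two single-core sockets
```

    Under these figures the dual-socket system comes out slightly ahead (about 1.8x against about 1.7x), which is the opposite of the shared-cache argument quoted at the top of the comment.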
  • ksherman - Monday, March 14, 2005 - link

    at80eighty, by sexy, I mean SCARY AS ALL FREAKIN REASON!!! ;-)
  • Calin - Monday, March 14, 2005 - link

    High IPC is not the form of parallelism from the article - the focus of the article was on running a process on two (or more) different cores. The idea is that high IPC profits all the programs, no matter how written. Multi thread is different - the idea is to have parts of a program that execute simultaneously but with very few interrelations. For example, you can have a thread to paint the interface in a game, while having another thread to paint the rest of the screen. The threads would have almost no correlations (except for sending commands).
    High IPC is not a solution in x86 world because the code tends to have dependencies close to each other, so you can start executing 100 instructions at a time, but 99 of them need to wait for the execution of one. You simply have those moments when all execution must wait for an instruction to end.
    EPIC (Itanium) will help with that, as the high IPC could be guaranteed by the instructions - at every clock you can execute one instruction that is equivalent to several x86 instructions. So, the performance would be the clock speed multiplied by an IPC of 3 or 4, unlike the Athlon (let's say) that has a performance generated by its larger clock speed multiplied by 1 IPC or something.
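
    The point about dependencies limiting IPC can be made concrete with a toy model. The sketch below assumes every instruction takes one cycle and the machine is infinitely wide, both simplifications that no real x86 or Itanium core matches; `ideal_ipc` is a made-up helper that divides the instruction count by the length of the longest dependency chain.

```python
def ideal_ipc(deps):
    """deps[i] lists the earlier instructions that instruction i waits for.

    Assumes unit latency and unlimited issue width (simplifications).
    """
    if not deps:
        return 0.0
    finish = []  # cycle in which each instruction completes
    for waits_on in deps:
        start = max((finish[j] for j in waits_on), default=0)
        finish.append(start + 1)
    # Instructions per cycle = work done / length of the critical path.
    return len(deps) / max(finish)


independent = [[] for _ in range(100)]            # nothing waits on anything
fan_out = [[]] + [[0] for _ in range(99)]         # 99 wait for one
chain = [[]] + [[i - 1] for i in range(1, 100)]   # each waits for the previous
```

    In this model the fully dependent `chain` is stuck at one instruction per cycle however wide the machine is, while the "99 wait for one" case still loses half of the ideal rate of the independent stream.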
  • Kensei - Monday, March 14, 2005 - link

    Wonderful article! I loved the "hardware meets software" focus of this piece. I've had many questions about the practicality of multi-threaded applications and this article answered many of them. Also, loved the interview with Sweeney.

  • bob661 - Monday, March 14, 2005 - link

    I am offended by the word "steam".
  • Matthew Daws - Monday, March 14, 2005 - link

    #20: MarriedMan - Yes, I think so. This is actually an interesting question. As I understand it, I think both AMD and Intel are using pretty much the same technology in both, so that communication channels on the motherboard (in the dual CPU case) will be replaced by communication channels on the CPU die. I think AMD's approach is better only because HT etc. lends itself to dual-core much better than Intel's older technology. I guess the next generation of dual-core chips might be somewhat different though. Anyone else know anything?
  • MarriedMan - Monday, March 14, 2005 - link

    I assume that when a program is multi-threaded to take advantage of dual core CPUs, it will automatically take advantage of dual CPU systems as well.

    Is that a correct assumption? Will the Unreal 3 engine use multiple single core CPUs on an MP system?
  • Pjotr - Monday, March 14, 2005 - link

    "dual-cored GPUs are stupid. given the parallel nature of graphics, it makes more sense to just add another pipeline at very little design cost."

    Unless you hit a power and/or heat output wall.

    Tell nVidia that parallel GPUs are bad, they already sell their SLI solution for dual-GPU computers.
  • silverwolf - Monday, March 14, 2005 - link

    PPU is the way to go.
  • defter - Monday, March 14, 2005 - link

    "Given the immense complexity involved, I expect dual cores taking a VERY VERY long time to catch on... even then it'll be a half assed job."

    Well ALL future consoles will use multi core CPUs. Thus if developers want to sell games, their games must take advantage of at least two cores :)
  • ceefka - Monday, March 14, 2005 - link

    #12 That's how it looks, for now. "No one will ever need more than 8 cores." :-D

    My dumb question, I was reading:

    "Tim clearly emphasizes that only parts of the application can be economically parallelized. Increasing parallelisation, using more threads, is simply not feasible. There is a pretty hard economic limit to TLP."

    Isn't a high IPC-count also a form of parallelism? If so, then beyond a certain count won't it be just as hard to take advantage of a high IPC-count.
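
    The quoted limit on thread-level parallelism is usually expressed with Amdahl's law, which neither the article quote nor this comment names, so it is brought in here as an outside formula. The sketch assumes, purely for illustration, that half of a frame's work can be spread over cores.

```python
def amdahl_speedup(parallel_fraction, cores):
    # Amdahl's law: the serial part runs at the same speed no matter
    # how many cores there are; only the parallel part is divided up.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # parallel_fraction=0.5 is an assumed value, not a measured one.
    for cores in (1, 2, 4, 8, 16):
        print(cores, round(amdahl_speedup(0.5, cores), 2))
```

    With half the work serial, the speedup can never reach 2x however many cores are added, which is one way to read the "hard economic limit" in the quote.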
  • sandorski - Monday, March 14, 2005 - link

    Good article. Most of it was over my head, but the gist was most important. That being that Multicore is a big question mark for Gamers and other common users.

    I've always preferred Sweeney over others in the industry, he knows what he's doing without getting in everybody else's face about it. I also found it appropriate that he was interviewed on the subject since Unreal Engines have always internally managed 100's of Processes in order to work (I assume other Engines do the same, but my knowledge of the Unreal Engines is more thorough than them). If anyone can figure out how to use Multicore in gaming my money's on Sweeney.
  • Calin - Monday, March 14, 2005 - link

    Auto-parallelization is of limited use, and it can work only on small pieces of code. You might get a couple of percent extra speed, but no more (or not much more). Managing multi-threaded, interdependent code would be a nightmare.
    However, some "extra" speed can be recovered in case of multi processors (or multi core) from the reduced state (thread) change. Certainly not extra 24%, but more than a bit.
  • FDSatyr - Monday, March 14, 2005 - link

    Good read - mainly for Tim's comments though! I really enjoy the way Tim isn't arrogant at all in the way he talks. Some fairly silly questions from A though! I still think threads are rubbish, that processes and better schedulers are the way forward. I think the next step - realistically impossible in the industry it may be - would be to create a fresh architecture, and put an x86/87 core on the same die. Ho-hum.
  • mkruer - Monday, March 14, 2005 - link

    My magic eightball says that after 4-8 cores, any other core that will be added will be near worthless.
  • AnandThenMan - Monday, March 14, 2005 - link

    Anand got owned by Tim Sweeney.

    AnandTech: Did you make use of auto-parallelisation compiler technology...
    Tim Sweeney: Auto-parallelization of C++ code is not a serious notion.

    Good article though.
  • overclockingoodness - Monday, March 14, 2005 - link

    Oh my God - the Unreal 3 engine is beautiful. I get amazed every time I look at it. I can't wait for games to be featured on Unreal 3 and the like engines in the future. So fine...
  • at80eighty - Monday, March 14, 2005 - link

    lame jokes aside... i agree... that is some serious graphics... im gonna bust a nut or two to have a machine running a game like that at full steam : (
  • at80eighty - Monday, March 14, 2005 - link


    DEFINE 'sexy' ?

    : )
  • knitecrow - Monday, March 14, 2005 - link

    dual-cored GPUs are stupid

    given the parallel nature of graphics, it makes more sense to just add another pipeline at very little design cost.
  • xsilver - Monday, March 14, 2005 - link

    aren't dual cores also coming to GPUs --- would it be any easier to code for this? eg. one GPU can be assigned textures and the other GPU the lighting?

    multi cpu will be definitely hard to code for
  • tygrus - Monday, March 14, 2005 - link

    The 124% is misleading but can be explained as valid.
    Intel said 124% more frames per second for a single task from a group of three; not 124% faster for all three tasks. Having the two video encoding running on a separate CPU most of the time could allow the game to have 3x the CPU time while the two video encoding threads get slightly more time. These advantages still have to be reduced because of CPU speed reduction, Mem, disk IO and other bottlenecks. If the required CPU cycles for video encoding, games sound, IO control is almost the same then almost 100% of the extra CPU cycles can be devoted to speeding up the game. I'm sure Intel have the OS and software benchmark to prove it.
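
    The reasoning above can be checked with back-of-the-envelope arithmetic. The sketch assumes equal time slicing between the three tasks on a single core and a frame rate proportional to CPU time; neither assumption comes from Intel's benchmark, and `game_gain_percent` is a made-up helper. The 20% clock penalty is the figure mentioned elsewhere in this thread.

```python
def game_gain_percent(tasks=3, clock_ratio=1.0):
    """Extra game throughput (in percent) when the game gets a core to itself.

    Assumes equal time slicing among `tasks` on the single core and a
    dual-core chip clocked at `clock_ratio` times the single-core clock.
    """
    single_share = 1.0 / tasks  # game's slice of the lone core
    dual_share = clock_ratio    # game owns one (possibly slower) core
    return (dual_share / single_share - 1.0) * 100.0


ceiling = game_gain_percent()                      # no clock penalty
with_penalty = game_gain_percent(clock_ratio=0.8)  # 20% lower clock
```

    Under these assumptions the game could gain up to 200% with no clock penalty and about 140% with a 20% lower clock, so a claimed 124% for the game alone is inside the range this simple model allows.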
  • knitecrow - Monday, March 14, 2005 - link

    In summary, dual core products for the consumer market are hype.

    One thing I know, is that developers want to minimize development costs. Given the immense complexity involved, I expect dual cores taking a VERY VERY long time to catch on... even then it'll be a half assed job.

  • wien - Monday, March 14, 2005 - link

    Ah yes.. Great times for us programmers. I can't wait to start debugging those highly multithreaded applications.

    Thanks a lot Intel and AMD!
  • ksherman - Monday, March 14, 2005 - link

    That Unreal Engine is sexy, no?
  • Jynx980 - Monday, March 14, 2005 - link

    This will surely drive up the cost of games, not to mention that you would need a new cpu(s) to take full advantage. It's going to be a tough market to push.
