Unreal 3

[2] The new Unreal 3 engine is a state of the art game development framework for next-generation consoles and DirectX9 PC's, but what sparked our interest for this article was the fact that it is probably one of the first multithreaded game engines for the most popular game genre: first person shooters.

AnandTech: The new Unreal Engine 3 is designed for multi-threading, and will make good use of dual core CPUs available when games on the new engine come out. What parts of the game will benefit/be improved, thanks to multiprocessing? What will be the parts that will benefit the most?

Tim Sweeney: For multithreading optimizations, we're focusing on physics, animation updates, the renderer's scene traversal loop, sound updates, and content streaming.We are not attempting to multithread systems that are highly sequential and object-oriented, such as the gameplay.

Implementing a multithreaded system requires two to three times the development and testing effort of implementing a comparable non-multithreaded system, so it's vital that developers focus on self-contained systems that offer the highest effort-to-reward ratio.


AnandTech: What kind of performance improvement (rough estimate) do you expect from a dual core CPU compared to a single core CPU with the same core? (A few percents, a bit more than 10%, tens of percents?) In other words, will a gamer "feel" the difference between a dual core and single core or between a single and dual CPU system running an Unreal 3 engine based game?

Tim Sweeney: It's too early to talk numbers, but we certainly expect Unreal Engine 3 titles to see significant gains on multi-core platforms.

AnandTech: In the past years, games have typically depended more on GPU power than on CPU power (a mid-range CPU with a high end video card was/is faster than a high end CPU with a mid-range video card even at relatively low resolutions). Is the multithreaded nature of the Unreal 3 engine a sign that CPU performance is playing again a more important role in the gaming experience?

Tim Sweeney: Unreal Engine games have always been more CPU-intensive than the norm, for two reasons. First, we're always trying to push the leading edge with physics and other CPU-based features. Second, the Unreal Engine has a much more extensive gameplay scripting interface aimed at empowering mod authors and improving developer productivity by enabling safer and higher-level gameplay development. So we're not going to have any trouble keeping up with increases in CPU power.

Multi-core will be especially valuable because CPU performance scaling due to frequency improvements has tapered off over the past few years.

Clock speed has increased slowly, and real performance hasn't increased in proportion to clocks. But two cores have approximately twice the real aggregate performance as one core, so we're about to see a nonlinear improvement.

Finally, keep in mind that the Windows XP driver model for Direct3D is quite inefficient, to such an extent that in many applications, the OS and driver overhead associated with issuing Direct3D calls approaches 50% of available CPU cycles.Hiding this overhead will be one of the major immediate uses of multi-core.


AnandTech: Did you make use of auto-parallelisation compiler technology (like the auto parallelisation found in Intel C++ compiler) to make the engine multithreaded?

Tim Sweeney: Auto-parallelization of C++ code is not a serious notion. This falls in the same category as the Intel compiler's strip-mining optimizations and other such tricks, which are designed to speed up one particular loop in one particular SpecFP benchmark. These techniques applied to C/C++ programs are completely infeasible on the scale of real applications.

AnandTech: What about OpenMP?

Tim Sweeney: There are two parts to implementing multithreading in an application. The first part is launching the threads and handing data to them; the second part is making the appropriate portions of your 500,000-line codebase thread-safe. OpenMP solves only the first problem. But that's the easy part - any idiot can launch lots of threads and hand data to them. Writing thread-safe code is the far harder engineering problem and OpenMP doesn't help with that.

AnandTech: Programming multiple threads can be complex. Wasn't it very hard to deal with the typical problems of programming multithreaded such as deadlocks, racing and synchronization?

Tim Sweeney: Yes! These are hard problems, certainly not the kind of problems every game industry programmer is going to want to tackle. This is also why it's especially important to focus multithreading efforts on the self-contained and performance-critical subsystems in an engine that offer the most potential performance gain. You definitely don't want to execute your 150,000 lines of object-oriented gameplay logic across multiple threads - the combinatorical complexity of all of the interactions is beyond what a team can economically manage. But if you're looking at handing off physics calculations or animation updates to threads, that becomes a more tractable problem.

We also see middleware as one of the major cost-saving directions for the industry as software complexity increases. It's certainly not economical for hundreds of teams to write their own multithreaded game engines and tool sets. But if a handful of company write the core engines and tools, and hundreds of developers can reuse that work, then developers can focus more of their time and money on content and design, the areas that really set games apart.


AnandTech: The current OpenGL and DirectX are - AFAIK - not very well adapted to multithreaded programming. How did you solve this problem? Or wasn't it a problem at all?

Tim Sweeney: There is only one GPU in there, and though it is highly parallel at the pixel level, its execution is still serial on the granularity of state changes and triangle submission. So it is natural that the interface to the GPU remain single-threaded, and that part of one CPU thread be dedicated to submitting rendering commands.

Threads & Performance Threads & Gaming
Comments Locked

49 Comments

View All Comments

  • ChronoReverse - Tuesday, March 15, 2005 - link

    Eh? 20% speed reduction? The dual-core sample in the new post was running at 2.4GHz (FX-53). Sure it's not FX-55 speeds but it's still faster than most everything.
  • kmmatney - Monday, March 14, 2005 - link

    edit - I just read some of the above posts. Yes, I agree that dual core can be more efficent than dual cpu. However you have about a 20% reduction in core speed which the dual core optimizations will have to overcome, when compared to a single core cpu.
  • kmmatney - Monday, March 14, 2005 - link

    For starters, why would dual core be any different than dual cpu? One of the Quake games (quake 3?) was able to make use of a second cpu, and the gain was very minimal. I'm not even sure Id bothered with dual cpu use for Doom3. If everybody has dual core cpu's, then obviously more work would be done to make use of it, but we've had dual cpu motherboards for a long time already.
  • Verdant - Monday, March 14, 2005 - link

    there is no one who (has a clue) doubts that you will see an ever increasing level of cores provide an ever increasing level of performance, in fact i would not be surprised if the Mhz races of the 90s become the "number of core" races of this decade.

    but i think the one line that really hit the nail on the head is the one about a lack of developer tools.

    writting a lower level multi-threaded application is extremely difficult, game developers aren't using tools like java or c# where it is a matter of enclosing a section of code in a synchronized/lock block, throwing a few wait() calls in and launching their new thread. - the performance of these platforms just isn't there.

    for consideration - a basic 2 thread bounded buffer program in C is easily 200 lines of code, while it can easily be done in a language like C# in about 20.

    developers are going to need to either: move to one of these new languages/platforms and take the performance hit, develop a new specialized platform/language, or they will most likely go bankrupt with the old tools.

    the other thing that may have some merit - is a compiler that can generate multi-threaded code from single thread code, however to have any sort of real effect it will need to have an enormous amount of research poured into it, as automatically deciding un-serializable tasks is a huge AI task. Intel's current compiler obviously is many years away from the sort of thing i am talking about.
  • Doormat - Monday, March 14, 2005 - link

    #20/#29:

    The AMD architecture is different than Intels dual core architecture.

    AMD will have a seperate HTT link between chips (phy layer only) for intercore communication, and a seperate link to the memory arbitor/access unit.

    Whereas intel (when they opt for two seperate cores, two seperate pieces of silicon) will have a link between the two processors, but its is a bus, and not point-to-point, and also will share that bus with all traffic out to the northbridge/mch. Memory traffic, non-DMA I/O traffic, etc.

    In other words, AMD has a dedicated intercore comm channel via HTT while Intel does not. This will affect heavily interconnected threads.
  • saratoga - Monday, March 14, 2005 - link

    "Unless you hit a power and/or heat output wall.

    Tell nVidia that parallell GPUs are bad, they alreay sell their SLI solution for dual-GPU computers."

    Multicore doesn't make much sense for GPUs because its not cost effective, and because GPUs do not have the same problems as CPUs. With a GPU you can just double the number of pipelines and your throughput more or less doubles (though bandwidth can be an issue here), and for a fraction the cost of two discreet boards or two seperate GPUs. That approach doesn't work well with CPUs, hence the interest in dual core CPUs.

    "Isn't a high IPC-count also a form of parallelism? If so, then beyond a certain count won't it be just as hard to take advantage of a high IPC-count."

    Yup. High IPC means you have a high degree of instruction level parallelism. Easily multithreaded code means you have a high degree of thread level parallelism. They each represent part of the parallelism in a piece of code/algorythm, etc.
  • Fricardo - Monday, March 14, 2005 - link

    "While Dual core CPUs are more expensive to manufacture, they are far more easier to design than turning a single core CPU into an even more wider complex CPU issue."

    Nice grammer ;)

    Informative article though. Good work.
  • suryad - Monday, March 14, 2005 - link

    Dang...good thing I have not bought a new machine yet. I am going to stick with my Inspiron XPS Gen1 for a good 3-4 years when my warranty runs out before I go run out and by another top of the line laptop and a desktop.

    It will be extremely interesting how these things turn out. Things had been slowing down quiet a lot in the technology envelope front last year but AMD with its FX line of processors were giving me hope...now dual cores...I want an 8 cored AMD FX setup. I think beyond 8 the performance increases will be zip.

    I am sure by the end of 2006 we will have experienced quiet a massive paradigm shift with multi cored systems and software taking advantage of it. I am sure the MS DirectX developers for WinFX or DirectX Next or WGF 1.0 or whatever the heck it is called are not going to be sitting on their thumbs and not fixing up the overheads associated as mentioned in the article with the current Direct3D drivers. So IMHO we are going to see a paradigm shift.

    Good stuff. And as far as threads over processes, I would take threads, lightweight...thats the main thing. Threading issues are a pain in the rear though but I am quiet confident that problem will be taken care of sooner or later. Interesting stuff.

    Great article by the way. Tim Sweeney seems quiet humble for a guy with such knowhow. I wonder if Doom's next engine will be multithreaded. John Carmack i am sure is not going to let the UE 3.0 steal all the limelight. What I would love to see is the next Splinter Cell game based on the UE 3.0 engine. I think that would be the bomb!!
  • stephenbrooks - Monday, March 14, 2005 - link

    In the conclusion - some possibly bad wording:

    --[The easiest part of multithreading is using threads that are running completely independent, that don't share any data. But this source of threading is probably already being used almost to the fullest.]--

    It'll still provide large performance increases when you go to multi-cores, though. You can't "already use" the concept of little-interacting threads when you don't have multiple cores to run them on! This is probably actually one of the more exciting increases we'll see from multi-core.

    The stuff that needs a lot of synchronising will necessarily be a bit of a compromise.
  • Matthew Daws - Monday, March 14, 2005 - link

    #26: I don't think that's true:

    http://www.anandtech.com/tradeshows/showdoc.aspx?i...

    This suggests (and I'm certain I've read this for a fact elsewhere) that each *core* has it's own cache: this means that cache contention will still be an issue, as it is in dual-CPU systems. I'm not sure about the increased interconnection speed: it would certainly seem that this *should* increase, but I've also read that, in particular, Intel's first dual-core chips will be a real hack in regards to this.

    In the future, sure, dual-core should be much better than dual-cpu.

    --Matt

Log in

Don't have an account? Sign up now