Threading Models

There are four basic threading models available for computer programmers. Some of them are easier to implement than others, and some of them are more beneficial for certain types of work than others. The four threading models are:
  • Single Threading
  • Coarse Threading
  • Fine-Grained Threading
  • Hybrid Threading
Single threading is what everyone generally understands. You write a program as a sequential series of events that are executed. It's easy to do, and this is what most programmers learn in school or when they first start "hacking". Unfortunately, single threading is becoming obsolete for a lot of tasks. Simple programs that generally wait for user input or other events can still get by with taking this approach, but any application that depends on CPU computational power is going to miss out on a large amount of performance if it isn't designed to run in a multithreaded fashion. There's not much more to say about single threading.


The next step up from single threading is to look at coarse threading. Next to avoiding threads altogether, this is the simplest approach to multithreading. The idea is basically to find discrete systems of work that can be done separate from each other. The example we gave earlier of having two cooks working a single dish, with each person preparing a portion of the dish is an example of coarse threading. If one person is working on slicing up and grilling some chicken while a second person is preparing a Hollandaise sauce, you have two separate systems that can easily be accomplished at the same time.

A classic example of this in the computing world is an application that uses a client-server architecture. You would simply run the client code as one thread and the server code as a second thread, each would handle certain tasks, there would be a minimal amount of interaction between the two threads, and hopefully you would get a net increase in performance. This type of architecture is in fact what has allowed two of the current multithreaded games to begin to take advantage of multiple processors. Quake 4 and Call of Duty 2 both stem from id Software code that used a client-server design. The server handles such tasks as networking code, artificial intelligence and physics calculations, and perhaps a few other tasks related to maintaining the world state. Meanwhile the client handles such things as user input, graphics rendering, audio and some other tasks as well. (We might have the breakdown wrong, but it's the general idea we're interested in.)

Depending on the game, breaking up a client-server design into two separate threads can provide anywhere from ~0% to 20% or more performance. 0% - as in no gain at all? Unfortunately, as many of you are already aware, many games are bottlenecked by one specific area: graphics performance. If you reach the point where the graphics card is slowing everything else down, the CPU might have more than enough time available to run all of the other tasks while waiting for the graphics card to finish its work. If that's the case, having a faster CPU -- or even more CPU cores -- would provide little or no benefit in a coarse threaded client-server game. This is why Quake 4 and Call of Duty 2 see performance increases for enabling SMP support that start out high when you run at lower resolutions using fast graphics cards, but as the resolution increases and the GPU becomes the bottleneck, the benefit of SMP begins to disappear. Note also that both of these games only use two threads, so if you have something like a quad core processor or two dual core processors, you still aren't able to effectively use even half of the total computational resources available.

This is not to say that coarse threading can't be advantageous. If you have an application with two tasks that need to be computed, and both tasks take the same amount of time -- or nearly the same amount of time -- using a coarse threaded approach allows you to easily utilize all of the available CPU power. There are many tasks that fall into this category, although likewise there are plenty of tasks that do not fit this model. It is still a better solution than single threading in a multiprocessor world, but in many instances it's a baby step in the right direction.


Fine-grained threading takes the opposite approach of coarse threading. Instead of looking for entire systems that can be broken into separate threads, fine-grained threading looks at a single task and tries to break it into smaller pieces. A couple of the best examples of fine-grained threading providing a dramatic increase in performance are 3D rendering applications and video encoding applications. If you try to take a coarse threading approach to 3D rendering, you will usually end up with one system that handles the actual 3D rendering, and it will consume well over 90% of the application time. The same can be said of video encoding. Luckily, both tasks -- 3D rendering and video encoding -- are relatively simple to break up into smaller pieces. You simply take a divide and conquer approach, so if you have two processor cores you split the scene in half, or with four cores you split the scene into four pieces, etc. You can further optimize things by allowing any cores that finish early to begin working on a portion of one of the unfinished sections.

Thinking back to our kitchen analogy, what exactly is fine-grained threading? The best example we can come up with would be the preparation of a large quantity of food. Let's say you have a large barbecue where there will be hundreds of people present. Everyone wants their food to be warm, and hopefully the food doesn't get overcooked, undercooked, or any one of a number of other potential problems. If you're preparing something like hamburgers or steaks, cooking several at a time makes sense, but there's only so much one person can do. The answer is to add additional barbecues with additional cooks, so they can all be working on making hamburgers at the same time.

The problem with fine-grained threading is that not all tasks easily lend themselves to this approach. There are other types of foods that could be served at a barbecue where you might not need to have a bunch of chefs doing the same task, for example. It is definitely more difficult (overall) than coarse threading, but you can run into problems when you have a lot of work to be done but you don't know exactly how long each piece will take. Performance scaling and resource management also tend to become issues the more you split up a task - you wouldn't want 300 chefs, 300 barbeques, all cooking one hamburger each, right? You have to maintain a balance between spending time breaking a task into smaller pieces versus simply getting the work done. Typically, there is a minimum size beyond which it's not worth splitting up a task any further, and when you reach this point you just let all of the assigned pieces complete. Things like memory bandwidth can also become a performance bottleneck, so if you have a task that was really easy to break into pieces but each piece demanded a lot of memory bandwidth, you might discover that more than four pieces shows no performance improvements on current computer systems due to memory bandwidth limitations.

The net result is that fine-grained threading often has diminishing returns as you add more processors. With 3D rendering for example, having two processor cores is often about 90% faster than having only one processor core, whereas four processor cores is only about 80% faster than two cores, and eight cores might only be 70% faster than four cores. So if one processor takes 20 minutes to complete a 3D rendering task, two cores might be able to complete the task in 11 minutes, four cores might complete the task in 6 minutes, and eight cores could complete the task in only 3.5 minutes. The benefits of fine-grained threading will vary depending on the work that is being performed, and it can place a larger burden on the programmer to get things properly optimized, but the performance increases can still be substantial.


Hybrid threading really doesn't add much new to the table. As the name implies, this is a hybrid approach where you basically take both coarse and fine-grained threading and utilize them as appropriate to the task at hand. This requires the most work, as now you have to deal with synchronizing coarse system threads along with potentially individual threads within the systems. Hybrid threading will typically have more threads to worry about than either of the other approaches. The good news is that when done properly, hybrid threading can make maximum use of the available CPU power.

What Is Multithreading? Building a Threaded Game Engine
Comments Locked

55 Comments

View All Comments

  • primer - Tuesday, November 7, 2006 - link

    it's a nice article. i'm anxious to see what develops out of Valve in the near future.

    how about using some Athlon FX, Opteron 100/1200 series or higher speed Athlon X2s for crying out loud. i know that here is a performance gap even with the higher AMD models currently, but please show us an attempt at not being as one-sided.
  • jm20 - Tuesday, November 7, 2006 - link

    The tests at 2.0 and 2.4 Ghz for the AMD chips are there, you can plot how it 'should' improve. Here are my scaling projections based on the data given for the 939 platform.

    AMD Speed single dual
    2.0 14.0 27.0
    2.2 15.5 29.5
    2.4 17.0 32.0
    2.6 18.8 35.0
    2.8 20.8 38.2
    3.0 23.1 41.7

    I just hope this will show up correctly :/


    Very interesting article, I'm very excited to see an almost linear improvement from single to dual core.
  • yacoub - Tuesday, November 7, 2006 - link

    Your Tested Systems details on page 8 - the last three show a CPU of a Venice 3200+ oc'd to 2.4GHz (hey that's what I run too!), but the headings suggest they should really show Intel chips. :)
  • yacoub - Tuesday, November 7, 2006 - link

    quote:

    If you think about the way people move through the real world, they are constantly bumping into other objects or touching other objects and people. While this would be largely a visual effect, ... Two characters running past each other could even bump and react realistically, with arms and bodies being nudged to the side instead of mysteriously gliding past each other.


    Yes please make it only visual and not actually a part of the game, because much like trying too hard to make graphics look realistic, it really just adds more frustration for the player than anything else. It's already annoying in HL2 getting caught up on the edge of prop_physics related objects. The last thing we need is also getting caught or bumped by another player ;)
  • JarredWalton - Tuesday, November 7, 2006 - link

    If the world is more interactive, there might not actually be a need for "prop physics" objects. There's also the potential to make a game with different gameplay mechanics depending on how the character interacts with the world. It sounds like initially Valve will make it a visual enhancement, to make entities look more lifelike in their behavior, but down the road others could potentially do more. Like most of the enhancements, what we want to see is how they can actually change and improve gameplay beyond just being visual.
  • yacoub - Tuesday, November 7, 2006 - link

    "You're doing a disservice to the customer if you're not using all of the CPU power."

    okay so that alone shows the need for dual-core so the customer can offload background tasks to a second core so the game can truly get 100% of a core. ;)
  • timmiser - Tuesday, November 7, 2006 - link

    Background tasks?? I agree with Gabe that I want all of my CPU working the game. I'm sitting here with an Athlon X2 4800+ and I can't run Flight Sim X at an acceptable frame rate all the while I watch the graph that shows CPU#2 at idle while CPU#1 tries to run the entire program by itself!

    On another note, how about Microsoft develope some type of new API (DirecThread?) that automatically takes care of utilizing multiple cores so that game companies like Valve don't have to employ entire multi-threaded R&D divisions just so their games use both cores.

    It seems like we are going back to the caveman days before DirectInput, DirectSound, & DirectX etc. Remember when you had to choose your sound card, joystick, and video card from a list within the game?? Let's not go there again!!

    -Tim
  • JarredWalton - Tuesday, November 7, 2006 - link

    That was a direct quote from... Gabe I think. Others might disagree, but I thought it was an interesting take on things. As for background tasks, they usually only need a bit of CPU time (less than 5% in most cases), so unless you really want to encode videos and play games at the same time....
  • yacoub - Tuesday, November 7, 2006 - link

    What a worthless attempt at sarcasm:

    quote:

    Valve is one of the most respected gaming software companies around, with a little gaming property called Half-Life that you may have heard of.


    Why not just say something normal like:

    quote:

    As developer of the Half-Life games, Valve is one of the most respected gaming software companies around.


    It's not omg-witty, but at least that sentence doesn't end with a preposition. @___@
  • JarredWalton - Tuesday, November 7, 2006 - link

    Which obviously ruined the whole article. Seriously yacoub, a little tact with posting would be nice. Rather than stating:

    quote:

    What a worthless attempt at sarcasm:


    You could have just said something normal like:

    quote:

    The sarcasm was a bit flat, and you ended with a preposition.


    We are real people, and derogatory adjectives like "worthless" tend to irritate more than help. You may not intend it that way, but imagine for a second someone talking about what you worked on for a couple days last week and describing it as "absolute garbage". The key word in "constructive criticism" is to actually make it constructive.

    Now, thanks for the suggestion, and I'm more than happy to change things a bit to appeas people. Like most people, I just appreciate a bit more consideration, even if I'm just doing my job here. :) As the old cliche goes, you can catch more flies with honey than with vinegar. (Well, my grandpa used to say that anyway.)

Log in

Don't have an account? Sign up now