Threading Models

There are four basic threading models available for computer programmers. Some of them are easier to implement than others, and some of them are more beneficial for certain types of work than others. The four threading models are:
  • Single Threading
  • Coarse Threading
  • Fine-Grained Threading
  • Hybrid Threading
Single threading is what everyone generally understands. You write a program as a sequential series of events that are executed. It's easy to do, and this is what most programmers learn in school or when they first start "hacking". Unfortunately, single threading is becoming obsolete for a lot of tasks. Simple programs that generally wait for user input or other events can still get by with taking this approach, but any application that depends on CPU computational power is going to miss out on a large amount of performance if it isn't designed to run in a multithreaded fashion. There's not much more to say about single threading.


The next step up from single threading is to look at coarse threading. Next to avoiding threads altogether, this is the simplest approach to multithreading. The idea is basically to find discrete systems of work that can be done separate from each other. The example we gave earlier of having two cooks working a single dish, with each person preparing a portion of the dish is an example of coarse threading. If one person is working on slicing up and grilling some chicken while a second person is preparing a Hollandaise sauce, you have two separate systems that can easily be accomplished at the same time.

A classic example of this in the computing world is an application that uses a client-server architecture. You would simply run the client code as one thread and the server code as a second thread, each would handle certain tasks, there would be a minimal amount of interaction between the two threads, and hopefully you would get a net increase in performance. This type of architecture is in fact what has allowed two of the current multithreaded games to begin to take advantage of multiple processors. Quake 4 and Call of Duty 2 both stem from id Software code that used a client-server design. The server handles such tasks as networking code, artificial intelligence and physics calculations, and perhaps a few other tasks related to maintaining the world state. Meanwhile the client handles such things as user input, graphics rendering, audio and some other tasks as well. (We might have the breakdown wrong, but it's the general idea we're interested in.)

Depending on the game, breaking up a client-server design into two separate threads can provide anywhere from ~0% to 20% or more performance. 0% - as in no gain at all? Unfortunately, as many of you are already aware, many games are bottlenecked by one specific area: graphics performance. If you reach the point where the graphics card is slowing everything else down, the CPU might have more than enough time available to run all of the other tasks while waiting for the graphics card to finish its work. If that's the case, having a faster CPU -- or even more CPU cores -- would provide little or no benefit in a coarse threaded client-server game. This is why Quake 4 and Call of Duty 2 see performance increases for enabling SMP support that start out high when you run at lower resolutions using fast graphics cards, but as the resolution increases and the GPU becomes the bottleneck, the benefit of SMP begins to disappear. Note also that both of these games only use two threads, so if you have something like a quad core processor or two dual core processors, you still aren't able to effectively use even half of the total computational resources available.

This is not to say that coarse threading can't be advantageous. If you have an application with two tasks that need to be computed, and both tasks take the same amount of time -- or nearly the same amount of time -- using a coarse threaded approach allows you to easily utilize all of the available CPU power. There are many tasks that fall into this category, although likewise there are plenty of tasks that do not fit this model. It is still a better solution than single threading in a multiprocessor world, but in many instances it's a baby step in the right direction.


Fine-grained threading takes the opposite approach of coarse threading. Instead of looking for entire systems that can be broken into separate threads, fine-grained threading looks at a single task and tries to break it into smaller pieces. A couple of the best examples of fine-grained threading providing a dramatic increase in performance are 3D rendering applications and video encoding applications. If you try to take a coarse threading approach to 3D rendering, you will usually end up with one system that handles the actual 3D rendering, and it will consume well over 90% of the application time. The same can be said of video encoding. Luckily, both tasks -- 3D rendering and video encoding -- are relatively simple to break up into smaller pieces. You simply take a divide and conquer approach, so if you have two processor cores you split the scene in half, or with four cores you split the scene into four pieces, etc. You can further optimize things by allowing any cores that finish early to begin working on a portion of one of the unfinished sections.

Thinking back to our kitchen analogy, what exactly is fine-grained threading? The best example we can come up with would be the preparation of a large quantity of food. Let's say you have a large barbecue where there will be hundreds of people present. Everyone wants their food to be warm, and hopefully the food doesn't get overcooked, undercooked, or any one of a number of other potential problems. If you're preparing something like hamburgers or steaks, cooking several at a time makes sense, but there's only so much one person can do. The answer is to add additional barbecues with additional cooks, so they can all be working on making hamburgers at the same time.

The problem with fine-grained threading is that not all tasks easily lend themselves to this approach. There are other types of foods that could be served at a barbecue where you might not need to have a bunch of chefs doing the same task, for example. It is definitely more difficult (overall) than coarse threading, but you can run into problems when you have a lot of work to be done but you don't know exactly how long each piece will take. Performance scaling and resource management also tend to become issues the more you split up a task - you wouldn't want 300 chefs, 300 barbeques, all cooking one hamburger each, right? You have to maintain a balance between spending time breaking a task into smaller pieces versus simply getting the work done. Typically, there is a minimum size beyond which it's not worth splitting up a task any further, and when you reach this point you just let all of the assigned pieces complete. Things like memory bandwidth can also become a performance bottleneck, so if you have a task that was really easy to break into pieces but each piece demanded a lot of memory bandwidth, you might discover that more than four pieces shows no performance improvements on current computer systems due to memory bandwidth limitations.

The net result is that fine-grained threading often has diminishing returns as you add more processors. With 3D rendering for example, having two processor cores is often about 90% faster than having only one processor core, whereas four processor cores is only about 80% faster than two cores, and eight cores might only be 70% faster than four cores. So if one processor takes 20 minutes to complete a 3D rendering task, two cores might be able to complete the task in 11 minutes, four cores might complete the task in 6 minutes, and eight cores could complete the task in only 3.5 minutes. The benefits of fine-grained threading will vary depending on the work that is being performed, and it can place a larger burden on the programmer to get things properly optimized, but the performance increases can still be substantial.


Hybrid threading really doesn't add much new to the table. As the name implies, this is a hybrid approach where you basically take both coarse and fine-grained threading and utilize them as appropriate to the task at hand. This requires the most work, as now you have to deal with synchronizing coarse system threads along with potentially individual threads within the systems. Hybrid threading will typically have more threads to worry about than either of the other approaches. The good news is that when done properly, hybrid threading can make maximum use of the available CPU power.

What Is Multithreading? Building a Threaded Game Engine
Comments Locked

55 Comments

View All Comments

  • yacoub - Wednesday, November 8, 2006 - link

    sorry about that. got a little too excited.
  • Ruark - Tuesday, November 7, 2006 - link

    Page 6: ". . . everything crawls to a slow."
  • duploxxx - Tuesday, November 7, 2006 - link

    page 8 test setup, clear a cut/paste job... all of the cpu's are the same Athlon.

    put an allendale in the benchmark, lot's of people want to see how cache related this multithread sw is.

    Valve talks about 64-bit... tests are 32bit? Since some competitors are talking about 64-bit code in there gaming also, should be interesting to see what the difference is vs 32bit.
  • JarredWalton - Tuesday, November 7, 2006 - link

    Sorry 'bout that - I've fixed the CPU line. All systems were tested at stock and 20% OC. The problem with Allendale is that it has different stock clock speeds, so you're not just comparing different CPU speeds (unless you use a lower multiplier on Conroe). Anyway, this is a first look, and the next CPU article should have additional benches.

    The tests are all 32-bit. This is not full Episode 2 code, and most people are waiting for Vista to switch to running 64-bit anyway. All we know is that Episode 2 will support 64-bits natively, but we weren't given any information on how it will perform in such an environment yet.
  • brshoemak - Tuesday, November 7, 2006 - link

    Nice to know where things are headed. Great article.

    Jarred, 2nd page, 2nd paragraph

    quote:

    place it into a pan and start beating the oven


    should be 'heating the oven' - although quite funny as is, you may want to keep it ;)
  • JarredWalton - Tuesday, November 7, 2006 - link

    Darn speech recognition. And "b" and "h" looked close enough that I missed it. Heh. No ovens were harmed in the making of this article.
  • MrJim - Tuesday, November 7, 2006 - link

    This was a very interesting article to read, the future looks bright for us with multi-core systems and Valve games. Excellent work Mr Walton!
  • timmiser - Tuesday, November 7, 2006 - link

    Shoot, I didn't make it past the dinner description. Got too hungry!
  • George Powell - Tuesday, November 7, 2006 - link

    I quite agree. Top notch article there. It is great to see how Valve are committed to giving us the best gaming experience.
  • Regs - Tuesday, November 7, 2006 - link

    The future looks bright for people willing to buy valve and multi-core CPUs!!

Log in

Don't have an account? Sign up now