Fall IDF 2005 - Day 1: Coverage of Everything
by Anand Lal Shimpi on August 24, 2005 2:31 AM EST
Turning Single into Multi-Threaded with Speculative Threading
Intel was demonstrating a particularly interesting research project called Mitosis, a combined hardware and compiler approach to implementing speculative threading.
Modern out-of-order microprocessors speculatively execute code based on what they predict will need to be executed in the future, improving processor utilization and overall performance. This research project proposes applying the same sort of speculative execution at the thread level: threads are created on the fly and speculatively executed by idle cores in order to improve performance on future multi-core CPUs.
The Mitosis project relies on both hardware and software (compiler) support to work. First, on the software side, blocks of code that have very few inputs and outputs are detected and considered for use as a separate thread.
The entry and exit points of the current working thread are marked, and the portion of the thread that would be split off is separated. The new thread is then fed the appropriate input data that it needs to begin its execution.
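To give a feel for what "very few inputs and outputs" means in practice, here is a rough sketch in Python. It is only an illustration under our own assumptions, not the Mitosis compiler's actual analysis: a block is modeled as a list of (destination, sources) statements, and the function names and the thresholds (`max_inputs`, `max_outputs`) are hypothetical.

```python
def block_inputs_outputs(block, later_reads):
    """Return (inputs, outputs) for a block of simple statements.

    block: list of (dest, sources) tuples, in program order.
    later_reads: names that code after the block still reads.
    """
    written = set()
    inputs = set()
    for dest, sources in block:
        # A source counts as an input if nothing earlier in the block wrote it.
        inputs.update(s for s in sources if s not in written)
        written.add(dest)
    # An output is a value written in the block that later code depends on.
    outputs = written & set(later_reads)
    return inputs, outputs


def is_spawn_candidate(block, later_reads, max_inputs=2, max_outputs=1):
    """Hypothetical heuristic: few inputs and few outputs make a good candidate."""
    inputs, outputs = block_inputs_outputs(block, later_reads)
    return len(inputs) <= max_inputs and len(outputs) <= max_outputs


# Example block: t = a + b; u = t * a; r = u - 1, where only r is used afterwards.
example_block = [("t", ["a", "b"]), ("u", ["t", "a"]), ("r", ["u"])]
```

In this example the block needs only `a` and `b` to start and hands back only `r`, so it would be cheap to feed its inputs to another core and check its single result later.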
With the single thread now split into two, both threads are sent to the multi-core processor and executed in parallel. When the speculative thread finishes, its result is checked to make sure that the data is still valid; if so, the result is committed and all is well. If the result is invalid, the thread must be thrown away. Since we're talking about a single-threaded application to begin with, though, no performance is wasted, only power: the core the speculative thread ran on would otherwise have been idle.
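The spawn, validate and commit sequence can be sketched in software. The Python below is a minimal illustration, not Intel's implementation: Mitosis does this work in the compiler and in hardware, and every name here (`run_speculatively`, `predict_input` and so on) is made up for the example. Python threads also only model the control flow; they are not meant to show a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_speculatively(first_region, second_region, state, predict_input):
    """Run second_region on a predicted input while first_region computes the real one.

    Returns (result, committed), where committed is True if the speculative
    result was valid and kept, and False if it was discarded and recomputed.
    """
    predicted = predict_input(state)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Spawn the speculative thread on an otherwise idle worker.
        future = pool.submit(second_region, predicted)
        # The original thread keeps executing the code that produces the real input.
        actual = first_region(state)
        speculative_result = future.result()
    if actual == predicted:
        # Validation passed: commit the speculative result.
        return speculative_result, True
    # Validation failed: throw the speculative work away and redo it with the real input.
    return second_region(actual), False


def produce(state):
    return state + 1


def consume(value):
    return value * 10
```

With a predictor that guesses correctly, `run_speculatively(produce, consume, 4, lambda s: s + 1)` commits the speculative result. With a wrong guess, the function still returns the correct answer, just without any benefit from the extra thread.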
On the hardware side, there is one major change needed to implement Mitosis: the inclusion of a global register file and a register versioning table, which keep track of which cores have the latest and most correct register values. Some additional logic is also needed to help validate the outcome of these threads.
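Intel did not detail how the register versioning table is organized, so the following Python is purely a toy model built on our own assumptions: one entry per register, holding a version number, the core that last wrote it, and the value. The class and method names are hypothetical. It is only meant to show how a version number lets the hardware tell whether a value a core read earlier is still the latest one.

```python
class RegisterVersionTable:
    """Toy model: tracks which core wrote each register most recently."""

    def __init__(self):
        # register name -> (version, core_id, value); layout is an assumption
        self._entries = {}

    def write(self, core_id, register, value):
        """Record a write and return the new version number for that register."""
        previous_version = self._entries.get(register, (0, None, None))[0]
        version = previous_version + 1
        self._entries[register] = (version, core_id, value)
        return version

    def latest(self, register):
        """Return (version, core_id, value) for the most recent write."""
        return self._entries[register]

    def is_stale(self, register, version_seen):
        """True if the register has been written since version_seen was read."""
        return self._entries[register][0] != version_seen
```

If core 0 writes a register and core 1 then overwrites it, a speculative thread that started from core 0's version would see `is_stale` return true, which is the cue to discard its work.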
The end result of all of this is pretty promising, especially for single-threaded applications that would otherwise get no benefit from being run on a multi-core CPU.
In order to demonstrate the performance potential, the Intel researchers working on the project took a look at performance in the Olden benchmark suite. The Olden suite was chosen because it is a set of code that is extremely difficult to parallelize. The graph below shows performance improvement over a single Out of Order core:
The green bars show the performance improvement going from single to dual core; the red bars show the performance improvement from having the benchmark and its data stored entirely within L1 cache (no cache misses); and the yellow bars show the performance improvement due to the use of the Mitosis compiler/hardware modifications with a dual core CPU.
As you can see, Mitosis offers a 2 - 3x performance improvement in many cases, which is nothing short of impressive. But keep in mind that this project is in its very early stages of research; as promising as it looks, it may take 5 - 10 years for the research to make its way into the real world.