Intel Developer Forum Spring 2004 - Wrapup
by Derek Wilson on February 23, 2004 8:44 PM EST
The Information Age
Pat Gelsinger's presentation began with a bit of a history lesson, taking us from kilobytes to gigabytes. What followed was a discussion of how Intel will attack the "Era of Tera," as Pat dubbed the next step up.
Much of the rest of the keynote was geared toward answering two questions: why we need "tera" anything, and how Intel plans to achieve such high performance computing. The answer to the first question echoed what Sean Maloney had said about broadband pushing computer systems to their limit: recognition, mining, and synthesis of data (for which Gelsinger used the poorly chosen acronym RMS). Essentially, Gelsinger is saying that the "Era of Tera" will allow us to operate quickly on massive data sets to identify very complex patterns and situations, as well as help us generate data that blurs the lines of reality.
As examples of why this matters, Pat mentioned that computers were able to detect the possibility of what happened on September 11, but they did so a week or two too late. Data mining would allow us to do such things as search the web for images based on what they actually look like. As an example of synthesis, we were shown a demo of realtime raytracing. Visualization being the infinitely parallelizable problem that it is, this demo was a software renderer running on a cluster of 23 dual 2.2GHz Xeon processors. The world will be a beautiful place when we can pack this kind of power into a GPU and call it a day.
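Raytracing parallelizes so well because each pixel's ray can be traced without knowing anything about any other pixel. The minimal Python sketch below assumes a hypothetical scene (one sphere, hit/miss shading only, all names our own) and renders the image in horizontal bands, then checks that the stitched result matches a straight serial render. On a cluster like the one in Intel's demo, each band could go to a different machine; here the bands run one after another, so the sketch shows that the work is independent, not an actual speedup.

```python
import math

WIDTH, HEIGHT = 32, 24                # hypothetical, tiny image
SPHERE_CENTER = (0.0, 0.0, -3.0)      # hypothetical scene: one sphere in front of the camera
SPHERE_RADIUS = 1.0

def trace(px, py):
    """Cast one primary ray from the origin through pixel (px, py); 1 = hit, 0 = miss."""
    # Map the pixel center onto an image plane at z = -1, correcting for aspect ratio.
    x = (2 * (px + 0.5) / WIDTH - 1) * (WIDTH / HEIGHT)
    y = 1 - 2 * (py + 0.5) / HEIGHT
    norm = math.sqrt(x * x + y * y + 1)
    dx, dy, dz = x / norm, y / norm, -1 / norm   # unit ray direction
    cx, cy, cz = SPHERE_CENTER
    # Solve |t*d - c|^2 = r^2 for t; with |d| = 1 the leading coefficient is 1.
    b = -2 * (dx * cx + dy * cy + dz * cz)
    c = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS ** 2
    disc = b * b - 4 * c
    return 1 if disc >= 0 and (-b - math.sqrt(disc)) / 2 > 0 else 0

def render_rows(rows):
    """Render a band of rows; shares no state with any other band."""
    return [[trace(px, py) for px in range(WIDTH)] for py in rows]

def render_in_bands(bands):
    """Split the image into independent bands, render each separately, then stitch."""
    step = -(-HEIGHT // bands)        # ceiling division
    image = []
    for start in range(0, HEIGHT, step):
        image.extend(render_rows(range(start, min(start + step, HEIGHT))))
    return image

serial = render_rows(range(HEIGHT))
print(render_in_bands(4) == serial)   # True: same picture however the work is split
for row in serial:
    print("".join("#" if hit else "." for hit in row))
```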
Of course, we still need to answer the question of how we are going to get from here to there. As surprising as it may seem, Intel's answer isn't to push for ever increasing frequencies. With some nifty charts and graphs, Pat showed us that we won't be able to rely on increases in clock frequency to deliver the same performance gains they have in the past. One graph showed the power density of Intel processors approaching that of the sun if the current trend continues. Another showed that the faster a processor runs, the more cycles it wastes waiting for data from memory, since memory latency hasn't decreased at the same rate that clock speed has increased. Also, as chips are fabbed on smaller and smaller processes, higher clock speeds will make it harder to move data across a chip in less than one clock cycle (because of interconnect RC delays).
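The memory-latency point is simple arithmetic: a fixed wait in nanoseconds costs more cycles the faster the clock ticks. The sketch below uses assumed, illustrative numbers (100 ns memory latency, one cache miss per 100 instructions, a base of one cycle per instruction), not figures from Intel's slides, and a deliberately simple model in which every miss adds a full memory stall.

```python
MEMORY_LATENCY_NS = 100.0   # assumed round number, held fixed while the clock speeds up

def stall_cycles(freq_ghz, latency_ns=MEMORY_LATENCY_NS):
    """Cycles spent waiting on one trip to memory: time (ns) x cycles per ns (GHz)."""
    return latency_ns * freq_ghz

def instructions_per_ns(freq_ghz, base_cpi=1.0, misses_per_instr=0.01,
                        latency_ns=MEMORY_LATENCY_NS):
    """Throughput under a simple model: every miss adds a full memory stall to CPI."""
    cpi = base_cpi + misses_per_instr * stall_cycles(freq_ghz, latency_ns)
    return freq_ghz / cpi

for ghz in (1.0, 2.0, 4.0):
    print(f"{ghz:.0f} GHz: {stall_cycles(ghz):.0f} cycles per miss, "
          f"{instructions_per_ns(ghz):.2f} instructions/ns")
# 1 GHz: 100 cycles per miss, 0.50 instructions/ns
# 2 GHz: 200 cycles per miss, 0.67 instructions/ns
# 4 GHz: 400 cycles per miss, 0.80 instructions/ns
```

Under these assumptions, quadrupling the clock buys only 1.6 times the throughput, because each miss grows from 100 to 400 wasted cycles.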
In addition to clock speed not being able to pull us out of the mud, architectural advances in processors are limited by the maximum instruction level parallelism (ILP) available in any given program. The amount of work a processor can do each cycle is capped because not all instructions can be executed in parallel: some instructions depend on the results of others. Since the average maximum ILP in programs isn't increasing, we will need to find another way to increase processor performance.
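One way to see the ILP ceiling is to treat a program as a dependency graph: even with unlimited execution units, the longest chain of dependent instructions sets a floor on how many cycles the code takes. The sketch below assumes every instruction takes one cycle and uses a hypothetical six-instruction snippet; the bound is simply the instruction count divided by the length of that critical path.

```python
def max_ilp(deps):
    """Upper bound on ILP: instruction count / longest dependency chain.

    deps maps each instruction (in program order) to the earlier instructions
    whose results it reads. Assumes every instruction takes one cycle and the
    machine has unlimited execution units.
    """
    depth = {}
    for instr, sources in deps.items():          # dicts keep insertion order
        # An instruction can start only after the deepest instruction it reads from.
        depth[instr] = 1 + max((depth[s] for s in sources), default=0)
    return len(deps) / max(depth.values())

# Hypothetical six-instruction snippet.
program = {
    "i1": [],            # r1 = load x
    "i2": [],            # r2 = load y
    "i3": ["i1", "i2"],  # r3 = r1 + r2
    "i4": [],            # r4 = load z
    "i5": ["i3", "i4"],  # r5 = r3 * r4
    "i6": ["i5"],        # r6 = r5 + 1
}
print(max_ilp(program))   # 1.5: six instructions, but the i1 -> i3 -> i5 -> i6 chain takes 4 cycles
```

A machine able to issue more than 1.5 instructions per cycle gains nothing on this snippet: the extra width simply sits idle waiting on the dependency chain.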
If clock frequency isn't going to get us anywhere, and we are hitting a wall in how many instructions per cycle a single program can complete, the only other option is to increase parallelism at the thread level. Rather than trying to get more done in a single program or thread, we will need multiple processors running independent code at the same time. Intel's first step in this direction was the baby step of Hyper-Threading, but dual core, multicore, and massively multicore processors are on the horizon for Intel.
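Thread-level parallelism sidesteps the ILP ceiling by giving each core its own instruction stream. The minimal sketch below splits a hypothetical job (counting primes below a limit) into independent chunks that share no state, hands them to a pool of workers via Python's `concurrent.futures`, and sums the results. One caveat: in CPython the global interpreter lock keeps CPU-bound Python threads from running simultaneously, so `ThreadPoolExecutor` here illustrates the structure rather than a real speedup; `ProcessPoolExecutor` exposes the same `map` interface for genuinely multicore execution.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes_in(bounds):
    """Count primes in [lo, hi) by trial division; reads and writes no shared state."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_primes_parallel(limit, workers=4):
    """Split [0, limit) into one independent chunk per worker and sum the results."""
    if limit <= 0:
        return 0
    step = -(-limit // workers)                       # ceiling division
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes_in, chunks))

print(count_primes_parallel(1000))   # 168
```

Because no chunk needs another chunk's answer, adding cores adds throughput without any help from ILP within a single stream.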
In addition to moving to massively multicore architectures, Intel needs to eliminate bottlenecks from other parts of the system as well. One way they plan to do this is via a feature called Helper Threads. Apparently, half of the execution time of any given process is spent waiting for data. If that data were already in the cache when the process needed it, everything would run much faster. Helper Threads are apparently able to warm up the cache for a specific process ahead of the points where it would normally take a cache miss. In the Helper Threads demo, Intel ran a benchmark on an Itanium processor and a "research Itanium," and the Helper Thread-enabled side showed an 8.9% speedup and 23% fewer cache misses.
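Intel didn't detail how its Helper Threads decide what to fetch, so the following is only a toy model of the cache-warming idea, built on stated assumptions: a hypothetical 16-line LRU cache, a main thread walking a hypothetical address trace, and a helper that already knows the address the main thread will need `lookahead` steps later and touches it first. The two threads are interleaved one step at a time rather than run concurrently, and the model makes no attempt to reproduce Intel's 8.9% and 23% figures.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used line."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Touch addr; return True on a hit, False on a miss (which fills the line)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)        # mark as most recently used
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)      # evict least recently used
        return False

def main_thread_misses(trace, capacity=16, lookahead=0):
    """Count misses seen by the main thread; lookahead > 0 models a helper thread
    that touches the address the main thread will need `lookahead` steps later."""
    cache = LRUCache(capacity)
    misses = 0
    for i, addr in enumerate(trace):
        if lookahead and i + lookahead < len(trace):
            cache.access(trace[i + lookahead])  # helper's prefetch; not counted
        if not cache.access(addr):
            misses += 1
    return misses

trace = list(range(64))                         # hypothetical streaming access pattern
print(main_thread_misses(trace))                # 64: every access misses
print(main_thread_misses(trace, lookahead=4))   # 4: only the accesses before the helper gets ahead
```

The helper's own misses aren't charged to the main thread, on the assumption that they overlap with its work. If the helper runs too far ahead for the cache size, its prefetched lines are evicted before the main thread reaches them, so the lookahead distance matters.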
Another path Intel is looking down is adaptability. At the silicon level, Intel is exploring adaptive body biasing (forward biasing a transistor when it is on, and reverse biasing it when it is off) to increase performance and decrease power lost to leakage. On a larger scale, adaptive architectures and platforms are being explored. Reconfigurable architectures, such as adaptive wireless radio arrays that can be easily reconfigured to work with multiple types of wireless networks, are another example of the kind of adaptability Intel wants to see evolve in the future.
By utilizing massive multiprocessing and adaptive/programmable architectures, the hope is that systems will be able to shape themselves to the needs of the programs they are running while doing as many things as possible in any given nanosecond (or fraction thereof, as the case may be).
Of course, that's the future. Dual core processors won't even be showing up this year (though next year might be a different story if we are lucky), and reconfigurable and adaptive computing has been discussed for a very long time. It is very exciting to hear what Intel's most forward-looking people have to say about where we are headed, but it also makes the next few years feel a little bit like an eternal day before Christmas.