IDF has started and the first benchmarks of Nehalem are going to start popping up. It is without a doubt an impressive architecture with a much better platform to run on, but this CPU is not about giving you better frames per second in your favorite game than the Penryn family. Let me make that more clear: even when the GPU is not the bottleneck, it is likely that most games will not be significantly faster than on Penryn. We, the people behind will probably have the most fun with it, more than your favorite review crew at :-). And no, I have not seen any tests before I type this. Nehalem is about improving HPC, Database, and virtualization performance, and much less about gaming performance. Maybe this will change once games get some heavy physics threads, but not right away.

Why? Most Games are about fast caches and super integer performance. After all, most of the Floating point action is already happening on the GPU. The Core 2 CPUs were a huge step forward in integer performance (not the least because of memory disambiguation) compared to the CPUs of that time (P4 and K8). Nehalem is only a small step forward in integer performance, and the gains due to slightly increased integer performance are mostly negated by the new cache system. In a previous post I told you that most games really like the huge L2 of the Core family. With Nehalem they are getting a 32KB L1 with a 4 cycle latency, next a very small (compared to the older Intel CPUs) 256KB L2 cache with 12 cycle latency, and after that a pretty slow 40 cycle 8MB L3. When running on Penryn, they used to get a 3 cycle L1 and a 14 cycle 6144KB L2. The Penryn L2 is 24 times larger than on Nehalem!

The percentage of L2 caches misses for most games running on a Penryn CPU is extremely low. Now that is going to change. The integrated memory controller of Nehalem will help some, but the fact remains that the L3 is slow and the L2 is small. However, that doesn't mean Intel made a bad choice. Intel made a superbly good choice by improving the performance where Core (Merom/Penryn) was mediocre to good. Penryn was already a magnificent gaming CPU, but it could not beat the AMD competition in HPC benchmarks, and AMD put up a good fight in database performance benchmarks. Now Intel is ready to fix these shortcomings.

Most Database code cannot use the wide architecture of Penryn very well. The number of instructions per cycle can be lower than 0.5 and waiting for the memory is the most probable cause. SMT or Hyper-Threading can do wonders here: while one thread waits for a memory stall, the other thread continues working and vice versa.

Secondly, quad (and eight) socket performance is going to improve a lot as four Nehalems only have to keep four L3 caches in sync, while a similar Tigerton system has to keep eight L2 caches in sync. That is why the cache system is perfect for server performance, but a little less interesting for gaming performance.

The massive bandwidth that the integrated tri-channel memory controller delivers should also do wonders for HPC code, and the new TLB architecture with EPT will make Nehalem shine compared to its older Core brothers.

No, Nehalem wasn't made for the gaming enthusiasts. Rather, it was made to please the IT and HPC people. So we say bring it to; it's just not that interesting for you gamers! ;-)



View All Comments

Log in

Don't have an account? Sign up now