IDF has started and the first benchmarks of Nehalem are going to start popping up. It is without a doubt an impressive architecture with a much better platform to run on, but this CPU is not about giving you better frames per second in your favorite game than the Penryn family. Let me make that clearer: even when the GPU is not the bottleneck, it is likely that most games will not be significantly faster than on Penryn. We, the people behind it.anandtech.com, will probably have more fun with it than your favorite review crew at Anandtech.com :-). And no, I have not seen any test results as I type this. Nehalem is about improving HPC, database, and virtualization performance, and much less about gaming performance. Maybe this will change once games get some heavy physics threads, but not right away.

Why? Most games are about fast caches and super integer performance. After all, most of the floating-point action is already happening on the GPU. The Core 2 CPUs were a huge step forward in integer performance (not least because of memory disambiguation) compared to the CPUs of that time (P4 and K8). Nehalem is only a small step forward in integer performance, and the gains from that slightly higher integer performance are mostly negated by the new cache system. In a previous post I told you that most games really like the huge L2 of the Core family. With Nehalem they are getting a 32KB L1 with a 4-cycle latency, then a very small (compared to the older Intel CPUs) 256KB L2 cache with a 12-cycle latency, and after that a pretty slow 40-cycle 8MB L3. When running on Penryn, they used to get a 3-cycle L1 and a 14-cycle 6144KB L2. The Penryn L2 is 24 times larger than Nehalem's!
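To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It uses the cache latencies quoted above; the hit rates and the 200-cycle memory latency are invented purely for illustration and are not measurements of any game. The function name `average_load_latency` is just a placeholder, and each latency is treated as the total cost of a load served at that level.

```python
def average_load_latency(levels, memory_latency):
    """Average cycles per load for a cache hierarchy.

    levels: list of (latency_cycles, local_hit_rate) from L1 outward, where
            local_hit_rate applies only to loads that reach that level.
    memory_latency: cycles for a load that misses every cache level.
    """
    reach = 1.0   # fraction of loads that get as far as this level
    total = 0.0
    for latency, hit_rate in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency


# Latencies come from the article; hit rates and memory latency are ASSUMED.
ASSUMED_MEMORY_LATENCY = 200

penryn = average_load_latency(
    [(3, 0.95),    # L1
     (14, 0.98)],  # 6MB L2: assumed to catch almost everything
    ASSUMED_MEMORY_LATENCY,
)
nehalem = average_load_latency(
    [(4, 0.95),    # L1
     (12, 0.60),   # 256KB L2: assumed to miss far more often
     (40, 0.95)],  # 8MB L3
    ASSUMED_MEMORY_LATENCY,
)

print(f"Penryn-style hierarchy:  {penryn:.2f} cycles per load")
print(f"Nehalem-style hierarchy: {nehalem:.2f} cycles per load")
```

With these made-up hit rates the Nehalem-style hierarchy ends up with the higher average load latency, because loads that used to hit the big 14-cycle L2 now fall through to the 40-cycle L3. Change the assumed hit rates or the memory latency and the picture changes with them.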

The percentage of L2 cache misses for most games running on a Penryn CPU is extremely low. Now that is going to change. The integrated memory controller of Nehalem will help some, but the fact remains that the L3 is slow and the L2 is small. However, that doesn't mean Intel made a bad choice. Intel made a superbly good choice by improving performance where Core (Merom/Penryn) was only mediocre to good. Penryn was already a magnificent gaming CPU, but it could not beat the AMD competition in HPC benchmarks, and AMD put up a good fight in database performance benchmarks. Now Intel is ready to fix these shortcomings.

Most database code cannot use the wide architecture of Penryn very well. The number of instructions per cycle can be lower than 0.5, and waiting on memory is the most probable cause. SMT, or Hyper-Threading, can do wonders here: while one thread is stalled waiting on memory, the other thread continues working, and vice versa.
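A toy model shows why. Assume (and this is a big simplification) that each thread alternates between a burst of compute and a memory stall, and that a second SMT thread can run whenever the first one is stalled. The cycle counts below are hypothetical, and the model ignores the fact that both threads share caches and execution resources, so treat the result as an upper bound rather than a prediction.

```python
def core_utilization(compute_cycles, stall_cycles, threads=1):
    """Fraction of cycles in which the core does useful work.

    Idealized model: every thread repeats `compute_cycles` of work followed
    by `stall_cycles` of waiting on memory, and stalls of one thread overlap
    perfectly with the work of the others. Utilization saturates at 1.0.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)


# Hypothetical database-like workload: 30 cycles of work, then a 70-cycle stall.
one_thread = core_utilization(30, 70, threads=1)
two_threads = core_utilization(30, 70, threads=2)

print(f"1 thread per core:  {one_thread:.0%} busy")
print(f"2 threads per core: {two_threads:.0%} busy")
```

In this idealized case the second thread doubles the useful work per core. The more of its time a thread spends stalled, the more room there is for the other thread to fill, which is why memory-bound server code is where SMT has the most to offer.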

Secondly, quad (and eight) socket performance is going to improve a lot as four Nehalems only have to keep four L3 caches in sync, while a similar Tigerton system has to keep eight L2 caches in sync. That is why the cache system is perfect for server performance, but a little less interesting for gaming performance.

The massive bandwidth that the integrated tri-channel memory controller delivers should also do wonders for HPC code, and the new TLB architecture with EPT will make Nehalem shine in virtualization compared to its older Core brothers.
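The theoretical peak is easy to estimate: channels times bus width times transfer rate. The sketch below assumes standard 64-bit DDR channels; the DDR3-1066 speed grade is an assumption chosen for illustration, not a confirmed platform spec, and sustained bandwidth in real code is always lower than the theoretical peak.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# ASSUMED speed grade: DDR3-1066, i.e. roughly 1066.67 million transfers/s.
ASSUMED_MT_S = 1066.67

tri_channel = peak_bandwidth_gb_s(3, ASSUMED_MT_S)   # about 25.6 GB/s
dual_channel = peak_bandwidth_gb_s(2, ASSUMED_MT_S)  # about 17.1 GB/s

print(f"Tri-channel:  {tri_channel:.1f} GB/s peak")
print(f"Dual-channel: {dual_channel:.1f} GB/s peak")
```

Whether a given workload can actually use that extra channel is a separate question; bandwidth-hungry HPC code is the most likely candidate.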

No, Nehalem wasn't made for the gaming enthusiasts. Rather, it was made to please the IT and HPC people. So we say bring it to it.anandtech.com; it's just not that interesting for you gamers! ;-)

47 Comments

  • gochichi - Thursday, August 21, 2008

    I think waiting is silly, particularly as DDR2 memory is so cheap and so good.

    Get a Quad-core, get 8GB of RAM, and never look back... don't wait another minute.
  • gabo - Tuesday, August 19, 2008

    I'm waiting for this new generation of processors. I'm currently using a P4 1.6 GHz with 1.5 GB of RAM. I don't do too much work with 3D or gaming (for that I also use my XBOX360). I mainly work coding in C, ASP, PHP, Java and some Flash, some database-related work like admin, backups and such, and sometimes work with Excel, Word, PowerPoint, etc., and obviously web surfing and email.

    Of course any upgrade would be tremendous in my situation, but what would any of you recommend most? A very low-cost Penryn, when they drop in price, or a more expensive Nehalem? Which is the most bang for the buck for my particular needs?

    Thanks in advance
  • theplaidfad - Tuesday, August 19, 2008

    The correct decision is to hold onto that athlon a little longer, and THEN hit the q9550 when the i7 comes out and get in on that oh so sssschuuweeeeet price drop that should happen :)
  • Calin - Tuesday, August 19, 2008

    There is always something better in the future: a new architecture, faster (same architecture) processors, and price cuts.
    As such, if you can wait until the next generation comes, you could hope for a price cut. And maybe better performance/overclockability due to a new processor stepping, and a better mainboard than right now (though this might not be so).
  • wingless - Tuesday, August 19, 2008

    This sort of validates AMD's approach with Barcelona in a way. Their actions were deliberate but they got berated because K10's architecture was not 100% focused on gaming. Now Nehalem is going to take an AMD-like approach but throw in that AMAZING Hyper-Threading which will take server applications and multi-threaded gaming to a new level past AMD's K10. On Xtremesystems.org I stated in a post that Penryn owners probably won't get much out of Nehalem that they can't already do with their 45nm Quads as far as gaming is concerned. My ego has just been given a boost now that my thoughts have been confirmed by the pros at Anandtech!
  • MDme - Wednesday, August 20, 2008

    In a VERY big way yes.

    1. "NEW" cache architecture: small L2; big shared L3 = Barcelona
    2. Point-to-point serial links: HyperTranspor... er, QuickPath = Barcelona
    3. On-board memory controller: oh that's Barcelona too (and K8)
    4. EPT: sounds a lot like NPT or RVI to me

    On all these points Intel did incorporate AMD tech into the CPU. So i7 is kinda like the P4, K10, Core2 combined.
  • JohanAnandtech - Wednesday, August 20, 2008

    P4 is not really in there, except for the "trace cache-like" Loop Stream Detector. If you are talking about SMT, SMT in the P4 is only a shadow of what SMT should be. SMT is a lot better implemented in Nehalem.

    But for the rest you are right, EPT = RVI/NPT. But I wouldn't call p2p serial links and IMC AMD tech. There are a lot of companies who have done this before AMD.
  • formulav8 - Wednesday, August 20, 2008

    But if AMD didn't have it, would Intel have adopted it? I say no, for the most part.

    In a way AMD did a lot of things before Intel that Intel later followed/copied: IMC, HTT, x64, True/Native Dual and Quad Core, Shared L3 Cache, High Performance Discrete GPUs, High Performance IGPs, 4x4, DDR-based memory, and so on.

    Jason
  • DigitalFreak - Tuesday, August 19, 2008

    LOL

    It's been posted in a number of "previews" of Core i7 that it won't do much for gaming or single socket desktops, period. Perhaps you should get your ego under control and stop taking credit for information you ripped off from somewhere else.
  • MonkeyPaw - Tuesday, August 19, 2008

    From what I gather, the third channel for the memory doesn't do much either. I can't imagine that dual channel DDR3 is at all handicapped for bandwidth. Perhaps the third channel is put in place for Larrabee descendants?
