Multiple Clock Domains

Functionally there are some basic differences between Nehalem and previous Intel architectures. The Front Side Bus is gone and replaced with Intel's Quick Path Interconnect, similar to AMD's Hyper Transport. The QPI implementation on the first Nehalem is a 25.6GB/s interface which matches up perfectly to the 25.6GB/s of memory bandwidth Nehalem has.

The CPU operates on a multiplier of the QPI source clock, which in this case is 133MHz. The top bin Nehalem runs at 3.2GHz or 133MHz x 24. The L3 cache and memory controller operate on a separate clock frequency called the un-core clock. This frequency is currently 20x the BCLK, or 2.66GHz.

This is all very similar to AMD's Phenom, but where the two differ is in how they handle power management. While AMD will allow individual cores to request different clock speeds, Nehalem attempts to run all of its cores at the same frequency; if one core is idle then it's simply power gated and the core is effectively turned off. I explain this in greater detail here but the end result is that we don't have the strange performance issues that sometimes appear with AMD's Cool'n'Quiet enabled. While we have to turn off CnQ to get repeatable results in some of our benchmarks (in some cases we'll see a 50% performance hit with CnQ enabled), Intel's EIST seems to be fine when turned on and does not concern us.

My Concern

Looking at Nehalem's microarchitecture one thing becomes very clear: this is a CPU designed to address Intel's shortcomings in the server space. There's nothing inherently wrong about that, but it's a different approach than what Intel did with Conroe. With Conroe Intel took a mobile architecture and using the philosophy that what was good for mobile, in terms of power efficiency and performance per watt, would also be good for the desktop, it created its current microarchitecture.

This was in stark contrast to how microprocessor development used to work; chips would be designed for the server/workstation/high end desktop market and trickle down to mainstream users and the mobile space. But Conroe changed all of that, it's a good part of why Intel's Core 2 architecture makes such a great desktop and mobile processor.

Power obviously also matters in servers but not to the same extent as notebooks, needless to say Conroe did well in the server market but it lacked some key features that allowed AMD to hang onto market share.

Nehalem started out as an architecture that addressed these enterprise shortcomings head on. The on-die memory controller, Hyper Threading, larger TLBs, improved virtualization performance, restructured cache hierarchy, the new 2nd level branch predictor, all of these features will be very important to making Intel more competitive in the enterprise space, but at what cost to desktop power consumption and performance?


Intel promises better energy efficiency for the desktop, we'll be the judge of that...

I'm stating the concern up front because when I approached today's Nehalem review that's what I had in mind. Everyone has high expectations for Nehalem, but it hasn't been that long since Intel dropped Prescott on us - what I want to find out is whether Intel has stayed true to its mission on keeping power in check or if we've simply regressed with Nehalem.

The only hope I had for Nehalem was that it was the first high performance desktop core that implemented Intel's new 2:1 performance:power ratio rule. Also used by the Atom's design team, every feature that made its way into Nehalem had to increase performance by 2% for every 1% increase in power consumption otherwise it wasn't allowed in the design. In the past Intel used a general 1:1 ratio between power and performance, but with Nehalem the standards were much higher. We'll find out if Intel was all talk in a moment, but let's take a look at Nehalem's biggest weakness first.

Index The Chips
POST A COMMENT

74 Comments

View All Comments

  • anand4happy - Sunday, February 08, 2009 - link

    saw many thing but this is the thing something dfferent

    sd4us.blogspot.com/2009/01/intel-viivintel-975x-express-955x.html
    Reply
  • nidhoggr - Monday, November 10, 2008 - link

    I cant find that information on the test setup page. Reply
  • nidhoggr - Monday, November 10, 2008 - link

    test not text :) Reply
  • puffpio - Wednesday, November 05, 2008 - link

    would you guys consider rebenchmarking?
    from the x264 changelog since the nehalem specific optimizations:
    "Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%."
    Reply
  • anartik - Wednesday, November 05, 2008 - link

    Good review and better than Tom's overall. However Tom's stumbled on something that changed my mind about gaming with Nehalem. While Anand's testing shows minimal performance gains (and came to the not good for games conclusion) Tom's approached it with 1-4 GPU's SLI or Crossfire. All I can say is the performance gains with Nvidia cards in SLI was stunning. Maybe the platform favors SLI or Nvidia had a driver advantage in licensing SLI to Intel. Either way Nehalem and SLI smoked ATI and the current 3.2 extreme quad across the board. Reply
  • dani31 - Wednesday, November 05, 2008 - link

    I know it would't change any conclusion, but since we discuss bleeding edge Intel hardware it would have been nice to see the same in the AMD testbed.

    Using a SB600 mobo (instead of the acclaimed SB750) and an old set of drivers makes it look like the AMD numbers were simply pasted from an old article.
    Reply
  • Casper42 - Tuesday, November 04, 2008 - link

    Something I think you guys missed in your article/conslusion is the fact that we're now able to pair a great CPU with a pretty damn good North/South Bridge AND SLI.

    I found that the 680/780/790 featureset is plainly lacking and that the Intel ICH9R/10R seems to always perform better and has more features. If any doubt, look at Matrix RAID vs nVidia's RAID. Night and day difference, especially with RAID5.

    The problem with the X38/X48 was you got a great board but were effectively locked into ATI for high end Gaming.

    Now we have the best of both worlds. You get ICH10R, a very well performing CPU (even the 920 beats most of the Intel Quad Core lineup) AND you can run 1/2/3 nVidia GPUs on the machine. In my opinion, this is a winning combination.


    The only downside I see is board designs seem to suck more and more.

    With socket 1366 being so massive and 6 DIMM slots on the Enthusiast/Gamer boards, we're seeing not only 6 expansion slots (down from the standard of 7) but in most boards I have seen pics of, the top slot is an x1 so they can wedge it next to the x58 IOH which means your left with only 5 slots for other cards. Using 3 dual slot cards is out of the question without a massive 10 slot case (of which there are only like 3-5 on the market) and even if you can wedge 2 or 3 dual slot cards into the machine, you have almost zero expansion card slots should you ever need them.

    Then we get to all the cooling crap surrounding the CPU. ALL these designs rely on a top down traditional cooler and if you decide to use a highly effective tower cooling solution, all the little heatsink fins on the Northbridge and pwer regulators around the CPU get very little or no airflow. Now your in there adding puny little 40/60mm fans that produce more noise than airflow, not to mention that the DIMMs are hardly ever cooled in today's board designs.
    Call me a cooling purist if you will, but I much prefer traditional front to back airflow and all this side intake top exhaust stuff just makes me cringe. I personally run a Tyan Thunder K8WE with 2 Hyper6+ coolers and the procs and RAM are all cooled front to back. Intake and exhaust are 120mm and I have a bit of an air channel in which that airflow never goes near the expansion card slots below, which by the way have a 92mm fan up front pushing air in across the drives and another 92mm fan clipped onto the expansion slots in the back pulling it back out.

    I dont know how to resolve these issues, but I think someone surely needs to because IMHO its getting out of control.
    Reply
  • lemonadesoda - Tuesday, November 04, 2008 - link

    "Looking at POV-Ray we see a 30% increase in performance for a 12% increase in total system power consumption, that more than exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption."

    You cant use "total system power", but must make the best estimate of CPU power draw. Why? Because imagine if you had a system with 6 sticks of RAM, 4 HDDs, etc. you would have ever increasing power figures that would make the ratio of increased power consumption (a/b) smaller and smaller!

    If you take your figures and subtract (a guestimate of) 100W for non CPU power draw, then you DONT get the Intel 2:1 ratio at all!

    The figures need revisiting.
    Reply
  • AnnonymousCoward - Thursday, November 06, 2008 - link

    Performance vs power appears to linearly increase with HT. Using the 100W figure for non-CPU draw means a 25% power increase, which is close to the 30% performance.

    Unless we're talking about servers, I think looking at power draw per application is silly. Just do idle power, load power, and maybe some kind of flops/watt benchmark just for fun.
    Reply
  • silversound - Tuesday, November 04, 2008 - link

    great article, tomsharware reviews always pro intel and nvidia, not sure if they got pay $ to suppot them. anandtech is always neutral, thx! Reply

Log in

Don't have an account? Sign up now