Mainstream Nehalem: On-chip GPU and On-chip PCIe

Let’s take a look at the Core i7, the first Nehalem incarnation:

The Core i7 has three DDR3 memory channels, a QPI link to the X58 chipset, and support for multiple GPUs off of the X58 IOH. As you know by now, it plugs into Intel’s new LGA-1366 socket. But in the second half of next year, there will be a new socket for mainstream users: LGA-1156.

Meet Lynnfield. It’s also a 4-core/8-thread design with an 8MB L3 cache, just like the Core i7, but it plugs into LGA-1156. Instead of three DDR3 channels it’s got two, and instead of QPI it uses Intel’s DMI to connect to the chipset. DMI is a lower-bandwidth interconnect, but Lynnfield doesn’t need a ton of bandwidth between itself and the chipset, and the reason is its secret weapon: Lynnfield has an on-package PCIe controller, so graphics traffic doesn’t have to pass through the chipset at all.
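To put the drop from three channels to two in perspective, here’s a back-of-the-envelope sketch of peak theoretical DRAM bandwidth, which scales linearly with the number of 64-bit DDR3 channels. The DDR3-1066 speed grade used for both platforms is an assumption picked purely for illustration, not a confirmed spec for either part, and real-world bandwidth will land below these peaks.

```python
# Back-of-the-envelope peak DRAM bandwidth: channels x transfer rate x bus width.
# Assumption: both platforms run DDR3-1066 (~1066.67 MT/s); this speed grade is
# chosen only for illustration. Each DDR3 channel is 64 bits (8 bytes) wide.

BYTES_PER_TRANSFER = 8   # one 64-bit DDR3 channel
DDR3_1066_MTS = 1066.67  # hypothetical speed grade, mega-transfers per second


def peak_bandwidth_gbs(channels: int, mega_transfers: float) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers * 1e6 * BYTES_PER_TRANSFER / 1e9


core_i7 = peak_bandwidth_gbs(3, DDR3_1066_MTS)    # LGA-1366: three channels
lynnfield = peak_bandwidth_gbs(2, DDR3_1066_MTS)  # LGA-1156: two channels

print(f"Core i7 (3ch):     {core_i7:.1f} GB/s")
print(f"Lynnfield (2ch):   {lynnfield:.1f} GB/s")
print(f"Lynnfield/Core i7: {lynnfield / core_i7:.2f}")
```

Since only the channel count differs, the 2:3 ratio holds no matter which memory speed you plug in.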

There are 16 PCIe lanes on Lynnfield (presumably PCIe 2.0), and they can be used as two x8 links or a single x16, so you’ll get 2-way SLI/CrossFire support, assuming all the licensing silliness is worked out. Having the PCIe controller this close to the CPU could mean some very interesting things for latency: if it’s well designed, Lynnfield could have the lowest-latency CPU-GPU connection we’ve seen on a desktop PC. Whether that will actually mean anything for real-world performance remains to be seen; I’d guess not, but it’s neat to talk about nonetheless.

Next up we’ve got Havendale, a 2-core/4-thread part with a 4MB L3 (still 2MB of L3 per core, just like Lynnfield and the Core i7). The “pinout” (if we can still call it that on these pinless CPUs) is the same as Lynnfield’s, so we’ve got a two-channel DDR3 memory controller and DMI to the chipset.
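The 2MB-per-core figure is simple division, but it’s a handy sanity check that Intel kept the same L3-to-core ratio across all three parts. The core, thread, and cache counts come straight from the text above; the table layout and helper name are just for illustration. Keep in mind the L3 is shared by all cores, so this is a ratio rather than a hard per-core allocation.

```python
# L3 cache per core and per thread for the three Nehalem parts discussed above.
# Core/thread/L3 figures are from the article; the data layout is illustrative.
# The L3 is shared, so these are ratios, not fixed per-core partitions.

PARTS = {
    # name: (cores, threads, L3 size in MB)
    "Core i7": (4, 8, 8),
    "Lynnfield": (4, 8, 8),
    "Havendale": (2, 4, 4),
}


def l3_share_mb(l3_mb: int, count: int) -> float:
    """Evenly divide the shared L3 by a core or thread count."""
    return l3_mb / count


for name, (cores, threads, l3_mb) in PARTS.items():
    per_core = l3_share_mb(l3_mb, cores)
    per_thread = l3_share_mb(l3_mb, threads)
    print(f"{name:<10} {cores}C/{threads}T  {l3_mb}MB L3  "
          f"{per_core:.0f}MB/core  {per_thread:.0f}MB/thread")
```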

Havendale’s secret sauce is an on-package GPU. I’d expect it to be a bigger, better, faster variant of G45 (hopefully a lot better/faster) built on a 45nm process. This should beat AMD to the punch with the first single-chip CPU/GPU for mainstream desktops/notebooks, as AMD delayed its first APUs until 2011. Alongside the on-package GPU, Havendale also gets the same PCIe controller as Lynnfield.

The actual display output on Havendale will be through the chipset itself, but the GPU and PCIe interface are on the CPU’s package. Havendale only offers a single x16 PCIe slot; you can’t run it in 2 x8 mode.
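The difference between the two parts’ graphics lanes boils down to which link widths the 16 CPU lanes can be split into. Here’s a toy model of that, using only the lane counts and splits described above; the data structure and the `max_gpus`/`describe` helpers are hypothetical, and the model ignores the motherboard side entirely (slot wiring, and the SLI/CrossFire licensing mentioned earlier).

```python
# Toy model of the CPU-attached PCIe graphics lanes described in the article.
# Lane counts and allowed splits come from the text; the names and helper
# functions are hypothetical and ignore board-level wiring and licensing.

CPU_LANES = 16

ALLOWED_SPLITS = {
    "Lynnfield": [(16,), (8, 8)],  # one x16, or two x8 for 2-way SLI/CrossFire
    "Havendale": [(16,)],          # single x16 only; no 2 x8 mode
}


def max_gpus(cpu: str) -> int:
    """Most graphics cards the CPU's lanes can feed directly."""
    return max(len(split) for split in ALLOWED_SPLITS[cpu])


def describe(split: tuple) -> str:
    """Render a lane split such as (8, 8) as 'x8 + x8'."""
    return " + ".join(f"x{width}" for width in split)


for cpu, splits in ALLOWED_SPLITS.items():
    for split in splits:
        # Every supported mode uses exactly the 16 lanes on the CPU.
        assert sum(split) == CPU_LANES
    modes = ", ".join(describe(s) for s in splits)
    print(f"{cpu}: {modes} (max {max_gpus(cpu)} GPU(s))")
```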

At the right clock speeds, Havendale should be perfect for notebooks as well as desktops. These days, two cores with Hyper-Threading would be the ideal balance of cores and performance for the majority of consumers. As I noted in part 2 of our Nehalem coverage, give me Nehalem’s power efficiency in a notebook and I’ll be beyond happy.

The hiccup, however, is that we won’t see Havendale until Q1 2010. It’ll start production in Q4’09, but systems won’t ship until the beginning of the following year. That leaves a hole in Intel’s Nehalem roadmap: there won’t be any Intel integrated graphics chipsets for Nehalem between now and Q1 2010, which should give NVIDIA ample opportunity to sell chipsets into the mainstream Nehalem market.

33 Comments

  • IntelUser2000 - Saturday, November 22, 2008

    To: ltcommanderdata

    Actually, you can't compare to Dothan. You have to compare to Conroe/Penryn. Conroe's L2 latency is 14 cycles. I think it went up to make up for the complexity of the core (which is more complex than Dothan's). Nehalem makes it even more complex.

    The reason individual transistors can run at 200GHz+ in certain research labs, but nowhere near that in a commercial chip, is that every part of the chip has to be synchronized with the clock.

    CPU designers seem to take some chances when making a chip. That's likely the reason for delays on certain products: if you make a wrong decision, the prototypes might not come out as you wanted, and you have to make up for it.

    That's probably the reason Conroe didn't come with SMT, as the Israeli team managing the chip wasn't as experienced as the team that made the P4. They probably could have done it, but taking that risk would not have been a good idea.

    The Israeli team clings to proven technologies while the Hillsboro team comes up with more radical ones, like Trace Cache, Out of Order, SMT, etc.
  • JonnyDough - Friday, November 21, 2008

    It should be exactly like Penryn. Die shrink = less heat = higher clocks = performance increase.
  • ltcommanderdata - Friday, November 21, 2008

    The point is that Penryn was not just a dumb shrink of Conroe with added cache, the way Presler was of Smithfield. Penryn wasn't a major redesign, but it did have architectural tweaks over Conroe, including speeding up how the execution units divide numbers and execute shuffles. The FSB was also reworked to allow half multipliers, and lower power states were added in mobile versions. VT support was enhanced and, of course, SSE4.1 was added.

    I believe clock-for-clock Penryn is on average 5% faster than Conroe while the difference can be substantially higher for SSE4.1 optimized apps. When I say I hope Westmere is more like Penryn, I'm hoping for similar tweaks to be made to increase performance clock-for-clock, rather than just relying on 32nm to increase clock speeds. I don't believe Intel is releasing another SSE instruction set before AVX in Sandy Bridge, so I guess they'll have to dig deeper for a performance boost.
  • VaultDweller - Thursday, November 20, 2008

    "We’re finally getting wind of X58 motherboards at well below $300"

    Oh, please do share! This is what I'm interested in. Without this I would not even consider touching Nehalem with a ten foot pole.

    In the past I brushed off X38 and X48 completely, as it was so hard to find reasonable motherboards based on these chipsets. X58 is shaping up to be the same.

    The problem is that when I found X38 to be too expensive, I was able to make my peace with a P35 board (a P5K Premium). If I had been building a system when X48 was hot off the press, I could have found comfort knowing that P45 was right around the corner. There is no such comfort with Nehalem - the only lower-priced platform on the radar is based on a different socket, like S754 all over again.

    I don't want to cripple or limit the options for my next system build by going with LGA1156, but I don't want to pay $300-450 for a motherboard either.
  • heavyglow - Thursday, November 20, 2008

    This is exactly what I'm thinking. I'm concerned that Intel will abandon LGA1156 and I'll be left with nothing.
  • 3DoubleD - Thursday, November 20, 2008

    I can think of the reverse scenario where AMD abandoned the 940 platform and released all FX processors on 939. Neither option is safe, just pick one you don't mind sticking with if you have to.
  • Kiijibari - Thursday, November 20, 2008

    The L2 is so small because Nehalem is a 100% server design.

    Because of this, Intel went ahead with the inclusive cache design. It comes in quite handy in MP systems, since you only have to probe one L3 instead of four L1/L2 caches.

    But there is one drawback: a bigger L2 kills the benefit of the L3 size.
    Neglecting the L1 caches, Nehalem has an effective L3 size of 7MB, as 4x256KB of it is just data copied from the L2.
    Now imagine what would happen if Intel doubled the L2. The effective L3 size would shrink to 6MB, with 2MB wasted... that's a lot of transistors.

    To make the L2 problem worse, Intel reintroduced Hyper-Threading. Great technique, no doubt, but now we even have two threads fighting over the tiny, little 256KB cache.

    I guess all these decisions pay off in a server environment, but to state that Intel designed the small L2 caches only because of latency is just a fine excuse for all the wanna-be gamers who once heard that CL3 memory is better than CL5.

    cheers

    Kiiji
  • plonk420 - Thursday, November 20, 2008

    If 8-core i7s will work on X58, I'll likely bite sooner rather than doing a "wait and see."

    Does this seem highly likely? Or is it anyone's guess?
  • Casper42 - Thursday, November 20, 2008

    Speaking of which, I ran across this today by accident:

    http://www.ecs.com.tw/ECSWebSite/Downloads/Product...

    The ECS X58B-A
    Contains:
    6 DDR3 Slots
    2 x16 Slots
    1 x4 Slot
    2 x1 Slots
    1 PCI Slot

    The manual mentions SLI as well, which surprised me.

    With this ECS board, a 920 processor, and two 9800GTX+ cards (currently going for around $150 each), you could have a pretty potent little machine for around $1000.
  • iwodo - Thursday, November 20, 2008

    So we won't see a new mobile part until 2010?

    That doesn't sound right to me at all. If that is the case, then the rumours about it being a 32nm part may be right.

    However, the idea of Intel not updating their mobile part for 18 months doesn't sound right to me at all.
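Kiijibari's effective-L3 arithmetic above is easy to check. This is a minimal sketch that follows that comment's own simplification: L1 caches are ignored, and because the hierarchy is inclusive, every line sitting in a core's private L2 is assumed to also occupy space in the shared L3. The function name is hypothetical, and the doubled 512KB L2 is the commenter's what-if, not a real part.

```python
# Effective (non-duplicated) L3 capacity in an inclusive cache hierarchy,
# following the simplification in Kiijibari's comment: L1 caches are ignored
# and each core's private L2 is assumed to be fully mirrored in the L3.

L3_KB = 8 * 1024  # 8MB shared L3
CORES = 4


def effective_l3_kb(l3_kb: int, cores: int, l2_kb_per_core: int) -> int:
    """L3 capacity left after subtracting copies of every core's L2."""
    duplicated = cores * l2_kb_per_core
    return l3_kb - duplicated


actual = effective_l3_kb(L3_KB, CORES, 256)   # Nehalem's 256KB L2 per core
doubled = effective_l3_kb(L3_KB, CORES, 512)  # hypothetical doubled L2

print(f"256KB L2: {actual // 1024}MB effective L3")
print(f"512KB L2: {doubled // 1024}MB effective L3")
print(f"L3 holding L2 copies with 512KB L2: {(L3_KB - doubled) // 1024}MB")
```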
