Not Another Conroe

Comparing Conroe to Pentium 4 was night-and-day, the former was such a radical departure from the NetBurst micro-architecture that seemingly everything was done differently. The Pentium 4 needed a tremendous amount of software optimization to actually extract performance from that chip, Intel has since learned its lesson and no longer expects the software community to re-compile and re-optimize code for every new architecture. Nehalem had to be fast out of the box, so it was designed that way.

Conroe was the first Intel processor to introduce this 4-issue front end. The processor could decode, rename and retire up to four micro-ops at the same time. Conroe’s width actually went under utilized a great deal of the time, something that Nehalem did address, but fundamentally there was no reason to go wider.

Intel introduced macro-ops fusion in Conroe, a feature where two coupled x86 instructions could be “fused” and treated as one. They would decode, execute and retire as a single instruction instead of two, effectively widening the hardware in certain situations.

Nehalem added additional instructions that could be fused together, in addition to all of the cases supported in existing Core 2 chips:

The other macro-ops fusion enhancement is that now 64-bit instructions can be fused together, whereas in the past only 32-bit instructions could be. It’s a slight performance improvement but 64-bit code could see a performance improvement on Nehalem.

Looking at Nehalem Improved Loop Stream Detection


View All Comments

  • defter - Friday, August 22, 2008 - link

    Links are 20-bit wide, regardless of encoding or whether 1,2,8,16 or 20 bits are used to tranmist the data.

    I wonder who is flamebaiting here, a previous poster just mentioned the correct link width, he wasn't talking about "usable speed".
  • rbadger - Thursday, August 21, 2008 - link

    "Each QPI link is bi-directional supporting 6.4 GT/s per link. Each link is 2-bytes wide..."

    This is actually incorrect. Each link is 20 bits wide, not 16 (2 bytes). This information is on the slide posted directly below the paragraph.
  • JarredWalton - Thursday, August 21, 2008 - link

    It's 20-bits but using a standard 8/10 encoding mechanism, so of the 20 bits only 16 are used to transmit data and the other four bits are (I believe) for clock signaling and/or error correction. It's the same thing we see with SATA and HyperTransport. Reply
  • ltcommanderdata - Thursday, August 21, 2008 - link

    Since the PCU has a firmware, I wonder if it will be updatable? It would be useful if lessons learn in the power management logic of later steppings and in Westmere can be brought back to all Nehalems through a firmware update for lower power consumption or even better performance with better Turbo mode application. Although a failed or corrupt firmware update on a CPU could be very problematic. Reply
  • wingless - Thursday, August 21, 2008 - link

    I thought about this when I read about it the first time too. Flashing your CPU could kill the power management or the whole CPU in one fell swoop! Reply

Log in

Don't have an account? Sign up now