Introduction

Wide Dynamic Execution, Advanced Digital Media Boost, Smart Memory Access, and Advanced Smart Cache: those are the technologies that, according to the marketing people at Intel, enable the company to build high-performance, low-energy CPUs using the new Core architecture.

Of course, as an AnandTech reader, you couldn't care less about which Hyper Super Advanced Label the marketing folks glue on their CPUs. "Extend the digital lifestyle by combining robust performance with low power consumption" could have been another marketing claim for the new Core architecture, but VIA already cornered that sentence for its C7 CPUs. The marketing slogans for Intel's Core and VIA's C7 are almost the same; the architectures, however, are vastly different.

No, let us find out what is really behind all this marketing hyper-talk, and preferably compare it with the AMD "K8" (Athlon 64, Opteron) architecture and Intel's own NetBurst and Pentium M processors. That is what this article is all about. We talked to Jack Doweck, the engineer who designed the completely new Memory Reorder Buffer and memory disambiguation system. Jack is one of the Intel Israel Development Center (IDC) architects.

The Intel "P8"

Intel marketing states that Core is a blend of P-M techniques and NetBurst architecture. However, Core is clearly a descendant of the Pentium Pro, or the P6 architecture. It is very hard to find anything "Pentium 4" or "NetBurst" in the Core architecture. While talking to Jack Doweck, it became clear that only the prefetching was inspired by experiences with the Pentium 4. Everything else is an evolution of "Yonah" (Core Duo), which was itself an improvement of Dothan and Banias. Those CPUs inherited the bus of the Pentium 4, but are still clearly children of the hugely successful P6 architecture. In a sense, you could call Core the "P8" architecture, with Banias/Dothan being based on the "P7" architecture. (Note that the architecture of Banias/Dothan was never given an official name, so we will refer to it as "P-M" for simplicity's sake.)

Of course this doesn't mean that Intel's engineers just bolted a few functional units and a few decoders on Yonah and called it a day. Jack told us that Woodcrest/Conroe/Merom are indeed based on Yonah, but that almost 80% of both the architecture and circuit design had to be redone.

CPU architecture in a nutshell

For those of you who are not so familiar with CPUs, we'll start with a crash course in CPU architectures. To understand CPU design, you must first look at the instructions that are sent to the CPU, and thus we start with the software.

Typical x86 software code consists of about 50% stores and loads, and there are about twice as many loads as there are stores. Of the remainder, about 15 to 20% of the instructions are branches (If, Then, Else), and the rest are mostly "ADD" (addition) and "MUL" (multiply) instructions. Only a very small percentage of code consists of more exotic instructions such as DIV (divisions), SQRT (square root), or other higher order math (e.g. trigonometric functions).

All these instructions are processed in a typical "Von Neumann" pipeline: Fetch, Decode, Operand Fetch, Execute, Retire.

Instructions are fetched based on the instruction pointer register, and initially they are nothing but long bit patterns to the CPU. It's only after the CPU starts decoding the bits that the instructions "start to make sense" to the CPU. Addresses and opcodes are decoded out of the instructions, and the addresses are used for the next step: the operand fetch. As you don't want the CPU to perform calculations with the addresses but rather on the content of these addresses - the "operands" - the CPU has to fetch the right data out of the data cache. Once these operands are put in the registers, the ALU is steered by the "opcode" (which has been decoded) to perform the right calculation on the operands in the registers.

The results are written to the architectural register file, the registers which can be used by the compiler. The results must also be written to the caches and the main memory, so that these are also up to date. That is the final phase, the retire phase. That is basically how processing works in all CPUs.
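
The five steps above can be sketched as a toy model in Python. This is only an illustration under our own assumptions: the 16-bit instruction format, the opcode numbers, the function names, and the trick of using the destination number as both a register index and a memory address are all invented here and do not correspond to x86 or to any real CPU.

```python
# Toy model of Fetch, Decode, Operand Fetch, Execute, Retire.
# The instruction format and opcode numbers are hypothetical.

OPCODES = {
    0x1: ("ADD", lambda a, b: a + b),
    0x2: ("MUL", lambda a, b: a * b),
}


def encode(opcode, dest, addr_a, addr_b):
    """Pack four 4-bit fields into one 16-bit instruction word."""
    return (opcode << 12) | (dest << 8) | (addr_a << 4) | addr_b


def run(program, data):
    """Process every instruction in `program`, one step at a time."""
    registers = [0] * 16  # architectural register file
    ip = 0                # instruction pointer
    while ip < len(program):
        # 1. Fetch: at this point the instruction is just a bit pattern.
        word = program[ip]
        # 2. Decode: split the bits into an opcode and addresses.
        opcode = (word >> 12) & 0xF
        dest = (word >> 8) & 0xF
        addr_a = (word >> 4) & 0xF
        addr_b = word & 0xF
        # 3. Operand fetch: read the content of the addresses.
        a, b = data[addr_a], data[addr_b]
        # 4. Execute: the decoded opcode steers the ALU.
        _name, alu_op = OPCODES[opcode]
        result = alu_op(a, b)
        # 5. Retire: update the register file and memory
        #    (simplification: `dest` indexes both).
        registers[dest] = result
        data[dest] = result
        ip += 1
    return registers, data


memory = [0] * 16
memory[1], memory[2] = 3, 4
program = [
    encode(0x1, 5, 1, 2),  # slot 5 = memory[1] + memory[2]
    encode(0x2, 6, 5, 2),  # slot 6 = memory[5] * memory[2]
]
regs, memory = run(program, memory)
print(regs[5], regs[6])  # prints: 7 28
```

A real CPU overlaps these steps for many instructions at once; the sketch processes one instruction at a time purely to make the order of the steps visible.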

The main challenge for the CPU designer today is the average memory latency the CPU sees. A Pentium 4 3.6 GHz with DDR-400 runs no less than 18 times faster than the base clock of the RAM (200 MHz). For every cycle the memory takes, a minimum of 18 cycles pass on the CPU. At the same time, it takes several cycles to even send a request, and it takes a few cycles to send the data back. (We discussed this in the past in our memory technology overview article.) The result is that wait times of 200 to 300 cycles are not uncommon on the Pentium 4. The goal of the CPU cache is to avoid accessing RAM, but even if the CPU only has to go to system memory 4% of the time, that 4% can lower performance significantly.
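
To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. The clock speeds, the 200 to 300 cycle wait, and the 4% figure come from the text above; the 3-cycle cost of a cache hit and the function names are our own assumptions for illustration, not measured values.

```python
def clock_ratio(cpu_mhz, mem_mhz):
    """How many CPU cycles pass during one memory base-clock cycle."""
    return cpu_mhz / mem_mhz


def average_access_cycles(hit_cycles, miss_percent, miss_penalty_cycles):
    """Average cost of a memory access, in CPU cycles.

    Every access pays the cache hit cost; `miss_percent` percent of
    accesses additionally pay the trip to system memory.
    """
    return hit_cycles + miss_percent * miss_penalty_cycles / 100


# Pentium 4 at 3.6 GHz against the 200 MHz base clock of DDR-400.
print(clock_ratio(3600, 200))  # 18.0

# Assumed 3-cycle cache hit; 4% of accesses go to RAM.
for penalty in (200, 300):
    print(penalty, average_access_cycles(3, 4, penalty))
# 200 11.0
# 300 15.0
```

Under these assumptions, a 4% miss rate makes the average access roughly four to five times as expensive as a cache hit, which is why such a small percentage matters so much.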


85 Comments


  • PandaBear - Monday, May 01, 2006

    Of course Core should be better than K8, it better be.

    The only thing I am concerned about the Core architecture is with all these additional stuff, it will probably cost a lot to make, not just the CPU, but the MB, chipset, will also be expensive with the additional high speed circuitry. That means it will probably cost more.

    K8 has been 5 years old and it is not bad standing against the latest and greatest. If AMD have something in the pipeline that will be the next monster CPU, it will be great. What I am concern about AMD is whether they can keep their yield up and have enough $ left behind to design K9 and beyond. Don't just sit there and lose the momentum they gain.
  • saratoga - Monday, May 01, 2006

    Core is a pretty conservative design with a pretty small die for a new core. It should be very economical to produce. Probably more so than the chips it's replacing.
  • IntelUser2000 - Monday, May 01, 2006

    quote:

    Of course Core should be better than K8, it better be.

    The only thing I am concerned about the Core architecture is with all these additional stuff, it will probably cost a lot to make, not just the CPU, but the MB, chipset, will also be expensive with the additional high speed circuitry. That means it will probably cost more.


    Not really. Not many expected that Intel will do more than increasing clock speeds and cache sizes since that's what they have been doing that since Pentium II.

    http://www.reghardware.co.uk/2006/04/05/intel_conr...

    The ASP went down. $530 for the fastest mainstream Conroe is rather good.
  • zsdersw - Monday, May 01, 2006

    The pricing put out by Intel suggests that Core will be priced very aggressively. I can't see the 975 chipset costing significantly more than it does now when Core is released.

    The fact that Core is going to be built on Intel's 65nm process means that the "additional stuff" you refer to will cost less than it would if built on the 90nm process. And the die size probably grew a little, but not enough to offset the cost gains from the 65nm process.
  • xtremejack - Monday, May 01, 2006

    K8 is only 3 years old. Didn't AMD celebrate their 3rd anniversary of Opteron a few days ago?
  • Griswold - Thursday, May 04, 2006

    It's been sold for 3 years, but clearly the design is "a few days" older than that.
  • evident - Monday, May 01, 2006

    as a junior computer engineer at villanova university, i found this article to be really informative and an awesome read. it's really cool to see the differences between these CPU architectures and shows that they are actually teaching me something useful!
  • PeteRoy - Monday, May 01, 2006

    How can you say Netburst wasn't a huge success?

    I think Netburst was a success when it was launched and it should have died sooner, but it was good for it time and now it will be replaced.
  • JarredWalton - Monday, May 01, 2006

    NetBurst started at 1.5 GHz basically and topped out at 3.8 GHz. Compared to previous architectures, that's pretty tame. P6 went from 150 MHz to 1.26 GHz (and beyond if you want to count P-M). Success monetarily vs. success as an overall design are two different things, and clearly NetBurst ran into trouble. Where are the 5 GHz+ Tejas chips? Waiting somewhere beyond the thermal event horizon.... :)
  • Missing Ghost - Monday, May 01, 2006

    hum, no. There is a 1.4 GHz P6, you forgot Tualatin.
