Core Duo - A High Level Architectural Overview

We have talked extensively about Intel’s Core Duo processor since back when it was called Yonah. While we would have liked to bring you an extremely detailed report on exactly what was done architecturally in Yonah to make it what it is today, the fact of the matter is that Intel just isn’t very forthcoming with this sort of information.

Intel has been very protective of its Centrino processor architectures ever since the platform’s introduction. We’ve always been given bits and pieces of information, but never the full disclosure we’ve hoped for. Even to this day Intel has not disclosed the exact number of pipeline stages in the Pentium M or Core Duo processors. Intel was most forthcoming about its first Pentium M processor, Banias. With every successor, the flow of information became far more marketing-driven and far less technical. We do hope that at some point this won’t be the case, but until then we will have to do the best with what we’re given. What follows is a brief architectural overview of Intel’s Core Duo to help you understand where some of the performance advantages and, more importantly, the improvements in power consumption come from at an architectural level.

Intel’s Smart Cache

The very first thing you can take for granted about the Core Duo processor is that all of the advancements in performance and power efficiency seen in previous Pentium M processors are rolled into the Core Duo. That means the power saving cache in Banias and Dothan is still here today, as is the unique method of architecting the CPU for a fixed clock frequency rather than maximum performance.

If you take for granted the years of work that Intel’s Israel team put into Banias and Dothan (which I’m sure they would love for you to do), it makes understanding the improvements of Core Duo all that much easier. 

While it’s easy to assume that the biggest change in Yonah is the fact that it is a dual core processor, the largest impacts actually come from how those two cores interact and not the fact that there are two of them to begin with.  One such example is what Intel is calling their “Smart Cache”, which is just a really terrible way of saying that the two cores share a common 2MB L2 cache. 

In AMD’s Athlon 64 X2 design there is a System Request Queue (SRQ) that queues up memory requests from each core and sends the requests off to either main memory or one of the on-die caches. 


How core-to-core communication works on the Athlon 64 X2  

The beauty of the SRQ is that the two cores can communicate with one another at core speed. This is in stark contrast to the way the Pentium D works, where all cache-to-cache requests must actually leave one core and go out onto the external bus before finally being sent to the other core.


How core-to-core communication works on the Pentium D  

Why do you need to communicate between two cores?  If both cores are working on the same data, they must be in communication to determine which cache actually contains the latest and most correct copy of the data before using it (either for further operations or for writing it to main memory).  In order to ensure that only one core has the “correct” copy of the data, the individual cores may put requests on the bus (or SRQ in the case of the Athlon 64 X2) asking to invalidate the copy of the data that should not be used.  This sort of traffic does take up bus bandwidth and is much better handled on-die than over a higher latency external bus. 
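
To make the invalidation traffic concrete, here is a minimal sketch of a write-invalidate scheme in Python. It is a toy model, not a description of how the Core Duo, Pentium D, or Athlon 64 X2 actually implement coherence: the `CoherentSystem` class, its method names, and the simple write-through policy are all assumptions made for illustration, and real processors use richer protocols such as MESI and its variants.

```python
class CoherentSystem:
    """Toy write-invalidate model (hypothetical, for illustration only)."""

    def __init__(self, n_cores=2):
        self.memory = {}                                 # address -> value
        self.caches = [dict() for _ in range(n_cores)]   # one private cache per core
        self.invalidations = 0                           # invalidate messages sent so far

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:
            # Miss: fetch the current value from memory (0 if never written).
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        # Ask every other core to drop its copy of this address.
        # This is the traffic that would travel over the bus or the SRQ.
        for other, cache in enumerate(self.caches):
            if other != core and addr in cache:
                del cache[addr]
                self.invalidations += 1
        # Write-through (an assumption): update the writer's cache and memory.
        self.caches[core][addr] = value
        self.memory[addr] = value
```

If core 0 writes an address that core 1 has already read, core 1's copy is invalidated and its next read fetches the new value. Every such invalidation is a message between the cores, which is why the latency of the path between them matters.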

With a shared L2 cache, core-to-core communication can happen much faster than on the Pentium D, since the cache runs at core clock speed, making the Core Duo a lot more like the Athlon 64 X2 in that regard. It still lacks an on-die memory controller, but communication between the two cores is improved. It is worth noting that even when comparing AMD’s Athlon 64 X2 with its SRQ to Intel’s Pentium D, which lacked any low latency core-to-core communication, the real world impacts in desktop applications were tough to find. That being said, we would rather have the benefit on paper and have it hard to prove in the real world than not have it at all.


Core-to-core communication on the Core Duo

The other benefit of Intel’s Smart Cache is that it can be dynamically resized depending on the needs of the individual cores. So if one core is running idle, the other core can get full access to the 2MB L2 cache. If both are active, they are able to split the 2MB of cache depending on their needs, which means that as long as both cores wouldn’t each benefit from a full 2MB cache, the overall efficiency of the chip is better than that of a similar design with two separate 2MB caches.
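
The effect of sharing can be illustrated with a small simulation. The sketch below is not Intel's allocation policy, which has not been disclosed; it assumes a plain LRU cache, a made-up capacity of 8 lines, and made-up working sets of 6 and 2 lines for the two cores. It compares one shared cache against the same total capacity split statically in half, which is a different comparison from the two separate 2MB caches mentioned above. The names `LRUCache` and `count_hits` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that tracks only which lines are present."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss (filling the line)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None
        return False


def count_hits(shared, steps=60, ws0=6, ws1=2, total_lines=8):
    """Count hits when two cores loop over working sets of ws0 and ws1 lines."""
    if shared:
        cache = LRUCache(total_lines)
        caches = [cache, cache]              # both cores use the same cache
    else:
        half = total_lines // 2
        caches = [LRUCache(half), LRUCache(half)]
    hits = 0
    for i in range(steps):
        hits += caches[0].access(("core0", i % ws0))
        hits += caches[1].access(("core1", i % ws1))
    return hits
```

With these assumed numbers, `count_hits(shared=True)` returns 112 out of 120 accesses, while `count_hits(shared=False)` returns 58: the core with the larger working set thrashes its fixed half while the other core leaves most of its half unused.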

As the size of the L2 cache is changed, its usage is also monitored. If it is determined that the cache can safely be flushed to main memory and powered down, the cache controller will do so to keep CPU power consumption down. The idea here is that refreshing main memory will eat up less power than keeping the on-die cache running and active.

Intel’s Digital Media Boost

Nothing says an increase in decoder throughput and instruction level parallelism better than Digital Media Boost, which is what the next group of Core Duo’s enhancements is called.

We’ve known about Digital Media Boost for a while now, and Intel actually publicly disclosed information about the Boost at the last IDF.  Unfortunately since then there has been no new information, so all we can report on is what we’ve already talked about:

Making Pentium M more "Media Friendly"

All of the major performance improvements to each of Yonah's cores seem to revolve around SIMD FP and FP performance, two of the Pentium M's present day weaknesses in comparison to the Pentium 4.

The first improvement is that now all three of Yonah's decoders can decode SSE instructions, regardless of the type of instruction. Improving the decode width of the processor is a quick way to improve performance.

Next, SSE/SSE2 operations (not sure if all can be, but at least some) can now be fused using the Micro Ops Fusion engine of Yonah. At a high level, the benefit here is increased performance and lower power consumption; we'll get into architectural details of why that is when we eventually sink our teeth into Yonah next year.

Each of the two cores in Yonah has also received support for SSE3 instructions, much like the Pentium 4 E [Prescott].

And finally there have been some improvements to Yonah's floating point performance, although Mooly would not say exactly what's been done. Curiously, Mooly referred to the floating point performance improvements as specifically made to improve gaming performance. Intel may have grander plans for Yonah than once thought...

The SSE/FP optimizations are all being grouped into what Intel is calling their Digital Media Boost technology. Yes, the names seem to get worse and worse as time goes on, but at least the functionality should be good.

 


29 Comments


  • Shark Tek - Thursday, January 5, 2006 - link

    Let's hope that AMD releases Turion X2s with even more reduced power consumption and DDR2 support. That will be really "sweet".

    Can you imagine a notebook with a Turion X2 or Core Duo matched with an X1800 Mobility Radeon? That would be the perfect combination for LAN parties, without the need to carry heavy parts from your desktop at home.



    Just imagine that ....
  • coldpower27 - Thursday, January 5, 2006 - link

    Very impressive.
  • monsoon - Thursday, January 5, 2006 - link

    Yeah, me too, I'm curious about the Apple products coming with Yonah, and how they stack up to X2 Athlons, PC Yonah notebooks...

    ...and overclocking !!!

    PS - BTW did you try to overclock the ASUS Yonah notebook ?
  • PeteRoy - Thursday, January 5, 2006 - link

    no
  • Doormat - Thursday, January 5, 2006 - link

    Page loads took forever but the review was interesting.

    I'm still interested to see what Apple does with these chips in their iBooks next week.

    The battery life of the T60 was impressive - 227 minutes for DVD playback. Finally, I can watch an LOTR episode on one battery!

    The release of only 1 single core chip speaks volumes - Intel is ditching single core chips when they can. They want to push dual core hard.
  • Calin - Friday, January 6, 2006 - link

    In DVD playback the DVD unit consumes some of the power... I wonder if playing a DVD from a virtual drive or from a network would prolong battery life.
  • Furen - Thursday, January 5, 2006 - link

    Very lovely power consumption. I suppose power consumption will be a bit higher when both cores are at 100% usage, but most of us don't keep our CPU usage pegged at 100% when using a notebook, and especially not if we care about power consumption at all. It'd be nice if Intel had decided to go to 90nm on the chipsets, but I suppose their power consumption is not that high to begin with and Intel needs a use for its 130nm fabs...
  • Calin - Friday, January 6, 2006 - link

    Of course the power consumption will be higher with both cores at 100% usage - but in this case the "work per watt" is greater, as processors don't use all the power in the system.
    It's just that people would prefer a laptop that consumes a battery charge faster but finishes the work much faster, rather than the other way around.
  • cheburashka - Thursday, January 5, 2006 - link

    Intel's chipset shortage problem is because all current MCH's are still on 130nm, which is maxed out in the fabs. They would love to get the 90nm Broadwater/Crestline chips out the door to free up 130nm capacity to build low end parts again.
