We've never seen Intel with such a strong roadmap before; the company is truly firing on all cylinders and executing with amazing precision. Server, mobile, desktop and even new areas like ultra mobility and graphics all have absolutely wonderful roadmaps to look forward to. The biggest complaint we've had about Intel these days is that it kind of botched the X38 launch. Think back to a couple of years ago: what was our chief complaint then? Probably leaving us with power-hungry, underperforming processors for about five years. Today we're looking at a damn good Intel.

Recap: What's a Penryn?

Core, it's the architecture that shook an industry and today Intel is officially doing its first update to it. Prior to Intel's Core architecture, there wasn't much to get excited about when it came to Intel on the desktop.

At the same time, with Penryn Intel is very much the victim of its own success. How do you follow up such a tremendous splash with anything but equal greatness? AMD is close but still has yet to produce a response to Core 2, much less Penryn, and thus Intel's biggest competition today is itself.

In January 2007 Intel first showed off its 45nm high-K + metal gate transistors, a dramatic departure from Intel's current 65nm transistors not only in size and switching speed but in actual composition. If you remember back to the days of the original P6 processors, with a smaller transistor we saw tremendous improvements in die size, power consumption and performance. These days, such dramatic improvements are much harder to come by given that we're already dealing with such small transistor feature sizes. Gone are the days of the free lunch with each die shrink.

We've gone over the technical details of Intel's high-K + metal gate enhancements, but the end result is that at the same clock speed you can expect dramatic reductions in power. Alternatively, at the same power levels, you can achieve much higher switching rates and thus higher clock speeds.

The core architecture of Penryn remains unchanged from Conroe; with the smaller transistors Intel's able to fit in a few new features and more cache on the chip while still maintaining a smaller die size. Where each dual-core Conroe die measured 143 mm^2, Penryn is merely 107 mm^2 despite having 50% more cache (6MB vs. 4MB). Obviously the quad core chips double overall area but you get the point.

Intel also uses a lot more of these new 45nm transistors than before; while a dual-core Conroe was made up of 291 million transistors, the comparable Penryn weighs in at 410 million (582M vs. 820M for quad-core variants). You're getting roughly 40% more transistors and 50% more cache in a 25% smaller package; the smaller die is obviously most important to Intel as it helps reduce costs and drive profits up. So while it may seem generous, the move is purely self-motivated on Intel's part.
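For anyone who wants to check the math, here's a quick sketch in Python that recomputes those percentages from the dual-core figures quoted above. The transistor density number is our own derived value, not something Intel quotes, and the variable names are purely illustrative.

```python
# Dual-core die figures as quoted in the article.
conroe = {"area_mm2": 143, "transistors_m": 291, "l2_mb": 4}
penryn = {"area_mm2": 107, "transistors_m": 410, "l2_mb": 6}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Percentage change for each quoted figure, Conroe -> Penryn.
changes = {key: percent_change(conroe[key], penryn[key]) for key in conroe}

# Derived value (not quoted by Intel): millions of transistors per mm^2.
density_conroe = conroe["transistors_m"] / conroe["area_mm2"]
density_penryn = penryn["transistors_m"] / penryn["area_mm2"]

for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
print(f"density: {density_conroe:.2f} -> {density_penryn:.2f} Mtransistors/mm^2")
```

The die area works out to about 25% smaller and the transistor count to about 41% higher, in line with the rounded figures above.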

The larger cache is a bit different than what we've seen in Conroe. While Conroe's cache is a 4MB 16-way set-associative L2, the 6MB Penryn cache is 24-way set-associative, designed to improve hit rates and keep latency manageable in an already large cache. Intel hasn't revealed whether Penryn's prefetchers have been adjusted to help populate its larger cache any better. As we saw in our original Penryn preview, Penryn's cache performance remains unchanged; latencies in our final stepping are identical to Conroe.
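One way to see why 24-way makes sense for a 6MB cache is to count the sets. The sketch below assumes 64-byte cache lines, which the article does not state, and the reasoning is our inference rather than anything Intel has confirmed: going from 16 to 24 ways while going from 4MB to 6MB leaves the number of sets unchanged.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines (not stated in the article)


def num_sets(size_mb, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache of the given size."""
    size_bytes = size_mb * 1024 * 1024
    return size_bytes // (ways * line_bytes)


conroe_sets = num_sets(4, 16)         # 4MB, 16-way
penryn_sets = num_sets(6, 24)         # 6MB, 24-way
hypothetical_sets = num_sets(6, 16)   # 6MB if it had stayed 16-way

print(conroe_sets, penryn_sets, hypothetical_sets)
```

Under that assumption both caches have the same power-of-two set count, so the index bits stay the same and the extra capacity shows up as additional ways per set. A 6MB cache kept at 16 ways would need a set count that is not a power of two.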

The cache enhancements are by far the biggest consumer of those extra transistors in Penryn, but believe it or not, they aren't responsible for the biggest performance boost. Intel has been fairly steady in adding new instructions to the x86 ISA and Penryn continues the trend with the addition of SSE4. Penryn gets 47 new instructions that make up the first implementation of SSE4; more will come with Nehalem at the end of 2008. We'll talk about SSE4 performance later on in this article, but here are the instructions you get with Penryn:

Penryn also implements a new divider that impacts both integer and floating point divides using a radix-16 algorithm. The algorithm computes more bits of the result of a divide each pass (four bits per iteration vs. two bits in Conroe), decreasing divide latency.
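To illustrate why retiring more bits per pass cuts latency, here's a simplified software model of a digit-by-digit divider. This is not Intel's circuit; the function name, the 32-bit width and the use of Python's integer division to pick each quotient digit are all our own assumptions for the sake of the sketch. It only shows how the iteration count falls when each pass produces four bits instead of two.

```python
def divide(dividend, divisor, width=32, bits_per_step=4):
    """Unsigned long division that retires bits_per_step quotient bits per pass.

    Returns (quotient, remainder, passes). Illustrative model only.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    passes = -(-width // bits_per_step)  # ceiling division
    mask = (1 << bits_per_step) - 1
    quotient = 0
    remainder = 0
    for i in range(passes - 1, -1, -1):
        # Bring down the next group of dividend bits.
        digit = (dividend >> (i * bits_per_step)) & mask
        remainder = (remainder << bits_per_step) | digit
        # Hardware would select this digit by comparing against multiples
        # of the divisor; here we simply compute it directly.
        q_digit = remainder // divisor
        remainder -= q_digit * divisor
        quotient = (quotient << bits_per_step) | q_digit
    return quotient, remainder, passes


radix4 = divide(1000, 7, bits_per_step=2)    # two bits per pass
radix16 = divide(1000, 7, bits_per_step=4)   # four bits per pass
print(radix4, radix16)
```

Both calls return the same quotient and remainder, but in this model the four-bit version needs half as many passes over a 32-bit operand.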

The faster divider is a very specific enhancement that should manifest itself as a performance boost in 3D and imaging applications.

Penryn's Super Shuffle Engine should also improve SSE2, SSE3 and SSE4 applications that use a lot of shuffle operations. Cache performance is also improved slightly for misaligned stores, which should improve performance, once again, in 3D and imaging applications. Finally, there are some power enhancements made to Penryn, but these are mobile-specific and thus don't apply to any of the desktop variants.


16 Comments


  • emenk - Sunday, January 20, 2008 - link

    From first page (this article): "As we saw in our original Penryn preview, Penryn's cache performance remains unchanged; latencies in our final stepping are identical to Conroe."

    From the original Penryn preview (3rd page):
    "Not only is Wolfdale's L2 cache larger, but it also happens to be slightly faster than its predecessor. Intel has shaved off a single clock cycle from Wolfdale's L2 access time; we're already off to a good start."

    Isn't this a contradiction?

    Ignore this (testing quote tags):
    [quote]Quote goes here.[/quote]
  • IntelUser2000 - Tuesday, October 30, 2007 - link

    You know that will not be true in the true Phenom comparison right Anand?? Take a look here: http://techreport.com/articles.x/8236/14

    Dual Opteron is slower than a Single Opteron, yet you still used Dual Opteron against a single Barcelona. Why?? No really, WHY?!?

    "Because of these limitations we refrained from running any comparative benchmarks to desktop Athlon 64 X2s, instead we chose to run a single quad-core Opteron in our server platform against a pair of dual-core Opterons to simulate Phenom vs. K8 on the desktop."

    You could have taken games like Oblivion with Single socket Opteron to see the real advantages. This is the worst comparison, ever. And to make it worse, you put "simulated" benchmarks.
  • victory - Tuesday, October 30, 2007 - link

    Wouldn't Intel be able to take immediate advantage of the new SSE4
    instructions in a new integrated graphics chipset perhaps then
    competing with nVidia as well as beating AMD's integrated chipsets?
  • magreen - Monday, October 29, 2007 - link

    It does 4GHz easily on the stock cooler? So why don't you strap a TR ultra 120 ex on there and tell us what it can really do? Cmon Anand, stop teasing us and tell us what we really want to know!
  • AnnihilatorX - Monday, October 29, 2007 - link

    It's a shame that they delayed the release date of the more affordable Yorkfields to January, just missing the Christmas sales.

    I am planning to upgrade my computer and not sure whether to wait for Yorkfield or buy a Q6600.
  • idgaf13 - Monday, October 29, 2007 - link

    Intel is trying to suppress Christmas sales and have a negative influence on "other companies" earnings while relieving themselves of Old Inventory.
    45nm process is going to produce so many CPUs per wafer that prices will fall fast or inventory will rise quickly.
    With respect to the traditional cycle of product releases and price changes,
    a January launch date allows for the longest possible time before prices begin to tumble,
    typically after the trade shows in the first two quarters of the year.
    It also allows more time to perfect the production process.
    Question is, do you really need to be "the first on the block" to have this CPU?
    Or can you wait until the price falls by 50% or June/July for the best price?
    Possibly even a faster CPU by then.
  • MGSsancho - Monday, October 29, 2007 - link

    anand, could you be so kind as to point to where you got the info on the new sse4 instructions? the chart would be cool but some pdfs or something from Intel would be awesome
  • jsaldate - Friday, November 9, 2007 - link

    Penryn SDK: http://softwarecommunity.intel.com/articles/eng/11...
  • Ryan Smith - Monday, October 29, 2007 - link

    From Intel's website: http://www.intel.com/technology/architecture-silic...
  • MGSsancho - Monday, October 29, 2007 - link

    thanks a lot =)
