New Westmere Details Emerge: Power Efficiency and 4/6 Core Plans

Author: Anand Lal Shimpi

by Anand Lal Shimpi on February 3, 2010 12:00 AM EST

Posted in
CPUs

Today Intel started talking about its ISSCC plans, and the conference call included some details on Westmere that I didn't previously know. Most of them have to do with power savings, but there was also some talk about 32nm quad-core Westmere derivatives!

Westmere is Intel’s 32nm Nehalem derivative. Take Nehalem with all of its inherent goodness, add AES instructions, build it using 32nm transistors and you’ve got Westmere.

Westmere's Secret: Power Gated Un-Core

We just recently met the first incarnation of Westmere: Clarkdale, the dual-core processor that’s been branded the Core i3 and Core i5. Later this quarter we’ll meet Gulftown, a six-core Westmere that’ll be sold under the Core i7 label. All of that is old news; now for the new stuff.

With Nehalem Intel started power gating parts of the chip. Stick a power gate transistor in front of the supply voltage to each core and you can effectively shut off power (including leakage power) to the core when it’s not in use. This was a huge step in increasing power efficiency, something that’s evident when you look at Nehalem idle power numbers.

When you shut off a core you need to save the core’s state so that when it wakes back up it knows what to do next. Remember that powering down these cores can happen dozens of times in the course of a second. The cores can’t wake up in a reboot state; they need to simply shut off when they’re not needed and wake back up to continue work when they are needed.

In Nehalem the core’s state (what instruction it’s going to work on next, data in its registers, etc...) is saved in the last level cache - L3. Unfortunately this means that the L3 cache can’t be powered down when the cores are idle, because that’s where they store their state information. Take this one step further and it also means that Nehalem’s L3 cache wasn’t power-gated.

In Westmere, Intel has added a dedicated SRAM to store core state data. Each core dumps its state information into the dedicated SRAM and then shuts off. With the state data kept out of the L3 cache, Westmere takes the next logical step and power gates the L3.
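To make the difference concrete, here is a minimal toy model of the idea in Python. It is only a sketch of the reasoning above, not a description of Intel's actual circuitry: the `Chip` class, its method names and the `has_state_sram` flag are all hypothetical, and the only rule it encodes is that the L3 can be gated once every core is asleep and no core state is parked in it.

```python
from dataclasses import dataclass, field


@dataclass
class Chip:
    """Toy model only; names and behavior are illustrative assumptions."""
    num_cores: int
    has_state_sram: bool  # True: Westmere-style dedicated SRAM, False: Nehalem-style
    l3_state: dict = field(default_factory=dict)    # core state parked in the L3
    sram_state: dict = field(default_factory=dict)  # core state parked in dedicated SRAM
    core_powered: list = field(default_factory=list)

    def __post_init__(self):
        # Every core starts out powered on.
        self.core_powered = [True] * self.num_cores

    def _store(self):
        # Pick where sleeping cores keep their state.
        return self.sram_state if self.has_state_sram else self.l3_state

    def sleep_core(self, core_id, state):
        # Save the core's state, then cut power to the core.
        self._store()[core_id] = state
        self.core_powered[core_id] = False

    def wake_core(self, core_id):
        # Restore power and hand back the saved state so work can resume.
        self.core_powered[core_id] = True
        return self._store().pop(core_id)

    def can_gate_l3(self):
        # The L3 may be powered down only if all cores are idle
        # and it is not holding any core's saved state.
        return not any(self.core_powered) and not self.l3_state


nehalem_like = Chip(num_cores=2, has_state_sram=False)
westmere_like = Chip(num_cores=2, has_state_sram=True)
for chip in (nehalem_like, westmere_like):
    for core in range(chip.num_cores):
        chip.sleep_core(core, {"next_instruction": 0x1000 + core})

print(nehalem_like.can_gate_l3())   # state lives in L3, so it must stay powered
print(westmere_like.can_gate_l3())  # state lives in the SRAM, so L3 can be gated
```

In this model the only thing that changes between the two chips is where the state is parked, which is exactly why the dedicated SRAM is the prerequisite for gating the L3.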

Intel lists this dedicated SRAM as a Westmere-mobile feature, so there’s a chance it’s not present on the desktop chips. But it makes sense: without a way of powering down the L3 cache, Westmere would be a very power-hungry mobile CPU. The dedicated SRAM appears to be what makes it mobile-friendly.

Hex and Quad Core Westmere in 2010?

The last bits of information Intel revealed have to do with its high end desktop/workstation/server intentions with Westmere. The 6-core Westmere is a 240mm² chip made up of 1.17B transistors.

That’s six cores on a single die, but with 12MB of L3 cache. Remember that Nehalem/Lynnfield have 8MB and Clarkdale has 4MB. Nehalem’s chief architect, Ronak Singhal, told me that he wanted to maintain at least 2MB of L3 per core on the die. A 6-core Westmere adheres to that policy.
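The 2MB-per-core rule is easy to check with a few lines of Python. This is just the arithmetic from the paragraph above, using the cache sizes quoted here and the core counts of each chip; the dictionary and function names are my own.

```python
# (cores, L3 cache in MB) for each chip mentioned above
chips = {
    "Westmere 6C (Gulftown)": (6, 12),
    "Nehalem/Lynnfield 4C": (4, 8),
    "Westmere 2C (Clarkdale)": (2, 4),
}


def l3_per_core_mb(cores, l3_mb):
    """Return the amount of L3 cache available per core, in MB."""
    return l3_mb / cores


for name, (cores, l3_mb) in chips.items():
    print(f"{name}: {l3_per_core_mb(cores, l3_mb):.1f} MB of L3 per core")
```

All three parts land on the same ratio, which is why a six-core die needs 12MB of L3 to stay within the policy.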

The chip works in existing LGA-1366 sockets, so you still have three DDR3 memory channels. 6C Westmere supports both regular DDR3 (1.5V) and low voltage DDR3 (1.35V). This is particularly useful in servers, where you’ve got a lot of memory present and power consumption should be noticeably lower.

The other big news is that Intel will be releasing 4-core variants of Westmere as well. While I originally assumed this would mean desktop and server, Intel hasn't committed to anything other than a quad-core Westmere. These parts could end up as server only or server and desktop.

The table below shows you the beauty of 32nm. Smaller die, more transistors:

CPU	Codename	Manufacturing Process	Cores	Transistor Count	Die Size
Westmere 6C	Gulftown	32nm	6	1.17B	240mm²
Nehalem 4C	Bloomfield	45nm	4	731M	263mm²
Nehalem 4C	Lynnfield	45nm	4	774M	296mm²
Westmere 2C	Clarkdale	32nm	2	384M	81mm²

It also shows that there's a definite need for Intel to build a quad-core 32nm chip. Die sizes nearing 300mm² aren't very desirable. The question is whether we'll see quad-core 32nm in 2010 desktops or if we'll have to wait for Sandy Bridge in 2011 for that.

We’ll find out soon enough.
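For a rough sense of what 32nm buys, the table's transistor counts and die sizes can be turned into transistor density. The sketch below uses only the numbers from the table; the variable and function names are my own, and density is a crude metric since cache and logic pack very differently.

```python
# (process, transistors in millions, die size in mm^2), taken from the table
dies = {
    "Gulftown": ("32nm", 1170, 240),
    "Bloomfield": ("45nm", 731, 263),
    "Lynnfield": ("45nm", 774, 296),
    "Clarkdale": ("32nm", 384, 81),
}


def density_m_per_mm2(codename):
    """Return millions of transistors per square millimetre for a die."""
    _, transistors_m, area_mm2 = dies[codename]
    return transistors_m / area_mm2


for codename, (process, _, _) in dies.items():
    print(f"{codename} ({process}): {density_m_per_mm2(codename):.2f}M transistors/mm^2")
```

By this measure the two 32nm dies come out well ahead of the two 45nm dies, which is the point the table is making.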

45 Comments

IntelUser2000 - Wednesday, February 3, 2010 - link
I think you need to check again which chips do power gating for L3, Anandtech.

On the presentation titled: The Platform Evolves: Understanding the Intel Next Generation Microarchitectures (Nehalem and Westmere)

Filename: ACHS002

Page 10 says:
"Extended in 2009 platforms as Integrated Power Gates also used in shared cache and I/O logic to dynamically power down when inactive".

2009 doesn't seem like Westmere, does it?
TETRONG - Wednesday, February 3, 2010 - link
That's great. Just get to 10 Ghz already:|
IntelUser2000 - Wednesday, February 3, 2010 - link
(Sorry Tetrong, I accidentally reported you, MODS please do not ban or delete the post!!)

Tetrong, let's see what really happens at the simplest level, using the simplest math. At a 10GHz frequency, light in a vacuum can travel 3 centimetres, or 30mm, per clock cycle. Now if we assume the core is perfectly square, with that distance as its diagonal, we can use the Pythagorean theorem to calculate the maximum die size.

You can have a maximum die size of about 21mm on each side, or 441mm², at a 10GHz frequency, if EVERYTHING is perfect. That won't be true, even if the circuitry itself has no trouble reaching such a frequency. True, the core execution logic won't be anywhere near 400mm², so 10GHz is possible. But as of now, we won't see it in the near future.
ssj4Gogeta - Thursday, February 4, 2010 - link
Your calculations are right, it's 3cm or 30mm.

But electrons do NOT travel at the speed of light. Going by your theory, that would make the max die size just a fraction of 441 sq. mm.

Actually it may be even smaller, because I'm quite sure electrons don't take a linear path inside the processor. But the main reasons why we still don't see a 10GHz chip are completely unrelated to this.
Vutshi - Thursday, February 4, 2010 - link
Indeed! The electrons do not travel at the speed of light, and they never have to. It is the electromagnetic field that forces them to move, and the propagation speed of the field is the speed of light (well, almost, in the dielectric). So the calculation is valid.
LaughingTarget - Wednesday, February 10, 2010 - link
It's true that electrons don't travel at the speed of light. They travel at a few inches per hour through any form of conductor. Electrons are "bumped" from the rear by an entering electron. You need to think of the inside of a transistor, wire, or other conductor as a huge traffic jam. Electrons are lined up and when hit from behind by a new electron, the whole thing just moves forward slowly until the one at the front takes all the force and is shot out.

A person can calculate using a manual stopwatch how long it takes for an actual electron to make its way through a circuit. You could leave the system on all day and the first electron that entered your PC would find its way to ground by the end of the day. That's how slow electrons are.

What needs to be considered is how easy and quickly the force generated by the entering electron reaches the lead electron, not the electron movement itself.
nonoski - Thursday, February 4, 2010 - link
You guys have no idea what you are talking about!!! Stop with the uninformed techno babble already! There is not one modern processor that needs to get a signal from one end of the chip to the other in ONE clock cycle!!!!!!!!!!!! Just stop.
stephenbrooks - Monday, March 15, 2010 - link
Quite true! I believe the reason clocks are stuck around 3GHz is more to do with thermal/power usage considerations than feasibility. And if CPUs are doing more "per clock" then the clock rate isn't a particularly good metric of speed either. Can't help thinking at some point they'll run out of instruction-level parallelism and have to start upping the clocks on the executing parts again though.
ssj4Gogeta - Friday, February 5, 2010 - link
That's why I said "going by your theory..." and I also said the reasons are completely unrelated.
I was just commenting on the calculations, not the theory.
IntelUser2000 - Thursday, February 4, 2010 - link
Damnit, so I did have the original calculation right lol. Somehow I was really confused there.

Well I do realize that. Only in a vacuum can light reach that speed, and signals in a wire are even slower.

You don't need to limit the max die size to 441 sq mm. You can just have different clocks for each part of the processor.
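
The back-of-the-envelope figure debated in this thread can be checked with a short Python sketch. It assumes, as the original comment does, that a signal moves at the vacuum speed of light and that the distance covered in one clock cycle is the diagonal of a perfectly square die; the function names are hypothetical, and as other commenters point out, real chips are not bound by this single-cycle constraint.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def distance_per_cycle_mm(freq_hz):
    """Distance light covers in a vacuum during one clock cycle, in mm."""
    return SPEED_OF_LIGHT_M_PER_S / freq_hz * 1000.0


def max_square_side_mm(freq_hz):
    """Side of a square whose diagonal equals the per-cycle distance."""
    # Pythagorean theorem: diagonal^2 = 2 * side^2, so side = diagonal / sqrt(2)
    return distance_per_cycle_mm(freq_hz) / math.sqrt(2)


freq = 10e9  # 10GHz
diagonal = distance_per_cycle_mm(freq)
side = max_square_side_mm(freq)
print(f"distance per cycle: {diagonal:.1f} mm")
print(f"max square die: {side:.1f} mm per side, {side * side:.0f} mm^2")
```

Rounding the per-cycle distance to 30mm and the side to 21mm gives the 441 sq mm quoted in the comments; the unrounded area comes out slightly larger.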
