Better Virtualization

Intel's hardware support for virtualization in the current Core architecture is lackluster, to say the least. To understand why, you need to understand what happens in a "pure" software-based virtualization solution such as VMware ESX 2.5.3 running on older Intel CPUs.

A technique called "ring deprivileging" is used because the guest OS cannot be allowed to run in ring 0, the most privileged ring, where it normally runs; the Virtual Machine Monitor (VMM), or hypervisor, now runs there. That means that every time a guest application asks for the help of the guest OS, which needs to execute instructions that are only available in ring 0, the VMM must intercept that "SYSENTER" and emulate the normal execution. This is quite costly in performance terms.
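To make the cost of that interception more concrete, here is a minimal toy model of trap-and-emulate in Python. It is only a sketch under our own assumptions: the instruction names, the `ToyCPU` and `ToyVMM` classes, and the choice of ring 1 for the guest are all hypothetical, and a real VMM is far more involved than this.

```python
from dataclasses import dataclass, field

# Hypothetical instruction names, chosen only for illustration.
PRIVILEGED = {"write_cr3", "hlt", "out"}


class PrivilegeFault(Exception):
    """Raised when a privileged instruction runs outside ring 0."""


@dataclass
class ToyCPU:
    ring: int
    executed: list = field(default_factory=list)

    def execute(self, instr):
        # Privileged instructions are only allowed in ring 0.
        if instr in PRIVILEGED and self.ring != 0:
            raise PrivilegeFault(instr)
        self.executed.append(instr)


@dataclass
class ToyVMM:
    traps: int = 0
    emulated: list = field(default_factory=list)

    def run_guest(self, instructions, guest_ring=1):
        # The guest OS is deprivileged: it runs in a ring above 0.
        cpu = ToyCPU(ring=guest_ring)
        for instr in instructions:
            try:
                cpu.execute(instr)
            except PrivilegeFault:
                # Each fault is a costly detour through the VMM,
                # which emulates the instruction on the guest's behalf.
                self.traps += 1
                self.emulated.append(instr)
        return cpu


vmm = ToyVMM()
cpu = vmm.run_guest(["add", "write_cr3", "mov", "out"])
print(vmm.traps, vmm.emulated, cpu.executed)
```

In this toy run, two of the four guest instructions end up being handled by the VMM rather than executing directly, which is where the overhead comes from.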

Hardware-assisted virtualization does not have that problem: both the guest OS and the VMM get their own ring 0. Despite this, Intel's hardware-assisted solutions so far haven't delivered any speed boost. Intel has not discussed it in detail, but Penryn speeds up virtual machine transition (entry/exit) times by 25% to 75%, and this requires no changes to virtual machine software. This might be similar to AMD's nested paging technology, although we don't have any clear details at present.

Last but not least, the dual-core Penryn processors get a 6 MB shared L2 cache and the quad-core versions get 12 MB of cache. Both new designs will also come with a "higher degree of associativity". Considering the current designs are 16-way set associative, most likely the newer chips will feature a 24-way set associative L2 cache.
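The 24-way guess can be sanity-checked with a little arithmetic. The sketch below assumes a 4 MB L2 for the current 16-way design and 64-byte cache lines; neither figure comes from Intel's Penryn disclosure, and the helper names are our own. Under those assumptions, going from 4 MB to 6 MB while moving from 16 to 24 ways keeps the number of sets unchanged, whereas staying at 16 ways would not give a power-of-two set count.

```python
MB = 1024 * 1024


def num_sets(cache_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache (line size is an assumption)."""
    sets, remainder = divmod(cache_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


current_sets = num_sets(4 * MB, ways=16)    # assumed current design
penryn_24way = num_sets(6 * MB, ways=24)    # the guess in the text
penryn_16way = num_sets(6 * MB, ways=16)    # hypothetical alternative

print(current_sets, penryn_24way, penryn_16way)
print(is_power_of_two(penryn_24way), is_power_of_two(penryn_16way))
```

A power-of-two set count keeps the set index a simple bit field of the address, which is one reason a 24-way organization looks plausible for a 6 MB cache.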

Intel EDAT: the End of the Multi-core Clock Speed Disadvantage?

Intel also talked about its "Enhanced Dynamic Acceleration Technology" which is effectively integrated overclocking based on load. If you are running a single threaded application (or a multi-threaded application that's predominantly using a single thread), Intel's EDAT can power down the second core and increase the frequency of the working core to maintain the same thermal envelope at all times.

Intel's EDAT could spell the end of the clock speed differential between single and multi-core processors. With all cores running workloads, the multi-core system would be clocked lower, but when some cores are idle the chip could potentially run at the same speed as a single core solution would. Single core designs have pretty much disappeared from roadmaps already, but considering there are still applications that are single threaded in nature and benefit more from clock speed improvements, future processors will offer both options in a single package.

Performance

Intel hasn't revealed too much about the performance of Penryn but Pat did leave us with a few comments. We don't know anything more about the test conditions than what we are presenting, and we didn't do the measurements ourselves, so take it for what it's worth.

Comparing a 3.2GHz Penryn (1.6GHz FSB) to a 3.0GHz Conroe (1.33GHz FSB), Intel has measured a more than 20% increase in gaming performance (with no code changes). For video encoding applications, if SSE4 is utilized, the same Penryn vs. Conroe comparison can offer a more than 40% increase in performance.

Finally, Intel mentioned that in the server space, the fastest quad core Penryn available (>3GHz) vs. a 2.67GHz quad core Xeon resulted in a greater than 45% increase in performance in "bandwidth and FP intensive applications". It's incredibly vague (and oddly similar to AMD's claims of Barcelona vs. Xeon performance), but Pat mentioned that STREAM and certain benchmarks in SpecFP could be considered to be "bandwidth and FP intensive".

Again, we are just reporting what Intel told us. It will be a while before we can actually verify any of these claims or put them in the right context. Given the various enhancements that we've reported on, however, it's only reasonable to expect Penryn to be faster than Conroe, clock-for-clock. Whether that's 10% faster, 20% faster, or something else will be made clear in the future.
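For a rough idea of what the desktop figures mean clock-for-clock, the clock speed difference can be divided out. This is only back-of-the-envelope arithmetic on Intel's quoted numbers, which are stated as lower bounds ("more than"); it ignores the faster FSB, and the video encoding figure also depends on SSE4-optimized code. The function name is our own.

```python
def per_clock_gain(total_gain, new_ghz, old_ghz):
    """Performance gain left over after dividing out the clock speed ratio."""
    return (1 + total_gain) / (new_ghz / old_ghz) - 1


# Intel's quoted lower bounds: 3.2GHz Penryn vs. 3.0GHz Conroe.
gaming = per_clock_gain(0.20, new_ghz=3.2, old_ghz=3.0)
video = per_clock_gain(0.40, new_ghz=3.2, old_ghz=3.0)

print(f"gaming: {gaming:.4f}, video encoding with SSE4: {video:.4f}")
```

On these numbers, roughly 12.5 points of the gaming gain and about 31 points of the video encoding gain remain after the clock speed bump is accounted for.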


14 Comments


  • sdsdv10 - Thursday, March 29, 2007

    Yes, I agree early disclosure is a good thing, but let's clarify: the "public" doesn't hate Rambus. Most of the "public" doesn't know who Rambus is or even care. The only ones who hate Rambus are a minority of tech geeks who can't move on.
  • feelingshorter - Wednesday, March 28, 2007

    So Intel will finally use an IMC like AMD. Does that mean overclocking will be relatively flat across the board? If all the CPUs of the same stepping have the same IMC, then overclocks will be less motherboard dependent, won't they? Or am I confusing IMC with something else? That also means all the chip manufacturers will have to find new business. Although they can continue to manufacture motherboards, we've seen what happened to companies in the past when AMD decided to use an IMC.
  • tuteja1986 - Thursday, March 29, 2007

    Well, we will have to wait till Q2 of 2008 to see if Intel's claims are true or not. Till then my eyes are on AMD's Barcelona architecture, since it's coming out this year, and if AMD's claims are right then it should be able to beat an Intel offering.
  • Locutus465 - Wednesday, March 28, 2007

    Oh yeah, you hit the nail on the head... Much more important than the actual performance gains is the fact that neither company is going to be sitting on their laurels putting out bad or otherwise unexciting products for the foreseeable future.
