A couple of days ago, we published our Ivy Bridge Desktop Lineup Overview, in which we mentioned that Ivy Bridge will remain a quad-core solution. There are dozens of forum posts with people asking why there's no hex-core Ivy Bridge, so now seems like a good time to address the question. Fundamentally, Ivy Bridge is a die shrink of Sandy Bridge (a "tick" in Intel's world), and a shrink usually means that either the core count or the frequency is increased thanks to the lower power consumption of the smaller process node. This time, instead of a hex-core part, we get a chip that looks much the same as the year-old Sandy Bridge, only with improved efficiency and some other moderate tweaks to the design. Let's go through some of the elements that influence the design of a new processor; by the time we're done, we will hopefully have clarified why Ivy Bridge remains a quad-core solution.

Marketing

If we look at the situation from the marketing standpoint first, having a hex-core Ivy Bridge die would more or less kill the just-released Sandy Bridge E. Sure, IVB is about five months away, but I doubt Intel wants to relive the Sandy Bridge vs. Nehalem (i7-9xx) situation--even Bloomfield vs. Lynnfield was quite bad. If Intel created a hex-core IVB die, they would also have to cut SNB-E prices substantially, or very few SNB-E systems would be sold: the cheapest hex-core SNB-E currently costs $555, while a hex-core IVB would most likely be priced at $300 to $400 since it's aimed at the mainstream. Even then, most consumers would opt for the IVB platform due to cheaper motherboard costs and lower TDP. PCIe 3.0 should also make 16 lanes fine for dual-GPU setups, reducing the market for SNB-E even more.

Differentiating the lineup by keeping Ivy Bridge quad-core leaves some market for SNB-E among enthusiast consumers. Ivy Bridge E isn't coming before H2 2012 anyway, so SNB-E must please the high end until IVB-E hits. In the end, we still recommend SNB-E primarily for servers and workstations where the extra memory channels, PCIe lanes, and dual-socket support are more important, but the lack of hex-core IVB parts at least gives the platform a bit more of an advantage.

Evolution from traditional CPU to SoC

There are more than just marketing reasons, though. If we look at the following die shots, we can see that CPUs are becoming increasingly similar to SoCs.


Quad-core Kentsfield package (2006)


Quad-core Nehalem die (2008)

 

Quad-core Lynnfield die (2009)


Quad-core Sandy Bridge die (2011)

These four chips (well, technically three, because Kentsfield consists of two dual-core Conroe dies) are the only "real" quad-core CPUs from Intel. There are quad-core Gulftown Xeons, and there will soon be quad-core SNB-E CPUs, but they all have more cores on the actual die; some of them have just been disabled. Comparing the die shots, we notice that our definition of a CPU has changed a lot in only five years or so. Kentsfield is a traditional CPU, consisting of processing cores and L2 cache. In 2008, Nehalem moved the memory controller onto the CPU die. In 2009, Lynnfield brought an on-die PCIe controller, which allowed Intel to get rid of the Northbridge-Southbridge combination and replace it with their Platform Controller Hub. In early 2010, Westmere (e.g. Arrandale and Clarkdale) brought us on-package graphics--note that it was on-package, not on-die, as the GPU was on a separate die. It wasn't until Sandy Bridge that we got on-die graphics. The SNB graphics occupy roughly 25% of the total die area, or the space of three cores if you prefer to look at it that way, and IVB's graphics (a "tock" on the GPU side, as opposed to a "tick") will occupy even more space.

While we don't have a close-up die shot of Ivy Bridge (yet), we do know its approximate die size, and the layout should be similar to the Sandy Bridge die as well. Anand estimated the die size to be around 162mm^2 for what appears to be the quad-core die (dual-core SNB with GT2 is 149mm^2, and even with the more complex IGP we wouldn't expect dual-core IVB to be larger). That's a 25% reduction in die size compared with the quad-core SNB die (216mm^2). A 22nm quad-core SNB die would measure in at 102mm^2 with perfect scaling and assuming all the logic/architecture is the same; however, scaling is never perfect and we know there are a few new additions to IVB, so 162mm^2 for the IVB die sounds right. Transistor-wise, IVB counts in at around 1.4 billion, a 20.7% increase over quad-core SNB.
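
For readers who want to check the arithmetic, here is a minimal Python sketch of the numbers above. The perfect-scaling model (area shrinking with the square of the node ratio) is the same idealization used in the paragraph. The 1.16 billion transistor count for quad-core SNB is not quoted above; it is our assumption, chosen because it is the figure the 20.7% increase implies. All function and constant names are purely illustrative.

```python
# Back-of-the-envelope die arithmetic using the figures quoted in the article.
SNB_QUAD_AREA_MM2 = 216.0   # quad-core Sandy Bridge die (32nm)
IVB_EST_AREA_MM2 = 162.0    # estimated quad-core Ivy Bridge die (22nm)
OLD_NODE_NM = 32.0
NEW_NODE_NM = 22.0

# ASSUMPTION: not stated in the article; implied by the 20.7% figure.
SNB_QUAD_TRANSISTORS_BN = 1.16
IVB_TRANSISTORS_BN = 1.4    # "around 1.4 billion" per the article


def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Area after a perfect shrink: both dimensions scale with the node ratio."""
    return area_mm2 * (new_nm / old_nm) ** 2


def percent_change(old, new):
    """Signed percentage change from old to new."""
    return (new - old) / old * 100.0


ideal_area = ideal_shrink_area(SNB_QUAD_AREA_MM2, OLD_NODE_NM, NEW_NODE_NM)
area_reduction = -percent_change(SNB_QUAD_AREA_MM2, IVB_EST_AREA_MM2)
transistor_increase = percent_change(SNB_QUAD_TRANSISTORS_BN, IVB_TRANSISTORS_BN)

print(f"Ideal 22nm quad-core SNB die: {ideal_area:.0f} mm^2")
print(f"Estimated IVB die reduction:  {area_reduction:.0f}%")
print(f"Transistor increase:          {transistor_increase:.1f}%")
```

The gap between the ideal 102mm^2 and the estimated 162mm^2 is the room taken up by imperfect scaling plus whatever Intel added to the design.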

To the point, today's CPUs have much more than just CPU cores in them. We could easily have had a hex-core 32nm SNB die at the same die size if the graphics and memory controller were not on-die. We've actually got a pretty good reference point with SNB and Gulftown; accounting for the larger L3 cache and extra QPI link, Gulftown checks in at 240mm^2, though TDP is higher than SNB thanks to the extra cores. The same applies to Ivy Bridge. If Intel took away the graphics, or even kept the same die size as SNB, a hex-core would be more or less a given. Instead, Intel has chosen to boost the graphics and decrease the die size.

Subjectively, this is not a bad decision. Intel needs to increase graphics performance, and will do just that in IVB. Intel's IGP solutions account for over 50% of the PC market share, yet the graphics are their Achilles' heel. Most modern laptops have integrated graphics (though many still opt to go discrete-only or use switchable graphics), and having more CPU cores isn't that useful if your system will be severely handicapped by a weak GPU. We've also shown in numerous articles how hex-core scaling over quad-core is largely unnecessary on desktop workloads (more on this below). Increasing the graphics' EU count and complexity while also adding CPU cores would have led to a larger-than-ideal die, not to mention the increased complexity and cost. Remember, Moore's Law was more an observation of the ideal size/complexity relationship of microprocessors rather than pure transistor count, and smaller die sizes generally improve yields in addition to being less expensive.

Performance

While six cores is obviously 50% more than four cores, the increase in performance isn't proportional to the increase in cores. More cores put out more heat, and hence clock speeds must be lower unless the TDP is increased. Intel couldn't have achieved the 77W TDP at reasonable clock speeds if Ivy Bridge were hex-core. On top of that, there is still plenty of software that is not fully multithreaded or fails to scale linearly with core count, so you would rarely be using all six cores (plus six more virtual cores thanks to Hyper-Threading). More cores will only help if you can actually use them, while higher frequencies universally improve performance (all other things being equal). We can give some clear examples of this with a few graphs from our Sandy Bridge E review.
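
One common way to reason about this trade-off is Amdahl's law, which we haven't named above but which captures the same idea: only the parallel portion of a workload speeds up with more cores. The sketch below is a rough illustration, not a benchmark. It assumes performance scales linearly with clock speed, ignores Hyper-Threading, Turbo, and memory effects, and uses two hypothetical clock speeds (3.0GHz hex-core vs. 3.5GHz quad-core) that are ours rather than Intel's.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def hex_vs_quad(parallel_fraction, hex_clock_ghz, quad_clock_ghz):
    """Relative performance of a hex-core vs. a quad-core (above 1.0: hex wins).

    Simplification: performance = clock speed x Amdahl speedup.
    """
    hex_perf = hex_clock_ghz * amdahl_speedup(parallel_fraction, 6)
    quad_perf = quad_clock_ghz * amdahl_speedup(parallel_fraction, 4)
    return hex_perf / quad_perf


# HYPOTHETICAL clocks, chosen only to illustrate a TDP-limited hex-core part.
HYPOTHETICAL_HEX_CLOCK_GHZ = 3.0
HYPOTHETICAL_QUAD_CLOCK_GHZ = 3.5

for p in (0.0, 0.5, 0.9, 1.0):
    ratio = hex_vs_quad(p, HYPOTHETICAL_HEX_CLOCK_GHZ, HYPOTHETICAL_QUAD_CLOCK_GHZ)
    print(f"parallel fraction {p:.1f}: hex/quad = {ratio:.2f}")
```

Under these assumptions the slower-clocked hex-core loses outright on fully serial work and only pulls clearly ahead when nearly all of the work can be parallelized.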

Photoshop is a prime example of software that has limited multithreading. We used the older CS4 in our tests, but CS5 isn't any better, unfortunately. Photoshop can actively take advantage of four threads, and thus the hex-core i7-3960X isn't really faster than the quad-core i7-2600K. The slight difference is most likely due to the difference in Turbo (3.9GHz vs. 3.8GHz) or the quad-channel vs. dual-channel memory configuration. There are also a few peaks where more than four threads are used, which is why the i7-2600K is faster than the i5-2500K thanks to Hyper-Threading, on top of the extra cache and higher Turbo of course.

In general, games are horribly multithreaded. DiRT 3 is an example of a typical game engine, and adding more cores and enabling Hyper-Threading actually hurts the performance. There are only a handful of games that benefit from more cores, although there are still obstacles to overcome even then (see below).

Civilization V is one of the handful of games that can scale across multiple cores. However, you will still be bottlenecked by your GPU in GPU-bound scenarios (like in the second graph), which makes the usefulness of more cores questionable in this case. It's irrelevant whether you get 60 or 120 FPS in CPU-bound scenarios if the real gaming performance is ultimately bound by your GPU speed.
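
As a first-order illustration of the bottleneck argument, the delivered frame rate can be modeled as the lower of what the CPU and the GPU can each sustain. This is a deliberately crude sketch: real engines overlap CPU and GPU work, and the function name and the 45 FPS GPU limit below are hypothetical values chosen for the example.

```python
def delivered_fps(cpu_fps, gpu_fps):
    """Crude bottleneck model: the slower component sets the frame rate."""
    return min(cpu_fps, gpu_fps)


# HYPOTHETICAL numbers: a GPU that tops out at 45 FPS at the chosen settings.
GPU_LIMIT_FPS = 45

for cpu_fps in (60, 120):
    fps = delivered_fps(cpu_fps, GPU_LIMIT_FPS)
    print(f"CPU good for {cpu_fps} FPS, GPU for {GPU_LIMIT_FPS} FPS: {fps} FPS delivered")
```

In this model, doubling the CPU-side frame rate changes nothing as long as the GPU limit sits below both values.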

The above graphs are biased in the sense that they are for tests where SNB-E is roughly on par with regular quad-core SNB. However, keep in mind that we are comparing a 130W hex-core and a 95W quad-core; a 77W hex-core part might need lower clock speeds and could perform worse in lightly threaded tasks (depending on the Turbo speeds of course). In general, tasks like video encoding, 3D rendering, and archiving scale well with additional cores, but how many consumers run these tasks on a day-to-day basis? If you know you will be doing a lot of CPU-intensive work that can benefit from additional cores, SNB-E (and later IVB-E) will always be an option--though you'll give up Quick Sync and the integrated graphics in the process. For most consumers, higher frequencies will likely prove far more useful due to the limited multithreading of everyday applications.

There is also the AMD point of view. Bulldozer hasn't exactly been a success story and there is no real competition in the high-end CPU market because of that. Intel could skip Ivy Bridge altogether and their position at the top of the performance charts would still hold. With no real competition, there's no need to push the performance much higher. Four cores is enough to keep the performance higher than AMD's, and reducing the TDP as a side effect is a big plus, especially when thinking about the future and ARM. As another point of comparison with AMD, look at Llano: it's a quad-core CPU that focuses more on improved graphics. For example, the now rather "old" Lynnfield i5-750 (quad-core, no Hyper-Threading) is able to surpass the CPU performance of Llano, but that hasn't stopped plenty of people from picking up Llano as an inexpensive solution that provides all the performance needed for most tasks.

Wrap-Up

When looking at the big picture, there really aren't any compelling reasons why Intel should have gone with a hex-core design for Ivy Bridge. Just like the Sandy Bridge vs. Gulftown comparison, IVB vs. SNB-E looks like a good use of market segmentation. Sure, some enthusiasts will argue that having a quad-core CPU is so 2007, but don't let the number of cores fool you. The only thing that 2007 and 2012 quad-cores share is the core count; otherwise they are very different animals (see for example the i7-2600K vs. the Q6600). It also appears that even without additional cores or clock speed improvements, Ivy Bridge will be around 15% faster clock for clock than Sandy Bridge (according to Intel's own tests; a deeper performance analysis will come soon).

Increasing the frequencies and boosting the clock-for-clock performance yields increased performance in every CPU-bound task, and improving the quality of the on-die graphics helps in other areas. In contrast, increasing the core count only helps if the software has proper multithreading and can scale to additional cores--both of which are easier said than done. Given all of the possibilities, it would appear that Intel has done the right thing, and in the process there's no need to try to convince consumers that they need more cores than they actually do.

79 Comments

  • hechacker1 - Monday, December 05, 2011

    Perhaps some of that increased performance is due to the lower TDP and increased efficiency per watt. Ivy could potentially stay in Turbo far more often without exceeding the overall TDP limit.
  • haukionkannel - Monday, December 05, 2011

    That seems very reasonable explanation! That Tri-gate offers some improvement, but low TDP does actually make it more easy to use higher turbo more often!
  • Iketh - Monday, December 05, 2011

    aaaHAH! Yes, that is it, or probably 75% of it anyway...
  • Taft12 - Monday, December 05, 2011

    The 15% figure is coming from marketing.

    I'd be shocked if you could squeeze 15% faster performance from IVB at the same clockspeed except in the most useless of synthetic benchmarks (but don't worry, there'll be plenty of those!)
  • MrSpadge - Monday, December 05, 2011

    Yeah, that would actually be a hefty tock, not a tick.

    MrS
  • name99 - Tuesday, December 06, 2011

    Just because it isn't a "major" rev doesn't mean that there won't be minor mods. There usually are. They tweak the L2 and L3 latencies, maybe add a few more "virtual" registers, maybe make the buffers holding preloaded instructions, or post-decoded instructions a little longer, etc etc.
    Along with that, there are sometimes ideas that were put in the SB micro-architecture that will add a % or two to performance, but which were disabled for SB because they couldn't be made to work in time, but they're now working in IB.

    At this stage of the game, it would be surprising if all these add up to 15% rather than, say, 5%, but I think we have to withhold judgement until Intel gives us the real micro-architecture details.
  • Marlin1975 - Monday, December 05, 2011

    Same thing was said when dual core was just coming on and later Quad.

    Things are not going to be coded for dual, quad, etc... until there are enough on the market.

    "Build it and..."
  • JarredWalton - Monday, December 05, 2011

    Quad-core chips have now been on the market for over five years, and there are still regrettably few applications that can leverage the additional cores, particularly when we look at the apps that people use 95% of the time (e.g. web browsers, email, office apps, and to a lesser extent image editing). There are plenty of tasks that can get split in two, but splitting them into four independent tasks isn't always possible, and taking it to six, eight, etc. results in most things reaching their limit of subdivision.

    3D rendering is a great example of a task that scales almost perfectly, and video transcoding is right there with it, but what can you do to make Word utilize (or even need) more than four cores? Sure, multitasking, but even then you're either going to run into bottlenecks elsewhere (e.g. HDD needs to be replaced with SSD for it to scale), and while you can do something like a virus scan, video transcode, and play a game on a sixteen core monster... who actually does that sort of thing?
  • DanNeely - Monday, December 05, 2011

    Chrome/IE's process per tab and to a (much) lesser extent FF/Opera's separate plugin processes kinda sorta take advantage of multiple cores in that they reduce the ability of a badly behaving tab from strangling the browser's performance by going into an infinite loop.
  • name99 - Tuesday, December 06, 2011

    "Chrome/IE's process per tab "
    This is a perfect example of a "not-a-real" solution.
    Yeah, it kinda helps if you're running Google's hoped for world of really heavy-weight JS apps in multiple windows, but that's not most people.

    What most people want is "I open a page and it appears immediately", not "I can run three heavy weight, constantly running JS pages in the background".

    And you ultimately admit this yourself: "kinda sorta take advantage of multiple cores in that they reduce the ability of a badly behaving tab from strangling the browser's performance by going into an infinite loop." That's nice --- but the point of better hardware is not to deal with crap web pages. (Google's solution of shunting them to ever lower in the search rankings is a much better solution).

    What we WANT is an engine that will run all the heavyweight parts of processing a web page in parallel. We simply don't have that --- no point in pretending otherwise.
