Core Architecture Changes

Ivy Bridge is considered a tick from the CPU perspective but a tock from the GPU perspective. (In Intel's cadence, a tick is a die shrink of an existing microarchitecture and a tock is a new microarchitecture.) On the CPU core side, that means you can expect clock-for-clock performance improvements in the 4-6% range. Despite the limited improvement in core-level performance, a lot of cleanup went into the design. To maintain a strict design schedule, it's not uncommon for some features to miss a design, only to be added in the subsequent product. Ticks are great for this.

Five years ago Intel introduced Conroe, which defined the high-level architecture for every generation since. Sandy Bridge was the first significant overhaul since Conroe, and even it didn't look very different from the original Core 2. Ivy Bridge continues the trend.

The front end in Ivy Bridge is still 4-wide, with support for fusing both x86 instructions (macro-fusion) and decoded uOps (micro-fusion). The uOp cache introduced in Sandy Bridge remains in Ivy Bridge with no major changes.

Some structures within the chip are now better optimized for single-threaded execution. Hyper-Threading requires partitioning a number of internal structures (e.g. buffers and queues) so that instructions from multiple threads can use them simultaneously. In Sandy Bridge, many of those structures are statically partitioned: if a buffer can hold 20 entries, each thread gets up to 10 of them. In a single-threaded workload, half of the buffer goes unused. Ivy Bridge reworks a number of these structures to allocate resources to threads dynamically, so if only a single thread is active, they dedicate all of their resources to servicing it. One example is the DSB queue that serves the uOp cache mentioned above (DSB, or Decoded Stream Buffer, is Intel's name for that cache). There's a lookup mechanism for putting uOps into the cache; those requests are placed into the DSB queue, which used to be split evenly between threads. In Ivy Bridge the DSB queue is allocated dynamically to one or both threads.
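
To make the difference between the two policies concrete, here is a toy C model of the 20-entry buffer example. The policy names and the entries_available function are hypothetical, and real allocation logic is far more involved; the sketch only shows how many entries one thread may claim under each scheme.

```c
#include <stdbool.h>

/* Toy model of a shared structure (e.g. a 20-entry queue) split between
 * two Hyper-Threading threads. Names are hypothetical, not Intel's. */
typedef enum { STATIC_PARTITION, DYNAMIC_ALLOCATION } alloc_policy;

/* Returns how many entries one thread may occupy right now.
 * capacity:      total entries in the structure
 * other_active:  whether the sibling thread is running
 * used_by_other: entries currently held by the sibling thread */
int entries_available(alloc_policy policy, int capacity,
                      bool other_active, int used_by_other)
{
    if (policy == STATIC_PARTITION) {
        /* Sandy Bridge-style: each thread owns a fixed half,
         * even when the other thread is idle. */
        return capacity / 2;
    }
    /* Dynamic allocation: a lone thread can use every entry. */
    if (!other_active) {
        return capacity;
    }
    /* With both threads active, a thread may take whatever the
     * sibling is not currently holding (simplified sharing). */
    return capacity - used_by_other;
}
```

With the sibling thread idle, the static policy still caps a thread at 10 entries, while the dynamic one lets it use all 20.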

In Sandy Bridge, Intel did a ground-up redesign of its branch predictor. Once again, it doesn't make sense to redo it for Ivy Bridge, so branch prediction remains the same. In the past, prefetchers have stopped at page boundaries because they are physically based: the next virtual page can map to an entirely unrelated physical page. Ivy Bridge lifts this restriction.
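
The page-boundary rule can be sketched as a simple address check. This assumes 4 KiB pages and a sequential next-line prefetcher; same_page, may_prefetch_next_line and the can_cross_pages flag are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumption: standard 4 KiB pages */

/* True if both physical addresses fall inside the same page. */
bool same_page(uint64_t addr, uint64_t next_addr)
{
    return (addr / PAGE_SIZE) == (next_addr / PAGE_SIZE);
}

/* Hypothetical policy check: may a sequential prefetcher request the
 * line that follows `addr`? A purely physically based prefetcher does
 * not know where the next virtual page lives, so it must stop at the
 * page edge. `can_cross_pages` models that restriction being lifted,
 * which in practice needs a translation for the next page. */
bool may_prefetch_next_line(uint64_t addr, unsigned line_size,
                            bool can_cross_pages)
{
    uint64_t next = addr + line_size;
    return can_cross_pages || same_page(addr, next);
}
```

For a 64-byte line at 0x1FC0, the next line starts at 0x2000, on a new page, so only a prefetcher allowed to cross pages may fetch it.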

The number of execution units hasn't changed in Ivy Bridge, but the units themselves have seen some changes. The FP/integer divider sees another performance gain this round: Ivy Bridge's divider has twice the throughput of the unit in Sandy Bridge. The advantage here shows up mostly in FP workloads, as they tend to be more computationally heavy.

MOV operations can now be handled in the register renaming stage instead of occupying an execution port. The x86 MOV instruction simply copies the contents of one register into another. In Ivy Bridge, a register-to-register MOV is executed by pointing the destination register at the physical register that already holds the source value, so no data actually moves. This is enabled by the physical register file first introduced in Sandy Bridge, along with a whole lot of clever logic within IVB. Although MOVs still consume decode bandwidth, they don't take up an execution port, which leaves that port free for other instructions.
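
A toy register alias table shows why an eliminated MOV needs no execution port: renaming an ordinary instruction allocates a new physical register, while an eliminated MOV only copies a mapping. The table sizes and function names are assumptions for illustration, not a description of IVB's actual hardware.

```c
#define NUM_ARCH_REGS 16   /* e.g. the x86-64 general-purpose registers */
#define NUM_PHYS_REGS 64   /* hypothetical physical register file size */

/* Register alias table: maps each architectural register to the
 * physical register currently holding its value. */
typedef struct {
    int map[NUM_ARCH_REGS];
    int next_free;          /* naive allocator; never frees registers */
} rename_table;

void rat_init(rename_table *rat)
{
    for (int r = 0; r < NUM_ARCH_REGS; r++)
        rat->map[r] = r;    /* start with an identity mapping */
    rat->next_free = NUM_ARCH_REGS;
}

/* Ordinary instruction writing `dst` (e.g. an ADD): allocate a fresh
 * physical register for the result; the op then needs an execution
 * port. Returns the new physical register, or -1 if this sketch has
 * run out of registers. */
int rename_alu_op(rename_table *rat, int dst)
{
    if (rat->next_free >= NUM_PHYS_REGS)
        return -1;
    rat->map[dst] = rat->next_free++;
    return rat->map[dst];
}

/* MOV dst, src with move elimination: dst is pointed at the physical
 * register already holding src's value. Nothing is allocated and
 * nothing is sent to an execution port. Real hardware must also track
 * how many architectural registers share a physical register before
 * reclaiming it; that bookkeeping is omitted here. */
void rename_mov(rename_table *rat, int dst, int src)
{
    rat->map[dst] = rat->map[src];
}
```

After an ADD into register 0 followed by MOV r3, r0, both architectural registers map to the same physical register, and only the ADD consumed an allocation.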

ISA Changes

Intel also introduced a number of ISA changes in Ivy Bridge. The ones that stand out the most to me are the inclusion of a very high-speed digital random number generator (DRNG) and Supervisor Mode Execution Prevention (SMEP).

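Ivy Bridge's DRNG can generate high-quality, standards-compliant random numbers at 2-3Gbps. It is exposed through a new instruction, RDRAND, and is available to both user- and OS-level code. Fast access to good randomness will be very important going forward, both for security (e.g. generating cryptographic keys) and for algorithms that consume a lot of random data.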
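
Software reaches the DRNG through the RDRAND instruction. The sketch below assumes GCC or Clang on x86-64 and uses the compilers' <cpuid.h> helpers and the _rdrand64_step intrinsic; the wrapper names cpu_has_rdrand and hw_random64 are hypothetical. It checks CPUID first, since older CPUs such as Sandy Bridge lack the instruction.

```c
#include <cpuid.h>
#include <immintrin.h>
#include <stdbool.h>

/* CPUID leaf 1 reports RDRAND support in ECX bit 30 (bit_RDRND). */
bool cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_RDRND) != 0;
}

/* RDRAND can transiently report failure, so retry a bounded number of
 * times. The limit of 10 follows Intel's published guidance; treat it
 * as an assumption. The target attribute lets this one function use
 * the instruction without compiling the whole file with -mrdrnd. */
__attribute__((target("rdrnd")))
bool hw_random64(unsigned long long *out)
{
    for (int i = 0; i < 10; i++) {
        if (_rdrand64_step(out))
            return true;
    }
    return false;
}
```

Code that must also run on pre-Ivy Bridge machines needs a fallback path for when cpu_has_rdrand() returns false.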

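SMEP in Ivy Bridge provides hardware protection against user-mode code being executed at more privileged levels: when it is enabled, the processor faults if the kernel (supervisor mode) tries to execute instructions from a page marked as user-accessible. This blocks a common exploit technique in which the kernel is tricked into jumping to code an attacker has placed in user space.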
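
The SMEP rule itself is small enough to model in a few lines. This sketch assumes the architectural details as documented for x86 (SMEP is enabled by CR4 bit 20); fetch_allowed is a hypothetical helper, and it ignores every other page-fault condition.

```c
#include <stdbool.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 7 advertises SMEP; the OS enables it by
 * setting CR4 bit 20. This sketch models only the resulting check. */
#define CR4_SMEP (1UL << 20)

/* Would an instruction fetch be allowed (true) or fault (false)?
 * cpl: current privilege level (0 = kernel ... 3 = user)
 * page_is_user: the page-table user/supervisor bit for the target */
bool fetch_allowed(unsigned long cr4, unsigned cpl, bool page_is_user)
{
    bool supervisor = cpl < 3;
    if ((cr4 & CR4_SMEP) && supervisor && page_is_user)
        return false;   /* kernel tried to run user-space code */
    return true;        /* other checks (NX, present bit) omitted */
}
```

User-mode code running its own pages is unaffected; only supervisor-mode fetches from user pages are refused.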

97 Comments

  • Arnulf - Sunday, September 18, 2011

    "Voltage changes have a cubic affect on power, so even a small reduction here can have a tangible impact."

    P = V^2/R

    Quadratic relationship, rather than cubic?
  • damianrobertjones - Sunday, September 18, 2011

    " As we've already seen, introducing a 35W quad-core part could enable Apple (and other OEMs) to ship a quad-core IVB in a 13-inch system."

    Is Apple the only company that can release a 13" system?
  • medi01 - Monday, September 19, 2011

    No. But it's the only one that absolutely needs to be commented on in an orgasmic tone in the US press (and a big chunk of the EU press too).
  • JonnyDough - Monday, September 19, 2011

    They're the only ones who will market it with a flashy Apple logo light on a pretty aluminum case. Everyone knows that lightweight pretty aluminum cases are a great investment on a system that is outdated after just a few years. I wish Apple would make cars instead of PCs so we could bring the DeLorean back. Something about that stainless steel body just gets me so hot. Sure, it would get horrible gas mileage and be less safe in an accident. But it's just so pretty! Plus, although it would use a standard engine made by Ford or GM under the hood, its drivers would SWEAR that Apple builds its own superior hardware!
  • cldudley - Sunday, September 18, 2011

    Am I the only one who thinks Intel is really wasting a lot of time and money on improvements to their on-die GPU? They keep adding features and improvements to the onboard video, right up to including DirectX 11 support, but isn't this really all an exercise in futility?

    Ultimately a GPU integrated with the CPU is going to be bottlenecked by the simple fact that it does not have access to any local memory of its own. Every time it rasterizes a triangle or performs a texture operation, it is doing it through the same memory bus the CPU is using to fetch instructions, read and write data, etc.

    I read that the GPU is taking a larger proportion of the die space in Ivy Bridge, and all I see is a tragic waste of space that would have been better put into another (pair of?) core or more L1/L2 cache.

    I can see the purpose of integrated graphics in the lowest-end SKUs for budget builds, and there are certainly power and TDP advantages, and things like Quick-Sync are a great idea, but why stuff a GPU in a high-end processor that will be blown away by a comparatively middle-of-the-road discrete GPU?
  • Death666Angel - Sunday, September 18, 2011

    I disagree. AMD has shown that on-die GPUs can already compete with middle-of-the-road discrete graphics in notebooks. Trinity will probably take on middle-of-the-road in the current desktop space.
    Your memory bandwidth argument doesn't seem to be correct, either. Except for some AMD mainboard graphics with dedicated sideport memory, all IGPs use system RAM, and a lot of them are doing fine. It is also nice to finally see higher-clocked RAM be taken advantage of (see Llano 1666MHz vs 1800MHz). DDR4 will add bandwidth as well.
    Once the bandwidth becomes a bottleneck, you can address that, but at the moment Intel doesn't seem to be there yet, so they keep addressing their other GPU issues. What is wrong with that?
    Also, how many people who buy high-end CPUs end up gaming 90% of the time on them? A lot of people need high-end CPUs for work related stuff, coding, CAD etc. Why should they have to buy a discrete graphics card?

    Overall, you are doing a lot of generalization and you don't take into account quite a few things. :-)
  • cldudley - Sunday, September 18, 2011

    Ironically I spend lots of time in AutoCAD, and a discrete graphics board makes a tremendous difference. Gamer-grade stuff is usually not the best thing in that arena though, it needs to be the special "workstation" cards, which have very different drivers. Quadro or FireGL.

    I agree with you on the work usage, and on gaming workloads not being 90% of the time, but on the other hand, workstations tend to have Xeons in them, with discrete graphics cards.
  • platedslicer - Sunday, September 18, 2011

    As a fraction of the computer market, buyers who want power over everything else have plunged. Mobility is so important for OEMs now that fitting already-existent performance levels into smaller, cheaper devices becomes more important than pushing the envelope. I still remember a time when hardly anybody gave a rat's ass about how much power a CPU consumed as long as it didn't melt down. Today, power consumption is a crucial factor due to battery life and heat.

    Personally these developments make me rather sad, partly because I like ever-shinier games, and (more importantly) because seeing the unwashed masses talk about computers as if they were clothing brands makes me want to rip out their throats. That's how the world works, though. Hopefully the chip makers will realize that there's still a market for power over fluff.

    Looking at it on the bright side, CPU power stagnation might make game designers pay more attention to content. Hey, you have to look on the bright side of life.
  • KPOM - Monday, September 19, 2011

    I think that's largely because for the average consumer, PCs have reached the point where CPU capabilities are no longer the bottleneck. Look at the success of the 2010 MacBook Air, which had a slow C2D but a speedy SSD, and sold well enough to last into mid-2011. Games are the next major hurdle, but that's the GPU rather than the CPU, and hence the reason it receives a bigger focus in Ivy Bridge (as it also did in Sandy Bridge compared to Westmere).

    The emphasis now is having the power we have last longer and be available in smaller, more portable devices.
  • JonnyDough - Monday, September 19, 2011

    You're missing the point. They aren't trying to beef up the power of the CPU. CPUs are already quite powerful for most tasks. They are trying to lower energy usage and sell en masse to businesses that use thousands of computers.
