The Comparison Points

Intel sort of dropped this CRB off without anything to compare it to, so I spent the past week scrambling to find hardware that would put Iris Pro's performance in perspective. The obvious candidate was Apple's 15-inch MacBook Pro with Retina Display: I expect its successor will use Iris Pro 5200, making it a perfect comparison point. The 15-inch rMBP is equipped with a GeForce GT 650M running at a 900MHz core clock and a 5GHz memory data rate.

I also dusted off a desktop GeForce GT 640 to shed a little more light on the 650M comparison. The two cards share the same GK107 GPU; the 640 has a slightly higher core clock (925MHz) but only 1.7GHz DDR3, working out to 27GB/s of memory bandwidth vs. 83GB/s for the 650M. Seeing how Iris Pro compares to both will tell us just how good a job Crystalwell is doing at making up for limited memory bandwidth.

Next up is the desktop Core i7-4770K with HD 4600 graphics. This is a Haswell GT2 implementation, but at a much higher TDP (84W) than the 47W mobile part we're comparing it to; in a notebook you can expect a much bigger gap between the HD 4600 and Iris Pro than what we're showing here. I also included a 77W HD 4000 part as an Ivy Bridge graphics reference.

On the AMD front I have the 35W A10-4600M (Trinity), featuring AMD's Radeon HD 7660G processor graphics. I also included the 100W desktop A10-5800K as a reference point, since we were largely pleased with Trinity's GPU performance on the desktop.

I listed TDPs alongside all of the parts I'm comparing here. In the case of the GT 640 I'm adding the TDP of the CPU (84W) to that of the GPU (65W). TDP is a big part of the Iris Pro story, because the CPU, GPU and eDRAM all fit into the same 47W power envelope. With a discrete GPU like the 650M, you end up with roughly an extra 45W on top of the CPU's own 45W TDP. In reality the host CPU won't run anywhere near its 45W max while the discrete GPU is loaded, so the power savings aren't as great as the raw numbers suggest, but they're still there.

At the request of at least one very eager OEM, Intel is offering a higher-TDP configuration of the i7-4950HQ. Using Intel's Extreme Tuning Utility (XTU) I was able to simulate this cTDP-up configuration by increasing the sustained power limit to 55W and moving the short-term turbo power limit up to 69W. OEMs moving from a two-chip CPU + GPU solution down to a single Iris Pro part are encouraged to do the same, as their existing thermal solutions should be more than adequate to cool a 55W part. I strongly suspect this is the configuration we'll see in the next-generation 15-inch MacBook Pro with Retina Display.
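
For the curious, XTU is effectively just raising the package RAPL power limits: PL1 governs sustained power and PL2 the short-term turbo budget. Below is a minimal sketch of the equivalent operation on Linux via the msr interface; this is my own illustration (assuming root, the msr kernel module loaded, and the bit layout Intel documents in its SDM), not a tool Intel ships:

    import os, struct

    MSR_RAPL_POWER_UNIT = 0x606   # exposes the RAPL power unit (typically 0.125W per LSB)
    MSR_PKG_POWER_LIMIT = 0x610   # holds PL1 (sustained) and PL2 (short-term turbo)

    def rdmsr(cpu, reg):
        with open("/dev/cpu/%d/msr" % cpu, "rb") as f:
            return struct.unpack("<Q", os.pread(f.fileno(), 8, reg))[0]

    def wrmsr(cpu, reg, val):
        with open("/dev/cpu/%d/msr" % cpu, "wb") as f:
            os.pwrite(f.fileno(), struct.pack("<Q", val), reg)

    watts_per_lsb = 1.0 / (1 << (rdmsr(0, MSR_RAPL_POWER_UNIT) & 0xF))

    pl1 = int(55 / watts_per_lsb)   # 55W sustained limit
    pl2 = int(69 / watts_per_lsb)   # 69W short-term turbo limit

    v = rdmsr(0, MSR_PKG_POWER_LIMIT)
    v = (v & ~0x7FFF) | pl1 | (1 << 15)                    # PL1 value (bits 14:0) + enable (bit 15)
    v = (v & ~(0x7FFF << 32)) | (pl2 << 32) | (1 << 47)    # PL2 value (bits 46:32) + enable (bit 47)
    wrmsr(0, MSR_PKG_POWER_LIMIT, v)

Note that an OEM can set the lock bit (bit 63) in this MSR, in which case the limits can't be rewritten until the next reset.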

To remove as many bottlenecks as possible, I configured all integrated GPU options (other than Iris Pro 5200) with the fastest supported memory. That worked out to DDR3-2133 on desktop Trinity and desktop IVB, and DDR3-2400 on desktop Haswell (HD 4600). The mobile platforms, including Iris Pro 5200, all used DDR3-1600.

On the software side, I used NVIDIA's R320 (320.18) drivers, AMD's Catalyst 13.6 beta, and Intel's 9.18.10.3177 driver with Crystalwell support.

Comments

  • 8steve8 - Saturday, June 1, 2013 - link

    Great work Intel, and great review Anand.
    As a fan of low-power, small form factor, high-performance PCs, I'm excited about the 4770R.

    My question is: how do we get a system with the 4770R?
    Will it be in a NUC? If so, when?
    Will there be mini-ITX motherboards with it soldered on?
  • bill5 - Saturday, June 1, 2013 - link

    Anand, would you say the lack of a major performance improvement from Crystalwell bodes ill for the Xbox One?

    The idea is that ESRAM could make the 1.2 TF Xbox One GPU "punch above its weight" with more efficiency, thanks to the 32MB of low-latency cache (ALUs will stall less waiting on data). However, these results don't really show that for Haswell (the compute results that scale perfectly with ALUs, for example).

    Note that I'm distinguishing between the cache as a bandwidth saver (I think we can all agree it will serve that purpose) and as an actual performance enhancer. I'm interested in the latter for the Xbox One.
  • Kevin G - Saturday, June 1, 2013 - link

    A couple of quotes and comments from the article:

    "If Crystalwell demand is lower than expected, Intel still has a lot of quad-core GT3 Haswell die that it can sell and vice versa."

    Intel is handicapping demand for GT3e parts by not shipping them in socketed form. I'd love to upgrade my i7-2600K system to a 4770K + GT3e + TSX setup. Seriously Intel, ship that part and take my money.

    "The Crystalwell enabled graphics driver can choose to keep certain things out of the eDRAM. The frame buffer isn’t stored in eDRAM for example."

    WTF?!? The eDRAM would be the ideal place to store various frequently used buffers. Having 128MB of memory leaves plenty of room for streaming in textures as need be. The only reason not to hold the full frame buffer there is if Intel has an aggressive tile-based rendering design and only a tile is stored there. I suspect Intel's driver team will change this in the future.

    "An Ultrabook SKU with Crystalwell would make a ton of sense, but given where Ultrabooks are headed (price-wise) I’m not sure Intel could get any takers."

    I bet Apple would ship a GT3e-based part in the MacBook Air form factor. They'd do something like lower the GPU clocks to prevent it from melting, but they want it. It wouldn't surprise me if Apple managed to negotiate a custom part from Intel again.

    Ultimately I'm pleased with GT3e. On the desktop I can see the GPU being used for OpenCL tasks like physics while my Radeon 7970 handles the rest of the graphics load. Even apart from that, I'd like GT3e for the massive L4 cache.
  • tipoo - Saturday, June 1, 2013 - link

    "Ultimatley I'm pleased with GT3e. On the desktop I can see the GPU being used for OpenCL tasks like physics while my Radeon 7970 handles the rest of the graphics load. Or for anything else, I'd like GT3e for the massive L4 cache."

    I'd love that to work, but what developer would include that functionality for that niche setup?
  • Kevin G - Saturday, June 1, 2013 - link

    OpenCL is supposed to be flexible enough that you can mix execution targets. This also includes the possibility of OpenCL drivers for CPUs in addition to those that use GPUs. At the very least, it'd be nice for a game or application to let you manually select the OpenCL target in some config file (see the sketch below).
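
    As a rough illustration, here's a minimal pyopencl sketch of what I mean; it assumes the relevant OpenCL runtimes are installed, and the "Intel" string match is purely for demonstration:

        import pyopencl as cl

        # Every installed OpenCL runtime shows up as a platform; CPU drivers,
        # integrated GPUs and discrete GPUs all enumerate as separate targets.
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                print(platform.name, "->", device.name)

        # Pin a command queue to the integrated GPU so, e.g., physics kernels
        # run there while a discrete card handles rendering.
        igp = [d for p in cl.get_platforms() if "Intel" in p.name
               for d in p.get_devices(device_type=cl.device_type.GPU)]
        if igp:
            ctx = cl.Context(devices=[igp[0]])
            queue = cl.CommandQueue(ctx)   # dispatch OpenCL work to the iGPU here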
  • Egg - Saturday, June 1, 2013 - link

    I'm only a noob high school junior, but aren't frame buffers tossed after display? What would be the point of storing a frame buffer? You don't reuse the data in it at all. As far as I know, frame buffer != unpacked textures.
    Also, aren't most modern fully programmable GPUs not tile-based at all?
    Also, wasn't it mentioned that K-series parts don't have TSX?
  • Kevin G - Saturday, June 1, 2013 - link

    The z-buffer in particular is written and often read. Deferred rendering also blends multiple buffers together, and at 128MB in size, the eDRAM can hold several render targets at once. AA algorithms also perform read/writes on the buffer. At some point I do see Intel moving the various buffers into the 128MB of eDRAM as drivers mature. In fairness, this change may not be universal across all games and could depend on things like resolution.

    Then again, it could be a true cache for the GPU. That would mean the drivers don't explicitly store the frame buffers there, but buffers could still end up there through prefetching. Intel's caching hierarchy is a bit weird, as the CPU's L3 cache can also be used as an L4 cache for the GPU on HD 2000/2500/3000/4000 parts. Presumably the eDRAM would be an L5 cache under the Sandy Bridge/Ivy Bridge schema. The eDRAM has been described as a victim cache, though for GPU operations it would make sense to prefetch large amounts of data (textures, buffers). It'd be nice to get some clarification on this with Haswell.

    PowerVR is still tile-based. Previous Intel integrated solutions were also tile-based, though they dropped that with the HD line (and I can't remember if the GMA line was tile-based as well).

    And you are correct that the K series doesn't have TSX, which is why I'd like a 4770K with GT3e and TSX. I also forgot to throw in VT-d, since that too is arbitrarily disabled in the K series.
  • IntelUser2000 - Sunday, June 2, 2013 - link

    Kevin G: Intel dropped tile-based rendering with the GMA 3 series generation back in 2006, although their tile rendering was different from PowerVR's.
  • Egg - Sunday, June 2, 2013 - link

    Fair points - I was being a bit myopic and only thought about buffers persisting across frames, neglecting the fact that buffers often need to be reused within the process of rendering a single frame! Can you explain how the CPU's L3 cache is an L4 cache for the GPU? Does the GPU have its own L3 cache already?

    Also, I don't know whether PowerVR's architecture is considered fully programmable yet. I know they have OpenCL capabilities, but reading http://www.anandtech.com/show/6112/qualcomms-quadc... I'm getting a vague feeling that it isn't as complete as GCN or Kepler, feature-wise.
  • IntelUser2000 - Tuesday, June 4, 2013 - link

    Gen7, the Ivy Bridge generation, has its own L3 cache. So you have the LLC (which is the L3 for the CPU) plus the GPU's own L3. Haswell is Gen 7.5.
