Final Words

For the past few years Intel has been threatening to make discrete GPUs obsolete with its march towards higher performing integrated GPUs. Given what we know about Iris Pro today, I'd say NVIDIA is fairly safe. The highest performing implementation of NVIDIA's GeForce GT 650M remains appreciably quicker than Iris Pro 5200 on average. Intel does catch up in some areas, but that's by no means the norm. NVIDIA's recently announced GT 750M should increase the margin a bit as well. Haswell doesn't pose any imminent threat to NVIDIA's position in traditional gaming notebooks. OpenCL performance, on the other hand, is excellent - surprisingly so given how little public attention Intel has given to the standard from a GPU perspective.

Where Iris Pro is dangerous is when you take into account form factor and power consumption. The GT 650M is a 45W TDP part; pair that with a 35 - 47W CPU and an OEM either has to accept throttling or design a cooling system that can deal with both. Iris Pro, on the other hand, shares its TDP with the rest of the 47W Haswell part. From speaking with OEMs, Iris Pro seems to offer substantial power savings in light usage (read: non-gaming) scenarios. In our 15-inch MacBook Pro with Retina Display review we found that simply having the discrete GPU enabled could reduce web browsing battery life by ~25%. Presumably that delta would disappear with the use of Iris Pro instead.

Lower thermal requirements can also enable smaller cooling solutions, leading to lighter notebooks. While Iris Pro isn't the fastest GPU on the block, it is significantly faster than any other integrated solution and gets within striking distance of the GT 650M in many cases. Combine that with the fact that you get all of this in a thermal package a mainstream discrete GPU can't fit into, and the decision all of a sudden becomes a more difficult one for an OEM to make.

Without a doubt, gaming focused notebooks will have to stick with discrete GPUs - but what about notebooks like the 15-inch MacBook Pro with Retina Display? I have a dedicated PC for gaming; I use the rMBP for work and just need a GPU that's good enough to drive everything else in OS X. Intel's HD 4000 comes close, and I suspect Iris Pro will completely negate the need for a discrete GPU for non-gaming use in OS X. Iris Pro should also be competent enough to make modern gaming possible on the platform. Just because it's not as fast as a discrete GPU doesn't mean it isn't a very good integrated graphics solution. And all of this should come at a much lower power/thermal profile compared to the current IVB + GT 650M combination.

Intel clearly has some architectural (and perhaps driver) work to do with its Gen7 graphics. It needs more texture hardware per sub-slice to remain competitive with NVIDIA. It's also possible that greater pixel throughput would be useful, but that's a bit more difficult to say at this point. I would also like to see an increase in bandwidth to Crystalwell. While the 50GB/s bi-directional link is clearly enough in many situations, that's not always the case.

Intel did the right thing in making Crystalwell an L4 cache. This is absolutely the right direction for mobile SoCs going forward, and I expect Intel will try something similar with its low power smartphone and tablet silicon in the next 18 - 24 months. I'm pleased with the size of the cache and the fact that it caches both CPU and GPU memory. I'm also beyond impressed that Intel committed significant die area to both the GPU and eDRAM in its Iris Pro enabled Haswell silicon. The solution isn't perfect, but it is completely unlike Intel to put this much effort towards improving graphics performance - and in my opinion, that's something that should be rewarded. So I'm going to do something I've never actually done before and give Intel an AnandTech Editors' Choice Award for Haswell with Iris Pro 5200 graphics.

This is exactly the type of approach to solving problems I expect from a company that owns around a dozen modern microprocessor fabs. Iris Pro is the perfect example of what Intel should be doing across all of the areas it competes in. Throw smart architecture and silicon at the problem and don't come back whining to me about die area and margins. It may not be the fastest GPU on the block, but it's definitely the right thing to do.

I'm giving Intel our lowest award under the new system because the solution needs to be better. Ideally I wouldn't want a regression from GT 650M performance, but in a pinch for a mostly-work notebook I'd take lower platform power/better battery life as a trade in a heartbeat. This is absolutely a direction I want to see Intel continue to explore with future generations. I also feel very strongly that we should have at least one (maybe two) socketed K-series SKUs with Crystalwell on-board for desktop users. It is beyond unacceptable for Intel not to give its most performance hungry users the fastest Haswell configuration possible. Most companies tend to lose sight of their core audience as they pursue new markets, and this is a clear example of Intel doing just that. Desktop users should at least have the option of buying a part with Crystalwell on-board.

So much of Intel's march towards improving graphics has been driven by Apple that I worry about what might happen to Intel's motivation should Apple no longer take such an aggressive position in the market. My hope is that Intel has finally realized the value of GPU performance and will continue to push forward on its own.

Comments

  • 8steve8 - Saturday, June 1, 2013

    Great work Intel, and great review Anand.
    As a fan of low power and small form factor high performance PCs, I'm excited about the 4770R.

    My question is: how do we get a system with the 4770R?
    Will it be in a NUC? If so, when, and is there any info?
    Will there be mini-ITX motherboards with it soldered on?
  • bill5 - Saturday, June 1, 2013

    Anand, would you say the lack of major performance improvement due to Crystalwell bodes ill for the Xbox One?

    The idea is that ESRAM could make the 1.2 TF Xbox One GPU "punch above its weight" with more efficiency due to the 32MB of low latency cache (the ALUs will stall less waiting on data). However, these results don't really show that for Haswell (the compute results that scale perfectly with ALUs, for example).

    Note that here I'm distinguishing between the cache as a bandwidth saver - I think we can all agree it will serve that purpose - and as an actual performance enhancer. I'm interested in the latter for the Xbox One.
  • Kevin G - Saturday, June 1, 2013

    A couple of quotes and comments from the article:

    "If Crystalwell demand is lower than expected, Intel still has a lot of quad-core GT3 Haswell die that it can sell and vice versa."

    Intel is handicapping demand for GT3e parts by not shipping them in socketed form. I'd love to upgrade my i7-2600K system to a 4770K+GT3e+TSX setup. Seriously Intel, ship that part and take my money.

    "The Crystalwell enabled graphics driver can choose to keep certain things out of the eDRAM. The frame buffer isn’t stored in eDRAM for example."

    WTF?!? The eDRAM would be the ideal place to store various frequently used buffers. Having 128 MB of memory leaves plenty of room for streaming in textures as need be. The only reason not to hold the full frame buffer is if Intel has an aggressive tile based rendering design and only a tile is stored there. I suspect that Intel's driver team will change this in the future.

    "An Ultrabook SKU with Crystalwell would make a ton of sense, but given where Ultrabooks are headed (price-wise) I’m not sure Intel could get any takers."

    I bet Apple would ship a GT3e based part in the MacBook Air form factor. They'd do something like lower the GPU clocks to prevent it from melting, but they want it. It wouldn't surprise me if Apple managed to negotiate a custom part from Intel again.

    Ultimately I'm pleased with GT3e. On the desktop I can see the GPU being used for OpenCL tasks like physics while my Radeon 7970 handles the rest of the graphics load. Or for anything else, I'd like GT3e for the massive L4 cache.
  • tipoo - Saturday, June 1, 2013

    "Ultimatley I'm pleased with GT3e. On the desktop I can see the GPU being used for OpenCL tasks like physics while my Radeon 7970 handles the rest of the graphics load. Or for anything else, I'd like GT3e for the massive L4 cache."

    I'd love that to work, but what developer would include that functionality for that niche setup?
  • Kevin G - Saturday, June 1, 2013

    OpenCL is supposed to be flexible enough that you can mix execution targets. This also includes the possibility of OpenCL drivers for CPUs in addition to those that use GPUs. At the very least, it'd be nice for a game or application to let you manually select the OpenCL target in a config file. [A sketch of what that selection looks like on the host side follows the comments below.]
  • Egg - Saturday, June 1, 2013

    I'm only a noob high school junior, but aren't frame buffers tossed after display? What would be the point of storing a frame buffer? You don't reuse the data in it at all. As far as I know, frame buffer != unpacked textures.
    Also, aren't most modern fully programmable GPUs not tile based at all?
    Also, wasn't it mentioned that K-series parts don't have TSX?
  • Kevin G - Saturday, June 1, 2013

    The z-buffer in particular is written and often read. Deferred rendering also blends multiple buffers together, and at 128 MB a deferred renderer can keep several of them in that memory. AA algorithms also perform read/writes on the buffer. At some point, I do see Intel moving the various buffers into the 128 MB of eDRAM as drivers mature. In fairness, this change may not be universal to all games and may depend on things like resolution.

    Then again, it could be a true cache for the GPU. This would mean that the drivers do not explicitly store the frame buffers there, but they could end up there based upon prefetching of data. Intel's caching hierarchy is a bit weird, as the CPU's L3 cache can also be used as an L4 cache for the GPU on HD 2000/2500/3000/4000 parts. Presumably the eDRAM would be an L5 cache under the Sandy Bridge/Ivy Bridge schema. The eDRAM has been described as a victim cache, though for GPU operations it would make sense to prefetch large amounts of data (textures, buffers). It'd be nice to get some clarification on this with Haswell.

    PowerVR is still tile based. Previous Intel integrated solutions were also tile based, though they dropped that with the HD line (and I can't remember if the GMA line was tile based as well).

    And you are correct that the K series don't have TSX, which is why I'd like a 4770K with GT3e and TSX. Also, I forgot to throw in VT-d, since that too is arbitrarily disabled in the K series.
  • IntelUser2000 - Sunday, June 2, 2013

    Kevin G: Intel dropped tile-based rendering with the GMA 3 series generation back in 2006, although their tile rendering was different from PowerVR's.
  • Egg - Sunday, June 2, 2013

    Fair points - I was being a bit myopic and only thought about buffers persisting across frames, neglecting the fact that buffers often need to be reused within the process of rendering a single frame! Can you explain how the CPU's L3 cache is an L4 cache for the GPU? Does the GPU have its own L3 cache already?

    Also, I don't know whether PowerVR's architecture is considered fully programmable yet. I know they have OpenCL capabilities, but reading http://www.anandtech.com/show/6112/qualcomms-quadc... I'm getting a vague feeling that it isn't as complete as GCN or Kepler, feature-wise.
  • IntelUser2000 - Tuesday, June 4, 2013

    Gen 7, the Ivy Bridge generation, has its own L3 cache. So you have the LLC (which is L3 for the CPU) and the GPU's own L3. Haswell is Gen 7.5.
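
To make Kevin G's point about mixing OpenCL execution targets a little more concrete, below is a minimal sketch (not from the article or the comments, and stripped of most error handling) of how a host application can enumerate OpenCL platforms and pick a GPU device by name - roughly what a config-file-driven target selection would do under the hood. The "Intel" substring match is purely an illustrative assumption about how a developer might route physics kernels to the integrated GPU while a discrete card handles rendering.

```c
#include <stdio.h>
#include <string.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS)
        return 1;

    cl_device_id chosen = NULL;
    for (cl_uint p = 0; p < num_platforms && chosen == NULL; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        /* Ask each platform for its GPU devices; CL_DEVICE_TYPE_CPU or
         * CL_DEVICE_TYPE_ALL would widen the search to CPU targets too. */
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(name), name, NULL);
            printf("Found GPU device: %s\n", name);
            /* Hypothetical policy: a config file could match on the device
             * name, e.g. preferring an Intel iGPU for physics/compute while
             * a discrete Radeon/GeForce renders the frame. */
            if (strstr(name, "Intel") != NULL)
                chosen = devices[d];
        }
    }

    if (chosen == NULL) {
        fprintf(stderr, "No matching OpenCL GPU device found\n");
        return 1;
    }

    /* A context and command queue on the chosen device would follow, e.g.:
     *   cl_int err;
     *   cl_context ctx = clCreateContext(NULL, 1, &chosen, NULL, NULL, &err);
     */
    return 0;
}
```

Whether a shipping game would bother with this kind of per-device routing is exactly tipoo's question: the API allows it, but it's extra work a developer has to opt into for what is, today, a niche configuration.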
