Ever since its arrival in the ultra mobile space, NVIDIA hasn't really flexed its GPU muscle. The Tegra GPUs we've seen thus far have been OK at best, and in serious need of improvement at worst. NVIDIA often blamed an immature OEM ecosystem unwilling to pay for the sort of large die SoCs necessary to bring a high-performance GPU to market. Thankfully, that's all changing. Earlier this year NVIDIA laid out its mobile SoC roadmap through 2015, including the 2014 release of Project Logan, the first NVIDIA ultra mobile SoC to feature a Kepler GPU. Yesterday in a private event at SIGGRAPH, NVIDIA demonstrated functional Logan silicon for the very first time.

NVIDIA got Logan silicon back from the fabs around 3 weeks ago, making it almost certain that we're dealing with some form of 28nm silicon here and not early 20nm samples.

NVIDIA isn't talking about CPU cores, but it's safe to assume that Logan will be another 4+1 arrangement of cores - likely still based on ARM's Cortex A15 IP (but perhaps a newer revision of the core). On the GPU front, NVIDIA confirmed our earlier speculation that Logan includes a single Kepler SMX:

One Kepler SMX features 192 CUDA cores. NVIDIA isn't talking about shipping GPU frequencies either, but it did provide this chart to put Logan's GPU capabilities into perspective:

Don't get too excited, as we're looking at a comparison of GFLOPS and not game performance, but the peak theoretical ALU-bound performance of mobile Kepler should exceed that of a PlayStation 3 or GeForce 8800 GTX (memory bandwidth is another story, however). If we look closely at NVIDIA's chart and compare mobile Kepler to the iPad 4, we get a better idea of what sort of clock speeds NVIDIA would need to attain this level of performance. Doing some quick Photoshop estimation, it looks like NVIDIA is claiming mobile Kepler has somewhere around 5.2x the FP power of the PowerVR SGX 554MP4 in the iPad 4 (76.8 GFLOPS). That works out to be right around 400 GFLOPS. With a 192 core implementation of Kepler, you get 2 FLOPS per core per cycle, or 384 FLOPS per cycle. To hit 400 GFLOPS you'd need to clock the mobile Kepler GPU at roughly 1GHz. That's certainly doable from an architectural standpoint (although we've never seen it done on any low power 28nm process), but it's probably a bit too high for something like a smartphone.
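For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate in Python. It uses only the figures quoted above; the 5.2x ratio is an eyeballed reading of NVIDIA's chart, the 2 FLOPS per core per clock figure assumes one fused multiply-add per CUDA core each cycle, and the function and variable names are made up for illustration.

```python
# Back-of-the-envelope clock estimate for mobile Kepler, using only the
# figures quoted in the article. All names here are illustrative.

IPAD4_GFLOPS = 76.8            # PowerVR SGX 554MP4 in the iPad 4, per the article
LOGAN_SPEEDUP = 5.2            # rough ratio read off NVIDIA's chart (an estimate)
CUDA_CORES = 192               # one Kepler SMX
FLOPS_PER_CORE_PER_CLOCK = 2   # assumes one fused multiply-add per core per clock


def estimate_clock_ghz(target_gflops, cores, flops_per_core_per_clock):
    """Return the clock (in GHz) needed to reach target_gflops of peak throughput."""
    flops_per_clock = cores * flops_per_core_per_clock
    # GFLOPS divided by FLOPS-per-clock gives billions of clocks per second (GHz).
    return target_gflops / flops_per_clock


target_gflops = IPAD4_GFLOPS * LOGAN_SPEEDUP
flops_per_clock = CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK
clock_ghz = estimate_clock_ghz(target_gflops, CUDA_CORES, FLOPS_PER_CORE_PER_CLOCK)

print(f"Target throughput: {target_gflops:.1f} GFLOPS")
print(f"FLOPS per clock:   {flops_per_clock}")
print(f"Required clock:    {clock_ghz:.2f} GHz")
```

The result lands just over 1GHz, which matches the rough figure above. Keep in mind this is peak theoretical throughput only; it says nothing about memory bandwidth or what clocks will actually ship.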

NVIDIA didn't want to talk frequencies, but they did tell me that we might see something this fast in some sort of a tablet. I suspect that most implementations will be clocked significantly lower. Even at half the frequency though, we're still talking about roughly PlayStation 3 levels of FP power out of a mobile SoC. We know nothing of Logan's memory subsystem, which obviously plays a major role in real world gaming performance, but there's no getting around the fact that Logan's Kepler implementation means serious business. For years we've lamented NVIDIA's mobile GPUs; Logan looks like it's finally going to change that.

API Support and Live Demos

Unlike previous Tegra GPUs, Kepler is a fully unified architecture and is OpenGL ES 3.0, OpenGL 4.4 and DirectX 11 compliant. The API compliance alone is a huge step forward for NVIDIA. It's also a big one for game developers looking to move more seriously into mobile. Epic's Tim Sweeney even did a blog post for NVIDIA talking about Logan's implementation of Kepler and how it brings feature parity between PCs, next-gen consoles and mobile platforms. NVIDIA responded in kind by running some Unreal Engine 4 demos on Android on a Logan test platform. That's really the big story behind all of this. With Logan, NVIDIA will bring its mobile GPUs up to feature parity with what it's shipping in the PC market. Game developers looking to port games between console, PC, tablet and smartphone should have an easier job of doing that if all platforms support the same APIs. Logan will take NVIDIA from being very behind in API support (with no OpenGL ES 3.0 support) to the head of the class.

NVIDIA took its Ira demo, originally run on a Titan at GTC 2013, and got it up and running on a Logan development board. Ira did need some work to make the transition to mobile. The skin shaders were simplified, smaller textures were used and the rendering resolution was dropped to 1080p. NVIDIA claims this demo was done in a 2 - 3W power envelope.

The next demo is called Island and was originally shown on a Fermi desktop part. Running on Logan/mobile Kepler, this demo shows OpenGL 4.3 and hardware tessellation working.

The development board does feature a large heatspreader, but that's not too unusual for early silicon just out of bring-up. Logan's package size should be comparable to Tegra 4, although the die size will clearly be larger. The dev board is running Android and is connected to a 10.1-inch 1920 x 1200 touchscreen.


141 Comments


  • djboxbaba - Wednesday, July 24, 2013 - link

    This is ALL relative. Does anyone realize that this will not be released till 2014? ...2014? The A8/A8X (or equivalent) will be released by then, as well as Qualcomm's latest Snapdragon processors. Don't place your expectations too high.
  • Scannall - Wednesday, July 24, 2013 - link

    I am wondering if they will be too late to market again. PowerVR Series 6 (Rogue) devices should be shipping any time.

    http://en.wikipedia.org/wiki/PowerVR#Series_6_.28R...
  • NLPsajeeth - Wednesday, July 24, 2013 - link

    Seems like the most similar desktop part is the GeForce GT 630 OEM
    http://www.geforce.com/hardware/desktop-gpus/gefor...

    And laptop somewhere between 720M and 730M.
  • randomhkkid - Wednesday, July 24, 2013 - link

    Think about the implications: if they can get just shy of 720M - 730M performance at about 5W, what can they do with a laptop GPU at around 35W? O.o
  • Spunjji - Wednesday, July 24, 2013 - link

    We already have Kepler at that power level, though. So nothing that they haven't done already...
  • randomhkkid - Wednesday, July 24, 2013 - link

    I mean that performance currently hasn't scaled with the power. I'm assuming that the current 720M has a power envelope of around 35W, and yet there is a ~5W Kepler GPU that has around the same performance. If they could scale it up while keeping the efficiency, it would be a sight to behold.
  • DanNeely - Wednesday, July 24, 2013 - link

    I think you're way high on the 720M TDP. It's a GF117 part. The other GF117 parts are 12.5W (710M) or 15W (620M and 625M).

    35W is probably a bit too high for the 730M too; it's a GK208 part, and the Quadro 510M/610M (the only mobile GK208s I can find a TDP for) run at 30W.
  • Jaybus - Wednesday, July 24, 2013 - link

    For the same architecture, it is true that performance scales linearly with clock speed. However, it is not true that power consumption scales linearly with clock speed. 5 W at 500 MHz does not equate to 25 W at 2.5 GHz.
  • roberto.tomas - Wednesday, July 24, 2013 - link

    This mostly looks very rosy for Nvidia's future IP in mobile space. If Logan comes at 28nm though, it *does* face the same issue as Tegra 3 had — unless it comes to market not in 2014 but in 2013.

    As a rough estimate, at 28nm and from the power shown, to achieve max performance they need 25W on the GPU power meter. That's a bit of a shame, because that is clearly outside the bounds of minimal cooling.

    On the plus side, that is a tonnage of power, which companies can tune to use as much or little of as they want, all the way up into the absurd levels — rather like the PowerVR Rogue. In that sense it looks good for NVidia.
  • sontin - Wednesday, July 24, 2013 - link

    What? If they can get 76 GFLOPS in less than 1W, they do not need 25W for 5x the performance...

    And the remark about 28nm: every company will be supply constrained on 20nm in 2014. And Tegra 3 was a huge success for nVidia (50% revenue increase in FY2013) in that they could sell it for a lower price (cheaper wafers) and in huge numbers (more wafers, better yields).
    And do not forget: 20nm will only bring 30% lower power consumption. So if you can achieve this with an optimized architecture, there is no benefit to using 20nm from the start for Tegra.
