The GeForce ULV

Complementing the three major CPU architectures in the mobile applications processor market for 2011, there are three major GPUs you’ll see crop up in devices this year: Imagination Technologies’ PowerVR SGX Series5 and Series5XT, Qualcomm’s Adreno 205/220, and NVIDIA’s GeForce ULV. There are other players, but these three are the ones that will show up in the most exciting devices this year.

ImgTec licenses its GPUs for use in a number of SoCs. Apple’s A4, TI’s OMAP 3 and 4, and Samsung’s Hummingbird all use ImgTec GPUs. ImgTec’s current high end is the PowerVR SGX 540, which features four unified shader pipelines capable of handling both pixel and vertex shader operations. The PowerVR SGX 543 is widely expected to be used in Apple’s 5th generation SoC.

The PowerVR SGX and Qualcomm’s Adreno GPUs both implement tile-based deferred rendering architectures. In the early days of the PC GPU race, deferred renderers were quite competitive. As geometry complexity in games increased, ATI’s and NVIDIA’s immediate mode rendering + hidden surface removal proved to be the better option. Given the lack of serious 3D gaming, much less geometry-heavy titles, on smartphones today, the tile-based approach makes a lot of sense. Tile-based renderers conserve both power and memory bandwidth, two things that are in very short supply on smartphones. Remember from our CPU discussions that in many cases a single 32-bit LPDDR2 memory channel has to feed two CPU cores as well as the GPU. By comparison, even PCs from 10 years ago had a 64-bit memory bus just for the CPU and a 128-bit memory bus for the GPU.
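
To make the bandwidth argument concrete, here’s a minimal Python sketch of the difference. Everything in it is illustrative (primitives abstracted as covered rectangles, on-chip traffic modeled as free), not a model of any real GPU: an immediate mode renderer writes every fragment of every primitive to the external framebuffer, while a tile-based renderer bins primitives per tile, shades into an on-chip tile buffer, and flushes each tile to DRAM exactly once.

```python
# Illustrative-only comparison of external memory traffic for immediate mode
# vs. tile based rendering. "Triangles" are abstracted as covered rectangles.
from collections import defaultdict

TILE = 16  # tile edge in pixels; real TBRs use similarly small on-chip tiles

# Covered rectangles as (x0, y0, x1, y1); the scene has heavy overdraw.
rects = [(0, 0, 64, 64), (8, 8, 40, 40), (16, 16, 32, 32)]

def immediate_mode(rects):
    """Every fragment of every primitive touches the external framebuffer."""
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects)

def tile_based(rects):
    """Bin primitives per tile, shade on chip, flush each tile to DRAM once."""
    bins = defaultdict(list)
    for r in rects:
        x0, y0, x1, y1 = r
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(r)
    # Shading happens in the on-chip tile buffer ("free" in this model); only
    # the final per-tile flush counts as external bandwidth.
    return len(bins) * TILE * TILE

print("immediate mode external writes:", immediate_mode(rects))  # 5376
print("tile based external writes:   ", tile_based(rects))       # 4096
```

The gap grows with overdraw: every extra layer costs the immediate mode renderer another full set of external writes, while the tile-based renderer’s external traffic stays fixed at one flush per tile.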

NVIDIA believes that the future of GPUs on smartphones is no different than the future of PC GPUs: immediate mode renderers. As a result, the GeForce ULV GPU in NVIDIA’s Tegra 2 looks very similar to a desktop GPU—just a lot smaller, and a lot lower power. It’s also worth pointing out that until we get PC-like content on smartphones, NVIDIA’s approach to ultra mobile GPU architectures may not always make the most sense for power efficiency.

(Note that some of what follows below is borrowed from our earlier coverage of NVIDIA's Tegra 2):

At a high level NVIDIA is calling the GeForce ULV an 8-core GPU; however, it’s not a unified shader GPU. Each core is an ALU, but half of them are used for vertex shading and the other half for pixel shading. You can expect the GeForce ULV line to take a similar evolutionary path to the desktop GeForce in the future (meaning it’ll eventually move to a unified shader architecture).

The four vertex shader cores/ALUs can do a total of 4 MADDs per clock; the same is true for the four pixel shader ALUs (4 MADDs per clock).
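
As a rough back-of-the-envelope (assuming each of the eight ALUs retires one MADD, i.e. two floating point operations, per clock), peak shader throughput at the shipping clocks listed in the table below works out to a few GFLOPS; sustained throughput will of course be lower:

```python
# Back-of-the-envelope peak shader throughput for GeForce ULV in Tegra 2.
# Assumes all 8 ALUs (4 vertex + 4 pixel) retire one MADD per clock; a MADD
# counts as 2 floating point operations. Peak figures only.
ALUS = 8
FLOPS_PER_MADD = 2

for part, mhz in [("T20", 333), ("AP20H", 300), ("T25/AP25", 400)]:
    gflops = ALUS * FLOPS_PER_MADD * mhz * 1e6 / 1e9
    print(f"{part}: {gflops:.1f} GFLOPS peak")  # T20 5.3, AP20H 4.8, T25 6.4
```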

The GeForce ULV in NVIDIA’s Tegra 2 runs at a minimum of 100MHz but can scale up to 400MHz depending on the SoC version:

NVIDIA Tegra 2
SoC                Part Number   CPU Clock   GPU Clock   Availability
NVIDIA Tegra 2     T20           1GHz        333MHz      Now
NVIDIA Tegra 2     AP20H         1GHz        300MHz      Now
NVIDIA Tegra 2 3D  T25           1.2GHz      400MHz      Q2 2011
NVIDIA Tegra 2 3D  AP25          1.2GHz      400MHz      Q2 2011

The smartphone-bound AP20H runs its GPU at up to 300MHz, while the tablet-bound T20 runs at a faster 333MHz.

Architecturally, the GeForce ULV borrows several technologies that only recently debuted on desktop GPUs. GeForce ULV has a pixel cache, a feature that wasn’t introduced in desktop GeForce until Fermi. This is purely an efficiency play, as avoiding trips to main memory reduces power consumption considerably (firing up external interfaces always burns power faster than keeping data on die).
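
As a toy model of the effect (the relative energy costs below are assumptions picked for illustration, not measured numbers), consider a small on-die cache sitting in front of the LPDDR2 interface while a workload re-touches the same pixels, e.g. blending overlapping layers:

```python
# Toy model: a small on-die pixel cache vs. going to external memory for
# every access. E_ON_DIE and E_EXTERNAL are assumed relative energy costs,
# chosen only to illustrate that off-die accesses are expensive.
from collections import OrderedDict

CACHE_LINES = 64
E_ON_DIE, E_EXTERNAL = 1, 50

def energy_with_cache(accesses):
    cache, energy = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # refresh LRU position on a hit
            energy += E_ON_DIE
        else:
            energy += E_EXTERNAL           # miss: fire up the external interface
            cache[addr] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict least recently used line
    return energy

# Blending the same 32 pixels four times: everything after pass 1 hits on die.
trace = [p for _ in range(4) for p in range(32)]
print("with pixel cache:", energy_with_cache(trace))   # 1696
print("without a cache :", len(trace) * E_EXTERNAL)    # 6400
```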

NVIDIA also moved the register files closer to the math units, again in the pursuit of lower power consumption. GeForce ULV is also aggressively clock gated, although that’s not something we’re able to quantify.

NVIDIA also reduced the number of pipeline stages by a factor of 2.5 compared to its desktop GPUs to keep power consumption down.

The GeForce ULV supports early Z culling, a feature first introduced on the desktop with G80. While G80 could throw away around 64 pixels per clock, early Z on GeForce ULV can throw away 4 pixels per clock. While early Z isn’t the equivalent of a tile-based renderer, it can narrow the efficiency gap between immediate mode renderers and TBRs.
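
Conceptually, early Z just moves the depth test ahead of the pixel shader, so occluded fragments never burn shader cycles. A minimal sketch of the idea (structure illustrative, not NVIDIA’s actual pipeline):

```python
# Early Z rejection: fragments that fail the depth test are discarded before
# the (comparatively expensive) pixel shader ever runs. Illustrative only.

def shade(frag):
    return frag["color"]  # stand-in for an expensive pixel shader invocation

def rasterize_with_early_z(fragments, depth_buffer, framebuffer):
    shaded = rejected = 0
    for frag in fragments:
        key, z = (frag["x"], frag["y"]), frag["z"]
        if z >= depth_buffer.get(key, 1.0):  # occluded: reject before shading
            rejected += 1
            continue
        depth_buffer[key] = z
        framebuffer[key] = shade(frag)       # only visible fragments are shaded
        shaded += 1
    return shaded, rejected

depth, fb = {(1, 0): 0.2}, {}                # (1, 0) already holds a near surface
frags = [
    {"x": 0, "y": 0, "z": 0.5, "color": "red"},   # passes the depth test
    {"x": 1, "y": 0, "z": 0.8, "color": "blue"},  # fails early Z, never shaded
]
print(rasterize_with_early_z(frags, depth, fb))  # -> (1, 1)
```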

The ROPs are integrated into the pixel shader, making what NVIDIA calls a programmable blend unit. GeForce ULV uses the same ALUs for ROPs as it does for pixel shaders. This hardware reuse saves die area, although it adds control complexity to the design. The hardware can perform one texture fetch and one ROP operation per clock.
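
One way to picture a programmable blend unit: instead of handing the shaded fragment to fixed-function blending hardware, the same ALU that ran the pixel shader reads the destination pixel back and evaluates the blend equation itself. A hypothetical sketch using a standard source-over blend (not NVIDIA’s actual datapath):

```python
# "Programmable blend" sketch: the shader ALU, not a fixed-function ROP,
# evaluates the blend equation against the destination pixel. Illustrative.

def src_over_opaque(src_rgb, src_a, dst_rgb):
    """GL_SRC_ALPHA / GL_ONE_MINUS_SRC_ALPHA blend over an opaque destination."""
    return tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))

dst = (0.0, 0.0, 1.0)                # opaque blue already in the framebuffer
src, alpha = (1.0, 0.0, 0.0), 0.5    # half-transparent red shader output

print(src_over_opaque(src, alpha, dst))  # -> (0.5, 0.0, 0.5)
```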

While GeForce ULV supports texture compression, it doesn’t support frame buffer compression.

Both AA and AF are supported by GeForce ULV. NVIDIA supports 5X coverage sample AA (the same CSAA we have on the desktop) and up to 16X anisotropic filtering.

Performance is far more difficult to quantify in the ultra mobile space than among desktop GPUs. There are some very good 3D games out for Android and iOS; unfortunately, none of them have built-in benchmarks. There are even titles that would make for good performance tests, but OEM agreements and politics prevent them from being used as such. At the other end of the spectrum we have a lot of absolutely horrible 3D benchmarks, or games with benchmarks that aren’t representative of current or future game performance. In between the two extremes we have some benchmark suites (e.g. GLBenchmark) that aren’t fully representative of current or future GPU performance, but also aren’t completely useless. Unfortunately, today we’ll have to rely on a mixture of all of these to paint a picture of how NVIDIA’s GeForce ULV stacks up against the competition.

Just as in the PC GPU space, game and driver optimizations play as large a role in performance as the GPU architecture itself. NVIDIA believes that its experience with game developers will ultimately give it the edge in the performance race. It’s far too early to tell, as most of NVIDIA’s partners aren’t even playing in the smartphone space yet. However, if PC and console titles make their way to smartphones, NVIDIA’s experience and developer relationships may prove to be a tremendous ally.

Comments

  • matt b - Tuesday, February 8, 2011 - link

    Just curious, because I've heard rumors that HP will use the Qualcomm chipset and I've also heard rumors that they will stick with TI for their new tablets/phones. I just wondered if you know for sure, since I know that you met with folks at CES. I hope that we all find out tomorrow at the HP event.
    Great review.
  • TareX - Wednesday, February 9, 2011 - link

    I'd like to see Tegra 2 on the Xoom compared to Tegra 2 on the Optimus 2X.

    Why? Well, simply put, the only Android version that seems to be optimized for dual-core is Honeycomb.
  • Dark Legion - Wednesday, February 9, 2011 - link

    Why is there no Incredible on 2.2? I could understand if you had both 2.1 and 2.2, like the Evo, but as it stands now it does not show the full/current performance.
  • Morke - Thursday, February 10, 2011 - link

    "It’s a strange dichotomy that LG sets up with this launcher scheme that divides “downloaded” apps from “system applications,” one that’s made on no other Android device I’ve ever seen but the Optimus One. The end result is that most of the stuff I want (you know, since I just installed it) is at the very last page or very bottom of the list, resulting in guaranteed scrolling every single time. If you’re a power user, just replace the launcher with something else entirely."

    You are not right there.
    First, you can create additional categories (aside from system applications and downloads) and move applications between them.
    Secondly, you can rearrange the ordering of the applications inside a category (allowing you to have those on top which you access most frequently). You can also delete applications right away in this edit mode.

    There is a youtube video demonstrating this:
    http://www.youtube.com/watch?v=Dvvtl6pSNp8
    See time index starting with 4:21.

    Maybe you should correct your review on this?
  • Morke - Thursday, February 10, 2011 - link

    The correct youtube URL demonstrating application launcher management is actually
    http://www.youtube.com/watch?v=lDo-1-jwLko&fea...
  • brj_texas - Thursday, February 10, 2011 - link

    Anand,
    A question on the statement in the benchmarking section, "the SunSpider benchmark isn't explicitly multithreaded, although some of the tests within the benchmark will take advantage of more than one core. "

    My understanding was that all of the tests within sunspider are single-threaded, but a dual-core processor can run the javascript engine (and the sunspider tests) in a separate thread from the main browser thread when you call sunspider from a browser window.

    Can you clarify which tests support multi-threading in sunspider if that is in fact what you meant?

    On the topic of multi-threading, we've used moonbat, a multi-core variant of sunspider, to explicitly test multi-core performance with javascript code. I wonder if you have any other benchmarks under investigation that measure multi-core performance?
    Thanks

    -Brian
  • worldbfree4me - Saturday, February 12, 2011 - link

    Thanks for another thorough and in-depth analysis. But I have a question to ask,

    Should we upgrade (break our 2 year contract agreement for this phone) or ride out our contract?

    We trust and value your opinion. Tom’s Hardware does a GPU hierarchy chart every few months; can you do a phone hierarchy in the future?
  • lashton - Sunday, February 13, 2011 - link

    They have a really good idea and lead the market, but it falls short because it's not quite right.
  • tnepres - Tuesday, April 5, 2011 - link

    I now own an Optimus 2X. The first was dead on arrival, but this one is perfect. The LG software is innovative and pleasing to the eye. In various places they made real improvements to the UI that are just brilliant, e.g. the ability to sort and categorize apps. At times the UI is not as fast as you would expect, especially when adding apps/widgets to one of the 7 pages. It seems LG generates a list of widgets for you, so you can see which apps support this mode, and that takes about a second. As I recall, on HTC devices you are just presented with a list of apps and you have to try each one to see if you can widget it.

    The LG keyboard has a brilliant feature: you tap the side of the phone to move the cursor. Sadly, in other respects the keyboard is lacking, e.g. when you long-press you do not get the alternates you might wish for, such as numbers.

    The battery life is superb; using the UI consumes much less power than on my Desire.

    Copy/paste in the browser does not activate via long-press, you have to hit the menu button, but on the plus side it's easier to use than what HTC made.

    During 2 days of very intensive use I have had 1 app (partially) crash, and that was the Marketplace. No other issues so far; it's my verdict that the instability issues are overrated.

    No problems with WiFi using the stock ISP-supplied (TDC) router (Sagemcom).

    To Engadget: how on earth (!!?!!?) can you state there is no use for dual core? When browsing, one core loads Flash and the other loads the rest. It's so fast you can't believe it. Try loading www.ufc.com on a non-dual-core phone and you'll get my drift.

    I do not hesitate to give the Optimus 2X my warm recommendation.

    VERDICT: 9/10 (missing 4G)
  • Sannat - Thursday, May 12, 2011 - link

    GSMArena's sound benchmark for the Optimus 2X isn't great... could it be a s/w issue?
