Understanding Rendering Techniques

It's been years since I've had to describe the differences in rendering techniques but given the hardware we're talking about today it's about time for a quick refresher. Despite the complexities involved in CPU and GPU design, both processors work in a manner that's pretty easy to understand. The GPU fundamentally has one function: to determine the color of each pixel displayed on the screen for a given frame. The input the GPU receives however is very different from a list of pixel coordinates and colors.

A 3D application or game will first provide the GPU with a list of vertex coordinates. Each set includes the coordinates for three vertices in space; together these describe the size, shape and position of a triangle. A single frame is composed of hundreds to millions of these triangles. Literally everything you see on screen is composed of triangles.

Having more triangles (polygons) can produce more realistic scenes but it requires a lot more processing on the front end. The trend in 3D gaming has generally been towards higher polygon counts over time.

The GPU's first duty is to take this list of vertices and convert them into triangles on a screen. Doing so results in a picture similar to what we've got above. We're dealing with programmable GPUs now so it's possible to run code against these vertices to describe their interactions or effects on them. An explosion in an earlier frame may have caused the vertices describing a character's elbow to move. The explosion will also impact lighting on our character. There's going to be a set of code that describes how the aforementioned explosion impacts vertices and another snippet of code that describes which vertices it impacts. These code segments, known as vertex shader programs, run and modify details of the vertices at this stage.

With the geometry defined the GPU's next job is rasterization: figuring out which pixels each triangle covers. From this point on the GPU stops dealing in vertices and starts working in pixel coordinates.
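To make the rasterization step concrete, here is a minimal sketch of one common way to decide coverage: test each pixel center against the three edges of the triangle. The function names and the brute-force loop over every pixel are my own assumptions for illustration; real GPUs do this in dedicated hardware and far more cleverly.

```python
def edge(a, b, p):
    """Signed area term: its sign says which side of the edge a -> b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers fall inside the triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            # Inside when all three edge tests agree in sign, which
            # accepts triangles of either winding order.
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                covered.append((x, y))
    return covered


if __name__ == "__main__":
    pixels = rasterize((0, 0), (4, 0), (0, 4), width=4, height=4)
    print(len(pixels), "pixels covered")
```

From here on, the rest of the pipeline only sees the list of covered pixels, not the vertices that produced them.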

Once rasterized, it's time to give these pixels some color. The color of each pixel is determined by the texture that covers that pixel and/or the pixel shader program that runs on that pixel. Similar to vertex shader programs, pixel shader programs describe effects on pixels (e.g. flicker bright orange at interval x to look like fire).

Textures are exactly what they sound like: wallpaper for your polygons. Knowing pixel coordinates the GPU can go out to texture memory, fetch the texture that maps to those pixels and use it to determine the color of each pixel that it covers.
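As a rough illustration of that texture fetch, the sketch below looks up the nearest texel for a pair of normalized coordinates. The `sample_nearest` name, the list-of-rows texture layout and the 0 to 1 coordinate range are assumptions made for the example; actual hardware typically filters between several texels rather than picking just one.

```python
def sample_nearest(texture, u, v):
    """Fetch the texel closest to normalized coordinates (u, v) in [0, 1].

    texture is a list of rows, each row a list of color values.
    """
    height = len(texture)
    width = len(texture[0])
    # Scale to texel space and clamp so u == 1.0 or v == 1.0 stays in range.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


if __name__ == "__main__":
    checker = [["red", "green"],
               ["blue", "white"]]
    print(sample_nearest(checker, 0.25, 0.25))
    print(sample_nearest(checker, 0.75, 0.25))
```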

There's a lot of blending and other math that happens at this stage to deal with corner cases where you don't have perfect mapping of textures on polygons, as well as dealing with what happens when you've got translucency in your textures. After you get through all of the math however the GPU has exactly what it wanted in the first place: a color value for every pixel on the screen.
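The translucency case can be sketched with the standard "over" blend, where a surface's alpha value decides how much of what is already behind it shows through. This is only one of many blend modes, and the `blend_over` name and the 0 to 1 color range are assumptions of the example rather than anything specific to the GPUs discussed here.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend a translucent source color over the color already at that pixel.

    src_alpha of 1.0 means fully opaque, 0.0 means fully transparent.
    """
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


if __name__ == "__main__":
    red_glass = (1.0, 0.0, 0.0)
    blue_wall = (0.0, 0.0, 1.0)
    print(blend_over(red_glass, 0.5, blue_wall))
```

Note that the blend needs the color of whatever lies behind the translucent surface, which is why translucent surfaces can't simply be thrown away the way hidden opaque ones can.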

Those color values are written out to a frame buffer in memory and the frame buffer is displayed on the screen. This process continues (hopefully) dozens of times per second in order to deliver a smooth visual experience.

The pipeline I've just described is known as an immediate mode renderer. With a few exceptions, immediate mode renderers were the common architectures implemented in PC GPUs over the past 10+ years. These days pure immediate mode renderers are tough to find though.

IMRs render the full car and the tree, even though part of the car is occluded

Immediate mode renderers (IMRs) brute force the problem of determining what to draw on the screen. They take polygons as they receive them from the CPU, then transform and shade them. The biggest problem here is that although data for every polygon is sent to the GPU, some of those polygons will never be displayed on the screen. A character with thousands of polygons may be mostly hiding behind a pillar, but a traditional immediate mode renderer will still put in all of the work necessary to plot its geometry and shade its pixels, even though they'll never be seen. This is called overdraw. Overdraw unfortunately wastes time, memory bandwidth and power - hardly desirable when you're trying to deliver high performance and long battery life. In the old days of IMRs it wasn't uncommon to hear of 4x overdraw in a given scene (i.e. drawing 4x as many pixels as are actually visible to the user). Overdraw becomes even more of a problem as scene complexity increases.
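The overdraw figure is just a ratio: pixels shaded divided by pixels actually visible. The sketch below computes it for a pure IMR that shades everything it is sent, using axis-aligned rectangles as stand-ins for polygons. The `overdraw_factor` helper and the rectangle representation are hypothetical simplifications for this example.

```python
def overdraw_factor(rects, width, height):
    """Shaded fragments divided by distinct pixels touched.

    rects is a list of (x0, y0, x1, y1) boxes drawn in order, with x1 and y1
    exclusive. Nothing is rejected, as in a renderer with no visibility test.
    """
    shaded = 0
    touched = set()
    for x0, y0, x1, y1 in rects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                shaded += 1          # work is spent whether or not it is seen
                touched.add((x, y))  # but each pixel is displayed only once
    return shaded / len(touched) if touched else 0.0


if __name__ == "__main__":
    # Four full-screen layers stacked on top of each other.
    layers = [(0, 0, 8, 8)] * 4
    print(overdraw_factor(layers, 8, 8))
```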

Tile Based Deferred Rendering

On the opposite end of the spectrum we have tile based deferred rendering (TBDR). Immediate mode renderers work in a very straightforward manner. They take vertices, create polygons, transform and light those polygons and finally texture/shade/blend the pixels on them. Tile based deferred renderers take a slightly different approach.

TBDRs subdivide the scene into smaller tiles on the order of a few hundred pixels. Vertex processing and shading continue as normal, but before rasterization the scene is carved up into tiles. This is where the deferred label comes in. Rasterization is deferred until after tiling and texturing/shading is deferred even longer, until after overdraw is eliminated/minimized via hidden surface removal (HSR).
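The tiling step can be pictured as sorting triangles into per-tile lists. The sketch below bins each triangle into every tile its bounding box touches. The 16x16 tile size (256 pixels), the bounding-box test and the `bin_triangles` name are assumptions for illustration; the actual tile dimensions and binning logic vary by GPU and aren't specified here.

```python
TILE = 16  # assumed tile edge in pixels, so each tile holds 256 pixels


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Each triangle is a tuple of three (x, y) screen-space vertices.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for index, tri in enumerate(triangles):
        xs = [vertex[0] for vertex in tri]
        ys = [vertex[1] for vertex in tri]
        # Conservative test: use the bounding box, clamped to the screen.
        min_tx = max(int(min(xs)) // tile, 0)
        max_tx = min(int(max(xs)) // tile, tiles_x - 1)
        min_ty = max(int(min(ys)) // tile, 0)
        max_ty = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                bins[(tx, ty)].append(index)
    return bins


if __name__ == "__main__":
    small = ((1, 1), (10, 1), (1, 10))
    large = ((10, 10), (30, 10), (10, 30))
    for tile_coord, members in sorted(bin_triangles([small, large], 32, 32).items()):
        print(tile_coord, members)
```

Once the lists are built, each tile can be processed on its own using only the geometry in its list.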

Hidden surface removal is performed long before we ever get to the texturing/shading stage. If the frontmost surface being rendered is opaque, there's absolutely zero overdraw in a TBDR architecture. Everything behind the frontmost opaque surface is discarded by performing a per-pixel depth test once the scene has been tiled. In the event of multiple overlapping translucent surfaces, overdraw is still minimized. Only the frontmost opaque surface and any translucent surfaces in front of it are rendered. HSR is performed one tile at a time; only the geometry needed for a single tile is depth tested, which keeps the problem manageable.

With all hidden surfaces removed then, and only then, is all texture data fetched and all pixel shader code executed. Rendering (or more precisely texturing and shading) is deferred until after a per-pixel visibility test is passed. No additional work is expended and no memory bandwidth wasted. Only what is visible in the final scene is textured and shaded on each tile.
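A two-pass sketch shows why the deferred approach avoids wasted shading: the first pass only compares depths, and the second pass shades each surviving pixel once. This handles opaque surfaces only, uses rectangles in place of real geometry, and the `render_tile_deferred` and `shade` names are hypothetical; it illustrates the idea, not Imagination's actual implementation.

```python
def render_tile_deferred(surfaces, tile_w, tile_h, shade):
    """Resolve visibility for one tile first, then shade only what is visible.

    surfaces is a list of (x0, y0, x1, y1, depth, material) opaque boxes with
    x1 and y1 exclusive; a smaller depth is closer to the viewer.
    """
    nearest = {}
    # Pass 1: hidden surface removal. Depths are compared, nothing is shaded.
    for x0, y0, x1, y1, depth, material in surfaces:
        for y in range(max(y0, 0), min(y1, tile_h)):
            for x in range(max(x0, 0), min(x1, tile_w)):
                if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                    nearest[(x, y)] = (depth, material)
    # Pass 2: texture/shade each visible pixel exactly once.
    return {pixel: shade(material) for pixel, (_, material) in nearest.items()}


if __name__ == "__main__":
    shader_calls = []

    def shade(material):
        shader_calls.append(material)  # count how often the shader runs
        return material

    tree = (0, 0, 4, 4, 5.0, "tree")
    car = (0, 0, 4, 4, 1.0, "car")
    render_tile_deferred([tree, car], 4, 4, shade)
    print(len(shader_calls), "shader invocations for a 16 pixel tile")
```

Swapping the order of `tree` and `car` leaves the shader invocation count unchanged, which is the order independence that sets this approach apart.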

The application doesn't need to worry about the order in which polygons are sent for rendering when dealing with a TBDR; the hidden surface removal process takes care of everything.

In memory bandwidth constrained environments TBDRs do incredibly well. Furthermore, the efficiencies of a TBDR really shine when running applications and games that are more shader heavy than geometry heavy. As a result of the extensive hidden surface removal process, TBDRs tend not to do as well in scenes with lots of complex geometry.

What's In Between Immediate Mode and Deferred Rendering?

These days, particularly in the mobile space, many architectures refer to themselves as "tile based". Unfortunately this term can have a wide variety of meanings. The tile based deferred rendering architecture I described above really only applies to GPUs designed by Imagination Technologies. Everything else falls into the category of tile based immediate mode renderers, or immediate mode renderers with early-z.

These GPUs look like IMRs but they implement one or both of the following: 1) scene tiling, 2) early z rejection.

Scene tiling is very similar to what I described in the section on TBDRs. Each frame is divided up into tiles and work is done on a per-tile basis at some point in the rendering pipeline. The goal of dividing the scene into tiles is to simplify the problem of rendering and better match the workload to the hardware (e.g. since no GPU is a million execution units wide, you make the workload more manageable for your hardware). Also by working on small tiles caches behave a lot better.

The big feature that this category of GPUs implements is early-z rejection. Instead of waiting until after the texturing/shading stage to determine pixel visibility, these architectures implement a coarse test for visibility earlier in the pipeline.

Each vertex has a depth value and using those values you can design logic to find out what polygons (or parts of polygons) are occluded from view. GPU makers like ATI and NVIDIA introduced these early visibility tests years ago (early-z or hierarchical-z are some names you may have heard). The downside here is that early-z techniques only work if the application submits vertices in a front-to-back order, which does require extra work on the application side. IMRs process polygons in the order they're received, and you can't reject anything if you're not sure if anything will be in front of it. Even if an application packages up vertex data in the best way possible, there are still situations where overdraw will occur.
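The dependence on submission order is easy to demonstrate with a toy depth buffer. In the sketch below a fragment is shaded only if it passes the depth test at the time it arrives, so the same scene costs very different amounts of work depending on the order it is sent in. It tests every pixel individually, whereas real early-z and hierarchical-z hardware uses coarser tests; the `shaded_fragments` helper and rectangle surfaces are assumptions made for the example.

```python
def shaded_fragments(surfaces, width, height):
    """Count fragments shaded by an IMR with an early depth test.

    surfaces is a list of (x0, y0, x1, y1, depth) opaque boxes processed in
    submission order, with x1 and y1 exclusive; smaller depth is closer.
    """
    depth_buffer = {}
    shaded = 0
    for x0, y0, x1, y1, depth in surfaces:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                # Reject only if something closer has ALREADY been drawn.
                if depth < depth_buffer.get((x, y), float("inf")):
                    depth_buffer[(x, y)] = depth
                    shaded += 1
    return shaded


if __name__ == "__main__":
    front_to_back = [(0, 0, 4, 4, d) for d in (1.0, 2.0, 3.0)]
    back_to_front = list(reversed(front_to_back))
    print("front to back:", shaded_fragments(front_to_back, 4, 4))
    print("back to front:", shaded_fragments(back_to_front, 4, 4))
```

Submitted front to back, the two hidden layers are rejected outright; submitted back to front, all three layers are shaded in full.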

The good news is you get some of the benefits of a TBDR without running into trouble should geometry complexities increase. The bad news is that a non-TBDR architecture will still likely have higher amounts of overdraw and be less memory bandwidth efficient than a TBDR.

Most modern PC GPUs fall into this category. Both NVIDIA's Fermi and AMD's Cayman GPUs do some amount of tiling although they have their roots in immediate mode rendering.

The Mobile Landscape

Understanding the difference between IMRs, IMRs with early-z, TBRs and TBDRs, where do the current ultra mobile GPUs fall? Imagination Technologies' PowerVR SGX 5xx is technically the only tile based deferred renderer that allows for order independent hidden surface removal.

Qualcomm's Adreno 2xx and ARM's Mali-400 both appear to be tile based immediate mode renderers that implement early-z. This is particularly confusing because ARM lists the Mali-400 as featuring "advanced tile-based deferred rendering and local buffering of intermediate pixel states". The secret is in ARM's optimization documentation that states: "One specific optimization to do for Mali GPUs is to sort objects or triangles into front-to-back order in your application. This reduces overdraw." The front-to-back sort requirement is necessary for most early-z technologies to work properly. These GPUs fundamentally tile the scene but don't perform full order independent hidden surface removal. Some aspects of the traditional rendering pipeline are deferred but not to the same extent as Imagination's design.

NVIDIA's GeForce ULP in the Tegra 2 is an IMR with early-z. NVIDIA has long argued that its design is the best for future games with increasing geometry complexities as a result of its IMR design.

Today there's no real downside to building a TBDR in the ultra mobile space. Geometry complexities aren't very high and memory bandwidth does come at a premium. Moving forward however, the trend is likely going to mimic what we saw in the PC space: towards more polygon heavy games. There is one hiccup though: Apple.

In the evolution of the PC graphics industry the installed base of tile based deferred renderers was extremely small. Imagination's technology surfaced in two discrete GPUs: STMicro's Kyro and Kyro II, but neither was enough to stop NVIDIA's momentum at the time. Since immediate mode renderers were the norm, games were simply developed around their limitations. AMD and NVIDIA both eventually implemented elements of tiling and early-z rejection, but TBDRs never took off in PCs.

In the ultra mobile space Apple exclusively uses Imagination Technologies GPUs, which, as I mentioned above, are tile based deferred renderers. Apple also happens to be a major player, if not the biggest, in the smartphone/tablet gaming space today. Any game developer looking to put out a successful title is going to make sure it runs well on iOS hardware. Game developers will likely rely on increasing visual quality through pixel shader effects rather than ultra high polygon counts. As long as Imagination Technologies is a significant player in this space, game developers will optimize for TBDRs.

Comments

  • tipoo - Sunday, September 11, 2011

    The iPhone 4 always scores near the bottom of the 2.0 test since its native resolution is so high, but I'd be interested to know how it does with the resolution independent 2.1 test?
  • B3an - Sunday, September 11, 2011

    ...but the iPhone 4 is already in the 2.1 tests which are all run at 1280x720 so it's equal on every phone... and unsurprisingly it's the worst performer.
  • Lucian Armasu - Sunday, September 11, 2011

    The iPhone 4 has a GPU that is one generation older than the one in the first Galaxy S phone. So that's the main reason why it performs the worst in all these GPU tests.
  • LostViking - Saturday, September 17, 2011

    You can do the math already.

    If you calculate the pixel ratio (width * height) between the iPhone and the others you can correct the numbers.
  • 3lackdeath - Sunday, September 11, 2011

    When are you guys going to start adding WP7 to the Comparisons list WP7 is soooo lacking in your reviews.

    It has been out for a while now you know, a long long time did i say long?.
  • shamalh108 - Sunday, September 11, 2011

    Hi Brian.. first off thanks for the great review..its quite honestly the best I've read on the SGS2..

    As an SGS2 user i need to just testify to my experience of the AOS bug..
    This bug or its effects aren't actually experienced by me while the phone is actually in use, but actually results in a dramatic use of battery when in suspend.. it is intermittent so it won't occur all the time but over the last month I've been able to identify it using battery monitor pro.
    what i find is that in the morning when unplugged i can put my edge data on and then leave the phone in standby for up to two hours and see no drain... if i then proceed to use the phone for about 20min and note the battery percentage , i then lock the phone and leave it in standby again with edge data enabled and push email... after closing all tasks but the battery percentage will drop by up to 10% in those two hours while battery monitor pro reports an estimate usage of 100+ mah ..compared to the same running conditions it was in when just unplugged and consumed almost no power. this isn't always the case though sometimes the phone will only drop 2% or less per hour with the battery monitor pro reporting usage of 25~35 mah ... As you can see this bug actually affects standby time more than nonstop usage and that is probably why the benchmarks havent been affected.. also im not sure if its normal but when the phone is experiencing the high usage and i look at the process cpu usage the events and suspend process are consuming around 15~20% cpu... this checked immediately after unlocking the phone using watchdog task manager pro.
    while i understand all the measurements are estimates .. i really feel the effects of this as with the same usage i can't be certain if ill get the 14hours battery life i need or 10.. what is the normal power consumption for an android phone in suspend as I've noticed my brothers HTC desire consistently consumes 10~15mah in standby with a similar set up..

    again thanks for the great review..
    my international SGS2 is running stock with no root , XXKF3 .
  • willstay - Sunday, September 11, 2011

    I have been using SGS2 for two months now and this is my 3rd Android. In the past, I always flashed closest to stock ROM, now after 2 months, I think google should consider touchWiz kindof UI as default. It is really minimalistic with just few tiny bit feature that makes it way better than stock - folders and page scrolling where I can put important apps in page 1, system apps in page 2 and so on.

    One consistent touchWiz feature to swipe contacts left for message and right for call is a must have.

    I must be having over sensitive eye that comfortable brightness level I use during day (indoor) is zero and for evening and night, I am using app called "Screen Filter" to make it dimmer. (I know this is only me - for my laptop I had to hack drivers to make it dimmer than allowed normally).

    When idle, processor goes back to 200 MHz and normally with wifi off, cellular net off, SGS2 lives through the night depleting only 1% of the battery. When I only use it for phone and sms, I get two days. Most of the time when I have access to desktop, I turn off wifi and push mail. My usual battery indicator runs as follows - fully charged before going to sleep - 99% when I wake up - I turn wifi and push mail on and by the time I move out to office it is 97% - wifi off in office but sometimes on when I move out of my desk to run SIP client and get my desk extension routed to phone and by lunch time it is 90% - push mail on and cellular net on during lunch time 86% - when I reach home it is from 80 to 75% - that is when my phone gets highest load of games, browsing, wifi, pushmail until I plug for charing around 11 pm and before I plug in it is usally 30%. For comparison, the lowly Nokia 1280 I am using for backup ran for 15 days in single charge and there was still 1/5 bar left in it.

    "light weight seems to imply a certain level of cheapness" - people will soon start to understand weight has no correlation with quality and when devices grow bigger and bigger, they will appreciate lighter weight design.

    As for me, this is my first Samsung and I am impressed!! Unfortunagely SGS2 has short life it seems - I am so impressed with this light weight, thinness, SAMOLED+, touchWiz that I am getting SG-Note at whatever cost when it comes out :)
  • shamalh108 - Sunday, September 11, 2011

    hey willstay.. wow ! please help me , how are you getting such astonishing battery life ? what Rom are you on ? is your phone used at all during the day ? i simply can not get that kinda standby consumption between my few use periods during the day.. i love my phone and right now its just the battery life that's frustrating me.. why are the reports so varied .. any info you have would be welcome :)
  • ph00ny - Sunday, September 11, 2011

    I'm also getting a full day of usage like the user above. I ran stock rom forever until i ventured over to the some of the newer custom roms and i'm getting slightly less battery life with the newest sensation 1.6 rom (2.3.4) compared to stock and cognition 1.07.
  • willstay - Sunday, September 11, 2011

    I am using default ROM but flashed kernel for rooting. I guess it must be rouse app. I've found Location And Security -> Use Wireless Networks eats up around 7% of battery through night (which otherwise is only 1%). Sometimes service called MediaService (after I've played songs through Btooth) eats up around 25% through sleep hours. Once I used very nice network bandwidth monitoring app to find individual data usage, it was sipping 25% during sleep hours (I install this app only when I need it). Pushmail on low signal cellular network eats battery like hell - my phone gets warm at the back. Interestingly, always-on low light digital clock of app NoLED eats only 20% through night. For most of the bug related drainage, flushing RAM helps.

    If I were you, I would temporarily uninstall few apps at a time to find the culprit. You may be able to short list possible apps through battery usage tool of the phone too.
