Understanding Rendering Techniques

It's been years since I've had to describe the differences in rendering techniques but given the hardware we're talking about today it's about time for a quick refresher. Despite the complexities involved in CPU and GPU design, both processors work in a manner that's pretty easy to understand. The GPU fundamentally has one function: to determine the color of each pixel displayed on the screen for a given frame. The input the GPU receives however is very different from a list of pixel coordinates and colors.

A 3D application or game will first provide the GPU with a list of vertex coordinates. Each set includes the coordinates for three vertices in space; together these describe the size, shape and position of a triangle. A single frame is composed of hundreds to millions of these triangles. Literally everything you see on screen is composed of triangles.

Having more triangles (polygons) can produce more realistic scenes but it requires a lot more processing on the front end. The trend in 3D gaming has generally been towards higher polygon counts over time.

The GPU's first duty is to take this list of vertices and convert them into triangles on a screen. Doing so results in a picture of the scene built entirely out of triangles. We're dealing with programmable GPUs now so it's possible to run code against these vertices to describe their interactions or effects on them. An explosion in an earlier frame may have caused the vertices describing a character's elbow to move. The explosion will also impact lighting on our character. There's going to be a set of code that describes how the aforementioned explosion impacts vertices and another snippet of code that describes what vertices it impacts. These code segments run and modify details of the vertices at this stage.
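To make the idea of running code against vertices concrete, here's a minimal sketch in Python rather than a real shading language. The function name, the radius test and the linear falloff are all assumptions made up for illustration; it only moves positions and ignores the lighting side of the explosion.

```python
import math

def explosion_vertex_shader(vertex, center, radius, strength):
    """Hypothetical per-vertex program: push a vertex away from an explosion.

    The radius check decides *what* vertices are affected; the falloff
    decides *how* they are affected. Both are illustrative assumptions.
    """
    x, y, z = vertex
    cx, cy, cz = center
    dx, dy, dz = x - cx, y - cy, z - cz
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)

    # Vertices outside the blast radius (or exactly at its center) are untouched.
    if dist == 0.0 or dist >= radius:
        return vertex

    # Linear falloff: the closer to the center, the bigger the push.
    push = strength * (1.0 - dist / radius)
    scale = push / dist
    return (x + dx * scale, y + dy * scale, z + dz * scale)
```

The GPU would run a program like this once per vertex, which is why the same code can deform an elbow made of a dozen vertices or a character made of thousands.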

With the geometry defined, the GPU's next job is rasterization: figuring out which pixels each triangle covers. From this point on the GPU stops dealing in vertices and starts working in pixel coordinates.
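Rasterization can be sketched with the classic edge function test: a pixel is covered if its center sits on the same side of all three triangle edges. This is a simplified illustration, not how any particular GPU implements it; the function names are my own, and it skips the fill rules real hardware uses to avoid drawing shared edges twice.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: tells which side of the edge a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the (x, y) pixels whose centers fall inside the 2D triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Accept either winding order: all non-negative or all non-positive.
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                covered.append((x, y))
    return covered
```

Looping over every pixel on the screen for every triangle is obviously wasteful; real rasterizers restrict the search to the triangle's bounding box or walk it hierarchically, but the coverage test itself is the same idea.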

Once rasterized, it's time to give these pixels some color. The color of each pixel is determined by the texture that covers that pixel and/or the pixel shader program that runs on that pixel. Similar to vertex shader programs, pixel shader programs describe effects on pixels (e.g. flicker bright orange at interval x to look like fire).

Textures are exactly what they sound like: wallpaper for your polygons. Knowing pixel coordinates the GPU can go out to texture memory, fetch the texture that maps to those pixels and use it to determine the color of each pixel that it covers.
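As a rough sketch of that lookup, here's the simplest possible texture fetch: nearest-neighbor sampling. I'm assuming the texture is a plain 2D list of colors and that the coordinates u and v have already been normalized to the 0 to 1 range; real GPUs layer filtering and mipmapping on top of this.

```python
def sample_nearest(texture, u, v):
    """Fetch the texel closest to normalized coordinates (u, v).

    texture is assumed to be a list of rows, each row a list of colors,
    and u, v are assumed to lie in the range [0, 1].
    """
    height = len(texture)
    width = len(texture[0])
    # Scale into texel space and clamp so u == 1.0 or v == 1.0 stays in bounds.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]
```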

There's a lot of blending and other math that happens at this stage to deal with corner cases where you don't have perfect mapping of textures on polygons, as well as dealing with what happens when you've got translucency in your textures. After you get through all of the math however the GPU has exactly what it wanted in the first place: a color value for every pixel on the screen.
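The translucency case boils down to a weighted mix of the incoming color and whatever is already in the frame buffer. Here's a minimal sketch of the standard "over" blend, assuming colors are RGB tuples of floats between 0 and 1 and alpha is the opacity of the incoming pixel; the function name is just for illustration.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend a translucent source color over an existing destination color."""
    return tuple(
        src_alpha * s + (1.0 - src_alpha) * d
        for s, d in zip(src_rgb, dst_rgb)
    )
```

A half transparent red surface drawn over a blue background, for example, comes out as an even mix of the two. Note that the result depends on what was drawn first, which is why translucent surfaces complicate any scheme for throwing away hidden pixels.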

Those color values are written out to a frame buffer in memory and the frame buffer is displayed on the screen. This process continues (hopefully) dozens of times per second in order to deliver a smooth visual experience.

The pipeline I've just described is known as an immediate mode renderer. With a few exceptions, immediate mode renderers were the common architectures implemented in PC GPUs over the past 10+ years. These days pure immediate mode renderers are tough to find though.


IMRs render the full car and the tree, even though part of the car is occluded

Immediate mode renderers (IMRs) brute force the problem of determining what to draw on the screen. They take polygons as they receive them from the CPU, then manipulate and shade them. The biggest problem here is that although data for every polygon is sent to the GPU, some of those polygons will never be displayed on the screen. A character with thousands of polygons may be mostly hiding behind a pillar, but a traditional immediate mode renderer will still put in all of the work necessary to plot its geometry and shade its pixels, even though they'll never be seen. This is called overdraw. Overdraw unfortunately wastes time, memory bandwidth and power - hardly desirable when you're trying to deliver high performance and long battery life. In the old days of IMRs it wasn't uncommon to hear of 4x overdraw in a given scene (i.e. drawing 4x as many pixels as are actually visible to the user). Overdraw becomes even more of a problem as scene complexity increases.
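The overdraw figure is just a ratio: pixels shaded divided by pixels that end up visible. A quick sketch of the arithmetic, assuming each surface is represented as a set of the (x, y) pixels it covers and that a pure IMR shades every one of them:

```python
def overdraw_factor(surfaces):
    """Ratio of pixels shaded to pixels visible for a renderer with no culling.

    surfaces is assumed to be a list of sets of (x, y) pixel coordinates.
    """
    shaded = sum(len(pixels) for pixels in surfaces)
    visible = len(set().union(*surfaces))  # each screen pixel counted once
    if visible == 0:
        return 0.0
    return shaded / visible
```

Four full-screen layers stacked on top of each other give a factor of 4.0, which is the 4x overdraw case: three quarters of the shading work never makes it to the display.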

Tile Based Deferred Rendering

On the opposite end of the spectrum we have tile based deferred rendering (TBDR). Immediate mode renderers work in a very straightforward manner. They take vertices, create polygons, transform and light those polygons and finally texture/shade/blend the pixels on them. Tile based deferred renderers take a slightly different approach.

TBDRs subdivide the scene into smaller tiles on the order of a few hundred pixels. Vertex processing and shading continue as normal, but before rasterization the scene is carved up into tiles. This is where the deferred label comes in. Rasterization is deferred until after tiling and texturing/shading is deferred even longer, until after overdraw is eliminated/minimized via hidden surface removal (HSR).
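The tiling step can be sketched as a binning pass: for each triangle, work out which tiles its bounding box touches and add it to those tiles' lists. This is a simplified illustration under my own assumptions (16 pixel square tiles, conservative bounding box binning, hypothetical names), not a description of any vendor's hardware.

```python
import math

def bin_triangles(triangles, width, height, tile=16):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    triangles is a list of three (x, y) screen-space vertices each.
    Binning by bounding box is conservative: a tile may list a triangle
    that only comes close to it.
    """
    bins = {}
    last_tx = (width - 1) // tile
    last_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the tiles that actually exist on screen.
        min_tx = max(math.floor(min(xs)) // tile, 0)
        max_tx = min(math.floor(max(xs)) // tile, last_tx)
        min_ty = max(math.floor(min(ys)) // tile, 0)
        max_ty = min(math.floor(max(ys)) // tile, last_ty)
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins
```

Once every triangle is binned, each tile can be processed on its own using only the short list of geometry that lands in it.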

Hidden surface removal is performed long before we ever get to the texturing/shading stage. If the frontmost surface being rendered is opaque, there's absolutely zero overdraw in a TBDR architecture. Everything behind the frontmost opaque surface is discarded by performing a per-pixel depth test once the scene has been tiled. In the event of multiple overlapping translucent surfaces, overdraw is still minimized: only surfaces in front of the nearest opaque surface are rendered. HSR is performed one tile at a time; only the geometry needed for a single tile is depth tested, which keeps the problem manageable.

With all hidden surfaces removed then, and only then, is all texture data fetched and all pixel shader code executed. Rendering (or more precisely texturing and shading) is deferred until after a per-pixel visibility test is passed. No additional work is expended and no memory bandwidth wasted. Only what is visible in the final scene is rasterized, textured and shaded on each tile.
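Here's a toy model of that flow, assuming every surface is opaque and is described as a depth, a name and the set of pixels it covers (all hypothetical structures for illustration). Each tile first resolves which surface is nearest at every pixel, and only then "shades" each visible pixel exactly once.

```python
def tbdr_render(surfaces, width, height, tile=2):
    """Toy tile based deferred renderer for opaque surfaces.

    surfaces: list of (depth, name, pixels), smaller depth being closer.
    Returns the frame buffer (pixel -> surface name) and the number of
    pixels shaded.
    """
    framebuffer = {}
    shaded = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Hidden surface removal: keep only the nearest surface per pixel.
            nearest = {}
            for depth, name, pixels in surfaces:
                for (x, y) in pixels:
                    if not (tx <= x < tx + tile and ty <= y < ty + tile):
                        continue
                    if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                        nearest[(x, y)] = (depth, name)
            # Deferred shading: one shading operation per visible pixel.
            for pixel, (_, name) in nearest.items():
                framebuffer[pixel] = name
                shaded += 1
    return framebuffer, shaded
```

With a car covering a 4x4 screen and a tree in front of its left half, this shades 16 pixels no matter which of the two is submitted first.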

The application doesn't need to worry about the order in which polygons are sent for rendering when dealing with a TBDR; the hidden surface removal process takes care of everything.

In memory bandwidth constrained environments TBDRs do incredibly well. Furthermore, the efficiencies of a TBDR really shine when running applications and games that are more shader heavy than geometry heavy. As a result of the extensive hidden surface removal process, TBDRs tend not to do as well in scenes with lots of complex geometry.

What's In Between Immediate Mode and Deferred Rendering?

These days, particularly in the mobile space, many architectures refer to themselves as "tile based". Unfortunately this term can have a wide variety of meanings. The tile based deferred rendering architecture I described above really only applies to GPUs designed by Imagination Technologies. Everything else falls into the category of tile based immediate mode renderers, or immediate mode renderers with early-z.

These GPUs look like IMRs but they implement one or both of the following: 1) scene tiling, 2) early z rejection.

Scene tiling is very similar to what I described in the section on TBDRs. Each frame is divided up into tiles and work is done on a per-tile basis at some point in the rendering pipeline. The goal of dividing the scene into tiles is to simplify the problem of rendering and better match the workload to the hardware (e.g. since no GPU is a million execution units wide, you make the workload more manageable for your hardware). Also by working on small tiles caches behave a lot better.

The big feature that this category of GPUs implements is early-z rejection. Instead of waiting until after the texturing/shading stage to determine pixel visibility, these architectures implement a coarse test for visibility earlier in the pipeline.

Each vertex has a depth value and using those values you can design logic to find out what polygons (or parts of polygons) are occluded from view. GPU makers like ATI and NVIDIA introduced these early visibility tests years ago (early-z or hierarchical-z are some names you may have heard). The downside here is that early-z techniques only work if the application submits vertices in a front-to-back order, which does require extra work on the application side. IMRs process polygons in the order they're received, and you can't reject anything if you're not sure if anything will be in front of it. Even if an application packages up vertex data in the best way possible, there are still situations where overdraw will occur.
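The order dependence is easy to see in a toy model. Assume an immediate mode renderer with a depth buffer that tests each pixel before shading it, and surfaces given as a depth plus the set of pixels they cover (again, hypothetical structures for illustration only):

```python
def imr_early_z(surfaces):
    """Count pixels shaded by a toy IMR that depth tests before shading.

    surfaces: list of (depth, pixels) in submission order, smaller depth
    being closer. Surfaces are processed strictly in the order received.
    """
    depth_buffer = {}
    shaded = 0
    for depth, pixels in surfaces:
        for pixel in pixels:
            # Reject the pixel if something closer has already been drawn.
            if pixel in depth_buffer and depth_buffer[pixel] <= depth:
                continue
            depth_buffer[pixel] = depth
            shaded += 1
    return shaded

# A car filling a 4x4 screen, with a tree in front of its left half.
car = (5.0, {(x, y) for x in range(4) for y in range(4)})
tree = (1.0, {(x, y) for x in range(2) for y in range(4)})

front_to_back = imr_early_z([tree, car])  # the hidden half of the car is rejected
back_to_front = imr_early_z([car, tree])  # the car is fully shaded, then overdrawn
```

Submitted front to back, the renderer shades 16 pixels; submitted back to front, it shades 24, because nothing was in front of the car at the time it arrived.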

The good news is you get some of the benefits of a TBDR without running into trouble should geometry complexities increase. The bad news is that a non-TBDR architecture will still likely have higher amounts of overdraw and be less memory bandwidth efficient than a TBDR.

Most modern PC GPUs fall into this category. Both NVIDIA's Fermi and AMD's Cayman GPUs do some amount of tiling although they have their roots in immediate mode rendering.

The Mobile Landscape

Understanding the difference between IMRs, IMRs with early-z, TBRs and TBDRs, where do the current ultra mobile GPUs fall? Imagination Technologies' PowerVR SGX 5xx is technically the only tile based deferred renderer that allows for order independent hidden surface removal.

Qualcomm's Adreno 2xx and ARM's Mali-400 both appear to be tile based immediate mode renderers that implement early-z. This is particularly confusing because ARM lists the Mali-400 as featuring "advanced tile-based deferred rendering and local buffering of intermediate pixel states". The secret is in ARM's optimization documentation that states: "One specific optimization to do for Mali GPUs is to sort objects or triangles into front-to-back order in your application. This reduces overdraw." The front-to-back sort requirement is necessary for most early-z technologies to work properly. These GPUs fundamentally tile the scene but don't perform full order independent hidden surface removal. Some aspects of the traditional rendering pipeline are deferred but not to the same extent as Imagination's design.

NVIDIA's GeForce ULP in the Tegra 2 is an IMR with early-z. NVIDIA has long argued that its design is the best for future games with increasing geometry complexities as a result of its IMR design.

Today there's no real downside to building a TBDR in the ultra mobile space. Geometry complexities aren't very high and memory bandwidth does come at a premium. Moving forward however, the trend is likely going to mimic what we saw in the PC space: towards more polygon heavy games. There is one hiccup though: Apple.

In the evolution of the PC graphics industry the installed base of tile based deferred renderers was extremely small. Imagination's technology surfaced in two discrete GPUs: STMicro's Kyro and Kyro II, but neither was enough to stop NVIDIA's momentum at the time. Since immediate mode renderers were the norm, games simply developed around their limitations. AMD and NVIDIA both eventually implemented elements of tiling and early-z rejection, but TBDRs never took off in PCs.

In the ultra mobile space Apple exclusively uses Imagination Technologies GPUs, which, as I mentioned above, are tile based deferred renderers. Apple also happens to be a major player, if not the biggest, in the smartphone/tablet gaming space today. Any game developer looking to put out a successful title is going to make sure it runs well on iOS hardware. Game developers will likely rely on increasing visual quality through pixel shader effects rather than ultra high polygon counts. As long as Imagination Technologies is a significant player in this space, game developers will optimize for TBDRs.


132 Comments


  • numberoneoppa - Wednesday, September 14, 2011 - link

    Guys, that mysterious notch you write about is not for straps, it's for phone charms, and it's arguably my favourite feature of samsung phones. (In korea, phone charms can be used for more than just cute things, one can get a T-money card that will hang here, or an apartment key).
  • Tishyn - Wednesday, September 14, 2011 - link

    I spend hours every week just browsing through reviews and tests comparing devices and vendors. This is one of the most interesting and most comprehensive reviews I've read in a very long time.

    I especially enjoyed the rendering part and how it relates to the ultra mobile device market. Thumbs up!
  • milli - Wednesday, September 14, 2011 - link

    Brian / Anand, why are you so reluctant to test chips from this company? ZiiO tablets, sporting the ZMS-08, are available for a while now and i'm sure Creative would send you the new Jaguar3 tablet (ZMS-20) if you guys would ask for it.
    The ZMS-20 has 26 GFlops ... faster than anything you've tested till now. The ZMS-40 coming in Q4 doubles that number!
    I'm an old school IT technician and I for one don't understand your lack of interest. The GPU's in these chips are based on technology that Creative acquired with the 3DLabs purchase.
  • rigel84 - Thursday, September 15, 2011 - link

    Just a quick tip: You can take a screenshot by pressing the power and home button at the same time.

    If you double tap your home button it will bring the voice talk feature.

    While watching video clips just press the power button to disable the touch sensitive buttons.

    Swipe your finger to the left on contact name to send him a message
    Swipe your finger to the right on the contact name to dial the contact.

    To see all the tabs in the browser just pinch inside twice :)

    If you experience random reboots when you drop it on the table, or if you are leaning towards things or running, then try to cut a piece of paper and put it under the battery. It happens because the battery briefly loses connection to the pins. If you check XDA you can see that many people have this problem, and I had it too. I was experiencing many random reboots whenever I had it in my pocket, but after I put a piece of paper below the battery they all disappeared.

    A few things...
    - GPS is horrible if you ask me. Unless I download the data before with gps-status then it takes ages. Mostly 15-30 seconds with 2.3.3 (no idea if the radio got updated in the release)
    - Kies AIR is HORRIBLE! It's on par with realmedia's real player from 10 years ago. Crash on crash on crash and sluggish behavior.
    - I don't know whether it's the phone or not, but I've been missing a lot of text messages after I got my Galaxy S2. I'm on the same net, but along with the poor GPS reception I'm suspecting the phone :(
    - There is a stupid 458 character limit on textmessages, and then they are auto-converted to an MMS message. There is a fixed mms.apk on XDA (requires root) or you can download something like Go SMS Pro (still free) on the market, which removes this stupid limit.
  • ph00ny - Thursday, September 15, 2011 - link

    Odd

    I haven't seen any posts about the battery disconnect issues and if you've been browsing the xda forum, probably saw my thread about dropping my phone on concrete twice...

    As for Kies AIR, i've used it twice and my expectation was low to begin and it wasn't that bad. Some things were definitely slow but it's a good start

    -GPS for me has always been solid. I even used it on multiple trips in less than ideal location, not a single glitch even with shoddy cell reception.
  • ciparis - Tuesday, September 27, 2011 - link

    I've been using Sprint's SGS2 (Epic 4G) for less than a day, but already there are some annoying points which I'm surprised aren't mentioned in this review:

    1) The digitizer lags behind finger movement.
    In the web browser, when your finger moves, there is a disconnected rubber-band effect before the screen catches up with your finger. This is visible in the browsing smoothness video as well, and it's very noticeable in actual use. Coming from an iPhone 4, it feels cheap and broken.

    2) Back/Forward navigation often ignores the previous scroll point.
    If you spend some amount of time reading a page you arrived at from a link (it seems to be about 10 seconds or so), hitting back doesn't take you back where you were previously reading from -- instead of returning you to the page position where the link was, it drops you at the top of the page. This makes real web usage tedious. On the Sprint, the timing seems to be related to when the 4G icon indicates sleep mode: hit back before the radio sleeps and you are returned to the right spot. In actual use, this rarely happens.

    3) The browser resets the view to the top, even after you've started scrolling.
    When loading a page, there's a point in which the page is visible and usable, but it's technically still loading (which can go on for quite a while, depending on the page). It's natural to start reading the page and scrolling down, but typically the phone will randomly jerk the scroll back up to the top of the page, sometimes several times before the page is done. This is unbelievably annoying.

    I suppose expecting an Apple level of polish prior to release is unrealistic, but Samsung seems hell-bent on positioning themselves as an Apple-level alternative; even the power brick looks like they took the square Apple USB charger, colored it black, and slapped their logo on it. The point being, they're inviting direct comparison, and it's a comparison their software team isn't ready to deliver on -- certainly not out of the box.
  • ciparis - Tuesday, September 27, 2011 - link

    How are you supposed to use this phone if the keyboard is covering up the text fields, there's no "next" button to get to the next field, you can't see what you're typing, and there's no button to make the keyboard go away?

    Case in point: go to Google News and click on Feedback at the bottom of the page. There's no scrolling room at the bottom, so the keyboard obscures the fields; I was unable to send feedback to Google that their news site was opening every link in a new browser window on a mobile phone (...) despite my account having the preference for that set to "off", because I couldn't navigate the form fields.
  • mythun.chandra - Wednesday, September 28, 2011 - link

    Just realized there are no numbers for the Adreno 220 in the GLBench 2.1 offscreen tests...?
  • sam46 - Saturday, October 1, 2011 - link

    brian,please tell me which one of these smartphones is the best.i wanna purchase one of them so,pls help me in deciding.
  • b1cb01 - Wednesday, October 5, 2011 - link

    I love the green wallpaper on the first page of the review, but I can't find it anywhere. Could someone point me to where I could find it?
