The PowerVR Approach: Tile Rendering

You may be thinking to yourself, "Isn't that a lot of wasted time rendering pixels that are just going to be thrown out later?" Well, Imagination Technologies came to the same conclusion and decided to rethink the entire process. What they came up with is an entirely different approach to 3D rendering, known as tile-based rendering, which is the key to their PowerVR technology. The basic idea is to eliminate redundant processing in the 3D pipeline, which in turn reduces memory bandwidth requirements.

The first difference in the pipeline is critical. Rather than processing polygons one at a time as they are passed to the accelerator, the chip collects them into groups known as display lists. This allows each scene to be broken up into smaller tiles that are rendered independently, which leads to a number of benefits.
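Sorting a scene's polygons into per-tile display lists can be sketched as follows. This is a minimal illustration under stated assumptions, not Imagination Technologies' actual binning hardware: it assumes a hypothetical 32x32-pixel tile and uses a conservative bounding-box test, so a triangle may be listed in a tile it only nearly touches. The names `Triangle`, `TILE_SIZE` and `bin_triangles` are made up for this example.

```python
from __future__ import annotations
from dataclasses import dataclass

TILE_SIZE = 32  # hypothetical tile edge in pixels, not a documented KYRO value


@dataclass
class Triangle:
    name: str
    xs: tuple[float, float, float]  # screen-space x of the three vertices
    ys: tuple[float, float, float]  # screen-space y of the three vertices


def bin_triangles(triangles, width, height, tile=TILE_SIZE):
    """Build one display list per tile for a whole scene.

    A triangle is appended to every tile its screen-space bounding box
    overlaps (a conservative test; real hardware can be more exact).
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    display_lists = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for tri in triangles:
        # Convert the bounding box to tile indices, clamped to the screen.
        x0 = max(0, int(min(tri.xs)) // tile)
        x1 = min(tiles_x - 1, int(max(tri.xs)) // tile)
        y0 = max(0, int(min(tri.ys)) // tile)
        y1 = min(tiles_y - 1, int(max(tri.ys)) // tile)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                display_lists[(tx, ty)].append(tri.name)
    return display_lists
```

Once every polygon has been binned, each tile can be rendered using only its own list, independently of the others.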

Since each tile is only a small piece of the whole scene, key operations can be performed on-chip without accessing external memory. The most important function implemented on-chip in PowerVR technology is the z-buffer used for hidden surface removal, which cuts memory accesses significantly. The KYRO's on-chip z-buffer is 24 bits deep, regardless of the color depth of the external frame buffer. Being on-chip eliminates the continual z-buffer memory accesses that traditional accelerators must perform and also frees up space in the card's external memory. The amount saved is equal to the memory required for the frame buffer, since traditional 3D accelerators run the z-buffer at the same color depth and resolution as the frame buffer.
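To put the memory saving in numbers, here is a small sketch of the external z-buffer a conventional card would allocate. It assumes, as the paragraph above states, that the z-buffer matches the frame buffer's resolution and depth; the function name `zbuffer_bytes` and the 1024x768 example resolution are chosen only for illustration.

```python
def zbuffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of a conventional card's external z-buffer."""
    return width * height * bits_per_pixel // 8


# Assumption: the z-buffer runs at the frame buffer's depth, as on
# conventional accelerators. A KYRO keeps this memory free.
for bpp in (16, 32):
    mb = zbuffer_bytes(1024, 768, bpp) / 2**20
    print(f"1024x768 at {bpp}-bit: {mb:.1f} MB of z-buffer")
```

At 1024x768 that works out to 1.5 MB at 16-bit and 3 MB at 32-bit, memory a tile renderer can spend on textures instead.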

Next is the ability to texture only the pixels that are visible on screen, so no pixel is ever rendered only to be thrown out later. This is possible because each tile's display list includes all the polygons for that tile, so hidden surface removal can occur before textures are applied, eliminating overdraw. A small "tile buffer," essentially a frame buffer the size of an individual tile, is also located on-chip so that blending can be performed without going to external memory.
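The difference between texturing as polygons arrive and texturing after hidden surface removal can be shown with a toy model. This is a simplified sketch, not the KYRO's pipeline: a "fragment" here is a hypothetical `(pixel, depth, polygon)` tuple, smaller depth means closer, and each textured fragment counts as one texturing operation.

```python
def textured_fragments_immediate(fragments):
    """Conventional pipeline: z-test fragments in submission order and
    texture every fragment that passes at that moment, even if a closer
    fragment later overwrites it (overdraw)."""
    zbuf = {}
    textured = 0
    for pixel, depth, _poly in fragments:
        if depth < zbuf.get(pixel, float("inf")):
            zbuf[pixel] = depth
            textured += 1  # texture work done now, possibly wasted later
    return textured


def resolve_visibility(fragments):
    """Tile-based pipeline, step 1: hidden surface removal over the
    tile's whole display list, before any texturing."""
    visible = {}
    for pixel, depth, poly in fragments:
        if pixel not in visible or depth < visible[pixel][0]:
            visible[pixel] = (depth, poly)
    return {pixel: poly for pixel, (_depth, poly) in visible.items()}


def textured_fragments_deferred(fragments):
    """Step 2: texture only the surviving fragment of each pixel."""
    return len(resolve_visibility(fragments))
```

With three polygons stacked on one pixel and drawn back to front, the immediate approach textures three fragments while the deferred one textures just the nearest.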

This is how the KYRO is able to have a higher effective fillrate than its theoretical fillrate. According to Imagination Technologies, an overdraw of between 2.5 and 3.5 is typical for today's games. Taking an average of 3 allows them to claim a 750 megapixel/s effective fillrate from the 250 megapixel/s that the KYRO is actually rated at. Imagination Technologies and STMicro like to refer to this as "Deferred Texturing."
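The effective-fillrate claim is a straight multiplication of the raw fillrate by the overdraw that deferred texturing avoids. A minimal sketch, assuming the overdraw is removed entirely and the game is fillrate-limited (the function name is illustrative):

```python
def effective_fillrate(raw_mpixels, overdraw):
    """Fillrate (Mpixels/s) a conventional card would need to match a
    renderer that only textures visible pixels, at the given overdraw."""
    return raw_mpixels * overdraw


# Imagination Technologies' quoted overdraw range for current games.
for od in (2.5, 3.0, 3.5):
    print(f"overdraw {od}: {effective_fillrate(250, od):.0f} Mpixels/s")
```

Across the quoted range this gives 625 to 875 Mpixels/s, with the average overdraw of 3 yielding the advertised 750.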

Since texturing is performed on-chip, multitexturing becomes much more efficient in certain circumstances. Consider the GeForce 2 GTS, which can apply 2 textures to each of 4 pixels in a single pass. If a single pixel needs more than 2 textures, the GeForce 2 GTS has to render it in at least two passes, and each extra pass means the geometry data must be sent again. The KYRO, on the other hand, is capable of applying up to 8 textures to a pixel in a single pass. Tile rendering, once again, reduces memory bandwidth requirements. Note that this does not mean the KYRO can apply 8 textures in a single clock.
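The pass count follows from dividing the textures a pixel needs by the textures the chip can apply per pass and rounding up. A small sketch using the per-pass limits given above (2 for the GeForce 2 GTS, 8 for the KYRO); it counts passes only, not clock cycles:

```python
import math

# Textures per pixel per pass, as stated in the text.
GEFORCE2_GTS = 2
KYRO = 8


def passes_needed(textures_per_pixel, textures_per_pass):
    """Rendering passes (and thus geometry submissions) required."""
    return max(1, math.ceil(textures_per_pixel / textures_per_pass))


for n in range(1, 9):
    print(f"{n} textures: GeForce 2 GTS {passes_needed(n, GEFORCE2_GTS)} pass(es), "
          f"KYRO {passes_needed(n, KYRO)} pass(es)")
```

Three or four textures cost the GeForce 2 GTS two passes and five or six cost three, while the KYRO stays at one pass up to eight.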

Because texturing and z-buffering are performed on-chip, they can be done in full 32-bit color without the large performance penalty that traditional architectures must incur. Further, the internal 32-bit rendering occurs regardless of the frame buffer's color depth. The penalty that most architectures pay for 32-bit rendering stems from memory bandwidth constraints, which are in turn caused by constant z-buffer accesses and unnecessary overdraw. In an ideal world, with infinite memory bandwidth, traditional 3D architectures would not slow down when rendering in 32-bit color. Imagination Technologies and STMicro like to refer to this as "Internal True Color."
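A rough per-frame estimate shows why the 32-bit penalty is mostly a bandwidth problem. This is a deliberately simplified model, not measured data: it counts only z and color traffic (texture reads are ignored), assumes the z-buffer matches the color depth, and assumes the worst case where every fragment passes the z-test. The function name `frame_traffic_bytes` is hypothetical.

```python
def frame_traffic_bytes(width, height, bpp, overdraw, tiled):
    """Approximate external z + color traffic for one frame, in bytes.

    Simplifications: texture reads ignored; z-buffer depth equals color
    depth; every fragment passes the z-test (back-to-front worst case).
    """
    bytes_pp = bpp // 8
    pixels = width * height
    if tiled:
        # z-buffer and blending stay on-chip: each finished pixel is
        # written to the external frame buffer exactly once.
        return pixels * bytes_pp
    # Conventional: every fragment reads z, writes z, and writes color.
    return int(pixels * overdraw * 3 * bytes_pp)


for bpp in (16, 32):
    conv = frame_traffic_bytes(1024, 768, bpp, 3, tiled=False) / 2**20
    tile = frame_traffic_bytes(1024, 768, bpp, 3, tiled=True) / 2**20
    print(f"{bpp}-bit: conventional {conv:.1f} MB/frame, tiled {tile:.1f} MB/frame")
```

Under these assumptions, at 1024x768 with an overdraw of 3 a conventional card moves 27 MB of z and color data per 32-bit frame versus 3 MB for the tile renderer. Both figures double from 16-bit to 32-bit, but the conventional card starts from a far larger base.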

So why not render in 32-bit mode all the time? If it were that simple, the KYRO probably would always operate in 32-bit mode. The fact remains that a 32-bit frame buffer and 32-bit textures still take up twice as much memory as 16-bit ones. While the KYRO renders each tile on-chip, it still has to write each completed tile to the frame buffer and read textures from memory, so the memory bandwidth requirements for 32-bit color are still double those for 16-bit. The obvious question, then, is why not use a 16-bit frame buffer with 32-bit internal rendering all the time? As the screenshots below show, full 32-bit still looks better, since the 16-bit image is dithered down from the internal 32-bit result. Note, however, how much less dithering is visible in the KYRO's 16-bit image compared with most cards' 16-bit rendering.

16-bit is shown above, 32-bit below.

The images above are JPEG-compressed and thus have some quality loss compared to the originals.
Click here to download a zip file (300KB) with the original images in BMP format.
For the full effect, the images should be viewed full screen.
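"Dithering down" means quantizing each 8-bit color channel to fewer bits while varying the rounding threshold from pixel to pixel, so that neighboring pixels average out to the original color. The article does not describe the KYRO's dithering method; the sketch below uses generic 4x4 Bayer ordered dithering to the common RGB565 16-bit format purely as an illustration, and all names in it are hypothetical.

```python
# Standard 4x4 Bayer threshold matrix (each value 0..15 appears once).
BAYER_4X4 = (
    (0, 8, 2, 10),
    (12, 4, 14, 6),
    (3, 11, 1, 9),
    (15, 7, 13, 5),
)


def quantize(value, bits, threshold):
    """Reduce an 8-bit channel to `bits` bits, rounding up only when the
    fractional remainder exceeds this pixel's Bayer threshold."""
    levels = (1 << bits) - 1
    scaled = value * levels / 255  # exact position between output levels
    base = int(scaled)
    if scaled - base > (threshold + 0.5) / 16 and base < levels:
        base += 1
    return base


def rgb888_to_rgb565(r, g, b, x, y):
    """Pack one 32-bit-style pixel at screen position (x, y) into RGB565."""
    t = BAYER_4X4[y % 4][x % 4]
    return (quantize(r, 5, t) << 11) | (quantize(g, 6, t) << 5) | quantize(b, 5, t)
```

For example, a red value of 128 sits about 56% of the way between two 5-bit levels; across a 4x4 block, 9 of the 16 pixels round up, so the block's average stays close to the original shade. The pattern of rounded-up pixels is what shows up as visible dithering in a 16-bit image.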

The last major advantage of tile rendering is that it scales easily, whether with multiple cores on a single chip or with multiple chips working in tandem. Each chip simply renders different tiles in parallel with the others. This is completely unlike ATI's AFR (alternate frame rendering) technology used on the Rage Fury MAXX or 3dfx's SLI on the Voodoo2 and Voodoo5.
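Splitting a frame's tiles between chips can be sketched as below. The article does not say how work would be divided; this assumes a simple interleaved (checkerboard-style) assignment so that each chip gets a similar share of every screen region, and `assign_tiles` is a hypothetical name.

```python
def assign_tiles(tiles_x, tiles_y, num_chips):
    """Distribute a frame's tiles across chips in an interleaved pattern.

    Every tile goes to exactly one chip; because tiles are independent,
    the chips can render their lists in parallel within the same frame.
    """
    work = {chip: [] for chip in range(num_chips)}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            work[(tx + ty) % num_chips].append((tx, ty))
    return work
```

By contrast, AFR gives each chip a whole frame to itself, alternating frames between chips, so the chips split the work over time rather than over screen space.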
