Original Link: http://www.anandtech.com/show/558



Introduction

When things are going crazy in your life, taking a step back for a moment to reevaluate everything can often be the key to a revelation that solves some of those problems. The same concept can be applied to all walks of life and any circumstances.

In fact, the same can even be said about technology. After all, this is the way most revolutionary products have come about. Take a look at RISC CPUs, for example. In the mid-80s, RISC technology was developed by a bunch of grad students who took a step back to find a more efficient way to design a CPU. In only a few months of development, they were able to surpass the performance of the most complex CISC machines available from all the big CPU manufacturers and their huge R&D budgets. Today, just about every new CPU is RISC-based, or at least very RISC-like internally.

There are numerous examples throughout history, and you can even see companies taking that approach today in the computer industry. While some of these crazy ideas will fail, and thus be forgotten, others will provide the big breakthroughs for the 21st century. Transmeta believes they have a unique way to design a CPU that gives great flexibility and performance for today and tomorrow. Rambus believes their RDRAM technology can overcome bottlenecks they see on the horizon with current memory technology. Who knows if we'll even remember Transmeta or Rambus if they fail, but if they succeed they will definitely not be forgotten.

Interestingly, 3D acceleration has remained relatively unchanged at its root for quite a while. Sure, 3D acceleration has gotten faster thanks to clock rate increases and parallel pipelines. True, we've constantly moved more of the 3D pipeline into hardware and left less for the CPU. But the same basic rendering process has been in place since the 1960's, although back then it took a building filled with room-sized computers days to calculate a scene that current accelerators render at 100 frames a second with ease.

Nevertheless, the pipeline is basically the same and we're beginning to run into the limitations of the current 3D rendering process. Take a look at the GeForce 2 GTS - we've shown clearly in our Overclocking the GeForce 2 GTS Guide that the core clock increases we are able to achieve today are quickly outstripping the memory bandwidth available with today's technology. While tomorrow's memory technology, whatever that may be, will certainly help, improvements to the core are happening at a much more rapid pace.

The time is right for someone to step forward with something revolutionary, and the first one to do so successfully may well take over the consumer 3D market if others don't catch up quickly. Imagination Technologies and STMicro believe that the solution is tile rendering. Tile rendering is at the heart of their new chip, the PowerVR Series 3 dubbed KYRO.



KYRO Specifications

  • 12 million transistors
  • 0.25 micron process
  • 125 MHz core / memory clock
  • 2 Pixel Pipelines
  • 250 megapixel/s fillrate (750 megapixel/s effective fillrate)
  • Support for 16-64 MB of SDRAM or SGRAM
  • 128-bit data path to memory (2GB/s bandwidth)
  • 32-bit z-buffer
  • Tile rendering architecture
  • Full Scene Anti-Aliasing (2x and 4x)
  • Environment Mapped Bump Mapping (EMBM)
  • 8-Layer Multitexturing
  • Motion compensation support
  • Support for AGP 4X, SBA, DiME
  • DXTC Texture Compression
  • Full OpenGL ICD

A quick glance at those specs may elicit an "Are you kidding me?" initial response from many of you. After all, 125 MHz, 2 pixel pipelines, a 250 megapixel/s fillrate, and a 0.25 micron manufacturing process sound a lot like an NVIDIA RIVA TNT2 (not Ultra), which has been out for over a year. But look closer and things quickly become interesting - the key is the tile rendering architecture, which we'll discuss in depth a bit later.

For now, note the 750 megapixel/s "effective" fillrate. Never before have we seen manufacturers quote an "effective fillrate" number as part of their specs. The simple reason is that for traditional 3D accelerators, the actual or effective fillrate never exceeds the theoretical fillrate. Of course, the KYRO can't run faster than its theoretical limitations. Rather, the PowerVR tile rendering architecture operates more efficiently than a traditional accelerator, allowing the KYRO, with its "lowly" 250 megapixel/s fillrate, to perform like a traditional accelerator with a fillrate of 750 megapixel/s. At least, that's what Imagination Technologies and STMicro claim, but we'll have to test that claim with real games first. Once again, the key is the tile rendering architecture that allows the KYRO to achieve this seemingly impossible feat.

Now let's take a look at that 125 MHz memory clock, which results in a memory bandwidth of 2GB/s, a far cry from the 5.3GB/s of the GeForce 2 GTS and Voodoo5 5500 or even the 2.7GB/s of the GeForce SDR. Much has been made of the GeForce 2 GTS and GeForce SDR having their fillrate limited by memory bandwidth at higher resolutions - our Overclocking the GeForce 2 GTS Guide illustrates this fact quite clearly. Yet the KYRO scoffs at NVIDIA and 3dfx and then gets away with it by using a more efficient architecture. Yet again, the key is the tile rendering architecture and its ability to use memory bandwidth much more efficiently than traditional 3D accelerators.
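The bandwidth figures quoted above follow from bus width times memory clock. Here is a minimal sketch of that arithmetic. The function name is ours, and the 128-bit bus width for the two GeForce cards is our assumption (only the KYRO's bus width appears in the specs above); the memory clocks are the default clocks listed in our test setup.

```python
def memory_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak memory bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# KYRO: 128-bit SDR memory at 125 MHz (from the spec list).
kyro = memory_bandwidth_gb_s(128, 125)

# GeForce 2 GTS: 166 MHz DDR, two transfers per clock.
# The 128-bit bus width is an assumption, not a figure from this article.
geforce2_gts = memory_bandwidth_gb_s(128, 166, transfers_per_clock=2)

# GeForce SDR: 166 MHz SDR, same assumed 128-bit bus.
geforce_sdr = memory_bandwidth_gb_s(128, 166)

for name, value in [("KYRO", kyro),
                    ("GeForce 2 GTS", geforce2_gts),
                    ("GeForce SDR", geforce_sdr)]:
    print(f"{name}: {value:.1f} GB/s")
```

Under those assumptions the results round to the 2GB/s, 5.3GB/s and 2.7GB/s quoted above.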

The rest of the specs are pretty typical, but we can see that Imagination Technologies threw in just about every feature that other 3D accelerators use. These include Environment Mapped Bump Mapping (EMBM), which Matrox popularized with their G400; Full Scene Anti-Aliasing, which 3dfx has been hyping for over a year and which the Voodoo5 5500 provides through its proprietary T-Buffer; and DXTC Texture Compression, which S3 started with S3TC on the Savage 4 and which is just now showing its benefits to gamers. AGP 4X/2X/1X are all supported, as are side band addressing and texturing straight from system memory.

The full OpenGL ICD mentioned above is notable, not because it is something special in the industry today, but rather because Neon250 (PowerVR Series 2) users had to deal with OpenGL miniports - KYRO brings the PowerVR series back in line with the rest of the industry.



The 3D Pipeline

Before we can talk about tile rendering, it is critical to first understand how a 3D accelerator works in general.

All 3D accelerators seek to take a 3D world, modeled on your computer mathematically, and render it to a 2D image that is displayed on your computer monitor. This 3D "world" is typically modeled by objects made up of adjoining polygons that are defined by their vertices, which in turn are represented by their x, y, and z coordinates as well as a color value. These polygons are later shaded and/or textured to add color and surface properties. There are three steps to processing such a scene in order to render it as a 2D image for display on your monitor: transform and lighting, hidden surface removal, and texturing and shading.

NVIDIA has chosen to focus on accelerating the transform and lighting (T&L) part of the pipeline by implementing it in hardware on their GeForce series of cards. ATI has done the same with their upcoming Radeon. Other cards use the CPU to handle these functions. The hardware implementation allows many more polygons to be drawn in a scene, while freeing up the CPU for other tasks. Although the power of hardware T&L is yet to be utilized to any great extent by any game currently on the market, future games are expected to implement higher polygon counts that only a hardware T&L unit can process. Most games today actually use light maps to speed things up instead of the lighting implemented by current T&L units. The reasoning is that without hardware lighting, performance would degrade to unacceptable levels and game developers have to design their games such that most people can play them.

Imagination Technologies has decided to focus on improving the other two parts of the rendering pipeline by using a completely new way of performing those functions. Before we get into how they've improved on these parts of the pipeline, let's take a look at a traditional 3D accelerator.

Traditional Hidden Surface Removal and Texturing / Shading

A traditional 3D accelerator processes each polygon as it is sent to the hardware, without any knowledge of the rest of the scene. Since there is no knowledge of the rest of the scene, every forward facing polygon must be shaded and textured. A z-buffer is used to store the depth of each pixel in the current back buffer. Each pixel of each polygon rendered must be checked against the z-buffer to determine if it is closer to the viewer than the pixel currently stored in the back buffer.

Checking against the z-buffer must be performed after the pixel is already shaded and textured. If the new pixel turns out to be in front of the current pixel, the new pixel replaces the current pixel in the back buffer (or is blended with it in the case of transparency) and the z-buffer depth is updated. If the new pixel ends up behind the current pixel, the new pixel is thrown out and no changes are made to the back buffer (unless the pixel in front is transparent, in which case the two are blended). Pixels that are shaded and textured only to be overwritten or thrown out are wasted work, and this is known as overdraw. Drawing the same pixel three times is equivalent to an overdraw of 3, which Imagination Technologies and STMicro claim is typical.

Once the scene is complete, the back buffer is flipped to the front buffer for display on the monitor.

What we've just described is known as "immediate mode rendering" and has been used since the 1960's for still frame CAD rendering, architectural engineering, film special effects, and now in most 3D accelerators found inside your PC.
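The immediate mode process described above can be sketched in a few lines. This is only an illustration under our own assumptions, not how any particular accelerator is implemented: fragments are opaque (transparency and blending are left out), a smaller depth value means closer to the viewer, and the names `render_immediate` and `fragments` are ours.

```python
def render_immediate(fragments, width, height):
    """Immediate mode sketch: shade every fragment, then depth-test it.

    fragments: (x, y, depth, color) tuples in submission order.
    A smaller depth value means closer to the viewer (an assumption).
    """
    far = float("inf")
    z_buffer = [[far] * width for _ in range(height)]
    back_buffer = [[None] * width for _ in range(height)]
    shaded = 0

    for x, y, depth, color in fragments:
        # By the time a pixel reaches the depth test it has already been
        # textured and shaded, so the work counts whether or not it survives.
        shaded += 1
        if depth < z_buffer[y][x]:
            z_buffer[y][x] = depth
            back_buffer[y][x] = color
        # Otherwise the finished pixel is simply thrown away.

    visible = sum(1 for row in back_buffer for c in row if c is not None)
    overdraw = shaded / visible if visible else 0.0
    return back_buffer, shaded, overdraw


# Three opaque layers covering a 2x1 screen, submitted back to front.
layers = [(3.0, "wall"), (2.0, "crate"), (1.0, "player")]
fragments = [(x, 0, depth, name) for depth, name in layers for x in range(2)]

back_buffer, shaded, overdraw = render_immediate(fragments, 2, 1)
print(back_buffer, shaded, overdraw)
```

With three layers over every pixel, six pixels are shaded to produce two visible ones, which is the overdraw of 3 mentioned above.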



The PowerVR Approach: Tile Rendering

You may be thinking to yourself "Isn't that a lot of wasted time rendering pixels that are just going to be thrown out later?" Well, Imagination Technologies came to the same conclusion and decided to rethink the entire process. What they came up with is an entirely different approach to 3D rendering, known as tile based rendering, that is the key to their PowerVR technology. The basic idea is to eliminate any redundant processing in the 3D pipeline, which in turn results in decreased memory bandwidth requirements.

The first difference in the pipeline is critical. Rather than being processed one at a time as they are passed to the accelerator, polygons are collected into groups known as display lists. This allows each scene to be broken up into smaller tiles that are rendered independently, leading to a number of benefits.
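To make the display list idea more concrete, here is a rough sketch of sorting polygons into per-tile lists. The details are our assumptions rather than anything from Imagination Technologies: the 32x16 pixel tile size is a placeholder, polygons are triangles given as integer screen coordinates, and a simple bounding box test decides which tiles a triangle lands in (which may include a tile the triangle does not actually touch).

```python
TILE_W, TILE_H = 32, 16  # assumed tile size in pixels, for illustration only


def bin_triangles(triangles, screen_w, screen_h):
    """Return {(tile_x, tile_y): [triangle indices]}, one list per tile.

    triangles: list of three (x, y) integer screen-space vertices each.
    Binning uses the triangle's bounding box, clamped to the screen.
    """
    display_lists = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        min_x, max_x = max(min(xs), 0), min(max(xs), screen_w - 1)
        min_y, max_y = max(min(ys), 0), min(max(ys), screen_h - 1)
        if min_x > max_x or min_y > max_y:
            continue  # entirely off screen
        for ty in range(min_y // TILE_H, max_y // TILE_H + 1):
            for tx in range(min_x // TILE_W, max_x // TILE_W + 1):
                display_lists.setdefault((tx, ty), []).append(index)
    return display_lists


triangles = [
    [(0, 0), (10, 0), (0, 10)],     # fits inside the top-left tile
    [(20, 5), (40, 5), (30, 20)],   # straddles four tiles
]
display_lists = bin_triangles(triangles, screen_w=64, screen_h=32)
for tile in sorted(display_lists):
    print(tile, display_lists[tile])
```

Each tile can then be rendered using only its own list, which is what makes the on-chip processing described next possible.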

Since each tile is only a small piece of the whole scene, key operations can be performed on-chip without accessing external memory. The most important function implemented on-chip in PowerVR technology is the z-buffer that performs hidden surface removal, cutting memory access significantly. The on-chip z-buffer in the KYRO is 24 bits deep, regardless of the color depth of the external frame buffer. Being on-chip eliminates the continual z-buffer memory accesses that traditional accelerators must perform and also frees up space in the card's external memory. The amount saved in external memory is equal to the memory required for the frame buffer, since traditional 3D accelerators run the z-buffer at the same color depth and resolution as the frame buffer.

Next is the ability to texture only the pixels that are visible on screen, so no pixel is ever rendered only to be thrown out later. This is possible because each tile has a display list that includes all the polygons for that tile, allowing hidden surface removal to occur before textures are applied, which eliminates overdraw. A small "tile buffer," which is basically a frame buffer the size of an individual tile, is also located on-chip so that blending can be performed without going to external memory.
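The ordering described above, hidden surface removal first and texturing second, can be sketched for a single tile as follows. This is a simplified illustration with our own names (`render_tile_deferred`, `shade`) and our own assumptions: fragments are opaque, a smaller depth value is closer to the viewer, and `shade` stands in for whatever texturing and shading work the chip performs.

```python
def render_tile_deferred(fragments, tile_w, tile_h, shade):
    """Deferred texturing sketch for one tile.

    fragments: (x, y, depth, surface) tuples for every polygon in the
    tile's display list. shade: callable turning a surface into a color.
    """
    far = float("inf")
    depth = [[far] * tile_w for _ in range(tile_h)]
    winner = [[None] * tile_w for _ in range(tile_h)]

    # Pass 1: hidden surface removal only. Nothing is textured yet.
    for x, y, z, surface in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            winner[y][x] = surface

    # Pass 2: texture and shade only the surface that won each pixel.
    tile_buffer = [[None] * tile_w for _ in range(tile_h)]
    shaded = 0
    for y in range(tile_h):
        for x in range(tile_w):
            if winner[y][x] is not None:
                tile_buffer[y][x] = shade(winner[y][x])
                shaded += 1
    return tile_buffer, shaded


# Three opaque layers over a 2x1 tile, in arbitrary submission order.
tile_layers = [(2.0, "crate"), (3.0, "wall"), (1.0, "player")]
tile_fragments = [(x, 0, z, name) for z, name in tile_layers for x in range(2)]

tile_buffer, tile_shaded = render_tile_deferred(tile_fragments, 2, 1, str.upper)
print(tile_buffer, tile_shaded)
```

Six fragments go in, but only two pixels are ever shaded, one per visible pixel, regardless of the order in which the polygons were submitted.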

This is how the KYRO is able to have a higher effective fillrate than its theoretical fillrate. According to Imagination Technologies, an overdraw of between 2.5 and 3.5 is typical for today's games. Going with an average of 3 allows them to achieve the claimed 750 megapixel/s effective fillrate out of the 250 megapixel/s that the KYRO is actually specced at. Imagination Technologies and STMicro like to refer to this as "Deferred Texturing."
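The effective fillrate claim is simple arithmetic: the raw fillrate is the number of pixel pipelines times the core clock, and the effective figure multiplies that by the overdraw a traditional accelerator would have had to render. A small sketch, with function names of our own choosing:

```python
def raw_fillrate_mpixels(pipelines, clock_mhz):
    """Raw fillrate in megapixels/s: one pixel per pipeline per clock."""
    return pipelines * clock_mhz


def effective_fillrate_mpixels(pipelines, clock_mhz, overdraw):
    """Fillrate a traditional accelerator would need to match a renderer
    that never draws hidden pixels, for a given scene overdraw."""
    return raw_fillrate_mpixels(pipelines, clock_mhz) * overdraw


# KYRO: 2 pixel pipelines at 125 MHz, across the quoted overdraw range.
for overdraw in (2.5, 3.0, 3.5):
    effective = effective_fillrate_mpixels(2, 125, overdraw)
    print(f"overdraw {overdraw}: {effective:.0f} megapixel/s effective")
```

The claimed 750 megapixel/s corresponds to the middle of that range. At the low end of 2.5 the same formula gives 625 megapixel/s, so the real-world figure depends entirely on how much overdraw a given game actually has.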

Since texturing is performed on-chip, multitexturing becomes much more efficient in certain circumstances. Consider the GeForce 2 GTS, which can apply 2 texels to 4 pixels in a single pass. If the number of textures for a single pixel exceeds 2, then the GeForce 2 GTS will have to render the pixel in two passes. Those two passes mean that geometry data must be sent again for the second pass. On the other hand, the KYRO is capable of applying up to 8 textures to a pixel in a single pass. Tile rendering, once again, reduces memory bandwidth requirements. Note that this does not mean that the KYRO can apply 8 textures in a single clock.

As a result of texturing and z-buffering being performed on-chip, they can be done in full 32-bit color without the large performance penalty that traditional architectures must incur. Further, the internal 32-bit rendering occurs regardless of the frame buffer's color depth. The penalty that most architectures incur for 32-bit rendering is a result of memory bandwidth constraints that are in turn a result of the constant z-buffer accesses and unnecessary overdraw. In an ideal world, with infinite memory bandwidth, traditional 3D architectures would not slow down when rendering in 32-bit color. Imagination Technologies and STMicro like to refer to this as "Internal True Color."

So why not render in 32-bit mode all the time? If it were that simple, the KYRO probably would always operate in 32-bit mode. The fact remains that a 32-bit frame buffer and textures still take up twice as much memory as 16-bit ones. While the KYRO is able to render each tile on-chip, it is still necessary to put the completed tile in the frame buffer and also to read textures from memory, so the memory bandwidth requirements for 32-bit color are still double what they are for 16-bit. The obvious question is then why not use 16-bit frame buffers with 32-bit internal rendering all the time? As the screen shots below show, full 32-bit still looks better since the 16-bit image is dithered down from the internal 32-bit. Note the significant reduction in dithering artifacts in the 16-bit image compared to most cards' 16-bit rendering.

16-bit shown above, 32-bit is below

The images above are JPEG compressed and thus have some quality loss compared to the originals.
For the full effect, the images should be viewed full screen.

The last major advantage of tile rendering is that it is easily scalable with multiple cores on a single chip or multiple chips working in tandem. Each chip simply renders a different tile in parallel with the other chip. This is completely unlike ATI's AFR technology used on the Rage Fury MAXX or 3dfx's SLI on the Voodoo2 and Voodoo5.



The Future of Tile Rendering

So why doesn't everyone use tile based rendering? Since it is not the conventional 3D that we've had for years, it requires a complete redesign. Imagination Technologies has already been through these growing pains and is thus ahead of the game. Will everything eventually transition to tile rendering? We wouldn't be surprised, but it is very hard to say for certain.

Consider for a moment that by the end of this year, we'll already be pushing 200 MHz DDR (400 MHz effective) memory on a 128-bit bus just for graphics. We're approaching the speed limits of current memory technologies and the bus can't be widened much more. So what's the solution? Rambus would like you to believe it's RDRAM, but a more efficient rendering scheme seems to make more sense.

So just how much bandwidth does tile based rendering save and how much does it need? If you ask Imagination Technologies, a traditional 3D accelerator requires 5.3GB/s of memory bandwidth for 1024x768x32 at 60 fps, which just happens to be exactly what the GeForce 2 GTS and Voodoo5 5500 both have available to them. Meanwhile, the KYRO under the same conditions requires just under 2GB/s, which also happens to be exactly what it has available to it. The GeForce 2 GTS is able to hit 80 fps in Quake III Arena under those conditions, so obviously these numbers are skewed a bit, most likely in favor of KYRO. For the full details of these calculations, visit http://kyro.st.com and read the white paper on Tile Based Rendering. Stepping up to 1600x1200x32, traditional accelerators need 13GB/s of bandwidth under the same conditions, while PowerVR tile rendering would need just under 4GB/s. Interestingly, if these numbers are accurate, DDR memory at 166 MHz already has more than enough bandwidth to handle 1600x1200x32, whereas there is virtually nothing on the horizon that can provide 13GB/s of bandwidth.
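As a quick sanity check on that last point, the quoted requirements can be compared against what 166 MHz DDR memory delivers. This sketch assumes a 128-bit bus, as on today's high end cards, and treats "just under" 2GB/s and 4GB/s as upper bounds; the requirement figures themselves are Imagination Technologies' claims, not our measurements.

```python
def peak_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock):
    """Peak memory bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock / 1000


# 166 MHz DDR on an assumed 128-bit bus.
ddr_166 = peak_bandwidth_gb_s(128, 166, 2)

# Requirements at 60 fps as quoted from Imagination Technologies.
quoted_gb_s = {
    "traditional @ 1024x768x32": 5.3,
    "tile based @ 1024x768x32": 2.0,    # "just under 2GB/s"
    "traditional @ 1600x1200x32": 13.0,
    "tile based @ 1600x1200x32": 4.0,   # "just under 4GB/s"
}

fits = {name: need <= ddr_166 for name, need in quoted_gb_s.items()}
for name, ok in fits.items():
    verdict = "fits" if ok else "does not fit"
    print(f"{name}: needs {quoted_gb_s[name]} GB/s, {verdict} in {ddr_166:.1f} GB/s")
```

On these numbers, only the traditional accelerator at 1600x1200x32 falls outside what 166 MHz DDR can supply.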

The one possibility is embedded DRAM, or eDRAM for short, which is just DRAM integrated onto the graphics core. Bitboys is currently the biggest proponent of this solution, but we've seen nothing from them beyond press releases and hype for quite a while now. Their press releases claim 9MB of eDRAM running on a 512-bit internal bus for a total of 9.6GB/s of memory bandwidth. Still not quite up to the 13GB/s number from Imagination Technologies, but certainly impressive nonetheless. If you do the math, a 512-bit bus moves 64 bytes per transfer, so that comes out to either a 150 MHz clock or DDR style transfers at 75 MHz. However, eDRAM will add quite a few transistors to any graphics core and would probably not be a viable solution for something already as complex as the GeForce 2 GTS, for example.

It seems that 3dfx believes in tile rendering, as evidenced by their recent acquisition of Gigapixel. Gigapixel had been touting their GP-1 core as the first gigapixel graphics technology that also had the ability to perform free FSAA. The GP-1 was also a tile renderer, but has yet to be turned into an actual product. Note that even though Gigapixel claimed they have the ability to do free FSAA with their architecture, the KYRO does not have this ability. Imagination Technologies was definitely the first with an actual product available in the consumer market that incorporated tile based rendering.

If STMicro can move the part to 0.18 micron technology and crank up the clock speed, they have the potential to dominate the market. Imagine a quad pixel pipeline running at 200 MHz and assume an overdraw of 4 for future games - that's 3.2 gigapixels/s of effective fillrate. A clock speed of 200 MHz on a 0.18 micron process is not completely unreasonable, as evidenced by NVIDIA's much larger GeForce 2 GTS attaining those speeds. Realize that the previous numbers are purely hypothetical. What our sources have told us is that STMicro and Imagination Technologies have an aggressive road map that includes eventually integrating the Series 3 onto motherboards and, later, a T&L product for Series 4. Rumored specs for Series 4 include a 0.18 micron process, 166 MHz clock speed, DDR memory, and quad pixel pipelines - possibly by the end of this year, but most likely for early 2001.

Some History

The original PowerVR Series 1, known as the PCX, was a stand alone 3D accelerator that was the first widely distributed product to use tile based rendering. Matrox sold the PCX for a while, under the name M3D, for just $100. It actually featured a unique rendering method, using infinite planes instead of polygons and lacked a z-buffer completely. The infinite plane setup caused all sorts of problems for developers used to working with traditional polygons. PowerVR Series 2, 3, and beyond do not use infinite planes and do not have the problems associated with the Series 1.

Tile rendering's biggest success is probably in Imagination Technologies' own PowerVR Series 2 chip, featured on the Neon250 and Sega Dreamcast. However, it was the deal with Sega for the Dreamcast that delayed the PowerVR Series 2 debut on the PC platform for almost a year, which in turn allowed the chip to be passed by in terms of performance. A year is a long time in the 3D accelerator business, where product cycles have shortened to just 6 months. This is the reason for the low sales numbers and minimal excitement about the Neon250 at the time of launch, although it was fairly popular in Europe relative to the rest of the world.

PowerVR Series 1 and 2 were designed by Imagination Technologies and manufactured by NEC.

This time around, there is no console system to get in the way of the PC launch of the PowerVR Series 3, now known as KYRO. The 250 in the name Neon250 refers to the claimed effective fillrate of the PowerVR Series 2, which is derived from its 125 MHz clock rate and single pixel pipeline with an overdraw estimate of 2. For Series 3, the thought is that games have progressed to a level where an overdraw of 3 is more typical. That means that the Neon250 would actually have an effective fillrate of 375 megapixels/s in today's games. However you look at it, the Series 3 is twice as powerful as the Series 2 in terms of raw fillrate, thanks to the additional pixel pipeline, and includes a number of new features.

And how does STMicro fit into all this? They've licensed the Series 3 technology from Imagination Technologies and will be producing and distributing the KYRO. Those of you that have been around the industry for a while and have good memories may remember that STMicro actually manufactured a portion of the RIVA128 chips and helped take the RIVA128 to markets that NVIDIA may never have been able to penetrate. Sounds like a good partnership for both parties.



The Chip

One benefit of the tile based rendering technique is that it is extremely efficient, meaning that less raw power is needed for a fast performing chip. This is reflected in the KYRO's core design, which houses a relatively small number of transistors, 12 million to be exact, on a 0.25 micron process. Interestingly, the KYRO turns out to have about the same transistor count and manufacturing process as a VSA-100 chip (14 million transistors on an enhanced 6-layer 0.25 micron process) used by 3dfx on the Voodoo4 and Voodoo5. While the manufacturing process may be larger, the chip itself is noticeably smaller than the behemoth GeForce 2 GTS and its 25 million transistors on a 0.18 micron process. Despite the similar specifications, the KYRO is also much smaller than a single VSA-100.

Unfortunately, this smaller size does not necessarily mean that the KYRO is able to pack the same punch as the GeForce or a Voodoo4/5 in a smaller package. The KYRO lacks hardware T&L, a feature that has just recently been embraced by the video card and gaming industries, but accounts for a large portion of the GeForce's transistors. Although the power of hardware T&L is yet to be utilized to any great extent by any game currently on the market, future games are expected to implement higher polygon counts that only a hardware T&L unit can process. Imagination Technologies / STMicro are betting that games such as these will not come out until their next product, the PowerVR Series 4, is released with its own hardware T&L unit. As far as KYRO goes, they are following 3dfx's lead by leaving the T&L processes up to the CPU for now, with future products destined to provide hardware T&L.

The main advantage of reducing chip size as drastically as Imagination Technologies / STMicro have been able to is that the production price of the chip is significantly lower. Fewer transistors means less silicon per chip, which results in more chips per wafer. A smaller die is also less likely to be hit by a defect, so yields tend to be higher, and more good chips per wafer means lower cost. In addition, for a given manufacturing process, fewer transistors results in lower heat output. Thus, we were not surprised when our reference board arrived at the lab with an extremely small 3.8 x 3.8 cm heatsink, which is nearly 1 cm smaller on each side than the one on reference GeForce cards. The active cooling fan/heatsink combination looked more like a fan mounted on a metal plate than a fan actively cooling the fins of a heatsink.

As a result of the KYRO's small size, heat produced by the chip is minimal. This was especially the case with our KYRO card, which was running at 115 MHz in the core. Although the final shipping version of the card is slated to have a core clock speed of 125 MHz, the extra heat produced by an additional 10 MHz should be minimal. With this in mind, it seems odd that PowerVR would have chosen a metal plate of sorts cooled by a fan over a passive cooling solution. The reason for this is most likely cosmetic. When people see a video card without an active cooling element, they assume that the card is not very powerful. Since processors have been running hot for many years now, the lack of a heatsink/fan stack would be frowned upon. In an effort to improve image, an active cooling element was used. This choice will leave some consumers pleased that their card has a heatsink and fan on it, while others will be left wondering why it is so small.

It should be noted that our card is a pre-production reference board running pre-production silicon. The final design of the board and cooling solution will be left up to the individual card manufacturers.

The RAM

While the die size of the KYRO has decreased, the available memory options have increased. KYRO is set to support 16, 32 and 64 MB of SDRAM. Our reference board arrived at the lab outfitted with eight 8 MB SDRAM chips produced by IBM and rated at 7 ns / 143 MHz, for a total of 64 MB of RAM. Our board did not even come close to utilizing the 143 MHz that these chips were rated for; the core clock and the memory clock speeds are synchronous, meaning that our test board was running with a 115 MHz memory clock. Even the shipping card, with a memory clock speed of 125 MHz, is not set to take advantage of the 143 MHz RAM, making this choice an odd one. Perhaps there are plans for a card with a higher memory clock, or maybe they just got a good deal on IBM memory. Further, shipping boards may feature different memory; only time will tell. Besides, even if it ships with faster memory than necessary, that could help when it comes time to overclock ;)

It is natural to ask why, in this world of DDR memory graphics cards, the KYRO uses standard SDR SDRAM chips. Besides lowering production costs even more, as SDR memory chips have been falling in price ever since the introduction of DDR chips, the tile based rendering architecture utilized by the KYRO places far less emphasis on the memory bus. The deferred texturing of the KYRO dramatically decreases the amount of data being sent to and from the memory. By decreasing the amount of data that has to travel over the memory bus, the memory bottleneck we are used to seeing in video cards is all but gone. There is sufficient time for data to be written to and read from the memory because the amount of data that must travel there is smaller.

This is analogous to cars entering a freeway from a town. Think of a traditional video card, such as the GeForce. In this case, houses in the town act like the video card processor, producing cars to travel to the freeway. Imagine a town of 1 million people all trying to get on the same on-ramp of the same freeway. There would be no problem getting from the houses to the on-ramp, but once 1 million cars try to use the same on-ramp and the same freeway, a bottleneck is instantly formed. As stated earlier, the deferred texturing technique of the KYRO only passes the information that is going to be used into the memory. This is like our model town on Labor Day. Rather than sending out all 1 million cars, only the people vital to the town's operation will travel. This reduces the number of cars to, let's say, 200,000, meaning that the backup and delay encountered on the freeway on a normal day are all but gone. It is for this reason that the KYRO does not need the enormous bandwidth DDR memory would provide: since the KYRO passes less information, no bottleneck is formed.



The Drivers

The KYRO board being tested is 30 - 60 days from the actual shipping date. With this in mind, it is no surprise that the drivers that came with our card were anything but stable. These drivers, referred to as "early alpha" by the folks at PowerVR, gave us numerous problems, including system lockups. Further, we did notice some visual anomalies during game play, but these were not due to the architecture of the chip, but simply to immature drivers. Therefore, keep in mind that the descriptions and pictures below are likely to change with time.

The KYRO's drivers incorporate the standard features we have come to expect with modern day video drivers, with little more. Those of you that remember the PowerVR Series 2, the Neon250, will be happy to hear that the KYRO includes a full OpenGL ICD, so it is no longer necessary to deal with miniport drivers. Although DXTC and S3TC are the same algorithm, Imagination Technologies and STMicro don't yet have a license for S3TC, which prevents them from including texture compression support in OpenGL. It is, however, something they are looking into for the future.

The driver utilities themselves are also the standard fare we see today. There is the standard "display" page that contains some basic monitor information as well as a few features for moving the screen display and adjusting the gamma correction. The next tab over brings the user to the LCD screen, where one can adjust what signal the DFP port is sending out as well as adjust the gamma correction on this monitor. The final tab is the "3D Optimization" tab, which includes the OpenGL and Direct3D settings that we have come to look for. Under the OpenGL tab you will find the standard set of tweaking options we see on comparable driver sets. The Direct3D tab offers much the same, with features such as FSAA.

 

The KYRO driver set is not especially visually impressive or technically powerful. Mainly the driver tabs act as they should, allowing the user to tweak different settings of the card. It is possible that we will see a slight design change between these drivers and the shipping drivers. It's worth noting that Imagination Technologies already has alpha Windows NT 4.0 and Windows 2000 drivers as well, so support under those operating systems should be available at release.



The Test

Since we performed the tests on the same systems that we tested the GeForce 2 GTS on, below is an excerpt from that review:

We chose three systems to measure the performance of these video cards. Remember that this is a comparison of the performance of video cards, not of CPUs or motherboard platforms.

For our High End testing platform, we picked an Athlon 750 running on a KX133 motherboard. The Athlon 750 is fast enough that it won’t be a limiting factor in the benchmarks and should also provide a good estimate of how all of the cards compared would perform on a 600 – 800MHz Athlon or Pentium III system (it will at least tell you which card would be faster).

For our Low End testing platform we picked a Pentium III 550E running on a BX motherboard. Although this isn’t a very “low-end” processor, it is fast enough to see a performance difference between video cards without the processor stepping in as a huge limitation. If we used something like a Celeron 466, the performance of virtually all the cards would be virtually identical at the lower resolutions because the CPU and FSB/memory buses are limiting factors. Once again, this is a test of graphics cards not of CPU/platform performance.

Keep in mind that the drivers here are ALPHA drivers and that performance and stability are likely to improve. In addition, the default clock speed will be increasing by 10 MHz, which should provide a further speed boost.

Windows 98 SE Test System

Hardware

CPU(s): Intel Pentium III 550E | AMD Athlon 750
Motherboard(s): AOpen AX6BC Pro Gold | ASUS K7V-RM
Memory: 128MB PC133 Corsair SDRAM | 128MB PC133 Corsair SDRAM
Hard Drive: IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66
CDROM: Phillips 48X

Video Card(s):

3dfx Voodoo5 5500 AGP 64MB
3dfx Voodoo5 4500 AGP 32MB
3dfx Voodoo3 3000 AGP 16MB
ATI Rage 128 Pro 32MB
ATI Rage Fury MAXX 64MB
Matrox Millennium G400MAX 32MB (would not run on Athlon platform)
NVIDIA GeForce 2 GTS 32MB DDR (default clock - 200/166 DDR)
NVIDIA GeForce 256 64MB DDR (default clock - 120/150 DDR)
NVIDIA GeForce 256 32MB DDR (default clock - 120/150 DDR)
NVIDIA GeForce 256 32MB SDR (default clock - 120/166)
NVIDIA Riva TNT2 Ultra 32MB (default clock - 150/183)
S3 Diamond Viper II 32MB
Imagination Technologies KYRO 64MB Reference Board (default clock - 115/115)

Ethernet: Linksys LNE100TX 100Mbit PCI Ethernet Adapter

Software

Operating System: Windows 98 SE

Video Drivers:

3dfx Voodoo5 5500 AGP 64MB - beta drivers v1.00.00
3dfx Voodoo5 4500 AGP 32MB - beta drivers v1.00.00
3dfx Voodoo3 3000 AGP 16MB - beta drivers v1.04.07
ATI Rage 128 Pro 32MB - 6.31CD25
ATI Rage Fury MAXX 64MB - A6.32CD48
Matrox Millennium G400MAX 32MB - 5.52.015
NVIDIA GeForce 2 GTS 32MB DDR (default clock - 200/166 DDR) - Detonator 5.16
NVIDIA GeForce 256 64MB DDR (default clock - 120/150 DDR) - Detonator 5.16
NVIDIA GeForce 256 32MB DDR (default clock - 120/150 DDR) - Detonator 5.16
NVIDIA GeForce 256 32MB SDR (default clock - 120/166) - Detonator 5.16
NVIDIA Riva TNT2 Ultra 32MB (default clock - 150/183) - Detonator 5.16
S3 Diamond Viper II 32MB - 4.12.01.9002-9.10.30
PowerVR KYRO 64MB (default clock - 115/115) - 1.00.3.118 (Alpha)

Benchmarking Applications

Gaming:

GT Interactive Unreal Tournament 4.04 AnandTech.dem
idSoftware Quake III Arena demo001.dm3
idSoftware Quake III Arena quaver.dm3



As we have seen in the past, slow 640x480 numbers are usually a product of poor drivers. This is one reason why the KYRO scores relatively low in this case. In addition, the card's lack of T&L support cripples its performance slightly at such a low resolution. Expect this score to jump up with final drivers.

The one thing that we immediately notice when looking at the KYRO's scores is that the performance difference between 16-bit color and 32-bit color is nearly zero. In the case of the GeForce 2 GTS, the performance drops by 5% when switching from 16-bit color to 32-bit color. The KYRO, on the other hand, loses only 1.3 FPS, a 2% decrease in speed. The KYRO's tile based rendering is clearly starting to show what it can do. This small performance difference between 16-bit and 32-bit color is a result of two things. First, as previously noted, the tile rendering architecture of the KYRO greatly reduces memory bandwidth requirements. Further, each tile is rendered internally in 32-bit color, regardless of the external frame buffer's color depth.

The second item that results in the equivalent scores between 16-bit and 32-bit color is the KYRO's internal true color system. Since the card is basically rendering each frame in 32-bit color mode and then scaling the image down, almost the same amount of work is necessary for 16-bit output as for 32-bit output. Both of these items combine to form the similar 16-bit and 32-bit color scores we see through the remainder of the benchmarks on the KYRO.
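To make the internal true color idea concrete, here is a minimal Python sketch of a pixel that is blended at 8 bits per channel and only converted to the frame buffer's depth once, at write-out. This is not the KYRO's actual hardware logic: the function names, the RGB565 layout for 16-bit output, and the simple truncation are all our own assumptions for illustration.

```python
def blend(src, dst, alpha):
    """Alpha-blend two (r, g, b) colors at 8 bits per channel; alpha is 0..255."""
    return tuple((s * alpha + d * (255 - alpha)) // 255 for s, d in zip(src, dst))


def render_pixel(layers, background=(0, 0, 0)):
    """Blend every layer at full internal precision (hypothetical helper)."""
    color = background
    for src, alpha in layers:
        color = blend(src, color, alpha)
    return color


def pack_rgb565(r, g, b):
    """Truncate an 8-bit-per-channel color to an assumed 16-bit RGB565 word."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def write_out(color, depth_bits):
    """Convert the finished pixel once, to whatever depth the frame buffer uses."""
    r, g, b = color
    if depth_bits == 32:
        return (r << 16) | (g << 8) | b
    return pack_rgb565(r, g, b)


# The blending work is identical for both depths; only the final packing differs.
pixel = render_pixel([((255, 0, 0), 255), ((0, 0, 255), 128)])
word16 = write_out(pixel, 16)
word32 = write_out(pixel, 32)
```

In this model the only extra step for 16-bit output is the final packing, which is consistent with the near-identical 16-bit and 32-bit scores described above.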

Now we can truly begin to see where the KYRO falls in terms of performance. Although when in 16-bit color the card is easily trumped by the GeForce 256 DDR, keep in mind that the internal true color system produces better looking 16-bit images. On the other hand, switching to 32-bit color will only cost you 4.2 FPS and will place the KYRO 7.1 FPS behind both the 32 MB and 64 MB DDR GeForce cards, a mere 12.5% slower. The card easily beats the 32 MB SDR GeForce, showing that not all SDR RAM combinations are that bad.
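As a quick sanity check on those figures, the short Python sketch below works backward from the 7.1 FPS gap and the 12.5% deficit to the frame rates they imply. The results are only what the two quoted numbers imply after rounding, not values read from the charts, and the function name is ours.

```python
def implied_scores(gap_fps, percent_slower):
    """Return (reference_fps, slower_fps) implied by an absolute gap and a relative deficit."""
    reference_fps = gap_fps / (percent_slower / 100.0)
    return reference_fps, reference_fps - gap_fps


geforce_ddr, kyro = implied_scores(7.1, 12.5)
print(f"GeForce DDR: about {geforce_ddr:.1f} FPS, KYRO: about {kyro:.1f} FPS")
```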



Quake III Arena - Pentium III 550E (cont)

Once again, we see that the KYRO is essentially tied with the 32MB GeForce DDR when in 32-bit color mode and kills the GeForce SDR. As explained in the Quake III Arena 800x600 Pentium III 550E section, the performance difference between 16-bit color and 32-bit color is very small.

At the huge resolution of 1600x1200, the KYRO began to show some driver problems. Prior to this, all Quake III Arena runs were flawless. At 1600x1200x32, the card locked up our test computer several times. Luckily, we were able to get a few runs out of it at this resolution and therefore can report a score.

We see that 32-bit color mode seems to be the pride of the KYRO. At this color depth, the card is able to keep up with both the 32 MB and 64 MB GeForce DDR cards. Although 16-bit color mode is slower than other cards, it does look better than the competing cards.



Quake III Arena - Athlon 750

The KYRO really did not like our Athlon testbed. On top of frequent crashes, the system would lock up every time the resolution was switched. Once again, this problem is no doubt a result of our early alpha drivers, so expect this problem to be fixed before the card is sold on the market.

We see that at 640x480, the KYRO is limited by its alpha drivers, as speed here is not as impressive as in other resolutions. Noteworthy is the fact that the KYRO actually comes out on top of the Voodoo5 5500, a card also tested with very early driver sets.

Essentially tied with the Voodoo5 5500, the KYRO is still suffering from its crippled drivers at 800x600. The card performs 17% slower than the 32 MB DDR GeForce at 800x600x32, but performs 20% faster than the 32 MB SDR GeForce.

Once again, the true potential of the KYRO begins to be seen at 1024x768x32. At this resolution, the card actually performs slightly faster than the 64 MB and 32 MB GeForce DDR cards. The performance seems weak when compared at 16-bit color mode, but as stated in the Quake III Arena 800x600 Pentium III 550E section, this is to be expected.



Quake III Arena - Athlon 750 (cont)

The KYRO shows its might at 1280x1024x32 once again. The card was able to outperform the 64MB DDR GeForce by a hardly noticeable 0.4 FPS, but beat the 32 MB DDR GeForce by a more impressive 2.1 FPS. Expect this performance gain to rise even more as drivers finalize and core and memory speeds move up 10 more MHz.

Once again the KYRO is able not only to hold its own but also to gain a bit of an edge over the GeForce DDR cards in 32-bit color. The only cards ahead of the KYRO are the GeForce 2 GTS and, by a small amount, the Voodoo5 5500.



Quake III Arena quaver.dm3 - Pentium III 550E

Throughout the testing of the card in Quaver, many driver related problems were encountered, many resulting in system lockups. This is sure to change with the final driver release.

The alpha drivers of the KYRO allow it to be beaten by the GeForce cards by a large margin when at 640x480. In addition, the card is somewhat crippled by the lack of T&L in a complex 3D environment such as Quaver.

Beating the Voodoo5 5500 with beta drivers is all that KYRO was able to do impressively at this resolution. Once again the card is most likely suffering from poor drivers and decreased clock speeds.

It looks like the KYRO is able to prove itself at 1024x768x32. At this resolution, the card is performing almost on par with DDR GeForce cards, falling only a few frames behind. The card continues to beat the SDR GeForce by somewhat of a margin at this resolution. The small difference between 16-bit and 32-bit color is a result of the same conditions described in the Quake III Arena 800x600 Pentium III 550E section. It should be noted that the KYRO is the only card that doesn't take a massive performance hit at 32-bit color without the use of texture compression. Thank tile rendering and its increased efficiency once again.



Quake III Arena quaver.dm3 - Pentium III 550E (cont)

The KYRO falls a bit behind here, most likely due to a driver problem we experienced. When the demo was being run at 1280x1024x32, the lightning gun actually began to fire triangles instead of lightning bolts. This resulted in inaccurate benchmark scores for this resolution, as the scene being rendered was physically different than on other cards.

The KYRO was unable to complete our quaver.dm3 run at 1600x1200x32. We are left to analyze the 16-bit color scores, which are low due to the enhanced 16-bit experience that the KYRO adds with its internal true color feature.



Quake III Arena quaver.dm3 - Athlon 750

As the beta drivers of the Voodoo5 5500 show, poor drivers can result in poor performance. The KYRO's alpha drivers were able to beat the Voodoo5 5500's drivers; however, both cards were easily crushed by the competition.

The early drivers appear to cripple the KYRO once again, as it is only able to beat the Voodoo5 5500 with its beta drivers in use. The card also beats the 32 MB SDR GeForce.

As we have seen before, at 1024x768 the KYRO is able to overcome its crippling drivers. The card not only keeps up with both the Voodoo5 5500 and the 32 and 64 MB DDR GeForce cards, it is actually able to beat these cards by a small margin. The KYRO performed 5% faster than the 64 MB DDR GeForce.



Quake III Arena quaver.dm3 - Athlon 750 (cont)

At this resolution, the problems of the KYRO's alpha drivers could not be overcome and the card would not complete the test when in 32-bit color. The performance when in 16-bit color is slower than other cards, with the exception of the SDR GeForce, but the image quality is better.

Again, the KYRO would not complete the 32-bit test when at this resolution. The 16-bit score shows that the card is working extra to render internally at 32-bit color. The card comes out tied with the 32 MB SDR GeForce.



Unreal Tournament - Pentium III 550E

The Voodoo5 was not tested in Glide mode. Click here to find out why.



Unreal Tournament - Pentium III 550E (cont)

The KYRO actually proves to be a good performer in Unreal Tournament despite its driver limitations and lowered clock speeds. KYRO is actually able to keep up with top speed UT cards, such as the ATI Rage Fury MAXX. This is most likely due to the nature of the game. Since Unreal uses textures very heavily, the KYRO is actually able to gain performance by not having to write textures to the parts that are not seen. By doing this, and decreasing the amount of traffic on the memory bus, KYRO is able to maintain a good speed in Unreal Tournament.
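The saving from not texturing hidden surfaces can be sketched with a simple count of texturing operations for a single pixel. The Python below is an idealized model under our own assumptions (opaque surfaces only, a conventional card that textures every fragment passing the depth test in submission order, and hypothetical function names); it is not a description of how any of the tested cards schedule their work.

```python
def immediate_mode_texture_ops(depths):
    """Count textured fragments for one pixel on a conventional renderer.

    depths: fragment depths in submission order, smaller meaning closer.
    Every fragment that passes the depth test is textured, even if a
    closer surface later covers it.
    """
    ops = 0
    nearest = float("inf")
    for depth in depths:
        if depth < nearest:
            nearest = depth
            ops += 1
    return ops


def deferred_texture_ops(depths):
    """Count textured fragments when visibility is resolved before texturing.

    Only the front-most opaque fragment is textured, however many
    surfaces overlap the pixel.
    """
    return 1 if depths else 0


# Four overlapping surfaces submitted roughly back to front.
stack = [0.9, 0.5, 0.7, 0.2]
wasted = immediate_mode_texture_ops(stack) - deferred_texture_ops(stack)
```

The more surfaces a scene stacks behind one another, the more texturing work the conventional path discards, which is one plausible reason a texture-heavy game favors the KYRO's approach.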

Even at 1280x1024x32, the KYRO is able to maintain its rank on top of the pack, performing faster than any other video card tested to date. When in 16-bit color mode, the card is slowed down by its internal true color system.

The KYRO would not complete any benchmarks when at 1600x1200x16, the highest resolution that Unreal Tournament will go, due to our early alpha drivers.



Unreal Tournament - Athlon 750

The Voodoo5 was not tested in Glide mode. Click here to find out why.



Unreal Tournament - Athlon 750 (cont)

Using the Athlon test system, the KYRO is not the fastest card in UT any more. This is most likely due to incompatibilities between the alpha KYRO drivers and Athlon motherboards, a problem that will hopefully be fixed as the drivers mature. The card is able to keep up with the top performing Unreal Tournament cards.

At 32-bit color, the KYRO falls just 0.1 FPS short of being tied with the fastest Unreal Tournament card at this resolution, the Voodoo5 5500. The card performs extremely well in 32-bit color.

Once again, the KYRO failed in Unreal's highest resolution, 1600x1200x16. Expect this issue to be resolved by shipping time.



Full Scene Anti-Aliasing

The KYRO's FSAA implementation is similar to that used by NVIDIA - the image is rendered at a higher resolution than the current frame buffer, then scaled down to fit in the frame buffer. Despite this fact, Imagination Technologies and STMicro refer to the available modes as 2x and 4x, just like 3dfx does. The KYRO's FSAA implementation is still very early thanks to those alpha drivers. It currently only works in D3D, although we've been told it will definitely support OpenGL in the shipping drivers. Below are screen shots from Need for Speed: Porsche Unleashed. We have similar screenshots from the Voodoo5 and GeForce 2 GTS here and here, respectively.
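The render-large-then-scale-down approach can be illustrated with a few lines of Python. This is only a sketch of the general supersampling idea: it assumes a grayscale image stored as a list of rows, a source rendered at twice the width and height, and a plain 2x2 box filter. We do not know which filter or sample layout the KYRO drivers actually use.

```python
def downsample_2x2(image):
    """Average each 2x2 block of a supersampled image into one output pixel.

    image: list of rows of integer intensities, with even width and height.
    """
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[y]), 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out


# A hard diagonal edge at the high resolution becomes an intermediate
# shade at the display resolution, which is what softens the jaggies.
supersampled = [
    [0, 255],
    [255, 255],
]
smoothed = downsample_2x2(supersampled)
```

Because every displayed pixel costs four rendered pixels in this model, the approach multiplies fill rate and bandwidth demands, which is why FSAA performance is worth benchmarking once the drivers mature.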


The image above is JPEG compressed and thus has some quality loss compared to the original.
Click here to download a zip (189 KB) containing the original targa screenshot.
For the full effect, the image should be viewed full screen.

The option to select the FSAA mode in the drivers had not been implemented in our early alpha drivers, nor could we verify which mode it was using by the time of publication. As soon as we can, we'll make an update. Most likely, it is 4x FSAA. We also decided not to benchmark the FSAA performance, once again due to the early alpha status of the drivers.



Conclusion

The idea of a tile based rendering system may not be that new: Imagination Technologies has attempted to implement this technology ever since the launch of the PowerVR Series 2 card, which was supposed to ship almost one and a half years ago. While the idea behind this technology has not changed, the way it is being implemented has. Long gone are the days when tile based rendering video cards were what dreams were made of. This outlook on tile based rendering technology changed with the incorporation of the PowerVR Series 2 based processor in the Sega Dreamcast. Surprised by the high performance and low cost, many consumers wondered if such a system could be made powerful enough for a PC. KYRO proves that this is so.

Is tile rendering the solution to the increasingly significant memory bandwidth constraints that limit today's 3D accelerators at high resolutions? Gigapixel seemed to think so since their GP-1 core used similar techniques. 3dfx must think so as well considering their recent acquisition of Gigapixel. The technology just makes good sense - after all, why bother rendering pixels that won't even be drawn to the screen? The question is, can tile rendering be executed properly? As far as actual products go, Gigapixel is still a vaporware company, but the PowerVR Series 2 has been available (albeit in the Dreamcast) for quite a while.
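A back-of-the-envelope model shows why the bandwidth argument is attractive. The Python sketch below compares frame buffer traffic per frame for a conventional renderer and a tile renderer under deliberately simple assumptions of our own: every rasterized fragment on the conventional card costs one Z read, one Z write and one color write to external memory, while the tile renderer keeps depth testing on chip and writes each final pixel once. Texture traffic and the scene data a tile renderer must store are ignored, and the overdraw value is a hypothetical input, not a measurement.

```python
def immediate_frame_bytes(width, height, color_bytes, overdraw, z_bytes=4):
    """Estimated external frame buffer traffic per frame, conventional renderer."""
    per_fragment = 2 * z_bytes + color_bytes  # Z read + Z write + color write
    return int(width * height * overdraw * per_fragment)


def tiled_frame_bytes(width, height, color_bytes):
    """Estimated external frame buffer traffic per frame, tile renderer."""
    return width * height * color_bytes  # each visible pixel written once


# 1024x768 in 32-bit color with an assumed average overdraw of 3.
conventional = immediate_frame_bytes(1024, 768, 4, 3.0)
tiled = tiled_frame_bytes(1024, 768, 4)
ratio = conventional / tiled
```

Under these assumptions the conventional path moves nine times as much frame buffer data; real workloads will differ, but the direction of the saving matches the results at higher resolutions in 32-bit color.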

How can Imagination Technologies / STMicro even think about entering the market at such a hectic time as now, with new chips from ATI, NVIDIA, and 3dfx just announced? Well, by decreasing costs and providing a high level of performance, they are attempting to grab a section of the market left largely untapped by the major vendors: the budget gamer. The KYRO is not attempting to take NVIDIA's crown by outperforming it: Imagination Technologies / STMicro would rather sneak around and steal it from behind NVIDIA's back. By appealing to the performance conscious gamer on a budget, they are in position to grab some of the precious video card market.

What is standing between KYRO and success? Well, right now it is two main things: the drivers and the drivers. We understand that the drivers are early alpha drivers; however, we have seen some companies release drivers nearly as buggy as these in order to get their product to market in time. We can only hope that PowerVR does not take this path and that the 30-60 days before product shipment are spent tweaking and perfecting the drivers.

The key to KYRO's success is its price. While it can't compete with the GeForce 2 GTS or the Voodoo5 5500 in terms of performance, the KYRO cards will also be considerably less expensive. With the 32 MB version slated to cost under $200 and the 64 MB version to cost slightly above the $200 mark, the 64 MB KYRO we tested is able to achieve speeds almost equal to 64 MB GeForce DDR cards costing over $100 more. We suspect that the 32 MB KYRO will perform nearly identically to the 32 MB DDR GeForce, making it about $50 less than the competition.

The idea behind a tile based rendering system is intriguing, to say the least. By shrinking the amount of data that must pass through the memory bus, a tile based rendering system could produce a video card that fully scales with core speed once again. Do Imagination Technologies / STMicro have what it takes to pull it off? If the KYRO is any indication, the answer is a resounding yes.
