Original Link: http://www.anandtech.com/show/1536

This year we have seen a very heated competition between ATI and NVIDIA for dominance in the desktop GPU market. With NVIDIA making a very strong comeback thanks to their NV4x architecture, it was only a matter of time before we saw NV4x filter its way down to NVIDIA's mobile designs.

Not too long ago, NVIDIA's mobile GPUs were a joke as far as market penetration was concerned. More recently, NVIDIA has made some gains, but ATI remains the dominant force in the discrete mobile GPU market. Today, NVIDIA hopes to further erode ATI's mobile market share with the release of its very first NV4x-derived mobile GPU - the GeForce 6800 Go.

As its name implies, the 6800 Go is a derivative of NVIDIA's flagship GeForce 6800. More specifically, GeForce 6800 Go is a mobile version of the desktop 12-pipe GeForce 6800 GPU, code named NV41M. The architectural features are identical between the GeForce 6800 Go and the desktop GeForce 6800, the only difference being power management features and clock speeds.

The GeForce 6800 Go is available in both 128-bit and 256-bit memory configurations with up to 256MB of memory on an MXM module (mobile PCI Express module); the notebook we tested did not use the 6800 Go on an MXM module, although other designs are expected to use MXM.

The clock speeds of the GeForce 6800 Go will vary depending on the actual notebook vendor, but there are two basic configurations: 300/300 and 450/600 (core/memory). The difference between the two configurations mainly boils down to the type of memory used - DDR1 or GDDR3, with the faster configuration using GDDR3.

Given that we are talking about a very high end (and thus high power consumption) GPU, it is no surprise that the GeForce 6800 Go is only going to be found in very large desktop replacement (aka Transportable) notebooks. These are the types of notebooks with 17" widescreen displays that have no qualms about tipping the scales at over 10 lbs - the "take your work home with you" types of notebooks. NVIDIA will eventually release mobile versions of its mid-range and low-end GPUs, but the easiest approach is to start high and work down.

One of the most impressive things about today's launch is that the GeForce 6800 Go is actually available today in notebooks that are shipping immediately. Our review notebook, the Sager NP9860, should be available at the time of publication. Notebooks from Alienware, Falcon Northwest, Prostar and Voodoo should also be shipping today, with others following shortly. While this may not sound like a big deal, if you've ever followed a mobile GPU launch, it is usually several months before we see mobile GPUs actually used in laptops - not days, and definitely not immediately. So the magnitude of today's launch and availability is definitely not to be understated.

While NVIDIA would like to think that they have this launch to themselves, ATI is actually cooking up a competing product based on their desktop X800 offering for launch in about two weeks. ATI's competing part has yet to be branded, but you can expect it to be some variation of the X800 name - internally it is known as M28. Considering that the mobile Radeon 9800 was called the Mobility Radeon 9800, you can pretty much guess what the M28 will be called.

M28, just like the GeForce 6800 Go, is a 12-pipe version of ATI's high-end desktop X800 part. The GPU will be available in 128-bit and 256-bit versions (with up to 256MB of memory), with clock speeds of 400MHz core and 400MHz GDDR3 memory; ATI is expecting some vendors to actually implement M28 clocked at 450MHz. Although vendors could develop an MXM version of M28, more than likely you'll see AXIOM mobile PCI Express modules used as the standard of choice for implementing the M28. Our review sample did not use a PCI Express module, but rather a custom board design; here's a shot of M28 on an AXIOM PCI Express board:

The beauty of MXM/AXIOM PCI Express based designs is that they reduce time to market: all a notebook vendor needs to do is implement the board connector and ensure proper cooling in order to support a new GPU. While in theory MXM/AXIOM designs would allow for upgradable graphics on notebooks, the more likely real-world benefit of the technology is the flexibility for manufacturers to offer multiple GPUs in a single notebook design.

With the GeForce 6800 Go and ATI's M28, we essentially have another 6800 vs. X800 comparison on our hands, this time in the mobile sector.

Reducing GPU Power: A Quick Lesson in Clock Gating

In order to prepare a desktop GPU for use in a mobile environment, one of the fundamentals of chip design is violated by introducing something called clock gating. All GPUs are composed of tens of millions of logic gates that make up everything from functional units to memory storage on the GPU itself. Each gate receives an input from the GPU-wide clock to ensure that all parts of the GPU are working at the same frequency. But with GPUs getting larger and larger, it becomes increasingly difficult to ensure that all parts of the chip receive the same clock signal at the same time. To compensate, elaborate clock trees are created, carrying a network of the same clock signal to all parts of the chip, so that when the clock goes high (instructing all of the logic in the chip to perform its individual tasks, sort of like a green light to begin work), all of the gates get the signal at the same time.

The one principle that is always taught in chip design is to never allow the clock signal to pass through any sort of logic gates; it should always go from the source of the signal to its target without passing through anything else. The reason is that if you start putting logic between the clock signal source and its target, you make the clock tree extremely complicated - now, you not only have to worry about getting the same clock signal to all parts of the GPU at the same time, but you must also worry about delays introduced by feeding the clock through logic. There are benefits to clock gating, however, and they are primarily power savings.

It turns out that the easiest way to turn off a particular part of a chip is to stop feeding a clock signal to it; if the clock never goes high, then that part of the chip never knows to start working, so it remains in its initial state, which is non-operation. But you obviously don't want the clock disabled all of the time, so you need to implement logic that determines whether or not the clock should be fed to a particular part of the chip.

When you're typing in Word, all your GPU is doing is 2D and memory operations. The floating point units of the GPU are not needed, nor is anything related to the 3D pipeline. So let's say we have a part of the chip that detects whether you are typing in Word instead of playing a game, and that part of the chip sends out a signal called 2D_Power_Save. When you are just in Word and don't need any sort of 3D acceleration, 2D_Power_Save goes high (the signal carries a value of '1', or electrically, whatever the high voltage of the core is); otherwise, the signal stays low ('0', or 0V).

Using this 2D_Power_Save signal we could construct some logic using the clock that is fed to all of the parts of the 3D engine on the GPU. The logic could look something like this:

The very simple logic illustrated above is a logical AND gate with two input signals and one output. The 2D_Power_Save signal is inverted before reaching the AND gate, so when 2D_Power_Save is high, the AND gate sees a low input, meaning that the Clock_Out signal can never go high and thus anything connected to it remains off. If 2D_Power_Save is low, the clock gets passed through to the rest of the GPU. That's how clock gating works.
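The inverter-plus-AND-gate arrangement described above can be sketched in a few lines of Python (a hypothetical model for illustration only, not actual driver or hardware code): the clock only propagates when the power-save signal is low.

```python
def gated_clock(clock: int, power_save_2d: int) -> int:
    """Model of a simple clock gate: an inverter feeding an AND gate.

    clock         -- the incoming clock signal (0 or 1)
    power_save_2d -- the 2D_Power_Save control signal (0 or 1)
    Returns Clock_Out, which only follows the clock when power-save is inactive.
    """
    # AND the clock with the inverted power-save signal (masked to one bit)
    return clock & (~power_save_2d & 1)

# When 2D_Power_Save is low, the clock passes through unchanged.
assert gated_clock(1, 0) == 1
assert gated_clock(0, 0) == 0

# When 2D_Power_Save is high, Clock_Out stays low no matter what the clock does.
assert gated_clock(1, 1) == 0
assert gated_clock(0, 1) == 0
```

With Clock_Out held low, the downstream gates never see a rising edge, so they never switch - which is exactly where the power savings come from.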

We mentioned earlier that modern day GPUs are composed of tens of millions of gates (each gate is made up of multiple transistors), and while it would be nice, it's virtually impossible to put this sort of logic in place for every single one of those gates. For starters, you'd have an incredibly huge chip thanks to spending even more transistors on your clock gating logic, and it would also make your clock tree incredibly difficult to construct. So what happens is that the clock fed to large groups of gates, known as blocks, is gated, instead of gating the clocks to individual gates. The trick here is that the smaller the blocks you gate (the more granular your clock gating), the more efficient your power savings will be.

Let's say we've taken our little gated clock from above and fed it to the entire 3D rendering pipeline. So when 3D acceleration is not required (e.g. we're just typing away in MS Word), the entire 3D pipeline and all of its associated functional units are shut off, thus saving us lots of power. But now, when we fire up a game of Doom 3, all of our power savings are lost as the entire 3D engine is turned back on.

What if we could turn off parts of the GPU not only depending on what type of application we're running (2D or 3D), but also based on the specific requirements of that application? For example, Doom 3's shaders perform certain operations that will stress some parts of the GPU, while a game like Grand Theft Auto will stress other parts of the GPU. A more granular implementation of clock gating would allow the GPU to differentiate between the requirements of the two applications and thus offer more power savings.
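To illustrate the granularity idea, here is a short hypothetical sketch (our simplification, not NVIDIA's or ATI's actual design) in which each functional block has its own clock gate, so only the blocks an application actually exercises receive a clock:

```python
def distribute_clock(clock: int, block_enables: dict) -> dict:
    """Gate one global clock into per-block clocks.

    block_enables maps a block name to whether that block is in use;
    idle blocks receive no clock and therefore burn no dynamic power.
    """
    return {name: (clock if enabled else 0)
            for name, enabled in block_enables.items()}

# A pixel-shader-heavy scene might leave the vertex units mostly idle,
# so only the active blocks get a clock.
clocks = distribute_clock(1, {"pixel_shaders": True,
                              "vertex_units": False,
                              "rops": True})
assert clocks == {"pixel_shaders": 1, "vertex_units": 0, "rops": 1}
```

The finer you slice the blocks, the closer you get to powering only the exact logic a given game needs - at the cost of more gating logic and a more complex clock tree.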

While we're not quite at the level of the latter example, today's mobile GPUs do offer more granular clock gating than the previous generation. This will continue to be true for future mobile GPUs, as smaller manufacturing processes and improvements in GPU architecture will allow for more and more granular clock gating.

So where are we today with the GeForce 6800 Go?

With the NV3x series of GPUs, as soon as a request hit the 3D pipeline, the entire 3D pipeline powered up and it didn't power down until the last bits of data left the pipeline. With the GeForce 6800 Go, different stages of the 3D pipeline will only power up if they are being used, otherwise they remain disabled thanks to clock gating. What this means is that power consumption in 3D applications and games is much more optimized now than it ever was before and it will continue to improve with future mobile GPUs.

Since ATI's M28 has not officially been launched yet, we don't have any information on its power consumption. However, given that the X800 consumes less power than the 6800 on the desktop, we wouldn't be too surprised to see a similar situation emerge on the mobile side of things as well.

The Test

For this comparison we were provided with two notebooks - in the NVIDIA corner we have the Sager NP9860, using a 3.2GHz Prescott (P4-E), 1GB of RAM and a GeForce 6800 Go clocked at 300/300 with a 256-bit memory bus.

Our GeForce 6800 Go testbed

In ATI's corner we have an unnamed M28 launch partner (we can't mention who until M28 is launched in about 2 weeks) using a 3.4GHz Prescott (P4-E), 1GB of RAM and an M28 clocked at 400/400 with a 256-bit memory bus.

The systems are configured as closely as possible, with a slight CPU clock speed advantage in ATI's favor. But given that most of our benchmarks will be GPU limited and the CPU clock speed advantage is only around 6%, don't expect it to tilt things too far in ATI's favor.

Doom 3 Performance

In Doom 3, as expected, NVIDIA takes the lead at 1280 x 1024, with just under an 8% performance advantage over the M28. On the desktop side, NVIDIA has done extremely well with Doom 3 performance and the same can be seen on the mobile side.

Doom 3 - High Quality

In terms of scaling with resolution, both the 6800 Go and the M28 appear to scale quite similarly, with the 6800 Go separating itself a bit more from the M28 at the higher resolutions.

AA Performance

Given the similarity in performance thus far, it's no big surprise to see relatively similar performance scaling with Anti Aliasing enabled.

Half Life 2 (Source VST) Performance

Under the Source Visual Stress Test we have ATI's M28 with a 13% performance lead over the GeForce 6800 Go. Just as NVIDIA appears to have better performance under Doom 3, Half Life 2 seems to be much more of an ATI-friendly title.

Half Life 2 (Source Visual Stress Test)

Our resolution scaling graph shows an interesting phenomenon; while the 6800 Go takes the early lead at lower CPU/driver bound resolutions, once we get up to and beyond 1024 x 768 the M28 begins to separate itself.

Unreal Tournament 2004 Performance

ATI's M28 manages just under a 5% performance advantage in UT2004, which isn't anything huge.

Unreal Tournament 2004

Despite the performance lead at 1280 x 1024, the performance at the rest of the resolutions is virtually identical between the two GPUs.

Halo Performance

The 13% performance advantage of the M28 is definitely noticeable in Halo.


The two GPUs scale virtually identically across all of the resolutions, with ATI's M28 maintaining a fairly constant lead.

Wolfenstein: Enemy Territory Performance

NVIDIA has historically shown much better performance in OpenGL games, and thus it's no surprise to see an advantage here in our OpenGL Quake III engine based Wolfenstein: ET test. The performance advantage is negligible however at just over 3% for NVIDIA.

The two GPUs offer virtually identical performance across the board, only really separating slightly at 1280 x 1024.

Far Cry Performance

We completed our GeForce 6800 Go testing before the Far Cry 1.3 patch was released and did not have access to the notebook after its release, so all of our tests here use the 1.2 patch. That being said, ATI holds a huge performance advantage here of just under 34%. Despite the fact that both notebooks are quite playable at this resolution, the performance advantage is clearly in ATI's court in this test.

Our resolution scaling tests show us that once we hit 1024 x 768, the GeForce 6800 Go is left in the dust by M28, but before that when the benchmark is mostly CPU bound the two GPUs perform very similarly.

The Sims 2 Performance

ATI's M28 manages a 5% lead in The Sims 2, which isn't anything huge but does give ATI a slight edge over the 6800 Go.

ATI only gained its performance advantage at 1280 x 1024; at lower resolutions, NVIDIA actually held a slight lead.

Battlefield: Vietnam Performance

The Battlefield performance crown continues to go to ATI, with the M28 offering a 20% performance advantage at 1024 x 768. ATI's M28 driver did not support the 1280 x 960 resolution at which we originally ran the GeForce 6800 Go laptop we had on loan, so we had to perform this comparison at the lower 1024 x 768 resolution.

Although ATI has a performance advantage across all resolutions, the two GPUs scale very similarly with resolution, although the performance gap does begin to shrink at the higher resolutions.

Star Wars: Battlefront Performance

Since Battlefront is based on the Battlefield engine, it's no surprise that it shows the M28 as being almost twice as fast as the GeForce 6800 Go.

Warcraft III Performance

Warcraft III shows the GeForce 6800 Go and M28 performing virtually identically.


Final Words

First and foremost, kudos to NVIDIA for launching a mobile GPU and being able to promise same-day availability of notebooks based on that GPU. This year we have seen far too many GPU launches on the desktop side met with absolutely zero availability, and a launch with same-day availability is a nice change of pace. Hopefully this is the beginning of a new era for NVIDIA; we'll just have to wait and see. With ATI's M28 launch just two weeks away, we can only hope that ATI will follow suit in pairing launch and availability in the same manner as NVIDIA. With the GeForce 6800 Go, NVIDIA has effectively set the launch-schedule standard that ATI must at least match in order to avoid the scorn of AnandTech and end-users alike.

Performance-wise, the latest mobile GPUs from both ATI and NVIDIA are quite strong. Offering desktop-class performance (because they are basically desktop GPUs with some neat power management features), ATI's M28 and NVIDIA's GeForce 6800 Go make perfect LAN-party notebooks as well as excellent desktop replacement notebooks for users who happen to be gamers. The performance of both solutions was pretty impressive, with 1280 x 1024 being an extremely playable resolution on either notebook. The performance advantage does go to ATI, however; the M28 performed very well across the board, losing to NVIDIA only in Doom 3 but offering much higher performance in most other benchmarks.

Things could get very interesting with NVIDIA's higher-performance configuration of the GeForce 6800 Go running at 450/600, instead of the 300/300 configuration we tested here today. At 450/600, the performance advantage could definitely shift to NVIDIA in the areas where things are already close, and ATI's lead elsewhere could be eaten into as well. ATI may have an answer to NVIDIA's higher-clocked configuration of the GeForce 6800 Go; while ATI only rates the M28 at 400/400, some manufacturers are apparently running it at higher speeds. We will have to wait and see what is launched by the end of this month, but the performance verdict is far from final. All we know today is that M28 is faster than NVIDIA's baseline GeForce 6800 Go configuration, and we'll have to wait until the end of the month for a truly final verdict on the king of the DTR mobile GPU market.

What's even more exciting, however, is the possibility of both ATI and NVIDIA's mid-range GPUs coming down to more manageably sized notebooks in the near future. While we just had the 6600 vs. X700 battle on the desktop, don't be too surprised if we see a very similar comparison on the mobile side next year.
