It’s been 7 months since the launch of the first Fermi cards, and at long last we’ve reached the end of the road on the Fermi launch. Today NVIDIA is bringing the final GPU in the first-generation Fermi stack to the add-in card market with the GeForce GT 430 and the GF108 GPU that powers it. After months of launches and quite a bit of anticipation, we have the complete picture of Fermi, from the massive GTX 480 to today’s tiny GT 430.

For the GT 430, NVIDIA is taking an interesting position. AMD and NVIDIA like to talk up their cheaper cards’ capabilities in HTPC environments, but this is normally in the guise of an added feature. Rarely do we see a card launched on the strength of one or two features, and today is one of those launches. NVIDIA believes that they’ve made the ultimate HTPC card, and that’s the line they’re going to be using to sell it; gamers need not apply. So just what is NVIDIA up to, and do they really have the new king of the HTPC cards? Let’s find out.

| | GTX 480 | GTX 460 768MB | GTS 450 | GT 430 | GT 240 (DDR3) |
|---|---|---|---|---|---|
| Stream Processors | 480 | 336 | 192 | 96 | 96 |
| Texture Address / Filtering | 60/60 | 56/56 | 32/32 | 16/16 | 16/16 |
| ROPs | 48 | 24 | 16 | 4 | 8 |
| Core Clock | 700MHz | 675MHz | 783MHz | 700MHz | 550MHz |
| Shader Clock | 1401MHz | 1350MHz | 1566MHz | 1400MHz | 1340MHz |
| Memory Clock | 924MHz (3696MHz data rate) GDDR5 | 900MHz (3600MHz data rate) GDDR5 | 902MHz (3608MHz data rate) GDDR5 | 900MHz (1800MHz data rate) DDR3 | 790MHz (1580MHz data rate) DDR3 |
| Memory Bus Width | 384-bit | 192-bit | 128-bit | 128-bit | 128-bit |
| Frame Buffer | 1.5GB | 768MB | 1GB | 1GB | 1GB |
| FP64 | 1/8 FP32 | 1/12 FP32 | 1/12 FP32 | 1/12 FP32 | N/A |
| Transistor Count | 3B | 1.95B | 1.17B | 585M | 727M |
| Manufacturing Process | TSMC 40nm | TSMC 40nm | TSMC 40nm | TSMC 40nm | TSMC 40nm |
| Price Point | $449 | $169 | $129 | $79 | $75 |

The GT 430 is based on NVIDIA’s GF108 GPU, which, like the GT21x GPUs before it, is coming to retail cards last rather than first. It’s already shipping in notebooks and prebuilt HTPCs, but this is the first time we’ve had a chance to look at the complete card on its own. And it really is a complete card: unlike all of NVIDIA’s other desktop Fermi launches, which had GPUs with disabled functional units, the GT 430 uses a fully enabled GF108 GPU. For once with Fermi, we’ll be able to look at the complete capabilities of the GPU.

On the shader side of things, NVIDIA has taken GF106 and nearly cut it in half. We still have 1 GPC, but now it houses 2 SMs instead of 4. Each SM still contains 48 shaders and 8 texture units, and retains FP64 support, fulfilling NVIDIA’s commitment to FP64 capabilities (no matter how slow) on all Fermi GPUs. So yes, Virginia, you can write and debug FP64 CUDA code on GF108. Attached to the shader block are two 64-bit memory controllers providing a 128-bit memory bus, along with 128KB of L2 cache and a block of 4 ROPs.
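To put those unit counts in perspective, here is a rough back-of-the-envelope sketch in Python using the figures above and the GT 430's 1400MHz shader clock. The one thing not taken from this article is the assumption that each shader retires one fused multiply-add (counted as 2 FLOPs) per shader clock, so treat the throughput numbers as theoretical peaks rather than measured results.

```python
# Figures from the article: a fully enabled GF108 as shipped on the GT 430.
SMS = 2                      # SMs inside the single GPC
SHADERS_PER_SM = 48
TEXTURE_UNITS_PER_SM = 8
SHADER_CLOCK_MHZ = 1400
FP64_RATIO = 1 / 12          # FP64 rate relative to FP32, per the spec table

# Assumption (not stated in the article): one fused multiply-add per shader
# per clock, counted as 2 floating-point operations.
FLOPS_PER_SHADER_PER_CLOCK = 2


def gf108_summary():
    """Return unit totals and theoretical peak throughput for GF108."""
    shaders = SMS * SHADERS_PER_SM
    texture_units = SMS * TEXTURE_UNITS_PER_SM
    fp32_gflops = shaders * FLOPS_PER_SHADER_PER_CLOCK * SHADER_CLOCK_MHZ / 1000
    fp64_gflops = fp32_gflops * FP64_RATIO
    return {
        "shaders": shaders,
        "texture_units": texture_units,
        "fp32_gflops": fp32_gflops,
        "fp64_gflops": fp64_gflops,
    }


if __name__ == "__main__":
    for key, value in gf108_summary().items():
        print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

The totals line up with the spec table (96 stream processors, 16 texture units), and the FP64 figure illustrates the "no matter how slow" caveat: at 1/12 rate, double precision is there for development and debugging rather than for serious compute work.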

For the memory, NVIDIA is using DDR3, which is still common for cards under $100 given the price premium of GDDR5. As with the GT 240, we believe this puts the GT 430 at a memory bandwidth disadvantage, and NVIDIA is already talking about working with partners on a GDDR5 version of the card in the future. We suspect that such a card will appear once 2Gbit GDDR5 is available in sufficient volume, as NVIDIA and their partners would seem to be fixated on having 1GB of RAM for now. In practice we usually find that 512MB of GDDR5 is better than 1GB of DDR3.
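The bandwidth gap is easy to quantify from the spec table. The sketch below computes theoretical peak memory bandwidth as data rate times bus width; it treats the table's "data rate" figures as millions of transfers per second and ignores real-world efficiency, so the results are upper bounds. The function name and dictionary layout are our own, chosen for illustration.

```python
def peak_bandwidth_gbs(data_rate_mhz, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    data_rate_mhz: effective data rate in millions of transfers per second.
    bus_width_bits: total memory bus width in bits.
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mhz * bytes_per_transfer / 1000


# (data rate in MHz, bus width in bits), taken from the spec table.
CARDS = {
    "GT 430 (DDR3)": (1800, 128),
    "GT 240 (DDR3)": (1580, 128),
    "GTS 450 (GDDR5)": (3608, 128),
}

if __name__ == "__main__":
    for name, (rate, width) in CARDS.items():
        print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

On the same 128-bit bus, the GTS 450's GDDR5 delivers roughly twice the peak bandwidth of the GT 430's DDR3, which is the gap a GDDR5 version of the card would be expected to close.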

Based on what we originally saw with GF104, we had expected GF108 to be a near-perfect quarter of the GF104 die: one-quarter the shaders, one-quarter the memory controllers, one-quarter the ROPs. Even though GF108 has been available for some time now in mobile, OEM, and professional parts, we’ve never really taken a look at it beyond the fact that it had 96 shaders. If we had, we would have noticed something very important much sooner: it only has 4 ROPs.

For GF100-GF106, NVIDIA paired a block of 8 ROPs with a single 64-bit memory controller. At the top this gave GF100 a 384-bit memory bus, and down at GF106 it gave a 192-bit memory bus (with the GTS 450 shipping with 2 of those 3 64-bit controllers active for a 128-bit bus). For GF108 NVIDIA went with two 64-bit controllers to make a 128-bit memory bus, which itself is not surprising, since 64-bit buses have extremely limited bandwidth and are only suitable for bottom-tier ultra-cheap parts, which GF108 is not. So imagine our surprise when we were looking at the final spec sheet for GF108 and noticed that it didn’t have the 16 ROPs that logic dictates would be paired with a 128-bit memory bus. And imagine our further surprise when it didn’t even have 8 ROPs, which is both the size of a single block of ROPs and what GT214/GT216 had.
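The pairing rule is simple enough to check against the spec table. The sketch below applies the 8-ROPs-per-64-bit-controller rule to each Fermi card's bus width and flags any card whose listed ROP count does not match; the helper names are ours, and the figures come straight from the table.

```python
ROPS_PER_CONTROLLER = 8      # pairing used on GF100-GF106, per the article
CONTROLLER_WIDTH_BITS = 64


def expected_rops(bus_width_bits):
    """ROP count implied by the bus width under the GF100-GF106 pairing."""
    active_controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    return active_controllers * ROPS_PER_CONTROLLER


# card: (memory bus width in bits, ROPs listed in the spec table)
PARTS = {
    "GTX 480": (384, 48),
    "GTX 460 768MB": (192, 24),
    "GTS 450": (128, 16),
    "GT 430": (128, 4),
}


def mismatches(parts):
    """Return the cards whose listed ROP count breaks the pairing rule."""
    return [name for name, (bus, rops) in parts.items()
            if expected_rops(bus) != rops]


if __name__ == "__main__":
    for name, (bus, rops) in PARTS.items():
        print(f"{name}: {bus}-bit bus, expected {expected_rops(bus)} ROPs, listed {rops}")
    print("Breaks the rule:", mismatches(PARTS))
```

Every card above the GT 430 follows the rule; GF108 is the only one that breaks it, with a quarter of the ROPs its 128-bit bus would imply.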

Instead NVIDIA’s thrown u