ATI FireGL X3-256 Technology

The FireGL X3-256 is based on ATI's R420 architecture. While this isn't a surprise, it is interesting that the highest end AGP offering that ATI has on the table is based on the X800 Pro. On the PCI Express side, ATI is offering a higher performance part, but for now, the FireGL on AGP is a little more limited than on PCI Express. When we tackle the PCI Express workstation market, we'll bring out a clearer picture of how ATI's highest end workstation component stacks up against the rest of the competition. As the ATI part isn't positioned as an ultra high end workstation solution, we'll be focusing more on price/performance. Unfortunately for ATI, the street price of the 3Dlabs Wildcat Realizm 200 comes in at just about the same as that of the FireGL X3-256 while being targeted at a higher performance point. But we'll have to see how that pans out when we've taken a look at the numbers. For now, let's pop open the hood on the ATI FireGL X3-256.

We will start out with the vertex pipeline as we did with the NVIDIA part. The overall flow of data is very similar to the Quadro, except, of course, that the ATI part runs with 12 pixel pipelines rather than 16. The internals are the differentiating factor.



We can see that the ATI vertex engine supports the parallel operation of a 4-wide 32-bit vector unit and a 32-bit scalar unit. This allows the same type of operation that the NVIDIA GPU supports, but the FireGL lacks VS 3.0 capability and support for vertex textures. Interestingly, in the documents that list the features of the FireGL X3, we see that "Full DX9 vertex shader support with 4 vertex units" is mentioned in addition to its "6 geometry engines". This indicates that 2 of the 6 geometry engines don't handle full DX9 functionality. This isn't as important in a workstation part, as the fixed function path will be stressed more often, but it's worth noting that this core is based on the desktop part and we didn't pick up this information from any of our desktop briefings or data sheets.
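To make the co-issue idea concrete, here is a minimal C sketch of the kind of per-vertex workload it maps onto: a 4x4 matrix transform that keeps the 4-wide vector unit busy, alongside an independent scalar fog computation that the scalar unit could issue in parallel. The names and structure are ours for illustration, not ATI's.

```c
#include <math.h>

/* Illustrative sketch only: the kind of per-vertex work the vec4 + scalar
 * co-issue described above maps onto. Not ATI driver or hardware code. */

typedef struct { float x, y, z, w; } vec4;

static float dot4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Row-major 4x4 matrix times vec4: four dot products -- natural work
 * for the 4-wide (vec4) vector unit. */
static vec4 transform(const vec4 m[4], vec4 v)
{
    vec4 r = { dot4(m[0], v), dot4(m[1], v), dot4(m[2], v), dot4(m[3], v) };
    return r;
}

/* Per-vertex fog factor: independent scalar math that, conceptually,
 * the scalar unit can issue alongside the vector work above. */
static float fog_factor(float eye_z, float density)
{
    float f = expf(-density * eye_z);
    return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
}
```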

The FireGL X3-256 employs the same HyperZ HD engine that the Radeon uses, which combines early/hierarchical z hardware with a z/stencil cache and z compression. The hierarchical z engine looks at tiles of pixels (16x16 blocks in the case of the FireGL), and if the entire block is occluded, up to 256 pixels can be eliminated in one clock. These pixels never need to touch the fragment/pixel processing hardware, which saves a lot of processing power. When we look at the pixel engine, we can see that ATI divides its pixel pipelines into "quad" pipes as well, but NVIDIA and ATI define a quad slightly differently. On ATI hardware, data out of setup is tiled into those 16x16 blocks for the hierarchical z pass, and it is these blocks over which the quad pipes divide their work.
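As a rough illustration of the hierarchical z idea (not ATI's actual HyperZ HD implementation, which also involves z/stencil caching and compression), a coarse buffer can keep the farthest depth per 16x16 tile and reject all 256 pixels with a single comparison:

```c
#include <stdbool.h>

/* Conceptual sketch of hierarchical Z rejection over 16x16 tiles.
 * Data layout and names are ours; the real hardware is far more involved. */

#define TILE_DIM 16                  /* 16x16 = 256 pixels per tile */

typedef struct {
    float max_z;                     /* farthest depth already in the tile */
} ZTile;

/* With a "less than" depth test, if the nearest depth of the incoming
 * geometry covering this tile is still farther than everything already
 * stored in it, all 256 pixels are occluded and can be culled at once. */
static bool tile_fully_occluded(const ZTile *tile, float incoming_min_z)
{
    return incoming_min_z >= tile->max_z;
}

/* After fragments are actually written, refresh the coarse value from the
 * tile's fine depth values; real hardware would do this conservatively. */
static void tile_refresh(ZTile *tile, const float fine_z[TILE_DIM * TILE_DIM])
{
    float m = fine_z[0];
    for (int i = 1; i < TILE_DIM * TILE_DIM; ++i)
        if (fine_z[i] > m)
            m = fine_z[i];
    tile->max_z = m;
}
```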



Inside each of the pixel pipes, we have something that also looks similar to the NVIDIA architecture. ATI's hardware can complete two 3-component vector operations and two scalar operations, in combination with a texture operation, every clock cycle. This is what the hardware ends up looking like:



Since the texture unit does not share hardware with either of the shader math units, ATI is theoretically able to handle more math per clock cycle in its pixel shaders than NVIDIA. The 3 + 1 arrangement is also not as flexible as NVIDIA's, however, as NVIDIA's shader units are capable of handling 2 vector + 2 vector operations as well.
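Here is a small C sketch of what a 3 + 1 co-issue opportunity looks like in shader code: a 3-component color operation and an unrelated scalar alpha term that could retire in the same cycle because neither depends on the other. The function and type names are illustrative, not ATI's.

```c
/* Sketch of the "3 + 1" split discussed above: a 3-component color
 * operation and an independent scalar operation that a 3+1 ALU can
 * issue together. Illustrative only. */

typedef struct { float r, g, b; } vec3;

typedef struct {
    vec3  rgb;   /* written by the 3-wide half of the ALU */
    float a;     /* written by the 1-wide (scalar) half   */
} fragment_out;

static fragment_out shade(vec3 base, vec3 texel, float fresnel, float opacity)
{
    fragment_out o;

    /* vec3 slot: per-channel modulate of the base color by the texture sample */
    o.rgb.r = base.r * texel.r;
    o.rgb.g = base.g * texel.g;
    o.rgb.b = base.b * texel.b;

    /* scalar slot: independent alpha term -- because it does not depend on
     * the vec3 result, both halves can conceptually retire in one clock */
    o.a = fresnel * opacity;

    return o;
}
```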

With only PS 2.0 support, ATI's architecture is not as robust as either NVIDIA's or 3Dlabs'. The FireGL can only support between 512 and 1536 shader instructions depending on conditions, and it uses fp24 for processing. The Radeon architecture has traditionally favored DirectX over OpenGL, so we will be very interested to see where these predominantly OpenGL benchmarks end up.

As far as rasterization is concerned, ATI does not support any floating point framebuffer display types. The highest accuracy framebuffer that the FireGL X3-256 supports is a 10-bit integer format, which is good enough for many applications today. As with both 3Dlabs' and NVIDIA's parts, the FireGL X3-256 includes dual 10-bit RAMDACs and 2 dual-link DVI-I connections allowing support of up to 9MP displays. Unlike the Wildcat Realizm and Quadro FX lines, there is no way to get any sort of genlock, framelock, or SDI output support for the FireGL line. This puts ATI behind when it comes to video editing, video walls, multi-system displays, and broadcast solutions.
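For reference, a 10-bit integer framebuffer stores 1024 levels per color channel instead of the 256 levels of an 8-bit format. The sketch below packs floating point colors into one possible 10:10:10 layout with 2 bits of alpha; the exact channel ordering of the hardware format is not something we are asserting here.

```c
#include <stdint.h>

/* Sketch of a 10-bit-per-channel integer framebuffer format at the bit
 * level: 10:10:10 color plus 2 bits of alpha in a 32-bit word. Channel
 * order is illustrative; the point is the precision versus 8-bit. */

static uint32_t clamp_to_unorm(float v, uint32_t max_val)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (uint32_t)(v * (float)max_val + 0.5f);
}

static uint32_t pack_rgb10_a2(float r, float g, float b, float a)
{
    uint32_t ri = clamp_to_unorm(r, 1023);  /* 10 bits: 1024 levels */
    uint32_t gi = clamp_to_unorm(g, 1023);
    uint32_t bi = clamp_to_unorm(b, 1023);
    uint32_t ai = clamp_to_unorm(a, 3);     /*  2 bits:    4 levels */

    return (ai << 30) | (bi << 20) | (gi << 10) | ri;
}
```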

The added features that ATI's FireGL X3-256 supports beyond the Radeon include:
  • Anti-aliased points and lines - Lines and points are smoothed as they're drawn in wireframe mode. This is much higher quality and faster than FSAA when used for wireframe graphics, and is of the utmost importance to designers who use workstations for wireframe manipulation (the majority of the 3D workstation market).
  • Two-sided lighting - In the fixed function pipeline, enabling two-sided lighting allows hardware lights to illuminate both sides of an object. This is useful for viewing cut-away objects. SM 3.0 supports two-sided lighting registers for programmable shaders, but these don't apply to the fixed function light sources.
  • OpenGL overlay planes - Overlays are useful for drawing on top of a 3D accelerated viewport without dirtying the underlying buffer. This can significantly speed up things like displaying pop-up windows or selection highlights in 3D applications.
  • 6 user defined clip planes - User defined clip planes allow the cutting away of surfaces in order to look inside objects in applications that support their creation. (Several of these features appear in the OpenGL sketch after this list.)
  • Quad-buffered stereo 3D support - This enables smooth real-time stereoscopic image output by supporting a front-left, back-left, front-right, and back-right buffer for display.
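As a minimal sketch of how an application turns several of these features on, assuming a standard fixed-function OpenGL context (and, for stereo, a quad-buffered pixel format requested at context creation, which is omitted here):

```c
#include <GL/gl.h>

/* Enables anti-aliased lines/points, two-sided lighting, and one
 * user-defined clip plane in a fixed-function OpenGL context. */
static void enable_workstation_features(void)
{
    /* anti-aliased lines and points for wireframe views */
    glEnable(GL_LINE_SMOOTH);
    glEnable(GL_POINT_SMOOTH);
    glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    /* two-sided lighting in the fixed-function pipeline */
    glLightModeli(GL_LIGHT_MODEL_TWO_SIDE, GL_TRUE);

    /* one user-defined clip plane: cut away everything below z = 0 */
    const GLdouble plane[4] = { 0.0, 0.0, 1.0, 0.0 };
    glClipPlane(GL_CLIP_PLANE0, plane);
    glEnable(GL_CLIP_PLANE0);
}

/* With a quad-buffered stereo context, left and right eyes are drawn to
 * separate back buffers before the swap. */
static void draw_stereo_frame(void (*draw_eye)(int eye))
{
    glDrawBuffer(GL_BACK_LEFT);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_eye(0);

    glDrawBuffer(GL_BACK_RIGHT);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_eye(1);
}
```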
Undoubtedly, the FireGL line also features a different memory management setup, and its driver development focuses more heavily on OpenGL and stability. This is quite a different market than the consumer side, but ATI has a solid offering in the FireGL X3-256. Of course, we would rather see a 16-pipeline part, but we'll have to wait until we evaluate PCI Express graphics workstations for that.

Comments

  • Sword - Friday, December 24, 2004 - link

    Hi again,

    I want to add to my first post that there were 2 parts and a complex assembly (>110 very complex parts without simplified rep).

    The amount of data to process was pretty high (XP shows >400 MB and it can go up to 600 MB).

    About the specific features, I believe that most CAD users do not use them. People like me, mechanical engineers and other engineers, are using software like Pro/E, UGS, Solidworks, Inventor and Catia for solid modeling without any textures or special effects.

    My comment was really to point out that the high-end features seem useless in real-world applications for engineering.

    I still believe that for 3D multimedia content, there is a place for high-end workstations, and the specviewperf benchmark is a good tool for that.
  • Dubb - Friday, December 24, 2004 - link

    how about throwing in soft-quadro'd cards? when people realize with a little effort they can take a $350 6800GT to near-q4000 performance, that changes the pricing issue a bit.
  • Slaimus - Friday, December 24, 2004 - link

    If the Realizm 200 performs this well, it will be scary to see the 800 in action.
  • DerekWilson - Friday, December 24, 2004 - link

    dvinnen, workstation cards are higher margin -- selling consumer parts may be higher volume, but the competition is harder as well. Creative would have to really change their business model if they wanted to sell consumer parts.

    Sword, like we mentioned, the size of the data set tested has a large impact on performance in our tests. Also, Draven31 is correct -- a lot depends on the specific features that you end up using during your normal work day.

    Draven31, 3dlabs drivers have improved greatly with the Realizm from what we've seen in the past. In fact, the Realizm does a much better job of video overlay playback as well.

    Since one feature of the Quadro and Realizm cards is their ability to run genlock/framelock video walls, perhaps a video playback/editing test would make a good addition to our benchmark suite.
  • Draven31 - Friday, December 24, 2004 - link

    Coming up with the difference between the spec viewperf tests and real-world 3d work means finding out which "high-end card" features the test is using and then turning them off in the tests. With NVidia cards, this usually starts with antialiased lines. It also depends on whether the application you are running even uses these features... in Lightwave3D, the 'pro' cards and the consumer cards are very comparable performance-wise because it doesn't use these so-called 'high-end' features very extensively.

    And while they may be faster in some Viewperf tests, 3dLabs drivers generally suck. Having owned and/or used several, I can tell you any app that uses DirectX overlays as part of its display routines is going to either be slow or not work at all. For actual application use, 3dLabs cards are useless. I've seen 3dLabs cards choke on directX apps, and that includes both games and applications that do windowed video playback on the desktop (for instance, video editing and compositing apps)
  • Sword - Thursday, December 23, 2004 - link

    Hi everyone,

    I am a mechanical engineer in Canada and I am a fan of anandtech.

    Last year, I made a very big comparison of mainstream vs workstation video cards for our internal use (the company I work for).

    The goal was to compare the different systems (and mainly video cards) to see if, in Pro/Engineer and the kind of work we do, we could take real advantage of a high-end workstation video card.

    My conclusion is very clear: in specviewperf there is a huge difference between mainstream video cards and workstation video cards. BUT, in the day-to-day work, there is no real difference in our results.

    To summarize, I made a benchmark in Pro/E using the trail files with 3 of our most complex parts. I made comparisons in shading, wireframe and hidden line modes, and I also verified the regeneration time for each part. The benchmark was almost 1 hour long. I compared 3Dlabs products, ATI professional, NVIDIA professional and NVIDIA mainstream cards.

    My point is: do not believe specviewperf!! Make your own comparison with your actual day-to-day work to see if you really have to spend $1000 per video card. Also, take the time to choose the right components so you minimize the calculation time.

    If anyone at Anandtech is willing to take a look at my study, I am willing to share the results.

    Thank you
  • dvinnen - Thursday, December 23, 2004 - link

    I always wondered why Creative (they own 3dLabs) never made a consumer edition of the Wildcat. Seems like a smallish market when it wouldn't be all that hard to expand into consumer cards.
  • Cygni - Thursday, December 23, 2004 - link

    I'm surprised by the power of the Wildcat, really... great for the dollar.
  • DerekWilson - Thursday, December 23, 2004 - link

    mattsaccount,

    glad we could help out with that :-)

    there have been some reports of people getting consumer-level drivers to install on workstation-class parts, which should give better performance numbers for the ATI and NVIDIA parts under games, where possible. But keep in mind that the trend in workstation parts is to clock them at lower speeds than the current highest end consumer-level products for heat and stability reasons.

    if you're a gamer who's insane about performance, you'd be much better off paying $800 on ebay for the ultra rare uberclocked parts from ATI and NVIDIA than going out and getting a workstation class card.

    Now, if you're a programmer, having access to the workstation level features is fun and interesting. But probably not worth the money in most cases.

    Only people who want workstation class features should buy workstation class cards.

    Derek Wilson
  • mattsaccount - Thursday, December 23, 2004 - link

    Yes, very interesting. This gives me and lots of others something to point to when someone asks why they shouldn't get the multi-thousand dollar video card if they want top gaming performance :)
