Original Link: http://www.anandtech.com/show/1651




Introduction

The recent re-introduction of multiple GPU configurations into the mainstream has given us quite a bit to talk about over the past few months. Recently, the Quadro SLI from NVIDIA was paper launched. While we wait for the drivers to trickle out of NVIDIA so that we can begin testing, we have another interesting multiGPU solution to consider: the 3Dlabs Wildcat Realizm 800.

We've already taken a look at the Wildcat Realizm 200, which forms the basis for the 800. We will test the Realizm 800 more extensively once we can get hold of NVIDIA's SLI drivers, since that will be a relevant comparison. We would also like to compare this card against the top-of-the-line Quadro FX 4400 and FireGL V7100, which we don't have in our labs yet. As the Realizm 800 is very much an extension of the 200, and because we don't have the very interesting comparisons available yet, we will treat this article as a brief introduction to the card.

3Dlabs parts have their advantages and disadvantages. Whether the Realizm 800 fits your needs or not, Creative has certainly helped to keep the battle for the 3D workstation space interesting. To get us started, here's a quick table showing the differences between some of the cards that we've tested and the Wildcat Realizm 800.

AGP Workstation Graphics Contenders

| | 3Dlabs Wildcat Realizm 200 | 3Dlabs Wildcat Realizm 800 | ATI FireGL X3-256 | NVIDIA Quadro FX 4000 |
| --- | --- | --- | --- | --- |
| Street Price | ~$850 | ~$2000 | ~$800 | ~$1500 |
| Memory Size/Type | 512MB GDDR3 | 640MB GDDR3 | 256MB GDDR3 | 256MB GDDR3 |
| Memory Bus | 256bit | 512bit | 256bit | 256bit |
| Memory Clock | 500MHz | 500MHz | 450MHz | 500MHz |
| Core Clock | ? | ? | 490MHz | 375MHz |
| Vertex Pipes | 4 | 8 | 6 | 6 |
| Vertex Processing | 36-bit | 36-bit | 32-bit | 32-bit |
| Pixel Pipes | 12 | 24 | 12 | 16 |
| Pixel Processing | 32-bit / 16-bit storage | 32-bit / 16-bit storage | 24-bit | 32-bit / 16-bit selectable |
| Shader Model Support | VS 2.0 / PS 3.0 | VS 2.0 / PS 3.0 | SM 2.0 | SM 3.0 |
| 2x Dual-Link DVI | Yes | Yes | Yes | Yes |
| Stereo 3D | Yes | Yes | Yes | Yes |
| Genlock/Framelock | Multiview Upgrade | Multiview Upgrade | No | SDI version |




Inside the Wildcat Realizm 800

The first thing that we noticed about the Wildcat Realizm 800 was its size. This is a full length card in every way. Sporting two of the GPUs featured on the Realizm 200, the 800 needs plenty of space for silicon, RAM, and all the routing going on. For background on the GPUs referenced in this article, take a peek at our architectural analysis of the Wildcat Realizm 200.

The two GPUs are connected by a discrete vertex unit that 3Dlabs calls a Vertex/Scalability Unit (VSU). This bit of hardware is responsible for the card's PCI Express interface, geometry processing for the entire scene, and splitting the workload into two parts. The VSU also has 128MB of GDDR3 "DirectBurst" memory attached to it over a 128-bit wide interface. This memory stores rendering commands and geometry data for the VSU.

Unlike NVIDIA's SLI solution, the Realizm 800 handles all scene splitting tasks in hardware, in the VSU's Breaker/Distributer. This offers many important advantages, most notably that no software support is required for full hardware utilization. Having a unified 512MB framebuffer to which both GPUs are connected gives the Wildcat Realizm 800 its definitive multiGPU advantage. All scenes can benefit from geometry and fragment acceleration equally and transparently.

With NVIDIA's solution using separate framebuffers that are either combined in every frame or used in an alternating fashion, there are some compatibility limitations that prevent all graphics from benefiting equally from the technology. Situations can come up where one GPU will need data from a previous frame or a part of the current frame being rendered on the other GPU. Not having a unified framebuffer makes this more difficult logistically than it needs to be.
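To make the two separate-framebuffer modes concrete, here is a minimal Python sketch of how work might be assigned to two GPUs in each case. The function names and the even 50/50 split are our own illustrative assumptions, not NVIDIA's driver logic.

```python
# Illustrative only: two ways to divide rendering between two GPUs that
# each keep a separate framebuffer. All names here are hypothetical.

def alternate_frame_owner(frame_index: int, gpu_count: int = 2) -> int:
    """Alternating mode: whole frames are handed out round-robin."""
    return frame_index % gpu_count


def split_frame_owner(scanline: int, height: int, split: float = 0.5) -> int:
    """Split mode: GPU 0 renders the top `split` fraction, GPU 1 the rest."""
    boundary = int(height * split)
    return 0 if scanline < boundary else 1


def needs_other_gpu(frame_index: int, reads_previous_frame: bool) -> bool:
    """In alternating mode with two GPUs, the previous frame lives on the
    other GPU, so an effect that reads it needs a cross-GPU transfer."""
    if not reads_previous_frame or frame_index == 0:
        return False
    previous_owner = alternate_frame_owner(frame_index - 1)
    return alternate_frame_owner(frame_index) != previous_owner
```

With a unified framebuffer, the last function would have nothing to check, since both GPUs would see the same memory.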

So, the VSU splits the frame first, then processes the graphics and sends fragment data on to the GPUs over two 64-bit parallel interfaces. It's unclear whether this is a physically different interface from the AGP interfaces already on the GPUs. If 3Dlabs reused the interface, it's definitely running faster than normal, delivering 4.2GB/s from the VSU to each GPU.
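To put the 4.2GB/s figure in perspective, here is a quick back-of-the-envelope calculation in Python. The 64-bit width and 4.2GB/s come from the paragraph above; the AGP 8x figures (32-bit bus, roughly 2.1GB/s peak) are the commonly cited values and are our assumption here, as is treating 1GB as 10^9 bytes.

```python
def transfers_per_second(bandwidth_bytes_per_s: float, bus_width_bits: int) -> float:
    """Transfers per second needed to reach a bandwidth on a bus of this width."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_bytes_per_s / bytes_per_transfer


# Figures quoted for the link between the VSU and each GPU.
VSU_LINK_BANDWIDTH = 4.2e9   # bytes per second
VSU_LINK_WIDTH = 64          # bits

# Assumed reference point: commonly cited AGP 8x peak figures.
AGP8X_BANDWIDTH = 2.1e9      # bytes per second
AGP8X_WIDTH = 32             # bits

bandwidth_ratio = VSU_LINK_BANDWIDTH / AGP8X_BANDWIDTH
vsu_rate = transfers_per_second(VSU_LINK_BANDWIDTH, VSU_LINK_WIDTH)
agp_rate = transfers_per_second(AGP8X_BANDWIDTH, AGP8X_WIDTH)

print(f"VSU link vs. AGP 8x bandwidth: {bandwidth_ratio:.1f}x")
print(f"VSU link: {vsu_rate / 1e6:.0f} million transfers/s")
print(f"AGP 8x:   {agp_rate / 1e6:.0f} million transfers/s")
```

Under these assumptions, each link delivers twice the peak bandwidth of AGP 8x, and the doubled bus width alone would account for the difference.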

The data is sent to the GPUs bypassing the geometry processing steps. It's a shame that all that geometry power goes unused, but sometimes sacrifices must be made. Likely, combining the geometry power from the VSU and both GPUs would have unbalanced the fragment processing abilities of the card (or would have been otherwise too difficult to accomplish effectively).

Each GPU then handles the rest of the pipeline as it would normally have done on the Wildcat Realizm 200.

The Wildcat Realizm 800 still has 4 Silicon Image TMDS chips for solid refresh rates on 2x Dual-Link DVI connections. It should be easier for 3Dlabs to address the major issue that NVIDIA is focusing on with SLI in professional graphics: spanning multiple displays with a single 3D accelerated window. We will try to compare this aspect of the WR800 and Quadro SLI solutions when we have all the cards and drivers in our labs for testing.

Before we head to the numbers, we'll lead off by saying that doubling the hardware doesn't usually result in a linear 2x performance increase. 3Dlabs still has not disclosed core clock speeds to us, and it may well be the case that the WR800 runs its GPUs at a slightly lower speed than the 200. With some of our numbers, it's difficult to tell exactly what causes performance to look like it does, but we will do our best to explain our results.

Test

Here are our test system setups:

Performance Test Configuration
Processor(s): 2 x AMD Opteron 250; AMD Opteron 150; AMD Opteron FX-53
RAM: 4 x 512MB OCZ PC3200 EL ECC Registered (2 per CPU); 2 x 1024MB Kingston PC3200 ECC Registered; 2 x 1024MB OCZ PC3200 EB Platinum Edition
Hard Drive(s): Seagate 120GB 7200RPM IDE (8MB Buffer)
Motherboard & IDE Bus Master Drivers: AMD 8131 APIC Driver; NVIDIA nForce 5.10; NVIDIA nForce 6.10
Video Card(s): ATI FireGL V5000; 3Dlabs Wildcat Realizm 200; ATI FireGL X3-256; NVIDIA Quadro FX 4000; HIS Radeon X800 XT Platinum Edition IceQ II; Prolink GeForce 6800 Ultra Golden Limited
Video Drivers: 3Dlabs 4.04.0757 Driver; ATI FireGL 8.083 Driver; NVIDIA Quadro 70.41 (Beta); NVIDIA ForceWare 67.03 (6800U); ATI Catalyst 4.12 (X800)
Operating System(s): Windows XP Professional SP2
Motherboards: HP WX9300 (nForce Professional); IWill DK8N v1.0 (AMD-81xx + NVIDIA nForce 3); DFI LANParty UT nF4 Ultra-D (for FireGL V5000)
Power Supply: 600W OCZ Powerstream PSU




SPECviewperf 8.0.1 Performance

For our SPECviewperf tests, we will look at graphs of the overall weighted scores for each viewset. We have also listed the scores for the individual tests for each viewset. For each section, we will begin by listing the description of the viewset from the SPEC website, and then analyze the data.

For further details on how SPECviewperf scores are compiled, please see the SPEC website.

All cards were set to their default professional graphics settings, whatever those happened to be, before running SPECviewperf.
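As a rough guide to what an "overall weighted score" means, here is a small Python sketch that combines per-test results into one number. It assumes the composite is a weighted geometric mean, which is how SPEC describes its viewset composites. The per-test weights are published by SPEC and are not reproduced in this article, so the equal weights below are placeholders and the result will not match an official composite.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Combine per-test scores into a single composite.

    Weights need not sum to 1; they are normalized by their total.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


# Wildcat Realizm 800 results for the five light-07 tests in this article.
light07_scores = [30, 51.6, 15.8, 14.7, 29.2]

# Placeholder weights (assumption): SPEC's real per-test weights differ.
equal_weights = [1.0] * len(light07_scores)

composite = weighted_geometric_mean(light07_scores, equal_weights)
print(f"Equal-weight composite for light-07: {composite:.1f}")
```

A geometric mean keeps one very high frame rate from dominating the composite the way it would in a plain average.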

3dsmax Viewset (3dsmax-03)

"The 3dsmax-03 viewset was created from traces of the graphics workload generated by 3ds max 3.1. To ensure a common comparison point, the OpenGL plug-in driver from Discreet was used during tracing.

The models for this viewset came from the SPECapc 3ds max 3.1 benchmark. Each model was measured with two different lighting models to reflect a range of potential 3ds max users. The high-complexity model uses five to seven positional lights as defined by the SPECapc benchmark and reflects how a high-end user would work with 3ds max. The medium-complexity lighting models use two positional lights, a more common lighting environment.

The viewset is based on a trace of the running application and includes all the state changes found during normal 3ds max operation. Immediate-mode OpenGL calls are used to transfer data to the graphics subsystem."

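SPECviewperf 8.0.1 3dsmax-03

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 48.4 | 48.1 | 38.3 | 38 | 93.6 | 80.8 | 69.7 | 51.8 | 46.1 | 95.6 | 37 | 36.9 | 30.6 | 30.6 |
| Wildcat Realizm 200 | 38.6 | 38 | 30.2 | 24.3 | 74.1 | 62.1 | 43.5 | 40.4 | 28.3 | 68.7 | 24.1 | 24.1 | 18.3 | 18.3 |
| Quadro 4000 | 38.6 | 38.5 | 21.8 | 18.8 | 75 | 53.7 | 40.6 | 30.6 | 17.2 | 70.3 | 25.3 | 26.2 | 21 | 20.1 |
| FireGL X3-256 | 32.2 | 27.3 | 23.4 | 18.5 | 62.8 | 51 | 28.9 | 31.9 | 18.6 | 66.6 | 20.5 | 20.6 | 16 | 16.1 |
| FireGL V5000 | 33.5 | 23.5 | 24.3 | 15.9 | 65.1 | 43.2 | 24.2 | 31.5 | 15.7 | 66.1 | 22.7 | 21.6 | 18.4 | 14.3 |
| GeForce 6800 U GC | 17.5 | 15.9 | 14.8 | 14.7 | 42.4 | 29.2 | 27.1 | 19.1 | 14.9 | 40.8 | 14.2 | 14.1 | 11.5 | 11.2 |
| Radeon X800 XTPE | 12 | 8.09 | 12.5 | 10.7 | 35.9 | 23.6 | 11.6 | 15.8 | 10 | 32.7 | 11.4 | 6.37 | 9.3 | 7.86 |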

CATIA Viewset (catia-01)

"The catia-01 viewset was created from traces of the graphics workload generated by the CATIA™ V5R12 application from Dassault Systemes.

Three models are measured using various modes in CATIA. Phil Harris of LionHeart Solutions, developer of CATBench2003, supplied SPEC/GPC with the models used to measure the CATIA application. The models are courtesy of CATBench2003 and CATIA Community.

The car model contains more than two million points. SPECviewperf replicates the geometry represented by the smaller engine block and submarine models to increase complexity and decrease frame rates. After replication, these models contain 1.2 million vertices (engine block) and 1.8 million vertices (submarine).

State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older SPECviewperf viewsets.

Mirroring the application, draw arrays are used for some tests and immediate mode used for others."

SPECviewperf 8.0.1 catia-01

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 64.5 | 64.7 | 15.5 | 37 | 23.6 | 77.1 | 41.3 | 35 | 27.9 | 26.6 | 12.9 |
| Quadro 4000 | 48.9 | 37.2 | 26.9 | 27.9 | 19 | 47.8 | 29.6 | 28.9 | 19 | 23.2 | 46.1 |
| Wildcat Realizm 200 | 52.1 | 38.6 | 15.8 | 29.7 | 18.9 | 63.3 | 32.5 | 27.6 | 22 | 20.6 | 13.7 |
| FireGL X3-256 | 39.7 | 25.8 | 19.1 | 22.8 | 16.2 | 34 | 25.2 | 18.8 | 15 | 19.1 | 34.8 |
| FireGL V5000 | 42.6 | 27.6 | 20.8 | 24.1 | 14 | 38.6 | 26.9 | 20.3 | 16.3 | 17.1 | 38.2 |
| GeForce 6800 U GC | 20.2 | 22.5 | 11.5 | 12.6 | 8.9 | 31.1 | 17.2 | 4.49 | 11.2 | 12.7 | 27.2 |
| Radeon X800 XTPE | 20.9 | 10.3 | 10.9 | 11.7 | 7.38 | 26.7 | 13.1 | 12.5 | 8.86 | 7.41 | 21 |

EnSight (ensight-01)

"The ensight-01 viewset replaces the Data Explorer (dx) viewset. It represents engineering and scientific visualization workloads created from traces of CEI's EnSight application.

CEI contributed the models and suggested workloads. Various modes of the EnSight application are tested using both display-list and immediate-mode paths through the OpenGL API. The model data is replicated by SPECviewperf 8.0 to generate 3.2 million vertices per frame.

State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.

Mirroring the application, both immediate-mode and display-list modes are measured."

SPECviewperf 8.0.1 ensight-01

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 27.2 | 35.3 | 46.1 | 27.5 | 20.5 | 20.6 | 27.9 | 20.4 | 28.1 |
| Quadro 4000 | 42.7 | 35.4 | 35.6 | 31.1 | 13.7 | 13.6 | 31 | 13.8 | 31 |
| Wildcat Realizm 200 | 24.5 | 20.9 | 26.7 | 23 | 16 | 15.9 | 23 | 16 | 23 |
| FireGL X3-256 | 53.6 | 50 | 53.7 | 44.6 | 12.3 | 12.3 | 44.7 | 12.3 | 44.6 |
| FireGL V5000 | 37.2 | 34.2 | 45.2 | 35.2 | 12 | 12.5 | 35.2 | 12.4 | 35.2 |
| GeForce 6800 U GC | 11.6 | 3.98 | 39.3 | 34.1 | 7.22 | 7.23 | 34 | 7.26 | 34 |
| Radeon X800 XTPE | 14.1 | 54.4 | 45.5 | 34.8 | 5.84 | 5.78 | 34.8 | 5.8 | 34.8 |

Lightscape Viewset (light-07)

"The light-07 viewset was created from traces of the graphics workload generated by the Lightscape Visualization System from Discreet Logic. Lightscape combines proprietary radiosity algorithms with a physically-based lighting interface.

The most significant feature of Lightscape is its ability to simulate global illumination effects accurately by pre-calculating the diffuse energy distribution in an environment and storing the lighting distribution as part of the 3D model. The resulting lighting "mesh" can then be rapidly displayed."

SPECviewperf 8.0.1 light-07

| Card | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 30 | 51.6 | 15.8 | 14.7 | 29.2 |
| Quadro 4000 | 23.7 | 37.5 | 18.1 | 13.8 | 27.8 |
| Wildcat Realizm 200 | 25.4 | 44.3 | 14.1 | 12.8 | 25.8 |
| FireGL X3-256 | 24.8 | 44.5 | 12.5 | 11.6 | 24.2 |
| FireGL V5000 | 25.2 | 46.9 | 13.5 | 10.3 | 25.6 |
| GeForce 6800 U GC | 13.2 | 24.2 | 9.06 | 6.75 | 16 |
| Radeon X800 XTPE | 12.1 | 22.6 | 7.96 | 5.89 | 14.2 |

Maya Viewset (maya-01)

"The maya-01 viewset was created from traces of the graphics workload generated by the Maya V5 application from Alias.

The models used in the tests were contributed by artists at NVIDIA. Various modes in the Maya application are measured.

State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets. As in the Maya V5 application, array element is used to transfer data through the OpenGL API."

SPECviewperf 8.0.1 maya-01

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 151 | 50.5 | 35.1 | 42.6 | 29.8 | 127 | 101 | 99.9 | 55.6 |
| Quadro 4000 | 152 | 52.5 | 37.2 | 42.6 | 27.6 | 100 | 82.3 | 82.7 | 49.6 |
| Wildcat Realizm 200 | 126 | 41.1 | 30.8 | 33.3 | 17.7 | 96.3 | 79.8 | 76.8 | 44.7 |
| FireGL X3-256 | 110 | 33.3 | 23.2 | 29.2 | 20.6 | 82 | 51.1 | 63.7 | 37.2 |
| FireGL V5000 | 115 | 35.4 | 24.5 | 30.4 | 21.1 | 85.5 | 56.4 | 66.3 | 38.8 |
| GeForce 6800 U GC | 54 | 24.5 | 16 | 12.8 | 10 | 52 | 39.5 | 37.4 | 23.7 |
| Radeon X800 XTPE | 25.2 | 14.9 | 9.02 | 8.03 | 7.2 | 36.1 | 26.7 | 25 | 16.3 |

Pro/ENGINEER (proe-03)

"The proe-03 viewset was created from traces of the graphics workload generated by the Pro/ENGINEER 2001™ application from PTC.

Two models and three rendering modes are measured during the test. PTC contributed the models to SPEC for use in measurement of the Pro/ENGINEER application. The first of the models, the PTC World Car, represents a large-model workload composed of 3.9 to 5.9 million vertices. This model is measured in shaded, hidden-line removal, and wireframe modes. The wireframe workloads are measured both in normal and antialiased mode. The second model is a copier. It is a medium-sized model made up of 485,000 to 1.6 million vertices. Shaded and hidden-line-removal modes were measured for this model.

This viewset includes state changes as made by the application throughout the rendering of the model, including matrix, material, light and line-stipple changes. The PTC World Car shaded frames include more than 100MB of state and vertex information per frame. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.

Mirroring the application, draw arrays are used for the shaded tests and immediate mode is used for the wireframe. The gradient background used by the Pro/E application is also included to model the application workload better."

SPECviewperf 8.0.1 proe-03

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 34 | 39.4 | 34.1 | 67 | 58.5 | 175 | 64.2 |
| Quadro 4000 | 26.6 | 31 | 24.9 | 53.4 | 53.2 | 170 | 52.6 |
| Wildcat Realizm 200 | 27.1 | 31.8 | 24.5 | 54.2 | 45.2 | 143 | 51.3 |
| FireGL X3-256 | 38.3 | 45.9 | 31.7 | 38.4 | 36 | 135 | 42.9 |
| FireGL V5000 | 40.4 | 47.2 | 27 | 41.2 | 38.8 | 145 | 45.7 |
| GeForce 6800 U GC | 11.2 | 13.1 | 15.3 | 33.5 | 8.55 | 88.5 | 32.7 |
| Radeon X800 XTPE | 7.33 | 8.63 | 9.05 | 22.7 | 22.5 | 44.2 | 23.8 |

SolidWorks Viewset (sw-01)

"The sw-01 viewset was created from traces of the graphics workload generated by the Solidworks 2004 application from Dassault Systemes.

The model and workloads used were contributed by Solidworks as part of the SPECapc for SolidWorks 2004 benchmark.

State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.

Mirroring the application, draw arrays are used for some tests and immediate mode used for others."

SPECviewperf 8.0.1 sw-01

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 47.7 | 11.2 | 20.1 | 24.5 | 34.5 | 36.1 | 118 | 25.2 |
| Quadro 4000 | 34.4 | 12.4 | 14.2 | 17.2 | 40.5 | 28.9 | 49.3 | 21.3 |
| Wildcat Realizm 200 | 43.4 | 9.76 | 12.6 | 15.5 | 48 | 32.7 | 103 | 22.6 |
| FireGL X3-256 | 26.6 | 12.1 | 15.3 | 18.5 | 39.5 | 25.5 | 52.4 | 18.2 |
| FireGL V5000 | 30.4 | 13.2 | 12.9 | 15.6 | 36.2 | 24.7 | 60.3 | 19.7 |
| GeForce 6800 U GC | 31.8 | 10.6 | 10.6 | 13.8 | 32.8 | 12.2 | 33.4 | 12.3 |
| Radeon X800 XTPE | 23.7 | 6.04 | 6.37 | 8.04 | 18.3 | 11.5 | 52.1 | 11.1 |

Unigraphics (ugs-04)

"The ugs-04 viewset was created from traces of the graphics workload generated by Unigraphics V17.

The engine model used was taken from the SPECapc for Unigraphics V17 application benchmark. Three rendering modes are measured: shaded, shaded with transparency, and wireframe. The wireframe workloads are measured both in normal and anti-aliased mode. All tests are repeated twice, rotating once in the center of the screen and then moving about the frame to measure clipping performance.

The viewset is based on a trace of the running application and includes all the state changes found during normal Unigraphics operation. As with the application, OpenGL display lists are used to transfer data to the graphics subsystem. Thousands of display lists of varying sizes go into generating each frame of the model.

To increase model size and complexity, SPECviewperf 8.0 replicates the model two times more than the previous ugs-03 test."

SPECviewperf 8.0.1 ugs-04

| Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wildcat Realizm 800 | 39.4 | 40.7 | 29.4 | 26.3 | 58.4 | 35.6 | 46.6 | 33.8 |
| Quadro 4000 | 30.1 | 34.7 | 27.8 | 31.2 | 51 | 62.2 | 47.5 | 49.8 |
| Wildcat Realizm 200 | 24.3 | 26.8 | 22 | 23.2 | 38.8 | 31.4 | 30.2 | 26.7 |
| FireGL X3-256 | 18.5 | 19.1 | 13.9 | 14.4 | 53.4 | 56.4 | 53.7 | 53.3 |
| FireGL V5000 | 17.8 | 19.6 | 14.5 | 15.7 | 47.5 | 41.5 | 47 | 36.3 |
| GeForce 6800 U GC | 3.29 | 3.86 | 3.01 | 3.52 | 27.3 | 31.9 | 4.94 | 7.38 |
| Radeon X800 XTPE | 11.9 | 12 | 9.13 | 9.13 | 22.2 | 22.6 | 22.2 | 22.4 |




Final Words

Looking back over the SPECviewperf benchmarks that we've shown today, the 3Dlabs Wildcat Realizm 800 is slower than the FireGL X3-256 under ensight and the Quadro FX 4000 under UGS.

Improvement over the Realizm 200 is generally between 15 and 30 percent. This suggests to us that 3Dlabs lowered the clock speed of the GPUs a bit in order to compensate for the added heat. There is quite a bit of highly clocked silicon under the hood of the Wildcat Realizm 800, so dropping the clocks a little would definitely make sense.
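For readers who want to check the scaling themselves, here is a short Python sketch that computes the per-test gain of the Realizm 800 over the Realizm 200. The numbers are the proe-03 results listed earlier in this article; the helper name is our own. Individual tests will not all land inside the overall range quoted above, since that range describes the general trend across viewsets.

```python
def percent_gain(new_scores, old_scores):
    """Per-test improvement of new over old, in percent."""
    return [(new / old - 1.0) * 100.0 for new, old in zip(new_scores, old_scores)]


# proe-03 individual test results from the table in this article.
realizm_800_proe03 = [34, 39.4, 34.1, 67, 58.5, 175, 64.2]
realizm_200_proe03 = [27.1, 31.8, 24.5, 54.2, 45.2, 143, 51.3]

gains = percent_gain(realizm_800_proe03, realizm_200_proe03)
for test_number, gain in enumerate(gains, start=1):
    print(f"proe-03 test {test_number}: {gain:+.1f}%")
```

Swapping in the rows from any other viewset table gives the same comparison for that workload.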

Of course, we would never expect 2x performance gain from this doubling of processing power with the added overhead of breaking up the scene. Aside from clock speed, there could be some driver/hardware issues associated with the way the scene is broken up, which could add to the bottleneck.

Although we aren't seeing the 50 to 90 percent improvements that we would have expected, the Wildcat Realizm 800 is a high-performance part across the board. Coming in at a street price of about $2000 USD, this part could pose some serious competition to the Quadro FX 4400 (~$2300 USD). Of course, we'll have to wait until we can get our hands on the top-of-the-line NVIDIA and ATI cards before we can really delve into that issue.

We are very interested in comparing this solution to NVIDIA's SLI option. If NVIDIA is able to adapt their SLI profiles to workstation applications effectively, we could see some very impressive numbers. NVIDIA isn't choosing to focus on performance as the main selling point of Quadro SLI though. They prefer to highlight the added features of the solution. The major advantages here are the massive display capabilities. We will be sure to pay attention to these key features as soon as we get our hands on a driver to test the system.

Until we are able to test the top-of-the-line NVIDIA and ATI graphics cards, the Wildcat Realizm 800 is at the top of the heap. But keep in mind that game developers will still prefer an NVIDIA- or ATI-based solution for their superior DirectX support. Hopefully, after we get the rest of the cards together, our full review will be as interesting as our first look. Stay tuned!
