Original Link: http://www.anandtech.com/show/439

ELSA GLoria II Quadro SDR

by Gary Jones on January 9, 2000 11:44 PM EST


The release of NVIDIA’s GeForce not only impacted the gaming market with its integrated hardware transforming and lighting capabilities but it also raised a few eyebrows among professional users that depend on powerful graphics accelerators for their applications.  NVIDIA’s GPU (Graphics Processor Unit) was capable of 50 billion floating point operations per second (50GFLOPS) which is considerably higher than most mainstream professional cards with on-board transforming and lighting engines that boast performance of around 5 billion floating point operations per second. 

At the same time, the professional market saw no intent on NVIDIA’s part to market the GeForce to the high-end crowd and thus was led to believe that the GeForce was purely a gamer’s chip, albeit a very powerful one.  Those that did try the GeForce in 3D visualization applications, MCAD, and other such professional level applications were generally very impressed by its performance.  NVIDIA had produced a solution for less than $300 that was capable of besting the performance of $500 - $1500 professional level cards by a large margin.  So why wasn’t NVIDIA pushing the GeForce as a professional level solution at all?

Shortly after the release of the GeForce came the announcement of a professional level version of the GeForce, called the Quadro.  On paper, NVIDIA’s Quadro very closely resembled the GeForce, with a few exceptions.  The core clock of the Quadro is 135MHz, a 15MHz increase from the 120MHz core clock of the GeForce.  The higher core clock allows the Quadro to achieve a 13% higher fill rate of 540 Mpixels/s.   The memory clock of the Quadro specification, like the GeForce, is 166MHz, providing for an effective 2.656GB/s of memory bandwidth for the chip. 
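The clock and bandwidth figures above follow from simple arithmetic. The sketch below reproduces them under two assumptions that the paragraph does not state explicitly: four pixel pipelines each writing one pixel per clock, and a 128-bit single data rate memory bus. The function names are illustrative only.

```python
# Back-of-the-envelope check of the quoted Quadro/GeForce figures.
# Assumptions (not stated in the paragraph above): 4 pixel pipelines,
# one pixel per pipeline per clock, and a 128-bit SDR memory bus.

def fill_rate_mpixels(core_mhz, pipelines=4):
    """Theoretical peak fill rate in Mpixels/s."""
    return core_mhz * pipelines


def memory_bandwidth_gb(mem_mhz, bus_bits=128, transfers_per_clock=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return mem_mhz * 1e6 * bytes_per_transfer * transfers_per_clock / 1e9


if __name__ == "__main__":
    geforce = fill_rate_mpixels(120)
    quadro = fill_rate_mpixels(135)
    print(f"GeForce fill rate: {geforce} Mpixels/s")
    print(f"Quadro fill rate:  {quadro} Mpixels/s")
    print(f"Uplift: {100 * (quadro / geforce - 1):.1f}%")
    print(f"Memory bandwidth at 166MHz: {memory_bandwidth_gb(166):.3f} GB/s")
```

The uplift works out to 12.5%, which the text rounds to 13%. These are theoretical peaks; the measured numbers later in the review are far lower.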

According to NVIDIA, in order to set the Quadro apart from the GeForce (aside from its $600+ price tag) certain features were “enabled” on the Quadro that were left disabled on the GeForce since the latter was not intended to be a high end solution.  Among these features is enhanced support for anti-aliased points and lines, which, although not commonly used by most users, was an extremely poor performance point for the GeForce.  The Quadro also features a peak triangle rate of 17 million triangles per second, up from the GeForce’s 10 – 15 million triangles per second.  While these settings could potentially be “enabled” on the GeForce through driver tweaks/registry hacks, it is unclear exactly what lengths NVIDIA went to in order to make sure that they were disabled on the GeForce. 

NVIDIA has been very successful in the consumer gaming market with a variety of card vendors selling GeForce based products, but in this new market it will face a different set of competitors and requirements. Thus NVIDIA selected ELSA, which has developed and marketed a variety of professional OpenGL graphics cards for some time, to be the exclusive card manufacturer for the Quadro based product.

In the space of a few years, NVIDIA has taken the leadership position in the performance segment of the consumer graphics card market and displaced a well entrenched vendor, 3DFx.  Will history repeat itself in the professional graphics card market? The established vendors like 3Dlabs, Intergraph, and E&S must be nervous about the impact of this new player in this market.



The applications

The technical applications and their associated needs are what drive this professional market segment.  Typical applications are Mechanical Computer Aided Design (MCAD) tools such as PTC’s Pro/E, Solidworks, and SDRC Ideas and visualization/simulation applications such as Maya and 3D Studio MAX.  Unlike the consumer game market where the goal is entertainment, these systems must be a productive and cost effective part of an organization’s work process.

Let’s look at the MCAD portion of this market and contrast those needs with the game market.   The goal of the MCAD process is basically expressed as “art to part”.  In an essentially paperless setting, engineers conceptualize, design, and analyze a product and all its component parts based on the data contained in the MCAD program’s database and, at the end of this iterative process, the database is used to generate the Computer Aided Manufacturing (CAM) data to drive the fabrication process.  Traditional paper 2D drawings (blueprints) are neither used nor needed.  This type of work process reduces the time to market for new products.  Some important contrasts with the game apps:

Geometric precision

This engineering data must be precise so the geometric data is usually stored in double precision (64 bit) floating point format. Game developers don’t need this degree of fidelity but they do need speed and minimum data storage, so the underlying databases that drive games are at most of the 32 bit variety. 
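To make the precision gap concrete, the sketch below stores a coordinate in 32 bits and measures the rounding error. The coordinate value is a hypothetical example, not taken from any real MCAD model.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


def storage_error(value):
    """Absolute error introduced by storing `value` in 32 bits."""
    return abs(to_float32(value) - value)


if __name__ == "__main__":
    coordinate = 1000.0001  # hypothetical model coordinate
    print(f"64-bit value: {coordinate!r}")
    print(f"32-bit value: {to_float32(coordinate)!r}")
    print(f"error       : {storage_error(coordinate):.3e}")
```

Near a magnitude of 1000, adjacent 32-bit values are 2**-14 (about 0.00006) apart, so a tolerance of one ten-thousandth of a unit is already at the edge of what 32-bit storage can represent.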

Simple shading

The MCAD designer doesn’t normally need complex multi texturing, lighting and other effects for working on their models.  Basic Gouraud (smooth) or flat shading and simple lighting is just fine in most cases.   In contrast, the game developer needs to have a wide variety of these effects in order to enhance the viewing experience and to approximate reality. Whereas fill rate is often the definitive performance factor in today’s 3D games, it carries much less weight in the MCAD field where basic shading is all that’s necessary.  Thus cards with higher fill rates don’t necessarily hold an incredible advantage in the MCAD arena which is why most professional cards have horrible gaming performance (low fill rates).

Line drawing

Game programs make almost no use of 3D line or wireframe rendering; CAD programs require an effective implementation of this feature.

Viewing fidelity

CAD programs put more emphasis on viewing fidelity.  For example, while a 16-bit Z buffer is fine for game play, it is not adequate for the viewing of large complex CAD models. Unfortunately, with some graphics card drivers (e.g. all NVIDIA drivers), selecting the 16 bit color mode also gives you a 16 bit Z-buffer.  This fact forces the CAD user to work in the truecolor mode in order to get the Z-buffer depth that is required.  
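A rough way to see why 16 bits of depth falls short on large models is to estimate the smallest separation the Z-buffer can resolve at a given distance. The sketch below uses the standard perspective depth mapping; the near and far plane values and the 24-bit depth for truecolor mode are assumptions chosen for illustration.

```python
def depth_resolution(z, near, far, bits):
    """Approximate smallest eye-space separation resolvable at distance z.

    Uses the standard perspective depth mapping
        d(z) = far * (z - near) / (z * (far - near)),  d in [0, 1],
    and divides one depth-buffer step (1 / 2**bits) by the slope at z.
    """
    slope = far * near / (z * z * (far - near))
    return (1.0 / 2 ** bits) / slope


if __name__ == "__main__":
    near, far = 1.0, 1000.0  # hypothetical clip planes, in model units
    for z in (10.0, 100.0, 500.0):
        r16 = depth_resolution(z, near, far, 16)
        r24 = depth_resolution(z, near, far, 24)  # assumed truecolor Z depth
        print(f"z={z:6.1f}  16-bit: {r16:.5f}  24-bit: {r24:.7f}")
```

Each extra bit halves the resolvable separation, so a 24-bit buffer resolves surfaces 256 times closer together than a 16-bit one at the same distance.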

Screen Resolution

Because of the complexity of the CAD models and the Unix heritage of most of the technical codes, a screen resolution of 1280x1024 is desired.  Almost nobody runs CAD apps at less than 1024x768.  This is in contrast to most games where fill rate limitations of the current crop of 3D accelerators keep playable frame rates under the 1024 x 768 mark.  If frame rate is king in games, then image detail and quality is king in the CAD world.

Model size

The game developer controls model size (polygon count) to get acceptable performance. The CAD user has no such luxury, he is modeling a real world object or system.  MCAD models of large systems may have in excess of 500K polys and there are no workstations made that will spin one of these at say 30 fps.

Cost

The cost of a typical NT workstation ($3K - $10K) is not the dominant factor in deploying MCAD tools.  The full cost of a skilled engineer usually exceeds $100K per year. The purchase cost of MCAD software is typically $10K-$15K per seat with a software maintenance cost of about $2K per year. In this situation, performance and user productivity are the key factors in the selection of a graphics card and not a $500 cost saving.

Optimization

MCAD and most of the other technical programs are developed to be precise full featured tools and the vendors don’t seem to optimize the code for graphics speed.  In contrast, games (and most OpenGL benchmarks) are highly optimized for speed.

The Unix factor

The technical programs are mostly cross platform (NT and Unix) applications that for the most part were originally developed on Unix platforms.  In many cases they are not especially well tuned for the Windows-Intel platform.  In contrast, game code is mainly designed for the Windows environment.

The visualization/simulation side of the technical workstation market has a different but no less demanding set of market driven technical requirements.



The competitors

ELSA’s competitors in the technical workstation graphics card market segment include 3Dlabs, Evans & Sutherland, Diamond, and Intergraph.  Some of the key characteristics of ELSA and the competitors’ products sold on the open market are summarized in the following table:

Quick Comparison Chart

| Card | Estimated Street Price (USD) | Driver with Profiles for Professional Applications | Transforming & Lighting | Memory Sub System |
|---|---|---|---|---|
| ELSA GLoria II - NVIDIA Quadro | $650.00 | Yes | On-chip 50 GFLOPS | 64 MB - 128 bit SDR |
| 3DLabs VX1 | $200.00 | Yes | Host CPU | 32 MB - 128 bit SDR |
| 3DLabs GVX1 | $680.00 | Yes | On Card - 3 GFLOPS | 32 MB - 128 bit SDR |
| 3DLabs GVX210 | Unknown | Yes | On Card - 5 GFLOPS | 64 MB - 256 bit SDR |
| E&S Lightning 1200 | $400.00 | Yes | Host CPU | 15 MB 3DRAM, 16 MB CDRAM |
| E&S Tornado 3000 | $1,250.00 | Yes | Host CPU | 30 MB 3DRAM, 16 MB CDRAM |
| Diamond Fire GL1 | $770.00 | Yes | Host CPU | 32 MB - 256 bit SDR |

Note: The 3Dlabs GVX210 was announced and shown in August, 1999 but is not yet in production.

In addition to these products other high end products are available as part of complete workstation systems: 

1) Intergraph Wildcat cards in Intergraph, Dell, Compaq and IBM workstations.

2) HP’s fx+ series of graphics processors available in their workstations.

3) SGI’s line of NT workstations with SGI’s own integrated graphics processor (Cobalt).

4) NEC’s new line of high end NT workstation with their hot new TE4E graphics systems on sale in Japan at only about $15K-$25K US.

Price – The sweet spot in this market seems to be in the $500-$1000 range; 18 months ago it was $2000-$3000.

Drivers - Cards sold into this market need to have OpenGL driver tuning profiles set up for the mainstream technical applications.

Transform & Lighting  – Dedicated transform and lighting hardware can provide real performance benefits when running large models.

Memory system - A graphics card’s memory design is one good indicator of its fill rate performance potential.

The GLoria II with the Quadro GPU design does appear to have a key technical advantage over the competition; it has an on chip 50 Gflop transform and lighting engine. Of the other cards only the 3Dlabs (GVX1 and GVX210) and expensive Wildcat 4110 cards have this specialized hardware support.  However, note that these cards use a separate processor chip for this function and that the processing rate is approximately 10 to 16 times slower. 



The Card

The ELSA GLoria II is a very compact AGP graphics card whose main components are a NVIDIA Quadro chip, hidden under the fan/heatsink, and (8) 64 Mbit Samsung (SEC) SDRAM memory chips.  Four of the memory chips are on the front side as shown in the photo below and four are on the reverse side.

The card itself adheres to the NLX form factor specification for a peripheral card, which explains its odd shape.  The layout closely resembles that of ELSA’s ERAZOR X, their SDRAM based GeForce 256 product.  Adjusting the ERAZOR X design for use with the Quadro most likely required very little effort as the Quadro/GeForce chips should be virtually interchangeable in designs.



The notch on the lower back of the GLoria II card is for an AGP card retention latch that some motherboard vendors (e.g. Intel) have started to provide.  The card has 64 Mbytes of Single Data Rate (SDR) SDRAM, which is twice the amount on the current production GeForce 256 cards.  The memory bus architecture appears to be the same as that on the GeForce 256 cards.  The RAMDAC is a 350 MHz unit and resolutions up to 2048x1536 at 85 Hz in truecolor are supported. The card only supports normal analog video output and no LCD digital outputs are provided. The GLoria II supports AGP 2X/4X (AGP Spec. 2.0).  The card’s warranty is specified at 6 .
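As a sanity check on the RAMDAC figure, the sketch below computes the pixel rate needed for the visible pixels alone at the top supported mode and compares it with the 350 MHz RAMDAC clock. It deliberately ignores the exact blanking intervals of the monitor timing, which are not given here; they account for the remaining headroom.

```python
def active_pixel_rate_mhz(width, height, refresh_hz):
    """Pixel rate for the visible pixels only, in MHz (blanking excluded)."""
    return width * height * refresh_hz / 1e6


def ramdac_headroom(ramdac_mhz, width, height, refresh_hz):
    """Ratio of RAMDAC clock to active pixel rate; must exceed 1.0."""
    return ramdac_mhz / active_pixel_rate_mhz(width, height, refresh_hz)


if __name__ == "__main__":
    rate = active_pixel_rate_mhz(2048, 1536, 85)
    ratio = ramdac_headroom(350, 2048, 1536, 85)
    print(f"Active pixel rate: {rate:.1f} MHz")
    print(f"Headroom left for blanking: {100 * (ratio - 1):.0f}%")
```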

According to the GLoria II spec sheet on line at the ELSA Europe web site, the GLoria II is provided with a comprehensive software bundle:

  • High-performance graphics drivers for Windows NT 4.0, Windows 98 and Windows 2000
  • User-friendly tools in ELSA WINmanSuite
  • OpenGL drivers optimized for professional graphics applications
  • Dedicated applications driver for AutoCAD 2000/R14 and 3D Studio MAX/VIZ R3/2.x
  • Contains software DVD player ELSAmovie


Drivers

The ELSA NT 4.0 driver used for this review has a lot of useful set up options:

The ELSA driver (on the CDROM as 4.02.02.014_019 B) has profiles specifically tuned for the following 33 technical apps:  3Dstudio MAX, ALLPLAN/ALLPLOT, AnySim, AutoCad, AVS/Express, CADdy++, CADKEY, CATIA, Cinema4D, DataCAD, dVISE-NT, Exceed, FLOTHERM, Helix, HiCAD, HyperMesh, LightScape, LightWave3D, LOGOCAD/Triga, MicroStation, OpenInventor, Pro/Designer,  Pro/Engineer, Pro/Mechanica, SoftImage3.8, SolidDesigner, SolidEdge, SolidWorks, UniCenter TNG,  Unigraphics, VISI-CAD & VISI-CAM, Visplan, and World Tool Kit.



The Test

Windows NT SP6a Test System

Hardware

| Component | Details |
|---|---|
| CPU | Intel Pentium III 600 (Katmai) |
| Motherboard | Intel SE440BX2 |
| Memory | 3 x 128MB PC100 SDRAM - 384 MB Total |
| Hard Drive / Controller | 2 x Quantum Atlas III 9 GB – Wide Ultra2 SCSI / Adaptec AHA-2940U2W |
| CDROM | HP 8100 CD/RW |
| Video Card(s) | ELSA GLoria II Quadro SDR 64 MB (default clock - 135/166) |
| | Creative Labs Annihilator SDR 32 MB (GeForce) (default clock - 120/166) |
| | 3Dlabs Oxygen GVX1 AGP 32 MB (default clock) |
| | 3Dlabs Oxygen GVX1 PCI 32 MB (default clock) |
| | 3Dlabs VX1 32 MB (default clock) |
| | Diamond Viper 770U 32 MB (TNT2Ultra) (default clock - 150/183) |
| Ethernet | Intel 100Mbit PCI Ethernet Adapter |

Software

| Component | Details |
|---|---|
| Operating System | Windows NT 4 SP6a |
| Video Drivers | GLoria II Quadro SDR - ELSA Driver 4.02.02.014_19 B |
| | GLoria II Quadro SDR, GeForce, TNT2U - NVIDIA Detonator 3.65 |
| | Oxygen GVX1 - 3Dlabs Oxygen Driver 2.14-1060 |
| | Oxygen VX1 - 3Dlabs Oxygen Driver 2.15-0146b |

Benchmarking Applications

| Category | Applications |
|---|---|
| Technical | Indy3D ver. 3 |
| | OCUS R20 together with SPEC’s: GLperf 3.1.2, Viewperf 6.1.1 |
| | Pro/E Rel. 20 APC test |

The tests were performed at 1280x1024 truecolor with Vsync off.

Note that four categories of technical benchmark codes from simple to complex were used in this review:

1) SPEC’s GLperf 3.1.2 - A very simple OpenGL test tool useful for looking at the performance of specific OpenGL operations.

2) Indy3D Ver. 3 - Somewhat more complex code that tries to simulate real apps.

3) SPEC’s ViewPerf 6.1.1 - Test code that uses fragments of real apps with simulated data sets.

4) SPEC APC Pro/E 20 and OCUS R20 – Tests that require the actual application, Pro/E Ver. 20, to be run with test data.



SPEC’s GLperf

SPEC’s GLperf 3.1.2 is a good, low level, easy to use OpenGL test tool that one can use to look at how graphics cards/systems handle specific OpenGL functions.  This tool was used to investigate the tri-meshed strip performance of the cards under review.  Tri-meshed strips are one of the most efficient ways of presenting geometric data to be processed, in that they reduce the compute load for either the GPU or host CPU depending on the card’s design.  GLperf is driven by a script, and the relevant fragment of the script used for the initial test was:

TriangleStripTest {

(UserString printf("(%s,flat)",
                  ExecuteMode))
(ExecuteMode Immediate)
(DepthTest GL_LEQUAL)
(ObjsPerBeginEnd    4)
(Size from 1 to 512 step 100%)
(ShadeModel GL_FLAT)

}

This test script generates and plots flat shaded triangle strips with the triangle areas stepping from 1 pixel to 512 pixels. The test output was put into chart form:

This chart shows triangle performance (triangles per sec.) as a function of pixels per triangle. The two main constraints on card performance are evident in the plot’s shape.  A card’s peak geometry T&L rate defines a horizontal line and the card’s peak fill rate locates the diagonal line representing constant maximum fill rate.  On the flat horizontal portion of the curve the card is triangle rate limited and on the sloped portion of the curve it is fill rate limited.

The GLoria II (Quadro) with the ELSA driver and the CL Annihilator (GeForce) with the NVIDIA 3.65 driver have almost exactly the same performance and are considerably better than any of the other cards.  The AGP version of the GVX1 is better than the PCI version.  Since the VX1 and the TNT2Ultra use the host CPU for T&L, their peak triangle rates were about the same.  The value of an on card GPU is clearly seen in this chart: the GVX1, GeForce and Quadro are clearly faster on peak triangle rate than the other cards that depend on the host CPU.

Notice that for the best of the lot (Quadro and GeForce), the peak triangle rate was only about 9 million triangles per second and the effective fill rate for simple flat shading was about 100 million pixels per second.  Welcome to the real world - so much for the PR materials.
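The two-regime shape of the curve can be captured by a very simple model: the delivered triangle rate is whichever is lower, the peak geometry rate or the fill rate divided by the triangle size. The sketch below plugs in the approximate measured figures quoted above (9 million triangles per second, 100 million pixels per second). It is an idealization; a real curve rounds off the corner between the two regimes, so measured values near the knee will sit below the model.

```python
# Idealized two-limit model of triangle throughput.
PEAK_TRIANGLES_PER_SEC = 9e6   # approx. measured peak quoted in the text
FILL_PIXELS_PER_SEC = 100e6    # approx. measured flat-shaded fill rate


def triangle_rate(pixels_per_triangle,
                  peak=PEAK_TRIANGLES_PER_SEC,
                  fill=FILL_PIXELS_PER_SEC):
    """Triangles/s under whichever limit (geometry or fill) binds first."""
    return min(peak, fill / pixels_per_triangle)


def crossover_size(peak=PEAK_TRIANGLES_PER_SEC, fill=FILL_PIXELS_PER_SEC):
    """Triangle size (pixels) where the card becomes fill rate limited."""
    return fill / peak


if __name__ == "__main__":
    sizes = [2 ** k for k in range(10)]  # 1, 2, 4, ..., 512 pixels
    for size in sizes:
        rate = triangle_rate(size)
        regime = "triangle" if size < crossover_size() else "fill"
        print(f"{size:4d} px  {rate / 1e6:6.2f} Mtri/s  ({regime} limited)")
```

With these inputs the model places the knee at roughly 11 pixels per triangle: smaller triangles are geometry limited, larger ones fill limited.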



As one moves from simple flat shading to more complex shading and textures, the performance drops.  A test was performed on the GLoria II (Quadro) and GVX1 – AGP with tri-linear textures and one infinite light.  The GLperf script fragment was:

TriangleStripTest {

    (UserString printf("Triangle Strip (%s, 64x64 RGB trilinear modulated texture, smooth, 1 inf light)", ExecuteMode, Size))
    (ExecuteMode Immediate )
    (DepthTest GL_LEQUAL)
    (ObjsPerBeginEnd    4)
    (Size from 1 to 512 step 100%)
    (NormalData PerVertex)
    (TexTarget GL_TEXTURE_2D)
    (TexWidth 64)
    (TexHeight 64)
    (TexComps 3)
    (TexLOD 3)
    (TexMagFilter GL_LINEAR)
    (TexMinFilter GL_LINEAR_MIPMAP_LINEAR)
    (TexFunc GL_MODULATE)
    (NormalData PerVertex)
    (TexData PerVertex)
    (ShadeModel GL_SMOOTH)   
    (InfiniteLights 1)

}

The results are compared to the flat shaded performance in the following chart:

The peak triangle rates drop from about 9 million per second to a little over 3 million per second for the GLoria II!  The impact on the GVX1 performance was about a 40% reduction.

What performance does one need? Let’s suppose we wanted to spin smoothly (at 30 fps) a large flat shaded MCAD model (500,000 polys at 30 pixels per poly); this would require performance of 15 million polys per second at 30 pixels per triangle size.  The best card tested (GLoria II Quadro SDR) offers only about 18% of what is needed, so we have some way to go in order to get the performance needed for this class of problem.
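The requirement above is easy to reproduce. The sketch below works out the triangle rate and the matching pixel rate for the 500,000 polygon example, and converts the 18% figure back into an absolute rate. The function names are illustrative.

```python
def required_triangle_rate(polygons, fps):
    """Triangles per second needed to redraw the whole model every frame."""
    return polygons * fps


def required_fill_rate(polygons, fps, pixels_per_polygon):
    """Pixels per second implied by the same workload."""
    return required_triangle_rate(polygons, fps) * pixels_per_polygon


def delivered_rate(fraction_of_required, polygons, fps):
    """Absolute rate corresponding to a quoted fraction of the requirement."""
    return fraction_of_required * required_triangle_rate(polygons, fps)


if __name__ == "__main__":
    tri = required_triangle_rate(500_000, 30)
    fill = required_fill_rate(500_000, 30, 30)
    got = delivered_rate(0.18, 500_000, 30)
    print(f"Required: {tri / 1e6:.0f} Mtri/s, {fill / 1e6:.0f} Mpixels/s")
    print(f"18% of required: {got / 1e6:.1f} Mtri/s")
```

The implied pixel rate of 450 million per second is several times the roughly 100 million pixels per second measured in the flat shaded test, which suggests that fill rate rather than geometry is the binding limit at this triangle size.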



Other reviewers have noticed problems with anti-aliased (AA) wireframe performance for the GeForce equipped cards; the GLoria II with the Quadro GPU should not have this problem.  GLperf was used to investigate the AA line performance of the cards used in this review.  The chart below shows the results for 32 pixel flat shaded AA lines and illustrates the GeForce/driver issue:

In this test, the GLoria II Quadro SDR performed best and was about twice as fast as the next best (GVX1 – AGP).  However, the GeForce needs a lot more zeros added to its score to be competitive!  The GeForce’s 100 lines per second is less than one got in 1985 on a Tektronix 4014 graphics terminal at 9600 baud.  It’s interesting to note that the GeForce’s performance here is considerably lower than that of its predecessor, the TNT2 Ultra.  It’s obvious that NVIDIA was trying to distinguish the GeForce as clearly a gaming solution and not suited for professional applications.  Hopefully, NVIDIA will fix this problem with the next driver release, but more realistically, we won’t see NVIDIA pay any attention to this problem with the GeForce.  Their most likely response to the issue would be “buy a Quadro.”

Indy3D Version 3

The Indy3D graphics evaluation tool was constructed by Sense8 to aid end-users in evaluating OpenGL 3D performance issues using application-focused metrics (MCAD, animation and simulation markets), image quality stress tests and 3D primitive metrics. The four application sub tests were used in this review to measure the cards’ performance.  The tests were performed with the standard Indy3D default settings.  The screen resolution was 1280x1024 truecolor with Vsync off.

First, a word about the driver profiles used for these tests.  For the ELSA and 3Dlabs drivers, the four tests (MCAD40, MCAD150, Animation, Simulation) were run with the Pro/E profile. In addition, the Animation and Simulation tests were run again using the Lightscape profile.  The best scores were logged. The NVIDIA 3.65 driver did not have any such tuning options.

The scores are in fps with the best scores shown in bold.

Indy3D – Ver 3

| Card | MCAD40 (fps) | MCAD150 (fps) | MCAD150 WF/Poly (fps/fps) | Animation (fps) | Simulation (fps) |
|---|---|---|---|---|---|
| ELSA GLoria II SDR Quadro – ELSA Driver | 36.32 | 13.74 | 5.30/17.96 | 41.68 | 48.12 |
| ELSA GLoria II SDR Quadro – NVIDIA 3.65 | 36.93 | 13.88 | 5.77/17.93 | 38.17 | 50.84 |
| CL Annihilator SDR GeForce – NVIDIA 3.65 | 30.33 | 11.31 | 1.56/16.19 | 38.17 | 49.92 |
| 3Dlabs Oxygen GVX1, AGP | 19.06 | 5.71 | 3.52/6.80 | 20.79 | 29.33 |
| 3Dlabs Oxygen GVX1 – PCI | 17.17 | 5.02 | 2.46/6.30 | 19.29 | 25.98 |
| 3Dlabs Oxygen VX1 | 11.54 | 3.68 | 1.40/4.82 | 14.22 | 22.56 |
| Diamond Viper 770 Ultra TNT2 – NVIDIA 3.65 | 11.57 | 3.65 | 1.94/4.51 | 11.94 | 35.89 |

When looking at these scores remember that this test suite consists of simulated app code, and the models aren’t very large or complex.  It was designed to be an easy to run, get the numbers quick, test tool.  Shown in the results table is a breakout of the wireframe (WF) and polygon (Poly) performance in the MCAD150 sub test. 

The GLoria II (Quadro) clearly beat all the others for this suite of tests.  The GLoria II (Quadro) has a huge advantage over the GeForce in wireframe drawing speed.  In the other areas, the GeForce was very close to the GLoria II in performance.  The GVX1, VX1 and TNT2U cards were much slower than the GLoria II on this test suite.
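To put numbers on the wireframe gap, the sketch below computes Quadro-over-GeForce ratios from the Indy3D results, using the rows where both cards ran the same NVIDIA 3.65 driver so that the driver is not a variable. The dictionary layout is just a convenient way to hold the published scores.

```python
# Indy3D scores (fps) with the NVIDIA 3.65 driver on both cards.
QUADRO = {"MCAD40": 36.93, "MCAD150": 13.88,
          "MCAD150 wireframe": 5.77, "MCAD150 polygon": 17.93}
GEFORCE = {"MCAD40": 30.33, "MCAD150": 11.31,
           "MCAD150 wireframe": 1.56, "MCAD150 polygon": 16.19}


def speedup(test, fast=QUADRO, slow=GEFORCE):
    """How many times faster `fast` is than `slow` on the named test."""
    return fast[test] / slow[test]


if __name__ == "__main__":
    for test in QUADRO:
        print(f"{test:18s} Quadro/GeForce = {speedup(test):.2f}x")
```

The wireframe ratio comes out at well over three to one, while the shaded polygon ratio is only about 1.1, in line with the clock speed difference between the two chips.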

For the GLoria II card, the NVIDIA 3.65 driver was slightly faster than the ELSA driver but was less stable than the ELSA driver when running this benchmark suite.  Several crashes and hangs occurred with this driver/card combo.  The ELSA driver was very stable just like the 3Dlabs drivers.

As one would expect, the PCI flavor of the GVX1 was somewhat slower (10%-25%) than the normal AGP version of the GVX1.



SPEC’s ViewPerf - ProDesigner Viewset (ProCDRS-02)

The SPEC organization viewset, ProCDRS-02, is a replacement for the much used CDRS viewset.  It is intended to model the graphics performance of PTC's Pro/Designer software.  Unlike the CDRS, the ProCDRS-02 data set is large (a trunk); for the shaded model 281,000 vertices and 131,000 triangles and for the wireframe 202,000 vertices and 184,000 lines. 

There are 10 tests in this test suite; the first 2 tests are AA wireframe and the other 8 are shaded/rendered tests.  

Tests 1 & 2 - AA Wireframe
Tests 3 & 4 - Shaded
Tests 5 & 6 - Shaded with texture
Tests 7 & 8 - Shaded with texture & dynamic reflections
Tests 9 & 10 - Shaded with color per vertex

The tests were run at 1280x1024 truecolor with Vsync off. The Pro/E profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.

ProCDRS-02 results (per-test values in fps):

| Test # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Composite Score | Shaded 3+4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ProCDRS-02 Weight | 25% | 25% | 10% | 10% | 5% | 5% | 3% | 3% | 7% | 7% | | |
| GLoria II SDR Quadro – ELSA Driver | 29.8 | 33.2 | 22.7 | 24.7 | 18.2 | 20.0 | 17.7 | 18.3 | 16.4 | 18.8 | 25.19 | 47.4 |
| GLoria II SDR Quadro – NVIDIA 3.65 | 25.8 | 29.9 | 16.2 | 18.3 | 13.8 | 15.6 | 15.5 | 16.4 | 12.8 | 14.8 | 20.76 | 34.5 |
| CL Annihilator 256 SDR - GeForce | Crashed | | | | | | | | | | -- | -- |
| 3Dlabs GVX1 - AGP | 20.1 | 16.6 | 15.8 | 15.8 | 12.8 | 12.7 | 12.9 | 13.5 | 10.2 | 12.3 | 15.68 | 31.6 |
| 3Dlabs GVX1 - PCI | 16.5 | 14.2 | 11.4 | 12.7 | 8.25 | 9.60 | 8.27 | 10.2 | 10.1 | 11.7 | 12.77 | 24.1 |
| 3Dlabs VX1 | 7.31 | 7.78 | 7.08 | 8.26 | 5.72 | 6.75 | 4.56 | 5.95 | 4.18 | 5.57 | 6.815 | 15.34 |
| Diamond Viper 770 Ultra - TNT2 | 10.4 | 14.3 | 15.2 | 17.2 | 14.3 | 15.8 | 13.3 | 15.5 | 12.9 | 14.9 | 13.55 | 32.4 |

 

The test suite measures the frame rate in fps for each of the tests.  Also measured in this suite is the display list build time (DLB) for each of the sub tests, but this was not a significant factor for any of the tested cards.

One could reasonably argue that the weight factors for this test are too heavily biased toward sub tests 1 and 2, which are AA wireframe views. Thus, in addition to the composite score, the sum of the test 3 and 4 results for each card tested is presented.  Tests 3 and 4 are relevant for most MCAD users who like to work with shaded views.
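For readers who want to see how the weights feed into the composite, SPEC's Viewperf composites are weighted geometric means of the per-test frame rates. The review does not spell out the formula, so treat the sketch below as a reconstruction; it does reproduce the published composite for the GLoria II with the ELSA driver to within rounding of the per-test values.

```python
import math

# ProCDRS-02 weights (percent) for tests 1..10, as listed in the review.
WEIGHTS = [25, 25, 10, 10, 5, 5, 3, 3, 7, 7]

# Published per-test results (fps) for the GLoria II with the ELSA driver.
GLORIA_ELSA_FPS = [29.8, 33.2, 22.7, 24.7, 18.2, 20.0, 17.7, 18.3, 16.4, 18.8]


def weighted_geometric_mean(scores, weights):
    """exp of the weight-averaged log scores; weights need not sum to 1."""
    total = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total)


def shaded_sum(scores):
    """The review's alternative metric: test 3 plus test 4."""
    return scores[2] + scores[3]


if __name__ == "__main__":
    composite = weighted_geometric_mean(GLORIA_ELSA_FPS, WEIGHTS)
    print(f"Composite: {composite:.2f} (published: 25.19)")
    print(f"Test 3+4 : {shaded_sum(GLORIA_ELSA_FPS):.1f} (published: 47.4)")
```

Because tests 1 and 2 carry half of the total weight, a card with weak AA wireframe performance is penalized heavily in the composite even if its shaded results are competitive, which is the reason for also reporting the Test 3+4 sum.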

The GLoria II Quadro with the ELSA driver was the clear winner in this test.  The GLoria II - NVIDIA 3.65 driver combination was about 20%-25% slower than the ELSA driver on some of the key subtests. The GLoria II was considerably faster than the other cards tested. On the important Test 3+4 metric, the GLoria II was roughly 50% faster than the next fastest cards (TNT2 Ultra and GVX1 AGP).

The CL Annihilator (GeForce) SDR – NVIDIA 3.65 driver combo hung and then crashed on the first sub test (AA line) with a blue screen.  As a side note, on another system (Athlon 650, Gigabyte MB, Guillemot GeForce card) and what appeared to be a 3.59 driver, ProCDRS-02 ran but was very slow on the first two sub tests. 

The highest performing NT systems reported at the SPEC site score about 50% better on the composite and the Test 3+Test 4 metric than the GLoria II but they used Intel 700/733 Coppermine processors and were expensive systems.



SPEC’s ViewPerf - DesignReview Viewset (DRV-06)

A description from SPEC’s site:

“DesignReview is a 3D computer model review package specifically tailored for plant design models consisting of piping, equipment and structural elements such as I-beams, HVAC ducting, and electrical raceways. It allows flexible viewing and manipulation of the model for helping the design team visually track progress, identify interference, locate components, and facilitate project approvals by presenting clear presentations that technical and non-technical audiences can understand. 

“The DRV-06 model in this viewset is a subset of the 3D plant model made for the GYDA offshore oil production platform located in the North Sea. DesignReview works from a memory-resident representation of the model that is composed of high-order objects such as pipes, elbows valves, and I-beams. During a plant walkthrough, each view is rendered by transforming these high-order objects to triangle strips or line strips. Tolerancing of each object is done dynamically and only triangles that are front facing are generated. This is apparent in the viewset model as it is rotated. Most DesignReview models are greater than 50 megabytes and are stored as high-order objects. For this reason and for the benefit of dynamic tolerancing and face culling, display lists are not used.”

The test data set is shown below:

There are 7 tests specified by the SPEC DRV-06 viewset that represent the most common operations performed by DesignReview. These tests are as follows:

| Test | Weight | DRV functionality represented |
|---|---|---|
| 1 | 45% | Walkthrough rendering of curved surfaces. Each curved object (i.e., pipe, elbow) is rendered as a triangle mesh, depth-buffered, smooth-shaded, with one light and a different color per primitive. |
| 2 | 30% | Walkthrough rendering of flat surfaces. This is treated as a different test than #1 because normals are sent per facet and a flat shade model is used. |
| 3 | 8% | For more realism, objects in the model can be textured. This test textures the curved model with linear blending and mipmaps. |
| 4 | 5% | Texturing applied to the flat model. |
| 5 | 4% | As an additional way to help visual identification and location of objects, the model may have "screen door" transparency applied. This requires the addition of polygon stippling to test #2 above. |
| 6 | 4% | To easily spot rendered objects within a complex model, the objects to be identified are rendered as solid and the rest of the view is rendered as a wireframe (line strips). The line strips are depth-buffered, flat-shaded and unlit. Colors are sent per primitive. |
| 7 | 4% | Two other views are present on the screen to help the user select a model orientation. These views display the position and orientation of the viewer. A wireframe, orthographic projection of the model is used. Depth buffering is not used, so multithreading cannot be used; this preserves draw order. |

The tests were performed at 1280x1024 truecolor with Vsync off. The Pro/E profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.  The scores were: 

DRV-06 results (per-test values in fps):

| Test # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Composite Score |
|---|---|---|---|---|---|---|---|---|
| DRV-06 Weight | 45% | 30% | 8% | 5% | 4% | 4% | 4% | |
| ELSA GLoria II SDR Quadro – ELSA Driver | 23.6 | 33.4 | 17.5 | 20.4 | 33.6 | 13.8 | 13.8 | 24.67 |
| ELSA GLoria II SDR Quadro – NVIDIA 3.65 | 24.7 | 31.6 | 17.9 | 19.9 | 30.8 | 12.7 | 13.9 | 24.62 |
| CL Annihilator 256 SDR - GeForce | 24.4 | 30.9 | 17.9 | 19.7 | 29.8 | 12.7 | 13.9 | 24.27 |
| 3Dlabs GVX1 - AGP | 15.4 | 17.9 | 13.2 | 13.4 | 18.4 | 9.35 | 10.5 | 15.36 |
| 3Dlabs GVX1 - PCI | 13.2 | 16.0 | 10.1 | 11.7 | 16.7 | 7.87 | 9.20 | 13.26 |
| 3Dlabs VX1 | 7.80 | 7.65 | 6.66 | 6.72 | 7.86 | 4.75 | 5.08 | 7.327 |
| Diamond Viper 770 Ultra - TNT2 | 10.1 | 11.8 | 8.42 | 9.13 | 9.36 | 3.38 | 3.41 | 9.481 |

 

The GLoria II (Quadro) and GeForce equipped cards scored about the same and were about 60% faster than the next best card, the GVX1-AGP.  The GVX1-PCI was not much slower in this test than the AGP version of the same card.  The cards that lacked a GPU (VX1 and TNT2Ultra) performed the slowest on this test. 

Test data posted at the SPEC site shows that only two of the high end workstation systems (those using 700/733 MHz Intel Coppermines with the Intergraph Wildcat 4110 and NEC TE4E graphics systems) score better than the Quadro or GeForce, and then only slightly, by 2%-5%. In essence, a $250 GeForce card gives one the performance of the best $25K NT workstation for this test.  This is another example of where the Quadro does not offer a large enough performance improvement over the GeForce to justify its cost.



SPEC’s ViewPerf - Data Explorer (DX-05)

A description from SPEC’s site:

“The IBM Visualization Data Explorer (DX) is a general-purpose software package for scientific data visualization and analysis. It employs a data-flow driven client-server execution model and is currently available on Unix workstations from Silicon Graphics, IBM, Sun, Hewlett-Packard and Digital Equipment. The OpenGL port of Data Explorer was completed with the recent release of DX 2.1.

“The tests visualize a set of particle traces through a vector flow field. The width of each tube represents the magnitude of the velocity vector at that location. Data such as this might result from simulations of fluid flow through a constriction. The object represented contains about 1,000 triangle meshes containing approximately 100 vertices each. This is a medium-sized data set for DX.”

The tests assume z-buffering with one light in addition to specification of a color at every vertex. Triangle meshes are the primary primitives for this viewset. While Data Explorer allows for many other modes of interaction, these assumptions cover the majority of user interactions.

There are 10 sub tests in this viewset:

| Test | Weight | DX functionality represented |
|---|---|---|
| 1 | 40% | TMESH's immediate mode. |
| 2 | 20% | LINE's immediate mode. |
| 3 | 10% | TMESH's display listed. |
| 4 | 8% | POINT's immediate mode. |
| 5 | 5% | LINE's display listed. |
| 6 | 5% | TMESH's list with facet normals. |
| 7 | 5% | TMESH's with polygon stippling. |
| 8 | 2.5% | TMESH's with two sided lighting. |
| 9 | 2.5% | TMESH's clipped. |
| 10 | 2% | POINT's direct rendering display listed. |

All tests were performed at 1280x1024 truecolor with Vsync off. The Lightscape profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.  The test results were:

| Card (fps per test) | Test 1 (40%) | Test 2 (20%) | Test 3 (10%) | Test 4 (8%) | Test 5 (5%) | Test 6 (5%) | Test 7 (5%) | Test 8 (2.5%) | Test 9 (2.5%) | Test 10 (2%) | Composite score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ELSA GLoria II SDR Quadro - ELSA driver | 28.8 | 28.9 | 40.3 | 28.8 | 43.7 | 42.0 | 29.1 | 17.7 | 28.8 | 42.7 | 30.90 |
| ELSA GLoria II SDR Quadro - NVIDIA 3.65 driver | 29.6 | 29.9 | 38.9 | 29.4 | 42.0 | 42.6 | 30.2 | 18.4 | 29.7 | 41.3 | 31.44 |
| CL Annihilator 256 SDR - GeForce | 29.8 | 30.0 | 39.3 | 30.2 | 42.2 | 41.8 | 29.7 | 18.3 | 30.1 | 42.0 | 31.62 |
| 3Dlabs GVX1 - AGP | 24.9 | 25.1 | 27.6 | 25.4 | 25.7 | 26.3 | 25.0 | 18.5 | 25.1 | 28.0 | 25.23 |
| 3Dlabs GVX1 - PCI | 16.5 | 15.2 | 16.5 | 14.2 | 15.2 | 16.5 | 16.6 | 16.4 | 18.2 | 14.2 | 15.97 |
| 3Dlabs VX1 | 12.8 | 13.3 | 16.2 | 15.2 | 18.1 | 14.5 | 13.0 | 9.76 | 13.6 | 24.0 | 13.82 |
| Diamond Viper 770 Ultra - TNT2 | 16.9 | 10.6 | 22.8 | 10.8 | 13.5 | 19.8 | 15.5 | 5.03 | 17.7 | 13.2 | 14.68 |

 

The GLoria II (Quadro) and GeForce equipped cards scored about the same and, going by the composite scores, were roughly 22% to 25% faster than the next best card, the GVX1-AGP.  The GVX1-PCI was much slower than the AGP version of the same card.  In fact, the GVX1-PCI, VX1 and TNT2Ultra cards performed about the same on this test.  Test data posted at the SPEC site shows that two of the higher end workstation systems, which use the Intergraph Wildcat 4110 and NEC TE4E graphics systems, score about 100% better than the GLoria II (Quadro) or GeForce.
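For readers who want to check the composite column themselves: ViewPerf composites are, to the best of our knowledge, weighted geometric means of the per-test frame rates. The short Python sketch below assumes that formula and uses the weights from the viewset definition together with the GLoria II (ELSA driver) row from the results. The function and variable names are ours, not SPEC's.

```python
import math

# Weights for the 10 DX-05 tests, as listed in the viewset definition.
WEIGHTS = [0.40, 0.20, 0.10, 0.08, 0.05, 0.05, 0.05, 0.025, 0.025, 0.02]

# Per-test frame rates for the GLoria II with the ELSA driver (from the results).
GLORIA_ELSA_FPS = [28.8, 28.9, 40.3, 28.8, 43.7, 42.0, 29.1, 17.7, 28.8, 42.7]


def weighted_geometric_mean(values, weights):
    """Return exp(sum(w * ln(v)) / sum(w)) for strictly positive values."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


composite = weighted_geometric_mean(GLORIA_ELSA_FPS, WEIGHTS)
print(f"DX-05 composite (assumed formula): {composite:.2f}")
```

Under this assumption the result lands within a few hundredths of the published 30.90. A geometric mean also explains why one very slow subtest, such as test 8, pulls the composite down more than an arithmetic average would.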



SPEC’s ViewPerf - Advanced Visualizer (AWadvs-03)

SPEC's AWadvs-03 viewset represents the most common operations performed by Advanced Visualizer.  Advanced Visualizer from Alias/Wavefront is an integrated workstation-based 3D animation system that offers a comprehensive set of tools for 3D modeling, animation, rendering, image composition, and video output.

All operations within Advanced Visualizer are performed in immediate mode with double buffered windows.  The test data set is illustrated below:

These are the 10 tests specified by SPEC's AWadvs-03 viewset:

| Test | Weight | Advanced Visualizer functionality represented |
|---|---|---|
| 1 | 41.8% | Material shading of polygonal animation model with highest interactive image fidelity and perspective projection. |
| 2 | 28.5% | Wireframe rendering of polygonal animation model with perspective projection. |
| 3 | 10.45% | Material shading of polygonal animation model with lowest interactive image fidelity and perspective projection. |
| 4 | 9.5% | Smooth shading of polygonal animation model with perspective projection. |
| 5 | 4.75% | Flat shading of polygonal animation model with perspective projection. |
| 6 | 2.2% | Material shading of polygonal animation model with highest interactive image fidelity and orthogonal projection. |
| 7 | 1.5% | Wireframe rendering of polygonal animation model with orthogonal projection. |
| 8 | 0.55% | Material shading of polygonal animation model with lowest interactive image fidelity and orthogonal projection. |
| 9 | 0.5% | Smooth shading of polygonal animation model with orthogonal projection. |
| 10 | 0.25% | Flat shading of polygonal animation model with orthogonal projection. |

 

All tests were performed at 1280x1024 truecolor with Vsync off. The Lightscape profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.  The test results were:

| Card (fps per test) | Test 1 (41.8%) | Test 2 (28.5%) | Test 3 (10.45%) | Test 4 (9.5%) | Test 5 (4.75%) | Test 6 (2.2%) | Test 7 (1.5%) | Test 8 (0.55%) | Test 9 (0.50%) | Test 10 (0.25%) | Composite score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ELSA GLoria II SDR Quadro - ELSA driver | 59.2 | 129 | 59.2 | 60.9 | 60.9 | 59.2 | 129 | 59.2 | 60.9 | 60.9 | 75.10 |
| ELSA GLoria II SDR Quadro - NVIDIA 3.65 driver | 59.2 | 129 | 59.2 | 60.9 | 60.9 | 59.3 | 129 | 59.2 | 60.9 | 60.9 | 75.10 |
| CL Annihilator SDR - GeForce | 54.1 | 120 | 54.0 | 55.5 | 55.4 | 54.1 | 120 | 54.1 | 55.6 | 55.6 | 68.95 |
| 3Dlabs GVX1 - AGP | 16.5 | 41.7 | 16.5 | 16.9 | 41.5 | 16.2 | 41.6 | 16.2 | 16.6 | 41.1 | 22.86 |
| 3Dlabs GVX1 - PCI | 15.5 | 30.9 | 15.5 | 16.5 | 39.1 | 15.3 | 30.8 | 15.3 | 16.3 | 38.8 | 20.08 |
| 3Dlabs Oxygen VX1 | 17.6 | 36.8 | 17.7 | 21.0 | 21.5 | 18.0 | 38.3 | 18.0 | 21.3 | 22.2 | 22.62 |
| Diamond Viper 770 Ultra - TNT2 | 12.2 | 16.7 | 12.1 | 13.1 | 13.0 | 12.1 | 16.7 | 11.9 | 12.9 | 12.9 | 13.53 |

 

The GLoria II (Quadro) and GeForce equipped cards performed quite well on this benchmark suite; in fact, the GLoria II scores were better than any of the system scores recently published at SPEC except those of the high-cost NEC TE4E system (which scored 93.4 composite).  The ELSA GLoria II with either driver was more than 3 times faster than the GVX1-AGP and about 5.5 times faster than the TNT2Ultra.  It looks like the 50 GFLOPS on the GPU are being used effectively.  The three 3Dlabs cards (GVX1-AGP, GVX1-PCI and VX1) all scored within about 14% of each other.  The TNT2U card was much slower than any of the other cards on this test.
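The "times faster" figures quoted here are simple ratios of the composite scores. As a quick sanity check, the following Python sketch recomputes them from the composite column of the AWadvs-03 results; the dictionary keys and the helper name `speedup` are our own shorthand.

```python
# AWadvs-03 composite scores copied from the results above.
COMPOSITES = {
    "GLoria II (ELSA driver)": 75.10,
    "CL Annihilator (GeForce)": 68.95,
    "GVX1 AGP": 22.86,
    "GVX1 PCI": 20.08,
    "Oxygen VX1": 22.62,
    "Viper 770 Ultra (TNT2)": 13.53,
}


def speedup(card, baseline, scores=COMPOSITES):
    """How many times higher `card` scores than `baseline`."""
    return scores[card] / scores[baseline]


for baseline in ("GVX1 AGP", "Viper 770 Ultra (TNT2)"):
    ratio = speedup("GLoria II (ELSA driver)", baseline)
    print(f"GLoria II vs {baseline}: {ratio:.2f}x")
```

Because higher frame rates are better, the faster card goes in the numerator; for run times measured in seconds the ratio would be inverted.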



SPEC’s ViewPerf - Lightscape-03

The Lightscape Visualization System from Discreet Logic combines proprietary radiosity algorithms with a physically based lighting interface.  A sample screen shot is shown:




There are four tests specified by the viewset that represent the most common operations performed by the Lightscape Visualization System:

| Test | Weight | Functionality represented |
|---|---|---|
| 1 | 25% | Walkthrough wireframe rendering of "Cornell Box" model using line loops with colors supplied per vertex. |
| 2 | 25% | Full-screen walkthrough solid rendering of "Cornell Box" model using smooth-shaded z-buffered quads with colors supplied per vertex. |
| 3 | 25% | Walkthrough wireframe rendering of 750K-quad Parliament Building model using line loops with colors supplied per vertex. |
| 4 | 25% | Full-screen walkthrough solid rendering of 750K-quad Parliament Building model using smooth-shaded z-buffered quads with colors supplied per vertex. |

 

The SPEC Lightscape-03 viewset tests were performed at 1280x1024, truecolor, Vsync off. The Lightscape profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.

| Card (fps per test) | Test 1 (25%) | Test 2 (25%) | Test 3 (25%) | Test 4 (25%) | Composite score |
|---|---|---|---|---|---|
| ELSA GLoria II SDR Quadro - ELSA driver | 4.37 | 4.97 | 2.49 | 2.80 | 3.508 |
| ELSA GLoria II SDR Quadro - NVIDIA 3.65 driver | 4.49 | 4.90 | 2.52 | 2.80 | 3.530 |
| CL Annihilator SDR GeForce | 4.64 | 5.03 | 2.64 | 2.85 | 3.640 |
| 3Dlabs Oxygen GVX1 - AGP | 3.11 | 3.58 | 1.99 | 2.18 | 2.636 |
| 3Dlabs Oxygen GVX1 - PCI | 2.32 | 2.78 | 1.56 | 1.73 | 2.043 |
| 3Dlabs Oxygen VX1 | 2.02 | 2.51 | 1.21 | 1.56 | 1.759 |
| Diamond Viper 770 Ultra TNT2 | 1.53 | 2.95 | 0.981 | 1.68 | 1.651 |

 

The GLoria II (Quadro) and GeForce equipped cards scored about the same and were about 33% to 38% faster than the next best card, the GVX1-AGP.  The GVX1-PCI was a lot slower than the AGP version of the same card.  The VX1 and TNT2Ultra cards were the slowest on this test.  Test data posted at the SPEC site shows that only the higher end NT workstation systems, which use the Intergraph Wildcat 4110 and NEC TE4E graphics systems, score better (by about 40%) than the GLoria II (Quadro) or GeForce cards.



OCUS R20 Pro/E Test

The OCUS R20 benchmark, developed by Olaf Corten and available at his site, is a good small-to-medium-size Pro/E benchmark and typically runs in about 8 to 12 minutes on these cards.  The test consists of 17 subtests that run via a Pro/E trail file; OCUS roughly categorizes the tests into CPU, Graphics, GUI and I/O components as shown below. Note that this suite tests not only the graphics component but also the CPU and I/O systems.

The test conditions were at 1280x1024 truecolor, Vsync off. The Pro/E profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options. The test scores are shown below (with the best scores shown in bold):

 

All the cards with on-card T&L (GLoria II Quadro, GeForce, and the GVX1s) scored within about 5% of each other on the graphics portion of this test.  The two cards without on-card T&L, the VX1 and TNT2U, performed much worse (by about 40%). On-card T&L does make a lot of difference for some apps.

On the GUI component part of the test, all the tested cards, except for the TNT2U, were within a few seconds of each other.  The TNT2U was about 14% slower than the best score.

The ELSA driver performed about the same as the NVIDIA 3.65 driver when used with the GLoria II card.



SPEC’s APC Pro/E Ver. 20

The Pro/Engineer MCAD benchmark from the SPEC APC organization is available at the SPEC/APC web site.  This is a very large Pro/E test; the NT download is about 45MB in zipped form and it expands to about 240 MB of files.  The run times range from one to several hours.  A description from SPEC’s web page:

“SPEC/GPC's Application Performance Characterization (SPECapcSM) project group offers performance results and free downloads for a benchmark based on Pro/ENGINEER Rev. 20. The model used in the benchmark is a realistic rendering of a complete photocopy machine consisting of approximately 370,000 triangles.

“The benchmark comprises 17 tests. Startup and initialization time is measured, but given no weight (0.0) within the composite score for the benchmark. There are 16 graphics tests, each of which measures a different rendering mode or features. The first three graphics tests measure wireframe performance using the entire model. The next four measure different aspects of shaded performance, using the same model. Each of these tests executes exactly the same sequence of 3D transformations to provide a direct comparison of different rendering modes. The next four tests use a subassembly, and compare the two FASTHLR modes, the default shading mode, and shaded with edges. These tests also execute a common sequence of 3D transformations. The last five graphics tests use two different instances of the model - the first three without its outer skins (to illustrate the effect of FASTHLR and level-of-detail operations), and the last two to illustrate complex lighting modes and surface curvature display. The last test is an aggregate of all time not accounted for by the previous 16 tests, and is a mix of CPU and graphics operations.

“Scores are generated for all 17 tests. Composite numbers are provided for each set of graphics tests (shaded, sub-assembly, wireframe and other) and there is an overall composite score for graphics and CPU operations. Start-up and initiation time is not included in the composite score.”

Check out the SPEC/APC site for a more complete benchmark description and sample output from a benchmark run.

The Pro/E profile was selected for the ELSA driver and the 3Dlabs drivers; the NVIDIA 3.65 driver did not have any such tuning options.  Listed below are the run time results in seconds (the best times are shown in bold) obtained from this test suite:

 

The CL Annihilator (GeForce) with the NVIDIA 3.65 driver failed to complete the test sequence; it appeared to hang in test # 2 (Wireframe-Smooth).

Looking at these raw numbers, two things are apparent: 

1)       The ELSA GLoria II (Quadro) is the fastest card for most of the tested operations. 

2)       ELSA and NVIDIA need to fix the problem with their drivers that is evident when one looks at the times for test #6 (Shaded–clipped).



To generate the composite scores, SPEC used a dual Pentium II 300MHz system with 512 MB of memory and an Accel ECLIPSE graphics card as a reference system. The composite test scores are computed as ratios with respect to this system. The composite scores obtained from the tests were:

| Test system | ELSA GLoria II SDR | ELSA GLoria II SDR | CL Annihilator SDR | 3Dlabs GVX1 AGP | 3Dlabs GVX1 PCI | 3Dlabs VX1 | Diamond 770 Ultra - TNT2 |
|---|---|---|---|---|---|---|---|
| Driver | ELSA | 3.65 | 3.65 | 2.14-1060 | 2.14-1060 | 2.15-0146b | 3.65 |
| Overall composite score | 3.77 | 3.98 | Failed test | 3.41 | 3.19 | 1.79 | 1.70 |
| Wireframe composite | 5.18 | 5.27 | Failed test | 4.65 | 3.82 | 2.37 | 2.49 |
| Shaded composite | 2.48 | 3.13 | Failed test | 3.55 | 3.52 | 1.50 | 1.40 |
| Sub Assy composite | 4.16 | 4.07 | Failed test | 3.62 | 3.41 | 1.87 | 1.73 |
| Other composite | 4.66 | 4.69 | Failed test | 3.59 | 3.38 | 1.68 | 1.63 |

 

Even with the problems with test #6, the GLoria II Quadro SDR had the best overall composite scores, with the NVIDIA 3.65 driver scoring somewhat higher than the ELSA driver.  Once the clipping problem is fixed, the GLoria II should score in the mid 4s on the overall composite, which would be about 30% better than the 3Dlabs GVX1.  The GLoria II appears to be the clear winner for this test.

How does the ELSA GLoria II card compare with the somewhat higher cost (more than $8K) engineering workstations from HP, with the fx6+ graphics, and those that use the new Intergraph Wildcat 4110 graphics engine?  Using some of the more basic test scores combined with data from the SPEC site, we get the following:

| Test (run time in seconds) | ELSA GLoria II SDR Quadro (ELSA driver) | ELSA GLoria II SDR Quadro (3.65 driver) | IBM Intell. M Pro 733, Wild 4110 (from SPEC) | HP P-Class 700, fx6+ (from SPEC) |
|---|---|---|---|---|
| 2. Wireframe - smooth | 109 | 107 | 82 | 85 |
| 5. Shaded | 253 | 256 | 224 | 179 |

 

Note that this comparison is not exactly one to one, in that the workstations had faster Coppermine CPUs.  Even so, the GLoria II compares quite well with these more expensive systems.
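To put the run times above on the same footing as SPEC's ratio-style composites, one can divide a baseline system's time by each measured time, so that higher is better and the baseline scores 1.0. The Python sketch below does this with the HP P-Class system as the baseline. That choice of baseline is ours, purely for illustration; it is not SPEC's actual reference system, and the names in the code are our own shorthand.

```python
# Run times in seconds, copied from the comparison above (lower is better).
RUN_TIMES_SEC = {
    "Wireframe-smooth": {
        "GLoria II (ELSA)": 109,
        "GLoria II (3.65)": 107,
        "IBM M Pro 733 / Wildcat 4110": 82,
        "HP P-Class 700 / fx6+": 85,
    },
    "Shaded": {
        "GLoria II (ELSA)": 253,
        "GLoria II (3.65)": 256,
        "IBM M Pro 733 / Wildcat 4110": 224,
        "HP P-Class 700 / fx6+": 179,
    },
}

# Illustrative baseline only; SPEC's real reference is the dual Pentium II system.
BASELINE = "HP P-Class 700 / fx6+"


def normalized_scores(times, baseline=BASELINE):
    """Return baseline_time / system_time for every system (higher is better)."""
    return {name: times[baseline] / t for name, t in times.items()}


for test_name, times in RUN_TIMES_SEC.items():
    for system, score in normalized_scores(times).items():
        print(f"{test_name:17s} {system:30s} {score:.2f}")
```

On these two tests the GLoria II comes out at roughly 0.7 to 0.8 of the HP workstation, in line with the "compares quite well" reading given the price gap and the slower CPU in the test system.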



CPU Utilization

The use of either on-chip (GLoria II - Quadro) or on-card (GVX1) T&L processing should offload the host CPU to a considerable extent.  The CPU workload was measured during the running of the ProCDRS-02 test for the GLoria II (ELSA driver) and for the 3Dlabs GVX1 - AGP.  A chart of the results was surprising:

ProCDRS-02 consists of 10 subtests, and the CPU load profiles for each of the cards clearly show that fact.  The GLoria II (Quadro) uses almost all the CPU when running this test and no CPU offloading was observed.  On the other hand, the GVX1 uses almost no CPU (less than 10%) during the actual running of each subtest but does use some at the start of each of the subtests when the display list is being created.

The GLoria II finishes the test more quickly than the GVX1, but at the expense of using all the CPU.  This result means that the GLoria II’s performance should scale with CPU speed, while the GVX1’s performance should be somewhat independent of CPU speed for this type of test.
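One rough way to picture this scaling argument is a bottleneck model in which each frame takes as long as the slower of the CPU stage and the graphics stage. The Python sketch below is only an illustration under that assumption; the millisecond figures are made-up placeholders chosen to show the two cases, not measurements from these tests.

```python
def frame_time_ms(cpu_ms, gfx_ms, cpu_speedup=1.0):
    """Frame time when the slower stage (CPU or graphics card) sets the pace.

    cpu_ms is the CPU work per frame on the baseline CPU; a faster CPU
    divides it by cpu_speedup. gfx_ms is the graphics card's own work.
    """
    return max(cpu_ms / cpu_speedup, gfx_ms)


# Hypothetical per-frame costs, chosen only to illustrate the two cases.
CPU_BOUND_CARD = {"cpu_ms": 10.0, "gfx_ms": 4.0}   # driver keeps the CPU busy
OFFLOADED_CARD = {"cpu_ms": 1.0, "gfx_ms": 12.0}   # card does most of the work

for speedup in (1.0, 1.5, 2.0):
    bound = frame_time_ms(cpu_speedup=speedup, **CPU_BOUND_CARD)
    offloaded = frame_time_ms(cpu_speedup=speedup, **OFFLOADED_CARD)
    print(f"CPU x{speedup}: CPU-bound {bound:.1f} ms, offloaded {offloaded:.1f} ms")
```

In this toy model the CPU-bound card's frame time falls as the CPU gets faster, until the graphics stage becomes the limit, while the offloaded card's frame time does not move at all.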



Analysis/Conclusion

The Quadro and GeForce GPUs are remarkable achievements due to NVIDIA’s ability to put a very powerful T&L engine into a single chip graphics processor.  The performance of these two units is very similar except that the wireframe feature is not enabled or implemented effectively in the GeForce GPU. Some reasonable conclusions about the GLoria II Quadro SDR card can be inferred from the test data:

- The polygon performance of the GLoria II Quadro SDR and GeForce SDR cards is about the same.  The GLperf test results, and the Indy3D Simulation and Animation results support this conclusion.

- The GLoria II Quadro SDR does not have any problems with AA lines unlike the GeForce; in fact, on the GLperf AA line test the GLoria II was the fastest card.

- For the visualization class of tasks, the GLoria II Quadro SDR and the GeForce SDR are similar in performance and are much faster than any of the other tested cards.  The test results from the Indy3D Simulation and Animation tests and Viewperf’s DX-05, AWadvs-03, and Lightscape-03 tests support this view.

- On MCAD-type applications and tasks, the GLoria II Quadro SDR was the fastest card tested. On the large model in the SPECapc Pro/E 20 test suite, the GLoria II was fastest in most of the subtests. The GeForce has problems with AA lines, and when used with the NVIDIA 3.65 driver it could not complete the SPECapc Pro/E Rel. 20 test or the ProCDRS-02 test.

- The GLoria II Quadro SDR does make considerable use of the host CPU, as was illustrated by the test results; the upside to this is that its performance will scale with CPU speed.  Thus it will run even faster on the new generation of fast CPUs just coming to market.  Conversely, it might not be a good choice for use with an older, slower CPU.  This is a very interesting conclusion because you would think that the overhead placed on the CPU would have been offloaded almost completely onto the hardware T&L engine of the Quadro.

- The ELSA driver is a better choice to use with the GLoria II than the NVIDIA 3.65; the performance is about the same but the ELSA driver is more stable and has the tuning features that a professional user needs.  This is exactly why NVIDIA went to ELSA to manufacture Quadro-based boards and not a gaming card manufacturer like Creative Labs.

In the world of desktop systems and workstations, the best is usually the newest and the GLoria II Quadro SDR seems to live up to that standard. It is an excellent graphics card for professional applications and its competitors will have to work very hard to match or exceed its performance. 

If the advantages the Quadro holds over the GeForce don’t matter to you, then a GeForce based card may be a better overall option because of its lower cost.  Even if you go with the Quadro, the  $650 card earns its value by the incredible performance improvement it holds over previous title holders that fall in the $1000+ range. 
