Benchmarking Physics

We've had a lot of responses about the benchmarking procedures we used in our first PhysX article. We would like to clear up what we are trying to accomplish with our tests, and explain why we are doing things the way we are. Hopefully, by opening up a discussion of our approach to benchmarking, we can learn how to best serve the community with future tests of this technology.

First off, average FPS is a good measure of full system performance under games. Depending on how the system responds to the game over multiple resolutions, graphics cards and CPU speeds, we can usually get a good idea of the way the different components of a system impact an application's performance.

Unfortunately, when a new and underused product (like a physics accelerator) hits the market, the sharp lack of applications that make use of the hardware presents a problem to consumers attempting to evaluate its capabilities. In the case of AGEIA's PhysX card, the inability to test applications running with a full complement of physics effects in software mode really hampers our ability to draw solid conclusions.

In order to fill in the gaps in our testing, we would usually look towards synthetic benchmarks or development tools. At this point, the only synthetic benchmark we have is the boxes demo that is packaged with the AGEIA PhysX driver. The older tools, demos and benchmarks (such as 3DMark06) that use the PhysX SDK (formerly named Novodex) are not directly supported by the hardware (they would need to be patched somehow to enable support if possible).

Other, more current, demos will not run without hardware in the system (like CellFactor). The idea in these cases would be to stress the hardware as much as possible to find out what it can do. We would also like to find out how code running on the PhysX hardware compares to code running on a CPU (especially in a multiprocessor environment). Being able to control the number and type of physics objects to be handled would allow us to get a better idea of what we can expect in the future.

To fill in a couple of gaps, AGEIA states that the PhysX PPU is capable of handling over 533,000 convex object collisions per second and roughly 1,000X as many sphere collisions per second (about 530 million). This is quite difficult to relate back to real world performance, but it appears to be more work than a CPU or GPU could perform per second.
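One rough way to relate a per-second peak figure to what a game sees is to divide it by the framerate, which gives a per-frame budget. The sketch below does only that arithmetic with the convex collision figure quoted above. The target framerates are our own assumptions, and a quoted peak rate says nothing about what is sustained in a real game.

```python
# Per-frame collision budget implied by a quoted peak rate.
# The rate is the convex figure quoted in the text; the framerates
# below are assumed targets chosen only for illustration.
CONVEX_COLLISIONS_PER_SEC = 533_000


def collisions_per_frame(rate_per_sec: float, fps: float) -> float:
    """Peak collisions available within one frame at the given framerate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return rate_per_sec / fps


for fps in (30, 60, 120):
    budget = collisions_per_frame(CONVEX_COLLISIONS_PER_SEC, fps)
    print(f"{fps:>3} FPS: about {budget:,.0f} convex collisions per frame")
```

At 60 FPS this works out to a little under 9,000 convex collisions per frame at the quoted peak rate.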

Of course, there is no replacement for actual code, and (to the end user) hardware is only as good as the software that runs on it. This is the philosophy by which we live. We are dedicated first and foremost to the enthusiast who spends his or her hard earned money on computer hardware, and there is no substitute for real world performance in evaluating the usefulness of a tool.

Using FPS to benchmark the impact of PhysX on performance is not a perfect fit, but it isn't as bad as it could be. Frames per second (in an instantaneous sense) is one divided by the time it takes to render a single frame; we call that time the frametime. One divided by an average FPS is the average time it takes for a game to produce a finished frame. This takes into account the time it takes for a game to take in input, update game logic (with user input, AI, physics, event handling, script processing, etc.), and draw the frame via the GPU. Even though a single frame needs to travel the same path from start to finish, things like queuing multiple frames for rendering to the GPU (usually 3 at most) and multithreaded game programming are able to hide some of the overhead. Throw PhysX into the mix, and ideally we can offload some of this work somewhere else.
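The relationship between FPS and frametime described above can be written out in a few lines. This is a minimal sketch: the function names and the sample frametimes are hypothetical and are not measurements from our tests. It also shows why an average can hide a single slow frame.

```python
def frametime_ms(fps: float) -> float:
    """Frametime in milliseconds for a given framerate (1 / FPS)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def average_fps(frametimes_ms: list[float]) -> float:
    """Average FPS over a run: frames rendered divided by total time."""
    if not frametimes_ms:
        raise ValueError("need at least one frame")
    return 1000.0 * len(frametimes_ms) / sum(frametimes_ms)


# Hypothetical run: three smooth frames and one long frame.
sample_ms = [16.0, 16.0, 16.0, 100.0]

avg = average_fps(sample_ms)
worst_instant_fps = 1000.0 / max(sample_ms)

print(f"average FPS:          {avg:.1f}")
print(f"average frametime:    {frametime_ms(avg):.1f} ms")
print(f"slowest frame as FPS: {worst_instant_fps:.1f}")
```

In this made-up run the average works out to about 27 FPS, while the one long frame corresponds to an instantaneous 10 FPS. That gap is why looking at minimum framerates alongside the average is useful.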

Here are some examples of how frametime can be affected by a game. These are very limited examples and don't reflect the true complexity of game programming.

CPU limited situations:
CPU: |------------ Game logic ------------||----
GPU: |---- Graphics processing ----|       |----

The GPU must wait on the CPU to set up the next frame before it can start rendering. In this case, PhysX could help by reducing the CPU load and thus frametime.

Severely GPU limited situations:
CPU: |------ Game Logic ------|             |---
GPU: |-------- Graphics processing --------||---

The CPU can start work on the next frame before the GPU finishes, but any work after three frames ahead must be thrown out. In the extreme case, this can cause lag between user input and the graphics being displayed. In less severe cases, it is possible to keep the CPU more heavily loaded while the frametime still depends on the GPU alone.

In either case, as is currently being done in both City of Villains and Ghost Recon Advanced Warfighter, the PhysX card can ideally be added to create additional effects without adding to frametime or CPU/GPU load. Unfortunately, the real world is not ideal, and in both of these games we see an increase in frametime for at least a couple of frames. There are many reasons we could be seeing this right now, but it seems not to be as much of a problem for demos and games designed around the PPU.
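The two simplified cases above can be captured in a toy model. It assumes the CPU and GPU work on different frames at the same time, so the slower of the two sets the steady-state frametime. The `ppu_stall_ms` parameter is purely hypothetical: it stands for any time the CPU spends waiting on the physics card (driver overhead, bus transfers, or results that arrive late). All of the numbers are made up for illustration.

```python
def pipelined_frametime_ms(cpu_ms: float, gpu_ms: float,
                           ppu_stall_ms: float = 0.0) -> float:
    """Steady-state frametime when CPU and GPU overlap on different frames.

    The slower stage sets the pace. Any time the CPU spends stalled on
    the physics card is added to the CPU side (a modeling assumption).
    """
    return max(cpu_ms + ppu_stall_ms, gpu_ms)


# CPU limited: game logic takes longer than rendering.
print(pipelined_frametime_ms(cpu_ms=30, gpu_ms=20))   # 30

# Ideal offload: 10 ms of physics leaves the CPU and costs nothing.
print(pipelined_frametime_ms(cpu_ms=20, gpu_ms=20))   # 20

# GPU limited: lightening the CPU load does not change frametime.
print(pipelined_frametime_ms(cpu_ms=15, gpu_ms=30))   # 30

# Non-ideal offload: the CPU stalls 15 ms waiting on the physics card.
print(pipelined_frametime_ms(cpu_ms=20, gpu_ms=20, ppu_stall_ms=15))  # 35
```

The last case shows how an accelerator can reduce CPU work and still lengthen frametime if the wait for its results lands on the critical path.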

In our tests of PhysX technology in the games which currently make use of the hardware, multiple resolutions and CPU speeds have been tested in order to determine how the PhysX card factors into frametime. For instance, it was very clear in our initial GRAW test that the game was CPU limited at low resolutions because the framerate dropped significantly when running on a slower processor. Likewise, at high resolutions the GPU was limiting performance because the drop in processor speed didn't affect the framerate in a very significant way. In all cases, after adding the PhysX card, we were easily able to see that frametime was most significantly limited by either the PhysX hardware itself, AGEIA driver overhead, or the PCI bus.
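The reasoning we use to tell CPU limited from GPU limited settings can be sketched as a simple comparison: if the framerate falls noticeably when the CPU is slowed, the CPU is the limit at that setting. The 10% threshold and the sample framerates below are assumptions chosen for illustration. They are not results from our GRAW testing.

```python
def classify_bottleneck(fps_fast_cpu: float, fps_slow_cpu: float,
                        threshold: float = 0.10) -> str:
    """Label a setting by how much FPS drops when the CPU is slowed.

    threshold is the fractional drop treated as significant
    (an assumed cutoff, not a standard value).
    """
    if fps_fast_cpu <= 0:
        raise ValueError("fps_fast_cpu must be positive")
    drop = (fps_fast_cpu - fps_slow_cpu) / fps_fast_cpu
    return "CPU limited" if drop > threshold else "not CPU limited (likely GPU)"


# Hypothetical (fast CPU FPS, slow CPU FPS) pairs per resolution.
results = {
    "800x600": (60.0, 42.0),
    "1600x1200": (31.0, 30.0),
}

for resolution, (fast, slow) in results.items():
    print(f"{resolution}: {classify_bottleneck(fast, slow)}")
```

A setting that shows little change with either a slower CPU or a lower resolution would point at something else in the system, which is the situation we describe after adding the PhysX card.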

Ideally, the PhysX PPU will not only reduce the load on the CPU (or GPU) by offloading the processing of physics code, but will also give developers the ability to perform even more physics calculations in parallel with the CPU and GPU. This solution absolutely has the potential to be more powerful than moving physics processing to the GPU or a second core on a CPU. Not only that, but the CPU and GPU will be free to allow developers to accomplish increasingly complex tasks. With current generation games becoming graphics limited on the GPU (even in multi-GPU configurations), it seems counterintuitive to load it even more with physics. Certainly this could offer an increase in physics realism, but we have yet to see the cost.

67 Comments

  • yanyorga - Monday, May 22, 2006 - link

    Firstly, I think it's very likely that there is a slowdown due to the increased number of objects that need to be rendered, giving credence to the apples/oranges argument.

    However, I think it is possible to test where there are bottlenecks. As someone already suggested, testing in SLI would show whether there is an increased GPU load (to some extent). Also, if you test using a board with a 2nd GPU slot which is only 8x and put only 1 GPU in that slot, you will be left with at least 8x left on the pci bus. You could also experiment with various overclocking options, focusing on the multipliers and bus.

    Is there any info anywhere in how to use the PPU for physics or development software that makes use of it?
  • Chadder007 - Friday, May 26, 2006 - link

    That makes me wonder why City of Villains was tested with PPU at 1500 Debris objects comparing it to software at 422 Debris objects. Anandtech needs to go back and test WITH a PPU at 422 Debris objects to compare it to the software only mode to see if there is any difference.
  • rADo2 - Saturday, May 20, 2006 - link

    Well, people have now pretty hard time justifying spending $300 on a decelerator.

    I am afraid, however, that Ageia will be more than willing to "slow down a bit" their future software drivers, to show some real-world "benefits" of their decelerator. By adding more features to their SW (by CPU) emulation, they may very well slow it down, so that new reviews will finally bring their HW to the first place.

    But these review will still mean nothing, as they compare Ageia SW drivers, made intentionally bad performing, with their HW.

    Ageia PhysX is a totally wrong concept, Havok FX can do the same via SSE/SSE2/SSE3, and/or SM 3.0 shaders, it can also use dualcore CPUs. This is the future and the right approach, not additional slow card making big noise.

    Ageia approach is just a piece of nonsense and stupid marketing..
  • Nighteye2 - Saturday, May 20, 2006 - link

    Do not take your fears to be facts. I think Ageia's approach is the right one, but it'll need to mature - and to really get used. The concept is good, but execution so far is still a bit lacking.
  • rADo2 - Sunday, May 21, 2006 - link

    Well, I think Ageia approach is the worst possible one. If game developers are able to distribute threads between singlecore CPU and PhysX decelerator, they should be able to use dualcore CPUs for just the same, and/or SM3.0 shaders. This is the right approach. With quadcore CPUs, they will be able to use 4 cores, within 5-6 years about 8 cores, etc. PhysX decelerator is a wrong direction, it is useful only for very limited portfolio of calculations, while CPU can do them as well (probably even faster).

    I definitely do NOT want to see Ageia succeed..
  • Nighteye2 - Sunday, May 21, 2006 - link

    That's wrong. I tested it myself running Cellfactor without PPU on my dual-core PC. Even without the liquid and cloth physics, large explosions with a lot of debris still caused large slowdowns, after which it stayed slow until most of the flying debris stopped moving.

    On videos I saw of people playing with a PPU, slowdowns also occurred but lasted only a fraction of a second.

    Also, the CPU is also needed for AI, and does not have enough memory bandwidth to do proper physics. If you want to get it really detailed, hardware physics on a dedicated PPU is the best way to go.
  • DigitalFreak - Thursday, May 18, 2006 - link

    Don't know how accurate this is, but it might give the AT guys some ideas...

    http://www.hardforum.com/showthread.php?t=1056037 (HardForum)
  • Nighteye2 - Saturday, May 20, 2006 - link

    I tried it without the PPU - and there's very notable slowdowns when things explode and lots of crates are moving around. And that's from running 25 FPS without moving objects. I imagine performance hits at higher framerates will be even bigger. At least without PPU.
  • Clauzii - Thursday, May 18, 2006 - link

    The German site Hartware.de showed this in their test:

    Processor Type: AGEIA PhysX
    Bus Technology: 32-bit PCI 3.0 Interface
    Memory Interface: 128-bit GDDR3 memory architecture
    Memory Capacity: 128 MByte
    Memory Bandwidth: 12 GBytes/sec.
    Effective Memory Data Rate: 733 MHz
    Peak Instruction Bandwidth: 20 Billion Instructions/sec
    Sphere-Sphere collision/sec: 530 Million max
    Convex-Convex(Complex) collisions/sec.: 533,000 max

    If graphics are moved to the card, a 12GB/s memory will be limiting, I think :)
    Would be nice to see the PhysX RAM @ the specced 500MHz, just to see if it has anything to do with that issue..
  • Clauzii - Thursday, May 18, 2006 - link

    Not test - preview, sorry.
