AGEIA PhysX Technology and GPU Hardware

First off, here is the lowdown on the hardware as we know it. AGEIA, being the first and only consumer-oriented physics processor designer right now, has not given us as much in-depth technical detail as other hardware designers do. We certainly understand the need to protect intellectual property, especially at this stage in the game, but this is what we know.

PhysX Hardware:
125 Million transistors
130nm manufacturing process
128MB 733MHz Data Rate GDDR3 RAM
128-bit memory bus interface
20 giga-instructions per second
2 Tb/sec internal memory bandwidth
"Dozens" of fully independent cores


There are quite a few things to note about this architecture. Even without knowing all the ins and outs, it is quite obvious that this chip will be a force to be reckoned with in the physics realm. A graphics card, even with a 512-bit internal bus running at core speed, has less than 350 Gb/sec internal bandwidth. There are also lots of restrictions on the way data moves around in a GPU. For instance, there is no way for a pixel shader to read a value, change it, and write it back to the same spot in local RAM. There are ways to deal with this when tackling physics, but making highly efficient use of nearly 6 times the internal bandwidth for the task at hand is a huge plus. CPUs aren't able to touch this type of internal bandwidth either. (Of course, we're talking about internal theoretical bandwidth, but the best we can do for now is relay what AGEIA has told us.)
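
To make the read-modify-write restriction concrete, here is a minimal CPU-side sketch of the "ping-pong" multipass technique mentioned above. The buffer names and the update rule are purely illustrative, not any vendor's actual API; on a real GPU, each pass would render into one texture while sampling the other, then swap the two.

```cpp
#include <vector>
#include <utility>

// Ping-pong buffering: a pixel shader cannot read, modify, and write the
// same location in local RAM in one pass, so each pass reads only from src
// and writes only to dst, then the buffers swap roles. This is what
// multipass render-to-texture does on the GPU.
int main() {
    const int n = 1024;
    std::vector<float> bufA(n, 1.0f), bufB(n, 0.0f);
    std::vector<float>* src = &bufA;
    std::vector<float>* dst = &bufB;

    for (int pass = 0; pass < 4; ++pass) {
        for (int i = 0; i < n; ++i) {
            // Illustrative update rule; on a GPU, this loop body would be
            // the pixel shader, sampling src as a texture.
            (*dst)[i] = (*src)[i] * 0.5f + 0.1f;
        }
        std::swap(src, dst);  // the next pass reads what this pass wrote
    }
    return 0;
}
```

The swap itself is free, but every extra pass costs a full trip through the pipeline, which is part of why a GPU's raw bandwidth gets used less efficiently on this kind of problem.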

Physics, as we noted in last year's article, generally presents itself as sets of small, highly dependent problems. Graphics, by contrast, has become sets of highly independent, mathematically intense problems. It's not that GPUs can't be used to solve problems where the input to one pixel is the output of another (performing multiple passes and making use of render-to-texture functionality is one obvious solution); it's just that much of the power of a GPU is wasted when attacking this type of problem. Making use of a great many fully independent processing units makes sense as well. In a GPU's SIMD architecture, pixel pipelines execute the same instructions on many different pixels. In physics, it is much more often the case that something different needs to be done to every physical object in a scene, and it makes much more sense to attack the problem with hardware suited to that kind of independence.
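
As a sketch of why dependent physics problems map poorly to SIMD hardware, consider a toy Gauss-Seidel-style relaxation over a chain of distance constraints. The constraint model and every name here are our own illustration, not AGEIA's solver:

```cpp
#include <cstdio>
#include <vector>

// Toy relaxation over a chain of distance constraints. Note the data
// dependency: solving constraint i immediately updates a position that
// constraint i+1 reads in the *same* iteration. That serial chain is the
// "highly dependent small problems" pattern, and it is a poor fit for
// pixel pipelines that want every work item to be independent.
int main() {
    const int bodies = 8;
    const float rest = 1.0f;                 // desired spacing between neighbors
    std::vector<float> x(bodies);
    for (int i = 0; i < bodies; ++i) x[i] = i * 1.5f;  // start stretched

    for (int iter = 0; iter < 20; ++iter) {
        for (int i = 0; i < bodies - 1; ++i) {
            float d = x[i + 1] - x[i];
            float corr = 0.5f * (d - rest);
            x[i]     += corr;                // written here...
            x[i + 1] -= corr;                // ...and read by the next constraint
        }
    }
    for (int i = 0; i < bodies; ++i) std::printf("%.3f ", x[i]);
    std::printf("\n");
    return 0;
}
```

Because each constraint consumes results the previous one just produced, the work cannot simply be spread across lockstep SIMD lanes; many small independent cores with fast internal bandwidth suit it better.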

To be fair, NVIDIA and ATI are not arguing that they can compete with the physics processing power AGEIA is able to offer in the PhysX chip. The main selling point of physics on the GPU is that everyone who plays games (and would want a physics card) already has a graphics card. Solutions like Havok FX, which uses SM3.0 to implement physics calculations on the GPU, are good ways to augment existing physics engines. These types of solutions will add a little more punch to what developers can do. This won't create a revolution, but it will get game developers to look harder at physics in the future, and that is a good thing. We have yet to see Havok FX or a competing solution in action, so we can't go into any detail on what to expect. However, it is obvious that a multi-GPU platform will be able to benefit from physics engines that make use of GPUs: there are plenty of cases where games are not able to take 100% advantage of both GPUs. In the single GPU case, there could still be a benefit, but the more graphically intensive a scene, the less room there is for the GPU to worry about anything else. We are already seeing titles like Oblivion that can bring everything we throw at them to a crawl, so balance will certainly be an issue for Havok FX and similar solutions.

DirectX 10 will absolutely benefit AGEIA, NVIDIA, and ATI. For physics-on-GPU implementations, DX10 will decrease overhead significantly. State changes will be more efficient, and developers will be able to send many more objects to the GPU for processing every frame. This will obviously make it easier for GPUs to handle tasks other than graphics more efficiently. A little less obviously, PhysX hardware-accelerated games will also benefit from a graphics standpoint. With the possibility for games to support orders of magnitude more rigid body objects under PhysX, overhead can become an issue when batching those objects to the GPU for rendering. This is a hard thing for us to test explicitly, but with developers already complaining about the overhead issue, it is easy to understand why it will be a problem.
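
As a rough sketch of the batching problem, the mock submission loop below stands in for a real graphics API; the function names are hypothetical, not any actual API's calls. The point is the call count: per-object submission scales CPU-side overhead with the number of rigid bodies, while instanced-style submission (which DX10 makes cheaper and more general) amortizes it across many objects per call.

```cpp
#include <cstdio>
#include <vector>

// Mock draw submission. In a real API, each draw call plus its state
// changes costs fixed CPU/driver overhead, so 10,000 physics-driven rigid
// bodies drawn one-by-one can leave a game CPU-bound before the GPU is
// even breathing hard.
struct Transform { float m[16]; };

static int g_drawCalls = 0;
void submitDrawCall(const Transform&) { ++g_drawCalls; }               // per-object path
void submitInstanced(const std::vector<Transform>&) { ++g_drawCalls; } // one call, many objects

int main() {
    std::vector<Transform> debris(10000);   // e.g. exploded dumpster bits

    for (const Transform& t : debris) submitDrawCall(t);
    std::printf("per-object: %d draw calls\n", g_drawCalls);

    g_drawCalls = 0;
    submitInstanced(debris);
    std::printf("instanced:  %d draw call(s)\n", g_drawCalls);
    return 0;
}
```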

While we know the PhysX part can handle 20 GIPS, this figure likely counts simple independent instructions. We would really like to get a better idea of how much actual "work" this part can handle, but for now we'll have to settle for this ambiguous number and some real-world performance. Let's take a look at the ASUS card and then dig into the numbers.


101 Comments


  • Walter Williams - Friday, May 5, 2006 - link

    Too bad not even quadcores will be able to outperform the PPU when it comes to physics calculations.

    You all need to wait for another game that uses the PPU to be reviewed before jumping to any conclusions.

    The developers of GRAW did a very poor job compared to the developers of CellFactor. This will come to light soon.
  • saratoga - Friday, May 5, 2006 - link

    quote:

    Too bad not even quadcores will be able to outperform the PPU when it comes to physics calculations.


    quote:

    jumping to any conclusions.


    Haha.
  • DerekWilson - Friday, May 5, 2006 - link

    just because something is true about the hardware doesn't mean it will ever come to fruition in the software. it isn't jumping to a conclusion to say that the PPU is *capable* of outperforming a quadcore cpu when it comes to physics calculations -- given the architecture, that is a fact, not an opinion.

    had the first quote said something about games that use physics performing better on one rather than the other, that would have been jumping to conclusions.

    the key here is the developers and how the problem of video game physics maps to hardware that is good at doing physics calculations. there are a lot of factors.
  • saratoga - Saturday, May 6, 2006 - link

    quote:

    it isn't jumping to a conclusion to say that the PPU is *capable* of outperforming a quadcore cpu when it comes to physics calculations -- given the architecture, that is a fact, not an opinion.


    It's clearly an opinion. For it to be a fact, it would have to be verifiable. However, no one has made a quad core x86 processor, and no game engine has been written to use one.

    The poster simply stated his opinion and then blasted other people for having their own opinions, all without realizing how stupid it sounded, which is why it was such a funny post.

  • Walter Williams - Saturday, May 6, 2006 - link

    I did not blast anybody...

    It is a simple fact that a dedicated processor for X will always outperform a general purpose processor when doing X from a hardware perspective.

    Whether or not the software yields the same results is another question. Assuming that the PCI bus is not holding back performance of the PPU, it is incredibly unlikely that quad core CPUs will be able to outperform the PPU.
  • saratoga - Saturday, May 6, 2006 - link

    quote:

    It is a simple fact that a dedicated processor for X will always outperform a general purpose processor when doing X from a hardware perspective.


    Clearly false. General purpose processors sometimes beat specialized units. It depends on the resources available to each device and the specifics of the problem. Specialization is a trade-off. If your calculation has some very specific and predictable quality, you might design a custom processor that exploits some property of your problem effectively enough to overcome the billions Intel and AMD poured into developing a general purpose core. But you may also end up with an expensive processor that's left behind by off-the-shelf components :)

    Furthermore, this statement is hopelessly general. What if X is running Linux? Or any other application that x86 CPUs are already specialized for? Can you really conceive of even a specialized processor for this task that didn't resemble a general purpose CPU? Doubtful.

    quote:

    Assuming that the PCI bus is not holding back performance of the PPU, it is incredibly unlikely that quad core CPUs will be able to outperform the PPU.


    You're backpedaling. You said:

    "Too bad not even quadcores will be able to outperfrom the PPU when it comes to physics calculations."

    Now you're saying they might be able to do it. So much for jumping to conclusions?
  • JarredWalton - Friday, May 5, 2006 - link

    People keep mentioning CellFactor. Well and good that it makes heavier use of physics calculations as well as the PhysX card. Unfortunately, right now it requires the PhysX card, and it's looking like 18 MONTHS (!) before the game ships - if it ever gets done. We might as well discuss how much better Havok FX is going to be in The Elder Scrolls V. :p

    For the first generation, we're far more likely to see a lot of the "tacked on" approach as companies add rudimentary support to existing designs. We also don't have a way to even compare CellFactor with and without PhysX. Are they hiding something? I mean, 15% faster under the AGEIA test demo using a high-end CPU isn't looking like much. If they allow CellFactor to run its PhysX calculations in software (on the CPU), get that to support SMP systems for the calculations, and we get 2 FPS in CellFactor, that's great. It shows the PhysX card does something. If they allow all that and the dual core chips end up coming very close to the same performance, we've got a problem.

    Basically, right now we're missing real world (i.e. gaming) apples-to-apples comparisons. It's like comparing X800 to 6800 cards under games that only supported SM3.0 or SM1.1 - better shaders or faster performance, but X800 could have come *much* closer with proper SM2.0 support.
  • NastyPope - Friday, May 5, 2006 - link

    AMD & Intel could license the PhysX technology and include a dedicated PhysX (or generic multi-API) core on their processors and market them as game processors. Although some science and technology applications could make use of it as well. Being on-die would reduce latency and provide a huge amount of bandwidth between cores. Accessing system memory could slow things down but still be much faster than data transfers across a PCI bus.
  • Woodchuck2000 - Friday, May 5, 2006 - link

    The reason that framerates drop with the PhysX card installed is simply that the graphics card is given more complex effects to render.

    At some point in the future, games will be coded with a physics API in mind. Interactions between the player and the game environment will be through this API, regardless of whether there is dedicated hardware available.

    It's a truth universally acknowledged that graphics are better left to the graphics card - I don't hear anyone suggesting that the second core in a duallie system should perform all the graphics calculations. I think that in time, this will be true of physics too.

    Once the first generation of games built from the ground up with a physics API in mind comes out, this will sell like hot cakes.
  • Calin - Friday, May 5, 2006 - link

    The reason frame rates drop is that, with the physics engine, the video card has more to render - in the grenade explosion images, the "with physics" shot has tens of dumpster bits flying, while the "non-physics" shot has hardly a couple.
    Had the scenes been of the same complexity, I wonder how much faster the AGEIA card would have been.
