The Deal with BadaBOOM

Due out in Q3, BadaBOOM is going to be the consumer version of the encoder. It will be an "affordable" program designed for those users who want to quickly take a video file and convert it to another format without playing with settings like bitrate. This is the application we were given a chance to preview.

Under the RapiHD brand, Elemental will deliver a professional version of their encoder/transcoding software. This application will allow you more options than BadaBOOM, letting you select bitrate and resolution, among other quality settings, manually.

As we mentioned before, the software was developed using CUDA and thus will only run on a CUDA-enabled NVIDIA GPU. NVIDIA publishes a full list of supported GPUs, but in short, anything from the GeForce 8, GeForce 9 or GeForce GTX 280/260 families will work.

Mr. Blackman told us that the company isn't specifically tied to NVIDIA hardware and that, as Intel's Larrabee and AMD/ATI solutions come to light, it may evaluate bringing the technology to more platforms. But for now, this will only work if you have a CUDA-enabled GPU (and as such, it stands to be one of the biggest non-gaming killer apps for NVIDIA hardware).

Performance

In our testing we found that even though performance improved tremendously over a CPU-only encode, the process still required a fast host CPU (the Core 2 Extreme QX9770 was at 25 - 30% CPU utilization). It turns out that there are two factors at work here.

According to Mr. Blackman, NVIDIA's initial CUDA release didn't have the streaming mechanism that lets the CPU do work in parallel with the GPU. This functionality was added in later versions of CUDA, but the early beta we tested was developed using the initial CUDA release. Once the CPU and GPU can work in parallel, the load on the CPU should drop.
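One way to picture why overlapping CPU and GPU work matters is a toy timing model. The sketch below is not Elemental's code and does not use CUDA; the function names and the per-chunk durations are assumptions made up for illustration. It compares a pipeline where the two processors take turns with one where the GPU starts the next chunk while the CPU finishes the previous one.

```python
# Toy timing model, not Elemental's implementation. Each chunk of video needs
# GPU work (e.g. motion estimation) followed by CPU work (e.g. entropy coding).
# All durations are hypothetical, in milliseconds.

def serial_time(gpu_ms, cpu_ms):
    """GPU and CPU take turns, so every stage adds to the total."""
    return sum(gpu_ms) + sum(cpu_ms)

def overlapped_time(gpu_ms, cpu_ms):
    """GPU processes chunks back to back; the CPU handles chunk i
    as soon as the GPU has produced it and the CPU is free."""
    gpu_done = 0.0
    cpu_done = 0.0
    for g, c in zip(gpu_ms, cpu_ms):
        gpu_done += g                            # GPU never waits on the CPU
        cpu_done = max(cpu_done, gpu_done) + c   # CPU waits for GPU output
    return cpu_done

gpu = [5, 5, 5, 5]   # assumed GPU time per chunk
cpu = [2, 2, 2, 2]   # assumed CPU time per chunk
print(serial_time(gpu, cpu), overlapped_time(gpu, cpu))
```

With these made-up numbers the overlapped schedule hides almost all of the CPU time behind the GPU work, which is the kind of gain a streaming mechanism is meant to enable.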

Secondly, only parts of the codec parallelize well (motion compensation, motion estimation, DCT and iDCT), while other parts of the pipeline (syntax decoding, variable length coding, CABAC) are not well suited to NVIDIA's array of Streaming Processors.
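The effect of a partly serial pipeline can be estimated with Amdahl's law. The sketch below is a minimal illustration; the 80% parallel fraction and the 10x GPU speedup are assumed values, not figures from Elemental or from our testing.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of the work is accelerated.

    parallel_fraction: share of the original run time that parallelizes
                       (hypothetical value, between 0 and 1)
    parallel_speedup:  how much faster that share runs on the GPU
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)

# Assumed numbers: 80% of encode time is parallelizable, GPU runs it 10x faster.
print(round(amdahl_speedup(0.8, 10), 2))
```

Under these assumptions the overall speedup is well below 10x, and no matter how fast the GPU gets, it can never exceed 1 / 0.2 = 5x while the remaining 20% stays on the CPU.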

Elemental also indicated that performance scales linearly with the number of SPs in the GPU, so presumably the GeForce GTX 280 should be nearly 90% faster (at least at the GPU-accelerated functions) than a GeForce 9800 GTX.
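The "nearly 90%" figure follows from the SP counts alone. The sketch below assumes NVIDIA's published specifications (240 SPs for the GeForce GTX 280, 128 for the GeForce 9800 GTX), which are not stated in this article, and it deliberately ignores the difference in shader clocks between the two cards.

```python
# SP counts taken from NVIDIA's published specs (an assumption here, since the
# article does not list them). Shader clock differences are ignored.
SP_COUNT = {"GeForce GTX 280": 240, "GeForce 9800 GTX": 128}

def percent_faster(sp_new, sp_old):
    """Expected gain if throughput scales linearly with SP count."""
    return (sp_new / sp_old - 1.0) * 100.0

gain = percent_faster(SP_COUNT["GeForce GTX 280"], SP_COUNT["GeForce 9800 GTX"])
print(f"{gain:.1f}% faster")
```

This only covers the GPU-accelerated functions; the serial parts of the pipeline would pull the end-to-end gain below this number.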

SLI Support?

As we found in our GT200 article, in most cases NVIDIA's fastest graphics card is actually the pair of G92s found on a single GeForce 9800 GX2. Unfortunately, Elemental's software will not split up a single video stream for processing across multiple GPUs, so for this application NVIDIA's fastest GPU would be the GeForce GTX 280.

There is an exception, however: if you do have multiple GPUs in your system, the professional version of Elemental's software will let you output to two different resolution/bitrate targets at the same time, with each GPU handling a different transcode stream.

Final Words

We've still got a couple of months before Elemental's software makes its official debut, but it's honestly the most exciting non-gaming application we've seen for NVIDIA's hardware.

We have already given Elemental some feedback as to features we'd like to see in the final version of the software (including support for .m2ts and .evo files as well as .mkv input/output). If there's anything you'd like to see, leave it in the comments and we'll pass along the thread to Elemental.

Elemental's software, if it truly performs the way we've seen here, has the potential to be a disruptive force in both the GPU and CPU industries. On the GPU side it would give NVIDIA hardware a significant advantage over AMD's GPUs, and on the CPU side it would upset the balance between AMD and Intel. Video encoding has historically been an area where Intel's CPUs have done very well, but if the fastest video encoder ends up being an NVIDIA GPU, video encoding performance could become microprocessor agnostic: you'd just need a good NVIDIA GPU.

If you're wondering why Intel is trying to launch Larrabee next year, this is as good a consumer example as you're going to get.


50 Comments


  • lucapicca - Tuesday, June 24, 2008 - link

    I wonder if all this buzz about GPU programming is really a sane idea...
    First of all, from a video coding point of view, one would not transrate a video (I mean... same resolution, same GOP structure) and perform motion estimation again from scratch.
    And this is what GPUs might really be good at.
    Lastly, I'm not that impressed at the speed numbers.
    Is the performance/power ratio favourable to GPUs (in this application)?
    Is the transcoding done entirely on the GPU?
    Because... if 75% of the time is spent in communication/synchronization between CPU and GPU, I think that the future of computation is not in GPUs... and perhaps some sort of less powerful DSPs integrated in the CPU might really do the job better (see Cell).
    After all, it's just a matter of communication speed:
    sometimes sending a job to a remote CPU is not really worth it.
    Any opinion?
  • JonnyDough - Wednesday, June 25, 2008 - link

    I was thinking that myself as I read this article. The CPU simply isn't designed with this in mind, and if it was more specialized it could probably outperform the GPU. I think what you're suggesting is the merging of the GPU and the CPU...and it's my understanding that that merger is now finally underway. Once we hit 32nm and smaller, and begin to utilize more power saving features we'll see laptops REALLY begin to take off. Gaming on a 3 day battery powered laptop here we come. Hopefully.
  • Pjotr - Tuesday, June 24, 2008 - link

    [quote]In the worst case scenario, the GTX 280 is around 40% faster than encoding on Intel's fastest CPU alone.[/quote]

    Why can you never learn simple maths. If something completes in 8 seconds over something that completes in 14 seconds, it's 75% faster. (If it had run in 7 seconds over 14 seconds, it's obvious it's 100% faster not 50%, isn't it?)
  • strikeback03 - Tuesday, June 24, 2008 - link

    It is not so much math as semantics. The 6 seconds between the NVIDIA number and the fastest 9770 is about a 40% time savings (6/14=0.4285...), which could be thought of as 40% faster.

  • Pjotr - Wednesday, June 25, 2008 - link

    To be done in 40% less time (42.9% in this case though) you are done in 60% of the time. To be done in 60% of the time, you must process 1/60% = 1.67x faster = 67% faster. To be done in 57.1% of the time (8 seconds of 14) you must process 75% faster.

    If something is running 40% faster as stated in the article, it should be done in 71.4% of the time (use 28.6% less time). The article is incorrect in claiming processing is done 40% faster, it should say 75% faster OR done in 40% less time.
  • JonnyDough - Wednesday, June 25, 2008 - link

    The two of you, are correct. 40% faster is wrong.
  • DigitalFreak - Tuesday, June 24, 2008 - link

    Why can YOU never learn simple Englishs
  • INNAM - Tuesday, June 24, 2008 - link

    and to make it clear the VOB file can hold H.264 but i think it has to be under 4gbs and no DTS... Also the can play H.264 as well.
  • INNAM - Tuesday, June 24, 2008 - link

    as a dvd/blu-ray ripper myself i suggest other formats besides MKV. The fact is that MKV can only be played on the computer. This makes it painful to then convert it once more to VOB(PS3 playback) or WMV/AVI(X360 playback). Don't get me wrong, if BadaBOOM wants to make it they NEED to add MKV support because if you're going to watch it via PC/HDMI it has Chapter support all the way to VC-1 and DTS.

    oh and i know having a built-in AC3 or DTS encode plugin costs money but you could leave an option for the hardcore user to allow their own plugins. That would make it oh so good!
  • shiggz - Tuesday, June 24, 2008 - link

    Also shame on ATI years ago i spent 380$ to buy an x1900 to help speed up video encodes (as they promised) and they never came through on gpu accelerate! Even all these years later its still software accel. They just dropped the program. That was the last expensive ati card i bought.

    However with 4850's total 800 stream processor count If these could work same as nvidias' as mentioned potentially ATI could blow them away. Or am i missing something about 4850 SP count that would make it not directly proportional to Nvidia?
