GPU Transcoding Throwdown: Elemental's Badaboom vs. AMD's Avivo Video Converter


by Anand Lal Shimpi & Derek Wilson on December 15, 2008 3:00 PM EST


Last year, NVIDIA introduced its CUDA development package. After existing as a standalone download for a while, CUDA was eventually rolled into the driver itself. Today, AMD is following suit by rolling its own GPU computing package, called ATI Stream, into the Catalyst 8.12 driver. While the package has been available for a while now, AMD is only now really starting to push the idea of using its hardware as a GPU computing platform. In both market penetration and branding, AMD is far behind NVIDIA and CUDA.

NVIDIA has made good headway in the HPC market with CUDA, and PhysX on the desktop uses CUDA to implement hardware-accelerated physics. While AMD is a bit behind, we don't see it as lagging hugely either. NVIDIA has laid some good groundwork, but the market for GPU computing is still largely untapped. Both AMD and NVIDIA are in a good position to take advantage of GPU computing when OpenCL and DirectX 11 arrive, and we see the current push by both camps to sell GPUs on the strength of stream computing as very preliminary.

Software like Adobe's Photoshop performs GPU acceleration using OpenGL. We are at a point where, even if a commercial application would benefit significantly from GPU acceleration, software companies still want to develop it once and have it just work. Targeting either NVIDIA through CUDA or AMD through Brook+ is too much of a headache for most software vendors. An API (or better yet, a choice of APIs) aimed at hardware-agnostic GPU computing is what will kick off the real revolution that both AMD and NVIDIA want their solutions to provide.

But ATI Stream isn't the only thing in Catalyst 8.12 that piqued our interest. To show off the new package, AMD built in a free video transcoder that actually makes use of GPU acceleration. It is somewhat limited (as is the Badaboom package that runs on NVIDIA hardware), but free is always a nice price. We will compare the Avivo Video Converter to Badaboom, but we must remind our readers that these are distinct applications that approach video encoding in different ways, so a direct comparison isn't as telling as running the same code on both hardware platforms would be. It's sort of like comparing Unreal Tournament 3 performance on AMD hardware to Enemy Territory: Quake Wars performance on NVIDIA hardware. We have to look at what is being done and at the quality of the output, and we must keep in mind that each development team likely took a completely different approach.

The final bit of goodness in the 8.12 driver is a set of performance tweaks in some applications, plus performance and CrossFire fixes for Far Cry 2. We had to track down quite a number of unresolved issues, especially on our Core i7 systems, and we'll let you know which issues remain in the following pages. This driver release has been highly anticipated not only by reviewers but also by consumers waiting for older hotfixes to be merged into a WHQL driver. We have high hopes, and we'll see whether AMD has delivered something that meets our expectations.

But first, let's go a little deeper into ATI Stream and CUDA and take a look at the competing video transcoding offerings now available.

ATI Stream vs. NVIDIA CUDA

ATI and NVIDIA both know that ubiquitous GPU computing is the future for all data parallel tasks. Computer graphics is one of the most heavily parallel tasks around. It also happens to be a problem that lends itself easily to parallelization because the parallel tasks can be so independent of one another. These two factors, plus demand for high quality graphics, are what made the consumer GPU industry explode in its dozen years of existence. While the GPGPU (general purpose use of GPUs) philosophy has been around for a while, using the GPU for general data parallel tasks has been slow to catch on in the mainstream. This is partly because there were no specialized tools available (developers needed to shoehorn algorithms into OpenGL or DirectX and "draw" triangles to solve problems), and partly because GPU architecture did not lend itself to implementing many useful data parallel algorithms.
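To make the point about independent parallel tasks concrete, here is a minimal Python sketch (CPU code, not GPU code) of a per-pixel brightness adjustment in which every output element depends only on its own input. The names `brighten` and `run_kernel` and the 1.5 gain are illustrative assumptions, not part of any vendor API. Because no element reads another element's result, the order in which the "threads" run makes no difference, and that property is what lets a GPU spread this kind of work across hundreds of processors.

```python
import random


def brighten(pixel: int, gain: float = 1.5) -> int:
    """Per-element 'kernel': reads only its own input and writes only its own output."""
    return min(255, int(pixel * gain))  # clamp to the 8-bit range


def run_kernel(kernel, data, order):
    """Apply `kernel` once per index; `order` stands in for arbitrary GPU scheduling."""
    out = [0] * len(data)
    for i in order:
        out[i] = kernel(data[i])
    return out


pixels = [0, 17, 100, 170, 200, 255]
in_order = run_kernel(brighten, pixels, range(len(pixels)))

shuffled = list(range(len(pixels)))
random.shuffle(shuffled)  # simulate threads finishing in an unpredictable order
any_order = run_kernel(brighten, pixels, shuffled)

print(in_order == any_order)  # True: no element depends on another
```

On real hardware the loop body would become one thread (or, in the old graphics-API style, one fragment) per pixel.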

Until very recently, hardware designed for graphics has meant floating point only acceleration of completely independent data points, with heavy restrictions on reading and writing data. Only a small subset of problems really maps to that kind of architecture. With DirectX 9 we saw only inklings of real programmability, and DirectX 10 class hardware has finally brought the tools developers need to bear. We now have hardware that can handle not only floating point but also integer and bitwise operations. This combines nicely with the addition of local and global data stores for sharing data and with better support for non-sequential reads and writes. With DirectX 11, the package will be fairly feature complete, adding real support for data structures beyond simple arrays, optional double precision support and a whole host of other minor improvements that really add up for GPU computing (which is no surprise, because DX11 will also feature a general purpose Compute Shader that doesn't need to tie work to triangles, vertices, fragments or pixels).
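The shared local and global stores matter for problems where elements do have to cooperate. A parallel sum is a common example: it does not fit the "completely independent data points" model, but it does fit a two-stage scheme in which each group of threads combines its values in a shared scratch buffer and writes one partial result to global memory. The Python below only emulates that structure sequentially on the CPU; `group_reduce`, `reduce_sum` and the default group size of 4 are hypothetical choices for illustration and do not correspond to any CUDA, Brook+ or CAL call.

```python
def group_reduce(values):
    """Tree reduction within one work group, emulated sequentially.

    `scratch` stands in for the group's shared local store. Each pass halves
    the number of active lanes; on a GPU a barrier would separate the passes
    so that every lane sees its neighbours' writes before reading them.
    """
    scratch = list(values)  # copy the group's inputs into "local" memory
    stride = 1
    while stride < len(scratch):
        for lane in range(0, len(scratch), 2 * stride):
            if lane + stride < len(scratch):
                scratch[lane] += scratch[lane + stride]
        stride *= 2  # a barrier would sit here on real hardware
    return scratch[0]


def reduce_sum(data, group_size=4):
    """Two-stage sum: per-group partials go to a "global" list, then get combined."""
    partials = [
        group_reduce(data[start:start + group_size])
        for start in range(0, len(data), group_size)
    ]
    # On a GPU this second stage would typically be another kernel launch.
    return sum(partials)


print(reduce_sum(list(range(1, 11))))  # 55
```

Without a shared store, each partial sum would have to round-trip through the restricted read/write paths of older graphics hardware, which is why this class of algorithm mapped so poorly before DirectX 10 class GPUs.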

Both ATI and NVIDIA have been working on GPU computing for quite some time, though NVIDIA has really been pushing it lately. Even before either company got involved, pioneers in GPU computing were using graphics languages to solve their problems. Out of this, at Stanford, grew a project called Brook. Brook could take specially written code that looked fairly similar to standard C and compile it, source-to-source, into code that made the appropriate graphics API calls and ran the appropriate shaders. ATI took an early interest in this project and latched onto it (perhaps because it showed better performance on ATI hardware than on NVIDIA hardware at the time). After ATI released CTM (the low level ISA spec for its graphics hardware) and later CAL (an abstracted pseudo instruction set that can span hardware generations), Brook was modified to compile directly to code targeted at ATI GPUs. This modification was called Brook+ and is the major vehicle AMD uses for GPU computing today.
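Brook's appeal was the division of labour described above: the programmer writes a C-like function for a single element, and the compiler generates everything needed to apply it across a whole stream of data. The Python sketch below mimics that split with a hypothetical `kernel` decorator standing in for the compiler. It is not Brook, Brook+ or CUDA syntax, and `saxpy` is simply the usual textbook example.

```python
from typing import Callable, List


def kernel(fn: Callable[..., float]) -> Callable[..., List[float]]:
    """Hypothetical stand-in for a Brook-style source-to-source compiler.

    `fn` is written for one element. When the wrapped function is called,
    list/tuple arguments are treated as streams (all of the same length) and
    other arguments are broadcast, mirroring one shader invocation per
    output element.
    """
    def launch(*args) -> List[float]:
        lengths = {len(a) for a in args if isinstance(a, (list, tuple))}
        if len(lengths) != 1:
            raise ValueError("expected one or more streams of equal length")
        n = lengths.pop()

        def pick(arg, i):
            # Stream arguments are indexed; scalars are shared by every element.
            return arg[i] if isinstance(arg, (list, tuple)) else arg

        return [fn(*(pick(a, i) for a in args)) for i in range(n)]

    return launch


@kernel
def saxpy(a: float, x: float, y: float) -> float:
    # The per-element body: the only part the programmer writes.
    return a * x + y


print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

In the real toolchains the "launch" side is generated code (graphics API calls for Brook, CAL code for Brook+), which is exactly the part developers were glad not to write by hand.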

With the Catalyst 8.12 release, AMD is now including the necessary software to build and run GPU computing applications with Brook+ and CAL in the driver itself. This software is bundled up in a package AMD likes to call the ATI Stream SDK and has been available as a separate download for a while now. NVIDIA also did this with CUDA, first offering it as a separate download and later integrating it into their driver.

With both Brook+ and CUDA there are limitations in what can be done from both a language and a hardware target standpoint. At this point, the documentation for CUDA is more practical, giving better guidance on how to organize things, while the ATI Stream documentation is much lower level and arguably more complete than NVIDIA's. The long and short of it is that you'll likely get up and running more quickly with CUDA, but ATI Stream gives you the information to really understand what is going on if you want to tweak the low level code generated by Brook+ (or if you just want to program at the assembly level).

As far as language extensions in general go, I prefer the CUDA approach to data parallel computing, though I still have my qualms about both. The major drawback of either is still that each is locked into a specific hardware target. Good data parallel programming is hard, and there's no reason to make it more difficult than it needs to be by forcing developers to write their code twice in two different languages, and very likely in two completely different ways, to take advantage of both architectures. It's ridiculous.

Both NVIDIA and AMD like to get on their high horse when talking about their GPU computing efforts. We have AMD talking about openness and standards and NVIDIA talking about its investment in CUDA and its already apparent adoption and market penetration in high performance computing. The problem is that both approaches are lacking, and both companies are fully capable of writing compilers that take the current Brook or CUDA C language extensions and target them at their own architecture. Both companies will eventually support OpenCL when it hits, and the DirectX 11 Compute Shader as well. But in the meantime they just aren't interested in working together, which may or may not make sense from a business standpoint but certainly isn't the best path for the consumer or the industry.

Meanwhile NVIDIA, and now AMD, want to push their proprietary GPU computing technologies as a reason end users should want their hardware. At best, Brook+ and CUDA as language technologies are stopgap, short-term solutions. Both will fail or fall out of use as standards replace them. Developers know this and will simply not adopt the technology if it doesn't provide the return on investment they need, and 9 times out of 10 in the consumer space, it just won't make sense either to develop a solution for only one IHV's GPUs or to develop the same application twice using two different languages and techniques.

Where proprietary solutions for GPU computing do make sense is where the bleedingest edge performance is absolutely necessary: the HPC market. Large companies with plenty of cash for research will save more money than they put in when developing for the types of scientific computing that really benefit from the GPU today. No matter how long it takes a development team to port a solution to CUDA or Brook+, if the application sees anything like the order-of-magnitude speedups we are used to seeing in this space, the project will more than make up for the investment in no time at all. Realized compute per dollar goes up at roughly the same rate as application speedup. GPU computing just makes sense here, even with proprietary solutions that only target one hardware platform.
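The cost argument above comes down to simple arithmetic. If a port makes a job run S times faster on hardware that costs about the same to run per day, the same work needs only 1/S of the machine time, so throughput per dollar rises by roughly a factor of S and the porting effort pays for itself once the accumulated savings exceed it. The function below works through that under those stated assumptions; `payback_days` is a hypothetical helper, and the dollar figures in the example are made up for illustration, not measurements.

```python
def payback_days(port_cost: float, daily_compute_cost: float, speedup: float) -> float:
    """Days until compute savings repay a one-off porting cost.

    Assumes (hypothetically) that the GPU-accelerated setup costs about the
    same per day as the original one, so a speedup S cuts the daily bill for
    the same workload to 1/S of its former size.
    """
    if speedup <= 1:
        raise ValueError("no savings without a speedup greater than 1")
    daily_saving = daily_compute_cost * (1 - 1 / speedup)
    return port_cost / daily_saving


# Illustrative numbers only: a $100,000 port, $5,000/day of compute, 10x speedup.
print(round(payback_days(100_000, 5_000, 10), 1))  # 22.2
```

With a 10x speedup, 90 percent of the daily compute bill becomes savings, which is why even an expensive port can be repaid within weeks under these assumptions.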

In the consumer space, the real advantage that CUDA has over ATI Stream is PhysX. But this is barely even a real advantage at this point as PhysX suffers from the same fundamental problem that CUDA does: it only targets one hardware vendor's products. While there are a handful of PhysX titles out there, they aren't that compelling at this point either. We will have to start seeing some real innovation in physics with PhysX before it becomes a selling point. The closest we've got so far is the upcoming Mirror's Edge for the PC, but we must reserve judgement on that one because we haven't had the opportunity to play it yet.

And now we've got AMD's first real effort in the Avivo Video Converter, which finally uses the GPU to do something (it did not in its original incarnation). This competes with the only real consumer level application available for CUDA: Badaboom. Now that there are video converters available on both sides of the aisle, we have the opportunity to compare something that still doesn't really matter that much: we get to see the relative performance of two applications written by different teams with different goals, targeted at different hardware for different markets. Great. Let's get started.


36 Comments


mediaconvert - Wednesday, January 28, 2009
I record a lot of tele on my computer and am always wanting faster ways to convert and compress my videos. When I heard about ati producing an equivalent of badaboom I was really excited and thought I could finally justify spending £150+ on a graphics card especially when it would be faster than the cpu. I have a ati 3450 and man was I disappointed. I tried to compress a 120mb mpeg2 file and ended up with a 150 mb file. Also if the reviews are right it doesn't use the gpu. what's the point in having a gpu converter that doesn't use the gpu??? I can only speak for myself but if amd/ati comes out with a serious way of quickly converting/compressing the mpeg2 files (perhaps also with a batch processing mode) then they have a sale here especially if it allows me to play the latest video games.

Currently I have been looking at video cards and I have to say there are two things pushing me to nvidia one is badaboom and the other nvidias hybridpower (use of an nvidia motherboard integrated graphics to reduce gpu usage and hence gpu fan noise when gpu is not needed)

I reckon ati/amd needs to get creative here and really commit to gpu video conversion. ( or even gpu + cpu video conversion ) If they can produce real world speed benefits then people will buy it.
Focher - Wednesday, December 17, 2008
I have a 3-way SLI of 280s with a QX9650 CPU. I have both Badaboom and TMPG Xpress, both of which support GPU encoding. In my experience, I can actually encode video a bit faster with just the CPU. Badaboom apparently supports multi-GPU configurations now, but only to split encoding when you have queued multiple files. TMPG Xpress is definitely the more powerful and capable tool, but doesn't support multiple GPUs. Also, Badaboom apparently just released 1.1 that adds quite a few features but I have not yet tried it.
Rainman200 - Wednesday, December 17, 2008
Just assign resources to help the developers of x264 to make use of GPU's through OpenCL and that will do more good than any of these waste of time apps.

Anand I'd definitely say the x264 is sharper vs Badaboom in the two pictures you posted, also please use Ribot264 or AutoMKV as they use the latest builds of x264, Handbrake trails development of x264 because of its Apple Mac focus so important features added to x264 which improve its image quality are left out months behind other x264 encoders.
dryloch - Wednesday, December 17, 2008
I had a few ATI cards years ago but have been using Nvidia recently. I decided to try a 4800 series ATI card this time around partly because I hoped the number of stream processors would be useful for stuff like this. I have been looking forward to this driver for months and now they release something that doesn't work. My time is valuable to me ATI, don't waste it trying to make something work that you know is broken. I don't care what happens with the speed of the next gen cards I am going back to Nvidia.
toyotabedzrock - Tuesday, December 16, 2008
http://www.pcper.com/article.php?aid=647
This review seems to have gotten it to work better. Although still not flawless.
talmholt - Tuesday, December 16, 2008
Anand,

I think some of your issues are coming from Vista. I have used the converter on a WinXP32 machine with good results. It converts a 2 hour movie (MPEG2 640x480 3GB initial size) to an iPod file (320x240 500MB final size) in 8 minutes and the result is flawless!

I have also tried converting HDTV (OTA) content to a DVD format and that worked great too.

PS, my system is only an Intel Core 2 E6420 with an AMD 4850 (everything at stock speeds). Please try again Anand.

Thomas
Chris Simmo - Tuesday, December 16, 2008
I use handbrake, but noticed something weird. I had a 9800GT in the system, using handbrakes default movie options x264 and I would get about 150 turbo first pass, 48fps second pass on my overclocked q9400@3.5. I changed the graphics over to a HD4850, and saw an option for VP3. I selected it, the CLI crashed, the handbrake UI was still running though, changed back to x264, and then it was 290 turbo first pass and over 150 second pass. This is running vista 64 with the 8.12 drivers. During this time the GPU temp went up 2 degrees, all four cores were at 100%. I really need someone else to have a play and see what they get. I put in a 4870 to try, but I hadn't worked out the VP3 thing yet, so it didn't change from the standard 48fps
Chris Simmo - Tuesday, December 16, 2008
Sorry, that was 'Shaun of the dead' DVD to MKV,
piroroadkill - Tuesday, December 16, 2008
"Last year, NVIDIA introduced it's CUDA"

it is CUDA!
