Original Link: http://www.anandtech.com/show/4380/discrete-htpc-gpus-shootout
Discrete HTPC GPU Shootout
by Ganesh T S on June 12, 2011 10:30 PM EST
The popularity of Intel's HD Graphics amongst HTPC enthusiasts and the success of the AMD APUs seem to indicate that the days of the discrete HTPC GPU are numbered. However, for those with legacy systems, a discrete HTPC GPU will probably be the only way to enable hardware accelerated HD playback. In the meantime, discrete HTPC GPUs also aim to offer more video post processing capabilities.
In this context, both AMD and NVIDIA have been serving the market with their low end GPUs. These GPUs are preferable for HTPC scenarios due to their low power consumption and ability to be passively cooled. Today, we will be taking a look at four GPUs for which passively cooled solutions exist in the market. From AMD's side, we have the 6450 and 6570, while the GT 430 and GT 520 make up the numbers from the NVIDIA side.
Gaming benchmarks are not of much interest to the HTPC user interested in a passively cooled solution. Instead of focusing on that aspect, we will evaluate factors relevant to the AV experience. After taking a look at the paper specifications of the candidates, we will describe our evaluation testbed.
We will start off the hands-on evaluation with a presentation of the HQV benchmarks. This provides the first differentiating factor.
While almost all cards (including the integrated graphics on CPUs) are able to play back HD videos with some sort of acceleration, videophiles are more demanding. They want to customize the display refresh rate to match the source frame rate of the video being played. Casual HTPC users may not recognize the subtle issues created by mismatched refresh rates. However, improper deinterlacing may lead to highly noticeable issues. We will devote a couple of sections to see how the cards handle custom refresh rates and fare at deinterlacing.
After this, we will proceed to identify a benchmark for evaluating HTPC GPUs. This benchmark gives us an idea of how fast the GPUs can decode the supported codecs, and whether faster decoding implies more time for post processing. We will see one of the cards having insane decoding speeds, and try to find out why.
Over the last few months, we have also been keeping track of some exciting open source software in the HTPC area. These tools aim to simplify the player setup and take advantage of as many features of your GPU as possible, and we believe they are very close to being ready for prime time. We will have a couple of sections covering the setup and usage of these tools.
Without further ado, let us go forward and take a look at the contenders.
Our first candidate has been in our labs for more than 6 months now. When we last took a look at the NVIDIA GT 430, it was an underwhelming performer. Have driver updates fixed the issues we had with the card? We are going to put the NVIDIA reference card through the paces and also through our new benchmarking methodology.
From the AMD side, we have the 6450 reference card. This is the same card that Ryan took a look at earlier. We didn't cover too many HTPC specific issues in that review (except noting that the deinterlacing performance matched that of the 5570's), and we will correct that aspect in this piece.
AMD also provided us with a Display Port to HDMI adapter so that we could test out HD audio bitstreaming, and it worked without issues.
The second NVIDIA card we have in the list is the recently introduced GT 520. Ryan had again covered the launch, but without a detailed review. NVIDIA provided us with a retail sample of the MSI N520GT.
Rounding up our initial set of candidates is the AMD 6570 retail sample, courtesy Sapphire.
NVIDIA let us know that the GDDR5 based 6450 from AMD was not representative of what is available in the market. There is no passively cooled GDDR5 based 6450, and the core clocks of all the 6450s in the market weigh in at 625 MHz (compared to the reference card's 750 MHz). Keeping this in mind, we also added a DDR3 based MSI 6450 (quite late in the game) to the list of contenders.
The table below compares the listed specifications of all the contenders.
|Discrete HTPC GPUs Shootout Contenders|
| ||NVIDIA GT 430||AMD 6450 [GDDR5]||MSI GT 520||Sapphire 6570 [DDR3]||MSI 6450 [DDR3]|
|AMD Stream Processors||-||160||-||480||160|
|NVIDIA Stream Processors||96||-||48||-||-|
|Core Clock (MHz)||700||750||810||650||625|
|Shader Clock (MHz)||1400||-||1620||-||-|
|Memory Clock (Data Rate) (MHz)||900 (1800)||900 (3600)||900 (1800)||900 (1800)||667 (1333)|
|DRAM Configuration||128b 1GB DDR3||64b 512MB GDDR5||64b 1GB DDR3||128b 1GB DDR3||64b 1GB DDR3|
|Max. TDP (W)||49||27||29||44||20|
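The memory rows in the table can be turned into a rough peak-bandwidth figure, which helps when the DRAM bandwidth of the DDR3 based 6450 comes up later. The sketch below assumes that the "Data Rate" values can be read as millions of transfers per second and that peak bandwidth is simply bus width times data rate; the function name is ours, and real-world bandwidth will be lower.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: data_rate_mt_s is the effective data rate from the spec
    table, interpreted as millions of transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s * 1e6 / 1e9


# (bus width in bits, data rate) taken from the specification table
cards = {
    "NVIDIA GT 430": (128, 1800),
    "AMD 6450 [GDDR5]": (64, 3600),
    "MSI GT 520": (64, 1800),
    "Sapphire 6570 [DDR3]": (128, 1800),
    "MSI 6450 [DDR3]": (64, 1333),
}

for name, (width, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

Under these assumptions, the GDDR5 based 6450 matches the 128-bit DDR3 cards on paper, while the DDR3 based MSI 6450 has well under half of that.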
Let us now proceed to take a look at the HTPC testbed in which these cards were benchmarked.
For the purpose of HTPC reviews (in particular, HQV benchmarking for discrete GPUs), we have set up a dedicated testbed with the following configuration.
|HTPC Benchmarking Testbed Setup|
|Processor||Intel i5-680 CPU - 3.60GHz, 4MB Cache|
|Motherboard||ASUS P7H55D-M EVO|
|OS Hard Drive||Seagate Barracuda XT 2 TB|
|Secondary Drive||Kingston SSDNow 128GB|
|Memory||G.SKILL ECO Series 4GB (2 x 2GB) SDRAM DDR3 1333 (PC3 10666) F3-10666CL7D-4GBECO CAS 7-7-7-21|
|Optical Drives||ASUS 8X Blu-ray Drive Model BC-08B1ST|
|Case||Antec VERIS Fusion Remote Max|
|Power Supply||Antec TruePower New TP-550 550W|
|Operating System||Windows 7 Ultimate x64|
This is the same configuration that we had for evaluating the GT 430 last fall. Windows 7 was updated to SP1, but that has little bearing on our HTPC benchmarks.
Each hardware configuration has an associated OS image which was created / restored as necessary using Clonezilla. This ensures that we do not end up with conflicting drivers while evaluating GPUs from different companies on the same base testbed. Read on for the results from benchmarking the candidates with the HQV clips.
HTPC enthusiasts are often concerned about the quality of pictures output by the system. While this is a very subjective metric, we have decided to take as much of an objective approach as possible. Starting with our HTPC reviews, we have been using the HQV 2.0 benchmark for this purpose. The HQV benchmarking procedure has been heavily promoted by AMD, but it is something NVIDIA says it doesn't optimize for. Considering the fact that there aren't any other standardized options available to evaluate the video post processing capabilities of the GPUs, we feel that HQV benchmarking should be an integral part of the reviews.
However, HQV scores need to be taken with a grain of salt. In particular, one must check the tests where the GPU lost out points. In case those tests don't reflect the reader's usage scenario, the handicap can probably be ignored. An example is cadence detection. Only interlaced streams with non-native frame rates (i.e., 24p content at 60i, 25p content at 50i etc.) need this post processing. Even within this, it is streams requiring 3:2 cadence detection that are most common. Streams with 2:3:3:2 and other fancy patterns are almost non-existent in most usage scenarios. So, it is essential that the scores for each test be compared, rather than just the total value.
The HQV 2.0 test suite consists of 39 different streams divided into 4 different classes. In our HTPC(s), we use Cyberlink PowerDVD 11 with TrueTheater disabled and hardware acceleration enabled for playing back the HQV streams. The playback device is assigned scores for each, depending on how well it plays the stream. Each test was repeated multiple times to ensure that the correct score was assigned. The scoring details are available in the testing guide from HQV.
In the table below, we indicate the maximum score possible for each test, and how much each GPU was able to get. The NVIDIA GPUs were tested with driver version 270.61 and the AMD GPUs were tested with Catalyst 11.5.
|HQV 2.0 Benchmark Shootout|
|Test Class||Chapter||Tests||Max. Score||NVIDIA GT 430||MSI GT 520||AMD 6450||Sapphire 6570||MSI 6450|
|Video Conversion||Video Resolution||Dial||5||5||4||5||5||4|
|Video Conversion||Video Resolution||Dial with Static Pattern||5||5||5||5||5||5|
|Video Conversion||Film Resolution||Stadium 2:2||5||5||0||5||5||5|
|Video Conversion||Overlay On Film||Horizontal Text Scroll||5||5||5||5||5||5|
|Video Conversion||Overlay On Film||Vertical Text Scroll||5||5||5||5||5||5|
|Video Conversion||Cadence Response Time||Transition to 3:2 Lock||5||5||5||5||5||5|
|Video Conversion||Cadence Response Time||Transition to 2:2 Lock||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||2:2:2:4 24 FPS DVCam Video||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||2:3:3:2 24 FPS DVCam Video||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||3:2:3:2:2 24 FPS Vari-Speed||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||5:5 12 FPS Animation||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||6:4 12 FPS Animation||5||5||0||5||5||5|
|Video Conversion||Multi-Cadence||8:7 8 FPS Animation||5||5||0||5||5||5|
|Video Conversion||Color Upsampling Errors||Interlace Chroma Problem (ICP)||5||5||5||5||5||5|
|Video Conversion||Color Upsampling Errors||Chroma Upsampling Error (CUE)||5||5||5||5||5||5|
|Noise and Artifact Reduction||Random Noise||SailBoat||5||5||5||5||5||0|
|Noise and Artifact Reduction||Compression Artifacts||Scrolling Text||5||5||3||3||5||0|
|Noise and Artifact Reduction||Upscaled Compression Artifacts||Text Pattern||5||3||3||3||3||0|
|Image Scaling and Enhancements||Scaling and Filtering||Luminance Frequency Bands||5||5||5||5||5||5|
|Image Scaling and Enhancements||Scaling and Filtering||Chrominance Frequency Bands||5||5||5||5||5||5|
|Image Scaling and Enhancements||Resolution Enhancement||Brook, Mountain, Flower, Hair, Wood||15||15||15||15||15||15|
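|Adaptive Processing||Contrast Enhancement||Theme Park||5||5||5||5||5||5|
|Adaptive Processing||Contrast Enhancement||Beach at Dusk||5||5||5||5||5||5|
|Adaptive Processing||Contrast Enhancement||White and Black Cats||5||5||5||5||5||5|
|Adaptive Processing||Skin Tone Correction||Skin Tones||10||7||7||7||7||7|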
A look at the above table reveals that there is not much to differentiate between the AMD 6450, GT 430 and 6570. The GT 430 scores in between the 6450 and 6570. However, the GT 520 and the DDR3 based MSI 6450 stand out because of their low scores.
In our GT 430 review last October, we were willing to give it some leeway because it lost out in the bulk of the cadence detection tests. The GT 520 is in a similar situation here. The all-important 3:2 pulldown is performed correctly. However, none of the other cadence detection tests passed. GT 520 also has other issues in general which cause it to get a lower score than what the GT 430 obtained in its initial review. We will take a look at how the GT 520 fares in the other tests before delivering the final verdict.
The DDR3 based 6450 misses out on the bulk of the scores because it is unable to perform denoising in a proper manner. When AMD was contacted about this, they admitted the issue and indicated that they were working on a fix. However, they pointed out that the problem was only for standalone files and not Blu-ray discs. To our surprise, we found that denoising worked properly in PowerDVD irrespective of ESVP when the HQV Benchmark Blu-ray was used! We decided not to let that alter the scores above. Blu-rays are already mastered carefully, and don't need as much post processing as local files from recorded TV shows or camcorder files. The low score of the DDR3 based 6450 will probably improve a great deal after driver updates, but we will consider only playback of files on the hard drive in the rest of this review.
One of the drawbacks of the GPUs built into the Clarkdale / Arrandale CPUs and the Sandy Bridge CPUs was the lack of a 23.976 Hz refresh rate for matching the source frame rate of many videos. Combined with the lack of reliable support for open source software, this has often pushed users to opt for a discrete HTPC GPU.
Ideally, a GPU should be capable of the following refresh rates at the minimum:
- 23.976 Hz
- 24 Hz
- 25 Hz
- 29.97 Hz
- 30 Hz
- 50 Hz
- 59.94 Hz
- 60 Hz
Some users demand integral multiples of 23.976 / 24 Hz because they result in a smoother desktop experience, while also making sure that the source and display refresh rates are still matched without repeated or dropped frames.
However, being in the US (NTSC land), we are looking at the minimum necessary subset here, namely, support for the following:
- 23.976 Hz for 23.976 fps source material
- 24 Hz for 24 fps source material
- 59.94 Hz for 59.94 fps source material
We have observed that the refresh rate is heavily dependent on the AV components in the setup (a card which provided perfect 23.976 Hz in my setup performed quite differently in another). In order to keep the conditions the same for all the contenders, the custom refresh rates were tested with the HDMI output of the card connected to an Onkyo TX-SR606 and then onto a Toshiba Regza 37" 1080p TV. The Toshiba TV model is indeed capable of displaying 24p material.
NVIDIA GT 430:
The NVIDIA Control Panel provided a 23 Hz option by default when connected in the test setup. This is obviously coming from the EDID information. Setting the refresh rate to 23 Hz and playing back a 23.976 fps video resulted in the following:
Note that the playback frame rate locks on to 23.971 fps, and the display refresh rate also loosely locks on to 23.971 Hz. Unfortunately, this is only slightly better than the 24 Hz lock that Intel provides for the 23 Hz setting. With this, one can expect a dropped frame every 200 seconds.
Fortunately, NVIDIA provides us with a way to create custom resolutions using the NVIDIA Control Panel, as in the gallery below.
The display mode refresh rate should be set to 23 Hz, and the Timing parameters need to be tweaked manually (altering the refresh rate to change the pixel clock). This is more of a trial and error process (setting the refresh rate to 23.976 as in the gallery below didn't necessarily deliver the 23.976 frame lock and refresh rate during media playback). With a custom resolution setup, we are able to get the playback frame rate to lock at 23.976.
The display refresh rate oscillates a little around this value, but, in all probability, averages out over time. We do not see any dropped or repeated frames.
Moving on to the 24 Hz setting (needed for 24 fps files, common in a lot of European Blu-rays), we find that it works without the need for much tweaking.
Playback locks at 24 fps, and the refresh rate oscillates around this value with very little deviation.
The default NTSC refresh rate (59.94 Hz) works in a manner similar to the 24 Hz setting, as is evident in the gallery below.
MSI GT 520:
With respect to custom refresh rates, the GT 520 is very similar to the GT 430. The 23 Hz setting, at default, had the same issues as the GT 430, but nothing that a little tweaking didn't fix. The gallery below shows the behavior with the default 23 Hz setting:
After setting up a custom resolution, we get the following:
The 24 Hz setting, at default, showed a slight issue with the playback frame rate locking at 24.001 fps. This would imply a repeated frame every 1000 seconds (~17 minutes).
This can probably be fixed by altering the timing parameters for the 24 Hz setting, but we didn't take that trouble.
Setting up NTSC refresh rates with the 59 Hz native setting gave us the following results, similar to the issue we had with 24 Hz setting.
DDR3 and GDDR5 based 6450:
We didn't find any difference between the two versions of the 6450 that we tested with respect to refresh rate handling. In this section, we will present screenshots from the GDDR5 based 6450.
Catalyst Control Center automatically enables the 23 and 24 Hz settings in the drop down box for refresh rates by recognizing the EDID information. How well do these settings work? A look at the gallery below shows that the behavior is better than Intel's and NVIDIA's native offerings. However, there is still the issue that the play back frame rate locks to 23.977 fps / 24.001 fps. The refresh rate is not exactly 23.977 either, but mostly below that. All in all, this is not the ideal 23.976 Hz, but something that the 'set-it-and-forget-it' crowd might be OK with.
We didn't get a chance to test the 59.94 Hz settings for videos, because the 6450s' way of playing back 1080p60 videos was to present a slideshow. A brief look at the gallery below reveals the issue:
There is a little bit more coverage about this in the 'ESVP on the 6450s' section.
Sapphire 6570:
While the 6450 was only slightly off from the required 23.976 and 24 Hz settings, the Sapphire 6570 took a little more liberty. 23 Hz gave us 23.978 Hz and 24 Hz gave us 24.002 Hz, resulting in repeated frames every 500 seconds.
The 59 Hz setting for the 6570 gave us 59.946 instead of 59.94, which eventually results in a repeated frame every 167 seconds (~3 minutes).
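The dropped / repeated frame intervals quoted in this section follow from the gap between the source frame rate and the actual display refresh rate: one frame of error accumulates every 1 / |refresh - source| seconds. The sketch below reproduces that arithmetic. It assumes the renderer corrects the drift by dropping or repeating a single frame, and the function names are ours.

```python
def seconds_between_glitches(source_fps: float, refresh_hz: float) -> float:
    """Time for the display to drift by one full frame relative to the source."""
    diff = abs(refresh_hz - source_fps)
    if diff == 0:
        return float("inf")  # perfectly matched: no drift at all
    return 1.0 / diff


def glitch_type(source_fps: float, refresh_hz: float) -> str:
    """A display running fast must repeat frames; a slow one must drop them."""
    if refresh_hz > source_fps:
        return "repeated"
    if refresh_hz < source_fps:
        return "dropped"
    return "none"


# (label, source frame rate, measured refresh rate) as reported above
measurements = [
    ("NVIDIA 23 Hz default", 23.976, 23.971),
    ("GT 520 24 Hz default", 24.0, 24.001),
    ("6570 24 Hz", 24.0, 24.002),
    ("6570 59 Hz", 59.94, 59.946),
]

for label, fps, hz in measurements:
    interval = seconds_between_glitches(fps, hz)
    print(f"{label}: one {glitch_type(fps, hz)} frame every {interval:.0f} s")
```

The same function shows why the last few thousandths of a hertz matter: halving the error doubles the time between visible glitches.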
The takeaway from this section is that none of the GPUs can claim to do fully perfect 23.976 Hz refresh rates. With luck, the ATI card in a particular setup may be able to provide the perfect refresh rate. After all, they came very close to the required settings in our testbed. The NVIDIA cards, at default, are probably going to be always off. However, for the advanced users, there are some avenues available to obtain the required display refresh rate. Unfortunately, there is no way I am aware of to feed custom refresh rates in the Catalyst Control Center.
Before I started the review, it was my opinion that AMD is much better at native refresh rates compared to NVIDIA. After putting the various cards through the paces, I am forced to reconsider. AMD may work well for the average HTPC user. For the more demanding ones, it looks like NVIDIA is the winner in this area because of the ability to create custom resolutions.
One of the video post processing aspects heavily emphasized by the HQV 2.0 benchmark is cadence detection. Improper cadence detection / deinterlacing leads to easily observed artifacts during video playback. When and where is cadence detection important? Unfortunately, the majority of the information about cadence detection online is not very clear. For example, one of the top Google search results makes it appear as if telecine and pulldown are one and the same. It also suggests that the opposite operations, inverse telecine and reverse pulldown, are synonymous. That is not exactly true.
We have already seen a high level view of how our candidates fare at cadence detection in the HQV benchmark section. In this section, we will talk about cadence detection in relation to HTPCs. After that, we will see how our candidates fare at inverse telecining.
Cadence detection literally refers to determining whether a pattern is present in a sequence of frames. Why do we have a pattern in a sequence of frames? This is because most films and TV series are shot at 24 frames per second. For the purpose of this section, we will refer to anything shot at 24 fps as a movie.
In the US, TV broadcasts conform to the NTSC standard, and hence, the programming needs to be at 60 frames/fields per second. Currently, some TV stations broadcast at 720p60 (1280x720 video at 60 progressive frames per second), while other stations broadcast at 1080i60 (1920x1080 video at 60 fields per second). The filmed material must be converted to either 60p or 60i before broadcast.
Pulldown refers to the process of increasing the movie frame rate by duplicating frames / fields in a regular pattern. Telecining refers to the process of converting progressive content to interlaced and also increasing the frame rate (i.e., converting 24p to 60i). It is possible to perform pulldown without telecining, but not vice versa.
For example, Fox Television broadcasts 720p60 content. The TV series 'House', shot at 24 fps, is subject to pulldown to be broadcast at 60 fps. However, there is no telecining involved. In this particular case, the pulldown applied is 2:3. For every two frames in the movie, we get five frames for the broadcast version by showing the first frame twice and the second frame three times.
Telecining is a bit more complicated. Each frame is divided into odd and even fields (interlaced). The first two fields of the 60i video are the odd and even fields of the first movie frame. The next three fields in the 60i video are the odd, even and odd fields of the second movie frame. This way, two frames of the movie are converted to five fields in the broadcast version. Thus, 24 frames are converted to 60 fields.
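The two conversions described above can be sketched in a few lines. This is only an illustration: frames are represented by labels rather than pixel data, the function names are ours, and the field parity is assumed to simply alternate odd / even across the whole stream.

```python
from itertools import cycle


def pulldown_2_3(frames):
    """Progressive 2:3 pulldown: show frames 2, 3, 2, 3, ... times each."""
    output = []
    for frame, repeats in zip(frames, cycle((2, 3))):
        output.extend([frame] * repeats)
    return output


def telecine_2_3(frames):
    """2:3 telecine: emit 2, 3, 2, 3, ... fields per frame.

    Each field is a (source_frame, parity) pair. Parity keeps alternating
    across frame boundaries, as in an interlaced stream.
    """
    fields = []
    parity = cycle(("odd", "even"))
    for frame, count in zip(frames, cycle((2, 3))):
        for _ in range(count):
            fields.append((frame, next(parity)))
    return fields


movie_second = list(range(24))              # one second of 24 fps material
print(len(pulldown_2_3(movie_second)))      # progressive frames for 60p
print(len(telecine_2_3(movie_second)))      # fields for 60i
print(telecine_2_3(["A", "B"]))
```

For the two-frame input, the telecine output is A-odd, A-even, B-odd, B-even, B-odd, matching the field order described in the text, and one second of 24 fps input yields 60 frames or 60 fields respectively.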
While the progressive pulldown may just result in judder (because every alternate frame stays on the screen a little bit longer than the other frame), improper deinterlacing of 60i content generated by telecining may result in very bad artifacting as shown below. This screenshot is from a sample clip in the Spears and Munsil (S&M) High Definition Benchmark Test Disc.
|Inverse Telecine OFF||Inverse Telecine ON|
Cadence detection tries to detect what kind of pulldown / telecine pattern was applied. When inverse telecine is applied, cadence detection is used to determine the pattern. Once the pattern is known, the appropriate fields are considered in order to reconstruct the original frames through deinterlacing. Note that plain inverse telecine still retains the original cadence while sending out decoded frames to the display. Pullup removes the superfluous repeated frames (or fields) to get us back to the original movie frame rate. Unfortunately, none of the DXVA decoders are able to do pullup. This can be easily verified by taking a 1080i60 clip (of known cadence) and frame stepping it during playback. You can additionally ensure that the refresh rate of the display is set to the same as the original movie frame rate. It can be observed that a single frame repeats multiple times according to the cadence sequence.
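To make the distinction between cadence detection and pullup concrete, here is a minimal sketch. It assumes each field already carries a label identifying the source frame it came from; a real deinterlacer has no such labels and must infer them by comparing field contents. It also assumes the clip starts and ends on a cadence boundary. All names are ours and do not correspond to any DXVA interface.

```python
from itertools import groupby


def run_lengths(field_sources):
    """Lengths of runs of consecutive fields taken from the same source frame."""
    return [len(list(group)) for _, group in groupby(field_sources)]


def detect_cadence(field_sources, max_period=5):
    """Return the shortest repeating run-length pattern, or None if none fits."""
    runs = run_lengths(field_sources)
    for period in range(1, max_period + 1):
        if len(runs) < 2 * period:
            break  # not enough data to confirm a pattern of this length
        if all(runs[i] == runs[i % period] for i in range(len(runs))):
            return tuple(runs[:period])
    return None


def pullup(field_sources):
    """Drop the repeats so each source frame appears exactly once."""
    return [source for source, _ in groupby(field_sources)]


fields = list("AABBBCCDDD")       # 2:3 telecined fields from frames A to D
print(detect_cadence(fields))     # the detected pattern
print(pullup(fields))             # back to the original frame sequence
```

Inverse telecine without pullup corresponds to stopping after `detect_cadence`: the frames are reconstructed correctly, but the repeats remain in the output, which is the behavior observed when frame stepping.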
Now that the terms are clear, let us take a look at how inverse telecining works in our candidates. The gallery below shows a screenshot while playing back the 2:3 pulldown version of the wedge pattern in S&M.
This clip checks the overall deinterlacing performance for film based material. As the wedges move, the narrow end of the horizontal wedge should have clear alternating black and white lines rather than blurry or flickering lines. The moire in the last quarter of the wedges can be ignored. Both wedges should also remain steady and not flicker for the length of the clip.
The surprising fact here is that the NVIDIA GT 430 is the only one to perfectly inverse telecine the clip. Even the 6570 fails in this particular screenshot. In this particular clip, the 6570 momentarily lost the cadence lock, but regained it within the next 5 frames. Even during HQV benchmarking, we found that the NVIDIA cards locked onto the cadence sequence much faster than the AMD cards.
Cadence detection is only part of the story. The deinterlacing quality is also important. In the next section, we will evaluate that aspect.
Mismatches in the display refresh rate and source frame rate are difficult to spot for the average HTPC user, particularly if the dropped or repeated frames are far apart. Bad deinterlacing performance, on the other hand, may easily ruin the HTPC experience even for the average user. From DVDs to recorded TV shows and even Blu-ray documentaries, interlaced content is quite common.
We have been using the Cheese Slices test to check up on deinterlacing performance in the past. Instead of just covering the cheese slice alone, we will present a set of four consecutive deinterlaced frames from the video for you to judge.
Before presenting the results, let us take a look at what the ideal deinterlacing output should look like (a screenshot from the progressive version of the Cheese Slices clip around the same timestamp).
NVIDIA GT 430 Cheese Slices Deinterlacing:
MSI GT 520 Cheese Slices Deinterlacing:
AMD 6450 Cheese Slices Deinterlacing:
MSI 6450 Cheese Slices Deinterlacing:
Sapphire 6570 Cheese Slices Deinterlacing:
The Cheese Slices test is an artificial test clip. To bring some real world perspective to the deinterlacing performance, let us take a look at some screenshots of the 'Ship' clip from the Spears and Munsil High Definition Benchmark Test Disc (hereon referred to as the S&M clip). It is supposed to test the edge adaptive deinterlacing capabilities of the GPU. Note the jaggies in the various ropes in the screenshot. The comparison covers the MSI GT 520, MSI 6450, NVIDIA GT 430 and Sapphire 6570.
Pay particular attention to the deinterlacing performance of the GT 520. Compared to the other 4 cards in the test, this one emerges as the worst of the lot in the Cheese Slices test as well as the real world video test. When this was brought to NVIDIA's attention, they indicated the lack of shaders on the GF119 as the main reason for this issue. It looks like driver updates are not going to solve the issue in the future either. This is a big letdown for the prospective customers of the GT 520. Even though the deinterlacing performance of the GT 430 looks pretty good, a closer look reveals that it is not as effective as AMD's vector adaptive deinterlacing strategy.
On the AMD side, it looks like the reduced core clock frequency and lessened DRAM bandwidth of the MSI 6450 don't affect the deinterlacing performance in the Cheese Slices test. Vector adaptive deinterlacing works across all the cards in the 6xxx lineup. However, AMD has admitted to some driver issues for local file playback in the DDR3 based 6450s. This is probably the reason for the bad performance of the MSI 6450 in the S&M ship clip above. In an informal blind test, a majority seemed to prefer the 6570's output in the clip above, but you can decide for yourself.
The Catalyst Control Center allows users to experiment with different deinterlacing algorithms, while the NVIDIA Control Panel doesn't. Admittedly, this choice of deinterlacing algorithms is of academic interest only. That said, the bad deinterlacing performance of the GT 520 and the fact that it is not going to improve in the future force us to declare AMD the winner in this area.
We will see in the 'DXVA Benchmarking' section that denoising is one of the more GPU intensive video post-processing tasks. To put that in perspective, let us take a look at the denoising performance of each card, and the factors which affect it.
In each of the galleries above, you can see a screenshot of a noisy video being played back with PowerDVD. The first shot shows the appearance of the video without denoising turned on. The second shot shows the performance with denoising enabled. For both cards, it can be seen that the denoising kicks in, as expected. This is also reflected in the relevant HQV benchmark section. With denoising turned on, note that the GPU load increases from 75% to 81% for the GT 520, while the corresponding increase in the GT 430 is much smaller.
Is it similarly straightforward to test the denoising performance on the AMD GPUs? Unfortunately, that is not the case. AMD has this nifty feature 'Enforce Smooth Video Playback' (ESVP) in the Catalyst Control Center.
Simply put, it just means that the drivers automatically turn off post processing features if they find that the card is not powerful enough to do it in real time. How well does this feature work? While we are on the topic of denoising, let us check up on that first.
The first shot shows the noisy video being played back with ESVP on and the denoising options turned off.
The second and third shots show the denoising options (Denoise and Mosquito Noise Reduction) taking effect. Note the GPU load increasing from 40% to 49%. The fourth shot in the above gallery shows that ESVP has no effect on denoising. Note that turning off ESVP increases the GPU load from 49% to 88%. This implies that some other post processing option was enabled in CCC, but didn't actually kick in because the card was too weak.
Moving on to the MSI 6450, the gallery below presents two shots.
The first one forces the denoising algorithms to take effect by disabling ESVP. Note that the GPU load rocketed up to 100%. The video became a slideshow soon enough. The second shot shows that ESVP is turned on, and the denoising algorithms are also turned on. It was quite evident that the denoising algorithms didn't take effect and the drivers silently turned off the denoising algorithms. This can also be inferred from the fact that enabling the denoising algorithms increased the GPU load to 100%.
AMD acknowledged the issue and indicated that they are working on a fix. I have little doubt that this is going to be resolved soon because the same files on a Blu-ray disc play back with all the post processing options. However, with the current drivers, the DDR3 based 6450 suffers heavily.
The Sapphire 6570 is, thankfully, not an ESVP mess like the 6450. The gallery below presents two shots.
The first one has ESVP on, but the denoising algorithms are off. The video is clearly noisy, and GPU utilization is pegged at 52%. In the second shot, ESVP is off (which means that almost all the video post processing algorithms except brightness level adjustments are forced to take effect). GPU utilization shoots up to 76%, but the end results are very good. It is a matter of personal taste, but the addition of mosquito noise reduction seems to make the AMD denoising results much better than NVIDIA's.
Let us come back to the ESVP mess on the 6450s. The intent of ESVP is to make sure that the decoder puts out the decoded frame within the required time. It should be OK to forsake any post processing steps in case the GPU is not able to keep up. We saw in the 'Custom Refresh Rates' section that both the 6450s were unable to keep up with 1080p60 H264 decoding. Those tests were run with ESVP turned on. The gallery below shows how the same video can be played back with all the post processing options turned off (including ESVP).
It is clear that the UVD engine in the 6450 can handle 1080p60 H264 decoding. It is a combination of ESVP and other post processing features which makes AVCHD clips unplayable on the 6450s. The last two shots in the gallery are from the MSI 6450. They show that 1080p60 H264 decode with all the CCC options turned off has a GPU load of 36%. Turning on ESVP makes it shoot up to 100% and results in jerky playback. This, however, has not yet been acknowledged by AMD as a problem.
In addition, the gallery below shows screenshots of a 1080p24 video being played back on the MSI 6450 (DDR3 based, lower core clock) in PowerDVD 11 and MPC-HC.
In both cases, GPU load regularly spikes up to 100% resulting in very noticeable stutters in the video playback. We were able to reproduce the problem with MPC-HC also. We suspect it is a combination of AMD's drivers as well as the lower core clock in the MSI 6450 which is causing this issue.
The takeaway from this section is that the AMD drivers need a lot of work with respect to ESVP on the 6450s. The denoising performance of both the NVIDIA cards is passable. I personally find AMD's denoising implementation (in the 6570) to be better. However, I strongly recommend that readers avoid the 6450s for some time to come.
The lack of a standardized HTPC GPU evaluation methodology always puts us in a quandary when covering the low end / integrated GPUs. Towards this, I had a long discussion with Andrew Van Til, Mathias Rauen and Hendrik Leppkes, all popular open source multimedia software developers. The methodology we developed is presented below.
The first step is to ensure that all the post processing steps work as expected. HQV benchmarking gives us an idea. Once the cards' post processed videos pass visual inspection, we need to gather an idea of how much time is left for the GPU to do further post processing activities. These may include specialized scaling algorithms, bit-depth etc. as implemented by custom MPC-HC shaders / renderers like madVR.
Deinterlacing and cadence detection are aspects which affect almost all HTPC users. Other aspects such as denoising, edge sharpening, dynamic contrast enhancement etc. are not needed in the mainstream HTPC user's usage scenario. Most mainstream videos being watched are either from a Blu-ray source or re-encoded offline or TV shows which need deinterlacing (if they are in 480i / 1080i format).
|Denoising OFF||Denoising ON|
The intent of the benchmark is to first disable all post processing and check how fast the decoder can pump out decoded frames. In the typical scenario, we expect post processing to take more time than the decoding. Identifying the stage which decides the throughput of the decoded frames can give us an idea of whether we can put in more post processing steps. This is similar to a pipeline whose operating frequency is decided by the slowest stage. We then enable post processing steps one by one and see how the throughput is affected.
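The pipeline analogy can be sketched as follows. The model assumes that decode and each post processing step behave like independent pipeline stages, so the sustained frame rate is set by the slowest one. The per-stage figures in the example are made-up placeholders, not measurements from any of the cards, and the function names are ours.

```python
def pipeline_throughput(stage_fps: dict) -> tuple:
    """Return (bottleneck stage, sustained fps) for a simple pipeline model."""
    bottleneck = min(stage_fps, key=stage_fps.get)
    return bottleneck, stage_fps[bottleneck]


def has_headroom(stage_fps: dict, source_fps: float) -> bool:
    """True if the slowest stage still keeps up with the source frame rate."""
    _, sustained = pipeline_throughput(stage_fps)
    return sustained >= source_fps


# Hypothetical per-stage rates in frames per second (illustrative only)
stages = {"decode": 140.0, "deinterlace": 95.0, "denoise": 52.0}

print(pipeline_throughput(stages))
print(has_headroom(stages, source_fps=24.0))
print(has_headroom(stages, source_fps=60.0))
```

With these placeholder numbers, denoising is the limiting stage: there is room for more post processing on 24 fps material, but not on 60 fps material. Enabling the steps one by one, as the benchmark does, is what reveals which stage that is on a real card.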
DXVA Checker enables us to measure the performance of the DXVA decoders. We use a standard set of 1080p / 1080i H.264, MPEG-2 and VC-1 clips. We also have 1080p DIVX / XVID and MS-MPEG4 clips. Cyberlink PowerDVD 11, Arcsoft Total Media Theater 5 and MPC-HC video decoders were registered under DirectShow. DXVA Checker was used to identify which codecs could take advantage of DXVA2 and were capable of rendering under EVR for the sample clips. An interesting aspect to note was that none of the codecs could process 1080i VC-1 or the MPEG-4 clips with DXVA2.
Note that the results in the next section list all the cards being tested. However, the 6450s and GT 520 shouldn't really be considered with seriousness because of the issues pointed out in the previous sections.
We ran the DXVA Checker benchmark on all the cards, but our graphing engine allows us to present only four series in each graph. This meant that we had to choose between the GDDR5 based AMD 6450 and the DDR3 based MSI 6450. Keeping in mind our focus on passively cooled GPUs, we went with the latter. It was observed that the 'No VPP (Video Post Processing)' frame rates were similar for both the candidates. However, as post processing algorithms were enabled, the MSI 6450 began to perform a bit worse than the AMD 6450. We will analyze the probable cause later. We were able to get DXVA2 acceleration with the EVR renderer for all codecs except the MPEG-4 variants.
First, we look at a 1080p H.264 clip. The MPC Video Decoder v184.108.40.20634 was able to play back the clip without issues on all the GPUs. There were a couple of surprises in store when the DXVA Checker benchmark (as described in the previous section) was run.
While the GT 430 was unable to reach the magical 60 fps benchmark figure (I expect any GPU worth its salt to be able to decode 1080p60 H264 clips), the GT 520 sprang a surprise with some insane decoding speeds. Even considering the fact that the GT 520 took shortcuts by skimping on the post processing, it comfortably beats every other GPU in the race. The 430's benchmark result was even more puzzling, considering the fact that all the 1080p60 AVCHD and re-encoded broadcast clips that we threw at it played back flawlessly. We talked to NVIDIA about this, and it looks like the culprit in this case was the bitrate. Our sample was a 40 Mbps clip at 1080p30. At 60 fps, the VPU engine would have had to process a sample at 80 Mbps, and apparently, the VP4 engine in the GT 430 is simply not capable of that. We are willing to cut NVIDIA some slack here, because I have personally not seen any real 1080p60 content at 80 Mbps. We will cover both of the above aspects in detail in the next section.
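The bitrate arithmetic behind this explanation can be sketched as follows. It assumes the benchmark simply pushes the same encoded frames through the decoder at a higher rate, so the bitrate the decoder has to sustain scales linearly with the frame rate. The function name is our own.

```python
def effective_bitrate_mbps(encoded_mbps, encoded_fps, decode_fps):
    """Bitrate the decoder must sustain when frames encoded for
    encoded_fps are consumed at decode_fps (linear scaling assumed)."""
    return encoded_mbps * decode_fps / encoded_fps


# The 40 Mbps 1080p30 sample, decoded at the 60 fps benchmark target.
print(effective_bitrate_mbps(40, 30, 60))
```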
With the exception of the 6450, we find that enabling various post-processing options doesn't bring down the decode frame rate. This shows that the latency of the post processing steps is completely hidden by the time spent in the UVD / VPU engines to obtain the decoded frame. For the 6450, we infer that the lower core clock for the stream processors slows down the post processing steps a bit too much.
For the 1080p VC-1 clip, we again use MPC Video Decoder v220.127.116.1134 for flawless play back.
We find that the NVIDIA GPUs hide their post processing latency in the time taken by the VPU engine. However, the 6570 shows a gradual decline in the throughput as various options are enabled. The decline is not as alarming as the 6450's, and manages to comfortably stay above 60 fps.
VLD acceleration for MPEG-2 was only recently introduced in the UVD 3 engine by AMD. The Microsoft DTV-DVD Video Decoder is able to provide DXVA2 acceleration for MPEG-2 clips.
It is not clear why turning on deinterlacing / cadence detection should affect the throughput of the decode of the progressive clip, but that is what we observe for all the candidates except the 6570. Compared to VC-1 and H.264 decoding which decided the throughput of the video pipeline, MPEG-2 is much easier on the UVD / VPU engine. This is reflected in the fact that the video post processing brings down the throughput quite a bit on all the GPUs.
Moving onto interlaced streams, we will consider a 1080i H.264 clip first.
As expected, deinterlacing definitely kicks in to lessen the throughput of the frames. Unlike the 1080p H.264 decode performance, we find that all the GPUs are now limited by how fast the post-processing can be done. This makes sense, since the UVD/VPU engine needs to operate on only half the usual vertical resolution for interlaced content (each field carries half the lines of a frame). Note that the 'frames per second' figure presented for the interlaced streams is actually 'fields per second' (a 1080i clip showing 29.97 fps with MediaInfo actually has 59.94 fields per second).
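The frames versus fields conversion is simple enough to check in a few lines. The sketch below uses the exact NTSC-derived frame rate of 30000/1001, which rounds to the 29.97 fps that MediaInfo reports. The function name is our own.

```python
from fractions import Fraction


def fields_per_second(frame_rate):
    """An interlaced frame consists of two fields."""
    return 2 * frame_rate


ntsc_frame_rate = Fraction(30000, 1001)  # shown as 29.97 fps
print(round(float(fields_per_second(ntsc_frame_rate)), 2))
```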
The interlaced MPEG-2 performance is as below:
Results are very similar to what we got for the interlaced H.264 clip. One can conclude that interlaced clips spend more time getting post-processed compared to the progressive clips, but that is hardly surprising.
We had noted earlier that DXVA2 / EVR wasn't enabled for interlaced VC-1 streams on any of the GPUs. However, with the checkactivate.dll hack (described in the LAV Splitter section), we were able to make Arcsoft Video Decoder appear in the list of codecs when the 'Check DirectShow / MediaFoundation Decoders' was used for interlaced VC-1 clips. Though it wasn't explicitly indicated that the support was DXVA2 using EVR, we did find that playing back the stream using EVR consumed almost nil CPU resources and kept the GPU / VPU engine quite busy. Presented below is the interlaced VC-1 performance using the Arcsoft Video Decoder in Total Media Theater v5.0.187
The takeaway from this section is that cards which run too close to the 60 fps limit with all post processing steps enabled should be avoided, unless there are some convincing reasons for that. The results also need to be taken in conjunction with the day-to-day usage experience. As mentioned before, the 6450 fails on both counts. The GT 520 fails the day-to-day usage test (deinterlacing performance). The GT 430 gets a recommendation despite weighing in at less than 60 fps for the 1080p H.264 stress stream. The 6570 is the hands down winner in this section. It is able to carry out all the post processing steps even when it is forced to process very stressful video streams.
In the previous section, I mentioned the bitrate limitations of the GT 430 when decoding 1080p H.264 clips. NVIDIA confirmed that the GT 430 couldn't decode 60 fps videos at 80 Mbps. This piqued my curiosity and I tried out a few experiments to find out whether bitrate limitations exist for the usual 1080p24 videos on both the GT 430 and GT 520.
The DXVA Checker benchmark was repeated for all the bitrate testing files found in the NMT test files up to 110 Mbps.
We also created our own suite of bitrate testing streams at 1080p60. Running them through the DXVA Checker benchmark yielded the following results.
The results are presented in a bar chart above (A line chart would have made much more sense, but the outer values get placed only for bar charts in our graphing engine). For 1080p24 streams, we find that the GT 430 is unable to keep up with the real time decode frame rate requirements at 110 Mbps. For 1080p60 streams, the limit gets further reduced to somewhere between 65 and 70 Mbps. The GT 520 has no such issues.
The above testing is only of academic interest, since there is no real 1080p24 content at 110 Mbps. Even 3D Blu-rays max out around 60 Mbps (and that includes the audio stream!), so users shouldn't really be concerned about this bitrate limitation of the GT 430.
The GT 520's scores above are more interesting. Even the high end GPUs such as the 460 and 560 are unable to achieve that frame rate. The answer was buried in the README for the latest Linux drivers. The GT 520 is the first (and only GPU as of now) to support the VDPAU Feature Set D.
We asked NVIDIA about the changes in the new VDPAU feature set and what it meant for Windows users. They indicated that the new VPU was a faster version, also capable of decoding 4K x 2K videos. This means that the existing dual stream acceleration for 1080p videos has now been bumped up to quad stream acceleration.
Though the GPU can decode 4K videos, it is unfortunately not able to output it through HDMI. Despite the HDMI controller being advertised as HDMI 1.4a, it doesn't implement the 4K x 2K resolution part of the standard. The lack of HDMI sinks which accept that resolution is another matter, but that should get resolved in the next few years.
Despite the GT 520's advanced VPU engine, the lack of shaders limits its post processing capabilities. With all post processing options enabled, the GT 520's GPU load was always between 60 and 80%. The memory controller load (DRAM bandwidth usage) was between 20 and 40%. Despite the headroom apparently available, NVIDIA indicated that there weren't enough shaders available to implement the more advanced deinterlacing algorithms.
We will now devote a couple of sections to open source / freeware software for HTPCs running Windows 7.
One of the main issues with the integrated GPUs in the Clarkdale/Arrandale and Sandy Bridge CPUs is the fact that the Intel HD Graphics engine doesn't play nice with open source software. Commercial Blu-ray playback software has woeful support for MKVs and HD audio bitstreaming from non-Blu-ray sources. Users of the Intel IGP are often left to implement a series of hacks (such as using the Arcsoft and Cyberlink decoders in other DirectShow players) to get the expected HTPC experience.
Windows 7 uses the DirectShow / Media Foundation framework for media processing. A detailed description of the framework is beyond the scope of this piece. However, what we need to specifically be aware of is the architecture of the DirectShow framework, as described here.
The two important components of the filter chain in the above link are the splitter and the decoder. MPC-HC is undoubtedly the most common DirectShow based media player used to play back HD material. The player aims to make the task of the filter construction transparent to the user by packaging a variety of source filters for various containers. These include the Gabest MPEG Splitter, internal Matroska Splitter (I remember preferring the Haali Matroska Splitter from the CCCP codec pack in late 2009), MP4 Splitter, OGG Splitter etc. A look at the standalone filters reveals the large number of standalone source filters / splitters which are in the self-contained executable.
It is preferable to have a single actively developed multi-format splitter capable of handling a wide variety of containers with a multitude of audio and video codecs / subtitle formats. This is where the LAV Splitter project comes in. Basic support for decrypted Blu-rays (playback of MPLS / index.bdmv files) is the icing on the cake.
With HD audio capable AV receivers becoming the rule rather than the exception, one increasingly finds users interested in HD audio bitstreaming. MediaSmartServer's Damian has a great guide dealing with the usage of ffdshow for HD audio bitstreaming. Advanced users often consider ffdshow Audio as bloatware for those who just want HD audio bitstreaming. LAV Splitter comes with the optional LAV Audio Decoder which achieves the same purpose. For users who want HD audio to be decoded in a bit-perfect manner, the LAV Splitter can also connect to the Arcsoft Audio Decoder as explained here. In this section, we will deal only with HD audio bitstreaming.
The gallery below shows the sequence of steps to install LAV Splitter and Audio Decoder on a Windows 7 machine for usage with MPC-HC. You can also use any other DirectShow based player. The LAV Audio Decoder is then configured to bitstream all the HD audio formats.
Ensure that all the file formats chosen in screenshot 5 are unselected in screenshot 8 (MPC-HC options -> Internal Filters -> Source Filter). Also, the Transform Filters for AC3 and DTS must be unselected to ensure that the LAV Audio Decoder is used. Screenshots 9 through 12 show the setting up of the LAV Audio Decoder for bitstreaming in the External Filters section using the 'Add Filter' button. Screenshots 13 through 16 show the setting up of the LAV Splitter in a similar manner. Fortunately, the default settings in the splitter configuration are good to go unless you have some specific requirements with respect to the language code or don't want the LAV Splitter to activate for some specific extensions. Make sure that both the LAV Splitter and the LAV Audio Decoder are set to 'Prefer' in the External Filters section.
Playing back interlaced VC-1 clips with the default codecs in MPC-HC usually results in a blank screen. To resolve this, the Arcsoft Video Decoder needs to be registered and used. (The WMV Decoder DMO codec is able to play back such clips, but it does software decode.) After installing Arcsoft TMT (TMT 5 was used to test this out), the Arcsoft Video Decoder (ASVid.ax) was manually registered using the regsvr32 command in Administrator mode. This exposes the decoder for inclusion in the External Filters section in MPC-HC, as in the first picture of the gallery below. However, setting the decoder to Preferred doesn't enable its usage unless the checkactivate.dll from this doom9 post gets placed in the same folder as ASVid.ax. After this process, an interlaced VC-1 clip can be loaded into MPC-HC. In the default configuration, you will still end up with a blank screen. The VC-1 output compatibility of the LAV Splitter needs to be configured to reflect the presence of the Arcsoft VC-1 decoder, as in the rest of the screenshots in the gallery below. Note that this configuration is necessary only for AMD (and Intel) GPUs. NVIDIA GPUs have an open source decoder capable of playing back interlaced VC-1 clips in any DirectShow based player. We will cover that in the next section.
Once configured, you should be able to see the filters in action during the playback of any media file in the appropriate container.
Playing The DaVinci Code's BR Folder Structure - index.bdmv
We have never seen the ffdshow Audio decoder successfully bitstream E-AC3 (Dolby Digital Plus). With the LAV Audio Decoder, there were no such issues.
Dolby Digital + Lights Up on the Onkyo 606
The ease of use of the LAV Audio Decoder, its tight integration with LAV Splitter and the ability to use the Arcsoft HD audio decoder for DTS-HD / DTS-ES streams (for which no open source decoders exist) make it a very attractive option for HTPC users.
The author of the LAV Splitter / Audio Decoder has another nifty tool coded up for HTPC users with NVIDIA cards. Based on the CUDA SDK, it is called LAV CUVID. The video decoder is not a typical CUDA API and does not use CUDA to decode. NVIDIA provides an extension to CUDA called CUVID, which just accesses the hardware decoder.
The only unfortunate aspect of LAV CUVID is that it is restricted to NVIDIA GPUs only. While OpenCL might be regarded as an open counterpart to CUDA, it does not provide a CUVID-like video decoder extension on its own. ATI/AMD has the OpenVideoDecode API, which is an extension to OpenCL. Despite being open, it hasn't gained as much traction as CUDA. The AMD APIs are also fairly new and probably not mature enough for developers to focus attention on them yet. Intel offers a similar API through their Media SDK. Again, the lack of support seems to turn away developers.
On Linux, there is the VA-API abstraction layer, which is natively supported by Intel, and has compatibility layers onto VDPAU (NVIDIA), and OVD (ATI/AMD). So, on Linux it is theoretically possible for developers to create a multi-format video decoder. But, there is no support for HD audio bitstreaming with Linux. For Windows, developers are forced to use DXVA(2) for multi-platform video decoding applications.
Is there an incentive for NVIDIA users to shift from the tried and tested MPC Video Decoder (which uses DXVA(2))? I personally use LAV CUVID as my preferred decoder on NVIDIA systems for the following reasons:
- Support for uniform hardware acceleration for multiple codecs: Theoretically, everything listed under the supported DXVA modes by DXVA Checker should be utilized by the software decoders. Unfortunately, that is not the case. This is evident when the 'Check DirectShow/MediaFoundation Decoders' feature is used to verify the compatibility of a MPEG-4 or interlaced VC-1 stream. The mode either comes out as 'Unsupported', or, it is active only under DXVA1 for VMR (Video Mixing Renderer). LAV CUVID doesn't show DXVA support under DXVA Checker (because it really doesn't use DXVA). However, analysis of the GPU/CPU load reveals that its performance and usage of the GPU are very similar to that of the DXVA2 decoders. Furthermore, all our GPU stress tests were hardware accelerated except for the MS-MPEG4 clip.
- Support for choice of renderer: For the average Windows 7 HTPC user, the EVR (Enhanced Video Renderer) is much better than VMR since it contains multiple enhancements which are beyond the scope of this piece.
Almost all DXVA2 decoders can connect to the EVR. Advanced HTPC users are more demanding and want to do more post-processing than what EVR provides. madVR enters the scene here, and has support for multiple post processing steps which we will cover further down in this section. However, it doesn't interface with DXVA decoders. The LAV CUVID decoder can interface to all these renderers, and is not restricted like the other DXVA2 decoders.
Starting with v0.8, LAV CUVID has an installation program. Prior to that, the filters had to be registered manually, as in the gallery below.
After downloading and extracting the archive, the installation batch script needs to be run with administrator privileges. If the filter gets successfully registered, your favorite DirectShow player can be configured to use LAV CUVID. The process setup for MPC-HC is shown in the gallery. Make sure that the internal transform filters for the codecs you want to decode with LAV CUVID are unselected. After that, add LAV CUVID in the External Filters section and set it to 'Prefer'.
Here is a sample screenshot with EVR CP statistics for a MKV file played back with LAV Splitter, Audio Decoder and LAV CUVID Decoder.
Now, let us shift our focus to madVR. It is a renderer replacement for EVR, and can be downloaded here. Currently, madVR does not do deinterlacing, noise reduction, edge enhancement and other post processing steps by itself. These need to be done before the frame is presented to madVR for rendering. When using a DXVA decoder, these steps are enabled from the NVIDIA or AMD control panel settings. With the LAV CUVID decoder, we get the post processing steps as enabled in the drivers. The decoded frames are copied back to the system RAM for madVR to use.
The madVR renderer uses the GPU pixel shader hardware for the following steps:
- Chroma upsampling
- High bit-depth color conversion
- Display calibration (optionally, if you have your own meter)
- Dithering of the internal calculation bit-depth (32bit+) down to the display bit-depth (8 bit)
All of these things are realized with a higher bit-depth and quality compared to what the standard GPU post processing algorithms do.
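To illustrate the last step in the list, here is a minimal sketch of dithering a high precision value down to an 8 bit integer. It uses simple one-dimensional error diffusion as a stand-in. We have not confirmed which dithering method madVR actually uses, so the algorithm and the function name should be treated as assumptions.

```python
import math


def dither_row_to_8bit(values):
    """Quantize floats in [0, 255] to integers, carrying the rounding
    error over to the next pixel (1-D error diffusion)."""
    out = []
    err = 0.0
    for v in values:
        target = v + err
        q = min(255, max(0, math.floor(target + 0.5)))
        err = target - q  # the part that did not fit is pushed to the next pixel
        out.append(q)
    return out


row = [100.25] * 8  # a flat area whose true level lies between two 8 bit codes
dithered = dither_row_to_8bit(row)
print(dithered, sum(dithered) / len(dithered))
```

Plain rounding would turn every pixel in this row into 100 and lose the fractional part. With the error carried forward, the output alternates between 100 and 101 so that the average over the row stays at 100.25.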
The gallery below gives an overview of how to install madVR and configure it appropriately.
After downloading and extracting the archive, run the installation batch script to register the renderer filter. By default, the MPC-HC 'Output' options has madVR grayed out under the 'DirectShow Video' section. After the registration of the madVR filter, it becomes possible to select this option. When you try to play a video with the new output settings, it is possible that a security warning pops up asking for permission to open the madVR control application. Allowing the application to run creates a tray icon to control the madVR settings as shown in the fifth screenshot in the gallery. Screenshots 6 through 12 show the various madVR post processing options.
madVR requires a very powerful GPU for its functioning. Do the GT 430 and GT 520 cut it for the full madVR experience? We will try to find that out in the next section.
LAV CUVID can be benchmarked using GraphStudio's inbuilt benchmark to check the video decoder performance. Unfortunately, GraphStudio can't use madVR in this process. Since our intent was to determine the performance of the GPU with and without madVR enabled, it was essential that madVR be a part of the benchmark. The developer of madVR, Mathias Rauen, created a special benchmarking build which was used to generate the figures in this section.
The picture below shows the madVR benchmark build working in the decode-only mode on the GT 430 for a 1080i60 H264 clip.
LAV CUVID is doing the actual decoding (that is not visible in the picture) and sending frames over to the madVR filter, but the filter just keeps track of the decode frame rate and doesn't render it. All the driver post processing steps are enabled. The interlaced clip being played back uses around 76% of the VPU. Decoding is being performed at 91 fps, much more than the clip's 60 fps rate. The GPU load is 79%, and that is because of the deinterlacing being performed using the shaders. This shows there is some headroom available in the GPU for further post processing. Is there enough for madVR? The picture below shows the benchmark build working in the decode + post processing mode.
Note that the frame rate falls below the real time requirement. At 52 fps, the renderer drops approximately 8 frames every second. The VPU load falls to 38% because the process is now limited by how fast the processing steps in madVR can execute. GPU-Z shows that madVR has caused the GPU load to hike up to 97%, and this becomes the bottleneck in the chain.
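The dropped frame estimate follows directly from the two frame rates. The sketch below assumes the renderer drops every frame it cannot deliver on time. The function name is our own.

```python
def dropped_frames_per_second(required_fps, achieved_fps):
    """Frames that cannot be shown each second when the chain is too slow."""
    return max(0, required_fps - achieved_fps)


print(dropped_frames_per_second(60, 52))  # decode + madVR post processing
print(dropped_frames_per_second(60, 91))  # decode-only mode keeps up
```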
Another interesting aspect to note in the GPU-Z screenshots above is that madVR increases the load on the GPU's memory controller from 23% to 36%. This is to be expected, as madVR makes multiple passes over the frame and needs to move data back and forth between the shaders and the GPU's DRAM.
The extent of drop in the frame rate (and whether it fails to meet real time requirements) is decided by the options enabled in the madVR settings. We ran the benchmarks with various madVR configurations and for various codecs to get an idea of the performance of LAV CUVID, madVR and of course, the GPUs.
Before moving on to the benchmarking results, we have some more notes about the upsampling algorithms in madVR. Human eyes are much less sensitive to chroma resolution than to luma resolution. This is the reason why chroma is stored in a lower resolution with 4:2:0 compression. Due to the low chroma resolution, chroma often tends to look blocky with visible aliasing (especially visible when you have e.g. red fonts on black background). Usually, the best way to upsample chroma is to use a very soft interpolator to remove all the aliasing. However, that comes at the cost of chroma sharpness. A less soft chroma upsampling algorithm will achieve sharpness. Basically, one can't have the cake and eat it too. So, it is a matter of taste as to whether one prefers removal of aliasing or wants a sharper picture.
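The trade-off can be seen even in a toy example. The sketch below computes the chroma plane size for 4:2:0 video and then doubles a single row of chroma samples in two ways: sample repetition, which keeps edges hard and blocky, and linear interpolation, which softens the edge. Real renderers work in two dimensions with proper chroma siting and better kernels, so these one-dimensional filters and their names are only illustrative assumptions.

```python
def chroma_plane_size(width, height):
    """4:2:0 stores each chroma plane at half the luma width and height."""
    return width // 2, height // 2


def upsample_nearest(row):
    """Repeat every sample: sharp, but edges stay blocky."""
    out = []
    for v in row:
        out.extend([v, v])
    return out


def upsample_linear(row):
    """Insert the average of neighbouring samples: smoother, but softer."""
    out = []
    for i, v in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else v
        out.append(v)
        out.append((v + nxt) / 2)
    return out


edge = [0, 0, 255, 255]  # a hard chroma edge, e.g. coloured text on black
print(chroma_plane_size(1920, 1080))
print(upsample_nearest(edge))
print(upsample_linear(edge))
```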
The default luma algorithm used by madVR is Lanczos. The default chroma algorithm is SoftCubic 100 (which is very soft). It is not recommended to set chroma upsampling to Lanczos or Spline as they are very sharp. The cost in performance is also too big to be worth the gain for chroma. SoftCubic, Bicubic or Mitchell-Netravali are suggested for chroma upsampling as they are all 2-tap and need less GPU resources. In any case, it is hard to spot differences between various chroma algorithms in most real life images.
For luma upsampling the situation is very different. Most people prefer sharp results. The luma algorithm has a much bigger impact on overall image quality than the chroma upsampling algorithm. For luma upscaling, the nice sharp Lanczos 4 or Spline 4 is preferred by some users. Some prefer the SoftCubic 50 because it does a better job at hiding source artifacts. Others prefer Mitchell-Netravali or Bicubic for a more all around solution. There is no hard recommendation for this.
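For readers curious about what makes Lanczos sharp, here is a minimal sketch of the standard Lanczos kernel, sinc(x) * sinc(x/a) for |x| < a. The negative lobes are what give the filter its sharpness (and its tendency to ring). How madVR's tap count maps onto the parameter a is not something we verify here, so the default of 3 is an assumption, as are the function names.

```python
import math


def lanczos(x, a=3):
    """Standard Lanczos kernel with window parameter a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights(frac, a=3):
    """Normalized weights of the 2*a source samples around a position
    that lies frac (0 <= frac < 1) past the sample at offset 0."""
    offsets = range(-a + 1, a + 1)
    raw = [lanczos(frac - k, a) for k in offsets]
    total = sum(raw)
    return [w / total for w in raw]


print(lanczos_weights(0.5))
```

A larger a means more source samples per output pixel, which is one reason the sharper settings cost more shader time.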
The madVR settings used for benchmarking were classified broadly into three categories:
- Low Quality : Bilinear luma and chroma scaling
- Medium Quality : Bicubic (sharpness 50) luma scaling and Bilinear chroma scaling
- High Quality : Lanczos (4-tap) luma scaling and SoftCubic (softness 70) chroma scaling
Scaling is one of the core functions in madVR, but it is not needed if the display resolution matches that of the video. In the 1080p and 1080i videos presented below, there is no scaling of luma, but chroma still needs to be upsampled. The 'trade quality for performance' madVR options didn't seem to improve performance too much, and all of them were kept unchecked for benchmarking.
In the graphs below, 'Full VPP' refers to all the video post processing options as set in the NVIDIA Control Panel. The other entries refer to the madVR settings described above. The top row in each graph indicates the performance of the LAV CUVID decoder. When compared with the benchmarks of the DXVA2 decoders (presented in an earlier section), we see that the LAV CUVID decoder has almost no performance penalty.
In the graphs below, we try to identify what causes the throughput to fall below 60 fps. First, let us take a look at the 1080p H.264 clip.
In the above graph, we see that the lack of shaders in the GT 520 affects the madVR performance. The madVR steps become the bottleneck in this case. On the GT 430, the VPU remains the bottleneck till the more complicated scaling algorithms (of theoretical interest) are enabled (which are not presented in the graph above).
We see the same trends continuing for MPEG-2 and VC-1 also. Now, we move on to get a first glimpse at the extent of hardware acceleration available for MPEG-4 streams.
As expected, we get decent hardware acceleration for MPEG-4 and the post processing impact is the same as that for the other codecs.
Interlaced streams don't seem to alter the trend. The absolute values of the maximum decode frame rate are slightly lower in the high stress cases due to the overhead from deinterlacing. The GT 430's efficiency is now limited by shader power, rather than the VPU.
How do things change when we try to upscale the non-1080p content onto a 1080p display? This is probably where madVR's algorithms are needed most. To test this out, we put some non-1080i/p H.264 clips through the same benchmark.
An interesting result in the above benchmark is that the 480i H.264 stream can be processed faster using the GT 430 compared to the GT 520 with madVR disabled. It is quite obvious here that the deinterlacing using the GT 520's shaders is the bottleneck once the VPU hits 300 fps.
In all of the above non-1080i/p benchmarks, the lack of shaders in the GT 520 really hurt it. At 720p60, the High Quality frame rate is very close to 60 fps, and can't be recommended. The GT 430 holds up pretty decently in all the cases.
The takeaway from this section is that the GT 520 is not entirely suitable for madVR processing if you deal with a lot of SD material. The GT 430 is quite suitable for madVR processing as long as you keep the settings sane.
madVR is still an advanced HTPC user's tool. However, it should gain further traction with support for integrated hardware decoding and other driver supplied post processing options. We have covered a solution for NVIDIA GPU based HTPCs in this section. Let us see how this plays out for the AMD and Intel GPU platforms in the future.
Before proceeding to the conclusions, let us deal with a couple of topics which didn't fit into any of the preceding sections.
First off, we have some power consumption numbers. In addition to idle power, we also measure the average power consumption of the testbed over a 15 minute interval when playing back a 1080p24 MKV file in MPC-HC.
HTPC Testbed Power Consumption
| | Idle Power Consumption (W) | Playback Power Consumption (W) |
|---|---|---|
| HTPC Testbed (Core i5-680) | 56.6 | 67.7 |
| NVIDIA GT 430 | 65.7 | 76 |
| MSI GT 520 | 67 | 73.4 |
There is not much to infer from the above power consumption numbers except that the GDDR5 based AMD 6450 needs to be avoided. All the cards idle around the same value. The AMD cards consume slightly more power when playing back the video.
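For the rows that appear in the table above, the extra power attributable to each card can be worked out by subtracting the bare testbed figures. This sketch assumes that the first row is the testbed without a discrete card and that the card rows are whole-system measurements. The variable and function names are our own.

```python
# Figures in watts, copied from the table above.
baseline = {"idle": 56.6, "playback": 67.7}
cards = {
    "NVIDIA GT 430": {"idle": 65.7, "playback": 76.0},
    "MSI GT 520": {"idle": 67.0, "playback": 73.4},
}


def added_power(card, base):
    """Extra power over the bare testbed, rounded to 0.1 W."""
    return {mode: round(card[mode] - base[mode], 1) for mode in base}


for name, figures in cards.items():
    print(name, added_power(figures, baseline))
```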
I am sure many readers are also interested in the performance of the GPUs for 3D videos. With the latest PowerDVD and Total Media Theater builds, all the 3D Blu-rays we tried played back OK. Beyond this, we didn't feel it necessary to devote time to develop a benchmarking methodology for 3D videos. There is no standardized way to store and transfer 3D videos. 3D Blu-ray ISOs are different from the 3D MKV standard, which, in turn, is different from the standards adopted by some of the camcorder manufacturers. In our personal opinion, the 3D ecosystem for HTPCs is still in a mess. It is no secret that NVIDIA has invested heavily in the 3D ecosystem. In addition to the support for 3D movies, they also supply software to view stereoscopic photographs. If you plan on connecting your HTPC to a 3D TV and also plan to invest in 3D cameras or camcorders, the NVIDIA GPUs are a better choice (purely from a support viewpoint). If all you want to do is to play back your 3D Blu-rays, any current GPU solution (Intel or AMD or NVIDIA) should be fine. Note that SBS/TAB (side-by-side/Top-and-Bottom) 3D streams (as used in TV broadcasts) are likely to have performance similar to that of the 2D 720p/1080i content.
From a broadcast perspective, MPEG-2 is a mature codec, but it is not very efficient at HD resolutions. H.264 is widely preferred. Current H.264 broadcast encoders take in the raw 4:2:2 10-bit data, but compress them using 8-bit 4:2:0 encoders. Recently, companies have put forward 10-bit 4:2:2 encoding [PDF] as a way to boost the efficiency of H.264 encoding. Unfortunately, none of the GPUs have support for decoding such streams (encoded with profile level High10). Considering that 10-bit 4:2:2 is finding acceptance within the professional community only now, we wouldn't fault the GPU vendors too much. However, x264 has started implementing 10-bit support now, making it possible for users to generate / back-up videos in the new profile. We would like GPU vendors to provide decode support for the High10 AVC profile as soon as possible in their mainstream consumer offerings.
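To put the two formats in perspective, the sketch below computes the uncompressed bits per pixel for a given bit depth and chroma format. It says nothing about the compressed bitrate, only about how much raw sample data each format carries. The function name is our own.

```python
# Chroma samples per luma sample, summed over both chroma planes.
CHROMA_PER_LUMA = {"4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}


def bits_per_pixel(bit_depth, chroma_format):
    """Uncompressed bits per pixel: one luma sample plus its share of chroma."""
    return bit_depth * (1 + CHROMA_PER_LUMA[chroma_format])


print(bits_per_pixel(8, "4:2:0"))   # what current broadcast encoders compress
print(bits_per_pixel(10, "4:2:2"))  # the proposed 10-bit 4:2:2 format
```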
Coming to the business end of the review, it must be quite clear by now that we can't recommend the GT 520 or the AMD 6450 with full confidence. They are probably doing well in the OEM market by getting incorporated into generic systems (not geared towards HTPC use). A discerning buyer building a HTPC system, having perused the various sections in this piece, would do well to avoid these two products.
Both AMD and NVIDIA GPUs suffer from a host of driver issues for the discerning HTPC user. Catalyst releases have been known to break GPU decoding in applications like VLC (something AMD has promised to fix in their next WHQL-certified driver). Supported refresh rates disappear from Catalyst if GPU scaling or ITC post processing is enabled. Different refresh rates default to different pixel formats on AMD cards. The HDMI audio driver maps the surround sounds in a 5.1 track to the rear surrounds in a 7.1 system on both NVIDIA and AMD cards. Both NVIDIA and AMD cards have been known to suffer from the silent stream bug at various points of time. Issues with RGB output levels and dithering resulting in banding artifacts on some displays have been reported on cards from both the vendors. The frustrating issue is that these problems get resolved in a particular driver release only to reappear in a later release. Unfortunately, issues like these are part and parcel of the HTPC experience. Both GPU vendors have a lot to learn from each other also.
If you prefer only AMD cards, the 6570 is the perfect HTPC card. The set of post processing options it provides is much broader than what NVIDIA offers. All post processing options are enabled irrespective of ESVP, even for 60 fps videos. It has the highest HQV benchmark score of any HTPC-oriented GPU that we have evaluated so far. We didn't encounter any bitrate limitations with video playback. The pesky 23.976 Hz refresh rate may be hit or miss depending on your setup, but it is way better than Intel's implementation. The lack of support for open source software developers and the pricing relative to the NVIDIA GT 430 are probably the only complaints we can file against the 6570.
If you prefer only NVIDIA cards, the GT 430 is the perfect HTPC card for which you can obtain a passively cooled model. For enthusiasts, the ideal card would be one having more shaders than the GT 430 (for better madVR processing) and also the new VPU engine. However, there is no card fitting those criteria in the market right now. Our first impressions of the GT 430 last October were not favorable. However, driver updates have finally brought the capabilities of the GPU to the fore. NVIDIA's support for the 3D ecosystem is better than AMD's. Support for custom refresh rates is a godsend for videophiles and advanced HTPC users. The extensive support from open source applications is a definite plus. It is no wonder that most multimedia application developers swear by NVIDIA cards. The video bitrate limitations (not something one would encounter in real life), the lack of comprehensive post processing options and the post processing results when compared to the AMD 6570 (quite subjective) are probably the only complaints we can file against the GT 430.
If you are not in either camp, I would suggest going with the GT 430, if only for the price. Just last week, Newegg had a deal for the GT 430 at $20 after mail-in rebate. At that price, the card is simply unbeatable. At $70 without rebates, it is a more difficult decision to make. The 6570, retailing around $75, is probably a more future-proof card and has a better out-of-the-box HTPC experience. If you are the type of person who likes to constantly tinker with your HTPC and gets excited by software tools which expose the HTPC capabilities of your GPU, go with the GT 430. If you are the install-it-and-forget-it type, a DDR3-based 6570 is the right card.