Original Link: http://www.anandtech.com/show/6335/amds-trinity-an-htpc-perspective
AMD's Trinity: An HTPC Perspective
by Ganesh T S on September 27, 2012 11:00 AM EST
Intel started the trend of integrating a GPU along with the CPU in the processor package with Clarkdale / Arrandale, and moved the GPU onto the die itself with Sandy Bridge. Despite having much more powerful GPUs at its disposal (thanks to the ATI acquisition), AMD was a little late to the CPU-GPU party. Its first full endeavour, the Llano APU (we're skipping Brazos / Zacate / Ontario, as those were netbook/nettop parts), was released towards the end of Q2 2011. The mobile version of the next-generation APU, Trinity, was launched in May 2012.
The desktop version of Trinity will be rolling out shortly. We have a separate gaming-centric piece with general observations; this piece deals with the HTPC aspects. Llano, while pretty decent for HTPC use, didn't excite us enough to recommend it wholeheartedly. Intel's Ivy Bridge, on the other hand, surprised us with its HTPC capabilities. In the rest of this review, we will see whether Trinity manages to pull things back for AMD on the HTPC front.
Some of the issues we had with Llano included differences in video post processing between Blu-ray and local videos, problems with the Enforce Smooth Video Playback (ESVP) feature, and driver bugs related to chroma upsampling. Our first step after setting up the Trinity HTPC testbed was to check on these issues. We are happy to report that advancements in software infrastructure, driver quality and, to some extent, the hardware itself have resolved most of them.
From a gaming viewpoint, the Trinity GPU is much better than Intel's HD4000. Does this translate into better performance for HTPC duties? As we will find out in the course of this piece, the answer isn't a resounding yes, but AMD does get some things right where Intel missed the boat.
In this review, we present our experience with Trinity as an HTPC platform using an AMD A10-5800K (with AMD Radeon HD 7660D). In the first section, we tabulate our testbed setup and detail the tweaks made in the course of our testing, along with a description of our software setup and configuration. Following this, we present the results from the HQV 2.0 benchmark and some notes about the driver fixes that have made us happy. A small section devoted to custom refresh rates is followed by some decoding and rendering benchmarks. No HTPC solution is completely tested without looking at its network streaming capabilities (Adobe Flash and Microsoft Silverlight performance). In the final section, we cover miscellaneous aspects such as power consumption and then proceed to the final verdict.
AMD provided us with an A10-5800K APU along with the Asus F2A85-M Pro motherboard for our test drive. Purists might balk at the idea of an overclockable 100 W TDP processor being used in tests intended to analyze HTPC capabilities. However, the A10-5800K comes with the AMD Radeon HD 7660D, the highest-end GPU in the Trinity lineup, so using it as the review platform shows readers the maximum HTPC capabilities of the lineup.
The table below presents the hardware components of our Trinity HTPC testbed:
Trinity HTPC Testbed Setup

| Component | Details |
|---|---|
| Processor | AMD A10-5800K - 3.80 GHz (Turbo to 4.2 GHz) |
| GPU | AMD Radeon HD 7660D - 800 MHz |
| Motherboard | Asus F2A85-M Pro uATX |
| OS Drive | OCZ Vertex2 120 GB |
| Memory | G.SKILL Ares Series 8GB (2 x 4GB) SDRAM DDR3 2133 (PC3 17000) F3-2133C9Q-16GAB CAS 9-11-10-28 2N |
| Optical Drives | ASUS 8X Blu-ray Drive Model BC-08B1ST |
| Case | Antec Skeleton ATX Open Air Case |
| Power Supply | Antec VP-450 450W ATX |
| Operating System | Windows 7 Ultimate x64 SP1 |
| Display / AVR | Acer H243H / Pioneer Elite VSX-32 + Sony Bravia KDL46EX720 |
The Trinity platform officially supports DDR3-1866 modules. For our testbed, we obtained a 16 GB DDR3-2133 Ares kit from G.Skill, which is rated beyond that official speed. Using this kit made it possible to study HTPC behaviour from a memory bandwidth perspective.
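Because the integrated GPU shares system memory with the CPU cores, the theoretical peak bandwidth at each memory speed mentioned in this review gives a rough sense of the headroom involved. The sketch below assumes a dual-channel configuration with 64-bit (8-byte) channels and computes theoretical peaks only; sustained bandwidth in practice is lower.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    Assumes every channel moves bus_bytes per transfer (64-bit DDR3 channels).
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Memory speeds that appear in this review: optimized default, official Trinity spec, kit rating.
for speed in (1600, 1866, 2133):
    print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

Under these assumptions, going from DDR3-1600 to DDR3-2133 raises the theoretical peak from 25.6 GB/s to about 34.1 GB/s, all of which the CPU cores and the GPU have to share.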
The software setup for the Trinity HTPC testbed involved the following:
Trinity HTPC Testbed Software Setup

| Component | Software |
|---|---|
| Blu-ray Playback Software | CyberLink PowerDVD 12 |
| Media Player | MPC-HC v18.104.22.16818 |
| Splitter / Decoder | LAV Filters 0.51.3 |
| Renderers | EVR / EVR-CP (integrated in MPC-HC v22.214.171.12418) |
The madVR renderer settings were fixed as below for testing purposes:
- Decoding features disabled
Deinterlacing set to:
- automatically activated when needed (activate when in doubt)
- automatic source type detection (i.e., the "disable automatic source type detection" option is left unchecked)
- only look at pixels in the frame center
- be performed in a separate thread
Scaling algorithms were set as below:
- Chroma upscaling set to SoftCubic with softness of 100
- Luma upscaling set to Lanczos with 4 taps
- Luma downscaling set to Lanczos with 4 taps
Rendering parameters were set as below:
- Start of playback (including post-seek) was delayed till the render queue filled up
- Automatic fullscreen exclusive mode was used
- A separate device and D3D11 were used for presentation
- CPU and GPU queue sizes were set to 32 and 24 respectively
- Under exclusive mode settings, the seek bar was enabled, the switch from windowed to exclusive mode was delayed by 3 seconds, and 16 frames were configured to be presented in advance. The GPU was set to flush after the intermediate render steps, after the copy to the back buffer and after D3D presentation. In addition, the GPU was set to wait (sleep) after the last render step.
Unlike with our Ivy Bridge setup, we found windowed mode to perform generally worse than exclusive mode. Also, none of the options to trade quality for performance were checked.
HTPC enthusiasts are often concerned about the quality of pictures output by the system. While this is a very subjective metric, we have been taking as objective an approach as possible. We have been using the HQV 2.0 benchmark in our HTPC reviews to identify the GPUs' video post processing capabilities. The HQV benchmarking procedure has been heavily promoted by AMD, but Intel also seems to be putting its weight behind it now.
The control panel for the Trinity GPU retains the host of options from earlier Catalyst releases. We used Catalyst 12.8 in our testing.
HQV scores need to be taken with a grain of salt. In particular, one must check the tests where the GPU lost points. If those tests don't reflect the reader's usage scenario, the handicap can probably be ignored. So, it is essential to compare the scores for each test, rather than just the total.
The HQV 2.0 test suite consists of 39 different streams divided into 4 different classes. For the Trinity HTPC, we used CyberLink PowerDVD 12 with TrueTheater disabled and hardware acceleration enabled to play back the HQV streams. The playback device was assigned a score for each stream, depending on how well it played. Each test was repeated multiple times to ensure that the correct score was assigned. The scoring details are available in the testing guide from HQV.
Blu-rays are usually mastered very carefully, and any video post processing (other than deinterlacing) that needs to be done is applied before the content is committed to disc. In this context, we don't think it is a great idea to run the HQV benchmark videos off the disc. Instead, we play the streams after copying them over to the hard disk. How does Trinity's score compare to what Llano and Ivy Bridge obtained at launch?
In the table below, we indicate the maximum score possible for each test and the score each GPU obtained. The HD4000 results are from the Core i7-3770K with the Intel 126.96.36.19996 drivers. The AMD 6550D was tested with Catalyst 11.6 (driver version 8.862 RC1).
HQV 2.0 Benchmark

| Test Class | Chapter | Tests | Max. Score | AMD 6550D (Local file) | Intel HD4000 | AMD 7660D |
|---|---|---|---|---|---|---|
| Video Conversion | Video Resolution | Dial | 5 | 4 | 5 | 5 |
| | | Dial with Static Pattern | 5 | 5 | 5 | 5 |
| | Film Resolution | Stadium 2:2 | 5 | 5 | 5 | 5 |
| | Overlay On Film | Horizontal Text Scroll | 5 | 5 | 3 | 5 |
| | | Vertical Text Scroll | 5 | 5 | 5 | 5 |
| | Cadence Response Time | Transition to 3:2 Lock | 5 | 5 | 5 | 5 |
| | | Transition to 2:2 Lock | 5 | 5 | 5 | 5 |
| | Multi-Cadence | 2:2:2:4 24 FPS DVCam Video | 5 | 5 | 5 | 5 |
| | | 2:3:3:2 24 FPS DVCam Video | 5 | 5 | 5 | 5 |
| | | 3:2:3:2:2 24 FPS Vari-Speed | 5 | 5 | 5 | 5 |
| | | 5:5 12 FPS Animation | 5 | 5 | 5 | 5 |
| | | 6:4 12 FPS Animation | 5 | 5 | 5 | 5 |
| | | 8:7 8 FPS Animation | 5 | 5 | 5 | 5 |
| | Color Upsampling Errors | Interlace Chroma Problem (ICP) | 5 | 2 | 5 | 5 |
| | | Chroma Upsampling Error (CUE) | 5 | 2 | 5 | 5 |
| Noise and Artifact Reduction | Random Noise | SailBoat | 5 | 5 | 5 | 5 |
| | Compression Artifacts | Scrolling Text | 5 | 3 | 5 | 5 |
| | Upscaled Compression Artifacts | Text Pattern | 5 | 3 | 3 | 3 |
| Image Scaling and Enhancements | Scaling and Filtering | Luminance Frequency Bands | 5 | 5 | 5 | 5 |
| | | Chrominance Frequency Bands | 5 | 5 | 5 | 5 |
| | Resolution Enhancement | Brook, Mountain, Flower, Hair, Wood | 15 | 15 | 15 | 15 |
| Adaptive Processing | Contrast Enhancement | Theme Park | 5 | 5 | 5 | 5 |
| | | Beach at Dusk | 5 | 5 | 5 | 5 |
| | | White and Black Cats | 5 | 5 | 5 | 5 |
| | Skin Tone Correction | Skin Tones | 10 | 7 | 7 | 7 |
We did some quick tests to ensure that all the post processing steps available for Blu-rays were also available for local files using MPC-HC's EVR renderer. AMD deserves kudos for being the only GPU vendor to get cadence detection for local files (with respect to the shredding of overlay text) correct. AMD is also the only vendor to offer mosquito noise reduction as an option in addition to the usual denoising setting. The chroma upsampling issue we noticed with Llano is gone. With a score of 199, the 7660D with the Catalyst 12.8 drivers becomes the best performer on the HQV front. It is possible that the 7750's scores have also improved with these drivers, but we will take a look at that in another piece.
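Since the per-test breakdown matters more than the total, a short script can list exactly where each GPU lost points. The sketch below transcribes the rows of the HQV table above. These rows add up to less than the 199 points quoted for the 7660D, so they do not cover the whole HQV 2.0 suite; the script therefore reports per-test losses rather than totals.

```python
# Per-test HQV 2.0 scores transcribed from the table above:
# (test, max score, AMD 6550D local file, Intel HD4000, AMD 7660D)
ROWS = [
    ("Dial", 5, 4, 5, 5),
    ("Dial with Static Pattern", 5, 5, 5, 5),
    ("Stadium 2:2", 5, 5, 5, 5),
    ("Horizontal Text Scroll", 5, 5, 3, 5),
    ("Vertical Text Scroll", 5, 5, 5, 5),
    ("Transition to 3:2 Lock", 5, 5, 5, 5),
    ("Transition to 2:2 Lock", 5, 5, 5, 5),
    ("2:2:2:4 24 FPS DVCam Video", 5, 5, 5, 5),
    ("2:3:3:2 24 FPS DVCam Video", 5, 5, 5, 5),
    ("3:2:3:2:2 24 FPS Vari-Speed", 5, 5, 5, 5),
    ("5:5 12 FPS Animation", 5, 5, 5, 5),
    ("6:4 12 FPS Animation", 5, 5, 5, 5),
    ("8:7 8 FPS Animation", 5, 5, 5, 5),
    ("Interlace Chroma Problem (ICP)", 5, 2, 5, 5),
    ("Chroma Upsampling Error (CUE)", 5, 2, 5, 5),
    ("SailBoat", 5, 5, 5, 5),
    ("Scrolling Text", 5, 3, 5, 5),
    ("Text Pattern", 5, 3, 3, 3),
    ("Luminance Frequency Bands", 5, 5, 5, 5),
    ("Chrominance Frequency Bands", 5, 5, 5, 5),
    ("Brook, Mountain, Flower, Hair, Wood", 15, 15, 15, 15),
    ("Theme Park", 5, 5, 5, 5),
    ("Beach at Dusk", 5, 5, 5, 5),
    ("White and Black Cats", 5, 5, 5, 5),
    ("Skin Tones", 10, 7, 7, 7),
]
GPUS = ("AMD 6550D (local file)", "Intel HD4000", "AMD 7660D")


def points_lost(rows, gpu_index):
    """Map each test where the GPU fell short of the maximum to the points it lost."""
    return {
        test: max_score - scores[gpu_index]
        for test, max_score, *scores in rows
        if scores[gpu_index] < max_score
    }


for i, gpu in enumerate(GPUS):
    print(f"{gpu}: {points_lost(ROWS, i)}")
```

For the 7660D, this leaves only Text Pattern (2 points) and Skin Tones (3 points), both of which the HD4000 and the 6550D lost as well.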
However, this doesn't mean that AMD's drivers are perfect. By default, the EVR / EVR-CP renderers rely on the drivers to supply the correct video levels to the display. As the screenshots below indicate, the meanings of full and limited seem to be interchanged.
Dynamic Range Set to Full for Video Playback (background)
Dynamic Range Set to Limited for Video Playback (background)
Setting the dynamic range to Limited (16-235) exposes the below-black (0-15) and above-white (236-255) levels, while setting it to Full (0-255) actually clips those levels in the video in the background. This was one of the more obvious bugs that we encountered in our review process.
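The two behaviours described above correspond to two simple 8-bit level mappings: passing video levels through untouched, which keeps the 0-15 and 236-255 bars visible, and stretching the nominal 16-235 video range to 0-255, which clips them. The sketch below illustrates both. It assumes 8-bit values and plain rounding, and it makes no claim about how AMD's driver actually implements either setting.

```python
def passthrough(level: int) -> int:
    """Output the 8-bit video level unchanged; 0-15 and 236-255 stay distinguishable."""
    return level


def expand_limited_to_full(level: int) -> int:
    """Stretch nominal video range 16-235 to PC range 0-255, clipping anything outside it."""
    full = round((level - 16) * 255 / 219)
    return max(0, min(255, full))


# A few levels from a typical black/white clipping test pattern (illustrative values).
for level in (0, 8, 16, 128, 235, 245, 255):
    print(level, passthrough(level), expand_limited_to_full(level))
```

With passthrough, a level of 8 stays at 8; with the expansion, it (and everything below 16) maps to 0, and everything above 235 ends up at 255.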
Many users tend to avoid Intel GPUs because they lack accurate video output refresh rates. Intel has still not come out with its promised update to bring 23.976 Hz refresh to Ivy Bridge. AMD has historically been able to provide quite accurate refresh rates, while NVIDIA gives users the ability to make fine-grained adjustments to their settings.
How does Trinity fare? The short story is that the display refresh rate is not as accurate as we would like, but it is still much better than what Intel offers. NVIDIA cards, when configured correctly, can probably provide better accuracy. We are not sure whether this is an issue specific to the Asus board or a problem with the drivers / the processor's video output itself. Setting the display refresh rate to 23 Hz yields 23.977 Hz, as shown below.
Other refresh rates also suffer from similar problems. The gallery below shows some of the other refresh rates that we tested.
An interesting point to note here is that AMD is able to drive 25 Hz, 29 Hz and 30 Hz refresh rates on the Sony KDL46EX720 through the Pioneer Elite VSX-32. In the same setup, NVIDIA and Intel don't present these settings in progressive format. That said, both Intel and NVIDIA offer 50 Hz, 59 Hz and 60 Hz settings, which are exactly double the above rates (clarification: 29 Hz in the control panel corresponds to a refresh rate of 29.97 Hz, and 59 Hz in the panel corresponds to a refresh rate of 59.94 Hz).
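The practical cost of a small refresh rate error can be estimated from the mismatch between the content frame rate and the display refresh rate: each time the two drift apart by a full frame, the renderer has to repeat (or drop) a frame. The sketch below assumes 23.976 Hz film content is exactly 24000/1001 fps and uses the 23.977 Hz measured above; the 24.000 Hz case is a hypothetical comparison point, not a measurement from this review. It also checks the 1000/1001 relationship behind the 29 Hz and 59 Hz settings.

```python
from fractions import Fraction

FILM = Fraction(24000, 1001)     # "23.976" fps film content
NTSC_30 = Fraction(30000, 1001)  # what the "29 Hz" setting stands for (29.97 Hz)
NTSC_60 = Fraction(60000, 1001)  # what the "59 Hz" setting stands for (59.94 Hz)


def seconds_between_corrections(content_fps: float, display_hz: float) -> float:
    """Seconds until display and content drift apart by one whole frame.

    The mismatch accumulates (display_hz - content_fps) frames every second, so
    after 1 / |mismatch| seconds the renderer must repeat a frame (display
    faster than content) or drop one (display slower than content).
    """
    mismatch = abs(display_hz - content_fps)
    return float("inf") if mismatch == 0 else 1.0 / mismatch


measured = seconds_between_corrections(float(FILM), 23.977)  # Trinity, measured above
exact_24 = seconds_between_corrections(float(FILM), 24.000)  # hypothetical 24.000 Hz output

print(f"23.977 Hz: one correction every {measured:.0f} s (~{measured / 60:.0f} min)")
print(f"24.000 Hz: one correction every {exact_24:.1f} s")
print("59.94 Hz is exactly double 29.97 Hz:", NTSC_60 == 2 * NTSC_30)
```

At the measured 23.977 Hz, this works out to roughly one repeated frame every 17 minutes, versus one every 42 seconds or so at a hypothetical 24.000 Hz.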
It would be nice to have more control over the display refresh rate, similar to what NVIDIA provides. That would help users fine-tune their settings in case the out-of-the-box behaviour doesn't match their expectations.
In the last few HTPC reviews, we have incorporated video decoding and rendering benchmarks. The Ivy Bridge review carried a table of values with the CPU and GPU usage. The Vision 3D 252B review made use of HWInfo's sensor graphs to provide a better perspective; there, it was easier to visualize the extent of stress that a particular video decode + render combination placed on the system. Unfortunately, HWInfo doesn't play well with the A10-5800K / Radeon HD 7660D yet. In particular, GPU loading and CPU package power aren't available for AMD-based systems.
The tables below present the results of running our HTPC rendering benchmark samples through various decoder and renderer combinations. Entries marked with a single star had dropped frames according to the renderer status reports in the quiescent state, while entries marked with double stars dropped enough frames to make the video unwatchable. The recorded values include the GPU loading and the power consumed by the system at the wall. Note that the system was set to optimized defaults in the BIOS (GPU at 800 MHz, DRAM at 1600 MHz and CPU cores at 3800 MHz).
madVR was configured with the settings mentioned in the software setup page. All the video post processing options in the Catalyst Control Center were disabled except for deinterlacing and pulldown detection. In our first pass, we used a pure software decoder (avcodec / wmv9 dmo, through LAV Video Decoder) to supply madVR with the decoded frames.
LAV Video Decoder Software Fallback + madVR

| Stream | GPU Usage % | Power Consumption |
|---|---|---|
| 480i60 MPEG-2 | 38 | 77.9 W |
| 576i50 H.264 | 24 | 68.2 W |
| 720p60 H.264 | 49 | 106.6 W |
| 1080i60 H.264 | 81 | 128.1 W |
| 1080i60 MPEG-2 | 85 | 115.4 W |
| 1080i60 VC-1 | 84 | 131.7 W |
| 1080p60 H.264 | 51 | 116.6 W |
madVR takes up more than 80% of the GPU's resources when processing 60 fps interlaced material. The software decode penalty is reflected in the power consumed at the wall, with the 1080i60 VC-1 stream consuming more than 130 W on average. The good news is that all the streams played without any dropped frames with the optimized default settings.
The holy grail of HTPCs, in our opinion, is to obtain hardware accelerated decode for as many formats as possible. A year or so back, it wasn't possible to use any hardware decoders with the madVR renderer. Thanks to Hendrik Leppkes's LAV Filters, we now have a DXVA2 Copy-Back (DXVA2CB) decoder which enables usage of DXVA2 acceleration with madVR. The table below presents the results using DXVA2CB and madVR.
LAV Video Decoder DXVA2 Copy-Back + madVR

| Stream | GPU Usage % | Power Consumption |
|---|---|---|
| 480i60 MPEG-2 | 44 | 76.8 W |
| 576i50 H.264 | 24 | 66.2 W |
| 720p60 H.264 | 54 | 102.4 W |
| 1080i60 H.264 ** | 72 | 111.1 W |
| 1080i60 MPEG-2 * | 82 | 111.8 W |
| 1080i60 VC-1 * | 84 | 111.6 W |
| 1080p60 H.264 ** | 64 | 110.4 W |
There is a slight improvement in power consumption for the first few streams. We still pay a bit of a power penalty compared to pure hardware decode, because the decoded frames have to be copied back to system memory and then sent back to the GPU for madVR to process. Unfortunately, none of the 1080i60 / 1080p60 streams played properly with our optimized default settings, which renders their GPU usage and power consumption values meaningless. Boosting the memory speed to DDR3-2133 reduced the number of dropped frames, but we were unable to make these four streams play perfectly even with non-default settings.
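To put the copy-back round trip in perspective, the sketch below estimates how much decoded video data moves between the GPU and system memory each second. It assumes 8-bit 4:2:0 NV12 surfaces of exactly 1920x1080 (real decoders may pad the height, e.g. to 1088 lines), nominal rates of 30 decoded frames per second for 1080i60 and 60 for 1080p60, and one copy in each direction. It sizes the traffic only and does not model how efficiently the driver performs the copy.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit 4:2:0 NV12 frame: a full-size luma plane plus a
    half-size interleaved chroma plane (1.5 bytes per pixel)."""
    return width * height * 3 // 2


def copyback_mb_per_s(width: int, height: int, frames_per_s: float) -> float:
    """MB/s (10^6 bytes) for the round trip GPU -> system memory -> GPU."""
    one_way = nv12_frame_bytes(width, height) * frames_per_s
    return 2 * one_way / 1e6


# 1080i60 carries 60 fields, i.e. 30 interlaced frames, per second (nominal).
for name, fps in (("1080i60", 30), ("1080p60", 60)):
    print(f"{name}: ~{copyback_mb_per_s(1920, 1080, fps):.0f} MB/s round trip")
```

Under these assumptions, each 1080p frame is about 3.1 MB, so 1080p60 content generates roughly twice the copy-back traffic of 1080i60 content.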
For non-madVR renderers, we set Catalyst 12.8 to the default settings. The table below presents the results obtained with LAV Video Decoder set to DXVA2 Native mode. All the streams played perfectly, but the power numbers left us puzzled.
LAV Video Decoder DXVA2 Native + EVR-CP

| Stream | GPU Usage % | Power Consumption |
|---|---|---|
| 480i60 MPEG-2 | 26 | 78.1 W |
| 576i50 H.264 | 22 | 78.1 W |
| 720p60 H.264 | 38 | 90.1 W |
| 1080i60 H.264 | 69 | 103.9 W |
| 1080i60 MPEG-2 | 69 | 102.2 W |
| 1080i60 VC-1 | 69 | 104.2 W |
| 1080p60 H.264 | 60 | 98.4 W |
For SD streams, the power consumed is almost as much as with madVR and software decode, although the HD streams pull the numbers back a little. A full investigation is outside the scope of this article, but we did dig a bit further by repeating the tests with the EVR renderer.
With Catalyst 12.8 at default settings and LAV Video Decoder set to DXVA2 Native mode, all the streams played perfectly with EVR, and at low power consumption. All post processing steps were also visible (as enabled in the drivers).
LAV Video Decoder DXVA2 Native + EVR

| Stream | GPU Usage % | Power Consumption |
|---|---|---|
| 480i60 MPEG-2 | 27 | 60.6 W |
| 576i50 H.264 | 25 | 60.1 W |
| 720p60 H.264 | 35 | 65.7 W |
| 1080i60 H.264 | 67 | 80.1 W |
| 1080i60 MPEG-2 | 67 | 80.6 W |
| 1080i60 VC-1 | 67 | 82.5 W |
| 1080p60 H.264 | 59 | 79.2 W |
A look at the above table indicates that hardware decode with the right renderer can make for a really power-efficient HTPC. With DXVA2 Native decode, the choice of renderer alone (EVR-CP vs. EVR) accounts for more than 20 W in some cases, and the difference between software decode with madVR and hardware decode with EVR reaches nearly 50 W (1080i60 VC-1).
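The renderer and decoder gaps can be read straight off the tables above. The sketch below copies the wall-power figures for the three configurations in which every stream played without dropped frames (the DXVA2 Copy-Back results are left out because several of those streams dropped frames) and computes the per-stream differences.

```python
# Average wall power (W) per stream, copied from the tables above.
STREAMS = ["480i60 MPEG-2", "576i50 H.264", "720p60 H.264", "1080i60 H.264",
           "1080i60 MPEG-2", "1080i60 VC-1", "1080p60 H.264"]
SW_MADVR = [77.9, 68.2, 106.6, 128.1, 115.4, 131.7, 116.6]     # software decode + madVR
NATIVE_EVR_CP = [78.1, 78.1, 90.1, 103.9, 102.2, 104.2, 98.4]  # DXVA2 Native + EVR-CP
NATIVE_EVR = [60.6, 60.1, 65.7, 80.1, 80.6, 82.5, 79.2]        # DXVA2 Native + EVR

# Same decoder, different renderer: EVR-CP minus EVR.
renderer_gap = {s: round(cp - evr, 1) for s, cp, evr in zip(STREAMS, NATIVE_EVR_CP, NATIVE_EVR)}
# Software decode + madVR minus hardware decode + EVR.
decode_gap = {s: round(sw - evr, 1) for s, sw, evr in zip(STREAMS, SW_MADVR, NATIVE_EVR)}

for s in STREAMS:
    print(f"{s:15s}  EVR-CP vs EVR: {renderer_gap[s]:5.1f} W   "
          f"SW+madVR vs DXVA2+EVR: {decode_gap[s]:5.1f} W")
print("Largest gaps:", max(renderer_gap.values()), "W and", max(decode_gap.values()), "W")
```

The largest renderer gap is 24.4 W (720p60 H.264), and the largest software-versus-hardware gap is 49.2 W (1080i60 VC-1).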
Flash acceleration has traditionally worked without issues in AMD and NVIDIA drivers, unlike Intel. Intel and Adobe got it right with Ivy Bridge. Fortunately, things look good with Trinity too. As the screenshot below indicates, we have full GPU acceleration for both decoding and rendering. AMD's System Monitor shows how the CPU and GPU resources are balanced when playing H.264 Flash videos.
Netflix streaming, on the other hand, uses Microsoft's Silverlight technology. Unlike Flash, hardware acceleration for the video decode process is not controlled by the user. It is up to the server side code to attempt GPU acceleration. Thankfully, Netflix does try to take advantage of the GPU's capabilities.
This is evident from the A/V stats recorded while streaming a Netflix HD video at the maximum possible bitrate of 3.7 Mbps. The high GPU usage in the AMD System Monitor also points to hardware acceleration being utilized.
One point that deserves mention is that Flash and Silverlight acceleration works without hiccups here, unlike what we saw in the Brazos-based machines (where the CPU was too weak despite the availability of hardware acceleration through the GPU).
Before proceeding to the business end of the review, let us take a look at some power consumption numbers. The G.Skill RAM was set to DDR3 1600 during the measurements. We measured the average power drawn at the wall under different conditions. In the table below, the Blu-ray movie from the optical disk was played using CyberLink PowerDVD 12. The Prime95 + Furmark benchmark was run for 1 hour before any measurements were taken. The MKVs were played back from a NAS attached to the network. The testbed itself was connected to a GbE switch (as was the NAS). In all cases, a wireless keyboard and mouse were connected to the testbed.
Trinity HTPC Power Consumption

| Scenario | Power at the Wall |
|---|---|
| Prime95 + Furmark (Full loading) | 172.1 W |
| Blu-ray from optical drive | 93.1 W |
| Blu-ray ISO from NAS | 62.3 W |
| 1080p24 MKV Playback (MPC-HC + DXVA2 + EVR-CP) | 55.8 W |
| 1080p24 MKV Playback (MPC-HC + DXVA2 + madVR) | 58.3 W |
The Trinity platform ticks all the checkboxes for the mainstream HTPC user. Setting up MPC-HC with LAV Filters was a walk in the park. With good and stable support for DXVA2 APIs in the drivers, even software like XBMC can take advantage of the GPU's capabilities. Essential video processing steps such as chroma upsampling, cadence detection and deinterlacing work beautifully. For advanced users, the GPU is capable of supporting madVR for most usage scenarios even with DDR3-1600 memory in the system (provided DXVA is not used for decoding the video). Ivy Bridge wasn't a slam-dunk in this scenario even with software decode.
Does this signify the end of the road for the discrete HTPC GPU? Unfortunately, no. The Trinity platform is indeed much better than Llano, and can match or even surpass Ivy Bridge. However, it is not future-proof. While AMD will end up pleasing a large HTPC audience with Trinity, there are still a number of areas that AMD seems to have overlooked:
- Despite the rising popularity of 10-bit H.264 encodes, the GPU doesn't seem to support decoding them in hardware. That said, software decoding of 1080p 10-bit H.264 is not complex enough to overwhelm the A10-5800K (but that may not be true for the lower end CPUs).
- Full hardware decode of MVC 3D videos is not available. 3D Blu-rays have a slightly greater power penalty as a result. However, 3D is fast becoming an 'also-ran' feature, and we don't really fault Trinity for not having full acceleration.
- The video industry is also pushing 4K, which makes more sense to a lot of people than the 3D push. 4K should see a much faster rate of adoption than 3D, but Trinity seems to have missed the boat here. AMD's Southern Islands as well as NVIDIA's Kepler GPUs support 4K output over HDMI, but Trinity has neither 4K video decode acceleration nor 4K display output over HDMI.
Our overall conclusion is that discrete GPUs for HTPC use are only necessary if one plans to upgrade to 4K in the near term, or if the user is set on using madVR for 1080i60 content. Otherwise, the Trinity platform has everything that a mainstream HTPC user would ever need.