The NVIDIA SHIELD Android TV Review: A Premium 4K Set Top Box
by Ganesh T S on May 28, 2015 3:00 PM EST
Tegra X1: The Heart Of the SHIELD Android TV
Along with being NVIDIA’s first entry into the set top box/console market, the SHIELD Android TV also marks the launch of the latest generation Tegra SoC from NVIDIA, the Tegra X1. Formerly known by the codename Erista, NVIDIA first announced the Tegra X1 back at the company’s annual CES mobile presentation. In the long run NVIDIA has lofty plans for Tegra X1, using it to power their ambitious automotive plans – Drive CX visualization and Drive PX auto-pilot – but in the short run Tegra X1 is the successor to Tegra K1, and like K1 is meant to go into mobile devices like tablets and now set top boxes.
We’ve already covered Tegra X1 in quite some depth back at its announcement in January. But I wanted to recap the major features now that it’s finally shipping in a retail device, and in the process highlight what NVIDIA has capitalized on for the SHIELD Android TV.
The Tegra X1 is something of a crash project for NVIDIA, as NVIDIA originally planned for the codename Parker SoC to follow Tegra K1. However with Parker delayed – I suspect due to the fact that it’s scheduled to use a next-generation FinFET process, which is only now coming online at TSMC – Tegra X1 came in on short notice to take its place. The significance of Tegra X1’s rapid development is that it has influenced NVIDIA’s selection of features, and what they can pull off on the TSMC 20nm process.
This is most obvious on the CPU side. NVIDIA of course develops their own ARMv8 CPU core, Denver. However the rapid development of Tegra X1 meant that NVIDIA was faced with the decision to either try to port a Denver design to 20nm – something that was never originally planned – or to go with an off-the-shelf ARM CPU design that was already synthesized on 20nm, and NVIDIA chose the latter. By pairing up a readily available CPU design with their own GPU design and supporting logic, NVIDIA was able to get Tegra X1 developed in time to roll out in 2015.
The end result is that for Tegra X1, NVIDIA has tapped ARM’s Cortex-A57 and Cortex-A53 CPU cores for their latest SoC. The use of standard ARM cores makes it a bit harder for NVIDIA to stand apart from the likes of Samsung and Qualcomm, both of which use A57/A53 as well, but as A57 is a very capable ARMv8 design it’s not a bad place to be in on the whole.
Overall NVIDIA is using a quad-A57 + quad-A53 design, similar to other high-end SoCs. The A57s have been clocked at 2.0GHz, on par with some of the other A57 designs we’ve seen, though we’ve been unable to get confirmation on the clockspeed of the A53 cores. Rather than the somewhat standard big.LITTLE configuration one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling, the latter being the approach that exposes all eight cores to userspace applications. It’s important to note that NVIDIA’s solution is cache coherent, so this system won’t suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.
Throwing in an extra bonus in NVIDIA’s favor of course is the fact that the SHIELD Android TV is a set top box and not a mobile device, meaning it has no power limitations and essentially unlimited thermal headroom. We’ll see how this plays out in benchmarking, but the biggest impact here is that NVIDIA won’t have to fight with TSMC’s 20nm process too much for SHIELD Android TV, and can keep their A57s consistently clocked high.
Meanwhile feeding Tegra X1’s CPU cores and GPU is a new 64-bit LPDDR4-3200 memory interface, which is attached to a sizable 3GB of RAM. LPDDR4 offers a good mix of bandwidth increases and power consumption reduction through a lower 1.1V operating voltage, and is quickly being adopted by the industry as a whole. Otherwise NVIDIA’s choice to stick to a 64-bit memory bus is expected, though it continues to be an interesting choice as it requires that they fully exploit their memory bandwidth efficiency capabilities, as other SoCs geared towards tablets and larger devices (e.g. Apple A8X) come with wider memory buses.
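To put the memory change in numbers, here is a minimal sketch of the theoretical peak bandwidth calculation. It assumes that the speed grade in a name like LPDDR4-3200 is the transfer rate in MT/s and that the full bus width is usable on every transfer; the function name `peak_bandwidth_gbs` is ours, and real-world sustained bandwidth will be lower than these figures.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every transfer moves bus_width_bits of data, which is an
    upper bound rather than a measured figure.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Memory configurations as listed for the two SHIELD devices.
k1_bandwidth = peak_bandwidth_gbs(64, 1866)  # SHIELD Tablet, LPDDR3-1866
x1_bandwidth = peak_bandwidth_gbs(64, 3200)  # SHIELD Android TV, LPDDR4-3200

print(f"Tegra K1: {k1_bandwidth:.1f} GB/s")
print(f"Tegra X1: {x1_bandwidth:.1f} GB/s")
print(f"Increase: {x1_bandwidth / k1_bandwidth:.2f}x")
```

On these assumptions the move to LPDDR4 works out to roughly a 1.7x increase in peak bandwidth without widening the bus.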
On the GPU side, Tegra X1 features one of NVIDIA’s Maxwell architecture GPUs (the X being for maXwell, apparently). As a GPU architecture Maxwell marks the start of something of a new direction for NVIDIA, as NVIDIA designed it in what they call a mobile-first fashion. By starting in mobile and scaling up to the desktop, NVIDIA is integrating deep power optimizations into their GPU architectures at an earlier stage, achieving better power efficiency than they would by scaling down desktop GPUs. This has also led to the gap between desktop and SoC implementations of NVIDIA’s latest and greatest GPUs shrinking, with Tegra X1 showing up only a bit more than a year after Maxwell first appeared in desktop GPUs.
In any case, with Maxwell already shipping in desktops, it has proven to be a powerful and formidable GPU, both on an absolute performance basis and on a power efficiency basis. Though it’s a bit of circular logic to say that NVIDIA is intending to exploit these same advantages in the SoC space as they have in the desktop space – after all, Maxwell was designed for SoCs first – Maxwell’s capabilities are clearly established at this point. So from a marketing/branding perspective, NVIDIA is looking to capitalize on that for Tegra X1 and the SHIELD Android TV.
Overall the X1’s GPU is composed of 2 Maxwell SMMs inside a single GPC, for a total of 256 CUDA cores. On the resource backend, NVIDIA has gone from 4 ROPs on Tegra K1 to 16 on X1, which won’t lead to anything near a 4x performance increase, but it is very important for NVIDIA’s desires to be able to drive a 4K display at 60Hz. And while NVIDIA isn’t listing the clockspeeds for the SHIELD Android TV, we believe it to be at or close to 1GHz based on past statements and the device’s very high thermal threshold.
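As a rough illustration of why the ROP count matters for 4K, the sketch below compares a theoretical peak pixel fill rate against the pixel rate of a 4Kp60 display. The assumptions are ours rather than NVIDIA’s: one pixel written per ROP per clock, a 1.0GHz GPU clock (our estimate above), and Tegra K1 evaluated at that same clock purely for a like-for-like comparison. Real fill rate is further limited by memory bandwidth and blending, so treat this as an upper bound.

```python
def peak_fill_rate(rops: int, clock_hz: float) -> float:
    """Theoretical pixels per second, assuming 1 pixel per ROP per clock."""
    return rops * clock_hz


def display_pixel_rate(width: int, height: int, refresh_hz: int) -> int:
    """Pixels per second needed to touch every displayed pixel once per frame."""
    return width * height * refresh_hz


ASSUMED_CLOCK_HZ = 1.0e9  # assumption: estimated clock, not an official figure
uhd60 = display_pixel_rate(3840, 2160, 60)

for name, rops in (("Tegra K1", 4), ("Tegra X1", 16)):
    fill = peak_fill_rate(rops, ASSUMED_CLOCK_HZ)
    passes = fill / uhd60  # full-screen passes available per displayed frame
    print(f"{name}: {fill / 1e9:.0f} Gpixels/s peak, ~{passes:.0f} passes per 4Kp60 frame")
```

The number of full-screen passes per frame is the useful figure here: games draw over the same pixel many times per frame, so quadrupling that budget is what makes 4K at 60Hz plausible even though overall performance does not scale 4x.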
Meanwhile from a feature standpoint Maxwell is as modern a SoC GPU as you’re going to find. Derived from a desktop GPU, it features support for all modern Android APIs and then some. So not only does this include OpenGL ES 3.1 and the Android Extension Pack, but it supports full desktop OpenGL 4.5 as well. On paper it is also capable of supporting Khronos’s forthcoming low-level Vulkan API, though we’re still a bit early to be talking about Vulkan on mobile platforms.
Also introduced on the Maxwell architecture – and by extension Tegra X1 – is NVIDIA’s latest generation of color compression technology, which significantly reduces NVIDIA’s memory bandwidth needs for graphics workloads. NVIDIA’s memory bandwidth improvements are in turn going to be very important for Tegra X1 since they address one of the biggest performance bottlenecks facing SoC-class GPUs. Memory bandwidth has long been a bottleneck at higher performance levels and resolutions, and while it’s a solvable problem, the general solution is to build a wider (96-bit or 128-bit) memory bus, which is very effective but also drives up the cost and complexity of the SoC and the supporting hardware. In this case NVIDIA is sticking to a 64-bit memory bus, so memory compression is very important for NVIDIA to help drive X1. This, coupled with a generous increase in memory bandwidth from the move to LPDDR4, helps to ensure that X1’s more powerful GPU won’t immediately get starved at the memory stage.
The other major innovation here is support for what NVIDIA calls “double speed FP16”, otherwise known as packed FP16 support. By packing together two compatible low-precision FP16 operations, NVIDIA is able to double their FP16 throughput per CUDA core relative to the Tegra K1, which coupled with the overall increase in CUDA cores leads to a very significant improvement in potential FP16 performance. Though this feature is perhaps most strongly aimed at NVIDIA’s Drive platforms, Android itself and a good chunk of Android games still use a large number of FP16 operations in the name of power efficiency, so this further plays into X1’s capabilities, and helps NVIDIA stretch X1’s performance a bit further for gaming on the SHIELD Android TV.
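The peak throughput figures follow from simple arithmetic, sketched below. The assumptions are that each CUDA core retires one fused multiply-add (2 FLOPs) per clock, that packed FP16 doubles this to 4 FLOPs per core per clock on X1, and that the GPU runs at 1.0GHz (our estimate, not a confirmed clock). The helper names are ours.

```python
FLOPS_PER_FMA = 2  # one fused multiply-add counts as two floating point operations


def peak_gflops(cuda_cores: int, clock_ghz: float, ops_packed: int = 1) -> float:
    """Theoretical peak GFLOPS; ops_packed=2 models packed (double speed) FP16."""
    return cuda_cores * clock_ghz * FLOPS_PER_FMA * ops_packed


def implied_clock_ghz(gflops: float, cuda_cores: int) -> float:
    """Clock implied by a quoted FP32 peak, under the same FMA assumption."""
    return gflops / (cuda_cores * FLOPS_PER_FMA)


X1_CORES, X1_CLOCK_GHZ = 256, 1.0  # clock is an assumption
x1_fp32 = peak_gflops(X1_CORES, X1_CLOCK_GHZ)
x1_fp16 = peak_gflops(X1_CORES, X1_CLOCK_GHZ, ops_packed=2)

# Tegra K1 has no packed FP16, so its FP16 and FP32 peaks are the same.
k1_clock = implied_clock_ghz(365, 192)

print(f"Tegra X1 FP32: {x1_fp32:.0f} GFLOPS")
print(f"Tegra X1 FP16: {x1_fp16:.0f} GFLOPS")
print(f"Tegra K1 implied clock: {k1_clock:.2f} GHz")
```

With these inputs the sketch reproduces the 512 GFLOPS FP32 and 1024 GFLOPS FP16 peaks quoted for Tegra X1, which is consistent with a GPU clock of about 1GHz. Note that the FP16 figure is only reachable when the compiler can pair two compatible FP16 operations.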
Last but certainly not least are Tegra X1’s media capabilities, which more than anything else are the heart and soul of the SHIELD Android TV. By being one of the newest SoCs on the block, the Tegra X1 is also one of the most capable SoCs from a media standpoint, which is allowing NVIDIA to come out of the gate as the flagship Android TV device.
Chief among these is support for everything NVIDIA needs to drive 4K TVs. Tegra X1 and SHIELD Android TV support HDMI 2.0, allowing them to drive TVs at up to 4Kp60 with full quality 4:4:4 chroma (that is, without chroma subsampling). NVIDIA also supports the latest HDCP 2.2 standard, which going hand-in-hand with HDMI 2.0 is (unfortunately) required by 4K streaming services such as Netflix to protect their content, as they won’t stream 4K to devices lacking this level of DRM.
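To show why 4Kp60 with 4:4:4 chroma needs HDMI 2.0, the sketch below works through the link arithmetic. It assumes the standard CTA-861 timing for 3840x2160 at 60Hz (4400 x 2250 total pixels including blanking), 8 bits per color component, and TMDS 8b/10b coding across three data channels. The link limits used are the commonly quoted raw maximums of 10.2Gbps for HDMI 1.4 and 18Gbps for HDMI 2.0.

```python
# Total frame size including blanking for 3840x2160p60 (CTA-861 timing).
H_TOTAL, V_TOTAL, REFRESH_HZ = 4400, 2250, 60

TMDS_CHANNELS = 3       # one channel per color component at 4:4:4
BITS_PER_CHARACTER = 10  # 8 data bits are sent as a 10-bit TMDS character

HDMI_1_4_MAX_RAW_BPS = 10_200_000_000
HDMI_2_0_MAX_RAW_BPS = 18_000_000_000

pixel_clock_hz = H_TOTAL * V_TOTAL * REFRESH_HZ
raw_link_rate_bps = pixel_clock_hz * TMDS_CHANNELS * BITS_PER_CHARACTER

print(f"Pixel clock:   {pixel_clock_hz / 1e6:.0f} MHz")
print(f"Raw link rate: {raw_link_rate_bps / 1e9:.2f} Gbps")
print(f"Fits HDMI 1.4: {raw_link_rate_bps <= HDMI_1_4_MAX_RAW_BPS}")
print(f"Fits HDMI 2.0: {raw_link_rate_bps <= HDMI_2_0_MAX_RAW_BPS}")
```

The required rate lands just under the HDMI 2.0 ceiling and well above what HDMI 1.4 can carry, which is why older 4K devices fall back to 30Hz or to subsampled chroma.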
On the backend of things, Tegra X1 brings with it support for H.264, VP9, and H.265 (HEVC) decoding. The latter two are just now appearing in SoCs, and as higher efficiency codecs they are going to be the codecs of choice for 4K streaming. Tegra X1 is capable of decoding all of these codecs at up to 4K resolution at 60fps, ensuring that it can decode not just 24fps movie content, but 30fps and 60fps TV content as well. As one final benefit, NVIDIA is also supporting full hardware decoding of 10-bit (Main 10) H.265 video, which means that the Tegra X1 and SHIELD Android TV will be capable of handling higher quality, higher bit depth content, including forthcoming HDR video.
NVIDIA SHIELD SoC Comparison

| |SHIELD Tablet (Tegra K1)|SHIELD Android TV (Tegra X1)|
|---|---|---|
|CPU|4x Cortex A15r3 @ 2.2GHz|4x Cortex A57 @ 2.0GHz + 4x Cortex A53 @ ? GHz|
|GPU|Kepler, 1 SMX (192 CUDA Cores)|Maxwell, 2 SMMs (256 CUDA Cores)|
|Memory|2 GB, LPDDR3-1866|3 GB, LPDDR4-3200|
|Memory Bus Width|64-bit|64-bit|
|FP16 Peak|365 GFLOPS|1024 GFLOPS|
|FP32 Peak|365 GFLOPS|512 GFLOPS|
|Manufacturing Process|TSMC 28nm|TSMC 20nm SoC|
Taken overall, the use of the Tegra X1 puts the SHIELD Android TV in a very interesting position. From a raw graphics standpoint the system is arguably overpowered for basic Android TV functionality. Even though this is a SoC-class Maxwell implementation, the basic Android TV UI does not heavily consume resources, a design decision mindful of what most other SoCs are capable of. But this also means that NVIDIA should have no trouble keeping the Android TV UI moving along at 60fps, and if they do struggle then it would certainly raise some questions given just how powerful Tegra X1’s GPU is.
Gaming on the other hand still needs all the GPU processing power it can get, and to that end NVIDIA is delivering quite a bit. NVIDIA still has to live with the fact that Tegra X1 isn’t close to the performance of the current-generation consoles, with SoCs having just recently surpassed the last-generation consoles. But by being the most powerful SoC in the Android TV space, Tegra X1 means NVIDIA can at least deliver an experience similar to (and likely a bit better than) the last-generation consoles, which is still a big step up.
Otherwise from a media decode standpoint, Tegra X1 is the perfect fit for the device that will be the flagship Android TV box. By supporting all of the latest codecs and display standards, NVIDIA is in a good position going forward to work with the increasing number of 4K TVs and the various over-the-top media services that will be utilizing H.265 to drive their 4K streaming. The fact that NVIDIA is pushing media capabilities so hard for today’s launch is not a mistake, as it’s likely to be their most useful advantage early in the device’s lifetime.