NVIDIA Tegra X1 Preview & Architecture Analysis

Name: NVIDIA Tegra X1 Preview & Architecture Analysis
Item: NVIDIA Tegra X1 Preview & Architecture Analysis

by Joshua Ho & Ryan Smith on January 5, 2015 1:00 AM EST

194 Comments | Add A Comment

194 Comments

Automotive: DRIVE CX and DRIVE PX

While NVIDIA has been a GPU company throughout the entire history of the company, they will be the first to tell you that they know they can’t remain strictly a GPU company forever, and that they must diversify themselves if they are to survive over the long run. The result of this need has been a focus by NVIDIA over the last half-decade or so on offering a wider range of hardware and even software. Tegra SoCs in turn have been a big part of that plan so far, but NVIDIA of recent years has become increasingly discontent as a pure hardware provider, leading to the company branching out in unusual ways and not just focusing on selling hardware, but selling buyers on whole solutions or experiences. GRID, Gameworks, and NVIDIA’s Visual Computing Appliances have all be part of this branching out process.

Meanwhile with unabashed car enthusiast Jen-Hsun Huang at the helm of NVIDIA, it’s slightly less than coincidental that the company has also been branching out in to automotive technology as well. Though still an early field for NVIDIA, the company’s Tegra sales for automotive purposes have otherwise been a bright spot in the larger struggles Tegra has faced. And now amidst the backdrop of CES 2015 the company is taking their next step into automotive technology by expanding beyond just selling Tegras to automobile manufacturers, and into selling manufacturers complete automotive solutions. To this end, NVIDIA is announcing two new automotive platforms, NVIDIA DRIVE CX and DRIVE PX.

DRIVE CX is NVIDIA’s in-car computing platform, which is designed to power in-car entertainment, navigation, and instrument clusters. While it may seem a bit odd to use a mobile SoC for such an application, Tesla Motors has shown that this is more than viable.

With NVIDIA’s DRIVE CX, automotive OEMs have a Tegra X1 in a board that provides support for Bluetooth, modems, audio systems, cameras, and other interfaces needed to integrate such an SoC into a car. This makes it possible to drive up to 16.6MP of display resolution, which would be around two 4K displays or eight 1080p displays. However, each DRIVE CX module can only drive three displays. In press photos, it appears that this platform also has a fan which is likely necessary to enable Tegra X1 to run continuously at maximum performance without throttling.

NVIDIA showed off some examples of where DRIVE CX would improve over existing car computing systems in the form of advanced 3D rendering for navigation to better convey information, and 3D instrument clusters which are said to better match cars with premium design. Although the latter is a bit gimmicky, it does seem like DRIVE CX has a strong selling point in the form of providing an in-car computing platform with a large amount of compute while driving down the time and cost spent developing such a platform.

While DRIVE CX seems to be a logical application of a mobile SoC, DRIVE PX puts mobile SoCs in car autopilot applications. To do this, the DRIVE PX platform uses two Tegra X1 SoCs to support up to twelve cameras with aggregate bandwidth of 1300 megapixels per second. This means that it’s possible to have all twelve cameras capturing 1080p video at around 60 FPS or 720p video at 120 FPS. NVIDIA has also made most of the software stack needed for autopilot applications already, so there would be comparatively much less time and cost needed to implement features such as surround vision, auto-valet parking, and advanced driver assistance.

In the case of surround vision, DRIVE PX is said to deliver a better experience by improving stitching of video to reduce visual artifacts and compensate for varying lighting conditions.

The valet parking feature seems to build upon this surround vision system, as it uses cameras to build a 3D representation of the parking lot along with feature detection to drive through a garage looking for a valid parking spot (no handicap logo, parking lines present, etc) and then autonomously parks the car once a valid spot is found.

NVIDIA has also developed an auto-valet simulator system with five GTX 980 GPUs to make it possible for OEMs to rapidly develop self-parking algorithms.

The final feature of DRIVE PX, advanced driver assistance, is possibly the most computationally intensive out of all three of the previously discussed features. In order to deliver a truly useful driver assistance system, NVIDIA has leveraged neural network technologies which allow for object recognition with extremely high accuracy.

While we won’t dive into deep detail on how such neural networks work, in essence a neural network is composed of perceptrons, which are analogous to neurons. These perceptrons receive various inputs, then given certain stimulus levels for each input the perceptron returns a Boolean (true or false). By combining perceptrons to form a network, it becomes possible to teach a neural network to recognize objects in a useful manner. It’s also important to note that such neural networks are easily parallelized, which means that GPU performance can dramatically improve performance of such neural networks. For example, DRIVE PX would be able to detect if a traffic light is red, whether there is an ambulance with sirens on or off, whether a pedestrian is distracted or aware of traffic, and the content of various road signs. Such neural networks would also be able to detect such objects even if they are occluded by other objects, or if there are differing light conditions or viewpoints.

While honing such a system would take millions of test images to reach high accuracy levels, NVIDIA is leveraging Tesla in the cloud for training neural networks that are then loaded into DRIVE PX instead of local training. In addition, failed identifications are logged and uploaded to the cloud in order to further improve the neural network. Both of these updates can be done either over the air or at service time, which should mean that driver assistance will improve with time. It isn’t a far leap to see how such technology could also be leveraged in self-driving cars as well.

Overall, NVIDIA seems to be planning for the DRIVE platforms to be ready next quarter, and production systems to be ready for 2016. This should mean that it's possible for vehicles launching in 2016 to have some sort of DRIVE system present, although it's possible that it would take until 2017 to see this happen.

GPU Performance Benchmarks Final Words

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

194 Comments

View All Comments

chizow - Monday, January 5, 2015 - link
Careful, you do mean A8X right? Because Denver K1 is an actual product that absolutely stomps A8, only after Apple somewhat unexpectedly "EnBiggened" their A8 by increasing transistors and functional units 50%, did they manage to match K1's GPU and edge the CPU in multi-core (by adding a 3rd core).

To say Denver K1 didn't deliver is a bit of a joke, since it is miles ahead of anything on the Android SoC front, and only marginally bested in CPU due to Apple's brute-force approach with A8X while leveraging 20nm early. We see that once the playing field has been leveled 20nm, its no contest in favor of Tegra X1.
Jumangi - Monday, January 5, 2015 - link
I mean a product that is widely available to CONSUMERS dude. And please stop with the "stomping" stuff. It means nothing about its performance with its also vastly higher power consumption. The A8 can exist in a smartphone. What smartphones have the K1? Oh that's right none because you would get a hour of use before your battery was dead. Mobile is about performance and speed. You can diss Apple all you want but from an SoC perspective they do it better than anyone else right now.
pSupaNova - Tuesday, January 6, 2015 - link
@Jumangi try to comprehend what he is saying.
Apple used a superior process on its A8X and more transistors to just edge the K1 in some CPU benchmarks. While core for core Nvida's is actually more powerful.
The GPU in the K1 also has near desktop parity etc OpenGL 4.4. Features like Hardware Tessellation are absent from the A8X.
Alexey291 - Tuesday, January 13, 2015 - link
That's great. It really is but lets be honest. A8x is faster than K1.

And end of the day that is sadly all that matters to the vaaaaaast majority of consumers.

Frankly even that barely matters. What does though is that games run better on my tablet than they do on yours so to speak. (Actually likely they run better on yours since I'm still using a nexus 10 xD)

But sure the new paper launch from nv late this year our early next year will be great and the 2.5 devices that x1 will appear in will be amazing. Making sales in hundreds of thousands.
SM123456 - Sunday, February 1, 2015 - link
The point is that the Tegra K1 Denver on 28nm beats the Apple A8 fairly comprehensively on 20nm with the same number of cores. Apple stuck on 50% more cores and 50% more transistors to allow the A8X on 20nm to have a slight edge over the Tegra K1 Denver. This means if Tegra K1 is put on 20nm, it will beat the 3 core Apple A8X with two cores, and the same thing will happen when both move to 16nm.
utferris - Monday, April 13, 2015 - link
Oh. Really? Denver K1 is not even as fast as A8X. Do not mention that it uses more than 2 times energy. I really do not understand people like you going around and saying how good nvidia shit is.
eanazag - Wednesday, January 7, 2015 - link
It'll likely be in the next Shield.
name99 - Monday, January 5, 2015 - link
(1) I wouldn't rave too enthusiastically about Denver. You'll notice nV didn't...
Regardless of WHY Denver isn't on this core, the fact that it isn't is not a good sign. Spin it however you like, but it shows SOMETHING problematic. Maybe Denver is too complicated to shift processes easily? Maybe it burns too much power? Maybe it just doesn't perform as well as ARM in the real world (as opposed to carefully chosen benchmarks)?

(2) No-one gives a damn about "how many GPU cores" a SoC contains, given that "GPU core" is a basically meaningless concept that every vendor defines differently. The numbers that actually matter are things like performance and performance/watt.

(3) You do realize you're comparing a core that isn't yet shipping with one that's been shipping for three months? By the time X1 actually does ship, that gap will be anything from six to nine months. Hell, Apple probably have the A9/A9X in production TODAY at the same level of qualification as X1 --- they need a LONG manufacturing lead time to build up the volumes for those massive iPhone launches. You could argue that this doesn't matter since the chip won't be released until September except that it is quite likely that the iPad Pro will be launched towards the end of Q1, and quite likely that it will be launched with an A9X, even before any Tegra X1 product ships.
chizow - Tuesday, January 6, 2015 - link
@Name99

1) Huh? Denver is still one of Nvidia's crowning achievements and the results speak for themselves, fastest single-core ARM performance on the planet, even faster than Apple's lauded Cyclone. Why it isn't in this chip has already been covered, its a time to market issue. Same reason Nvidia released a 32-bit ARM early and 64-bit Denver version of Tegra K1 late, time to market. Maybe, in the tight 6 month window they would have needed between bringing Denver and working on Erista, they simply didn't have enough time for another custom SoC? I'm not even an Apple fan and I was impressed with Cyclone when it was first launched. But suddenly, fastest single-core and a dual-core outperforming 4 and even 8-core SoC CPUs is no longer an impressive feat! That's interesting!

2) Actually, anyone who is truly interested does care, because on paper, a 6-core Rogue XT was supposed to match the Tegra K1 in theoretical FLOPs performance. And everyone just assumed that's what the A8X was when Apple released the updated SoC that matched TK1 GPU performance. The fact it took Apple a custom 8-core variant is actually interesting, because it shows Rogue is not as efficient as claimed, or conversely, Tegra K1 was more efficient (not as likely since real world synthetics match their claimed FLOPs counts). So if 6 core was supposed to match Tegra K1 but it took 8 cores, Rogue XT is 33% less efficient than claimed.

3) And you do realize, only a simpleton would expect Nvidia to release a processor at the same performance level while claiming a nearly 2x increase in perf/w right? There's live demos and benchmarks of their new X1 SoC for anyone at CES to test, but I am sure the same naysayers will claim the same as they did for the Tegra K1 a year ago, saying it would never fit into a tablet, it would never be as fast as claimed yada yada yada.

Again, the A9/A9X may be ready later this year, but the X1 is just leveling the playing field at 20nm, and against the 20nm A8/X we see it is no contest. What trick is Apple going to pull out of its hat for A9/A9X since they can't play the 20nm card again? 16nm FinFET? Possible, but that doesn't change the fact Apple has to stay a half step ahead just to remain even with Nvidia in terms of performance.
lucam - Wednesday, January 7, 2015 - link
1) He was saying: why NV didn't continue with Denver design? Being so efficient and only 2 cores why don't shift at 20nn easily? Because they can't and that's it. The other things are speculations.

2) You still compare apple (not Apple) with pears. Any vendors put inside his proprietary technology with their market strategy, important is to figure how GFLOPS and Texel is capable at same frequency and watt. You don't even know how Img cluster is built and nobody does and you still compare with NV cuda cores. Rogue XT frequency is set at 200mhz, Tegra K1 at 950mhz. Again what the heck you re talking about.

3) it is still a prototype type with a fan and nobody could check all the real frequency even though 1ghz seem reasonable. Hod dare you can compare a tablet with a reference board?

Again A9/A9X already exist now as prototypes, Apple doesn't sell chips and doesn't to any those sort of market. They need to see their product in a cycle year life. You live in another planet to not understand that.

NVIDIA Tegra X1 Preview & Architecture Analysis

Automotive: DRIVE CX and DRIVE PX

Post Your Comment

194 Comments

View All Comments

chizow - Monday, January 5, 2015 - link

Jumangi - Monday, January 5, 2015 - link

pSupaNova - Tuesday, January 6, 2015 - link

Alexey291 - Tuesday, January 13, 2015 - link

SM123456 - Sunday, February 1, 2015 - link

utferris - Monday, April 13, 2015 - link

eanazag - Wednesday, January 7, 2015 - link

name99 - Monday, January 5, 2015 - link

chizow - Tuesday, January 6, 2015 - link

lucam - Wednesday, January 7, 2015 - link

Log in

Don't have an account? Sign up now