NVIDIA Optimus Unveiled

Optimus is switchable graphics on steroids, but how does it all work and what makes it so much better than gen2? If you refer back to the last page where we discussed the problems with generation two switchable graphics, Optimus solves virtually every one of the complaints. Manual switching? It's no longer required. Blocking applications? That doesn't happen anymore. The 5 to 10 second delay is gone, with the actual switch taking around 200 ms—and that time is hidden in the application launch process, so you won't notice it. Finally, there's no flicker or screen blanking when you switch between IGP and dGPU. The only remaining concern is the frequency of driver updates. NVIDIA has committed to rolling Optimus into their Verde driver program, which means you should get at least quarterly driver updates, but we're still looking forward to the day when notebook and desktop drivers all come out at the same time.

As we mentioned, most of the work that went into Optimus is on the software side of things. Where the previous switchable graphics implementations used hardware muxes, all of that is now done in software. The trick is that NVIDIA's Optimus driver is able to look at each running application and decide whether it should use discrete graphics or the IGP. If an application can benefit from discrete graphics, the GPU is "instantly" powered up (most of the 200 ms delay is spent waiting for voltages to stabilize), the GPU does the necessary work, and the final result is then copied from the GPU frame buffer into the IGP frame buffer over the PCI Express bus. This is how NVIDIA is able to avoid screen flicker, and they have apparently done all of the background work using standard API calls so that there's no need to worry about updating drivers for both graphics chips simultaneously. They're a bit tight-lipped about the precise details of the software implementation (with a patent pending for what they've done), but we can at least go over the high-level view and block diagram as we discuss how things work.
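NVIDIA has not disclosed how the driver actually makes this decision, so the Python sketch below only illustrates the flow described above and is not the real implementation. The profile table, the executable names, and the `choose_gpu` and `launch` helpers are all hypothetical; the only figure taken from the text is the roughly 200 ms power-up time.

```python
# Hypothetical per-application profiles; the real driver's data is not public.
PROFILES = {
    "game.exe": "dGPU",
    "videoplayer.exe": "dGPU",
    "wordprocessor.exe": "IGP",
}

DGPU_POWER_UP_MS = 200  # approximate switch time quoted in the article


def choose_gpu(exe_name, profiles=PROFILES):
    """Return "dGPU" if the application has a discrete-graphics profile, else "IGP"."""
    return profiles.get(exe_name.lower(), "IGP")


def launch(exe_name, dgpu_powered=False):
    """Return (rendering GPU, extra start-up delay in ms) for a launching application."""
    gpu = choose_gpu(exe_name)
    # The power-up cost is only paid when the dGPU is needed and currently off.
    delay_ms = DGPU_POWER_UP_MS if gpu == "dGPU" and not dgpu_powered else 0
    return gpu, delay_ms


if __name__ == "__main__":
    for exe in ("Game.exe", "wordprocessor.exe", "unknown.exe"):
        print(exe, launch(exe))
```

Whichever GPU does the rendering, the display is always driven from the IGP frame buffer, which is why this kind of per-application routing can happen without a visible switch.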


NVIDIA states that their goal was to create a solution that was similar to hybrid cars. In a hybrid car, the driver doesn't worry about whether they're currently using the battery or if they're running off the regular engine. The car knows what's best and it dynamically switches between the two as necessary. It's seamless, it's easy, and it just works. You can get great performance, great battery life, and you don't need to worry about the small details. (Well, almost, but we'll discuss that in a bit.) The demo laptop for Optimus is the ASUS UL50Vf, which looks identical to the UL50Vt from the outside, but there are some internal changes.


Previously, switchable graphics required several hardware multiplexers and a hardware or software switch. With Optimus, all of the video connections come through the IGP, so there's no extra hardware on the motherboard. Let me repeat that, because this is important: Optimus requires no extra motherboard hardware (beyond the GPU, naturally). It is now possible for a laptop manufacturer to have a single motherboard design with an optional GPU. They don't need extra layers for the additional video traces and multiplexers, R&D times are cut down, they don't need to worry about signal integrity issues or other quality concerns, and there's no extra board real estate required for multiplexers. In short, if a laptop has an NVIDIA GPU and a CPU/chipset with an IGP, going forward there is no reason it shouldn't have Optimus. That takes care of one of the biggest barriers to adoption, and NVIDIA says we should see more than 50 Optimus-enabled notebooks and laptops this summer. These will range from next-generation ION netbooks to CULV designs, multimedia laptops, and high-performance gaming monsters.


We stated that most of the work was on the software side, but there is one new hardware feature required for Optimus, which NVIDIA calls the Optimus Copy Engine. In theory you could do everything Optimus does without the Copy Engine, but in that case the 3D Engine would be responsible for getting the data over the PCI-E bus and into the IGP frame buffer. The problem with that approach is that the 3D Engine would have to delay work on graphics rendering while it copied the frame buffer, resulting in reduced performance (it would take hundreds of GPU cycles to copy a frame). To eliminate this problem, NVIDIA added a Copy Engine that works asynchronously from frame rendering, and to do that they had to separate the rendering effort from the rest of the graphics engine. With those hardware changes complete, the rest is relatively straightforward. Graphics rendering is already buffered, so the Copy Engine simply transfers a finished frame over the PCI-E bus while the 3D Engine continues working on the next frame.
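To see why overlapping the copy with rendering matters, here is a rough timing model in Python. It is a simplification under stated assumptions, not a description of NVIDIA's hardware: the 16 ms render time is a made-up example, the 3 ms copy time borrows the per-frame copy figure NVIDIA quotes, and both function names are ours.

```python
def fps_serialized(render_ms, copy_ms):
    """Frame rate if the 3D Engine must stop rendering to copy each frame."""
    return 1000 / (render_ms + copy_ms)


def fps_overlapped(render_ms, copy_ms):
    """Frame rate if a separate Copy Engine transfers frame N while frame N+1 renders.

    Throughput is then limited by whichever stage is slower.
    """
    return 1000 / max(render_ms, copy_ms)


if __name__ == "__main__":
    render_ms, copy_ms = 16.0, 3.0  # assumed example values
    print(f"serialized: {fps_serialized(render_ms, copy_ms):.1f} fps")
    print(f"overlapped: {fps_overlapped(render_ms, copy_ms):.1f} fps")
```

In this model the overlapped pipeline runs at the render-limited rate, and the copy only shows up as added latency on each frame rather than as lost throughput.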

If you're worried about bandwidth, consider this: in a worst-case situation where 2560x1600 32-bit frames are sent at 60FPS (the typical LCD refresh rate), the copying only requires 983MB/s. An x16 PCI-E 2.0 link is capable of transferring 8GB/s in each direction, so there's still plenty of bandwidth left. A more realistic resolution of 1920x1080 (1080p) reduces the bandwidth requirement to 498MB/s. Remember that PCI-E is bidirectional as well, so there's still 8GB/s of bandwidth from the system to the GPU; the bandwidth from GPU to system isn't used nearly as much. There may be a slight performance hit relative to native rendering, but it should be less than 5% and the cost and routing benefits far outweigh such concerns. NVIDIA states that the copying of a frame takes roughly 20% of the frame display time, adding around 3ms of latency.
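The figures above are easy to check. The short Python snippet below reproduces the arithmetic, assuming 4 bytes per pixel for a 32-bit frame, decimal megabytes (1MB = 1,000,000 bytes), and a 60Hz refresh; the function and constant names are ours.

```python
PCIE2_X16_MB_PER_S = 8000  # 8GB/s per direction, as quoted in the text


def copy_bandwidth_mb_per_s(width, height, bytes_per_pixel=4, fps=60):
    """Bandwidth needed to copy every rendered frame, in MB/s (1MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


# At 60Hz each frame is displayed for 1000/60 ms; NVIDIA's 20% figure gives the copy latency.
frame_time_ms = 1000 / 60
copy_latency_ms = 0.20 * frame_time_ms

if __name__ == "__main__":
    for width, height in ((2560, 1600), (1920, 1080)):
        mb = copy_bandwidth_mb_per_s(width, height)
        share = mb / PCIE2_X16_MB_PER_S
        print(f"{width}x{height}: {mb:.0f} MB/s ({share:.1%} of an x16 PCI-E 2.0 link)")
    print(f"copy latency: {copy_latency_ms:.1f} ms")
```

Even the 2560x1600 worst case uses only about an eighth of the link in one direction, and 20% of a 16.7ms frame works out to roughly 3.3ms, in line with the "around 3ms" NVIDIA quotes.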

That covers the basics of the hardware and software, but Optimus does work beyond simply rendering an image and transferring it to the IGP buffer. It needs to know which applications require the GPU, and that brings us to a discussion of the next major software enhancement NVIDIA delivers with Optimus.


49 Comments


  • jkr06 - Saturday, February 27, 2010

    From all the articles I read, one thing is still not clear to me. I have a laptop with a Core i5 (which has an IGP) and an NVIDIA 330M. Can I use the Optimus solution with just software, or do the laptop manufacturers specifically need to add something to truly make it work?
  • JonnyDough - Friday, February 19, 2010

    was a wasted effort. Seems sort of silly to switch between two GPUs when you can just use one powerful one and shut off parts of it.
  • iwodo - Thursday, February 18, 2010

    First, the updates: they should definitely set up something like the Symantec or Panda cloud databases, where user input is stored, shared, and validated worldwide. The number of games that need to be profiled is HUGE. Unless there is some simple and damn clever way of catching games, inputting the exe name of every single game and app sounds insane to me. There has to be a much better way to handle this.

    I now hope Intel will play nice and get a VERY power efficient iGPU inside Sandy Bridge to work with Optimus, instead of botching even more transistors for GPU performance.
  • secretanchitman - Saturday, February 13, 2010

    any chance of this going to the macbook pros? im hoping when they get updated (soon i hope), it will have some form of optimus inside. if not, something radeon based.
  • strikeback03 - Thursday, February 11, 2010

    As I don't need a dGPU, I would like to see a CULV laptop with the Turbo33 feature but without any dGPU in order to save money. Maybe with Arrandale.
  • jasperjones - Wednesday, February 10, 2010

    Jarred,

    As you imply, graphics drivers are complex beasts. It doesn't make me happy at all that now Optimus makes them even more complex.

    Optimus will likely require more software updates (I don't think it matters whether they are called "driver" or "profile" updates).
    That puts you even more at the mercy of the vendor. Even prior to Optimus, it bothered me that NVIDIA's driver support for my 3 1/2 year old Quadro NVS 110m is miserable on Win 7. But, with Optimus, it is even more critical to have up-to-date software/driver support for a good user experience! Furthermore, software solutions are prone to be buggy. For example, did you try to see if Optimus works when you run virtual machines?

    Also, I don't like wasting time installing updates. Why can't GPUs just work out of the box like CPUs?

    Lastly, these developments are completely contrary to what I believe are necessary steps towards more platform independence. Will NVIDIA ever support Optimus on Linux? While I suspect the answer is yes, I imagine it will take a year and a half at the very least.
  • obiwantoby - Wednesday, February 10, 2010

    I think it is important to note, that the demo video, even though it is a .mov, works in Windows 7's Windows Media Player. It works quite well, even with hardware acceleration.

    Keep encoding videos in h.264, it works on both platforms in their native players.

    No need for Quicktime on Windows, thank goodness.
  • dubyadubya - Wednesday, February 10, 2010

    "Please note that QuickTime is required" FYI Windows 7 will play the mov file just fine so no need for blowtime. Why the hell would anyone use a codec that will not run on XP or Vista without Blowtime is beyond me. For anyone wanting to play mov files on XP or Vista go get Quicktime alternative.
  • beginner99 - Wednesday, February 10, 2010

    Did I read that right, HD video is always decoded on the dGPU even if the (Intel) IGP could deal with it?

    I mean it sounds nice, but is there also an option to prevent certain apps from using the dGPU?

    Or to prevent usage of the dGPU completely, like when one really needs the longest battery time possible? -> some users like to have control themselves.
    The Intel IGP might offer worse quality with its video decode feature (but who really sees that on a laptop LCD?), but when travelling the whole day and watching movies, I would like to use as little power as possible.
  • JarredWalton - Wednesday, February 10, 2010

    It sounds like this is really just a case of the application needing a "real profile". Since I test x264 playback using MPC-HC, I had to create a custom profile, but I think that MPC-HC detected the GMA 4500MHD and decided that was perfectly acceptable. I couldn't find a way to force decoding of an .mkv x264 video within MPC-HC, but other video playback applications may fare better. I'll check to see what happens with WMP11 as well tomorrow (once I install the appropriate VFW codec).
