The last time we visited TI's OMAP 4 SoC was at Mobile World Congress, where we benchmarked the LG Optimus 3D and came away decently impressed with performance even on a pre-launch device. Back then, Anand wrote that the remainder of this year and the next is going to be a heated battle of dual core and quad core SoCs fighting in the tablet and smartphone space. After today, you can add Windows 8 to that list as well. Today, TI is announcing its latest SoC, the OMAP4470, which offers a 20% increase in CPU clocks and an entirely new SGX 544 GPU over OMAP4460.

OMAP4470 is architecturally very similar to OMAP4460, with a number of notable changes. First off is that 20% increase in CPU clocks, from 1.5 GHz in OMAP4460 to 1.8 GHz in OMAP4470. TI's comparison point for most of the OMAP4470 specs is the OMAP4430, which has its two Cortex-A9s clocked at 1.0 GHz. The two Cortex-M3 cores remain clocked at 266 MHz for handling multimedia processing and background realtime events. The end result is an effort to both let the two Cortex-A9s remain idle more of the time and unburden them during heavy processing. TI feels this dichotomy - two big, fast Cortex-A9 cores for web browsing and very computationally intensive tasks, augmented with two lighter weight, low power Cortex-M3 cores - offers it unique power savings potential. The Cortex-M3 cores execute Thumb and Thumb-2 instructions and include hardware multiply and divide, allowing lighter workloads to be offloaded from the A9s.

The really interesting change with OMAP4470, however, is a similar two-pronged approach on the GPU side of things. First, OMAP4470 moves from the PowerVR SGX540 present in OMAP4430 and OMAP4460 to a more powerful single core (MP1, if you will) PowerVR SGX544 GPU, which offers 2.5x the performance of OMAP4430's SGX540.

If you recall from Anand's excellent iPad 2 GPU exploration, SGX543/544 features four USSE2 pipes, each with a 4-wide vector ALU churning through 4 MADs per clock. I'm reproducing his table below; if you mentally replace SGX543 with SGX544 you get the same picture. As an aside, the difference between SGX543 and SGX544 is purely that the latter offers full DirectX 9 compliance, making it a possible shoo-in for future Windows 8 platforms.

Mobile SoC GPU Comparison

                  PowerVR   PowerVR   PowerVR   PowerVR      PowerVR         GeForce   Kal-El
                  SGX530    SGX535    SGX540    SGX543/544   SGX543/544MP2   ULP       GeForce
SIMD Name         USSE      USSE      USSE      USSE2        USSE2           Core      Core
# of SIMDs        2         2         4         4            8               8         12
MADs per SIMD     2         2         2         4            4               1         ?
Total MADs        4         4         8         16           32              8         ?
GFLOPS @ 200MHz   1.6       1.6       3.2       6.4          12.8            3.2       ?
GFLOPS @ 300MHz   2.4       2.4       4.8       9.6          19.2            4.8       ?
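Those peak GFLOPS figures fall straight out of the pipe counts: a MAD (multiply-add) counts as two floating-point operations, so peak throughput is SIMDs x MADs-per-SIMD x 2 x clock. A quick sketch (illustrative Python written for this article, not vendor code):

```python
# Peak GFLOPS for these mobile GPUs: each SIMD issues some number of
# MADs per clock, and each MAD counts as 2 FLOPs (multiply + add).
def peak_gflops(num_simds, mads_per_simd, clock_mhz):
    total_mads = num_simds * mads_per_simd
    flops_per_clock = 2 * total_mads
    return flops_per_clock * clock_mhz / 1000.0

print(peak_gflops(4, 2, 200))  # SGX540 @ 200 MHz        -> 3.2
print(peak_gflops(4, 4, 200))  # SGX543/544 @ 200 MHz    -> 6.4
print(peak_gflops(8, 4, 300))  # SGX543/544MP2 @ 300 MHz -> 19.2
```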

If you recall the clocks for the OMAP4430 and OMAP4460, you can start to see where TI's 2.5x claim over its own OMAP4430 comes from. Going from 304 MHz to 384 MHz is a ~26% increase in clock speed, which multiplies with the doubling of MADs per clock (8 to 16) in the move from SGX540's USSE pipes to SGX544's USSE2 pipes. Do the math and it works out to almost exactly 2.5x.
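The arithmetic behind that claim, spelled out (note the two factors multiply rather than add):

```python
# OMAP4430 -> OMAP4470 GPU speedup: SGX544 doubles MADs per clock over
# SGX540 (8 -> 16), and the GPU clock rises from 304 MHz to 384 MHz.
mad_ratio = 16 / 8        # USSE -> USSE2 throughput doubling
clock_ratio = 384 / 304   # ~26% clock bump
speedup = mad_ratio * clock_ratio
print(round(speedup, 2))  # -> 2.53, i.e. TI's "2.5x" claim
```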

TI OMAP 4xxx SoC GPU Comparison

            OMAP4430         OMAP4460         OMAP4470
GPU Used    PowerVR SGX540   PowerVR SGX540   PowerVR SGX544
GPU Clock   304 MHz          384 MHz          384 MHz

The next part of what's new in OMAP4470 is the inclusion of a new hardware composition system for doing display composition without taxing the SGX544. TI wouldn't disclose whose IP this is, but did acknowledge that it's from a third party and includes a dedicated 2D graphics core for compositing the entire display. Ordinarily this is done on the GPU, but TI hopes to accomplish the same composition on this hardware accelerator in a more power and bandwidth efficient manner, driving large displays while maintaining a low power profile.

When big 3D applications kick in, the SGX544 powers up and takes over, but for the majority of UI work, TI believes its hardware composition engine can enable power savings - analogous to the way the two Cortex-M3 cores augment the two Cortex-A9s. It's an interesting approach, and TI claims the hardware abstraction layer (HAL) for the composition engine is already complete, enabling Android and other mobile OSes to leverage that acceleration immediately.

OMAP 4470 vs. 4430 Feature List - Provided by TI

Two ARM Cortex-A9 MPCores @ 1.8 GHz per core: 80% increase in Web browsing performance
Two ARM Cortex-M3 cores: Smart multicore processing optimized for low power and real-time responsiveness
SGX544 graphics core running at 384 MHz: 2.5x overall graphics performance increase; support for DirectX, OpenGL ES 2.0, OpenVG 1.1, and OpenCL 1.1
Hardware composition engine with dedicated 2D graphics core: Frees the GPU for intensive tasks; maximizes power efficiency
Display subsystem: Supports as many as three HD displays at up to QXGA (2048x1536) resolution; HDMI supporting stereoscopic 3D
Dual-channel, 466 MHz LPDDR2 memory: Higher memory bandwidth enables rendering and compositing of multilayer content at high resolutions
Complete pin-to-pin hardware and software compatibility: Rapid transition and maximum re-use of investment from OMAP4430 and OMAP4460 processors
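As a sanity check on TI's browsing claim: the quoted 80% gain matches the raw clock increase over OMAP4430's 1.0 GHz A9s exactly, assuming (my assumption, not TI's) that browsing performance scales linearly with CPU clock:

```python
# If web browsing scaled purely with CPU clock, going from OMAP4430's
# 1.0 GHz Cortex-A9s to OMAP4470's 1.8 GHz parts would yield exactly
# the 80% gain TI quotes.
omap4430_ghz = 1.0
omap4470_ghz = 1.8
gain_pct = (omap4470_ghz / omap4430_ghz - 1) * 100
print(f"{gain_pct:.0f}%")  # -> 80%
```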

The real hope with OMAP4470 is the ability to drive very high resolution displays, up to QXGA (2048x1536), while maintaining HDMI 1.4a stereoscopic 3D support. TI expects OMAP4470 devices to arrive in the first half of 2012, with sampling in the second half of 2011.


30 Comments


  • JMC2000 - Thursday, June 02, 2011 - link

    It would have to be P5 or P6-based at minimum... a 386, or even a 486, wouldn't cut it. Intel could integrate one or two Atom-like cores with a Core i-series chip, and AMD could integrate some Bobcat-like small cores with Phenom or Bulldozer-based larger cores.
  • sum1guy - Thursday, June 02, 2011 - link

    It is not necessary to pair small processors with large processors for power efficiency. Intel already studied it.

    CPU A runs at 1000 MHz and works on hard problems
    CPU B runs at 100 MHz to save power

    So you have an easy task that you'd like to pass to CPU B. CPU B performs the task in 100 ms. CPU A performs the task in 10 ms and then sleeps for the next 90 ms. Intel concluded that a super fast processor that is idle most of the time is more energy efficient than a slower processor that is running most of the time. Thanks to modern power gating, it makes sense to throw a super powerful CPU at even the simplest of tasks. It simply performs the task quickly and then powers down.
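The race-to-idle arithmetic in the comment above can be sketched numerically. The power figures below are invented for illustration (real numbers depend on process, leakage, and gating quality); only the shape of the comparison is the point:

```python
# Race-to-idle: a fast core finishes early and power-gates for the rest
# of the window; a slow core stays active the whole time.
def energy_fast(active_w, idle_w, busy_s, window_s):
    return active_w * busy_s + idle_w * (window_s - busy_s)

def energy_slow(active_w, window_s):
    return active_w * window_s

# CPU A (1000 MHz): busy 10 ms at 1.0 W, then gated at 0.01 W
e_fast = energy_fast(active_w=1.0, idle_w=0.01, busy_s=0.010, window_s=0.100)
# CPU B (100 MHz): busy the full 100 ms at 0.15 W
e_slow = energy_slow(active_w=0.15, window_s=0.100)
print(e_fast < e_slow)  # the fast-then-idle core uses less energy here
```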
  • lmcd - Sunday, October 07, 2012 - link

    Umm, Intel isn't as right as you think. They had to spend A TON of R&D to get power gating to the point that what you suggested works reasonably well.

    As for why it still isn't perfect...

    Think about it: execution cores 3x the size of the little core out of the "big little" at the same speed are still going to use more power. And the instructions common for LP cores might be different than those common to regular usage. So where do you draw the line? And if you draw a line anywhere towards the low power side, you have peak performance compromises, so still less than optimal efficiency.

    Two different core types, with power gating or asynchronous clocks, are easily viable and probably more efficient.
  • sleepeeg3 - Thursday, June 02, 2011 - link

    Innovating?! They are just making things faster. If Intel wanted to make a 10 GHz processor, they could. It would probably also require a 10 lb block of silver to cool it.

    Unlike chips in the x86 world, mobile chips are slow enough that they are not constrained by heat; they are only constrained by battery life. Every one of these speed increases comes with increased power consumption and lower battery life. They can keep doing this forever, until batteries only last a minute on a charge. Why? ...because people believe faster = better. Not sure what anyone does on a cell phone that requires this increase in speed.

    As for your suggestion to use a 386, it is not a terrible idea, but the system components besides the CPU all draw a large amount of power, and integrating it would just add board cost for a feature no one cares about. Sleep or standby consumption is not something most consumers care about, unless they buy into this global warming nonsense.
  • erple2 - Tuesday, June 07, 2011 - link

    They're nearly synonymous. For a while, mobile chips in the x86 world were slow enough that they weren't constrained by heat.

    As ARM chips get faster and faster, they're going to start heating up, and the constraints of being able to cool them become more and more important. I realize that there are architectural differences that are somewhat in ARM's favor for the class of workloads the ARM is expected to do vs. an x86 processor, but you're still eventually going to run into the heating issue.

    If for no other reason, fully discharging a Li-Ion battery in 1 minute will probably cause it to catch fire.
  • djgandy - Friday, June 03, 2011 - link

    I fail to see the innovation you speak of. I see slapping more cores on and upping clock rates with die shrinks. Looks like it is mirroring the x86 market but is about 7 years behind.

    Remember smartphone chips are also getting larger because their performance is seen as more important. You can't compare phones from 5 years ago with phones of today for this reason alone.
  • sleepeeg3 - Thursday, June 02, 2011 - link

    You can just hear that battery life being sucked away! I think they are trying to convert mobile phones back to corded. You will need the cord constantly plugged in just to keep it charged! Do you think we will get to use those nifty spiral-wound cords again?

    So now we will be able to... preview a demo of UT3 on our keyboardless cell phones? I don't get it. Would be much more excited to hear that the new phones lasted a month without a charge. That would be awesome.
  • Veerappan - Thursday, June 02, 2011 - link

    "2.5x overall graphics performance increase; support for DirectX, OpenGL ES 2.0, OpenVG 1.1, and OpenCL 1.1"

    OpenCL support on a mobile GPU could introduce some nice possibilities.
  • erple2 - Tuesday, June 07, 2011 - link

    True, but I think it's more interesting from the perspective of the giant, massively parallel ARM-based "servers" that are planned. I wouldn't see it as a viable anything on a mobile device (yet).
  • jessie320 - Monday, July 11, 2011 - link

    What is the package type? Flip-chip BGA?
