Hybrid Systems

While the Cortex-M series aims to be the MCU for many application markets, including IoT and wearables, ARM does not expect M series processors to always be used alone; it expects many devices to combine A series application processors with the M series. When I used the word coprocessor to refer to the M series, Nandan quickly pointed out that in this market the A series might actually be considered the coprocessor. Considering the MCU is the always-on device and the A series CPU wakes only sparingly, I can see his point of view. The following diagram from ARM lays out this perspective well.

MediaTek used a much simpler table to describe the sub-markets of IoT and wearables that, as I noted at the time, insinuated there was no overlap between MCUs and APs. I tend to agree more with Intel's Edison platform and ARM's slide here: there are large market segments that will indeed combine these differentiated processors.

When designing hybrid systems like this for IoT and wearables, it is very important to synthesize the AP with power optimization as the primary goal. Synthesizing HDL to an ASIC is essentially an optimization problem, like most of engineering: targeting one aspect of performance, such as power consumption, means sacrificing something else, such as peak frequency. The prevailing trend so far has been to reuse smartphone processors in wearables. Companies taking this approach are not optimizing their wearables' power use; they are optimizing time to market and internal expenses.

To emphasize what this means: when the Cortex A15 launched, ARM stated it was optimized for 1.2 GHz operation. When the first smartphone featuring an A15 hit the market, it actually ran at much higher voltages to reach higher frequencies, and thus at relatively high power consumption. Reusing such a chip inside an IoT or wearable device not only means choosing a performance-focused CPU instead of a power-optimized one like the A7, it means choosing silicon that was synthesized to push the CPU even further away from power efficiency. This is why many wearables today featuring rich operating systems have struggled with battery life. Apple has traditionally been conservative with smartphone SoC power consumption, and it will be interesting to see how their new wearable is designed.

For wearable devices, ARM recommends reducing A series frequency and area by more than half, which has a direct effect on power consumption. ARM states that wise choices of CPU cores and caches, synthesis goals, and software optimizations that offload certain tasks to an MCU can reduce power consumption by as much as 85%. This is something we will keep an eye on when we review future wearables.
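As a rough illustration of why those synthesis targets matter, here is a back-of-the-envelope sketch in C using the standard dynamic power relation (P_dyn roughly proportional to switched capacitance × V² × f, with switched capacitance tracking core area). The baseline and wearable-tuned figures are assumptions chosen purely for illustration, not ARM's numbers.

```c
/* Back-of-the-envelope dynamic power scaling: P_dyn ~ alpha * C * V^2 * f,
 * where switched capacitance C roughly tracks core area.
 * All figures below are illustrative assumptions, not ARM's numbers. */
#include <stdio.h>

/* Relative dynamic power; relative area stands in for switched capacitance. */
static double dyn_power(double rel_area, double volts, double freq_mhz)
{
    return rel_area * volts * volts * freq_mhz;
}

int main(void)
{
    /* Assumed smartphone-tuned baseline: full area, 1.0 V, 1500 MHz. */
    double base = dyn_power(1.0, 1.0, 1500.0);

    /* Assumed wearable tuning: ~half the area, ~half the frequency,
     * plus a modest voltage drop enabled by the relaxed timing target. */
    double tuned = dyn_power(0.5, 0.85, 700.0);

    printf("relative dynamic power: %.2f (about %.0f%% lower)\n",
           tuned / base, 100.0 * (1.0 - tuned / base));
    return 0;
}
```

With those assumed numbers, the frequency, area, and voltage changes alone cut dynamic power by roughly 80%; per ARM, core and cache choices plus offloading background work to the M series MCU account for the rest of the quoted 85%.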

Comments

  • nathanddrews - Wednesday, September 24, 2014 - link

    For as often as my phone, computer, stove, and microwave all get out of sync, you'd think we humans haven't yet mastered the whole "tracking of time" thing. Most of my devices are supposed to auto-update online, but that doesn't excuse the poor time-keeping of modern devices in between updates. Smart watch? Doubtful.
  • markmuehlbauer - Wednesday, September 24, 2014 - link

    Good point. It's 2014, and where are the standards for this across all devices? The only devices that are truly accurate subscribe to a Network Time Protocol (NTP) service.
    How about we recycle amplitude-modulated carriers for a narrow-band digital broadcast NTP service? Then every device out there could have a simple AM loop antenna in it to "listen" to the service and update. Of course this would require the FCC and IEEE to actually work together. Sigh...
  • otherwise - Wednesday, September 24, 2014 - link

    There are already at least two ways to get a reference timestamp over the air. First is GPS, which will get you accurate timestamps into the sub-millisecond domain, which is why you usually see GPS receivers in data centers acting as a stratum 0 NTP source and/or a PTP master. The second is WWVB, operated by NIST, which broadcasts reference timestamps on the 60 kHz band. If you have an alarm clock that sets itself, this is probably the source it uses.
  • makerofthegames - Tuesday, September 23, 2014 - link

    So for my own understanding of roughly how powerful these are, what x86 processor would you say they're most comparable to? From the general architecture they look like a first-gen P5 Pentium, minus the MMU (and optionally minus the cache and FPU). Would that be an accurate analogy?
  • jdesbonnet - Tuesday, September 23, 2014 - link

    It's hard to directly compare MCUs to application processors. They have different types of tasks. Application processors are about computational throughput, and you pay for that in enormous gate counts (=area=cost) and energy consumption. E.g. the quad-core Intel Xeon I'm using right now has a gate count in excess of 2G gates. By comparison, a small Cortex M0 will have as few as 12K gates and can run tasks such as wireless security sensors for years on end from a single coin cell battery. Also, MCUs facilitate deterministic code execution where timing of events is critical (e.g. detect car crash -> deploy airbag). For benchmarks look at this table: http://en.wikipedia.org/wiki/Instructions_per_seco... Seems an ARM Cortex M0 at 50 MHz is somewhere between an Intel 486DX and the original Pentium.
  • makerofthegames - Tuesday, September 23, 2014 - link

    Yeah, it's never going to be an exact comparison. I pulled the P5 guess from the architecture - both P5 and M7 are in-order, two-issue superscalar with two main integer paths and an FPU. And while their goals were very different, they both were very constrained for transistors (by current standards), so I figured they might have some level of comparison.
  • wetwareinterface - Wednesday, September 24, 2014 - link

    The nearest comparison to an x86 processor is the new Intel Edison. However, where this will excel at certain tasks, the Edison will excel at others.

    To break it down: any task that is memory-bandwidth constrained will be better on the Edison.
    Any task that is latency-critical and best served by an interrupt (the car crash airbag analogy above serves well) is better performed on the M7.

    Any task that is math based, involves simple calculations done frequently, and benefits from parallel execution or on-the-fly reordering of the work would be a good fit for the M7, thanks to the included DSP extensions. An example is manipulating incoming audio or video data in response to user input (a guitar pedal for audio, or video mixing effects like the ones on a club's screens).

    The simple fact is there's no perfect "one solution".
    What one does better the other does worse, and no processor is best in all categories.
  • Stephen Barrett - Wednesday, September 24, 2014 - link

    Edison isn't so much a processor as it is a SoM, or system on module. It contains an Atom (Silvermont) applications processor (AP) and an Intel Quark microcontroller (MCU).
  • HardwareDufus - Tuesday, September 23, 2014 - link

    Hmmm... they are a RISC-based architecture (ARMv7 ISA), so they wouldn't have as high an IPC as a CISC-based architecture like the Pentium. However, this Cortex-M7 MCU is more an SoC than just a straight processor like the Pentium was. Hard to say, as the workloads are so absolutely different.

    BTW, with the embargo lifted.... ARM just updated their website:
    http://arm.com/products/processors/cortex-m/cortex...
    http://arm.com/Cortex-M7-chip-diagramLG.png

    It will be exciting to watch some of the initial development & prototyping boards introduced by the big players (TI, STMicroelectronics and Atmel) to see what native capabilities they break out.
    Hopefully Ethernet, SDIO, JTAG, and multiple channels of UART, CAN, SPI, I2C/TWI, and DAC are givens. Would love a basic LCD interface too.
  • Wilco1 - Tuesday, September 23, 2014 - link

    No, by definition RISCs have better IPC than CISCs, not the other way around (on a RISC pretty much every instruction executes in a single cycle, unlike the complex CISC instructions). Studies have shown x86 and ARM actually do the same amount of work per instruction due to x86 compilers avoiding the complex instructions (this observation is why RISC exists!), and ARM doing a lot of work per instruction (compared to other RISCs - such as allowing shift+add as a single instruction, conditional execution, load/store multiple). This effectively means any IPC difference is not due to ISA but due to microarchitecture.

    Looking at the Dhrystone results, the M7 is a bit faster than A7, so should obliterate Quark, beat an old Pentium and do well against Silverthorne at similar frequency.
