Hybrid Systems

While the Cortex-M series aims to be the MCU for many application markets, including IoT and wearables, ARM does not expect M series processors always to be used alone; it expects many devices to combine A series application processors with M series MCUs. When I referred to the M series as a coprocessor, Nandan quickly pointed out that in this market the A series might actually be considered the coprocessor. Since the MCU is the always-on device and the A series CPU wakes only sparingly, I can see his point of view. The following diagram from ARM lays out this perspective well.

MediaTek used a much simpler table to describe the IoT and wearable submarkets, one that, as I noted at the time, implied there was no overlap between MCUs and APs. I tend to agree more with Intel's Edison platform and ARM's slide here: there are large market segments that will indeed combine these differentiated processors.
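Nandan's point about which processor is the "coprocessor" can be made concrete with a back-of-the-envelope average-power model. The C below is a minimal sketch, not ARM's methodology: the function name hybrid_avg_power_mw and every power figure are hypothetical, and it assumes the MCU never sleeps while the AP simply alternates between an active state and a sleep state (wake-up transition costs are ignored).

```c
#include <assert.h>

/* Average power (mW) of a hypothetical hybrid system: an always-on MCU
 * plus an application processor (AP) that is awake for a fraction
 * ap_duty of the time and asleep otherwise. Wake-up costs are ignored. */
double hybrid_avg_power_mw(double mcu_mw, double ap_active_mw,
                           double ap_sleep_mw, double ap_duty)
{
    /* ap_duty is a fraction of time, so it must lie in [0, 1]. */
    assert(ap_duty >= 0.0 && ap_duty <= 1.0);

    /* The MCU never sleeps; the AP splits its time between two states. */
    return mcu_mw + ap_duty * ap_active_mw + (1.0 - ap_duty) * ap_sleep_mw;
}
```

With placeholder figures of 1 mW for the MCU, 300 mW for an awake AP, 0.5 mW for a sleeping AP and the AP awake 0.1% of the time, the average comes to about 1.8 mW, more than half of which is the always-on MCU. Under those assumptions it is easy to see why the MCU can be viewed as the primary processor.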

When designing hybrid systems like this for IoT and wearables, it is very important to synthesize the AP with power optimization goals. Synthesizing HDL into an ASIC is essentially an optimization problem, as is most engineering: targeting one aspect of performance, such as power consumption, means accepting a sacrifice somewhere else. The prevailing trend so far has been to reuse smartphone processors in wearables. Companies taking this approach are not optimizing their wearables' power use; they are optimizing time to market and internal costs.

To illustrate what this means: when the Cortex-A15 launched, ARM stated it was optimized for 1.2 GHz operation. When the first smartphone featuring an A15 reached the market, it actually ran at much higher voltages to reach higher frequencies, and therefore had relatively high power consumption. Reusing this chip in an IoT or wearable device not only means choosing a performance-focused CPU instead of a power-optimized one like the Cortex-A7; it also means choosing a chip that was synthesized to push the CPU even further away from power efficiency. This is why many wearables today that run rich operating systems have struggled with battery life. Apple has traditionally been conservative with smartphone SoC power consumption, and it will be interesting to see how its new wearable is designed.

For wearable devices, ARM recommends reducing A series frequency and area by more than half, which directly reduces power consumption. ARM states that wise choices of CPU cores and caches, synthesis goals, and software optimizations that offload certain tasks to an MCU can together reduce power consumption by as much as 85%. This is something we will keep an eye on when we review future wearables.
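Why frequency, area and voltage matter so much follows from the standard first-order model of CMOS switching power, P_dyn ≈ α·C·V²·f. The C below is a minimal sketch of that model, not ARM's own analysis: the function name relative_dynamic_power is hypothetical, each argument is a scale factor relative to a baseline design, switched capacitance is assumed to scale roughly with area, and the activity factor α is assumed unchanged so it cancels out. It also shows why the A15 phones above suffered: raising voltage to reach higher clocks increases power quadratically in V on top of the linear increase in f.

```c
#include <assert.h>

/* Relative dynamic (switching) power under the first-order CMOS model
 * P_dyn ~ alpha * C * V^2 * f. Each argument is a scale factor versus a
 * baseline design (1.0 = unchanged). Assumptions: alpha is unchanged,
 * capacitance scales with area, and leakage power is ignored entirely. */
double relative_dynamic_power(double cap_scale, double volt_scale,
                              double freq_scale)
{
    assert(cap_scale > 0.0 && volt_scale > 0.0 && freq_scale > 0.0);

    /* Voltage enters squared; capacitance and frequency enter linearly. */
    return cap_scale * volt_scale * volt_scale * freq_scale;
}
```

Halving both area and frequency at an unchanged voltage gives 0.25, a 75% cut in dynamic power, and any voltage reduction the lower clock permits compounds that quadratically. This is only a rough model: it ignores leakage, which matters for mostly idle wearables, and it does not attempt to reproduce ARM's 85% figure, which also counts cache choices and MCU offload.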

43 Comments

  • Wilco1 - Wednesday, September 24, 2014

    Embedded (M) was traditionally a microcontroller using on-chip flash and SRAM, with no MMU, no DSP and no FP support. The R series are higher-performance realtime CPUs with TCM, caches, branch prediction, and often external DRAM and FP. Now that M also supports DSP, FP and caches, and is becoming high performance, things have become blurred. The ISA differences are now the main distinction: M supports only Thumb-1 and Thumb-2 and uses a different interrupt model, while the R architecture is basically the A series plus TCM minus the MMU. So many TLAs...
  • hammer256 - Wednesday, September 24, 2014

    Oh, that's right, the different interrupt model. If I recall correctly, the M series generally has lower interrupt latency because the interrupt controller is directly coupled to the core. It's just that the new M7 blurs the distinction even further than the M4 did...
    For TCM, is it generally DRAM integrated on the MCU, or a tight interface between the MCU and the DRAM chips?
    Didn't one of Samsung's SSDs use a few Cortex-R3 cores for its controller?
  • Wilco1 - Wednesday, September 24, 2014

    Samsung SSDs use three R4 cores.

    Simply put, TCM is fast on-core instruction/data SRAM, similar to an I- or D-cache. It is fully under user control and thus free of the non-deterministic effects of a traditional cache. TCM can be used in addition to a cache. Like a cache, TCM runs at high frequencies, so it is faster than an external SRAM.

    The usage model is that you put all your critical realtime code/data in the instruction/data TCMs and run the rest from flash/DRAM. When an interrupt occurs, you start executing realtime code from the TCM immediately, rather than waiting for the cache misses that would inevitably occur without TCM. So TCMs are actually necessary for realtime on a fast CPU; low interrupt latency alone is not the whole story.
  • hammer256 - Wednesday, September 24, 2014

    Oooh, I see. It sounds like TCM is a big distinguishing feature between the M and R series, then. So even if performance is equal, the R series actually allows applications with even tighter latency requirements than the M series.
    Well, learned something new today, thanks!
  • toyotabedzrock - Wednesday, September 24, 2014

    It is not a good idea to put this in a wearable or a car. The lack of an MMU seems tone-deaf given the security environment we live in.
  • Wilco1 - Wednesday, September 24, 2014

    Most of the M series support an optional MPU for OS task protection. That said, security and an MMU are two orthogonal things: an MMU doesn't stop exploits, as otherwise we wouldn't have any viruses/trojans/rootkits/etc. on PCs. For microcontrollers security is easier, as there are far fewer possible security breaches, so it comes down more to not shipping default passwords or using old, already broken encryption algorithms.
  • ah06 - Thursday, September 25, 2014

    Which one makes the most sense in a wearable? M4, M7, Rx, A7, A53?
  • Wilco1 - Thursday, September 25, 2014

    IMHO only the M3 or M4; anything else is way overkill for, e.g., a watch. You definitely don't want to run anything as big and complex as Linux/Android if you want to provide at least a week of battery life.
  • DIYEyal - Sunday, September 28, 2014

    Actually, the WeLoop Tommy smartwatch uses an M0; they claim 3 weeks of battery life with a 110 mAh battery.
  • RomanR - Thursday, September 25, 2014

    Hi,

    Can anyone tell me how many clock cycles are needed to compute one output sample of a ten-tap, 32-bit FIR filter?
    A 1-cycle MAC instruction is OK, but what about data transfer?
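RomanR's question has no single answer: the cycle count depends on the core's pipeline, on whether loads can issue alongside the MAC, on where coefficients and samples live, and on the compiler. What a sketch can show is the work per output sample. The C below is a minimal, hypothetical 10-tap Q31 FIR; the names fir_q31_step, fir_q31_state, NTAPS, ITCM_CODE and DTCM_DATA, and the section names .itcm_text and .dtcm_data, are illustrative assumptions that would have to match a real linker script. Each output needs 10 coefficient loads, 10 sample loads and 10 multiply-accumulates, plus the delay-line update, so data transfer rather than the 1-cycle MAC can easily dominate. Following Wilco1's usage model, the hot function is marked for instruction TCM, where fetches avoid the cache-miss uncertainty of flash or DRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TCM placement macros. The section names must match the
 * target's linker script; on non-ELF hosts they expand to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define ITCM_CODE __attribute__((section(".itcm_text")))
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define ITCM_CODE
#define DTCM_DATA
#endif

#define NTAPS 10

typedef struct {
    int32_t delay[NTAPS];  /* delay[0] holds the newest input sample */
} fir_q31_state;

/* Produces one output sample of a 10-tap Q31 FIR filter.
 * Per tap: one coefficient load, one sample load and one 32x32 -> 64-bit
 * multiply-accumulate. Assumes the sum of |coeffs| is at most 1.0 in Q31,
 * so the 64-bit accumulator cannot overflow. Result is truncated, not
 * saturated. */
ITCM_CODE int32_t fir_q31_step(fir_q31_state *s,
                               const int32_t coeffs[NTAPS], int32_t x)
{
    assert(s != NULL && coeffs != NULL);

    /* Shift the delay line; a circular buffer would avoid these copies. */
    for (size_t i = NTAPS - 1; i > 0; --i)
        s->delay[i] = s->delay[i - 1];
    s->delay[0] = x;

    int64_t acc = 0;
    for (size_t i = 0; i < NTAPS; ++i)
        acc += (int64_t)coeffs[i] * s->delay[i];

    /* Q31 * Q31 = Q62; shift right by 31 to return to Q31. */
    return (int32_t)(acc >> 31);
}
```

On a target with data TCM, the filter state and coefficients could likewise be declared with the hypothetical macro, e.g. static DTCM_DATA fir_q31_state filter_state;. For real measurements, a tuned library routine such as CMSIS-DSP's arm_fir_q31 is a more sensible starting point than this sketch, with cycle counts checked against the core's Technical Reference Manual.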
