Final Words

When it comes to processors, enthusiasts and laymen alike can identify the three largest players: Intel, AMD, and ARM. Those names are not mutually exclusive, either: AMD utilizes ARM designs for its consumer security coprocessors and in its Opteron A1100 server processor. There are other processors out there (e.g., IBM's POWER CPUs), but they're generally not as well known. That's also the case with MIPS.

Not everyone knows the name MIPS, but Imagination hopes to change that by offering a viable alternative to ARM in the embedded market ARM currently dominates. MIPS already has a large presence in networking and embedded devices, and introducing the I6400 keeps MIPS relevant while placing additional pressure on ARM. According to the provided numbers (admittedly from MIPS) and feature descriptions, the I6400 appears to compete with and even surpass the highly anticipated ARM Cortex-A53. Imagination projects general availability of the I6400 to SoC designers by December 2014; we can estimate end-user availability at least 6 to 9 months after that.

Consumers will most likely experience the MIPS I6400 CPU directly in low-cost Android tablets and handsets. Due to Android's Java heritage, some applications will work out of the box. Other applications that use the Android Native Development Kit (NDK) to target the Intel or ARM ISAs will unfortunately be incompatible. Until MIPS achieves enough volume to convince application developers to either code to the MIPS3264 ISA or stick with Java, MIPS Android devices will be second-class citizens. This is something to keep in mind if you're purchasing a phone for yourself or a tech-savvy friend. Of course, basic operating system features like email, phone, text, web browsing, and chatting should all work fine.
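To illustrate why NDK binaries are tied to an ISA, here is a minimal C sketch (our own illustration, not anything from Imagination's materials) that uses the compiler's standard predefined architecture macros. An APK has to ship a separately compiled copy of code like this for every ABI it supports; a build that only includes ARM and x86 libraries simply has nothing to load on a MIPS device.

    #include <stdio.h>

    /* Native code is compiled once per target ABI; the compiler predefines
     * macros identifying the instruction set it is generating code for. */
    static const char *target_isa(void)
    {
    #if defined(__mips64)
        return "MIPS64";
    #elif defined(__mips__)
        return "MIPS32";
    #elif defined(__aarch64__)
        return "64-bit ARM (AArch64)";
    #elif defined(__arm__)
        return "32-bit ARM";
    #elif defined(__x86_64__)
        return "x86-64";
    #elif defined(__i386__)
        return "x86";
    #else
        return "unknown";
    #endif
    }

    int main(void)
    {
        printf("Native library built for: %s\n", target_isa());
        return 0;
    }

Pure-Java applications sidestep this because Dalvik bytecode is translated to native code on the device itself, which is why they work out of the box.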

Intel's performance-leading processors have dominated non-handset computing for the better part of a decade. ARM's embedded, low-power heritage has made it Intel's biggest threat as mobile devices have exploded in popularity and now dominate the computing landscape. As Intel and ARM continue to battle for the high-end embedded market, Imagination and MIPS hope to erode ARM's mid-range and low-end core competency. As consumers, we can sit back and enjoy the competition, which will force each company to work harder every year.

The I6400's revised MIPS3264 Release 6 ISA, instruction bonding, and SMT execution pipeline bring a refreshing set of innovations to the small-core market. In our A53 coverage, we noted that ARM was pushing in-order CPU performance about as far as it could possibly go. I'm always happy to see we might have been wrong.

Comments

  • alexvoica - Tuesday, September 2, 2014 - link

    You've (almost too) carefully forgotten to mention the trap-and-emulate feature described in the spec.
  • DMStern - Tuesday, September 2, 2014 - link

    The documentation also says that only a subset can be trapped, and that some encodings have been re-used. I haven't studied the instruction encoding tables closely enough to know how many there are or how serious the conflicts are. Presumably, trapping will become less useful as more instructions are added in later revisions.
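    To make that concrete, here is a rough conceptual sketch in C of what a "reserved instruction" trap handler does; the opcode mask and value are invented placeholders, not real MIPS encodings. The key limitation is in the comment near the bottom: an encoding that Release 6 re-used for a new instruction executes natively with its new meaning and never traps, so the old instruction cannot be recovered this way.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical encoding of a removed pre-R6 instruction; NOT real MIPS bits. */
        #define OLD_OP_MASK   0xFC000000u
        #define OLD_OP_VALUE  0x70000000u

        /* Models the OS trap handler invoked when the CPU hits an undefined word. */
        static bool emulate_removed_insn(uint32_t insn)
        {
            if ((insn & OLD_OP_MASK) == OLD_OP_VALUE) {
                /* Decode operands, perform the old behaviour in software,
                 * then advance the PC past the 4-byte instruction. */
                printf("emulating removed instruction 0x%08x\n", (unsigned)insn);
                return true;
            }
            /* Either the word is simply invalid, or R6 re-used the encoding for a
             * new instruction -- in the latter case it never traps at all, so the
             * pre-R6 instruction is not recoverable by trap-and-emulate. */
            return false;
        }

        int main(void)
        {
            uint32_t words[] = { 0x70123456u, 0x00000000u };
            for (int i = 0; i < 2; i++)
                if (!emulate_removed_insn(words[i]))
                    printf("0x%08x: not in the emulatable subset\n", (unsigned)words[i]);
            return 0;
        }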
  • Daniel Egger - Tuesday, September 2, 2014 - link

    Finally! I've been waiting a long time for new, decent MIPS processors to show up, as I've never quite warmed up to the ARM ISA.

    However, the introduction is missing a couple of important facts (probably even more):
    1) RISC is usually a load-store architecture, meaning there are registers in abundance and the only way to work with data is to load it into registers and store it back if the result is needed later (see the toy sketch at the end of this comment)
    2) ...and that's also the reason the instruction set is much simpler: there are fewer instruction variants, because sources and targets are known to be registers (and in a few cases immediates), almost never the funky combinations of different memory access types one can find in CISC
    3) This also means that instruction size is constant on RISC, vastly simplifying instruction fetching and decoding
    4) Whether code size increases or decreases compared to CISC depends very much on how the application and compiler can utilize the available registers, because most of the bloat in RISC is actually caused by loads and stores; however, thanks to register starvation on x86, there may well be cases where the addressing causes plenty of bloat there too

    I would say that if there's going to be a comparison between RISC and CISC, it should cover the important differences in more detail. Otherwise, why bother at all?
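    To make the load-store point concrete, here is a minimal toy register machine in C (a deliberately simplified illustration; the opcodes and formats are invented, not MIPS): arithmetic only ever touches the register file, and the only instructions that reach memory are explicit loads and stores, which is exactly what keeps the instruction formats simple and fixed-width.

        #include <stdint.h>
        #include <stdio.h>

        /* Toy fixed-format "RISC-like" machine: every instruction names registers;
         * only LOAD and STORE touch memory. */
        enum op { LOAD, STORE, ADD, HALT };

        struct insn { enum op op; uint8_t rd, rs, rt; uint32_t addr; };

        int main(void)
        {
            uint32_t reg[32] = {0};          /* register file: registers in abundance */
            uint32_t mem[16] = {7, 35};      /* tiny data memory */

            const struct insn prog[] = {
                { LOAD,  1, 0, 0, 0 },       /* r1 <- mem[0]                  */
                { LOAD,  2, 0, 0, 1 },       /* r2 <- mem[1]                  */
                { ADD,   3, 1, 2, 0 },       /* r3 <- r1 + r2, registers only */
                { STORE, 3, 0, 0, 2 },       /* mem[2] <- r3                  */
                { HALT,  0, 0, 0, 0 },
            };

            for (const struct insn *i = prog; ; i++) {
                switch (i->op) {
                case LOAD:  reg[i->rd] = mem[i->addr];            break;
                case STORE: mem[i->addr] = reg[i->rd];            break;
                case ADD:   reg[i->rd] = reg[i->rs] + reg[i->rt]; break;
                case HALT:  printf("mem[2] = %u\n", (unsigned)mem[2]); return 0;
                }
            }
        }

    A CISC-style add-from-memory instruction would fold the two loads and the add into one variable-length instruction; whether that ends up smaller overall is exactly the register-pressure trade-off point 4 describes.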
  • darkich - Tuesday, September 2, 2014 - link

    Great, but unfortunately for Imagination, ARM have already started licensing the successors to Cortex A53 and A57.
    They are codenamed Artemis and Maya
  • darkich - Tuesday, September 2, 2014 - link

    ...a bit of clarification: Artemis refers to the big core, while Maya is the small one
  • tuxRoller - Tuesday, September 2, 2014 - link

    Great article, Stephen.
    Could you, at some point, go into a bit more depth on the relationships between out-of-order execution, superscalar execution, and simultaneous multithreading? Your description of the dispatcher, and the classification of this core as in-order, makes me wonder if I understand it at all. In particular, I didn't realise that superscalar is just a special case of out-of-order, as your text seems to imply (though you do say that it is not out of order, so it is puzzling).
  • heartinpiece - Tuesday, September 2, 2014 - link

    Some inaccurate information:
    A snooping coherence protocol doesn't connect cores to other cores, and one core doesn't monitor another core's cache lines.
    Instead, coherence messages are broadcast to all cores in the system, and each core checks whether it holds the cache line that was broadcast and takes the appropriate action.
    If 8 cores use snooping, each one doesn't 'connect' to the other 7; rather, the number of broadcast coherence messages increases (which may clog the interconnect).

    In a directory protocol, when data is updated, the directory notifies the other cores holding copies of that address to invalidate the corresponding cache lines (instead of filling the other cores with the updated value).
    The reason is that sending the actual value would be too expensive, and even if the value were pushed to the other cores, those cores might never access the newly updated cache line, in which case the update would have been sent for no reason.
    Instead, the invalidation approach is lazier and fetches the updated value only upon the next read or write of the cache line.
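    For illustration, here is a toy C sketch of the directory bookkeeping described above (invented for this comment; nothing here is specific to the I6400's coherence manager): the directory tracks which cores share a line and, on a write, sends invalidations only to those cores instead of broadcasting to everyone.

        #include <stdint.h>
        #include <stdio.h>

        #define NUM_CORES 8

        /* Toy directory entry: one bit per core that holds a copy of the line. */
        struct dir_entry { uint8_t sharers; };

        static void directory_read(struct dir_entry *e, int reader)
        {
            e->sharers |= (uint8_t)(1u << reader);  /* record one more sharer */
        }

        /* On a write, invalidate only the cores that actually hold the line. */
        static void directory_write(struct dir_entry *e, int writer)
        {
            for (int core = 0; core < NUM_CORES; core++)
                if (core != writer && (e->sharers & (1u << core)))
                    printf("  invalidate line in core %d\n", core);
            e->sharers = (uint8_t)(1u << writer);   /* writer is now the sole holder */
        }

        int main(void)
        {
            struct dir_entry line = { 0 };

            directory_read(&line, 0);
            directory_read(&line, 3);
            directory_read(&line, 5);

            printf("core 1 writes the line:\n");
            directory_write(&line, 1);              /* only cores 0, 3 and 5 are told */
            return 0;
        }

    Because only an invalidate message is sent, a sharer that never touches the line again costs nothing further; it fetches the new value only on its next read or write, as described above.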
  • Exophase - Wednesday, September 3, 2014 - link

    More on snooping in Cortex-A53:

    "Each core has tag and dirty RAMs that contain the state of the cache line. Rather than access these for each snoop request the SCU contains a set of duplicate tags that permit each coherent data request to be checked against the contents of the other caches in the cluster. The duplicate tags filter coherent requests from the system so that the cores and system can function efficiently even with a high volume of snoops from the system."
  • Stephen Barrett - Wednesday, September 3, 2014 - link

    Interesting! Thank you for this detail. I tried to find info about the SCU but couldn't, and my ARM contacts hadn't gotten back to me yet. I've added a paragraph about this.
  • tuxRoller - Wednesday, September 3, 2014 - link

    You might also want to update this sentence as the reasoning no longer seems to apply:
    "This is likely a contributing factor in why the I6400 can be used in SMP clusters of 6, whereas the A53 is limited to SMP clusters of 4."
