84 Comments

  • Flunk - Tuesday, September 02, 2014 - link

    Competition is always good; it will be interesting to see how these perform in real devices. The performance/power consumption offered by modern ARM processors is difficult to compete with. Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    I6400 offers better performance at lower power and reduced area vs. the competition. I have included some benchmarks in my article http://blog.imgtec.com/mips-processors/meet-mips-i... Reply
  • name99 - Tuesday, September 02, 2014 - link

    I'm sorry but that article appears to be marketing crap.

    You state "Preliminary results for I6400 show that adding a second thread leads to performance increases of 40-50% on SPECint or CoreMark". So adding a second thread speeds up the SINGLE-THREADED version of SPEC? That's a neat trick.

    Likewise you happily claim that multi-threading makes a "big difference" to web browsing, something that will come as news to the many engineers on the WebKit, Blink and IE teams who have sweated blood over this without much to show for their efforts.

    On your blog you can post whatever marketing fluff you like, but how about on AnandTech you limit yourself to actual numbers of real benchmarks?

    (Sorry to be cruel but, christ, throwing raw ads into the comment stream and pretending they're informed comment pisses me off no end.)
    Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    I might not be as versed as you are and excuse me if I'm wrong (someone correct me if I am) but, as far as I know, SPEC supports multi-threading. Multi-threading really does improve performance - but don't take it from me, take it from our customers who are already using it in both 32- and 64-bit MIPS-based designs: Broadcom, Cavium, Lantiq - I could go on.

    I don't really understand how you can claim that my article is marketing fluff. It is marketing, yes. But doesn't every company have an official release? And doesn't part of that release include competitive positioning?

    Let's not be behind-the-screen aggressive for behind-the-screen aggressiveness's sake. We have already offered a lot more information than our competitors, including benchmark data in CoreMark, DMIPS and SPECint.
    Reply
  • name99 - Tuesday, September 02, 2014 - link

    "We have already offered a lot more information than our competitors, including benchmark data in CoreMark, DMIPS and SPECint."
    Then why is the post full of claims, and basically numberless graphs, but no actual tables of numbers? Ooh, we're 1.3x faster than "competing CPU" --- that's helpful.
    There's more information available in any AnandTech phone review.

    Say what you like about nVidia, at least their HotChips Denver marketing slide gave numbers of a sort for Denver, compared to Baytrail, Krait-400, iPhone 5S and Haswell, all for a range of benchmarks (DMIPS, SPECInt2K and SPECFP2K, AnTuTu, Geekbench, Google Octane and some memory benchmarks). I think they were wrong to omit (definitely) SunSpider and (I care less) Kraken because SunSpider in particular gives a good feel for single-threaded performance on a large real-world code base. (SPECInt2K is a reasonable proxy, but stresses the uncore more than is probably usual for mobile devices.) Octane (and Kraken) are less interesting IMHO because they synthesize a workload that is vastly more parallelized than most actual websites.

    (Of course I'd expect you to do better than nVidia, especially since you're the new kid on the block.
    That means, for example, real numbers not scaled percentages;
    it means running the benchmarks honestly --- using the optimal compiler plus flags for each device;
    it means telling the public what those flags were so they can reproduce if necessary;
    it means not playing games with cooling systems that aren't going to be used on a real device, or an OS power driver that does not match what will ship in real devices;
    and it means using appropriate best of breed devices --- eg it's a bit slimy to use an iPhone 5S [1.3GHz] rather than iPad Air [1.4GHz] unless you have some damn good reason (like you're comparing against the phone version of your chip, not the tablet version.)

    The code to be compiled to perform the SPECInt benchmark runs is not threaded. Sure, if your compiler is smart enough to auto-parallelize that code, it can go right ahead. Since no-one else's compiler has managed to achieve much by doing that, I kinda doubt MIPS has made a breakthrough here...

    Multi-threading improves performance IF YOUR CODEBASE IS THREADED. My point is that the market that's being implied here (phones, tablets) is NOT substantially threaded.
    There absolutely are markets (in many of which MIPS already does well, things like networking or cellular) where threading is important and of benefit. That doesn't change the fact that phones and tablets are not such a market, and pretending otherwise is not helpful to anyone.
    Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    This is where you are wrong, no matter how much your finger gets stuck on caps lock. Programming for multithreading is not radically different from programming for multicore. In fact, Linux-SMP operating systems (e.g. Android) will see a dual-threaded CPU as two physical cores.

    Regarding your comments about benchmarks, I invite you to show me real, concrete numbers from our CPU IP competitors. We have said 5.6 CoreMark and 3.0 DMIPS per MHz. Now show me the data - and I am not interested in semiconductor manufacturers, who are not our competitors, but in IP vendors.

    The comparisons were made based on similar core configurations to ensure accuracy; how would you be able to reproduce them - are you an ARM licensee?
    Reply
  • Wilco1 - Tuesday, September 02, 2014 - link

    You've shown some numbers but not explained how they were obtained. As I said in my other post, MIPS uses a trick to get its CoreMark score, so any competitor result without the same trick will obviously look bad.

    And this is the issue with benchmarketing: unless it is possible to reproduce the score yourself, it is hard to believe any vendor-supplied scores.
    Reply
  • name99 - Tuesday, September 02, 2014 - link

    (a) Thanks for explaining SMT to stupid old me who's been in a coma for the past fifteen years and has never heard of the concept. Not sure WTF it has to do with my actual point about the dearth of threaded APPLICATIONS...

    (b) I'm not the guy trying to sell a CPU to the rest of the world, so I'm not sure why it's my job to provide numbers, but OK, here we go.

    iPhone 5S at 1.3GHz gets a geekbench-singlecore rating of about 1300, and a sunspider rating (with iOS7) of 416. What do you have as closest equivalent numbers?
    DMIPS --- give me a break. No-one cares about that because it tells you precisely nothing about anything hard that the CPU does. Coremark's slightly more interesting, but why don't you give some comparable CoreMark/MHz values so we can see what you consider to be your competitors.
    I see, for example, that Exynos quad A9 claims a value of 15.89 and a dual-core A15 claims 9.36. Would you consider those competitors?
    (As comparison, a single core A53 (at least the QC Snapdragon 410 variant) gets 3.7 according to AnandTech --- but 3.0 according to other sources so??? A57 is supposed to get 3.9, but who knows how trustworthy that number is.)

    Assuming your 5.6 number is for multi-threaded operation, I'm going to do the naive thing and say that that tells me the single-threaded value is 2.8, which is apparently worse than an A53. If you don't like that arithmetic, then give us the single-threaded benchmark numbers, rather than trying to persuade us that phones are a great example of user-level multi-threaded software.
    Reply
  • alexvoica - Wednesday, September 03, 2014 - link

    Please understand that CoreMark does not work like that for multi-threading vs multicore.

    If you look at their website https://www.eembc.org/coremark/

    The PThreads figure covers both cores and/or threads - they do not specifically say which is which.

    ARM scores are for multicore versions - this is why the CoreMark per MHz per core number is obtained by dividing that number by the number of PThreads. For example, for one Cortex-A15 you have 9.36 / 2 = 4.68 CoreMark/MHz. A single core proAptiv - which is a single-threaded design too - offers 5.1 CoreMark/MHz.

    The number we've quoted for I6400 is 5.6 CoreMark/MHz. For multithreading however, you do not divide by number of threads since these are not individual CPUs but threads part of a single core. The score for a single core, single threaded I6400 is not half of 5.6. We specify very clearly in the press release/blog article that adding another thread improves performance by 40-50%, so your numbers are incorrect.
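
To make the two rules being argued about concrete, here is the arithmetic written out (a sketch using only figures quoted in this thread; the 1.45 divisor is just the midpoint of the quoted 40-50% SMT uplift, not an official number):

```python
# Multicore: the EEMBC PThread count equals the number of physical
# cores, so divide the aggregate CoreMark/MHz score by it.
def per_core(score_per_mhz, cores):
    return score_per_mhz / cores

print(per_core(9.36, 2))      # dual Cortex-A15: 4.68 CoreMark/MHz per core

# SMT: the second thread shares one core and adds roughly 40-50%,
# so a single-thread estimate divides by ~1.45 (midpoint), not by 2.
print(round(5.6 / 1.45, 2))   # I6400 single-thread estimate: ~3.86
```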

    I still don't understand why you are pushing your agenda so aggressively and jumping to conclusions when the data is clear. The author of the article chose to quote DMIPS, but I believe we have presented a valid combination of benchmarks and scenarios. Again, we are not competing with silicon manufacturers - some of them are licensees - but with other IP vendors.
    Reply
  • Wilco1 - Wednesday, September 03, 2014 - link

    I don't agree Dhrystone and CoreMark are valid benchmarks for CPU comparisons - both are easily cheated. You claim some great results but you know very well these are not indicative of actual CPU performance. Both benchmarks use special compiler tricks (like I mentioned in other posts) that only speed up these benchmarks, but nothing else. I bet SPEC scores are not nearly as good.

    Once again: NVidia, for example, actually posted real scores for lots of benchmarks of their SoCs, including SPEC. Do the same rather than playing these benchmarketing games and you'll gain a lot more credibility.
    Reply
  • alexvoica - Wednesday, September 03, 2014 - link

    CoreMark is a superior benchmark to DMIPS and reflects real-world performance and workloads - not all of them, I agree, but it still covers a lot more. If you look at our CoreMark results, you will notice we use gcc,

    https://www.eembc.org/coremark/

    not proprietary (and expensive) compilers. I recommend you actually click on the link, see for yourself and then come back here and copy/paste what it says in the compiler section of that page.

    Again, if you had read my article and the press release, you would have seen we actually have said that we lead in SPECint scores AND provide better performance when using multithreading.

    I have no problem talking to you or anyone here trying to dispute our claims but let's keep it civilized and not imply I am deliberately lying.
    Reply
  • bji - Wednesday, September 03, 2014 - link

    Why are you so hostile? Calm the freak down man. Reply
  • Wilco1 - Tuesday, September 02, 2014 - link

    SPEC has a multithreaded variant called SPECrate, which runs as many threads as you want. Various compilers (eg. icc) do autoparallelize some of the SPEC benchmarks even for the base results. This has made SPEC almost useless as a single-threaded comparison. So what people do is ignore Intel's icc results and rerun SPEC using GCC with identical options on the CPUs to be compared. Reply
  • Samus - Wednesday, September 03, 2014 - link

    Wow name99 and Wilco1, chill the fuck out, you're way over-complicating this article. When we have working devices we'll get the real benchmark comparisons to the "ARM equivalent", but it's well understood MIPS has superior performance-per-watt capability at the cost of code and compiler complexity, in the same way ARM has superior performance per watt over x86 at the cost of x86 compatibility. The legacy P55C datapath inherently inflates the transistor count and inefficiency of x86 CPUs, and this is starting to become an issue for ARM as they have over 15 generations of designs, most of which are backwards compatible with each other.

    MIPS has just 6 generations of designs to contend with, and was already a more efficient processing method from the get-go. ARM's initially superior licensing model and incredibly successful development platform are what have led to their success decades later.
    Reply
  • Wilco1 - Wednesday, September 03, 2014 - link

    When we have working devices we can compare performance per Watt. Until then which will be more efficient is just a guess. MIPS and ARM started around the same time and have similar baggage accumulated. Note MIPS is a simpler ISA and actually easier for the compiler as it doesn't have some of the more complex instructions that ARM has. Reply
  • Samus - Wednesday, September 03, 2014 - link

    MIPSv6 is way more complex than ARM57, just like MIPSv1 was way more complex than ARM4

    JAVA is JAVA, but the compiler and instruction sets are more advanced in MIPS, hence more complex. An author may choose not to use a lot of extensions, but long story short, I work with programmers all the time and optimizing for MIPS is more work than optimizing for ARM. There are a lot of reasons for this (admittedly, industry support being one huge one.)
    Reply
  • Wilco1 - Wednesday, September 03, 2014 - link

    Cortex-A57 is ARM's highest performance and most complex OoO core. You can't compare that with MIPSv6 - an architecture. Wait until Imagination designs a CPU with comparable performance.

    MIPS is a simpler ISA than ARM, so compilers are easier to write. But MIPS pays for that by having to use more instructions to achieve the same task and more complex hardware to achieve the same performance. The fusing of 2 loads or stores is a very good example of this - if this was supported in the ISA like on ARM, it wouldn't need to be special cased. And while it improves performance, you still pay with larger codesize.
    Reply
  • defiler99 - Thursday, September 04, 2014 - link

    MIPS is hardly the "new kid on the block"; the MIPS instruction set has always been a rare example of beauty and simplicity in design.

    It's hardly fair to call that article "marketing fluff" either; have you seen some examples of true fluff? That isn't one.
    Reply
  • puppies - Tuesday, September 02, 2014 - link

    The time to admit your affiliation was before you started your first post, not after you got called out on it...... Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    Called out for using my real name and claiming ownership (I, my) of the article from the first post? Don't think so. Reply
  • Wilco1 - Tuesday, September 02, 2014 - link

    I think name99 is a bit harsh but he has valid points. For example the MIPS CoreMark results use a special plugin that adds a CoreMark specific optimization. So it is misleading to claim a great CoreMark result.

    Now if this optimization was added to mainline and also enabled when benchmarking the "competitor" then it would be a fair comparison.
    Reply
  • mthrondson - Sunday, September 07, 2014 - link

    I'm not familiar with this special plug-in. Can you elaborate? Reply
  • Samus - Wednesday, September 03, 2014 - link

    MIPS has always been superior to ARM in regards to "performance per watt", even dating back to the PocketPC days (1997), but its success has always suffered because of architecture complexity. You might as well port machine code before you recompile an app from ARM to MIPS.

    The success of MIPS lies with the quality of the development kits, compilers, and price (which they will apparently be aggressively targeting).
    Reply
  • jjj - Tuesday, September 02, 2014 - link

    Interesting, but it will be hard for them to get big wins in mobile (the main SoC anyway). Even if they try to be cheaper, how much cheaper do they need to be to justify the compatibility issues?
    A note here: you look at the competition between ISAs, but don't forget that we've got a bunch of custom ARM cores too, so for consumers it's even more fun.
    Also the Meizu MX4 launched today, the first device with the quad A17 Mediatek, so a notable event. Haven't really seen any proper benchmarks for A17 yet, hopefully soon.
    Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    The compatibility issue is going away fast. MIPS64 is a proven architecture that has a full ecosystem built around it, while others are still catching up. Additionally, 64-bit is an inflection point where a lot of software will need to be recompiled so a lot of people are actually starting from scratch. Reply
  • evolucion8 - Tuesday, September 02, 2014 - link

    While I am in love with ARM, I look forward to MIPS too! Reply
  • Notmyusualid - Tuesday, September 02, 2014 - link

    The more competition, the better... Reply
  • Wilco1 - Tuesday, September 02, 2014 - link

    There is no requirement to recompile anything but the OS - existing apps continue to run in 32-bit without a penalty. Reply
  • alexvoica - Wednesday, September 03, 2014 - link

    There is a requirement if you want your software to take advantage of the latest architectural improvements. If you don't, it will still run, yes - but not as fast/efficiently as it could. Reply
  • bleh0 - Tuesday, September 02, 2014 - link

    I have to say from a feature standpoint this seems far more interesting than the A53. I just doubt that by this time next year we will actually see it in devices. Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    This is typical of any CPU IP - there is a design-in cycle so yes, it will still take some time before it will appear in end devices. But it will blow everyone out of the water when it does - this is why we already see a lot of interest and have secured multiple licensees for I6400. Reply
  • xdrol - Tuesday, September 02, 2014 - link

    Directly comparing the IPC of two different ISAs makes no sense; one needs to factor the instruction ratio of the same program compiled to ARM and MIPS too. Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    Good point. I've added some text discussing that. Thank you! Reply
  • WonderfulVoid - Tuesday, September 02, 2014 - link

    MIPS(64) and 64-bit ARM ISA are quite similar. ARM keeps the condition codes, which MIPS does not have. Conditional instructions (except branches) are almost gone. Use of the barrel shifter on one of the operands has been limited.

    One of the ARMv8 architects is a former MIPS architect...
    Reply
  • BoyBawang - Tuesday, September 02, 2014 - link

    Sooner or later ARM will simply evolve to have all the new features of MIPS, like SMT. It's dead on arrival. Even in the low to mid end I can't see how it could challenge MediaTek unless they price it super aggressively. Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    Some companies would be interested in having an alternative to single-sourcing ARM processors, even if MIPS processors offer the same performance. It's all about managing risk. Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    SMT is nothing you build overnight. It takes years of testing and proper microarchitectural re-design, so let's agree to disagree. Reply
  • varulv - Tuesday, September 02, 2014 - link

    Mediatek is an SoC designer; they could buy MIPS licenses if they can save some money per chip. Reply
  • Krysto - Tuesday, September 02, 2014 - link

    > Until MIPS achieves enough volume to convince application developers to code to the MIPS3264 ISA or stick with Java, MIPS Android devices will be second class citizens.

    Patently false. The next version of Android will make that point moot.
    Reply
  • Brett Howse - Tuesday, September 02, 2014 - link

    ART is the replacement for Dalvik, not native code.
    http://www.anandtech.com/show/8231/a-closer-look-a...
    Reply
  • extide - Tuesday, September 02, 2014 - link

    No, it's correct, and it has nothing to do with Android L. NDK apps are compiled directly to a specific ISA. For Java apps, Android L/ART works the same way as Dalvik ... the only difference is WHEN the compiling to native code happens. Reply
  • coder111 - Tuesday, September 02, 2014 - link

    Imagination technologies? Same company that's responsible for the disaster that was the GPU in Intel Poulsbo? With no drivers coming to Linux and the most hostile approach to any Linux driver development?

    And they want to succeed in embedded/mobile space, where lots of things run Linux? I hope they change their stance on open-source development and hardware support, and soon...
    Reply
  • dwforbes - Tuesday, September 02, 2014 - link

    Imagination Technologies, the makers of PowerVR used in a significant percentage (if not majority) of mobile devices, doesn't just want to succeed -- they have been fairly dominant for years. Reply
  • Lonyo - Tuesday, September 02, 2014 - link

    I think that's partially an Intel problem, and it would be interesting to see if Imagination changes their approach. Intel licensed the GPU but didn't have all the access they needed to develop proper drivers, and Imagination didn't care as it was a third-party chip using their IP, I'd guess.
    It's 50/50 Intel not being sensible with the licensing and Imagination not feeling a need to care.

    And yes, they were a mess. DXVA support didn't exist in XP and caused BSOD every time, and in Win Vista/7 it didn't really do much, from my own personal non-Linux experience. No support for that GPU, but I'd mainly blame Intel for not managing to sort it out when they licensed the IP. They are the ones putting their name on the product and seem to have forgotten to sort out a proper way to support it, and Imagination probably don't care when they have their money.

    When it's Imagination's name on the side they would probably care.
    Reply
  • extide - Tuesday, September 02, 2014 - link

    You mean Imagination Technologies, who has been the GPU provider in all of the Apple SoCs in all of the iPhones, iPads, and MANY existing Android devices already on the market? That whole Atom GPU disaster was really more of Intel's fault than IMG's. Reply
  • patrickjchase - Tuesday, September 02, 2014 - link

    On page 3 you state "even though the core is listed by Imagination as in-order, the SMT feature (when present) allows the I6400 to behave as a superscalar core".

    Superscalar and out-of-order are orthogonal concepts. It is possible to have a core which is superscalar but not OoO (Cortex-A7/A8/A53, MIPS R5000, the original Pentium) as well as a core which is OoO but not superscalar (IBM 360/91, the very first OoO design). Note that all of the examples I gave do not use SMT/Hyperthreading.
    Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    I cleaned up that paragraph. Thank you for the feedback. I think I got my wires crossed when Imagination's details discussed superscalar at the same time they discussed SMT. Reply
  • SarahKerrigan - Tuesday, September 02, 2014 - link

    "I would imagine the superscalar execution is limited to the next two instructions within a thread (as there is no reorder buffer); otherwise the entire core wouldn’t be listed as in-order."

    The diagram clearly shows that it can issue two ops from two different threads in a cycle. This is what makes it *simultaneous* multithreading, instead of fine-grained multithreading.

    There have been plenty of examples of in-order cores with SMT - for instance, the first-gen Atom core could issue from both hardware contexts in the same cycle, as could the lightweight dual-issue in-order PowerPC core in the Xbox 360 and the Cell. Issuing from multiple contexts per cycle isn't dependent on reorder capability.
    Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    Yes, that was hopefully made clear in the previous paragraph about how SMT works. What I was discussing in that quoted sentence is the superscalar execution capability of a single thread (not SMT). See the preceding sentence in the same paragraph: "Even though the core is in-order, the I6400 performs superscalar execution for a given thread. Since it is dual dispatch, two instructions from a single thread can be executed in parallel." Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    It behaves like a superscalar CPU when used in a single-threaded configuration and like an in-order design in multithreading variants. Reply
  • Exophase - Wednesday, September 03, 2014 - link

    Hi Alex, could you clarify what you mean by this comment? Superscalar and in-order are completely orthogonal properties, and I would expect that it always behaves like an in-order design regardless of SMT. Do you mean that in SMT mode it can only dispatch one instruction per cycle from the same thread? If that's the case, surely this is something that can be dynamically configured based on active thread count and not a fixed property of the processor? Reply
  • MartinT - Thursday, September 04, 2014 - link

    I guess it is true strictly speaking that because of the two execution queues, it's limited to either super-scalar (single-threaded) or (scalar) multi-threaded operation at any one instant.

    I agree that it should read 'scalar' rather than 'in-order'.
    Reply
  • mthrondson - Sunday, September 07, 2014 - link

    To clarify - the I6400 can run superscalar on a single thread, or issue from two threads simultaneously. And it can switch which thread(s) it is working from on a per cycle basis. Reply
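
That per-cycle behaviour can be sketched as a toy model (hypothetical code, not Imagination's actual issue logic):

```python
from collections import deque

def issue_cycle(q0, q1, width=2):
    """Issue up to `width` ops this cycle: one from each active
    hardware thread (simultaneous multithreading), or two from the
    same thread (superscalar) when only one thread has work."""
    slot = []
    if q0 and q1:                      # both threads ready: one op each
        slot.append(q0.popleft())
        slot.append(q1.popleft())
    else:                              # one thread active: dual-issue it
        q = q0 if q0 else q1
        while q and len(slot) < width:
            slot.append(q.popleft())
    return slot

t0, t1 = deque(["a0", "a1", "a2"]), deque(["b0"])
print(issue_cycle(t0, t1))  # → ['a0', 'b0']  (one op from each thread)
print(issue_cycle(t0, t1))  # → ['a1', 'a2']  (t1 drained: superscalar)
```
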
  • Guspaz - Tuesday, September 02, 2014 - link

    They taught us MIPS32 assembly in school. My impression was that it was enormously simpler to write by hand than x86 assembly, much less of a headache to work with. Of course, assembly is almost entirely irrelevant these days. Reply
  • patrickjchase - Tuesday, September 02, 2014 - link

    The main reason everybody learns MIPS (a.k.a. "DLX") in school is because the dominant architecture text was co-written by MIPS' inventor. Reply
  • Guspaz - Tuesday, September 02, 2014 - link

    That's entirely possible. To be honest, I don't remember which textbook we used for our processor architecture course. But it was a breath of fresh air compared to x86 or even SIC. Having all the registers be general-purpose and letting you specify which register to put results into in the instruction was much easier to work with when writing assembly by hand on paper than x86, where every register seemed to be special-purpose, with different instructions putting results in different abstractly named registers. Reply
  • Exophase - Wednesday, September 03, 2014 - link

    I've written x86 and MIPS assembly in real world applications, and personally I find both to be pretty annoying. When writing MIPS assembly, the poor addressing modes, delay slots, and range of immediates make it more cumbersome than x86. When writing x86 assembly, the lack of registers and three-address operands make it more cumbersome than MIPS. I haven't written much in x86-64, which I suspect is less annoying. Reply
  • dwforbes - Tuesday, September 02, 2014 - link

    It's worth noting that for Android developers using the NDK, coding for MIPS, ARM64, or x86-64, is in the vast majority of cases nothing more than a compiler flag. There is seldom extra work necessary unless you've specifically used inline assembly. Reply
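
As a concrete illustration of that, here is roughly what the flag looks like in an NDK `Application.mk` (a sketch; the ABI list is the standard set from NDK builds of that era):

```make
# Application.mk - build the same native code for several ISAs.
# Adding an ABI here is normally the whole port, unless the project
# contains hand-written assembly for a specific architecture.
APP_ABI := armeabi-v7a arm64-v8a x86 x86_64 mips mips64
```
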
  • Samastrike - Tuesday, September 02, 2014 - link

    I couldn't help noticing that the Android logo used in the slide on the second page is holding a lollipop. Is this some confirmation of the official name when Android L releases? Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    I had the same thought. Wondered if someone would notice that. I'll let everyone conjecture on that topic :) Reply
  • flamethrower - Tuesday, September 02, 2014 - link

    For me, I know MIPS because the Sony PSP (handheld game console) uses a MIPS, the MIPS R4000. Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    Nintendo 64 used MIPS too. On top of that, it was a 64-bit MIPS-based CPU! Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    Can't sleep so I'm doing a live MIPS AMA at http://redd.it/2f9c14 if anyone wants to join. Reply
  • name99 - Tuesday, September 02, 2014 - link

    "
    If two load or store instructions arrive at the scheduler with adjacent addresses, the I6400 can "bond" them together into a single instruction executed by the load/store unit.
    "

    ARMv8 has essentially the same thing. Details differ, but there is an instruction that loads/stores two registers to adjacent memory locations as one operation --- same idea, to utilize the full width of the 128-bit bus to the cache.
    Reply
  • Stephen Barrett - Tuesday, September 02, 2014 - link

    Good to know. That is an ISA update though, so it requires compiler support and a recompile. The MIPS feature is part of their hardware scheduler, so they can do it on 32-bit and 64-bit programs simultaneously and without any updates to the programs. Reply
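
For the curious, the bonding pass can be modelled in software like this (purely illustrative: the opcode names and the 8-byte adjacency rule are assumptions, not the real scheduler):

```python
# Illustrative model of "bonding": a scheduler pass that merges two
# loads with adjacent addresses into one double-width access.
def bond_loads(queue):
    out, i = [], 0
    while i < len(queue):
        cur = queue[i]
        nxt = queue[i + 1] if i + 1 < len(queue) else None
        if (nxt is not None and cur[0] == "LOAD64" and nxt[0] == "LOAD64"
                and nxt[1] == cur[1] + 8):   # adjacent 8-byte slots
            out.append(("LOAD128", cur[1]))  # one 128-bit cache access
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

print(bond_loads([("LOAD64", 0x100), ("LOAD64", 0x108), ("LOAD64", 0x200)]))
# → [('LOAD128', 256), ('LOAD64', 512)]
```
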
  • WonderfulVoid - Tuesday, September 02, 2014 - link

    Load/store dual (or double) is supported already on ARMv7A (infocenter.arm.com mentions support from ARMv5TE). These are 32-bit architectures but I am sure 64-bit ARMv8 can load/store 128 bits using equivalent instructions.

    Having the HW do it for you automatically is of course a nice feature. The end result might be the same.
    Reply
  • Wilco1 - Tuesday, September 02, 2014 - link

    Having separate instructions to do load/store double means smaller codesize - these instructions are commonly used during function prolog and epilog so they give significant gains. Reply
  • DMStern - Tuesday, September 02, 2014 - link

    The MIPS r6 architecture is very interesting because, in order to clear opcode space, a number of rarely-used instructions have been deleted. Some architectural warts have also been removed, maybe most notably the branch delay slot. This is the first time anything has been removed from the base ISA since its creation in 1985. Reply
  • WonderfulVoid - Tuesday, September 02, 2014 - link

    Is MIPSr6 backwards compatible? Can you run earlier user and kernel space binaries on a MIPSr6 processor?
    Difficult to emulate the removed instructions if those opcodes are used for new instructions.
    Maybe there is a need for a mode switch, r6 mode or pre-r6 mode?
    Reply
  • DMStern - Tuesday, September 02, 2014 - link

    It is not backwards compatible.
    "In Release 6 implementations, object-code compatibility is not guaranteed when directly executing pre-Release 6 code, because certain pre-Release 6 instruction encodings are allocated to different instructions in Release 6."
    Removing the delay slot of course also breaks binary compatibility in a major way. The documentation (which you can download from ImgTec's website) claims r6 has been designed to make translation of old binaries efficient.
    Reply
  • alexvoica - Tuesday, September 02, 2014 - link

    You've (almost too) carefully forgotten to mention the trap-and-emulate feature described in the spec. Reply
  • DMStern - Tuesday, September 02, 2014 - link

    The documentation also says that only a subset can be trapped, and that some encodings have been re-used. I haven't studied the instruction encoding tables closely enough to know how many, and how serious the conflicts are. Presumably, as more instructions are added in later revisions, the less useful trapping will be. Reply
  • Daniel Egger - Tuesday, September 02, 2014 - link

    Finally! I've been waiting a long time for new decent MIPS processors to show up, as I've never quite warmed up to the ARM ISA.

    However the introduction is missing a couple of important facts (probably even more):
    1) RISC is usually a load-and-store architecture, meaning there are registers in abundance and the only way to work with data is to load it into registers and store it back if the result is needed later
    2) ... and that's also the reason why the instruction set is much simpler: there are fewer instruction variants, because sources and targets are known to be registers (and in a few cases immediates), but almost never those funky combinations of different memory access types one can find in CISC
    3) This also means that instruction size is constant on RISC, vastly simplifying instruction fetching and decoding
    4) Whether the code size increases or decreases compared to CISC very much depends on how the application and compiler can utilize the available registers, because most of the bloat in RISC is actually caused by loads and stores; however, thanks to register starvation on x86, there might be lots of cases where the addressing causes lots of bloat

    I would say if there's a comparison between RISC and CISC it should be more detailed on the important differences. Otherwise, why bother at all?
    Reply
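    Points 1-3 above can be illustrated with a toy interpreter. This is a minimal sketch of an invented register machine, not any real ISA: arithmetic operates only on registers, memory is touched only through LOAD and STORE, and every instruction has the same fixed four-field shape, which makes decoding trivial.

    ```python
    # Toy load/store machine (invented for illustration, not a real ISA).
    # Every instruction is a fixed-size tuple (op, a, b, c) -- constant-width
    # encoding, as in point 3 -- and only LOAD/STORE may touch memory.

    def run(program, memory, num_regs=8):
        regs = [0] * num_regs
        for op, a, b, c in program:
            if op == "LOAD":                 # regs[a] = memory[b]
                regs[a] = memory[b]
            elif op == "STORE":              # memory[b] = regs[a]
                memory[b] = regs[a]
            elif op == "ADD":                # regs[a] = regs[b] + regs[c]
                regs[a] = regs[b] + regs[c]
            else:
                raise ValueError(f"unknown opcode {op}")
        return memory

    # memory[2] = memory[0] + memory[1] takes two loads, an add, and a store:
    # there is no ADD-with-memory-operand as there would be on a CISC.
    mem = run([
        ("LOAD",  0, 0, 0),
        ("LOAD",  1, 1, 0),
        ("ADD",   2, 0, 1),
        ("STORE", 2, 2, 0),
    ], [5, 7, 0])
    print(mem)  # [5, 7, 12]
    ```

    The four-instruction sequence for a single memory-to-memory add is also point 4 in miniature: the explicit loads and stores are where RISC code size goes.
    
    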
  • darkich - Tuesday, September 02, 2014 - link

    Great, but unfortunately for Imagination, ARM have already started licensing the successors to Cortex A53 and A57.
    They are codenamed Artemis and Maya
    Reply
  • darkich - Tuesday, September 02, 2014 - link

    ..a bit of clarification, the Artemis refers to the big core while Maya is the small one Reply
  • tuxRoller - Tuesday, September 02, 2014 - link

    Great article Stephen.
    Could you, at some point, go into a bit more depth on the relationships between out of order, superscalar and simultaneous multithreading? Your description of the dispatcher, and classification of this core as in-order, makes me wonder if I understand it at all. In particular, I didn't realise that superscalar is just a special case of out of order, as your text seems to imply (though you do say that it is not out of order, so it is puzzling).
    Reply
  • heartinpiece - Tuesday, September 02, 2014 - link

    Some inaccurate information:
    A snooping coherence protocol doesn't connect cores directly to other cores, and one core doesn't monitor the cache lines of another core.
    Instead, coherence messages are broadcast to all cores in the system; each core checks whether it holds the cache line that was broadcast and takes the appropriate action.
    If 8 cores use snooping, each doesn't 'connect' to the other 7; rather, the volume of broadcast coherence messages increases (which may clog the interconnect).

    In a directory protocol, when data is updated, the directory notifies the other cores holding data for the same address to invalidate the corresponding cache lines (instead of filling the other cores with the updated value).
    The reason is that sending the actual value would be too expensive, and even if the value were pushed to the other cores, those cores might never access the newly updated cache line, so the value would have been sent for nothing.
    Instead, the invalidation approach is lazier: the updated value is only fetched upon the next read/write to the cache line.
    Reply
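    The invalidate-on-write behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified toy model (no MESI states, no ownership transfer, write-through memory): on a write, the directory tells the other sharers to drop the line rather than pushing them the new value, and they refetch it lazily on their next access.

    ```python
    # Toy directory-based coherence model (simplified: no MESI states,
    # no ownership transfer, memory is updated write-through).

    class Directory:
        def __init__(self, num_cores):
            self.memory = {}                            # addr -> value
            self.sharers = {}                           # addr -> set of core ids
            self.caches = [dict() for _ in range(num_cores)]

        def read(self, core, addr):
            cache = self.caches[core]
            if addr not in cache:                       # miss: fetch, register sharer
                cache[addr] = self.memory.get(addr, 0)
                self.sharers.setdefault(addr, set()).add(core)
            return cache[addr]

        def write(self, core, addr, value):
            # Invalidate every other sharer instead of forwarding the value.
            for other in self.sharers.get(addr, set()) - {core}:
                self.caches[other].pop(addr, None)
            self.sharers[addr] = {core}
            self.caches[core][addr] = value
            self.memory[addr] = value                   # write-through, for simplicity

    d = Directory(num_cores=2)
    d.read(1, 0x40)          # core 1 caches address 0x40
    d.write(0, 0x40, 42)     # core 0 writes: core 1's copy is invalidated, not updated
    print(0x40 in d.caches[1])   # False -- the stale line is simply gone
    print(d.read(1, 0x40))       # 42 -- refetched lazily on the next read
    ```

    Note how the write sends only a small invalidation message per sharer; the data itself moves only when a core actually asks for it again.
    
    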
  • Exophase - Wednesday, September 03, 2014 - link

    More on snooping in Cortex-A53:

    "Each core has tag and dirty RAMs that contain the state of the cache line. Rather than access these for each snoop request the SCU contains a set of duplicate tags that permit each coherent data request to be checked against the contents of the other caches in the cluster. The duplicate tags filter coherent requests from the system so that the cores and system can function efficiently even with a high volume of snoops from the system."
    Reply
  • Stephen Barrett - Wednesday, September 03, 2014 - link

    Interesting! Thank you for this detail. I tried to find info about the SCU but couldn't, and my ARM contacts hadn't gotten back to me yet. I've added a paragraph about this. Reply
  • tuxRoller - Wednesday, September 03, 2014 - link

    You might also want to update this sentence as the reasoning no longer seems to apply:
    "This is likely a contributing factor in why the I6400 can be used in SMP clusters of 6, whereas the A53 is limited to SMP clusters of 4."
    Reply
  • OreoCookie - Thursday, September 04, 2014 - link

    Just a small correction: Oracle's/Sun's UltraSPARC T-series also supports SMT. From the T2 on, Oracle has implemented 8-way SMT, while the original T1 »only« had 4-way SMT. Reply
  • MrSpadge - Thursday, September 04, 2014 - link

    The inclusion of SMT should be especially helpful for in-order designs. Which is probably why they can claim such a huge performance increase. And to make the number of threads configurable is an interesting design choice. This way "users" (licensees) can balance throughput and latency for the intended application. Reply
  • narmermenes - Friday, September 05, 2014 - link

    I'm excited that Imagination is being aggressive with their acquisition of MIPS. The platform is a great alternative to ARM, and in many ways superior to the ARM architecture.
    While ARM is just starting to offer a 64-bit variant, MIPS has offered a 64-bit version for the last 10-12 years.
    Thousands of server designs have already been built around MIPS, and a mountain of server and enterprise software already exists for the 64-bit version.
    Now with the Open-MIPS program running at full steam, MIPS can really be the next big thing in open hardware, allowing fully open systems based on Open-MIPS hardware running an open operating system in Linux.
    Reply
  • fteoath64 - Monday, September 08, 2014 - link

    Great discussion, guys! Thanks, and welcome back, MIPS! It's almost like an old "ghost" reappearing on the scene, with a vengeance this time. I see MIPS targeting the server market as the lucrative one, since Intel is the only surviving player there and Intel has no RISC to play. This server market could quickly reach the household, where replacing aging x86 PCs would be common as Linux takes hold and Android gaming goes into game-streaming mode (a la Nvidia). Does anyone think MS would do a MIPS version of Windows 9 just to get back at Intel? I don't think so. Hence, open source is the only way to go as a home OS. It has everything people need and more.

    Note: flash back and remember the MIPS NT server OS in the old days and how that died. I am sure the older MIPS people understand and remember...
    A MIPS server needs to be twice as powerful as an equivalent x86 server and to go for half the price to compete. They can, with current tech and some serious investment in software as well, in both OS development and apps. Focusing on the cloud and mobile clients, a market for private clouds is there for the taking. Storage vendors like WD and Seagate will be happy with more local storage that recycles their products every 3-4 years.
    Reply
