In November 2019, the company NUVIA broke out of stealth mode. Founded by former senior Apple and Google processor architects, John Bruno, Manu Gulati and Gerard Williams III, the company came crashing out of the gate with quite considerable goals to revamp the server market with an SoC that would provide ‘A step-function increase in compute performance and power efficiency’. Today NUVIA is putting more data behind those goals.

The press release we received from NUVIA takes some time to cover the basics of the modern server market, and it initially read almost like an AnandTech article, which is a little eerie. Suffice to say, NUVIA understands the current state of play of the server market, including where Intel and AMD stand with respect to each other, and how x86 offerings are squaring up against the other options on the market. As with most elements of the server market, different verticals often have different requirements for compute, memory, IO, power, or physical constraints, as well as for initial hardware cost alongside total cost of ownership. To that end, NUVIA’s processor design is, according to the company, ‘an SoC that will deliver industry-leading performance with the highest levels of efficiency, at the same time’.

With that, NUVIA is announcing that its first generation CPU core will be called Phoenix and be built upon the Arm architecture (likely Armv9) with an architecture license. Phoenix will be part of the Orion SoC, with NUVIA stating that they are implementing ‘a complete overhaul of the CPU pipeline’. Gerard Williams’ designs from Apple are known to be considerably different to what we’ve seen elsewhere in the market, so we suspect that this is going to be a big part of the secret sauce behind Orion and its Phoenix cores.

NUVIA goes on to say that Phoenix is ‘a clean sheet design’, focusing on single core performance leadership and maximizing memory bandwidth and utilization. The Orion SoC will be built to focus on high utilization and sustained frequencies, without having to rely on high-turbo marketing numbers, to allow customers to make the best use of the hardware within allocated power and cooling budgets. Alongside this, NUVIA is stating that there will be hardware infrastructure built to specification ‘to support peak performance on real cloud workloads’.

NUVIA’s Numbers

The big part of the press release is NUVIA’s performance-per-watt claims. To make them, NUVIA is using Geekbench 5 as a performance indicator, along with direct power measurements, of current in-market x86 and Arm offerings. NUVIA is taking smartphone and mobile-class cores, such as Intel Ice Lake, Qualcomm SD865, AMD Ryzen 4700U, as well as Apple’s A12Z Vortex and A13 Lightning, as starting points. The reason for this is that NUVIA believes there is no longer a meaningful difference between smartphone/mobile cores and server cores when extrapolated; the distinction only becomes relevant if you start adding in massive vector engines for specific customers.

According to NUVIA’s numbers, this is where the current market stands with respect to Geekbench 5. At every point, Arm’s results are more power efficient or higher performing than anything available on x86, even though at the high end Apple and Intel are almost equal on performance (for 4x the power on Intel).

NUVIA notes that power of the x86 cores can vary, from 3W to 20W per core depending on the workload, however in the sub 5W bracket, nothing from x86 can come close to the power efficiency of high-performance Arm designs. This is where Phoenix comes in.

NUVIA’s claim is that the Phoenix core is set to offer 50% to 100% higher peak performance than the other cores, either for the same power as other Arm cores or for a third of the power of x86 cores. NUVIA’s wording for this graph includes the phrase ‘we have left the upper part of the curve out to fully disclose at a later date’, indicating that they likely intend for Phoenix cores to go beyond 5W per core.
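
To put that claim in performance-per-watt terms, here is a minimal sketch of the arithmetic. It assumes the 50% to 100% uplift applies against both the Arm and the x86 baselines, and the function name and ratio inputs are our own illustration rather than anything from NUVIA's release.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Efficiency of a new core relative to a baseline core.

    perf_ratio:  new performance / baseline performance (1.5 means +50%)
    power_ratio: new power / baseline power (1/3 means a third of the power)
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Claimed uplift range, expressed as ratios (assumption: same range vs. both baselines).
for perf in (1.5, 2.0):
    vs_arm = relative_perf_per_watt(perf, 1.0)      # same power as other Arm cores
    vs_x86 = relative_perf_per_watt(perf, 1.0 / 3)  # a third of the power of x86 cores
    print(f"perf x{perf:.1f}: {vs_arm:.1f}x perf/W vs Arm, {vs_x86:.1f}x perf/W vs x86")
```

Under those assumptions the claim works out to roughly 1.5x to 2x the efficiency of current Arm cores and roughly 4.5x to 6x that of the x86 cores, at the specific points on the curve NUVIA chose to show.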

At this point, NUVIA is running simulations of its core designs in-house to get these numbers. This is a standard thing for any company developing a new SoC or a new core before actually going to the fab to get it made. It also helps investors analyze where things stand.

What gives credibility to the new company’s lofty goals is the founders’ track record with their past designs. Apple’s silicon success over the last half decade has been one of the most impressive developments in the industry, and it seems NUVIA has been able to recruit top talent with the aim to reproduce such success in the datacentre market.

Some users might consider that SPEC should have been used, given its relevance to NUVIA’s initial target markets on server, and I perhaps agree. I suspect that NUVIA believed that GB5 might be more accessible to a wider audience for core-to-core comparisons.

The Future

NUVIA states with this press release that it will aim to have some of the highest performance and best efficiency CPU/SoC products in the market. The company reiterates that even if other vendors suddenly see a 20% year-over-year gain in raw performance, NUVIA still expects to be ahead of its main competitors. We shall have to wait and see what magic NUVIA has that others do not.

Update: Initially this article said that NUVIA will have new products in the next 18 months. This was a simple misreading of NUVIA's press release and the relevant sentence has been removed.

49 Comments


  • npz - Wednesday, August 12, 2020 - link

    I have not seen graviton outperform Epyc yet since EVERY single benchmark done uses Amazon's constrained "vcpu" terminology and benchmarks HALF the physical cores of Epyc to Graviton's
  • Wilco1 - Wednesday, August 12, 2020 - link

    Graviton 2 is cost optimized and runs at a low frequency, but despite that competes with fast x86 servers. Ampere Altra is 32% faster per core, plus it has 80 cores (and 128 in the next generation), so it outperforms EPYC.

    EPYC has 8 times the L3 cache, 3 times as much silicon and is beaten by a small startup company using an off-the-shelf Arm core! Now that's embarrassing... Nuvia is doing the same again using a custom core. Intel and AMD won't be able to match this.
  • abufrejoval - Tuesday, August 11, 2020 - link

    I keep wondering what their secret sauce is...

    With something like Ivan Godard's Mill architecture, I understand how they achieve an order of magnitude more compute performance out of the same number of transistors and energy budget: It's quite simply a very clever way of doing things with a DSP inspired ISA that manages to remain general purpose. It is still my personal favorite, while I'll concede that general purpose has diminishing returns and RISC-V may be better.

    But with a given architecture like ARM, just how much can you do?

    The last architectural doubling of IPC performance I could sort of understand was the VISC design presented here four years ago. That was just a factor of 2 and it came with a very high effort in an area likely more prone than ever to side channel issues.

    But how can these new cores deliver the same general purpose compute power at a fraction of the energy cost on an existing ISA?

    There are really only two avenues that I can see:
    1. use fewer transistors: To my taste that's too much magic and I don't see Apple chips being small
    2. use more transistors but switch them much more slowly (and more aggressively off): At least that seems more likely than 1.

    In any case their approach can't be unique to ARM as an ISA, so I guess we won't know, because once that secret got out, everyone would copy their approach.

    Probably with less success on x86, because the inherent overhead and complexity of the translation layer isn't going away, while its benefits become ever less important.

    But RISC-V or Mill would profit, as would any other ARM if that technology became generalized.

    And I can see how and why they got out of Apple: There is really very little sellable benefit for the additional power on the smartphone.

    On the laptop workstation, much more so, but on the server, energy consumption is king.

    Easy to understand why Tim Cook doesn't like them doing a Jim Keller or going independent. But personally I'd be more interested in a 20GB leak from these guys than from Intel.
  • Veedrac - Thursday, August 13, 2020 - link

    My reply ended up elsewhere: https://www.anandtech.com/comments/15967/nuvia-pho...
  • npz - Wednesday, August 12, 2020 - link

    Talk about misleading. Everything in the headline is predicated on normalized power measurement. How about without that constraint?

    OR do I take it from this statement:
    > NUVIA notes that power of the x86 cores can vary, from 3W to 20W per core depending on the workload, however in the sub 5W bracket, nothing from x86 can come close to the power efficiency of high-performance Arm designs. This is where Phoenix comes in.

    That they are actually NOT competing with x86. Because it sounds plain dishonest to make those claims *implying* total performance of up to the same power envelope that x86 servers use.... and then turn around and only release products in the 5W bracket.
  • npz - Wednesday, August 12, 2020 - link

    .. or at least sub-5W per core bracket. Why not allow yourself 20W per core and see what the results are if your efficiency is actually scalable?
  • anonomouse - Wednesday, August 12, 2020 - link

    Did you read a different article than the rest of us? It says their target is substantially >20W per core products, but at 5W per core?
  • npz - Wednesday, August 12, 2020 - link

    Only according to power normalized measurement and not absolute performance measurement regardless of power. And that's my beef
  • npz - Wednesday, August 12, 2020 - link

    In other words, given the much larger power envelope allowed by the socket and modern platform board, then let's see that perf per watt scale UP with power, especially if you are going to compare single threaded to desktop CPU. In that environment, where the CPU and the user prioritize absolute performance to achieve the most work done per unit of time, given the entire socket's power budget at your disposal, then let's see them turbo up the clock speed to max potential
  • ksec - Wednesday, August 12, 2020 - link

    Well, the A14 is expected to push to around 1600 in GB5 (purely from a GB perspective), so those curves will be higher up in only a few weeks' time.

    So by the time they have a product out, it looks like Apple will be shipping A15.

    To me it is far more interesting why Gerard Williams III left Apple despite knowing Apple is making a switch to ARM on Mac.
