In November 2019, the company NUVIA broke out of stealth mode. Founded by former senior Apple and Google processor architects, John Bruno, Manu Gulati and Gerard Williams III, the company came crashing out of the gate with quite considerable goals to revamp the server market with an SoC that would provide ‘A step-function increase in compute performance and power efficiency’. Today NUVIA is putting more data behind those goals.

The press release we received from NUVIA takes some time to cover some of the basics of the modern day server market, and it initially read almost like an AnandTech article, which is eerily scary. Suffice to say, NUVIA understands the current state of play of the server market, including where Intel and AMD stand with respect to each other, and how x86 offerings are squaring up against the other options on the market. As with most elements of the server market, different verticals often have different requirements on compute, memory, IO, power, or physical constraints, as well as on the initial cost of hardware alongside total cost of ownership. To that end, NUVIA’s processor designs are aimed at, according to the company, ‘an SoC that will deliver industry-leading performance with the highest levels of efficiency, at the same time’.

With that, NUVIA is announcing that its first generation CPU core will be called Phoenix and be built upon the Arm architecture (likely Armv9) with an architecture license. Phoenix will be part of the Orion SoC, with NUVIA stating that they are implementing ‘a complete overhaul of the CPU pipeline’. Gerard Williams’ designs from Apple are known to be considerably different to what we’ve seen elsewhere in the market, so we suspect that this is going to be a big part of the secret sauce behind Orion and its Phoenix cores.

NUVIA goes on to say that Phoenix is ‘a clean sheet design’, focusing on single core performance leadership and maximizing memory bandwidth and utilization. The Orion SoC will be built to focus on high utilization and sustained frequencies, without having to rely on high-turbo marketing numbers, to allow customers to make the best use of the hardware within allocated power and cooling budgets. Alongside this, NUVIA is stating that there will be hardware infrastructure built to specification ‘to support peak performance on real cloud workloads’.

NUVIA’s Numbers

The big part of the press release is NUVIA’s performance-per-watt claims. To make them, NUVIA is using Geekbench 5 as a performance indicator, along with direct power measurements, of current in-market x86 and Arm offerings. NUVIA is taking smartphone and mobile based cores, such as Intel Ice Lake, Qualcomm SD865, AMD Ryzen 4700U, as well as Apple’s A12Z Vortex and A13 Lightning, as starting points. The reason for this is that NUVIA believes there is increasingly no meaningful difference between smartphone/mobile cores and server cores when extrapolated; the distinction only becomes relevant once you start adding in massive vector engines for specific customers.

According to NUVIA’s numbers, this is where the current market stands with respect to Geekbench 5. At every point, Arm’s results are more power efficient or higher performing than anything available on x86, even though at the high end Apple and Intel are almost equal on performance (with Intel using 4x the power).

NUVIA notes that the power of the x86 cores can vary from 3W to 20W per core depending on the workload. In the sub 5W bracket, however, nothing from x86 can come close to the power efficiency of high-performance Arm designs. This is where Phoenix comes in.

NUVIA’s claim is that the Phoenix core is set to offer from +50% to +100% peak performance of the other cores, either for the same power as other Arm cores or for a third of the power of x86 cores. NUVIA’s wording for this graph includes the phrase ‘we have left the upper part of the curve out to fully disclose at a later date’, indicating that they likely intend for Phoenix cores to go beyond 5W per core.
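
As a rough sanity check, the efficiency implied by those claims can be worked out with simple arithmetic. The sketch below is illustrative only: the function name, the scenario labels, and the baseline normalized to 1.0 are assumptions, and it applies the full +50% to +100% range to both the x86 and Arm comparisons, which the press release wording does not spell out.

```python
def perf_per_watt_ratio(perf_gain: float, power_fraction: float) -> float:
    """Efficiency relative to a baseline core normalized to 1.0 performance at 1.0 power.

    perf_gain      -- claimed performance uplift, e.g. 0.5 for +50%
    power_fraction -- power drawn relative to the baseline, e.g. 1/3
    """
    return (1.0 + perf_gain) / power_fraction


# Hypothetical scenario labels; the inputs are the claims as reported above.
scenarios = {
    "vs x86, +50% perf at 1/3 power": (0.5, 1 / 3),
    "vs x86, +100% perf at 1/3 power": (1.0, 1 / 3),
    "vs Arm, +50% perf at same power": (0.5, 1.0),
    "vs Arm, +100% perf at same power": (1.0, 1.0),
}

for label, (gain, power) in scenarios.items():
    print(f"{label}: {perf_per_watt_ratio(gain, power):.1f}x perf/W")
```

Taken at face value, that would put Phoenix at roughly 4.5x to 6x the performance per watt of the x86 cores, and 1.5x to 2x that of the other Arm cores.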

At this point, NUVIA is running simulations of its core designs in-house to get these numbers. This is a standard thing for any company developing a new SoC or a new core before actually going to the fab to get it made. It also helps investors analyze where things stand.

What gives credibility to the new company’s lofty goals is the founders’ track record with their past designs. Apple’s silicon success over the last half decade has been one of the most impressive developments in the industry, and it seems NUVIA has been able to recruit top talent with the aim to reproduce such success in the datacentre market.

Some users might consider that SPEC should have been used, given its relevance to NUVIA’s initial target markets on server, and I perhaps agree. I suspect that NUVIA believed that GB5 might be more accessible to a wider audience for core-to-core comparisons.

The Future

NUVIA states with this press release that it will aim to have some of the highest performance and best efficiency CPU/SoC products in the market. The company reiterates that even if other vendors suddenly see a 20% year-over-year gain in raw performance, NUVIA still expects to be ahead of its main competitors. We shall have to wait and see what magic NUVIA has that others do not.
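
To see why a 20% competitor gain would not close the gap, here is a minimal sketch of the arithmetic. It assumes the +50% to +100% peak performance claim and the 20% gain are measured against the same baseline, which the press release does not confirm; the function name and the `years` parameter are hypothetical.

```python
def remaining_lead(claimed_gain: float, competitor_yoy_gain: float, years: int = 1) -> float:
    """Ratio of the claimed performance to a competitor that compounds its
    year-over-year gain for `years` years, both relative to today's baseline."""
    return (1.0 + claimed_gain) / (1.0 + competitor_yoy_gain) ** years


# Low and high ends of the claimed range against one year of 20% improvement.
for gain in (0.5, 1.0):
    lead = remaining_lead(gain, 0.20)
    print(f"+{gain:.0%} claim vs +20% competitor: {lead:.2f}x")
```

Under those assumptions, a single 20% improvement would still leave a lead of about 1.25x at the low end of the claim and about 1.67x at the high end.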

Update: Initially this article said that NUVIA will have new products in the next 18 months. This was a simple misreading of NUVIA's press release and the relevant sentence has been removed.

49 Comments

  • vinayshivakumar - Wednesday, August 12, 2020

    A single Zen2 core consuming 12W peak and 2W min sounds very high? Am I missing something?
  • Spunjji - Wednesday, August 12, 2020

    That 12W figure is likely at peak turbo. 12W for the single core, 3W for the rest of the SoC.
  • stevekgoodwin - Wednesday, August 12, 2020

    "NUVIA Phoenix Targets +40-50% ST Performance Over Zen 2 for Only 33% the Power"

    How is this headline related to the article? Where is single threaded performance mentioned?
  • Veedrac - Thursday, August 13, 2020

    The argument for the Mill made a lot more sense before Apple started making their own chips. It's absolutely true that if you do things the Intel way, pushing frequency to the very limit, out-of-order CPUs are great, big, and power hungry. But as Apple has proven, if you focus on power from the very start, a modern out-of-order processor can just be great and big.

    Consider this: Apple's ‘small’ Thunder cores are actually out-of-order CPUs, but “against a Cortex-A55 implementation such as on the Snapdragon 855, the new Thunder cores represent a 2.5-3x performance lead while at the same time using less than half the energy” per computation.

    Again, that's a much faster, much larger out-of-order core, using less energy than an in-order processor.

    And there's no reason to think you can't go bigger.
  • Veedrac - Thursday, August 13, 2020

    This was a reply to https://www.anandtech.com/comments/15967/nuvia-pho...
  • ZachSaw - Thursday, August 13, 2020

    There's a couple of reasons why ARM cores are all on the left of the graph. They aren't designed to scale up in frequency as much as the x86 cores (each lithography process has its own unique curve but generally raising frequency beyond the sweet spot requires exponentially higher voltage). The other more important one is ARM cores are missing a critical hardware feature that software engineers rely on and take for granted - Strong Memory Model. Without this, you'd have to issue memory barrier instructions whenever you need your objects to sync up with other threads. ARM does not yet have the granularity of Itanium when it comes to memory barrier instructions. There is no concept of acquire / release semantics. Geekbench's multithreaded benchmarks run benchmarks in parallel and call it a multicore bench. In other words, they run embarrassingly parallel. That artificially puts ARM in a better light.

    In real life workloads running on the CPU, you'd be dealing with problems that aren't embarrassingly parallel (databases with upserts happening at the same time as reads, game state managements etc). GPU handles the embarrassingly parallel problems much more efficiently than ARM cores.
  • vvid - Friday, August 14, 2020

    So wrong on many levels.
    1) x86 is the only popular architecture with "strong model". This is not a critical feature.
    2) A12Z has x86-like TSO mode.
    3) Synchronization is better to do through OS primitives.
    4) ARM has Load-Acquire (LDAR) / Store-Release (STLR)
    5) Results shown on the graph are SINGLE threaded.
  • Wilco1 - Friday, August 14, 2020

    In addition to vvid's comments: the graph not only shows Arm outperforming x86 on single threaded perf, but more importantly while using only one quarter of the power! This means Arm keeps its much better power efficiency even when scaling beyond x86.

    There are many reasons for this, but a modern ISA without 42 years of baggage, not chasing 5GHz like a fool, avoiding SMT and the complex x86 memory model certainly help...
  • scineram - Monday, August 17, 2020

    So they optimized the microarch to Geekbench binary disassemblies?
