A Wider Back End

Moving beyond the micro-op queue, Tremont has eight execution ports, fed by seven reservation stations.

The only two ports that share a reservation station are the address generation units (AGUs). This is in stark contrast to the Core design, which in Sunny Cove uses one unified reservation station for all integer and floating point calculations and three for the AGUs. Tremont uses a unified reservation station for its two AGUs, also backed by extra memory for queued micro-ops, so that it can supply both AGUs with either 2x 16-byte stores, 2x 16-byte loads, or one of each. Intel clearly expects the AGUs on Tremont to be fairly active compared to other execution ports.

On the integer side, aside from the two AGUs, Tremont has 3 ALUs, a jump port, and a store data port. Each ALU supports different functions: one handles shifts, and another handles multiplication and division. Compared to Core, these ALUs are extremely lightweight, and Intel hasn’t gone into specifics here.

On the floating point side, things are a little more varied – the three ports are split between two ALUs and a store port. One ALU focuses on fused additions (FADD), while the other focuses on fused multiplication and division (FMUL). Both ALUs support 128-bit SIMD and 128-bit AES instructions with a 4-cycle latency, as well as single-instruction SHA256 at 4 cycles. There is no 256-bit vector support here. GFNI instruction support is included to help with certain calculations.
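As a rough illustration of the kind of arithmetic GFNI accelerates, the sketch below multiplies two bytes in GF(2^8) using the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which is the field used by AES and by the GF2P8MULB instruction. The function name gf256_mul is our own, and this is a scalar software model for intuition only; it says nothing about how Tremont implements the operation in hardware.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing by `poly` (shift-and-XOR method)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce back into 8 bits
        b >>= 1
    return result


# Worked example taken from the AES specification: {57} x {13} = {fe}
print(hex(gf256_mul(0x57, 0x13)))
```

Software has to loop like this (or use lookup tables) for every byte, whereas a GFNI-capable core can process a whole vector of bytes with one instruction.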

There is also a larger 1024-entry L2 TLB, supporting 1024x 4K entries, 32x 2M entries, or 8x 1G entries. This is an upgrade from the 512-entry L2 TLB in Goldmont.
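To put those entry counts in perspective, the short sketch below works out the TLB reach (the amount of memory the L2 TLB can map without a page walk) for each configuration. It assumes the entry counts quoted above are alternatives rather than additive, and the names L2_TLB_CONFIGS and tlb_reach are ours.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (entries, page size in bytes) for each L2 TLB configuration quoted in the text
L2_TLB_CONFIGS = {
    "4K": (1024, 4 * KIB),
    "2M": (32, 2 * MIB),
    "1G": (8, 1 * GIB),
}


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_size


reach = {name: tlb_reach(*cfg) for name, cfg in L2_TLB_CONFIGS.items()}

for name, size in reach.items():
    print(f"{name} pages: {size // MIB} MiB of reach")
```

With 4K pages the reach is only 4 MiB, compared with 64 MiB for 2M pages and 8 GiB for 1G pages, which is why large pages matter so much for workloads with big memory footprints.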

New Instructions

As with any generation, Intel adds new supported instructions to either accelerate common calculations that would traditionally require lots of instructions or to add new functionality. Tremont is no different.

AnandTech | Tremont | Goldmont Plus | Goldmont | Airmont | Silvermont
Process (nm) | 10+ | 14 | 14 | 14 | 22
Release Year | 2019 | 2017 | 2016 | 2015 | 2013
New Instructions | CLWB, GFNI, ENCLV, CLDEMOTE, MOVDIR*, TPAUSE, UMONITOR, UMWAIT | SGX1, UMIP, PTWRITE, RDPID | RDSEED, SMAP, MPX, XSAVEC, XSAVES, CLFLUSHOPT, SHA | - | SSE4.1, SSE4.2, MOVBE, CRC32, POPCNT, CLMUL, AES, RDRAND, PREFETCHW

(When we asked what other new instructions are supported, Intel told us to look at its published documents about future instructions. When we pointed out that those documents weren’t exactly clear, and that in the past Intel hasn’t spoken about future designs, we were not given any additional comment.)

When we get hold of a Tremont device, we’ll do a full instruction breakdown.
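In the meantime, anyone with hardware in hand can get a first approximation on Linux by checking the CPU feature flags. The sketch below is a minimal example under a few assumptions: the flag names in TREMONT_FLAGS are the ones we believe the Linux kernel reports in /proc/cpuinfo (with waitpkg covering TPAUSE, UMONITOR and UMWAIT), and the helper names parse_cpu_flags and check_flags are our own. Not every instruction in the list, ENCLV for example, has a flag that can be checked this way.

```python
from pathlib import Path

# Assumed Linux /proc/cpuinfo names for some of the Tremont additions.
# movdiri/movdir64b correspond to MOVDIR*; waitpkg to TPAUSE/UMONITOR/UMWAIT.
TREMONT_FLAGS = ["clwb", "gfni", "cldemote", "movdiri", "movdir64b", "waitpkg"]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def check_flags(flags, wanted=TREMONT_FLAGS):
    """Map each wanted flag name to True/False depending on its presence."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():  # only present on Linux
        results = check_flags(parse_cpu_flags(path.read_text()))
        for name, present in results.items():
            print(f"{name:10s} {'yes' if present else 'no'}")
```

This only reports what the operating system exposes; a proper breakdown would query CPUID directly and time each instruction.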


101 Comments


  • Namisecond - Friday, November 1, 2019 - link

    Which will be far more important for devices that run Windows.
  • petr.koc - Friday, October 25, 2019 - link

    "the enterprise side has been dealing with a clock degradation issue that ultimately leaves Atom systems built on C2000 processors unable to boot,"

    This is unfortunately not precise as all Atom Bay Trail processors (desktop, mobile, server) including 14nm successors manufactured up to approximately 2018 are affected with LPC circuitry degradation issue that will kill them in the end:
    https://en.wikipedia.org/wiki/Silvermont#Erratum
    https://en.wikipedia.org/wiki/Goldmont#Erratum
  • 29a - Friday, October 25, 2019 - link

    Ugh, I just looked at your links and I have a NAS box with a J1900. I wonder what can be done to replace it?
  • MASSAMKULABOX - Thursday, October 31, 2019 - link

    Yeah, I'm amazed this didn't byte Intel in the Ass much harder, AFAIK Synology and Cisco were both victims and I'm sure many others. So, start by making well-tested, reliable products.. and no harm in boosting up the GFX side of things (x2 X3?). Give us desktop systems @10w and lower
  • Bigos - Friday, October 25, 2019 - link

    > (We therefore assume that a 3.0 MB L2 will be 15-way.)

    That is very unlikely. 3.0MB (which is 3 * 1024 * 1024) is not divisible by 15. I'm sure the 3MB L2$ will be 12-way associative.

    1.5MB = 12 * 128kB
    3.0MB = 12 * 256kB
    4.5MB = 18 * 256kB
  • AntonErtl - Friday, October 25, 2019 - link

    It's clear that they drop products with low-$/area when they do not have enough capacity, but AFAIK that's not the case at the moment for 10nm; on the contrary, they have 10nm capacity and not much demand for Ice Lake (because they cannot get the clock rates and efficiency competitive with the 14nm Skylake derivatives). So building Tremont-based successors for Gemini Lake (where performance is not as critical) would be a way for them to get more revenue out of their 10nm production line(s?); of course they have to design that first, and they may have failed to do so, expecting Ice Lake production to be in full swing by now.

    Concerning sucking performance, here are some numbers for our LaTeX benchmark http://www.complang.tuwien.ac.at/franz/latex-bench...

    2.368 Intel Atom 330, 1.6GHz, 512K L2 Zotac ION A
    1.052 Celeron J1900 (Silvermont) 2416MHz (Shuttle XS35V4)
    0.712 Celeron J3455 (Goldmont) 2300MHz, ASRock J3455-ITX
    0.540 Celeron J4105 (Goldmont+) 2500MHz
    0.200 Core i7-6700K (Skylake), 4200MHz

    Skylake has about a factor 1.6 better IPC than Goldmont+, and allows higher clock rates (at higher power consumption), resulting in significantly better overall performance, but whether that makes the Goldmont+ suck depends on the application.
  • 29a - Friday, October 25, 2019 - link

    Decoding video, that's what the other two Atoms I've owned sucked at.
  • PeachNCream - Friday, October 25, 2019 - link

    You keep thrashing at that, but other people who have had dissimilar experiences have supported claims that run contrary to your statements. What model Atoms and under what conditions have you had this problem? This isn't an issue for anyone else and, frankly, watching video isn't the only thing a computer does so that complaint may have no impact on the wider range of use cases beyond watching YouTube and Netflix.
  • Jorgp2 - Friday, October 25, 2019 - link

    He probably has an in order atom.

    Pretty much all out of order atoms have hardware decoding acceleration
  • GreenReaper - Saturday, October 26, 2019 - link

    Or, he's trying to decode a video that isn't supported by the hardware. Like 10-bit anything until very recent. In fairness my Bobcat cores struggle with 60FPS anything, and plain Full HD MP4 decode also bogs down if you add anything but the most minimal of shader filters. But they're from ~2011.
