12:34PM EDT - We're here at Hot Chips 31 / 2019, and the first talk to be live blogged is IBM's newest variant of its POWER CPUs.

12:37PM EDT - Quite possibly the biggest Hot Chips crowd I can remember.

12:45PM EDT - The Arm talk is set to finish here in a bit, then IBM will start

12:45PM EDT - We already covered Arm's Neoverse N1 strategy earlier in the year: https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform

12:55PM EDT - Just finishing up the previous talk

12:57PM EDT - Hopefully this is about POWER10 :)

12:57PM EDT - It could be the Power9 IO chip

12:58PM EDT - 2018 talk was about Power9 SU core

12:58PM EDT - IBM now has a family of processors. Start with one up front, then work on the rest of the family

12:58PM EDT - Scale out first, then scale up

12:58PM EDT - One optimized for dual socket, one optimized for 16 sockets

12:59PM EDT - Power9 AIO does things they wanted to do before Power10

12:59PM EDT - new accelerator technology deployed on Power9

12:59PM EDT - This is available in Power9 today

12:59PM EDT - Power10 for 2021

12:59PM EDT - New core on Power10 and new transistor technology in 2021

01:00PM EDT - Accessing heterogeneous systems

01:00PM EDT - Need to focus on diverse acceleration devices and diverse memory devices beyond CPUs

01:01PM EDT - Need to focus on heterogeneous systems, not just GHz

01:01PM EDT - Need to deploy different types of heterogeneous systems

01:01PM EDT - Trying to remove the different types of SerDes on a chip. Want to consolidate these down to a single design

01:02PM EDT - On Power9, now only two types of SerDes: PCIe, and everything else built on the 25G SerDes

01:02PM EDT - Fixing the SerDes design at 25G makes it area and power efficient, then you just scale the number of links

01:02PM EDT - Take all the 25G signals from the chip and deploy composable systems across different accelerator technologies
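To put rough numbers on the "fix the SerDes at 25G, then scale the links" idea, here is my own back-of-envelope sketch (not from the slides; the x8 and x48 lane counts are the P9 AIO configurations quoted further down, and encoding/protocol overhead is ignored):

```python
# Rough per-link bandwidth when everything is built from the same 25G lane.
# Assumption: raw 25 Gb/s signaling per lane, no overhead accounted for.
def link_gbytes_per_s(lanes, gbps_per_lane=25):
    return lanes * gbps_per_lane / 8  # GB/s per direction

print(link_gbytes_per_s(8))    # 25.0 GB/s  - one x8 OMI channel
print(link_gbytes_per_s(48))   # 150.0 GB/s - the x48 NVLink attach quoted later
```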

01:03PM EDT - NVLINK and OpenCAPI and OMI

01:03PM EDT - OMI is the memory interface to connect memory across SerDes

01:04PM EDT - On-chip Gzip accelerator

01:04PM EDT - IBM has delivered the #1 and #2 supercomputers on the TOP500 list

01:04PM EDT - Built for the AI era

01:05PM EDT - Now OpenCAPI, IBM sees it as being very important in future accelerator systems

01:05PM EDT - Minimizing overhead and latency that PCIe has

01:05PM EDT - Accelerators not only GPU, but SmartNICs, networking, FPGAs, AI accel

01:06PM EDT - Want software to take data from anywhere in the system on any device

01:06PM EDT - (some of the images here look low quality - click through to see full quality)

01:06PM EDT - Power9 has direct attached memory

01:07PM EDT - Some of the former secret sauce technologies are in the new open memory standard

01:07PM EDT - Can deal with asymmetry

01:08PM EDT - Having this connectivity allows for independent development of accelerators rather than focusing on the CPU

01:09PM EDT - Don't want programmers to worry about host-to-device connectivity

01:09PM EDT - Also OpenCAPI helps with security

01:09PM EDT - Prevents an accelerator crashing a whole system

01:10PM EDT - Need to make sure accelerators can't add in potential cache coherent bugs

01:11PM EDT - Aligned all packets with the deserialised interface

01:11PM EDT - Accelerators always see aligned data to help make assumptions for performance

01:11PM EDT - Can start processing the command before checking the CRC

01:12PM EDT - Separately pipelined control/tag vs data

01:13PM EDT - (coherence over switching is not supported in OpenCAPI due to complexity)
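To illustrate the "start processing the command before checking the CRC" point a couple of entries up, here is a purely conceptual sketch (my own illustration, not IBM's implementation): the receiver begins work on a command as soon as it arrives, and only commits the result once the packet's CRC has been verified.

```python
import zlib

def process_command(header: bytes, payload: bytes):
    # Stand-in for real command handling.
    return len(payload)

def handle_packet(header: bytes, payload: bytes, crc: int):
    result = process_command(header, payload)   # start work immediately (speculative)
    if zlib.crc32(header + payload) != crc:     # verify the CRC afterwards
        return None                             # bad CRC: discard the speculative work
    return result                               # good CRC: commit the result

hdr, data = b"read", b"\x00" * 128
print(handle_packet(hdr, data, zlib.crc32(hdr + data)))  # 128
```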

01:14PM EDT - 1/6th the cost in die area to put OMI instead of DDR

01:14PM EDT - So memory is easier to support

01:14PM EDT - Can enable more bandwidth in smaller ASICs with OMI
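Taking the 1/6th area figure at face value, the "more bandwidth in smaller ASICs" point is just this arithmetic; the absolute areas below are placeholders I made up, only the ratio comes from the talk:

```python
# Placeholder numbers; only the ~1/6th area ratio is from the talk.
budget_mm2 = 20.0                    # hypothetical die-area budget for memory I/O
ddr_if_mm2 = 6.0                     # hypothetical area of one DDR interface
omi_if_mm2 = ddr_if_mm2 / 6          # ~1/6th of that, per IBM's claim

print(int(budget_mm2 // ddr_if_mm2))  # 3 DDR interfaces fit in the budget
print(int(budget_mm2 // omi_if_mm2))  # 20 OMI interfaces fit in the same budget
```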

01:15PM EDT - Differential buffer attach is now agnostic - the buffer is on the memory

01:15PM EDT - Can attach buffered DDR or GDDR behind the same interface, rather than being locked to one or the other

01:16PM EDT - OMI is lighter weight and open to enable more ecosystem support

01:17PM EDT - With OMI memory, based on OpenCAPI SerDes, can mix DDR4 and DDR5 in the same system with the same connector

01:18PM EDT - e.g. if enabled on AMD sIOD, would decouple memory technology from host silicon development

01:19PM EDT - Power9 Advanced IO chip = P9 AIO

01:19PM EDT - 728mm2, 8B transistors

01:19PM EDT - 24 SMT4 cores, 120 MB eDRAM L3

01:19PM EDT - Built on 14FF (GF?)

01:19PM EDT - 17 layer metal stack

01:19PM EDT - 16 channels of x8 OMI, 650 GB/s peak r/w bandwidth

01:20PM EDT - 48 lanes of PCIe 4.0

01:20PM EDT - Up to x16 CAPI 2.0

01:20PM EDT - Up to x48 NVLINK attach

01:20PM EDT - Shows a 2S replacement, but can scale to 16 sockets
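Quick sanity check on the 650 GB/s figure, my own arithmetic rather than anything from the slides:

```python
# Assumptions (mine): raw 25 Gb/s per lane as per the "25G" SerDes, and no
# accounting for framing/CRC or command overhead.
channels = 16
lanes_per_channel = 8
gbps_per_lane = 25

per_direction = channels * lanes_per_channel * gbps_per_lane / 8   # GB/s
print(per_direction)        # 400.0 GB/s read or write
print(2 * per_direction)    # 800.0 GB/s combined raw
# The quoted 650 GB/s peak r/w sits below the 800 GB/s raw combined figure,
# which is consistent once link and protocol overheads are taken off.
```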

01:21PM EDT - OpenCAPI 4.0

01:21PM EDT - support for 64/128/256B cache lines

01:21PM EDT - supports 128B messages for low latency

01:22PM EDT - Supports virtual address cache for system memory

01:22PM EDT - Host manages the higher level cache coherency

01:23PM EDT - P9 SO supports 4x direct-attach DDR4, P9 SU supports 4x Centaur, P9 AIO supports 8x OMI

01:23PM EDT - On each side

01:24PM EDT - OMI DDIMM looks very different

01:24PM EDT - Will see if I can get a better photo

01:25PM EDT - Microchip SMC1000 chip used on the OMI DDIMM

01:25PM EDT - effective bandwidth and latency equivalent to LRDIMM

01:26PM EDT - Q: energy per bit on memory vs DDR?

01:27PM EDT - A: Don't have numbers here. We shifted power from the DDR PHY onto the memory DIMM, which helps with cooling conditions. The 8-lane memory device can move to 2 lanes or 4 lanes depending on use. It does dynamically shift based on utilization. Better than DDR anyway

01:28PM EDT - Q: Does the OMI DDIMM have a cache? A: No, it's a slimmer device with write buffering, no caching

01:29PM EDT - Q: Is OMI like CXL? A: We view CXL as focused more on accelerators. OMI is available today, ahead of the competition, and has been in development a long time. I'd be surprised if other buffered memory solutions get as low latency as ours. I'd be surprised if CXL gets such low latency to memory

01:30PM EDT - That's it for this talk. Small break now, next talk for live blogging is MLPerf


  • nyoungman - Monday, August 19, 2019 - link

    The OpenPOWER Summit is happening at the same time, with a livestream.
    https://www.youtube.com/watch?v=bpAv91NszoQ

    The roadmap has a PCIe Gen4 and a new memory subsystem for POWER9 in 2020, and POWER10 with PCIe Gen5 coming in 2021.
  • mode_13h - Monday, August 19, 2019 - link

    They already had PCIe 4.0 for a couple years, now. POWER was the first kid on the block to have it.
  • Threska - Monday, August 19, 2019 - link

    Someone might want to run the images through a program for a little tweaking to improve clarity.
  • Ian Cutress - Monday, August 19, 2019 - link

    Click through, you get the full quality. (I mentioned during the talk)
  • aryonoco - Monday, August 19, 2019 - link

    728mm2... wow. Talk about a behemoth.

    Would love to know the price of a 16 socket server... yeah I know it's not for us mere mortals.

    A 2S EPYC 7742 gets you 128 cores and 256 threads for $14,000. This thing would get you 3 times the number of cores and 6 times the number of threads for 50x the price probably? And that might be lowballing it.

    On the other hand, lots of people in the HPC crowd, the DoD, the DoE, and various other agencies who buy these don't care about hardware cost anyway.
  • mode_13h - Monday, August 19, 2019 - link

    14 nm? *yawn*

    I guess that's due to some GloFo contract?

    These guys are treading down the same path towards technical irrelevance as SPARC. "Remember POWER?" That's going to be the new refrain, to replace "Remember Alpha?"
  • aryonoco - Monday, August 19, 2019 - link

    10nm is not suitable for a high powered chip, and 7nm is nowhere near mature enough to fab a 728mm2 chip.
