Hot Chips 2020 Live Blog: IBM's POWER10 Processor on Samsung 7nm (10:00am PT)
by Dr. Ian Cutress on August 17, 2020 1:00 PM EST
AnandTech Live Blog: The newest updates are at the top.
01:37PM EDT - Q: Did power delivery get upgraded, or is it still on-die LDOs? A: We'll go into detail at ISSCC. Still a similar delivery platform to Power9
01:36PM EDT - Q: Read latency increase with OMI DIMM? A: less than +10ns
01:35PM EDT - Q: PCIe Gen6? Will future Power10 enable this? A: No talk about our future products. We're glad that PCIe is speeding up, we always look at market conditions to create chips.
01:34PM EDT - Q&A time
01:33PM EDT - To allow for customers and developers to adjust
01:33PM EDT - (IBM usually does this - announce a core/product 12 months in advance)
01:33PM EDT - Time scale for Power10 is that initial systems for IBM partners will be available Q4 2021
01:32PM EDT - Improvements over POWER9
01:32PM EDT - 3x inference latency reduction
01:32PM EDT - Implements data-reuse efficiency
01:32PM EDT - Simple library update needed in most cases
01:31PM EDT - New MMA enhanced inference acceleration
01:30PM EDT - supports FP64, FP32, FP16, BF16, INT16, INT8, INT4
01:30PM EDT - 4 512b engines per SMT8 core
01:30PM EDT - supports fixed, float, permute
01:30PM EDT - 8 SIMD 128-bit engines per SMT8 core
01:29PM EDT - OMI to one core - 256 GB/sec peak, 120 GB/s sustained, 3x L3 prefetch and mem prefetch extensions
01:29PM EDT - 4x 32B loads, 2x 32B stores per SMT8 core (Fusion required)
01:29PM EDT - 2x bytes from all sources: L1, L2, L3, OMI
01:29PM EDT - Also improved memory bandwidth
01:28PM EDT - 3x perf/watt at socket level
01:28PM EDT - = 2.6x perf/watt overall at the core level
01:28PM EDT - 1.3x perf at 0.5x power vs Power9
01:28PM EDT - Redesigned major structures such as queues
01:27PM EDT - each design element was redesigned for performance and efficiency
01:27PM EDT - Improved clock gating
01:27PM EDT - Fuse consecutive load/store instructions, double wide load/store bw
01:27PM EDT - Eliminates dependencies
01:27PM EDT - New instruction fusion opportunities
01:26PM EDT - Branch execution has been improved
01:26PM EDT - New tag predictors
01:26PM EDT - L3 is 27.5 cycle
01:26PM EDT - L2 is 13.5 cycle
01:26PM EDT - 1000 instructions in flight per SMT8 core
01:26PM EDT - 1.5x L1-cache, 4x L2, 4x TLB
01:26PM EDT - 4x in mixed math acceleration
01:25PM EDT - Each SMT4 segment can do 2x512b and 4x128b per cycle
01:25PM EDT - Here's a core diagram - this is half an SMT8 core
01:25PM EDT - Active management for enhanced performance and side-channel avoidance
01:24PM EDT - Full memory encryption
01:24PM EDT - Secure containers supported at hardware and virtualization layers
01:24PM EDT - Crypto perf for future algorithms already accelerated
01:24PM EDT - Security and isolation
01:24PM EDT - Optimizations for memory tiers
01:23PM EDT - New op-code space for instructions
01:23PM EDT - 64-bit prefix instructions in a RISC-friendly way
01:23PM EDT - Power ISA 3.1
01:23PM EDT - High performance nested hypervisors with enhanced security
01:22PM EDT - Container based stack support over PowerVM hypervisor
01:22PM EDT - Core is modular
01:22PM EDT - In SMT8 mode, 15 cores per chip. In SMT4 mode, 30 cores per chip
01:21PM EDT - DCM is more efficient
01:21PM EDT - 2.6x perf/watt improvement
01:21PM EDT - +30% average perf against POWER9, +20% in ST
01:21PM EDT - Up to 8 threads per core
01:20PM EDT - *602mm2, correction from earlier
01:20PM EDT - 2.2-4.4x socket performance compared to Power9
01:19PM EDT - Also 64 lanes of PCIe G5
01:19PM EDT - Memory disaggregation becomes a reality.
01:19PM EDT - Pod-level memory resource pooling with extra gear
01:19PM EDT - Allows 1000s of nodes to access memory across the whole system
01:19PM EDT - Robust virtual channel management
01:19PM EDT - Paging tables as routing tables
01:18PM EDT - Or servers without memory borrowing from a big server
01:18PM EDT - Connect multiple 16-socket systems with Memory Inception
01:18PM EDT - Supports up to 2 PB of memory
01:17PM EDT - Only +150ns compared to accessing far memory within the same server
01:17PM EDT - Full hardware load/store access to other server memory
01:17PM EDT - Memory Inception comes to Power10 - access memory from any socket in the cluster
01:16PM EDT - PowerAXON supports direct attach SCM or ASIC/FPGA
01:16PM EDT - Also supports storage class memory up to 2 TB
01:15PM EDT - Also supports GDDR for up to 800 GB/sec
01:15PM EDT - Will support DDR5 when DDR5 is ready - no new system, just need new OMI buffer chip
01:15PM EDT - Supports DDR4 at 410 GB/sec bandwidth per Power10 CPU
01:15PM EDT - Tech agnostic - supports any media with OMI buffer
01:14PM EDT - Grandchild of Centaur memory
01:14PM EDT - OMI is OpenCAPI Memory Interface
01:14PM EDT - Several new scaling capabilities
01:14PM EDT - PowerAXON is for chip-to-chip connectivity
01:13PM EDT - optimized placement for packaging
01:13PM EDT - 150 micron bumps
01:13PM EDT - PowerAXON and OMI support 1TB/sec each
01:13PM EDT - 16-socket for big iron systems
01:12PM EDT - Dual chip module is two 602 mm2 chips into one package
01:12PM EDT - SCM allows for 16-socket, DCM is 4-socket
01:12PM EDT - Two packaging options: Single and Dual chip modules
01:12PM EDT - High bandwidth PHYs, OMI and PowerAXON and PCIe G5
01:12PM EDT - 16 physical cores, but 15 will be enabled. Improves economics of yield
01:11PM EDT - Two versions of the core: SMT4 and SMT8. This chip is the SMT8 version
01:11PM EDT - 18B transistors on Samsung 7nm, 602 mm2
01:11PM EDT - Integrating into enterprise workflows
01:11PM EDT - AI acceleration in the processor core
01:10PM EDT - maturing AI landscape
01:10PM EDT - New abilities, ground-up rearchitecting for power efficiency
01:10PM EDT - On track to deliver systems in 12 months
01:10PM EDT - First hardware back in the labs
01:10PM EDT - Power10 is made smarter for everyone
01:10PM EDT - Financial systems, commercial, healthcare, governments
01:09PM EDT - It's the building block for the world's most powerful supercomputers
01:09PM EDT - Power roadmap - power is about the enterprise
01:09PM EDT - Brian is chief core architect
01:09PM EDT - Bill is chief architect of POWER10
01:08PM EDT - Time for Power10! Bill Starke and Brian Thompto
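The efficiency and thread-count figures quoted during the talk follow from simple arithmetic: 1.3x performance at 0.5x power gives 2.6x perf/watt, and 15 SMT8 cores or 30 SMT4 cores both give 120 hardware threads per chip. The Python sketch below only reproduces that arithmetic; the function names are illustrative and are not taken from IBM's slides.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/watt given relative performance and relative power."""
    return perf_ratio / power_ratio


def threads_per_chip(cores: int, threads_per_core: int) -> int:
    """Hardware threads exposed by one chip."""
    return cores * threads_per_core


if __name__ == "__main__":
    # Core-level claim from the talk: 1.3x perf at 0.5x power vs Power9
    print(round(perf_per_watt_gain(1.3, 0.5), 2))
    # The same silicon configured two ways: 15 x SMT8 or 30 x SMT4
    print(threads_per_chip(15, 8), threads_per_chip(30, 4))
```

Both core configurations expose the same number of threads, which matches the description of the core as modular.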
plopke - Monday, August 17, 2020 - Aah hotchips one of the most fun times of a year to read Anandtech all tough i only vaguely understand half of the stuff being talked about xD
Jon Tseng - Monday, August 17, 2020 - nb I thought he said 50-100ns worse latency for the funky off-server DMA (but I could be wrong!)
zamroni - Monday, August 17, 2020 - Processor for FUD-ed DB2 database customers. I am sure the core banking applications can work with X86-64 version of DB2.
Cheesecake16 - Monday, August 17, 2020 - 2 Petabytes of addressable RAM