Hot Chips 2021 Live Blog: CPUs (Alder Lake, Zen3, IBM Z, Sapphire Rapids)
by Dr. Ian Cutress on August 23, 2021 10:30 AM EST
AnandTech Live Blog: The newest updates are at the top.
12:54PM EDT - Q: Is the chiplet technology scalable? A: When it comes to the 3D V-Cache, the latency is not large. With chiplets, having CCDs and IODs can give you more flexibility than monolithic. Build the best products with chiplets
12:52PM EDT - Q: Primary motivation for tripling table walkers? A: Some workloads have a large DRAM access footprint with outstanding TLB misses. Lots of workloads won't need more than 2, but it benefits a few pages, and is a clever way to add more without excessive
12:51PM EDT - Q: Is V-Cache applicable to all segments, or just desktop/server? A: Lots of different workloads benefit from V-Cache. Haven't announced specific products with V-Cache, but there are workloads across segments that benefit
12:50PM EDT - Time for Q&A
12:49PM EDT - On track in TSMC N5
12:49PM EDT - Zen4 by end of 2022
12:49PM EDT - Summing up
12:48PM EDT - Performance that matters for the user
12:47PM EDT - Gaming was a main target for Zen 3
12:47PM EDT - All from uarch and physical design
12:47PM EDT - Ryzen performance gains in the same TSMC 7nm
12:44PM EDT - +15% faster on gaming
12:44PM EDT - Already demoed +64 MB L3
12:44PM EDT - Built in support for AMD V-Cache
12:43PM EDT - support 192 misses from L3 to memory
12:43PM EDT - L2 tags in L3
12:43PM EDT - L3 is a non-inclusive cache
12:42PM EDT - 2x32B data channels in opposite directions
12:42PM EDT - reduction in effective L3 memory latency
12:42PM EDT - access from cores, better for gaming
12:41PM EDT - Double L3 cache
12:40PM EDT - New instruction support
12:40PM EDT - No application modification needed
12:40PM EDT - Eliminates page table attack vectors through VMs/hypervisors
12:39PM EDT - SNP is the new feature for Zen 3
12:39PM EDT - SEV, SEV-ES, SEV-SNP
12:39PM EDT - Enterprise security additions
12:38PM EDT - How AMD calculated IPC uplift
12:37PM EDT - Quicker switching with I-cache overflow
12:37PM EDT - Back on track faster when mispredict
12:36PM EDT - Removed bubble cycle with branch prediction
12:36PM EDT - Changes from Zen2
12:35PM EDT - L2 DTLB has 6 page walkers
12:35PM EDT - larger load-store
12:34PM EDT - Doubled INT8 throughput
12:34PM EDT - Reduced FMA latency
12:34PM EDT - Faster 4-cycle FMAC
12:34PM EDT - larger 6-wide FP unit
12:34PM EDT - Without any additional increase in register file ports
12:33PM EDT - Disaggregated the ALUs rather than just add more
12:33PM EDT - More execution bandwidth for ILP extraction
12:33PM EDT - 10 issue per cycle up from 7
12:32PM EDT - lower latencies for some instructions
12:32PM EDT - supporting wider execution
12:31PM EDT - reduced bubble cycle latency
12:31PM EDT - Large chunk of performance gain from the front end fetch/decode
12:30PM EDT - +19% IPC gains, which we verified at launch
12:29PM EDT - 4k op cache
12:28PM EDT - Socket compatibility for past products
12:28PM EDT - Scale-out for servers and supercomputers
12:27PM EDT - Exceeding Industry Trends
12:27PM EDT - Zen3 has AMD 3D Cache support
12:27PM EDT - New era in the market for AMD
12:26PM EDT - The Zen Journey from 2017
12:26PM EDT - Mark Evers from AMD
12:25PM EDT - Now AMD Zen 3 talk
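The "192 misses from L3 to memory" entry above can be related to bandwidth through Little's Law: sustained bandwidth is bounded by outstanding requests times bytes per request, divided by latency. A rough Python sketch follows. The 64-byte line size is the standard x86 cache line; the 100 ns memory latency is purely an assumed round number, not an AMD figure, and the function name is hypothetical.

```python
def max_miss_bandwidth_gbs(outstanding: int, line_bytes: int, latency_ns: float) -> float:
    """Little's Law ceiling on bandwidth, in GB/s.

    bytes in flight / latency in ns gives bytes per ns, which equals GB/s.
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return outstanding * line_bytes / latency_ns


if __name__ == "__main__":
    # 192 outstanding misses (from the talk), 64 B lines,
    # 100 ns latency (ASSUMED, for illustration only).
    print(max_miss_bandwidth_gbs(192, 64, 100.0))
```

The point of the sketch is only that more outstanding misses raise the ceiling on how much memory bandwidth the cache hierarchy can keep in flight at a given latency; real sustained bandwidth depends on the memory system as well.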
12:25PM EDT - Q: TDT for Linux, when? A: First enabling was Windows 11, working with Linux for some time - it is coming, which version and build will be published later
12:24PM EDT - Q: Die photo, PCIe - how many PCIe 5/4/3 lanes? A: As shown on slide 11, 16x PCIe 5, 4x lanes of PCIe 4. Desktop has PCH
12:23PM EDT - Q: Security of side channel attacks with Thread Director A: No security effect, only performance
12:22PM EDT - Q&A time
12:22PM EDT - optimal P/V point is a function of physical properties (thermal, binning)
12:21PM EDT - higher priority gets higher voltage and frequency regardless of P-core and E-core
12:21PM EDT - For power constrained systems
12:21PM EDT - EPP - Energy Performance Preference also takes a role in input to the scheduler
12:19PM EDT - AVX + VNNI / INT8 get highest priority over anything
12:19PM EDT - All AI workloads go to P-Cores over anything else
12:18PM EDT - Helps with asymmetry between the threads
12:17PM EDT - Here's a scheduling example
12:15PM EDT - Table is topology agnostic
12:15PM EDT - OS scheduler is final arbiter
12:14PM EDT - OS has idea of priority of thread
12:14PM EDT - Thread Director Table updated less often than thread classification
12:14PM EDT - Sometimes it makes sense to coalesce a software thread to fewer cores, or one type of core
12:12PM EDT - So every processor gets a section in the table, and it has a value for Perf and Efficiency, and workload is compared
12:12PM EDT - This is more detail about Thread Director
12:11PM EDT - Intel EHFI
12:10PM EDT - Core-to-Core IPC is the main metric
12:10PM EDT - Thread Director will predict the class of workload and bucket it into classes for the OS scheduler on the order of 30 microseconds
12:09PM EDT - Onboard microcontroller
12:08PM EDT - Thread Director is mostly for Windows 11
12:08PM EDT - Smartness is built into the hardware
12:07PM EDT - Only mobile will get native Thunderbolt
12:06PM EDT - 96 EUs on mobile, 32 EUs on desktop
12:06PM EDT - mix and match for future products
12:06PM EDT - modular design
12:05PM EDT - 2+8, 6+8 and 8+8 for P-core + E-core
12:05PM EDT - UP3/UP4 for mobile, Desktop
12:05PM EDT - Scalable SoC architecture
12:04PM EDT - P-core is +50% ST performance over the E-core
12:04PM EDT - E-core has shared L2
12:03PM EDT - P-Core and E-Core
12:03PM EDT - This is what we saw in the Alder Lake part of the Architecture Day
12:02PM EDT - Same arch, different uArch, different optimization point
12:02PM EDT - Moore's Law and Dennard Scaling
12:01PM EDT - Duplicating multicore
12:01PM EDT - Working on smarter structures and new instructions for ML
12:01PM EDT - Increase in support of ML
12:00PM EDT - Most apps are Single or lightly MT
12:00PM EDT - The why and how of Alder Lake
11:59AM EDT - Efi Rotem for Intel on Alder Lake
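The Thread Director entries above describe a table with one section per processor holding a Perf and an Efficiency value, a workload class predicted in hardware, and the OS scheduler as final arbiter. A minimal Python sketch of how a scheduler might consult such a table follows. The table layout, class names, scores, and the `pick_core` helper are all hypothetical; this is not Intel's EHFI format or any real OS API. The only number taken from the talk is the P-core's +50% single-thread performance over the E-core.

```python
from dataclasses import dataclass


@dataclass
class CoreEntry:
    core_id: int
    kind: str   # "P" or "E"; a label only, the choice uses the scores
    perf: dict  # workload class -> relative performance score
    eff: dict   # workload class -> relative efficiency score


# HYPOTHETICAL values. "scalar" perf reflects the +50% ST claim;
# everything else is made up for illustration.
TABLE = [
    CoreEntry(0, "P", {"scalar": 150, "vector_ai": 200}, {"scalar": 60, "vector_ai": 80}),
    CoreEntry(1, "P", {"scalar": 150, "vector_ai": 200}, {"scalar": 60, "vector_ai": 80}),
    CoreEntry(2, "E", {"scalar": 100, "vector_ai": 70}, {"scalar": 100, "vector_ai": 90}),
    CoreEntry(3, "E", {"scalar": 100, "vector_ai": 70}, {"scalar": 100, "vector_ai": 90}),
]


def pick_core(table, workload_class, busy, prefer_efficiency=False):
    """Return the id of the best free core for this class, or None if all are busy.

    The table is only a hint; the caller (the OS scheduler) stays the arbiter.
    """
    candidates = [c for c in table if c.core_id not in busy]
    if not candidates:
        return None

    def score(c):
        return (c.eff if prefer_efficiency else c.perf)[workload_class]

    # max() keeps the first entry on ties, so lower core ids win ties.
    return max(candidates, key=score).core_id


if __name__ == "__main__":
    print(pick_core(TABLE, "vector_ai", busy=set()))
    print(pick_core(TABLE, "scalar", busy=set(), prefer_efficiency=True))
```

Because the lookup uses only per-core scores and never the core type, the sketch stays topology agnostic in the same sense as the table described in the talk.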
11:59AM EDT - 'State of the art CPUs'
11:58AM EDT - First session is CPUs, about to start
11:57AM EDT - Posters as part of the conference as well
11:57AM EDT - For those attending
11:56AM EDT - 'Chips enabling Chips'
11:55AM EDT - DoE on AI Chips and challenges
11:55AM EDT - Skydio on autonomous flight
11:55AM EDT - Synopsys is on AI in EDA
11:54AM EDT - Three keynotes
11:52AM EDT - Tutorials were yesterday
11:51AM EDT - These people identify keynote speakers, solicit papers for talks
11:51AM EDT - Selecting the best talks
11:51AM EDT - Lots of members on the committees
11:49AM EDT - Behind the scenes
11:49AM EDT - There's a Slack channel for all attendees
11:48AM EDT - Apparently some attendees are having issues with too many from the same company on the same VPN
11:47AM EDT - Here we go
11:45AM EDT - It usually starts with 15 minutes of pre-show info to begin
11:45AM EDT - The stream should be starting momentarily
11:40AM EDT - Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned during Monday and Tuesday for our regular AnandTech Live Blogs. Today we start at 8:45am PT, so set your watches and notifications to return here! The first set of talks is all about CPUs: Intel Alder Lake, AMD Zen 3, IBM Z, and Intel Sapphire Rapids.
SarahKerrigan - Monday, August 23, 2021 - The early disclosures for IBM's new z processor, Telum, indicate it may have no on-die L3 (but has absolutely immense L2s.) I'm excited to see how that plays out!
Ian Cutress - Monday, August 23, 2021 - Can confirm. L3 is virtual on a single chip, and L4 is virtual across chips. It's the future of multi-level caches.
SarahKerrigan - Monday, August 23, 2021 - Thanks! Any other deets you can provide on Telum before the presentation? IIRC z15 put higher-level BTBs in eDRAM - are those just SRAM structures now? What's the L1 config look like? Early disclosures implied the SC is gone - are memory controllers integrated into the Telum CP now?
Ian Cutress - Monday, August 23, 2021 - It's all one chip :) Presentation is soon, too much to write in a box right now. But I think all your Qs will be answered.
Shaunathan - Monday, August 23, 2021 - aww man, i got burrito juice all on my hand
The Hardcard - Monday, August 23, 2021 - Jason Lowe-Power? What a name for someone in the semiconductor industry! He doesn’t need to do anything extra to get his resume read.
JayNor - Monday, August 23, 2021 - What PCIe 5 chips does Intel have to hook to Alder Lake?