AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Cache Improvements
The shared L1 instruction cache grew with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact on L1 d-cache sizes.
Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
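To make the mechanism concrete, here is a minimal Python sketch of a queue keyed by fetch address. It is a toy model, not AMD's design: the capacity of four entries, the oldest-first replacement, and the names MicroOpQueue and toy_decode are all assumptions made for illustration, since AMD hasn't disclosed the real size or policy.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue keyed by fetch address.

    Capacity and oldest-first replacement are assumptions for illustration;
    AMD has not disclosed the real size or replacement behavior.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> list of decoded micro-ops
        self.hits = 0      # fetches served from the queue (decoders idle)
        self.decodes = 0   # fetches that had to go through the decoders

    def fetch(self, address, decode):
        if address in self.entries:
            # Hit: the decode hardware is not needed for this fetch.
            self.hits += 1
            return self.entries[address]
        # Miss: decode, then remember the address and its micro-ops.
        self.decodes += 1
        uops = decode(address)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        self.entries[address] = uops
        return uops


def toy_decode(address):
    """Hypothetical decoder: pretends each instruction becomes two micro-ops."""
    return [f"uop@{address:#x}.{i}" for i in range(2)]


queue = MicroOpQueue(capacity=4)
loop_body = [0x100, 0x104, 0x108]  # a small loop that fits in the queue
for _ in range(10):
    for addr in loop_body:
        queue.fetch(addr, toy_decode)

print(queue.decodes, queue.hits)  # 3 decodes, 27 hits
```

In this toy run a three-instruction loop is decoded once and then served from the queue for the remaining nine iterations, which is the case where idling the decoders pays off.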
The L1 to L2 interface has also been improved. Some queues have grown and logic is improved.
Finally on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache (powering down the unused slices) in one-quarter increments. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
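A rough sketch of how quarter-step resizing could trade capacity for leakage is shown below. Everything in it is an assumption for illustration: the real hardware decides based on workload and hit rate, whereas this sketch sizes the cache from a known working-set footprint, treats leakage as proportional to the number of powered slices, and uses a 2048KB L2 purely as a placeholder rather than a Steamroller figure.

```python
import math

QUARTERS = 4  # resizing happens in one-quarter steps


def quarters_needed(footprint_kb, l2_size_kb):
    """Smallest number of quarter slices whose capacity covers the footprint.

    Hypothetical policy: at least one slice always stays powered.
    """
    slice_kb = l2_size_kb / QUARTERS
    needed = math.ceil(footprint_kb / slice_kb)
    return max(1, min(QUARTERS, needed))


def relative_l2_leakage(active_quarters):
    """Leakage relative to a fully powered L2, assuming it scales linearly
    with the number of powered slices (a simplification)."""
    return active_quarters / QUARTERS


L2_SIZE_KB = 2048  # placeholder size, not a disclosed Steamroller figure

for footprint_kb in (300, 1200, 5000):
    active = quarters_needed(footprint_kb, L2_SIZE_KB)
    print(footprint_kb, active, relative_l2_leakage(active))
```

Under these assumptions a light task with a 300KB footprint would keep a single slice powered and leak a quarter as much as the full L2, while anything larger than the cache keeps all four slices on.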
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
Looking Forward: High Density Libraries
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.
The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15 to 30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.
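As a back-of-the-envelope illustration of what that range means, the short Python sketch below converts a per-operation energy reduction into the extra work done on a fixed energy budget. The function name is hypothetical, and the calculation assumes the reduction applies uniformly to every operation, which real workloads won't match exactly.

```python
def ops_per_joule_gain(energy_reduction):
    """Work done per unit of energy, relative to the baseline design.

    energy_reduction is the fractional drop in energy per operation
    (0.15 for 15%). Assumes the saving applies equally to every operation.
    """
    if not 0.0 <= energy_reduction < 1.0:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1.0 / (1.0 - energy_reduction)


for reduction in (0.15, 0.30):
    print(f"{reduction:.0%} less energy per op -> "
          f"{ops_per_joule_gain(reduction):.2f}x the work per joule")
```

Put another way, the low end of AMD's range works out to roughly 1.18x the work on the same battery charge, and the high end to roughly 1.43x.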
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
Final Words
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
126 Comments
CeriseCogburn - Wednesday, August 29, 2012 - link
Another amd liar, and here's the proof: http://www.tomshardware.com/reviews/fx-4100-core-i...
More BS from the amd bs artists of the web.
GaMEChld - Thursday, August 30, 2012 - link
Wait, I'm confused, who was lying about what? I'm not sure what that toms hardware link was supposed to prove, since both of those guys were talking about BF3 on Ultra settings, and Ultra was not tested on that page you linked. Better dial back the blind AMD hatred, since you were attacking people who were arguing about FPS and price, not Intel and AMD.
Spunjji - Thursday, August 30, 2012 - link
Cerise is a special kind of chimp.
Galidou - Thursday, August 30, 2012 - link
Last time I said something like that to Cerise, he told me I was in a crysis and I had to take midol, careful about what you say around him.
CeriseCogburn - Friday, October 12, 2012 - link
Mr dupemeister got the rez wrong, the framerate wrong, then the his cpu recommendation wrong, then he couldn't comprehend when he went to the link, as it clearly shows his crap cpu pick losing to the cheaper Intel chip, after he claimed his crap amd pick was the best bang and 20 bucks more. LOL
But you ragging amd fans who cannot stand an insult expect us all to stand your constantly insulting lies them smile pretty and thank you for your stupid treachery and lies.
Right ?
Okay, thank you so much for having the midol disability that prevents you from being able to think clearly or get anything correct.
CeriseCogburn - Friday, October 12, 2012 - link
You're both blind, mind numbed, idiot doofy bats. Here is his quote idiot #2
" that said, the best bang for the bug gaming cpu is the AMD FX4100 for about $140. Why go weak i3 dual core when you can go mid range quad from AMD for $20 more."
Look at the link again, brain dead core amd fan.
Spunjji - Thursday, August 30, 2012 - link
Bahahahaha, you're such a tool. xD
Galidou - Thursday, August 30, 2012 - link
All that counter-offensive for nothing... He never said he runs it on ULTRA. Boy people nowadays thinks you have to play the games on ultra or else you're just not playing it at all. A radeon 6870 or a 550 ti runs the game at that resolution on high details and it's BEAUTIFUL with over 50 fps...
Galidou - Thursday, August 30, 2012 - link
Well actually he said ultra but there must be some options not enabled like msaa, I beleive it'S totally possible to get that on a 140$ card. I built myself a pc for one of my friend that total with the case did cost me around 350$ total, without hard drive and power supply and that totally runs Battlefield 3 easily on 1600*900 not ultra but almost.
Origin64 - Thursday, August 30, 2012 - link
Unfortunately 60 fps and full hd are the standard these days, so picking custom fps targets and lower resolutions doesn't really count as far as im concerned