Cache Improvements

The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.

Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, both the address and the decoded op are stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops fit in the queue, other than to say that it’s big enough to get a decent hit rate.
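To make the idea concrete, here is a minimal sketch of an address-indexed micro-op queue. Everything in it is an assumption for illustration: the `MicroOpQueue` class, the `toy_decode` stand-in, the capacity of four entries, and the FIFO eviction are all hypothetical, since AMD has not disclosed the queue's size or replacement behavior.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue keyed by fetch address."""

    def __init__(self, capacity):
        self.capacity = capacity        # hypothetical size; the real one is undisclosed
        self.entries = OrderedDict()    # address -> list of decoded micro-ops
        self.hits = 0
        self.decodes = 0

    def fetch(self, address, decode):
        """Return micro-ops for an address, running the decoder only on a miss."""
        if address in self.entries:
            self.hits += 1              # hit: the decoders could stay powered down
            return self.entries[address]
        self.decodes += 1               # miss: decode and remember the result
        uops = decode(address)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry (assumed FIFO)
        return uops


def toy_decode(address):
    # Stand-in decoder: pretends every instruction becomes two micro-ops.
    return [f"uop{address:#x}.0", f"uop{address:#x}.1"]


queue = MicroOpQueue(capacity=4)
loop_body = [0x100, 0x104, 0x108]       # a small loop that fits in the queue
for _ in range(10):
    for addr in loop_body:
        queue.fetch(addr, toy_decode)

print(f"hits={queue.hits} decodes={queue.decodes}")
```

In this toy run the decoder is only needed on the first pass through the loop; every later iteration is served from the queue, which is the case where the front end could keep the decode hardware powered down.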
 
The L1 to L2 interface has also been improved. Some of its queues have grown and the associated logic has been improved.
 
 
Finally, on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and cache hit rate, a Steamroller module can resize its L2 cache in one-quarter increments, powering down the unused slices. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
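As a rough illustration of quarter-step resizing, the sketch below picks the fewest L2 quarters that still cover a task's footprint and estimates the leakage that remains. It is not AMD's policy: the real mechanism reacts to workload and hit rate, whereas this stand-in uses a known footprint. The function names, the 2048KB L2 size, and the assumption that leakage scales linearly with powered-on area are all hypothetical.

```python
import math

L2_QUARTERS = 4  # the cache is described as resizable in quarter steps


def active_quarters(footprint_kb, l2_size_kb):
    """Pick the fewest quarters whose combined capacity covers the footprint."""
    quarter_kb = l2_size_kb / L2_QUARTERS
    needed = math.ceil(footprint_kb / quarter_kb)
    return max(1, min(L2_QUARTERS, needed))   # always keep at least one quarter on


def relative_l2_leakage(quarters_on):
    """L2 leakage relative to a fully powered cache (assumes linear scaling)."""
    return quarters_on / L2_QUARTERS


L2_SIZE_KB = 2048  # assumed per-module L2 size, for illustration only

for footprint_kb in (300, 1500, 4000):
    quarters = active_quarters(footprint_kb, L2_SIZE_KB)
    leakage = relative_l2_leakage(quarters)
    print(f"{footprint_kb}KB footprint: {quarters}/4 of L2 on, "
          f"{leakage:.0%} of full L2 leakage")
```

Under these assumptions a light task such as the 300KB case leaves three quarters of the L2 powered down, while a footprint larger than the cache keeps all four quarters on.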
 
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
 

Looking Forward: High Density Libraries

 
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
 
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.

The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
 
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power-constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15 to 30% reduction in energy per operation. AMD equates this with the power savings you’d get from a full process node improvement.
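To put the 15 to 30% figure in perspective, the short calculation below shows how a reduction in energy per operation translates into work done for a fixed amount of energy. The helper names and the 1.0 nJ baseline are hypothetical, and the model covers only the energy the CPU spends on the work itself, not the rest of the platform.

```python
def energy_per_op_after(baseline_nj, reduction):
    """Energy per operation after a fractional reduction (0.15 means 15%)."""
    return baseline_nj * (1.0 - reduction)


def work_per_energy_gain(reduction):
    """Factor by which a fixed energy budget stretches: 1 / (1 - reduction)."""
    return 1.0 / (1.0 - reduction)


BASELINE_NJ = 1.0  # hypothetical baseline energy per operation, in nanojoules

for reduction in (0.15, 0.30):
    after = energy_per_op_after(BASELINE_NJ, reduction)
    gain = work_per_energy_gain(reduction)
    print(f"{reduction:.0%} reduction: {after:.2f} nJ/op, "
          f"{gain:.2f}x operations per unit of energy")
```

Because the gain is 1 / (1 - reduction), the benefit grows faster than the headline percentage: at the 30% end of the range, the same energy covers roughly 1.43 times as many operations.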
 
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
 

Final Words

 
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
 
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.

126 Comments


  • CeriseCogburn - Friday, October 12, 2012 - link

    Congratulations. You have achieved what others have repeatedly failed to do.
    You have broken the darkside grasp of the AMD PR fanboy advertising pump campaign.

    " Consequently, I sit here asking myself WTF!? Not at AMD (this let down was expected) but at myself. There is no other manufacturer, service provider, or producer that I would tollerate this from, why am accepting it from AMD? "

    It appears your mind has cleared, you have exited enslavement to the Deathstar.
  • MLSCrow - Wednesday, September 05, 2012 - link

    Quote from Laststop311: "-28nm really? Intel will be on it's 2nd gen of 22nm over a year after 22nm debuts for intel and amd still can't match that size. Will steamroller be enough to go up against haswell, thats not even a legit question, haswell is going to obliterate steamroller in every way imaginable."

    Response: Intel will always be ahead of AMD in terms of their fab process. This is nothing new. AMD will go to 28 after intel has gone to 22. AMD will then go to 14 after Intel moves to 11 and so on and so forth. Old news. The question isn't whether steamroller will be enough to go up against Haswell. Haswell will be better in every way, you're right. Anyone who tries to argue that isn't well informed. The real question is, will Steamroller be enough to keep AMD in the game and the answer is, if it performs the way they are saying it will, I predict yes, it will for sure keep them in the game. According to AMD's information (I trust this group more than the group in charge of the original Bulldozer who were fired), that Steamroller will perform at least 15% faster than Piledriver, but from what they said at Hot Chips, 30% faster. 30% faster than Piledriver will put it's performance up against Ivy Bridge, possibly Ivy Bridge-E and to be honest, once you're at that level of performance, no one will even notice a 10-20% improvement which is what Haswell is referring to. Most people today wouldn't know the difference if you put a phenom ii x4 in their system or a Sandy bridge. It's only the benchmarkers, hardcore gamers, and enthusiasts that will notice and they only make up a very small portion of the market. If HSA takes off and it might, especially with Samsung having signed on last week, which may spark other companies to join considering Samsung is so huge, and companies start to code for it, Steamroller might actually give AMD a moment of glory that they haven't seen since Athlon 64. If the industry moves toward HSA the way AMD is betting the farm on, Steamroller will actually steamroll Haswell pretty bad, but who knows what the future holds in that regard. Regardless, whether or not HSA takes off, Steamroller should be enough to keep AMD in the game.

    Quote by hapkiman: "If they don't hit one out of the park soon, I see AMD turning into a second rate company making low-end APUs for OEMs. and of course graphics cards."

    Response: They already are a second rate company making low end APU's for OEMs and of course graphics cards. LOL. They'd HAVE to hit a home run with HSA and Steamroller to truly get back into the first rate game. They've been out of it for a few years now. Thanks to Hector Ruiz (should be Ruinz).
  • mack53 - Wednesday, September 12, 2012 - link

    Still think it boils down to what you want. If it does that, who cares. Plus I can't spend the money that Intel wants for the newest and best. AMD has done me right for a long time. If we didn't have AMD, Intel would take over and I'd hate to see the costs then....
  • HexiumVII - Sunday, April 07, 2013 - link

    Imagine the win for ultrabooks/tablets when AMD can put a Core-class CPU (even first gen will do) with their Radeon APUs. C'mon AMD, go!
  • scorpysr - Tuesday, June 11, 2013 - link

    hi everyone,
    ive read every post here, lol kinda comical in a way with that said just a bit of introduction: been building comps since the 386 and boy we have come a long way im not a rocket scientist by any means but want to say a few things...Mac53 thats pretty much it in a nut shell...but firstly money is always a factor in life cant get around it, there be some that would rather have a corvet but eat bologna for 2 years but so be it...personally i like my steaks... :) secondly..i started out amd and i am still glad they are around, we need competition in life it does drive the wheels to innovation and helps keep prices down...however the business model has sadly changed alot its no longer sell alot and make 20% fair profit now its get what you can get even if you gouge..i digressed a little sorry but here it is...i bought a i7 950 bought 2 years ago...got a nice video card and 6 gig of ram i doubt anyone here can justify for me going out and spending another 600 bucks for the lastest upgrade..nope this rig will take me well into 2016 or when performance of this chip is beaten by at least 30%....and guys dont get me wrong having a hobby is nice...but can you imagine a site dedicated to who makes the best refridgerator !!!!!!!!!! lol just food for thought and
    thanks for allowing me this oppertunity

    cheers
  • gareth112 - Tuesday, June 25, 2013 - link

    Intel have been releasing great products of the last say 4 years I series but the bang for buck is always with AMD and ATI, you can build a great system normally for half the price with AMD and ATI products with the same performance as the Intel based system.

    I use both Intel and AMD for work, and yes Intel processors are faster, but when you have a AMD in your computer doesn't seem to cause as many random crashes because they have been developed longer not to just rush out.

    plus i like being on the Rebels/under dogs team, as come on everyone likes to bet on the under dog.
