Cache Improvements

The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.

Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
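To make the idea concrete, here is a minimal toy model of a decoded micro-op queue in Python. It is only a sketch of the behavior described above, not AMD's design: the class name, the `capacity` value, the oldest-first eviction and the `fake_decode` stand-in are all assumptions, since AMD disclosed neither the queue size nor its replacement policy.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue keyed by fetch address."""

    def __init__(self, capacity):
        # capacity is hypothetical; AMD did not disclose the real size
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> tuple of decoded micro-ops
        self.hits = 0
        self.decodes = 0

    def fetch(self, address, decode):
        """Return micro-ops for an address, decoding only on a miss."""
        if address in self.entries:
            # Hit: the decode hardware could stay powered down for this fetch
            self.hits += 1
            return self.entries[address]
        # Miss: run the decoder and remember the result
        self.decodes += 1
        uops = decode(address)
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (assumed policy, not confirmed by AMD)
            self.entries.popitem(last=False)
        self.entries[address] = uops
        return uops


def fake_decode(address):
    # Stand-in for the x86 decoders; returns a placeholder micro-op
    return (f"uop@{address:#x}",)


queue = MicroOpQueue(capacity=4)
loop_body = [0x1000, 0x1004, 0x1008]
for _ in range(10):  # a small loop fetched ten times
    for addr in loop_body:
        queue.fetch(addr, fake_decode)

print(queue.decodes, queue.hits)
```

In this toy run the three-instruction loop is decoded once and every later iteration is served from the queue, which is exactly the case where keeping the decoders idle would pay off.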
 
The L1-to-L2 interface has also been improved: some queues have grown and the logic has been improved.
 
 
Finally on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache (powering down the unused slices) in increments of one quarter of its capacity. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
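As a rough illustration of quarter-granularity resizing, the Python sketch below picks how many L2 quarters to keep powered for a given working set. Everything here is an assumption for illustration: AMD says the decision is based on workload and hit rate but did not describe the policy, so the working-set heuristic, the 2048KB total size, the rule that at least one quarter stays on, and the idea that leakage scales linearly with the powered fraction are ours.

```python
import math


def active_quarters(working_set_kb, l2_total_kb):
    """Number of L2 quarters to keep powered (1 to 4) for a working set."""
    quarter_kb = l2_total_kb / 4
    needed = math.ceil(working_set_kb / quarter_kb)
    # Assumed: at least one quarter always stays on, and never more than four
    return max(1, min(4, needed))


def relative_l2_leakage(quarters):
    """Leakage relative to a fully powered L2, assuming linear scaling."""
    return quarters / 4


L2_TOTAL_KB = 2048  # hypothetical size used only for this example

for name, working_set_kb in [("video decode", 300), ("browser", 1100), ("server", 5000)]:
    quarters = active_quarters(working_set_kb, L2_TOTAL_KB)
    print(name, quarters, relative_l2_leakage(quarters))
```

Under these assumptions a light task with a 300KB footprint would keep a single quarter powered, while a large server-style working set would leave the whole cache on, which matches AMD's point that the feature mainly helps mobile clients.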
 
Steamroller brings no significant reduction in L2/L3 cache latencies. According to AMD, it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
 

Looking Forward: High Density Libraries

 
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
 
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.

The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
 
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power-constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15-30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.
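To put the 15-30% figure in perspective, the short Python calculation below shows what a given reduction in energy per operation means for the amount of work done on a fixed energy budget. The 1 nJ baseline is a hypothetical number chosen only to keep the arithmetic readable, and the result covers CPU energy alone, not whole-system battery life.

```python
def energy_per_op(baseline_nj, reduction):
    """Energy per operation after applying a fractional reduction."""
    return baseline_nj * (1.0 - reduction)


def ops_per_joule_gain(reduction):
    """How many times more operations fit in the same energy budget."""
    return 1.0 / (1.0 - reduction)


baseline_nj = 1.0  # hypothetical baseline: 1 nJ per operation
for reduction in (0.15, 0.30):
    after = energy_per_op(baseline_nj, reduction)
    gain = ops_per_joule_gain(reduction)
    print(f"{reduction:.0%} reduction: {after:.2f} nJ/op, {gain:.2f}x ops per joule")
```

At the two ends of AMD's range this works out to roughly 1.18x and 1.43x as many operations for the same CPU energy.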
 
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
 

Final Words

 
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
 
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.

126 Comments


  • jabber - Tuesday, August 28, 2012 - link

    You keep believing that.

    The computing world does not revolve around mainly males aged between 14 to 25 playing Crysis.

    Sorry to burst your bubble.
  • swaaye - Tuesday, August 28, 2012 - link

    I think you forgot to read what you replied to.
  • jabber - Wednesday, August 29, 2012 - link

    It was a reply to Benchpress.

    About time a modern forum got some modern comments software.
  • MySchizoBuddy - Wednesday, August 29, 2012 - link

    So far all these cries for a modern comment system have fallen on deaf ears.
  • rocketbuddha - Tuesday, August 28, 2012 - link

    I look at it slightly differently. AMD is totally dependent right now on its OEM partners to push its processors into products that the consumers/market wants.

    Let us take Trinity for example. An excellent mobile APU and improvement over Llano in every single way. With AMD giving OEMs all the freedom to differentiate compared to the strict Ultrabook guidelines that Intel forces, you should see a huge number of Ultra Thins (UT)

    But what the OEMs are doing is equipping Trinity based NB/UT with substandard hardware. Worse they are pricing them so friggin close to low-end ultrabooks or UB like a little thicker notebooks with a better performance to hit AMD at all ends.

    So now we have Brazos 2 systems fighting the simple Ivy pentiums which is not its intended competition and Trinity systems against Ivy systems sometimes with a discrete basic NVIDIA chip which can come close/exceed Trinity's graphics performance with a more powerful and efficient CPU.

    So AMD who can just compete in Price/price to performance market right now have
    a) Intel based systems very close in price in consumer market.
    b) High-end gaming is now solely Intel
    c) Stable (repeatedly rewarding) business market firm on the Intel camp.
    d) Intel firmly owning the low-power market (expensive) due to technology advantage.

    That is the reason AMD is missing revenues while Intel is growing albeit at a slower pace.
  • Conficio - Wednesday, August 29, 2012 - link

    You are right and you are wrong.

    Totally agree that there are many computer uses where CPU speed and architecture do not matter that much.

    Unfortunately, the laptop OEMs have not yet caught on to this. They still produce only crappy systems with crappy screens and crappy keyboards/mousepads (and in extension flexing cases). Or they go ultra light/thin and add unnecessary GPU and top of the line quad core CPUs

    I think AMD needs to play its cards for affordable systems not just for cheap systems, but for an affordable middle ground, where the UX is quality and the chips are just good enough for the non gamer. AMD could do a lot with a good branding. Where is the AMD equivalent of UltraBooks with standards for solid keyboards, high res/high quality (IPS) screens, etc. At the end a CPU with a decent GPU (good enough for photo/video viewing and the OS animation gimmicks) shines much more in a solid combination. Call them EverydayBooks or A-Class laptops (as in AMD class)
  • CeriseCogburn - Wednesday, August 29, 2012 - link

    Who is going to build these amd wonders I ask.
    AMD couldn't get their damned fan design correct on the 69xx series and had to SHAVE PLASTIC CORNERS on the 6 pin connector.
    I shudder to think AMD would have a hand in the design... they can't do basic measuring correct - and go to production with that kind of fault.
    I suspect AMD internally is a bunch of scared losers who dare not speak out about problems lest they "get canned for the sake of the bottom line".
    That grows into a real problem very quickly - different parts of the company not communicating with other portions - STOVEPIPED management with workers, engineers, groups, blocks, all living in fear...
    SOMEONE needs to straighten it out - a modern government, secret agency, AND corporation cannot function properly if secrecy due to fear or stovepiping or "security" and/or protecting one's domain is the REASON for the silence and lack of communication...
    In that environment people "give up" pretty quickly and "work with what they've got" which is less than they need to get it done correctly.
    It's like AMD drivers - " I didn't know anything was wrong " says Catalyst Maker.... that kind of level of total and utter lack of communication.
  • Galidou - Sunday, September 2, 2012 - link

    You speak like everything is so simple working building video cards and processors. It's so easy that everyone at home should build their own freaking parts, it's so easy... But still you have so much knowledge about management but you spend your time spreading hate on forums about AMD AMD AMD....

    ''I suspect AMD internally is a bunch of scared losers''

    You just suspect too many things, go back to your design of the processor and video card that will dominate them all, it seems so easy after all.
  • CeriseCogburn - Friday, October 12, 2012 - link

    Oh must have really severed that nerve as the reality cut through to the bone.
    You're fired !
    (so another amd employee left - of course the emp was told it's layoffs again, but we all know crap under performance leads to it, and is effectively a firing no matter the name, or the nature, as in the boss saying "turn in your resignation")
    Aren't you touchy feely types all about minimizing the hostile work environment ...
    You ever been onboard a sinking ship miss touchy ?
  • ifrit39 - Tuesday, August 28, 2012 - link

    While I agree with your comment about 'good enough' performance for mainstream users, I don't agree with the idea that process node shrinks are not important.

    Most of the consumer market uses their PCs simply for web browsing, email, and video. There has been an enormous shift from desktop to notebook and now to tablet and other mobile and light devices. This shift is enabled by process node shrinks that reduce power consumption and heat to reasonable levels and allow greater performance in smaller form factors. I think Core 2 Duo was 'good enough' to meet basic needs without feeling sluggish. But what user could complain about an extra hour of battery life?

    If you read AT regularly, you'll see the impact that node shrinks have on battery life, power consumption, and heat/noise. Ivy bridge is no different and neither is any
