An Unbalanced L1 Cache: We Know Why

The Atom processor is outfitted with fairly large caches, which are quite necessary given its in-order architecture that's very sensitive to high memory latencies. We wrote the following in our initial Atom (Silverthorne) architecture discussion:

"The L1 cache is unusually asymmetric with a 32KB instruction and 24KB data cache, a decision made to optimize for performance, die size, and cost. The L2 cache is an 8-way 512KB design, very similar to what was used in the Core architecture.

While Silverthorne is built entirely on Intel's high-k/metal gate 45nm process, there is one major difference: SRAM cell size. Intel uses a 0.382 um^2 SRAM cell in Silverthorne compared to 0.346 um^2 in Core 2. Each SRAM cell is an 8 transistor design compared to 6 transistors in Core 2. The larger cell size increases the die size of Silverthorne but it draws less power and runs at a lower voltage."

At the time we didn't have a good explanation as to why the Atom's L1 cache wasn't made of equal sized instruction and data caches, which is usually how Intel designs its processors. Since then we have gotten some more insight into the design decision:

Historically, Intel would design a microprocessor for a particular manufacturing process (e.g. 65nm) and shoot for a target voltage, later attempting to lower that voltage when possible. Atom was designed around the absolute minimum voltage the manufacturing process (45nm) was capable of running at and the engineers were left with the task of figuring out what they could do, architecturally, given that requirement.

The perfect example of this approach to design is Atom's L1 instruction and data caches. Originally these two caches were small signal arrays (6 transistors per cell); they were very compact and delivered the performance Intel desired. However, during the modeling of the chip Intel noticed that the small signal array design was limiting how far the chip's operating voltage could be scaled down.

Instead of bumping up the voltage and sticking with a small signal array, Intel switched to a register file (1 read/1 write port). The cache now had a larger cell size (8 transistors per cell) which increased the area and footprint of the L1 instruction and data caches. The Atom floorplan had issues accommodating the larger sizes so the data cache had to be cut down from 32KB to 24KB in favor of the power benefits. We wondered why Atom had an asymmetrical L1 data and instruction cache (24KB and 32KB respectively, instead of 32KB/32KB) and it turns out that the cause was voltage.
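
A quick back-of-the-envelope check makes the trade-off concrete. The sketch below counts only the transistors in the storage cells themselves, using the cell types and cache capacities given above. It ignores tags, decoders, sense amplifiers and all other peripheral circuitry, so treat transistor count as a rough proxy for area and not as a measurement. The function name is ours.

```python
KIB = 1024  # bytes per KB as used for cache sizes

def cell_transistors(capacity_kib: int, transistors_per_cell: int) -> int:
    """Transistors in the data array alone, assuming one cell per stored bit."""
    bits = capacity_kib * KIB * 8
    return bits * transistors_per_cell

original_6t = cell_transistors(32, 6)  # 32KB small signal array, 6T cells
shipped_8t = cell_transistors(24, 8)   # 24KB register file, 8T cells
uncut_8t = cell_transistors(32, 8)     # 32KB kept, but with 8T cells

print(original_6t, shipped_8t, uncut_8t)
```

Under these assumptions the 24KB array of 8T cells uses exactly as many cell transistors as the original 32KB array of 6T cells (32 x 6 = 24 x 8), while keeping the full 32KB with 8T cells would have needed a third more.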

A small signal array design based on a 6T cell has a certain minimum operating voltage; in other words, it can only retain state down to a certain Vmin. In the L2 cache, Intel was able to use a 6T small signal array design since it had inline ECC. There were other design decisions at work that prevented Intel from equipping the L1 cache with inline ECC, so the architects needed to go to a larger cell size in order to keep the operating voltage low.

The end result of this sort of a design approach is that the Atom processor is able to operate at its highest performance state (C0) at its minimum operating voltage.
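
Why chase the minimum voltage so hard? As a first-order rule of thumb, the dynamic (switching) power of CMOS logic scales with capacitance times frequency times the square of the supply voltage. The sketch below applies that rule. The voltages are hypothetical placeholders chosen for illustration, not Atom's actual operating points, and leakage power is ignored.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio of new to old dynamic power using P ~ C * V^2 * f, with C unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Hypothetical example: supply drops from 1.0 V to 0.8 V at the same clock speed.
ratio = relative_dynamic_power(0.8, 1.0)
print(f"dynamic power relative to the original: {ratio:.2f}")
```

Because voltage enters as a square, a 20% reduction in supply voltage cuts dynamic power by roughly a third in this simple model, which is why a cache that sets a high Vmin is so costly.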

Hardware Prefetchers: So Necessary

Atom features two hardware prefetchers, one that prefetches from the L2 cache into the L1 data cache and one from memory into the L2 cache.

Hardware prefetching is unbelievably important when dealing with an in-order core because, as we've mentioned time and time again, not having data available in cache means that the pipelines will stall until that data is available.
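
To illustrate the effect, here is a toy model of a sequential access pattern with and without prefetching. It uses a textbook next-line prefetcher, which is our assumption for illustration and not a description of Atom's actual prefetch algorithms. It also assumes an unbounded cache, a 64-byte line, and prefetches that always arrive in time, all of which flatter the prefetcher.

```python
def demand_misses(addresses, line_size=64, prefetch_next_line=False):
    """Count demand misses for an address stream against an unbounded cache."""
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line not in cached_lines:
            misses += 1                 # an in-order pipeline would stall here
            cached_lines.add(line)
        if prefetch_next_line:
            cached_lines.add(line + 1)  # bring in the following line ahead of use
    return misses

# Sequential 8-byte loads walking across 100 cache lines.
stream = range(0, 64 * 100, 8)
without_prefetch = demand_misses(stream)
with_prefetch = demand_misses(stream, prefetch_next_line=True)
print(without_prefetch, with_prefetch)
```

In this idealized case every new line is a stall without prefetching, while the next-line prefetcher leaves only the very first access as a demand miss. Real workloads with irregular access patterns would see far less benefit.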

The obvious long term solution to the problem of data starvation is to integrate the memory controller on die. With no 45nm MCH design ready by the time the Atom design was complete, Intel will have to wait until the second generation Atom (codename: Moorestown) to gain an on-die memory controller.


46 Comments


  • AssBall - Thursday, April 03, 2008 - link

    I was a little surprised at first but when I got thinking... I don't really know if my P35 northbridge is even a 90nm chip.

    - Why would Intel not want to use its old rock solid 130 process, which it spent a ton of money on, to build simpler parts than it was designed for? As long as their materials and equipment are working fine for a cheap enough 130 part and there is no dire market for lower power chipsets that fit in an ATX standard...

    - Intel's most recent strategy has been to design and manufacture their latest designs on solid existing tech before they shrink it.

    I'm also fairly certain that if this little CPU takes off they will have the 90-65 version of it with some simple refinements out in two shakes of a lamb's tail. I'm really surprised myself that they set it up with ddr2 support... but again it's so cheap now, why not?
  • rmlarsen - Wednesday, April 02, 2008 - link

    It is articles like this that make me come back to Anandtech. Well written and researched and with the right level of detail. Keep up the good work!
  • Woodchuck2000 - Wednesday, April 02, 2008 - link

    One of the best I've seen on Anandtech for quite a while. I've been following this one quite closely and it's great to have such a detailed exposition all in one place.

    Do we have any news on the availability of this as a desktop part? I've been looking to construct an always-on server sitting in a cupboard somewhere, just to act as a file/print server and to do a little light database/web hosting for testing and development. A 1.6GHz Atom would easily provide enough horsepower to accomplish that and with a notebook HDD, the whole thing should stay well under 10W load power consumption!

    I saw a photo of Silverthorne + Poulsbo on an Intel reference board built to a Mini-ITX form factor but couldn't find any details on whether they were planning to release it to the general public...

    Also any news on when we might expect benchmarks?

    Keep up the good work!
  • yyrkoon - Wednesday, April 02, 2008 - link

    You could probably do this now with the VIA pico ITX reference board.

    http://www.logicsupply.com/products/px10000g?refer...

    Sorry for the long link . . . but I think power usage is somewhere around ~12W.
  • AnnonymousCoward - Friday, April 04, 2008 - link

    Try tinyurl.com

    Thanks for such a high quality article, Anand!
  • tfranzese - Wednesday, April 02, 2008 - link

    "These days, Intel manufacturers millions of Core 2 Duo processors each made up of 410 million transistors (over 130 times the transistor count of the original Pentium) in an area around 1/3 the size."

    ...is incorrect. You could nearly fit six Core 2 Duos at 45nm in the same area that the original Pentium occupied or even more impressive, four of them on the die of the original Pentium 4 with room to spare.
  • tfranzese - Wednesday, April 02, 2008 - link

    I'm the one with fuzzy logic today and misinterpreted :)
  • Magnus Dredd - Wednesday, April 02, 2008 - link

    It is completely true that there are benefits to having a single platform to support. However, the article is completely off the mark about where the benefits are most realized.

    It's not the HARDWARE.

    It's the API.

    It's all about the API. Unless you're writing drivers or an OS you're not writing to the hardware, with VERY few exceptions. The exceptions to this are for optimizations for seriously intense code like Photoshop filters and video game engines, where 90% or more of the code is to the API. So basically one way or another you're writing to an API. That's Application Programming Interface.

    Since it was mentioned in the article...

    If I'm writing a program that's supposed to run on OSX, the newest version supports TWO hardware platforms (PPC and x86, and not just x86 as the article claimed), and I want to create a "window" using the built in API (named Cocoa) I use the command NSWindow.
    http://developer.apple.com/documentation/Cocoa/Ref...
    It makes no difference when writing the program whether it's a PPC or Intel based machine that it will be running on with the single exception that in a few places you have to use a small bit of code to make sure that the program uses the byte order appropriate to the processor.

    While I have yet to read info on it, I'd bet that NSWindow is also used by the iPhone which uses a MIPS cpu (yet a third architecture).

    I've written code in ANSI C for Linux that runs without making any changes on PPC, x86(32 bit and 64 bit), and Sparc. If I wanted to go to the trouble I could also compile it on my MIPS based SGI, a Motorola 68000 series Mac, and a HP PA-RISC (if I can ever figure out how to get the damned thing running). That's because nearly all modern applications are compiled.
    ---
    Now if by PC, Anand meant Windows, we're talking a different story, but one with similar flaws.

    So I'm writing an application for Windows and I want to create a "window". I use the win32 command CreateWindow.
    http://msdn2.microsoft.com/en-us/library/ms632679(...

    So let's just say that I want it to run on an Itanium under Windows... I use the win32 command CreateWindow.

    So let's just say that I want to make it run on a WindowsCE based set-top box or internet tablet powered by a MIPS CPU...
    I use the win32 command CreateWindow.
    http://msdn2.microsoft.com/en-us/library/ms908192....
    Quoting Wikipedia: "It is supported on Intel x86 and compatibles, MIPS, ARM, and Hitachi SuperH processors."
    http://en.wikipedia.org/w/index.php?title=Windows_...
    ---
    I'm actually somewhat saddened to write this post, mostly due to the amount of respect I have for Anand and many of the great articles he's written over the years. However, I suppose that it goes that way sometimes.

    I just want to make sure that people aren't misled about what makes it hard to port/move a program to another platform like a phone or a Blu-ray player. And it's sure as hell not the CPU's instruction set. Either the CPU may not be fast enough, or the programs are written for APIs that may or may not support the hardware. And the fault for this lies with Microsoft, or Apple, or the GTK guys, or Trolltech, or whomever the API belongs to.

    Also simply dropping an x86 CPU into a machine does not mean that it can run Windows. With the sales of XP to cease, your only option for the new batch of supercheap x86 laptops like the ASUS EEE, or the cloudbook may be Linux, regardless of the fact that it's x86 based.

    The bottom line is, if Microsoft doesn't care about your platform, they won't support it and you won't be able to get it with Windows regardless of what the CPU is.

    While I do personally agree that x86 moving "downwards" is a great thing. I just see it taking over for completely different reasons, like Intel's manufacturing prowess.
  • yyrkoon - Wednesday, April 02, 2008 - link

    Windows XP Embedded, and the compact .NET framework while you're talking about 'APIs', and platforms. XPe, and Win2003 Embedded are not going away any time soon, and basically have barely been available on a non beta basis. Although CE Builder could probably do the same thing, and as a matter of a fact I've seen some fairly nifty things done with it(eg: a boot-able image that fits on a floppy with all the functionality of your standard NAS, including User Groups and permission policies).

    While sometimes having an OS on an Embedded device may be a hindrance, there are times it can be quite handy. Bank KIOSKs, and Cash registers are only two such examples, and I have worked on/with both that use WinXPe.
  • Anand Lal Shimpi - Wednesday, April 02, 2008 - link

    Agreed - the API also plays a large part, but for a company like Apple the pain of maintaining both PPC and x86 codepaths is significant. Perhaps the Firefox reference wasn't the best one, especially as I really see the strengths here for software companies like Apple (not to mention that other conventional hardware companies may start looking more like software companies as their devices get more complex).

    I think you've also hit on a major issue going forward: Microsoft is going to have to focus on these not-PCs a lot more seriously in the future. Instead of trying to scale Windows down, it needs a MCE-esque approach to these "fast enough" devices. Apple made the right first step with the iPhone OS, Microsoft can't stand by idle for too long without a good alternative. And MS does love x86... :)

    Take care,
    Anand
