A Crash Course in CPU Architecture

It’s been years since I’ve gone through the life of an instruction, and when I last did, it was for a very high-end desktop processor. I realize that not everyone interested in what’s powering the iPhone 3GS or Palm Pre has been taken down this path, so I thought some of that knowledge might be useful here.

Applications spawn threads, threads are made up of instructions, and instructions are what a CPU “processes”. The actual processing of an instruction is pretty simple: the CPU must fetch the instruction from memory, decode it (work out what the instruction is telling it to do, e.g. add two numbers), grab any data the instruction requires (e.g. find the numbers to be added), execute the instruction, and finally write the result of the operation to either a register or memory.
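To make those five steps concrete, here is a toy, non-pipelined sketch in Python. The text-based instruction format (`"add r2, r0, r1"`), the register names and the `decode`/`run` helpers are hypothetical, invented only for illustration; a real CPU decodes binary encodings in hardware, not strings. Each loop iteration finishes one instruction completely before the next one is fetched.

```python
# Toy, non-pipelined CPU: every instruction goes through all five steps
# before the next one starts. The string instruction format is hypothetical.

# Operations this toy CPU understands (an assumption for the example).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}

def decode(text):
    """Split e.g. "add r2, r0, r1" into (op, destination, source1, source2)."""
    op, operands = text.split(maxsplit=1)
    dst, src1, src2 = (name.strip() for name in operands.split(","))
    return op, dst, src1, src2

def run(memory, regs):
    for pc in range(len(memory)):
        text = memory[pc]                    # 1. fetch the instruction
        op, dst, src1, src2 = decode(text)   # 2. decode it
        a, b = regs[src1], regs[src2]        # 3. fetch the operands
        result = OPS[op](a, b)               # 4. execute
        regs[dst] = result                   # 5. write the result to a register
    return regs

regs = run(["add r2, r0, r1", "sub r3, r2, r0"], {"r0": 2, "r1": 3})
print(regs["r2"], regs["r3"])
```

Here `r2` ends up as 5 (2 + 3) and `r3` as 3 (5 - 2).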


Our basic microprocessor with a 5-stage pipeline

Based on the example above, executing an instruction requires five distinct stages. In a pipelined microprocessor, a different instruction can be active at each stage of the execution pipeline. For example, you can be grabbing data for one instruction while decoding another and fetching yet another. All modern processors work this way.
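A quick way to see the overlap is to print which instruction sits in each stage on every cycle. The sketch below assumes an ideal pipeline (one cycle per stage, one new instruction per cycle, no stalls) using the five stage names from the text; `pipeline_table` and the labels `I0`, `I1`, ... are illustrative placeholders.

```python
STAGES = ["Fetch", "Decode", "Fetch Operands", "Execute", "Write Output"]

def pipeline_table(num_instructions, stages=STAGES):
    """For each cycle, list which instruction occupies each stage (None = empty).

    Assumes an ideal pipeline: one new instruction enters per cycle and every
    stage takes exactly one cycle, so instruction i is in stage s at cycle i + s.
    """
    total_cycles = num_instructions + len(stages) - 1
    table = []
    for cycle in range(total_cycles):
        row = []
        for s in range(len(stages)):
            i = cycle - s  # the instruction that entered s cycles ago
            row.append(i if 0 <= i < num_instructions else None)
        table.append(row)
    return table

# Columns follow STAGES from left (Fetch) to right (Write Output).
for cycle, row in enumerate(pipeline_table(3)):
    labels = [f"I{i}" if i is not None else "--" for i in row]
    print(f"cycle {cycle}: " + " ".join(labels))
```

On cycle 2, for instance, `I2` is being fetched, `I1` decoded and `I0` is having its operands fetched, yet no stage ever holds more than one instruction.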


Multiple instructions can exist in the pipeline at once, but only one instruction may be active at any given stage

Each of these stages should take the same amount of time for the processor to work efficiently. Because every stage must finish its work within a single clock period, the time required by the slowest stage determines the clock speed of the CPU. If the most complex stage in my example above is the decode stage and it requires 3ns to complete, then my CPU can run no faster than 333MHz (1 / 3ns).
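The rule that the slowest stage sets the clock can be written as a one-line calculation. In this sketch only the 3ns decode figure comes from the text; the 1ns latencies for the other four stages are made-up values, and `max_clock_mhz` is a hypothetical helper name.

```python
def max_clock_mhz(stage_latencies_ns):
    """Highest clock frequency (in MHz) a pipeline can run at.

    Every stage must finish within one clock period, so the slowest stage sets
    the period; frequency is 1 / period, and a 1ns period is 1000MHz.
    """
    period_ns = max(stage_latencies_ns)
    return 1000.0 / period_ns

# Only the 3ns decode figure comes from the text; the others are assumptions.
five_stage = {"Fetch": 1.0, "Decode": 3.0, "Fetch Operands": 1.0,
              "Execute": 1.0, "Write Output": 1.0}
print(round(max_clock_mhz(five_stage.values())))  # limited by Decode
print(max_clock_mhz([1.0] * 8))                   # eight stages of 1ns each
```

Speeding up the fast stages changes nothing here; only shortening the 3ns decode stage raises the clock.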

To reach faster frequencies, we need to speed up each stage of the pipeline. You can speed up a stage by implementing some sweet new algorithms, or simply by splitting up complicated stages into simpler ones and increasing the number of stages in your pipeline.

In our previous example, the decode stage required 3ns to complete, but if we split decode into three separate stages, each requiring 1ns, we remove that bottleneck. Let’s say we do that, but now some of our other stages become the bottleneck; with a target of a 1ns clock period (1ns spent per stage), we also split execute in two and go from five stages to eight:

Fetch
Decode 1
Decode 2
Decode 3
Fetch Operands
Execute 1
Execute 2
Write Output

Now, with each stage running at 1ns, our maximum clock speed goes up from 333MHz to 1000MHz (1GHz). Sweet. Right?

With less work being done in each stage, we reach a higher clock speed, but we also depend on each stage being full in order to operate at peak efficiency.


5-stage pipeline (top) vs. 8-stage pipeline (bottom). The 8-stage pipe is more desirable, but it also requires more instructions to fill.

In the first CPU example we had a 5-stage pipeline, which meant that we needed the pipe full of 5 instructions at any given time to operate at peak efficiency of 1 instruction completed every cycle. The second example has a ginormous 8-stage pipeline, which requires 8 instructions in the pipe for peak efficiency. In both cases you can only get one instruction out of the pipe every cycle, but because the second chip’s cycles are three times shorter (1ns vs. 3ns), it can give us more completed instructions in, say, 10 seconds.
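The 10-second comparison can be worked through with a small helper. It assumes an ideal pipeline that starts empty and never stalls: the first instruction needs one cycle per stage to emerge, and one more completes every cycle after that. The function name and parameters are illustrative.

```python
def completed_instructions(window_ns, period_ns, depth):
    """Instructions finished within window_ns by an ideal pipeline that starts empty.

    The first instruction needs `depth` cycles to come out; after that, one
    instruction completes every cycle (no stalls are modeled).
    """
    cycles = window_ns // period_ns
    return max(0, cycles - (depth - 1))

TEN_SECONDS_NS = 10 * 1_000_000_000
print(completed_instructions(TEN_SECONDS_NS, 3, 5))  # 5-stage, 3ns clock
print(completed_instructions(TEN_SECONDS_NS, 1, 8))  # 8-stage, 1ns clock
```

This gives 3,333,333,329 instructions for the 5-stage design and 9,999,999,993 for the 8-stage design: filling the deeper pipe costs only a few extra cycles, while its shorter clock period triples throughput.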

Now think for a moment about the time periods we’re talking about here. The first CPU had a clock period of 3ns, where each stage took 3ns to complete. The second CPU had a clock period of 1ns. A single trip to main memory can easily take 60ns for a CPU with a very fast on-die memory controller, or over 100ns otherwise. For the sake of argument, let’s say that we’re talking about a 100ns trip to main memory. Remember the Fetch Operands stage? Well, if those operands are located in main memory, that stage won’t take 3ns to complete, but rather 103ns, since it has to get the operands from main memory.
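Converting that memory latency into clock cycles shows why it hurts the faster chip more. A minimal sketch, assuming the 100ns figure from the text and ignoring caches entirely; both helper names are hypothetical.

```python
import math

def fetch_operands_ns(period_ns, memory_latency_ns):
    """Time spent in Fetch Operands when the data has to come from main memory."""
    return period_ns + memory_latency_ns

def cycles_waiting(memory_latency_ns, period_ns):
    """Whole clock cycles that pass while waiting on memory (rounded up)."""
    return math.ceil(memory_latency_ns / period_ns)

print(fetch_operands_ns(3, 100))                         # first CPU
print(cycles_waiting(100, 3), cycles_waiting(100, 1))    # 3ns chip vs 1ns chip
```

The same 100ns trip costs 34 cycles on the 3ns chip but 100 cycles on the 1ns chip, so the higher the clock, the more cycles are lost to each trip.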

Rather than stall the pipeline for such an absurd length of time, some processors switch to another thread when an instruction has to go all the way out to main memory. The contents of the pipeline get flushed and filled with instructions from another thread while the data request goes off to main memory. Once the data is ready, the processor switches back and continues on its original execution path. Here’s the problem: it takes time to refill the pipeline, and the longer the pipeline, the longer it takes to refill. This is a bad but regular occurrence in a microprocessor: while the pipeline refills, our instruction throughput drops from its peak of 1 instruction per clock to 0. Not good.

Other scenarios can also interrupt the normal flow of things within our microprocessor. Some instructions may take multiple cycles at a single stage to complete; more complex arithmetic, such as a division, may spend significantly longer in the execute stage while the operation works out. In an in-order microprocessor, all instructions behind such an instruction must wait.
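The in-order stall can be simulated with a short recurrence. This is a simplified model, not how any particular chip works: it assumes the five stages from the text (Execute is stage index 3), one cycle per stage except Execute, no buffers between stages, and that an instruction can enter a stage only after the instruction ahead of it has moved on. `completion_times` is a hypothetical helper.

```python
def completion_times(exec_latencies, depth=5, exec_stage=3):
    """Cycle count at which each instruction leaves an in-order, unbuffered pipeline.

    exec_latencies[i] is how many cycles instruction i spends in the execute
    stage; every other stage takes one cycle.
    """
    n = len(exec_latencies)
    start = [[0] * depth for _ in range(n)]  # cycle instruction i enters stage s
    end = [[0] * depth for _ in range(n)]    # cycle it finishes its work in stage s
    for i in range(n):
        for s in range(depth):
            # Ready once the previous stage's work is done (stage 0 is always ready).
            ready = end[i][s - 1] if s > 0 else 0
            if i > 0:
                # The instruction ahead frees stage s when it enters stage s + 1
                # (or, for the last stage, when it finishes).
                freed = start[i - 1][s + 1] if s + 1 < depth else end[i - 1][s]
                ready = max(ready, freed)
            start[i][s] = ready
            latency = exec_latencies[i] if s == exec_stage else 1
            end[i][s] = ready + latency
    return [end[i][depth - 1] for i in range(n)]

print(completion_times([1, 1, 1]))  # no slow instructions
print(completion_times([4, 1, 1]))  # first instruction spends 4 cycles in Execute
```

Without a slow instruction the three instructions finish after cycles 5, 6 and 7; when the first one spends four cycles in Execute they finish after 8, 9 and 10, so every instruction behind it is delayed by the same three cycles.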

Again, the more stages in your pipeline, the bigger the penalty for a stall. But when the pipeline is full, a deeper pipeline gives us a higher clock speed and better overall performance; we just need to worry about keeping the pipeline full (which takes a great many additional transistors). And yes, there is an upper limit to how deep you can pipeline your processor before you run into diminishing returns in both performance and power. This was ultimately the downfall of the Pentium 4’s architecture.
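One common way to reason about the depth trade-off is a simple analytic model: split a fixed amount of logic delay across more stages, add a fixed latch overhead to every stage, and charge a refill penalty of `depth - 1` cycles for some fraction of instructions. The sketch below uses that textbook-style model with made-up numbers (8ns of logic, 0.5ns latch overhead, a 10% flush rate); it ignores power entirely and is not a model of the Pentium 4.

```python
def time_per_instruction_ns(depth, logic_ns, latch_ns, stall_rate):
    """Average time per instruction for a pipeline of the given depth.

    Assumed model: the total logic delay is split evenly across `depth` stages,
    each stage adds a fixed latch overhead, and a fraction `stall_rate` of
    instructions flush the pipe and cost `depth - 1` extra cycles to refill.
    """
    period_ns = logic_ns / depth + latch_ns
    cycles_per_instruction = 1 + stall_rate * (depth - 1)
    return period_ns * cycles_per_instruction

def best_depth(logic_ns, latch_ns, stall_rate, max_depth=50):
    """Depth (1..max_depth) with the lowest average time per instruction."""
    return min(range(1, max_depth + 1),
               key=lambda d: time_per_instruction_ns(d, logic_ns, latch_ns, stall_rate))

# Hypothetical numbers chosen only to illustrate the trade-off.
for d in (5, 8, 12, 20, 30):
    print(d, round(time_per_instruction_ns(d, 8.0, 0.5, 0.1), 3))
print("best depth:", best_depth(8.0, 0.5, 0.1))
```

With these numbers, time per instruction falls from 2.94ns at 5 stages to 2.45ns at 12 stages, then climbs again (2.61ns at 20, 2.99ns at 30): past a point, the faster clock no longer pays for the longer refills.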


60 Comments


  • DLeRium - Wednesday, July 08, 2009 - link

    It's unacceptable because what? HTC Touch Diamond 2 and Touch Pro 2 are flagship phones with ARM11 processors? Yes I pointed out some phones that have it, but have you guys even seriously used an S60 phone? Multiple S60 phones? I've gone through N80, N95, N82, and I've toyed with an N85 and 5800 also. The N97 is certainly fine in usability. It could use some more RAM, but even if you stuffed a Cortex A8 and some more RAM, the phone is still going to get a lot of flak except for people who look at it on paper. Symbian S60v5 is perfectly fine on ARM11. It doesn't need some insane CPU to keep up with the UI.

    Moreover, the N97 isn't really that much of a gaming platform like the iPhone. Think of the N97 like the Touch Pro 2. The Touch Pro 2 is more business oriented with the QWERTY and everything. HTC didn't upgrade the camera, and didn't bother to build it the same way the flagship Diamond 2 was built. This doesn't mean it's a BAD phone.

    You guys are thinking of this whole thing like a computer or something. Have you seen the N95 photos? It's a 2007 phone. Pretty much the best 5MP all around. The N97 does a little better. Yes it demolishes the 3MP crap on the 3GS. So coming from a more computer-centric crowd here yes it makes sense to bash a CPU, but from a mobile phone perspective it's not even that bad at all. If anything the phone was first a phone before it was a camera and then an MP3 player, and now a powerhouse mini computer. If you're telling me that in 2006 I could've bought a Sony Ericsson 3 MP cameraphone, then why are we still stuck there on the 3GS? There are more important features that phones push for such as music, camera, later GPS and connectivity, and now processing power. Give it some time and I bet you Nokia will have a winner soon.

    What crappy screen on the N97? Resistive? Get over it. The iPhone is capacitive, so all phones must be capacitive? The iPhone has a Cortex A8, everything else must have it? Please. Multi touch is patented by Apple, so it's a little difficult to move into that arena for now. There are advantages and disadvantages to both resistive and capacitive screens. Just because the N97 doesn't mimic the iPhone doesn't mean it sucks. HTC's WinMo phones have resistive screens too. So do the new Samsung Omnia II and Pro phones. So does the new Sony Ericsson Satio.

    Different phones are built differently, but honestly when you look at pure functionality, the lack of multitasking is much larger than a CPU difference.

    I feel it is justified to say Nokia needs to get to work, but to hear this from people who really don't have as much experience with unlocked phones is like hearing one of those ditzy people who buy Apple thinking it'll solve their spyware problems on their PC tell you why a Mac is superior. I'd rather hear it from the computer guru. Gizmodo may be negative, but I think Engadget gave the N97 a fair look and so did other reviewers like PhoneArena, GSMArena, Mobile-Burn, Symbian-Guru.

    Look I have nothing against Apple. I have a 3GS too. It's just not my thing and I'm back on my N-series. I'm not a Nokia fanboy or anything. There's plenty of criticism I've given the N97 and Nokia in the S60 section of HoFo, but I believe having had 3 iPhones, Anand is quite biased.
  • vshah - Wednesday, July 08, 2009 - link

    Anand,
    Thanks for this excellent article, I really enjoyed the cpu benchmark/comparisons you did; they paint a very clear picture.

    I was curious as to your thoughts on the multitasking implementation on Android. Holding down home for a couple seconds brings up the 6 most recently accessed apps/tasks, and I've always found switching between them to be pretty fluid. Have you had a chance to try that out?

    Thanks,
    Vivan
  • MrX8503 - Wednesday, July 08, 2009 - link

    This isn't even a phone site and it has the most in depth review of the iphone yet.

    I guess being a tech site, Anandtech has an edge over other sites that just review phones.

    Good Work!
  • kmmatney - Wednesday, July 08, 2009 - link

    I needed a new phone for work, and spent about 30 minutes in the local AT&T store testing out phones yesterday. After using the BlackBerry and iPhone for quite some time, I have to say the iPhone was much better. I was way more productive with it; everything was easy to do, while I felt like all the other phones were fighting me. Overall, a fantastic phone for business. I went for the 16GB 3GS model; the only gripe is the 7-day wait for shipment.
  • sprockkets - Wednesday, July 08, 2009 - link

    You mentioned using voice command. Can you use it via a BT headset?

    Also, does it play ringtones over the headset? Does it announce who is calling over the headset?
  • nafhan - Wednesday, July 08, 2009 - link

    Thanks for a great article!

    One minor complaint, and it's really not even a complaint. I want to point out that this would have made two excellent standalone articles.

    First article would have been about the current state of mobile CPU and GPU architecture. This section was excellent and detailed enough that I really felt it deserved its own article rather than being lumped in as part of your iPhone impressions.

    Second article would have been your impressions and review of the 3GS.
  • WeaselITB - Wednesday, July 08, 2009 - link

    I agree with this - it does seem that there are two articles vying for attention here, and with a bit more polish they could have been published separately.

    That said, I do want to commend you for this article. These are the types of in-depth reports that made me start reading AT ten or so years ago, and they are the type of in-depth reports that keep me reading. Thanks, Anand.
  • Rolphus - Wednesday, July 08, 2009 - link

    I think this is an important point. Apple see the iPhone as a device, exactly the same as the iPod. No "user" cares about the iPod's CPU, any more than they care about the CPU of their refrigerator. For it to be a true consumer device (rather than a computer), it should "just work", and work with acceptable performance, in all the situations it's designed for.

    Yes, us techies want to know more, and that's precisely why we come to sites like Anandtech and read your articles. I don't think the mainstream user is ever going to care about these specs, but rather what the phone can do.
  • medi01 - Wednesday, July 08, 2009 - link

    Wouldn't it be better to review alternatives? Like Samsung's new shiny MOLED display smartphone?
  • wuyanxu - Wednesday, July 08, 2009 - link

    superb article! a lot more in-depth than all other websites. would love to see more like this, with more information on how the graphics cores improved its performance.

    however, what you should not forget is the availability of a jailbreak for the 3GS. in the conclusion you've mentioned the hassle of re-launching apps. with a jailbreak, you will be able to send an app to the background and get an instant re-launch.

    my dream phone would be an iPhone with an Android-like pull-down status bar notification system, and have JB's backgrounder come as standard.
    the pull down status bar will have the top 2/3 to be notifications. press to launch its apps. the bottom 1/3 will be icons of opened apps, and to close it, simply drag the icon to a reserved area.
    the idea is similar to a jailbroken app called mQuickDo, except with the notification system.
