CPU Architecture Improvements: Background

Despite all of this platform discussion, we must not forget that Haswell is the fourth tock since Intel instituted its tick-tock cadence. If you're not familiar with the terminology by now, a tock is a "new" microprocessor architecture built on an existing manufacturing process. In this case we're talking about Intel's 22nm 3D transistors, which first debuted with Ivy Bridge. Although Haswell is clearly SoC focused, the designs we're talking about today all use Intel's 22nm CPU process - not the 22nm SoC process that has yet to debut for Atom. It's important not to give Intel too much credit on the manufacturing front. While it has a full node advantage over the competition in the PC space, it's currently only shipping a 32nm low-power SoC process. Intel may still have a more power-efficient process at 32nm than its competitors in the SoC space, but the full node advantage simply doesn't exist there yet.

Although Haswell is labeled as a new microarchitecture, it borrows heavily from those that came before it. Without going into the full details of how CPUs work, I feel like we need a bit of a recap to really appreciate the changes Intel made to Haswell.

At a high level the goal of a CPU is to grab instructions from memory and execute those instructions. All of the tricks and improvements we see from one generation to the next just help to accomplish that goal faster.

The assembly line analogy for a pipelined microprocessor is overused, but that's because it's quite accurate. Rather than working on one instruction at a time, modern processors feature an assembly line of steps that breaks up the grab/execute process to allow for higher throughput.
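
To put a rough number on why that assembly line matters, here's a quick back-of-the-envelope comparison (my own illustration, not figures from Intel) using a hypothetical four-stage pipeline: an unpipelined design finishes one instruction every four cycles, while a pipelined one, once the line is full, retires roughly one per cycle.

```python
# Back-of-the-envelope throughput of a hypothetical 4-stage pipeline vs. no pipelining.
STAGES = 4            # fetch, decode, execute, commit
INSTRUCTIONS = 1000   # arbitrary workload size

unpipelined_cycles = INSTRUCTIONS * STAGES      # each instruction occupies the whole CPU
pipelined_cycles = STAGES + (INSTRUCTIONS - 1)  # fill the line once, then retire ~1 per cycle

print(f"unpipelined: {unpipelined_cycles} cycles")  # 4000
print(f"pipelined:   {pipelined_cycles} cycles")    # 1003, roughly 4x the throughput
```

Real pipelines never hit that ideal because of branches, cache misses and dependencies, but the principle is the same.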

The basic pipeline is as follows: fetch, decode, execute, commit to memory. You first fetch the next instruction from memory (there's a counter and pointer that tells the CPU where to find the next instruction). You then decode that instruction into an internally understood format (this is key to enabling backwards compatibility). Next you execute the instruction (this stage, like most here, is itself split into multiple steps - fetching any data the instruction needs, among other things). Finally you commit the results of that instruction to memory and start the process over again.
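
To make that loop concrete, here's a minimal sketch of the fetch/decode/execute/commit cycle for a made-up toy ISA (the instructions, registers and encoding here are purely hypothetical - real x86 decode into uops is vastly more involved):

```python
# Minimal sketch of the fetch -> decode -> execute -> commit loop for a toy,
# made-up ISA. This only illustrates the basic cycle, not real x86 behavior.
program = [("LOAD", "r0", 100),   # r0 = mem[100]
           ("ADD",  "r0", 1),     # r0 = r0 + 1
           ("STORE", "r0", 100),  # mem[100] = r0
           ("HALT",)]

memory = {100: 41}
regs = {"r0": 0}
pc = 0                            # program counter: where to fetch the next instruction

while True:
    inst = program[pc]            # fetch: grab the next instruction from "memory"
    op, *args = inst              # decode: break it into an internally understood format
    pc += 1
    if op == "LOAD":              # execute, then commit the result
        regs[args[0]] = memory[args[1]]
    elif op == "ADD":
        regs[args[0]] += args[1]
    elif op == "STORE":
        memory[args[1]] = regs[args[0]]
    elif op == "HALT":
        break

print(memory[100])                # 42
```

Each of those four steps maps onto one of the pipeline stages above; a real front end, of course, works on several instructions at once rather than one at a time.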

Modern CPU pipelines feature many more stages than what I've outlined here. Conroe featured a 14-stage integer pipeline, Nehalem increased that to 16 stages, while Sandy Bridge moved to a 14 - 19 stage pipeline (depending on whether an instruction hits or misses in the decoded uop cache).

The front end is responsible for fetching and decoding instructions, while the back end deals with executing them. The division between the two halves of the CPU pipeline also separates the part of the pipeline that must operate in order from the part that can operate out of order. Instructions have to be fetched and completed in program order (you can't click Print until you click File first), but they can be executed in any order, so long as the end result is correct.

Why would you want to execute instructions out of order? It turns out that many instructions are either dependent on one another (e.g. C=A+B followed by E=C+D) or they need data that's not immediately available and has to be fetched from main memory (a process that can take hundreds of cycles, or an eternity in the eyes of the processor). Being able to reorder instructions before they're executed allows the processor to keep doing work rather than just sitting around waiting.
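
As a purely illustrative sketch (not how Haswell's scheduler is actually built), the idea can be modeled like this: an instruction may issue as soon as all of its inputs are ready, so independent work sitting behind a stalled instruction doesn't have to wait.

```python
# Purely illustrative out-of-order issue. Each "instruction" is
# (name, inputs_needed, output_produced); it may issue once all inputs are ready.
instructions = [
    ("C = A + B", {"A", "B"}, "C"),   # A is still being loaded from slow memory
    ("E = C + D", {"C", "D"}, "E"),   # depends on C, so it must wait for the first add
    ("X = Y + Z", {"Y", "Z"}, "X"),   # independent: can run while the load is in flight
]

ready = {"B", "D", "Y", "Z"}          # values already available in registers
arrives = {3: "A"}                    # hypothetical: A's load completes at cycle 3
pending = list(instructions)
cycle = 0

while pending:
    cycle += 1
    if cycle in arrives:
        ready.add(arrives[cycle])     # the outstanding load finally returns
    issued = None
    for inst in pending:
        name, needs, produces = inst
        if needs <= ready:            # all inputs available -> safe to issue
            issued = inst
            break
    if issued:
        print(f"cycle {cycle}: issue {issued[0]}")
        ready.add(issued[2])
        pending.remove(issued)
    else:
        print(f"cycle {cycle}: nothing ready, would stall")
```

In this toy trace the independent X = Y + Z issues on the first cycle while C = A + B is still waiting on A to arrive from memory; an in-order machine would have stalled everything behind that load.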

Sidebar on Performance Modeling

Microprocessor design is one giant balancing act. You model application performance and build the best architecture you can in a given die area for those applications. Tradeoffs are inevitably made, as designers are bound by power, area and schedule constraints. You do the best you can this generation and try to get the low-hanging fruit next time.

Performance modeling includes current applications of value and future algorithms that you expect to matter when the chip ships, as well as insight from key software developers (if Apple and Microsoft tell you that they'll be doing a lot of realistic fur rendering in 4 years, you'd better make sure your chip is good at what they plan on doing). Obviously you can't predict everything that will happen, so you continue to model and test as new applications and workloads emerge. You feed that data back into the design loop, and it continues to influence architectures down the road.

During all of this modeling, even once a design is done, you begin to notice bottlenecks in your design in various workloads. Perhaps you notice that your L1 cache is too small for some newer workloads, or that for a bunch of popular games you're seeing a memory access pattern that your prefetchers don't do a good job of predicting. More fundamentally, maybe you notice that you're decode bound more often than you'd like - or alternatively that you need more integer ALUs or FP hardware. You take this data and feed it back to the team(s) working on future architectures.

The folks working on future architectures then prioritize the wish list and work on including what they can.

Comments

  • dishayu - Friday, October 5, 2012 - link

    I derived immense pleasure reading the article. Thank you, Anand. Big ups for the comprehensive read.
    My thoughts:
    I think Intel really dropped the ball by not having unlinked clocks for each core, like Qualcomm has for its S4 Pro processors. There are so many times that, for instance, I have a page open with some animated GIFs. They are strictly single-threaded processes and they won't let the processor go to an idle state. And this is a very, VERY common occurrence that can, IMO, only be solved by adopting unlinked clocks for each core. Three cores can stay in a sleep state (almost perpetually) and the processor runs on a single core with lowered frequency. THAT would be power efficient.
  • dagamer34 - Friday, October 5, 2012 - link

    Uhh... isn't turning off unused cores and overclocking the 4th core within TDP to perform single-threaded tasks exactly what the Turbo Boost introduced in Sandy Bridge does?
  • know of fence - Friday, October 5, 2012 - link

    Reducing power is great and also inevitable, but Intel's move to compete against everything and everybody is alarming. With everyone trying to follow/please Apple, that means nothing good for the consumer: throw-away luxury electronics for exceptionally well-groomed masses.
    Also, isn't it too early to be hyping this stuff?
  • A5 - Friday, October 5, 2012 - link

    Intel has to compete against ARM to keep them from taking over the "good-enough" computing space.

    As for the rest of it, you're not making any sense.
  • jjj - Friday, October 5, 2012 - link

    The ARM problem is not about the product but about price; long term, CPU/SoC ASPs will drop hard now that there is competition. Servers will keep them on life support for a while, but without fundamental changes to their business model they can't make it.
    Intel should remember how they won the market.
  • dishayu - Friday, October 5, 2012 - link

    It's about both. Intel does not have sufficiently low-power parts at all, regardless of the price point.
  • mrdude - Friday, October 5, 2012 - link

    Regardless of whether they set foot into that end of the spectrum or not (and by Anand's analysis that's more likely with Broadwell and on?), they still need to compete on price.

    It's one thing to make a chip; it's quite another to make it competitive with respect to pricing. What works against a distant AMD won't work against ARM.
  • DesDizzy - Sunday, October 7, 2012 - link

    I agree. This seems to be something that most people overlook when addressing the Wintel monopoly. The costs of Wintel products are high within the PC/laptop space. The prices of ARM devices/apps are low within the smartphone/tablet space. How does Wintel square this circle without damaging their business model?
  • Krysto - Friday, October 5, 2012 - link

    You may not agree with Charlie, Anand, but reality seems to agree with him:

    http://www.techradar.com/news/computing/apple/appl...

    I really don't know how you can think Apple would ever start using Intel chips in their iPads when Apple has already proven they want to make their own chips with the A6.

    Also, according to Charlie, Haswell will be something like 40% more expensive than IVB. Atom tablets already seem to start at something like $800. So I wish Intel good luck with that. Ultrabooks and Win8 hybrids won't drop in price any time soon.

    http://semiaccurate.com/2012/10/03/oems-call-intel...
  • Penti - Friday, October 5, 2012 - link

    I don't know how you could fail so badly at reading comprehension; Anand only said the same flying spaghetti monster-damn form factor. Nothing else. There also must be an ecosystem, but if you can run the same app on a tablet as well as a desktop on x86 with more performance than ARM, why wouldn't you see vendors use it? It is a full system, even capable of building itself. It's not about killing ARM. Intel still uses it; they need fairly high-performance RISC chips for stuff like baseband. They had a large market in smartphones before 2006 and they made the choice to sell it off because they had Atom in their lineup. They didn't forget about it.

    It's Microsoft tablets that cost 500-900 dollars even on Atom, but they only need to compete with Windows RT, which is totally retarded as far as corporate customers go and not the same system as 8 Pro - it doesn't run the same software. An Android tablet could use a Z2460 (and the coming Z2580, and after that Valleyview SoCs) and build a 240 dollar tablet. There is no price difference to be had as far as hardware is concerned. Windows 8 tablets are a whole other form factor and device to begin with. Most will have a keyboard and multitouch trackpad.

    He only talks about the same form factor, size and battery life here. In the Microsoft ecosystem there is really no reason to go to Windows RT powered ARM devices, which don't have better performance and run no third-party desktop (Win32/full Windows SDK) software. They also lack the same features in other areas, which makes them devices instead of general computing platforms. Remember, they offer both here. Hell, the built-in email is even worse than the one built into Android since version 3.0 or so; it's a lot worse than third-party mail clients in Android, and it's worse than mail clients in BlackBerry 10, Symbian, iOS and so on. If you're replacing a desktop you're not going with ARM here, not on a Windows device at least. Anand only talks about a new breed of DTR tablets and ultra-portables that will fit in the same form factor and battery life scenarios as ARM tablets. Apple certainly doesn't need to participate here.

    Intel certainly has sales to be made if they move Haswell down to low-power Atom territory when it comes out later next year. It could be used as the only computing device you have (smartphone + hybrid tablet-PC) - replacing desktops, ARM/Atom tablets, media PCs for your TV (just stream with Miracast), et cetera. ARM devices would just be cheaper, less capable devices there. But it's still different targets. Haswell still targets servers (the enterprise market), desktops, and notebooks with larger form factors and power budgets, as well as more portable stuff. Atom is still for the handheld stuff you use with one hand. ARM has moved quite fast, but they have no reason to target high-performance applications or build 100W SoCs that are fast without parallel computing. Applications like high-performance routers, for example, still use licensed and custom MIPS and PowerPC chips. There are plenty of markets where a full-featured ARM Cortex or x86 won't work either. ARM is just moving into the multimedia field, replacing custom architectures in TVs, displacing MIPS, PPC etc. If Apple builds a very large custom CPU architecture compatible with the ARM ISA for workstations, notebooks etc., they will just be in the same position they were with PowerPC and have to compete with the high-performance chips that most can't compete with, even with much larger resources than Apple has. Apple and Samsung have no reason to do so outside handheld devices, low-power servers, consumer-oriented routers, and streaming media boxes, which leaves plenty of room for Intel and all the rest. Plus WiFi and wireless baseband is a huge market in and of itself, and it doesn't matter what the application processor architecture is. Stuff like ARM has competed because you could replace previous products with it easily, thus taking some of the SoC market away from others, but that coincides with the choice to do so.
