Intel's Haswell Architecture Analyzed: Building a New PC and a New Intel

by Anand Lal Shimpi on October 5, 2012 2:45 AM EST

TSX

Johan did a great job explaining Haswell's Transactional Synchronization eXtensions (TSX), so I won't go into as much depth here. The basic premise is simple, although the implementation is quite complex.

It's easy to demand well-threaded applications from software vendors, but actually writing code that scales well across an arbitrary number of threads isn't. Parallelizing truly independent tasks is the low-hanging fruit; it's the tasks that all access the same data structure that create problems. With multiple cores accessing the same data structure independently of one another, there's the risk of two different cores writing to the same part of the structure at the same time. Only one set of data can be right, and dealing with this concurrent access problem can get hairy.

The simplest way to deal with it is to lock the entire data structure as soon as one core starts accessing it, and only allow that one core write access until it's done. Other cores still get access to the data structure, but serially rather than in parallel, to avoid any data integrity issues.
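To make the idea concrete, here is a minimal sketch of that coarse-grained approach in Python. The class and function names (`CoarseLockedTable`, `run_workers`) are made up for illustration and don't come from any real library. Every access goes through one lock, so threads that touch completely unrelated keys still wait on each other.

```python
import threading


class CoarseLockedTable:
    """Hypothetical table guarded by a single lock: all access is serialized."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock for the whole structure
        self._data = {}

    def add(self, key, amount):
        with self._lock:  # only one thread at a time gets past this point
            self._data[key] = self._data.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def run_workers(table, n_threads=4, n_updates=1000):
    """Each worker updates its own key, yet all of them queue on the same lock."""

    def worker(i):
        for _ in range(n_updates):
            table.add(f"key{i}", 1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


table = CoarseLockedTable()
run_workers(table)
print([table.get(f"key{i}") for i in range(4)])
```

The results are always correct, which is the appeal. The cost is that the four workers never overlap inside `add`, even though none of them shares a key with another.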

This is by far the easiest way to deal with multiple threads accessing the same data structure, but it also prevents any performance scaling across multiple threads/cores for work that touches that structure. As focused as Intel is on increasing single-threaded performance, a lot of die area goes to waste if applications don't scale well with more cores.

Software developers can instead implement more fine-grained locking of data structures, but doing so increases the complexity of their code.
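As a rough illustration of what fine-grained locking looks like, the sketch below splits a table into stripes, each with its own lock. The name `StripedLockedTable` and the stripe count are assumptions chosen for the example. Python is used only to show the structure; its interpreter lock means you should not expect a real speedup from this code.

```python
import threading


class StripedLockedTable:
    """Hypothetical table with one lock per stripe instead of one global lock."""

    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._buckets = [{} for _ in range(n_stripes)]

    def _stripe(self, key):
        # Map a key to the stripe (and therefore the lock) that guards it.
        return hash(key) % len(self._locks)

    def add(self, key, amount):
        i = self._stripe(key)
        with self._locks[i]:  # threads on different stripes don't block each other
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def total(self):
        # A whole-table operation has to take every lock, always in the
        # same order, so two callers can never deadlock against each other.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(sum(bucket.values()) for bucket in self._buckets)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def worker(table, i, n_updates):
    for _ in range(n_updates):
        table.add(f"key{i}", 1)


table = StripedLockedTable()
threads = [threading.Thread(target=worker, args=(table, i, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.total())
```

The extra complexity shows up quickly: there is now a key-to-lock mapping to get right, and anything that spans stripes (like `total`) needs a lock-ordering rule that the single-lock version never had to think about.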

Haswell's TSX instructions allow the developer to shift much of the complexity of managing locks to the CPU. Using the new Hardware Lock Elision (HLE) and its XACQUIRE/XRELEASE instruction prefixes, developers can mark a section of code for transactional execution. Haswell will then execute the code as if no locks were in place, and if it completes without conflicts the CPU commits all writes to memory and enjoys the performance benefits. If two or more threads attempt to write to the same area in memory, the transaction is aborted and the code is re-executed traditionally, with locks. The XACQUIRE/XRELEASE prefixes are ignored by earlier architectures, so backwards compatibility isn't a problem.
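The flow described above (run optimistically, commit if nobody interfered, otherwise abort and redo the work under the lock) can be modeled in software. The sketch below is only an analogy and does not use TSX: the class name `ElisionModelCounter` is made up, a version number stands in for the hardware's conflict detection, and a short lock stands in for the CPU's atomic commit.

```python
import threading


class ElisionModelCounter:
    """Software model of optimistic execution with a locked fallback.

    This is NOT real TSX. Hardware detects conflicts through the caches;
    here a version number plays that role, and a brief lock models the
    atomic commit step.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self.value = 0
        self.commits = 0  # speculative runs that committed
        self.aborts = 0   # speculative runs discarded and redone under the lock

    def apply(self, update):
        # Speculative phase: do the work without holding the lock.
        start_version = self._version
        speculative = update(self.value)

        with self._lock:
            if self._version == start_version:
                # Nobody else wrote in the meantime: commit the speculative result.
                self.value = speculative
                self.commits += 1
            else:
                # Conflict: throw the speculative result away and
                # re-execute the update the traditional way, under the lock.
                self.aborts += 1
                self.value = update(self.value)
            self._version += 1


def worker(counter, n_updates):
    for _ in range(n_updates):
        counter.apply(lambda v: v + 1)


counter = ElisionModelCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value, counter.commits, counter.aborts)
```

The final value is the same whichever path each update took. How many updates committed speculatively versus fell back to the lock depends on thread timing and will vary from run to run.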

Like most new instructions, it's going to take a while for Haswell's TSX to take off: we'll need to see significant adoption of Haswell platforms as well as developers embracing the new instructions. If it is adopted, TSX stands to improve performance everywhere from client to server workloads. This is definitely one to watch and be excited about.

Haswell also continues improvements in virtualization performance, including big decreases in guest/host transition times.

245 Comments

tipoo - Sunday, October 7, 2012
I don't think so, doesn't the HD4000 have more bandwidth to work with than AMD's APUs yet offer worse performance? They still had headroom there. I think it's just for TDP, they limit how much power the GPUs can use since the architecture is oriented at mobile.

magnimus1 - Friday, October 5, 2012
Would love to hear your take on how Intel's latest and greatest fares against Qualcomm's latest and greatest!

cosmotic - Friday, October 5, 2012
Ah, an MPEG2 encoder. Just in time!

jamyryals - Friday, October 5, 2012
This made me :)

name99 - Friday, October 5, 2012
We laugh but one possibility is that Intel hopes to sell Haswells inside US broadcast equipment.
There isn't much broadcast equipment sold, but the costs are massive, and there's no obvious reason not to replace much of that custom hardware with Intel chips.
And much of the existing broadcast hardware (at least the MPEG2-encoding part) is obviously garbage --- the artifacts I see on broadcast TV are bad even for the prime-time networks, and are truly awful for the budget independent operators.

Much like they have written a cell-tower stack to run on i7's to replace the similarly grossly over-priced custom hardware that lives in cell towers, and are currently deploying in China. Anand wrote about this about two weeks ago.

vt1hun - Friday, October 5, 2012
Do you have an idea when Intel will move to DDR4? Not with Haswell according to this article.

Thank you

tipoo - Friday, October 5, 2012
Haswell EX for servers will support DDR4, but even Broadwell on desktops is only DDR3; we won't see DDR4 in desktops until 2015.

jwcalla - Friday, October 5, 2012
We'll probably see DDR4 in the ARM space before we have it on Intel.

Maybe this should be AMD's focus of attack: if they can't compete on performance, at least try on chipset features.

Perhaps Intel's biggest concern would be if somebody comes along with a super-efficient x86 emulator for ARM. Going forward, "legacy applications" is going to be an increasingly important selling point to prevent ARM inroads on the low end.

Microsoft keeping their Windows ARM version locked-down is a key to that too, and likely a deference to their relationship with Intel. But Apple is less likely to similarly constrain themselves.

meloz - Saturday, October 6, 2012
>We'll probably see DDR4 in the ARM space before we have it on Intel.

>Maybe this should be AMD's focus of attack: if they can't compete on performance, at least try on chipset features.

The problem with DDR4 is likely going to be the price. We all know how the memory industry likes to jack up the prices whenever a new spec comes out. Remember how expensive DDR3 was when it started to replace DDR2?

Some people joke that this transition is the only time they make any money in the RAM business, and considering the low prices of DDR3 you have to wonder.

DDR4 might offer some performance and power advantage on release, but it will likely be more expensive and take time (12-18 months?) to offer a compelling performance / $ advantage over cheap DDR3 variants.

If AMD is trying to position itself as 'value' brand, chaining themselves to DDR4 (before Intel's volume brings down the prices for everyone) could spell their doom.

Kevin G - Friday, October 5, 2012
Intel is set to launch Ivy Bridge EX on a new socket late in 2013. The on-die controller will likely use memory buffering similar to what Nehalem-EX and Westmere-EX use. The buffer chips may initially use DDR3 but this would allow for a trivial migration to DDR4 since the on-die controller doesn't communicate directly with the memory chips.

Come to think of it, Intel could migrate Nehalem-EX/Westmere-EX to DDR4 with a chipset upgrade. Vendors like HP put the buffer chips and memory slots on a daughter card so only that part would need replacement.
