02:28PM - And we're done! We'll be working on a deeper Haswell architecture piece over the next couple of days.

02:27PM - Intel isn't disclosing exact details on what aspects of voltage regulation have been integrated

02:27PM - But we'll see lots of the fine grained control from client Haswell platforms in servers

02:27PM - Not going into detail on the Haswell server product today

02:27PM - Haswell will include far more power gates on the platform level

02:26PM - Haswell integrates some but not all of the voltage regulation so Intel can do more fine grained control of the pieces inside the die

02:25PM - Sidenote: it's always hilarious to see how many Intel OEMs and competitors end up in these tech insight sessions

02:25PM - TSX support coming in Linux and Windows

02:23PM - Time for Q&A

02:22PM - Piazza on Haswell GPU: "this is certainly not the end"

02:21PM - Nearing the end - Summary time

02:21PM - In the past there were only two concurrent engines (codec and imaging/scale/composite); now you can do more in parallel as long as there's enough bandwidth to sustain it

02:20PM - Now there are three concurrent video engines: codec, imaging and scale/composition

02:19PM - Hardware image stabilization is new in Haswell

02:18PM - Moved some video processing stuff off the EU array into a dedicated video quality engine

02:15PM - 4Kx2K video acceleration is supported

02:15PM - Usages: video serving, multi-party video conferencing

02:14PM - Introducing hardware based SVC codec, can encode once and playback at multiple resolutions

02:13PM - Higher encode quality, faster Quick Sync with GT3

02:13PM - Now talking about Haswell video processing

02:11PM - GT3 seems to double everything

02:11PM - Half a terabyte per second of internal bandwidth between compute and cache

02:10PM - Doubled the performance of most of the fixed function units for normal rendering on the GT3 part

02:09PM - Added a resource streamer at the front end; it offloads some driver work to the GPU, which does that work on behalf of the driver and lets the CPU go to sleep

02:08PM - Independent voltage/frequency domains for CPU, ring and GPU now?

02:08PM - CPUs can run at low voltage/low frequency, but the GPU can now pull the ring up to feed the engines without pulling up the CPU voltage/frequency

02:08PM - Haswell totally decouples the ring from the CPU

02:08PM - There's now a GT3 part

02:07PM - Haswell GPU architecture is similar to IVB, Broadwell will likely be different

02:05PM - Tom Piazza is on the stage

02:04PM - Now on to graphics innovations

02:04PM - One hour session tomorrow on TSX, hmm I hope it doesn't conflict with another major event...

02:03PM - Hardware can then attempt to extract parallelism with concurrent memory accesses

02:03PM - TSX allows the developer to give hints about concurrent accesses

02:03PM - But what if you have two threads accessing the same table but are updating completely independent things?

02:02PM - Normally when you have many cores working on the same data structure, you typically have one thread handle updates and lock the structure for everything else

02:01PM - Now talking about Intel Transactional Synchronization Extensions (TSX)
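
A rough software model of the idea in the TSX entries above, assuming conflicts are found by comparing the sets of locations each thread reads and writes (the usual textbook description of transactional memory; Intel did not detail the mechanism in this session). `Transaction`, `conflicts` and the row names are hypothetical, and this is a sketch of the concept, not of the hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Locations one thread touched inside a critical section."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Two transactions conflict if either writes a location the other
    reads or writes. Two readers of the same location never conflict."""
    a_hits_b = a.writes & (b.reads | b.writes)
    b_hits_a = b.writes & (a.reads | a.writes)
    return bool(a_hits_b) or bool(b_hits_a)


# Hypothetical table rows standing in for memory locations.
t1 = Transaction(reads={"row_3"}, writes={"row_3"})
t2 = Transaction(reads={"row_7"}, writes={"row_7"})  # independent row
t3 = Transaction(reads={"row_3"}, writes={"row_3"})  # same row as t1

independent = conflicts(t1, t2)  # threads updating unrelated rows
overlapping = conflicts(t1, t3)  # threads updating the same row
print(independent, overlapping)
```

In this model the two threads updating independent rows could both proceed without serializing on the table lock, while the overlapping pair would have to fall back to the lock.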

02:01PM - This will benefit AVX2 code as well as legacy code

02:01PM - Also doubled bandwidth at L2 cache, went from 1 read of the L2 every other clock cycle to a read every clock cycle

01:59PM - This is for the L1 data cache

01:59PM - Can also do a write to the cache: 2 reads + 1 write, each 256 bits wide

01:59PM - Can now do a 256-bit AVX load with a single read of the cache, and there are two load ports
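
The cache figures in the entries above translate into bytes per cycle with simple arithmetic. This is a minimal sketch that uses only the numbers quoted in the session (two 256-bit reads, one 256-bit write, and an L2 read rate going from every other cycle to every cycle); the helper name `bytes_per_cycle` is hypothetical.

```python
BITS_PER_BYTE = 8


def bytes_per_cycle(accesses_per_cycle: int, width_bits: int) -> int:
    """Peak bytes moved per clock for a given number of accesses of a given width."""
    return accesses_per_cycle * width_bits // BITS_PER_BYTE


# L1 data cache: 2 reads + 1 write, each 256 bits wide.
l1_load_bytes = bytes_per_cycle(2, 256)
l1_store_bytes = bytes_per_cycle(1, 256)

# L2: one read every other cycle before, one read every cycle now.
l2_reads_before = 0.5
l2_reads_after = 1.0
l2_speedup = l2_reads_after / l2_reads_before

print(l1_load_bytes, l1_store_bytes, l2_speedup)
```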

01:58PM - Same sizes L1/L2 caches as SNB/IVB

01:58PM - Whenever we double the FLOPS like we did here, you need to double the capability to feed those units

01:57PM - A bunch of new vector and scalar instructions

01:56PM - 4x the peak FP throughput of Nehalem

01:56PM - Since Haswell can do 2 FMAs every cycle per core

01:55PM - AVX2 doubles peak FP throughput of Haswell
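
The peak-throughput claims in the entries above can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes the commonly cited unit counts for the older cores (Nehalem with one 128-bit multiply unit and one 128-bit add unit, Sandy Bridge with one 256-bit multiply and one 256-bit add), which were not stated in this session; the Haswell line uses the two 256-bit FMAs per cycle that were. The function name is hypothetical.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         units: int, flops_per_lane: int) -> int:
    """Peak floating-point operations per core per clock.

    flops_per_lane is 1 for a plain multiply or add and 2 for a fused
    multiply-add, which counts as a multiply plus an add.
    """
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_lane


# Double precision (64-bit elements).
nehalem_dp = peak_flops_per_cycle(128, 64, units=2, flops_per_lane=1)
snb_dp = peak_flops_per_cycle(256, 64, units=2, flops_per_lane=1)
haswell_dp = peak_flops_per_cycle(256, 64, units=2, flops_per_lane=2)

# Single precision (32-bit elements) on Haswell.
haswell_sp = peak_flops_per_cycle(256, 32, units=2, flops_per_lane=2)

print(haswell_dp // nehalem_dp, haswell_dp // snb_dp, haswell_sp)
```

Under those assumptions the ratios line up with the "4x Nehalem" and "doubles" figures quoted on stage.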

01:54PM - Ooh: even deeper dive on Haswell microarchitecture later today

01:54PM - L2 TLB is bigger

01:53PM - We now have the ability to do two FP multiply-adds every cycle

01:53PM - Added another integer ALU, can now execute 2 branches per cycle, another store address port, can do 2 loads and a store every cycle

01:53PM - Haswell adds port 6 and 7, up to 8 ops every cycle

01:53PM - Nehalem/SNB could execute 6 ops every cycle, port 0 - 5

01:52PM - Improved branch prediction

01:52PM - Increasing size of buffers internally, giving us larger OoO window

01:52PM - Now it's time to talk about Haswell CPU microarchitecture

01:51PM - A lot of focus on improving overall platform power, not just the CPU/SoC

01:50PM - Haswell adds more low power IO: I2C, SDIO, I2S, UART

01:50PM - Panel self refresh is supported (if the image doesn't change, display just keeps displaying the same image, rest of the platform goes to sleep)

01:49PM - Worked on increasing efficiency of voltage regulators

01:49PM - To meet the power goals Intel worked with OEMs to give power budgets for main components in the rest of the system

01:48PM - This is how you achieve the 20x platform idle power improvement

01:48PM - We can work with our friends at the process manufacturing side, adapt the process to give us a recipe to fit the processor/die perfectly

01:48PM - Even deeper C-states, can transition between C-states up to 25% faster

01:48PM - Power delivery system is much more fine grained in delivering power to only the pieces that need to be on

01:47PM - That link is optimized for the lowest energy per transfer possible

01:47PM - The link between the CPU and the chipset has been optimized for power, depending on which Haswell part you get

01:47PM - Finer grained voltage/frequency control

01:47PM - Haswell extends the turbo range a little bit

01:46PM - Haswell platform is almost always in this new S0ix active idle state with instant resume

01:45PM - It sounds like Haswell remains in S0 but can quickly transition to active idle, allowing you to get the best of both worlds

01:45PM - "Transparent to well written software"

01:45PM - The hardware does this automatically, continuous, fine grained

01:45PM - Transition times are a lot shorter between high and low power states

01:45PM - This is where we get improvements in platform idle, and battery life

01:44PM - OS thinks the SoC is active, but you get idle power characteristics and can transition between active and idle very quickly

01:44PM - Added completely new set of idle states: S0ix

01:43PM - At the same level of system responsiveness, the system power has come down; transition times to lower power states are quicker now as well

01:43PM - In Haswell, we have worked on making active power and power efficiency much better

01:42PM - And you transition between the two: active state power was in watts, idle states go down to hundreds of milliwatts

01:42PM - IVB had two major power states: S0 (awake) and S3/S4 (sleep)
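
To see why idle power matters so much for battery life, here is a minimal sketch of average platform power as a weighted mix of active and idle power. Every wattage and the 10% active fraction are hypothetical numbers chosen for illustration; only the 20x idle reduction comes from the session.

```python
def average_power_w(active_w: float, idle_w: float, active_fraction: float) -> float:
    """Time-weighted average power for a platform that is active part of the time."""
    return active_fraction * active_w + (1.0 - active_fraction) * idle_w


# Hypothetical figures, not Intel's: 5 W active, 1 W idle, active 10% of the time.
ACTIVE_W = 5.0
OLD_IDLE_W = 1.0
NEW_IDLE_W = OLD_IDLE_W / 20  # the 20x platform idle improvement quoted on stage
ACTIVE_FRACTION = 0.10

before = average_power_w(ACTIVE_W, OLD_IDLE_W, ACTIVE_FRACTION)
after = average_power_w(ACTIVE_W, NEW_IDLE_W, ACTIVE_FRACTION)
print(f"average before: {before:.3f} W, after: {after:.3f} W")
```

With a mostly idle workload like this one, the idle term dominates the average, so cutting idle power moves the total far more than the same percentage cut in active power would.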

01:41PM - When you get into those power levels (8W), you can get into very attractive tablets, and also think about going fanless

01:41PM - We can also have the same graphics performance at half the power

01:41PM - At the same power level, Haswell achieves twice the graphics performance [over IVB]

01:40PM - Now talking about Haswell Power Management

01:40PM - "Haswell adds agility"

01:39PM - Active power: from tablet to desktop

01:39PM - Design points in the past still exist, but adding lower power design points that we never had before

01:38PM - Haswell Modularity: 2 - 4 cores, GT1 - GT3 graphics

01:38PM - The same power enhancements you need to get into tablets actually benefit many core server designs as well

01:35PM - Haswell will go from tablets to servers and everything in between

01:35PM - Today's disclosure will focus on what's new

01:35PM - Haswell Design Philosophy: retain prior SNB/IVB microarchitecture features, Hyper Threading, Turbo Boost, Ring Interconnect

01:34PM - Span of Haswell family is larger than previous architectures

01:34PM - Haswell is a tock, second 22nm CPU but significant change at the platform and architectural level

01:32PM - We're going to get a high level architecture disclosure as well as some indication of what we'll see in client deployments of Haswell

01:31PM - Ronak Singhal, one of the Haswell architects, is talking now

01:30PM - Seats are filling up, we're waiting for the session to begin

42 Comments

  • dishayu - Wednesday, September 12, 2012

    Besides, if we're honest, we know mostly all that is to know about iPhone 5 anyways. Not so much about Haswell.
  • melgross - Wednesday, September 12, 2012

    We don't know much of anything about the SoC. we're seeing rumors about a 32nm A5, 32nm A5x, and now, a 32nm A6.

    Now that it's 12:45 pm, Wednesday, we'll know shortly.
  • softdrinkviking - Tuesday, September 11, 2012

    seems like haswell is more of an enhanced version of IVB/SNB rather than the major change that intel claims. that would take redesigning rather than adding features and increased provisioning.
  • 1008anan - Tuesday, September 11, 2012

    Will the 10 watts TDP Haswell SoCs have:
    --2 CPU Cores per CPU SoC
    --2 CPU threads per Core or a total of 4 CPU threads per CPU SoC
    --8 double precision FMAs and 16 double precision FMAs per thread, or a total of 32 double precision FMAs and 64 double precision FMAs per CPU SoC
    --At a 1 Gigahertz clock would this mean a theoretical maximum of 32 Gigahertz double precision and 64 Gigahertz single precision excluding embedded graphics per CPU SoC.
    Or will the 10 watt TDP Haswell SoC not have 16 double precision FMAs and 32 single precision FMAs per cycle per core?

    In other words will the ultra low voltage Haswell CPU cores have 32 single precision flops per clock and 16 double precision flops per clock? Or are the ultra-low voltage Haswell CPU cores different from the 35 watt TDP or desktop Haswell CPU cores?

    Another question I have is what will be the lowest TDP Haswell CPU SKU that has 4 CPU cores, 8 CPU threads. For Ivy Bridge it is the 35 watt TDP mobile i7.

    Too bad Intel doesn't seem to be sharing much about Haswell Xeon parts. :-(
  • lmcd - Wednesday, September 12, 2012

    Is this equivalent to the dual-integer design of Bulldozer architecture? I know that earlier hyperthreading is not equivalent but does that put one Haswell core equivalent to one Bulldozer module? Not in terms of speed/power but in terms of resources.
  • jeroompje - Wednesday, September 12, 2012

    How many Intel sata-3 ports will Haswell make available?

    cheers,
    Jerome.
  • Lucian Armasu - Wednesday, September 12, 2012

    Haswell only has support for OpenGL 4.0? Well that's very disappointing.
  • jadedcorliss - Monday, September 17, 2012

    Considering that they've updated the OpenGL version with patches in the past, I wouldn't be surprised if this means Haswell will fully support OpenGL 4.0 at launch, and that they'll be working on a patch to 4.2 or 4.3 around then.
  • Cloakstar - Wednesday, September 12, 2012

    Post to the bottom of the frame. Reverse the chronology. (Oldest -> Newest)

    As already stated, if viewing live, the text presently moves as new posts are made.

    Also, if viewing after the fact, the present format requires one to scroll to the bottom to start reading.

    The only time the present format works is if one happens to catch the blog as it starts.

    At the very least, store the blog line by line, so it is easy to reverse order after it is recorded.
  • Arbie - Wednesday, September 12, 2012


    Will it run Crysis?
