
  • Braumin - Sunday, February 09, 2014 - link

    It's fascinating to see how they can just keep finding ways to do things better. I'm sure whoever figured out how to reduce leakage on the memory interface by 100x got a nice Christmas bonus!

    Any word on how Broadwell is coming along? Do you think we'll see it this year or will it be moved to 2015?
  • przemo_li - Sunday, February 09, 2014 - link

    Judging by Linux enablement, Broadwell is coming along nicely. 3.14/3.15 should do the job on the kernel side. Mesa is also progressing well. H2 2014 should see a decent Broadwell out-of-the-box experience on stable Linux distros.
  • TheinsanegamerN - Sunday, February 09, 2014 - link

    So, it should be out in H2 2014 then?
  • Krysto - Monday, February 10, 2014 - link

    If they're still announcing "new" Haswell chips at this point, then at best we'll see a couple of Broadwell designs ship by Christmas, but definitely in no "mass market" capacity.
  • toyotabedzrock - Monday, February 10, 2014 - link

    Right before the Linux kernel reaches 3.15, or right after.
    All the basics for Broadwell have been added for 3.14 but are partly disabled on the GPU side by default. The PCI IDs were added already.
  • masimilianzo - Sunday, February 09, 2014 - link

    Any word on Broadwell? I heard it will have a new graphics architecture and DDR4 support.
    I'd love to see an integrated PCH on at least every mobile processor.
  • dragonsqrrl - Sunday, February 09, 2014 - link

    I've only heard about DDR4 support for Haswell-E. Where did you hear that Broadwell would support DDR4?
  • extide - Sunday, February 09, 2014 - link

    Broadwell in the client world will NOT support DDR4.
  • stmok - Sunday, February 09, 2014 - link

    DDR4 isn't coming to the mainstream Broadwell version. It still uses DDR3.

    As mentioned, Haswell-E will use DDR4.
    (More specifically LGA2011-3 spec uses DDR4...It is NOT compatible with existing LGA2011).

    The main differences of Broadwell vs Haswell (that I know of so far) are:

    * Die shrink; 14nm process node.

    * GT3e (Iris Pro IGP) will be available in more models compared to Haswell.
    => Expect it to be in the multiplier unlocked "K" versions of the Core i5 and i7 desktop processors.
    => Still use Gen8 architecture IGP from Haswell.

    * Both Haswell-Refresh and Broadwell will use a different LGA1150 socket variant.
    (Physically unchanged, but electrically different to current LGA1150...Intel has changed the electrical specifications! Effectively killing compatibility between current Haswell, and upcoming Haswell-Refresh/Broadwell processors!)

    * The upcoming 9-series chipset will introduce SATA Express interface.

    * Broadwell introduces a few new instructions involving:
    => Improving multiple precision arithmetic (integer) performance.
    => Meeting random number generation specifications like NIST SP 800-90B.
    => Hardware prefetch.

    If you want DDR4 to be mainstream, wait for Skylake in 2015.
    => Dual-Channel DDR4.
    => PCIe 4.0 slots.
    => AVX 3.2 instructions.
    => Gen9 architecture IGP.
  • qwerty109 - Monday, February 10, 2014 - link

    stmok, minor correction,
    > => Still use Gen8 architecture IGP from Haswell.
    Haswell is GEN7.5, Broadwell is GEN8.0
  • TiGr1982 - Monday, February 10, 2014 - link

    No, Haswell Refresh is expected to be just a minor clock speed bump relative to current Haswell (and a die stepping change at best), and thus compatible with the current LGA1150 socket for the current Haswell line-up.

    As an argument supporting this, Gigabyte recently (two weeks ago) released a new F8 BIOS for my motherboard (Z87X-UD3H) stating "Support New 4th Generation Intel Core Processors", while all existing Haswell models were already supported by previous BIOSes (F7 and earlier).
  • blanarahul - Sunday, February 09, 2014 - link

    So, how many more years will it take to create the perfect architecture? We are pretty close, but how many years (months?) till we reach the limit?
  • tipoo - Sunday, February 09, 2014 - link

    Perfect is ill-defined. Engineering is all about tradeoffs; one person's perfect processor is not another's. That said, we've enjoyed gains in processor speed in large part due to fabrication process shrinks, and because of them more cache and transistors thrown at problems. We should start hitting some weird issues after 14nm, so I'll be curious to see what the industry does after that: whether they can keep shrinking, or whether we'll be stuck there for a while with only architecture left as a way to improve performance.
  • YazX_ - Sunday, February 09, 2014 - link

    @tipoo, couldn't agree more.
  • purerice - Sunday, February 09, 2014 - link

    Maybe by "perfect" s/he meant either the point where further (noticeable) efficiency is (virtually) impossible (shrinks aside), or the point where things are "fast enough" for (virtually) any user. Or maybe the point where further efficiency gains can no longer (economically) be made.

    I'd really love to see AMD pull a rabbit out of a hat with their next CPU design. If they can't, I really hope ARM can. Haswell is an architectural masterpiece yet somehow at the desktop/workstation level fails to do much for performance/watt or OC over IB/SB. None of that makes much sense. If -Y and -U chips see major gains in graphics and general processing, why don't the -K chips too?
  • DanNeely - Sunday, February 09, 2014 - link

    Because Intel is optimizing their CPU designs for increasingly low power levels. The sweet spot used to sit between full-power laptop CPUs and mid-range desktops, leaving a decent amount of headroom above the top high-end parts for additional OC. Now the optimum is centered on the low-power laptop parts, with the result that the equivalent of the OC gain we used to get is being used up in the spread between full-power laptop and desktop parts, with very little left for those of us willing to crank the power up even higher.
  • blanarahul - Monday, February 10, 2014 - link

    That's precisely what I meant. Sorry, I wasn't clear enough in my first post.
  • BMNify - Sunday, February 09, 2014 - link

    "the perfect architecture" there's no such thing, even if you refer to something as simple as x264 encoding a UHD-1 video...

    for instance "Cisco is predicting a nearly 11-fold increase in global mobile data traffic over the next four years to reach 190 exabytes in 2018, with Asia Pacific leading the way ..."

    the fact that by estimating the new haswell details numbers above its clear that you can't real time encode x264 UHD-1 3840×2160 video with high quality settings ,never mind do the real HEVC Main 10 profile UHD-2 7680×4320 digital broadcast expected in 3-6 years from now,

    if you're asking what comes next to accommodate this massive data throughput for real time encoding for the masses then its probably going to be wideIO 2.5D then 3D High Bandwidth Memory , and then Si photonics in combination with these options....

    although you just know that the antiquated server providers will try and keep that away from the masses as long as possible to keep their margins high
  • MrDiSante - Sunday, February 09, 2014 - link

    The last two images in the article aren't showing up - if I try to open them directly I get a 404.
  • dragonsqrrl - Sunday, February 09, 2014 - link

    "I suspect the ULT 2+2 configuration is similar in size to the quad-core + GT2 configuration."

    Do you mean ULT 2+3?
  • dragonsqrrl - Sunday, February 09, 2014 - link

    Disregard; sorry, for some reason I thought you meant transistor count and not die area. I'm actually a little curious why the transistor counts don't seem to correlate much with die area, despite the fact that they're all the same architecture. Are ULT processors manufactured on a different, power-optimized 22nm process?
  • stickmansam - Sunday, February 09, 2014 - link

    I'm assuming the GPU and CPU have differing densities.
  • dragonsqrrl - Sunday, February 09, 2014 - link

    That might make sense if ULT 2+2 had a larger GPU than quad-core GT2, but it doesn't. It has the same GPU configuration and half the CPU cores, yet it's approximately the same die area as quad-core GT2.
  • stickmansam - Monday, February 10, 2014 - link

    Well, it is an estimate; it may very well be that the ULT 2+2 is a harvested chip?
  • dragonsqrrl - Monday, February 10, 2014 - link

    ... then why give an estimate?
  • IntelUser2000 - Monday, February 10, 2014 - link

    That doesn't make sense. Looking at the 2+3 config above, the extra 20 EUs is an exact mirror. Therefore, they can simply cut it off and make a 2+2 part.

    The Gallery is showing 5 different parts with 2+3 and 2+2 being separate, so why would the die be the same?
  • p1esk - Sunday, February 09, 2014 - link

    If the embedded RAM frequency is 1600 MHz, how could its interface produce 6.4 GT/s?
    Shouldn't it be 3.2 GT/s?
  • Stahn Aileron - Sunday, February 09, 2014 - link

    You are assuming it is double-pumped (DDR). There is such a thing as quad-pumping (Quad Data Rate, QDR) relative to the operating frequency.

    With the eDRAM that physically close to the CPU, quad-pumping at 1.6GHz wouldn't be much of a problem given the short interconnects.
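To make the double- vs quad-pumping arithmetic concrete, here's a minimal sketch (the 1.6GHz clock is from the article; the helper name is mine):

```python
def transfers_per_sec_mt(clock_mhz, pumps_per_cycle):
    """Effective transfer rate in MT/s: base clock times transfers per cycle."""
    return clock_mhz * pumps_per_cycle

# Double-pumped (DDR): two transfers per cycle -> 3.2 GT/s
ddr = transfers_per_sec_mt(1600, 2)   # 3200 MT/s
# Quad-pumped (QDR): four transfers per cycle -> 6.4 GT/s, matching the article
qdr = transfers_per_sec_mt(1600, 4)   # 6400 MT/s
```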
  • Devfarce - Sunday, February 09, 2014 - link

    I think it's interesting that they used the same circuitry for the eDRAM and the onboard PCH for the ULT models. I was wondering why there wasn't also an onboard PCH with the GT3e models, but I'm now really impressed with the answer. I was also under the impression that the eDRAM was built on the 32nm SoC process, so a lot of good info is presented here.
  • extide - Sunday, February 09, 2014 - link

    Kind of surprising that the OPIO is single-ended, and not differential. I guess with such a short distance you can get away with that and just have the wider bus. Cool!
  • iwod - Sunday, February 09, 2014 - link

    Is there any news on why Xeon and all server chips are always one node behind? And soon the desktop Haswell Refresh as well? Will Intel simply relegate all desktop and server parts to older nodes while focusing new node development on ultra-low-power laptop and mobile devices?
  • fluxtatic - Sunday, February 09, 2014 - link

    The Xeon variants remain one node behind due to obligations for extended validation and support of the platforms. Intel can develop a new architecture and get it ready for consumer-level release, making back the R&D money they're putting into it, while simultaneously developing, testing, and validating the Xeon variants. It just makes sense to release Xeon on a mature architecture (i.e., one generation behind), as there is a significant amount of time and money involved in adapting the current generation to the Xeon platform requirements.

    Lately, Intel is moving in the direction you describe - targeting mobile (well, laptop) processors first, followed by desktop, and then server/workstation. That makes sense, as well, as laptop processor sales are outstripping desktop. As a bit of speculation, it may be easier to target the lowest-power targets first, and scale up, rather than trying to do the reverse.
  • BMNify - Sunday, February 09, 2014 - link

    "The 128MB eDRAM is divided among eight 16MB macros. The eDRAM operates at 1.6GHz and connects to the outside world via a 4 x 16-bit wide on-package IO (OPIO) interface capable of up to 6.4GT/s. The OPIO is highly scalable and very area/power efficient.

    The Haswell ULT variants use Intel's on-package IO to connect the CPU/GPU island to an on-package PCH. In this configuration the OPIO delivers 4GB/s of bandwidth at 1pJ/bit. When used as an interface to Crystalwell, the interface delivers up to 102GB/s at 1.22pJ/bit. That amounts to a little under 1.07W of power consumed to transmit/receive data at 102GB/s.

    By keeping the eDRAM (or PCH) very close to the CPU island (1.5mm), Intel can make the OPIO extremely simple."

    Hmm, Anand, care to explain how you state "a 4 x 16-bit wide on-package IO (OPIO) interface capable of up to 6.4GT/s"? That's a total of 25.6GB/s (at 4.8GT/s it would provide 19.2GB/s total bandwidth to the processor), and yet you say "the interface delivers up to 102GB/s at 1.22pJ/bit. That amounts to a little under 1.07W of power consumed to transmit/receive data at 102GB/s," implying that it has four times the bandwidth in real terms?

    So is the "4 x 16-bit wide" being used here to obfuscate the fact that it's really a max data throughput of 6.4GB/s per link x 4, matching the generic QuickPath Interconnect speeds?
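As a side note, the power half of the quoted passage can be sanity-checked from the stated figures alone (a rough sketch; the numbers come from the quoted article text, the variable names are mine):

```python
# Cost of moving ~102.4 GB/s at 1.22 pJ per bit
bandwidth_bytes_per_s = 102.4e9      # both directions combined
energy_per_bit_j = 1.22e-12          # 1.22 pJ/bit
watts = bandwidth_bytes_per_s * 8 * energy_per_bit_j
print(round(watts, 2))               # ~1.0 W, under the quoted 1.07W
```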
  • Stahn Aileron - Monday, February 10, 2014 - link

    Some rough math:

    4 x 16 bit = 8 bytes/transfer.
    8 bytes/transfer x 6.4 GT/s = 51.2 GB/s.

    So 51.2 GB/s raw bandwidth (no overhead accounted for). 102 GB/s total throughput, as Anand states, sounds like the bus is bi-directional with 51.2 GB/s possible in each direction. So 51.2 GB/s x 2 (both directions at the same time) gives you "up to 102 GB/s" throughput overall.

    How did you arrive at the 25.6 GB/s value originally?
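The arithmetic above, spelled out as a quick sketch (figures from the article; variable names are mine):

```python
bus_width_bits = 4 * 16                  # four 16-bit OPIO clusters
bytes_per_transfer = bus_width_bits // 8
rate_gt_s = 6.4                          # giga-transfers per second
one_way_gb_s = bytes_per_transfer * rate_gt_s   # 51.2 GB/s per direction
both_ways_gb_s = one_way_gb_s * 2               # 102.4 GB/s full duplex
```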
  • DanNeely - Monday, February 10, 2014 - link

    The 102.5 number appears to come from Intel slide #16 (AT gallery #6). That slide says there are 8 data clusters operating at 6.4GB/s each, with the final x2 apparently coming from the bus supporting simultaneous transfer in each direction.
  • twotwotwo - Sunday, February 09, 2014 - link

    I'm sorta curious about the potential for CRW, or some future version of it, for CPU performance.

    More-dynamic programming languages tend to have largish working sets and lots of indirection. In general, the processor still stalls waiting on RAM a lot across many workloads. Maybe an "L4" could be a nontrivial win for server-y workloads if the latency/size were right and they shipped it for servers. It's hard to tell; the fact that they're not talking about CRW as a CPU boost does say something.
  • zodiacfml - Monday, February 10, 2014 - link

    I still haven't seen benchmarks showing whether or not the eDRAM improves some application performance... or did I miss it?
  • twotwotwo - Monday, February 10, 2014 - link

    I'd love info on this, too. I just poked around in the big list and did see this:

    Intel Core i7-4960HQ @ 2.60GHz (6M L3, CRW) 10,325

    Intel Core i7-4900MQ @ 2.80GHz (8M L3) 9,123

    If I'm reading it right, the part with CRW (the eDRAM package) did about 13 percent better despite a lower nominal clock and less L3, on something presented as a pure CPU benchmark. So it did something.

    But someone who knows what they're doing could dig into whether that impression is right, and what sort of app sees the biggest gains.
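For what it's worth, the relative gain implied by those two scores works out as follows (scores as quoted above; this says nothing about which applications benefit most):

```python
with_crw, without_crw = 10325, 9123      # benchmark scores quoted above
gain = (with_crw - without_crw) / without_crw
print(f"{gain:.1%}")                     # 13.2%
```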
  • toyotabedzrock - Monday, February 10, 2014 - link

    Your die size for the ULT 2+2 is way higher than it should be. There is no reason it would be much bigger than the 2+2.
  • TiGr1982 - Monday, February 10, 2014 - link

    Indeed, 140-160 mm^2 should be the case for ULT 2+2, not 180 mm^2.
  • Akaz1976 - Monday, February 10, 2014 - link

    Does any of this mean that we can get a Core i5 processor in an iPad-like form factor (with 10+ hours of battery life and full Win8)?
  • jimjamjamie - Monday, February 10, 2014 - link

    Surface Pro 2?
  • azazel1024 - Monday, February 10, 2014 - link

    Well, Akaz, you can get a Bay Trail-T based processor in an iPad-like format that gets 10+ hrs of battery life and runs full Windows 8.1.

    It's not an i5, but it's pretty nice.

    Come Airmont/Cherry Trail-T this coming fall, that should be a rather impressive thing too (though Bay Trail-T is actually pretty decent).

    My T100 can easily push 10 hrs of battery life in lightweight use, and more than 12 just watching movies (about 5-6 hrs in some pretty heavy gaming). It's still a bit of a lightweight compared to my i5-3317U based laptop, but it ain't shabby either. Not very noticeable in basic tooling around the OS and web-based stuff, just noticeable in things like Photoshop and Lightroom (and gaming), but even there it can get the job done, so long as you aren't expecting full laptop/desktop levels of performance.
  • mikk - Tuesday, February 11, 2014 - link

    ~180mm² for 2+2 ULT is nonsense; that's as big as 4+2 Haswell. 130mm² is more accurate.
