The successor to Pentium 4 is...

For quite some time now we've been trying to figure out what the successor to the Pentium 4's Netburst architecture would be. When the Pentium M was first released, everyone expected it to be the direct successor to the Pentium 4, but things obviously didn't work out that way.

Intel had Tejas, the successor to Prescott, ready to go, but by that time it was clear that the path they had chosen for the Pentium 4 had come to an end - limited by power. The Pentium M was a reasonable competitor, but not exactly a revolutionary successor to the Pentium 4. Based on our conversations and our experiences at IDF, we're finally able to start piecing together what the eventual successor to the Pentium 4 will be. Remember that the Pentium 4 architecture will continue to exist throughout 2005 as the Pentium D and Pentium Extreme Edition, but with Intel's decision to drop the number 4, it's clear that they are ready for a departure from the Pentium 4 brand and architecture.

The first question has always been pipeline depth: will the successor to Netburst have a long pipeline like Prescott, or a short pipeline like the Pentium M? The answer appears to be somewhere in between the Pentium M and Prescott, realistically much closer to Willamette's 20-stage integer pipeline than Prescott's 31-stage pipe, for strictly power reasons. Intel is no longer doing as much research in branch prediction as it once was, indicating an end to the extreme pipeline growth that we've seen since the introduction of the Pentium 4. There has been a lot of research into areas such as continuous flow pipelines, but it's unclear whether that sort of technology will make its way into the next iteration of the Pentium line.

A lot of the lessons learned in the Pentium M will of course be applied to the Netburst successor, with Micro-Ops Fusion being mentioned quite frequently. Intel management is finally aware that clock speed isn't the sole seller of CPUs, so this time around they are more willing to design more elegant, higher-IPC cores at lower clock speeds. A lot of this is due to the success of Centrino, which is part of the reason you see a switch to the Pentium brand name instead of Pentium 4; Pentium may very well become a platform much like Centrino.

For the next-generation desktop microarchitecture, Intel still appears to be committed to the current style of big out-of-order cores, meaning that we won't see any Cell-style architectures from Intel this time around. For the most part, we think this makes a lot of sense at present given the applications that are currently being run. Intel's thinking is this: if they were to move immediately to a simpler core architecture and use a large number of them in parallel, that leaves too much opportunity for another company to build a CPU made up of fewer, more powerful cores, which on current applications would perform better, or at least be easier to program for.

In the generation after the Pentium 4's successor, things may change, as Intel has talked about having a handful of big cores alongside multiple smaller cores for more specific, extremely parallel workloads. Looking at Intel's view of its microprocessors starting around 2010, they start to look a lot like Cell. What may end up happening is that Cell simply proves to be a bit ahead of its time in the marketplace.

Hyper-Threading (SMT) will not die with the Pentium 4; in fact, the number of threads per core will go from 2 up to 4 before the end of the decade. The move to 8 threads per core won't happen anytime soon, however: apparently there is a pretty sizeable performance gain from enabling 4 threads per core, but not as much when going from 4 to 8 threads.

Larger and software-controlled caches will be much more common going forward, which is also eerily similar to the Cell architecture (the Cell SPEs have only local memory, which is close in spirit to a software-controlled cache).
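
To make the distinction concrete, here is a minimal C++ sketch of the two models. It is our own illustration, not anything Intel has disclosed; the 256KB local store size and the memcpy standing in for a DMA transfer are assumptions for illustration only. With a hardware cache the program simply touches memory; with a software-controlled cache (or a Cell-style local store) the program explicitly stages each tile of its working set on chip before computing on it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 256KB software-managed local store, standing in for the kind of
// on-chip memory a Cell SPE (or a software-controlled cache) exposes directly
// to the program rather than managing transparently in hardware.
constexpr std::size_t kLocalStoreBytes = 256 * 1024;
static float local_store[kLocalStoreBytes / sizeof(float)];

// Conventional hardware cache: the program just reads main memory and the
// cache controller decides, invisibly, what stays on chip.
float sum_hardware_cached(const std::vector<float>& data) {
    float sum = 0.0f;
    for (float v : data) sum += v;   // cache fills and evictions happen behind our back
    return sum;
}

// Software-controlled cache: the program stages each tile of the working set
// into on-chip memory itself, computes on it, then moves to the next tile.
// On Cell this staging would be an explicit DMA; memcpy stands in for it here.
float sum_software_managed(const std::vector<float>& data) {
    constexpr std::size_t kTileElems = kLocalStoreBytes / sizeof(float);
    float sum = 0.0f;
    for (std::size_t base = 0; base < data.size(); base += kTileElems) {
        const std::size_t count = std::min(kTileElems, data.size() - base);
        std::memcpy(local_store, data.data() + base, count * sizeof(float)); // "DMA in"
        for (std::size_t i = 0; i < count; ++i) sum += local_store[i];
    }
    return sum;
}
```

The second version is clearly more work for the programmer (or the compiler), which is exactly why software-managed memory has so far stayed in specialized designs like Cell.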

You can expect a continued focus on SIMD performance; a perfect example is the improvement in SIMD performance in Yonah's core that we reported on earlier.
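
For readers who haven't looked at SIMD code, a short SSE sketch shows what that focus buys: one 128-bit instruction operates on four packed floats at once, so a core that decodes and issues these instructions faster gets four lanes of work per operation. This is a generic illustration, not Yonah-specific code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (the original 128-bit packed-float instructions)

// Adds two float arrays four elements at a time using 128-bit SSE registers.
// For brevity this sketch assumes n is a multiple of 4; a real routine would
// finish the remainder with a scalar loop.
void add_arrays_sse(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats from a
        __m128 vb = _mm_loadu_ps(b + i);             // load 4 floats from b
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 additions in one instruction
    }
}
```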

Although we're quite convinced that an on-die memory controller would deliver the best performance per transistor spent on a new architecture, we're doubtful that Intel will consider one. We may have to wait for stacked-die and stacked-wafer technology before we see any serious reduction in memory latency through techniques other than more cache and more cores.

Comments

  • IntelUser2000 - Friday, March 4, 2005 - link

    "why would you make a feature on a cpu that can only be enabled later anyways?"

    It may not make sense, but apparently the Pentium 4 had Hyper-Threading present but disabled even in Willamette. There is talk about unused "dark transistors" in Prescott. Why would they do that? Maybe because it's not feasible, or easy, or cheap enough yet, and they want to enable it later. With Willamette, enabling HT would have meant a significant performance degradation, unlike today, where it's negligible. But it's easier to enable when it's already there, don't you think?

    "I thought we were talking about ways of increasing bandwidth to the cpu -- eg. intel does it by increasing the ram standard (ddr333 to ddr400), amd has now chosen to have the on die memory controller so as faster HTT will increase bandwidth between cpu and everything else"

    Specifically, bandwidth between processors in dual- and multi-processor systems and/or I/O. We are talking about desktops here, so it's only I/O. The CPU only needs to talk to the memory controller for memory bandwidth, and since it's integrated, it doesn't need an HTT increase.

    You are kinda saying that if bus speeds increase on Pentium 4s, L2 cache bandwidth increases. That doesn't make sense at all. http://www.amd.com/us-en/Processors/ProductInforma...

    It even says at AMD for HTT: A system bus that uses HyperTransport technology for high-speed I/O COMMUNICATION.
  • Houdani - Friday, March 4, 2005 - link

    /snicker People talking to themselves is always good for a chuckle.
  • ncage - Friday, March 4, 2005 - link

    #20. I do agree with you that this would be a nightmare to code for unless they make the compiler so good that it does the majority of the work for you, and I can't imagine the compiler being THAT good. That would mean lots and lots of multithreaded programming, which gets VERY complex. There are usually a few places where you can spawn a new thread and process stuff in the background, but for most applications more than a few threads are not needed, and deciding which areas of your application could be sped up with more threads becomes VERY complex. Take a for loop. Maybe every iteration in your for loop could be handled by a separate thread, but what happens if the results have to be handled in order? What if you have the 3rd result before you have the 1st? (See the sketch after this comment for one common way around this.) This is a relatively simple example, of course. Multithreaded programming becomes quite complex. Heavy parallel processing becomes very useful in complex scientific applications, though. I also think it would be quite useful in games.

    On a side note, I want to know what you have programmed like this. If you have programmed something like that, I will be quite impressed.
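
The ordering problem raised in the comment above has a standard workaround worth showing: give every iteration its own result slot, let the threads finish in whatever order they like, and only read the results back sequentially after joining. Below is a minimal sketch using modern C++ threads; the process() function and the four-worker split are placeholders of our own, not anything from the commenter.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// Placeholder for whatever the loop body actually computes.
int process(int i) { return i * i; }

int main() {
    const int n = 16;
    std::vector<int> results(n);           // one slot per iteration preserves ordering
    const int workers = 4;
    std::vector<std::thread> pool;

    // Each thread handles a strided subset of iterations. Threads may finish in
    // any order, but each writes only its own slots, so reading the results
    // sequentially afterwards yields them in iteration order.
    for (int t = 0; t < workers; ++t) {
        pool.emplace_back([&results, t, n, workers] {
            for (int i = t; i < n; i += workers)
                results[i] = process(i);
        });
    }
    for (auto& th : pool) th.join();       // wait for all workers before reading

    for (int i = 0; i < n; ++i)
        std::printf("iteration %d -> %d\n", i, results[i]);
    return 0;
}
```
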
  • fitten - Friday, March 4, 2005 - link

    Well... we have yet to see whether Cell will make it out of PS3s and IBM servers, though. Cell will be too complicated to program for regular programmers (I've programmed similar systems in the past) and Sony's paper launch claims that they've solved problems that no one has been able to solve yet... so... forgive me if I don't hold my breath waiting for Cell.
  • Warder45 - Friday, March 4, 2005 - link

    Yeah but now they have competition with CELL to get them moving on SPH.
  • mrmorris - Friday, March 4, 2005 - link

    "Intel has spoken a bit about including special purpose hardware in their forthcoming processors..."

    Yeah well, that's what they said back in the MMX days, some 3600MHz ago!!
  • xsilver - Friday, March 4, 2005 - link

    "Dude, HTT is not memory standard, that's the link for the I/O, or in case of servers, communication between CPUs, get your facts straight. "

    I thought we were talking about ways of increasing bandwidth to the cpu -- eg. intel does it by increasing the ram standard (ddr333 to ddr400), amd has now chosen to have the on die memory controller so as faster HTT will increase bandwidth between cpu and everything else

    "Not exactly free since you need to buy the CPU, you can't just enable on current CPUs can you? :). "

    intel has the same thing, except you change the mobo instead of the cpu? how many people change the mobo without changing the cpu == answer, nobody .... why would you make a feature on a cpu that can only be enabled later anyways? it's like your dad handing you the keys to a ferrari but then saying, you can only drive it when you're 18 sonny boy :) ... why not just buy you the ferrari when you're 18?..... oh wait -- didn't intel just do it with their 64bit instructions on the prescott?

    from a performance perspective I still can't see a good argument for why intel is leaving out the on-die controller.... it's all economics of making more money from chipset sales

  • sphinx - Friday, March 4, 2005 - link

    I agree #11

    I think it is time to dump x86 altogether. Let's face it, Intel and AMD are still using the x86 architecture as a base for their new processors. I want to know if the CELL processor will change computing as we know it.
  • ceefka - Friday, March 4, 2005 - link

    Dedicated logic is nice when you can update it by flash (like a BIOS). That can already be done using FPGAs and CPLDs (from the likes of Xilinx). If too much becomes dedicated in a fixed way, without being low-cost upgradable, the PC loses its versatility and attractiveness altogether.

    Can anyone remember what a PC was like ten years back in 1995? Who would have predicted then that we would have 64-bit capable CPUs on the brink of going dual core, 4GB-capable mainboards, 300GB HDDs, LCD screens, and actually affordable RAM?

    When Intel adopts new memory standards as fast as it does, it's only logical that they are hesitant to produce a CPU with an integrated memory controller.

    256MB on die RAM? That will be one expensive MF!
  • IntelUser2000 - Friday, March 4, 2005 - link

    "intel's reasoning doesn't make sense. they seem make people change mobos, not because of differing ram standards, but because they change cpu socket so damn often."

    Well, it makes sense on the server side, specifically Xeon MP and Itanium, and according to some news that's what they are gonna do, since FB-DIMM will allow changing memory standards without changing chipsets or the chip.

    " the memory controller on the AMD64 has already been updated from HTT800mhz to HTT1000mhz.... and can be continually revised and just introduced on newer steppings of the same cpu's.... eg. amd's forthcoming "e" spec with sse3, 4x ddr3200 support and other stuff for free"

    Dude, HTT is not memory standard, that's the link for the I/O, or in case of servers, communication between CPUs, get your facts straight.

    We don't know what max speed grade the memory controller on the A64 will support. But the thing is, if you want better memory standards than what the memory controller is capable of, you need a newer version, in this case a newer CPU. Of course this does not apply to S423 to S478 and S478 to S775.

    "amd's forthcoming "e" spec with sse3, 4x ddr3200 support and other stuff for free"

    Not exactly free since you need to buy the CPU, you can't just enable on current CPUs can you? :).

    SSE3 is not related to the integrated memory controller, and 4x DDR3200 support was there already.
