Server and Workstation Roadmap

Beyond the Stable Image Platform, the main areas of interest for corporate and enterprise customers are going to be the server and workstation parts. Many roadmaps come and go with few changes to these areas. A few new parts might be announced, but the business/enterprise roadmaps are generally far less volatile than those for the consumer sector. This is one of the rare occasions where a large number of new parts show up all at once, and in some ways it's a reflection of the recent IDF announcements. In other areas, though, we're simply seeing Xeon equivalents and enhancements of desktop parts. There have been some updates to the Xeon platforms, so we'll look at those before moving on to the processors.

Among the conglomeration of acronyms, code names, and features, Intel has announced the names of the next generation chipsets. 5000X, 5000P, and 5000V correspond to the Greencreek, Blackford, and Blackford-VS code names that we had up until now. The chief difference between 5000X (Greencreek) and 5000P (Blackford) is in the configuration of the PCI Express lanes. Greencreek combines two X8 links into an X16 slot, while Blackford has three separate X8 connections. It's difficult to think of many uses for the 4000 MBps in each direction that X16 offers that don't require server functionality, so the chipset breakdown seems to make some sense. 5000V is listed as the "Value DP" chipset, and it comes with some limitations: it supports only one X8 PCIe connection and has a maximum capacity of 16GB of RAM using 8 DIMMs. In contrast, 5000X/P both support up to 64GB of RAM over 16 DIMMs. All of the 5000 series chipsets also support PCI-X, SATA, I/OAT, and AMT.
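To put the chipset numbers in perspective, here is a short Python sketch of the arithmetic. It assumes first-generation PCI Express signaling (2.5 GT/s per lane with 8b/10b encoding, which works out to 250 MBps per lane in each direction), and it assumes the maximum memory capacities are reached with every DIMM slot filled by identical modules. The helper names are purely illustrative.

```python
# Sketch of the PCI Express and memory arithmetic behind the 5000-series
# comparison. Assumes first-generation PCIe signaling: 2500 MT/s per lane,
# per direction, with 8b/10b encoding (8 payload bits for every 10 sent).
LANE_MT_PER_S = 2500


def pcie1_mb_per_s(lanes: int) -> int:
    """Usable bandwidth of a PCIe 1.x link in MB/s, in EACH direction."""
    payload_mbit_per_lane = LANE_MT_PER_S * 8 // 10  # 2000 Mbit/s of payload
    return payload_mbit_per_lane // 8 * lanes        # 250 MB/s per lane


def gb_per_dimm(max_gb: int, dimm_slots: int) -> int:
    """Module size needed to reach a chipset's maximum capacity (all slots filled)."""
    return max_gb // dimm_slots


for name, lanes in (("X8", 8), ("X16", 16)):
    each_way = pcie1_mb_per_s(lanes)
    print(f"{name}: {each_way} MBps each way, {2 * each_way} MBps combined")

# Maximum capacities as listed on the roadmap.
for chipset, max_gb, slots in (("5000X/P", 64, 16), ("5000V", 16, 8)):
    print(f"{chipset}: {max_gb}GB over {slots} DIMMs -> {gb_per_dimm(max_gb, slots)}GB modules")
```

Under these assumptions an X16 link moves 4000 MBps each way (8000 MBps counting both directions), and hitting the listed maximums takes 4GB modules on 5000X/P versus 2GB modules on 5000V.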

Another interesting aspect of the chipsets is the supported bus speeds. Xeon MP will get a bump up to 800FSB with the next chipset, which will support either 667FSB or 800FSB. Meanwhile, 800FSB is actually missing from the Xeon DP chipsets. Instead, we have both 667FSB and 1066FSB parts - these are for the upcoming 65nm processors; the current generation 90nm parts will run on current chipsets at 800FSB. The Xeon MP 800FSB is for a few select, high-end parts, with the majority of Xeon MP chips continuing to use 667FSB. Xeon DP is already at 800FSB for most of its parts, so 667FSB is actually a step backwards. Sossaman is the exception, as that chip is the Xeon version of Yonah, but several of the 65nm Dempsey chips will also use 667FSB. Two steps forward, one step back, it seems.
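Since the FSB labels are easy to misread, here is a quick Python sketch of what they mean in theoretical peak bandwidth. It assumes the 64-bit (8-byte), quad-pumped front-side bus used by these Xeons, so "800FSB" means 800 MT/s from a 200 MHz base clock; the results are peak figures, not measured throughput.

```python
from fractions import Fraction

# Assumption: a 64-bit (8-byte) quad-pumped front-side bus, as used by the
# NetBurst and Core-era Xeons. "800FSB" is shorthand for 800 MT/s from a
# 200 MHz base clock. Fractions keep 166.67/266.67 MHz exact.
BUS_WIDTH_BYTES = 8
PUMPING = 4

BASE_CLOCK_MHZ = {
    "667FSB": Fraction(500, 3),   # 166.67 MHz
    "800FSB": Fraction(200),      # 200 MHz
    "1066FSB": Fraction(800, 3),  # 266.67 MHz
}


def peak_mb_per_s(base_mhz: Fraction) -> Fraction:
    """Theoretical peak FSB bandwidth in MB/s (10^6 bytes per second)."""
    return base_mhz * PUMPING * BUS_WIDTH_BYTES


baseline = peak_mb_per_s(BASE_CLOCK_MHZ["800FSB"])
for name, clk in BASE_CLOCK_MHZ.items():
    bw = peak_mb_per_s(clk)
    change = float(bw / baseline - 1)
    print(f"{name}: {float(bw):.0f} MBps ({change:+.0%} vs 800FSB)")
```

That works out to roughly 5.3, 6.4, and 8.5 GBps, so 667FSB gives up about 17% of peak bandwidth relative to 800FSB, while 1066FSB adds about 33%.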

Besides the above server platforms, there are also workstation platforms that more or less echo what's listed. Glidewell is the workstation version of Bensley, and Wyloway matches up with Kaylo. There are a few differences, though, like the use of the 975X chipset for Wyloway. No memory type was listed for Truland, but we would assume that it will also be FBD like Bensley and the other 5000 series chipsets. We found the use of the same chipsets for the upcoming NetBurst-based processors as well as the Conroe/Woodcrest parts to be a refreshing change - we won't be forced to upgrade for the new parts, unlike the 915/925 to 945/955 launch. Of course, these are all future chipsets, so no current motherboard or chipset will support the next generation architecture.

That takes care of the platforms, so let's move to the upcoming Xeon processors that will be used in these platforms.

Many of the entries simply copy the desktop offerings, although the package/socket for the DP and MP systems differs from that of the desktop parts. We omitted the UP chips, as they are nothing more than the Pentium 4 and Pentium D, which we've already covered. We've also left out the majority of the already shipping processors, like the Irwindale and Nocona Xeon DP parts. We did include all of the current Xeon MP parts, mostly because their large cache sizes are impressive. If the 4MB and 8MB L3 caches of Potomac aren't big enough, hopefully the 16MB L3 cache of Tulsa will be sufficient! Tulsa takes advantage of the 65nm process shrink to reach its large cache size, while the other Xeon MP parts all make do with 90nm technology. Needless to say, such large caches make for big - and expensive - chips. Tulsa is also the only Xeon chip to get the Pellston technology, which we'll discuss more when we get to Itanium.

Where AMD launched dual core Opteron parts first and followed up with the Athlon 64 X2, Intel has gone the other route and launched Pentium D first, with the Xeon DP/MP parts only now nearing completion. The model numbers of Pentium 4/D/M have spread to the Xeon line as well, with all of the newest parts receiving four-digit model numbers: 7xxx for Xeon MP parts and 5xxx for Xeon DP parts. The 7020 through 7041 will use the older Paxville MP core - the Xeon variant of Pentium D and Smithfield. The 5xxx parts will use the Dempsey core instead, making them Xeon flavors of the Presler core. The Tulsa core is also dual-core, so it will likely get a 7xxx model number as well. There is also one Paxville DP part scheduled for launch, the 2.80GHz chip. Unlike the MP dual core chips, this part doesn't get a model number. It is also the only 90nm dual core DP part - perhaps 7040 chips that didn't make the cut will be binned as Xeon DP 2.80 chips? We've already talked about the various bus speeds that will be available for the Xeon platform, and you can find the specifics in the table. VT is also targeted to launch with the Xeon MP 7xxx series, although Intel may choose to cut that feature between now and launch, so it is still listed as "To Be Determined" (TBD).
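The model-number convention is simple enough to express as a lookup. This is a hypothetical Python helper, not anything Intel publishes; it encodes only the rule stated above (7xxx for Xeon MP, 5xxx for Xeon DP) and treats unnumbered parts such as the 2.80GHz Paxville DP as having no model number. The 5050 used in the example is only a placeholder in the 5xxx range.

```python
from typing import Optional

# Hypothetical lookup based only on the roadmap convention:
# 7xxx = Xeon MP, 5xxx = Xeon DP. Anything else is left unclassified.
SERIES_BY_FIRST_DIGIT = {
    "7": "Xeon MP",
    "5": "Xeon DP",
}


def xeon_segment(model: Optional[str]) -> Optional[str]:
    """Return the platform segment for a four-digit Xeon model number.

    Returns None for parts without a model number (for example the
    2.80GHz Paxville DP, which is identified only by clock speed) and for
    strings that are not four-digit numbers.
    """
    if model is None or len(model) != 4 or not model.isdigit():
        return None
    return SERIES_BY_FIRST_DIGIT.get(model[0])


# 7020 and 7041 come from the roadmap; 5050 is only a placeholder 5xxx number.
for m in ("7020", "7041", "5050", None):
    print(m, "->", xeon_segment(m))
```

Under this rule, Tulsa would fall into the same "Xeon MP" bucket once it receives the 7xxx number the text expects.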

Besides the NetBurst chips, we have a few other parts coming out. Sossaman is the server variant of Yonah. While Intel didn't list TDP for all of the parts - and TDP is itself a misleading number - we do get the 31W TDP of Sossaman at 2.0GHz. That's about one third of the TDP of the Xeon chips, and designs like this are where Intel's talk about performance per Watt becomes reality. The other part that shows up for the first time on the corporate roadmaps is Woodcrest, the server/workstation version of Conroe - at least, one of the versions. While we know that Conroe will have both 2MB and 4MB versions, for now the server edition will come with 4MB of cache only. That makes sense, as servers and workstations run application loads that benefit more from additional cache than desktop applications do. Other than the amount of cache and the features that will be enabled on the initial Woodcrest parts, we don't have any specifics, not even core speeds. Even the "less than 80W TDP" is pretty vague - we'd imagine Woodcrest and Conroe will be targeting something closer to 50W TDP, judging by Yonah, but that's more of a guess than anything. As far as we can tell, Sossaman will also get the Xeon moniker, while Woodcrest's branding is still TBD.

21 Comments

  • IntelUser2000 - Monday, September 12, 2005

    Itanium either supports hardware emulation OR software translation. The difference between emulation and translation may seem to be minimal, but translation has much better performance than emulation. While the hardware emulation just emulates instructions, the software translator dynamically optimizes the code on the fly to improve performance.

    Hardware emulation is NOT present on Montecito, in favor of IA-32EL (software translation).
  • IntelUser2000 - Monday, September 12, 2005

    The MAJOR difference between Foxton and *OTHER* dynamic overclocking is that Foxton is implemented in HARDWARE, while other dynamic overclocking is based on SOFTWARE.

    I guess you guys may be referring to MSI's dynamic overclocking (D.O.T.) or the one in the ATI Catalyst driver, but those are software based. 30 million of the LOGIC transistors are dedicated to JUST Foxton technology.

    Foxton isn't just dynamic overclocking. If the power consumption exceeds the set threshold, it clocks the CPU down until it's equal to or under the threshold point. Unlike conventional overclocking, Foxton FINDS the point where it won't damage the CPU, while providing the maximum clockspeed the design can provide.

    OCing Prescott to 6GHz is not a safe point, BTW.

    Foxton responds extremely fast to changes in demand and power consumption. The Foxton hardware for power management is extensive, basing its decisions on power consumption, temperature, and workload.
  • JarredWalton - Monday, September 12, 2005

    Good points, and obviously I wasn't trying to get into the deep details of Itanium. I have a question for you, though, as you seem to know plenty about Itanium: Intel currently has IA-32EL; is there an IA-EM64T-EL in the works? (It might be called something else, but basically EM64T emulation for Itanium?)

    Even though Foxton is hardware based, we still don't know how it actually performs in practice - at least, I don't. (I probably never will, as I haven't even used an Itanium system other than to poke around a bit at some tradeshows.) 955 can run as high as 2.0 GHz under load - in practice, can you actually reach that speed most of the time, or is it more like 1.80 GHz for a bit, then 2.0 GHz for a bit, and maybe 1.90 GHz in between?

    Also, are you sure about the "30 million transistors" part? That's larger than the entire Itanium Merced core (not counting the L3 cache). I suppose if you're talking about all the debugging and monitoring transistors, 30 million might be possible, but I didn't think all of that was lumped under "Foxton"?
  • IntelUser2000 - Monday, September 12, 2005

    I think there is a plan for an EM64T extension to IA-32EL. I heard from the Inquirer that Montvale may have it, but either I misunderstood it or it's a rumor. It's just software support, so I guess Intel can add it whenever they want to.

    For Foxton speeds, it depends. From what I understand, there is a thing called a power virus (a malicious computer program that executes a specific instruction mix in order to establish the maximum power rating for a given CPU). If the figure for the power virus is 1.0 (meaning 100% of maximum power), then Linpack is 0.8, specfp2k is 0.7, specint2k is 0.65, and TpmC is 0.6. Since TpmC is furthest away from the power virus figure, it would reach maximum speed all the time; for the 9055, that is 2.0GHz. For speccpu2k, it may be 1.9GHz, and for Linpack it may be 1.8GHz. So some programs may get no benefit AT ALL, while others may get the maximum.

    Foxton can sample every 8 µs to change voltage and frequency.


    Yes, I am sure about the Foxton hardware transistor count. It uses a custom 32-bit DSP with its own RAM to process the data necessary for Foxton. I was sort of surprised, but yeah, around 30 million. Sorry I couldn't give the link; I'll send it to you somehow, just tell me how, but I do remember clearly. Merced has 25 million transistors including the 96KB L2; without it, that's around 20 million, I guess. McKinley is actually simpler and has fewer logic transistors than Merced; according to some, it's around 15-17 million transistors.

    Montecito has 64 million transistors NOT including L2. 64-30=34 million/2=17 million transistors, which is right on mark for
  • IntelUser2000 - Wednesday, September 14, 2005

    http://66.102.7.104/search?q=cache:fZ7OTmmmXrgJ:ww...

    Well, I was KINDA right.

    quote:

    Hewlett-Packard declared. 30 million transistors, as many as are in a Pentium II, are responsible solely for power management


    Though, yes, that doesn't mean they are all for Foxton. Maybe; I don't know.

    Itanium Merced has 25.4 million transistors. ~6 million of that is dedicated to the x86 hardware emulator, which leaves 19.4 million transistors. Without the 96KB L2, it would be around 14-15 million transistors for the Merced core logic.

  • IntelUser2000 - Wednesday, September 14, 2005

    OTOH, I think the site could be wrong. It doesn't square with other Montecito papers saying it consumes less than 0.5W and takes up less than 0.5% of the die. I give up haha.
  • Jimw18600 - Monday, September 12, 2005

    Your definition of HTT is a little skewed. It doesn't enable processing multiple threads; that was always there, whether they were earmarked or not. What it does do is this: instead of flushing the instruction buffer back to the missed branch, it restarts the broken thread and continues the rest forward. Broken threads are simply tossed out and resources are reclaimed in the last stage in the pipeline; completed threads are retired. And by the way, the reason Intel was forced to go to HTT was that they were heading for 31-stage pipelines. If you were still back at 12-15 stages, HTT didn't have that much to offer.
  • JarredWalton - Monday, September 12, 2005

    My definition of HTT was actually taken directly from the roadmap. That's how Intel describes it, and obviously a 1 sentence summary leaves out a lot of details. HTT does allow the concurrent execution of more than one thread, but resource contention makes it difficult to say exactly how HTT will affect performance.

    One interesting point about SMT in general is that POWER5 doesn't have 20 to 31 pipeline stages, and yet it still benefits from the IBM SMT design. This is purely a hunch on my part, but I wouldn't be at all surprised to see some form of HT come out for Conroe/Woodcrest in the future. Trouble filling all four issue slots from one thread? SMT could help out. We'll see if Intel does that or not.

    Note: HTT was definitely present (but disabled) as far back as Northwood. Some people suspect that it was actually present in an early form in Willamette. Just because Conroe doesn't currently show any HT support doesn't mean there aren't some deactivated features awaiting further testing. :)
  • IntelUser2000 - Monday, September 12, 2005

    From what I understand, modern single thread processors like the early Northwood P4s can execute multiple threads, but not ALL simultaneously. Since today's processors are fast enough anyway, it SEEMS like multitasking. The OS decides how to devote the time to the CPUs, I guess.

    HT makes use of otherwise idle units, since it basically doubles the demand on the CPU. Neither thread can take full advantage of the CPU (say one thread uses 15%), but a second thread makes it more efficient by bringing utilization up to 20%, which is 33% better throughput. It is more complex than that, but I think that explanation is enough.

    The POWER4/5 issue rate is 5-wide, which is quite a lot. It also has a 17-stage pipeline, which is close to Pentium 4 Willamette/Northwood. Wide and deep, with lots of bandwidth and enough execution units, it's perfect for SMT.
  • coomar - Monday, September 12, 2005

    kind of difficult to read the confidential

    virtualization sounds interesting
