Intel Pentium 4 6xx and 3.73EE: Favoring Features Over Performance

Author: Anand Lal Shimpi & Derek Wilson

by Anand Lal Shimpi & Derek Wilson on February 21, 2005 6:15 AM EST

Posted in CPUs


Twice the Cache - 17% Higher Latency

Both the Pentium 4 6xx and the new Extreme Edition share the same core, which means they also have the same L2 cache. When Intel first launched Prescott, we noticed that cache latencies went up tremendously in the move to the new architecture. The increase was to be expected, as one tradeoff of a larger cache is that it takes longer to find and access data. So when we heard that Intel was moving to a 2MB L2 cache with the 6xx series, we wondered how much slower the cache would get.

First, we wanted to confirm that L1 cache latency stayed the same, and it did: 4 cycles for the new Prescott 2M-based core, just like the original Prescott:

	Cachemem L1 Latency	ScienceMark L1 Latency
AMD Athlon 64	3 cycles	3 cycles
Intel Pentium 4 (Northwood)	1 cycle	2 cycles
Intel Pentium 4 (Prescott)	4 cycles	4 cycles
Intel Pentium 4 (Prescott 2M)	4 cycles	4 cycles
Intel Pentium M	3 cycles	3 cycles

Next up was L2 cache latency. In our review of the Pentium M processor on the desktop, we found that its 10-cycle L2 cache was responsible for its solid performance in applications that aren't "media rich" (e.g. office applications, OS performance). The original Prescott had a 23-cycle L2 cache; with a 2MB cache, the latency has gone up to 27 cycles:

	Cachemem L2 Latency	ScienceMark L2 Latency
AMD Athlon 64	17 cycles	18 cycles
Intel Pentium 4 (Northwood)	16 cycles	16 cycles
Intel Pentium 4 (Prescott)	23 cycles	23 cycles
Intel Pentium 4 (Prescott 2M)	27 cycles	27 cycles
Intel Pentium M	10 cycles	10 cycles

While we're talking about "only" 4 more cycles, that is a 17% increase in L2 access latency (27 cycles vs. 23); at 3.6GHz, those 4 extra cycles add roughly 1.1ns to every L2 access. Given Prescott's extremely long pipeline, a 17% increase in L2 cache latency is not going to help offset the downsides of such a long pipeline. Also keep in mind that the only architectural change here is the larger L2 cache, so none of the usual techniques for hiding memory latency have been enhanced in the new Pentium 4.
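The latency arithmetic above can be checked in a few lines. This is a minimal Python sketch that uses only the cycle counts from the L2 latency table and the 3.6GHz clock; converting cycles to nanoseconds by dividing by the clock in GHz assumes the measured cycles are core clock cycles. Note that the 17% figure is a ratio of cycle counts and does not depend on clock speed; the clock only matters for the nanosecond values.

```python
# Prescott vs. Prescott 2M L2 latency, using the cycle counts from the table above.
PRESCOTT_L2_CYCLES = 23
PRESCOTT_2M_L2_CYCLES = 27
CLOCK_GHZ = 3.6  # core clock of the Pentium 4 560 / 660 (assumes latencies are in core cycles)

def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz

extra_cycles = PRESCOTT_2M_L2_CYCLES - PRESCOTT_L2_CYCLES
pct_increase = 100.0 * extra_cycles / PRESCOTT_L2_CYCLES
old_ns = cycles_to_ns(PRESCOTT_L2_CYCLES, CLOCK_GHZ)
new_ns = cycles_to_ns(PRESCOTT_2M_L2_CYCLES, CLOCK_GHZ)

print(f"{extra_cycles} extra cycles = {pct_increase:.1f}% longer L2 access")
print(f"{old_ns:.2f} ns -> {new_ns:.2f} ns (+{new_ns - old_ns:.2f} ns)")
```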

What Intel is counting on is that the higher hit rate of a cache twice as large will outweigh the 17% longer L2 access time. Did Intel make the right bet? To find out, we compared the new Pentium 4 660 (3.6GHz, 2MB L2) with the old Pentium 4 560 (3.6GHz, 1MB L2), with all other variables the same, to see how much of an impact the extra megabyte of cache has in the real world.
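One common way to reason about this bet is the average memory access time (AMAT) model: every request that reaches L2 pays the L2 hit latency, and the fraction that misses also pays the trip to main memory. The sketch below is a deliberately simplified illustration, not Intel's analysis. Only the 23- and 27-cycle latencies come from our measurements; the 300-cycle memory penalty and both miss rates are hypothetical values, and the model ignores out-of-order overlap and prefetching. Under those assumptions, the 4 extra cycles pay for themselves only if the larger cache cuts the L2 miss rate by at least 4 / penalty.

```python
# Simplified AMAT model for requests that reach the L2 cache.
# Inputs marked HYPOTHETICAL are illustrative only, not measured values.

def amat(hit_cycles: float, miss_rate: float, miss_penalty: float) -> float:
    """Average cycles per L2 access: hit latency plus expected memory penalty."""
    return hit_cycles + miss_rate * miss_penalty

def break_even_miss_rate_drop(extra_hit_cycles: float, miss_penalty: float) -> float:
    """Miss-rate reduction (as a fraction) needed to repay a slower L2 hit."""
    return extra_hit_cycles / miss_penalty

L2_1MB_CYCLES = 23   # from the L2 latency table: Prescott (1MB L2)
L2_2MB_CYCLES = 27   # from the L2 latency table: Prescott 2M (2MB L2)
MEM_PENALTY = 300    # HYPOTHETICAL main-memory penalty in core cycles
MISS_1MB = 0.10      # HYPOTHETICAL L2 miss rate with 1MB
MISS_2MB = 0.07      # HYPOTHETICAL L2 miss rate with 2MB

needed = break_even_miss_rate_drop(L2_2MB_CYCLES - L2_1MB_CYCLES, MEM_PENALTY)
print(f"miss rate must drop by {needed:.2%} of L2 accesses to break even")
print(f"1MB: {amat(L2_1MB_CYCLES, MISS_1MB, MEM_PENALTY):.1f} cycles, "
      f"2MB: {amat(L2_2MB_CYCLES, MISS_2MB, MEM_PENALTY):.1f} cycles")
```

With these made-up inputs, the break-even drop is about 1.3 percentage points, and the assumed drop from 10% to 7% more than covers the extra latency. A workload whose data already fits in 1MB sees almost no drop in miss rate, so it simply pays the 4 extra cycles, which is consistent with the small losses some benchmarks show.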

In the business category, the added cache pays off a little. SYSMark shows a good improvement (8.4%) in its document creation tests, while Business Winstone makes a very good gain (13.0%). WorldBench shows web browsing with Mozilla improving by 8.0%, while the WinRAR compression test and the ACDSee test show small losses (about 2% each). These losses generally indicate tests that are more dependent on latency than on cache hit rate. On the content creation side, running Windows Media Encoder alongside Mozilla improves performance more (11.1%) than the Mozilla test alone (8.0%). This is likely because the larger cache keeps Mozilla's data from being evicted while Windows Media Encoder is working.

On the gaming front, Doom 3 is the only test with a meaningful performance improvement (4.7%); UT2004 gains just 1.9%, and Wolfenstein: ET actually loses 1.7%. The only other test to show a significant gain is Maya (SPECviewperf 8's maya-01 viewset), at 43.7%. This huge gain is likely the result of 1MB of cache being too small to hold the models being worked on, while 2MB is enough. This appears to be a case where the test is very sensitive to bandwidth rather than latency: fitting most (if not all) of the data being worked on into the L2 cache gives a program a very large boost in apparent bandwidth.

As we can see, the unfortunate truth for the 600 series is that most consumer data sets fit into a 1MB cache just fine. From our limited investigation, the added cache does seem to help with multitasking. The more threads that access memory aggressively, the better the chance of seeing a benefit from the 2MB cache, because less of each thread's data is evicted from the cache, resulting in fewer pipeline stalls.
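The eviction effect described above can be illustrated with a toy simulation. This is a hypothetical sketch, not a model of the real Prescott 2M cache: it assumes a fully associative LRU cache with 64-byte lines, and two made-up threads that each sweep a private 768KB working set over and over. Either thread alone fits in 1MB, but interleaved, their combined 1.5MB footprint does not.

```python
from collections import OrderedDict

LINE_BYTES = 64        # assumed cache-line size for this toy model
KB = 1024
MB = 1024 * KB

def lru_hit_rate(capacity_bytes: int, trace: list) -> float:
    """Hit rate of a fully associative LRU cache (a simplification) over `trace`."""
    capacity_lines = capacity_bytes // LINE_BYTES
    cache = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)            # now the most recently used line
        else:
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)      # evict the least recently used line
    return hits / len(trace)

def thread_trace(thread_id: int, working_set_bytes: int, passes: int) -> list:
    """Hypothetical thread that sweeps its own private working set repeatedly."""
    lines = working_set_bytes // LINE_BYTES
    base = thread_id << 32                     # keep each thread's line numbers distinct
    return [base + i for _ in range(passes) for i in range(lines)]

def interleave(*traces) -> list:
    """Round-robin the threads' accesses, one access from each thread in turn."""
    return [line for group in zip(*traces) for line in group]

a = thread_trace(0, 768 * KB, passes=4)
b = thread_trace(1, 768 * KB, passes=4)
both = interleave(a, b)

for cache_mb in (1, 2):
    print(f"{cache_mb}MB L2: one thread {lru_hit_rate(cache_mb * MB, a):.2f}, "
          f"two threads {lru_hit_rate(cache_mb * MB, both):.2f}")
```

In this model, a single thread hits 75% of the time with either cache size (only its first pass misses). The two interleaved threads get no hits at all with 1MB, because more than 1MB of other data is touched between reuses of any line, so LRU has already evicted it; with 2MB they return to 75%. Real caches are set-associative and use hardware prefetching, so the cliff is far less sharp in practice.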

Unfortunately, most of the usage models that are a good fit for the 600 series are server and workstation workloads. Streaming data (using or encoding media), games and most other consumer applications don't have the large data sets needed to really separate the performance of the 1MB and 2MB parts.

Since we've provided this chart and discussed the general impact of the larger cache on Intel's new 600 line, we won't repeat that analysis on the pages with our benchmark data. For those interested in a deeper look at the numbers and the performance of all five new parts, graphs of each benchmark are included later in this article.

Impact of L2 Cache Size on Performance (1MB vs. 2MB - 3.60GHz)
	1MB L2	2MB L2	2MB Performance Advantage
Business/General Use Performance
Business Winstone 2004	21.4	24.2	13.0%
SYSMark 2004 - Communication	137	137	0.0%
SYSMark 2004 - Document Creation	201	218	8.4%
SYSMark 2004 - Data Analysis	184	186	1.0%
Microsoft Office XP with SP-2	522	520	0.3%
Mozilla 1.4	459	422	8.0%
ACD Systems ACDSee PowerPack 5.0	547	558	-2.0%
Ahead Software Nero Express 6.0.0.3	545	550	-0.9%
WinZip Computing WinZip 8.1	412	411	0.2%
WinRAR	479	469	-2.0%
Multitasking Content Creation Performance
Content Creation Winstone 2004	32.7	33.9	3.7%
SYSMark 2004 - 3D Creation	231	231	0.0%
SYSMark 2004 - 2D Creation	288	279	-3.1%
SYSMark 2004 - Web Publication	206	203	-1.0%
Mozilla and Windows Media Encoder	676	601	11.1%
Video/Photo Creation & Editing
Adobe Photoshop 7.0.1	342	342	0.0%
Adobe Premiere 6.5	461	468	-1.5%
Roxio VideoWave Movie Creator 1.5	287	276	3.8%
Audio/Video Encoding
MusicMatch Jukebox 7.10	484	470	2.9%
DivX Encoding	55.3	55.4	0.2%
XviD Encoding	33.9	33.4	-1.4%
Microsoft Windows Media Encoder 9.0	2.57	2.56	-0.3%
Gaming
Doom 3	84.6	88.6	4.7%
UT2004	59.3	60.4	1.9%
Wolfenstein: ET	97.2	95.5	-1.7%
3D Rendering
Discreet 3dsmax 5.1 (DX)	268	266	0.7%
Discreet 3dsmax 5.1 (OGL)	327	329	-0.6%
SPECapc 3dsmax 6	1.64	1.62	-1.1%
Professional 3D
SPECviewperf 8 - 3dsmax-03	17.04	17.11	0.4%
SPECviewperf 8 - catia-01	13.87	13.57	-2.2%
SPECviewperf 8 - light-07	14.3	13.83	-3.3%
SPECviewperf 8 - maya-01	13.12	18.85	43.7%
SPECviewperf 8 - proe-03	16.7	16.5	-1.2%
SPECviewperf 8 - sw-01	13.09	13.33	1.8%
SPECviewperf 8 - ugs-04	15.31	13.82	-9.7%
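The last column compares the two parts, but the table mixes scores where higher is better with results where lower is better (Mozilla's positive 8.0%, for example, comes from a lower number). The sketch below shows one sign-aware way to compute such an advantage relative to the 1MB part. The formula is an assumption inferred from the column, not one given with the benchmark data, and the lower_is_better flags are likewise inferred from the table's signs. Recomputing from the rounded scores shown can differ from the table by a tenth of a point (Mozilla works out to 8.1%).

```python
def advantage_pct(score_1mb: float, score_2mb: float, lower_is_better: bool) -> float:
    """Percent advantage of the 2MB part over the 1MB part (assumed formula).

    Positive means the 2MB part did better. For lower-is-better results
    (e.g. run times) the sign is flipped so a smaller number counts as a gain.
    """
    if lower_is_better:
        return 100.0 * (score_1mb - score_2mb) / score_1mb
    return 100.0 * (score_2mb - score_1mb) / score_1mb

# Rows copied from the table; the lower_is_better flags are inferred, not stated.
rows = [
    ("SPECviewperf 8 - maya-01", 13.12, 18.85, False),
    ("SPECviewperf 8 - ugs-04", 15.31, 13.82, False),
    ("Roxio VideoWave Movie Creator 1.5", 287, 276, True),
]
for name, s1, s2, lower in rows:
    print(f"{name}: {advantage_pct(s1, s2, lower):+.1f}%")
```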

71 Comments

Alfaneo - Friday, August 26, 2005

Here is my 478-pin result:
Run All Summary

---------- SUM_RESULTS\3DSMAX\SUMMARY.TXT
3dsmax-03 Weighted Geometric Mean = 16.99

---------- SUM_RESULTS\CATIA\SUMMARY.TXT
catia-01 Weighted Geometric Mean = 14.27

---------- SUM_RESULTS\ENSIGHT\SUMMARY.TXT
ensight-01 Weighted Geometric Mean = 20.60

---------- SUM_RESULTS\LIGHT\SUMMARY.TXT
light-07 Weighted Geometric Mean = 12.34

---------- SUM_RESULTS\MAYA\SUMMARY.TXT
maya-01 Weighted Geometric Mean = 18.69

---------- SUM_RESULTS\PROE\SUMMARY.TXT
proe-03 Weighted Geometric Mean = 16.74

---------- SUM_RESULTS\SW\SUMMARY.TXT
sw-01 Weighted Geometric Mean = 14.16

---------- SUM_RESULTS\UGS\SUMMARY.TXT
ugs-04 Weighted Geometric Mean = 18.35

blckgrffn - Thursday, February 24, 2005
Let's hope they don't post it because they know that running 1T is imperative for good performance numbers, and thus use it by default.
Hans Maulwurf - Wednesday, February 23, 2005
Many other sites don't publish their command rate either, which looks very strange to me. Most sites used to publish it. I don't understand...
L3p3rM355i4h - Wednesday, February 23, 2005
I'm assuming 1T, although the amount of pwnage that would occur if it was 2T would be incredible.
Zebo - Wednesday, February 23, 2005
Derek/Anand: Why don't you say what the A64's command rate was? 1T or 2T? It makes a huge impact on the A64's performance (as shown by Anand right here and by me in the forums), and leaving it out is sloppy journalism. Sure, "other" sites do this crap, but not AnandTech. :|
Dualboy24 - Wednesday, February 23, 2005
I am just not finding the releases to be impressive lately... I am waiting to see the future dual core etc... perhaps that will wow us all. It's just not like the 90s anymore, when it was always an exciting time for CPUs.

Perhaps a battle between 56kbps modem models would prove entertaining :) L()L
neogodless - Tuesday, February 22, 2005
#64 I'm saying I don't bother. I don't do that at work (P4 2.4C) either. At work, I listen to MP3s while having 2 e-mail clients open, various browser windows and tabs, a development environment, FTP, a database manager, various IM programs, remote desktop, etc. And I do about the same at home, though usually on a smaller scale, and it works fine. However, if I go to a web page that gobbles up resources (poorly written JavaScript; I can give you an example page), I can still do everything else on the HT machine, which shows about "53%" overall CPU usage. On the Athlon 64, if something gobbles up the CPU, I see "99%" usage and a sluggish environment. But it's ALL subjective... I want to see objective measurements.

I also don't want to see Athlon vs. Intel opinions/flames because I'm not claiming one or the other is better... just asking for objective measurements.
RZaakir - Tuesday, February 22, 2005
neo, are you saying that you have problems running a game and listening to MP3s simultaneously on your Athlon 64?
RockHydra11 - Tuesday, February 22, 2005
Disappointing to say the least...
neogodless - Tuesday, February 22, 2005
I'm not sure why I got attacked for requesting multitasking benchmarks. I prefer my AMD for gaming, and I prefer the Intel at work, where I run lots of programs at once but (unfortunately) never game. It's not a fair comparison anyway, because my home machine is limited to one monitor while my work machine has two.

Yes, many benchmarks are optimized for Hyper-Threading, and if they are synthetic, then it doesn't matter. I'm asking for benchmarks with programs you use every day. If those are optimized for Hyper-Threading, then you will see a real-world benefit from that when using an HT-enabled processor.

When I run games on my AMD64, they gobble up all the CPU (even old games) for whatever reason, and I don't find it practical to leave a game running in the background while doing something else. I've done it, and it didn't greatly hinder small tasks like checking e-mail or sending an instant message, but I wouldn't intentionally do it, especially if I decided I'd rather listen to MP3s than finish my game...
