Original Link: http://www.anandtech.com/show/1576
Linux and the Desktop Pentium M: Uncommon Performance
by Kristopher Kubicki on December 24, 2004 12:00 PM EST
Introduction
Dothan is something that both perplexes and intrigues us. It is not quite a Pentium 3, not quite a Pentium 4, and not quite something entirely different either. Meanwhile, the NetBurst architecture has come under serious strain over the last few years, particularly since Intel's Prescott launch. Is Intel still capable of killer products? And more importantly, does it still dominate on Linux?
As many who follow our Windows reviews know, Pentium M on the desktop has been a few years in the making. Even when the original 130nm Banias processor showed up in 2003, reviewers and customers alike were astonished by the technology. Intel received even more praise when the 90nm Dothan chips of the same product line arrived, using less than 30W during peak operation and less than 5W at idle. Most of these advances were due to Intel's controversial strategy of rethinking the P6 architecture and refining a particularly interesting technology called Enhanced SpeedStep. Enhanced SpeedStep, also known as EIST, gives the operating system the ability to dynamically clock the processor. Typically, Windows will run the Dothan at its full clock during intensive operation, but throttle the processor as far down as 10% of its rated speed when the computer is just idling. As a result, Pentium M has achieved incredible status among overclockers and HTPC enthusiasts, on Windows. Today, we will briefly explore the versatility of Pentium M on the Linux desktop. The lessons learned should apply to the notebook market as well.
That being said, the Pentium M already faces a few fundamental handicaps on Linux, the largest of these being compiler optimizations. While Opteron/Athlon 64 and Pentium 4 receive substantial optimizations from every corner of the OSS universe, Pentium M receives very little regular attention. Dothan and Banias are slightly cursed, since most Linux distributions are built with the -mtune=i686 flag, which specifically tunes compilation for the P6 core (Pentium Pro) from which the Pentium M is derived. Why is that a curse and not a blessing? Although Dothan and Banias certainly share some key elements with the P6 architecture, they differ from it substantially. Pentium M's micro-op fusion, local branch prediction and general improvements to integer division and register access are completely ignored by the compiler, even when setting -march=pentium-m, since most compilers (particularly anything before GCC 3.4.2) tend to treat Pentium M as just a P6 processor with a higher clock.
Of course, the Intel C compiler, ICC, behaves very differently, but unfortunately it is not free software either. A few of today's tests include the non-commercial ICC as well, to see how it stacks up against GCC 3.4.1. So, if it doesn't bother you that the majority of Linux sees your new Pentium M as a glorified Pentium Pro, without further ado, let's check out how it actually performs against other processors that we have looked at in the past.
The Test
The goal today is to benchmark our newest Pentium M Dothans on both the 400MHz and 533MHz front side buses. We would like to see how these processors compare to the better-performing Athlon 64 and Pentium 4 processors available today, particularly in the same price category. We will also look at how a faster front side bus, different memory speeds and different compilers affect our benchmark results.
|Performance Test Configuration||
|Processor(s):|AMD Athlon 64 FX-53 (130nm, 2.4GHz, 1MB L2 Cache, Socket 939)|
||AMD Athlon 64 3800+ (130nm, 2.4GHz, 512KB L2 Cache)|
||AMD Athlon 64 3500+ (130nm, 2.2GHz, 512KB L2 Cache)|
||AMD Athlon 64 3200+ (90nm, 2.0GHz, 512KB L2 Cache)|
||Intel Pentium 4 Extreme Edition 3.4GHz (130nm, 512KB L2 Cache, 2MB L3 Cache)|
||Intel Pentium 4 560 3.6GHz (90nm, 1MB L2 Cache)|
||Intel Pentium M 765 2.1GHz (90nm, 2MB L2 Cache, 533FSB)|
||Intel Pentium M 755 2.0GHz (90nm, 2MB L2 Cache, 400FSB)|
|RAM:|2 x 512MB Mushkin PC-3200 CL2 (400MHz)|
||2 x 512MB Corsair PC2-5400 CL3 (475MHz)|
|Motherboards:|DFI LanParty 915P-T12 (Socket 775)|
||MSI K8T Neo2 (Socket 939)|
||DFI 855GME-MFG (Socket 479)|
|Operating System(s):|SuSE 9.1 Professional|
|Compiler:|dave:~ # gcc -v|
||Reading specs from /opt/gcc-mainline/lib/gcc/i586-suse-linux/3.4.1/specs|
||Configured with: ../configure --enable-threads=posix --prefix=/opt/gcc-mainline --with-local-prefix=/usr/local --infodir=/opt/gcc-mainline/share/info --mandir=/opt/gcc-mainline/share/man --libdir=/opt/gcc-mainline/lib --libexecdir=/opt/gcc-mainline/lib --enable-languages=c,c++,f77,objc,java,ada --enable-checking --enable-libgcj --with-gxx-include-dir=/opt/gcc-mainline/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit i586-suse-linux|
||Thread model: posix|
||gcc version 3.4.1 20040508 (prerelease) (SuSE Linux)|
|Intel Compiler:|dave:/opt/intel_cc_80/bin # ./icc -v|
As you can see from the specifications above, we are recycling most of our benchmarks from the Linux CPU roundup that we published a few months ago. The two newcomers to the benchmark are the 2.1GHz 533FSB Dothan Pentium M and the 2.0GHz 400FSB Dothan Pentium M. Both processors use the desktop configuration, Socket 479. Socket 479 processors are somewhat difficult to find right now, although their performance closely mirrors that of their Socket 478 counterparts. Unfortunately, the Dothan/Banias Socket 478 pinout differs electrically from the typical desktop Socket 478, so you will need a Socket 479 board with a Socket 479 Dothan if you plan on using any of these Pentium Ms in your desktop anytime soon.
Looking at /proc/cpuinfo, we can discern the following:
dave:~/bench/gcc/linux-2.6.4 # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.10GHz
stepping : 6
cpu MHz : 2104.892
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe tm2 est
bogomips : 4177.92
The BogoMIPS score seems fairly accurate: at roughly twice the clock speed, it is in line with the 3.6GHz Nocona, which reports about 7200 BogoMIPS per physical processor. However, note that the flags list shows no SSE3 enhancements, HyperThreading or EM64T addressing capability. All desktop Pentium M processors today are derived from the blade server market, and their feature set reflects that: blade servers are designed to be small, fast and cool, and putting 8GB of memory in a blade would not make a lot of sense.
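The feature checks above are easy to automate. The following is a minimal Python sketch, not a tool used in this review, that parses /proc/cpuinfo-style text, reports which of SSE3, HyperThreading and EM64T are missing, and computes the BogoMIPS-to-MHz ratio. It assumes the 2.6-kernel flag names "pni" (SSE3), "ht" (HyperThreading) and "lm" (long mode, i.e. EM64T); the function names are hypothetical.

```python
# Minimal sketch: inspect /proc/cpuinfo text for the features discussed above.
# Assumes 2.6-kernel flag names: "pni" = SSE3, "ht" = HyperThreading, "lm" = EM64T.

SAMPLE = """\
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.10GHz
stepping : 6
cpu MHz : 2104.892
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe tm2 est
bogomips : 4177.92
"""

FEATURE_FLAGS = {"pni": "SSE3", "ht": "HyperThreading", "lm": "EM64T"}


def parse_cpuinfo(text: str) -> dict[str, str]:
    """Parse 'key : value' lines of a single-processor cpuinfo block."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def missing_features(info: dict[str, str]) -> list[str]:
    """Return the names of features whose flags are absent."""
    flags = set(info.get("flags", "").split())
    return [name for flag, name in FEATURE_FLAGS.items() if flag not in flags]


def bogomips_per_mhz(info: dict[str, str]) -> float:
    """Ratio of the kernel's BogoMIPS estimate to the reported clock."""
    return float(info["bogomips"]) / float(info["cpu MHz"])


if __name__ == "__main__":
    cpu = parse_cpuinfo(SAMPLE)
    print(cpu["model name"], missing_features(cpu), round(bogomips_per_mhz(cpu), 2))
```

On a live system, the contents of /proc/cpuinfo can be passed in place of SAMPLE; multiprocessor machines repeat the block once per CPU, and this sketch simply keeps the last value seen for each key.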
As you can also see from the information above, our Dothan 2.1GHz is stepping 6, "F". The processor has 64KB of L1 cache and 2MB of L2 cache. The DFI motherboard that we use in this analysis keeps our Dothan bus at 100MHz while pushing the clock multiplier up to 21X, effectively running the CPU at 2.1GHz with a 400MHz (quad-pumped 100MHz) FSB. During the test, we also clock the bus at 133MHz and run the multiplier at 16X, which effectively runs our CPU at 2133MHz with the full 533FSB. This also skews our memory clock a bit: in the first, 400MHz configuration, we are running DDR333 (100MHz with a 5:3 ratio). In the second configuration, we use 133MHz at a 4:3 ratio. This is perfectly normal behavior, although keep in mind that the default configuration of our setup runs the memory at DDR200 with the 400MHz FSB. We also expect these lower memory clocks to limit the Dothan's performance in the long run.
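The clock arithmetic above can be written out explicitly. This is a small Python sketch with hypothetical helper names; it uses Fraction so that the "133MHz" bus (really 133.33MHz) stays exact, assumes the FSB rating is four transfers per bus clock (the quad-pumped bus), and assumes a "5:3" ratio means memory clock : bus clock, as in the DDR333 example.

```python
from fractions import Fraction

# Minimal sketch of the bus/multiplier arithmetic (hypothetical helpers).
BUS_100 = Fraction(100)      # MHz
BUS_133 = Fraction(400, 3)   # the "133MHz" bus is really 133.33...MHz


def core_clock_mhz(bus_mhz: Fraction, multiplier: int) -> Fraction:
    """Core clock = bus clock x multiplier."""
    return bus_mhz * multiplier


def fsb_rating(bus_mhz: Fraction) -> Fraction:
    """Quad-pumped bus: four transfers per bus clock (100MHz -> '400MHz FSB')."""
    return bus_mhz * 4


def memory_clock_mhz(bus_mhz: Fraction, mem_part: int, bus_part: int) -> Fraction:
    """Memory clock for a mem:bus ratio (assumed order); DDR data rate is twice this."""
    return bus_mhz * mem_part / bus_part


for label, bus, mult in (("400FSB", BUS_100, 21), ("533FSB", BUS_133, 16)):
    print(label, float(core_clock_mhz(bus, mult)), "MHz core,",
          float(fsb_rating(bus)), "MHz FSB rating")

mem = memory_clock_mhz(BUS_100, 5, 3)
print("5:3 on a 100MHz bus:", round(float(mem), 1), "MHz, DDR", round(2 * mem))
```

Only the first configuration's memory ratio is worked through here, since the 5:3 case is the one whose result (DDR333) the text states explicitly.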
Motherboard Details
As we briefly mentioned earlier, all desktop Dothan/Banias motherboards are derived from blade configurations. The key term used when describing a blade is density; things tend to be a little smaller, and cost generally takes a back seat to thermal and size reduction. As with a Beowulf cluster, there isn't much need for top of the line, best of breed components so long as the components used are reliable, cool and small. After all, what is a 10% dip in performance on one blade if you can double the number of blades that fit in the same rack?
All Pentium M blades run on Intel's notebook 855GME chipset. This chipset really doesn't differ from any other Dothan notebook chipset: DDR1, AGP/PCI, ICH5 and a 400MHz front side bus. The 6300ESB southbridge provides a 64-bit PCI-X (not to be confused with PCI Express) bus, which is generally dedicated to fiber optic networking.
For this set of benchmarks, we selected the DFI 855GME-MFG motherboard, but AOpen also sells a retail i855GME motherboard. Even though both motherboards come in the MicroATX form factor and use relatively old bridges, these boards are very expensive, mostly because they have no competition! At the time of publication, our DFI 855GME-MFG cost a little over $250, which is a considerable amount to pay for a motherboard. On the other hand, by the standards of a top of the line Socket 775 motherboard with all the trimmings, $250 isn't too much to spend.
DFI and AOpen do not differ much in the design of their desktop Dothan motherboards, but DFI's board has a few extra amenities. A Realtek gigabit Ethernet port, six-channel VIA audio and Winbond FireWire are also standard on this motherboard, although we had difficulty getting SuSE 9.1 and the Realtek 8110S Ethernet controller to play well together due to driver conflicts. For the duration of the analysis, we used an Intel Pro/1000+ Ethernet controller, which should not affect our benchmarks. Our motherboard only supports a 4X AGP bus, but as we have seen in dozens of benchmarks before, that should hardly affect video performance, if at all.
A bit about SpeedStep, Thermals, Power and Noise
Although this is not the first time that we have looked at Dothan, the exciting bit about the technology is that it requires so little power and, thus, so little cooling. The only active cooling required in our Pentium M setup was a 40mm fan on the processor heatsink; the northbridge is cooled passively.
Another reason why we selected the DFI motherboard for this roundup was its elegant cooling. The HSF combo is proprietary to this motherboard, but it fits easily in a 1U or SFF case with plenty of clearance and low noise. When we had our test rig set up in an aluminum Hornet Pro SFF chassis from Monarch Computers, we only needed a single low-RPM 80mm fan and the 40mm CPU HSF to cool the rig. At full load, the Dothan desktop system ran at less than 30 dBA, too quiet for our Extech devices to even register a measurement at 12 inches.
After a full hour of operation, our BIOS reported the Dothan at a "cool" 98 degrees Fahrenheit (about 37 degrees Celsius). The 40mm fan is clearly ample for our purposes; we did a little overclocking up to 2.4GHz and the same HSF combo held up fine. More importantly, the combo came free with the motherboard. What interests us even more is that this configuration is not even running Enhanced SpeedStep! The 2.6 Linux kernel provides an excellent method of adjusting the CPU clock, dubbed "CPU Frequency Scaling". Unfortunately, this option is not enabled by default in most kernel configurations and requires a recompile.
In the .config file of a 2.6.x kernel build directory, we have to change the following lines:
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
Both lines should now read "y" for yes. After rebooting into the recompiled kernel, setting the processor clock speed is as easy as this:
echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
With the userspace governor active (selected by writing "userspace" to scaling_governor in the same directory), the kernel's cpufreq driver then clocks the processor at 600,000 kHz, or 600MHz. Although this amount of control is great for us right now, running a real-time daemon that monitors CPU usage would probably be more beneficial. "cpufreqd" and "speedfreqd" both take control of CPU scaling and work as well as, if not arguably better than, the Windows driver that does the same thing. For the tests in this analysis, both daemons were disabled; however, if you plan on getting the most out of your laptop or Dothan desktop, you should run a daemon for the best thermals and power usage.
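To illustrate what such a daemon does, here is a minimal Python sketch of a load-driven policy built on the userspace governor. It is not how cpufreqd or speedfreqd are implemented: the 80% threshold, the proportional policy and the example frequency list are assumptions, and the function names are hypothetical. It reads the aggregate "cpu" line of /proc/stat, where the fourth counter is idle time; writing to scaling_setspeed requires root and the userspace governor.

```python
import time
from pathlib import Path

# Minimal sketch of a load-driven frequency policy for the userspace governor.
# Threshold and policy are illustrative assumptions, not cpufreqd's logic.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def cpu_times(stat_text: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_text.splitlines()[0].split()
    counters = [int(v) for v in fields[1:]]
    return counters[3], sum(counters)  # 4th counter is idle time


def utilization(prev: tuple[int, int], cur: tuple[int, int]) -> float:
    """Fraction of non-idle time between two samples."""
    idle = cur[0] - prev[0]
    total = cur[1] - prev[1]
    return 0.0 if total <= 0 else 1.0 - idle / total


def pick_khz(load: float, freqs_khz: list[int], threshold: float = 0.8) -> int:
    """Jump to the top speed above the threshold, else scale proportionally."""
    freqs = sorted(freqs_khz)
    if load >= threshold:
        return freqs[-1]
    target = load / threshold * freqs[-1]
    return next(f for f in freqs if f >= target)


def run(freqs_khz: list[int], interval_s: float = 1.0) -> None:
    """Poll /proc/stat and set the clock; needs root and the userspace governor."""
    prev = cpu_times(Path("/proc/stat").read_text())
    while True:
        time.sleep(interval_s)
        cur = cpu_times(Path("/proc/stat").read_text())
        khz = pick_khz(utilization(prev, cur), freqs_khz)
        (CPUFREQ / "scaling_setspeed").write_text(str(khz))
        prev = cur
```

Where the cpufreq driver exposes it, the frequency list passed to run() could be read from scaling_available_frequencies in the same directory rather than hard-coded.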
Database Tests
MySQL 4.0.20d has been a staple of our Linux tests since their inception. Even though it is not highly relevant to a workstation test, we still regard it as the de facto free, open source benchmark for Linux. Below, you can see our sql-bench results on the 32-bit kernel for SuSE 9.1.
Our first benchmark with Pentium M seems almost too good to be true. Just to verify that our results are sane, we re-ran the same tests with several different multipliers set on the Dothan 2.1GHz.
Surprisingly, things look very favorable for Pentium M thus far. However, this is a mildly synthetic benchmark and it may not represent real world use as closely as many of our other benchmarks do. As you can see, we overclocked and underclocked the processor a little bit just to get a feel for what our Pentium M is capable of. Many of our past database tests have shown that Xeons with additional L3 cache outperform Xeons without it, so it seems reasonable to assume that the 2MB L2 cache on the Pentium M is what gives the Dothan its additional boost. To date, the Dothan has more L2 cache than any other processor that we have seen.
Rendering Benchmarks
We use Mental Ray 3.3.1 to render a particularly memory and CPU intensive benchmark scene (which you can download here). Below, you can see how the 32-bit binaries perform on the 32-bit version of SuSE 9.1 Pro.
POV-Ray 3.6.1 was compiled from source using GCC 3.4.1.
We also took the same POV-Ray benchmark and ran it against the Pentium M clocked at speeds from 1.6GHz to 2.4GHz.
As you can see, the Pentium M delivers some healthy performance gains from even a simple 300MHz bump in clock speed (which is so easy to do on Pentium Ms, it seems almost criminal). Although Wes and Anand have upcoming articles that deal specifically with overclocking the Dothan, we should mention some of the excellent overclocking experiences that we had with the chip. Let's just keep our fingers crossed and hope that Intel doesn't decide to start locking these CPUs.
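For context, perfect clock scaling is roughly the best case for what an overclock can buy. The Python sketch below uses hypothetical helpers, not part of our test suite, and assumes a workload whose runtime scales inversely with core clock; memory-bound work will gain less.

```python
# Best-case effect of a clock change, assuming runtime scales inversely with clock.


def ideal_runtime_s(base_runtime_s: float, base_mhz: float, new_mhz: float) -> float:
    """Runtime at new_mhz if the workload were purely core-clock bound."""
    return base_runtime_s * base_mhz / new_mhz


def ideal_speedup_pct(base_mhz: float, new_mhz: float) -> float:
    """Percent throughput gain under perfectly linear clock scaling."""
    return (new_mhz / base_mhz - 1.0) * 100.0


# The 300MHz bump discussed above: 2.1GHz -> 2.4GHz.
print(f"{ideal_speedup_pct(2100, 2400):.1f}% at best")
```

For the 2.1GHz to 2.4GHz bump this works out to roughly 14%, which is a useful yardstick when judging how clock-bound a given benchmark is.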
We added Apple's Shake 3.5c to our benchmarks during the Sun w2100z review, and we retested some of our older processors for these tests. You may download our batch Shake test file from Lindsay Adams here. The render times of all ten frames are summed and listed as the total render time for this analysis.
Dothan takes one of its first real dives here, and unfortunately not its last either.
Content Creation
We compiled lame 3.96.1 without any additional optimizations and then used the following command on an 800MB .wav file.
# lame sample.wav -b 192 -m s -h - > /dev/null
The trailing "-" sends the encoded output to stdout, which is then redirected to /dev/null. We do not want the hard drive to throttle our MP3 encoding, and since the output is discarded immediately, disk writes are taken out of the measurement entirely.
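The same discard-the-output approach can be scripted. Below is a minimal Python sketch of a timing harness, a stand-in for the shell's time builtin rather than what we actually ran: it executes a command with stdout sent to the null device (subprocess.DEVNULL) and reports the median wall-clock time over a few runs. The function names and the default run count are assumptions.

```python
import statistics
import subprocess
import sys
import time


def time_once(argv: list[str]) -> float:
    """Run argv with stdout discarded; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)  # raise on failure
    return time.perf_counter() - start


def benchmark(argv: list[str], runs: int = 3) -> float:
    """Median of several runs, to damp out one-off disturbances."""
    return statistics.median([time_once(argv) for _ in range(runs)])


if __name__ == "__main__":
    # e.g. the lame test above: ["lame", "sample.wav", "-b", "192", "-m", "s", "-h", "-"]
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"{benchmark(cmd):.3f} s")
```

Because the encoder's output never touches the disk, the measured time reflects processing speed rather than drive throughput, just as with the shell redirection.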
Next, we used the SuSE 9.1 Pro i686 bzip2 and gzip RPMs for this portion of the analysis; we recompile both binaries from scratch later to illustrate the effect of compiler optimizations. The 800MB test file from the lame benchmark above was compressed and timed using the command below.
# time gzip -c sample.wav > /dev/null
We compiled MPlayer 1.0pre5 from source without any optimizations. The benchmark command that we ran is below:
# time mencoder sample.mpg -nosound -ovc lavc -lavcopts vcodec=mpeg4:vpass=2 -o sample.avi
Encryption Benchmarks
Finally, our favorite part of any Linux benchmark: hashing and encryption tests. Below, you can see how John the Ripper fared under various compilation options on the processors that we had on hand.
Below, you can see how our processors performed in the OpenSSL "speed" benchmark. You may download the full printout for the Athlon 64 3800+ here, or the one for the Pentium M 2.1GHz with 533FSB.
Performance on all encryption benchmarks was only average. The Dothan keeps up with all processors in the same price range, but it does not out-perform the category leaders in any test.
Compiling Benchmarks
We get a lot of requests to show some compiling benchmarks. We took the standard Linux 2.6.4 release from kernel.org and compiled it on our 32-bit test bed. For simplicity, we did not cross-compile for other platforms, so we are only looking at the 32-bit vanilla kernel. We used the following commands:
# yes "" | make config
# time make
Here, we see the largest deficiency of the Dothan yet.
The fact that the processor also utilizes much slower DDR333 probably does not help either, and we will have a chance to revisit that theory before the end of the benchmarks. With a moderate amount of overclocking via "scaling_setspeed", we were able to squeeze much better performance out of the chip - and we are still using the 40mm fan!
Update: After several days of trials, we found an inconsistency in our test setup. The PATA controller on our DFI motherboard appears to be behaving irregularly and limiting our transfer rates; we believe this is localized to this particular motherboard. The GCC compile test is the only test in our benchmark suite that is bottlenecked by the hard drive.
Compiler Optimizations
Although TSCP is neither a practical application nor a synthetic benchmark, it does provide us with some valuable data on different combinations of compiler flags and optimizations. As we have mentioned in past Linux analyses, compiler flags can show large differences between processors if they are used incorrectly. Remember, our Pentium M is significantly handicapped relative to the K8 and Pentium 4, which get dedicated optimizations, since GCC seems to think that the Pentium M is merely a Pentium Pro, which it is not.
Where we denote "-march" in the graph, we mean specifically "-march=k8" or "-march=pentium-m", where it applies.
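As a rough illustration of how the "-march" value in these graphs relates to each processor, the hypothetical Python helper below maps the CPUID vendor, family and model (as reported in /proc/cpuinfo) to a GCC 3.4 -march value. Banias and Dothan report family 6, models 9 and 13 respectively; the mapping is a simplification and ignores finer targets such as "prescott".

```python
# Hypothetical helper: choose a GCC 3.4 -march value from CPUID identifiers.


def gcc_march(vendor: str, family: int, model: int) -> str:
    if vendor == "GenuineIntel":
        if family == 6 and model in (9, 13):  # Banias (9) and Dothan (13)
            return "pentium-m"
        if family == 15:  # NetBurst: Pentium 4 / Xeon
            return "pentium4"
    if vendor == "AuthenticAMD" and family == 15:  # K8: Athlon 64 / Opteron
        return "k8"
    return "i686"  # conservative P6 baseline, as most distributions use


# The Dothan from our /proc/cpuinfo listing: family 6, model 13.
print("-march=" + gcc_march("GenuineIntel", 6, 13))
```

As the results on this page show, picking the "right" -march for Pentium M does not by itself guarantee better code from GCC 3.4; the helper only formalizes which flag corresponds to which chip.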
We also retested some of our content creation benchmarks to show the effect of setting "-march=pentium-m" or "-march=pentium4" at compile time. For good measure, we threw in some compilations of the same programs using the Intel C++ compiler, icc. If any compiler is going to fully utilize the advantages of Dothan on Linux, we would expect it to be Intel's own.
As expected, there are large differences between the compile flags. Setting the architecture flag to "pentium4" degrades performance severely, while there is no difference in performance between the "pentium-m" and "i686" flags. ICC, on the other hand, gives Pentium M a nice boost when optimizing for Pentium M.
Memory Analysis
While testing the Pentium M on Linux, we came to the unofficial conclusion that Dothan was coming to a screeching halt in a lot of our benchmarks because it ran on antiquated DDR333. To put that theory to the test, we took a few of the more memory intensive benchmarks and put the Dothan through its paces using different memory speeds: DDR200, DDR266 and DDR333. To run all of these tests at the same bus speed and multiplier, we had to tweak the memory ratio settings a bit, but fortunately, the motherboard was versatile enough to let us enable all of these modes. For this portion of the analysis, we are running the processor on a 100MHz bus with a 21X multiplier.
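A quick calculation shows why memory speed matters here. The Python sketch below computes theoretical peak bandwidth, assuming a single 64-bit (8-byte) memory channel on the 855GME and a quad-pumped 64-bit front side bus; these are peak figures, not measurements, and the helper names are hypothetical. Under those assumptions, DDR333 can supply at most about 2.7GB/s against the 3.2GB/s that a 400MHz FSB can move, and DDR200 manages only half the FSB's rate.

```python
from fractions import Fraction

# Theoretical peak bandwidth in MB/s (1 MB = 10^6 bytes), assuming 64-bit paths.
BUS_BYTES = 8


def ddr_peak_mb_s(mem_clock_mhz: Fraction) -> Fraction:
    """DDR: two transfers per memory clock over a 64-bit channel."""
    return mem_clock_mhz * 2 * BUS_BYTES


def fsb_peak_mb_s(bus_mhz: Fraction) -> Fraction:
    """Quad-pumped FSB: four transfers per bus clock over 64 bits."""
    return bus_mhz * 4 * BUS_BYTES


MEMORY = {"DDR200": Fraction(100), "DDR266": Fraction(400, 3), "DDR333": Fraction(500, 3)}
fsb = fsb_peak_mb_s(Fraction(100))  # the 100MHz bus used in this section
for name, clock in MEMORY.items():
    peak = ddr_peak_mb_s(clock)
    print(f"{name}: {float(peak):.0f} MB/s ({float(peak / fsb):.0%} of the 400FSB)")
```

Since the processor can request data faster than any of these memory settings can deliver it, benchmarks that miss the 2MB L2 cache should track memory speed closely.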
Now that we have proven the obvious (DDR333 is faster than DDR200), consider the implications of these benchmarks. Our DDR400 overclocking experiment should provide a good preview of what an officially sanctioned DDR400 platform will deliver, even though the next generation Alviso chipset will also support DDR2 with different latencies.
Summing it all up
Overall, Dothan provided us with some sporadic, but interesting, performance gains and losses. Unfortunately, Pentium M just doesn't scale like Pentium 4 or Athlon 64 in any application, although it does occasionally mimic the performance of one or the other. On our OpenSSL tests, Dothan consistently edged out even our mid-range Athlons, but then fell far behind in compilation and some content creation tests.
There are, however, bottlenecks in the performance. High speed memory is something that our Dothan severely lacked on Linux, and we would certainly like to see the next generation Alviso chipset support something a little faster than DDR400. However, as Pentium M is a notebook chip first and a blade/desktop chip second, the demands of low power notebook memory will certainly take priority over a niche SFF/HTPC crowd.
The first surprise in our analysis came with the SQL database tests. Our Windows benchmarks have shown in the past that additional L3 cache can be quite helpful for database applications, and the 2MB L2 cache found on the Dothan plays a huge part in boosting performance. On the other hand, the additional cache might also have been the reason why GCC performed so poorly, although we hope that the Linux compile test was just a fluke (Update: Please see the note on the Compiling page. We believe we had an isolated fluke with the PATA driver that limited our performance). Other benchmarks put Dothan right in the upper middle of the pack, usually beating the Pentium 4 offerings and occasionally beating the best that our Athlon 64s could produce as well.
Dothan isn't the miracle chip that we would have liked it to be. For starters, it is still horribly expensive. The 2.1GHz Dothan that we previewed today sells for around $500, and the motherboard costs another $270. Just for a barebones configuration, our Pentium M desktop comes to around $1000. Granted, the overclockability of the Pentium M seems outstanding, but finding slower, cheaper Dothans in the Socket 479 pin configuration may be a problem.
Unfortunately, we are only getting a small glimpse of the story here today. Our preliminary Windows benchmarks show that Dothan does some awesome things there; the compilers and operating system get a little more help from Intel in the design phase. Unfortunately, the extremely powerful and free Linux compiler, GCC, remains largely unaware of many of the benefits that Pentium M has to offer, and as a result, the chip suffers painfully under the default or wrong compile flags.
All in all, Dothan does some very exciting things. The promise of cool, efficient powerhouses, from Intel no less, certainly has our attention. We will be keeping a very close eye on Pentium M over the next few months, particularly with the upcoming Alviso launch. If Dothan's Linux performance holds up this well on the 855 chipset, we can't wait to see what it does with faster memory and the 915 northbridge.