Original Link: http://www.anandtech.com/show/2798



I get some sort of odd satisfaction when Apple releases a product whose fundamental improvement is a new CPU. It happened when Apple first announced the MacBook Air, and it happened once more with the new iPhone 3GS:


From Left to Right: iPhone 3GS, iPhone 3G, iPhone

From a distance you can’t tell it apart from the iPhone 3G, which itself was arguably a step back in design from the original aluminum iPhone. But Apple products only sell because they look pretty, right? How on earth would Apple ever justify selling an iPhone 3GS whose fundamental improvement is inside its pretty plastic?

To make matters worse, Apple has trained its users to expect significant changes in styling and UI on a regular basis. Just look at the progression of Mac OS X over the past several years. However, even with the latest iPhone OS release and significant technological pressure from Palm, the UI remains unchanged.

Competing against Microsoft and the other smartphone makers was pretty easy. Just do what they did, only better. But here we have the latest iPhone and it’s already behind the Palm Pre in a number of key features. Uh-oh.

Yes, the S stands for Speed with the new iPhone 3GS but is that enough to keep this train rolling? We needed speed last year with the iPhone 3G, and all we got was a faster modem and lower battery life. Now we need multitasking support and we finally get a faster processor. Apple seems to be one step behind in the needs department, which is the perfect recipe for a company like Palm to step in and surprise.

Stepping away from the broader picture for a moment, Apple heads and haters alike can both appreciate the technology behind the 3GS, because the transition itself echoes what we’ve seen happen in the PC industry over the past two decades.

In my first iPhone 3GS article I compared the CPU upgrade to what we saw going from the 486 to Intel’s Pentium processor in the mid 1990s. Perhaps we need a quick refresher in CPU architecture? I’ll see if I can keep this succinct.



A Crash Course in CPU Architecture

It’s been years since I’ve walked through the life of an instruction, and the last time I did, the subject was a very high end desktop processor. I realize that not everyone interested in what’s powering the iPhone 3GS or Palm Pre has been down this path, so I thought some of that knowledge might be useful here.

Applications spawn threads, threads are made up of instructions and instructions are what a CPU “processes”. The actual processing of an instruction is pretty simple; the CPU must fetch the instruction from memory, decode or somehow understand what the instruction is telling it to do (e.g. add two numbers), grab any data that is required by the instruction (e.g. find the numbers to be added), actually execute the instruction and finally write the result of the operation either to a register or memory.


Our basic microprocessor with a 5-stage pipeline

Based on the example above, executing an instruction requires five distinct stages. In a pipelined microprocessor, a different instruction can be active at each stage of the execution pipeline. For example, you can be grabbing data for one instruction, while decoding another and fetching yet another. All modern day processors work this way.


Multiple instructions can exist in the pipeline at once, but only one instruction may be active at any given stage

Each one of these stages should take the same amount of time for the processor to work efficiently; the length of time required at the longest stage actually determines the clock speed of the CPU. If the most complex stage in my example above is the decode stage and it requires 3ns to complete, then my CPU can run no faster than 333MHz (1 / 3ns).
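The link between the slowest stage and the clock ceiling can be sketched in a few lines of Python. Only the 3ns decode figure comes from the example above; the other stage latencies and the helper name `max_clock_mhz` are assumptions made up for illustration.

```python
# Hypothetical per-stage latencies in nanoseconds. Only the 3ns decode
# figure comes from the text; the other values are assumed.
stage_latency_ns = {
    "fetch": 1.0,
    "decode": 3.0,
    "fetch_operands": 1.0,
    "execute": 2.0,
    "write_output": 1.0,
}

def max_clock_mhz(latencies_ns):
    """Clock ceiling set by the slowest stage: f = 1 / longest latency."""
    slowest_ns = max(latencies_ns.values())
    return 1000.0 / slowest_ns  # 1/ns is GHz, so scale by 1000 for MHz

print(f"{max_clock_mhz(stage_latency_ns):.0f} MHz")
```

Every other stage finishes early and simply waits, which is why balancing stage lengths matters as much as shortening any one of them.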

To reach faster frequencies, we need to speed up each stage of the pipeline. You can speed up a stage by implementing some sweet new algorithms, or simply by splitting up complicated stages into simpler ones and increasing the number of stages in your pipeline.

In our previous example, the decode stage required 3ns to complete but if we split decode into three separate stages, each requiring 1ns, then we remove that bottleneck. Let’s say we do that but now some of our other stages become the bottleneck; with a target of a 1ns clock period (1ns spent per stage) we go from five stages to eight:

Fetch
Decode 1
Decode 2
Decode 3
Fetch Operands
Execute 1
Execute 2
Write Output

Now, with each stage running at 1ns, our maximum clock speed goes up from 333MHz to 1000MHz (1GHz). Sweet. Right?

With less work being done in each stage, we reach a higher clock speed, but we also depend on each stage being full in order to operate at peak efficiency.


5-stage pipeline (top) vs 8-stage pipeline (bottom). The 8 stage pipe is more desirable, but also requires more instructions to fill.

In the first CPU example we had a 5 stage pipeline, which meant that we needed to have the pipe full of 5 instructions at any given time to be operating at peak efficiency of 1 instruction completed every cycle. The second example has a ginormous 8 stage pipeline, which requires 8 instructions in the pipe for peak efficiency. In both cases you can only get one instruction out of the pipe every cycle, but the second chip can give us more completed instructions in say, 10 seconds.
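As a rough sketch of the trade-off, the time for an ideal, stall-free pipeline to finish a batch of instructions can be modeled as the fill time plus one completion per cycle. The function name and the batch size are assumptions for illustration; real pipelines stall and will do worse than this.

```python
def ideal_run_time_ns(num_instructions, stages, clock_period_ns):
    """Time for a stall-free pipeline to finish a batch of instructions.

    The first instruction needs `stages` cycles to travel the whole pipe;
    after that, one instruction completes per cycle.
    """
    cycles = stages + (num_instructions - 1)
    return cycles * clock_period_ns

n = 1_000_000  # arbitrary batch size
five_stage = ideal_run_time_ns(n, stages=5, clock_period_ns=3.0)
eight_stage = ideal_run_time_ns(n, stages=8, clock_period_ns=1.0)

# The ratio approaches 3x as the batch grows: the extra fill cycles of the
# deeper pipe are paid once, while the shorter clock period pays off every cycle.
print(five_stage / eight_stage)
```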

Now think for a moment about the time periods we’re talking about here. The first CPU had a clock period of 3ns, where each stage took 3ns to complete. The second CPU had a clock period of 1ns. A single trip to main memory can easily take 60ns for a CPU with a very fast on-die memory controller, or over 100ns otherwise. For the sake of argument let’s say that we’re talking about a 100ns trip to main memory. Remember the Fetch Operands stage? Well if those operands are located in main memory that stage won’t take 3ns to complete, but rather 103ns since it has to get the operands from main memory.
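To put the 100ns trip in terms the pipeline cares about, here is a small sketch that converts memory latency into lost clock cycles for the two example CPUs. The helper name is hypothetical, and the model ignores caches entirely.

```python
import math

def stall_cycles(memory_latency_ns, clock_period_ns):
    """Clock cycles that elapse while waiting on one main-memory access."""
    return math.ceil(memory_latency_ns / clock_period_ns)

print(stall_cycles(100, 3.0))  # 5-stage design with a 3ns clock period
print(stall_cycles(100, 1.0))  # 8-stage design with a 1ns clock period
```

The faster design waits the same 100ns, but that wait now costs roughly three times as many cycles in which it could have been completing instructions.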

Modern processors will perform a context switch upon any memory access to avoid stalling the pipeline for such an absurd length of time. The contents of the pipeline get flushed and filled with another thread while the data request goes off to main memory. Once the data is ready, the processor switches contexts once more and continues on its execution path. Here’s the problem: it takes time to refill the pipeline, and the longer the pipeline, the longer it takes to refill it. This is a bad, but regular occurrence in a microprocessor. Our instruction throughput drops from its 1 instruction per clock peak to 0; not good.

Other scenarios can create interruptions in the normal flow of things within our microprocessor. Some instructions may take multiple cycles at a single stage to complete. More complex arithmetic may spend significantly longer at the execute stage while the operation works out. With an in-order microprocessor, all instructions behind it must wait.

Again, the more stages in your pipeline, the bigger the penalty for a stall. But when the pipeline is full, a deeper pipeline will give us a higher clock speed and better overall performance - we just need to worry about keeping the pipeline full (which takes a great deal of additional transistors). And yes, there is an upper limit to how deep you can pipeline your processor before you start running into diminishing returns in both a performance and power sense; this was ultimately the downfall of the Pentium 4’s architecture.



Superscalar to the Rescue

If deepening the pipeline gives us higher clock speeds and more instructions being worked on at a time, but at the expense of lower performance when things aren’t working optimally, what other options do we have for increasing performance?

Instead of going deeper, what about making our chip wider? In our previous example only a single instruction could be active at any given stage in the pipeline - what if we removed that limitation?

A superscalar processor is one that allows multiple instructions to be active at any given stage in the pipeline. Through some duplication of resources you can now have two or more instructions at the same stage at the same time. The simplest superscalar implementation is a dual-issue, where two instructions can go down the pipe in parallel. Today’s Core 2 and Core i7 processors are four issue (four instructions go down the pipe in parallel); the high end hasn’t been dual issue since the days of the original Pentium processor.

The benefits of a superscalar chip are obvious: you potentially double the number of completed instructions at any given time. Combine that with a reasonably pipelined, high clock speed architecture and you have the makings of a high performance processor.

The drawbacks are also obvious; enabling a multi-issue architecture requires more transistors, which drive up die size (cost) and power (heat). Only recently have superscalar designs made their way into mobile devices thanks to smaller and cooler switching transistors (e.g. 45nm). You also have to worry even more about keeping the CPU fed with instructions, which means larger caches, faster memory buses and clever architectural tricks to extract as much instruction level parallelism as possible. A dual issue chip is a waste if you can’t keep it fed consistently.

Raw Clock Speed

The previous two examples of architectural enhancements are major improvements in design. To design a modern day CPU with more pipeline stages or to go from a single to dual-issue design takes a team years to implement; these are not trivial improvements.

A simpler path to improving performance is to just increase the clock speed of the CPU. In the first example I provided, our CPU could only run as fast as the most complex pipeline stage allowed it. In the real world however, there are other limitations to clock speed.

Manufacturing issues alone can severely limit clock speed. Even though an architecture may be capable of running at 1GHz, the transistors used in making the chip may only be yielding well at 600MHz. Power is also a major concern. A transistor usually has a range of switching speeds. Our hypothetical 45nm process may be able to run at 300MHz at 0.9500V or 600MHz at 1.300V; higher frequencies generally mean higher voltage, which results in higher power consumption - a big issue for mobile devices.
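To get a feel for why voltage matters so much, here is a sketch using the common first-order approximation that dynamic power scales with capacitance times voltage squared times frequency. That formula is not from the text above, and the model assumes equal switched capacitance at both operating points and ignores leakage, so treat the result as a ballpark only.

```python
def relative_dynamic_power(voltage, freq_mhz, base_voltage, base_freq_mhz):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Capacitance C is assumed identical at both operating points, so it
    cancels out of the ratio. Leakage power is ignored.
    """
    return (voltage / base_voltage) ** 2 * (freq_mhz / base_freq_mhz)

# The two hypothetical 45nm operating points from the paragraph above.
ratio = relative_dynamic_power(1.3, 600, base_voltage=0.95, base_freq_mhz=300)
print(f"{ratio:.2f}x the power for 2x the clock")
```

Under these assumptions, doubling the clock costs well over three times the power, which is the kind of trade a phone maker is reluctant to make.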

The iPhone’s processor is based on a SoC that can operate at up to 600MHz, but for power (and battery life) reasons Apple/Samsung limit the CPU core to running at 412MHz. The architecture can clearly handle more, but the balance of power and battery life gates us. In general, increasing clock speed alone isn’t a desirable option to improve performance in a mobile device like a smartphone because your performance per watt doesn’t improve tremendously if at all.

In terms of sheer performance however, just increasing clock speed is preferred to deepening your pipeline and increasing clock speed. With no increase in pipeline depth you don’t have to worry about keeping any more stages full, everything just works faster if you increase your clock speed.

The key takeaway here is that you can’t just look at clock speed when it comes to processors. We learned this a long time ago in the desktop space, but it seems that it’s getting glossed over in the smartphone market. A 400MHz dual-issue core is going to be a better performer than a 500MHz single-issue core with a deeper pipeline, and the 528MHz processor in the iPod Touch is nowhere near as fast as the 600MHz processor in the iPhone 3GS.
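A back-of-the-envelope way to see the point is to compare theoretical peak throughput, taken here as clock speed multiplied by issue width. This is a deliberately crude ceiling (the helper name is made up, and no real workload fills every issue slot on every cycle), but it shows why megahertz alone is misleading.

```python
def peak_mips(clock_mhz, issue_width):
    """Upper bound on millions of instructions per second.

    Assumes every issue slot is filled on every cycle, which real code
    never achieves; treat this as a ceiling, not a prediction.
    """
    return clock_mhz * issue_width

dual_issue = peak_mips(400, issue_width=2)
single_issue = peak_mips(500, issue_width=1)
print(dual_issue, single_issue)
```

The slower-clocked dual-issue core has the higher ceiling; how close it gets depends on how well the caches and front end keep both slots fed.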



Putting it in Perspective

Below is a table of the CPUs used in some of the top smartphones on the market. Let’s put our newly refreshed knowledge to the test.

| Smartphone | CPU | Issue Width | Basic Pipeline | Clock Speed |
|---|---|---|---|---|
| Apple iPhone/iPhone 3G | Samsung ARM11 | single | 8-stage | 412MHz |
| Apple iPhone 3GS | Samsung ARM Cortex A8 | dual | 13-stage | 600MHz |
| HTC Hero | Qualcomm ARM11 | single | 8-stage | 528MHz |
| Nokia N97 | ARM11 | single | 8-stage | 424MHz |
| Palm Pre | TI ARM Cortex A8 | dual | 13-stage | 600MHz |
| RIM Blackberry Storm | Marvell ARM11 | single | 8-stage | 624MHz |
| T-Mobile G1 | ARM11 | single | 8-stage | 528MHz |

 

The first thing you’ll notice is that there are a number of manufacturers of the same CPUs. Unlike the desktop x86 CPU market, there are a multitude of players in the ARM space. In fact, ARM doesn’t manufacture any processors - it simply designs them. The designs are then licensed to companies like Marvell, Samsung, Texas Instruments and Qualcomm. Each individual company takes the ARM core they’ve licensed and surrounds it with other processors (e.g. graphics cores from PowerVR) and delivers the entire solution as a single chip called a System on a Chip (SoC). You get a CPU, GPU, cellular modem and even memory all on a single chip, all with minimal design effort.


A derivative of this is what you'll find in the iPhone 3GS

While it takes ARM a few years to completely architect a new design, their licensees can avoid the painful duty of designing a new chip and just license the core directly from ARM. ARM doesn’t have to worry about manufacturing and its licensees don’t have to focus on building world class microprocessor design teams. It’s a win-win situation for this business.

For the most part, ARM’s licensees don’t modify the design much at all. There are a few exceptions (e.g. Qualcomm’s Snapdragon Cortex A8), but usually the only things that will differ between chips are clock speeds and cache sizes.

The fundamentals of the architectures don’t vary from SoC to SoC; what does change are the clock speeds. Manufacturers with larger batteries and handsets can opt for higher clock speeds, while others will want to ship at lower frequencies. The ARM11 based products all fall within the 400 - 528MHz range. These are all single-issue chips with an 8-stage pipeline.

| | iPhone 3G (ARM11) | iPhone 3GS (ARM Cortex A8) |
|---|---|---|
| Manufacturing Process | 90nm | 65nm |
| Architecture | In-Order | In-Order |
| Issue Width | 1-issue | 2-issue |
| Pipeline Depth | 8-stage | 13-stage |
| Clock Speed | 412MHz | 600MHz |
| L1 Cache Size | 16KB I-Cache + 16KB D-Cache | 32KB I-Cache + 32KB D-Cache |
| L2 Cache Size | N/A | 256KB |

 

The iPhone 3GS and the Palm Pre both ship with a Cortex A8. I’m actually guessing at the clock speeds here; there’s a chance that both of these devices run at closer to 500MHz, but it’s tough to tell without querying the hardware at a lower level. The Cortex A8 gives us a deeper pipeline, and thus higher clock speeds, as well as a dual issue front end. The end result is significantly higher performance. Apple promised > 2x performance improvements from the iPhone 3GS over the iPhone 3G, and such an increase was only possible with a brand new architecture.

I must stress this again: clock speed alone doesn’t determine the performance of a processor. Gizmodo’s recent N97 review complained about the speed of Nokia’s 424MHz processor (rightfully so). The review continued by saying that HTC uses 528MHz processors, implying that Nokia should do the same. The second part isn’t what Nokia should be doing on its $500+ smartphone; what is inexcusable is the fact that Nokia is not using ARM’s latest and greatest Cortex A8 on such an expensive phone. It’s the equivalent of Dell shipping a high end PC with a Core 2 Duo instead of a Core i7; after a certain price point, the i7 is just expected.



More Detail on ARM11 vs. Cortex A8

We’ve gone through the basic architectural details of the ARM11 and Cortex A8 cores, and across the board the A8 is far ahead. It gets even better for the new design once we drill a little deeper.

The L1 cache in the A8 gets a significant improvement. The ARM11 core had a 2 cycle L1 cache, while the A8 has a single cycle L1. In-order cores depend heavily on fast memory access, so an even faster L1 will have a dramatic impact on performance.

ARM11 actually supported an L2 cache but it was rarely used; the Cortex A8 is designed with a tightly coupled L2 cache varying in size. Vendors can choose from cache sizes as small as 128KB all the way up to 1MB, with a minimal access latency of 8 cycles. The L2 access time is programmable, with slower access more desirable to save power.

The caches also include way prediction to minimize the number of cache ways active when doing a cache access. This sort of cache level power management was also used by Intel back on the first Pentium M processors and is still used today in modern x86 processors.

The ARM11 core supported a 64-bit bus that connected it to the rest of the SoC; Cortex A8 allows for either a 64-bit or 128-bit bus. It’s unclear what vendors like Samsung and T.I. have implemented on their A8 based SoCs.


The S is for speed. Powered by the ARM Cortex A8.

With a deeper pipeline, the Cortex A8 also has a much more sophisticated branch prediction unit. While the ARM11 core had an 88% accurate branch predictor, the Cortex A8 can correctly predict branches over 95% of the time. If you care about stats, the A8 has a 512 entry branch target buffer and a 4K entry global history buffer. The accuracy of the branch predictor in the Cortex A8 is actually as high as what AMD claimed with its first Athlon processor, and this is an in-order core in a smartphone. With a 13-stage pipe however, a very accurate predictor was necessary.
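A quick sketch shows why the deeper pipe needs the better predictor. It assumes, as a simplification that is not from ARM's documentation, that every mispredicted branch costs one full pipeline refill (8 cycles on ARM11, 13 on Cortex A8); the real penalties may differ, and the function name is hypothetical.

```python
def avg_mispredict_cost(accuracy, flush_penalty_cycles):
    """Average cycles lost per branch to mispredictions."""
    return (1.0 - accuracy) * flush_penalty_cycles

# Assumption: a mispredict costs roughly one full pipeline refill.
arm11 = avg_mispredict_cost(0.88, flush_penalty_cycles=8)
a8 = avg_mispredict_cost(0.95, flush_penalty_cycles=13)
a8_with_old_predictor = avg_mispredict_cost(0.88, flush_penalty_cycles=13)

print(f"{arm11:.2f} {a8:.2f} {a8_with_old_predictor:.2f}")
```

Under this model, pairing the 13-stage pipe with the older 88% predictor would lose noticeably more cycles per branch than the ARM11 does, while the 95% predictor brings the cost below the ARM11's despite the longer pipe.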

While ARM11 supported some rudimentary SIMDfp instructions, Cortex A8 adds a full SIMDfp instruction set with NEON. ARM expects a greater than 2x improvement on media processing applications thanks to the A8’s NEON instructions - of course you’ll need to compile directly for NEON in order to see those gains. If you’re looking for a modern day relation, NEON is like the A8’s SSE whereas ARM11 basically had a sophisticated MMX equivalent. Both are very important.

The Cortex A8 is a more power hungry core than the ARM11, but the design also has much more extensive clock gating (turning off the clock to idle parts of the chip) than the ARM11. Since the A8 is newer it’s also going to be manufactured on a smaller manufacturing process. The bulk of ARM11 based SoCs used 90nm transistors, while A8 based SoCs are shipping at 65nm. ARM11 has started to transition down to 65nm, while A8 will move down to 45nm.

At the same clock speed and with the same L2 cache sizes, ARM shows the Cortex A8 as being able to execute 40% more instructions per second than the ARM11. That’s a generational performance improvement, something that can’t be delivered by clock speed alone, but the comparison is conservative. Cortex A8 designs won’t ship at the same clock speed and cache configurations as ARM11 chips; as far as I can tell, none of the major ARM11 based smartphones even had an L2 cache while Cortex A8 designs are expected to have one.

Furthermore, the ARM11 based smartphones were much lower in the frequency curve than the early A8 platforms. While a 40% improvement in instruction throughput is reasonable at the same specs, I would expect far larger real world performance improvements from a Cortex A8 based SoC compared to an ARM11 SoC.
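As a rough sanity check, the per-clock gain and the clock speed increase can be multiplied together. This sketch assumes the two effects compound cleanly and uses the 600MHz figure, which is itself an estimate; it leaves out the larger L1 and the new L2 cache, so it is a simplified estimate rather than a measurement.

```python
def combined_speedup(per_clock_gain, old_clock_mhz, new_clock_mhz):
    """Naive speedup estimate: per-clock throughput gain times clock ratio."""
    return per_clock_gain * (new_clock_mhz / old_clock_mhz)

# 40% more instructions at the same clock, 412MHz ARM11 vs an assumed 600MHz A8.
estimate = combined_speedup(1.40, old_clock_mhz=412, new_clock_mhz=600)
print(f"{estimate:.2f}x")
```

Even this simple model lands at roughly double the throughput, before any cache benefits are counted.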

Overall the Cortex A8 is much more like a modern day microprocessor. It’s still an in-order core, but it adds superscalar execution, a deeper pipeline, larger caches and a broader instruction set among other things. For any current high end smartphone there doesn’t seem to be a reason to choose the ARM11 over it. Companies that insist on using ARM11 based designs even in 2009 are either not agile enough to implement a better chip in a quick manner or have no concern for performance and are more focused on cost savings. Neither option is a particularly good one and it is telling that the two manufacturers who seem to have gotten how to properly design a smartphone, Apple and Palm, have both opted to go with a Cortex A8 before most of the more established players.

A Call to Action

This leads me to a further point: we need more transparency in specs from smartphone manufacturers. The mobile phone market is all too shielded from the performance metrics and accountability that we’ve had in the PC space. When Intel was shipping Pentium 4s that performed slower than the Pentium IIIs they were replacing, we called them out on it. To this day, Apple refuses to talk about the processor in the iPhone 3GS. We get to hear all about what’s in the Nehalem Mac Pro, but the hardware behind the 3GS is off limits - despite the fact that it’s very good. This policy of not delivering specifics and a general unwillingness to talk about specs is absurd at best. It doesn’t take much more than a teardown and some homebrew code to figure out what CPU at what frequency is in any modern day smartphone; manufacturers should show pride in their hardware, or refrain from putting something inside a phone that they can’t be proud of.

What we need are cache sizes, clock speeds, full architecture disclosures. They don’t have to be on the phone’s marketing materials but make them accessible and at least some of the focus. These SoCs are so incredibly cool, they pack more power than the desktops of 10 years ago into a single chip smaller than my thumbnail - boast about them! Palm had a tremendous leg up on the competition with its OMAP 3430 processor, yet there was hardly any attention paid to it by Palm. I get that the vast majority of consumers don’t get it, but those who do would benefit tremendously if given access to this information. It’s something to get excited about.

And if the manufacturers won’t devote time and energy to this stuff, then I will.



The CPU and its Performance

I keep mentioning that the iPhone 3GS is faster than its predecessor, but these numbers speak louder than anything I can write:

| Application Launch Time | Apple iPhone 3G (3.0) | Apple iPhone 3GS (3.0) |
|---|---|---|
| Star Defense | 54.4 s | 22.9 s |
| Sims 3 | 28.0 s | 9.5 s |
| Resident Evil | 32.0 s | 22.5 s |
| Messaging App | 4.66 s | 1.97 s |
| Mail App | 2.31 s | 0.85 s |
| Search for "Man" | 4.0 s | 1.91 s |
| App Store | 7.2 s | 3.7 s |
| Power On Test | 39.7 s | 25.0 s |
| iPhone 3GS Advantage over iPhone 3G | | 95% |

 

This is a generational improvement in performance folks. The new 3GS is, at worst, only 42% faster than the iPhone 3G. At best? Nearly 200% faster. Apple was right to abandon the aging ARM11 core used in the iPhone 3G in favor of the Cortex A8 in the 3GS. I also wonder if any of these performance gains are helped by using faster NAND flash in the 3GS. It wouldn't be enough to account for all of the performance boost, but perhaps 5 - 10%.
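For anyone who wants to check the percentages, here is a small sketch that recomputes them from the launch times listed above. How the 95% overall figure was derived isn't stated; comparing the summed times happens to reproduce it, so that is the method assumed here.

```python
# (iPhone 3G seconds, iPhone 3GS seconds), copied from the table above.
launch_times_s = {
    "Star Defense": (54.4, 22.9),
    "Sims 3": (28.0, 9.5),
    "Resident Evil": (32.0, 22.5),
    "Messaging App": (4.66, 1.97),
    "Mail App": (2.31, 0.85),
    'Search for "Man"': (4.0, 1.91),
    "App Store": (7.2, 3.7),
    "Power On Test": (39.7, 25.0),
}

def percent_faster(old_s, new_s):
    """How much faster the new time is, as a percentage of the new time."""
    return (old_s / new_s - 1.0) * 100.0

per_app = {name: percent_faster(old, new) for name, (old, new) in launch_times_s.items()}

# Assumed method for the overall figure: ratio of total times.
total_3g = sum(old for old, _ in launch_times_s.values())
total_3gs = sum(new for _, new in launch_times_s.values())
overall = percent_faster(total_3g, total_3gs)

print(f"worst: {min(per_app.values()):.0f}%")
print(f"best: {max(per_app.values()):.0f}%")
print(f"overall: {overall:.0f}%")
```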

WiFi and 3G web page rendering speed is also a lot faster on the 3GS:

| Site (3G) | Apple iPhone (3.0) | Apple iPhone 3G (3.0) | Apple iPhone 3GS (3.0) | Palm Pre (1.03) |
|---|---|---|---|---|
| anandtech.com | 41.0 s | 24.2 s | 14.0 s | 17.0 s |
| arstechnica.com | 34.4 s | 18.2 s | 9.6 s | 13.5 s |
| hothardware.com | 84.3 s | 58.3 s | 19.8 s | 23.0 s |
| pcper.com | 67.1 s | 35.1 s | 18.5 s | 22.1 s |
| digg.com | 75.2 s | 47.2 s | 19.9 s | 24.9 s |
| techreport.com | 44.5 s | 25.2 s | 13.6 s | 12.5 s |
| tomshardware.com | 75.7 s | 28.8 s | 22.2 s | 25.2 s |
| facebook.com | 103.4 s | 46.3 s | 15.4 s | 26.8 s |

 

I re-ran all of my web browsing performance tests on all of the phones to provide the most accurate comparison. I ran the Palm Pre data before the 1.04 OS release came out but apparently that update didn't improve browsing performance so I wouldn't expect much difference there.

The 3GS makes everything faster, including web browsing over the 3G network. Just to be clear, I used the full site versions of all of these web pages - I did not use any mobile or iPhone optimized sites in the timing. I tried to perform all of the tests at the same time to eliminate any network strangeness. Each test was performed three times and I reported the average.

The 3GS is nearly 300% faster than the original iPhone in browsing over the cellular network. Here the 3GS looks to be around 114% faster than the iPhone 3G - definitely worth the upgrade if you do a lot of browsing on your phone. The iPhone 3GS ended up 24% faster than the Palm Pre, but I suspect that most of that is due to performance differences between Sprint and AT&T at my house.
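The aggregate figures can be approximated from the cellular results above. This sketch assumes the comparison is based on total load time across all eight sites, which is a guess at the method; it lands within about a point of the numbers quoted.

```python
# Load times in seconds over 3G, copied from the table above.
# Site order: anandtech, arstechnica, hothardware, pcper, digg,
# techreport, tomshardware, facebook.
cellular_times_s = {
    "iPhone": [41.0, 34.4, 84.3, 67.1, 75.2, 44.5, 75.7, 103.4],
    "iPhone 3G": [24.2, 18.2, 58.3, 35.1, 47.2, 25.2, 28.8, 46.3],
    "iPhone 3GS": [14.0, 9.6, 19.8, 18.5, 19.9, 13.6, 22.2, 15.4],
    "Palm Pre": [17.0, 13.5, 23.0, 22.1, 24.9, 12.5, 25.2, 26.8],
}

def percent_faster_3gs(other_phone):
    """3GS advantage over another phone, based on summed load times."""
    total_other = sum(cellular_times_s[other_phone])
    total_3gs = sum(cellular_times_s["iPhone 3GS"])
    return (total_other / total_3gs - 1.0) * 100.0

vs_original = percent_faster_3gs("iPhone")
vs_3g = percent_faster_3gs("iPhone 3G")
vs_pre = percent_faster_3gs("Palm Pre")
print(f"{vs_original:.0f}% {vs_3g:.0f}% {vs_pre:.0f}%")
```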

It is important to realize what we're talking about here. These phones, particularly ones that are using old ARM11 based SoCs, are CPU bound while loading web pages. Even while browsing over a relatively slow < 1Mbps cellular network, the CPU still ends up being a significant bottleneck to web page rendering performance. Compare that to how things work on the desktop - when was the last time you felt your PC was too slow to browse the web? The Cortex A8 is a huge step forward here, and once again, there's no excuse for putting any ARM11 in a high end smartphone today.

Let's remove more bottlenecks and see how big of a difference the CPU alone makes. The following tests were performed over WiFi:

| Site (WiFi) | Apple iPhone 3G (3.0) | Apple iPhone 3GS (3.0) | Palm Pre (1.03) |
|---|---|---|---|
| anandtech.com | 13.3 s | 8.8 s | 10.1 s |
| arstechnica.com | 12.8 s | 8.2 s | 8.2 s |
| hothardware.com | 35.8 s | 15.1 s | 11.6 s |
| pcper.com | 27.8 s | 17.3 s | 21.3 s |
| digg.com | 36.1 s | 17.5 s | 16.3 s |
| techreport.com | 17.1 s | 11.6 s | 7.8 s |
| tomshardware.com | 21.7 s | 12.2 s | 12.4 s |
| facebook.com | 29.3 s | 10.5 s | 22.1 s |

 

Remove the cellular bottleneck and things mostly stay the same between iPhones. The new 3GS is nearly 100% faster than the old 3G (and iPhone original). The major change comes from the comparison to the Palm Pre. The 3GS is now only 8% faster than the Pre, a significant improvement from the earlier releases of webOS. I do firmly believe that Palm has much room to improve performance on its device to bring it up to speed compared to the 3GS. It's running very similar hardware to the iPhone 3GS; there's no reason for it to feel so much slower.
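The WiFi numbers are close enough between the 3GS and the Pre that a site-by-site tally is worth a look. The sketch below counts which phone loads each page faster and computes the overall gap from summed times; the summed-time method is an assumption, though it matches the roughly 8% figure.

```python
# (iPhone 3GS seconds, Palm Pre seconds) over WiFi, from the table above.
wifi_times_s = {
    "anandtech.com": (8.8, 10.1),
    "arstechnica.com": (8.2, 8.2),
    "hothardware.com": (15.1, 11.6),
    "pcper.com": (17.3, 21.3),
    "digg.com": (17.5, 16.3),
    "techreport.com": (11.6, 7.8),
    "tomshardware.com": (12.2, 12.4),
    "facebook.com": (10.5, 22.1),
}

threegs_wins = sum(1 for gs, pre in wifi_times_s.values() if gs < pre)
pre_wins = sum(1 for gs, pre in wifi_times_s.values() if pre < gs)
ties = len(wifi_times_s) - threegs_wins - pre_wins

total_3gs = sum(gs for gs, _ in wifi_times_s.values())
total_pre = sum(pre for _, pre in wifi_times_s.values())
overall_gap = (total_pre / total_3gs - 1.0) * 100.0

print(threegs_wins, pre_wins, ties, f"{overall_gap:.1f}%")
```

The Pre actually wins on several individual sites; a single large gap on facebook.com accounts for most of the 3GS's overall lead.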

Let me take this opportunity to also chastise HTC for using the Qualcomm MSM7200A in the new Hero smartphone. Here we have yet another Android OS phone using a horrendously old ARM11 based CPU; it’s just unacceptable. The table above shows you how much more performance is on the table if you move to Cortex A8. I’m still waiting for a handset maker to do Android justice and pair it with a truly robust hardware platform.



The GPU and its Performance

We’ve gone through, in great depth, the CPU in the new iPhone 3GS. But the GPU is arguably the more interesting upgrade. I’ve already covered the GPU pretty well so I present you with a blatant copy/paste of what I’d already written:

Now that we’re familiar with the 3GS’ CPU, it’s time to talk about the GPU: the PowerVR SGX.

Those familiar with graphics evolution in the PC space may remember Imagination Technologies and its PowerVR brand by their most popular desktop graphics cards: STMicro’s Kyro and Kyro II. The Kyro series used the PowerVR3 chips and while STMicro ultimately failed to cement itself as a NVIDIA competitor in the desktop, the PowerVR technology lived on in ultra-mobile devices.

The SGX is on Imagination Technologies’ fifth generation of its PowerVR architecture, and just like the Kyro cards we loved, the SGX uses a tile based renderer. The idea behind a tile or deferred renderer is to render only what the camera sees, not wasting clocks and memory bandwidth on determining the color of pixels hidden by another object in the scene. Tile based renderers get their name from dividing the screen up into smaller blocks, or tiles, and working on each one independently. The smaller the tile, the easier it is to work on the tile on-chip without going to main memory. This approach is particularly important in the mobile space because there simply isn’t much available bandwidth or power. These chips consume milliwatts, so efficiency is key.
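The screen-splitting step can be illustrated with a short sketch. It only shows how a 320 x 480 display might be carved into independent rectangles; the 32-pixel tile size is a made-up value for illustration, not the SGX's actual tile size, and none of the hidden surface removal is modeled.

```python
def iter_tiles(width, height, tile_size):
    """Yield (x, y, w, h) rectangles covering the screen, clipping edge tiles."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))

# 320 x 480 is the iPhone's screen resolution; tile_size=32 is hypothetical.
tiles = list(iter_tiles(320, 480, tile_size=32))
print(len(tiles))
```

Each rectangle is small enough that its color and depth data could plausibly stay in on-chip memory while it is being worked on, which is where the bandwidth savings come from.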

The MBX-Lite used in the original iPhone was also a tile based architecture; the SGX is just better.

Also built on a 65nm process the PowerVR SGX is a fully programmable core, much like our desktop DX8/DX9 GPUs. While the MBX only supported OpenGL ES 1.0, you get 2.0 support from the SGX. The architecture also looks much more like a modern GPU:

Pixel, vertex and geometry instructions are executed by a programmable shader engine, which Imagination calls its Universal Scalable Shader Engine (USSE). The “coprocessor” hardware at the end of the pipeline is most likely fixed-function or scalar hardware that aids the engine.

The SGX ranges from the PowerVR SGX 520 which only has one USSE pipe to the high end SGX 543MP16 which has 64 USSE2 pipes (4 USSE2 pipes per core x 16 cores). The iPhone 3GS, I believe, uses the 520 - the lowest end of the new product offering. Update: Thanks to BlazingDragon and psonice for pointing out that the 3GS may in fact use a PowerVR SGX 535 based on drivers on the iPhone 3GS. It could still use a lower end SGX core and use the 535's driver, but there's at least the possibility that the 535 is under the hood of the 3GS.

A single USSE pipe can execute, in a single clock, a two-component vector operation or a 2 or 4-way SIMD operation for scalars. The USSE2 pipes are upgraded to handle single clock 3 or 4 component vector operations, have wider SIMD and can co-issue vector and scalar ops. The USSE2 pipes are definitely heavier and have some added benefits for OpenCL. For the 3GS, all we have to worry about is the single USSE configuration.

| | iPhone 3G (PowerVR MBX-Lite) | PowerVR SGX @ 100MHz | PowerVR SGX @ 200MHz |
|---|---|---|---|
| Manufacturing Process | 90nm | 65nm | 65nm |
| Clock Speed | ~60MHz | 100MHz | 200MHz |
| Triangles/sec | 1M | 3.5M | 7M |
| Pixels/sec | 100M | 125M | 250M |

 

In its lowest end configuration with only one USSE pipe running at 200MHz, the SGX can push through 7M triangles per second and render 250M pixels per second. That’s 7x the geometry throughput of the iPhone 3G and 2.5x the fill rate. Even if the SGX ran at half that speed, we’d still be at 3.5x the geometry performance of the iPhone 3G and a 25% increase in fill rate. Given the 65nm manufacturing process, I’d expect higher clock speeds than what was possible on the MBX-Lite. Also note that these fill rates take into account the efficiency of the SGX’s tile based rendering engine.
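The ratios quoted above fall straight out of the table, as this small sketch shows. The dictionary layout and function name are just for illustration; the numbers are the ones listed in the table.

```python
# Peak throughput figures copied from the table above.
gpu_specs = {
    "MBX-Lite": {"triangles_per_s": 1.0e6, "pixels_per_s": 100e6},
    "SGX @ 100MHz": {"triangles_per_s": 3.5e6, "pixels_per_s": 125e6},
    "SGX @ 200MHz": {"triangles_per_s": 7.0e6, "pixels_per_s": 250e6},
}

def ratio_vs_mbx(gpu_name, metric):
    """How many times the MBX-Lite's figure a given GPU configuration reaches."""
    return gpu_specs[gpu_name][metric] / gpu_specs["MBX-Lite"][metric]

for name in ("SGX @ 100MHz", "SGX @ 200MHz"):
    geometry = ratio_vs_mbx(name, "triangles_per_s")
    fill = ratio_vs_mbx(name, "pixels_per_s")
    print(f"{name}: {geometry}x geometry, {fill}x fill rate")
```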

Obviously all games on the app store were designed for the PowerVR MBX-Lite and the SoC in the iPhone/iPhone 3G/iPod Touch. Some do run faster/smoother on the 3GS thanks to the faster CPU and GPU, but the new hardware should enable an entirely new class of game in the future.

Developers will have to deal with a segmented user base, since there are around 20 million PowerVR MBX-Lite based phones and a bit over 1 million SGX based 3GSes in the market.

The GPU Comparison

There are obvious limitations to the iPhone 3GS being used as a gaming platform. But there are also obvious limitations to the 3GS being used as a camera or a video player. The point isn’t that the 3GS could replace the Nintendo DS or the Sony PSP, it’s that the iPhone could get you maybe 80 - 90% of the way there.

There are serious control and battery life limitations, but the platform has the right combination of hardware, a software delivery system and ubiquity to at least be considered a viable gaming system. It has a much more powerful CPU/GPU combination than a Nintendo DS, now what it needs is the iPhone equivalent of Mario. Perhaps Apple should buy a game developer.

The obvious smartphone GPU comparison here is the Palm Pre. The TI OMAP 3430 used in the Pre (good job Palm on full disclosure there, just add clock speeds and you’ll get an A+) is a very similar SoC to the Samsung chip in the iPhone 3GS. As such, it also has a PowerVR SGX core.



Camera and Video Capture

Videos and photos are both tossed into the same place on the 3GS, and they both sync with iTunes. Michael Arrington and others recently speculated that Apple will be bringing video recording functionality to virtually all iPods and honestly, I think it makes sense. Just as the cell phone and the pocket camera converged, you can easily integrate 90% of the functionality of a tiny video camera into a smartphone like the iPhone or a MP3 player like the iPod. Hooray for reducing pocket clutter.

The camera on the 3GS is much improved over what's in the 3G and original iPhone. For videos, especially outdoors or in situations where you have tons of light, there's no need to carry around anything else - the 3GS is sufficient. Adjustable focus on the still camera is a nice improvement and the live viewfinder is significantly faster as well, to the point where it's actually usable.


That's one tiny lens (upper left)

The lenses on these things are abysmally small, but if you’ve got enough light the results are more than good enough:

If you’re low on light (and hate noise), don’t bother.


Scaled down, it's not terrible


There's a lot of noise

Apple offers two tap uploading to YouTube directly from the iPhone 3GS. Tap once to share your video, tap once more to upload it to YouTube - even over the cell network. It works extremely well and it’s ridiculously easy. Facebook integration would be a nice addition though.


Why is there no Send via MMS option? Because AT&T has yet to flip the switch for iPhone customers.

Video editing is also very cool on this device; it really does work as well as you’ve seen on the commercials. Record a video and, using your fingers, shrink the timeline to include only what you want to save. You can’t splice different parts of a single clip together, but you can at least trim out annoyingly long beginnings or bloopers at the end before you share your video with all of the world to see. Thanks to the Cortex A8 in the 3GS, trimming video goes by pretty quickly.



The Screen

There’s no increase in resolution and thanks to the iPhone OS 3.0 update, the automatic backlight is even more aggressive in making the 3GS as dim as possible - so why even bother having a section in the review called The Screen?

While not as dense as the Pre’s screen, the iPhone 3GS continues to have one of the best touchscreens on the market. Sure you get no feedback from your touches, but honestly, that’s something most seem to be able to get over.

There is one small improvement to the 3GS’ screen over the 3G. The new phone comes with a new oleophobic coating on the screen.

First a compass and now an oleophobic screen? Despite what you may think, this feature is actually pretty useful. Fingerprints and smudges still get on the new iPhone but they easily come off and they definitely aren't as present as on the older models.


iPhone 3GS (left) vs. iPhone 3G (right)


iPhone 3G (left) vs. iPhone 3GS (right)


iPhone 3GS (left) vs. iPhone 3G (right)

One side effect of the oleophobic coating is an increase in resistance on the surface of the screen. Your finger doesn't glide as smoothly over the new screen as it did over the old one. It’s a bit ironic actually. Newly washed hands will feel more resistance than hands with a bit of oil on them, yet the fingerprints aren’t as prevalent on the 3GS as they were on the 3G.


iPhone 3GS (left) vs. iPhone 3G (right) after being wiped down

The new screen is nice - but I still want an oleophobic coating on the back of the iPhone 3GS. The black plastic especially shows finger prints and general nastiness infinitely worse than the original metal iPhone.



A Testament to Honesty about Battery Life

I follow AMD's Patrick Moorhead on Twitter and recently he's been on a tirade against MobileMark 2007 and those who use it to characterize notebook battery life. Moorhead's complaint is motivated by the fact that Intel performs better on MobileMark than AMD (possibly due to better power characteristics at idle conditions), but it highlights another truth: most notebook makers can't be trusted when it comes to battery life claims.

In my recent MacBook Pro coverage I pointed out that this wasn't true for Apple. Apple claimed that its new 15-inch MacBook Pro could last up to 7 hours on a single charge under a wireless productivity test, and my own tests backed up that assertion: I saw between 5 and 8 hours of battery life depending on workload.

Apple isn't very specific with the MacBook Pro battery life; you just get an upper limit. With the iPhone however, Apple has gotten much better at indicating battery life. In nearly every single test I ran, Apple's advertised battery life and the battery life I actually experienced were almost identical.

Apple claims that web browsing on WiFi should last you about 9 hours on the new iPhone 3GS; mine lasted 8.83 hours. Apple said on 3G I should get 5 hours, I got 4.81. For video playback Apple said the new iPhone 3GS should deliver 10 hours of battery life, I estimated around 9 (I was at the 5 hour mark with more than 50% of my battery left). Apple also does a bang up job on detailing its testing methodology if you care to read how Apple tests.
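To put a number on how close those figures are, here is a minimal sketch that computes how far each of my results falls below Apple's claim. The hour values come straight from the paragraph above (the video figure is my estimate rather than a completed run); the dictionary layout and the `shortfall_percent` helper are just illustrative names.

```python
# Advertised vs. measured battery life in hours, taken from the paragraph above.
# The video playback value is an estimate, not a completed rundown test.
claims = {
    "WiFi web browsing": (9.0, 8.83),
    "3G web browsing": (5.0, 4.81),
    "Video playback": (10.0, 9.0),
}


def shortfall_percent(advertised, measured):
    """Return how far the measured figure falls below the advertised one, in percent."""
    return (advertised - measured) / advertised * 100.0


results = {name: shortfall_percent(a, m) for name, (a, m) in claims.items()}

for name, pct in results.items():
    print(f"{name}: {pct:.1f}% below Apple's claim")
```

The two web browsing tests land within a few percent of Apple's numbers. The video estimate is the loosest of the three, which is expected given that it was extrapolated conservatively from a partial run.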

While I'll always run my own tests to verify Apple's claims, I will say that I've never been this impressed by a manufacturer's honesty with regards to performance claims. Apple gets praised for its design, attacked for its secrecy but it should be commended for its transparency and honesty when it comes to battery life specs on its products.



Gaming Battery Life: Expectedly Worse

The SoC in the original iPhone was built on a 90nm process; the new one is built on a 65nm process. Despite the lower power transistors, the new GPU core is much faster and thus requires more power than the old one.

The 3D engine is where the improvements lie and thus we see both better performance in 3D games and worse battery life. To find out how much worse, I loaded up Resident Evil on the iPhone 3GS and the iPhone 3G. I left the game running at the very start and allowed the battery to run down. Keep in mind that although the character never moved, the scene was still being rendered tens of times per second. The battery life represented here is the best case scenario while playing a moderately stressful 3D game on the iPhone. I found that the Sims 3, for example, lasted much longer than even my simple Resident Evil test.

| | Apple iPhone 3G | Apple iPhone 3GS |
|---|---|---|
| Resident Evil | 4.55 hours | 3.42 hours |

 

The results were a bit better than expected. The iPhone 3G lasted a bit over 4.5 hours while the 3GS was under 3.5 hours. That's a 25% reduction in battery life, something very measurable and noticeable. If you do a lot of 3D gaming on your iPhone, the new one isn't going to do you any favors in the battery life department.
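The 25% figure is simple arithmetic on the two run times in the table above. As a quick sketch (the variable names are just for illustration):

```python
# Measured Resident Evil run times in hours, from the table above.
hours_3g = 4.55
hours_3gs = 3.42

# Relative reduction in battery life going from the 3G to the 3GS.
reduction = (hours_3g - hours_3gs) / hours_3g
print(f"Reduction: {reduction:.1%}")

# The same gap expressed as lost play time.
minutes_lost = (hours_3g - hours_3gs) * 60
print(f"Roughly {minutes_lost:.0f} fewer minutes of play")
```

The reduction comes out a little under 25%, or a bit over an hour of lost gaming time per charge.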



The Rest of the Time: Improved Battery Life

The new iPhone 3GS didn't last as long as its predecessor when gaming, but in every other area the new phone either lasts longer than the old one or remains unchanged. Moore's Law rocks.

| | Apple iPhone 3G | Apple iPhone 3GS |
|---|---|---|
| WiFi Web Browsing | 6.67 hours | 8.83 hours |
| 3G Web Browsing | 4.5 hours | 4.82 hours |
| H.264 Video Playback | ~ 4 - 5 hours | ~ 9 - 10 hours |

 

The important takeaway here is that there’s no real battery life improvement when working over 3G, which is the majority of the time. The original iPhone had much better battery life because of the lower power operation on EDGE networks, so if you’re migrating from an original iPhone to the 3GS, expect to notice a significant reduction in web browsing and talk time on the 3G network. Apple continues to refuse to put in the effort required to automatically sense and switch between EDGE/3G networks depending on need, or at minimum offer a faster way of disabling 3G in order to preserve battery life. Currently it’s a lengthy process (for an iPhone) to disable/enable 3G:

 

If you spend a lot of your time on WiFi however, the 3GS lasts significantly longer than the iPhone 3G. Apple’s estimate of nearly 9 hours is very accurate.

Video playback is also quite impressive. If you can deal with watching a small screen for that long you can easily make it through a few feature length transcoded movies while you fly across the globe.



Voice Recognition

Like it or not, people use their phones while they are driving. Apple finds itself in an especially sour pickle as the iPhone’s touch screen offers no tactile feedback, requiring much more attention to call/type while driving.

The 3GS helps make the roads a bit safer for all of us by enabling voice recognition. Just hold the home button down for about 3 seconds and you can order your phone to do things.

The voice recognition works surprisingly well, even with background noise. I was also surprised by how well it handled less common names. I tried calling Amir Majidimehr, former Microsoft VP and generally awesome home theater dude, and the voice recognition handled it perfectly. You can also use the voice recognition to play music.



The Compass

The new iPhone 3GS has one more piece of hardware: a digital compass. When Apple made the 3GS announcement I kept IMing my friends about how I couldn't wait to use the sweet new compass. Yes, sometimes my sarcasm isn't very thinly veiled.

If you fire up the compass app you get, well, a compass. Hold it in front of you, move around and it tells you what direction you're pointing in. It's particularly useful if you're walking or hiking somewhere but there's also support for it in Google Maps on the iPhone 3GS.


Bad Andy, Good compass

On paper, the Google Maps compass integration is pretty sweet. Hit the GPS pinpoint button on Google Maps to find your current location, hit it one more time and the compass kicks in. Your blue dot on Google Maps now gets an orientation indicator to let you know what direction you're pointing in. This is very useful when trying to use Google Maps on foot and you'd think it'd be super useful for navigating while driving too. Unfortunately the latter isn't true.


Apparently I'm driving perpendicular to the highway

I've found that either the compass will show the wrong orientation while in a moving vehicle or you'll get this hilarious message to move the 3GS in a figure 8 to re-calibrate the compass.


HAHA, you want me to what?

The compass is useful, it's a nice addition, but not a huge feature of the new phone.



The Inevitable Comparison: 3GS vs. Palm Pre

In response to my Pre review, many of you posted that the article read more like a comparison to the iPhone or a list of things for Apple to improve. I wrote it as such because I felt that while Palm out-innovated Apple in many ways, it fell short in just as many. At the same time I felt that Apple had much room to improve given the impact of the Pre, while also holding its advantages over the Pre. In short, neither device is perfect and both companies have much to learn from the other. It wouldn’t be fair for me to exclude the Pre from this article, as the iPhone 3GS delivers speed but lacks the functionality of what Palm has done with the Pre.

It’s a wonder what a difference a year makes. Apple originally shied away from enabling background tasks on the iPhone because it didn’t want to compromise performance or battery life. The latter made sense, but the former didn’t really jibe - the more we asked of the iPhone, the slower it got. In particular, its performance took a dive once the official App Store launched along with the 2.0 firmware. Since then, the iPhone hasn’t exactly been fast - especially compared to some newer smartphones.

Apple’s solution to the background tasks problem was server-side push notifications. Take the most popular example: AIM. Since Apple doesn’t allow 3rd party applications to run in the background on the iPhone, if you’re in the middle of an AIM conversation and lock your phone, go to the home screen or launch another app, your connection to AIM is lost and your screen name logs off. You won’t get any new messages until you log back on.

I love sushi

With iPhone OS 3.0 the AIM app can use Apple’s push notification servers to keep the connection active. The minute you close the AIM app on the iPhone, the connection between your phone and AIM is severed but kept alive by one of Apple’s push servers. Any new messages that are to be delivered to your phone go to Apple’s servers, which know your phone’s IP and whereabouts. The servers then push the message to your phone and you see it like an SMS notification on the iPhone:

Sweet, right? It’s great for receiving a single message, but it’s horrible for actually maintaining a conversation. To respond to the message I have to click view message, then wait for the AIM app to launch and log me in and only then can I begin typing. Now let’s assume that I quit out of AIM because I had to do something else, or even worse, let’s assume that I left AIM because I had to send a text message. I’m now switching between two messaging apps to carry on two different conversations. It’s cumbersome.

As AIM messages pile up, the counter on my AIM app icon increments to let me know what I’ve got waiting for me.
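To make the flow concrete, here is a toy, in-memory model of what's described above: a relay that holds messages while the app is closed, raises an alert for each one, and bumps the badge counter on the app icon. This is not Apple's push API or anything close to it; the `PushRelay` class and its method names are hypothetical and exist only to illustrate the sequence of events.

```python
from collections import defaultdict


class PushRelay:
    """Hypothetical stand-in for a push server; not Apple's actual service."""

    def __init__(self):
        # Unread message count per app, shown as the badge on the app icon.
        self.badges = defaultdict(int)
        # Alerts delivered to the phone while the app was closed.
        self.alerts = []

    def deliver(self, app, sender, text):
        """Handle a message that arrives for a phone whose app is not running."""
        self.badges[app] += 1
        self.alerts.append((app, sender, text))
        # Text that would appear in the SMS-style alert on the phone.
        return f"{sender}: {text}"

    def open_app(self, app):
        """The user taps View: the app relaunches, logs in and clears its badge."""
        pending = [a for a in self.alerts if a[0] == app]
        self.alerts = [a for a in self.alerts if a[0] != app]
        self.badges[app] = 0
        return pending


relay = PushRelay()
relay.deliver("AIM", "friend", "Lunch?")
relay.deliver("AIM", "friend", "Sushi?")
print("Badge before opening:", relay.badges["AIM"])

pending = relay.open_app("AIM")
print("Messages waiting:", len(pending), "Badge after opening:", relay.badges["AIM"])
```

Notice that nothing in this model lets you reply without calling `open_app` first. That's exactly the round trip that makes carrying on a conversation so cumbersome.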

Switching between apps is much faster on the 3GS; this whole process is far more annoying on the 3G or original iPhone because actually launching the AIM app takes far longer. It’s a better overall experience but still nowhere near the seamless setup that Palm offers. If you mostly text/IM people on your phone, then honestly, forget the iPhone and get a Pre - Apple simply doesn’t do the best job here any longer.


Sending IMs and switching between apps on the Pre, the way it should be done

The iPhone OS needs a drastic revamp. The OS was designed very well for what the first iteration of the iPhone was created for: single tasking with SMS, email, web browsing, phone calls, music playback and browsing through photos. Add several pages of apps to the OS and try to multitask between them and the OS quickly shows its limits. Although Apple has added a very sweet Copy/Paste interface to the iPhone, that’s about the extent of how well you can work between apps thanks to Apple’s no background tasks limitation.

Palm got the implementation of a multitasking OS down right with the Pre, but the performance levels just aren’t up to snuff. Take using the dialer app for example. Animations are choppy and there’s a noticeable lag between when you tap a button and when the app responds. That just isn’t true of the iPhone and definitely not true of the 3GS; responsive is the key word here and Palm lacks it.

Unfortunately, what the 3GS has in responsiveness it lacks in productivity. The more I use the 3GS the more I wish I was able to run more than one application at a time. What I want is a phone that multitasks like webOS but with the speed of the 3GS. I believe that both Apple and Palm are capable of delivering such a device, I’m just unsure which company will do it first.



Final Words

I always feel like I need to congratulate or somehow gift those readers who make it all the way through an article like this. Maybe I'll start handing out lollipops one day. If you made it this far you'll know that there's a lot of concluding that needs to happen.

First, the phone itself. Honestly, if you have the original iPhone then this is absolutely the one you'll want to upgrade to - you'll feel like you've been swept off of your feet one more time (assuming you did like your iPhone). Upgrading from the 3G is also a good idea in my opinion, just because of the tremendous increase in performance. Where the upgrade recommendation becomes stickier is if you have to pay full price for the phone. Unlike the iPhone 3G launch, AT&T isn't letting everyone move to the 3GS at the $199/$299 upgrade price (16/32GB models). Under immense pressure from the market, AT&T has made terms a little more favorable but there is a sizeable population that won't get upgrade pricing until later this year. At $500 or $600 I'm not sure the 3GS is worth the price today, simply because I'm expecting an updated model around this time next year. Remember that Cortex A9 based phones will be out in 2010 and Apple also has the option of using a multi-core A8 variant as well, especially at 45nm. If you have to spend that much money, either wait until the next iPhone or wait until your upgrade price drops; $500 can buy you a Core i7, it shouldn't be the cost of a CPU upgrade for your phone.

Next, there's the Palm Pre. I continue to be impressed by not only how much Palm was able to do in such a short period of time with webOS but how frequently Palm is updating the OS on the recently launched Pre. We're now up to four OS updates since its launch in June and I fully expect more from the Palm crew. By the time the Pre debuts on Verizon and AT&T the phone should be sitting pretty. If you don't mind being on the bleeding edge of a platform that needs work and real developer support, the Pre is a real alternative to the 3GS - and in some senses, a better device to use.

Finally, there's the hardware itself. The real story behind the 3GS isn't that Apple took longer than necessary to move to the Cortex A8. No, the real story here is that both ARM and Imagination Technologies are significantly improving the CPU and GPU performance in high end smartphone SoCs at a rapid pace. There's significant room for improvement in both CPU and GPU performance; right now all we're limited by is power consumption. Within the next year we should see more SoCs transition to 45nm and at that point I'd expect to see multi-core ARM implementations as well as wider PowerVR SGX cores. Then there's always ARM's Cortex A9, the first out-of-order ARM core. In the distant future we also have to start thinking about Intel; Atom was always intended for the smartphone market, and at 32nm I'd expect to see an Atom based CPU in an iPhone-sized device.

If you're bored by performance improvements on the desktop, then keep an eye on the ultra mobile space. Smartphones are going to see significant performance gains over the next few years. The iPhone 3GS is just the beginning.
