NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands Onby Anand Lal Shimpi & Brian Klug on February 24, 2013 3:00 PM EST
The Cortex A9 r4p1
Although we just call ARM’s previous architecture by its Cortex A9 name, there have been multiple revisions to the A9 architecture since its introduction. Tegra 2 implemented Cortex A9 r1p1, while Tegra 3 used r2p9. With Tegra 4i, NVIDIA moved to the absolute latest version of the Cortex A9 core: r4p1.
There are some significant changes to the Cortex A9 in r4p1. The GHB, L2 TLB and BTAC all grew by 4x and are now sized equally between the A9 and A15 implementations (16K predictors, 512 entries and 4096 entries, respectively). These changes help improve branch prediction accuracy, which further increases IPC on an already very efficient design.
The A9 r4p1 also has an enhanced data prefetching engine, including a small L1 prefetcher and dedicated hardware for the cache preload instruction.
NVIDIA claims a 15% increase in SPECint_base for the Cortex A9 r4p1 vs. r2p9, which is pretty impressive. Combined with the 2.3GHz max frequency, Tegra 4i’s CPU performance should be a healthy improvement over what we have in Tegra 3 today.
Tegra 4 Clock Speeds
Each of the four primary Cortex A15s is driven off the same voltage and frequency plane, although each core can be power gated individually. This is similar to how Intel designs its processors, but at odds with Qualcomm’s independent voltage/frequency planes.
NVIDIA does a good job of binning its SoCs, and the same will continue with Tegra 4. All four cores are capable of running at up to 1.9GHz, although NVIDIA claims we may see configurations with even higher single core boost frequencies (or even lower max frequencies, similar to Tegra 3). As I already mentioned, the fifth Cortex A15 runs at somewhere between 700 and 800MHz.
The Tegra 4 GPU operates at up to 672MHz, up from the 520MHz max in Tegra 3.