The Exynos 7420 - Inside a Modern SoC - Continued

An interesting part of the connectivity blocks is the modem connectivity block. Samsung describes this in its drivers as a “Combo PHY” capable of HSIC, PCIe and MIPI LLI. Given the wide range of connectivity options for external modems, and the fact that a device usually has only one modem connected, it makes sense to consolidate the various standards into one block to save die space. The Galaxy S6 marks the first global rollout of Samsung’s own modem, the Shannon 333. The part will probably be marketed as the Exynos Modem 333, but as with the 7420, Samsung has yet to publicly acknowledge its existence. The company's in-house modems have in the past seen only limited adoption and were used mostly in its home market of Korea. Starting with last year’s Galaxy S5 Mini, Samsung for the first time began a wider rollout to other global markets.


Galaxy S6 PCB with SoC+DRAM and modem+NAND in view. The UFS module sits on top of the modem.
(Image source: Chipworks)

The Shannon 333 is connected to the Exynos 7420 via MIPI LLI (Low Latency Interface). This is an important departure from past implementations and could have implications for the “integrated vs. external” modem discussion. Qualcomm has had an indisputable advantage over competitors by being able to ship an all-in-one chipset. The advantage came in two areas. First, by having a single physical chip, Qualcomm had the edge in packaging cost and PCB footprint. Second, external modems require their own dedicated memory to operate. We’ve seen this in many modems in the past, and even Qualcomm’s own Gobi modems such as the MDM9235 need to be paired with an additional 128MB of LPDDR2. As opposed to traditional HSIC (High Speed Inter-Chip, a USB 2.0 derivative without analog transceivers) interfaces, the LLI connection allows the modem to directly access the SoC’s main memory, removing what was one of the most significant overheads of an external modem. Intel was actually the first to ship an LLI-connected modem, the XMM7260 inside the Galaxy Alpha, and like the Shannon 333 it was able to ditch the additional memory module, which reduces both component cost and power consumption.

While Samsung is unable to comment on this topic, the MIPI Alliance explains that cost and power reduction were the goals of the Low Latency Interface. This also seems to fit Samsung’s stance on integrated vs. dedicated modems: the company has explained that the latter offer better time-to-market and AP performance characteristics. This makes sense given that modems need regulatory and carrier certifications, a process that costs a lot of money and time. Being able to quickly bring new silicon to a production device is critical, as the industry now seems desperate to keep up with yearly major refreshes. Also, as process nodes get more complex and expensive, it may make sense to separate the modem from the main SoC for yield and cost reasons.

It is my opinion that the company will continue with the dual-chip strategy in the high end, but will still aim to include integrated modems in the low and mid-range, where cost optimization is absolutely crucial. The Exynos 3470 seen in the Galaxy S5 Mini might get a successor in the ModAP integrated-modem SoC line-up, as we’re seeing the first substantial evidence of what the Exynos 7580 is: an 8-core A53 SoC with an integrated Shannon 310 modem and LPDDR3 memory support. The odd naming convention aside, this looks to be a budget/mid-range chipset aiming to capture some design wins from Qualcomm and MediaTek.

That was quite a tangent on the modem and its connectivity options, so let’s go back to the SoC layout and IP blocks. General connectivity is part of every SoC, and the Exynos 7420 is no different here. With a diverse offering of SPI, HSI2C, UART, I2S, PCM, PWM and other ports, it offers all the bus interfaces required to connect the device's components to the central SoC. I took the liberty of drawing these blocks very abstractly, so one should not read too much into their position or size.

An odd block I could not account for is a fairly large area next to the A53 cluster. I’m not sure what it represents, but it could be an agglomeration of smaller IPs or general SoC logic.

In SoCs prior to the Exynos 5430, Samsung used a Coarse-Grained Reconfigurable Architecture (CGRA) processing unit called the Samsung Reconfigurable Processor (SRP) for audio processing. The SRP is an interesting architecture that Samsung seems to want to use for a variety of use-cases: we've seen prototype GPUs built with it, and Samsung currently uses it as the processing cornerstone of its DRIMe-V SoC in cameras such as the mirrorless NX1. On the Exynos 5430 and newer, this audio block was dropped in favour of a more conventional ARM Cortex-A5. This companion CPU is in charge of audio decoding and encoding, as well as audio processing tasks such as equalizer functions. Samsung has previously advertised that it can also be used for voice processing and voice recognition.

Finally, we move on to the media quadrant of the SoC. Here we find the ISP, the hardware media decoder/encoder and the display pipelines.


This part of the SoC is depicted very differently from its actual physical layout.

The Exynos’s hardware media accelerator is called the Multi-Format Codec (MFC). This is a mature block that has been implemented in SoCs since the S3C6400 in 2007. Despite it being out in the wild for 8 years now, we still know very little about the block's architecture. My assumption is that we’re most likely looking at a custom DSP architecture, as the block is accompanied by separate firmware that needs to be loaded for operation. The IP is able to encode and decode MPEG4, H.263, H.264, VP8 and HEVC, and can additionally decode MPEG2, VC1 and VP9. The Exynos 5430 and 5433 used an additional HEVC decoder block separate from the MFC to enable playback of the format, but with the 7420 this block has been retired, as its functionality has been merged into the MFC.
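The encode/decode split described above can be captured in a small lookup table. This is only a restatement of the codec list in the text as data; the table and the `supports` helper are hypothetical illustrations, not a Samsung driver API, and they do not model profile limits (such as the missing Main10 support discussed below).

```python
# Hypothetical capability table restating the 7420 MFC codec list from the text.
MFC_7420_CAPS = {
    "MPEG4": {"decode", "encode"},
    "H.263": {"decode", "encode"},
    "H.264": {"decode", "encode"},
    "VP8":   {"decode", "encode"},
    "HEVC":  {"decode", "encode"},
    "MPEG2": {"decode"},
    "VC1":   {"decode"},
    "VP9":   {"decode"},
}


def supports(codec: str, op: str) -> bool:
    """True if the table lists `op` ("decode" or "encode") for `codec`."""
    return op in MFC_7420_CAPS.get(codec, set())


# Formats the MFC can only decode, not encode.
decode_only = sorted(c for c, ops in MFC_7420_CAPS.items() if ops == {"decode"})
print(decode_only)  # ['MPEG2', 'VC1', 'VP9']
```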

I’ve always been impressed with Samsung’s hardware decoder in terms of performance and power, and v9 of the MFC in the 7420 is no exception. I was able to play back 4Kp30 Main-profile HEVC at only about 950mW of total device power (minimum brightness, portrait mode to try to reduce the display's share of the power). This represents only about 600mW of system load power. CPU load was very low, hovering around 25-30% at 400MHz on two A53 cores. Unfortunately, the decoder isn’t capable of Main10 profile (10-bit) playback and freezes after 2 seconds of 4Kp60 playback, making it not as future-proof as one would have hoped. As a note, Qualcomm’s Snapdragon 810 decode unit has the same limitations, so the playing field between the two major vendors is even this generation.
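To put the playback measurement into perspective, the two power figures can be turned into a few derived numbers. This is plain arithmetic on the quoted 950mW and 600mW values; the 3840x2160 resolution for "4K" is an assumption, and the "baseline" is simply the difference between the two figures rather than a separately measured value.

```python
# Worked arithmetic on the 4Kp30 HEVC playback figures quoted in the text.
TOTAL_DEVICE_MW = 950.0   # total device power during playback (from the text)
LOAD_MW = 600.0           # system load power attributed to playback (from the text)
FPS = 30                  # 4Kp30 stream
WIDTH, HEIGHT = 3840, 2160  # assumption: "4K" means UHD 3840x2160

# Whatever is not attributed to the playback load (display, idle platform, ...).
baseline_mw = TOTAL_DEVICE_MW - LOAD_MW

# mW divided by frames per second gives mJ per frame.
load_mj_per_frame = LOAD_MW / FPS

# Energy per decoded pixel: convert mW to W, then J to nJ.
pixels_per_second = WIDTH * HEIGHT * FPS
load_nj_per_pixel = LOAD_MW * 1e-3 / pixels_per_second * 1e9

print(f"baseline: {baseline_mw:.0f} mW")
print(f"load energy per frame: {load_mj_per_frame:.1f} mJ")
print(f"load energy per pixel: {load_nj_per_pixel:.2f} nJ")
```

With these inputs the playback load works out to about 20mJ per frame, or roughly 2.4nJ per decoded pixel.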

Among the media-related blocks we find the ISP. We know very little about Samsung’s ISP, but it is certainly a very advanced piece of IP, as Samsung can draw on experience gained not only in the mobile sector but also in the standalone camera market, where it produces its own line of custom camera SoCs. The ISP consists of a mix of general-purpose blocks, such as a Cortex-A5 running at 668MHz, working in tandem with a variety of fixed-function units.


Source: Samsung

Most of what we know about the ISP architecture comes from a 2013 paper Samsung published on the Exynos 5420’s capabilities. It explains that the ISP is formed by a series of sub-IPs, each with a specialized job, such as sensor defect compensation, 3A (auto-focus, auto-exposure, auto-white-balance), de-mosaicing, inter-frame noise reduction, phase-detection auto-focus, gyro-based digital image stabilization, optical lens correction, face detection, video stabilization, and probably an even longer list of image-processing features we’re not aware of. The SoC has 4 CSI ports and seems to support 3 image sensors.

Lastly, we come to the display pipeline, which Samsung calls DECON, short for Display and Enhancement Controller. The DECON block is also responsible for hardware layer composition. Mobile devices use hardware layers, meaning separate frame-buffers that content is drawn into, and let the hardware unit combine them into the final image. The most common example of this is the Android status bar. Instead of re-rendering the whole screen whenever there’s activity in the status bar, the system redraws just the thin status bar and lets the hardware do the composition. Video playback windows and application overlays work in a similar fashion.
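The status-bar example can be sketched in a few lines: each layer owns its own buffer, only the layer whose content changed is re-rendered, and a compositor stacks the buffers into the final image. The `Layer` class and `compose` function are hypothetical; layers here are opaque (no alpha blending), and pixels are single characters purely for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical hardware layer: its own buffer plus an on-screen position."""
    name: str
    x: int
    y: int
    pixels: List[List[str]]   # one character per pixel, for readability
    redraws: int = 0          # how often software re-rendered this buffer

    def redraw(self, value: str) -> None:
        # Only this layer's buffer is re-rendered; other layers are untouched.
        self.pixels = [[value] * len(row) for row in self.pixels]
        self.redraws += 1


def compose(width: int, height: int, layers: List[Layer]) -> List[str]:
    """Stack opaque layers bottom-to-top (later ones overwrite earlier ones) into the output image."""
    screen = [["."] * width for _ in range(height)]
    for layer in layers:
        for dy, row in enumerate(layer.pixels):
            for dx, px in enumerate(row):
                sx, sy = layer.x + dx, layer.y + dy
                if 0 <= sx < width and 0 <= sy < height:
                    screen[sy][sx] = px
    return ["".join(r) for r in screen]


app = Layer("app", 0, 0, [["a"] * 6 for _ in range(4)])
status_bar = Layer("status_bar", 0, 0, [["s"] * 6])

status_bar.redraw("S")                    # e.g. the clock ticks over
frame = compose(6, 4, [app, status_bar])  # the "hardware" recombines the layers
print("\n".join(frame))
print(app.redraws, status_bar.redraws)    # the app layer was never re-rendered
```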

The SoC has two main display controllers besides a separate HDMI output. Each is capable of MIPI DSI or DisplayPort output, although I’m not sure about their full capabilities, such as resolution and frame rate. One addition to the Exynos 7420 that wasn’t present in past variants is a Video Post-Processor (VPP) on each display controller. I’m again uncertain what the new block does, but it seems to be capable of color-space conversion and uses poly-phase filters for some as-yet-unclear task. Also part of each display controller is a block called MDNIe (Mobile Digital Natural Image Enhancement), which is used on all Exynos SoCs for image color manipulation, sharpening and a large number of other effects. This is the block that enables Samsung devices to offer different display profiles targeting different calibrations. As a side note, Samsung also employs a similar block in its external AMOLED DDICs to provide the same functionality in devices that use third-party SoCs instead of Exynos.

I covered a bit of what MIC (Mobile Image Compression) was able to provide for the Galaxy Note 4 in our review of that device. At display resolutions higher than 1080p, the bandwidth required to transmit image data from the SoC to the DDIC exceeds the capacity of the usual 4-lane MIPI DSI interface. To drive 1440p and higher displays, vendors must either double up on the interface in a dual-DSI configuration, effectively using 8 lanes and thus doubling the power consumption of the link, or compress the stream. Currently Samsung is the only one to offer such a solution, in the form of its proprietary MIC mechanism, as the upcoming industry-standard DSC (Display Stream Compression) has not yet seen compatible products released.
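A rough bandwidth calculation shows why 1080p fits on four lanes while 1440p does not. The assumptions are hypothetical and labeled in the code: uncompressed 24-bit RGB at 60Hz, about 1Gbps per lane, and only active pixels counted (blanking and DSI packet overhead are ignored, which makes the real requirement somewhat higher). With a faster lane rate the margins shift, so treat the output as an illustration, not a specification.

```python
# Hypothetical assumptions (not from the article):
BITS_PER_PIXEL = 24   # uncompressed RGB888
REFRESH_HZ = 60
LANE_GBPS = 1.0       # assumed per-lane rate; real links vary
LANES = 4


def raw_gbps(width: int, height: int) -> float:
    """Active-pixel bandwidth only: ignores blanking intervals and DSI packet overhead."""
    return width * height * BITS_PER_PIXEL * REFRESH_HZ / 1e9


link_gbps = LANE_GBPS * LANES
for name, (w, h) in {"1080p": (1920, 1080), "1440p": (2560, 1440)}.items():
    need = raw_gbps(w, h)
    if need <= link_gbps:
        verdict = "fits"
    else:
        verdict = f"needs >= {need / link_gbps:.2f}:1 compression or 8 lanes"
    print(f"{name}: {need:.2f} Gbps vs {link_gbps:.1f} Gbps link -> {verdict}")
```

Under these assumptions 1080p needs about 2.99Gbps, while 1440p needs about 5.31Gbps, so even modest compression (around 1.33:1) would avoid the second set of lanes.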

An interesting feature of both implementations that I previously wasn’t familiar with is the capability of partial slice updates. If only a small part of the screen is updated, the compression algorithm only updates and transmits that part of the image, saving even more power by cutting down redundant data transmission. I was able to verify this by changing and exaggerating the image color parameters via the MDNIe block. The display controller wouldn’t explicitly refresh the whole image after the color configuration changed, and only issued a slice update to the DDIC when the clock and WiFi indicator showed activity. Because of the partial update, only a very small part of the screen would update with the new colors, demonstrating that the SoC transmits only fractions of the screen data, as static content is buffered directly on the DDIC.
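The partial-update idea boils down to mapping changed rows onto fixed-height horizontal slices and sending only the slices that were touched. In the sketch below, the slice height (64 rows) and the status-bar height (72 rows) are hypothetical values chosen for illustration; the real slice geometry used by MIC or DSC on this device is not known to us.

```python
from typing import List, Tuple


def dirty_slices(dirty_rows: List[Tuple[int, int]], slice_height: int, screen_height: int) -> List[int]:
    """Return the indices of horizontal slices touched by any [start, end) dirty row range."""
    touched = set()
    for start, end in dirty_rows:
        start, end = max(0, start), min(screen_height, end)
        if start >= end:
            continue  # empty or off-screen range: nothing to send
        first = start // slice_height
        last = (end - 1) // slice_height
        touched.update(range(first, last + 1))
    return sorted(touched)


SCREEN_H = 2560   # rows of a 1440x2560 panel in portrait
SLICE_H = 64      # hypothetical slice height

# Suppose only the status bar area (clock, Wi-Fi icon) changed: rows 0..71.
slices = dirty_slices([(0, 72)], SLICE_H, SCREEN_H)
total = SCREEN_H // SLICE_H
print(slices, f"-> {len(slices)}/{total} slices sent")
```

Everything outside the listed slices stays in the DDIC's frame buffer and is simply not retransmitted.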

Overall, the Exynos 7420 is an interesting SoC, and I hope we’ve been able to shed some light on most of the significant IP blocks that go into a modern SoC. At 78mm², the 7420 has quite some headroom to grow to the usual size of a high-end SoC. It’s possible Samsung intentionally kept the chip small to get higher yields and more units per wafer, as it is the first 14nm mass-production chipset for its foundries. It’s also possible that, as the $/transistor metric hasn't gone down with 14nm FinFET due to it being a very expensive process, we’re seeing the start of a new trend and the end of large 100mm²+ SoCs. It’ll definitely be interesting to see in which direction the mobile semiconductor vendors head in the coming year, as the process matures and production volume ramps up further with Samsung expanding and GlobalFoundries and TSMC starting their own FinFET mass production.
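The die-size argument can be made concrete with two standard first-order estimates: the common dies-per-wafer approximation (wafer area over die area, minus an edge-loss term) and the simple Poisson yield model Y = exp(-A·D0). The defect density below is a hypothetical value for an immature process, not a Samsung figure, and real yield models and wafer layouts are more involved. The point is only the direction of the effect: a smaller die gives both more candidates per wafer and a higher fraction of good ones.

```python
import math

WAFER_DIAMETER_MM = 300.0


def gross_dies_per_wafer(die_area_mm2: float, d: float = WAFER_DIAMETER_MM) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    return int(math.pi * (d / 2) ** 2 / die_area_mm2 - math.pi * d / math.sqrt(2 * die_area_mm2))


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model Y = exp(-A * D0), with A converted from mm^2 to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


D0 = 0.5  # hypothetical defect density for a new process, defects/cm^2
for area in (78.0, 100.0):
    gross = gross_dies_per_wafer(area)
    good = gross * poisson_yield(area, D0)
    print(f"{area:.0f} mm^2: {gross} gross dies, ~{good:.0f} good dies at D0={D0}")
```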

114 Comments

  • jjj - Monday, June 29, 2015

    The power doesn't look that great, as the A57 seems to allow only 300-350MHz higher clocks; granted, it's not a clean shrink. It looks good here because on 20nm they pushed the clocks way high.
  • name99 - Monday, June 29, 2015

    Insofar as rumors can be believed, the bulk of A9's are scheduled to be produced by Samsung, presumably on this process. It seems strange to have Apple design/layout everything twice for the same CPU, so if these same rumors (30% going to TSMC) are correct, presumably that means the A9X will be on TSMC.

    As for characterizing Apple CPUs, while there are limits to what one can learn (e.g. in the voltage/power tradeoffs), there is a LOT which can be done but which, to my disappointment, has still not been done. In particular, if someone wanted, I think there's scope for learning an awful lot from carefully crafted micro benchmarks. Agner Fog has given a large number of examples of how to do this in the x86 space, while Henry Wong at stuffedcow.net has done the same for a few less obvious parts of the x86 architecture and for GPUs.

    It strikes me as bizarre how little we know about Apple CPUs even after two years.
    The basic numbers (logical registers, window, ROB size) seem to roughly match Intel these days, and the architecture seems to be 6-wide with two functional clusters. There appears to be a loop buffer (but how large?), but that's about it.
    How well does the branch prediction work and where does it fail?
    What prefetchers are provided? (at I1, D1, L2, L3)
    Do the caches do anything smart (like dead block prediction) for either performance or power?
    Does the memory manager do anything smart (like virtual write queue in the L3)?
    etc etc etc

    Obviously Apple doesn't tell us these. (Nowadays the ONLY company that does is IBM, and only in pay-walled articles in their JRD.) But people write the micro benchmarks to figure this out for Intel and AMD, and I wish the same sort of enthusiasm and community existed in the ARM world.
  • SunnyNW - Wednesday, July 1, 2015

    Believe word on the street is the A9 will be Sammy 14nm and the A9X TSM 16nm+
  • SunnyNW - Wednesday, July 1, 2015

    Please ignore this comment, should have read the rest of the comments before posting since Name99 already alluded to this below. Sorry.
  • CiccioB - Monday, June 29, 2015

    Is the heterogeneous processing that allows all 8 cores to work together active?
    Judging by the numbers in the various benchmarks, it seems this feature is not used.
    What I would like to know exactly is whether the benchmark numbers of this SoC can be directly compared to SoCs with only 4 cores, like the upcoming Qualcomm Snapdragon 820, which is based on a custom architecture with "only" 4 cores and not a big.LITTLE configuration.
  • Andrei Frumusanu - Monday, June 29, 2015

    HMP is active. Why do you think it seems to be not used?
  • CiccioB - Monday, June 29, 2015

    Because with 8 cores active (or what they should be with HMP), the results are not even near 4x the score of a single core.
    So I wonder if those 8 cores are really active, and whether they are of any real use if, to keep consumption adequate, the frequencies of the big cores get limited.
  • Andrei Frumusanu - Monday, June 29, 2015

    All the cores are always active and they do not get limited other than in thermal stress situations. I didn't publish any benchmarks comparing single vs multi-core performance so your assumption must be based on something else. Having X-times the cores doesn't mean you'll have X-times the performance, it completely depends on the application.

    It's still a perfectly valid comparison to look at traditional quad-cores vs bL octa-cores. In the end you're looking at total power and total performance and for use-cases such as PCMark the number of cores used shouldn't be of interest to the user.
  • Refuge - Monday, June 29, 2015

    I would hazard a guess that thermal throttling has something to do with part of it.
  • ruturaj1989@gmail.com - Monday, June 29, 2015

    It does have 4 cores but I guess they are in big.LITTLE configuration too. We will see shortly. HMP is active but I am not sure if every bench app uses all the cores.
