Original Link: http://www.anandtech.com/show/6554/audience-announces-3rd-generation-sound-processor-es515
Audience Announces 3rd Generation Sound Processor - eS515
by Brian Klug on January 7, 2013 8:00 AM EST
About a month ago, Audience flew me out to their office in California to talk about a number of things. First, they offered the chance to check out the anechoic rooms and listening booths used for testing and tuning smartphones equipped with Audience voice processors. Second, we compared notes on testing and evaluating voice quality on mobile devices, both against the testing I’ve done and against the testing done by wireless operators. Third, we took a look at their newest voice processor, the eS515. We’ve been covering noise rejection performance and voice quality on smartphones and tablets in our mobile device reviews, and Audience makes a line of standalone voice processors that improve voice quality at both the origin and the endpoint of a mobile call.
First, Audience’s newest processor, the eS515, is no longer just a standalone voice processor but a combination voice processor and audio codec. There’s a corresponding change in naming from voice processor to sound processor, as the eS515 takes the place of both a standalone codec and the slot otherwise taken by a standalone voice processor. This changes Audience’s lineup from a solution that requires an additional package (a separate codec alongside the voice processor) to one that includes all the functions of a normal audio codec in addition to Audience’s noise processing infrastructure. The move gives direct access to all of the audio rails and is likely an easier sell to OEMs looking for a simple standalone solution. The eS515 includes a 1.13 W class-D speaker driver and a 30 mW class-G headphone driver.
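The headphone rating can be turned into drive levels with Ohm's law. This is a quick worked sketch, assuming the 30 mW figure is continuous RMS sine-wave power into the 16 Ω load given in Audience's feature list, and treating the headphone as a pure resistor (real headphones are not). The helper name `rms_drive` is purely illustrative.

```python
import math

def rms_drive(power_w: float, load_ohm: float) -> tuple[float, float]:
    """Return (V_rms, I_rms) that deliver power_w into a purely resistive load.

    From P = V_rms**2 / R and P = I_rms**2 * R.
    """
    v_rms = math.sqrt(power_w * load_ohm)
    i_rms = math.sqrt(power_w / load_ohm)
    return v_rms, i_rms

# 30 mW into 16 ohms, the headphone rating in Audience's feature list
v, i = rms_drive(0.030, 16.0)
v_peak = v * math.sqrt(2)  # sine-wave peak; a ground-centered output swings +/- this
print(f"{v:.3f} V rms ({v_peak:.3f} V peak), {i * 1000:.1f} mA rms")
```

Under these assumptions the driver needs roughly 0.69 V rms (about 0.98 V peak) and 43 mA rms, so ground-centered outputs only have to swing about ±1 V to reach the rated power.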
The other big news about the eS515 is the inclusion of some improved audio processing features, along with new ones that go beyond the usual emphasis on removing noise from calls and improving ASR (automatic speech recognition, or voice to text).
For a while we’ve seen smartphones shipping with two microphones, and recently some with three, including the iPhone 5 and a number of prior Motorola phones. Until now, however, these implementations have used the microphones in pairs, selecting two at a time. Audience designs with three microphones likewise used them in pairs, selecting primary and secondary microphones depending on the phone’s position. The eS515 is the first to include a three-microphone processing algorithm, should an OEM choose to include a third microphone.
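Audience doesn't describe how its three-microphone algorithm works. To illustrate why using all three microphones at once can beat picking a pair, here is a textbook delay-and-sum beamformer, explicitly a stand-in and not Audience's method: each channel is shifted by its (assumed known, whole-sample) arrival delay and the aligned channels are averaged, so the voice adds coherently while independent noise partly cancels. The delays, noise level, and random "speech" signal are all hypothetical.

```python
import random

def delay_and_sum(mics: list[list[float]], delays: list[int]) -> list[float]:
    """Align each mic channel by its arrival delay (in samples) and average them."""
    max_d = max(delays)
    n = min(len(m) for m in mics) - max_d  # samples available after alignment
    return [sum(m[t + d] for m, d in zip(mics, delays)) / len(mics) for t in range(n)]

def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error over the overlapping samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / min(len(a), len(b))

random.seed(0)
speech = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in voice signal
delays = [0, 2, 5]  # hypothetical per-mic arrival delays, in samples

def simulate_mic(delay: int, noise_std: float) -> list[float]:
    """Delayed copy of the speech plus independent Gaussian noise."""
    return [(speech[t - delay] if t >= delay else 0.0) + random.gauss(0.0, noise_std)
            for t in range(len(speech))]

mics = [simulate_mic(d, 0.5) for d in delays]
beam = delay_and_sum(mics, delays)
print(f"single mic error: {mse(mics[0], speech):.3f}")
print(f"3-mic beam error: {mse(beam, speech):.3f}")
```

With independent noise on each microphone, averaging three aligned channels cuts the noise power to roughly a third (errors of about 0.25 versus 0.08 here). Real phones face fractional delays, correlated room noise, and tiny microphone spacing, which is why production algorithms are far more elaborate.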
New features for the eS515 also include de-reverb for rooms with heavy reverberation, improved wideband processing for VoIP calls (up to 24 kHz, well beyond the 8 kHz used for wideband cellular voice calls; 16 kHz is also supported), further improved ASR processing, and finally features for reducing noise when recording video.
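Assuming the 8, 16, and 24 kHz figures are sampling rates (the usual telephony convention, and how the press release's Narrowband/Wideband/Super Wideband list reads), the Nyquist limit gives the highest audio frequency each mode can carry: half the sampling rate. A minimal sketch:

```python
def audio_bandwidth_hz(sample_rate_hz: float) -> float:
    """Nyquist limit: the highest frequency a signal sampled at fs can represent."""
    return sample_rate_hz / 2

rates = [("narrowband", 8_000), ("wideband", 16_000),
         ("super wideband", 24_000), ("HD recording", 48_000)]
for name, fs in rates:
    print(f"{name:>14}: fs = {fs / 1000:g} kHz, content up to {audio_bandwidth_hz(fs) / 1000:g} kHz")
```

Read that way, a 16 kHz wideband call can carry content up to 8 kHz and a 24 kHz super wideband call up to 12 kHz. These are ceilings; practical codecs filter well below them (narrowband telephony passes roughly 300 to 3400 Hz, and AMR-WB roughly 50 to 7000 Hz).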
This last feature is similar to what Motorola shipped on a number of earlier phones that leveraged a three-microphone setup, which offered different audio scenes such as narration mode, wind reduction, and so on. The eS515 includes an interview mode called “Audio Zoom” that looks for voice sources in front of and behind the device and rejects noise from elsewhere. Audience envisions a camera UI similar to Motorola’s, with different audio scenes for users to choose from when recording video.
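Audience doesn't say how Audio Zoom shapes its pickup, but the behavior described (keep voices in front and behind, reject the sides) matches the classic figure-eight pattern, whose sensitivity is |cos θ|. An ideal pair of closely spaced microphones along the front-back axis, subtracted from each other, approximates this at low frequencies. This sketch only evaluates that idealized pattern; it is an illustration, not Audience's algorithm.

```python
import math

def figure_eight_gain_db(angle_deg: float) -> float:
    """Relative sensitivity (dB) of an ideal figure-eight pattern, |cos(theta)|.

    0 deg is straight in front of the device, 180 deg straight behind it.
    """
    g = abs(math.cos(math.radians(angle_deg)))
    if g < 1e-12:
        return float("-inf")  # ideal null exactly at the sides
    return 20 * math.log10(g)

for angle in (0, 45, 60, 90, 135, 180):
    print(f"{angle:3d} deg: {figure_eight_gain_db(angle):7.2f} dB")
```

Front and back are passed at full level, sources 60° off axis drop by about 6 dB, and the sides fall into a null, which is the "interviewer plus interviewee" geometry the article describes.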
I recorded a short video of “Audio Zoom” being demonstrated on an eS515 simulator.
After a design win, Audience works with handset and tablet makers on final tuning of their devices, both after the industrial design is finished and sometimes on the acoustic design before it is finalized. Part of this requires special calibrated rooms to characterize the frequency response and directionality of devices. In addition, Audience needs a testing methodology for benchmarking its own projects.
I got to peek into Audience’s anechoic chamber and an ETSI room, as defined by EG 202 396-1, for noise suppression testing. Each room contains a HATS (Head And Torso Simulator) instrumented with microphones for testing phones and tablets, along with a controllable apparatus that holds the device under test and moves it through various positions.
The ETSI EG 202 396-1 specification defines a simple room layout with four speakers and a subwoofer playing distractor music around the caller. I plan to move our own smartphone call testing to a similar setup as well.
Audience earSmart eS515
World’s First Smart Sound Processor Combines Best in Class Codec with Leading Advanced Voice Technology to Enable New Levels of Personal Presence on Mobile Devices
The earSmart eS515 Smart Sound Processor combines Audience’s award-winning earSmart Advanced Voice processor with its first high-performance, low-power stereo audio codec to deliver a best in class voice and multimedia experience. In addition, Audience’s Smart Sound Processor improves the accuracy of Automatic Speech Recognition (ASR) for smartphones, tablets and other consumer devices.
Now, more than ever, high-end mobile device users are demanding clearer voice quality, “first-time success voice recognition” and an immersive sound experience. This higher functionality, combined with slimmer designs and larger displays, is leading device manufacturers to look for tightly integrated processing solutions that deliver more, for less power. The earSmart eS515 offers a level of system integration not previously available for customers accepting nothing less than best-in-class voice and multimedia quality. The Audience eS515 is the first sound processor to offer advanced digital signal and multimedia processing integrated with a complete high-performance, low-power audio codec subsystem.
Be Heard – With Best in Class Voice Processing
The earSmart eS515 features 3rd generation Advanced Voice processing features. It delivers the world’s first true support of three microphones, where all three are used simultaneously to gather more environmental information around the user and deliver vastly improved voice quality. The processing power of the earSmart eS515 has more than doubled to support high sample-rate VoIP calls (at 24 kHz) in addition to Narrowband and Wideband noise suppression. The earSmart eS515 offers the industry's first de-reverberation solution to reduce the echo of a speaker's voice in hallways, conference rooms and other challenging environments.
Be Understood – With Optimized ASR Assist
Audience ASR Assist technology directly addresses the challenge faced by many speech recognition applications today, namely their failure to recognize spoken words and complete assigned tasks because of disruptive background noise. The earSmart eS515 uses custom hardware-accelerated algorithms to isolate voice from surrounding environmental noise, dramatically improving the accuracy of speech-enabled applications such as voice search and speech to text. Mobile devices equipped with Audience ASR Assist technology deliver improved speech application reliability, accuracy and task completion – even in noisy, distracting environments.
Be There… Captured Memories Sound as Good as They Look
Mobile devices today are relied upon for media capture as much as they are for communication. The desire for captured audio to have the same high definition (HD) quality as video has never been more apparent. The earSmart eS515 Smart Sound Processor features the world’s first two-microphone 48 kHz noise suppression implementation for recording clean, high definition audio. The Audience eS515 also features Audio Zoom, the world’s first selective audio capture feature to be deployed for mobile devices. Audio Zoom allows users to dynamically switch from narrator mode, with a single speaker, to interview mode, where the person holding the device can “interview” another person. Interview mode leverages Audience’s industry leading stationary noise reduction technology to capture both voices with crystal clear accuracy while still suppressing background noise.
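The press release credits interview mode to Audience's stationary noise reduction but doesn't describe it. As a stand-in, the textbook baseline for stationary noise is power spectral subtraction: estimate the noise power in each frequency bin (for example while nobody is talking), then attenuate each bin of the noisy frame according to how much of its power that estimate explains. This sketch computes only the per-bin gains; the STFT framing, the noise tracking, the helper name, and the 0.1 gain floor are assumptions, and none of it is Audience's implementation.

```python
import math

def spectral_subtraction_gains(noisy_power, noise_power, floor=0.1):
    """Per-bin amplitude gains for basic power spectral subtraction.

    noisy_power: |X(k)|^2 of the current frame, one value per frequency bin
    noise_power: estimate of the (assumed stationary) noise power per bin
    floor:       minimum gain, which limits over-suppression ("musical noise")
    """
    gains = []
    for x, n in zip(noisy_power, noise_power):
        if x <= 0.0:
            gains.append(floor)  # empty bin: nothing worth keeping
            continue
        clean_fraction = max(1.0 - n / x, 0.0)  # estimated speech share of the power
        gains.append(max(math.sqrt(clean_fraction), floor))
    return gains

# Hypothetical frame: speech dominates bin 0, bins 2 and 3 are noise-dominated
print(spectral_subtraction_gains([10.0, 2.0, 1.0, 0.5], [1.0, 1.0, 1.0, 1.0]))
```

Bins where speech dominates pass nearly untouched, while bins the noise estimate fully explains are pushed down to the floor. Because the noise estimate is held fixed between updates, this approach only works well for steady (stationary) noise such as fans or road hum.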
The earSmart eS515 Smart Sound Processor is a high performance custom designed audio processing solution with a significant level of integration and flexibility. It includes an Audience Advanced Voice processor with a hardware acceleration engine that has been optimized to run computationally intensive audio processing algorithms at very low-power consumption.
In addition, the highly integrated Audience earSmart eS515 Smart Sound Processor includes multiple host interfaces, digital audio interfaces, and a high performance, low power codec subsystem. This best in class codec consists of three ADCs on the record path and four DACs on the playback path. Two of the ADCs are designed for microphone inputs and the third is an auxiliary ADC that can be used with line level input. The codec subsystem consists of stereo class D speaker drivers, stereo class G headphone drivers, an earpiece driver, and two line outputs.
Key features of the Audience earSmart eS515 Smart Sound Processor are:
• High performance, low-power codec with stereo class-D and stereo class-G drivers
◦ Stereo Class-D loud speaker driver
▪ 1.13 Watt output power (4.8V)
▪ Control circuitry for low EMI emissions
▪ Integrated anti-pop circuitry for all driver output paths
▪ Filter-less design implementation for lower system cost
◦ Stereo Class-G headphone driver
▪ 30 mW output power (16Ω)
▪ Cap-less amplifier
▪ Charge-pump design for ground centered outputs
▪ Headphone/headset accessory and button detection circuitry
• 8kHz, 16kHz, as well as 24kHz voice processing capabilities for transmit and receive non-stationary noise suppression
• Hardware-assisted power optimization for low power consumption
• Transmit de-reverberation algorithms for reducing the undesired effect and distortion from environmental reverberation
• SLIMbus digital audio and command interface support for easy integration
• Advanced multimedia processing capabilities for enhanced audio recording and playback
• Automatic Speech Recognition Assist for ASR-optimized noise suppression and gain control
• Non-stationary and stationary noise suppression (Close talk, Far talk)
• ASR Assist mode
• Audience Hi-Fi Voice, Voice Stretch and Acoustic Echo Cancellation
• Automatic Gain Control, Voice Equalization, Post Equalization and Multiband Compressor
• Narrowband (8kHz), Wideband (16kHz) and Super Wideband (24 kHz) signal processing
• Transmit De-Reverb
• UI Tone Mixing, Audio Multiband Compressor (MBC), Parametric EQ and Stereo Widening
• Dynamic Range Compressor (DRC)
• Equalizer Engine and Virtual Bass Boost support (OpenSL ES 1.1 compliant)
• Configurable Beep Generator
• Multimedia recording features
◦ Audio Zoom directional sound capture modes for camcorder application
◦ Enhanced stereo recording: presets for optimized camcorder performance in multiple everyday, real-life environments
◦ Stereo 2-Mic stationary noise suppression
Key Device Features
• Audience earSmart Voice Processor core
• 4 High performance DACs with 104dB dynamic range
• 4.8 mW stereo playback at 48kHz
• 1.1 W stereo class-D speaker driver
• 30 mW stereo class-G headphone driver
• 80 mW earpiece driver
• 2 Line output drivers
• 2 High performance 92dB SNR ADCs with mic PGAs
• 1 auxiliary ADC for line input with PGA
• SLIMbus interface
• Digital audio ports
◦ 3 master/slave PCM/I2S ports
◦ Supports 4 digital mic inputs and 1 output
◦ Up to 192kHz sample rates
• I2C, UART, and SLIMbus host interfaces
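The converter figures above can be translated into rough bit depths with the ideal-quantizer relation SNR ≈ 6.02·N + 1.76 dB, solved for N. Treat the result as a ballpark only: dynamic range and SNR are measured under different conditions, and datasheets often apply weighting, so this is not how Audience itself characterizes the parts.

```python
def effective_bits(snr_db: float) -> float:
    """Effective number of bits for an ideal converter with a full-scale sine.

    Solves SNR = 6.02 * N + 1.76 dB for N.
    """
    return (snr_db - 1.76) / 6.02

for label, db in [("DAC dynamic range", 104.0), ("ADC SNR", 92.0)]:
    print(f"{label}: {db:g} dB ~ {effective_bits(db):.1f} bits")
```

By this rule of thumb the 104 dB DACs behave like roughly 17-bit converters and the 92 dB ADCs like roughly 15-bit converters.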