TWS Reference Design

About This Application Note

The TWS Reference Design Application Note describes the DSP Concepts True Wireless Stereo (TWS) reference design. The Audio Weaver Design, intended for use with DSP Concepts TWS RAPID Kit, features standard processing and signal flows seen in popular TWS commercial products. For an in-depth explanation of the RAPID TWS Kit and how to set up your kit, please refer to the RAPID TWS Kit Native Mode Setup and User Guide and the TWS_Audio-Weaver--RT685EVK-RevE-Board-Users-Guide.

Processing Overview

Audio in the design can be broken into three distinct but related paths. The first is the path of audio from the microphones to the connected device, called the own voice processing path. The second is the path of audio from the device to the transducers in the earbuds, called the playback processing path. The third is the path from the microphones to the transducers in the earbuds, called the ambient sound processing path. The following diagram outlines the three paths.

The processing that composes these paths is described later in this guide.

TWS Reference Design High-Level Processing Diagram

Own Voice Processing Path

This section describes the major components of the own voice processing path.

Voice Activity Detection

The Voice Activity Detection (VAD) subsystem utilizes the bone conduction sensor in the RAPID earbuds to detect when the user is speaking. The status of the VAD is propagated to Mode Detection, Dynamics Processing, and Keyword Spotting to affect the processing they apply.

Mode Detect

The mode detection (Mode Detect) subsystem detects the noise conditions under which the device is operating. This status is propagated to the Voice Communication and Signal Fusion subsystems to affect the processing they apply. The possible states are low noise, high noise, and wind. The status is encoded as a number, as described in the following table.

Table: Mode Detect Codes

Mode Detect Code	Status
0	Low Noise
1	High Noise
2	Wind
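As an illustration only, host-side code that reads the Mode Detect status could mirror the table with an enumeration. The names `ModeDetectCode` and `describe_mode` are hypothetical and are not part of the reference design; only the numeric codes come from the table.

```python
from enum import IntEnum


class ModeDetectCode(IntEnum):
    """Mode Detect status codes, as listed in the Mode Detect Codes table."""
    LOW_NOISE = 0
    HIGH_NOISE = 1
    WIND = 2


def describe_mode(code: int) -> str:
    """Return a readable label for a raw Mode Detect code.

    Raises ValueError if the code is not one of the three table entries.
    """
    return ModeDetectCode(code).name.replace("_", " ").title()


print(describe_mode(2))  # Wind
```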

Voice Comm

The Voice Communication (Voice Comm) subsystem applies processing to the incoming microphone signals to improve the quality of speech that is transmitted to the connected device. The processing applied depends on what noise condition is detected by the mode detect module. The possible modes of operation are low noise, high noise, and wind. In low noise and high noise mode, beamforming is used to reject noise outside of the path from the microphones to the mouth. In wind mode, processing is applied to remove as much wind noise as possible from the forward microphone in the earbud. The output of all processing modes is then fed to single channel noise reduction (SCNR).

Signal Fusion

The Signal Fusion subsystem combines the bone conduction sensor with the forward-facing microphone under wind conditions. Under other modes, the signal is left unaltered by the subsystem.
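The mode-dependent behavior of Voice Comm and Signal Fusion described above can be summarized in a small lookup. This is a sketch of the decision logic only, not of the signal processing itself; the function name and the dictionary keys are hypothetical labels chosen for this example.

```python
LOW_NOISE, HIGH_NOISE, WIND = 0, 1, 2  # codes from the Mode Detect Codes table


def own_voice_plan(mode_code):
    """Summarize which own-voice stages the text describes for a given mode."""
    if mode_code not in (LOW_NOISE, HIGH_NOISE, WIND):
        raise ValueError(f"unknown Mode Detect code: {mode_code}")
    wind = mode_code == WIND
    return {
        # Beamforming in low/high noise, wind noise removal in wind mode.
        "voice_comm_front_end": "wind_noise_removal" if wind else "beamforming",
        # The output of every mode is fed to single channel noise reduction.
        "scnr": True,
        # Signal Fusion only alters the signal under wind conditions.
        "signal_fusion": wind,
    }


print(own_voice_plan(WIND))
```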

Dynamics Processing

The Dynamics Processing subsystem contains all dynamics processing applied to the own voice processing path signal. It consists of a noise gate keyed to the VAD, transient shaping, automatic gain control (AGC), and peak limiting.

Keyword Spotting

Keyword spotting is a basic demonstration of a Voice User Interface (VUI) and Voice Assistant (VA). It is composed of the Sensory™ TrulyHandsfree™ module and a One Shot Player module. The Sensory module listens for the desired keyword. When the keyword is detected, it triggers the One Shot Player module to play a short recording of a bell into the playback processing path of the design. The functionality of this system can be extended for more advanced VUI/VA functionality.

Playback Processing Path

This section describes the major components of the playback processing path.

Playback Master Volume

The playback master volume controls the playback level. The other gains are fixed and must not be changed.

Stereo Externalization (VisiSonics)

Stereo Externalization prevents in-head localization by making stereo streaming audio content sound like it originates from loudspeakers in front of the user. Users can adjust the reverberation level using three presets.

Streaming Personalization (Mimi)

Streaming Personalization amplifies and changes the frequency response of streaming audio signals to optimize audio playback based on a user’s hearing ability and preferences. Users can adjust the processing using four age-based presets.

Corrective EQ

A corrective EQ is applied to the playback processing path to flatten the frequency response of the earbud drivers’ output at the eardrum reference point.

Streaming EQ

A streaming EQ is applied to the playback processing path to allow users to tune the spectral quality of streaming audio content for an enhanced listening experience.

Output Limiter

The output limiter subsystem is a peak limiter used to keep the system from clipping.

Soft Clipper

The soft clipper protects the earbuds from potentially damaging peaks in the signal.
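For readers unfamiliar with soft clipping, the sketch below shows one common textbook form, a scaled hyperbolic tangent. It is an assumption made for illustration: the source does not state which curve or threshold the reference design's soft clipper uses, and `soft_clip` is a hypothetical name.

```python
import math


def soft_clip(samples, threshold=1.0):
    """Smoothly compress samples so their magnitude never exceeds threshold.

    Small values pass almost unchanged; large peaks are rounded off
    instead of being cut flat as a hard clipper would do.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [threshold * math.tanh(x / threshold) for x in samples]


print(soft_clip([0.0, 0.1, 2.0, -5.0]))
```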

Ambient Sound Processing Path

This section describes the major components of the ambient sound processing path.

Ambient Sound Master Volume

The ambient sound master volume controls the ambient sound level. The other gains are fixed and must not be changed.

Transparency EQ

A transparency EQ is applied to the ambient sound processing path to match the frequency response of the external microphone input signal to the frequency response of an open ear at the eardrums.

Corrective EQ

A corrective EQ is applied to the ambient sound processing path to flatten the frequency response of the balanced armature driver’s output at the eardrum reference point.

Power Control (RT685 Only)

The power control feature is only available in the RT685 version of the reference design. It includes APIs to activate and deactivate playback processing and ambient sound processing. Always-on features, including own voice processing and keyword spotting, are automatically activated or deactivated by the following elements.

Quiescent Sound Detector

The Quiescent Sound Detector (QSD) module monitors activity in the ambient sound level. If it detects a quiet ambient sound level, the power control will deactivate all processing other than the QSD.

Voice Activity Detection

The Voice Activity Detection (VAD) subsystem utilizes the bone conduction sensor in the RAPID earbuds to detect when the user is speaking. If the QSD detects sounds, it will activate the VAD. If the VAD detects that the user is speaking, it will activate all own voice processing and keyword spotting. The status of the VAD is propagated to Mode Detection, Dynamics Processing, and Keyword Spotting to affect the processing they apply.
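The activation cascade described in the two subsections above (QSD first, then VAD, then own voice processing and keyword spotting) can be sketched as a simple decision function. This is a minimal model of the described behavior, not the power control API; the function name and block labels are hypothetical.

```python
def active_blocks(ambient_is_quiet, user_is_speaking):
    """Return the set of always-on blocks the text says would be running."""
    active = {"QSD"}  # the QSD keeps running in every state
    if ambient_is_quiet:
        # Quiet ambient level: everything other than the QSD is deactivated.
        return active
    active.add("VAD")  # QSD detected sound, so the VAD is activated
    if user_is_speaking:
        # VAD detected speech: own voice processing and keyword spotting start.
        active.update({"own_voice_processing", "keyword_spotting"})
    return active


print(sorted(active_blocks(ambient_is_quiet=False, user_is_speaking=True)))
```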

Native Design

Setup

The native design assumes that you have followed the setup instructions in the RAPID TWS Kit Setup and User Guide. Please review that document before proceeding.

Calibration

Because the interface does not apply fixed gain to the signals from the earbuds, you must calibrate the gains applied to the microphones for proper functioning of the design. Levels should be set using the VU meter following the mic trim gain in the record path.

Exterior/Ear Canal Microphones

Play a 1 kHz tone at 70 dBA at 1 foot from the transducer.
Adjust microphone gains via the ‘MicTrimGain’ ScalerNV2 module until they show -40 dBFS on the meter in Audio Weaver.

Bone Conduction Sensor

Wearing the earbuds, make an “E” sound at 70 dBC at 18 inches.
Adjust microphone gains via the ‘MicTrimGain’ ScalerNV2 module until they show -40 dBFS on the meter in Audio Weaver.
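When adjusting the trim, the required change is the difference between the -40 dBFS target and the current meter reading. The helper below is a sketch of that arithmetic; it assumes nothing about whether the MicTrimGain control is expressed in dB or as a linear factor, so both forms are shown. The function names are hypothetical.

```python
TARGET_DBFS = -40.0  # calibration target from the steps above


def trim_adjustment_db(meter_dbfs, target_dbfs=TARGET_DBFS):
    """Gain change in dB needed to move the meter reading to the target."""
    return target_dbfs - meter_dbfs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


change_db = trim_adjustment_db(-46.0)
print(change_db)                # 6.0
print(db_to_linear(change_db))  # roughly 2.0
```

A meter reading of -46 dBFS therefore calls for about 6 dB more trim gain, which is roughly a factor of 2 in linear terms.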

To roughly match the microphone input and streaming output levels between the Native and RT685 reference designs, set the UMC1820 gain knobs as follows. After the knobs are set, do not change them.

Microphone input

Set the microphone input analog gain knob to the 12 o’clock position.

Speaker Output

Set the speaker output analog gain knob to the 10 o’clock position.

Signal Management

Audio Routing

In the native design, the input and output pins of the system are used for sending and receiving audio from the earbuds. The following tables display the configuration of signals assumed by the design. For setup instructions, refer to DSP Concepts RAPID TWS Kit Setup and User Guide.

Input

Table: Native Mode TWS Reference Design Input Channels

Pin	Signal
IN 1	Left Exterior Microphone #1
IN 2	Left Exterior Microphone #2
IN 3	Left Ear Canal Microphone
IN 4	Left Bone Conduction Sensor
IN 5	Right Exterior Microphone #1
IN 6	Right Exterior Microphone #2
IN 7	Right Ear Canal Microphone
IN 8	Right Bone Conduction Sensor

Output

Table: Native Mode TWS Reference Design Output Channels

Pin	Signal
OUT 1	Left Earbud Balanced Armature
OUT 2	Left Earbud Dynamic Driver
OUT 3	Right Earbud Balanced Armature
OUT 4	Right Earbud Dynamic Driver

Audio Playback

Playback of audio from a WAV file on disk is controlled by the WaveFileSource module. Double-click the module to bring up an Inspector for the module. Playback can be stopped and started by toggling the isActive check box.

Control of the playback file path, number of channels, sample rate, block size, cache size, and looping can be accessed by completing the following steps:

Right click the WaveFileSource module.
Select View Properties.
Select the Arguments tab in the Property Sheet at the bottom of the Designer window.

Audio Capture

Recording of audio to a WAV file is controlled by the WaveFileSink module. Double-click the module to bring up a small Inspector control panel for the module. Recording can be stopped and started by toggling the isActive check box.

Control of the record file path, file base name, cache size, and bit depth can be accessed by completing the following steps:

Right click the WaveFileSink module.
Select View Properties.
Select the Arguments tab in the Property Sheet at the bottom of the Designer window.

Keyword Detection Model

To change the trigger word recognition model file, complete the following steps:

Right click the SensoryTHF_V6 module in the VUI subsystem.
Select View Properties.
Select the Arguments tab in the Property Sheet at the bottom of the Designer window.
Specify the file path of the desired Sensory model.

RT685 Design

Setup

For instructions on how to use the RAPID TWS Prototyping Kit with an NXP® RT685 EVK, refer to the TWS_Audio-Weaver--RT685EVK-RevE-Board-Users-Guide.

Signal Management

Audio Routing

In the RT685 design, the input and output pins of the system are used for all signals routed in and out of the design. The following tables display the configuration of signals used by the design. For setup instructions, including how to configure your Digital Audio Workstation (DAW), refer to the TWS_Audio-Weaver--RT685EVK-RevE-Board-Users-Guide.

Input

Table: RT685 TWS Reference Design Input Channels

Pin	Signal
IN 1	USB Playback Channel #1
IN 2	USB Playback Channel #2
IN 3	USB Playback Channel #3
IN 4	USB Playback Channel #4
IN 5	Left Ear Canal Microphone
IN 6	Left Bone Conduction Sensor
IN 7	Left Exterior Microphone #1
IN 8	Left Exterior Microphone #2
IN 9	Right Ear Canal Microphone
IN 10	Right Bone Conduction Sensor
IN 11	Right Exterior Microphone #1
IN 12	Right Exterior Microphone #2

Output

Table: RT685 TWS Reference Design Output Channels

Pin	Signal
OUT 1	USB Record Channel #1
OUT 2	USB Record Channel #2
OUT 3	USB Record Channel #3
OUT 4	USB Record Channel #4
OUT 5	USB Record Channel #5
OUT 6	USB Record Channel #6
OUT 7	USB Record Channel #7
OUT 8	USB Record Channel #8
OUT 9	USB Record Channel #9
OUT 10	USB Record Channel #10
OUT 11	Left Earbud Balanced Armature
OUT 12	Left Earbud Dynamic Driver
OUT 13	Right Earbud Balanced Armature
OUT 14	Right Earbud Dynamic Driver
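Note that the earbud input signals appear in a different order in the RT685 design than in the native design. The sketch below, which copies the signal names from the two input tables, looks up the RT685 input pin that carries the same signal as a given native input pin. The list and function names are hypothetical and exist only for this example.

```python
# Signals on native pins IN 1..IN 8, in table order.
NATIVE_INPUTS = [
    "Left Exterior Microphone #1",
    "Left Exterior Microphone #2",
    "Left Ear Canal Microphone",
    "Left Bone Conduction Sensor",
    "Right Exterior Microphone #1",
    "Right Exterior Microphone #2",
    "Right Ear Canal Microphone",
    "Right Bone Conduction Sensor",
]

# Signals on RT685 pins IN 1..IN 12, in table order.
RT685_INPUTS = [
    "USB Playback Channel #1",
    "USB Playback Channel #2",
    "USB Playback Channel #3",
    "USB Playback Channel #4",
    "Left Ear Canal Microphone",
    "Left Bone Conduction Sensor",
    "Left Exterior Microphone #1",
    "Left Exterior Microphone #2",
    "Right Ear Canal Microphone",
    "Right Bone Conduction Sensor",
    "Right Exterior Microphone #1",
    "Right Exterior Microphone #2",
]


def native_to_rt685_pin(native_pin):
    """Map a 1-based native input pin to the 1-based RT685 input pin."""
    if not 1 <= native_pin <= len(NATIVE_INPUTS):
        raise ValueError(f"native pin out of range: {native_pin}")
    signal = NATIVE_INPUTS[native_pin - 1]
    return RT685_INPUTS.index(signal) + 1


print([native_to_rt685_pin(p) for p in range(1, 9)])
# [7, 8, 5, 6, 11, 12, 9, 10]
```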

Audio Playback

After you complete the instructions for setup in the TWS_Audio-Weaver--RT685EVK-RevE-Board-Users-Guide, the first four input channels of the design become available for routing audio from your DAW to the RT685 board. This is achieved by routing the output of audio tracks in your DAW to channels 1-4.

Audio Capture

After you complete the instructions for setup in the TWS_Audio-Weaver--RT685EVK-RevE-Board-Users-Guide, the first 10 output pins of the design can be recorded in your DAW. This is achieved by routing the input to audio tracks in your DAW to channels 1-10.

Keyword Detection Model

To change the trigger word recognition model file, complete the following steps:

Right click the SensoryTHFEmbeddedV2 module in the VUI subsystem.
Select View Properties.
Select the Arguments tab in the Property Sheet at the bottom of the Designer window.
Specify the file path of the desired Sensory model.

Resource Requirement

The computational load and data memory usage were measured on the RT685 HiFi4 core. The power control logic progressively activates or deactivates parts of the reference design's processing. Data memory usage is fixed at design time and is not affected by the power control logic.

The following table is a high-level breakdown of the phase 3 reference design on the RT685 HiFi4 core. The measurements were made with AWE-Core AC-8.C.9.

Table: Breakdown of Resource Usage

Processing Type	CPU Load (MHz)	SRAM Usage (KB)
Own Voice Processing	56	143
Ambient Sound Processing¹	7	1
Playback Processing²	155	213
VisiSonics stereo externalization	80	141
Mimi streaming personalization	54	15
Sensory keyword detection	14	53
Others³	0	100
Total	232	510

Notes:

¹ Includes only Transparency EQ and Corrective EQ.
² Includes VisiSonics and Mimi.
³ I/O buffer and wire scratch memory.
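Because the Playback Processing row already includes VisiSonics and Mimi, those two rows are not added again in the total. The short check below reproduces the Total row from the table values under that reading; the dictionary name is hypothetical.

```python
# (CPU load in MHz, SRAM in KB) for the rows that contribute to the total.
# The VisiSonics and Mimi rows are omitted because Playback Processing
# already includes them.
TOP_LEVEL_ROWS = {
    "Own Voice Processing": (56, 143),
    "Ambient Sound Processing": (7, 1),
    "Playback Processing": (155, 213),
    "Sensory keyword detection": (14, 53),
    "Others": (0, 100),
}

total_cpu = sum(cpu for cpu, _ in TOP_LEVEL_ROWS.values())
total_sram = sum(sram for _, sram in TOP_LEVEL_ROWS.values())
print(total_cpu, total_sram)  # 232 510
```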

Use Case Analysis

Each use case requires either the full system or a subset of the signal processing features in the reference design. The possible use cases are:

Always ON

Own voice processing and keyword detection
Own voice processing, keyword detection and ambient sound processing
Own voice processing, keyword detection and voice command detection (not available yet)
Own voice processing, keyword detection, voice command detection and ambient sound processing (not available yet)

Always ON + Streaming Audio Listening: any combination of the items below and the items in Always ON

EQ and output limiter
EQ, output limiter and streaming personalization
EQ, output limiter and stereo externalization without head tracking
EQ, output limiter, streaming personalization, and stereo externalization without head tracking

Always ON + Voice Call: any combination of the items below and the items in Always ON

voice call Rx without personalization
voice call Rx with personalization (not available yet)

The following table lists the resource requirements of each use case. Choose one Always ON item (1 or 2) and any of the others (3, 4, 5, 6, or 7), and add them together.

Table: Resource Usage of Various Use Cases

ID	Use Case	Features	MIPS	SRAM (KB)
1	Always ON	Own voice processing and keyword detection	70	283
2	Always ON	Own voice processing, keyword detection and ambient sound processing	80	286
3	Streaming Audio Listening	EQ and output limiter	21	57
4	Streaming Audio Listening	EQ, output limiter and streaming personalization	75	72
5	Streaming Audio Listening	EQ, output limiter and stereo externalization without head tracking	101	198
6	Streaming Audio Listening	EQ, output limiter, streaming personalization, and stereo externalization without head tracking	155	213
7	Voice Call	voice call Rx without personalization	29	65
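The addition described above can be sketched as follows, using the MIPS and SRAM values from the table. The function and dictionary names are hypothetical, and the code simply sums the listed figures as the text instructs.

```python
# (MIPS, SRAM in KB) per use case ID, copied from the table above.
USE_CASES = {
    1: (70, 283),
    2: (80, 286),
    3: (21, 57),
    4: (75, 72),
    5: (101, 198),
    6: (155, 213),
    7: (29, 65),
}
ALWAYS_ON_IDS = {1, 2}


def total_resources(always_on_id, other_ids=()):
    """Add one Always ON use case to any number of the other use cases."""
    if always_on_id not in ALWAYS_ON_IDS:
        raise ValueError("always_on_id must be 1 or 2")
    ids = [always_on_id]
    for use_case_id in other_ids:
        if use_case_id not in USE_CASES or use_case_id in ALWAYS_ON_IDS:
            raise ValueError(f"not an add-on use case: {use_case_id}")
        ids.append(use_case_id)
    mips = sum(USE_CASES[i][0] for i in ids)
    sram_kb = sum(USE_CASES[i][1] for i in ids)
    return mips, sram_kb


# Always ON with ambient sound (2) plus streaming personalization (4).
print(total_resources(2, [4]))  # (155, 358)
```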

Finally, the table below shows the CPU load at each stage of automatic feature enablement on the Phase 3+ system. SRAM usage always remains the same as for the full system, 510 KB. The CPU load is higher than that of the corresponding subset of the system because dynamically deactivated modules still incur some overhead.

The first four rows are Always ON scenarios. Depending on the configuration set from the app, choose from the remaining five rows and add their CPU load to the applicable Always ON row. For example, with ambient sound listening and basic streaming listening enabled, the CPU load is 17 + 9 + 15 = 41 MHz in a quiet condition with no own voice, using the values in the table.

Table: Resource Usage with the Power Control Logic

Condition	Enabled Features	CPU Load Measured (MHz)
Quiet condition. No own voice	QSD	17
Quiet condition. Own voice detected	QSD, Sensory KWD detection	80
Noisy condition. No own voice	Own voice processing (BF, SCNR, fusion, etc.)	21
Noisy condition. Own voice detected	Own voice processing, Sensory KWD detection	80
Enabling ambient sound listening	EQs	+9
Enabling basic streaming listening	EQ, output limiter	+15
Enabling streaming listening with Mimi	EQ, output limiter, Mimi	+70¹
Enabling streaming listening with VisiSonics	EQ, output limiter, VisiSonics	+90¹
Enabling streaming listening with Mimi and VisiSonics	EQ, output limiter, Mimi, VisiSonics	+145¹

Note:

¹ Includes the basic streaming processing.
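The selection described above can be expressed as a lookup over the table values. The sketch assumes that at most one streaming option is active at a time, since the rows marked ¹ already include the basic streaming processing. All names in the code are hypothetical.

```python
# Always ON CPU load in MHz, keyed by (condition, own voice detected).
BASE_LOAD_MHZ = {
    ("quiet", False): 17,
    ("quiet", True): 80,
    ("noisy", False): 21,
    ("noisy", True): 80,
}
AMBIENT_MHZ = 9  # enabling ambient sound listening

# Streaming options are treated as mutually exclusive (assumption):
# the Mimi and VisiSonics figures already include basic streaming.
STREAMING_MHZ = {
    None: 0,
    "basic": 15,
    "mimi": 70,
    "visisonics": 90,
    "mimi+visisonics": 145,
}


def cpu_load_mhz(condition, own_voice, ambient=False, streaming=None):
    """Estimate CPU load by adding the selected rows of the table."""
    load = BASE_LOAD_MHZ[(condition, own_voice)]
    if ambient:
        load += AMBIENT_MHZ
    return load + STREAMING_MHZ[streaming]


print(cpu_load_mhz("quiet", False, ambient=True, streaming="basic"))  # 41
```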