SbDOAV2
Overview
Compute angular 2D direction of arrival of audio sources arriving at the mic array
Discussion
This module estimates the 2D angular direction of arrival (DOA) of audio sources reaching the mic array, working from multichannel frequency-domain data.
Input Pins:
Input Pin 1: Multichannel frequency-domain data, usually the output of a WOLA Analysis module.
The number of channels must match the number of microphones in the micGeometry argument.
Input Pin 2: Bit map that includes or excludes input bins from processing. This is usually set to all ones.
Its block size must equal the block size (number of complex values) of Pin 1.
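As an illustration of the Pin 2 bit map, the following Python sketch builds a mask with one integer per frequency bin. It assumes that 1 means "include" and 0 means "exclude" (suggested by the "all ones" default, but not stated explicitly here), and the helper name make_bin_mask and the bin count of 32 are hypothetical.

```python
def make_bin_mask(num_bins, excluded=()):
    """Return a list of num_bins ints: 1 = include the bin, 0 = exclude it.

    Assumption: one mask entry per complex value on Pin 1.
    """
    mask = [1] * num_bins
    for k in excluded:
        if not 0 <= k < num_bins:
            raise ValueError(f"bin index {k} is outside 0..{num_bins - 1}")
        mask[k] = 0
    return mask


# Hypothetical example: 32 bins arrive on Pin 1.
all_ones = make_bin_mask(32)                   # the usual setting
skip_low = make_bin_mask(32, excluded=(0, 1))  # drop the two lowest bins
```

Whatever bins are excluded, the mask length stays equal to the Pin 1 block size.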
Output Pins:
Output Pin 1: Estimated angular direction of arrival for this WOLA block, in degrees. The value is an integer between 0 and 360.
Output Pin 2: Confidence value for this angle estimate. The value is an integer; larger values mean more confidence.
Output Pin 3: Array of 360 floats forming a histogram of possible angles of arrival for this block. Output Pin 1 is the peak of this histogram.
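The relationship between Output Pin 3 and Output Pin 1 can be sketched in Python as a simple peak search. This is a minimal illustration, assuming that histogram index i corresponds to an angle of i degrees; the function name histogram_peak is hypothetical, and the module's own peak picking may differ.

```python
def histogram_peak(hist):
    """Return (angle_in_degrees, peak_height) for a 360-entry histogram.

    Assumption: hist[i] holds the count for an arrival angle of i degrees.
    """
    if len(hist) != 360:
        raise ValueError("expected a histogram with 360 entries")
    angle = max(range(360), key=lambda a: hist[a])
    return angle, hist[angle]


# Made-up histogram with a small bump at 90 degrees and a larger one at 270.
example_hist = [0.0] * 360
example_hist[90] = 3.0
example_hist[270] = 5.0
peak_angle, peak_height = histogram_peak(example_hist)
```

The returned height is only the histogram value at the peak; how the module derives its confidence value is not described here.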
Module Arguments:
micGeometry: Drop-down menu that selects the mic geometry in use
fftSize: FFTSize used by WOLA Analysis block preceding the SbDOAV2 module
Fs: Sampling rate of time domain data entering WOLA Analysis block
MicGeomSize: Dimension in meters of selected mic geometry. This has a different meaning for each mic geometry.
4_Mic_Square geometry: MicGeomSize specifies the side length of the square.
4_Mic_Linear geometry: MicGeomSize specifies the distance between mics.
4_Mic_Trillium geometry: MicGeomSize specifies the radius of the circle in which the triangle is inscribed.
4_Mics_of_6_7_Mic_Circle geometry: MicGeomSize specifies the radius of the circle in which the 4 mics are inscribed.
Equilateral_Triangle geometry: MicGeomSize specifies the side length of the triangle.
2_Mic geometry: MicGeomSize specifies the distance between mics.
7_Mic_Circle geometry: MicGeomSize specifies the radius of the circle.
6_Mic_Circle geometry: MicGeomSize specifies the radius of the circle.
Rectangular geometry: MicGeomSize specifies the length of the shorter side of the 2x1 rectangle.
LowerBin: Frequency bin index of lowest WOLA bin entering SbDOAV2 input pin
MultiSource: Allows reporting of multiple audio sources, and controls optimization level.
MultiSource = 1: Report up to 5 audio sources per WOLA block. Angles are reported in the range 0-360 degrees.
This setting is not optimized and produces large MIPS spikes. It is suitable only for native mode, not for embedded targets.
MultiSource = 1 overrides any setting of OutputAngles; reported angles are always in the range 0-360.
MultiSource = 0: Report only 1 audio source. This setting is optimized for low MIPS and is suitable for embedded and
native targets. The range of angles reported is controlled by the value of OutputAngles.
OutputAngles: Selects the range of angles reported: 0-360 or 180-360.
OutputAngles = 0: Report angles in the range 0-360.
OutputAngles = 1: Report angles in the range 180-360. This setting uses slightly fewer MIPS because a smaller
range of angles is reported. It is intended for applications that need only a 180-degree range
(e.g. a conference room soundbar mounted against a wall).
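The fftSize, Fs and LowerBin arguments are tied together by the frequency of each WOLA bin. The Python sketch below shows that arithmetic under the assumption that the WOLA Analysis bins follow the usual DFT layout (bin k centered at k * Fs / fftSize, with fftSize / 2 + 1 bins for real input). The function names are hypothetical, and the 300 Hz cutoff is only an example value.

```python
import math


def bin_to_hz(k, fs, fft_size):
    """Center frequency in Hz of bin k, assuming standard DFT bin spacing."""
    return k * fs / fft_size


def lower_bin_for(freq_hz, fs, fft_size):
    """Index of the first bin whose center is at or above freq_hz."""
    return math.ceil(freq_hz * fft_size / fs)


def num_wola_bins(fft_size):
    """Number of non-redundant bins for real input (DC through Nyquist)."""
    return fft_size // 2 + 1


FS = 48000.0    # default Fs from the properties table
FFT_SIZE = 512  # default fftSize from the properties table

spacing_hz = bin_to_hz(1, FS, FFT_SIZE)           # 93.75 Hz between bins
lower_bin = lower_bin_for(300.0, FS, FFT_SIZE)    # bin 4 for a 300 Hz cutoff
lower_bin_hz = bin_to_hz(lower_bin, FS, FFT_SIZE) # 375.0 Hz
total_bins = num_wola_bins(FFT_SIZE)              # 257
```

If only a subset of bins is routed into the module, LowerBin would be set to the index of the lowest bin in that subset so that the module can recover each bin's frequency.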
Inspector Tuning Variables:
BlocksPerHistogram: Each WOLA block yields an angle estimate for each bin. These estimates are accumulated over
BlocksPerHistogram WOLA blocks into a histogram, and the peak of the histogram is output as the angle estimate. Larger values of
BlocksPerHistogram give more accuracy but a slower response time. The value of BlocksPerHistogram does not affect MIPS.
DT1: Dynamic thresholding parameter. Controls the sensitivity of the algorithm to the global power of the input signal. DT1 specifies the
number of dB by which the average signal power for this block must exceed the global threshold for an angle to be reported. This
parameter can be used to avoid reporting angles for small background nuisance noises.
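The two tuning variables can be pictured with a simplified Python sketch. It is not the module's implementation: it omits smoothing and temporal averaging, and it assumes that the histogram is cleared after each output and that the DT1 comparison is "greater than or equal". The names AngleHistogram and passes_dt1 are hypothetical.

```python
class AngleHistogram:
    """Accumulate per-bin angle estimates over a fixed number of blocks."""

    def __init__(self, blocks_per_histogram):
        self.blocks_per_histogram = blocks_per_histogram
        self.counts = [0] * 360
        self.blocks_seen = 0

    def add_block(self, bin_angles):
        """Add one block of per-bin angles (integer degrees).

        Returns the peak angle once enough blocks have been seen,
        otherwise None.
        """
        for a in bin_angles:
            self.counts[a % 360] += 1
        self.blocks_seen += 1
        if self.blocks_seen < self.blocks_per_histogram:
            return None
        peak = max(range(360), key=lambda a: self.counts[a])
        # Assumption: start a fresh histogram after each output.
        self.counts = [0] * 360
        self.blocks_seen = 0
        return peak


def passes_dt1(block_power_db, global_threshold_db, dt1_db):
    """True if the block is at least dt1_db above the global threshold."""
    return block_power_db >= global_threshold_db + dt1_db


hist = AngleHistogram(blocks_per_histogram=3)
first = hist.add_block([10, 10, 200])    # still accumulating
second = hist.add_block([10, 200, 200])  # still accumulating
third = hist.add_block([200, 45])        # third block: peak is reported
```

A larger blocks_per_histogram collects more estimates before each decision, which is why accuracy improves while the response time grows.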
Type Definition
typedef struct _ModuleSbDOAV2
{
ModuleInstanceDescriptor instance; // Common Audio Weaver module instance structure
FLOAT32 Fs; // Sample rate of time domain data entering WOLA analysis block
INT32 fftSize; // FFT size used in computing WOLA block
INT32 micGeometry; // Specifies which mic geometry we are working with
INT32 MultiSource; // MultiSource = 1, multiple audio sources reported each block. = 0, one audio source reported
INT32 OutputAngles; // OutputAngles = 0, angles in range 0-360 reported, = 1 180-360 reported
INT32 LowerBin; // Frequency bin index of lowest frequency component in the WOLA block
FLOAT32 MicGeomSize; // Dimension in meters of the selected mic geometry (meaning depends on the geometry)
FLOAT32 DT1; // Dynamic threshold level
INT32 BlocksPerHistogram; // Number of blocks used to fill histogram before doing smoothing and averaging
INT32 boxcarDelay; // Half length of rectangular window for histogram smoothing
INT32 gaussDelay; // Half length of gaussian window for histogram smoothing
FLOAT32 gaussSigma; // Standard Deviation for Gaussian window
INT32 NumHistogramStates; // Number of full histograms used for temporal averaging
INT32 numStates; // Number of states in state buffer
INT32 Opt; // Optimization level
INT32 numMics; // Number of microphones for this mic geometry
INT32 maxDelay; // Maximum of gaussDelay, boxcarDelay
INT32 HistStatePtr; // Index of oldest entry in histogram buffer
INT32 XStatePtr; // Index of oldest entry in state buffer
INT32 blockCtr; // Counter to keep track of number of blocks used to update histogram
INT32 blockCtrReset; // Counter to keep track of number of blocks used between reset to zero of Rxx and avgHistogram
INT32 aboveThresh; // Input signal dBA greater than noise floor dBA. Changes once per BlocksPerHistogram blocks
INT32 doHistSmoothing; // Specifies whether to do smoothing of histogram
INT32 doHistAveraging; // Specifies whether to do temporal smoothing of histogram
INT32 doNFTracking; // Specifies whether to do noise floor tracking
INT32 doOnsetTracking; // Specifies whether to do onset tracking
FLOAT32 speedSound; // Speed of sound in m/s
FLOAT32 Test2Thresh; // Threshold for passing square mic array steering vector test 2
FLOAT32 Test1Thresh; // Threshold for passing square mic array steering vector test 1
FLOAT32 CTFact; // Coherence Test Factor, scale factor specifying how much larger dominant eigenvalue must be than the sum of all the other eigenvalues
FLOAT32 MinPeakThresh; // Minimum height of a histogram peak to be eligible to be reported as a DOA
INT32 MinDist; // Distance in integer units of degrees, between peaks in histogram to count as a distinct DOA
INT32 BlocksPerReset; // Number of blocks between resets to zero of Rxx and avgHistogram
FLOAT32 NFUp; // Noise floor ramp up coefficient
FLOAT32 NFUpSlow; // Noise floor ramp up slow coefficient
FLOAT32 NFDown; // Noise floor ramp down coefficient
FLOAT32 NFMin; // Minimum allowable value for noise floor
FLOAT32 NFSNR; // Noise Floor SNR
INT32 NFFrameCount; // Max number of frames to ramp up fast
FLOAT32 OSUp; // Onset ramp up coefficient
FLOAT32 OSDown; // Onset ramp down coefficient
FLOAT32 OSJump; // Onset jump coefficient
INT32 OSDur; // Number of frames for onset decay
FLOAT32* Rxx; // Holds current estimate of Rxx
FLOAT32* RxxTemp; // Temporary buffer needed for squaring Rxx
FLOAT32* RxxTemp2; // Temporary buffer needed for squaring Rxx
FLOAT32* XState; // Holds numStates sets of numMics input data
FLOAT32* HistState; // Holds NumHistogramStates histograms
FLOAT32* TempReal; // Scratch array of numSB real values
FLOAT32* noise_floor; // Noise floor per bin, array of numSB real values
INT32* sig_count_bw; // Frame count for fast ramp up per bin, array of numSB real values
FLOAT32* onset_threshold; // Onset threshold per bin, array of numSB real values
INT32* count_onset; // Onset count per bin, array of numSB real values
FLOAT32* xMagAvg; // Average magnitude per bin, array of numSB real values
FLOAT32* NFAvg; // Noise floor average per bin, array of numSB real values
FLOAT32* AWeights; // A weighting values
FLOAT32* TempComplex; // Scratch array of numSB complex values
FLOAT32* EvalEst; // Scratch array to hold estimates of dominant eigenvalues
FLOAT32* Trace; // Scratch array to hold trace of Rxx for each subband
INT32* freqBinMap; // Scratch array to hold mapping between valid bins and discrete frequency indices
FLOAT32* avgHistogram; // Scratch array to hold average histogram
FLOAT32* currentHistogram; // Scratch array to hold current histogram
FLOAT32* smoothedHistogram; // Scratch array to hold smoothed histogram
FLOAT32* histExt; // Scratch array to hold smoothed histogram
FLOAT32* histCnv; // Scratch array to hold smoothed histogram
INT32* histSortedAngles; // Scratch array for histogram peak picking
FLOAT32* histSortedValues; // Scratch array for histogram peak picking
INT32* histAnglesUsed; // Scratch array for histogram peak picking
FLOAT32* valuesOutState; // Scratch array to hold output histogram counts
INT32* anglesOutState; // Scratch array to hold output angles
FLOAT32* GaussianWindow; // Gaussian window for smoothing histogram
FLOAT32* BoxcarWindow; // Rectangular window for smoothing histogram
} ModuleSbDOAV2Class;
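The gaussDelay, gaussSigma and GaussianWindow fields describe a Gaussian window used to smooth the 360-entry histogram. The Python sketch below shows one way such smoothing could work. It assumes that the window has 2 * gaussDelay + 1 taps, is normalized to unit sum, and is applied circularly so that 359 degrees and 0 degrees are neighbors. None of these details are confirmed by the structure definition, and both function names are hypothetical.

```python
import math


def gaussian_window(half_len, sigma):
    """Gaussian window with 2 * half_len + 1 taps, normalized to sum to 1."""
    taps = [math.exp(-0.5 * (n / sigma) ** 2)
            for n in range(-half_len, half_len + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth_circular(hist, window):
    """Smooth hist with window, wrapping around the ends of the histogram."""
    half = len(window) // 2
    n = len(hist)
    return [
        sum(window[j] * hist[(i + j - half) % n] for j in range(len(window)))
        for i in range(n)
    ]


# Defaults from the properties table: gaussDelay = 8, gaussSigma = 3.2.
window = gaussian_window(8, 3.2)  # 17 taps

# A single spike at 0 degrees spreads equally toward 1 and 359 degrees.
spike = [0.0] * 360
spike[0] = 1.0
smoothed = smooth_circular(spike, window)
```

The wrap-around matters for sources near 0 degrees: without it, a peak straddling the 359/0 boundary would be split and underestimated.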
Variables
Properties
Name | Type | Usage | isHidden | Default value | Range | Units |
Fs | float | const | 0 | 48000 | Unrestricted | |
fftSize | int | const | 0 | 512 | Unrestricted | |
micGeometry | int | const | 0 | 0 | Unrestricted | |
MultiSource | int | const | 0 | 0 | Unrestricted | |
OutputAngles | int | const | 0 | 0 | Unrestricted | |
LowerBin | int | const | 0 | 0 | Unrestricted | |
MicGeomSize | float | const | 0 | 0.04 | Unrestricted | |
DT1 | float | parameter | 0 | 3 | -5000:5000 | |
BlocksPerHistogram | int | parameter | 0 | 32 | 1:200 | |
boxcarDelay | int | const | 1 | 2 | Unrestricted | |
gaussDelay | int | const | 1 | 8 | Unrestricted | |
gaussSigma | float | const | 1 | 3.2 | Unrestricted | |
NumHistogramStates | int | const | 1 | 4 | Unrestricted | |
numStates | int | const | 1 | 4 | Unrestricted | |
Opt | int | const | 1 | 1 | 0:2 | |
numMics | int | const | 1 | 4 | Unrestricted | |
maxDelay | int | const | 1 | 8 | Unrestricted | |
HistStatePtr | int | state | 1 | 0 | Unrestricted | |
XStatePtr | int | state | 1 | 0 | Unrestricted | |
blockCtr | int | state | 1 | 0 | Unrestricted | |
blockCtrReset | int | state | 1 | 0 | Unrestricted | |
aboveThresh | int | state | 1 | 0 | Unrestricted | |
doHistSmoothing | int | parameter | 1 | 1 | Unrestricted | |
doHistAveraging | int | parameter | 1 | 1 | Unrestricted | |
doNFTracking | int | parameter | 1 | 0 | Unrestricted | |
doOnsetTracking | int | parameter | 1 | 0 | Unrestricted | |
speedSound | float | parameter | 1 | 343 | 300:400 | metersPerSecond |
Test2Thresh | float | parameter | 1 | 0.2 | 0:2 | |
Test1Thresh | float | parameter | 1 | 0.2 | 0:2 | |
CTFact | float | parameter | 1 | 10 | -1:1e+30 | |
MinPeakThresh | float | parameter | 1 | 1 | -1:1000000 | |
MinDist | int | parameter | 1 | 10 | 0:360 | degrees |
BlocksPerReset | int | parameter | 1 | 100 | 100:100000 | |
NFUp | float | parameter | 1 | 1.01 | 0:2 | |
NFUpSlow | float | parameter | 1 | 1.001 | 0:2 | |
NFDown | float | parameter | 1 | 0.99 | 0:1 | |
NFMin | float | parameter | 1 | 0.0001 | 0:1 | |
NFSNR | float | parameter | 1 | 2 | 0:10 | |
NFFrameCount | int | parameter | 1 | 3 | 0:100 | |
OSUp | float | parameter | 1 | 1.3 | 0:10 | |
OSDown | float | parameter | 1 | 0.95 | 0:1 | |
OSJump | float | parameter | 1 | 1 | 0:10 | |
OSDur | int | parameter | 1 | 2 | 0:25 | |
Rxx | float* | state | 0 | [1024 x 1] | Unrestricted | |
RxxTemp | float* | state | 0 | [1024 x 1] | Unrestricted | |
RxxTemp2 | float* | state | 0 | [1024 x 1] | Unrestricted | |
XState | float* | state | 0 | [1024 x 1] | Unrestricted | |
HistState | float* | state | 0 | [1440 x 1] | Unrestricted | |
TempReal | float* | state | 0 | [32 x 1] | Unrestricted | |
noise_floor | float* | state | 0 | [32 x 1] | Unrestricted | |
sig_count_bw | int* | state | 0 | [32 x 1] | Unrestricted | |
onset_threshold | float* | state | 0 | [32 x 1] | Unrestricted | |
count_onset | int* | state | 0 | [32 x 1] | Unrestricted | |
xMagAvg | float* | state | 0 | [32 x 1] | Unrestricted | |
NFAvg | float* | state | 0 | [32 x 1] | Unrestricted | |
AWeights | float* | parameter | 0 | [1 x 257] | Unrestricted | |
TempComplex | float* | state | 0 | [64 x 1] | Unrestricted | |
EvalEst | float* | state | 0 | [64 x 1] | Unrestricted | |
Trace | float* | state | 0 | [64 x 1] | Unrestricted | |
freqBinMap | int* | state | 0 | [32 x 1] | Unrestricted | |
avgHistogram | float* | state | 0 | [360 x 1] | Unrestricted | |
currentHistogram | float* | state | 0 | [360 x 1] | Unrestricted | |
smoothedHistogram | float* | state | 0 | [360 x 1] | Unrestricted | |
histExt | float* | state | 0 | [376 x 1] | Unrestricted | |
histCnv | float* | state | 0 | [392 x 1] | Unrestricted | |
histSortedAngles | int* | state | 0 | [360 x 1] | Unrestricted | |
histSortedValues | float* | state | 0 | [360 x 1] | Unrestricted | |
histAnglesUsed | int* | state | 0 | [360 x 1] | Unrestricted | |
valuesOutState | float* | state | 0 | [10 x 1] | Unrestricted | |
anglesOutState | int* | state | 0 | [10 x 1] | Unrestricted | |
GaussianWindow | float* | parameter | 0 | [17 x 1] | Unrestricted | |
BoxcarWindow | float* | parameter | 0 | [5 x 1] | Unrestricted | |
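Several of the default array sizes in the table above appear to follow from the scalar defaults. The Python sketch below records those relations. They are inferred from the numbers alone and are not confirmed by this page, and the subband count of 32 (num_sb) is itself read off the per-bin arrays rather than documented.

```python
# Scalar defaults copied from the properties table.
fft_size = 512
num_mics = 4
num_states = 4
num_histogram_states = 4
gauss_delay = 8
boxcar_delay = 2
max_delay = 8

num_angles = 360  # length of avgHistogram, currentHistogram, ...
num_sb = 32       # inferred from TempReal, noise_floor, freqBinMap, ...

gaussian_taps = 2 * gauss_delay + 1
hist_ext = num_angles + 2 * max_delay

# Inferred relations (assumptions, not documented formulas).
inferred_sizes = {
    "HistState": num_angles * num_histogram_states,
    "histExt": hist_ext,                       # histogram padded on both ends
    "histCnv": hist_ext + gaussian_taps - 1,   # full convolution length
    "GaussianWindow": gaussian_taps,
    "BoxcarWindow": 2 * boxcar_delay + 1,
    "AWeights": fft_size // 2 + 1,             # one weight per FFT bin
    "TempComplex": 2 * num_sb,                 # real and imaginary parts
    "Rxx": 2 * num_sb * num_mics ** 2,         # complex matrix per subband
    "XState": 2 * num_sb * num_mics * num_states,
}

# Sizes as listed in the table, for comparison.
table_sizes = {
    "HistState": 1440, "histExt": 376, "histCnv": 392,
    "GaussianWindow": 17, "BoxcarWindow": 5, "AWeights": 257,
    "TempComplex": 64, "Rxx": 1024, "XState": 1024,
}
mismatches = {k for k in table_sizes if inferred_sizes[k] != table_sizes[k]}
```

With the defaults shown, every inferred size matches the table, which may help when estimating memory for a different fftSize or mic geometry.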
Pins
Input Pins
Name: Audio
Description: audio input
Data type: float
Channel range: 4
Block size range: Unrestricted
Sample rate range: Unrestricted
Complex support: Complex
Name: BitMap
Description: bit map
Data type: int
Channel range: 1
Block size range: Unrestricted
Sample rate range: Unrestricted
Complex support: Real
Output Pins
Name: Angle
Description: DOA Angle
Data type: float
Name: ConfidenceValue
Description: DOA Confidence Value
Data type: float
Name: Histogram
Description: Histogram Data
Data type: float
MATLAB Usage
File Name: sb_doa_v2_module.m
M=sb_doa_v2_module(NAME, MICGEOM, FFTSIZE, FS, MICGEOMSIZE, LOWERBIN, MULTISOURCE, OUTPUTANGLES)
SbDOAV2 module identifies the 2D angular direction of an audio source received at the mic array.
The module takes multichannel frequency domain data as input, (usually the output of a WOLA
Analysis module). The module produces 3 outputs per block:
1.) Angle = estimated angular direction of arrival in degrees. This will be an integer between 0 and 360.
2.) Confidence value for this angle estimate. Larger values mean more confidence in the angle estimate.
3.) Histogram = array of 360 floats denoting a histogram of possible angles of arrival for this block.
Arguments:
NAME - Name of the module.
MICGEOM - Specifies the mic geometry of the multichannel input data
FFTSIZE - FFTSize used by WOLA Analysis block preceding the SbDOAV2 module
FS - Sampling rate of time domain data entering WOLA Analysis block
MICGEOMSIZE - Dimension in meters of selected mic geometry
LOWERBIN - Frequency bin index of lowest WOLA bin entering SbDOAV2 input pin
MULTISOURCE - MULTISOURCE = 1 ---> Report multiple audio sources per block (MIPS intensive)
- MULTISOURCE = 0 ---> Report one audio source per block (low MIPS)
OUTPUTANGLES - OUTPUTANGLES = 0 ---> Report angles from 0 to 360
- OUTPUTANGLES = 1 ---> Report angles from 180 to 360