Patent 2783913 Summary

(12) Patent:	(11) CA 2783913
(54) English Title:	OFF-AXIS AUDIO SUPPRESSION IN AN AUTOMOBILE CABIN
(54) French Title:	SUPPRESSION DES SONS HORS AXE DANS UN HABITACLE D'AUTOMOBILE
Status:	Granted

Bibliographic Data

(51) International Patent Classification (IPC):	H04R 3/00 (2006.01) B60R 11/02 (2006.01) G10L 21/0264 (2013.01)
(72) Inventors :	FALLAT, MARK RYAN (Canada) HETHERINGTON, PHILLIP ALAN (Canada) PERCY, MICHAEL ANDREW (Canada)
(73) Owners :	BLACKBERRY LIMITED (Canada)
(71) Applicants :	QNX SOFTWARE SYSTEMS LIMITED (Canada)
(74) Agent:	GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued:	2016-01-26
(22) Filed Date:	2012-07-30
(41) Open to Public Inspection:	2013-01-29
Examination requested:	2012-07-30
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:

Application No.	Country/Territory	Date
11175926.2	European Patent Office (EPO)	2011-07-29

Abstracts

English Abstract

The suppression of off-axis audio in an audio environment is provided. Off-axis audio may be considered audio that does not originate from a region of interest. The off-axis audio is suppressed by comparing a phase difference between signals from two microphones to a target slope of the phase difference between signals originating from the region of interest. The target slope can be adapted to allow the region of interest to move with the location of a human speaker such as a driver.

French Abstract

La suppression d'un signal audio hors axe dans un environnement audio est fournie. Un signal audio hors axe peut être considéré comme un signal audio ne provenant pas d'une région d'intérêt. Le signal audio hors axe est supprimé en comparant une différence de phase entre les signaux provenant de deux microphones à une pente cible de la différence de phase entre les signaux provenant de la région d'intérêt. La pente cible peut être adaptée pour permettre à la région d'intérêt de se déplacer selon l'emplacement d'un locuteur humain, comme un conducteur.

Claims

Note: Claims are shown in the official language in which they were submitted.

WHAT IS CLAIMED:

1. A method of off-axis audio suppression in an audio environment comprising:
receiving first and second audio signals from first and second microphones
positioned within the audio environment;
calculating a phase difference between the first and second audio signals;
adjusting a target slope based on the calculated phase difference between the
first
and second audio signals to adapt the region of interest based on a location
of a
human speaker within the audio environment, the target slope defining a
desired
phase difference between signals from the first and second microphones
corresponding to audio originating from a region of interest, and where
adjusting
the target slope includes:
unwrapping the calculated phase difference;
calculating a slope of the unwrapped phase difference;
calculating a difference between the slope and the target slope;
determining if the calculated difference is larger than a defined tolerance;
adjusting the target slope based on the slope of the unwrapped phase
difference;
calculating a direction error between the calculated phase difference and the
target
slope; and
processing the first and second audio signals based on the calculated
direction
error to suppress off-axis audio relative to the positions of the first and
second
microphones and the region of interest.
2. The method of claim 1, further comprising:
calculating a slope confidence value when unwrapping the calculated phase
difference, the slope confidence value determined as a sum of a signal-to-noise
ratio of each of a plurality of frequency ranges of the calculated phase
difference; and
adjusting the target slope based on the slope of the unwrapped phase
difference
and the slope confidence value.

3. The method of claims 1 or 2, further comprising:
smoothing the calculated slope;
determining if an initial value for the target slope has been set;
determining if the smoothed slope has been stable for a time interval;
determining if the smoothed slope is in a desired direction based on the sign
of the
smoothed slope and the location of the human speaker in the audio
environment;
determining if the first and second audio signals correspond to voice audio;
and
setting an initial value for the target slope based on the smoothed slope when
the
initial value has not been set, the smoothed slope has been stable for the
time
interval, the smoothed slope is in the desired direction and the first and
second
audio signals correspond to voice audio.
4. The method of claims 1 or 2, further comprising:
smoothing the calculated slope;
determining if the smoothed slope has been stable for a time interval;
determining if the smoothed slope is in a desired direction based on the sign
of the
smoothed slope and the location of the human speaker in the audio
environment;
determining if the first and second audio signals correspond to voice audio;
adjusting the target slope based on the slope of the unwrapped phase
difference
using a leaky integrator when the smoothed slope has been stable for the time
interval, the smoothed slope is in the desired direction and the first and
second
audio signals correspond to voice audio, and
keeping the target slope unchanged when the smoothed slope has been stable for

the time interval or the smoothed slope is not in the desired direction or the
first
and second audio signals do not correspond to voice audio.
5. The method of any one of claims 1 to 4, wherein unwrapping comprises:
calculating a moving average of the phase difference;
locating zero-crossings of the moving average;
confirming the zero crossing actually represent direction changes; and
unwrapping the phase difference based on the located confirmed zero-crossings
and the direction of a low-frequency phase difference.
6. The method of any one of claims 1 to 5, wherein unwrapping comprises:
determining if a difference between the phase difference and the target slope
is
greater than pi or less than -pi;
subtracting n*pi from the phase difference when the difference between the
phase
difference and the target slope is greater than pi,
where:
target slope + pi > phase difference - n*pi > target slope - pi; and
adding m*pi to the phase difference when the difference between the phase
difference and the target slope is less than -pi,
where:
target slope + pi > phase difference + m*pi > target slope - pi.
7. The method of any one of claims 1 to 6, wherein the first and second audio
signals
are frequency domain representations of a frame of audio received at the
corresponding microphone over a time interval, and wherein the method is
repeated
for subsequent frames of audio.
8. The method of claim 7, wherein processing the first and second audio
signals
comprises:
determining if the direction error is less than an on-axis threshold,
indicating that the
frame of audio represented by the first and second audio signals corresponds
to
voice audio originating from the region of interest; and
combining the first and second audio signals to enhance the frame of audio
when
the direction error is less than the on-axis threshold.
9. The method of claim 7, wherein processing the first and second audio
signals
comprises:
determining if the direction error is greater than an off-axis threshold,
indicating that
the frame of audio represented by the first and second audio signals
corresponds to noise audio or to voice audio originating from outside the
region
of interest; and
combining the first and second audio signals to suppress the frame of audio
when
the direction error is greater than the off-axis threshold.
10. The method of claim 7, wherein processing the first and second audio
signals
comprises:
determining if the direction error is between an off-axis threshold and an on-axis
threshold, indicating that the frame of audio represented by the first and
second
audio signals corresponds to a combination of voice audio originating from the

region of interest and noise audio or to voice audio originating from outside
the
region of interest;
calculating a mixing mask as a function of frequency; and
combining the first and second audio signals using the mixing mask when the
direction error is between the off-axis threshold and the on-axis
threshold.
11. An apparatus for performing off-axis audio suppression in an audio
environment
comprising:
a processor and memory configuring the apparatus to provide:
a target slope stored in memory defining a desired phase difference between
signals from first and second microphones corresponding to audio
originating from a region of interest;
a target adaptation component adjusting the target slope based on the
calculated phase difference between the first and second audio signals to
adapt the region of interest based on a location of a human speaker within
the audio environment;
a source-locating component calculating a direction error between the target
slope and a phase difference between first and second audio signals
received from the first and second microphones; and
an audio mixer processing the first and second audio signals based on the
calculated direction error to suppress off-axis audio relative to the
positions
of the first and second microphones and the region of interest;
wherein the target adaptation component unwraps the calculated phase
difference, calculates a slope of the unwrapped phase difference, calculates a

difference between the slope and the target slope, determines if the
calculated
difference is larger than a defined tolerance, and adjusts the target slope
based
on the slope of the unwrapped phase difference.
12. The apparatus of claim 11, further comprising:
calculating a slope confidence value when unwrapping the calculated phase
difference, the slope confidence value determined as a sum of a signal-to-noise
ratio of each of a plurality of frequency ranges in the calculated phase
difference; and
adjusting the target slope based on the slope of the unwrapped phase
difference
and the slope confidence value.
13. The apparatus of claims 11 or 12, wherein the target adaptation component
further
sets an initial value for the target slope by:
smoothing the calculated slope;
determining if an initial value for the target slope has been set;
determining if the smoothed slope has been stable for a time interval;
determining if the smoothed slope is in a desired direction based on the sign
of the
smoothed slope and the location of the human speaker in the audio
environment;
determining if the first and second audio signals correspond to voice audio;
and
setting the initial value for the target slope based on the smoothed slope
when the
initial value has not been set, the smoothed slope has been stable for the
time
interval, the smoothed slope is in the desired direction and the first and
second
audio signals correspond to voice audio.
14. The apparatus of claims 11 or 12, wherein the target adaptation component
adjusts
the target slope by:
smoothing the calculated slope;
determining if the smoothed slope has been stable for a time interval;
determining if the smoothed slope is in a desired direction based on the sign
of the
smoothed slope and the location of the human speaker in the audio
environment;
determining if the first and second audio signals correspond to voice audio;
adjusting the target slope based on the slope of the unwrapped phase
difference
using a leaky integrator when the smoothed slope has been stable for the time
interval, the smoothed slope is in the desired direction and the first and
second
audio signals correspond to voice audio; and
keeping the target slope unchanged when the smoothed slope has not been stable

for the time interval or the smoothed slope is not in the desired direction or
the
first and second audio signals do not correspond to voice audio.
15. The apparatus of any one of claims 11 to 14, wherein unwrapping comprises:

calculating a moving average of the phase difference;
locating zero-crossings of the moving average;
confirming the zero crossing actually represent direction changes; and
unwrapping the phase difference based on the located confirmed zero-crossings
and a direction of the low-frequency phase difference.
16. The apparatus of any one of claims 11 to 15, wherein unwrapping comprises:

determining if a difference between the phase difference and the target slope
is
greater than pi or less than -pi;
subtracting n*pi from the phase difference when the difference between the
phase
difference and the target slope is greater than pi,
where:
target slope + pi > phase difference - n*pi > target slope - pi; and

adding m*pi to the phase difference when the difference between the phase
difference and the target slope is less than -pi,
where:
target slope + pi > phase difference + m*pi > target slope - pi.
17. The apparatus of any one of claims 11 to 16, further comprising a signal
processing
component to convert the first and second audio signals to frequency domain
representations of a frame of audio received at the corresponding microphone
over a
time interval.
18. The apparatus of claim 17, wherein the audio mixer determines if the
direction error
is less than an on-axis threshold, indicating that the frame of audio
represented by
the first and second audio signals corresponds to voice audio originating from
the
region of interest and combines the first and second audio signals to enhance
the
frame of audio when the direction error is less than the on-axis threshold.
19. The apparatus of claim 17, wherein the audio mixer determines if the
direction error
is greater than an off-axis threshold, indicating that the frame of audio
represented by
the first and second audio signals corresponds to noise audio or to voice
audio
originating from outside the region of interest and combines the first and
second
audio signals to suppress the frame of audio when the direction error is
greater than
the off-axis threshold.
20. The apparatus of claim 17, wherein the audio mixer determines if the
direction error
is between an off-axis threshold and an on-axis threshold, indicating that the
frame of
audio represented by the first and second audio signals corresponds to a
combination of voice audio originating from the region of interest and noise
audio or
to voice audio originating from outside the region of interest and combines
the first
and second audio signals using a mixing mask calculated as a function of
frequency
when the direction error is between the off-axis threshold and the on-axis
threshold.
21. A computer readable non-transitory memory containing instructions which
when
executed by a processor perform a method of off-axis audio suppression in an
audio
environment comprising:
receiving first and second audio signals from first and second microphones
positioned within the audio environment;
calculating a phase difference between the first and second audio signals;
adjusting a target slope based on the calculated phase difference between the
first
and second audio signals to adapt the region of interest based on a location
of a
human speaker within the audio environment, the target slope defining a
desired
phase difference between signals from the first and second microphones
corresponding to audio originating from a region of interest, and where
adjusting
the target slope includes:
unwrapping the calculated phase difference;
calculating a slope of the unwrapped phase difference;
calculating a difference between the slope and the target slope;
determining if the calculated difference is larger than a defined tolerance;
adjusting the target slope based on the unwrapped phase difference;
calculating a direction error between the calculated phase difference and the
target
slope; and
processing the first and second audio signals based on the calculated
direction
error to suppress off-axis audio relative to the positions of the first and
second
microphones and the region of interest.

Description

Note: Descriptions are shown in the official language in which they were submitted.

CA 02783913 2012-07-30
OFF-AXIS AUDIO SUPPRESSION IN AN AUTOMOBILE CABIN
TECHNICAL FIELD
The current application relates to processing of audio in an audio
environment,
and in particular to the suppression of audio that is off-axis from a desired
direction.
BACKGROUND
Automobiles increasingly incorporate electronic devices into the cabin. These
electronic devices may include for example mobile devices, navigation systems,
control
systems, and/or audio/video systems. It is desirable to allow interaction with
these
devices using voice commands in order to allow a driver to focus on driving
the
automobile.
To allow interaction with and control of electronics through voice commands
captured from an audio environment such as an automobile cabin, it is
necessary to
process audio signals in order to identify desired voice commands. Voice
recognition is
used to translate received audio into a voice command, which can then be
executed to
interact with or control the electronics of the automobile or devices
connected thereto.
However, in an automobile environment it can be difficult to isolate audio
associated
with a human speaker from other noise present in the cabin or external to the
cabin.
Additional audio that may make voice recognition difficult may include, for
example,
conversations from other occupants, road noise, wind noise, windshield washer
noises,
turn signals, etc.
Attempts to enhance audio corresponding to a specific occupant and suppress
audio associated with noise have been limited in success. Some attempts use a
fixed
array of microphones to determine the location of an audio signal. In
particular, these
attempted solutions have used a phase difference between signals of individual
microphones of the microphone array.
Often these solutions require that the
microphones in the microphone array be positioned in a specific location, with
a
predetermined separation between microphones. This places an undesirable
restriction
on automobile manufacturers when designing an automobile's interior cabin.
CA 02783913 2014-12-02
It would be desirable to be able to suppress off-axis audio in an audio
environment while allowing flexibility in the position of microphones.
SUMMARY
In accordance with the present disclosure there is provided a method of off-axis
audio suppression in an audio environment comprising: receiving first and
second audio
signals from first and second microphones positioned within the audio
environment;
calculating a phase difference between the first and second audio signals;
adjusting a
target slope based on the calculated phase difference between the first and
second
audio signals to adapt the region of interest based on a location of a human
speaker
within the audio environment, the target slope defining a desired phase
difference
between signals from the first and second microphones corresponding to audio
originating from a region of interest, and where adjusting the target slope
includes:
unwrapping the calculated phase difference; calculating a slope of the
unwrapped
phase difference; calculating a difference between the slope and the target
slope;
determining if the calculated difference is larger than a defined tolerance;
adjusting the
target slope based on the slope of the unwrapped phase difference; calculating
a
direction error between the calculated phase difference and the target slope;
and
processing the first and second audio signals based on the calculated
direction error to
suppress off-axis audio relative to the positions of the first and second
microphones and
the region of interest.
In accordance with the present disclosure there is further provided an
apparatus
for performing off-axis audio suppression in an audio environment comprising:
a
processor and memory configuring the apparatus to provide: a target slope
stored in
memory defining a desired phase difference between signals from first and
second
microphones corresponding to audio originating from a region of interest; a
target
adaptation component adjusting the target slope based on the calculated phase
difference between the first and second audio signals to adapt the region of
interest
based on a location of a human speaker within the audio environment; a source-locating
component calculating a direction error between the target slope and a phase
difference
between first and second audio signals received from the first and second
microphones;
and an audio mixer processing the first and second audio signals based on the
calculated direction error to suppress off-axis audio relative to the
positions of the first
and second microphones and the region of interest; wherein the target
adaptation
component unwraps the calculated phase difference, calculates a slope of the
unwrapped phase difference, calculates a difference between the slope and the
target
slope, determines if the calculated difference is larger than a defined
tolerance, and
adjusts the target slope based on the slope of the unwrapped phase difference.
In accordance with the present disclosure there is further provided a computer

readable non-transitory memory containing instructions which when executed by
a
processor perform a method of off-axis audio suppression in an audio
environment
comprising: receiving first and second audio signals from first and second
microphones
positioned within the audio environment; calculating a phase difference
between the first
and second audio signals; adjusting a target slope based on the calculated
phase
difference between the first and second audio signals to adapt the region of
interest
based on a location of a human speaker within the audio environment, the
target slope
defining a desired phase difference between signals from the first and second
microphones corresponding to audio originating from a region of interest, and
where
adjusting the target slope includes: unwrapping the calculated phase
difference;
calculating a slope of the unwrapped phase difference; calculating a
difference between
the slope and the target slope; determining if the calculated difference is
larger than a
defined tolerance; adjusting the target slope based on the unwrapped phase
difference;
calculating a direction error between the calculated phase difference and the
target
slope; and processing the first and second audio signals based on the
calculated
direction error to suppress off-axis audio relative to the positions of the
first and second
microphones and the region of interest.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are described herein with references to the appended drawings, in
which:
Figure 1 depicts in a diagram an illustrative environment in which off-axis
audio
suppression may be used;
Figure 2 depicts in a flow diagram an illustrative method of off-axis audio
suppression;
Figure 3 depicts in a flow diagram an illustrative method of adapting a target

slope;
Figure 4 depicts in a flow diagram an illustrative method of unwrapping a
phase
difference;
Figure 5 depicts in a flow diagram a further illustrative method of adapting a

target slope;
Figure 6 depicts in a flow diagram a further illustrative method of off-axis
audio
suppression;
Figure 7 depicts in a flow diagram an illustrative method of processing audio;
and
Figure 8 depicts in a block diagram illustrative components of a system for
suppressing off-axis audio.
DETAILED DESCRIPTION
It will be appreciated that for simplicity and clarity of illustration, where
considered appropriate, reference numerals may be repeated among the figures
to
indicate corresponding or analogous elements. In addition, numerous specific
details
are set forth in order to provide a thorough understanding of the embodiments
described
herein. However, it will be understood by those of ordinary skill in the art
that the
embodiments described herein may be practiced without these specific details.
In other
instances, well-known methods, procedures and components have not been
described
in detail so as not to obscure the embodiments described herein. Also, the
description is
not to be considered as limiting the scope of the embodiments described
herein.
Off-axis audio suppression is described in detail with regard to Figures 1 to 8.
The off-axis audio suppression is described as being applied in an automobile
cabin to
improve the audio signal used to perform voice recognition to identify
commands
provided by the driver of the automobile. As described later, it is also
contemplated that
the off-axis audio suppression may also be used to improve the audio quality
of hands-free phone conversations, as well as improve the audio signal from automobile
occupants other than the driver. Further, although described with regards to
an
automobile cabin, it is contemplated that the off-axis audio suppression may
be used in
other audio environments.
Figure 1 depicts in a diagram an illustrative environment in which off-axis
audio
suppression may be used. As depicted, an automobile 102 includes a cabin 104
in
which a driver 106 and passengers 108a, 108b, 108c (referred to collectively
as
passengers 108) sit. It will be appreciated that the passengers do not need to
be
present in the cabin 104. A plurality of microphones 110, 112 are positioned
within the cabin
to pick up sound within the cabin 104. Although two microphones are described
herein,
it is contemplated that more microphones could be positioned within the cabin
104.
Voice recognition is typically activated by the driver pressing a button, for
example on the steering wheel, although other arrangements are possible. Once
the
voice recognition is activated, audio signals captured from the microphones
110, 112
are processed to identify an associated command. For example, commands may
include "Call home", "Play album", "Get directions", etc. Once the captured
audio is
processed and the associated command identified, it can be executed by an
appropriate
system or component of the automobile.
In the environment of Figure 1, the voice recognition processing may be
impeded
by additional audio other than the driver's spoken command.
For example,
conversations between the passengers may make identifying a desired command
associated with the driver's spoken command difficult. In order to enhance
audio
associated with the driver's spoken command and suppress the additional audio,

conceptually a region of interest is associated with the driver 106 and an
axis
determined from the region of interest to the microphones 110, 112. The axis
may be
represented by a slope of a phase difference between audio received at two
spaced
apart microphones. Audio that is determined to originate from a source off-axis to the
region of interest is suppressed. By suppressing the off-axis audio, an
improved audio
signal can be provided to the voice recognition system, improving the chances
of
correctly identifying a spoken command.
In order to suppress off-axis audio, a phase difference between the audio
signals
captured by the two microphones is compared to a target slope. The audio
signals from
each microphone are converted into a frequency domain representation that
includes
phase information associated with discrete frequency ranges or bins. The phase

difference between the two signals is determined as the difference between the
phase
information for each corresponding frequency range or bin of the frequency
domain
audio signals. The target slope defines a desired phase difference between
signals
from the first and second microphones corresponding to audio originating from
the
region of interest. The phase difference between two signals may be described
by a
slope since the expected phase difference for an audio signal will not
necessarily be
constant across all frequencies, but will be a slope linearly increasing or
decreasing
from 0 at 0 Hz. The actual phase difference is compared to the desired phase
difference corresponding to audio originating from the region of interest. The
region of
interest defined by the target slope is adaptively updated in order to
correspond to an
actual location of the driver giving the spoken command. Adaptively adjusting
the region
of interest defined by the slope allows the driver to move freely while still
maintaining
suppression of additional audio not associated with the driver's spoken
command.
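As a non-limiting illustration, the per-bin phase difference described above may be sketched as follows. The sketch assumes a naive discrete Fourier transform, a synthetic two-tone test signal and a circular two-sample delay between the microphones; the names dft, phase_difference, N and DELAY are hypothetical and do not appear in the disclosure. It is intended only to show that a fixed delay between the two microphones yields a phase difference that grows linearly with the bin index, starting from 0 at 0 Hz.

```python
import cmath
import math


def dft(frame):
    """Naive DFT returning bins 0..N/2 (adequate for a short illustration)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def phase_difference(spec_a, spec_b):
    """Per-bin phase difference in radians, limited to the range -pi..pi."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spec_a, spec_b)]


# Assumed test signal: tones in bins 3 and 5 reach microphone B two samples late.
N, DELAY = 64, 2
source = [
    math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 5 * i / N)
    for i in range(N)
]
mic_a = source
mic_b = [source[(i - DELAY) % N] for i in range(N)]  # circular delay keeps the example exact

diff = phase_difference(dft(mic_a), dft(mic_b))
expected_slope = 2 * math.pi * DELAY / N  # radians per bin for this delay

# Only bins that carry energy (3 and 5 here) have a meaningful phase difference.
for k in (3, 5):
    print(k, round(diff[k], 6), round(k * expected_slope, 6))
```

In this sketch the slope of the phase difference depends only on the delay between the microphones, which is why a single target slope can stand in for a region of interest.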
The target slope is determined as the phase difference versus frequency of
audio
that comes from the region of interest. When each audio signal is converted to
a
frequency domain signal, an interval of audio, for example 32 milliseconds
(ms) may be
converted to a frame of audio in the frequency domain. The frame of audio
comprises
information regarding the amplitude and phase of the audio for different
frequencies.
The frequencies may be grouped together in discrete ranges or bins and the
amplitude
and phase for each bin determined.
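By way of example only, the relationship between the frame interval and the frequency bins may be sketched as follows. The 16 kHz sample rate and the function name frame_parameters are assumptions made for illustration; the disclosure specifies only the example interval of 32 ms.

```python
def frame_parameters(sample_rate_hz, frame_ms=32.0):
    """Return (samples per frame, number of bins up to Nyquist, bin width in Hz)."""
    samples = int(round(sample_rate_hz * frame_ms / 1000.0))
    num_bins = samples // 2 + 1          # bins 0..N/2 of a real-input transform
    bin_width_hz = sample_rate_hz / samples
    return samples, num_bins, bin_width_hz


# Assumed sample rate of 16 kHz; other rates give other frame sizes.
samples, num_bins, bin_width_hz = frame_parameters(16000)
print(samples, num_bins, bin_width_hz)
```

Under these assumptions a 32 ms frame holds 512 samples, giving 257 bins that are each 31.25 Hz wide.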
Figure 2 depicts in a flow diagram an illustrative method of off-axis audio
suppression. The method 200 begins with receiving first and second audio
signals
(202), which correspond to the audio captured from the first and second
microphones.
The audio signals are processed and a phase difference between the two signals
is
calculated (204). The phase difference is calculated for each frequency range
or bin of
the frequency domain audio signals. Once the phase difference between the two
signals is calculated, a direction error is calculated between the phase
difference and a
target slope (206). As described above, the target slope defines a desired
phase
difference between signals from the microphones corresponding to audio from
the
region of interest. As such, the direction error provides an indication as to
whether the
audio signals correspond to audio from the region of interest. The calculated
direction error is used to process the audio signals (208) and suppress off-axis
audio. The
processed audio may be used for voice recognition and may provide better
results due
to the suppressed off-axis audio.
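One possible way to express the direction error of step 206 is sketched below. The disclosure does not give a formula at this point, so the mean absolute deviation used here, the slope expressed in radians per bin, and the names wrap and direction_error are all assumptions made purely for illustration.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the range -pi..pi."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def direction_error(phase_diff, target_slope):
    """Mean absolute deviation (radians) of the measured phase difference
    from the line target_slope * k. Bin 0 is skipped because the expected
    phase difference there is always 0. This metric is an assumption."""
    deviations = [
        abs(wrap(pd - target_slope * k))
        for k, pd in enumerate(phase_diff)
        if k > 0
    ]
    return sum(deviations) / len(deviations)


TARGET_SLOPE = 0.2  # assumed target slope, radians per bin

# Synthetic frames: one follows the target slope, the other slopes the opposite way.
on_axis_frame = [wrap(0.2 * k) for k in range(33)]
off_axis_frame = [wrap(-0.15 * k) for k in range(33)]

print(direction_error(on_axis_frame, TARGET_SLOPE))
print(direction_error(off_axis_frame, TARGET_SLOPE))
```

In this sketch a small error suggests audio from the region of interest and a large error suggests off-axis audio. The deviation is wrapped before averaging so that the limited phase difference can be compared directly with the target line.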
Figure 3 depicts in a flow diagram an illustrative method 300 of adjusting a
target
slope used in suppressing off-axis audio. As described above, the target slope
defines
a region of interest corresponding to a location in the automobile cabin that
the desired
audio for the voice recognition originates from. The target slope is adjusted
based on
the audio received from the microphones in order to adapt the region of
interest to
correspond to the location of the audio source as it moves within the
automobile cabin.
To adjust the target slope, the phase difference is unwrapped (302). The audio
signals
captured from the microphones are transformed into a frequency domain
representation
and the phase difference calculated. However in doing so the phase difference
is limited
to between -pi and +pi, regardless of whether the actual phase difference is larger. The
unwrapping
of the phase difference returns the limited phase difference signal to the
actual
representation of the phase difference. Once the phase difference is
unwrapped, the
slope of the unwrapped phase difference is calculated (304). Checks are then
made to
determine if the slope of the phase difference should be used to update the
target slope.
These checks include determining if the slope of the phase difference provides
a stable
estimate of the direction of the audio (306). The direction may be stable if,
for example,
the slope of the phase difference has not changed greatly within a time
interval, for
example 2 or more frames of the frequency domain signal. If the direction is
not stable
(No at 306), the phase difference should not be used to update the target
slope and the
method is done (316). If the direction is stable (Yes at 306) it is determined
if the slope
is in the desired direction (308). The slope will typically be either
increasing or
decreasing depending on where, relative to the microphones, the audio
originates from.
During an initial configuration of the method one of the microphones may be
indicated
as being closer to the driver, or other desired occupant. In such a case, the
desired
direction of the slope of the phase difference would be increasing, since
higher
frequencies will have a larger phase difference than lower frequencies. As
will be
appreciated, the desired direction may change if, for example, it is desired
to enhance
audio coming from the passenger side of the automobile cabin rather than the
driver. If
the slope of the phase difference is in the wrong direction (No at 308), then
the audio is
not coming from the desired side of the automobile cabin, so the slope of
the phase
difference should not be used to update the target slope and the method is
done
(316). When the slope of the phase difference is in the right direction (Yes
at 308), then
the audio is coming from the desired side of the automobile cabin and it is
determined if
the audio is considered voice audio (310). As will be appreciated there are
various
ways to determine if the audio is associated with voice. Voice audio is
typically
associated with higher energy. If the audio is not voice audio (No at 310)
then the audio
is considered noise and so should not be used to update the target slope. When
the
audio is voice (Yes at 310) it is determined if the difference between the
slope of the
phase difference and the current target slope is large enough to use for
adapting the
target slope (312). A defined tolerance or threshold value may be used in
determining if
the difference is large enough. When the difference is not large enough (No at
312) the
method is done (316). When the difference is large enough (Yes at 312) then
the slope
of the phase difference is used to adjust the target slope (314). The target
slope may
be adjusted using a weighted means such as, for example, a leaky integrator.
As described above, the target slope is adjusted based on the slope of the
phase
difference of the signals. The difference between the slope of the phase
difference and
the current target slope is used in adjusting the target slope. However, if
the audio
enhancement has just been initiated, for example, by the driver pressing a
button on the
steering wheel, the target slope may not have been set yet and so the
difference
between the target slope and the slope of the phase difference cannot be
determined.
In such a case, rather than determining if the difference between the slope of
the phase
difference and the target slope is sufficient, the target slope may be set to
the slope of
the phase difference as an initial value.
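The checks of Figure 3 and the initial-value rule can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
patented implementation: the function names, the tolerance values, the leak
factor, the through-the-origin line fit, and the choice to apply the stability,
direction and voice checks before seeding an unset target are all hypothetical.

```python
from typing import Optional, Sequence


def fit_slope(freqs: Sequence[float], phase_diff: Sequence[float]) -> float:
    """Least-squares slope (radians per Hz) of a line through the origin.

    Assumption: a pure delay gives zero phase difference at 0 Hz, so the
    fitted line is forced through the origin.
    """
    num = sum(f * p for f, p in zip(freqs, phase_diff))
    den = sum(f * f for f in freqs)
    return num / den


def adapt_target_slope(
    target: Optional[float],
    slope: float,
    prev_slope: Optional[float],
    is_voice: bool,
    stability_tol: float = 1e-4,   # hypothetical tolerance for (306)
    min_change: float = 1e-5,      # hypothetical threshold for (312)
    leak: float = 0.1,             # hypothetical leaky-integrator weight
) -> Optional[float]:
    """Return the (possibly updated) target slope for one frame."""
    # (306) Direction is stable if the slope barely changed since last frame.
    if prev_slope is None or abs(slope - prev_slope) > stability_tol:
        return target
    # (308) Desired direction assumed to be an increasing (positive) slope.
    if slope <= 0:
        return target
    # (310) Voice decision is supplied by the caller.
    if not is_voice:
        return target
    # No target set yet: use the measured slope as the initial value.
    if target is None:
        return slope
    # (312) Skip the update when the difference is not large enough.
    if abs(slope - target) < min_change:
        return target
    # (314) Leaky integrator: move a fraction of the way towards the new slope.
    return (1.0 - leak) * target + leak * slope
```

Here a positive slope is taken as the desired direction; to follow audio from
the other side of the cabin the sign test at (308) would be reversed.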
Figure 4 depicts in a flow diagram an illustrative method of unwrapping a
phase
difference. As described above, in calculating the phase difference between
the
frequency domain audio signals the phase difference is limited to be between
+/- pi. As
a result, when the slope crosses +/- pi, it wraps around to -/+ pi, resulting
in a
discontinuous slope. In order to unwrap the phase difference according to the
method
400, a moving average of the phase difference is calculated (402) and zero
crossings of
the average are located (404). The moving average is used to detect a flip in
the sign of
the phase difference, which corresponds to potential phase wrapping. The zero-
crossings may represent locations where the phase difference has been wrapped
or
they may represent an actual phase difference of 0. As such, the zero
crossings are
confirmed to correspond to data wrapping (406). To confirm that the zero-crossing
corresponds to data wrapping, the directions of the moving average before
and after
the flip or zero crossing are compared to check that the slopes are moving in
the correct
direction. That is, the moving average was rising before wrapping to -pi or
falling before
wrapping to + pi. The zero crossings are also checked to ensure that there was
a
minimum frequency difference between adjacent zero crossing points. Once the
zero-
crossings are confirmed, the phase difference data is unwrapped around the
confirmed
zero crossings by either adding or subtracting 2*pi to all subsequent phase
difference
values (408). Whether to add or subtract 2*pi is determined based on the low-
frequency phase difference. If the low-frequency phase difference is
decreasing then
2*pi is subtracted and if the low-frequency phase difference is increasing,
2*pi is added.
Figure 5 depicts in a flow diagram a further illustrative method of unwrapping
the
phase difference. As described above, the frequency domain signals are
segmented
into frequency ranges or bins. Rather than unwrapping the data based on a
moving
average as described above, the method 500 unwraps the phase difference of
each
frequency bin individually based on the target slope. For each frequency bin
of the
phase difference (502) the method determines if the phase difference
associated with
the respective frequency bin (indicated as Pdbin in the Figure for brevity) is
larger than
the target slope value at the frequency of the bin (indicated as T in the
Figure for
brevity) plus pi (504). If the phase difference is larger than the slope value
plus pi (Yes
at 504) a value, n, is determined such that the phase difference minus n*pi is
within +/-
pi of the target slope value (506). The unwrapped value for the frequency bin
is then set
to the wrapped value minus n*pi (508).
If the phase difference is not greater than the target slope plus pi, it is
determined
if the phase difference is less than the target slope minus pi (510). If it is
(Yes at 510) a
value, n, is determined such that the phase difference plus n* pi is within +/-
pi of the
target slope value (512), and the phase difference of the frequency bin is set
to the
phase difference plus n*pi (514). If the phase difference is not less than the
target slope
minus pi (No at 510), then the phase difference for the frequency bin is
within +/- pi of
the target slope and does not need to be unwrapped. Once the frequency bin has
been
unwrapped, the next frequency bin is processed (516). The unwrapped phase
difference may then be used in adjusting the target slope, for example, as
described
above with regards to Figure 3.
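The per-bin unwrapping of Figure 5 might look like the following Python sketch.
It assumes the correction is a whole number of 2*pi turns (an even n in the
n*pi wording above), that the target slope is expressed in radians per Hz, and
that the function and parameter names are free to choose; none of these
details come from the description itself.

```python
import math
from typing import List, Sequence

TWO_PI = 2.0 * math.pi


def unwrap_to_target(freqs: Sequence[float], pd: Sequence[float],
                     target_slope: float) -> List[float]:
    """Shift each bin's phase difference to within pi of the target line."""
    out = []
    for f, p in zip(freqs, pd):                    # (502) each frequency bin
        t = target_slope * f                       # T: target value at this bin
        if p > t + math.pi:                        # (504)
            n = math.ceil((p - t - math.pi) / TWO_PI)    # (506) turns to remove
            p -= n * TWO_PI                        # (508)
        elif p < t - math.pi:                      # (510)
            n = math.ceil((t - math.pi - p) / TWO_PI)    # (512) turns to add
            p += n * TWO_PI                        # (514)
        out.append(p)                              # otherwise already in range
    return out
```

Because each bin is handled independently, an error in one bin does not
propagate to higher frequencies, unlike the cumulative approach of Figure 4.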
When the phase difference is unwrapped it is possible to determine a slope
confidence value indicating a confidence in the unwrapped phase difference.
The slope
confidence may be determined by calculating a signal-to-noise ratio for each
frequency bin in the
unwrapped phase difference and summing the individual ratios together to
provide a
slope confidence. The slope confidence may then be used when adapting the
target
slope. For example, if the slope confidence value is below a threshold, the
target slope
may not be updated as the signal is too noisy. If the slope confidence is
above the
threshold it may be further used as a weighting factor of the leaky integrator
used to
adjust the target slope.
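One way to use the slope confidence is sketched below in Python. The mapping
from confidence to integrator weight (a linear ramp capped at max_leak), the
parameter names, and the numeric values in the usage note are assumptions made
for illustration only.

```python
from typing import Sequence


def slope_confidence(snr_per_bin: Sequence[float]) -> float:
    """Sum the per-bin signal-to-noise ratios into one confidence value."""
    return sum(snr_per_bin)


def update_with_confidence(target: float, slope: float, confidence: float,
                           threshold: float, full_confidence: float,
                           max_leak: float = 0.2) -> float:
    """Leaky-integrator update weighted by the slope confidence."""
    if confidence < threshold:
        return target                  # too noisy: leave the target unchanged
    # Hypothetical weighting: scale the leak linearly up to max_leak.
    weight = max_leak * min(1.0, confidence / full_confidence)
    return (1.0 - weight) * target + weight * slope
```

For example, with a threshold of 10 and a full_confidence of 40, a confidence
of 20 gives a weight of 0.1, so the target moves one tenth of the way towards
the measured slope.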
Figure 6 depicts in a flow diagram a further illustrative method of off-axis
audio
suppression. The method 600 is similar to the method 200 of Figure 2, however
the
method includes adjusting the target slope to adapt the region of interest
prior to
calculating the direction error. The method receives first and second audio
signals from
the microphones (602). The audio signals may be frequency domain
representations of
a frame of audio. For example, the audio signals may comprise a frequency
domain
representation of 32 ms of audio. The phase difference between the two audio
signals
is determined (604). Each audio signal may comprise a plurality of frequency
bins each
with an associated phase. The phase difference may be calculated as the
difference
between the corresponding frequency bins. Once the phase difference is
determined,
outliers of the phase difference are determined and the phase difference
smoothed
(606). The smoothed phase difference is unwrapped (608) and the slope of the
smoothed phase difference is calculated (610). The slope of the unwrapped
phase
difference is used to adjust the target slope (612) and then the target slope
is re-
wrapped (614). By re-wrapping the target slope it is possible to compare the
target
slope to the phase difference of audio signals without needing to unwrap the
phase
difference of the audio signals. Once the target slope is adjusted and re-
wrapped, a
direction error is calculated between the adjusted target slope and the phase
difference
between the received audio signals (616) and the audio signals are processed based
on the
calculated direction error (618).
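Re-wrapping the target slope and comparing it with the wrapped phase difference
can be sketched in Python as below. The per-bin direction error used here, the
magnitude of the wrapped difference, is an assumption for this example; the
description defines the direction error with respect to Figure 2. All names are
hypothetical.

```python
import math
from typing import List, Sequence


def wrap(x: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def rewrap_target(freqs: Sequence[float], target_slope: float) -> List[float]:
    """(614) Target phase difference per bin, limited to +/- pi."""
    return [wrap(target_slope * f) for f in freqs]


def direction_error(pd: Sequence[float],
                    wrapped_target: Sequence[float]) -> List[float]:
    """(616) Per-bin distance between measured and target phase difference."""
    return [abs(wrap(p - t)) for p, t in zip(pd, wrapped_target)]
```

Because both quantities are wrapped, the comparison needs no unwrapping of the
measured phase difference, which is the benefit the paragraph above describes.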
Figure 7 depicts in a flow diagram an illustrative method 700 of processing
audio
based on the calculated direction error. The direction error is checked to see
if it is less
than an on-axis threshold (702). If it is (Yes at 702), the audio corresponds
to voice
audio originating from the region of interest and so the audio signals are
mixed together
to enhance the audio (704). If the direction error is not less than the on-
axis threshold
(No at 702), the direction error is checked to determine if it is greater than
an off-axis
threshold (706). If the direction error is greater than the off-axis threshold
(Yes at 706)
the audio signals correspond to noise audio or voice audio originating from
out of the
region of interest and so the audio signals are mixed together to suppress the
audio
(708). If the direction error is not less than the on-axis threshold and is
not greater than
the off-axis threshold (No at 706), then the audio is a combination of voice
audio and
noise audio. A mixing mask is calculated as a function of frequency (710). The
mixing
mask may comprise a weighting for each frequency of the signals to use during
the
mixing of the audio signals in order to suppress noise and enhance the voice
audio
originating from within the region of interest. The weighting of each
frequency may be
based on the direction error for the particular frequency. The mixing mask is
smoothed
(712) and the audio signals are processed according to the smoothed mixing mask
(714).
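The threshold logic and mixing mask of Figure 7 could be sketched in Python as
follows. The threshold values, the use of the mean per-bin error as the overall
direction error, the linear ramp between the thresholds, the three-bin
smoothing, the suppression floor, and the averaging of the two spectra are all
assumptions chosen to keep the example short.

```python
from typing import List, Sequence


def mixing_gains(errors: Sequence[float], on_axis: float = 0.3,
                 off_axis: float = 1.0, floor: float = 0.1) -> List[float]:
    """Per-bin gains derived from per-bin direction errors (radians)."""
    n = len(errors)
    overall = sum(errors) / n          # assumed overall direction error
    if overall < on_axis:              # (702) yes -> (704) enhance
        return [1.0] * n
    if overall > off_axis:             # (706) yes -> (708) suppress
        return [floor] * n
    # (710) Mixing mask: weight each frequency by its own direction error.
    mask = []
    for e in errors:
        frac = min(1.0, max(0.0, (off_axis - e) / (off_axis - on_axis)))
        mask.append(floor + (1.0 - floor) * frac)
    # (712) Smooth the mask across neighbouring bins.
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        smoothed.append(sum(mask[lo:hi]) / (hi - lo))
    return smoothed


def apply_gains(spec_a: Sequence[complex], spec_b: Sequence[complex],
                gains: Sequence[float]) -> List[complex]:
    """(714) Mix the two spectra (simple average) and apply the gains."""
    return [g * 0.5 * (a + b) for a, b, g in zip(spec_a, spec_b, gains)]
```

Smoothing the mask avoids abrupt gain changes between adjacent bins, which
would otherwise be audible as artefacts in the processed audio.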
Once the audio is processed as described above, the processed audio may be
provided as input to a voice recognition component. By processing the audio as
described above, audio corresponding to voice audio originating from the
region of
interest, such as from the driver, can be enhanced while other audio is
suppressed.
The processing can provide an improved audio signal for the voice recognition,
providing
improved voice recognition.
Figure 8 depicts in a block diagram illustrative components of a system for
suppressing off-axis audio. The system 800 comprises two or more microphones
802a,
802b (referred to collectively as microphones 802) that capture sound from
within an
automobile cabin. The off-axis suppression described above does not require
the
microphones 802 to be placed in a specific location within the automobile
cabin.
Further, the position of the microphones 802 does not need to be predetermined.
As
such, the individual microphones 802a, 802b can be located within the
automobile cabin
individually, allowing greater freedom in selecting the microphones to use as
well as
their location. The microphones 802 are typically placed towards the front of
the
automobile cabin. The microphones provide a signal corresponding to the
captured
audio to a pre-processing component 804. The pre-processing component 804 may
perform various processes on the signals from the microphones 802, including
analog
to digital conversion, amplification and filtering. The pre-processing
component 804
provides digital signals corresponding to the microphone signals to a domain
transformation component 806 which converts the digital signals into
corresponding
frequency domain representations. The domain transformation component 806 may
use,
for example, a Fast Fourier Transform to transform a time interval of the
digital signals
to corresponding frequency domain signals. The frequency domain signals may be
segregated into discrete frequency ranges or bins. The domain transformation
component 806 may also determine the phase associated with each of the
frequency
bins of the digital signals. The frequency domain signals may be provided to a
processor 808 that processes the frequency domain audio signals to suppress
off-axis
audio. The processor 808 may include memory 810 for storing data and/or
instructions
used in the processing of the audio signals.
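The domain transformation step can be illustrated with the short Python sketch
below. To stay self-contained it uses a direct discrete Fourier transform
rather than a Fast Fourier Transform; the two give the same per-bin values,
only more slowly here. The function names and the sign convention (first
signal minus second) are assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def frame_phases(frame: Sequence[float],
                 sample_rate: float) -> Tuple[List[float], List[float]]:
    """Return (bin frequencies in Hz, phase in radians) for one time frame."""
    n = len(frame)
    freqs, phases = [], []
    for k in range(n // 2 + 1):        # non-negative frequency bins only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        phases.append(cmath.phase(acc))
    return freqs, phases


def phase_difference(phases_a: Sequence[float],
                     phases_b: Sequence[float]) -> List[float]:
    """Per-bin phase difference, wrapped into [-pi, pi)."""
    return [(a - b + math.pi) % (2.0 * math.pi) - math.pi
            for a, b in zip(phases_a, phases_b)]
```

A 500 Hz tone that reaches the second microphone 0.25 ms later than the first
yields a phase difference of 2*pi*500*0.00025, about 0.785 radians, in the
500 Hz bin. Bins with negligible energy carry no meaningful phase.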
The processor 808 provides an off-axis suppression component 812 for
processing the audio signals 824. The off-axis suppression component 812 may
be
provided in the hardware of the processor 808, or may be provided as a result
of the
hardware of the processor 808 executing instructions stored in the memory 810
or in a
memory external to the processor 808. The off-axis suppression component 812
comprises a source-locator component 814 that receives the frequency domain
audio
signals, and compares a slope of the phase difference between the audio
signals to a
target slope 816 in order to determine a direction error as described above,
for example
with respect to Figure 2. The direction error may then be used by an audio
mixer
component 818 that mixes the audio signals to produce a processed audio signal
that
has off-axis audio suppressed.
The off-axis suppression component 812 also comprises a target adaptation
component 820. The target adaptation component 820 adapts the target slope
based
on the received audio signals as described above, for example with respect to
Figure 3.
The target adaptation component 820 adapts the target slope if the phase
difference
between the audio signals, which may be provided by the source locator
component
814, has a slope in the desired direction and the audio signals correspond to
voice. The
target adaptation component 820 allows a speaker, such as the driver, to move
within
the automobile cabin while still providing off-axis audio suppression. The
processed
audio of the off-axis suppression component 812 is provided to a control
system 822
that utilizes the processed audio. As will be appreciated, the control system
822 may
utilize the processed audio in various ways. For example, the control system
822 may
be a voice recognition system that attempts to determine a command from the
processed audio to control an automobile system or component, such as an audio
system, a navigation system, or other automobile options. Additionally or
alternatively,
the control system 822 may be associated with a hands-free phone system in which
the
processed audio may be transmitted to another participant of a phone call,
where the
processed audio reduces the background noise from the automobile cabin.
The various components of the system 800, such as the pre-processing
component 804, the domain transformation component 806, and the processor 808
have been depicted as separate components. It is contemplated that the
functionality
provided by each component may be incorporated into more or fewer components.
For
example, the domain transformation component 806 and the processor 808 may be
provided by a single component. Additionally, all of the components including
the pre-
processing component 804 and the control system 822 may be provided by a
single
component or apparatus.
The processing of audio to suppress off-axis audio has been described above
with regards to improving voice audio from a driver to improve voice
recognition. It is
possible to process the audio from other passengers. For example, by changing
the
direction used when setting the target slope, it is possible to enhance audio
from the
passenger. Additionally or alternatively, it is possible to process the audio
to improve a
hands-free call in order to suppress noise or conversations from other
occupants in the
automobile.
It will be appreciated that the off-axis audio suppression described herein
allows
audio from a desired location to be identified. Although specific embodiments
have
been described with regards to how the audio is processed based on whether the
audio
was considered to be from a desired location, namely the region of interest,
other
processing of the captured audio, based on whether the audio is determined to
be from
a desired location or not, is possible.
Further, the above has described the off-axis audio suppression with regards
to
an automobile cabin application. The off-axis audio suppression described
herein may
be applied to other environments in which audio is captured by a plurality of
microphones positioned in the environment.
For example, the off-axis audio
suppression could be used in rooms to improve voice recognition or remove
background audio. It will be appreciated that setting an initial target slope
in audio
environments, such as a room, where a speaker may be located in numerous
different
locations, may require further processing. The target slope could be initiated
based on
a location that a first sound is received from. Such an implementation would
'focus in'
on a first speaker or sound location once the off-axis audio suppression was
initiated.
Additionally or alternatively, the target slope could be initiated using one
or more
additional components, such as an image capture device, or other presence
sensor, to
identify a location of a desired human speaker and then calculate or estimate
a slope of
audio received from the identified location.
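As a hedged illustration of estimating a slope from an identified location, the
Python sketch below uses the free-field relation that a delay of tau seconds
between the microphones produces a phase difference of 2*pi*f*tau at frequency
f. The coordinate inputs, the function name, the speed-of-sound constant, and
the neglect of reflections inside the cabin or room are all assumptions.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air


def expected_slope(source: Sequence[float], mic_a: Sequence[float],
                   mic_b: Sequence[float]) -> float:
    """Expected phase-difference slope in radians per Hz.

    Positive when the source is closer to mic_a, so that the signal at
    mic_b lags the signal at mic_a.
    """
    d_a = math.dist(source, mic_a)     # metres from source to microphone a
    d_b = math.dist(source, mic_b)     # metres from source to microphone b
    delay = (d_b - d_a) / SPEED_OF_SOUND
    return 2.0 * math.pi * delay
```

A source 0.343 m nearer to mic_a gives a delay of about 1 ms and a slope of
roughly 0.0063 radians per Hz, which could then seed the target slope before
any audio-based adaptation takes place.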

Administrative Status

Title	Date
Forecasted Issue Date	2016-01-26
(22) Filed	2012-07-30
Examination Requested	2012-07-30
(41) Open to Public Inspection	2013-01-29
(45) Issued	2016-01-26

Abandonment History

There is no abandonment history.

Maintenance Fee

Last Payment of $263.14 was received on 2023-07-21

Upcoming maintenance fee amounts

Description	Date	Amount
Next Payment if standard fee	2024-07-30	$347.00
Next Payment if small entity fee	2024-07-30	$125.00

Note : If the full payment has not been received on or before the date indicated, a further fee may be required which may be one of the following

the reinstatement fee;
the late payment fee; or
additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Payment History

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Request for Examination			$800.00	2012-07-30
Registration of a document - section 124			$100.00	2012-07-30
Application Fee			$400.00	2012-07-30
Registration of a document - section 124			$100.00	2014-06-03
Registration of a document - section 124			$100.00	2014-06-03
Maintenance Fee - Application - New Act	2	2014-07-30	$100.00	2014-07-10
Maintenance Fee - Application - New Act	3	2015-07-30	$100.00	2015-07-06
Final Fee			$300.00	2015-11-17
Maintenance Fee - Patent - New Act	4	2016-08-01	$100.00	2016-07-25
Maintenance Fee - Patent - New Act	5	2017-07-31	$200.00	2017-07-24
Maintenance Fee - Patent - New Act	6	2018-07-30	$200.00	2018-07-23
Maintenance Fee - Patent - New Act	7	2019-07-30	$200.00	2019-07-26
Registration of a document - section 124		2020-05-20	$100.00	2020-05-20
Maintenance Fee - Patent - New Act	8	2020-07-30	$200.00	2020-07-24
Maintenance Fee - Patent - New Act	9	2021-07-30	$204.00	2021-07-23
Maintenance Fee - Patent - New Act	10	2022-08-01	$254.49	2022-07-22
Maintenance Fee - Patent - New Act	11	2023-07-31	$263.14	2023-07-21

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
BLACKBERRY LIMITED

Past Owners on Record
2236008 ONTARIO INC.
8758271 CANADA INC.
QNX SOFTWARE SYSTEMS LIMITED

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Abstract	2012-07-30	1	13
Description	2012-07-30	14	749
Claims	2012-07-30	8	321
Representative Drawing	2012-09-21	1	8
Cover Page	2013-01-16	1	36
Drawings	2012-07-30	8	116
Claims	2014-12-02	8	333
Description	2014-12-02	15	794
Representative Drawing	2016-01-11	1	7
Cover Page	2016-01-11	1	35
Assignment	2012-07-30	8	265
Prosecution-Amendment	2012-07-30	2	45
Prosecution-Amendment	2014-06-02	2	86
Assignment	2014-06-03	46	6,216
Assignment	2014-06-03	28	4,228
Assignment	2014-07-28	15	435
Prosecution-Amendment	2014-12-02	13	551
Final Fee	2015-11-17	2	48
