Patent Summary 1335003

(12) Patent: (11) CA 1335003
(21) Application Number: 593386
(54) French Title: DETECTION DE L'ACTIVITE VOCALE
(54) English Title: VOICE ACTIVITY DETECTION
Status: Expired
Bibliographic Data
(52) Canadian Patent Classification (CPC):
  • 354/50
(51) International Patent Classification (IPC):
  • G10L 11/02 (2006.01)
  • G10L 11/00 (2006.01)
(72) Inventors:
  • FREEMAN, DANIEL KENNETH (United Kingdom)
  • BOYD, IVAN (United Kingdom)
(73) Owners:
  • LG ELECTRONICS INC. (Republic of Korea)
(71) Applicants:
  • FREEMAN, DANIEL KENNETH (United Kingdom)
  • BOYD, IVAN (United Kingdom)
(74) Agent: G. RONALD BELL & ASSOCIATES
(74) Associate agent:
(45) Issued: 1995-03-28
(22) Filed Date: 1989-03-10
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No.   Country/Territory   Date
8805795           United Kingdom      1988-03-11
8820105.8         United Kingdom      1988-08-24
8813346.7         United Kingdom      1988-06-06

Abstracts

English Abstract

Voice activity detector (VAD) for use in an LPC coder
in a mobile radio system, uses autocorrelation
coefficients R0, R1, ... of the input signal, weighted
and combined, to provide a measure M which depends on the
power within that part of the spectrum containing no
noise, which is thresholded against a variable threshold
to provide a speech/no speech logic output. The measure is

[Image: formula for the measure M]

where Hi are the autocorrelation coefficients of the
impulse response of an Nth order FIR inverse noise filter
derived from LPC analysis of previous non-speech signal
frames. Threshold adaption and coefficient update are
controlled by a second VAD responsive to rate of spectral
change between frames.

Claims

Note: Claims are shown in the official language in which they were submitted.

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. Voice activity detection apparatus comprising:
(i) means for receiving a first, input, signal;
(ii) means for periodically adaptively
generating a second signal representing an estimated noise
signal component of the first signal;
(iii) means for periodically forming from the
first and second signals a measure M of the spectral
similarity between a portion of the input signal and the
said estimated noise signal component; and
(iv) means for comparing the measure M with a
threshold value T to produce an output indicating the
presence or absence of speech; and
(v) analysis means operable to produce the
coefficients of a filter having a spectral response which
is the inverse of the frequency spectrum of one of the said
two signals;
wherein the measure M is proportional to the
zero-order autocorrelation R'0 of a signal obtained by
filtering of the other of the said two signals by a filter
having the said coefficients.

2. Apparatus according to claim 1 in which the
analysis means includes an adaptive filter.

3. Apparatus according to claim 1 in which the
generating means are operable to compute the
autocorrelation coefficients Ai of the impulse response of
the said coefficients and the measure forming means
comprises means for computing the autocorrelation
coefficients Ri of the said other signal, and means
connected to receive Ri and Ai, and to calculate the measure
M therefrom.

4. Apparatus according to claim 2 in which the
means for computing the autocorrelation coefficients Ri of
the said other signal are arranged to do so in dependence
upon the autocorrelation coefficients of several successive
portions of the signal.

5. Apparatus according to claim 3 or 4 in which

$$
M = R_0 A_0 + 2 \sum_{i} R_i A_i
$$

where Ai represents the i-th autocorrelation
coefficient of the impulse response of said filter.

6. Apparatus according to claim 3 or 4 in which

[Image: formula for the measure M]

where Ai represents the i-th autocorrelation
coefficient of the impulse response of said filter.

7. Apparatus according to claim 1, 2, 3 or 4 in
which the said one signal is the second, noise
representing, signal and the said other signal is the
first, input signal.

8. Apparatus according to claim 7, further
comprising an input arranged to receive a second input
signal, similarly subject to noise, from which speech is
absent, in which the generating means comprise LPC analysis
means for deriving values of Ai from the second input
signal.

9. Apparatus according to claim 1, 2, 3 or 4
further comprising a buffer connected to store data from
which the autocorrelation coefficients Ai of the said filter
response may be obtained, in which the said filter response
is periodically calculated from the signal by LPC analysis
means, the apparatus being so connected and controlled that
the measure M is calculated using the said stored data, and
the said stored data is updated only from periods in which
speech is indicated to be absent.

10. Apparatus according to claim 9 further
comprising means for indicating the absence of speech to
control the updating of the stored data, the means for
indicating the absence of speech being a second voice
activity detection means.

11. Apparatus according to claim 1, 2, 3, 4, 8,
or 10, further comprising means for adjusting the said
threshold value T during periods when speech is indicated
to be absent.

12. Apparatus according to claim 11 in which the
threshold value T is, when adjusted, adjusted to be equal
to the mean of the measure plus a term which is a fraction
of the standard deviation of the measure.

13. Apparatus according to claim 12, further
comprising second voice activity detection means arranged
to prevent adjustment of the threshold value when speech is
present.

14. Apparatus according to claim 10 or 13 in
which the said second voice activity detection means
controls both threshold adjusting and data updating.

15. Apparatus according to claim 10 in which
said second voice activity detection means comprises means
for generating a measure of the spectral similarity between
a portion of the input signal and earlier portions of the
input signal.

16. Apparatus according to claim 15 in which the
similarity measure generating means comprises means for
providing, from LPC filter data and autocorrelation data
relating to a present portion of the input signal, a
present distortion measure; means for providing an
equivalent past frame distortion measure corresponding to
a preceding portion of the input signal, and means for
generating a signal indicating the degree of similarity
therebetween as an indicator of speech presence or absence.


17. Apparatus according to claim 15 or 16, in
which said second voice activity detection means further
comprises voiced speech detection means comprising pitch
analysis means, for generating a signal indicative of the
presence of voiced speech, upon which the output of said
second voice activity detection means also depends.

18. A method of detecting voice activity in a
first, input, signal, comprising the steps of:
(a) periodically adaptively generating a second
signal representing an estimated noise signal component of
the first signal;
(b) periodically forming from the first and
second signals a measure M of the spectral similarity
between a portion of the input signal and the said
estimated noise signal component;
(c) comparing the measure M with a threshold
value T to produce an output indicating the presence or
absence of speech;
(d) producing the coefficients of a filter
having a spectral response which is the inverse of the
frequency spectrum of one of the said two signals; and
wherein the measure M is proportional to the zero-order
autocorrelation R'0 of a signal obtained by filtering of the
other of the said two signals by a filter having the said
coefficients.

19. Apparatus for encoding speech signals
including apparatus according to any one of claims 1, 2, 3,
4, 8, 10, 12, 13, 15 or 16.

20. Mobile telephone apparatus including
apparatus according to any one of claims 1, 2, 3, 4, 8, 10,
12, 13, 15 or 16.

21. A voice activity detection apparatus
comprising:
(i) a first voice activity detector which
operates by forming a measure of the spectral similarity
between an input signal and a stored portion of input
signal deemed to be speech free to produce an output signal
indicating the presence or absence of speech in the input
signal;
(ii) a store for containing the stored portion of
signal; and
(iii) an auxiliary voice activity detector,
wherein the auxiliary voice activity detector
alone controls the updating of the store, the auxiliary
voice activity detector operating by forming a measure of
the spectral similarity between the current signal and an
earlier portion of signal.

22. Voice activity detection apparatus
comprising:
(i) means for receiving an input signal;
(ii) a store for storing a noise representing
signal;
(iii) means for periodically forming from the
input signal and the stored noise representing signal a
measure of the spectral similarity between a portion of the
input signal and the said estimated noise signal component;
(iv) means for comparing the measure with a
threshold value to produce an output indicating the
presence or absence of speech;
(v) an auxiliary voice activity detector; and
(vi) store updating means for updating the store
from the input signal;
wherein the auxiliary voice activity detector is
operable in dependence on a measure of spectral similarity
between the input signal and a preceding portion of the
input signal to produce a control signal indicating the
presence or absence of speech, and the store updating means
is operable to update the store from the input signal only
when said control signal indicates that speech is absent.

23. Apparatus according to claim 22, further
comprising means for adjusting the said threshold value
during periods when speech is indicated by said control
signal to be absent.

24. Apparatus according to claim 22 or 23, in
which said auxiliary voice activity detector further
comprises voiced speech detection means comprising pitch
analysis means for generating a signal indicative of the
presence of voiced speech, upon which the control signal
produced by the auxiliary voice activity detector also
depends.

Description

Note: Descriptions are shown in the official language in which they were submitted.


VOICE ACTIVITY DETECTION

A voice activity detector is a device which is
supplied with a signal with the object of detecting
periods of speech, or periods containing only noise.
Although the present invention is not limited thereto, one
application of particular interest for such detectors is
in mobile radio telephone systems, where the knowledge as
to the presence or otherwise of speech can be exploited
by a speech coder to improve the efficient utilisation of
radio spectrum, and where also the noise level (from a
vehicle-mounted unit) is likely to be high.
The essence of voice activity detection is to locate a
measure which differs appreciably between speech and
non-speech periods. In apparatus which includes a speech
coder, a number of parameters are readily available from
one or other stage of the coder, and it is therefore
desirable to economise on processing needed by utilising
some such parameter. In many environments, the main noise
sources occur in known defined areas of the frequency
spectrum. For example, in a moving car much of the noise
(eg, engine noise) is concentrated in the low frequency
regions of the spectrum. Where such knowledge of the
spectral position of noise is available, it is desirable
to base the decision as to whether speech is present or
absent upon measurements taken from that portion of the
spectrum which contains relatively little noise. It
would, of course, be possible in practice to pre-filter
the signal before analysing to detect speech activity, but
where the voice activity detector follows the output of a
speech coder, prefiltering would distort the voice signal
to be coded.


According to a first aspect of the invention there is
provided voice activity detection apparatus comprising
means for receiving an input signal, means for estimating
the noise signal component of the input signal, means for
continually forming a measure M of the spectral similarity
between a portion of the input signal and the noise
signal, and means for comparing a parameter derived from
the measure M with a threshold value T to produce an
output to indicate the presence or absence of speech in
dependence upon whether or not that value is exceeded.
According to a second aspect of the invention there is
provided voice activity detection apparatus comprising:
means for continually forming a spectral distortion
measure of the similarity between a portion of the input
signal and earlier portions of the input signal and means
for comparing the degree of variation between successive
values of the measure with a threshold value to produce an
output indicating the presence or absence of speech in
dependence upon whether or not that value is exceeded.
Preferably, the measure is the Itakura-Saito
Distortion Measure.
Other aspects of the present invention are as defined
in the claims.
Some embodiments of the invention will now be
described, by way of example, with reference to the
accompanying drawings, in which:

Figure 1 is a block diagram of a first embodiment of
the invention;

Figure 2 shows a second embodiment of the invention;

Figure 3 shows a third, preferred embodiment of the
invention.


The general principle underlying a first Voice
Activity Detector according to a first embodiment of
the invention is as follows.
A frame of n signal samples
(s0, s1, s2, s3, s4 ... s(n-1)) will, when
passed through a notional fourth order finite impulse
response (FIR) digital filter of impulse response (1,
h0, h1, h2, h3), result in a filtered signal
(ignoring samples from previous frames)

$$
\begin{aligned}
s' ={} & (s_0), \\
& (s_1 + h_0 s_0), \\
& (s_2 + h_0 s_1 + h_1 s_0), \\
& (s_3 + h_0 s_2 + h_1 s_1 + h_2 s_0), \\
& (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0), \\
& (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1), \\
& (s_6 + h_0 s_5 + h_1 s_4 + h_2 s_3 + h_3 s_2), \\
& (s_7 + \ldots)
\end{aligned}
$$

The zero order autocorrelation coefficient is the sum of
each term squared, which may be normalized i.e. divided by
the total number of terms (for constant frame lengths it
is easier to omit the division); that of the filtered
signal is thus

$$
R'_0 = \sum_{i=0}^{n-1} (s'_i)^2
$$

and this is therefore a measure of the power of the
notional filtered signal s' - in other words, of that part
of the signal s which falls within the passband of the
notional filter.
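Expanding, neglecting the first 4 terms,

$$
\begin{aligned}
R'_0 ={} & (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 \\
& + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 + \ldots \\
={} & s_4^2 + h_0 s_4 s_3 + h_1 s_4 s_2 + h_2 s_4 s_1 + h_3 s_4 s_0 \\
& + h_0 s_4 s_3 + h_0^2 s_3^2 + h_0 h_1 s_3 s_2 + h_0 h_2 s_3 s_1 + h_0 h_3 s_3 s_0 \\
& + h_1 s_4 s_2 + h_0 h_1 s_3 s_2 + h_1^2 s_2^2 + h_1 h_2 s_2 s_1 + h_1 h_3 s_2 s_0 \\
& + h_2 s_4 s_1 + h_0 h_2 s_3 s_1 + h_1 h_2 s_2 s_1 + h_2^2 s_1^2 + h_2 h_3 s_1 s_0 \\
& + h_3 s_4 s_0 + h_0 h_3 s_3 s_0 + h_1 h_3 s_2 s_0 + h_2 h_3 s_1 s_0 + h_3^2 s_0^2 \\
& + \ldots \\
={} & R_0 (1 + h_0^2 + h_1^2 + h_2^2 + h_3^2) \\
& + R_1 (2h_0 + 2h_0 h_1 + 2h_1 h_2 + 2h_2 h_3) \\
& + R_2 (2h_1 + 2h_1 h_3 + 2h_0 h_2) \\
& + R_3 (2h_2 + 2h_0 h_3) \\
& + R_4 (2h_3)
\end{aligned}
$$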

So R'0 can be obtained from a combination of the
autocorrelation coefficients Ri, weighted by the
bracketed constants which determine the frequency band to
which the value of R'0 is responsive. In fact, the
bracketed terms are the autocorrelation coefficients of
the impulse response of the notional filter, so that the
expression above may be simplified to

$$
R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i \qquad (1)
$$

where N is the filter order and Hi are the
(un-normalised) autocorrelation coefficients of the
impulse response of the filter.
In other words, the effect on the signal
autocorrelation coefficients of filtering a signal may be
simulated by producing a weighted sum of the
autocorrelation coefficients of the (unfiltered) signal,
using the impulse response that the required filter would
have had.
Thus, a relatively simple algorithm, involving a small
number of multiplication operations, may simulate the
effect of a digital filter requiring typically a hundred
times this number of multiplication operations.
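
The equivalence stated in equation 1 can be checked numerically. The following is a minimal Python sketch under stated assumptions: the helper names (`autocorr`, `convolve_full`, `filtered_power_eq1`), the test frame and the tap values are illustrative and do not come from the patent. The match is exact when the filter output is allowed to run past the end of the frame; when the output is confined to the frame, as in the derivation above, the two figures differ by the neglected edge terms.

```python
import math


def autocorr(x, max_lag):
    """Un-normalised autocorrelation: r[k] = sum_j x[j] * x[j + k]."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def convolve_full(s, taps):
    """Filter s with an FIR filter, keeping the tail that runs past the frame."""
    out = [0.0] * (len(s) + len(taps) - 1)
    for i, sample in enumerate(s):
        for k, tap in enumerate(taps):
            out[i + k] += sample * tap
    return out


def filtered_power_eq1(s, taps):
    """R'0 from equation 1: R0*H0 + 2 * sum_{i=1..N} Ri*Hi."""
    order = len(taps) - 1
    R = autocorr(s, order)      # signal autocorrelation, lags 0..N
    H = autocorr(taps, order)   # autocorrelation of the impulse response
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, order + 1))


# Illustrative frame and notional filter (1, h0, h1, h2, h3); values are arbitrary.
frame = [math.sin(0.3 * j) + 0.5 * math.sin(1.7 * j) for j in range(160)]
taps = [1.0, -0.9, 0.2, 0.05, -0.01]

direct = sum(v * v for v in convolve_full(frame, taps))  # filter, then sum squares
via_eq1 = filtered_power_eq1(frame, taps)                # weighted sum only
print(direct, via_eq1)
```

For a filter of order N, the weighted sum needs only N + 1 multiplications per frame once the Ri are available, which is the saving the text refers to.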
This filtering operation may alternatively be viewed
as a form of spectrum comparison, with the signal spectrum
being matched against a reference spectrum (the inverse of
the response of the notional filter). Since the notional
filter in this application is selected so as to
approximate the inverse of the noise spectrum, this
operation may be viewed as a spectral comparison between
speech and noise spectra, and the zeroth autocorrelation
coefficient thus generated (i.e. the energy of the inverse
filtered signal) as a measure of dissimilarity between the
spectra. The Itakura-Saito distortion measure is used in
LPC to assess the match between the predictor filter and
the input spectrum, and in one form is expressed as

$$
M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i
$$

where A0 etc are the autocorrelation coefficients of the
LPC parameter set. It will be seen that this is closely
similar to the relationship derived above, and when it is
remembered that the LPC coefficients are the taps of an
FIR filter having the inverse spectral response of the
input signal so that the LPC coefficient set is the
impulse response of the inverse LPC filter, it will be
apparent that the Itakura-Saito Distortion Measure is in
fact merely a form of equation 1, wherein the filter
response H is the inverse of the spectral shape of an
all-pole model of the input signal.
In fact, it is also possible to transpose the spectra,
using the LPC coefficients of the test spectrum and the
autocorrelation coefficients of the reference spectrum, to
obtain a different measure of spectral similarity.
The I-S Distortion Measure is further discussed in
"Speech Coding based upon Vector Quantisation" by A Buzo,
A H Gray, R M Gray and J D Markel, IEEE Trans on ASSP, Vol
ASSP-28, No 5, October 1980.
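
As a hedged illustration of this relationship, the sketch below derives an inverse LPC filter from a frame's autocorrelation and evaluates M in the form given above. The patent does not say how the LPC coefficients are computed; the Levinson-Durbin recursion is assumed here because it is a common choice. The synthetic first-order signals, the order of 4 and all function names are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Un-normalised autocorrelation: r[k] = sum_j x[j] * x[j + k]."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(R, order):
    """Return (a, err): inverse-filter taps a[0..order] with a[0] = 1, and residual energy."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for m in range(1, order + 1):
        acc = R[m] + sum(a[k] * R[m - k] for k in range(1, m))
        refl = -acc / err                      # reflection coefficient for stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + refl * prev[m - k]
        a[m] = refl
        err *= 1.0 - refl * refl
    return a, err


def distortion_measure(R, A):
    """M = R0*A0 + 2 * sum_{i=1..N} Ri*Ai."""
    return R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, len(A)))


def ar1(coeff, n, seed):
    """Synthetic first-order autoregressive signal (illustration only)."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


ORDER = 4
R_noise = autocorr(ar1(0.9, 400, seed=1), ORDER)    # stands in for low-frequency noise
R_other = autocorr(ar1(-0.7, 400, seed=2), ORDER)   # a spectrally different frame

a_noise, err_noise = levinson_durbin(R_noise, ORDER)
_, err_other = levinson_durbin(R_other, ORDER)
A_noise = autocorr(a_noise, ORDER)                  # autocorrelation of the LPC set

M_matched = distortion_measure(R_noise, A_noise)     # filter matches the frame
M_mismatched = distortion_measure(R_other, A_noise)  # filter from a different spectrum
```

When the filter and the frame come from the same signal, M equals the LPC residual energy; with a mismatched filter, M cannot fall below the residual energy that the frame's own LPC filter would give, which is what makes it usable as a dissimilarity measure.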
Since the frames of signal have only a finite length,
and a number of terms (N, where N is the filter order) are
neglected, the above result is an approximation only; it
gives, however, a surprisingly good indicator of the
presence or absence of speech and thus may be used as a
measure M in speech detection. In an environment where
the noise spectrum is well known and stationary, it is
quite possible to simply employ fixed h0, h1 etc
coefficients to model the inverse noise filter.
However, apparatus which can adapt to different noise
environments is much more widely useful.
Referring to Figure 1, in a first embodiment, a signal
from a microphone (not shown) is received at an input 1
and converted to digital samples s at a suitable sampling
rate by an analogue to digital converter 2. An LPC
analysis unit 3 (in a known type of LPC coder) then
derives, for successive frames of n (eg 160) samples, a
set of N (eg 8 or 12) LPC filter coefficients Li which
are transmitted to represent the input speech. The speech
signal s also enters a correlator unit 4 (normally part of
the LPC coder 3, since the autocorrelation vector Ri of
the speech is also usually produced as a step in the LPC
analysis, although it will be appreciated that a separate
correlator could be provided). The correlator 4 produces
the autocorrelation vector Ri, including the zero order
correlation coefficient R0 and at least 2 further
autocorrelation coefficients R1, R2, R3. These are
then supplied to a multiplier unit 5.
A second input 11 is connected to a second microphone
located distant from the speaker so as to receive only
background noise. The input from this microphone is
converted to a digital input sample train by AD converter
12 and LPC analysed by a second LPC analyser 13. The
"noise" LPC coefficients produced from analyser 13 are
passed to correlator unit 14, and the autocorrelation
vector thus produced is multiplied term by term with the
autocorrelation coefficients Ri of the input signal from
the speech microphone in multiplier 5 and the weighted
coefficients thus produced are combined in adder 6
according to Equation 1, so as to apply a filter having
the inverse shape of the noise spectrum from the
noise-only microphone (which in practice is the same as
the shape of the noise spectrum in the signal-plus-noise
microphone) and thus filter out most of the noise. The
resulting measure M is thresholded by thresholder 7 to
produce a logic output 8 indicating the presence or
absence of speech; if M is high, speech is deemed to be
present.
This embodiment does, however, require two microphones
and two LPC analysers, which adds to the expense and
complexity of the equipment necessary.
Alternatively, another embodiment uses a corresponding
measure formed using the autocorrelations from the noise
microphone 11 and the LPC coefficients from the main
microphone 1, so that an extra autocorrelator rather than
an LPC analyser is necessary.
These embodiments are therefore able to operate within
different environments having noise at different
frequencies, or within a changing noise spectrum in a
given environment.
Referring to Figure 2, in the preferred embodiment of
the invention, there is provided a buffer 15 which stores
a set of LPC coefficients (or the autocorrelation vector
of the set) derived from the microphone input 1 in a
period identified as being a "non speech" (ie noise only)
period. These coefficients are then used to derive a
measure using equation 1, which also of course corresponds
to the Itakura-Saito Distortion Measure, except that a
single stored frame of LPC coefficients corresponding to
an approximation of the inverse noise spectrum is used,
rather than the present frame of LPC coefficients.
The LPC coefficient vector Li output by analyser 3
is also routed to a correlator 14, which produces the
autocorrelation vector of the LPC coefficient vector. The
buffer memory 15 is controlled by the speech/non-speech
output of thresholder 7, in such a way that during
"speech" frames the buffer retains the "noise"
autocorrelation coefficients, but during "noise" frames a
new set of LPC coefficients may be used to update the
buffer, for example by a multiple switch 16, via which
outputs of the correlator 14, carrying each
autocorrelation coefficient, are connected to the buffer
15. It will be appreciated that correlator 14 could be
positioned after buffer 15. Further, the speech/no-speech
decision for coefficient update need not be from output 8,
but could be (and preferably is) otherwise derived.
Since frequent periods without speech occur, the LPC
coefficients stored in the buffer are updated from time to
time, so that the apparatus is thus capable of tracking
changes in the noise spectrum. It will be appreciated
that such updating of the buffer may be necessary only
occasionally, or may occur only once at the start of
operation of the detector, if (as is often the case) the
noise spectrum is relatively stationary over time, but in
a mobile radio environment frequent updating is preferred.
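
The buffer behaviour of the Figure 2 arrangement can be sketched as follows. This is an illustrative Python outline, not the patented circuit: the class name, the method names and the optional `noise_flag` argument (standing in for an externally derived speech/no-speech decision) are assumptions.

```python
class BufferedNoiseVAD:
    """Thresholds equation 1 against a stored noise-period autocorrelation vector."""

    def __init__(self, initial_A, threshold):
        self.stored_A = list(initial_A)  # plays the role of buffer 15
        self.threshold = threshold

    def measure(self, R):
        """M = R0*A0 + 2 * sum_{i>=1} Ri*Ai, using the stored coefficients."""
        A = self.stored_A
        return R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, len(A)))

    def process(self, R, frame_A, noise_flag=None):
        """Return True if speech is deemed present; refresh the buffer on noise frames.

        R        : autocorrelation vector of the current frame
        frame_A  : autocorrelation of the current frame's LPC coefficients
        noise_flag: optional external non-speech decision; if None, the
                    detector's own output gates the update.
        """
        speech = self.measure(R) > self.threshold
        is_noise = (not speech) if noise_flag is None else noise_flag
        if is_noise:
            self.stored_A = list(frame_A)  # the switch closes only for noise frames
        return speech


vad = BufferedNoiseVAD(initial_A=[1.0, 0.0], threshold=10.0)
first = vad.process([5.0, 0.0], frame_A=[2.0, 0.5])    # low measure: buffer updated
second = vad.process([20.0, 1.0], frame_A=[9.0, 9.0])  # high measure: buffer retained
```

Passing `noise_flag` from a separate detector corresponds to the remark that the update decision need not come from output 8.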
In a modification of this embodiment, the system
initially employs equation 1 with coefficient terms
corresponding to a simple fixed high pass filter, and then
subsequently starts to adapt by switching over to using
"noise period" LPC coefficients. If, for some reason,
speech detection fails, the system may return to using the
simple high pass filter.
It is possible to normalise the above measure by
dividing through by R0, so that the expression to be
thresholded has the form

$$
M = A_0 + \frac{2 \sum_{i=1}^{N} R_i A_i}{R_0}
$$

This measure is independent of the total signal energy in
a frame and is thus compensated for gross signal level
changes, but gives rather less marked contrast between
"noise" and "speech" levels and is hence preferably not
employed in high-noise environments.
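
A short Python sketch of the normalised form, with a hypothetical function name and made-up coefficient values, shows the level independence: scaling every Ri by the same factor, as a gain change would, leaves the result unchanged.

```python
def normalised_measure(R, A):
    """M = A0 + 2 * sum_{i=1..N} Ri*Ai / R0 (assumes R[0] > 0)."""
    return A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, len(A))) / R[0]


A = [1.5, -0.5, 0.25]        # illustrative filter autocorrelation
R_quiet = [8.0, 4.0, 2.0]    # illustrative frame autocorrelation
R_loud = [32.0, 16.0, 8.0]   # same spectrum, four times the energy

m_quiet = normalised_measure(R_quiet, A)
m_loud = normalised_measure(R_loud, A)
```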
Instead of employing LPC analysis to derive the
inverse filter coefficients of the noise signal (from
either the noise microphone or noise only periods, as in
the various embodiments described above), it is possible
to model the inverse noise spectrum using an adaptive
filter of known type; as the noise spectrum changes only
slowly (as discussed below) a relatively slow coefficient
adaption rate common for such filters is acceptable. In
one embodiment, which corresponds to Figure 1, LPC
analysis unit 13 is simply replaced by an adaptive filter
(for example a transversal FIR or lattice filter),
connected so as to whiten the noise input by modelling the
inverse filter, and its coefficients are supplied as
before to autocorrelator 14.
In a second embodiment, corresponding to that of
Figure 2, LPC analysis means 3 is replaced by such an
adaptive filter, and buffer means 15 is omitted, but switch
16 operates to prevent the adaptive filter from adapting
its coefficients during speech periods.
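
The text leaves the type of adaptive filter open. As one possible reading, the sketch below uses a plain LMS transversal predictor whose error output is the whitened signal; the LMS rule, the step size, the filter order and the synthetic noise are all assumptions chosen for illustration.

```python
import random


def lms_inverse_filter(samples, order, step):
    """Adapt a linear predictor with LMS; return inverse-filter taps (1, -w1, ..., -wN)."""
    w = [0.0] * order
    for n in range(order, len(samples)):
        past = [samples[n - k] for k in range(1, order + 1)]
        prediction = sum(wk * xk for wk, xk in zip(w, past))
        error = samples[n] - prediction              # whitened (inverse filtered) output
        w = [wk + step * error * xk for wk, xk in zip(w, past)]
    return [1.0] + [-wk for wk in w]


# Synthetic low-frequency "noise": x[n] = 0.9 * x[n-1] + white noise (illustration only).
rng = random.Random(0)
noise, prev = [], 0.0
for _ in range(5000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    noise.append(prev)

taps = lms_inverse_filter(noise, order=2, step=0.002)
# For this synthetic input the taps settle near (1, -0.9, 0).
```

To mirror the second embodiment, the weight update would simply be skipped for frames flagged as speech.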
A second Voice Activity Detector in accordance with
another aspect of the invention will now be described.
From the foregoing, it will be apparent that the LPC
coefficient vector is simply the impulse response of an
FIR filter which has a response approximating the inverse
spectral shape of the input signal. When the
Itakura-Saito Distortion Measure between adjacent frames
is formed, this is in fact equal to the power of the
signal, as filtered by the LPC filter of the previous
frame. So if spectra of adjacent frames differ little, a
correspondingly small amount of the spectral power of a
frame will escape filtering and the measure will be low.
Correspondingly, a large interframe spectral difference
produces a high Itakura-Saito Distortion Measure, so that
the measure reflects the spectral similarity of adjacent
frames. In a speech coder, it is desirable to minimise
the data rate, so frame length is made as long as
possible; in other words, if the frame length is long
enough, then a speech signal should show a significant
spectral change from frame to frame (if it does not, the
coding is redundant). Noise, on the other hand, has a
slowly varying spectral shape from frame to frame, and so
in a period where speech is absent from the signal then
the Itakura-Saito Distortion Measure will correspondingly
be low - since applying the inverse LPC filter from the
previous frame "filters out" most of the noise power.
Typically, the Itakura-Saito Distortion Measure
between adjacent frames of a noisy signal containing
intermittent speech is higher during periods of speech
than periods of noise; the degree of variation (as
illustrated by the standard deviation) is higher, and less
intermittently variable.
It is noted that the standard deviation of the
standard deviation of M is also a reliable measure; the
effect of taking each standard deviation is essentially to
smooth the measure.
In this second form of Voice Activity Detector, the
measured parameter used to decide whether speech is
present is preferably the standard deviation of the
Itakura-Saito Distortion Measure, but other measures of
variance and other spectral distortion measures (based for
example on FFT analysis) could be employed.
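
A minimal Python sketch of this second detector follows, assuming a sliding window and the population standard deviation; the window length, the threshold and the function names are hypothetical, since the text does not fix them.

```python
import statistics


def interframe_measure(R_current, A_previous):
    """Energy of the current frame after the previous frame's inverse LPC filter."""
    return R_current[0] * A_previous[0] + 2.0 * sum(
        R_current[i] * A_previous[i] for i in range(1, len(A_previous))
    )


def rolling_std(values, window):
    """Population standard deviation over the last `window` values at each step."""
    return [
        statistics.pstdev(values[max(0, t - window + 1): t + 1])
        for t in range(len(values))
    ]


def second_vad(measures, window, threshold):
    """Flag speech where the variation of the measure exceeds the threshold."""
    return [sd > threshold for sd in rolling_std(measures, window)]


# Illustrative per-frame measures: steady noise, then a spectral change.
measures = [1.0, 1.0, 5.0, 1.0]
flags = second_vad(measures, window=2, threshold=0.5)
```

Applying `rolling_std` a second time to its own output gives the "standard deviation of the standard deviation" mentioned above.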

It is found advantageous to employ an adaptive
threshold in voice activity detection. Such thresholds
must not be adjusted during speech periods or the speech
signal will be thresholded out. It is accordingly
necessary to control the threshold adapter using a
speech/non-speech control signal, and it is preferable
that this control signal should be independent of the
output of the threshold adapter.
The threshold T is adaptively adjusted so as to keep the
threshold level just above the level of the measure M when
noise only is present. Since the measure will in general
vary randomly when noise is present, the threshold is
varied by determining an average level over a number of
blocks, and setting the threshold at a level proportional
to this average. In a noisy environment this is not
usually sufficient, however, and so an assessment of the
degree of variation of the parameter over several blocks
is also taken into account.
The threshold value T is therefore preferably
calculated according to

$$
T = M' + K \cdot d
$$

where M' is the average value of the measure over a number
of consecutive frames, d is the standard deviation of the
measure over those frames, and K is a constant (which may
typically be 2).
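
The threshold rule can be written directly in Python. In this sketch the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is intended.

```python
import statistics


def adaptive_threshold(noise_measures, K=2.0):
    """T = M' + K*d over recent frames judged to contain noise only."""
    mean = statistics.mean(noise_measures)        # M'
    deviation = statistics.pstdev(noise_measures)  # d
    return mean + K * deviation


T = adaptive_threshold([1.0, 3.0])  # mean 2.0, deviation 1.0
```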
In practice, it is preferred not to resume adaptation
immediately after speech is indicated to be absent, but to
wait to ensure the fall is stable (to avoid rapid repeated
switching between the adapting and non-adapting states).
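
One simple way to realise such a wait is a counter of consecutive non-speech frames. The Python sketch below is an assumption about how this might be done; the class name and the `hold_frames` parameter are hypothetical, and the text gives no figure for the waiting period.

```python
class AdaptationGate:
    """Allows threshold adaptation only after a stable run of non-speech frames."""

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames
        self.quiet_run = 0  # consecutive frames with speech indicated absent

    def step(self, speech_present):
        """Return True if adaptation may proceed on this frame."""
        if speech_present:
            self.quiet_run = 0
        else:
            self.quiet_run += 1
        return self.quiet_run >= self.hold_frames


gate = AdaptationGate(hold_frames=3)
decisions = [gate.step(flag) for flag in [True, False, False, False, False, True, False]]
```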
Referring to Figure 3, in a preferred embodiment of
the invention incorporating the above aspects, an input 1
receives a signal which is sampled and digitised by
analogue to digital converter (ADC) 2, and supplied to the
input of an inverse filter analyser 3, which in practice
is part of a speech coder with which the voice activity
detector is to work, and which generates coefficients Li
(typically 8) of a filter corresponding to the inverse of
the input signal spectrum. The digitised signal is also
supplied to an autocorrelator 4 (which is part of
analyser 3), which generates the autocorrelation vector
Ri of the input signal (or at least as many low order
terms as there are LPC coefficients). Operation of these
parts of the apparatus is as described in Figures 1 and 2.
Preferably, the autocorrelation coefficients Ri are then
averaged over several successive speech frames (typically
5-20 ms long) to improve their reliability. This may be
achieved by storing each set of autocorrelation
coefficients output by autocorrelator 4 in a buffer 4a,
and employing an averager 4b to produce a weighted sum of
the current autocorrelation coefficients Ri and those
from previous frames stored in and supplied from buffer
4a. The averaged autocorrelation coefficients Rai thus
derived are supplied to weighting and adding means 5,6
which receives also the autocorrelation vector Ai of
stored noise-period inverse filter coefficients Li from
an autocorrelator 14 via buffer 15, and forms from Rai
and Ai a measure M preferably defined as:

$$
M = A_0 + \frac{2 \sum_{i=1}^{N} \mathit{Ra}_i A_i}{R_0}
$$

This measure is then thresholded by thresholder 7
against a threshold level, and the logical result provides
an indication of the presence or absence of speech at
output 8.
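
The averaging of autocorrelation coefficients over successive frames (buffer 4a and averager 4b) might be sketched as below. The class name, the weights and the normalisation by the sum of the weights in use are assumptions; the text only calls for a weighted sum of the current and stored coefficients.

```python
from collections import deque


class AutocorrAverager:
    """Weighted average of the current and previous autocorrelation vectors."""

    def __init__(self, weights):
        # weights[0] applies to the current frame, weights[1] to the one before, etc.
        self.weights = list(weights)
        self.history = deque(maxlen=len(self.weights))  # plays the role of buffer 4a

    def update(self, R):
        """Store the new vector and return the averaged vector Ra."""
        self.history.appendleft(list(R))
        used = self.weights[: len(self.history)]  # fewer frames exist at start-up
        total = sum(used)
        return [
            sum(w * past[i] for w, past in zip(used, self.history)) / total
            for i in range(len(R))
        ]


averager = AutocorrAverager(weights=[0.5, 0.5])
ra_first = averager.update([2.0, 0.0])
ra_second = averager.update([4.0, 2.0])
```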
In order that the inverse filter coefficients Li
correspond to a fair estimate of the inverse of the noise
spectrum, it is desirable to update these coefficients
during periods of noise (and, of course, not to update
during periods of speech). It is, however, preferable
that the speech/non-speech decision on which the updating
is based does not depend upon the result of the updating,
or else a single wrongly identified frame of signal may
result in the voice activity detector subsequently going
"out of lock" and wrongly identifying following frames.
Preferably, therefore, there is provided a control signal
generating circuit 20, effectively a separate voice
activity detector, which forms an independent control
signal indicating the presence or absence of speech to
control inverse filter analyser 3 (or buffer 8) so that
the inverse filter autocorrelation coefficients Ai used
to form the measure M are only updated during "noise"
periods. The control signal generator circuit 20 includes
LPC analyser 21 (which again may be part of a speech coder
and, specifically, may be performed by analyser 3), which
produces a set of LPC coefficients Ni corresponding to
the input signal and an autocorrelator 21a (which may be
performed by autocorrelator 3a) which derives the
autocorrelation coefficients Bi of Ni. If analyser 21
is performed by analyser 3, then Ni=Li and Bi=Ai.
These autocorrelation coefficients are then supplied to
weighting and adding means 22,23 (equivalent to 5, 6)
which receive also the autocorrelation vector Ri of the
input signal from autocorrelator 4. A measure of the
spectral similarity between the input speech frame and the
preceding speech frame is thus calculated; this may be the
Itakura-Saito distortion measure between Ri of the
present frame and Bi of the preceding frame, as
disclosed above, or it may instead be derived by
calculating the Itakura-Saito distortion measure for
Ri and Bi of the present frame, and subtracting (in
subtractor 25) the corresponding measure for the previous
frame stored in buffer 24, to generate a spectral
difference signal (in either case, the measure is
preferably energy-normalised by dividing by R0). The
buffer 24 is then, of course, updated. This spectral
difference signal, when thresholded by a thresholder 26
is, as discussed above, an indicator of the presence or
absence of speech. We have found, however, that although
this measure is excellent for distinguishing noise from
unvoiced speech (a task which prior art systems are
generally incapable of) it is in general rather less able
to distinguish noise from voiced speech. Accordingly,
there is preferably further provided within circuit 20 a
voiced speech detection circuit comprising a pitch
analyser 27 (which in practice may operate as part of a
speech coder, and in particular may measure the long term
predictor lag value produced in a multipulse LPC coder).
The pitch analyser 27 produces a logic signal which is
"true" when voiced speech is detected, and this signal,
together with the thresholded measure derived from
thresholder 26 (which will generally be "true" when
unvoiced speech is present) are supplied to the inputs of
a NOR gate 28 to generate a signal which is "false" when
speech is present and "true" when noise is present. This
signal is supplied to buffer 8 (or to inverse filter
analyser 3) so that inverse filter coefficients Li are
only updated during noise periods.
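
The control path of circuit 20 might be sketched as follows, using the second variant described above (the measure for the present frame minus the stored measure for the previous frame). The class and method names are hypothetical. The sketch assumes the per-frame measure has the same energy-normalised weighted-sum form as the main measure, with Bi in place of Ai, and that the magnitude of the difference is what is compared with the threshold; the text itself only states that the difference is thresholded.

```python
from typing import Optional, Sequence


class ControlSignalGenerator:
    """Hypothetical sketch of circuit 20: spectral difference, pitch flag, NOR gate."""

    def __init__(self, diff_threshold: float) -> None:
        self.diff_threshold = diff_threshold
        self.previous: Optional[float] = None  # plays the role of buffer 24

    def is_noise(self, r: Sequence[float], b: Sequence[float], voiced: bool) -> bool:
        # Energy-normalised measure for the present frame (assumed form).
        current = b[0] + 2.0 * sum(r[i] * b[i] for i in range(1, len(b))) / r[0]
        # Subtractor 25: difference from the previous frame's measure.
        # Assumption: the first frame has no predecessor, so the difference is zero.
        diff = 0.0 if self.previous is None else abs(current - self.previous)
        self.previous = current  # buffer 24 is updated
        spectral_change = diff > self.diff_threshold  # thresholder 26
        # NOR gate 28: true only when neither input indicates speech.
        return not (voiced or spectral_change)


def update_noise_model(stored_a: Sequence[float], candidate_a: Sequence[float],
                       noise: bool) -> Sequence[float]:
    """Replace the stored noise-period coefficients only when noise is flagged."""
    return candidate_a if noise else stored_a
```

Because the flag returned by `is_noise` does not depend on the stored coefficients, a single misclassified frame cannot feed back into later decisions, which is the "out of lock" problem the separate detector is meant to avoid.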
Threshold adapter 29 is also connected to receive the
non-speech signal control output of control signal
generator circuit 20. The output of the threshold adapter
29 is supplied to thresholder 7. The threshold adapter
operates to increment or decrement the threshold in steps
which are a proportion of the instant threshold value,
until the threshold approximates the noise power level
(which may conveniently be derived from, for example,
weighting and adding circuits 22, 23). When the input
signal is very low, it may be desirable that the threshold
is automatically set to a fixed, low, level since at the
low signal levels the effect of signal quantisation
produced by ADC 2 can produce unreliable results.
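
One possible reading of the threshold adapter 29 is sketched below. The parameter names and default values (`step_fraction`, `low_input_level`, `fixed_low_threshold`) are illustrative assumptions, as are the choices to adapt only while the control signal indicates noise and to clamp each step so that the threshold does not overshoot the noise level.

```python
def adapt_threshold(threshold: float, noise_level: float, is_noise: bool,
                    input_level: float, step_fraction: float = 0.05,
                    low_input_level: float = 1e-4,
                    fixed_low_threshold: float = 1e-3) -> float:
    """Return the next threshold value for thresholder 7 (hypothetical sketch)."""
    # Very low input: quantisation effects dominate, so fall back to a fixed level.
    if input_level < low_input_level:
        return fixed_low_threshold
    # Assumption: the threshold is only adapted during frames flagged as noise.
    if not is_noise:
        return threshold
    # The step is a proportion of the instant threshold value.
    step = step_fraction * threshold
    if threshold < noise_level:
        return min(threshold + step, noise_level)
    if threshold > noise_level:
        return max(threshold - step, noise_level)
    return threshold
```

Calling this once per frame moves the threshold towards the noise power estimate in steps that shrink or grow with the threshold itself.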
There may be further provided "hangover" generating
means 30, which operates to measure the duration of
indications of speech after thresholder 7 and, when the
presence of speech has been indicated for a period in
excess of a predetermined time constant, the output is
held high for a short "hangover" period. In this way,
clipping of the middle of low-level speech bursts is
avoided, and appropriate selection of the time constant
prevents triggering of the hangover generator 30 by short
spikes of noise which are falsely indicated as speech.
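
The hangover behaviour can be illustrated with a small frame-by-frame state machine. This is a sketch only: the class name is hypothetical, and both the time constant and the hangover period are expressed as frame counts, which the text does not specify.

```python
class Hangover:
    """Hypothetical sketch of hangover generating means 30."""

    def __init__(self, min_speech_frames: int, hangover_frames: int) -> None:
        self.min_speech_frames = min_speech_frames  # the "time constant", in frames
        self.hangover_frames = hangover_frames      # length of the hangover, in frames
        self.run = 0        # consecutive frames currently indicated as speech
        self.remaining = 0  # hangover frames still to be output

    def step(self, speech: bool) -> bool:
        """Take the output of thresholder 7 for one frame, return the held output."""
        if speech:
            self.run += 1
            # Only a sufficiently long burst arms the hangover.
            if self.run > self.min_speech_frames:
                self.remaining = self.hangover_frames
            return True
        self.run = 0
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

With `min_speech_frames=2` and `hangover_frames=2`, an isolated single-frame spike passes through without extension, whereas a burst of three frames is followed by two additional frames held high.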
It will of course be appreciated that all the above
functions may be executed by a single suitably programmed
digital processing means such as a Digital Signal
Processing (DSP) chip, as part of an LPC codec thus
implemented (this is the preferred implementation), or as
a suitably programmed microcomputer or microcontroller
chip with an associated memory device.
Conveniently, as described above, the voice detection
apparatus may be implemented as part of an LPC codec.
Alternatively, where autocorrelation coefficients of the
signal or related measures (partial correlation, or
"parcor", coefficients) are transmitted to a distant
station the voice detection may take place distantly from
the codec.


Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

Title                    Date
Forecasted Issue Date    1995-03-28
(22) Filed               1989-03-10
(45) Issued              1995-03-28
Expired                  2012-03-28

Abandonment History

There is no abandonment history.

Payment History

Fee Type                             Anniversary  Due Date    Amount Paid  Date Paid
Application Fee                      -            -           $0.00        1989-03-10
Registration of Documents            -            -           $0.00        1993-04-06
Maintenance Fee - Patent - Old Act   2            1997-04-01  $100.00      1997-02-13
Maintenance Fee - Patent - Old Act   3            1998-03-30  $100.00      1998-02-13
Maintenance Fee - Patent - Old Act   4            1999-03-29  $100.00      1999-02-10
Maintenance Fee - Patent - Old Act   5            2000-03-28  $150.00      2000-02-14
Maintenance Fee - Patent - Old Act   6            2001-03-28  $150.00      2001-02-12
Maintenance Fee - Patent - Old Act   7            2002-03-28  $150.00      2002-02-13
Maintenance Fee - Patent - Old Act   8            2003-03-28  $150.00      2003-02-13
Registration of Documents            -            -           $100.00      2003-11-24
Maintenance Fee - Patent - Old Act   9            2004-03-29  $150.00      2003-12-22
Maintenance Fee - Patent - Old Act   10           2005-03-28  $250.00      2005-02-08
Maintenance Fee - Patent - Old Act   11           2006-03-28  $250.00      2006-02-07
Maintenance Fee - Patent - Old Act   12           2007-03-28  $250.00      2007-02-08
Maintenance Fee - Patent - Old Act   13           2008-03-28  $250.00      2008-02-08
Maintenance Fee - Patent - Old Act   14           2009-03-30  $250.00      2009-02-12
Maintenance Fee - Patent - Old Act   15           2010-03-29  $450.00      2010-02-18
Maintenance Fee - Patent - Old Act   16           2011-03-28  $450.00      2010-12-21
Owners on Record

Current and past owners on record are shown in alphabetical order.

Current Owners on Record
LG ELECTRONICS INC.
Past Owners on Record
BOYD, IVAN
BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY
FREEMAN, DANIEL KENNETH
Past owners that do not appear in the "Owners on Record" listing will appear in other documents on file.
Documents



Document Description          Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
PCT Correspondence            1995-01-18         1                39
Prosecution Correspondence    1993-01-07         6                209
Request for Examination       1992-09-08         2                126
Prosecution Correspondence    1989-07-24         1                30
Representative Drawing        2002-05-15         1                5
Cover Page                    1995-03-28         1                18
Abstract                      1995-03-28         1                25
Description                   1995-03-28         16               676
Claims                        1995-03-28         6                222
Drawings                      1995-03-28         3                44
Assignment                    2003-11-24         2                79
Fees                          1997-02-13         1                61