Patent Summary 2165352

(12) Patent Application:	(11) CA 2165352
(54) French Title:	METHODE DE MESURE DE CARACTERISTIQUES DE CAMOUFLAGE DE LA VOIX
(54) English Title:	METHOD FOR MEASURING SPEECH MASKING PROPERTIES
Status:	Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication

Bibliographic Data

(51) International Patent Classification (IPC):	G01R 29/00 (2006.01) G01R 23/165 (2006.01) G01R 23/18 (2006.01) G01R 29/26 (2006.01)
(72) Inventors:	SHOHAM, YAIR (United States of America) WIERZYNSKI, CASIMIR (United States of America)
(73) Owners:	AT&T IPM CORP. CASIMIR WIERZYNSKI
(71) Applicants:	AT&T IPM CORP. (United States of America) CASIMIR WIERZYNSKI (United States of America)
(74) Agent:	KIRBY EADES GALE BAKER
(74) Co-agent:
(45) Issued:
(22) Filing Date:	1995-12-15
(41) Open to Public Inspection:	1996-07-01
Examination Requested:	1995-12-15
License Available:	N/A
Dedicated to the Public:	N/A
(25) Language of Filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:

Application No.	Country/Territory	Date
367,371	(United States of America)	1994-12-30

Abstracts

English Abstract

A method measures the masking properties of subband components of a
signal and determines a noise level vector for the signal. In the preferred
embodiment, a signal is separated to yield a set of subband signal components.
Bandpass noise components are also generated. For each combination of bandpass
noise and subband signal component, the value of the noise-to-signal ratio that meets
a specified masking criterion is determined. The values from the combinations are
stored. Then, a noise level vector for any other signal can be determined by filtering
the signal into a set of components, accessing the stored values and combining the
values to yield a measure of the masking properties of the other signal.

Claims

Note: The claims are presented in the official language in which they were submitted.

Claims:
1. A method of determining the noise power spectrum that can be
masked by a signal, the method comprising the steps of:
separating said signal into a set of subband components,
identifying the noise power spectrum that can be masked by each
subband component in said set of subband components, and
combining the identified noise power spectrum masked by each subband
component to yield the noise power spectrum that can be masked by said signal.
2. The method of claim 1 wherein the step of separating comprises the
step of:
applying said signal to a filterbank comprising a set of filters wherein
the output of each filter in said set of filters is a subband component of the signal.
3. The method of claim 1 wherein the step of combining comprises the
step of:
adding the noise power spectra masked by each subband component to
yield the noise power spectrum masked by said signal.
4. The method of claim 1 wherein said signal is wideband speech.
5. A method comprising the steps of:
separating an input signal into a set of subband signal components, and
generating output signals based on the power in each subband signal
component and on a masking matrix.
6. The method of claim 5 wherein said masking matrix Q is an nxn
matrix wherein each element qi,j of said masking matrix is the ratio of the noise
power in band j that can be masked by the power of the subband signal component in
band i.
7. The method of claim 5 wherein the input signal is a speech signal.
8. The method of claim 5 wherein the step of separating comprises the
step of:
applying said input signal to a filterbank comprising a set of filters
wherein the output of each filter in said set of filters is a subband component of the
signal.
9. A method comprising the steps of:
separating a signal into a set of n subband signal components, wherein
each subband signal component is characterized by a power level,
generating a set of n subband noise components, and
for combinations of one subband signal component i, i = 1, 2, ..., n, and one
subband noise component j, j = 1, 2, ..., n, measuring the ratio of the power level of the
jth subband noise component that can be masked by the ith subband signal
component to the power level of the ith subband signal component.
10. The method of claim 9 wherein the power level of each subband
noise component that can be masked by each subband signal component is
determined according to a masking criterion.
11. The method of claim 10 wherein said masking criterion is a just-
noticeable-distortion level.
12. The method of claim 10 wherein said masking criterion is an
audible-but-not-annoying level.
13. The method of claim 9 wherein said step of separating a signal into a
set of n subband signal components comprises the step of applying said signal to a
first filterbank comprising a first set of n filters, wherein the outputs of said first set
of filters in said first filterbank are the set of n subband signal components.
14. The method of claim 13 wherein said step of generating a set of n
subband noise components comprises applying a wideband noise signal to a second
filterbank comprising a second set of filters, said second filterbank having the same
filter characteristics as said first filterbank, wherein the outputs of said second set of
filters in the second filterbank are said set of n subband noise components.
15. The method of claim 10 wherein
the measured ratio is an element qi,j of a masking matrix Q.
16. The method of claim 15 further comprising the steps of:
multiplying the masking matrix by a vector p whose elements pi are the
power in each subband component of an input signal, to yield the noise power
spectrum that can be masked by the signal.
17. A method of determining the power of a filtered noise signal that can
be masked by a filtered frame of speech, said method comprising the steps of:
delaying said filtered frame of speech by a specified time,
determining the power of said filtered frame of speech,
measuring the power of said filtered noise signal,
delaying said filtered noise signal by said specified time, and
adjusting the power of said filtered noise signal as a function of the
power of said filtered frame of speech and of a desired noise-to-signal ratio to yield
the power of the filtered noise signal that is masked by the filtered frame of speech.
18. The method of claim 17 further comprising the step of multiplying
said filtered noise signal by a gain signal so as to achieve the desired noise-to-signal
ratio.
19. The method of claim 17 wherein said specified time is a function of
the impulse response of said first filter.
20. The method of claim 17 wherein said desired noise-to-signal ratio is
determined according to a masking criterion.
21. The method of claim 17 further comprising the steps of:
generating a noise signal, said noise signal having unit variance; and
applying said noise signal to a second filter to generate said filtered
noise signal.
22. A method comprising the steps of:
applying an input speech signal to a filterbank, said filterbank
comprising a set of n filters wherein the output of each filter is a respective subband
signal component in a set of n subband signal components, and
generating output signals based on the product of a masking matrix Q
and a vector p, wherein said masking matrix Q is an nxn matrix in which each
element qi,j of said masking matrix is the ratio of power of the noise in filter j that
can be masked by the power of the subband signal component in band i and wherein said vector p is a vector of length n in which each element pi is the power of the ith
signal component.

Description

Note: The descriptions are presented in the official language in which they were submitted.

A METHOD FOR MEASURING SPEECH MASKING PROPERTIES
Technical Field
The invention relates to a method for measuring masking properties of
components of a signal and for determining a noise level vector for the signal.
Background of the Invention
Advances in digital networks such as ISDN (Integrated Services Digital
Network) have rekindled interest in the transmission of high quality image and
sound. In an age of compact discs and high-definition television, the trend toward
higher and higher fidelity has come to include the telephone as well.
Aside from pure listening pleasure, there is a need for better sounding
telephones, especially in the business world. Traditional telephony, with its limited
bandwidth of 300-3000 Hz for transmission of narrowband speech, tends to strain
listeners over the length of a telephone conversation. Wideband speech in the
50-7000 Hz range, on the other hand, offers listeners a feeling of more presence (by
reason of transmission of signals in the 50-300 Hz range) and more intelligibility (by
reason of transmission of signals in the 3000-7000 Hz range) and is more easily
tolerated over longer periods. Thus, wider bandwidth speech transmission is a
natural choice for improving the quality of telephone service.
In order to transmit speech (either wideband or narrowband) over the
telephone network, an input speech signal, which can be characterized as a
continuous function of a continuous time variable, must be converted to a digital
signal, i.e. a signal that is discrete in both time and amplitude. The conversion is a
two-step process. First, the input speech signal is sampled periodically in time (i.e. at
a particular rate) to produce a sequence of samples where the samples take on a
continuum of values. Then the values are quantized to a finite set of values,
represented by binary digits (bits), to yield the digital signal. The digital signal is
characterized by a bit rate, i.e. a specified number of bits per second that reflects
how often the input speech signal was sampled and how many bits were used to
quantize the sampled values.
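To make the relation between sampling rate, word length and bit rate concrete, the sketch below samples a test tone and quantizes it with a uniform quantizer. The 16 kHz sampling rate (enough to represent the 50-7000 Hz wideband range) and the 16-bit word length are illustrative assumptions, not values given in this description, and the function names are hypothetical.

```python
import math
from typing import List

def sample_and_quantize(freq_hz: float, fs_hz: float, n_samples: int, bits: int) -> List[int]:
    """Sample a unit-amplitude sine at fs_hz (step 1), then map each sample to
    one of 2**bits uniformly spaced levels covering [-1, 1] (step 2)."""
    levels = 2 ** bits
    step = 2.0 / levels                      # width of one quantization interval
    codes = []
    for n in range(n_samples):
        x = math.sin(2.0 * math.pi * freq_hz * n / fs_hz)   # sample: discrete in time
        code = int(math.floor((x + 1.0) / step))            # quantize: discrete in amplitude
        codes.append(min(code, levels - 1))                 # x == 1.0 goes to the top level
    return codes

def bit_rate(fs_hz: int, bits: int) -> int:
    """Bits per second = samples per second * bits per sample."""
    return fs_hz * bits

codes = sample_and_quantize(1000.0, 16000.0, 32, 16)
print(bit_rate(16000, 16))  # 256000
print(bit_rate(8000, 8))    # 64000
```

At 16 kHz and 16 bits per sample the uncoded rate is 256 kbit/s, four times the 64 kbit/s of 8 kHz, 8-bit narrowband PCM. This illustrates why wideband transmission calls for efficient coding.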
The improved quality of telephone service made possible through
transmission of wideband speech, unfortunately, typically requires higher bit rate
transmission unless the wideband signal is properly coded, i.e. such that the wideband
signal can be compressed into a representation using fewer bits without introducing
obvious distortion due to quantization errors. Recently, high fidelity coders of speech
and audio have relied on the notion that mean-squared-error
measures of distortion (e.g. measures of the energy difference between a signal and
the same signal after it is coded and decoded) do not necessarily accurately describe
the perceptual quality of a coded signal. In short, not all kinds of distortion are
equally perceptible to the human ear. See M. R. Schroeder, B. S. Atal and J. L. Hall,
"Optimizing Digital Speech Coders by Exploiting Masking Properties of the Human
Ear," J. Acoust. Soc. Am., Vol. 66, pp. 1647-1652, 1979; N. Jayant, J. Johnston and
R. Safranek, "Signal Compression Based on Models of Human Perception," Proc.
IEEE, Vol. 81, No. 10, pp. 1385-1422, October 1993; J. D. Johnston, "Transform
Coding of Audio Signals Using Perceptual Noise Criteria," IEEE J. Sel. Areas
Comm., Vol. 6, pp. 314-323, 1988. Thus, given some knowledge of how the human
auditory system tolerates different kinds of noise, it has been possible to design
coders that reduce the audibility (though not necessarily the energy) of
quantization errors. More specifically, these coders exploit a phenomenon of the
auditory system known as masking.
Masking is a term describing the phenomenon of human hearing
wherein one sound obscures or drowns out another. A common example is where
the sound of a car engine is drowned out if the volume of the car radio is high
enough. Similarly, if one is in the shower and misses a telephone call, it is because
the sound of the shower masked the sound of the telephone ring; if the shower had
not been running, the ring would have been heard.
The masking properties of a signal are typically measured as a noise-to-
signal ratio determined with respect to a masking criterion. For example, one
masking criterion is the just-noticeable-distortion (JND) level, i.e. the noise-to-signal
ratio at which the noise just becomes audible to a listener. Alternatively, another
masking criterion is the audible-but-not-annoying level, i.e. the point at which a
listener may hear the noise, but the noise level is not high enough to irritate
the listener.
Experiments in the area of psychoacoustics have focused on the masking
properties of pure tones (i.e. single frequencies) and of narrowband noise. See, e.g.,
J. P. Egan and H. W. Hake, "On the Masking Pattern of a Simple Auditory
Stimulus," J. Acoust. Soc. Am., Vol. 22, pp. 622-630, 1950; R. L. Wegel and C. E.
Lane, "The Masking of One Pure Tone by Another and its Probable Relation to the
Dynamics of the Inner Ear," Phys. Rev., Vol. 23, No. 2, pp. 266-285, 1924.
Psychoacoustic data gathered during these experiments have demonstrated that, when
a first tone is used to mask a second tone, the masking ability of the first tone is
maximized when the frequency of the first tone is near the frequency of the second
tone; that the ability of narrowband noise to mask the second tone is also
maximized when the narrowband noise is centered at a frequency near the second
tone; and that a lower frequency tone can mask a higher frequency tone more readily
than a higher frequency tone can mask a lower frequency tone.
The masking properties of more complex signals (such as wideband speech),
however, are more difficult to determine, in part because such signals are not readily
decomposed into the tones and narrowband noise whose masking properties have
been studied.
Thus, there is a need for a method to measure a priori the masking
properties of complex signals, i.e. to determine a priori the level of noise which may
be tolerated based on a selected masking criterion. Such measurements may then be
used to improve speech coding as described in our co-pending and commonly
assigned application "Method for Noise Weighting Filtering," filed concurrently
herewith and incorporated by reference.
Summary of the Invention
Central to the invention is a recognition that the masking properties of a
signal, such as wideband speech, may be determined from the masking properties of
its subband components. Accordingly, the invention provides a method for
determining the masking properties of a signal in which the signal is decomposed
into a set of subband components, as for example by a filterbank. In one
embodiment, the noise power spectrum that can be masked by each subband
component is identified, and these noise spectra are combined to yield the noise
power spectrum that can be masked by the signal. In a further embodiment, output
signals are generated based on the power in each subband signal component and on
a masking matrix. The noise power spectrum that can be masked by the input signal
is determined from the output signals.
Brief Description of the Drawings
Advantages of the present invention will become apparent from the
following detailed description taken together with the drawings in which:
FIG. 1 illustrates the inventive method for determining a noise level
vector of a speech signal.
FIG. 2A illustrates the elements $q_{i,j}$ of a masking matrix Q.
FIG. 2B illustrates the elements of a noise level vector.
FIG. 3 illustrates a system for determining the values of elements $q_{i,j}$ in
masking matrix Q in the inventive method.
FIG. 4 is a flow chart for determining the values of the elements $q_{i,j}$ in
masking matrix Q in the inventive method.
Detailed Description
FIG. 1 illustrates a flow chart of the inventive method in which, for a
frame (or segment) of an input signal, a noise level vector, i.e. the spectrum of noise
which may be added to the frame without exceeding a masking criterion, is
determined a priori. The method involves three main steps. In step 120, the input
signal frame is broken down, as for example by a filterbank, into subband
components whose masking properties are known or can be determined. In step 140
the masking properties for each component are identified or accessed, e.g. from a
database or a library, and in step 160 the masking properties are combined to
determine the noise level vector, i.e. the spectrum of noise power that can be masked
by the input signal.
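Steps 140 and 160 can be sketched as below, assuming step 120 has already produced the power in each subband and that the stored masking property of each component is a per-unit-power noise pattern across bands. This representation, the function name and the numbers are hypothetical; the text only says the properties are kept in a database or library. Combining the patterns by addition follows claim 3.

```python
from typing import Dict, List

# Hypothetical stored masking properties (step 140): library[i][j] is the
# noise power in band j masked per unit of signal power in band i.
MaskingLibrary = Dict[int, List[float]]

def noise_level_vector(band_powers: List[float], library: MaskingLibrary) -> List[float]:
    """Combine per-component masking patterns into one noise level vector.

    band_powers[i] is the power of subband component i (output of step 120).
    """
    n = len(band_powers)
    nlv = [0.0] * n
    for i, p_i in enumerate(band_powers):
        pattern = library[i]                  # step 140: access stored properties
        for j in range(n):
            nlv[j] += pattern[j] * p_i        # step 160: superpose by addition
    return nlv

# Placeholder values, not measurements.
library = {
    0: [0.10, 0.02, 0.00],
    1: [0.03, 0.10, 0.02],
    2: [0.00, 0.03, 0.10],
}
print(noise_level_vector([10.0, 2.0, 0.5], library))  # approximately [1.06, 0.415, 0.09]
```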
Note that the method represents the frame of the input signal as a sum of
subband components, each of whose masking properties has already been measured.
However, in order to determine the noise level vector of an input speech signal, the
masking properties of the components required in step 140 must first be determined.
Once the library of component masking properties is determined, and advantageously
stored in a database, the masking components can always be accessed, and optionally
adapted, to determine the noise level vector of any input signal.
The inventive method of FIG. 1 recognizes that the masking property of
a speech signal, i.e. the spectrum of noise that the speech signal can mask, can be
based on the masking properties of components of the speech. For example, in order
to determine the masking properties of speech, a segment or frame of a first speech
input signal is split into subband components, as for example by using a filterbank
comprising a plurality of subband (bandpass) filters. In a first embodiment, to
determine the spectrum of noise that can be masked by the first speech input signal,
the spectrum of noise that can be masked by each subband component of the speech
input signal is determined, and then the spectra for all subband components are
combined to find the noise level vector for the first speech input signal.
In another embodiment, for each subband component a measurement is
taken to determine how much narrowband noise in each subband can be masked.
Thus, the measurement could be summarized as a method consisting of two nested
loops:
for every subband of speech i:
    for every subband of white noise j:
        adjust the noise in subband j until just enough noise has been added that the masking criterion is met;
        measure the noise-to-signal ratio at this point;
    repeat for the next subband j;
repeat for the next subband i.
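The nested loops can be sketched as follows. The listening test is represented by a hypothetical callback `is_masked(i, j, nsr_db)` standing in for the listener's judgment, and stepping the NSR upward along an ascending grid is only one simple way to locate the point where the criterion is just met; the text does not prescribe a search strategy. The simulated listener is purely illustrative.

```python
from typing import Callable, Dict, List, Tuple

def measure_q(n_bands: int,
              is_masked: Callable[[int, int, float], bool],
              nsr_grid_db: List[float]) -> Dict[Tuple[int, int], float]:
    """Return q[(i, j)] in dB: the largest tested NSR at which noise band j
    is still masked by speech band i. Pairs never masked are omitted."""
    q: Dict[Tuple[int, int], float] = {}
    for i in range(n_bands):                 # for every subband of speech i
        for j in range(n_bands):             #   for every subband of noise j
            best = None
            for nsr_db in nsr_grid_db:       #     add noise in ascending steps
                if is_masked(i, j, nsr_db):
                    best = nsr_db            #     criterion still met here
                else:
                    break                    #     noise now exceeds the criterion
            if best is not None:
                q[(i, j)] = best             #     NSR at which the criterion is just met
    return q

# Illustrative stand-in for a listener: masking falls off by 8 dB per band
# of separation from a -10 dB peak.
def simulated_listener(i: int, j: int, nsr_db: float) -> bool:
    return nsr_db <= -10.0 - 8.0 * abs(i - j)

grid = [float(x) for x in range(-40, 1, 2)]   # -40, -38, ..., 0 dB
q = measure_q(5, simulated_listener, grid)
```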
The noise-to-signal measurements for each combination of i and j, $q_{i,j}$, represent the
ratio of noise power in band j that can be masked by the first speech input signal in
band i. The elements $q_{i,j}$ form a matrix Q. An example of such a Q matrix is
illustrated in FIG. 2A where, for convenience, the entries have been converted to
decibels. The Q matrix of FIG. 2A illustrates the results of an experiment in which
narrowband speech masked narrowband noise. The row numbers correspond to
noise bands; the column numbers correspond to speech bands. Each element $q_{i,j}$
represents the maximum power ratio that can be maintained between noise in band j
and the first speech input signal in band i so that the noise is masked. Note that not
all $q_{i,j}$ have an associated value, i.e. some entries in the Q matrix are blank, because,
as explained below, it typically is not necessary to determine every value in the Q
matrix in order to determine the noise level vector. As also explained below, the
subbands in the Q matrix are not uniform in bandwidth. Instead, the bandwidth of
each subband increases with frequency. For example, as shown in Table 2 below,
subband 1 covers a frequency range of 80 Hz, from 0 to 80 Hz, while subband 20
covers a frequency range of 770 Hz, from 6230 Hz to 7000 Hz. If the power in each
subband of the input frame of the first speech signal is represented as a column
vector, $\mathbf{p} = [p_1, p_2, \ldots, p_n]^T$, the noise level vector $\mathbf{d}_{NLV}$ may be found from the Q
matrix and the p vector:
$$\mathbf{d}_{NLV} = Q\,\mathbf{p},$$
i.e. the noise level vector is also a column vector, obtained by multiplying the $n \times n$ Q
matrix by the length-n column vector of the power in each subband of the input frame
of speech, as shown in FIG. 2B.
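The product $\mathbf{d}_{NLV} = Q\mathbf{p}$ is an ordinary matrix-vector multiplication. The sketch below follows the FIG. 2A layout described above (rows are noise bands, columns are speech bands, entries stored in dB, blank entries meaning no measured masking, treated here as zero). The 3-band matrix and powers are made-up placeholders, not the FIG. 2A data, and the function names are hypothetical. The same Q is reused for a second signal, as the next paragraph describes.

```python
from typing import List, Optional

def q_from_db(q_db: List[List[Optional[float]]]) -> List[List[float]]:
    """Convert dB entries to power ratios; blank (None) entries become 0."""
    return [[0.0 if v is None else 10.0 ** (v / 10.0) for v in row] for row in q_db]

def apply_masking_matrix(q: List[List[float]], p: List[float]) -> List[float]:
    """d_NLV = Q p. Row j of q is a noise band, column i a speech band."""
    if any(len(row) != len(p) for row in q):
        raise ValueError("Q must have one column per subband power in p")
    return [sum(row[i] * p[i] for i in range(len(p))) for row in q]

# Placeholder 3-band matrix in dB (not the FIG. 2A data).
q_db = [
    [-10.0, -20.0, None],
    [-20.0, -10.0, -20.0],
    [None, -20.0, -10.0],
]
Q = q_from_db(q_db)
d_nlv = apply_masking_matrix(Q, [1.0, 10.0, 100.0])     # first signal
d_nlv2 = apply_masking_matrix(Q, [50.0, 5.0, 0.5])      # same Q, second signal
```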
In either embodiment, once either the spectrum of noise masked by each
subband component or the elements in the Q matrix have been determined for a
given input signal, they can be used to determine the spectrum of noise that can be
masked not only by the given input signal but also by other input signals. For
example, if the power in each subband of a second input signal is
$\mathbf{p}_2 = [p_1, p_2, \ldots, p_n]_2^T$, then $\mathbf{d}_{NLV2} = Q\,\mathbf{p}_2$, with Q as determined by the first input signal.
Note that each $q_{i,j}$ is a power ratio determined for a particular masking
criterion. This definition makes sense for stationary stimuli (i.e. signals whose
statistical properties are invariant to time translation), but in the case of dynamic
stimuli, such as speech, care must be taken in adding noise power to a signal whose
level varies rapidly. In this instance, the problem is advantageously avoided by
arranging for the noise power level to vary with the speech power level so that,
within a given segment or frame, the ratio of speech to noise power is a
predetermined constant. In other words, the level of the added noise is dynamically
adjusted in order to achieve a constant signal-to-noise ratio (SNR) throughout the
frame. Measuring the amount of masking between one subband component of
speech and another subband of noise therefore consists of listening to an ensemble of
frames of bandpassed speech with a range of segmental SNRs to determine which
SNR value meets the masking criterion. Different frame sizes may advantageously
be used for different subbands as described below.
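A minimal offline sketch of this dynamic adjustment, assuming the speech and noise are already bandpass-filtered lists of samples and the frame length is fixed (the text uses different frame sizes per band). It ignores filter delay and simply rescales each noise frame after filtering, so it is a simplification rather than the real-time system of FIG. 3; the function name is hypothetical.

```python
import math
from typing import List, Sequence

def add_noise_at_constant_nsr(speech: Sequence[float], noise: Sequence[float],
                              frame_len: int, nsr_db: float) -> List[float]:
    """Return speech + noise with the noise rescaled in every frame so that
    (noise energy) / (speech energy) equals nsr_db within that frame."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    target = 10.0 ** (nsr_db / 10.0)
    out: List[float] = []
    for start in range(0, len(speech), frame_len):
        s = speech[start:start + frame_len]
        d = noise[start:start + len(s)]
        e_s = sum(x * x for x in s)
        e_d = sum(x * x for x in d)
        # Gain that turns this frame's noise energy into target * speech energy.
        gain = math.sqrt(target * e_s / e_d) if e_d > 0.0 else 0.0
        out.extend(x + gain * y for x, y in zip(s, d))
    return out
```

Because the gain is recomputed per frame, a loud frame followed by a quiet one receives proportionally less noise, which keeps the local NSR fixed.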
In the paragraphs that follow, a more rigorous presentation is given of
the method described above. A method for determining the masking properties of
the component signals required for step 140 is presented first, and then a method of
combining the component masking properties in step 160 is presented.
The presentation concludes with a short discussion of other potential uses for the
inventive method.
The more rigorous presentation begins by assuming that an input speech
signal s(n) is divided via a bank of filters into N subbands $s_1(n), \ldots, s_N(n)$, and that
the noise maskee d(n) is similarly split into subband components $d_1(n), \ldots, d_N(n)$.
For each pair of subbands (i, j), measure the maximum segmental noise-to-signal
ratio (NSR) between $d_j(n)$ and $s_i(n)$ such that the combination $d_j(n) + s_i(n)$
meets a given masking threshold, e.g. such that $d_j(n) + s_i(n)$ is aurally
indistinguishable from $s_i(n)$ alone (i.e. meets the just-noticeable-distortion level).
Define the NSR to be the reciprocal of the traditional SNR, i.e.
$$\mathrm{NSR} = q_{ij} = \frac{\sum_n d_j^2(n)}{\sum_n s_i^2(n)},$$
where the summation limits span the current frame of speech.
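The segmental NSR is simply a ratio of frame energies, noise over speech, with the summations running over the current frame. A direct transcription, assuming both band signals are given as equal-length lists covering the same frame (the helper names are hypothetical):

```python
import math
from typing import Sequence

def segmental_nsr(d_j: Sequence[float], s_i: Sequence[float]) -> float:
    """q_ij = sum of d_j(n)**2 divided by sum of s_i(n)**2 over one frame."""
    if len(d_j) != len(s_i):
        raise ValueError("noise and speech frames must have equal length")
    signal_energy = sum(x * x for x in s_i)
    if signal_energy == 0.0:
        raise ValueError("speech frame has zero energy; NSR undefined")
    return sum(x * x for x in d_j) / signal_energy

def to_db(ratio: float) -> float:
    """Express a power ratio in decibels (the SNR in dB is its negative)."""
    return 10.0 * math.log10(ratio)

q = segmental_nsr([0.1, 0.1, -0.1, -0.1], [1.0, -1.0, 1.0, -1.0])
print(round(to_db(q), 6))  # -20.0
```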
To split the speech and noise into subbands, a non-uniform, quasi-critical
band filterbank is designed. The term quasi-critical is used in recognition that the
human cochlea may be represented as a collection of bandpass filters, where the
bandwidth of each bandpass filter is termed a critical band. See H. Fletcher,
"Auditory Patterns," Rev. Mod. Phys., Vol. 12, pp. 47-65, 1940. Thus, the
characteristics and parameters of the filters in the filterbank may incorporate
knowledge from auditory experiments as, for example, in determining the bandwidth
of the filters in the filterbank. Note that it is advantageous that the filterbank used to
produce the library of masking properties of components be the same as the
filterbank used in step 120 of FIG. 1. However, some constraints on the filterbank
may advantageously be imposed to make measurements obtained with one set of
filterbank subbands more readily applicable to filterbanks with other subbands. In
particular:
Each filter should be as rectangular as possible, although significant passband ripple
may be accepted in exchange for greater stopband attenuation.
Overlap between adjacent filters should be minimized. In this respect the filterbank is
not completely faithful to the human ear, since experimentally measured cochlear
filter responses are not rectangular and tend to overlap a great deal. These conditions
are imposed nonetheless because the ultimate interest is in the problem of coding, and
splitting an input signal into (nearly) orthogonal subbands prevents coding the same
information twice.
The composite response of the filters should have a nearly flat frequency response.
Although perfect reconstruction is not required, the combined output should
advantageously be perceptually indistinguishable from the input. This quality of the
filterbank may be verified by listening tests.
To avoid audible distortions due to different group delays, linear phase filters may be
used, although because of the asymmetry of forward and backward masking it would
be preferable to use minimum phase filters. This last point is illustrated by considering
the case when the speech signal consists of a single spike. The combined output of a
linear-phase filterbank would consist of the same spike delayed by half of the filter
length, but the combined filtered noise would be dispersed equally before and after
the spike. Since forward masking extends much farther in time than backward
masking, it would be preferable if more noise came after the spike instead of before;
this might be achieved with a more complicated minimum-phase filter design.
In order to model the constant-Q, critical band nature of the cochlea, the
following constraints may also advantageously be imposed: N = 20 total subbands,
corresponding roughly to the number of critical bands between 0 and 7 kHz as found
in prior experimental work, and bandwidths that form an increasing geometric series.
Assume that the first band spans the frequencies [0, a] and call b the ratio between
successive bandwidths; then these last two conditions may be summarized as
$$f_{20} = a\,\frac{b^{20} - 1}{b - 1},$$
where $f_{20}$ is the highest frequency to be included, typically 7 kHz in the speech case.
Setting a = 100 Hz, corresponding to previous measurements of the first critical band,
the equation is solved for b using Newton's iterative method. This value of b is then
used to generate an ideal set of band edges as shown in Table 1.
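The sketch below solves for b numerically with Newton's method, writing the geometric series as the polynomial $a(1 + b + \dots + b^{19}) = f_{20}$, which is the same equation but avoids dividing by b - 1 near b = 1. It then accumulates the ideal upper band edges. The starting point, tolerance and function names are illustrative assumptions. The edges it returns are the ideal edges before any adjustment made during filter design.

```python
from typing import List

def solve_ratio(a: float, f_top: float, n_bands: int = 20,
                b0: float = 1.2, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve a * (1 + b + ... + b**(n_bands - 1)) = f_top for b by Newton's method."""
    target = f_top / a
    b = b0
    for _ in range(max_iter):
        f = sum(b ** k for k in range(n_bands)) - target          # residual
        df = sum(k * b ** (k - 1) for k in range(1, n_bands))     # derivative
        step = f / df
        b -= step
        if abs(step) < tol:
            return b
    raise RuntimeError("Newton iteration did not converge")

def ideal_band_edges(a: float, b: float, n_bands: int = 20) -> List[float]:
    """Upper edges of bands with widths a, a*b, a*b**2, ... starting at 0 Hz."""
    edges: List[float] = []
    width, edge = a, 0.0
    for _ in range(n_bands):
        edge += width
        edges.append(edge)
        width *= b
    return edges

b = solve_ratio(100.0, 7000.0)
edges = ideal_band_edges(100.0, b)
```

Starting above the root keeps the iteration monotone here, because the polynomial is increasing and convex for b > 0. With a = 100 Hz and $f_{20}$ = 7 kHz, b comes out between 1.11 and 1.12.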
Using these ideal band edges as a starting point, filters may be designed.
In one embodiment of the invention, twenty 512-point, min-max optimal filters were
designed using the well-known Remez exchange algorithm. Table 2 lists the
parameters for each filter. It may be necessary to adjust the band edges so that the
composite filterbank response is flatter; in any case, the filterbank's combined
output should sound identical to the input.
Since the human cochlea exhibits increasing time resolution at higher
frequencies, the frame size for each band is advantageously chosen according to the
length of the impulse response of the band filter. For higher bands, the energy of the
impulse response becomes more concentrated in time, leading to a choice of a
smaller frame size. Table 3 shows the relationship between the noise band number
and frame size.
Despite the well-known dependence of masking on stimulus level, no
precise restrictions on loudness during the experiments typically need be imposed. It
is usually sufficient to measure masking effects under the normal operating
conditions of an actual speech coder. Thus the volume control may be set to a
comfortable level for listening to the full-bandwidth speech and left in the same
position when listening to the constituent subbands, which as a result sound much
softer than the full speech signal. Listening tests are advantageously carried out
in a soundproof booth using headphones, with the same signal presented to both
ears.
As mentioned above, the level of the noise should be adjusted on a
frame-by-frame basis in order to maintain a constant local NSR, $q_{ij}$. FIG. 3 is a
block diagram of a system to achieve this for each frame of speech. FIG. 4 is a
flowchart illustrating steps carried out by the system of FIG. 3. The operation of the
system of FIG. 3 is best described step by step:
Generate a frame of unit variance noise: Unit variance Gaussian random noise
generator 305 is used to produce u(n) in step 405, which is then scaled according to
$$u(n) \leftarrow \frac{u(n)}{\sqrt{\frac{1}{N}\sum_{k=mN}^{mN+N-1} u^2(k)}},$$
where N is the frame size and m is the number of the current frame, starting from
m = 0. This ensures noise with unit variance on a frame-by-frame basis.
Filter speech: Input the current frame of speech in step 410. In step 415 the speech is
filtered through filter j 315 of the filterbank to produce $s_j(n)$.
Measure energy of bandpass speech: The output of filter 315 is then passed through
delay 317. The delay allows the system of FIG. 3 to "look ahead" in order to maintain
a constant local NSR, as described below. To compute how much noise to inject in
this frame, in step 420 calculate the energy $p_j$ of the speech using energy
measurer 320:
$$p_j = \sum_{k=mN}^{mN+N-1} s_j^2(k-L),$$
where L is the amount of delay, as explained in more detail below.
Measure look-ahead energy of bandpass speech: Because of the inherent delay
imposed by the filterbank, adjustments to the noise level at the filter input are not
immediately registered at the output. Therefore some measure of the speech power
in the near future is needed to help decide how to adjust the noise level in the present.
The look-ahead energy $\hat{p}_j$ is defined as the energy of one frame of the undelayed
$s_j(n)$:
$$\hat{p}_j = \sum_{k=mN}^{mN+N-1} s_j^2(k).$$
Typically L = 320 samples yields the best results for 512-point filters. Note that this
problem would be easier to solve if the filters were minimum-phase rather than linear
phase.
Compute desired noise power: In step 430, multiply the speech power by the desired
noise-to-signal ratio $q_{ij}$ in adaptive controller 330 to yield a desired noise power $\delta_i$:
$$\delta_i = q_{ij}\,p_j.$$
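The quantities computed so far for one frame can be sketched as plain functions. Assumptions: the bandpass speech $s_j(n)$ is a Python list, samples whose index falls outside it count as zero, the delayed-frame energy uses index k - L while the look-ahead energy uses the undelayed index k, and both desired powers are $q_{ij}$ times the corresponding speech energy. The noise normalization divides by the frame RMS, which equals the sample standard deviation for zero-mean noise. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def normalize_frame(u: Sequence[float]) -> List[float]:
    """Rescale one noise frame to unit power by dividing by its RMS."""
    rms = math.sqrt(sum(x * x for x in u) / len(u))
    return [x / rms for x in u]

def frame_energy(x: Sequence[float], m: int, N: int, delay: int = 0) -> float:
    """Sum of x(k - delay)**2 for k = mN, ..., mN + N - 1.
    Samples whose index falls outside x are treated as zero."""
    total = 0.0
    for k in range(m * N, m * N + N):
        idx = k - delay
        if 0 <= idx < len(x):
            total += x[idx] * x[idx]
    return total

def desired_noise_powers(s_j: Sequence[float], m: int, N: int, L: int,
                         q_ij: float) -> Tuple[float, float]:
    """Return (delta_i, delta_hat_i) for frame m."""
    p_j = frame_energy(s_j, m, N, delay=L)   # frame aligned with the delayed output
    p_hat_j = frame_energy(s_j, m, N)        # look-ahead: same frame, no delay
    return q_ij * p_j, q_ij * p_hat_j
```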
Estimate required bandpass noise power: To approximate the desired noise power at
the filter output, note that for a filter of bandwidth $\omega_i$ Hz, the filtered
unit-variance noise should have a variance of $\omega_i/S$, where S is the Nyquist
frequency. Linearity may therefore be exploited to try to achieve the desired noise
power $\delta_i$ at the filter output. Because of the filter delays described above, instead of
using the speech power in the current frame to compute $\delta_i$, a look-ahead desired
noise energy $\hat{\delta}_i$ is defined:
$$\hat{\delta}_i = q_{ij}\,\hat{p}_j.$$
Then the noise is scaled in pre-adjuster 340 in order to try to achieve the look-ahead
energy:
$$e(n) = u(n)\sqrt{\frac{S\,\hat{\delta}_i}{N\,\omega_i}},$$
so that the filtered noise has a per-sample variance of about $\hat{\delta}_i/N$ and hence an
energy of about $\hat{\delta}_i$ over the N-sample frame.
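Under the linearity argument above, if filtered unit-variance noise has a per-sample variance of $\omega_i/S$, then scaling u(n) by c gives an N-sample filtered frame energy of about $c^2 N \omega_i / S$. Setting this equal to the look-ahead target $\hat{\delta}_i$ determines c. The sketch assumes an ideal rectangular filter, so it only approximates a real 512-point filter; that is why the later fine-tuning step is needed. The function name is hypothetical.

```python
import math
from typing import List, Sequence

def prescale_noise(u: Sequence[float], delta_hat: float,
                   bandwidth_hz: float, nyquist_hz: float) -> List[float]:
    """Scale unit-variance noise so that, after an (assumed ideal) bandpass
    filter of the given bandwidth, its N-sample frame energy is about delta_hat."""
    N = len(u)
    # Filtered unit-variance noise keeps a fraction bandwidth/nyquist of its
    # power, so one frame of it carries about N * bandwidth / nyquist energy.
    energy_per_unit_gain = N * bandwidth_hz / nyquist_hz
    c = math.sqrt(delta_hat / energy_per_unit_gain)
    return [c * x for x in u]

e = prescale_noise([1.0, -1.0] * 80, 2.0, 400.0, 8000.0)   # c = sqrt(2 / 8) = 0.5
```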
Filter the adjusted noise: The adjusted noise e(n) is filtered through band i using
filter 350 to yield $e_i(n)$, and is then applied to delay 355 so that the noise is again
synchronous with the input frame of speech.
Measure the energy of the bandpass noise: Next, measure the actual bandpass noise
power $d_i$ in measurer 360:
$$d_i = \sum_{k=mN}^{mN+N-1} e_i^2(k-L).$$
Fine-tune the noise: To adjust the noise so that the desired NSR is achieved exactly,
apply at multiplier 380 a time-varying gain $g_i$ at the filter output. To minimize
smearing in the noise spectrum, it is advantageous to vary $g_i$ smoothly so that it
takes the form
$$g_i(n-L) = \begin{cases} \tfrac{1}{2}\left[B\left(1-\cos\frac{\pi r}{W}\right) + A\left(1+\cos\frac{\pi r}{W}\right)\right], & 0 \le r \le W-1, \\ B, & W \le r \le N-1, \end{cases}$$
where r = n - mN is the position of sample n within the current frame, A is the final
value of $g_i$ from the previous frame, W is the length of the smoothing window (which
can be thought of as half of a Hann window), and B is the final value of $g_i$ in the
current frame. Thus, given A and W, one should be able to solve for B such that
$$\sum_{k=mN}^{mN+N-1} \left[e_i(k-L)\,g_i(k-L)\right]^2 = \delta_i.$$
Because $g_i$ is linear in B, the above expression becomes a quadratic equation of the
form
$$a_2 B^2 + a_1 B + a_0 = 0,$$
where
$$a_2 = \frac{1}{4}\sum_{k=mN}^{mN+W-1}\left(1-\cos\frac{\pi(k-mN)}{W}\right)^2 e_i^2(k-L) + \sum_{k=mN+W}^{mN+N-1} e_i^2(k-L),$$
$$a_1 = \frac{A}{2}\sum_{k=mN}^{mN+W-1}\left(1-\cos^2\frac{\pi(k-mN)}{W}\right) e_i^2(k-L),$$
$$a_0 = \frac{A^2}{4}\sum_{k=mN}^{mN+W-1}\left(1+\cos\frac{\pi(k-mN)}{W}\right)^2 e_i^2(k-L) - \delta_i.$$
Thus a compromise is forced between a smooth transition using a long window and
a crisp change to the desired noise level using a short window. Making the window
too short smears the spectrum of the bandpass noise, an effect that typically is quite
noticeable and leads to severe underestimates of masking power. Making the window
too long, however, leads to more subtle clicks that emerge when the noise level lags
behind the speech. Thus, an initial value of W = N/2 was chosen.
The quadratic equation for $B$ usually has two real solutions; typically the solution that minimizes $|A - B|$ was chosen in order to avoid drastic changes in gain and to reduce spectral smearing. Sometimes, however, there is no real solution. This may occur at transitions from loud to soft frames, when reducing the gain gradually has the effect of including more noise at the beginning of the frame than is wanted in the entire frame. In these cases $W$ may be decremented until the longest window that allows an exact solution is found. In rare cases this search can lead to $W = 0$, but only during very soft passages when both speech and noise are below the threshold of hearing. In the $W = 0$ case, $g_i$ has the form
$$g_i(n-L) = \sqrt{\frac{\delta}{\sum_{k=mN}^{mN+N-1} e_i^2(k-L)}}.$$
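The root selection and window search described above can be sketched as follows. This is an illustrative Python version, not the patent's code: it repeats the coefficient expansion so that it runs on its own, indexes the frame from zero, picks the real root closest to $A$, and shortens $W$ one sample at a time until a real root exists. With $W = 0$ the coefficients become $a_2 = \sum e_i^2$, $a_1 = 0$, $a_0 = -\delta$, so the search ends at the closed-form gain just given. The names are hypothetical.

```python
import math


def quadratic_coeffs(e, A, W, target):
    # Same expansion as a2, a1, a0 above, with the frame indexed n = 0..N-1.
    a2 = a1 = a0 = 0.0
    for n, x in enumerate(e):
        x2 = x * x
        if n < W:
            c = math.cos(math.pi * n / W)
            a2 += 0.25 * (1 - c) ** 2 * x2
            a1 += 0.5 * A * (1 - c * c) * x2
            a0 += 0.25 * A * A * (1 + c) ** 2 * x2
        else:
            a2 += x2
    return a2, a1, a0 - target


def solve_final_gain(e, A, target, W_init):
    """Return (B, W): the final gain and the longest window <= W_init with a real root."""
    for W in range(W_init, -1, -1):
        a2, a1, a0 = quadratic_coeffs(e, A, W, target)
        if a2 <= 0.0:
            continue  # degenerate quadratic: no usable noise energy for this W
        disc = a1 * a1 - 4.0 * a2 * a0
        if disc < 0.0:
            continue  # no real root: try a shorter window
        r = math.sqrt(disc)
        roots = ((-a1 + r) / (2.0 * a2), (-a1 - r) / (2.0 * a2))
        # Prefer the root nearest the previous frame's final gain A.
        return min(roots, key=lambda b: abs(A - b)), W
    raise ValueError("no exact solution: silent frame or negative target")


# Loud-to-soft transition: no window W >= 1 admits a real root, so W drops to 0
# and B = sqrt(0.5 / 8) = 0.25, the closed-form W = 0 gain.
print(solve_final_gain([1.0] * 8, A=3.0, target=0.5, W_init=4))  # (0.25, 0)
```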
Since there are 20 subbands, potentially 400 combinations of $i$ and $j$ need to be measured. However, it is typically not necessary to carry out the experiment for every $(i, j)$ combination, because masking depends on how close in frequency the signal component and the masker are. Measurements are therefore typically taken only for combinations of $i$ and $j$ such that $|i - j| \le 2$. Values of $q_{ij}$ for $|i - j| > 2$ can typically be assumed to be zero, i.e. no masking takes place, with the possible exception of small values of $i$ and $j$, where masking may sometimes extend over 3 bands.
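The saving is easy to quantify. A small sketch, assuming the cut-off $|i - j| \le 2$ applies uniformly (it ignores the possible 3-band extension for low bands); `measured_pairs` is a hypothetical name.

```python
def measured_pairs(num_bands=20, max_offset=2):
    """(i, j) band pairs with |i - j| <= max_offset; all other q_ij are taken as zero."""
    return [
        (i, j)
        for i in range(1, num_bands + 1)
        for j in range(1, num_bands + 1)
        if abs(i - j) <= max_offset
    ]


print(len(measured_pairs(max_offset=19)))  # 400: every pair, 20 * 20
print(len(measured_pairs()))               # 94: 20 + 2 * 19 + 2 * 18
```

Under this assumption fewer than a quarter of the 400 experiments are needed.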

Recall that a noise level vector for a speech signal, i.e. the spectrum of noise masked by the input signal, may be calculated according to a three-step process. It has already been shown that speech might best be analyzed in terms of its constituent critical bands, and how the masking properties of each band can be determined. The third step of the process, namely superposing the masking properties of the subbands to form a noise level vector, is now discussed.
Given a vector of speech powers $p = (p_1, \ldots, p_{20})^T$, where $p_i$ corresponds to the power of the speech in band $i$ in the current frame, a noise level vector $d = (d_1, \ldots, d_{20})^T$ can be determined such that noise added at these levels or below does not exceed the masking threshold.
This calculation requires knowledge of how to add the masking effects of two or more maskers. The effects are combined by simple addition, or, more formally:

Linear superposition of noise power: If a signal $S$ masks a noise power vector $d = (d_1, \ldots, d_{20})^T$, where $d_j$ is the power of the noise in band $j$ in the current frame and "$T$" indicates the transpose, and another signal $S'$, uncorrelated with $S$, masks a noise power vector $d' = (d'_1, \ldots, d'_{20})^T$, then the combined signal $S + S'$ will mask the noise power vector

$$d + d' = (d_1 + d'_1, \ldots, d_{20} + d'_{20})^T.$$
Simple addition is advantageously used instead of non-linear superposition rules because it typically leads to more conservative estimates of the masking properties of the signal.
Note generally that the superposition idea assumes that consecutive bands in the filterbank do not overlap, so that the noise level in one band can be adjusted without affecting the level in another, and so that the speech may be decomposed into uncorrelated subbands. High-order, nearly rectangular filters were therefore used in the filterbank.
Accordingly, the total spectrum of the noise level vector, $d_{NLV}$, can be found in a given frame if the masking property $d_i$ of every speech band $i = 1, \ldots, 20$ is known. This involves a simple sum of noise powers:

$$d_{NLV} = \sum_{i=1}^{20} d_i. \tag{4.2}$$
To find the masked noise vector $d_i$ for speech band $i$, use the measured threshold NSRs $q_{ij}$. Since the speech power $p_i$ and the threshold noise-to-signal ratios $q_{ij}$ (the largest ratio of noise power to speech power that remains masked) are known, the maximum masked noise power in bands 1-20 can be computed from the NSRs $q_{i1}, \ldots, q_{i20}$ of speech band $i$:
$$d_i = \left[p_i q_{i1},\; p_i q_{i2},\; \ldots,\; p_i q_{i20}\right]^T \tag{4.3}$$
In other words, the threshold noise power in each band is equal to the product of the signal power and the threshold noise-to-signal ratio.
Combining equations 4.2 and 4.3 summarizes the method as one matrix equation:

$$d_{NLV} = Q\,p, \tag{4.4}$$

where $Q = [q_{ij}]^T$, so that column $i$ of $Q$ holds the threshold NSRs of speech band $i$. (Whenever $q_{ij}$ has not been measured, zero masking is assumed: $q_{ij} = 0$.) Equation 4.4 thus describes how the noise level vector for a given frame of speech can be determined from the input power in the speech frame and the masking properties of speech, as represented by the masking matrix $Q$.
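Equations 4.2 to 4.4 can be checked against each other in a few lines. The sketch below uses a made-up three-band masking table (the numbers are illustrative, not measured values) and computes the noise level vector both by summing the per-band vectors $d_i$ under linear superposition and by a matrix product whose column $i$ holds $q_{i1}, \ldots, q_{in}$. Unmeasured entries are simply zero; the function names are hypothetical.

```python
def masked_noise_for_band(p_i, q_row):
    """Equation 4.3: noise power masked in each band j by speech band i alone."""
    return [p_i * q_ij for q_ij in q_row]


def nlv_by_superposition(p, q):
    """Equation 4.2: add the vectors d_i from every speech band (linear superposition)."""
    d_nlv = [0.0] * len(p)
    for i, p_i in enumerate(p):
        for j, d_ij in enumerate(masked_noise_for_band(p_i, q[i])):
            d_nlv[j] += d_ij
    return d_nlv


def nlv_by_matrix(p, q):
    """Equation 4.4: d_NLV = Q p, with Q[j][i] = q[i][j]."""
    n = len(p)
    Q = [[q[i][j] for i in range(n)] for j in range(n)]  # column i of Q is row i of q
    return [sum(Q[j][i] * p[i] for i in range(n)) for j in range(n)]


# Illustrative values only: q[i][j] is the threshold NSR for speech band i, noise band j.
p = [1.0, 2.0, 4.0]
q = [[0.5, 0.1, 0.0],
     [0.2, 0.5, 0.1],
     [0.0, 0.2, 0.5]]
print(nlv_by_superposition(p, q))
print(nlv_by_matrix(p, q))
```

Both routes produce the same vector, approximately [0.9, 1.9, 2.2], which is the point of equation 4.4: once $Q$ is stored, one matrix-vector product per frame replaces the band-by-band superposition.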
The above method is flexible in that new knowledge about masking effects in the human auditory system may be readily incorporated. The linear superposition rule, for example, can easily be replaced by a more complex function based on future auditory experiments. Moreover, the values in the Q matrix need not be fixed. Each element in the matrix could be adaptive, e.g. a function of loudness, since masking properties have been shown to change at high volume levels. It would also be easy to use different Q matrices depending on whether the current frame of speech consists of voiced or unvoiced speech.
This disclosure describes a method for measuring the masking properties of components of speech signals and for determining the masking threshold of the speech signals. The method disclosed herein has been described without reference to specific hardware or software. Instead, the method has been described in such a manner that those skilled in the art can readily adapt such hardware or software as may be available or preferable.
While the above teaching of the present invention has been in terms of determining the masking properties of speech signals, those skilled in the art of digital signal processing will recognize the applicability of these teachings to other specific contexts. Thus, for example, the masking properties of music, other audio signals, images and other signals may be determined using the present invention.

Band number   Lower edge (Hz)   Upper edge (Hz)
1             0                 100
2             100               212
3             212               337
4             337               476
5             476               632
6             632               806
7             806               1001
8             1001              1219
9             1219              1462
10            1462              1734
11            1734              2038
12            2038              2377
13            2377              2756
14            2756              3180
15            3180              3654
16            3654              4183
17            4183              4775
18            4775              5436
19            5436              6174
20            6174              7000
TABLE 1
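The non-overlap assumption discussed above can be checked directly against the band edges of Table 1. A short Python sketch (the names are illustrative) that confirms each upper edge equals the next band's lower edge and computes the band widths:

```python
# Band edges from Table 1, in Hz: (lower edge, upper edge) for bands 1 to 20.
TABLE_1 = [
    (0, 100), (100, 212), (212, 337), (337, 476), (476, 632),
    (632, 806), (806, 1001), (1001, 1219), (1219, 1462), (1462, 1734),
    (1734, 2038), (2038, 2377), (2377, 2756), (2756, 3180), (3180, 3654),
    (3654, 4183), (4183, 4775), (4775, 5436), (5436, 6174), (6174, 7000),
]

# Width of each band in Hz.
bandwidths = [hi - lo for lo, hi in TABLE_1]

# Adjacent bands share an edge: no gaps and no overlap.
contiguous = all(TABLE_1[k][1] == TABLE_1[k + 1][0] for k in range(len(TABLE_1) - 1))

print(contiguous)                     # True
print(bandwidths[0], bandwidths[-1])  # 100 826
print(sum(bandwidths))                # 7000
```

The widths grow from 100 Hz to 826 Hz, consistent with bands that widen with frequency, and together they tile 0 to 7000 Hz exactly.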

Band number   Lower edge (Hz)   Upper edge (Hz)   Δf low (Hz)   Δf high (Hz)   W       Scale factor
1             0                 80                70            80             200.0   1.0
2             120               195               75            75             450.0   0.9
3             228               300               80            80             300.0   0.9
4             337               435               75            75             300.0   0.9
5             485               600               90            90             150.0   1.0
6             660               806               85            85             150.0   1.0
7             860               1000              85            85             150.0   1.0
8             1060              1210              85            85             150.0   1.0
9             1265              1460              85            85             150.0   1.0
10            1515              1735              85            85             150.0   1.0
11            1790              2038              85            85             150.0   1.0
12            2095              2377              85            85             150.0   1.0
13            2435              2756              85            85             150.0   1.0
14            2815              3180              85            85             150.0   1.0
15            3239              3654              85            85             150.0   1.0
16            3712              4183              85            85             150.0   1.0
17            4242              4775              85            85             150.0   1.0
18            4835              5437              85            85             150.0   1.0
19            5495              6174              85            85             150.0   1.0
20            6230              7000              85            85             150.0   1.0
TABLE 2

Noise band number   Frame size (samples)
1-5                 512
6-14                256
15-20               128
TABLE 3

Representative Drawing

Sorry, the representative drawing for patent document no. 2165352 could not be found.

Administrative Status

2024-08-01: As part of the transition to Next Generation Patents (NGP), the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new internal solution.

Please note that events beginning with "Inactive:" refer to events that are no longer used in our new internal solution.

Event History

Description	Date
Inactive: IPC expired	2013-01-01
Inactive: IPC from MCD	2006-03-12
Time limit for reversal expired	1998-12-15
Application not reinstated by deadline	1998-12-15
Deemed abandoned - failure to respond to maintenance fee notice	1997-12-15
Application published (open to public inspection)	1996-07-01
All requirements for examination - determined compliant	1995-12-15
Request for examination requirements - determined compliant	1995-12-15

Abandonment History

Abandonment Date	Reason	Reinstatement Date
1997-12-15

Owners on Record

Current and past owners on record are displayed in alphabetical order.

Current Owners on Record
AT&T IPM CORP.
CASIMIR WIERZYNSKI

Past Owners on Record
YAIR SHOHAM

Past owners that do not appear in the "Owners on Record" listing will appear in other documents on file.

Documents

Document Description	Date (yyyy-mm-dd)	Number of Pages	Size of Image (KB)
Abstract	1996-04-18	1	19
Description	1996-04-18	16	710
Claims	1996-04-18	4	129
Drawings	1996-04-18	3	43
Maintenance fee reminder	1997-08-16	1	111
Courtesy - Abandonment letter (maintenance fee)	1998-01-25	1	187
Courtesy - Office letter	1998-11-24	1	50
Examiner requisition	1998-02-02	4	84
