Patent 1243122 Summary


(12) Patent: (11) CA 1243122
(21) Application Number: 1243122
(54) French Title: TRAITEMENT DES FORMES D'ONDES ACOUSTIQUES
(54) English Title: PROCESSING OF ACOUSTIC WAVEFORMS
Status: Term Expired - Post Grant
Bibliographic Data
(51) International Patent Classification (IPC):
(72) Inventors:
  • MCAULAY, ROBERT J. (United States of America)
  • QUATIERI, THOMAS F., JR. (United States of America)
(73) Owners:
  • MASSACHUSETTS INSTITUTE OF TECHNOLOGY
(71) Applicants:
  • MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States of America)
(74) Agent: RICHES, MCKENZIE & HERBERT LLP
(74) Associate Agent:
(45) Issued: 1988-10-11
(22) Filed: 1986-03-18
Licence Available: N/A
Dedicated to the Public: N/A
(25) Language of Filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application Number | Country/Territory | Date
712,866 | United States of America | 1985-03-18

Abstracts

English Abstract


Abstract of the Disclosure
A sinusoidal model for acoustic waveforms is
applied to develop a new analysis/synthesis technique
which characterizes a waveform by the amplitudes,
frequencies, and phases of component sine waves.
These parameters are estimated from a short-time
Fourier transform. Rapid changes in the
highly-resolved spectral components are tracked using
the concept of "birth" and "death" of the underlying
sine waves. The component values are interpolated
from one frame to the next to yield a representation
that is applied to a sine wave generator. The
resulting synthetic waveform preserves the general
waveform shape and is perceptually indistinguishable
from the original. Furthermore, in the presence of
noise the perceptual characteristics of the waveform
as well as the noise are maintained. The method and
devices disclosed herein are particularly useful in
speech coding, time-scale modification, frequency
scale modification and pitch modification.

Claims

Note: The claims are shown in the official language in which they were submitted.


CLAIMS
1. A method of processing an acoustic waveform, the
method comprising:
a. sampling the waveform to obtain a series
of discrete samples and constructing therefrom a
series of frames, each frame spanning a plurality of
samples;
b. analyzing each frame of samples to
extract a set of frequency components having
individual amplitudes;
c. tracking said components from one frame
to a next frame; and
d. interpolating the values of the
components from the one frame to the next frame to
obtain a parametric representation of the waveform
whereby a synthetic waveform can be constructed by
generating a set of sine waves corresponding to the
interpolated values of the parametric representation.
2. The method of claim 1 wherein the step of
sampling further includes constructing a frame
having variable length, which varies in accordance
with the pitch period, the length being at least
twice the pitch period of the waveform.
3. The method of claim 1 wherein the step of
sampling further includes sampling the waveform
according to a Hamming window.
4. The method of claim 1 wherein the step of
analyzing further includes analyzing each frame by
Fourier analysis.

5. The method of claim 1 wherein the step of
analyzing further includes selecting a harmonic
series to approximate the frequency components.
6. The method of claim 5 wherein the number of
frequency components in the harmonic series varies
according to the pitch period of the waveform.
7. The method of claim 1 wherein the step of
tracking further includes matching a frequency
component from the one frame with a component in the
next frame having a similar value.
8. The method of claim 7 wherein said matching
further provides for the birth of new frequency
components and the death of old frequency components.
9. The method of claim 1 wherein the step of
interpolating values further includes defining a
series of instantaneous frequency values by
interpolating matched frequency components from the
one frame to the next frame and then integrating the
series of instantaneous frequency values to obtain a
series of interpolated phase values.
10. The method of claim 1 wherein the step of
interpolating further includes deriving phase values
from frequency and phase measurements taken at each
frame and then interpolating the phase measurements.
11. The method of claim 1 wherein the step of
interpolating is achieved by performing an overlap
and add function.

12. The method of claim 1 wherein the method further
includes coding the frequency components for digital
transmission.
13. The method of claim 12 wherein the frequency
components are limited to a predetermined number
defined by a plurality of harmonic frequency bins.
14. The method of claim 13 wherein the amplitude of
only one of said components is coded for gain and the
amplitudes of the others are coded relative to the
neighboring component at the next lowest frequency.
15. The method of claim 12 wherein the phases are
coded by applying pulse code modulation techniques to
a predicted phase residual.
16. The method of claim 12 wherein high frequency
regeneration is applied.
17. The method of claim 1 wherein the method further
comprises constructing a synthetic waveform by
generating a series of constituent sine waves
corresponding in frequency and amplitude to the
extracted components.
18. The method of claim 17 wherein the time-scale of
said reconstructed waveform is varied by changing the
rate at which said series of constituent sine waves
are interpolated.
19. The method of claim 18 wherein the time-scale is
continuously variable over a defined range.

20. The method of claim 1 wherein the method further
comprises constructing a synthetic waveform by
generating a series of constituent sine waves
corresponding in frequency, amplitude, and phase to
the extracted components.
21. The method of claim 20 wherein the time-scale of
said reconstructed waveform is varied by changing the
rate at which said series of constituent sine waves
are interpolated.
22. The method of claim 21 wherein the time-scale is
continuously variable over a defined range.
23. The method of claim 20 wherein the constituent
sine waves are further defined by system
contributions and excitation contributions and
wherein the time-scale of said reconstructed waveform
is varied by changing the rate at which parameters
defining the system contributions of the sine waves
are interpolated.
24. The method of claim 17 wherein the short-time
spectral envelope of the synthetic waveform is varied
by scaling each frequency component.
25. The method of claim 23 wherein the pitch of the
synthetic waveform is altered by scaling the
excitation-contributed frequency components.
26. A device for processing an acoustic waveform,
the device comprising:
a. sampling means for sampling the waveform
to obtain a series of discrete samples and
constructing therefrom a series of frames, each frame
spanning a plurality of samples;

b. analyzing means for analyzing each frame
of samples to extract a set of frequency components
having individual amplitudes;
c. tracking means for tracking said
components from one frame to a next frame; and
d. interpolating means for interpolating
the values of the components from the one frame to
the next frame to obtain a parametric representation
of the waveform whereby a synthetic waveform can be
constructed by generating a set of sine waves
corresponding to the interpolated values of the
parametric representation.
27. The device of claim 26 wherein the sampling
means further includes means for constructing a frame
having variable length, which varies in accordance
with the pitch period, the length being at least
twice the pitch period of the waveform.
28. The device of claim 26 wherein the sampling
means further includes means for sampling according
to a Hamming window.
29. The device of claim 26 wherein the analyzing
means further includes means for analyzing each frame
by Fourier analysis.
30. The device of claim 26 wherein the analyzing
means further includes means for selecting a harmonic
series to approximate the frequency components.
31. The device of claim 30 wherein the number of
frequency components in the harmonic series varies
according to the pitch period of the waveform.

32. The device of claim 26 wherein the tracking
means further includes means for matching a frequency
component from the one frame with a component in the
next frame having a similar value.
33. The device of claim 32 wherein said matching
means further provides for the birth of new frequency
components and the death of old frequency components.
34. The device of claim 26 wherein the interpolating
means further includes means defining a series of
instantaneous frequency values by interpolating
matched frequency components from the one frame to
the next frame and means for integrating the series
of instantaneous frequency values to obtain a series
of interpolated phase values.
35. The device of claim 26 wherein the interpolating
means further includes means for deriving phase
values from the frequency and phase measurements
taken at each frame and then interpolating the phase
measurements.
36. The device of claim 26 wherein the interpolating
means further includes means for performing an
overlap and add function.
37. The device of claim 26 wherein the device
further includes coding means for coding the
frequency components for digital transmission.
38. The device of claim 32 wherein the frequency
components are limited to a predetermined number
defined by a plurality of harmonic frequency bins.

39. The device of claim 38 wherein the amplitude of
only one of said components is coded for gain and the
amplitudes of the others are coded relative to the
neighboring component of the next lowest frequency.
40. The device of claim 37 wherein the coding means
further comprises means for applying pulse code
modulation techniques to a predicted phase residual.
41. The device of claim 37 wherein the coding means
further comprises means for generating high frequency
components.
42. The device of claim 26 wherein the device
further comprises means for constructing a synthetic
waveform by generating a series of constituent sine
waves corresponding in frequency and amplitude to the
extracted components.
43. The device of claim 42 wherein the time-scale of
said reconstructed waveform is varied by changing the
rate at which said series of constituent sine waves
are interpolated.
44. The device of claim 43 wherein the time-scale is
continuously variable over a defined range.
45. The device of claim 26 wherein the device
further comprises means for constructing a synthetic
waveform by generating a series of constituent sine
waves corresponding in frequency, amplitude, and
phase to the extracted components.

46. The device of claim 45 wherein the time-scale of
said reconstructed waveform is varied by changing the
rate at which said series of constituent sine waves
are interpolated.
47. The device of claim 46 wherein the time-scale is
continuously variable over a defined range.
48. The device of claim 42 wherein the constituent
sine waves are further defined by system
contributions and excitation contributions and
wherein the time-scale of said reconstructed waveform
is varied by changing the rate at which parameters
defining the system contributions of the sine waves
are interpolated.
49. The device of claim 48 wherein the device
further includes a scaling means for scaling the
frequency components.
50. The device of claim 48 wherein the device
further includes a scaling means for scaling the
excitation-contributed frequency components.
51. A speech coding device comprising:
a. sampling means for sampling the waveform
to obtain a series of discrete samples and
constructing therefrom a series of frames, each frame
spanning a plurality of samples;
b. analyzing means for analyzing each frame
of samples by Fourier analysis to extract a set of
frequency components having individual amplitude
values;
c. tracking means for tracking the
components from one frame to a next frame; and
d. coding means for coding the component
values.

52. The device of claim 51 wherein the coding means
further includes means for selecting a harmonic
series of bins to approximate the frequency
components and the number of bins varies according to
the pitch of the waveform.
53. The device of claim 51 wherein the amplitude of
only one of said components is coded for gain and the
amplitudes of the other components are coded relative
to the neighboring component at the next lowest
frequency.
54. The device of claim 51 wherein the amplitudes of
the components are coded by linear prediction
techniques.
55. The device of claim 51 wherein the amplitudes of
the components are coded by adaptive delta modulation
techniques.
56. The device of claim 51 wherein the analyzing
means further comprises means for measuring phase
values for each frequency component.
57. The device of claim 56 wherein the coding means
further includes means for coding the phase values by
applying pulse code modulations to a predicted phase
residual.
58. The device of claim 56 wherein the coding means
further includes means for generating high frequency
component values from coded low frequency component
values.

59. A device for altering the time-scale of an
audible waveform, the device comprising:
a. sampling means for sampling the waveform
to obtain a series of discrete samples and
constructing therefrom a series of frames, each frame
spanning a plurality of samples;
b. analyzing means for analyzing each frame
of samples to extract a set of frequency components
having individual amplitudes;
c. tracking means for tracking said
components from one frame to a next frame;
d. interpolating means for interpolating
the amplitude and frequency values of the components
from the one frame to the next frame to obtain a
representation of the waveform whereby a synthetic
waveform can be constructed by generating a set of
sine waves corresponding to the interpolated
representation;
e. scaling means for altering the rate of
interpolation; and
f. synthesizing means for constructing a
time-scaled synthetic waveform by generating a series
of constituent sine waves corresponding in frequency
and amplitude to the extracted components, the sine
waves being generated at said alterable interpolation
rate.
60. The device of claim 59 wherein the time scale is
continuously variable over a defined range.
61. The device of claim 59 wherein the analyzing
means further comprises means for measuring phase
values for each frequency component.
62. The device of claim 61 wherein the component
phase values are interpolated by cubic interpolation.

63. The device of claim 61 wherein the time-scale is
continuously variable over a defined range.
64. The device of claim 61 wherein the device
further comprises means for separating the measured
frequency components into system contributions and
excitation contributions and wherein the time-scale
of the synthetic waveform is varied by altering the
rate at which values defining the system
contributions are interpolated.
65. The device of claim 64 wherein the scaling means
alters the rate at which the system amplitudes and
phases, and the excitation amplitudes and frequencies
are interpolated.

Description

Note: The descriptions are shown in the official language in which they were submitted.


PROCESSING OF ACOUSTIC WAVEFORMS
The U.S. Government has rights in this
invention pursuant to the Department of the Air Force
Contract No. F19-028-80-C-0002.
Technical Field
The field of this invention is speech
technology generally and, in particular, methods and
devices for analyzing, digitally encoding, modifying
and synthesizing speech or other acoustic waveforms.
Background of the Invention
Typically, the problem of representing
speech signals is approached by using a speech
production model in which speech is viewed as the
result of passing a glottal excitation waveform
through a time-varying linear filter that models the
resonant characteristics of the vocal tract. In many
speech applications it suffices to assume that the
glottal excitation can be in one of two possible
states corresponding to voiced or unvoiced speech.
In the voiced speech state the excitation is periodic
with a period which is allowed to vary slowly over
time relative to the analysis frame rate (typically
10-20 msecs). For the unvoiced speech state the
glottal excitation is modeled as random noise with a
flat spectrum. In both cases the power level in the
excitation is also considered to be slowly
time-varying.

While this binary model has been used
successfully to design narrowband vocoders and speech
synthesis systems, its limitations are well known.
For example, often the excitation is mixed, having
both voiced and unvoiced components simultaneously,
and often only portions of the spectrum are truly
harmonic. Furthermore, the binary model requires
that each frame of data be classified as either
voiced or unvoiced, a decision which is particularly
difficult to make if the speech is also subject to
additive acoustic noise.
Speech coders at rates compatible with
conventional transmission lines (i.e. 2.4 - 9.6
kilobits per second) would meet a substantial need.
At such rates the binary model is ill-suited for
coding applications. Additionally, speech processing
devices and methods that allow the user to modify
various parameters in reconstructing the waveform
would find substantial usage. For example, time-scale
modification (without pitch alteration) would be a
very useful feature for a variety of speech
applications (i.e. slowing down speech for
translation purposes or speeding it up for scanning
purposes) as well as for musical composition or
analysis. Unfortunately, time-scale (and other
parameter) modifications also are not accomplished
with high quality by devices employing the binary
model.
Thus, there exists a need for better methods
and devices for processing audible waveforms. In
particular, speech coders operable at mid-band rates
and in noisy environments as well as synthesizers
capable of maintaining the perceptual quality of
speech while changing the rate of articulation would
satisfy long-felt needs and provide substantial
contributions to the art.
Summary of the Invention
It has been discovered that speech analysis
and synthesis as well as coding and time-scale
modification can be accomplished simply and
effectively by employing a time-frequency
representation of the speech waveform which is
independent of the speech state. Specifically, a
sinusoidal model for the speech waveform is used to
develop a new analysis-synthesis technique.
The basic method of the invention includes
the steps of: (a) selecting frames (i.e. windows of
about 20 - 40 milliseconds) of samples from the
waveform; (b) analyzing each frame of samples to
extract a set of frequency components; (c) tracking
the components from one frame to the next; and (d)
interpolating the values of the components from one
frame to the next to obtain a parametric
representation of the waveform. A synthetic waveform
can then be constructed by generating a series of
sine waves corresponding to the parametric
representation.
In one simple embodiment of the invention, a
device is disclosed which uses only the amplitudes
and frequencies of the component sine waves to
represent the waveform. In this so-called
"magnitude-only" system, phase continuity is
maintained by defining the phase to be the integral
of the instantaneous frequency. In a more
comprehensive embodiment, explicit use is made of the
measured phases as well as the amplitudes and
frequencies of the components.

The invention is particularly useful in
speech coding and time-scale modification and has
been demonstrated successfully in both of these
applications. Robust devices can be built according
to the invention to operate in environments of
additive acoustic noise. The invention also can be
used to analyze single and multiple speaker signals,
music or even biological sounds. The invention will
also find particular applications, for example, in
reading machines for the blind, in broadcast
journalism editing and in transmission of music to
remote players.
In one illustrated embodiment of the
invention, the basic method summarized above is
employed to choose amplitudes, frequencies, and
phases corresponding to the largest peaks in a
periodogram of the measured signal, independently of
the speech state. In order to reconstruct the speech
waveform, the amplitudes, frequencies, and phases of
the sine waves estimated on one frame are matched and
allowed to continuously evolve into the corresponding
parameter set on the successive frame. Because the
number of estimated peaks is not constant and slowly
varying, the matching process is not straightforward.
Rapidly varying regions of speech such as
unvoiced/voiced transitions can result in large
changes in both the location and number of peaks. To
account for such rapid movements in spectral energy,
the concept of "birth" and "death" of sinusoidal
components is employed in a nearest-neighbor matching
method based on the frequencies estimated on each
frame. If a new peak appears, a "birth" is said to
occur and a new track is initiated. If an old peak
is not matched, a "death" is said to occur and the
corresponding track is allowed to decay to zero.
Once the parameters on successive frames have been
matched, phase continuity of each sinusoidal
component is ensured by unwrapping the phase. In one
preferred embodiment the phase is unwrapped using a
cubic phase interpolation function having parameter
values that are chosen to satisfy the measured phase
and frequency constraints at the frame boundaries
while maintaining maximal smoothness over the frame
duration. Finally, the corresponding sinusoidal
amplitudes are simply interpolated in a linear manner
across each frame.
In speech coding applications, pitch
estimates are used to establish a set of harmonic
frequency bins to which the frequency components are
assigned. (Pitch is used herein to mean the
fundamental rate at which a speaker's vocal cords are
vibrating.) The amplitudes of the components can be
coded directly using adaptive pulse code modulation
(ADPCM) across frequency or indirectly using linear
predictive coding. In each harmonic frequency bin
the peak having the largest amplitude is selected and
assigned to the frequency at the center of the bin.
This results in a harmonic series based upon the
coded pitch period. The phases can then be coded by
using the frequencies to predict phase at the end of
the frame, unwrapping the measured phase with respect
to this prediction and then coding the phase residual
using 4 bits per phase peak. If there are not enough
bits available to code all of the phase peaks (e.g.
for low-pitch speakers), phase tracks for the high
frequency peaks can be artificially generated. In
one preferred embodiment, this is done by translating
the frequency tracks of the baseband peaks to the
high frequency of the uncoded phase peaks. This new
coding scheme has the important property of
adaptively allocating the bits for each speaker and
hence is self-tuning to both low- and high-pitched
speakers. Although pitch is used to provide side
information for the coding algorithm, the standard
voice-excitation model for speech is not used. This
means that recourse is never made to a
voiced-unvoiced decision. As a consequence the
invention is robust in noise and can be applied at
various data transmission rates simply by changing
the rules for the bit allocation.
The invention is also well-suited for
time-scale modification, which is accomplished by
time-scaling the amplitudes and phases such that the
frequency variations are preserved. The time-scale
at which the speech is played back is controlled
simply by changing the rate at which the matched
peaks are interpolated. This means that the
time-scale can be speeded up or slowed down by any
factor and this factor can be time-varying. This
rate can be controlled by a panel knob which allows
an operator complete flexibility for varying the
time-scale. There is no perceptual delay in
performing the time-scaling.
The invention will next be described in
connection with certain illustrated embodiments.
However, it should be clear that various changes and
modifications can be made by those skilled in the art
without departing from the spirit and scope of the
invention. For example, other sampling techniques can
be substituted for the use of a variable frame length
and Hamming window. Moreover, the length of such
frames and windows can vary in response to the
particular application. Likewise, frequency matching
can be accomplished by various means. A variety of
commercial devices are available to perform Fourier
analysis; such analysis can also be performed by
custom hardware or specially-designed programs.
Various techniques for extracting pitch
information can be employed. For example, the pitch
period can be derived from the Fourier transform.
Other techniques such as the Gold-Malpass techniques
can also be used. See generally, M.L. Malpass, "The
Gold Pitch Detector in a Real Time Environment", Proc.
of EASCON 1975 (Sept. 1975); B. Gold, "Description of
a Computer Program for Pitch Detection", Fourth
International Congress on Acoustics, Copenhagen,
August 21-28, 1962; and B. Gold, "Note on Buzz-Hiss
Detection", J. Acoust. Soc. Amer. 36, 1659-1661
(1964).
Various coding techniques can also be used
interchangeably with those described below. Channel
encoding techniques are described in J.N. Holmes,
"The JSRU Channel Vocoder", IEE Proc., 127, 53-60
(1980). Adaptive pulse code modulation is described
in L.R. Rabiner and R.W. Schafer, Digital Processing
of Speech Signals (Prentice Hall 1978). Linear
predictive coding is described by J.D. Markel, Linear
Prediction of Speech (Springer-Verlag, 1976).
It should be appreciated that the term
"interpolation" is used broadly in this application
to encompass various techniques for filling in data
values between those measured at the frame
boundaries. In the magnitude-only system linear
interpolation is employed to fill in amplitude and
frequency values. In this simple system phase values
are obtained by first defining a series of
instantaneous frequency values by interpolating
matched frequency components from one frame to the
next and then integrating the series of instantaneous
frequency values to obtain a series of interpolated
phase values. In the more comprehensive system the
phase value of each frame is derived directly and a
cubic polynomial equation preferably is employed to
obtain maximally smooth phase interpolations from
frame to frame.
Other techniques that accomplish the same
purpose are also referred to in this application as
interpolation techniques. For example, the so-called
"overlap and add" method of filling in data values
can also be used. In this method a weighted
overlapping function can be applied to the resulting
sine waves generated during each frame and then the
overlapped values can be summed to fill in the values
between those measured at the frame boundaries.
Brief Description of the Drawings
FIGURE 1 is a schematic block diagram of one
embodiment of the invention in which only the
magnitudes and frequencies of the components are used
to reconstruct a sampled waveform.
FIGURE 2 is an illustration of the extracted
amplitude and frequency components of a waveform
sampled according to the present invention.
FIGURE 3 is a general illustration of the
frequency matching method of the present invention.

FIGURE 4 is a detailed schematic
illustration of a frequency matching method according
to the present invention.
FIGURE 5 is an illustration of tracked
frequency components of an exemplary speech pattern.
FIGURE 6 is a schematic block diagram of
another embodiment of the invention in which
magnitude and phase of frequency components are used
to reconstruct a sampled waveform.
FIGURE 7 is an illustrative set of cubic
phase interpolation functions for smoothing the phase
functions useful in connection with the embodiment of
FIGURE 6 from which the "maximally smooth" phase
function is selected.
FIGURE 8 is a schematic block diagram of
another embodiment of the invention particularly
useful for time-scale modification.
FIGURE 9 is a schematic block diagram
showing an embodiment of the system estimation
function of FIGURE 8.
FIGURE 10 is a block diagram of one
real-time implementation of the invention.
Detailed Description
In the present invention the speech waveform
is modeled as a sum of sine waves. If s(n)
represents the sampled speech waveform then

$$s(n) = \sum_i a_i(n)\,\sin[\theta_i(n)] \qquad (1)$$

where $a_i(n)$ and $\theta_i(n)$ are the time-varying
amplitudes and phases of the i'th tone.
In a simple embodiment the phase can be
defined to be the integral of the instantaneous
frequency $f_i(n)$ and therefore satisfies the
recursion

$$\theta_i(n) = \theta_i(n-1) + 2\pi f_i(n)/f_s \qquad (2)$$

where $f_s$ is the sampling frequency. If the tones
are harmonically related, then

$$f_i(n) = i \cdot f_0(n) \qquad (3)$$

where $f_0(n)$ represents the fundamental frequency at
time n. One particularly attractive property of the
above model is the fact that phase continuity, hence
waveform continuity, is guaranteed as a consequence
of the definition of phase in terms of the
instantaneous frequency. This means that waveform
reconstruction is possible from the magnitude-only
spectrum since a high-resolution spectral analysis
reveals the amplitudes and frequencies of the
component sine waves.
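By way of illustration (not part of the patent), the following is a minimal Python sketch of this magnitude-only model, with each tone's phase accumulated per the recursion in (2); the array layout and test tones are assumptions:

```python
import numpy as np

def synthesize_magnitude_only(amps, freqs, fs):
    """Sum-of-sine-waves synthesis per Eq. (1), with each phase built
    by the recursion of Eq. (2): theta[n] = theta[n-1] + 2*pi*f[n]/fs.
    amps, freqs: arrays of shape (num_tones, num_samples) holding the
    time-varying amplitude and frequency (Hz) of each component."""
    # Cumulative sum realizes the running integral of instantaneous frequency.
    phases = np.cumsum(2.0 * np.pi * freqs / fs, axis=1)
    return np.sum(amps * np.sin(phases), axis=0)

# Illustrative use: two harmonically related tones per Eq. (3), f0 = 100 Hz.
fs, num_samples = 10000, 2000
f0 = np.full(num_samples, 100.0)
freqs = np.vstack([1 * f0, 2 * f0])
amps = np.vstack([np.ones(num_samples), 0.5 * np.ones(num_samples)])
s = synthesize_magnitude_only(amps, freqs, fs)
```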
A block diagram of an analysis/synthesis
system according to the invention is illustrated in
FIGURE 1. The peaks of the magnitude of the discrete
Fourier transform (DFT) of a windowed waveform are
found simply by determining the locations of a change
in slope (concave down). In addition, the total
number of peaks can be limited and this limit can be
adapted to the expected average pitch of the speaker.
In a simple embodiment the speech waveform
can be digitized at a 10 kHz sampling rate, low-pass
filtered at 5 kHz, and analyzed at 20 msec frame
intervals with a 20 msec Hamming window. Speech
representations according to the invention can also
be obtained by employing an analysis window of
variable duration. For some applications it is
preferable to have the width of the analysis window
be pitch adaptive, being set, for example, at 2.5
times the average pitch period with a minimum width
of 20 msec.
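The peak-picking rule lends itself to a short sketch (illustrative; the FFT size, the peak limit, and the absence of any amplitude threshold are assumptions, not the patent's implementation):

```python
import numpy as np

def find_spectral_peaks(frame, fs, nfft=512, max_peaks=80):
    """Locate peaks of the windowed frame's DFT magnitude by finding
    the points where the slope changes sign (concave down)."""
    windowed = frame * np.hamming(len(frame))
    mag = np.abs(np.fft.rfft(windowed, nfft))
    # A bin is a peak if it exceeds both of its neighbors.
    idx = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    # Keep at most max_peaks largest peaks; the limit may be adapted
    # to the speaker's expected average pitch.
    idx = np.sort(idx[np.argsort(mag[idx])[::-1][:max_peaks]])
    return idx * fs / nfft, mag[idx]   # peak frequencies (Hz), amplitudes
```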
Plotted in FIGURE 2 is a typical periodogram
for a frame of speech along with the amplitudes and
frequencies that are estimated using the above
procedure. The DFT was computed using a 512-point
fast Fourier transform (FFT). Different sets of
these parameters will be obtained for each analysis
frame. To obtain a representation of the waveform
over time, frequency components measured on one frame
must be matched with those that are obtained on a
successive frame.
FIGURE 3 illustrates the basic process of
frequency component matching. If the number of peaks
were constant and slowly varying from frame to frame,
the problem of matching the parameters estimated on
one frame with those on a successive frame would
simply require a frequency ordered assignment of
peaks. In practice, however, there will be spurious
peaks that come and go due to the effects of sidelobe
interaction; the locations of the peaks will change
as the pitch changes; and there will be rapid changes
in both the location and the number of peaks
corresponding to rapidly-varying regions of speech,
such as at voiced/unvoiced transitions. In order to
account for such rapid movements in the spectral
peaks, the present invention employs the concept of
"birth" and "death" of sinusoidal components as part
of the matching process.

The matching process is further explained by
consideration of FIGURE 4. Assume that peaks up to
frame k have been matched and a new parameter set for
frame k+1 is generated. Let the chosen frequencies
on frames k and k+1 be denoted by $\omega_0^k, \omega_1^k, \ldots, \omega_{N-1}^k$
and $\omega_0^{k+1}, \omega_1^{k+1}, \ldots, \omega_{M-1}^{k+1}$ respectively,
where N and M represent the total number of peaks
selected on each frame ($N \neq M$ in general). One
process of matching each frequency in frame k, $\omega_n^k$,
to some frequency in frame k+1, $\omega_m^{k+1}$, is given in
the following three steps.
Step 1:
Suppose that a match has been found for
frequencies $\omega_0^k, \omega_1^k, \ldots, \omega_{n-1}^k$. A match is now
attempted for frequency $\omega_n^k$. FIGURE 4(a) depicts
the case where all frequencies $\omega_m^{k+1}$ in frame k+1
lie outside a "matching interval" $\Delta$ of $\omega_n^k$, i.e.,

$$|\omega_n^k - \omega_m^{k+1}| \geq \Delta \qquad (4)$$

for all m. In this case the frequency track
associated with $\omega_n^k$ is declared "dead" on entering
frame k+1, and $\omega_n^k$ is matched to itself in frame
k+1, but with zero amplitude. Frequency $\omega_n^k$ is then
eliminated from further consideration and Step 1 is
repeated for the next frequency in the list, $\omega_{n+1}^k$.
If on the other hand there exists a
frequency $\omega_m^{k+1}$ in frame k+1 that lies within the
matching interval about $\omega_n^k$, and is the closest such
frequency, i.e.,

$$|\omega_n^k - \omega_m^{k+1}| < |\omega_n^k - \omega_i^{k+1}| < \Delta \qquad (5)$$

3~2
13-
for all $i \neq m$, then $\omega_m^{k+1}$ is declared to be a
candidate match to $\omega_n^k$. A definitive match is not
yet made, since there may exist a better match in
frame k to the frequency $\omega_m^{k+1}$, a contingency which
is accounted for in Step 2.
Step 2:
In this step, a candidate match from Step 1
is confirmed. Suppose that a frequency $\omega_n^k$ of frame
k has been tentatively matched to frequency $\omega_m^{k+1}$ of
frame k+1. Then, if $\omega_m^{k+1}$ has no better match to
the remaining unmatched frequencies of frame k, then
the candidate match is declared to be a definitive
match. This condition, illustrated in FIGURE 4(c),
is given by

$$|\omega_m^{k+1} - \omega_n^k| < |\omega_m^{k+1} - \omega_i^k| \quad \text{for } i > n \qquad (6)$$

When this occurs, frequencies $\omega_n^k$ and $\omega_m^{k+1}$ are
eliminated from further consideration and Step 1 is
repeated for the next frequency in the list, $\omega_{n+1}^k$.
If the condition (6) is not satisfied, then
the frequency $\omega_m^{k+1}$ in frame k+1 is better matched
to the frequency $\omega_{n+1}^k$ in frame k than it is to the
test frequency $\omega_n^k$. Two additional cases are then
considered. In the first case, illustrated in FIGURE
4(d), the adjacent remaining lower frequency $\omega_{m-1}^{k+1}$
(if one exists) lies below the matching interval,
hence no match can be made. As a result, the
frequency track associated with $\omega_n^k$ is declared
"dead" on entering frame k+1, and $\omega_n^k$ is matched to
itself with zero amplitude. In the second case,
illustrated in FIGURE 4(e), the frequency $\omega_{m-1}^{k+1}$ is
within the matching interval about $\omega_n^k$ and a
definitive match is made. After either case Step 1
is repeated using the next frequency in the frame k
list, $\omega_{n+1}^k$. It should be noted that many other
situations are possible in this step, but to keep the
tracker alternatives as simple as possible only the
two cases are discussed.
Step 3:
When all frequencies of frame k have been
tested and assigned to continuing tracks or to dying
tracks, there may remain frequencies in frame k+1 for
which no matches have been made. Suppose that $\omega_m^{k+1}$
is one such frequency; then it is concluded that
$\omega_m^{k+1}$ was "born" in frame k and its match, a new
frequency, $\omega_m^k$, is created in frame k with zero
magnitude. This is done for all such unmatched
frequencies. This last step is illustrated in FIGURE
4(f).
The results of applying the tracker to a
segment of real speech are shown in FIGURE 5, which
demonstrates the ability of the tracker to adapt
quickly through transitory speech behavior such as
voiced/unvoiced transitions, and mixed
voiced/unvoiced regions.
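Steps 1-3 can be summarized in a sketch like the following (a simplified greedy pass; the second-chance match to the adjacent lower frequency of FIGURE 4(e) is folded into a comment, and the data layout is an assumption):

```python
def match_tracks(freqs_k, freqs_k1, delta):
    """Nearest-neighbor matching with "birth" and "death" (Steps 1-3).
    Returns (matches, deaths, births): matches holds index pairs (n, m);
    frame-k peaks in deaths decay to zero amplitude entering frame k+1;
    frame-(k+1) peaks in births start from zero amplitude in frame k."""
    unmatched = set(range(len(freqs_k1)))
    matches, deaths = [], []
    for n, wk in enumerate(freqs_k):
        # Step 1: closest remaining frame-(k+1) peak inside the interval.
        cands = [m for m in unmatched if abs(wk - freqs_k1[m]) < delta]
        if not cands:
            deaths.append(n)  # no peak within delta: the track dies
            continue
        m = min(cands, key=lambda j: abs(wk - freqs_k1[j]))
        # Step 2: confirm no later frame-k peak matches freqs_k1[m] better
        # (simplified: the patent then also tries the adjacent lower
        # frame-(k+1) peak before declaring a death).
        later = [abs(freqs_k1[m] - w) for w in freqs_k[n + 1:]]
        if later and min(later) < abs(freqs_k1[m] - wk):
            deaths.append(n)
        else:
            matches.append((n, m))
            unmatched.remove(m)
    births = sorted(unmatched)  # Step 3: leftover frame-(k+1) peaks are born
    return matches, deaths, births
```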
In the simple magnitude-only system,
synthesis is accomplished in a straightforward
manner. Each pair of matched frequencies (and their
corresponding magnitudes) is linearly interpolated
across consecutive frame boundaries. As noted above,
in the magnitude-only system, phase continuity is
guaranteed by the definition of phase in terms of the
instantaneous frequency. The interpolated values are
then used to drive a sine wave generator which yields
the synthetic waveform as shown in FIGURE 1. It
should be noted that performance is improved by
reducing the correlation window size, a, at higher
frequencies.

A further feature shown in FIGURE 1 (and
discussed in detail below) is that the present
invention is ideally suited for performing time-scale
modification. From FIGURE 3 it can be seen that by
simply expanding or compressing the time scale, the
locations and magnitudes are preserved while
modifying their rate of change in time. To effect a
rate of change b, the synthesizer interpolation rate
R' (see FIGURE 1) is given by R' = bR. Furthermore,
with this system it is straightforward to invoke a
time-varying rate of change since frequencies may be
stretched or compressed by varying the interpolation
rate in time.
FIGURE 6 shows a block diagram of a more
comprehensive system in which phases are measured
directly. In this system the frequency components
and their amplitudes are determined in the same
manner as the magnitude-only system described above
and illustrated in FIGURE 1. Phase measurements,
however, are derived directly from the discrete
Fourier transform by computing the arctangents at the
estimated frequency peaks.
Since in the comprehensive system of FIGURE
6 a set of amplitudes, frequencies and phases are
estimated for each frame, it might seem reasonable to
estimate the original speech waveform on the k'th
frame by generating synthetic speech using the
equation

$$s(n) = \sum_{\ell} A_\ell^k \cos[n\omega_\ell^k + \theta_\ell^k] \qquad (7)$$

for $kN < n < (k+1)N$. Due to the time-varying nature
of the parameters, however, this straightforward
approach leads to discontinuities at the frame
boundaries which seriously degrades the quality of
the synthetic speech. Therefore, a method must be
found for smoothly interpolating the parameters
measured from one frame to those that are obtained on
the next.
As a result of the frequency matching
algorithm described in the previous section, all of
the parameters measured for an arbitrary frame k are
associated with a corresponding set of parameters for
frame k+1. Letting $[A_\ell^k, \omega_\ell^k, \theta_\ell^k]$ and
$[A_\ell^{k+1}, \omega_\ell^{k+1}, \theta_\ell^{k+1}]$ denote the successive sets of
parameters for the $\ell$'th frequency track, then an
obvious solution to the amplitude interpolation
problem is to take

$$A(n) = A^k + \frac{A^{k+1} - A^k}{N}\,n \qquad (8)$$

where n = 1, 2, ..., N is the time sample into the
k'th frame (the track subscript $\ell$ has been
omitted for convenience).
Unfortunately such a simple approach cannot
be used to interpolate the frequency and phase
because the measured phase, $\theta$, is obtained modulo
$2\pi$. Hence, phase unwrapping must be performed to
insure that the frequency tracks are "maximally
smooth" across frame boundaries. The first step in
solving this problem is to postulate a phase
interpolation function that is a cubic polynomial,
namely

$$\theta(t) = \zeta + \gamma t + \alpha t^2 + \beta t^3 \qquad (9)$$

It is convenient to treat the phase function as
though it were a function of a continuous time
variable t, with t = 0 corresponding to frame k and t = T
corresponding to frame k+1.
The parameters of the polynomial must be chosen to satisfy
frequency and phase measurements obtained at the frame
boundaries. Since the instantaneous frequency is the
derivative of the phase, then

$$\dot\theta(t) = \gamma + 2\alpha t + 3\beta t^2 \qquad (10)$$

and it follows that at the starting point, t = 0,

$$\theta(0) = \zeta = \theta^k, \qquad \dot\theta(0) = \gamma = \omega^k \qquad (11)$$

and at the terminal point, t = T,

$$\theta(T) = \theta^k + \omega^k T + \alpha T^2 + \beta T^3 = \theta^{k+1} + 2\pi M$$
$$\dot\theta(T) = \omega^k + 2\alpha T + 3\beta T^2 = \omega^{k+1} \qquad (12)$$

where again the track subscript $\ell$ is omitted for
convenience.
Since the terminal phase $\theta^{k+1}$ is measured
modulo $2\pi$, it is necessary to augment it by the term
$2\pi M$ (M is an integer) in order to make the resulting
frequency function "maximally smooth". At this point
M is unknown, but for each value of M, whatever it
may be, (12) can be solved for $\alpha(M)$ and $\beta(M)$ (the
dependence on M has now been made explicit). The
solution is easily shown to satisfy the matrix
equation:

$$\begin{bmatrix} \alpha(M) \\ \beta(M) \end{bmatrix} = \begin{bmatrix} 3/T^2 & -1/T \\ -2/T^3 & 1/T^2 \end{bmatrix} \begin{bmatrix} \theta^{k+1} - \theta^k - \omega^k T + 2\pi M \\ \omega^{k+1} - \omega^k \end{bmatrix} \qquad (13)$$
In order to determine M and ultimately the
solution to the phase unwrapping problem, an
additional constraint needs to be imposed that
quantifies the "maximally smooth" criterion. FIGURE
7 illustrates a typical set of cubic phase
interpolation functions for a number of values of M.
It seems clear on intuitive grounds that the best
phase function to pick is the one that would have the
least variation. This is what is meant by a
maximally smooth frequency track. In fact, if the
frequencies were constant and the vocal tract were
stationary, the true phase would be linear.
Therefore a reasonable criterion for "smoothness" is
to choose M such that

$$f(M) = \int_0^T [\ddot\theta(t;M)]^2\,dt \qquad (14)$$

is a minimum, where $\ddot\theta(t;M)$ denotes the second
derivative of $\theta(t;M)$ with respect to the time variable t.
Although M is integer valued, since f(M) is
quadratic in M, the problem is most easily solved by
minimizing f(x) with respect to the continuous
variable x and then choosing M to be the integer
closest to x. After straightforward but tedious
algebra, it can be shown that the minimizing value of
x is

$$x^* = \frac{1}{2\pi}\left[(\theta^k + \omega^k T - \theta^{k+1}) + (\omega^{k+1} - \omega^k)\frac{T}{2}\right] \qquad (15)$$

from which $M^*$ is determined and used in (13) to
compute $\alpha(M^*)$ and $\beta(M^*)$, and in turn, the unwrapped
phase interpolation function

$$\theta(t) = \theta^k + \omega^k t + \alpha(M^*)t^2 + \beta(M^*)t^3 \qquad (16)$$

This phase function not only satisfies all of the
measured phase and frequency endpoint constraints,
but also unwraps the phase in such a way that $\theta(t)$ is
maximally smooth.
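A sketch of the unwrapping computation of equations (13)-(16), assuming scalar per-track parameters (the variable names are illustrative):

```python
import numpy as np

def cubic_phase_track(theta_k, omega_k, theta_k1, omega_k1, T, t):
    """Maximally smooth cubic phase interpolation, Eqs. (11)-(16).
    theta_*: measured phases (rad); omega_*: frequencies (rad/sample);
    T: frame length in samples; t: sample times in [0, T]."""
    # Eq. (15): continuous minimizer of the smoothness criterion (14).
    x = ((theta_k + omega_k * T - theta_k1)
         + (omega_k1 - omega_k) * T / 2.0) / (2.0 * np.pi)
    M = np.round(x)  # integer closest to x
    # Eq. (13): solve for alpha(M) and beta(M).
    rhs = np.array([theta_k1 - theta_k - omega_k * T + 2.0 * np.pi * M,
                    omega_k1 - omega_k])
    A = np.array([[3.0 / T**2, -1.0 / T],
                  [-2.0 / T**3, 1.0 / T**2]])
    alpha, beta = A @ rhs
    # Eq. (16): unwrapped cubic phase across the frame.
    return theta_k + omega_k * t + alpha * t**2 + beta * t**3
```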
Since the above analysis began with the
assumption of an initial unwrapped phase $\theta^k$
corresponding to frequency $\omega^k$ at the start of frame
k, it is necessary to specify the initialization of
the frame interpolation procedure. This is done by
noting that at some point in time the track under
study was born. When this event occurred, an
amplitude, frequency and phase were measured at frame
k+1 and the parameters at frame k to which these
measurements correspond were defined by setting the
amplitude to zero (i.e., $A^k = 0$) while maintaining
the same frequency (i.e., $\omega^k = \omega^{k+1}$). In order
to insure that the phase interpolation constraints
are satisfied initially, the unwrapped phase is
defined to be the measured phase $\theta^{k+1}$ and the
start-up phase is defined to be

$$\theta^k = \theta^{k+1} - \omega^{k+1} N \qquad (17)$$

where N is the number of samples traversed in going
from frame k+1 back to frame k.
As a result of the above phase unwrapping
procedure, each frequency track will have associated
with it an instantaneous unwrapped phase which
accounts for both the rapid phase changes due to the
frequency of each sinusoidal component, and the
slowly varying phase changes due to the glottal pulse
and the vocal tract transfer function. Letting
$\theta_\ell(t)$ denote the unwrapped phase function for the
$\ell$'th track, then the final synthetic waveform will be
given by

$$s(n) = \sum_{\ell=1}^{L(k)} A_\ell(n)\cos[\theta_\ell(n)] \qquad (18)$$

where $kN < n < (k+1)N$, $A_\ell(n)$ is given by (8), $\theta_\ell(n)$
is the sampled data version of (16), and L(k) is
the number of sine waves estimated for the k'th frame.
The invention as described in connection
with FIGURE 6 has been used to develop a speech
coding system for operation at 8 kilobits per
second. At this rate, high-quality speech depends
critically on the phase measurements and, thus, phase
coding is a high priority. Since the sinusoidal
representation also requires the specification of the
amplitudes and frequencies, it is clear that
relatively few peaks can be coded before all of the
available bits are used. The first step, therefore,
is to significantly reduce the number of parameters
that must be coded. One way to do this is to force
all of the frequencies to be harmonic.
During voiced speech one would expect all of
the peaks to be harmonically related and therefore,
by coding the fundamental, the locations of all of
the frequencies will be available at the receiver.
During unvoiced speech the frequency locations of the
peaks will not be harmonic. However, it
is well known from random process theory that
noise-like waveforms can be represented (in an
ensemble mean-squared error sense) in terms of a
harmonic expansion of sine waves provided the spacing
between adjacent harmonics is small enough that there
is little change in the power spectrum envelope (i.e.
intervals less than about 100 Hz). This
representation preserves the statistical properties
of the input speech provided the amplitudes and
phases are randomly varying from frame to frame.
Since the amplitudes and phases are to be coded, this
random variation inherent in the measurement
variables can be preserved in the synthetic waveform.
As a practical matter it is preferable to
estimate the fundamental frequency that characterizes
the set of frequencies in each frame, which in turn
relates to pitch extraction. For example, pitch
extraction can be accomplished by selecting the
fundamental frequency of a harmonic set of sine waves
to produce the best fit to the input waveform
according to a perceptual criterion. Other pitch
extraction techniques can also be employed.
As an immediate consequence of using the
harmonic frequency model, it follows that the number
of sine wave components to be coded is the bandwidth
of the coded speech divided by the fundamental.
Since there is no guarantee that the number of
measured peaks will equal this harmonic number,
provision should be made for adjusting the number of
peaks to be coded. Based on the fundamental, a set
of harmonic frequency bins is established and the
number of peaks falling within each bin is
examined. If more than one peak is found, then only
the amplitude and phase corresponding to the largest
peak are retained for coding. If there are no peaks
in a given bin, then a fictitious peak is created
having an amplitude and phase obtained by sampling
the short-time Fourier Transform at the frequency
corresponding to the center of the bin.
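The bin-adjustment rule can be sketched as follows (illustrative only; `stft` is a hypothetical callable returning the complex short-time Fourier transform value at a given frequency):

```python
import numpy as np

def assign_harmonic_bins(peak_freqs, peak_amps, peak_phases, f0, bandwidth, stft):
    """One coded peak per harmonic bin. Bins are centered on k*f0; the
    largest peak in a bin wins; empty bins get a fictitious peak sampled
    from the short-time Fourier transform at the bin center."""
    num_bins = int(bandwidth / f0)
    coded = []
    for k in range(1, num_bins + 1):
        center = k * f0
        lo, hi = center - f0 / 2, center + f0 / 2
        inside = [i for i, f in enumerate(peak_freqs) if lo <= f < hi]
        if inside:
            i = max(inside, key=lambda i: peak_amps[i])
            # The winning peak is assigned to the bin-center frequency,
            # yielding a harmonic series based on the coded pitch.
            coded.append((center, peak_amps[i], peak_phases[i]))
        else:
            x = stft(center)  # fictitious peak sampled at the bin center
            coded.append((center, np.abs(x), np.angle(x)))
    return coded
```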
The amplitudes are then coded by applying
the same techniques used in channel vocoders. That
is, a gain level is set, for example, by using 5
bits with 2 dB per level to code the amplitude of a
first peak (i.e. the first peak above 300 Hz).
Subsequent peaks are coded logarithmically using
delta-modulation techniques across frequency. In one
simulation 3.6 kbps were assigned to code the
amplitudes at a 50 Hz frame rate. Adaptive bit
allocation rules can be used to assign bits to
peaks. For example, if the pitch is high there will
be relatively few peaks to code, and there will be
more bits per peak. Conversely when the pitch is low
there will be relatively few bits per peak, but since
the peaks will be closer together their values will
be more correlated, hence the ADPCM coder should be
able to track them well.
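A minimal sketch of this amplitude scheme (the 5-bit, 2 dB gain ladder follows the text; the uniform quantizer for the frequency-wise differences is an assumption, since the patent does not spell one out):

```python
import numpy as np

def code_amplitudes(amps, gain_bits=5, gain_step_db=2.0, delta_bits=3,
                    delta_step_db=2.0):
    """Code the first peak's level absolutely and each subsequent peak as
    a quantized log-amplitude difference from its lower-frequency neighbor."""
    log_amps = 20.0 * np.log10(np.maximum(amps, 1e-10))
    gain_idx = int(np.clip(np.round(log_amps[0] / gain_step_db),
                           0, 2**gain_bits - 1))
    # Delta-modulation across frequency on the log amplitudes.
    dmax = 2**(delta_bits - 1) - 1
    delta_idx = np.clip(np.round(np.diff(log_amps) / delta_step_db),
                        -dmax - 1, dmax).astype(int)
    return gain_idx, delta_idx

def decode_amplitudes(gain_idx, delta_idx, gain_step_db=2.0, delta_step_db=2.0):
    log0 = gain_idx * gain_step_db
    log_amps = log0 + np.concatenate([[0.0],
                                      np.cumsum(delta_idx * delta_step_db)])
    return 10.0 ** (log_amps / 20.0)
```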
To code the phases a fixed number of bits
per peak (typically 4 or 5) is used. One method for
coding the phases is to assign the measured phase to
one of $2^n$ equal subdivisions of the $-\pi$ to $\pi$ region,
where n = 4 or 5. Another method uses the frequency track
corresponding to the phase (to be coded) to predict
the phase at the end of the current frame, unwrap the
value, and then code the phase residual using ADPCM
techniques with 4 or 5 bits per phase peak. Since
there remains only 4.4 kbps to code the phases and
the fundamental (7 bits are used), then at a 50 Hz
frame rate, it will be possible to code at most 16
peaks. At a 4 kHz speech bandwidth and four bits per
phase, all of the phases will be coded provided the
pitch is greater than 250 Hz. If the pitch is less
than 250 Hz provision has to be made for regenerating
a phase track for the uncoded high frequency peaks.
This is done by computing a differential frequency
that is the difference between the derivative of the
instantaneous cubic phase and the linear
interpolation of the end point frequencies for that
track. The differential frequency is translated to
the high frequency region by adding it to the linear
interpolation of the end point frequencies
corresponding to the track of the uncoded phase. The
resulting instantaneous frequency function is then
integrated to give the instantaneous phase function
that is applied to the sine wave generator. In this
way the phase coherence intrinsic in the voiced
speech and the phase incoherence characteristic of
unvoiced speech is effectively translated to the
uncoded frequency regions.
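The regeneration step can be sketched as follows (illustrative; the pairing of each uncoded track with a coded baseband track, and the cubic-coefficient data layout, are assumptions):

```python
import numpy as np

def regenerate_phase(t, T, base_cubic, base_w0, base_w1, hi_w0, hi_w1):
    """Translate a baseband track's frequency deviation to an uncoded
    high-frequency track, then integrate to obtain its phase.
    base_cubic: (theta, omega, alpha, beta) of the coded baseband track;
    *_w0, *_w1: endpoint frequencies (rad/sample); t: samples in [0, T]."""
    _, omega, alpha, beta = base_cubic
    # Derivative of the cubic phase, per Eq. (10).
    inst_freq = omega + 2.0 * alpha * t + 3.0 * beta * t**2
    # Differential frequency: deviation from the linear endpoint interpolation.
    diff = inst_freq - (base_w0 + (base_w1 - base_w0) * t / T)
    # Add the deviation to the high track's linear frequency interpolation...
    hi_freq = hi_w0 + (hi_w1 - hi_w0) * t / T + diff
    # ...and integrate (rad/sample) to get the regenerated phase track.
    return np.cumsum(hi_freq)
```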
In FIGURE 8 another embodiment of the
invention is shown, particularly adapted for
time-scale modification. In this illustration, the
representative sine waves are further defined to
consist of system contributions (i.e. from the vocal
tract) and excitation contributions (i.e. from the
vocal cords). The excitation phase contributions
are singled out for cubic interpolation. The
procedure generally follows that described above in
connection with other embodiments, however, in a
further step the measured amplitudes $A_\ell^k$ and phases
$\theta_\ell^k$ are decomposed into vocal tract and excitation
components. The approach is to first form estimates
of the vocal tract amplitude and phase as functions
of frequency at each analysis frame (i.e., $M(\omega, kR)$
and $\Phi(\omega, kR)$). System amplitude and phase estimates
at the selected frequencies $\omega_\ell^k$ are then given by:

$$M_\ell^k = M(\omega_\ell^k, kR) \qquad (19)$$

and

$$\Phi_\ell^k = \Phi(\omega_\ell^k, kR) \qquad (20)$$

Finally, the excitation parameter estimates at each
analysis frame boundary are obtained as

$$a_\ell^k = A_\ell^k / M_\ell^k \qquad (21)$$

and

$$\Omega_\ell^k = \theta_\ell^k - \Phi_\ell^k \qquad (22)$$

The decomposition problem then becomes that
of estimating $M(\omega, kR)$ and $\Phi(\omega, kR)$ as functions of
frequency from the high resolution spectrum $X(\omega, kR)$.
(In practice, of course, uniformly spaced frequency
samples are available from the DFT.) There exist a
number of established ways for separating out the
system magnitude from the high-resolution spectrum,
such as all-pole modeling and homomorphic
deconvolution. If the vocal tract transfer function
is assumed to be minimum phase then the logarithm of
the system magnitude and the system phase form a
Hilbert transform pair. Under this condition, a
phase estimate $\Phi(\omega, kR)$ can be derived from the
logarithm of a magnitude estimate $M(\omega, kR)$ of the
system function through the Hilbert transform.
Furthermore, the resulting phase estimate will be
smooth and unwrapped as a function of frequency.
One approach to estimation of the system
magnitude, and the corresponding estimation of the
system phase through the use of the Hilbert Transform,
is shown in FIGURE 9 and is based on a homomorphic
transformation. In this technique, the separation of
the system amplitude from the high-resolution
spectrum and the computation of the Hilbert transform
of this amplitude estimate are in effect performed
simultaneously. The Fourier transform of the
logarithm of the high-resolution magnitude is first
computed to obtain the "cepstrum". A right-sided
window, with duration proportional to the average
pitch period, is then applied. The imaginary
component of the resulting inverse Fourier transform
is the desired phase and the real part is the smooth
log-magnitude. In practice, uniformly spaced samples
of the Fourier transform are computed with the FFT.
The length of the FFT was chosen at 512 which was
sufficiently large to avoid aliasing in the
cepstrum. Thus, the high-resolution spectrum used to
estimate the sinewave frequencies is also used to
estimate the vocal-tract system function.
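A sketch of this cepstral computation (a conventional minimum-phase construction consistent with the description; the doubling window, the magnitude floor, and numpy's FFT direction conventions are assumptions):

```python
import numpy as np

def homomorphic_system_estimate(mag_spectrum, cutoff):
    """Minimum-phase system estimate via the cepstrum (after FIGURE 9).
    mag_spectrum: full-length (nfft) DFT magnitude of the frame;
    cutoff: right-sided cepstral window length in quefrency samples,
    proportional to the average pitch period."""
    nfft = len(mag_spectrum)
    cepstrum = np.fft.ifft(np.log(np.maximum(mag_spectrum, 1e-10)))
    # Right-sided (minimum-phase) cepstral window: keep c[0] once,
    # double the low positive quefrencies, zero everything else.
    w = np.zeros(nfft)
    w[0] = 1.0
    w[1:cutoff] = 2.0
    smoothed = np.fft.fft(cepstrum * w)
    log_mag = np.real(smoothed)  # smooth log-magnitude, M(w)
    phase = np.imag(smoothed)    # minimum-phase estimate, PHI(w)
    return np.exp(log_mag), phase
```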
The remaining analysis steps in the
time-scale modifying system of FIGURE 8 are analogous
to those described above in connection with the other
embodiments. As a result of the matching algorithm,
all of the amplitudes and phases of the excitation
and system components measured for an arbitrary frame
k are associated with a corresponding set of
parameters for frame k+1. The next step in the
synthesis is to interpolate the matched excitation
and system parameters across frame boundaries. The
interpolation procedures are based on the assumption
that the excitation and system functions are
slowly-varying across frame boundaries. This is
consistent with the assumption that the model
parameters are slowly-varying relative to the
duration of the vocal tract impulse response. Since
this slowly-varying constraint maps to a
slowly-varying excitation and system amplitude, it
suffices to interpolate these functions linearly.
Since the vocal tract system is assumed
slowly varying over consecutive frames, it is
reasonable to assume that its phase is slowly-varying
as well and thus linear interpolation of the phase
samples will also suffice. However, the
characteristic of "slowly-varying" is more difficult
to achieve for the system phase than for the system
magnitude. This is because an additional constraint
must be imposed on the measured phase; namely that
the phase be smooth and unwrapped as a function of
frequency at each frame boundary. It can be shown
that if the system phase is obtained modulo $2\pi$ then
linear interpolation can result in a (falsely)
rapidly-varying system phase between frame
boundaries. The importance of the use of the
homomorphic analyser of FIGURE 9 is now evident. The
system phase estimate derived from the homomorphic
analysis is unwrapped in frequency and thus
slowly-varying when the system amplitude (from which
it was derived) is slowly-varying. Linear
interpolation of samples of this function results
then in a phase trajectory which reflects the
underlying vocal tract movement. This phase function
is referred to as $\Phi_\ell(t)$, where $\Phi_\ell(0)$ corresponds to
the $\Phi_\ell^k$ of Equation (20). Finally, as before, a cubic
polynomial is employed to interpolate the excitation
phase and frequency. This will be referred to as $\Omega_\ell(t)$,
where $\Omega_\ell(0)$ corresponds to the $\Omega_\ell^k$ of Equation (22).
The goal of time-scale modification is to
maintain the perceptual quality of the original
speech while changing the apparent rate of
articulation. This implies that the frequency
trajectories of the excitation (and thus the pitch
contour) are stretched or compressed in time and the
vocal tract changes at a slower or faster rate. The
synthesis method of the previous section is ideally
suited for this transformation since it involves
summing sine waves composed of vocal cord excitation
and vocal tract system contributions for which
explicit functional expressions have been derived.
Speech events which take place at a time
$t_o$ according to the new time scale will have
occurred at $\rho^{-1} t_o$ in the original time scale. To
apply the above sine wave model to time-scale
modification, the "events" which are time-scaled are
the system amplitudes and phases, and the excitation
amplitudes and frequencies, along each frequency
track. Since the parameter estimates of the
unmodified synthesis are available as continuous
functions of time, then in theory, any rate change is
possible. In conjunction with the Equations (19) -
(22) the time scaled synthetic waveform can be
expressed as:

$$s'(n) = \sum_{\ell=1}^{L(n)} A_\ell(\rho^{-1}n)\cos[\rho\,\Omega_\ell(\rho^{-1}n) + \Phi_\ell(\rho^{-1}n)] \qquad (23)$$

where L(n) is the number of sine waves estimated at
time n. The required values in equation (23) are
obtained by simply scaling $A_\ell(t)$, $\Omega_\ell(t)$ and $\Phi_\ell(t)$ at
a time $\rho^{-1}n$ and scaling the resulting excitation
phase by $\rho$.
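For a fixed rate change $\rho$, equation (23) can be sketched directly (illustrative; the per-track parameter functions are assumed available as callables of continuous time, as the text stipulates):

```python
import numpy as np

def time_scale_synthesis(A, Omega, Phi, rho, num_out):
    """Fixed-rate time-scale modification per Eq. (23).
    A, Omega, Phi: lists of per-track callables of continuous time
    (amplitude, excitation phase, system phase); rho > 1 slows speech.
    num_out: number of output samples."""
    n = np.arange(num_out)
    t = n / rho  # output time n draws parameters from original time n/rho
    s = np.zeros(num_out)
    for a, om, ph in zip(A, Omega, Phi):
        # The excitation phase is scaled by rho so that the instantaneous
        # frequency (and hence the pitch contour) is preserved.
        s += a(t) * np.cos(rho * om(t) + ph(t))
    return s
```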

With the proposed time-scale modification
system, it is also straightforward to apply a
time-varying rate change. Here the time-warping
transformation is given by

$$t_o' = W(t_o) = \int_0^{t_o} \rho(\tau)\,d\tau \qquad (24)$$

where $\rho(\tau)$ is the desired time-varying rate change.
In this generalization, each time-differential $d\tau$ is
scaled by a different factor $\rho(\tau)$. Speech events
which take place at a time $t_o'$ in the new time scale
will now occur at a time
$t_o = W^{-1}(t_o')$ in the original time scale. If
$t_o'$ maps back to $t_o$, then one approximation is
given by:

$$t_1 \approx t_o + \rho^{-1}(t_o) \qquad (25)$$

Since the parameters of the sinusoidal components are
available as continuous functions of time, they can
always be found at the required $t_1$.
Letting $t_n$ denote the inverse to time n,
i.e., $t_n = W^{-1}(n)$, the synthetic waveform is then given by:

$$s'(n) = \sum_{\ell=1}^{L(n)} A_\ell(t_n)\cos[\tilde\Omega_\ell(n) + \Phi_\ell(t_n)] \qquad (26)$$

where

$$\tilde\Omega_\ell(n) = \tilde\Omega_\ell(n-1) + \dot\Omega_\ell(t_n) \qquad (27)$$

and

$$t_n = t_{n-1} + \rho^{-1}(t_{n-1}) \qquad (28)$$

where $\dot\Omega_\ell(t)$ is a quadratic function given by the first
derivative of the cubic phase function $\Omega_\ell(t)$,
and where

$$t_0 = 0 \qquad (29)$$

At the time a particular track is born, the
cubic phase function $\tilde\Omega_\ell(n)$ is initialized by the
value $\rho(t_{n'})\,\Omega_\ell(t_{n'})$, where $\Omega_\ell(t_{n'})$ is the
initial excitation phase obtained using (17).
It should also be appreciated that the
invention can be used to perform frequency and pitch
scaling. The short time spectral envelope of the
synthetic waveform can be varied by scaling each
frequency component and the pitch of the synthetic
waveform can be altered by scaling the
excitation-contributed frequency components.
In FIGURE 10 a final embodiment of the
invention is shown which has been implemented and
operated in real time. The illustrated embodiment
was implemented in 16-bit fixed point arithmetic
using four Lincoln Digital Signal Processors
(LDSPs). The foreground program operates on every
input A/D sample collecting 100 input speech samples
into 10 msec buffers. At the same time a 10 msec
buffer of synthesized speech is played out through a
D/A converter. At the end of each frame, the most
recent speech is pushed down into a 600 msec buffer.
It is from this buffer that the data for the
pitch-adaptive Hamming window is drawn and on which a
512 point Fast Fourier Transform (FFT) is applied.
Next a set of amplitudes and frequencies is obtained
by locating the peaks of the magnitude of the FFT.
The data is supplied to the pitch extraction module
from which is generated the pitch estimate that
controls the pitch-adaptive windows. This parameter
is also supplied to the coding module in the data
compression application. Once the pitch has been

estimated another pitch adaptive Hamming window is
buffered and transferred to another LDSP for parallel
computation. Another 512 point FFT is taken for the
purpose of estimating the amplitudes, frequencies and
phases, to which the coding and speech modification
methods will be applied. Once these peaks have been
determined the frequency tracking and phase
interpolation methods are implemented. Depending
upon the application, these parameters would be coded
or modified to effect a speech transformation and
transferred to another pair of LDSPs, where the sum
of sine waves synthesis is implemented. The
resulting synthetic waveform is then transferred back
to the master LDSP where it is put into the
appropriate buffer to be accessed by the foreground
program for D/A output.


Administrative Status

2024-08-01: As part of the transition to Next Generation Patents (NGP), the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new in-house solution.

Please note that events beginning with "Inactive:" refer to events that are no longer used in our new in-house solution.

For a better understanding of the status of the application/patent shown on this page, the Disclaimer section, and the descriptions of Patent, Event History, Maintenance Fees and Payment History should be consulted.

Event History

Description | Date
Inactive: IPC expired | 2013-01-01
Inactive: IPC deactivated | 2011-07-26
Inactive: IPC from MCD | 2006-03-11
Inactive: First IPC derived | 2006-03-11
Grant by Issuance | 1988-10-11
Inactive: Expired (old Act Patent) latest possible expiry date | 1986-03-18

Abandonment History

There is no abandonment history.

Owners on Record

The current and past owners on record are shown in alphabetical order.

Current Owners on Record
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Past Owners on Record
ROBERT J. MCAULAY
THOMAS F., JR. QUATIERI
Past owners that do not appear in the "Owners on Record" list will appear in other documents within the file.
Documents



Document Description | Date (yyyy-mm-dd) | Number of pages | Size of Image (KB)
Cover Page | 1993-08-19 | 1 | 13
Abstract | 1993-08-19 | 1 | 21
Claims | 1993-08-19 | 11 | 288
Drawings | 1993-08-19 | 8 | 130
Description | 1993-08-19 | 30 | 901