Patent 1124866 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 1124866
(21) Application Number: 377766
(54) English Title: VOICE SYNTHESIZER
(54) French Title: SYNTHETISEUR DE LA PAROLE
Status: Expired
Bibliographic Data
(52) Canadian Patent Classification (CPC):
  • 354/47
(51) International Patent Classification (IPC):
  • G10L 13/047 (2013.01)
(72) Inventors :
  • OSTROWSKI, CARL L. (United States of America)
(73) Owners :
  • FEDERAL SCREW WORKS (Not Available)
(71) Applicants :
(74) Agent: MACRAE & CO.
(74) Associate agent:
(45) Issued: 1982-06-01
(22) Filed Date: 1981-05-15
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
836,589 United States of America 1977-09-26

Abstracts

English Abstract



ABSTRACT OF THE DISCLOSURE
A highly simplified speech synthesizer that is capable
of producing quality speech. The present speech synthesizer is
adapted to be driven by an 8-bit digital input command word. Six
of the bits are used for phoneme selection and the remaining two
bits for inflection control. In a first embodiment, the system is
adapted to generate twelve parameter control signals for each
phoneme, with one of the parameters being utilized to control both
high and low frequency fricative injection into the vocal tract. This
embodiment also provides asynchronous excitation of the vocal tract
by including a second fricative excitation control circuit that is
adapted to inject white noise in parallel into the second and third
resonant filters under the control of the vocal amplitude control
signal. In a second embodiment, one of the twelve signal parameters
is utilized as two separate control signals thus effectively providing
thirteen control signal parameters. The vocal tract in the second
embodiment is also driven asynchronously with the glottal waveform
being injected in parallel into both the first and second resonant
filters. The second embodiment is also adapted to be operated off
a portable power supply.


Claims

Note: Claims are shown in the official language in which they were submitted.


THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. In an electronic device for phonetically
synthesizing human speech including
vocal source means for producing a voiced excitation
signal;
fricative source means for producing an unvoiced
excitation signal;
input means responsive to input data identifying a
desired sequence of phonemes for producing a plurality of
control signals that electronically define each phoneme in
said desired sequence of phonemes, including a first control
signal for controlling the amplitude of said voiced excitation
signal and a second control signal for controlling the amplitude
of said unvoiced excitation signal; and
vocal tract means responsive to said voiced and
unvoiced excitation signals and certain of said plurality of
control signals for substantially producing the frequency
spectrums of each of said sequence of phonemes;
the improvement comprising pause control means
connected to said input means for producing an output signal
that is effective to cause said input means to maintain the current
values of certain of said control signals beyond the normal
phoneme period whenever both said first and second control signals
are absent.

2. The speech synthesizer of claim 1 wherein said
pause control means is further adapted to terminate production of
said output signal after a predetermined time period less than
the duration of an entire phoneme period in accordance with one
of said control signals that is produced at the beginning of
each phoneme.

3. The speech synthesizer of claim 2 wherein said
one control signal is a closure delay control signal.

Description

Note: Descriptions are shown in the official language in which they were submitted.


This application is a divisional of copending Canadian
application 310,040 filed August 25, 1978.



BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates to voice synthesizers
and in particular to a highly simplified voice synthesizer
that is capable of producing quality speech.




In general, the present invention comprises a
synthesizer of the type disclosed in copending Canadian
application Serial No. 283,441 entitled "Voice
Synthesizer", and assigned to the assignee of the present
application. While the synthesizer disclosed in the cited
copending application comprises a highly sophisticated
synthesizer capable of producing remarkably realistic
sounding speech, the present invention is intended to
provide a speech synthesizer that is simpler in design,
smaller in size, and less expensive, yet nonetheless capable
of producing quality speech.

The present speech synthesizer is adapted to be
driven by an 8-bit digital input command word. Six of the
bits are used for phoneme selection, thus providing 2^6 or
64 possible phonemes, and the remaining two bits are dedicated
to inflection control. The system is adapted to generate
twelve control parameters for each phoneme. In the first
embodiment disclosed herein, one of the control signal
parameters, referred to as the fricative control, is utilized
to control the injection of both high and low frequency
fricative energy into the vocal tract. More particularly,
the system utilizes the fricative control signal and the
inverse of the fricative control signal to control the parallel
injection of fricative energy into the second and fourth (F5)
resonant filters in the vocal tract. Thus, as will subsequently
be explained in greater detail, for a given phoneme
having an unvoiced component, fricative energy is injected
directly into the F2 and F5 resonant filters, with the amount
of fricative energy that is injected into the F2 resonant
filter being inversely related to the amount injected into
the F5 resonant filter. Also included in this embodiment
is a second fricative excitation control network that is
adapted to control the injection of fricative energy in
parallel into the second and third resonant filters in the
vocal tract under the control of the vocal amplitude control
signal. Consequently, the combination of the glottal
waveform which is injected into the F1 resonant filter and
the vocal amplitude controlled fricative injection into the
F2 and F3 resonant filters provides asynchronous excitation
of the serial vocal tract. Using white noise
as the primary source of excitation for the F2 and F3
resonant filters gives the synthesizer a more "breathy"
sounding voice.
According to the present invention there is provided
an electronic device for phonetically synthesizing human speech
including vocal source means for producing a voiced excitation
signal; fricative source means for producing an unvoiced
excitation signal; input means responsive to input data
identifying a desired sequence of phonemes for producing a
plurality of control signals that electronically define each
phoneme in said desired sequence of phonemes and that include
a first control signal for controlling the amplitude of said
voiced excitation signal and a second control signal for controlling
the amplitude of said unvoiced excitation signal; vocal tract
means responsive to said voiced and unvoiced excitation signals
and certain of said plurality of control signals for substantially
producing the frequency spectrums of each of said desired
sequence of phonemes; and pause control means connected to said
input means for producing an output signal that is effective
to cause said input means to maintain the current values of
certain of said control signals beyond the normal phoneme
period whenever both said first and second control signals are
absent.
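
As an illustration only, the pause control behaviour recited above can be sketched in Python. The sketch assumes that an "absent" control signal can be modelled as a zero value, that the test is applied to the incoming phoneme, and that the held signals are the three formant controls. None of these assumptions is stated in the text, and every name below is hypothetical.

```python
# Illustrative sketch, not the claimed circuit.
# Assumptions: "absent" means a value of zero, and HELD_SIGNALS (a
# hypothetical choice) names the control signals that are held.
HELD_SIGNALS = ("F1", "F2", "F3")


def pause_detected(params):
    """True when both the vocal (VA) and fricative (FA) amplitudes are absent."""
    return params["VA"] == 0 and params["FA"] == 0


def next_control_values(current, incoming):
    """Return the control values to apply for the incoming phoneme."""
    updated = dict(incoming)
    if pause_detected(incoming):
        # Hold the current values of the selected signals through the pause.
        for name in HELD_SIGNALS:
            updated[name] = current[name]
    return updated
```

In this model the amplitude signals still follow the incoming phoneme, so the output falls silent while the held signals stay where they were.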
Additional objects and advantages will become
apparent from a reading of the detailed description of the
preferred embodiments which makes reference to the following
set of drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1a and 1b are a block diagram of one
embodiment of a voice synthesizer according to the present
invention;
Figures 2a and 2b are a circuit diagram of the
voice synthesizer shown in Figures 1a and 1b;
Figure 3 is a block diagram of another embodiment
of a voice synthesizer according to the present invention;
and
Figures 4a and 4b are a circuit diagram of the
voice synthesizer shown in Figure 3.


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS


Looking to Figures 1a and 1b, a block diagram
of one of the preferred embodiments of a voice synthesizer
according to the present invention is shown. As previously
noted, the present voice synthesizer comprises a simplified
and inexpensive version of the more sophisticated type of
synthesizer disclosed in copending Canadian application
Serial No. 283,441, entitled "Voice Synthesizer", and
assigned to the assignee of the present application. The
illustrated system is adapted to be driven by an 8-bit
digital command word. Six of the input bits 15 from the
digital command word are used for phoneme selection and
the remaining two bits 25 for varying the inflection levels
of the audio output. The six phoneme select bits 15 are
provided to a ROM storage unit 12 which has stored therein
for each of the 64 (2^6) possible phonemes which can be
identified by the six phoneme select bits, 12 different
parameters which electronically define each phoneme. Each
parameter stored in ROM memory 12 preferably comprises
four bits of resolution for producing the serialized
binary-weighted digital
control signals described in the aforementioned copending
application. Thus, the ROM memory unit 12 utilized in the
preferred embodiment must have a storage capacity of at least 4 x
12 x 64 or 3,072 bits. The memory utilized in the preferred
embodiment is a 12 x 256 bit read only memory (ROM).
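
The command word layout and the storage arithmetic above can be checked with a short Python sketch. The text does not say which bit positions carry the phoneme and inflection fields, so the placement below (phoneme in the low six bits, inflection in the high two) is an assumption, and the function names are hypothetical.

```python
# Sketch only: the bit positions are assumed, not taken from the text.
PHONEME_BITS = 6
INFLECTION_BITS = 2
PARAMS_PER_PHONEME = 12
BITS_PER_PARAM = 4


def decode_command_word(word):
    """Split an 8-bit command word into (phoneme_index, inflection_level)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("command word must fit in 8 bits")
    phoneme = word & 0x3F            # assumed: low six bits select the phoneme
    inflection = (word >> 6) & 0x03  # assumed: high two bits set the inflection
    return phoneme, inflection


def minimum_rom_bits():
    """Storage needed for every parameter of every selectable phoneme."""
    return BITS_PER_PARAM * PARAMS_PER_PHONEME * (2 ** PHONEME_BITS)
```

The minimum of 3,072 bits matches the 12 x 256 bit organization quoted above, since 12 x 256 is also 3,072.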
Storage ROM 12 is adapted to be clocked under the
control of a duty cycle address circuit 22 which provides the
appropriate timing signals on lines 21 and 23 required for the
ROM 12 to generate the serialized binary-weighted duty cycle
parameter control signals previously mentioned. The duty cycle
address control circuit 22 is connected to a clock circuit 24
that is adapted to produce a square wave clock signal at a
frequency of 20 KHz. The 20 KHz clock signal from clock circuit
24 is segregated by the duty cycle address control circuit 22
into 15 pulse groups, which are then further divided into time
segments of 8, 4, 2 and 1 clock pulses. For each group of 15
clock pulses received, the duty cycle address control circuit 22
provides a HI output signal on line 23 or the MSB line during the
eight and four time segments, and a HI output on line 21 or the
LSB line during the eight and two time segments.
As previously noted, the serialized binary-weighted
digital control parameters generated by ROM 12 preferably contain
four bits of resolution. In other words, for each phoneme
parameter, ROM 12 contains four bits of information, thereby
providing 2^4 or 16 possible values per parameter. To provide the
four bits with their appropriate binary weight, the first or most
significant of the four serialized output bits in the control
signal parameter is
generated when both the signals on lines 21 and 23 are HI; the
second bit when the LSB line is LO and the MSB line is HI; the
third bit when the LSB line is HI and the MSB line is LO; the
fourth or least significant of the four bits when the MSB and LSB
lines are both LO. Thus, it can be seen that the first or most
significant bit is produced for a period of eight clock pulses,
the second bit is produced for a period of four clock pulses,
the third bit is produced for a period of two clock pulses, and
the fourth bit is produced for a period of one clock pulse. In
this manner, an analog signal can be digitally represented as the
average magnitude of a control signal over a 15 clock pulse
period.
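
The binary-weighted serialization described above can be sketched in Python as follows. The order of the four segments within each 15-pulse group (8, 4, 2, 1) is an assumption, and the function names are hypothetical. The sketch shows that the average of the serialized signal over one group equals the stored value divided by 15.

```python
# Sketch under assumptions: segment order within a group is taken as 8, 4, 2, 1.
# Each entry is (MSB line 23, LSB line 21, number of clock pulses).
SEGMENTS = ((1, 1, 8), (1, 0, 4), (0, 1, 2), (0, 0, 1))


def duty_cycle_frame(value):
    """Return the 15 output levels produced for one 4-bit parameter value."""
    if not 0 <= value <= 15:
        raise ValueError("parameter value must fit in 4 bits")
    frame = []
    for msb_line, lsb_line, pulses in SEGMENTS:
        # Both lines HI select the most significant bit (bit 3);
        # both lines LO select the least significant bit (bit 0).
        bit_index = 2 * msb_line + lsb_line
        bit = (value >> bit_index) & 1
        frame.extend([bit] * pulses)
    return frame


def average_level(value):
    """Average of the serialized signal over one 15-pulse group."""
    frame = duty_cycle_frame(value)
    return sum(frame) / len(frame)
```

For example, the value 10 (binary 1010) holds the output HI for the eight-pulse and two-pulse segments, giving 10 HI pulses out of 15.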
Although known to the art, the particular control signal
parameters generated by ROM 12 will be briefly explained to
provide a better understanding of the operation of the present
system.
The F1, F2, and F3 control signals determine the
locations of the resonant frequency poles in the first three
variable resonant filters 42, 44, and 46 respectively, in the
vocal tract 60. The timing control signal (Timing) is generated
for each phoneme and is used to establish the period of
production for each phoneme. The vocal amplitude control signal
(VA) is generated whenever a phoneme having a voiced component is
present. The vocal amplitude control signal controls the
intensity of the voiced component in the audio output. The vocal
delay control signal (VD) is generated during certain
fricative-to-vowel phonetic transitions wherein the amplitude of
the fricative constituent is rapidly decaying at the same time
the amplitude of the vocal constituent is rapidly increasing.
The vocal delay control signal is thus utilized to delay the
transmission of the vocal amplitude control signal under such
circumstances. The closure control signal (CL) is used to
simulate the phoneme interaction which occurs, for example,
during the production of the phoneme "b" followed by the phoneme
"e". In particular, the closure control signal, when provided to
the closure network 50, is adapted to cause an abrupt amplitude
modulation in the audio output that simulates the build-up and
sudden release of energy that occurs during the pronunciation of
such phoneme combinations. The vocal spectral contour control
signal (VSC) is used to spectrally shape the energy spectrum of
the vocal excitation signal. Specifically, the vocal spectral
contour control signal controls a first order low pass filter in
circuit block 40 that suppresses the vocal energy injected into
the vocal tract, with maximum suppression occurring in the
presence of purely unvoiced phonemes. The F2Q control signal
varies the "Q" or bandwidth of the second order resonant filter
44 in the vocal tract 60, and is used primarily in connection
with the production of the nasal phonemes "n", "m" and "ng".
Nasal phonemes typically exhibit a higher amount of energy at the
first formant (F1), and substantially lower and broader energy
content at the higher formants. Thus, during the presence of
nasal phonemes, the F2Q control signal is generated to reduce the
Q of the F2 resonant filter 44 which, due to the cascaded
arrangement of the resonant filters in the vocal tract, prevents
significant amounts of energy from reaching the higher formants.
The fricative amplitude control signal (FA) is generated whenever
a phoneme having an unvoiced component is
present and is used to control the intensity of the unvoiced
component in the audio output. The closure delay control signal
(CLD) is generated during certain vowel-to-fricative phonetic
transitions wherein it is desirable to delay the transmission of
the closure and fricative amplitude control signals in the same
manner as that discussed in connection with the vocal delay
control signal. Finally, a unique fricative control signal (FC)
is provided which replaces two control signals normally provided
in synthesizers of this type; i.e., the fricative frequency and
fricative low pass control signals. Specifically, it has been
determined that, in general, when a fricative phoneme requires
low frequency fricative energy in the range of the F2 formant, it
does not also require high frequency fricative energy in the
range of the F5 formant, and vice versa. Thus, the present
invention utilizes a single fricative control (FC) signal, and
the inverse of the FC control signal (FC), to control the
injection of both low and high frequency fricative energy into
the vocal tract 60. The specific manner in which this is
accomplished will be subsequently explained in greater detail.
The output control signal parameters from ROM 12 are
applied to a plurality of relatively slow-acting transition
filters 14. In actuality, the binary-weighted duty cycle control
signals are effectively converted to analog signals by the
transition filters, and then converted back to duty cycle digital
signals by comparator amplifiers provided with a 20 KHz triangle
clock signal from clock circuit 24. The transition filters 14
are purposefully designed to have a relatively long response time
in relation to the steady-state
duration of a typical phoneme so that the abrupt amplitude
variations in the output control signals from ROM 12 will be
eliminated. Thus, the transition filters 14 provide gradual
changes between the steady-state levels of the control signal
parameters to simulate the smooth transitions between phonemes
present in human speech. The response times of the transition
filters 14 utilized in the preferred embodiment are fixed, thus
eliminating the extensive amount of circuitry necessary to
provide variable speech rate capability.
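
The smoothing action of a slow, fixed-rate transition filter can be illustrated with a simple Python sketch. It models the filter as a single-pole low-pass stage, which is an assumption made for brevity; the time constant, step size, and function name are likewise hypothetical and are not taken from the text.

```python
import math


def smooth(targets, dt, tau):
    """Low-pass a sequence of target levels with a fixed time constant.

    targets: steady-state control levels, one per time step
    dt:      time step in seconds (assumed value chosen by the caller)
    tau:     filter time constant in seconds (assumed, fixed)
    """
    alpha = 1.0 - math.exp(-dt / tau)
    level = targets[0]
    smoothed = []
    for target in targets:
        # Move a fixed fraction of the remaining distance on every step.
        level += alpha * (target - level)
        smoothed.append(level)
    return smoothed
```

Because tau is fixed, an abrupt step between two parameter values always becomes the same gradual glide, which corresponds to the fixed speech rate noted above.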
The phoneme timer circuit 20 is adapted to produce a
ramp signal that varies from five volts to zero volts in a time
period that determines the duration of phoneme production. The
slope of the ramp signal produced by the phoneme timer circuit 20
is dependent upon the value of the phoneme timing control signal
from ROM 12. The vocal delay control signal (VD) is provided to
a vocal delay network 16 which is adapted to delay the
transmission of the vocal amplitude control signal for a
predetermined period of time less than the duration of a single
phoneme time interval whenever the vocal delay control signal is
provided by ROM 12. The closure delay control signal (CLD) is
provided to the closure delay network 18 which functions similarly
to the vocal delay network 16 and is adapted to delay the
transmission of the fricative amplitude and closure control
signals whenever the closure delay control signal is provided by
ROM 12.
The two inflection select bits 25 from the 8-bit input
command word are provided directly to an inflection transition
filter circuit 32 which combines the binary-weighted bits into a
single analog
inflection control signal, and then supplies the signal
to a transition filter which smooths the abrupt amplitude
variations in the inflection control signal in the same
manner as that previously described with respect to
transition filters 14. The output from the inflection
transition filter circuit 32 is provided to the vocal
excitation or glottal source 34 which generates the voiced
excitation signal or glottal waveform. The output from
the inflection transition filter 32 determines the pitch
of the voiced component, which corresponds to the fundamental
frequency (F0) of the glottal waveform. In the preferred
embodiment of the present invention, the glottal waveform
generated by the vocal excitation source 34 comprises a
truncated sawtooth type waveform similar to that described in
copending Canadian application, Serial No. 283,441, referred to above.
The glottal waveform from the vocal excitation source
34 is provided to the vocal tract 60 via the vocal excitation
controller circuit 40. The vocal excitation controller 40
is adapted to spectrally shape the energy content of the
glottal waveform in accordance with the vocal spectral
control signal, and modulate the amplitude of the vocal
excitation signal in accordance with the vocal amplitude
control signal.
The fricative excitation energy or unvoiced phoneme
quantity of human speech is supplied by a white noise
generator 26. Injection of the fricative excitation signal
into the vocal tract 60 is controlled by the fricative
excitation controller circuit 58 and a novel second parallel
fricative injection control network 38. The fricative excitation
controller 58 is shown broken down into its three individual
circuits 28, 30 and 36 to emphasize the unique manner in which
injection of the fricative component into the vocal tract 60 is
controlled by this embodiment of the present invention. In
particular, a conventional voice fricative network 30 is provided
which is adapted to modulate the fricative amplitude control
signal in accordance with the glottal waveform whenever a phoneme
requiring voiced energy is generated, as determined by the
existence of a vocal amplitude control signal. The fricative
amplitude control signal is then provided to a high pass filter
and fricative amplitude control network 28 which is adapted to
filter the fricative excitation signal from the white noise
generator 26 and modulate the amplitude of the signal in
accordance with the fricative amplitude control signal. The
modulated fricative excitation signal is then provided to a novel
fricative injection control network 36 which is adapted to
control the injection of fricative energy into the vocal tract 60
under the control of a single fricative control signal. The
fricative excitation signal from the output of the fricative
excitation controller 58 is parallel injected into both the F2
resonant filter 44 and the fricative or F5 resonant filter 54.
As previously noted, the output from the white noise
generator 26 is also provided to a second parallel fricative
injection control network 38. Significantly, it will be noted
that the parallel fricative injection control network 38 is
adapted to control the injection of fricative energy into the
second and third resonant filters 44 and 46 under the control of
the vocal amplitude control signal. As will subsequently be
explained in greater detail, although the F1, F2 and
F3 resonant filters 42, 44 and 46 respectively, are connected in
serial form, the vocal excitation signal injected into the F1
resonant filter 42 does not have sufficient energy outside the
F1 frequency range to adequately drive the second and third
resonant filters 44 and 46 respectively. Rather, in the
embodiment illustrated in Figures 1a and 1b, the second and third
resonant filters 44 and 46 are driven substantially with the
white noise under the control of the vocal amplitude control
signal. The result of this arrangement is to provide the present
voice synthesizer with a more "breathy" or "hoarse" sounding
voice.
The outputs from the first three serially connected
resonant filters 42, 44 and 46 are summed with the output from the
fifth or fricative resonant filter 54, as indicated at 48, and
the combined output is provided through the closure network 50
and a low pass filter 52 to an appropriate audio transducer. The
closure network 50 is adapted to abruptly modulate the amplitude
of the audio output signal in accordance with the closure control
signal as previously described. The low pass filter 52 is
adapted to filter the effects of the 20 KHz clock signal from the
audio output.
Referring now to Figures 2a and 2b, a circuit diagram of
the embodiment of the present voice synthesizer illustrated in
Figures 1a and 1b is shown. As previously mentioned in
connection with the description of the block diagram, the present
voice synthesizer is adapted to be driven by an 8-bit digital
input command word. The six input bits utilized for phoneme
selection 74 are connected in parallel to a pair of ROM memories
70 and 72. Two ROM IC chips are utilized to provide the required
storage capability previously
discussed. As also noted earlier, ROM memories 70 and 72 are
adapted to produce binary-weighted duty cycle output control
signals comprising the electronic parameters of the synthesized
speech. In that the present invention constitutes an improvement
in voice synthesizers and much of the circuitry is duplicative
for each control signal of the circuitry known to the art, only
the circuitry associated with the closure control signal, by
example, will be explained in detail.
When a closure control signal is produced at the output
of ROM 72, it is provided through a CMOS buffer 78 to a fixed
rate RC transition filter comprising resistors Rl and R2 and
capacitors Cl and C2. The transition filter as noted, serves to
smooth the abrupt amplitude variations in the binary-weighted
digital control signal produced by ROM memory 72. Additionally,
it will be noted that prior to application to the transition
filter, the closure control signal is provided through an analog
gate 82, the control terminal of which is connected to the
closure delay control signal on line 81. As also discussed
above, the closure delay control signal serves to momentarily
delay the transmission of the closure control signal (as well as
the fricative amplitude control signal) during certain
vowel-to-fricative phoneme transitions.
Once the closure control signal has been provided
through the transition filter and effectively converted thereby
to an analog signal, it is converted back to a digital square
wave signal having a duty cycle proportional to the amplitude of
the analog signal. This is accomplished by connecting the output
of the transition filter to
the negative input of a comparator amplifier 80. The positive
input of comparator amplifier 80 is supplied with a 20 KHz
triangle signal from the output of clock circuit 85. Comparator
amplifier 80 effectively pulse width modulates the analog control
signal provided to its negative input so that the output signal
provided on line 84 comprises a square wave signal whose duty
cycle is proportional to the magnitude of the analog signal
provided to its negative input. The duty cycle closure control
signal on line 84 is then provided to the control terminal of an
analog gate 86 which is connected in circuit with the audio
output line. The closure control signal on line 84 is adapted to
momentarily render nonconductive analog gate 86 so as to cause an
abrupt amplitude modulation of the audio output. As previously
noted, the closure control signal is generated for certain
phoneme interactions such as the phoneme "b" followed by the
phoneme "e".
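
The conversion of the smoothed analog level back into a duty cycle signal by comparison with a triangle wave can be sketched in Python. The sketch uses normalized units (levels between 0 and 1) and counts the fraction of a period during which the level exceeds the triangle; which comparator output state tracks the level in the actual circuit depends on the input polarity, so the polarity here is an assumption, as are the function names.

```python
def triangle(phase):
    """Unit triangle wave: rises from 0 to 1 over the first half period, then falls."""
    return 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)


def pwm_duty(level, samples_per_period=1000):
    """Fraction of one triangle period during which the level exceeds the triangle."""
    above = sum(
        1
        for i in range(samples_per_period)
        if triangle(i / samples_per_period) < level
    )
    return above / samples_per_period
```

Over one triangle period the measured fraction tracks the input level, which is the proportionality between duty cycle and analog magnitude described above.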
As discussed in connection with the description of the
block diagram in Figures 1a and 1b, the remaining two bits 76 in
the 8-bit digital input command word are utilized for inflection
control. The two binary-weighted bits 76 are combined and
provided through a transition filter 88 to smooth the abrupt
amplitude variations in the combined signal. The resulting
analog signal on line 89 is provided to a sawtooth generator
circuit 90 which essentially comprises an integrator amplifier 91
that is adapted to produce a sawtooth waveform at its output at
node 95. The frequency of the sawtooth waveform generated by
circuit 90 is dependent upon the magnitude of the signal provided
to the negative input of integrator amplifier 91. Thus, it can
be seen that by varying the setting of inflection bits 76, the
fundamental frequency (F0) of the glottal waveform is
varied.
The sawtooth waveform at node 95 is provided
through an additional waveform shaping circuit 100 that is
adapted to effectively truncate the sawtooth waveform by
subtracting the lower half of the signal. The resulting
output signal on line 104 represents the glottal waveform
that is injected into the vocal tract. For a more detailed
explanation of the vocal excitation source circuitry, see the
aforementioned copending Canadian application, Serial No.
283,441.
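
One possible reading of the truncation step is sketched below in Python. It assumes a sawtooth normalized to the range 0 to 1 and interprets "subtracting the lower half" as removing everything below the midpoint, so that only the upper half of each ramp remains. The actual waveform shaping in circuit 100 may differ, and the function names and sample counts are hypothetical.

```python
def sawtooth(n_samples, period):
    """Rising sawtooth between 0 and 1 with the given period in samples."""
    return [(i % period) / period for i in range(n_samples)]


def truncate_lower_half(wave, midpoint=0.5):
    """Remove the lower half of the waveform, keeping only what exceeds the midpoint."""
    return [max(sample - midpoint, 0.0) for sample in wave]
```

Shortening the period raises the fundamental frequency of the resulting waveform, which is how the inflection level would change the pitch in this model.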
Additionally, it will be noted that the sawtooth
waveform at node 95 is also provided through an inverting
amplifier 97 to the input of a NOR-gate 98. NOR-gate 98 is
controlled by the output of op amp 94 which is adapted to
enable NOR-gate 98 whenever a vocal amplitude control signal
is produced on line 92. When a vocal amplitude control signal
is present on line 92, the output from op amp 94 will go LO,
thereby causing NOR-gate 98 to "square-up" the sawtooth
waveform from the output of op amp 97. The square wave signal
from the output of NOR-gate 98 is then provided to the input
of another NOR-gate 102 which has its other input connected
to receive the fricative amplitude control signal on line 96.
Thus, it can be seen that when a vocal amplitude control signal
is present on line 92, thereby enabling NOR-gate 98, NOR-gate
102 will "chop" the fricative amplitude control signal on line
96 in accordance with the "squared-up" sawtooth waveform from
node 95. When a vocal amplitude control signal is not present
on line 92, NOR-gate 98 is thereby inhibited, rendering its
output LO, which in turn makes NOR-gate 102 appear like an inverter
permitting the fricative
amplitude control signal on line 96 to pass unaffected by the
square wave signal. It will be noted, that since the frequency
of the sawtooth waveform at node 95 is approximately 200 times
slower than the duty cycle frequency of the fricative amplitude
control signal on line 96 (100 Hz. vs. 20 KHz), the "chopping" of
the fricative amplitude control signal by the sawtooth waveform
is effective to substantially diminish the fricative or unvoiced
speech component whenever a phoneme requiring voiced energy, as
indicated by the presence of a vocal amplitude control signal, is
present.
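
The gating behaviour of the two NOR-gates can be modelled at the logic level in Python. The sketch treats every signal as 0 or 1, assumes that the output of op amp 94 is HI when no vocal amplitude control signal is present, and reduces the inverted sawtooth to a single logic level; these simplifications and the function names are assumptions.

```python
def nor(a, b):
    """Two-input NOR gate on logic levels 0 and 1."""
    return 0 if (a or b) else 1


def chopped_fricative(fa, saw_high, vocal_present):
    """Logic level on line 96' for one instant.

    fa:            fricative amplitude duty cycle signal on line 96 (0 or 1)
    saw_high:      inverted sawtooth seen as a logic level at NOR-gate 98 (0 or 1)
    vocal_present: True when a vocal amplitude control signal is on line 92
    """
    enable_n = 0 if vocal_present else 1  # op amp 94 goes LO when VA is present
    gate98 = nor(saw_high, enable_n)      # squared-up sawtooth, or LO if inhibited
    return nor(gate98, fa)                # NOR-gate 102
```

With no vocal amplitude signal the output is simply the inverse of the fricative amplitude signal. With one present, the output is forced LO for the part of each glottal cycle in which NOR-gate 98 is HI.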
The fricative amplitude control signal from the output
of NOR-gate 102 on line 96' is provided to the control terminal
of an analog gate 106 that is connected in circuit to the output
of the white noise generator 110. The fricative excitation
signal on line 108 produced by generator 110 is effectively
amplitude modulated by the rapid on/off cycling of analog gate
106 under the control of the fricative amplitude duty cycle
control signal. The modulated signal is then provided through a
4 KHz high pass filter 122 to an additional pair of analog
gates 118 and 120. Analog gates 118 and 120 are adapted to
control the injection of fricative excitation energy into the F2
and F5 resonant filters in the vocal tract. Unlike previous
synthesizers, the present invention is adapted to control the
injection of fricative energy into the vocal tract with a single
control parameter; herein the fricative control (FC) signal.
Thus, the circuitry required to generate an additional control
parameter is eliminated. Upon examination of the frequency
spectrum of fricative phonemes, it was determined that for the
most part phonemes requiring substantial
amounts of low frequency fricative energy in the range of the F2
formant, do not also require substantial amounts of high
frequency fricative energy in the range of the F5 formant, and
vice versa. For example, for fricative phonemes such as "f" and
"p", fricative energy must be injected primarily into the F2
resonant filter, and for phonemes such as "s" and "t", it is
necessary to inject fricative energy primarily into the F5
resonant filter. Consequently, the present system is adapted to
generate a single fricative control parameter (FC) on line 112
which is also provided through an inverting comparator amplifier
114 to produce the inverse of the fricative control parameter
(FC) on line 116. The fricative control parameter on line 112 is
connected to the control terminal of analog gate 118 and is
adapted to control the injection of low frequency fricative
energy on line 124 into the F2 resonant filter, and the inverse
of the fricative control signal on line 116 is connected to the
control terminal of analog gate 120 and is adapted to control the
injection of high frequency fricative energy on line 126 into the
fricative or F5 resonant filter. Thus, it will be appreciated
that the amount of fricative energy that is injected into the F2
resonant filter is inversely related to the amount of fricative
energy that is injected into the F5 resonant filter.
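The complementary gating described above can be illustrated with a minimal numerical sketch. It assumes that the FC parameter can be represented as a fraction between 0 and 1 (the proportion of time analog gate 118 conducts), so that the inverted signal on line 116 makes analog gate 120 conduct for the remainder. The function name and the unit noise level are hypothetical and are not part of the disclosure.

```python
from typing import Tuple


def split_fricative_energy(fc: float, noise_level: float = 1.0) -> Tuple[float, float]:
    """Return (share into F2, share into F5) for a given FC value.

    Assumption: fc is the fraction of time analog gate 118 conducts (0..1);
    the inverted FC signal makes analog gate 120 conduct the rest of the time.
    """
    if not 0.0 <= fc <= 1.0:
        raise ValueError("fc must lie between 0 and 1")
    to_f2 = fc * noise_level          # low frequency path on line 124
    to_f5 = (1.0 - fc) * noise_level  # high frequency path on line 126
    return to_f2, to_f5
```

Under this assumption a single parameter fixes both shares, which is why raising the F2 share necessarily lowers the F5 share.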
The voiced component or glottal waveform on line 104
from the voiced excitation source is injected into the vocal
tract at the F1 resonant filter. Injection of the voiced
component into the vocal tract is controlled by the vocal
spectral contour control signal on line 140 and the vocal
amplitude control signal on line 128.



In particular, the vocal amplitude and vocal spectral contour
control signals are connected to the control terminals of analog
gates 130 and 142 respectively, which are connected in circuit
with the voiced excitation signal on line 104. As previously
noted, the vocal spectral contour control signal is adapted to
spectrally shape the energy content of the voiced excitation
signal by controlling the cutoff frequency of a first order low
pass filter 143, and the vocal amplitude control signal is
adapted to modulate the amplitude of the voiced excitation
signal.
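As a rough digital analogue of the two controls just described, the sketch below passes a sampled excitation signal through a one-pole low pass filter with an adjustable cutoff and then scales it by an amplitude factor. The sample rate, the parameter names, and the discrete-time form are assumptions made for illustration only; the disclosed circuit is analog and uses duty cycle controlled analog gates.

```python
import math


def shape_voiced_excitation(samples, cutoff_hz, amplitude, sample_rate=8000.0):
    """Apply a first order low pass filter, then an amplitude gain.

    cutoff_hz stands in for the vocal spectral contour control and
    amplitude for the vocal amplitude control (both hypothetical names).
    """
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # equivalent RC time constant
    alpha = dt / (rc + dt)                   # one-pole smoothing coefficient
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)         # low pass step
        out.append(amplitude * state)        # amplitude modulation
    return out
```

Lowering cutoff_hz removes more high frequency content from the excitation, while amplitude only changes its level.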
Although the F1, F2, and F3 resonant filters are serially
connected, the voiced excitation signal in the preferred
embodiment herein does not contain enough high frequency energy to
adequately drive the F2 and F3 resonant filters. This, of course,
is contrary to conventional practice wherein the first three
resonant filters in the vocal tract are driven principally by the
voiced component of speech. However, in order to provide the
present synthesizer with a more "breathy" or "hoarse" voice, the
second and third resonant filters herein are driven principally
with fricative energy under the control of the vocal amplitude
control signal. Specifically, the output from the white noise
generator 110 on line 108 is injected directly into the F2
resonant filter through resistor R4 and into the F3 resonant
filter through resistor R5. Injection of white noise into the F2
and F3 resonant filters is controlled by analog gate 134 which has
its control terminal connected to receive the vocal amplitude
control signal on line 128. Thus, it can be seen that the F2 and
F3 resonant filters in the present embodiment are driven
asynchronously in
parallel, with white noise under the control of the vocal
amplitude control signal. The asynchronous drive of the F2 and F3
resonant filters derives from the fact that residual vocal energy
from the output of the F1 resonant filter does cause a certain
amount of excitation of the F2 and F3 resonant filters. However,
due to the inherent delay created by the voice component passing
through the F1 resonant filter, the F2 and F3 resonant filters are
subject to double excitation; first with fricative energy through
resistors R4 and R5 and secondly by the delayed vocal energy from
the output of the F1 resonant filter.
Finally, as noted in the block diagram, the output from
the F1, F2 and F3 serially connected resonant filters in the vocal
tract is combined with the output from the fricative or F5
resonator by summing circuit 144 and provided through a low pass
filter circuit 146 to an appropriate audio transducer device.
Looking now to Figure 3, a block diagram of another
embodiment of the present invention is shown. The blocks
appearing in Figure 3 which correspond to blocks shown in the
first embodiment illustrated in Figures 1a and 1b are labeled with
primed reference numerals. As can be readily seen from the
diagram, the embodiment illustrated in Figure 3 is also driven by
an 8-bit digital input command word with six of the input bits
utilized for phoneme selection and the remaining two bits used for
inflection control. As in the first embodiment, the read-only
memory unit 12' is adapted to generate twelve control signal
parameters for each phoneme. However, it will be noted that one
of the signal parameters is utilized
to produce two separate control signals; i.e., the vocal spectral
contour and fricative frequency control signals. The generation
of a separate fricative frequency control signal permits the
fricative control signal, as it was referred to in the first
embodiment, to be used solely as a fricative low pass (FLP)
control signal. Thus, a conventional fricative excitation
controller network 58' can be utilized.
The second embodiment also includes a unique pause
control circuit 150 which is adapted to "hold" the values of
certain critical control parameters from the output of ROM 12'
whenever a pause in the audio output is detected. The purpose of
the pause control circuit 150 is to prevent the values of the
critical control parameters from changing and thus altering the
characteristics of the vocal tract 60 before the audio has
completely faded out. The pause control circuit 150 is adapted to
detect a pause by continuously monitoring the fricative amplitude
and vocal amplitude control signals and providing an output signal
whenever both signals are LO. The output signal produced thereby
is fed back to the latch circuits at the outputs of ROM 12' to
"hold" the parameters at their current values. The pause control
circuit 150 is further adapted to terminate the "hold" signal
after a predetermined period into the pause phoneme as determined
by the closure delay control signal from closure delay network
16'.
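The behaviour of the pause control circuit can be sketched as a latch that ignores new ROM values while a hold condition is true. This is a behavioural illustration only: the class name, method name, and boolean arguments are hypothetical, and the real circuit operates on analog control signals rather than Python booleans.

```python
class HoldingLatch:
    """Latch that keeps its value while a pause 'hold' is in effect."""

    def __init__(self, initial):
        self.value = initial

    def update(self, rom_value, fric_amp_hi, vocal_amp_hi, closure_delay_hi):
        # A pause is detected when both amplitude control signals are LO.
        pause = (not fric_amp_hi) and (not vocal_amp_hi)
        # The hold ends once the closure delay signal goes HI.
        hold = pause and not closure_delay_hi
        if not hold:
            self.value = rom_value  # normal operation: follow the ROM output
        return self.value
```

In this sketch the stored parameter stays at its pre-pause value for the first part of the pause and only then follows the ROM output for the pause phoneme.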
The remaining differences in the present embodiment are
found in the vocal tract 60' and the manner in which the voiced
and unvoiced excitation signals are injected into the vocal tract
60'. Specifically, the F1, F2, F3 and F5 resonant filters 42',
44', 46' and 54' respectively, in the present embodiment are all serially
connected rather than having the F5 resonant filter connected in
parallel with the first three serially connected resonant filters
as in the first embodiment. Additionally, it will be noted that a
feedback path has been added between the F2 and F1 resonant
filters 44' and 42' and between the F3 and F2 resonant filters 46' and
44'. These feedback paths are provided to simulate the back
pressures which are generated in the human voice system between
the tongue, mouth and vocal chords.
Finally, it will be noted that the present embodiment
also provides asynchronous parallel excitation of the vocal tract
60'. However, unlike the first embodiment, the asynchronous
parallel excitation herein is supplied solely by the voiced
component. In particular, it can be seen that the output from the
fricative excitation controller 58' is only injected in parallel
into the F2 and F5 resonant filters 44' and 54' in the
conventional manner. However, the voiced excitation signal from
the output of the vocal excitation controller 40', in addition to
being injected into the F1 resonant filter 42', is also injected
in parallel into the F2 resonant filter 44'. Thus, the F2
resonant filter 44', and to a lesser extent the F3 resonant filter
46', are driven twice; first by the direct injection of vocal
energy into the F2 resonant filter 44', and subsequently by the
delayed vocal energy from the output of the F1 resonant filter
42'. The purpose of this arrangement is to more accurately
simulate the true action of the human glottis which provides a
type of "double" excitation of the vocal chords each time it opens
and closes.
Referring now to Figures 4a and 4b, a circuit diagram of
the embodiment of the present invention illustrated in Figure 3 is
shown. At the outset, it is to be noted that the voice
synthesizer illustrated in Figures 4a and 4b is adapted to operate
off a 12 volt power supply. In actuality, the system will
function off a supply that varies anywhere from 6 volts to 15
volts. Thus, this embodiment of the present invention is
particularly suited for use in combination with a portable battery
power source.
The power requirements of the present system are such that
four discrete voltage levels are needed. In addition to the +V
(e.g. 12 volts) and ground potentials provided by the battery, the
present system includes a power supply circuit 220 that is
adapted to generate two additional voltage levels, designated
+V1 and +V2, between +V and ground. However, since the voltage
output of a battery will vary over its useful life, it is
important that the +V1 and +V2 voltage levels vary
correspondingly. Thus, the present power supply circuit 220
includes a pair of voltage follower circuits 222 and 224 which are
adapted to produce output signals that "follow" variations in the
voltage level of the signals provided to their inputs.
Additionally, the change to a variable power source also
mandates the use of op amps in certain portions of the circuit
that are capable of providing an adequate current sink at their
minimum rated voltage. Accordingly, the preferred embodiment
utilizes Fairchild 798 op amps for those op amps designated with
the letter "A".
The ROM storage requirement is supplied in this
embodiment by three individual CMOS ROM memory chips 152, 154, and
156, herein No. MC14524. The outputs from ROM memories 152, 154,
and 156 are provided to latch circuits 158, 160 and 162
respectively, which serve the purpose of the CMOS buffers used in
the first embodiment to drive the slow-acting transition filters,
and also serve to inhibit the CMOS ROM data outputs from going HI
during address switching. Latch circuit 158 is a tri-state latch,
the third state providing a sample-and-hold function.
As discussed previously, the transitional changes in the
values of the more critical control parameters may give rise to a
condition most noticeable with the last phoneme before a pause,
wherein the value of the control parameters may change prior to
complete dissipation of the excitation energy in the vocal tract.
The result is that the last phoneme before a pause will begin to
take on a different characteristic and therefore a different sound
as the audio fades out. To rectify this situation, the fricative
amplitude control signal on line 164 and the vocal amplitude
control signal on line 166 are provided to a NOR-gate 168 which
has its output connected to the negative input of a comparator
amplifier 170. When both the fricative amplitude and vocal
amplitude control signals are LO, the output from NOR-gate 168
will go HI, causing the output of comparator amplifier 170 on line
171 to go LO. The LO signal on line 171 in turn causes the output
of NOR-gate 172 to go HI, thereby switching tri-state latch 158 to
its sample-and-hold state. Additionally, the HI output signal
from NOR-gate 172 on line 176 is provided through an inverter 178
to the control terminals of a pair of analog gates 180 and 182.
Analog gates 180 and 182 are connected in circuit with the
vocal spectral contour (VSC + FF) and F2Q control signals
respectively, appearing at the Q1 and Q2 outputs of latch circuit
160. When the signal on line 176 goes HI causing the output of
inverter 178 to go LO, analog gates 180 and 182 are open
circuited, thus isolating the transition filters associated with
the VSC + FF and F2Q control signals from further changes in the
output state of latch 160.
Thus, it can be seen that whenever a pause phoneme is
detected, as determined by the absence of both the vocal amplitude
and fricative amplitude control signals, the F1, F2, F3, and FLP
control signal parameters appearing at the outputs of tri-state
latch 158 are held at their current values, and the transition
filters associated with the vocal spectral contour, fricative
frequency, and F2Q control signals are isolated from the outputs
of latch 160. Accordingly, it can be seen that the capacitors in
the transition filters associated with each of the various
critical control signal parameters identified are effectively
isolated during the initial part of the pause phoneme from further
changes in the ROM outputs to insure that the vocal energy in the
vocal tract completely fades out before the existing phoneme
parameters are changed.
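At the gate level, the chain from NOR-gate 168 through inverter 178 can be written out as boolean logic. The sketch below treats HI as True and LO as False and models comparator amplifier 170 as a simple inverter because its negative input is driven; these simplifications, and the function names, are assumptions for illustration. The closure delay input (CLD) is the second input to NOR-gate 172 on line 174.

```python
def nor(a: bool, b: bool) -> bool:
    """Two-input NOR: HI only when both inputs are LO."""
    return not (a or b)


def pause_hold_logic(fric_amp: bool, vocal_amp: bool, closure_delay: bool):
    """Return (line 176 hold signal, whether analog gates 180/182 conduct)."""
    gate_168 = nor(fric_amp, vocal_amp)      # HI when both amplitudes are LO
    line_171 = not gate_168                  # comparator 170 acts as an inverter
    line_176 = nor(line_171, closure_delay)  # NOR-gate 172: HI means "hold"
    gates_conduct = not line_176             # inverter 178 drives gates 180, 182
    return line_176, gates_conduct
```

Enumerating the eight input combinations shows that, in this model, the hold is asserted in exactly one case: both amplitude signals LO and the closure delay signal still LO.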
The HI signal on line 176 at the output of NOR-gate 172
is automatically terminated after a predetermined period of time
into the pause phoneme to permit resumption of normal circuit
operation. In particular, the other input to NOR-gate 172 is
connected to receive the closure delay (CLD) duty cycle control
signal on line 174 from the output of comparator amplifier 175.
The output from comparator amplifier 175 is always initially LO
at the beginning of a phoneme period due to the triangle ramp
signal (TR) provided to its negative input from the phoneme timer
circuit 200. However, after a predetermined period of time less
than the duration of an entire phoneme period, the magnitude of
the TR signal will drop below the magnitude of the CLD control
signal provided to the positive input of comparator amplifier 175,
thus causing its output on line 174 to go HI. The predetermined
period of time is, of course, dependent upon the slope of the TR
signal which is in turn controlled by the phoneme timing control
signal on line 204. When the closure delay duty cycle control
signal on line 174 goes HI, the output of NOR-gate 172 goes LO,
thus removing the sample-and-hold signal from tri-state latch 158
and rendering analog gates 180 and 182 conductive.
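The dependence of the hold period on the slope of the TR signal can be made concrete with a small calculation. The sketch assumes the ramp falls linearly from a starting level; the voltages and slope used in the test values are hypothetical, since the description gives no component values for the phoneme timer.

```python
def hold_release_time(tr_start: float, tr_slope: float, cld_level: float) -> float:
    """Seconds into the phoneme at which TR falls below CLD.

    Assumes a linear falling ramp tr(t) = tr_start - tr_slope * t,
    with tr_slope given in volts per second (must be positive).
    """
    if tr_slope <= 0:
        raise ValueError("tr_slope must be positive")
    if cld_level >= tr_start:
        return 0.0  # comparator 175 would already be HI at the start
    return (tr_start - cld_level) / tr_slope
```

A steeper ramp (a shorter phoneme period) releases the hold sooner, which matches the statement that the period depends on the slope of the TR signal.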
Additionally, it will be noted that the same control
signal parameter from the Q1 output of latch circuit 160 on line
184 is provided to two separate transition filter circuits 185 and
186. The output from transition filter 185 is provided through an
analog-to-digital converter 187 to provide the vocal spectral
contour duty cycle control signal on line 202, and the output from
transition filter 186 is provided through an analog-to-digital
converter 188 to provide the fricative frequency duty cycle
control signal on line 190. Thus, it can be seen that a single
control signal parameter on line 184 is utilized to provide both
the vocal spectral contour control signal on line 202 and the
fricative frequency control signal on line 190.
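The fan-out of one stored parameter into two smoothed control signals can be sketched as follows. The example models each transition filter as simple exponential smoothing of the stepped latch output; the two time constants are hypothetical and are chosen to differ only so that the two outputs can be told apart. The description does not state the actual filter characteristics.

```python
import math


def transition_filter(steps, time_constant_s, dt=0.001):
    """Smooth a stepped control value with a first order lag."""
    alpha = 1.0 - math.exp(-dt / time_constant_s)
    out, state = [], steps[0]
    for target in steps:
        state += alpha * (target - state)  # move part of the way to the target
        out.append(state)
    return out


# One stepped parameter (line 184) feeding two separate filters.
line_184 = [0.0] * 5 + [1.0] * 50
vsc = transition_filter(line_184, time_constant_s=0.005)  # stands in for filter 185
ff = transition_filter(line_184, time_constant_s=0.020)   # stands in for filter 186
```

Both outputs track the same stored value, so no additional ROM parameter is needed to obtain the second control signal.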
As noted in the discussion of the block diagram of Figure
3, the generation of a separate fricative frequency control
signal permits the use of a conventional controller network
comprising separately controlled bandpass and low pass filter
circuits, 192 and 198 respectively. In particular, the fricative
frequency control signal on line 190 is provided to the control
terminal of an analog gate 191 which is adapted to control the
bandpass of the bandpass filter 192. The remaining fricative
control signal, referred to simply as the FC control signal in the
first embodiment, is utilized solely as a low pass control signal.
Accordingly, the fricative low pass (FLP) control signal on line
194 is provided to the control terminals of a pair of analog gates
195 and 196 which are adapted to control the cut-off frequency of
the low pass filter 198 in the fricative excitation controller
network. The fricative excitation signal from the controller
network is injected into the vocal tract at the F2 resonant filter
through resistor R10 and at the F5 resonant filter through
resistor R12. Since the value of resistor R10 is substantially
greater than the value of resistor R12, the major portion of the
fricative excitation energy is injected into the F5 resonant
filter.
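The effect of the resistor ratio can be illustrated with a short calculation. It assumes that each resistor feeds a low impedance summing node, so that the injected current is proportional to 1/R; the resistor values in the usage note are hypothetical, as the description gives only their relative size.

```python
def injection_shares(r10_ohms: float, r12_ohms: float):
    """Return (fraction into F2 via R10, fraction into F5 via R12)."""
    g10 = 1.0 / r10_ohms  # conductance of the F2 injection path
    g12 = 1.0 / r12_ohms  # conductance of the F5 injection path
    total = g10 + g12
    return g10 / total, g12 / total
```

For example, with R10 ten times R12 (say 100 kilohms and 10 kilohms), about one eleventh of the injected current would reach the F2 resonant filter and ten elevenths the F5 resonant filter under this assumption.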
The vocal excitation signal or glottal waveform on line
200 is spectrally shaped and amplitude modulated under the control
of the vocal spectral contour control signal on line 202 and the
vocal amplitude control signal on line 206, respectively. The
glottal waveform is then injected into the vocal tract at the F1
resonant filter through resistor R14 and at the F2 resonant filter
through resistor R16. Thus, as in the first embodiment, the vocal
tract is driven asynchronously due to the fact that the glottal
waveform is
effectively delayed -- i.e., shifted approximately 180 degrees --
as it passes through the F1 resonant filter. Accordingly, the F2
and F3 resonant filters are effectively driven twice; first by the
direct injection of the voiced excitation signal through resistor
R16, and subsequently by the delayed injection of vocal energy
from the output of the F1 resonant filter.
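The "driven twice" behaviour can be visualised with a deliberately simplified sketch. Here the F1 resonant filter is replaced by a pure delay, which ignores its resonance and any sign inversion, and the direct path through R16 is given an arbitrary gain. Both simplifications, and all names, are assumptions for illustration and do not model the actual filter.

```python
def f2_drive(glottal, delay_samples, direct_gain=0.5):
    """Signal arriving at the F2 filter: direct path plus delayed F1 path.

    Assumption: the F1 resonant filter is approximated as a pure delay.
    """
    out = []
    for n, x in enumerate(glottal):
        delayed = glottal[n - delay_samples] if n >= delay_samples else 0.0
        out.append(direct_gain * x + delayed)  # R16 path + F1 output path
    return out
```

Feeding a single glottal pulse into this model produces two pulses at the F2 input, one immediately and one after the delay, which is the double excitation the text describes.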
By driving the vocal tract asynchronously as described,
the present speech synthesizer more closely simulates the true
action of the human glottis. Specifically, the glottis does not
provide a single excitation of the vocal chords by opening and
closing smoothly. Rather, it has been found that the glottis
initially closes on one side and then subsequently closes
completely with a rapid motion. Accordingly, the vocal tract is
effectively excited twice with each complete opening and closing
of the glottis. The asynchronous drive of the present system thus
simulates this action by providing double vocal excitation of the
vocal tract.
Moreover, it has been found that, particularly in view of
the fact that an F4 resonant filter is not used, the audio output
sounds better if the glottal waveform does not have a substantial
amount of high frequency energy when injected into the F1 resonant
filter. However, with the high frequency energy of the glottal
waveform reduced when injected into the F1 resonant filter, there
is insufficient energy remaining in the glottal waveform at the
output of the F1 resonant filter to adequately drive the F2 and F3
resonant filters. Accordingly, the parallel injection of the
voiced excitation signal into the F2 resonant filter also serves
to provide adequate
high frequency vocal energy to the F2 and F3 resonant filters.
Additionally, it will be noted that a feedback resistor
R22 is provided between the output of the F2 resonant filter and
the input of the F1 resonant filter, and another feedback resistor
R24 is provided between the output of the F3 resonant filter and
the input of the F2 resonant filter. These feedback resistors
simulate the normal back pressures which are present in the human
vocal system. Specifically, when the mouth closes, the back
pressure created affects the vibration of the vocal chords.
Similarly, the movement of the tongue also creates back pressures
which affect the vibration of the vocal chords. Thus, the
inter-resonant feedback provided by resistors R22 and R24 serves to
more closely model the present vocal tract to the human voice
system. Also it will be noted that a pair of resistors R18 and
R20 are provided across the bandpass sections of the F1 and F2
resonant filters, respectively. It has been found that the "Q" or
bandpass of the F1 and F2 resonant filters varies inversely with
changes in the resonant frequencies of the filters, although to a
lesser extent. Thus, resistors R18 and R20 are provided to
implement this feature.
Finally, as noted in the block diagram in Figure 3, the
present embodiment utilizes a completely serially connected vocal
tract. In particular, the F1, F2, F3 and F5 resonant filters are
all connected in cascaded form, with the output from the F5
resonant filter provided through the closure network 214 and a 20
KHz low pass filter 216 to an appropriate audio transducer device.
While the above description constitutes the preferred
embodiments of the invention, it will be appreciated that
the invention is susceptible to modification, variation
and change without departing from the proper scope or
fair meaning of the accompanying claims.
Subject matter disclosed in this application is disclosed
and claimed in the aforementioned Canadian application
Serial Number 310,040 and a further divisional thereof.


Administrative Status


Title Date
Forecasted Issue Date 1982-06-01
(22) Filed 1981-05-15
(45) Issued 1982-06-01
Expired 1999-06-01

Abandonment History

There is no abandonment history.

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $0.00 1981-05-15
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
FEDERAL SCREW WORKS
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Drawings                1994-02-18           7                  231
Claims                  1994-02-18           2                  47
Abstract                1994-02-18           1                  36
Cover Page              1994-02-18           1                  11
Description             1994-02-18           30                 1,195