Patent 2356041 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 2356041
(54) English Title: EXCITATION VECTOR GENERATOR, SPEECH CODER AND SPEECH DECODER
(54) French Title: GENERATRICE A TRAITEMENT VECTORIEL EXCITE, VOIX CODER/DECODER
Status: Expired
Bibliographic Data
(51) International Patent Classification (IPC):
  • G10L 19/12 (2013.01)
(72) Inventors :
  • YASUNAGA, KAZUTOSHI (Japan)
  • MORII, TOSHIYUKI (Japan)
  • WATANABE, TAISUKE (Japan)
  • EHARA, HIROYUKI (Japan)
(73) Owners :
  • GODO KAISHA IP BRIDGE 1 (Japan)
(71) Applicants :
  • MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD. (Japan)
(74) Agent: OSLER, HOSKIN & HARCOURT LLP
(74) Associate agent:
(45) Issued: 2003-07-29
(22) Filed Date: 1997-11-06
(41) Open to Public Inspection: 1998-05-14
Examination requested: 2001-08-27
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
8-294738 Japan 1996-11-07
8-310324 Japan 1996-11-21
9-34582 Japan 1997-02-19
9-34583 Japan 1997-02-19

Abstracts

English Abstract

A random code vector reading section and a random codebook of a conventional CELP type speech coder/decoder are respectively replaced with an oscillator for outputting different vector streams in accordance with values of input seeds, and a seed storage section for storing a plurality of seeds. This makes it unnecessary to store fixed vectors as they are in a fixed codebook (ROM), thereby considerably reducing the memory capacity.


French Abstract

Une section de lecture aléatoire de vecteur de code et un livre de code aléatoire d'un codeur/décodeur de voix conventionnel de type CELP remplacés respectivement par un oscillateur pour la sortie de différents flux vectoriels conformément aux valeurs de départ d'entrée, et une section de stockage des valeurs de départ pour stocker une pluralité de valeurs de départ. Cela rend inutile de stocker des vecteurs fixes comme dans un livre de codes fixe (ROM), réduisant ainsi considérablement la capacité de la mémoire.

Claims

Note: Claims are shown in the official language in which they were submitted.





CLAIMS
1. A CELP (Code Excited Linear Prediction) speech
coder or decoder, comprising:
an adaptive codebook for generating an adaptive code
vector;
a means for determining a voiced/unvoiced
characteristic of the input speech;
a random codebook for generating a random code
vector;
a synthesis filter for receiving a random code
vector generated from the random codebook so as to
perform LPC synthesis;
wherein said random codebook is formed by an
excitation vector generator which outputs an input vector
when the input speech is determined to be voiced and
outputs a modified input vector which is generated by
arranging at least one fixed waveform stored in a fixed
waveform storage means in accordance with the positions
and polarities of pulses of an input vector when the
input speech is determined to be unvoiced.
2. A CELP speech coder or decoder, comprising:
an adaptive codebook for generating an adaptive code
vector;
a means for determining a voiced/unvoiced
characteristic of the input speech;
a random codebook for generating a random code
vector;




a synthesis filter for receiving a random code
vector generated from the random codebook so as to
perform LPC synthesis;
wherein said random codebook is formed by an
excitation vector generator which generates a random code
vector composed of an input vector when the input speech
is determined to be voiced and generates a random code
vector by arranging at least one fixed waveform stored in
a fixed waveform storage means in accordance with the
positions and polarities of pulses of an input vector
when the input speech is determined to be unvoiced.
3. The CELP speech coder or decoder according to
claim 1 or 2, wherein an excitation vector generator is
adapted to effect said arranging step by shifting at
least one fixed waveform.
4. The CELP speech coder or decoder according to
claim 1 or 2, wherein an excitation vector generator is
adapted to effect said arranging step by convoluting at
least one fixed waveform with an input vector.
5. The CELP speech coder/decoder according to any
one of claims 1 to 4, wherein the determination of the
voiced/unvoiced characteristic of the input speech is
based on a value of a quantized pitch gain.
6. The CELP speech coder/decoder according to any
one of claims 1 to 4, wherein the determination of the
voiced/unvoiced characteristic of the input speech is
based on a value of an adaptive codebook gain.
7. The CELP speech coder/decoder according to any
one of claims 1 to 6, wherein the excitation vector
generator generates a corresponding random code vector by




changing said fixed waveform either to be arranged,
shifted or convoluted.
8. The CELP speech coder/decoder according to any
one of claims 1 to 6, wherein the excitation vector
generator generates a different random code vector by
changing the fixed waveform used for the modification of
said input vector.
9. The CELP speech coder/decoder according to any
one of claims 1 to 8, wherein said input vector is a
vector provided from an algebraic codebook.
10. A CELP speech coding or decoding method,
comprising the steps of:
determining a voiced/unvoiced characteristic of the
input speech;
outputting a random code vector composed of an input
vector when the input speech is determined to be voiced
and outputting a modified input vector which is generated
by arranging at least one fixed waveform in accordance
with the positions and polarities of pulses of an input
vector when the input speech is determined to be
unvoiced;
performing LPC synthesis using the random code
vector.
11. A CELP speech coding or decoding method,
comprising the steps of:
generating an adaptive code vector;
determining a voiced/unvoiced characteristic of the
input speech;




generating a random code vector composed of an input
vector when the input speech is determined to be voiced
and generating a random code vector by arranging at least
one fixed waveform in accordance with the positions and
polarities of pulses of an input vector when the input
speech is determined to be unvoiced;
outputting the random code vector to a synthesis
filter so as to perform LPC synthesis.
12. The method according to claim 10 or 11, wherein
arranging step shifts said fixed waveform in accordance
with the positions and polarities of said pulses of an
input vector.
13. The method according to claim 10 or 11, wherein
arranging step convolutes said fixed waveform with an
input vector.
14. The method according to any one of claims 10 to
13, wherein the determination of the voiced/unvoiced
characteristic of the input speech is based on a value of
a quantized pitch gain.
15. The method according to one of claims 10 to 13,
wherein the determination of the voiced/unvoiced
characteristic of the input speech is based on a value of
an adaptive codebook gain.
16. The method according to any one of claims 10 to
15, wherein the random code vector generated by changing
said fixed waveform either to be arranged, shifted or
convoluted.
17. The method according to any one of claims 10 to
15, wherein a different random code vector is generated





by changing said fixed waveform and used for the
modification of said input vector.

18. The method according to any one of claims 10 to
17, wherein said input vector is a vector provided from
an algebraic codebook.

Description

Note: Descriptions are shown in the official language in which they were submitted.



CA 02356041 2001-08-27
DESCRIPTION
EXCITATION VECTOR GENERATOR, SPEECH CODER AND SPEECH
DECODER
Technical Field
The present invention relates to an excitation
vector generator capable of obtaining a high-quality
synthesized speech, and a speech coder and a speech
decoder which can code and decode a high-quality
speech signal at a low bit rate.
Background Art
A CELP (Code Excited Linear Prediction) type
speech coder executes linear prediction for each of
frames obtained by segmenting a speech at a given
time, and codes predictive residuals (excitation
signals) resulting from the frame-by-frame linear
prediction, using an adaptive codebook having old
excitation vectors stored therein and a random
codebook which has a plurality of random code vectors
stored therein. For instance, "Code-Excited Linear
Prediction (CELP): High-Quality Speech at Very Low Bit
Rate," M. R. Schroeder, Proc. ICASSP '85, pp. 937-940
discloses a CELP type speech coder.
FIG. 1 illustrates the schematic structure of a
CELP type speech coder. The CELP type speech coder
separates vocal information into excitation
information and vocal tract information and codes
them. With regard to the vocal tract information, an
input speech signal 10 is input to a filter
coefficients analysis section 11 for linear
prediction and linear predictive coefficients (LPCs)
are coded by a filter coefficients quantization
section 12. Supplying the linear predictive
coefficients to a synthesis filter 13 allows vocal
tract information to be added to excitation
information in the synthesis filter 13. With regard
to the excitation information, excitation vector
search in an adaptive codebook 14 and a random
codebook 15 is carried out for each segment obtained
by further segmenting a frame (called subframe). The
search in the adaptive codebook 14 and the search in
the random codebook 15 are processes of determining
the code number and gain (pitch gain) of an adaptive
code vector, which minimizes coding distortion in an
equation 1, and the code number and gain (random code
gain) of a random code vector.
$$\| v - (g_a H p + g_c H c) \|^2 \qquad (1)$$
where v: speech signal (vector)
H: impulse response convolution matrix of the
synthesis filter,
$$H = \begin{bmatrix}
h(0) & 0 & \cdots & 0 & 0 \\
h(1) & h(0) & \cdots & 0 & 0 \\
h(2) & h(1) & \ddots & 0 & 0 \\
\vdots & \vdots & \ddots & h(0) & 0 \\
h(L-1) & h(L-2) & \cdots & h(1) & h(0)
\end{bmatrix}$$
h: impulse response (vector) of the synthesis
filter
L: frame length
p: adaptive code vector
c: random code vector
g_a: adaptive code gain (pitch gain)
g_c: random code gain
A closed-loop search for the codes that minimize
equation 1 involves a vast amount of computation.
An ordinary CELP type speech coder therefore first
performs the adaptive codebook search to specify the
code number of an adaptive code vector, and then
executes the random codebook search based on that
result to specify the code number of a random code
vector.
The random codebook search by the CELP type speech
coder will now be explained with reference to FIGS.
2A through 2C. In the figures, x is the target
vector for the random codebook search, obtained by
equation 2. It is assumed that the adaptive codebook
search has already been accomplished.
$$x = v - g_a H p \qquad (2)$$
where x: target (vector) for the random codebook
search
v: speech signal (vector)
H: impulse response convolution matrix of the
synthesis filter
p: adaptive code vector
g_a: adaptive code gain (pitch gain)
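Equations 1 and 2 can be sketched numerically as follows. This is a minimal illustration, not the coder's implementation: the function names and the toy values of h, v and p are hypothetical, and plain Python lists stand in for vectors.

```python
def convolution_matrix(h):
    """Build the lower-triangular matrix H with H[i][j] = h(i - j) for i >= j."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def mat_vec(H, u):
    """Multiply matrix H by vector u."""
    return [sum(a * b for a, b in zip(row, u)) for row in H]


def target_vector(v, H, p, ga):
    """Equation 2: x = v - ga * H p."""
    return [vi - ga * s for vi, s in zip(v, mat_vec(H, p))]


def coding_distortion(v, H, p, c, ga, gc):
    """Equation 1: squared norm of v - (ga * H p + gc * H c)."""
    Hp, Hc = mat_vec(H, p), mat_vec(H, c)
    return sum((vi - (ga * a + gc * b)) ** 2 for vi, a, b in zip(v, Hp, Hc))


# Toy values (hypothetical, chosen only to keep the arithmetic easy to follow).
h = [1.0, 0.5, 0.25, 0.125]
H = convolution_matrix(h)
v = [1.0, 0.0, -1.0, 0.5]
p = [1.0, 0.0, 0.0, 0.0]
x = target_vector(v, H, p, ga=0.5)
print(x)  # [0.5, -0.25, -1.125, 0.4375]
```

With the random code gain set to zero, the distortion of equation 1 reduces to the squared norm of the target x, which is what the random codebook search then tries to reduce.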
The random codebook search is a process of
specifying a random code vector c which minimizes
coding distortion that is defined by an equation 3 in
a distortion calculator 16 as shown in FIG. 2A.
$$\| x - g_c H c \|^2 \qquad (3)$$
where x: target (vector) for the random codebook
search
H: impulse response convolution matrix of the
synthesis filter
c: random code vector
g_c: random code gain.
The distortion calculator 16 controls a control
switch 21 to switch a random code vector to be read
from the random codebook 15 until the random code
vector c is specified.


An actual CELP type speech coder has the structure
shown in FIG. 2B to reduce the computational complexity,
and a distortion calculator 16' carries out a process
of specifying the code number which maximizes the
distortion measure in equation 4.
$$\frac{(x^t H c)^2}{\| H c \|^2} = \frac{((x^t H) c)^2}{\| H c \|^2} = \frac{(x'^t c)^2}{\| H c \|^2} = \frac{(x'^t c)^2}{c^t H^t H c} \qquad (4)$$
where x: target (vector) for the random codebook
search
H: impulse response convolution matrix of the
synthesis filter
H^t: transposed matrix of H
x': time reverse synthesis of x using H (x'^t =
x^t H)
c: random code vector.
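The link between the distortion of equation 3 and the measure of equation 4 can be made explicit. The derivation below is a sketch that assumes the random code gain g_c is set to its least-squares optimum for each candidate c; the source does not state this, so it should be read as an assumption. Here x' = H^t x denotes the time-reversed target.

```latex
\begin{aligned}
\frac{\partial}{\partial g_c}\,\| x - g_c H c \|^2 = 0
  &\;\Longrightarrow\; g_c = \frac{x^t H c}{\| H c \|^2}, \\
\min_{g_c} \| x - g_c H c \|^2
  &= \| x \|^2 - \frac{(x^t H c)^2}{\| H c \|^2}
   = \| x \|^2 - \frac{(x'^t c)^2}{c^t H^t H c},
  \qquad x' = H^t x .
\end{aligned}
```

Since the term \| x \|^2 does not depend on c, the candidate that maximizes the measure of equation 4 is, under this assumption, the one that minimizes the distortion of equation 3.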
Specifically, the random codebook control switch
21 is connected to one terminal of the random
codebook 15 and the random code vector c is read from
an address corresponding to that terminal. The read
random code vector c is synthesized with vocal tract
information by the synthesis filter 13, producing a
synthesized vector Hc. Then, the distortion
calculator 16' computes a distortion measure in the
equation 4 using a vector x' obtained by a time
reverse process of a target x, the vector Hc
resulting from synthesis of the random code vector in
the synthesis filter and the random code vector c.
As the random codebook control switch 21 is switched,
computation of the distortion measure is performed
for every random code vector in the random codebook.
Finally, the number of the random codebook
control switch 21 that had been connected when the
distortion measure in the equation 4 became maximum
is sent to a code output section 17 as the code
number of the random code vector.
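The search loop described above can be sketched as follows. This is an illustrative reading of the procedure, not the coder's actual code: the function names, the two-sample subframe and the three-entry codebook are all hypothetical.

```python
def mat_vec(H, u):
    """Synthesize u through the filter: return H u."""
    return [sum(a * b for a, b in zip(row, u)) for row in H]


def time_reverse_target(H, x):
    """Compute x' = H^t x once, before the search loop."""
    n = len(x)
    return [sum(H[i][j] * x[i] for i in range(n)) for j in range(n)]


def search_random_codebook(x, H, codebook):
    """Return (code number, measure) maximizing (x'^t c)^2 / ||H c||^2."""
    x_rev = time_reverse_target(H, x)
    best_index, best_measure = -1, -1.0
    for index, c in enumerate(codebook):
        Hc = mat_vec(H, c)
        energy = sum(s * s for s in Hc)
        if energy == 0.0:
            continue  # an all-zero synthesized vector cannot match the target
        correlation = sum(a * b for a, b in zip(x_rev, c))
        measure = correlation * correlation / energy
        if measure > best_measure:
            best_index, best_measure = index, measure
    return best_index, best_measure


# Hypothetical toy data: the target equals H times the first codebook entry.
H = [[1.0, 0.0], [0.5, 1.0]]
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x = [1.0, 0.5]
best_index, best_measure = search_random_codebook(x, H, codebook)
print(best_index)  # 0
```

Computing x' once outside the loop is the point of the FIG. 2B structure: each candidate then needs only one inner product with x' plus the energy of Hc.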
FIG. 2C shows a partial structure of a speech
decoder. The switching of the random codebook
control switch 21 is controlled in such a way as to
read out the random code vector that has a
transmitted code number. After a transmitted random
code gain gc and filter coefficient are set in an
amplifier 23 and a synthesis filter 24, a random code
vector is read out to restore a synthesized speech.
In the above-described speech coder/speech
decoder, the greater the number of random code
vectors stored as excitation information in the
random codebook 15 is, the more possible it is to
search a random code vector close to the excitation
vector of an actual speech. As the capacity of the
random codebook (ROM) is limited, however, it is not
possible to store countless random code vectors
corresponding to all the excitation vectors in the
random codebook. This restricts improvement of
speech quality.
An algebraic excitation has also been proposed, which
can significantly reduce the computational
complexity of coding distortion in a distortion
calculator and can eliminate a random codebook (ROM)
(described in "8 KBIT/S ACELP CODING OF SPEECH WITH
10 MS SPEECH-FRAME: A CANDIDATE FOR CCITT
STANDARDIZATION": R. Salami, C. Laflamme, J-P. Adoul,
ICASSP '94, pp. II-97 to II-100, 1994).
The algebraic excitation considerably reduces the
complexities of computation of coding distortion by
previously computing the results of convolution of
the impulse response of a synthesis filter and a
time-reversed target and the autocorrelation of the
synthesis filter and developing them in a memory.
Further, a ROM in which random code vectors have been
stored is eliminated by algebraically generating
random code vectors. A CS-ACELP and ACELP which use
the algebraic excitation have been recommended
respectively as G.729 and G.723.1 by the ITU-T.
In the CELP type speech coder/speech decoder
equipped with the above-described algebraic
excitation in a random codebook section, however, a
target for a random codebook search is always coded
with a pulse sequence vector, which puts a limit to
improvement on speech quality.
Disclosure of Invention
It is therefore a primary object of the present
invention to provide an excitation vector generator,
a speech coder and a speech decoder, which can
significantly suppress the memory capacity as
compared with a case where random code vectors are
stored directly in a random codebook, and can improve
the speech quality.
It is a secondary object of this invention to
provide an excitation vector generator, a speech
coder and a speech decoder, which can generate
complicated random code vectors as compared with a
case where an algebraic excitation is provided in a
random codebook section and a target for a random
codebook search is coded with a pulse sequence vector,
and can improve the speech quality.
In this invention, the fixed code vector reading
section and fixed codebook of a conventional CELP
type speech coder/decoder are respectively replaced
with an oscillator, which outputs different vector
sequences in accordance with the values of input
seeds, and a seed storage section which stores a
plurality of seeds (seeds of the oscillator). This
eliminates the need for fixed code vectors to be
stored directly in a fixed codebook (ROM) and can
thus reduce the memory capacity significantly.
Further, according to this invention, the random
code vector reading section and random codebook of
the conventional CELP type speech coder/decoder are
respectively replaced with an oscillator and a seed
storage section. This eliminates the need for random
code vectors to be stored directly in a random
codebook (ROM) and can thus reduce the memory
capacity significantly.
The invention is an excitation vector generator
which is so designed as to store a plurality of fixed
waveforms, arrange the individual fixed waveforms at
respective start positions based on start position
candidate information and add those fixed waveforms
to generate an excitation vector. This can permit an
excitation vector close to an actual speech to be
generated.
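The arrangement of fixed waveforms at start positions can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the waveform values and the subframe length are hypothetical, and dropping samples that run past the end of the vector is an assumption the source does not address.

```python
def arrange_fixed_waveforms(waveforms, start_positions, length):
    """Place each fixed waveform at its start position and add the results."""
    excitation = [0.0] * length
    for waveform, start in zip(waveforms, start_positions):
        for offset, sample in enumerate(waveform):
            index = start + offset
            if index < length:  # assumption: samples past the end are dropped
                excitation[index] += sample
    return excitation


# Two hypothetical stored waveforms and one candidate set of start positions.
fixed_waveforms = [[1.0, 0.5], [-1.0, 0.25, 0.25]]
excitation = arrange_fixed_waveforms(fixed_waveforms, [0, 3], length=5)
print(excitation)  # [1.0, 0.5, 0.0, -1.0, 0.25]
```

Only the waveforms themselves are stored; each candidate set of start positions yields a different excitation vector, so many vectors are obtained from a small amount of memory.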
Further, the invention is a CELP type speech
coder/decoder constructed by using the above
excitation vector generator as a random codebook. A
fixed waveform arranging section may algebraically
generate start position candidate information of
fixed waveforms.
Furthermore, the invention is a CELP type speech
coder/decoder, which stores a plurality of fixed
waveforms, generates an impulse with respect to start
position candidate information of each fixed waveform,
convolutes the impulse response of a synthesis filter
and each fixed waveform to generate an impulse
response for each fixed waveform, computes the
autocorrelations and correlations of impulse
responses of the individual fixed waveforms and
develops them in a correlation matrix. This can
provide a speech coder/decoder which improves the
quality of a synthesized speech at about the same
computation cost as needed in a case of using an
algebraic excitation as a random codebook.
Moreover, this invention is a CELP type speech
coder/decoder equipped with a plurality of random
codebooks and switch means for selecting one of the
random codebooks. At least one random codebook may
be the aforementioned excitation vector generator, or
at least one random codebook may be a vector storage
section having a plurality of random number sequences
stored therein or a pulse sequences storage section
having a plurality of random number sequences stored
therein, or at least two random codebooks each having
the aforementioned excitation vector generator may be
provided with the number of fixed waveforms to be
stored differing from one random codebook to another,
and the switch means selects one of the random
codebooks so as to minimize coding distortion at the
time of searching a random codebook or adaptively
selects one random codebook according to the result
of analysis of speech segments.


Thus, in accordance with the present invention there
is provided a CELP (Code Excited Linear Prediction)
speech coder or decoder, comprising an adaptive codebook
for generating an adaptive code vector; a means for
determining a voiced/unvoiced characteristic of the input
speech; a random codebook for generating a random code
vector; a synthesis filter for receiving a random code
vector generated from the random codebook so as to
perform LPC synthesis; wherein said random codebook is
formed by an excitation vector generator which outputs an
input impulse vector when the input speech is determined
to be voiced and outputs a modified input impulse vector
which is generated by arranging at least one fixed
waveform stored in a fixed waveform storage means in
accordance with the positions and polarities of an input
impulse vector when the input speech is determined to be
unvoiced.
In accordance with a further aspect of the invention
there is provided a CELP speech coder or decoder,
comprising an adaptive codebook for generating an
adaptive code vector; a means for determining a
voiced/unvoiced characteristic of the input speech; a
random codebook for generating a random code vector; a
synthesis filter for receiving a random code vector
generated from the random codebook so as to perform LPC
synthesis; wherein said random codebook is formed by an
excitation vector generator which generates a random code
vector composed of an input impulse vector when the input
speech is determined to be voiced and generates a random
code vector by arranging at least one fixed waveform
stored in a fixed waveform storage means in accordance
with the positions and polarities of an input impulse
vector when the input speech is determined to be
unvoiced.
In accordance with a further aspect of the invention
there is provided a CELP speech coding or decoding
method, comprising the steps of determining a
voiced/unvoiced characteristic of the input speech;
outputting a random code vector composed of an input
impulse vector when the input speech is determined to be
voiced and outputting a modified input impulse vector
which is generated by arranging at least one fixed
waveform in accordance with the positions and polarities
of an input impulse vector when the input speech is
determined to be unvoiced, performing LPC synthesis using
the random code vector.
In accordance with a further aspect of the invention
there is provided a CELP speech coding or decoding method
comprising the steps of generating an adaptive code
vector; determining a voiced/unvoiced characteristic of
the input speech; generating a random code vector
composed of an input impulse vector when the input speech
is determined to be voiced and generating a random code
vector by arranging at least one fixed waveform in
accordance with the positions and polarities of an input
impulse vector when the input speech is determined to be
unvoiced; outputting the random code vector to a
synthesis filter so as to perform LPC synthesis.


Brief Description of Drawings
FIG. 1 is a schematic diagram of a conventional
CELP type speech coder;
FIG. 2A is a block diagram of an excitation
vector generating section in the speech coder in FIG.
1;
FIG. 2B is a block diagram of a modification of
the excitation vector generating section which is
designed to reduce the computation cost;
FIG. 2C is a block diagram of an excitation
vector generating section in a speech decoder which
is used as a pair with the speech coder in FIG. 1;
FIG. 3 is a block diagram of the essential
portions of a speech coder according to a first mode;
FIG. 4 is a block diagram of an excitation vector
generator equipped in the speech coder of the first
mode;
FIG. 5 is a block diagram of the essential
portions of a speech coder according to a second
mode;
FIG. 6 is a block diagram of an excitation vector
generator equipped in the speech coder of the second
mode;
FIG. 7 is a block diagram of the essential
portions of a speech coder according to third and
fourth modes;
FIG. 8 is a block diagram of an excitation vector
generator equipped in the speech coder of the third
mode;
FIG. 9 is a block diagram of a non-linear digital
filter equipped in the speech coder of the fourth
mode;
FIG. 10 is a diagram of the adder characteristic
of the non-linear digital filter shown in FIG. 9;
FIG. 11 is a block diagram of the essential
portions of a speech coder according to a fifth mode;
FIG. 12 is a block diagram of the essential
portions of a speech coder according to a sixth mode;
FIG. 13A is a block diagram of the essential
portions of a speech coder according to a seventh
mode;
FIG. 13B is a block diagram of the essential
portions of the speech coder according to the seventh
mode;
FIG. 14 is a block diagram of the essential
portions of a speech decoder according to an eighth
mode;
FIG. 15 is a block diagram of the essential
portions of a speech coder according to a ninth mode;
FIG. 16 is a block diagram of a quantization
target LSP adding section equipped in the speech
coder according to the ninth mode;
FIG. 17 is a block diagram of an LSP
quantizing/decoding section equipped in the speech
coder according to the ninth mode;
FIG. 18 is a block diagram of the essential
portions of a speech coder according to a tenth mode;
FIG. 19A is a block diagram of the essential
portions of a speech coder according to an eleventh
mode;
FIG. 19B is a block diagram of the essential
portions of a speech decoder according to the
eleventh mode;
FIG. 20 is a block diagram of the essential
portions of a speech coder according to a twelfth
mode;
FIG. 21 is a block diagram of the essential
portions of a speech coder according to a thirteenth
mode;
FIG. 22 is a block diagram of the essential
portions of a speech coder according to a fourteenth
mode;
FIG. 23 is a block diagram of the essential
portions of a speech coder according to a fifteenth
mode;
FIG. 24 is a block diagram of the essential
portions of a speech coder according to a sixteenth
mode;
FIG. 25 is a block diagram of a vector quantizing
section in the sixteenth mode;
FIG. 26 is a block diagram of a parameter coding
section of a speech coder according to a seventeenth
mode; and
FIG. 27 is a block diagram of a noise canceler
according to an eighteenth mode.
Best Modes for Carrying Out the Invention
Preferred modes of the present invention will now
be described specifically with reference to the
accompanying drawings.
(First Mode)
FIG. 3 is a block diagram of the essential
portions of a speech coder according to this mode.
This speech coder comprises an excitation vector
generator 30, which has a seed storage section 31 and
an oscillator 32, and an LPC synthesis filter 33.
Seeds (oscillation seeds) 34 output from the seed
storage section 31 are input to the oscillator 32.
The oscillator 32 outputs different vector sequences
according to the values of the input seeds. The
oscillator 32 oscillates with the content according
to the value of the seed (oscillation seed) 34 and
outputs an excitation vector 35 as a vector sequence.
The LPC synthesis filter 33 is supplied with vocal
tract information in the form of the impulse response
convolution matrix of the synthesis filter, and
performs convolution on the excitation vector 35 with
the impulse response, yielding a synthesized speech
36. The impulse response convolution of the
excitation vector 35 is called LPC synthesis.
FIG. 4 shows the specific structure of the
excitation vector generator 30. A seed to be read
from the seed storage section 31 is switched by a
control switch 41 for the seed storage section in
accordance with a control signal given from a
distortion calculator.
Simply storing a plurality of seeds for
outputting different vector sequences from the
oscillator 32 in the seed storage section 31
allows more random code vectors to be generated with
less capacity as compared with a case where
complicated random code vectors are directly stored
in a random codebook.
Although this mode has been described as a speech
coder, the excitation vector generator 30 can be
adapted to a speech decoder. In this case, the
speech decoder has a seed storage section with the
same contents as those of the seed storage section 31
of the speech coder and the control switch 41 for the
seed storage section is supplied with a seed number
selected at the time of coding.
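The idea of replacing stored vectors with seeds can be sketched as follows. The source does not specify the oscillator in this mode, so the linear congruential recurrence used here is purely a stand-in, and the seed values, names and vector length are hypothetical.

```python
# Hypothetical seed storage: four integers replace four stored code vectors.
SEED_STORAGE = [12345, 67890, 13579, 24680]


def oscillator(seed, length):
    """Stand-in oscillator: a linear congruential recurrence mapped to [-1, 1)."""
    state = seed
    vector = []
    for _ in range(length):
        state = (1103515245 * state + 12345) % 2**31
        vector.append(state / 2**30 - 1.0)
    return vector


def excitation_vector(seed_number, length=40):
    """Select a seed by its number (the control switch) and run the oscillator."""
    return oscillator(SEED_STORAGE[seed_number], length)


coder_vector = excitation_vector(2)    # coder side, seed number chosen by search
decoder_vector = excitation_vector(2)  # decoder side, same seed number received
print(coder_vector == decoder_vector)  # True
```

Because the decoder holds the same seed storage, transmitting the seed number is enough to regenerate the identical excitation vector, and the storage cost is one seed per vector instead of one full vector.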
(Second Mode)
FIG. 5 is a block diagram of the essential
portions of a speech coder according to this mode.
This speech coder comprises an excitation vector
generator 50, which has a seed storage section 51 and
a non-linear oscillator 52, and an LPC synthesis
filter 53.
Seeds (oscillation seeds) 54 output from the seed
storage section 51 are input to the non-linear
oscillator 52. An excitation vector 55 as a vector
sequence output from the non-linear oscillator 52 is
input to the LPC synthesis filter 53. The output of
the LPC synthesis filter 53 is a synthesized speech
56.
The non-linear oscillator 52 outputs different
vector sequences according to the values of the input
seeds 54, and the LPC synthesis filter 53 performs
LPC synthesis on the input excitation vector 55 to
output the synthesized speech 56.
FIG. 6 shows the functional blocks of the
excitation vector generator 50. A seed to be read
from the seed storage section 51 is switched by a
control switch 41 for the seed storage section in
accordance with a control signal given from a
distortion calculator.
The use of the non-linear oscillator 52 as an
oscillator in the excitation vector generator 50 can suppress
divergence with oscillation according to the non-
linear characteristic, and can provide practical
excitation vectors.
Although this mode has been described as a speech
coder, the excitation vector generator 50 can be
adapted to a speech decoder. In this case, the
speech decoder has a seed storage section with the
same contents as those of the seed storage section 51
of the speech coder and the control switch 41 for the
seed storage section is supplied with a seed number
selected at the time of coding.
(Third Mode)
FIG. 7 is a block diagram of the essential
portions of a speech coder according to this mode.
This speech coder comprises an excitation vector
generator 70, which has a seed storage section 71 and
a non-linear digital filter 72, and an LPC synthesis
filter 73. In the diagram, numeral "74" denotes a
seed (oscillation seed) which is output from the seed
storage section 71 and input to the non-linear
digital filter 72, numeral "75" is an excitation
vector as a vector sequence output from the non-
linear digital filter 72, and numeral "76" is a
synthesized speech output from the LPC synthesis
filter 73.
The excitation vector generator 70 has a control
switch 41 for the seed storage section which switches
a seed to be read from the seed storage section 71 in
accordance with a control signal given from a
distortion calculator, as shown in FIG. 8.
The non-linear digital filter 72 outputs
different vector sequences according to the values of
the input seeds, and the LPC synthesis filter 73
performs LPC synthesis on the input excitation vector
75 to output the synthesized speech 76.
The use of the non-linear digital filter 72 as an
oscillator in the excitation vector generator 70 can suppress
divergence with oscillation according to the non-
linear characteristic, and can provide practical
excitation vectors. Although this mode has been
described as a speech coder, the excitation vector
generator 70 can be adapted to a speech decoder. In
this case, the speech decoder has a seed storage
section with the same contents as those of the seed
storage section 71 of the speech coder and the
control switch 41 for the seed storage section is
supplied with a seed number selected at the time of
coding.
(Fourth Mode)
A speech coder according to this mode comprises
an excitation vector generator 70, which has a seed
storage section 71 and a non-linear digital filter 72,
and an LPC synthesis filter 73, as shown in FIG. 7.
Particularly, the non-linear digital filter 72
has a structure as depicted in FIG. 9. This non-linear
digital filter 72 includes an adder 91 having
a non-linear adder characteristic as shown in FIG. 10,
filter state holding sections 92 to 93 capable of
retaining the states (the values of y(k-1) to y(k-N))
of the digital filter, and multipliers 94 to 95,
which are connected in parallel to the outputs of the
respective filter state holding sections 92-93,
multiply the filter states by gains and output the
results to the adder 91. The initial values of the
filter states are set in the filter state holding
sections 92-93 by seeds read from the seed storage
section 71. The values of the gains of the
multipliers 94-95 are fixed so that the poles of
the digital filter lie outside the unit circle on the Z
plane.
FIG. 10 is a conceptual diagram of the non-linear
adder characteristic of the adder 91 equipped in the
non-linear digital filter 72, and shows the
input/output relation of the adder 91 which has a 2's
complement characteristic. The adder 91 first
acquires the sum of adder inputs or the sum of the
input values to the adder 91, and then uses the non-
linear characteristic illustrated in FIG. 10 to
compute an adder output corresponding to the input
sum.
In particular, the non-linear digital filter 72
is a second-order all-pole model, so that the two
filter state holding sections 92 and 93 are connected
in series, and the multipliers 94 and 95 are
connected to the outputs of the filter state holding
sections 92 and 93. Further, the non-linear adder
characteristic of the adder 91 is a 2's complement
characteristic. Furthermore, the seed storage section 71
retains seed vectors of 32 words as particularly described in
Table 1.
Table 1: Seed vectors for generating random code
vectors
(? marks a digit that is illegible in the source)
  i   Sv_i(n-1)   Sv_i(n-2)  |   i   Sv_i(n-1)   Sv_i(n-2)
  1    0.250000    0.250000  |   9    0.109521   -0.761210
  2   -0.564643   -0.104927  |  10   -0.202115    0.198718
  3    0.173879   -0.978?92  |  11   -0.095041    0.863849
  4    0.632652    0.951133  |  12   -0.634213    0.424549
  5    0.920360   -0.113881  |  13    0.948225   -0.184861
  6    0.864873   -0.860368  |  14   -0.958269    0.969458
  7    0.732227    0.497037  |  15    0.233709   -0.057248
  8    0.917543   -0.035103  |  16   -0.852085   -0.564948


In the thus constituted speech coder, seed
vectors read from the seed storage section 71 are
given as initial values to the filter state holding
sections 92 and 93 of the non-linear digital filter
72. Every time zero is input to the adder 91 from an
input vector (zero sequence), the non-linear digital
filter 72 outputs one sample (y(k)) at a time, which
is sequentially transferred as a filter state to the
filter state holding sections 92 and 93. At this
time, the multipliers 94 and 95 multiply the filter
states output from the filter state holding sections
92 and 93 by gains a1 and a2 respectively. The adder
91 adds the outputs of the multipliers 94 and 95 to
acquire the sum of the adder inputs, and generates an
adder output which is confined between +1 and -1
based on the characteristic in FIG. 10. This adder
output (y(k+1)) is output as an excitation vector and
is sequentially transferred to the filter state
holding sections 92 and 93 to produce a new sample
(y(k+2)).
According to this mode, the coefficients 1 to N of the
multipliers 94-95 are fixed so that the poles of the
non-linear digital filter lie outside the unit circle
on the Z plane, and the adder 91 is provided with a
non-linear adder characteristic. Consequently, the
divergence of the output can be
suppressed even when the input to the non-linear
digital filter 72 becomes large, and excitation
vectors good for practical use can be generated continuously.
Further, the randomness of the excitation vectors to be
generated can be secured.
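The oscillator of this mode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gains a1 = 0.5 and a2 = -1.5 are not given in the text and were chosen here merely so that the poles lie outside the unit circle, and the function names are hypothetical. The 2's complement characteristic is modelled as a wrap-around of the adder sum into the range -1 to +1.

```python
def wrap_twos_complement(x):
    """Fold x back into the range -1 to +1, as a 2's complement adder overflows."""
    return (x + 1.0) % 2.0 - 1.0


def oscillate(seed, length, a1=0.5, a2=-1.5):
    """Generate `length` samples from a second-order all-pole non-linear filter.

    seed   -- (y(k-1), y(k-2)) initial filter states, e.g. one row of Table 1
    a1, a2 -- assumed gains (not from the source); the roots of
              z^2 - a1*z - a2 have magnitude sqrt(1.5) > 1
    """
    y1, y2 = seed
    out = []
    for _ in range(length):
        y = wrap_twos_complement(a1 * y1 + a2 * y2)  # the input sample is zero
        out.append(y)
        y1, y2 = y, y1  # shift the filter states
    return out


vec_a = oscillate((0.250000, 0.250000), 52)    # seed 1 of Table 1
vec_b = oscillate((-0.564643, -0.104927), 52)  # seed 2 of Table 1
```

Because the sequence depends only on the seed, a decoder holding the same seed table can regenerate the same vector from a seed number alone.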
Although this mode has been described as a speech
coder, the excitation vector generator 70 can be
adapted to a speech decoder. In this case, the
speech decoder has a seed storage section with the
same contents as those of the seed storage section 71
of the speech coder and the control switch 41 for the
seed storage section is supplied with a seed number
selected at the time of coding.


(Fifth Mode)
FIG. 11 is a block diagram of the essential
portions of a speech coder according to this mode.
This speech coder comprises an excitation vector
generator 110, which has an excitation vector storage
section 111 and an added-excitation-vector generator
112, and an LPC synthesis filter 113.
The excitation vector storage section 111 retains
old excitation vectors which are read by a control
switch upon reception of a control signal from an
unillustrated distortion calculator.
The added-excitation-vector generator 112
performs a predetermined process, indicated by an
added-excitation-vector number, on
an old excitation vector read from the storage
section 111 to produce a new excitation vector. The
added-excitation-vector generator 112 has a function
of switching the process content for an old
excitation vector in accordance with the
added-excitation-vector number.
According to the thus constituted speech coder,
an added-excitation-vector number is given from the
distortion calculator which is executing, for example,
an excitation vector search. The added-excitation-vector
generator 112 executes different processes on
old excitation vectors depending on the value of the
input added-excitation-vector number to generate
different added excitation vectors, and the LPC
synthesis filter 113 performs LPC synthesis on the
input excitation vector to output a synthesized
speech.
According to this mode, random excitation vectors
can be generated simply by storing fewer old
excitation vectors in the excitation vector storage
section 111 and switching the process contents by
means of the added-excitation-vector generator 112,
and it is unnecessary to store random code vectors
directly in a random codebook (ROM). This can
significantly reduce the memory capacity.
Although this mode has been described as a speech
coder, the excitation vector generator 110 can be
adapted to a speech decoder. In this case, the
speech decoder has an excitation vector storage
section with the same contents as those of the
excitation vector storage section 111 of the speech
coder and an added-excitation-vector number selected
at the time of coding is given to the added-
excitation-vector generator 112.
(Sixth Mode)
FIG. 12 shows the functional blocks of an
excitation vector generator according to this mode.
This excitation vector generator comprises an
added-excitation-vector generator 120 and an excitation
vector storage section 121 where a plurality of
element vectors 1 to N are stored.
The added-excitation-vector generator 120
includes a reading section 122 which performs a
process of reading a plurality of element vectors of
different lengths from different positions in the
excitation vector storage section 121, a reversing
section 123 which performs a process of sorting the
read element vectors in the reverse order, a
multiplying section 124 which performs a process of
multiplying a plurality of vectors after the reverse
process by different gains respectively, a decimating
section 125 which performs a process of shortening
the vector lengths of a plurality of vectors after
the multiplication, an interpolating section 126
which performs a process of lengthening the vector
lengths of the thinned vectors, an adding section 127
which performs a process of adding the interpolated
vectors, and a process determining/instructing
section 128 which has a function of determining a
specific processing scheme according to the value of
the input added-excitation-vector number and
instructing the individual sections and a function of
holding a conversion map (Table 2) between numbers
and processes which is referred to at the time of
determining the specific process contents.


Table 2: Conversion map between numbers and
processes
Bit stream (MSB...LSB)          6  5  4  3  2  1  0
V1 reading position (16 kinds)           3  2  1  0
V2 reading position (32 kinds)  2  1  0        4  3
V3 reading position (32 kinds)  4  3  2  1  0
Reverse process (2 kinds)                         0
Multiplication (4 kinds)        1  0
Decimating process (4 kinds)             1  0
Interpolation (2 kinds)                     0


The added-excitation-vector generator 120 will
now be described more specifically. The
added-excitation-vector generator 120 determines specific
processing schemes for the reading section 122, the
reversing section 123, the multiplying section 124,
the decimating section 125, the interpolating section
126 and the adding section 127 by comparing the input
added-excitation-vector number (which is a sequence
of 7 bits taking any integer value from 0 to 127)
with the conversion map between numbers and processes
(Table 2), and reports the specific processing
schemes to the respective sections.
The reading section 122 first extracts an element
vector 1 (V1) of a length of 100 from one end of the
excitation vector storage section 121 to the position
of n1, paying attention to a sequence of the lower
four bits of the input added-excitation-vector number
(n1: an integer value from 0 to 15). Then, the
reading section 122 extracts an element vector 2 (V2)
of a length of 78 from the end of the excitation
vector storage section 121 to the position of n2+14
(an integer value from 14 to 45), paying attention to
a sequence of five bits (n2: an integer value from 0
to 31) having the lower two bits and the upper three
bits of the input added-excitation-vector number
linked together. Further, the reading section 122
performs a process of extracting an element vector 3
(V3) of a length of Ns (= 52) from one end of the
excitation vector storage section 121 to the position
of n3+46 (an integer value from 46 to 77), paying
attention to a sequence of the upper five bits of the
input added-excitation-vector number (n3: an integer
value from 0 to 31), and sending V1, V2 and V3 to the
reversing section 123.
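The bit-field extraction described above can be sketched as follows. The function name is hypothetical, and the order in which the lower two bits and the upper three bits are linked to form n2 is an assumption (here the lower two bits become the high part of n2); the text only says that the two groups are linked together.

```python
def reading_positions(number):
    """Split a 7-bit added-excitation-vector number into three reading positions."""
    if not 0 <= number <= 127:
        raise ValueError("number must fit in 7 bits")
    n1 = number & 0b1111            # lower four bits, 0..15
    low2 = number & 0b11            # lower two bits
    high3 = (number >> 4) & 0b111   # upper three bits
    n2 = (low2 << 3) | high3        # assumed linking order, 0..31
    n3 = (number >> 2) & 0b11111    # upper five bits, 0..31
    return n1, n2 + 14, n3 + 46


# Example: the number 86 (binary 1010110)
positions = reading_positions(0b1010110)
```

Since the three fields overlap, a single 7-bit number selects all three reading positions at once, which is what lets 128 numbers address many distinct combinations of a small store.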
The reversing section 123 performs a process of
sending vectors having V1, V2 and V3 rearranged in
the reverse order to the multiplying section 124 as
new V1, V2 and V3 when the least significant bit of
the added-excitation-vector number is "0" and sending
V1, V2 and V3 as they are to the multiplying section
124 when the least significant bit is "1."
Paying attention to a sequence of two bits having
the upper seventh and sixth bits of the
added-excitation-vector number linked, the multiplying
section 124 multiplies the amplitude of V2 by -2 when
the bit sequence is "00," multiplies the amplitude of
V3 by -2 when the bit sequence is "01," multiplies
the amplitude of V1 by -2 when the bit sequence is
"10" or multiplies the amplitude of V2 by 2 when the
bit sequence is "11," and sends the result as new V1,
V2 and V3 to the decimating section 125.
Paying attention to a sequence of two bits having
the upper fourth and third bits of the
added-excitation-vector number linked, the decimating
section 125
(a) sends vectors of 26 samples extracted every other
sample from V1, V2 and V3 as new V1, V2 and V3 to the
interpolating section 126 when the bit sequence is
"00,"
(b) sends vectors of 26 samples extracted every
other sample from V1 and V3 and every third sample
from V2 as new V1, V3 and V2 to the interpolating
section 126 when the bit sequence is "01,"
(c) sends vectors of 26 samples extracted every
fourth sample from V1 and every other sample from V2
and V3 as new V1, V2 and V3 to the interpolating
section 126 when the bit sequence is "10," and
(d) sends vectors of 26 samples extracted every
fourth sample from V1, every third sample from V2 and
every other sample from V3 as new V1, V2 and V3 to
the interpolating section 126 when the bit sequence
is "11."
Paying attention to the upper third bit of the
added-excitation-vector number, the interpolating
section 126
(a) sends vectors which have V1, V2 and V3
respectively substituted in even samples of zero
vectors of a length Ns (= 52) as new V1, V2 and V3 to
the adding section 127 when the value of the third
bit is "0" and
(b) sends vectors which have V1, V2 and V3
respectively substituted in odd samples of zero
vectors of a length Ns (= 52) as new V1, V2 and V3 to
the adding section 127 when the value of the third
bit is "1."
The adding section 127 adds the three vectors (V1,
V2 and V3) produced by the interpolating section 126
to generate an added excitation vector.
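The decimating, interpolating and adding steps can be illustrated with a short Python sketch. The helper names are hypothetical, and "even samples" is taken here to mean the zero-based indices 0, 2, 4, ..., which the text does not spell out.

```python
NS = 52  # subframe length Ns from the text


def decimate(vector, step, count=26):
    """Keep every `step`-th sample, returning at most `count` samples."""
    return vector[::step][:count]


def interpolate(vector, odd, length=NS):
    """Place the samples into the even (odd=False) or odd (odd=True) slots of a zero vector."""
    out = [0.0] * length
    for i, value in enumerate(vector):
        out[2 * i + (1 if odd else 0)] = value
    return out


def add_vectors(v1, v2, v3):
    """Sum three equal-length vectors sample by sample."""
    return [a + b + c for a, b, c in zip(v1, v2, v3)]


ramp = [float(i) for i in range(100)]
thinned = decimate(ramp, 2)               # 26 samples: 0, 2, 4, ...
even = interpolate(thinned, odd=False)    # length 52, values in even slots
odd = interpolate(thinned, odd=True)      # length 52, values in odd slots
mixed = add_vectors(even, odd, even)
```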
According to this mode, as apparent from the
above, a plurality of processes are combined at
random in accordance with the added-excitation-vector
number to produce random excitation vectors, so that
it is unnecessary to store random code vectors as
they are in a random codebook (ROM), ensuring a
significant reduction in memory capacity.
Note that the use of the excitation vector
generator of this mode in the speech coder of the
fifth mode can allow complicated and random
excitation vectors to be generated without using a
large-capacity random codebook.
(Seventh Mode)
A description will now be given of a seventh mode
in which the excitation vector generator of any one
of the above-described first to sixth modes is used
in a CELP type speech coder that is based on the PSI-
CELP, the standard speech coding/decoding system for
PDC digital portable telephones in Japan.
FIG. 13A presents a block diagram of a speech
coder according to the seventh mode. In this speech
coder, digital input speech data 1300 is supplied to
a buffer 1301 frame by frame (frame length Nf = 104).
At this time, old data in the buffer 1301 is updated
with new data supplied. A frame power
quantizing/decoding section 1302 first reads a
processing frame s(i) (0 ≤ i ≤ Nf-1) of a length Nf
(= 104) from the buffer 1301 and acquires mean power
amp of samples in that processing frame from
equation 5.
$$\mathrm{amp} = \frac{1}{Nf} \sum_{i=0}^{Nf-1} s^2(i) \qquad (5)$$
where amp: mean power of samples in a processing
frame
i: element number (0 ≤ i ≤ Nf-1) in the
processing frame
s(i): samples in the processing frame
Nf: processing frame length (= 104).
The acquired mean power amp of samples in the
processing frame is converted to a logarithmically
converted value amplog from equation 6.
$$\mathrm{amplog} = \frac{\log_{10}(255 \times \mathrm{amp} + 1)}{\log_{10}(255 + 1)} \qquad (6)$$
where amplog: logarithmically converted value of the
mean power of samples in the processing frame
amp: mean power of samples in the processing
frame.
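As a rough illustration, the power computation and its logarithmic conversion might look like this in Python. The function names are hypothetical, equation 5 is read here as the mean of the squared samples, and the samples are assumed to be normalized so that amp falls between 0 and 1.

```python
import math


def mean_power(frame):
    """Equation 5 as read here: mean of the squared samples of one frame."""
    return sum(s * s for s in frame) / len(frame)


def log_power(amp):
    """Equation 6: map amp in [0, 1] onto [0, 1] on a logarithmic scale."""
    return math.log10(255.0 * amp + 1.0) / math.log10(255.0 + 1.0)


frame = [0.5] * 104               # a constant test frame of length Nf = 104
amplog = log_power(mean_power(frame))
```

The conversion sends amp = 0 to 0 and amp = 1 to 1 while spending more of the output range on low powers, which is why a uniform-looking table can quantize it afterwards.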
The acquired amplog is subjected to scalar
quantization using a scalar-quantization table Cpow
of 16 words as shown in Table 3 stored in a power
quantization table storage section 1303 to acquire an
index of power Ipow of four bits, decoded frame power
spow is obtained from the acquired index of power
Ipow, and the index of power Ipow and decoded frame
power spow are supplied to a parameter coding section
1331. The power quantization table storage section
1303 holds a power scalar-quantization table
(Table 3) of 16 words, which is referred to when the
frame power quantizing/decoding section 1302 carries
out scalar quantization of the logarithmically
converted value of the mean power of the samples in
the processing frame.


Table 3: Power scalar-quantization table
  i   Cpow(i)  |   i   Cpow(i)
  1   0.00675  |   9   0.39247
  2   0.06217  |  10   0.42920
  3   0.10877  |  11   0.46252
  4   0.16637  |  12   0.49503
  5   0.21876  |  13   0.52784
  6   0.26123  |  14   0.56484
  7   0.30799  |  15   0.61125
  8   0.35228  |  16   0.67498
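A minimal sketch of the table lookup follows. It assumes a nearest-value search, which the text does not state explicitly, and the function name is hypothetical; how the decoded frame power spow is derived from the index is not shown because the text does not describe it.

```python
# Cpow(1)..Cpow(16) as listed in Table 3
CPOW = [0.00675, 0.06217, 0.10877, 0.16637, 0.21876, 0.26123, 0.30799, 0.35228,
        0.39247, 0.42920, 0.46252, 0.49503, 0.52784, 0.56484, 0.61125, 0.67498]


def quantize_power(amplog):
    """Return (Ipow, table value) for the entry closest to amplog; Ipow is 1-based."""
    best = min(range(len(CPOW)), key=lambda i: abs(CPOW[i] - amplog))
    return best + 1, CPOW[best]


ipow, value = quantize_power(0.4)
```

Sixteen entries fit exactly in the four-bit index Ipow.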


An LPC analyzing section 1304 first reads
analysis segment data of an analysis segment length
Nw (= 256) from the buffer 1301, multiplies the read
analysis segment data by a Hamming window of a window
length Nw (= 256) to yield Hamming windowed
analysis data, and acquires the autocorrelation
function of the obtained Hamming windowed analysis
data up to a prediction order Np (= 10). The obtained
autocorrelation function is multiplied by a lag
window table (Table 4) of 10 words stored in a lag
window storage section 1305 to acquire a lag
windowed autocorrelation function. The LPC analyzing
section 1304 then performs linear
predictive analysis on the obtained lag windowed
autocorrelation function to compute an LPC parameter
α(i) (1 ≤ i ≤ Np) and outputs the parameter to a
pitch pre-selector 1308.


Table 4: Lag window table
  i   Wlag(i)    |   i   Wlag(i)
  0   0.9994438  |   5   0.9801714
  1   0.9977772  |   6   0.9731081
  2   0.9950056  |   7   0.9650213
  3   0.9911382  |   8   0.9559375
  4   0.9861880  |   9   0.9458861
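The analysis chain (Hamming window, autocorrelation, lag window, linear predictive analysis) can be sketched as below. Two points are assumptions rather than statements of the text: the linear predictive analysis is carried out here with the Levinson-Durbin recursion, and the table entry Wlag(i) is applied to lag i+1 while lag 0 is left unchanged. All function names are hypothetical, and the test signal is synthetic.

```python
import math
import random

NW, NP = 256, 10  # analysis length and prediction order from the text
WLAG = [0.9994438, 0.9977772, 0.9950056, 0.9911382, 0.9861880,
        0.9801714, 0.9731081, 0.9650213, 0.9559375, 0.9458861]  # Table 4


def autocorrelation(x, max_lag):
    """r[m] = sum of x[n] * x[n + m] for m = 0..max_lag."""
    return [sum(x[n] * x[n + m] for n in range(len(x) - m))
            for m in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for alpha(1..order) of A(z) = 1 + sum alpha(i) z^-i; also return the residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        a = [1.0] + [a[j] + k * a[m - j] for j in range(1, m)] + [k] + [0.0] * (order - m)
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(segment):
    """Hamming window, autocorrelation, lag window, then Levinson-Durbin."""
    n = len(segment)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(segment)]
    r = autocorrelation(windowed, NP)
    # Assumption: Wlag(i) weights lag i + 1 and lag 0 is left unchanged.
    r = [r[0]] + [r[m] * WLAG[m - 1] for m in range(1, NP + 1)]
    return levinson_durbin(r, NP)


rng = random.Random(0)  # synthetic test signal: a sinusoid plus a little noise
segment = [math.sin(0.3 * i) + 0.1 * rng.gauss(0.0, 1.0) for i in range(NW)]
alpha, residual_energy = lpc_analysis(segment)
```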


Next, the obtained LPC parameter α(i) is
converted to an LSP (Line Spectrum Pair) ω(i) (1 ≤
i ≤ Np), which is in turn output to an LSP
quantizing/decoding section 1306. The lag window
storage section 1305 holds the lag window table to
which the LPC analyzing section refers.
The LSP quantizing/decoding section 1306 first
refers to a vector quantization table of an LSP
stored in an LSP quantization table storage section
1307 to perform vector quantization on the LSP
received from the LPC analyzing section 1304, thereby
selecting an optimal index, and sends the selected
index as an LSP code Ilsp to the parameter coding
section 1331. Then, a centroid corresponding to the
LSP code is read as a decoded LSP ωq(i) (1 ≤ i ≤ Np)
from the LSP quantization table storage section 1307,
and the read decoded LSP is sent to an LSP
interpolation section 1311. Further, the decoded LSP
is converted to an LPC to acquire a decoded LPC αq(i)
(1 ≤ i ≤ Np), which is in turn sent to a spectral
weighting filter coefficients calculator 1312 and a
perceptual weighted LPC synthesis filter coefficients
calculator 1314. The LSP quantization table storage
section 1307 holds an LSP vector quantization
table to which the LSP quantizing/decoding section
1306 refers when performing vector quantization on an
LSP.
The pitch pre-selector 1308 first subjects the
processing frame data s(i) (0 ≤ i ≤ Nf-1) read from
the buffer 1301 to inverse filtering using the LPC α(i)
(1 ≤ i ≤ Np) received from the LPC analyzing
section 1304 to obtain a linear predictive residual
signal res(i) (0 ≤ i ≤ Nf-1), computes the power of
the obtained linear predictive residual signal res(i),
acquires a normalized predictive residual power resid
resulting from normalization of the power of the
computed residual signal with the power of speech
samples of a processing subframe, and sends the
normalized predictive residual power to the parameter
coding section 1331. Next, the linear predictive
residual signal res(i) is multiplied by a Hamming
window of a length Nw (= 256) to produce a Hamming
windowed linear predictive residual signal resw(i) (0
≤ i ≤ Nw-1), and an autocorrelation function φint(i)
of the produced resw(i) is obtained over a range of
Lmin-2 ≤ i ≤ Lmax+2 (where Lmin is 16 in the
shortest analysis segment of a long predictive
coefficient and Lmax is 128 in the longest analysis
segment of a long predictive coefficient). A
polyphase filter coefficient Cppf (Table 5) of 28
words stored in a polyphase coefficients storage
section 1309 is convolved with the obtained
autocorrelation function φint(i) to acquire an
autocorrelation function φdq(i) at a fractional
position shifted by -1/4 from an integer lag int, an
autocorrelation function φaq(i) at a fractional
position shifted by +1/4 from the integer lag int,
and an autocorrelation function φah(i) at a
fractional position shifted by +1/2 from the integer
lag int.
Table 5: Polyphase filter coefficients Cppf
  i   Cppf(i)    |   i   Cppf(i)    |   i   Cppf(i)    |   i   Cppf(i)
  0    0.100035  |   7    0.000000  |  14   -0.128617  |  21   -0.212207
  1   -0.180063  |   8    0.000000  |  15    0.300105  |  22    0.636620
  2    0.900316  |   9    1.000000  |  16    0.900316  |  23    0.636620
  3    0.300105  |  10    0.000000  |  17   -0.180063  |  24   -0.212207
  4   -0.128617  |  11    0.000000  |  18    0.100035  |  25    0.127324
  5    0.081847  |  12    0.000000  |  19   -0.069255  |  26   -0.090946
  6   -0.060021  |  13    0.000000  |  20    0.052960  |  27    0.070736


Further, for each argument i in a range of Lmin-2 ≤
i ≤ Lmax+2, the process of equation 7 of
substituting the largest one of φint(i), φdq(i),
φaq(i) and φah(i) in φmax(i) is performed to acquire
(Lmax - Lmin + 1) pieces of φmax(i).
$$\phi_{max}(i) = \mathrm{MAX}(\phi_{int}(i), \phi_{dq}(i), \phi_{aq}(i), \phi_{ah}(i)) \qquad (7)$$
where φmax(i): the maximum value among φint(i),
φdq(i), φaq(i), φah(i)
i: analysis segment of a long predictive
coefficient (Lmin ≤ i ≤ Lmax)
Lmin: shortest analysis segment (= 16) of the
long predictive coefficient
Lmax: longest analysis segment (= 128) of the
long predictive coefficient
φint(i): autocorrelation function of an integer
lag (int) of a predictive residual signal
φdq(i): autocorrelation function of a fractional
lag (int-1/4) of the predictive residual signal
φaq(i): autocorrelation function of a fractional
lag (int+1/4) of the predictive residual signal
φah(i): autocorrelation function of a fractional
lag (int+1/2) of the predictive residual signal.
The six largest values are selected from the acquired
(Lmax - Lmin + 1) pieces of φmax(i) and are saved as
pitch candidates psel(i) (0 ≤ i ≤ 5), and the linear
predictive residual signal res(i) and the first pitch
candidate psel(0) are sent to a pitch weighting
filter calculator 1310 and psel(i) (0 ≤ i ≤ 5) to an
adaptive code vector generator 1319.
The polyphase coefficients storage section 1309
holds polyphase filter coefficients to be
referred to when the pitch pre-selector 1308 acquires
the autocorrelation of the linear predictive residual
signal to a fractional lag precision and when the
adaptive code vector generator 1319 produces adaptive
code vectors to a fractional precision.
The pitch weighting filter calculator 1310
acquires pitch predictive coefficients cov(i) (0 ≤ i ≤
2) of a third order from the linear predictive
residuals res(i) and the first pitch candidate
psel(0) obtained by the pitch pre-selector 1308. The
impulse response of a pitch weighting filter Q(z) is
obtained from equation 8, which uses the acquired
pitch predictive coefficients cov(i) (0 ≤ i ≤ 2),
and is sent to the spectral weighting filter
coefficients calculator 1312 and a perceptual
weighting filter coefficients calculator 1313.
$$Q(z) = 1 + \sum_{i=0}^{2} cov(i) \times \lambda_{pi} \times z^{-psel(0)+i-1} \qquad (8)$$
where Q(z): transfer function of the pitch weighting
filter
cov(i): pitch predictive coefficients (0 ≤ i ≤
2)
λpi: pitch weighting constant (= 0.4)
psel(0): first pitch candidate.
The LSP interpolation section 1311 first acquires
a decoded interpolated LSP ωintp(n,i) (1 ≤ i ≤ Np)
subframe by subframe from equation 9, which uses a
decoded LSP ωq(i) for the current processing frame,
obtained by the LSP quantizing/decoding section 1306,
and a decoded LSP ωqp(i) for a previous processing
frame which has been acquired and saved earlier.
$$\omega_{intp}(n,i) = \begin{cases} 0.4 \times \omega_q(i) + 0.6 \times \omega_{qp}(i) & n = 1 \\ \omega_q(i) & n = 2 \end{cases} \qquad (9)$$
where ωintp(n,i): interpolated LSP of the n-th
subframe
n: subframe number (= 1,2)
ωq(i): decoded LSP of a processing frame
ωqp(i): decoded LSP of a previous processing
frame.
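Equation 9 translates almost directly into code. The sketch below uses a hypothetical function name and toy two-element LSP vectors in place of the Np = 10 values of the text.

```python
def interpolate_lsp(lsp_current, lsp_previous, subframe):
    """Equation 9: blend the decoded LSPs for subframe 1, pass them through for subframe 2."""
    if subframe == 1:
        return [0.4 * q + 0.6 * qp for q, qp in zip(lsp_current, lsp_previous)]
    if subframe == 2:
        return list(lsp_current)
    raise ValueError("subframe must be 1 or 2")


first = interpolate_lsp([1.0, 2.0], [0.0, 1.0], 1)
second = interpolate_lsp([1.0, 2.0], [0.0, 1.0], 2)
```

The first subframe leans toward the previous frame and the second uses the current frame as decoded, so the spectral envelope moves in two steps per frame instead of one.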
A decoded interpolated LPC αq(n,i) (1 ≤ i ≤ Np)
is obtained by converting the acquired ωintp(n,i) to
an LPC, and the acquired decoded interpolated LPC
αq(n,i) (1 ≤ i ≤ Np) is sent to the spectral
weighting filter coefficients calculator 1312 and the
perceptual weighted LPC synthesis filter coefficients
calculator 1314.
The spectral weighting filter coefficients
calculator 1312, which constitutes an MA type
spectral weighting filter I(z) in equation 10,
sends its impulse response to the perceptual
weighting filter coefficients calculator 1313.
$$I(z) = \sum_{i=1}^{Nfir} \alpha_{fir}(i) \times z^{-i} \qquad (10)$$
where I(z): transfer function of the MA type spectral
weighting filter
Nfir: filter order (= 11) of I(z)
αfir(i): filter coefficients (1 ≤ i ≤ Nfir) of I(z).


Note that the impulse response αfir(i) (1 ≤ i ≤
Nfir) in equation 10 is the impulse response of an
ARMA type spectral weighting filter G(z), given by
equation 11, cut after Nfir (= 11).
$$G(z) = \frac{1 + \sum_{i=1}^{Np} \alpha(n,i) \times \lambda_{ma}^{i} \times z^{-i}}{1 + \sum_{i=1}^{Np} \alpha(n,i) \times \lambda_{ar}^{i} \times z^{-i}} \qquad (11)$$
where G(z): transfer function of the spectral
weighting filter
n: subframe number (= 1,2)
Np: LPC analysis order (= 10)
α(n,i): decoded interpolated LPC of the n-th
subframe
λma: numerator constant (= 0.9) of G(z)
λar: denominator constant (= 0.4) of G(z).
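Truncating the impulse response of the ARMA filter G(z) can be sketched with a direct recursion. The function name is hypothetical, and how the truncated samples are indexed as αfir(1) to αfir(Nfir) is left open because the text does not fix the offset.

```python
LAMBDA_MA, LAMBDA_AR, NFIR = 0.9, 0.4, 11  # constants from the text


def spectral_weighting_impulse_response(alpha, length=NFIR):
    """First `length` impulse-response samples of G(z) in equation 11.

    alpha -- decoded interpolated LPC alpha(n, 1..Np) of one subframe
    """
    num = [1.0] + [a * LAMBDA_MA ** (i + 1) for i, a in enumerate(alpha)]
    den = [1.0] + [a * LAMBDA_AR ** (i + 1) for i, a in enumerate(alpha)]
    h = []
    for k in range(length):
        value = num[k] if k < len(num) else 0.0
        # subtract the feedback through the denominator taps
        value -= sum(den[j] * h[k - j] for j in range(1, min(k, len(alpha)) + 1))
        h.append(value)
    return h


h_first_order = spectral_weighting_impulse_response([-0.9])  # toy first-order LPC
```

Cutting the response after 11 samples replaces the recursive filter by a short MA filter, which can then be convolved with the pitch weighting response.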
The perceptual weighting filter coefficients
calculator 1313 first constitutes a perceptual
weighting filter W(z) which has as an impulse
response the result of convolution of the impulse
response of the spectral weighting filter I(z)
received from the spectral weighting filter
coefficients calculator 1312 and the impulse response
of the pitch weighting filter Q(z) received from the
pitch weighting filter calculator 1310, and sends the
impulse response of the constituted perceptual
weighting filter W(z) to the perceptual weighted LPC
synthesis filter coefficients calculator 1314 and a
perceptual weighting section 1315.
The perceptual weighted LPC synthesis filter
coefficients calculator 1314 constitutes a perceptual
weighted LPC synthesis filter H(z) from equation
12 based on the decoded interpolated LPC αq(n,i)
received from the LSP interpolation section 1311 and
the perceptual weighting filter W(z) received from
the perceptual weighting filter coefficients
calculator 1313.
$$H(z) = \frac{W(z)}{1 + \sum_{i=1}^{Np} \alpha_q(n,i) \times z^{-i}} \qquad (12)$$
where H(z): transfer function of the perceptual
weighted synthesis filter
Np: LPC analysis order
αq(n,i): decoded interpolated LPC of the n-th
subframe
n: subframe number (= 1,2)
W(z): transfer function of the perceptual
weighting filter (I(z) and Q(z) cascade-connected).
The coefficient of the constituted perceptual
weighted LPC synthesis filter H(z) is sent to a
target vector generator A 1316, a perceptual weighted
LPC reverse synthesis filter A 1317, a perceptual
weighted LPC synthesis filter A 1321, a perceptual
weighted LPC reverse synthesis filter B 1326 and a
perceptual weighted LPC synthesis filter B 1329.
The perceptual weighting section 1315 inputs a
subframe signal read from the buffer 1301 to the
perceptual weighted LPC synthesis filter H(z) in a
zero state, and sends its outputs as perceptual
weighted residuals spw(i) (0 ≤ i ≤ Ns-1) to the
target vector generator A 1316.
The target vector generator A 1316 subtracts a
zero input response Zres(i) (0 ≤ i ≤ Ns-1), which is
an output when a zero sequence is input to the
perceptual weighted LPC synthesis filter H(z)
obtained by the perceptual weighted LPC synthesis
filter coefficients calculator 1314, from the
perceptual weighted residuals spw(i) (0 ≤ i ≤ Ns-1)
obtained by the perceptual weighting section 1315,
and sends the subtraction result to the perceptual
weighted LPC reverse synthesis filter A 1317 and a
target vector generator B 1325 as a
target vector r(i) (0 ≤ i ≤ Ns-1) for selecting an
excitation vector.
The perceptual weighted LPC reverse synthesis
filter A 1317 sorts the target vectors r(i) (0 ≤ i ≤
Ns-1) received from the target vector generator A
1316 in a time reverse order, inputs the acquired
vectors to the perceptual weighted LPC synthesis
filter H(z) with the initial state of zero, and sorts
its outputs again in a time reverse order to obtain
time reverse synthesis rh(k) (0 ≤ k ≤ Ns-1) of the
target vector, and sends the vector to a comparator A
1322.
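The time reverse synthesis can be sketched as below, with the synthesis filter H(z) represented by a truncated impulse response h; that representation, the function names and the toy vectors are assumptions made only for illustration. The point of the operation is that the inner product of rh with any code vector equals the inner product of the target with the synthesized code vector, so candidates can be ranked without synthesizing each one.

```python
def synthesize(vector, h):
    """Zero-state filtering of `vector` by impulse response `h`, truncated to the input length."""
    return [sum(h[j] * vector[n - j] for j in range(min(n + 1, len(h))))
            for n in range(len(vector))]


def time_reverse_synthesis(target, h):
    """Reverse the target, filter it from a zero state, and reverse the output again."""
    return synthesize(target[::-1], h)[::-1]


def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


h = [1.0, 0.5, 0.25]           # toy impulse response
r = [1.0, -2.0, 3.0, 0.5]      # toy target vector
p = [0.2, 0.0, -1.0, 4.0]      # toy code vector
rh = time_reverse_synthesis(r, h)
```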
Stored in an adaptive codebook 1318 are old
excitation vectors which are referred to when the
adaptive code vector generator 1319 generates
adaptive code vectors. The adaptive code vector
generator 1319 generates Nac pieces of adaptive code
vectors Pacb(i,k) (0 ≤ i ≤ Nac-1, 0 ≤ k ≤ Ns-1, 6 ≤
Nac ≤ 24) based on six pitch candidates psel(j) (0 ≤
j ≤ 5) received from the pitch pre-selector 1308,
and sends the vectors to an adaptive/fixed selector
1320. Specifically, as shown in Table 6, adaptive
code vectors are generated for four kinds of
fractional lag positions per a single integer lag
position when 16 ≤ psel(j) ≤ 44, adaptive code
vectors are generated for two kinds of fractional lag
positions per a single integer lag position when 45 ≤
psel(j) ≤ 64, and adaptive code vectors are generated
for integer lag positions when 65 ≤ psel(j) ≤ 128.
From this, depending on the value of psel(j) (0 ≤ j ≤
5), the number of adaptive code vector candidates
Nac is 6 at a minimum and 24 at a maximum.
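The way Nac follows from the six pitch candidates can be checked with a small sketch. The function names are hypothetical, and the boundary of the middle range is taken as 45, following Table 6.

```python
def lag_kinds(psel):
    """Number of lag positions generated for one integer pitch candidate."""
    if 16 <= psel <= 44:
        return 4
    if 45 <= psel <= 64:
        return 2
    if 65 <= psel <= 128:
        return 1
    raise ValueError("pitch candidate outside 16..128")


def count_adaptive_candidates(psel_list):
    """Nac for a list of six pitch candidates psel(0)..psel(5)."""
    return sum(lag_kinds(p) for p in psel_list)


nac_min = count_adaptive_candidates([70] * 6)   # all long lags
nac_max = count_adaptive_candidates([20] * 6)   # all short lags
```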


Table 6: Total number of adaptive code vectors
and fixed code vectors
Total number of vectors            255
Number of adaptive code vectors    222
  16 ≤ psel(i) ≤ 44                116 (29 x four kinds of fractional lags)
  45 ≤ psel(i) ≤ 64                42 (21 x two kinds of fractional lags)
  65 ≤ psel(i) ≤ 128               64 (64 x one kind of fractional lag)
Number of fixed code vectors       32 (16 x two kinds of codes)


Adaptive code vectors to a fractional precision
are generated through an interpolation which
convolutes the coefficients of the polyphase filter
stored in the polyphase coefficients storage section
1309.
Interpolation corresponding to the value of
lagf(i) means interpolation corresponding to an
integer lag position when lagf(i) = 0, interpolation
corresponding to a fractional lag position shifted by
-1/2 from an integer lag position when lagf(i) = 1,
interpolation corresponding to a fractional lag
position shifted by +1/4 from an integer lag position
when lagf(i) = 2, and interpolation corresponding to
a fractional lag position shifted by -1/4 from an
integer lag position when lagf(i) = 3.
The adaptive/fixed selector 1320 first receives
adaptive code vectors of the Nac (6 to 24) candidates
generated by the adaptive code vector generator 1319
and sends the vectors to the perceptual weighted LPC
synthesis filter A 1321 and the comparator A 1322.
To pre-select the adaptive code vectors Pacb(i,k)
(0 ≤ i ≤ Nac-1, 0 ≤ k ≤ Ns-1, 6 ≤ Nac ≤ 24)
generated by the adaptive code vector generator 1319
to Nacb (= 4) candidates from Nac (6 to 24)
candidates, the comparator A 1322 first acquires the
inner products prac(i) of the time reverse
synthesized vectors rh(k) (0 ≤ k ≤ Ns-1) of the
target vector, received from the perceptual weighted
LPC reverse synthesis filter A 1317, and the adaptive
code vectors Pacb(i,k) from equation 13.
$$prac(i) = \sum_{k=0}^{Ns-1} Pacb(i,k) \times rh(k) \qquad (13)$$
where prac(i): reference value for pre-selection of
adaptive code vectors
Nac: the number of adaptive code vector
candidates before pre-selection (= 6 to 24)
i: number of an adaptive code vector (0 ≤ i ≤
Nac-1)
Pacb(i,k): adaptive code vector
rh(k): time reverse synthesis of the target
vector r(k).
By comparing the obtained inner products prac(i),
the indices of the Nacb (= 4) largest inner products
and the inner products with those
indices used as arguments are selected and are
respectively saved as indices of adaptive code
vectors after pre-selection apsel(j) (0 ≤ j ≤ Nacb-1)
and reference values after pre-selection of
adaptive code vectors prac(apsel(j)), and the indices
of adaptive code vectors after pre-selection apsel(j)
(0 ≤ j ≤ Nacb-1) are output to the adaptive/fixed
selector 1320.
The perceptual weighted LPC synthesis filter A
1321 performs perceptual weighted LPC synthesis on
adaptive code vectors after pre-selection
Pacb(apsel(j),k), which have been generated by the
adaptive code vector generator 1319 and have passed
the adaptive/fixed selector 1320, to generate
synthesized adaptive code vectors SYNacb(apsel(j),k)
which are in turn sent to the comparator A 1322.
Then, the comparator A 1322 acquires reference values
for final-selection of an adaptive code vector
sacbr(j) from equation 14 for final-selection on
the Nacb (= 4) adaptive code vectors after
pre-selection Pacb(apsel(j),k), pre-selected by the
comparator A 1322 itself.
$$\mathrm{sacbr}(j) = \frac{\mathrm{prac}^2(\mathrm{apsel}(j))}{\sum_{k=0}^{Ns-1} \mathrm{SYNacb}^2(j,k)} \tag{14}$$
where sacbr(j): reference value for final-selection
of an adaptive code vector
prac(): reference values after pre-selection of
adaptive code vectors
apsel(j): indices of adaptive code vectors after
pre-selection
k: vector order (0 ≤ k ≤ Ns-1)
j: number of the index of a pre-selected adaptive
code vector (0 ≤ j ≤ Nacb-1)
Ns: subframe length (= 52)
Nacb: the number of pre-selected adaptive code
vectors (= 4)
SYNacb(j,k): synthesized adaptive code vectors.
The index at which the value of the equation 14
is largest and the value of the equation 14 at that
index are sent to the adaptive/fixed selector 1320
respectively as an index of adaptive code vector
after final-selection ASEL and a reference value
after final-selection of an adaptive code vector
sacbr(ASEL).
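As a rough sketch of the final selection, assuming the reference value of equation 14 is the squared pre-selection value divided by the energy of the synthesized vector, the search could look as follows in Python. The function name final_select, the zero-energy guard and the toy values are assumptions made for this illustration.

```python
def final_select(prac_sel, syn_vectors):
    """Return (j, value) maximizing prac^2 / energy of the synthesized vector."""
    best_j, best_val = -1, float("-inf")
    for j, (p, syn) in enumerate(zip(prac_sel, syn_vectors)):
        energy = sum(s * s for s in syn)
        if energy == 0.0:
            continue  # assumption: skip all-zero synthesized vectors
        val = p * p / energy
        if val > best_val:
            best_j, best_val = j, val
    return best_j, best_val


# Toy data (hypothetical): two pre-selected candidates of length 2.
prac_sel = [2.0, 1.0]
syn = [[2.0, 0.0], [0.5, 0.0]]
j_best, sacbr_best = final_select(prac_sel, syn)
```

In this toy case the second candidate wins although its inner product is smaller, because its synthesized energy is much lower; this is the reason the final selection is repeated on synthesized vectors.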
A fixed codebook 1323 holds Nfc (= 16) candidates
of vectors to be read by a fixed code vector reading
section 1324. To pre-select fixed code vectors
Pfcb(i,k) (0 ≤ i ≤ Nfc-1, 0 ≤ k ≤ Ns-1) read by the
fixed code vector reading section 1324 to Nfcb (= 2)
candidates from Nfc (= 16) candidates, the comparator
A 1322 acquires the absolute values |prfc(i)| of the
inner products of the time reverse synthesized
vectors rh(k) (0 ≤ k ≤ Ns-1) of the target vector,
received from the perceptual weighted LPC reverse
synthesis filter A 1317, and the fixed code vectors
Pfcb(i,k) from an equation 15.
$$|\mathrm{prfc}(i)| = \left| \sum_{k=0}^{Ns-1} \mathrm{Pfcb}(i,k) \times \mathrm{rh}(k) \right| \tag{15}$$
where |prfc(i)|: reference values for pre-selection
of fixed code vectors
k: element number of a vector (0 ≤ k ≤ Ns-1)
i: number of a fixed code vector (0 ≤ i ≤ Nfc-1)
Nfc: the number of fixed code vectors (= 16)
Pfcb(i,k): fixed code vectors
rh(k): time reverse synthesized vectors of the
target vector r(k).
By comparing the values |prfc(i)| of the equation
15, the Nfcb (= 2) indices that give the largest
values are selected, together with the absolute
values of the inner products at those indices. They
are saved respectively as indices of fixed code
vectors after pre-selection fpsel(j) (0 ≤ j ≤ Nfcb-1)
and reference values for fixed code vectors after
pre-selection |prfc(fpsel(j))|, and the indices of
fixed code vectors after pre-selection fpsel(j)
(0 ≤ j ≤ Nfcb-1) are output to the adaptive/fixed
selector 1320.
The perceptual weighted LPC synthesis filter A
1321 performs perceptual weighted LPC synthesis on
fixed code vectors after pre-selection
Pfcb(fpsel(j),k) which have been read from the fixed
code vector reading section 1324 and have passed the
adaptive/fixed selector 1320, to generate synthesized
fixed code vectors SYNfcb(fpsel(j),k) which are in
turn sent to the comparator A 1322.
The comparator A 1322 further acquires a
reference value for final-selection of a fixed code
vector sfcbr(j) from an equation 16 to finally select
an optimal fixed code vector from the Nfcb (= 2)
fixed code vectors after pre-selection
Pfcb(fpsel(j),k), pre-selected by the comparator A
1322 itself.
$$\mathrm{sfcbr}(j) = \frac{|\mathrm{prfc}(\mathrm{fpsel}(j))|^2}{\sum_{k=0}^{Ns-1} \mathrm{SYNfcb}^2(j,k)} \tag{16}$$
where sfcbr(j): reference value for final-selection
of a fixed code vector
|prfc()|: reference values after pre-selection of
fixed code vectors
fpsel(j): indices of fixed code vectors after
pre-selection (0 ≤ j ≤ Nfcb-1)
k: element number of a vector (0 ≤ k ≤ Ns-1)
j: number of a pre-selected fixed code vector (0
≤ j ≤ Nfcb-1)
Ns: subframe length (= 52)
Nfcb: the number of pre-selected fixed code
vectors (= 2)
SYNfcb(j,k): synthesized fixed code vectors.
The index at which the value of the equation 16
is largest and the value of the equation 16 at that
index are sent to the adaptive/fixed selector 1320
respectively as an index of fixed code vector after
final-selection FSEL and a reference value after
final-selection of a fixed code vector sfcbr(FSEL).
The adaptive/fixed selector 1320 selects either
the adaptive code vector after final-selection or the
fixed code vector after final-selection as an
adaptive/fixed code vector AF(k) (0 ≤ k ≤ Ns-1) in
accordance with the size relation and the polarity
relation among prac(ASEL), sacbr(ASEL), |prfc(FSEL)|
and sfcbr(FSEL) (described in an equation 17)
received from the comparator A 1322.
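$$\mathrm{AF}(k) = \begin{cases}
\mathrm{Pacb}(\mathrm{ASEL},k) & \mathrm{sacbr}(\mathrm{ASEL}) \ge \mathrm{sfcbr}(\mathrm{FSEL}),\ \mathrm{prac}(\mathrm{ASEL}) > 0 \\
0 & \mathrm{sacbr}(\mathrm{ASEL}) \ge \mathrm{sfcbr}(\mathrm{FSEL}),\ \mathrm{prac}(\mathrm{ASEL}) \le 0 \\
\mathrm{Pfcb}(\mathrm{FSEL},k) & \mathrm{sacbr}(\mathrm{ASEL}) < \mathrm{sfcbr}(\mathrm{FSEL}),\ \mathrm{prfc}(\mathrm{FSEL}) \ge 0 \\
-\mathrm{Pfcb}(\mathrm{FSEL},k) & \mathrm{sacbr}(\mathrm{ASEL}) < \mathrm{sfcbr}(\mathrm{FSEL}),\ \mathrm{prfc}(\mathrm{FSEL}) < 0
\end{cases} \tag{17}$$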
where AF(k): adaptive/fixed code vector
ASEL: index of adaptive code vector after final-
selection
FSEL: index of fixed code vector after final-
selection
k: element number of a vector
Pacb(ASEL,k): adaptive code vector after final-
selection
Pfcb(FSEL,k): fixed code vector after final-
selection
sacbr(ASEL): reference value after final-
selection of an adaptive code vector
sfcbr(FSEL): reference value after final-
selection of a fixed code vector
prac(ASEL): reference values after pre-selection
of adaptive code vectors
prfc(FSEL): reference values after pre-selection
of fixed code vectors.
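The four cases of equation 17 translate directly into a small decision function. The Python sketch below is illustrative only; the function name and the toy vectors are hypothetical.

```python
def select_adaptive_or_fixed(pacb_vec, pfcb_vec, sacbr, sfcbr, prac, prfc):
    """Return AF(k) according to the four cases of equation 17."""
    if sacbr >= sfcbr:
        # Adaptive side wins; a non-positive correlation yields the zero vector.
        if prac > 0:
            return list(pacb_vec)
        return [0.0] * len(pacb_vec)
    # Fixed side wins; the sign of prfc decides the polarity.
    if prfc >= 0:
        return list(pfcb_vec)
    return [-x for x in pfcb_vec]


# Toy vectors (hypothetical) of length 2.
pacb_vec = [1.0, 2.0]
pfcb_vec = [3.0, -4.0]
case_adaptive = select_adaptive_or_fixed(pacb_vec, pfcb_vec, 2.0, 1.0, 0.5, 1.0)
case_zero = select_adaptive_or_fixed(pacb_vec, pfcb_vec, 2.0, 1.0, -0.5, 1.0)
case_fixed = select_adaptive_or_fixed(pacb_vec, pfcb_vec, 1.0, 2.0, 0.5, 1.0)
case_negated = select_adaptive_or_fixed(pacb_vec, pfcb_vec, 1.0, 2.0, 0.5, -1.0)
```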
The selected adaptive/fixed code vector AF(k) is
sent to the perceptual weighted LPC synthesis filter
A 1321 and an index representing the number that has
generated the selected adaptive/fixed code vector
AF(k) is sent as an adaptive/fixed index AFSEL to the
parameter coding section 1331. As the total number
of adaptive code vectors and fixed code vectors is
designed to be 255 (see Table 6), the adaptive/fixed
index AFSEL is a code of 8 bits.
The perceptual weighted LPC synthesis filter A
1321 performs perceptual weighted LPC synthesis on
the adaptive/fixed code vector AF(k), selected by the
adaptive/fixed selector 1320, to generate a
synthesized adaptive/fixed code vector SYNaf(k) (0 ≤
k ≤ Ns-1) and sends it to the comparator A 1322.
The comparator A 1322 first obtains the power
powp of the synthesized adaptive/fixed code vector
SYNaf(k) (0 ≤ k ≤ Ns-1) received from the perceptual
weighted LPC synthesis filter A 1321 using an
equation 18.
$$\mathrm{powp} = \sum_{k=0}^{Ns-1} \mathrm{SYNaf}^2(k) \tag{18}$$
where powp: power of the synthesized adaptive/fixed
code vector (SYNaf(k))
k: element number of a vector (0 ≤ k ≤ Ns-1)
Ns: subframe length (= 52)
SYNaf(k): synthesized adaptive/fixed code vector.
Then, the inner product pr of the target vector
received from the target vector generator A 1316 and
the synthesized adaptive/fixed code vector SYNaf(k)
is acquired from an equation 19.
$$\mathrm{pr} = \sum_{k=0}^{Ns-1} \mathrm{SYNaf}(k) \times r(k) \tag{19}$$
where pr: inner product of SYNaf(k) and r(k)
Ns: subframe length (= 52)
SYNaf(k): synthesized adaptive/fixed code vector
r(k): target vector
k: element number of a vector (0 ≤ k ≤ Ns-1).
Further, the adaptive/fixed code vector AF(k)
received from the adaptive/fixed selector 1320 is
sent to an adaptive codebook updating section 1333,
and the power POWaf of AF(k) is computed. The
synthesized adaptive/fixed code vector SYNaf(k) and
POWaf are sent to the parameter coding section 1331,
and powp, pr, r(k) and rh(k) are sent to a comparator
B 1330.
The target vector generator B 1325 subtracts the
synthesized adaptive/fixed code vector SYNaf(k), received
from the comparator A 1322, from the target vector r(i) (0
≤ i ≤ Ns-1) received from the comparator A 1322, to
generate a new target vector, and sends the new target
vector to the perceptual weighted LPC reverse synthesis
filter B 1326.
The perceptual weighted LPC reverse synthesis filter B
1326 sorts the new target vectors, generated by the target
vector generator B 1325, in a time reverse order and sends
the sorted vectors to the perceptual weighted LPC synthesis
filter in a zero state. The output vectors are sorted again
in a time reverse order to generate time-reversed
synthesized vectors ph(k) (0 ≤ k ≤ Ns-1), which are in turn
sent to the comparator B 1330.
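The reverse, filter, reverse sequence described above can be sketched in Python. The sketch assumes a plain all-pole filter y[n] = x[n] - sum a[i] y[n-1-i] started from a zero state as a stand-in for the perceptual weighted LPC synthesis filter; the coefficient list a, the function names and the toy vectors are hypothetical.

```python
def synthesis_filter(x, a):
    """All-pole filter from a zero state: y[n] = x[n] - sum_i a[i] * y[n-1-i]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc -= ai * y[n - 1 - i]
        y.append(acc)
    return y


def reverse_synthesis(x, a):
    """Time-reverse the input, filter it from a zero state, time-reverse the output."""
    return synthesis_filter(x[::-1], a)[::-1]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Toy data (hypothetical): one filter coefficient, vectors of length 3.
a = [-0.5]
target = [1.0, 2.0, 3.0]
code = [1.0, 0.0, -1.0]
lhs = dot(synthesis_filter(code, a), target)   # synthesize the code vector, then correlate
rhs = dot(code, reverse_synthesis(target, a))  # correlate with the reverse-synthesized target
```

For this filter model the two inner products agree, which shows why a single reverse synthesis of the target lets the comparator evaluate every candidate code vector with a plain inner product instead of one synthesis per candidate.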
An excitation vector generator 1337 in use is the same
as, for example, the excitation vector generator 70 which
has been described in the section of the third mode. The
excitation vector generator 70 generates a random code
vector as the first seed is read from the seed storage
section 71 and input to the non-linear digital filter 72.
The random code vector generated by the excitation vector
generator 70 is sent to the perceptual weighted LPC
synthesis filter B 1329 and the comparator B 1330. Then, as
the second seed is read from the seed storage section 71
and input to the non-linear digital filter 72, a random
code vector is generated and output to the filter B 1329
and the comparator B 1330.
To pre-select random code vectors generated based on
the first seed to Nstb (= 6) candidates from Nst (= 64)
candidates, the comparator B 1330 acquires reference values
cr(i1) (0 ≤ i1 ≤ Nst-1) for pre-selection of first random
code vectors from an equation 20.
$$\mathrm{cr}(i1) = \sum_{j=0}^{Ns-1} \mathrm{Pstb1}(i1,j) \times \mathrm{rh}(j) - \frac{\mathrm{pr}}{\mathrm{powp}} \sum_{j=0}^{Ns-1} \mathrm{Pstb1}(i1,j) \times \mathrm{ph}(j) \tag{20}$$
where cr(i1): reference values for pre-selection of first
random code vectors
Ns: subframe length (= 52)
rh(j): time reverse synthesized vector of a target
vector (r(j))
powp: power of an adaptive/fixed vector (SYNaf(k))
pr: inner product of SYNaf(k) and r(k)
Pstb1(i1,j): first random code vector
ph(j): time reverse synthesized vector of SYNaf(k)
i1: number of the first random code vector (0 ≤ i1 ≤
Nst-1)
j: element number of a vector.
By comparing the obtained values cr(i1), the Nstb
(= 6) indices that give the largest values are selected
and saved as indices of first random code vectors after
pre-selection s1psel(j1) (0 ≤ j1 ≤ Nstb-1), together with
the corresponding first random code vectors after
pre-selection Pstb1(s1psel(j1),k) (0 ≤ j1 ≤ Nstb-1,
0 ≤ k ≤ Ns-1). Then, the same process as done for the first
random code vectors is performed for second random code
vectors, and the results are respectively saved as indices
of second random code vectors after pre-selection
s2psel(j2) (0 ≤ j2 ≤ Nstb-1) and second random code vectors
after pre-selection Pstb2(s2psel(j2),k) (0 ≤ j2 ≤ Nstb-1,
0 ≤ k ≤ Ns-1).
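A minimal Python sketch of the pre-selection of random code vectors follows. It assumes that equation 20 is the inner product with rh(j) minus pr/powp times the inner product with ph(j); the function names and the toy numbers are hypothetical.

```python
def cr_value(pstb, rh, ph, pr, powp):
    """Reference value of equation 20 for one random code vector."""
    dot_rh = sum(p * r for p, r in zip(pstb, rh))
    dot_ph = sum(p * q for p, q in zip(pstb, ph))
    return dot_rh - (pr / powp) * dot_ph


def preselect_random(candidates, rh, ph, pr, powp, n_keep):
    """Indices of the n_keep candidates with the largest cr value."""
    scores = [cr_value(c, rh, ph, pr, powp) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:n_keep]


# Toy data (hypothetical): 3 candidates of length 2 instead of Nst = 64 of length 52.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s1psel = preselect_random(candidates, rh=[1.0, 3.0], ph=[1.0, 0.0],
                          pr=1.0, powp=2.0, n_keep=2)
```

The second term removes the part of each candidate's contribution that the already selected adaptive/fixed code vector can explain, so the ranking favors candidates that add new information.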
The perceptual weighted LPC synthesis filter B 1329
performs perceptual weighted LPC synthesis on the first
random code vectors after pre-selection Pstb1(s1psel(j1),k)
to generate synthesized first random code vectors
SYNstb1(s1psel(j1),k) which are in turn sent to the
comparator B 1330. Then, perceptual weighted LPC synthesis
is performed on the second random code vectors after pre-
selection Pstb2(s2psel(j2),k) to generate synthesized
second random code vectors SYNstb2(s2psel(j2),k) which are
in turn sent to the comparator B 1330.
To implement final-selection on the first random code
vectors after pre-selection Pstb1(s1psel(j1),k) and the
second random code vectors after pre-selection
Pstb2(s2psel(j2),k), pre-selected by the comparator B 1330
itself, the comparator B 1330 carries out the computation
of an equation 21 on the synthesized first random code
vectors SYNstb1(s1psel(j1),k) computed in the perceptual
weighted LPC synthesis filter B 1329.
$$\mathrm{SYNOstb1}(\mathrm{s1psel}(j1),k) = \mathrm{SYNstb1}(\mathrm{s1psel}(j1),k) - \frac{\mathrm{SYNaf}(k)}{\mathrm{powp}} \sum_{m=0}^{Ns-1} \mathrm{Pstb1}(\mathrm{s1psel}(j1),m) \times \mathrm{ph}(m) \tag{21}$$
where SYNOstb1(s1psel(j1),k): orthogonally synthesized
first random code vector
SYNstb1(s1psel(j1),k): synthesized first random code
vector
Pstb1(s1psel(j1),k): first random code vector after
pre-selection
SYNaf(k): synthesized adaptive/fixed code vector
powp: power of the synthesized adaptive/fixed code vector
(SYNaf(k))
Ns: subframe length (= 52)
ph(m): time reverse synthesized vector of SYNaf(k)
j1: number of first random code vector after pre-
selection
k: element number of a vector (0 ≤ k ≤ Ns-1)
m: summation index (0 ≤ m ≤ Ns-1).
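The orthogonalization of equation 21 can be sketched as below, reading the equation as subtracting from the synthesized random code vector its projection onto SYNaf(k). The function name is hypothetical, and the toy example assumes an identity synthesis filter, so that the synthesized vector equals the code vector and ph equals SYNaf.

```python
def orthogonalize(syn_st, syn_af, p_st, ph, powp):
    """Subtract from syn_st its component along syn_af (equation 21)."""
    # Inner product of the unsynthesized code vector with ph; under the
    # assumption stated above it equals the inner product of syn_st and syn_af.
    corr = sum(p * q for p, q in zip(p_st, ph))
    scale = corr / powp
    return [s - scale * a for s, a in zip(syn_st, syn_af)]


# Toy data (hypothetical) with an identity synthesis filter.
syn_af = [1.0, 1.0]
powp = sum(x * x for x in syn_af)      # equation 18
p_st = [3.0, 1.0]
syno = orthogonalize(p_st, syn_af, p_st, syn_af, powp)
residual_corr = sum(a * b for a, b in zip(syno, syn_af))
```

The result has zero inner product with SYNaf(k), so the random code vector contribution can be optimized without disturbing the adaptive/fixed contribution.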
Orthogonally synthesized first random code vectors
SYNOstb1(s1psel(j1),k) are obtained, and a similar
computation is performed on the synthesized second random
code vectors SYNstb2(s2psel(j2),k) to acquire orthogonally
synthesized second random code vectors
SYNOstb2(s2psel(j2),k), and reference values after final-
selection of a first random code vector scr1 and reference
values after final-selection of a second random code vector
scr2 are computed in a closed loop respectively using
equations 22 and 23 for all the combinations (36
combinations) of (s1psel(j1), s2psel(j2)).
$$\mathrm{scr1} = \frac{\mathrm{cscr1}^2}{\sum_{k=0}^{Ns-1} \left[ \mathrm{SYNOstb1}(\mathrm{s1psel}(j1),k) + \mathrm{SYNOstb2}(\mathrm{s2psel}(j2),k) \right]^2} \tag{22}$$
where scr1: reference value after final-selection of a
first random code vector
cscr1: constant previously computed from an equation
24
SYNOstb1(s1psel(j1),k): orthogonally synthesized first
random code vectors
SYNOstb2(s2psel(j2),k): orthogonally synthesized
second random code vectors
r(k): target vector
s1psel(j1): index of first random code vector after
pre-selection
s2psel(j2): index of second random code vector after
pre-selection
Ns: subframe length (= 52)
k: element number of a vector.
$$\mathrm{scr2} = \frac{\mathrm{cscr2}^2}{\sum_{k=0}^{Ns-1} \left[ \mathrm{SYNOstb1}(\mathrm{s1psel}(j1),k) - \mathrm{SYNOstb2}(\mathrm{s2psel}(j2),k) \right]^2} \tag{23}$$
where scr2: reference value after final-selection of a
second random code vector
cscr2: constant previously computed from an equation
25
SYNOstb1(s1psel(j1),k): orthogonally synthesized first
random code vectors
SYNOstb2(s2psel(j2),k): orthogonally synthesized
second random code vectors
r(k): target vector
s1psel(j1): index of first random code vector after
pre-selection
s2psel(j2): index of second random code vector after
pre-selection
Ns: subframe length (= 52)
k: element number of a vector.
Note that cscr1 in the equation 22 and cscr2 in the
equation 23 are constants which have been calculated
previously using the equations 24 and 25, respectively.
$$\mathrm{cscr1} = \sum_{k=0}^{Ns-1} \mathrm{SYNOstb1}(\mathrm{s1psel}(j1),k) \times r(k) + \sum_{k=0}^{Ns-1} \mathrm{SYNOstb2}(\mathrm{s2psel}(j2),k) \times r(k) \tag{24}$$
where cscr1: constant for the equation 22
SYNOstb1(s1psel(j1),k): orthogonally synthesized first
random code vectors
SYNOstb2(s2psel(j2),k): orthogonally synthesized
second random code vectors
r(k): target vector
s1psel(j1): index of first random code vector after
pre-selection
s2psel(j2): index of second random code vector after
pre-selection
Ns: subframe length (= 52)
k: element number of a vector.
$$\mathrm{cscr2} = \sum_{k=0}^{Ns-1} \mathrm{SYNOstb1}(\mathrm{s1psel}(j1),k) \times r(k) - \sum_{k=0}^{Ns-1} \mathrm{SYNOstb2}(\mathrm{s2psel}(j2),k) \times r(k) \tag{25}$$
where cscr2: constant for the equation 23
SYNOstb1(s1psel(j1),k): orthogonally synthesized first
random code vectors
SYNOstb2(s2psel(j2),k): orthogonally synthesized
second random code vectors
r(k): target vector
s1psel(j1): index of first random code vector after
pre-selection
s2psel(j2): index of second random code vector after
pre-selection
Ns: subframe length (= 52)
k: element number of a vector.
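The closed-loop search over all pairs of pre-selected vectors can be sketched in Python as follows. The sketch assumes equations 22 and 23 are the squared constants of equations 24 and 25 divided by the energy of the sum and of the difference of the two orthogonally synthesized vectors. The function names, the zero-energy guard and the toy data (2 x 2 pairs instead of 6 x 6) are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ratio(num, den):
    # Assumption: a zero-energy combination scores 0 instead of dividing by zero.
    return num * num / den if den > 0.0 else 0.0


def search_pairs(syno1, syno2, r):
    """Evaluate every (j1, j2) pair and return the best one as a dict."""
    best = None
    for j1, v1 in enumerate(syno1):
        c1 = dot(v1, r)
        for j2, v2 in enumerate(syno2):
            c2 = dot(v2, r)
            cscr1, cscr2 = c1 + c2, c1 - c2                       # eqs. 24, 25
            e_sum = sum((a + b) ** 2 for a, b in zip(v1, v2))
            e_diff = sum((a - b) ** 2 for a, b in zip(v1, v2))
            scr1, scr2 = ratio(cscr1, e_sum), ratio(cscr2, e_diff)  # eqs. 22, 23
            scr = max(scr1, scr2)
            if best is None or scr > best["scr"]:
                best = {"scr": scr, "j1": j1, "j2": j2,
                        "scr1": scr1, "scr2": scr2,
                        "cscr1": cscr1, "cscr2": cscr2}
    return best


# Toy data (hypothetical): the target equals syno1[0] - syno2[0].
syno1 = [[1.0, 0.0], [0.0, 1.0]]
syno2 = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, -1.0]
best = search_pairs(syno1, syno2, r)
```

Here the difference combination of the first pair matches the target exactly, so scr2 for that pair is the overall maximum.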
The comparator B 1330 stores the maximum value of
scr1 in MAXs1cr, stores the maximum value of scr2 in
MAXs2cr, sets MAXs1cr or MAXs2cr, whichever is larger, as
scr, and sends the value of s1psel(j1), which had been
referred to when scr was obtained, to the parameter coding
section 1331 as an index of a first random code vector
after final-selection SSEL1. The random code vector that
corresponds to SSEL1 is saved as a first random code vector
after final-selection Pstb1(SSEL1,k), and is sent to the
parameter coding section 1331 to acquire a synthesized
first random code vector after final-selection
SYNstb1(SSEL1,k) (0 ≤ k ≤ Ns-1) corresponding to
Pstb1(SSEL1,k).
Likewise, the value of s2psel(j2), which had been
referred to when scr was obtained, is sent to the parameter
coding section 1331 as an index of a second random code
vector after final-selection SSEL2. The random code vector
that corresponds to SSEL2 is saved as a second random code
vector after final-selection Pstb2(SSEL2,k), and is sent to
the parameter coding section 1331 to acquire a synthesized
second random code vector after final-selection
SYNstb2(SSEL2,k) (0 ≤ k ≤ Ns-1) corresponding to
Pstb2(SSEL2,k).
The comparator B 1330 further acquires codes S1 and S2
by which Pstb1(SSEL1,k) and Pstb2(SSEL2,k) are respectively
multiplied, from an equation 26, and sends polarity
information Is1s2 of the obtained S1 and S2 to the
parameter coding section 1331 as a gain polarity index
Is1s2 (2-bit information).
$$(S1, S2) = \begin{cases}
(+1,+1) & \mathrm{scr1} \ge \mathrm{scr2},\ \mathrm{cscr1} \ge 0 \\
(-1,-1) & \mathrm{scr1} \ge \mathrm{scr2},\ \mathrm{cscr1} < 0 \\
(+1,-1) & \mathrm{scr1} < \mathrm{scr2},\ \mathrm{cscr2} \ge 0 \\
(-1,+1) & \mathrm{scr1} < \mathrm{scr2},\ \mathrm{cscr2} < 0
\end{cases} \tag{26}$$
where S1: code of the first random code vector after final-
selection
S2: code of the second random code vector after final-
selection
scr1: output of the equation 22
scr2: output of the equation 23
cscr1: output of the equation 24
cscr2: output of the equation 25.
A random code vector ST(k) (0 ≤ k ≤ Ns-1) is generated
by an equation 27 and output to the adaptive codebook
updating section 1333, and its power POWst is acquired and
output to the parameter coding section 1331.
$$\mathrm{ST}(k) = S1 \times \mathrm{Pstb1}(\mathrm{SSEL1},k) + S2 \times \mathrm{Pstb2}(\mathrm{SSEL2},k) \tag{27}$$
where ST(k): random code vector
S1: code of the first random code vector after final-
selection
S2: code of the second random code vector after final-
selection
Pstb1(SSEL1,k): first random code vector after
final-selection
Pstb2(SSEL2,k): second random code vector after
final-selection
SSEL1: index of the first random code vector after
final-selection
SSEL2: index of the second random code vector after
final-selection
k: element number of a vector (0 ≤ k ≤ Ns-1).
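The polarity decision of equation 26 and the combination of equation 27 can be sketched together. The function names and the toy vectors are hypothetical.

```python
def polarity(scr1, scr2, cscr1, cscr2):
    """Return (S1, S2) according to the four cases of equation 26."""
    if scr1 >= scr2:
        return (1, 1) if cscr1 >= 0 else (-1, -1)
    return (1, -1) if cscr2 >= 0 else (-1, 1)


def combine(s1, s2, v1, v2):
    """Equation 27: ST(k) = S1 * Pstb1(SSEL1,k) + S2 * Pstb2(SSEL2,k)."""
    return [s1 * a + s2 * b for a, b in zip(v1, v2)]


# Toy values (hypothetical): the difference combination scored best.
s1, s2 = polarity(scr1=0.0, scr2=2.0, cscr1=0.0, cscr2=2.0)
st = combine(s1, s2, [1.0, 0.0], [0.0, 1.0])
```

Since the two signs take only four joint values, they fit into the 2-bit gain polarity index mentioned in the text.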
A synthesized random code vector SYNst(k) (0 ≤ k ≤ Ns-
1) is generated by an equation 28 and output to the
parameter coding section 1331.
$$\mathrm{SYNst}(k) = S1 \times \mathrm{SYNstb1}(\mathrm{SSEL1},k) + S2 \times \mathrm{SYNstb2}(\mathrm{SSEL2},k) \tag{28}$$
where SYNst(k): synthesized random code vector
S1: code of the first random code vector after final-
selection
S2: code of the second random code vector after final-
selection
SYNstb1(SSEL1,k): synthesized first random code vector
after final-selection
SYNstb2(SSEL2,k): synthesized second random code
vector after final-selection
k: element number of a vector (0 ≤ k ≤ Ns-1).
The parameter coding section 1331 first acquires a
residual power estimation for each subframe rs from an
equation 29 using the decoded frame power spow
which has been obtained by the frame power
quantizing/decoding section 1302 and the normalized
predictive residual power resid, which has been obtained by
the pitch pre-selector 1308.
$$\mathrm{rs} = Ns \times \mathrm{spow} \times \mathrm{resid} \tag{29}$$
where rs: residual power estimation for each subframe
Ns: subframe length (= 52)
spow: decoded frame power
resid: normalized predictive residual power.
A reference value for quantization gain selection STDg
is acquired from an equation 30 by using the acquired
residual power estimation for each subframe rs, the power
of the adaptive/fixed code vector POWaf computed in the
comparator A 1322, the power of the random code vector
POWst computed in the comparator B 1330, a gain
quantization table (CGaf[i],CGst[i]) (0 ≤ i ≤ 127) of 256
words stored in a gain quantization table storage section
1332 and the like.
Table 7: Gain quantization table
i      CGaf(i)    CGst(i)
1      0.38590    0.23477
2      0.42380    0.50453
3      0.23416    0.24761
...
126    0.35382    1.68987
127    0.10689    1.02035
128    3.09711    1.75430
$$\mathrm{STDg} = \sum_{k=0}^{Ns-1} \left( \sqrt{\frac{\mathrm{rs}}{\mathrm{POWaf}}} \cdot \mathrm{CGaf}(Ig) \times \mathrm{SYNaf}(k) + \sqrt{\frac{\mathrm{rs}}{\mathrm{POWst}}} \cdot \mathrm{CGst}(Ig) \times \mathrm{SYNst}(k) - r(k) \right)^2 \tag{30}$$
where STDg: reference value for quantization gain selection
rs: residual power estimation for each subframe
POWaf: power of the adaptive/fixed code vector
POWst: power of the random code vector
i: index of the gain quantization table (0 ≤ i ≤ 127)
CGaf(i): component on the adaptive/fixed code vector
side in the gain quantization table
CGst(i): component on the random code vector side in
the gain quantization table
SYNaf(k): synthesized adaptive/fixed code vector
SYNst(k): synthesized random code vector
r(k): target vector
Ns: subframe length (= 52)
k: element number of a vector (0 ≤ k ≤ Ns-1).
The index at which the acquired reference value for
quantization gain selection STDg becomes minimum is
selected as a gain quantization index Ig. A final gain on
the adaptive/fixed code vector side Gaf to be actually
applied to AF(k) and a final gain on the random code vector
side Gst to be actually applied to ST(k) are obtained from
an equation 31 using a gain after selection of the
adaptive/fixed code vector CGaf(Ig) and a gain after
selection of the random code vector CGst(Ig), both of which
are read from the gain quantization table based on the
selected gain quantization index Ig, and so forth, and are
sent to the adaptive codebook updating section 1333.
$$(\mathrm{Gaf}, \mathrm{Gst}) = \left( \sqrt{\frac{\mathrm{rs}}{\mathrm{POWaf}}}\,\mathrm{CGaf}(Ig),\ \sqrt{\frac{\mathrm{rs}}{\mathrm{POWst}}}\,\mathrm{CGst}(Ig) \right) \tag{31}$$
where Gaf: final gain on the adaptive/fixed code vector
side
Gst: final gain on the random code vector side
rs: residual power estimation for each subframe
POWaf: power of the adaptive/fixed code vector
POWst: power of the random code vector
CGaf(Ig): gain after selection of the adaptive/fixed
code vector
CGst(Ig): gain after selection of the random code
vector
Ig: gain quantization index.
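The gain quantization search can be sketched in Python under two assumptions that are not confirmed by the legible text: that the normalization factors in equations 30 and 31 are the square roots of rs/POWaf and rs/POWst, and that STDg is a squared error summed over the subframe. The function names and the three-entry table are hypothetical; the real table has 128 entries.

```python
import math


def residual_power(ns, spow, resid):
    """Equation 29: rs = Ns x spow x resid."""
    return ns * spow * resid


def search_gain(table, rs, pow_af, pow_st, syn_af, syn_st, r):
    """Return (Ig, Gaf, Gst) minimizing the squared error against the target r."""
    norm_af = math.sqrt(rs / pow_af)   # assumed form of the normalization
    norm_st = math.sqrt(rs / pow_st)
    best_ig, best_err = -1, float("inf")
    for ig, (cg_af, cg_st) in enumerate(table):
        err = sum((norm_af * cg_af * a + norm_st * cg_st * s - t) ** 2
                  for a, s, t in zip(syn_af, syn_st, r))
        if err < best_err:
            best_ig, best_err = ig, err
    cg_af, cg_st = table[best_ig]
    return best_ig, norm_af * cg_af, norm_st * cg_st


# Toy data (hypothetical): a 3-entry table and vectors of length 2.
table = [(0.5, 0.5), (1.0, 2.0), (2.0, 1.0)]
rs = residual_power(ns=2, spow=2.0, resid=1.0)
ig, gaf, gst = search_gain(table, rs, pow_af=4.0, pow_st=1.0,
                           syn_af=[1.0, 0.0], syn_st=[0.0, 1.0], r=[1.0, 4.0])
```

Quantizing the pair of gains jointly with one index means only Ig needs to be transmitted; the decoder can rebuild Gaf and Gst from the same table and the decoded power.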


The parameter coding section 1331 converts the index
of power Ipow, acquired by the frame power
quantizing/decoding section 1302, the LSP code Ilsp,
acquired by the LSP quantizing/decoding section 1306, the
adaptive/fixed index AFSEL, acquired by the adaptive/fixed
selector 1320, the index of the first random code vector
after final-selection SSEL1, the index of the second random
code vector after final-selection SSEL2 and the polarity
information Is1s2, acquired by the comparator B 1330, and
the gain quantization index Ig, acquired by the parameter
coding section 1331, into a speech code, which is in turn
sent to a transmitter 1334.
The adaptive codebook updating section 1333 performs a
process of an equation 32 for multiplying the
adaptive/fixed code vector AF(k), acquired by the
comparator A 1322, and the random code vector ST(k),
acquired by the comparator B 1330, respectively by the
final gain on the adaptive/fixed code vector side Gaf and
the final gain on the random code vector side Gst, acquired
by the parameter coding section 1331, and then adding the
results to thereby generate an excitation vector ex(k) (0
≤ k ≤ Ns-1), and sends the generated excitation vector ex(k)
(0 ≤ k ≤ Ns-1) to the adaptive codebook 1318.
$$\mathrm{ex}(k) = \mathrm{Gaf} \times \mathrm{AF}(k) + \mathrm{Gst} \times \mathrm{ST}(k) \tag{32}$$
where ex(k): excitation vector
AF(k): adaptive/fixed code vector
ST(k): random code vector
k: element number of a vector (0 ≤ k ≤ Ns-1).
At this time, an old excitation vector in the adaptive
codebook 1318 is discarded and is updated with a new
excitation vector ex(k) received from the adaptive codebook
updating section 1333.
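Equation 32 and the codebook update can be illustrated as below. The sketch assumes the adaptive codebook is a fixed-length buffer of past excitation samples in which the oldest samples are dropped and the new subframe is appended; this buffer layout, the function names and the toy values are assumptions and are not taken from the text.

```python
def make_excitation(gaf, af, gst, st):
    """Equation 32: ex(k) = Gaf x AF(k) + Gst x ST(k)."""
    return [gaf * a + gst * s for a, s in zip(af, st)]


def update_adaptive_codebook(buffer, ex):
    """Drop the oldest len(ex) samples and append the new excitation."""
    return buffer[len(ex):] + list(ex)


# Toy data (hypothetical): subframe length 2, codebook memory of 4 samples.
ex = make_excitation(2.0, [1.0, 0.0], 0.5, [0.0, 2.0])
codebook = update_adaptive_codebook([9.0, 8.0, 7.0, 6.0], ex)
```

The same update has to run in the decoder so that both sides keep identical adaptive codebook contents.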
(Eighth Mode)
A description will now be given of an eighth mode in
which any excitation vector generator described in first to
sixth modes is used in a speech decoder that is based on
the PSI-CELP, the standard speech coding/decoding system
for PDC digital portable telephones. This decoder makes a
pair with the above-described seventh mode.
FIG. 14 presents a functional block diagram of a
speech decoder according to the eighth mode. A parameter
decoding section 1402 obtains the speech code (the index of
power Ipow, LSP code Ilsp, adaptive/fixed index AFSEL,
index of the first random code vector after final-selection
SSEL1, index of the second random code vector after
final-selection SSEL2, gain quantization index Ig and gain
polarity index Is1s2), sent from the CELP type speech coder
illustrated in FIG. 13, via a transmitter 1401.
Next, a scalar value indicated by the index of power
Ipow is read from the power quantization table (see Table
3) stored in a power quantization table storage section
1405 and is sent as decoded frame power spow to a power
restoring section 1417, and a vector indicated by the LSP
code Ilsp is read from the LSP quantization table stored in
an LSP quantization table storage section 1404 and is sent
as a decoded LSP to an LSP interpolation section 1406. The
adaptive/fixed index AFSEL is sent to an adaptive code
vector generator 1408, a fixed code vector reading section
1411 and an adaptive/fixed selector 1412, and the index of
the first random code vector after final-selection SSEL1
and the index of the second random code vector after
final-selection SSEL2 are output to an excitation vector
generator 1414.
The vector (CGaf(Ig), CGst(Ig)) indicated by the gain
quantization index Ig is read from the gain quantization
table (see Table 7) stored in a gain quantization table
storage section 1403, the final gain on
the adaptive/fixed code vector side Gaf to be actually
applied to AF(k) and the final gain on the random code
vector side Gst to be actually applied to ST(k) are
acquired from the equation 31 as done on the coder side,
and the acquired final gain on the adaptive/fixed code
vector side Gaf and final gain on the random code vector
side Gst are output together with the gain polarity index
Is1s2 to an excitation vector generator 1413.
The LSP interpolation section 1406 obtains a decoded
interpolated LSP ωintp(n,i) (1 ≤ i ≤ Np) subframe by
subframe from the decoded LSP received from the parameter
decoding section 1402, converts the obtained ωintp(n,i) to
an LPC to acquire a decoded interpolated LPC, and sends the
decoded interpolated LPC to an LPC synthesis filter 1416.
The adaptive code vector generator 1408 convolves some
of the polyphase coefficients stored in a polyphase
coefficients storage section 1409 (see Table 5) with vectors
read from an adaptive codebook 1407, based on the
adaptive/fixed index AFSEL received from the parameter
decoding section 1402, thereby generating adaptive code
vectors to a fractional precision, and sends the adaptive
code vectors to the adaptive/fixed selector 1412. The fixed
code vector reading section 1411 reads fixed code vectors
from a fixed codebook 1410 based on the adaptive/fixed
index AFSEL received from the parameter decoding section
1402, and sends them to the adaptive/fixed selector 1412.
The adaptive/fixed selector 1412 selects either the
adaptive code vector input from the adaptive code vector
generator 1408 or the fixed code vector input from the
fixed code vector reading section 1411, as the
adaptive/fixed code vector AF(k), based on the
adaptive/fixed index AFSEL received from the parameter
decoding section 1402, and sends the selected
adaptive/fixed code vector AF(k) to the excitation vector
generator 1413. The excitation vector generator 1414
acquires the first seed and second seed from the seed
storage section 71 based on the index of the first random
code vector after final-selection SSEL1 and the second
random code vector after final-selection SSEL2 received
from the parameter decoding section 1402, and sends the
seeds to the non-linear digital filter 72 to generate the
first random code vector and the second random code vector,
respectively. Those reproduced first random code vector and
second random code vector are respectively multiplied by
the first-stage information S1 and second-stage information
S2 of the gain polarity index to generate an excitation
vector ST(k), which is sent to the excitation vector
generator 1413.
The excitation vector generator 1413 multiplies the
adaptive/fixed code vector AF(k), received from the
adaptive/fixed selector 1412, and the excitation vector
ST(k), received from the excitation vector generator 1414,
respectively by the final gain on the adaptive/fixed code
vector side Gaf and the final gain on the random code
vector side Gst, obtained by the parameter decoding section
1402, performs addition or subtraction based on the gain
polarity index Is1s2, yielding the excitation vector ex(k),
and sends the obtained excitation vector to the LPC
synthesis filter 1416 and the adaptive codebook 1407. Here,
an old excitation vector in the adaptive codebook 1407 is
updated with a new excitation vector input from the
excitation vector generator 1413.
The LPC synthesis filter 1416 performs LPC synthesis
on the excitation vector, generated by the excitation
vector generator 1413, using the synthesis filter which is
constituted by the decoded interpolated LPC received from
the LSP interpolation section 1406, and sends the filter
output to the power restoring section 1417. The power
restoring section 1417 first obtains the mean power of the
synthesized vector of the excitation vector obtained by the
LPC synthesis filter 1416, then divides the decoded frame
power spow, received from the parameter decoding section
1402, by the acquired mean power, and multiplies the
synthesized vector of the excitation vector by the division
result to generate a synthesized speech 518.
(Ninth Mode)
FIG. 15 is a block diagram of the essential portions
of a speech coder according to a ninth mode. This speech
coder is the speech coder shown in FIG. 13 to which a
quantization target LSP adding section 151, an LSP
quantizing/decoding section 152 and an LSP quantization
error comparator 153 are added, or in which parts of its
functions are modified.
The LPC analyzing section 1304 acquires an LPC by
performing linear predictive analysis on a processing frame
in the buffer 1301, converts the acquired LPC to produce a
quantization target LSP, and sends the produced
quantization target LSP to the quantization target LSP
adding section 151. The LPC analyzing section 1304 also has
a particular function of performing linear predictive
analysis on a pre-read area to acquire an LPC for the pre-
read area, converting the obtained LPC to an LSP for the
pre-read area, and sending the LSP to the quantization
target LSP adding section 151.
The quantization target LSP adding section 151
produces a plurality of quantization target LSPs in
addition to the quantization target LSPs directly obtained
by converting LPCs in a processing frame in the LPC
analyzing section 1304.
The LSP quantization table storage section 1307 stores
the quantization table which is referred to by the LSP
quantizing/decoding section 152, and the LSP
quantizing/decoding section 152 quantizes/decodes the
produced plurality of quantization target LSPs to generate
decoded LSPs.
The LSP quantization error comparator 153 compares the
produced decoded LSPs with one another to select, in a
closed loop, one decoded LSP which minimizes an allophone,
and newly uses the selected decoded LSP as a decoded LSP
for the processing frame.
FIG. 16 presents a block diagram of the quantization
target LSP adding section 151.
The quantization target LSP adding section 151
comprises a current frame LSP memory 161 for storing the
quantization target LSP of the processing frame obtained by
the LPC analyzing section 1304, a pre-read area LSP memory
162 for storing the LSP of the pre-read area obtained by
the LPC analyzing section 1304, a previous frame LSP memory
163 for storing the decoded LSP of the previous processing
frame, and a linear interpolation section 164 which
performs linear interpolation on the LSPs read from those
three memories to add a plurality of quantization target
LSPs.


CA 02356041 2001-08-27
A plurality of quantization target LSPs are
additionally produced by performing linear interpolation on
the quantization target LSP of the processing frame and the
LSP of the pre-read area, and the produced quantization
target LSPs are all sent to the LSP quantizing/decoding
section 152.
The quantization target LSP adding section 151 will
now be explained more specifically. The LPC analyzing
section 1304 performs linear predictive analysis on the
processing frame in the buffer to acquire an LPC α(i) (1 ≤
i ≤ Np) of a prediction order Np (= 10), converts the
obtained LPC to generate a quantization target LSP ω(i) (1
≤ i ≤ Np), and stores the generated quantization target
LSP ω(i) (1 ≤ i ≤ Np) in the current frame LSP memory 161
in the quantization target LSP adding section 151. Further,
the LPC analyzing section 1304 performs linear predictive
analysis on the pre-read area in the buffer to acquire an
LPC for the pre-read area, converts the obtained LPC to
generate a quantization target LSP ωf(i) (1 ≤ i ≤ Np), and
stores the generated quantization target LSP ωf(i) (1 ≤ i
≤ Np) for the pre-read area in the pre-read area LSP memory
162 in the quantization target LSP adding section 151.
Next, the linear interpolation section 164 reads the
quantization target LSP ω(i) (1 ≤ i ≤ Np) for the
processing frame from the current frame LSP memory 161, the
LSP ωf(i) (1 ≤ i ≤ Np) for the pre-read area from the
pre-read area LSP memory 162, and decoded LSP ωqp(i) (1 ≤ i
≤ Np) for the previous processing frame from the previous
frame LSP memory 163, and executes conversion shown by an
equation 33 to respectively generate first additional
quantization target LSP ω1(i) (1 ≤ i ≤ Np), second
additional quantization target LSP ω2(i) (1 ≤ i ≤ Np), and
third additional quantization target LSP ω3(i) (1 ≤ i ≤
Np).
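\[
\begin{pmatrix} \omega_1(i) \\ \omega_2(i) \\ \omega_3(i) \end{pmatrix}
=
\begin{pmatrix} 0.8 & 0.2 & 0.0 \\ 0.5 & 0.3 & 0.2 \\ 0.8 & 0.3 & 0.5 \end{pmatrix}
\begin{pmatrix} \omega_q(i) \\ \omega_{qp}(i) \\ \omega_f(i) \end{pmatrix}
\tag{33}
\]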
where ω1(i): first additional quantization target LSP
ω2(i): second additional quantization target LSP
ω3(i): third additional quantization target LSP
i: LPC order (1 ≤ i ≤ Np)
Np: LPC analysis order (= 10)
ωq(i): decoded LSP for the processing frame
ωqp(i): decoded LSP for the previous processing frame
ωf(i): LSP for the pre-read area.
The generated ω1(i), ω2(i) and ω3(i) are sent to the
LSP quantizing/decoding section 152. After performing
vector quantization/decoding of all the four quantization
target LSPs ω(i), ω1(i), ω2(i) and ω3(i), the LSP
quantizing/decoding section 152 acquires power Epow(ω) of
a quantization error for ω(i), power Epow(ω1) of a
quantization error for ω1(i), power Epow(ω2) of a
quantization error for ω2(i), and power Epow(ω3) of a
quantization error for ω3(i), carries out conversion of an
equation 34 on the obtained quantization error powers to
acquire reference values STDlsp(ω), STDlsp(ω1), STDlsp(ω2)
and STDlsp(ω3) for selection of a decoded LSP.
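\[
\begin{pmatrix} \mathrm{STDlsp}(\omega) \\ \mathrm{STDlsp}(\omega_1) \\ \mathrm{STDlsp}(\omega_2) \\ \mathrm{STDlsp}(\omega_3) \end{pmatrix}
=
\begin{pmatrix} \mathrm{Epow}(\omega) \\ \mathrm{Epow}(\omega_1) \\ \mathrm{Epow}(\omega_2) \\ \mathrm{Epow}(\omega_3) \end{pmatrix}
+
\begin{pmatrix} 0.0010 \\ 0.0005 \\ 0.0002 \\ 0.0000 \end{pmatrix}
\tag{34}
\]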
where STDlsp(ω): reference value for selection of a decoded
LSP for ω(i)
STDlsp(ω1): reference value for selection of a decoded
LSP for ω1(i)
STDlsp(ω2): reference value for selection of a decoded
LSP for ω2(i)
STDlsp(ω3): reference value for selection of a decoded
LSP for ω3(i)
Epow(ω): quantization error power for ω(i)
Epow(ω1): quantization error power for ω1(i)
Epow(ω2): quantization error power for ω2(i)
Epow(ω3): quantization error power for ω3(i).
The acquired reference values for selection of a
decoded LSP are compared with one another, and the decoded
LSP for the quantization target LSP whose reference value
is minimum is selected and output as a decoded LSP ωq(i)
(1 ≤ i ≤ Np) for the processing frame. The decoded LSP is
stored in the previous frame LSP memory 163 so that it can
be referred to at the time of performing vector
quantization of the LSP of
the next frame.
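The candidate generation and closed-loop selection described
above can be sketched as follows. This is a minimal
illustration, not the coder's actual implementation. The
function names, the toy quantizer, the sum-of-squares error
power and the use of the current-frame target ω(i) as the
first interpolation input are assumptions. The weights and
offsets are the values read from equations 33 and 34, with
the offsets assumed to be added to the error powers.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

# Values as read from equations 33 and 34; treat them as illustrative.
WEIGHTS = ((0.8, 0.2, 0.0), (0.5, 0.3, 0.2), (0.8, 0.3, 0.5))
OFFSETS = (0.0010, 0.0005, 0.0002, 0.0000)


def additional_targets(cur: Sequence[float], prev_dec: Sequence[float],
                       pre_read: Sequence[float]) -> List[Vector]:
    """Linear interpolation section 164: build the three additional targets."""
    return [[a * c + b * p + g * f for c, p, f in zip(cur, prev_dec, pre_read)]
            for a, b, g in WEIGHTS]


def error_power(target: Sequence[float], decoded: Sequence[float]) -> float:
    """Power of the quantization error (assumed here to be a sum of squares)."""
    return sum((t - d) ** 2 for t, d in zip(target, decoded))


def select_decoded_lsp(cur: Sequence[float], prev_dec: Sequence[float],
                       pre_read: Sequence[float],
                       quantize: Callable[[Vector], Vector]) -> Tuple[int, Vector]:
    """Quantize all four targets and keep the one with the smallest STDlsp."""
    targets = [list(cur)] + additional_targets(cur, prev_dec, pre_read)
    best_ref, best_idx, best_dec = float("inf"), -1, []
    for idx, (target, offset) in enumerate(zip(targets, OFFSETS)):
        decoded = quantize(target)  # hypothetical quantize/decode step
        ref = error_power(target, decoded) + offset
        if ref < best_ref:
            best_ref, best_idx, best_dec = ref, idx, decoded
    return best_idx, best_dec


def toy_quantizer(v: Vector) -> Vector:
    """Stand-in for section 152: rounds each LSP to a 0.05 grid."""
    return [round(x / 0.05) * 0.05 for x in v]


if __name__ == "__main__":
    idx, dec = select_decoded_lsp([0.31, 0.62], [0.28, 0.60],
                                  [0.35, 0.66], toy_quantizer)
    print("selected target:", idx, "decoded LSP:", dec)
```

The decoded vector returned here would then be stored as the
previous-frame decoded LSP for the next call.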
According to this mode, by effectively using the high
interpolation characteristic of an LSP (which does not
cause an allophone even when synthesis is implemented by using
interpolated LSPs), vector quantization of LSPs can be so
conducted as not to produce an allophone even for an area
like the top of a word where the spectrum varies
significantly. It is possible to reduce an allophone in a
synthesized speech which may occur when the quantization
characteristic of an LSP becomes insufficient.
FIG. 17 presents a block diagram of the LSP
quantizing/decoding section 152 according to this mode. The
LSP quantizing/decoding section 152 has a gain information
storage section 171, an adaptive gain selector 172, a gain
multiplier 173, an LSP quantizing section 174 and an LSP
decoding section 175.
The gain information storage section 171 stores a
plurality of gain candidates to be referred to at the time
the adaptive gain selector 172 selects the adaptive gain.
The gain multiplier 173 multiplies a code vector, read from
the LSP quantization table storage section 1307, by the
adaptive gain selected by the adaptive gain selector 172.
The LSP quantizing section 174 performs vector quantization
of a quantization target LSP using the code vector
multiplied by the adaptive gain. The LSP decoding section
175 has a function of decoding a vector-quantized LSP to
generate a decoded LSP and outputting it, and a function of
acquiring an LSP quantization error, which is a difference
between the quantization target LSP and the decoded LSP,
and sending it to the adaptive gain selector 172. The
adaptive gain selector 172 acquires the adaptive gain by
which a code vector is multiplied at the time of
vector-quantizing the quantization target LSP of the
processing frame. It adjusts the adaptive gain adaptively
based on gain generation information stored in the gain
information storage section 171, using as references the
level of the adaptive gain by which a code vector was
multiplied at the time the quantization target LSP of the
previous processing frame was vector-quantized and the LSP
quantization error for the previous frame, and sends the
obtained adaptive gain to the gain multiplier 173.
The LSP quantizing/decoding section 152
vector-quantizes and decodes a quantization target LSP
while adaptively adjusting the adaptive gain by which a
code vector is multiplied in the above manner.
The LSP quantizing/decoding section 152 will now be
discussed more specifically. The gain information storage
section 171 stores four gain candidates (0.9, 1.0, 1.1
and 1.2) to which the adaptive gain selector 172 refers.
The adaptive gain selector 172 acquires a reference value
for selecting an adaptive gain, Slsp, from an equation 35,
which divides the power ERpow of the quantization error
generated at the time of quantizing the quantization target
LSP of the previous frame by the square of the adaptive
gain Gqlsp selected at the time of vector-quantizing the
quantization target LSP of the previous processing frame.
\[
\mathrm{Slsp} = \frac{\mathrm{ERpow}}{\mathrm{Gqlsp}^{2}} \tag{35}
\]
where Slsp: reference value for selecting an adaptive gain
ERpow: quantization error power generated when
quantizing the LSP of the previous frame
Gqlsp: adaptive gain selected when vector-quantizing
the LSP of the previous frame.
One gain is selected from the four gain candidates
(0.9, 1.0, 1.1 and 1.2), read from the gain information
storage section 171, according to an equation 36 using the
acquired reference value Slsp for selecting the adaptive
gain. Then, the value of the selected adaptive gain Gqlsp
is sent to the gain multiplier 173, and information (2-bit
information) for specifying the type of the selected
adaptive gain from the four types is sent to the parameter
coding section.
\[
\mathrm{Glsp} =
\begin{cases}
1.2 & \mathrm{Slsp} > 0.0025 \\
1.1 & \mathrm{Slsp} > 0.0015 \\
1.0 & \mathrm{Slsp} > 0.0008 \\
0.9 & \mathrm{Slsp} \le 0.0008
\end{cases}
\tag{36}
\]
where Glsp: adaptive gain by which a code vector for LSP
quantization is multiplied
Slsp: reference value for selecting an adaptive gain.
The selected adaptive gain Glsp and the error which
has been produced in quantization are saved in the variables
Gqlsp and ERpow until the quantization target LSP of the
next frame is subjected to vector quantization.
The gain multiplier 173 multiplies a code vector, read
from the LSP quantization table storage section 1307, by
the adaptive gain selected by the adaptive gain selector
172, and sends the result to the LSP quantizing section 174.
The LSP quantizing section 174 performs vector quantization
on the quantization target LSP by using the code vector
multiplied by the adaptive gain, and sends its index to the
parameter coding section. The LSP decoding section 175
decodes the LSP, quantized by the LSP quantizing section
174, acquiring a decoded LSP, outputs this decoded LSP,
subtracts the obtained decoded LSP from the quantization
target LSP to obtain an LSP quantization error, computes
the power ERpow of the obtained LSP quantization error, and
sends the power to the adaptive gain selector 172.
This mode can suppress an allophone in a synthesized
speech which may be produced when the quantization
characteristic of an LSP becomes insufficient.
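The adaptive gain selection of equations 35 and 36 can be
sketched as follows. This is a minimal illustration under
stated assumptions: the class and function names, the
initial state (gain 1.0, error power 0.0) and the mapping of
the 2-bit information to the candidate order (0.9, 1.0, 1.1,
1.2) are hypothetical and are not specified in the text. The
cases of equation 36 are evaluated from top to bottom.

```python
from typing import Tuple

# Gain candidates held in the gain information storage section 171.
GAIN_CANDIDATES = (0.9, 1.0, 1.1, 1.2)


def select_adaptive_gain(er_pow_prev: float, gq_prev: float) -> float:
    """Apply equation 35, then the thresholds of equation 36 in order."""
    s_lsp = er_pow_prev / (gq_prev ** 2)  # equation 35
    if s_lsp > 0.0025:
        return 1.2
    if s_lsp > 0.0015:
        return 1.1
    if s_lsp > 0.0008:
        return 1.0
    return 0.9


class AdaptiveGainSelector:
    """Keeps Gqlsp and ERpow between frames (initial values are assumptions)."""

    def __init__(self, initial_gain: float = 1.0,
                 initial_error_power: float = 0.0) -> None:
        self.gq_lsp = initial_gain
        self.er_pow = initial_error_power

    def next_gain(self) -> Tuple[float, int]:
        """Return the gain for the current frame and its 2-bit index."""
        gain = select_adaptive_gain(self.er_pow, self.gq_lsp)
        return gain, GAIN_CANDIDATES.index(gain)

    def update(self, gain: float, error_power: float) -> None:
        """Save the gain used and the error power produced in this frame."""
        self.gq_lsp = gain
        self.er_pow = error_power


if __name__ == "__main__":
    selector = AdaptiveGainSelector()
    gain, index = selector.next_gain()
    selector.update(gain, 0.0030)  # error power reported by section 175
    print(gain, index, selector.next_gain())
```

A large error in the previous frame raises the gain for the
current frame, and a small error lowers it.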
(Tenth Mode)
FIG. 18 presents the structural blocks of an
excitation vector generator according to this mode. This
excitation vector generator has a fixed waveform storage
section 181 for storing three fixed waveforms (v1 (length:
L1), v2 (length: L2) and v3 (length: L3)) of channels CH1,
CH2 and CH3, a fixed waveform arranging section 182 for
arranging the fixed waveforms (v1, v2, v3), read from the
fixed waveform storage section 181, respectively at
positions P1, P2 and P3, and an adding section 183 for
adding the fixed waveforms arranged by the fixed waveform
arranging section 182, generating an excitation vector.
The operation of the thus constituted excitation
vector generator will be discussed.
Three fixed waveforms v1, v2 and v3 are stored in
advance in the fixed waveform storage section 181. The
fixed waveform arranging section 182 arranges (shifts) the
fixed waveform v1, read from the fixed waveform storage
section 181, at the position P1 selected from start
position candidates for CH1, based on start position
candidate information for fixed waveforms it has as shown
in Table 8, and likewise arranges the fixed waveforms v2
and v3 at the respective positions P2 and P3 selected from
start position candidates for CH2 and CH3.
Table 8

Channel   Sign   Start position candidate information
number           for fixed waveform

CH1       ±1     P1:  0, 10, 20, 30, ..., 60, 70

CH2       ±1     P2:  2, 12, 22, 32, ..., 62, 72,
                      6, 16, 26, 36, ..., 66, 76

CH3       ±1     P3:  4, 14, 24, 34, ..., 64, 74,
                      8, 18, 28, 38, ..., 68, 78



The adding section 183 adds the fixed waveforms,
arranged by the fixed waveform arranging section 182, to
generate an excitation vector.
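The arranging and adding steps can be sketched as follows.
This is a minimal illustration under stated assumptions: the
function names are hypothetical, the subframe length of 80
samples is inferred from the largest candidate position (78)
in Table 8, and samples of a waveform that would fall beyond
the end of the subframe are simply dropped, which the text
does not specify.

```python
from typing import List, Sequence

SUBFRAME_LEN = 80  # assumption: inferred from the positions in Table 8


def arrange(waveform: Sequence[float], position: int, sign: int,
            length: int = SUBFRAME_LEN) -> List[float]:
    """Fixed waveform arranging section: shift a waveform to `position`."""
    out = [0.0] * length
    for k, sample in enumerate(waveform):
        n = position + k
        if n < length:  # assumption: truncate at the subframe boundary
            out[n] = sign * sample
    return out


def excitation_vector(waveforms: Sequence[Sequence[float]],
                      positions: Sequence[int], signs: Sequence[int],
                      length: int = SUBFRAME_LEN) -> List[float]:
    """Adding section: sum the arranged waveforms of all channels."""
    total = [0.0] * length
    for waveform, position, sign in zip(waveforms, positions, signs):
        for n, value in enumerate(arrange(waveform, position, sign, length)):
            total[n] += value
    return total


if __name__ == "__main__":
    v1, v2, v3 = [1.0, 0.5], [2.0], [1.0, 1.0, 1.0]
    c = excitation_vector([v1, v2, v3], positions=(0, 2, 78), signs=(1, -1, 1))
    print(c[:4], c[76:])
```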


It is to be noted that code numbers corresponding, one
to one, to combination information of selectable start
position candidates of the individual fixed waveforms
(information representing which positions were selected as
P1, P2 and P3, respectively) should be assigned to the
start position candidate information of the fixed waveforms
the fixed waveform arranging section 182 has.
According to the excitation vector generator with the
above structure, excitation information can be transmitted
by transmitting code numbers correlating to the start
position candidate information of fixed waveforms the fixed
waveform arranging section 182 has, and there are as many
code numbers as the product of the numbers of start
position candidates of the individual channels, so that an
excitation vector close to
an actual speech can be generated.
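One way to assign code numbers one to one to combinations of
start positions is a mixed-radix index, sketched below. The
ordering of the channels and of the candidates within each
channel is an assumption, as are the function names; the
text only requires a one-to-one correspondence. The candidate
lists follow Table 8, and the signs are left out of this
code number.

```python
from typing import Sequence, Tuple

# Start position candidates per channel, following Table 8.
CANDIDATES = (
    tuple(range(0, 80, 10)),                            # CH1: 8 candidates
    tuple(range(2, 80, 10)) + tuple(range(6, 80, 10)),  # CH2: 16 candidates
    tuple(range(4, 80, 10)) + tuple(range(8, 80, 10)),  # CH3: 16 candidates
)


def number_of_codes() -> int:
    """Product of the numbers of candidates of the individual channels."""
    total = 1
    for cands in CANDIDATES:
        total *= len(cands)
    return total


def encode(positions: Sequence[int]) -> int:
    """Map (P1, P2, P3) to a code number (hypothetical ordering)."""
    code = 0
    for cands, position in zip(CANDIDATES, positions):
        code = code * len(cands) + cands.index(position)
    return code


def decode(code: int) -> Tuple[int, ...]:
    """Recover (P1, P2, P3) from a code number produced by encode()."""
    positions = []
    for cands in reversed(CANDIDATES):
        code, index = divmod(code, len(cands))
        positions.append(cands[index])
    return tuple(reversed(positions))


if __name__ == "__main__":
    print(number_of_codes(), encode((20, 36, 4)), decode(encode((20, 36, 4))))
```

With the candidates of Table 8 this gives 8 x 16 x 16 = 2048
code numbers, so the positions fit in 11 bits.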
Since excitation information can be transmitted by
transmitting code numbers, this excitation vector generator
can be used as a random codebook in a speech coder/decoder.
While the description of this mode has been given with
reference to a case of using three fixed waveforms as shown
in FIG. 18, similar functions and advantages can be
provided if the number of fixed waveforms (which coincides
with the number of channels in FIG. 18 and Table 8) is
changed to other values.
Although the fixed waveform arranging section 182 in
this mode has been described as having the start position
candidate information of fixed waveforms given in Table 8,
similar functions and advantages can be provided for other
start position candidate information of fixed waveforms
than those in Table 8.
(Eleventh Mode)
FIG. 19A is a structural block diagram of a CELP type
speech coder according to this mode, and FIG. 19B is a
structural block diagram of a CELP type speech decoder
which is paired with the CELP type speech coder.
The CELP type speech coder according to this mode has
an excitation vector generator which comprises a fixed
waveform storage section 181A, a fixed waveform arranging
section 182A and an adding section 183A. The fixed waveform
storage section 181A stores a plurality of fixed waveforms.
The fixed waveform arranging section 182A arranges (shifts)
fixed waveforms, read from the fixed waveform storage
section 181A, respectively at the selected positions, based
on start position candidate information for fixed waveforms
it has. The adding section 183A adds the fixed waveforms,
arranged by the fixed waveform arranging section 182A, to
generate an excitation vector c.
This CELP type speech coder has a time reversing
section 191 for time-reversing a random codebook searching
target x to be input, a synthesis filter 192 for
synthesizing the output of the time reversing section 191,
a time reversing section 193 for time-reversing the output
of the synthesis filter 192 again to yield a time-reversed
synthesized target x', a synthesis filter 194 for
synthesizing the excitation vector c multiplied by a random
code vector gain gc, yielding a synthesized excitation
vector s, a distortion calculator 205 for receiving x', c
and s and computing distortion, and a transmitter 196.
According to this mode, the fixed waveform storage
section 181A, the fixed waveform arranging section 182A and
the adding section 183A correspond to the fixed waveform
storage section 181, the fixed waveform arranging section
182 and the adding section 183 shown in FIG. 18, the start
position candidates of fixed waveforms in the individual
channels correspond to those in Table 8, and channel
numbers, fixed waveform numbers and symbols indicating the
lengths and positions in use are those shown in FIG. 18 and
Table 8.
The CELP type speech decoder in FIG. 19B comprises a
fixed waveform storage section 181B for storing a plurality
of fixed waveforms, a fixed waveform arranging section 182B
for arranging (shifting) fixed waveforms, read from the
fixed waveform storage section 181B, respectively at the
selected positions, based on start position candidate
information for fixed waveforms it has, an adding section
183B for adding the fixed waveforms, arranged by the fixed
waveform arranging section 182B, to yield an excitation
vector c, a gain multiplier 197 for multiplying the
excitation vector c by a random code vector gain gc, and a
synthesis filter 198 for synthesizing the gc-multiplied
excitation vector c to yield a synthesized excitation
vector s.


The fixed waveform storage section 181B and the fixed
waveform arranging section 182B in the speech decoder have
the same structures as the fixed waveform storage section
181A and the fixed waveform arranging section 182A in the
speech coder, and the fixed waveforms stored in the fixed
waveform storage sections 181A and 181B are obtained by
cost-function based learning in which the coding distortion
computation of the equation 3, using a random codebook
searching target, serves as the cost function, so that they
have such characteristics as to statistically minimize
that cost function.
The operation of the thus constituted speech coder
will be discussed.
The random codebook searching target x is time-
reversed by the time reversing section 191, then
synthesized by the synthesis filter 192 and then time-
reversed again by the time reversing section 193, and the
result is sent as a time-reversed synthesized target x' to
the distortion calculator 205.
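The reverse, synthesize, reverse sequence can be checked
numerically with the sketch below. It assumes that the
synthesis filter is applied as a zero-state convolution with
its impulse response h, truncated to the subframe, so that
synthesis is multiplication by a lower triangular
convolution matrix H. Under that assumption the
time-reversed synthesized target x' satisfies x'^t = x^t H,
so its inner product with an excitation vector c equals the
correlation between x and the synthesized vector Hc. The
function names are hypothetical.

```python
from typing import List, Sequence


def synthesize(v: Sequence[float], h: Sequence[float]) -> List[float]:
    """Zero-state synthesis: convolve v with h, truncated to len(v)."""
    return [sum(h[n - k] * v[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(v))]


def time_reversed_target(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Sections 191, 192 and 193: reverse, synthesize, reverse again."""
    reversed_x = list(x)[::-1]
    return synthesize(reversed_x, h)[::-1]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [1.0, 0.5, 0.25]
    c = [0.0, 1.0, 0.0, -1.0]
    direct = dot(x, synthesize(c, h))            # x^t (H c)
    backward = dot(time_reversed_target(x, h), c)  # (x^t H) c
    print(direct, backward)
```

Because x' is computed once per subframe, the correlation
term for each candidate excitation vector needs only an inner
product rather than a fresh synthesis.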
The fixed waveform arranging section 182A arranges
(shifts) the fixed waveform v1, read from the fixed
waveform storage section 181A, at the position P1 selected
from start position candidates for CH1, based on start
position candidate information for fixed waveforms it has
as shown in Table 8, and likewise arranges the fixed
waveforms v2 and v3 at the respective positions P2 and P3
selected from start position candidates for CH2 and CH3.
The arranged fixed waveforms are sent to the adding section
183A and added to become an excitation vector c, which is
input to the synthesis filter 194. The synthesis filter 194
synthesizes the excitation vector c to produce a
synthesized excitation vector s and sends it to the
distortion calculator 205.
The distortion calculator 205 receives the time-
reversed synthesized target x', the excitation vector c and
the synthesized excitation vector s and computes coding
distortion in the equation 4.
The distortion calculator 205 sends a signal to the
fixed waveform arranging section 182A after computing the
distortion. The process from the selection of start
position candidates corresponding to the three channels by
the fixed waveform arranging section 182A to the distortion
computation by the distortion calculator 205 is repeated
for every combination of the start position candidates
selectable by the fixed waveform arranging section 182A.
Thereafter, the combination of the start position
candidates that minimizes the coding distortion is selected,
and the code number which corresponds, one to one, to that
combination of the start position candidates and the then
optimal random code vector gain gc are transmitted as codes
of the random codebook to the transmitter 196.
The fixed waveform arranging section 182B selects the
positions of the fixed waveforms in the individual channels
from start position candidate information for fixed
waveforms it has, based on information sent from the
transmitter 196, arranges (shifts) the fixed waveform v1,
read from the fixed waveform storage section 181B, at the
position P1 selected from start position candidates for CH1,
and likewise arranges the fixed waveforms v2 and v3 at the
respective positions P2 and P3 selected from start position
candidates for CH2 and CH3. The arranged fixed waveforms
are sent to the adding section 183B and added to become an
excitation vector c. This excitation vector c is multiplied
by the random code vector gain gc selected based on the
information from the transmitter 196, and the result is
sent to the synthesis filter 198. The synthesis filter 198
synthesizes the gc-multiplied excitation vector c to yield
a synthesized excitation vector s and sends it out.
According to the speech coder/decoder with the above
structures, as an excitation vector is generated by the
excitation vector generator which comprises the fixed
waveform storage section, fixed waveform arranging section
and the adding section, a synthesized excitation vector
obtained by synthesizing this excitation vector in the
synthesis filter has such a characteristic statistically
close to that of an actual target as to be able to yield a
high-quality synthesized speech, in addition to the
advantages of the tenth mode.
Although the foregoing description of this mode has
been given with reference to a case where fixed waveforms
obtained by learning are stored in the fixed waveform
storage sections 181A and 181B, high-quality synthesized
speeches can also be obtained even when fixed waveforms
prepared based on the result of statistical analysis of the
random codebook searching target x are used or when
knowledge-based fixed waveforms are used.
While the description of this mode has been given with
reference to a case of using three fixed waveforms, similar
functions and advantages can be provided if the number of
fixed waveforms is changed to other values.
Although the fixed waveform arranging section in this
mode has been described as having the start position
candidate information of fixed waveforms given in Table 8,
similar functions and advantages can be provided for other
start position candidate information of fixed waveforms
than those in Table 8.
(Twelfth Mode)
FIG. 20 presents a structural block diagram of a CELP
type speech coder according to this mode.
This CELP type speech coder includes a fixed waveform
storage section 200 for storing a plurality of fixed
waveforms (three in this mode: CH1: w1, CH2: w2 and CH3: w3),
and a fixed waveform arranging section 201 which has start
position candidate information of fixed waveforms for
generating start positions of the fixed waveforms, stored
in the fixed waveform storage section 200, according to
algebraic rules. This CELP type speech coder further has
an impulse response calculator 202 for each
waveform, an impulse generator 203, a correlation matrix
calculator 204, a time reversing section 191, a synthesis
filter 192' for each waveform, a time reversing section 193
and a distortion calculator 205.
The impulse response calculator 202 has a function of
convoluting three fixed waveforms from the fixed waveform
storage section 200 and the impulse response h (length L =
subframe length) of the synthesis filter to compute three
kinds of impulse responses for the individual fixed
waveforms (CH1: h1, CH2: h2 and CH3: h3, length L = subframe
length).
The synthesis filter 192' has a function of
convoluting the output of the time reversing section 191,
which is the result of time-reversing the random
codebook searching target x to be input, and the impulse
responses for the individual waveforms, h1, h2 and h3, from
the impulse response calculator 202.
The impulse generator 203 sets a pulse of an amplitude
1 (with a polarity) only at the start position
candidates P1, P2 and P3, selected by the fixed waveform
arranging section 201, generating impulses for the
individual channels (CH1: d1, CH2: d2 and CH3: d3).
The correlation matrix calculator 204 computes
autocorrelation of each of the impulse responses h1, h2 and
h3 for the individual waveforms from the impulse response
calculator 202, and correlations between h1 and h2, h1 and
h3, and h2 and h3, and develops the obtained correlation
values in a correlation matrix RR.


The distortion calculator 205 specifies the random
code vector that minimizes the coding distortion, from an
equation 37, a modification of the equation 4, by using
three time-reversed synthesis targets (x'1, x'2 and
x'3), the correlation matrix RR and the three impulses (d1,
d2 and d3) for the individual channels.
\[
\frac{\left(\sum_{i=1}^{3} x_i'^{\,t} d_i\right)^{2}}
     {\sum_{i=1}^{3}\sum_{j=1}^{3} d_i^{t} H_i^{t} H_j d_j}
\tag{37}
\]
where di: impulse (vector) for each channel
di = ±1 × δ(k − pi), k = 0 to L−1, pi: start position
candidates of the i-th channel
Hi: impulse response convolution matrix for each
waveform (Hi = HWi)
Wi: fixed waveform convolution matrix
\[
W_i =
\begin{pmatrix}
w_i(0) & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\
w_i(1) & w_i(0) & 0 & \cdots & \cdots & \cdots & 0 \\
\vdots & w_i(1) & w_i(0) & \ddots & & & \vdots \\
w_i(L_i-1) & \vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & w_i(L_i-1) & & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & & \ddots & w_i(0) & 0 \\
0 & \cdots & 0 & w_i(L_i-1) & \cdots & w_i(1) & w_i(0)
\end{pmatrix}
\]
where wi is the fixed waveform (length: Li) of the
i-th channel
x'i: vector obtained by time reverse synthesis of x
using Hi (x'i^t = x^t Hi).
1 i i
Here, transformation from the equation 4 to the
equation 37 is shown for each of the numerator term
(equation 38) and the denominator term (equation 39).
\[
\begin{aligned}
(x^{t} H c)^{2}
&= \left(x^{t} H (W_1 d_1 + W_2 d_2 + W_3 d_3)\right)^{2} \\
&= \left(x^{t} (H_1 d_1 + H_2 d_2 + H_3 d_3)\right)^{2} \\
&= \left((x^{t} H_1) d_1 + (x^{t} H_2) d_2 + (x^{t} H_3) d_3\right)^{2} \\
&= \left(x_1'^{\,t} d_1 + x_2'^{\,t} d_2 + x_3'^{\,t} d_3\right)^{2} \\
&= \left(\sum_{i=1}^{3} x_i'^{\,t} d_i\right)^{2}
\end{aligned}
\tag{38}
\]
where x: random codebook searching target (vector)
x^t: transposed vector of x
H: impulse response convolution matrix of the
synthesis filter
c: random code vector (c = W1d1 + W2d2 + W3d3)
Wi: fixed waveform convolution matrix
di: impulse (vector) for each channel
Hi: impulse response convolution matrix for each
waveform (Hi = HWi)
x'i: vector obtained by time reverse synthesis of x
using Hi (x'i^t = x^t Hi).


\[
\begin{aligned}
\|Hc\|^{2}
&= \|H(W_1 d_1 + W_2 d_2 + W_3 d_3)\|^{2} \\
&= \|H_1 d_1 + H_2 d_2 + H_3 d_3\|^{2} \\
&= (H_1 d_1 + H_2 d_2 + H_3 d_3)^{t}(H_1 d_1 + H_2 d_2 + H_3 d_3) \\
&= (d_1^{t} H_1^{t} + d_2^{t} H_2^{t} + d_3^{t} H_3^{t})(H_1 d_1 + H_2 d_2 + H_3 d_3) \\
&= \sum_{i=1}^{3}\sum_{j=1}^{3} d_i^{t} H_i^{t} H_j d_j
\end{aligned}
\tag{39}
\]
where H: impulse response convolution matrix of the
synthesis filter
c: random code vector (c = W1d1 + W2d2 + W3d3)
Wi: fixed waveform convolution matrix
di: impulse (vector) for each channel
Hi: impulse response convolution matrix for each
waveform (Hi = HWi)
The operation of the thus constituted CELP type speech
coder will be described.
To begin with, the impulse response calculator 202
convolutes the three stored fixed waveforms and the impulse
response h to compute three kinds of impulse responses h1,
h2 and h3 for the individual fixed waveforms, and sends
them to the synthesis filter 192' and the correlation
matrix calculator 204.
Next, the synthesis filter 192' convolutes the random
codebook searching target x, time-reversed by the time
reversing section 191, and the input three kinds of impulse
responses h1, h2 and h3 for the individual waveforms. The
time reversing section 193 time-reverses the three kinds of
output vectors from the synthesis filter 192' again to
yield three time-reversed synthesis targets x'1, x'2 and
x'3, and sends them to the distortion calculator 205.
Then, the correlation matrix calculator 204 computes
autocorrelations of each of the input three kinds of
impulse responses h1, h2 and h3 for the individual
waveforms and correlations between h1 and h2, h1 and h3,
and h2 and h3, and sends the obtained autocorrelation and
correlation values to the distortion calculator 205 after
developing them in the correlation matrix RR.
The above process having been executed as a pre-
process, the fixed waveform arranging section 201 selects
one start position candidate of a fixed waveform for each
channel, and sends the positional information to the
impulse generator 203.
The impulse generator 203 sets a pulse of an amplitude
1 (with a polarity) at each of the start position
candidates, obtained from the fixed waveform arranging
section 201, generating impulses d1, d2 and d3 for the
individual channels and sends them to the distortion
calculator 205.
Then, the distortion calculator 205 computes a
reference value for minimizing the coding distortion in the
equation 37, by using three time-reversed synthesis targets
x'1, x'2 and x'3 for the individual waveforms, the
correlation matrix RR and the three impulses d1, d2 and d3
for the individual channels.
The process from the selection of start position
candidates corresponding to the three channels by the fixed
waveform arranging section 201 to the distortion
computation by the distortion calculator 205 is repeated
for every combination of the start position candidates
selectable by the fixed waveform arranging section 201.
Then, the code number which corresponds to the combination
of the start position candidates that is optimal with
respect to the reference value for searching the coding
distortion in the equation 37 and the then optimal random
code vector gain gc are specified as the codes of the
random codebook, and are transmitted to the transmitter.
The speech decoder of this mode has a similar
structure to that of the eleventh mode in FIG. 19B, and the
fixed waveform storage section and the fixed waveform
arranging section in the speech coder have the same
structures as the fixed waveform storage section and the
fixed waveform arranging section in the speech decoder. The
fixed waveforms stored in the fixed waveform storage
section are fixed waveforms having such characteristics as
to statistically minimize the cost function in the equation
3 by the training using the coding distortion equation
(equation 3) with a random codebook searching target as a
cost function.


According to the thus constructed speech coder/decoder,
when the start position candidates of fixed waveforms in
the fixed waveform arranging section can be computed
algebraically, the numerator in the equation 37 can be
computed by adding the three terms of the time-reversed
synthesis target for each waveform, obtained in the
previous processing stage, and then obtaining the square of
the result. Further, the denominator in the equation 37 can
be computed by adding the nine terms in the correlation
matrix of the impulse responses of the individual waveforms
obtained in the previous processing stage. This can ensure
searching with about the same amount of computation as
needed in a case where the conventional algebraic
structural excitation vector (an excitation vector is
constituted by several pulses of an amplitude 1) is used
for the random codebook.
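The search with precomputed terms can be sketched as
follows. This is a minimal illustration under stated
assumptions: the function names and the toy data are
hypothetical, the impulse responses h1, h2 and h3 are taken
as already computed, and minimizing the coding distortion is
assumed, as is usual in CELP codebook searches, to be
equivalent to maximizing the ratio of the equation 37. The
elements of x'i are obtained here by inner products with the
columns of Hi, which gives the same values as the time
reverse synthesis.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def shifted(h: Sequence[float], p: int, length: int) -> List[float]:
    """Column p of Hi: the impulse response delayed by p samples, truncated."""
    return [h[n - p] if 0 <= n - p < len(h) else 0.0 for n in range(length)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def precompute(x, hs, candidates):
    """Pre-process: x'i at each candidate and the correlation matrix RR."""
    length = len(x)
    cols = [{p: shifted(h, p, length) for p in cands}
            for h, cands in zip(hs, candidates)]
    xp = [{p: dot(x, col) for p, col in ch.items()} for ch in cols]
    rr = [[{(p, q): dot(cp, cq) for p, cp in ci.items() for q, cq in cj.items()}
           for cj in cols] for ci in cols]
    return xp, rr


def search(x, hs, candidates) -> Tuple[float, tuple, tuple]:
    """Evaluate the ratio of equation 37 for every position and sign choice."""
    xp, rr = precompute(x, hs, candidates)
    n = len(hs)
    best = (-1.0, (), ())
    for pos in product(*candidates):
        for signs in product((1, -1), repeat=n):
            # Numerator: square of the sum of three signed elements of x'i.
            num = sum(signs[i] * xp[i][pos[i]] for i in range(n)) ** 2
            # Denominator: sum of the nine signed entries of RR.
            den = sum(signs[i] * signs[j] * rr[i][j][(pos[i], pos[j])]
                      for i in range(n) for j in range(n))
            if den > 0.0 and num / den > best[0]:
                best = (num / den, pos, signs)
    return best


if __name__ == "__main__":
    hs = [[1.0, 0.5], [1.0, -0.3, 0.2], [0.7, 0.7]]   # toy h1, h2, h3
    cands = [(0, 2, 4), (1, 5), (1, 3, 6)]            # toy candidates
    x = [0.0, 0.7, 1.7, 0.5, 0.0, -1.0, 0.3, -0.2]    # toy target
    print(search(x, hs, cands))
```

Inside the loop no convolution is performed: each candidate
costs three additions for the numerator and nine for the
denominator, which is the saving the text describes.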
Furthermore, a synthesized excitation vector in the
synthesis filter has such a characteristic statistically
close to that of an actual target as to be able to yield a
high-quality synthesized speech.
Although the foregoing description of this mode has
been given with reference to a case where fixed waveforms
obtained through training are stored in the fixed waveform
storage section, high-quality synthesized speeches can also
be obtained even when fixed waveforms prepared based on the
result of statistical analysis of the random codebook
searching target x are used or when knowledge-based fixed
waveforms are used.
While the description of this mode has been given with
reference to a case of using three fixed waveforms, similar
functions and advantages can be provided if the number of
fixed waveforms is changed to other values.
Although the fixed waveform arranging section in this
mode has been described as having the start position
candidate information of fixed waveforms given in Table 8,
similar functions and advantages can be provided for other
start position candidate information of fixed waveforms
than those in Table 8.
(Thirteenth Mode)
FIG. 21 presents a structural block diagram of a CELP
type speech coder according to this mode. The speech coder
according to this mode has two kinds of random codebooks A
211 and B 212, a switch 213 for switching the two kinds of
random codebooks from one to the other, a multiplier 214
for multiplying a random code vector by a gain, a synthesis
filter 215 for synthesizing a random code vector output
from the random codebook that is connected by means of the
switch 213, and a distortion calculator 216 for computing
coding distortion in the equation 2.
The random codebook A 211 has the structure of the
excitation vector generator of the tenth mode, while the
other random codebook B 212 is constituted by a random
sequence storage section 217 storing a plurality of random
code vectors generated from a random sequence. Switching
between the random codebooks is carried out in a closed
loop. Here, x is a random codebook searching target.
The operation of the thus constituted CELP type speech
coder will be discussed.
First, the switch 213 is connected to the random
codebook A 211, and the fixed waveform arranging section
182 arranges (shifts) the fixed waveforms, read from the
fixed waveform storage section 181, at the positions
selected from start position candidates of fixed waveforms
respectively, based on start position candidate information
for fixed waveforms it has as shown in Table 8. The
arranged fixed waveforms are added together in the adding
section 183 to become a random code vector, which is sent
to the synthesis filter 215 after being multiplied by the
random code vector gain. The synthesis filter 215
synthesizes the input random code vector and sends the
result to the distortion calculator 216.
The distortion calculator 216 computes
the coding distortion in the equation 2 by using the random
codebook searching target x and the synthesized code vector
obtained from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the fixed waveform
arranging section 182. The process from the selection of
start position candidates corresponding to the three
channels by the fixed waveform arranging section 182 to the
distortion computation by the distortion calculator 216 is
repeated for every combination of the start position
candidates selectable by the fixed waveform arranging
section 182.
Thereafter, the combination of the start position
candidates that minimizes the coding distortion is selected,
and the code number which corresponds, one to one, to that
combination of the start position candidates, the then
optimal random code vector gain gc and the minimum coding
distortion value are memorized.
Then, the switch 213 is connected to the random
codebook B 212, causing a random sequence read from the
random sequence storage section 217 to become a random code
vector. This random code vector, after being multiplied by
the random code vector gain, is input to the synthesis
filter 215. The synthesis filter 215 synthesizes the input
random code vector and sends the result to the distortion
calculator 216.
The distortion calculator 216 computes the coding
distortion in the equation 2 by using the random codebook
searching target x and the synthesized code vector obtained
from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the random sequence
storage section 217. The process from the selection of the
random code vector by the random sequence storage section
217 to the distortion computation by the distortion
calculator 216 is repeated for every random code vector
selectable by the random sequence storage section 217.
Thereafter, the random code vector that minimizes the
coding distortion is selected, and the code number of that
random code vector, the then optimal random code vector
gain gc and the minimum coding distortion value are
memorized.
Then, the distortion calculator 216 compares the
minimum coding distortion value obtained when the switch
213 is connected to the random codebook A 211 with the
minimum coding distortion value obtained when the switch
213 is connected to the random codebook B 212. The switch
connection information for the connection that gave the
smaller coding distortion, together with the corresponding
code number and random code vector gain, are determined as
speech codes and are sent to a transmitter (not illustrated).
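The closed-loop selection between the two random codebooks can be sketched as follows. This is only an illustrative sketch, not the implementation of this mode: equation 2 is not reproduced in this part of the description, so the distortion is assumed here to be the squared error between the target x and the gain-scaled synthesized code vector, the synthesis filter is assumed to be a matrix H, and the function names are hypothetical.

```python
import numpy as np

def best_in_codebook(x, H, codebook):
    """Search one codebook; return (distortion, code number, gain)."""
    best = (np.inf, -1, 0.0)
    for n, c in enumerate(codebook):
        s = H @ np.asarray(c, dtype=float)   # synthesized code vector
        energy = float(s @ s)
        if energy == 0.0:
            continue                          # an all-zero vector cannot be scaled
        gain = float(x @ s) / energy          # gain minimizing ||x - gain*s||^2 (assumed form)
        dist = float(np.sum((x - gain * s) ** 2))
        if dist < best[0]:
            best = (dist, n, gain)
    return best

def closed_loop_select(x, H, codebooks):
    """Search every codebook, then keep the switch position with the least distortion."""
    results = [best_in_codebook(x, H, cb) for cb in codebooks]
    switch = min(range(len(results)), key=lambda k: results[k][0])
    dist, code, gain = results[switch]
    return switch, code, gain, dist
```

The returned switch index, code number and gain play the role of the speech codes described above. Each codebook is searched exhaustively, so the cost grows with the total number of candidate vectors.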
The speech decoder according to this mode, which is
paired with the speech coder of this mode, has the random
codebook A, the random codebook B, the switch, the random
code vector gain and the synthesis filter having the same
structures and arranged in the same way as those in FIG. 21.
A random codebook to be used, a random code vector and a
random code vector gain are determined based on a speech
code input from the transmitter, and a synthesized
excitation vector is obtained as the output of the
synthesis filter.
According to the speech coder/decoder with the above
structures, one of the random code vectors to be generated
from the random codebook A and the random code vectors to
be generated from the random codebook B, which minimizes
the coding distortion in the equation 2, can be selected in
a closed loop, making it possible to generate an excitation
vector closer to an actual speech and a high-quality
synthesized speech.
Although this mode has been illustrated as a speech
coder/decoder based on the structure in FIG. 2 of the
conventional CELP type speech coder, similar functions and
advantages can be provided even if this mode is adapted to
a CELP type speech coder/decoder based on the structure in
FIGS. 19A and 19B or FIG. 20.
Although the random codebook A 211 in this mode has
the same structure as shown in FIG. 18, similar functions
and advantages can be provided even if the fixed waveform
storage section 181 takes another structure (e.g., in a
case where it has four fixed waveforms).
While the description of this mode has been given with
reference to a case where the fixed waveform arranging
section 182 of the random codebook A 211 has the start
position candidate information of fixed waveforms as shown
in Table 8, similar functions and advantages can be
provided even for a case where the section 182 has other
start position candidate information of fixed waveforms.
Although this mode has been described with reference
to a case where the random codebook B 212 is constituted by
the random sequence storage section 217 for directly
storing a plurality of random sequences in the memory,
similar functions and advantages can be provided even for a
case where the random codebook B 212 takes other excitation
vector structures (e.g., when it is constituted by
excitation vector generation information with an algebraic
structure).
Although this mode has been described as a CELP type
speech coder/decoder having two kinds of random codebooks,
similar functions and advantages can be provided even in a
case of using a CELP type speech coder/decoder having three
or more kinds of random codebooks.
(Fourteenth Mode)
FIG. 22 presents a structural block diagram of a CELP
type speech coder according to this mode. The speech coder
according to this mode has two kinds of random codebooks.
One random codebook has the structure of the excitation
vector generator shown in FIG. 18, and the other one is
constituted of a pulse sequences storage section which
retains a plurality of pulse sequences. The random
codebooks are adaptively switched from one to the other by
using a quantized pitch gain already acquired before random
codebook search.
The random codebook A 211, which comprises the fixed
waveform storage section 181, fixed waveform arranging
section 182 and adding section 183, corresponds to the
excitation vector generator in FIG. 18. A random codebook B
221 is comprised of a pulse sequences storage section 222
where a plurality of pulse sequences are stored. The random
codebooks A 211 and B 221 are switched from one to the
other by means of a switch 213'. A multiplier 224 outputs
an adaptive code vector which is the output of an adaptive
codebook 223 multiplied by the pitch gain that has already
been acquired at the time of random codebook search. The
output of a pitch gain quantizer 225 is given to the switch
213'.
The operation of the thus constituted CELP type speech
coder will be described.
According to the conventional CELP type speech coder,
the adaptive codebook 223 is searched first, and the random
codebook search is carried out based on the result. This
adaptive codebook search is a process of selecting an
optimal adaptive code vector from a plurality of adaptive
code vectors stored in the adaptive codebook 223 (vectors
each obtained by multiplying an adaptive code vector and a
random code vector by their respective gains and then
adding them together). As a result of the process, the code
number and pitch gain of an adaptive code vector are
generated.
According to the CELP type speech coder of this mode,
the pitch gain quantizer 225 quantizes this pitch gain,
generating a quantized pitch gain, after which random
codebook search will be performed. The quantized pitch gain
obtained by the pitch gain quantizer 225 is sent to the
switch 213' for switching between the random codebooks.
The switch 213' connects to the random codebook A 211
when the value of the quantized pitch gain is small, by
which it is considered that the input speech is unvoiced,
and connects to the random codebook B 221 when the value of
the quantized pitch gain is large, by which it is
considered that the input speech is voiced.
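The open-loop switching rule can be sketched as below. The threshold value and the function name are hypothetical, since the description only distinguishes a "small" from a "large" quantized pitch gain.

```python
def select_codebook(quantized_pitch_gain, threshold=0.5):
    """Return 'A' (fixed waveform codebook, unvoiced) or 'B' (pulse codebook, voiced).

    threshold is an assumed, illustrative boundary between "small" and "large".
    """
    return "B" if quantized_pitch_gain >= threshold else "A"
```

Because the decision depends only on the quantized pitch gain, which is transmitted anyway, a decoder applying the same rule would reach the same switch position without any extra switch information.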
When the switch 213' is connected to the random
codebook A 211, the fixed waveform arranging section 182
arranges (shifts) the fixed waveforms, read from the fixed
waveform storage section 181, at the positions selected
from start position candidates of fixed waveforms
respectively, based on start position candidate information
for fixed waveforms it has as shown in Table 8. The
arranged fixed waveforms are sent to the adding section 183
and added together to become a random code vector. The
random code vector is sent to the synthesis filter 215
after being multiplied by the random code vector gain. The
synthesis filter 215 synthesizes the input random code
vector and sends the result to the distortion calculator
216.
The distortion calculator 216 computes coding
distortion in the equation 2 by using the target x for
random codebook search and the synthesized code vector
obtained from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the fixed waveform
arranging section 182. The process from the selection of
start position candidates corresponding to the three
channels by the fixed waveform arranging section 182 to the
distortion computation by the distortion calculator 216 is
repeated for every combination of the start position
candidates selectable by the fixed waveform arranging
section 182.
Thereafter, the combination of the start position
candidates that minimizes the coding distortion is selected,
and the code number which corresponds, one to one, to that
combination of the start position candidates, the then
optimal random code vector gain gc and the quantized pitch
gain are transferred to a transmitter as a speech code. In
this mode, the property of unvoiced sound should be
reflected on fixed waveform patterns to be stored in the
fixed waveform storage section 181, before speech coding
takes place.
When the switch 213' is connected to the random
codebook B 221, a pulse sequence read from the pulse
sequences storage section 222 becomes a random code vector.
This random code vector is input to the synthesis filter
215 through the switch 213' and multiplication of the
random code vector gain. The synthesis filter 215
synthesizes the input random code vector and sends the
result to the distortion calculator 216.
The distortion calculator 216 computes the coding
distortion in the equation 2 by using the target x for
random codebook search and the synthesized code vector
obtained from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the pulse sequences
storage section 222. The process from the selection of the
random code vector by the pulse sequences storage section
222 to the distortion computation by the distortion
calculator 216 is repeated for every random code vector
selectable by the pulse sequences storage section 222.
Thereafter, the random code vector that minimizes the
coding distortion is selected, and the code number of that
random code vector, the then optimal random code vector
gain gc and the quantized pitch gain are transferred to the
transmitter as a speech code.
The speech decoder according to this mode which is
paired with the speech coder of this mode has the random
codebook A, the random codebook B, the switch, the random
code vector gain and the synthesis filter having the same
structures and arranged in the same way as those in FIG. 22.
First, upon reception of the transmitted quantized pitch
gain, the decoder side determines from its level whether the
switch 213' has been connected to the random codebook A 211
or to the random codebook B 221. Next, based on the code
number and the sign of the random code vector, a
synthesized excitation vector is obtained as the output of
the synthesis filter.
According to the speech coder/decoder with the above
structures, two kinds of random codebooks can be switched
adaptively in accordance with the characteristic of the
input speech (in this mode, the level of the transmitted
quantized pitch gain is used to judge that characteristic),
so that when the input speech is voiced, a pulse
sequence can be selected as a random code vector whereas
for a strong voiceless property, a random code vector which
reflects the property of voiceless sounds can be selected.
This can ensure generation of excitation vectors closer to
the actual sound property and improvement of synthesized
sounds. Because switching is performed in a closed loop in
this mode as mentioned above, the functional effects can be
improved by increasing the amount of information to be
transmitted.
Although this mode has been illustrated as a speech
coder/decoder based on the structure in FIG. 2 of the
conventional CELP type speech coder, similar functions and
advantages can be provided even if this mode is adapted to
a CELP type speech coder/decoder based on the structure in
FIGS. 19A and 19B or FIG. 20.
In this mode, a quantized pitch gain acquired by
quantizing the pitch gain of an adaptive code vector in the
pitch gain quantizer 225 is used as a parameter for
switching the switch 213'. A pitch period calculator may be
provided so that a pitch period computed from an adaptive
code vector can be used instead.
Although the random codebook A 211 in this mode has
the same structure as shown in FIG. 18, similar functions
and advantages can be provided even if the fixed waveform
storage section 181 takes another structure (e.g., in a
case where it has four fixed waveforms).
While the description of this mode has been given with
reference to the case where the fixed waveform arranging
section 182 of the random codebook A 211 has the start
position candidate information of fixed waveforms as shown
in Table 8, similar functions and advantages can be
provided even for a case where the section 182 has other
start position candidate information of fixed waveforms.
Although this mode has been described with reference
to the case where the random codebook B 221 is constituted
by the pulse sequences storage section 222 for directly
storing pulse sequences in the memory, similar functions
and advantages can be provided even for a case where the
random codebook B 221 takes other excitation vector
structures (e.g., when it is constituted by excitation
vector generation information with an algebraic structure).
Although this mode has been described as a CELP type
speech coder/decoder having two kinds of random codebooks,
similar functions and advantages can be provided even in a
case of using a CELP type speech coder/decoder having three
or more kinds of random codebooks.
(Fifteenth Mode)
FIG. 23 presents a structural block diagram of a CELP
type speech coder according to this mode. The speech coder
according to this mode has two kinds of random codebooks.
One random codebook takes the structure of the excitation
vector generator shown in FIG. 18 and has three fixed
waveforms stored in the fixed waveform storage section, and
the other one likewise takes the structure of the
excitation vector generator shown in FIG. 18 but has two
fixed waveforms stored in the fixed waveform storage
section. Those two kinds of random codebooks are switched
in a closed loop.
The random codebook A 211, which comprises a fixed
waveform storage section A 181 having three fixed waveforms
stored therein, fixed waveform arranging section A 182 and
adding section 183, corresponds to the structure of the
excitation vector generator in FIG. 18 which however has
three fixed waveforms stored in the fixed waveform storage
section.
A random codebook B 230 comprises a fixed waveform
storage section B 231 having two fixed waveforms stored
therein, fixed waveform arranging section B 232 having
start position candidate information of fixed waveforms as
shown in Table 9 and adding section 233, which adds two
fixed waveforms, arranged by the fixed waveform arranging
section B 232, thereby generating a random code vector. The
random codebook B 230 corresponds to the structure of the
excitation vector generator in FIG. 18 which however has
two fixed waveforms stored in the fixed waveform storage
section.
Table 9
Channel   Sign   Start position candidates of fixed waveforms
number
CH1       ±1     P1:  0, 4, 8, 12, 16, ..., 72, 76
                      2, 6, 10, 14, 18, ..., 74, 78
CH2       ±1     P2:  1, 5, 9, 13, 17, ..., 73, 77
                      3, 7, 11, 15, 19, ..., 75, 79

The other structure is the same as that of the above-
described thirteenth mode.
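The way the random codebook B 230 could build a random code vector from its two fixed waveforms is sketched below. The sketch reads the Table 9 candidates as the even positions 0 to 78 for channel CH1 and the odd positions 1 to 79 for channel CH2, each with a sign of +1 or -1. The subframe length of 80 samples, the truncation at the subframe end and all names are assumptions made for illustration.

```python
import numpy as np

SUBFRAME = 80  # assumed length: the Table 9 positions run from 0 to 79

# Start position candidates per channel, as read from Table 9.
P1 = sorted(list(range(0, 80, 4)) + list(range(2, 80, 4)))  # CH1: even positions
P2 = sorted(list(range(1, 80, 4)) + list(range(3, 80, 4)))  # CH2: odd positions

def arrange_and_add(waveforms, starts, signs):
    """Shift each fixed waveform to its start position, apply its sign, and sum."""
    vec = np.zeros(SUBFRAME)
    for w, p, s in zip(waveforms, starts, signs):
        n = min(len(w), SUBFRAME - p)   # truncate at the subframe end (assumption)
        vec[p:p + n] += s * np.asarray(w[:n], dtype=float)
    return vec
```

With 40 candidates per channel, an exhaustive search over start positions alone visits 40 x 40 = 1600 combinations, before signs are taken into account.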
The operation of the CELP type speech coder
constructed in the above way will be described.
First, the switch 213 is connected to the random
codebook A 211, and the fixed waveform arranging section A
182 arranges (shifts) three fixed waveforms, read from the
fixed waveform storage section A 181, at the positions
selected from start position candidates of fixed waveforms
respectively, based on start position candidate information
for fixed waveforms it has as shown in Table 8. The
arranged three fixed waveforms are output to the adding
section 183 and added together to become a random code
vector. This random code vector is sent to the synthesis
filter 215 through the switch 213 and the multiplier 214
for multiplying it by the random code vector gain. The
synthesis filter 215 synthesizes the input random code
vector and sends the result to the distortion calculator
216.
The distortion calculator 216 computes coding
distortion in the equation 2 by using the random codebook
search target x and the synthesized code vector obtained
from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the fixed waveform
arranging section A 182. The process from the selection of
start position candidates corresponding to the three
channels by the fixed waveform arranging section A 182 to
the distortion computation by the distortion calculator 216
is repeated for every combination of the start position
candidates selectable by the fixed waveform arranging
section A 182.
Thereafter, the combination of the start position
candidates that minimizes the coding distortion is selected,
and the code number which corresponds, one to one, to that
combination of the start position candidates, the then
optimal random code vector gain gc and the minimum coding
distortion value are memorized.
In this mode, the fixed waveform patterns to be stored
in the fixed waveform storage section A 181 before speech
coding are what have been acquired through training in such
a way as to minimize distortion under the condition of
three fixed waveforms in use.
Next, the switch 213 is connected to the random
codebook B 230, and the fixed waveform arranging section B
232 arranges (shifts) two fixed waveforms, read from the
fixed waveform storage section B 231, at the positions
selected from start position candidates of fixed waveforms
respectively, based on start position candidate information
for fixed waveforms it has as shown in Table 9. The
arranged two fixed waveforms are output to the adding
section 233 and added together to become a random code
vector. This random code vector is sent to the synthesis
filter 215 through the switch 213 and the multiplier 214
for multiplying it by the random code vector gain. The
synthesis filter 215 synthesizes the input random code
vector and sends the result to the distortion calculator
216.
The distortion calculator 216 computes coding
distortion in the equation 2 by using the target x for
random codebook search and the synthesized code vector
obtained from the synthesis filter 215.
After computing the distortion, the distortion
calculator 216 sends a signal to the fixed waveform
arranging section B 232. The process from the selection of
start position candidates corresponding to the two
channels by the fixed waveform arranging section B 232 to
the distortion computation by the distortion calculator 216
is repeated for every combination of the start position
candidates selectable by the fixed waveform arranging
section B 232.
Thereafter, the combination of the start position
candidates that minimizes the coding distortion is selected,
and the code number which corresponds, one to one, to that
combination of the start position candidates, the then
optimal random code vector gain gc and the minimum coding
distortion value are memorized. In this mode, the fixed
waveform patterns to be stored in the fixed waveform
storage section B 231 before speech coding are what have
been acquired through training in such a way as to minimize
distortion under the condition of two fixed waveforms in
use.
Then, the distortion calculator 216 compares the
minimum coding distortion value obtained when the switch
213 is connected to the random codebook A 211 with the
minimum coding distortion value obtained when the switch
213 is connected to the random codebook B 230. The switch
connection information for the connection that gave the
smaller coding distortion, together with the corresponding
code number and random code vector gain, are determined as
speech codes and are sent to the transmitter.
The speech decoder according to this mode has the
random codebook A, the random codebook B, the switch, the
random code vector gain and the synthesis filter having the
same structures and arranged in the same way as those in
FIG. 23. A random codebook to be used, a random code vector
and a random code vector gain are determined based on a
speech code input from the transmitter, and a synthesized
excitation vector is obtained as the output of the
synthesis filter.
According to the speech coder/decoder with the above
structures, one of the random code vectors to be generated
from the random codebook A and the random code vectors to
be generated from the random codebook B, which minimizes
the coding distortion in the equation 2, can be selected in
a closed loop, making it possible to generate an excitation
vector closer to an actual speech and a high-quality
synthesized speech.
Although this mode has been illustrated as a speech
coder/decoder based on the structure in FIG. 2 of the
conventional CELP type speech coder, similar functions and
advantages can be provided even if this mode is adapted to
a CELP type speech coder/decoder based on the structure in
FIGS. 19A and 19B or FIG. 20.
Although this mode has been described with reference
to the case where the fixed waveform storage section A 181
of the random codebook A 211 stores three fixed waveforms,
similar functions and advantages can be provided even if
the fixed waveform storage section A 181 stores a different
number of fixed waveforms (e.g., in a case where it has
four fixed waveforms). The same is true of the random
codebook B 230.
While the description of this mode has been given with
reference to the case where the fixed waveform arranging
section A 182 of the random codebook A 211 has the start
position candidate information of fixed waveforms as shown
in Table 8, similar functions and advantages can be
provided even for a case where the section 182 has other
start position candidate information of fixed waveforms.
The same is applied to the random codebook B 230.
Although this mode has been described as a CELP type
speech coder/decoder having two kinds of random codebooks,
similar functions and advantages can be provided even in a
case of using a CELP type speech coder/decoder having three
or more kinds of random codebooks.
(Sixteenth Mode)
FIG. 24 presents a structural block diagram of a CELP
type speech coder according to this mode. The speech coder
acquires LPC coefficients by performing autocorrelation
analysis and LPC analysis on input speech data 241 in an
LPC analyzing section 242, encodes the obtained LPC
coefficients to acquire LPC codes, and decodes the obtained
LPC codes to yield decoded LPC coefficients.
Next, an excitation vector generator 245 acquires an
adaptive code vector and a random code vector from an
adaptive codebook 243 and an excitation vector generator
244, and sends them to an LPC synthesis filter 246. One of
the excitation vector generators of the above-described
first to fourth and tenth modes is used for the excitation
vector generator 244. Further, the LPC synthesis filter 246
filters two excitation vectors, obtained by the excitation
vector generator 245, with the decoded LPC coefficients
obtained by the LPC analyzing section 242, thereby yielding
two synthesized speeches.
A comparator 247 analyzes a relationship between the
two synthesized speeches, obtained by the LPC synthesis
filter 246, and the input speech, yielding optimal values
(optimal gains) of the two synthesized speeches, adds the
synthesized speeches whose powers have been adjusted with
the optimal gains, acquiring a total synthesized speech,
and then computes a distance between the total synthesized
speech and the input speech.
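The computation performed by the comparator 247 can be sketched as follows. The description does not state how the optimal gains or the distance are obtained, so this sketch assumes a joint least-squares fit of the two gains and a squared-error distance. The function name is hypothetical.

```python
import numpy as np

def optimal_gains(input_speech, syn_adaptive, syn_random):
    """Fit the two gains jointly, then measure the remaining squared error."""
    S = np.column_stack([syn_adaptive, syn_random])          # one column per synthesized speech
    gains, *_ = np.linalg.lstsq(S, np.asarray(input_speech, dtype=float), rcond=None)
    total = S @ gains                                        # total synthesized speech
    distance = float(np.sum((np.asarray(input_speech, dtype=float) - total) ** 2))
    return float(gains[0]), float(gains[1]), distance
```

Repeating this for every excitation vector sample and keeping the index with the smallest distance corresponds to the search described in the next paragraph.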
Distance computation is also carried out on the input
speech and multiple synthesized speeches, which are
obtained by causing the excitation vector generator 245 and
the LPC synthesis filter 246 to function with respect to
all the excitation vector samples that are generated by
the adaptive codebook 243 and the excitation vector generator
244. Then, the index of the excitation vector sample which
provides the minimum of the distances obtained from the
computation is determined. The obtained optimal gains, the obtained index
of the excitation vector sample and two excitation vectors
corresponding to that index are sent to a parameter coding
section 248.
The parameter coding section 248 encodes the optimal
gains to obtain gain codes, and the gain codes, the LPC codes
and the index of the excitation vector sample are all sent to a
transmitter 249. An actual excitation signal is produced
from the gain codes and the two excitation vectors
corresponding to the index, and an old excitation vector
sample is discarded at the same time the excitation signal
is stored in the adaptive codebook 243.
FIG. 25 shows functional blocks of a section in the
parameter coding section 248, which is associated with
vector quantization of the gain.
The parameter coding section 248 has a parameter
converting section 2502 for converting input optimal gains
2501 to a sum of elements and a ratio with respect to the
sum to acquire quantization target vectors, a target vector
extracting section 2503 for obtaining a target vector by
using old decoded code vectors, stored in a decoded vector
storage section, and predictive coefficients stored in a
predictive coefficients storage section, a decoded vector
storage section 2504 where old decoded code vectors are
stored, a predictive coefficients storage section 2505, a
distance calculator 2506 for computing distances between a
plurality of code vectors stored in a vector codebook and a
target vector obtained by the target vector extracting
section by using predictive coefficients stored in the
predictive coefficients storage section, a vector codebook
2507 where a plurality of code vectors are stored, and a
comparator 2508, which controls the vector codebook and the
distance calculator for comparison of the distances
obtained from the distance calculator to acquire the number
of the most appropriate code vector, acquires a code vector
from the vector storage section based on the obtained
number, and updates the content of the decoded vector
storage section using that code vector.
A detailed description will now be given of the
operation of the thus constituted parameter coding section
248. The vector codebook 2507 where a plurality of general
samples (code vectors) of a quantization target vector are
stored should be prepared in advance. This is generally
prepared by an LBG algorithm (IEEE TRANSACTIONS ON
COMMUNICATIONS, VOL. COM-28, NO. 1, PP 84-95, JANUARY 1980)
based on multiple vectors which are obtained by analyzing
multiple speech data.
Coefficients for predictive coding should be stored in
the predictive coefficients storage section 2505. The
predictive coefficients will be discussed after the
algorithm has been described. A value indicating an unvoiced
state should be stored as an initial value in the decoded
vector storage section 2504. One example would be a code
vector with the lowest power.
First, the input optimal gains 2501 (the gain of an
adaptive excitation vector and the gain of a random
excitation vector) are converted to element vectors
(inputs) of a sum and a ratio in the parameter converting
section 2502. The conversion method is illustrated in an
equation 40.
P = log(Ga + Gs)
R = Ga/(Ga + Gs) (40)
where (Ga, Gs): optimal gain
Ga: gain of an adaptive excitation vector
Gs: gain of stochastic excitation vector
(P, R): input vectors
P: sum
R: ratio.
It is to be noted that Ga above need not necessarily
be a positive value. Thus, R may take a negative value.
When Ga + Gs becomes negative, a fixed value prepared in
advance is substituted.
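The conversion of equation 40 can be sketched as below. The base of the logarithm is not stated, so the natural logarithm is assumed. The description also leaves open exactly what the prepared fixed value replaces. This sketch takes one possible reading, in which a prepared (P, R) pair is returned whenever Ga + Gs is not positive. The constant FALLBACK_PR and the function name are hypothetical.

```python
import math

# Hypothetical prepared value, used when Ga + Gs is not positive.
FALLBACK_PR = (math.log(1e-3), 0.0)

def gains_to_sum_ratio(ga, gs, fallback=FALLBACK_PR):
    """Equation 40: map the optimal gains (Ga, Gs) to the input vector (P, R)."""
    total = ga + gs
    if total <= 0.0:            # the logarithm is undefined here, so substitute
        return fallback
    return math.log(total), ga / total
```

A negative Ga with a positive sum is handled by the normal path and simply gives a negative ratio R.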
Next, based on the vectors obtained by the parameter
converting section 2502, the target vector extracting
section 2503 acquires a target vector by using old decoded
code vectors, stored in the decoded vector storage section
2504, and predictive coefficients stored in the predictive
coefficients storage section 2505. An equation for
computing the target vector is given by an equation 41.
\[
\begin{aligned}
T_p &= P - \Bigl( \sum_{i=1}^{l} U_{pi} \times p_i + \sum_{i=1}^{l} V_{pi} \times r_i \Bigr) \\
T_r &= R - \Bigl( \sum_{i=1}^{l} U_{ri} \times p_i + \sum_{i=1}^{l} V_{ri} \times r_i \Bigr)
\end{aligned}
\tag{41}
\]
where (Tp, Tr): target vector
(P, R): input vector
(pi, ri): old decoded vector
Upi, Vpi, Uri, Vri: predictive coefficients (fixed values)
i: index indicating how old the decoded vector is
l: prediction order.
Then, the distance calculator 2506 computes a distance
between a target vector obtained by the target vector
extracting section 2503 and a code vector stored in the
vector codebook 2507 by using the predictive coefficients
stored in the predictive coefficients storage section 2505.
An equation for computing the distance is given by an
equation 42.
\[
\begin{aligned}
D_n = {} & W_p \times (T_p - U_{p0} \times C_{pn} - V_{p0} \times C_{rn})^2 \\
& + W_r \times (T_r - U_{r0} \times C_{pn} - V_{r0} \times C_{rn})^2
\end{aligned}
\tag{42}
\]
where Dn: distance between a target vector and a code
vector
(Tp, Tr): target vector
Up0, Vp0, Ur0, Vr0: predictive coefficients (fixed
values)
(Cpn, Crn): code vector
n: the number of the code vector
Wp, Wr: weighting coefficient (fixed) for adjusting
the sensitivity against distortion.
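The target extraction of equation 41 and the distance search of equation 42 can be sketched together as follows. The array layout is an assumption: index 0 of each coefficient array holds the current-frame coefficient (Up0, Vp0, Ur0, Vr0) and indices 1 to l weight the old decoded vectors. The second term of the distance is taken with Ur0 and Vr0, in line with the coefficient list given for equation 42. All function names are hypothetical.

```python
import numpy as np

def target_vector(P, R, old_p, old_r, Up, Vp, Ur, Vr):
    """Equation 41: remove the part predicted from the old decoded vectors.

    old_p[i-1], old_r[i-1] hold (p_i, r_i) for i = 1..l (assumed layout).
    """
    Tp = P - (np.dot(Up[1:], old_p) + np.dot(Vp[1:], old_r))
    Tr = R - (np.dot(Ur[1:], old_p) + np.dot(Vr[1:], old_r))
    return float(Tp), float(Tr)

def search_gain_code(Tp, Tr, codebook, Up, Vp, Ur, Vr, Wp=1.0, Wr=1.0):
    """Equation 42 over all code vectors; codebook rows are (Cpn, Crn)."""
    cb = np.asarray(codebook, dtype=float)
    d = (Wp * (Tp - Up[0] * cb[:, 0] - Vp[0] * cb[:, 1]) ** 2
         + Wr * (Tr - Ur[0] * cb[:, 0] - Vr[0] * cb[:, 1]) ** 2)
    n = int(np.argmin(d))       # number of the code vector with the shortest distance
    return n, float(d[n])
```

The index returned by search_gain_code corresponds to the gain code selected by the comparator 2508.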
Then, the comparator 2508 controls the vector codebook
2507 and the distance calculator 2506 to acquire the number
of the code vector which has the shortest distance computed
by the distance calculator 2506 from among a plurality of
code vectors stored in the vector codebook 2507, and sets
the number as a gain code 2509. Based on the obtained gain
code 2509, the comparator 2508 acquires a decoded vector
and updates the content of the decoded vector storage
section 2504 using that vector. An equation 43 shows how to
acquire a decoded vector.
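\[
\begin{aligned}
p &= \Bigl( \sum_{i=1}^{l} U_{pi} \times p_i + \sum_{i=1}^{l} V_{pi} \times r_i \Bigr) + U_{p0} \times C_{pn} + V_{p0} \times C_{rn} \\
r &= \Bigl( \sum_{i=1}^{l} U_{ri} \times p_i + \sum_{i=1}^{l} V_{ri} \times r_i \Bigr) + U_{r0} \times C_{pn} + V_{r0} \times C_{rn}
\end{aligned}
\tag{43}
\]
where (Cpn, Crn): code vector
(p, r): decoded vector
(pi, ri): old decoded vector
Upi, Vpi, Uri, Vri: predictive coefficients (fixed
values)
i: index indicating how old the decoded vector is
l: prediction order
n: the number of the code vector.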
An equation 44 shows an updating scheme.
Processing order
$$
\begin{aligned}
p_0 &= Cp_N \\
r_0 &= Cr_N \\
p_i &= p_{i-1} \quad (i = l, \ldots, 1) \\
r_i &= r_{i-1} \quad (i = l, \ldots, 1)
\end{aligned}
\qquad (44)
$$
N: code of the gain.
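
The coder-side flow of equations 41 to 44 can be sketched as follows. This is only an illustrative sketch: the function name `encode_gain`, the list-based layout of the coefficients (index 0 holds Up0, Vp0, Ur0, Vr0) and the toy values in the usage note are assumptions, not part of the described coder. The history update follows equation 44 as written, storing the selected code vector.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def encode_gain(
    inp: Vec2,                 # (P, R): input vector
    codebook: Sequence[Vec2],  # (Cpn, Crn) for each n
    up: Sequence[float], vp: Sequence[float],  # Up0..Upl, Vp0..Vpl
    ur: Sequence[float], vr: Sequence[float],  # Ur0..Url, Vr0..Vrl
    wp: float, wr: float,      # weighting coefficients
    old: List[Vec2],           # [(p1, r1), ..., (pl, rl)], newest first
) -> Tuple[int, Vec2, List[Vec2]]:
    l = len(old)
    # Predictive part: the sums over i = 1..l shared by equations 41 and 43.
    yp = sum(up[i] * old[i - 1][0] + vp[i] * old[i - 1][1] for i in range(1, l + 1))
    yr = sum(ur[i] * old[i - 1][0] + vr[i] * old[i - 1][1] for i in range(1, l + 1))

    # Equation 41: target vector.
    tp, tr = inp[0] - yp, inp[1] - yr

    # Equation 42: weighted distance for the n-th code vector.
    def distance(n: int) -> float:
        cp, cr = codebook[n]
        return (wp * (tp - up[0] * cp - vp[0] * cr) ** 2
                + wr * (tr - ur[0] * cp - vr[0] * cr) ** 2)

    best = min(range(len(codebook)), key=distance)  # gain code
    cp, cr = codebook[best]

    # Equation 43: decoded vector.
    decoded = (yp + up[0] * cp + vp[0] * cr, yr + ur[0] * cp + vr[0] * cr)

    # Equation 44: the selected code vector becomes the newest entry and
    # the oldest entry is dropped.
    new_old = [codebook[best]] + list(old[:-1])
    return best, decoded, new_old
```

With a three-entry toy codebook `[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]`, prediction order 1 and an input of `(1.1, 0.7)`, the second entry is selected when the prediction from the history `[(0.2, 0.4)]` is `(0.1, 0.2)`. A decoder can reuse the same function body from equation 43 onward, because it needs only the gain code and the same stored history.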
Meanwhile, the decoder, which should previously be
provided with a vector codebook, a predictive coefficients


storage section and a coded vector storage section similar
to those of the coder, performs decoding through the
functions of the comparator of the coder of generating a
decoded vector and updating the decoded vector storage
section, based on the gain code transmitted from the coder.
A scheme of setting predictive coefficients to be
stored in the predictive coefficients storage section 2505
will now be described.
Predictive coefficients are obtained by quantizing a
lot of training speech data first, collecting input vectors
obtained from their optimal gains and decoded vectors at
the time of quantization, forming a population, then
minimizing total distortion indicated by the following
equation 45 for that population. Specifically, the values
of Upi and Uri are acquired by solving simultaneous
equations which are derived by partial differential of the
equation of the total distortion with respect to Upi and
Uri.
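$$
\mathrm{Total} = \sum_{t=0}^{T} \Bigl\{ W_p \times \Bigl(P_t - \sum_{i=0}^{l} Up_i \times p_{t,i}\Bigr)^2
  + W_r \times \Bigl(R_t - \sum_{i=0}^{l} Ur_i \times r_{t,i}\Bigr)^2 \Bigr\} \qquad (45)
$$
$$
p_{t,0} = Cp_{n(t)}, \qquad r_{t,0} = Cr_{n(t)}
$$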
where Total: total distortion
t: time (frame number)
T: the number of pieces of data in the population


(Pt, Rt): optimal gain at time t
(pt,i, rt,i): decoded vector at time t
Upi, Vpi, Uri, Vri: predictive coefficients (fixed
values)
i: index indicating how old the decoded vector is
l: prediction order.
(Cpn(t), Crn(t)): code vector at time t
n: the number of the code vector
Wp, Wr: weighting coefficient (fixed) for adjusting
the sensitivity against distortion.
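
As a worked step, the simultaneous equations mentioned above can be written out for the Upi coefficients. The index k is introduced here only for illustration. Setting the partial derivative of the total distortion with respect to each Upk to zero gives one linear equation per k, and the weight Wp cancels:

```latex
\frac{\partial\,\mathrm{Total}}{\partial Up_k}
  = -2\,W_p \sum_{t=0}^{T} \Bigl(P_t - \sum_{i=0}^{l} Up_i \times p_{t,i}\Bigr)\, p_{t,k} = 0
\quad\Longrightarrow\quad
\sum_{i=0}^{l} Up_i \sum_{t=0}^{T} p_{t,i}\, p_{t,k} = \sum_{t=0}^{T} P_t\, p_{t,k},
\qquad k = 0, \ldots, l
```

The same derivation with (Rt, rt,i, Wr) in place of (Pt, pt,i, Wp) gives the equations for Uri. Each set is a system of l + 1 linear equations in l + 1 unknowns.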
According to such a vector quantization scheme, the
optimal gain can be vector-quantized as it is, the feature
of the parameter converting section can permit the use of
the correlation between the relative levels of the power
and each gain, and the features of the decoded vector
storage section, the predictive coefficients storage
section, the target vector extracting section and the
distance calculator can ensure predictive coding of gains
using the correlation between the mutual relations between
the power and two gains. Those features can allow the
correlation among parameters to be utilized sufficiently.
(Seventeenth Mode)
FIG. 26 presents a structural block diagram of a
parameter coding section of a speech coder according to
this mode. According to this mode, vector quantization is
performed while evaluating gain-quantization originated
distortion from two synthesized speeches corresponding to


the index of an excitation vector and a perceptually weighted
input speech.
As shown in FIG. 26, the parameter coding section has
a parameter calculator 2602, which computes parameters
necessary for distance computation from input data 2601 (a
perceptually weighted input speech, a perceptually weighted
LPC synthesis of an adaptive code vector and a perceptually
weighted LPC synthesis of a random code vector), a
decoded vector stored in a decoded vector storage section,
and predictive coefficients stored in a predictive
coefficients storage section; a decoded vector storage
section 2603 where old decoded code vectors are stored; a
predictive coefficients storage section 2604 where
predictive coefficients are stored; a distance calculator
2605 for computing the coding distortion that results when
decoding is implemented with each of a plurality of code
vectors stored in a vector codebook, by using the predictive
coefficients stored in the predictive coefficients storage
section; a vector codebook 2606 where a plurality of code
vectors are stored; and a comparator 2607, which controls
the vector codebook and the distance calculator for
comparison of the coding distortions obtained from the
distance calculator to acquire the number of the most
appropriate code vector, acquires a code vector from the
vector storage section based on the obtained number, and
updates the content of the decoded vector storage section
using that code vector.


A description will now be given of the vector
quantizing operation of the thus constituted parameter
coding section. The vector codebook 2606 where a plurality
of general samples (code vectors) of a quantization target
vector are stored should be prepared in advance. This is
generally prepared by an LBG algorithm (IEEE TRANSACTIONS
ON COMMUNICATIONS, VOL. COM-28, NO. 1, PP 84-95, JANUARY
1980) or the like based on multiple vectors which are
obtained by analyzing multiple speech data. Coefficients
for predictive coding should be stored in the predictive
coefficients storage section 2604. Those coefficients in
use are the same predictive coefficients as stored in the
predictive coefficients storage section 2505 which has been
discussed in (Sixteenth Mode). A value indicating an
unvoiced state should be stored as an initial value in the
decoded vector storage section 2603.
First, the parameter calculator 2602 computes
parameters necessary for distance computation from the
input perceptually weighted input speech, perceptually weighted
LPC synthesis of adaptive code vector and perceptually
weighted LPC synthesis of random code vector, and further
from the decoded vector stored in the decoded vector
storage section 2603 and the predictive coefficients stored
in the predictive coefficients storage section 2604. The
distances in the distance calculator are based on the
following equation 46.


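$$
\begin{aligned}
E_n &= \sum_{i=0}^{I} (X_i - Ga_n \times A_i - Gs_n \times S_i)^2 \\
Ga_n &= Or_n \times \exp(Op_n) \\
Gs_n &= (1 - Or_n) \times \exp(Op_n) \\
Op_n &= Y_p + Up_0 \times Cp_n + Vp_0 \times Cr_n \\
Or_n &= Y_r + Ur_0 \times Cp_n + Vr_0 \times Cr_n \\
Y_p &= \sum_{j=1}^{J} Up_j \times p_j + \sum_{j=1}^{J} Vp_j \times r_j \\
Y_r &= \sum_{j=1}^{J} Ur_j \times p_j + \sum_{j=1}^{J} Vr_j \times r_j
\end{aligned}
\qquad (46)
$$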
Gan, Gsn: decoded gain
(Opn, Orn): decoded vector
(Yp, Yr): predictive vector
En: coding distortion when the n-th gain code vector
is used
Xi: perceptually weighted input speech
Ai: perceptually weighted LPC synthesis of adaptive code
vector
Si: perceptually weighted LPC synthesis of stochastic
code vector
n: code of the code vector
i: index of excitation data
I: subframe length (coding unit of the input speech)
(Cpn, Crn): code vector
(pj, rj): old decoded vector
Upj, Vpj, Urj, Vrj: predictive coefficients (fixed
values)
j: index indicating how old the decoded vector is
J: prediction order.
Therefore, the parameter calculator 2602 computes

those portions which do not depend on the number of a code
vector. What is to be computed are the predictive vector,
and the correlation among three synthesized speeches or the
power. An equation for the computation is given by an
equation 47.
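$$
\begin{aligned}
Y_p &= \sum_{j=1}^{J} Up_j \times p_j + \sum_{j=1}^{J} Vp_j \times r_j \\
Y_r &= \sum_{j=1}^{J} Ur_j \times p_j + \sum_{j=1}^{J} Vr_j \times r_j \\
D_{xx} &= \sum_{i=0}^{I} X_i \times X_i \\
D_{xa} &= \sum_{i=0}^{I} X_i \times A_i \times 2 \\
D_{xs} &= \sum_{i=0}^{I} X_i \times S_i \times 2 \\
D_{aa} &= \sum_{i=0}^{I} A_i \times A_i \\
D_{as} &= \sum_{i=0}^{I} A_i \times S_i \times 2 \\
D_{ss} &= \sum_{i=0}^{I} S_i \times S_i
\end{aligned}
\qquad (47)
$$
where (Yp, Yr): predictive vector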

Dxx, Dxa, Dxs, Daa, Das, Dss: value of correlation
among synthesized speeches or the power
Xi: perceptually weighted input speech
Ai: perceptually weighted LPC synthesis of adaptive code
vector
Si: perceptually weighted LPC synthesis of stochastic
code vector
i: index of excitation data
I: subframe length (coding unit of the input speech)
(pj, rj): old decoded vector
Upj, Vpj, Urj, Vrj: predictive coefficients (fixed


values)
j: index indicating how old the decoded vector is
J: prediction order.
Then, the distance calculator 2605 computes the coding
distortion for each code vector stored in the vector
codebook 2606 by using the parameters computed by the
parameter calculator 2602 and the predictive coefficients
stored in the predictive coefficients storage section 2604.
An equation for computing the distortion is given by an
equation 48.
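$$
\begin{aligned}
E_n &= D_{xx} + (Ga_n)^2 \times D_{aa} + (Gs_n)^2 \times D_{ss} \\
    &\quad - Ga_n \times D_{xa} - Gs_n \times D_{xs} + Ga_n \times Gs_n \times D_{as} \\
Ga_n &= Or_n \times \exp(Op_n) \\
Gs_n &= (1 - Or_n) \times \exp(Op_n) \\
Op_n &= Y_p + Up_0 \times Cp_n + Vp_0 \times Cr_n \\
Or_n &= Y_r + Ur_0 \times Cp_n + Vr_0 \times Cr_n
\end{aligned}
\qquad (48)
$$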
where En: coding distortion when the n-th gain code vector
is used
Dxx, Dxa, Dxs, Daa, Das, Dss: value of correlation
among synthesized speeches or the power
Gan, Gsn: decoded gain
(Opn, Orn): decoded vector
(Yp, Yr): predictive vector
Up0, Vp0, Ur0, Vr0: predictive coefficients (fixed
values)
(Cpn, Crn): code vector
n: the number of the code vector.
Actually, Dxx does not depend on the number n of the


code vector so that its addition can be omitted.
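
The split between the code-independent correlations and the per-code-vector distortion can be sketched as below. The function names and the sample values in the usage note are illustrative assumptions. The sketch also evaluates the sample-by-sample form of the distortion so that the two forms can be compared.

```python
import math
from typing import Sequence, Tuple


def correlations(x: Sequence[float], a: Sequence[float], s: Sequence[float]):
    """Terms that do not depend on the code vector number n.

    The cross terms carry the factor 2, so the expanded distortion needs no
    further scaling.
    """
    dxx = sum(xi * xi for xi in x)
    dxa = 2.0 * sum(xi * ai for xi, ai in zip(x, a))
    dxs = 2.0 * sum(xi * si for xi, si in zip(x, s))
    daa = sum(ai * ai for ai in a)
    das = 2.0 * sum(ai * si for ai, si in zip(a, s))
    dss = sum(si * si for si in s)
    return dxx, dxa, dxs, daa, das, dss


def decoded_gains(yp, yr, up0, vp0, ur0, vr0, cp, cr) -> Tuple[float, float]:
    """Decoded gains (Gan, Gsn) from the predictive vector and a code vector."""
    opn = yp + up0 * cp + vp0 * cr
    orn = yr + ur0 * cp + vr0 * cr
    return orn * math.exp(opn), (1.0 - orn) * math.exp(opn)


def distortion_expanded(corr, ga: float, gs: float) -> float:
    """Distortion computed from the precomputed correlations."""
    dxx, dxa, dxs, daa, das, dss = corr
    return (dxx + ga * ga * daa + gs * gs * dss
            - ga * dxa - gs * dxs + ga * gs * das)


def distortion_direct(x, a, s, ga: float, gs: float) -> float:
    """Distortion computed sample by sample, for comparison."""
    return sum((xi - ga * ai - gs * si) ** 2 for xi, ai, si in zip(x, a, s))
```

In a codebook search, `correlations` would be called once per subframe and `distortion_expanded` once per code vector, which is why the terms independent of n are factored out. Dropping `dxx` from the comparison does not change which code vector is selected.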
Then, the comparator 2607 controls the vector codebook
2606 and the distance calculator 2605 to acquire the number
of the code vector which has the shortest distance computed
by the distance calculator 2605 from among a plurality of
code vectors stored in the vector codebook 2606, and sets
the number as a gain code 2608. Based on the obtained gain
code 2608, the comparator 2607 acquires a decoded vector
and updates the content of the decoded vector storage
section 2603 using that vector. A code vector is obtained
from the equation 44.
Further, the updating scheme, the equation 44, is used.
Meanwhile, the speech decoder should previously be
provided with a vector codebook, a predictive coefficients
storage section and a coded vector storage section similar
to those of the speech coder, and performs decoding through
the functions of the comparator of the coder of generating
a decoded vector and updating the decoded vector storage
section, based on the gain code transmitted from the coder.
According to the thus constituted mode, vector
quantization can be performed while evaluating
gain-quantization originated distortion from two synthesized
speeches corresponding to the index of the excitation
vector and the input speech, the feature of the parameter
converting section can permit the use of the correlation
between the relative levels of the power and each gain, and
the features of the decoded vector storage section, the


predictive coefficients storage section, the target vector
extracting section and the distance calculator can ensure
predictive coding of gains using the correlation between
the mutual relations between the power and two gains. This
can allow the correlation among parameters to be utilized
sufficiently.
(Eighteenth Mode)
FIG. 27 presents a structural block diagram of the
essential portions of a noise canceler according to this
mode. This noise canceler is installed in the above-
described speech coder. For example, it is placed at the
preceding stage of the buffer 1301 in the speech coder
shown in FIG. 13.
The noise canceler shown in FIG. 27 comprises an A/D
converter 272, a noise cancellation coefficient storage
section 273, a noise cancellation coefficient adjusting
section 274, an input waveform setting section 275, an LPC
analyzing section 276, a Fourier transform section 277, a
noise canceling/spectrum compensating section 278, a
spectrum stabilizing section 279, an inverse Fourier
transform section 280, a spectrum enhancing section 281, a
waveform matching section 282, a noise estimating section
284, a noise spectrum storage section 285, a previous
spectrum storage section 286, a random phase storage
section 287, a previous waveform storage section 288, and a
maximum power storage section 289.
To begin with, initial settings will be discussed.


Table 10 shows the names of fixed parameters and setting
examples.
Table 10
Fixed Parameters                                      Setting Examples
frame length                                          160 (20 msec for 8-kHz sampling data)
pre-read data length                                  80 (10 msec for the above data)
FFT order                                             256
LPC prediction order                                  10
sustaining number of noise spectrum reference         30
designated minimum power                              20.0
AR enhancement coefficient 0                          0.5
MA enhancement coefficient 0                          0.8
high-frequency enhancement coefficient 0              0.4
AR enhancement coefficient 1-0                        0.66
MA enhancement coefficient 1-0                        0.64
AR enhancement coefficient 1-1                        0.7
MA enhancement coefficient 1-1                        0.6
high-frequency enhancement coefficient 1              0.3
power enhancement coefficient                         1.2
noise reference power                                 20000.0
unvoiced segment power reduction coefficient          0.3
compensation power increase coefficient               2.0
number of consecutive noise references                5
noise cancellation coefficient training coefficient   0.8
unvoiced segment detection coefficient                0.05
designated noise cancellation coefficient             1.5


Phase data for adjusting the phase should have been
stored in the random phase storage section 287. Those are
used to rotate the phase in the spectrum stabilizing
section 279. Table 11 shows a case where there are eight
kinds of phase data.
Table 11
Phase Data
(-0.51,  0.86), (-0.98, -0.17)
( 0.30,  0.95), (-0.53, -0.84)
(-0.94, -0.34), ( 0.70,  0.71)
(-0.22,  0.97), ( 0.38, -0.92)


Further, a counter (random phase counter) for using
the phase data should have been stored in the random phase
storage section 287 too. This value should have been
initialized to 0 before storage.
Next, the static RAM area is set. Specifically, the
noise cancellation coefficient storage section 273, the
noise spectrum storage section 285, the previous spectrum
storage section 286, the previous waveform storage section
288 and the maximum power storage section 289 are cleared.
The following will discuss the individual storage sections
and a setting example.
The noise cancellation coefficient storage section 273
is an area for storing a noise cancellation coefficient
whose initial value stored is 20.0. The noise spectrum
storage section 285 is an area for storing, for each
frequency, mean noise power, a mean noise spectrum, a


compensation noise spectrum for the first candidate, a
compensation noise spectrum for the second candidate, and a
frame number (sustaining number) indicating how many frames
earlier the spectrum value of each frequency has changed; a
sufficiently large value for the mean noise power,
designated minimum power for the mean noise spectrum, and
sufficiently large values for the compensation noise
spectra and the sustaining number should be stored as
initial values.
The previous spectrum storage section 286 is an area
for storing compensation noise power, power (full range,
intermediate range) of a previous frame (previous frame
power), smoothing power (full range, intermediate range) of
a previous frame (previous smoothing power), and a noise
sequence number; a sufficiently large value for the
compensation noise power, 0.0 for both the previous frame
power and full frame smoothing power and a noise reference
sequence number as the noise sequence number should be
stored.
The previous waveform storage section 288 is an area
for storing data of the output signal of the previous frame
by the length of the last pre-read data for matching of the
output signal, and all 0 should be stored as an initial
value. The spectrum enhancing section 281, which executes
ARMA and high-frequency enhancement filtering, should have
the statuses of the respective filters cleared to 0 for
that purpose. The maximum power storage section 289 is an


area for storing the maximum power of the input signal, and
should have 0 stored as the maximum power.
Then, the noise cancellation algorithm will be
explained block by block with reference to FIG. 27.
First, an analog input signal 271 including a speech
is subjected to A/D conversion in the A/D converter 272,
and is input by one frame length + pre-read data length
(160 + 80 = 240 points in the above setting example). The
noise cancellation coefficient adjusting section 274
computes a noise cancellation coefficient and a
compensation coefficient from an equation 49 based on the
noise cancellation coefficient stored in the noise
cancellation coefficient storage section 273, a designated
noise cancellation coefficient, a learning coefficient for
the noise cancellation coefficient, and a compensation
power increase coefficient. The obtained noise cancellation
coefficient is stored in the noise cancellation coefficient
storage section 273, the input signal obtained by the A/D
converter 272 is sent to the input waveform setting section
275, and the compensation coefficient and noise
cancellation coefficient are sent to the noise estimating
section 284 and the noise canceling/spectrum compensating
section 278.
q = qXC + QX(1 - C)
r = Q/qXD (49)


where q: noise cancellation coefficient
Q: designated noise cancellation coefficient
C: learning coefficient for the noise cancellation
coefficient
r: compensation coefficient
D: compensation power increase coefficient.
The noise cancellation coefficient is a coefficient
indicating a rate of decreasing noise, the designated noise
cancellation coefficient is a fixed coefficient previously
designated, the learning coefficient for the noise
cancellation coefficient is a coefficient indicating a rate
by which the noise cancellation coefficient approaches the
designated noise cancellation coefficient, the compensation
coefficient is a coefficient for adjusting the compensation
power in the spectrum compensation, and the compensation
power increase coefficient is a coefficient for adjusting
the compensation coefficient.
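
A minimal sketch of the update in equation 49 follows. The function name is hypothetical, the defaults are the setting examples of Table 10, and the sketch assumes that r is computed from the already updated q, which is how the two assignments read in sequence.

```python
from typing import Tuple


def adjust_noise_cancellation(
    q: float,        # stored noise cancellation coefficient
    Q: float = 1.5,  # designated noise cancellation coefficient
    C: float = 0.8,  # learning coefficient for the noise cancellation coefficient
    D: float = 2.0,  # compensation power increase coefficient
) -> Tuple[float, float]:
    """Return (updated q, compensation coefficient r) per equation 49."""
    q_new = q * C + Q * (1.0 - C)  # q moves toward Q at a rate set by C
    r = Q / q_new * D              # assumption: the updated q is used here
    return q_new, r
```

Starting from the initial value 20.0, one frame gives q = 16.3 and r of about 0.18. Repeating the update lets q approach 1.5 and r approach 2.0, so cancellation starts strong and settles at the designated value.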
In the input waveform setting section 275, the input
signal from the A/D converter 272 is written into a memory
arrangement whose length is a power of 2, aligned to the
end of the arrangement, so that FFT (Fast Fourier
Transform) can be carried out. 0 should be filled in the
front portion. In the above setting example, 0 is written
in 0 to 15 in the arrangement with a length of 256, and the
input signal is written in 16 to 255. This arrangement is
used as a real number portion in FFT of the eighth order.
An arrangement having the same length as the real number


portion is prepared for an imaginary number portion, and
all 0 should be written there.
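
The buffer layout described above can be sketched as follows. The function name `set_input_waveform` is an illustrative assumption. The sizes are the setting example (240 input points in a 256-point arrangement).

```python
from typing import List, Sequence, Tuple


def set_input_waveform(samples: Sequence[float], fft_size: int = 256) -> Tuple[List[float], List[float]]:
    """Place the input at the end of the real part and zero-fill the front."""
    if len(samples) > fft_size:
        raise ValueError("input is longer than the FFT arrangement")
    pad = fft_size - len(samples)                     # 16 for 240 samples
    real = [0.0] * pad + [float(v) for v in samples]  # indices pad..fft_size-1
    imag = [0.0] * fft_size                           # imaginary part is all 0
    return real, imag
```

For 240 samples this writes zeros to indices 0 to 15 and the signal to indices 16 to 255, matching the setting example.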
In the LPC analyzing section 276, a Hamming window is
put on the real number area set in the input waveform
setting section 275, autocorrelation analysis is performed
on the Hamming-windowed waveform to acquire an
autocorrelation value, and autocorrelation-based LPC
analysis is performed to acquire linear predictive
coefficients. Further, the obtained linear predictive
coefficients are sent to the spectrum enhancing section 281.
The Fourier transform section 277 conducts discrete
Fourier transform by FFT using the memory arrangement of
the real number portion and the imaginary number portion,
obtained by the input waveform setting section 275. The sum
of the absolute values of the real number portion and the
imaginary number portion of the obtained complex spectrum
is computed to acquire the pseudo amplitude spectrum (input
spectrum hereinafter) of the input signal. Further, the
total sum of the input spectrum value of each frequency
(input power hereinafter) is obtained and sent to the noise
estimating section 284. The complex spectrum itself is sent
to the spectrum stabilizing section 279.
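
The pseudo amplitude spectrum and the input power can be sketched as below. For brevity the sketch uses a naive discrete Fourier transform in place of an FFT (the two give the same spectrum), and the function names are illustrative assumptions.

```python
import cmath
from typing import List, Sequence


def dft(real: Sequence[float], imag: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, a stand-in for the FFT."""
    n = len(real)
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += complex(real[t], imag[t]) * cmath.exp(-2j * cmath.pi * k * t / n)
        out.append(acc)
    return out


def pseudo_amplitude(spectrum: Sequence[complex]) -> List[float]:
    """|Re| + |Im| per frequency, which avoids a square root per bin."""
    return [abs(c.real) + abs(c.imag) for c in spectrum]


def input_power(input_spectrum: Sequence[float]) -> float:
    """Total of the input spectrum values over all frequencies."""
    return sum(input_spectrum)
```

For a unit impulse of length 8, every bin of the complex spectrum is 1, so the pseudo amplitude is 1 in each bin and the input power is 8.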
A process in the noise estimating section 284 will now
be discussed.
The noise estimating section 284 compares the input
power obtained by the Fourier transform section 277 with
the maximum power value stored in the maximum power storage


section 289, and, when the stored maximum power is smaller,
stores the input power value as the new maximum power value
in the maximum power storage section 289. If at least one of the
following cases is satisfied, noise estimation is performed,
and if none of them are met, noise estimation is not
carried out.
(1) The input power is smaller than the maximum power
multiplied by an unvoiced segment detection coefficient.
(2) The noise cancellation coefficient is larger than
the designated noise cancellation coefficient plus 0.2.
(3) The input power is smaller than a value obtained
by multiplying the mean noise power, obtained from the
noise spectrum storage section 285, by 1.6.
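
The three conditions above can be sketched as a single predicate. The function names are illustrative assumptions, and the default values are the setting examples of Table 10.

```python
def update_max_power(input_power: float, max_power: float) -> float:
    """Keep the larger of the stored maximum power and the input power."""
    return max(max_power, input_power)


def should_estimate_noise(
    input_power: float,
    max_power: float,
    q: float,                 # current noise cancellation coefficient
    mean_noise_power: float,
    unvoiced_detection: float = 0.05,
    designated_q: float = 1.5,
) -> bool:
    """True when at least one of conditions (1) to (3) holds."""
    return (
        input_power < max_power * unvoiced_detection  # condition (1)
        or q > designated_q + 0.2                     # condition (2)
        or input_power < mean_noise_power * 1.6       # condition (3)
    )
```

Condition (2) keeps estimation running while the noise cancellation coefficient is still well above its designated value, which is the case during the first frames after initialization.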
The noise estimating algorithm in the noise estimating
section 284 will now be discussed.
First, the sustaining numbers of all the frequencies
for the first and second candidates stored in the noise
spectrum storage section 285 are updated (incremented by 1).
Then, the sustaining number of each frequency for the first
candidate is checked, and when it is larger than a
previously set sustaining number of noise spectrum
reference, the compensation spectrum and sustaining number
for the second candidate are set as those for the first
candidate, and the compensation spectrum of the second
candidate is set as that of the third candidate and the
sustaining number is set to 0. Note that in replacement of
the compensation spectrum of the second candidate, the


memory can be saved by not storing the third candidate and
substituting a value slightly larger than the second
candidate. In this mode, a spectrum which is 1.4 times
greater than the compensation spectrum of the second
candidate is substituted.
After renewing the sustaining number, the compensation
noise spectrum is compared with the input spectrum for each
frequency. First, the input spectrum of each frequency is
compared with the compensation noise spectrum of the first
candidate, and when the input spectrum is smaller, the
compensation noise spectrum and sustaining number for the
first candidate are set as those for the second candidate,
and the input spectrum is set as the compensation spectrum
of the first candidate with the sustaining number set to 0.
In other cases than the mentioned condition, the input
spectrum is compared with the compensation noise spectrum of
the second candidate, and when the input spectrum is
smaller, the input spectrum is set as the compensation
spectrum of the second candidate with the sustaining number
set to 0. Then, the obtained compensation spectra and
sustaining numbers of the first and second candidates are
stored in the noise spectrum storage section 285. At the
same time, the mean noise spectrum is updated according to
the following equation 50.
$$
s_i = s_i \times g + S_i \times (1 - g) \qquad (50)
$$
where s: mean noise spectrum
S: input spectrum


g: 0.9 (when the input power is larger than a half the
mean noise power)
0.5 (when the input power is equal to or smaller
than a half the mean noise power)
i: number of the frequency.
The mean noise spectrum is a pseudo mean noise spectrum,
and the coefficient g in the equation 50 is for adjusting
the speed of learning the mean noise spectrum. That is, the
coefficient has such an effect that when the input power is
smaller than the noise power, it is likely to be a noise-
only segment so that the learning speed will be increased,
and otherwise, it is likely to be in a speech segment so
that the learning speed will be reduced.
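
The per-frequency bookkeeping of the noise estimating process can be sketched as below. This is one reading of the procedure, not a definitive implementation: the tuple layout of the state and the function names are assumptions, and the third candidate is replaced by 1.4 times the second candidate as in the memory-saving variant described above.

```python
from typing import Tuple

# (first candidate, its sustaining number, second candidate, its sustaining
#  number, mean noise spectrum) for one frequency
BinState = Tuple[float, int, float, int, float]


def learning_coefficient(input_power: float, mean_noise_power: float) -> float:
    """Coefficient g of equation 50."""
    return 0.9 if input_power > 0.5 * mean_noise_power else 0.5


def track_noise_bin(state: BinState, s_in: float, g: float,
                    max_sustain: int = 30, growth: float = 1.4) -> BinState:
    c1, n1, c2, n2, mean = state
    n1 += 1  # sustaining numbers are updated first
    n2 += 1
    if n1 > max_sustain:
        # The first candidate is too old: promote the second candidate and
        # substitute a slightly larger value for the missing third candidate.
        c1, n1 = c2, n2
        c2, n2 = c2 * growth, 0
    if s_in < c1:
        # New minimum: the old first candidate becomes the second candidate.
        c2, n2 = c1, n1
        c1, n1 = s_in, 0
    elif s_in < c2:
        c2, n2 = s_in, 0
    mean = mean * g + s_in * (1.0 - g)  # equation 50
    return c1, n1, c2, n2, mean
```

For example, the state `(10.0, 0, 20.0, 0, 5.0)` with an input spectrum value of 4.0 and g = 0.5 becomes `(4.0, 0, 10.0, 1, 4.5)`: the input becomes the new first candidate and the old first candidate moves to second place.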
Then, the total of the values of the individual
frequencies of the mean noise spectrum is obtained to be
the mean noise power. The compensation noise spectrum, mean
noise spectrum and mean noise power are stored in the noise
spectrum storage section 285.
In the above noise estimating process, the capacity of
the RAM constituting the noise spectrum storage section 285
can be saved by making a noise spectrum of one frequency
correspond to the input spectra of a plurality of
frequencies. As one example, consider the RAM capacity
of the noise spectrum storage section 285 when a noise
spectrum of one frequency is estimated from the input
spectra of four frequencies with the 256-point FFT used in
this mode. In consideration of the (pseudo) amplitude

spectrum being horizontally symmetrical with respect to the
frequency axis, to make estimation for all the frequencies,
spectra of 128 frequencies and 128 sustaining numbers are
stored, thus requiring the RAM capacity of a total of 768 W
or 128 (frequencies) X2 (spectrum and sustaining number) X
3 (first and second candidates for compensation and mean).
When a noise spectrum of one frequency is made to
correspond to input spectra of four frequencies, by
contrast, the required RAM capacity is a total of 192 W or
32 (frequencies) × 2 (spectrum and sustaining number) × 3
(first and second candidates for compensation and mean). It
has been confirmed through experiments that in this 1 × 4
case the performance hardly deteriorates even though the
frequency resolution of the noise spectrum decreases.
Because this means does not estimate a noise spectrum from
a spectrum of a single frequency, it also has an effect of
preventing the spectrum from being erroneously estimated as
a noise spectrum when a normal
sound (sine wave, vowel or the like) continues for a long
period of time.
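
The RAM figures above follow from a short calculation, sketched here. The function names are hypothetical, and mapping an input frequency to its noise-spectrum entry by integer division (contiguous groups of four) is an assumption, since the grouping rule is not spelled out.

```python
def noise_ram_words(fft_points: int = 256, group: int = 1,
                    fields: int = 2, kinds: int = 3) -> int:
    """Words needed by the noise spectrum storage section.

    fields: spectrum and sustaining number
    kinds:  first candidate, second candidate and mean
    """
    frequencies = fft_points // 2 // group  # symmetry halves the spectrum
    return frequencies * fields * kinds


def noise_index(frequency: int, group: int = 4) -> int:
    """Noise-spectrum entry that serves a given input frequency (assumed)."""
    return frequency // group
```

`noise_ram_words(256, 1)` gives 768 and `noise_ram_words(256, 4)` gives 192, the two totals quoted in the text.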
A description will now be given of a process in the
noise canceling/spectrum compensating section 278.
A result of multiplying the mean noise spectrum,
stored in the noise spectrum storage section 285, by the
noise cancellation coefficient obtained by the noise
cancellation coefficient adjusting section 274 is
subtracted from the input spectrum (spectrum difference


hereinafter). When the RAM capacity of the noise spectrum
storage section 285 is saved as described in the
explanation of the noise estimating section 284, a result
of multiplying a mean noise spectrum of a frequency
corresponding to the input spectrum by the noise
cancellation coefficient is subtracted. When the spectrum
difference becomes negative, compensation is carried out by
setting a value obtained by multiplying the first candidate
of the compensation noise spectrum stored in the noise
spectrum storage section 285 by the compensation
coefficient obtained by the noise cancellation coefficient
adjusting section 274. This is performed for every
frequency. Further, flag data is prepared for each
frequency so that the frequency by which the spectrum
difference has been compensated can be grasped. For example,
there is one area for each frequency, and 0 is set in case
of no compensation, and 1 is set when compensation has been
carried out. This flag data is sent together with the
spectrum difference to the spectrum stabilizing section 279.
Furthermore, the total number of compensated frequencies
(compensation number) is acquired by checking the values of
the flag data, and it is sent to the spectrum stabilizing
section 279 too.
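
The subtraction and compensation steps can be sketched per frequency as follows. The function name and the list-based interface are illustrative assumptions. The sketch assumes the full-resolution case, in which every input frequency has its own noise spectrum entry.

```python
from typing import List, Sequence, Tuple


def cancel_and_compensate(
    input_spectrum: Sequence[float],
    mean_noise: Sequence[float],       # mean noise spectrum
    first_candidate: Sequence[float],  # first-candidate compensation noise spectrum
    q: float,                          # noise cancellation coefficient
    r: float,                          # compensation coefficient
) -> Tuple[List[float], List[int], int]:
    """Return (spectrum difference, flag data, compensation number)."""
    diff: List[float] = []
    flags: List[int] = []
    for s, m, c in zip(input_spectrum, mean_noise, first_candidate):
        d = s - q * m
        if d < 0.0:
            # Negative result: substitute the scaled first candidate.
            d = r * c
            flags.append(1)
        else:
            flags.append(0)
        diff.append(d)
    return diff, flags, sum(flags)
```

With an input spectrum `[10.0, 1.0]`, mean noise `[2.0, 2.0]`, first candidates `[1.0, 1.0]`, q = 1.5 and r = 2.0, the first frequency keeps 7.0 with flag 0, and the second would be negative, so it is replaced by 2.0 with flag 1.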
A process in the spectrum stabilizing section 279 will
be discussed below. This process serves to reduce allophone
feeling mainly of a segment which does not contain speeches.
First, the sum of the spectrum differences of the


individual frequencies obtained from the noise
canceling/spectrum compensating section 278 is computed to
obtain two kinds of current frame powers, one for the full
range and the other for the intermediate range. For the
full range, the current frame power is obtained for all the
frequencies (called the full range; 0 to 128 in this mode).
For the intermediate range, the current frame power is
obtained for a perceptually important, intermediate band
(called the intermediate range; 16 to 79 in this mode).
Likewise, the sum of the compensation noise spectra
for the first candidate, stored in the noise spectrum
storage section 285, is acquired as current frame noise
power (full range, intermediate range). When the values of
the compensation numbers obtained from the noise
canceling/spectrum compensating section 278 are checked and
are sufficiently large, and when at least one of the
following three conditions is met, the current frame is
determined as a noise-only segment and a spectrum
stabilizing process is performed.
(1) The input power is smaller than the maximum power
multiplied by an unvoiced segment detection coefficient.
(2) The current frame power (intermediate range) is
smaller than the current frame noise power (intermediate
range) multiplied by 5.0.
(3) The input power is smaller than noise reference
power.
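
The noise-only decision can be sketched as below. The function names are illustrative, and `min_compensation_count` is a hypothetical parameter: the text only requires the compensation number to be "sufficiently large" and gives no value. The other defaults are the setting examples of Table 10.

```python
from typing import Sequence


def band_power(values: Sequence[float], first: int, last: int) -> float:
    """Sum of the per-frequency values over first..last inclusive."""
    return sum(values[first:last + 1])


def is_noise_only(
    compensation_count: int,
    min_compensation_count: int,  # hypothetical threshold, not given in the text
    input_power: float,
    max_power: float,
    frame_power_mid: float,       # current frame power (intermediate range)
    noise_power_mid: float,       # current frame noise power (intermediate range)
    unvoiced_detection: float = 0.05,
    noise_reference_power: float = 20000.0,
) -> bool:
    if compensation_count < min_compensation_count:
        return False
    return (
        input_power < max_power * unvoiced_detection  # condition (1)
        or frame_power_mid < noise_power_mid * 5.0    # condition (2)
        or input_power < noise_reference_power        # condition (3)
    )
```

For the ranges of this mode, the full-range power would be `band_power(diff, 0, 128)` and the intermediate-range power `band_power(diff, 16, 79)`.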
In a case where the stabilizing process is not


conducted, the consecutive noise number stored in the
previous spectrum storage section 286 is decremented by 1
when it is positive, and the current frame noise power
(full range, intermediate range) is set as the previous
frame power (full range, intermediate range) and they are
stored in the previous spectrum storage section 286 before
proceeding to the phase diffusion process.
The spectrum stabilizing process will now be discussed.
The purpose for this process is to stabilize the spectrum
in an unvoiced segment (speech-less and noise-only segment)
and reduce the power. There are two kinds of processes, and
a process 1 is performed when the consecutive noise number
is smaller than the number of consecutive noise references
while a process 2 is performed otherwise. The two processes
will be described as follows.
(Process 1)
The consecutive noise number stored in the previous
spectrum storage section 286 is incremented by 1, and the
current frame noise power (full range, intermediate range)
is set as the previous frame power (full range,
intermediate range) and they are stored in the previous
spectrum storage section 286 before proceeding to the phase
adjusting process.
(Process 2)
The previous frame power, the previous frame smoothing
power and the unvoiced segment power reduction coefficient,
stored in the previous spectrum storage section 286, are

referred to and are changed according to an equation 51.
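$$
\begin{aligned}
Dd80 &= Dd80 \times 0.8 + A80 \times 0.2 \times P \\
D80 &= D80 \times 0.5 + Dd80 \times 0.5 \\
Dd129 &= Dd129 \times 0.8 + A129 \times 0.2 \times P \\
D129 &= D129 \times 0.5 + Dd129 \times 0.5
\end{aligned}
\qquad (51)
$$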
where Dd80: previous frame smoothing power (intermediate
range)
D80: previous frame power (intermediate range)
Dd129: previous frame smoothing power (full range)
D129: previous frame power (full range)
A80: current frame noise power (intermediate range)
A129: current frame noise power (full range).
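
The update in equation 51 has the same form for both ranges and can be sketched for one range as below. The function name is hypothetical. P is not listed among the symbols above, so the sketch assumes it is the unvoiced segment power reduction coefficient (0.3 in Table 10), which the preceding paragraph names as one of the values referred to.

```python
from typing import Tuple


def smooth_previous_power(dd: float, d: float, a: float, p: float = 0.3) -> Tuple[float, float]:
    """One range of equation 51.

    dd: previous frame smoothing power (Dd80 or Dd129)
    d:  previous frame power (D80 or D129)
    a:  current frame noise power (A80 or A129)
    p:  assumed to be the unvoiced segment power reduction coefficient
    """
    dd = dd * 0.8 + a * 0.2 * p  # slow smoothing toward the reduced noise power
    d = d * 0.5 + dd * 0.5       # the updated dd is used, as the order implies
    return dd, d
```

Calling it once with the intermediate-range values and once with the full-range values reproduces the four lines of the equation.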
Then, those powers are reflected on the spectrum
differences. Therefore, two coefficients, one to be
multiplied in the intermediate range (coefficient 1
hereinafter) and the other to be multiplied in the full
range (coefficient 2 hereinafter), are computed. First, the
coefficient 1 is computed from an equation 52.
$$
r1 = \begin{cases} D80 / A80 & (\text{when } A80 > 0) \\ 1.0 & (\text{when } A80 \le 0) \end{cases} \qquad (52)
$$
where r1: coefficient 1
D80: previous frame power (intermediate range)
A80: current frame noise power (intermediate range).
As the coefficient 2 is influenced by the coefficient


1, acquisition means becomes slightly complicated. The
procedures will be illustrated below.
(1) When the previous frame smoothing power (full
range) is smaller than the previous frame power
(intermediate range) or when the current frame noise power
(full range) is smaller than the current frame noise power
(intermediate range), the flow goes to (2), but goes to (3)
otherwise.
(2) The coefficient 2 is set to 0.0, and the previous
frame power (full range) is set as the previous frame power
(intermediate range), then the flow goes to (6).
(3) When the current frame noise power (full range) is
equal to the current frame noise power (intermediate range),
the flow goes to (4), but goes to (5) otherwise.
(4) The coefficient 2 is set to 1.0, and then the flow
goes to (6).
(5) The coefficient 2 is acquired from the following
equation 53, and then the flow goes to (6).
r2 = (D129 - D80)/(A129 - A80) (53)
where r2: coefficient 2
D129: previous frame power (full range)
D80: previous frame power (intermediate range)
A129: current frame noise power (full range)
A80: current frame noise power (intermediate range).
(6) The computation of the coefficient 2 is terminated.
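
Steps (1) to (6) and equation 52 can be sketched as two small functions. The names are illustrative assumptions, and the branch conditions follow the wording of the steps literally. Because step (2) also overwrites the previous frame power (full range), the second function returns that value alongside the coefficient.

```python
from typing import Tuple


def coefficient_1(d80: float, a80: float) -> float:
    """Equation 52."""
    return d80 / a80 if a80 > 0.0 else 1.0


def coefficient_2(dd129: float, d129: float, d80: float,
                  a129: float, a80: float) -> Tuple[float, float]:
    """Return (coefficient 2, previous frame power for the full range)."""
    if dd129 < d80 or a129 < a80:          # step (1) leads to step (2)
        return 0.0, d80                    # D129 is set to D80
    if a129 == a80:                        # step (3) leads to step (4)
        return 1.0, d129
    return (d129 - d80) / (a129 - a80), d129  # step (5), equation 53
```

The equality test in step (3) guards the division in equation 53 against a zero denominator.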


The coefficients 1 and 2 obtained in the above
algorithm always have their upper limits clipped to 1.0 and
lower limits to the unvoiced segment power reduction
coefficient. A value obtained by multiplying the spectrum
difference of the intermediate frequency (16 to 79 in this
example) by the coefficient 1 is set as a spectrum
difference, and a value obtained by multiplying the
spectrum difference of the frequency excluding the
intermediate range from the full range of that spectrum
difference (0 to 15 and 80 to 128 in this example) by the
coefficient 2 is set as a spectrum difference. Accordingly,
the previous frame power (full range, intermediate range)
is converted by the following equation 54.
$$ D80 = A80 \times r1 $$
$$ D129 = D80 + (A129 - A80) \times r2 \tag{54} $$
where r1: coefficient 1
r2: coefficient 2
D80: previous frame power (intermediate range)
A80: current frame noise power (intermediate range)
D129: previous frame power (full range)
A129: current frame noise power (full range).
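The clipping, the scaling of the spectrum difference, and the power conversion of equation 54 might be combined as in the sketch below. The function names and the argument `lower` (standing in for the unvoiced segment power reduction coefficient, whose value is not given here) are assumptions, as is applying the clipping before equation 54.

```python
def clip(r, lower):
    """Clip a coefficient to the range [lower, 1.0]."""
    return max(lower, min(1.0, r))


def apply_coefficients(diff, r1, r2, a80, a129, lower, mid=range(16, 80)):
    """Scale a 129-bin spectrum difference and update the frame powers.

    Bins 16..79 (intermediate range) are scaled by r1, the remaining
    bins (0..15 and 80..128) by r2. Returns (scaled, d80, d129).
    """
    r1 = clip(r1, lower)
    r2 = clip(r2, lower)
    mid_bins = set(mid)
    scaled = [x * (r1 if i in mid_bins else r2) for i, x in enumerate(diff)]
    # Equation 54: converted previous frame powers.
    d80 = a80 * r1
    d129 = d80 + (a129 - a80) * r2
    return scaled, d80, d129
```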
Various sorts of power data, etc. obtained in this
manner are all stored in the previous spectrum storage
section 286 and the process 2 is then terminated.
The spectrum stabilization by the spectrum stabilizing
section 279 is carried out in the above manner.
Next, the phase adjusting process will be explained.
While the phase is not changed in principle in the
conventional spectrum subtraction, a process of altering
the phase at random is executed when the spectrum of that
frequency is compensated at the time of cancellation. This
process enhances the randomness of the remaining noise,
which makes the noise less likely to give a perceptually
adverse impression.
First, the random phase counter stored in the random
phase storage section 287 is obtained. Then, the flag data
(indicating the presence/absence of compensation) of all
the frequencies are referred to, and the phase of the
complex spectrum obtained by the Fourier transform section
277 is rotated using the following equation 55 when
compensation has been performed.
$$ \begin{aligned} Bs &= S_i \times R_c - T_i \times R_{c+1} \\ Bt &= S_i \times R_{c+1} + T_i \times R_c \\ S_i &= Bs \\ T_i &= Bt \end{aligned} \tag{55} $$
where S_i, T_i: complex spectrum
i: index indicating the frequency
R: random phase data
c: random phase counter
Bs, Bt: register for computation.
In the equation 55, two random phase data are used in
pair. Every time the process is performed once, the random
phase counter is incremented by 2, and is set to 0 when it
reaches the upper limit (16 in this mode). The random phase
counter is stored in the random phase storage section 287
and the acquired complex spectrum is sent to the inverse
Fourier transform section 280. Further, the total of the
spectrum differences (spectrum difference power
hereinafter) is computed and sent to the spectrum enhancing
section 281.
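Equation 55 is a complex multiplication of (S_i, T_i) by the pair (R_c, R_{c+1}), which can be sketched as below. The function name and the list-based data layout are hypothetical, and it is an assumption that the counter advances once per rotated frequency bin. If each pair of random phase data is the cosine and sine of one angle (also an assumption, not stated in the text), the rotation leaves the amplitude unchanged.

```python
def rotate_phases(s, t, flags, rand, counter, limit=16):
    """Rotate the phase of compensated bins in place (equation 55).

    s, t    : real and imaginary parts of the complex spectrum (lists)
    flags   : True where the bin was compensated during cancellation
    rand    : table of random phase data with at least `limit` entries
    counter : random phase counter (even, smaller than `limit`)
    Returns the updated counter, to be stored for the next call.
    """
    for i, compensated in enumerate(flags):
        if not compensated:
            continue
        rc, rc1 = rand[counter], rand[counter + 1]
        bs = s[i] * rc - t[i] * rc1
        bt = s[i] * rc1 + t[i] * rc
        s[i], t[i] = bs, bt
        # Two random phase data are consumed per rotation.
        counter += 2
        if counter >= limit:
            counter = 0
    return counter
```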
The inverse Fourier transform section 280 constructs a
new complex spectrum based on the amplitude of the spectrum
difference and the phase of the complex spectrum, obtained
by the spectrum stabilizing section 279, and carries out
inverse Fourier transform using FFT. (The yielded signal is
called a first order output signal.) The obtained first
order output signal is sent to the spectrum enhancing
section 281.
Next, a process in the spectrum enhancing section 281
will be discussed.
First, the mean noise power stored in the noise
spectrum storage section 285, the spectrum difference power
obtained by the spectrum stabilizing section 279 and the
noise reference power, which is constant, are referred to in
order to select an MA enhancement coefficient and an AR enhancement
coefficient. The selection is implemented by evaluating the
following two conditions.
(Condition 1)
The spectrum difference power is greater than a value
obtained by multiplying the mean noise power, stored in the
noise spectrum storage section 285, by 0.6, and the mean
noise power is greater than the noise reference power.
(Condition 2)
The spectrum difference power is greater than the mean
noise power.
When the condition 1 is met, this segment is a "voiced
segment," the MA enhancement coefficient is set to an MA
enhancement coefficient 1-1, the AR enhancement coefficient
is set to an AR enhancement coefficient 1-1, and a high-
frequency enhancement coefficient is set to a high-
frequency enhancement coefficient 1. When the condition 1
is not satisfied but the condition 2 is met, this segment
is an "unvoiced segment," the MA enhancement coefficient is
set to an MA enhancement coefficient 1-0, the AR
enhancement coefficient is set to an AR enhancement
coefficient 1-0, and the high-frequency enhancement
coefficient is set to 0. When neither the condition 1 nor
the condition 2 is satisfied, this segment is an "unvoiced,
noise-only segment," the MA enhancement coefficient is set
to an MA enhancement coefficient 0, the AR enhancement
coefficient is set to an AR enhancement coefficient 0, and
the high-frequency enhancement coefficient is set to a
high-frequency enhancement coefficient 0.
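The selection logic can be sketched as follows. Because the numerical values of the coefficient sets are not given here, the sketch returns only their names; the function name and the dictionary layout are hypothetical. Treating every case that meets neither of the first two branches as the noise-only segment is an assumption about the intended reading.

```python
def select_enhancement(diff_power, mean_noise_power, noise_ref_power):
    """Classify the segment and name the coefficient set to use."""
    cond1 = (diff_power > 0.6 * mean_noise_power
             and mean_noise_power > noise_ref_power)
    cond2 = diff_power > mean_noise_power
    if cond1:
        return {"segment": "voiced", "ma": "MA 1-1", "ar": "AR 1-1", "hf": "HF 1"}
    if cond2:
        # The high-frequency enhancement coefficient is set to 0 here.
        return {"segment": "unvoiced", "ma": "MA 1-0", "ar": "AR 1-0", "hf": 0.0}
    # Remaining case: handled as the unvoiced, noise-only segment (assumption).
    return {"segment": "noise-only", "ma": "MA 0", "ar": "AR 0", "hf": "HF 0"}
```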
Using the linear predictive coefficients obtained from
the LPC analyzing section 276, the MA enhancement
coefficient and the AR enhancement coefficient, an MA
coefficient and an AR coefficient of an extreme enhancement filter
are computed based on the following equation 56.
$$ \alpha(ma)_i = \alpha_i \times \beta^i $$
$$ \alpha(ar)_i = \alpha_i \times \gamma^i \tag{56} $$
where $\alpha(ma)_i$: MA coefficient
$\alpha(ar)_i$: AR coefficient
$\alpha_i$: linear predictive coefficient
$\beta$: MA enhancement coefficient
$\gamma$: AR enhancement coefficient
i: number.
Then, the first order output signal acquired by the
inverse Fourier transform section 280 is put through the
extreme enhancement filter using the MA coefficient and AR
coefficient. The transfer function of this filter is given
by the following equation 57.
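$$ \frac{1 + \alpha(ma)_1 \times Z^{-1} + \alpha(ma)_2 \times Z^{-2} + \cdots + \alpha(ma)_j \times Z^{-j}}{1 + \alpha(ar)_1 \times Z^{-1} + \alpha(ar)_2 \times Z^{-2} + \cdots + \alpha(ar)_j \times Z^{-j}} \tag{57} $$
where $\alpha(ma)_i$: MA coefficient
$\alpha(ar)_i$: AR coefficient
j: order.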
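As an illustration of equations 56 and 57, the sketch below weights the linear predictive coefficients and then runs a direct-form filter with the resulting MA (numerator) and AR (denominator) coefficients. The function names and the way the filter state is passed around are hypothetical; the order j is assumed to be at least 1.

```python
def weighted_coefficients(lpc, factor):
    """Equation 56: alpha_i * factor**i for i = 1..j (lpc[0] is alpha_1)."""
    return [a * factor ** (i + 1) for i, a in enumerate(lpc)]


def enhancement_filter(x, ma, ar, state=None):
    """Filter x with the transfer function of equation 57.

    y[n] = x[n] + sum_k ma[k] * x[n-k] - sum_k ar[k] * y[n-k]
    Returns (output, state) so the filter status can be saved
    and passed in again for the next frame.
    """
    j = len(ma)
    if state is None:
        state = ([0.0] * j, [0.0] * j)
    x_hist, y_hist = list(state[0]), list(state[1])
    out = []
    for sample in x:
        y = (sample
             + sum(m * v for m, v in zip(ma, x_hist))
             - sum(a * v for a, v in zip(ar, y_hist)))
        out.append(y)
        # Most recent sample first; keep only the last j samples.
        x_hist = ([sample] + x_hist)[:j]
        y_hist = ([y] + y_hist)[:j]
    return out, (x_hist, y_hist)
```

A typical call would compute `ma = weighted_coefficients(lpc, beta)` and `ar = weighted_coefficients(lpc, gamma)` and then filter the first order output signal frame by frame, reusing the returned state.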
Further, to enhance the high frequency component,
high-frequency enhancement filtering is performed by using
the high-frequency enhancement coefficient. The transfer
function of this filter is given by the following equation
58.
$$ 1 - \delta Z^{-1} \tag{58} $$
where $\delta$: high-frequency enhancement coefficient.
A signal obtained through the above process is called
a second order output signal. The filter status is saved in
the spectrum enhancing section 281.
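The high-frequency enhancement filter of equation 58 is a first-order difference, which might be written as below. The function name is hypothetical, and the previous input sample is assumed to be the only filter status that has to be kept between frames.

```python
def high_frequency_enhance(x, delta, prev=0.0):
    """Apply 1 - delta * z^-1 to x.

    prev is the last input sample of the preceding frame.
    Returns (output, last_input) so the status can be saved.
    """
    out = []
    for sample in x:
        out.append(sample - delta * prev)
        prev = sample
    return out, prev
```

With `delta` set to 0, as in the unvoiced segment, the signal passes through unchanged.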
Finally, the waveform matching section 282 makes the
second order output signal, obtained by the spectrum
enhancing section 281, and the signal stored in the
previous waveform storage section 288, overlap one on the
other with a triangular window. Further, data of this
output signal by the length of the last pre-read data is
stored in the previous waveform storage section 288. A
matching scheme at this time is shown by the following
equation 59.
$$ O_j = (j \times D_j + (L - j) \times Z_j) / L \quad (j = 0 \sim L - 1) $$
$$ O_j = D_j \quad (j = L \sim L + M - 1) $$
$$ Z_j = O_{M+j} \quad (j = 0 \sim L - 1) \tag{59} $$
where $O_j$: output signal
$D_j$: second order output signal
$Z_j$: output signal
L: pre-read data length
M: frame length.
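The overlap with a triangular window in equation 59 can be sketched as follows. The function name is hypothetical, and `z` is assumed to hold the L samples saved from the previous call (the content of the previous waveform storage section 288).

```python
def match_waveform(d, z, L, M):
    """Equation 59: cross-fade the stored tail z into the new signal d.

    d : second order output signal, length L + M
    z : tail of the previous output signal, length L
    Returns (output, new_z), where new_z is stored for the next frame.
    """
    # Triangular window: weight j/L on the new data, (L - j)/L on the old.
    out = [(j * d[j] + (L - j) * z[j]) / L for j in range(L)]
    # The rest of the signal is copied unchanged.
    out += d[L:L + M]
    # The last L samples are kept for the next overlap.
    new_z = out[M:M + L]
    return out, new_z
```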
It is to be noted that while data of the pre-read data
length + frame length is output as the output signal, that
of the output signal which can be handled as a signal is
only a segment of the frame length from the beginning of
the data. This is because the trailing data of the pre-read data
length will be rewritten when the next output signal is
output. Because continuity is ensured over the entire
output signal, however, the data can be
used in frequency analysis, such as LPC analysis or filter
analysis.
According to this mode, noise spectrum estimation can
be conducted for a segment outside a voiced segment as well
as in a voiced segment, so that a noise spectrum can be
estimated even when it is not clear at which timing a
speech is present in data.
It is possible to enhance the characteristic of the
input spectrum envelope with the linear predictive
coefficients, and it is possible to prevent degradation of the
sound quality even when the noise level is high.
Further, using the mean spectrum of noise can cancel
the noise spectrum more significantly. Further, separate
estimation of the compensation spectrum can ensure more
accurate compensation.
It is possible to smooth a spectrum in a noise-only
segment where no speech is contained, and the smoothed spectrum in
this segment can prevent the allophone feeling
caused by an extreme spectrum variation which originates
from noise cancellation.
The phase of the compensated frequency component can
be given a random property, so that noise remaining
uncanceled can be converted to noise which gives less
perceptual allophone feeling.
Proper perceptual weighting can be applied in a
voiced segment, and allophone feeling originating from
perceptual weighting can be suppressed in an unvoiced segment
or an unvoiced syllable segment.
Industrial Applicability
As is apparent from the above, an excitation vector
generator, a speech coder and a speech decoder according to
this invention are effective in searching for excitation
vectors and are suitable for improving the speech quality.

Administrative Status


Title Date
Forecasted Issue Date 2003-07-29
(22) Filed 1997-11-06
(41) Open to Public Inspection 1998-05-14
Examination Requested 2001-08-27
(45) Issued 2003-07-29
Expired 2017-11-06

Abandonment History

There is no abandonment history.

Payment History

Fee Type | Anniversary Year | Due Date | Amount Paid | Paid Date
Request for Examination | - | - | $400.00 | 2001-08-27
Registration of a document - section 124 | - | - | $50.00 | 2001-08-27
Application Fee | - | - | $300.00 | 2001-08-27
Maintenance Fee - Application - New Act | 2 | 1999-11-08 | $100.00 | 2001-08-27
Maintenance Fee - Application - New Act | 3 | 2000-11-06 | $100.00 | 2001-08-27
Maintenance Fee - Application - New Act | 4 | 2001-11-06 | $100.00 | 2001-08-27
Advance an application for a patent out of its routine order | - | - | $100.00 | 2001-10-11
Extension of Time | - | - | $200.00 | 2002-03-22
Maintenance Fee - Application - New Act | 5 | 2002-11-06 | $150.00 | 2002-11-04
Final Fee | - | - | $612.00 | 2003-05-06
Maintenance Fee - Patent - New Act | 6 | 2003-11-06 | $150.00 | 2003-11-03
Maintenance Fee - Patent - New Act | 7 | 2004-11-08 | $200.00 | 2004-10-07
Maintenance Fee - Patent - New Act | 8 | 2005-11-07 | $200.00 | 2005-10-06
Maintenance Fee - Patent - New Act | 9 | 2006-11-06 | $200.00 | 2006-10-06
Maintenance Fee - Patent - New Act | 10 | 2007-11-06 | $250.00 | 2007-10-09
Maintenance Fee - Patent - New Act | 11 | 2008-11-06 | $250.00 | 2008-11-05
Maintenance Fee - Patent - New Act | 12 | 2009-11-06 | $250.00 | 2009-10-14
Maintenance Fee - Patent - New Act | 13 | 2010-11-08 | $250.00 | 2010-10-25
Maintenance Fee - Patent - New Act | 14 | 2011-11-07 | $250.00 | 2011-10-13
Maintenance Fee - Patent - New Act | 15 | 2012-11-06 | $450.00 | 2012-10-10
Maintenance Fee - Patent - New Act | 16 | 2013-11-06 | $450.00 | 2013-10-09
Registration of a document - section 124 | - | - | $100.00 | 2014-05-23
Registration of a document - section 124 | - | - | $100.00 | 2014-05-26
Maintenance Fee - Patent - New Act | 17 | 2014-11-06 | $450.00 | 2014-10-17
Maintenance Fee - Patent - New Act | 18 | 2015-11-06 | $450.00 | 2015-10-14
Maintenance Fee - Patent - New Act | 19 | 2016-11-07 | $450.00 | 2016-10-12
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
GODO KAISHA IP BRIDGE 1
Past Owners on Record
EHARA, HIROYUKI
MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
MORII, TOSHIYUKI
PANASONIC CORPORATION
WATANABE, TAISUKE
YASUNAGA, KAZUTOSHI
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

List of published and non-published patent-specific documents on the CPD .

Document Description | Date (yyyy-mm-dd) | Number of pages | Size of Image (KB)
Drawings 2002-12-17 23 505
Claims 2002-12-17 5 172
Representative Drawing 2003-06-17 1 19
Cover Page 2003-07-02 1 49
Claims 2001-08-27 5 156
Cover Page 2001-10-29 1 43
Abstract 2001-08-27 1 14
Drawings 2001-08-27 23 489
Description 2001-08-27 150 5,054
Representative Drawing 2001-10-04 1 13
Prosecution-Amendment 2002-12-17 9 299
Correspondence 2003-05-06 1 35
Fees 2003-11-03 1 36
Correspondence 2002-03-22 1 41
Assignment 2001-08-27 3 128
Correspondence 2001-09-13 1 43
Correspondence 2001-10-01 1 13
Prosecution-Amendment 2002-04-29 1 15
Prosecution-Amendment 2001-10-11 2 66
Prosecution-Amendment 2001-10-24 1 12
Prosecution-Amendment 2001-11-28 2 58
Prosecution-Amendment 2002-05-27 3 130
Prosecution-Amendment 2002-08-27 1 31
Fees 2002-11-04 1 35
Assignment 2014-05-26 5 190
Assignment 2014-05-23 6 220