2144693
SPEECH CODER
BACKGROUND OF THE INVENTION
The present invention relates to a speech coder
and, more particularly, to a CELP type speech coder
for high quality coding of speech signals at bit
rates as low as 4 to 8 kb/s.
Recently, the digitalization of automobile
telephones and cordless telephones using radio
waves has been advancing rapidly. The frequency
band of the radio wave allocated to these kinds of
telephone communications is limited, and the
development of low bit rate speech coding systems is
important for reducing the required frequency band.
As an example of a well-known coding system of this
type with a bit rate of about 4 to 8 kb/s, CELP
(Code-Excited LPC coding) has been proposed, as
described in, for instance, M. Schroeder and B.S.
Atal, "Code-excited Linear Prediction: High Quality
Speech at Low Bit Rates", ICASSP proceedings 85, pp.
937-940, 1985, published in U.S.A. (Literature 1).
In the CELP, which is a first prior art speech
coder disclosed in Literature 1, the transmitting
side executes the coding according to the following
procedure. First, a short-term prediction code
representative of the frequency characteristics of
the speech is extracted from the speech signal for
each frame (for instance, 20 ms) (short-term
prediction). Then, the frame is divided into a
plurality of sub-frames having shorter length (of 5
ms, for instance). After extraction of a pitch
parameter representative of a long-term correlation
(i.e., pitch correlation) from a past excitation
signal for each sub-frame, a speech signal of the
sub-frame is long-term predicted from that pitch
parameter. The long-term prediction is performed by
determining a lag code representing the pitch
correlation according to the following procedure,
based on an adaptive codebook constituted by
excitation signals (adaptive codevectors) obtained
by delaying the past excitation signal by the number
of samples corresponding to each lag code.
Adaptive codevectors corresponding to the respective
lag codes are extracted by trying as many lag codes
as the codebook size. Then, a
synthesized signal is produced based on the
extracted adaptive codevectors, and the error power
between the synthesized signal and the speech signal
is calculated. Next, the optimum lag code
corresponding to the minimum calculated error power,
the adaptive codevector corresponding to the optimum
lag code and the corresponding gain are determined.
Subsequently, an excitation codevector and the
corresponding gain are determined (excitation
codebook search). The excitation codevectors are
quantization codes prepared in advance from noise
signals (excitation codebook), and the codevector
selected is the one minimizing the error power
between the signal synthesized from it and the
residue signal obtained through the above long-term
prediction. The index of the adaptive codevector
and the index of the excitation codevector thus
determined are transmitted together with the gains
of the respective excitation signals and the indexes
representing the spectral parameters.
The lag code of the adaptive codevector and the
quantization code of the excitation codevector are
searched according to the following method. First,
a signal z(n) is calculated by applying perceptual
weighting to the input speech signal x(n) and
subtracting the influence of past signals.
Next, a synthesis filter H, constituted by the
spectral parameters obtained by the above short-term
prediction, quantization and inverse quantization,
is driven by the codevector ej(n) of quantization
code j to synthesize the signal Hej(n). Then, the
quantization code j minimizing the error power E
between the signals z(n) and Hej(n) is determined
according to equation (1).
$$E = \sum_{n=0}^{N_s-1} \bigl( z[n] - g_j\, He_j[n] \bigr)^2 \qquad (1)$$
Here, Ns represents the sub-frame length, H the
matrix realizing the synthesis filter, and gj
the gain of the codevector ej. In practice,
equation (1) is expanded as follows:
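$$E = \sum_{n=0}^{N_s-1} z[n]^2 - \frac{C_j^2}{G_j} \qquad (2)$$
In equation (2), Cj in the numerator represents the
cross-correlation and Gj in the denominator the
auto-correlation, which are given by equations (3)
and (4).
$$C_j = \sum_{n=0}^{N_s-1} z[n]\, He_j[n] \qquad (3)$$
$$G_j = \sum_{n=0}^{N_s-1} \bigl( He_j[n] \bigr)^2 \qquad (4)$$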
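The step from equation (1) to equation (2) can be filled in as follows. This is a short derivation sketch, assuming that the gain g_j is chosen optimally for each quantization code j, with C_j the cross-correlation between z[n] and He_j[n] and G_j the auto-correlation of He_j[n]:

$$\frac{\partial E}{\partial g_j} = -2 \sum_{n=0}^{N_s-1} \bigl( z[n] - g_j\, He_j[n] \bigr)\, He_j[n] = 0 \;\Longrightarrow\; g_j = \frac{C_j}{G_j}$$

$$E = \sum_{n=0}^{N_s-1} z[n]^2 - 2 g_j C_j + g_j^2 G_j = \sum_{n=0}^{N_s-1} z[n]^2 - \frac{C_j^2}{G_j}$$

Since the first term does not depend on j, minimizing E under this assumption amounts to maximizing C_j^2 / G_j.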
The calculation of the auto- and cross-correlations
Gj and Cj requires the signal Hej(n), which is
obtained by driving the synthesis filter, i.e., by
filtering. The filtering is performed as many times
as the codebook size, as noted above. This means
that a large amount of computations (i.e., number of
multiplications and additions) is required for the
frame being processed.
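The search described by equations (1) to (4) can be sketched as follows. This is a minimal illustration in Python, not the coder's actual implementation: the function names `synthesize` and `search_codebook`, the truncated impulse response `h` standing in for the synthesis filter H, and the toy codebook are all assumptions introduced here. The sketch shows that every candidate code costs one filtering pass plus two correlation sums, so the total cost grows in proportion to the codebook size.

```python
def synthesize(h, e):
    """Zero-state filtering of codevector e by impulse response h (stand-in for H)."""
    return [
        sum(h[k] * e[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(e))
    ]


def search_codebook(z, h, codebook):
    """Return (index, score) of the codevector maximizing C^2 / G.

    Maximizing C^2 / G corresponds to minimizing the error power E
    when the gain is chosen optimally for each candidate.
    """
    best_j, best_score = None, -1.0
    for j, e in enumerate(codebook):  # one filtering pass per candidate
        y = synthesize(h, e)
        c = sum(zn * yn for zn, yn in zip(z, y))  # cross-correlation C_j
        g = sum(yn * yn for yn in y)              # auto-correlation G_j
        if g > 0.0 and c * c / g > best_score:
            best_j, best_score = j, c * c / g
    return best_j, best_score


# Toy data (hypothetical): the target is a scaled copy of the third
# filtered codevector, so the search should pick index 2.
h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]]
z = [2.0 * v for v in synthesize(h, codebook[2])]
best_j, best_score = search_codebook(z, h, codebook)
```

With this toy data the search selects the third codevector, since its filtered version is collinear with the target.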
A second prior art speech coder, which reduces the
amount of computations for the long-term prediction,
uses open and closed loop search of lag codes, as
disclosed in Japanese Patent Laid-open No. Hei
2-228581, entitled "Digital Speech Coder and Method
for obtaining parameters used in the same Coder".
In this method, a preliminary selection of the lag
code is performed by means of the open loop, and a
search of the codes in the neighborhood of the lag
code determined by the preliminary selection is
performed by means of the closed loop. Thus, the
long-term prediction retains its prediction
accuracy with a reduced amount of computation.
Fig. 5 is a block diagram showing a CELP type
speech coder which includes the above first and
second prior art speech coders. The illustrated
speech coder comprises a coder 1 for coding the
input speech signal, a decoder 2 for decoding the
coded signal, and a transmission line 3 for
connecting the coder 1 and the decoder 2.
The coder 1 includes a buffer 11 for storing
the speech signal supplied from an input terminal
T1, an LPC analyzer 12 for calculating LPC
coefficients as the speech spectral parameter, a
parameter quantizer 13 for quantizing the LPC
coefficients, a weighting circuit 14 for perceptually
weighting the speech signal, an adaptive codebook 15
for storing the past excitation signal, a long-term
predictor 16 for searching an adaptive codevector as
a lag code representing the pitch correlation, an
excitation codebook 17 as a codebook including
excitation codevectors of sub-frame length
representing the long-term prediction residue, an
excitation codebook search circuit 18 for
determining the optimum excitation codevector from
the excitation codebook, a gain codebook 19 for
storing parameters representing gains of the
adaptive codevector and the excitation codevector, a
gain codebook search circuit 40 for determining
quantization gains of the adaptive and excitation
codevectors, and a multiplexer 41 for outputting the
code series combination.
The excitation codebook 17 may be the noise
codebook disclosed in Literature 1 or a learned
codebook produced by learning with the vector
quantization (VQ) algorithm described in Japanese
Patent Laid-open No. Hei 2-22955 or Japanese Patent
Laid-open No. Hei 2-22956.
The decoder 2 comprises a de-multiplexer 21 for
decoding the supplied transfer codes into
predetermined code series, an adaptive codebook 22
which is the same as the adaptive codebook 15, an
excitation codebook 23 which is the same as the
excitation codebook 17, a gain codebook 24 which is
the same as the gain codebook 19, a synthesis filter
25 for reproducing a speech signal based on the
produced excitation and the decoded filter
coefficients, and a speech output terminal T0.
Fig. 6 is a block diagram showing the structure
of the long-term predictor 16. This prior art
long-term predictor 16 comprises a lag code
generator 161 for varying the lag code by an amount
corresponding to the adaptive codebook size, an
adaptive codevector generator 162 for generating the
codevector ed(n) corresponding to the lag code d set
in the lag code generator 161 based on the past
signal stored in a codebook 166, a synthesis filter
163 for producing the weighted adaptive codevector
H·ed(n) as a synthesized signal from the input
adaptive codevector ed(n), an evaluation function
calculator 164 for calculating an evaluation
function representing the error power between the
speech signal stored in the speech buffer and the
synthesized signal H·ed(n), an optimum lag code
determiner 165 for determining an optimum lag code
CD based on the evaluation functions corresponding
to all varied lag codes d, the codebook 166 as a
buffer for storing such signals as the past
excitation signal, residue signal, weighted signal
or speech signal for the adaptive codebook search,
and a speech buffer 167 for storing the speech
signal in the coding interval, against which the
lag code minimizing the error power is searched.
Now, the processes of the speech coder in the
prior art will be described with reference to Figs.
5 and 6. First in the coder 1, the speech signal is
supplied from the input terminal T1 and stored in
the buffer 11. The LPC analyzer 12 calculates the
LPC coefficients of a predetermined number of
samples of the speech signal stored in the buffer 11
by means of short-term prediction analysis. The
parameter quantizer 13 quantizes the LPC
coefficients obtained in the LPC analyzer 12. The
quantization code CL of the LPC coefficients is thus
supplied to the multiplexer 41, and is inversely
quantized for the subsequent quantizing process.
The weighting circuit 14 performs the
perceptual weighting of the speech signal stored in
the buffer 11 using the quantized and inversely
quantized LPC coefficients, and supplies the
perceptually weighted signal SW to the long-term
predictor 16, the excitation codebook search circuit
18 and the gain codebook search circuit 40 to be
used for the subsequent codebook search.
Subsequently, the codebook search for the signal SW
is performed by using the adaptive, excitation and
gain codebooks 15, 17 and 19. First, the long-term
predictor 16 performs the long-term prediction to
determine the optimum lag code CD representing the
pitch correlation, as will be described later, and
to generate the corresponding adaptive codevector,
and the lag code CD is transferred to the
multiplexer 41. The influence of the adaptive
codevector is then subtracted, and the excitation
codebook search circuit 18 performs the excitation
codebook search to determine the quantization code
CS and generate the excitation codevector. The
quantization code CS is transferred to the
multiplexer 41. After the adaptive and excitation
codevectors are determined, the gain codebook search
circuit 40 calculates the gains of the two
excitations and transfers their code CG to the
multiplexer 41. In the multiplexer 41, the codes
CL, CD, CS and CG are combined and converted into
the transfer code CT, and the code CT is transferred
via the transmission line 3 to the decoder 2.
In the decoder 2, the de-multiplexer 21
decomposes the transfer code CT supplied from the
transmission line 3 into the codes CL, CD, CS and
CG. From the code CL corresponding to the LPC
coefficients, the filter coefficients are decoded to
be transferred to the synthesis filter 25. From the
lag code CD, the adaptive codevector is produced
using the adaptive codebook 22. From the
quantization code CS corresponding to the
excitation, the excitation codevector is produced
using the excitation codebook 23. From the code CG
corresponding to the gain, the gains of the adaptive
and excitation codevectors are calculated. Each of
the individual excitations is multiplied by the gain
term to produce the synthesis filter input signals.
Finally, using the input signals the synthesis
filter 25 synthesizes the speech signal which is to
be output from the terminal T0.
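The decoding steps for one sub-frame can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the function name `decode_subframe`, the all-pole filter convention s[n] = x[n] + sum_k a[k]·s[n-k], and the sample values are hypothetical and are not taken from the specification. It shows the two decoded codevectors being scaled by their gains, summed into the synthesis filter input, and filtered to give the output speech.

```python
def decode_subframe(adaptive_cv, excitation_cv, gain_a, gain_e, lpc, memory):
    """Rebuild one sub-frame of speech from decoded codevectors and gains.

    lpc    -- coefficients a[1..p], assumed convention:
              s[n] = x[n] + sum_k a[k] * s[n - k]
    memory -- at least p previous output samples, most recent last
    Returns (speech, excitation); the excitation would be appended to the
    adaptive codebook for use in the next sub-frame.
    """
    # Each excitation is multiplied by its gain and the two are summed.
    excitation = [gain_a * a + gain_e * e
                  for a, e in zip(adaptive_cv, excitation_cv)]
    history = list(memory)
    speech = []
    for x in excitation:
        s = x + sum(a_k * history[-k] for k, a_k in enumerate(lpc, start=1))
        speech.append(s)
        history.append(s)
    return speech, excitation


# Hypothetical first-order example: a unit pulse through 1 / (1 - 0.5 z^-1).
speech, excitation = decode_subframe(
    adaptive_cv=[1.0, 0.0, 0.0], excitation_cv=[0.0, 0.0, 0.0],
    gain_a=1.0, gain_e=0.0, lpc=[0.5], memory=[0.0])
```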
Now, the operation of the long-term predictor
16 will be described with reference to Fig. 6.
First, the lag code generator 161 varies the lag
code for the signal SW over the adaptive codebook
size, thus setting the lag code d. The lag code d
preferably is a code representing a fractional lag,
but it may be a code representing an integer lag as
well. After the adaptive codevector generator 162
produces the adaptive codevector ed(n) corresponding
to the lag code d from the codebook 166, the
synthesis filter 163 produces the weighted adaptive
codevector H·ed(n) as a zero-state synthesized
signal from the input adaptive codevector ed(n).
The evaluation function calculator 164 calculates
the cross-correlation Cd between the weighted
adaptive codevector H·ed(n) and the zero input
response subtraction signal z(n) stored in the
speech buffer 167, the auto-correlation Gd of the
weighted adaptive codevector H·ed(n), and the
evaluation function Cd²/Gd corresponding to the
error power. The processes of the adaptive
codevector generator 162, synthesis filter 163 and
evaluation function calculator 164 are performed for
all the lag codes varied by the lag code generator
161. Then, the optimum lag code determiner 165
determines the lag code d corresponding to the
maximum value of the evaluation function Cd²/Gd as
the optimum lag code CD.
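The exhaustive lag search of the prior art predictor can be sketched as follows. This is a simplified Python illustration, not the circuit of Fig. 6: the names `adaptive_codevector` and `full_lag_search` are hypothetical, integer lags are assumed, the periodic repetition used when the lag is shorter than the sub-frame is an assumption, and the synthesis filter 163 is omitted for brevity (so the codevector itself is correlated with the target).

```python
def adaptive_codevector(past, d, n_s):
    """Delay the past excitation by d samples to get an n_s-sample codevector.

    When d < n_s the available samples are repeated periodically
    (an assumption made for this sketch).
    """
    start = len(past) - d
    return [past[start + (n % d)] for n in range(n_s)]


def full_lag_search(z, past, lags):
    """Try every lag and return (lag, score) maximizing C^2 / G."""
    best_d, best_score = None, -1.0
    for d in lags:
        y = adaptive_codevector(past, d, len(z))
        c = sum(a * b for a, b in zip(z, y))  # cross-correlation Cd
        g = sum(b * b for b in y)             # auto-correlation Gd
        if g > 0.0 and c * c / g > best_score:
            best_d, best_score = d, c * c / g
    return best_d, best_score


# Hypothetical data: a past excitation with a period of 7 samples,
# and a target that continues the same pattern.
pattern = [3.0, 1.0, -2.0, 0.0, 1.0, -1.0, 2.0]
past = pattern * 6
z = pattern[:5]
best_d, best_score = full_lag_search(z, past, range(5, 21))
```

Every lag in the range costs one codevector generation and two correlation sums, which is the cost the embodiments described later aim to reduce.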
SUMMARY OF THE INVENTION
The above first prior art speech coder has a
drawback that the codebook search in the long-term
prediction requires a large amount of computations,
so that it is difficult to realize speech coding
with a low bit rate and sufficient quality.
The above second prior art speech coder also
requires a relatively large amount of computations,
which makes it difficult to realize a speech coder
with a low bit rate and sufficient quality.
An object of the present invention is therefore
to reduce the computation amount in the long-term
prediction so as to realize a speech coder with a
bit rate as low as about 4 kb/s and sufficient
speech quality.
According to the present invention, there is
provided a speech coder comprising: speech analysis
means for analyzing a speech signal having a
predetermined frame length to estimate a speech
spectral parameter and generate a corresponding
short-term prediction code, adaptive codebook
storing means for storing the past excitation
signal of a maximum integer lag length, long-term
predicting means for delaying the excitation signals
read out from the adaptive codebook storing means
within a predetermined lag range, determining a lag
value which minimizes the weighted square error as
the evaluation measure, and searching an adaptive
codevector as the optimum lag code representing the
pitch correlation of the speech signals, excitation
codebook storing means for storing excitation
codevectors as quantization codes representing
residue signals obtained by subtracting the output
of the long-term prediction from the speech signal,
and excitation codebook search means for determining
the optimum excitation codevector from the
excitation codebook, the long-term predicting means
including lag code decimating and varying means for
varying the lag codes through decimation thereof.
Here, the lag code decimating and varying means
includes uniformly decimating means for uniformly
decimating the lag codes at every predetermined
number thereof. The uniformly decimating means may
extract only odd lag codes.
The lag code decimating and varying means may
include non-uniformly decimating means for
decimating the lag codes according to a
predetermined non-uniform rule that determines the
codes to be decimated. The non-uniformly decimating
means may extract all lag codes with short delays
but only even lag codes with long delays.
The long-term prediction means may further
include sample decimating means for decimating, with
predetermined ratios, the speech signal and the
weighted adaptive codevectors corresponding to the
lag codes, and supplying the decimated speech signal
and weighted adaptive codevectors to evaluation
function calculating means for calculating the
weighted square error. The sample decimating means
may execute down-sampling with a low-pass filter or
simple decimation.
The long-term prediction means may include lag
code candidate determining means for selecting a
predetermined number of lag code candidates, and
final lag code determining means for determining the
optimum lag code among the lag code candidates.
Other objects and features will be clarified
from the following description with reference to
attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram showing the structure
of long-term predictor 16A according to a first
embodiment of the present invention;
Fig. 2 is a block diagram showing the structure
of long-term predictor 16B according to a second
embodiment of the present invention;
Fig. 3 is a block diagram showing the structure
of long-term predictor 16C according to a third
embodiment of the present invention;
Fig. 4 is a block diagram showing the structure
of long-term predictor 16D according to a fourth
embodiment of the present invention;
Fig. 5 is a block diagram showing a CELP type
speech coder including prior art speech coders; and
Fig. 6 is a block diagram showing the structure
of the long-term predictor 16 in the prior art
speech coder.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 is a block diagram showing the structure
of long-term predictor 16A according to a first
embodiment of the present invention. The long-term
predictor 16A comprises the adaptive codevector
generator 162, synthesis filter 163, evaluation
function calculator 164, optimum lag code
determiner 165, codebook 166 and speech buffer 167
of the prior art long-term predictor 16, and a
uniformly decimated lag code generator 168, which is
provided in lieu of the lag code generator 161 and
serves to vary the adaptive codebook lag code
through uniform decimation thereof.
The operation of this embodiment will now be
described with reference to Fig. 1. First, the
uniformly decimated lag code generator 168 varies
the lag codes through uniform decimation thereof
within the codebook size, thus setting the lag
codes d to be processed. For each lag code d, the
adaptive codevector generator 162 generates the
adaptive codevector ed(n), the synthesis filter 163
generates the weighted adaptive codevector H·ed(n),
and the evaluation function calculator 164
calculates the evaluation function Cd²/Gd. In the
uniform decimation noted above, only odd lag codes
are extracted. The processes of the adaptive
codevector generation, the weighted adaptive
codevector generation, the evaluation function
calculation and the optimum lag code determination
as described above are like those in the prior art,
and their detailed description is not given.
After the above processes for all the lag codes
d to be varied, as in the prior art the optimum lag
code determiner 165 determines and outputs the lag
code d corresponding to the maximum value of the
evaluation function Cd²/Gd as the optimum lag code
CD. As described above, in this embodiment the
amount of computation is reduced by decimating the
lag codes in the long-term prediction.
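The uniform decimation of the first embodiment can be sketched as follows. This is a hypothetical Python illustration: the function name `uniformly_decimated_lags`, the treatment of each lag code as an integer lag, and the lag range of 20 to 147 samples are assumptions introduced here and do not appear in the specification.

```python
def uniformly_decimated_lags(min_lag, max_lag, step=2, phase=1):
    """Keep one lag out of every `step`; step=2, phase=1 keeps only odd lags."""
    return [d for d in range(min_lag, max_lag + 1) if d % step == phase]


# Assumed lag range of 20..147 samples (128 lags in total).
all_lags = list(range(20, 148))
odd_lags = uniformly_decimated_lags(20, 147)
```

Under these assumptions the number of lags to be evaluated drops from 128 to 64, so the number of filtering and correlation passes is halved.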
Fig. 2 is a block diagram showing the structure
of a long-term predictor 16B according to a second
embodiment of the present invention with constituent
elements like those in Fig. 1 being designated by
like reference numerals and symbols. The long-term
predictor 16B is different from the long-term
predictor 16A in the first embodiment in that it
includes a non-uniformly decimated lag code
generator 169, which is provided in lieu of the
uniformly decimated lag code generator 168 and
serves to vary adaptive codebook lag codes through
non-uniformly decimating thereof.
The operation of this embodiment is the same as
in the above first embodiment except that the
non-uniformly decimated lag code generator 169 sets
the lag code d to be processed by varying the lag
codes through non-uniform decimation thereof within
the codebook size. In the non-uniform decimation,
all lag codes with short delays are extracted,
whereas only even lag codes with long delays are
extracted.
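The non-uniform decimation of the second embodiment can be sketched in the same way. In this hypothetical Python illustration the function name `nonuniformly_decimated_lags`, the integer-lag interpretation, the lag range of 20 to 147 and the `boundary` separating short from long delays are all assumptions; the specification does not state where that boundary lies.

```python
def nonuniformly_decimated_lags(min_lag, max_lag, boundary):
    """Keep every lag below `boundary`, and only even lags from it upward."""
    return [d for d in range(min_lag, max_lag + 1)
            if d < boundary or d % 2 == 0]


# Assumed range 20..147 with a hypothetical short/long boundary at 84.
lags = nonuniformly_decimated_lags(20, 147, boundary=84)
```

One plausible motivation for this split is that a one-sample lag error is proportionally larger for a short pitch period than for a long one, so the finer resolution is kept where it matters most.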
Fig. 3 is a block diagram showing the structure
of long-term predictor 16C according to a third
embodiment of the present invention, with
constituent elements like those in Fig. 2 designated
by like reference numerals. The long-term predictor
16C is different from the long-term predictor 16B in
the second embodiment in that it includes, in
addition to the non-uniformly decimated lag code
generator 169, sample decimation circuits 170 and
180 provided on the output side of the synthesis
filter 163 and the speech buffer 167, respectively,
for decimating the input signal to 1/D (D being a
given integer).
In operation, as in the second embodiment, the
non-uniformly decimated lag code generator 169
varies the lag codes through non-uniform decimation
thereof within the codebook size, thus setting the
lag codes d to be processed. For each lag code d,
the adaptive codevector generator 162 generates the
adaptive codevector ed(n), and the synthesis filter
163 generates the weighted adaptive codevector
H·ed(n). Then, the sample decimation circuit 170
executes sample decimation of the weighted adaptive
codevector H·ed(n) to 1/D (D = 2, for instance) to
generate the decimated codevector H·ed(n)'.
Likewise, the sample decimation circuit 180 executes
sample decimation of the coding interval zero input
response subtraction signal z(n) stored in the
speech buffer 167 to 1/D to generate the decimated
zero input response subtraction signal z(n)'. The
evaluation function calculator 164 calculates the
cross- and auto-correlations Cd and Gd from the
decimated codevector H·ed(n)' and the decimated
zero input response subtraction signal z(n)'. As in
the second embodiment, the evaluation function
Cd²/Gd is determined from the cross- and
auto-correlations Cd and Gd. The sample decimation
is executed through down-sampling with a low-pass
filter or through simple decimation. The other
processes are executed in the same manner as in the
second embodiment.
In this embodiment, further computation amount
reduction is realized by using the signal obtained
through the sample decimating when executing the
correlation calculation.
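The effect of sample decimation on the correlation calculation can be sketched as follows. This hypothetical Python illustration uses simple decimation without a low-pass filter, which is one of the two options the text mentions; the function name `evaluation` and the sample values are assumptions introduced here.

```python
def evaluation(z, y, d_factor=1):
    """C^2 / G computed on every d_factor-th sample (simple decimation)."""
    z_dec, y_dec = z[::d_factor], y[::d_factor]
    c = sum(a * b for a, b in zip(z_dec, y_dec))  # cross-correlation Cd
    g = sum(b * b for b in y_dec)                 # auto-correlation Gd
    return c * c / g if g > 0.0 else 0.0


z = [1.0, 2.0, 3.0, 4.0]   # target signal z(n), hypothetical values
y = [1.0, 1.0, 1.0, 1.0]   # weighted adaptive codevector, hypothetical values
full = evaluation(z, y)       # 4 multiply-adds per correlation
half = evaluation(z, y, 2)    # 2 multiply-adds per correlation (D = 2)
```

The decimated value differs in scale from the full-rate value, but because every lag is evaluated with the same decimation, the values can still be compared across lags. Simple decimation may introduce aliasing, which the low-pass filtered down-sampling option avoids at some extra cost.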
Fig. 4 is a block diagram showing the structure
of long-term predictor 16D according to a fourth
embodiment of the present invention, with
constituent elements like those in Fig. 3 designated
by like reference numerals and symbols. The
long-term predictor 16D is different from the
long-term predictor 16C in the third embodiment
in that it includes a lag code candidate determiner
171, which preliminarily selects M lag code
candidates, i.e., candidates of the optimum lag code
CD, based on the evaluation function Cd²/Gd output
from the evaluation function calculator 164, and a
final lag code determiner 172, which is provided in
lieu of the optimum lag code determiner 165 and
determines the optimum lag code CD among the M lag
code candidates.
In operation, the evaluation function Cd²/Gd,
obtained as a result of a process like that in the
third embodiment for all the lag codes varied by the
non-uniformly decimated lag code generator 169, is
supplied to the lag code candidate determiner 171.
The lag code candidate determiner 171 preliminarily
selects M (M = 5, for instance) lag code candidates
D1 to DM (i.e., DM = D5) based on the evaluation
function Cd²/Gd. The final lag code determiner 172
determines the optimum lag code CD among these lag
code candidates D1 to D5 in a codebook search
process as in the prior art.
In this embodiment, the sample decimation is
used only for the preliminary selection of the
optimum lag code, and the preliminarily selected
lag code candidates are then searched as in the
prior art. Thus, it is possible to reduce the
computation amount while improving the accuracy.
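The two-stage selection of the fourth embodiment can be sketched as follows. This is a hypothetical Python illustration with several simplifications labeled as assumptions: the function name `two_stage_lag_search`, integer lags, periodic repetition for lags shorter than the sub-frame, simple decimation with D = 2, and omission of the synthesis filter in both stages.

```python
def two_stage_lag_search(z, past, lags, m=5, d_factor=2):
    """Preselect m lag candidates on decimated signals, then decide at full rate."""

    def codevector(d):
        # Delay the past excitation by d samples (periodic repeat if d < len(z)).
        start = len(past) - d
        return [past[start + (n % d)] for n in range(len(z))]

    def score(target, candidate):
        c = sum(a * b for a, b in zip(target, candidate))
        g = sum(b * b for b in candidate)
        return c * c / g if g > 0.0 else 0.0

    # Stage 1: rank every lag using sample-decimated correlations.
    z_dec = z[::d_factor]
    ranked = sorted(lags,
                    key=lambda d: score(z_dec, codevector(d)[::d_factor]),
                    reverse=True)
    candidates = ranked[:m]
    # Stage 2: full-rate evaluation restricted to the m candidates.
    best = max(candidates, key=lambda d: score(z, codevector(d)))
    return best, candidates


# Hypothetical data: past excitation with a 7-sample period.
pattern = [3.0, 1.0, -2.0, 0.0, 1.0, -1.0, 2.0]
past = pattern * 6
z = pattern[:5]
best, candidates = two_stage_lag_search(z, past, list(range(5, 21)))
```

Only the M preselected lags go through the full-rate evaluation, so the cost of the accurate search no longer depends on the total number of lags.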
While some preferred embodiments of the present
invention have been described, they are by no means
limitative, and various changes and modifications
are possible.
For example, while the square of the
cross-correlation divided by the auto-correlation
has been used as the evaluation function, similar
effects may be obtained by utilizing only the square
of the cross-correlation. Also, while the weighted
adaptive codevector obtained based on the adaptive
codevector has been used, it is possible to use as
well the adaptive codevector itself without any
synthesizing process to obtain similar effects.
Further, instead of the signal stored in the speech
buffer, the input speech signal, the residue signal
or the perceptual weighted signal may be employed as
the zero state subtraction signal. It is also
possible to use the past input speech signal, the
residue signal or the perceptual weighted signal
instead of the past excitation signal as the
codebook. Still further, while a process of direct
filtering with the synthesis filter H has been used
for the calculation of the cross- and
auto-correlations, it is possible to use a process
using an approximation equation as well to obtain
similar effects. For the approximation process,
reference may be made, for instance, to I.M.
Trancoso and B.S. Atal, "Efficient Procedures for
Finding the Optimum Innovation in Stochastic
Coders", ICASSP Proceedings 86, pp. 2375-2378, 1986,
published in U.S.A. In lieu of selecting a single
optimum lag code, a plurality of candidates may be
selected and the final selection may be performed in
the next step, or a simultaneous optimum lag code
search may be performed, to obtain similar effects.
Further,
instead of using the LPC analyzer, it is possible as
well to employ other analysis methods, such as the
Burg method, for extracting the spectrum parameter
to obtain similar effects. It is obvious that other
parameters such as PARCOR or LSP coefficients permit
similar effects to be obtained. Moreover, it is
of course possible to construct the excitation
codebook search circuit as a multiple stage
structure instead of the single stage structure
without departing from the gist of the present
invention.
As has been described in the foregoing, in the
speech coder according to the present invention the
long-term predictor comprises the lag code
decimating and varying means for varying the lag
codes through decimation thereof. The decimation of
the lag codes reduces the number of lag codes to be
searched, thus reducing the computation amount in
the adaptive codebook search.