Patent 2469774 Summary

(12) Patent Application: (11) CA 2469774
(54) English Title: SIGNAL MODIFICATION METHOD FOR EFFICIENT CODING OF SPEECH SIGNALS
(54) French Title: PROCEDE DE MODIFICATION DU SIGNAL ASSURANT LE CODAGE EFFICACE DES SIGNAUX DE PAROLE
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G10L 19/09 (2013.01)
  • G10L 19/00 (2013.01)
  • G10L 19/10 (2013.01)
  • G10L 19/12 (2013.01)
(72) Inventors :
  • TAMMI, MIKKO (Finland)
  • JELINEK, MILAN (Canada)
  • LAFLAMME, CLAUDE (Canada)
  • RUOPPILA, VESA T. (Canada)
(73) Owners :
  • NOKIA CORPORATION (Finland)
(71) Applicants :
  • VOICEAGE CORPORATION (Canada)
(74) Agent: SIM & MCBURNEY
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2002-12-13
(87) Open to Public Inspection: 2003-06-26
Examination requested: 2007-10-18
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/CA2002/001948
(87) International Publication Number: WO2003/052744
(85) National Entry: 2004-06-09

(30) Application Priority Data:
Application No. Country/Territory Date
2,365,203 Canada 2001-12-14

Abstracts

English Abstract




For determining a long-term-prediction delay parameter characterizing a long
term prediction in a technique using signal modification for digitally
encoding a sound signal, the sound signal is divided into a series of
successive frames, a feature of the sound signal is located in a previous
frame, a corresponding feature of the sound signal is located in a current
frame, and the long-term-prediction delay parameter is determined for the
current frame while mapping, with the long term prediction, the signal feature
of the previous frame with the corresponding signal feature of the current
frame. In a signal modification method for implementation into a technique for
digitally encoding a sound signal, the sound signal is divided into a series
of successive frames, each frame of the sound signal is partitioned into a
plurality of signal segments, and at least a part of the signal segments of
the frame are warped while constraining the warped signal segments inside the
frame. For searching pitch pulses in a sound signal, a residual signal is
produced by filtering the sound signal through a linear prediction analysis
filter, a weighted sound signal is produced by processing the sound signal
through a weighting filter, the weighted sound signal being indicative of
signal periodicity, a synthesized weighted sound signal is produced by
filtering a synthesized speech signal produced during a last subframe of a
previous frame of the sound signal through the weighting filter, a last pitch
pulse of the sound signal of the previous frame is located from the residual
signal, a pitch pulse prototype of given length is extracted around the
position of the last pitch pulse of the sound signal of the previous frame
using the synthesized weighted sound signal, and the pitch pulses are located
in a current frame using the pitch pulse prototype.


French Abstract

Pour déterminer un paramètre de retard pour une prévision à long terme caractérisant une prévision à long terme dans une technique utilisant la modification du signal pour le codage numérique d'un signal sonore, le signal sonore est divisé en une série de trames successives, une caractéristique du signal sonore est placée dans une trame précédente, une caractéristique correspondante du signal sonore est placée dans une trame du moment et le paramètre de retard pour une prévision à long terme est déterminé pour la trame du moment alors que dans le même temps a lieu le mappage, avec la prévision à long terme, de la caractéristique du signal de la trame précédente avec la caractéristique du signal correspondant de la trame du moment. Dans un procédé de modification du signal destiné à être mis en oeuvre dans une technique de codage numérique d'un signal sonore, le signal sonore est divisé en une série de trames successives, chaque trame du signal sonore étant fractionnée en une pluralité de segments de signal, et au moins une partie des segments de signal de la trame étant déformée alors que les segments de signal déformés sont contraints dans la trame. Pour rechercher des impulsions de crête dans un signal sonore, un signal résiduel est produit au moyen du filtrage du signal sonore par un filtre d'analyse de prédiction linéaire, un signal sonore pondéré est produit par le traitement du signal sonore par un filtre à pondération, le signal sonore pondéré indiquant une périodicité du signal, un signal sonore pondéré synthétisé est produit par le filtrage d'un signal sonore synthétisé produit pendant au moins une dernière sous-trame d'une trame précédente du signal sonore au moyen du filtre à pondération, une dernière impulsion de crête du signal sonore de la trame précédente est localisée à partir du signal résiduel, un prototype d'impulsion de crête d'une longueur données est extrait autour de la position de la dernière impulsion de crête du signal sonore de la trame précédente au moyen du signal sonore pondéré synthétisé et les impulsions de crête sont localisées dans une trame du moment au moyen du prototype d'impulsion de crête.

Claims

Note: Claims are shown in the official language in which they were submitted.

WHAT IS CLAIMED IS:

1. A method for determining a long-term-prediction delay parameter
characterizing a long term prediction in a technique using signal modification
for digitally encoding a sound signal, comprising:
dividing the sound signal into a series of successive frames;
locating a feature of the sound signal in a previous frame;
locating a corresponding feature of the sound signal in a current frame;
and
determining the long-term-prediction delay parameter for the current
frame such that the long term prediction maps the signal feature of the
previous frame to the corresponding signal feature of the current frame.

2. A method for determining a long-term-prediction delay parameter as
defined in claim 1, wherein determining the long-term-prediction delay
parameter comprises:
forming a delay contour from the long-term-prediction delay parameter.

3. A method for determining a long-term-prediction delay parameter as
defined in claim 2, wherein:
the sound signal comprises a speech signal;
the feature of the speech signal in the previous frame comprises a
pitch pulse of the speech signal in the previous frame;
the feature of the speech signal in the current frame comprises a pitch
pulse of the speech signal in the current frame; and
forming a delay contour comprises mapping, with the long term
prediction, the pitch pulse of the current frame to the pitch pulse of the
previous frame.

4. A method for determining a long-term-prediction delay parameter as
defined in claim 3, wherein defining the long-term-prediction delay parameter
comprises:
calculating the long-term-prediction delay parameter as a function of
distances of successive pitch pulses between a last pitch pulse of the
previous frame and a last pitch pulse of the current frame.

5. A method for determining a long-term-prediction delay parameter as
defined in claim 2, further comprising:
fully characterizing the delay contour with a long-term-prediction delay
parameter of the previous frame and the long-term-prediction delay parameter
of the current frame.

6. A method for determining a long-term-prediction delay parameter as
defined in claim 2, wherein forming a delay contour comprises:
nonlinearly interpolating the delay contour between a long-term-
prediction delay parameter of the previous frame and the long-term-prediction
delay parameter of the current frame.

7. A method for determining a long-term-prediction delay parameter as
defined in claim 2, wherein forming a delay contour comprises:
determining a piecewise linear delay contour from a long-term-
prediction delay parameter of the previous frame and the long-term-prediction
delay parameter of the current frame.

8. A device for determining a long-term-prediction delay parameter
characterizing a long term prediction in a technique using signal modification
for digitally encoding a sound signal, comprising:
a divider of the sound signal into a series of successive frames;
a detector of a feature of the sound signal in a previous frame;
a detector of a corresponding feature of the sound signal in a current
frame; and
a calculator of the long-term-prediction delay parameter for the current
frame, the calculation of the long-term-prediction delay parameter being made
such that the long term prediction maps the signal feature of the previous
frame to the corresponding signal feature of the current frame.
9. A device for determining a long-term-prediction delay parameter as
defined in claim 8, wherein the calculator of the long-term-prediction delay
parameter comprises:
a selector of a delay contour from the long-term-prediction delay
parameter.
10. A device for determining a long-term-prediction delay parameter as
defined in claim 9, wherein:
the sound signal comprises a speech signal;
the feature of the speech signal in the previous frame comprises a
pitch pulse of the sound signal in the previous frame;
the feature of the speech signal in the current frame comprises a pitch
pulse of the speech signal in the current frame; and
the delay contour selector is a selector of a delay contour mapping with
the long term prediction the pitch pulse of the current frame to the pitch
pulse
of the previous frame.
11. A device for determining a long-term-prediction delay parameter as
defined in claim 10, wherein the long-term-prediction delay parameter sub-
calculator is:
a calculator of the long-term-prediction delay parameter as a function
of distances of successive pitch pulses between the last pitch pulse of the
previous frame and the last pitch pulse of the current frame.
12. A device for determining a long-term-prediction delay parameter as
defined in claim 9, further incorporating:
a function fully characterizing the delay contour with the long-term-
prediction delay parameter of the previous frame and the long-term-prediction
delay parameter of the current frame.
13. A device for determining a long-term-prediction delay parameter as
defined in claim 9, wherein the delay contour selector is:
a selector of a nonlinearly interpolated delay contour between the long-
term-prediction delay parameter of the previous frame and the long-term-
prediction delay parameter of the current frame.
14. A device for determining a long-term-prediction delay parameter as
defined in claim 9, wherein the delay contour selector is:
a selector of a piecewise linear delay contour determined from the
long-term-prediction delay parameter of the previous frame and the long-term-
prediction delay parameter of the current frame.
15. A signal modification method for implementation into a technique
for digitally encoding a sound signal, comprising:
dividing the sound signal into a series of successive frames;
partitioning each frame of the sound signal into a plurality of signal
segments; and
warping at least a part of the signal segments of the frame, said
warping comprising constraining the warped signal segments inside the
frame.
16. A signal modification method as defined in claim 15, wherein:
the sound signal comprises pitch pulses;
each frame comprises boundaries; and
partitioning each frame comprises:
locating pitch pulses in the sound signal of the frame;
dividing the frame into pitch cycle segments each
containing one of the pitch pulses and each located inside the
boundaries of the frame.
17. A signal modification method as defined in claim 16, wherein:
locating pitch pulses comprises using an open-loop pitch estimate
interpolated over the frame; and
the signal modification method further comprises terminating a signal
modification procedure when a difference between positions of the located
pitch pulses and the interpolated open-loop pitch estimate does not meet a
given condition.
18. A signal modification method as defined in claim 15, wherein
partitioning each frame of the sound signal into a plurality of signal
segments
comprises:
weighting the sound signal to produce a weighted sound signal; and
extracting the signal segments from the weighted sound signal.
19. A signal modification method as defined in claim 15, wherein the
warping comprises:
producing a target signal for a current signal segment; and
finding an optimal shift for the current signal segment in response to
the target signal.
20. A signal modification method as defined in claim 17, wherein:
producing a target signal comprises producing a target signal from a
weighted synthesized speech signal of a previous frame or from modified
weighted speech signal; and
finding an optimal shift for the current signal segment comprises
performing a correlation between the current signal segment and the target
signal.
21. A signal modification method as defined in claim 20, wherein
performing a correlation comprises:
first evaluating the correlation with an integer resolution to find a signal
segment shift that maximizes the correlation;
then upsampling the correlation in a region surrounding the correlation-
maximizing signal segment shift, said upsampling of the correlation
comprising searching an optimal shift of the current signal segment by
maximizing the correlation with a fractional resolution.
22. A signal modification method as defined in claim 15, wherein:
each frame comprises boundaries;
warping at least a part of the signal segments of the frame comprises:
detecting whether a high power region exists in the sound
signal close to the frame boundary adjacent to a signal segment;
and
shifting the signal segment in relation to detection or
absence of detection of a high power region.
23. A signal modification method as defined in claim 15, wherein the
warping comprises:
forming a delay contour defining an interpolated long term prediction
delay parameter over the current frame and providing additional information
about the evolution of the pitch cycles and the periodicity of the current
sound
signal frame; and
shifting the individual pitch cycle segments one by one to adjust them
to the delay contour.
24. A signal modification method as defined in claim 23, wherein
shifting the individual pitch cycle segments comprises:
forming a target signal using the delay contour; and
shifting the pitch cycle segment to maximize the correlation of said
pitch cycle segment with the target signal.
25. A signal modification method as defined in claim 23, further
comprising:
examining the information from the delay contour about the evolution of
the pitch cycles and the periodicity of the current sound signal frame; and
defining at least one condition related to the information given by the
delay contour on the evolution of the pitch cycles and the periodicity of the
current sound signal frame; and
interrupting the signal modification when said at least one condition
related to the information given by the delay contour about the evolution of
the
pitch cycles and the periodicity of the current sound signal frame is not
satisfied.
26. A signal modification method as defined in claim 19, further
comprising:
constraining the shift of the signal segments, said constraining
comprising imposing a given criteria to all the signal segments of the frame;
and
interrupting the signal modification procedure when the given criteria is
not respected and maintaining the original sound signal.
27. A signal modification method as defined in claim 15, further
comprising:
detecting an absence of voice activity in the current frame of the sound
signal; and
selecting a signal-modification-disabled mode of coding the current
frame of the sound signal in response to detection of the absence of voice
activity in the current frame.
28. A signal modification method as defined in claim 15, further
comprising:
detecting a presence of voice activity in the current frame of the sound
signal;
rating the current frame as an unvoiced sound signal frame; and
selecting a signal-modification-disabled mode of coding the current
frame of the sound signal in response to:
detection of a presence of voice activity in the current
frame of the sound signal; and
rating the current frame as an unvoiced sound signal
frame.
29. A signal modification method as defined in claim 15, further
comprising:
detecting a presence of voice activity in the current frame of the sound
signal;
rating the current frame as a voiced sound signal frame;
detecting that signal modification is successful; and
selecting a signal-modification-enabled mode of coding the current
frame of the sound signal in response to:
detection of a presence of voice activity in the current
frame of the sound signal;
rating the current frame as a voiced sound signal frame;
and
detection that the signal modification is successful.
30. A signal modification method as defined in claim 15, further
comprising:
detecting a presence of voice activity in the current frame of the sound
signal;
rating the current frame as a voiced sound signal frame;
detecting that signal modification is not successful; and
selecting a signal-modification-disabled mode of coding the current
frame of the sound signal in response to:
detection of a presence of voice activity in the current
frame of the sound signal;
rating the current frame as a voiced sound signal frame;
and
detection that signal modification is not successful.
31. A signal modification device for implementation into a technique for
digitally encoding a sound signal, comprising:
a first divider of the sound signal into a series of successive frames;
a second divider of each frame of the sound signal into a plurality of
signal segments; and
a signal segment warping member supplied with at least a part of the
signal segments of the frame, said warping member comprising a constrainer
of the warped signal segments inside the frame.
32. A signal modification device as defined in claim 31, wherein:
the sound signal comprises pitch pulses;
each frame comprises boundaries; and
the second divider comprises:
a detector of pitch pulses in the sound signal of the
frame;
a divider of the frame into pitch cycle segments each
containing one of the pitch pulses and each located inside the
boundaries of the frame.
33. A signal modification device as defined in claim 32, wherein:
the detector of pitch pulses uses an open-loop pitch estimate
interpolated over the frame; and
the signal modification device further comprises a signal modification
terminating member active when a difference between positions of the
detected pitch pulses and the interpolated open-loop pitch estimate does not
meet a given condition.
34. A signal modification device as defined in claim 31, wherein the
second divider of each frame of the sound signal into a plurality of signal
segments comprises:
a filter for weighting the sound signal to produce a weighted sound
signal; and
an extractor of the signal segments from the weighted sound signal.
35. A signal modification device as defined in claim 31, wherein the
signal segment warping member comprises:
a calculator of a target signal for a current signal segment; and
a finder of an optimal shift for the current signal segment in response to
the target signal.
36. A signal modification device as defined in claim 35, wherein:
the calculator of a target signal is a calculator of a target signal from a
weighted synthesized speech signal of a previous frame or from modified
weighted speech signal; and
the finder of an optimal shift for the current signal segment comprises a
calculator of a correlation between the current signal segment and the target
signal.
37. A signal modification device as defined in claim 36, wherein the
calculator of a correlation comprises:
an evaluator of the correlation with an integer resolution to find a signal
segment shift that maximizes the correlation;
an upsampler of the correlation in a region surrounding the
correlation-maximizing signal segment shift, said upsampler comprising a
searcher of an optimal shift of the current signal segment, said searcher of
an
optimal shift of the current signal segment comprising an evaluator of the
correlation with a fractional resolution.
38. A signal modification device as defined in claim 34, wherein:
each frame comprises boundaries;
the signal segment warping member comprises:
a detector of whether a high power region exists in the
sound signal close to the frame boundary adjacent to a signal
segment; and
a shifter of the signal segment in relation to detection or
absence of detection of a high power region.
39. A signal modification device as defined in claim 31, wherein the
signal segment warping member comprises:
a calculator of a delay contour defining an interpolated long term
prediction delay parameter over the current frame and providing additional
information about the evolution of the pitch cycles and the periodicity of the
current sound signal frame; and
a shifter of the individual pitch cycle segments one by one to adjust
them to the delay contour.
40. A signal modification device as defined in claim 39, wherein the
shifter of the individual pitch cycle segments comprises:
a calculator of a target signal using the delay contour; and
a shifter of the pitch cycle segment to maximize the correlation of said
pitch cycle segment with the target signal.
41. A signal modification device as defined in claim 40, further
comprising:
an evaluator of the information from the delay contour about the
evolution of the pitch cycles and the periodicity of the current sound signal
frame; and
a definer of at least one condition related to the information given by
the delay contour about the evolution of the pitch cycles and the periodicity
of
the current sound signal frame; and
a terminator of the signal modification when said at least one condition
related to the information given by the delay contour about the evolution of
the
pitch cycles and the periodicity of the current sound signal frame is not
satisfied.
42. A signal modification device as defined in claim 35, further
comprising:
a constrainer of the shift of the pitch cycle segments, said constrainer
comprising an imposer of a given criteria to all segments of the frame; and
a terminator of the signal modification procedure when the given
criteria is not respected.
43. A signal modification device as defined in claim 31, further
comprising:
a detector of an absence of voice activity in the current frame of the
sound signal; and
a selector of a signal-modification-disabled mode of coding the current
frame of the sound signal in response to detection of the absence of voice
activity in the current frame.
44. A signal modification device as defined in claim 31, further
comprising:
a detector of a presence of voice activity in the current frame of the
sound signal;
a classifier for rating the current frame as an unvoiced sound signal
frame; and
a selector of a signal-modification-disabled mode of coding the current
frame of the sound signal in response to
detection of a presence of voice activity in the current
frame of the sound signal; and
rating the current frame as an unvoiced sound signal
frame.
45. A signal modification device as defined in claim 31, further
comprising:
a detector of a presence of voice activity in the current frame of the
sound signal;
a classifier for rating the current frame as a voiced sound signal frame;
a detector that signal modification is successful; and
a selector of a signal-modification-enabled mode of coding the current
frame of the sound signal in response to:
detection of a presence of voice activity in the current
frame of the sound signal;
rating the current frame as a voiced sound signal frame;
and
detection that signal modification is successful.
46. A signal modification device as defined in claim 31, further
comprising:
a detector of a presence of voice activity in the current frame of the
sound signal;
a classifier for rating the current frame as a voiced sound signal frame;
a detector that signal modification is not successful; and
a selector of a signal-modification-disabled mode of coding the current
frame of the sound signal in response to:
detection of a presence of voice activity in the current
frame of the sound signal;
rating the current frame as a voiced sound signal frame;
and
detection that signal modification is not successful.
47. A method for searching pitch pulses in a sound signal, comprising:
dividing the sound signal into a series of successive frames;
dividing each frame into a number of subframes;
producing a residual signal by filtering the sound signal through a linear
prediction analysis filter;
locating a last pitch pulse of the sound signal of the previous frame
from the residual signal;
extracting a pitch pulse prototype of given length around the position of
the last pitch pulse of the previous frame using the residual signal; and
locating pitch pulses in a current frame using the pitch pulse prototype.
48. A method for searching pitch pulses in a sound signal as defined in
claim 47, further comprising:
predicting the position of a first pitch pulse of the current frame to occur
at an instant related to the position of the previously located pitch pulse
and
an interpolated open-loop pitch estimate at an instant corresponding to the
position of the previously located pitch pulse; and
refining the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the residual signal.
49. A method for searching pitch pulses in a sound signal as defined in
claim 48, further comprising:
repeating the prediction of pitch pulse position and the refinement of
predicted position until said prediction and refinement yields a pitch pulse
position located outside the current frame.
50. A device for searching pitch pulses in a sound signal, comprising:
a divider of the sound signal into a series of successive frames;
a divider of each frame into a number of subframes;
a linear prediction analysis filter for filtering the sound signal and
thereby producing a residual signal;
a detector of a last pitch pulse of the sound signal of the previous
frame in response to the residual signal;
an extractor of a pitch pulse prototype of given length around the
position of the last pitch pulse of the previous frame in response to the
residual signal; and
a detector of pitch pulses in a current frame using the pitch pulse
prototype.
51. A device for searching pitch pulses in a sound signal as defined in
claim 50, further comprising:
a predictor of the position of each pitch pulse of the current frame to
occur at an instant related to the position of the previously located pitch
pulse
and an interpolated open-loop pitch estimate at said instant corresponding to
the position of the previously located pitch pulse; and
a refiner of the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the residual signal.
52. A device for searching pitch pulses in a sound signal as defined in
claim 51, further comprising:
a repeater of the prediction of pitch pulse position and the refinement
of predicted position until said prediction and refinement yields a pitch
pulse
position located outside the current frame.
53. A method for searching pitch pulses in a sound signal, comprising:
dividing the sound signal into a series of successive frames;
dividing each frame into a number of subframes;
producing a weighted sound signal by processing the sound signal
through a weighting filter, the weighted sound signal being indicative of
signal
periodicity;
locating a last pitch pulse of the sound signal of the previous frame
from the weighted sound signal;
extracting a pitch pulse prototype of given length around the position of
the last pitch pulse of the previous frame using the weighted sound signal;
and
locating pitch pulses in a current frame using the pitch pulse prototype.
54. A method for searching pitch pulses in a sound signal as defined in
claim 53, further comprising:
predicting the position of a first pitch pulse of the current frame to occur
at an instant related to the position of the previously located pitch pulse
and
an interpolated open-loop pitch estimate at an instant corresponding to the
position of the previously located pitch pulse; and
refining the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the weighted sound
signal.


69

55. A method for searching pitch pulses in a sound signal as defined in
claim 54, further comprising:
repeating the prediction of pitch pulse position and the refinement of
predicted position until said prediction and refinement yields a pitch pulse
position located outside the current frame.
56. A device for searching pitch pulses in a sound signal, comprising:
a divider of the sound signal into a series of successive frames;
a divider of each frame into a number of subframes;
a weighting filter for processing the sound signal to produce a weighted
sound signal, the weighted sound signal being indicative of signal
periodicity;
a detector of a last pitch pulse of the sound signal of the previous
frame in response to the weighted sound signal;
an extractor of a pitch pulse prototype of given length around the
position of the last pitch pulse of the previous frame in response to the
weighted sound signal; and
a detector of pitch pulses in a current frame using the pitch pulse
prototype.
57. A device for searching pitch pulses in a sound signal as defined in
claim 56, further comprising:
a predictor of the position of each pitch pulse of the current frame to
occur at an instant related to the position of the previously located pitch
pulse
and an interpolated open-loop pitch estimate at said instant corresponding to
the position of the previously located pitch pulse; and
a refiner of the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the weighted sound
signal.
58. A device for searching pitch pulses in a sound signal as defined in
claim 57, further comprising:
a repeater of the prediction of pitch pulse position and the refinement of
predicted position until said prediction and refinement yields a pitch pulse
position located outside the current frame.
59. A method for searching pitch pulses in a sound signal, comprising:
dividing the sound signal into a series of successive frames;
dividing each frame into a number of subframes;
producing a synthesized weighted sound signal by filtering a
synthesized speech signal produced during a last subframe of a previous
frame of the sound signal through a weighting filter;
locating a last pitch pulse of the sound signal of the previous frame
from the synthesized weighted sound signal;
extracting a pitch pulse prototype of given length around the position of
the last pitch pulse of the previous frame using the synthesized weighted
sound signal; and
locating pitch pulses in a current frame using the pitch pulse prototype.
60. A method for searching pitch pulses in a sound signal as defined in
claim 59, further comprising:
predicting the position of a first pitch pulse of the current frame to occur
at an instant related to the position of the previously located pitch pulse
and
an interpolated open-loop pitch estimate at an instant corresponding to the
position of the previously located pitch pulse; and
refining the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the synthesized
weighted sound signal.
61. A method for searching pitch pulses in a sound signal as defined in
claim 60, further comprising:
repeating the prediction of pitch pulse position and the refinement of
predicted position until said prediction and refinement yields a pitch pulse
position located outside the current frame.
62. A device for searching pitch pulses in a sound signal, comprising:
a divider of the sound signal into a series of successive frames;
a divider of each frame into a number of subframes;
a weighting filter for filtering a synthesized speech signal produced
during a last subframe of a previous frame of the sound signal and thereby
producing a synthesized weighted sound signal;
a detector of a last pitch pulse of the sound signal of the previous
frame in response to the synthesized weighted sound signal;
an extractor of a pitch pulse prototype of given length around the
position of the last pitch pulse of the previous frame in response to the
synthesized weighted sound signal; and
a detector of pitch pulses in a current frame using the pitch pulse
prototype.
63. A device for searching pitch pulses in a sound signal as defined in
claim 62, further comprising:
a predictor of the position of each pitch pulse of the current frame to
occur at an instant related to the position of the previously located pitch
pulse
and an interpolated open-loop pitch estimate at said instant corresponding to
the position of the previously located pitch pulse; and
a refiner of the predicted position of said pitch pulse by maximizing a
weighted correlation between the pulse prototype and the synthesized
weighted sound signal.
64. A device for searching pitch pulses in a sound signal as defined in
claim 63, further comprising:
a repeater of the prediction of pitch pulse position and the refinement
of predicted position until said prediction and refinement yields a pitch
pulse
position located outside the current frame.
65. A method for forming an adaptive codebook excitation during
decoding of a sound signal divided into successive frames and previously
encoded by means of a technique using signal modification for digitally
encoding the sound signal, comprising:
receiving, for each frame, a long-term-prediction delay parameter
characterizing a long term prediction in the digital sound signal encoding
technique;
recovering a delay contour using the long-term-prediction delay
parameter received during a current frame and the long-term-prediction delay
parameter received during a previous frame, wherein the delay contour maps,
with long term prediction, a signal feature of the previous frame to a
corresponding signal feature of the current frame;
forming the adaptive codebook excitation in an adaptive codebook in
response to the delay contour.
66. A device for forming an adaptive codebook excitation during
decoding of a sound signal divided into successive frames and previously
encoded by means of a technique using signal modification for digitally
encoding the sound signal, comprising:
a receiver of a long-term-prediction delay parameter of each frame,
wherein the long-term-prediction delay parameter characterizes a long term
prediction in the digital sound signal encoding technique;
a calculator of a delay contour in response to the long-term-prediction
delay parameter received during a current frame and the long-term-prediction
delay parameter received during a previous frame, wherein the delay contour
maps, with long term prediction, a signal feature of the previous frame to a
corresponding signal feature of the current frame; and
an adaptive codebook for forming the adaptive codebook excitation in
response to the delay contour.

Description

Note: Descriptions are shown in the official language in which they were submitted.

SIGNAL MODIFICATION METHOD FOR
EFFICIENT CODING OF SPEECH SIGNALS
FIELD OF THE INVENTION
The present invention relates generally to the encoding and decoding
of sound signals in communication systems. More specifically, the present
invention is concerned with a signal modification technique applicable to, in
particular but not exclusively, code-excited linear prediction (CELP) coding.
BACKGROUND OF THE INVENTION
Demand for efficient digital narrow- and wideband speech coding
techniques with a good trade-off between the subjective quality and bit rate
is
increasing in various application areas such as teleconferencing, multimedia,
and wireless communications. Until recently, the telephone bandwidth
constrained into a range of 200-3400 Hz has mainly been used in speech
coding applications. However, wideband speech applications provide
increased intelligibility and naturalness in communication compared to the
conventional telephone bandwidth. A bandwidth in the range 50-7000 Hz has
been found sufficient for delivering a good quality giving an impression of
face-to-face communication. For general audio signals, this bandwidth gives
an acceptable subjective quality, but is still lower than the quality of FM radio
or CD that operate in ranges of 20-16000 Hz and 20-20000 Hz, respectively.
A speech encoder converts a speech signal into a digital bit stream
which is transmitted over a communication channel or stored in a storage
medium. The speech signal is digitized, that is, sampled and quantized,
usually with 16 bits per sample. The speech encoder has the role of representing
these digital samples with a smaller number of bits while maintaining a good
subjective speech quality. The speech decoder or synthesizer operates on the
transmitted or stored bit stream and converts it back to a sound signal.
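As a rough sense of scale (assuming a 16 kHz wideband sampling rate, which is
typical for the wideband applications discussed above but is not stated here),
uncompressed speech at 16 bits per sample amounts to 16 000 x 16 = 256 kbit/s,
whereas CELP-type coders of the kind described below operate at a small
fraction of that rate.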
Code-Excited Linear Prediction (CELP) coding is one of the best
techniques for achieving a good compromise between the subjective quality
and bit rate. This coding technique is a basis of several speech coding
standards both in wireless and wire line applications. In CELP coding, the
sampled speech signal is processed in successive blocks of N samples
usually called frames, where N is a predetermined number corresponding
typically to 10-30 ms. A linear prediction (LP) filter is computed and
transmitted every frame. The computation of the LP filter typically needs a
look ahead, i.e. a 5-10 ms speech segment from the subsequent frame. The
N-sample frame is divided into smaller blocks called subframes. Usually the
number of subframes is three or four resulting in 4-10 ms subframes. In each
subframe, an excitation signal is usually obtained from two components: a
past excitation and an innovative, fixed-codebook excitation. The component
formed from the past excitation is often referred to as the adaptive codebook
or pitch excitation. The parameters characterizing the excitation signal are
coded and transmitted to the decoder, where the reconstructed excitation
signal is used as the input of the LP filter.
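Purely as an illustration of the excitation construction just described (a
minimal numpy sketch; the function name, the integer pitch delay and the gain
handling are simplifying assumptions, not the actual codec implementation):

    import numpy as np
    from scipy.signal import lfilter

    def celp_subframe(past_exc, fixed_vec, pitch_delay, g_pitch, g_fixed, lp_a):
        """Illustrative CELP subframe synthesis.

        past_exc    -- past excitation samples (1-D array, newest sample last)
        fixed_vec   -- innovative fixed-codebook vector for this subframe
        pitch_delay -- integer long-term-prediction delay in samples
        g_pitch, g_fixed -- decoded pitch and fixed-codebook gains
        lp_a        -- LP analysis filter coefficients [1, a1, ..., aM]
        """
        n = len(fixed_vec)
        # Simplification: assume the delay is at least one subframe long, so the
        # adaptive contribution is fully available from past samples.
        assert pitch_delay >= n
        start = len(past_exc) - pitch_delay
        adaptive = past_exc[start:start + n]        # past excitation mapped to the present
        excitation = g_pitch * adaptive + g_fixed * fixed_vec
        speech = lfilter([1.0], lp_a, excitation)   # synthesis filter 1/A(z)
        return excitation, speech
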
In conventional CELP coding, long term prediction for mapping the
past excitation to the present is usually performed on a subframe basis. Long
term prediction is characterized by a delay parameter and a pitch gain that
are
usually computed, coded and transmitted to the decoder for every subframe.
At low bit rates, these parameters consume a substantial proportion of the
available bit budget. Signal modification techniques [1-7]
[1] W.B. Kleijn, P. Kroon, and D. Nahumi, "The RCELP
speech-coding algorithm," European Transactions on
Telecommunications, Vol. 4, No. 5, pp. 573-582, 1994.
[2] W.B. Kleijn, R.P. Ramachandran, and P. Kroon,
"Interpolation of the pitch-predictor parameters in analysis-by-
synthesis speech coders," IEEE Transactions on Speech and Audio
Processing, Vol. 2, No. 1, pp. 42-54, 1994.
[3] Y. Gao, A. Benyassine, J. Thyssen, H. Su, and E. Shlomot,
"EX-CELP: A speech coding paradigm," IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), Salt Lake City, Utah, U.S.A., pp. 689-692, 7-11 May
2001.
[4] US Patent 5,704,003, "RCELP coder," Lucent Technologies
Inc., (W.B. Kleijn and D. Nahumi), Filing Date: 19 September 1995.
[5] European Patent Application 0 602 826 A2, "Time shifting
for analysis-by-synthesis coding," AT&T Corp., (B. Kleijn), Filing
Date: 1 December 1993.
[6] Patent Application WO 00/11653, "Speech encoder with
continuous warping combined with long term prediction," Conexant
Systems Inc., (Y. Gao), Filing Date: 24 August 1999.
[7] Patent Application WO 00/11654, "Speech encoder
adaptively applying pitch preprocessing with continuous warping,"
Conexant Systems Inc., (H. Su and Y. Gao), Filing Date: 24 Aug.
1999.
improve the performance of long term prediction at low bit rates by adjusting
the signal to be coded. This is done by adapting the evolution of the pitch
cycles in the speech signal to fit the long term prediction delay, making it
possible to transmit only one delay parameter per frame. Signal modification is
based on the premise that it is possible to render the difference between the
modified speech signal and the original speech signal inaudible. The CELP
coders utilizing signal modification are often referred to as generalized
analysis-by-synthesis or relaxed CELP (RCELP) coders.
Signal modification techniques adjust the pitch of the signal to a
predetermined delay contour. Long term prediction then maps the past
excitation signal to the present subframe using this delay contour and scaling
by a gain parameter. The delay contour is obtained straightforwardly by
interpolating between two open-loop pitch estimates, the first obtained in the
previous frame and the second in the current frame. Interpolation gives
delay value for every time instant of the frame. After the delay contour is
available, the pitch in the subframe to be coded currently is adjusted to
follow
this artificial contour by warping, i.e. changing the time scale of the
signal.
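As a sketch of that interpolation step (plain linear interpolation between the
two open-loop estimates; the function and the example values are mine, and the
illustrative embodiment described later argues for a nonlinear contour instead):

    import numpy as np

    def interpolated_delay_contour(d_prev, d_curr, frame_len):
        """Delay value for every time instant of the current frame, interpolated
        linearly between the previous-frame and current-frame open-loop pitch
        estimates."""
        t = np.arange(1, frame_len + 1)
        return d_prev + (d_curr - d_prev) * t / frame_len

    # Example: open-loop estimates of 50 and 54 samples over a 256-sample frame.
    contour = interpolated_delay_contour(50.0, 54.0, 256)
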
In discontinuous warping [1, 4 and 5]
[1] W.B. Kleijn, P. Kroon, and D. Nahumi, "The RCELP speech-
coding algorithm," European Transactions on Telecommunications,
Vol. 4, No. 5, pp. 573-582, 1994.
[4] US Patent 5,704,003, "RCELP coder," Lucent Technologies Inc.,
(W.B. Kleijn and D. Nahumi), Filing Date: 19 September 1995.
[5] European Patent Application 0 602 826 A2, "Time shifting for
analysis-by-synthesis coding," AT&T Corp., (B. Kleijn), Filing Date:
1 December 1993.
a signal segment is shifted in time without altering the segment length.
Discontinuous warping requires a procedure for handling the resulting
overlapping or missing signal portions. Continuous warping [2, 3, 6, 7]
[2] W.B. Kleijn, R.P. Ramachandran, and P. Kroon, "Interpolation of
the pitch-predictor parameters in analysis-by-synthesis speech
coders," IEEE Transactions on Speech and Audio Processing, Vol.
2, No. 1, pp. 42-54, 1994.
[3] Y. Gao, A. Benyassine, J. Thyssen, H. Su, and E. Shlomot, "EX-
CELP: A speech coding paradigm," IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake
City, Utah, U.S.A., pp. 689-692, 7-11 May 2001.
[6] Patent Application WO 00/11653, "Speech encoder with
continuous warping combined with long term prediction," Conexant
Systems Inc., (Y. Gao), Filing Date: 24 Aug. 1999.
[7] Patent Application WO 00/11654, "Speech encoder adaptively
applying pitch preprocessing with continuous warping," Conexant
Systems Inc., (H. Su and Y. Gao), Filing Date: 24 Aug. 1999.
either contracts or expands a signal segment. This is done using a time
continuous approximation for the signal segment and re-sampling it to a
desired length with unequal sampling intervals determined based on the delay
contour. For reducing artifacts in these operations, the tolerated change in
the
time scale is kept small. Moreover, warping is typically done using the LP
residual signal or the weighted speech signal to reduce the resulting
distortions. The use of these signals instead of the speech signal also
facilitates detection of pitch pulses and low-power regions in between them,
and thus the determination of the signal segments for warping. The actual
modified speech signal is generated by inverse filtering.
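A rough sketch of the contract-or-expand operation (generic resampling of a
segment to a target length through a piecewise-linear, time-continuous
approximation; an actual coder derives the unequal sampling instants from the
delay contour and keeps the length change small):

    import numpy as np

    def warp_segment(segment, new_len):
        """Contract or expand a signal segment to new_len samples by resampling
        a linear time-continuous approximation of it."""
        old_t = np.arange(len(segment), dtype=float)
        new_t = np.linspace(0.0, len(segment) - 1.0, new_len)
        return np.interp(new_t, old_t, segment)

    # For example, expanding a 40-sample pitch cycle segment by two samples:
    expanded = warp_segment(np.random.randn(40), 42)
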
After the signal modification is done for the current subframe, the
coding can proceed in any conventional manner except that the adaptive
codebook excitation is generated using the predetermined delay contour.
Essentially the same signal modification techniques can be used both in
narrow- and wideband CELP coding.
Signal modification techniques can also be applied in other types of
speech coding methods such as waveform interpolation coding and sinusoidal
coding, for instance in accordance with [8].
[8] US Patent 6,223,151, "Method and apparatus for pre-processing
speech signals prior to coding by transform-based speech coders,"
Telefonaktiebolaget LM Ericsson, (W.B. Kleijn and T. Eriksson),
Filing Date: 10 Feb. 1999.
SUMMARY OF THE INVENTION
The present invention relates to a method for determining a long-term-
prediction delay parameter characterizing a long term prediction in a
technique using signal modification for digitally encoding a sound signal,
comprising dividing the sound signal into a series of successive frames,
locating a feature of the sound signal in a previous frame, locating a
corresponding feature of the sound signal in a current frame, and determining
the long-term-prediction delay parameter for the current frame such that the
long term prediction maps the signal feature of the previous frame to the
corresponding signal feature of the current frame.
The subject invention is concerned with a device for determining a
long-term-prediction delay parameter characterizing a long term prediction in
a technique using signal modification for digitally encoding a sound signal,
comprising a divider of the sound signal into a series of successive frames, a
detector of a feature of the sound signal in a previous frame, a detector of a
corresponding feature of the sound signal in a current frame, and a calculator
of the long-term-prediction delay parameter for the current frame, the
calculation of the long-term-prediction delay parameter being made such that
the long term prediction maps the signal feature of the previous frame to the
corresponding signal feature of the current frame.
According to the invention, there is provided a signal modification
method for implementation into a technique for digitally encoding a sound
signal, comprising dividing the sound signal into a series of successive
frames, partitioning each frame of the sound signal into a plurality of signal
segments, and warping at least a part of the signal segments of the frame,
this warping comprising constraining the warped signal segments inside the
frame.
In accordance with the present invention, there is provided a signal
modification device for implementation into a technique for digitally encoding
a
sound signal, comprising a first divider of the sound signal into a series of
successive frames, a second divider of each frame of the sound signal into a
plurality of signal segments, and a signal segment warping member supplied
with at least a part of the signal segments of the frame, this warping member
comprising a constrainer of the warped signal segments inside the frame.
The present invention also relates to a method for searching pitch
pulses in a sound signal, comprising dividing the sound signal into a series
of
successive frames, dividing each frame into a number of subframes,
producing a residual signal by filtering the sound signal through a linear
prediction analysis filter, locating a last pitch pulse of the sound signal of
the
previous frame from the residual signal, extracting a pitch pulse prototype of
given length around the position of the last pitch pulse of the previous frame
using the residual signal, and locating pitch pulses in a current frame using
the pitch pulse prototype.
The present invention is also concerned with a device for searching
pitch pulses in a sound signal, comprising a divider of the sound signal into
a
series of successive frames, a divider of each frame into a number of
subframes, a linear prediction analysis filter for filtering the sound signal
and
thereby producing a residual signal, a detector of a last pitch pulse of the
sound signal of the previous frame in response to the residual signal, an
extractor of a pitch pulse prototype of given length around the position of
the
last pitch pulse of the previous frame in response to the residual signal, and
a
detector of pitch pulses in a current frame using the pitch pulse prototype.
According to the invention, there is also provided a method for
searching pitch pulses in a sound signal, comprising dividing the sound signal
into a series of successive frames, dividing each frame into a number of
subframes, producing a weighted sound signal by processing the sound
signal through a weighting filter wherein the weighted sound signal is
indicative of signal periodicity, locating a last pitch pulse of the sound
signal of
the previous frame from the weighted sound signal, extracting a pitch pulse
prototype of given length around the position of the last pitch pulse of the
previous frame using the weighted sound signal, and locating pitch pulses in a
current frame using the pitch pulse prototype.
Also in accordance with the present invention, there is provided a
device for searching pitch pulses in a sound signal, comprising a divider of
the
sound signal into a series of successive frames, a divider of each frame into
a
number of subframes, a weighting filter for processing the sound signal to
produce a weighted sound signal wherein the weighted sound signal is
indicative of signal periodicity, a detector of a last pitch pulse of the
sound
signal of the previous frame in response to the weighted sound signal, an
extractor of a pitch pulse prototype of given length around the position of
the
last pitch pulse of the previous frame in response to the weighted sound
signal, and a detector of pitch pulses in a current frame using the pitch
pulse
prototype.
The present invention further relates to a method for searching pitch
pulses in a sound signal, comprising dividing the sound signal into a series
of
successive frames, dividing each frame into a number of subframes,
producing a synthesized weighted sound signal by filtering a synthesized
speech signal produced during a last subframe of a previous frame of the
sound signal through a weighting filter, locating a last pitch pulse of the
sound
signal of the previous frame from the synthesized weighted sound signal,
extracting a pitch pulse prototype of given length around the position of
the
last pitch pulse of the previous frame using the synthesized weighted sound
signal, and locating pitch pulses in a current frame using the pitch pulse
prototype.
The present invention is further concerned with a device for searching
pitch pulses in a sound signal, comprising a divider of the sound signal into
a
series of successive frames, a divider of each frame into a number of
subframes, a weighting filter for filtering a synthesized speech signal
produced during a last subframe of a previous frame of the sound signal and
thereby producing a synthesized weighted sound signal, a detector of a last
pitch pulse of the sound signal of the previous frame in response to the
synthesized weighted sound signal, an extractor of a pitch pulse prototype of
given length around the position of the last pitch pulse of the previous frame
in
response to the synthesized weighted sound signal, and a detector of pitch
pulses in a current frame using the pitch pulse prototype.
According to the invention, there is further provided a method for
forming an adaptive codebook excitation during decoding of a sound signal
divided into successive frames and previously encoded by means of a
technique using signal modification for digitally encoding the sound signal,
comprising:
receiving, for each frame, a long-term-prediction delay parameter
characterizing a long term prediction in the digital sound signal encoding
technique;
recovering a delay contour using the long-term-prediction delay
parameter received during a current frame and the long-term-prediction delay
parameter received during a previous frame, wherein the delay contour, with
long term prediction, maps a signal feature of the previous frame to a
corresponding signal feature of the current frame;
forming the adaptive codebook excitation in an adaptive codebook in
response to the delay contour.
Further in accordance with the present invention, there is provided a
device for forming an adaptive codebook excitation during decoding of a
sound signal divided into successive frames and previously encoded by
means of a technique using signal modification for digitally encoding the
sound signal, comprising:
a receiver of a long-term-prediction delay parameter of each frame,
wherein the long-term-prediction delay parameter characterizes a long term
prediction in the digital sound signal encoding technique;
a calculator of a delay contour in response to the long-term-prediction
delay parameter received during a current frame and the long-term-prediction
delay parameter received during a previous frame, wherein the delay contour,
with long term prediction, maps a signal feature of the previous frame to a
corresponding signal feature of the current frame; and
an adaptive codebook for forming the adaptive codebook excitation in
response to the delay contour.
The foregoing and other objects, advantages and features of the
present invention will become more apparent upon reading of the following
non restrictive description of illustrative embodiments thereof, given by way
of
example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is an illustrative example of original and modified residual
signals for one frame;
Figure 2 is a functional block diagram of an illustrative embodiment of
a signal modification method according to the invention;
Figure 3 is a schematic block diagram of an illustrative example of
speech communication system showing the use of speech encoder and
decoder;
Figure 4 is a schematic block diagram of an illustrative embodiment of
speech encoder that utilizes a signal modification method;
Figure 5 is a functional block diagram of an illustrative embodiment of
pitch pulse search;
Figure 6 is an illustrative example of located pitch pulse positions and
a corresponding pitch cycle segmentation for one frame;
Figure 7 is an illustrative example on determining a delay parameter
when the number of pitch pulses is three (c = 3);
Figure 8 is an illustrative example of delay interpolation (thick line)
over a speech frame compared to linear interpolation (thin line);
Figure 9 is an illustrative example of a delay contour over ten frames
selected in accordance with the delay interpolation (thick line) of Figure 8
and
linear interpolation (thin line) when the correct pitch value is 52 samples;
Figure 10 is a functional block diagram of the signal modification
method that adjusts the speech frame to the selected delay contour in
accordance with an illustrative embodiment of the present invention;
Figure 11 is an illustrative example on updating the target signal w(t)
using a determined optimal shift, and on replacing the signal segment ws(k)
with interpolated values shown as gray dots;
Figure 12 is a functional block diagram of a rate determination logic in
accordance with an illustrative embodiment of the present invention; and
Figure 13 is a schematic block diagram of an illustrative embodiment
of speech decoder that utilises the delay contour formed in accordance with
an illustrative embodiment of the present invention.
DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS
Although the illustrative embodiments of the present invention will be
described in relation to speech signals and the 3GPP AMR Wideband Speech
Codec AMR-WB Standard (ITU-T G.722.2), it should be kept in mind that the
concepts of the present invention may be applied to other types of sound
signals as well as other speech and audio coders.
Figure 1 illustrates an example of modified residual signal 12 within
one frame. As shown in Figure 1, the time shift in the modified residual
signal
12 is constrained such that this modified residual signal is time synchronous
with the original, unmodified residual signal 11 at frame boundaries occurring
at time instants t_{n-1} and t_n. Here n refers to the index of the present frame.
More specifically, the time shift is controlled implicitly with a delay
contour employed for interpolating the delay parameter over the current
frame. The delay parameter and contour are determined considering the time
alignment constraints at the above-mentioned frame boundaries. When linear
interpolation is used to force the time alignment, the resulting delay
parameters tend to oscillate over several frames. This often causes annoying
artifacts to the modified signal whose pitch follows the artificial
oscillating
delay contour. Use of a properly chosen nonlinear interpolation technique for
the delay parameter will substantially reduce these oscillations.
A functional block diagram of the illustrative embodiment of the signal
modification method according to the invention is presented in Figure 2.
The method starts, in "pitch cycle search" block 101, by locating
individual pitch pulses and pitch cycles. The search of block 101 utilizes an
open-loop pitch estimate interpolated over the frame. Based on the located
pitch pulses, the frame is divided into pitch cycle segments, each containing
one pitch pulse and restricted inside the frame boundaries t_{n-1} and t_n.
The function of the "delay curve selection" block 103 is to determine a
delay parameter for the long term predictor and form a delay contour for
interpolating this delay parameter over the frame. The delay parameter and
contour are determined considering the time synchrony constraints at frame
boundaries t_{n-1} and t_n. The delay parameter determined in block 103 is coded
and transmitted to the decoder when signal modification is enabled for the
current frame.
The actual signal modification procedure is conducted in the "pitch
synchronous signal modification" block 105. Block 105 first forms a target
signal based on the delay contour determined in block 103 for subsequently
matching the individual pitch cycle segments into this target signal. The
pitch
cycle segments are then shifted one by one to maximize their correlation with
this target signal. To keep the complexity at a low level, no continuous
time
warping is applied while searching the optimal shift and shifting the
segments.
The illustrative embodiment of the signal modification method as
disclosed in the present specification is typically enabled only on purely
voiced speech frames. For instance, transition frames such as voiced onsets
are not modified because of a high risk of causing artifacts. In purely voiced
frames, pitch cycles usually change relatively slowly and therefore small
shifts
suffice to adapt the signal to the long term prediction model. Because only
small, cautious signal adjustments are made, the probability of causing
artifacts is minimized.
The signal modification method constitutes an efficient classifier for
purely voiced segments, and hence a rate determination mechanism to be
used in source-controlled coding of speech signals. Blocks 101, 103
and 105 of Figure 2 each provide several indicators of signal periodicity and the
suitability of signal modification in the current frame. These indicators are
analyzed in logic blocks 102, 104 and 106 in order to determine a proper
coding mode and bit rate for the current frame. More specifically, these logic
blocks 102, 104 and 106 monitor the success of the operations conducted in
blocks 101, 103, and 105.
If block 102 detects that the operation performed in block 101 is
successful, the signal modification method is continued in block 103. When
this block 102 detects a failure in the operation performed in block 101, the
signal modification procedure is terminated and the original speech frame is
preserved intact for coding (see block 108 corresponding to normal mode (no
signal modification)).
If block 104 detects that the operation performed in block 103 is
successful, the signal modification method is continued in block 105. When,
on the contrary, this block 104 detects a failure in the operation performed
in
block 103, the signal modification procedure is terminated and the original
speech frame is preserved intact for coding (see block 108 corresponding to
normal mode (no signal modification)).
If block 106 detects that the operation performed in block 105 is
successful, a low bit rate mode with signal modification is used (see block
107). On the contrary, when this block 106 detects a failure in the operation
performed in block 105 the signal modification procedure is terminated, and
the original speech frame is preserved intact for coding (see block 108
corresponding to normal mode (no signal modification)). The operation of the
blocks 101-108 will be described in detail later in the present specification.
Figure 3 is a schematic block diagram of an illustrative example of
speech communication system depicting the use of speech encoder and
decoder. The speech communication system of Figure 3 supports
transmission and reproduction of a speech signal across a communication
channel 205. Although it may comprise for example a wire, an optical link or a
fiber link, the communication channel 205 typically comprises at least in part
a
radio frequency link. The radio frequency link often supports multiple,
simultaneous speech communications requiring shared bandwidth resources
such as may be found with cellular telephony. Although not shown, the
communication channel 205 may be replaced by a storage device that
records and stores the encoded speech signal for later playback.
On the transmitter side, a microphone 201 produces an analog
speech signal 210 that is supplied to an analog-to-digital (A/D) converter
202.
The function of the A/D converter 202 is to convert the analog speech signal
210 into a digital speech signal 211. A speech encoder 203 encodes the
digital speech signal 211 to produce a set of coding parameters 212 that are
coded into binary form and delivered to a channel encoder 204. The channel
encoder 204 adds redundancy to the binary representation of the coding
parameters before transmitting them in a bitstream 213 over the
communication channel 205.
On the receiver side, a channel decoder 206 is supplied with the
above mentioned redundant binary representation of the coding parameters
from the received bitstream 214 to detect and correct channel errors that
occurred in the transmission. A speech decoder 207 converts the channel-
error-corrected bitstream 215 from the channel decoder 206 back to a set of
coding parameters for creating a synthesized digital speech signal 216. The
synthesized speech signal 216 reconstructed by the speech decoder 207 is
converted to an analog speech signal 217 through a digital-to-analog (D/A)
converter 208 and played back through a loudspeaker unit 209.
Figure 4 is a schematic block diagram showing the operations
performed by the illustrative embodiment of speech encoder 203 (Figure 3)
incorporating the signal modification functionality. The present
specification
presents a novel implementation of this signal modification functionality of
block 603 in Figure 4. The other operations performed by the speech encoder
203 are well known to those of ordinary skill in the art and have been
described, for example, in the publication [10]
[10] 3GPP TS 26.190, "AMR Wideband Speech Codec:
Transcoding Functions," 3GPP Technical Specification.
which is incorporated herein by reference. When not stated otherwise, the
implementation of the speech encoding and decoding operations in the
illustrative embodiments and examples of the present invention will comply
with the AMR Wideband Speech Codec (AMR-WB) Standard.
The speech encoder 203 as shown in Figure 4 encodes the digitized
speech signal using one or a plurality of coding modes. When a plurality of
coding modes are used and the signal modification functionality is disabled in
one of these modes, this particular mode will operate in accordance with well
established standards known to those of ordinary skill in the art.
Although not shown in Figure 4, the speech signal is sampled at a
rate of 16 kHz and each speech signal sample is digitized. The digital speech
signal is then divided into successive frames of given length, and each of
these frames is divided into a given number of successive subframes. The
digital speech signal is further subjected to preprocessing as taught by the
AMR-WB standard. This preprocessing includes high-pass filtering, pre-
emphasis filtering using a filter P(z) = 1 − 0.68 z^{-1}, and down-sampling from the
sampling rate of 16 kHz to 12.8 kHz. The subsequent operations of Figure 4
assume that the input speech signal s(t) has been preprocessed and down-
sampled to the sampling rate of 12.8 kHz.
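Purely as an illustration of this preprocessing chain, the following minimal Python sketch applies the pre-emphasis filter P(z) = 1 − 0.68 z^{-1} and resamples one 20-ms frame from 16 kHz to 12.8 kHz (ratio 4/5). The omission of the high-pass stage and the use of scipy.signal.resample_poly are simplifying assumptions; this is not the exact AMR-WB filtering.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(speech_16k: np.ndarray) -> np.ndarray:
    """Hedged sketch of the encoder front end: pre-emphasis, then 16 -> 12.8 kHz."""
    # Pre-emphasis P(z) = 1 - 0.68 z^-1, i.e. y(t) = x(t) - 0.68 x(t-1).
    pre = np.empty_like(speech_16k, dtype=float)
    pre[0] = speech_16k[0]
    pre[1:] = speech_16k[1:] - 0.68 * speech_16k[:-1]
    # Down-sample by the rational factor 12.8/16 = 4/5 (polyphase resampling
    # stands in here for the standard's own decimation filters).
    return resample_poly(pre, up=4, down=5)

# Example: a 20-ms frame at 16 kHz (320 samples) becomes 256 samples at 12.8 kHz.
frame = np.random.randn(320)
print(preprocess(frame).shape)  # (256,)
```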
The speech encoder 203 comprises an LP (Linear Prediction)
analysis and quantization module 601 responsive to the input, preprocessed
digital speech signal s(t) 617 to compute and quantize the parameters a_0, a_1,
a_2, ..., a_{nA} of the LP filter 1/A(z), wherein nA is the order of the filter and
A(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + ... + a_{nA} z^{-nA}. The binary representation 616 of these
quantized LP filter parameters is supplied to the multiplexer 614 and
subsequently multiplexed into the bitstream 615. The non-quantized and
quantized LP filter parameters can be interpolated for obtaining the
corresponding LP filter parameters for every subframe.
The speech encoder 203 further comprises a pitch estimator 602 to
compute open-loop pitch estimates 619 for the current frame in response to
the LP filter parameters 618 from the LP analysis and quantization module
601. These open-loop pitch estimates 619 are interpolated over the frame to
be used in a signal modification module 603.
The operations performed in the LP analysis and quantization module
601 and the pitch estimator 602 can be implemented in compliance with the
above-mentioned AMR-WB Standard.
The signal modification module 603 of Figure 4 performs a signal
modification operation prior to the closed-loop pitch search of the adaptive
codebook excitation signal for adjusting the speech signal to the determined
delay contour d(t). In the illustrative embodiment, the delay contour d(t)
defines a long term prediction delay for every sample of the frame. By
construction, the delay contour is fully characterized over the frame t ∈ (t_{n-1}, t_n]
by a delay parameter 620 d_n = d(t_n) and its previous value d_{n-1} = d(t_{n-1}),
which are equal to the values of the delay contour at the frame boundaries. The
delay parameter 620 is determined as a part of the signal modification
operation, and coded and then supplied to the multiplexer 614 where it is
multiplexed into the bitstream 615.
The delay contour d(t) defining a long term prediction delay parameter
for every sample of the frame is supplied to an adaptive codebook 607. The
adaptive codebook 607 is responsive to the delay contour d(t) to form the
adaptive codebook excitation u_b(t) of the current subframe from the excitation
u(t) using the delay contour d(t) as u_b(t) = u(t − d(t)). Thus the delay
contour maps the past sample of the excitation signal u(t − d(t)) to the present
sample in the adaptive codebook excitation u_b(t).
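A minimal sketch of this mapping is given below. It assumes the past excitation is stored in a plain numpy array whose last index precedes the current subframe, and it uses simple linear interpolation for fractional delays; both the buffer layout and the interpolation are assumptions, not the codec's actual routines.

```python
import numpy as np

def adaptive_codebook_excitation(u_past: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Form u_b(t) = u(t - d(t)) for one subframe (a simplified sketch).

    u_past -- excitation history; index len(u_past) is taken as the first
              sample of the current subframe (an assumed buffer layout).
    d      -- delay contour d(t) for the subframe, one delay per sample.
    """
    n0 = len(u_past)
    u = np.concatenate([np.asarray(u_past, dtype=float), np.zeros(len(d))])
    ub = np.zeros(len(d))
    for k, delay in enumerate(d):
        pos = n0 + k - delay                 # delayed, possibly fractional position
        i = int(np.floor(pos))
        frac = pos - i
        # Linear interpolation stands in for the standard's longer interpolation filter.
        ub[k] = (1.0 - frac) * u[i] + frac * u[i + 1]
        # For delays shorter than the subframe, samples of the current subframe are
        # needed; the long-term part built so far is reused here as a simplification.
        u[n0 + k] = ub[k]
    return ub
```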
The signal modification procedure also produces a modified residual
signal r̃(t) to be used for composing a modified target signal 621 for the
closed-loop search of the fixed-codebook excitation u_c(t). The modified
residual signal r̃(t) is obtained in the signal modification module 603 by
warping the pitch cycle segments of the LP residual signal, and is supplied to
the computation of the modified target signal in module 604. LP synthesis
filtering of the modified residual signal with the filter 1/A(z) then yields, in
module 604, the modified speech signal. The modified target signal 621 of the
fixed-codebook excitation search is formed in module 604 in accordance with
the operation of the AMR-WB Standard, but with the original speech signal
replaced by its modified version.
After the adaptive codebook excitation u_b(t) and the modified target
signal 621 have been obtained for the current subframe, the encoding can
further proceed using conventional means.
The function of the closed-loop fixed-codebook excitation search is to
determine the fixed-codebook excitation signal u_c(t) for the current subframe.
To schematically illustrate the operation of the closed-loop fixed-codebook
search, the fixed-codebook excitation u_c(t) is gain scaled through an amplifier
610. In the same manner, the adaptive-codebook excitation u_b(t) is gain
scaled through an amplifier 609. The gain scaled adaptive and fixed-
codebook excitations u_b(t) and u_c(t) are summed together through an adder
611 to form a total excitation signal u(t). This total excitation signal u(t) is
processed through an LP synthesis filter 1/A(z) 612 to produce a synthesis
speech signal 625 which is subtracted from the modified target signal 621
through an adder 605 to produce an error signal 626. An error weighting and
minimization module 606 is responsive to the error signal 626 to calculate,
according to conventional methods, the gain parameters for the amplifiers 609
and 610 every subframe. The error weighting and minimization module 606
further calculates, in accordance with conventional methods and in response
to the error signal 626, the input 627 to the fixed codebook 608. The
quantized gain parameters 622 and 623 and the parameters 624
characterizing the fixed-codebook excitation signal u_c(t) are supplied to the
multiplexer 614 and multiplexed into the bitstream 615. The above procedure
is done in the same manner whether signal modification is enabled or
disabled.
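The formation of the total excitation and the synthesis speech signal can be sketched as follows; the gain values and filter order in the usage example are illustrative placeholders, scipy.signal.lfilter stands in for the encoder's internal synthesis filtering, and filter memory between subframes is ignored.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_subframe(ub, uc, g_pitch, g_code, a_coeffs):
    """u(t) = g_pitch*u_b(t) + g_code*u_c(t), then LP synthesis filtering by 1/A(z).

    a_coeffs -- LP coefficients [1, a_1, ..., a_nA] of A(z) for this subframe
                (filter state between subframes is not carried in this sketch).
    """
    u = g_pitch * np.asarray(ub, dtype=float) + g_code * np.asarray(uc, dtype=float)  # adder 611
    synth = lfilter([1.0], a_coeffs, u)   # synthesis filter 1/A(z), block 612
    return u, synth

# Toy usage with made-up gains and a low-order A(z) (illustrative values only).
ub, uc = np.random.randn(64), np.random.randn(64)
u_total, synth = synthesize_subframe(ub, uc, g_pitch=0.8, g_code=0.3,
                                     a_coeffs=[1.0, -1.6, 0.64])
```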
It should be noted that, when the signal modification functionality is
disabled, the adaptive excitation codebook 607 operates according to
conventional methods. In this case, a separate delay parameter is searched
for every subframe in the adaptive codebook 607 to refine the open-loop pitch
estimates 619. These delay parameters are coded, supplied to the multiplexer
614 and multiplexed into the bitstream 615. Furthermore, the target signal 621
for the fixed-codebook search is formed in accordance with conventional
methods.
The speech decoder as shown in Figure 13 operates according to
conventional methods except when signal modification is enabled. Operation
with signal modification disabled and enabled differs essentially only in the way
the adaptive codebook excitation signal u_b(t) is formed. In both operational
modes, the decoder decodes the received parameters from their binary
representation. Typically the received parameters include excitation, gain,
delay and LP parameters. The decoded excitation parameters are used in
module 701 to form the fixed-codebook excitation signal u_c(t) for every
subframe. This signal is supplied through an amplifier 702 to an adder 703.
Similarly, the adaptive codebook excitation signal u_b(t) of the current subframe
is supplied to the adder 703 through an amplifier 704. In the adder 703, the
gain-scaled adaptive and fixed-codebook excitation signals u_b(t) and u_c(t) are
summed together to form a total excitation signal u(t) for the current subframe.
This excitation signal u(t) is processed through the LP synthesis filter 1/A(z)
708, which uses LP parameters interpolated in module 707 for the current
subframe, to produce the synthesized speech signal ŝ(t).
When signal modification is enabled, the speech decoder recovers the
delay contour d(t) in module 705 using the received delay parameter d_n and
its previously received value d_{n-1}, as in the encoder. This delay contour d(t)
defines a long term prediction delay parameter for every time instant of the
current frame. The adaptive codebook excitation ub(t) = u(t- d(t)) is formed
from the past excitation for the current subframe, as in the encoder, using the
delay contour d(t).
The remaining description discloses the detailed operation of the
signal modification procedure 603 as well as its use as a part of the mode
determination mechanism.
Search of Pitch Pulses and Pitch Cycle Segments
The signal modification method operates pitch and frame
synchronously, shifting each detected pitch cycle segment individually but
constraining the shift at frame boundaries. This requires means for locating
pitch pulses and corresponding pitch cycle segments for the current frame. In
the illustrative embodiment of the signal modification method, pitch cycle
segments are determined based on detected pitch pulses that are searched
according to Figure 5.
Pitch pulse search can operate on the residual signal r(t), the
weighted speech signal w(t) and/or the weighted synthesized speech signal
ŵ(t). The residual signal r(t) is obtained by filtering the speech signal s(t) with
the LP filter A(z), which has been interpolated for the subframes. In the
illustrative embodiment, the order of the LP filter A(z) is 16. The weighted
speech signal w(t) is obtained by processing the speech signal s(t) through
the weighting filter

    W(z) = A(z/γ1) / (1 − γ2 z^{-1}),    (1)
where the coefficients γ1 = 0.92 and γ2 = 0.68. The weighted speech signal
w(t) is often utilized in open-loop pitch estimation (module 602) since the
weighting filter defined by Equation (1) attenuates the formant structure in the
speech signal s(t), and preserves the periodicity also on sinusoidal signal
segments. That facilitates pitch pulse search because possible signal
periodicity becomes clearly apparent in weighted signals. It should be noted
that the weighted speech signal w(t) is also needed for the look ahead in
order to search the last pitch pulse in the current frame. This can be done by
using the weighting filter of Equation (1) formed in the last subframe of the
current frame over the look ahead portion.
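A hedged sketch of computing the weighted speech signal with this filter is shown below; A(z/γ1) is obtained by scaling the LP coefficients, and scipy.signal.lfilter is used as a stand-in for the codec's internal filtering routines.

```python
import numpy as np
from scipy.signal import lfilter

GAMMA1, GAMMA2 = 0.92, 0.68

def weighted_speech(s: np.ndarray, a_coeffs: np.ndarray) -> np.ndarray:
    """w(t): s(t) filtered through W(z) = A(z/gamma1) / (1 - gamma2 z^-1).

    a_coeffs -- LP coefficients [1, a_1, ..., a_16] of A(z) (a_0 = 1 assumed).
    """
    # A(z/gamma1): multiply the k-th coefficient of A(z) by gamma1**k.
    a_weighted = np.asarray(a_coeffs, dtype=float) * (GAMMA1 ** np.arange(len(a_coeffs)))
    # Numerator A(z/gamma1), denominator (1 - gamma2 z^-1).
    return lfilter(a_weighted, [1.0, -GAMMA2], s)
```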
The pitch pulse search procedure of Figure 5 starts in block 301 by
locating the last pitch pulse of the previous frame from the residual signal
r(t).
A pitch pulse typically stands out clearly as the maximum absolute value of
the low-pass filtered residual signal in a pitch cycle having a length of
approximately p(t_{n-1}). A normalized Hamming window H_5(z) = (0.08 z^{-2} +
0.54 z^{-1} + 1 + 0.54 z + 0.08 z^2)/2.24 having a length of five (5) samples is used
for the low-pass filtering in order to facilitate the locating of the last pitch pulse
of the previous frame. This pitch pulse position is denoted by T_0. The
illustrative embodiment of the signal modification method according to the
invention does not require an accurate position for this pitch pulse, but
rather
a rough location estimate of the high-energy segment in the pitch cycle.
After locating the last pitch pulse at T_0 in the previous frame, a pitch
pulse prototype of length 2l + 1 samples is extracted in block 302 of Figure 5
around this rough position estimate as, for example:

    m(k) = w(T_0 − l + k)  for k = 0, 1, ..., 2l.    (2)
This pitch pulse prototype is subsequently used in locating pitch pulses in
the
current frame.
The synthesized weighted speech signal ŵ(t) (or the weighted speech
signal w(t)) can be used for the pulse prototype instead of the residual signal
r(t). This facilitates pitch pulse search, because the periodic structure of the
signal is better preserved in the weighted speech signal. The synthesized
weighted speech signal ŵ(t) is obtained by filtering the synthesized speech
signal ŝ(t) of the last subframe of the previous frame by the weighting filter
W(z) of Equation (1). If the pitch pulse prototype extends over the end of the
previously synthesized frame, the weighted speech signal w(t) of the current
frame is used for this exceeding portion. The pitch pulse prototype has a high
correlation with the pitch pulses of the weighted speech signal w(t) if the
previous synthesized speech frame already contains a well-developed pitch
cycle. Thus the use of the synthesized speech in extracting the prototype
provides additional information for monitoring the performance of coding and
selecting an appropriate coding mode in the current frame, as will be explained
in more detail in the following description.
Selecting l = 10 samples provides a good compromise between the
complexity and performance in the pitch pulse search. The value of l can also
be determined proportionally to the open-loop pitch estimate.
Given the position T_0 of the last pulse in the previous frame, the first
pitch pulse of the current frame can be predicted to occur approximately at
instant T_0 + p(T_0). Here p(t) denotes the interpolated open-loop pitch estimate
at instant (position) t. This prediction is performed in block 303.
In block 305, the predicted pitch pulse position T_0 + p(T_0) is refined as
    T_1 = T_0 + p(T_0) + arg max_j C(j),    (3)

where the weighted speech signal w(t) in the neighborhood of the predicted
position is correlated with the pulse prototype:

    C(j) = γ(j) Σ_{k=0}^{2l} m(k) w(T_0 + p(T_0) + j − l + k),   j ∈ [−j_max, j_max].    (4)

Thus the refinement is the argument j, limited into [−j_max, j_max], that maximizes
the weighted correlation C(j) between the pulse prototype and one of the
above mentioned residual signal, weighted speech signal or weighted
synthesized speech signal. According to an illustrative example, the limit j_max
is proportional to the open-loop pitch estimate as min{20, ⟨p(0)/4⟩}, where the
operator ⟨·⟩ denotes rounding to the nearest integer. The weighting function

    γ(j) = 1 − |j| / p(T_0 + p(T_0))    (5)

in Equation (4) favors the pulse position predicted using the open-loop pitch
estimate, since γ(j) attains its maximum value 1 at j = 0. The denominator
p(T_0 + p(T_0)) in Equation (5) is the open-loop pitch estimate for the predicted
pitch pulse position.
After the first pitch pulse position T_1 has been found using Equation
(3), the next pitch pulse can be predicted to be at instant T_2 = T_1 + p(T_1) and
refined as described above. This pitch pulse search comprising the prediction
303 and refinement 305 is repeated until either the prediction or refinement
procedure yields a pitch pulse position outside the current frame. These
conditions are checked in logic block 304 for the prediction of the position of
the next pitch pulse (block 303) and in logic block 306 for the refinement of
this position of the pitch pulse (block 305). It should be noted that the
logic
block 304 terminates the search only if a predicted pulse position is so far
in
the subsequent frame that the refinement step cannot bring it back to the
current frame. This procedure yields c pitch pulse positions inside the current
frame, denoted by T_1, T_2, ..., T_c.
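The prediction-and-refinement loop of blocks 303-306 can be sketched as follows. Here m is the pulse prototype of Equation (2), p is a callable giving the interpolated open-loop pitch estimate, and the fractional-resolution handling of the last pulse is omitted; all names and the buffer layout are assumptions rather than the codec's actual interfaces.

```python
import numpy as np

def search_pitch_pulses(w, m, p, T0, frame_end, l=10):
    """Locate pitch pulses T_1..T_c of the current frame (integer resolution only).

    w  -- weighted speech signal, indexed in absolute samples (with look-ahead)
    m  -- pulse prototype of 2l+1 samples extracted around T0 (Equation (2))
    p  -- callable returning the interpolated open-loop pitch estimate p(t)
    T0 -- position of the last pitch pulse of the previous frame
    """
    pulses = []
    T_prev = float(T0)
    jmax = int(min(20, round(p(0) / 4)))              # search limit of Equation (4)
    while True:
        T_pred = T_prev + p(T_prev)                   # prediction, block 303
        if T_pred - jmax > frame_end:                 # block 304: cannot be pulled back
            break
        best_j, best_c = 0, -np.inf                   # refinement, block 305
        for j in range(-jmax, jmax + 1):
            start = int(round(T_pred)) + j - l
            if start < 0 or start + 2 * l + 1 > len(w):
                continue
            gamma = 1.0 - abs(j) / p(T_pred)          # weighting of Equation (5)
            c = gamma * float(np.dot(m, w[start:start + 2 * l + 1]))
            if c > best_c:
                best_c, best_j = c, j
        T_new = int(round(T_pred)) + best_j
        if T_new > frame_end:                         # block 306: refined pulse outside frame
            break
        pulses.append(T_new)
        T_prev = float(T_new)
    return pulses
```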
According to an illustrative example, pitch pulses are located in the
integer resolution, except the last pitch pulse of the frame, denoted by T_c. Since
Since
the exact distance between the last pulses of two successive frames is
needed to determine the delay parameter to be transmitted, the last pulse is
located using a fractional resolution of 1/4 sample in Equation (4) for j. The
fractional resolution is obtained by upsampling w(t) in the neighborhood of
the
last predicted pitch pulse before evaluating the correlation of Equation (4).
According to an illustrative example, Hamming-windowed sinc interpolation of
length 33 is used for upsampling. The fractional resolution of the last pitch
pulse position helps to maintain the good performance of long term prediction
despite the time synchrony constraint set at the frame end. This is obtained
at the cost of the additional bit rate needed for transmitting the delay
parameter with higher accuracy.
After completing pitch cycle segmentation in the current frame, an
optimal shift for each segment is determined. This operation is done using the
weighted speech signal w(t) as will be explained in the following description.
For reducing the distortion caused by warping, the shifts of individual pitch
cycle segments are implemented using the LP residual signal r(t). Since
shifting distorts the signal particularly around segment boundaries, it is
essential to place the boundaries in low power sections of the residual signal
r(t). In an illustrative example, the segment boundaries are placed
approximately in the middle of two consecutive pitch pulses, but constrained
inside the current frame. Segment boundaries are always selected inside the
current frame such that each segment contains exactly one pitch pulse.
Segments with more than one pitch pulse or "empty" segments without any
pitch pulses hamper subsequent correlation-based matching with the target
signal and should be prevented in pitch cycle segmentation. The s-th extracted
segment of l_s samples is denoted as w_s(k) for k = 0, 1, ..., l_s − 1. The starting
instant of this segment is t_s, selected such that w_s(0) = w(t_s). The number of
segments in the present frame is denoted by c.
While selecting the segment boundary between two successive pitch
pulses T_s and T_{s+1} inside the current frame, the following procedure is used.
First the central instant between the two pulses is computed as η = ⟨(T_s +
T_{s+1})/2⟩. The candidate positions for the segment boundary are located in the
region [η − ε_max, η + ε_max], where ε_max corresponds to five samples. The energy
of each candidate boundary position is computed, in Equation (6), from the
residual signal r(t) around that position. The position giving the smallest energy
is selected because this choice typically results in the smallest distortion in the
modified speech signal. The instant that minimizes Equation (6) is denoted as ε.
The starting instant of the new segment is selected as t_s = η + ε. This also
defines the length of the previous segment, since the previous segment ends
at instant η + ε − 1.
Figure 6 shows an illustrative example of pitch cycle segmentation.
Note particularly the first and the last segments w_1(k) and w_4(k), respectively,
extracted such that no empty segments result and the frame boundaries are
not exceeded.
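A small sketch of the boundary selection between two located pulses is given below. The exact form of the energy criterion (Equation (6)) is not reproduced in this text, so a plain squared residual sample is assumed here as the energy of a candidate boundary position.

```python
import numpy as np

EPS_MAX = 5  # candidate boundaries lie within +/- 5 samples of the midpoint

def select_boundary(r, T_s, T_s1, frame_start, frame_end):
    """Pick a low-energy boundary instant between pulses T_s and T_s1.

    r -- LP residual signal; the energy of a candidate position k is taken
         as r(k)^2 (an assumption standing in for Equation (6)).
    """
    eta = int(round((T_s + T_s1) / 2))            # central instant between the pulses
    best_eps, best_energy = 0, np.inf
    for eps in range(-EPS_MAX, EPS_MAX + 1):
        k = eta + eps
        if k <= frame_start or k >= frame_end:    # keep the boundary inside the frame
            continue
        energy = r[k] ** 2
        if energy < best_energy:
            best_energy, best_eps = energy, eps
    return eta + best_eps                          # starting instant of the new segment
```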
Determination of the Delay Parameter
Generally the main advantage of signal modification is that only one
delay parameter per frame has to be coded and transmitted to the decoder
(not shown). However, special attention has to be paid to the determination of
this single parameter. The delay parameter not only defines together with its
previous value the evolution of the pitch cycle length over the frame, but
also
affects time asynchrony in the resulting modified signal.
In the methods described in [1, 4-7]
[1] W.B. Kleijn, P. Kroon, and D. Nahumi, "The RCELP speech-
coding algorithm," European Transactions on
Telecommunications, Vol. 4, No. 5, pp. 573-582, 1994.
[4] US Patent 5,704,003, "RCELP coder," Lucent Technologies
Inc., (W.B. Kleijn and D. Nahumi), Filing Date 19 Sep. 1995.
[5] European Patent Application 0 602 826 A2, "Time shifting for
analysis-by-synthesis coding," AT&T Corp., (B. Kleijn), Filing
Date 1 Dec. 1993.
[6] Patent Application WO 00/11653, "Speech encoder with
continuous warping combined with long term prediction,"
Conexant Systems Inc., (Y. Gao), Filing Date 24 Aug. 1999.
[7] Patent Application WO 00/11654, "Speech encoder
adaptively applying pitch preprocessing with continuous
warping," Conexant Systems Inc., (H. Su and Y. Gao), Filing
Date 24 Aug. 1999.
no time synchrony is required at frame boundaries, and thus the delay
parameter to be transmitted can be determined straightforwardly using an
open-loop pitch estimate. This selection usually results in a time asynchrony
at the frame boundary, and translates to an accumulating time shift in the
subsequent frame because the signal continuity has to be preserved.
Although human hearing is insensitive to changes in the time scale of the
synthesized speech signal, increasing time asynchrony complicates the
encoder implementation. Indeed, long signal buffers are required to
accommodate the signals whose time scale may have been expanded, and a
control logic has to be implemented for limiting the accumulated shift during
encoding. Also, time asynchrony of several samples, typical in RCELP coding,
may cause mismatch between the LP parameters and the modified residual
signal. This mismatch may result in perceptual artifacts in the modified
speech signal that is synthesized by LP filtering the modified residual signal.
On the contrary, the illustrative embodiment of the signal modification
method according to the present invention preserves the time synchrony at
frame boundaries. Thus, a strictly constrained shift occurs at the frame ends
and every new frame starts in perfect time match with the original speech
frame.
To ensure time synchrony at the frame end, the delay contour d(t)
maps, with the long term prediction, the last pitch pulse at the end of the
previous synthesized speech frame to the pitch pulses of the current frame.
The delay contour defines an interpolated long-term prediction delay
parameter over the current n-th frame for every sample from instant t_{n-1} + 1
through t_n. Only the delay parameter d_n = d(t_n) at the frame end is transmitted
to the decoder, implying that d(t) must have a form fully specified by the
transmitted values. The long-term prediction delay parameter has to be
selected such that the resulting delay contour fulfils the pulse mapping. In a
mathematical form this mapping can be presented as follows: Let κ_i be a
temporary time variable and T_0 and T_c the last pitch pulse positions in the
previous and current frames, respectively. Now, the delay parameter d_n has to
be selected such that, after executing the pseudo-code presented in Table 1,
the variable κ_c has a value very close to T_0, minimizing the error |κ_c − T_0|. The
pseudo-code starts from the value κ_0 = T_c and iterates backwards c times by
updating κ_i := κ_{i−1} − d(κ_{i−1}). If κ_c then equals T_0, long term prediction can be
utilized with maximum efficiency without time asynchrony at the frame end.
Table 1. Loop for searching the optimal delay parameter.

    % initialization
    κ_0 := T_c;
    % loop
    for i = 1 to c
        κ_i := κ_{i−1} − d(κ_{i−1});
    end;
An example of the operation of the delay selection loop in the case c
= 3 is illustrated in Figure 7. The loop starts from the value κ_0 = T_c and takes
the first iteration backwards as κ_1 = κ_0 − d(κ_0). Iterations are continued twice
more, resulting in κ_2 = κ_1 − d(κ_1) and κ_3 = κ_2 − d(κ_2). The final value κ_3 is
then compared against T_0 in terms of the error e_n = |κ_3 − T_0|. The resulting
error is a function of the delay contour that is adjusted in the delay selection
algorithm as will be taught later in this specification.
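A minimal sketch of the loop of Table 1 and its error measure is shown below; d_contour is assumed to be a callable that evaluates the candidate delay contour d(t) at an arbitrary, possibly fractional, instant.

```python
def frame_end_error(T0, Tc, c, d_contour):
    """Iterate kappa_i := kappa_{i-1} - d(kappa_{i-1}) c times starting from T_c
    and return |kappa_c - T0|, the time asynchrony left at the frame end."""
    kappa = Tc
    for _ in range(c):
        kappa = kappa - d_contour(kappa)
    return abs(kappa - T0)
```

In the delay selection described next, this error would be evaluated for each candidate value of d_n and the candidate giving the smallest error retained.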
Signal modification methods such as those described in the above-cited
documents [1, 4, 6, 7] interpolate the delay parameters linearly over the frame
between d_{n−1} and d_n.
However, when time synchrony is required at the frame end, linear
interpolation tends to result in an oscillating delay contour. Thus pitch
cycles
in the modified speech signal contract and expand periodically, easily causing
annoying artifacts. The evolution and amplitude of the oscillations are related
to the last pitch position. The further the last pitch pulse is from the frame
end
in relation to the pitch period, the more likely the oscillations are
amplified.
Since the time synchrony at the frame end is an essential requirement of the
illustrative embodiment of the signal modification method according to the
present invention, linear interpolation familiar from the prior methods cannot
be used without degrading the speech quality. Instead, the illustrative
embodiment of the signal modification method according to the present
invention discloses a piecewise linear delay contour

    d(t) = (1 − α(t)) d_{n−1} + α(t) d_n,   t_{n−1} < t ≤ t_{n−1} + σ_n,    (7)

where

    α(t) = (t − t_{n−1}) / σ_n.    (8)

Oscillations are significantly reduced by using this delay contour. Here t_n and
t_{n−1} are the end instants of the current and previous frames, respectively, and
d_n and d_{n−1} are the corresponding delay parameter values. Note that t_{n−1} + σ_n
is the instant after which the delay contour remains constant.
In an illustrative example, the parameter σ_n varies as a function of d_{n−1}
as

    σ_n = 172 samples,  d_{n−1} ≤ 90 samples,
    σ_n = 128 samples,  d_{n−1} > 90 samples,    (9)

and the frame length N is 256 samples. To avoid oscillations, it is beneficial to
decrease the value of σ_n as the length of the pitch cycle increases. On the
other hand, to avoid rapid changes in the delay contour d(t) in the beginning
of the frame as t_{n−1} < t ≤ t_{n−1} + σ_n, the parameter σ_n always has to be at least
half of the frame length. Rapid changes in d(t) easily degrade the quality of
the modified speech signal.
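The contour of Equations (7)-(9) can be sketched directly; frame instants are expressed here as sample offsets within the frame, which is an assumed convention rather than the codec's internal indexing.

```python
import numpy as np

N = 256  # frame length in samples at 12.8 kHz

def delay_contour(d_prev: float, d_n: float) -> np.ndarray:
    """Piecewise linear delay contour d(t) for t = t_{n-1}+1 .. t_n (Equations (7)-(9))."""
    sigma = 172 if d_prev <= 90 else 128          # Equation (9)
    t = np.arange(1, N + 1)                       # offsets from t_{n-1}
    alpha = np.minimum(t / sigma, 1.0)            # alpha(t), clipped at the constant part
    return (1.0 - alpha) * d_prev + alpha * d_n   # linear up to sigma_n, then d_n

# Example: the case of Figure 8 (d_{n-1} = 50, d_n = 53, sigma_n = 172, N = 256).
d = delay_contour(50.0, 53.0)
print(d[0], d[171], d[-1])   # ~50.02, 53.0, 53.0
```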
Note that depending on the coding mode of the previous frame, d_{n−1}
can be either the delay value at the frame end (signal modification enabled) or
the delay value of the last subframe (signal modification disabled). Since the
past value d_{n−1} of the delay parameter is known at the decoder, the delay
contour is unambiguously defined by d_n, and the decoder is able to form the
delay contour using Equation (7).
The only parameter which can be varied while searching the optimal
delay contour is d_n, the delay parameter value at the end of the frame,
constrained into [34, 231]. There is no simple explicit method for solving the
optimal d_n in a general case. Instead, several values have to be tested to find
the best solution. However, the search is straightforward. The value of d_n can
be first predicted as

    d_n^(0) = 2 (T_c − T_0) / c − d_{n−1}.    (10)
In the illustrative embodiment, the search is done in three phases by
increasing the resolution and focusing the search range to be examined inside
[34, 231] in every phase. The delay parameters giving the smallest error
e_n = |κ_c − T_0| in the procedure of Table 1 in these three phases are denoted by
d_n^(1), d_n^(2), and d_n = d_n^(3), respectively. In the first phase, the search is
done around the value d_n^(0) predicted using Equation (10) with a resolution of
four samples in the range [d_n^(0) − 11, d_n^(0) + 12] when d_n^(0) < 60, and in
the range [d_n^(0) − 15, d_n^(0) + 16] otherwise. The second phase
constrains the range into [d_n^(1) − 3, d_n^(1) + 3] and uses the integer resolution.
The last, third phase examines the range [d_n^(2) − 3/4, d_n^(2) + 3/4] with a
resolution of 1/4 sample for d_n^(2) < 92 1/2. Above that, the range [d_n^(2) − 1/2,
d_n^(2) + 1/2] and a resolution of 1/2 sample are used. This third phase yields the
optimal delay parameter d_n to be transmitted to the decoder. This procedure is
a compromise between the search accuracy and complexity. Of course, those
of ordinary skill in the art can readily implement the search of the delay
parameter under the time synchrony constraints using alternative means
without departing from the nature and spirit of the present invention.
The delay parameter d_n ∈ [34, 231] can be coded using nine bits per
frame using a resolution of 1/4 sample for d_n < 92 1/2 and 1/2 sample for
d_n ≥ 92 1/2.
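A sketch of this coarse-to-fine search is given below, repeating the small error function from the earlier sketch so the block is self-contained. make_contour is assumed to build a callable candidate delay contour d(t) from d_{n−1} and a candidate d_n (for example a wrapper around the piecewise linear contour sketched earlier); the exact candidate grids are approximations of the ranges given in the text.

```python
import numpy as np

def frame_end_error(T0, Tc, c, d_contour):
    """|kappa_c - T0| from the loop of Table 1."""
    kappa = Tc
    for _ in range(c):
        kappa = kappa - d_contour(kappa)
    return abs(kappa - T0)

def best_candidate(cands, T0, Tc, c, d_prev, make_contour):
    errs = [frame_end_error(T0, Tc, c, make_contour(d_prev, d)) for d in cands]
    return float(cands[int(np.argmin(errs))])

def search_delay(T0, Tc, c, d_prev, make_contour):
    """Three-phase coarse-to-fine search of d_n in [34, 231]."""
    d0 = 2.0 * (Tc - T0) / c - d_prev                              # Equation (10)
    lo, hi = (d0 - 11, d0 + 12) if d0 < 60 else (d0 - 15, d0 + 16)
    grid1 = np.clip(np.arange(lo, hi + 1, 4.0), 34, 231)           # phase 1: 4-sample grid
    d1 = best_candidate(grid1, T0, Tc, c, d_prev, make_contour)
    grid2 = np.clip(np.arange(d1 - 3, d1 + 4, 1.0), 34, 231)       # phase 2: integer grid
    d2 = best_candidate(grid2, T0, Tc, c, d_prev, make_contour)
    step, span = (0.25, 0.75) if d2 < 92.5 else (0.5, 0.5)         # phase 3: fractional grid
    grid3 = np.clip(np.arange(d2 - span, d2 + span + 1e-9, step), 34, 231)
    return best_candidate(grid3, T0, Tc, c, d_prev, make_contour)
```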
Figure 8 illustrates delay interpolation when d_{n−1} = 50, d_n = 53, σ_n = 172,
and the frame length N = 256. The interpolation method used in the illustrative
embodiment of the signal modification method is shown in a thick line whereas
the linear interpolation corresponding to prior methods is shown in a thin line.
Both interpolated contours perform approximately in a similar manner in the
delay selection loop of Table 1, but the disclosed piecewise linear
interpolation results in a smaller absolute change |d_{n−1} − d_n|. This feature
reduces potential oscillations in the delay contour d(t) and annoying artifacts
in the modified speech signal whose pitch will follow this delay contour.
To further clarify the performance of the piecewise linear interpolation
method, Figure 9 shows an example of the resulting delay contour d(t) over
ten frames with a thick line. The corresponding delay contour d(t) obtained with
conventional linear interpolation is indicated with a thin line. The example has
been composed using an artificial speech signal having a constant delay
parameter of 52 samples as an input of the speech modification procedure. A
delay parameter d_0 = 54 samples was intentionally used as an initial value for
the first frame to illustrate the effect of pitch estimation errors typical in speech
coding. Then, the delay parameters d_n both for the linear interpolation and the
herein disclosed piecewise linear interpolation method were searched using
the procedure of Table 1. All the parameters needed were selected in
accordance with the illustrative embodiment of the signal modification method
according to the present invention. The resulting delay contours d(t) show that
piecewise linear interpolation yields a rapidly converging delay contour d(t),
whereas the conventional linear interpolation cannot reach the correct value
within the ten-frame period. These prolonged oscillations in the delay contour
d(t) often cause annoying artifacts to the modified speech signal, degrading
the overall perceptual quality.
Modification of the Signal
After the delay parameter d_n and the pitch cycle segmentation have
been determined, the signal modification procedure itself can be initiated. In
the illustrative embodiment of the signal modification method, the speech
signal is modified by shifting individual pitch cycle segments one by one,
adjusting them to the delay contour d(t). A segment shift is determined by
correlating the segment in the weighted speech domain with the target signal.
The target signal is composed using the synthesized weighted speech signal
ŵ(t) of the previous frame and the preceding, already shifted segments in the
current frame. The actual shift is done on the residual signal r(t).
Signal modification has to be done carefully to both maximize the
performance of long term prediction and simultaneously to preserve the
perceptual quality of the modified speech signal. The required time synchrony
at frame boundaries has to be taken into account also during modification.
A block diagram of the illustrative embodiment of the signal
modification method is shown in Figure 10. Modification starts by extracting a
new segment w_s(k) of l_s samples from the weighted speech signal w(t) in
block 401. This segment is defined by the segment length l_s and starting
instant t_s, giving w_s(k) = w(t_s + k) for k = 0, 1, ..., l_s − 1. The segmentation
procedure is carried out in accordance with the teachings of the foregoing
description.
If no more segments can be selected or extracted (block 402), the
signal modification operation is completed (block 403). Otherwise, the signal
modification operation continues with block 404.
For finding the optimal shift of the current segment w_s(k), a target
signal w̃(t) is created in block 405. For the first segment w_1(k) in the current
frame, this target signal is obtained by the recursion

    w̃(t) = ŵ(t),            t ≤ t_{n−1},
    w̃(t) = w̃(t − d(t)),     t_{n−1} < t ≤ t_{n−1} + l_1 + δ_1.    (11)

Here ŵ(t) is the weighted synthesized speech signal available in the previous
frame for t ≤ t_{n−1}. The parameter δ_1 is the maximum shift allowed for the first
segment of length l_1. Equation (11) can be interpreted as a simulation of long
term prediction using the delay contour over the signal portion in which the
current shifted segment may potentially be situated. The computation of the
target signal for the subsequent segments follows the same principle and will
be presented later in this section.
The search procedure for finding the optimal shift of the current
segment can be initiated after forming the target signal. This procedure is
based on the correlation c_s(δ') computed in block 404 between the segment
w_s(k) that starts at instant t_s and the target signal w̃(t) as

    c_s(δ') = Σ_{k=0}^{l_s−1} w_s(k) w̃(k + t_s + δ'),   δ' ∈ [−⌈δ_s⌉, ⌈δ_s⌉],    (12)

where δ_s determines the maximum shift allowed for the current segment w_s(k)
and ⌈·⌉ denotes rounding towards plus infinity. Normalized correlation can well
be used instead of Equation (12), although with increased complexity. In the
illustrative embodiment, the following values are used for δ_s:

    δ_s = 4 1/2 samples,  d_n < 90 samples,
    δ_s = 5 samples,      d_n ≥ 90 samples.    (13)

As will be described later in this section, the value of δ_s is more limited for the
first and the last segment in the frame.
Correlation (12) is evaluated with an integer resolution, but higher
accuracy improves the performance of long term prediction. For keeping the
complexity low, it is not reasonable to upsample directly the signal w_s(k) or
w̃(t) in Equation (12). Instead, a fractional resolution is obtained in a
computationally efficient manner by determining the optimal shift using the
upsampled correlation c_s(δ').
The shift δ maximizing the correlation c_s(δ') is searched first in the
integer resolution in block 404. Now, in a fractional resolution the maximum
value must be located in the open interval (δ − 1, δ + 1), and bounded into
[−δ_s, δ_s]. In block 406, the correlation c_s(δ') is upsampled in this interval to a
resolution of 1/8 sample using Hamming-windowed sinc interpolation of a
length equal to 65 samples. The shift δ corresponding to the maximum value
of the upsampled correlation is then the optimal shift in a fractional resolution.
After finding this optimal shift, the weighted speech segment w_s(k) is
recalculated in the solved fractional resolution in block 407. That is, the
precise new starting instant of the segment is updated as t_s := t_s − δ + δ_i,
where δ_i = ⌈δ⌉. Further, the residual segment r_s(k) corresponding to the
weighted speech segment w_s(k) in fractional resolution is computed from the
residual signal r(t) at this point, using again the sinc interpolation as described
before (block 407). Since the fractional part of the optimal shift is incorporated
into the residual and weighted speech segments, all subsequent
computations can be implemented with the upward-rounded shift δ_i = ⌈δ⌉.
Figure 11 illustrates recalculation of the segment w_s(k) in accordance
with block 407 of Figure 10. In this illustrative example, the optimal shift is
searched with a resolution of 1/8 sample by maximizing the correlation, giving
the value δ = −1 3/8. Thus the integer part δ_i becomes ⌈−1 3/8⌉ = −1 and the
fractional part 3/8. Consequently, the starting instant of the segment is updated
as t_s := t_s + 3/8. In Figure 11, the new samples of w_s(k) are indicated with
gray dots.
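The integer-resolution part of this shift search can be sketched as follows; the 1/8-sample upsampling of the correlation and the sinc recalculation of the segment are omitted, and the target signal is assumed to extend at least δ_s samples past the segment (as the recursion of Equation (11) provides).

```python
import numpy as np

def search_segment_shift(w_seg, target, t_s, delta_max):
    """Integer-resolution shift maximizing the correlation c_s(delta') of Equation (12).

    w_seg  -- current pitch cycle segment w_s(k), k = 0..l_s-1
    target -- target signal w~(t), indexed on the same time axis as t_s
    t_s    -- starting instant of the segment
    """
    w_seg = np.asarray(w_seg, dtype=float)
    ls = len(w_seg)
    limit = int(np.ceil(delta_max))
    best_shift, best_corr = 0, -np.inf
    for shift in range(-limit, limit + 1):
        corr = float(np.dot(w_seg, target[t_s + shift:t_s + shift + ls]))
        if corr > best_corr:
            best_corr, best_shift = corr, shift
    # The illustrative embodiment would next upsample the correlation to a
    # 1/8-sample resolution around best_shift to obtain the fractional part.
    return best_shift
```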
If the logic block 106, which will be disclosed later, permits signal
modification to continue, the final task is to update the modified residual signal
r̃(t) by copying the current residual signal segment r_s(k) into it (block 411):
    r̃(t_s + δ_i + k) = r_s(k),   k = 0, 1, ..., l_s − 1.    (14)
Since shifts in successive segments are independent of each other, the
segments positioned into r̃(t) either overlap or have a gap in between them.
Straightforward weighted averaging can be used for overlapping segments.
Gaps are filled by copying neighboring samples from the adjacent segments.
Since the number of overlapping or missing samples is usually small and the
segment boundaries occur at low-energy regions of the residual signal,
usually no perceptual artifacts are caused. It should be noted that no
continuous signal warping as described in the above-cited documents [6] and
[7] and in the following document [2]:

[2] W.B. Kleijn, R.P. Ramachandran, and P. Kroon,
"Interpolation of the pitch-predictor parameters in analysis-by-
synthesis speech coders," IEEE Transactions on Speech and
Audio Processing, Vol. 2, No. 1, pp. 42-54, 1994.

is employed; instead, modification is done discontinuously by shifting pitch cycle
segments in order to reduce the complexity.
Processing of the subsequent pitch cycle segments follows the above-
disclosed procedure, except that the target signal w̃(t) in block 405 is formed
differently than for the first segment. The samples of w̃(t) are first replaced
with the modified weighted speech samples as

    w̃(t_s + δ_i + k) = w_s(k),   k = 0, 1, ..., l_s − 1.    (15)

This procedure is illustrated in Figure 11. Then the samples following the
updated segment are also updated,

    w̃(k) = w̃(k − d(k)),   k = t_s + δ_i + l_s, ..., t_s + δ_i + l_s + l_{s+1} + δ_{s+1}.    (16)

The update of the target signal w̃(t) ensures a higher correlation between
successive pitch cycle segments in the modified speech signal considering
the delay contour d(t) and thus more accurate long term prediction. While
processing the last segment of the frame, the target signal w̃(t) does not
need to be updated.
The shifts of the first and the last segments in the frame are special
cases which have to be performed particularly carefully. Before shifting the
first segment, it should be ensured that no high power regions exist in the
residual signal r(t) close to the frame boundary t_{n−1}, because shifting such a
segment may cause artifacts. The high power region is searched by squaring
the residual signal r(t) as

    E_0(k) = r²(k),   k ∈ [t_{n−1} − s_0, t_{n−1} + s_0],    (17)
where s_0 = ⟨p(t_{n−1})/2⟩. If the maximum of E_0(k) is detected close to the frame
boundary in the range [t_{n−1} − 2, t_{n−1} + 2], the allowed shift is limited to 1/4
sample. If the proposed shift |δ| for the first segment is smaller than this limit,
the signal modification procedure is enabled in the current frame, but the first
segment is kept intact.
The last segment in the frame is processed in a similar manner. As
was described in the foregoing description, the delay contour d(t) is selected
such that in principle no shifts are required for the last segment. However,
because the target signal is repeatedly updated during signal modification
considering correlations between successive segments in Equations (15) and
(16), it is possible that the last segment has to be shifted slightly. In the
illustrative embodiment, this shift is always constrained to be smaller than 3/2
samples. If there is a high power region at the frame end, no shift is allowed.
This condition is verified by using the squared residual signal

    E_1(k) = r²(k),   k ∈ [t_n − s_1 + 1, t_n + 1],    (18)

where s_1 = p(t_n). If the maximum of E_1(k) is attained for k larger than or equal
to t_n − 4, no shift is allowed for the last segment. Similarly as for the first
segment, when the proposed shift |δ| < 1/4, the present frame is still accepted
for modification, but the last segment is kept intact.
It should be noted that, contrary to the known signal modification
methods, the shift does not translate to the next frame, and every new frame
starts perfectly synchronized with the original input signal. As another
fundamental difference, particularly to RCELP coding, the illustrative
embodiment of the signal modification method processes a complete speech
frame before the subframes are coded. Admittedly, subframe-wise
modification enables composing the target signal for every subframe using
the previously coded subframe, potentially improving the performance. This
approach cannot be used in the context of the illustrative embodiment of the
signal modification method since the allowed time asynchrony at the frame
end is strictly constrained. Nevertheless, the update of the target signal with
Equations (15) and (16) gives practically equal performance to the
subframe-wise processing, because modification is enabled only on smoothly
evolving voiced frames.
Mode Determination Logic Incorporated into the Signal
Modification Procedure
The illustrative embodiment of signal modification method according
to the present invention incorporates an efficient classification and mode
determination mechanism as depicted in Figure 2. Every operation performed
in blocks 101, 103 and 105 yields several indicators quantifying the attainable
performance of long term prediction in the current frame. If any of these
indicators is outside its allowed limits, the signal modification procedure is
terminated by one of the logic blocks 102, 104, or 106. In this case, the
original signal is preserved intact.
The pitch pulse search procedure 101 produces several indicators on
the periodicity of the present frame. Hence the logic block 102 analyzing
these indicators is the most important component of the classification logic.
The logic block 102 compares the difference between the detected pitch pulse
positions and the interpolated open-loop pitch estimate using the condition

    |T_k − T_{k−1} − p(T_k)| < 0.2 p(T_k),   k = 1, 2, ..., c,    (19)
and terminates the signal modification procedure if this condition is not met.
The selection of the delay contour d(t) in block 103 also gives
additional information on the evolution of the pitch cycles and the periodicity of
the current speech frame. This information is examined in the logic block 104.
The signal modification procedure is continued from this block 104 only if the
condition |d_n − d_{n−1}| < 0.2 d_n is fulfilled. This condition means that only a small
delay change is tolerated for classifying the current frame as a purely voiced
frame. The logic block 104 also evaluates the success of the delay selection
loop of Table 1 by examining the difference |κ_c − T_0| for the selected delay
parameter value d_n. If this difference is greater than one sample, the signal
modification procedure is terminated.
For guaranteeing a good quality for the modified speech signal, it is
advantageous to constrain the shifts done for successive pitch cycle segments
in block 105. This is achieved in the logic block 106 by imposing the criteria

    |δ^(s) − δ^(s−1)| < 4.0 samples,  d_n < 90 samples,
    |δ^(s) − δ^(s−1)| < 4.8 samples,  d_n ≥ 90 samples,    (20)

to all segments of the frame. Here δ^(s) and δ^(s−1) are the shifts done for the
s-th and (s − 1)-th pitch cycle segments, respectively. If the thresholds are
exceeded, the signal modification procedure is interrupted and the original
signal is maintained.
When the frames subjected to signal modification are coded at a low
bit rate, it is essential that the shape of pitch cycle segments remains
similar
over the frame. This allows faithful signal modeling by long term prediction
and thus coding at a low bit rate without degrading the subjective quality.
The
similarity of successive segments can be quantified simply by the normalized
correlation

    g_s = Σ_{k=0}^{l_s−1} w_s(k) w̃(k + t_s + δ_i) /
          sqrt( Σ_{k=0}^{l_s−1} w_s²(k) · Σ_{k=0}^{l_s−1} w̃²(k + t_s + δ_i) ),    (21)

between the current segment and the target signal at the optimal shift, after
the update of w_s(k) in block 407 of Figure 10. The normalized correlation g_s is
also referred to as pitch gain.
Shifting of the pitch cycle segments in block 105 maximizing their
correlation with the target signal enhances the periodicity and yields a high
pitch prediction gain if the signal modification is useful in the current frame.
The success of the procedure is examined in the logic block 106 using the
criterion

    g_s ≥ 0.84.

If this condition is not fulfilled for all segments, the signal modification
procedure is terminated (block 409) and the original signal is kept intact.
When this condition is met (block 106), the signal modification continues in
block 411. The pitch gain g_s is computed in block 408 between the
recalculated segment w_s(k) from block 407 and the target signal w̃(t) from
block 405. In general, a slightly lower gain threshold can be allowed on male
voices with equal coding performance. The gain thresholds can be changed in
different operation modes of the encoder for adjusting the usage percentage
of the signal modification mode and thus the resulting average bit rate.
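The per-frame checks of the logic blocks 102, 104 and 106 can be summarized in a small sketch; the thresholds are taken from Equations (19)-(21) and the surrounding text, while the function signature and argument names are assumptions. The return value only distinguishes between the low-rate mode with signal modification and the normal mode.

```python
def classify_frame(pulses, p, d_n, d_prev, frame_end_err, shifts, pitch_gains):
    """Return True if signal modification (low-rate voiced mode) may be used.

    pulses         -- pitch pulse positions T_0..T_c (T_0 from the previous frame)
    p              -- callable, interpolated open-loop pitch estimate p(t)
    frame_end_err  -- |kappa_c - T_0| for the selected delay parameter d_n
    shifts         -- per-segment shifts delta^(s)
    pitch_gains    -- per-segment normalized correlations g_s of Equation (21)
    """
    # Logic block 102: pulse spacing must follow the open-loop pitch (Equation (19)).
    for k in range(1, len(pulses)):
        if abs(pulses[k] - pulses[k - 1] - p(pulses[k])) >= 0.2 * p(pulses[k]):
            return False
    # Logic block 104: small delay change and a well-fitting delay contour.
    if abs(d_n - d_prev) >= 0.2 * d_n or frame_end_err > 1.0:
        return False
    # Logic block 106: limited shift differences (Equation (20)) and high pitch gain.
    max_diff = 4.0 if d_n < 90 else 4.8
    for s in range(1, len(shifts)):
        if abs(shifts[s] - shifts[s - 1]) >= max_diff:
            return False
    return all(g >= 0.84 for g in pitch_gains)
```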
Mode Determination Logic for a Source-controlled Variable Bit
Rate Speech Codec
This section discloses the use of the signal modification procedure as
a part of the general rate determination mechanism in a source-controlled
variable bit rate speech codec. This functionality is embedded in the
illustrative embodiment of the signal modification method, since it provides
several indicators on signal periodicity and the expected coding performance
of long term prediction in the present frame. These indicators include the
evolution of the pitch period, the fitness of the selected delay contour for
describing this evolution, and the pitch prediction gain attainable with signal
modification. If the logic blocks 102, 104 and 106 shown in Figure 2 enable
signal modification, long term prediction is able to model the modified speech
frame efficiently, facilitating its coding at a low bit rate without degrading
subjective quality. In this case, the adaptive codebook excitation has a
dominant contribution in describing the excitation signal, and thus the bit rate
allocated for the fixed-codebook excitation can be reduced. When a logic
block 102, 104 or 106 disables signal modification, the frame is likely to
contain a non-stationary speech segment such as a voiced onset or rapidly
evolving voiced speech signal. These frames typically require a high bit rate
for sustaining good subjective quality.
Figure 12 depicts the signal modification procedure 603 as a part of
the rate determination logic that controls four coding modes. In this
illustrative
embodiment, the mode set comprises a dedicated mode for non-active
speech frames (block 508), unvoiced speech frames (block 507), stable
voiced frames (block 506), and other types of frames (block 505). It should be
noted that all these modes except the mode for stable voiced frames 506 are
implemented in accordance with techniques well known to those of ordinary
skill in the art.
The rate determination logic is based on signal classification done in
three steps in logic blocks 501, 502, and 504, of which the operation of
blocks 501 and 502 is well known to those of ordinary skill in the art.
First, a voice activity detector (VAD) 501 discriminates between active
and inactive speech frames. If an inactive speech frame is detected, the
speech signal is processed according to mode 508.
If an active speech frame is detected in block 501, the frame is
subjected to a second classifier 502 dedicated to making a voicing decision.
If the classifier 502 rates the current frame as an unvoiced speech signal, the
classification chain ends and the speech signal is processed in accordance
with mode 507. Otherwise, the speech frame is passed through to the signal
modification module 603.
The signal modification module 603 then itself provides a decision on enabling or disabling the signal modification of the current frame in logic block 504. This decision is in practice made as an integral part of the signal modification procedure in the logic blocks 102, 104 and 106, as explained earlier with reference to Figure 2. When signal modification is enabled, the frame is deemed a stable voiced, or purely voiced, speech segment.
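The control flow of this three-step chain can be sketched as follows; the classifier callables are placeholders standing in for blocks 501, 502 and 603/504, and the function signature is an assumption made only for illustration.

    def select_coding_mode(frame, vad, voicing_classifier, signal_modification):
        # Rate determination chain of Figure 12 (control flow only).
        #   vad                 - block 501, returns True for active speech
        #   voicing_classifier  - block 502, returns True for voiced speech
        #   signal_modification - block 603, returns (modified_frame, enabled),
        #                         'enabled' being the block 504 decision
        # Returns the selected mode: 508 (non-active), 507 (unvoiced),
        # 506 (stable voiced, signal modification enabled) or 505 (other).
        if not vad(frame):
            return 508
        if not voicing_classifier(frame):
            return 507
        _modified_frame, enabled = signal_modification(frame)
        return 506 if enabled else 505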
When the rate determination mechanism selects mode 506, the signal
modification mode is enabled and the speech frame is encoded in accordance
with the teachings of the previous sections. Table 2 discloses the bit
allocation
used in the illustrative embodiment for the mode 506. Since the frames to be
coded in this mode are characteristically very periodic, a substantially lower bit rate suffices for sustaining good subjective quality compared, for instance, to transition frames. Signal modification also allows efficient coding of the delay information using only nine bits per 20-ms frame, saving a considerable proportion of the bit budget for other parameters. The good performance of long term prediction allows the use of only 13 bits per 5-ms subframe for the fixed-codebook excitation without sacrificing the subjective speech quality. The fixed codebook comprises one track with two pulses, each having 64 possible positions.
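With two pulse positions taking 64 values each (6 bits apiece), a single additional sign bit would account for the 13 bits quoted above. The exact index format is not spelled out here, so the packing below is only a plausible, hypothetical layout.

    def pack_fixed_codebook_index(pos0, pos1, sign_bit):
        # Hypothetical 13-bit subframe index for the single two-pulse track:
        # 6 bits per pulse position (64 positions each) plus one sign bit,
        # i.e. 6 + 6 + 1 = 13 bits.  Illustrative layout only.
        assert 0 <= pos0 < 64 and 0 <= pos1 < 64 and sign_bit in (0, 1)
        return (sign_bit << 12) | (pos0 << 6) | pos1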



Table 2. Bit allocation in the voiced 6.2-kbps mode
for a 20-ms frame comprising four subframes.

Parameter            Bits per frame
LP Parameters        34
Pitch Delay          9
Pitch Filtering      4 = 1 + 1 + 1 + 1
Gains                24 = 6 + 6 + 6 + 6
Algebraic Codebook   52 = 13 + 13 + 13 + 13
Mode Bit             1
Total                124 bits = 6.2 kbps

Table 3. Bit allocation in the 12.65-kbps mode
in accordance with the AMR-WB standard.

Parameter            Bits per frame
LP Parameters        46
Pitch Delay          30 = 9 + 6 + 9 + 6
Pitch Filtering      4 = 1 + 1 + 1 + 1
Gains                28 = 7 + 7 + 7 + 7
Algebraic Codebook   144 = 36 + 36 + 36 + 36
Mode Bit             1
Total                253 bits = 12.65 kbps


The other coding modes 505, 507 and 508 are implemented following
known techniques. Signal modification is disabled in all these modes. Table 3
shows the bit allocation of the mode 505 adopted from the AMR-WB standard.
The technical specifications [11] and [12] related to the AMR-WB standard are included here as references on the comfort noise and VAD functionalities used in blocks 508 and 501, respectively.
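As a quick cross-check of the two allocations, the frame totals and the resulting bit rates can be tallied directly for the 20-ms frames; the dictionaries below simply restate the entries of Tables 2 and 3.

    # Bit allocations per 20-ms frame, taken from Tables 2 and 3.
    MODE_506 = {"LP parameters": 34, "pitch delay": 9, "pitch filtering": 4,
                "gains": 24, "algebraic codebook": 52, "mode bit": 1}
    MODE_505 = {"LP parameters": 46, "pitch delay": 30, "pitch filtering": 4,
                "gains": 28, "algebraic codebook": 144, "mode bit": 1}

    def bit_rate_kbps(allocation, frame_ms=20.0):
        # Bits per frame divided by the frame length in ms gives kbit/s.
        return sum(allocation.values()) / frame_ms

    print(sum(MODE_506.values()), bit_rate_kbps(MODE_506))  # 124 bits -> 6.2 kbps
    print(sum(MODE_505.values()), bit_rate_kbps(MODE_505))  # 253 bits -> 12.65 kbps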



[11] 3GPP TS 26.192, "AMR Wideband Speech Codec: Comfort Noise Aspects," 3GPP Technical Specification.

[12] 3GPP TS 26.193, "AMR Wideband Speech Codec: Voice Activity Detector (VAD)," 3GPP Technical Specification.
In summary, the present specification has described a frame synchronous signal modification method for purely voiced speech frames, a classification mechanism for detecting frames to be modified, and the use of these methods in a source-controlled CELP speech codec in order to enable high-quality coding at a low bit rate.
The signal modification method incorporates a classification
mechanism for determining the frames to be modified. This differs from
prior
signal modification and preprocessing means in operation and in the
properties of the modified signal. The classification functionality embedded
into the signal modification procedure is used as a part of the rate
determination mechanism in a source-controlled CELP speech codec.
Signal modification is done pitch- and frame-synchronously, that is, adapting one pitch cycle segment at a time in the current frame such that the subsequent speech frame starts in perfect time alignment with the original signal. The pitch cycle segments are limited by frame boundaries. This feature prevents time shift translation over frame boundaries, simplifying encoder implementation and reducing the risk of artifacts in the modified speech signal. Since the time shift does not accumulate over successive frames, the signal modification method disclosed needs neither long buffers for accommodating expanded signals nor complicated logic for controlling the accumulated time
shift. In source-controlled speech coding, it simplifies multi-mode operation between signal modification enabled and disabled modes, since every new frame starts in time alignment with the original signal.
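The frame-boundary constraint itself can be sketched as a simple clamp on each segment's shift; the (start, length, shift) segment representation below is an assumed simplification used only for illustration.

    def clamp_segment_shift(seg_start, seg_len, shift, frame_len):
        # Limit a pitch cycle segment's time shift so that the warped segment
        # stays inside the current frame.  Because no segment may cross a
        # frame boundary, the shift cannot carry over into the next frame and
        # each new frame starts aligned with the original signal.
        min_shift = -seg_start                          # cannot start before the frame
        max_shift = frame_len - (seg_start + seg_len)   # cannot end after the frame
        return max(min_shift, min(shift, max_shift))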
Of course, many other modifications and variations are possible. In
view of the above detailed illustrative description of the present invention
and
associated drawings, such other modifications and variations will now become
apparent to those of ordinary skill in the art. It should also be apparent
that
such other variations may be effected without departing from the spirit and
scope of the present invention.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.


Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 2002-12-13
(87) PCT Publication Date 2003-06-26
(85) National Entry 2004-06-09
Examination Requested 2007-10-18
Dead Application 2010-12-13

Abandonment History

Abandonment Date Reason Reinstatement Date
2009-12-14 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Registration of a document - section 124 $100.00 2004-06-09
Registration of a document - section 124 $100.00 2004-06-09
Registration of a document - section 124 $100.00 2004-06-09
Application Fee $400.00 2004-06-09
Registration of a document - section 124 $100.00 2004-11-23
Maintenance Fee - Application - New Act 2 2004-12-13 $100.00 2004-12-02
Maintenance Fee - Application - New Act 3 2005-12-13 $100.00 2005-11-14
Maintenance Fee - Application - New Act 4 2006-12-13 $100.00 2006-11-23
Request for Examination $800.00 2007-10-18
Maintenance Fee - Application - New Act 5 2007-12-13 $200.00 2007-11-29
Maintenance Fee - Application - New Act 6 2008-12-15 $200.00 2008-11-26
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
NOKIA CORPORATION
Past Owners on Record
JELINEK, MILAN
LAFLAMME, CLAUDE
RUOPPILA, VESA T.
TAMMI, MIKKO
VOICEAGE CORPORATION
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


List of published and non-published patent-specific documents on the CPD.



Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Claims 2004-06-09 22 759
Abstract 2004-06-09 2 99
Drawings 2004-06-09 13 234
Description 2004-06-09 51 2,073
Representative Drawing 2004-06-09 1 36
Cover Page 2004-08-12 2 72
PCT 2004-06-09 9 357
Assignment 2004-06-09 7 315
PCT 2004-06-10 7 244
Correspondence 2004-11-23 2 65
Assignment 2004-11-23 3 129
Correspondence 2004-12-20 1 15
Correspondence 2004-12-20 1 18
Fees 2004-12-02 1 50
Prosecution-Amendment 2007-10-18 1 56