Note: Descriptions are shown in the official language in which they were submitted.
CA 02210313 1997-07-09
WO 96/22651 PCT/US96/00790
METHOD OF AND APPARATUS FOR ECHO REDUCTION IN A HANDS-FREE
CELLULAR RADIO COMMUNICATION SYSTEM
FIELD OF THE INVENTION
The present invention relates to cellular telephone systems and more
specifically to
a method of and apparatus for reducing the echo and providing comfort noise in
a voice
switched hands-free system for a cellular radiotelephone.
BACKGROUND OF THE INVENTION
Cellular radiotelephones have become ubiquitous tools for wireless voice
communication. Many cellular radiotelephones can be operated using a so-called
"hands-free" system which allows the user of the cellular radiotelephone to
communicate over short distances without having to hold a handset. The
hands-free system is intended to be used while operating an automobile or when
the user's hands
are otherwise preoccupied. The hands-free system allows the user of a cellular
radiotelephone to engage in a conversation with another party by speaking into
a
microphone and listening to the other party by means of a loudspeaker. The
microphone and loudspeaker are sometimes referred to as the "hands-free"
loudspeaker and microphone since these are sometimes located external to the
cellular radiotelephone and operate in place of the existing microphone and
loudspeaker located in the handset, or in the cellular radiotelephone itself.
The hands-free system processes signals produced by the microphone to generate
uplink signals which are ultimately transmitted to a base station. The
hands-free system also processes downlink signals received from the base
station.
There are essentially two types of hands-free systems: full-duplex and
half-duplex. In a full-duplex system, both the uplink and downlink may be active
simultaneously. In a full-duplex hands-free system, downlink signals which
ultimately emanate from the hands-free loudspeaker as acoustic sound may be
picked up by the hands-free microphone. For proper full-duplex operation, the
downlink signal must be removed from the hands-free microphone signal to
prevent the person at the other end of the call from experiencing a pernicious
acoustic echo of their own voice. Depending
upon the amplitude and delay of the echo, normal conversation may be very
difficult to
achieve when using a full-duplex hands-free arrangement unless the downlink
echo
can be suppressed.
Many solutions have been proposed to eliminate, or to otherwise significantly
reduce, the magnitude of the downlink echo in a full-duplex system: see, for
example, Park, et al., "Acoustic Echo Cancellation for Full-Duplex Voice
Transmission on Fading Channels", Proceedings of the International Mobile
Satellite Conference, Ottawa, Ontario, Canada, June 18-20, 1990. Existing echo
cancellation techniques are complicated, require a great deal of processing
power, and are not generally appropriate for use in a cost-sensitive consumer
product such as a cellular radiotelephone.
In a half-duplex hands-free system, only one path (uplink or downlink) is open
at a time; the other is muted. Whichever path is open determines which person
may
speak. Although effective in preventing an echo, a half-duplex system results
in stilted,
unnatural conversation. To improve the performance of half-duplex hands-free
systems, voice-switching may be employed. In a voice-switched system, the
activity of
the persons speaking is used to decide which path is open. A voice activity
detector
determines which person is talking and mutes the signal from the other end.
This
prevents the talker's echo from being picked up and retransmitted to the
talker. In ideal
conditions, well disciplined users of a voice-switched hands-free system
achieve near
full-duplex performance.
One problem with a voice-switched hands-free system occurs when the hands-free
system is used in a noisy environment such as in a moving automobile. When the
hands-free microphone is muted, the person at the other end of the call
suddenly hears silence when, previously, the background noise of the automobile
was audible. The
sudden loss of background noise may suggest to the person at the other end of
the call
that the connection has been lost. In order to overcome this, artificial
background noise,
called comfort noise, is provided. One example which is used in a time
division multiple
access (TDMA) cellular radio system is described in U.S. Patent 5,222,251
where the
microphone signals are replaced by codewords representing ambient noise.
The
codewords are produced by a speech compression algorithm known as VSELP. This
technique has several disadvantages. First, since the microphone signals are
replaced
by codewords representing ambient noise, there can be an abrupt change between
the
actual ambient noise - which may be dynamic - and the artificial ambient
noise. If the
difference between the actual ambient noise and the artificial ambient noise
represented by codewords is significant, the landline user may find the
replacement
noticeable. Second, since the microphone signals are wholly replaced, the user
of the
radiotelephone is unable to "cut-in" on the other party by raising his
or her voice as can
be done in a normal telephone conversation. Third, in a TDMA system,
successively
replacing the microphone signals with the same artificial codeword may produce
a
modulative effect which could be distracting to the user.
Summary of the Invention
The aforementioned problem of reducing the echo and providing comfort noise in
a
cellular radiotelephone arranged in a hands-free configuration is ameliorated
in accordance
with the present invention.
A method and apparatus is presented in which noise frames representing ambient
noise are generated by the cellular radiotelephone and are added to attenuated
uplink
speech frames derived from a microphone signal. The attenuation may be
gradually applied
to the speech frames. A variable pointer is used to randomly order the noise
frame to
reduce the modulative effect.
These and other features and advantages of the present invention will be
readily apparent to one of ordinary skill in the art from the following
written description
when read in conjunction with the drawings in which like reference numerals
refer to
like elements.
Brief Description of the Drawings
An exemplifying embodiment of the invention will now be described in more
detail
with reference to the accompanying drawings, in which:
Figure 1 is a schematic illustration of a cellular radio communication system
where the
cellular radiotelephone is configured in a hands-free arrangement;
Figure 2 is a schematic illustration of a hands-free system according to an
embodiment of
the present invention;
Figure 3 is a pictorial illustration of how uplink speech frames are
collected, processed, and organized for transmission according to one
embodiment of the present invention;
Figure 4 is a state diagram illustrating the function of the decision block;
Figure 5a is a schematic illustration of a noise frame;
Figure 5b is a schematic illustration of a randomly ordered noise frame; and
Figure 6 is a pictorial illustration of how uplink speech frames are collected,
processed, and organized for transmission according to an alternate embodiment
of the present invention.
DESCRIPTION OF THE INVENTION
In the following description, for purposes of explanation and not limitation,
specific details are set forth, such as particular circuits, circuit
components, techniques,
etc. in order to provide a thorough understanding of the invention. However, it
will be
apparent to one of ordinary skill in the art that the present invention may be
practiced in
other embodiments that depart from these specific details. In other instances,
detailed
descriptions of well-known methods, devices, and circuits are omitted so as
not to
obscure the description of the present invention with unnecessary detail.
In Figure 1 is shown a cellular radio communication system in which the
present
invention may be advantageously employed. An example of a cellular radio
communication system, known as D-AMPS, is currently in use in the United
States and
in several other countries. D-AMPS is described in the EIA/TIA standard
entitled "Cellular System Dual-Mode Mobile Station - Base Station Compatibility
Standard, IS-54-B", available from the Telecommunications Industry Association,
2001
Pennsylvania
Avenue, N.W., Washington, D.C. 20006. In this illustration, cellular
radiotelephone 100,
which is configured in a hands-free arrangement, is in radio communication
with
landline telephone user 110. Radio signals transmitted from cellular
radiotelephone 100
(i.e., the uplink) are received by cellular base station 108 which is
interfaced to public
switched telephone network (PSTN) 107 via mobile telephone switching office
(MTSO)
109. Conventional landline telephone 105 is coupled to PSTN 107 via hybrid
circuit 106. MTSO 109 may alternatively provide a radio connection between two
cellular radiotelephones, as is obvious to one skilled in the art.
Cellular telephone 100 comprises cellular transceiver 103 which is coupled to
hands-free system 111. Hands-free system 111 comprises acoustic echo processor
102, loudspeaker 104, and microphone 101. Cellular transceiver 103 may be
found in
a conventional cellular radiotelephone such as the DH 338 manufactured by the
instant assignee of the present invention.
The acoustic echo reduction processor 102 generally shown in Figure 1 is
illustrated in greater detail in Figure 2. Referring now to Figure 2, acoustic
signals, such
as speech, environmental noise, and/or a combination thereof, are received by
microphone 101 whose analog signals are coupled to analog to digital (A/D)
converter 200. A/D converter 200 samples the analog microphone signals with,
for example, 13 bit/sample resolution at 8 kilosamples/second to produce a 104
kilobit/second pulse code modulation (PCM) bitstream. The PCM bitstream is
serially transmitted to
speech
frame collector 201 which arranges groups of samples into so-called speech
frames
which, in this example, are 160 samples, or 2080 bits, long. Although referred
to herein
as speech frames, the samples may, or may not, include actual speech. It is
also not
intended to limit the definition of a speech frame to a TDMA communication
format. In
code division multiple access, or CDMA, for example, the speech frame is not
broken
into discrete temporal blocks, but rather is a continuous stream of digital
data whose
bitrate is increased using a spreading code. Similarly in other multiple
access, or non-
multiple access, digital communication systems, the concept of a speech frame
as a
continuous bitstream equally applies as is obvious to one skilled in the art.
The speech
frames are coupled to the uplink speech detector 202 which analyzes each speech
frame to determine if human speech is present. This determination may be
accomplished by, for example, analyzing the energy content of the speech frame
as described in U.S. Patent Number 5,511,414 to Salve, et al. entitled "System
for Adaptively Reducing Noise in Speech Signals" filed Sept. 29, 1993. An
indication of the
presence, or absence, of human speech in the speech frame is coupled to the
decision
logic block 206. The speech frames are coupled also to the uplink variable
attenuator
203 and subsequently to noise adder 204. The output of noise adder 204 is
referred to
herein as the uplink speech frame which is coupled to radio transceiver 103.
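The framing arithmetic and the energy-based speech decision described above can be sketched as follows. This is a minimal illustration only: the sample rate, resolution and frame length are the example values given in the text, while the function names and the fixed energy threshold are hypothetical and are not taken from the cited patent.

```python
SAMPLE_RATE_HZ = 8000   # 8 kilosamples/second (example value from the text)
BITS_PER_SAMPLE = 13    # 13 bit/sample resolution (example value from the text)
FRAME_SAMPLES = 160     # samples per speech frame (example value from the text)


def frame_stats():
    """Return (PCM bitrate in bit/s, bits per frame, frame duration in ms)."""
    bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    frame_bits = FRAME_SAMPLES * BITS_PER_SAMPLE
    frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ
    return bitrate, frame_bits, frame_ms


def collect_frames(samples):
    """Group a flat sample sequence into complete 160-sample speech frames."""
    count = len(samples) // FRAME_SAMPLES
    return [samples[i * FRAME_SAMPLES:(i + 1) * FRAME_SAMPLES]
            for i in range(count)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def speech_present(frame, threshold=1.0e5):
    """Hypothetical detector: a fixed energy threshold stands in for the
    energy analysis mentioned in the text."""
    return frame_energy(frame) > threshold
```

With the example values, the sketch reproduces the 104 kilobit/second bitstream and the 2080-bit, 20 millisecond speech frame. A practical detector would likely adapt its threshold to the background noise rather than use a constant.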
Similarly, downlink speech frames received from radio transceiver 103 are
coupled to the downlink speech detector 207 which determines the presence, or
absence, of human speech in the downlink speech frame in the same manner as
that
described for the uplink speech detector 202. The output of the downlink
speech
detector is coupled also to decision logic block 206.
By means of definition, what is referred to as the uplink path begins with the
hands-free microphone 101, ends with the landline telephone 105 and includes
everything therebetween. Similarly, what is referred to as the downlink path
begins with the landline telephone 105, ends with the hands-free loudspeaker
104 and includes everything therebetween. When the uplink path is open, the
user of the cellular radiotelephone 100 may speak into microphone 101 and be
heard by the other party at telephone 105. Similarly, when the downlink path is
open, the landline user may speak into landline telephone 105 and be heard by
the cellular user at speaker 104.
The manner in which the speech frames are attenuated, comfort noise added,
and the speech frames organized into TDMA frames for transmission is
illustrated
pictorially in Figure 3. The 104 kilobit/second PCM bitstream produced by A/D
converter
200 is coupled to speech frame collector 201 which outputs speech frames which
are,
in this example, 160 samples, or 2080 bits, in length. When the downlink path
is open,
the speech frames are attenuated by variable attenuator 203 whose output is
coupled
to noise adder 204. The uplink speech frames which are produced by noise
adder 204
are coupled to radio transceiver 103.
Radio transceiver 103 receives the uplink speech frames and couples them to
compression processor 300 which may be, for example, a VSELP speech coder as
used in D-AMPS. VSELP compression reduces the bitrate from 104 kilobits/second
to just under 8 kilobits/second. The compressed bits are coupled to coding and
interleaving block 310 where the compressed bits are segregated into classes
1a, 1b, and 2. Cyclical redundancy check (CRC) bits for error correction are
added to the class 1a and 1b bits, and then these bits undergo convolutional
encoding. The encoded and error corrected bits then undergo interleaving. After
the addition of overhead bits, the bitrate is 16.2 kilobits/second. The 16.2
kilobit/second bitstream is used to digitally modulate a radio carrier using
π/4 shifted DQPSK. Each compressed, encoded,
and
interleaved speech frame is transmitted as a burst transmission in one slot of
a TDMA
frame to base station 108. A similar procedure is performed by base station
108 on the
downlink. The compression, coding and interleaving are performed according to
known
techniques such as described in the aforementioned IS-54-B specification.
In full-rate IS-54-B, each 30 kHz duplex radio channel is divided into three
time
slots known as a TDMA frame. Each TDMA frame may be occupied by three
different
users thereby increasing the capacity of the limited radio frequency spectrum.
Each
user is assigned a separate slot. A TDMA frame is shown in Figure 3. A first
user may be assigned slots 1 and 4, for example, to transmit uplink signals to
base station 108; slots 2 and 5 are used by the first user for receiving
downlink signals from base station 108. Slots 3 and 6 are used by the first
user for performing measurements of other channels for mobile assisted
hand-over (MAHO). A second user may be assigned
to
transmit uplink signals on slots 2 and 5 and a third user may be assigned to
transmit on
slots 3 and 6, and so forth.
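The slot assignment in this example follows a simple pattern, which may be sketched as below. The function name and the dictionary of first-user slot roles are hypothetical helpers introduced for illustration; only the slot numbers themselves come from the example above.

```python
def full_rate_uplink_slots(user):
    """Return the pair of uplink slots for user 1, 2 or 3, following the
    example assignment in the text (user n transmits on slots n and n + 3)."""
    if user not in (1, 2, 3):
        raise ValueError("the example covers three full-rate users")
    return (user, user + 3)


# Roles of the six slots as seen by the first user in the example.
FIRST_USER_SLOT_ROLES = {
    "uplink transmit": (1, 4),
    "downlink receive": (2, 5),
    "MAHO measurement": (3, 6),
}
```

For instance, `full_rate_uplink_slots(2)` gives slots 2 and 5, matching the second user in the example.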
Referring again to Figure 2, it is illustrated that the operation of variable
attenuators 203 and 212, switches 205 and 208, as well as mute block 213 is
controlled by decision logic 206. The operation of decision logic 206 is
illustrated by
the state machine shown in Figure 4. Since the TDMA frame is 20 milliseconds
in length, there is a 20 millisecond pause between each state in the state
diagram. The decision process starts at state S0AB with both uplink and
downlink paths open.
Returning to state S0AB of the state machine shown in Figure 4, if uplink
speech detector 202 transmits to decision logic 206 an indication that human
speech is
present in the speech frame, the state machine reverts to state S12A and the
decision
logic 206 engages block 213 which interrupts the downlink speech frame (i.e.,
a PCM
bitstream) flow to D/A converter 210 which silences downlink loudspeaker 104
(i.e.,
mutes the downlink.) Alternatively, the downlink speech frames could be
attenuated by
means of a fixed or variable attenuator. For each speech frame in which human
speech is detected, the state machine reverts back to state S12A. When uplink
speech detector 202 indicates to decision logic 206 that human speech is not
present in the speech frame, the state machine moves to the next lowest state
(e.g., S11A).
If no
human speech is detected in the speech frames by uplink speech detector 202
after 12
consecutive speech frames (i.e., states S11A-S1A), the state machine restarts
at state
SOAB, mute block 213 is reset to restore the downlink speech frame flow to D/A
converter 210, and the downlink path is thereby un-muted. By having 12 states,
there
is a 240 millisecond "hang-over" which allows any potential echo to completely
propagate through the landline and cellular communication system before the
downlink
path is unmuted.
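The uplink branch of the state machine, with its 12-frame hang-over, can be sketched as a simple countdown. This is an illustrative model only: the class and attribute names are hypothetical, a count of zero stands in for the state with both paths open, and the downlink branch of Figure 4 is not modeled.

```python
FRAME_MS = 20          # one speech frame per 20 ms, from the text
HANGOVER_STATES = 12   # states S12A down to S1A, from the text


class UplinkHangover:
    """Countdown model of the uplink branch; count 0 means both paths open."""

    def __init__(self):
        self.count = 0

    @property
    def state(self):
        return "S0AB" if self.count == 0 else f"S{self.count}A"

    def step(self, uplink_speech):
        """Advance one frame; return True while the downlink is muted."""
        if uplink_speech:
            self.count = HANGOVER_STATES   # revert to S12A on every speech frame
        elif self.count > 0:
            self.count -= 1                # move to the next lowest state
        return self.count > 0
```

After the last frame containing uplink speech, twelve speech-free frames must pass before `step` reports the downlink as unmuted, which corresponds to the 240 millisecond hang-over.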
During periods when the downlink is muted, but no human speech is detected
by uplink speech detector 202, decision logic 206 closes switch 205 to fill
noise buffer
211 with a speech frame representing the background noise. This may be updated
on
a periodic basis; the desired result is to have stored in noise buffer 211, a
speech frame
representative of the background noise in which the hands-free system is
operating.
Whenever human speech is detected by uplink speech detector 202, switch 205 is
opened.
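The condition under which the noise buffer is refreshed can be sketched as follows. The function name and its boolean arguments are hypothetical; closing switch 205 is modeled simply as copying the current speech frame into the buffer.

```python
def update_noise_buffer(noise_buffer, frame, downlink_muted, uplink_speech):
    """Copy the frame into the noise buffer only when the downlink is muted
    and no uplink speech is detected (switch 205 closed). Returns True if
    the buffer was updated."""
    if downlink_muted and not uplink_speech:
        noise_buffer[:] = frame   # overwrite in place with the new noise frame
        return True
    return False                  # switch 205 open: buffer left untouched
```

Because the buffer is only written during muted, speech-free frames, it should hold background noise that is free of both the user's speech and loudspeaker echo.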
From state S0AB (i.e., both uplink and downlink paths open) when human
speech is detected by downlink speech detector 207, the state machine reverts
to state
S10B and the decision logic engages variable attenuator 203. The attenuation
provided
by variable attenuator 203 may be gradually increased for each consecutive
indication
from downlink speech detector 207 that human speech is present in the downlink
speech frames. By applying attenuation incrementally in small steps, or all at
once in
a larger step, rather than completely muting the uplink speech frame, the user
of the
hands-free system may still be heard if he/she raises his/her voice to a level
well above
that of the background noise. Alternatively, the attenuation may be applied
all at once
in a larger increment of, for example, 14 dB. If the speech frames are
attenuated
gradually in variable attenuator 203, then the noise frame stored in noise
buffer 211 is
also incrementally attenuated in block 212 inversely proportional to the
attenuation
applied to the speech frame by block 203. Gradually "un-attenuating" the noise
frames
keeps the energy delivered to the landline user at a consistent level.
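One way to picture the gradual attenuation and the complementary noise level is the sketch below. The 14 dB maximum is the example figure from the text; the 2 dB step size, the function names, and the power-complementary rule for the noise gain (which assumes the speech frame and the stored noise frame are uncorrelated) are assumptions made for illustration, not details given in the description.

```python
import math

MAX_ATTEN_DB = 14.0   # example maximum attenuation from the text
STEP_DB = 2.0         # hypothetical per-frame increment


def speech_gain(consecutive_downlink_frames):
    """Linear gain applied to the uplink speech frame after the given number
    of consecutive downlink speech indications."""
    atten_db = min(consecutive_downlink_frames * STEP_DB, MAX_ATTEN_DB)
    return 10 ** (-atten_db / 20)


def noise_gain(g_speech):
    """Assumed complementary gain: keeps total noise power roughly constant
    if the live and stored noise are uncorrelated."""
    return math.sqrt(max(0.0, 1.0 - g_speech ** 2))


def mix(speech_frame, noise_frame, g_speech):
    """Attenuate the speech frame and add the scaled noise frame."""
    g_noise = noise_gain(g_speech)
    return [g_speech * s + g_noise * n
            for s, n in zip(speech_frame, noise_frame)]
```

As the speech gain falls toward its minimum, the noise gain rises, so the background level heard by the landline user stays approximately steady while loud speech from the hands-free user can still get through.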
Ambient background noise from noise buffer 211, attenuated by attenuator 212
so that it is of the proper energy level, is added to the speech frames in
noise adder 204
to produce uplink speech frames. The uplink speech frames are coupled to radio
transceiver 103 where they are processed as previously discussed. By
attenuating, rather than completely deleting or replacing, the speech frames,
the landline user is
able to hear if the hands-free user is trying to "cut-in" while still
providing sufficient echo
reduction. The output of the noise adder is referred to as the uplink speech
frame.
A further aspect of the present invention relates to how the contents of the
noise buffer 211 may be "randomized" to prevent any periodic modulation from
being
present in the uplink speech frames. When variable attenuator 203 is engaged,
decision logic 206 closes switch 208 which transfers the contents of noise
buffer 211 to
the noise addition block 204. An example of a noise frame is shown in Figure
5a. As
previously mentioned, the noise frame is a speech frame representing ambient
noise
which was stored previously in the noise buffer 211. For each attenuated
speech frame, the noise frame is randomly ordered using information generated
in random generator 209 before it is added to the attenuated speech frame. If,
for
example, there
are 160 samples (numbered 0-159) stored in noise buffer 211, the samples are
transferred to the adder 204 in a quasi-random order. For example, as shown in
Figure
5b, if sample number 3 (three) is selected as the starting point, then the
noise buffer is
emptied starting with sample 3, through sample 159. To prevent having a
constant
cross-over point, the remaining portion of the buffer locations are filled
starting with
another random location in the noise buffer 211. By randomizing the order of
the noise
frame, successive applications of the contents of noise buffer 211 will avoid
producing
a periodic modulation in the uplink speech frames that would result if the
same starting point was repeatedly used. It is obvious to one of ordinary skill
in the
art that other
techniques could be used to randomly order the noise buffer. However, the
intent of
this feature of the present invention is not to apply the same noise frame
repeatedly,
but to repeatedly apply randomly selected pieces of the same noise frame.
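The random ordering described above may be sketched as follows. The function name and the use of Python's `random` module in place of random generator 209 are illustrative choices, and wrapping around the end of the buffer when filling the remaining locations is an assumption, since the text does not say what happens if the second segment runs past the last sample.

```python
import random


def randomized_noise_frame(noise_buffer, rng=random):
    """Build a noise frame of the same length from two randomly chosen
    starting points in the stored noise buffer."""
    n = len(noise_buffer)
    start = rng.randrange(n)
    out = list(noise_buffer[start:])     # first segment: start .. last sample
    second = rng.randrange(n)            # independent second starting point
    remaining = n - len(out)
    # Fill the remaining locations from the second starting point,
    # wrapping around the buffer end (assumed behavior).
    out.extend(noise_buffer[(second + i) % n] for i in range(remaining))
    return out
```

Calling the function once per attenuated speech frame yields a different arrangement of the same stored samples each time, so neither the starting point nor the cross-over point repeats from frame to frame.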
It would also be advantageous to use this feature of the present invention
when
practicing the invention claimed by U.S. Patent 5,222,251. Therein is
described a
hands-free system where signals on the reverse path (or downlink) are replaced
by
codewords that are at the same level as the ambient noise. According to this
feature of
the present invention, the codewords described in U.S. Patent 5,222,251 could
be
randomly ordered with each successive application to prevent a periodic
modulative
effect.
An alternative embodiment of the present invention is illustrated in Figure 6
where the variable attenuator 203 and noise addition block 204 have been moved
to
the other side of the compression block 300. In this embodiment, attenuation
is applied
to the compressed speech frame, or speech codeword, and the contents of the
noise
buffer 211 must be compressed according to the particular algorithm being used
to
produce a codeword representing ambient noise, or noise codeword. The noise
codeword is added to, or otherwise used to modify, the attenuated speech
codeword. By performing acoustic echo reduction on the other side of the
compression block, the amount of memory needed to implement the invention may
be reduced. Depending upon the specific compression technique employed, the
benefit of reduced memory may be offset by additional complexity in
manipulating the codewords.
While the present invention has been described with respect to a particular
embodiment, those skilled in the art will recognize that the present invention
is not
limited to the specific embodiments described and illustrated herein.
Different
embodiments and adaptations besides those shown and described as well as many
variations, modifications and equivalent arrangements will now be reasonably
suggested by the foregoing specification and drawings without departing from
the
substance or scope of the invention. While the present invention has been
described
herein in detail in relation to its preferred embodiments, it is to be
understood that this
disclosure is only illustrative and exemplary of the present invention and is
merely for
the purposes of providing a full and enabling disclosure of the invention.
Accordingly, it
is intended that the invention be limited only by the spirit and scope of
the claims
appended hereto.