Note: Descriptions are shown in the official language in which they were submitted.
1077627
FIELD OF THE INVENTION
The present invention relates generally to PCM (Pulse
Code Modulation) telecommunications and, more particularly, to
speech detection for use in a Time Assignment Speech Interpolation
system in which all the signals are expressed in PCM-coded
form and on a time-division basis; such a system is known in
the art as a PCM-TASI system.
BACKGROUND OF THE INVENTION
TASI systems are well-known and consist basically in
increasing the number of signal sources that can be switched
over a fixed number of transmission lines by connecting a
talker and a listener only when the talker is actually
speaking. One example of such a system is described in U.S.
patent No. 3,030,447 issued April 17, 1962 to Saal.
Most conventional detectors operate on the analog
(non-digital) vocal signal; they compute the mean
power value of the signal and compare this value with a
pre-determined decision threshold. More recent systems consist
in periodically sampling the amplitude of voice-frequency
signals and in translating these amplitude values into digital
form (see, for example, U.S. patent No. 3,712,959 granted Jan.
23, 1973 to Fariello and U.S. patent No. 3,832,491 granted Aug.
27, 1974 to Sciulli). However, the decision reached concerning
the status of a voice channel is based only on the amplitude of
the vocal signal and a distinction is made only between noise
and silence.
In present detectors, there is a certain delay before
the beginning of the identification of speech so as to prevent
undesired pulse noises which could cause the unwanted activation
of a transmission channel. This delay is required in order
to ensure that the talker has really begun to speak and
is an inverse function of the signal amplitude. This solution,
while avoiding false activation, reduces the intelligibility
of the message since there is a chopping of the consonants of
low amplitude which, however, contain very useful information.
Indeed, the differences between the sounds "ta" and "da"
or "pa" and "ba" are condensed in the first milliseconds.
Furthermore, in presently known detectors, since consonants
include a lot of information and since they are of low ampli-
tude, there is a tendency to consider as speech all signals
having a relatively low amplitude. This results in considering
as speech: white noises of various origins which are inherent
to all transmission channels; and echoes, i.e. vowels of high
amplitude which the other talker transmits and which, by interference,
are present in the channel under consideration. These
echoes are evidently reduced but have sufficient amplitude to
cause a reactivation.
OBJECTS OF THE INVENTION
An object of the present invention is to provide a
speech detection system that instantly recognizes the presence
or absence of speech without being affected by random noises.
It is a further object of the present invention to
provide a speech detection system whereby, when speech is
detected, the actual nature of speech may be known.
It is still a further object of this invention to
provide a speech detection system whereby, when no speech is
present on a channel, the type of silence or noise may be
known.
The present invention is concerned with a speech
detection system which analyses the digital vocal signal in real time
and which detects the presence or absence of speech. This
system makes it possible to control a group of telephone channels
based on silences during conversations. The present system
differs from prior systems by its capability of discriminating
speech from what is not speech rather than discriminating noise
from silence. The present speech detection system provides, at
all times, information on the nature of the speech: voiced
compact, voiced non-compact, and unvoiced. The system can thus
distinguish instantly the presence of short consonants,
thereby ensuring a greater intelligibility of the telephone
transmission.
STATEMENT OF THE INVENTION
The present invention relates to a method of speech
detection in a PCM multiplexed voice-channel system which
comprises: processing a predetermined batch of consecutive PCM
samples; sequentially computing a series of parameters during
processing of the predetermined batch of consecutive PCM
samples, the parameters relating to: the amplitude, zero
crossing, zero crossing of the derivative of the vocal signal;
and determining the status of each channel from information
received as a result of the computing of the parameters over
the batch.
Whereas a certain delay is required in presently
known detectors to avoid unwanted noises of short duration,
such delay is no longer needed in the present system since the
present system is capable of recognizing these noises.
Furthermore, white noises are now detected independently
of their amplitude; this is based on a characteristic
which distinguishes white noise from spoken sounds.
With the present invention, the voiced and unvoiced
signals are treated separately; this provides an immunity
against echoes, and the unvoiced signals are not affected by
this immunity. Hence, a voiced signal of insufficient amplitude
to be a legitimate voiced signal will immediately be
identified as an echo; on the other hand, the system will
remain extremely sensitive to unvoiced signals (consonants)
even of lower amplitude than that of an echo.
BRIEF DESCRIPTION OF THE DRAWINGS
A preferred embodiment will now be described with
reference to the accompanying drawings, in which:
Figure 1 is a block diagram of the speech detector
made in accordance with the present invention; and
Figure 2 is a schematic representation of the basic
principle of the decision stage of the present invention.
DESCRIPTION OF A PREFERRED EMBODIMENT
The speech detector of the subject invention
operates on PCM samples. Conventionally, the analog voice
information is applied to a PCM device which performs a
sampling, typically at an 8 kHz rate; each sample is subsequently
converted into an 8-bit binary code. In accordance
with the specific embodiment described herein, the 8-bit
samples are received in sub-assembly I in Fig. 1. The logarithm
of the amplitude of each sample is coded by an integer taken
between -127 and +127 (with a double zero: -0 and +0 for
symmetry purposes).
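As an illustration of the sample format, the following sketch splits an 8-bit code into a sign and a 7-bit magnitude, which keeps -0 and +0 distinct. The function names and the convention that a set most significant bit means "negative" are assumptions made for the example; the description does not specify the bit layout.

```python
def decode_sample(code: int) -> tuple[bool, int]:
    """Split an 8-bit sign-magnitude code into (is_negative, magnitude).

    Assumption (not stated in the description): bit 7 is the sign bit,
    1 meaning negative, and bits 0-6 hold the magnitude 0..127.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("expected an 8-bit code")
    is_negative = bool(code & 0x80)
    magnitude = code & 0x7F
    return is_negative, magnitude


def signed_value(code: int) -> int:
    """Collapse the code to an ordinary integer in -127..+127 (-0 becomes 0)."""
    is_negative, magnitude = decode_sample(code)
    return -magnitude if is_negative else magnitude
```

Keeping the sign as a separate flag is what allows a sign change to be detected even when one of the two samples has zero magnitude.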
The detector of the present invention operates on N
multiplexed voice channels. For channel n, the detector
computes four parameters from a batch of M consecutive samples.
Thus, as far as channel n is concerned, a new set of four
parameters is available every M samples. For a particular
channel, the parameters are the four positive integers defined
as:
a: the sum of the absolute values of the M samples:
$a = \sum_{j=1}^{M} |X_j|$, where $X_j$ is the integer corresponding
to the jth sample of the waveform;
zo: the number of zero crossings of the waveform, i.e. the
number of sign changes between consecutive samples;
z1: considering the sequence of the M differences between
consecutive samples (i.e. $X'_j = X_j - X_{j-1}$ for
j = 1, 2, 3, ..., M), z1 represents the number of sign
changes among these M differences; in the sequel, $X'$
will be referred to as the signal derivative;
d: it is z1 minus zo.
The status of channel n is decided on the sole basis
of the four integers along with its previous status.
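The four parameters can be accumulated in a single pass over the batch, as in the following minimal sketch. The function name `batch_parameters`, the idea of passing in the last sample and last derivative sign of the preceding batch, and the treatment of a zero value as non-negative are assumptions made for the example.

```python
def batch_parameters(samples, prev_sample=0, prev_diff_negative=False):
    """Return (a, z0, z1, d) for one batch of signed integer samples.

    Assumptions for this sketch: samples are plain integers in -127..127,
    zero counts as non-negative, and the caller supplies the last sample
    and last derivative sign of the preceding batch.
    """
    a = 0    # sum of absolute values
    z0 = 0   # sign changes of the waveform
    z1 = 0   # sign changes of the first difference
    last = prev_sample
    last_diff_negative = prev_diff_negative
    for x in samples:
        a += abs(x)
        if (x < 0) != (last < 0):
            z0 += 1
        diff_negative = (x - last) < 0
        if diff_negative != last_diff_negative:
            z1 += 1
        last = x
        last_diff_negative = diff_negative
    return a, z0, z1, z1 - z0
```

For example, `batch_parameters([3, 5, 4, 6])` gives `(18, 0, 2, 2)`: the waveform never changes sign, while its difference changes sign twice.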
For each channel, there are two operating modes.
First, there is the computation mode, which consists in
computing the values of a, zo, z1 and d; this is done sequentially,
as soon as the PCM samples arrive at the input of the
speech detector. Secondly, there is the decision mode, which
consists in providing a decision at the end of a predetermined
batch of M samples. However, in order to carry out these operations,
the parameters a, zo, z1 and d are truncated to become,
respectively, A, Zo, Z1 and D. The decision is then obtained
by means of three memories. In the embodiment described, the
same ROM memory of 256 binary inputs and 8 binary outputs is
consecutively used three times; this memory is divided into
three fields of 128, 64 and 64 binary inputs, respectively.
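The field sizes agree with the widths of the quantities that address them: a 3-bit D with a 4-bit Zo gives 128 entries, a 3-bit Z1 with a 3-bit A gives 64, and three binary inputs with a 3-bit AZ give 64. The sketch below shows one way the three fields could share a single 256-entry address space; the base addresses and the bit ordering within each field are hypothetical and are not taken from the description.

```python
def field1_address(d: int, zo: int) -> int:
    """Field #1: 3-bit D and 4-bit Zo, mapped to addresses 0..127."""
    assert 0 <= d < 8 and 0 <= zo < 16
    return (d << 4) | zo


def field2_address(z1: int, a: int) -> int:
    """Field #2: 3-bit Z1 and 3-bit A, mapped to addresses 128..191."""
    assert 0 <= z1 < 8 and 0 <= a < 8
    return 128 + ((z1 << 3) | a)


def field3_address(z: int, r: int, k: int, az: int) -> int:
    """Field #3: binary Z, R, K and 3-bit AZ, mapped to addresses 192..255."""
    assert z in (0, 1) and r in (0, 1) and k in (0, 1) and 0 <= az < 8
    return 192 + ((z << 5) | (r << 4) | (k << 3) | az)
```

With this layout a one-bit prefix selects field #1 and a two-bit prefix selects field #2 or field #3, so no two inputs collide.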
Figure 2 is a schematic representation of
the truncation of a, zo, z1 and d into A, Zo, Z1 and D.
A = 0, 1, 2, 3, 4, 5, 6, 7; it is the binary number
corresponding to the three highest bits of the binary number
(in 11 bits for M = 48) corresponding to a + Δ1, wherein Δ1 is
a constant which makes it possible to optimize the information contained
in A. For M = 48, for example, Δ1 = -20; for another value of
M, another value of Δ1 must be determined in order to maintain
as closely as possible the equivalence between a and A given in
the following Table 1a.
TABLE 1a
a               A
a ≤ 4           0
5 ≤ a < 12      1
12 ≤ a < 28     2,3,4
28 ≤ a          5,6,7
This value Δ1 may be made adjustable with the mean
level of a talker measured over a few seconds. This
renders the detector directly adaptable in amplitude, which
may represent an advantage in certain applications.
Zo = 0, 1, 2, 3, ..., 15 is the binary number corresponding
to the four highest bits of the binary number (in 5
bits for M = 48) corresponding to zo + Δ2. For M = 48, Δ2 is
equal to - ~2; for another value of M, another value of Δ2 must
be determined to satisfy the equivalence of Table 1b.
TABLE 1b
zo Zo
~ 2.6
2.6 1
Z1 = 0, 1, 2, 3, ..., 7 is the binary number corresponding
to the three highest bits of the binary number (in 6
bits for M = 48) corresponding to z1 + Δ3. For M = 48, Δ3 is
equal to +6; for another value of M, another value of Δ3 must
be determined to satisfy Table 1c.
TABLE 1c
z1                      Z1
z1 < M/2.6              0,1,2
M/2.6 ≤ z1 < M/1.8      3,4
M/1.8 ≤ z1              5,6,7
D = 0, 1, 2, ..., 7 is the binary number corresponding
to the three highest bits of the binary number (in 4 bits for
M = 48) corresponding to z1 - zo.
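The truncations described above all follow one pattern: add a constant, view the result as a fixed-width binary number, and keep its most significant bits. The sketch below illustrates this for M = 48, using the bit widths quoted in the text and the two constants that are legible there (-20 for a and +6 for z1). The function names and the saturation of out-of-range values are assumptions; the constant for zo is left as a parameter because its value is not legible in this copy.

```python
def keep_top_bits(value: int, width: int, keep: int) -> int:
    """Return the `keep` most significant bits of a `width`-bit number.

    Assumption: values outside 0 .. 2**width - 1 saturate at the bounds.
    """
    limit = (1 << width) - 1
    clamped = max(0, min(value, limit))
    return clamped >> (width - keep)


# Constants quoted in the description for M = 48.
OFFSET_A_M48 = -20
OFFSET_Z1_M48 = 6


def truncate_a(a: int) -> int:
    """A: top 3 bits of the 11-bit number a + offset."""
    return keep_top_bits(a + OFFSET_A_M48, width=11, keep=3)


def truncate_z1(z1: int) -> int:
    """Z1: top 3 bits of the 6-bit number z1 + offset."""
    return keep_top_bits(z1 + OFFSET_Z1_M48, width=6, keep=3)


def truncate_d(d: int) -> int:
    """D: top 3 bits of the 4-bit number z1 - zo."""
    return keep_top_bits(d, width=4, keep=3)


def truncate_zo(zo: int, offset: int) -> int:
    """Zo: top 4 bits of the 5-bit number zo + offset (offset supplied by caller)."""
    return keep_top_bits(zo + offset, width=5, keep=4)
```

Because only a right shift and a clamp are involved, the same helper can be reused for another batch length M by changing the widths and constants.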
The four new integers are processed two by two.
The memory field #1, which receives inputs D and Zo,
provides two output binary parameters R = 0,1 and Z = 0,1
as in Table 1d.
TABLE 1d
zo, z1 or d                 R
1.18 ≤ z1/zo ≤ 1.42         0
If not                      1
It should be noted that R is a function of the ratio z1/zo;
this value is easily obtainable from the parameters d and zo
(since z1/zo = 1 + d/zo), which are sufficiently approximated
by D and Zo. In essence, R identifies the presence of white noise.
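The ratio test of Table 1d can be written directly on the untruncated counts, as in this minimal sketch. It is not the memory lookup on D and Zo used in the embodiment; the function name `r_flag` and the choice of returning 1 when zo is zero (where the ratio is undefined) are assumptions.

```python
def r_flag(zo: int, z1: int) -> int:
    """Return 0 when 1.18 <= z1/zo <= 1.42, otherwise 1.

    Integer arithmetic avoids a division; zo == 0 is treated as
    "outside the band" (an assumption, since the ratio is undefined).
    """
    if zo <= 0:
        return 1
    in_band = 118 * zo <= 100 * z1 <= 142 * zo
    return 0 if in_band else 1
```

For instance, zo = 20 and z1 = 26 give a ratio of 1.3 and therefore R = 0, whereas zo = 20 and z1 = 40 give R = 1.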
The memory field #2, which receives inputs Z1 and A,
provides an output binary number AZ = 0, 1, ..., 6 of 3 bits in
accordance with Table 2.
TABLE 2
          Z1
A     0  1  2  3  4  5  6  7
0     0  0  0  0  0  0  0  0
1     1  1  1  4  4  6  6  6
2     2  2  2  5  5  6  6  6
3     2  2  2  5  5  6  5  6
4     2  2  2  5  5  6  6  6
5     2  2  2  3  3  6  6  6
6     2  2  2  3  3  6  6  6
7     2  2  2  3  3  6  6  5
AZ = f(A,Z1)
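Memory field #2 amounts to a two-dimensional lookup, which can be sketched as follows. The entries are transcribed from Table 2 as it appears in this copy; reading the rows as A and the columns as Z1 is an assumption (it is consistent with the first row, zero amplitude, always giving AZ = 0), and the name `az_lookup` is hypothetical.

```python
# Rows are indexed by A (0..7), columns by Z1 (0..7).
# Entries transcribed from Table 2; the row/column assignment is an assumption.
AZ_TABLE = (
    (0, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 1, 4, 4, 6, 6, 6),
    (2, 2, 2, 5, 5, 6, 6, 6),
    (2, 2, 2, 5, 5, 6, 5, 6),
    (2, 2, 2, 5, 5, 6, 6, 6),
    (2, 2, 2, 3, 3, 6, 6, 6),
    (2, 2, 2, 3, 3, 6, 6, 6),
    (2, 2, 2, 3, 3, 6, 6, 5),
)


def az_lookup(a: int, z1: int) -> int:
    """Return AZ = f(A, Z1) for truncated inputs in 0..7."""
    if not (0 <= a < 8 and 0 <= z1 < 8):
        raise ValueError("A and Z1 must each be in 0..7")
    return AZ_TABLE[a][z1]
```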
The memory field #3 receives inputs K, R, Z and AZ
(K and R being two binary parameters, the derivation of which
will be described hereinbelow); it provides, first, an intermediate
parameter K = 0,1, the value of which with respect to
the inputs is given in Table 3a:
TABLE 3a
                AZ
Zo  R  K    0  1  2  3  4  5  6
0   0  0    1  1  0  0  0  0  0
0   0  1    1  1  0  0  0  0  0
0   1  0    1  1  0  0  0  0  1
0   1  1    1  1  0  0  0  0  1
1   0  0    1  0  0  0  0  0  0
1   0  1    1  0  0  0  0  0  0
1   1  0    1  1  1  1  1  1  1
1   1  1    1  1  1  1  1  1  1
K = f(Zo,R,K,AZ)
TABLE 3b
                AZ
Zo  R  K    0  1  2  3  4  5  6
0   0  0    5  1  1  2  3  3  4
0   0  1    5  7  1  2  6  3  4
0   1  0    5  1  1  2  3  3  3
0   1  1    5  7  1  2  6  3  ?
1   0  0    5  4  4  4  4  4  4
1   0  1    5  4  4  4  4  4  4
1   1  0    5  3  3  3  3  3  3
1   1  1    5  6  6  6  6  6  6
S = f(Zo,R,K,AZ)
On the other hand, memory field #3 provides the
status information S = 1, 2, ..., 7, the value of which with
respect to the inputs is given in Table 3b. This status may
be conveniently described by seven binary variables referenced
V, CM, NV, FR, SL, WN and EC, which take the values 0 or 1
according to Table 4a.
TABLE 4a
STATUS    OUTPUT INFORMATION               IDENTIFIED WAVEFORM
NUMBER    CORRESPONDING TO STATUS
S         V  CM  NV  FR  SL  WN  EC       CHANNEL   TYPE OF SPEECH
1         1  1   0   0   0   0   0        active    Voiced compact
2         1  0   0   0   0   0   0        active    Voiced non-compact
3         0  0   1   0   0   0   0        active    Unvoiced, non-fricative
4         0  0   1   1   0   0   0        active    Unvoiced, fricative
5         0  0   0   0   1   0   0        passive   Silence
6         0  0   0   0   1   1   0        passive   White noise
7         0  0   0   0   1   0   1        passive   Echo
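The correspondence between the status number and the seven binary variables can be sketched as a small mapping. The flag assignments follow Table 4a; the names `STATUS_FLAGS`, `status_bits` and `channel_is_active` are hypothetical.

```python
FLAG_NAMES = ("V", "CM", "NV", "FR", "SL", "WN", "EC")

# For each status number S, the flags that are set to 1 (from Table 4a).
STATUS_FLAGS = {
    1: ("V", "CM"),    # voiced compact
    2: ("V",),         # voiced non-compact
    3: ("NV",),        # unvoiced, non-fricative
    4: ("NV", "FR"),   # unvoiced, fricative
    5: ("SL",),        # silence
    6: ("SL", "WN"),   # white noise
    7: ("SL", "EC"),   # echo
}


def status_bits(s: int) -> dict:
    """Return the seven binary variables for status number s (1..7)."""
    set_flags = STATUS_FLAGS[s]
    return {name: int(name in set_flags) for name in FLAG_NAMES}


def channel_is_active(s: int) -> bool:
    """Statuses 1 to 4 mark the channel active, 5 to 7 passive."""
    return s in (1, 2, 3, 4)
```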
The subscript j is given to the parameters and to the
decisions pertaining to the present batch of M samples, and
j-1, j-2 to those of the preceding batches. Therefore, R and K
may be defined by the following logic equations: $R_j = R_j$ "or"
$R_{j-1}$ and $K_j = K_{j-1}$ "and" $K_{j-2}$ (where "and" and "or" are
the operators of Boolean logic), the right-hand sides being the
values delivered by the memory for the batches indicated.
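The two logic equations can be read as short sequence tests over successive batches, sketched below for one channel. The class name, the method name and the all-zero initial state are assumptions; the sketch also assumes that the right-hand sides are the raw memory outputs of the batches indicated.

```python
from collections import deque


class SequenceTest:
    """Per-channel sequence tests on R (OR) and K (AND), as a sketch.

    Assumptions: histories start at 0, and the inputs to step() are the
    raw values read from the memory for the current batch.
    """

    def __init__(self):
        self._r_prev = 0                         # raw R of batch j-1
        self._k_hist = deque([0, 0], maxlen=2)   # raw K of batches j-2, j-1

    def step(self, r_raw: int, k_raw: int) -> tuple[int, int]:
        """Return (R_j, K_j) for the current batch, then update the histories."""
        r_out = r_raw | self._r_prev
        k_out = self._k_hist[0] & self._k_hist[1]
        self._r_prev = r_raw
        self._k_hist.append(k_raw)
        return r_out, k_out
```

Under this reading, R stays at 1 for one extra batch after the raw value drops, while K becomes 1 only after two consecutive raw values of 1.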
The ultimate decision, Sj*, concerning the status of
a channel after the analysis of batch j is given in Table 4b.
TABLE 4b
                Sj
Sj-1    1  2  3  4  5  6  7
1       1  1  3  4  5  6  7
2       2  2  3  4  5  6  7
3       1  2  3  3  5  6  7
4       1  2  4  4  5  6  7
5       1  2  3  4  5  6  7
6       1  2  3  4  5  6  7
7       1  2  3  4  5  6  7
Sj* = f(Sj, Sj-1)
Sj* is a function of the status Sj given by the memory
field #3 as well as of the status Sj-1 which was identified
by the same memory for the preceding batch. Sj* is equal to Sj,
except in a few cases where it is equal to Sj-1. These
exceptions correspond to a minor refinement of the decision
concerning the type: voiced, compact or non-compact; or unvoiced,
fricative or non-fricative.
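Table 4b reduces to a simple rule with four exceptions, which the following sketch makes explicit. The function name `final_status` is hypothetical, and the rows of the table are taken to be the previous status Sj-1.

```python
def final_status(previous: int, current: int) -> int:
    """Return Sj* from the previous status Sj-1 and the current status Sj.

    The decision follows the current status, except that a change between
    the two voiced types (1, 2) or between the two unvoiced types (3, 4)
    is ignored and the previous type is retained.
    """
    if not (1 <= previous <= 7 and 1 <= current <= 7):
        raise ValueError("statuses must be in 1..7")
    keep_previous = {(1, 2), (2, 1), (3, 4), (4, 3)}
    if (previous, current) in keep_previous:
        return previous
    return current
```

For example, `final_status(1, 2)` returns 1, whereas `final_status(5, 2)` returns 2.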
Referring to Figure 1, the detector made in
accordance with the present invention includes fifteen sub-assemblies
which are referenced in Roman numerals. The output
of a sub-assembly is referred to by its Roman numeral, followed by
the subscript: 1, 2, 3 ... .
A description of each sub-assembly and of its
function will now be given. Standard digital integrated
circuits well known to the person skilled in this art may be
used to perform these functions and a detailed description
thereof is believed not to be necessary for a full
understanding of the present invention.
SUB-ASSEMBLY I
This sub-assembly receives the PCM samples of the
waveform which constitute the input to the detector and
computes sequentially the differences corresponding to the
derivative of the signal. The sequential operation of the
speech detector makes it possible to keep in the memory of this
sub-assembly only one PCM sample per channel and the sign of the
derivative.
SUB-ASSEMBLY II
For each channel, this sub-assembly detects the
zero crossings of the waveform by comparing the signs of two
successive samples and computing the sum (zo) of a batch of
M samples.
SUB-ASSEMBLY III
For each channel, this sub-assembly computes the
difference (d) between the number of zero crossings (zo) of
the signal and the number of zero crossings of the derivative
(z1) for a batch of M samples.
SUB-ASSEMBLY IV
For each channel, this sub-assembly detects the
zero crossings of the derivative of the signal by comparing
the signs of two successive samples and computing the sum (z1)
for a batch of M samples.
SUB-ASSEMBLY V
For each channel, this sub-assembly takes the abso-
lute value of the amplitude of each sample of the signal and
computes the sum (a) thereof for a batch of M samples.
SUB-ASSEMBLY VI
For each successive channel, this sub-assembly
effects a quantification or truncation on zo, which comes
from sub-assembly II and becomes Zo, and keeps it in memory
with a format of 4 bits. It also effects a quantification
on d which comes from sub-assembly III and becomes D, and
keeps it in memory with a format of 3 bits. It further
includes a one bit memory for the addressing of sub-assembly X.
SUB-ASSEMBLY VII
For each successive channel, this sub-assembly
effects a quantification on z1, coming from sub-assembly IV,
which becomes Z1, and keeps it in memory with a format of 3
bits. It also effects a quantification on a, coming from the
sub-assembly V, which becomes A, and keeps it in memory with
a format of three bits. It further includes a two bit memory
for the addressing of sub-assembly X.
SUB-ASSEMBLY VIII
For each successive channel, it keeps in memory the
outputs of sub-assemblies XI and XII and the outputs X2 to X5
of sub-assembly X. It includes a two-bit memory for the
addressing of sub-assembly X.
SUB-ASSEMBLY IX
For each channel, this sub-assembly successively
directs the outputs of sub-assemblies VI, VII, VIII
to the inputs of sub-assembly X.
SUB-ASSEMBLY X
This sub-assembly consists of a read-only memory
(ROM) including three fields respectively addressed by sub-assemblies
VI, VII, VIII. The parameters R and Z resulting
from the memory field #1 are the outputs X1 and X2, which
respectively constitute the inputs of sub-assemblies XII and
VIII. The memory field #2 gives parameter AZ on outputs X3,
X4, X5, thereby completing the input of sub-assembly VIII.
The information with respect to the status V, NV, SL, WN, EC
resulting from memory field #3 is available on X2, X4, X6,
X7 and X8 and is entered in sub-assembly XV, whereas the parameters
CM and FR on outputs X3 and X5, respectively, are
entered in sub-assemblies XIII and XIV. The parameter K on
output X1 constitutes the input of sub-assembly XI.
SUB-ASSEMBLY XI
For each channel, it provides a sequence test on
parameter K between two consecutive batches of M samples;
this sub-assembly may include a pair of shift registers and
an "AND" gate.
SUB-ASSEMBLY XII
For each channel, it provides a sequence test for
parameter R between two consecutive batches of M samples;
this sub-assembly may include a shift register and an "OR"
gate.
SUB-ASSEMBLY XIII
For each channel, it provides a sequence test on
the results NV and FR between two consecutive batches of M
samples. This sub-assembly may include a pair of shift
registers and a two-input selector.
SUB-ASSEMBLY XIV
For each channel, it provides a sequence test on the
results V and CM between two consecutive batches of M samples.
This sub-assembly may include a pair of shift registers and a
two-input selector.
SUB-ASSEMBLY XV
For each successive channel, it keeps in memory the
results V, CM, NV, FR, SL, WN, EC and makes them available
during the time allotted to a channel. It may include a shift
register which serves as a buffer memory for the results
obtained.
It is to be understood that the above described
arrangements are merely illustrative of numerous and varied
other arrangements which may form applications of the princi-
ples of the invention both in the calculation and in the
decision (e.g. several distinct memories, use of
microprocessors, etc.). It is evident that these other arrangements
may readily be devised by persons skilled in the art without
departing from the spirit and scope of the present invention.