2058644
FIELD OF THE INVENTION:
This invention relates to the field of voice
activated systems, and in particular to voice activation
of call control systems such as telephone switching
equipment.
BACKGROUND TO THE INVENTION:
Voice recognition systems which could be used
to activate switching equipment or to operate station
equipment such as computers and telephone sets have been
known for some time. One technique to enable operation
of the equipment is to perform an analysis of the words
spoken, and to compare the results of the analysis with
a standardized database stored in mass storage memory.
In another technique, a learning process is used, in
which successive correct comparisons (hits) reinforce the
detection of common sounds of various users which are
spoken to generate a particular response. As the system
learns, the speed of locating the particular "hit"
increases.
Both of the above techniques, used separately
or in combination, require sophisticated analysis of the
spoken words, and are therefore costly in terms of
equipment and analysis time. Consequently they have
been used in centralized systems, where they can be
shared amongst many users. The complexity and cost of
such systems have been found to be very high.
Another form of system which has been used is
associated with each telephone set, with voice signals
stored at each telephone set. Users speak commands,
which are recognized in the telephone set based voice
recognition system by comparison of the voice command
signal with the stored audio signal.
Such systems are required to be provided for
each telephone set, which makes the telephone set very
costly.
SUMMARY OF THE PRESENT INVENTION:
The present invention is a centralized system,
in which the voices of designated users are digitized
and stored at a central location. Only the particular
words of a limited command vocabulary are stored. Each voice
signal is digitized, the digitized commands are grouped,
linked by a code associated with a user's telephone,
compressed and stored as data in a centralized mass
storage device.
When a user activates a telephone, the
associated data group is retrieved from storage,
decompressed and placed in a virtual memory. When the
user speaks one of the system commands, the spoken
command is digitized and compared against the stored
commands in the virtual memory. When a match is found
the associated system function is initiated. Typically
up to twenty system commands are used.
The present invention has several advantages
over the noted prior art systems. The equipment is
centralized, used in common with all users, therefore
avoiding the requirement that each telephone set should
have a recognition system and voice storage device built
into or connected to it. However, because only a
particularly designated (predetermined) group of
commands, associated with particular users, are stored,
sophisticated voice recognition and analysis equipment
for all possible voices and all possible words is not
required. Once the identity of a voice has been
determined, it is comparable against only a small number
of commands, e.g. twenty, and therefore determination of
words from a large dictionary for reproduction on
display or printing equipment is not required. The
establishment of a command "hit" of a small number of
commands causes the equipment to execute the command.
One may consider the comparison between a
voice recognition system which controls automatic typing
or display of words, with the present invention. In the
former system, analysis must be made of each word, a
comparison is made with a dictionary stored in a mass
storage device, and the word is displayed on a CRT
display or printer. In such a system the equipment must
be capable of discerning the words spoken for all
potential users (who may have different voices,
different accents, etc.). This could involve every
potential speaker of the language, e.g. millions of
persons. Clearly such equipment must be very
sophisticated and is costly to provide.
On the other hand, in the present invention,
certain specific commands are stored associated with a
single user. The comparison equipment need only find a
correspondence between the command spoken by that same
user and his previously stored command. Clearly the
process of comparison of the spoken command and the
stored command, to find a hit, is vastly simpler. The
equipment is used in common with all potential users of
their respective telephones, reducing the cost per user
even further.
Once a single user has been identified as
wanting to use the system, all of his or her potential
commands are placed in virtual memory, ready for
comparison of the command. In accordance with the
present invention, there are two steps to the retrieval
process, the recognition of a particular user, which
causes retrieval of all of that person's stored commands
to virtual memory, and the subsequent recognition of the
particular word sound corresponding to a command.
Consequently the process of determining the particular
commands spoken is vastly simpler than in the prior art
system described above.
In accordance with an embodiment of the
invention, a method of operating a central call control
apparatus by plural users is comprised of storing an
identification of each user in a directory, storing in
compressed form a plurality of predetermined commands
spoken by the users grouped by user, each group having
a pointer to a corresponding identification in the
directory, indicating an identification in the directory
of a service demanding user wishing to initiate a
command upon receipt of a service demanding signal,
retrieving a corresponding group of commands linked by
the pointer to the identification and decompressing and
storing the group of commands, receiving spoken command
words from the service demanding user, comparing the
spoken command words with the decompressed group of
commands to find decompressed command words, and
invoking commands for which command words are found
corresponding to compressed commands in the decompressed
group of commands.
In accordance with another embodiment, a
method of operating a voice activation system is
comprised of the steps of storing a plurality of system
control signals, automatically prompting a user by a
synthesized voice prompt to speak a sequence of
operation commands for the system, digitizing and
storing each spoken operation command in a first memory
with a pointer to a corresponding system control signal,
storing a group of the digitized operation commands and
pointers in a mass storage device with a link to an
identifier of the user, receiving an identifier signal
associated with the user, and in response, retrieving
the group of operation commands from the mass storage
device and storing the group in a second memory,
receiving spoken commands from the user and digitizing
the commands, comparing the digitized spoken commands
with the operation commands stored in the second memory
to find successive matches, and generating successive
system control signals corresponding to the matched
commands and utilizing associated pointers thereof to
operate the system.
In accordance with another embodiment, a voice
activation system is comprised of apparatus for storing
a plurality of system control signals, apparatus for
automatically prompting a user to speak a sequence of
operation commands for the system, apparatus for
digitizing each spoken operation command and storing
each digitized operation command in a first memory with
a pointer to a corresponding system control signal,
apparatus for storing a group of the digitized operation
commands and pointers in a mass storage device with a
link to an identifier of the user, apparatus for
receiving an identifier signal associated with the user,
and in response, retrieving the compressed group of
operation commands from the mass storage device,
apparatus for retrieving the group of operation commands
and pointers and storing them in a second memory,
apparatus for receiving spoken commands from the user
and for digitizing the commands, apparatus for comparing
the digitized spoken commands with the operation
commands stored in the second memory to find successive
matches, and apparatus for generating successive system
control signals corresponding to the matched commands,
utilizing associated pointers thereof, to operate the
system.
BRIEF INTRODUCTION TO THE DRAWINGS:
A better understanding of the invention will
be obtained by reference to the detailed description
below, in conjunction with the following drawings, in
which:
Figure 1 is a block diagram of the present
invention,
Figure 2 is a block diagram used to illustrate
the process and hardware of an embodiment of the present
invention,
Figure 3 is a block diagram illustrating a
process of establishing the system for a particular
user,
Figure 4 is similar to Figure 3 but used to
illustrate another aspect of the process,
Figure 5 is a block diagram used to illustrate
operation of the system in dialing a call, and
Figure 6 is a block diagram used to illustrate
operation of the system to implement a special feature
call.
DETAILED DESCRIPTION OF THE INVENTION:
Figure 1 illustrates, in block diagram form,
apparatus which can be used to implement an embodiment
of the invention. A representative telephone set 1 is
connected via a subscriber's line to a switching machine
3, which is controlled by a processor 5. Various trunks
and other peripherals are connected to the switching
machine 3 in a well known manner.
A peripheral connected to switching machine 3
in a manner in which it can be connected by the
switching machine to any of the telephone sets is a
digitizer 11. The output of a voice synthesizer 9 is
connected to the same peripheral terminal as the
digitizer 11. A memory is connected to the output of
the digitizer 11 containing a directory 13. A memory 15
is also connected to the output of digitizer 11, as well
as a compressor/decompressor 17. Additional memories,
referred to as voice boxes 19A-19N are connected to
compressor/decompressor 17. The input to the voice
synthesizer 9 is connected to the memory 15 for
receiving signals which can be converted to synthesized
speech, and applied to the peripheral terminal on the
switching machine 3.
Each of the voice synthesizer and memories is
connected for control to processor 5. Alternatively a
separate processor may be used.
The switching machine 3 may be implemented
using for example the Mitel SX2000, a system as
described in U.S. Patent 4,615,028. In such systems,
any of the peripherals can have access to other
peripherals for translation of circuit signals or
message signals, i.e. information including signalling
and control signals. Switching systems other than the
above-mentioned may be used.
Operation of the invention will be described
below with further reference to Figures 2-6.
In order to establish the operation of the
invention, a user alerts the system by pressing a
predetermined key on his telephone set 1. The switching
system, under control of processor 5, establishes a
connection between telephone set 1 and digitizer 11,
through to directory 13. Under control of the
processor, the identity of the telephone set (user) is
stored in the directory 13.
Turning now to Figure 3, it may be seen that
when the system is alerted as described above, it
establishes in the directory the identity of the
particular user. The identity of three users,
identified as User Code A, User Code B and User Code C
are shown as stored. As shown, when user B alerts the
system, under control of the microprocessor a "voice
box" (memory record space) for user B is established in
memory 15. Once this is established, a sequence of
stored prompts generate code which is passed to voice
synthesizer 9, which generates speech which is passed to
the telephone set, through the switching machine. The
prompts request the user to repeat the prompts, e.g. as
shown in Figure 3, to repeat the words "dial", "1", "2",
..., and a sequence of offered feature names, such as
"call forward".
The user repeats the prompts, which repeated
prompts are passed through digitizer 11, and are stored
as digital signals in the voice box for user B in the
system memory. The voice box of user B also contains,
with each digitized repeated prompt, a pointer to a
system command signal corresponding to each word, e.g. a
"dial" pointer, a "1" pointer, a "2" pointer, a
particular feature pointer, etc. These pointers are
used to initiate the generation of the command signal,
e.g. the generation of the digit 1, etc.
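
The enrollment step described above may be illustrated by the following minimal Python sketch. It is offered as an illustration under stated assumptions, not as the implementation of the invention: the names `PROMPTS`, `VoiceBox`, `enroll` and `fake_capture`, and the control-signal codes such as `CMD_DIAL`, are hypothetical, and the `capture` function merely stands in for the path through the voice synthesizer and digitizer.

```python
from dataclasses import dataclass, field

# Hypothetical prompt list: (word the user is asked to repeat, control-signal code).
PROMPTS = [("dial", "CMD_DIAL"), ("1", "DIGIT_1"), ("2", "DIGIT_2"),
           ("call forward", "FEATURE_CALL_FORWARD")]

@dataclass
class VoiceBox:
    user_code: str
    # Each entry pairs a digitized utterance with a pointer to a control signal.
    entries: list = field(default_factory=list)

def enroll(user_code, capture):
    """Prompt for each word and store the digitized reply with its pointer.

    `capture(word)` is a stand-in for the synthesizer/digitizer path: it is
    assumed to play the prompt and return the user's digitized repetition.
    """
    box = VoiceBox(user_code)
    for word, pointer in PROMPTS:
        samples = capture(word)
        box.entries.append((bytes(samples), pointer))
    return box

def fake_capture(word):
    # Stand-in for real audio: encode the word's text as bytes.
    return word.encode("ascii")

box_b = enroll("B", fake_capture)
```

In this sketch each stored entry carries its own pointer, so that a later match on the digitized word yields the control signal directly.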
As shown in Figure 4, once the repeated
prompts have been stored in the system, the digitized
voice and pointers (the voice box of the user) are
compressed in compressor/decompressor 17, and stored as
a group in a mass storage device with a link to the
system directory. This is repeated for each of the
users, as they come on stream.
Therefore in the system memory, a directory 13
contains a link by means of pointers to each user. In
the mass storage, there is a stored voice box for each
of the users, in which each voice box contains
compressed prompts of the voice of the corresponding
user, for each command. Each voice box contains a link
to the directory entry for the particular user.
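
The storage arrangement described above may be sketched as follows. The description names no particular compression method or record format, so the use of zlib and JSON here is an assumption made purely for illustration, as are the names `directory`, `mass_storage`, `store_voice_box` and `load_voice_box`.

```python
import json
import zlib

# Hypothetical stand-ins for the system directory and the mass storage device.
directory = {}      # user code -> key of the stored voice box
mass_storage = {}   # key -> compressed voice box (bytes)

def store_voice_box(user_code, entries):
    """Compress a voice box and link it to the user's directory entry.

    `entries` is a list of [digitized samples as a list of ints, pointer].
    zlib and JSON are illustrative choices only.
    """
    record = {"user_code": user_code, "entries": entries}
    key = "voicebox:" + user_code
    mass_storage[key] = zlib.compress(json.dumps(record).encode("utf-8"))
    directory[user_code] = key   # link from directory to the stored group
    return key

def load_voice_box(user_code):
    """Follow the directory link and decompress the stored voice box."""
    blob = mass_storage[directory[user_code]]
    return json.loads(zlib.decompress(blob).decode("utf-8"))

store_voice_box("B", [[[3, 7, 7, 2], "CMD_DIAL"], [[9, 1, 0, 4], "DIGIT_1"]])
restored = load_voice_box("B")
```

The stored record also carries the user code, which corresponds to the link from each voice box back to the directory entry.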
With reference to Figure 5, in order to
utilize the system, the user, e.g. user B, alerts the
system by pressing a telephone set key to generate a
multi-frequency tone or tones or by saying the user's
name, after the switching system 3 connects the
telephone set 1 to the particular peripheral line to
which digitizer 11 is connected. The signal resulting
from pressing the set key or saying the name is
digitized in digitizer 11, and is compared under control
of or by processor 5 with the data stored in the system
directory 13. The task is not onerous, since only those
names, in the voices of particular users, are stored in
directory 13. On finding a match, the link to the voice
box corresponding to the matched voice causes the voice
box to be retrieved. The voice box signals are
decompressed and stored in memory 15, preferably in
virtual memory in order to minimize the memory
requirements. Once the voice box has been retrieved,
the system is set for receiving spoken commands, and
microprocessor 5, recognizing that the user's voice box
has been stored in memory 15, causes dial tone to be
passed to the telephone set.
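
The activation sequence described above may be sketched as follows. This is a minimal illustration under stated assumptions: the `Session` class, the identifying signals such as `b"tone:B"`, and the textual voice-box format are hypothetical, and an exact lookup stands in for the comparison of the digitized signal with the directory, which the description does not specify in detail.

```python
import zlib

# Hypothetical data: directory maps an identifying signal to a voice-box key.
directory = {b"tone:B": "box_b", b"tone:C": "box_c"}
mass_storage = {
    "box_b": zlib.compress(b"dial=CMD_DIAL;1=DIGIT_1"),
    "box_c": zlib.compress(b"dial=CMD_DIAL"),
}

class Session:
    def __init__(self):
        self.working_memory = None   # decompressed voice box, once loaded
        self.dial_tone = False

    def activate(self, id_signal):
        """Match the identifying signal, load the voice box, give dial tone."""
        key = directory.get(id_signal)   # exact match stands in for comparison
        if key is None:
            return False
        raw = zlib.decompress(mass_storage[key]).decode("ascii")
        self.working_memory = dict(item.split("=") for item in raw.split(";"))
        self.dial_tone = True            # signal readiness to the user
        return True
```

Dial tone is raised only after the voice box is in working memory, mirroring the order of events in the description.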
The user then speaks the required words or
phrases, e.g. "dial", "7", "2", ... i.e. a command to
dial a particular telephone number. The voice signals
are digitized in digitizer 11, and are compared, by
means of microprocessor 5, with the data stored in the
decompressed voice box in memory 15 for that particular
user, e.g. in the example shown, user B. It will be
recognized that since in the voice of the particular
user particular commands have been prestored in the
voice box, the task of comparing the digitized command
signals with those in the voice box is significantly
easier and quicker than if an expected comparison of any
voice with any word were to be contemplated. The only
voice to be compared with is the actual voice giving the
commands.
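
The comparison of a spoken command with the small set of prestored commands may be illustrated as follows. The description does not state how the comparison is computed; the nearest-template search with a Euclidean distance and a fixed threshold shown here is an assumed stand-in, and the three-value feature vectors, the `threshold` value and the names `distance` and `match_command` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_command(spoken, voice_box, threshold=1.0):
    """Return the pointer of the closest stored command, or None.

    `voice_box` maps a pointer to the stored feature vector for that command.
    Only this user's handful of templates is searched.
    """
    best_pointer, best_dist = None, float("inf")
    for pointer, template in voice_box.items():
        d = distance(spoken, template)
        if d < best_dist:
            best_pointer, best_dist = pointer, d
    return best_pointer if best_dist <= threshold else None

# Hypothetical three-value "features" for user B's stored commands.
voice_box_b = {
    "CMD_DIAL": [0.9, 0.1, 0.3],
    "DIGIT_7":  [0.2, 0.8, 0.5],
    "DIGIT_2":  [0.4, 0.4, 0.9],
}
```

Because the search covers only the commands of one user, its cost grows with the size of the command vocabulary (typically about twenty entries) rather than with the number of users or words in the language.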
With the finding of a match of a command, the
pointer corresponding to the matched words, e.g. to
begin dialing followed by the designation of a
particular number to be dialed, is obtained. The
command pointers stored with the voice box commands are
retrieved by microprocessor 5, and are used to control
the switching system to execute the desired operation.
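
The use of the retrieved pointers to control the switching system may be sketched as follows. The pointer codes, the `run_commands` function and the `dial` method of the stand-in `FakeSwitch` class are hypothetical; the actual interface to the switching machine is not detailed in this description.

```python
def run_commands(pointers, switch):
    """Translate matched command pointers into calls on the switching system.

    `pointers` is the sequence produced by matching successive spoken words.
    `switch` is any object with a `dial(number)` method (hypothetical API).
    """
    digits = []
    dialing = False
    for pointer in pointers:
        if pointer == "CMD_DIAL":
            dialing = True              # following digits form the number
            digits = []
        elif pointer.startswith("DIGIT_") and dialing:
            digits.append(pointer[len("DIGIT_"):])
    if dialing and digits:
        switch.dial("".join(digits))

class FakeSwitch:
    """Stand-in for the switching machine; records what was dialed."""
    def __init__(self):
        self.dialed = []

    def dial(self, number):
        self.dialed.append(number)

switch = FakeSwitch()
run_commands(["CMD_DIAL", "DIGIT_7", "DIGIT_2", "DIGIT_5"], switch)
```

In this sketch digits spoken before the word "dial" are ignored, which is one possible policy and not one stated in the description.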
Figure 6 illustrates operation of the
invention when invoking a feature, and corresponds to
the lower portion of Figure 5. The figure illustrates
that a dial tone has been provided back to the user,
as described above.
However in this case, rather than saying
"dial" or a numeral, the user says the name of a
feature, e.g. "call forward". After digitizing, a
comparison is carried out in which the "call forward"
feature stored in the voice box of user B is found. In
this case the pointer points to a feature operation and
description list 21 in the mass storage, and retrieves
corresponding call forward command data signals stored
therein. The corresponding data signals pass into the
memory 15. Under control of microprocessor 5 accessing
the call forward operation and description data signals,
the user is interactively stepped through the required
feature operation steps. For example, the words "state
the number that you wish calls to be forwarded to and
then say the word TRANSFER" can be retrieved from memory
15, and passed to voice synthesizer 9, which translates
the sentence into an analog signal which is passed via the
subscriber's line to the user telephone 1. The user
then states the numbers, which are compared with the
stored numbers in the voice box stored in memory 15.
The resulting pointers, and the working of the call
forwarding feature as described above, are used by the
system to control the switching machine to forward future
calls to the designated telephone line.
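
The interactive feature procedure described above may be illustrated by the following minimal sketch. The names `FEATURE_LIST`, `run_feature` and `WORD_TRANSFER`, and the dictionary used to represent call forwarding in the switching machine, are assumptions made for illustration only; the prompt text is taken from the example above.

```python
# Hypothetical feature list: pointer -> (spoken prompt, word that ends input).
FEATURE_LIST = {
    "FEATURE_CALL_FORWARD": (
        "State the number that you wish calls to be forwarded to "
        "and then say the word TRANSFER",
        "WORD_TRANSFER",
    ),
}

def run_feature(pointer, matched_words, speak, forwards, user_line):
    """Prompt the user, collect digits until the end word, then set forwarding.

    `matched_words` is the sequence of pointers from the voice-box comparison;
    `speak` stands in for the voice synthesizer; `forwards` is a dict that
    stands in for the switching machine's forwarding table.
    """
    prompt, end_word = FEATURE_LIST[pointer]
    speak(prompt)
    digits = []
    for word in matched_words:
        if word == end_word:
            forwards[user_line] = "".join(digits)
            return True
        if word.startswith("DIGIT_"):
            digits.append(word[len("DIGIT_"):])
    return False   # end word never heard; nothing is changed

spoken_log = []
forwards = {}
ok = run_feature("FEATURE_CALL_FORWARD",
                 ["DIGIT_4", "DIGIT_1", "DIGIT_0", "WORD_TRANSFER"],
                 spoken_log.append, forwards, "line_B")
```

Forwarding is set only once the terminating word has been matched, so an incomplete number leaves the forwarding table unchanged.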
Thus the telephone system in effect mimics the
user friendly operation of the old fashioned telephone
operator, in which the user can speak and have the
system react and respond to his or her commands. Where
sophisticated features are to be used, the user hears,
in a friendly and understandable voice, instructions on
how to invoke the feature which instructions can be
carried out in a conversational procedure.
Because the system has stored predetermined
command words from particular users by voice prompts, it
is a relatively simple task to compare the spoken
commands with the stored voice signals relating to those
commands. The system can therefore operate at high
speed, and a centralized system can be used for a large
number of users. With the stored signals being located
in the described voice boxes in a compressed form, such
a system can be provided with relatively low cost to a
large number of users.
A person understanding this invention may now
conceive of alternative structures and embodiments or
variations of the above. All of those which fall within
the scope of the claims appended hereto are considered
to be part of the present invention.