SYSTEM AND METHOD FOR INTERACTING WITH A COMPUTER SYSTEM
TECHNICAL FIELD
[0001] The present invention relates to a system and method for interacting
with a computer.
BACKGROUND
[0002] There commonly exist a number of control systems and techniques
to control computers by use of voice commands. Such systems control the
computer with unidirectional commands and, as such, emulate the typing of text
or the motion of a mouse. Voice commands are used to create a one-way
interaction that is ultimately converted into mouse and/or keyboard commands.
Such systems lack knowledge of the overall processes that the user is
controlling and of the user's short- and long-term environment, as well as the
analysis needed to be more generally usable in the context of bi-directional
vocal interactions used for process control.
[0003] There is thus a need for a truly bi-directional vocal interaction
system.
SUMMARY
[0004] It is an object of the present invention to provide a system and
method for processing one or more requests made by a user to a computer
through vocal or audio requests, or other acoustic waves, and further used to
exchange information between the user and the computer until the requests of
the
user have been processed.
[0005] It is another object of the present invention to provide means of
collecting information pertaining to a computer, which may be remotely
connected,
and controlling such computer by use of an interaction system that acts as an
interface between the user and the computer, such interaction system being
connected with the computer by means of at least one communication link.
[0006] In accordance with the present invention, there is provided an
interaction system with optional Internet connectivity, usually comprising a
microphone, a speaker, a central processing unit with associated memory and
hardware and software needed for the purpose of decoding vocal or audio
requests and additional information that have been provided by the user and,
as
applicable, communicating with a computer to be controlled via a communication
link, processing those requests, and providing the computer response to the
user
in the form of an audio sound wave, or any other form of information that is
suitable to the user. Advantageously, the communication link may take the form
of
a USB bus connection.
More specifically, there is provided a method for interacting with a computer,
comprising:
acquiring an audio request from a user;
decoding the audio request;
emulating computer commands;
providing the emulated computer commands to the computer;
receiving feedback from the computer;
converting the feedback into an audio response; and
providing the audio response to the user.
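By way of illustration only, the above method may be sketched in Python as follows; the helper functions are placeholders for the components described in the detailed description, and their names are not part of the method itself:

def acquire_audio():
    # Placeholder: capture a vocal or audio request from the microphone.
    raise NotImplementedError

def decode_request(audio):
    # Placeholder: speech-to-text decoding of the audio request.
    raise NotImplementedError

def emulate_commands(request):
    # Placeholder: translate the decoded request into mouse/keyboard commands.
    raise NotImplementedError

def send_to_computer(commands):
    # Placeholder: provide the emulated commands over the communication link.
    raise NotImplementedError

def receive_feedback():
    # Placeholder: read back the computer video monitor data.
    raise NotImplementedError

def to_audio_response(feedback):
    # Placeholder: text-to-speech conversion of the feedback.
    raise NotImplementedError

def play_to_user(response):
    # Placeholder: output the audio response on the speaker.
    raise NotImplementedError

def process_one_request():
    audio = acquire_audio()                 # acquiring an audio request from a user
    request = decode_request(audio)         # decoding the audio request
    commands = emulate_commands(request)    # emulating computer commands
    send_to_computer(commands)              # providing the emulated commands to the computer
    feedback = receive_feedback()           # receiving feedback from the computer
    response = to_audio_response(feedback)  # converting the feedback into an audio response
    play_to_user(response)                  # providing the audio response to the user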
BRIEF DESCRIPTION OF THE FIGURES
[0007] Embodiments of the invention will be described by way of examples
only with reference to the accompanying drawings, in which:
[0008] Figure 1 is a schematic view of a user interacting with a computer
through an interaction system in accordance with an illustrative example of
the
present invention; and
[0009] Figure 2 is a detailed schematic view of the interaction system of
Figure 1.
DETAILED DESCRIPTION
[0010] Generally stated, the non-limitative illustrative embodiment of the
present invention provides a system and method for interacting with a computer
including means of interactions and control between a user and a computer by
use
of vocal and audio requests as well as audio feedback. These means of
interactions and control may be used as the only means of data entry and data
output, or, as the context requires, in combination with mouse, keyboard and a
display, or any other form of user interface.
[0011] As used in this specification, the term "request" is intended to mean any
command, series of commands or any speech/sound to be analyzed and
interpreted in order to be converted into one or more commands. Furthermore, the
expression "vocal and audio requests" is intended to mean requests provided
through sound or acoustic waves transmitted through any medium.
[0012] The interaction system described herein provides means of collecting
supplemental information about the user and his/her environment, which may include a
computer on which processes are executed, and an external data access port
which may be connected to the Internet for further information access. The
means
of collecting supplemental information includes a microphone, a computer
display
input and, optionally, a video camera and/or one or more data entry port(s).
[0013] Furthermore, the interaction system provides means for storing and
analyzing the collected information over a period of time in a contextual memory,
and for controlling the computer and the execution thereon of various
applications
(e.g. software). The means for controlling the computer include mouse and
keyboard emulation commands, in conjunction with analysis of the computer
response through its display.
[0014] The advantage of using mouse and keyboard emulation commands
based on a computer display image (i.e. computer video monitor data) analysis
over traditional application interaction methods is that nearly all
applications that
run on a personal computer also incorporate a keyboard and mouse input, and a
display output monitor. The interaction system is therefore directly applicable
to controlling, through voice, many existing processes that may be running on
the computer.
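As a non-limiting sketch of this approach, the example below assumes the third-party pyautogui package for the emulated mouse and keyboard input (the specification does not prescribe any particular library) and a hypothetical locate_on_display() helper standing in for the display image analysis:

import pyautogui

def locate_on_display(element_name):
    # Hypothetical placeholder: analyse the computer video monitor data and
    # return the (x, y) screen coordinates of the named user-interface element.
    raise NotImplementedError

def click_element(element_name):
    # Emulate a mouse move and click on an element found through display analysis.
    x, y = locate_on_display(element_name)
    pyautogui.moveTo(x, y)
    pyautogui.click()

def type_text(text):
    # Emulate keyboard input into the currently focused window.
    pyautogui.write(text, interval=0.05)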
[0015] Referring to Figure 1, there is shown an illustrative example of a user
(5) interacting with a computer (110) through an interaction system (42), the two
being interconnected by a communication link (100), for example a USB bus.
[0016] The interaction system (42) may be powered using an integrated
power source or power input, or in the case where the interaction system (42)
is
connected to the computer (110) via a USB bus, use the power provided by the
USB bus, which includes a bus voltage connection (Vbus), a positive data (D+)
connection, a negative data (D-) connection and a reference voltage (Vref).
Optionally, the interaction system (42) may also include a power pack (60)
that is
recharged by the power provided by the USB bus.
[0017] The user (5) receives and transmits information from/to the
interaction system (42) via vocal or audio requests using, for example, a
microphone (1) and a speaker (2). It is to be understood that the user (5)
should
be positioned at a suitable distance from the microphone (1) and speaker (2)
with
respect to the operational volume adjustments of the interaction system (42)
in
order to issue and receive coherent requests and messages from/to the
interaction
system (42). It is further to be understood that in an alternative embodiment,
the
microphone (1) and speaker (2) may be remotely located. For example, the user
(5) may use a telephone to remotely communicate with the interaction system
(42),
which may include a suitable interface.
[0018] The data processing components (70) which provide the interaction
system (42) with its functionalities will be further detailed below.
[0019] The computer (110) includes software components to receive
keyboard (830) and mouse (820) command inputs, which are transmitted by the
interaction system (42) on the communication link (100). The computer (110)
also
includes software components to relay the computer (110) video monitor data
(810) to the interaction system (42). For greater clarity, the computer (110)
video
monitor data includes the information which is normally displayed by the
operating
system (such as, for example, Windows™) on the computer (110) video monitor
for use by the user (5), in conjunction with a mouse and keyboard, to receive
information from the computer (110) and to control the computer (110).
[0020] The video monitor data from the computer (110) provides feedback
which is analyzed by the interaction system (42) in order to determine whether the
desired control has been obtained once a command (i.e. a setpoint) has been
issued.
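Such a feedback check could, for example, be sketched as follows, where capture_display() and displays_match() are hypothetical placeholders for the display acquisition and analysis components:

import time

def capture_display():
    # Hypothetical: return the current computer video monitor data as an image.
    raise NotImplementedError

def displays_match(image, expected_state):
    # Hypothetical: return True when the display shows the expected result.
    raise NotImplementedError

def setpoint_obtained(expected_state, timeout_s=5.0, poll_s=0.2):
    # Poll the video feedback until the desired control is obtained or a timeout expires.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if displays_match(capture_display(), expected_state):
            return True
        time.sleep(poll_s)
    return False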
[0021] It is to be understood that the computer (110) may be physically
located near the interaction system (42) or may be remotely located using a
communication link (100) such as an Internet or wireless connection.
Interaction system
[0022] Referring now to Figure 2, there is shown a detailed schematic view
of the interaction system (42) of Figure 1 in accordance with an illustrative
example of the present invention. The interaction system (42) includes a USB
concentrator/hub (50) used for multiplexing the information leaving/entering the
central processing unit (30), namely USB mouse (513) and USB keyboard (514)
commands leaving through the main USB bus (100), and for receiving the computer
(110) video data (515) and, optionally, additional data (516) extracted from
the main USB bus (100) and directed towards the central processing unit (30),
either directly or through an optional video processor (15). The
concentrator/hub
(50) may optionally be provided with one or more ports that can be made
available
for the connection of devices which are external to the interaction system
(42) and
which need connectivity with the main USB bus (100).
[0023] The central processing unit (30) collects contextual information
obtained from the user (5) and the environment by various means such as vocal
or
audio requests provided by the user (5) through the microphone (1), data
obtained
from the computer (110) through the remote computer video data (515), other
data
(516) obtained from the main USB bus (100) through the USB concentrator hub
(50) and visual information collected through an optional camera (20). The
central
processing unit (30) saves appropriate contextual information in an associated
contextual memory (40) and executes resident application program instructions
(39), and optionally (11) and (12), required to process the user (5) requests.
From
time to time, the central processing unit (30) may transmit part or all of the
contextual memory (40) information to an external device or system through an
optional external data access port (4) or through the USB concentrator hub
(50) for
the purpose of saving and later recovering the previously saved contextual
memory information.
[0024] An optional video processor (15) receives the remote computer video
monitor data (515) and optional camera signal (221). The video processor (15)
includes a video comparator which highlights the changes occurring between
successive frames of the received video data coming from the selected source,
using a frame memory buffer which contains information stored from previous
video frames, and transmits the processed video information (525) to the central
processing unit (30).
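One possible form of the video comparator is sketched below, assuming colour frames delivered as numpy arrays of shape (height, width, 3); the specification does not fix any particular frame format:

import numpy as np

class VideoComparator:
    # Highlights the changes between the current frame and the frame memory
    # buffer, which holds information from the previous video frame.
    def __init__(self, threshold=10):
        self.frame_buffer = None
        self.threshold = threshold

    def highlight_changes(self, frame):
        # Return a boolean mask marking the pixels that changed since the last frame.
        if self.frame_buffer is None:
            changes = np.zeros(frame.shape[:2], dtype=bool)
        else:
            diff = np.abs(frame.astype(np.int16) - self.frame_buffer.astype(np.int16))
            changes = diff.max(axis=-1) > self.threshold
        self.frame_buffer = frame.copy()
        return changes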
[0025] Speech-to-text conversion software (16) is used for translating
microphone (1) input sound waves (vocal or audio requests) emitted by the user
(5) into textual information used by the central processing unit (30) to decode
user (5) requests. USB mouse (13) and USB keyboard (14) emulation software
components convey control commands issued by the central processing unit (30)
to the computer (110) via the USB concentrator hub (50) output (100).
[0026] Conversely, text-to-speech conversion software (17) is used for
converting textual information conveyed by the central processing unit (30) to the
the
user (5), as sound waves (vocal or audio messages) emitted by the speaker (2)
through an audio interface (38).
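For purposes of illustration, the two conversion components could be implemented along the following lines, assuming the third-party speech_recognition and pyttsx3 packages; the specification does not prescribe any particular speech engine:

import speech_recognition as sr
import pyttsx3

def speech_to_text():
    # Capture one vocal or audio request from the microphone and return it as text.
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def text_to_speech(message):
    # Convey a textual response to the user as sound waves on the speaker.
    engine = pyttsx3.init()
    engine.say(message)
    engine.runAndWait()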
[0027] An optional application controller (10) and learn mode switch (3) may
be used to select, based on the information provided by the learn mode switch
(3),
which of the executable code stored in memory locations (11) or (12) will be executed
by
the central processing unit (30) to process user (5) requests. These memory
locations (11) and (12), when present, contain alternate versions of the
executable
code that may be downloaded by the application controller (10) or by the
central
processing unit (30), at given time intervals or upon notification that a new
version
of the executable code is available, into the interaction system (42) through
the
external data access port (4) and is executed by the central processing unit
(30) to
process user requests. In the absence of such application controller (10) and
such
optional memory locations (11) and (12), the central processing unit (30) uses
the
resident application program memory (39) executable code to process user
requests. It is to be understood that the learn mode switch (3) may be a
physical
switch or be implemented in software and be, for example, activated through a
vocal or audio request.
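By way of example only, the selection performed by the application controller (10) could be sketched as follows, the argument names being illustrative:

def select_program(learn_mode, program_11=None, program_12=None, resident_program=None):
    # Choose, based on the learn mode switch (3), which executable code version
    # will process the user requests; fall back to the resident application
    # program (39) when no alternate version is present.
    if learn_mode and program_11 is not None:
        return program_11
    if not learn_mode and program_12 is not None:
        return program_12
    return resident_program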
Contextual information
[0028] Contextual information is the information about the user and his/her
environment that is collected from time to time by the interaction system (42). This
may be, for example, information such as the user's name, his/her email
addresses, friends' names, phone numbers and addresses, nicknames, and preferred
sources of information for news, weather, music, etc. Contextual information
includes, as the context requires, additional data provided through the
external
data access port (4) which may be connected to the Internet.
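As a non-limiting example, the contextual memory (40) could hold such information in a structure along the following lines, the field names being illustrative assumptions only:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContextualMemory:
    user_name: str = ""
    email_addresses: List[str] = field(default_factory=list)
    contacts: Dict[str, str] = field(default_factory=dict)           # name -> phone number or address
    preferred_sources: Dict[str, str] = field(default_factory=dict)  # e.g. "news" -> preferred source
    collected_items: List[str] = field(default_factory=list)         # other information gathered over time

    def remember(self, item: str) -> None:
        # Save a newly collected piece of contextual information for later analysis.
        self.collected_items.append(item)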
[0029] Contextual information also includes images collected from the
computer (110) video monitor and/or the optional camera (20). Such images
provide more information about the user's environment; for example, they may
inform the interaction system (42) of the number of people who are present in the
user's (5) close environment, confirm whether anyone is present (as confirmed by
the vocal or optional visual feedback) and whether anyone is located in proximity
to the microphone (1), e.g. for use in the process of volume adjustment, or
indicate that the user (5) has gone and that it will be necessary to wait for the
user's (5) return before issuing a vocal response message.
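By way of example, such a presence check could be sketched as follows, assuming the third-party OpenCV (cv2) package for face detection; the decision logic itself is illustrative only:

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def count_people(camera_frame):
    # Return the number of faces visible in a frame from the optional camera (20).
    gray = cv2.cvtColor(camera_frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)

def wait_for_user_return(camera_frame):
    # Defer the vocal response message when nobody is present near the microphone (1).
    return count_people(camera_frame) == 0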
[0030] Contextual information may further include other elements that are
part of the user's (5) environment, for example physical documents containing
information to be scanned, digitized or recorded, as well as music and sounds. The computer
(110) video monitor images provide information about the applications that are
currently running on the computer (110), and also provide the computer's (110)
response and status, including all visual information that is available to the
user (5)
on the computer (110) monitor.
In use
[0031] In use, vocal or audio requests from the user (5) are provided to the
interaction system (42) through the microphone (1), which requests are further
converted into textual information by the speech-to-text software (16). The
textual
information containing user requests is then processed by the central
processing
unit (30). When needed, contextual information pertaining to the computer (110) is
obtained by the central processing unit (30) through analysis of the computer
(110) video data (515) and/or other data (516). The video data (515) may be
optionally processed by the video processor (15), which then provides the
processed video information (525) to the central processing unit (30) for
further
processing.
[0032] When insufficient information is available, as deemed necessary by
the central processing unit (30) execution process based on the resident
application program memory (39), or optionally by the application program located
in optional memory locations (11) or (12), the interaction system (42) asks the
user
(5) to provide additional information via one or more data entry means that
are
available to the interaction system (42). The additional data is then added to
the
contextual information saved in the contextual memory (40) for further
processing.
In an alternative embodiment, the interaction system (42) may send requests,
for
example over the Internet, for additional information. Such information may be
provided, for example, by a dedicated server or any other suitable data
source.
[0033] Optionally, a camera signal (221) is provided to the interaction
system (42) through optional camera (20), which camera signal (221) is
transmitted to the central processing unit (30) through the optional video
processor
(15) for further processing. The central processing unit (30) selects, as
appropriate, the desired source for the received video data, either the remote
computer video data (515) or the data provided by the optional camera
signal (221), by use of a control signal (315) issued to the video processor (15).
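The source selection could, for example, take the following form, the constant names being illustrative assumptions:

SOURCE_COMPUTER = 0  # select the remote computer video data (515)
SOURCE_CAMERA = 1    # select the optional camera signal (221)

def select_video_source(control_signal, computer_frame, camera_frame):
    # Return the frame from the source designated by the control signal (315).
    if control_signal == SOURCE_CAMERA and camera_frame is not None:
        return camera_frame
    return computer_frame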
[0034] When applicable, the interaction system (42) responds by issuing
commands and transferring data to the computer (110) by use of
the mouse (513) and keyboard (514) command emulation software components
(13) and (14), respectively, in conjunction with feedback obtained by the
central
processing unit (30) through analysis of the remote computer video data (515).
[0035] Also when applicable, the audio response from the computer (110) is
provided to the user (5) by the central processing unit (30), either in
textual form to
the text-to-speech software (17), which then conveys the output audio response
to
the user (5) via the speaker (2), or directly to the speaker (2) via the
central
processing unit (30) audio interface (38).
[0036] The control of the computer (110) by use of simple mouse and
keyboard commands (e.g. moving the cursor to a given position, typing text in
a
window, clicking on the mouse buttons, etc.) greatly simplifies the coding
required
to control the computer. However, it is to be understood that the interaction
system
(42) disclosed herein does not preclude other computer control means, which
may
be deemed more appropriate to control specific processes or applications which
may require writing special communication control methods.
[0037] Furthermore, although the interaction system (42) has been
described herein as providing vocal responses to the user, it is to be
understood
that other means of response may also be used such as, for example, printouts,
saving files, searching, or providing a video display to display questions and
gather information from the user (5).
[0038] It is to be understood that although throughout the present
specification reference is made to a USB bus connection, other data and/or
power
transmission means, either combined or separate, may also be used.
[0039] Although the present invention has been described by way of
particular embodiments and examples thereof, it should be noted that it will be
apparent to persons skilled in the art that modifications may be applied to
the
present particular embodiments without departing from the scope of the present
invention.