CA 02587899 2007-04-27
Title of the invention
Text to speech interactive voice response system
Field of the invention
This invention relates to data processing using computer hardware and
computer software and, more particularly, to a text-to-speech interactive
voice response system.
Background of the Invention
Particularly in the area of transportation, many organizations rely heavily on
the dissemination of clear and concise audio information. For example,
Automatic Terminal Information Services are a key component of airports of
all sizes, and are used by air traffic control personnel to create, monitor
and broadcast critical information to incoming and departing aircraft. In the
past, such broadcasts were recorded manually, which was labor-intensive
and costly. There is therefore a need for a system that can create these
broadcasts automatically and with little effort while retaining the high
quality required when broadcasting mission-critical information.
Summary of the Invention
The present invention meets the need for a system that can automatically
convert text messages to speech messages for broadcast in a clear and
natural voice. The system can be customized for local languages and
accents. Incoming text data may be highly abbreviated (such as weather or
aviation information) or in the form of standard orthography. The text data
are converted into high-quality voice output in preparation for broadcast.
Errors originating in the incoming text data are flagged and optionally
logged. The user is therefore able to verify the final message for accuracy
before broadcast, and can perform a vocabulary search if missing words or
errors are found. The invention provides a cost-effective broadcast solution
that dramatically increases organizational efficiency.
Description of the Drawing
Figure 1 is a schematic of the text to speech conversion system of one
example of the invention.
Detailed Description of the Invention
The invention (10) is a general-purpose text-to-speech, interactive voice
response broadcasting system for various computer-based applications. It
is a modular system that is designed around a central Kernel Module (14)
that links to a set of service modules that perform various functions. These
modules can be standalone executables or dynamic link libraries. The
interface to each module is well defined, which allows application- or
user-specific modules to be implemented to meet specific requirements.
The invention consists of a number of process control subsystems and
storage modules, plus a multiplicity of lower level modules that supply the
functionality required by other components of the system. All broadcast
information that is saved within or transported throughout the system is
maintained in specific kinds of storage structures.
The Storage Structures
The storage structures define the main storage units used by the system.
They are the Audio Data Item (62), the Input Data Item (26), the Output
Data Item (46), the Vocabulary Item (51), and the Schedule Data Item (78).
The Audio Data Item (62)
An Audio Data Item (62) is a text string reference to audio data that is used
to generate audio broadcasts. If the system is configured to store audio
data in disk files, the item references a file name. If the system is
configured to store audio data in computer memory, the item references a
memory offset and data length.
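The two storage modes can be illustrated with a minimal Python sketch. The class name `AudioDataItem` and its fields are hypothetical; the text describes the item as a single text string reference, which this sketch splits into separate file-name and offset/length fields for clarity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioDataItem:
    """Hypothetical model of an Audio Data Item (62): a reference to audio
    data held either in a disk file or in a region of a memory buffer."""
    file_name: Optional[str] = None  # disk-file storage mode
    offset: Optional[int] = None     # memory storage mode: start of the data
    length: Optional[int] = None     # memory storage mode: size of the data

    def __post_init__(self) -> None:
        in_file = self.file_name is not None
        in_memory = self.offset is not None and self.length is not None
        # Exactly one storage mode must be described.
        if in_file == in_memory:
            raise ValueError("give either file_name or both offset and length")

    def read(self, memory: bytes = b"") -> bytes:
        """Return the referenced audio bytes."""
        if self.file_name is not None:
            with open(self.file_name, "rb") as f:
                return f.read()
        return memory[self.offset:self.offset + self.length]
```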
The Input Data Item (26)
An Input Data Item (26) is used to save and transport input data
information. It is used to generate the Output Data Items (46). It consists of
a text string name value, a text string type value, a text string language
value, a text string raw text value, a text string error structure, and a
numerical time stamp value. The name, type, and language values are
sufficient to uniquely identify an instance of an Input Data Item. The type
value is also used to determine how the associated raw text is to be
processed, and how the item is to be maintained by the system.
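A minimal sketch of an Input Data Item, assuming Python dataclasses. The class and field names are illustrative only; `item_type` stands for the type value, and the error structure is simplified to a message string.

```python
import time
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class InputDataItem:
    """Hypothetical model of an Input Data Item (26)."""
    name: str
    item_type: str    # the "type" value; renamed to avoid the builtin
    language: str
    raw_text: str
    error: str = ""   # error structure, simplified to a message string
    timestamp: float = field(default_factory=time.time)

    @property
    def key(self) -> Tuple[str, str, str]:
        # The name, type, and language values uniquely identify an item.
        return (self.name, self.item_type, self.language)
```

The `key` tuple can serve as a dictionary key wherever an item has to be located or replaced.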
The Output Data Item (46)
An Output Data Item (46) is used to save output data information. It is
derived from an Input Data Item (26) and consists of a text string name
value, a text string type value, a text string language value, a text string
raw text value, a text string processed text value, an Audio Control List (72)
(a list of Audio Data Items), and a numerical time stamp value. The name,
type, and language values are derived from the antecedent Input Data Item
and are sufficient to uniquely identify an instance of an Output Data Item.
The type value is also used to determine how the item is to be maintained
by the system.
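The derivation of an Output Data Item from its antecedent Input Data Item can be sketched as follows. All names are hypothetical, and the Audio Control List is simplified to a list of audio file names.

```python
import time
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InputDataItem:
    name: str
    item_type: str
    language: str
    raw_text: str


@dataclass
class OutputDataItem:
    """Hypothetical model of an Output Data Item (46)."""
    name: str
    item_type: str
    language: str
    raw_text: str
    processed_text: str
    audio_control_list: List[str]  # Audio Data Item references, e.g. file names
    timestamp: float = field(default_factory=time.time)

    @classmethod
    def from_input(cls, item: InputDataItem, processed_text: str,
                   audio_refs: List[str]) -> "OutputDataItem":
        # Name, type, and language are carried over from the antecedent item.
        return cls(item.name, item.item_type, item.language, item.raw_text,
                   processed_text, list(audio_refs))

    @property
    def key(self) -> Tuple[str, str, str]:
        return (self.name, self.item_type, self.language)
```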
The Vocabulary Item (51)
A broadcast message is assembled using a technique called "speech
concatenation", which joins the individual speech units into longer phrases
and sentences. Speech units are individual words or phrases that are
associated with recorded and processed audio data. The method of speech
concatenation used by the system is critical to producing high-quality
voice output. The invention uses a speech concatenation technique that is
based upon "intonational phrases" and takes into account the intonation
and timing aspects of human speech. Concatenation systems that do not
take these aspects into account often produce voice output that sounds
"choppy" and disjointed.
A Vocabulary Item (51) associates the text of a speech unit with recorded
audio data information. It consists of a text string speech unit, an Audio
Data Item that references the audio data, and a numerical duration value.
The speech unit and language values are used to uniquely identify a
Vocabulary Item.
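A Vocabulary Item could be modeled as below. This is a sketch under stated assumptions: the class name is hypothetical, the duration is assumed to be in milliseconds, and `total_duration_ms` shows only one plausible use of the duration value (estimating the length of a concatenated message), which the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class VocabularyItem:
    """Hypothetical model of a Vocabulary Item (51)."""
    speech_unit: str   # a word or phrase, e.g. "runway two four"
    language: str      # could instead be implied by the list holding the item
    audio_ref: str     # stands in for the Audio Data Item
    duration_ms: int   # duration of the recorded audio (assumed milliseconds)

    @property
    def key(self) -> Tuple[str, str]:
        # The speech unit and language values identify the item.
        return (self.speech_unit, self.language)


def total_duration_ms(items: Iterable[VocabularyItem]) -> int:
    """Estimate the length of a concatenated message (ignores any pauses)."""
    return sum(item.duration_ms for item in items)
```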
The Schedule Data Item (78)
Schedule Data Items (78) are used to save and transport scheduling
information and to control the audio broadcasts generated by the system.
Each Schedule Data Item consists of a text string name value, a text string
type value, and a text string language value. These values can either be
used to identify an Output Data Item or specify control information for a
broadcast.
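The dual role of a Schedule Data Item can be sketched as follows. The reserved type value `"command"` is a hypothetical convention for marking control information; the text does not say how the two roles are distinguished.

```python
from dataclasses import dataclass
from typing import Tuple

COMMAND_TYPE = "command"  # hypothetical type value marking control information


@dataclass(frozen=True)
class ScheduleDataItem:
    """Hypothetical model of a Schedule Data Item (78)."""
    name: str
    item_type: str
    language: str

    def is_command(self) -> bool:
        # Assumed convention: a reserved type value marks control information.
        return self.item_type == COMMAND_TYPE

    def output_key(self) -> Tuple[str, str, str]:
        # Otherwise the three values identify an Output Data Item.
        if self.is_command():
            raise ValueError("command items do not reference an Output Data Item")
        return (self.name, self.item_type, self.language)
```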
The Storage Modules
When managing and processing data items, the system needs mechanisms
from which it can store and reference the data as needed. To accomplish
this, the system contains a number of persistent storage lists: the Input
Data Item List (28), the Error Data Item List (40), the Review Data Item List
(44), the Output Data Item List (48), and one or more Vocabulary Item Lists
(52).
The Input Data Item List (28)
The Input Data Item List is a container class for Input Data Items (26) that
have been received from the Input/Filter Module (12). It is designed as a
first-in, first-out (FIFO) list.
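A first-in, first-out list maps naturally onto a double-ended queue. This in-memory sketch uses Python's `collections.deque`; the class and method names are hypothetical, and the persistence described for the storage lists is omitted.

```python
from collections import deque
from typing import Any, Deque, Optional


class InputDataItemList:
    """In-memory sketch of the FIFO Input Data Item List (28)."""

    def __init__(self) -> None:
        self._items: Deque[Any] = deque()

    def add(self, item: Any) -> None:
        # New items from the Input Control Module join the back of the queue.
        self._items.append(item)

    def pop_next(self) -> Optional[Any]:
        # The oldest item is handed to the Process Control Module first.
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```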
The Error Data Item List (40)
The Error Data Item List is a container class for Input Data Items whose raw
text cannot be completely processed by the Process Control Module (30). It
is designed as an alphabetically sorted list that is based upon the name and
type values.
The Review Data Item List (44)
The Review Data Item List is a container class for Input Data Items that
must be reviewed by a system user through the User Interface Module (16)
before processing by the Process Control Module (30). It is designed as an
alphabetically sorted list that is based upon the name and type values.
The Output Data Item List (48)
The Output Data Item List is a container class for Output Data Items that
have been successfully processed by the Process Control Module. It is
designed as an alphabetically sorted list that is based upon the name and
type values.
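The three sorted lists share one behavior, which can be sketched with Python's `bisect` module. `SortedItemList` and the stand-in `Item` tuple are hypothetical names. Because items for different languages share a name and type, `find` returns every matching item.

```python
import bisect
from collections import namedtuple
from typing import Any, List, Tuple

# Minimal stand-in for an Input or Output Data Item (hypothetical).
Item = namedtuple("Item", "name item_type language")


class SortedItemList:
    """Sketch of a list kept alphabetically sorted by (name, type)."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, str]] = []
        self._items: List[Any] = []

    def add(self, item: Any) -> None:
        key = (item.name, item.item_type)
        i = bisect.bisect_right(self._keys, key)  # ties keep insertion order
        self._keys.insert(i, key)
        self._items.insert(i, item)

    def find(self, name: str, item_type: str) -> List[Any]:
        """Return every item (e.g. one per language) with this name and type."""
        key = (name, item_type)
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._items[lo:hi]

    def names(self) -> List[str]:
        return [item.name for item in self._items]
```

Python compares strings by code point, so a deployment that needs case-insensitive alphabetical order would sort on a case-folded key instead.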
The Vocabulary Items Lists (52)
A Vocabulary Items List is a container class for Vocabulary Items (51).
There must be a separate Vocabulary Item List for each language that is
supported by the system. It is designed as an alphabetically sorted list that
is based upon the word and phrase values.
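A per-language vocabulary store, including the kind of vocabulary search mentioned in the summary, might look like the following sketch. The class name, the audio-reference strings, and the choice of prefix search as the search method are all assumptions.

```python
import bisect
from typing import Dict, List, Optional


class VocabularyLists:
    """Sketch: one alphabetically sorted vocabulary per language."""

    def __init__(self) -> None:
        self._units: Dict[str, List[str]] = {}       # language -> sorted units
        self._audio: Dict[str, Dict[str, str]] = {}  # language -> unit -> audio

    def add(self, language: str, unit: str, audio_ref: str) -> None:
        units = self._units.setdefault(language, [])
        audio = self._audio.setdefault(language, {})
        if unit not in audio:
            bisect.insort(units, unit)  # keep the list sorted on insertion
        audio[unit] = audio_ref

    def lookup(self, language: str, unit: str) -> Optional[str]:
        return self._audio.get(language, {}).get(unit)

    def search(self, language: str, prefix: str) -> List[str]:
        """Return the speech units that start with prefix, in sorted order."""
        units = self._units.get(language, [])
        i = bisect.bisect_left(units, prefix)
        matches = []
        while i < len(units) and units[i].startswith(prefix):
            matches.append(units[i])
            i += 1
        return matches
```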
The Broadcast Data Control Subsystems
The Broadcast Data Control Subsystem controls the life cycle of the data
that are used to generate broadcasts.
The Input Subsystem
The Input Subsystem consists of the Input Control Module (22) and the
Input/Filter Module (12).
The Input Control Module (22) is part of the Kernel Module (14) process. It
loads and manages the Input/Filter Module (12) and adds new Input Data
Items (26) to the Input Data Item List (28). This subsystem is not required
if the system can acquire data by some other means.
The Input/Filter Module (12) usually consists of two submodules. The
Input DLL (20) module is an application-specific module that acquires raw
text (11) in a predetermined, application-specific format from a source
device. This application-specific text (23) is passed to the Filter DLL
(24) module, another application-specific module. The Filter DLL (24)
module scans the input text data stream and selects strings of text that
conform to specified properties. Each selected string is classified into a
type value and given a name value.
A new Input Data Item (26) is generated for each language supported.
These are passed to the Input Control Module (22) for inclusion in the Input
Data Item List (28).
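The filtering step can be sketched with regular expressions. The filter rules, the sample type values `metar` and `notice`, and the language codes are hypothetical; an actual Filter DLL would apply application-specific rules.

```python
import re
from dataclasses import dataclass


@dataclass
class InputDataItem:
    name: str
    item_type: str
    language: str
    raw_text: str


# Hypothetical filter rules: type value -> pattern whose "name" group
# supplies the name value of each selected string.
FILTER_RULES = {
    "metar": re.compile(r"^METAR (?P<name>[A-Z]{4}) .*$", re.MULTILINE),
    "notice": re.compile(r"^NOTICE (?P<name>\w+): .*$", re.MULTILINE),
}
LANGUAGES = ("en", "fr")  # hypothetical set of supported languages


def filter_text(raw: str):
    """Select matching strings and create one Input Data Item per language."""
    items = []
    for item_type, pattern in FILTER_RULES.items():
        for match in pattern.finditer(raw):
            for language in LANGUAGES:
                items.append(InputDataItem(match.group("name"), item_type,
                                           language, match.group(0)))
    return items
```

Lines that match no rule, such as transmission noise, are simply ignored.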
The Audio Record Subsystem
The Audio Record Subsystem consists of the Record Control Module (123)
and the Record DLL (120) module. Through the Command Control
Subsystem, described below, it allows users to record the audio data to be
associated with specific Output Data Items. The Record Control Module
(123) is a part of the Kernel Module (14) process that loads and manages an
application-specific Record DLL module. The Record DLL module must
implement an interface to one or more audio input devices. A variation of
the Record DLL module is an interface to a third-party text-to-speech
system.
The Process Control Subsystem
The Process Control Module (30) is the data processing subsystem. Input
Data Items (26) from the Input Data Item List (28) are processed to create
Output Data Items (46) that are stored in the Output Data Item List (48).
Data processing is a one- to three-step process that is controlled by the
attributes of the Input Data Item type value. If the Input Data Item type
value specifies that a system user must preview the item (first value type),
then the Input Data Item (42) is placed in the Review Data Item List (44).
It is later processed through the Command Interface Module (90) using the
processing steps below.
In some configurations, information in some Input Data Items (26) of
specified types (second value type) may need to be merged with previously
processed data. The merging process is done through the Combine DLL
(108) module that can be implemented to modify items in application-
specific ways.
If the raw text value associated with an Input Data Item of a specific type
(third value type) is encoded and requires some kind of translation, rules
CA 02587899 2012-07-18
must be generated for an application-specific version of the Translate DLL
(104) module that modifies the raw text so that it is in a usable form.
If the raw text value associated with an Input Data Item has a well-defined
text format (fourth value type) and if rules have been defined for checking
that text format, then the Format Check DLL (102) module is used to
process the raw text. If the text fails the format check, the Input Data Item
is put (38) into the Error Data Item List (40) for correction through the Command
Interface Module (90). If the text passes the format check, the text
returned (34) from the Format Check DLL (102) module replaces the raw
text value.
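The type-driven routing described above can be sketched as follows. `TypeAttributes` and `route` are hypothetical names; the callables stand in for the Combine, Translate, and Format Check DLL modules, the merge step is simplified to a text transformation, and the order of steps is assumed to follow the order in which the text describes them.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TypeAttributes:
    """Hypothetical processing attributes attached to an Input Data Item type."""
    needs_review: bool = False                          # first value type
    merge: Optional[Callable[[str], str]] = None        # Combine DLL stand-in
    translate: Optional[Callable[[str], str]] = None    # Translate DLL stand-in
    format_check: Optional[Callable[[str], Optional[str]]] = None  # None = fail


def route(raw_text: str, attrs: TypeAttributes,
          reviewed: bool = False) -> Tuple[str, str]:
    """Return (destination, text): destination is "review", "error", or "parse"."""
    if attrs.needs_review and not reviewed:
        return "review", raw_text      # held in the Review Data Item List
    text = raw_text
    if attrs.merge is not None:
        text = attrs.merge(text)
    if attrs.translate is not None:
        text = attrs.translate(text)
    if attrs.format_check is not None:
        checked = attrs.format_check(text)
        if checked is None:
            return "error", raw_text   # held in the Error Data Item List
        text = checked                 # checked text replaces the raw text
    return "parse", text               # ready for the Parser Module
```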
The processed text and the Input Data Item language value are then passed to the
Parser Module (50). This module attempts to decompose the processed
text into a sequence of speech units that correspond to speech units in
Vocabulary Items (51) in the Vocabulary Item List (52) associated with the
language value. If the processed text can be completely decomposed into a
sequence of speech units, then a list of the Audio Data Items (Audio
Control List) associated with the Vocabulary Items is generated, and an
Output Data Item (46) is also generated and placed in the Output Data Item
List (48). If the processed text cannot be decomposed, then the Input Data
Item (26) is placed in the Error Data Item List (40) for correction through
the Command Interface Module (90).
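The text does not specify how the Parser Module decomposes text. One possible approach, shown here as an assumption, is a dynamic-programming segmentation over words that prefers the decomposition with the fewest (and therefore longest) speech units, which keeps multi-word phrases intact. The vocabulary is simplified to a dictionary from lowercase speech units to audio references, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def parse(text: str, vocabulary: Dict[str, str]) -> Optional[List[str]]:
    """Return the Audio Control List for text, or None if it cannot be
    completely decomposed into known speech units."""
    words = text.lower().split()
    n = len(words)
    max_len = max((len(unit.split()) for unit in vocabulary), default=0)
    # best[i] holds the fewest-unit decomposition of words[i:], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        for j in range(min(n, i + max_len), i, -1):  # longer phrases first
            unit = " ".join(words[i:j])
            if unit in vocabulary and best[j] is not None:
                candidate = [unit] + best[j]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    if best[0] is None:
        return None  # the item would go to the Error Data Item List
    return [vocabulary[unit] for unit in best[0]]
```

Unlike a purely greedy longest-match scan, this search still succeeds when taking the longest phrase at one position would leave an untranslatable remainder.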
The Broadcast Control Subsystem
In order to be operational, the system must generate audio broadcasts to at
least one audio device. For the purpose of the system, an audio device is
defined as a specific sound output device (such as a device that is
associated with a telephone interface or a computer sound card), or as
saved audio data. Each audio device used by the system must be uniquely
identified so that different broadcasts can be directed to specific audio
devices.
The Broadcast Control Subsystem is the subsystem that controls the
generation of the audio broadcasts. It is composed of the Assemble Module
(60), the Schedule Control Module (68), the Schedule DLL (80) module, the
Audio Interface Module (82), and the Audio DLL (66) module. This is the only
subsystem in which an instance of every subcomponent is required. The
Schedule Control Module (68) is the main control module of the subsystem.
It initializes all other components and directs information flow for the
subsystem.
The Audio Interface Module (82) loads and communicates with the Audio
DLL (66) module. The Audio DLL (66) module is an application-specific DLL
module. When an Audio Control List (84) is passed to it, this module
generates audio broadcasts on specified audio devices.
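As one illustration of what an Audio DLL might do with an Audio Control List, the sketch below joins WAV recordings with Python's standard `wave` module. It assumes every speech unit shares one channel count, sample width, and frame rate, and it simply appends frames; it does not reproduce the intonational-phrase technique described earlier. The function name is hypothetical.

```python
import wave
from typing import BinaryIO, List, Union

Source = Union[str, BinaryIO]  # a file name or an open binary file object


def concatenate_wav(sources: List[Source], destination: Source) -> None:
    """Join the frames of several WAV recordings into one WAV output, in order."""
    if not sources:
        raise ValueError("the Audio Control List is empty")
    fmt = None
    with wave.open(destination, "wb") as out:
        for src in sources:
            with wave.open(src, "rb") as unit:
                unit_fmt = (unit.getnchannels(), unit.getsampwidth(),
                            unit.getframerate())
                if fmt is None:
                    # The first speech unit fixes the output format.
                    fmt = unit_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif unit_fmt != fmt:
                    raise ValueError("all speech units must share one audio format")
                out.writeframes(unit.readframes(unit.getnframes()))
```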
The Schedule DLL (80) module is an application-specific DLL module that
controls what is broadcast, when it is broadcast, and on what audio device
it is broadcast. When an audio broadcast is to be made, the Schedule DLL
module must pass a list of Schedule Data Items (78) to the Schedule Control
Module (68). The list defines the structure and contents of a broadcast. The
Schedule DLL (80) module can interface with other devices, such as a
Telephone Interface DLL (124) module. This allows the system to treat the
telephone device like any other audio device.
For each audio broadcast that is associated with an audio device, the
Assemble Module (60) creates an Audio Control List (72) using information
from the list of Schedule Data Items (70) received from the Schedule DLL
(80) module. Each Schedule Data Item can reference either an Output Data
Item or a command string that controls the audio broadcast. Each Output
Data Item (62) is associated with a list of Audio Data Items. The lists of
Audio Data Items and the command strings are assembled to create the Audio
Control List (84). This Audio Control List controls the concatenation of
Audio Data Items, which are then sent to the audio device.
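The assembly step can be sketched as a single pass over the Schedule Data Items. The function name, the `("audio", ...)` and `("command", ...)` entry format, and the `"command"` type value are hypothetical conventions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


@dataclass(frozen=True)
class ScheduleDataItem:
    name: str
    item_type: str
    language: str


def assemble(schedule: List[ScheduleDataItem],
             outputs: Dict[Key, List[str]]) -> List[Tuple[str, str]]:
    """Build an Audio Control List from a broadcast's Schedule Data Items.

    outputs maps an Output Data Item key to its list of audio references.
    Each entry of the result is ("audio", ref) or ("command", text).
    """
    control_list: List[Tuple[str, str]] = []
    for item in schedule:
        if item.item_type == "command":  # hypothetical marker for control items
            control_list.append(("command", item.name))
            continue
        key = (item.name, item.item_type, item.language)
        if key not in outputs:
            raise KeyError(f"no Output Data Item for {key}")
        control_list.extend(("audio", ref) for ref in outputs[key])
    return control_list
```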
The System Management and Control Subsystems
The System Management and Control Subsystems are the subsystems that
define the user interface, configuration, and record keeping for the system.
The Command Subsystem
The Command Subsystem allows the system to interact directly with system
users. Although it is not essential that there be a user interface component
to a system, it is usually needed. This subsystem is made up of the
Command Interface Module (90) in the Kernel Module (14), and a User
Interface Module (16).
The Command Interface Module (90) permits access to the Data Item Lists
(28), (40), (44), (48) and (52), so that Data Items can be viewed, modified,
or deleted. It also allows user access (98) to application-specific DLL
modules (100), using messages through the Event Handler Module (96). For
this functionality to be available, application-specific DLL modules are
implemented to include the Event Handler functions (101).
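Message-based access to application-specific modules can be sketched as a registry of handler functions. The class name, the message names, and the handler signature are hypothetical; the text does not define the message format.

```python
from typing import Any, Callable, Dict, Optional

# A handler receives a message name and a payload and returns a reply.
Handler = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class EventHandlerModule:
    """Hypothetical router for messages sent to application-specific modules."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, module_name: str, handler: Handler) -> None:
        # Each module that implements the event handler functions registers here.
        self._handlers[module_name] = handler

    def send(self, module_name: str, message: str,
             payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        handler = self._handlers.get(module_name)
        if handler is None:
            raise KeyError(f"no event handler registered for {module_name!r}")
        return handler(message, payload or {})
```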
The User Interface Module (16) is an application-specific module that
allows users access to the system. It can be a stand-alone executable that
is run locally or remotely. It can also be implemented so that it is run
through a web browser or as a DLL module. User Interface Modules will
display a graphical user interface, and most will require a Text DLL (106)
module that supplies the appropriate text for an application, in the
language of the user. This design also allows the User Interface Module
(16) to specify extra processing through an application-specific Batch
Control DLL (122) module.
The Utility Control Subsystem
The Utility Control Subsystem supplies common functionality to all system
components. The Configuration Control DLL (125) module supplies a
consistent method of accessing the system configuration. The
configuration system allows for a hierarchy of configuration files and
registry sections. Each file or registry section contains a set of section
values with associated key values. When a request (126) is made, the
configuration hierarchy is searched from last to first until the
section-key value is found. This allows systems to be configured with local,
regional, and default settings.
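The last-to-first lookup can be sketched with Python's standard `configparser` module, treating each configuration file as one layer ordered from default to local. Registry sections are not modeled, and the function names are hypothetical.

```python
import configparser
from typing import List, Optional


def load_layers(texts: List[str]) -> List[configparser.ConfigParser]:
    """Parse configuration texts ordered from default to most specific."""
    layers = []
    for text in texts:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        layers.append(parser)
    return layers


def lookup(layers: List[configparser.ConfigParser], section: str, key: str,
           default: Optional[str] = None) -> Optional[str]:
    # Search from the last (most specific) layer back to the first.
    for parser in reversed(layers):
        if parser.has_option(section, key):
            return parser.get(section, key)
    return default
```

Note that `configparser` lowercases option names by default, so keys are matched case-insensitively.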
The Log Control Module (121) supplies a consistent method of logging
system information into a common location. All system modules can be
implemented to access the log control functions.
The Intersystem Control Subsystem
A specific application can require that two or more systems communicate
with one another. The Intersystem Control subsystem permits this
communication. The Intersystem Control Module (127) is the part of the
Kernel Module (14) that loads and initializes the subsystem. The
Intersystem DLL (110) module implements the application requirements for
intersystem communication. Since the functionality of this module is
application-specific, most of the communication with this module is done
using messages through the Event Handler Module (96).
Instruction Set
The invention also includes a computer software program having a set of
instructions for converting text to speech. The set of instructions is formed
into a plurality of interdependent modules comprising:
An input/filter process;
An input control process;
A kernel control process;
A parser process;
A schedule control process;
An assembly process;
An audio interface process;
A command interface process; and
A user interface process.
The computer software program modules comprise a plurality of sets of
instructions comprising:
A first set of instructions for defining an input/filter process;
A second set of instructions for defining an input control process;
A third set of instructions for defining a kernel control process;
A fourth set of instructions for defining a parser process;
A fifth set of instructions for defining a schedule control process;
A sixth set of instructions for defining an assembly process;
A seventh set of instructions for defining an audio interface process;
An eighth set of instructions for defining a command interface process; and
A ninth set of instructions for defining a user interface process.
The foregoing description of the invention has been presented for purposes
of illustration and description. It is not intended to be exhaustive or
to limit the invention to the precise form disclosed, and other modifications
may be possible in light of the above teachings. The example was chosen
and described in order to best explain the principles of the invention and
its practical application to thereby enable others skilled in the art to best
utilize the invention in various embodiments and various modifications as
are suited to the particular use contemplated. It is intended that the
appended claims be construed to include other alternative embodiments
of the invention except insofar as limited by the prior art.