Patent 2375589 Summary

(12) Patent Application: (11) CA 2375589
(54) English Title: METHOD AND APPARATUS FOR DETERMINING USER SATISFACTION WITH AUTOMATED SPEECH RECOGNITION (ASR) SYSTEM AND QUALITY CONTROL OF THE ASR SYSTEM
(54) French Title: METHODE ET APPAREIL POUR DETERMINER LA SATISFACTION DE L'UTILISATEUR DE SYSTEMES DE RECONNAISSANCE AUTOMATISEE DE LA PAROLE
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G10L 15/01 (2013.01)
  • G10L 25/63 (2013.01)
(72) Inventors :
  • CRAIG, JAMES (Canada)
  • OSBURN, ANDREW (Canada)
  • COCKERILL, CARTER (Canada)
  • BERNARD, JEREMY (Canada)
  • BOYLE, MARK (Canada)
  • BURNS, DAVID (Canada)
(73) Owners :
  • DIAPHONICS, INC. (Canada)
(71) Applicants :
  • DIAPHONICS, INC. (Canada)
(74) Agent: GOWLING LAFLEUR HENDERSON LLP
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2002-03-08
(41) Open to Public Inspection: 2003-09-08
Examination requested: 2003-02-26
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract



An apparatus for determining user satisfaction using automated speech recognition (ASR) systems is disclosed. The apparatus comprises: means for assessing the voice user emotional state based upon the voice characteristics; means for assessing the voice user behavioural pattern based upon the current, and previous, interactions with the ASR application; means for decision-modelling the overall voice user experience based upon the emotional state and behaviour pattern; and a real-time adaptation means of the voice user interface to match the individual based upon the QC in ASR assessment of the voice user experience. A method of determining user satisfaction using automated speech recognition (ASR) systems is also disclosed.


Claims

Note: Claims are shown in the official language in which they were submitted.




What is claimed is:

1. An apparatus for determining user satisfaction using automated
speech recognition (ASR) systems, the apparatus comprising:
(a) means for assessing the voice user emotional state based upon the
voice characteristics;
(b) means for assessing the voice user behavioural pattern based upon
the current, and previous, interactions with the ASR application;
(c) means for decision-modelling the overall voice user experience based
upon the emotional state and behaviour pattern; and
(d) a real-time adaptation means of the voice user interface to match the
individual based upon the QC in ASR assessment of the voice user experience.

2. An apparatus according to claim 1, further comprising a database
storage of historical voice user behavioural data, and decision modelling
algorithms employed to assess and weigh the data elements from the emotional
state and behavioural pattern assessments in order to achieve an overall
determination regarding the voice user experience.

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 02375589 2002-03-08
Method and Apparatus for Determining User Satisfaction with
Automated Speech Recognition Systems
Field of the Invention
The present invention relates to a method of determining user satisfaction
with automated speech recognition systems and an apparatus using the
method. The invention is concerned with gauging user satisfaction with, and
quality control (QC) of, Automated Speech Recognition Systems (ASR). The
method and apparatus of the invention draw upon analysis of Speaker Emotion,
historical user behaviour, and statistical methods in order to estimate the
degree
of user satisfaction with an ASR system (QC in ASR). The QC in ASR system
operates within the Public Switch Telephone Network (PSTN) and integrates
components that are based in telephony services, automated speech
recognition, automated assessment of speaker emotion, and automated
speaker behaviour profiling.
Background and Summary of the Invention
Despite ever-increasing usage of the Web, companies still receive more
than 70% of their orders through the appliance their customers prefer - the
telephone. Thanks to widespread adoption of cellular telephones, voice remains
the preferred means of doing business. There will be more than 2 Billion
telephones by 2003, a figure that dwarfs the projected 200 Million Web-enabled
computers. For the foreseeable future, voice will continue to be the dominant
mode for exchanging business information.
Automated speech recognition (ASR) is a technology that allows a
computer to recognize and interpret human speech, much as the computer
would recognize a typed command. ASR has been around for several decades, but
only relatively recent improvements in software and computing power have
made it a compelling business tool.
The key ASR benefits for callers are as follows:

- The ability to conduct transactions and receive information at any time
over any phone: Customers do not require any special equipment or Internet
access.
- A simple, user-friendly speech interface: Natural-language speech
recognition flattens out the frustrating hierarchy or tree structure
associated with
touch-tone menus, making ASR a more pleasant, efficient and effective system.
- 24/7 availability: The customer is not limited to store or call-center
hours.
For companies using ASR, key benefits include:
- Real-time integration with business systems: No re-keying of data
associated with manual transaction and customer care processing.
- Reduced requirements for Customer Service Agents: Customers are
never put on hold, and agents can focus on other tasks and more complex
transactions.
- Lower administrative costs: ASR allows companies to bring in customers
over an automated channel.
- Improved customer service levels: Implementing ASR will help eliminate
hold times, and the system is available 24/7.
The challenge with ASR is designing and developing applications that will
emulate the natural human dialogue process. A good ASR application provides
a natural and intuitive dialogue flow that allows the User to interact with
the
system without question or concern. Achieving this level of ASR application
sophistication is very difficult in practice.
One of the most important elements missing from current ASR systems is
the ability to not only understand what the user is saying but also the
dialogue
context, meaning, and manner in which the speech is conveyed. In order to
fully
assess the user satisfaction and interaction with the ASR application a great
deal more information is required. The QC in ASR system meets this challenge
by providing a full and robust system for gauging the User-ASR experience.
This is accomplished by not only recognising the speech but also by assessing
the User emotional state and individual behavioural characteristics. This

assessment is then used to adapt the Voice User Interface to better meet the
needs of the individual user. Therefore, the ASR application can be tailored
real-time to match the individual and thereby begin to emulate a much more
human dialogue process.
There are several commercially available Quality Control and ASR
Monitor software-based tools in the marketplace today. These tools
draw upon data in ASR application logs in order to identify problem areas in
the
applications such as bottlenecks, poor dialogue flows etc. These tools assist
in
tuning the applications to smooth out or redefine dialogues that may be
misunderstood or misleading. These tools are very rudimentary in scope and do
not conduct any assessment of the Voice User experience.
There are currently no other similar methods or processes in place to
assess the Voice User experience with an ASR in an automated fashion.
According to the present invention, the QC in ASR integrates the
following areas of technology:
- Automated speech recognition
- Automated analysis and assessment of speaker emotion based upon
utterances made by the speaker during the use of an ASR system
- Automated user behaviour profiling based upon ASR usage logs and
historical user profile data
- Decision matrix that takes as input all data regarding the User-ASR
experience, and determines a User satisfaction level, i.e., estimates the ease
with which the User interacted and was satisfied with ASR system.
- Algorithms that will automatically and dynamically adapt the ASR voice
interface (i.e. the dialogue flow) to match the individual User needs based upon
the outputs from the ASR level of satisfaction decision matrix.
- Statistical methods for aggregating User satisfaction and behavioural
data

Features of the invention and advantages associated therewith are as
follows:
- The QC in ASR does not rely on a single data source but rather
combines a number of unique methods to assess the User-ASR experience.
- Three distinct voice and speech components are analyzed (Context and
Discourse Structure, Prosody, and Paralinguistic Features) in order to assess
the User Emotional State
- User Emotional State is assessed real-time, i.e. while the User-ASR
interaction is ongoing
- The User behavioral pattern and history is stored, accessed, and
updated, based upon each interaction of the User with the ASR. This allows the
ASR to know in advance the User preferences and abilities and to tailor the
Voice User Interface appropriately.
- The data from the User emotional assessment and behavioural pattern
are used as inputs to a decision matrix in order to determine an assessment of
the overall User experience and satisfaction level
- The User emotional assessment and behavioural pattern data are used
to dynamically adapt the ASR Voice User Interface to conform to the needs of
the individual User
- Statistical methods are employed in order to assess the effectiveness of
the ASR Voice User Interface
Further understanding of other features, aspects, advantages of the
present invention will be realized by reference to the following description,
appended claims, and accompanying drawings.
Brief Description of the Drawings
Embodiment(s) of the present invention will be described with reference
to the accompanying drawings, wherein:
Fig. 1 schematically illustrates the components of the QC in ASR system
in accordance with the present invention;
Fig. 2 is a schematic presentation of the voice user emotion assessment
in Fig. 1; and

Fig. 3 is a schematic flow chart showing the QC in ASR process flow in
accordance with the present invention.
Detailed Description of the Preferred Embodiment(s)
Fig. 1 schematically illustrates the components of the QC in ASR system
in accordance with one embodiment of the invention.
Each component in Fig. 1 will be explained below.
1.0 Automated Speech Recognition Application
This component can include any ASR based application implemented
using contemporary speech recognition engines.
ASR applications have the ability to record the utterances made by the
speaker. These utterances, which are saved in standard audio file formats such
as .wav, .vox, etc, can then be used as inputs to the Voice User Emotion
Assessment 3.0 component as shown in Fig. 1.
The ASR application also builds a log file for every session conducted
with a user. The log file contains a great deal of information regarding the
user
session including such data as invalid responses, re-prompts, time-outs,
gender,
etc. The log file data are used as inputs to the Voice User Behavioural
Assessment 5.0 component.
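The log-file inputs described above can be sketched as follows. This is an illustrative sketch only: the patent does not specify a log format, so the event names (INVALID, REPROMPT, TIMEOUT) and the pipe-delimited layout are assumptions.

```python
# Hypothetical sketch: counting the behavioural signals the description
# names as log-file data (invalid responses, re-prompts, time-outs).
# The log format here is invented for illustration.
from collections import Counter

def summarize_session_log(log_lines):
    """Count the event types relevant to the Behavioural Assessment."""
    counts = Counter()
    for line in log_lines:
        event = line.split("|")[0].strip().upper()
        if event in {"INVALID", "REPROMPT", "TIMEOUT"}:
            counts[event] += 1
    return dict(counts)

log = [
    "REPROMPT | main_menu",
    "INVALID  | account_number",
    "TIMEOUT  | account_number",
    "REPROMPT | account_number",
]
summary = summarize_session_log(log)  # per-event counts for this session
```

A summary like this would then feed the Voice User Behavioural Assessment component described below.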
2.0 ASR Log Files and Utterances
This component represents the data source of ASR Log files and User
utterances. As discussed above, the utterance and log files contain the source
data used by both the Voice User Emotion and Behavioural Assessment
components 3.0 and 5.0.
3.0 Voice User Emotion Assessment
This component takes as an input the User utterance file and processes
the voice data in order to assess the User emotional state. Several distinct

voice and speech components are analysed: For example, Context and
Discourse Structure, Prosody, and Paralinguistic Features. This component is
the most complex within the QC in ASR system and is described in further
detail
below. The Voice User Emotion Assessment data is updated with every User
utterance and passed to the Voice User Level of Satisfaction Matrix.
4.0 Voice User Level of Satisfaction Matrix
This component takes as an input the results of the Voice User Emotion
Assessment component 3.0 and the Voice User Behavioural Assessment
component 5.0. The decision matrix consists of a set of algorithms that
determines an estimate of the overall User satisfaction level based upon the
emotional and behavioural assessments. As the emotional and behavioural
assessments are continually being updated throughout the course of the User-
ASR interaction, the decision matrix is also continually updating the
determination of the user satisfaction level.
The estimated User level of satisfaction is frequently updated and passed
to the ASR Voice User Interface Adaptation Component 6.0.
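The decision matrix described above could be as simple as a weighted combination of the two assessments. The patent does not disclose the algorithms or weights, so the sketch below is an assumption: both inputs are taken as scores normalized to [0, 1], and the weights are arbitrary.

```python
# Illustrative sketch of the satisfaction decision matrix: a weighted
# combination of the emotional and behavioural assessments. The weights
# and the [0, 1] score convention are assumptions, not from the patent.
def satisfaction_level(emotion_score, behaviour_score,
                       w_emotion=0.6, w_behaviour=0.4):
    """Both inputs assumed normalized to [0, 1] (1 = fully positive).
    Returns an overall satisfaction estimate clamped to [0, 1]."""
    score = w_emotion * emotion_score + w_behaviour * behaviour_score
    return max(0.0, min(1.0, score))

# Recomputed continually as new utterances and log entries arrive:
level = satisfaction_level(emotion_score=0.3, behaviour_score=0.8)
```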
5.0 Voice User Behavioural Assessment
This component takes as an input the ASR log files and processes the
contained data in order to assess the Voice User Behavioural pattern. The
behavioural pattern describes the manner in which the User is able to interact
with and navigate the ASR. For example, a novice User that is unfamiliar with
the dialogue flow, or a User that has demonstrated difficulty in using the ASR,
requires a more directed and robust dialogue. Experienced users who have
demonstrated that they can move quickly through the ASR require a more terse
and brief dialogue flow. Therefore, the User behavioural pattern is built over a
period of time based upon each interaction of the User with the ASR.
Each time the User accesses the ASR, the behavioural pattern is created
and/or updated as appropriate to reflect the User capabilities and, thereby,
reflect the User's individual needs.

The Voice User Behavioural Assessment data is updated as Log File
data becomes available, and then passed to the Voice User Level of Satisfaction
Matrix.
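The novice/experienced distinction drawn above can be sketched as a simple classifier over accumulated session statistics. The thresholds and the profile fields (`sessions`, `error_rate`) are invented for illustration; the patent does not specify how the pattern is computed.

```python
# Sketch of classifying a caller's behavioural pattern, per the
# novice-vs-experienced distinction in the description. Thresholds
# and field names are illustrative assumptions.
def dialogue_style(profile):
    """profile: dict with 'sessions' (count of prior interactions) and
    'error_rate' (fraction of turns that were invalid, re-prompted,
    or timed out). Returns the dialogue style to use."""
    if profile["sessions"] < 3 or profile["error_rate"] > 0.2:
        return "directed"   # fuller, more guided prompts for novices
    return "terse"          # brief prompts for experienced callers

style = dialogue_style({"sessions": 10, "error_rate": 0.05})
```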
6.0 ASR Voice User Interface Adaptation Component
This component takes as an input the User Level of Satisfaction data
from component 4.0. Based upon the determined level of satisfaction the Voice
User Interface within the ASR is updated dynamically to meet the immediate
User needs. In this manner the real-time determination of the User-ASR
interaction experience is determined and acted upon in order to conform and
tailor the ASR Voice User Interface to meet the individual and immediate User
requirements.
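One way to realize the dynamic interface update described above is to select among pre-authored prompt variants by satisfaction band. The bands and prompt texts below are invented for illustration; the patent does not disclose an adaptation mechanism.

```python
# Hedged sketch of component 6.0: pick the wording of the next prompt
# from the current satisfaction estimate. Bands and texts are invented.
def next_prompt(satisfaction, prompts):
    """prompts: dict with 'terse', 'standard', and 'directed' variants."""
    if satisfaction >= 0.7:
        return prompts["terse"]
    if satisfaction >= 0.4:
        return prompts["standard"]
    return prompts["directed"]  # struggling caller: slow down, guide more

menu = {
    "terse": "Main menu.",
    "standard": "Main menu. Say balance, transfer, or agent.",
    "directed": ("You are at the main menu. You can say 'balance' to hear "
                 "your balance, 'transfer' to move funds, or 'agent' for help."),
}
prompt = next_prompt(0.55, menu)
```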
7.0 Voice User Historical Behavioural Data Component
This component represents the data source for User behavioural data. A
database record is created for an individual the first time they access the
ASR.
The record contains information regarding the individual's interaction with
the
ASR and reflects their level of satisfaction and ease of use during each
interaction. Each successive time the User accesses the ASR the historical
profile is queried in order to tailor the Voice User interface to meet the
individual
needs. Upon termination of the User-ASR interaction, the behavioural profile
is
amended as required.
The Voice User Emotion Assessment Component 3.0 will be described
below in greater detail, in conjunction with Fig. 2.
As noted above, this component 3.0 is very sophisticated within the QC in
ASR system. The purpose of the component is to process the User spoken
utterance files with the objective of attempting to determine the speaker
emotional state. The results can be sufficient to indicate if the User has
had a
"negative" experience as opposed to a "positive" one.

To achieve this objective there are many features and characteristics of
the human voice which can be derived and analysed in order to determine an
assessment of the User emotional state. The distinct voice and speech
components that are analysed are, for example, Context and Discourse
Structure 3.1, Prosody 3.2, and Paralinguistic Features 3.3 as illustrated in
Fig.
2.
Each component of the Voice User Emotion Assessment is further
detailed as follows:
3.1 Context and Discourse Structure
Context and Discourse Structure give consideration to the overall
meaning of a sequence of words rather than looking at specific words in
isolation. Different words can mean different things depending upon the
context
in which they are spoken. Therefore, one has to consider the overall discourse
and structure of the dialogue flow in order to fully assess the meaning and
emotion contained therein.
Techniques used to derive context and structure will consider the rise and
fall of voice intonation and compute the probability of a certain word based
upon the previous words that have been spoken.
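The word-probability idea above can be sketched as a bigram model: estimate P(word | previous word) from observed transcripts. This is one common way to realize the technique named in the text, not necessarily the patent's method; the training text is illustrative.

```python
# Minimal bigram-model sketch of predicting a word from the previous
# word, as section 3.1 describes. Training data is illustrative.
from collections import defaultdict, Counter

def train_bigrams(tokens):
    """Count word-to-next-word transitions."""
    counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur):
    """P(cur | prev) by maximum likelihood; 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

tokens = "check my balance please check my account".split()
model = train_bigrams(tokens)
p = bigram_prob(model, "my", "balance")  # "my" is followed by "balance" or "account"
```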
3.2 Prosody
Prosodic features of voice are reflected in vocal effects such as variations
in pitch, volume, duration, and tempo among others. Of the three voice
components, prosody in voice holds the greatest potential for determination of
conveyed emotion. Prosodic features are extracted from a voice sample
through digital signal processing techniques. The prosodic features are
determined and then analysed in order to attempt to classify the user emotion.
Often several voice samples are required in order to derive an emotional
state.
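A crude sketch of extracting prosodic features by signal processing: RMS energy stands in for volume, and zero-crossing rate gives a rough periodicity (pitch) estimate. Production systems use robust pitch trackers; this is only an illustration of the feature-extraction step.

```python
# Sketch of prosodic feature extraction. RMS energy approximates
# volume; zero-crossing rate is a crude stand-in for pitch.
import math

def prosodic_features(samples, sample_rate):
    """samples: list of floats in [-1, 1]. Returns (rms_energy, zcr_hz)."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Two zero crossings per cycle of a periodic signal:
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr_hz = crossings * sample_rate / (2.0 * n)
    return rms, zcr_hz

# A pure 100 Hz sine at 8 kHz should give a ZCR estimate near 100 Hz
# and an RMS near 1/sqrt(2).
sr = 8000
tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]
rms, f0 = prosodic_features(tone, sr)
```

Tracking how such features vary over successive utterances is what would feed an emotion classifier.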
3.3 Paralinguistic Features

Paralinguistic features of voice are separated into two types of
classifications. The first is voice quality that reflects different voice
modes such
as whisper, falsetto, and huskiness, among others. The second is voice
qualifications that include non-verbal cues such as laugh, cry, tremor, and
jitter.
As with prosody, these voice features can be extracted through digital signal
processing techniques. Paralinguistic features are then analysed in order to
attempt to classify the user emotion.
Fig. 3 is a schematic flow chart showing the QC in ASR process flow in
accordance with the present invention.
According to the present invention, the QC in ASR process flow is as
follows:
1. User calls the ASR application.
2. The ASR, through standard means such as account number,
password/PIN, voice biometric characteristics, etc., identifies the caller.
3. The User behavioural profile is retrieved from the User Behavioural
database and the ASR Voice User Interface is initially configured based upon
the User profile. If the User is accessing the ASR for the first time then a
new
User Behavioural database record is created and a default Voice User Interface
is configured.
4. The ASR interacts with the User and, each time a User response is
provided, an utterance file is recorded and a Log File entry is made.
5. The Voice User Emotional Assessment component processes the
utterance files and the Voice User Behavioural Assessment component
processes the log files.
6. Step 5 is iterative and will be repeated each time a Voice User
response is provided.
7. The User Emotional and Behavioural Assessment data are passed to
the Voice User Level of Satisfaction Decision Matrix. The data are processed
in
order to determine the immediate user level of satisfaction.
8. The User level of satisfaction data are passed to the ASR Voice User
Adaptation Component. Based on the user satisfaction level, the Voice User

Interface can be immediately tailored to match the requirements of the User at
that specific time.
9. Upon completion of the User-ASR interaction, the User Behavioural
data record is updated.
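The nine-step flow above can be sketched end to end with each component stubbed to a trivial implementation. Every function name, threshold, and numeric choice here is invented for illustration; the patent describes the flow, not an API.

```python
# End-to-end sketch of the QC-in-ASR process flow (steps 1-9), with
# every component stubbed. All names and numbers are illustrative.

def assess_emotion(utterance_text):
    # Stub for component 3.0; a real system analyses audio, not text.
    return 0.2 if "!" in utterance_text else 0.8

def assess_behaviour(log_entry):
    # Stub for component 5.0: penalize error events in the log entry.
    return 0.3 if log_entry.startswith(("INVALID", "TIMEOUT")) else 0.9

def decision_matrix(emotion, behaviour, w=0.6):
    # Stub for component 4.0: weighted combination (weights assumed).
    return w * emotion + (1 - w) * behaviour

def adapt_interface(satisfaction):
    # Stub for component 6.0: choose a dialogue style.
    return "directed" if satisfaction < 0.5 else "terse"

def run_session(caller_id, turns, profiles):
    """turns: list of (utterance_text, log_entry) pairs (step 4)."""
    profile = profiles.setdefault(caller_id, {"sessions": 0})  # steps 2-3
    style, satisfaction = "directed", 0.5
    for utterance, log_entry in turns:                         # steps 5-6
        emotion = assess_emotion(utterance)
        behaviour = assess_behaviour(log_entry)
        satisfaction = decision_matrix(emotion, behaviour)     # step 7
        style = adapt_interface(satisfaction)                  # step 8
    profile["sessions"] += 1                                   # step 9
    profile["last_satisfaction"] = satisfaction
    return style

profiles = {}
style = run_session("caller-1",
                    [("balance", "OK | balance"),
                     ("no!", "INVALID | amount")],
                    profiles)
```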
While the present invention has been described with reference to several
specific embodiments, the description is illustrative of the invention and is
not to
be construed as limiting the invention. Various modifications and variations
may
occur to those skilled in the art without departing from the true spirit and
scope
of the invention as defined by the appended claims.

Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.


Title Date
Forecasted Issue Date Unavailable
(22) Filed 2002-03-08
Examination Requested 2003-02-26
(41) Open to Public Inspection 2003-09-08
Dead Application 2007-03-08

Abandonment History

Abandonment Date Reason Reinstatement Date
2006-02-28 R30(2) - Failure to Respond
2006-03-08 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $300.00 2002-03-08
Request for Examination $400.00 2003-02-26
Registration of a document - section 124 $100.00 2003-03-06
Maintenance Fee - Application - New Act 2 2004-03-08 $100.00 2004-01-22
Registration of a document - section 124 $100.00 2004-12-31
Maintenance Fee - Application - New Act 3 2005-03-08 $100.00 2005-02-22
Registration of a document - section 124 $100.00 2013-06-20
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
DIAPHONICS, INC.
Past Owners on Record
BERNARD, JEREMY
BOYLE, MARK
BURNS, DAVID
COCKERILL, CARTER
CRAIG, JAMES
OSBURN, ANDREW
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description  Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Representative Drawing 2002-06-06 1 9
Cover Page 2003-09-02 1 43
Abstract 2002-03-08 1 19
Description 2002-03-08 10 403
Claims 2002-03-08 1 25
Drawings 2002-03-08 3 53
Claims 2005-03-01 4 170
Abstract 2005-03-01 1 18
Description 2005-03-01 11 413
Fees 2004-01-22 1 36
Prosecution-Amendment 2004-09-08 2 72
Correspondence 2002-04-10 1 25
Assignment 2002-03-08 3 81
Assignment 2003-03-06 5 212
Prosecution-Amendment 2003-02-26 1 37
Assignment 2003-03-20 1 24
Assignment 2005-02-17 1 39
Assignment 2004-12-31 24 1,142
Prosecution-Amendment 2005-03-01 13 482
Fees 2005-02-22 1 30
Prosecution-Amendment 2005-08-31 2 54
Assignment 2013-06-20 3 104