Patent 3166784 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 3166784
(54) English Title: HUMAN-MACHINE INTERACTIVE SPEECH RECOGNIZING METHOD AND SYSTEM FOR INTELLIGENT DEVICES
(54) French Title: PROCEDE DE RECONNAISSANCE VOCALE POUR L'INTERACTION HOMME-MACHINE D'UN APPAREIL INTELLIGENT ET SYSTEME
Status: Examination
Bibliographic Data
(51) International Patent Classification (IPC):
  • G10L 15/22 (2006.01)
(72) Inventors :
  • SUN, PENGFEI (China)
  • JIA, HONGYUAN (China)
  • LI, CHUNSHENG (China)
(73) Owners :
  • 10353744 CANADA LTD.
(71) Applicants :
  • 10353744 CANADA LTD. (Canada)
(74) Agent: HINTON, JAMES W.
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2019-09-19
(87) Open to Public Inspection: 2020-07-09
Examination requested: 2022-07-04
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/CN2019/106778
(87) International Publication Number: CN2019106778
(85) National Entry: 2022-07-04

(30) Application Priority Data:
Application No. Country/Territory Date
201910002748.8 (China) 2019-01-02

Abstracts

English Abstract

A speech recognition method for human-machine interaction of a smart apparatus and a system, pertaining to the technical field of speech recognition, and improving the accuracy of speech recognition by means of joint optimization training of intent detection and slot filling. The method comprises: performing word segmentation on speech data of a user's question to obtain an original word sequence, and generating a vector representation of the original word sequence by means of embedding processing; performing weighting processing on a hidden state vector h_i and a slot context vector c_i^S to obtain a slot label model y_i^S; performing weighting processing on a hidden state vector h_T and an intent context vector c^I to obtain an intent prediction model y^I; joining the slot context vector c_i^S and the intent context vector c^I by means of a slot gate g, and obtaining a transformed representation of the slot label model y_i^S by means of the slot gate g; and constructing an objective function for joint optimization of the intent prediction model y^I and the transformed slot label model y_i^S, and performing intent detection on the speech data of the user's question on the basis of the objective function.


French Abstract

L'invention concerne un procédé de reconnaissance vocale pour l'interaction homme-machine d'un appareil intelligent et un système, appartenant au domaine technique de la reconnaissance vocale et améliorant la précision de la reconnaissance vocale au moyen de l'apprentissage d'optimisation conjointe de détection d'intention et de remplissage de cases. Le procédé consiste : à effectuer une segmentation de mots sur des données de parole d'une question d'un utilisateur pour obtenir une séquence de mots d'origine et à générer une représentation vectorielle de la séquence de mots d'origine au moyen d'un traitement d'incorporation ; à effectuer un traitement de pondération sur un vecteur d'état caché hi et sur un vecteur de contexte de case ci S pour obtenir un modèle d'étiquette de case yi S ; à effectuer un traitement de pondération sur un vecteur d'état caché hT et sur un vecteur de contexte d'intention cI pour obtenir un modèle de prédiction d'intention yI ; à joindre le vecteur de contexte de case ci S et le vecteur de contexte d'intention cI au moyen d'une porte de case g et à obtenir une représentation transformée du modèle d'étiquette de case yi S au moyen de la porte de case g ; et à construire une fonction objective pour une optimisation conjointe du modèle de prédiction d'intention yI et du modèle d'étiquette de case transformée yi S et à effectuer une détection d'intention sur les données de parole de la question de l'utilisateur sur la base de la fonction objective.

Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
What is claimed is:
1. A human-machine interactive speech recognizing method for an intelligent device, characterized in comprising:
subjecting a speech question of a user to a term-segmenting process to obtain an original term sequence, and vectorizing the original term sequence through an embedding process;
calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S;
calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I;
employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g; and
jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S to construct a target function, and performing intent recognition on the speech question of the user based on the target function.
2. The method according to Claim 1, characterized in that the step of
subjecting a speech question
of a user to a term-segmenting process to obtain an original term sequence,
and vectorizing the
original term sequence through an embedding process includes:
receiving the speech question of the user and transforming the speech question
to a recognizable
text, and employing a tokenizer to term-segment the recognizable text and
obtain the original
term sequence; and
subjecting the original term sequence to a word embedding process, and
realizing a vector
representation of each segmented term in the original term sequence.

3. The method according to Claim 1, characterized in that the step of calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S includes:
employing a bidirectional LSTM network to encode each term segmentation vector, and outputting the hidden state vector h_i corresponding to each term segmentation vector;
calculating the slot context vector c_i^S, to which each term segmentation vector corresponds, through formula $c_i^S = \sum_{j=1}^{T} \alpha_{i,j}^S h_j$, wherein $\alpha_{i,j}^S$ represents an attention weight of a slot, its calculation formula is $\alpha_{i,j}^S = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}$, $e_{i,j} = \sigma(W_{he}^S h_j)$, where $\sigma$ represents a slot activation function, and $W_{he}^S$ represents a slot weight matrix; and
constructing a slot label model $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S))$ based on the hidden state vector h_i and the slot context vector c_i^S.
4. The method according to Claim 1, characterized in that the step of calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I includes:
employing a hidden unit in the bidirectional LSTM network to encode the vectorized original term sequence, and obtaining the hidden state vector h_T;
calculating the intent context vector c^I of the original term sequence through formula $c^I = \sum_{j=1}^{T} \alpha_j^I h_j$, wherein $\alpha_j^I$ represents an attention weight of an intent, its calculation formula is $\alpha_j^I = \frac{\exp(e_j)}{\sum_{k=1}^{T} \exp(e_k)}$, $e_j = \sigma'(W_{he}^I h_j)$, where $\sigma'$ represents an intent activation function, and $W_{he}^I$ represents an intent weight matrix; and
constructing an intent prediction model $y^I = \mathrm{softmax}(W_{hy}^I(h_T + c^I))$ based on the hidden state vector h_T and the intent context vector c^I.
5. The method according to Claim 1, characterized in that the step of employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g includes:
formally representing the slot gate g as $g = v \cdot \tanh(c_i^S + W \cdot c^I)$, wherein v represents a weight vector obtained by training, and W represents a weight matrix obtained by training; and
formally representing the transformation of the slot label model y_i^S through the slot gate g as $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S \cdot g))$.
6. The method according to Claim 1, characterized in that the target function constructed by jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S is:
$p(y^S, y^I \mid X) = p(y^I \mid X) \prod_{i=1}^{T} p(y_i^S \mid X)$,
wherein $p(y^S, y^I \mid X)$ represents a conditional probability for outputting slot filling and intent prediction at a given original term sequence, where X is the vectorized original term sequence.
7. The method according to Claim 6, characterized in that the step of
performing intent
recognition on the speech question of the user based on the target function
includes:
sequentially obtaining intent conditional probabilities, to which the various
segmented terms in
the original term sequence correspond, through the target function; and
screening therefrom a segmented term with the maximum probability value and
recognizing the
segmented term as the intent of the speech question of the user.
8. A human-machine interactive speech recognizing system for an intelligent device, characterized in comprising:
a term segmentation processing unit, for subjecting a speech question of a user to a term-segmenting process to obtain an original term sequence, and vectorizing the original term sequence through an embedding process;
a first calculating unit, for calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S;
a second calculating unit, for calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I;
a model transforming unit, for employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g; and
a joint optimization unit, for jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S to construct a target function, and performing intent recognition on the speech question of the user based on the target function.
9. The system according to Claim 8, characterized in that the term
segmentation processing unit
includes:
a term-segmenting module, for receiving the speech question of the user and
transforming the
speech question to a recognizable text, and employing a tokenizer to term-
segment the
recognizable text and obtain the original term sequence; and
an embedding processing module, for subjecting the original term sequence to a
word embedding
process, and realizing a vector representation of each segmented term in the
original term
sequence.
10. The system according to Claim 8, characterized in that the first calculating unit includes:
a hidden state calculating module, for employing a bidirectional LSTM network to encode each term segmentation vector, and outputting the hidden state vector h_i corresponding to each term segmentation vector;
a slot context calculating module, for calculating the slot context vector c_i^S, to which each term segmentation vector corresponds, through formula $c_i^S = \sum_{j=1}^{T} \alpha_{i,j}^S h_j$, wherein $\alpha_{i,j}^S$ represents an attention weight of a slot, its calculation formula is $\alpha_{i,j}^S = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}$, $e_{i,j} = \sigma(W_{he}^S h_j)$, where $\sigma$ represents a slot activation function, and $W_{he}^S$ represents a slot weight matrix; and
a slot label model module, for constructing a slot label model $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S))$ based on the hidden state vector h_i and the slot context vector c_i^S.

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 03166784 2022-07-04
HUMAN-MACHINE INTERACTIVE SPEECH RECOGNIZING METHOD AND
SYSTEM FOR INTELLIGENT DEVICES
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the technical field of speech
recognition, and more
particularly to a human-machine interactive speech recognizing method and
system for
an intelligent device.
Description of Related Art
[0002] With the development of the internet technology, there come into being
more and more
intelligent devices that employ speeches for human-machine interaction.
Currently
available speech interactive systems include Siri, Xiaomi, Cortana, Avatar
Framework,
and Duer, etc. As compared with the traditional human-machine interaction
based on
manual input, speech human-machine interaction exhibits characteristics of
conveniency,
high efficiency, and broad range of application scenarios. During the process
of speech
recognition, intent recognition and slot filling techniques are keys to
ensuring the
accuracy of speech recognition results.
[0003] As regards intent recognition, it can be abstracted as a classification
problem, and a
classifier represented by means of CNN + knowledge is then employed to train
an intent
recognition model, in which is further introduced semantic representation of
knowledge
to enhance the generalization capability of the presentation layer in addition
to word-
embedding for speech questions of users, but it has been found in practical
application
that such a model is defective in terms of slot information filling deviation,
whereby
accuracy of the intent recognition model is adversely affected. As regards
slot filling, its
Date Reçue/Date Received 2022-07-04

essence is to formalize a sentence sequence to a marked sequence, and there
are many
frequently used methods to mark sequences, such as the hidden Markov model or
the
conditional random field model, but these slot filling models cannot satisfy
practical
application requirements under specific application scenarios, due to
ambiguities of slots
existent under different semantic intents caused by the lack of contextual
information.
Seen as such, trainings of the two models are independently carried out in the
state of the
art, and there is no combined optimization of the intent recognition task and
the slot filling
task, so that the finally trained models are problematic in terms of low
recognition
accuracy in the aspect of speech recognition, and user experience is lowered.
SUMMARY OF THE INVENTION
[0004] The objective of the present invention is to provide a human-machine
interactive speech
recognizing method and system for an intelligent device, to enhance accuracy
of speech
recognition by jointly optimizing and training intent recognition and slot
filling.
[0005] To achieve the above objective, according to one aspect, the present
invention provides a
human-machine interactive speech recognizing method for an intelligent device,
the
method comprising:
[0006] subjecting a speech question of a user to a term-segmenting process to
obtain an original
term sequence, and vectorizing the original term sequence through an embedding
process;
[0007] calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S;
[0008] calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I;
[0009] employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot
gate g; and
[0010] jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S to construct a target function, and performing intent recognition on the speech question of the user based on the target function.
[0011] Preferably, the step of subjecting a speech question of a user to a
term-segmenting process
to obtain an original term sequence, and vectorizing the original term
sequence through
an embedding process includes:
[0012] receiving the speech question of the user and transforming the speech
question to a
recognizable text, and employing a tokenizer to term-segment the recognizable
text and
obtain the original term sequence; and
[0013] subjecting the original term sequence to a word embedding process, and
realizing a vector
representation of each segmented term in the original term sequence.
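The term-segmenting and word-embedding steps above can be sketched in Python. This is purely illustrative and not part of the patent: the vocabulary, the whitespace tokenizer, and the 8-dimensional random embedding table are all assumptions standing in for whatever tokenizer and trained embeddings an implementation would use.

```python
import numpy as np

# Hypothetical vocabulary and embedding table; the patent fixes neither
# a tokenizer nor an embedding dimension, so these are assumptions.
vocab = {"play": 0, "some": 1, "jazz": 2, "music": 3}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # 8-dim term vectors

def embed_question(recognized_text: str) -> np.ndarray:
    """Segment the recognized text into terms, then look up a vector
    per term (the word-embedding step of the method)."""
    terms = recognized_text.lower().split()  # stand-in for a real tokenizer
    return embedding_table[[vocab[t] for t in terms]]

X = embed_question("play some jazz music")
print(X.shape)  # (4, 8): one 8-dim vector per segmented term
```

In a real system the recognized text would first come from a speech-to-text front end, and the embedding table would be learned jointly with the rest of the network.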
[0014] Preferably, the step of calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S includes:
[0015] employing a bidirectional LSTM network to encode each term segmentation
vector, and
outputting the hidden state vector hi corresponding to each term segmentation
vector;
[0016] calculating the slot context vector c_i^S, to which each term segmentation vector corresponds, through formula $c_i^S = \sum_{j=1}^{T} \alpha_{i,j}^S h_j$, wherein $\alpha_{i,j}^S$ represents an attention weight of a slot, its calculation formula is $\alpha_{i,j}^S = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}$, $e_{i,j} = \sigma(W_{he}^S h_j)$, where $\sigma$ represents a slot activation function, and $W_{he}^S$ represents a slot weight matrix; and
[0017] constructing a slot label model $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S))$ based on the hidden state vector h_i and the slot context vector c_i^S.
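A minimal NumPy sketch of this slot-attention and slot-label computation follows. It is not the patent's implementation: the sizes, the random weights, and the choice of tanh for the unspecified activation function sigma are all assumptions; only the shapes of the operations mirror the formulas above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Assumed sizes: T terms, d-dim BiLSTM states, 5 slot labels.
T, d, num_slots = 4, 8, 5
rng = np.random.default_rng(1)
H = rng.normal(size=(T, d))             # hidden states h_1..h_T from the BiLSTM
W_he = rng.normal(size=(d,))            # slot attention weight (shape assumed)
W_hy = rng.normal(size=(d, num_slots))  # slot output weight matrix

# e_{i,j} = sigma(W_he . h_j); tanh stands in for the unspecified sigma.
e = np.tanh(H @ W_he)                        # one score per h_j
alpha = softmax(np.tile(e, (T, 1)), axis=1)  # alpha_{i,j}: each row sums to 1
C = alpha @ H                                # c_i^S = sum_j alpha_{i,j} h_j

# y_i^S = softmax(W_hy (h_i + c_i^S))
y_slot = softmax((H + C) @ W_hy, axis=1)
print(y_slot.shape)  # (4, 5): a slot-label distribution per term
```

Because e_{i,j} here depends only on h_j, every row of alpha is identical; a trained model would typically condition the score on position i as well.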
[0018] Further, the step of calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the
intent context vector c^I to thereafter obtain an intent prediction model y^I includes:
[0019] employing a hidden unit in the bidirectional LSTM network to encode the
vectorized
original term sequence, and obtaining the hidden state vector hT;
[0020] calculating the intent context vector c^I of the original term sequence through formula $c^I = \sum_{j=1}^{T} \alpha_j^I h_j$, wherein $\alpha_j^I$ represents an attention weight of an intent, its calculation formula is $\alpha_j^I = \frac{\exp(e_j)}{\sum_{k=1}^{T} \exp(e_k)}$, $e_j = \sigma'(W_{he}^I h_j)$, where $\sigma'$ represents an intent activation function, and $W_{he}^I$ represents an intent weight matrix; and
[0021] constructing an intent prediction model $y^I = \mathrm{softmax}(W_{hy}^I(h_T + c^I))$ based on the hidden state vector h_T and the intent context vector c^I.
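The intent branch can be sketched the same way. Again this is an illustrative assumption, not the patent's code: sizes and weights are random placeholders, tanh stands in for the intent activation function sigma-prime, and the last BiLSTM state is taken as h_T.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, d, num_intents = 4, 8, 3
rng = np.random.default_rng(2)
H = rng.normal(size=(T, d))               # BiLSTM hidden states; h_T = H[-1]
W_he = rng.normal(size=(d,))              # intent attention weight (assumed)
W_hy = rng.normal(size=(d, num_intents))  # intent output weight matrix

# e_j = sigma'(W_he . h_j); alpha_j = softmax_j(e_j); c^I = sum_j alpha_j h_j
alpha = softmax(np.tanh(H @ W_he))
c_I = alpha @ H

# y^I = softmax(W_hy (h_T + c^I))
y_intent = softmax((H[-1] + c_I) @ W_hy)
print(y_intent.shape)  # (3,): a distribution over intent classes
```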
[0022] Preferably, the step of employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g includes:
[0023] formally representing the slot gate g as $g = v \cdot \tanh(c_i^S + W \cdot c^I)$, wherein v represents a weight vector obtained by training, and W represents a weight matrix obtained by training; and
[0024] formally representing the transformation of the slot label model y_i^S through the slot gate g as:
[0025] $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S \cdot g))$.
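The slot gate can be sketched as below. This is an assumption-laden illustration, not the patented implementation: the context vectors are random stand-ins for the quantities computed earlier, and the gate is evaluated per term position, giving one scalar g per slot context vector.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, d, num_slots = 4, 8, 5
rng = np.random.default_rng(3)
H = rng.normal(size=(T, d))             # BiLSTM hidden states h_i
C = rng.normal(size=(T, d))             # slot context vectors c_i^S (stand-ins)
c_I = rng.normal(size=(d,))             # intent context vector c^I (stand-in)
v = rng.normal(size=(d,))               # trained weight vector v
W = rng.normal(size=(d, d))             # trained weight matrix W
W_hy = rng.normal(size=(d, num_slots))  # slot output weight matrix

# g = v . tanh(c_i^S + W c^I): one scalar gate per term position i
g = np.tanh(C + W @ c_I) @ v            # shape (T,)

# transformed slot model: y_i^S = softmax(W_hy (h_i + c_i^S * g))
y_slot = softmax((H + C * g[:, None]) @ W_hy, axis=1)
print(y_slot.shape)  # (4, 5)
```

The gate scales how strongly the slot context contributes, letting the intent context modulate slot labeling.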
[0026] Optionally, the target function constructed by jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S is:
[0027] $p(y^S, y^I \mid X) = p(y^I \mid X) \prod_{i=1}^{T} p(y_i^S \mid X)$, wherein $p(y^S, y^I \mid X)$ represents a conditional probability for outputting slot filling and intent prediction at a given original term sequence, where X is the vectorized original term sequence.
[0028] Preferably, the step of performing intent recognition on the speech
question of the user
based on the target function includes:
[0029] sequentially obtaining intent conditional probabilities, to which the
various segmented
terms in the original term sequence correspond, through the target function;
and
[0030] screening therefrom a segmented term with the maximum probability value
and
recognizing the segmented term as the intent of the speech question of the
user.
[0031] In comparison with prior-art technology, the human-machine interactive
speech
recognizing method for an intelligent device provided by the present invention
achieves
the following advantageous effects.
[0032] In the human-machine interactive speech recognizing method for an intelligent device provided by the present invention, the speech question of the user as obtained is firstly transformed to a recognizable text, and a term segmenting process is carried out on the recognizable text to generate an original term sequence, which is then subjected to a word embedding process to realize vector representation. Thereafter, a slot label model y_i^S and an intent prediction model y^I are respectively constructed on the basis of the vectorized original term sequence: the slot label model y_i^S is obtained by calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector and weighting the two, while the intent prediction model y^I is obtained by calculating a hidden state vector h_T and an intent context vector c^I of the original term sequence and weighting the two. To fuse the intent prediction model y^I with the slot label model y_i^S, a decoder layer is additionally added to the existing encoder-decoder framework to construct the intent prediction model y^I, the slot context vector c_i^S and the intent context vector c^I are joined by introducing a slot gate g, and the intent prediction model y^I and the transformed slot label model y_i^S are finally jointly optimized to obtain a target function. The target function is employed to sequentially obtain the intent conditional probabilities to which the various segmented terms in the original term sequence correspond, and to screen therefrom a segmented term with the maximum
probability value and recognize it as the intent of the speech question of the
user, so as to
ensure accuracy of speech recognition.
[0033] According to another aspect, the present invention provides a human-
machine interactive
speech recognizing system for an intelligent device, wherein the system is
applied to the
human-machine interactive speech recognizing method for an intelligent device
as recited
in the foregoing technical solution, and the system comprises:
[0034] a term segmentation processing unit, for subjecting a speech question
of a user to a term-
segmenting process to obtain an original term sequence, and vectorizing the
original term
sequence through an embedding process;
[0035] a first calculating unit, for calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S;
[0036] a second calculating unit, for calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I;
[0037] a model transforming unit, for employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g; and
[0038] a joint optimization unit, for jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S to construct a target function, and performing intent recognition on the speech question of the user based on the target function.
[0039] Preferably, the term segmentation processing unit includes:
[0040] a term-segmenting module, for receiving the speech question of the user
and transforming
the speech question to a recognizable text, and employing a tokenizer to term-
segment
the recognizable text and obtain the original term sequence; and
[0041] an embedding processing module, for subjecting the original term
sequence to a word
embedding process, and realizing a vector representation of each segmented
term in the
original term sequence.
[0042] Preferably, the first calculating unit includes:
[0043] a hidden state calculating module, for employing a bidirectional LSTM
network to encode
each term segmentation vector, and outputting the hidden state vector hi
corresponding to
each term segmentation vector;
[0044] a slot context calculating module, for calculating the slot context vector c_i^S, to which each term segmentation vector corresponds, through formula $c_i^S = \sum_{j=1}^{T} \alpha_{i,j}^S h_j$, wherein $\alpha_{i,j}^S$ represents an attention weight of a slot, its calculation formula is $\alpha_{i,j}^S = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}$, $e_{i,j} = \sigma(W_{he}^S h_j)$, where $\sigma$ represents a slot activation function, and $W_{he}^S$ represents a slot weight matrix; and
[0045] a slot label model module, for constructing a slot label model $y_i^S = \mathrm{softmax}(W_{hy}^S(h_i + c_i^S))$ based on the hidden state vector h_i and the slot context vector c_i^S.
[0046] As compared with prior-art technology, the advantageous effects
achieved by the human-
machine interactive speech recognizing system for an intelligent device
provided by the
present invention are identical with the advantageous effects achievable by
the human-
machine interactive speech recognizing method for an intelligent device
provided by the
foregoing technical solution, so these are not redundantly described in this
context.
BRIEF DESCRIPTION OF THE DRAWINGS
[0047] The drawings described here are meant to provide further understanding
of the present
invention, and constitute part of the present invention. The exemplary
embodiments of
the present invention and the descriptions thereof are meant to explain the
present
invention, rather than to restrict the present invention. In the drawings:
[0048] Fig. 1 is a flowchart schematically illustrating the human-machine
interactive speech
recognizing method for an intelligent device in Embodiment 1 of the present
invention;
[0049] Fig. 2 is an exemplary view illustrating encoder-decoder fusing model
in Embodiment 1
of the present invention;
[0050] Fig. 3 is an exemplary view illustrating the slot gate g in Fig. 2; and
[0051] Fig. 4 is a block diagram illustrating the structure of the human-
machine interactive
speech recognizing system for an intelligent device in Embodiment 2 of the
present
invention.
[0052] Reference numerals:
[0053] 1 - term segmentation processing unit; 2 - first calculating unit;
[0054] 3 - second calculating unit; 4 - model transforming unit;
[0055] 5 - joint optimization unit
DETAILED DESCRIPTION OF THE INVENTION
[0056] To make more lucid and clear the objectives, features and advantages of
the present
invention, the technical solutions in the embodiments of the present invention
are clearly
and comprehensively described below with reference to the accompanying
drawings in
the embodiments of the present invention. Apparently, the embodiments as
described are
merely partial, rather than the entire, embodiments of the present invention.
All other
embodiments obtainable by persons ordinarily skilled in the art on the basis
of the
embodiments in the present invention without spending creative effort shall
all fall within
the protection scope of the present invention.
[0057] Embodiment 1
[0058] Fig. 1 is a flowchart schematically illustrating the human-machine
interactive speech
recognizing method for an intelligent device in Embodiment 1 of the present
invention.
Referring to Fig. 1, the human-machine interactive speech recognizing method
for an
intelligent device provided by this embodiment comprises:
[0059] subjecting a speech question of a user to a term-segmenting process to obtain an original term sequence, and vectorizing the original term sequence through an embedding process; calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector, and weighting the hidden state vector h_i and the slot context vector c_i^S to thereafter obtain a slot label model y_i^S; calculating a hidden state vector h_T and an intent context vector c^I of the vectorized original term sequence, and weighting the hidden state vector h_T and the intent context vector c^I to thereafter obtain an intent prediction model y^I; employing a slot gate g to join the slot context vector c_i^S and the intent context vector c^I, and generating a transformed representation of the slot label model y_i^S through the slot gate g; and jointly optimizing the intent prediction model y^I and the transformed slot label model y_i^S to construct a target function, and performing intent recognition on the speech
question of the user based on the target function.
[0060] In the human-machine interactive speech recognizing method for an intelligent device provided by this embodiment, the speech question of the user as obtained is firstly transformed to a recognizable text, and a term segmenting process is carried out on the recognizable text to generate an original term sequence, which is then subjected to a word embedding process to realize vector representation. Thereafter, a slot label model y_i^S and an intent prediction model y^I are respectively constructed on the basis of the vectorized original term sequence: the slot label model y_i^S is obtained by calculating a hidden state vector h_i and a slot context vector c_i^S of each term segmentation vector and weighting the two, while the intent prediction model y^I is obtained by calculating a hidden state vector h_T and an intent context vector c^I of the original term sequence and weighting the two. As shown in Fig. 2, in order to fuse the intent prediction model y^I with the slot label model y_i^S, a decoder layer is additionally added to the existing encoder-decoder framework to construct the intent prediction model y^I, the slot context vector c_i^S and the intent context vector c^I are joined by introducing a slot gate g, and the intent prediction model y^I and the transformed slot label model y_i^S are finally jointly optimized to obtain a target function. The target function is employed to sequentially obtain the intent conditional probabilities to which the various segmented terms in the original term sequence correspond, and a segmented term with the maximum probability value is subsequently screened therefrom and recognized as the intent of the speech question of the user, so as to ensure accuracy of speech recognition.
[0061] Specifically, the step of subjecting a speech question of a user to a
term-segmenting
process to obtain an original term sequence, and vectorizing the original term
sequence
through an embedding process in the foregoing embodiment includes:
[0062] receiving the speech question of the user and transforming the speech
question to a
Date Reçue/Date Received 2022-07-04

CA 03166784 2022-07-04
recognizable text, and employing a tokenizer to term-segment the recognizable
text and
obtain the original term sequence; and subjecting the original term sequence
to a word
embedding process, and realizing a vector representation of each segmented
term in the
original term sequence.
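As an illustration of this step, the following sketch segments a question and looks up a vector for each term; the whitespace tokenizer, toy vocabulary, and random embedding table are hypothetical stand-ins for the tokenizer and trained embeddings the embodiment assumes.

```python
# Illustrative sketch of the term-segmentation and embedding step: a
# whitespace tokenizer, toy vocabulary and random embedding table stand in
# for the real tokenizer and trained word embeddings.
import numpy as np

rng = np.random.default_rng(0)

def segment(text):
    # Whitespace split stands in for the tokenizer's term segmentation.
    return text.lower().split()

vocab = {"book": 0, "a": 1, "flight": 2, "to": 3, "boston": 4}
embed_dim = 8
embedding_table = rng.normal(size=(len(vocab), embed_dim))

def embed(terms):
    # Look up a dense vector for each segmented term.
    return np.stack([embedding_table[vocab[t]] for t in terms])

terms = segment("Book a flight to Boston")
X = embed(terms)      # vectorized original term sequence: (T, embed_dim)
print(X.shape)        # (5, 8)
```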
[0063] As should be noted, the step of calculating a hidden state vector h_i
and a slot context vector c_i^S for each term segmentation vector, and
weighting the hidden state vector h_i and the slot context vector c_i^S to
obtain a slot label model y_i^S in the foregoing embodiment includes:
[0064] employing a bidirectional LSTM network to encode each term segmentation
vector, and outputting the hidden state vector h_i corresponding to each term
segmentation vector; calculating the slot context vector c_i^S, to which each
term segmentation vector corresponds, through the formula
c_i^S = Σ_{j=1}^{T} α_{i,j}^S h_j, wherein α_{i,j}^S represents an attention
weight of the slot, its calculation formula being
α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}), e_{i,j} = σ(W_{he}^S h_j),
where σ represents a slot activation function and W_{he}^S represents a slot
weight matrix; and constructing the slot label model
y_i^S = softmax(W_{hy}^S (h_i + c_i^S)) based on the hidden state vector h_i
and the slot context vector c_i^S.
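The slot branch can be sketched in NumPy as follows. Hidden states and weights are random stand-ins for the BiLSTM outputs and trained parameters; the activation σ is taken as tanh, and the attention weight matrix is collapsed to a vector so each e_j is a scalar score, which is one plausible reading of the formulas above.

```python
# Minimal NumPy sketch of the slot branch: alpha_j = exp(e_j) / sum_k exp(e_k)
# with e_j = sigma(W_he h_j), slot context c^S = sum_j alpha_j h_j, and slot
# labels y_i^S = softmax(W_hy (h_i + c^S)). All quantities are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T, hidden, n_slot_labels = 5, 6, 4

H = rng.normal(size=(T, hidden))                 # BiLSTM hidden states h_1..h_T
w_e = rng.normal(size=(hidden,))                 # attention projection (illustrative)
W_hy = rng.normal(size=(n_slot_labels, hidden))  # slot output weight matrix

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

e = np.tanh(H @ w_e)                 # scalar score e_j per position
alpha = softmax(e)                   # attention weights over the sequence
c_S = alpha @ H                      # slot context vector
Y = softmax((H + c_S) @ W_hy.T)      # y_i^S: one label distribution per position
print(Y.shape)  # (5, 4)
```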
[0065] During specific implementation, after plural term segmentation vectors
have been input to the bidirectional LSTM network, the hidden state vectors h_i
can be correspondingly output on a one-by-one basis. As regards the formula
c_i^S = Σ_{j=1}^{T} α_{i,j}^S h_j of the slot context vector, α_{i,j}^S
represents the attention weight of the slot, i represents the ith term
segmentation vector, and j represents the jth element in the ith term
segmentation vector. Specifically, the calculation formula of the attention
weight of the slot is α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}),
e_{i,j} = σ(W_{he}^S h_j), where T represents the total number of elements in
the term segmentation vector, and k represents the kth element in T. In
addition, as regards the slot activation function σ and the slot weight matrix
W_{he}^S, these can be derived on the basis of vector matrix training of the
original term sequence; the specific training processes are conventional
technical means frequently employed in this technical field, so they are not
redundantly described in this embodiment.
[0066] The step of calculating a hidden state vector h_T and an intent context
vector c^I of the vectorized original term sequence, and weighting the hidden
state vector h_T and the intent context vector c^I to obtain an intent
prediction model y^I in the foregoing embodiment includes:
[0067] employing a hidden unit in the bidirectional LSTM network to encode the
vectorized original term sequence, and obtaining the hidden state vector h_T;
calculating the intent context vector c^I of the original term sequence through
the formula c^I = Σ_{j=1}^{T} α_j^I h_T, wherein α_j^I represents an attention
weight of the intent, its calculation formula being
α_j^I = exp(e_j) / Σ_{k=1}^{T} exp(e_k), e_j = σ'(W_{he}^I h_T), where σ'
represents an intent activation function and W_{he}^I represents an intent
weight matrix; and constructing the intent prediction model
y^I = softmax(W_{hy}^I (h_T + c^I)) based on the hidden state vector h_T and
the intent context vector c^I.
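A matching sketch of the intent branch follows, again with random placeholder weights. Here the context vector attends over all hidden states h_j, a common reading of the attention formula, with h_T taken as the final BiLSTM state.

```python
# Sketch of the intent branch: alpha_j = softmax_j(e_j), c^I = sum_j alpha_j h_j,
# y^I = softmax(W_hy^I (h_T + c^I)). Weights are random placeholders.
import numpy as np

rng = np.random.default_rng(2)
T, hidden, n_intents = 5, 6, 3

H = rng.normal(size=(T, hidden))             # BiLSTM hidden states
h_T = H[-1]                                  # final hidden state h_T
w_e = rng.normal(size=(hidden,))             # intent attention projection
W_hy = rng.normal(size=(n_intents, hidden))  # intent output weight matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

alpha = softmax(np.tanh(H @ w_e))    # intent attention weights alpha_j
c_I = alpha @ H                      # intent context vector c^I
y_I = softmax(W_hy @ (h_T + c_I))    # intent prediction distribution y^I
print(y_I.shape)  # (3,)
```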
[0068] During the process of specific implementation, the method of training
the intent prediction model y^I is the same as the method of training the slot
label model y_i^S; the difference rests in the fact that the hidden state
vector h_T can be obtained merely by means of a hidden unit in the
bidirectional LSTM network. After one-dimensional transformation of the vector
matrix, the formula c^I = Σ_{j=1}^{T} α_j^I h_T is subsequently invoked to
calculate the intent context vector c^I of the original term sequence, where
α_j^I represents an attention weight of the intent, its calculation formula
being α_j^I = exp(e_j) / Σ_{k=1}^{T} exp(e_k), e_j = σ'(W_{he}^I h_T), wherein
σ' represents the intent activation function and W_{he}^I represents the intent
weight matrix. As regards the intent activation function σ' and the intent
weight matrix W_{he}^I, these can be derived on the basis of processed
one-dimensional vector training; the specific training processes are
conventional technical means frequently employed in this technical field, so
they are not redundantly described in this embodiment.
[0069] Moreover, the step of employing a slot gate g to join the slot context
vector c_i^S and the intent context vector c^I, and generating a transformed
representation of the slot label model y_i^S through the slot gate g in the
foregoing embodiment includes:
[0070] formally representing the slot gate g as g = v · tanh(c_i^S + W · c^I),
wherein v represents a weight vector obtained by training, and W represents a
weight matrix obtained by training; and formally representing the
transformation of the slot label model y_i^S through the slot gate g as
y_i^S = softmax(W_{hy}^S (h_i + c_i^S · g)). Fig. 3 shows a structure model of
the slot gate g.
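The slot gate reduces to a few lines of NumPy; v, W, and the context vectors below are random placeholders for the trained quantities.

```python
# Sketch of the slot gate g = v . tanh(c_i^S + W c^I) and the gated slot
# label y_i^S = softmax(W_hy (h_i + c_i^S * g)). All values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
hidden, n_slot_labels = 6, 4

h_i = rng.normal(size=hidden)                    # hidden state for position i
c_S = rng.normal(size=hidden)                    # slot context vector c_i^S
c_I = rng.normal(size=hidden)                    # intent context vector c^I
v = rng.normal(size=hidden)                      # trained weight vector v
W = rng.normal(size=(hidden, hidden))            # trained weight matrix W
W_hy = rng.normal(size=(n_slot_labels, hidden))  # slot output weight matrix

g = float(v @ np.tanh(c_S + W @ c_I))    # scalar gate fusing both contexts
logits = W_hy @ (h_i + c_S * g)          # gated slot logits
y_S = np.exp(logits - logits.max())
y_S /= y_S.sum()                         # softmax over slot labels
print(y_S.shape)  # (4,)
```

A large gate value lets the intent context reinforce the slot prediction; a gate near zero falls back to the ungated slot model.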
[0071] Preferably, the target function constructed by jointly optimizing the
intent prediction model y^I and the transformed slot label model y_i^S in the
foregoing embodiment is:
[0072] p(y^S, y^I | X) = p(y^I | X) Π_{i=1}^{T} p(y_i^S | X), wherein
p(y^S, y^I | X) represents the conditional probability for outputting slot
filling and intent prediction given the original term sequence, where X
represents the vectorized original term sequence. After expansion,
p(y^S, y^I | X) = p(y^I | X) Π_{i=1}^{T} p(y_i^S | X)
= p(y^I | x_1, ..., x_T) Π_{i=1}^{T} p(y_i^S | x_1, ..., x_T), where x_i
represents the ith term segmentation vector, and T represents the total number
of term segmentation vectors. Through calculation of the target function, the
intent probability values of the various term segmentation vectors can be
obtained, and the segmented term with the maximum probability value is screened
out of the various term segmentation vectors and recognized as the intent of
the speech question of the user.
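On illustrative per-position probabilities, the joint objective can be evaluated directly; in practice the log of the product is maximized for numerical stability.

```python
# The joint objective p(y^S, y^I | X) = p(y^I | X) * prod_i p(y_i^S | X),
# evaluated on illustrative (made-up) probabilities.
import numpy as np

p_intent = 0.7                                   # p(y^I | X), illustrative
p_slots = np.array([0.9, 0.8, 0.95, 0.85, 0.9])  # p(y_i^S | X), illustrative

joint = p_intent * p_slots.prod()                # raw product form
log_joint = np.log(p_intent) + np.log(p_slots).sum()  # equivalent log form
print(round(float(joint), 6))  # 0.366282
```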
[0073] Embodiment 2
[0074] Referring to Fig. 1 and Fig. 4, this embodiment provides a human-
machine interactive
speech recognizing system for an intelligent device, the system comprising:
[0075] a term segmentation processing unit 1, for subjecting a speech question
of a user to a term-segmenting process to obtain an original term sequence, and
vectorizing the original term sequence through an embedding process;
[0076] a first calculating unit 2, for calculating a hidden state vector h_i
and a slot context vector c_i^S of each term segmentation vector, and weighting
the hidden state vector h_i and the slot context vector c_i^S to obtain a slot
label model y_i^S;
[0077] a second calculating unit 3, for calculating a hidden state vector h_T
and an intent context vector c^I of the vectorized original term sequence, and
weighting the hidden state vector h_T and the intent context vector c^I to
obtain an intent prediction model y^I;
[0078] a model transforming unit 4, for employing a slot gate g to join the
slot context vector c_i^S and the intent context vector c^I, and generating a
transformed representation of the slot label model y_i^S through the slot gate
g; and
[0079] a joint optimization unit 5, for jointly optimizing the intent
prediction model y^I and the transformed slot label model y_i^S to construct a
target function, and performing intent recognition on the speech question of
the user based on the target function. Specifically, the term segmentation
processing unit includes:
[0080] a term-segmenting module, for receiving the speech question of the user
and transforming
the speech question to a recognizable text, and employing a tokenizer to term-
segment
the recognizable text and obtain the original term sequence; and
[0081] an embedding processing module, for subjecting the original term
sequence to a word
embedding process, and realizing a vector representation of each segmented
term in the
original term sequence.
[0082] Specifically, the first calculating unit includes:
[0083] a hidden state calculating module, for employing a bidirectional LSTM
network to encode
each term segmentation vector, and outputting the hidden state vector hi
corresponding to
each term segmentation vector;
[0084] a slot context calculating module, for calculating the slot context
vector c_i^S, to which each term segmentation vector corresponds, through the
formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S h_j, wherein α_{i,j}^S represents an
attention weight of the slot, its calculation formula being
α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}), e_{i,j} = σ(W_{he}^S h_j),
where σ represents a slot activation function and W_{he}^S represents a slot
weight matrix; and
[0085] a slot label model module, for constructing the slot label model
y_i^S = softmax(W_{hy}^S (h_i + c_i^S)) based on the hidden state vector h_i
and the slot context vector c_i^S.
[0086] As compared with prior-art technology, the advantageous effects
achieved by the human-
machine interactive speech recognizing system for an intelligent device
provided by this
embodiment of the present invention are identical with the advantageous
effects
achievable by the human-machine interactive speech recognizing method for an
intelligent device provided by the foregoing Embodiment 1, so these are not
redundantly
described in this context.
[0087] As understandable to persons ordinarily skilled in the art, realization
of the entire or
partial steps in the method of the present invention can be completed via a
program that
instructs relevant hardware, the program can be stored in a computer-readable
storage
medium, and the program performs the steps of the method in the foregoing
embodiment when it is executed, wherein the storage medium can be a ROM/RAM, a
magnetic disk, an optical disk, or a memory card, etc.
[0088] What the above describes is merely directed to specific modes of
execution of the present
invention, but the protection scope of the present invention is not restricted
thereby. Any
change or replacement easily conceivable to persons skilled in the art within
the technical
range disclosed by the present invention shall be covered by the protection
scope of the
present invention. Accordingly, the protection scope of the present invention
shall be
based on the protection scope as claimed in the Claims.
Administrative Status


Event History

Description Date
Examiner's Report 2024-05-23
Inactive: Report - No QC 2024-05-22
Inactive: Adhoc Request Documented 2023-12-18
Amendment Received - Voluntary Amendment 2023-12-18
Examiner's Report 2023-08-17
Inactive: Report - No QC 2023-07-22
Letter sent 2022-08-04
Priority Claim Requirements Determined Compliant 2022-08-03
Letter Sent 2022-08-03
Application Received - PCT 2022-08-02
Request for Priority Received 2022-08-02
Inactive: IPC assigned 2022-08-02
Inactive: First IPC assigned 2022-08-02
National Entry Requirements Determined Compliant 2022-07-04
Request for Examination Requirements Determined Compliant 2022-07-04
All Requirements for Examination Determined Compliant 2022-07-04
Application Published (Open to Public Inspection) 2020-07-09

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2023-12-15


Fee History

Fee Type Anniversary Year Due Date Paid Date
MF (application, 2nd anniv.) - standard 02 2021-09-20 2022-07-04
MF (application, 3rd anniv.) - standard 03 2022-09-19 2022-07-04
Basic national fee - standard 2022-07-04 2022-07-04
Reinstatement (national entry) 2022-07-04 2022-07-04
Request for examination - standard 2024-09-19 2022-07-04
MF (application, 4th anniv.) - standard 04 2023-09-19 2023-06-15
MF (application, 5th anniv.) - standard 05 2024-09-19 2023-12-15
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
10353744 CANADA LTD.
Past Owners on Record
CHUNSHENG LI
HONGYUAN JIA
PENGFEI SUN
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description | Date (yyyy-mm-dd) | Number of pages | Size of Image (KB)
Claims 2023-12-17 17 1,030
Abstract 2022-07-03 1 25
Description 2022-07-03 15 670
Drawings 2022-07-03 3 86
Claims 2022-07-03 4 182
Representative drawing 2022-11-02 1 38
Examiner requisition 2024-05-22 3 175
Courtesy - Letter Acknowledging PCT National Phase Entry 2022-08-03 1 591
Courtesy - Acknowledgement of Request for Examination 2022-08-02 1 423
Examiner requisition 2023-08-16 3 151
Amendment / response to report 2023-12-17 43 3,052
National entry request 2022-07-03 13 1,268
International Preliminary Report on Patentability 2022-07-03 12 393
International search report 2022-07-03 4 140
Patent cooperation treaty (PCT) 2022-07-03 1 40
Amendment - Abstract 2022-07-03 2 131