Note: Claims are shown in the official language in which they were submitted.
Claims:
1. A computer device comprising:
an obtaining module, configured to obtain a session record, wherein the
session record
includes at least two statements, wherein the statements include questioning
statements
sent by questioners and answering statements sent by answerers;
a splitting module, configured to:
split the session record into corresponding groups according to a preset
splitting rule, wherein groups include at least one questioning statement and
at least one answering statement;
split the groups into corresponding statement pairs according to a processing
rule to which the groups correspond;
use, when a number of the answering statements included in the group does
not exceed a first preset threshold and a number of the questioning statements
as included exceeds the first preset threshold, a preset binary classifier to
predict whether the questioning statements as included and an antecedent
questioning statement of the questioning statements as included belong to a
same question;
a judging module, configured to determine the processing rule to which the
groups
correspond according to the number of the questioning statements and the
number of
the answering statements included in the groups, wherein the processing rule
is based
on the number of questioning statements and the number of answering statements
as
compared to the first preset threshold; and
an updating module, configured to update a knowledge base of a system
according to
statement pairs.
Date Recue/Date Received 202402-06
2. The computer device of claim 1, wherein each statement has a
corresponding generation
time.
3. The computer device of claim 2, wherein the splitting module further
comprises:
sequentially traversing the session record according to generation time of
each
statement;
judging, when the statement traversed is the questioning statement, whether a
traversed
questioning statement and an antecedent questioning statement of the traversed
questioning statement belong to same group according to a sentence pattern of
antecedent answering statement of the traversed questioning statement and/or
according
to an interval time to the antecedent questioning statement of the traversed
questioning
statement; and
determining, when the statement traversed is the answering statement, that a
traversed
answering statement belongs to the group to which the antecedent questioning
statement
of the traversed answering statement corresponds.
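As an illustration only, the traversal recited in claim 3 could be sketched as follows. The names `Statement`, `split_into_groups`, `max_gap` and `is_follow_up_pattern` are hypothetical and do not appear in the claims. The 300-second interval and the "ends with a question mark" test are assumed stand-ins for the preset time threshold and the preset sentence pattern.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    role: str    # "Q" = questioning statement, "A" = answering statement
    text: str
    time: float  # generation time, assumed here to be in seconds


def split_into_groups(statements, max_gap=300.0,
                      is_follow_up_pattern=lambda text: text.endswith("?")):
    """Group statements by traversing them in order of generation time."""
    groups = []
    last_q = None  # antecedent questioning statement
    last_a = None  # antecedent answering statement
    for s in sorted(statements, key=lambda st: st.time):
        if s.role == "Q":
            # Same group if the interval is short enough, or if the
            # antecedent answer has the (assumed) preset sentence pattern.
            same_group = last_q is not None and (
                s.time - last_q.time <= max_gap
                or (last_a is not None and is_follow_up_pattern(last_a.text))
            )
            if not same_group:
                groups.append([])
            groups[-1].append(s)
            last_q = s
        elif groups:
            # An answer joins the group of its antecedent question;
            # an answer with no antecedent question is skipped.
            groups[-1].append(s)
            last_a = s
    return groups
```

In this sketch a question opens a new group only when both tests fail, which is one possible reading of the "and/or" wording.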
4. The computer device of claim 3, wherein the splitting module further
comprises:
splitting, when the number of the questioning statements included in the group
does not
exceed the first preset threshold, the questioning statements each into at
least two text
segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by
employing
a preset binary classifier;
generating corresponding questioning statements respectively according to text
segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning
statements
as generated and the answering statements included in the group.
5. The computer device of claim 4, further comprises:
traversing the text segments, and merging traversed text segments with
corresponding
posterior text segments when number of characters of the traversed text
segments is
smaller than a second preset threshold.
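As an illustration only, the merging of short text segments recited in claim 5 could be sketched as follows. The function name `merge_short_segments` and the default `min_chars=4` (standing in for the second preset threshold) are assumptions. The sketch also assumes that a segment already carrying merged text is measured by its combined length, and that the final segment, which has no posterior segment, is kept as it is.

```python
def merge_short_segments(segments, min_chars=4):
    """Merge each too-short segment into the segment that follows it."""
    merged = []
    carry = ""  # short text waiting to be prepended to the posterior segment
    for i, seg in enumerate(segments):
        current = carry + seg
        is_last = i == len(segments) - 1
        if len(current) < min_chars and not is_last:
            carry = current          # still too short: pass it forward
        else:
            merged.append(current)   # long enough, or nothing left to merge into
            carry = ""
    return merged
```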
6. The computer device of claim 5, further comprises:
merging the traversed text segments with corresponding posterior text segments
by
employing a preset classifier algorithm when the traversed text segments and
the
corresponding posterior text segments belong to a same intent class or when
the
traversed text segments belong to a preset merging intent class.
7. The computer device of claim 6, wherein the splitting module further
comprises:
merging, when there are the questioning statements that belong to the same
question,
the questioning statements that belong to the same question and generating the
corresponding statement pairs according to all merged questioning statements
and the
answering statements; and
generating the corresponding statement pairs according to all the questioning
statements
and the answering statements included in the group, when there are no
questioning
statements that belong to the same question.
8. The computer device of claim 7, wherein the splitting module further
comprises:
combining, when the numbers of the questioning statements and the answering
statements included in the group both exceed the first preset threshold, the
questioning
statements and the answering statements included in the group; and
generating the corresponding statement pairs.
9. The computer device of claim 8, wherein the updating module further
comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each
statement pair
group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm;
determining a weight to which each statement pair group corresponds according to
corresponding matching degrees and number of questioning statements included
in the
statement pair groups; and
sequentially updating the knowledge base of the system according to the weight
to
which each statement pair group corresponds.
10. The computer device of claim 9, wherein the splitting module further comprises:
rectifying any wrong word included in the session record according to a preset
rectifying rule; and
performing a normalizing process on rectified session record.
11. The computer device of claim 10, wherein the splitting module further comprises:
recognizing an intent class to which each questioning statement included in
the session
record corresponds by employing the preset classifier algorithm and
eliminating any
questioning statement to which a preset irrelevant intent class corresponds as
included
in the session record.
12. The computer device of claim 11, further comprises a process of analyzing
and mining
dialogue statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session
record;
and
rectifying any wrong word included in the session record according to a preset
rectifying rule.
13. The computer device of claim 12, further comprises:
performing a purification operation on all characters included in the session
record; and
recognizing a dialogue intent to which the questioning statement sent by each
user
corresponds by using the preset classifier algorithm.
14. The computer device of claim 13, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the
questioning
statements to be processed are traversed, and eliminating any questioning
statement to
be processed of the user that does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the
antecedent
questioning statement of the questioning statement to be processed according
to a preset
merging rule;
eliminating any answering statement whose number of characters is smaller than
a
preset number threshold when the answering statements of the customer service
are
traversed, and determining any answering statement remaining after the
elimination as
an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering
statement when the antecedent statement of the answering statement to be
processed is
an answering statement of the customer service, and storing the merged
answering
statement to the group to which the antecedent questioning statement
corresponds; and
merging the answering statement to be processed in the group to which the
questioning
statement corresponds when the antecedent statement of the answering statement
to be
processed is a questioning statement of the user.
15. The computer device of claim 14, further comprises:
splitting the questioning statements included in the group each into text
segments when
the group is of QA type;
processing the text segments from front to back, and merging any text segment
whose
number of characters is smaller than the preset number threshold in a
posterior text
segment of this text segment; and/or
merging any text segment pertaining to the same intent class as the
corresponding
posterior text segment or pertaining to a preset merging intent class in the
posterior text
segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a
sliding
window, and predicting whether the obtained text segments belong to a same and single
question by
means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each
questioning statement and its antecedent questioning statement belong to the
same
question when the group is of QQA type; and
combining all questioning statements and answering statements in pairs to
generate
corresponding QA statement pairs when the group is of QAQA type.
16. The computer device of claim 15, further comprises:
clustering the statements by employing a preset clustering algorithm,
generating
statement pair groups, and determining the number of questioning statements
included
in each statement pair group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm; and
determining a weight to which each statement pair group corresponds according
to the
corresponding matching degrees and the numbers of questioning statements
included in
the statement pair groups.
17. The computer device of claim 16, wherein the session record is directed to
text statements,
principal wrong words are homonyms.
18. The computer device of claim 17, wherein the session record is directed to
speech
statements, the session record is firstly required to convert the speech
statements into the
text statements through speech recognition technique.
19. The computer device of claim 18, wherein language model and word frequency
features
are combined.
20. The computer device of claim 19, wherein corresponding rectifying rules
are provided for
the speech statements and the text statements respectively.
21. The computer device of claim 20, wherein wrong words are rectified
according to the
corresponding rectifying rules.
22. The computer device of claim 21, wherein the purification operation
includes removing
irrelevant characters including preset useless punctuations and preset stop
words,
recognizing irrelevant information contained in each text statement including
commodity
names and placenames and normalizing the irrelevant infoimation to
corresponding preset
characters according to which the irrelevant information corresponds.
23. The computer device of claim 22, the session record of the user with
customer service
within one day is a segment of the dialogue.
24. The computer device of claim 23, wherein the session record is split into
one or more
dialogues, and the dialogues are split into groups.
25. The computer device of claim 24, wherein the user consults same type of
questions within
a preset period of time, wherein the customer service has replied, the user
consults
different questions next time in the dialogue with the customer service.
26. The computer device of claim 25, wherein to eliminate any questioning
statement with
eliminable intent and irrelevant to business whose number of characters is
smaller than a
preset number threshold or intent is judged by the preset classifier algorithm
as chitchat
intent.
27. The computer device of claim 26, wherein the interval time between the
antecedent
questioning statement of the questioning statement to be processed and the
questioning
statement to be processed exceeds a corresponding preset time threshold and/or
when the
sentence pattern of the antecedent answering statement of the questioning
statement to be
processed is a preset sentence pattern, the questioning statement to be
processed is merged
with its antecedent questioning statement.
28. The computer device of claim 27, wherein the antecedent questioning
statement is a
questioning statement that is temporally antecedent to the statement to be
processed and
with a shortest interval time to the statement to be processed.
29. The computer device of claim 28, wherein the antecedent answering
statement is an
answering statement that is temporally antecedent to the statement to be
processed and
with the shortest interval time to the statement to be processed.
30. The computer device of claim 29, wherein the interval time between the
antecedent
questioning statement of the questioning statement to be processed and the
questioning
statement to be processed exceeds the corresponding preset time threshold,
judge there is
no relevancy between the antecedent questioning statement and the questioning
statement
to be processed, so a new group is generated according to the questioning
statement to be
processed.
31. The computer device of claim 30, wherein the corresponding preset time
threshold is not
exceeded, judge there is relevancy between the antecedent questioning
statement and the
questioning statement to be processed, and the questioning statement to be
processed is
merged in the group to which the antecedent questioning statement corresponds.
32. The computer device of claim 31, wherein the preset sentence patterns
include statements
that guide the user to further respond to responses made by the customer
service, includes
asks in reply by the customer service to indefinite expressions of users, or
the sentence
pattern asking for essential information from the user.
33. The computer device of claim 32, wherein the antecedent answering
statement is of the
preset sentence pattern, the statement sent by the user after the antecedent
answering
statement is made in reply to this antecedent answering statement and is
relevant to the
antecedent answering statement, and the questioning statement to be processed
is merged
with the antecedent questioning statement.
34. The computer device of claim 33, wherein the antecedent answering
statement of the
questioning statement to be processed is of the preset sentence pattern, the
questioning
statement to be processed is merged with the antecedent questioning statement,
and the
questioning statement to be processed is merged in the group to which the
antecedent
questioning statement corresponds.
35. The computer device of claim 34, wherein the antecedent statement is a
statement that is
temporally antecedent to the statement to be processed and with the shortest
interval time
to the statement to be processed.
36. The computer device of claim 35, wherein splitting the dialogue into
groups, wherein
result includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;
plural questions correspond to a segment of reply, wherein the user asks
plural
questions and the customer service replies with a segment of words, is marked
as QQA;
and
plural questions correspond to plural replies, wherein several rounds of
communication
are carried out between the user and the customer service in a short time, is
marked as
QAQA.
37. The computer device of claim 36, wherein corresponding type is determined
according to
number of answering statements and questioning statements included in each
group, and is
processed according to corresponding processing rule.
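As an illustration only, the determination of the group type from the statement counts, as recited in claims 36 and 37, could be sketched as follows. The function name `group_type` and the default `threshold=1` (standing in for the first preset threshold) are assumptions. Treating every group whose question count does not exceed the threshold as QA follows the condition stated in claim 4.

```python
def group_type(num_questions, num_answers, threshold=1):
    """Return the assumed group label: "QA", "QQA" or "QAQA"."""
    many_questions = num_questions > threshold
    many_answers = num_answers > threshold
    if not many_questions:
        return "QA"    # question count does not exceed the threshold
    if not many_answers:
        return "QQA"   # plural questions, one segment of reply
    return "QAQA"      # plural questions and plural replies


# Example: a group with three questions and one answer.
print(group_type(3, 1))  # QQA
```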
38. The computer device of claim 37, wherein QA is a standard input form of an
algorithm,
wherein one standard question is only meant to express one question in the
knowledge
base.
39. The computer device of claim 38, wherein an auxiliary algorithm is
required during
splitting to judge whether two segments of words are directed to one question
or to two
questions.
40. The computer device of claim 39, wherein the auxiliary algorithm is a
binary classifier,
wherein inputs to the binary classifier are two statements.
41. The computer device of claim 40, wherein any model realizes binary
questions.
42. The computer device of claim 41, wherein a BERT model predicts during the process of
process of
pretraining whether input two statements are directed to context of the same
and single
statement or topics irrelevant to each other, serves as the classifier, and
fine-tuning training
is performed.
43. The computer device of claim 42, wherein the posterior text segment
indicates a text
segment following and immediately adjacent to the text segment being
processed.
44. The computer device of claim 43, wherein the classifier algorithm merges
any text
segment judged as pertaining to the same intent class as the posterior text
segment or
pertaining to the preset merging intent class as a chitchat class in the
posterior text
segment.
45. The computer device of claim 44, wherein text segments predicted to belong
to the same
question are merged into one questioning statement.
46. The computer device of claim 45, wherein the text segments predicted to
belong to
different questions are split into two different questioning statements.
47. The computer device of claim 46, wherein the groups of QA type not
belonging to the
same and single question are converted to groups of QQA type, wherein the
groups of QA
type whose all text segments belong to the same and single question are split
into a QA
statement pair only includes one questioning statement and one answering
statement.
48. The computer device of claim 47, wherein to recognize to which
circumstance a group of
QQA type specifically pertains, judge through a binary classification
algorithm.
49. The computer device of claim 48, wherein the questioning statement is
split into text
segments, and any text segment whose number of characters included is smaller
than the
preset number threshold or pertaining to the preset merging intent is directly
merged with
the antecedent questioning statement, or the questioning statement and the
antecedent
questioning statement are input together in the binary classification
algorithm to judge they
belong to the same question.
50. The computer device of claim 49, wherein it is judged any text segment and
corresponding
antecedent questioning statement belong to the same question, the text
statement and the
corresponding antecedent questioning statement are merged into one questioning
statement.
51. The computer device of claim 50, wherein it is recognized any text
segment and the
corresponding antecedent questioning statement do not belong to the same
question, the
questioning statement is split into new questioning statements.
52. The computer device of claim 51, wherein, when the statements remaining are only
one answering
statement and one questioning statement, they are determined as the QA
statement pair.
53. The computer device of claim 52, wherein, when the statements remaining are more
than one
questioning statement and one answering statement, the questioning statements
and the
answering statements are combined in pairs to generate corresponding QA
statement pairs.
54. The computer device of claim 53, wherein interaction between the user and
the customer
service in the short time is split into plural groups of QA statement pairs.
55. The computer device of claim 54, wherein the answering statements and the
questioning
statements are directly combined in pairs with respect to groups of QAQA type.
56. The computer device of claim 55, wherein clustering is to incorporate
similar questions
together to constitute a cluster.
57. The computer device of claim 56, wherein calculate text distance metrics
amongst the
statement pairs via a text matching algorithm and determine the statement
pairs belong to
same statement pair group according to the text distance metrics.
58. The computer device of claim 57, wherein the text matching algorithm is an
algorithm
that calculates a similarity degree of two texts.
59. The computer device of claim 58, wherein an unsupervised text matching
algorithm, word
mover's distance (WMD), is used.
60. The computer device of claim 59, wherein any clustering algorithm is
applied to determine
statement pairs belong to the same statement pair group.
61. The computer device of claim 60, wherein in all QA pairs, there are
invalid QA pairs
caused by imprecise splitting, and circumstance in which answers are not
pertinent to
questions asked due to negligence of the customer service, wherein invalid QA
statement
pairs are removed.
62. The computer device of claim 61, wherein filtration of the QA statement
pairs is decided
by matching degrees of questions and answers, wherein the QA statement pairs
whose
matching degrees between questioning statements and answering statements
satisfy a
preset condition remain, wherein the QA statement pairs whose matching degrees
between
questioning statements and answering statements do not satisfy are eliminated
and filtered
out.
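As an illustration only, combining questioning and answering statements in pairs (claims 53 and 55) and then filtering the pairs by matching degree (claim 62) could be sketched as follows. Reading "combined in pairs" as every question paired with every answer is an assumption. The `jaccard` word-overlap score is a toy stand-in for the matching degree, not the supervised similarity calculation of claim 63. The names `candidate_pairs`, `filter_pairs` and `min_score` are hypothetical.

```python
from itertools import product


def jaccard(question, answer):
    """Toy matching degree: word overlap between the two texts."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q | a) if q | a else 0.0


def candidate_pairs(questions, answers):
    """Pair every questioning statement with every answering statement."""
    return list(product(questions, answers))


def filter_pairs(pairs, score=jaccard, min_score=0.2):
    """Keep only pairs whose matching degree satisfies the assumed condition."""
    return [(q, a) for q, a in pairs if score(q, a) >= min_score]
```

Any scoring function that takes two strings and returns a number can be passed as `score` in place of the toy one.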
63. The computer device of claim 62, wherein matching process is a text
matching process,
wherein a set of supervised algorithms are trained based on existing knowledge
base data
to perform similarity calculation.
64. The computer device of claim 63, wherein frequently asked questions have
higher
priorities to be maintained in the knowledge base.
65. The computer device of claim 64, wherein more important questions are
preferentially
maintained wherein some less valuable questions are neglected.
66. The computer device of claim 65, wherein frequencies by which questions
are asked are
measured by the number of questions under each cluster obtained.
67. The computer device of claim 66, wherein accuracy of answers is measured
by matching
degrees of questions and answers in a filtering process.
68. The computer device of claim 67, wherein the corresponding sorting weight
is derived by
normalizing two values and weighting and accumulating the two values.
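As an illustration only, the sorting weight of claim 68 could be sketched as follows, taking the two values to be the number of questions in each cluster (claim 66) and the matching degree (claim 67). Min-max normalization and the equal default weighting `size_weight=0.5` are assumptions, since the claims name neither a normalization method nor the weights. The function name `sorting_weights` is hypothetical.

```python
def sorting_weights(cluster_sizes, matching_degrees, size_weight=0.5):
    """Normalize both values, then weight and accumulate them per group."""

    def min_max(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # All values equal: treat every group as equally ranked.
            return [1.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    sizes = min_max(cluster_sizes)
    matches = min_max(matching_degrees)
    return [size_weight * s + (1 - size_weight) * m
            for s, m in zip(sizes, matches)]
```

Statement pair groups could then be processed in descending order of the returned weights during maintenance of the knowledge base.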
69. The computer device of claim 68, wherein corresponding statement pairs are
sequentially
obtained according to sorting weights during subsequent maintenance of the
knowledge
base, and screened and processed manually or by machine, and maintained in the
knowledge base.
70. A system comprising:
an obtaining module, configured to obtain a session record, wherein the
session record
includes at least two statements, wherein the statements include questioning
statements
sent by questioners and answering statements sent by answerers;
a splitting module, configured to:
split the session record into corresponding groups according to a preset
splitting rule, wherein groups include at least one questioning statement and
at least one answering statement;
split the groups into corresponding statement pairs according to a processing
rule to which the groups correspond;
use, when a number of the answering statements included in the group does
not exceed a first preset threshold and a number of the questioning statements
as included exceeds the first preset threshold, a preset binary classifier to
predict whether the questioning statements as included and an antecedent
questioning statement of the questioning statements as included belong to a
same question;
a judging module, configured to determine the processing rule to which the
groups
correspond according to the number of the questioning statements and the
number of
the answering statements included in the groups, wherein the processing rule
is based
on the number of questioning statements and the number of answering statements
as
compared to the first preset threshold; and
an updating module, configured to update a knowledge base of a system
according to
statement pairs.
71. The system of claim 70, wherein each statement has a corresponding
generation time.
72. The system of claim 71, wherein the splitting module further comprises:
sequentially traversing the session record according to generation time of
each
statement;
judging, when the statement traversed is the questioning statement, whether a
traversed
questioning statement and an antecedent questioning statement of the traversed
questioning statement belong to same group according to a sentence pattern of
antecedent answering statement of the traversed questioning statement and/or
according
to an interval time to the antecedent questioning statement of the traversed
questioning
statement; and
determining, when the statement traversed is the answering statement, that a
traversed
answering statement belongs to the group to which the antecedent questioning
statement
of the traversed answering statement corresponds.
73. The system of claim 72, wherein the splitting module further comprises:
splitting, when the number of the questioning statements included in the group
does not
exceed the first preset threshold, the questioning statements each into at
least two text
segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by
employing
a preset binary classifier;
generating corresponding questioning statements respectively according to text
segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning
statements
as generated and the answering statements included in the group.
74. The system of claim 73, further comprises:
traversing the text segments, and merging traversed text segments with
corresponding
posterior text segments when number of characters of the traversed text
segments is
smaller than a second preset threshold.
75. The system of claim 74, further comprises:
merging the traversed text segments with corresponding posterior text segments
by
employing a preset classifier algorithm when the traversed text segments and
the
corresponding posterior text segments belong to a same intent class or when
the
traversed text segments belong to a preset merging intent class.
76. The system of claim 75, wherein the splitting module further comprises:
merging, when there are the questioning statements that belong to the same
question,
the questioning statements that belong to the same question and generating the
corresponding statement pairs according to all merged questioning statements
and the
answering statements; and
generating the corresponding statement pairs according to all the questioning
statements
and the answering statements included in the group, when there are no
questioning
statements that belong to the same question.
77. The system of claim 76, wherein the splitting module further comprises:
combining, when the numbers of the questioning statements and the answering
statements included in the group both exceed the first preset threshold, the
questioning
statements and the answering statements included in the group; and
generating the corresponding statement pairs.
78. The system of claim 77, wherein the updating module further comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each
statement pair
group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm;
determining a weight to which each statement pair group corresponds according to
corresponding matching degrees and number of questioning statements included
in the
statement pair groups; and
sequentially updating the knowledge base of the system according to the weight
to
which each statement pair group corresponds.
79. The system of claim 78, wherein the splitting module further comprises:
rectifying any wrong word included in the session record according to a preset
rectifying rule; and
performing a normalizing process on rectified session record.
80. The system of claim 79, wherein the splitting module further comprises:
recognizing an intent class to which each questioning statement included in
the session
record corresponds by employing the preset classifier algorithm and
eliminating any
questioning statement to which a preset irrelevant intent class corresponds as
included
in the session record.
81. The system of claim 80, further comprises a process of analyzing and
mining dialogue
statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session
record;
and
rectifying any wrong word included in the session record according to a preset
rectifying rule.
82. The system of claim 81, further comprises:
performing a purification operation on all characters included in the session
record; and
recognizing a dialogue intent to which the questioning statement sent by each
user
corresponds by using the preset classifier algorithm.
83. The system of claim 82, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the
questioning
statements to be processed are traversed, and eliminating any questioning
statement to
be processed of the user that does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the
antecedent
questioning statement of the questioning statement to be processed according
to a preset
merging rule;
eliminating any answering statement whose number of characters is smaller than
a
preset number threshold when the answering statements of the customer service
are
traversed, and determining any answering statement remaining after the
elimination as
an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering
statement when the antecedent statement of the answering statement to be
processed is
an answering statement of the customer service, and storing the merged
answering
statement to the group to which the antecedent questioning statement
corresponds; and
merging the answering statement to be processed in the group to which the
questioning
statement corresponds when the antecedent statement of the answering statement
to be
processed is a questioning statement of the user.
84. The system of claim 83, further comprises:
splitting the questioning statements included in the group each into text
segments when
the group is of QA type;
processing the text segments from front to back, and merging any text segment
whose
number of characters is smaller than the preset number threshold in a
posterior text
segment of this text segment; and/or
merging any text segment pertaining to the same intent class as the
corresponding
posterior text segment or pertaining to a preset merging intent class in the
posterior text
segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a
sliding
window, and predicting whether the obtained text segments belong to a same and single
question by
means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each
questioning statement and its antecedent questioning statement belong to the
same
question when the group is of a QQA type; and
combining all questioning statements and answering statements in pairs to
generate
corresponding QA statement pairs when the group is of a QAQA type.
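As an illustration only, the sliding-window step recited above can be sketched as follows. The window size of two, the function names, and the word-overlap stand-in for the binary classifier are all assumptions made for this sketch and are not taken from the claims.

```python
from typing import Callable, List


def merge_by_sliding_window(
    segments: List[str],
    same_question: Callable[[str, str], bool],
) -> List[str]:
    """Slide a window of two adjacent segments; merge those predicted to be one question."""
    if not segments:
        return []
    questions = [segments[0]]
    for previous, current in zip(segments, segments[1:]):
        if same_question(previous, current):
            # Predicted to belong to the same question: extend the current question.
            questions[-1] = questions[-1] + " " + current
        else:
            # Predicted to be a different question: start a new questioning statement.
            questions.append(current)
    return questions


def toy_same_question(a: str, b: str) -> bool:
    """Hypothetical stand-in for a trained binary classifier: any shared word."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


segments = ["where is my order", "my order is late", "how do I return shoes"]
result = merge_by_sliding_window(segments, toy_same_question)
```

In practice the callable would wrap whatever trained classifier is used; the merging loop itself does not depend on the model.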
85. The system of claim 84, further comprising:
clustering the statements by employing a preset clustering algorithm,
generating
statement pair groups, and determining the number of questioning statements
included
in each statement pair group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm; and
determining a weight to which each statement pair group corresponds according
to the
corresponding matching degrees and the numbers of questioning statements
included in
the statement pair groups.
86. The system of claim 85, wherein, when the session record is directed to
text statements, the principal wrong words are homonyms.
87. The system of claim 86, wherein, when the session record is directed to
speech statements, the speech statements are first converted into the text
statements through a speech recognition technique.
88. The system of claim 87, wherein language model and word frequency features
are
combined.
89. The system of claim 88, wherein corresponding rectifying rules are
provided for the
speech statements and the text statements respectively.
90. The system of claim 89, wherein wrong words are rectified according to
the
corresponding rectifying rules.
91. The system of claim 90, wherein the purification operation includes
removing irrelevant
characters including preset useless punctuations and preset stop words,
recognizing
irrelevant information contained in each text statement including commodity
names and
placenames and normalizing the irrelevant information to corresponding preset
characters
according to which the irrelevant information corresponds.
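A minimal sketch of such a purification operation is given below for illustration. The stop-word set, the entity table, the placeholder tokens, and the choice to strip every punctuation character are assumptions of this sketch; the claim only requires preset lists.

```python
import re

# Hypothetical preset lists; a real system would load these from configuration.
STOP_WORDS = {"um", "please", "the"}
ENTITY_TOKENS = {
    "beijing": "<PLACE>",
    "shanghai": "<PLACE>",
    "iphone": "<PRODUCT>",
}


def purify(text: str) -> str:
    """Remove punctuation and stop words, then normalize known entities to preset tokens."""
    # Replace every character that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = []
    for word in cleaned.split():
        if word in STOP_WORDS:
            continue
        # Map commodity names and placenames to their preset placeholder.
        tokens.append(ENTITY_TOKENS.get(word, word))
    return " ".join(tokens)


purified = purify("Please, ship the iPhone to Beijing!!!")
```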
92. The system of claim 91, wherein the session record of the user with customer
service within one
day is a segment of the dialogue.
93. The system of claim 92, wherein the session record is split into one or
more dialogues, and
the dialogues are split into groups.
94. The system of claim 93, wherein the user consults same type of questions
within a preset
period of time, wherein the customer service has replied, the user consults
different
questions next time in the dialogue with the customer service.
95. The system of claim 94, wherein to eliminate any questioning statement
with eliminable
intent and irrelevant to business whose number of characters is smaller than a
preset
number threshold or intent is judged by the preset classifier algorithm as
chitchat intent.
96. The system of claim 95, wherein the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds a corresponding preset time threshold and/or when the
sentence pattern
of the antecedent answering statement of the questioning statement to be
processed is a
preset sentence pattern, the questioning statement to be processed is merged
with its
antecedent questioning statement.
97. The system of claim 96, wherein the antecedent questioning statement is a
questioning
statement that is temporally antecedent to the statement to be processed and
with a shortest
interval time to the statement to be processed.
98. The system of claim 97, wherein the antecedent answering statement is an
answering
statement that is temporally antecedent to the statement to be processed and
with the
shortest interval time to the statement to be processed.
99. The system of claim 98, wherein the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds the corresponding preset time threshold, judge there is no
relevancy
between the antecedent questioning statement and the questioning statement to
be
processed, so a new group is generated according to the questioning statement
to be
processed.
100. The system of claim 99, wherein the corresponding preset time threshold
is not exceeded,
judge there is relevancy between the antecedent questioning statement and the
questioning
statement to be processed, and the questioning statement to be processed is
merged in the
group to which the antecedent questioning statement corresponds.
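The time-threshold rule of the two preceding claims can be sketched as below, purely as an illustration. Timestamps in seconds, the function name, and the 120-second threshold in the usage line are assumptions; the sentence-pattern condition is deliberately left out of this sketch.

```python
from typing import List, Tuple


def group_questions_by_gap(
    questions: List[Tuple[float, str]],
    max_gap_seconds: float,
) -> List[List[str]]:
    """Start a new group when the gap to the antecedent question exceeds the threshold."""
    groups: List[List[str]] = []
    last_time = None
    for timestamp, text in sorted(questions):
        if last_time is None or timestamp - last_time > max_gap_seconds:
            # Threshold exceeded (or first question): no relevancy, new group.
            groups.append([text])
        else:
            # Threshold not exceeded: merge into the antecedent question's group.
            groups[-1].append(text)
        last_time = timestamp
    return groups


groups = group_questions_by_gap(
    [(0.0, "where is my order"), (30.0, "it was due monday"), (500.0, "can I change my address")],
    max_gap_seconds=120.0,
)
```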
101. The system of claim 100, wherein the preset sentence patterns include
statements that
guide the user to further respond to responses made by the customer service,
includes asks
in reply by the customer service to indefinite expressions of users, or the
sentence pattern
asking for essential information from the user.
102. The system of claim 101, wherein the antecedent answering statement is of
the preset
sentence pattern, the statement sent by the user after the antecedent
answering statement is
made in reply to this antecedent answering statement and is relevant to the
antecedent
answering statement, and the questioning statement to be processed is merged
with the
antecedent questioning statement.
103. The system of claim 102, wherein the antecedent answering statement of
the questioning
statement to be processed is of the preset sentence pattern, the questioning
statement to be
processed is merged with the antecedent questioning statement, and the
questioning
statement to be processed is merged in the group to which the antecedent
questioning
statement corresponds.
104. The system of claim 103, wherein the antecedent statement is a statement
that is
temporally antecedent to the statement to be processed and with the shortest
interval time
to the statement to be processed.
105. The system of claim 104, wherein splitting the dialogue into groups,
wherein result
includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;
plural questions correspond to a segment of reply, wherein the user asks
plural
questions and the customer service replies with a segment of words, is marked
as QQA;
and
plural questions correspond to plural replies, wherein several rounds of
communication
are carried out between the user and the customer service in a short time, is
marked as
QAQA.
106. The system of claim 105, wherein corresponding type is determined
according to
number of answering statements and questioning statements included in each
group, and is
processed according to corresponding processing rule.
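For illustration, the type determination can be sketched as a comparison of the two counts against a threshold. The default threshold of 1 and the handling of a group with few questions but several answers (treated here as QA) are assumptions of this sketch.

```python
def group_type(num_questions: int, num_answers: int, threshold: int = 1) -> str:
    """Classify a group as QA, QQA, or QAQA from its statement counts."""
    if num_questions <= threshold:
        # Questions do not exceed the threshold: handled by the QA rule.
        return "QA"
    if num_answers <= threshold:
        # Several questions but a single reply segment.
        return "QQA"
    # Both counts exceed the threshold: several rounds of communication.
    return "QAQA"


types = [group_type(1, 1), group_type(3, 1), group_type(2, 2)]
```

Each returned label would then select the corresponding processing rule for the group.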
107. The system of claim 106, wherein QA is a standard input form of an
algorithm, wherein one
standard question is only meant to express one question in the knowledge base.
108. The system of claim 107, wherein an auxiliary algorithm is required
during splitting to
judge whether two segments of words are directed to one question or to two
questions.
109. The system of claim 108, wherein the auxiliary algorithm is a binary
classifier, wherein
inputs to the binary classifier are two statements.
110. The system of claim 109, wherein any model realizes binary questions.
111. The system of claim 110, wherein a BERT model, which predicts during the
process of pretraining whether two input statements are directed to context of
the same and single statement or to topics irrelevant to each other, serves as
the classifier, and fine-tuning training is performed.
112. The system of claim 111, wherein the posterior text segment indicates a
text segment
following and immediately adjacent to the text segment being processed.
113. The system of claim 112, wherein the classifier algorithm merges any text
segment judged
as pertaining to the same intent class as the posterior text segment or
pertaining to the
preset merging intent class as a chitchat class in the posterior text segment.
114. The system of claim 113, wherein text segments predicted to belong to the
same question
are merged into one questioning statement.
115. The system of claim 114, wherein the text segments predicted to belong to
different
questions are split into two different questioning statements.
116. The system of claim 115, wherein the groups of the QA type not belonging
to the same
and single question are converted to groups of the QQA type, wherein the
groups of the
QA type whose all text segments belong to the same and single question are
split into a
QA statement pair only includes one questioning statement and one answering
statement.
117. The system of claim 116, wherein to recognize to which circumstance a
group of the
QQA type specifically pertains, judge through a binary classification
algorithm.
118. The system of claim 117, wherein the questioning statement is split into
text segments, and
any text segment whose number of characters included is smaller than the
preset number
threshold or pertaining to the preset merging intent is directly merged with
the antecedent
questioning statement, or the questioning statement and the antecedent
questioning
statement are input together in the binary classification algorithm to judge
they belong to
the same question.
119. The system of claim 118, wherein it is judged any text segment and
corresponding
antecedent questioning statement belong to the same question, the text
statement and the
corresponding antecedent questioning statement are merged into one questioning
statement.
120. The system of claim 119, wherein it is recognized any text segment and
the
corresponding antecedent questioning statement do not belong to the same
question, the
questioning statement is split into new questioning statements.
121. The system of claim 120, wherein, when the statements that remain are only
one answering statement and one questioning statement, they are determined as
the QA statement pair.
122. The system of claim 121, wherein, when the statements that remain are more
than one questioning statement and one answering statement, the questioning
statements and the answering statements are combined in pairs to generate
corresponding QA statement pairs.
123. The system of claim 122, wherein interaction between the user and the
customer service in
the short time is split into plural groups of QA statement pairs.
124. The system of claim 123, wherein the answering statements and the
questioning statements
are directly combined in pairs with respect to groups of the QAQA type.
125. The system of claim 124, wherein clustering is to incorporate similar
questions together to
constitute a cluster.
126. The system of claim 125, wherein text distance metrics amongst the
statement pairs are calculated via a text matching algorithm, and the statement
pairs are determined to belong to the same statement pair group according to
the text distance metrics.
127. The system of claim 126, wherein the text matching algorithm is an
algorithm that calculates
the similarity degree of two texts.
128. The system of claim 127, wherein an unsupervised text matching algorithm,
word mover's
distance (WMD), is used.
129. The system of claim 128, wherein any clustering algorithm is applied to
determine
statement pairs belong to the same statement pair group.
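One way to picture distance-based grouping is the greedy sketch below. It is illustrative only: Jaccard token distance is swapped in for word mover's distance so the example has no external dependencies, and the function names, the single-pass greedy strategy, and the 0.6 distance limit are assumptions.

```python
from typing import Callable, List


def jaccard_distance(a: str, b: str) -> float:
    """Stand-in text distance: 1 minus token-set overlap (not word mover's distance)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def cluster_questions(
    questions: List[str],
    max_distance: float = 0.6,
    distance: Callable[[str, str], float] = jaccard_distance,
) -> List[List[str]]:
    """Assign each question to the first cluster whose representative is close enough."""
    clusters: List[List[str]] = []
    for question in questions:
        for cluster in clusters:
            if distance(question, cluster[0]) <= max_distance:
                cluster.append(question)
                break
        else:
            # No existing cluster is close enough: open a new one.
            clusters.append([question])
    return clusters


clusters = cluster_questions(
    ["how do I get a refund", "how can I get a refund", "where is my order"]
)
```

Any other distance function with the same signature, or any other clustering algorithm, could be substituted without changing the surrounding steps.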
130. The system of claim 129, wherein in all QA pairs, there are invalid QA
pairs caused by
imprecise splitting, and circumstance in which answers are not pertinent to
questions asked
due to negligence of the customer service, wherein invalid QA statement pairs
are
removed.
131. The system of claim 130, wherein filtration of the QA statement pairs is
decided by
matching degrees of questions and answers, wherein the QA statement pairs
whose
matching degrees between questioning statements and answering statements
satisfy a
preset condition remain, wherein the QA statement pairs whose matching degrees
between questioning statements and answering statements do not satisfy the
preset condition are eliminated and filtered out.
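The filtration step can be sketched as follows, for illustration only. Token overlap is used here as a hypothetical matching degree, and the "preset condition" is assumed to be a minimum score of 0.2; a trained matching model would replace the scoring function.

```python
from typing import Callable, List, Tuple


def token_overlap(question: str, answer: str) -> float:
    """Hypothetical matching degree: Jaccard overlap of the two token sets."""
    q_tokens, a_tokens = set(question.lower().split()), set(answer.lower().split())
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


def filter_pairs(
    pairs: List[Tuple[str, str]],
    score: Callable[[str, str], float] = token_overlap,
    min_score: float = 0.2,
) -> List[Tuple[str, str]]:
    """Keep only the QA pairs whose matching degree satisfies the preset condition."""
    return [(q, a) for q, a in pairs if score(q, a) >= min_score]


kept = filter_pairs([
    ("when will my order ship", "your order will ship tomorrow"),
    ("how do I get a refund", "have a nice day"),
])
```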
132. The system of claim 131, wherein matching process is a text matching
process, wherein a
set of supervised algorithms are trained based on existing knowledge base data
to perform
similarity calculation.
133. The system of claim 132, wherein frequently asked questions have higher
priorities to be
maintained in the knowledge base.
134. The system of claim 133, wherein more important questions are
preferentially maintained
wherein some less valuable questions are neglected.
135. The system of claim 134, wherein frequencies by which questions are asked
are measured
by the number of questions under each cluster obtained.
136. The system of claim 135, wherein accuracy of answers is measured by
matching degrees
of questions and answers in a filtering process.
137. The system of claim 136, wherein the corresponding sorting weight is
derived by
normalizing two values and weighting and accumulating the two values.
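A worked sketch of one possible sorting weight follows. Min-max normalization, the mixing coefficient `alpha`, and the example cluster data are assumptions; the claims do not fix a particular normalization or weighting.

```python
from typing import Dict, List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; all-equal inputs map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(
    clusters: Dict[str, Tuple[int, float]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Combine normalized question count and matching degree into one weight per cluster."""
    names = list(clusters)
    counts = min_max([float(clusters[name][0]) for name in names])
    matches = min_max([clusters[name][1] for name in names])
    return {
        name: alpha * count + (1.0 - alpha) * match
        for name, count, match in zip(names, counts, matches)
    }


# Each entry: cluster name -> (number of questions, mean matching degree).
weights = sorting_weights({"refund": (10, 0.9), "shipping": (4, 0.5), "greeting": (1, 0.7)})
ranked = sorted(weights, key=weights.get, reverse=True)
```

Statement pair groups would then be taken in `ranked` order when maintaining the knowledge base.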
138. The system of claim 137, wherein corresponding statement pairs are
sequentially obtained
according to sorting weights during subsequent maintenance of the knowledge
base, and
screened and processed manually or by machine, and maintained in the knowledge
base.
139. A method comprising:
obtaining a session record, wherein the session record includes at least two
statements,
wherein the statements include questioning statements sent by questioners and
answering statements sent by answerers;
splitting the session record into corresponding groups according to a preset
splitting
rule, wherein groups include at least one questioning statement and at least
one
answering statement;
determining a processing rule to which the groups correspond according to
number of
the questioning statements and number of the answering statements included in
the
groups, wherein the processing rule is based on the number of questioning
statements
and the number of answering statements as compared to a first preset
threshold;
splitting the groups into corresponding statement pairs according to the
processing rule
to which the groups correspond;
using, when the number of the answering statements included in the group does
not
exceed the first preset threshold and the number of the questioning statements
as
included exceeds the first preset threshold, a preset binary classifier to
predict whether
the questioning statements as included and an antecedent questioning
statements of the
questioning statements as included belong to a same question; and
updating a knowledge base of a system according to statement pairs.
140. The method of claim 139, wherein each statement has a corresponding
generation time.
141. The method of claim 140, wherein splitting the session record into
corresponding groups
according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of
each
statement;
judging, when the statement traversed is the questioning statement, whether a
traversed
questioning statement and an antecedent questioning statement of the traversed
questioning statement belong to same group according to a sentence pattern of
antecedent answering statement of the traversed questioning statement and/or
according
to an interval time to the antecedent questioning statement of the traversed
questioning
statement; and
determining, when the statement traversed is the answering statement, a
traversed
answering statement belongs to the group to which the antecedent questioning
statement
of the traversed answering statement corresponds.
142. The method of claim 141, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
splitting, when the number of the questioning statements included in the group
does not
exceed the first preset threshold, the questioning statements each into at
least two text
segments according to preset signs included in the questioning statements;
predicting whether two adjacent text segments belong to a same question by
employing a preset binary classifier;
generating corresponding questioning statements respectively according to text
segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning
statements
as generated and the answering statements included in the group.
143. The method of claim 142, further comprising:
traversing the text segments, and merging traversed text segments with
corresponding
posterior text segments when number of characters of the traversed text
segments is
smaller than a second preset threshold.
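The merging of short text segments into their posterior segments can be sketched as below. This is illustrative only: the function name, joining with a single space, and attaching a trailing short segment to the preceding one (since it has no posterior segment) are assumptions of the sketch.

```python
from typing import List


def merge_short_segments(segments: List[str], min_chars: int) -> List[str]:
    """Merge any segment shorter than min_chars into the segment that follows it."""
    merged: List[str] = []
    carry = ""
    for segment in segments:
        candidate = carry + segment
        carry = ""
        if len(candidate) < min_chars:
            # Too short: hold it and prepend it to the posterior segment.
            carry = candidate + " "
        else:
            merged.append(candidate)
    if carry:
        # Assumption: a trailing short segment is attached to the last kept segment.
        leftover = carry.strip()
        if merged:
            merged[-1] = merged[-1] + " " + leftover
        else:
            merged.append(leftover)
    return merged


merged = merge_short_segments(
    ["hi", "I ordered shoes last week", "and", "they have not arrived"],
    min_chars=5,
)
```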
144. The method of claim 143, further comprising:
merging the traversed text segments with corresponding posterior text segments
by
employing a preset classifier algorithm when the traversed text segments and
the
corresponding posterior text segments belong to a same intent class or when
the
traversed text segments belong to a preset merging intent class.
145. The method of claim 144, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
combining, when the numbers of the questioning statements and the answering
statements included in the group both exceed the first preset threshold, the
questioning
statements and the answering statements included in the group; and
generating the corresponding statement pairs.
146. The method of claim 145, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
merging, when there are questioning statements that belong to the same
question, the questioning statements that belong to the same question and
generating the corresponding
statement pairs according to all merged questioning statements and the
answering
statements; and
generating the corresponding statement pairs according to all the questioning
statements
and the answering statements included in the group, when there are no
questioning statements that belong to the same question.
147. The method of claim 146, wherein updating the knowledge base of the
system according
to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each
statement pair
group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm;
determining a weight to which each statement pair group corresponds according to
corresponding matching degrees and number of questioning statements included
in the
statement pair groups; and
sequentially updating the knowledge base of the system according to the
weight to
which each statement pair group corresponds.
148. The method of claim 147, further comprising:
rectifying any wrong word included in the session record according to a preset
rectifying rule; and
performing a normalizing process on rectified session record.
149. The method of claim 148, further comprising:
recognizing the intent class to which each questioning statement included in
the session
record corresponds by using the preset classifier algorithm and eliminating
any
questioning statement to which a preset irrelevant intent class corresponds as
included
in the session record.
150. The method of claim 149, further comprising a process of analyzing and
mining dialogue
statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session
record;
and
rectifying any wrong word included in the session record according to a preset
rectifying rule.
151. The method of claim 150, further comprising:
performing a purification operation on all characters included in the session
record; and
recognizing a dialogue intent to which the questioning statement sent by each
user
corresponds by using the preset classifier algorithm.
152. The method of claim 151, further comprising:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the
questioning
statements to be processed are traversed, and eliminating any questioning
statement to be processed of the user that does not conform to a preset
condition;
determining a questioning statement to be processed is to be merged with the
antecedent
questioning statement of the questioning statement to be processed according
to a preset
merging rule;
eliminating any answering statement whose number of characters is smaller than
a
preset number threshold when the answering statements of the customer service
are
traversed, and determining any answering statement remaining after the
elimination as
an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering
statement when the antecedent statement of the answering statement to be
processed is
an answering statement of the customer service, and storing the merged
answering
statement to the group to which the antecedent questioning statement
corresponds; and
merging the answering statement to be processed in the group to which the
questioning
statement corresponds when the antecedent statement of the answering statement
to be
processed is a questioning statement of the user.
153. The method of claim 152, further comprising:
splitting the questioning statements included in the group each into text
segments when
the group is of a QA type;
processing the text segments from front to back, and merging any text segment
whose
number of characters is smaller than the preset number threshold in a
posterior text
segment of this text segment; and/or
merging any text segment pertaining to the same intent class as the
corresponding
posterior text segment or pertaining to a preset merging intent class in the
posterior text
segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a
sliding window, and predicting whether the obtained text segments belong to a
same and single question by means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether
each questioning statement and its antecedent questioning statement belong to
the same question when the group is of a QQA type; and
combining all questioning statements and answering statements in pairs to
generate
corresponding QA statement pairs when the group is of a QAQA type.
154. The method of claim 153, further comprising:
clustering the statements by employing a preset clustering algorithm,
generating
statement pair groups, and determining the number of questioning statements
included
in each statement pair group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm; and
determining a weight to which each statement pair group corresponds according
to the
corresponding matching degrees and the numbers of questioning statements
included in
the statement pair groups.
155. The method of claim 154, wherein, when the session record is directed to
text statements, the principal wrong words are homonyms.
156. The method of claim 155, wherein, when the session record is directed to
speech statements, the speech statements are first converted into the text
statements through a speech recognition technique.
157. The method of claim 156, wherein language model and word frequency
features are
combined.
158. The method of claim 157, wherein corresponding rectifying rules are
provided for the
speech statements and the text statements respectively.
159. The method of claim 158, wherein wrong words are rectified according to
the
corresponding rectifying rules.
160. The method of claim 159, wherein the purification operation includes
removing irrelevant
characters including preset useless punctuations and preset stop words,
recognizing
irrelevant information contained in each text statement including commodity
names and
placenames and normalizing the irrelevant information to corresponding preset
characters
according to which the irrelevant information corresponds.
161. The method of claim 160, wherein the session record of the user with customer
service within one
day is a segment of the dialogue.
162. The method of claim 161, wherein the session record is split into one or
more dialogues,
and the dialogues are split into groups.
163. The method of claim 162, wherein the user consults same type of questions
within a preset
period of time, wherein the customer service has replied, the user consults
different
questions next time in the dialogue with the customer service.
164. The method of claim 163, wherein to eliminate any questioning statement
with eliminable
intent and irrelevant to business whose number of characters is smaller than a
preset
number threshold or intent is judged by the preset classifier algorithm as
chitchat intent.
165. The method of claim 164, wherein the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds a corresponding preset time threshold and/or when the
sentence pattern
of the antecedent answering statement of the questioning statement to be
processed is a
preset sentence pattern, the questioning statement to be processed is merged
with its
antecedent questioning statement.
166. The method of claim 165, wherein the antecedent questioning statement is
a questioning
statement that is temporally antecedent to the statement to be processed and
with a shortest
interval time to the statement to be processed.
167. The method of claim 166, wherein the antecedent answering statement is an
answering
statement that is temporally antecedent to the statement to be processed and
with the
shortest interval time to the statement to be processed.
168. The method of claim 167, wherein the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds the corresponding preset time threshold, judge there is no
relevancy
between the antecedent questioning statement and the questioning statement to
be
processed, so a new group is generated according to the questioning statement
to be
processed.
169. The method of claim 168, wherein the corresponding preset time threshold
is not
exceeded, judge there is relevancy between the antecedent questioning
statement and the
questioning statement to be processed, and the questioning statement to be
processed is
merged in the group to which the antecedent questioning statement corresponds.
170. The method of claim 169, wherein the preset sentence patterns include
statements that
guide the user to further respond to responses made by the customer service,
includes asks
in reply by the customer service to indefinite expressions of users, or the
sentence pattern
asking for essential information from the user.
171. The method of claim 170, wherein the antecedent answering statement is of
the preset
sentence pattern, the statement sent by the user after the antecedent
answering statement is
made in reply to this antecedent answering statement and is relevant to the
antecedent
answering statement, and the questioning statement to be processed is merged
with the
antecedent questioning statement.
172. The method of claim 171, wherein the antecedent answering statement of
the questioning
statement to be processed is of the preset sentence pattern, the questioning
statement to be
processed is merged with the antecedent questioning statement, and the
questioning
statement to be processed is merged in the group to which the antecedent
questioning
statement corresponds.
173. The method of claim 172, wherein the antecedent statement is a statement
that is
temporally antecedent to the statement to be processed and with the shortest
interval time
to the statement to be processed.
174. The method of claim 173, wherein splitting the dialogue into groups,
wherein result
includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;
plural questions correspond to a segment of reply, wherein the user asks
plural
questions and the customer service replies with a segment of words, is marked
as QQA;
and
plural questions correspond to plural replies, wherein several rounds of
communication
are carried out between the user and the customer service in a short time, is
marked as
QAQA.
175. The method of claim 174, wherein corresponding type is determined
according to
number of answering statements and questioning statements included in each
group, and is
processed according to corresponding processing rule.
176. The method of claim 175, wherein QA is a standard input form of an
algorithm, wherein
one standard question is only meant to express one question in the knowledge
base.
177. The method of claim 176, wherein an auxiliary algorithm is required
during splitting to
judge whether two segments of words are directed to one question or to two
questions.
178. The method of claim 177, wherein the auxiliary algorithm is a binary
classifier, wherein
inputs to the binary classifier are two statements.
179. The method of claim 178, wherein any model realizes binary questions.
180. The method of claim 179, wherein a BERT model, which predicts during the
process of pretraining whether two input statements are directed to context of
the same and single statement or to topics irrelevant to each other, serves as
the classifier, and fine-tuning training is performed.
181. The method of claim 180, wherein the posterior text segment indicates a
text segment
following and immediately adjacent to the text segment being processed.
182. The method of claim 181, wherein the classifier algorithm merges any text
segment judged
as pertaining to the same intent class as the posterior text segment or
pertaining to the
preset merging intent class as a chitchat class in the posterior text segment.
183. The method of claim 182, wherein text segments predicted to belong to the
same question
are merged into one questioning statement.
184. The method of claim 183, wherein the text segments predicted to belong to
different
questions are split into two different questioning statements.
185. The method of claim 184, wherein the groups of the QA type not belonging
to the same
and single question are converted to groups of the QQA type, wherein the
groups of the
QA type whose all text segments belong to the same and single question are
split into a
QA statement pair only includes one questioning statement and one answering
statement.
186. The method of claim 185, wherein to recognize to which circumstance a
group of the
QQA type specifically pertains, judge through a binary classification
algorithm.
187. The method of claim 186, wherein the questioning statement is split into
text segments,
and any text segment whose number of characters included is smaller than the
preset
number threshold or pertaining to the preset merging intent is directly merged
with the
antecedent questioning statement, or the questioning statement and the
antecedent
questioning statement are input together in the binary classification
algorithm to judge they
belong to the same question.
188. The method of claim 187, wherein it is judged any text segment and
corresponding
antecedent questioning statement belong to the same question, the text
statement and the
corresponding antecedent questioning statement are merged into one questioning
statement.
189. The method of claim 188, wherein it is recognized any text segment and
the
corresponding antecedent questioning statement do not belong to the same
question, the
questioning statement is split into new questioning statements.
190. The method of claim 189, wherein the statements remain are only one
answering statement
and one questioning statement, they are determined as the QA statement pair.
191. The method of claim 190, wherein the statements remain are more than one
questioning
statement and one answering statement, the questioning statements and the
answering
statements are combined in pairs to generate corresponding QA statement pairs.
192. The method of claim 191, wherein interaction between the user and the
customer service
in the short time is split into plural groups of QA statement pairs.
193. The method of claim 192, wherein the answering statements and the
questioning
statements are directly combined in pairs with respect to groups of the QAQA
type.
194. The method of claim 193, wherein clustering is to incorporate similar
questions together to
constitute a cluster.
195. The method of claim 194, wherein calculate text distance metrics amongst
the statement
pairs via a text matching algorithm and determine the statement pairs belong
to same
statement pair group according to the text distance metrics.
196. The method of claim 195, wherein the text matching algorithm is an
algorithm calculates
similarity degree of two texts.
197. The method of claim 196, wherein an unsupervised text matching algorithm,
word mover's
distance (WMD), is used.
198. The method of claim 197, wherein any clustering algorithm is applied to
determine
statement pairs belong to the same statement pair group.
199. The method of claim 198, wherein in all QA pairs, there are invalid QA
pairs caused by
imprecise splitting, and circumstance in which answers are not pertinent to
questions asked
due to negligence of the customer service, wherein invalid QA statement pairs
are
removed.
200. The method of claim 199, wherein filtration of the QA statement pairs is
decided by
matching degrees of questions and answers, wherein the QA statement pairs
whose
matching degrees between questioning statements and answering statements
satisfy a
preset condition remain, wherein the QA statement pairs whose matching degrees
between
questioning statements and answering statements do not satisfy are eliminated
and filtered
out.
201. The method of claim 200, wherein matching process is a text matching
process, wherein a
set of supervised algorithms are trained based on existing knowledge base data
to perform
similarity calculation.
202. The method of claim 201, wherein frequently asked questions have higher
priorities to be
maintained in the knowledge base.
203. The method of claim 202, wherein more important questions are
preferentially maintained
wherein some less valuable questions are neglected.
204. The method of claim 203, wherein frequencies by which questions are asked
are measured
by the number of questions under each cluster obtained.
205. The method of claim 204, wherein accuracy of answers is measured by
matching degrees
of questions and answers in a filtering process.
206. The method of claim 205, wherein the corresponding sorting weight is
derived by
normalizing two values and weighting and accumulating the two values.
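As an illustration only, the sorting weight of claims 204 to 206 could be sketched as below. The function names, the min-max normalization, and the `alpha` mixing factor are assumptions made for this sketch; the claims only recite normalizing two values (cluster question count and matching degree) and accumulating them with weights.

```python
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread, so the value carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(
    question_counts: List[int],
    matching_degrees: List[float],
    alpha: float = 0.5,  # hypothetical weight given to question frequency
) -> List[float]:
    """Return one sorting weight per statement pair group."""
    norm_counts = min_max([float(c) for c in question_counts])
    norm_match = min_max(matching_degrees)
    # Weighted accumulation of the two normalized values.
    return [alpha * c + (1.0 - alpha) * m
            for c, m in zip(norm_counts, norm_match)]
```

Groups can then be visited in descending weight order when the knowledge base is maintained.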
207. The method of claim 206, wherein corresponding statement pairs are
sequentially obtained
according to sorting weights during subsequent maintenance of the knowledge
base, and
screened and processed manually or by machine, and maintained in the knowledge
base.
208. An electronic equipment comprising:
one or more processors;
a memory, associated with the one or more processors and used for storing a
program
instruction wherein the program instruction is executed by the one or more
processors
configured to:
obtain a session record, wherein the session record includes at least two
statements, wherein the statements include questioning statements sent by
questioners and answering statements sent by answerers;
split the session record into corresponding groups according to a preset
splitting rule, wherein the groups include at least one questioning statement
and at least one answering statement;
determine a processing rule to which the groups correspond according to a
number of the questioning statements and a number of the answering
statements included in the groups, wherein the processing rule is based on the
number of questioning statements and the number of answering statements as
compared to a first preset threshold;
split the groups into corresponding statement pairs according to the
processing rule to which the groups correspond;
using, when the number of the answering statements included in the group
does not exceed the first preset threshold and the number of the questioning
statements as included exceeds the first preset threshold, a preset binary
classifier to predict whether the questioning statements as included and an
antecedent questioning statements of the questioning statements as included
belong to a same question; and
updating a knowledge base of a system according to the statement pairs.
209. The equipment of claim 208, wherein each statement has a corresponding
generation time.
210. The equipment of claim 209, wherein splitting the session record into
corresponding
groups according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of
each
statement;
judging, when the statement traversed is the questioning statement, whether a
traversed
questioning statement and an antecedent questioning statement of the traversed
questioning statement belong to same group according to a sentence pattern of
antecedent answering statement of the traversed questioning statement and/or
according
to an interval time to the antecedent questioning statement of the traversed
questioning
statement; and
determining, when the statement traversed is the answering statement, a
traversed
answering statement belongs to the group to which the antecedent questioning
statement
of the traversed answering statement corresponds.
211. The equipment of claim 210, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
splitting, when the number of the questioning statements included in the group
does not
exceed the first preset threshold, the questioning statements each into at
least two text
segments according to preset signs included in the questioning statements;
predicting two adjacent text segments belong to a same question by employing a
preset
binary classifier;
generating corresponding questioning statements respectively according to text
segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning
statements
as generated and the answering statements included in the group.
212. The equipment of claim 211, further comprises:
traversing the text segments, and merging traversed text segments with
corresponding
posterior text segments when number of characters of the traversed text
segments is
smaller than a second preset threshold.
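A minimal sketch of the segment handling in claims 211 and 212, under stated assumptions: the set of "preset signs" is taken to be common punctuation, the second preset threshold is represented by a hypothetical `min_chars` parameter, and a merged segment is joined with a space. None of these specifics come from the claims.

```python
import re
from typing import List


def split_segments(question: str, signs: str = ",.;?!") -> List[str]:
    """Split a questioning statement at the assumed preset signs."""
    parts = re.split("[" + re.escape(signs) + "]", question)
    return [p.strip() for p in parts if p.strip()]


def merge_short(segments: List[str], min_chars: int = 4) -> List[str]:
    """Merge each too-short segment into the segment that follows it."""
    merged: List[str] = []
    carry = ""
    for i, seg in enumerate(segments):
        current = (carry + " " + seg).strip()
        is_last = i == len(segments) - 1
        if len(current) < min_chars and not is_last:
            # Too short: hold it and prepend it to the posterior segment.
            carry = current
        else:
            merged.append(current)
            carry = ""
    return merged
```

For example, `merge_short(split_segments("Hi, where is my order? Thanks"))` folds the short greeting into the segment after it and leaves the remaining segments separate.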
213. The equipment of claim 212, further comprises:
merging the traversed text segments with corresponding posterior text segments
by
employing a preset classifier algorithm when the traversed text segments and
the
corresponding posterior text segments belong to a same intent class or when
the
traversed text segments belong to a preset merging intent class.
214. The equipment of claim 213, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
combining, when the numbers of the questioning statements and the answering
statements included in the group both exceed the first preset threshold, the
questioning
statements and the answering statements included in the group; and
generating the corresponding statement pairs.
215. The equipment of claim 214, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
merging, when there are the questioning statements belong to the same
question, the
questioning statements belong to the same question and generating the
corresponding
statement pairs according to all merged questioning statements and the
answering
statements; and
generating the corresponding statement pairs according to all the questioning
statements
and the answering statements included in the group, when there are no
questioning
statements belong to the same question.
216. The equipment of claim 215, wherein updating the knowledge base of the
system
according to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each
statement pair
group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm;
determining a weight to each statement pair group corresponds according to
corresponding matching degrees and number of questioning statements included
in the
statement pair groups; and
sequentially updating the knowledge base of the system according to the
weight to
which each statement pair group corresponds.
217. The equipment of claim 216, further comprises:
rectifying any wrong word included in the session record according to a preset
rectifying rule; and
performing a normalizing process on rectified session record.
218. The equipment of claim 217, further comprises:
recognizing the intent class to which each questioning statement included in
the session
record corresponds by using the preset classifier algorithm and eliminating
any
questioning statement to which a preset irrelevant intent class corresponds as
included
in the session record.
219. The equipment of claim 218, further comprises a process of analyzing and
mining dialogue
statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session
record;
and
rectifying any wrong word included in the session record according to a preset
rectifying rule.
220. The equipment of claim 219, further comprises:
performing a purification operation on all characters included in the session
record; and
recognizing a dialogue intent to which the questioning statement sent by each
user
corresponds by using the preset classifier algorithm.
221. The equipment of claim 220, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the
questioning
statements to be processed are traversed, and eliminating any questioning
statement to
be processed of the user does not conform to a preset condition;
determining a questioning statement to be processed is to be merged with the
antecedent
questioning statement of the questioning statement to be processed according
to a preset
merging rule;
eliminating any answering statement whose number of characters is smaller than
a
preset number threshold when the answering statements of the customer service
are
traversed, and determining any answering statement remaining after the
elimination as
an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering
statement when the antecedent statement of the answering statement to be
processed is
an answering statement of the customer service, and storing the merged
answering
statement to the group to which the antecedent questioning statement
corresponds;
merging the answering statement to be processed in the group to which the
questioning
statement corresponds when the antecedent statement of the answering statement
to be
processed is a questioning statement of the user.
222. The equipment of claim 221, further comprises:
splitting the questioning statements included in the group each into text
segments when
the group is of a QA type;
processing the text segments from front to back, and merging any text segment
whose
number of characters is smaller than the preset number threshold in a
posterior text
segment of this text segment; and/or
merging any text segment pertaining to the same intent class as the
corresponding
posterior text segment or pertaining to a preset merging intent class in the
posterior text
segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a
sliding
window, and predicting obtained text segments belong to a same and single
question by
means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging each
questioning statement and its antecedent questioning statement belong to the
same
question when the group is of a QQA type; and
combining all questioning statements and answering statements in pairs to
generate
corresponding QA statement pairs when the group is of a QAQA type.
223. The equipment of claim 222 further comprises:
clustering the statements by employing a preset clustering algorithm,
generating
statement pair groups, and determining the number of questioning statements
included
in each statement pair group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm; and
determining a weight to which each statement pair group corresponds according
to the
corresponding matching degrees and the numbers of questioning statements
included in
the statement pair groups.
224. The equipment of claim 223, wherein the session record is directed to
text statements,
principal wrong words are homonyms.
225. The equipment of claim 224, wherein the session record is directed to
speech statements,
the session record is firstly required to convert the speech statements into
the text
statements through speech recognition technique.
226. The equipment of claim 225, wherein language model and word frequency
features are
combined.
227. The equipment of claim 226, wherein corresponding rectifying rules are
provided for the
speech statements and the text statements respectively.
228. The equipment of claim 227, wherein wrong words are rectified according
to the
corresponding rectifying rules.
229. The equipment of claim 228, wherein the purification operation includes
removing
irrelevant characters including preset useless punctuations and preset stop
words,
recognizing irrelevant information contained in each text statement including
commodity
names and placenames and normalizing the irrelevant information to
corresponding preset
characters according to which the irrelevant information corresponds.
230. The equipment of claim 229, the session record of the user with customer
service within
one day is a segment of the dialogue.
231. The equipment of claim 230, wherein the session record is split into one
or more
dialogues, and the dialogues are split into groups.
232. The equipment of claim 231, wherein the user consults same type of
questions within a
preset period of time, wherein the customer service has replied, the user
consults different
questions next time in the dialogue with the customer service.
233. The equipment of claim 232, wherein to eliminate any questioning
statement with
eliminable intent and irrelevant to business whose number of characters is
smaller than a
preset number threshold or intent is judged by the preset classifier algorithm
as chitchat
intent.
234. The equipment of claim 233, wherein the interval time between the
antecedent questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds a corresponding preset time threshold and/or when the
sentence pattern
of the antecedent answering statement of the questioning statement to be
processed is a
preset sentence pattern, the questioning statement to be processed is merged
with its
antecedent questioning statement.
235. The equipment of claim 234, wherein the antecedent questioning statement
is a
questioning statement that is temporally antecedent to the statement to be
processed and
with a shortest interval time to the statement to be processed.
236. The equipment of claim 235, wherein the antecedent answering statement is
an answering
statement that is temporally antecedent to the statement to be processed and
with the
shortest interval time to the statement to be processed.
237. The equipment of claim 236, wherein the interval time between the
antecedent questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds the corresponding preset time threshold, judge there is no
relevancy
between the antecedent questioning statement and the questioning statement to
be
processed, so a new group is generated according to the questioning statement
to be
processed.
238. The equipment of claim 237, wherein the corresponding preset time
threshold is not
exceeded, judge there is relevancy between the antecedent questioning
statement and the
questioning statement to be processed, and the questioning statement to be
processed is
merged in the group to which the antecedent questioning statement corresponds.
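The time-threshold grouping of claims 237 and 238 might be sketched as follows. This is a simplified illustration: the `Statement` type, the `max_gap` value in seconds, and the choice to drop answers that precede any question are assumptions, and the sentence-pattern condition of the later claims is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    role: str    # "Q" for a questioning statement, "A" for an answering one
    time: float  # generation time, assumed to be in seconds
    text: str


def split_into_groups(statements: List[Statement],
                      max_gap: float = 60.0) -> List[List[Statement]]:
    """Group statements by the gap between consecutive questions."""
    groups: List[List[Statement]] = []
    last_q_time: Optional[float] = None
    for s in sorted(statements, key=lambda st: st.time):
        if s.role == "Q":
            # Gap above the threshold: no relevancy, so start a new group.
            if last_q_time is None or s.time - last_q_time > max_gap:
                groups.append([])
            groups[-1].append(s)
            last_q_time = s.time
        elif groups:
            # An answer joins the group of its antecedent question.
            groups[-1].append(s)
    return groups
```

With a 60-second threshold, questions at 0 s and 30 s share a group, while a question at 200 s opens a new one.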
239. The equipment of claim 238, wherein the preset sentence patterns include
statements that
guide the user to further respond to responses made by the customer service,
includes asks
in reply by the customer service to indefinite expressions of users, or the
sentence pattern
asking for essential information from the user.
240. The equipment of claim 239, wherein the antecedent answering statement is
of the preset
sentence pattern, the statement sent by the user after the antecedent
answering statement is
made in reply to this antecedent answering statement and is relevant to the
antecedent
answering statement, and the questioning statement to be processed is merged
with the
antecedent questioning statement.
241. The equipment of claim 240, wherein the antecedent answering statement of
the
questioning statement to be processed is of the preset sentence pattern, the
questioning
statement to be processed is merged with the antecedent questioning statement,
and the
questioning statement to be processed is merged in the group to which the
antecedent
questioning statement corresponds.
242. The equipment of claim 241, wherein the antecedent statement is a
statement that is
temporally antecedent to the statement to be processed and with the shortest
interval time
to the statement to be processed.
243. The equipment of claim 242, wherein splitting the dialogue into groups,
wherein result
includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;
plural questions correspond to a segment of reply, wherein the user asks
plural
questions and the customer service replies with a segment of words, is marked
as QQA;
and
plural questions correspond to plural replies, wherein several rounds of
communication
are carried out between the user and the customer service in a short time, is
marked as
QAQA.
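The three group types can be told apart by comparing the statement counts against the first preset threshold. The sketch below assumes a threshold of 1 and a hypothetical `group_type` helper; a group whose question count does not exceed the threshold is treated as QA, following the condition recited in claim 211.

```python
def group_type(num_questions: int, num_answers: int, threshold: int = 1) -> str:
    """Classify a group as QA, QQA, or QAQA from its statement counts."""
    if num_questions < 1 or num_answers < 1:
        raise ValueError("a group needs at least one question and one answer")
    many_questions = num_questions > threshold
    many_answers = num_answers > threshold
    if many_questions and many_answers:
        return "QAQA"  # plural questions, plural replies
    if many_questions:
        return "QQA"   # plural questions, a single segment of reply
    return "QA"        # question count does not exceed the threshold
```

The returned label can then select the processing rule applied to the group.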
244. The equipment of claim 243, wherein corresponding type is determined
according to
number of answering statements and questioning statements included in each
group, and is
processed according to corresponding processing rule.
245. The equipment of claim 244, wherein QA is a standard input form of an
algorithm, wherein
one standard question is only meant to express one question in the knowledge
base.
246. The equipment of claim 245, wherein an auxiliary algorithm is required
during splitting to
judge whether two segments of words are directed to one question or to two
questions.
247. The equipment of claim 246, wherein the auxiliary algorithm is a binary
classifier, wherein
inputs to the binary classifier are two statements.
248. The equipment of claim 247, wherein any model realizes binary questions.
249. The equipment of claim 248, wherein model BERT predicts during the
process of pretraining
whether input two statements are directed to context of the same and single
statement or
topics irrelevant to each other, serves as the classifier, and fine-tuning
training is
performed.
250. The equipment of claim 249, wherein the posterior text segment indicates
a text segment
following and immediately adjacent to the text segment being processed.
251. The equipment of claim 250, wherein the classifier algorithm merges any
text segment
judged as pertaining to the same intent class as the posterior text segment or
pertaining to
the preset merging intent class as a chitchat class in the posterior text
segment.
252. The equipment of claim 251, wherein text segments predicted to belong to
the same
question are merged into one questioning statement.
253. The equipment of claim 252, wherein the text segments predicted to belong
to different
questions are split into two different questioning statements.
254. The equipment of claim 253, wherein the groups of the QA type not
belonging to the same
and single question are converted to groups of the QQA type, wherein the
groups of the
QA type whose all text segments belong to the same and single question are
split into a
QA statement pair only includes one questioning statement and one answering
statement.
255. The equipment of claim 254, wherein to recognize to which circumstance a
group of the
QQA type specifically pertains, judge through a binary classification
algorithm.
256. The equipment of claim 255, wherein the questioning statement is split
into text segments,
and any text segment whose number of characters included is smaller than the
preset
number threshold or pertaining to the preset merging intent is directly merged
with the
antecedent questioning statement, or the questioning statement and the
antecedent
questioning statement are input together in the binary classification
algorithm to judge they
belong to the same question.
257. The equipment of claim 256, wherein it is judged any text segment and
corresponding
antecedent questioning statement belong to the same question, the text
statement and the
corresponding antecedent questioning statement are merged into one questioning
statement.
258. The equipment of claim 257, wherein it is recognized any text segment and
the
corresponding antecedent questioning statement do not belong to the same
question, the
questioning statement is split into new questioning statements.
259. The equipment of claim 258, wherein the statements remain are only one
answering
statement and one questioning statement, they are determined as the QA
statement pair.
260. The equipment of claim 259, wherein the statements remain are more than
one questioning
statement and one answering statement, the questioning statements and the
answering
statements are combined in pairs to generate corresponding QA statement pairs.
261. The equipment of claim 260, wherein interaction between the user and the
customer
service in the short time is split into plural groups of QA statement pairs.
262. The equipment of claim 261, wherein the answering statements and the
questioning
statements are directly combined in pairs with respect to groups of the QAQA
type.
263. The equipment of claim 262, wherein clustering is to incorporate similar
questions
together to constitute a cluster.
264. The equipment of claim 263, wherein calculate text distance metrics
amongst the
statement pairs via a text matching algorithm and determine the statement
pairs belong to
same statement pair group according to the text distance metrics.
265. The equipment of claim 264, wherein the text matching algorithm is an
algorithm
calculates similarity degree of two texts.
266. The equipment of claim 265, wherein an unsupervised text matching
algorithm, word
mover's distance (WMD), is used.
267. The equipment of claim 266, wherein any clustering algorithm is applied
to determine
statement pairs belong to the same statement pair group.
268. The equipment of claim 267, wherein in all QA pairs, there are invalid QA
pairs caused by
imprecise splitting, and circumstance in which answers are not pertinent to
questions asked
due to negligence of the customer service, wherein invalid QA statement pairs
are
removed.
269. The equipment of claim 268, wherein filtration of the QA statement pairs
is decided by
matching degrees of questions and answers, wherein the QA statement pairs
whose
matching degrees between questioning statements and answering statements
satisfy a
preset condition remain, wherein the QA statement pairs whose matching degrees
between
questioning statements and answering statements do not satisfy are eliminated
and filtered
out.
270. The equipment of claim 269, wherein matching process is a text matching
process,
wherein a set of supervised algorithms are trained based on existing knowledge
base data
to perform similarity calculation.
271. The equipment of claim 270, wherein frequently asked questions have
higher priorities to
be maintained in the knowledge base.
272. The equipment of claim 271, wherein more important questions are
preferentially
maintained wherein some less valuable questions are neglected.
273. The equipment of claim 272, wherein frequencies by which questions are
asked are
measured by the number of questions under each cluster obtained.
274. The equipment of claim 273, wherein accuracy of answers is measured by
matching
degrees of questions and answers in a filtering process.
275. The equipment of claim 274, wherein the corresponding sorting weight is
derived by
normalizing two values and weighting and accumulating the two values.
276. The equipment of claim 275, wherein corresponding statement pairs are
sequentially
obtained according to sorting weights during subsequent maintenance of the
knowledge
base, and screened and processed manually or by machine, and maintained in the
knowledge base.
277. A computer readable physical memory having stored thereon, computer-executable
instructions, when executed by a computer, the computer is configured to:
obtain a session record, wherein the session record includes at least two
statements,
wherein the statements include questioning statements sent by questioners and
answering statements sent by answerers;
split the session record into corresponding groups according to a preset
splitting rule,
wherein the groups include at least one questioning statement and at least one
answering statement;
determine a processing rule to which the groups correspond according to a
number of
the questioning statements and a number of the answering statements included
in the
groups, wherein the processing rule is based on the number of questioning
statements
and the number of answering statements as compared to a first preset
threshold;
split the groups into corresponding statement pairs according to the
processing rule to
which the groups correspond;
using, when the number of the answering statements included in the group does
not
exceed the first preset threshold and the number of the questioning statements
as
included exceeds the first preset threshold, a preset binary classifier to
predict whether
the questioning statements as included and an antecedent questioning
statements of the
questioning statements as included belong to a same question; and
updating a knowledge base of a system according to the statement pairs.
278. The memory of claim 277, wherein each statement has a corresponding
generation time.
279. The memory of claim 278, wherein splitting the session record into
corresponding groups
according to the preset splitting rule comprises:
sequentially traversing the session record according to generation time of
each
statement;
judging, when the statement traversed is the questioning statement, whether a
traversed
questioning statement and an antecedent questioning statement of the traversed
questioning statement belong to same group according to a sentence pattern of
antecedent answering statement of the traversed questioning statement and/or
according
to an interval time to the antecedent questioning statement of the traversed
questioning
statement; and
determining, when the statement traversed is the answering statement, a
traversed
answering statement belongs to the group to which the antecedent questioning
statement
of the traversed answering statement corresponds.
280. The memory of claim 279, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
splitting, when the number of the questioning statements included in the group
does not
exceed the first preset threshold, the questioning statements each into at
least two text
segments according to preset signs included in the questioning statements;
predicting two adjacent text segments belong to a same question by employing a
preset
binary classifier;
generating corresponding questioning statements respectively according to text
segments predicted to belong to the same question; and
generating the corresponding statement pairs according to all the questioning
statements
as generated and the answering statements included in the group.
281. The memory of claim 280, further comprises:
traversing the text segments, and merging traversed text segments with
corresponding
posterior text segments when number of characters of the traversed text
segments is
smaller than a second preset threshold.
282. The memory of claim 281, further comprises:
merging the traversed text segments with corresponding posterior text segments
by
employing a preset classifier algorithm when the traversed text segments and
the
corresponding posterior text segments belong to a same intent class or when
the
traversed text segments belong to a preset merging intent class.
283. The memory of claim 282, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
combining, when the numbers of the questioning statements and the answering
statements included in the group both exceed the first preset threshold, the
questioning
statements and the answering statements included in the group; and
generating the corresponding statement pairs.
284. The memory of claim 283, wherein splitting the groups into the
corresponding statement
pairs according to the processing rule to which the groups correspond
comprises:
merging, when there are the questioning statements belong to the same
question, the
questioning statements belong to the same question and generating the
corresponding
statement pairs according to all merged questioning statements and the
answering
statements; and
generating the corresponding statement pairs according to all the questioning
statements
and the answering statements included in the group, when there are no
questioning
statements belong to the same question.
285. The memory of claim 284, wherein updating the knowledge base of the
system according
to the statement pairs comprises:
clustering the statement pairs by using a preset clustering algorithm;
generating statement pair groups;
determining the number of the questioning statements included in each
statement pair
group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm;
determining a weight to which each statement pair group corresponds according to
the corresponding matching degrees and the number of questioning statements included
in the
statement pair groups; and
sequentially updating the knowledge base of the system according to the
weight to
which each statement pair group corresponds.
286. The memory of claim 285, further comprises:
rectifying any wrong word included in the session record according to a preset
rectifying rule; and
performing a normalizing process on rectified session record.
287. The memory of claim 286, further comprises:
recognizing the intent class to which each questioning statement included in
the session
record corresponds by using the preset classifier algorithm and eliminating
any
questioning statement to which a preset irrelevant intent class corresponds as
included
in the session record.
288. The memory of claim 287, further comprises a process of analyzing and
mining dialogue
statements between customer service and a user comprising:
obtaining a session record to be processed and preprocessing obtained session
record;
and
rectifying any wrong word included in the session record according to a preset
rectifying rule.
289. The memory of claim 288, further comprises:
performing a purification operation on all characters included in the session
record; and
recognizing a dialogue intent to which the questioning statement sent by each
user
corresponds by using the preset classifier algorithm.
290. The memory of claim 289, further comprises:
traversing a dialogue according to a temporal sequence of generation times;
screening questioning statements to be processed of the user when the
questioning
statements to be processed are traversed, and eliminating any questioning
statement to
be processed of the user that does not conform to a preset condition;
determining whether a questioning statement to be processed is to be merged with the
antecedent
questioning statement of the questioning statement to be processed according
to a preset
merging rule;
eliminating any answering statement whose number of characters is smaller than
a
preset number threshold when the answering statements of the customer service
are
traversed, and determining any answering statement remaining after the
elimination as
an answering statement to be processed;
merging the answering statement to be processed with an antecedent answering
statement when the antecedent statement of the answering statement to be
processed is
an answering statement of the customer service, and storing the merged
answering
statement to the group to which the antecedent questioning statement
corresponds; and
merging the answering statement to be processed in the group to which the
questioning
statement corresponds when the antecedent statement of the answering statement
to be
processed is a questioning statement of the user.
291. The memory of claim 290, further comprises:
splitting the questioning statements included in the group each into text
segments when
the group is of a QA type;
processing the text segments from front to back, and merging any text segment
whose
number of characters is smaller than the preset number threshold in a
posterior text
segment of this text segment; and/or
merging any text segment pertaining to the same intent class as the
corresponding
posterior text segment or pertaining to a preset merging intent class in the
posterior text
segment of this text segment;
sequentially obtaining a preset number of adjacent text segments through a
sliding
window, and predicting whether obtained text segments belong to a same and single
question by
means of a binary classifier algorithm;
traversing the questioning statements included in the group and judging whether each
questioning statement and its antecedent questioning statement belong to the
same
question when the group is of a QQA type; and
combining all questioning statements and answering statements in pairs to
generate
corresponding QA statement pairs when the group is of a QAQA type.
292. The memory of claim 291, further comprises:
clustering the statements by employing a preset clustering algorithm,
generating
statement pair groups, and determining the number of questioning statements
included
in each statement pair group;
determining matching degrees between the questioning statements and the
answering
statements included in the statement pair groups according to a preset
similarity
algorithm; and
determining a weight to which each statement pair group corresponds according
to the
corresponding matching degrees and the numbers of questioning statements
included in
the statement pair groups.
293. The memory of claim 292, wherein, when the session record is directed to text
statements,
the principal wrong words are homonyms.
294. The memory of claim 293, wherein, when the session record is directed to
speech statements,
the speech statements are first required to be converted into
text
statements through a speech recognition technique.
295. The memory of claim 294, wherein language model and word frequency
features are
combined.
296. The memory of claim 295, wherein corresponding rectifying rules are
provided for the
speech statements and the text statements respectively.
297. The memory of claim 296, wherein wrong words are rectified according to
the
corresponding rectifying rules.
298. The memory of claim 297, wherein the purification operation includes
removing irrelevant
characters including preset useless punctuations and preset stop words,
recognizing
irrelevant information contained in each text statement including commodity
names and
placenames and normalizing the irrelevant information to corresponding preset
characters
according to which the irrelevant information corresponds.
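Illustrative note (not part of the claims): a minimal sketch of such a purification operation is given below. The function name `purify`, the default set of "useless" punctuation, and the placeholder strings are assumptions. A plain dictionary lookup stands in for whatever recognizer is actually used to find commodity names and placenames.

```python
def purify(text, stop_words, entities, useless_punct="!?,.;:~"):
    """Normalize known entities, drop preset punctuation, and remove stop words."""
    # Replace each recognized entity (e.g. commodity name, placename) with
    # the preset placeholder for its type.
    for surface, placeholder in entities.items():
        text = text.replace(surface, placeholder)
    # Turn every preset useless punctuation mark into a space.
    text = text.translate(str.maketrans({ch: " " for ch in useless_punct}))
    # Drop preset stop words (case-insensitive) and collapse whitespace.
    return " ".join(t for t in text.split() if t.lower() not in stop_words)


if __name__ == "__main__":
    stop = {"hello", "is", "the", "to"}
    ents = {"iPhone 15": "PRODUCT", "Nanjing": "PLACE"}
    print(purify("Hello, is the iPhone 15 shipped to Nanjing??", stop, ents))
```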
299. The memory of claim 298, wherein the session record of the user with customer
service within one
day is a segment of the dialogue.
300. The memory of claim 299, wherein the session record is split into one or
more dialogues,
and the dialogues are split into groups.
301. The memory of claim 300, wherein the user consults same type of questions
within a
preset period of time, wherein the customer service has replied, the user
consults different
questions next time in the dialogue with the customer service.
302. The memory of claim 301, wherein to eliminate any questioning statement
with eliminable
intent and irrelevant to business whose number of characters is smaller than a
preset
number threshold or intent is judged by the preset classifier algorithm as
chitchat intent.
303. The memory of claim 302, wherein the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds a corresponding preset time threshold and/or when the
sentence pattern
of the antecedent answering statement of the questioning statement to be
processed is a
preset sentence pattern, the questioning statement to be processed is merged
with its
antecedent questioning statement.
304. The memory of claim 303, wherein the antecedent questioning statement is
a questioning
statement that is temporally antecedent to the statement to be processed and
with a shortest
interval time to the statement to be processed.
305. The memory of claim 304, wherein the antecedent answering statement is an
answering
statement that is temporally antecedent to the statement to be processed and
with the
shortest interval time to the statement to be processed.
306. The memory of claim 305, wherein, when the interval time between the antecedent
questioning
statement of the questioning statement to be processed and the questioning
statement to be
processed exceeds the corresponding preset time threshold, it is judged that there is no
relevancy
between the antecedent questioning statement and the questioning statement to
be
processed, so a new group is generated according to the questioning statement
to be
processed.
307. The memory of claim 306, wherein, when the corresponding preset time threshold
is not
exceeded, it is judged that there is relevancy between the antecedent questioning
statement and the
questioning statement to be processed, and the questioning statement to be
processed is
merged in the group to which the antecedent questioning statement corresponds.
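Illustrative note (not part of the claims): the time-threshold grouping recited in claims 306 and 307, together with the handling of answering statements in claim 290, could be sketched as below. The `Statement` and `Group` types, the `max_gap` parameter (standing in for the preset time threshold), and the space used to join consecutive answers are assumptions. The sentence-pattern rule and the elimination of short or irrelevant statements are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    role: str    # "Q" = questioning statement (user), "A" = answering statement
    time: float  # generation time, e.g. seconds since the dialogue started
    text: str


@dataclass
class Group:
    questions: list = field(default_factory=list)
    answers: list = field(default_factory=list)


def split_into_groups(statements, max_gap):
    """Traverse statements by generation time and split them into groups."""
    groups = []
    last_question_time = None
    previous_role = None
    for st in sorted(statements, key=lambda s: s.time):
        if st.role == "Q":
            # Gap to the antecedent question exceeds the threshold: new group.
            if last_question_time is None or st.time - last_question_time > max_gap:
                groups.append(Group())
            groups[-1].questions.append(st.text)
            last_question_time = st.time
        elif groups:
            if previous_role == "A":
                # Antecedent statement is also an answer: merge the two answers.
                groups[-1].answers[-1] += " " + st.text
            else:
                groups[-1].answers.append(st.text)
        previous_role = st.role
    return groups
```

Answers that arrive before any question are skipped in this sketch, since there is no group to attach them to.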
308. The memory of claim 307, wherein the preset sentence patterns include
statements that
guide the user to further respond to responses made by the customer service,
including questions asked
in reply by the customer service to indefinite expressions of users, or the
sentence pattern
asking for essential information from the user.
309. The memory of claim 308, wherein the antecedent answering statement is of
the preset
sentence pattern, the statement sent by the user after the antecedent
answering statement is
made in reply to this antecedent answering statement and is relevant to the
antecedent
answering statement, and the questioning statement to be processed is merged
with the
antecedent questioning statement.
310. The memory of claim 309, wherein the antecedent answering statement of
the questioning
statement to be processed is of the preset sentence pattern, the questioning
statement to be
processed is merged with the antecedent questioning statement, and the
questioning
statement to be processed is merged in the group to which the antecedent
questioning
statement corresponds.
311. The memory of claim 310, wherein the antecedent statement is a statement
that is
temporally antecedent to the statement to be processed and with the shortest
interval time
to the statement to be processed.
312. The memory of claim 311, wherein splitting the dialogue into groups,
wherein result
includes three types of groups comprising:
one question corresponds to a segment of reply, is marked as QA;
plural questions correspond to a segment of reply, wherein the user asks
plural
questions and the customer service replies with a segment of words, is marked
as QQA;
and
plural questions correspond to plural replies, wherein several rounds of
communication
are carried out between the user and the customer service in a short time, is
marked as
QAQA.
313. The memory of claim 312, wherein corresponding type is determined
according to
number of answering statements and questioning statements included in each
group, and is
processed according to corresponding processing rule.
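Illustrative note (not part of the claims): determining the group type from the statement counts could look roughly like the sketch below. The function name `group_type` and the default `threshold=1` are assumptions, the latter chosen because claim 312 distinguishes "one" from "plural". The case of one question with several answers is not named in the claims, so the sketch rejects it rather than guessing.

```python
def group_type(num_questions, num_answers, threshold=1):
    """Return "QA", "QQA" or "QAQA" for a group, given its statement counts."""
    if num_questions < 1 or num_answers < 1:
        raise ValueError("a group needs at least one question and one answer")
    many_questions = num_questions > threshold
    many_answers = num_answers > threshold
    if many_questions and many_answers:
        return "QAQA"  # plural questions, plural replies
    if many_questions:
        return "QQA"   # plural questions, one segment of reply
    if not many_answers:
        return "QA"    # one question, one segment of reply
    # One question with several answers is not covered by the three types.
    raise ValueError("unclassified group: one question, several answers")


if __name__ == "__main__":
    for counts in [(1, 1), (3, 1), (2, 2)]:
        print(counts, group_type(*counts))
```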
314. The memory of claim 313, wherein QA is a standard input form of an
algorithm, wherein
one standard question is only meant to express one question in the knowledge
base.
315. The memory of claim 314, wherein an auxiliary algorithm is required
during splitting to
judge whether two segments of words are directed to one question or to two
questions.
316. The memory of claim 315, wherein the auxiliary algorithm is a binary
classifier, wherein
inputs to the binary classifier are two statements.
317. The memory of claim 316, wherein any model realizes binary questions.
318. The memory of claim 317, wherein the BERT model, which predicts during the process
of pretraining
whether two input statements are directed to context of the same and single
statement or
topics irrelevant to each other, serves as the classifier, and fine-tuning
training is
performed.
319. The memory of claim 318, wherein the posterior text segment indicates a
text segment
following and immediately adjacent to the text segment being processed.
320. The memory of claim 319, wherein the classifier algorithm merges any text
segment
judged as pertaining to the same intent class as the posterior text segment
or pertaining to
the preset merging intent class as a chitchat class in the posterior text
segment.
321. The memory of claim 320, wherein text segments predicted to belong to the
same question
are merged into one questioning statement.
322. The memory of claim 321, wherein the text segments predicted to belong to
different
questions are split into two different questioning statements.
323. The memory of claim 322, wherein the groups of the QA type not belonging
to the same
and single question are converted to groups of the QQA type, wherein the
groups of the
QA type whose all text segments belong to the same and single question are
split into a
QA statement pair that only includes one questioning statement and one answering
statement.
324. The memory of claim 323, wherein to recognize to which circumstance a
group of the
QQA type specifically pertains, judge through a binary classification
algorithm.
325. The memory of claim 324, wherein the questioning statement is split into
text segments,
and any text segment whose number of characters included is smaller than the
preset
number threshold or pertaining to the preset merging intent is directly merged
with the
antecedent questioning statement, or the questioning statement and the
antecedent
questioning statement are input together in the binary classification
algorithm to judge whether they
belong to the same question.
326. The memory of claim 325, wherein, when it is judged that any text segment and the
corresponding
antecedent questioning statement belong to the same question, the text
statement and the
corresponding antecedent questioning statement are merged into one questioning
statement.
327. The memory of claim 326, wherein, when it is recognized that any text segment and
the
corresponding antecedent questioning statement do not belong to the same
question, the
questioning statement is split into new questioning statements.
328. The memory of claim 327, wherein, when the statements that remain are only one
answering
statement and one questioning statement, they are determined as the QA
statement pair.
329. The memory of claim 328, wherein, when the statements that remain are more than one
questioning
statement and one answering statement, the questioning statements and the
answering
statements are combined in pairs to generate corresponding QA statement pairs.
330. The memory of claim 329, wherein interaction between the user and the
customer service
in the short time is split into plural groups of QA statement pairs.
331. The memory of claim 330, wherein the answering statements and the
questioning
statements are directly combined in pairs with respect to groups of the QAQA
type.
332. The memory of claim 331, wherein clustering is to incorporate similar
questions together
to constitute a cluster.
333. The memory of claim 332, wherein text distance metrics amongst
the statement
pairs are calculated via a text matching algorithm, and the statement pairs belonging
to the same
statement pair group are determined according to the text distance metrics.
334. The memory of claim 333, wherein the text matching algorithm is an
algorithm that calculates
the similarity degree of two texts.
335. The memory of claim 334, wherein an unsupervised text matching algorithm,
word
mover's distance (WMD), is used.
336. The memory of claim 335, wherein any clustering algorithm is applied to
determine
statement pairs belonging to the same statement pair group.
337. The memory of claim 336, wherein in all QA pairs, there are invalid QA
pairs caused by
imprecise splitting, and circumstance in which answers are not pertinent to
questions asked
due to negligence of the customer service, wherein invalid QA statement pairs
are
removed.
338. The memory of claim 337, wherein filtration of the QA statement pairs is
decided by
matching degrees of questions and answers, wherein the QA statement pairs
whose
matching degrees between questioning statements and answering statements
satisfy a
preset condition remain, wherein the QA statement pairs whose matching degrees
between
questioning statements and answering statements do not satisfy are eliminated
and filtered
out.
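Illustrative note (not part of the claims): combining questioning and answering statements in pairs (claim 331) and then filtering the pairs by matching degree (claim 338) could be sketched as below. Reading "combined in pairs" as the full cross product is one possible interpretation. The token-overlap score `jaccard` is only a toy stand-in for the trained matching algorithm, and the `min_score` cut-off is a hypothetical preset condition.

```python
from itertools import product


def jaccard(question, answer):
    """Toy matching degree: token overlap between question and answer."""
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


def candidate_pairs(questions, answers):
    """Combine every questioning statement with every answering statement."""
    return list(product(questions, answers))


def filter_pairs(pairs, score=jaccard, min_score=0.2):
    """Keep only the pairs whose matching degree satisfies the preset condition."""
    return [(q, a) for q, a in pairs if score(q, a) >= min_score]


if __name__ == "__main__":
    qs = ["when will my order ship", "how do I get a refund"]
    ans = ["your order will ship tomorrow", "a refund takes five days"]
    for pair in filter_pairs(candidate_pairs(qs, ans)):
        print(pair)
```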
339. The memory of claim 338, wherein matching process is a text matching
process, wherein a
set of supervised algorithms are trained based on existing knowledge base data
to perform
similarity calculation.
340. The memory of claim 339, wherein frequently asked questions have higher
priorities to be
maintained in the knowledge base.
341. The memory of claim 340, wherein more important questions are
preferentially maintained
wherein some less valuable questions are neglected.
342. The memory of claim 341, wherein frequencies by which questions are asked
are
measured by the number of questions under each cluster obtained.
343. The memory of claim 342, wherein accuracy of answers is measured by
matching
degrees of questions and answers in a filtering process.
344. The memory of claim 343, wherein the corresponding sorting weight is
derived by
normalizing two values and weighting and accumulating the two values.
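Illustrative note (not part of the claims): deriving a sorting weight by normalizing the two values and accumulating them with weights could be sketched as follows. Min-max normalization and the mixing coefficient `alpha` are assumptions, since the claims do not specify the normalization or the weighting coefficients.

```python
def min_max(values):
    """Scale values to [0, 1]; all-equal inputs map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sorting_weights(question_counts, matching_degrees, alpha=0.5):
    """Weighted sum of normalized cluster size and normalized matching degree."""
    freq = min_max(question_counts)    # how often the question is asked
    match = min_max(matching_degrees)  # how well the answer fits the question
    return [alpha * f + (1 - alpha) * m for f, m in zip(freq, match)]


if __name__ == "__main__":
    weights = sorting_weights([10, 2, 6], [0.2, 0.8, 0.8])
    # Indices of statement pair groups, highest weight first.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    print(weights, order)
```

Statement pair groups can then be taken in descending order of weight when the knowledge base is maintained.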
345. The memory of claim 344, wherein corresponding statement pairs are
sequentially
obtained according to sorting weights during subsequent maintenance of the
knowledge
base, and screened and processed manually or by machine, and maintained in the
knowledge base.