Patent 3150183 Summary

(12) Patent: (11) CA 3150183
(54) English Title: FLINK STREAMING PROCESSING ENGINE METHOD AND DEVICE FOR REAL-TIME RECOMMENDATION AND COMPUTER EQUIPMENT
(54) French Title: METHODE ET DISPOSITIF DE MOTEUR DE TRAITEMENT DE DIFFUSION FLINK POUR LA RECOMMANDATION EN TEMPS REEL ET MATERIEL INFORMATIQUE
Status: Granted and Issued
Bibliographic Data
(51) International Patent Classification (IPC):
  • H04L 65/60 (2022.01)
(72) Inventors :
  • HE, XIAOMING (China)
  • ZHOU, RUI (China)
(73) Owners :
  • 10353744 CANADA LTD.
(71) Applicants :
  • 10353744 CANADA LTD. (Canada)
(74) Agent: JAMES W. HINTON
(74) Associate agent:
(45) Issued: 2024-02-27
(22) Filed Date: 2022-02-25
(41) Open to Public Inspection: 2022-08-25
Examination requested: 2022-09-16
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
202110215212.1 (China) 2021-02-25

Abstracts

English Abstract

Pertaining to the field of big data recommendation technology, the present application discloses a Flink stream processing engine method for real-time recommendation, and corresponding device and computer equipment. The method comprises: performing short-term feature calculation on a log in a message queue according to a calculation model file, and storing short-term features obtained by the calculation in a distributed storage layer, wherein the distributed storage layer stores therein long-term features obtained in advance; reading a long-term feature and a short-term feature of a to-be-recommended user from the distributed storage layer, when an event is received from the message queue; performing real-time prediction according to a prediction model file, the long-term feature and the short-term feature of the to-be-recommended user, to obtain a preference prediction result of the to-be-recommended user; and writing the preference prediction result of the to-be-recommended user in the distributed storage layer.


French Abstract

Concernant le domaine de la technologie de recommandation des mégadonnées, la présente demande décrit une méthode du moteur de traitement de flux de Flink pour des recommandations en temps réel, ainsi qu'un dispositif et un matériel informatique connexes. La méthode comprend le fait d'effectuer un calcul de caractéristique à court terme sur une connexion à une file de messages en fonction d'un dossier de modèles de calcul et le stockage des caractéristiques à court terme obtenues par le calcul dans une couche de stockage distribué, dans lequel la couche de stockage distribué stocke les caractéristiques à long terme obtenues préalablement et une caractéristique à court terme d'un utilisateur recommandé potentiel de la couche de stockage distribué. La méthode comprend également le fait d'effectuer une prédiction en temps réel en fonction d'un dossier de modèle de prédiction et des caractéristiques à long terme et à court terme de l'utilisateur recommandé potentiel, dans le but d'obtenir d'un résultat de prédiction de préférence de l'utilisateur recommandé potentiel, lors de la réception d'un événement de la file de messages, ainsi que l'écriture du résultat de prédiction de préférence de l'utilisateur recommandé potentiel dans la couche de stockage distribué.

Claims

Note: Claims are shown in the official language in which they were submitted.


Claims:
1. A Flink stream processing engine device for real-time recommendation, comprising:
an algorithm calculating module configured to perform short-term feature calculation on a log in a message queue according to a calculation model file;
a data pushing module configured to store short-term features obtained by the calculation in a distributed storage layer, wherein the distributed storage layer stores therein long-term features obtained in advance;
a feature enquiring module configured to, on receipt of an event from the message queue, read a long-term feature and a short-term feature of a to-be-recommended user from the distributed storage layer;
the algorithm calculating module further configured to perform real-time prediction according to a prediction model file, the long-term feature and the short-term feature of the to-be-recommended user, to obtain a preference prediction result of the to-be-recommended user; and
the data pushing module further configured to write the preference prediction result of the to-be-recommended user in the distributed storage layer.
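The data flow recited in claim 1 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the claimed implementation: the dictionary standing in for the distributed storage layer, the click-count feature, the weights, and every function name are hypothetical stand-ins for the Flink job, the calculation model file, and the prediction model file.

```python
from collections import Counter

# Hypothetical stand-in for the distributed storage layer.
# Long-term features are assumed to have been stored in advance.
storage = {
    "long_term": {"u1": {"avg_monthly_orders": 4.0}},
    "short_term": {},
    "predictions": {},
}


def compute_short_term_features(log_records):
    """Stand-in for the calculation model file: count clicks per user."""
    clicks = Counter(r["user"] for r in log_records if r["action"] == "click")
    return {user: {"recent_clicks": n} for user, n in clicks.items()}


def predict_preference(long_term, short_term):
    """Stand-in for the prediction model file: an arbitrary weighted sum."""
    return (0.1 * long_term.get("avg_monthly_orders", 0.0)
            + 0.05 * short_term.get("recent_clicks", 0))


def on_log_batch(log_records):
    # Algorithm calculating module computes; data pushing module stores.
    storage["short_term"].update(compute_short_term_features(log_records))


def on_event(user):
    # Feature enquiring module: read both kinds of feature for the user.
    long_term = storage["long_term"].get(user, {})
    short_term = storage["short_term"].get(user, {})
    score = predict_preference(long_term, short_term)
    # Data pushing module: write the prediction result back to storage.
    storage["predictions"][user] = score
    return score


on_log_batch([
    {"user": "u1", "action": "click"},
    {"user": "u1", "action": "click"},
    {"user": "u1", "action": "share"},
])
result = on_event("u1")
```

In a real deployment the two handlers would presumably be driven by the message queue rather than called directly; the direct calls here only keep the sketch self-contained.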
2. The device of claim 1 further comprising a message queue connecting
module configured to
obtain data from the message queue.
3. The device of claim 2 wherein the message queue stores log data.
4. The device of claim 3 wherein the log data includes behavior data
generated by user
behavior operations within a preset time period.
5. The device of claim 4 wherein the behavior operations include at least
one of instance
browsing, clicking, listing as favorite, sharing a commodity object operated
by the user.
Date Recue/Date Received 2023-12-05

6. The device of any one of claims 4 or 5 wherein the preset time period is
set according to
practical requirements.
7. The device of claim 6 wherein the distributed storage layer is an NOSQL
storage layer
which includes a cache component and a persistence component.
8. The device of claim 7 wherein the cache component is a Redis component
which has a
distributed, high-reading, and high-writing, small storage capacity, and
stores data within a
period of time.
9. The device of any one of claims 7 or 8 wherein the persistence component
is an Hbase
component which has a distributed, low-reading, and high-writing, large
storage capacity,
and column-extensible properties, and stores data beyond a period of time.
10. The device of any one of claims 7 to 9 wherein the cache component and the
persistence
component can both be used to store the long-term features and the short-term
features.
11. The device of any one of claims 1 to 10 wherein the long-term features
represent long-term
preferences of the user with respect to a commodity.
12. The device of claim 11 wherein the long-term preferences of the user
include at least one of
user basic features, user long-term behavior features and other features.
13. The device of claim 12 wherein user basic features include at least one of
age, gender,
profession, place of residence, and preference of the user.
14. The device of any one of claims 11 or 12 wherein the user long-term
behaviour features
include shopping behaviour statistic values at an e-commerce platform within a
period of
time.
15. The device of claim 10 wherein the short-term features represent short-
term preferences of
the user with respect to a commodity.
16. The device of claim 15 wherein the short-term preferences are obtained
from behavior
operation data generated by the user within the preset time period.
17. The device of any one of claims 1 to 16 further comprising a first machine
learning
algorithm to train primary data of a sample user to generate a short-term
model and to obtain
the calculation model file.
18. The device of claim 17 wherein the first machine learning algorithm is
embodied as one of
an iterative decision tree, a logic regression, and a support vector machine.
19. The device of any one of claims 1 to 18 wherein the to-be-recommended user
is determined
based on a preset user identification.
20. The device of any one of claims 1 to 19 further comprising a second
machine learning
algorithm to generate the prediction model file.
21. The device of claim 20 wherein the second machine learning algorithm is
embodied as one
of an iterative decision tree, a logic regression, and a support vector
machine.
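As an illustration of how a prediction model of the kind named in claims 18 and 21 might combine the two feature sets, the sketch below assumes that "logic regression" refers to logistic regression. The feature names, weights, and bias are hypothetical and are not taken from the patent.

```python
import math


def predict_preference(features, weights, bias=0.0):
    """Logistic-regression score in (0, 1) for a merged feature dictionary."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical long-term and short-term features of one user.
long_term = {"avg_monthly_orders": 4.0}
short_term = {"recent_clicks": 2.0}

# Hypothetical trained parameters, as would be carried by a prediction model file.
weights = {"avg_monthly_orders": 0.25, "recent_clicks": 0.5}

score = predict_preference({**long_term, **short_term}, weights, bias=-2.0)
```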
22. The device of any one of claims 1 to 21, further comprising:
a configuration synchronization module configured to obtain a configuration
file that
contains a configuration parameter; and
a model loading module configured to load from a model repository a model file
well
trained in advance to which the configuration parameter corresponds, and
sending the
model file to various distributed nodes of the Flink stream processing engine,
wherein the
model file includes the calculation model file and the prediction model file.
23. The device according to Claim 22, wherein the configuration synchronizing
module is
further configured to:
synchronize at least one configuration file from a distributed configuration
cluster, and
loading the at least one configuration file into a memory.
24. The device according to Claim 23, wherein the model loading module is
further configured
to:
read a preconfigured model identification parameter from the configuration
parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
load from the model repository a model file to which the model identification
parameter
corresponds.
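The lookup described in claim 24 can be sketched as resolving a (namespace, model name, model version) triple to a location in the model repository. The configuration key names and the directory layout below are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelId:
    """Model identification parameter: namespace, model name, model version."""
    namespace: str
    name: str
    version: str


def model_id_from_config(config):
    # The key names are hypothetical.
    return ModelId(config["model.namespace"], config["model.name"], config["model.version"])


def repository_path(root, model_id, filename):
    # Hypothetical layout: <root>/<namespace>/<name>/<version>/<filename>
    return str(PurePosixPath(root) / model_id.namespace / model_id.name
               / model_id.version / filename)


config = {"model.namespace": "recsys", "model.name": "ctr", "model.version": "3"}
path = repository_path("/models", model_id_from_config(config), "model.pmml")
```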
25. The device of claim 24 wherein the model loading module is further
configured to:
initialize a model repository client end at an application submission phase of
the Flink
stream processing engine;
load the model file and a relevant script configuration file to equipment in
which the
Flink stream processing engine resides;
employ distributed file cache by the equipment to submit the model file and
the relevant
script configuration file to the various distributed nodes;
load the model repository client end during a task loading at the various
nodes, obtaining
the model from the model file required to be used according to the
configuration file and
loading the model to the memory, and generating corresponding UDF; and
employ the model file in the memory during prediction to perform prediction.
26. The device of any one of claims 22 to 25 wherein the model file is
automatically loaded and
deployed from the repository.
27. The device of claim 22, wherein the algorithm calculating module is
further configured to:
perform short-term feature calculation on the log according to the calculation
model file
through the various distributed nodes;
generate a prediction model according to the prediction model file through the
various
distributed nodes; and
input the long-term feature and the short-term feature of the to-be-
recommended user to
the prediction model for real-time prediction, to obtain the preference
prediction result of
the to-be-recommended user.
28. The device of claim 7, wherein the distributed storage layer includes a
cache component and
the persistence component, and the feature enquiring module is further
configured to:
read a long-term feature and a short-term feature of the to-be-recommended
user from a
local cache through the various distributed nodes;
if reading from the local cache failed, read a long-term feature and a short-
term feature of
the to-be-recommended user from the cache component; and
if reading from the cache component failed, read a long-term feature and a
short-term
feature of the to-be-recommended user from the persistence component, and
asynchronously writing the read features in the cache component.
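The read order in claim 28 (local cache, then cache component, then persistence component with an asynchronous write-back) can be sketched as below. Plain dictionaries stand in for all three tiers and a single-worker thread pool stands in for the asynchronous writer; these choices and all names are assumptions, not details from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three tiers.
local_cache = {}
cache_component = {}
persistence = {"u1": {"long_term": {"age": 30}, "short_term": {"recent_clicks": 2}}}

_writer = ThreadPoolExecutor(max_workers=1)


def read_features(user):
    """Return (features, pending_write); pending_write is a Future or None."""
    features = local_cache.get(user)
    if features is not None:
        return features, None
    features = cache_component.get(user)
    if features is not None:
        return features, None
    features = persistence.get(user)
    if features is None:
        return None, None
    # Write the features read from persistence into the cache component
    # without blocking the caller.
    pending = _writer.submit(cache_component.__setitem__, user, features)
    return features, pending


features, pending = read_features("u1")
if pending is not None:
    pending.result()  # wait here only to make the example deterministic
```

After the write-back completes, a second read for the same user is served from the cache component and no longer reaches the persistence component.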
29. The device of claim 28 wherein the cache component is a distributed
secondary cache
component.
30. The device of claim 29 wherein the secondary cache component stores
partial data, has a
large capacity, a long expiration time, and utilizes an LRU policy.
31. The device of claim 29 wherein a data synchronizer is employed to
asynchronously write a
result to the secondary cache component for standby use in the future.
32. The device of claim 28 wherein the local cache is a host memory that only
retains related
data, has a small capacity, a short expiration time, and utilizes an LRU
policy.
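A local cache with a small capacity, a short expiration time, and an LRU policy, as described in claim 32, might look like the following sketch. The class name, the capacity, the expiration value, and the injectable clock are illustrative assumptions.

```python
import time
from collections import OrderedDict


class LocalCache:
    """Small LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # entry has expired
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


# A fake clock keeps the example deterministic.
now = [0.0]
cache = LocalCache(capacity=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("u1", "a")
cache.put("u2", "b")
cache.get("u1")       # u1 becomes the most recently used entry
cache.put("u3", "c")  # capacity exceeded, so u2 is evicted
```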
33. The device of claim 28 wherein the persistence component retains total
data for at least a
year.
34. The device of claim 28 wherein the persistence component can be an offline
data
warehouse.
35. The device of any one of claims 1 to 34, characterized in that the
calculation model file is an
SQL model file, and the prediction model file is a PMML model file.
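To illustrate what an SQL calculation model file could express, the sketch below runs a per-user aggregation over behavior log rows within a time window. SQLite is used only as a convenient stand-in for the SQL engine; the table, column names, and query are hypothetical, and the PMML side of claim 35 is not shown.

```python
import sqlite3

# Hypothetical content of an SQL calculation model file: per-user counts of
# each behavior at or after the start of a preset time window.
CALCULATION_SQL = """
SELECT user_id,
       SUM(action = 'click')    AS clicks,
       SUM(action = 'favorite') AS favorites
FROM behavior_log
WHERE ts >= :window_start
GROUP BY user_id
ORDER BY user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE behavior_log (user_id TEXT, action TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO behavior_log VALUES (?, ?, ?)",
    [
        ("u1", "click", 100),     # before the window used below
        ("u1", "click", 160),
        ("u1", "favorite", 170),
        ("u2", "click", 50),      # before the window used below
        ("u2", "click", 180),
    ],
)

rows = conn.execute(CALCULATION_SQL, {"window_start": 120}).fetchall()
short_term = {user: {"clicks": c, "favorites": f} for user, c, f in rows}
```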
36. The device of any one of claims 1 to 35 further comprising utilizing a
cluster deployment
framework including:
preparing a Zookeeper cluster;
preparing a Hadoop cluster;
preparing a Redis cluster, wherein a deployment mode is three sentinels plus
N*2 sets of
servers, wherein N is a number of master nodes;
preparing a Kafka cluster;
preparing a scheduler, using Hadoop and Flink components;
starting the Zookeeper, Hadoop, Kafka and Redis clusters; and
creating a model repository on HDFS, wherein trained models are released in
the
repository.
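The Redis sizing rule in claim 36 is simple arithmetic: three sentinels plus N*2 servers for N master nodes. The helper below is a hypothetical illustration; reading "N*2" as one replica per master is an assumption.

```python
def redis_deployment_size(master_count, sentinel_count=3):
    """Count nodes for a deployment of three sentinels plus N*2 servers."""
    if master_count < 1:
        raise ValueError("master_count must be at least 1")
    servers = master_count * 2  # assumed: each master is paired with one replica
    return {
        "sentinels": sentinel_count,
        "servers": servers,
        "total": sentinel_count + servers,
    }


size = redis_deployment_size(3)
```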
37. A Flink stream processing engine device for real-time recommendation, comprising:
an algorithm calculating module configured to perform short-term feature calculation on a log in a message queue according to a calculation model file;
a data pushing module configured to store short-term features obtained by the calculation in a distributed storage layer, wherein the distributed storage layer stores therein long-term features obtained in advance;
a feature enquiring module configured to, on receipt of an event from the message queue, read a long-term feature and a short-term feature of a to-be-recommended user from the distributed storage layer;
the algorithm calculating module further configured to perform real-time prediction according to a prediction model file, the long-term feature and the short-term feature of the to-be-recommended user, to obtain a preference prediction result of the to-be-recommended user; and
the data pushing module further configured to write the preference prediction result of the to-be-recommended user in the distributed storage layer.
38. The device of claim 37 further comprising a message queue connecting
module configured
to obtain data from the message queue.
39. The device of claim 38 wherein the message queue stores log data.
40. The device of claim 39 wherein the log data includes behavior data
generated by user
behavior operations within a preset time period.
41. The device of claim 40 wherein the behavior operations include at least
one of instance
browsing, clicking, listing as favorite, sharing a commodity object operated
by the user.
42. The device of any one of claims 40 or 41 wherein the preset time period is
set according to
practical requirements.
43. The device of claim 42 wherein the distributed storage layer is an NOSQL
storage layer
which includes a cache component and a persistence component.
44. The device of claim 43 wherein the cache component is a Redis component
which has a
distributed, high-reading, and high-writing, small storage capacity, and
stores data within a
period of time.
45. The device of any one of claims 43 or 44 wherein the persistence component
is an Hbase
component which has a distributed, low-reading, and high-writing, large
storage capacity,
and column-extensible properties, and stores data beyond a period of time.
46. The device of any one of claims 43 to 45 wherein the cache component and
the persistence
component can both be used to store the long-term features and the short-term
features.
47. The device of any one of claims 37 to 46 wherein the long-term features
represent long-term
preferences of the user with respect to a commodity.
48. The device of claim 47 wherein the long-term preferences of the user
include at least one of
user basic features, user long-term behavior features and other features.
49. The device of claim 48 wherein user basic features include at least one of
age, gender,
profession, place of residence, and preference of the user.
50. The device of any one of claims 47 or 48 wherein the user long-term
behaviour features
include shopping behaviour statistic values at an e-commerce platform within a
period of
time.
51. The device of claim 46 wherein the short-term features represent short-
term preferences of a
user with respect to a commodity.
52. The device of claim 51 wherein the short-term preferences are obtained
from behavior
operation data generated by the user within the preset time period.
53. The device of any one of claims 37 to 52 further comprising a first
machine learning
algorithm to train primary data of a sample user to generate a short-term
model and to obtain
the calculation model file.
54. The device of claim 53 wherein the first machine learning algorithm is
embodied as one of
an iterative decision tree, a logic regression, and a support vector machine.
55. The device of any one of claims 37 to 54 wherein the to-be-recommended
user is
determined based on a preset user identification.
56. The device of any one of claims 37 to 55 further comprising a second
machine learning
algorithm to generate the prediction model file.
57. The device of claim 56 wherein the second machine learning algorithm is
embodied as one
of an iterative decision tree, a logic regression, and a support vector
machine.
58. The device of any one of claims 37 to 57, further comprising:
a configuration synchronization module configured to obtain a configuration
file that
contains a configuration parameter; and
a model loading module configured to load from a model repository a model file
well
trained in advance to which the configuration parameter corresponds, and
sending the
model file to various distributed nodes of the Flink stream processing engine,
wherein the
model file includes the calculation model file and the prediction model file.
59. The device according to Claim 58, wherein the configuration synchronizing
module is
further configured to:
synchronize at least one configuration file from a distributed configuration
cluster, and
loading the at least one configuration file into a memory.
60. The device according to Claim 59, wherein the model loading module is
further configured
to:
read a preconfigured model identification parameter from the configuration
parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
load from the model repository a model file to which the model identification
parameter
corresponds.
61. The device of claim 60 wherein the model loading module is further
configured to:
initialize a model repository client end at an application submission phase of
the Flink
stream processing engine;
load the model file and a relevant script configuration file to equipment in
which the
Flink stream processing engine resides;
employ distributed file cache by the equipment to submit the model file and
the relevant
script configuration file to the various distributed nodes;
load the model repository client end during a task loading at the various
nodes, obtaining
the model from the model file required to be used according to the
configuration file and
loading the model to the memory, and generating corresponding UDF; and
employ the model file in the memory during prediction to perform prediction.
62. The device of any one of claims 58 to 61 wherein the model file is
automatically loaded and
deployed from the repository.
63. The device of claim 58, wherein the algorithm calculating module is
further configured to:
perform short-term feature calculation on the log according to the calculation
model file
through the various distributed nodes;
generate a prediction model according to the prediction model file through the
various
distributed nodes; and
input the long-term feature and the short-term feature of the to-be-
recommended user to
the prediction model for real-time prediction, to obtain the preference
prediction result of
the to-be-recommended user.
64. The device of claim 43, wherein the distributed storage layer includes a
cache component
and the persistence component, and the feature enquiring module is further
configured to:
read a long-term feature and a short-term feature of the to-be-recommended
user from a
local cache through the various distributed nodes;
if reading from the local cache failed, read a long-term feature and a short-
term feature of
the to-be-recommended user from the cache component; and
if reading from the cache component failed, read a long-term feature and a
short-term
feature of the to-be-recommended user from the persistence component, and
asynchronously writing the read features in the cache component.
65. The device of claim 64 wherein the cache component is a distributed
secondary cache
component.
66. The device of claim 65 wherein the secondary cache component stores
partial data, has a
large capacity, a long expiration time, and utilizes an LRU policy.
67. The device of claim 65 wherein a data synchronizer is employed to
asynchronously write a
result to the secondary cache component for standby use in the future.
68. The device of claim 64 wherein the local cache is a host memory that only
retains related
data, has a small capacity, a short expiration time, and utilizes an LRU
policy.
69. The device of claim 64 wherein the persistence component retains total
data for at least a
year.
70. The device of claim 64 wherein the persistence component can be an offline
data
warehouse.
71. The device of any one of claims 37 to 70, characterized in that the
calculation model file is
an SQL model file, and the prediction model file is a PMML model file.
72. The device of any one of claims 37 to 71 further comprising utilizing a
cluster deployment
framework including:
preparing a Zookeeper cluster;
preparing a Hadoop cluster;
preparing a Redis cluster, wherein a deployment mode is three sentinels plus
N*2 sets of
servers, wherein N is a number of master nodes;
preparing a Kafka cluster;
preparing a scheduler, using Hadoop and Flink components;
starting the Zookeeper, Hadoop, Kafka and Redis clusters; and
creating a model repository on HDFS, wherein trained models are released in
the
repository.
73. A Flink stream processing engine method for real-time recommendation, characterized in comprising:
performing short-term feature calculation on a log in a message queue according to a calculation model file, and storing short-term features obtained by the calculation in a distributed storage layer, wherein the distributed storage layer stores therein long-term features obtained in advance;
on receipt of an event from the message queue, reading a long-term feature and a short-term feature of a to-be-recommended user from the distributed storage layer;
performing real-time prediction according to a prediction model file, the long-term feature and the short-term feature of the to-be-recommended user, to obtain a preference prediction result of the to-be-recommended user; and
writing the preference prediction result of the to-be-recommended user in the distributed storage layer.
74. The method of claim 73 wherein the message queue is a mode of
implementation of a
middleware.
75. The method of claim 74 wherein the message queue stores log data.
76. The method of claim 75 wherein the log data includes behavior data
generated by user
behavior operations within a preset time period.
77. The method of claim 76 wherein the behavior operations include at least
one of instance
browsing, clicking, listing as favorite, sharing a commodity object operated
by the user.
78. The method of any one of claims 76 or 77 wherein the preset time period is
set according to
practical requirements.
79. The method of claim 78 wherein the distributed storage layer is an NOSQL
storage layer
which includes a cache component and a persistence component.
80. The method of claim 79 wherein the cache component is a Redis component
which has a
distributed, high-reading, and high-writing, small storage capacity, and
stores data within a
period of time.
81. The method of any one of claims 79 or 80 wherein the persistence component
is an Hbase
component which has a distributed, low-reading, and high-writing, large
storage capacity,
and column-extensible properties, and stores data beyond a period of time.
82. The method of any one of claims 79 to 81 wherein the cache component and
the persistence
component can both be used to store the long-term features and the short-term
features.
83. The method of any one of claims 73 to 82 wherein the long-term features
represent long-
term preferences of the user with respect to a commodity.
84. The method of claim 83 wherein the long-term preferences of the user
include at least one of
user basic features, user long-term behavior features and other features.
85. The method of claim 84 wherein user basic features include at least one of
age, gender,
profession, place of residence, and preference of the user.
86. The method of any one of claims 83 or 84 wherein the user long-term
behaviour features
include shopping behaviour statistic values at an e-commerce platform within a
period of
time.
87. The method of claim 82 wherein the short-term features represent short-
term preferences of
the user with respect to a commodity.
88. The method of claim 87 wherein the short-term preferences are obtained
from behavior
operation data generated by the user within the preset time period.
89. The method of any one of claims 73 to 88 further comprising employing a
first machine
learning algorithm to train primary data of a sample user to generate a short-
term model and
to obtain the calculation model file.
90. The method of claim 89 wherein the first machine learning algorithm is
embodied as one of
an iterative decision tree, a logic regression, and a support vector machine.
91. The method of any one of claims 73 to 90 wherein the to-be-recommended
user is
determined based on a preset user identification.
92. The method of any one of claims 73 to 91 further comprising employing a
second machine
learning algorithm to generate the prediction model file.
93. The method of claim 92 wherein the second machine learning algorithm is
embodied as one
of an iterative decision tree, a logic regression, and a support vector
machine.
94. The method of any one of claims 73 to 93, further comprising: prior to the
step of
performing short-term feature calculation on a log in a message queue
according to a
calculation model file:
obtaining a configuration file that contains a configuration parameter; and
loading from a model repository a model file well trained in advance to which
the
configuration parameter corresponds, and sending the model file to various
distributed
nodes of the Flink stream processing engine, wherein the model file includes
the
calculation model file and the prediction model file.
95. The method according to Claim 94, characterized in that the step of
obtaining a
configuration file includes:
synchronizing at least one configuration file from a distributed configuration
cluster, and
loading the at least one configuration file into a memory.
96. The method according to Claim 95, characterized in that the step of
loading from a model
repository a model file to which the configuration parameter corresponds
includes:
reading a preconfigured model identification parameter from the configuration
parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
loading from the model repository a model file to which the model
identification
parameter corresponds.
97. The method of claim 96 wherein loading from a model repository a model
file comprises:
initializing a model repository client end at an application submission phase
of the Flink
stream processing engine;
loading the model file and a relevant script configuration file to equipment
in which the
Flink stream processing engine resides;
employing distributed file cache by the equipment to submit the model file and
the
relevant script configuration file to the various distributed nodes;
loading the model repository client end during a task loading at the various
nodes,
obtaining the model from the model file required to be used according to the
configuration file and loading the model to the memory, and generating
corresponding
UDF; and
employing the model file in the memory during prediction to perform
prediction.
98. The method of any one of claims 94 to 97 wherein the model file is
automatically loaded
and deployed from the repository.
99. The method of claim 94, characterized in that the step of performing short-
term feature
calculation on a log in a message queue according to a calculation model file
includes:
performing short-term feature calculation on the log according to the
calculation model
file through the various distributed nodes; and
the step of performing real-time prediction according to a prediction model
file, the long-
term feature and the short-term feature of the to-be-recommended user, to
obtain a
preference prediction result of the to-be-recommended user includes:
generating a prediction model according to the prediction model file through
the various
distributed nodes; and
inputting the long-term feature and the short-term feature of the to-be-
recommended user
to the prediction model for real-time prediction, to obtain the preference
prediction result
of the to-be-recommended user.
100. The method of claim 79, characterized in that the distributed storage
layer includes a cache
component and the persistence component, and the step of on receipt of an
event from the
message queue, reading a long-term feature and a short-term feature of a to-be-
recommended user from the distributed storage layer:
reading a long-term feature and a short-term feature of the to-be-recommended
user from
a local cache through the various distributed nodes;
if reading from the local cache failed, reading a long-term feature and a
short-term
feature of the to-be-recommended user from the cache component and
if reading from the cache component failed, reading a long-term feature and a
short-term
feature of the to-be-recommended user from the persistence component, and
asynchronously writing the read features in the cache component.
101. The method of claim 100 wherein the cache component is a distributed
secondary cache
component.
102. The method of claim 101 wherein the secondary cache component stores
partial data, has a
large capacity, a long expiration time, and utilizes an LRU policy.
103. The method of claim 101 wherein a data synchronizer is employed to
asynchronously write
a result to the secondary cache component for standby use in the future.
104. The method of claim 100 wherein the local cache is a host memory that
only retains related
data, has a small capacity, a short expiration time, and utilizes an LRU
policy.
105. The method of claim 100 wherein the persistence component retains total
data for at least a
year.
106. The method of claim 100 wherein the persistence component can be an
offline data
warehouse.
107. The method of any one of claims 73 to 106, characterized in that the
calculation model file
is an SQL model file, and the prediction model file is a PMML model file.
108. The method of any one of claims 73 to 107 further comprising utilizing a
cluster
deployment framework including:
preparing a Zookeeper cluster;
preparing a Hadoop cluster;
preparing a Redis cluster, wherein a deployment mode is three sentinels plus
N*2 sets of
servers, wherein N is a number of master nodes;
preparing a Kafka cluster;
preparing a scheduler, using Hadoop and Flink components;
starting the Zookeeper, Hadoop, Kafka and Redis clusters; and
creating a model repository on HDFS, wherein trained models are released in
the
repository.
109. A Flink stream processing computer equipment comprising:
a computer readable physical memory;
a processor communicatively coupled to the physical memory,
a computer program stored on the physical memory and operable on the
processor,
wherein the processor executes the computer program configured to:
perform short-term feature calculation on a log in a message queue according
to a
calculation model file, and store short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
on receipt of an event from the message queue, read a long-term feature and a
short-term
feature of a to-be-recommended user from the distributed storage layer;
perform real-time prediction according to a prediction model file, the long-
term feature
and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
write the preference prediction result of the to-be-recommended user in the
distributed
storage layer.
110. The equipment of claim 109 wherein the message queue is a mode of
implementation of a
middleware.
111. The equipment of claim 110 wherein the message queue stores log data.
112. The equipment of claim 111 wherein the log data includes behavior data
generated by user
behavior operations within a preset time period.
113. The equipment of claim 112 wherein the behavior operations include at
least one of instance
browsing, clicking, listing as favorite, sharing a commodity object operated
by the user.
114. The equipment of any one of claims 112 or 113 wherein the preset time
period is set
according to practical requirements.
115. The equipment of claim 114 wherein the distributed storage layer is an
NOSQL storage
layer which includes a cache component and a persistence component.
116. The equipment of claim 115 wherein the cache component is a Redis
component which has
a distributed, high-reading, and high-writing, small storage capacity, and
stores data within a
period of time.
117. The equipment of any one of claims 115 or 116 wherein the persistence
component is an
Hbase component which has a distributed, low-reading, and high-writing, large
storage
capacity, and column-extensible properties, and stores data beyond a period of
time.
118. The equipment of any one of claims 115 to 117 wherein the cache component
and the
persistence component can both be used to store the long-term features and the
short-term
features.
119. The equipment of any one of claims 109 to 118 wherein the long-term
features represent
long-term preferences of the user with respect to a commodity.
120. The equipment of claim 119 wherein the long-term preferences of the user
include at least
one of user basic features, user long-term behavior features and other
features.
121. The equipment of claim 120 wherein user basic features include at least
one of age, gender,
profession, place of residence, and preference of the user.
122. The equipment of any one of claims 119 or 120 wherein the user long-term
behaviour
features include shopping behaviour statistic values at an e-commerce platform
within a
period of time.
123. The equipment of claim 118 wherein the short-term features represent
short-term
preferences of the user with respect to a commodity.
124. The equipment of claim 123 wherein the short-term preferences are
obtained from behavior
operation data generated by the user within the preset time period.
125. The equipment of any one of claims 109 to 124 further configured to
employ a first machine
learning algorithm to train primary data of a sample user to generate a short-
term model and
to obtain the calculation model file.
126. The equipment of claim 125 wherein the first machine learning algorithm
is embodied as
one of an iterative decision tree, a logic regression, and a support vector
machine.
127. The equipment of any one of claims 109 to 126 wherein the to-be-
recommended user is
determined based on a preset user identification.
128. The equipment of any one of claims 109 to 127 further configured to
employ a second
machine learning algorithm to generate the prediction model file.
129. The equipment of claim 128 wherein the second machine learning algorithm
is embodied as
one of an iterative decision tree, a logic regression, and a support vector
machine.
130. The equipment of any one of claims 109 to 129, further configured to:
prior to the step of
performing short-term feature calculation on a log in a message queue
according to a
calculation model file:
obtain a configuration file that contains a configuration parameter; and
load from a model repository a model file well trained in advance to which the
configuration parameter corresponds, and send the model file to various
distributed nodes
of a Flink stream processing engine, wherein the model file includes the
calculation
model file and the prediction model file.
131. The equipment according to Claim 130, further configured to:
synchronize at least one configuration file from a distributed configuration
cluster, and
loading the at least one configuration file into a memory.
132. The equipment according to Claim 131, further configured to:
read a preconfigured model identification parameter from the configuration
parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
load from the model repository a model file to which the model identification
parameter
corresponds.
133. The equipment of claim 132 further configured to:
initialize a model repository client end at an application submission phase of
the Flink
stream processing engine;
load the model file and a relevant script configuration file to equipment in
which the
Flink stream processing engine resides;
employ distributed file cache by the equipment to submit the model file and
the relevant
script configuration file to the various distributed nodes;
load the model repository client end during a task loading at the various
nodes, obtaining
the model from the model file required to be used according to the
configuration file and
loading the model to the memory, and generating corresponding UDF; and
employ the model file in the memory during prediction to perform prediction.
134. The equipment of any one of claims 130 to 133 wherein the model file is
automatically
loaded and deployed from the repository.
135. The equipment of claim 130, further configured to:
perform short-term feature calculation on the log according to the calculation
model file
through the various distributed nodes;
generate a prediction model according to the prediction model file through the
various
distributed nodes; and
input the long-term feature and the short-term feature of the to-be-
recommended user to
the prediction model for real-time prediction, to obtain the preference
prediction result of
the to-be-recommended user.
136. The equipment of claim 115, further configured to:
read a long-term feature and a short-term feature of the to-be-recommended
user from a
local cache through the various distributed nodes;
if reading from the local cache failed, read a long-term feature and a short-
term feature of
the to-be-recommended user from the cache component; and
if reading from the cache component failed, read a long-term feature and a
short-term
feature of the to-be-recommended user from the persistence component, and
asynchronously writing the read features in the cache component.
137. The equipment of claim 136 wherein the cache component is a distributed
secondary cache
component.
138. The equipment of claim 137 wherein the secondary cache component stores
partial data, has
a large capacity, a long expiration time, and utilizes an LRU policy.
139. The equipment of claim 137 wherein a data synchronizer is employed to
asynchronously
write a result to the secondary cache component for standby use in the future.
140. The equipment of claim 136 wherein the local cache is a host memory that
only retains
related data, has a small capacity, a short expiration time, and utilizes an
LRU policy.
141. The equipment of claim 136 wherein the persistence component retains
total data for at least
a year.
142. The equipment of claim 136 wherein the persistence component can be an
offline data
warehouse.
143. The equipment of any one of claims 109 to 142, characterized in that the
calculation model
file is an SQL model file, and the prediction model file is a PMML model file.
144. The equipment of any one of claims 109 to 143 further configured to
utilize a cluster
deployment framework including:
preparing a Zookeeper cluster;
preparing a Hadoop cluster;
preparing a Redis cluster, wherein a deployment mode is three sentinels plus
N*2 sets of
servers, wherein N is a number of master nodes;
preparing a Kafka cluster;
preparing a scheduler, using Hadoop and Flink components;
starting the Zookeeper, Hadoop, Kafka and Redis clusters; and
creating a model repository on HDFS, wherein trained models are released in
the
repository.
145. A computer readable physical memory having stored thereon a computer
program executed
by a computer configured to:
perform short-term feature calculation on a log in a message queue according
to a
calculation model file, and store short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
on receipt of an event from the message queue, read a long-term feature and a
short-term
feature of a to-be-recommended user from the distributed storage layer;
perform real-time prediction according to a prediction model file, the long-
term feature
and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
write the preference prediction result of the to-be-recommended user in the
distributed
storage layer.
146. The memory of claim 145 wherein the message queue is a mode of
implementation of a
middleware.
147. The memory of claim 146 wherein the message queue stores log data.
148. The memory of claim 147 wherein the log data includes behavior data
generated by user
behavior operations within a preset time period.
149. The memory of claim 148 wherein the behavior operations include at least
one of instance
browsing, clicking, listing as favorite, sharing a commodity object operated
by the user.
150. The memory of any one of claims 148 or 149 wherein the preset time period
is set according
to practical requirements.
151. The memory of claim 150 wherein the distributed storage layer is an NOSQL
storage layer
which includes a cache component and a persistence component.
152. The memory of claim 151 wherein the cache component is a Redis component
which has a
distributed, high-reading, and high-writing, small storage capacity, and
stores data within a
period of time.
153. The memory of any one of claims 151 or 152 wherein the persistence
component is an
Hbase component which has a distributed, low-reading, and high-writing, large
storage
capacity, and column-extensible properties, and stores data beyond a period of
time.
154. The memory of any one of claims 151 to 153 wherein the cache component
and the
persistence component can both be used to store the long-term features and the
short-term
features.
155. The memory of any one of claims 145 to 154 wherein the long-term features
represent long-
term preferences of the user with respect to a commodity.
156. The memory of claim 155 wherein the long-term preferences of the user
include at least one
of user basic features, user long-term behavior features and other features.
157. The memory of claim 156 wherein user basic features include at least one
of age, gender,
profession, place of residence, and preference of the user.
158. The memory of any one of claims 155 or 156 wherein the user long-term
behaviour features
include shopping behaviour statistic values at an e-commerce platform within a
period of
time.
159. The memory of claim 154 wherein the short-term features represent short-
term preferences
of the user with respect to a commodity.
160. The memory of claim 159 wherein the short-term preferences are obtained
from behavior
operation data generated by the user within the preset time period.
161. The memory of any one of claims 145 to 160 further configured to employ a
first machine
learning algorithm to train primary data of a sample user to generate a short-
term model and
to obtain the calculation model file.
162. The memory of claim 161 wherein the first machine learning algorithm is
embodied as one
of an iterative decision tree, a logic regression, and a support vector
machine.
163. The memory of any one of claims 145 to 162 wherein the to-be-recommended
user is
determined based on a preset user identification.
164. The memory of any one of claims 145 to 163 further configured to employ a
second
machine learning algorithm to generate the prediction model file.
165. The memory of claim 164 wherein the second machine learning algorithm is
embodied as
one of an iterative decision tree, a logic regression, and a support vector
machine.
166. The memory of any one of claims 145 to 165, further configured to: prior
to the step of
performing short-term feature calculation on a log in a message queue
according to a
calculation model file:
obtain a configuration file that contains a configuration parameter; and
load from a model repository a model file well trained in advance to which the
configuration parameter corresponds, and send the model file to various
distributed nodes
of a Flink stream processing engine, wherein the model file includes the
calculation
model file and the prediction model file.
167. The memory according to Claim 166, further configured to:
synchronize at least one configuration file from a distributed configuration
cluster, and
loading the at least one configuration file into a memory.
168. The memory according to Claim 166, further configured to:
read a preconfigured model identification parameter from the configuration
parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
load from the model repository a model file to which the model identification
parameter
corresponds.
169. The memory of any one of claims 166 to 168 further configured to:
initialize a model repository client end at an application submission phase of
the Flink
stream processing engine;
load the model file and a relevant script configuration file to memory in
which the Flink
stream processing engine resides;
employ distributed file cache by the memory to submit the model file and the
relevant
script configuration file to the various distributed nodes;
load the model repository client end during a task loading at the various
nodes, obtaining
the model from the model file required to be used according to the
configuration file and
loading the model to the memory, and generating corresponding UDF; and
employ the model file in the memory during prediction to perform prediction.
170. The memory of any one of claims 166 to 169 wherein the model file is
automatically loaded
and deployed from the repository.
171. The memory of claim 166, further configured to:
perform short-term feature calculation on the log according to the calculation
model file
through the various distributed nodes;
generate a prediction model according to the prediction model file through the
various
distributed nodes; and
input the long-term feature and the short-term feature of the to-be-
recommended user to
the prediction model for real-time prediction, to obtain the preference
prediction result of
the to-be-recommended user.
172. The memory of claim 151, further configured to:
read a long-term feature and a short-term feature of the to-be-recommended
user from a
local cache through the various distributed nodes;
if reading from the local cache failed, read a long-term feature and a short-
term feature of
the to-be-recommended user from the cache component; and
if reading from the cache component failed, read a long-term feature and a
short-term
feature of the to-be-recommended user from the persistence component, and
asynchronously writing the read features in the cache component.
173. The memory of claim 172 wherein the cache component is a distributed
secondary cache
component.
174. The memory of claim 173 wherein the secondary cache component stores
partial data, has a
large capacity, a long expiration time, and utilizes an LRU policy.
175. The memory of claim 173 wherein a data synchronizer is employed to
asynchronously write
a result to the secondary cache component for standby use in the future.
176. The memory of claim 172 wherein the local cache is a host memory that
only retains related
data, has a small capacity, a short expiration time, and utilizes an LRU
policy.
177. The memory of claim 172 wherein the persistence component retains total
data for at least a
year.
178. The memory of claim 172 wherein the persistence component can be an
offline data
warehouse.
179. The memory of any one of claims 145 to 178, characterized in that the
calculation model
file is an SQL model file, and the prediction model file is a PMML model file.
180. The memory of any one of claims 145 to 179 further configured to utilize
a cluster
deployment framework including:
preparing a Zookeeper cluster;
preparing a Hadoop cluster;
preparing a Redis cluster, wherein a deployment mode is three sentinels plus
N*2 sets of
servers, wherein N is a number of master nodes;
preparing a Kafka cluster;
preparing a scheduler, using Hadoop and Flink components;
starting the Zookeeper, Hadoop, Kafka and Redis clusters; and
creating a model repository on HDFS, wherein trained models are released in
the
repository.

Description

Note: Descriptions are shown in the official language in which they were submitted.


FLINK STREAMING PROCESSING ENGINE METHOD AND DEVICE FOR REAL-
TIME RECOMMENDATION AND COMPUTER EQUIPMENT
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to the field of big data recommendation
technology, and
more particularly to a Flink stream processing engine method for real-time
recommendation, and corresponding device and computer equipment.
Description of Related Art
[0002] As expectations for user experience rise, recommendation is required to
be increasingly timely. The traditional double-layered recommendation
framework (application layer, offline layer) leads to an overly complicated
application layer, response timeouts and data inconsistency. In addition,
although recommendation at the offline layer is quick to realize, its time
precision reaches only the magnitude of days; since user interests generally
fade rapidly over time, if items of interest cannot be pushed to users
promptly, the conversion rate is reduced and users may even be lost. It is
therefore extremely valuable to create a real-time recommending system capable
of analyzing users' behaviors in real time and performing prediction so that
the system can respond in a timely manner.
[0003] Real-time recommendation means the capabilities to perceive changes in
users' behaviors
and to push precise contents to users according to users' behaviors generated
by the users
in real time. At present, the typical recommending system is a triple-layered
framework,
namely online, near-line and offline. By means of offline or near-line model
training,
near-line real-time feature engineering and callback and sorting, and finally
through
Date Recue/Date Received 2022-03-22

forced interference and bottom coverage of the business performed at the
online layer,
the value of the system is fully reflected in terms of highly speedy response
to scenarios
with massive data volumes.
[0004] The recommendation algorithm engine based on stream processing sits at
the core of the near-line portion, whereas the traditional schemes realized
with STORM or SPARK STREAMING, for example, are problematic in the following
aspects:
[0005] 1) The latest user behavior data is not utilized and the user feature
model cannot be updated, so changes in users' interests cannot be perceived in
real time.
[0006] 2) Reliability, disaster recovery, performance and generality cannot
all be taken into account at the same time, which makes it difficult to deploy
the algorithm model and to complete real-time response.
[0007] 3) Development in the JAVA or PYTHON language involves too many
technical details, so development is slow and it is difficult to respond to
business requirements in time.
SUMMARY OF THE INVENTION
[0008] In order to overcome at least one problem mentioned in the above
Description of Related
Art, the present application provides a Flink stream processing engine method
for real-
time recommendation, and corresponding device and computer equipment, with
technical
solutions being specified as follows.
[0009] According to the first aspect, there is provided a Flink stream
processing engine method
for real-time recommendation, which method comprises:
[0010] performing short-term feature calculation on a log in a message queue
according to a
calculation model file, and storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
[0011] reading a long-term feature and a short-term feature of a to-be-
recommended user from
the distributed storage layer, when an event is received from the message
queue;
[0012] performing real-time prediction according to a prediction model file,
the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
[0013] writing the preference prediction result of the to-be-recommended user
in the distributed
storage layer.
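The four steps of the first aspect can be illustrated with a minimal sketch. It is not the Flink implementation described in the present application: the storage layer is a plain in-memory object, the calculation model file is replaced by a hypothetical click counter, and the prediction model file is replaced by a hypothetical linear score. All names (FeatureStore, compute_short_term, predict, handle_logs, handle_event) and the feature names are assumptions made only for illustration.

```python
from collections import defaultdict


class FeatureStore:
    """Hypothetical in-memory stand-in for the distributed storage layer."""

    def __init__(self, long_term):
        self.long_term = dict(long_term)     # long-term features obtained in advance
        self.short_term = defaultdict(dict)  # short-term features written by the job
        self.predictions = {}                # preference prediction results


def compute_short_term(log_records):
    """Stand-in for the calculation model file: count clicks per user."""
    counts = defaultdict(int)
    for record in log_records:
        if record["action"] == "click":
            counts[record["user_id"]] += 1
    return {user_id: {"recent_clicks": n} for user_id, n in counts.items()}


def predict(long_term, short_term):
    """Stand-in for the prediction model file: a toy linear score."""
    return (0.1 * long_term.get("orders_last_year", 0)
            + 0.5 * short_term.get("recent_clicks", 0))


def handle_logs(store, log_records):
    """Step 1: compute short-term features from the log and store them."""
    for user_id, features in compute_short_term(log_records).items():
        store.short_term[user_id].update(features)


def handle_event(store, event):
    """Steps 2 to 4: read both feature kinds, predict, write the result back."""
    user_id = event["user_id"]
    long_term = store.long_term.get(user_id, {})
    short_term = store.short_term.get(user_id, {})
    score = predict(long_term, short_term)
    store.predictions[user_id] = score
    return score
```

In this sketch, handle_logs would be driven by log records consumed from the message queue and handle_event by each received event; the two functions share state only through the store, which mirrors the role the distributed storage layer plays between feature calculation and prediction.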
[0014] Further, prior to the step of performing short-term feature calculation
on a log in a
message queue according to a calculation model file, the method further
comprises:
[0015] obtaining a configuration file that contains a configuration parameter;
and
[0016] loading from a model repository a model file well trained in advance to
which the
configuration parameter corresponds, and sending the model file to various
distributed
nodes of the Flink stream processing engine, wherein the model file includes
the
calculation model file and the prediction model file.
[0017] Further, the step of obtaining a configuration file includes:
[0018] synchronizing at least one configuration file from a distributed
configuration cluster, and
loading the at least one configuration file into a memory.
[0019] Further, the step of loading from a model repository a model file to
which the
configuration parameter corresponds includes:
[0020] reading a preconfigured model identification parameter from the
configuration parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
[0021] loading from the model repository a model file to which the model
identification
parameter corresponds.
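As an illustration of how a model identification parameter could select a model file, the following sketch reads the namespace, model name and model version from an already parsed configuration and maps them to a location in the repository. The configuration keys, the directory layout root/namespace/name/version/filename and the names ModelId, model_id_from_config and repository_path are assumptions; the present application does not specify them.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelId:
    """Model identification parameter: namespace, model name, model version."""
    namespace: str
    name: str
    version: str


def model_id_from_config(config: dict) -> ModelId:
    """Read the identification parameter from a parsed configuration.

    The "model" key and its sub-keys are hypothetical.
    """
    model = config["model"]
    return ModelId(model["namespace"], model["name"], str(model["version"]))


def repository_path(root: str, model_id: ModelId, filename: str) -> str:
    """Map an identification parameter to a file path in the repository.

    The directory layout is an assumed convention, not taken from the source.
    """
    path = (PurePosixPath(root) / model_id.namespace / model_id.name
            / model_id.version / filename)
    return str(path)
```

Keeping the version in the identifier means that switching the configuration to another version selects a different file without changing any code on the nodes.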
[0022] Further, the step of performing short-term feature calculation on a log
in a message queue
according to a calculation model file includes:
[0023] performing short-term feature calculation on the log according to the
calculation model
file through the various distributed nodes;
[0024] the step of performing real-time prediction according to a prediction
model file, the long-
term feature and the short-term feature of the to-be-recommended user, to
obtain a
preference prediction result of the to-be-recommended user includes:
[0025] generating a prediction model according to the prediction model file
through the various
distributed nodes; and
[0026] inputting the long-term feature and the short-term feature of the to-be-
recommended user
to the prediction model for real-time prediction, to obtain the preference
prediction result
of the to-be-recommended user.
[0027] Further, the distributed storage layer includes a cache component and a
persistence
component, and the step of reading a long-term feature and a short-term
feature of a to-
be-recommended user from the distributed storage layer, when an event is
received from
the message queue includes:
[0028] reading a long-term feature and a short-term feature of the to-be-
recommended user from
a local cache through the various distributed nodes;
[0029] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the cache component, if reading from the local cache failed; and
[0030] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the persistence component, if reading from the cache component failed, and
asynchronously writing the read features in the cache component.
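The read order described above (local cache, then cache component, then persistence component with asynchronous write-back) can be sketched as follows. Plain dictionaries stand in for the cache component and the persistence component, a thread pool stands in for the asynchronous writer, and the class names LRUCache and TieredFeatureReader are hypothetical. Filling the local cache after a hit in a lower tier is an added assumption that the text does not state.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class LRUCache:
    """Small in-process cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class TieredFeatureReader:
    """Reads features from the local cache, cache component, then persistence."""

    def __init__(self, local, cache_component, persistence, executor):
        self.local = local
        self.cache_component = cache_component  # dict stand-in (hypothetical)
        self.persistence = persistence          # dict stand-in (hypothetical)
        self.executor = executor
        self.pending = []  # futures of asynchronous write-backs

    def read(self, user_id):
        features = self.local.get(user_id)
        if features is not None:
            return features, "local"
        features = self.cache_component.get(user_id)
        if features is not None:
            self.local.put(user_id, features)  # assumption: not stated in the text
            return features, "cache"
        features = self.persistence.get(user_id)
        if features is None:
            return None, "miss"
        # Write the read features to the cache component without blocking the read.
        future = self.executor.submit(
            self.cache_component.__setitem__, user_id, features)
        self.pending.append(future)
        self.local.put(user_id, features)  # assumption: not stated in the text
        return features, "persistence"
```

The second element of the returned tuple only reports which tier answered, which is convenient when checking the fallback order; a real job would return the features alone.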
[0031] Further, the calculation model file is an SQL model file, and the
prediction model file is
a PMML model file.
[0032] According to the second aspect, there is provided a Flink stream
processing engine device
for real-time recommendation, which device comprises:
[0033] an algorithm calculating module, for performing short-term feature
calculation on a log
in a message queue according to a calculation model file;
[0034] a data pushing module, for storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance; and
[0035] a feature enquiring module, for reading a long-term feature and a short-
term feature of a
to-be-recommended user from the distributed storage layer, when an event is
received
from the message queue;
[0036] the algorithm calculating module is further employed for performing
real-time prediction
according to a prediction model file, the long-term feature and the short-term
feature of
the to-be-recommended user, to obtain a preference prediction result of the to-
be-
recommended user; and
[0037] the data pushing module is further employed for writing the preference
prediction result
of the to-be-recommended user in the distributed storage layer.
[0038] Further, the device further comprises:
[0039] a configuration synchronizing module, for obtaining a configuration
file that contains a
configuration parameter; and
[0040] a model loading module, for loading from a model repository a model
file well trained in
advance to which the configuration parameter corresponds, and sending the
model file to
various distributed nodes of the Flink stream processing engine, wherein the
model file
includes the calculation model file and the prediction model file.
[0041] Further, the configuration synchronizing module is specifically
employed for:
[0042] synchronizing at least one configuration file from a distributed
configuration cluster, and
loading the at least one configuration file into a memory.
[0043] Further, the model loading module is specifically employed for:
[0044] reading a preconfigured model identification parameter from the
configuration parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
[0045] loading from the model repository a model file to which the model
identification
parameter corresponds.
[0046] Further, the algorithm calculating module is specifically employed for:
[0047] performing short-term feature calculation on the log according to the
calculation model
file through the various distributed nodes;
[0048] the algorithm calculating module is specifically further employed for:
[0049] generating a prediction model according to the prediction model file
through the various
distributed nodes; and
[0050] inputting the long-term feature and the short-term feature of the to-be-
recommended user
to the prediction model for real-time prediction, to obtain the preference
prediction result
of the to-be-recommended user.
[0051] Further, the distributed storage layer includes a cache component and a
persistence
component, and the feature enquiring module is specifically employed for:
[0052] reading a long-term feature and a short-term feature of the to-be-
recommended user from
a local cache through the various distributed nodes;
[0053] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the cache component, if reading from the local cache failed; and
[0054] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the persistence component, if reading from the cache component failed, and
asynchronously writing the read features in the cache component.
[0055] Further, the calculation model file is an SQL model file, and the
prediction model file is
a PMML model file.
[0056] According to the third aspect, there is provided a computer equipment
that comprises a
memory, a processor and a computer program stored on the memory and operable
on the
processor, and the following operating steps are realized when the processor
executes the
computer program:
[0057] performing short-term feature calculation on a log in a message queue
according to a
calculation model file, and storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
[0058] reading a long-term feature and a short-term feature of a to-be-
recommended user from
the distributed storage layer, when an event is received from the message
queue;
[0059] performing real-time prediction according to a prediction model file,
the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
[0060] writing the preference prediction result of the to-be-recommended user
in the distributed
storage layer.
[0061] According to the fourth aspect, there is provided a computer-readable
storage medium
storing a computer program thereon, and the following operating steps are
realized when
the computer program is executed by a processor:
[0062] performing short-term feature calculation on a log in a message queue
according to a
calculation model file, and storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
[0063] reading a long-term feature and a short-term feature of a to-be-
recommended user from
the distributed storage layer, when an event is received from the message
queue;
[0064] performing real-time prediction according to a prediction model file,
the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
[0065] writing the preference prediction result of the to-be-recommended user
in the distributed
storage layer.
[0066] The technical solutions provided by the present application at least
achieve the following
advantageous effects:
[0067] 1. Message reliability guarantee and elastic scaling: the distributed
stream engine Flink itself provides precise message reliability and the
flexibility to scale capacity up and down, which makes it possible to reduce
resource consumption in normal times and to add resources during large-scale
promotional activities in order to handle peak traffic.
[0068] 2. Reduction in algorithm development difficulty: algorithm personnel
more often work with such language systems as SQL, Python, and machine
learning than with the JAVA language system, whereas the present application
makes use of the general SQL and PMML standards, whereby both the difficulty
and the time consumption of algorithm development are reduced, and most
technical problems are solved.
[0069] 3. Reduction in response duration: the response duration of an offline
algorithm is usually on the order of hours, and its operation time usually
cannot be guaranteed, whereas the response time of the current engine is on
the order of milliseconds, whereby real-time recommendation and feature
engineering are made possible.
[0070] 4. One-stop model deployment: models obtained by offline training can
be stored in a
general model repository, the current engine is subsequently specified to read
the model
of given coordinates before the engine automatically starts deployment,
whereby massive
deployment operations are dispensed with, and error and time cost are reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0071] To more clearly describe the technical solutions in the embodiments of
the present
application, drawings required to be used in the description of the
embodiments will be
briefly introduced below. Apparently, the drawings introduced below are merely
directed
to some embodiments of the present application, while it is possible for
persons ordinarily
skilled in the art to acquire other drawings based on these drawings without
spending
creative effort in the process.
[0072] Fig. 1 is a view schematically illustrating data interaction between
the stream processing
engine and relevant components provided by the present application;
[0073] Fig. 2 is a view schematically illustrating the structure of the Flink
stream processing
engine provided by the present application;
[0074] Fig. 3 is a flowchart illustrating the Flink stream processing engine
method for real-time
recommendation provided by the present application;
[0075] Fig. 4 is a flowchart illustrating loading of the model file provided
by the present
application;
[0076] Fig. 5 is a flowchart illustrating reading of three-level cache
features provided by the
present application;
[0077] Fig. 6 is a view illustrating a cluster deployment framework provided
by the present
application; and
[0078] Fig. 7 is a view illustrating the internal structure of a computer
equipment provided by an
embodiment of the present application.
DETAILED DESCRIPTION OF THE INVENTION
[0079] To make more lucid and clear the objectives, technical solutions and
advantages of the
present application, the technical solutions in the embodiments of the present
application
will be more clearly and comprehensively described below with reference to the
accompanying drawings in the embodiments of the present application.
Apparently, the
embodiments as described are merely partial, rather than the entire,
embodiments of the
present application. All other embodiments obtainable by persons ordinarily
skilled in the
art on the basis of the embodiments in the present application without
spending creative
effort shall all fall within the protection scope of the present application.
[0080] As should be noted, unless explicitly demanded otherwise in the
context, such wordings
as "comprising", "including" and "containing" as well as their various
grammatical forms
as used throughout the Description and the Claims shall be understood to
indicate the
meaning of inclusion rather than the meaning of exclusion or exhaustion, in
other words,
they collectively mean "including, but not limited to...". In addition, unless
noted
otherwise, the wordings of "more" and "plural" indicate the meaning of "two or
more" in
the description of the present application.
[0081] Fig. 1 is a view schematically illustrating data interaction between
the stream processing
engine and relevant components provided by the present application. As shown
in Fig. 1,
black line arrows indicate data flows, grey line arrows indicate file flows,
and long-term
features are pushed by offline data synchronizer (Long Term Feature
Synchronizer) 101
to NOSQL storage layer 102 that at least contains a persistence component
(Persistence)
and a cache component (Cache). Stream processing engine (Stream Engine) 103
reads a
configuration file from distributed configuration cluster (Configuration
Zookeeper) 104,
and loads from model repository (Model Repository) 105 a model file (for
instance, SQL
model file and PMML model file) well trained in advance. Stream engine 103
uses a log in message queue (MQ Kafka) 106 to perform short-term feature
engineering, stores the calculated data in the persistence component and the
cache component of NOSQL storage layer 102, and then writes an event back to
message queue 106.
Thereafter,
triggered by this event, predicting stream engine 103 obtains long-term
features and short-
term features from NOSQL storage layer 102, uses already loaded model text and
script
to perform real-time prediction, and finally writes the result in the
persistence component
and the cache component of NOSQL storage layer 102. A stream structure with
separated
reading and writing is thus formed. In actual application, with the use of the
Flink component, the above stream processing is superior to Storm in terms of
message supportability, volume of development, and throughput. The distributed
configuration cluster is embodied as Zookeeper, which can be combined with a
Hadoop cluster. The message queue is embodied as Kafka, which is characterized
by high throughput and low time delay, and suits log-type, message-triggered
scenarios. The persistence storage component is embodied as HBase, which is
distributed, supports heavy writing and light reading, has a large storage
capacity, and is column-extensible. The cache component is embodied as Redis,
which is distributed, supports heavy reading and heavy writing, has a small
storage capacity, and suits the application scenario.
[0082] Fig. 2 is a view schematically illustrating the structure of the Flink
stream processing
engine provided by the present application. As shown in Fig. 2, the Flink
stream
processing engine (Stream Engine) comprises the following modules: a
configuration
synchronizing module (Configuration Synchronizer), a model loading module
(Model
Loader), an MQ connecting module (MQ connector), a feature enquiring module
(Queryer), an algorithm calculating module (Algorithm Calculation), and a
data pushing
module (Data Pusher). Of these, the configuration synchronizing module, the
model
loading module and the MQ connecting module can be disposed on the same
scheduler,
and the feature enquiring module and the algorithm calculating module can be
disposed
on various distributed nodes in the distributed cluster, while the scheduler
can perform
information interaction with the various distributed nodes.
[0083] The configuration synchronizing module is employed for synchronizing
various
differently typed configuration files from the distributed configuration
cluster, loading
the same in the memory, and providing necessary parameter configurations for
the
recommending system, including, but not limited to, model hyperparameters,
feature weights, and limits on the number of recommendation results. For
example, a corresponding recommendation algorithm is selected for prediction
from the model repository according to user configuration parameters, while it
is further possible for the business personnel to intervene, to a certain
extent, in configurable recommendation results through the system
configuration page.
[0084] The model loading module is employed for loading SQL model files or
PMML model
files from the model repository, and sending the files to the various
distributed nodes by
means of the distributed file caching principle for standby use by distributed
tasks.
[0085] The MQ connecting module is employed for obtaining log data from the
message queue MQ so that short-term feature calculation can be performed
through the algorithm calculating module, for obtaining an event from the
message queue so that the algorithm calculating module can, based on the
event, obtain long-term features and short-term features from the NOSQL
storage layer, and additionally for writing the response event returned by the
algorithm calculating module to the message queue MQ for use by downstream
systems.
[0086] The feature enquiring module is employed for enquiring feature data
from the NOSQL cluster one piece at a time or in batches. In the big data
scenario, feature data must be read with short latency, in great batches, and
continuously. To meet this performance requirement, it is possible to employ a
triple-layered cache together with a separated reading and writing mechanism.
[0087] The algorithm calculating module is located on the various distributed
nodes and uses the model file loaded in advance by the model loading module.
After having received an event from the message queue, it uses the native
stream SQL calculating function to calculate features or uses the PMML
predicting function to predict results.
[0088] The data pushing module writes the resultant data to the persistence
component and the cache component simultaneously and, after obtaining
responses from the persistence component and the cache component, writes the
triggered event to the message queue for downstream use.
[0089] In order to realize functional requirements of near-line streaming
recommendation in a
recommending system, and to give considerations at the same time to such
problems as
concerning message reliability, disaster recovery, performance, and
transaction that
possibly occur during the operating process of the recommending system, the
present
application ensures storage administration of global and local statuses and
precise
reliability guarantee through the distributed stream calculation framework
Flink,
performs prediction calculation by means of a general model repository client
end and
model files that conform to the SQL and PMML standards, realizes automatic
model
loading, and solves the problem in which it is difficult to deploy the
algorithm model and
it is difficult to respond in real time; quick reading and reliability of
massive data are
guaranteed through reading and writing separation and triple-layered caching
mechanism
of the feature data repository.
[0090] The technical solutions of the present application are further
described below through a
plurality of embodiments.
[0091] In one embodiment, there is provided a Flink stream processing engine
method for real-
time recommendation, with reference to Fig. 3, the method can comprise the
following
steps.
[0092] 301 - performing short-term feature calculation on a log in a message
queue according
to a calculation model file, and storing short-term features obtained by the
calculation in
a distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance.
[0093] The message queue is a mode of implementation of message middleware,
such as a Kafka message queue. The message queue stores log data, and the log
data can include behavior data generated by user behavior operations within a
preset time period; the behavior operations are, for instance, browsing,
clicking, listing as favorite, and sharing of a certain commodity object by a
user on a certain APP page. The preset time period can be set according to
practical requirements, for instance 1 day, 3 days or 7 days. In addition, the
present application places no restriction on the source of the log data, which
can for example come from an application APP.
[0094] The distributed storage layer can be embodied as an NOSQL storage layer
(a non-relational database) that includes a cache component and a persistence
component. The cache component can be embodied as Redis, which is distributed,
supports heavy reading and heavy writing, has a small storage capacity, and
can store data within a period of time (such as within 1 year). The
persistence storage component can be embodied as HBase, which is distributed,
supports heavy writing and light reading, has a large storage capacity, is
column-extensible, and can store older data (such as data more than 1 year
old). Both the cache component and the persistence component can be used to
store long-term features and short-term features related to user operational
behaviors.
[0095] The aforementioned long-term features are used to reflect long-term
preference of a user
with respect to a designated commodity, including user basic feature, user
long-term
behavior feature and other features, of which the user basic feature includes,
for instance,
age, gender, profession, place of residence and preference of the user; the
user behavior
feature means shopping behavior statistic values at an e-commerce platform
within a
period of time, such as the statistic feature of operational behaviors of the
user under a
certain commodity category within 30 days, 90 days, 180 days. Long-term
features can
be extracted from user data; for example, a long-term model trained on offline
data is used to extract long-term features of the user from user data, to
which no specific restriction is made in the embodiments of the present
application.
[0096] The aforementioned short-term features are used to reflect short-term
preference of a user
with respect to a designated commodity, and can be extracted and obtained from
behavior
operation data generated by the user within a preset time period, such as the
behavior
operation data generated within 7 days.
[0097] Specifically, the Flink stream processing engine can obtain log data
from the message
queue according to a preset time interval, employ a preset calculation model
file to
perform short-term feature calculation on the user behavior data in the log
data, and store
short-term features obtained by such calculation respectively in the cache
component and
the persistence component of the distributed storage layer. The preset time
interval can
be set according to practical requirements, such as 10 seconds. A machine
learning
algorithm can be employed to train primary data of a sample user to generate a
short-term
model, and to obtain a calculation model file of the short-term model. The
machine
learning algorithm can be embodied as an iterative decision tree, logistic
regression or a
support vector machine, to which no restriction is made in the embodiments of
the present
application.
[0098] 302 - reading a long-term feature and a short-term feature of a
to-be-recommended user
from the distributed storage layer, when an event is received from the message
queue.
[0099] After the Flink stream processing engine has stored the short-term
features of the user in
the persistence component and the cache component in the distributed storage
layer, an
event will be written back to the message queue.
[0100] Specifically, having determined that the message queue has successfully
received the
event, triggered by this event, the Flink stream processing engine can obtain
a long-term
feature and a short-term feature of a to-be-recommended user from the cache
component
and the persistence component in the distributed storage layer. There may be
one or more
to-be-recommended user(s), to which no restriction is made in the embodiments
of the
present application.
[0101] The to-be-recommended user can be a user to whom content of interest is
to be recommended; the content here can be commodity information, and the
to-be-recommended user can be determined based on a preset user
identification.
[0102] 303 - performing real-time prediction according to a prediction model
file, the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user.
[0103] The preference prediction result of the to-be-recommended user can
include commodity
objects to be recommended.
[0104] Specifically, the Flink stream processing engine can generate a
prediction model
according to a prediction model file obtained in advance, and perform real-
time prediction
on the long-term feature and the short-term feature of the to-be-recommended
user
according to the prediction model to obtain the preference prediction result
of the to-be-
recommended user.
[0105] A machine learning algorithm can be employed to train a prediction
model, using as training samples the long-term feature and the short-term
feature of a sample user together with the preference commodity objects of the
sample user, and to obtain a prediction model file of the prediction model.
The machine learning algorithm can be embodied as an iterative decision tree,
logistic regression or a support vector machine, to which no restriction is
made in the embodiments of the present application.
[0106] In this embodiment, since the prediction model file is obtained by
training on both the long-term feature and the short-term feature of the user,
not only the long-term feature data of the user but also the short-term
(latest) feature data of the user is taken into consideration, whereby it is
made possible to predict the preference commodity objects of the
to-be-recommended user more precisely, and to effectively enhance the
precision and reliability of the contents recommended to the user.
[0107] 304 - writing the preference prediction result of the to-be-recommended
user in the
distributed storage layer.
[0108] Specifically, the Flink stream processing engine can write the
preference prediction result of the to-be-recommended user simultaneously to
the cache component and the persistence component in the distributed storage
layer and, after obtaining the response results returned from the cache
component and the persistence component, write the triggered event to the
message queue, so that the preference prediction result of the
to-be-recommended user is made available for use by downstream systems.
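Steps 301 to 304 can be illustrated with a minimal, single-process sketch. It is not the Flink implementation: the in-memory queue and dictionaries are hypothetical stand-ins for the message queue, the cache component and the persistence component, the click-count feature and the weighted scoring rule are assumed placeholders for the calculation model and the prediction model, and all function and key names are invented for illustration.

```python
from collections import Counter, deque

# Hypothetical in-memory stand-ins (assumptions, not the real components).
message_queue = deque()
cache_component = {}        # stand-in for the cache component
persistence_component = {}  # stand-in for the persistence component


def write_both(key, value):
    """Write a value to both storage components."""
    cache_component[key] = value
    persistence_component[key] = value


def step_301(log_records):
    """Compute short-term features (here: action counts per category), store
    them, then write an event back to the queue for each affected user."""
    per_user = {}
    for record in log_records:
        per_user.setdefault(record["user"], Counter())[record["category"]] += 1
    for user, counts in per_user.items():
        write_both(("short", user), dict(counts))
        message_queue.append({"type": "features_ready", "user": user})


def step_302(event):
    """Read the long-term and short-term features of the user in the event."""
    user = event["user"]
    return (user,
            cache_component.get(("long", user), {}),
            cache_component.get(("short", user), {}))


def step_303(long_term, short_term):
    """Toy weighted scoring standing in for the prediction model."""
    scores = Counter()
    for category, value in long_term.items():
        scores[category] += 0.3 * value
    for category, value in short_term.items():
        scores[category] += 0.7 * value
    return [category for category, _ in scores.most_common(2)]


def step_304(user, prediction):
    """Store the prediction, then emit the event for downstream systems."""
    write_both(("prediction", user), prediction)
    message_queue.append({"type": "prediction_ready", "user": user})


# Long-term features are assumed to have been synchronized in advance.
write_both(("long", "u1"), {"books": 5, "shoes": 1})
step_301([
    {"user": "u1", "category": "shoes"},
    {"user": "u1", "category": "shoes"},
    {"user": "u1", "category": "toys"},
])
user, long_term, short_term = step_302(message_queue.popleft())
result = step_303(long_term, short_term)
step_304(user, result)
```

In this sketch the recent activity on "shoes" outweighs the older preference for "books", which mirrors the reason the method combines both feature types.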
[0109] This embodiment of the present application provides a Flink stream
processing engine method for real-time recommendation. The distributed stream
engine Flink itself provides precise message reliability and the flexibility
to scale capacity up and down, which makes it possible to guarantee the
message reliability and elastic scaling of the system, and to reduce resource
consumption in normal times while adding resources during large-scale
promotional activities to handle peak traffic. The response duration of an
offline algorithm is usually on the order of hours, and its operation time
usually cannot be guaranteed, whereas the response time of the current engine
is on the order of milliseconds, whereby real-time recommendation and feature
engineering are made possible, and response duration is thereby effectively
reduced.
[0110] In one embodiment, prior to the aforementioned step 301 of performing
short-term feature
calculation on a log in a message queue according to a calculation model file,
the method
can further comprise:
[0111] obtaining a configuration file that contains a configuration parameter;
and
[0112] loading from a model repository a model file well trained in advance to
which the
configuration parameter corresponds, and sending the model file to various
distributed
nodes of the Flink stream processing engine, wherein the model file includes
the
calculation model file and the prediction model file.
[0113] The step of obtaining the configuration file can include:
[0114] synchronizing at least one configuration file from a distributed
configuration cluster, and
loading the at least one configuration file into a memory.
[0115] Specifically, the configuration synchronizing module in the Flink
stream processing
engine can synchronize differently typed configuration files from the
distributed
configuration cluster, and send the configuration file to the algorithm
calculating module
located at the various distributed nodes, so that the algorithm calculating
module loads
the configuration file to the memories of the distributed nodes.
[0116] The step of loading from a model repository a model file to which the
configuration
parameter corresponds can include:
[0117] reading a preconfigured model identification parameter from the
configuration parameter,
and loading from the model repository a model file to which the model
identification
parameter corresponds, wherein the model identification parameter includes a
namespace,
a model name and a model version.
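How a model identification parameter might be turned into a repository location can be sketched as follows. The configuration keys, the directory layout (root/namespace/name/version/file) and the class name are assumptions made for illustration; the application only states that the parameter includes a namespace, a model name and a model version.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelCoordinates:
    """Model identification parameter: namespace, model name, model version."""
    namespace: str
    name: str
    version: str

    def repository_path(self, root: str, extension: str) -> str:
        # Hypothetical layout; a real repository may organize files differently.
        return str(PurePosixPath(root) / self.namespace / self.name
                   / self.version / f"{self.name}.{extension}")


def coordinates_from_config(config: dict) -> ModelCoordinates:
    """Read the preconfigured model identification parameter from a config."""
    model = config["model"]
    return ModelCoordinates(model["namespace"], model["name"], model["version"])


# Illustrative configuration parameter (keys and values are assumed).
config = {"model": {"namespace": "recsys", "name": "preference", "version": "1.0.0"}}
coords = coordinates_from_config(config)
path = coords.repository_path("/models", "pmml")
```

Keeping the three fields separate, rather than storing a ready-made path, lets a new model version be selected by changing a single configuration value.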
[0118] In a concrete example as shown in Fig. 4, the aforementioned process of
loading the
model file can include the following steps:
[0119] a) initializing the model repository client end at the application
submission phase of the
Flink stream processing engine;
[0120] b) loading the model file and the relevant script configuration file to
the equipment in
which the Flink stream processing engine resides;
[0121] c) employing distributed file cache by the equipment to submit the
model and the relevant
file to the various distributed nodes;
[0122] d) loading the model repository client end during task loading at the
various nodes,
obtaining the model required to be used according to the configuration and
loading the
model to the memory, and generating corresponding UDF; and
[0123] e) employing the model file in the memory during prediction to perform
prediction.
[0124] In this embodiment, after the model file trained offline has been
submitted to the model
repository, it suffices to automatically load the model to the stream
processing engine and
operate the model through the model identification parameter of the model file
required
to be used in the configuration parameter, so that the difficulty in deploying
the algorithm
model and the difficulty in completing real-time response are avoided, one-
stop automatic
start and deployment of the model is realized, massive deployment operations
are
dispensed with, and error and time cost are reduced.
[0125] In one embodiment, the aforementioned step 301 of performing short-term
feature
calculation on a log in a message queue according to a calculation model file
can include:
[0126] performing short-term feature calculation on the log according to the
calculation model
file through the various distributed nodes.
[0127] The aforementioned step 303 of performing real-time prediction
according to a prediction
model file, the long-term feature and the short-term feature of the to-be-
recommended
user, to obtain a preference prediction result of the to-be-recommended user
can include:
[0128] generating a prediction model according to the prediction model file
through the various
distributed nodes; and
[0129] inputting the long-term feature and the short-term feature of the to-be-
recommended user
to the prediction model for real-time prediction, to obtain the preference
prediction result
of the to-be-recommended user.
[0130] In the present application, the characteristic of distributed message
response of Flink itself
is utilized, whereby its parallel capability is maximally utilized to enhance
the computing
capability and to shorten prediction time at the same time.
[0131] In one embodiment, the distributed storage layer includes a cache
component and a
persistence component, and the aforementioned step 302 of reading a long-term
feature
and a short-term feature of a to-be-recommended user from the distributed
storage layer,
when an event is received from the message queue can include:
[0132] reading a long-term feature and a short-term feature of the
to-be-recommended user from a local cache through the various distributed
nodes; reading a long-term feature and a short-term feature of the
to-be-recommended user from the cache component, if reading from the local
cache failed; and reading a long-term feature and a short-term feature of the
to-be-recommended user from the persistence component, if reading from the
cache component failed, and asynchronously writing the read features in the
cache component.
[0133] Specifically, different features in the NOSQL storage layer are read by
each distributed node through the feature enquiring module (Queryer) and based
on the configuration parameter obtained in advance.
[0134] More specifically, with reference to the flowchart illustrating reading
of three-level cache features in Fig. 5, the feature enquiring module first
reads from a local primary cache and returns directly on a hit. On a miss, it
reads from a distributed secondary cache and returns from the secondary cache
on a hit. On a further miss, it reads from a persistence layer and employs a
data synchronizer to asynchronously write the result back to the secondary
cache for use next time.
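The read path of Fig. 5 can be sketched as below. The three dictionaries are hypothetical stand-ins for the local cache, the distributed cache and the persistence layer, and the queue plus worker thread play the role of the data synchronizer. The sketch does not populate the local cache on a lower-level hit, since the description does not say whether it should.

```python
import queue
import threading

# Hypothetical stand-ins for the three levels (assumptions for illustration).
local_cache = {}                   # primary: memory of this node
distributed_cache = {}             # secondary: cache component
persistence = {"u1": {"age": 30}}  # tertiary: persistence component

write_back_queue = queue.Queue()


def write_back_worker():
    """Data synchronizer: consume queued entries and write them to level 2."""
    while True:
        item = write_back_queue.get()
        if item is None:  # sentinel value used to stop the worker
            write_back_queue.task_done()
            break
        key, value = item
        distributed_cache[key] = value
        write_back_queue.task_done()


def read_features(key):
    """Return (value, level) using the primary -> secondary -> tertiary order."""
    if key in local_cache:
        return local_cache[key], "local"
    if key in distributed_cache:
        return distributed_cache[key], "cache"
    if key in persistence:
        value = persistence[key]
        write_back_queue.put((key, value))  # asynchronous write-back
        return value, "persistence"
    return None, "miss"


worker = threading.Thread(target=write_back_worker, daemon=True)
worker.start()

first = read_features("u1")   # falls through to the persistence layer
write_back_queue.join()       # wait here only to make the demo deterministic
second = read_features("u1")  # now served by the secondary cache

write_back_queue.put(None)
worker.join()
```

The caller that misses both caches pays only for the persistence read; the write to the secondary cache happens on the worker thread.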
[0135] The primary cache is the host memory; it retains only related data, is
relatively small in capacity, has a relatively short expiration time (the
expiration time should be configured according to the business), and utilizes
the LRU policy. The secondary cache is a distributed cache; it stores partial
data, is relatively large in capacity, has a relatively long expiration time
(the expiration time should be configured according to the business), and
likewise utilizes the LRU policy, with the expiration time randomly increased
or decreased within a certain range in order to avoid the avalanche effect.
The tertiary level is the persistence layer; it retains the total data, can be
set to expire after one year or longer, and its data can be moved to an
offline data warehouse if it should be retained for a longer time. The data
synchronizer possesses a queue, and invoking the cache method merely places
the data in the queue; a thread can be separately initiated, and data in the
queue is then consumed by the thread to enter the distributed cache.
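The combination of an LRU policy with a randomly adjusted expiration time can be sketched as follows. The class name, the 10 percent spread and the injected clock are assumptions chosen for illustration; the description only requires that the expiration time vary randomly within a certain range so that many entries do not expire at the same moment.

```python
import random
from collections import OrderedDict


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Return the base TTL randomly raised or lowered by up to `spread`."""
    return base_seconds * (1.0 + rng.uniform(-spread, spread))


class ExpiringLRUCache:
    """Small LRU cache whose entries expire after a jittered TTL."""

    def __init__(self, capacity, base_ttl, clock, rng=random):
        self._capacity = capacity
        self._base_ttl = base_ttl
        self._clock = clock          # callable returning the current time
        self._rng = rng
        self._items = OrderedDict()  # key -> (value, expiry time)

    def put(self, key, value):
        expires = self._clock() + jittered_ttl(self._base_ttl, rng=self._rng)
        self._items[key] = (value, expires)
        self._items.move_to_end(key)
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._items[key]             # expired entry counts as a miss
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return value


now = [0.0]  # fake clock so the example does not depend on wall time
cache = ExpiringLRUCache(capacity=2, base_ttl=60.0,
                         clock=lambda: now[0], rng=random.Random(0))
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used entry
cache.put("c", 3)   # capacity exceeded: "b" is evicted
```

With a base of 60 seconds and a 10 percent spread, each entry expires somewhere between 54 and 66 seconds after it is written.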
[0136] In this embodiment, through the guarantee by the three-level caching
mechanism, loss of
feature data can be greatly reduced, or the possibility of erring in the
recommendation
result caused by error in the process of loading the feature data can be
reduced, so that
the reading performance and reliability of the feature data can be ensured at
the same
time, whereby robustness of the entire system is enhanced.
[0137] In one embodiment, the calculation model file is an SQL model file, and
the prediction
model file is a PMML model file.
[0138] In this embodiment, the stream processing engine is realized on the
basis of the Flink and PMML technologies, which enable algorithm personnel
without Java language capability to process features in real time and to
perform model prediction in real time merely by using streaming SQL and PMML
model files in a general model repository. This solves the problem that
development in the JAVA or PYTHON language involves too many technical
details, so that development is slow and it is difficult
to timely respond to business requirements.
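What an SQL calculation model might express can be sketched with a plain aggregation query. The example below runs the query with Python's built-in sqlite3 module purely as a stand-in for Flink streaming SQL; the table name, column names, the click-count feature and the integer timestamps are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical content of an SQL model file: clicks per user and category
# inside a time window. Table and column names are assumed.
FEATURE_SQL = """
SELECT user_id, category, COUNT(*) AS click_count
FROM behavior_log
WHERE action = 'click' AND event_time >= :window_start
GROUP BY user_id, category
ORDER BY user_id, category
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE behavior_log "
    "(user_id TEXT, category TEXT, action TEXT, event_time INTEGER)"
)
conn.executemany(
    "INSERT INTO behavior_log VALUES (?, ?, ?, ?)",
    [
        ("u1", "shoes", "click", 100),
        ("u1", "shoes", "click", 160),
        ("u1", "toys", "favorite", 170),  # not a click, ignored by the query
        ("u2", "books", "click", 50),     # outside the window, ignored
        ("u2", "books", "click", 180),
    ],
)

# Each row is one short-term feature: (user_id, category, click_count).
rows = conn.execute(FEATURE_SQL, {"window_start": 90}).fetchall()
conn.close()
```

Because the feature logic lives entirely in the SQL text, changing the feature means editing the model file rather than the engine code.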
[0139] Fig. 6 is a view illustrating a cluster deployment framework provided
by the present
application, the cluster deployment framework is employed to realize the Flink
stream
processing engine method for real-time recommendation in the aforementioned
embodiments, specifically, the cluster deployment framework can be deployed by
the
mode described below:
[0140] 1. preparation of a Zookeeper cluster: the specific Zookeeper cluster
deployment mode is already mature, and the cluster can be deployed or
self-tuned according to the mode recommended in Fig. 6. However, as should be
noted, one set can be used for the Zookeeper relevant to Hadoop, while another
set should be used for Kafka.
[0141] 2. preparation of a Hadoop cluster: the specific Hadoop cluster
deployment mode is already mature and is not repeated here; it is suggested
that physical machines rather than virtual machines be used, and the HBase and
Flink components are also required to be installed on the cluster.
[0142] 3. preparation of a Redis cluster: the suggested deployment mode is
three sentinels plus N*2 servers, where N is the number of master nodes and
should be determined according to business scale; the number is N*2 because
each master is equipped with one standby. The specific Redis cluster
deployment method is already mature and is not repeated here, but it is
suggested that the cluster be deployed close to the calculation nodes,
preferably within a single route, to ensure lower network time delay.
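The sizing rule above is simple arithmetic and can be written out as a small helper. The function name and the returned field names are hypothetical, and the sketch does not assume whether the sentinels share machines with the servers.

```python
def redis_deployment_size(masters: int, sentinels: int = 3) -> dict:
    """Apply the 'three sentinels plus N*2 servers' rule for N master nodes."""
    if masters < 1:
        raise ValueError("at least one master node is required")
    standbys = masters  # one standby per master
    return {
        "masters": masters,
        "standbys": standbys,
        "servers": masters + standbys,  # the N*2 term
        "sentinels": sentinels,
    }


# Example: a business scale that calls for four master nodes.
plan = redis_deployment_size(masters=4)
```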
[0143] 4. preparation of a Kafka cluster: the specific Kafka cluster
deployment mode is already mature and is not repeated here; reference can be
made to the official website.
[0144] 5. preparation of a scheduler: Hadoop and Flink components are
installed for submission
of engine application.
[0145] 6. The Zookeeper, HDFS, YARN, Kafka and Redis clusters are started.
[0146] 7. A model repository is newly created on HDFS, and well trained models
are released in
the repository.
[0147] 8. The Flink submission script is used and command line parameters are
designated on the scheduler, wherein the coordinates of the models are
indispensable parameters, while the remaining parameters are optional.
[0148] In one embodiment, there is provided a Flink stream processing engine
device for real-
time recommendation, which device can comprise:
[0149] an algorithm calculating module, for performing short-term feature
calculation on a log
in a message queue according to a calculation model file;
[0150] a data pushing module, for storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance; and
[0151] a feature enquiring module, for reading a long-term feature and a short-
term feature of a
to-be-recommended user from the distributed storage layer, when an event is
received
from the message queue;
[0152] the algorithm calculating module is further employed for performing
real-time prediction
according to a prediction model file, the long-term feature and the short-term
feature of
the to-be-recommended user, to obtain a preference prediction result of the to-
be-
recommended user; and
[0153] the data pushing module is further employed for writing the preference
prediction result
of the to-be-recommended user in the distributed storage layer.
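By way of non-limiting illustration, the cooperation of the modules described above can be sketched as follows. The in-memory `InMemoryStorage` class, the click-count feature and the `predict` callable are hypothetical stand-ins for the distributed storage layer, the calculation model file and the prediction model file respectively.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Features = Dict[str, float]


class InMemoryStorage:
    """Hypothetical stand-in for the distributed storage layer."""

    def __init__(self, long_term: Dict[str, Features]) -> None:
        self.long_term = long_term                 # obtained in advance
        self.short_term: Dict[str, Features] = {}
        self.predictions: Dict[str, float] = {}


def compute_short_term(logs: List[dict]) -> Dict[str, Features]:
    """Algorithm calculating module: here, count click entries per user."""
    counts: Dict[str, float] = defaultdict(float)
    for entry in logs:
        if entry["action"] == "click":
            counts[entry["user"]] += 1.0
    return {user: {"recent_clicks": n} for user, n in counts.items()}


def push_short_term(storage: InMemoryStorage,
                    features: Dict[str, Features]) -> None:
    """Data pushing module: store the calculated short-term features."""
    for user, values in features.items():
        storage.short_term.setdefault(user, {}).update(values)


def handle_event(user: str, storage: InMemoryStorage,
                 predict: Callable[[Features], float]) -> float:
    """Run on each event: read features, predict, write the result back."""
    # Feature enquiring module: read long-term and short-term features.
    features = {**storage.long_term.get(user, {}),
                **storage.short_term.get(user, {})}
    # Algorithm calculating module: real-time prediction.
    score = predict(features)
    # Data pushing module: write the preference prediction result.
    storage.predictions[user] = score
    return score
```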
[0154] In one embodiment, the device further comprises:
[0155] a configuration synchronizing module, for obtaining a configuration
file that contains a
configuration parameter; and
[0156] a model loading module, for loading from a model repository a pre-trained model
file to which the configuration parameter corresponds, and sending the model file to
the various distributed nodes of the Flink stream processing engine, wherein the model
file includes the calculation model file and the prediction model file.
[0157] In one embodiment, the configuration synchronizing module is
specifically employed for:
[0158] synchronizing at least one configuration file from a distributed
configuration cluster, and
loading the at least one configuration file into a memory.
[0159] In one embodiment, the model loading module is specifically employed
for:
[0160] reading a preconfigured model identification parameter from the
configuration parameter,
wherein the model identification parameter includes a namespace, a model name
and a
model version; and
[0161] loading from the model repository a model file to which the model
identification
parameter corresponds.
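By way of non-limiting illustration, resolving a model identification parameter to a location in the model repository can be sketched as follows. The configuration key names and the directory layout `<root>/<namespace>/<name>/<version>` are assumptions made for the example only; the actual repository layout is not specified here.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict


@dataclass(frozen=True)
class ModelId:
    """Model identification parameter: namespace, model name, model version."""
    namespace: str
    name: str
    version: str


def model_id_from_config(config: Dict[str, object]) -> ModelId:
    """Read the identification parameter from an in-memory configuration."""
    # Key names are hypothetical.
    return ModelId(
        namespace=str(config["model.namespace"]),
        name=str(config["model.name"]),
        version=str(config["model.version"]),
    )


def repository_path(root: str, model_id: ModelId) -> PurePosixPath:
    """Map the identification parameter to an assumed repository directory."""
    return PurePosixPath(root) / model_id.namespace / model_id.name / model_id.version
```

In this sketch `root` is a plain path such as `/models`, since `PurePosixPath` is not intended to hold URL-style prefixes.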
[0162] In one embodiment, the algorithm calculating module is specifically
employed for:
[0163] performing short-term feature calculation on the log according to the
calculation model
file through the various distributed nodes;
[0164] the algorithm calculating module is specifically further employed for:
[0165] generating a prediction model according to the prediction model file
through the various
distributed nodes; and
[0166] inputting the long-term feature and the short-term feature of the to-be-
recommended user
to the prediction model for real-time prediction, to obtain the preference
prediction result
of the to-be-recommended user.
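By way of non-limiting illustration, generating a prediction model from a model file and applying it to the combined features can be sketched as follows. For brevity the example reads a small JSON description of a logistic model rather than a real prediction model file; the JSON layout (`intercept`, `weights`) is hypothetical.

```python
import json
import math
from typing import Callable, Dict

Features = Dict[str, float]


def build_model(model_file_text: str) -> Callable[[Features], float]:
    """Generate a prediction function from the text of a model file."""
    spec = json.loads(model_file_text)
    intercept = float(spec["intercept"])
    weights = {name: float(w) for name, w in spec["weights"].items()}

    def predict(features: Features) -> float:
        # Missing features contribute nothing to the linear term.
        z = intercept + sum(w * features.get(name, 0.0)
                            for name, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))  # logistic link

    return predict


def predict_preference(model: Callable[[Features], float],
                       long_term: Features, short_term: Features) -> float:
    """Input both feature sets of the user to the prediction model."""
    return model({**long_term, **short_term})
```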
[0167] In one embodiment, the distributed storage layer includes a cache
component and a
persistence component, and the feature enquiring module is specifically
employed for:
[0168] reading a long-term feature and a short-term feature of the to-be-
recommended user from
a local cache through the various distributed nodes;
[0169] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the cache component, if reading from the local cache failed; and
[0170] reading a long-term feature and a short-term feature of the to-be-
recommended user from
the persistence component, if reading from the cache component failed, and
asynchronously writing the read features in the cache component.
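By way of non-limiting illustration, the tiered read described above (local cache, then cache component, then persistence component with an asynchronous write-back) can be sketched as follows. Plain dictionaries stand in for the cache component and the persistence component, and a single-worker thread pool stands in for whatever asynchronous mechanism is actually used; all names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional

Features = Dict[str, float]


class TieredFeatureReader:
    """Reads features from the local cache, cache component, then persistence."""

    def __init__(self, cache: Dict[str, Features],
                 persistence: Dict[str, Features]) -> None:
        self.local: Dict[str, Features] = {}  # per-node local cache
        self.cache = cache                    # stand-in for the cache component
        self.persistence = persistence        # stand-in for the persistence component
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.pending: Optional[Future] = None  # last asynchronous write-back

    def read(self, user: str) -> Optional[Features]:
        if user in self.local:
            return self.local[user]
        if user in self.cache:
            return self.cache[user]
        features = self.persistence.get(user)
        if features is not None:
            # Write the read features to the cache component without
            # blocking the caller.
            self.pending = self._pool.submit(
                self.cache.__setitem__, user, features)
        return features

    def close(self) -> None:
        """Wait for outstanding write-backs and release the worker thread."""
        self._pool.shutdown(wait=True)
```

A caller that needs to observe the write-back (for example in a test) can wait on `reader.pending.result()`; ordinary callers simply use the returned features.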
[0171] In one embodiment, the calculation model file is an SQL model file, and
the prediction
model file is a PMML model file.
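By way of non-limiting illustration, a short-term feature calculation expressed as SQL can be sketched as follows. The example uses Python's built-in `sqlite3` module purely as a stand-in for the SQL capability of the stream processing engine; the table name `logs`, its columns and the query text are hypothetical and do not reproduce any actual SQL model file.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Hypothetical content of an SQL calculation model file.
SQL_MODEL = """
SELECT user_id, COUNT(*) AS clicks
FROM logs
WHERE action = 'click'
GROUP BY user_id
ORDER BY user_id
"""


def short_term_features(
        rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Apply the SQL model to (user_id, action) log rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (user_id TEXT, action TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
        return {user: {"clicks": n} for user, n in conn.execute(SQL_MODEL)}
    finally:
        conn.close()
```

Because the calculation is held in the query text rather than in program code, changing the feature definition in this sketch only requires replacing `SQL_MODEL`.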
[0172] Specific definitions relevant to the user recommending device based on stream
processing may be inferred from the aforementioned definitions of the Flink stream
processing engine method for real-time recommendation, and are not repeated here.
The various modules in the aforementioned user recommending device based on stream
processing can be wholly or partly realized via software, hardware, or a combination of
software and hardware. The various modules can be embedded in the form of hardware in a
processor in a computer equipment or independent of any computer equipment, and can also
be stored in the form of software in a memory in a computer equipment, so as to facilitate
the processor to invoke and perform operations corresponding to the aforementioned
various modules.
[0173] In one embodiment, a computer equipment is provided; the computer equipment can be
a server, and its internal structure can be as shown in Fig. 7. The computer equipment
comprises a processor, a memory, and a network interface connected to each other via a
system bus. The processor of the computer equipment is employed to provide computing
and controlling capabilities. The memory of the computer equipment includes a
nonvolatile storage medium and an internal memory. The nonvolatile storage medium
stores therein an operating system and a computer program. The internal memory
provides an environment for the running of the operating system and the computer
program in the nonvolatile storage medium. The network interface of the computer
equipment is employed to connect to other equipment via a network for communication.
The computer program realizes a Flink stream processing engine method for real-time
recommendation when it is executed by the processor.
[0174] As understandable to persons skilled in the art, the structure illustrated in Fig. 7
is merely a block diagram of the partial structure relevant to the solution of the present
application, and does not constitute any restriction to the computer equipment on which
the solution of the present application is applied, as a specific computer equipment may
comprise more or fewer component parts than those illustrated in Fig. 7, or may combine
certain component parts, or may have a different layout of component parts.
[0175] In one embodiment, there is provided a computer equipment that
comprises a memory, a
processor and a computer program stored on the memory and operable on the
processor,
and the following steps are realized when the processor executes the computer
program:
[0176] performing short-term feature calculation on a log in a message queue
according to a
calculation model file, and storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
[0177] reading a long-term feature and a short-term feature of a to-be-
recommended user from
the distributed storage layer, when an event is received from the message
queue;
[0178] performing real-time prediction according to a prediction model file,
the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
[0179] writing the preference prediction result of the to-be-recommended user
in the distributed
storage layer.
[0180] In one embodiment, there is provided a computer-readable storage medium
storing
thereon a computer program, and the following steps are realized when the
computer
program is executed by a processor:
[0181] performing short-term feature calculation on a log in a message queue
according to a
calculation model file, and storing short-term features obtained by the
calculation in a
distributed storage layer, wherein the distributed storage layer stores
therein long-term
features obtained in advance;
[0182] reading a long-term feature and a short-term feature of a to-be-
recommended user from
the distributed storage layer, when an event is received from the message
queue;
[0183] performing real-time prediction according to a prediction model file,
the long-term
feature and the short-term feature of the to-be-recommended user, to obtain a
preference
prediction result of the to-be-recommended user; and
[0184] writing the preference prediction result of the to-be-recommended user
in the distributed
storage layer.
[0185] As comprehensible to persons ordinarily skilled in the art, all or part of the flows
in the methods according to the aforementioned embodiments can be completed via a
computer program instructing relevant hardware. The computer program can be stored in
a nonvolatile computer-readable storage medium, and, when executed, the computer program
can include the flows as embodied in the aforementioned various methods. Any
reference to the memory, storage, database or other media used in the various
embodiments provided by the present application can include nonvolatile and/or
volatile memory. The nonvolatile memory can include a read-only memory
(ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM),
an electrically erasable programmable ROM (EEPROM) or a flash memory. The
volatile memory can include a random access memory (RAM) or an external cache
memory. By way of explanation rather than restriction, RAM is available in many
forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM
(SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM),
Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic
RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0186] Technical features of the aforementioned embodiments may be combined arbitrarily.
For the sake of brevity, not all possible combinations of the technical features in the
aforementioned embodiments are described, but all such combinations should be considered
to fall within the scope recorded in the Description as long as they are not mutually
contradictory.
[0187] The foregoing embodiments are merely directed to several modes of execution of the
present application, and their descriptions are relatively specific and detailed, but they
should not therefore be construed as restrictions on the patent scope of the invention. It
should be pointed out that persons with ordinary skill in the art may further make various
modifications and improvements without departing from the conception of the present
application, and all these should pertain to the protection scope of the present
application. Accordingly, the patent protection scope of the present application shall be
based on the attached Claims.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

2024-08-01: As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refer to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Event History, Maintenance Fee and Payment History should be consulted.

Event History

Description Date
Inactive: Grant downloaded 2024-03-08
Inactive: Grant downloaded 2024-03-08
Grant by Issuance 2024-02-27
Letter Sent 2024-02-27
Inactive: Cover page published 2024-02-26
Pre-grant 2024-01-16
Inactive: Final fee received 2024-01-16
Letter Sent 2024-01-15
Notice of Allowance is Issued 2024-01-15
Inactive: Approved for allowance (AFA) 2024-01-10
Inactive: QS passed 2024-01-10
Amendment Received - Response to Examiner's Requisition 2023-12-05
Amendment Received - Voluntary Amendment 2023-12-05
Examiner's Report 2023-09-21
Inactive: Report - No QC 2023-09-19
Amendment Received - Voluntary Amendment 2023-08-02
Amendment Received - Response to Examiner's Requisition 2023-08-02
Examiner's Report 2023-07-25
Inactive: Report - No QC 2023-07-24
Letter sent 2023-06-27
Advanced Examination Determined Compliant - paragraph 84(1)(a) of the Patent Rules 2023-06-27
Amendment Received - Voluntary Amendment 2023-05-30
Inactive: Advanced examination (SO) 2023-05-30
Amendment Received - Voluntary Amendment 2023-05-30
Inactive: Advanced examination (SO) fee processed 2023-05-30
Letter Sent 2023-02-03
Inactive: <RFE date> RFE removed 2023-02-01
Inactive: Office letter 2022-12-20
Inactive: IPC assigned 2022-11-07
Inactive: First IPC assigned 2022-11-07
All Requirements for Examination Determined Compliant 2022-09-16
Request for Examination Requirements Determined Compliant 2022-09-16
Request for Examination Received 2022-09-16
Application Published (Open to Public Inspection) 2022-08-25
Inactive: Reply received: Priority translation request 2022-03-22
Letter sent 2022-03-15
Filing Requirements Determined Compliant 2022-03-15
Priority Claim Requirements Determined Compliant 2022-03-14
Letter Sent 2022-03-14
Request for Priority Received 2022-03-14
Inactive: QC images - Scanning 2022-02-25
Application Received - Regular National 2022-02-25
Inactive: Office letter 2022-02-08

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2023-12-15

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type                                 Anniversary Year  Due Date    Paid Date
Application fee - standard                                 2022-02-25  2022-02-25
Request for examination - standard                         2026-02-25  2022-09-16
Advanced Examination                                       2023-05-30  2023-05-30
MF (application, 2nd anniv.) - standard  02                2024-02-26  2023-12-15
Final fee - standard                                       2022-02-25  2024-01-16
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
10353744 CANADA LTD.
Past Owners on Record
RUI ZHOU
XIAOMING HE
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

List of published and non-published patent-specific documents on the CPD.

Document Description  Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Cover Page 2024-01-30 1 59
Representative drawing 2024-01-30 1 31
Claims 2023-05-30 29 1,513
Claims 2023-08-02 29 1,512
Claims 2023-12-05 29 1,512
Cover Page 2022-11-12 1 59
Description 2022-02-25 20 1,570
Description 2022-03-22 28 1,261
Claims 2022-03-22 4 134
Drawings 2022-03-22 5 379
Abstract 2022-03-22 1 25
Representative drawing 2022-11-12 1 29
Final fee 2024-01-16 3 67
Electronic Grant Certificate 2024-02-27 1 2,527
Courtesy - Filing certificate 2022-03-15 1 578
Courtesy - Acknowledgement of Request for Examination 2023-02-03 1 423
Commissioner's Notice - Application Found Allowable 2024-01-15 1 580
Advanced examination (SO) / Amendment / response to report 2023-05-30 35 1,277
Courtesy - Advanced Examination Request - Compliant (SO) 2023-06-27 1 184
Examiner requisition 2023-07-25 3 163
Amendment / response to report 2023-08-02 64 2,388
Examiner requisition 2023-09-21 3 144
Amendment / response to report 2023-12-05 64 2,339
New application 2022-02-25 6 220
Commissioner’s Notice - Translation Required 2022-03-14 2 94
Translation Received 2022-03-22 42 1,930
Request for examination 2022-09-16 6 209
Prosecution correspondence 2022-10-23 4 134