CA 03154025 2022-03-10
INTERACTIVE BEHAVIOR RECOGNIZING METHOD, DEVICE, COMPUTER
EQUIPMENT AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to an interactive behavior recognizing
method, and
corresponding device, computer equipment and storage medium.
Description of Related Art
[0002] With the development of science and technology, unmanned retail technology has
attracted growing attention from large-scale retailers. This technology combines smart
recognition techniques such as sensors, image analysis, and computer vision to achieve
unmanned checkout. Using image recognition to sense the relative positions of people and
shelves, and the movements of commodities on the shelves, in order to recognize
human-goods interactive behavior is an important precondition for correctly settling
customers' purchases.
[0003] However, currently available human-goods interactive behavior recognizing methods
usually rely on templates and rule matching. Defining the templates and stipulating the
rules requires a great deal of manpower, and such methods are often applicable only to
the recognition of conventional human body postures. Such recognition is low in
precision, weak in transferability, and applicable merely to human-goods interactive
behaviors under specific scenarios.
Date Recue/Date Received 2022-03-10
SUMMARY OF THE INVENTION
[0004] In view of the aforementioned technical problems, there is an urgent
need to provide an
interactive behavior recognizing method, and corresponding device, computer
equipment
and storage medium having higher recognition precision and better
transferability.
[0005] There is provided an interactive behavior recognizing method that
comprises:
[0006] obtaining an image to be detected;
[0007] performing human body posture detection on the image to be detected
through a preset
detection model, and obtaining human body posture information and hand
position
information, wherein the detection model is employed to perform human body
posture
detection;
[0008] tracing the human body posture according to the human body posture
information to
obtain human body motion track information, and performing a target tracing on
the hand
position according to the hand position information to obtain a hand area
image;
[0009] performing article recognition on the hand area image through a preset
classification
recognition model, and obtaining an article recognition result, wherein the
classification
recognition model is employed to perform article recognition; and
[0010] obtaining a first interactive behavior recognition result according to
the human body
motion track information and the article recognition result.
[0011] In one of the embodiments, the step of performing human body posture
detection on the
image to be detected through a preset detection model, and obtaining human
body posture
information and hand position information includes:
[0012] preprocessing the image to be detected, and obtaining a human body
image in the image
to be detected; and
[0013] performing human body posture detection on the human body image through
the preset
detection model, and obtaining the human body posture information and the hand
position
information.
[0014] In one of the embodiments, the method further comprises:
[0015] obtaining human body position information according to the image to be
detected; and
[0016] obtaining a second interactive behavior recognition result according to
the human body
motion track information, the article recognition result, the human body
position
information and a preset shelf information, wherein the second interactive
behavior
recognition result is a human-goods interactive behavior recognition result.
[0017] In one of the embodiments, the step of obtaining an image to be
detected includes:
[0018] obtaining the image to be detected as collected by an image collection
device at a preset
first shooting angle; wherein
[0019] preferably, the preset first shooting angle is an overhead angle
perpendicular to the ground,
and the image to be detected is of RGBD data.
[0020] In one of the embodiments, the method further comprises:
[0021] obtaining sample image data;
[0022] marking key points and hand position of a human body image in the
sample image data,
and obtaining a first marked image data;
[0023] performing an image enhancing process on the first marked image data,
and obtaining a
first training dataset; and
[0024] inputting the first training dataset to an HRNet model for training,
and obtaining the
detection model.
[0025] In one of the embodiments, the method further comprises:
[0026] marking a hand area in the sample image data and performing article
category marking
on an article located in the hand area, and obtaining a second marked image
data;
[0027] performing an image enhancing process on the second marked image data,
and obtaining
a second training dataset; and
[0028] inputting the second training dataset to a convolutional neural network
for training, and
obtaining the preset classification recognition model, wherein the convolutional neural
network is a yolov3-tiny network or a vgg16 network.
[0029] In one of the embodiments, the step of obtaining sample image data
includes:
[0030] obtaining an image data collected by the image collection device at a
preset second
shooting angle within a preset time frame; and
[0031] screening from the collected image data to obtain sample image data
containing human-
goods interactive behaviors, wherein, preferably, the preset second shooting
angle is an
overhead angle perpendicular to the ground, and the sample image data is of
RGBD data.
[0032] There is provided an interactive behavior recognizing device that
comprises:
[0033] a first obtaining module, for obtaining an image to be detected;
[0034] a first detecting module, for performing human body posture detection
on the image to
be detected through a preset detection model, and obtaining human body posture
information and hand position information, wherein the detection model is
employed to
perform human body posture detection;
[0035] a tracing module, for tracing the human body posture according to the
human body
posture information to obtain human body motion track information, and
performing a
target tracing on the hand position according to the hand position information
to obtain a
hand area image;
[0036] a second detecting module, for performing article recognition on the
hand area image
through a preset classification recognition model, and obtaining an article
recognition
result, wherein the classification recognition model is employed to perform
article
recognition; and
[0037] a first interactive behavior recognizing module, for obtaining a first
interactive behavior
recognition result according to the human body motion track information and
the article
recognition result.
[0038] There is provided a computer equipment that comprises a memory, a
processor and a
computer program stored on the memory and operable on the processor, and the
following
steps are realized when the processor executes the computer program:
[0039] obtaining an image to be detected;
[0040] performing human body posture detection on the image to be detected
through a preset
detection model, and obtaining human body posture information and hand
position
information, wherein the detection model is employed to perform human body
posture
detection;
[0041] tracing the human body posture according to the human body posture
information to
obtain human body motion track information, and performing a target tracing on
the hand
position according to the hand position information to obtain a hand area
image;
[0042] performing article recognition on the hand area image through a preset
classification
recognition model, and obtaining an article recognition result, wherein the
classification
recognition model is employed to perform article recognition; and
[0043] obtaining a first interactive behavior recognition result according to
the human body
motion track information and the article recognition result.
[0044] There is provided a computer-readable storage medium storing a computer
program
thereon, and the following steps are realized when the computer program is
executed by
a processor:
[0045] obtaining an image to be detected;
[0046] performing human body posture detection on the image to be detected
through a preset
detection model, and obtaining human body posture information and hand
position
information, wherein the detection model is employed to perform human body
posture
detection;
[0047] tracing the human body posture according to the human body posture
information to
obtain human body motion track information, and performing a target tracing on
the hand
position according to the hand position information to obtain a hand area
image;
[0048] performing article recognition on the hand area image through a preset
classification
recognition model, and obtaining an article recognition result, wherein the
classification
recognition model is employed to perform article recognition; and
[0049] obtaining a first interactive behavior recognition result according to
the human body
motion track information and the article recognition result.
[0050] In the aforementioned interactive behavior recognizing method, and corresponding
device, computer equipment and storage medium, interactive behavior recognition is
performed on the image to be detected through the detection model and the classification
recognition model. Only a small amount of data needs to be collected, on the basis of the
existing models, for deployment in different stores, so that transferability is stronger
and deployment cost is lower. The detection model can also recognize interactive
behaviors flexibly and precisely, which enhances recognition precision.
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] Fig. 1 is a view illustrating the application environment for an
interactive behavior
recognizing method in an embodiment;
[0052] Fig. 2 is a flowchart schematically illustrating an interactive
behavior recognizing method
in an embodiment;
[0053] Fig. 3 is a flowchart schematically illustrating an interactive
behavior recognizing method
in another embodiment;
[0054] Fig. 4 is a flowchart schematically illustrating the detection model
training steps in an
embodiment;
[0055] Fig. 5 is a flowchart schematically illustrating the classification
recognition model
training steps in an embodiment;
[0056] Fig. 6 is a block diagram illustrating the structure of an interactive
behavior recognizing
device in an embodiment; and
[0057] Fig. 7 is a view illustrating the internal structure of a computer
equipment in an
embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0058] To make the objectives, technical solutions and advantages of the present
application clearer, the present application is described in greater detail below with
reference to the accompanying drawings and embodiments. As should be understood, the
specific embodiments described here are merely meant to explain the present application,
rather than to restrict it.
[0059] The interactive behavior recognizing method provided by the present application is
applicable to the application environment shown in Fig. 1, in which terminal 102
communicates with server 104 through a network. Terminal 102 can be, but is not limited
to, any of various image collection devices. Moreover, terminal 102 can employ one or
more depth cameras with shooting angles perpendicular to the ground, while server 104 can
be embodied as an independent server or a server cluster consisting of a plurality of
servers.
[0060] In one embodiment, as shown in Fig. 2, there is provided an interactive
behavior
recognizing method, and the method is explained with an example of its being
applied to
the server in Fig. 1, to comprise the following steps.
[0061] Step 202 - obtaining an image to be detected.
[0062] The image to be detected is an interactive behavior image between a
human being and an
object to be detected.
[0063] In one of the embodiments, step 202 includes: the server obtains the image to be
detected as collected by an image collection device at a preset first shooting angle.
Preferably, the preset first shooting angle is an overhead angle perpendicular or
approximately perpendicular to the ground, and the image to be detected is of RGBD data.
[0064] In other words, the image to be detected is RGBD data collected by an image
collection device under an overhead angle scenario. The image collection device can be
embodied as a depth camera disposed above a shelf. The first shooting angle need not be
exactly perpendicular to the ground; it can be any overhead angle as close to
perpendicular as the installation environment allows, so as to avoid blind spots as far
as possible.
[0065] The present technical solution uses a depth camera positioned at an overhead angle
to detect human-goods interactive behaviors. In comparison with the traditional camera
installation mode, in which the camera is installed at a certain included angle with
respect to the ground, the present technical solution effectively avoids the problem of
the human being and the shelf being occluded due to the oblique angle, and the problem of
the hand being more difficult to trace. In actual application, image collection at an
overhead angle makes it possible to better recognize the behaviors of different persons
picking up goods in turns.
[0066] Step 204 - performing human body posture detection on the image to be
detected through
a preset detection model, and obtaining human body posture information and
hand
position information, wherein the detection model is employed to perform human
body
posture detection.
[0067] The detection model is a human body posture detection model that can be
used to detect
key points of the human skeleton.
[0068] Specifically, the server inputs a human body image to the detection model, human
body posture detection is performed on the human body image in the detection model, and
the human body posture information and hand position information output by the detection
model are obtained. The human body posture detection can use a common skeleton line
detecting method. The human body posture information as obtained is an image of human
skeletal key points, and the hand position information is the specific position of a hand
in the image of human skeletal key points.
[0069] Step 206 - tracing the human body posture according to the human body
posture
information to obtain human body motion track information, and performing a
target
tracing on the hand position according to the hand position information to
obtain a hand
area image.
[0070] Specifically, a target tracing algorithm is employed, such as the
Camshift algorithm
adaptable to changes in size and shape of a moving target, to trace the motion
tracks of
the human body and the hand, respectively, to obtain human body motion track
information, and to enlarge the hand position in the tracing process to obtain
a hand area
image.
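The enlargement of the hand position into a hand area image can be illustrated with a minimal sketch. It assumes the tracker reports the hand as a single pixel coordinate and that the image is a row-major list of pixel rows; the function names and the `half_size` parameter are hypothetical and not taken from the present application.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def hand_area_box(hand_xy: Tuple[int, int],
                  image_size: Tuple[int, int],
                  half_size: int = 48) -> Box:
    """Enlarge a traced hand position into a square window, clamped to the image."""
    x, y = hand_xy
    width, height = image_size
    left = max(0, x - half_size)
    top = max(0, y - half_size)
    right = min(width, x + half_size)
    bottom = min(height, y + half_size)
    return left, top, right, bottom


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the hand area image out of a row-major image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]
```

Clamping keeps the window inside the frame when the hand is near an image border, so the cropped hand area may be smaller than the nominal window in that case.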
[0071] Step 208 - performing article recognition on the hand area image
through a preset
classification recognition model, and obtaining an article recognition result,
wherein the
classification recognition model is employed to perform article recognition.
[0072] The classification recognition model is an article recognition model
that can be trained
by deep learning.
[0073] Specifically, the hand area image is input to the classification
recognition model, and the
hand area image is detected in the classification recognition model to judge
whether an
article is held in the hand area. If an article is being held, the classification
recognition model recognizes the article and outputs an article recognition result.
Moreover, the classification recognition model can further perform skin color judgment on
the hand area image and promptly issue an early warning when the hand is intentionally
covered with articles such as clothes, so as to reduce goods loss.
[0074] Step 210 - obtaining a first interactive behavior recognition result
according to the human
body motion track information and the article recognition result.
[0075] The first interactive behavior recognition result is a human-article
interactive behavior
recognition result.
[0076] Specifically, the human body motion track information can be used to judge
behavioral actions of a human being, for example hand stretching, bending, stooping and
squatting, and to judge whether any article is held in the hand. When an article is held
in the hand, the article is recognized to obtain an article recognition result, whereby
it is possible to judge whether the human body is picking up or putting down the article,
namely to obtain a human-article interactive behavior recognition result through
analysis.
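One possible way to combine the motion track with the article recognition result is sketched below. The sketch assumes that each frame has already been reduced to a posture label and an optional article label, and it compares what the hand holds before and after a reaching motion. The labels `"reach"`, `"pick_up"` and `"put_down"` and the function name are hypothetical; the application does not prescribe this rule.

```python
from typing import List, Optional, Tuple

Frame = Tuple[str, Optional[str]]  # (posture label, article label or None)


def first_interaction_result(frames: List[Frame]) -> Optional[Tuple[str, str]]:
    """Compare the held article before and after a reaching motion."""
    reach = [i for i, (posture, _) in enumerate(frames) if posture == "reach"]
    if not reach:
        return None  # no reaching motion, so no interaction is reported
    first, last = reach[0], reach[-1]
    # Article held just before the reach starts (nothing if the track starts mid-reach).
    before = frames[first - 1][1] if first > 0 else None
    # Article held just after the reach ends (or in the final frame of the track).
    after = frames[last + 1][1] if last + 1 < len(frames) else frames[-1][1]
    if before is None and after is not None:
        return ("pick_up", after)
    if before is not None and after is None:
        return ("put_down", before)
    return None  # nothing changed hands
```

For example, a track that goes from an empty hand, through a reach, to a hand holding a recognized article is reported as a pick-up of that article.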
[0077] In the interactive behavior recognizing method provided by the present technical
solution, interactive behavior recognition is performed on the image to be detected
through the detection model and the classification recognition model. Through model
training and algorithm tuning, interactive behaviors between human beings and articles
can be recognized automatically, and the recognition result is made more precise.
Moreover, only a small amount of data needs to be collected, on the basis of the current
detection model and classification recognition model, for deployment in different
scenarios, so that transferability is stronger and deployment cost is lower.
[0078] In one of the embodiments, as shown in Fig. 3, the method comprises the
following steps.
[0079] Step 302 - obtaining an image to be detected.
[0080] Step 304 - preprocessing the image to be detected, and obtaining a
human body image in
the image to be detected.
[0081] Step 304 is a process of extracting, from the image to be detected, the human body
image required in subsequent steps, and masking out the unwanted background image.
[0082] Specifically, the preprocessing can be background modeling; in other words,
background modeling based on a Gaussian mixture is performed on the image to be detected,
and a background model is obtained.
[0083] A human body image in the image to be detected is obtained according to
the image to
be detected and the background model.
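The idea of separating the human body from a statistical background model can be sketched as follows. This is a deliberate simplification: it keeps a single Gaussian (mean and variance) per pixel rather than the Gaussian mixture described above, and it works on grayscale values stored in nested lists. The threshold `k`, the learning rate `alpha` and the function names are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def foreground_mask(mean: Grid, var: Grid, frame: Grid, k: float = 2.5) -> List[List[bool]]:
    """Mark a pixel as foreground when it lies more than k standard deviations
    away from the background mean for that pixel."""
    return [
        [abs(p - m) > k * (v ** 0.5) for p, m, v in zip(frow, mrow, vrow)]
        for frow, mrow, vrow in zip(frame, mean, var)
    ]


def update_background(mean: Grid, var: Grid, frame: Grid,
                      mask: List[List[bool]], alpha: float = 0.05) -> None:
    """Blend background pixels into the model in place; foreground pixels are skipped
    so that a person standing still is not absorbed into the background too quickly."""
    for y, row in enumerate(frame):
        for x, p in enumerate(row):
            if not mask[y][x]:
                diff = p - mean[y][x]
                mean[y][x] += alpha * diff
                var[y][x] = (1 - alpha) * var[y][x] + alpha * diff * diff
```

The resulting mask selects the pixels that belong to the human body image; a mixture model would keep several such Gaussians per pixel to cope with backgrounds that alternate between a few appearances.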
[0084] Step 306 - performing human body posture detection on the human body
image through
the preset detection model, and obtaining the human body posture information
and the
hand position information.
[0085] Step 308 - tracing the human body posture according to the human body
posture
information to obtain human body motion track information, and performing a
target
tracing on the hand position according to the hand position information to
obtain a hand
area image.
[0086] Step 310 - performing article recognition on the hand area image
through a preset
classification recognition model, and obtaining an article recognition result,
wherein the
classification recognition model is employed to perform article recognition.
[0087] Step 312 - obtaining a first interactive behavior recognition result
according to the human
body motion track information and the article recognition result.
[0088] In this embodiment, the unwanted background image is masked out in step 304 by
preprocessing the image to be detected, and only the human body image to be used
subsequently is retained, whereby the volume of data to be processed in the following
steps is reduced and data processing efficiency is enhanced.
[0089] In one of the embodiments, the method further comprises:
[0090] A - obtaining human body position information according to the image to
be detected;
wherein
[0091] the human body position information can indicate position information
in a three-
dimensional world coordinate system.
[0092] Specifically, collection position information of the image to be
detected in the three-
dimensional world coordinate system is obtained, three-dimensional world
coordinate
transformation is performed according to the position information of the human
body
image in the image to be detected and the collection position information, and
position
information of the human body in the three-dimensional world coordinate system
is
obtained.
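The coordinate transformation can be sketched under a standard pinhole camera model. The sketch assumes a depth camera looking straight down, image axes aligned with the world x and y axes, depth measured in metres along the optical axis, and known intrinsics (`fx`, `fy`, `cx`, `cy`). All of these, along with the function name, are illustrative assumptions rather than details given in the present application.

```python
from typing import Tuple


def pixel_to_world(u: float, v: float, depth_m: float,
                   intrinsics: Tuple[float, float, float, float],
                   camera_position: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Map a pixel with a depth reading to world coordinates for an overhead camera.

    intrinsics      -- (fx, fy, cx, cy) of the pinhole model, in pixels
    camera_position -- (x, y, height) of the camera in the world frame, in metres
    """
    fx, fy, cx, cy = intrinsics
    cam_x, cam_y, cam_height = camera_position
    # Back-project the pixel into the camera frame.
    x_c = (u - cx) * depth_m / fx
    y_c = (v - cy) * depth_m / fy
    # The camera looks straight down, so depth is subtracted from the mounting height.
    return cam_x + x_c, cam_y + y_c, cam_height - depth_m
```

A camera tilted away from the vertical would additionally need a rotation between the camera frame and the world frame, which this sketch omits.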
[0093] B - obtaining a second interactive behavior recognition result
according to the human
body motion track information, the article recognition result, the human body
position
information and a preset shelf information, wherein the second interactive
behavior
recognition result is a human-goods interactive behavior recognition result.
[0094] The shelf information includes shelf position information and
information of articles in
the shelf, of which the shelf position information is the three-dimensional world
coordinate position where the shelf is located.
[0095] Specifically, the shelf information to which the human body position corresponds is
obtained according to the human body position information and the preset shelf
information. An interactive behavior between the human body and the shelf is determined
by tracing the three-dimensional world coordinate positions where the human body and the
shelf are located, and the occurrence of a valid human-goods interactive behavior is
further determined in the tracing process by recognizing whether the hand area holds any
commodity associated with the shelf. The valid human-goods interactive behavior here can
be a customer completing one round of picking up goods from the shelf.
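A minimal sketch of the association between a human body position and the preset shelf information follows. It assumes shelves are described by a floor-plane position and a set of article labels, and that a customer can only interact with a shelf within `max_distance` metres. The data layout, the distance threshold and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]  # floor-plane world coordinates in metres


def nearest_shelf(person_xy: Point, shelves: Dict[str, Point],
                  max_distance: float = 1.0) -> Optional[str]:
    """Return the id of the closest shelf within reach, or None if none is in reach."""
    best_id, best_dist = None, max_distance
    for shelf_id, (sx, sy) in shelves.items():
        dist = math.hypot(person_xy[0] - sx, person_xy[1] - sy)
        if dist <= best_dist:
            best_id, best_dist = shelf_id, dist
    return best_id


def is_valid_interaction(person_xy: Point, held_article: Optional[str],
                         shelves: Dict[str, Point],
                         shelf_articles: Dict[str, Set[str]],
                         max_distance: float = 1.0) -> bool:
    """An interaction counts as valid when the held article belongs to the shelf in reach."""
    shelf_id = nearest_shelf(person_xy, shelves, max_distance)
    if shelf_id is None or held_article is None:
        return False
    return held_article in shelf_articles.get(shelf_id, set())
```

Checking the held article against the shelf's own articles filters out cases where a customer merely walks past a shelf while carrying goods taken elsewhere.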
[0096] The present technical solution derives the position of a customer in the world
coordinate system through three-dimensional world coordinate transformation, and
associating that position with the shelf makes it possible to recognize whether the
customer has performed a valid human-goods interactive behavior. In addition, on the
basis of the recognized human-goods interactive behavior in conjunction with the article
recognition result, and under the premise that the shelf stock is known, it is possible
to count the existing stock indirectly by monitoring the number of valid interactions
between humans and the shelf. In case of short supply, the server can promptly remind
shop assistants to manage the stock, whereby manual stock-taking cost is greatly
reduced.
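The indirect stock count can be sketched as simple bookkeeping over valid interactions. The sketch assumes each valid interaction is reported as an action label plus an article label, and that one interaction moves one unit. The `low_threshold` parameter and the labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def remaining_stock(initial_stock: Dict[str, int],
                    interactions: Iterable[Tuple[str, str]],
                    low_threshold: int = 2) -> Tuple[Dict[str, int], List[str]]:
    """Apply valid pick-up / put-down interactions to a known starting stock and
    list the articles whose count has dropped to the reminder threshold."""
    stock = dict(initial_stock)  # leave the caller's record untouched
    for action, article in interactions:
        if article not in stock:
            continue  # article does not belong to this shelf
        if action == "pick_up":
            stock[article] = max(0, stock[article] - 1)
        elif action == "put_down":
            stock[article] += 1
    low = sorted(name for name, count in stock.items() if count <= low_threshold)
    return stock, low
```

The list of low articles is what a server could use to remind shop assistants to restock.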
[0097] In one of the embodiments, as shown in Fig. 4, the method further comprises
detection model training steps, which specifically include the following steps.
[0098] Step 402 - obtaining sample image data.
[0099] Specifically, an image data collected by the image collection device at
a preset second
shooting angle within a preset time frame is obtained, i.e., interactive
behavioral image
data of a certain magnitude is collected; sample image data containing human-
goods
interactive behaviors are screened and obtained from the collected image data,
the preset
second shooting angle can be an overhead angle perpendicular to the ground or
approximately perpendicular to the ground, and the sample image data is of
RGBD data.
[0100] Step 404 - marking key points and hand position of a human body image
in the sample
image data, and obtaining a first marked image data.
[0101] Specifically, the sample image data should essentially cover different
human-goods
interactive behaviors in actual scenarios, and it is further possible to
enhance the sample
data, increase the volume of the sample image data, and raise the proportion
of training
samples with large posture amplitudes in the interactive behavioral process,
for instance,
to raise the proportion of such human-goods interactive behavioral postures as
bending,
stooping and squatting etc., so as to enhance detection precision of the
detection model.
During the process of specific implementation, a part of the first marked
image data can
be taken to serve as a training dataset, while the remaining part serves as a
verification
dataset.
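Dividing the marked data into a training part and a verification part can be sketched as a seeded shuffle followed by a cut. The 80/20 ratio, the seed and the function name are assumptions; the application does not state a ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the marked samples reproducibly and split them into
    a training dataset and a verification dataset."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed keeps the split stable between runs, so that models trained with different network architectures are verified on the same held-out samples.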
[0102] Step 406 - performing an image enhancing process on the first marked
image data, and
obtaining a first training dataset; during the process of specific
implementation, the image
enhancing process is performed on the training dataset in the first marked
image data to
obtain a first training dataset.
[0103] Specifically, the image enhancing process can include any one or more image
transforming methods such as image normalization, random cropping of images, image
zooming, image flipping, image affine transformation, image contrast change, image hue
change, image saturation change, and adding tone interference blocks to images.
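Two of the listed transformations, flipping and contrast change, can be sketched on a grayscale image stored as nested lists. The sketch also flips the marked key points, since a geometric transformation of the image must be applied to its annotations as well. The function names and the 0-255 value range are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel rows with values in 0..255


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_keypoints(keypoints: List[Tuple[int, int]], width: int) -> List[Tuple[int, int]]:
    """Mirror marked (x, y) key points so they still match the flipped image."""
    return [(width - 1 - x, y) for x, y in keypoints]


def change_contrast(image: Image, factor: float) -> Image:
    """Scale each pixel's distance from the image mean, clamped to 0..255."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
        for row in image
    ]
```

Photometric changes such as contrast leave the key point coordinates unchanged, whereas flipping, cropping, zooming and affine transformation all require the annotations to be transformed alongside the pixels.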
[0104] Step 408 - inputting the first training dataset to an HRNet model for
training, and
obtaining the detection model. Specifically, different network architectures
of the HRNet
model can be employed to train human body posture detection models, various
models
obtained through training by the different network architectures are then
verified and
appraised through the verification dataset, and a model with the optimal
effect is selected
to serve as the detection model.
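Selecting the best-performing candidate on the verification dataset can be sketched as follows. The candidates here are toy callables and the score is plain accuracy; in practice the candidates would be trained HRNet variants and the score a key point detection metric. The candidate names and helper functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, int]  # (input, expected label): a toy stand-in for marked images
Model = Callable[[int], int]


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Fraction of verification samples the model predicts correctly."""
    correct = sum(1 for x, label in samples if model(x) == label)
    return correct / len(samples)


def select_best_model(candidates: Dict[str, Model],
                      verification_set: List[Sample],
                      evaluate: Callable[[Model, List[Sample]], float] = accuracy
                      ) -> Tuple[str, Dict[str, float]]:
    """Score every trained candidate on the verification set and pick the best one."""
    scores = {name: evaluate(model, verification_set) for name, model in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores
```

Returning the full score table alongside the winner makes it easy to record how close the alternative architectures were.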
[0105] In one of the embodiments, as shown in Fig. 5, the method further comprises
classification recognition model training steps, which specifically include the following
steps.
[0106] Step 502 - obtaining sample image data.
[0107] Step 504 - marking a hand area in the sample image data and performing
article category
marking on an article located in the hand area, and obtaining a second marked
image data.
[0108] Step 506 - performing an image enhancing process on the second marked
image data,
and obtaining a second training dataset.
[0109] Specifically, the image enhancing process can include any one or more image
transforming methods such as image normalization, random cropping of images, image
zooming, image flipping, image affine transformation, image contrast change, image hue
change, image saturation change, and adding tone interference blocks to images.
[0110] Step 508 - inputting the second training dataset to a yolov3-tiny network or a vgg16
network for training, and obtaining the preset classification recognition model.
[0111] The present technical solution collects RGBD data through a depth
camera with angle
perpendicular or approximately perpendicular to the ground, then manually
sorts and
collects the RGBD data containing human-goods interactive behaviors to serve
as training
samples, namely sample image data, employs deep learning training, and
recognizes
different postures of the human body with the trained model results, whereby
the
detection model can more flexibly and precisely recognize interactive
behaviors, and
possesses stronger transferability.
[0112] As should be understood, although the various steps in the flowcharts of Figs. 2-5
are displayed sequentially as indicated by arrows, these steps are not necessarily
executed in the sequences indicated by the arrows. Unless otherwise explicitly noted in
this document, execution of these steps is not restricted to any sequence, and these
steps can also be executed in other sequences than those indicated in the drawings.
Moreover, at least some of the steps in the flowcharts of Figs. 2-5 may include plural
sub-steps or phases. These sub-steps or phases are not necessarily completed at the same
time, but can be executed at different times, and they are also not necessarily performed
sequentially, but can be performed in turns or alternately with other steps or with at
least some of the sub-steps or phases of other steps.
[0113] There is provided an interactive behavior recognizing device, as shown
in Fig. 6, the
device comprises a first obtaining module 602, a first detecting module 604, a
tracing
module 606, a second detecting module 608 and a first interactive behavior
recognizing
module 610, of which:
[0114] the first obtaining module 602 is employed for obtaining an image to be
detected;
[0115] the first detecting module 604 is employed for performing human body
posture detection
on the image to be detected through a preset detection model, and obtaining
human body
posture information and hand position information, wherein the detection model
is
employed to perform human body posture detection;
[0116] the tracing module 606 is employed for tracing the human body posture
according to the
human body posture information to obtain human body motion track information,
and
performing a target tracing on the hand position according to the hand
position
information to obtain a hand area image;
[0117] the second detecting module 608 is employed for performing article
recognition on the
hand area image through a preset classification recognition model, and
obtaining an
article recognition result, wherein the classification recognition model is
employed to
perform article recognition; and
[0118] the first interactive behavior recognizing module 610 is employed for
obtaining a first
interactive behavior recognition result according to the human body motion
track
information and the article recognition result.
[0119] In one of the embodiments, the first detecting module 604 is further
employed for
preprocessing the image to be detected, and obtaining a human body image in
the image
to be detected; and performing human body posture detection on the human body
image
through the preset detection model, and obtaining the human body posture
information
and the hand position information.
[0120] In one of the embodiments, the device further comprises:
[0121] a human body position module, for obtaining human body position
information according
to the image to be detected; and
[0122] a second interactive behavior recognizing module, for obtaining a
second interactive
behavior recognition result according to the human body motion track
information, the
article recognition result, the human body position information and a preset
shelf
information, wherein the second interactive behavior recognition result is a
human-goods
interactive behavior recognition result.
[0123] In one of the embodiments, the first obtaining module 602 is further
employed for
obtaining the image to be detected as collected by an image collection device
at a preset
first shooting angle; preferably, the preset first shooting angle is an
overhead angle
perpendicular to the ground, and the image to be detected is of RGBD data.
[0124] In one of the embodiments, the device further comprises:
[0125] a second obtaining module, for obtaining sample image data;
[0126] a first marking module, for marking key points and hand position of a
human body image
in the sample image data, and obtaining a first marked image data;
[0127] a first enhancing module, for performing an image enhancing process on
the first marked
image data, and obtaining a first training dataset; and
[0128] a first training module, for inputting the first training dataset to an
HRNet model for
training, and obtaining the detection model.
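The image enhancing process is not detailed in this passage. As one common example, the Python sketch below applies a horizontal flip to a marked image and moves the marked key points accordingly. The data layout, the key point names and the choice of flipping are assumptions for illustration and are not stated to be the enhancing process actually used.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]                 # rows of pixel values
Keypoints = Dict[str, Tuple[int, int]]  # hypothetical layout: name -> (x, y)


def flip_horizontal(image: Image, keypoints: Keypoints) -> Tuple[Image, Keypoints]:
    """Mirror the image left to right and move the marked points with it."""
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    flipped_points: Keypoints = {}
    for name, (x, y) in keypoints.items():
        # Swap left/right names so the labels stay anatomically correct.
        if name.startswith("left_"):
            name = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            name = "left_" + name[len("right_"):]
        flipped_points[name] = (width - 1 - x, y)
    return flipped_image, flipped_points


def enhance(samples: List[Tuple[Image, Keypoints]]) -> List[Tuple[Image, Keypoints]]:
    """Return the original samples plus one flipped copy of each."""
    enhanced = []
    for image, keypoints in samples:
        enhanced.append((image, keypoints))
        enhanced.append(flip_horizontal(image, keypoints))
    return enhanced


image = [[1, 2, 3], [4, 5, 6]]
marks = {"left_hand": (0, 1), "head": (1, 0)}
dataset = enhance([(image, marks)])
```

The point of the sketch is that any geometric enhancement must be applied to the marks as well as to the pixels, otherwise the training dataset would contain misplaced key points.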
[0129] In one of the embodiments, the device further comprises:
[0130] a second marking module, for marking a hand area in the sample image
data and performing article category marking on an article located in the hand
area, and obtaining second marked image data;
[0131] a second enhancing module, for performing an image enhancing process on
the second
marked image data, and obtaining a second training dataset; and
[0132] a second training module, for inputting the second training dataset to
a yolov3-tiny network or a vgg16 network for training, and obtaining the preset
classification recognition model.
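The second marked image data can be thought of as pairs of a hand area crop and an article category. The Python sketch below shows one way such pairs might be assembled before enhancement and training; the box format (x0, y0, x1, y1), the category list and the function names are hypothetical, and no training call for yolov3-tiny or vgg16 is shown.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # assumed format: (x0, y0, x1, y1), end-exclusive


def crop_hand_area(image: Image, box: Box) -> Image:
    """Cut the marked hand area out of the image, clamping the box to the image."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def build_samples(annotations: Sequence[Tuple[Image, Box, str]],
                  categories: Sequence[str]) -> List[Tuple[Image, int]]:
    """Turn (image, hand box, article category) marks into (crop, class index) pairs."""
    return [(crop_hand_area(image, box), categories.index(category))
            for image, box, category in annotations]


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
categories = ["bottle", "snack"]
samples = build_samples([(image, (1, 1, 10, 10), "snack")], categories)
```

Clamping the box keeps the crop valid when a hand is marked near the image border.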
[0133] In one of the embodiments, the second obtaining module is further
employed for obtaining image data collected by the image collection device at a
preset second shooting angle within a preset time frame; and screening from the
collected image data to obtain sample image data containing human-goods
interactive behaviors; preferably, the preset second shooting angle is an
overhead angle perpendicular to the ground, and the sample image data is RGBD
data.
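The screening step can be pictured as a filter over the collected frames. In the Python sketch below, the `Frame` record, its `has_interaction` flag (for example set during manual review) and the function name are assumptions for illustration; how frames containing human-goods interactive behaviors are actually identified is not specified in this passage.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Frame:
    captured_at: datetime
    path: str
    has_interaction: bool  # hypothetical flag, e.g. assigned by a reviewer


def screen_samples(frames: Iterable[Frame], start: datetime, end: datetime,
                   is_interaction: Callable[[Frame], bool] = lambda f: f.has_interaction
                   ) -> List[Frame]:
    """Keep frames inside the preset time frame that show a human-goods interaction."""
    return [f for f in frames
            if start <= f.captured_at <= end and is_interaction(f)]


frames = [
    Frame(datetime(2022, 3, 10, 9, 0), "a.png", True),
    Frame(datetime(2022, 3, 10, 9, 5), "b.png", False),
    Frame(datetime(2022, 3, 10, 23, 0), "c.png", True),
]
selected = screen_samples(frames, datetime(2022, 3, 10, 8, 0), datetime(2022, 3, 10, 18, 0))
```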
[0134] For specific limitations of the interactive behavior recognizing device,
reference may be made to the limitations of the interactive behavior recognizing
method described above, which are not repeated here. The various modules in the
aforementioned interactive behavior recognizing device can be wholly or partly
realized via software, hardware, or a combination of the two. The various
modules can be embedded in hardware form in, or be independent of, a processor
in the computer equipment, and can also be stored in software form in a memory
in the computer equipment, so that the processor can invoke them and perform the
operations corresponding to each module.
[0135] In one embodiment, computer equipment is provided. The computer equipment
can be a server, and its internal structure can be as shown in Fig. 7. The
computer equipment comprises a processor, a memory, a network interface, and a
database connected to each other via a system bus. The processor provides
computing and controlling capabilities. The memory includes a nonvolatile
storage medium and an internal memory. The nonvolatile storage medium stores an
operating system, a computer program and a database. The internal memory
provides an environment for running the operating system and the computer
program stored in the nonvolatile storage medium. The database is employed to
store data. The network interface is employed to connect to an external terminal
via a network for communication. The computer program realizes an interactive
behavior recognizing method when it is executed by the processor.
[0136] As persons skilled in the art will understand, the structure illustrated
in Fig. 7 is merely a block diagram of the partial structure relevant to the
solution of the present application, and does not restrict the computer
equipment to which the solution of the present application is applied. The
specific computer equipment may comprise more or fewer component parts than
those illustrated in Fig. 7, may combine certain component parts, or may have a
different layout of component parts.
[0137] In one embodiment, there is provided a computer equipment that
comprises a memory, a
processor and a computer program stored on the memory and operable on the
processor,
and the following steps are realized when the processor executes the computer
program:
obtaining an image to be detected; performing human body posture detection on
the
image to be detected through a preset detection model, and obtaining human
body posture
information and hand position information, wherein the detection model is
employed to
perform human body posture detection; tracing the human body posture according
to the
human body posture information to obtain human body motion track information,
and
performing a target tracing on the hand position according to the hand
position
information to obtain a hand area image; performing article recognition on the
hand area
image through a preset classification recognition model, and obtaining an
article
recognition result, wherein the classification recognition model is employed
to perform
article recognition; and obtaining a first interactive behavior recognition
result according
to the human body motion track information and the article recognition result.
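The tracing step links detections from successive images into a motion track. The passage does not name a tracking algorithm, so the Python sketch below substitutes a simple greedy nearest-neighbor association purely for illustration; the function name, the `max_jump` threshold and the use of pixel coordinates are assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def update_tracks(tracks: Dict[int, List[Point]], detections: List[Point],
                  max_jump: float = 50.0) -> Dict[int, List[Point]]:
    """Greedily append each detection to the closest free track, or start a new track."""
    used = set()
    next_id = max(tracks, default=-1) + 1
    for point in detections:
        best_id, best_dist = None, max_jump
        for track_id, history in tracks.items():
            if track_id in used:
                continue  # each track receives at most one detection per image
            dist = math.dist(history[-1], point)
            if dist <= best_dist:
                best_id, best_dist = track_id, dist
        if best_id is None:
            best_id = next_id
            next_id += 1
            tracks[best_id] = []
        tracks[best_id].append(point)
        used.add(best_id)
    return tracks


tracks: Dict[int, List[Point]] = {}
update_tracks(tracks, [(10, 10), (200, 200)])   # first image: two hands detected
update_tracks(tracks, [(205, 198), (12, 11)])   # second image: same hands, new order
```

Each resulting list of points is one motion track; a hand area image could then be cropped around the latest point of a hand track before article recognition.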
[0138] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: the step of performing human body posture
detection on the
image to be detected through a preset detection model, and obtaining human
body posture
information and hand position information includes: preprocessing the image to
be
detected, and obtaining a human body image in the image to be detected; and
performing
human body posture detection on the human body image through the preset
detection
model, and obtaining the human body posture information and the hand position
information.
[0139] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: obtaining human body position information
according to the
image to be detected; and obtaining a second interactive behavior recognition
result
according to the human body motion track information, the article recognition
result, the
human body position information and preset shelf information, wherein the
second
interactive behavior recognition result is a human-goods interactive behavior
recognition
result.
[0140] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: the step of obtaining an image to be detected
includes: obtaining
the image to be detected as collected by an image collection device at a
preset first
shooting angle; wherein preferably, the preset first shooting angle is an
overhead angle
perpendicular to the ground, and the image to be detected is RGBD data.
[0141] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: obtaining sample image data; marking key points
and hand
position of a human body image in the sample image data, and obtaining first
marked
image data; performing an image enhancing process on the first marked image
data, and
obtaining a first training dataset; and inputting the first training dataset
to an HRNet
model for training, and obtaining the detection model.
[0142] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: marking a hand area in the sample image data and
performing
article category marking on an article located in the hand area, and obtaining
second
marked image data; performing an image enhancing process on the second marked
image
data, and obtaining a second training dataset; and inputting the second
training dataset to
a convolutional neural network for training, and obtaining the preset
classification
recognition model.
[0143] In one embodiment, when the processor executes the computer program,
the following
steps are further realized: the step of obtaining sample image data includes:
obtaining
image data collected by the image collection device at a preset second
shooting angle
within a preset time frame; and screening from the collected image data to
obtain sample
image data containing human-goods interactive behaviors, wherein, preferably,
the preset
second shooting angle is an overhead angle perpendicular to the ground, and
the sample
image data is RGBD data.
[0144] In one embodiment, there is provided a computer-readable storage medium
storing
thereon a computer program, and the following steps are realized when the
computer
program is executed by a processor: obtaining an image to be detected;
performing human
body posture detection on the image to be detected through a preset detection
model, and
obtaining human body posture information and hand position information,
wherein the
detection model is employed to perform human body posture detection; tracing
the human
body posture according to the human body posture information to obtain human
body
motion track information, and performing a target tracing on the hand position
according
to the hand position information to obtain a hand area image; performing
article
recognition on the hand area image through a preset classification recognition
model, and
obtaining an article recognition result, wherein the classification
recognition model is
employed to perform article recognition; and obtaining a first interactive
behavior
recognition result according to the human body motion track information and
the article
recognition result.
[0145] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: the step of performing human body posture
detection on the
image to be detected through a preset detection model, and obtaining human
body posture
information and hand position information includes: preprocessing the image to
be
detected, and obtaining a human body image in the image to be detected; and
performing
human body posture detection on the human body image through the preset
detection
model, and obtaining the human body posture information and the hand position
information.
[0146] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: obtaining human body position information
according to the
image to be detected; and obtaining a second interactive behavior recognition
result
according to the human body motion track information, the article recognition
result, the
human body position information and preset shelf information, wherein the
second
interactive behavior recognition result is a human-goods interactive behavior
recognition
result.
[0147] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: the step of obtaining an image to be detected
includes: obtaining
the image to be detected as collected by an image collection device at a
preset first
shooting angle; wherein preferably, the preset first shooting angle is an
overhead angle
perpendicular to the ground, and the image to be detected is RGBD data.
[0148] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: obtaining sample image data; marking key points
and hand
position of a human body image in the sample image data, and obtaining first
marked
image data; performing an image enhancing process on the first marked image
data, and
obtaining a first training dataset; and inputting the first training dataset
to an HRNet
model for training, and obtaining the detection model.
[0149] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: marking a hand area in the sample image data and
performing
article category marking on an article located in the hand area, and obtaining
second
marked image data; performing an image enhancing process on the second marked
image
data, and obtaining a second training dataset; and inputting the second
training dataset to
a convolutional neural network for training, and obtaining the preset
classification
recognition model.
[0150] In one embodiment, when the computer program is executed by a
processor, the following
steps are further realized: the step of obtaining sample image data includes:
obtaining
image data collected by the image collection device at a preset second
shooting angle
within a preset time frame; and screening from the collected image data to
obtain sample
image data containing human-goods interactive behaviors, wherein, preferably,
the preset
second shooting angle is an overhead angle perpendicular to the ground, and
the sample
image data is RGBD data.
[0151] As persons ordinarily skilled in the art will understand, all or part of
the flows in the methods according to the aforementioned embodiments can be
completed by a computer program instructing relevant hardware. The computer
program can be stored in a nonvolatile computer-readable storage medium and,
when executed, can include the flows embodied in the aforementioned methods. Any
reference to memory, storage, a database or other media used in the various
embodiments provided by the present application can include nonvolatile and/or
volatile memory. The nonvolatile memory can include a read-only memory (ROM), a
programmable ROM (PROM), an electrically programmable ROM (EPROM), an
electrically erasable programmable ROM (EEPROM) or a flash memory. The volatile
memory can include a random access memory (RAM) or an external cache memory. By
way of explanation rather than restriction, RAM is available in many forms, such
as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data
rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus
direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM
(RDRAM).
[0152] The technical features of the aforementioned embodiments can be combined
arbitrarily. For the sake of brevity, not all possible combinations of these
technical features are described; however, any such combination should be
considered to fall within the scope recorded in the description as long as the
combined technical features are not mutually contradictory.
[0153] The foregoing embodiments merely express several modes of execution of
the present application. Their descriptions are relatively specific and
detailed, but they should not therefore be construed as restricting the patent
scope of the invention. It should be pointed out that persons with ordinary
skill in the art may make various modifications and improvements without
departing from the conception of the present application, and all of these fall
within the protection scope of the present application. Accordingly, the patent
protection scope of the present application shall be determined by the attached
Claims.