Patent Summary 3150597

(12) Patent Application: (11) CA 3150597
(54) French Title: METHODE ET DISPOSITIF DE DETECTION DE PIETON
(54) English Title: PEDESTRIAN DETECTING METHOD AND DEVICE
Status: Examination
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06V 10/74 (2022.01)
  • G06T 7/50 (2017.01)
  • G06V 10/20 (2022.01)
  • G06V 10/40 (2022.01)
  • G06V 20/52 (2022.01)
(72) Inventors:
  • YIN, YANTAO (China)
  • LIU, JIANG (China)
  • HUANG, YINJUN (China)
  • JI, HUAIYUAN (China)
  • JING, WEI (China)
(73) Owners:
  • 10353744 CANADA LTD.
(71) Applicants:
  • 10353744 CANADA LTD. (Canada)
(74) Agent: HINTON, JAMES W.
(74) Co-agent:
(45) Issued:
(22) Filing Date: 2022-03-01
(41) Open to Public Inspection: 2022-09-02
Examination Requested: 2022-09-16
Licence Available: N/A
Dedicated to the Public: N/A
(25) Language of Filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application Number          Country/Territory          Date
202110231224.3              China                      2021-03-02

Abstracts

English Abstract


The present invention discloses a pedestrian detecting method and a corresponding device, and relates to the field of image recognition technology. The method comprises: creating a background mask corresponding to each depth camera according to a first depth image captured by each depth camera, wherein the background mask includes a ground mask and a marker mask; respectively updating background masks to which various depth cameras correspond on the basis of pixels in plural frames of second depth images continuously captured by each depth camera, and pixels in the background mask corresponding to each depth camera; and recognizing a pedestrian detecting result by comparing pixels in the full-scene top-view depth picture and pixels in the full-scene top-view depth background picture, and comparing pixels in the full-scene top-view color picture and pixels in the full-scene top-view color background picture.

Claims

Note: The claims are shown in the official language in which they were submitted.


CLAIMS
What is claimed is:
1. A pedestrian detecting method, characterized in comprising:
creating a background mask corresponding to each depth camera according to a
first depth image
captured by each depth camera, wherein the background mask includes a ground
mask and a
marker mask;
respectively updating background masks to which various depth cameras
correspond on the basis
of pixels in plural frames of second depth images continuously captured by
each depth camera,
and pixels in the background mask corresponding to each depth camera;
obtaining a full-scene top-view depth background picture and a full-scene top-
view color
background picture after coordinate-transforming and merging pixels in the
background masks
to which the various depth cameras correspond;
splitting the full-scene top-view depth background picture into separate top-
view depth
background pictures corresponding to each depth camera, and splitting the full-
scene top-view
color background picture into separate top-view color background pictures
corresponding to
each depth camera;
updating pixels in a foreground region into the top-view depth background
picture and the top-
view color background picture of the corresponding depth camera by recognizing
the
foreground region that contains human body pixels in third depth images
obtained in real time
by the various depth cameras, so as to update the top-view depth picture and
the top-view color
picture of each depth camera;
merging the top-view depth pictures of the various depth cameras to form a
full-scene top-view
depth picture, and merging the top-view color pictures of the various depth
cameras to form a
full-scene top-view color picture; and
recognizing a pedestrian detecting result by comparing pixels in the full-
scene top-view depth
picture and pixels in the full-scene top-view depth background picture, and
comparing pixels
in the full-scene top-view color picture and pixels in the full-scene top-view
color background
picture.
2. The method according to Claim 1, characterized in that the step of creating
a background mask
corresponding to each depth camera according to a first depth image captured
by each depth
camera includes:
frame-selecting a ground region from the first depth image captured by each
depth camera to
create a ground fitting formula, and frame-selecting at least one marker
region to create a
marker fitting formula corresponding to the marker region in a one-to-one
manner;
creating the ground mask corresponding to each depth camera according to the
ground fitting
formula, and creating the marker mask corresponding to each depth camera
according to the
marker fitting formula; and
merging the ground mask and the marker mask to form the background mask
corresponding to
each depth camera.
3. The method according to Claim 1, characterized in that the step of updating
background masks
on the basis of pixels in plural frames of second depth images continuously
captured by each
depth camera, and pixels in the background mask corresponding to each depth
camera includes:
comparing depth values of pixels at various corresponding locations in the mth frame of second depth image and the (m+1)th frame of second depth image captured by the same depth camera, where an initial value of m is 1;
recognizing pixels whose depth values are changed, updating the depth value of a pixel at a corresponding location in the (m+1)th frame of second depth image as a small value in the comparing result, letting m = m+1, and comparing again depth values of pixels at various corresponding locations in the mth frame of second depth image and the (m+1)th frame of second depth image, until pixels at various locations in the last frame of second depth image and their corresponding depth values are obtained;
comparing the pixels at various locations in the last frame of second depth
image and their
corresponding depth values with pixels at various locations in the background
mask and their
corresponding depth values; and
recognizing pixels whose depth values are changed, and updating the depth
value of a pixel at a
corresponding location in the background mask as a small value in the
comparing result.
4. The method according to Claim 1, characterized in that the step of
obtaining a full-scene top-
view depth background picture and a full-scene top-view color background
picture after
coordinate-transforming and merging pixels in the background masks to which
the various depth
cameras correspond includes:
creating a full-scene top-view depth background blank template picture and a
full-scene top-view
color background blank template picture, wherein depth values of pixels at
various locations
in the full-scene top-view depth background blank template picture are zero,
and color values
of pixels at various locations in the full-scene top-view color background
blank template
picture are zero;
merging and unifying pixels in the background masks to which the various depth
cameras
correspond to form a full-scene background mask, uniformly transforming the
pixel
coordinates to world coordinates, and then uniformly transforming the world
coordinates to
top-view coordinates;
sequentially traversing pixels in the full-scene background mask, comparing a
depth value of
each pixel with depth values of pixels at corresponding locations in the full-
scene top-view
depth background blank template picture, and replacing pixels at corresponding
locations in
the full-scene top-view depth background blank template picture with large-
value pixels in the
full-scene background mask, to obtain a full-scene top-view depth background
picture; and
on the basis of pixels to which replacement occurs in the full-scene top-view
depth background
mask, replacing pixels at corresponding locations in the full-scene top-view color background blank template picture with their pixel color values, to obtain a full-scene top-view color background picture.
5. The method according to Claim 4, characterized in that the step of
splitting the full-scene top-
view depth background picture into separate top-view depth background pictures
corresponding
to each depth camera, and splitting the full-scene top-view color background
picture into separate
top-view color background pictures corresponding to each depth camera
includes:
on the basis of top-view coordinates of pixels of the background mask to which
each depth
camera corresponds, splitting the full-scene top-view depth background picture
into separate
top-view depth background pictures corresponding to each depth camera, and
splitting the full-
scene top-view color background picture into separate top-view color
background pictures
corresponding to each depth camera.
6. The method according to Claim 5, characterized in that the step of updating
pixels in a
foreground region into the top-view depth background picture and the top-view
color background
picture of the corresponding depth camera by recognizing the foreground region
that contains
human body pixels in third depth images obtained in real time by the various
depth cameras
includes:
comparing depth values of pixels in the third depth images obtained in real
time by the depth
cameras with depth values of pixels of the corresponding separate top-view
depth background
pictures;
employing a frame difference method to recognize pixels whose depth values are
small values in
the third depth images, and summarizing to obtain a foreground region that
contains human
body pixels;
correspondingly matching and associating pixels in the foreground region with
pixels of the
separate top-view depth background pictures in a one-to-one manner, and
replacing depth
values of the pixels in the separate top-view depth background pictures with
depth values of
the pixels in the corresponding foreground region; and
recognizing pixels to which replacement occurs in the separate top-view depth
background
pictures, and replacing corresponding pixels in the separate top-view color background pictures with color values of pixels in the foreground region.
7. The method according to Claim 6, characterized in that the step of merging
the top-view depth
pictures of the various depth cameras to form a full-scene top-view depth
picture, and merging
the top-view color pictures of the various depth cameras to form a full-scene
top-view color
picture includes:
traversing pixels in the corresponding top-view depth picture of each depth
camera, and replacing
depth values of pixels at corresponding locations in the full-scene top-view
depth background
picture, to obtain a full-scene top-view depth picture; and
recognizing pixels to which replacement occurs in the full-scene top-view
depth picture, and
replacing color values of pixels at corresponding locations in the full-scene
top-view color
background picture, to obtain a full-scene top-view color picture.
8. The method according to Claim 7, characterized in that the step of
recognizing a pedestrian
detecting result by comparing pixels in the full-scene top-view depth picture
and pixels in the
full-scene top-view depth background picture, and comparing pixels in the full-
scene top-view
color picture and pixels in the full-scene top-view color background picture
includes:
comparing pixels whose depth values are changed in the full-scene top-view
depth picture and
the full-scene top-view depth background picture, and on the basis of a dense
region area of
pixels and depth values of the various pixels, recognizing a head volume
and/or a body volume;
and
recognizing a pedestrian detecting result on the basis of sizes and/or a size
of the head volume
and/or the body volume.
9. A pedestrian detecting device, characterized in comprising:
a mask creating unit, for creating a background mask corresponding to each
depth camera
according to a first depth image captured by each depth camera, wherein the
background mask
includes a ground mask and a marker mask;
a mask updating unit, for respectively updating background masks to which
various depth
cameras correspond on the basis of pixels in plural frames of second depth
images
continuously captured by each depth camera, and pixels in the background mask
corresponding to each depth camera;
a mask merging unit, for obtaining a full-scene top-view depth background
picture and a full-
scene top-view color background picture after coordinate-transforming and
merging pixels in
the background masks to which the various depth cameras correspond;
a background splitting unit, for splitting the full-scene top-view depth
background picture into
separate top-view depth background pictures corresponding to each depth
camera, and
splitting the full-scene top-view color background picture into separate top-
view color
background pictures corresponding to each depth camera;
a foreground recognizing unit, for updating pixels in a foreground region into
the top-view depth
background picture and the top-view color background picture of the
corresponding depth
camera by recognizing the foreground region that contains human body pixels in
third depth
images obtained in real time by the various depth cameras, so as to update the
top-view depth
picture and the top-view color picture of each depth camera;
a full-scene merging unit, for merging the top-view depth pictures of the
various depth cameras
to form a full-scene top-view depth picture, and merging the top-view color
pictures of the
various depth cameras to form a full-scene top-view color picture; and
a pedestrian detecting unit, for recognizing a pedestrian detecting result by
comparing pixels in
the full-scene top-view depth picture and pixels in the full-scene top-view
depth background
picture, and comparing pixels in the full-scene top-view color picture and
pixels in the full-
scene top-view color background picture.
10. A computer-readable storage medium, storing a computer program thereon,
characterized in
that the method steps according to any of Claims 1 to 8 are realized when the
computer program
is executed by a processor.

Description

Note: The descriptions are shown in the official language in which they were submitted.


PEDESTRIAN DETECTING METHOD AND DEVICE
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of image recognition
technology, and more
particularly to a pedestrian detecting method and a pedestrian detecting
device.
Description of Related Art
[0002] In this age of vigorous development of artificial intelligence, various
new things come
into being like mushrooms sprouting after a spring rain, as unmanned supermarkets and unmanned stores emerge one after the other. With the advent of smart retail
as the tidal
waves of the day, offline retail is combined with artificial intelligence, and
it has become
a new orientation of research to provide a completely novel purchasing mode as
smooth
as online shopping. The imperceptible shopping experience of "take-and-go"
comes to its
true meaning by providing in real time such services as commodity recommendation and settlement through full-coverage shooting of the behavior track of every
customer
coming into a closed scenario.
[0003] The few existing pedestrian detecting schemes are all directed to relatively open spatial scenarios in which shooting is inevitably oriented obliquely downward. The advantage thereof is the larger projection area of shooting, which facilitates obtaining more feature information, but the ensuing shielding problem also cannot be avoided. In such a complicated scenario as an unattended store or an unmanned supermarket, the performance degradation brought about by shielding might render it impossible for the entire system to operate normally, so that settlement on leaving the store and shopping experiences are adversely affected.
SUMMARY OF THE INVENTION
[0004] An objective of the present invention is to provide a pedestrian
detecting method and a
pedestrian detecting device, whereby the problem concerning the missing of
shielded
information due to oblique shooting via a single camera is effectively solved
by collecting
and monitoring pedestrian data within a scenario with specific angles by means
of plural
depth cameras, and precision in pedestrian detection data is enhanced.
[0005] In order to achieve the above objective, a first aspect of the present
invention provides a
pedestrian detecting method that comprises:
[0006] creating a background mask corresponding to each depth camera according
to a first depth
image captured by each depth camera, wherein the background mask includes a
ground
mask and a marker mask;
[0007] respectively updating background masks to which various depth cameras
correspond on
the basis of pixels in plural frames of second depth images continuously
captured by each
depth camera, and pixels in the background mask corresponding to each depth
camera;
[0008] obtaining a full-scene top-view depth background picture and a full-
scene top-view color
background picture after coordinate-transforming and merging pixels in the
background
masks to which the various depth cameras correspond;
[0009] splitting the full-scene top-view depth background picture into
separate top-view depth
background pictures corresponding to each depth camera, and splitting the full-
scene top-
view color background picture into separate top-view color background pictures
corresponding to each depth camera;
[0010] updating pixels in a foreground region into the top-view depth
background picture and
the top-view color background picture of the corresponding depth camera by
recognizing
the foreground region that contains human body pixels in third depth images
obtained in
real time by the various depth cameras, so as to update the top-view depth
picture and the
top-view color picture of each depth camera;
[0011] merging the top-view depth pictures of the various depth cameras to
form a full-scene
top-view depth picture, and merging the top-view color pictures of the various
depth
cameras to form a full-scene top-view color picture; and
[0012] recognizing a pedestrian detecting result by comparing pixels in the
full-scene top-view
depth picture and pixels in the full-scene top-view depth background picture,
and
comparing pixels in the full-scene top-view color picture and pixels in the
full-scene top-
view color background picture.
[0013] Preferably, the step of creating a background mask corresponding to
each depth camera
according to a first depth image captured by each depth camera includes:
[0014] frame-selecting a ground region from the first depth image captured by
each depth camera
to create a ground fitting formula, and frame-selecting at least one marker
region to create
a marker fitting formula corresponding to the marker region in a one-to-one
manner;
[0015] creating the ground mask corresponding to each depth camera according
to the ground
fitting formula, and creating the marker mask corresponding to each depth
camera
according to the marker fitting formula; and
[0016] merging the ground mask and the marker mask to form the background mask
corresponding to each depth camera.
[0017] Preferably, the step of updating background masks on the basis of
pixels in plural frames
of second depth images continuously captured by each depth camera, and pixels
in the
background mask corresponding to each depth camera includes:
[0018] comparing depth values of pixels at various corresponding locations in
the mth frame of
second depth image and the (m+1)th frame of second depth image captured by the
same
depth camera, where an initial value of m is 1;
[0019] recognizing pixels whose depth values are changed, updating the depth
value of a pixel
at a corresponding location in the m+lth frame of second depth image as a
small value in
the comparing result, let m = m+1, and comparing again depth values of pixels
at various
corresponding locations in the Mth frame of second depth image and the m+lth
frame of
second depth image, until pixels at various locations in the last frame of
second depth
image and their corresponding depth values are obtained;
[0020] comparing the pixels at various locations in the last frame of second
depth image and
their corresponding depth values with pixels at various locations in the
background mask
and their corresponding depth values; and
[0021] recognizing pixels whose depth values are changed, and updating the
depth value of a
pixel at a corresponding location in the background mask as a small value in
the
comparing result.
[0022] Preferably, the step of obtaining a full-scene top-view depth
background picture and a
full-scene top-view color background picture after coordinate-transforming and
merging
pixels in the background masks to which the various depth cameras correspond
includes:
[0023] creating a full-scene top-view depth background blank template picture
and a full-scene
top-view color background blank template picture, wherein depth values of
pixels at
various locations in the full-scene top-view depth background blank template
picture are
zero, and color values of pixels at various locations in the full-scene top-
view color
background blank template picture are zero;
[0024] merging and unifying pixels in the background masks to which the
various depth cameras
correspond to form a full-scene background mask, uniformly transforming the
pixel
coordinates to world coordinates, and then uniformly transforming the world
coordinates
to top-view coordinates;
[0025] sequentially traversing pixels in the full-scene background mask,
comparing a depth
value of each pixel with depth values of pixels at corresponding locations in
the full-scene
top-view depth background blank template picture, and replacing pixels at
corresponding
locations in the full-scene top-view depth background blank template picture
with large-
value pixels in the full-scene background mask, to obtain a full-scene top-
view depth
background picture; and
[0026] on the basis of pixels to which replacement occurs in the full-scene
top-view depth
background mask, replacing pixels at corresponding locations in the full-scene
top-view
color background blank template picture with their pixel color values, to
obtain a full-
scene top-view color background picture.
[0027] Preferably, the step of splitting the full-scene top-view depth
background picture into
separate top-view depth background pictures corresponding to each depth
camera, and
splitting the full-scene top-view color background picture into separate top-
view color
background pictures corresponding to each depth camera includes:
[0028] on the basis of top-view coordinates of pixels of the background mask
to which each
depth camera corresponds, splitting the full-scene top-view depth background
picture into
separate top-view depth background pictures corresponding to each depth
camera, and
splitting the full-scene top-view color background picture into separate top-
view color
background pictures corresponding to each depth camera.
[0029] Further, the step of updating pixels in a foreground region into the
top-view depth
background picture and the top-view color background picture of the
corresponding depth
camera by recognizing the foreground region that contains human body pixels in
third
depth images obtained in real time by the various depth cameras includes:
[0030] comparing depth values of pixels in the third depth images obtained in
real time by the
depth cameras with depth values of pixels of the corresponding separate top-
view depth
background pictures;
[0031] employing a frame difference method to recognize pixels whose depth
values are small
values in the third depth images, and summarizing to obtain a foreground
region that
contains human body pixels;
[0032] correspondingly matching and associating pixels in the foreground
region with pixels of
the separate top-view depth background pictures in a one-to-one manner, and
replacing
depth values of the pixels in the separate top-view depth background pictures
with depth
values of the pixels in the corresponding foreground region; and
[0033] recognizing pixels to which replacement occurs in the separate top-view
depth
background pictures, and replacing corresponding pixels in the separate top-
view color
background pictures with color values of pixels in the foreground region.
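By way of a non-limiting illustration, the foreground update described above can be sketched in Python (numpy) as follows for a single depth camera. The sketch assumes the real-time third depth image has already been warped onto the same top-view grid as the separate top-view background pictures, and the function name, array shapes and the difference threshold diff_thresh are illustrative assumptions rather than details taken from this disclosure.

import numpy as np

def update_foreground(third_depth, third_color, topview_depth, topview_color, diff_thresh=50):
    # third_depth   : (H, W)    real-time depth image, aligned with the top-view grid
    # third_color   : (H, W, 3) real-time color image, aligned the same way
    # topview_depth : (H, W)    separate top-view depth background picture (updated in place)
    # topview_color : (H, W, 3) separate top-view color background picture (updated in place)

    # Frame difference: pixels whose depth values become noticeably smaller than the
    # background are taken as candidate human-body pixels, i.e. the foreground region.
    foreground = (topview_depth.astype(np.int32) - third_depth.astype(np.int32)) > diff_thresh

    # Replacing depth and color values at the foreground locations turns the background
    # pictures into the per-camera top-view depth picture and top-view color picture.
    topview_depth[foreground] = third_depth[foreground]
    topview_color[foreground] = third_color[foreground]
    return foreground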
[0034] Further, the step of merging the top-view depth pictures of the various
depth cameras to
form a full-scene top-view depth picture, and merging the top-view color
pictures of the
various depth cameras to form a full-scene top-view color picture includes:
[0035] traversing pixels in the corresponding top-view depth picture of each
depth camera, and
replacing depth values of pixels at corresponding locations in the full-scene
top-view
depth background picture, to obtain a full-scene top-view depth picture; and
[0036] recognizing pixels to which replacement occurs in the full-scene top-
view depth picture,
and replacing color values of pixels at corresponding locations in the full-
scene top-view
color background picture, to obtain a full-scene top-view color picture.
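A corresponding non-limiting sketch of the merging step is given below. It assumes each depth camera covers a known rectangular slice of the full-scene top-view grid, and it copies a pixel wherever the camera's top-view picture differs from the full-scene background picture, which is only one plausible reading of "replacing depth values of pixels at corresponding locations"; the slice bookkeeping is an assumption.

import numpy as np

def merge_full_scene(full_bg_depth, full_bg_color, camera_views):
    # full_bg_depth : (H, W)    full-scene top-view depth background picture
    # full_bg_color : (H, W, 3) full-scene top-view color background picture
    # camera_views  : list of (row_slice, col_slice, depth_pic, color_pic), one per camera
    full_depth = full_bg_depth.copy()
    full_color = full_bg_color.copy()
    for rows, cols, depth_pic, color_pic in camera_views:
        region_depth = full_depth[rows, cols]              # views into the full-scene pictures
        region_color = full_color[rows, cols]
        # Pixels that were updated by the foreground step differ from the background picture.
        changed = depth_pic != full_bg_depth[rows, cols]
        region_depth[changed] = depth_pic[changed]
        region_color[changed] = color_pic[changed]
    return full_depth, full_color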
[0037] Preferably, the step of recognizing a pedestrian detecting result by
comparing pixels in
the full-scene top-view depth picture and pixels in the full-scene top-view
depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture includes:
[0038] comparing pixels whose depth values are changed in the full-scene top-
view depth picture
and the full-scene top-view depth background picture, and on the basis of a
dense region
area of pixels and depth values of the various pixels, recognizing a head
volume and/or a
body volume; and
[0039] recognizing a pedestrian detecting result on the basis of sizes and/or
a size of the head
volume and/or the body volume.
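The disclosure leaves the exact volume computation open; the following rough, non-authoritative sketch merely groups changed pixels into dense regions with scipy.ndimage.label and judges head or body candidates from the region area together with the mean depth change. The area thresholds and the use of scipy are assumptions introduced purely for illustration.

import numpy as np
from scipy import ndimage

def recognize_pedestrians(full_depth, full_bg_depth, head_area=200, body_area=800):
    # full_depth    : (H, W) full-scene top-view depth picture
    # full_bg_depth : (H, W) full-scene top-view depth background picture
    changed = full_depth != full_bg_depth              # pixels whose depth values are changed
    labels, count = ndimage.label(changed)             # dense regions of changed pixels
    results = []
    for k in range(1, count + 1):
        region = labels == k
        area = int(region.sum())
        # Mean depth change of the region, used as a crude stand-in for head/body volume.
        change = float(np.abs(full_bg_depth[region].astype(np.int64)
                              - full_depth[region].astype(np.int64)).mean())
        if area >= body_area:
            results.append({"kind": "body", "area": area, "mean_change": change})
        elif area >= head_area:
            results.append({"kind": "head", "area": area, "mean_change": change})
    return results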
[0040] As compared with prior-art technology, the pedestrian detecting method
provided by the
present invention achieves the following advantageous effects.
[0041] The pedestrian detecting method provided by the present invention can
be divided into
an algorithm preparation phase, an algorithm initialization phase and an
algorithm
detection application phase in the actual application, of which the algorithm
preparation
phase is also the phase of generating the background mask of each depth
camera, and the
specific process is as follows: a first depth image of the current detected
scenario is firstly
obtained by each depth camera that shoots the image overhead, a ground region
and at
least one marker region are frame-selected in the first depth image, a ground
fitting
formula corresponding to each depth camera and a corresponding marker fitting
formula
are created, and a ground mask established from the ground fitting formula and
marker
masks established from the various marker fitting formulae are then merged to
obtain
background masks corresponding to the various depth cameras in the current
scenario.
The algorithm initialization phase is also a background mask updating phase,
and the
specific process is as follows: background update is performed on the
background masks
to which various depth cameras correspond on the basis of depth values of
pixels in plural
continuous frames of second depth images as obtained and depth values of
pixels in the
corresponding background mask. A full-scene top-view depth background picture
and a
full-scene top-view color background picture are subsequently obtained after
coordinate-
transforming and merging pixels in the various background masks, the full-
scene top-
view depth background picture is thereafter split into separate top-view depth
background
pictures corresponding to each depth camera, the full-scene top-view color
background
picture is split into separate top-view color background pictures
corresponding to each
depth camera. Next, pixels in a foreground region are updated into the
top-view
depth background picture and the top-view color background picture of the
corresponding
depth camera on the basis of the foreground region that contains human body
pixels in a
third depth image obtained in real time by each depth camera, so as to update
the top-
view depth picture and the top-view color picture of each depth camera.
Finally, the top-
view depth pictures of the various depth cameras are merged to form a full-
scene top-
view depth picture, and the top-view color pictures of the various depth
cameras are
merged to form a full-scene top-view color picture. The algorithm detection
application
phase is a human body region detecting phase, and its corresponding specific
process is
as follows: a pedestrian detecting result is comprehensively recognized by
comparing
pixels in the full-scene top-view depth picture and pixels in the full-scene
top-view depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture.
[0042] As can be seen, the present invention utilizes specific viewing angles,
such as the
overhead shooting mode, to obtain depth images and establish background masks,
solves
the problem concerning information missing due to shielding by oblique
shooting, and
enlarges application scenarios of pedestrian detection; in addition, the use
of depth
cameras increases information dimensions of images as compared with the use of
ordinary cameras, whereby data of 3D spatial coordinates including the human
height and
the head can be obtained, and precision in pedestrian detection data is
enhanced. Through
the distributed layout of multiple depth cameras, it is made possible to adapt
to
complicated monitored scenarios where a great deal of shielding is present,
and the use
of both depth images and color images as judging conditions makes it possible to further enhance precision in pedestrian detection data.
[0043] The second aspect of the present invention provides a pedestrian
detecting device that is
applied to the pedestrian detecting method as recited in the aforementioned
technical
solution, and the device comprises:
[0044] a mask creating unit, for creating a background mask corresponding to
each depth camera
according to a first depth image captured by each depth camera, wherein the
background
mask includes a ground mask and a marker mask;
[0045] a mask updating unit, for respectively updating background masks to
which various depth
cameras correspond on the basis of pixels in plural frames of second depth
images
continuously captured by each depth camera, and pixels in the background mask
corresponding to each depth camera;
[0046] a mask merging unit, for obtaining a full-scene top-view depth
background picture and a
full-scene top-view color background picture after coordinate-transforming and
merging
pixels in the background masks to which the various depth cameras correspond;
[0047] a background splitting unit, for splitting the full-scene top-view
depth background picture
into separate top-view depth background pictures corresponding to each depth
camera,
and splitting the full-scene top-view color background picture into separate
top-view
color background pictures corresponding to each depth camera;
[0048] a foreground recognizing unit, for updating pixels in a foreground
region into the top-
view depth background picture and the top-view color background picture of the
corresponding depth camera by recognizing the foreground region that contains
human
body pixels in third depth images obtained in real time by the various depth
cameras, so
as to update the top-view depth picture and the top-view color picture of each
depth
camera;
[0049] a full-scene merging unit, for merging the top-view depth pictures of
the various depth
cameras to form a full-scene top-view depth picture, and merging the top-view
color
pictures of the various depth cameras to form a full-scene top-view color
picture; and
[0050] a pedestrian detecting unit, for recognizing a pedestrian detecting
result by comparing
pixels in the full-scene top-view depth picture and pixels in the full-scene
top-view depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture.
[0051] In comparison with prior-art technology, the advantageous effects
achievable by the
pedestrian detecting device provided by the present invention are identical
with the
advantageous effects achieved by the pedestrian detecting method as recited in
the
aforementioned technical solution, so no repetition is redundantly made
thereto in this
context.
[0052] The third aspect of the present invention provides a computer-readable
storage medium
storing thereon a computer program that executes the steps of the
aforementioned
pedestrian detecting method when it is operated by a processor.
[0053] In comparison with prior-art technology, the advantageous effects
achievable by the
computer-readable storage medium provided by the present invention are
identical with
the advantageous effects achieved by the pedestrian detecting method as
recited in the
aforementioned technical solution, so no repetition is redundantly made
thereto in this
context.
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] The drawings described here are merely meant to supply further
comprehension to the
present invention, and constitute a portion of the present invention.
Exemplary
embodiments of the present invention and descriptions thereof are meant to
explain the
present invention, rather than to restrict the present invention. In the
drawings:
[0055] Fig. 1 is a flowchart schematically illustrating the pedestrian
detecting method in
Embodiment 1 of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0056] In order to make apparent and comprehensible the aforementioned
objectives, features
and advantages of the present invention, the technical solutions in the
embodiments of
the present invention will be more clearly and comprehensively described below
with
reference to the accompanying drawings in the embodiments of the present
invention.
Apparently, the embodiments as described are merely partial, rather than the
entire,
embodiments of the present invention. All other embodiments obtainable by
persons
ordinarily skilled in the art on the basis of the embodiments in the present
invention
without spending creative effort in the process shall all be covered by the
protection scope
of the present invention.
[0057] Embodiment 1
[0058] Please refer to Fig. 1, this embodiment provides a pedestrian detecting
method that
comprises:
[0059] creating a background mask corresponding to each depth camera according
to a first depth
image captured by each depth camera, wherein the background mask includes a
ground
mask and a marker mask; respectively updating background masks to which
various
depth cameras correspond on the basis of pixels in plural frames of second
depth images
continuously captured by each depth camera, and pixels in the background mask
corresponding to each depth camera; obtaining a full-scene top-view depth
background
picture and a full-scene top-view color background picture after coordinate-
transforming
and merging pixels in the background masks to which the various depth cameras
correspond; splitting the full-scene top-view depth background picture into
separate top-
view depth background pictures corresponding to each depth camera, and
splitting the
full-scene top-view color background picture into separate top-view color
background
pictures corresponding to each depth camera; updating pixels in a foreground
region into
the top-view depth background picture and the top-view color background
picture of the
corresponding depth camera by recognizing the foreground region that contains
human
body pixels in third depth images obtained in real time by the various depth
cameras, so
as to update the top-view depth picture and the top-view color picture of each
depth
camera; merging the top-view depth pictures of the various depth cameras to
form a full-
scene top-view depth picture, and merging the top-view color pictures of the
various
depth cameras to form a full-scene top-view color picture; and recognizing a
pedestrian
detecting result by comparing pixels in the full-scene top-view depth picture
and pixels
in the full-scene top-view depth background picture, and comparing pixels in
the full-
scene top-view color picture and pixels in the full-scene top-view color
background
picture.
[0060] The pedestrian detecting method provided by this embodiment can be
divided into an
algorithm preparation phase, an algorithm initialization phase and an
algorithm detection
application phase in the actual application, of which the algorithm
preparation phase is
also the phase of generating the background mask of each depth camera, and the
specific
process is as follows: a first depth image of the current detected scenario is
firstly obtained
by each depth camera that shoots the image overhead, a ground region and at
least one
marker region are frame-selected in the first depth image, a ground fitting
formula
corresponding to each depth camera and a corresponding marker fitting formula
are
created, and a ground mask established from the ground fitting formula and
marker masks
established from the various marker fitting formulae are then merged to obtain
background masks corresponding to the various depth cameras in the current
scenario.
The algorithm initialization phase is also a background mask updating phase,
and the
specific process is as follows: background update is performed on the
background masks
to which various depth cameras correspond on the basis of depth values of
pixels in plural
continuous frames of second depth images as obtained and depth values of
pixels in the
corresponding background mask. A full-scene top-view depth background picture
and a
full-scene top-view color background picture are subsequently obtained after
coordinate-
transforming and merging pixels in the various background masks, the full-
scene top-
view depth background picture is thereafter split into separate top-view depth
background
pictures corresponding to each depth camera, the full-scene top-view color
background
picture is split into separate top-view color background pictures
corresponding to each
depth camera. Next, pixels in a foreground region are updated into the
top-view
depth background picture and the top-view color background picture of the
corresponding
depth camera on the basis of the foreground region that contains human body
pixels in a
third depth image obtained in real time by each depth camera, so as to update
the top-
view depth picture and the top-view color picture of each depth camera.
Finally, the top-
view depth pictures of the various depth cameras are merged to form a full-
scene top-
view depth picture, and the top-view color pictures of the various depth
cameras are
merged to form a full-scene top-view color picture. The algorithm detection
application
phase is a human body region detecting phase, and its corresponding specific
process is
as follows: a pedestrian detecting result is comprehensively recognized by
comparing
pixels in the full-scene top-view depth picture and pixels in the full-scene
top-view depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture.
[0061] As can be seen, this embodiment utilizes specific viewing angles, such
as the overhead
shooting mode, to obtain depth images and establish background masks, solves
the
problem concerning information missing due to shielding by oblique shooting,
and
enlarges application scenarios of pedestrian detection; in addition, the use
of depth
cameras increases information dimensions of images as compared with the use of
ordinary cameras, whereby data of 3D spatial coordinates including the human
height and
the head can be obtained, and precision in pedestrian detection data is
enhanced. Through
the distributed layout of multiple depth cameras, it is made possible to adapt
to
complicated monitored scenarios where a great deal of shielding is present,
and the use
of both depth images and color images as judging conditions makes it possible to further enhance precision in pedestrian detection data.
[0062] As should be noted, the first depth image, second depth image and third
depth image in
the above embodiment differ from one another only in terms of purposes of use,
of which
the first depth image is used to create the ground fitting formula and the
marker fitting
formula, the second depth image is used to update the background mask, and the
third
depth image is used to obtain a real-time detected image of human body
detection data.
For instance, the first frame of image obtained through overhead shooting of a
monitored
region by a depth camera is taken to serve as the first depth image, the
second to the
hundredth frames of depth images are taken to serve as second depth images;
after the
background mask has been updated to completion, the real-time image obtained
through
overhead shooting of the monitored region by the depth camera is taken to
serve as the
third depth image.
[0063] In this embodiment, the step of creating a background mask
corresponding to each depth
camera according to a first depth image captured by each depth camera
includes:
[0064] frame-selecting a ground region from the first depth image captured by
each depth camera
to create a ground fitting formula, and frame-selecting at least one marker
region to create
a marker fitting formula corresponding to the marker region in a one-to-one
manner;
creating the ground mask corresponding to each depth camera according to the
ground
fitting formula, and creating the marker mask corresponding to each depth
camera
according to the marker fitting formula; and merging the ground mask and the
marker
mask to form the background mask corresponding to each depth camera.
[0065] During specific implementation, explanation is now made with an example
of creating a
background mask for a first depth image captured by one of the depth cameras.
The
method of creating a ground fitting formula based on the ground region frame-
selected
from the first depth image includes:
[0066] S11 - making statistics on a data collection corresponding to the
ground region, the data
collection including a plurality of pixel coordinates and depth values
corresponding
thereto;
[0067] S12 - randomly selecting n pixels from the ground region to create a
ground initial dataset,
where n>3 and n is an integer;
[0068] S13 - creating an initial ground fitting formula based on the currently
selected n pixels,
traversing pixels not selected in the initial dataset, and sequentially
substituting the pixels
in the initial ground fitting formula to calculate ground fitting values of
the corresponding
pixels;
[0069] S14 - screening out ground fitting values that are smaller than a first
threshold, and
generating ith round of effective ground fitting value collection, where the
initial value of
i is 1;
[0070] S15 - when a ratio of the number of pixels to which the ith round of
effective ground
fitting value collection corresponds to the total number of pixels in the
ground region is
greater than a second threshold, accumulating the entire ground fitting values
in the ith
round of effective ground fitting value collection;
[0071] S16 - when the accumulating result of the entire ground fitting values in the ith round is
smaller than a third threshold, the initial ground fitting formula to which
the ith round
corresponds is defined as the ground fitting formula, when the accumulating
result of the
entire ground fitting values to which the ith round corresponds is greater
than the third
threshold, let i = i+1, and returning to step S12 when i does not reach a
threshold number
of rounds, otherwise executing step S17; and
[0072] S17 - defining the initial ground fitting formula, to which the minimum
value of the
accumulating results of the entire ground fitting values in all rounds
corresponds, as the
ground fitting formula.
[0073] The method of creating a corresponding marker fitting formula based on
the marker
region includes:
[0074] S21 - making statistics on a data collection corresponding to the marker region in a one-to-one manner, the data collection including a plurality of pixel coordinates and depth values corresponding thereto;
[0075] S22 - randomly selecting n image points from the marker region to
create a marker initial
dataset, where n>3 and n is an integer;
[0076] S23 - creating an initial marker fitting formula based on the currently
selected n pixels,
traversing pixels not selected in the initial dataset, and sequentially
substituting the pixels
in the initial marker fitting formula to calculate marker fitting values of
the corresponding
pixels;
[0077] S24 - screening out marker fitting values that are smaller than a first
threshold, and
generating ith round of effective marker fitting value collection, where the
initial value of
i is 1;
[0078] S25 - when a ratio of the number of pixels to which the ith round of
effective marker
fitting value collection corresponds to the total number of pixels in the
marker region is
greater than a second threshold, accumulating the entire marker fitting values
in the ith
round of effective marker fitting value collection;
[0079] S26 - when the accumulating result of the entire marker fitting values
in the ith round is
smaller than a third threshold, the initial marker fitting formula to which
the ith round
corresponds is defined as the marker fitting formula, when the accumulating
result of the
entire marker fitting values to which the ith round corresponds is greater
than the third
threshold, let i = i+1, and returning to step S22 when i does not reach a
threshold number
of rounds, otherwise executing step S27; and
[0080] S27 - defining the initial marker fitting formula, to which the minimum
value of the
accumulating results of the entire marker fitting values in all rounds
corresponds, as the
marker fitting formula.
[0081] Explanation is made below with the ground fitting formula as an example: a ground
region is firstly frame-selected through an interactive mode set by a program,
a data
collection is screened out to contain only ground image points, three pixels
are thereafter
randomly selected to create a ground initial dataset, and an initial ground
fitting formula
is fitted by employing a plane formula, a_i·x + b_i·y + c_i·z + d_i = 0, where i
represents the
serial number of a depth camera, if only one depth camera is used in the full
scene, then
i is valuated as 1, that is to say, the ground fitting formula is created only
with respect to
the first depth image captured by this one depth camera; if w depth cameras
are used in
the full scene, the valuation of i is traversed respectively through 1 to w,
that is to say,
corresponding ground fitting formulae should be created one by one with
respect to first
depth images captured by the w depth cameras.
[0082] After the initial ground fitting formula has been created, pixels not
selected in the initial
dataset are traversed (except for the three already selected pixels), world
coordinate
values (x, y, z) to which each pixel corresponds are sequentially substituted in the initial ground fitting formula |a_i·x + b_i·y + c_i·z + d_i| to calculate ground fitting values error_current to which the traversed pixels correspond, the ground fitting
values that are
smaller than a first threshold e are screened out to form an effective ground
fitting value
collection corresponding to this round of initial ground fitting formula, when
a ratio of
the number of corresponding pixels in this round of effective ground fitting
value
collection to the total number of pixels in the ground region is greater than
a second
threshold d, the entire ground fitting values in this round of effective
ground fitting value
collection are accumulated to obtain a result error_sum, and when error_sum < error_best
in this round, where error_best is a third threshold, the ground fitting
formula is created
on the basis of the values of a, b, c, d in this round of initial ground
fitting formula,
whereas when error_sum > error_best in this round, the above steps should be
repeated to
enter the next round, i.e., three image points are selected anew to create a
ground initial
dataset, initial ground fitting formulae are created and a result of
accumulating the entire
ground fitting values in this round is obtained, until the initial ground
fitting formula to
which the minimum value of the result of accumulating the entire ground
fitting values
in all rounds corresponds is defined as the ground fitting formula.
[0083] Through the above process it is made possible to effectively avoid
interference from some
abnormal points, and the ground fitting formula as calculated is more fit to
the ground; in
addition, since the values of a, b, c, d in the ground fitting formula are
calculated by a
random consistency algorithm, the resultant ground fitting formula can be used
as the
optimal model of the ground region in the first depth image, the interference
of abnormal
points is effectively filtered out, and the established ground equation is
prevented from
deviating from the ground.
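For illustration only, the random-consistency fitting of steps S11 to S17 could be written in Python roughly as follows. Normalising the plane normal (so that the fitting value |a·x + b·y + c·z + d| is a point-to-plane distance), the concrete threshold values and the helper names are assumptions made for the sketch, not requirements of this disclosure.

import numpy as np

def fit_plane(points, e=0.02, ratio_thresh=0.6, max_rounds=100, rng=None):
    # points       : (N, 3) world coordinates (x, y, z) of the frame-selected region
    # e            : first threshold  - maximum fitting value of an effective (inlier) pixel
    # ratio_thresh : second threshold - minimum fraction of effective pixels per round
    # max_rounds   : threshold number of rounds
    rng = np.random.default_rng() if rng is None else rng
    best_plane, error_best = None, np.inf
    for _ in range(max_rounds):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                                  # degenerate (collinear) sample
            continue
        a, b, c = normal / norm
        d = -float(np.dot(normal / norm, sample[0]))
        fitting_values = np.abs(points @ (normal / norm) + d)   # error_current per pixel
        effective = fitting_values < e
        if effective.mean() > ratio_thresh:
            error_sum = float(fitting_values[effective].sum())
            if error_sum < error_best:                   # keep the best round, as in step S17
                error_best, best_plane = error_sum, (a, b, c, d)
    return best_plane

The same routine, applied to the pixels of a frame-selected marker region, would yield the corresponding marker fitting formula.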
[0084] By the same token, the process of creating the marker fitting formula
is logically
consistent with the process of creating the ground fitting formula, so it is
not redundantly
described in this embodiment, but as should be stressed that, since there are
usually more
than one marker region, so marker fitting formulae should correspond to the
plural marker
regions in a one-to-one manner.
[0085] In this embodiment, the method of merging the ground mask and the
marker mask to form
a background mask corresponding to each depth camera includes:
[0086] creating a ground equation on the basis of the ground fitting formula,
and creating a
marker equation on the basis of the marker fitting formula; traversing pixels
in the first
depth image, and respectively substituting the pixels in the ground equation
and the
marker equation to obtain ground distances and marker distances of the pixels;
screening
out the pixels whose ground distances are smaller than a ground threshold to
be filled as
the ground mask, and screening out the pixels whose marker distances are
smaller than a
marker threshold to be filled as the marker masks; and merging the ground mask
and the
entire marker masks to obtain a background mask to which the depth camera
under the
current scenario corresponds.
[0087] During specific implementation, a general equation distance = |a_i·x + b_i·y + c_i·z + d_i| / √(a² + b² + c²) is employed to respectively calculate the ground equation and the marker equation: when the numerator |a_i·x + b_i·y + c_i·z + d_i| is a ground fitting formula, and when a, b, c in the denominator are values in the ground fitting formula, this equation represents a ground equation; when the numerator |a_i·x + b_i·y + c_i·z + d_i| is a marker fitting formula, and when a, b, c in the denominator are values in the marker fitting formula, this equation
represents a marker equation. After the ground equation and the marker
equation have
been created to completion, ground distances and marker distances of the
entire pixels in
the first depth image are obtained by traversing the pixels and respectively
substituting
the pixels in the ground equation and the marker equation, the pixels whose
ground
distances are smaller than a ground threshold are screened out to be filled as
the ground
mask, and the pixels whose marker distances are smaller than a marker
threshold are
screened out to be filled as the marker mask.
[0088] Exemplarily, the ground threshold and the marker threshold are both set
as 10cm, that is
to say, the region within 10cm of the ground is defined as a ground mask, the
region
within 10cm of the marker is defined as a marker mask, and finally the regions
of the
ground mask and the entire marker masks are defined as the background mask of
the
current scenario. Through the creation of the background mask, it is made
possible to
effectively filter out noises on the marker region(s) and the ground region,
and to solve
the problem concerning reduction in algorithm performance caused by noises
generated
by depth cameras shooting these regions. For instance, the marker is a shelf.
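As a non-limiting sketch, the mask construction of this example can be expressed in Python as follows. The (H, W, 3) array of per-pixel world coordinates and the helper names are assumptions, and only the region selection is shown; in practice the depth values of the selected pixels would be carried along with the mask so that it can later be compared against the second depth images.

import numpy as np

def build_background_mask(world_xyz, ground_plane, marker_planes,
                          ground_thresh=0.10, marker_thresh=0.10):
    # world_xyz     : (H, W, 3) world coordinates of every pixel of the first depth image
    # ground_plane  : (a, b, c, d) coefficients of the ground fitting formula
    # marker_planes : list of (a, b, c, d) coefficients, one per frame-selected marker region
    def plane_distance(plane):
        a, b, c, d = plane
        # distance = |a*x + b*y + c*z + d| / sqrt(a^2 + b^2 + c^2)
        return np.abs(world_xyz @ np.array([a, b, c]) + d) / np.sqrt(a*a + b*b + c*c)

    ground_mask = plane_distance(ground_plane) < ground_thresh      # within 10 cm of the ground
    background_mask = ground_mask.copy()
    for plane in marker_planes:
        background_mask |= plane_distance(plane) < marker_thresh    # within 10 cm of a marker
    return background_mask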
[0089] In this embodiment, the method of updating the background mask on the
basis of pixels
in plural frames of second depth images continuously captured by each depth
camera,
and pixels in the background mask corresponding to each depth camera includes:
[0090] comparing depth values of pixels at various corresponding locations in
the mth frame of
second depth image and the (m+1)th frame of second depth image captured by the
same
depth camera, where an initial value of m is 1; recognizing pixels whose depth
values are
changed, updating the depth value of a pixel at a corresponding location in
the (m+1)th
frame of second depth image as a small value in the comparing result, let m =
m+1, and
comparing again depth values of pixels at various corresponding locations in
the mth
frame of second depth image and the (m+1)th frame of second depth image, until
pixels at
various locations in the last frame of second depth image and their
corresponding depth
values are obtained; comparing the pixels at various locations in the last
frame of second
depth image and their corresponding depth values with pixels at various
locations in the
background mask and their corresponding depth values; and recognizing pixels
whose
depth values are changed, and updating the depth value of a pixel at a
corresponding
location in the background mask as a small value in the comparing result.
[0091] During specific implementation, internal parameters and external
parameters of each
depth camera are firstly calibrated to perform transformation of the image
from two-
dimensional coordinates to three-dimensional coordinates, so that relevant
calculations
are made through practical physical meanings. Subsequently, each depth camera
is used
to continuously capture 100 frames of second depth images, and background
update is
performed on the background mask with respect to the 100 frames of second
depth images
captured by each depth camera. The updating process is as follows: by
comparing the
depth values of pixels (row, col) at various identical locations in the 100
frames of second
depth images, the minimum values of the corresponding depth values of pixels
(row, col)
at each identical location are screened out of the 100 frames of second depth
images, so
that the corresponding depth values of pixels (row, col) at various locations
in the 100
frames of second depth images as output are all minimum values in the 100
frames of
second depth images; the aim of this setup is as follows: since the depth cameras
employ the
overhead shooting scheme, when a moving object (such as a passing pedestrian)
appears
in the second depth images, the depth values of pixels at corresponding
locations become
larger, by taking the minimum values of the corresponding depth values of
pixels at
identical locations in the 100 frames of second depth images, it is made
possible to
effectively prevent the second depth images from being interfered with passing
objects
that occasionally appear, and avoid the appearance of pixels of passing
objects in the
background mask. Thereafter, the pixels at various locations in the 100 frames
of second
depth images and their corresponding depth values are compared with pixels at
various
locations in the background mask and their corresponding depth values, pixels
whose
depth values are changed are recognized, and the depth values of pixels at
corresponding
locations in the background mask are updated as small values in the comparing
result, so
as to ensure precision of the updated background mask.
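The running-minimum update described above can be sketched as follows; this assumes the 100 second depth images of one camera are stacked into a single array and that the background mask stores one depth value per pixel (the array names are illustrative only).

```python
import numpy as np

def update_background(background_depth, second_frames):
    """Update a per-camera background depth map from consecutive frames.

    background_depth: HxW depth values of the current background mask.
    second_frames:    NxHxW stack of the N (e.g. 100) second depth images.
    Returns the updated background depth map.
    """
    # The pairwise minimum over consecutive frames collapses to a minimum over
    # the whole stack; it suppresses the larger values caused by passing objects.
    min_over_frames = second_frames.min(axis=0)
    # Where the value differs from the stored background, keep the smaller one.
    return np.minimum(background_depth, min_over_frames)
```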
[0092] In this embodiment, the method of obtaining a full-scene top-view depth
background
picture and a full-scene top-view color background picture after coordinate-
transforming
and merging pixels in the background masks to which the various depth cameras
correspond includes:
[0093] creating a full-scene top-view depth background blank template picture
and a full-scene
top-view color background blank template picture, wherein depth values of
pixels at
various locations in the full-scene top-view depth background blank template
picture are
zero, and color values of pixels at various locations in the full-scene top-
view color
background blank template picture are zero; merging and unifying pixels in the
background masks to which the various depth cameras correspond to form a full-
scene
background mask, uniformly transforming the pixel coordinates to world
coordinates,
and then uniformly transforming the world coordinates to top-view coordinates;
sequentially traversing pixels in the full-scene background mask, comparing a
depth
value of each pixel with depth values of pixels at corresponding locations in
the full-scene
top-view depth background blank template picture, and replacing pixels at
corresponding
locations in the full-scene top-view depth background blank template picture
with large-
value pixels in the full-scene background mask, to obtain a full-scene top-
view depth
background picture; and on the basis of the pixels for which replacement occurred, replacing the pixels at the corresponding locations in the full-scene top-view color background blank template picture with the color values of those pixels in the full-scene background mask, to obtain a full-scene top-view color background picture.
[0094] During specific implementation, the depth values of pixels at various locations in the created full-scene top-view depth background blank template picture are zero, namely back_depth(row, col) = 0, and the color values of pixels at various locations in the created full-scene top-view color background blank template picture are zero, namely back_color(row, col) = [0, 0, 0]. Thereafter, the pixels in the background masks to which the various depth cameras correspond are merged, that is to say, the pixels in the background masks to which the various depth cameras correspond are uniformly expressed in the same, single pixel coordinate system to form a full-scene background mask; the various pixels in the full-scene background mask are then uniformly transformed from pixel coordinates to world coordinates, and subsequently uniformly transformed from the world coordinates to top-view coordinates under the current monitored scenario. The coordinate transformation process is well known to persons skilled in the art and is not redundantly described in this embodiment. Consequently, the pixel comparison formula current_depth(row, col) > back_depth(row, col) is employed to compare the depth value of each pixel [current_depth(row, col)] in the full-scene background mask with the depth value of the pixel [back_depth(row, col)] at the corresponding location in the full-scene top-view depth background blank template picture; the full-scene top-view depth background picture formula back_depth(row, col) = current_depth(row, col) is employed to replace the pixels at the corresponding locations in the full-scene top-view depth background blank template picture with the large-value pixels in the full-scene background mask, to obtain a full-scene top-view depth background picture; and the full-scene top-view color background picture formula back_color(row, col) = current_color(row, col) is employed to replace the pixels at the corresponding locations in the full-scene top-view color background blank template picture with the color values of the pixels for which replacement occurred, to obtain a full-scene top-view color background picture.
[0095] Understandably, current_depth(row, col) represents the depth values of pixels in the full-scene background mask, and back_depth(row, col) represents the depth values of pixels in the full-scene top-view depth background blank template picture; the formula back_depth(row, col) = current_depth(row, col) represents assigning the depth value of a pixel at a certain coordinate location in the full-scene background mask to the pixel at the corresponding location in the full-scene top-view depth background blank template picture, namely replacing the pixel at the corresponding location in the full-scene top-view depth background blank template picture. By the same token, current_color(row, col) represents the color values of pixels in the full-scene background mask, and back_color(row, col) represents the color values of pixels in the full-scene top-view color background blank template picture; the formula back_color(row, col) = current_color(row, col) represents assigning the color value of a pixel at a certain coordinate location in the full-scene background mask to the pixel at the corresponding location in the full-scene top-view color background blank template picture. A full-scene top-view depth background picture and a full-scene top-view color background picture are formed once the various pixels have all been traversed.
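A minimal sketch of this merge, assuming each camera's background mask has already been projected to top-view pixel coordinates and carries a depth value and a color per point; the point layout and template sizes below are illustrative assumptions, not the patent's data structures.

```python
import numpy as np

def merge_backgrounds(points, height, width):
    """Merge projected background points from all cameras into full-scene
    top-view depth and color background pictures.

    points: iterable of (row, col, depth, color) tuples in top-view coordinates,
            pooled over all depth cameras (hypothetical pre-projected input).
    """
    back_depth = np.zeros((height, width), dtype=np.float32)    # blank depth template
    back_color = np.zeros((height, width, 3), dtype=np.uint8)   # blank color template
    for row, col, depth, color in points:
        # Keep the larger depth value at each top-view location, and copy the
        # color of the pixel that caused the replacement.
        if depth > back_depth[row, col]:
            back_depth[row, col] = depth
            back_color[row, col] = color
    return back_depth, back_color
```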
[0096] In this embodiment, the method of splitting the full-scene top-view
depth background
picture into separate top-view depth background pictures corresponding to each
depth
camera, and splitting the full-scene top-view color background picture into
separate top-
view color background pictures corresponding to each depth camera includes:
[0097] on the basis of top-view coordinates of pixels of the background mask
to which each
depth camera corresponds, splitting the full-scene top-view depth background
picture into
separate top-view depth background pictures corresponding to each depth
camera, and
splitting the full-scene top-view color background picture into separate top-
view color
background pictures corresponding to each depth camera.
[0098] During specific implementation, sensor_depth[k] represents the separate top-view depth background picture to which the kth depth camera corresponds, and back_depth represents the full-scene top-view depth background picture; the formula sensor_depth[k](row, col) = back_depth(row, col) is employed to split the full-scene top-view depth background picture into the separate top-view depth background picture to which the kth depth camera corresponds, in which back_depth(row, col) represents the depth value of a pixel at a certain coordinate location in the full-scene top-view depth background picture, and sensor_depth[k](row, col) represents the depth value of the pixel at the corresponding coordinate location in the separate top-view depth background picture to which the kth depth camera corresponds; the formula sensor_depth[k](row, col) = back_depth(row, col) thus represents assigning the depth value of a pixel at a certain coordinate location in the full-scene top-view depth background picture to the pixel at the corresponding location in the separate top-view depth background picture to which the kth depth camera corresponds. By the same token, sensor_color[k] represents the separate top-view color background picture to which the kth depth camera corresponds, and back_color represents the full-scene top-view color background picture; the formula sensor_color[k](row, col) = back_color(row, col) is employed to split the full-scene top-view color background picture into the separate top-view color background picture to which the kth depth camera corresponds.
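One possible way to sketch the splitting step, assuming a hypothetical coverage mapping that records, for each camera, the top-view coordinates covered by its background mask; the patent does not name such a structure, so it is introduced purely for illustration.

```python
import numpy as np

def split_backgrounds(back_depth, back_color, coverage):
    """Split full-scene top-view backgrounds into per-camera backgrounds.

    coverage: dict mapping camera index k to an Nx2 array of (row, col)
              top-view coordinates covered by that camera's background mask.
    """
    sensor_depth, sensor_color = {}, {}
    for k, coords in coverage.items():
        depth_k = np.zeros_like(back_depth)
        color_k = np.zeros_like(back_color)
        rows, cols = coords[:, 0], coords[:, 1]
        # sensor_depth[k](row, col) = back_depth(row, col), and likewise for color.
        depth_k[rows, cols] = back_depth[rows, cols]
        color_k[rows, cols] = back_color[rows, cols]
        sensor_depth[k], sensor_color[k] = depth_k, color_k
    return sensor_depth, sensor_color
```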
[0099] In this embodiment, the method of updating pixels in a foreground
region into the top-
view depth background picture and the top-view color background picture of the
corresponding depth camera by recognizing the foreground region that contains
human
body pixels in third depth images obtained in real time by the various depth
cameras
includes:
[0100] comparing the depth values of pixels in the third depth images obtained in real time by the depth cameras with the depth values of pixels of the corresponding separate top-view depth background pictures; employing a frame difference method to recognize the pixels whose depth values are the smaller values in the third depth images, and summarizing them to obtain a foreground region that contains human body pixels; correspondingly matching and associating the pixels in the foreground region with the pixels of the separate top-view depth background pictures in a one-to-one manner, and replacing the depth values of the pixels in the separate top-view depth background pictures with the depth values of the pixels in the corresponding foreground region; and recognizing the pixels for which replacement occurred in the separate top-view depth background pictures, and replacing the corresponding pixels in the separate top-view color background pictures with the color values of the pixels in the foreground region. Through such a frame difference method, it is made possible to effectively filter out noise from the third depth images obtained in real time and to enhance the precision of foreground region recognition.
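A minimal sketch of this frame-difference foreground update for a single camera, assuming the real-time third depth image has already been aligned with that camera's separate top-view backgrounds; the array names are illustrative.

```python
import numpy as np

def update_foreground(third_depth, third_color, sensor_depth_k, sensor_color_k):
    """Fold the foreground (human body) region of a real-time frame into the
    per-camera top-view depth/color pictures via a frame-difference test.
    """
    # Foreground pixels are those whose depth values are the smaller values
    # when compared with the separate top-view depth background picture.
    foreground = third_depth < sensor_depth_k
    top_depth = sensor_depth_k.copy()
    top_color = sensor_color_k.copy()
    top_depth[foreground] = third_depth[foreground]   # replace depth in the foreground region
    top_color[foreground] = third_color[foreground]   # replace color where depth was replaced
    return top_depth, top_color, foreground
```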
[0101] During specific implementation, in order to reduce the number of pixels and enhance computing speed, the voxel filtering method can be employed to down-sample the pixels. Exemplarily, the voxel size is set as vox_size = (0.1, 0.1, 0.1), and the sparse outlier removal method is employed to filter out some of the pixels based on the distances between adjacent pixels and a multiple of the standard deviation, so as to effectively reduce interference from outlying noise.
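As one possible realization of the voxel filtering and sparse outlier removal mentioned above, the sketch below uses the Open3D library; the patent does not name a particular library, and the neighbour count and standard-deviation ratio are illustrative values.

```python
import numpy as np
import open3d as o3d

def thin_point_cloud(points_xyz, voxel_size=0.1, nb_neighbors=20, std_ratio=2.0):
    """Down-sample points with a voxel grid, then drop sparse outliers whose
    mean distance to their neighbours exceeds a multiple of the standard deviation.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)      # Nx3 world coordinates
    pcd = pcd.voxel_down_sample(voxel_size=voxel_size)       # voxel filtering, e.g. 0.1 m voxels
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)
    return np.asarray(pcd.points)
```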
[0102] In this embodiment, the method of merging the top-view depth pictures
of the various
depth cameras to form a full-scene top-view depth picture, and merging the top-
view
color pictures of the various depth cameras to form a full-scene top-view
color picture
includes:
[0103] traversing the pixels in the corresponding top-view depth picture of each depth camera, and replacing the depth values of the pixels at corresponding locations in the full-scene top-view depth background picture with them, to obtain a full-scene top-view depth picture; and recognizing the pixels for which replacement occurred in the full-scene top-view depth picture, and replacing the color values of the pixels at corresponding locations in the full-scene top-view color background picture, to obtain a full-scene top-view color picture.
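A minimal sketch of this merging step, assuming the foreground masks produced by the previous step are carried along so that only the replaced pixels are written back; this interpretation and the array layout are assumptions made for illustration.

```python
import numpy as np

def merge_full_scene(back_depth, back_color, top_depths, top_colors, foregrounds):
    """Merge per-camera top-view depth/color pictures into full-scene pictures,
    starting from the full-scene top-view background pictures.

    top_depths / top_colors / foregrounds: per-camera lists aligned to the same
    top-view grid (hypothetical outputs of the foreground update step).
    """
    scene_depth = back_depth.copy()
    scene_color = back_color.copy()
    for depth_k, color_k, fg_k in zip(top_depths, top_colors, foregrounds):
        scene_depth[fg_k] = depth_k[fg_k]   # replace depth where this camera saw foreground
        scene_color[fg_k] = color_k[fg_k]   # replace color at the same locations
    return scene_depth, scene_color
```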
[0104] In this embodiment, the method of recognizing a pedestrian detecting
result by comparing
pixels in the full-scene top-view depth picture and pixels in the full-scene
top-view depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture includes:
[0105] comparing the pixels whose depth values have changed between the full-scene top-view depth picture and the full-scene top-view depth background picture, and, on the basis of the area of a dense region of such pixels and the depth values of the various pixels, recognizing a head volume and/or a body volume; and recognizing a pedestrian detecting result on the basis of the size of the head volume and/or the body volume.
[0106] During specific implementation, considering that there might be unintentional captures in the detecting result, it is possible to filter out such captures according to actual physical features. By transforming the full-scene top-view depth picture to actual world coordinates, the physical volume of the human body, such as the physical volume of the human head, etc., is calculated in the foreground region in conjunction with a human body detection frame; for instance, the boundary lengths and widths of the human body and the human head are calculated on the basis of pixel coordinates, and the physical volume of the human body and the physical volume of the human head are then calculated in combination with the depth values.
[0107] If Vbody_max > Vbody > Vbody_min is satisfied, the human body volume requirement is met;
[0108] If Vhead_max > Vhead > Vhead_min is satisfied, the human head volume requirement is met;
[0109] where Vbody represents the physical volume of the human body as detected, Vhead represents the physical volume of the human head as detected, Vbody_max and Vbody_min represent the preset upper limit and lower limit for recognition of the physical volume of the human body, and Vhead_max and Vhead_min represent the preset upper limit and lower limit for recognition of the physical volume of the human head. If only the human body but no human head is detected in the full-scene top-view depth picture, a human-head searching mode is started to automatically search for the human head frame in the full-scene top-view depth picture through an algorithm. Through the human head frame searching function, it is made possible to effectively call back the human head frame missing from the full-scene top-view depth picture, and thus to enhance algorithm stability.
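The volume check can be sketched as below, assuming the body and head volumes have already been computed from the detection frame, the pixel extents, and the depth values; the numeric limits are illustrative placeholders, not values given in the patent.

```python
def passes_volume_check(v_body, v_head,
                        v_body_min=0.02, v_body_max=0.5,
                        v_head_min=0.002, v_head_max=0.02):
    """Return True when both detected volumes (in cubic metres) fall inside the
    preset recognition limits Vbody_min..Vbody_max and Vhead_min..Vhead_max.
    """
    body_ok = v_body_min < v_body < v_body_max
    head_ok = v_head_min < v_head < v_head_max
    return body_ok and head_ok
```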
[0110] During specific implementation, the boundary pixels of the foreground region in the full-scene top-view depth picture are recognized through the frame difference method; i.e., the foreground region of the human body ROI is represented by bird_depth_map_mask_roi, and the formula bird_depth_map_mask_roi = bird_depth_map_mask[row_min:row_max, col_min:col_max] is employed to recognize the boundary pixels of the foreground region, where row_min and row_max represent the lower limit and the upper limit of the pixels along the x axis, and col_min and col_max represent the lower limit and the upper limit of the pixels along the y axis. Moreover, in order to accelerate computation, it is possible to employ integral image accumulation, namely to accumulatively add the depth values of plural pixels, until the location of a human head frame is demarcated on reaching a threshold range. Thereafter, head points are searched for within the human head frame; in other words, a head-point circle is traversed over the human head frame, and the head-point region is located on the basis of the ratio of the foreground pixels inside the head-point circle to all pixels inside the circle. Through the aforementioned head-point searching mechanism, it is made possible to effectively filter out interference from noise, avoid instability of head points caused by noise, and prevent noise from adversely affecting the body height and subsequent tracking.
[0111] Subsequently, it is further possible to calculate the human body height on the basis of the average of the depth values of the pixels in the head-point region, by employing the formula distance = |a·x_t + b·y_t + c·z_t + d| / √(a² + b² + c²), together with the 2D or 3D head-point coordinates.
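A minimal sketch of this height computation, assuming the ground plane ax + by + cz + d = 0 was obtained during calibration and the head point is the averaged 3D position of the head-region pixels; the function and parameter names are illustrative.

```python
import numpy as np

def body_height(head_point_xyz, plane_abcd):
    """Distance from the averaged head point to the ground plane ax+by+cz+d=0,
    used as the pedestrian's body height.
    """
    a, b, c, d = plane_abcd
    x, y, z = head_point_xyz
    return abs(a * x + b * y + c * z + d) / np.sqrt(a**2 + b**2 + c**2)
```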
[0112] In summary, this embodiment exhibits the following creative aspects:
[0113] Through the distributed layout of multiple depth cameras, it is made possible to adapt to complicated monitored scenarios where a great deal of shielding is present; by slightly overlapping the specific viewing angles of the depth cameras, it is possible to make maximal use of the coverage area of the cameras' viewing angles, and to obtain a full-scene top-view depth picture of the entire monitored scenario in conjunction with the merging rules.
[0114] The use of RGBD depth cameras increases the information dimensions: a full-scene top-view depth picture of the particular viewing angles is obtained by merging depth information, and a full-scene top-view color picture of the particular viewing angles is obtained by merging color information. Pedestrian detection can be effectively performed through the full-scene top-view color picture, secondary verification can be performed on the detecting result in conjunction with the depth information, and information relevant to height can be obtained.
[0115] The use of the merging mode whereby the foreground and the background are respectively merged can reduce the merging of irrelevant background, effectively shorten the overall merging time, and hence enhance algorithm performance.
[0116] The use of refined and simplified algorithmic logic, for instance the human head frame searching function, can avoid the circumstance in which the pedestrian cannot be subsequently tracked due to the absence of the head frame, and enhances the robustness of the algorithm.
[0117] The solution of the above embodiment separately carries out foreground
detection, and a
full-scene top-view picture of the entire scenario is finally integrated via a
merging
module, whereby waste of computational resources can be effectively reduced,
and
computing speed is enhanced.
[0118] Embodiment 2
[0119] This embodiment provides a pedestrian detecting device that comprises:
[0120] a mask creating unit, for creating a background mask corresponding to
each depth camera
according to a first depth image captured by each depth camera, wherein the
background
mask includes a ground mask and a marker mask;
[0121] a mask updating unit, for respectively updating background masks to
which various depth
cameras correspond on the basis of pixels in plural frames of second depth
images
continuously captured by each depth camera, and pixels in the background mask
corresponding to each depth camera;
[0122] a mask merging unit, for obtaining a full-scene top-view depth
background picture and a
full-scene top-view color background picture after coordinate-transforming and
merging
pixels in the background masks to which the various depth cameras correspond;
[0123] a background splitting unit, for splitting the full-scene top-view
depth background picture
into separate top-view depth background pictures corresponding to each depth
camera,
and splitting the full-scene top-view color background picture into separate
top-view
color background pictures corresponding to each depth camera;
[0124] a foreground recognizing unit, for updating pixels in a foreground
region into the top-
view depth background picture and the top-view color background picture of the
corresponding depth camera by recognizing the foreground region that contains
human
body pixels in third depth images obtained in real time by the various depth
cameras, so
as to update the top-view depth picture and the top-view color picture of each
depth
camera;
[0125] a full-scene merging unit, for merging the top-view depth pictures of
the various depth
cameras to form a full-scene top-view depth picture, and merging the top-view
color
pictures of the various depth cameras to form a full-scene top-view color
picture; and
[0126] a pedestrian detecting unit, for recognizing a pedestrian detecting
result by comparing
pixels in the full-scene top-view depth picture and pixels in the full-scene
top-view depth
background picture, and comparing pixels in the full-scene top-view color
picture and
pixels in the full-scene top-view color background picture.
[0127] In comparison with prior-art technology, the advantageous effects achievable by the pedestrian detecting device provided by this embodiment of the present invention are identical to the advantageous effects achieved by the pedestrian detecting method provided by the aforementioned Embodiment 1, and they are therefore not repeated here.
[0128] Embodiment 3
[0129] This embodiment provides a computer-readable storage medium storing thereon a computer program that, when run by a processor, executes the steps of the aforementioned pedestrian detecting method.
[0130] In comparison with prior-art technology, the advantageous effects achievable by the computer-readable storage medium provided by this embodiment are identical to the advantageous effects achieved by the pedestrian detecting method provided by the aforementioned technical solution, and they are therefore not repeated here.
[0131] As is comprehensible to persons ordinarily skilled in the art, all or some of the steps realizing the method of the present invention can be completed through a program that instructs relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, performs the various steps of the method of the embodiments; and the storage medium can be a ROM/RAM, a magnetic disk, an optical disk, a memory card, etc.
[0132] What is described above is merely directed to specific embodiments of
the present
invention, but the protection scope of the present invention is not restricted
thereby. Any
variation or replacement easily conceivable to persons skilled in the art
within the
technical range disclosed by the present invention shall be covered within the
protection
scope of the present invention. Accordingly, the protection scope of the
present invention
shall be based on the Claims.
Representative Drawing
A single figure which represents a drawing illustrating the invention.
Administrative Status

2024-08-01: As part of the transition to Next-Generation Patents (NGP), the Canadian Patents Database (CPD) now contains a more detailed Event History, which reproduces the Event Log of our new in-house solution.

Please note that events beginning with "Inactive:" refer to events that are no longer used in our new in-house solution.

For a better understanding of the status of the application or patent shown on this page, the Caution section and the descriptions of Patent, Event History, Maintenance Fees and Payment History should be consulted.

Event History

Description Date
Amendment received - response to examiner's requisition 2024-03-11
Amendment received - voluntary amendment 2024-03-11
Examiner's report 2023-12-14
Inactive: Report - No QC 2023-12-13
Letter sent 2023-02-03
Inactive: Cover page published 2022-10-11
Request for examination received 2022-09-16
All requirements for examination - determined compliant 2022-09-16
Requirements for request for examination - determined compliant 2022-09-16
Application published (open to public inspection) 2022-09-02
Inactive: First IPC assigned 2022-08-01
Inactive: IPC assigned 2022-08-01
Inactive: IPC assigned 2022-08-01
Inactive: IPC assigned 2022-08-01
Inactive: IPC assigned 2022-08-01
Inactive: IPC assigned 2022-08-01
Inactive: First IPC assigned 2022-08-01
Filing requirements determined compliant 2022-03-17
Letter sent 2022-03-17
Priority claim requirements determined compliant 2022-03-16
Request for priority received 2022-03-16
Application received - regular national 2022-03-01
Inactive: Pre-classification 2022-03-01
Inactive: QC images - scanning 2022-03-01

Abandonment History

There is no abandonment history.

Maintenance Fees

The last payment was received on 2023-12-15.

Notice: If full payment has not been received by the date indicated, a further fee may be required, being one of the following:

  • reinstatement fee;
  • late payment fee; or
  • additional fee to reverse a deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type | Anniversary | Due Date | Date Paid
Filing fee - standard | | 2022-03-01 | 2022-03-01
Request for examination - standard | | 2026-03-02 | 2022-09-16
MF (application, 2nd anniversary) - standard | 02 | 2024-03-01 | 2023-12-15
Owners on Record

The current and past owners on record are shown in alphabetical order.

Current owners on record
10353744 CANADA LTD.
Past owners on record
HUAIYUAN JI
JIANG LIU
WEI JING
YANTAO YIN
YINJUN HUANG
Past owners that do not appear in the list of "Owners on Record" will appear in other documents on file.
Documents

List of published and unpublished patent documents on the CPD.



Document Description | Date (yyyy-mm-dd) | Number of Pages | Image Size (KB)
Claims | 2024-03-10 | 88 | 4,715
Description | 2022-02-28 | 30 | 1,476
Claims | 2022-02-28 | 6 | 287
Abstract | 2022-02-28 | 1 | 23
Drawings | 2022-02-28 | 1 | 86
Cover Page | 2022-10-10 | 2 | 85
Representative Drawing | 2022-10-10 | 1 | 38
Amendment / response to report | 2024-03-10 | 191 | 7,243
Courtesy - Filing Certificate | 2022-03-16 | 1 | 578
Courtesy - Acknowledgement of Request for Examination | 2023-02-02 | 1 | 423
Examiner's Requisition | 2023-12-13 | 5 | 248
New Application | 2022-02-28 | 7 | 234
Request for Examination | 2022-09-15 | 6 | 209