Patent 3143843 Summary

(12) Patent Application: (11) CA 3143843
(54) English Title: SYSTEMS AND METHODS FOR FACE AND OBJECT TRACKING AND MONITORING
(54) French Title: SYSTEMES ET METHODES POUR LE SUIVI ET LA SURVEILLANCE DE FACE ET D'OBJET
Status: Application Compliant
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06V 10/75 (2022.01)
  • G06V 10/40 (2022.01)
  • G06V 10/82 (2022.01)
  • G06V 40/16 (2022.01)
  • G06V 40/40 (2022.01)
  • G07C 9/37 (2020.01)
(72) Inventors :
  • ANSARI, DANISH AHMED (India)
  • KRISHNASAMY, MURUGAN (United Arab Emirates)
  • SAXENA, RAJNEESH KANT (India)
  • YADAV, RAJEEV (Nigeria)
(73) Owners :
  • CYBERSMART TECHNOLOGIES INC.
(71) Applicants :
  • CYBERSMART TECHNOLOGIES INC. (Canada)
(74) Agent: SMART & BIGGAR LP
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2021-12-23
(41) Open to Public Inspection: 2023-06-23
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract


Methods, devices, and systems for identifying an object, such as a person, and
tracking an
object. In an example, a method for identifying an object includes: receiving,
at a
processor, two or more images of a first object from different directions;
detecting, at the
processor using a convolutional neural network (CNN), features of the first
object in the
two or more images; comparing, at the processor, the features with a second
set of
features extracted from two or more existing images of a second object; in
response to the
features matching the second set of features of the two or more existing
images to a
predetermined threshold, identifying, by the processor, the first object to be
the same as
the second object. The processor can initiate a security event when the first
object is the
same object as the second object.


Claims

Note: Claims are shown in the official language in which they were submitted.


WHAT IS CLAIMED IS:
1. A method for identifying an object, comprising:
receiving, at a processor, two or more images of a first object from different
directions;
detecting, at the processor using a convolutional neural network (CNN),
features of the
first object in the two or more images;
comparing, at the processor, the features with a second set of features
extracted from two
or more existing images of a second object; and
in response to the features matching the second set of features of the two or
more existing
images to a predetermined threshold, identifying, by the processor, the first
object to be a
same object as the second object.
2. The method of claim 1, further comprising: generating, at the processor,
an object label
uniquely identifying the first object.
3. The method of claim 1 or 2, wherein each of the first object and the
second object is a
person, and the two or more images and the two or more existing images include
a face of the
person.
4. The method of claim 3, wherein the two or more existing images include
at least one
photo from a photo identification of the person.
5. The method of claim 3, wherein the two or more existing images include
all available
photos of the person.

6. The method of any one of claims 3 to 5, further comprising
distinguishing, by the
processor using 3 dimensional face recognition, a real face of the person
versus an image of the
person.
7. The method of any one of claims 3 to 6, further comprising detecting, by
the processor
using the CNN, at least one action of the first object for the identifying the
first object to be the
same object as the second object.
8. The method of any one of claims 3 to 7, further comprising monitoring,
by the processor
using the CNN, one or more responses of the person to one or more prompts from
the processor,
or a movement of a body part of the person, for the identifying the first
object to be the same
object as the second object.
9. The method of any one of claims 3 to 8, further comprising initiating,
by the processor
when the identifying of the first object is the same object as the second
object, a security event
with respect to the person.
10. The method of claim 9, wherein the security event comprises
granting or
denying access of the person to an area.
11. The method of claim 8, wherein the person's responses are
detected by the
processor concurrently or sequentially.
12. The method of any one of claims 1 to 11, wherein the features are
Kanade–Lucas–Tomasi (KLT) features.

13. The method of any one of claims 1 to 12, further comprising generating,
by the processor,
a location label associated with a location of the first object.
14. The method of any one of claims 3 to 13, further comprising determining,
by the
processor, an identity of the person by iris, retina, hand geometry, or voice
of the person for the
identifying the first object to be the same object as the second object.
15. The method of any one of claims 3 to 14, further comprising identifying, by the
processor, a
wearable object worn by the person.
16. The method of any one of claims 1 to 15, wherein the identifying is
performed by the
processor using the CNN.
17. A method for tracking an object, comprising:
receiving, at a processor, two or more images of an object taken from
different directions;
receiving, at the processor, a location associated with the at least one of
the two or more
images;
identifying, at the processor using a convolutional neural network (CNN), the
object in
the two or more images; and
in response to determining that the object is included in stored images,
generating, at the
processor, a location label associated with a location of the first object
based on a location of a camera generating the two or more images.
18. The method of claim 17, wherein the locations of the object comprise
coordinates of the
object.

19. A method for tracking an object, comprising:
transmitting, at a camera device, two or more images of an object taken from
different
directions;
transmitting, by the camera device, a location associated with the at least
one of the two
or more images;
identifying, at a display device using a convolutional neural network (CNN),
the object in
the two or more images; and
in response to determining that the object is included in stored images,
generating, at the
display device, a location label associated with a location of the first
object based on a
location of a camera generating the two or more images.
20. The method of claim 19, further comprising, displaying, by the display
device, one or
more location labels associated with the first object during a selected
period.
21. A non-transitory computer-readable medium including instructions which,
when
executed by at least one processor, cause the at least one processor to
perform the method as
claimed in any one of claims 1-18.

Description

Note: Descriptions are shown in the official language in which they were submitted.


SYSTEMS AND METHODS FOR FACE AND OBJECT TRACKING AND
MONITORING
TECHNICAL FIELD
[0001] Example embodiments relate to facial recognition, tracking, and
monitoring for
security applications.
BACKGROUND
[0002] Traditional facial recognition (FR) technology is built on
"face foreword photos".
Lighting, angle, and obstructions such as glasses, hair, or masks can cause
inaccuracies.
[0003] Existing conventional FR Artificial Intelligence (AI) is
difficult to use for person
tracking, and typically requires identification badges to store the
credentials for comparing to the
person being tracked. Person tracking can be difficult due to the great volume of
identifications to be verified, and the superior computing capability required for training AI
models. As well, existing face tracking systems do not work well to track objects at a large scale.
Finally, there is a need to control errors based on organizational standards or security
considerations.
SUMMARY
[0004] Example embodiments include methods, devices, and systems for
identifying an
object, such as a person, and tracking the person for security purposes. In an
example, a method
for identifying the object includes: receiving, at a processor, two or more
images of a first object
from different directions; detecting, at the processor using a convolutional
neural network
(CNN), features of the first object in the two or more images; comparing, at
the processor, the
features with a second set of features extracted from two or more existing
images of a second
object; in response to the features matching the second set of features of the
two or more existing
images to a predetermined threshold, identifying, by the processor, the first
object to be the same as
the second object.

[0005] In another example, a method for tracking an object includes
receiving, at a
processor, two or more images of an object taken from different directions;
receiving, at the
processor, a location associated with the at least one of the two or more
images; identifying, at
the processor using convolutional neural network (CNN), the object in the two
or more images;
in response to determining that the object is included in stored images,
generating, at the
processor, locations of the object based on the location associated with at
least one of the two or
more images and locations associated with the stored images.
[0006] In another example, a method for tracking an object includes
transmitting, at a
camera device, two or more images of an object taken from different
directions; transmitting, by
the camera device, a location associated with the at least one of the two or
more images;
identifying, at a display device using a convolutional neural network (CNN),
the object in the
two or more images; in response to determining that the object is included in
stored images,
generating, at the display device, locations of the object based on the
location associated with at
least one of the two or more images and locations associated with the stored
images.
[0007] In another example, a non-transitory computer-readable medium
includes
instructions which, when executed by at least one processor, cause the at
least one processor to
perform the method as claimed in any one of the preceding examples.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Reference will now be made, by way of example, to the
accompanying drawings
which show example embodiments, and in which:
[0009] Figure 1 is a schematic structural diagram of a system of an
object tracking
system, in accordance with an example embodiment;
[0010] Figure 2 is a schematic diagram of a hardware structure of a
camera device of the
system in Figure 1;
[0011] Figure 3 is a schematic diagram of a hardware structure of a display
device of the
system in Figure 1;

[0012] Figure 4A is a diagram illustrating object tracking by the
system of Figure 1, in
accordance with an example embodiment;
[0013] Figure 4B is a diagram illustrating example images captured by
the camera device
in Figure 4A, in accordance with an example embodiment;
[0014] Figure 5 is a diagram illustrating steps of a facial recognition
method, in
accordance with an example embodiment;
[0015] Figure 6 is a diagram illustrating example steps of liveness
recognition in Figure
5;
[0016] Figure 7 is a flow chart showing an object tracking method of
the system in
Figure 1; and
[0017] Figure 8 is a block diagram of a convolutional neural network
(CNN) model for
use in identifying and tracking an object by the system of Figure 1, in
accordance with an
example embodiment.
[0018] Similar reference numerals may have been used in different
figures to denote
similar components.
DETAILED DESCRIPTION
[0019] Figure 1 illustrates a block diagram of an example facial
recognition, object
identifying, and object tracking system 100, in accordance with an example
embodiment.
Objects include persons (humans) and non-human objects. The system 100 can be
used to
identify a person using facial recognition, identify an object, and track objects, including the
movement of people.

[0020] In the example of Figure 1, the system 100 can include: one or
more camera
devices 104 and one or more display devices 106. The camera device 104 can be
used to capture
images 102 of an object of interest. The object can be a person or a physical
object, such as
wearable objects including glasses, hat, mask, etc. The camera device 104 can
also be used to
identify and track an object. The facial recognition can be used to identify a
person. The object
identifying can be used to identify an object, such as by facial recognition of a
person, and object
tracking can be used to track an object.
[0021] The camera device 104 can include rules-based models to identify an object,
including to perform the facial recognition, and/or track an object. The
camera device 104 can
also include machine learning models, which can include one or more neural
networks (NNs)
such as convolutional neural networks (CNNs). The camera device 104 can be a
security camera.
The display devices 106 can be configured to display the objects and
coordinates of the objects
to a user. The display device 106 can be a security monitoring terminal or a
security monitoring
mobile device.
[0022] In examples, the camera device 104 and the display device 106 can
communicate
over communication links 108 and communication sessions. The communication
links 108 can
be wireless or wired. In an example, each of the communication links can
include a WebSocket
protocol to provide continuous two-way communication.
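For illustration, a minimal sketch of the camera-device side of such a link in Python, assuming the third-party websockets package; the endpoint URL and JSON message shape are placeholders rather than details from the example embodiments:

```python
# Minimal sketch of a continuous two-way WebSocket link from a camera device
# to a display device, assuming Python with the third-party "websockets"
# package. The URL and message format are illustrative assumptions.
import asyncio
import json

import websockets  # pip install websockets


async def push_detection(uri: str = "ws://display-device.local:8765/events") -> None:
    """Send a detection message to the display device and read one reply."""
    async with websockets.connect(uri) as ws:
        # Push a detection event (object label + camera location) over the socket.
        await ws.send(json.dumps({
            "object_label": "person-0001",
            "camera_location": {"lat": 43.6532, "lon": -79.3832},
        }))
        # The same socket carries messages in the other direction too,
        # e.g. an acknowledgement or a tracking command from the display device.
        reply = await ws.recv()
        print("display device replied:", reply)


if __name__ == "__main__":
    asyncio.run(push_detection())
```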
[0023] Figure 2 illustrates a block diagram of the camera device 104,
in accordance with
an example embodiment. The camera device 104 can be an electronic device or
user equipment.
The camera device 104 can be a camera or a video camera. The camera device 104
can also be a
mobile camera device 104. The camera device 104 can be operated by a user or a
robot. The
camera device 104 can be a security camera, which can be in a fixed location
or can be mobile.
The security camera may be in a fixed location and controllable with respect
to pan, tilt, and zoom (also known as PTZ).

[0024] The camera device 104 includes one or more cameras 522, which
can be used to
capture images of the objects from one or more directions. The example camera
device 104
includes at least one memory 502, at least one processor 504, and at least one
communications
interface 506. The camera device 104 can include input or output (I/O)
interface devices 508,
including but not limited to touch screen, display screen, keyboard, microphone, speaker,
mouse, gesture feedback devices (through the camera 522) and/or haptic feedback device. In
some examples, memory 502 can access the object database 110 and the map
database 112, from
the cloud storage.
[0025] In the example of Figure 2, the camera device 104 includes
sensors 520 which are
used to detect information from the environment of the camera device 104. In
an example, the
sensors 520 can be used to determine a location and an orientation (e.g.,
pitch, roll, yaw) of the
camera device 104. The sensors 520 can include: global positioning system
(GPS), local
positioning system (LPS), range detector or scanner such as LiDAR to determine
the camera
distance to objects or points of the objects, barometric pressure sensor to
determine a height of
the camera device 104, compass to determine orientation of the camera device
104 in relation to
North, and/or accelerometers to determine orientation of the camera device
104. The GPS and/or
the LPS can be used to generate the location of the camera device 104. The
range detector can be
used to determine a distance between the camera device 104 and the object
being captured by the
camera 522.
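For illustration, a minimal Python sketch of how readings from the sensors 520 could be assembled into a single camera pose record per image; the field names and units are assumptions:

```python
# Illustrative sketch of combining sensor readings (GPS/LPS, barometer,
# compass, accelerometers, LiDAR) into one camera pose record per image.
# The field names, units, and dict-like reading format are assumptions.
from dataclasses import dataclass


@dataclass
class CameraPose:
    lat: float                 # from GPS/LPS, degrees
    lon: float                 # from GPS/LPS, degrees
    altitude_m: float          # from the barometric pressure sensor
    heading_deg: float         # from the compass, relative to North
    pitch_deg: float           # from the accelerometers
    roll_deg: float            # from the accelerometers
    range_to_object_m: float   # from the LiDAR range detector


def build_pose(gps, barometer, compass, accel, lidar) -> CameraPose:
    """Fuse raw sensor readings into one pose associated with an image."""
    return CameraPose(
        lat=gps["lat"],
        lon=gps["lon"],
        altitude_m=barometer["altitude_m"],
        heading_deg=compass["heading_deg"],
        pitch_deg=accel["pitch_deg"],
        roll_deg=accel["roll_deg"],
        range_to_object_m=lidar["range_m"],
    )
```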
[0026] The range detector, such as LiDAR, can be used by the camera device 104 to
determine the camera distance to objects or points of the objects, for example, the distance
between the closest point of the object and the camera device 104.
[0027] In some examples, the distance between the camera device 104
and the object can
be generated using photogrammetry. In some examples, Google (TM) ARCore can be
used to
generate the distance between the camera device 104 and the object. In some
examples, a
combination of photogrammetry and at least one of the sensors 520 can be used
by the
positioning module 518 to determine the distance.

[0028] The positioning module 518 can generate, using the sensor
information: i) camera
location, ii) camera orientation, and/or iii) camera distance to object. In
some examples, the
positioning module 518 uses data from the sensors 520. In some examples, the
positioning
module 518 uses data from the GPS and/or the LPS. In some examples, the object
is tracked and
presumed to be the same location and optionally the same orientation as the
camera device 104,
i.e., the user is holding the camera device 104.
[0029] In an example, the positioning module 518 may also include
ARCore. ARCore
includes a mobile augmented reality library that can be used for camera
orientation estimation,
which is readily available on most Android (TM) devices or smartphones. ARCore
is a library by
Google (TM), which uses the data from the inertial measurement unit (IMU)
sensors (e.g.,
accelerometer, magnetometer, and gyroscope), along with image feature points
for tracking the
camera orientation of the camera device 104 utilizing a Simultaneous
Localization and Mapping
(SLAM) algorithm. ARCore can perform camera orientation estimation in real-
time. In that
regard, to track the motion of the camera device 104, an Android application (i.e., the positioning
module 518) using ARCore can be developed in the Unity3D environment, the Unreal
environment,
or other interactive 3D environments, for capturing RGB images along with the
real world
location of the camera device 104. The positioning module 518 can generate or
determine the
location and the camera orientation of the camera device 104 for each image
102.
[0030] In example embodiments, the memory 502 can store modules for
execution by the
processor 504, including: image 2D or 3D object detection module 510,
positioning module 518,
and anchor point generator 524. The modules can include software stored in the
memory 502,
hardware, or a combination of software and hardware. In some examples, the
modules of the
camera device 104 include machine learning models, which can include NNs such
as CNNs. For
example, the image 2D or 3D object detection module 510 can include an image
2D or 3D object
detector model which includes a CNN. In some examples, one or more of the
modules are
executed by other devices, such as a cloud server.

[0031] The anchor point generator 524 is used to generate anchor
points of the feature
points of the object including a person, using the location and orientation of
the camera device
104. For example, the anchor points are generated for identifying and/or
tracking the object. In
an example, the anchor points can be generated by the camera device 104 using
ARAnchorManager from AR Foundation. In some examples, each anchor point of the
object is
individually trackable. In examples, movements of the object, or a part of the
object, can be
tracked using the anchor points.
[0032] Figure 3 illustrates a block diagram of the display device
106, in accordance with
an example embodiment. The display device 106 can be an electronic device or
user equipment.
The display device 106 can be a desktop, a laptop, a set top box, or a mobile
communication
device such as a smart phone or a tablet. The display device 106 can be the
same as or different
from the camera device 104 (e.g., for AR purposes). The user of the display
device 106 can be
the same as or different from the user of the camera device 104.
[0033] The example display device 106 in Figure 3 includes at least
one memory 402, at
least one processor 404, at least one communications interface 406, and I/O
interface devices
408. The memory 402, the processor 404, and the communications interface 406 can be similar to
those described for the camera device 104 of Figure 2. The memory 402 can store a
2D/3D display
module 410 for execution by the processor 404. The modules (e.g. 2D/3D display
module 410)
of the display device 106 can include software stored in the memory 402,
hardware, or a
combination of software and hardware. The display device 106 includes a
display 412, which can
be a 360-degree display. The I/O interface devices 408 can include but are not
limited to touch
screen, keyboard, camera, microphone, speaker, mouse, gesture feedback device
(through the
camera or accelerometers) and/or haptic feedback device.
[0034] The 2D/3D display module 410 can receive, from a third party
mapping service, a
2D or 3D map for display on the display 412. The 2D/3D display module 410 can
display
movement of an object based on the map data or real time coordinates of the
object. In some
examples, the 2D/3D display module 410 is executed by a particular platform
such as a 3D video

platform such as a mobile platform, streaming platform, web platform, gaming
platform,
application plug-ins, etc. The display device 106 can include input/output
(I/O) interface devices
408 for interacting with the user. In an example embodiment, the display 412
is a computer
monitor.
[0035] In examples, the system 100 is configured to identify an object or a
person. In the
example of Figure 4A, the system 100 is configured to identify an object 202,
in accordance with
an example embodiment. Examples will be described with relation to one object
202, such as a
person as shown in Figure 4A. The object may be any physical object, such as a
car, a chair, an
animal, or a plant, etc.
[0036] In Figure 4A, the camera device 104 can be operated by a user or
machine that
takes images 102 of the object 202. The camera device 104 can take one or more
images 102 of
the object 202. In some examples, the camera device 104 captures a video of
the object 202,
therefore generating one or more images 102.
[0037] In some examples, the system 100 is configured to identify a
person based on the
image 102 using facial recognition. In some examples, camera device 104 is
configured to
perform a facial recognition of a human face, which is three-dimensional and
changes in
appearance with lighting and facial expression. The camera device 104 or
display device 106 is
configured to detect a face and to segment the face from the image background,
align the
segmented face image to account for face pose, image size and photographic
properties, such as
illumination and grayscale to enable the accurate localization of facial
features.
[0038] In some examples, the camera device 104 or display device 106
may extract the
facial features, in which features such as eyes, nose and mouth are pinpointed
and measured in
the image to represent the face. The camera device 104 may then match the
established feature
vector of the face against a database of faces on images, photos or photo IDs
stored in a database,
such as object database 110 or in memory 402.

[0039] The facial feature points are features detected in the image
102 by the camera
device 104 or display device 106, represented by the circles 203 in Figure 4A.
Facial feature
points, also known as feature edge points, Kanade–Lucas–Tomasi (KLT) corners
or Harris
corners, are identified visual features of particular edges detected from the
image 102. In an
example, Google ARCore can be used to generate the facial feature points.
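For illustration, a minimal Python sketch of detecting and tracking KLT-style feature points, assuming the OpenCV (opencv-python) package rather than ARCore; parameter values are placeholders:

```python
# Sketch of detecting KLT-style corner feature points and tracking them between
# frames with OpenCV. The library choice and parameter values are assumptions;
# the example embodiments mention KLT/Harris corners and ARCore generically.
import cv2
import numpy as np


def detect_feature_points(image_bgr: np.ndarray, max_corners: int = 200) -> np.ndarray:
    """Return an (N, 2) array of corner coordinates detected in the image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Shi-Tomasi "good features to track", the corner detector used by the
    # Kanade-Lucas-Tomasi tracker.
    corners = cv2.goodFeaturesToTrack(
        gray, maxCorners=max_corners, qualityLevel=0.01, minDistance=7
    )
    return corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))


def track_feature_points(prev_bgr, next_bgr, prev_pts):
    """Track the points into the next frame with pyramidal Lucas-Kanade optical flow."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    pts = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    return next_pts.reshape(-1, 2)[status.ravel() == 1]
```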
[0040] The extracted facial features can be used to search for or
compare with stored
images of people. The camera device 104 or the display device 106 can be
configured to perform
the comparison using eigenfaces, linear discriminant analysis and elastic
bunch graph matching
using the Fisherface algorithm, the hidden Markov model, the multilinear
subspace learning
using tensor representation, and/or the neuronal motivated dynamic link
matching. In some
examples, the comparison is performed using NNs or CNNs.
[0041] If the camera device 104 or the display device 106 determines that
extracted facial
features on image 102 match the facial features of a stored image to a
predetermined threshold,
the camera device 104 or display device 106 is configured to consider the
identity of the person
in association with the stored image to be the identity of the person in the image
102. For example,
the camera device 104 or display device 106 can also generate an object score
which represents
the probability or confidence score of the identity of the person. The camera
device 104 or
display device 106 may also generate an object label in the system 100 to
uniquely identify the
person or object 202 in the image 102.
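For illustration, a minimal Python sketch of this matching step: a query feature vector is compared against stored feature vectors, the best match is accepted if it clears a predetermined threshold, and an object label and object score are produced. The similarity measure, threshold value, and label format are assumptions:

```python
# Sketch of matching extracted features against stored features, producing an
# object score and a unique object label. Cosine similarity, the 0.8 threshold,
# and the uuid-based label scheme are illustrative assumptions.
import uuid

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def match_features(query: np.ndarray, database: dict[str, np.ndarray],
                   threshold: float = 0.8):
    """Return (object_label, object_score); a new label is issued if no match."""
    best_label, best_score = None, -1.0
    for label, stored in database.items():
        score = cosine_similarity(query, stored)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return best_label, best_score             # identified as an existing object
    new_label = f"object-{uuid.uuid4().hex[:8]}"  # uniquely identify a new object
    database[new_label] = query
    return new_label, best_score
```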
[0042] In some examples, the system 100 may also perform 3D face
recognition using
cameras 522 to capture two or more images about the shape of a face from
different directions.
The images are used to identify distinctive features on the surface of a face,
such as the contour
of the eye sockets, nose, and chin. 3D face recognition is not affected by
changes in lighting like
other techniques. It can also identify a face from a range of viewing angles,
including a profile
view. 3D data points from a face improve the precision of face recognition. In
an example, 3D
images of faces may be captured by three cameras 522 that point at different
angles; one camera
will be pointing at the front of the subject, the second one to the side, and the
third one at an angle. All

these cameras 522 work together to track a subject's face in real-time and be
able to detect and
recognize the face of a person.
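For illustration, one simple way to combine feature vectors from cameras pointed at the subject from different angles before matching; the averaging rule is an assumption, not a method stated in the example embodiments:

```python
# Sketch of fusing per-view feature vectors (front, side, angled cameras) into
# one descriptor before matching, to reduce sensitivity to pose. Simple
# averaging of L2-normalized vectors is an illustrative assumption.
import numpy as np


def fuse_views(front: np.ndarray, side: np.ndarray, angled: np.ndarray) -> np.ndarray:
    """Average L2-normalized per-view feature vectors into one descriptor."""
    normalized = [v / (np.linalg.norm(v) + 1e-12) for v in (front, side, angled)]
    fused = np.mean(normalized, axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)
```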
[0043] By using 3D facial recognition or liveness recognition, the
system 100 has an
expanded accuracy range for angles, lighting conditions, and obstructive
objects. For example,
the system 100 may maintain 90% accuracy at up to 45 degrees of lateral rotation, 30 degrees of
vertical rotation, or with a partially obstructed face. With the facial recognition capacity, the
system 100 is able to perform mass identification as it does not require the cooperation of the
subject to work. The system
100 is accurate for identifying and tracking objects or people of interest in
controlled areas,
including airports, offices, and prisons. The system 100 can also be used as
traditional facial
recognition technologies for security or login. For example, the system 100
may be implemented
in airports, multiplexes, prisons, and other selected places to more
accurately identify individuals
among the crowd, without passers-by even being aware of the system.
[0044] In some examples, the system 100 is configured to perform face
recognition
including face verification, face recognition, and liveness recognition. For
example, the camera
device 104 or display device 106 may also perform a face verification by
comparing a person in
the image 102 with the person on a photo of a photo ID. The camera device 104
or display
device 106 may also perform a face recognition by comparing a person in the
image 102 with all
available photos and determining the identity of the person.
[0045] Figure 5 is an example flow chart showing an example method
500 for facial
recognition. The face recognition in system 100 may include three levels. At
step 552, system
100 receives two or more images taken from different directions of a person or
an object. At step
554, system 100 first performs face recognition and verification as described
in Figure 4A above.
[0046] As well, by using 3D facial recognition or liveness
recognition, the system 100
can take into account wearable objects of a person, such as head coverings, face masks,
sunglasses, facial hair, hats, caps, facial gestures, and various lighting situations. At
step 555, the system 100
may identify a wearable object worn by a person. For example, the system 100
may generate

images from different directions of the person, from which the system 100 is configured to
identify a wearable object worn by the person. For example, based on a position relative to the
features of the eyes, the system 100 identifies the wearable object as a hat if the object is above
the eyes, a pair of glasses if the object is worn in front of the eyes, or a mask if the object is
below the eyes. The
identification of the wearable object can be used to assist in the
identification of the person.
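For illustration, a minimal rule-based Python sketch of this wearable-object classification; the coordinate convention and margin value are assumptions:

```python
# Rule-based sketch of classifying a detected wearable object by its vertical
# position relative to the eyes. Pixel coordinates with y increasing downward
# and the 10-pixel margin are illustrative assumptions.
def classify_wearable(object_center_y: float, eye_line_y: float,
                      margin_px: float = 10.0) -> str:
    """Return 'hat', 'glasses', or 'mask' based on position relative to the eye line."""
    if object_center_y < eye_line_y - margin_px:
        return "hat"        # above the eyes
    if object_center_y > eye_line_y + margin_px:
        return "mask"       # below the eyes
    return "glasses"        # roughly level with (worn in front of) the eyes
```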
[0047] At step 556, the system 100 may include an anti-spoofing
mechanism. At this step,
the system 100 can distinguish a real face of a person from an image or video
of the person. For
example, the system 100 may use 3D face recognition, in which a physically present person is
needed in order to recognize the person correctly. A photo or a video of the
person would not
suffice to fool the system 100. As such, the system 100 prevents bots and bad
actors from using
stolen photos, injected deepfake videos, life-like masks, or other spoofs.
This mechanism ensures
only real humans can be recognized, and can distinguish a real face from an
image even without
user interaction. For example, as illustrated in Figure 6, the system 100 may
perform a liveness
recognition for an enhanced security mechanism. For example, at step 602, the
system 100 can
detect whether the person in the image 102 is an actual live person, and can
distinguish a real
face of a person from a saved picture of a person. For example, the system 100
may use 3D face
recognition, in which a physically present person is needed in order to
recognize the person
correctly.
[0048] In some examples, the system 100 can include an enhanced security
mechanism for a
restricted area or initiate a security event. In some examples, the security
event includes access to
a device in the restricted area. At step 558, the system 100 is configured to
detect actions of a
person. For example, at step 604 in Figure 6, the system 100 is configured to
recognize a
person's responses, such as head turns, lifting arm, etc., and the system 100
is configured to
monitor the movement of the relevant body parts, to evaluate whether the
person has responded
as requested. The requested action may include one or more actions, discretely
or continuously,
instructed by an audio output from the I/O interface device 508. For example,
the system 100 is
configured to require users to perform a simple task such as following a
moving dot on the
screen. By determining that the person correctly responded to the request, the
system 100

determines that the person in the image is indeed a live person.
In some examples,
the system 100 may have initially stored a person's response, or a movement of
a body part of
the person. The system 100 can then compare the stored response of the person
with the
responses captured by the camera device 104 to determine whether the person on
the camera is
the same person previously stored in the system 100.
[0049] In another example, the system 100, such as the camera device
104, is configured
to display several flashing dots on the surface of the camera device 104, and
to monitor the
movement of the pupils, evaluating whether the flashing dot has been followed
correctly. If the
pupils have correctly follow the flashing dots, the system 100 may grant the
person access to a
.. restricted area, such as the entrance of a room storing a safe. If the
pupils have incorrectly follow
the flashing dots, the system 100 may deny the person access to a restricted
area, for example, by
keeping a door locked. In addition to facial recognition, the system 100 can
also require a user to
perform multiple actions concurrently or sequentially to identify a person.
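For illustration, a minimal Python sketch of the dot-following check: the displayed dot trajectory is compared with the tracked pupil trajectory and access is granted only if the two are sufficiently correlated. The correlation threshold and the source of the pupil positions are assumptions:

```python
# Sketch of deciding whether the pupils followed the flashing dots, and turning
# that decision into a grant/deny access result. The correlation test and the
# 0.7 threshold are illustrative assumptions.
import numpy as np


def followed_dots(dot_xy: np.ndarray, pupil_xy: np.ndarray,
                  min_corr: float = 0.7) -> bool:
    """dot_xy and pupil_xy are (T, 2) trajectories sampled at the same times."""
    corr_x = np.corrcoef(dot_xy[:, 0], pupil_xy[:, 0])[0, 1]
    corr_y = np.corrcoef(dot_xy[:, 1], pupil_xy[:, 1])[0, 1]
    return corr_x >= min_corr and corr_y >= min_corr


def access_decision(dot_xy: np.ndarray, pupil_xy: np.ndarray) -> str:
    # Grant access (e.g. unlock the door) only when the pupils followed the dots.
    return "grant" if followed_dots(dot_xy, pupil_xy) else "deny"
```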
[0050] In some examples, the system 100 is further configured to
combine facial
recognition described above with geo-positioning to identify a person or
object and the location
of an object or a person. For example, the geo-location of the person may be
used to further
determine that a specific person is indeed at a predetermined location, such as a
prisoner or a security
guard in a prison. In some examples, the system 100 is configured to combine
facial recognition
with bio-informatics of a person to more accurately identify the person. For
example, the system
100 is further configured to recognize a person based on identifying the person
and the biological
and behavioral traits of the person for enhanced security or to improve the
accuracy of facial
recognition. For example, the system 100 may identify the person by
recognizing the person's
fingerprint, face, iris, palmprint, retina, hand geometry, voice, signature,
posture and gait
received in system 100. The system 100 can also take into consideration the biological and
behavioral traits of the person to increase the confidence of the
probability or object score when
the system 100 identifies a person by facial recognition to assist with
identifying the person in
steps 554 and 706 to be described below.
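For illustration, a minimal sketch of fusing the facial-recognition object score with a geo-location check and other biometric scores; the weights and the weighted-average rule are assumptions:

```python
# Sketch of combining the facial-recognition object score with a geo-location
# check and additional biometric scores into one fused confidence value.
# The 0.6/0.2/0.2 weights and the averaging rule are illustrative assumptions.
def fused_confidence(face_score: float,
                     at_expected_location: bool,
                     biometric_scores: dict[str, float]) -> float:
    """face_score and biometric scores are in [0, 1]; returns a fused score in [0, 1]."""
    location_score = 1.0 if at_expected_location else 0.0
    bio_score = (sum(biometric_scores.values()) / len(biometric_scores)
                 if biometric_scores else face_score)
    # Weighted average: face recognition dominates, other traits adjust confidence.
    return 0.6 * face_score + 0.2 * location_score + 0.2 * bio_score


# Example: strong face match, subject at the expected location, good voice and gait matches.
score = fused_confidence(0.92, True, {"voice": 0.85, "gait": 0.7})
```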

[0051] In some examples, the system 100 is configured to track the
object 202. As
described above, the camera device 104 can perform image 2D or 3D object
detection on the
image 102 to identify the object 202.
[0052] In the example of Figure 4A, the camera devices 104(1)-104(3)
take three
images 102(1)-102(3) of the object 202, with the locations of the camera
device 104(1)-(3)
shown as 1st location, 2nd location, and 3rd location. Each camera device
104(1)-104(3) can
determine the locations (coordinates) of the camera device 104 with
positioning module 518
described in Figure 2. Figure 4B illustrates example images 102 captured by
the camera device
104, in accordance with an example embodiment.
[0053] Referring to Figure 4A, a first image 102(1) is captured by the
camera device
104(1) from the 1st location, a second image 102(2) is captured by the camera
device 104(2)
from the 2nd location, and a third image 102(3) is captured by the camera
device 104(3) from the
3rd location. The images 102(1)-(3) all have different POVs of the same object
202 based on
where the images 102 are captured by the camera devices 104(1)-(3). In some
examples, multiple
images can be captured at the same orientation of the camera device 104(1)-
(3), at different
zoom distances to the object 202, e.g., optical zoom, digital zoom, or
manually moving the
camera device 104. More or fewer images 102 of the object 202 can be taken
than those shown
in Figure 4A.
[0054] The camera device 104 can also generate feature points in the
images 102(1)-(3)
in the same manner as described in facial recognition to identify a person
above. Although not
shown in Figure 4B, an object label and feature points of the object 202 are
also generated for
the second image 102(2) from the 2nd location and for the third image 102(3)
from the 3rd
location. For the same object 202, the object label is the same in the first
image 102(1), the
second image 102(2), and the third image 102(3). Consensus rules and/or object
scores can be
used to resolve any conflicts in the object label. As such, by identifying the same object at
different locations, the system 100 can track the position of the object.
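For illustration, one possible consensus rule in Python: sum the per-camera object scores for each candidate label and keep the label with the highest total. The specific rule is an assumption; the example embodiments state only that consensus rules and/or object scores can be used:

```python
# Sketch of resolving conflicting object labels assigned by different camera
# devices by score-weighted voting. The voting rule is an illustrative assumption.
from collections import defaultdict


def resolve_label(detections: list[tuple[str, float]]) -> str:
    """detections is a list of (object_label, object_score) pairs from different cameras."""
    totals: dict[str, float] = defaultdict(float)
    for label, score in detections:
        totals[label] += score
    return max(totals, key=totals.get)


# Example: two cameras agree on "person-0001"; one disagrees with a lower score.
label = resolve_label([("person-0001", 0.91), ("person-0001", 0.88), ("person-0042", 0.55)])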

[0055] The camera device 104 or display device 106 can include a
front detection model
that includes a NN such as a CNN. For example, the CNN can be trained to
return a vector that
represents the front identifying information. By training the CNN, the system
100 can reduce the
errors in facial recognition and object tracking. In an example, the front
identifying information
is the anchor points of the front of the object. In an example, the front
identifying information
can include descriptive text, e.g. "face" or "nose" for a human, or "bill" of
a hat. In an example,
the front detection model can query the object database 110 to retrieve any
one of the following
example front identifying information: the descriptive text, an image of a
front of the object.
[0056] In example embodiments, using the object label, the system 100
can track an
object 202 in the system 100. For example, the display device 106 can be used
to track and
monitor an object 202. A distance threshold for the movement of the object can
be used in some
examples to determine whether the object 202 had actually moved, in which the
distance
threshold can vary depending on the application, the size of the object 202,
or the particular
environment.
[0057] In some examples, the camera device 104 captures the images 102
using video
capture. A video can include a plurality of video frames, which are the images
102. For example,
a user or a machine can activate a video record function of the camera device
104 and move the
camera device 104 to the first location, the second location, and the third
location (and/or other
locations). The video can then be used by extracting the images 102 (video
frames), which are
then used to identify an object 202 for example, using facial recognition
and/or to track object
202. The video can be recorded and then processed by the object tracking
method at a later time,
or can be processed in real-time. In some examples, audio from the video can
be used to assist
the object tracking method in generating the object label, for example to
identify a human voice
(using voice recognition), etc.
[0058] Figure 7 illustrates an example block diagram of an object tracking
method 700
performed by system 100, in accordance with an example embodiment. At step
702, the camera
device 104 receives at least two or more images which includes an object. For
example, at least

one image is received from the camera 522. At step 704, the camera device 104
generates, for
each image, using the positioning module 518, a camera location associated
with each image.
The system 100 may consider the camera location as the object location. At step
706, the camera
device 104 identifies, using the image 2D/3D object detection module 510, the
object in each
image. Optionally, the camera device 104 may generate an object label to
uniquely identify the
object detected in each image, using the object identification method 600
described above. As
illustrated in Figure 4A, different camera devices 104(1)-(3) can transmit the
captured image,
extracted feature points of the object, the location data of the camera devices 104, and the
generated object label to the display device 106. At step 708, the display device 106
determines whether the
object is also included in previous images. In an example, the display device
106 may decide
whether the object is also included in previous images by comparing the object
label associated
with the object. If the object label associated with a first object in the
image captured by a first
camera device 104 is the same as the object label associated with the second
object in the image
captured by a second camera device 104, the display device 106 considers the
first and second
objects to be the same. In another example, the display device 106 may
determine whether the
object is also included in previous images by comparing the feature points of
a first object in the
image captured by a first camera device 104 with the feature points of a second object in the
image captured by a second camera device 104. If the feature points of a first
object are determined
to be the same or substantially the same as the second object, the first
object is determined to be
the same as the second object. In either case, the display device 106
determines that the object
202 is also included in previous images.
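For illustration, a minimal Python sketch of the step 708 decision: two detections from different cameras are treated as the same object if their object labels match or if their feature descriptors are sufficiently similar. The descriptor comparison and threshold are assumptions:

```python
# Sketch of deciding whether two per-camera detections are the same object,
# by label match or by feature similarity. Cosine similarity and the 0.8
# threshold are illustrative assumptions.
import numpy as np


def same_object(label_a: str, label_b: str,
                feats_a: np.ndarray, feats_b: np.ndarray,
                threshold: float = 0.8) -> bool:
    """True if two detections should be treated as one object."""
    if label_a == label_b:          # object labels agree
        return True
    sim = float(np.dot(feats_a, feats_b) /
                (np.linalg.norm(feats_a) * np.linalg.norm(feats_b) + 1e-12))
    return sim >= threshold         # or feature points are substantially the same
```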
[0059] When the display device 106 determines that the object is also
included in
previous images, at step 710, the display device 106 may generate a location
label corresponding
to the camera location. The display device 106 may also display, for example
using 2D/3D
display module 410, the location label of the object. As such, with the method
700, the system 100
can track the location of the object 202. For example, the location label may
be "server room on
the fifth floor", or "exit at the underground parking". The display device 106
may also display a
location label of the object over a period to indicate movement of the object.

[0060] In some examples, the display device 106 is configured to
receive images and
coordinates of the object 202 from the camera devices 104(1)-104(3), and
perform the method
700 from step 706-710.
[0061] It would be appreciated that the facial recognition method 500
and object tracking
method 700 can be applied to a plurality of objects 202. For example, each
object 202 can be
identified using the facial recognition method 500, and/or processed at the
same time through the
tracking method 700, or alternatively each individual object 202 can be
identified individually
using the facial recognition method 500, and/or processed individually through
the tracking
method 700 to detect and track each individual object instance at a time. The
tracking method
700 is used to determine movement of the object 202, and locate the object
using the coordinates
associated with the object 202.
[0062] As described above, the system 100 may be configured to use
artificial
intelligence, such as CNN, to perform the object identification, facial
recognition in method 500
and/or object tracking in method 700 above. Figure 8 illustrates an example
detailed block
diagram of a CNN model for facial recognition and object tracking performed by
the system 100,
in accordance with an example embodiment. For example, at least one or more
of the described
modules or applications of the camera device 104 and/or display device 106 can
include a CNN.
The CNN is a deep neural network with a convolutional structure, and is a deep
learning
architecture. The deep learning architecture indicates that a plurality of
layers of learning is
performed at different abstraction layers by using a machine learning
algorithm. As a deep
learning architecture, the CNN is a feed-forward artificial
neural network. Each
neural cell in the feed-forward artificial neural network may respond to an
image input to the
neural cell.
[0063] As shown in Figure 8, the CNN 1100 may include an input layer
1110, a
convolutional layer/pooling layer 1120 (the pooling layer is optional), and
a fully connected
network layer 1130. In examples, the input layer 1110 can receive the image
102 and can receive
other information (depending on the particular module or model).

[0064] The convolutional layer/pooling layer 1120 shown in Figure 8
can include, for
example, layers 1122(1), 1122(2), ..., 1122(n). For example: In an
implementation, the layer
1122(1) is a convolutional layer, the layer 1122(2) is a pooling layer, the
layer 1122(3) is a
convolutional layer, the layer 1122(4) is a pooling layer, the layer 1122(5)
is a convolutional
layer, and the layer 1122(6) is a pooling layer, and so on. In another
implementation, the layers
1122(1) and 1122(2) are convolutional layers, the layer 1122(3) is a pooling
layer, the layers
1122(4) and 1122(5) are convolutional layers, and the layer 1122(6) is a pooling
layer. In examples, an
output from a convolutional layer may be used as an input to a following
pooling layer, or may
be used as an input to another convolutional layer, to continue a convolution
operation.
[0065] The following describes internal operating principles of a
convolutional layer by
using the layer 1122(1) as an example of a convolutional layer 1122(1). The
convolutional layer
1122(1) may include a plurality of convolutional operators. The convolutional
operator is also
referred to as a kernel. A role of the convolutional operator in image
processing is equivalent to a
filter that extracts specific information from an input image matrix. In
essence, the convolutional
operator may be a weight matrix. The weight matrix is usually predefined. In
the process of
performing a convolution operation on an image, the weight matrix is usually
processed one
pixel after another (or two pixels after two pixels), depending on the value
of a stride in a
horizontal direction on the input image, to extract a specific feature from
the image. The size of
the weight matrix needs to be related to the size of the image. It should be
noted that a depth
dimension of the weight matrix is the same as a depth dimension of the input
image. In the
convolution operation process, the weight matrix extends to the entire depth
of the input image.
Therefore, after convolution is performed on a single weight matrix,
convolutional output with a
single depth dimension is output. However, the single weight matrix is not
used in most cases,
but a plurality of weight matrices with the same dimensions (row x column) are
used, in other words,
a plurality of same-model matrices. Outputs of all the weight matrices are
stacked to form the
depth dimension of the convolutional image. It can be understood that the
dimension herein is
determined by the foregoing "plurality". Different weight matrices may be used
to extract
different features from the image. For example, one weight matrix is used to
extract image edge
information, another weight matrix is used to extract a specific color of the
image, still another

weight matrix is used to blur unneeded noises from the image, and so on. The
plurality of weight
matrices have the same size (row x column). Feature graphs obtained after
extraction performed
by the plurality of weight matrices with the same dimension also have the same
size, and the
plurality of extracted feature graphs with the same size are combined to form
an output of the
convolution operation.
[0066] Weight values in the weight matrices need to be obtained
through a large amount
of training in actual application. The weight matrices formed by the weight
values obtained
through training may be used to extract information from the input image, so
that the CNN 1100
performs accurate prediction. By continuously training the CNN model, the
system 100 can learn
automatically from previous identifications and improve accuracy in facial
recognition and
tracking object. .
[0067] When the CNN 1100 has one or more convolutional layers, an
initial
convolutional layer (such as 1122(1)) usually extracts a relatively large
quantity of common
features. The common feature may also be referred to as a low-level feature.
As the depth of the
CNN 1100 increases, a feature extracted by a deeper convolutional layer (such
as 1122(6) or
1122(n)) becomes more complex, for example, a feature with high-level
semantics or the like. A
feature with higher-level semantics is more applicable to a to-be-resolved
problem.
[0068] An example of the pooling layer is also described. Because a
quantity of training
parameters usually needs to be reduced, a pooling layer usually needs to
periodically follow a
convolutional layer. To be specific, at the layers 1122(1), ..., 1122(n), one
pooling layer may
follow one convolutional layer, or one or more pooling layers may follow a
plurality of
convolutional layers. In an image processing process, the purpose of the
pooling layer is to
reduce the space size of the image. The pooling layer may include an average
pooling operator
and/or a maximum pooling operator, to perform sampling on the input image to
obtain an image
of a relatively small size. The average pooling operator may compute a
pixel value in the image
within a specific range, to generate an average value as an average pooling
result. The maximum
pooling operator may obtain, as a maximum pooling result, a pixel with a
largest value within the

specific range. In addition, just like the size of the weight matrix in the
convolutional layer needs
to be related to the size of the image, an operator at the pooling layer also
needs to be related to
the size of the image. The size of the image output after processing by the
pooling layer may be
smaller than the size of the image input to the pooling layer. Each pixel in
the image output by
the pooling layer indicates an average value or a maximum value of a subarea
corresponding to
the image input to the pooling layer.
[0069] The fully connected network layer 1130 is now described. After
the image is
processed by the convolutional layer/pooling layer 1120, the CNN 1100 is
still incapable of
outputting desired output information. As described above, the convolutional
layer/pooling layer
1120 only extracts a feature, and reduces a parameter brought by the input
image. However, to
generate final output information (desired category information or other
related information), the
CNN 1100 needs to generate an output corresponding to one desired category or a group of
desired categories by
using the fully connected network layer 1130. Therefore, the fully connected
network layer 1130
may include a plurality of hidden layers (such as 1132(1), 1132(2), ...,
1132(n) in Figure 8) and
an output layer 1140. A parameter included in the plurality of hidden layers
may be obtained by
performing pre-training based on related training data of a specific task
type. For example, the
task type may include image recognition, image classification, image super-
resolution reconstruction,
or the like.
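For illustration, a minimal PyTorch sketch of a network with the structure described for the CNN 1100: alternating convolutional and pooling layers (1122), fully connected hidden layers (1132), and an output layer (1140). The framework, channel counts, input size, and category count are assumptions:

```python
# Sketch of a CNN with the 1110/1120/1130/1140 structure described above.
# PyTorch, layer sizes, and the number of categories are illustrative assumptions.
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    def __init__(self, num_categories: int = 10):
        super().__init__()
        # Convolutional layer / pooling layer block (1122(1) ... 1122(6)).
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # 1122(1)
            nn.MaxPool2d(2),                                          # 1122(2)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # 1122(3)
            nn.MaxPool2d(2),                                          # 1122(4)
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # 1122(5)
            nn.MaxPool2d(2),                                          # 1122(6)
        )
        # Fully connected network layer 1130 (hidden layers 1132 and output 1140).
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 256), nn.ReLU(),  # hidden layer 1132(1)
            nn.Linear(256, 128), nn.ReLU(),           # hidden layer 1132(2)
            nn.Linear(128, num_categories),           # output layer 1140
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


# Example: a batch of two 224x224 RGB images produces two category score vectors.
logits = SmallCNN()(torch.randn(2, 3, 224, 224))
```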
[0070] The output layer 1140 follows the plurality of hidden layers
1132(1), 1132(2), ...,
1132(n) in the network layer 1130. In other words, the output layer 1140 is a
final layer in the
entire CNN 1100. The output layer 1140 has a loss function similar to categorical
cross-entropy
and is specifically used to calculate a prediction error. Once forward
propagation (propagation in
a direction from 1110 to 1140 in Figure 8 is forward propagation) is complete
in the entire CNN
1100, back propagation (propagation in a direction from 1140 to 1110 in Figure
8 is back
propagation) starts to update the weight values and offsets of the foregoing
layers, to reduce a
loss of the CNN 1100 and an error between an ideal result and a result output
by the CNN 1100
by using the output layer.
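For illustration, a minimal PyTorch sketch of this training step: forward propagation produces a prediction, a cross-entropy-style loss measures the error, and back propagation updates the weight values and offsets. The tiny inline model and hyperparameters are placeholders:

```python
# Sketch of one training iteration: forward propagation, prediction-error
# computation at the output layer, and back propagation to update weights and
# offsets (biases). The stand-in model, data, and learning rate are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(8 * 16 * 16, 10),
)
criterion = nn.CrossEntropyLoss()            # loss similar to categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 32, 32)           # stand-in training batch
labels = torch.randint(0, 10, (8,))          # stand-in target categories

logits = model(images)                       # forward propagation (1110 -> 1140)
loss = criterion(logits, labels)             # prediction error at the output layer
optimizer.zero_grad()
loss.backward()                              # back propagation (1140 -> 1110)
optimizer.step()                             # update weight values and offsets
```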

[0071] It should be noted that the CNN 1100 shown in Figure 8 is
merely used as an
example of a CNN. In actual application, the CNN may exist in a form of
another network
model.
[0072] In the present application, one or more of the steps 554, 555,
556, 558 in method
500, one or more of steps 602, 604 in method 600, and one or more of steps
704, 706, and 708
and 710 in method 700 may be performed by system 100 using CNN 1100.
[0073] In the present application, to implement this functionality
using techniques
previously known in the art, the processor 404 or 504 would need to execute a
vastly
complicated rule-based natural language processing algorithm that would
consume many more
computational resources than the deep learning-based methods described herein, and/or would
produce less accurate results.
[0074] The system 100 has compatibility with other services based on a RESTful API, an
application programming interface (API or web API) that conforms to the constraints of the
REST architectural style and allows the system 100 to interact with RESTful web services. The
system 100 is device independent and can function on Android, iOS, etc.
[0075] The units described as separate parts may or may not be
physically separate, and
parts displayed as units may or may not be physical units, may be located in
one position, or may
be distributed on a plurality of network units. Some or all of the units may
be selected according
to actual requirements to achieve the objectives of the solutions of the
embodiments.
[0076] In addition, functional units in the example embodiments may be
integrated into
one processing unit, or each of the units may exist alone physically, or two
or more units are
integrated into one unit.
[0077] When the functions are implemented in the form of a software
functional unit and
sold or used as an independent product, the functions may be stored in a
computer-readable

storage medium. Based on such an understanding, the technical solutions of
example
embodiments may be implemented in the form of a software product. The software
product is
stored in a storage medium, and includes several instructions for instructing
a computer device
(which may be a personal computer, a server, or a network device) to perform
all or some of the
steps of the methods described in the example embodiments. The foregoing
storage medium
includes any medium that can store program code, such as a Universal Serial
Bus (USB) flash
drive, a removable hard disk, a read-only memory (ROM), a random access memory
(RAM), a
magnetic disk, or an optical disc. In an example, the software product can be
an inference model
generated from a machine learning training process.
[0078] In the described methods or block diagrams, the boxes may represent
events,
steps, functions, processes, modules, messages, and/or state-based operations,
etc. While some of
the example embodiments have been described as occurring in a particular
order, some of the
steps or processes may be performed in a different order provided that the
result of the changed
order of any given step will not prevent or impair the occurrence of
subsequent steps.
Furthermore, some of the messages or steps described may be removed or
combined in other
embodiments, and some of the messages or steps described herein may be
separated into a
number of sub-messages or sub-steps in other embodiments. Even further, some
or all of the
steps may be repeated, as necessary. Elements described as methods or steps
similarly apply to
systems or subcomponents, and vice-versa. Reference to such words as "sending"
or "receiving"
could be interchanged depending on the perspective of the particular device.
[0079] The described embodiments are considered to be illustrative
and not restrictive.
Example embodiments described as methods would similarly apply to systems or
devices, and
vice-versa.
[0080] The various example embodiments are merely examples and are in
no way meant
to limit the scope of the example embodiments. Variations of the innovations
described herein
will be apparent to persons of ordinary skill in the art, such variations
being within the intended
scope of the example embodiments. In particular, features from one or more of
the example

embodiments may be selected to create alternative embodiments comprised of a
sub-combination
of features which may not be explicitly described. In addition, features from
one or more of the
described example embodiments may be selected and combined to create
alternative example
embodiments composed of a combination of features which may not be explicitly
described.
Features suitable for such combinations and sub-combinations would be readily
apparent to
persons skilled in the art. The subject matter described herein intends to
cover all suitable
changes in technology.

Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.


Event History

Description Date
Compliance Requirements Determined Met 2024-02-07
Letter Sent 2023-12-27
Application Published (Open to Public Inspection) 2023-06-23
Inactive: IPC assigned 2022-04-25
Inactive: IPC assigned 2022-04-25
Inactive: IPC assigned 2022-04-25
Inactive: IPC assigned 2022-04-25
Inactive: IPC assigned 2022-04-25
Inactive: First IPC assigned 2022-04-25
Inactive: IPC assigned 2022-04-25
Letter sent 2022-01-20
Filing Requirements Determined Compliant 2022-01-20
Small Entity Declaration Determined Compliant 2021-12-23
Inactive: QC images - Scanning 2021-12-23
Application Received - Regular National 2021-12-23

Abandonment History

There is no abandonment history.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Application fee - small 2021-12-23 2021-12-23
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
CYBERSMART TECHNOLOGIES INC.
Past Owners on Record
DANISH AHMED ANSARI
MURUGAN KRISHNASAMY
RAJEEV YADAV
RAJNEESH KANT SAXENA
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents




Document Description, Date (yyyy-mm-dd), Number of pages, Size of Image (KB)
Representative drawing 2023-12-12 1 5
Cover Page 2023-12-12 1 40
Description 2021-12-22 22 1,111
Claims 2021-12-22 4 113
Abstract 2021-12-22 1 20
Drawings 2021-12-22 9 137
Courtesy - Filing certificate 2022-01-19 1 568
Commissioner's Notice - Maintenance Fee for a Patent Application Not Paid 2024-02-06 1 552
New application 2021-12-22 8 231