Patent 2934102 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. The text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 2934102
(54) English Title: A SYSTEM AND A METHOD FOR TRACKING MOBILE OBJECTS USING CAMERAS AND TAG DEVICES
(54) French Title: UN SYSTEME ET UNE METHODE DE SUIVI DES OBJETS MOBILES AU MOYEN DE CAMERAS ET DE DISPOSITIFS A BALISE
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G01V 99/00 (2009.01)
  • G06T 7/00 (2006.01)
(72) Inventors :
  • NIELSEN, JORGEN (Canada)
  • GEE, PHILLIP RICHARD (Canada)
(73) Owners :
  • APPROPOLIS INC. (Canada)
(71) Applicants :
  • APPROPOLIS INC. (Canada)
(74) Agent:
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2016-06-22
(41) Open to Public Inspection: 2016-12-25
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
62/184,726 United States of America 2015-06-25
62/236,412 United States of America 2015-10-02
14/866,499 United States of America 2015-09-25
14/997,977 United States of America 2016-01-18

Abstracts

English Abstract


A method and system for tracking mobile objects in a site are disclosed. The system comprises a computer cloud communicating with one or more imaging devices and one or more tag devices. Each tag device is attached to a mobile object, and has one or more sensors for sensing the motion of the mobile object. The computer cloud visually tracks mobile objects in the site using image streams captured by the imaging devices, and uses measurements obtained from tag devices to resolve ambiguity occurring in mobile object tracking. The computer cloud uses an optimization method to reduce power consumption of tag devices.


Claims

Note: Claims are shown in the official language in which they were submitted.


WHAT IS CLAIMED IS:

1. A system for tracking at least one mobile object in a site, the system comprising:
one or more imaging devices capturing images of at least a portion of the site; and
one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith; and
at least one processing structure combining the captured images with at least one of the one or more tag measurements for tracking the at least one mobile object.

2. The system of claim 1 wherein said one or more sensors comprise at least one of an Inertial Measurement Unit (IMU), a barometer, a thermometer, a magnetometer, a global navigation satellite system (GNSS) sensor, an audio frequency microphone, a light sensor, a camera, and a receiver signal strength (RSS) measurement sensor.

3. The system of claim 1 or 2 wherein the at least one processing structure analyzes images captured by the one or more imaging devices for determining a set of candidate tag devices for providing said at least one of the one or more tag measurements.

4. The system of claim 3 wherein the at least one processing structure analyzes images captured by the one or more imaging devices for selecting said at least one of the one or more tag measurements.

5. The system of any one of claims 1 to 4 wherein each of the tag devices provides the at least one of the one or more tag measurements to the at least one processing structure only when said tag device receives from the at least one processing structure a request for providing the at least one of the one or more tag measurements.

6. The system of any one of claims 1 to 5 wherein the at least one processing structure identifies, from the captured images, one or more foreground feature clusters (FFCs) for tracking the at least one mobile object, and determines a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

7. The system of claim 6 wherein at least one processing structure associates each tag device with one of the FFCs.

8. The system of claim 7 wherein, when associating a tag device with a FFC, the at least one processing structure calculates an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

9. The system of claim 8 wherein said FFC-tag association probability is calculated based on a set of consecutively captured images.

10. The system of any one of claims 6 to 9 wherein, after detecting the one or more FFCs, the at least one processing structure determines the location of each of the one or more FFCs in the captured image, and maps each of the one or more FFCs to a three-dimensional (3D) coordinate system of the site by using perspective mapping.

11. The system of any one of claims 6 to 10 wherein each FFC corresponds to a mobile object, and wherein the at least one processing structure tracks the FFCs using a first order Markov process.

12. The system of claim 11 wherein the at least one processing structure tracks the FFCs using a Kalman filter with a first order Markov Gaussian process.

13. The system of any one of claims 6 to 12 wherein, when tracking each of the FFCs, the at least one processing structure uses the coordinates of the corresponding mobile object in a 3D coordinate system of the site as state variables, and the coordinates of the FFC in a two dimensional (2D) coordinate system of the captured images as observations for the state variables, and wherein the at least one processing structure maps the coordinates of the corresponding mobile object in a 3D coordinate system of the site to the 2D coordinate system of the captured images.

14. The system of any one of claims 1 to 13 wherein the at least one processing structure discretizes at least a portion of the site into a plurality of grid points, and wherein, when tracking a mobile object in said discretized portion of the site, the at least one processing structure uses said grid points for approximating the location of the mobile object.

15. The system of claim 14 wherein, when tracking a mobile object in said discretized portion of the site, the at least one processing structure calculates a posterior position probability of the mobile object.

16. A method of tracking at least one mobile object in at least one visual field of view, comprising:
capturing at least one image of the at least one visual field of view;
identifying at least one candidate mobile object in the at least one image;
obtaining one or more tag measurements from at least one tag device, each of said at least one tag device being associated with a mobile object and moveable therewith; and
tracking at least one mobile object using the at least one image and the one or more tag measurements.

17. The method of claim 16 further comprising:
analyzing the at least one image for determining a set of candidate tag devices for providing said one or more tag measurements.

18. The method of claim 16 or 17 further comprising:
analyzing the at least one image for selecting said at least one of the one or more tag measurements.

19. The method of any one of claims 16 to 18 further comprising:
identifying, from the at least one image, one or more foreground feature clusters (FFCs) for tracking the at least one mobile object, and determining a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

20. The method of claim 19 further comprising:
associating each tag device with one of the FFCs.

21. The method of claim 20 further comprising:
calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

22. The method of any one of claims 19 to 21 further comprising:
tracking the FFCs using a first order Markov process.

23. The method of any one of claims 16 to 22 further comprising:
discretizing at least a portion of the site into a plurality of grid points; and
tracking a mobile object in said discretized portion of the site by using said grid points for approximating the location of the mobile object.

24. A non-transitory, computer readable storage device comprising computer-executable instructions for tracking at least one mobile object in a site, wherein the instructions, when executed, cause one or more processing structures to perform actions comprising:
capturing at least one image of the at least one visual field of view;
identifying at least one candidate mobile object in the at least one image;
obtaining one or more tag measurements from at least one tag device, each of said at least one tag device being associated with a mobile object and moveable therewith; and
tracking at least one mobile object using the at least one image and the one or more tag measurements.

25. The storage device of claim 24 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

26. The storage device of claim 24 or 25 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
analyzing the at least one image for selecting said at least one of the one or more tag measurements.

27. The storage device of any one of claims 24 to 26 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
identifying, from the at least one image, one or more foreground feature clusters (FFCs) for tracking the at least one mobile object, and determining a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

28. The storage device of claim 27 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
associating each tag device with one of the FFCs.

29. The storage device of claim 28 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

30. The storage device of any one of claims 24 to 29 further comprising computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising:
discretizing at least a portion of the site into a plurality of grid points; and
tracking a mobile object in said discretized portion of the site by using said grid points for approximating the location of the mobile object.

31. A system for tracking at least one mobile object in a site, the system comprising:
at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site and capturing images of at least a portion of the first subarea, the first subarea having at least a first entrance; and
one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith; and
at least one processing structure for:
determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and
combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

32. The system of claim 31 wherein the at least one processing structure builds a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

33. The system of claim 31 or 32 wherein said one or more initial conditions comprise data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

34. The system of any one of claims 31 to 33 further comprising:
at least a second imaging device having an FOV overlapping a second subarea of the site and capturing images of at least a portion of the second subarea, the first and second subareas sharing the at least first entrance;
and wherein the one or more initial conditions comprise data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

35. The system of any one of claims 31 to 34 wherein the first subarea comprises at least one obstruction in the FOV of the at least first imaging device, and wherein the at least one processing structure uses a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

36. A method for tracking at least one mobile object in a site, the method comprising:
obtaining a plurality of images captured by at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site, the first subarea having at least a first entrance;
obtaining tag measurements from one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith;
determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and
combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

37. The method of claim 36 further comprising:
building a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

38. The method of claim 36 or 37 further comprising:
assembling said one or more initial conditions using data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

39. The method of any one of claims 36 to 38 further comprising:
obtaining images captured by at least a second imaging device having an FOV overlapping a second subarea of the site, the first and second subareas sharing the at least first entrance; and
assembling the one or more initial conditions using data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

40. The method of any one of claims 36 to 39 wherein the first subarea comprises at least one obstruction in the FOV of the at least first imaging device; and the method further comprising:
using a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

41. One or more non-transitory, computer readable media storing computer executable code for tracking at least one mobile object in a site, the computer executable code comprising computer executable instructions for:
obtaining a plurality of images captured by at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site, the first subarea having walls and at least a first entrance;
obtaining tag measurements from one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith;
determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and
combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

42. The computer readable media of claim 41 wherein the computer executable code further comprises computer executable instructions for:
building a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

43. The computer readable media of claim 41 or 42 wherein the computer executable code further comprises computer executable instructions for:
assembling said one or more initial conditions using data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

44. The computer readable media of any one of claims 41 to 43 wherein the computer executable code further comprises computer executable instructions for:
obtaining images captured by at least a second imaging device having an FOV overlapping a second subarea of the site, the first and second subareas sharing the at least first entrance; and
assembling the one or more initial conditions using data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

45. The computer readable media of any one of claims 41 to 44 wherein the first subarea comprises at least one obstruction in the FOV of the at least first imaging device; and wherein the computer executable code further comprises computer executable instructions for:
using a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

Description

Note: Descriptions are shown in the official language in which they were submitted.


A SYSTEM AND A METHOD FOR TRACKING MOBILE OBJECTS USING CAMERAS AND TAG DEVICES

FIELD OF THE DISCLOSURE

The present invention relates generally to a system and a method for tracking mobile objects, and in particular, a system and a method for tracking mobile objects using cameras and tag devices.

BACKGROUND

Outdoor mobile object tracking such as the Global Positioning System (GPS) is known. In the GPS system of the U.S.A. or similar systems such as the GLONASS system of Russia, the Doppler Orbitography and Radio-positioning Integrated by Satellite (DORIS) of France, the Galileo system of the European Union and the BeiDou system of China, a plurality of satellites in Earth orbit communicate with a mobile device in an outdoor environment to determine the location thereof. However, a drawback of these systems is that the satellite communication generally requires line-of-sight communication between the satellites and the mobile device, and thus they are generally unusable in indoor environments, except in restricted areas adjacent to windows and open doors.

Some indoor mobile object tracking methods and systems are also known. For example, in the Bluetooth Low Energy (BLE) technology, such as the iBeacon™ technology specified by Apple Inc. of Cupertino, CA, U.S.A. or Samsung's Proximity™, a plurality of BLE access points are deployed in a site and communicate with nearby mobile BLE devices such as smartphones for locating the mobile BLE devices using triangulation. Also, indoor WiFi signals are becoming ubiquitous and are commonly used for object tracking based on radio signal strength (RSS) observables. However, the mobile object tracking accuracy of these systems is still to be improved. Moreover, these systems can only track the location of a mobile object, and other information such as gestures of a person being tracked cannot be determined by these systems.

It is therefore an object to provide a novel mobile object tracking system and method with higher accuracy and robustness, and that provides more information about the mobile objects being tracked.

SUMMARY

There are a plethora of applications that require the location of a mobile device or a person in an indoor environment or in a dense urban outdoor environment. According to one aspect of this disclosure, an object tracking system and a method are disclosed for tracking mobile objects in a site, such as a campus, a building, a shopping center or the like.

Herein, mobile objects are moveable objects in the site, such as human beings, animals, carts, wheelchairs, robots and the like, and may be moving or stationary from time to time, usually in a random fashion from a statistical point of view.

According to another aspect of this disclosure, visual tracking in combination with tag devices is used for tracking mobile objects in the site. One or more imaging devices, such as one or more cameras, are used for intermittently or continuously visually tracking the locations of one or more mobile objects using suitable image processing technologies. One or more tag devices attached to mobile objects may also be used for refining object tracking and for resolving ambiguity occurring in visual tracking of mobile objects.

As will be described in more detail later, ambiguity in visual object tracking herein includes a variety of situations that render visual object tracking less reliable or even unreliable.

Each tag device is a uniquely identifiable, small electronic device attached to a mobile object of interest and moving therewith, undergoing the same physical motion. However, some mobile objects may not have any tag device attached thereto.

Each tag device comprises one or more sensors, and is battery powered and operable for an extended period of time, e.g., several weeks, between battery charges or replacements. The tag devices communicate with one or more processing structures, such as one or more processing structures of one or more server computers, e.g., a so-called computer cloud, using suitable wireless communication methods. Upon receiving a request signal from the computer cloud, a tag device uses its sensors to make measurements or observations of the mobile object associated therewith, and transmits these measurements wirelessly to the system. For example, a tag device may make measurements of the characteristics of its own physical motion. As the tag devices undergo the same physical motion as the associated mobile object, the measurements made by the tag devices represent the motion measurements of their associated mobile objects.

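The request-driven behaviour described above can be pictured with the following tag-side sketch; the message format, sensor set and `send` transport used here are illustrative assumptions and are not specified in this disclosure.

```python
import json
import random
import time

# Placeholder sensor readers standing in for real IMU/barometer drivers.
def read_imu():
    return {"accel_ms2": [random.gauss(0, 0.1) for _ in range(3)],
            "gyro_rads": [random.gauss(0, 0.01) for _ in range(3)]}

def read_barometer():
    return {"pressure_hpa": 1013.25 + random.gauss(0, 0.2)}

SENSORS = {"imu": read_imu, "barometer": read_barometer}

def handle_measurement_request(tag_id, request, send):
    """Sample only the sensors the computer cloud asked for and transmit them.

    `request` is a dict such as {"sensors": ["imu"]}; `send` is any function
    that pushes a JSON string over the tag's wireless link.
    """
    readings = {name: SENSORS[name]() for name in request.get("sensors", [])
                if name in SENSORS}
    send(json.dumps({"tag_id": tag_id,
                     "timestamp": time.time(),
                     "readings": readings}))

# Example: pretend the cloud requested an IMU measurement from tag "T-01".
handle_measurement_request("T-01", {"sensors": ["imu"]}, send=print)
```
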
According to another aspect of this disclosure, the object tracking system comprises a computer cloud having one or more servers, communicating with one or more imaging devices deployed in a site for visually detecting and tracking moving and stationary mobile objects in the site.

The computer cloud accesses suitable image processing technologies to detect foreground objects, denoted as foreground feature clusters (FFCs), from images or image frames captured by the imaging devices, each FFC representing a candidate mobile object in the field of view (FOV) of the imaging device. The computer cloud then identifies and tracks the FFCs.

When ambiguity occurs in identifying and tracking FFCs, the computer cloud requests one or more candidate tag devices to make necessary tag measurements. The computer cloud uses tag measurements to resolve any ambiguity and associates FFCs with tag devices for tracking.

According to another aspect of this disclosure, when associating FFCs with tag devices, the computer cloud calculates an FFC-tag association probability, indicating the correctness, reliability or belief in the determined association. In this embodiment, the FFC-tag association probability is numerically calculated, e.g., by using a suitable numerical method to find a numerical approximation of the FFC-tag association probability. The FFC-tag association probability is constantly updated as new images and/or tag measurements are made available to the system. The computer cloud attempts to maintain the FFC-tag association probability at or above a predefined probability threshold. If the FFC-tag association probability falls below the probability threshold, more tag measurements are requested. The tag devices, upon request, make the requested measurements and send the requested measurements to the computer cloud for establishing the FFC-tag association.

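A minimal sketch of this threshold test is given below; the numerical probability estimator is left as a caller-supplied placeholder, and the field names and threshold value are illustrative rather than taken from this disclosure.

```python
PROBABILITY_THRESHOLD = 0.9   # stand-in for the predefined probability threshold

def update_association(association, new_images, tag_measurements, estimator):
    """Re-estimate the FFC-tag association probability and flag whether more
    tag measurements should be requested.

    `estimator` is a placeholder for the numerical approximation of the
    FFC-tag association probability described above.
    """
    association["probability"] = estimator(association, new_images, tag_measurements)
    association["needs_measurement"] = association["probability"] < PROBABILITY_THRESHOLD
    return association

# Toy usage with a dummy estimator that ignores its inputs.
assoc = {"ffc_id": 3, "tag_id": "T-01", "probability": 1.0}
assoc = update_association(assoc, new_images=[], tag_measurements=[],
                           estimator=lambda a, imgs, tags: 0.82)
print(assoc["needs_measurement"])   # True -> request more tag measurements
```
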
Like any other systems, the system disclosed herein operates with constraints such as power consumption. Generally, the overall power consumption of the system comprises the power consumption of the tag devices in making tag measurements and the power consumed by other components of the system, including the computer cloud and the imaging devices. While the computer cloud and the imaging devices are usually powered by relatively unlimited sources of power, tag devices are usually powered by batteries having limited stored energy. Therefore, it is desirable, although optional in some embodiments, to manage power consumption of tag devices during mobile object tracking by using low power consumption components known in the art, and by only triggering tag devices to conduct measurements when actually needed.

Therefore, according to another aspect of this disclosure, at least in some embodiments, the system is designed using a constrained optimization algorithm with an objective of minimizing tag device energy consumption under a constraint on the probability of correctly associating the tag device with an FFC. The system achieves this objective by requesting tag measurements only when necessary, and by determining the candidate tag devices for providing the required tag measurements.

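Written as a constrained optimization (the notation below is ours, not the disclosure's), the design objective reads:

```latex
\min_{\mathcal{M}} \; \sum_{k \in \mathcal{M}} E_k
\quad \text{subject to} \quad
P_{\mathrm{assoc}}(\mathcal{M}) \;\geq\; P_{\mathrm{target}}
```

where M is the set of requested tag measurements, E_k is the energy cost of measurement k, P_assoc(M) is the resulting FFC-tag association probability, and P_target is the predefined probability threshold.
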
When requesting tag measurements, the computer cloud first determines a group of candidate tag devices based on the analysis of captured images, and determines required tag measurements based on the analysis of captured images and the knowledge of power consumption for making the tag measurements. The computer cloud then only requests the required tag measurements from the candidate tag devices.

One objective of the object tracking system is to visually track mobile objects and to use measurements from tag devices attached to mobile objects to resolve ambiguity occurring in visual object tracking. The system tracks the locations of mobile objects having tag devices attached thereto, and optionally and if possible, tracks mobile objects having no tag devices attached thereto. The object tracking system is the combination of:

1) Computer vision processing to visually track the mobile objects as they move throughout the site;

2) Wireless messaging between the tag device and the computer cloud to establish the unique identity of each tag device; herein, wireless messaging refers to any suitable wireless messaging means such as messaging via electromagnetic wave, optical means, acoustic telemetry, and the like;

3) Motion related observations or measurements registered by various sensors in tag devices, communicated wirelessly to the computer cloud; and

4) Cloud or network based processing to correlate the measurements of motion and actions of the tag devices with the computer vision based motion estimation and characterization of mobile objects, such that the association of the tag devices and the mobile objects observed by the imaging devices can be quantified through a computed probability of such association.

The object tracking system combines the tracking ability of imaging devices with that of tag devices for associating a unique identity to the mobile object being tracked. Thereby, the system can also distinguish between objects that appear similar, being differentiated by the tag. In another aspect, if some tag devices are associated with the identities of the mobile objects they are attached to, the object tracking system can further identify the identities of the mobile objects and track them.

In contradistinction, known visual object tracking technologies using imaging devices can associate a unique identity to the mobile object being tracked only if the image of the mobile object has at least one unique visual feature such as an identification mark, e.g., an artificial mark or a biometrical mark, e.g., a face feature, which may be identified by computer vision processing methods such as face recognition. Such detailed visual identity recognition is not always available or economically feasible.

According to one aspect of this disclosure, there is provided a system for tracking at least one mobile object in a site. The system comprises: one or more imaging devices capturing images of at least a portion of the site; and one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices obtaining one or more tag measurements related to the mobile object associated therewith; and at least one processing structure combining the captured images with at least one of the one or more tag measurements for tracking the at least one mobile object.

In some embodiments, each of the one or more tag devices comprises one or more sensors for obtaining the one or more tag measurements.

In some embodiments, the one or more sensors comprise at least one of an Inertial Measurement Unit (IMU), a barometer, a thermometer, a magnetometer, a global navigation satellite system (GNSS) sensor, an audio frequency microphone, a light sensor, a camera, and a receiver signal strength (RSS) measurement sensor.

In some embodiments, the RSS measurement sensor is a sensor for measuring the signal strength of a wireless signal received from a transmitter, for estimating the distance from the transmitter.

In some embodiments, the wireless signal is at least one of a Bluetooth signal and a WiFi signal.

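For illustration, a common way to turn such an RSS observable into a rough distance estimate is the log-distance path-loss model sketched below; the reference power and path-loss exponent are assumed, site-dependent values and are not parameters given in this disclosure.

```python
def rss_to_distance(rss_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate transmitter distance (metres) from a received signal strength.

    Uses the log-distance path-loss model RSS = P0 - 10*n*log10(d), where P0
    is the RSS measured at 1 m (tx_power_dbm) and n is the path-loss exponent.
    Both parameters must be calibrated for the site; the defaults are typical
    indoor BLE values assumed here purely for illustration.
    """
    return 10 ** ((tx_power_dbm - rss_dbm) / (10.0 * path_loss_exponent))

print(round(rss_to_distance(-75.0), 2))   # ~6.31 m for the assumed defaults
```
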
In some embodiments, the at least one processing structure analyzes images captured by the one or more imaging devices for determining a set of candidate tag devices for providing said at least one of the one or more tag measurements.

In some embodiments, the at least one processing structure analyzes images captured by the one or more imaging devices for selecting said at least one of the one or more tag measurements.

In some embodiments, each of the tag devices provides the at least one of the one or more tag measurements to the at least one processing structure only when said tag device receives from the at least one processing structure a request for providing the at least one of the one or more tag measurements.

In some embodiments, each of the tag devices, when receiving from the at least one processing structure a request for providing the at least one of the one or more tag measurements, only provides the requested at least one of the one or more tag measurements to the at least one processing structure.

In some embodiments, the at least one processing structure identifies from the captured images one or more foreground feature clusters (FFCs) for tracking the at least one mobile object.

In some embodiments, the at least one processing structure determines a bounding box for each FFC.

In some embodiments, the at least one processing structure determines a tracking point for each FFC.

In some embodiments, for each FFC, the at least one processing structure determines a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

In some embodiments, at least one processing structure associates each tag device with one of the FFCs.

In some embodiments, when associating a tag device with a FFC, the at least one processing structure calculates an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

In some embodiments, said FFC-tag association probability is calculated based on a set of consecutively captured images.

In some embodiments, said FFC-tag association probability is calculated by finding a numerical approximation thereof.

In some embodiments, when associating a tag device with a FFC, the at least one processing structure executes a constrained optimization algorithm for minimizing the energy consumption of the one or more tag devices while maintaining the FFC-tag association probability above a target value.

In some embodiments, when associating a tag device with a FFC, the at least one processing structure calculates a tag-image correlation between the tag measurements and the analysis results of the captured images.

In some embodiments, the tag measurements for calculating said tag-image correlation comprise measurements obtained from an IMU.

In some embodiments, the tag measurements for calculating said tag-image correlation comprise measurements obtained from at least one of an accelerometer, a gyroscope and a magnetometer, for calculating a correlation between the tag measurements and the analysis results of the captured images to determine whether a mobile object is changing its moving direction.

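As an illustration of such a tag-image correlation, the sketch below compares the yaw rate reported by the tag's gyroscope with the heading change of a candidate FFC derived from its successive mapped positions; the function name and inputs are hypothetical and only indicate one way the comparison could be made.

```python
import numpy as np

def heading_change_correlation(gyro_yaw_rate, ffc_positions, dt):
    """Correlate tag and image evidence that an object is changing direction.

    gyro_yaw_rate : yaw-rate samples (rad/s) reported by the tag's IMU.
    ffc_positions : (N, 2) array of the FFC's mapped site coordinates over the
                    same interval.
    dt            : sampling period in seconds.
    Returns a value in [-1, 1]; values near 1 suggest the tag and the FFC are
    turning together and therefore support the FFC-tag association.
    """
    gyro_yaw_rate = np.asarray(gyro_yaw_rate, dtype=float)
    ffc_positions = np.asarray(ffc_positions, dtype=float)
    # Heading of the FFC from frame-to-frame displacement, then its rate of change.
    headings = np.unwrap(np.arctan2(np.diff(ffc_positions[:, 1]),
                                    np.diff(ffc_positions[:, 0])))
    image_yaw_rate = np.diff(headings) / dt
    n = min(len(gyro_yaw_rate), len(image_yaw_rate))
    if n < 2:
        return 0.0
    a, b = gyro_yaw_rate[:n], image_yaw_rate[:n]
    if np.std(a) == 0.0 or np.std(b) == 0.0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])
```
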
In some embodiments, the at least one processing structure maintains a background image for each of the one or more imaging devices.

In some embodiments, when detecting FFCs from each of the captured images, the at least one processing structure generates a difference image by calculating the difference between the captured image and the corresponding background image, and detects one or more FFCs from the difference image.

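A minimal OpenCV sketch of this difference-image step is shown below; the threshold, the morphological clean-up and the minimum blob area are arbitrary illustrative choices, and the shadow mitigation described next is omitted.

```python
import cv2
import numpy as np

def detect_ffcs(frame_gray, background_gray, diff_threshold=30, min_area=500):
    """Detect foreground feature clusters (FFCs) in one grayscale frame.

    Returns a list of (bounding_box, tracking_point) tuples, where the
    tracking point is taken at the middle of the bounding box's bottom edge.
    """
    # Difference image between the captured frame and the maintained background.
    diff = cv2.absdiff(frame_gray, background_gray)
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ffcs = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue                            # ignore small noise blobs
        x, y, w, h = cv2.boundingRect(contour)
        tracking_point = (x + w // 2, y + h)    # bottom edge of the bounding box
        ffcs.append(((x, y, w, h), tracking_point))
    return ffcs
```
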

In some embodiments, when detecting one or more FFCs from the difference image, the at least one processing structure mitigates shadow from each of the one or more FFCs.

In some embodiments, after detecting the one or more FFCs, the at least one processing structure determines the location of each of the one or more FFCs in the captured image, and maps each of the one or more FFCs to a three-dimensional (3D) coordinate system of the site by using perspective mapping.

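Assuming the tracking point lies on a planar floor, the perspective mapping can be realized with a planar homography between the image and the floor plane, as in the sketch below; the four calibration correspondences are made-up values standing in for the calibration procedure described later with reference to the drawings.

```python
import cv2
import numpy as np

# Four calibration correspondences between image pixels and floor coordinates
# (metres) in the site's coordinate system; the numbers are illustrative only.
image_points = np.array([[100, 700], [1180, 690], [900, 300], [350, 310]], dtype=np.float32)
floor_points = np.array([[0.0, 0.0], [8.0, 0.0], [8.0, 12.0], [0.0, 12.0]], dtype=np.float32)

# Planar homography from the image plane to the floor plane.
H = cv2.getPerspectiveTransform(image_points, floor_points)

def tracking_point_to_site(tracking_point):
    """Map a bounding-box tracking point (pixels) to floor coordinates (metres),
    assuming the point lies on the planar floor of the imaged area."""
    px = np.array([[tracking_point]], dtype=np.float32)   # shape (1, 1, 2)
    return cv2.perspectiveTransform(px, H)[0, 0]

print(tracking_point_to_site((640, 500)))
```
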
In some embodiments, the at least one processing structure stores a 3D map of the site for mapping each of the one or more FFCs to the 3D coordinate system of the site, and wherein in said map, the site includes one or more areas, and each of the one or more areas has a horizontal, planar floor.

In some embodiments, the at least one processing structure tracks at least one of the one or more FFCs based on the velocity thereof determined from the captured images.

In some embodiments, each FFC corresponds to a mobile object, and wherein the at least one processing structure tracks the FFCs using a first order Markov process.

In some embodiments, the at least one processing structure tracks the FFCs using a Kalman filter with a first order Markov Gaussian process.

In some embodiments, when tracking each of the FFCs, the at least one processing structure uses the coordinates of the corresponding mobile object in a 3D coordinate system of the site as state variables, and the coordinates of the FFC in a two dimensional (2D) coordinate system of the captured images as observations for the state variables, and wherein the at least one processing structure maps the coordinates of the corresponding mobile object in a 3D coordinate system of the site to the 2D coordinate system of the captured images.

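A compact sketch of such a filter is given below, assuming a constant-velocity (first order Markov) motion model on a planar floor and using a caller-supplied floor-to-image projection as the nonlinear measurement function; the matrices, noise levels and helper names are illustrative placeholders rather than values taken from this disclosure.

```python
import numpy as np

def numerical_jacobian(h, x, eps=1e-5):
    """Finite-difference Jacobian of the measurement function h at state x."""
    fx = h(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (h(x + dx) - fx) / eps
    return J

class FloorEKF:
    """EKF with state [x, y, vx, vy] in site (floor) coordinates and a pixel
    observation of the FFC tracking point in the captured image."""
    def __init__(self, floor_to_pixel, dt=0.1):
        self.h = floor_to_pixel                  # maps (x, y) -> (u, v) pixels
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt         # constant-velocity motion model
        self.Q = np.diag([0.01, 0.01, 0.1, 0.1]) # process noise (illustrative)
        self.R = np.diag([4.0, 4.0])             # pixel observation noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def _measure(self, state):
        return np.asarray(self.h(state[:2]), dtype=float)

    def step(self, pixel_observation):
        # Predict with the first order Markov (constant-velocity) model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the pixel observation of the tracking point.
        H = numerical_jacobian(self._measure, self.x)
        y = np.asarray(pixel_observation, dtype=float) - self._measure(self.x)
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ H) @ self.P
        return self.x

# Toy usage with a simple scaling "projection" standing in for the homography.
ekf = FloorEKF(floor_to_pixel=lambda p: 100.0 * p + 50.0)
print(ekf.step([150.0, 250.0]))
```
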
In some embodiments, the at least one processing structure discretizes at least a portion of the site into a plurality of grid points, and wherein, when tracking a mobile object in said discretized portion of the site, the at least one processing structure uses said grid points for approximating the location of the mobile object.

In some embodiments, when tracking a mobile object in said discretized portion of the site, the at least one processing structure calculates a posterior position probability of the mobile object.

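The grid approximation lends itself to a discrete Bayes filter in which every grid point carries a posterior position probability, predicted with a simple motion kernel and updated with whatever likelihood the captured images and tag measurements provide; the sketch below uses a box-blur motion kernel and a made-up Gaussian likelihood purely for illustration.

```python
import numpy as np

def grid_bayes_update(prior, likelihood, blur=1):
    """One predict/update cycle of a discrete Bayes filter on a 2D grid.

    prior      : 2D array of posterior position probabilities from the last step.
    likelihood : 2D array, same shape, giving P(observation | object at grid point)
                 from the captured images and/or tag measurements.
    blur       : size of the motion kernel, a crude stand-in for the object's
                 motion between frames.
    """
    # Predict: spread probability to neighbouring grid points (box blur).
    padded = np.pad(prior, blur, mode="edge")
    predicted = np.zeros_like(prior)
    for dy in range(-blur, blur + 1):
        for dx in range(-blur, blur + 1):
            predicted += padded[blur + dy: blur + dy + prior.shape[0],
                                blur + dx: blur + dx + prior.shape[1]]
    predicted /= predicted.sum()
    # Update: weight by the observation likelihood and renormalize.
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Toy usage: 20 x 30 grid, uniform prior, Gaussian likelihood centred at (5, 12).
ys, xs = np.mgrid[0:20, 0:30]
prior = np.full((20, 30), 1.0 / (20 * 30))
likelihood = np.exp(-((ys - 5) ** 2 + (xs - 12) ** 2) / (2 * 2.0 ** 2))
posterior = grid_bayes_update(prior, likelihood)
print(np.unravel_index(posterior.argmax(), posterior.shape))   # -> (5, 12)
```
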
In some embodiments, the at least one processing structure identifies at least one mobile object from the captured images using biometric observation made from the captured images.

In some embodiments, the biometric observation comprises at least one of face characteristics and gait, and wherein the at least one processing structure makes the biometric observation using at least one of face recognition and gait recognition.

In some embodiments, at least a portion of the tag devices store a first ID for identifying the type of the associated mobile object.

In some embodiments, at least one of said tag devices is a smartphone.

In some embodiments, at least one of said tag devices comprises a microphone, and wherein the at least one processing structure uses tag measurement obtained from the microphone to detect at least one of room reverberation, background noise level and spectrum of noise, for establishing the FFC-tag association.

In some embodiments, at least one of said tag devices comprises a microphone, and wherein the at least one processing structure uses tag measurement obtained from the microphone to detect motion related sound, for establishing the FFC-tag association.

In some embodiments, said motion related sound comprises at least one of brushing of clothes against the microphone, sound of a wheeled object wheeling over a floor surface and sound of an object sliding on a floor surface.

In some embodiments, one or more first tag devices broadcast an ultrasonic sound signature, and wherein at least a second tag device comprises a microphone for receiving and detecting the ultrasonic sound signature broadcast from said one or more first tag devices, for establishing the FFC-tag association.

In some embodiments, the one or more processing structures are processing structures of one or more computer servers.

According to another aspect of this disclosure, there is provided a method of tracking at least one mobile object in at least one visual field of view. The method comprises: capturing at least one image of the at least one visual field of view; identifying at least one candidate mobile object in the at least one image; obtaining one or more tag measurements from at least one tag device, each of said at least one tag device being associated with a mobile object and moveable therewith; and tracking at least one mobile object using the at least one image and the one or more tag measurements.

In some embodiments, the method further comprises: analyzing the at least one image for determining a set of candidate tag devices for providing said one or more tag measurements.

In some embodiments, the method further comprises: analyzing the at least one image for selecting said at least one of the one or more tag measurements.

In some embodiments, the method further comprises: identifying, from the at least one image, one or more foreground feature clusters (FFCs) for tracking the at least one mobile object, and determining a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

In some embodiments, the method further comprises: associating each tag device with one of the FFCs.

In some embodiments, the method further comprises: calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

In some embodiments, the method further comprises: tracking the FFCs using a first order Markov process.

In some embodiments, the method further comprises: discretizing at least a portion of the site into a plurality of grid points; and tracking a mobile object in said discretized portion of the site by using said grid points for approximating the location of the mobile object.

According to another aspect of this disclosure, there is provided a non-transitory, computer readable storage device comprising computer-executable instructions for tracking at least one mobile object in a site, wherein the instructions, when executed, cause a first processor to perform actions comprising: capturing at least one image of the at least one visual field of view; identifying at least one candidate mobile object in the at least one image; obtaining one or more tag measurements from at least one tag device, each of said at least one tag device being associated with a mobile object and moveable therewith; and tracking at least one mobile object using the at least one image and the one or more tag measurements.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: analyzing the at least one image for selecting said at least one of the one or more tag measurements.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: identifying, from the at least one image, one or more foreground feature clusters (FFCs) for tracking the at least one mobile object, and determining a bounding box and a tracking point therefor, said tracking point being at a bottom edge of the bounding box.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: associating each tag device with one of the FFCs.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: calculating an FFC-tag association probability indicating the reliability of the association between the tag device and the FFC.

In some embodiments, the storage device further comprises computer-executable instructions, when executed, causing the one or more processing structures to perform actions comprising: discretizing at least a portion of the site into a plurality of grid points; and tracking a mobile object in said discretized portion of the site by using said grid points for approximating the location of the mobile object.

According to another aspect of this disclosure, there is provided a system for tracking at least one mobile object in a site. The system comprises: at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site and capturing images of at least a portion of the first subarea, the first subarea having at least a first entrance; and one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith; and at least one processing structure for: determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

In some embodiments, the at least one processing structure builds a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

In some embodiments, said one or more initial conditions comprise data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the system further comprises: at least a second imaging device having an FOV overlapping a second subarea of the site and capturing images of at least a portion of the second subarea, the first and second subareas sharing the at least first entrance; and wherein the one or more initial conditions comprise data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the first subarea comprises at least one obstruction in the FOV of the at least first imaging device; and wherein the at least one processing structure uses a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

According to another aspect of this disclosure, there is provided a method for tracking at least one mobile object in a site. The method comprises: obtaining a plurality of images captured by at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site, the first subarea having at least a first entrance; obtaining tag measurements from one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith; determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

In some embodiments, the method further comprises: building a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

In some embodiments, the method further comprises: assembling said one or more initial conditions using data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the method further comprises: obtaining images captured by at least a second imaging device having an FOV overlapping a second subarea of the site, the first and second subareas sharing the at least first entrance; and assembling the one or more initial conditions using data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the first subarea comprises at least one obstruction in the FOV of the at least first imaging device; and the method further comprises: using a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

According to another aspect of this disclosure, there is provided one or more non-transitory, computer readable media storing computer executable code for tracking at least one mobile object in a site. The computer executable code comprises computer executable instructions for: obtaining a plurality of images captured by at least a first imaging device having a field of view (FOV) overlapping a first subarea of the site, the first subarea having walls and at least a first entrance; obtaining tag measurements from one or more tag devices, each of the one or more tag devices being associated with one of the at least one mobile object and moveable therewith, each of the one or more tag devices having one or more sensors for obtaining one or more tag measurements related to the mobile object associated therewith; determining one or more initial conditions of the at least one mobile object entering the first subarea from the at least first entrance; and combining the one or more initial conditions, the captured images, and at least one of the one or more tag measurements for tracking the at least one mobile object.

In some embodiments, the computer executable code further comprises computer executable instructions for: building a bird's-eye view based on a map of the site, for mapping the at least one mobile object therein.

In some embodiments, the computer executable code further comprises computer executable instructions for: assembling said one or more initial conditions using data determined from one or more tag measurements regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the computer executable code further comprises computer executable instructions for: obtaining images captured by at least a second imaging device having an FOV overlapping a second subarea of the site, the first and second subareas sharing the at least first entrance; and assembling the one or more initial conditions using data determined from the at least second imaging device regarding the at least one mobile object before the at least one mobile object enters the first subarea from the at least first entrance.

In some embodiments, the first subarea comprises at least one obstruction in the FOV of the at least first imaging device; and wherein the computer executable code further comprises computer executable instructions for: using a statistical model based estimation for resolving ambiguity during tracking when the at least one mobile object temporarily moves behind the obstruction.

4 BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic diagram showing an object tracking system
6 deployed in a site, according to one embodiment;
7 Figure 2 is a schematic diagram showing the functional structure
of
8 the object tracking system of Fig. 1;
9 Figure 3 shows a foreground feature cluster (FFC) detected in a
captured image;
11 Figure 4 is a schematic diagram showing the main function blocks
of
12 the system of Fig. 1 and the data flow therebetween;
13 Figures 5A and 5B illustrate connected flowcharts showing steps of
a
14 process of tracking mobile objects using a vision assisted hybrid
location algorithm;
Figures 6A to 6D show steps of an example of establishing and
16 tracking an FFC-tag association following the process of Figs. 5A and
5B;
17 Figure 7 is a schematic diagram showing the main function blocks
of
18 the system of Fig. 1 and the data flows therebetween, according to an
alternative
19 embodiment;
Figure 8 is a flowchart showing the detail of FFC detection, according
21 to one embodiment;
22 Figures 9A to 9F show a visual representation of steps in an
example
23 of FFC detection;
21

CA 02934102 2016-06-22
1 Figure 10 shows a visual representation of an example of a
difference
2 image wherein the mobile object captured therein has a shadow;
3 Figure 11A is a three-dimensional (3D) perspective view of a
portion
4 of a site;
Figure 11B is a plan view of the site portion of Fig. 11A;
6 Figures 11C and 11D show the partition of the site portion of Fig.
11B
7 and 11A, respectively;
8 Figures 11E and 11F show the calibration processing for
establishing
9 perspective mapping between the site portion of Fig. 11A and captured
images;
Figure 12A shows a captured image of the site portion of Fig. 11A, the
11 captured image having an FFC of a person detected therein;
12 Figure 12B is a plan view of the site portion of Fig. 11A with the
FFC
13 of Fig. 12A mapped thereto;
14 Figure 120 shows a sitemap having the site portion of Fig. 11A and
the FFC of Fig. 12A mapped thereto;
16 Figure 13 shows a plot of the x-axis position of a bounding box
17 tracking point (BBTP) of an FFC in captured images, wherein the vertical
axis
18 represents the BBTP's x-axis position (in pixel) in captured images, and
the
19 horizontal axis represents the image frame index;
Figure 14 is a flowchart showing the detail of mobile object tracking
21 using an extended Kalman filter (EKF);
22 Figure 15A shows an example of two imaging devices CA and CB with
23 overlapping field of view (FOV) covering an L-shaped room;
22

CA 02934102 2016-06-22
1 Figure 15B shows a grid partitioning of the room of Fig. 15A;
2 Figure 16A shows an imaginary, one-dimensional room partitioned to
3 six grid points;
4 Figure 16B is a state diagram for the imaginary room of Fig. 16A;
Figures 17A and 17B are graphs for a deterministic example, where a
6 mobile object is moving left to right along the x-axis in the FOV of an
imaging device,
7 wherein Fig. 17A is a state transition diagram, and Fig, 17B shows a
graph of
8 simulation results;
9 Figures 18A to 18C show another example, where a mobile object is
slewing to the right hand side along the x-axis in the FOV of an imaging
device,
11 wherein Fig. 18A is a state transition diagram, and Fig. 18B and 18C are
graphs of
12 simulation results of the mean and the standard deviation (STD) of x-
and y-
13 coordinates of the mobile object, respectively;
14 Figure 19 is a schematic diagram showing the data flow for
determining a state transition matrix;
16 Figures 20A to 20E show visual representation of an example of
17 merging/occlusion of two mobile objects;
18 Figures 21A to 21E show visual representation of an example that a
19 mobile object is occluded by a background object;
Figure 22 shows a portion of the functional structure of a Visual
21 Assisted Indoor Location System (VAILS), according to an alternative
embodiment,
22 the portion shown in Fig. 22 corresponding to the computer cloud of Fig.
2;
23

CA 02934102 2016-06-22
Figure 23 is a schematic diagram showing the association of a blob in a camera view, a BV object in a birds-eye view of the site and a tag device;
Figure 24 is a schematic illustration of an example site, which is divided into a number of rooms, with entrances/exits connecting the rooms;
Figure 25 is a schematic illustration showing a mobile object entering a room and moving therein;
Figure 26 is a schematic diagram showing data flow between the imaging device, camera view processing submodule, internal blob track file (IBTF), birds-eye view processing submodule, network arbitrator, external blob track file (EBTF) and object track file (OTF);
Figures 27A to 27D are schematic illustrations showing possibilities that may cause ambiguity;
Figure 28 is a schematic illustration showing an example, in which a tagged mobile object moves in a room from a first entrance on the left-hand side of the room to the right-hand side thereof towards a second entrance, and an untagged object moves in the room from the second entrance on the right-hand side of the room to the left-hand side thereof towards the first entrance;
Figure 29 is a schematic diagram showing the relationship between the IBTF, EBTF, OTF, Tag Observable File (TOF) for storing tag observations, network arbitrator and tag devices;
Figure 30 is a schematic diagram showing information flow between camera views, birds-eye view and tag devices;
Figure 31 is a more detailed version of Fig. 30, showing information flow between camera views, birds-eye view and tag devices, and the function of the network arbitrator in the information flow;
Figure 32A shows an example of a type 3 blob having a plurality of sub-blobs;
Figure 32B is a diagram showing the relationship of the type 3 blob and its sub-blobs of Fig. 32A;
Figure 33 shows a timeline history diagram of a life span of a blob from its creation event to its annihilation event;
Figure 34 shows a timeline history diagram of the blobs of Fig. 28;
Figure 35A shows an example of a type 6 blob merged from two blobs;
Figure 35B is a diagram showing the relationship of the type 6 blob and its sub-blobs of Fig. 35A;
Figure 36A is a schematic illustration showing two tagged objects simultaneously entering a room from a same entrance and moving therein;
Figure 36B shows a timeline history diagram of a life span of a blob from its creation event to its annihilation event, for tracking two tagged objects simultaneously entering a room from a same entrance and moving therein with different speeds;
Figure 37A is a schematic illustration showing an example wherein a blob is split into two sub-blobs;
Figure 37B is a schematic illustration showing an example wherein a person enters a room, moves therein, and later pushes a cart to exit the room;
Figure 37C is a schematic illustration showing an example wherein a person enters a room, moves therein, sits down for a while, and then moves out of the room;
Figure 37D is a schematic illustration showing an example wherein a person enters a room, moves therein, sits down for a while at a location already having two persons sitting, and then moves out of the room;
Figure 38 is a table listing the object activities and the performances of the network arbitrator, camera view processing and tag devices that may be triggered by the corresponding object activities;
Figures 39A and 39B show two consecutive image frames, each having detected blobs;
Figure 39C shows the maximum correlation of the image frames of Figs. 39A and 39B;
Figure 40 shows an image frame having two blobs;
Figure 41A is a schematic illustration showing an example wherein a mobile object is moving in a room and is occluded by an obstruction therein;
Figure 41B is a schematic diagram showing data flow in tracking the mobile object of Fig. 41A;
Figure 42 shows a timeline history diagram of the blobs of Fig. 41A;
Figure 43 shows an alternative possibility that may give rise to the same camera view observations of Fig. 41A;
Figure 44 shows an example of a blob with a BBTP ambiguity region determined by the system;
Figures 45A and 45B show a BBTP in the camera view and mapped into the birds-eye view, respectively;
Figures 46A and 46B show an example of an ambiguity region of a BBTP (not shown) in the camera view and mapped into the birds-eye view, respectively;
Figure 47 shows a simulation configuration having an imaging device and an obstruction in the FOV of the imaging device;
Figure 48 shows the results of the DBN prediction of Fig. 47 without velocity feedback;
Figure 49 shows the prediction likelihood over time in tracking the mobile object of Fig. 47 without velocity feedback;
Figure 50 shows the results of the DBN prediction in tracking the mobile object of Fig. 47 with velocity feedback;
Figure 51 shows the prediction likelihood over time in tracking the mobile object of Fig. 47 with velocity feedback;
Figures 52A to 52C show another example of a simulation configuration, the simulated prediction likelihood without velocity feedback, and the simulated prediction likelihood with velocity feedback, respectively;
Figure 53A shows a simulation configuration for simulating the tracking of a first mobile object (not shown) with an interference object nearby the trajectory of the first mobile object and an obstruction between the imaging device and the trajectory;
Figure 53B shows the prediction likelihood of Fig. 53A;
Figures 54A and 54B show another simulation example of tracking a first mobile object (not shown) with an interference object nearby the trajectory of the first mobile object and an obstruction between the imaging device and the trajectory;
Figure 55 shows the initial condition flow and the output of the network arbitrator;
Figure 56 is a schematic illustration showing an example wherein two mobile objects move across a room but the imaging device therein reports only one mobile object exiting from an entrance on the right-hand side of the room;
Figure 57 shows another example, wherein the network arbitrator may delay the choice among candidate routes if the likelihoods of the candidate routes are still high, and make a choice when one candidate route exhibits sufficiently high likelihood;
Figure 58A is a schematic illustration showing an example wherein a mobile object moves across a room;
Figure 58B is a schematic diagram showing the initial condition flow and the output of the network arbitrator in the mobile object tracking example of Fig. 58A;
Figure 59 is a schematic illustration showing an example wherein a tagged object is occluded by an untagged object;
Figure 60 shows the relationship between the camera view processing submodule, birds-eye view processing submodule, and the network arbitrator/tag devices;
Figure 61 shows a 3D simulation of a room having an indentation representing a portion of the room that is inaccessible to any mobile objects;
Figure 62 shows the prediction probability based on arbitrary building wall constraints of Fig. 61, after a sufficient number of iterations to approximate a steady state;
Figures 63A and 63B show a portion of the MATLAB® code used in a simulation;
Figure 64 shows a portion of the MATLAB® code for generating a Gaussian shaped likelihood kernel;
Figures 65A to 65C show the plotting of the initial probability subject to the site map wall regions, the measurement probability kernel, and the probability after the measurement likelihood has been applied, respectively;
Figure 66 shows a steady state distribution reached in a simulation;
Figures 67A to 67D show the mapping between a world coordinate system and a camera coordinate system;
Figure 68A is an original picture used in a simulation;
Figure 68B is an image of the picture of Fig. 68A captured by an imaging device;
Figure 69 shows a portion of the MATLAB® code for correcting the distortion in Fig. 68B; and
Figure 70 shows the distortion-corrected image of Fig. 68B.
DETAILED DESCRIPTION

Glossary:
Global Positioning System (GPS)
Doppler Orbitography and Radio-positioning Integrated by Satellite (DORIS)
Bluetooth® Low Energy (BLE)
foreground feature clusters (FFCs)
field of view (FOV)
Inertial Measurement Unit (IMU)
global navigation satellite system (GNSS)
receiver signal strength (RSS)
two-dimensional (2D)
three-dimensional (3D)
bounding box tracking point (BBTP)
extended Kalman filter (EKF)
standard deviation (STD)
Visual Assisted Indoor Location System (VAILS)
internal blob track file (IBTF)
external blob track file (EBTF)
object track file (OTF)
Tag Observable File (TOF)
central processing units (CPUs)
input/output (I/O)
frames per second (fps)
personal data assistant (PDA)
universally unique identifier (UUID)
security camera system (SCS)
Radio-frequency identification (RFID)
probability density function (PDF)
mixture of Gaussians (MoG) model
singular value decomposition (SVD)
access point (AP)
standard deviation (STD) of x- and y-coordinates of the mobile object, denoted as STDx and STDy
birds-eye view (BV)
camera view processing and birds-eye view processing (CV/BV)
camera view (CV) objects
birds-eye view (BV) objects
object track file (OTF)
In the following, a method and system for tracking mobile objects in a site are disclosed. The system comprises one or more computer servers, e.g., a so-called computer cloud, communicating with one or more imaging devices and one or more tag devices. Each tag device is attached to a mobile object, and has one or more sensors for sensing the motion of the mobile object. The computer cloud visually tracks mobile objects in the site using image streams captured by the imaging devices, and uses measurements obtained from the tag devices to resolve ambiguity occurring in mobile object tracking. The computer cloud uses an optimization method to reduce the power consumption of the tag devices.
System Overview

Turning to Fig. 1, an object tracking system is shown, and is generally identified using numeral 100. The object tracking system 100 comprises one or more imaging devices 104, e.g., security cameras or other camera devices, deployed in a site 102, such as a campus, a building, a shopping center or the like. Each imaging device 104 communicates with a computer network or cloud 108 via suitable wired communication means 106, such as Ethernet, serial cable, parallel cable, USB cable, HDMI cable or the like, and/or via suitable wireless communication means such as Wi-Fi®, Bluetooth®, ZigBee®, 3G or 4G wireless telecommunications or the like. In this embodiment, the computer cloud 108 is also deployed in the site 102, and comprises one or more server computers 110 interconnected via necessary communication infrastructure.
One or more mobile objects 112, e.g., one or more persons, enter the site 102, and may move to different locations therein. From time to time, some mobile objects 112 may be moving, and some other mobile objects 112 may be stationary. Each mobile object 112 is associated with a tag device 114 movable therewith. Each tag device 114 communicates with the computer cloud 108 via suitable wireless communication means 116, such as Wi-Fi®, Bluetooth®, ZigBee®, 3G or 4G wireless telecommunications, or the like. The tag devices 114 may also communicate with other nearby tag devices using suitable peer-to-peer wireless communication means 118. Some mobile objects may not have a tag device associated therewith, and such objects cannot benefit fully from the embodiments disclosed herein.
The computer cloud 108 comprises one or more server computers 110 connected via suitable wired communication means 106. As those skilled in the art understand, the server computers 110 may be any computing devices suitable for acting as servers. Typically, a server computer may comprise one or more processing structures such as one or more single-core or multiple-core central processing units (CPUs), memory, input/output (I/O) interfaces including suitable wired or wireless networking interfaces, and control circuits connecting various computer components. The CPUs may be, e.g., Intel® microprocessors offered by Intel Corporation of Santa Clara, CA, USA, AMD® microprocessors offered by Advanced Micro Devices of Sunnyvale, CA, USA, ARM® microprocessors manufactured by a variety of manufacturers under the ARM® architecture developed by ARM Ltd. of Cambridge, UK, or the like. The memory may be volatile and/or non-volatile, non-removable or removable memory such as RAM, ROM, EEPROM, solid-state memory, hard disks, CD, DVD, flash memory, or the like. The networking interfaces may be wired networking interfaces such as Ethernet interfaces, or wireless networking interfaces such as WiFi®, Bluetooth®, 3G or 4G mobile telecommunication, ZigBee®, or the like. In some embodiments, parallel ports, serial ports and USB connections may also be used for networking, although they are usually considered as input/output interfaces for connecting input/output devices. The I/O interfaces may also comprise keyboards, computer mice, monitors, speakers and the like.
The imaging devices 104 are usually deployed in the site 102 covering most or all of the common traffic areas thereof, and/or other areas of interest. The imaging devices 104 capture images of the site 102 in their respective fields of view (FOVs). Images captured by each imaging device 104 may comprise the images of one or more mobile objects 112 within the FOV thereof.

Each captured image is sometimes called an image frame. Each imaging device 104 captures images or image frames at a designated frame rate, e.g., in some embodiments, 30 frames per second (fps), i.e., capturing 30 images per second. Of course, those skilled in the art understand that the imaging devices may capture image streams at other frame rates. The frame rate of an imaging device may be a predefined frame rate, or a frame rate adaptively designated by the computer cloud 108. In some embodiments, all imaging devices have the same frame rate. In some other embodiments, imaging devices may have different frame rates.
As the frame rate of each imaging device is known, each image frame is thus captured at a known time instant, and the time interval between each pair of consecutively captured image frames is also known. As will be described in more detail later, the computer cloud 108 analyses captured image frames to detect and track mobile objects. In some embodiments, the computer cloud 108 detects and tracks mobile objects in the FOV of each imaging device by individually analyzing each image frame captured therefrom (i.e., without using historical image frames). In some alternative embodiments, the computer cloud 108 detects and tracks mobile objects in the FOV of each imaging device by analyzing a set of consecutively captured images, including the most recently captured image and a plurality of previously consecutively captured images. In some other embodiments, the computer cloud 108 may combine image frames captured by a plurality of imaging devices for detecting and tracking mobile objects.
Ambiguity may occur during visual tracking of mobile objects. Ambiguity is a well-known issue in visual object tracking, and includes a variety of situations that make visual object tracking less reliable or even unreliable.

Ambiguity may occur in a single imaging device capturing images of a single mobile object. For example, in a series of images captured by an imaging device, a mobile object is detected moving towards a bush, disappearing behind it and then appearing from the opposite side of the bush. Ambiguity may occur as it may be uncertain whether the images captured a mobile object passing behind the bush, or whether they captured a first mobile object that moved behind the bush and stayed therebehind, while a second mobile object previously staying behind the bush then moved out thereof.
Ambiguity may occur in a single imaging device capturing images of multiple mobile objects. For example, in a series of image frames captured by an imaging device, two mobile objects are detected moving towards each other, merging into one object, and then separating into two objects again and moving apart from each other. Ambiguity occurs in this situation as it may be uncertain whether the two mobile objects crossed each other, or whether the two mobile objects moved towards each other to a meeting point (appearing in the captured images as one object) and then turned back in their respective coming directions.
Ambiguity may occur across multiple imaging devices. For example, in images captured by a first imaging device, a mobile object moves and disappears from the field of view (FOV) of the first imaging device. Then, in images captured by a second, neighboring imaging device, a mobile object appears in the FOV thereof. Ambiguity may occur in this situation as it may be uncertain whether it was a same mobile object moving from the FOV of the first imaging device into that of the second imaging device, or whether a first mobile object moved out of the FOV of the first imaging device and a second mobile object moved into the FOV of the second imaging device.

Other types of ambiguity in visual object tracking are also possible. For example, when determining the location of a mobile object in the site 102 based on the location of the mobile object in a captured image, ambiguity may occur as the determined location may not have sufficient precision required by the system.

In embodiments disclosed herein, when ambiguity occurs, the system uses tag measurements obtained from tag devices to associate objects detected in captured images with the tag devices for resolving the ambiguity.
Each tag device 114 is a small, battery-operated electronic device, which in some embodiments may be a device designed specifically for mobile object tracking, or alternatively may be a multi-purpose mobile device suitable for mobile object tracking, e.g., a smartphone, a tablet, a smart watch and the like. Moreover, in some alternative embodiments, some tag devices may be integrated with the corresponding mobile objects such as carts, wheelchairs, robots and the like.

Each tag device comprises a processing structure, one or more sensors and necessary circuits connecting the sensors to the processing structure. The processing structure controls the sensors to collect data, also called tag measurements or tag observations, and establishes communication with the computer cloud 108. In some embodiments, the processing structure may also establish peer-to-peer communication with other tag devices 114. Each tag device also comprises a unique identification code, which is used by the computer cloud 108 for uniquely identifying the tag devices 114 in the site 102.
In different embodiments, the tag device 114 may comprise one or more sensors for collecting tag measurements regarding the mobile object 112. The number and types of sensors used in each embodiment depend on the design target thereof, and may be selected by the system designer as needed and/or desired. The sensors may include, but are not limited to, an Inertial Measurement Unit (IMU) having accelerometers and/or gyroscopes (e.g., rate gyros) for motion detection, a barometer for measuring atmospheric pressure, a thermometer for measuring temperature external to the tag 114, a magnetometer, a global navigation satellite system (GNSS) sensor, e.g., a Global Positioning System (GPS) receiver, an audio frequency microphone, a light sensor, a camera, and an RSS measurement sensor for measuring the signal strength of a received wireless signal.
An RSS measurement sensor is a sensor for measuring the signal strength of a wireless signal received from a transmitter, for estimating the distance from the transmitter. The RSS measurement may be useful for estimating the location of a tag device 114. As described above, a tag device 114 may communicate with other nearby tag devices 114 using peer-to-peer communications 118. For example, some tag devices 114 may comprise a short-distance communication device such as a Bluetooth® Low Energy (BLE) device. Examples of BLE devices include transceivers using the iBeacon™ technology specified by Apple Inc. of Cupertino, CA, U.S.A., or using Samsung's Proximity™ technology. As those skilled in the art understand, a BLE device broadcasts a BLE signal (a so-called BLE beacon), and/or receives BLE beacons transmitted from nearby BLE devices. A BLE device may be a mobile device such as a tag device 114, a smartphone, a tablet, a laptop, a personal data assistant (PDA) or the like that uses a BLE technology. A BLE device may also be a stationary device such as a BLE transmitter deployed in the site 102.
A BLE device may detect BLE beacons transmitted from nearby BLE devices, determine their identities using the information embedded in the BLE beacons, and establish peer-to-peer links therewith. A BLE beacon usually includes a universally unique identifier (UUID), a Major ID and a Minor ID. The UUID generally represents a group, e.g., an organization, a firm, a company or the like, and is the same for all BLE devices in a same group. The Major ID represents a subgroup, e.g., a store of a retail company, and is the same for all BLE devices in a same subgroup. The Minor ID represents the BLE device in a subgroup. The combination of the UUID, Major ID and Minor ID, i.e., (UUID, Major ID, Minor ID), then uniquely determines the identity of the BLE device.
The short-distance communication device may comprise sensors for wireless receiver signal strength (RSS) measurement, e.g., Bluetooth® RSS measurement. As those skilled in the art appreciate, a BLE beacon may further include a reference transmit signal power indicator. Therefore, a tag device 114, upon detecting a BLE beacon broadcast from a nearby transmitter BLE device (which may be a nearby tag device 114 or a different BLE device such as a BLE transmitter deployed in the site 102), may measure the received signal power of the BLE beacon to obtain an RSS measurement, and compare the RSS measurement with the reference transmit signal power embedded in the BLE beacon to estimate the distance from the transmitter BLE device.
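By way of illustration only, and not as part of the disclosed embodiments, such a comparison of the RSS measurement with the reference transmit signal power may follow a log-distance path-loss model; the path-loss exponent and the convention that the reference power is the expected RSS at 1 m are assumptions in the sketch below:

import math

def estimate_distance_m(rss_dbm, ref_power_dbm, path_loss_exponent=2.0):
    """Estimate transmitter distance from a BLE RSS measurement.

    rss_dbm            -- received signal strength measured by the tag device
    ref_power_dbm      -- reference transmit power embedded in the beacon,
                          here assumed to be the expected RSS at 1 m
    path_loss_exponent -- 2.0 for free space; typically larger indoors (assumed)
    """
    return 10.0 ** ((ref_power_dbm - rss_dbm) / (10.0 * path_loss_exponent))

# Example: a beacon advertising -59 dBm at 1 m, measured at -75 dBm,
# yields roughly 6.3 m with an exponent of 2.0.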
The system 100 therefore may use the RSS measurement obtained by a target tag device regarding the BLE beacon of a transmitter BLE device to determine that two mobile objects 112 are in close proximity, such as two persons in contact, conversing, or the like (if the transmitter BLE device is another tag device 114), or to estimate the location of the mobile object 112 associated with the target tag device (if the transmitter BLE device is a BLE transmitter deployed at a known location), which may be used to facilitate the detection and tracking of the mobile object 112.
Alternatively, in some embodiments, the system may comprise a map of the site 102 indicative of the transmitter signal strength of a plurality of wireless signal transmitters, e.g., Bluetooth® and/or WiFi® access points, deployed at known locations of the site 102. The system 100 may use this wireless signal strength map and compare it with the RSS measurement of a tag device 114 to estimate the location of the tag device 114. In these embodiments, the wireless signal transmitters do not need to include a reference transmit signal power indicator in the beacon.
The computer cloud 108 tracks the mobile objects 112 using information obtained from images captured by the one or more imaging devices 104 and from the above-mentioned sensor data of the tag devices 114. In particular, the computer cloud 108 detects foreground objects or foreground feature clusters (FFCs) from images captured by the imaging devices 104 using image processing technologies.
Herein, the imaging devices 104 are located at fixed locations in the site 102, generally oriented toward a fixed direction (except that in some embodiments an imaging device may occasionally pan to a different direction), and focused, to provide a reasonably static background. Moreover, the lighting in the FOV of each imaging device is generally unchanged for the time intervals of interest, or changes so slowly that it may be considered unchanged over a finite number of consecutively captured images. Generally, the computer cloud 108 maintains a background image for each imaging device 104, which typically comprises images of permanent features of the site such as the floor, ceiling, walls and the like, and semi-permanent structures such as furniture, plants, trees and the like. The computer cloud 108 periodically updates the background images.
Mobile objects, whether moving or stationary, generally appear in the captured images as foreground objects or FFCs that occlude the background. Each FFC is an identified area in the captured images corresponding to a moving object that may be associated with a tag device 114. Each FFC is bounded by a bounding box. A mobile object being stationary for an extended period of time, however, may become a part of the background and undetectable from the captured images.

The computer cloud 108 associates detected FFCs with tag devices 114 using the information of the captured images and information received from the tag devices 114, for example, both evidencing motion of 1 meter per second. As each tag device 114 is associated with a mobile object 112, an FFC successfully associated with a tag device 114 is then considered an identified mobile object 112, and is tracked in the site 102.
Obviously, there may exist mobile objects in the site 102 that are not associated with any tag device 114 and therefore cannot be identified. Such unidentified mobile objects may be robots, animals, or people without a tag device. In this embodiment, unidentified mobile objects are ignored by the computer cloud 108. However, those skilled in the art appreciate that, alternatively, the unidentified mobile objects may also be tracked, to some extent, solely by using images captured by the one or more imaging devices 104.
Fig. 2 is a schematic diagram showing the functional structure 140 of the object tracking system 100. As shown, the computer cloud 108 functionally comprises a computer vision processing structure 146 and a network arbitrator component 148. Each tag device 114 functionally comprises one or more sensors 150 and a tag arbitrator component 152.

The network arbitrator component 148 and the tag arbitrator component 152 are the central components of the system 100 as they "arbitrate" the observations to be done by the tag device 114. The network arbitrator component 148 is a master component and the tag arbitrator components 152 are slave components. Multiple tag arbitrator components 152 may communicate with the network arbitrator component 148 at the same time and observations therefrom may be jointly processed by the network arbitrator component 148.
The network arbitrator component 148 manages all tag devices 114 in the site 102. When a mobile object 112 having a tag device 114 enters the site 102, the tag arbitrator component 152 of the tag device 114 automatically establishes communication with the network arbitrator component 148 of the computer cloud 108, via a so-called "handshaking" process. With handshaking, the tag arbitrator component 152 communicates its unique identification code to the network arbitrator component 148. The network arbitrator component 148 registers the tag device 114 in a tag device registration table (e.g., a table in a database), and communicates with the tag arbitrator component 152 of the tag device 114 to understand what types of tag measurements can be provided by the tag device 114 and how much energy each tag measurement will consume.
During mobile object tracking, the network arbitrator component 148 maintains communication with the tag arbitrator components 152 of all tag devices 114, and may request one or more tag arbitrator components 152 to provide one or more tag measurements. The tag measurements that a tag device 114 can provide depend on the sensors installed in the tag device. For example, accelerometers have an output triggered by the magnitude of change of acceleration, which can be used for sensing the moving of the tag device 114. The accelerometer and rate gyro can provide motion measurement of the tag device 114 or the mobile object 112 associated therewith. The barometer may provide air pressure measurement indicative of the elevation of the tag device 114.

With the information of each tag device 114 obtained during handshaking, the network arbitrator component 148 can dynamically determine which tag devices and what tag measurements therefrom are needed to facilitate mobile object tracking with minimum power consumption incurred to the tag devices (described in more detail later).
When the network arbitrator component 148 is no longer able to communicate with the tag arbitrator component 152 of a tag device 114 for a predefined period of time, the network arbitrator component 148 considers that the tag device 114 has left the site 102 or has been deactivated or turned off. The network arbitrator component 148 then deletes the tag device 114 from the tag device registration table.
As shown in Fig. 2, a camera system 142, such as a security camera system (SCS), controls the one or more imaging devices 104, collects images captured by the imaging devices 104, and sends the captured images to the computer vision processing structure 146.
The computer vision processing structure 146 processes the received images for detecting FFCs therein. Generally, the computer vision processing structure 146 maintains a background image for each imaging device 104. When an image captured by an imaging device 104 is sent to the computer vision processing structure 146, the computer vision processing structure 146 calculates the difference between the received image and the stored background image to obtain a difference image. With suitable image processing technology, the computer vision processing structure 146 detects the FFCs from the difference image. In this embodiment, the computer vision processing structure 146 periodically updates the background image to adapt to changes of the background environment, e.g., the illumination change from time to time.
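Purely as an illustrative sketch of this background-differencing step (the disclosure does not mandate a particular library or parameter values; the use of OpenCV, the threshold and the minimum blob area below are assumptions):

import cv2
import numpy as np

def detect_ffcs(frame, background, diff_threshold=30, min_area=500):
    """Detect FFC bounding boxes by differencing a frame against a stored background.

    frame and background are same-sized BGR images from one imaging device.
    Returns a list of (x, y, w, h) bounding boxes for candidate FFCs.
    """
    diff = cv2.absdiff(frame, background)                      # difference image
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)
    # Group foreground pixels into blobs and keep only blobs large enough to be FFCs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def update_background(background, frame, alpha=0.01):
    """Slowly blend the current frame into the background to follow illumination changes."""
    return cv2.addWeighted(background, 1.0 - alpha, frame, alpha, 0)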
Fig. 3 shows an FFC 160 detected in a captured image. As shown, a bounding box 162 is created around the extremes of the blob of the FFC 160. In this embodiment, the bounding box is a rectangular bounding box, and is used in image analysis unless detail, e.g., color, pose and other features, of the FFC is required.

A centroid 164 of the FFC 160 is determined. Here, the centroid 164 is not necessarily the center of the bounding box 162.

A bounding box tracking point (BBTP) 166 is determined at a location on the lower edge of the bounding box 162 such that a virtual line between the centroid 164 and the BBTP 166 is perpendicular to the lower edge of the bounding box 162. The BBTP 166 is used for determining the location of the FFC 160 (more precisely, the mobile object represented by the FFC 160) in the site 102. In some alternative embodiments, both the centroid 164 and the BBTP 166 are used for determining the location of the FFC 160 in the site 102.
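A minimal sketch of this construction (the function and variable names are illustrative only): because the lower edge of a rectangular bounding box is horizontal, the BBTP is simply the point on that edge directly below the centroid.

def bounding_box_tracking_point(bbox, centroid):
    """bbox = (x, y, w, h) in image pixels; centroid = (cx, cy) of the FFC blob.

    The BBTP lies on the lower edge of the bounding box, on the vertical line
    through the centroid, so the centroid-to-BBTP segment is perpendicular to
    that (horizontal) edge.
    """
    x, y, w, h = bbox
    cx, _ = centroid
    return (cx, y + h)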
In some embodiments, the outline of the FFC 160 may be reduced to a small set of features based on posture to determine, e.g., if the corresponding mobile object 112 is standing or walking. Moreover, analysis of the FFC 160 detected over a group of sequentially captured images may show that the FFC 160 is walking and may further provide an estimate of the gait frequency. As will be described in more detail later, a tag-image correlation between the tag measurements, e.g., gait frequency obtained by tag devices, and the analysis results of the captured images may be calculated for establishing FFC-tag association.

The computer vision processing structure 146 sends detected FFCs to the network arbitrator component 148. The network arbitrator component 148 associates the detected FFCs with tag devices 114, and, if needed, communicates with the tag arbitrator components 152 of the tag devices 114 to obtain tag measurements therefrom for facilitating FFC-tag association.

The tag arbitrator component 152 of a tag device 114 may communicate with the tag arbitrator components 152 of other nearby tag devices 114 using peer-to-peer communications 118.
Fig. 4 is a schematic diagram showing the main function blocks of the system 100 and the data flows therebetween. As shown, the camera system 142 feeds images captured by the cameras 104 in the site 102 into the computer vision processing block 146. The computer vision processing block 146 processes the images received from the camera system 142, e.g., performing necessary filtering, image corrections and the like, and isolates or detects a set of FFCs in the images that may be associated with tag devices 114.

The set of FFCs and their associated bounding boxes are then sent to the network arbitrator component 148. The network arbitrator component 148 analyzes the FFCs and may request the tag arbitrator components 152 of one or more tag devices 114 to report tag measurements for facilitating FFC-tag association.

Upon receiving a request from the network arbitrator component 148, the tag arbitrator component 152 in response makes the necessary tag measurements from the sensors 150 of the tag device 114, and sends the tag measurements to the network arbitrator component 148. The network arbitrator component 148 uses the received tag measurements to establish the association between the FFCs and the tag devices 114. Each FFC associated with a tag device 114 is considered an identified mobile object 112 and is tracked by the system 100.

The network arbitrator component 148 stores each FFC-tag association and an association probability thereof (FFC-tag association probability, described later) in a tracking table 182 (e.g., a table in a database). The tracking table 182 is updated every frame as required.
Data of FFC-tag associations in the tracking table 182, such as the height, color, speed and other feasible characteristics of the FFCs, is fed back to the computer vision processing block 146 for facilitating the computer vision processing block 146 to better detect the FFCs in subsequent images.
Figs. 5A and 5B illustrate a flowchart 200, in two sheets, showing steps of a process of tracking mobile objects 112 using a vision assisted hybrid location algorithm. As described before, a mobile object 112 is considered by the system 100 as an FFC associated with a tag device 114, or an "FFC-tag association" for simplicity of description.

The process starts when the system is started (step 202). After start, the system first goes through an initialization step 204 to ensure that all function blocks are ready for tracking mobile objects. For ease of illustration, this step also includes tag device initialization that will be executed whenever a tag device enters the site 102.
As described above, when a tag device 114 is activated, e.g., entering the site 102, or upon turning on, it automatically establishes communication with the computer cloud 108, via the "handshaking" process, to register itself in the computer cloud 108 and to report to the computer cloud regarding what types of tag measurements can be provided by the tag device 114 and how much energy each tag measurement will consume.

As the newly activated tag device 114 does not have any prior association with an FFC, the computer cloud 108, during handshaking, requests the tag device 114 to conduct a set of observations or measurements to facilitate the subsequent FFC-tag association with a sufficient FFC-tag association probability. For example, in an embodiment, the site 102 is a building, with a Radio-frequency identification (RFID) reader and an imaging device 104 installed at the entrance thereof. A mobile object 112 is equipped with a tag device 114 having an RFID tag. When the mobile object 112 enters the site 102 through the entrance thereof, the system detects the tag device 114 via the RFID reader. The detection of the tag device 114 is then used for associating the tag device with the FFC detected in the images captured by the imaging device at the entrance of the site 102.
Alternatively, facial recognition using images captured by the imaging device at the entrance of the site 102 may be used to establish the initial FFC-tag association. In some alternative embodiments, other biometric sensors coupled to the computer cloud 108, e.g., iris or fingerprint scanners, may be used to establish the initial FFC-tag association.
After initialization, each imaging device 104 of the camera system 142 captures images of the site 102, and sends a stream of captured images to the computer vision processing block 146 (step 206).

The computer vision processing block 146 detects FFCs from the received image streams (step 208). As described before, the computer vision processing structure 146 maintains a background image for each imaging device 104. When a captured image is received, the computer vision processing structure 146 calculates the difference between the received image and the stored background image to obtain a difference image, and detects FFCs from the difference image.
The computer vision processing block 146 then maps the detected FFCs into a three-dimensional (3D), physical-world coordinate system of the site by using, e.g., a perspective mapping or perspective transform technology (step 210). With the perspective mapping technology, the computer vision processing block 146 maps points in a two-dimensional (2D) image coordinate system (i.e., a camera coordinate system) to points in the 3D, physical-world coordinate system of the site using a 3D model of the site. The 3D model of the site is generally a description of the site and comprises a plurality of localized planes connected by stairs and ramps. The computer vision processing block 146 determines the location of the corresponding mobile object in the site by mapping the BBTP and/or the centroid of the FFC to the 3D coordinate system of the site.
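As a simplified sketch of this mapping step for a single localized (floor) plane, and not the disclosed implementation, a homography computed from a few image-to-floor correspondences can map a BBTP into site coordinates; the calibration points, the OpenCV usage and all names below are assumptions:

import cv2
import numpy as np

# Four calibration correspondences: pixel positions of known floor points and
# their coordinates (in metres) on one localized floor plane of the site model.
image_pts = np.float32([[120, 460], [510, 455], [600, 260], [90, 270]])
floor_pts = np.float32([[0.0, 0.0], [4.0, 0.0], [4.0, 6.0], [0.0, 6.0]])
H = cv2.getPerspectiveTransform(image_pts, floor_pts)

def bbtp_to_site(bbtp_xy):
    """Map a BBTP given in pixel coordinates onto the floor plane of the site."""
    src = np.float32([[bbtp_xy]])          # shape (1, 1, 2) as required
    dst = cv2.perspectiveTransform(src, H)
    return tuple(dst[0, 0])                # (X, Y) in metres on the floor plane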
The computer vision processing block 146 sends the detected FFCs, including their bounding boxes, BBTPs, their locations in the site and other relevant information, to the network arbitrator component 148 (step 212). The network arbitrator component 148 then collaborates with the tag arbitrator components 152 to associate each FFC with a tag device 114 and track the FFC-tag association, or, if an FFC cannot be associated with any tag device 114, mark it as unknown (steps 214 to 240).

In particular, the network arbitrator component 148 selects an FFC, and analyzes the image streams regarding the selected FFC (step 214). Depending on the implementation, in some embodiments, the image stream from the imaging device that captures the selected FFC is analyzed. In some other embodiments, other image streams, such as image streams from neighboring imaging devices, are also used in the analysis.
In this embodiment, the network arbitrator component 148 uses a position estimation method based on a suitable statistical model, such as a first order Markov process, and in particular, uses a Kalman filter with a first order Markov Gaussian process, to analyze the FFCs in the current images and historical images captured by the same imaging device to associate the FFCs with tag devices 114 for tracking. Motion activities of the FFCs are estimated, which may be compared with tag measurements for facilitating the FFC-tag association.
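A minimal constant-velocity Kalman filter sketch for tracking an FFC's mapped position is given below for illustration only; the state layout, frame interval and noise values are assumptions and not taken from the disclosure:

import numpy as np

dt = 1.0 / 30.0                       # assumed frame interval (30 fps)
F = np.array([[1, 0, dt, 0],          # state: [x, y, vx, vy] in site coordinates
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],           # only the mapped BBTP position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                  # assumed process noise
R = np.eye(2) * 5e-2                  # assumed measurement noise

def kf_step(x, P, z):
    """One predict/update cycle; z is the BBTP position mapped into the site plane."""
    x = F @ x                          # predict
    P = F @ P @ F.T + Q
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y                      # update
    P = (np.eye(4) - K @ H) @ P
    return x, P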
Various types of image analysis may be used for estimating the motion activities and modes of the FFCs.

For example, analyzing the BBTP of an FFC and the background may determine whether the FFC is stationary or moving in the foreground. Usually, a slight movement is detectable. However, as the computer vision processing structure 146 periodically updates the background image, a long-term stationary object 112 may become indistinguishable from the background, and no FFC corresponding to such an object 112 would be reliably detected from the captured images. In some embodiments, if an FFC that has been associated with a tag device disappears at a location, i.e., the FFC is no longer detectable in the current image but has been detected as stationary in historical images, the computer cloud 108 then assumes that a "hidden" FFC is still at the last known location, and maintains the association of the tag device with the "hidden" FFC.
By analyzing the BBTP of an FFC and the background, it may be detected that an FFC spontaneously appears from the background, if the FFC is detected in the current image but not in historical images previously captured by the same imaging device. Such a spontaneous appearance of an FFC may indicate that a long-term stationary mobile object starts to move, that a mobile object enters the FOV of the imaging device from a location undetectable by the imaging device (e.g., behind a door) if the FFC appears at an entrance location such as a door, or that a mobile object enters the FOV of the imaging device from the FOV of a neighboring imaging device if the FFC appears at about the edge of the captured image. In some embodiments, the computer cloud 108 jointly processes the image streams from all imaging devices. If an FFC FA associated with a tag device TA disappears from the edge of the FOV of a first imaging device, and a new FFC FB spontaneously appears in the FOV of a second, neighboring imaging device at a corresponding edge, the computer cloud 108 may determine that the mobile object previously associated with FFC FA has moved from the FOV of the first imaging device into the FOV of the second imaging device, and associates the FFC FB with the tag device TA.
By determining the BBTP in a captured image and mapping it into the 3D coordinate system of the site using perspective mapping, the location of the corresponding mobile object in the site, or its coordinates in the 3D coordinate system of the site, may be determined.

A BBTP may be mapped from a 2D image coordinate system into the 3D, physical-world coordinate system of the site using perspective mapping, and various inferences can then be extracted therefrom.

For example, as will be described in more detail later, a BBTP may appear to suddenly "jump", i.e., quickly move upward, if the mobile object moves partially behind a background object and is partially occluded, or may appear to quickly move downward if the mobile object is moving out of the occlusion. Such a quick upward/downward movement is unrealistic from a Bayesian estimation. As will be described in more detail later, the system 100 can detect such unrealistic upward/downward movement of the BBTP and correctly identify occlusion.
Identifying occlusion may be further facilitated by a 3D site map with identified background structures, such as trees, statues, posts and the like, that may cause occlusion. By combining the site map and the tracking information mapped thereinto, a trajectory of the mobile object passing possible background occlusion objects may be derived with high reliability.

If it is detected that the height of the bounding box of the FFC is shrinking or increasing, it may be determined that the mobile object corresponding to the FFC is moving away from or moving towards the imaging device, respectively. The change of scale of the FFC bounding box may be combined with the position change of the FFC in the captured images to determine the moving direction of the corresponding mobile object. For example, if the FFC is stationary or only slightly moving, but the height of the bounding box of the FFC is shrinking, it may be determined that the mobile object corresponding to the FFC is moving radially away from the imaging device.
The biometrics of the FFC, such as height, width, face, stride length of walking, length of arms and/or legs, and the like, may be detected using suitable algorithms for identification of the mobile object. For example, an Eigenface algorithm may be used for detecting face features of an FFC. The detected face features may be compared with those registered in a database to determine the identity of the corresponding mobile object, or be used to compare with suitable tag measurements to identify the mobile object.

The angles and motion of joints, e.g., elbows and knees, of the FFC may be detected using segmentation methods, and correlated with plausible motion as mapped into the 3D coordinate system of the site. The detected angles and motion of joints may be used for sensing the activity of the corresponding mobile object such as walking, standing, dancing or the like. For example, in Fig. 3, it may be detected that the mobile object corresponding to FFC 160 is running, by analyzing the angles of the legs with respect to the body. Generally, this analysis requires that at least some of the joints of the FFC are unobstructed in the captured images.
Two mobile objects may merge into one FFC in captured images. By using a Bayesian model, it may be detected that an FFC corresponds to two or more occluding objects. As will be described in more detail later, when establishing FFC-tag association, such an FFC is associated with the tag devices of the occluding mobile objects.

Similarly, two or more FFCs may emerge from a previously single FFC, which may be detected by using the Bayesian model. As will be described in more detail later, when establishing FFC-tag association, each of these FFCs is associated with a tag device with an FFC-tag association probability.
As described above, based on the perspective mapping, the direction of the movement of an FFC may be detected. With the assumption that the corresponding mobile object is always facing the direction of the movement, the heading of the mobile object may be detected by tracking the change of direction of the FFC in the 3D coordinate system. If the movement trajectory of the FFC changes direction, the direction change of the FFC would be highly correlated with the change of direction sensed by the IMU of the corresponding tag device.
Therefore, tag measurements comprising data obtained from the IMU (comprising an accelerometer and/or gyroscope) may be used for calculating a tag-image correlation between the IMU data, or data obtained from the accelerometer and/or gyroscope, and the FFC analysis of captured images to determine whether the mobile object corresponding to the FFC is changing its moving direction. In an alternative embodiment, data obtained from a magnetometer may be used and correlated with the FFC analysis of captured images to determine whether the mobile object corresponding to the FFC is changing its moving direction.
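One illustrative way to compute such a tag-image correlation, assuming uniformly sampled series aligned to the frame times (the helper name and normalization are assumptions), is a normalized correlation between the heading-change rate derived from the FFC trajectory and the yaw rate reported by the tag's gyroscope:

import numpy as np

def tag_image_correlation(ffc_heading_rate, gyro_yaw_rate):
    """Normalized correlation between image-derived and IMU-derived turn rates.

    ffc_heading_rate -- per-frame change of the FFC's heading in the site plane
    gyro_yaw_rate    -- tag gyroscope yaw rate resampled to the frame times
    Returns a value in [-1, 1]; values near 1 support the FFC-tag association.
    """
    a = np.asarray(ffc_heading_rate, dtype=float)
    b = np.asarray(gyro_yaw_rate, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0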
The colors of the pixels of the FFC may also be tracked for determining the location and environment of the corresponding mobile object. Color change of the FFC may be due to lighting, the pose of the mobile object, the distance of the mobile object from the imaging device, and/or the like. A Bayesian model may be used for tracking the color attributes of the FFC.
By analyzing the FFC, a periodogram of the walking gait of the corresponding mobile object may be established. The periodicity of the walking gait can be determined from the corresponding periodogram of the bounding box variations.

For example, if a mobile object is walking, the bounding box of the corresponding FFC will undulate with the object's walking. The bounding box undulation can be analyzed in terms of its frequency and depth for obtaining an indication of the walking gait.
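As an illustrative sketch only (the frame rate and function name are assumptions), the gait frequency can be read off a periodogram of the bounding box height over a window of frames:

import numpy as np

def gait_frequency_hz(bbox_heights, fps=30.0):
    """Estimate the walking gait frequency from bounding box height undulation."""
    h = np.asarray(bbox_heights, dtype=float)
    h = h - h.mean()                              # remove the DC component
    spectrum = np.abs(np.fft.rfft(h)) ** 2        # periodogram of the undulation
    freqs = np.fft.rfftfreq(len(h), d=1.0 / fps)
    peak = np.argmax(spectrum[1:]) + 1            # skip the zero-frequency bin
    return float(freqs[peak])                     # walking is typically ~1.5-2.5 Hz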
The above list of analysis is non-exhaustive, and may be selectively included in the system 100 by a system designer in various embodiments.
Referring back to Fig. 5A, at step 216, the network arbitrator component uses the image analysis results to calculate an FFC-tag association probability between the selected FFC and each of one or more candidate tag devices 114, e.g., the tag devices 114 that have not been associated with any FFCs. At this step, no tag measurements are used in calculating the FFC-tag association probabilities.

Each calculated FFC-tag association probability is an indicative measure of the reliability of associating the FFC with a candidate tag device. If any of the calculated FFC-tag association probabilities is higher than a predefined threshold, the selected FFC can be associated with a tag device without using any tag measurements.
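A minimal sketch of this decision (the probability inputs and the threshold value are placeholders, not values from the disclosure):

def best_association(candidate_probs, threshold=0.9):
    """candidate_probs maps tag_id -> FFC-tag association probability for one FFC.

    Returns (tag_id, probability) if one candidate clears the threshold,
    otherwise None, signalling that tag measurements must be requested.
    """
    if not candidate_probs:
        return None
    tag_id, p = max(candidate_probs.items(), key=lambda kv: kv[1])
    return (tag_id, p) if p >= threshold else None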
In some situations, an FFC may be associated with a tag device 114 and tracked by image analysis only, without using any tag measurements. For example, if a captured image comprises only one FFC, and there is only one tag device 114 registered in the system 100, the FFC may be associated with the tag device 114 without using any tag measurements.

As another example, the network arbitrator component 148 may analyze the image stream captured by an imaging device, including the current image and historical images captured by the same imaging device, to associate an FFC in the current image with an FFC in previous images such that the associated FFCs across these images represent a same object. If such an object has been previously associated with a tag device 114, then the FFC in the current image may be associated with the same tag device 114 without using any tag measurements.
As a further example, the network arbitrator component 148 may analyze a plurality of image streams, including the current images and historical images captured by the same and neighboring imaging devices, to associate an FFC with a tag device. For example, if an identified FFC in a previous image captured by a neighboring imaging device appears to be leaving the FOV thereof towards the imaging device that captures the current image, and an FFC in the current image appears to enter the FOV thereof from the neighboring imaging device, then the FFC in the current image may be considered the same as the FFC in the previous image captured by the neighboring imaging device, and can be identified, i.e., associated with the tag device that was associated with the FFC in the previous image captured by the neighboring imaging device.
At step 218, the network arbitrator component 148 uses the calculated FFC-tag association probabilities to check if the selected FFC can be associated with a tag device 114 and tracked without using any tag measurements. If any of the calculated FFC-tag association probabilities is higher than a predefined threshold, i.e., the selected FFC can be associated with a tag device without using any tag measurements, the process goes to step 234 in Fig. 5B (illustrated in Figs. 5A and 5B using connector C).
However, if at step 218, none of the calculated FFC-tag association probabilities is higher than a predefined threshold, the selected FFC can only be associated with a tag device if further tag measurements are obtained. The network arbitrator component 148 then determines, based on the analysis of step 214, a set of tag measurements that may be most useful for establishing the FFC-tag association with a minimum tag device power consumption, and then requests the tag arbitrator components 152 of the candidate tag devices 114 to activate only the related sensors to gather the requested measurements, and report the set of tag measurements (step 220).
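As a sketch only (the utility scores, energy costs and the greedy strategy are assumptions; the disclosure states the goal of minimum power consumption, not this specific algorithm), the arbitrator could rank candidate measurements by expected usefulness per unit of tag energy until the association is expected to become sufficiently reliable:

def select_tag_measurements(candidates, utility_needed):
    """candidates: list of (name, expected_utility, energy_cost_mJ) tuples.

    Greedily pick measurements with the best utility-per-energy ratio until the
    accumulated expected utility is judged sufficient for FFC-tag association.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, total_utility = [], 0.0
    for name, utility, _cost in ranked:
        if total_utility >= utility_needed:
            break
        chosen.append(name)
        total_utility += utility
    return chosen

# Example with illustrative numbers only:
# select_tag_measurements([("imu_motion", 0.6, 2.0), ("barometer", 0.3, 1.0),
#                          ("rss_scan", 0.5, 8.0)], utility_needed=0.8)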
Depending on the sensors installed on the tag device 114, numerous attributes of a mobile object 112 may be measured.

For example, by using the accelerometer and rate gyro of the IMU, a mobile object in a stationary state may be detected. In particular, a motion measurement is first determined by combining and weighting the magnitude of the rate gyro vector and the difference in the accelerometer vector magnitude output. If the motion measurement does not exceed a predefined motion threshold for a predefined time threshold, then the tag device 114, or the mobile object 112 associated therewith, is in a stationary state. There can be different levels of static depending on how long the threshold has not been exceeded. For example, one level of static may be sitting still for 5 seconds, and another level of static may be lying inactively on a table for hours.
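A minimal sketch of this motion measure follows; the weights, thresholds and the assumed 50 Hz sample rate (so 250 samples is roughly 5 s) are illustrative values, not taken from the disclosure:

import numpy as np

def is_stationary(acc_norms, gyro_vectors, w_acc=1.0, w_gyro=1.0,
                  motion_threshold=0.05, min_still_samples=250):
    """Declare the tag stationary if a weighted motion measure stays below a
    threshold for a minimum number of consecutive samples.

    acc_norms    -- sequence of accelerometer vector magnitudes
    gyro_vectors -- sequence of 3-axis rate-gyro samples
    """
    acc = np.asarray(acc_norms, dtype=float)
    gyro = np.linalg.norm(np.asarray(gyro_vectors, dtype=float), axis=1)
    # Weighted combination of gyro magnitude and change of accelerometer magnitude
    motion = w_gyro * gyro[1:] + w_acc * np.abs(np.diff(acc))
    if len(motion) < min_still_samples:
        return False
    return bool(np.all(motion[-min_still_samples:] < motion_threshold))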
Similarly, a transition of a mobile object 112 from stationary to moving may be detected by using the accelerometer and rate gyro of the IMU. As described above, the motion measurement is first determined. If the motion measurement exceeds the predefined motion threshold for a predefined time threshold, the tag device 114 or mobile object 112 is in motion.
Slight motion, walking or running of a mobile object 112 may be detected by using the accelerometer and rate gyro of the IMU. While non-stationary, whether a tag device 114 or mobile object 112 is in slight motion while standing in one place, walking at a regular pace, running or jumping may be further determined using the outputs of the accelerometer and rate gyro. Moreover, the outputs of the accelerometer and rate gyro may also be used for recognizing gestures of the mobile object 112.
Rotation of a mobile object 112 while walking or standing still may be detected by using the accelerometer and rate gyro of the IMU. Provided that the attitude of the mobile object 112 does not change during the rotation, the angle of rotation is approximately determined from the magnitude of the rotation vector, which may be determined from the outputs of the accelerometer and rate gyro.
A mobile object 112 going up/down stairs may be detected by using the barometer and accelerometer. Using the output of the barometer, pressure changes may be resolvable almost to each step going up or down stairs, which may be confirmed by the gesture detected from the output of the accelerometer.

A mobile object 112 going up/down an elevator may be detected by using the barometer and accelerometer. The smooth pressure changes between each floor as the elevator ascends and descends may be detected from the output of the barometer, which may be confirmed by a smooth change of the accelerometer output.
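Purely as an illustration of turning barometer readings into elevation changes (the standard-atmosphere constants are general knowledge and the assumed floor height is not from the disclosure), a pressure sample can be converted to an approximate altitude and floor changes inferred from the difference:

def pressure_to_altitude_m(pressure_pa, sea_level_pa=101325.0):
    """Barometric formula (international standard atmosphere approximation)."""
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))

def floors_changed(p_start_pa, p_end_pa, floor_height_m=3.0):
    """Positive result means the tag ascended; one floor is assumed to be ~3 m."""
    dh = pressure_to_altitude_m(p_end_pa) - pressure_to_altitude_m(p_start_pa)
    return round(dh / floor_height_m)

# Example: a pressure drop of roughly 36 Pa near sea level corresponds to
# about +3 m of elevation, i.e. one floor up.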
A mobile object 112 going in or out of a doorway may be detected by using the thermometer and barometer. Going from outdoor to indoor or from indoor to outdoor causes a change in temperature and pressure, which may be detected from the outputs of the thermometer and barometer. Going from one room through a doorway to another room also causes changes in temperature and pressure detectable by the thermometer and barometer.
The short term relative trajectory of a mobile object 112 may be detected by using the accelerometer and rate gyro. Conditioned on an initial attitude of the mobile object 112, the short term trajectory may be detected based on the integration and transformation of the outputs of the accelerometer and rate gyro. Initial attitudes of the mobile object 112 may need to be taken into account in the detection of the short term trajectory.

A periodogram of the walking gait of a mobile object 112 may be detected by using the accelerometer and rate gyro.
Fingerprinting position and trajectory of a mobile object 112 based on the magnetic vector may be determined by using the magnetometer and accelerometer. In some embodiments, the system 100 comprises a magnetic field map of the site 102. Magnetometer fingerprinting, aided by the accelerometer outputs, may be used to determine the position of the tag device 114 / mobile object 112. For example, by expressing the magnetometer and accelerometer measurements as two vectors, respectively, the vector cross-product of the magnetometer measurement vector and the accelerometer measurement vector can be calculated. With suitable time averaging, deviations of such a cross-product are approximately related to the magnetic field anomalies. In an indoor environment or an environment surrounded by magnetic material (such as iron rods in construction), the magnetic field anomaly will vary significantly. Such magnetic field variation due to the building structure and furniture can be captured or recorded in the magnetic field site map during a calibration process. Thereby, the likelihood of the magnetic anomalies can be determined by continuously sampling the magnetic and accelerometer vectors over time and comparing the measured anomaly with that recorded in the magnetic field site map.
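As an illustration only, the following Python sketch computes the time-averaged cross-product of the magnetometer and accelerometer vectors described above and compares it with a value recorded in a magnetic field site map; the function names, the Gaussian likelihood form and the sigma value are assumptions made for illustration, not part of the disclosed system.

    import numpy as np

    def magnetic_anomaly(mag_samples, acc_samples):
        # Per-sample cross-products of the magnetometer and accelerometer
        # vectors, then a simple time average (both inputs are (N, 3) arrays).
        cross = np.cross(mag_samples, acc_samples)
        return cross.mean(axis=0)

    def anomaly_likelihood(measured, map_value, sigma=5.0):
        # Gaussian likelihood of the measured anomaly given the value stored
        # in the magnetic field site map for a candidate location.
        distance = np.linalg.norm(np.asarray(measured) - np.asarray(map_value))
        return float(np.exp(-0.5 * (distance / sigma) ** 2))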
Fingerprinting position and trajectory of a mobile object 112 based on RSS may be determined by using RSS measurement sensors, e.g., RSS measurement sensors measuring Bluetooth and/or WiFi signal strength. By using the wireless signal strength map or the reference transmit signal power indicator in the beacon as described above, the location of a tag device 114 may be approximately determined using RSS fingerprinting based on the output of the RSS measurement sensor.
A single sample of the RSS measurement taken by a tag device 114 can be highly ambiguous as it is subjected to multipath distortion of the electromagnetic radio signal. However, a sequence of samples taken by the tag device 114 as it is moving with the associated mobile object 112 will provide an average that can be correlated with an RSS radio map of the site. Consequently, the trend of the RSS measurements as the mobile object is moving is related to the mobile object's position. For example, an RSS measurement may indicate that the mobile object is moving closer to an access point at a known position. Such RSS measurements may be used with the image based object tracking for resolving ambiguity. Moreover, some types of mobile objects, such as a human body, will absorb wireless electromagnetic signals, which may be leveraged for obtaining more inferences from RSS measurements.
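A minimal sketch of the RSS fingerprinting described above, assuming a grid-based RSS radio map and a Gaussian error model in dB (both assumptions made for illustration):

    import numpy as np

    def rss_cell_probabilities(rss_trace, radio_map, sigma_db=4.0):
        # rss_trace: (T, K) dBm readings from K access points over T samples;
        # averaging over the trace suppresses multipath fading on a moving tag.
        # radio_map: (G, K) expected dBm per candidate grid cell of the site.
        avg = np.nanmean(rss_trace, axis=0)
        err = radio_map - avg
        d2 = np.nansum(err ** 2, axis=1)
        lik = np.exp(-0.5 * d2 / sigma_db ** 2)
        return lik / lik.sum()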
Motion related sound, such as periodic rustling of clothes items brushing against the tag device, a wheeled object wheeling over a floor surface, the sound of an object sliding on a floor surface, and the like, may be detected by using an audio microphone. A periodogram of the magnitude of the acoustic signal captured by a microphone of the tag device 114 may be used to detect walking or running gait.
Voice of the mobile object or voice of another nearby mobile object may be detected by using an audio microphone. Voice is a biometric that can be used to facilitate tag-object association. By using voice detection and voice recognition, analysis of voice picked up by the microphone can be useful for determining the background environment of the tag device 114 / mobile object 112, e.g., in a quiet room, outside, in a noisy cafeteria, in a room with reverberations, and the like. Voice can also be used to indicate the approximate distance between two mobile objects 112 having tag devices 114. For example, if the microphones of two tag devices 114 can mutually hear each other, the system 100 may establish that the two corresponding mobile objects are at a close distance.
Proximity of two tag devices may be detected by using the audio microphone and ultrasonic sounding. In some embodiments, a tag device 114 can broadcast an ultrasonic sound signature, which may be received and detected by another tag device 114 using its microphone, and used for establishing the FFC-tag association and ranging.
The above list of tag measurements is non-exhaustive, and the tag measurements may be selectively included in the system 100 by a system designer in various embodiments. Typically there is ample information for tag devices to measure for positively establishing the FFC-tag association.
The operation of the network arbitrator component 148 and the tag arbitrator component 152 is driven by an overriding optimization objective. In other words, a constrained optimization is conducted with the objective of minimizing the tag device energy expenditure (e.g., minimizing battery consumption such that the battery of the tag device can last for several weeks). The constraints are that the estimated location of the mobile object equipped with the tag device (i.e., the tracking precision) needs to be within an acceptable error range, e.g., within a two-meter range, and that the association probability between an FFC, i.e., an observed object, and the tag device is required to be above a pre-determined threshold.
In other words, the network arbitrator component 148, during the above-mentioned handshaking process with each tag device 114, learns what types of tag measurements can be provided by the tag device 114 and how much energy each tag measurement will consume. The network arbitrator component 148 then uses the image analysis results obtained at step 214 to determine which tag measurement would likely give rise to an FFC-tag association probability higher than the predefined probability threshold with the smallest power consumption.
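The selection described above may be sketched as follows; the candidate dictionary fields and the function name are hypothetical, and a real implementation would estimate the expected association probability from the image analysis results of step 214:

    def select_tag_measurement(candidates, prob_threshold):
        # candidates: list of dicts such as
        #   {"name": "imu_walk", "energy_mj": 2.0, "expected_prob": 0.9}
        # where "expected_prob" is the association probability predicted from
        # the image analysis results and "energy_mj" the cost reported by the
        # tag device during handshaking.
        feasible = [c for c in candidates if c["expected_prob"] >= prob_threshold]
        if not feasible:
            return None          # no single measurement is expected to suffice
        return min(feasible, key=lambda c: c["energy_mj"])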
In some embodiments, one of the design goals of the system is to reduce the power consumption of the battery-driven tag devices 114. On the other hand, the power consumption of the computer cloud 108 is not constrained. In these embodiments, the system 100 may be designed in such a way that the computer cloud 108 takes on as much computation as possible to reduce the computation needs of the tag devices 114. Therefore, the computer cloud 108 may employ complex vision-based object detection methods such as face recognition, gesture recognition and other suitable biometrics detection methods, and jointly process the image streams captured by all imaging devices, to identify as many mobile objects as feasible within its capability. The computer cloud 108 requests tag devices to report tag measurements only when necessary.
Referring back to Fig. 5A, at step 222, the tag arbitrator components 152 of the candidate tag devices 114 receive the tag measurement request from the network arbitrator component 148. In response, each tag arbitrator component 152 makes the requested tag measurements and reports the tag measurements to the network arbitrator component 148. The process then goes to step 224 of Fig. 5B (illustrated in Figs. 5A and 5B using connector A).
In this embodiment, at step 222, the tag arbitrator component 152 collects data from suitable sensors 150 and processes the collected data to obtain tag measurements. The tag arbitrator component 152 sends tag measurements, rather than raw sensor data, to the network arbitrator component 148 to save transmission bandwidth and cost.
For example, if the network arbitrator component 148 requests a tag arbitrator component 152 to report whether its associated mobile object is stationary or walking, the tag arbitrator component 152 collects data from the IMU and processes the collected IMU data to calculate a walking probability indicating the likelihood that the associated mobile object is walking. The tag arbitrator component 152 then sends the calculated walking probability to the network arbitrator component 148. Compared to transmitting the raw IMU data, transmitting the calculated walking probability of course consumes much less communication bandwidth and power.
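A rough sketch of how a tag device could compute such a walking probability on-board from raw accelerometer data, assuming a 50 Hz sample rate and a 1-3 Hz gait band (both illustrative assumptions); only the scalar result would be transmitted:

    import numpy as np
    from scipy.signal import periodogram

    def walking_probability(acc_xyz, fs=50.0):
        # acc_xyz: (N, 3) accelerometer samples in m/s^2; fs: sample rate in Hz.
        # Walking typically shows a dominant gait frequency of roughly 1-3 Hz
        # in the acceleration magnitude, so the fraction of spectral energy in
        # that band is used here as a crude walking likelihood.
        magnitude = np.linalg.norm(acc_xyz, axis=1) - 9.81   # remove gravity offset
        freqs, power = periodogram(magnitude, fs=fs)
        gait_band = (freqs >= 1.0) & (freqs <= 3.0)
        ratio = power[gait_band].sum() / (power.sum() + 1e-12)
        return float(np.clip(ratio, 0.0, 1.0))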
At step 224 (Fig. 5B), the network arbitrator component 148 then correlates the image analysis results of the FFC and the tag measurements received therefrom, and calculates an FFC-tag association probability between the FFC and each candidate tag device 114.
At step 226, the network arbitrator component 148 checks if any of the calculated FFC-tag association probabilities is greater than the predefined probability threshold. If a calculated FFC-tag association probability is greater than the predefined probability threshold, the network arbitrator component 148 associates the FFC with the corresponding tag device 114 (step 234).
At step 236, the network arbitrator component 148 stores the FFC-tag association in the tracking table 182, together with data related thereto such as the location, speed, moving direction, and the like, if the tag device 114 has not yet been associated with any FFC, or updates the FFC-tag association in the tracking table if the tag device 114 has already been associated with an FFC in previous processing. The computer vision processing block 146 tracks the FFCs/mobile objects.
In this way, the system continuously detects and tracks the mobile objects 112 in the site 102 until the tag device 114 is no longer detectable, implying that the mobile object 112 has been stationary for an extended period of time or has moved out of the site 102, or until the tag device 114 cannot be associated with any FFC, implying that the mobile object 112 is at an undetectable location in the site (e.g., a location beyond the FOV of all imaging devices).
After storing/updating the FFC-tag association, the network arbitrator component 148 sends data of the FFC-tag association, such as the height, color, speed and other feasible characteristics of the FFCs, to the computer vision processing block 146 (step 238) for facilitating the computer vision processing block 146 to better detect the FFC in subsequent images, e.g., facilitating the computer vision processing block 146 in background differencing and bounding box estimation.
The process then goes to step 240, and the network arbitrator component 148 checks if all FFCs have been processed. If yes, the process goes to step 206 of Fig. 5A (illustrated in Figs. 5A and 5B using connector E) to process further images captured by the imaging devices 104. If not, the process loops to step 214 of Fig. 5A (illustrated in Figs. 5A and 5B using connector D) to select another FFC for processing.
If, at step 226, the network arbitrator component 148 determines that no calculated FFC-tag association probability is greater than the predefined threshold, the network arbitrator component 148 then checks if the candidate tag devices 114 can provide further tag measurements helpful in leading to a sufficiently high FFC-tag association probability (step 228), and if yes, requests the candidate tag devices 114 to provide further tag measurements (step 230). The process then loops to step 222 of Fig. 5A (illustrated in Figs. 5A and 5B using connector B).
If, at step 228, it is determined that no further tag measurements would be available for leading to a sufficiently high FFC-tag association probability, the network arbitrator component 148 marks the FFC as an unknown object (step 232). As described before, unknown objects are omitted, or alternatively, tracked up to a certain extent. The process then goes to step 240.
Although not shown in Figs. 5A and 5B, the process 200 may be terminated upon receiving a command from an administrative user.
Figs. 6A to 6D show an example of establishing and tracking an FFC-tag association following the process 200. As shown, the computer vision processing block 146 maintains a background image 250 of an imaging device. When an image 252 captured by the imaging device is received, the computer vision processing block 146 calculates a difference image 254 using suitable image processing technologies. As shown in Fig. 6C, two FFCs 272 and 282 are detected from the difference image 254. The two FFCs 272 and 282 are bounded by their respective bounding boxes 274 and 284. Each bounding box 274, 284 comprises a respective BBTP 276, 286. Fig. 6D shows the captured image 252 with the detected FFCs 272 and 282 as well as their bounding boxes 274 and 284 and BBTPs 276 and 286.
When processing the FFC 272, the image analysis of image 252 and historical images shows that the FFC 272 is moving by a walking motion and the FFC 282 is stationary. As the image 252 comprises two FFCs 272 and 282, the FFC-tag association cannot be established by using the image analysis results only.
Two tag devices 114A and 114B have been registered in the system 100, neither of which has been associated with an FFC. Therefore, both tag devices 114A and 114B are candidate tag devices.
The network arbitrator component 148 then requests the candidate tag devices 114A and 114B to measure certain characteristics of the motion of their corresponding mobile objects. After receiving the tag measurements from tag devices 114A and 114B, the network arbitrator component 148 compares the motion tag measurements of each candidate tag device with those obtained from the image analysis to calculate the probability that the object is undergoing a walking activity. One of the candidate tag devices, e.g., tag device 114A, may obtain a motion tag measurement leading to an FFC-tag association probability higher than the predefined probability threshold. The network arbitrator component 148 then associates FFC 272 with tag device 114A and stores this FFC-tag association in the tracking table 182. Similarly, the network arbitrator component 148 determines that the motion tag measurement from tag device 114B indicates that its associated mobile object is in a stationary state, and thus associates tag device 114B with FFC 282. The computer vision processing block 146 tracks the FFCs 272 and 282.
With the process 200, the system 100 tracks the FFCs that are potentially moving objects in the foreground. The system 100 also tracks objects disappearing from the foreground, i.e., tag devices not associated with any FFC, which implies that the corresponding mobile objects may be outside the FOV of any imaging device 104, e.g., in a washroom area or private office where there is no camera coverage. Such disappearing objects, i.e., those corresponding to tag devices with no FFC-tag association, are still tracked based on the tag measurements they provide to the computer cloud 108, such as RSS measurements.
Disappearing objects may also be those that have become static for an extended period of time and are therefore part of the background and hence not part of a bounding box 162. It is usually necessary for the system 100 to track all tag devices 114 because in many situations only a portion of the tag devices can be associated with FFCs. Moreover, not all FFCs or foreground objects can be associated with tag devices. The system may track these FFCs based on image analysis only, or alternatively, ignore them.
With the process 200, an FFC may be associated with one or more tag devices 114. For example, when a mobile object 112C having a tag device 114C is sufficiently distant from other mobile objects in the FOV of an imaging device, the image of the mobile object 112C as an FFC is distinguishable from other mobile objects in the captured images. The FFC of the mobile object 112C is then associated with the tag device 114C only.
However, when a group of mobile objects 112D are close to each other, e.g., two persons shaking hands, they may be detected as one FFC in the captured images. In this case, the FFC is associated with all tag devices of the mobile objects 112D.
Similarly, when a mobile object 112E is partially or fully occluded in the FOV of an imaging device by one or more mobile objects 112F, the mobile objects 112E and 112F may be indistinguishable in the captured images, and be detected as one FFC. In this case, the FFC is associated with all tag devices of the mobile objects 112E and 112F.
Those skilled in the art understand that an FFC associated with multiple tag devices is usually temporary. Any ambiguity caused therefrom may be automatically resolved in subsequent mobile object detection and tracking when the corresponding mobile objects are separated in the FOV of the imaging devices.
While the above has described a number of embodiments, those skilled in the art appreciate that other alternative embodiments are also readily available. For example, although in the above embodiments, data of FFC-tag associations in the tracking table 182 is fed back to the computer vision processing block 146 for facilitating the computer vision processing block 146 to better detect the FFC in subsequent images (Fig. 4), in an alternative embodiment, no data of FFC-tag associations is fed back to the computer vision processing block 146. Fig. 7 is a schematic diagram showing the main function blocks of the system 100 and the data flows therebetween in this embodiment. The object tracking process in this embodiment is the same as the process 200 of Figs. 5A and 5B, except that, in this embodiment, the process does not have step 238 of Fig. 5B.
In the above embodiments, the network arbitrator component 148, when needing further tag measurements for establishing an FFC-tag association, only checks if the candidate tag devices 114 can provide further tag measurements helpful in leading to a sufficiently high FFC-tag association probability (step 228 of Fig. 5B). In an alternative embodiment, when needing further tag measurements of a first mobile object, the network arbitrator component 148 can request tag measurements from the tag devices near the first mobile object, or directly use the tag measurements if they have already been sent to the computer cloud 108 (probably previously requested for tracking other mobile objects). The tag measurements obtained from these tag devices can be used as an inference to the location of the first mobile object. This may be advantageous, e.g., for saving tag device power consumption if the tag measurements of the nearby tag devices are already available in the computer cloud, or when the battery power of the tag device associated with the first object is low.
In another embodiment, the tag devices constantly send tag measurements to the computer cloud 108 without being requested.
In another embodiment, each tag device attached to a non-human mobile object, such as a wheelchair, a cart, a shipping box or the like, stores a Type-ID indicating the type of the mobile object. In this embodiment, the computer cloud 108, when requesting tag measurements, can request tag devices to provide their stored Type-ID, and then uses object classification to determine the type of the mobile object, which may be helpful for establishing the FFC-tag association. Of course, alternatively, each tag device associated with a human object may also store a Type-ID indicating the type, i.e., human, of the mobile object.
In another embodiment, each tag device is associated with a mobile object, and the association is stored in a database of the computer cloud 108. In this embodiment, when ambiguity occurs in the visual tracking of mobile objects, the computer cloud 108 may request tag devices to provide their IDs, and checks the database to determine the identity of the mobile object for resolving the ambiguity.
In another embodiment, contour segmentation can be applied in detecting FFCs. Then, the motion of the mobile objects can be detected using suitable classification methods. For example, for individuals, after detecting an FFC, the outline of the detected FFC can be characterized by a small set of features based on posture for determining if the mobile object is standing or walking. Furthermore, the motion detected over a set of sequential image frames can give rise to an estimate of the gait frequency, which may be correlated with the gait determined from tag measurements.
In the above embodiments, the computer cloud 108 is deployed at the site 102, e.g., at an administration location thereof. However, those skilled in the art appreciate that, alternatively, the computer cloud 108 may be deployed at a location remote to the site 102, and communicate with imaging devices 104 and tag devices 114 via suitable wired or wireless communication means. In some other embodiments, a portion of the computer cloud 108, including one or more server computers 110 and necessary network infrastructure, may be deployed on the site 102, and other portions of the computer cloud 108 may be deployed remote to the site 102. Necessary network infrastructure known in the art is required for communication between different portions of the computer cloud 108, and for communication between the computer cloud 108 and the imaging devices 104 and tag devices 114.
Implementation
The above embodiments show that the system and method disclosed herein are highly customizable, providing great flexibility to a system designer to implement the basic principles and design the system in a way as desired, and to adapt to the design target that the designer has to meet and to the resources that the designer has, e.g., available sensors in tag devices, battery capacities of tag devices, computational power of tag devices and the computer cloud, and the like.
In the following, several aspects of implementing the above described system are described.
I. Imaging Device Frame Rates
In some embodiments, the imaging devices 104 may have different frame rates. For imaging devices with higher frame rates than others, the computer cloud 108 may, at step 206 of the process 200, reduce their frame rates by time-sampling the images captured by these imaging devices, or by commanding these imaging devices to reduce their frame rates. Alternatively, the computer cloud 108 may adapt to the higher frame rates thereof to obtain better real-time tracking of the mobile objects in the FOVs of these imaging devices.
II. Background Images
The computer cloud 108 stores and periodically updates a background image for each imaging device. In one embodiment, the computer cloud 108 uses a moving average method to generate the background image for each imaging device. That is, the computer cloud 108 periodically calculates the average of N consecutively captured images to generate the background image. While the N consecutively captured images may be slightly different from each other, e.g., having different lighting, foreground objects and the like, the differences between these images tend to disappear in the calculated background image when N is sufficiently large.
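A minimal sketch of the moving-average background described above, assuming greyscale frames and N = 200 (an arbitrary illustrative value):

    import collections
    import numpy as np

    class BackgroundModel:
        # Keeps the last N greyscale frames and returns their pixel-wise average.
        def __init__(self, n=200):
            self.frames = collections.deque(maxlen=n)

        def update(self, grey_frame):
            self.frames.append(grey_frame.astype(np.float32))

        def background(self):
            return np.mean(self.frames, axis=0).astype(np.uint8)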
III. FFC Detection
In implementing step 208 of detecting FFCs, the computer vision processing block 146 may use any suitable image processing methods to detect FFCs from captured images. For example, Fig. 8 is a flowchart showing the details of step 208 in one embodiment, which will be described together with the examples of Figs. 9A to 9F.
At step 302, a captured image is read into the computer vision processing block 146. In this embodiment, the captured image is an RGB color image. Fig. 9A is a line-drawn illustration of a captured color image having two facing individuals as two mobile objects.
At step 304, the captured image is converted to a greyscale image (the current image) and a difference image is generated by subtracting the background image, which is also a greyscale image in this embodiment, from the current image on a pixel by pixel basis. The obtained difference image is converted to a binary image by applying a suitable threshold, e.g., pixel value being equal to zero or not.
Fig. 9B shows the difference image 344 obtained from the captured image 342. As can be seen, two images 346 and 348 of the mobile objects in the FOV of the imaging device have been isolated from the background. However, the difference image 344 has imperfections. For example, the images 346 and 348 of the mobile objects are incomplete, as some regions of the mobile objects appear in the image with colors or grey intensities insufficient for differentiating them from the background. Moreover, the difference image 344 also comprises salt and pepper noise pixels 350.
At step 306, the difference image is processed using morphological operations to compensate for the imperfections. The morphological operations use morphology techniques that process images based on shapes. The morphological operations apply a structuring element to the input image, i.e., the difference image in this case, creating an output image of the same size. In morphological operations, the value of each pixel in the output image is determined based on a comparison of the corresponding pixel in the input image with its neighbors. Imperfections are then compensated to certain extents.
In this embodiment, the difference image 344 is first processed using morphological opening and closing. As shown in Fig. 9C, salt and pepper noise is removed. The difference image 344 is then processed using erosion and dilation operations. As shown in Fig. 9D, the shapes of the mobile object images 346 and 348 are improved. However, the mobile object image 346 still contains a large internal hole 354.
After the erosion and dilation operations, a flood fill operation is applied to the difference image 344 to close up any internal holes. The difference image 344 after the flood fill operation is shown in Fig. 9E.
As also shown in Fig. 9E, the processed difference image 344 comprises small spurious FFCs 356 and 358. By applying suitable size criteria, such small spurious FFCs 356 and 358 are rejected as their sizes are smaller than a predefined threshold. Large spurious FFCs, on the other hand, may be retained as FFCs. However, they may be omitted later for not being able to be associated with any tag device. In some cases, a large spurious FFC, e.g., a shopping cart, may be associated with another FFC, e.g., a person, already associated with a tag device, based on similar motion between the two FFCs over time.
Referring back to Fig. 8, at step 308, the computer vision processing block 146 extracts FFCs 346 and 348 from the processed difference image 344, each FFC 346, 348 being a connected region in the difference image 344 (see Fig. 9F). The computer vision processing block 146 creates bounding boxes 356 and 358 and their respective BBTPs (not shown) for FFCs 346 and 348, respectively. Other FFC characteristics as described above are also determined.
After extracting FFCs from the processed difference image, the process then goes to step 210 of Fig. 5A.
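Steps 302 to 308 described above could be sketched with OpenCV roughly as follows; the threshold value, kernel size, minimum area and the bottom-centre placement of the BBTP are illustrative assumptions, not values taken from this disclosure:

    import cv2
    import numpy as np

    def detect_ffcs(frame_bgr, background_grey, min_area=500):
        grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(grey, background_grey)
        _, binary = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

        # Opening/closing removes salt-and-pepper noise; erosion/dilation
        # improves the object shapes.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        binary = cv2.dilate(cv2.erode(binary, kernel), kernel)

        # Flood fill from the border (assumed to be background), then combine
        # with the original mask to close internal holes.
        filled = binary.copy()
        mask = np.zeros((binary.shape[0] + 2, binary.shape[1] + 2), np.uint8)
        cv2.floodFill(filled, mask, (0, 0), 255)
        binary = binary | cv2.bitwise_not(filled)

        ffcs = []
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)   # OpenCV 4.x
        for contour in contours:
            if cv2.contourArea(contour) < min_area:   # reject small spurious FFCs
                continue
            x, y, w, h = cv2.boundingRect(contour)
            bbtp = (x + w // 2, y + h)                # assumed bottom-centre BBTP
            ffcs.append({"bbox": (x, y, w, h), "bbtp": bbtp})
        return ffcs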
The above process converts the captured color images to greyscale images for generating greyscale difference images and detecting FFCs. Those skilled in the art appreciate that, in an alternative embodiment, color difference images may be generated for FFC detection by calculating the difference on each color channel between the captured color image and the background color image. The calculated color channel differences are then weighted and added together to generate a greyscale image for FFC detection.
Alternatively, the calculated color channel differences may be enhanced by, e.g., first squaring the pixel values in each color channel, and then adding together the squared values of corresponding pixels in all color channels to generate a greyscale image for FFC detection.
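A small NumPy sketch of the two colour-difference variants described above (the equal channel weights are an illustrative assumption):

    import numpy as np

    def colour_difference(frame_bgr, background_bgr, weights=(1 / 3, 1 / 3, 1 / 3)):
        diff = frame_bgr.astype(np.float32) - background_bgr.astype(np.float32)
        weighted = (np.abs(diff) * np.asarray(weights)).sum(axis=2)  # weighted sum
        squared = (diff ** 2).sum(axis=2)        # enhanced variant: sum of squares
        return weighted, squared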
IV. Shadows
It is well known that a shadow may be cast adjacent an object in some lighting conditions. Shadows of a mobile object captured in an image may interfere with FFC detection, the FFC centroid determination and BBTP determination. For example, Fig. 10 shows a difference image 402 having the image 404 of a mobile object, and the shadow 406 thereof, which is shown in the image 402 under the mobile object image 404. Clearly, if both the mobile object image 404 and the shadow 406 were detected as an FFC, an incorrect bounding box 408 would be determined, and the BBTP would be mistakenly determined at a much lower position 410, compared to the correct BBTP location 412. As a consequence, the mobile object would be mapped to a wrong location in the 3D coordinate system of the site, being much closer to the imaging device.
Various methods may be used to mitigate the impact of shadows in detecting FFCs and in determining the bounding box, centroid and BBTP of an FFC. For example, in one embodiment, one may leverage the fact that the color of a shadow is usually different from that of the mobile object, and filter different color channels of a generated color difference image to eliminate the shadow or reduce the intensity thereof. This method would be less effective if the color of the mobile object is poorly distinguishable from the shadow.
In another embodiment, the computer vision processing block 146 considers the shadow as a random distribution, and analyses shadows in captured images to differentiate shadows from mobile object images. For example, for an imaging device facing a well-lit environment, where the lighting is essentially diffuse and all the background surfaces are Lambertian surfaces, the shadow cast by a mobile object consists of a slightly reduced intensity in a captured image compared to that of the background areas in the image, as the mobile object only blocks a portion of the light that is emanating from all directions. The intensity reduction is smaller the further the shadow point is from the mobile object. Hence the shadow will have an intensity distribution scaled with the distance between the shadow points and the mobile object, while the background has a deterministic intensity value. As the distance from the mobile object to the imaging device is initially unknown, the intensity of the shadow can be represented as a random distribution. The computer vision processing block 146 thus analyses shadows in images captured by this imaging device using a suitable random process method to differentiate shadows from mobile object images.
Some imaging devices may face an environment having specular light sources and/or background surfaces that are not Lambertian surfaces. Shadows in such an environment may not follow the above-mentioned characteristics of diffuse lighting. Moreover, lighting may change with time, e.g., due to sunlight penetration of a room, electrical lights being turned off or on, doors being opened or closed, and the like. Light changes will also affect the characteristics of shadows.
In some embodiments, the computer vision processing block 146 considers the randomness of the intensities of both the background and the shadow in each color channel, and considers that generally the background varies slowly and the foreground, e.g., a mobile object, varies rapidly. Based on such considerations, the computer vision processing block 146 uses pixel-wise high pass temporal filtering to filter out the shadows of mobile objects.
In some other embodiments, the computer vision processing block 146 determines a probability density function (PDF) of the background to adapt to the randomness of the lighting effects. The intensity of the background and shadow components follows a mixture of Gaussians (MoG) model, and the foreground, e.g., a mobile object, is then discriminated probabilistically. As there are a large number of neighboring pixels making up the foreground region, a spatial MoG representation of the PDF of the foreground intensity can be calculated for determining how different it is from the background or shadow.
In some further embodiments, the computer vision processing block 146 weights and combines the pixel-wise high pass temporal filtering and the spatial MoG models to determine, with higher probability, whether a given pixel is foreground, e.g., belongs to a mobile object.
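For reference, OpenCV's MOG2 background subtractor maintains a per-pixel mixture-of-Gaussians background model with optional shadow labelling, which is one possible way to realize the MoG-based discrimination described above (the parameter values below are illustrative assumptions):

    import cv2

    # With detectShadows=True, probable shadow pixels are labelled 127 in the
    # returned mask and can be dropped before FFC extraction.
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                    detectShadows=True)

    def foreground_mask(frame_bgr):
        mask = subtractor.apply(frame_bgr)
        mask[mask == 127] = 0          # discard shadow-labelled pixels
        return mask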
In still some further embodiments, the computer vision processing block 146 leverages the fact that, if a shadow is not properly eliminated, the BBTP of an FFC shifts from the correct location in the difference images and may shift with the change of lighting. With perspective mapping, such a shift of the BBTP in the difference images can be mapped to a physical location shift of the corresponding mobile object in the 3D coordinate system of the site. The computer vision processing block 146 calculates the physical location shift of the corresponding mobile object in the physical world, and requests the tag device to make the necessary measurement using, e.g., the IMU therein. The computer vision processing block 146 checks if the calculated physical location shift of the mobile object is consistent with the tag measurement, and compensates for the location shift using the tag measurement.
V. Perspective Mapping
As described above, at step 210 of Fig. 5A, the extracted FFCs are mapped to the 3D physical-world coordinate system of the site 102.
In one embodiment, the map of the site is partitioned into one or more horizontal planes L1, ..., Ln, each at a different elevation. In other words, in the 3D physical world coordinate system, points in each plane have the same z-coordinate. However, points in different planes have different z-coordinates. The FOV of each imaging device covers one or more horizontal planes.
A point (x_{w,i}, y_{w,i}, 0) on a plane L_i at an elevation z_i = 0 and falling within the FOV of an imaging device can be mapped to a point (x_c, y_c) in the images captured by the imaging device:

    [f_x, f_y, f_v]^T = H_i [x_{w,i}, y_{w,i}, 1]^T,    (1)

    x_c = f_x / f_v,    (2)

    y_c = f_y / f_v,    (3)

wherein

    H_i = [ H_{11,i}  H_{12,i}  H_{13,i} ;  H_{21,i}  H_{22,i}  H_{23,i} ;  H_{31,i}  H_{32,i}  H_{33,i} ]    (4)

is a 3-by-3 perspective-transformation matrix.
The above relationship between the point (x_{w,i}, y_{w,i}, 0) in the physical world and the point (x_c, y_c) in a captured image may also be written as:

    H_{31,i} x_c x_{w,i} + H_{32,i} x_c y_{w,i} + H_{33,i} x_c = H_{11,i} x_{w,i} + H_{12,i} y_{w,i} + H_{13,i},
    H_{31,i} y_c x_{w,i} + H_{32,i} y_c y_{w,i} + H_{33,i} y_c = H_{21,i} x_{w,i} + H_{22,i} y_{w,i} + H_{23,i}.    (5)
For each imaging device, a perspective-transformation matrix H_i needs to be determined for each plane L_i falling within the FOV thereof. The computer vision processing block 146 uses a calibration process to determine a perspective-transformation matrix for each plane in the FOV of each imaging device. In particular, for a plane L_i, 1 ≤ i ≤ n, falling within the FOV of an imaging device, the computer vision processing block 146 first selects a set of four (4) or more points on plane L_i with known 3D physical-world coordinates, such as corners of a floor tile or corners of doors and/or window openings, of which no three points are on the same line, and sets their z-values to zero. The computer vision processing block 146 also identifies the set of known points in the background image and determines their 2D coordinates therein. The computer vision processing block 146 then uses a suitable optimization method, such as a singular value decomposition (SVD) method, to determine a perspective-transformation matrix H_i for plane L_i in the FOV of the imaging device. After determining the perspective-transformation matrix H_i, a point on plane L_i can be mapped to a point in an image, or a point in an image can be mapped to a point on plane L_i, by using equation (5).
The calibration process may be executed for an imaging device only once at the setup of the system 100, periodically such as during maintenance, or as needed such as when repairing or replacing the imaging device. The calibration process is also executed after the imaging device is reoriented or zoomed and focused.
During mobile object tracking, the computer vision processing block 146 detects FFCs from each captured image as described above. For each detected FFC, the computer vision processing block 146 determines the coordinates (x_c, y_c) of the BBTP of the FFC in the captured image, and determines the plane, e.g., L_k, that the BBTP of the FFC falls within, with the assumption that the BBTP of the FFC, when mapped to the 3D physical world coordinate system, is on plane L_k, i.e., the z-coordinate of the BBTP equals that of plane L_k. The computer vision processing block 146 then calculates the coordinates (x_{w,k}, y_{w,k}, 0) of the BBTP in a 3D physical world coordinate system with respect to the imaging device and plane L_k (denoted as a "local 3D coordinate system") using the above equation (5), and translates the coordinates of the BBTP into a location (x_{w,k}+Δx, y_{w,k}+Δy, z_k) in the 3D physical world coordinate system of the site (denoted as the "global 3D coordinate system"), wherein Δx and Δy are the differences between the origins of the local 3D coordinate system and the global 3D coordinate system, and z_k is the elevation of plane L_k.
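A sketch of the per-plane calibration and of the BBTP-to-site mapping described above, using cv2.findHomography (which solves the same linear system as the SVD method) in place of an explicit SVD; the function names and the Δx, Δy, z_k handling are illustrative assumptions:

    import cv2
    import numpy as np

    def calibrate_plane_homography(world_xy, image_xy):
        # world_xy: (N, 2) known physical-world (x_w, y_w) coordinates (N >= 4,
        # no three points collinear, z set to zero); image_xy: (N, 2) pixel
        # coordinates of the same points in the background image.
        H, _ = cv2.findHomography(np.float32(world_xy), np.float32(image_xy))
        return H

    def bbtp_to_site(H, xc, yc, dx=0.0, dy=0.0, zk=0.0):
        # Invert the plane homography to map a BBTP (x_c, y_c) back onto the
        # plane, then translate into the global 3D coordinate system of the site.
        w = np.linalg.inv(H) @ np.array([xc, yc, 1.0])
        xw, yw = w[0] / w[2], w[1] / w[2]
        return (xw + dx, yw + dy, zk)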
For example, Fig. 11A is a 3D perspective view of a portion 502 of a site 102 falling within the FOV of an imaging device, and Fig. 11B is a plan view of the portion 502. For ease of illustration, the axes of a local 3D physical world coordinate system with respect to the imaging device are also shown, with Xw and Yw representing the two horizontal axes and Zw representing the vertical axis. As shown, the site portion 502 comprises a horizontal, planar floor 504 having a plurality of tiles 506, and a horizontal, planar landing 508 at a higher elevation than the floor 504.
As shown in Figs. 11C and 11D, the site portion 502 is partitioned into two planes L1 and L2, with plane L2 corresponding to the floor 504 and plane L1 corresponding to the landing 508. Plane L1 has a higher elevation than plane L2.
As shown in Fig. 11E, during calibration of the imaging device, the computer vision processing block 146 uses the corners A1, A2, A3 and A4 of the landing 508, whose physical world coordinates (x_{w1}, y_{w1}, z_{w1}), (x_{w2}, y_{w2}, z_{w1}), (x_{w3}, y_{w3}, z_{w1}) and (x_{w4}, y_{w4}, z_{w1}), respectively, are known, with z_{w1} also being the elevation of plane L1, to determine a perspective-transformation matrix H1 for plane L1 in the imaging device. Fig. 11F shows a background image 510 captured by the imaging device.
As described above, the computer vision processing block 146 sets z_{w1} to zero, i.e., sets the physical world coordinates of the corners A1, A2, A3 and A4 to (x_{w1}, y_{w1}, 0), (x_{w2}, y_{w2}, 0), (x_{w3}, y_{w3}, 0) and (x_{w4}, y_{w4}, 0), respectively, determines their image coordinates (x_{c1}, y_{c1}), (x_{c2}, y_{c2}), (x_{c3}, y_{c3}) and (x_{c4}, y_{c4}), respectively, in the background image 510, and then determines a perspective-transformation matrix H1 for plane L1 in the imaging device by using these physical world coordinates (x_{w1}, y_{w1}, 0), (x_{w2}, y_{w2}, 0), (x_{w3}, y_{w3}, 0) and (x_{w4}, y_{w4}, 0) and the corresponding image coordinates (x_{c1}, y_{c1}), (x_{c2}, y_{c2}), (x_{c3}, y_{c3}) and (x_{c4}, y_{c4}).
As also shown in Figs. 11E and 11F, the computer vision processing block 146 uses the four corners Q1, Q2, Q3 and Q4 of a tile 506A to determine a perspective-transformation matrix H2 for plane L2 in the imaging device in a similar manner.
After determining the perspective-transformation matrices H1 and H2, the computer vision processing block 146 starts to track mobile objects in the site 102. As shown in Fig. 12A, the imaging device captures an image 512, and the computer vision processing block 146 identifies therein an FFC 514 with a bounding box 516, a centroid 518 and a BBTP 520. The computer vision processing block 146 determines that the BBTP 520 is within the plane L2, and then uses equation (5) with the perspective-transformation matrix H2 and the coordinates of the BBTP 520 in the captured image 512 to calculate the x- and y-coordinates of the BBTP 520 in the 3D physical coordinate system of the site portion 502 (Fig. 12B). As shown in Fig. 12C, the computer vision processing block 146 may further translate the calculated x- and y-coordinates of the BBTP 520 to a pair of x- and y-coordinates of the BBTP 520 in the site 102.
VI. FFC Tracking
The network arbitrator component 148 updates the FFC-tag association and the computer vision processing block 146 tracks an identified mobile object at step 236 of Fig. 5B. Various mobile object tracking methods are readily available in different embodiments.
For example, in one embodiment, each FFC in the captured image stream is analyzed to determine FFC characteristics, e.g., the motion of the FFC. If the FFC cannot be associated with a tag device without the assistance of tag measurements, the network arbitrator component 148 requests candidate tag devices to obtain the required tag measurements over a predefined period of time. While the candidate tag devices are obtaining tag measurements, the imaging devices continue to capture images and the FFCs therein are further analyzed. The network arbitrator component 148 then calculates the correlation between the determined FFC characteristics and the tag measurements received from each candidate tag device. The FFC is then associated with the tag device whose tag measurements exhibit the highest correlation with the determined FFC characteristics.
For example, a human object in the FOV of the imaging device walks for a distance along the x-axis of the 2D image coordinate system, pauses, and then turns around and walks back, retracing his path. The person repeats this walking pattern four times. The imaging device captures the person's walking.
Fig. 13 shows a plot of the BBTP x-axis position in the captured images. The vertical axis represents the BBTP's x-axis position (in pixels) in the captured images, and the horizontal axis represents the image frame index. It can be expected that, if the accelerometer in the person's tag device records the acceleration measurement during the person's walking, the magnitude of the acceleration will be high when the person is walking and small when the person is stationary. Correlating the acceleration measurement with the FFC observation made from the captured images thus allows the system 100 to establish the FFC-tag association with high reliability.
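A minimal sketch of such a correlation test, assuming the BBTP positions and the tag's acceleration magnitudes have already been aligned to a common sample rate (an assumption made for illustration):

    import numpy as np

    def association_score(bbtp_x_trace, acc_magnitude_trace):
        # bbtp_x_trace: BBTP x-position (pixels) per frame for one FFC;
        # acc_magnitude_trace: acceleration magnitude per frame-aligned sample
        # from one candidate tag device (same length as bbtp_x_trace).
        ffc_activity = np.abs(np.diff(np.asarray(bbtp_x_trace, dtype=float)))
        tag_activity = np.abs(np.diff(np.asarray(acc_magnitude_trace, dtype=float)))
        if ffc_activity.std() == 0 or tag_activity.std() == 0:
            return 0.0
        return float(np.corrcoef(ffc_activity, tag_activity)[0, 1])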
Mapping an FFC from the 2D image coordinate system into the 3D physical world coordinate system may be sensitive to noise and errors in the analysis of captured images and in FFC detection. For example, mapping the BBTP and/or the centroid of an FFC to the 3D physical world coordinate system of the site may be sensitive to errors such as errors in determining the BBTP and centroid due to poor processing of shadows; mobile objects may occlude each other; and specular lighting results in shadow distortions that may cause more errors in BBTP and centroid determination. Such errors may make the perspective mapping from a captured image to the 3D physical world coordinate system of the site noisy, and even unreliable in some situations.
Other mobile object tracking methods using imaging devices exploit the fact that the motions of mobile objects are generally smooth across a set of consecutively captured images, to improve the tracking accuracy.
With the recognition that perspective mapping may introduce errors, in one embodiment no perspective mapping is conducted and the computer vision processing block 146 tracks FFCs in the 2D image coordinate system. The advantage of this embodiment is that the complexity and ambiguities of the 2D to 3D perspective mapping are avoided. However, the disadvantage is that object morphing as the object moves in the camera FOV may give rise to errors in object tracking. Modelling object morphing may alleviate the errors caused therefrom, but it requires additional random variables for unknown parameters in the modelling of object morphing, or additional variables as ancillary state variables, increasing the system complexity.
In another embodiment, the computer vision processing block 146 uses an extended Kalman filter (EKF) to track mobile objects using the FFCs detected in the captured image streams. When ambiguity occurs, the computer vision processing block 146 requests candidate tag devices to provide tag measurements to resolve the ambiguity. In this embodiment, the random state variables of the EKF are the x- and y-coordinates of the mobile object in the 3D physical world coordinate system, following a suitable random motion model such as a random walk model if the mobile object is in a relatively open area, or a more deterministic motion model with random deviation around a nominal velocity if the mobile object is in a relatively directional area, e.g., a hallway.
Following the EKF theory, observations are made at discrete time steps, each time step corresponding to a captured image. Each observation is the BBTP of the corresponding FFC in a captured image. In other words, the x- and y-coordinates of the mobile object in the 3D physical world coordinate system are mapped to the 2D image coordinate system, and then compared with the BBTP using the EKF for predicting the motion of the mobile object.
Mathematically, the random state variables, collectively denoted as a state vector, for the nth captured image of a set of consecutively captured images are:

    s_n = [x_{w,n}, y_{w,n}]^T,    (6)

where [·] represents a matrix and [·]^T represents the matrix transpose. The BBTP of the corresponding FFC is thus the observation of s_n in the captured images.
In the embodiment where the motion of the mobile object is modelled as a random walk, the movement of each mobile object is modelled as an independent first order Markov process with a state vector s_n. Each captured image corresponds to an iteration of the EKF, wherein a white Gaussian noise is added to each component x_{w,n}, y_{w,n} of s_n. The state vector s_n is then modelled based on a linear Markov Gaussian model as:

    s_n = A s_{n-1} + B u_n,    (7)

with u_n being a Gaussian vector with the update covariance of

    Q_n = E[u_n u_n^T] = [ σ_u^2  0 ;  0  σ_u^2 ].    (8)

In other words, the linear Markov Gaussian model may be written as:

    x_{w,n} = x_{w,n-1} + u_{x,n},
    y_{w,n} = y_{w,n-1} + u_{y,n},    (9)

where

    [u_{x,n}, u_{y,n}]^T ~ N( [0, 0]^T, [ σ_u^2  0 ;  0  σ_u^2 ] ),    (10)

i.e., each of u_{x,n} and u_{y,n} is a zero-mean normal distribution with a standard deviation of σ_u.
Equation (7) or (9) gives the state transition function. The values of the matrices A and B in Equation (7) depend on the system design parameters and the characteristics of the site 102. In this embodiment,

    A = B = [ 1  0 ;  0  1 ].    (11)

The state vector s_n is mapped to a position vector [x_{c,n}, y_{c,n}]^T in the 2D image coordinate system of the captured image using perspective mapping (equations (1) to (3)), i.e.,

    [f_{x,n}, f_{y,n}, f_{v,n}]^T = H [x_{w,n}, y_{w,n}, 1]^T,    (12)

    x_{c,n} = f_{x,n} / f_{v,n},    (13)

    y_{c,n} = f_{y,n} / f_{v,n}.    (14)

Then, the observation, i.e., the position of the BBTP in the 2D image coordinate system, can be modelled as:

    z_n = h(s_n) + w_n,    (15)

where z_n = [z_1, z_2]^T is the coordinates of the BBTP, with z_1 and z_2 representing the x- and y-coordinates thereof,

    h(s_n) = [h_x(s_n), h_y(s_n)]^T = [x_{c,n}, y_{c,n}]^T = [f_{x,n}/f_{v,n}, f_{y,n}/f_{v,n}]^T    (16)

is a nonlinear perspective mapping function, which may be approximated using a first order Taylor series thereof, and

    w_n = [w_{x,n}, w_{y,n}]^T ~ N( [0, 0]^T, [ σ_z^2  0 ;  0  σ_z^2 ] ),    (17)

i.e., each of the x-component w_{x,n} and the y-component w_{y,n} of the noise vector w_n is a zero-mean normal distribution with a standard deviation of σ_z.
The EKF can then be started with the state transition function (7) and the observation function (15). Fig. 14 is a flowchart 700 showing the steps of mobile object tracking using the EKF.
At step 702, to start the EKF, the initial state vector s(0|0) and the corresponding posteriori state covariance matrix M(0|0) are determined. The initial state vector corresponds to the location of a mobile object before the imaging device captures any image. In this embodiment, if the location of a mobile object is unknown, its initial state vector is set to be at the center of the FOV of the imaging device with a zero velocity, and the corresponding posteriori state covariance matrix M(0|0) is set to a diagonal matrix with large values, which will force the EKF to disregard the initial information and base the first iteration entirely on the FFCs detected in the first captured image. On the other hand, if the location of a mobile object is known, e.g., via an RFID device at an entrance as described above, the initial state vector s(0|0) is set to the known location, and the corresponding posteriori state covariance matrix M(0|0) is set to a zero matrix (a matrix with all elements being zero).
At step 704, a prediction of the state vector is made:

    s(n|n-1) = s(n-1|n-1).    (18)

At step 706, the prediction state covariance is determined:

    M(n|n-1) = M(n-1|n-1) + Q_u,    (19)

where

    Q_u = [ σ_u^2  0 ;  0  σ_u^2 ].    (20)

At step 708, the Kalman gain is determined:

    K(n) = M(n|n-1) H(n)^T ( H(n) M(n|n-1) H(n)^T + Q_w )^{-1},    (21)

where H(n) is the Jacobian matrix of h(s(n|n-1)):

    H(n) = [ ∂h_x(s(n|n-1))/∂x_{w,n}   ∂h_x(s(n|n-1))/∂y_{w,n} ;  ∂h_y(s(n|n-1))/∂x_{w,n}   ∂h_y(s(n|n-1))/∂y_{w,n} ].    (22)

At step 710, prediction correction is conducted. The prediction error is determined based on the difference between the predicted location and the BBTP location in the captured image:

    ẑ_n = z_n - h(s(n|n-1)).    (23)

Then, the updated state estimate is given as:

    s(n|n) = s(n|n-1) + K(n) ẑ_n.    (24)

At step 712, the posterior state covariance is calculated as:

    M(n|n) = ( I - K(n) H(n) ) M(n|n-1),    (25)

with I representing an identity matrix.
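A compact sketch of one EKF iteration implementing equations (18) to (25), with a numerical Jacobian used in place of the first order Taylor expansion and illustrative values assumed for σ_u and σ_z:

    import numpy as np

    def ekf_step(s, M, z, H_plane, sigma_u=0.3, sigma_z=3.0):
        # s: previous estimate [x_w, y_w]; M: its covariance; z: observed BBTP
        # pixel coordinates [x_c, y_c]; H_plane: 3x3 perspective-transformation
        # matrix of the plane the object moves on.
        Qu = (sigma_u ** 2) * np.eye(2)
        Qw = (sigma_z ** 2) * np.eye(2)

        def h(state):                   # perspective mapping, equations (12)-(14)
            f = H_plane @ np.array([state[0], state[1], 1.0])
            return np.array([f[0] / f[2], f[1] / f[2]])

        def jacobian(state, eps=1e-4):  # numerical Jacobian of h, equation (22)
            J = np.zeros((2, 2))
            for k in range(2):
                d = np.zeros(2)
                d[k] = eps
                J[:, k] = (h(state + d) - h(state - d)) / (2.0 * eps)
            return J

        s_pred = s.copy()                                            # (18)
        M_pred = M + Qu                                              # (19), (20)
        Hn = jacobian(s_pred)
        K = M_pred @ Hn.T @ np.linalg.inv(Hn @ M_pred @ Hn.T + Qw)   # (21), (22)
        innovation = z - h(s_pred)                                   # (23)
        s_new = s_pred + K @ innovation                              # (24)
        M_new = (np.eye(2) - K @ Hn) @ M_pred                        # (25)
        return s_new, M_new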
An issue of using the random walk model is that mobile object tracking may fail when the object is occluded. For example, if a mobile object being tracked is occluded in the FOV of the imaging device, the EKF would receive no new observations from subsequent images. The EKF tracking would then stop at the last predicted state, which is the state determined in the previous iteration, and the Kalman gain will go instantly to zero (0). The tracking thus stops.
This issue can be alleviated by choosing a different 2D model, with the pose being a random walk model and the velocity magnitude (i.e., the speed) used as an independent state variable. The speed will also be a random walk but with a tendency towards zero (0), i.e., if no observations are made related to speed then it will exponentially decay towards zero (0).
Now consider the EKF update when the object is suddenly occluded such that there are no new measurements. In this case the speed state will slowly decay towards zero with a settable decay parameter. When the object emerges from the occlusion, it would, with high probability, not be too far from the EKF tracking point, such that, with the restored measurement quality, accurate tracking can resume. The velocity decay factor used in this model is heuristically set based on the nature of the moving objects in the FOV. For example, if the mobile objects being tracked are travelers moving in an airport gate area, the change in velocity of bored travelers milling around killing time will be higher and less predictable than that of people walking purposively down a long corridor. As each imaging device is facing an area with known characteristics, the model parameters can be customized and refined according to the known characteristics of the area and past experience.
Those skilled in the art appreciate that the above EKF tracking is merely one example of implementing FFC tracking, and other tracking methods are readily available. Moreover, as FFC tracking is conducted in the computer cloud 108, the computational cost is generally of less concern, and other advanced tracking methods, such as Bayesian filters, can be used. If the initial location of a mobile object is accurately known, then a Gaussian kernel may be used. However, if a mobile object is likely in the FOV but its initial location is unknown, a particle filter (PF) may be used, and once the object becomes more accurately tracked, the PF can be switched to an EKF for reducing computational complexity. When multiple mobile objects are continuously tracked, computational resources can be better allocated by dynamically switching object tracking between PF and EKF, i.e., using the EKF to track the mobile objects that have been tracked with higher accuracy, and using the PF to track the mobile objects not yet being tracked, or being tracked but with low accuracy.
A limitation of the EKF as established earlier is that the site map is not easily accounted for. Neither are the inferences, which are only very roughly approximated as Gaussian as required by the EKF.
In an alternative embodiment, non-parametric Bayesian processing is used for FFC tracking by leveraging the knowledge of the site.
In this embodiment, the location of a mobile object in a room 742 is represented by a two dimensional probability density function (PDF) p_{x,y}. If the area in the FOV of an imaging device is finite with plausible boundaries, the area is discretized into a grid, and each grid point is considered to be a possible location for mobile objects. The frame rates of the imaging devices are sufficiently high such that, from one captured image to the next, a mobile object appearing therein would either stay at the same grid point or move from a grid point to an adjacent grid point.
Fig. 15A shows an example of two imaging devices CA and CB with overlapping FOVs covering an L-shaped room 742. As shown, the room 742 is connected to rooms 744 and 746 via doors 748 and 750, respectively. Rooms 744 and 746 are not covered by imaging devices CA and CB. Moreover, there exist areas 752 uncovered by both CA and CB. An access point (AP) is installed in the room 742 for sensing tag devices using RSS measurement.
When a mobile object having a tag device enters room 742, the RSS measurement indicates that a tag device/mobile object is in the room. However, before processing any captured images, the location of the mobile object is unknown.
As shown in Fig. 15B, the area of the room 742 is discretized into a grid having a plurality of grid points 762, each representing a possible location for mobile objects. In this embodiment, the distance between two adjacent grid points 762 along the x- or y-axis is a constant. In other words, each grid point may be expressed as (iΔx, jΔy), with Δx and Δy being constants and i and j being integers. Δx and Δy are design parameters that depend on the application and implementation.
The computer vision processing block 146 also builds a state diagram of the grid points describing the transition of a mobile object from one grid point to another. The state diagram of the grid points is generally a connected graph whose properties change with observations made from the imaging device and the tag device. A state diagram for room 742 would be too complicated to show herein. For ease of illustration, Fig. 16A shows an imaginary, one-dimensional room partitioned into 6 grid points, and Fig. 16B shows the state diagram for the imaginary room of Fig. 16A. In this example, the walls are considered reflective, i.e., a mobile object in grid point 1 can only choose to stay therein or move to grid point 2, and a mobile object in grid point 6 can only choose to stay therein or move to grid point 5.
Referring back to Figs. 15A and 15B, as the room 742 is discretized into a plurality of grid points 762, the computer vision processing block 146 associates with each grid point a belief probability representing the possibility that the mobile object to be tracked is at that point. The computer vision processing block 146 then considers that the motion of mobile objects follows a first order Markov model, and uses a Minimum Mean Square Error (MMSE) location estimate method to track the mobile object.
Let p^t_{i,j} denote the location probability density function (pdf) or probability mass function (pmf) that the mobile object is at the location (iΔx, jΔy) at the time step t. Initially, if the location of the mobile object is unknown, the location pdf p^0_{i,j} is set to be uniform over all grid points, i.e.,

p^0_{i,j} = 1/(XY), for i = 1, ..., X, and j = 1, ..., Y,    (26)

where X is the number of grid points along the x-axis and Y is the number of grid points along the y-axis.
Based on the Markov model, p^t_{i,j} is only dependent on the previous probability p^{t-1}_{i,j}, the current update and the current BBTP position z^t; p^t_{i,j} may be computed using a numerical procedure. The minimum variance estimate of the mobile object location is then based on the mean of this pdf.
From one time step to the next, the mobile object may stay at the same grid point or move to one of the adjacent grid points, each of which is associated with a transition probability. Therefore, the expected (i.e., not yet compared with any observations) transition of the mobile object from time step t to time step t+1, or equivalently, from time step t-1 to time step t, may be described by a transition matrix consisting of these transition probabilities:

p^t_E = T p^{t-1},    (27)

where p^t_E is a vector consisting of the expected location pdfs at time step t, p^{t-1} is a vector consisting of the location pdfs p^{t-1}_{i,j} at time step t-1, and T is the state transition matrix.
Matrix T describes the probabilities that the mobile object transits from one grid point to another. Matrix T also describes boundary conditions, including reflecting boundaries and absorbing boundaries. A reflecting boundary, such as a wall, means that a mobile object has to turn back when approaching the boundary. An absorbing boundary, such as a door, means that a mobile object can pass therethrough, and the probability of being in the area diminishes accordingly.
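For illustration only, a one-dimensional transition matrix of the kind shown in Fig. 16B, with a reflecting wall on the left and an optional absorbing door on the right, may be sketched in Python/NumPy as follows; the stay/move probabilities are hypothetical design values:

    import numpy as np

    def transition_matrix_1d(n=6, p_stay=0.4, absorbing_right=False):
        """Column-stochastic T for an n-point 1D grid: T[j, i] = P(move to j | at i)."""
        T = np.zeros((n, n))
        p_move = 1.0 - p_stay
        for i in range(n):
            T[i, i] = p_stay
            if i == 0:                          # reflecting wall: all motion bounces right
                T[i + 1, i] += p_move
            elif i == n - 1:                    # right end of the grid
                T[i - 1, i] += p_move / 2.0     # half of the motion goes back inside
                if not absorbing_right:
                    T[i, i] += p_move / 2.0     # reflecting wall: remaining mass stays
                # with an absorbing door, the remaining mass leaves the grid, so this
                # column sums to less than one and the in-area probability diminishes
            else:
                T[i - 1, i] += p_move / 2.0
                T[i + 1, i] += p_move / 2.0
        return T

    T = transition_matrix_1d(absorbing_right=True)
    p_expected = T @ np.full(6, 1.0 / 6.0)      # one prediction step, equation (27)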
When an image of the area 742 is captured and a BBTP is determined therein, the location of the BBTP is mapped via perspective mapping to the 3D physical world coordinate system of the area 742 as an observation. Such an observation may be inaccurate, and its pdf, denoted as p^t_{BBTP,i,j}, may be modelled as a 2D Gaussian distribution.
Therefore, the location pdfs p^t_{i,j}, or the matrix p^t thereof, at time step t may be updated from that at time step t-1 and the BBTP observation as:

p^t = η P^t_BBTP T p^{t-1},    (28)

where P^t_BBTP is a vector consisting of p^t_{BBTP,i,j} at time step t, and η is a scalar ensuring that the updated location pdf p^t_{i,j} sums to one (1).
Equation (28) calculates the posterior location probability pdf p^t based on the BBTP data obtained from the imaging device. The peak or maximum of the updated pdf p^t_{i,j}, or p^t in matrix form, indicates the most likely location of the mobile object. In other words, if the maximum of the updated pdf p^t_{i,j} is at i = i_k and j = j_k, the mobile object is most likely at the grid point (i_kΔx, j_kΔy). With more images being captured, the mobile object location pdf p^t_{i,j} is further updated using equation (28) to obtain updated estimates of the mobile object location.

With this method, if the BBTP is of high certainty then the posterior location probability pdf p^t quickly becomes a delta function, giving rise to high certainty of the location of the mobile object.
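A minimal Python/NumPy sketch of the above grid-based tracking, for illustration only, is given below; the grid size, spacing and the Gaussian spread of the BBTP observation are hypothetical values, and T is a state transition matrix built as described above:

    import numpy as np

    X, Y, dx, dy = 30, 20, 0.5, 0.5   # hypothetical grid dimensions and spacing (m)
    sigma = 0.8                        # hypothetical std. dev. of the BBTP observation (m)

    p = np.full((X, Y), 1.0 / (X * Y))        # equation (26): uniform initial pdf
    xs = np.arange(X)[:, None] * dx           # grid point x-coordinates, shape (X, 1)
    ys = np.arange(Y)[None, :] * dy           # grid point y-coordinates, shape (1, Y)

    def bbtp_likelihood(z):
        """2D Gaussian observation pdf p_BBTP centred at the mapped BBTP position z."""
        return np.exp(-((xs - z[0]) ** 2 + (ys - z[1]) ** 2) / (2.0 * sigma ** 2))

    def step(p, T, z):
        """One tracking step: prediction, equation (27), then Bayesian update, equation (28)."""
        p_pred = (T @ p.ravel()).reshape(X, Y)   # p_E^t = T p^{t-1}
        p_post = bbtp_likelihood(z) * p_pred     # elementwise product with the observation pdf
        return p_post / p_post.sum()             # eta normalizes the pdf to sum to one

    def estimates(p):
        """MMSE (mean) estimate and most likely grid point from the current pdf."""
        mmse = (float((p * xs).sum()), float((p * ys).sum()))
        ik, jk = np.unravel_index(int(np.argmax(p)), p.shape)
        return mmse, (ik * dx, jk * dy)

    p = step(p, np.eye(X * Y), (5.0, 3.0))   # e.g., a stationary-object T (identity)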
For example, if a mobile object at (iΔx, jΔy) is static from time step t = 1 to time step t = k, then equation (28) becomes

p^k = η ( ∏_{t=1}^{k} P^t_BBTP ) p^0,    (29)

which becomes a "narrow spike" with the peak at (i, j) after several iterations, and the variance of the MMSE estimate of the object location diminishes.
Figs. 17A and 17B show a deterministic example, where a mobile object is moving to the right hand side along the x-axis in the FOV of an imaging device. Fig. 17A is the state transition diagram, showing that the mobile object is moving to the right with a probability of one (1). The computer vision processing block 146 tests the first assumption, that the mobile object is stationary, and the second assumption, that the mobile object is moving, by using a set of consecutively captured image frames and equation (28). The test results are shown in Fig. 17B. As can be seen, while in the first several image frames or iterations both assumptions show similar likelihoods, the likelihood of the stationary-object assumption quickly diminishes to zero while that of the moving-object assumption grows much higher. Thus, the computer vision processing block 146 can decide that the object is moving, and may request candidate tag devices to provide IMU measurements for establishing the FFC-tag association.
Figs. 18A to 18E show another example, where a mobile object is slewing, i.e., moving with uncertainty, to the right hand side along the x-axis in the FOV of an imaging device. Fig. 18A is the state transition diagram, showing that, in each transition from one image to another, the mobile object may stay at the same grid point with a probability of q, and may move to the adjacent grid point on the right hand side with a probability of (1-q). Hence the average slew velocity is:

v_avg = (1 - q) Δx/Δt.    (30)
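As a purely illustrative numerical example, with hypothetical values q = 0.2, Δx = 0.25 m and a frame interval Δt = 0.5 s, equation (30) gives v_avg = (1 - 0.2) × 0.25/0.5 = 0.4 m/s.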
Figs. 18B and 18C show the tracking results using equation (28) with q = 0.2. Fig. 18B shows the mean of the x- and y-coordinates of the mobile object, which accurately tracked the movement of the mobile object. Fig. 18C shows the standard deviation (STD) of the x- and y-coordinates of the mobile object, denoted as STDx and STDy. As can be seen, both STDx and STDy start with a high value (because the initial location PDF is uniformly distributed). STDy quickly reduces to about zero (0) because, in this example, no uncertainty exists along the y-axis during mobile object tracking. STDx quickly reduces from a large initial value to a steady state with a low but non-zero value due to the non-zero probability q.
Other grid based tracking methods are also readily available. For example, instead of using a Gaussian model for the BBTP, a different model, designed with consideration of the characteristics of the site, such as its geometry, lighting and the like, and the FOV of the imaging device, may be used to provide accurate mobile object tracking.
In the above embodiment, the position (x, y) of the mobile object is used as the state variables. In an alternative embodiment, the position (x, y) and the velocity (vx, vy) of the mobile object are used as the state variables. In yet another embodiment, speed and pose may be used as state variables.
In the above embodiments, the state transition matrix T is determined without assistance of any tag devices. In an alternative embodiment, the network arbitrator component 148 requests tag devices to provide the necessary tag measurements for assistance in determining the state transition matrix T. Fig. 19 is a schematic diagram showing the data flow for determining the state transition matrix T. The computer vision processing block uses computer vision technology to process (block 802) images captured from the imaging devices 104, and tracks (block 804) FFCs using the above described BBTP based tracking. The BBTPs are sent to the network arbitrator component 148, and the network arbitrator component 148 accordingly requests tag arbitrator components 146 to provide the necessary tag measurements. A state transition matrix T is then generated based on the obtained tag measurements, and is sent to the computer vision processing block 146 for mobile object tracking.
The above described mobile object tracking using a first order Markov model and grid discretization is robust and computationally efficient. Ambiguity caused by object merging/occlusion may be resolved using a prediction-observation method (described later). Latency in mobile object tracking (e.g., due to the computational load) is relatively small (e.g., several seconds), and is generally acceptable.
The computer vision processing structure 146 provides information regarding the FFCs observed and extracts attributes thereof, including observables such as the bounding box around the FFC, color histogram, intensity, variations from one image frame to another, feature points within the FFC, associations of adjacent FFCs that are in a cluster and hence are part of the same mobile object, optical flow of the FFC and velocities of the feature points, undulations of the overall bounding box, and the like. The observables of the FFCs are stored for facilitating, if needed, the comparison with tag measurements.
For example, the computer vision processing structure 146 can provide a measurement of activity of the bounding box of an FFC, which is used to compare with a similar activity measurement obtained from the tag device 114. After normalization, a comparison is made, resulting in a numerical value for the likelihood indicating whether the activity observed by the computer vision processing structure 146 and that observed by the tag device 114 are the same. Generally, a Gaussian weighting is applied based on parameters that are determined experimentally. As another example, the position of the mobile object corresponding to an FFC in the site, as determined via the perspective mapping or transformation from the captured image, and the MMSE estimate of the mobile object position can be correlated with observables obtained from the tag device 114. For instance, the velocity observed from the change in the position of a person indicates walking, and the tag device reveals a gesture of walking based on IMU outputs. However, such a gesture may be weak if the tag device is attached to the mobile object in such a manner that the gait is weakly detected, or may be strong if the tag device is located on the foot of the person. Fuzzy membership functions can be devised to represent the gesture. This fuzzy output can be compared to the computer vision analysis result to determine the degree of agreement or correlation of the walking activity. In some embodiments, methods based on fuzzy logic may be used for assisting mobile object tracking.
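The normalized, Gaussian-weighted comparison mentioned above may be sketched, for illustration only, in Python as follows; the activity measures and the experimentally determined spread are hypothetical:

    import numpy as np

    def activity_likelihood(cv_activity, tag_activity, sigma=0.15):
        """Gaussian-weighted likelihood that the bounding-box activity observed by the
        camera and the activity measured by the tag device belong to the same object.
        Both inputs are assumed normalized to [0, 1]; sigma is set experimentally."""
        return float(np.exp(-((cv_activity - tag_activity) ** 2) / (2.0 * sigma ** 2)))

    print(activity_likelihood(0.62, 0.58))   # strong agreement: value close to 1
    print(activity_likelihood(0.62, 0.10))   # poor agreement: value close to 0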
In another example, the computer vision processing structure 146 determines that the bounding box of an FFC has become stationary and then shrunk to half its size. The barometer of a tag device reveals a step change in short term averaged air pressure commensurate with an altitude change of about two feet. Hence the tag measurement from the tag device's barometer would register a sit-down gesture of the mobile object. However, due to noise and barometer drift, as well as spurious changes in room air pressure, the gesture is probabilistic. The system thus correlates the tag measurement and the computer vision analysis result, and calculates a probability representing the degree of certainty that the tag measurement and the computer vision analysis result match regarding the sitting activity.
With the above examples, those skilled in the art appreciate that the system determines a degree of certainty of a gesture or activity based on the correlation between the computer vision (i.e., analysis of captured images) and the tag device (i.e., tag measurements). The set of such correlative activities or gestures is then combined and weighted for calculating the certainty, represented by a probability number, that the FFC may be associated with the tag device.
Object merging and occlusion
Occlusion may occur between mobile objects, and between a mobile object and a background object. Closely positioned mobile objects may be detected as a single FFC.

Figs. 20A to 20E show an example of merging/occlusion of two mobile objects 844 and 854. As shown in Fig. 20A, the two mobile objects 844 and 854 are sufficiently far apart and they show in a captured image 842A as separate FFCs 844 and 854, having their own bounding boxes 846 and 856 and BBTPs 848 and 858, respectively.
As shown in Figs. 20B to 20D, when the mobile objects 844 and 854 are moving close to each other, they are detected as a single FFC 864 with a bounding box 866 and a BBTP 868. The size of the single FFC 864 may vary depending on the occlusion between the two mobile objects and/or the distance therebetween. Ambiguity may occur as it may appear that the two previously detected mobile objects 844 and 854 disappear with a new mobile object 864 appearing at the same location.

As shown in Fig. 20E, when the two mobile objects have moved apart with sufficient distance, two FFCs are again detected. Ambiguity may occur as it may appear that the previously detected mobile object 864 disappears with two new mobile objects 844 and 854 appearing at the same location.
Figs. 21A to 21E show an example where a mobile object is occluded by a background object.

Fig. 21A shows a background image 902A having a tree 904 therein as a background object.

A mobile object 906A is moving towards the background object 904, and passes the background object 904 from behind. As shown in Fig. 21B, in the captured image 902B, the mobile object 906A is not yet occluded by the background object 904, and the entire image of the mobile object is detected as an FFC 906A with a bounding box 908A and a BBTP 910A. In Fig. 21C, the mobile object 906A is slightly occluded by the background object 904, and the FFC 906A, bounding box 908A and BBTP 910A are essentially the same as those detected in the image 902B (except for the position difference).
In Fig. 21D, the mobile object is significantly occluded by the background object 904. The detected FFC 906B is now significantly smaller than the FFC 906A in images 902B and 902C. Moreover, the BBTP 910B is at a much higher position than BBTP 910A in images 902B and 902C. Ambiguity may occur as it may appear that the previously detected mobile object 906A disappears and a new mobile object 906B appears at the same location.

As shown in Fig. 21E, when the mobile object 906A walks out of the occlusion of the background object 904, a "full" FFC 906A much larger than FFC 906B is detected. Ambiguity may occur as it may appear that the previously detected mobile object 906B disappears and a new mobile object 906A appears at the same location.
As described before, the frame rate of the imaging device is sufficiently high, and the mobile object movement is therefore reasonably smooth. Then, ambiguity caused by object merging/occlusion can be resolved by a prediction-observation method, i.e., predicting the action of the mobile object and comparing the prediction with observations obtained from captured images and/or tag devices.
For example, the mobile object velocity and/or trajectory may be used as random state variables, and the above described tracking methods may be used for prediction. For example, the system may predict the locations and time instants at which a mobile object may appear during a selected period of future time, and monitor the FFCs during the selected period of time. If the FFCs appear to largely match the prediction, e.g., the observed velocity and/or trajectory are highly correlated with the prediction (e.g., their correlation is higher than a predefined or dynamically set threshold), then the FFCs are associated with the same tag device even if, in some moments/images, an abnormality of the FFC occurred, such as the size of the FFC significantly changing, the BBTP significantly moving off the trajectory, the FFC disappearing or appearing, and the like.

If the ambiguity cannot be resolved solely from captured images, tag measurements may be requested to obtain further observations to resolve the ambiguity.
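One possible realization of the correlation test described above is sketched in Python/NumPy below, for illustration only; the normalized displacement correlation and the threshold value are hypothetical choices rather than a prescribed formulation:

    import numpy as np

    def trajectories_match(predicted, observed, threshold=0.9):
        """Compare a predicted trajectory with the observed BBTP trajectory.
        Both are arrays of shape (N, 2) sampled at the same time instants; the
        normalized correlation of the per-step displacements is compared against
        a predefined (or dynamically set) threshold."""
        dp = np.diff(np.asarray(predicted, dtype=float), axis=0).ravel()
        do = np.diff(np.asarray(observed, dtype=float), axis=0).ravel()
        denom = np.linalg.norm(dp) * np.linalg.norm(do)
        if denom == 0.0:
            return False
        return float(np.dot(dp, do) / denom) >= threshold

    # If the match holds over the selected period, the FFC keeps its tag device
    # association even if its size changes abruptly or the BBTP momentarily
    # moves off the predicted trajectory.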
VII. Some alternative embodiments

In an alternative embodiment, the system 100 also comprises a map of magnetometer abnormalities (magnetometer abnormality map). The system may request tag devices having magnetometers to provide magnetic measurements and compare them with the magnetometer abnormality map for resolving ambiguity occurring during mobile object tracking.
In the above embodiments, tag devices 114 comprise sensors for collecting tag measurements, and tag devices 114 transmit the tag measurements to the computer cloud 108. In some alternative embodiments, at least some tag devices 114 may comprise a component broadcasting, continuously or intermittently, a detectable signal. Also, one or more sensors for detecting such a detectable signal are deployed in the site. The one or more sensors detect the detectable signal, obtain measurements of one or more characteristics of the tag device 114, and transmit the obtained measurements to the computer cloud 108 for establishing FFC-tag association and resolving ambiguity. For example, in one embodiment, each tag device 114 may comprise an RFID transmitter transmitting an RFID identity, and one or more RFID readers are deployed in the site 102, e.g., at one or more entrances, for detecting the RFID identity of the tag devices in proximity therewith. As another example, each tag device 114 may broadcast a BLE beacon. One or more BLE access points may be deployed in the site 102, detecting the BLE beacon of a tag device and determining an estimated location using RSS. The estimated location, although inaccurate, may be transmitted to the computer cloud for establishing FFC-tag association and resolving ambiguity.
VIII. Visual Assisted Indoor Location System (VAILS)

In an alternative embodiment, a Visual Assisted Indoor Location System (VAILS) is modified from the above described systems and used for tracking mobile objects in a site that is a complex environment, such as an indoor environment.

VIII-1. VAILS system structure

Similar to the systems described above, the VAILS in this embodiment uses imaging devices, e.g., security cameras, and, if necessary, tag devices for tracking mobile objects in an indoor environment such as a building. Again, the mobile objects are entities moving or stationary in the indoor environment. At least some mobile objects are each associated with a mobile tag device such that the tag device generally undergoes the same activity as the mobile object with which it is associated. Hereinafter, such mobile objects associated with tag devices are sometimes denoted as tagged objects, and objects with no tag devices are sometimes denoted as untagged objects. While untagged objects may exist in the system, both tagged and untagged objects may be jointly tracked for higher reliability.
While sharing many common features with the systems described above, VAILS faces more tracking challenges, such as identifying mobile objects that more often enter and exit the FOV of an imaging device and are more often occluded by background objects (e.g., poles, walls and the like) and/or other mobile objects, causing ambiguity.
In this embodiment, VAILS maintains a map of the site, and generally builds a birds-eye view of the building floor-space by recording the locations of mobile objects onto the map. Conveniently, the system comprises a birds-eye view processing sub-module (as a portion of a camera view processing and birds-eye view processing module, described below) for maintaining the birds-eye view of the site (denoted the "birds-eye view (BV)" hereinafter for ease of description) and for updating the locations of mobile objects therein based on the tracking results. Of course, such a birds-eye view module may be combined with any other suitable module(s) to form a single module having the combined functionalities.
The software and hardware structures of VAILS are similar to those of the above described systems. Fig. 22 shows a portion of the functional structure of VAILS, corresponding to the computer cloud 108 of Fig. 2. As shown, the computer vision processing module of Fig. 2 is replaced with a camera view processing and birds-eye view processing (CV/BV) module 1002, having a camera view processing submodule 1002A and a birds-eye view processing submodule 1002B. The submodules are implemented using suitable programming languages and/or libraries such as the OpenCV open-source computer vision library offered by opencv.org, MATLAB® offered by MathWorks, C++, and the like. Those skilled in the art appreciate that MATLAB® may be used for prototyping and simulation of the system, and C++ and/or OpenCV may be used for implementation in practice. Hereinafter, the term "computer vision processing" is equivalent to the phrase "camera view processing" as the computer vision processing is for processing camera-view images.
In some alternative embodiments, the camera view processing and birds-eye view processing submodules 1002A and 1002B may be two separate modules.
The camera view processing submodule 1002A receives captured image streams (also denoted as camera views hereinafter) from the imaging devices 104, processes the captured image streams as described above, and detects FFCs therefrom. The FFCs may also be denoted as camera view (CV) objects or blobs hereinafter.

The birds-eye view processing sub-module 1002B uses the site map 1004 to establish a birds-eye view of the site and to map each detected blob into the birds-eye view as a BV object. Each BV object thus represents a mobile object in the birds-eye view, and may be associated with a tag device. In other words, blobs are in captured images (i.e., in the camera view) and BV objects are in the birds-eye view of the site.
As shown in Fig. 23, a blob is associated with a tag device via a BV object. Of course, some BV objects may not be associated with any tag devices if their corresponding mobile objects do not have any tag devices associated therewith.
Referring back to Fig. 22, the blob and/or BV object attributes are sent from the CV/BV module 1002 to the network arbitrator 148 for processing and solving any possible ambiguity.
Similar to the description above, the network arbitrator 148 may request tag devices 114 to report observations, and use observations received from tag devices 114 and the site map 1004 to solve ambiguity and associate CV objects with tag devices. The CV object/tag device associations are stored in a CV object/tag device association table 1006. Of course, the network arbitrator 148 may also use the established CV object/tag device associations in the CV object/tag device association table 1006 for solving ambiguity. As will be described in more detail later, the network arbitrator 148 also leverages known initial conditions in establishing or updating CV object/tag device associations.
After processing, the network arbitrator 148 sends necessary data, including state variables, tag device information, and known initial conditions (described later), to the CV/BV module 1002 for updating the birds-eye view.
In this embodiment, the data representing the birds-eye view and the camera views are stored and processed in the same computing device. Such an arrangement avoids frequent data transfer (or, in some implementations, file transfer) between the birds-eye view and the camera views that may otherwise be required. The CV/BV module 1002 and the network arbitrator 148, on the other hand, may be deployed and executed on separate computing devices for improving the system performance and for avoiding a heavy computational load that would otherwise be applied to a single computing device. As the data transfer between the CV/BV module 1002 and the network arbitrator 148 is generally small, deploying the two modules 1002 and 148 on separate computing devices would not lead to a high data transfer requirement. Of course, in embodiments where multi-core or multi-processor computing devices are used, the CV/BV module 1002 and the network arbitrator 148 may be deployed on the same multi-processor computing device but executed as separate threads for improving the system performance.
One important characteristic of an indoor site is that the site is usually divided into a number of subareas, e.g., rooms, hallways, separated by predetermined structural components such as walls. Each subarea has one or more entrances and/or exits.
Fig. 24 is a schematic illustration of an example site 1020, which is divided into a number of rooms 1022, with entrances/exits 1024 connecting the rooms 1022. The site configuration, including the configuration of rooms, entrances/exits, predetermined obstacles and occlusion structures, is known to the system and is recorded in the site map. Each subarea 1022 is equipped with an imaging device 104. The FOV of each imaging device 104 is generally limited to the respective subarea 1022.
A mobile object 112 may walk from one subarea 1022 to another through the entrances/exits 1024, as indicated by the arrow 1026 and trajectory 1028. The cameras 104 in the subareas 1022 capture image streams, which are processed by the CV/BV processing module 1002 and the network arbitrator 148 for detecting the mobile object 112, mapping the detected mobile object 112 into the birds-eye view as a BV object, and determining the trajectory 1028 for tracking the mobile object 112.
When a "new" blob appears in the images captured by an imaging device 104, the system uses initial conditions that are likely related to the new blob to try to associate the new blob in the camera view with a BV object in the birds-eye view and with a mobile object (in the real world). Herein, the initial conditions include data already known by the system prior to the appearance of the new blob. Initial conditions may include data regarding tagged mobile objects, and may also include data regarding untagged objects.
For example, as shown in Fig. 25, a mobile object 112A enters room 1022A from the entrance 1024A and moves along the trajectory 1028 towards the entrance 1024B.

The mobile object 112A, when entering room 1022A, appears as a new blob (also referred to using numeral 112A for ease of description) in the images captured by the imaging device 104A of room 1022A. As the new blob 112A appears at the entrance 1024A, it is likely that the corresponding mobile object originated from the adjacent room 1022B, which shares the entrance 1024A with room 1022A.
As the network arbitrator 148 is tasked with overall process control and tracking the object using the camera view and tag device observations as input, the network arbitrator 148 in this embodiment has tracked the object outside of the FOV of the imaging device 104A (i.e., in room 1022B). Thus, in this example, when the mobile object 112A enters the FOV of the imaging device 104A, the network arbitrator 148 checks if there exists known data, prior to the appearance of the new blob 112A, regarding a BV object in room 1022B disappearing from the entrance 1024A. If the network arbitrator 148 finds such data, the network arbitrator 148 collects the found data as a set of initial conditions and sends them as an IC packet to the CV/BV processing module 1002, or in particular the camera view processing submodule 1002A, and requests the camera view processing submodule 1002A to track the mobile object 112A, which is now shown in the FOV of the imaging device 104A as a new blob 112A in room 1022A.
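The exact content of an IC packet is not detailed here; purely as an illustration, such a record might be represented as follows (Python; all field names and values are hypothetical):

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class InitialConditions:
        """Hypothetical IC packet sent from the network arbitrator to the CV/BV module."""
        source_subarea: str                              # e.g., "room_1022B"
        entrance_id: str                                 # shared port, e.g., "entrance_1024A"
        disappearance_time: float                        # when the BV object left the source subarea
        tagged: bool                                     # whether a tag device is associated
        tag_id: Optional[str] = None                     # tag identifier, if tagged
        last_velocity: Tuple[float, float] = (0.0, 0.0)  # last estimated velocity, floor coordinates
        association_probability: float = 1.0             # confidence carried over from the previous subarea

    ic = InitialConditions("room_1022B", "entrance_1024A", 1718046000.0, True, "tag_0123")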
The CV/BV module 1002, or more particularly the camera view processing submodule 1002A, continuously processes the image streams captured by the imaging device 104A for detecting blobs (in this example, the new blob 112A) and pruning the detected blobs to establish a blob/BV object, or a blob/BV object/tag device, association for the new blob 112A. For example, the blob 112A may exhibit in the camera view of imaging device 104A as a plurality of sub-blobs repeatedly separating and merging (fission and fusion) due to the imperfection of image processing. Such fission and fusion can be simplified by pruning. The knowledge of the initial conditions allows the camera view processing submodule 1002A to further prune and filter the blobs.
The pruned graph of blobs is then recorded in an internal blob track file (IBTF). The data in the IBTF records the history of each blob (denoted as a blob track), which may be used to construct a timeline history diagram such as Fig. 34 (described later), and is searchable by the birds-eye view processing submodule 1002B or the network arbitrator 148. However, the IBTF contains no information that cannot be abstracted directly from the camera-view image frames. In other words, the IBTF does not contain any information from the network arbitrator 148, such as initial conditions, nor any information from the birds-eye view fed back to the camera view. As described above, the camera view processing submodule 1002A processes captured images using background/foreground differentiation, morphological operations and graph based pruning, and detects foreground blobs representing mobile objects such as human objects, robots and the like. The camera view stores all detected and pruned blob tracks in the IBTF. Thus, the camera view processing submodule 1002A, operating autonomously without feedback from the network arbitrator 148, acts as an autonomous sensor, which is an advantage in at least some embodiments. On the other hand, a disadvantage is that the camera view processing submodule 1002A does not benefit from the information of the birds-eye view processing submodule 1002B or the network arbitrator 148.
The network arbitrator 148 tracks the tagged objects in a maximum likelihood sense, based on data from the camera view and the tag sensors. Moreover, the network arbitrator 148 has detailed information of the site stored in the site map of the birds-eye view processing submodule 1002B. In the example of Fig. 25, the network arbitrator 148 puts together the initial conditions of the tagged object 112A entering the FOV of imaging device 104A, and requests the CV/BV processing module 1002 to track the object 112A. That is, the tracking request is sent from the network arbitrator 148 via the initial conditions.

The birds-eye view processing submodule 1002B parses the initial conditions and searches for data of object 112A in the IBTF to start tracking thereof in room 1022A. When the birds-eye view processing submodule 1002B finds a blob or a set of sub-blobs that match the initial conditions, the birds-eye view processing submodule 1002B extracts the blob track data from the IBTF and places the extracted blob track data into an external blob track file (EBTF). An EBTF record is generated for each request from the network arbitrator 148. In the example of Fig. 25, there is only one EBTF record as there is only one unambiguous object entering the FOV of imaging device 104A. However, if the birds-eye view processing submodule 1002B determines ambiguities resulting from other blob tracks, then they can also be extracted into the EBTF.
In this embodiment, the system does not comprise any specific identifier to identify whether a mobile object is a human, a robot or another type of object, although in some alternative embodiments the system may comprise such an identifier for facilitating object tracking.
The birds-eye view processing submodule 1002B processes the request from the network arbitrator 148 to track the blob identified in the initial conditions passed from the network arbitrator 148. The birds-eye view processing submodule 1002B also processes the IBTF with the initial conditions and the EBTF. The birds-eye view processing submodule 1002B computes the perspective transformation of the blob in the EBTF and determines the probability kernel of where the mobile object is. The birds-eye view processing submodule 1002B also applies constraints of the subarea, such as room dimensions, locations of obstructions, walls and the like, and determines the probability of the object 112A exiting the room 1022A coincident with a blob annihilation event in the EBTF. The birds-eye view processing submodule 1002B divides the subarea into a 2D floor grid as described before, and calculates a 2D floor grid probability as a function of time, stored in an object track file (OTF). The OTF is then made available to the network arbitrator 148. The data flow between the imaging device 104A, the camera view processing submodule 1002A, the IBTF 1030, the birds-eye view processing submodule 1002B, the network arbitrator 148, the EBTF 1032 and the OTF 1034 is shown in Fig. 26.
The above described process is an event driven process and is updated in real time. For example, when the network arbitrator 148 requires an update, the birds-eye view processing submodule 1002B assembles a partial EBTF based on the accrued data in the IBTF, and provides an estimate of the location of the mobile object to the network arbitrator 148. The above described processes can track mobile objects with a latency of a fraction of a second.
Referring back to Fig. 25, the camera view processing submodule 1002A detects and processes the blob 112A as the mobile object 112A moves in room 1022A from entrance 1024A to entrance 1024B. The birds-eye view processing submodule 1002B records the mobile object's trajectory 1028 in the birds-eye view.
In the example of Fig. 25, there are no competing blobs in the image frames captured by the imaging device 104A, and the image processing technology used by the system is sufficiently accurate to avoid blob fragmentation; the IBTF thus consists of only a creation event and an annihilation event joined by a single edge that spans one or more image frames (see Fig. 31, described later). Also, as the initial conditions from the network arbitrator 148 are unambiguous regarding the tagged object 112A, the IBTF has a single blob coincident with the initial conditions, meaning no ambiguity. The EBTF is therefore the same as the IBTF.
The birds-eye view processing submodule 1002B converts the blob in the camera view into a BV object in the birds-eye view, and calculates a floor grid probability based on the subarea constraints and the location of the imaging device (hence the computed H matrix, described later). The probability of the BV object location, or in other words of the mobile object location in the site, is updated as described before.
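The perspective mapping through the H matrix may be sketched as follows, purely for illustration (Python/NumPy); the numerical homography values are hypothetical and would in practice follow from the imaging device's placement and calibration:

    import numpy as np

    # Hypothetical 3x3 homography H mapping BBTP pixel coordinates to 2D floor
    # (birds-eye view) coordinates in metres.
    H = np.array([[0.010, 0.000, -3.2],
                  [0.000, 0.012, -1.1],
                  [0.000, 0.001,  1.0]])

    def bbtp_to_floor(u, v):
        """Apply the perspective (homogeneous) mapping to a BBTP at pixel (u, v)."""
        x, y, w = H @ np.array([u, v, 1.0])
        return x / w, y / w

    print(bbtp_to_floor(640.0, 480.0))   # floor coordinates of a bounding-box bottom point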
The OTF comprises a summary description of the trajectory of each object location PDF as a function of time. The OTF is interpreted by the network arbitrator 148 and, in this example, registers no abnormalities or potential ambiguities. The OTF is used for generating the initial conditions for the next adjoining imaging device FOV subarea.
The example of Fig. 25 shows an ideal scenario in which there exist no ambiguities in the initial conditions from the network arbitrator 148, and there exist no ambiguities in the camera view blob tracking. Hence the blob/BV object/tag device association probability remains at 1 throughout the entire period that the mobile object 112A moves from entrance 1024A to entrance 1024B, until the mobile object exits from entrance 1024B.
When the mobile object disappears at entrance 1024B, the system may use the data of the mobile object 112 at the entrance 1024B, or the data of the mobile object 112 in room 1022A, for establishing a blob/BV object/tag device association for a new blob appearing in room 1022C at the entrance 1024B.

As another example, if a new blob appears in a subarea, e.g., a room, but not adjacent to any entrance or exit, the new blob may be a mobile object that has previously been stationary for a long time but is now starting to move. Thus, previous data of mobile objects in that room may be used as initial conditions for the new blob.
VIII-2. Initial conditions

By determining and using initial conditions for a new blob appearing in the FOV of an imaging device, the network arbitrator 148 is then able to resolve ambiguities that may occur in mobile object tracking. Such ambiguities may arise in many situations, and may not be easily solvable without initial conditions.

Using Fig. 25 as an example, when the imaging device 104A captures a moving blob 112A in room 1022A, and the system detects a tag device in room 1022A, it may not be readily possible to associate the blob 112A with the tag device due to possible ambiguities. In fact, there exist several possibilities.
As shown in Fig. 27A, one possibility is that there is indeed only one tagged mobile object 112B in room 1022A, moving from entrance 1024A to the exit 1024B. However, as shown in Fig. 27B, a second possibility is that an untagged mobile object 112C is moving in room 1022A from entrance 1024A to the exit 1024B, but there is also a stationary, tagged mobile object 112B in room 1022A outside the FOV of the imaging device 104A.

The possibility of Fig. 27B may be confirmed by requesting the tag device to provide motion related observations. If the tag device reports no movement, then the detected blob 112A must be an untagged mobile object 112C in the FOV of the imaging device 104A, and there is also a tagged mobile object 112B in the room 1022A, likely outside the FOV of the imaging device 104A.
On the other hand, if the tag device reports movement, then Fig. 27B is untrue. However, the system may still be unable to confirm whether Fig. 27A is true, as there exists another possibility, as shown in Fig. 27C.

As shown in Fig. 27C, there may be an untagged mobile object 112C in room 1022A moving from entrance 1024A to the exit 1024B, and a tagged mobile object 112B outside the FOV of the imaging device 104A and moving.
Referring back to Fig. 25, the ambiguity between Figs. 27A and 27C may be resolved by using the initial conditions likely related to blob 112A that the system has previously determined in the adjacent room 1022B. For example, if the system determines that the initial conditions obtained in room 1022B indicate that, immediately before the appearance of blob 112A, an untagged mobile object disappeared from room 1022B at the entrance 1024A, the system can easily associate the new blob 112A with the untagged mobile object that has disappeared from room 1022B, and the tag device must be associated with a mobile object not detectable in images captured by the imaging device 104A.
It is worth noting that there still exists another possibility: a tagged mobile object 112B is moving in room 1022A from entrance 1024A to exit 1024B, and there is also a stationary, untagged mobile object 112C in room 1022A outside the FOV of the imaging device 104A. Fig. 27C may be confirmed if previous data regarding an untagged mobile object is available; otherwise, the system would not be able to determine if there is any untagged mobile object undetectable from the image stream of the imaging device 104A, and simply ignores such possibilities.
Fig. 28 shows another example, in which a tagged mobile object 112B moves in room 1022 from the entrance 1024A on the left-hand side towards the entrance 1024B on the right-hand side, and an untagged object 112D moves in room 1022 from the entrance 1024B on the right-hand side towards the entrance 1024A on the left-hand side. The system knows that there is only one tag device in room 1022.
The imaging device 104 in room 1022 detects two blobs, 112B and 112D, one of which has to be associated with the tag device. Both blobs 112B and 112D show walking motion with some turns.
Much information and many observations may be used to associate the tag device with one of the two blobs 112B and 112D. For example, the initial conditions may show that a tagged mobile object entered from the entrance 1024A on the left-hand side, and an untagged mobile object entered from the entrance 1024B on the right-hand side, indicating that blob 112B shall be associated with the tag device and that blob 112D corresponds to an untagged object. The accelerometer/rate gyro of the IMU may provide observations showing periodic activity matching the pattern of the walking activity of blob 112B, indicating the same as above. Further, short term trajectory estimation based on IMU observations over time may be used to detect turns, which may then be compared with the camera view detections to establish the above described association. Moreover, if the room 1022 is also equipped with a wireless signal transmitter near the entrance 1024B on the right-hand side, and the tag device comprises a sensor for RSS measurement, the RSS measurement may indicate an increasing RSS over time, indicating that the blob 112B approaching the entrance 1024B is a tagged mobile object. With these examples, those skilled in the art appreciate that, during the movement of blobs 112B and 112D in the FOV of the imaging device 104, the system can obtain sufficient motion related information and observations to determine, with high likelihood, that blobs 112B and 112D are tagged and untagged mobile objects, respectively.
In some embodiments, if the tag device is able to provide observations, e.g., IMU observations, with sufficient accuracy, a trajectory may be obtained and compared with the camera view detections to establish the above described association. On the other hand, it may be difficult to obtain the trajectory with sufficient accuracy using captured images, due to the limited optical resolution of the imaging device 104 and the error introduced in mapping the blob in a captured image to the birds-eye view. In many practical scenarios, the images captured by an imaging device may only be used to reliably indicate which mobile object is in front of others.
By using relevant initial conditions, image streams captured by one or more imaging devices, and observations from tag devices, the system establishes blob/BV object/tag device associations, tracks tagged mobile objects in the site, and, if possible, tracks untagged mobile objects. An important target of the system is to track and record summary information regarding the locations and main activities of mobile objects, e.g., which subareas the mobile objects have been to and when. One may then conclude a descriptive scenario story such as "the tagged object #123 entered room #456 from port #3 at time t1 and exited port #5 at time t2. Its main activity was walking." The detailed trajectory of a mobile object and/or quantitative details of the trajectory may not be required in some alternative embodiments.
When ambiguity exists, the initial conditions from the network arbitrator 148 may not be sufficient to affirmatively establish a blob/BV object/tag device association. In other words, the probability of such a blob/BV object/tag device association is less than 1. In this situation, the birds-eye view processing submodule 1002B then starts extracting the EBTF from the IBTF immediately, and considers observations for object/tag device activity correlation.

For example, if the camera view processing submodule 1002A detects that a blob exhibits a constant velocity indicative of a human walking, the birds-eye view processing submodule 1002B then begins to fill the OTF with the information obtained by the camera view processing submodule 1002A, which is the information observed by the imaging device. The network arbitrator 148 analyzes the (partial) OTF and determines an opportunity for object/tag device activity correlation. Then, the network arbitrator 148 requests the tag device to provide observations such as the accelerometer data, RSS measurements, magnetometer data and the like. The network arbitrator 148 also generates additional, processed data, such as a walking/stationary activity classifier based on tag observations, e.g., the IMU output. The tag observations and the processed data generated by the network arbitrator based on tag observations have been described above. Some of the observations are listed again below for illustration purposes:
• walking activity (a network arbitrator processed gesture);
• walking pace (compared to undulations of the camera-view bounding box);
• RSS multipath activity commensurate with the BV object velocity calculated based on the perspective mapping of a blob in the camera view to the birds-eye view;
• RSS longer term change commensurate with the RSS map (i.e., a map of the site showing the RSS distribution therein);
• rate gyro activity indicative of walking; and
• magnetic field variations indicative of motion (no velocity may be estimated therefrom).
The network arbitrator 148 sends object activity data, which is the data describing object activities and may be the tag observations or the above described data generated by the network arbitrator 148 based on the received tag observations, to the birds-eye view processing submodule 1002B.
The birds-eye view processing submodule 1002B then calculates numeric activity correlations between the object activity data and the camera view observations, e.g., data of blobs. The calculated numeric correlations are stored in the OTF, forming correlation metrics.
The network arbitrator 148 uses these correlation metrics and weights them to update the blob/BV object/tag device association probability. With sufficient camera view observations and tag observations, ambiguity can be resolved and the blob/BV object/tag device association may be confirmed with an association probability larger than a predefined probability threshold. Fig. 29 shows the relationship between the IBTF 1030, the EBTF 1032, the OTF 1034, the Tag Observable File (TOF) 1036 for storing tag observations, the network arbitrator 148 and the tag devices 114.
above description, those skilled in the art appreciate that the
camera view processing submodule 1002A processes image frames captured by
16 the
imaging devices to detect blobs and to determine the attributes of detected
17 blobs.
The birds-eye view processing submodule 1002B does not directly communicate with the tag devices. Rather, the birds-eye view processing submodule 1002B calculates activity correlations based on the object activity data provided by the network arbitrator 148. The network arbitrator 148 checks the partial OTF and, based on the calculated activity correlations, determines if the BV object can be associated with a tag device.
Those skilled in the art also appreciate that the network arbitrator 148 has an overall connection diagram of the various subareas, i.e., the locations of the subareas and the connections therebetween, but does not have the details of each of the subareas. The details of the subareas are stored in the site map and, if available, the magnetometer map and the RSS map. These maps are fed to the birds-eye view processing submodule 1002B.

When relevant magnetometer and/or RSS data is available from the tag devices, the network arbitrator 148 can relay these data as tag observations (stored in the TOF 1036) to the birds-eye view processing submodule 1002B. As the birds-eye view processing submodule 1002B knows the probability of the tag device being in a specific location, it can update the magnetometer and/or RSS map accordingly.
Generally, the system can employ many types of information for tracking mobile objects, including the image streams captured by the imaging devices in the site, tag observations, and initial conditions regarding mobile objects appearing in the FOV of each imaging device. In some embodiments, the system may further exploit additional constraints for establishing blob/BV object/tag device associations and tracking mobile objects. Such additional constraints include, but are not limited to, realistic object motion constraints. For example, the velocity and acceleration of a mobile object relative to the floor space cannot realistically exceed certain predetermined limits. There may also be a justifiable assumption of no object occlusion in the birds-eye view. In some embodiments, there may exist a plurality of imaging devices with overlapping FOVs, e.g., monitoring a common subarea; the image streams captured by these imaging devices may thus be collectively processed to detect and track mobile objects with higher accuracy. The site contains barriers or constraints, e.g., walls, at known locations that mobile objects cannot realistically cross, and the site contains ports or entrances/exits at known locations allowing mobile objects to move from one subarea to another.
The above described constraints may be more conveniently processed in the birds-eye view than in the camera view. Therefore, as shown in Fig. 30, the birds-eye view 1042 may be used as a hub for combining the data obtained from one or more imaging devices or camera views 104, the observations from one or more tag devices 114, and the constraints 1044, for establishing blob/BV object/tag device associations. Some data, such as the camera view observations of the imaging devices 104 and the tag observations of the tag devices 114, may be sent to the birds-eye view 1042 via intermediate components such as the camera view processing submodule 1002A and the network arbitrator 148, respectively. However, such intermediate components are omitted in Fig. 30 for ease of illustration.
With the information flow shown in Fig. 30, in the scenario of Fig. 27A, where the initial conditions indicate a tagged mobile object 112B entering the entrance 1024A with steady walking activity, no ambiguity arises. The camera view information, i.e., the blob 112B, and the tag device observations can be corroborated with each other directly, without the aid of the additional constraints. In other words, the camera view produces a single blob of very high probability, with no issue of blob association from one image frame to another. A trajectory of the corresponding mobile object is determined and mapped into the birds-eye view as an almost deterministic path with small trajectory uncertainty. The CV/BV module checks the mapped trajectory to ensure its correctness (e.g., that the trajectory does not cross a wall). After determining the correctness of the trajectory, a BV object is assigned to the blob, and a blob/BV object/tag device association is then established.
As there is no issue with the correctness and uniqueness of the established association, the CV/BV module then informs the network arbitrator to establish the probability of the blob/BV object association. The network arbitrator checks the initial conditions likely related to the blob, and calculates the probability of the blob/BV object/tag device association. If the calculated association probability is sufficiently high, e.g., higher than a predefined probability threshold, then the network arbitrator does not request any further tag observations from the tag devices.
If, however, the calculated association probability is not sufficiently high, then the network arbitrator requests observations from the tag device. As described before, the requested observations are those most suitable for increasing the association probability with the minimum energy expenditure incurred to the tag device. In this example, the requested tag observations may be those suitable for confirming walking activity consistent with the camera view observations (e.g., the walking activity observed from the blob 112B).

After receiving the tag observations, the received tag observations are sent to the CV/BV module for re-establishing the blob/BV object/tag device association. The association probability is also re-calculated and compared with the probability threshold to determine whether the re-established association is sufficiently reliable. This process may be repeated until a sufficiently high association probability is obtained.
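The repeat-until-confident loop just described may be sketched, for illustration only, as follows (Python); the callables and the ordering of observation kinds are hypothetical and stand in for whatever requesting and re-association logic the arbitrator actually uses:

    def confirm_association(recompute, request_observation, observation_kinds, threshold=0.9):
        """Request further tag observations, in order of expected benefit per unit of
        tag-device energy, until the re-computed blob/BV object/tag device association
        probability exceeds the threshold.

        recompute: callable(list_of_extra_observations) -> probability in [0, 1]
        request_observation: callable(kind) -> one observation from the tag device
        observation_kinds: e.g., ["imu_gesture", "rss", "magnetometer"]
        """
        extra = []
        probability = recompute(extra)
        for kind in observation_kinds:
            if probability >= threshold:
                break
            extra.append(request_observation(kind))   # costs tag-device energy
            probability = recompute(extra)            # re-establish the association
        return probability >= threshold, probability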
Fig. 31 is a more detailed version of Fig. 30, showing the function of the network arbitrator 148 in the information flow. As shown, initial conditions 1046 are made available to the camera views 104, the birds-eye view 1042 and the network arbitrator 148. The network arbitrator 148 handles all communications with the tag devices 114 based on the need of associating the tag devices 114 with BV objects. Tag information and decisions made by the network arbitrator 148 are sent to the camera views 104 and the birds-eye view 1042. The main output 1048 of the network arbitrator 148 is the summary information regarding the locations and main activities of mobile objects, i.e., the scenario stories, which may be used as initial conditions for further mobile object detection and tracking, e.g., for detecting and tracking mobile objects entering an adjacent subarea. The summary information is updated every time an object exits a subarea.
VIII-3. Camera view processing

It is common in practice that a composite blob of a mobile object may comprise a plurality of sub-blobs as a cluster. The graph in the IBTF thus may comprise a plurality of sub-blobs. Many suitable image processing technologies, such as morphological operations, erosion, dilation, flood-fill, and the like, can be used to generate such a composite blob from a set of sub-blobs, which, on the other hand, implies that the structure of a blob is dependent on the image processing technology being used. While under ideal conditions a blob may be decomposed into individual sub-blobs, such decomposition is often practically impossible unless other information, such as clothes color, face detection and recognition, and the like, is available. Thus, in this embodiment, sub-blobs are generally considered hidden, with inference only from the uniform motion of the feature points and optical flow.
In some situations, the camera view processing submodule 1002A may not have sufficient information from the camera view to determine that a cluster of sub-blobs is indeed associated with one mobile object. As there is no feedback from the birds-eye view processing submodule 1002B to the camera view processing submodule 1002A, the camera view processing submodule 1002A cannot use the initial conditions to combine a cluster of sub-blobs into a blob.

The birds-eye view processing submodule 1002B, on the other hand, may use initial conditions to determine if a cluster of sub-blobs shall be associated with one BV object. For example, the birds-eye view processing submodule 1002B may determine that the creation time of the sub-blobs is coincident with the timestamp of the initial conditions. Also, the initial conditions may indicate a single mobile object appearing in the FOV of the imaging device. Thus, the probability that the sub-blobs in the captured image frame are associated with the same mobile object or BV object is one (1).
In some embodiments, a classification system is used for classifying different types of blobs, with a classification probability indicating the reliability of the blob classification. The different types of blobs include, but are not limited to, the blobs corresponding to:
1 = Blob
type 1: single adult human object, diffuse lighting, no
2 obstruction;
3 = Blob
type 2: single adult human object, diffuse lighting, with
4 obstruction;
= Blob type 3: single adult human object, non-diffuse lighting, no
6 obstruction;
7 = Blob
type 4: single adult human object, diffuse lighting, partial
8 occlusion but recoverable;
9 = Blob
type 5: two adult humans in one object, diffuse lighting,
ambiguous occlusion; and
11 = Blob
type 6: two adult humans in one object, specular lighting,
12 ambiguous occlusion.
13 Other
types of blobs, e.g., those corresponding to child objects may
14 also be
defined. Each of the above types of blobs may be processed using different
rules. In some embodiments, the classification system may further identify non-

16 human objects such as robots, carts, wheelchairs and the like, based on
17 differentiating the shapes thereof.
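For illustration only, a minimal Python sketch of a blob-type lookup covering the six classes listed above is given below; the field names and the per-type handling rule are illustrative assumptions, not the patent's actual rules.

# Minimal sketch of a blob-type table and a per-type processing rule.
from dataclasses import dataclass

@dataclass(frozen=True)
class BlobClass:
    persons: int          # number of adult humans believed to be in the blob
    diffuse_light: bool   # True for diffuse lighting, False otherwise
    occlusion: str        # "none", "obstruction", "recoverable", "ambiguous"

BLOB_TYPES = {
    1: BlobClass(1, True, "none"),
    2: BlobClass(1, True, "obstruction"),
    3: BlobClass(1, False, "none"),
    4: BlobClass(1, True, "recoverable"),
    5: BlobClass(2, True, "ambiguous"),
    6: BlobClass(2, False, "ambiguous"),
}

def processing_rule(blob_type: int) -> str:
    """Pick a handling rule for the blob class; a real system would attach the
    rules (e.g., shadow removal, split attempts) appropriate to each class."""
    cls = BLOB_TYPES[blob_type]
    if cls.persons > 1 or cls.occlusion == "ambiguous":
        return "defer to birds-eye view / tag observations"
    return "track directly in camera view"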
Fig. 32A shows an example of a blob 1100 of the above-described type 3, i.e., a blob of a single adult human object under non-diffuse lighting and with no obstruction. The type 3 blob 1100 comprises three (3) sub-blobs or bloblets, including the head 1102, the torso 1104 and the shadow 1106. Fig. 32B illustrates the relationship between the type 3 blob 1100 and its sub-blobs 1102 to 1106.
With the classification system, the camera view processing submodule 1002A can then combine a cluster of sub-blobs into a blob, which facilitates the camera view pruning of the graph in the IBTF.

The camera view processing submodule 1002A sends classified sub-blobs and their classification probabilities to the birds-eye view processing module 1002B for facilitating mobile object tracking.
For example, the initial conditions from the network arbitrator 148 indicate a single human object, and the birds-eye view processing submodule 1002B, upon reading the initial conditions, expects a human object to appear in the FOV of the imaging device at an expected time (determined from the initial conditions).

At the expected time, the camera view processing submodule 1002A detects a cluster of sub-blobs appearing at an entrance of a subarea. With the classification system, the camera view processing submodule 1002A combines the cluster of sub-blobs into a blob, and determines that the blob may be a human object with a classification probability of 0.9, a probability higher than a predefined classification probability threshold. The birds-eye view processing submodule 1002B then determines that the camera view processing submodule 1002A has correctly combined the cluster of sub-blobs in the camera view as one blob, and that the blob shall be associated with the human object indicated by the initial conditions.
On the other hand, if in the above example the initial conditions indicate two human objects, the birds-eye view processing submodule 1002B then determines that the camera view processing submodule has incorrectly combined the cluster of sub-blobs into one blob.

The birds-eye view processing submodule 1002B records its determination regarding the correctness of the combined cluster of sub-blobs in the OTF.
When the camera view processing submodule 1002A combines the cluster of sub-blobs into one blob, it also stores information it derived about the blob in the IBTF. If the camera view processing submodule has incorrectly combined the cluster of sub-blobs into one blob, the derived information may also be wrong. To prevent the incorrect information from propagating to subsequent calculation and decision making, the birds-eye view processing submodule 1002B applies uncertainty metrics to the data in the OTF to allow the network arbitrator 148 to use the uncertainty metrics for weighting the data in the OTF in object tracking. With proper weighting, the data obtained by the network arbitrator 148 from other sources, e.g., tag observations, may reduce the impact of OTF data that has less certainty (i.e., is more likely to be wrong), and reduce the likelihood that wrong information in the OTF data propagates to subsequent calculation and decision making.
In an alternative embodiment, feedback is provided from the birds-eye view processing submodule 1002B to the camera view processing submodule 1002A to facilitate the combination of sub-blobs. For example, if the birds-eye view processing submodule 1002B determines from the initial conditions that there is only one mobile object appearing at an entrance, it feeds back this information to the camera view processing submodule 1002A, such that the camera view processing submodule 1002A can combine the cluster of sub-blobs appearing at the entrance into one blob, even if the cluster of sub-blobs appearing in the camera view, from the CV perspective, is more likely projected to be two or more blobs.
Multiple blobs may also merge into one blob due to mobile objects overlapping in the FOV of the imaging device, and a previously merged blob may be separated when the previously overlapped mobile objects are separated.
Before describing blob merging and separating (also called fusion and fission), it is noted that each blob detected in an image stream comprises two basic blob events, i.e., blob creation and annihilation. A blob creation event corresponds to an event in which a blob emerges in the FOV of an imaging device, such as from a side of the FOV of the imaging device, from an entrance, from behind an obstruction in the FOV of the imaging device, and the like.

A blob annihilation event corresponds to an event in which a blob disappears in the FOV of an imaging device, such as exiting from a side of the FOV of the imaging device (implying moving into an adjacent subarea or leaving the site), disappearing behind an obstruction in the FOV of the imaging device, and the like.
Fig. 33 shows a timeline history diagram of the life span of a blob. As shown, the life span of the blob comprises a creation event 1062, indicating the first appearance of the blob in the captured image stream, and an annihilation event 1064, indicating the disappearance of the blob from the captured image stream, connected by an edge 1063 representing the life of the blob. During the life span of the blob, the PDF of the BBTP of the blob is updated at discrete time instants 1066, and the BBTP PDF updates 1068 are passed to the birds-eye view for updating a Dynamic Bayesian Network (DBN) 1070. The BTF comprises all blobs observed and tracked prior to any blob/BV object/tag device association. All attributes of the blobs generated by the camera view processing submodule are stored in the BTF.

When the blob annihilation event occurs, it implies that (block 1072) the corresponding mobile object has exited the current subarea and entered an adjacent subarea (or left the site).
A blob event occurs instantaneously in an image frame, and may be represented as a node in a timeline history diagram. A blob transition from one event to another generally spans a plurality of image frames, and is represented as an edge in the timeline history diagram.

A blob may have more events. For example, a blob may have one or more fusion events, occurring when the blob is merged into another blob, and one or more fission events, occurring when two or more previously merged blobs are separated.
For example, Fig. 34 shows a timeline history diagram of the blobs of Fig. 28, which shows that blobs 1 and 2 are created (events 1062A and 1062B, respectively) at entrances 1024A and 1024B, respectively, to the room 1022 in the FOV of the imaging device 104. After a while, a fusion event 1082 of blobs 1 and 2 occurs, resulting in blob 3. Another while later, blob 3 fissions into blobs 4 and 5 (fission event 1084). At the end of the timeline, blobs 4 and 5 are annihilated (annihilation events 1064A and 1064B, respectively) as they exit the FOV of the imaging device 104 through entrances 1024B and 1024A, respectively. The camera view processing submodule 1002A produces the blob-related event nodes and edges, including the position and attributes of the blobs generated in the edge frames, which are passed to a DBN. The DBN puts the most likely story together in the birds-eye view.
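For illustration only, a minimal Python sketch of such a timeline-history graph of blob events (nodes) and blob life spans (edges), loosely following the Fig. 34 example, is given below; the class and field names are illustrative assumptions.

# Minimal sketch of a blob event/edge graph as in Figs. 33 and 34.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BlobEvent:
    kind: str                 # "creation", "annihilation", "fusion", "fission"
    frame: int                # image frame in which the event occurs
    blob_ids: List[int]       # blobs involved (e.g., [1, 2, 3] for 1 and 2 fusing into 3)

@dataclass
class BlobEdge:
    blob_id: int
    start: BlobEvent                                    # event that begins this blob's life
    end: Optional[BlobEvent] = None                     # filled in when the blob ends
    bbtp_track: List[tuple] = field(default_factory=list)  # (frame, x, y) BBTP updates

# Fig. 34 example: blobs 1 and 2 are created, fuse into blob 3, which later fissions.
c1 = BlobEvent("creation", frame=0, blob_ids=[1])
c2 = BlobEvent("creation", frame=0, blob_ids=[2])
fusion = BlobEvent("fusion", frame=40, blob_ids=[1, 2, 3])
fission = BlobEvent("fission", frame=90, blob_ids=[3, 4, 5])
edges = [BlobEdge(1, c1, fusion), BlobEdge(2, c2, fusion), BlobEdge(3, fusion, fission)]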
Fig. 35A shows an example of a type 6 blob 1110 corresponding to two persons standing close to each other. The blob 1110 comprises three sub-blobs, including two partially overlapping sub-blobs 1112 and 1114 corresponding to the two persons, and a shadow blob 1116. Fig. 35B illustrates the relationship between the type 6 blob 1110 and its sub-blobs 1112 to 1116. Similar to the example of Fig. 32A, the blob 1110 may be decomposed into individual sub-blobs of two human blobs and a shadow blob under ideal conditions.
The type 6 blob 1110 and other types of blobs, e.g., type 5 blobs, that are merged from individual blobs, may be separated in a fission event. On the other hand, blobs of individual mobile objects may be merged into a merged blob, e.g., a type 5 or type 6 blob, in a fusion event. Generally, fusion and fission events may occur depending on the background, mobile object activities, occlusion, and the like.
Blob fusion and fission may cause ambiguity in object tracking. Fig. 36A shows an example of such an ambiguity. As shown, two tagged objects 112B and 112C simultaneously enter the entrance 1024A of room 1022, move in the FOV of imaging device 104 across the room 1022, and exit from the entrance 1024B.

As the mobile objects 112B and 112C are tagged objects, the initial conditions from the network arbitrator 148 indicate two objects entering room 1022. On the other hand, the camera view processing submodule 1002A only detects one blob from image frames captured by the imaging device 104. Therefore, ambiguity occurs.
As the ambiguity is not immediately resolvable when the mobile objects 112B and 112C enter the room 1022, the camera view processing submodule 1002A combines the detected cluster of sub-blobs into one blob.

If mobile objects 112B and 112C are moving in room 1022 at the same speed, then they still appear in the camera view as a single blob and the ambiguity cannot be resolved. The IBTF then indicates a blob track graph that appears to be moving at a constant walking rate. Primitive blob tracking would not classify the blob as two humans. The birds-eye view processing submodule 1002B analyzes the IBTF based on the initial conditions, and maps the blob cluster graph from the IBTF to the EBTF. As the ambiguity cannot be resolved, the blob cluster is thus mapped as a single BV object, and stored in the OTF. In this case, the network arbitrator 148 would not request any tag measurements as the data in the OTF does not indicate any possibility of disambiguation, only the initial conditions indicating ambiguity.
When the mobile objects 112B and 112C exit room 1022 into an adjacent, next subarea, the network arbitrator 148 assembles data thereof as initial conditions for passing to the next subarea. As will be described later, if the mobile objects 112B and 112C are separated in the next subarea, they may be successfully identified, and their traces in room 1022 may be "back-tracked". In other words, the system may delay ambiguity resolution until the identification of the mobile objects is successful.

If, however, the mobile objects 112B and 112C are moving in room 1022 at different speeds, the single blob eventually separates into two blobs.
The single blob is separated when the mobile object traces separate, wherein one trace extends ahead of the other. It is possible that there exists a transition period of separation, in which the single blob may be separated into more than two sub-blobs, which, together with the inaccuracy of the BBTP of the single blob, causes the camera view processing submodule 1002A to fail to group the sub-blobs into two blobs. However, such a transition period is temporary and can be omitted.

With the detection of two blobs, the IBTF now comprises three blob tracks, i.e., blob track 1 corresponding to the previous single blob, and blob tracks 2 and 3 corresponding to the current two blobs, as shown in the timeline history diagram of Fig. 36B.
The initial conditions indicate the two ambiguous objects 112B and 112C at the entrance 1024A of room 1022, and the birds-eye view processing submodule 1002B processes the IBTF to generate the floor view for blob tracks or edges 1, 2 and 3. Based on the graph and the floor grid, as blob tracks 2 and 3 start at a location in room 1022 in proximity with the end location of blob track 1, the birds-eye view processing submodule 1002B associates blob track 1 with blob track 2 to form a first blob track graph, and also associates blob track 1 with blob track 3 to form a second blob track graph, both associations being consistent with the initial conditions and having high likelihoods.

It is worth noting that, if one or both of blob tracks 2 and 3 start at a location in room 1022 far from the end location of blob track 1, the association of blob tracks 1 and 2 and that of blob tracks 1 and 3 would have low likelihood.
Back to the example, with the information from the camera view processing submodule 1002A, the birds-eye view processing submodule 1002B determines activities of walking associated with the first and second blob track graphs, which are compared with tag observations for resolving the ambiguity.

The network arbitrator 148 requests the tag devices to report tag observations, e.g., the mobile object velocities, when the mobile objects 112B and 112C are in blob tracks 2 and 3, and uses the velocity observations for resolving the ambiguity. The paces of the mobile objects may also be observed in the camera view and by the tag devices, and are used for resolving the ambiguity. The obtained tag observations such as velocities and paces are stored in the OTF.

In some embodiments, the network arbitrator 148 may request tag devices to provide RSS measurements and/or magnetic field measurements. The obtained RSS and/or magnetic field measurements are sent to the birds-eye view processing submodule 1002B.
As the birds-eye view processing submodule 1002B has knowledge of the traces of mobile objects 112B and 112C, it can correlate the magnetic and RSS measurements with the magnetic and RSS maps, respectively. As the tagged objects are going through the same path with one behind the other, the RSS and/or magnetic correlations for the two objects 112B and 112C exhibit a similar pattern with a delay therebetween. The ambiguity can then be resolved and the blobs can be correctly associated with their respective BV objects and tag devices.
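For illustration only, a minimal Python sketch of estimating the delay between two RSS traces recorded by the two tag devices along the same path is given below; a strong correlation peak at a nonzero lag is consistent with one object trailing the other. The sampling and normalization details are illustrative assumptions.

# Minimal sketch: lagged normalized cross-correlation of two RSS traces.
import numpy as np

def best_lag(rss_a, rss_b):
    """Return (lag_in_samples, peak_normalized_correlation) between two traces."""
    a = (np.asarray(rss_a, float) - np.mean(rss_a)) / (np.std(rss_a) + 1e-9)
    b = (np.asarray(rss_b, float) - np.mean(rss_b)) / (np.std(rss_b) + 1e-9)
    corr = np.correlate(a, b, mode="full") / len(a)      # correlation for every lag
    lags = np.arange(-len(b) + 1, len(a))
    i = int(np.argmax(corr))
    return int(lags[i]), float(corr[i])                  # delay and correlation strength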
The power spectrum of the RSS can also be used for resolving ambiguity. The RSS has a bandwidth roughly proportional to the velocity of the tag device (and thus the associated mobile object). As the velocity is accurately known from the camera view (calculated based on, e.g., optical flow and/or feature point tracking), the RSS spectral power bandwidths may be compared with the object velocity for resolving ambiguity.

As the mobile object moves, the magnetic field strength will fluctuate and the power spectral bandwidth will change. Thus, the magnetic field strength may also be used for resolving ambiguity in a similar manner. All of these correlations and discriminatory attributes are processed by the birds-eye view processing submodule 1002B and sent to the network arbitrator 148.
As described above, the camera view processing submodule 1002A tries to combine sub-blobs that belong to the same mobile object, by using background/foreground processing, morphological operations and/or other suitable image processing techniques. The blobs and/or sub-blobs are pruned, e.g., by eliminating some sub-blobs that likely do not belong to any blob, to facilitate blob detection and sub-blob combination. The camera view processing submodule 1002A also uses optical flow methods to combine a cluster of sub-blobs into one blob. However, sub-blobs may not be combined if there is potential ambiguity, and thus the BTF (IBTF and EBTF) may comprise multiple blob tracks for the same object.
Fig. 37A illustrates an example, in which a blob 112B is detected by the imaging device 104 appearing at entrance 1024A of room 1022, moving towards entrance 1024B along the path 1028, but splitting (fission) into two sub-blobs that move along slightly different paths and both exit the room 1022 from entrance 1024B.

In this example, three tracks are detected and included in the BTF, with one track from the entrance 1024A to the fission point, and two tracks from the fission point to entrance 1024B.
Initial conditions play an important role in this example in solving the ambiguity. If the initial conditions indicate two mobile objects appearing at entrance 1024A, the two tracks after the fission point are then associated with the two mobile objects.

However, if, in this example, the initial conditions indicate a single object appearing at entrance 1024A, then, as objects cannot be spontaneously created within the FOV of the imaging device 104, the birds-eye view processing submodule interprets the blob appearing at the entrance 1024A as a single mobile object.

The first blob track from the entrance 1024A to the fission point is analyzed in the BV frame. The bounding box size, which should correspond to the physical size of the object, is calculated and verified for plausibility. In this example, diffuse light is assumed for simplicity such that shadows are not an issue, and the processing of shadows is omitted as shadows can be treated as described above.
Immediately after the fission point, there appear two bounding boxes (i.e., two CV objects or two FFCs). If the two bounding boxes are then moving at different velocities or along two paths significantly apart from each other, the two CV objects are then associated with two mobile objects. Tag observations may be used to determine which one of the two mobile objects is the tagged object. However, if the two CV objects are moving at substantially the same velocity along two paths close to each other, the ambiguity cannot be resolved. In other words, the two CV objects may indeed be a single mobile object appearing as two CV objects due to inaccuracy in image processing, or the two CV objects may be two mobile objects that are close to each other and cannot be distinguished with sufficient confidence. The system thus considers them as one (tagged) mobile object. If, after exiting from the entrance 1024B, the system observes two significantly different movements, the above-described ambiguity that occurred in room 1022 can then be resolved.
With the above examples, those skilled in the art will appreciate that ambiguity in most situations can be resolved by using camera view observations and the initial conditions. If the initial conditions are affirmative, the ambiguity may be resolved with a probability of one (1). If, however, the initial conditions are probabilistic, the ambiguity is resolved with a probability less than one (1). The mobile object is tracked with a probability less than one (1), conditioned on the probability of the initial conditions. For example, mobile object tracking may be associated with the following Bayesian probabilities:

Pr(blob tracks 1, 2 and 3 being associated) = Pr(initial conditions indicating one person),

where Pr(A) represents the probability that A is correct; or

Pr(blob tracks 2 and 3 being separately associated with blob track 1) = Pr(initial conditions indicating two persons).
During object tracking, a blob may change in size or shape, an example of which is shown in Fig. 37B.

In this example, there is a cart 1092 in room 1022 that has been stationary for a long time and has therefore become part of the background in the camera view. A tagged person 112B enters from the left entrance 1024A and moves across the room 1022 along the path 1028. Upon reaching the cart 1092, the person 112B pushes the cart 1092 to the right entrance 1024B and exits therefrom.

During tracking of the person 112B, the camera view processing submodule 1002A determines a bounding box for the person's blob, which, however, suddenly becomes much larger when the person 112B starts to push the cart 1092.

Accordingly, the information carried in the edge of the blob track graph is characterized by a sudden increase in the size of the blob bounding box, which causes a blob track abnormality in birds-eye view processing. A blob track abnormality may be considered a pseudo-event not detected in the camera view processing but rather in the subsequent birds-eye view processing.
In the example of Fig. 37B, the initial conditions indicate a single person entering entrance 1024A. Although the camera view processing indicates a single blob crossing the room 1022, the birds-eye view processing analyzes the bounding box of the blob and determines that the bounding box size of the blob at the first portion of the trace 1028 (between the entrance 1024A and the cart 1092) does not match that at the second portion of the trace 1028 (between the cart 1092 and the entrance 1024B). A blob track abnormality is then detected.

Without further information, the birds-eye view processing/network arbitrator can determine that the mobile object 112B is likely associated with an additional object that was previously part of the background in the captured image frames.
The association of the person 112B and the cart 1092 can be further confirmed if the cart 1092 comprises a tag device that wakes up as it is being moved by the person (via the accelerometer measuring a sudden change). The tag device of the cart 1092 immediately registers itself with the network arbitrator 148, and then the network arbitrator 148 starts to locate this tag device. Due to the coincidence of the tag device waking up and the occurrence of the blob track abnormality, the network arbitrator 148 can determine that the mobile object 112B is now associated with the cart 1092 with a moderate level of probability. Furthermore, the tag device of the cart 1092 can further detect that it is being translated in position (via magnetic field measurements, RSS measurements, accelerometer and rate gyro data indicating vibrations due to moving, and the like), and thus the cart 1092 can be associated with the mobile object 112B during the second portion of the trace 1028.
If feedback can be provided to the camera view processing submodule 1002A, the camera view processing submodule 1002A may analyze the background of the captured images and compare the background in the images captured after the cart 1092 is pushed with that in the images captured before the cart 1092 is pushed. The difference can show that it is the cart object 1092 that has moved.
Fig. 37C shows another example, in which a tagged person 112B enters from the left entrance 1024A and moves across the room 1022 along the path 1028. While moving, the person 112B sits down for a while at location 1094, and then stands up and walks out from entrance 1024B.

Accordingly, in the camera view, the person 112B appears as a moving blob from the entrance 1024A, where a new track of blob 112B is initiated. Periodic oscillation of the bounding box confirms that the object is walking. Then the walking stops and the blob 112B becomes stationary (e.g., for a second). After that, the blob 112B remains stationary but its height shrinks. When the person 112B stands up, the corresponding blob 112B increases to its previous height. After a short period, e.g., a second, the blob again exhibits walking motion (periodic undulations) and moves at a constant rate towards the entrance 1024B.
While in this embodiment the change of the height of the blob in Fig. 37C does not cause ambiguity, in some alternative embodiments the system may need to confirm the above-described camera observation using tag observations.

IMU tag observations, e.g., accelerometer and rate gyro outputs, exhibit a motion pattern consistent with the camera view observation. In particular, tag observations reveal a walking motion, and then a slight motion activity (when the person 112B is sitting down and when the person 112B is standing up). Then, the IMU tag observations again reveal a walking motion. Such a motion pattern can be used to confirm the camera view observation.
In some embodiments wherein the tag device comprises other sensors such as a barometer, the output of the barometer can detect the change in altitude from standing to sitting (unless the tag device is coupled to the person at an elevation close to the floor, or the tag device is carried in a handbag that is put on a table when the person 112B sits down). As the person 112B will usually sit down for at least several seconds or even much longer, the barometer output, while noisy, can be filtered with a time constant, e.g., several seconds, to remove noise and detect an altitude change, e.g., of about half a meter. Thus, the barometer output can be used for detecting object elevation changes, such as a person sitting down, and for confirming the camera view observation.
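For illustration only, a minimal Python sketch of smoothing a noisy barometric altitude estimate with a first-order low-pass filter (time constant of a few seconds) and flagging a sustained drop of roughly half a meter as a sit-down event is given below; the thresholds and sampling rate are illustrative assumptions.

# Minimal sketch: low-pass filtering of barometric altitude to detect sitting down.
def detect_sit_down(altitudes_m, dt_s=0.1, tau_s=3.0, drop_m=0.4):
    """altitudes_m: barometric altitude samples; returns sample indices at which a
    sustained drop of at least drop_m relative to the recent baseline is detected."""
    alpha = dt_s / (tau_s + dt_s)              # exponential smoothing coefficient
    smoothed = altitudes_m[0]
    baseline = altitudes_m[0]
    events = []
    for i, z in enumerate(altitudes_m):
        smoothed += alpha * (z - smoothed)     # low-pass filtered altitude
        if baseline - smoothed >= drop_m:      # person appears to have sat down
            events.append(i)
            baseline = smoothed                # re-anchor after the event
        elif smoothed > baseline:
            baseline = smoothed                # track slow upward drift of the baseline
    return events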
RSS measurements can also be used for indicating that an object is stationary, by determining that the RSS measurement does not change in a previously detected manner, or does not change at all. Note that the RSS measurement does not change when the tagged person is walking along an arc and maintaining a constant distance to the wireless signal transceiver. However, this rarely occurs, and even if it occurs, alternative tag observations can be used.

In the example of Fig. 37C, the site map may contain information regarding the location 1094, e.g., a chair pre-deployed and fixed at location 1094. Such information may also be used for confirming the camera view observation.
Fig. 37D shows yet another example. Similar to Fig. 37C, a tagged person 112B enters from the left entrance 1024A and moves across the room 1022 along the path 1028. Accordingly, in the camera view, the person 112B appears as a moving blob from the entrance 1024A, where a new track of blob 112B is initiated. Periodic oscillation of the bounding box confirms that the object is walking.

When the person 112B arrives at location 1094, the person 112B sits down. Unlike the situation of Fig. 37C, in Fig. 37D two untagged persons 112C and 112D are also sitting at location 1094 (not yet merged into the background). Therefore, the blob of person 112B merges with those of persons 112C and 112D.

After a short while, person 112B stands up and walks out from entrance 1024B. The camera view processing submodule detects the fission of the merged blob, and the birds-eye view processing submodule can successfully detect the moving of person 112B by combining camera view observations and tag observations.
However, if an untagged person, e.g., person 112C, also stands up and walks with person 112B, unresolvable ambiguity occurs as the system cannot detect the motion of the untagged person 112C. Only the motion of the tagged person 112B can be confirmed. This example shows the limitations in tracking untagged mobile objects.

Fig. 38 shows a table listing the object activities and the actions of the network arbitrator, camera view processing and tag devices that may be triggered by the corresponding object activities.
VIII-4. Tracking blobs in image frames
Tracking blobs in image frames may be straightforward in some situations, such as that of Fig. 27A, in which the association of the blob, the BV object and the tag device based on likelihood is obvious as there is only one mobile object 112B in the FOV of the imaging device 104A. During the movement of the mobile object 112B, each image frame captured by the imaging device 104A has a blob that is "matched" with the blob of the previous image frame with only a slight position displacement. As in this scenario blobs cannot spontaneously appear or disappear, the only likely explanation of such a matched blob is that the blobs in the two frames are associated, i.e., represent the same mobile object, with probability of 1.
However, in many practical scenarios, some blobs in consecutive frames may be relatively displaced by a large amount, or may be significantly different in character. As described earlier, blobs typically are not a clean single blob outlining the mobile object. Due to ambiguities in distinguishing foreground from background, image processing techniques such as background differencing, binary image mapping and morphological operations may typically result in more than one sub-blob. Moreover, sub-blobs are dependent on the background, i.e., the sub-blob region becomes modulated by the background. Therefore, while a mobile object cannot suddenly disappear or appear, the corresponding blob can blend ambiguously with the background, disappear capriciously and subsequently appear again.
A practical approach for handling blobs is to "divide and conquer". More particularly, the sub-blobs are tracked individually and associated into a blob cluster if some predefined criteria are met. Often, sub-blobs originate from a fission process. After a few image frames, the sub-blobs undergo a fusion process and become one blob. When the system determines such a fission-fusion, the sub-blobs involved are combined as one blob. Test results show that, by considering the structure of the graph of the sub-blobs, this approach is effective in combining sub-blobs.
Some image processing techniques such as the binary and morphological operations may destroy much of the information regarding a blob. Therefore, an alternative is to calculate the optical flow from one image frame to the next. The blob associated with a moving object exhibits a nonzero optical flow while the background has a zero flow. However, this requires the imaging device to be stationary and constant, without zooming or panning. Also, the frame rate must be sufficiently high such that the object motion during the frame interval is small compared to the typical feature length of the object. A drawback of the optical flow approach is that, when a human is walking, the captured images show that parts of the human are stationary while other parts are moving. Swinging arms can even exhibit an optical flow in the opposite direction.
Although initial conditions may reveal that the object is a walking human, and may allow determination of parts of the human based on the optical flow, such algorithms are complex and may not be robust. An alternative method is to use feature point tracking, i.e., to track feature points, e.g., corners of a blob. Depending on the contrast of the human's clothing over the background, suitable feature points can be found and used.
Another alternative method is to determine the boundary of the object, which may be applied to a binary image after morphological operations. To avoid merely getting boundaries around sub-blobs, snakes or active contours based on a mixture of penalty terms may be used to generate the outline of the human, from which the legs, arms and head can be identified. As the active contour has to be placed about the desired blob, the system avoids forming too large a blob, with limited convergence and errors in background/foreground separation that may result in capricious active contours.
Other suitable, advanced algorithms may alternatively be used to track the sub-blob of a person's head, and attempt to place a smaller bounding box about each detected head sub-blob. After determining the bounding box of a head and knowing that the human object is walking or standing, the nominal distance from the head to the ground is approximately known. Then the BBTP of the blob can be determined. A drawback of this algorithm is that it may not work well if the human face is not exposed to the imaging device. Of course, this algorithm will fail if the mobile object is not a human.
In this embodiment, the VAILS uses the standard method of morphological operations on a binary image after background differencing. This method is generally fast and robust even though it may omit much of the blob information. This method is further combined with a method of determining the graph of all of the related sub-blobs for combining same. When ambiguities arise, the blob or sub-blob track, e.g., the trajectory being recorded, is terminated, and, if needed, a new track may be started, and maintained after being stable. Then the birds-eye view processing connects the two tracks to obtain the most likely mobile object trajectory.
In forming the blob tracks, it is important to note that the system has to maximize the likelihood of association. For example, Figs. 39A and 39B show two consecutive image frames 1122A and 1122B, having detected blobs 1124A and 1124B, respectively. Assuming that the system does not know any information of the mobile object(s) corresponding to the blobs 1124A and 1124B, to determine whether or not the blobs 1124A and 1124B correspond to the same mobile object, the system uses a likelihood overlap integral method. With this method, the system correlates the two blobs 1124A and 1124B in the consecutive frames 1122A and 1122B to determine an association likelihood. In particular, the system incrementally displaces the blob 1124A in the first frame 1122A, and correlates the displaced blob 1124A with the blob 1124B in the second frame 1122B until a maximum correlation or "match" is obtained. The result is essentially a normalized overlap integral (see Fig. 39C), in which the equivalence of the correlation coefficient emerges.

The system determines a measurement of the likelihood based on the numerical calculation of the cross-correlation coefficient at the location of the maximum blob correlation. Practically, the calculated cross-correlation coefficient is a positive number smaller than or equal to one (1).
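For illustration only, a minimal Python sketch of the likelihood overlap integral is given below: the binary blob from the first frame is displaced over a search window and the normalized overlap with the blob in the second frame is maximized; the peak value serves as the association likelihood. The search range, normalization and use of np.roll (which wraps around the image border) are illustrative simplifications.

# Minimal sketch of the normalized overlap integral between two binary blob masks.
import numpy as np

def overlap_likelihood(blob_a, blob_b, max_shift=15):
    """blob_a, blob_b: binary masks (0/1 or bool) of equal shape.
    Returns ((dy, dx), score) with score in (0, 1]."""
    best_score, best_shift = 0.0, (0, 0)
    norm = np.sqrt(blob_a.sum() * blob_b.sum()) + 1e-9      # correlation-style normalization
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(blob_a, dy, axis=0), dx, axis=1)
            overlap = np.logical_and(shifted, blob_b).sum()  # overlap integral at this shift
            score = float(overlap) / norm
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score   # higher score means more likely the same object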
In calculating the maximum correlation of the two blobs 1124A and 1124B, the system actually treats the blobs as a spatial random process, as the system does not know any information of the mobile object(s) corresponding to the blobs 1124A and 1124B. A numerical calculation of correlation is thus used in this embodiment for determining the maximum correlation. In this embodiment, images 1122A and 1122B are binary images, and the blob correlation is calculated using data of these binary images. Alternatively, images 1122A and 1122B may be color images, and the system may calculate the blob correlation using data of each color channel of the images 1122A and 1122B (each color channel thus being considered an independent random field).

In another embodiment, the system may correlate derived attributes of the blobs, e.g., feature points. In particular, the system first uses the well-known Lucas-Kanade method to establish the association of the feature points, and then establishes the object correlation from frame to frame.
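For illustration only, a minimal Python/OpenCV sketch of associating feature points between consecutive frames with the pyramidal Lucas-Kanade tracker, and then estimating the blob's frame-to-frame displacement as the median point motion, is given below; the corner-detection parameters are illustrative assumptions, not the embodiment's values.

# Minimal sketch: Lucas-Kanade feature point association between two frames.
import cv2
import numpy as np

def blob_displacement(prev_gray, next_gray, blob_mask):
    """prev_gray/next_gray: 8-bit grayscale frames; blob_mask: 8-bit mask of the blob.
    Returns the median (dx, dy) motion of tracked feature points, or None."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01,
                                  minDistance=5, mask=blob_mask)
    if pts is None:
        return None
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1                     # keep only successfully tracked points
    if not good.any():
        return None
    flow = (new_pts[good] - pts[good]).reshape(-1, 2)
    return np.median(flow, axis=0)                 # robust frame-to-frame displacement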
The above-described methods are somewhat heuristic, guided by the notion of correlation of random signals but after modification and selection of the signal (i.e., blob content) in heuristic ways. Each of the methods has its own limitations, and a system designer selects a method suitable for meeting the design goals.
The above-described likelihood overlap integral method as illustrated in Figs. 39A to 39C has an implied assumption that the blob is time invariant, or at least changes slowly with time. While this assumption is generally practical, in some situations where the blob is finely textured, the changes in the blob can be large in every frame interval, and the method may fail. For example, if the object is a human with finely pitched checkered clothing, then a direct correlation over the typical 33 ms (milliseconds) frame interval will result in a relatively small overlap integral. A solution is for the system to pre-process the textured blob with a low pass spatial filter, or even convert it to binary with morphological steps, such that the overlap integral is more invariant. However, as the system does not know ahead of time what object texture or persistence the blob has, there is a trade-off in blob preprocessing before establishing the correlation or overlap integral.
While difficulties and drawbacks exist, a system designer can still choose a suitable method such that some correlation can be determined over some vector of object attributes. The outcome of the correlation provides a quantitative measure of the association, but also provides a measure of how the attributes change from one frame to the next. An obvious example in correlating the binary image is the basic incremental displacement of the blob centroid. If color channels are used, then the system can additionally track the hue of the object color, which varies as the lighting changes with time. The change in displacement is directly useful. After obtaining, together with the correlation, a measurement of how much the mobile object has moved, the system can then determine how reliable the measurement is, and use this measurement with the numerical correlation to determine a measurement of the association likelihood.
If the camera view processing submodule does not have any knowledge of the blob motion from frame to frame, an appropriate motion model may simply be a first order Markov process. Then, blobs that have small displacements between frames would have a higher likelihood factor, and whether the blob completely changes direction from frame to frame is irrelevant. On the other hand, if initial conditions indicate that the mobile object is a human walking steadily perpendicular to the axis of the imaging device, then the system can exploit incremental displacement in a specific direction. Moreover, if the mobile object velocity is limited and will not vary instantaneously, a second order Markov model can be used, which tracks the mobile object velocity as a state variable. Such a second order Markov model is useful in blob tracking through regions in which the blob is corrupted by, e.g., background clutter. A Kalman filter may be used in this situation.
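For illustration only, a minimal Python sketch of a constant-velocity Kalman filter that tracks blob position and velocity (a second order Markov model) from frame-to-frame BBTP measurements is given below; the noise levels and frame interval are illustrative assumptions.

# Minimal sketch: constant-velocity Kalman filter for the blob BBTP.
import numpy as np

class BlobKalman:
    def __init__(self, dt=0.033, process_noise=1.0, meas_noise=4.0):
        self.x = np.zeros(4)                               # state: [px, py, vx, vy]
        self.P = np.eye(4) * 100.0                         # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt   # constant-velocity dynamics
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * process_noise                 # acceleration-driven process noise
        self.R = np.eye(2) * meas_noise                    # BBTP measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                  # predicted BBTP for the next frame

    def update(self, bbtp_xy):
        z = np.asarray(bbtp_xy, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[2:]                                  # current velocity estimate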
The birds-eye view processing (described later) benefits from the blob velocity estimate. The system passes the BBTP and the estimate of velocity from the camera view to the birds-eye view.
The system resolves potential ambiguity of blobs to obtain the most likely BV object trajectory in the birds-eye view. The system considers the initial conditions as having high reliability. Consequently, in an image frame such as the image frame 1130 of Fig. 40, potential ambiguity can be readily resolved as each car 1132, 1134 has its own trajectory. More particularly, ambiguity is resolved based on the Euclidean distance of the differential displacement and, if needed, based on the tracking of the car velocities as the car trajectories are smooth.
18 tracking of the car velocities as the car trajectories are smooth.
19 A
problem in using the likelihood overlap integral method that the
system has to deal with is that some attributes, e.g., size, orientation and
color mix,
21 of blobs
in consecutive frames may not be constant, causing the overlap or
22
correlation integral to degrade. The system deals with this problem by
allowing
151

CA 02934102 2016-06-22
1 these attributes to change within a predefined or adaptively determined
range to
2 tolerate correlation integral degradation.
3 In some embodiments, tolerating correlation integral degradation
is
4 acceptable if the variation of the blob attributes is small. In some
alternative
embodiments, the system correlates the binary images of the blobs that have
been
6 treated with a sequence of morphological operations to minimize the
variation
7 caused by changes in blob attributes.
Other methods are also readily available. For example, in some embodiments, the system does not use background differencing for extracting foreground blobs. Rather, the system purposely blurs the captured images and then uses optical flow technology to obtain the blob flow relative to the background. Optical flow technology, in particular, works well for the interior of a foreground blob that is not modulated by the variation of the clutter in the background. In some alternative embodiments, feature point tracking is used for tracking objects with determined feature points.
The above-described methods, including the likelihood overlap integral method (calculating blob correlation), optical flow and feature point tracking, allow the system to estimate the displacement increment over one image frame interval. In practical use, mobile objects generally move slowly, and the imaging devices have a sufficiently high frame rate. Therefore, a smaller displacement increment in calculating blob correlations gives rise to higher reliability in resolving ambiguity. Moreover, the system in some embodiments can infer a measurement of the blob velocity, and track the blob velocity as a state variable of a higher order Markov process of random walk, driven by white (i.e., Gaussian) acceleration components. For example, a Kalman filter can be used for tracking the blob velocity, as most mobile objects inevitably have some inertia and thus the displacement increments are correlated from frame to frame. Such a statistical model based estimation method is also useful in tracking mobile objects that are temporarily occluded and cause no camera view observation.

Generally, blob tracking may be significantly simplified if some information of the mobile object being tracked can be omitted. One of the simplest blob tracking methods, with the most mobile object information omitted, is the method of tracking blobs using binary differenced, morphologically processed images. If more details of the mobile objects are desired, more or all attributes of the mobile objects and their corresponding blobs have to be retained and used with deliberate modelling.
VIII-5. Interrupted blob trajectories
Mobile objects may be occluded by obstructions in a subarea, causing fragments of the trajectory of the corresponding blob. Figs. 41A and 41B show an example. As shown, a room 1142 is equipped with an imaging device 104, and has an obstruction 1150 in the FOV of the imaging device 104. A mobile object 112 is moving in the room 1142 from entrance 1144A towards entrance 1144B along a path 1148. A portion of the path 1148 is occluded by the obstruction 1150.

With the initial conditions of the mobile object 112 at the entrance 1144A, the system tracks the object's trajectory (coinciding with the path 1148) until the mobile object is occluded by the obstruction 1150, at which moment the blob corresponding to the mobile object 112 disappears from the images captured by the imaging device 104, and the mobile object tracking is interrupted.
When the mobile object 112 comes out from behind the obstruction 1150, and re-appears in the captured images, the mobile object tracking is resumed. As a consequence, the system records two trajectory segments in the blob-track file.

The system then maps the two trajectory segments into the birds-eye view, and uses a statistical model based estimation and, if needed, tag observations to determine whether the two trajectory segments shall be connected. As the obstruction is clearly defined in the site map, processing the two trajectory segments in the birds-eye view is easier and more straightforward. As shown in Fig. 41B, the two trajectory segments or blob tracks are stored in the blob-track file as a graph of events and edges.
Fig. 42 is the timeline history diagram of Fig. 41A, showing how the two trajectory segments are connected. As shown, when blob 1 (the blob observed before the mobile object 112 is occluded by the obstruction 1150) is annihilated and blob 2 (the blob observed after the mobile object 112 came out from behind the obstruction 1150) is created, the system determines whether or not blobs 1 and 2 shall be associated by calculating an expected region of blob re-emergence, and checking if blob 2 appears in the expected region. If blob 2 appears in the expected region, the system then associates blobs 1 and 2, and connects the two trajectory segments.
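For illustration only, a minimal Python sketch of such an expected re-emergence check is given below: the region is predicted from the last BBTP, the velocity estimate and the occlusion duration, with a radius that grows over time. The circular-region model and its parameters are illustrative assumptions.

# Minimal sketch: should a newly created blob be connected to the annihilated one?
import numpy as np

def should_connect(last_bbtp, velocity, t_occluded_s, new_bbtp,
                   pos_sigma_m=0.5, speed_sigma_mps=0.3):
    """Return True if new_bbtp falls inside the region where the occluded object
    is expected to re-emerge (all coordinates in floor-plane metres)."""
    predicted = np.asarray(last_bbtp, float) + np.asarray(velocity, float) * t_occluded_s
    # Radius grows with occlusion time to reflect the diffusing position uncertainty.
    radius = pos_sigma_m + speed_sigma_mps * t_occluded_s
    return bool(np.linalg.norm(np.asarray(new_bbtp, float) - predicted) <= radius)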
In determining whether or not blobs 1 and 2 shall be associated, the system, if needed, may also request tag device(s) to provide tag observations for resolving ambiguity. For example, Fig. 43 shows an alternative possibility that may give rise to the same camera view observations. The system can correctly decide between Figs. 41A and 43 by using tag observations.
VIII-6. Birds-eye view processing
In the VAILS, a blob in a camera view is mapped into the birds-eye view for establishing the blob/BV object/tag device association. The BBTP is used for mapping the blob into the birds-eye view. However, the uncertainty of the BBTP impacts the mapping.
As described above, the BBTP, or bounding box track point, of a blob is a point in the captured images that the system estimates as the point where the object contacts the floor surface. Due to the errors introduced in the calculation, the calculated BBTP is inaccurate, and the system thus determines an ambiguity region or a probability region associated with the BBTP for describing the PDF of the BBTP location distribution. In the ideal case where the BBTP position has no uncertainty, the ambiguity region is reduced to a point.
Fig. 44 shows an example of a blob 1100 with a BBTP ambiguity region 1162 determined by the system. The ambiguity region 1162 in this embodiment is determined as a polygon in the camera view with a uniformly distributed BBTP position probability therewithin. Therefore, the ambiguity region may be expressed as an array of N vertices.
The vertex array of the ambiguity region is mapped into the birds-eye view floor space using the above-described perspective mapping. As the system only needs to calculate the mapping of the vertices, mapping such a polygonal ambiguity region can be done efficiently, resulting in an N-point polygon in the birds-eye view.
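For illustration only, a minimal Python/OpenCV sketch of mapping the N-vertex ambiguity polygon from camera pixels into birds-eye (floor) coordinates with a 3x3 homography is given below; the homography values and the example polygon are placeholders, since in practice the mapping comes from the camera-to-floor calibration described earlier.

# Minimal sketch: perspective mapping of the ambiguity polygon vertices.
import cv2
import numpy as np

def map_ambiguity_polygon(vertices_px, H_cam_to_floor):
    """vertices_px: (N, 2) polygon vertices in image pixels; returns (N, 2) floor points."""
    pts = np.asarray(vertices_px, dtype=np.float64).reshape(-1, 1, 2)
    floor_pts = cv2.perspectiveTransform(pts, H_cam_to_floor)  # apply the homography to each vertex
    return floor_pts.reshape(-1, 2)

# Example with a placeholder homography and a 4-vertex ambiguity region.
H = np.array([[0.02, 0.0, -3.0], [0.0, 0.03, -5.0], [0.0, 0.001, 1.0]])
poly_px = np.array([[310, 420], [360, 420], [365, 450], [305, 450]], dtype=float)
poly_floor = map_ambiguity_polygon(poly_px, H)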
Figs. 45A and 45B show a BBTP 1172 in the camera view and mapped into the birds-eye view, respectively, wherein the dash-dot line 1174 in Fig. 45B represents the room perimeter.
Figs. 46A and 46B show an example of an ambiguity region of a BBTP identified in the camera view and mapped into the birds-eye view, respectively. In this example, the imaging device is located at the corner of a 3D coordinate system at xW=0 and yW=0 with a height of zW=12 m. The imaging device has an azimuth rotation of azrot=pi/4 and a down tilt angle of downtilt=pi/3. For example, the object monitored by the imaging device could have a height of z0=5 m. The ambiguity mapped into the BV, based on the outline contour of the blob, results from the 3D box object. The slight displacement shown is a result of the single erosion step taken on the blob. One would decompose/analyze the blob to get a smaller BBTP polygon uncertainty region.
The PDF of the BBTP location is used for the Bayesian update. In this embodiment, the PDF of the BBTP location is uniformly distributed within the ambiguity region, and is zero (0) outside the ambiguity region. Alternatively, the PDF of the BBTP location may be defined as Gaussian or another suitable distribution to take into account random factors such as the camera orientation, lens distortion and other random factors. These random factors may also be mapped into the birds-eye view as a Gaussian process by determining the mean and covariance matrix thereof.
In this embodiment, the VAILS uses a statistical model based estimation method to track the BBTP of a BV object. The statistical model based estimation, such as the Bayesian estimation, used in this embodiment is similar to that described above. The Bayesian object prediction is a prediction of the movement of the BBTP of a BV object for the next frame time (i.e., the time instant at which the next image frame is to be captured) based on information of the current and historical image frames as well as available tag observations. The Bayesian object prediction works well even if nothing is known regarding the motion of the mobile object (except the positions of the blob in captured images). However, if a measurement of the object's velocity is available, the Bayesian object prediction may use the object's velocity in predicting the movement of the BBTP of a BV object. The object's velocity may be estimated by the blob Kalman filter tracking of the velocity state variable, based on the optical flow and feature point motion of the camera view bounding box. Other mobile object attributes, such as inertia, maximum speed, object behavior (e.g., a child likely behaving differently than an attendant pushing someone in a wheelchair), and the like, may also be used. As described above, after object prediction, the blob/BV object/tag device association is established, and the prediction result is fed back to the computer vision process. The details of the birds-eye view Bayesian processing are described later.
VIII-7. Updating posterior probability of object location
Updating the posterior probability of object location is based on the blob track table in the computer cloud 108, and is conducted after the blob/BV object/tag device association is established. The posterior object location PDF is obtained by multiplying the current object location PDF by the blurred polygon camera view observation PDF. Other observations such as tag observations and RSS measurements may also be used for updating the posterior probability of object location.
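For illustration only, a minimal Python sketch of this posterior update on a floor-plane probability grid is given below: the prior object-location PDF is multiplied cell by cell by the observation PDF and renormalized. The grid resolution and the uniform-polygon observation used in the example are illustrative assumptions.

# Minimal sketch: Bayesian posterior update of the object location PDF on a grid.
import numpy as np

def update_posterior(prior, observation_pdf):
    """prior, observation_pdf: 2-D arrays over the same floor grid."""
    posterior = prior * observation_pdf          # Bayes rule up to normalization
    total = posterior.sum()
    if total <= 0:                               # observation contradicts the prior entirely
        return prior                             # keep the prior rather than divide by zero
    return posterior / total

# Example: uniform polygon observation (1 inside the mapped ambiguity region,
# 0 outside) combined with a diffuse prior.
prior = np.full((50, 50), 1.0 / 2500)
obs = np.zeros((50, 50)); obs[20:30, 15:25] = 1.0
post = update_posterior(prior, obs)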
VIII-8. Association table update
The blob/BV object/tag device association is important to mobile object tracking. An established blob/BV object/tag device association is the association of a tagged mobile object with a set of blobs through the timeline or history. Based on such an association, the approximate BV object location can be estimated based on the mean of the posterior PDF. The system records the sequential activities of the tagged mobile object, e.g., "entered door X of room Y at time T1, walked through the central part of the room and left at time T2 through entrance Z". Established blob/BV object/tag device associations are stored in an association table. The update of the association table and the Bayesian object prediction update are conducted in parallel and are co-dependent. In one alternative embodiment, the system may establish multiple blob/BV object/tag device associations as candidate associations for a mobile object, track the candidate associations, and eventually select the most likely one as the true blob/BV object/tag device association for the mobile object.
VIII-9. DBN update
The VAILS in this embodiment uses a dynamic Bayesian network (DBN) for calculating and predicting the locations of BV objects. Initially, the camera view processing submodule operates independently to generate a blob-track file. The DBN then starts with this blob-track file, transforms the blobs therein into BV objects and tracks the trajectory probability. The blob-track file contains the sequence of likelihood metrics based on the blob correlation coefficient.

As described before, each blob/BV object/tag device association is associated with an association probability. If the association probability is smaller than a predefined threshold, object tracking is interrupted. To prevent object tracking interruption due to a temporarily lowered association probability, a state machine with suitable intermediate states may be used to allow an association probability to drop below the predefined threshold for a short period of time, e.g., for several frames, before increasing back above the predefined threshold.
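For illustration only, a minimal Python sketch of such a state machine with an intermediate state is given below, so that the association probability may dip below the threshold for a few frames before tracking is declared interrupted; the state names, threshold and frame budget are illustrative assumptions.

# Minimal sketch: tracking state machine tolerating a short probability dip.
def next_state(state, prob, frames_below, threshold=0.6, max_frames_below=5):
    """Returns (new_state, new_frames_below)."""
    if prob >= threshold:
        return "TRACKING", 0                       # association is reliable again
    if state == "TRACKING":
        return "TENTATIVE", 1                      # first frame below the threshold
    if state == "TENTATIVE" and frames_below < max_frames_below:
        return "TENTATIVE", frames_below + 1       # tolerate a short dip
    return "INTERRUPTED", frames_below             # dip lasted too long; stop tracking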
Fig. 47 shows a simulation configuration having an imaging device 104 and an obstruction 1202 in the FOV of the imaging device 104. A mobile object moves along the path 1204. Fig. 48 shows the results of the DBN prediction.

Tracking of a first mobile object may be interrupted when the first mobile object is occluded by an obstruction in the FOV. During the occlusion period, the probability diffuses outward. Mobile object tracking may be resumed after the first mobile object comes out from behind the obstruction and re-appears in the FOV.
However, if there is an interfering source such as a second mobile object also emerging from a possible location where the first mobile object may re-appear, the tracking of the first mobile object may be mistakenly resumed as tracking of the second mobile object. Such a problem is due to the fact that, during occlusion, the probability flow essentially stops and then diffuses outward, becoming weak when tracking is resumed. Fig. 49 shows the prediction likelihood over time in tracking the mobile object of Fig. 47. As shown, the prediction likelihood drops to zero during occlusion, and only restores to a low level after tracking is resumed.
If velocity feedback is available, it may be used to improve the prediction. Fig. 50 shows the results of the DBN prediction in tracking the mobile object of Fig. 47. The prediction likelihood is shown in Fig. 51, wherein the circles indicate that camera view observations are made, i.e., images are captured, at the corresponding time instants. As can be seen, after using velocity feedback in the DBN prediction, the likelihood after resuming tracking only exhibits a small drop. On the other hand, if the prediction likelihood after resuming tracking drops significantly below a predefined threshold, a new tracking is started.

Figs. 52A to 52C show another example of a simulation configuration, the simulated prediction likelihood without velocity feedback, and the simulated prediction likelihood with velocity feedback, respectively.
To determine whether it is the same object when the blob re-emerges, or a different object, the system calculates the probability of the following two possibilities:

A (assuming the same object): considering the drop in association likelihood, and considering querying the tag device to determine if a common tag device corresponds to both blobs.

B (assuming different objects): what is the likelihood that a new object can be spontaneously generated at the start location of the trajectory after the tracking is resumed? What is the likelihood that the original object vanished?

The blob-track table stores multiple tracks, and the DBN selects the most likely one.
Fig. 53A shows a simulation configuration for simulating the tracking of a first mobile object (not shown) with an interference object 1212 near the trajectory 1214 of the first mobile object, and an obstruction 1216 between the imaging device 104 and the trajectory 1214. The camera view processing submodule produces a bounding box around each of the first, moving object and the stationary interference object 1212, and the likelihoods of the two bounding boxes are processed.

The obstruction 1216 limits the camera view measurements, and the nearby stationary interference 1212 appears attractive because the belief will have spread out by the time the obstruction ends. The likelihood is calculated based on the overlap integration shown in Fig. 42. The calculated likelihood is shown in Fig. 53B. At first the likelihood of the first object builds up quickly, but it then starts dropping as the camera view measurements stop due to the obstruction. However, the velocity is known and therefore the likelihood of the first object does not decay rapidly. The camera view observations then resume after the obstruction, and the likelihood of the first object jumps back up.

Figs. 54A and 54B show another simulation example.

VIII-10. Network arbitrator
Consider the simple scenario of Fig. 25. The initial conditions originate from the network arbitrator, which evaluates the most likely trajectory of the mobile object 112A as it goes through the site consisting of multiple imaging devices 104B, 104A and 104C. The network arbitrator attempts to output the most likely trajectory of the mobile object from the time the mobile object enters the site to the time the mobile object exits the site, which may last for hours. The mobile object moves from the FOV of one imaging device to that of the next. As the mobile object enters the FOV of an imaging device, the network arbitrator collects initial conditions relevant to the CV/BV processing module and sends the collected initial conditions thereto. The CV/BV processing module is then responsible for object tracking. When the mobile object leaves the FOV of the current imaging device, the network arbitrator again collects relevant initial conditions for the next imaging device and sends them to the CV/BV processing module. This procedure repeats until the mobile object eventually leaves the site.
In the simple scenario of Fig. 25, the object trajectory is simple and unambiguous such that the object's tag device does not have to be queried. However, if an ambiguity arises regarding the trajectory or regarding the blob/BV object/tag device association, then the tag device will be queried. In other words, if the object trajectory seems dubious or confused with another tag device, the network arbitrator issues requests for tag observations to resolve the ambiguity. The network arbitrator has the objective of minimizing the energy consumed by the tag device, subject to the constraint of an acceptable likelihood of the overall estimated object trajectory.

The network arbitrator determines the likely trajectory based on a conditional Bayesian probability graph, which may have high computational complexity.
Fig. 55 shows the initial condition flow and the output of the network arbitrator. As shown, the initial conditions come from the network arbitrator and are used in the camera view to acquire and track the incoming mobile object as a blob. The blob trajectory is stored in the blob-track file and is passed to the birds-eye view. The birds-eye view performs a perspective transformation of the blob track and does a sanity check on the mapped object trajectory to ensure that all constraints are satisfied. Such constraints include, e.g., that the trajectory cannot pass through building walls or pillars, cannot propagate at enormous velocities, and the like. If constraints are violated, the birds-eye view will distort the trajectory as required, which is conducted as a constrained optimization of likelihood. Once the birds-eye view constraints are satisfied, the birds-eye view reports to the network arbitrator, and the network arbitrator incorporates the trajectory into the higher-level site trajectory likelihood.
The network arbitrator is robust in handling errors so as to avoid failures, such as the prediction having no agreement with the camera view or with the tag observations; camera view observations and/or tag observations stopping for various reasons; a blob being misconstrued as a different object and the misconstruction being propagated into another subarea of the site; invalid tag observations; and the like.

The network arbitrator resolves ambiguities. Fig. 56 shows an example, wherein the imaging device reports that a mobile object exits from an entrance on the right-hand side of the room. However, there are two entrances on the right-hand side, and ambiguity arises in that it is uncertain which of the two entrances the mobile object takes to exit from the room.

The CV/BV processing module reports both possible room-leaving paths to the network arbitrator. The network arbitrator processes both paths using camera view and tag observations until the likelihood of one of the paths attains a negligibly low probability, and that path is excluded.
Fig. 57 shows another example, wherein the network arbitrator may delay the choice among candidate routes (e.g., when the mobile object leaves the left-hand side room) if the likelihoods of several candidate routes are still high, and make a choice when one candidate route exhibits a sufficiently high likelihood. In Fig. 57, the upper route is eventually selected.

Those skilled in the art appreciate that many graph-theory algorithms, such as the Viterbi algorithm, are readily available for selecting the most likely route from a plurality of candidate routes.
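As an illustration only, a Viterbi selection over two candidate routes might be sketched in MATLAB as follows; the transition probabilities, observation model and observation sequence are assumed values and are not taken from the embodiment.

% Sketch: select the most likely route hypothesis with the Viterbi algorithm.
% States are candidate routes; observations are quantized camera/tag evidence.
A = [0.9 0.1; 0.1 0.9];            % assumed route transition probabilities
B = [0.7 0.3; 0.2 0.8];            % assumed P(observation | route)
obs = [1 1 2 2 2];                 % assumed observation sequence (column indices of B)
prior = [0.5 0.5];

nStates = size(A, 1); T = numel(obs);
delta = zeros(nStates, T); psi = zeros(nStates, T);
delta(:,1) = prior(:) .* B(:, obs(1));
for t = 2:T
    for s = 1:nStates
        [delta(s,t), psi(s,t)] = max(delta(:,t-1) .* A(:,s));
        delta(s,t) = delta(s,t) * B(s, obs(t));
    end
end
[~, route] = max(delta(:,T));      % back-track the most likely route sequence
path = zeros(1, T); path(T) = route;
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);
end
disp(path)                         % most likely route index at each time step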
If a tag device reports RSS measurements of a new set of WiFi access point transmissions, then a new approximate location can be determined, and the network arbitrator may request the CV/BV processing module to look for a corresponding blob among the detected blobs in the subarea of the WiFi access point.

VIII-11. Tag device
Tag devices are designed to reduce power consumption. For example, if a tag device is stationary for a predefined period of time, the tag device automatically shuts down, with only a timing clock and the accelerometer remaining in operation. When the accelerometer senses sustained motion, i.e., not merely a single impulse disturbance, the tag device is automatically turned on and establishes communication with the network arbitrator. The network arbitrator may use the last known location of the tag device as its current location, and later updates its location with incoming information, e.g., new camera view observations, new tag observations and location prediction.
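As an illustration only, the motion-gated power logic described above might be sketched as follows; the field names, thresholds and timings are assumptions.

function tag = tag_power_step(tag, accel_mag, dt)
% Sketch: shut most of the tag down after a stationary period; wake only on
% sustained (not single-impulse) acceleration. All values are assumptions.
    MOTION_THRESH = 0.3;    % assumed accelerometer magnitude threshold (g)
    STILL_TIMEOUT = 120;    % assumed stationary period before shutdown (s)
    SUSTAIN_TIME  = 1.0;    % assumed duration of sustained motion to wake (s)
    moving = accel_mag > MOTION_THRESH;
    if tag.awake
        if moving
            tag.still_time = 0;
        else
            tag.still_time = tag.still_time + dt;
            if tag.still_time > STILL_TIMEOUT
                tag.awake = false;              % only clock + accelerometer stay on
                tag.motion_time = 0;
            end
        end
    else
        if moving
            tag.motion_time = tag.motion_time + dt;
            if tag.motion_time > SUSTAIN_TIME   % sustained motion, not one impulse
                tag.awake = true;               % re-establish contact with the arbitrator
                tag.still_time = 0;
            end
        else
            tag.motion_time = 0;
        end
    end
end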
With suitable sensors therein, tag devices may obtain a variety of observations. For example:

• RSS of wireless signals: the tag device can measure the RSS of one or more wireless signals, indicate whether the RSS measurements are increasing or decreasing, and determine the short-term variation thereof;

• walking step rate: which can be measured and compared directly with the bounding box in the camera view;

• magnetic abnormalities: the tag device may comprise a magnetometer for detecting a magnetic field with a magnitude, e.g., significantly above 40 µT;

• temperature: measuring temperature provides additional inferences; for example, if the measured temperature is below a first predefined threshold, e.g., 37°C, then the tag device is away from the human body, and if the measured temperature is about 37°C, then the tag device is on the human body. Moreover, if the measured temperature is below a second predefined threshold, e.g., 20°C, it may indicate that the associated mobile object is outdoors (a short classification sketch follows this list); and

• other measurements, e.g., the rms sound level.
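As an illustration only, the temperature-based inference above might be coded as follows; the threshold values follow the examples in the text (37°C and 20°C), while the function name, the one-degree band around body temperature and the labels are assumptions.

function label = infer_from_temperature(tempC)
% Sketch: coarse classification from a tag temperature reading.
    BODY_TEMP_C    = 37;   % first predefined threshold (about body temperature)
    OUTDOOR_TEMP_C = 20;   % second predefined threshold
    if abs(tempC - BODY_TEMP_C) < 1
        label = 'on human body';
    elseif tempC < OUTDOOR_TEMP_C
        label = 'away from body, likely outdoors';
    else
        label = 'away from human body';
    end
end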
Fig. 58B shows the initial condition flow and the output of the network arbitrator in the mobile object tracking example of Fig. 58A, in which a single mobile object moves across a room. The network arbitrator provides the birds-eye view with a set of initial conditions of the mobile object entering the current subarea. The birds-eye view maps the initial conditions into the location at which the new blob is expected. After a few image frames the camera view confirms to the birds-eye view that it has detected the blob, and the blob-track file is initiated. The birds-eye view tracks the blob and updates the object-track file. The network arbitrator has access to the object-track file and can provide an estimate of the tagged object at any time. When the blob finally vanishes at an exit point, this event is logged in the blob-track file and the birds-eye view computes the end of the object track. The network arbitrator then assembles initial conditions for the next subarea. In this simple example, there is no query to the tag device as the identity of the blob was never in question.
A tagged object may be occluded by an untagged object. Fig. 59 shows an example; the initial condition flow and the output of the network arbitrator are the same as in Fig. 58B. In this example, the initial conditions are such that the tagged object is known when it walks through the left-hand side entrance, and the untagged object is also approximately tracked. As the tracking progresses, the tagged object occasionally becomes occluded by the untagged object. The camera view will give multiple tracks for the tagged object. The untagged object is continuously trackable with feature points and optical flow; that is, the blob fusion and fission events are sortable for the untagged object. In the birds-eye view, the computation from the blob-track file to the object-track file will request a sample of activity from the tag through the network arbitrator. In this scenario the tag will reveal continuous walking activity, which, combined with the prior existence of only one tagged and one untagged object, forces the association of the segmented tracks of the object-track file with high probability. When the tagged object leaves the current subarea, the network arbitrator assembles initial conditions for the next subarea.

In this example, for additional confirmation, the tag device can be asked whether it is undergoing a rotation motion. The camera view senses that the untagged object has gone through about 400 degrees of turning while the tagged object has accumulated only about 45 degrees. However, as the rate gyros require significantly more power than other sensors, such a request will not be sent to the tag device if the ambiguity can be resolved using other tag observations, e.g., observations from the accelerometer.
Fig. 60 shows the relationship between the camera view processing submodule, the birds-eye view processing submodule, and the network arbitrator/tag devices.

VIII-12. Birds-eye view (BV) Bayesian processing
In the following, the Bayesian update of the BV is described. The Bayesian update is basically a two-step process: the first step is a prediction of the object movement for the next frame time, followed by an update based on a general measurement. The prediction would be basic diffusion if nothing were known of the motion of the object. However, if an estimate of the blob velocity is available, and the association of the blob and the object is assured, then the estimate of the blob velocity is used. This velocity estimate is obtained from the blob Kalman filter tracking of the velocity state variable, based on the optical flow and feature point motion of the camera view bounding box, with known information of the mobile object.
(i) Diffuse prediction probability based on arbitrary building wall constraints

In this embodiment, the site map has constraints of walls with predefined wall lengths and directions. Fig. 61 shows a 3D simulation of a room 1400 having an indentation 1402 representing a portion of the room that is inaccessible to any mobile objects. The room is partitioned into a plurality of grid points.
The iteration update steps are as follows (a code sketch of one iteration is given after the list):

S1. Let the input PDF be P0. The Gaussian smearing or diffusion is then applied by a 2D convolution, resulting in P1. P1 represents the increase in the uncertainty of the object position based on the underlying random motion.

S2. The Gaussian kernel has a half width of l_f, such that P1 is larger than P0 by a border of that width. The system considers the walls to be reflecting walls, such that the probability content in these borders is swept inside the walls of P0.

S3. In the inaccessible region, the probability content of each grid point is set to that of the closest (in terms of Euclidean distance) wall grid point. The correspondence between the inaccessible grid points and the closest wall points is determined as part of the initialization process of the system, and thus is only done once. To save calculations in each iteration, every inaccessible grid point is pre-defined with a correction, forming an array of corrections. The structure of this array is

[correction index, i_source, j_source, i_sink, j_sink].

S4. Finally, the probability density is normalized such that it has an integrated value of one (1). This is necessary as the corner fringe regions are not swept and hence there is a loss of probability.
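As an illustration only (and not the code of Figs. 63A and 63B), one iteration of steps S1 to S4 might be sketched in MATLAB as follows; the grid, the kernel width, the way border probability is swept back at the outer walls and the nearest-cell handling of the inaccessible region are simplifying assumptions.

function P = diffuse_step(P, accessible, sigma, lf)
% Sketch of one diffusion iteration: Gaussian smearing, reflecting outer walls,
% inaccessible-region sweep, and renormalization. "accessible" is a logical
% grid (true = accessible). Requires the Image Processing Toolbox for bwdist.
    % S1: Gaussian smearing by 2D convolution
    [gx, gy] = meshgrid(-lf:lf, -lf:lf);
    K = exp(-(gx.^2 + gy.^2) / (2*sigma^2));
    K = K / sum(K(:));
    P1 = conv2(P, K, 'full');                  % grows by a border of width lf

    % S2: reflecting walls -- sweep the border content back inside
    P = P1(lf+1:end-lf, lf+1:end-lf);
    P(1,:)   = P(1,:)   + sum(P1(1:lf, lf+1:end-lf), 1);
    P(end,:) = P(end,:) + sum(P1(end-lf+1:end, lf+1:end-lf), 1);
    P(:,1)   = P(:,1)   + sum(P1(lf+1:end-lf, 1:lf), 2);
    P(:,end) = P(:,end) + sum(P1(lf+1:end-lf, end-lf+1:end), 2);

    % S3: move probability in inaccessible cells to the nearest accessible cell
    % (one plausible reading of the source/sink correction array)
    [~, idx] = bwdist(accessible);
    inac = find(~accessible);
    for k = inac(:)'
        P(idx(k)) = P(idx(k)) + P(k);
        P(k) = 0;
    end

    % S4: renormalize (the corner fringes of P1 were not swept)
    P = P / sum(P(:));
end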
The probability after a sufficient number of iterations to approximate a steady state is shown in Fig. 62 for the room example of Fig. 61. In this example, the process starts with a uniform density throughout the accessible portion of the room, implying no knowledge of where the mobile object is. Note that the probability is higher in the vicinity of the walls, as the probability impinging on the walls is swept back to the wall position. On the other hand, the probability in the interior is smaller but non-zero, and appears fairly uniform. Of course, this result is a product of the heuristic assumption of appropriating the probability mass that penetrates into inaccessible regions. In practice, when measurements are applied, the probability ridge at the wall contour becomes insignificant.
Figs. 63A and 63B show a portion of the MATLAB code used in the simulation.
(ii) Update based on a general measurement

Below, following standard notation, x is used as the general state variable and z is used as a generic measurement related to the state variable. Bayes' rule is then applied as

p(x|z) = \frac{p(z|x)\, p(x)}{p(z)} = \eta\, p(z|x)\, p(x),    (31)

where p(x) can be taken as the pdf prior to the measurement of z, and p(x|z) is conditioned on the measurement. Note then that p(z|x) is the probability of the measurement given x. In other words, given the location x, p(z|x) is the likelihood of receiving a measurement z. Note that z is not a variable; rather, it is a given measurement. Hence, as z is somewhat random in every iteration, then so is p(z|x), which can be a source of confusion.
Putting this into the evolving notation, the calculation of the pdf after the first measurement can be expressed as

p^{1}_{j,i} = \eta\, p_{z1,j,i}\, p^{u0}_{j,i}.    (32)

Here, p_{z1,j,i} is the probability or likelihood of the observation z given that the object is located at the grid point {jΔg, iΔg}. The prior probability p^{0}_{j,i} is initially modified based on the grid transition to generate the predicted pdf p^{u0}_{j,i}. This is subsequently updated with the observation likelihood p_{z1,j,i}, resulting in the posterior probability p^{1}_{j,i} for the first update cycle. η is the universal normalization constant that is implied to normalize the pdf such that it always sums to 1 over the entire grid.
Consider the simplest example of an initial uniform PDF, such that p^{0}_{j,i} is constant and positive in the feasibility region, with the probability in the inaccessible regions set to 0. Furthermore, assume that the object is known to be completely static, such that there is no diffusion of probability, or equivalently the Gaussian kernel of the transition probability is a delta function. We can then solve for the location pdf as

p^{u,t}_{j,i} = p^{t-1}_{j,i},    (33)

p^{t}_{j,i} = \eta \prod_{k=1}^{t} p_{zk,j,i}\; p^{0}_{j,i}.    (34)
Finally, assume that the observation likelihood is constant with respect to time, such that p_{zk,j,i} = p_{z1,j,i}. This implies that the same observation is made at each iteration but with different noise or uncertainty. For large t, the probability p^{t}_{j,i} will converge to a single delta function at the grid point where p_{z1,j,i} is maximum (provided that p^{0}_{j,i} is not zero at that point). Also implicitly assumed is that the measurements are statistically independent. Note that p^{0}_{j,i} can actually be anything, provided that it has a finite value at the grid point where p_{z1,j,i} is maximum.
Next, consider the case where the update kernel has a finite deviation, which implies that there will be some diffusing of the location probability after each iteration. The measurement will reverse the diffusion. Hence we have two opposing processes, like the analogy of a sand pile where one process spreads the pile (the update probability kernel) and another builds up the pile (the observations). Eventually a steady-state equilibrium results that is tantamount to the uncertainty of the location of the object.
As an example, consider a camera view observation, which is described as a Gaussian-shaped likelihood kernel (a PDF) and may be the BBTP estimate from the camera view. The Gaussian-shaped likelihood kernel may be a simple 2D Gaussian kernel shape represented by a mean and a deviation. Fig. 64 shows a portion of the MATLAB® code for generating such a PDF. Figs. 65A to 65C show the plots of p^{0}_{j,i} (the initial probability, subject to the site map wall regions), p_{zk,j,i} (the measurement probability kernel, which has a constant shape at every iteration but a "random" offset equivalent to the actual measurement z, and is the variable D in the MATLAB® code of Fig. 64), and p^{k}_{j,i} (the probability after the measurement likelihood has been applied).
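As an illustration only (Fig. 64 itself is not reproduced here), a 2D Gaussian measurement likelihood kernel on the grid, and the corresponding update of Equation (32), might be sketched as follows; the grid dimensions, measurement value and deviation are assumptions.

% Sketch: 2D Gaussian measurement likelihood and Bayesian update on the grid.
Nx = 80; Ny = 60; dl = 0.1;                 % assumed grid size and spacing (m)
[X, Y] = meshgrid((0:Nx-1)*dl, (0:Ny-1)*dl);
z   = [4.1, 2.7];                           % assumed measurement (e.g., mapped BBTP), m
sig = 0.4;                                  % assumed measurement deviation, m

% Measurement likelihood kernel: constant shape, offset by the measurement z
Pz = exp(-((X - z(1)).^2 + (Y - z(2)).^2) / (2*sig^2));

% Update of Equation (32): posterior proportional to likelihood .* prior
P0 = ones(Ny, Nx) / (Nx*Ny);                % e.g., uniform prior over the grid
P  = Pz .* P0;
P  = P / sum(P(:));                         % renormalize over the grid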
After a few iterations, a steady-state distribution is reached, an example of which is illustrated in Fig. 66. The steady state is essentially a weighting between the kernels of the diffusion and the observation likelihood. Note that in the example of Fig. 66, z is a constant, such that p_{zk,j,i} is always the same. In practical cases, on the other hand, there is no "steady-state" distribution, as z is random.
Consider the above example, where the camera view tracks a blob for which the association of the blob and the mobile object is considered to be uninterrupted. In other words, there are no events causing ambiguity with regard to the one-to-one association between the moving blob and the moving mobile object. If nothing is known regarding the mobile object and the camera view does not track it with a Kalman filter velocity state variable, then the object probability merely diffuses in each prediction or update phase of the Bayesian cycle. This is tantamount to the object undergoing a two-dimensional random walk. The deviation of this random walk model is applied in the birds-eye view, as it directly relates to the physical dimensions. Hence the camera view provides observations of the BBTP of a blob where nothing of the motion is assumed.

In the birds-eye view, the random walk deviation is made large enough that the frame-by-frame excursions of the BBTP are accommodated. Note that if the deviation is made too small then the tracking will become sluggish. Likewise, if the deviation is too large then tracking will merely follow the measurement z, and the birds-eye view will not provide any useful filtering or measurement averaging. Even if the object associated with the blob is unknown, the system is in an indoor environment tracking objects that generally do not exceed human walking agility. Hence practical limits can be placed on the deviation.
A problem occurs when the camera view observations are interrupted by an obstruction of sorts, such as the object propagating behind an opaque wall. There will now be an interruption in the blob tracks, and the birds-eye view then has to consider whether these paths should be connected, i.e., whether they should be associated with the same object. We calculate the pdf without camera view observations, based on probability diffusion, and find that the probability "gets stuck", with the pdf centered at the end point of the first path and an ever-expanding deviation representing the diffusion. The association to the beginning of the second path is then based on a likelihood that initially grows but only reaches a small level. Hence the association is weak and dubious. The camera view cannot directly assist with the association of the two path segments, as it makes no assumptions about the underlying object dynamics. However, the camera view does know the velocity of the blob just prior to the end of path 1, where the camera view observations were lost.
Blob velocity can in principle be determined from the optical flow and the movement of feature points of the blob, resulting in a vector in the image plane. From this, a mean velocity of the BBTP can be inferred by the camera view processing submodule alone. The BBTP resides (approximately) on the floor surface, and we can then map this velocity to the birds-eye view with the same routine that was used for mapping the BBTP uncertainty probability polygon onto the floor space. If the velocity vector is perfectly known, then the diffusion probability is a delta function offset by a displacement vector equal to the velocity vector times the frame update time. Practically, however, the velocity vector will have uncertainty associated with it, and the diffusion probability will include this as a deviation. It is reasonable that the velocity uncertainty grows with time, and therefore so should this deviation. This is of course heuristic, but a bias towards drifting the velocity towards zero is reasonable.
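As an illustration only, the velocity-offset prediction kernel described above (a Gaussian offset by v times the frame time, with a deviation that grows while observations are absent) might be constructed as follows; the velocity, frame time and growth rate are assumed values.

% Sketch: prediction kernel offset by the last known blob velocity.
v      = [0.9, -0.3];     % last known BBTP velocity on the floor plane (m/s)
dt     = 1/15;            % frame update time (s)
t_lost = 0.8;             % time since camera view observations were lost (s)
dl     = 0.1;             % grid spacing (m)

sig0   = 0.15;            % base deviation (m)
growth = 0.25;            % assumed deviation growth rate (m per second unobserved)
sig    = sig0 + growth * t_lost;

half = ceil(3*sig/dl);
[gx, gy] = meshgrid(-half:half, -half:half);
mu = v * dt / dl;         % displacement in grid cells
K  = exp(-((gx - mu(1)).^2 + (gy - mu(2)).^2) / (2*(sig/dl)^2));
K  = K / sum(K(:));
% P_pred = conv2(P, K, 'same');   % prediction step with velocity feedback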

VIII-13. H matrix processing

The following describes the H matrix processing necessary for the perspective transformations between the camera and the world coordinate systems. The meaning of the variables used in this section can be found in the tables of subsection "(vi) Data structures" below.

(i) Definition of rotation angles and translation
Blobs in a captured image may be mapped to a 3D coordinate system using perspective mapping. However, such a 3D coordinate system, denoted a camera coordinate system, is defined from the view of the imaging device or camera that captures the image. As the site may comprise a plurality of imaging devices, there may exist a plurality of camera coordinate systems, each of which may only be useful for the respective subarea of the site.

On the other hand, the site has an overall 3D coordinate system, denoted a world coordinate system, for the site map and for tracking mobile objects therein. Therefore, a mapping may be needed between the world coordinate system and a camera coordinate system.

The world and camera coordinate systems are right-hand systems. Fig. 67A shows the orientation of the world and camera coordinate systems with the translation vector T = [0 0 -h]^T. First, rotate about X_c by (-\pi/2), as in Fig. 67B. The rotation matrix is

R_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}.    (35)

Next, rotate in azimuth about Y_c in the positive direction by az, as in Fig. 67C. The rotation matrix is given as

R_2 = \begin{bmatrix} C & 0 & -S \\ 0 & 1 & 0 \\ S & 0 & C \end{bmatrix},    (36)

where C = cos(az) and S = sin(az). Finally, the down tilt of atilt is applied as shown in Fig. 67D. The rotation is given by

R_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & C & S \\ 0 & -S & C \end{bmatrix},    (37)

where C = cos(atilt) and S = sin(atilt). The overall rotation matrix is R = R_3 R_2 R_1, wherein the order of the matrix multiplication is important.

After the translation and rotation, the camera scaling (physical distance to pixels) and the offset in pixels are applied:

x = s \frac{x_c}{z_c} + o_x,    (38)

y = s \frac{y_c}{z_c} + o_y,    (39)

where x and y are the focal image plane coordinates, in terms of pixels.
(ii) Direct generation of the H matrix

The projective mapping matrix is given as H = [R  -RT], with the mapping of a world point to a camera point as

\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = H \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}.    (40)

Note that we still have to apply the offset and the scaling to map into the focal plane pixels.
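As an illustration only, Equations (35) to (40), together with the scaling and offset of Equations (38) and (39), might be exercised as follows; the camera parameters and the world point are assumed values.

% Sketch: build R = R3*R2*R1, form H = [R, -R*T], and project a world point
% on the floor plane to focal-plane pixels. All numeric values are assumptions.
az    = 0.4;  atilt = 0.6;  h = 3.0;        % assumed azimuth, downtilt (rad), height (m)
s     = 800;  ox = 320;  oy = 240;          % assumed scaling and pixel offsets
T     = [0; 0; -h];                         % translation, as in Fig. 67A

R1 = [1 0 0; 0 0 -1; 0 1 0];                                  % Eq. (35)
C = cos(az);    S = sin(az);    R2 = [C 0 -S; 0 1 0; S 0 C];  % Eq. (36)
C = cos(atilt); S = sin(atilt); R3 = [1 0 0; 0 C S; 0 -S C];  % Eq. (37)
R = R3 * R2 * R1;                           % order of multiplication matters

H = [R, -R*T];                              % Eq. (40), maps [xw; yw; zw; 1]

pw = [2.0; 5.0; 0.0; 1.0];                  % assumed world point on the floor plane
pc = H * pw;                                % camera coordinates [xc; yc; zc]
x  = s * pc(1)/pc(3) + ox;                  % Eq. (38)
y  = s * pc(2)/pc(3) + oy;                  % Eq. (39)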
(iii) Determining the H matrix directly from the image frame

Instead of using the angles and the camera height from the floor plane to get R and T and subsequently H, we can compute H directly from an image frame if we have a set of corresponding points on the floor and in the image. These are called control points. This is a very useful procedure, as it allows us to map from the set of control points to H, and then to R and T. To illustrate this, suppose we have a picture that is viewed with the camera, from which we can determine the four vertex points as shown in Figs. 68A and 68B.

We can easily look at the camera frame and pick out the 4 vertex points of the picture unambiguously. Suppose that the vertex points of Pout are given by (-90, -100), (90, -100), (90, 100) and (-90, 100). The corresponding vertex points in the camera image are given as (0.5388, 1.2497), (195.7611, 39.3345), (195.7611, 212.3656) and (0.8387, 251.3501). We can then run a suitable function, e.g., the cp2tform() MATLAB® function, to determine the inverse projective transform. The MATLAB code is shown in Fig. 69.
In Fig. 69, [g1, g2] is the set of input points of the orthographic view, which are the corner vertex points of the image. [x, y] is the set of output points, which are the vertex points of the image picked off the perspective image. These are used to construct the transformation matrix H. H can be used in, e.g., the MATLAB® imtransform() function to "correct" the distorted perspective image of Fig. 68B back to the orthographic view, resulting in Fig. 70.

Note that here we have used 4 vertex points. We may alternatively use more points, and then H will be solved in a least-squares sense.
The algorithm contained in cp2tform() and imtransform() is based on selecting control points that are contained in a common plane in the world reference frame. In the current case, the control points reside on the Z_w = 0 plane. We will use the constraint of

\begin{bmatrix} f_{cx} \\ f_{cy} \\ f_{cz} \end{bmatrix} = \begin{bmatrix} [R_1]_1 & [R_1]_2 & \\ [R_2]_1 & [R_2]_2 & -RT \\ [R_3]_1 & [R_3]_2 & \end{bmatrix} \begin{bmatrix} f_{wx} \\ f_{wy} \\ 1 \end{bmatrix} = H \begin{bmatrix} f_{wx} \\ f_{wy} \\ 1 \end{bmatrix}    (41)

to first determine H and then extract the coefficients of {R, T}. The elements of H are denoted as

H = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{bmatrix}.    (42)

Note that the first two columns of H are the first two columns of R, and the third column of H is -RT. The object then is to determine the 9 components of H from the pin-hole image components. We have

f_x = \frac{f_{cx}}{f_{cz}} = \frac{H_{11} f_{wx} + H_{12} f_{wy} + H_{13}}{H_{31} f_{wx} + H_{32} f_{wy} + H_{33}}, \quad f_y = \frac{f_{cy}}{f_{cz}} = \frac{H_{21} f_{wx} + H_{22} f_{wy} + H_{23}}{H_{31} f_{wx} + H_{32} f_{wy} + H_{33}},    (43)

which is rearranged as

H_{31} f_x f_{wx} + H_{32} f_x f_{wy} + H_{33} f_x = H_{11} f_{wx} + H_{12} f_{wy} + H_{13},
H_{31} f_y f_{wx} + H_{32} f_y f_{wy} + H_{33} f_y = H_{21} f_{wx} + H_{22} f_{wy} + H_{23}.    (44)

This results in a pair of constraints expressed as

u_x b = 0, \quad u_y b = 0,    (45)

where

b = [H_{11}\ H_{12}\ H_{13}\ H_{21}\ H_{22}\ H_{23}\ H_{31}\ H_{32}\ H_{33}]^T,
u_x = [-f_{wx}\ \ -f_{wy}\ \ -1\ \ 0\ \ 0\ \ 0\ \ f_x f_{wx}\ \ f_x f_{wy}\ \ f_x],
u_y = [0\ \ 0\ \ 0\ \ -f_{wx}\ \ -f_{wy}\ \ -1\ \ f_y f_{wx}\ \ f_y f_{wy}\ \ f_y].    (46)
Note that we have a set of 4 points in 2D, giving us 8 constraints but 9 coefficients of H. This is consistent with the solution of the homogeneous equation, given to within a scaling constant, as

\begin{bmatrix} u_{x,1} \\ u_{y,1} \\ \vdots \\ u_{x,4} \\ u_{y,4} \end{bmatrix} b = 0.    (47)

Defining the matrix

U = \begin{bmatrix} u_{x,1} \\ u_{y,1} \\ \vdots \\ u_{x,4} \\ u_{y,4} \end{bmatrix},    (48)

we have U b = 0_8.
As stated above, any arbitrary line in the world reference frame is mapped into a line on the image plane. Hence the four lines of a quadrilateral in the world plane of Z_w = 0 are mapped into a quadrilateral in the image plane. Each quadrilateral is defined uniquely by its four vertices, hence 8 parameters. We have 8 conditions, which is sufficient to evaluate the perspective transformation including any scaling. The extra coefficient in H is due to a constraint that we have not explicitly imposed, out of a desire to minimize complexity. This constraint is that the determinant of R is unity. The mapping in Equation (41) does not include this constraint, and therefore we have two knobs that both result in the same scaling of the image. For example, we can scale R by a factor of 2 and reduce the magnitude of T and leave the scaling of the image unchanged. Including a condition that |R| = 1, or fixing T to a constant magnitude, ruins the linear formulation of Equation (41). Hence we opt for finding the homogeneous solution to Equation (41) to within a scaling factor and then determining the appropriate scaling afterwards.
Using the singular value decomposition (SVD) method, we have

U = x v w^H.    (49)

As U is an 8x9 matrix, the matrix x is an 8x8 matrix of left singular vectors and w is a 9x9 matrix of right singular vectors. If there is no degeneracy in the vertex points of the two quadrilaterals (i.e., no three points are on a line), then the matrix v of singular values will be an 8x9 matrix where the singular values lie along the diagonal of the left 8x8 component of v, with the 9th column all zeros. Now let the 9th column of w be w_0, which is a unit vector orthogonal to the first 8 column vectors of w. Hence we can write

U w_0 = x v w^H w_0 = x v [0\ \cdots\ 0\ 1]^T = x\, 0_{8\times 1} = 0_{8\times 1}.    (50)
Hence w_0 is the desired vector that is the solution of the homogeneous equation to within a scaling factor. That is, b = w_0. The SVD method is more robust in terms of the problem indicated above, that H_{33} could potentially be zero. However, the main motivation for using the SVD is that the vertices of the imaged quadrilaterals will generally be slightly noisy, with lost resolution due to the spatial quantization. Moreover, the 2D template pattern may have significantly more feature points than the minimum four assumed. The advantage of using the SVD method is that it provides a convenient way of incorporating any number of feature point observations greater than or equal to the minimum requirement of 4. Suppose n feature points are used. Then the v matrix will be 2nx9, with the form of a 9x9 diagonal matrix carrying the 9 singular values and a bottom block matrix of size (2n-9)x9 of all zeros. The singular values will be nonzero due to the noise, and hence there will not be a right singular vector that corresponds to the null space of U. However, if the noise or distortion is minor, then one of the singular values will be much smaller than the other 8. The right singular vector that corresponds to this singular value is the one that will result in the smallest magnitude of the residual ||U w_0||^2. This can be shown as follows:

\|U w_0\|^2 = w_0^H U^H U w_0 = w_0^H w v^H x^H x v w^H w_0 = \lambda^2_{smallest},    (51)

where \lambda_{smallest} denotes the smallest singular value and w_0 is the corresponding right singular vector.

Once w_0 is determined by the SVD of U, we equate b = w_0 and H is extracted from b. We then need to determine the scaling of H.
Once H is determined, we can map any point in the Z_w = 0 plane to the image plane based on

f_x = \frac{H_{11} f_{wx} + H_{12} f_{wy} + H_{13}}{H_{31} f_{wx} + H_{32} f_{wy} + H_{33}}, \quad f_y = \frac{H_{21} f_{wx} + H_{22} f_{wy} + H_{23}}{H_{31} f_{wx} + H_{32} f_{wy} + H_{33}}.    (52)
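As an illustration only, the homogeneous formulation of Equations (44) to (50) might be sketched in MATLAB as follows, using the example control points listed above; this is not the code of Fig. 69, and the variable names are assumptions.

% Sketch: estimate H by stacking the constraints of Eq. (46) and taking the
% right singular vector of U for the smallest singular value (Eqs. (47)-(50)).
fw = [ -90 -100;  90 -100;  90 100;  -90 100 ];              % world points (Zw = 0)
fi = [ 0.5388 1.2497; 195.7611 39.3345; ...
       195.7611 212.3656; 0.8387 251.3501 ];                 % image points

n = size(fw, 1);
U = zeros(2*n, 9);
for k = 1:n
    fwx = fw(k,1); fwy = fw(k,2); fx = fi(k,1); fy = fi(k,2);
    U(2*k-1, :) = [-fwx -fwy -1   0    0   0  fx*fwx fx*fwy fx];   % u_x, Eq. (46)
    U(2*k,   :) = [  0    0   0 -fwx -fwy -1  fy*fwx fy*fwy fy];   % u_y, Eq. (46)
end

[~, ~, W] = svd(U);            % right singular vector of the smallest singular
b = W(:, end);                 % value (exact null vector when n = 4)
H = reshape(b, 3, 3)';         % rows [H11 H12 H13; H21 H22 H23; H31 H32 H33]

% Check: map the first world point to the image plane with Eq. (52)
p   = H * [fw(1,:)'; 1];
fxy = p(1:2) / p(3);           % should be close to fi(1,:)'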
(iv) Obtaining R and T from H

From H we can determine the angles associated with the rotation and the translation vector. The details depend on the set of variables used. One possibility is the Euler angles {a_x, a_y, a_z}, a translation {x_r, y_r, z_r} and scaling values {s_x, s_y, s_z}. The additional variable s is a scaling factor that is necessary, as H will generally have an arbitrary scaling associated with it. Additionally, there are scaling coefficients {s_x, s_y} that account for the pixel dimensions in x and y. We have left out the offset parameters {o_x, o_y}; these can be assumed to be part of the translation T. Furthermore, the parameters {o_x, o_y, s_x, s_y} are generally assumed to be known as part of the camera calibration.

The finalized model for H is then

H = s \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} [R_1]_1 & [R_1]_2 & \\ [R_2]_1 & [R_2]_2 & -RT \\ [R_3]_1 & [R_3]_2 & \end{bmatrix}.    (53)
(v) Mapping from the image pixel to the floor plane

The mapping from the camera image to the floor surface is nonlinear and implicit. Hence we use the MATLAB® fsolve() function to determine the solution of the set of equations for {x_w, y_w}. For this example we assume that H is known from the calibration, as well as s, o_x and o_y:

\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = H \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix},    (54)

x = s \frac{x_c}{z_c} + o_x,    (55)

y = s \frac{y_c}{z_c} + o_y.    (56)

Note that z_w has been set to zero, as we are assuming the point is on the floor surface.
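As an illustration only (with the same H convention as Equation (54)), the pixel-to-floor solve might be sketched as follows; the function name, the initial guess and the calibration values are assumptions, and the fsolve() call requires the Optimization Toolbox referenced in the text.

function w = pixel_to_floor(px, py, H, s, ox, oy)
% Sketch: recover the floor-plane point {xw, yw} that maps to pixel (px, py)
% by solving Eqs. (54)-(56) with fsolve. H, s, ox, oy are assumed known.
    w0 = [0; 0];                            % assumed initial guess on the floor
    w  = fsolve(@resid, w0);
    function err = resid(wf)
        pc  = H * [wf(1); wf(2); 1];        % Eq. (54), with zw = 0
        err = [s*pc(1)/pc(3) + ox - px;     % Eq. (55)
               s*pc(2)/pc(3) + oy - py];    % Eq. (56)
    end
end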
(vi) Data structures

Structures are used to group the data and pass it to functions as global variables. These are given as follows:

buildmap - describes the map of the site, including the structure of all building dimensions and the birds-eye floor plan map. Members are as follows:

member    description
XD        Overall x dimension of floor in meters
YD        Overall y dimension of floor in meters
dl        Increment between grid points
Nx, Ny    Number of grid points in x and y

scam - structure of parameters related to the security camera. The camera is assumed to be located at x = y = 0 and at a height of h in meters.

member    description
h         Height of camera in meters
az        Azimuth angle in radians
atilt     Downtilt angle of the camera in radians
s         Scaling factor
ox        Offset in x in pixels
oy        Offset in y in pixels
T         3D translation vector from world center to camera center in world coordinates
H         Projective mapping matrix from world to camera coordinates

obj - structure of parameters related to each object (multiple objects can be accommodated)

member    description
xo, yo    Initial position of the object
H, w, d   Height, width and depth of object
-         Homogeneous color of object in [R, G, B]
vx, vy    Initial velocity of the object

misc - miscellaneous parameters

member    description
Nf        Number of video frames
-         Index of video frame
Vd        Video frame array
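As an illustration only, these structures might be initialized as follows in MATLAB; all numeric values are placeholder assumptions, and the color and frame-index members are omitted because their names are not given above.

% Sketch: example initialization of the global data structures listed above.
global buildmap scam obj misc
buildmap = struct('XD', 12, 'YD', 8, 'dl', 0.1, 'Nx', 121, 'Ny', 81);
scam = struct('h', 3.0, 'az', 0.4, 'atilt', 0.6, 's', 800, ...
              'ox', 320, 'oy', 240, 'T', [0; 0; -3.0], 'H', eye(3));  % H is a placeholder
obj  = struct('xo', 2.0, 'yo', 5.0, 'H', 1.8, 'w', 0.5, 'd', 0.3, ...
              'vx', 0.9, 'vy', -0.3);
misc = struct('Nf', 300, 'Vd', []);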
As those skilled in the art appreciate, in some embodiments a site may be divided into a number of subareas, with each subarea having one or more "virtual" entrances and/or exits. For example, a hallway may have a plurality of pillars or posts blocking the FOVs of one or more imaging devices. The hallway may then be divided into a plurality of subareas defined by the pillars, and the space between pillars for entering a subarea may be considered as a "virtual" entrance for the purposes of the system described herein.

Moreover, in some other embodiments, a "virtual" entrance may be the boundary of the FOV of an imaging device, and the site may be divided into a plurality of subareas based on the FOVs of the imaging devices deployed in the site. The system provides initial conditions for objects entering the FOV of the imaging device as described above. In these embodiments, the site may or may not have any obstructions, such as walls and/or pillars, for defining each subarea.
As those skilled in the art appreciate, the processes and methods described above may be implemented as computer executable code, in the form of software applications and modules, firmware modules and combinations thereof, which may be stored in one or more non-transitory, computer readable storage devices or media such as hard drives, solid state drives, floppy drives, Compact Disc Read-Only Memory (CD-ROM) discs, DVD-ROM discs, Blu-ray discs, Flash drives, Read-Only Memory chips such as erasable programmable read-only memory (EPROM), and the like.

Although embodiments have been described above with reference to the accompanying drawings, those of skill in the art will appreciate that variations and modifications may be made without departing from the scope thereof as defined by the appended claims.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.

Title Date
Forecasted Issue Date Unavailable
(22) Filed 2016-06-22
(41) Open to Public Inspection 2016-12-25
Dead Application 2020-08-31

Abandonment History

Abandonment Date Reason Reinstatement Date
2019-06-25 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $200.00 2016-06-22
Maintenance Fee - Application - New Act 2 2018-06-22 $50.00 2018-06-18
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
APPROPOLIS INC.
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



List of published and non-published patent-specific documents on the CPD.



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Change of Agent 2022-06-06 4 113
Office Letter 2022-06-23 1 187
Office Letter 2022-06-23 1 184
Representative Drawing 2016-11-29 1 10
Abstract 2016-06-22 1 16
Description 2016-06-22 186 6,661
Claims 2016-06-22 15 382
Drawings 2016-06-22 50 864
Cover Page 2016-12-28 2 44
New Application 2016-06-22 6 200
Change of Agent 2017-02-20 2 58
Office Letter 2017-03-08 1 22
Office Letter 2017-03-08 1 26