(12) Patent Application: (11) CA 2986160
(54) French Title: ENTRAINEMENT DE VEHICULES EN VUE D'AMELIORER LES CAPACITES DE CONDUITE AUTONOME
(54) English Title: TRAINING OF VEHICLES TO IMPROVE AUTONOMOUS CAPABILITIES
Status: Deemed abandoned and beyond the time limit for reinstatement - awaiting a response to the notice of rejected communication
Bibliographic Data
Abstracts

English Abstract


Systems and methods to improve performance, reliability, learning and safety
and thereby
enhance autonomy of vehicles. Human sensors are used to capture human eye
movement,
hearing, hand grip and contact area, and foot positions. Event signatures
corresponding to
human actions, reactions and responses are extracted from these sensor values
and
correlated to events, status and situations acquired using vehicle and outside
environment
sensors. These event signatures are then used to train vehicles to improve
their autonomous
capabilities. Human sensors are vehicle mounted or frame mounted. Signatures
obtained
from events are classified and stored.

Note: The claims are shown in the official language in which they were submitted.


Claims
1. A training system consisting of an imaging device configured to acquire
images of an eye of
a subject, an image analysis system configured to extract data from said
images, wherein the
data includes eye movement events associated with at least one of saccades,
glissades,
microsaccades, fixations, smooth pursuit, dwells, or square-wave jerks,
wherein the data is
used to train vehicles.

Description

Note: The descriptions are shown in the official language in which they were submitted.


TRAINING OF VEHICLES TO IMPROVE AUTONOMOUS CAPABILITIES
BACKGROUND
[01] Autonomous vehicles (AV) are expected to eventually replace much of
the traditional
human operation of vehicles. The task of automation is greatly aided by
exponential growth of
computing capabilities, including hardware and software. Lidar, radar,
infrared and ultrasound
sources are being deployed in test vehicles to improve their autonomy.
However, vehicles that
are truly fully autonomous have not yet been developed.
BRIEF SUMMARY
[02] With the progression of time, and as computing powers and artificial
intelligence make
inroads into automation, what was once a task performed using simple sensors
has grown much
more efficient and versatile. However, mimicking human driving behavior and
improving upon
existing systems would go a long way for large scale adoption of fully
autonomous vehicles.
Current systems are limited in their abilities to operate in real-world
scenarios. To perform more
human-like tasks, it is essential to understand how humans drive, and to do
this it is essential to
capture events during a variety of scenarios when a human is operating a
vehicle. Operating a
vehicle is a complicated process, and vision is one of the key sensors for
humans. Using vision
sensing data helps automation become more human-like. Vision will become the
primary
differentiator from the previous generation of vehicles with autonomous
functionality.
[03] Learning from human behavior by incorporating human sensors to capture
human actions
during driving is advantageous. In a first embodiment, human eye movements
during driving are
used to train vehicles to be more autonomous. Other human sensors gather data
from binaural
microphones, data related to hand (grip and contact area on steering wheel)
and foot position,
which are used to train vehicles to improve their autonomous functionality.
Eye movement is
captured through cameras and smartphones located on human-mounted frames or
the dashpad
of the vehicle. Illumination sources include IR illuminators, light from phone
screens, and
ambient light. Hand grip and contact area on the steering wheel are captured using a sensor array that reads contact points and their positions as well as grip forces on each
sensor.
[04] When examples and embodiments are described to be related to vehicles,
these vehicles
can include land, air, space, water vehicles, including wheeled, tracked,
railed or skied vehicles.
Vehicles can be human-occupied or not, powered by any means, and can be used
for conveyance,

leisure, entertainment, exploration, mapping, recreation, rescue, delivery,
fetching, and provision
of services, messenger services, communication, transportation, mining,
safety, or armed forces.
[05] Vehicles can be operated in a range of modes. They can be fully under the
control of a
human, which is the case for non-autonomous vehicles; they can be fully
autonomous, without
the need for human intervention, assistance or oversight; or a range of types
in between, which
can be broadly termed semi-autonomous. Non-autonomous vehicles require a human
operator,
whereas fully autonomous versions do not require a human operator. All
examples of vehicles
appearing in this disclosure are automatic drive, that is, they have no clutch
pedal, just
accelerator and brake pedals that are both operated by the same foot.
BRIEF DESCRIPTION OF DRAWINGS
[06] Fig 1 is an example of a prior art non-autonomous vehicle.
[07] Fig 2 is an example of a semi-autonomous vehicle.
[08] Fig. 3 is an example of a fully autonomous vehicle.
[09] Figs 4a, 4b show different views of a trial autonomous vehicle with
traditional sensors.
[010] Figs 5a, 5a1 show parts of a human eye.
[011] Fig 5b shows the axes of the eye.
[012] Fig 5c shows different types of reflections from an eye.
[013] Fig 5d shows visual field of an eye.
[014] Fig 6 shows the image of an eye illuminated by an IR source.
[015] Figs 7a, 7b show a binaural recording dummy head.
[016] Fig 7c shows the placement of microphones inside the dummy head.
[017] Fig 7d shows a variation of the binaural device that is truncated above
and below the ears.
[018] Figs 8a-8c show examples of placement of binaural microphone devices.
[019] Figs 9a-9e show hand contact area and grip sensing devices and their
details.
[020] Figs 10a-10h show foot position sensing concepts.

[021] Figs 11a-c show the inside of a car with various arrangements of cameras and IR illumination sources.
[022] Fig 12a shows the inside of a car with a single phone camera.
[023] Fig 12b shows the image acquired by the setup of fig 12a.
[024] Fig 12c shows the inside of a car with two phone cameras.
[025] Fig 12d shows the inside of a car with a single phone camera aimed at a windshield mounted patch.
[026] Fig 12e shows the inside of a car with a single phone camera having illuminator patterns on its screen.
[027] Fig 12f shows the inside of a car with two phone cameras having illuminator patterns on their screens.
[028] Figs 13a-d show details of an embodiment of a phone camera imaging
adapter.
[029] Figs 14a, 14a1, 14a2, 14b, 14b1, 14c, 14c1 show various arrangements of
frame mounted
eye and sound imaging systems.
[030] Fig 15a shows a scenario of eye movement data being used to train a
vehicle to improve
its autonomy. Fig 15b shows eye movement data of fig 15a.
[031] Fig 16 shows a scenario on a road with an approaching ambulance.
[032] Fig 17 shows a scenario on a road with a child at the edge of the road.
[033] Figs 18a-b show a scenario on a road with a long truck that is not
slowing down.
[034] Fig 19a shows a scenario on a road with a maintenance vehicle on the
road.
[035] Figs 19b, 19b1, 19b2, 19b3 show a scenario on a road with a child on a
bicycle at the
edge of the road.
[036] Fig 19c shows a scenario on a road with a ball rolling onto the road.
[037] Figs 19d1-19d3 show a scenario on a road with a kangaroo crossing the
road.
[038] Fig 19e shows a scenario on a road with a dog on a leash by the edge of
the road.

[039] Fig 19f shows a scenario on a road with a dog on a stretched leash by
the edge of the
road.
[040] Figs 19e1, 19f1 show corresponding eye movements overlaid on figs 19e
and 19f.
[041] Figs 19e1a, 19f1a show eye movement overlays separately with added
fixation details.
[042] Fig 20 shows a prior art autonomous vehicle control system.
[043] Fig 21 shows an autonomous vehicle control system incorporating human
sensors.
[044] Fig 22 shows different types of sensors used in a data capturing
vehicle.
[045] Fig 23 shows events recorded by human sensors.
[046] Fig 24 shows the event identifying module.
[047] Fig 25 shows a signature categorization scheme.
[048] Fig 26a shows a human event occurrence detection scheme.
[049] Fig 26b shows an event extraction, categorization, map update and
training software
update scheme.
[050] Fig 1 is an example of a non-autonomous vehicle. Here, a human operator
is driving a
car. The steering wheel is on the right hand side (right hand drive - RHD), and the
traffic pattern is
left hand traffic (LHT). The human driver has control of all functions,
including steering,
braking, acceleration, signaling (turn indicators, emergency indicator),
lights (high and low
beams), windshield wipers, vehicle atmospheric control (ventilation, heating,
cooling, humidity
control, defrosting). The car can have features like cruise control and anti-
lock braking system
(ABS), but these are not considered to contribute to vehicle autonomy.
[051] Fig 2 is an example of a semi-autonomous vehicle. Here, a human occupant
is sitting in a
RHD car in front of the steering wheel. The car's Autonomous Control System
(ACS) has
control of most functions, including steering, braking, acceleration,
signaling (turn indicators,
emergency indicator), low beam headlight, windshield wipers. The occupant's
intervention,
assistance, or oversight is only required in certain situations, for example,
in unfamiliar,
unexpected, unmapped, abnormal, emergency or malfunctioning situations, or
when a potentially
dangerous or illegal situation might arise.

[052] Fig 3 is an example of a fully autonomous vehicle. Here, human occupants
are sitting in a
car. There is no visible steering wheel. The car's Autonomous Control System
(ACS) has control
of all functions, including steering, braking, acceleration, signaling (turn
indicators, emergency
indicator), low/high beam headlight, windshield wipers, and defroster. The
occupants'
intervention is limited to emergency situations, wherein an emergency alert
can be sent, or the
car can be made to perform a subroutine like slowing down and stopping at the
nearest safe
location. Such situations can include, for example: abnormal, emergency or
malfunctioning
situations. Such emergency maneuvers can be performed automatically, for
example by pressing
a button, or choosing from a list in a menu. A normally stowed steering wheel can optionally be made accessible. Driving skills are not required for most of these procedures or
maneuvers.
[053] Figs 4a, 4b show different views of a trial autonomous vehicle with
traditional sensors.
Lidar 401 uses pulsed laser light (of infrared wavelengths) to illuminate a
scene and measure the
reflected light pulses to create a 3D representation of the scene. The front
camera array 402 can
have one or more cameras. In the example shown, there are three cameras in the
front camera
array, each camera (operating in the visible wavelengths) having an
approximately 60 degree
horizontal field of view (fov), for a total of 180 degree coverage. The array
can optionally have
an IR camera with a wideangle lens. The side camera arrays each can have a
single visible
wavelength camera having a field of view of 60 degrees horizontal fov, and can
additionally have
an IR camera. The side cameras can be rotated about 30 degrees to avoid
overlap with the front
cameras. The back camera array can have a single visible wavelength camera
having a fov of 60
degrees horizontal, and can additionally have an IR camera. In essence, the
vehicle can have 360
degree horizontal coverage in the visible wavelength using 6 cameras. However,
this
arrangement can be varied. For example, the front array can be made to have 3 cameras, with the middle one having a 30 degree fov and two 60 degree fov cameras, one on either side, and wideangle
cameras on the side and back so that, together, all the cameras provide a 360
degree fov. For
acquiring stereo images, the camera counts can be doubled, and placed
appropriately. The
vehicle can include long range (405), medium range (406a, 406b - not visible) and short range (407a, 407b - not visible, 408) radar systems. They map information from
nearby and far
objects (for example, up to 200 meters) related to the objects' velocity, size
and distance. Ultra
wideband radar systems can also be used. Ultrasonic sensors (404a and 404b - not
visible, but on
the rear left wheel) sense the position of nearby objects.
[054] Since the human eye is one of the most used, useful, versatile and
powerful sensors, a
discussion of the eye relevant to this disclosure is provided. Figures 5a, 5a1
show details of a
human eye. The outer part of the eye includes three concentric portions:
Cornea (501), Iris (502),

and Sclera (503). The border of the cornea with the sclera is the limbus. The
iris controls the
diameter and size of the pupil (504) and determines the color of the eyes.
Pupil diameter is
adjustable and controls the amount of light passing through it into the lens
(504a). Pupillary
constriction is thrice as fast as dilation. The retina (505) is the light
sensing part of the eye, and
has photoreceptor cells, of which cones comprise 6% and rods 94%. Rods and
cones in the retina
convert light falling on them into electrical signals, which are then sent
through the optic nerve
to the visual cortex in the brain for processing. The blind spot is the
retinal area to which the
optic nerves attach, and has no photoreceptors.
[055] Unlike rods, cones provide color vision. Rods have a low spatial acuity
but are better at
scotopic vision (imaging in low-light levels), and cones provide photopic
vision with high spatial
acuity. The macula (506) is an oval-shaped pigmented yellow spot near the
retinal center and
contains the fovea (507). The fovea is a small 1.5 mm diameter pit that
contains the largest
concentration of cone cells and is responsible for central, high resolution
vision. Eye movements
help images of objects we want to see fall on the fovea. About 25% of the visual cortex processes the central 2.5 degrees of the visual scene, and this proportion falls off with eccentricity away from the fovea centralis. The fovea is rod-free, with a very high density of
cones, which falls off
rapidly away from the fovea and then levels off. At about 15-20 degrees from the fovea, the density of the rods reaches a maximum.
[056] The medial commissure (508) and lateral commissure (509) are the inner and outer corners where the eyelids join. The palpebral fissure is the opening between the eyelids. Canthal or commissural tilts are the angles between the lateral and medial commissures, with positive angles associated with the lateral aspect being above the medial. The lacrimal caruncle (510) appears lateral to the medial commissure.
[057] Eye movements alter the three-dimensional orientation of the eye inside
the head and are
controlled by three pairs of muscles to cause horizontal (yaw), vertical
(pitch), and torsional
(roll) eye movements. Eye orientation uniquely decides gaze direction. Figure
5a1 shows two
such muscles: the superior oblique muscle (511) and the inferior rectus muscle
(512).
[058] Fig 5b shows the axes of the eye. Illumination along the optical axis
(on-axis
illumination) will cause light to be retroreflected from the retina, causing
the pupil to appear
brighter than the surrounding iris, similar to red-eye in flash photography,
and is called the
bright-pupil effect.

[059] Fig 5c shows different types of reflections from an eye which is
illuminated by a light
source. Light entering the eye is refracted and partially reflected at various
layers. Reflection
occurs at the outer corneal surface (called the first Purkinje: Pl, this is
the brightest), inner
corneal surface (second Purkinje: P2), outer lens surface (third Purkinje: P3)
and inner lens
surface (fourth Purkinje: P4).
[060] When looking at a person's eyes, the reflection we see on the eye is
from the cornea (P1).
When imaging with a camera, infrared light can be used to illuminate the eye
so that this IR light
returning from the eye is selectively imaged, while the visible spectrum is
muted or discarded.
Corneal reflection P1 of the illumination source appears as a spot. Iris
reflection is dark (but has
color information). The pupil commonly appears dark in the eye image when
using off-axis
illumination. In this case, light reflected from the retina is not imaged by
the camera and
therefore the pupil appears as a dark circle against the surrounding iris.
This arrangement is more
pupil diameter variation tolerant than bright-pupil imaging.
[061] However, retinal retroreflection has strong direction dependence and can
be bright at
angles closer to normal causing the pupil to be bright. In this disclosure,
unless otherwise
specified, both the first Purkinje P1 (corneal reflection) and the pupil are
detected and used for
analysis of eye movements, and dark pupil imaging is used.
[062] When using pupil-corneal reflection systems, calculation of the pupil
center can be
skewed by descending eyelids, downward pointing eye lashes, and use of
mascara. To alleviate
these issues, algorithms can work with the following assumptions: both the
iris and pupil are
roughly ellipsoidal; the pupil is centered inside the iris; the pupil is
darker than the iris, which, in
turn, is darker than the sclera.
[063] Fig 5d shows a diagram of the visual field including the fovea,
parafovea and peripheral
vision regions with an exemplary degree of the visual field that the regions
can see. The fovea
occupies 1.5 degrees of visual field, and provides the sharpest vision; the
parafovea previews
foveal information, and the peripheral vision reacts to flashing objects and
sudden movements.
Peripheral vision has approximately 15-50% of the acuity of the fovea and is also less color-sensitive. In the human eye, the three vision field regions are
asymmetric. For example, in
reading, the perceptual span (size of effective vision) is 3-4 letter spaces
to the left of fixation
and 14-15 letter spaces to the right. One degree of visual angle is roughly
equal to 3-4 letter
spaces for normal reading distances. When fixated on a scene, eyes are
oriented so that the center
of the image of the scene falls on the center of the fovea, which is called the
point of gaze (POG).
The intersection of the visual axis with the scene can also be the POG.

[064] Eyes move during a majority of the time when awake. The different types
of eye
movements related to this disclosure are smooth pursuit, tremor, drift,
saccades, glissades, and
microsaccades. When looking at a scene, human eyes move around, rather than
being fixed in a
position. This movement locates regions of interest (ROI) in the scene to help
the brain create a
multi-dimensional map. For example, when looking at a (two-dimensional)
photograph, the eyes
make jerky but fast movements called saccades, and momentarily stop at several
points called
fixations. When looking at a scene on the path of travel, for example a crowded
city road, a three-
dimensional map is created. Monocular eye movements are called ductions.
Movement nasally is
adduction, temporal movement is abduction, elevation is sursumduction (or
supraduction),
depression is deorsumduction (infraduction), incycloduction (intorsion) is
nasal rotation of the
vertical meridian, excycloduction (extorsion) is temporal rotation of the
vertical meridian.
[065] Binocular eye movements, wherein the two eyes move in the same
direction, are called
conjugate movements or versions. Dextroversion is movement of both eyes to the
right,
levoversion is movement of both eyes to the left, sursumversion (supraversion)
is elevation of
both eyes, deorsumversion (infraversion) is depression of both eyes.
[066] Depth perception (stereopsis) is extracted from binocular disparity
(disjugacy), wherein
the difference in image location of an object seen by the right and left eyes
is caused by the
horizontal separation (parallax) between the eyes. Vergences are simultaneous
movements of
both the left and right eyes in opposite directions (which can be converging
or diverging) to
provide single binocular vision. These disconjugate movements prevent double
vision (diplopia)
when a foveated object moves in space, for example, from a far distance to
closer to the eyes.
When moving left to right, a temporal non-synchrony can occur, wherein the abducting eye moves faster and longer than the adducting eye, with this misalignment being
corrected at the end of a
saccade through glissades and drift. Most humans have a dominant eye, which
may be directed
in a different direction than the passive eye.
EYE MOVEMENT EVENTS
[067] Fixation is when the eye temporarily stops at a location while scanning
a scene. Fixations
allow re-positioning of the fovea over ROIs to acquire and compose higher
resolution
information in conjunction with the nervous visual processing system. The
range for fixation
durations is 0.1 to 1 second, typically 200-600 ms. The typical fixation
frequency is less than 3

Hz. Fixations are not complete stillness, but can include three micro-
movements: tremors,
microsaccades (to quickly bring eye back to its original position), and drifts
(slow movements
taking the eye away from fixation center), or very low gaze velocities (below
10-50 degrees per
second).
[068] Saccades are rapid movements of the eye between fixation points, and are
events where
the eyes move fast and ballistically, with durations in the range 20-100
milliseconds, during
which period we are blind. Saccadic velocities can be in the 20-500 degrees
per second range,
with some peak velocities of up to 900 degrees/second. Saccades are rarely a straight line between two points; they take several shapes and curvatures. The end of a saccade is not abrupt: the eye wobbles before stopping. This post-saccadic movement is called a glissade; glissades do not appear at the beginning of a saccade. They are used to realign the eyes before a steady fixation. This
settling is similar to a precision motion controlled closed-loop stage
settling when at destination
leading to positional "ringing".
[069] The time between a stimulus and start of a saccade is the saccadic
latency, which varies
depending on the saccadic amplitude that follows, and is usually in the range of 100-350 ms. For a 5-10 degree saccade, the latency can be 200 milliseconds. Refractory
periods between
saccades can be built into saccadic latencies or identified as being distinct
periods in cases of
very short or an absent inter-saccadic fixation, for example, when another
saccade is required to
be performed immediately following a saccade. Additional requirements can be
set in the
software interface, for example, clear peaks, maximum velocity.
[070] Smooth pursuits are slow motions of the eye as it follows a moving
target, for example,
an airplane in the sky. During smooth pursuit, the gaze position can lag the
target, and the eye
makes catch up saccades to re-foveate the target. Overshoots are corrected
using back-up
saccades, while leading saccades are anticipatory saccades. Velocities of
smooth pursuit
increase with straighter paths.
[071] Square-wave jerks are conjugate saccadic intrusions in the eye movement
while tracking
a target that cause the eye to lose tracking position and then restore it.
They consist of pairs of
small saccades in opposite directions which are separated by saccadic latency.
[072] Dwell has a specific meaning in this disclosure: it is the time spent in one region of interest (ROI). Dwells have starting and ending points, durations and
dispersions, but are
different from fixations because their temporal and spatial extents are larger
than fixations.
Transitions are gaze shifts used to move from one ROI to another. In one-way
transitions, and

unlike two-way transitions, gaze does not return to the same ROI right away.
Gaze revisits occur
when gaze returns to the same ROI, but after other transitions have occurred
in between.
Attention maps show the spatial distribution of data. An example is a dwell
map, which is a
pictorial illustration of all ROIs with a dwell time over a threshold. While
viewing a dynamically
changing image like a car driving along a crowded city street, the ROIs are
dynamically
changing. The attention map of the traversed path will have dynamically
changing ROIs, and
therefore have dynamic attention maps indicating dynamically changing heat and
dwell times.
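As a hedged illustration of the dwell map described above, the following Python sketch accumulates dwell time per ROI and keeps only ROIs over a threshold; the sample format, function name and 500 ms threshold are assumptions made for illustration, not values taken from this disclosure:

```python
from collections import defaultdict

def dwell_map(samples, threshold_ms=500):
    """samples: list of (timestamp_ms, roi_label or None) gaze samples.
    Returns {roi: total_dwell_ms}, keeping only ROIs whose accumulated dwell
    time exceeds the threshold, as in the attention/dwell maps described above."""
    totals = defaultdict(float)
    prev_t, prev_roi = None, None
    for t, roi in samples:
        if prev_roi is not None and prev_roi == roi:
            totals[roi] += t - prev_t          # time spent continuously on this ROI
        prev_t, prev_roi = t, roi
    return {roi: ms for roi, ms in totals.items() if ms >= threshold_ms}
```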
[073] Tremor has a specific meaning in this disclosure: it is a fixational
eye movement of
amplitude less than 1 degree, and peak velocities of around 20 second/sec.
[074] Blinks are events surrounding the time period when the eyelid is closed,
and can be
algorithmically defined as loss of positional signal for a threshold duration
in combination with
eye movement distance data loss, for example, 50-100 ms over 10-20 degrees.
During blinks
amongst much of the population, the descending upper eyelid covers most of the
cornea. Blink
durations increase with drowsiness, alcohol levels, schizophrenia and similar
disorders. In this
disclosure, blinks are not considered a part of eye movements, unlike
saccades, glissades,
microsaccades, tremors, dwells, smooth pursuit, and square-wave jerks.
Although blinks are
recorded, they are used to determine the cause of discontinuity or anomalies
in data that are not
explainable by eye movements. To reiterate, blinks do not form a part of eye
movements in this
disclosure.
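A minimal sketch of the blink definition given above (loss of positional signal over a plausible duration combined with a large apparent gaze jump). The array names, the NaN convention for lost samples, and the default thresholds (drawn from the 50-100 ms / 10-20 degree example in the text) are assumptions:

```python
import numpy as np

def detect_blinks(t_ms, gaze_deg, min_gap_ms=50, max_gap_ms=100, jump_deg=10.0):
    """Candidate blinks: runs of lost samples (NaN gaze) whose duration lies in
    [min_gap_ms, max_gap_ms] and whose surrounding gaze positions differ by at
    least jump_deg. Returns a list of (start_ms, end_ms) intervals."""
    t_ms = np.asarray(t_ms, dtype=float)
    gaze_deg = np.asarray(gaze_deg, dtype=float)   # shape (N, 2): x, y in degrees
    lost = np.isnan(gaze_deg).any(axis=1)
    blinks, i, n = [], 0, len(lost)
    while i < n:
        if not lost[i]:
            i += 1
            continue
        j = i
        while j < n and lost[j]:
            j += 1                                  # j: first valid sample after the gap
        duration = t_ms[min(j, n - 1)] - t_ms[i]
        jump = (np.linalg.norm(gaze_deg[j] - gaze_deg[i - 1])
                if 0 < i and j < n else np.inf)
        if min_gap_ms <= duration <= max_gap_ms and jump >= jump_deg:
            blinks.append((float(t_ms[i]), float(t_ms[min(j, n - 1)])))
        i = j
    return blinks
```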
[075] Eye-in-head fixations occur when the eye is not moving relative to its
socket, for example
when the head is moving along with the stimulus. Eye-on-stimulus fixations
occur when the eye
is fixated on the stimulus, but moves inside its socket to track as well as
compensate for head
motion. In normal driving situations, the head is free to move, and therefore
both the eye and
head move when tracking objects at a high angle away from the median plane of
the subject. The
median plane is considered to be the same as the central plane of the
steering wheel.
[076] Fig 6 shows a screenshot of an eye movement tracker. The eye is
illuminated by an IR
source, and the acquired image has a dark pupil. The cornea (601) and pupil
(602) have been
identified, along with the corneal reflection (601a). Crosshairs through the
corneal reflection
center (601b) and the pupillary center (602a) are overlaid by the image
analysis system.
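As a hedged illustration of how the pupillary center and corneal-reflection center marked by the crosshairs in fig 6 might be estimated from a dark-pupil IR image, a rough thresholding-and-centroid sketch follows. The threshold values and function names are assumptions; production trackers typically use more robust ellipse fitting:

```python
import numpy as np

def pupil_and_glint_centers(ir_image, pupil_thresh=40, glint_thresh=230):
    """Rough centroid estimates from one 8-bit grayscale IR eye image: the pupil
    is the darkest blob under off-axis illumination, and the first Purkinje
    reflection (P1) is the brightest spot. Returns ((px, py), (gx, gy)), either
    of which may be None if no pixels pass the threshold."""
    img = np.asarray(ir_image, dtype=np.uint8)
    dark = img < pupil_thresh      # candidate pupil pixels
    bright = img > glint_thresh    # candidate corneal-reflection pixels

    def centroid(mask):
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return None
        return float(xs.mean()), float(ys.mean())

    return centroid(dark), centroid(bright)
```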
[077] Eye trackers are made by several companies, including SMI, Gazpet, iMotions, Tobii, ASL, SR Research, SmartEye, and Seeing Machines. Binaural microphones are made by 3Dio, Roland and others.

[078] Mounting of eye trackers can be on the subject's head, on a tower on
which the subject's
head is resting, or remote from the subject. A combination of mounting
schemes can also be
used when required. For example, a configuration can have remote cameras and
illumination, but
head-mounted inertial measurement units (IMUs) to detect head position in space.
Or, another
configuration can have dashboard/dashpad mounted illumination combined with
head-mounted
cameras. Apart from cameras used to image the eyes, eye tracker units can be
combined with
scene tracker cameras that capture the scene being viewed along or parallel to
the line of sight.
These trackers can be mounted on the frame, dashpad or on the outside of the
vehicle. Images
from the eye and scene trackers can be combined to produce gaze-overlaid
images. Furthermore,
using head/facial feature detection, head tracking cameras can also be added
to these systems.
[079] Most commercial eye trackers have options to adjust camera and
illuminator positioning
(linear and angular). Cameras can be automatic or manual focusing or require
no focus
adjustments. Automatic adjustment of linear and angular camera positions can
additionally be
carried out using feedback from the eye tracker's image analysis system. Eye
movement
measures can include direction, amplitude, time duration, velocity,
acceleration, and time
differential of acceleration. Tracing of a subject's eye movements spatially
and temporally
provides the scanpath events and representations, including saccades and
fixations.
[080] In non head-mounted eye tracking systems, extreme gaze angles will cause
precision and
accuracy deterioration in eye trackers, particularly when combined with head
rotation. Multiple
cameras and illuminators positioned appropriately can overcome such issues.
CALIBRATION
[081] Eyes vary widely within the population, and also from the ideal model,
because of non-
uniform shapes of the eye's components (like cornea and lens) between
individuals. Variation
between the two eyes of the same individual is also common. Saccadic
amplitudes vary within a
population for the same scene or task, and also vary between the two eyes of
the same subject.
All of these variations can occur within the "normal" population, or can be
caused by
abnormalities.
[082] Identifying and accounting for these variations will help deliver better
eye-movement
data. A discussion of variations and abnormalities follows, which can be used
for calibration
purposes whenever needed. Calibration can be carried out before the start of a
path before the
vehicle starts moving, or during a path, or at the end of it. Calibration
can be instructive or
interactive. For example, the driver can be prompted to look straight ahead,
then look at side

view mirrors, then the rearview mirror, then look ahead but into the sky (for
infinity, the least focusing power of the eye's lens). Calibration can provide examples of specific pupil
and corneal
reflection relations to the tracker. Initial calibration of each individual's
left and/or right eye can
provide offset factors or equations for compensation when using a global
equation based on the
assumption of an ideal eye. For those wearing glasses, calibration can be made
with and without
glasses. Drivers can be instructed to close one eye at a time while performing
calibrations. This can
detect abnormalities as well as the dominant eye. Calibrations using four gaze
positions can
account for several non-ideal eye conditions. Lateral and medial commissures,
lacrimal caruncle,
and canthal tilts can also be identified during calibration, some of which can
be used as
landmarks or account for head/camera tilts. Visible light sources like red
laser LEDs can be used
to calibrate eye movement as well as autorefractors.
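A minimal sketch of how calibration targets like those mentioned above (straight ahead, both side mirrors, rearview mirror, sky) could be used to fit a per-driver mapping from pupil-minus-corneal-reflection vectors to gaze angles. The affine model and all names are illustrative assumptions, not the disclosure's prescribed method:

```python
import numpy as np

def fit_gaze_calibration(pupil_cr_vectors, target_gaze_deg):
    """Least-squares affine fit mapping pupil-minus-corneal-reflection vectors
    (pixels) to known gaze angles (degrees) for a set of calibration targets.
    Returns a 2x3 matrix A such that gaze ~= A @ [vx, vy, 1]."""
    V = np.asarray(pupil_cr_vectors, dtype=float)    # N x 2 measured vectors
    G = np.asarray(target_gaze_deg, dtype=float)     # N x 2 known gaze angles
    X = np.hstack([V, np.ones((len(V), 1))])         # N x 3 with bias column
    A, *_ = np.linalg.lstsq(X, G, rcond=None)        # 3 x 2 solution
    return A.T                                       # 2 x 3

def apply_calibration(A, pupil_cr_vector):
    """Map one measured pupil-CR vector to a calibrated gaze angle estimate."""
    vx, vy = pupil_cr_vector
    return A @ np.array([vx, vy, 1.0])
```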
[083] Drugs, alcohol, mental and physical disorders, age (very young children
and very old
people) will all affect eye movement. Data acquired from such subjects can be
adjusted or
eliminated by identifying them as outliers. A similar situation arises with
contact lenses, thick or
complex lensed spectacles, heavy mascara, drooping eyelids (ptosis), squint
(due to glare or
laughter, for example), teary eyes and subjects with prominent epicanthic
folds. If such subjects
are a large subset of the general population, eliminating them can provide
data that is not truly
representative. When such subgroups become a significant proportion of the
data population,
hardware and/or software settings can be altered to utilize data without
discarding them as
outliers. Pupil size changes with the amount of ambient light, drugs,
cognitive load, emotional
state, fatigue, age. In subjects with anisocoria, pupillary sizes, including during dilation and constriction (mydriasis and miosis), can be different between the two eyes. The consensual response is the
normal phenomenon wherein both pupils constrict or dilate even when one eye is
closed. Pupil
size is sensitive to the angular displacement between the camera and the eye
being imaged.
[084] Crossed-eye (strabismus) is present in varying degrees in about 4% of
the population, and
can be esotropic (nasally convergent) or exotropic (divergent). Strabismus can
be comitant
(present in all directions of gaze) or incomitant (varies with varying
directions of gaze), or
hypertropic (vertically misaligned).
[085] Eye trackers can have biases, noise and other statistical anomalies that
are inherent to
their software, hardware and optical system. Using eye trackers in moving
vehicles can
compound this due to vibration, thermal cycling and other non-laboratory environments/non-ideal
settings. Using artificial eyes fixed in position can help detect and account
for these issues when
analyzing acquired data (for example, by using filters and offsets), and
thereby improve

accuracy, precision and confidence. Averaging data from the two eyes of the
same subject can
substantially improve precision. However, this comes at a cost, for example,
in terms of loss of
information related to vergences. Filtering and de-noising functions can be
used to overcome
these.
[086] If the head is constrained from moving, and only the eyes are moving
within their
sockets, a single camera and single infrared source can be used for detecting
eye movements.
Multiple cameras and IR sources can be used for better results if head
movement is allowed.
Sampling frequencies (frame rate per second) of currently available lower-end
cameras start at
50 Hz, and the higher the sampling rate, the better the quality of results,
particularly when
analyzing saccades, glissades, tremors and microsaccades.
[087] In commercially available software, parameter settings are used to
identify specific
events and separate them. These parameters include sensitivity settings for
each category of
events, saccade-onset, steady-state and end-velocity threshold, and
acceleration threshold. Since
different manufacturers use different algorithms, hardware and software
settings, these
parameters are not universally applicable. In many instances, the user
interface is simplified to
provide a few descriptive settings like low, medium, and high.
[088] Algorithms used to extract events from eye movement data can detect gaze
position,
velocity, acceleration and jerk, each of them being the time derivative of its
predecessor.
[089] In an embodiment, dispersion algorithms are used to detect fixations
without using
velocity and acceleration data to extract fixation onset and offsets. In an
embodiment,
probabilistic modeling of saccades and fixations is carried out using Hidden
Markov Models. In
an embodiment, detection of events relating to gaze, fixation, or saccades to
near objects, like
control buttons on a vehicle's steering, is carried out by identifying events
where glissadic
movements are different for each eye of the subject, but where microsaccades
occur in both eyes
at almost the same time.
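A minimal sketch of the dispersion-based fixation detection mentioned in the embodiment above, which uses no velocity or acceleration data. The 1 degree dispersion and 100 ms duration thresholds, and all names, are illustrative assumptions:

```python
import numpy as np

def idt_fixations(t_ms, x_deg, y_deg, max_dispersion_deg=1.0, min_duration_ms=100):
    """Dispersion-threshold fixation detection: grow a sample window while its
    dispersion (x-range + y-range) stays under the threshold; windows lasting at
    least min_duration_ms are reported as (onset_ms, offset_ms, mean_x, mean_y)."""
    t_ms = np.asarray(t_ms, dtype=float)
    x_deg = np.asarray(x_deg, dtype=float)
    y_deg = np.asarray(y_deg, dtype=float)
    fixations, i, n = [], 0, len(t_ms)
    while i < n:
        j = i
        while j + 1 < n:
            xs, ys = x_deg[i:j + 2], y_deg[i:j + 2]
            if (xs.max() - xs.min()) + (ys.max() - ys.min()) > max_dispersion_deg:
                break
            j += 1
        if t_ms[j] - t_ms[i] >= min_duration_ms:
            fixations.append((t_ms[i], t_ms[j],
                              float(x_deg[i:j + 1].mean()),
                              float(y_deg[i:j + 1].mean())))
            i = j + 1
        else:
            i += 1
    return fixations
```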
[090] During a backtrack, a saccade following a previous saccade occurs in the
opposite
direction. Look-ahead saccades allow gaze to shift and fixate upon objects that will soon need to be used in some way. This is contrasted with saccading to other objects
that may be used in a
future planned action, for example, saccading to a radio knob on the dashboard
of a vehicle to
increase its volume. Ambient processing involving longer saccades and shorter
fixations is used
to scan the important or critical features of a scene first, followed by focal
processing for detailed
inspection using shorter saccades and longer fixations within regions. Target
selection along a

scanpath is guided by past experiences and memories of driving under similar
conditions, or
similar paths, or the same path, which also avoid revisits of earlier targets
that are
inconsequential. For example, consider driver "A" driving home at 6 pm from his place of
work, which he has been doing for the last 5 years as a matter of routine. He
will ignore most
traffic signs: although he sees them in his peripheral vision, he will not foveate/saccade to them. However, he will pay attention to traffic lights, saccading slower to the
lights because of their
expected presence. Saccades will be reduced in number and in velocity (when compared to driving through an unfamiliar path), while fixations and their durations
will increase.
[091] Saccadic Directions and Amplitudes: In an embodiment, when there is
negligible or no
inter-saccadic dwell or fixation between two saccades, and the first saccade's
travel was greater
than 15 degrees, the two saccades are considered to be purposed for targeting
the same object but
broken down into a first saccade and a second corrective-saccade. As an
example, this can occur
when entering intersections or roundabouts, where there is a requirement to
scan extreme angles
to inspect traffic entering the roads ahead. A similar situation arises when
entering a highway
from a minor road, wherein the driver is required to check the lane ahead as
well as the traffic
behind. In routine driving, viewing objects away from the fovea using
peripheral vision does not
allow for fine-detail cognition. However, details like traffic on adjacent
lanes far ahead is
relatively unimportant. It is usual to search for details within close
proximity to the current ROI
using foveal vision, for example, close by vehicles in adjacent lanes. When
viewing a road,
saccades to nearby locations can be more common after a fixation (for example,
a child on a
bicycle, and checking if there are adults accompanying the child), rather than
large amplitude
saccades to distant locations. In an embodiment, when the distances between
objects are very
small (on the order of 10 arcminutes), for example, a multitude of pedestrians
on the sidewalk,
an absence of saccades between the pedestrians is not taken as a lack of
cognition of all these
pedestrians by the driver, but rather as the driver advantageously devoting extra cognitive
resources for the
available retinal resolution in the peripheral vision and perceiving these
pedestrians at lower
resolution, all the while using foveal vision to perceive other more important
objects on the road.
When a subject is searching intently (as opposed to performing general
overviews), or when
concurrently performing other unrelated tasks, saccadic amplitudes tend to
drop. Saccadic
velocities decrease with drowsiness, predictable targets, older age,
neurological disorders, and
drug and alcohol use.
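A hedged sketch of the embodiment above in which a saccade over 15 degrees followed, with negligible inter-saccadic fixation, by a second saccade is treated as a primary-plus-corrective pair aimed at the same target. The dictionary field names and the 20 ms gap value are assumptions:

```python
def merge_corrective_saccades(saccades, max_gap_ms=20.0, min_first_amp_deg=15.0):
    """saccades: chronologically ordered list of dicts with 'onset_ms',
    'offset_ms' and 'amplitude_deg'. When two consecutive saccades are separated
    by a negligible inter-saccadic interval and the first exceeds the amplitude
    threshold, the second is attached to the first as its corrective saccade."""
    merged, i = [], 0
    while i < len(saccades):
        s = dict(saccades[i])
        if (i + 1 < len(saccades)
                and saccades[i + 1]['onset_ms'] - s['offset_ms'] <= max_gap_ms
                and s['amplitude_deg'] > min_first_amp_deg):
            s['corrective'] = dict(saccades[i + 1])   # same target, broken into two
            i += 2
        else:
            i += 1
        merged.append(s)
    return merged
```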
[092] In an embodiment, when tracking objects using smooth pursuits, for
example, a bird
taking off from the middle of a road and flying vertically, signature
detection algorithms are

programmed to accommodate jumpy vertical smooth pursuits. In contrast, this is
not the case for
horizontal smooth pursuits, for example, when a ball rolls across the road.
[093] In an embodiment, a specific instance of a table listing settings for
threshold and cutoff
values for a program having a set of subroutines suited to a particular
scenario, imaging and
sensor hardware, software and hardware setup is given below. These settings
can change from
instance to instance.
Type             Duration (ms)   Amplitude       Velocity
Fixation         100-700         -               -
Saccade          30-80           4-20 degrees    30-500 degrees/sec
Glissade         10-40           0.5-2 degrees   20-140 degrees/sec
Smooth pursuit   -               -               10-30 degrees/sec
Microsaccade     10-30           10-40 seconds   15-50 degrees/sec
Tremor           -               <1 degree       20 seconds/sec peak
Drift            200-1000        1-60 seconds    6-25 seconds/sec
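As a hedged sketch of how threshold settings such as those in the table could drive a simple velocity-based sample classifier: the velocity numbers below are copied from the table, everything else (names, the NumPy approach) is an assumption, and separating glissades, microsaccades and tremor would require additional amplitude and duration tests:

```python
import numpy as np

# Velocity thresholds taken from the table above (degrees/sec); these are
# instance-specific settings, not universal constants.
PURSUIT_MIN_VEL, SACCADE_MIN_VEL, SACCADE_MAX_VEL = 10.0, 30.0, 500.0

def classify_gaze_samples(t_s, x_deg, y_deg):
    """Label each inter-sample interval as 'fixation', 'smooth pursuit' or
    'saccade' from instantaneous gaze velocity."""
    t_s, x_deg, y_deg = (np.asarray(a, dtype=float) for a in (t_s, x_deg, y_deg))
    speed = np.hypot(np.diff(x_deg), np.diff(y_deg)) / np.diff(t_s)
    labels = np.full(speed.shape, 'fixation', dtype=object)
    labels[(speed >= PURSUIT_MIN_VEL) & (speed < SACCADE_MIN_VEL)] = 'smooth pursuit'
    labels[(speed >= SACCADE_MIN_VEL) & (speed <= SACCADE_MAX_VEL)] = 'saccade'
    return speed, labels
```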
BINAURAL SENSING
[094] Figs 7a, 7b show front and side views of a binaural-recording mannequin-
head having a
microphone (not shown) in each ear at the end of their ear canals (701). The
head can be made of
plastics or composites, while the pair of life-like ear replicas are made from
silicone. Fig 7c
shows the placement of microphones (702) inside the mannequin. The mannequin's
head is
similar in shape and size to a regular human head, but lacks many features
like lips and eyes. It
has ears that resemble the size and geometry of a human ear. The nose is a
straight-edge
representation of a human nose and casts a shadow of the sound. Sound wraps
around the
mannequin-head, and is shaped by the geometry and material of the outer and
middle ear. Some
of the sound is transmitted through the head. The two microphones record sound in a way that, when
played back, a 3-D 'in-head' acoustic experience is created. The mannequin
mimics natural ear
spacing and produces a "head shadow" of the head, nose and ears that produces
interaural time
differences and interaural level differences. Such an arrangement captures
audio frequency

adjustments like head-related transfer functions. Fig 7d shows a variation of
the binaural device
that is truncated above and below the ears of the mannequin.
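To illustrate the interaural time differences mentioned above, a hedged sketch of estimating the interaural time difference from the two ear-canal microphone signals by cross-correlation; the function and parameter names are assumptions, and real pipelines would band-limit and window the signals first:

```python
import numpy as np

def interaural_time_difference(left, right, sample_rate_hz):
    """Estimate the interaural time difference (seconds) between the two
    ear-canal microphone channels from the peak of their cross-correlation.
    A positive value means the left channel is delayed relative to the right
    (the source is on the right side of the mannequin head)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    corr = np.correlate(left, right, mode='full')
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    return lag_samples / float(sample_rate_hz)
```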
[095] Fig 8a shows the placement of such a truncated mannequin-head in a car
above the
driver-side headrest, with its nose facing the driver. A bluetooth device (not
shown) within the
mannequin (801) transmits the sound stream from the microphones to a recording
device in the
car. This recording device can be integrated with the human sensor recording
system or a
standalone unit which timestamps the sounds as it records it. Fig 8b shows a
whole mannequin
head (802) placed on the backside of the backrest, aligned with the driver's
head. The nose is
above the headrest and facing the driver. Fig 8c shows a full mannequin head
(803) just as in fig
8a, but anchored to the roof of the car. Other possible configurations include
placing the
complete or truncated head on the dashpad, above the rearview mirror, on the
passenger-side
seat's head-rest, and on top of the driver's head (using a head-strap). A
frame-mounted variation
of the binaural recording device using a set of smaller ears and without the
intervening
mannequin-head, is shown in figs 14a-14d. The outer ears and ear-canals of
this frame-mounted
binaural device are made of silicone, with a microphone each at the end of the
ear canal.
[096] Fig 9a shows a steering wheel with a hand sensing mat (901) wrapped
around the outer
wheel. Fig 9b shows a segment (901a) of the hand sensing mat (901). The mat
has eight sensors
(902) in a row (902a) along its width. The length of the mat is chosen to fit
completely around
the steering wheel. In the example of fig 9a, there are 64 sensor rows (902a)
arranged
circumferentially, with each row having 8 sensors. Each sensor (902) has a
sensing pad that detects
both contact and pressure of the palms and fingers. Fig 9c shows an enlarged
section of the mat
of fig 9b, with a cross-section through a row of mats appearing in fig 9d.
Each of the sensors
(902) is connected to a bus (serial or parallel) (903) that is connected to a
processor (not
shown). All the rows (902a) are connected to this bus. Each sensor (902) has a
unique address.
When a sensor (902) is touched or pressed, the touch event and pressure value
is sent via this bus
(903) to the processor. In an example operating scheme, the steering wheel is
programmatically
divided into left and right halves. In fig 9a, the left side has 32 rows of 8
sensors (for a total of
256 sensors), and the right side the same. Therefore, there are a total of
about 1.158 x 10^77
unique combinations. To derive a simpler correlation, the rows can be numbered
1 to 32. An
example of hand sensing information obtained during casual driving of a right hand drive (RHD) car at constant speed along a particular segment of a path on a highway with wide roads and very little traffic, where the highway is fenced, has a central divider and 3 lanes in each direction, with touches denoted as Left, is: [-34.82088284469496,149.38979801139794];
[t10:32:28]; [y(12),
x(2, 3, 4, 5, 6, 7), p(3, 3, 3, 2, 1, 0)]; [y(13), x(3, 4, 5, 6), p(3, 2, 2,
1)]; [y(14), x(4, 5), p(1, 0)];

[y(15), x(4)], p(0); R:[]. This data point indicates that at the recorded
latitude, longitude and time
10 hrs, 32 min, 28 sec, the left side of the steering wheel was contacted at row
12, sensors 2, 3, 4,
5, 6, 7 with pressure on these sensors of 3, 3, 3, 2, 1, 0, respectively. A
similar interpretation
applies for the remaining y, x, p values. The zero in the pressure data
indicates no pressure is
being applied, but there is contact. Pressure values are dimensionless, with a
range of 0-7, the
highest value indicating extreme squeezing of the steering wheel. R[] is a
blank data set
indicating that the right side of the steering has no contact with a hand
(right side of the steering
wheel is not being held). For a very simplified understanding of this dataset,
the pressure values
can be added: [(3+3+3+2+1+0) + (3+2+2+1) + (1+0) +(0)]=21 to indicate that the
left hand is
engaging the steering wheel at pressure 21 at the particular location and/or
time, whereas the
right hand is not holding the steering wheel. This can indicate a relaxed,
simple and familiar
driving situation where the driver is not required to be very alert. This
situation can be contrasted
with driving in a crowded city road that is un-fenced, undivided, with a lot
of pedestrians on the
sidewalks, bicycle lanes, traffic lights, intersections, frequently stopping
vehicles like buses. The
driver in such a situation is much more alert and cautious, with both hands on
the steering wheel,
gripped tighter than the usual. If the driver is new to this city and this
traffic pattern, the level of
alertness and caution will be even greater, and the grip on the steering wheel
tighter. Calibration
can be performed by asking the driver to perform different operations, for
example, holding the
steering with both hands without squeezing, then full squeeze with both hands.
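As a hedged illustration of the simplified pressure summation described for the example data point above (the y/x/p rows and the empty R:[] record), assuming a simple tuple representation of the rows; only the total of 21 comes from the text, the rest is illustrative:

```python
def grip_pressure_total(rows):
    """rows: list of (row_number, contacted_sensor_columns, pressures) tuples for
    one half of the steering wheel, mirroring the y/x/p records in the example.
    Returns the summed dimensionless pressure (0-7 per sensor) used as the
    simplified engagement value in the text."""
    return sum(sum(pressures) for _row, _cols, pressures in rows)

# The Left-hand record from the example above:
left = [
    (12, (2, 3, 4, 5, 6, 7), (3, 3, 3, 2, 1, 0)),
    (13, (3, 4, 5, 6),       (3, 2, 2, 1)),
    (14, (4, 5),             (1, 0)),
    (15, (4,),               (0,)),
]
right = []  # R:[] - the right side of the wheel is not being held

assert grip_pressure_total(left) == 21
assert grip_pressure_total(right) == 0
```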
[097] Fig 9e shows a variation of hand sensing device. Many drivers hold the
steering wheel
with their palms and fingers on the steering wheel's ring portion and their
thumbs resting on the
spoke portions. To sense fingers/thumbs on the spoke portion, additional
sensor mats (901a,
901b) are wrapped around on each of the spoke portions adjacent the ring. In
fig 9e, each side of
the ring-spoke intersection has 8 rows of 4 sensors each.
[098] When the vehicle is being turned steeply using the steering wheel, for
example, at a left
hand turn at a 4 way intersection in LHT, the turning action by the driver
will cause the driver to grip the opposite sides as the steering wheel is rotated (as the turn progresses).
This pattern can be
detected by algorithms, and used appropriately (for example, to detect sharp
turns), or the dataset
can be discarded if not appropriate for the present computation.
[099] Figs 10a-10h show foot position sensing concepts. Fig 10a shows the
inside of a LHD car
with accelerator (1001) and brake (1002) pedals. Figure 10b shows a close up
of modified
accelerator and brake pedals, each having a proximity sensor (1001a, 1002b)
mounted on them.
Proximity sensors can be any one of those known in the art, including
capacitive, light,

ultrasonic or time-of-flight (TOF) type. For example, the VL6180 TOF sensor
made by STMicroelectronics can be employed. Fig 10c shows another example of the foot
position sensing
concept. Here, there are two sensors (1003a, 1003b) on the brake pedal and two
sensors on the
accelerator pedal (not shown). Only the brake pedal is illustrated in this
figure. The distances
measured by the brake pedal sensors (1003a, 1003b) are dbp-foot (1004a) and
dbp-wall (1004b),
respectively. Similarly, the distances measured by the accelerator pedal
sensors are dap-foot and
dap-wall, respectively (these are not indicated in the figures, but are
similar to the brake pedal).
It is to be noted that dbp-wall and dap-wall are set to a value of zero when
not being depressed,
which is not apparent from fig 10c (which shows the actual distance between
the pedal and wall).
This can be done during calibration of the pedals at startup. When the pedals
are depressed, dap-wall and dbp-wall will actually return the values of how much they are
depressed, not their
distance from the wall.
[0100] Fig 10d shows an arrangement with three TOF sensors on each of the
brake and
accelerator pedals, two on the front face and one on the backside (not shown)
facing away from
the driver. Having two sensors on the front surface allows enhanced mapping of
the foot by
performing one or a combination of mathematical operations on the measurements
performed by
these front sensors. These operations can include: averaging, using data from
the sensor that is
currently providing the highest value, using data from the sensor that is
currently providing the
lowest value. Furthermore, a calibration step can be incorporated during
startup of the vehicle,
where the driver is prompted to perform various operations to obtain baseline
values. These
operations can include: asking the driver to rest the foot on, but not
depress, the accelerator
pedal, then the same operation for the brake pedal, then depressing each pedal
(while the vehicle
is in park mode).
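A minimal sketch of the front-sensor combination operations listed above (averaging, taking the highest, or taking the lowest of the two front TOF readings); the function and parameter names are illustrative assumptions:

```python
def combine_front_sensors(d1_mm, d2_mm, mode="average"):
    """Combine the two front-face TOF readings on a pedal into a single
    foot-distance estimate, using one of the operations described above."""
    if mode == "average":
        return (d1_mm + d2_mm) / 2.0
    if mode == "max":   # use the sensor currently providing the highest value
        return max(d1_mm, d2_mm)
    if mode == "min":   # use the sensor currently providing the lowest value
        return min(d1_mm, d2_mm)
    raise ValueError("unknown combination mode: " + mode)
```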
[0101] Figs 10e-h show the principle of operation of the foot sensor, with both pedals having two sensors each as described for fig 10c. In figs 10e and 10f, the
foot is on the
accelerator and brake pedals, respectively. The proximity sensor on the pedal on which the foot rests will now record its lowest value. When an anticipation of braking
arises, for example, when
driving a car and an unaccompanied child is seen 200 meters ahead, standing by
the edge of the
road and facing the road, the foot goes off the accelerator and moves over the
brake pedal,
hovering about 8 cm over it as in fig 10g (only brake pedal shown). As the car
approaches closer
and is 100 meters from the child, the foot gets closer to the brake pedal, and
is now 4 cm over it.
At 75 meters, the foot is on the brake pedal, but not depressing it, as in fig
10h (only brake pedal
shown). At 50 meters from the child, the brake pedal is slightly depressed to
slow the car. The

foot remains on the pedal until after crossing the child, and is then immediately removed from the brake pedal, and the accelerator pedal is depressed.
[0102] As an example of foot and accelerator dataset for a short segment in a
path, consider a
driver driving a car through a suburban area having residential houses and
schools during a
school day and school hours. When either the brake pedal or the accelerator
pedal is not
depressed, dbp-wall=0 mm, and dap-wall=0 mm. Assume sample data capture starts
at time t=0
seconds. The foot is on the accelerator pedal to keep it at a constant speed
of 50 km/hour, and
this continues for 30 seconds. During this period, dap-foot=0 mm, dap-wall=7
mm, which means
that the foot is on the accelerator pedal and depressing it by 7 mm. As the
car approaches a
school zone, the foot goes off the accelerator and depresses the brake pedal
to reduce the speed
to the legal speed limit of 40 km/hour, which occurs for 4 seconds. During
this period, dap-
foot=0, dap-wall=0, dbp-foot=0 mm, dbp-wall=5 mm. This is an expected pattern
of driving, and
can be compared with the map, which will similarly indicate a school zone with
reduced speed
requirement. However, after entering the school zone (at t=35 seconds), there is always a possibility
that children will dart across the road. The foot is therefore mostly hovering
over the brake pedal,
particularly when getting close to the entrance of the school, in anticipation
of needing to brake
to avoid children darting across the road. At t=37 seconds, dap-foot=0, dap-
wall=0, dbp-foot=0
mm, dbp-wall=5 mm, which means that the foot is not hovering over the
accelerator pedal or
being depressed, the foot is on the brake pedal and pushing it down by 5 mm.
Just after t=37
seconds, the driver notices a small child on a bicycle exiting the gates of
the school and driving
towards the road. There is a possibility that the child is not paying
attention to traffic, and may
enter the road ahead. At t=39 sec, the foot depresses the brake pedal for 2
seconds to slow the car
down to 20 km/hour. The corresponding values for these 2 seconds are: dap-
foot=0, dap-wall=0, dbp-foot=0 mm, dbp-wall=5 mm to 12 mm. This sequence of values from t=0 to t=41 seconds
can be stored in a file along with GPS and timestamps. The average of such
data sequences
collected by several different drivers over several days can be used as a
training file for an AV.
Data from the training file will indicate school days and hours because of the
behavior of drivers,
and also the average speed to be followed, and also the speed profile for the
section of the
segment of the path.
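As a hedged sketch of how the dap/dbp sequence from the school-zone example could be logged with GPS and timestamps for later averaging into a training file, together with a check for the brake-hover anticipation pattern described above; the CSV layout, field names and the hover threshold are assumptions, not specified by the disclosure:

```python
import csv

FIELDS = ['t_s', 'lat', 'lon', 'dap_foot_mm', 'dap_wall_mm', 'dbp_foot_mm', 'dbp_wall_mm']

def write_pedal_log(path, samples):
    """samples: iterable of dicts keyed by FIELDS, mirroring the quantities used
    in the example above. Stores one row per sample so that sequences from many
    drivers can later be averaged into a training file."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for s in samples:
            writer.writerow({k: s[k] for k in FIELDS})

def hovering_over_brake(sample, hover_max_mm=100):
    """True when the brake pedal is not pressed but the foot is within
    hover_max_mm above it - the anticipation pattern in the school-zone example."""
    return 0 < sample['dbp_foot_mm'] <= hover_max_mm and sample['dbp_wall_mm'] == 0
```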
[0103] Figs 11a-c show the inside of a car with various arrangements of cameras and IR illumination sources for eye movement tracking. Fig 11a shows the inside of a
non-autonomous
car being driven through a path by a driver. The car is equipped with GPS,
inertial measurement
unit (IMU), LIDAR, radar, outside cameras, inside cameras and other outside
environment and
vehicle sensors shown in fig 24. The inside cameras track the subject's eye
movements and head

movements. The video data stream is saved with time and GPS/IMU stamps. The
saved stream is
then analyzed by an image processing system to extract event information,
including saccades,
microsaccades, glissades, tremors, fixations, and drift. A map is built incorporating the path's roads, timestamps, GPS/IMU coordinates, speed profiles, driver behaviors (lane changes, turn indicators, braking, accelerating, foot going off the accelerator pedal and moving/hovering over the brake pedal), and vehicle behaviors (turning radius, etc.). In Fig 11a, the car
has one imaging device
(1101) on its dashpad. The device is closer to the windshield than the edge of
the dashpad. The
device has one camera (1101a) and two IR illumination sources (1101b, 1101c).
The center of
the steering wheel is on the same plane as the sagittal plane of the driver,
and the center of the
camera is co-incident with the plane that connects the sagittal plane with the
central axis of the
steering wheel. Figure 11b shows an arrangement with an imaging device (1121)
placed on the
dashpad and having two cameras (1121a, 1121b) and two IR illumination sources
(1121c,
1121d). The two cameras are each offset from the central axis of the steering
wheel by 4 cm.
Figure 11c shows an arrangement with an imaging device (1131) having two
cameras (1131a,
1131b) and two IR illumination sources (1131c, 1131d), and two additional
cameras (1132a,
1132b) with these additional cameras having two IR illuminators for each (not
labeled in the
figure). The two central cameras are each offset from the central axis of the
steering wheel by 4
cm each, while one of the additional cameras is placed below the central axis
of the rearview
mirror, and the other is placed along the central horizontal axis of the
driver-side sideview
mirror, and at an elevation lower than the other cameras.
[0104] Figs 12a-f show the inside of a car with various arrangements of phone cameras. The cameras of figs 12a-12f can have a clip-on filter (an example appears in fig
12c) whose
transmission wavelength matches the illumination source wavelength. For
example, if the
phone's screen were programmed to display a narrowband blue wavelength, then
the filter's
transmission characteristics would match the same color. The filter can be of
any type, including
absorption and reflectance. Examples of a phone's screen color characteristics
are described with
reference to fig 12e below. In addition to, or as a replacement for, the
filter, a snap-fit or clip-on
lens system (single lens or multiple lens or zoom-lens) can also be added to
reduce the field of
view so that a much larger proportion of the driver's head is captured, thus
giving a higher
resolution of the eyes. Such a zoom-lens can be connected to the a system that
processes the
image acquired by the camera so as to make the zoom-lens auto-focus on the
driver's eyes,
giving a better focus as well as a higher resolution image of the eyes (by
filling in more of the
camera's sensors with relevant portions rather than unnecessary background).
These filter and
CA 2986160 2017-11-20

lens/zoom-lens arrangements can be adapted to use for front-facing as well as
rear-facing
cameras.
[0105] Fig 12a shows a mobile phone (1201) with its back facing camera (1201a) facing the driver and located along the driver's sagittal plane. The phone is secured on a stand (1201b) which is on the dashpad. The phone stands/holders in all embodiments in this disclosure can have tip-tilt adjustment. Such tip-tilt adjustment helps align the camera to account for the inclination/irregularities of the dashpad, angle of the windshield, driver height and placement, steering wheel height, and camera body irregularities. In the embodiment of fig 12a, illumination can be provided by ambient light, light from the phone's screen, an external illuminator (an embodiment of an external illuminator is shown in fig 12d), or a combination. The quality of data obtained from ambient light is much lower than when using pure IR illumination, and not all eye movement events can be computed from this data. Fig 12b shows the image obtained by the phone's back-facing camera in a non-zoomed mode. The camera can be zoomed in to capture more of the face (and eyes) and less of the background, which will also improve accuracy during event extraction and provide higher quality results.
[0106] Fig 12c shows an imaging arrangement with two mobile phones (1202,
1203) with their
front facing cameras (1202a, 1203a) facing the driver. The phones are secured
on individual
stands which sit on the dashpad of the car. No dedicated illuminators are present; illumination comes from ambient light and/or light from the phones' screens. A clip-on filter (as described earlier) (1202b, 1203b) is also shown.
[0107] Fig 12d shows an imaging arrangement with a mobile phone. The phone
(1204) is lying
on its back (screen facing dashpad), with its back facing camera (1204a)
facing the windscreen
and located along the driver's sagittal plane. The phone is secured on a base
stand (1204b). Two
IR illuminators (1204c, 1204d) are aimed towards the eyes of the driver. A
patch (1204e) of
width 15 cm and height 10 cm is affixed to the windscreen. The center of this
patch is aligned
with the center of the camera. The base stand (1204b) has tip-tilt adjustment.
The base is
adjusted such that the camera's center images the driver's forehead at a
center point between the
eyes. The size of the patch will be smaller the closer it is to the camera. It
is wide enough to
image an area that is three times the width of the driver's head, with a
proportional height. The
patch is optimized for full reflectance of IR wavelengths (the specific
wavelength band being the
band at which the IR illuminator emits light) at angles of 40-50 degrees,
preferably 43 to 48
degree angle of incidence. In this imaging arrangement, the placement position
of the camera (on
the dashpad) and its tip-tilt setting, the placement position of the patch on
the windscreen, and
the height of the driver are interrelated. The goal here is to place the patch as low as possible on the windscreen, without its line of sight being obscured by the steering wheel, while at the same time centering the driver's eyes on the camera. Light from the IR illuminators is reflected from the eyes of the driver and is then reflected off the patch into the camera. The patch does not
obstruct visible wavelengths, and therefore the driver is able to see through
the patch without the
view of the road being obstructed. The patch can be custom made for a specific vehicle model, with its optimum reflection angle chosen to account for the angle of the windshield.
[0108] Fig 12e shows a mobile phone (1205) with its front facing ('selfie')
camera (1205a)
facing the driver and aligned approximately with the center of the steering
wheel. The back
facing camera (not shown) faces the road ahead. The phone is secured on a
stand (1206) which is
on the dashpad. Illumination is provided by the phone's screen (1205b). The screen has four squares (1205c, 1205d, 1205e, 1205f) of a particular narrowband color, while the rest of the screen is blank (black). These four squares act as illuminators. The
particular color can be, for
example, the wavelength of 450 nm (blue), with a narrow bandwidth of +/- 15
nm. The color can
be chosen to be that of the particular phone model's peaks of display
illumination intensity. In
another example, this color can be 540 nm +/- 10 nm. Generally, a narrower bandwidth is chosen when the display's intensity curve around the peak is relatively flat, and a broader bandwidth when the intensity curve around the peak is steep. The imaging
software (of the
camera) is programmed to discard (from the acquired images) wavelengths above
and below the
narrowband wavelengths. The advantage in this imaging setup is that eye
movement tracking
becomes much more sensitive because the reflections of the four squares from
the eye can be
captured while rejecting ambient light, including reflected ambient light. The
four squares can
also each have a different narrowband color, or two of the same color, or any
such combination.
The phone's software is programmed to cause the screen to display these
specific narrowband
colored squares, and the phone's imaging software is set to reject other
colors from the images
captured by the camera. Instead of being a square, the shape of the
illumination areas can also be
another shape, like a circle, triangle, line or a grid pattern, or other
patterns similar to those
appearing in fig 12f. The squares and other patterns can be sized to work well with the particular zoom level. The color of the illuminator patterns can be
set to change
periodically, for example, change the color of each square every 0.1 second.
The color of the
illuminator patterns can be set to change automatically depending on the
dominating ambient
colors. For example, when driving through roads surrounded by greenery, the
phone detects this
dominance and automatically changes the color of the patterns to another
color. If greenery and
blue-skies are dominant, the color of the pattern is automatically changed to
another color like
red.
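A minimal sketch, assuming RGB frames delivered as numpy arrays, of how the phone's imaging software might approximate the narrowband rejection described above by keeping only pixels dominated by the chosen illumination color. True spectral rejection would rely on the optical filter or the sensor's per-wavelength response; the channel thresholds here are illustrative assumptions.

    import numpy as np

    def isolate_narrowband_blue(frame_rgb, min_blue=80, max_other=60):
        """frame_rgb: HxWx3 uint8 image. Keeps only pixels dominated by the blue
        channel (a stand-in for the 450 nm narrowband illumination); everything
        else is zeroed so reflections of the screen squares stand out."""
        r = frame_rgb[:, :, 0].astype(np.int16)
        g = frame_rgb[:, :, 1].astype(np.int16)
        b = frame_rgb[:, :, 2].astype(np.int16)
        mask = (b >= min_blue) & (r <= max_other) & (g <= max_other)
        out = np.zeros_like(frame_rgb)
        out[mask] = frame_rgb[mask]
        return out

    # Example: locate the brightest narrowband reflections (e.g. the squares
    # reflected from the cornea) in a synthetic frame.
    frame = np.zeros((120, 160, 3), dtype=np.uint8)
    frame[40:44, 60:64, 2] = 200            # simulated corneal reflection
    filtered = isolate_narrowband_blue(frame)
    ys, xs = np.nonzero(filtered[:, :, 2])
    print(list(zip(ys[:3], xs[:3])))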
[0109] Fig 12f shows a variation of fig 12e. This arrangement shows two mobile phones (1206, 1207) with front facing cameras (1206a, 1207a). Each of the mobile phones has, on its screen, different patterns. The first pattern (1206b) is a circle with a crosshair through it, the second is a solid circle (1206c), the third (1207b) has two concentric circles, and the fourth (1207c) is a solid square. As with fig 12e, each of the patterns can have the same colors or different colors, or the crosshair can be of one color while its circle can be solid and of a different color, or the two concentric circles can be of different colors. The patterns are generated by the phone, and the imaging software can be programmed to identify these specific patterns as reflected from the eye.
[0110] Figs 13a-d show details of an embodiment of a mobile phone camera imaging arrangement with an adapter. Fig 13a shows the placement of the phone (1310) on the
dashpad (1311a), with a
portion of it overhanging into the dashboard (1311b). The camera is aligned to
the center of the
steering wheel. The adapter (1312) has, built into it, two mirrors (1312a, 1312b) and two filters
(1312c, 1312d). The two mirrors are used to direct light (that has been
reflected from the eye)
into the back facing camera (1312e). Both the back facing camera (1312f) as
well as the front
facing (1312e) camera capture images. As with most mobile phone cameras,
the front facing
camera captures a much larger area (but at a lower resolution) compared to the
rear facing
camera (which has a much higher resolution). The front facing camera is used
as a coarse
indicator of the eye position in the scene being captured, while the rear
facing camera captures
the finer details that are useful for eye movement event extraction. The rear
facing camera can
also be made capable of optical zoom (as opposed to software zoom) to get
close-up images of
the driver's eyes. These filters (1312c, 1312d) cut off all wavelengths above
and below the
illuminator's narrowband wavelength. The illuminator can be external sources,
for example, IR
sources, or patterns on the phone's display. In an alternative embodiment, the
optical filters can
be dispensed with, and a scheme for software-implemented filtering as was described for fig 12e can be used. The mirrors can be chosen to be fully reflective for all
wavelengths, or in an
alternate embodiment, selected for reflection only in the narrowband
illumination wavelength.
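A minimal sketch of how the coarse eye location found by the wide-field front camera could be turned into a crop box for the high-resolution rear camera image. The simple scale mapping and crop size are illustrative assumptions; a real system would calibrate a homography between the two cameras.

    def map_coarse_to_fine(eye_xy_coarse, coarse_size, fine_size, crop=(200, 120)):
        """eye_xy_coarse: (x, y) eye location found in the low-res, wide-field image.
        coarse_size, fine_size: (width, height) of the two images.
        Returns a crop box (x0, y0, x1, y1) in the high-res image centred on the eye,
        assuming the two views are related by a simple scale."""
        sx = fine_size[0] / coarse_size[0]
        sy = fine_size[1] / coarse_size[1]
        cx, cy = eye_xy_coarse[0] * sx, eye_xy_coarse[1] * sy
        w, h = crop
        x0 = max(0, int(cx - w / 2))
        y0 = max(0, int(cy - h / 2))
        x1 = min(fine_size[0], x0 + w)
        y1 = min(fine_size[1], y0 + h)
        return x0, y0, x1, y1

    # Example: eye found at (512, 300) in a 1280x720 coarse frame; crop the 4032x3024 fine frame.
    print(map_coarse_to_fine((512, 300), (1280, 720), (4032, 3024)))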
[0111] Figs 14a-c show various arrangements of frame mounted eye and sound
imaging systems.
Fig 14a shows a frame mounted eye movement and ambient sound imaging system as
worn by
the driver, with only a truncated head of the driver shown in the figure. Fig 14a1 shows the top view of fig 14a, while fig 14a2 shows the bottom view of the worn frame of fig 14a. The frame is symmetrical, including components mounted on it, and therefore components on
only one side of
the frame are numbered. The frame (1401) has an inertial measurement unit
(IMU) (1401a) with
a master timing clock. The IMU allows absolute position tracking of the head
of the driver. On
each side of the frame, there are: binaural recording device (1401b), two
cameras (1401c,
(1401d), and three IR illuminators (1401e, 1401f, 1401g). It should be noted that imaging of just one eye (for example, the dominant eye) can be carried out instead of binocular imaging, in both head mounted as well as remotely mounted (dashpad) systems. Of course, details like
vergences that
are related to binocular vision will be lost. Frame-mounted eye movement
imaging systems,
unlike dashpad mounted systems, are not aware of when the head is moving. IMUs
help extract
eye movement information if and when there is associated head movement, for
example, in eye-
in-head fixations. Both the eyes and head move when tracking objects at a high
angle away from
the steering wheel. In this disclosure, all reference to eye movement data
assumes that head
movement has been taken into consideration. It should be obvious that dashpad
or other remotely
mounted cameras (IR or visible wavelength) can be used to detect head movement
instead of
using IMUs.
[0112] Fig 14b shows an embodiment of a frame mounted eye movement and ambient
sound
imaging system (upper figure), and the bottom view (lower figure) of this
frame. The frame
(1402) has symmetrical components, and an extra IMU (1402d) in the center
portion. Only one
side of the symmetrically placed components are identified in the figure.
Prescription eyeglass
(1402a) is clamped on to the frame using a hard polymer clamp (1402b). Other
components are:
IMU (1402c), binaural recording device (1402e). In the space between the
eyeglass and the eye,
each side of the frame has two eye movement capturing cameras (1402f, 1402g),
two IR
illuminators (1402h, 1402i), a rear facing (road facing) camera (1402j) that captures images of the scene in front of the driver, and an autorefractor (1402k) that is used to record in near real-time where the eye is focused. The autorefractor faces the pupil and has its own
IR source in-built,
and projects a pattern on the eye. The cornea and the phakic lens of the eye
together focus the
pattern onto the fundus. The wavefront reflected from the fundus is sensed by
a lenslet array in
the autorefractor, and the wavefront is analyzed. The focal length of the
eye's lens can then be
deduced from this measurement since the cornea has a fixed focal length in an
individual. The
line of sight of the eye can be derived from the eye position data extracted
when recording eye
movement data. Combining this line of sight with the focal length of the lens
provides
information on the point in space the eye was fixated on. The road-facing
camera on the frame
captures video in real-time, and this can be combined with the eye fixation
point to determine
what object was being fixated on.
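A minimal sketch of how the fixation point in space might be estimated by combining the line of sight with the autorefractor reading, under the simplifying assumption of an emmetropic eye whose accommodation (lens power above its relaxed value) equals the vergence 1/d of the fixated object. The relaxed lens power constant and field names are illustrative assumptions, not values from the disclosure.

    import numpy as np

    RELAXED_LENS_POWER_D = 19.0   # assumed relaxed (far-focused) lens power, diopters

    def fixation_point(eye_position, gaze_direction, lens_power_d):
        """eye_position: 3D eye position in metres (from head/IMU tracking).
        gaze_direction: line-of-sight vector from the eye movement data.
        lens_power_d: current crystalline-lens power from the autorefractor, diopters.
        For an emmetropic eye, accommodation equals the vergence 1/d of the
        fixated object, so d = 1 / accommodation."""
        accommodation = lens_power_d - RELAXED_LENS_POWER_D
        if accommodation <= 0.05:
            return None   # essentially relaxed: fixation at optical infinity
        distance_m = 1.0 / accommodation
        g = np.asarray(gaze_direction, dtype=float)
        g = g / np.linalg.norm(g)
        return np.asarray(eye_position, dtype=float) + distance_m * g

    # Example: lens power 19.8 D -> ~0.8 D of accommodation -> fixation about 1.25 m away.
    print(fixation_point([0.0, 1.2, 0.0], [0.0, 0.0, 1.0], 19.8))

The resulting 3D point can then be projected into the road-facing camera frame to identify the fixated object.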
[0113] Fig 14c shows an embodiment of a frame mounted eye movement and ambient
sound
imaging system (upper figure), and the bottom view (lower figure) of this
frame. Only one side
of the symmetrically placed components are identified in the figure. Prescription eyeglasses (1403a) are mounted on the frame (1403) using a hard polymer clamp (1403b); an IMU (1403c, 1403d) and a binaural recording device (1403e) are also present. In the space between these eyeglasses and the eyes, each side of the frame has two eye movement recording cameras (1403f, 1403g), two IR illuminators (1403h, 1403i), and an autorefractor (1403k) that is used to record in near real-time where the eye is focused. Outside this space (that is, outside the eyeglasses), a rear facing (road facing) camera (1403j) captures images of the scene in front of the driver. These road facing cameras are connected to the main frame by a transparent hard polymer U-shaped member (1403l), the U-shaped member going around the eyeglasses. If prescription eyeglasses are not required, then the U-shaped member is not required; instead, the road-facing camera can be attached directly to the frame, for example, just behind one of the eye-facing cameras.
The autorefractors in this embodiment do not face the pupil, but instead face
the eyeglasses. The
eyeglasses have an IR reflective coating applied on their inner surface (the
surface closer to the
eyes). This coating can provide 100% IR reflectivity (at the wavelength specific to the light source used by the autorefractor) at around a 30-60 degree angle of incidence. In
effect, the eyeglasses act
as mirrors at this wavelength. In another embodiment, the autorefractor and
the eye imaging
cameras can share the same IR illumination source, with the sources having a
pattern also
suitable for the autorefractor. The autorefractor records in almost real-time the focal length of the eye's lens. As in the previous embodiment, this data can be combined with the
eye fixation point
to determine what object was being fixated on. In another embodiment (not
shown), the system
of fig 14c can be used without the autorefractor.
[0114] Any of the previously discussed frame mounted eye and sound imaging systems can be used with a reduced or increased number of components. For example, the frame
could have one
or more eye-facing cameras for each eye, with one or more IR illuminators. If
needed, the frame
can be made for imaging only one eye (for example, the dominant eye), the
other portion of the
frame being empty. The binaural recorders can be dispensed with if not
required, and the same
with the road-facing cameras and/or IMU sensors. Any of the previously disclosed binaural sensors can be incorporated into any of the frames of figs 14a-14c, as long as they are of a size that can be mounted on the frame without causing inconvenience to the driver. Furthermore, the binaural sensors can be incorporated into other parts of the car or onto the driver, including the head rest, the roof, or the driver's head.
[0115] Fig 15a shows a scenario of eye movement data being used to train a
vehicle to improve
its autonomy. It shows the first image of the video, with 2.5 seconds (starting from the time of the first image) of saccades and fixations overlaid on this image. A car is
being driven by a
human driver. The car being driven by the driver is not shown. What is visible
to one of the
outside facing cameras is shown in the figure. The driver's view is vignetted
by the frame of the
car, and what is visible to the driver are parts that are covered by glass
(like the windshield and
windows). As the driver is driving the car, the eye movement imaging system
(dashpad mounted
or head mounted or a combination) captures the eye movements of the driver. An
image analysis
system extracts data related to at least saccades and fixations, and
optionally also data related to
glissades, smooth pursuits, microsaccades, square wave jerks, drifts and
tremors. Figure 15b
shows saccades and fixations isolated from fig 15a for clarity. Saccades are
shown by lines with
arrows, fixations are shown as circles, the largest fixation circle being 600
ms, the smallest 200
ms. Time and geolocation stamps are gathered along with outside video, driver's eye movement video, and LIDAR. It should be appreciated that not all data may be available at all times, for example, during blinks, driving through tunnels, and poor weather conditions, but the available data is recorded at all times. This data is saved in an AV's software. The saved data indicates to the AV
which parts of the segment of this path need to be analyzed carefully or with
higher priority and
appropriate actions taken whenever necessary. Much higher computational
efficiencies can be
attained if foveated portions of an image are analyzed instead of the entire
image. Also, foveated
regions can be processed for color information, while the peripheral vision
can be analyzed for
moving objects, flashing objects, and sudden movement, lending itself to much
faster, accurate
and efficient computation.
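A minimal sketch of the foveated processing idea described above: the gaze-centred crop is returned for detailed analysis, while the periphery is scanned only with cheap frame differencing for motion. Crop size, block size and thresholds are illustrative assumptions.

    import numpy as np

    def foveated_regions(frame, prev_frame, gaze_xy, fovea_px=160, motion_thresh=25):
        """frame, prev_frame: HxW grayscale uint8 images from the road-facing camera.
        gaze_xy: (x, y) driver gaze point mapped into the frame.
        Returns (fovea_crop, peripheral_motion_boxes): the gaze-centred crop to be
        analyzed at full detail, and coarse boxes in the periphery where simple
        frame differencing suggests movement."""
        h, w = frame.shape
        x, y = int(gaze_xy[0]), int(gaze_xy[1])
        half = fovea_px // 2
        x0, y0 = max(0, x - half), max(0, y - half)
        x1, y1 = min(w, x + half), min(h, y + half)
        fovea_crop = frame[y0:y1, x0:x1]

        diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
        diff[y0:y1, x0:x1] = 0                       # scan the periphery only
        boxes = []
        block = 64
        for by in range(0, h, block):
            for bx in range(0, w, block):
                if diff[by:by + block, bx:bx + block].mean() > motion_thresh:
                    boxes.append((bx, by, min(bx + block, w), min(by + block, h)))
        return fovea_crop, boxes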
[0116] In the scenarios of figs 16-20, time and geolocation stamps are gathered along with all sensor data (as in fig 20), including outside video, driver's head/dashpad mounted outside facing camera video, driver's eye movement video, binaural audio, foot and hand sensor data, speedometer, RPM, wheel turning angle, weather (temperature, precipitation, visibility, humidity), LIDAR, radar, and ultrasound. Signatures are relative to each frame in a video, or a series of consecutive frames of the video. The values of these events are recorded within each video frame (as metadata) or in a separate file (but with synchronized timestamps and/or position data) as multi-dimensional vectors that include timestamps, physical location (GPS/IMU), vehicle, outside environment and human sensor (shown in fig 22) data. It should be appreciated that not all sensor data may be available at all times; for example, when using a mobile phone to record sound and eye movement, binaural data will not be available, just single-microphone data. Or, the driver may be driving a vehicle that doesn't have all the sensor systems installed. Absence of some of these sensor systems doesn't take away from the fact that event signatures can still be extracted, although with a loss of robustness and a possible increase in latencies.
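A minimal sketch of how such a multi-dimensional, per-frame signature record might be laid out for storage; the field names are illustrative assumptions, and missing sensors simply leave their fields empty.

    from dataclasses import dataclass, field, asdict
    from typing import Optional, Dict, List
    import json

    @dataclass
    class EventSignatureSample:
        """One per-frame (or per-frame-group) sample of an event signature."""
        timestamp_ms: int
        gps: Optional[tuple] = None              # (lat, lon); None if unavailable (e.g. tunnel)
        imu: Optional[tuple] = None              # (ax, ay, az, yaw_rate)
        speed_kmh: Optional[float] = None
        eye_events: Dict[str, float] = field(default_factory=dict)   # e.g. {"fixation_ms": 300}
        foot: Dict[str, float] = field(default_factory=dict)         # pedal/foot positions
        hand: Dict[str, float] = field(default_factory=dict)         # grip force, contact area
        aural: Dict[str, float] = field(default_factory=dict)        # e.g. siren band energy

    @dataclass
    class EventSignature:
        category: str                             # e.g. "Danger", "Child"
        samples: List[EventSignatureSample] = field(default_factory=list)

        def to_json(self) -> str:
            return json.dumps(asdict(self))

    # Missing sensors leave their fields empty, mirroring the note above that absent
    # data reduces robustness but does not prevent extraction.
    sig = EventSignature("Danger", [EventSignatureSample(0, gps=(12.97, 77.59), speed_kmh=62.0,
                                                         aural={"siren_500_1500hz": 0.8})])
    print(sig.to_json())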
[0117] Fig 16 depicts a scenario of a human driver driving a car on a city road with several intersections and light traffic. In this figure and its accompanying description, the car being driven by the driver is not shown, and all references to a driver relate to the driver of this car. However, other cars (1601, 1602) on the road are shown. The figure shows roads,
buildings, and
other features from the perspective of the driver. The A-beams of the car are
not shown in the
figure, only the area within the windshield. The car has human, outside
environment and vehicle
sensors (as listed in fig 20).
[0118] An active ambulance is nearby, but not yet visible to the driver
because it is hidden by a
building (1603). The ambulance's sirens can be heard, but its flashing lights
are not yet visible to
the driver because buildings are blocking the view of the perpendicular roads
ahead. Sounds,
unlike light, are not completely blocked by buildings and trees. It appears to
the driver that the
ambulance is on one of the cross-roads since the road ahead and behind are
clear.
[0119] When the ambulance's sirens become audible and discernible, the driver takes his foot off the accelerator and moves it over the brake pedal, while saccading to the
rearview mirror,
driver-side sideview mirror, and left and right side in front to find out
where the ambulance is.
This saccading pattern is repeated until the driver is able to aurally
establish the origin of the
sound as coming from the front. After this, the driver's saccades are directed
towards that region
in the front. As soon as the reflections of flashing lights (1604) of the
ambulance are seen by the
subject (reflections bouncing from buildings, road and trees), the brake pedal
is depressed
slightly (inversely proportional to how far ahead the ambulance's lights are).
The brake pedal is
then continuously depressed to slow the vehicle to bring it to a rolling stop
if and when the need
arises. As soon as the ambulance exits the intersection (1605), the
accelerator pedal is depressed
to speed up the car if there are no other emergency vehicles following the
ambulance. The
binaural recording provides an extractable signature for the ambulance's
siren. The human event
occurrence detection scheme of fig 26a is used to detect that a human event
has occurred in fig
16 since there is a foot release from the accelerator and movement over the
brake pedal and also
an associated aural event (ambulance siren) detected. Once a human event has
been detected, the
next step is to find the associated outside event that caused the human event
to occur. The
associated eye movement data is used to analyze the video images of the road
ahead and behind
(from road facing cameras) for detectable events. The image analysis is
efficient because only
the small parts of the images where the eyes are saccading and fixating are
analyzed. The initial
faint lights of the ambulance are detected in the video images. Critical
features include flashing
lights and specific colors of the light. This forms the process of event
signature extraction of fig
26b. Extracted components include aural (siren sound), video (flashing light),
and foot (slowing
down of car). This is followed by the categorization, map update and training
software update as
shown in fig 26b. Several such instances under different conditions and from
different drivers
and geographical regions are similarly extracted and stored in the database.
The "ambulance"
event (in the form of a subroutine for emergency vehicle identification and
reaction) can first be
implemented in test vehicles. These test vehicles can be semi or fully
autonomous. A variation of
this scenario is when there is no light, just sound- which can be the case in
crowded cities, for
example. In such instances, the only the binaural signal is captured. When
using non-binaural
recording (mobile phone with a single microphone, for example), directionality
will be lost, but a
sound signature can still be extracted, and combined with other human and
vehicle (outside and
inside) sensor data.
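A minimal sketch of the kind of rule a human event occurrence detector (as in fig 26a) might apply: a motor cue (foot or hand) combined with a perceptual cue (aural or eye) declares a human event, which then triggers the search for the associated outside cause. The thresholds and feature names are illustrative assumptions.

    def detect_human_event(foot, hand, aural, eye):
        """Each argument is a dict of recent sensor-derived features (illustrative names):
        foot:  {"accel_released": bool, "over_brake": bool}
        hand:  {"grip_increase": float}        # relative increase in grip force
        aural: {"siren_detected": bool, "direction_deg": float}
        eye:   {"mirror_saccades_per_s": float, "new_roi": bool}
        Returns a (detected, evidence) pair; evidence lists which cues fired."""
        evidence = []
        if foot.get("accel_released") and foot.get("over_brake"):
            evidence.append("foot: accelerator released, hovering over brake")
        if hand.get("grip_increase", 0.0) > 0.2:
            evidence.append("hand: grip/contact area increased")
        if aural.get("siren_detected"):
            evidence.append("aural: siren-like signature")
        if eye.get("mirror_saccades_per_s", 0.0) > 1.0 or eye.get("new_roi"):
            evidence.append("eye: scanning mirrors / new ROI formed")
        # Require at least one motor cue plus one perceptual cue before declaring an event.
        motor = any(e.startswith(("foot", "hand")) for e in evidence)
        perceptual = any(e.startswith(("aural", "eye")) for e in evidence)
        return (motor and perceptual), evidence

    detected, why = detect_human_event(
        {"accel_released": True, "over_brake": True},
        {"grip_increase": 0.1},
        {"siren_detected": True, "direction_deg": 10.0},
        {"mirror_saccades_per_s": 2.5, "new_roi": True})
    print(detected, why)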
[0120] Signatures from multiple instances of such ambulance appearances from different subject drivers can be used to form an averaged scenario signature (including sound and light signatures) and an AV response to an approaching ambulance. This can be done for a group of drivers in a region having similar flashing light schemes and sounds for ambulances, and also similar traffic rules regarding ambulances. Although one instance can be used for training, for improved accuracy, several such events from several different paths driven by several different drivers can be acquired and used to train AVs. This specific subroutine is then fine-tuned by software self-learning (artificial intelligence), by a (human) programmer, or by a combination. After several cycles of fine-tuning and testing, the subroutine can be implemented in non-trial AVs. Without this updated software, the AV would have continued without reducing speed significantly until an ambulance actually appeared.
[0121] Fig 17 shows a scenario in which a small child is ambling towards the edge of the road on the opposite lane. A human driver is driving a car at around 36 km/hour on this
narrow undivided road with buildings on either side. The driver sees the child
(1701) emerging
from behind a pillar (1702) without an accompanying adult, 100 meters ahead.
The edge of the
pillar is 2 meters from the edge of the road. In reality, an adult is holding
the hand of the child,
but is behind the pillar and therefore hidden from the driver's view. The
driver's eyes saccade to
the child and form an ROI around the child (ROI-child), which includes
checking for adults
holding the child's hand, and tracking the child moving closer to the road,
interspersed with
saccades to the road ahead. The driver has now become alert, and has increased his hand grip and contact area on the steering wheel. The foot goes off the accelerator and over the brake pedal immediately. With the eyes unable to find an accompanying adult, and the child
being about 70
meters ahead, brakes are applied to lower the speed from the initial 36
km/hour to 18 km/hour in
a span of 1.5 seconds. As the eyes saccade to and fro between the ROI-child
(which is now about
60 meters ahead of the driver) and the road as the child inches closer to the
road, the driver is
still unable to spot an adult. The car is slowed from 20 km/hour to 5 km/hour
in 2 seconds. The
child is 1.5 meters from the edge of the road, and the car is about 50 meters
from the child. The
brake pedal is kept depressed in preparation for a complete stop to take place
about 10 meters
from the child. However, 30 meters from the child, the driver is able to see
the adult holding the
child's hand. The foot goes off the brake now. The adult (who has apparently
seen the
approaching car) restrains the child from moving forward. The driver presses
on the accelerator
pedal to slowly bring back the speed to 36 km/hour. This signature is captured and processed, and then filed in an "unaccompanied child approaching road" sub-category under the main category "Child" (as listed in fig 25); this process is described later. From the
foregoing, it can be seen
that the driver was being over-cautious. He reduced the speed to 5 km/hr at 50
meters from the
child, even though the child was 1.5 meters from the road. However, when data
is gathered from
a large population of drivers, the average speed at 50 meters from the child
would be 20 km/hr,
and can be used by an actual AV.
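A minimal sketch of how per-driver speed-versus-distance traces could be binned and averaged into the population profile mentioned above; the trace format, bin width and example values are illustrative assumptions.

    from collections import defaultdict

    def average_speed_profile(traces, bin_m=10):
        """traces: list of per-driver traces, each a list of (distance_to_child_m, speed_kmh).
        Returns {bin_start_m: mean_speed_kmh} averaged over all drivers, binned by distance."""
        sums = defaultdict(float)
        counts = defaultdict(int)
        for trace in traces:
            for distance, speed in trace:
                b = int(distance // bin_m) * bin_m
                sums[b] += speed
                counts[b] += 1
        return {b: round(sums[b] / counts[b], 1) for b in sorted(sums)}

    # Two illustrative drivers: the over-cautious one above and a more typical one.
    driver_a = [(100, 36), (70, 18), (50, 5), (30, 5), (10, 20)]
    driver_b = [(100, 40), (70, 36), (50, 35), (30, 25), (10, 20)]
    print(average_speed_profile([driver_a, driver_b]))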
[0122] The human event occurrence detection scheme of fig 26a is used to
detect that a human
event has occurred in fig 17 since there is a sudden foot release from the
accelerator and
movement over the brake pedal and increase in hand grip and contact area on
the steering, both
with associated eye-movement to the side of the road and formation of ROIs and
saccades/fixations around the child. Once a human event has been detected, the
next step is to
find the associated outside event that caused the human event to occur. Video
images from
cameras facing the road are analyzed using image processing, and the child is
identified as
corresponding to the eye movement data, as also the edge of the road. Critical
features like lack
of adult accompanying the child, and the spacing between the child and the
road are stored as
part of the signature. This forms the process of event signature extraction of
fig 26b. This is
followed by the categorization, map update and training software update as
shown in fig 26b.
Several such instances under different conditions and from different drivers
and geographical
regions are similarly extracted and stored in the database. When the updated
training software is
used by an AV, and the AV encounters a similar "unaccompanied child
approaching road"
scenario, it reduces speed and analyzes the ROI around the child at a higher priority, reducing speed to about 20 km/hour by the time it gets within 50 meters of the child. Once the adult is detected, the speed is resumed to 36 km/hour. Without this updated averaged software, the AV
would have continued without reducing speed, and an accident could have
probably occurred if
the child was actually unaccompanied and entered the road. The additional
benefit of using the
updated software is that higher speeds can be maintained without being overly
cautious, and
rational speed decisions can be made depending on how the situation evolves.
[0123] Fig 18a shows an aerial view of a scenario in which a human driver is
driving a car
(1801a) that is approaching a roundabout having three roads entering it. All
references to a driver
in this scenario relate to the driver of the car. A heavy truck (1801b) is
also approaching the
roundabout (1803). Both the car and the truck are traveling in the directions
shown, and are 200
meters from the roundabout. The car is traveling at 80 km/hour and the truck
slightly slower at
75 km/hour. The distance (1804) between the car's lane's entry point into the
roundabout and the
truck's lane's entry point is about 75 meters. Fig 18a shows this starting scenario. The truck is not slowing down as it gets closer to the roundabout. The car
has the right of way,
but the driver is not sure if the truck will eventually stop. The truck
appears in the driver's
peripheral vision, and the driver makes a saccade towards the truck, and then
slow tracks it (for
about 3 seconds) as it approaches the roundabout. During this period, the
driver's grip on the
steering wheel and the contact area increase slightly. The foot goes off the
accelerator, but does
not move over the brake pedal. The driver then makes a saccade towards the
roundabout to check
if there are other vehicles in or about to enter the roundabout (vehicles
inside the roundabout
have the right-of-way), and observes that the roundabout is clear. The driver's eyes then quickly saccade to the truck to slow-track it for another 3 seconds. Since the truck
is not slowing down,
but continuing towards the roundabout, the driver's foot goes over the brake
pedal and depresses
it to halve the speed from 80 km/hour to 40 km/hour in 4 seconds. Fig 18b
shows the perspective
view of the scenario at this time, wherein the truck is about 40 meters from
the roundabout and
starts slowing down rapidly and the car has already entered the roundabout.
The car driver has
been slow tracking the truck, and notices it is slowing down. The driver's foot goes off the brakes for 1.5 seconds, while the eyes saccade to the roundabout to check for entering traffic and saccade back to the truck (which has almost come to a complete stop); the foot then goes over the accelerator pedal and depresses it to rapidly speed up to 60 km/hour and enter the roundabout. The
eyes fixate on the truck while crossing the roundabout. The scenario beginning
at fig 18a,
proceeding through 18b, and ending after the car has exited the roundabout, is
captured and a
signature extracted and categorized under "Danger" (see signature
categorization in fig 25 and
related text).
[0124] Fig 19a shows a scenario where a driver driving a car encounters a maintenance truck in the same lane replacing street lights. In fig 19a, the car is not shown, only the maintenance truck (1901) is shown. The truck has a flashing yellow light (1902), and an
extended boom
(1903) with a platform (1904) having a person on it. The car is 60 meters from
the truck and
traveling at 40 km/hour, and the truck is 40 meters from the intersection
(1905). The car is on a
'right turn only' lane, and intends to turn right at the intersection. The
driver sees the truck on the
lane. The driver's eyes saccade to the truck body, then to the boom and the
platform above, and
then to the person on the platform. The eyes establish an ROI around the
truck, boom and person,
saccading around it, while the hand grip and contact surface area on the
steering wheel increases.
The foot simultaneously goes off the accelerator and on to the brake pedal,
slightly depressing it.
The eyes then saccade to the rearview mirror and sideview mirror, end of the
road (which the
driver notices is about 40 meters from the intersection), and then back to the
truck. The car is
slowed down to 15 km/hour over 3 seconds. The car is now 30 meters from the
truck. The human
driver instinctively decides to drive around the truck by switching to the
other lane on the same
side without expecting the truck to start driving away. After this, the driver
switches back into the
original lane. If the truck were parked at the intersection, then the human
driver would have
switched lanes and taken an alternate route, for example, going straight
through the intersection.
The decision to switch lanes to get around the truck involved the eyes
establishing an ROI
around the truck-boom-platform, and saccading and fixating within this region,
and also to the
rear/sideview mirrors and the intersection, deciding it is safe to switch to
another lane and back
again (while mentally noting that there is no traffic in rear/side view
mirrors, and there is enough
distance between truck and intersection). The signature of this event is
captured (as described in
the previous scenarios), and categorized under "Unexpected Objects" (see
signature
categorization in fig 25 and related text), under a sub-category of
"Maintenance".
[0125] Fig 19b shows a scenario of a child on a bicycle on the pavement on the same side of the lane on which a human driver is driving a car. Fig 19b1 shows eye movement data
for the first 4
seconds of this scenario superimposed on a still image. Fig 19b2 shows just
the eye movement
data, while 19b3 shows an enlarged version of fig 19b2. The circles represent
fixation points and
time, the largest circle corresponding to a fixation time of 500 ms, while a
majority of them are
150 ms. The straight lines represent saccades, with directions indicated by
arrows. Over the
course of this scenario, there is no other distraction in the foveal or
peripheral vision, including
no traffic lights or other traffic. The car is on the rightmost lane and 100
meters away from the
child (1910), driving at 50 km/hour. There is no traffic on the road. The
driver's eyes saccades to
the child and the bike (1911), forming an ROI around it. The eyes-brain
combination conclude
that the bicycle is stationary, with both feet of the child on the ground, and
the front wheel is
close to the edge of the road. There are no adults accompanying the child (and
therefore the
CA 2986160 2017-11-20

child's actions can be more risky). The child appears very young in age,
perhaps 4-8 years old,
and therefore can perform unexpected moves, including riding the bike into the
road without
waiting for the car to pass, or stumbling and falling onto the road. Expecting
this, the driver's
grip and contact area on the steering wheel increases slightly, while the foot
goes off the
accelerator and goes over the brake pedal and depresses it to bring the speed
down to 25 km/hour
over 4 seconds, all the while saccading within the same ROI to detect
unexpected actions of the
child, except for one saccade to the end of the road and one slightly to the
right of this point. The
car is now 60 meters from the child. The child is closer, and the driver is
able to confirm that the
child is indeed very young, probably 4-6 years old. With no change in the
child's pose (i.e. the
child is well-balanced and stable, and not rocking the bicycle back and
forth), the driver's
apprehension level drops, but he remains very cautious because of the age of the child, and drops the
speed down to 15 km/hour in 4 seconds. The car is now 35 meters from the
child. The driver
halves the speed down to about 8 km/hour over 4 seconds, and is about 20
meters from the child.
The car proceeds at this very low speed until it is 5 meters from the child.
The driver then
removes the foot from the brake and depresses the accelerator pedal to bring
the speed to 40
km/hour in 3 seconds. The signature of this event is extracted and categorized
under "Child" (see
signature categorization in fig 25 and related text), under sub-category:
"Child on bicycle", sub-
sub-category "unaccompanied child on bicycle" and a further sub-category:
"unaccompanied
child on bicycle at edge of the road".
[0126] The learning here is that the driver's reaction is proportionally
related to the child's age,
distance from the edge of the road (inverse relationship), absence of
accompanying adults, and
present speed of travel. These reactions include saccades around the ROI, grip
and contact area
on the steering wheel, reduction in speed (including the quantum of reduction,
latency to starting
the reduction process, distance from the child before the reduction is
applied). In the AV
software, the image processing system processes these factors to form a
response, including
speed reduction. Without training, traditional AV software will not prepare for evasive actions or reduce speed to account for the unexpected. The overall speeds are on the lower end for AVs compared to humans because they are cautious all the time. Training AVs can
make them
faster, while helping build more logic and rationale to such situations. If a
very small child on a
small bicycle is being closely accompanied by an adult, then the image
processing will identify
the adult following the child's bike and become less cautious. There are
variations in such a
scenario: for example, there is an adult, but the adult is 5 meters away from
the child. Caution
and speed reduction will become greater now. Automatic identification of such
an
"unaccompanied child on bicycle at edge of the road" scenario will become
easier, efficient, and
more comprehensive when data from a swarm of drivers is used. The collection
of such scenarios
will grow with time and become well-defined algorithms in the training
software. Over time,
variations of "kid on a bike" (like "kid on skateboard") can be added to the
set of algorithms,
particularly as the test-base grows. New but unidentifiable variants can be
manually processed
for scenario detection and response.
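A minimal sketch of how the proportional relationships just described (child's age, distance to the road edge, presence of an adult, current speed) might be folded into a single caution score that scales the target speed; the weights and thresholds are illustrative assumptions.

    def caution_score(child_age_years, distance_to_road_m, adult_present, speed_kmh):
        """Returns a 0..1 caution score: higher means slow down more / sooner.
        Younger child, smaller distance to the road edge, no accompanying adult and
        higher current speed all raise the score (weights are illustrative)."""
        age_term = max(0.0, (12.0 - child_age_years) / 12.0)       # younger -> higher
        distance_term = 1.0 / (1.0 + distance_to_road_m)           # closer -> higher
        adult_term = 0.0 if adult_present else 1.0
        speed_term = min(speed_kmh / 100.0, 1.0)
        score = 0.35 * age_term + 0.25 * distance_term + 0.25 * adult_term + 0.15 * speed_term
        return round(min(score, 1.0), 2)

    def target_speed(current_kmh, score, floor_kmh=8.0):
        """Scale the current speed down by the caution score, never below a crawl."""
        return max(floor_kmh, current_kmh * (1.0 - score))

    s = caution_score(5, 1.5, adult_present=False, speed_kmh=50)
    print(s, target_speed(50, s))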
[0127] Fig 19c shows a scenario where a soccer ball rolls into a suburban road
on which a
human driver is driving a car. The car is traveling at 50 km/hour. The driver
notices the ball
(1921) entering the road 50 meters ahead from a point (1920a) behind a tree.
The driver's eyes
saccade to the ball and slow tracks it for about a second. The direction of
the ball is indicated by
broken line arrow (1920b). After confirming that it is a ball rolling into the
road, and anticipating
the possibility of a child following the ball into the road without watching
out for traffic, the
driver's grip and contact area on the steering wheel increase slightly. The foot goes off the accelerator pedal and onto the brake pedal without depressing it. The eyes stop tracking the ball, and instead saccade to the point from where the ball came, establishing an ROI around that area. After the car gets within 20 meters of point 1920a, the area around it becomes clearer (not hidden by trees or shrubs). The eyes saccade to a point that is a backward extension of the arrow and is 5 meters from the road, and establish an ROI there. The car has
meanwhile slowed to 45
km/hour (because the accelerator pedal was not depressed). Seeing no person
present there, the
driver assumes no one is following the ball, and returns the foot to the
accelerator pedal 5 meters
from point 1920a to return to a speed of 50 km/hour. The signature of this
event is then extracted
and categorized under "Child" (see signature categorization in fig 25 and
related text) rather than
"Unexpected Objects". A non-human navigating a vehicle will notice the ball
rolling across the
road, but will continue if the ball has exited the lane. A human would expect
a child to appear
unexpectedly following the ball. The eye movement pattern will be saccading to
the ball, smooth
pursuit for a short time, and saccading to the region from where the ball might have originated. Depending on the vehicle speed and distance to the ball, the foot might
move away from
the accelerator pedal and move over to the brake pedal at different speeds,
and might depress it
very little (or not at all) or a lot. However, the basic signature underlying
such variations will
have similar patterns.
[0128] Figs 19d1-19d3 show the scenario of a kangaroo that is about to enter a single-lane rural highway on which a human driver is driving a car at 100 km/hour. The sun set an hour earlier. The car has its high-beam lights on. The driver has been driving in
a relaxed manner,
with just two fingers and a thumb lightly touching (and not pressing down
hard) the steering
wheel. One hundred and fifty meters ahead, the driver sees an object moving in
his peripheral vision. His eyes saccade to the object, and he notices it is a 1.5 meter tall
kangaroo (1931). Both of
the driver's hands grab the steering wheel, gripping it (medium force) with
all fingers. The foot
simultaneously releases the accelerator pedal and moves over the brake pedal,
depressing it with
medium firmness. The car is now 100 meters from the kangaroo and moving at 70
km/hour. The
driver's eyes are slow tracking the kangaroo's eyes (which are glowing due to
the high beam of
the car) as it hops into the driver's lane. An experienced driver, he knows that kangaroos move in mobs, and there might be more of them following the one that just got
on the road. He
also knows that kangaroos often stop and stare at a car's blinding lights,
sometimes even turning
around from the middle of the road or right after just crossing it. He
continues pressing down on
the brake pedal to slow the car down to 50 km/hour, while forming an ROI
around the kangaroo
(but fixated on its glowing eyes whenever it looks at the car), slow tracking
it whenever it hops.
The kangaroo hops away into the far side of the road just as the car passes
it. The signature of
this event is extracted and categorized under "Danger" (see signature
categorization in fig 25 and
related text), under sub-category: "Animals", sub-sub-category "Kangaroo".
Incidents of
kangaroos on (or by the side of) the road are recorded and signatures
extracted. There will be
numerous variations of this signature. For example, the kangaroo stopped in
the middle of the
road and would not budge, or it turned around and hopped back into the car's
lane after reaching
the divider line, or there were more kangaroos following the original one.
However, common
aspects will include slow-tracking of hopping, or fixation on the kangaroo,
all of which can be
extracted from eye movement video, road facing camera video (IR and/or ambient
light), and
long range radar data, and combined with hand and foot sensor data. Pattern
analysis can be used
to identify both the kangaroo as well as bright spots (eyes) on the road and shoulders at night on rural or kangaroo-prone roads. Smooth pursuit when looking far away
from the side of
the road indicates the kangaroos are not close to the road, and therefore
there is no danger. The
gait of kangaroos varies with their speed. When they are just ambling or
feeding, they can use all
their limbs. While running at low speeds, they are on their hind limbs, but
not hopping very high.
When running fast, they are on their hind limbs and hopping much higher. The
gait of kangaroos
is also distinguished from that of other animals like cows because of the preference of kangaroos to use their hind limbs. This aspect (of kangaroos preferring hind legs for locomotion) can
be exploited by
the outside facing video image analysis to distinguish kangaroos from other
animals. With
numerous such events being captured under different conditions, a robust
automated kangaroo
detector and countermeasure subroutine can be formed. Capturing the appearance
(size, shape,
color) and gait of different animals under different conditions allows the
extraction of signatures
unique to each animal, and categorization under appropriate animal sub-
categories. It will be
appreciated that the signature extraction schemes in the various scenarios in
this disclosure not
only capture human actions and reactions to specific events, but they also
indirectly capture the
memories and experience of the human drivers, along with human logic,
deduction, rationality
and risk-mitigation since these are the factors that cause drivers to act and
react a certain way.
For example, the driver just discussed above knows from experience and memory that kangaroos move in mobs, that there might be many crossing the road, and that kangaroos have a tendency to linger on the road or hop back into the road after seeming to try to cross it. Using such signatures will reduce or negate the need for these
actions and reactions of
the driver to be programmed into AV software by a human programmer. Such
signatures carry a
wealth of human knowledge, experience and logic accumulated over years and
spread among a
wide variety of geographies and populations, and their trade-offs with
rationalization and risk
management, allowing safe, fast, efficient and pleasant transportation. As
societies transition
towards non-human vehicle operators, all of this is saved in signatures for use by AVs.
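A minimal sketch of how the gait cues described above (vertical hopping of the body, preference for hind limbs) might be turned into a coarse rule over a tracked bounding box. A deployed system would instead train a classifier on the many recorded signatures; the feature names and thresholds here are illustrative assumptions.

    def classify_animal_gait(track):
        """track: list of per-frame observations, each a dict with (illustrative keys)
        'centroid_y' (px), 'bbox_height' (px) and 'hind_limb_only' (bool) from the
        road-facing video analysis. Returns a coarse label."""
        if len(track) < 8:
            return "unknown"
        ys = [o["centroid_y"] for o in track]
        heights = [o["bbox_height"] for o in track]
        # Vertical oscillation of the centroid relative to body size indicates hopping.
        hop_ratio = (max(ys) - min(ys)) / max(1.0, sum(heights) / len(heights))
        hind_limb_fraction = sum(o.get("hind_limb_only", False) for o in track) / len(track)
        if hop_ratio > 0.3 and hind_limb_fraction > 0.6:
            return "kangaroo-like (hopping, hind limbs)"
        if hop_ratio < 0.1:
            return "non-hopping quadruped (e.g. cow-like)"
        return "ambiguous"

    # Example: a track whose centroid bobs strongly while only hind limbs contact the ground.
    track = [{"centroid_y": 300 + (40 if i % 2 else 0), "bbox_height": 90, "hind_limb_only": True}
             for i in range(10)]
    print(classify_animal_gait(track))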
[0129] Figs 19e, 19f show two scenarios of a dog on a leash by the side of the road and walking towards the road on which a human driver is driving a car. In the first instance (fig 19e), the dog (1941) is close to its human (1940), and the leash (1942) is sagging. In the other case (fig 19f), the dog (1941a) and its human (1940a) are 4 meters apart, with the leash (1942a) taut and the dog appearing to be tugging on the leash. In the first instance, the driver will not observe a possible danger, and will continue driving normally. In the second case, the driver will slow down, expecting the possibility that the leash could give way or that the dog could pull the human along with it as it runs into the road. The driver's eyes will saccade to the dog, form an ROI around it (and notice its body to check its size and whether its pose indicates tugging), then trace the leash and form a simple ROI around the human (and check if it is an adult, and the body pose to see how much control the human has). Depending on the outcome, the driver slows down or continues at the same speed, with corresponding hand grip/contact area and foot positions.
[0130] Figs 19e1, 19f1 show corresponding eye movements for figs 19e and 19f.
The eye
movement overlay is shown separately (for the sake of clarity) in fig 19e1a and fig 19f1a, and also shows added fixation details. The eye movement overlay in figs 19e1, 19e1a starts from
when the driver notices the dog in his peripheral vision and saccades to it,
and ends 2 seconds
after this. It should be appreciated that most eye movements are not
conscious. Saccade
directions are indicated by arrows, fixations are indicated by circles, with
the smallest circle
being about 100 ms, and the largest one 350 ms. The eye movement overlay in
figs 19f1, 19f1a
starts from when the driver notices the dog in his peripheral vision and
saccades to it, and ends 3
seconds after this. The dog and the human form separate ROIs in fig 19f1, but are a single ROI in fig 19e1. Signatures are extracted and categorized under "Danger", sub-category "Animal",
sub-sub category "Dog", which can have sub-sub-sub categories "large dog",
"small dog",
"seeing dog". The human event occurrence detection scheme of fig 26a is used
to detect that a
human event has occurred in fig 19f1a since there is a sudden foot release
from the accelerator
and movement over the brake pedal and an increase in hand grip and contact area on the steering wheel,
both with associated eye-movement to the side of the road and formation of
ROIs and
saccades/fixations around the dog-human combination. Once a human event has
been detected,
the next step is to find the associated outside event that caused the human
event to occur. Video
images from cameras facing the road are analyzed using image processing, and
the dog-human
pair are identified as corresponding to the eye movement data, as also the
edge of the road.
Critical features like the spacing between the dog and human, size of the dog,
leash curvature
(lack of), human pose, and distance to edge of road are stored as part of the
signature. This forms
the process of event signature extraction of fig 26b. This is followed by the
categorization, map
update and training software update as shown in fig 26b. Several such
instances under different
conditions and from different drivers and geographical regions are similarly
extracted and stored
in the database. When the updated training software is used by an AV, and the
AV encounters a
similar "big dog at edge of road tugging on leash held by human" scenario, it
reduces speed and
becomes cautious (analyzes the ROIs at a higher priority). Without this
updated software, the AV
would have continued without reducing speed significantly, and an accident
could have probably
occurred if the leash were to break or slip out of the human's hand, or if the dog, dragging its human, entered the road.
[0131] Fig 20 shows prior art vehicles with at least some level of autonomy.
The autonomy
conferring elements can be divided into two components: hardware and software.
The hardware
module has two layers: sensor and vehicle interface layers. The sensor layer
has environment
sensors and vehicle sensors. The vehicle interface layer has steering, braking
and acceleration
systems. The software component has four layers: perception, reaction,
planning, and vehicle
control. The perception layer has four components: localization, road
detection, obstacle
avoidance, and pose estimation. The sensor layer gathers information about the
environment
around the vehicle, for example, using its cameras, lidar, radar and
ultrasonic sensors. The data
typically relates to obstacles, surrounding traffic, pedestrians, bicycles,
traffic signs and lights,
roads and lanes, paths and crossings, GPS coordinates. Vehicle sensors gather
data on vehicle
speed, acceleration, turning, and direction vectors. Data from the environment
and vehicle sensor
is fed to the software component. The software component then analyses this data to determine the vehicle's present position relative to a map, what actions need to be performed by the vehicle, planning for future actions, and how the vehicle has to be controlled. The software component then sends control instructions to the vehicle interface layer.
[0132] Fig 21 shows an embodiment of a vehicle with improved autonomous
abilities and
functionalities. The vehicle has an enhanced set of environment and vehicle
sensors, the details
of which are shown in fig 22. In addition, the hardware component has human
sensors. The
hardware module (2101) has two layers: sensor (2105) and vehicle interface
(2106) layers. The
sensor layer has environment sensors (2105a), vehicle sensors (2105b) and
human sensors
(2105c). The vehicle interface layer has steering (2106a), braking (2106b),
acceleration (2106c),
signaling (2106d) and communication (2106e) systems. The software component
(2102) has four
layers: perception (2107), reaction (2108), planning (2109), and vehicle
control (2110). The
perception layer has four components: localization (2107a), road detection
(2107b), obstacle
avoidance (2107c), and pose estimation (2107d). The sensor layer gathers
information about the
environment around the vehicle, for example, using its cameras, lidar, radar
and ultrasonic
sensors. The sensor layer additionally gathers data about the vehicle
operator's eye movements,
foot position, contact area and grip of the hands on the steering wheel. Data
from the
environment, vehicle and human sensors is fed to the software component. The
software
component then analyses this data to determine its present position relative
to a map, what
actions need to be performed by the vehicle, planning for future actions, and
how the vehicle
has to be controlled. The software component then sends control instructions
to the vehicle
interface layer.
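A minimal sketch of the layered flow just described (sensors feeding perception, reaction, planning and vehicle control), showing where a database of trained event signatures could plug into the reaction layer; class and method names are illustrative assumptions.

    class EnhancedAVPipeline:
        """Illustrative skeleton of the layered software component described above."""

        def __init__(self, signature_db):
            self.signature_db = signature_db   # trained event signatures (e.g. "Danger", "Child")

        def perceive(self, env, vehicle, human):
            # Localization, road detection, obstacle detection, pose estimation.
            return {"position": env.get("gps"), "obstacles": env.get("obstacles", []),
                    "human_cues": human}

        def react(self, perception):
            # Match current perception against stored event signatures.
            for sig in self.signature_db:
                if sig["matches"](perception):
                    return {"category": sig["category"], "priority": sig["priority"]}
            return None

        def plan(self, perception, reaction):
            if reaction and reaction["priority"] == "A":
                return {"target_speed_kmh": 20, "analyze_roi_first": True}
            return {"target_speed_kmh": 50, "analyze_roi_first": False}

        def control(self, plan):
            # Translate the plan into steering / braking / acceleration commands.
            return {"brake": plan["target_speed_kmh"] < 40, "steer_deg": 0.0}

        def step(self, env, vehicle, human):
            p = self.perceive(env, vehicle, human)
            r = self.react(p)
            return self.control(self.plan(p, r))

    db = [{"category": "Danger", "priority": "A",
           "matches": lambda p: any(o == "kangaroo" for o in p["obstacles"])}]
    pipeline = EnhancedAVPipeline(db)
    print(pipeline.step({"gps": (0, 0), "obstacles": ["kangaroo"]}, {}, {}))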
[0133] Fig 22 shows details of an enhanced set of environmental sensors that
include human
sensors. Environmental sensors (2200) include sensors to sense the environment
outside the
vehicle (2210), sensors to sense vehicle functioning (2230), and human sensors
(2250). Outside
environment sensors (2210) include: visible cameras (2211) to capture visible
wavelength
images outside the vehicle, including front, rear and side facing cameras,
infrared cameras
(2212) to capture 360 degree images in the infrared wavelength. Lidars (2213)
are time of flight
distance measurement with intensity sensors using pulsed lasers in the 0.8-2
micron (infrared)
wavelength range. Lidars provide a 3D map of the world around the vehicle,
including distances
to objects. Radars (2214) map the position of close-by objects, while sonar (ultrasonic) sensors (2215) detect nearby objects. A ferromagnetic sensor (2216) detects ferromagnetic objects, particularly those on the road, including buried strips. GPS (2217) uses global positioning satellites to determine the vehicle's position. Other environment sensors
include fog (2218), snow
(2219) and rain (2220) sensors. Blinding (2221) sensors detect light that is
blinding the driver,
including sun low on the horizon and high-beam headlights from vehicles coming
from the
opposite direction. Vehicle sensors (2230) sense the vehicle's actions, performance and instantaneous position, and include sensors for measuring current brake force (2231) and steering angle (2232), detection of turn signals (2233), status of lights (whether headlights are turned on/off, and high beam) (2234), RPM (2235), odometer (2236), speed (2237),
handbrake position
(2238), cruise control settings (2239), ABS activation (2240), readings of the
vehicles inertial
measurement units (IMU) (2241), and vibration sensors (2242) that detect
unusual vibration of
the vehicle, for example, from rumble strips, alert strips, speed bumps,
gravel, and potholes.
Human sensors (2250) include eye movement sensors (2251), foot position
sensors (2252) and
hand grip and contact area on steering wheel sensors (2253), and aural (2254)
sensors.
[0134] Fig 23 shows the different kind of human sensors (2250) used, and the
events they
contribute to recording. Eye movement sensors (2251) detect the following
events: saccades
(2251a), glissades (2251b), fixations (2251c), smooth pursuits (2251d), microsaccades (2251e), square wave jerks (2251f), drifts (2251g) and tremors (2251h). Foot movement
sensors detect 3
aspects: xyz position of brake pedal (2252a), xyz position of acceleration
pedal (2252b), and xyz
position of the foot (2252c) of the driver. See fig 10b and fig 10c (and
associated text) for details
on quantities measured. The combination of 2252a, 2252b and 2252c helps make a
determination
of where the foot is with respect to the brake and accelerator pedals, and
whether either one of them is being depressed, and to what extent. Hand
contact area and
grip sensors detect the hand contact area and grips on the steering wheel. The
left hand contact
area (2253a) and its grip force (2253b), and the right hand contact area
(2253c) and its grip force
(2253d) on the steering wheel are sensed and measured as discussed under fig
9a-9e (and
associated text). Aural sensors (2254) help detect various events having
associated sounds like:
emergencies (2254a) (police, ambulance and other emergency vehicle sirens), dangers (2254b) (sounds of wheels screeching, honking by other vehicles, etc.), alerting sounds (2254c), warning sounds (2254d) (for example, police using handheld loudspeakers for warning), Doppler detection (2254e) (for example, to detect if a police siren is approaching the vehicle or receding away), and accidents (2254f) (sounds of crashes, fender benders, thuds). Aural events also include normal ambient sounds outside the vehicle (2254g) and inside the vehicle (2254h) (which in essence means no abnormal events are occurring), and directionality (2254i) (the direction from which a particular sound is coming).
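A minimal sketch of how the three xyz readings (brake pedal, accelerator pedal and foot) might be combined into a foot-state determination; the hover distance, depression thresholds and units are illustrative assumptions.

    import math

    HOVER_DISTANCE_M = 0.08   # foot within 8 cm of a pedal counts as hovering (assumption)

    def foot_state(brake_xyz, accel_xyz, foot_xyz, brake_travel, accel_travel):
        """brake_xyz, accel_xyz, foot_xyz: (x, y, z) positions in metres (sensors 2252a-c).
        brake_travel, accel_travel: 0..1 normalized pedal depression.
        Returns a small dict summarizing where the foot is and what is being pressed."""
        d_brake = math.dist(foot_xyz, brake_xyz)
        d_accel = math.dist(foot_xyz, accel_xyz)
        if accel_travel > 0.02:
            position = "on accelerator"
        elif brake_travel > 0.02:
            position = "on brake"
        elif d_brake < HOVER_DISTANCE_M and d_brake < d_accel:
            position = "hovering over brake"
        elif d_accel < HOVER_DISTANCE_M:
            position = "hovering over accelerator"
        else:
            position = "away from pedals"
        return {"position": position,
                "brake_depression": round(brake_travel, 2),
                "accel_depression": round(accel_travel, 2)}

    # Example: foot just released the accelerator and is now about 5 cm above the brake pedal.
    print(foot_state((0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.02, 0.0, 0.045), 0.0, 0.0))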
[0135] Fig 24 shows an event identifying module. This module uses the data
from the sensors in
fig 23 to identify and extract events. It should be noted that the normal
outside ambient (2254g)
and normal inside ambient (2254h) have no events associated with them, and so
are not used by
the event identifying module except as reference data (since there is always
road, engine and
passing traffic noise, and these either have to be subtracted from event data
or remain unused if
there is no event). Although all the sensors are continuously sensing, events do not occur all the time; their occurrence is determined according to the scheme of fig 26a. The event identifying
module (2360) includes
3 sub-modules: outside environment event identifying module (2310), vehicle
sensor logger and
identifying module (2330), and human event identifying module (2350). Each of
these has its
own sub-modules. The outside environment event identifying module (2310) has
the following
sub-modules: visible cameras feature identifying module (2311), IR cameras
feature identifying
module (2312), Lidar feature identifying module (2313), long- (2314a), short- (2314b) and medium-range (2314c) radar feature identifying modules, ultrasonic feature
identifying module (2315),
ferromagnetic object feature identifying module (2316), GPS logger (2317), fog
density logger
(2318), snow visibility logger (2319), rain visibility logger (2320), and
blinding light (sun, high
beam) logger (2321). The vehicle sensor logger and identifying module (2330)
has the following
sub-modules: brake force logger (2331), steering angle logger (2332), turn
signal logger (2333),
headlight logger (2334), RPM logger (2335), distance logger (2336), speed
logger (2337),
handbrake logger (2338), cruise control logger (2339), ABS logger (2340),
inertial measurement
unit (IMU) logger (2341), and vibration logger and feature identifier (2342).
The human event
identifying module (2350) has the following sub-modules: eye movement event
identifying
module (2351), foot event identifying module (2352), aural event identifying
module (2354), and the hand event identifying module (2353).
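The three sub-modules of the event identifying module (2360) can be thought of as independent analyzers run over each time-slice of sensor data. The Python sketch below is a minimal illustration under assumed names; the individual feature identifiers and loggers listed above would be registered as callables.

    from typing import Any, Callable, Dict, List

    class EventIdentifyingModule:
        """Hypothetical wrapper around the three sub-modules of fig 24 (2310, 2330, 2350)."""

        def __init__(self) -> None:
            # each dict maps a sensor/feature name to a function that returns identified events
            self.outside_env: Dict[str, Callable[[Any], List[Any]]] = {}  # 2310: cameras, Lidar, radar, GPS, ...
            self.vehicle: Dict[str, Callable[[Any], List[Any]]] = {}      # 2330: brake force, steering, IMU, ...
            self.human: Dict[str, Callable[[Any], List[Any]]] = {}        # 2350: eye, foot, aural, hand

        def identify(self, frame: Dict[str, Any]) -> Dict[str, List[Any]]:
            """Run every registered identifier on one time-slice of sensor data."""
            identified: Dict[str, List[Any]] = {}
            for group in (self.outside_env, self.vehicle, self.human):
                for name, identifier in group.items():
                    if name in frame:
                        identified[name] = identifier(frame[name])
            return identified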
[0136] Fig 25 shows the categorization of event signatures (and their
priorities) so that they can
be stored, recalled and used appropriately. It will be appreciated that the
priorities are not in any
particular order. For example, priority B can be made the highest priority in
an AV's software.
The categorization process can use several variants. For example, it can be
based on eye
movements correlated with other human, vehicle, and outside sensors. For
example, saccades to
a point, fixation, and return saccades to that point followed by cautious
slowing down indicates a
possible unsafe situation. However, a saccade to a point and immediate slowing
down indicates a
more immediate danger. Such a scenario can be accompanied by rapid checking of
the side-view
and/or rear-view mirrors in anticipation of performing a cautionary action
like lane change or
complete stop. When analyzing this scenario for extracting training
information, if there is any
confusion as to what feature the eye had saccaded to because multiple objects
were present in the
line of sight but the objects are at different depths, autorefractor
information (if available) of
focal length of the eye's lens can be used to determine what was fixated on. From this scenario, several concepts can be extracted, including which features (and where they appear relative to the lane
on the road) require caution; the judged distance to the feature; the slow-down and braking profile depending on what the feature is; and the cautionary, defensive and evasive actions to be performed.
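Where the paragraph above mentions autorefractor information, one plausible realization (not spelled out in the patent) is to convert the measured accommodation of the eye's lens into a focal distance and pick the candidate object whose depth best matches it. The Python sketch below is an assumption-laden illustration; the function name, inputs and the dioptre-to-distance conversion are not taken from the text.

    from typing import List, Tuple

    def resolve_fixated_object(accommodation_dioptres: float,
                               candidates: List[Tuple[str, float]]) -> str:
        """Pick the candidate whose distance best matches the eye's focal distance.

        candidates: (object_label, distance_in_metres) pairs lying along the line of sight.
        """
        if accommodation_dioptres <= 0:
            # effectively focused at infinity; choose the farthest candidate
            return max(candidates, key=lambda c: c[1])[0]
        focal_distance_m = 1.0 / accommodation_dioptres  # dioptres are 1/metres
        return min(candidates, key=lambda c: abs(c[1] - focal_distance_m))[0]

    # e.g. resolve_fixated_object(0.2, [("parked car", 5.0), ("pedestrian", 40.0)]) -> "parked car"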
[0137] The event signatures include: Danger (priority A) 2501, Child (priority
B) 2502,
Efficiency (priority C) 2503, Courtesy (priority D) 2504, Special occasions
(priority E) 2505,
Weather related (priority F) 2506, New traffic situation (priority G) 2507,
Unclear situation
(priority H) 2508, Startled (priority I) 2509, Unexpected objects (priority J)
2510, Unexpected
actions of others (priority K) 2511, Sudden actions of others (priority L)
2512, Comfort levels-
speed, distance (priority M) 2513, Environment (low-light, sun-in-eyes, high-
beam) (priority N)
2514, and Legal (priority O) 2515.
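Because the priority letters carry no inherent ranking (as noted under fig 25, priority B could be made the highest in an AV's software), a natural encoding is a category enumeration plus a configurable ordering. The Python sketch below uses assumed names for illustration.

    from enum import Enum

    class EventCategory(Enum):
        DANGER = "A"                        # 2501
        CHILD = "B"                         # 2502
        EFFICIENCY = "C"                    # 2503
        COURTESY = "D"                      # 2504
        SPECIAL_OCCASIONS = "E"             # 2505
        WEATHER_RELATED = "F"               # 2506
        NEW_TRAFFIC_SITUATION = "G"         # 2507
        UNCLEAR_SITUATION = "H"             # 2508
        STARTLED = "I"                      # 2509
        UNEXPECTED_OBJECTS = "J"            # 2510
        UNEXPECTED_ACTIONS_OF_OTHERS = "K"  # 2511
        SUDDEN_ACTIONS_OF_OTHERS = "L"      # 2512
        COMFORT_LEVELS = "M"                # 2513
        ENVIRONMENT = "N"                   # 2514
        LEGAL = "O"                         # 2515

    # The ranking itself is configurable; an AV maker could, for example, rank CHILD above DANGER:
    priority_order = [EventCategory.CHILD, EventCategory.DANGER] + [
        c for c in EventCategory if c not in (EventCategory.CHILD, EventCategory.DANGER)
    ]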
[0138] Event signature Danger (2501) relates to events that are dangerous,
with potential for
human injury or property damage. For example, consider a scenario where a potential accident was averted because a heavy truck entered a road without yielding. The
event signature can
include eye movements (like saccades, fixations, slow tracking), binaural
recording, along with
hand and foot sensor data, all combined with road facing video of a situation
where a collision
with this truck could have potentially occurred, but the driver took evasive
action to avert this
accident.
[0139] Event signature Child (2502) relates to events associated with a child,
either averting an
accident, or driving cautiously in expectation of an unpredictable, illegal or
unexpected action by
a child. For example, consider a scenario in which potential injury to a child
was averted. The
child, along with a caregiver, is walking straight ahead along a sidewalk of a road. The driver on the road notices the child turning back and looking at a bird on the road's
divider. The driver
slows down expecting the child to cross the road to pursue the bird. The
caregiver is unaware of
what is going on. As expected, the child lets go of the caregiver and darts
across the road. The
driver is already slowing down and completely alert, and is prepared to stop,
and does stop a
meter from the child. Eye movement data, hand and foot sensor data, and
forward looking video
are all analyzed to extract relevant information and formulate an event
signature.
[0140] Event signature Efficiency (2503) relates to events that help in improving the efficiency of
transportation. This can be, for example, taking the shortest route, or taking
the fastest route, or
avoiding contributing to traffic congestion on a particular segment of
a path. These
scenarios are typical in congested portions of large cities. The driver takes
side routes which are
slightly longer but help get to the destination faster, and also help
prevent congestion at a
particularly notorious segment.
[0141] Event signature Courtesy (2504) relates to actions of the driver that
contribute to politeness,
civility and courtesy. This can be, for example, the driver slowing down to
let another car enter
the lane. In this situation, there is no other need or indicator for slowing
down, including legal
(traffic signs or laws), traffic conditions or other event categories. Eye
movement data, aural
data, hand and foot sensor data, and forward looking video are all analyzed to
extract relevant
information and formulate an event signature.
[0142] Event signature Special Occasions (2505) relates to non-normal
occasions, and the
driver's response to them. For example, traffic diversions are in place for a
popular tennis match.
Roads approaching the venue have traffic diversion signs. However, these signs
are of the road-side moving/scrolling display type. Such signs are not in the database of regular
traffic signs. The
driver follows these diversions, although this route is not the optimal one as
per the map of the
region. In ordinary circumstances, this action by the driver will be deemed
inefficient and scored
low. However, if the time-period for the segment of the path has already been
indicated as
Special Occasion, and the driver follows the diversions, then the actions of
the driver will be
used to extract an event signature. Such a signature can include: saccading to
the road-side display, which becomes a new region of interest (ROI), and saccades/smooth
pursuits following
the scrolling/moving letters within this ROI, while saccading back and forth
to the traffic ahead,
slowing down to read the signs (foot movement), gripping the steering wheel a
little tighter.
[0143] Event signature Weather Related (2506) relates to environmental (local
weather)
characteristics that cause a driver to change driving characteristics. For
example, during a first
rain, roads become slippery, and an experienced driver will slow down much more than usual when turning. During subsequent rains, the magnitude of slowing down
will reduce. As
another example, on a rainy day with wet and slippery roads, the driver will
maintain a longer
following distance, be more vigilant when traffic is merging (foot is more
often hovering over
the brake, with a lot more alternating acceleration and braking, while the
hands are firmly
gripped on the steering wheel, and there are a lot more saccades towards
adjacent lanes).
[0144] Event signature New Traffic Situation (2507) relates to a driver's
behavior during changed
traffic situations. This can include accidents ahead, lane closures, certain
segments being
converted to one-way roads. These situations will generally be a surprise to
drivers. Their
response to these situations will deviate from normal, and the routes they
take will vary from
what is required by a map. Hand and foot sensors will detect some
indecisiveness (unusual
slowing down, foot off the accelerator and hovering over the brake, with
intermittent pressing of
the brake pedal, both hands on steering), while eyes will register regions
with unusually slow-moving
traffic (saccades to various portions of oncoming as well as on road traffic)
which is confirmed
by forward looking camera video.
[0145] Event signature Unclear Situation (2508) relates to situations when the
driver is not sure
of what to do next. For example, when lane markers on roads are faded, drivers
traversing that
segment of the path after a long time will be confused as to the lane
boundaries. This can
translate into the foot getting off the accelerator and hovering over the
brake pedal without
depressing it, even though the speed limit for that segment is much higher.
Other examples are: a
situation when traffic lights are malfunctioning, or when another car has
turned on its turn indicator
but is not entering the lane the driver is on. Lack of clarity in these
situations can be traced from
saccades to and from different ROIs, hand grip pressure and foot position.
Aural sensors may not
detect any abnormality in ambient sounds.
[0146] Event signature Startled (2509) relates to an event in which the driver
is startled. In such
a situation, the driver becomes alert instantly. The hand-grip tightens
instantly, with more fingers and more surface area of the palms making contact with the
steering wheel.
The foot instantly gets off the accelerator and moves over the brakes, usually
depressing the
brakes at least slightly. Eye movements will indicate rapid saccades between
very few ROIs. An
example is when a truck behind sounds its air-horn unexpectedly. Another
example is a very
small bird flying across the road right in front of a car (for example, the bird enters the road 5
meters ahead when the car is traveling at 80 km/hour), startling the driver.
The bird has no
potential to damage the car. There will be a sudden foot movement from the
accelerator to the
brake, instantaneous grip and more contact area on the steering wheel, a
saccade to the bird and
then a very rapid smooth pursuit tracing the bird as it flies away, the
steering wheel grip-force
relaxing almost instantly but slower than at the beginning of this event (when
the bird was first
sighted) and the foot going back to the accelerator. This example event lasts
around a second or
two.
[0147] Event signature Unexpected Objects (2510) relates to an event in which
an unexpected
object appears to the driver. In such a situation, the driver becomes alert
gradually (for example,
as the object comes closer and its visual clarity increases). The hand-
grip tightens
gradually, with more fingers and more surface area of the palms
making contact with
the steering wheel as the object comes closer. The foot gets off the
accelerator and moves over
the brakes gradually. Eye movements will indicate a rapid saccade to the
object, and then
fixations and saccades within this region, and then a saccade to the rear-view
or side-view
mirror, and then a saccade back to and within the object ROI. An example is a
large bird hopping
across the road 100 meters ahead while the vehicle is traveling at 60 km/hour.
The bird has no
potential to cause major damage to the car. There will be a slow foot movement
from the
accelerator to the brake (which is not depressed) while a saccade to and
within the ROI that
defines the bird. This is followed by a slow smooth pursuit as the bird hops
away from the road,
the steering wheel grip force relaxing and the foot going back to the
accelerator. This example
event lasts over 3 seconds. Another example is pieces of shredded tire on a
highway appearing
starting 200 meters ahead while traveling at 100 km/hour.
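The practical difference between the Startled (2509) and Unexpected Objects (2510) signatures is largely one of time scale: grip force, contact area and foot movement change almost instantly (around a second) in the former and ramp up over several seconds in the latter. The Python sketch below illustrates one crude way to separate the two; the 80% rise criterion, the one-second window and the function names are assumptions, not values taken from the patent.

    from typing import List

    def classify_alerting_event(grip_force: List[float], contact_area: List[float],
                                sample_rate_hz: float, instant_window_s: float = 1.0) -> str:
        """Crudely separate a 'startled' response from an 'unexpected object' response.

        grip_force / contact_area: time series spanning the event, one sample per tick.
        Returns "startled" if the rise completes within instant_window_s, else "unexpected_object".
        """
        def rise_time(series: List[float]) -> float:
            lo, hi = min(series), max(series)
            if hi <= lo:
                return float("inf")
            target = lo + 0.8 * (hi - lo)  # time to complete 80% of the total rise
            for i, value in enumerate(series):
                if value >= target:
                    return i / sample_rate_hz
            return float("inf")

        fastest = min(rise_time(grip_force), rise_time(contact_area))
        return "startled" if fastest <= instant_window_s else "unexpected_object"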
[0148] Event signature Unexpected Actions of Others (2511) relates to events
that are dictated
by the actions of other vehicles. For example, when a car in front travels at
60 km/hour on a
highway marked 100 km/hour, the driver is forced to slow down. Such an event is
usually
accompanied by saccades to the object in front, then to the rear-view mirror
and then side-view
mirror, all the while the foot has moved from the accelerator to the brake and
the steering wheel
grip has tightened slightly along with a greater contact area. The driver is
not startled, nor is the
car in front an unexpected object.
[0149] Event signature Sudden Actions of Others (2512) are events where
the actions of
other vehicles on the road lead to a driver performing a reflexive or
conscious action. For
example, when a vehicle in an adjacent lane swerves very slightly (but stays
within its lane) into
the lane of a driver, the driver swerves instantaneously away, and then slows
down slightly. Eye
movements will indicate a sudden saccade as the swerving vehicle enters the
peripheral vision.
The saccade is followed by almost instantaneous foot movement away from the
accelerator and
onto the brake, which is depressed (there is no hovering over the brake, the
foot depresses it
immediately), while hand-grip and contact-area values increase instantly. The
time period for this
example event is about 0.5 seconds.
[0150] Event signature Comfort levels (2513) are event signatures surrounding
drivers' attempts
to adjust driving parameters to suit their comfort levels. This can be, for
example, adjusting
following distance, speed, lane position, preference for a longer route rather
than taking a much
shorter but crowded route, or avoiding driving close to very large vehicles.
These events are
typically much longer in time frame, with most sensor readings spread over a
larger time (slower
foot motions, grip on steering wheel is lighter and has less contact),
including slower speeds and
higher latencies for saccades, near-absence of microsaccades, and very low
amplitude glissades.
An example is when a driver driving on a divided multi-lane highway with
sparse-traffic
encounters a long segmented trailer (carrying two levels of cars) ahead. The
driver is
uncomfortable driving behind the trailer, and prepares to get ahead of it by
switching lanes and
merging back. Slow saccades are directed to the rear-view and side-view
mirrors, and a gradual
speeding up of the car (foot stays on the accelerator since there was no prior
braking for several
minutes before the start of this event) occurs. The steering wheel is
gripped a little tighter than
before (the previous grip was of very low value, and the contact was only a single hand and three fingers; the present grip becomes slightly tighter, with two hands and more
fingers engaged).
Saccades to the trailer, rear-view and side-view mirrors can all be, for
example, one to two
seconds apart during the lane change procedure. After the lane change and
getting ahead of the
trailer (for example, after 15 seconds), switching back to the original lane
involves slow saccades mostly directed to the rear-view and side-view mirrors.
[0151] Event signature Environment (2514) relates to driving behavior events
that are affected
by the environment; examples are low-light levels, the sun straight ahead and low
on the horizon,
high-beam lights of oncoming traffic. When any of these events happen rapidly
or unexpectedly,
the driver slows down, maintains a longer following distance, is more
cautious, all of which
mostly translate to foot hovering over or depressing brakes, tighter grip and
higher contact area
on steering wheel, without affecting saccades, glissades and microsaccades.
[0152] Event signature Legal (2515) relates to a driver's action while
following legal guidelines.
For example, a driver stopping the car at the instruction of a police officer
waving to the driver to
pull over, or giving way to a ministerial motorcade, or pulling over for a
random roadside breath
test. These events are not routine in any segment of a path, and may not happen
to every driver on
that segment. They can appear slightly similar to stopping at traffic lights,
but are distinguishable
because there are no traffic lights on the corresponding map. These events can
be accompanied
by the driver pulling away from the road and onto a side or a non-road area.
They can also be a
general slowing down, with slow tracking of vehicles on the driver's side
(faster traffic is on the
driver's side).
[0153] Fig 26a shows a human event occurrence detection scheme, while fig 26b
shows how this
detection scheme feeds data into an analysis scheme to extract signatures and use them to train AVs.
This scheme is used to make a determination as to when an outside event has
occurred. The
sensors are continuously capturing data. However, not all of this data
eventually goes towards
training an AV. Specific events occurring on the outside of the vehicle are
correlated with human
sensor data and vehicle sensor data. Eye movement events and aural events are
classed as primary
human events, while foot events and hand events are classed as secondary human
events. When
at least one each of primary and secondary human events have occurred, there
is a possibility
that this was caused by or in anticipation of an outside event. In fig 26b,
these human events are
compared to the pre-existing map to confirm whether the human events correspond to an outside event; if there is no correlation, no outside event has occurred. If there
is a correlation, then
there is an expectation that an unusual outside event (outside the car) has
occurred to which the
driver is responding. For example, on a divided highway with sparse traffic,
drivers might
increase speed when they realize they are driving below the speed limit, or
decrease speed when
the speed has increased over the speed limit. However, there was no outside
event that caused
these actions, and therefore no correlation between the human events and what
is happening
outside the car. Similarly, when following a routine path home from their
workplace, drivers will
have the usual patterns of saccades, glissades, microsaccades, fixations, hand
and foot sensor
readings, and similar aural recordings. In these cases, an unusual outside
event has not occurred
to cause a change in their normal driving pattern.
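The gating rule described above (at least one primary and at least one secondary human event must change before an outside event is even suspected) is simple to state in code. The Python sketch below assumes hypothetical event labels produced by the comparators of fig 26a.

    PRIMARY_HUMAN_EVENTS = {"eye_movement", "aural"}   # primary human events
    SECONDARY_HUMAN_EVENTS = {"foot", "hand"}          # secondary human events

    def outside_event_suspected(changed_events: set) -> bool:
        """True when at least one primary AND one secondary human event changed in the window."""
        return bool(changed_events & PRIMARY_HUMAN_EVENTS) and bool(changed_events & SECONDARY_HUMAN_EVENTS)

    # e.g. outside_event_suspected({"eye_movement", "hand"}) -> True
    #      outside_event_suspected({"eye_movement"})         -> False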
[0154] In fig 26a, data relating to eye movement (2351), aural (2354), foot
(2352) and hand
(2353) are fed to eye movement event comparator (2601), aural event comparator
(2604), foot
event comparator (2602) and hand event comparator (2603), respectively. The
comparison is
between the respective events at time T and time T+ΔT, where ΔT is a small
increment in time.
This comparison helps determine whether a change in a human event has occurred
in the time
period ΔT. Thresholds can be set to determine what constitutes a change. For
example, a 25%
increase in hand contact area on the steering wheel and/or a 50% increase in
total grip force on
the steering wheel can be set as the minimum for triggering a change-
determination. Similar
threshold settings can be used for other human events. The thresholds can be
tailored for
individuals, path locations (for example, rural versus urban), male/female
drivers, type of vehicle
being driven, time of day and other factors. If no change has occurred, the
comparison continues
starting with the next T, where the next T=T+ΔT. If a change has indeed
occurred, then a check
is made (2610) to see if at least one each of primary and secondary human
events have changed
for this time period. If the answer to this is in the negative, then the
determination is made that no
outside event has occurred (2611). If the answer is affirmative, then an
outside event has
probably occurred (2612), and the human events are compared (2613) with the
map
corresponding to the path segment for the same time period T to T+ΔT. The
results of this
comparison are shown in fig 26b. It should be noted that while all these
comparisons are going
on, each of 2351-2354 continuously feeds data to each of 2601-2604,
respectively. This
process continues irrespective of the outcomes at 2601-2604 and 2610.
Regarding eye
movements, it should be noted that tremors, microsaccades and drifts can be
used as alternatives
or to augment fixation detection. Similarly, glissades can be used as
alternatives or to augment
saccade detection, or for detecting the end of a saccade.
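The T versus T+ΔT comparison, with the example thresholds quoted above (a 25% increase in hand contact area and/or a 50% increase in total grip force), can be expressed as a small comparator function. In the Python sketch below the threshold values are the examples from the text, while the names and data layout are assumptions; comparable comparators would exist for eye, foot and aural events, with thresholds tailored per driver, path location, vehicle type and time of day.

    def hand_event_changed(prev: dict, curr: dict,
                           contact_area_rise: float = 0.25,
                           grip_force_rise: float = 0.50) -> bool:
        """Compare hand sensor readings at time T (prev) and time T+ΔT (curr).

        prev/curr: dicts with 'contact_area' and 'grip_force' totals over both hands.
        Returns True when either example threshold from the text is exceeded.
        """
        area_up = curr["contact_area"] >= prev["contact_area"] * (1.0 + contact_area_rise)
        grip_up = curr["grip_force"] >= prev["grip_force"] * (1.0 + grip_force_rise)
        return area_up or grip_up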
[0155] Fig 26b shows event signature extraction, categorization, map update
and training
software update by using human event data from fig 26a after confirmation that
an outside event
has probably occurred. The human event is compared to the corresponding map
segment to see
whether this was an expected event. For example, if the map indicates that
there is a traffic light
for the segment corresponding to when a driver stopped the car (saccades to
the traffic light
above and ahead of the car, hand contact area increased slightly, foot off the
accelerator and over
the brake and slow depressing of brake to come to a complete stop), then there
was probable
cause for the car to have stopped on the road. Data from the vehicle sensor
logger and
identifying module (2330), outside environment feature identifying module, and
human event
identifying module corresponding to this time segment (T to T+AT) are captured
and stored in a
data vector. A signature is extracted from this data vector. This forms the
signature of the outside
event that has occurred which caused the human to act or react a certain way.
This signature is
compared to the existing signature database to see if it is a known signature,
i.e., a similar event
has occurred in the past. If it is a known signature, the signature's count is
incremented in a user
section of the map (the main map is not altered). If this is an unknown
signature, then the
signature is categorized under the scheme of fig 25 as belonging to one of
2501-2515. This
signature is then added to the appropriate category in the signature database,
and also added to
the user section of the map. The AV's training software is then updated.
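The store-or-increment flow of fig 26b (known signatures increment a count in the user section of the map, unknown ones are categorized per fig 25 and added to both the signature database and the user map section) might look like the following Python sketch. The similarity test, the categorizer and the in-memory store are hypothetical stand-ins; the patent does not prescribe a particular data structure or similarity measure.

    from typing import Any, Callable, List, Optional

    class SignatureStore:
        """Minimal in-memory stand-in for the signature database / user map section."""

        def __init__(self) -> None:
            self.entries: List[dict] = []  # each entry: {"signature": ..., "category": ..., "count": int}

        def find(self, signature: Any, is_similar: Callable[[Any, Any], bool]) -> Optional[dict]:
            for entry in self.entries:
                if is_similar(signature, entry["signature"]):
                    return entry
            return None

        def add(self, signature: Any, category: Any) -> None:
            self.entries.append({"signature": signature, "category": category, "count": 1})

    def process_outside_event(data_vector: Any, store: SignatureStore,
                              extract_signature: Callable[[Any], Any],
                              is_similar: Callable[[Any, Any], bool],
                              categorize: Callable[[Any], Any]) -> dict:
        """Fig 26b flow: extract a signature, then either count a known one or categorize and add it."""
        signature = extract_signature(data_vector)
        known = store.find(signature, is_similar)
        if known is not None:
            known["count"] += 1            # counted in the user section; the main map is not altered
            return known
        category = categorize(signature)   # one of the fig 25 categories (2501-2515)
        store.add(signature, category)      # added to the signature database and the user map section
        return store.entries[-1]            # the AV's training software would then be updated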