Patent 2347493 Summary

(12) Patent Application:	(11) CA 2347493
(54) English Title:	ATTENTIVE PANORAMIC SENSING FOR VISUAL TELEPRESENCE
(54) French Title:	DETECTION PANORAMIQUE VIGILANTE POUR TELEPRESENCE VISUELLE
Status:	Dead

Bibliographic Data

(51) International Patent Classification (IPC):	H04N 7/18 (2006.01) G06T 3/00 (2006.01) H04N 5/232 (2006.01) H04N 7/15 (2006.01)
(72) Inventors :	ELDER, JAMES H. (Canada) GOLDSTEIN, RONEN (Canada) HOU, YUQIAN (Canada)
(73) Owners :	ELDER, JAMES H. (Canada) GOLDSTEIN, RONEN (Canada) HOU, YUQIAN (Canada)
(71) Applicants :	ELDER, JAMES H. (Canada) GOLDSTEIN, RONEN (Canada) HOU, YUQIAN (Canada)
(74) Agent:	HILL & SCHUMACHER
(74) Associate agent:
(45) Issued:
(22) Filed Date:	2001-05-14
(41) Open to Public Inspection:	2002-11-14
Examination requested:	2006-05-03
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:	None

Abstracts

English Abstract

Sensor and bandwidth constraints limit the instantaneous spatial resolution
and field-of-view (FOV)
achievable in any visual system. In the human eye, a compromise has evolved in
which resolution is high
near the optical axis, but falls off with eccentricity. The effective
resolution of the system is extended
by fast gaze-shifting mechanisms and a memory system that allows a form of
integration over multiple
fixations.
We have constructed an artificial visual system based on these concepts. The
peripheral component
consists of a catadioptric video sensor that provides a panoramic FOV. The
foveal component is a video
pan/tilt camera with 14 deg FOV. Calibration yields a table of projective
parameters, indexed by foveal
pan/tilt coordinates, that allows rapid transfer of pixels between foveal and
panoramic coordinate frames.
A second transform maps between sensor and display frames. Alpha masking
yields a circular, smoothly
blended fovea embedded in the lower resolution panoramic image.
The system may be operated in 3 modes. In slaved mode, mouse-clicks in the
display generate saccade
commands to the pan/tilt platform. In autonomous mode, saccades are entirely
determined by motion
detected in the peripheral sensor. In semi-autonomous mode these two
independent motor command
streams are arbitrated to produce a system responsive to operator interest as
well as autonomously-
detected motion events.
The display duration of foveal images from past fixations is determined by a
memory parameter. At
one extreme, previous foveal data are immediately replaced by more recent low
resolution data from the
panoramic sensor. At the other extreme, a sequence of fixations builds up a
persistent high resolution
mosaic. In intermediate modes, foveal data from previous fixations gradually
fade into more recent
low-resolution data.
The system is presently operating on a Pentium platform at 15 fps.

Claims

Note: Claims are shown in the official language in which they were submitted.

THEREFORE WHAT IS CLAIMED IS:

1. A device for panoramic sensing for visual telepresence, comprising:
a video sensor having a panoramic field of view, a motion sensor and a display
means connected to said video sensor; and
control means connected to said video sensor, said motion sensor and said
display means, said control means being operable in either a slaved mode in
which an
operator controls positioning of said video sensor, an autonomous mode in
which
saccades are determined by motion detected by said motion sensor, or a semi-
autonomous mode in which saccades are determined by a combination of motion
detected by said motion sensor and operator interest.

2. The device according to claim 1 wherein said video sensor includes a foveal
component comprising a video camera, and wherein display duration of foveal
images
from past fixations is determined by a memory parameter.

3. The device according to claim 2 wherein said control means includes a
calibration
providing a table of projective parameters, indexed by foveal
pan/tilt
coordinates, that allows rapid transfer of pixels between foveal and panoramic
coordinate frames, including a transform means for mapping between the motion
sensor and display frames produced on said display means.

4. The device according to claim 3 including alpha masking means for
displaying a
high resolution smoothly blended fovea embedded in lower resolution panoramic
image.

Description

Note: Descriptions are shown in the official language in which they were submitted.

CA 02347493 2001-05-14
ATTENTIVE PANORAMIC SENSING FOR VISUAL TELEPRESENCE
FIELD OF THE INVENTION
The present invention relates to panoramic sensing systems for visual
telepresence.
1 Introduction
Over the last ten years there has been increasing interest in wide FOV
sensing, particularly panoramic
sensing (e.g. (6, 11, 3, 5)). The advantages of a panoramic FOV for
surveillance and teleconferencing
applications are clear; however, these advantages come at the expense of
resolution. Switching from the
14 deg FOV of a typical lens to the 360 deg FOV of a panoramic camera results
in a 26-fold reduction in
linear resolution. For a standard 768 x 494 NTSC camera, horizontal
resolution is reduced to roughly 0.5
deg/pixel, a factor of 60 below human foveal resolution.
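The figures quoted in this paragraph appear to follow from simple ratios. The sketch below reproduces them in Python. The human foveal sampling density of 120 samples per degree is our assumption (the text gives only the resulting factor of roughly 60), and all constant names are illustrative.

```python
# Resolution cost of widening a 14 deg lens to a 360 deg panorama.
CONVENTIONAL_FOV_DEG = 14.0   # from the text
PANORAMIC_FOV_DEG = 360.0     # from the text
SENSOR_WIDTH_PX = 768         # horizontal pixels of the NTSC camera, from the text
HUMAN_FOVEAL_SAMPLES_PER_DEG = 120.0  # ASSUMPTION, not stated in the text

# Same pixel count spread over a wider angle.
linear_reduction = PANORAMIC_FOV_DEG / CONVENTIONAL_FOV_DEG
# Angular size of one pixel when 360 deg is spread over the sensor width.
deg_per_pixel = PANORAMIC_FOV_DEG / SENSOR_WIDTH_PX
# How much coarser that is than the assumed human foveal sampling.
factor_below_human = deg_per_pixel * HUMAN_FOVEAL_SAMPLES_PER_DEG

print(f"linear resolution reduction: {linear_reduction:.1f}x")
print(f"panoramic resolution: {deg_per_pixel:.2f} deg/pixel")
print(f"factor below assumed human foveal resolution: {factor_below_human:.0f}")
```

With the rounded value of 0.5 deg/pixel used in the text, the same assumption gives a factor of 60.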
The human visual system has evolved a bipartite solution to the FOV/resolution
tradeoff. The FOV of
the human eye is roughly 160 x 175 deg - nearly hemispheric. Central vision is
served by roughly 6 million
photoreceptive cones that provide high resolution, chromatic sensation over a
5 deg FOV, while roughly
100 million rods provide relatively low-resolution achromatic vision over the
remainder of the visual field.
The effective resolution of the system is extended by fast gaze-shifting
mechanisms and a memory system
that allows a form of integration over multiple fixations.
In this paper, we outline the design of an artificial visual system based on
these concepts, and report
preliminary results from a prototype system we have constructed. The
peripheral component of the system
consists of a catadioptric video sensor that provides a panoramic FOV. The
foveal component is a video
pan/tilt camera with 14 x 10 deg FOV. Video streams from the two sensors are
fused at 15 fps on a standard
video display.
Saccades (rotations of the pan/tilt sensor) may be initiated either manually
by a human observer via
mouse clicks on the display, or automatically by a motion localization
algorithm. Memory parameters
govern the tradeoff between the high spatial resolution of the foveal video
stream, and the high temporal
resolution of the panoramic stream.
Systems of this kind may be useful in both autonomous and semi-autonomous
applications. Events
detected in the panoramic sensor may generate saccade commands to allow more
detailed inspection/verification
at foveal resolution. In telepresence applications, foveal data may provide
the resolution required to see
facial expressions, read text, etc., while the panoramic data may augment
the sense of presence and
situational awareness.
2 Prior Work
There has been considerable work on space-variant (foveated) sensor chips (9,
1, 7, 10). However, since the
number of photoreceptive elements on these sensors is limited, they do not
provide a resolution or FOV
advantage over traditional chips. Moreover, it is not clear how such sensors
could be used to achieve a
panoramic FOV over which the fovea can be rapidly deployed.
Geisler & Perry (2) have demonstrated a wavelet-based video encoding system
that progressively
subsamples the video stream at image points distant from the viewer-defined
region of interest. Recent work
with saccade-contingent displays (4) has shown that video data viewed in the
periphery of the human visual
system can be substantially subsampled with negligible subjective or
objective impact. While our attentive
panoramic sensor is not eye-slaved, these prior results do suggest that
attention-contingent sampling for
human-in-the-loop video is feasible and potentially useful.
SUMMARY OF THE INVENTION
The present invention provides a device for panoramic sensing for visual
telepresence, comprising:
a video sensor having a panoramic field of view, a motion sensor and a display
means connected to said video sensor; and
control means connected to said video sensor, said motion sensor and said
display means, said control means being operable in either a slaved mode in
which an
operator controls positioning of said video sensor, an autonomous mode in
which
saccades are determined by motion detected by said motion sensor, or a semi-
autonomous mode in which saccades are determined by a combination of motion
detected by said motion sensor and operator interest.
DETAILED DESCRIPTION OF THE INVENTION
3 Design
3.1 Hardware
The prototype sensor is shown in Fig. 1(a). The panoramic component is a
parabolic catadioptric sensor (5)
purchased from Cyclovision Technologies (now RemoteReality™). The
parabolic mirror stands roughly
2 metres from the ground, facing down, and thus images the panoramic field
below the ceiling of the
laboratory. Panoramic images are captured through an orthographic lens system
by a Pulnix TMC-7DSP
colour CCD camera.
The foveal component consists of a Cohu 1300 colour CCD camera with a 50mm
Fujinon lens, mounted
on a Directed Perception PTU-46-17.5 pan/tilt platform. As loaded, the
platform travels at an average
speed of roughly 60 deg/sec in both pan and tilt directions: typical saccades
complete in 150 to 1200
msec. The platform has been modified so that both axes of rotation coincide
approximately with the
optical centre of the lens system, so that parallax between foveal images at
different pan/tilt coordinates
is minimized.
The optical centres of the two sensors are separated by 22 cm in the vertical
direction. This means
that a fixed system of coordinate transforms between the two sensors can be
accurate only if viewing
distance is large or if dynamic variations in depth are small relative to
viewing distance. Since neither
condition holds in our laboratory, we currently calibrate the system for
intermediate distances and accept
the misregistrations that occur at other depths.
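To give a feel for the size of these misregistrations, the sketch below estimates the residual angular error produced by the 22 cm baseline when a fixed transform is calibrated at one distance and applied at another. The geometry (a point viewed perpendicular to the baseline) and the 3 m calibration distance are simplifying assumptions of ours, not values from the text.

```python
import math

BASELINE_M = 0.22  # vertical separation of the optical centres, from the text


def parallax_deg(distance_m: float) -> float:
    """Angle subtended by the baseline at a point `distance_m` away,
    measured perpendicular to the baseline (simplifying assumption)."""
    return math.degrees(math.atan2(BASELINE_M, distance_m))


def misregistration_deg(distance_m: float, calibrated_m: float) -> float:
    """Residual angular error for a point at `distance_m` when the fixed
    transform was calibrated for points at `calibrated_m`."""
    return abs(parallax_deg(distance_m) - parallax_deg(calibrated_m))


if __name__ == "__main__":
    calibrated = 3.0  # hypothetical "intermediate" calibration distance, metres
    for d in (1.0, 2.0, 3.0, 5.0, 10.0):
        print(f"{d:5.1f} m: {misregistration_deg(d, calibrated):.2f} deg")
```

Under these assumptions the error vanishes at the calibration distance and grows fastest for objects closer than it, which matches the qualitative behaviour described above.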
Video processing, display and control are handled by a single-CPU 800 MHz
Pentium III computer.
The two colour 30 fps NTSC video streams are digitized by a 6-channel Data
Translation DT3132 frame
grabber card into two digital 640 x 480 video streams. The display is driven
by an NVidia 64MB GeForce2
GTS graphics card.
3.2 Coordinate Transformations
In the current prototype, we model the scene as static and piecewise planar,
and approximate the
correspondence between foveal and panoramic coordinate frames using a table of
projective transformations
indexed by the pan/tilt coordinates of the foveal sensor. We discuss our
calibration procedure in Section
4. The system relies upon 4 different coordinate transformations (Fig. 2):
• panorama->display
• fovea->display
• panorama->pan/tilt
• display->pan/tilt
The first two transformations map the two video streams to common display
coordinates. The last two
transformations map selected interest points from panoramic or display
coordinates to pan/tilt coordinates
used to effect a saccade.
3.2.1 Panorama->Display Transformation
The panorama->display coordinate transform is a fixed 3-parameter
translation/scaling, so that the
observer views the scene essentially in panoramic coordinates. In the present
configuration we map a 256 x 128
pixel subimage from the upper half of the panorama to a 1280x640 pixel window
in the display.
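A 3-parameter translation/scaling of this kind can be written as one uniform scale and a 2-D offset. The sketch below shows one way to build it from the subimage and window sizes given above. The class name, function name, and the subimage origin used in the usage note are hypothetical.

```python
from typing import NamedTuple, Tuple


class ScaleTranslate(NamedTuple):
    """Uniform scale followed by a 2-D translation (3 parameters)."""
    s: float
    tx: float
    ty: float

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        """Map a point from the source frame to the target frame."""
        return self.s * x + self.tx, self.s * y + self.ty

    def invert(self) -> "ScaleTranslate":
        """Return the transform that maps target points back to the source."""
        return ScaleTranslate(1.0 / self.s, -self.tx / self.s, -self.ty / self.s)


def panorama_to_display(sub_origin: Tuple[float, float],
                        sub_size: Tuple[int, int] = (256, 128),
                        win_size: Tuple[int, int] = (1280, 640)) -> ScaleTranslate:
    """Build the transform that sends the panoramic subimage to the window.

    sub_origin is the top-left corner of the subimage in panoramic pixels.
    """
    sx = win_size[0] / sub_size[0]
    sy = win_size[1] / sub_size[1]
    if abs(sx - sy) > 1e-9:
        raise ValueError("a single scale parameter needs equal aspect ratios")
    x0, y0 = sub_origin
    return ScaleTranslate(sx, -sx * x0, -sx * y0)
```

For the sizes quoted in the text the scale is 5, so `panorama_to_display((384, 0)).apply(640, 128)` lands on the bottom-right corner of the window. The inverse transform is what a display->panorama step would use.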
3.2.2 Fovea->Display Transformation
The fovea->display transformation is composed of fovea->panorama and
panorama->display
transformations. Calibration (Section 4) yields a table of projective matrices, indexed
by the pan/tilt coordinates
of the foveal sensor platform, that are used to map foveal pixels to panoramic
coordinates. Given an
arbitrary pan/tilt index, the projective matrix is constructed by bilinearly
interpolating the 8 projective
parameters stored at neighbouring entries. The result is then mapped to
display coordinates using the
fixed panorama->display coordinate transform. The rectangular foveal image is
thus mapped to a general
quadrilateral in the display.
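The table lookup described above can be sketched as follows. The array layout (tilt rows, pan columns, 8 parameters in row-major order with the ninth matrix entry fixed at 1) and the function names are our assumptions; the text specifies only that 8 parameters are bilinearly interpolated between neighbouring table entries.

```python
import numpy as np


def interpolate_homography(table, pan_grid, tilt_grid, pan, tilt):
    """Bilinearly interpolate the 8 stored parameters at (pan, tilt).

    table:     array of shape (len(tilt_grid), len(pan_grid), 8)
    pan_grid:  strictly increasing calibration pan angles
    tilt_grid: strictly increasing calibration tilt angles
    Returns a 3x3 projective matrix with H[2, 2] == 1.
    """
    table = np.asarray(table, dtype=float)
    pan_grid = np.asarray(pan_grid, dtype=float)
    tilt_grid = np.asarray(tilt_grid, dtype=float)

    # Index of the lower-left table cell that brackets the query.
    i = int(np.clip(np.searchsorted(pan_grid, pan) - 1, 0, len(pan_grid) - 2))
    j = int(np.clip(np.searchsorted(tilt_grid, tilt) - 1, 0, len(tilt_grid) - 2))

    # Fractional position inside the cell, clamped so we never extrapolate.
    u = np.clip((pan - pan_grid[i]) / (pan_grid[i + 1] - pan_grid[i]), 0.0, 1.0)
    v = np.clip((tilt - tilt_grid[j]) / (tilt_grid[j + 1] - tilt_grid[j]), 0.0, 1.0)

    params = ((1 - u) * (1 - v) * table[j, i]
              + u * (1 - v) * table[j, i + 1]
              + (1 - u) * v * table[j + 1, i]
              + u * v * table[j + 1, i + 1])
    return np.append(params, 1.0).reshape(3, 3)


def map_point(H, x, y):
    """Apply a projective matrix to one pixel and dehomogenise."""
    X, Y, W = H @ np.array([x, y, 1.0])
    return X / W, Y / W
```

Mapping the four corners of the foveal image with `map_point` gives the general quadrilateral mentioned in the text. Interpolating the parameters directly, rather than the mapped points, is only an approximation and is reasonable when neighbouring table entries are similar.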
3.2.3 Panorama->Pan/Tilt Transformation
In addition to the table of projective parameters used for the fovea->panorama
transformation, the
calibration procedure yields a second table used for the panorama->pan/tilt
transformation. This table provides
the pan/tilt coordinates required for given panoramic coordinates to map to
the centre of a foveal
image. Thus the table can be used to centre the fovea at a point of interest
automatically detected in the
panorama.
3.2.4 Display->Pan/Tilt Transformation
The display->pan/tilt transformation is composed of a fixed
translation/scaling display->panorama
transformation and the panorama->pan/tilt transformation just described. This
transformation is used to
generate saccades to points of interest detected in the display by the
observer.
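The composition described above can be sketched in a few lines. The dense per-pixel table layout, the nearest-pixel lookup, and the function and parameter names are assumptions made for illustration; the text does not say how the panorama->pan/tilt table is stored or sampled.

```python
import numpy as np


def display_to_pan_tilt(click_xy, scale, offset_xy, pan_tilt_table):
    """Map a mouse click in the display to a pan/tilt saccade target.

    click_xy:       (x, y) of the click in display pixels
    scale, offset_xy: the fixed panorama->display transform,
                      display = scale * panorama + offset
    pan_tilt_table: array of shape (H, W, 2) holding (pan, tilt) in degrees
                    for every panoramic pixel (hypothetical dense layout)
    """
    # Step 1: invert the fixed translation/scaling (display->panorama).
    px = (click_xy[0] - offset_xy[0]) / scale
    py = (click_xy[1] - offset_xy[1]) / scale

    # Step 2: look up the nearest panoramic pixel (panorama->pan/tilt).
    h, w, _ = pan_tilt_table.shape
    col = int(np.clip(round(px), 0, w - 1))
    row = int(np.clip(round(py), 0, h - 1))
    pan, tilt = pan_tilt_table[row, col]
    return float(pan), float(tilt)
```

A saccade triggered by the motion algorithm would skip the first step and index the table directly with the panoramic coordinates of the detected event.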
4 Calibration
The system is calibrated manually, using a simple calibration rig. Since our
sensor is located close to the
corner of the laboratory, we work within a 90 x 45 deg subfield located at the
top of the panorama and
facing out from the walls. 21 synchronous pairs of foveal/panoramic frames are
captured over a 7 x 3
regularly spaced grid in pan/tilt space. The rig is positioned an intermediate
distance from the sensor to
optimize the working range of the coordinate transformations for the given
environment. 12-16 point pairs
are manually localized in each foveal/panoramic image pair, and the
corresponding least-squares projective
transformation is estimated using standard techniques. These data are used to
form the fovea->panorama
coordinate transformation, indexed by the pan/tilt coordinates of the foveal
platform.
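The text does not name the estimation technique beyond "standard techniques". One common choice, shown here purely as an illustration, is the direct linear transform (DLT) solved by singular value decomposition, which minimises an algebraic rather than geometric error. The function name is ours, and coordinate normalisation, often recommended for noisy data, is omitted for brevity.

```python
import numpy as np


def fit_homography(src, dst):
    """Least-squares projective transform mapping src points to dst points.

    src, dst: arrays of shape (N, 2) with N >= 4 corresponding points,
              e.g. foveal and panoramic pixel coordinates.
    Returns a 3x3 matrix scaled so that H[2, 2] == 1.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2:
        raise ValueError("src and dst must both have shape (N, 2)")
    if len(src) < 4:
        raise ValueError("at least 4 point pairs are required")

    # Each correspondence contributes two linear constraints on the
    # nine entries of H.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])

    # The right singular vector with the smallest singular value is the
    # least-squares solution of A h = 0 subject to |h| = 1.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Running this once per pan/tilt grid position, with the 12-16 manually localized point pairs as input, would give one matrix per table entry; the first 8 entries of each normalised matrix are the parameters that get stored.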
For each image pair obtained we also store the projection of the foveal centre
into panoramic coordinates.
This allows construction of a second table, indexed by panoramic coordinates,
that provides the pan/tilt
coordinates required to centre the fovea at a specific panoramic location.
This table is used to generate
saccades from human or machine attention algorithms.
5 Operation
Fig. 2 shows a schematic of how these video streams are processed, combined
and displayed. The panoramic
video stream is first unwarped by the CPU using Cyclovision software (5) to
form a 1024 x 256 colour video
stream (Fig. 3(a)). The two video streams are then transformed into common
display coordinates prior to
fusion.
The fusion algorithm is essentially to display foveal pixels where they exist,
and panoramic pixels
otherwise (Fig. 3(b)). In order to make the fusion less jarring to the
observer, the foveal and panoramic
data are blended using a set of concentric alpha masks, yielding a
high-resolution circular fovea smoothly
inset within a low-resolution periphery (Fig. 3(c)). All coordinate
transformations and masking are done
by graphics hardware using OpenGL. When not interrupted by saccade commands,
the system runs at 15
fps.
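The blending step can be illustrated on the CPU with NumPy. This is a sketch only: the prototype performs masking in graphics hardware with a set of concentric masks, whereas the code below substitutes a single continuous radial ramp, and the function names and the `inner_frac`/`outer_frac` parameters are ours. Both images are assumed to be already warped into display coordinates.

```python
import numpy as np


def radial_alpha_mask(size, inner_frac=0.6, outer_frac=1.0):
    """Circular alpha mask: 1 inside the inner radius, falling linearly
    to 0 at the outer radius. Radii are fractions of half the shorter side."""
    h, w = size
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx) / (min(h, w) / 2.0)
    return np.clip((outer_frac - r) / (outer_frac - inner_frac), 0.0, 1.0)


def fuse(panorama, fovea, alpha):
    """Blend a foveal image into a panoramic image of the same shape.

    panorama, fovea: float arrays of shape (H, W, 3)
    alpha:           float array of shape (H, W) with values in [0, 1]
    """
    a = alpha[..., None]  # broadcast the mask over the colour channels
    return a * fovea + (1.0 - a) * panorama
```

With `alpha` equal to 1 wherever foveal pixels exist and 0 elsewhere, `fuse` reduces to the unblended rule of showing foveal pixels where they exist and panoramic pixels otherwise.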
Saccades are initiated in two ways. If the observer clicks the mouse in the
display, the location is
transformed from display to pan/tilt coordinates which form the target of an
immediate saccade. Saccades
may also be initiated by a motion localization algorithm, which we describe
below.
6 Motion Localization
6.1 Algorithm
The system may be operated to make saccades to points in the panorama where
motion is detected. A
fundamental issue in motion processing is how to select the spatial scale of
analysis. In our case, the
purpose of the detection is to drive the fovea to the point of interest to
resolve the change. Thus it is
natural to match the scale of analysis to the FOV of the foveal sensor in
panoramic coordinates. In this
way, saccades will resolve the greatest amount of motion energy.
Successive panoramic RGB image pairs (Fig. 4(a-b)) are differenced, rectified,
and summed to form a
primitive motion map (Fig. 4(c)). This map is convolved with a separable
square kernel that approximates
the FOV of the foveal sensor in panoramic coordinates (50 x 50 pixels). The
resulting map (Fig. 4(d)) is
thresholded to prevent the generation of saccades due to sensor noise and
vibration (Fig. 4(e)).
In order to select the appropriate threshold, an experiment was conducted in
which motion map
statistics were collected for a static scene. Thirty motion maps yielded nearly a
million data points. We ran
this experiment under two different conditions. In the first condition,
saccades were inhibited, so that
vibration in the sensor was minimized. The resulting distribution of motion
values is shown in Fig. 5(a).
In the second condition, we computed the motion maps immediately following a
saccade, at which time
we expect vibration to be near its maximum (Fig. 5(b)). The noise
distribution can be seen to depend
strongly on the state of the sensor. In the present prototype we use the first
distribution to determine the
threshold (3.0) and simply inhibit motion detection for a 2-second period
following each saccade.
The location of the maximum of the thresholded motion map determines the next
fixation (Fig. 4(f)).
Since the motion computation and the video fusion computations are done by the
same CPU, motion
computation pauses the update of the display for an average of 400 msec. This
need not occur in a true
telepresence application, in which the attention algorithms could run on the
host computer of the sensor
and the fusion algorithms could run on the client computer of the observer.
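The motion localization steps of Section 6.1 (difference, rectify, sum, smooth with a square kernel, threshold, take the maximum) can be sketched with NumPy as below. The kernel normalisation is our assumption, so the threshold of 3.0 from the text is only meaningful here as a placeholder; the function name is also ours. The 2-second inhibition after each saccade is not modelled.

```python
import numpy as np


def motion_target(prev_rgb, curr_rgb, kernel_px=50, threshold=3.0):
    """Return (row, col) of the strongest supra-threshold motion, or None.

    prev_rgb, curr_rgb: successive panoramic frames of shape (H, W, 3),
                        with H and W both at least kernel_px.
    """
    prev = np.asarray(prev_rgb, dtype=np.float64)
    curr = np.asarray(curr_rgb, dtype=np.float64)
    if prev.shape != curr.shape or min(prev.shape[:2]) < kernel_px:
        raise ValueError("frames must match and be at least kernel_px wide")

    # Difference, rectify (absolute value) and sum over colour channels.
    motion = np.abs(curr - prev).sum(axis=2)

    # Separable square kernel: one 1-D pass along each image axis.
    # ASSUMPTION: the kernel is normalised to unit sum (a local mean).
    kernel = np.full(kernel_px, 1.0 / kernel_px)
    smooth = np.apply_along_axis(np.convolve, 0, motion, kernel, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="same")

    # Suppress responses attributable to sensor noise and vibration.
    smooth = np.where(smooth >= threshold, smooth, 0.0)
    if not smooth.any():
        return None

    # The maximum of the thresholded map is the next fixation target.
    row, col = np.unravel_index(np.argmax(smooth), smooth.shape)
    return int(row), int(col)
```

Because the kernel is wider than most moving objects, the smoothed map has a plateau around each object rather than a sharp peak, so the returned location is only accurate to within roughly half a kernel width. That is sufficient when the kernel matches the foveal FOV, since the object still falls inside the fovea.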
7 Memory
What information the human visual system retains over a sequence of fixations
is a subject of debate in
vision science at the present time (e.g. (8)). There is no question, however,
that humans have some forms
of visual memory (iconic, short-term, long-term).
We have implemented a primitive sort of memory in our own artificial
attentive sensor. The display
duration of foveal images from past fixations is determined by a memory
parameter. At one extreme,
previous foveal data are immediately replaced by more recent low resolution
data from the peripheral
sensor. At the other extreme, a sequence of fixations builds up a persistent
high resolution mosaic (Fig.
6(a)). In intermediate modes, foveal data from previous fixations gradually
fade into more recent low-
resolution data (Fig. 6(b)).
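The text does not specify how the memory parameter controls fading. One simple model, offered here as an assumption rather than the prototype's method, is a per-frame multiplicative decay of the blending weight of stored foveal pixels: a value of 0 discards past fixations immediately, 1 keeps them indefinitely, and intermediate values fade them gradually. The class and attribute names are hypothetical.

```python
import numpy as np


class FovealMemory:
    """Display-sized store of past foveal pixels with decaying weight."""

    def __init__(self, shape, memory):
        if not 0.0 <= memory <= 1.0:
            raise ValueError("memory must lie in [0, 1]")
        self.memory = memory                  # ASSUMED per-frame decay factor
        self.mosaic = np.zeros(shape + (3,))  # remembered foveal colours
        self.weight = np.zeros(shape)         # current blending weight

    def update(self, panorama, fovea=None, fovea_alpha=None):
        """Advance one frame and return the fused display image.

        panorama:    (H, W, 3) low-resolution frame in display coordinates
        fovea:       optional (H, W, 3) foveal frame in display coordinates
        fovea_alpha: (H, W) mask of the current fovea, required with fovea
        """
        self.weight *= self.memory  # older fixations fade
        if fovea is not None:
            # New foveal data replace remembered data wherever they are
            # at least as strongly weighted.
            fresh = fovea_alpha >= self.weight
            self.mosaic[fresh] = fovea[fresh]
            self.weight = np.maximum(self.weight, fovea_alpha)
        w = self.weight[..., None]
        return w * self.mosaic + (1.0 - w) * panorama
```

With `memory=1.0` successive calls accumulate a persistent mosaic, and with `memory=0.0` only the current fixation is ever shown at high resolution, matching the two extremes described above.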
8 Future Work
We see the attentive panoramic sensor as a testbed for a number of important
computer vision problems.
For applications where object distances are much greater than the
foveal/panoramic baseline, good
registration can be achieved by a single calibration prior to operation. For
close-range applications, however,
registration is more approximate and fails if the scene is dynamic. Errors can
be reduced by redesigning the
sensor package to shorten the baseline, but our ultimate goal is to solve
foveal-panoramic correspondence
in real time and thus achieve good registration for close-range dynamic
environments. This is a challenging
goal, given the 16:1 linear resolution difference between fovea and panorama.
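The 16:1 figure is consistent with the image sizes given earlier in the description. The check below assumes, as our reading rather than a statement in the text, that the ratio compares a 640-pixel-wide foveal image spanning 14 deg with the 1024-pixel-wide unwarped panorama spanning 360 deg.

```python
# Angular size of one pixel in each video stream.
FOVEA_FOV_DEG = 14.0       # horizontal foveal FOV, from the text
FOVEA_WIDTH_PX = 640       # digitized foveal width, from the text
PANORAMA_FOV_DEG = 360.0   # panoramic FOV, from the text
PANORAMA_WIDTH_PX = 1024   # unwarped panorama width, from the text

fovea_deg_per_px = FOVEA_FOV_DEG / FOVEA_WIDTH_PX
panorama_deg_per_px = PANORAMA_FOV_DEG / PANORAMA_WIDTH_PX
resolution_ratio = panorama_deg_per_px / fovea_deg_per_px

print(f"fovea:    {fovea_deg_per_px:.4f} deg/pixel")
print(f"panorama: {panorama_deg_per_px:.4f} deg/pixel")
print(f"linear resolution ratio: {resolution_ratio:.1f}:1")
```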
We believe that human eye movement behaviour and attention processing are to a
great degree
determined by the decline in visual acuity with eccentricity, and thus we feel
that the attentive panoramic
sensor is an interesting platform on which to test attention algorithms.
It remains to be seen how effective and immersive an experience this kind of
sensor can deliver in a
telepresence application. Given the speed of eye movements and the intolerance
of the visual system to lag,
eye-slaved systems are impractical in many telepresence applications. We wish
to investigate the degree
to which intelligent attention algorithms and system memory can be used to
provide an effective visual
experience in situations where lags are significant and bandwidth is limited.
9 Conclusion
We have demonstrated what we believe to be the first attentive panoramic
visual sensor in which high
resolution (foveal) colour video is fused in real time (15 fps) with colour
panoramic video. Saccadic behaviour
is determined both by the interest of the observer and by autonomous attention
(motion) computations. A
primitive form of memory permits the accumulation of high resolution
information over space, at the
expense of temporal resolution. The attentive panoramic sensor is to be used for
future research in real-time
video fusion, attention and telepresence.
References
(1) F. Ferrari, J. Nielsen, P. Questa, and G. Sandini. Space variant imaging.
Sensor Review, 15(2):17-20, 1995.
(2) W.S. Geisler and J.S. Perry. A real-time foveated multi-resolution system
for low-bandwidth video communication. In B. Rogowitz and T. Pappas, editors,
Human Vision and Electronic Imaging, SPIE Proceedings, volume 3299,
pages 294-305, 1998.
(3) H. Ishiguro, M. Yamamoto, and S. Tsuji. Omni-directional stereo. IEEE
Trans. Pattern Analysis and Machine Intelligence, 14(2):257-262, 1992.
(4) L. Loschky and G.W. McConkie. Gaze contingent displays: Maximizing
display bandwidth efficiency. Army Research Laboratory Advanced Displays and
Interactive Displays Federated Laboratory Third Annual Symposium, 1999.
(5) S. Nayar. Catadioptric omnidirectional camera. Proc. IEEE Conf. Computer
Vision Pattern Recognition, pages 482-488, 1997.
(6) S.J. Oh and E.L. Hall. Guidance of a mobile robot using an omnidirectional
vision navigation system. Proc. Soc. Photo-Optical Instrumentation Engineers
(SPIE), 852:288-300, 1987.
(7) F. Pardo, B. Dierickx, and D. Scheffer. CMOS foveated image sensor:
Signal scaling and small geometry effects. IEEE Transactions on Electron
Devices, 44(10):1731-1737, October 1997.
(8) R.A. Rensink, J.K. O'Regan, and J.J. Clark. To see or not to see: the need
for attention to perceive changes in scenes. Psychological Science,
8(5):368-373, Sep 1997.
(9) J. van der Spiegel, G. Kreider, C. Claeys, I. Debusschere, G. Sandini, P.
Dario, F. Fantini, P. Belluti, and G. Soncini. A foveated retina-like sensor
using CCD technology. In C. Mead and M. Ismail, editors, Analog VLSI
implementation of neural systems, chapter 8, pages 294-305. Kluwer, Boston,
1989.
(10) R. Wodnicki, G.W. Roberts, and M. Levine. Design and evaluation of a
log-polar image sensor fabricated using a standard 1.2 um ASIC CMOS process.
IEEE Journal of Solid-State Circuits, 32(8):1274-1277, August 1997.
(11) Y. Yagi and S. Kawato. Panoramic scene analysis with conic projection.
Proc. Int. Conf. on Robots and Systems (IROS), 1990.
The foregoing description of the preferred embodiments of the invention has
been presented to illustrate the principles of the invention and not to limit
the invention
to the particular embodiment illustrated. It is intended that the scope of the
invention be
defined by all of the embodiments encompassed within the following claims and
their
equivalents.

Administrative Status

Title	Date
Forecasted Issue Date	Unavailable
(22) Filed	2001-05-14
(41) Open to Public Inspection	2002-11-14
Examination Requested	2006-05-03
Dead Application	2010-05-14

Abandonment History

Abandonment Date	Reason	Reinstatement Date
2009-05-14	FAILURE TO PAY APPLICATION MAINTENANCE FEE
2009-05-25	R30(2) - Failure to Respond

Payment History

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Application Fee			$150.00	2001-05-14
Maintenance Fee - Application - New Act	2	2003-05-14	$50.00	2003-03-07
Maintenance Fee - Application - New Act	3	2004-05-14	$100.00	2004-03-12
Maintenance Fee - Application - New Act	4	2005-05-16	$100.00	2005-05-05
Request for Examination			$800.00	2006-05-03
Maintenance Fee - Application - New Act	5	2006-05-15	$200.00	2006-05-03
Maintenance Fee - Application - New Act	6	2007-05-14	$200.00	2007-05-01
Maintenance Fee - Application - New Act	7	2008-05-14	$200.00	2008-05-02

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
ELDER, JAMES H.
GOLDSTEIN, RONEN
HOU, YUQIAN

Past Owners on Record
None

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Drawings	2001-05-14	5	245
Description	2001-05-14	8	430
Claims	2001-05-14	2	40
Abstract	2001-05-14	1	39
Cover Page	2002-11-01	1	50
Fees	2008-05-02	1	33
Assignment	2001-05-14	5	224
Correspondence	2001-07-25	2	103
Correspondence	2002-04-02	4	158
Fees	2003-03-07	1	38
Fees	2004-03-12	1	39
Prosecution-Amendment	2006-05-03	1	38
Fees	2006-05-03	1	38
Fees	2005-05-05	1	36
Fees	2007-05-01	1	36
Prosecution-Amendment	2008-11-24	2	64
