Patent 2022706 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 2022706
(54) English Title: STEREOSCOPIC VIDEO-GRAPHIC COORDINATE SPECIFICATION SYSTEM
(54) French Title: SYSTEME VIDEOGRAPHIQUE STEREOSCOPIQUE A REPRESENTATION DE COORDONNEES
Status: Deemed expired
Bibliographic Data
(52) Canadian Patent Classification (CPC):
  • 350/3.1
(51) International Patent Classification (IPC):
  • H04N 13/341 (2018.01)
(72) Inventors :
  • MILGRAM, PAUL (Canada)
  • DRASCIC, DAVID (Canada)
  • GRODSKI, JULIUS J. (Canada)
(73) Owners :
  • MINISTER OF NATIONAL DEFENCE (Canada)
(71) Applicants :
(74) Agent: GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued: 1998-08-04
(22) Filed Date: 1990-08-03
(41) Open to Public Inspection: 1992-02-04
Examination requested: 1994-05-16
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract



A stereoscopic image generator for superimposing a stereographic pointer video signal onto a composite standard video signal, the generator comprising a computer for generating the stereographic pointer video signal in response to an input signal representative of the three dimensional coordinates of a point in a three dimensional video image, a video synchronizing circuit for receiving the composite standard video signal and delivering a synchronization signal to the computer, and a video keying circuit for receiving the composite standard video signal and the stereographic pointer video signal and superimposing the stereographic pointer video signal onto the composite standard video signal to produce a single combined video signal, the single combined video signal allowing the stereographic pointer video signal and the composite standard video signal to be viewed together simultaneously on the same video screen.


French Abstract

L'invention est un générateur d'images stéréoscopiques servant à superposer le signal vidéo d'un pointeur stéréographique à un signal vidéo composite standard. Ce générateur comporte un ordinateur servant à produire le signal vidéo d'un pointeur stéréographique en réponse à la réception d'un signal représentant les trois coordonnées d'un point d'une image vidéo tridimensionnelle, un circuit de synchronisation vidéo servant à recevoir le signal vidéo composite standard et à transmettre le signal de synchronisation à l'ordinateur, et un circuit d'incrustation vidéo servant à recevoir le signal vidéo composite standard et le signal vidéo du pointeur stéréographique et à superposer le second au premier pour produire un seul signal vidéo résultant qui affiche simultanément sur l'écran le pointeur stéréographique et l'image du signal vidéo composite standard.

Claims

Note: Claims are shown in the official language in which they were submitted.



THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A stereoscopic television system, comprising:
alternating-field video signal generating means for producing a standard video signal
having alternating odd and even raster scan field video images, said
alternating-field video signal generating means including a left camera for
producing a left video image and a right camera for producing a right video
image, each said left and right video images being comprised of alternating odd
and even raster scan field video images, means for synchronizing said left and
right cameras, means for aligning said left and right cameras, and alternating field
combining circuit means for producing a composite standard video signal
comprised of alternating video images from said left and right cameras;
a stereoscopic pointer image generator for generating a raster graphic output video signal of a virtual, stereographic pointer in the form of a complex three-dimensional image, said stereoscopic pointer image generator including computation means for generating said complex image in response to an input signal;
video synchronizing circuit means for receiving said composite standard video signal and
delivering a synchronization signal to said computation means;
video keying circuit means for receiving said composite standard video signal and said
graphic output video signal and superimposing said graphic output video signal
onto said composite standard video signal to produce a single combined video
signal, said single combined video signal allowing said graphic output video
signal and said composite standard video signal to be viewed together
simultaneously on the same video screen;
a video screen for displaying said composite standard video signal with said
stereographic pointer superimposed thereon;
stereoscopic shuttering spectacles, for use by an observer, for separating odd and even
raster scan field video images displayed on the video screen into left and right eye
images respectively to allow the observer to perceive three dimensional images
on a display screen; and



means for enabling the observer to communicate to said image generator, with three
degrees of freedom, the desired location of said stereographic pointer within the
perceived three dimensional images on said screen.

2. A stereoscopic television system as defined in claim 1, said complex image being
in the form of arbitrary numeric and alphanumeric information about objects in said
video scene and said image being disposed at various arbitrary depth planes within said
scene whereby said information is displayed on-screen and in-depth.

3. A stereoscopic television system as defined in claim 1, said complex image being
in the form of three-dimensional shapes.

4. A stereoscopic television system as defined in claim 1, said complex image being in the form of a wireframe overlaid over images of real objects in said scene for enhancing said images of real objects.

5. A stereoscopic television system as defined in claim 1, said complex image
representing on-screen information about field intensities surrounding real objects
displayed in said scene.





6. An interactive camera alignment control system for dynamically configuring the
separation and convergence angle of a pair of cameras of a camera system so that the
point of convergence of said cameras is as close as possible to the centre of an observer's
interest within a video scene produced by said cameras being viewed, said camera system
including means for mounting said cameras for movement toward and away from one
another and for angular displacement of their lines of sight from a reference plane and
means responsive to electrical control signals for adjusting the separation and
convergence angle of said cameras, said cameras being operable to produce first and
second standard video signals having alternating odd and even raster scan field video
images, comprising:
means for combining said first and second standard video signals to produce a composite
standard video signal comprised of alternating even video images from one of said
first and second cameras and odd images from the other of said first and second
cameras;
a stereoscopic pointer image generator for generating a raster graphic output video signal
of a virtual, stereographic pointer, said stereoscopic pointer image generator
including computation means for generating said pointer in response to an input
signal and an electrical output signal indicative of a point or region of interest
within said scene;
video synchronizing circuit means for receiving said composite standard video signal and
delivering a synchronization signal to said computation means;
video keying circuit means for receiving said composite standard video signal and said
graphic output video signal and superimposing said graphic output video signal
onto said composite standard video signal to produce a single combined video
signal, said single combined video signal allowing said graphic output video
signal and said composite standard video signal to be viewed together
simultaneously on the same video screen;
a video screen for displaying said composite standard video signal with said
stereographic pointer superimposed thereon;


stereoscopic shuttering spectacles, for use by an observer, for separating odd and even
raster scan field video images displayed on the video screen into left and right eye
images respectively to allow the observer to perceive three dimensional images
on a display screen;
means for enabling the observer to communicate to said image generator, with three
degrees of freedom, the desired location of said pointer within the perceived three
dimensional images on said screen; and
means responsive to said electrical output signal for producing camera separation and
convergence angle control signals and transmitting said control signals to said
means responsive to electrical control signals whereby to cause adjustment of said
camera separation and convergence angle.

7. A control system as defined in claim 6, said means for producing control signals
being operable to produce control signals so as to move said cameras to a position which
centres said point or region of interest within said video scene's field of view.

8. A control system as defined in claim 6, said means for producing control signals being operable to produce control signals so as to move said cameras to a position which causes said lines of sight to converge on said point or centre of said region of interest.

9. A control system as defined in claim 6, further including means responsive to said
electrical output signal indicative of the point or region of interest within said scene for
computing the optimum separation and angle of convergence for that particular point or
region of interest.

10. A control system as defined in claim 6, said means responsive to the coordinates
of said pointer for producing the camera separation and convergence angle control signals
being further responsive to signals representative of the focal length of the lenses of said
cameras, the gain of the display system and the distance of the observer's eyes from the
display surface.

11. A stereoscopic television system as defined in claim 6, further including a pair
of parallel roman screws each having left and right hand threads thereon, one of said
cameras being threadedly engaged with the left hand threads of said pair of screws and
the other of said cameras being threadedly engaged with the right hand threads of said
pair of screws whereby rotation of one or both of said screws being operable to angularly
adjust said cameras with respect to one another, motor means for rotating said screws,
controller means for controlling said motor means and computation means for producing
and delivering camera alignment signals to said controller means.


Description

Note: Descriptions are shown in the official language in which they were submitted.


The present invention relates, in general, to stereoscopic television systems of the type employed in teleoperated robotic systems and remote surveillance systems and, more specifically, to a stereoscopic video-graphic three dimensional coordinate specification system for use in such systems.
BACKGROUND OF THE INVENTION
The present invention is concerned with the problem of enabling a human operator of a teleoperated system (such as a mobile explosive ordnance disposal robot) or of a remote surveillance system, to visualise efficiently the location and orientation of various objects and obstacles in a remote environment, and to visualise efficiently the location and orientation of the remote system, or teleoperator, itself with respect to the remote environment, that is, with respect to various objects and obstacles in that environment, and to operate efficiently the various functions of the remote system, that is, to control its locomotion and to operate any of the teleoperator's effectors, such as robotic arms, grippers, weapons, etc., with respect to the remote environment. The terms "teleoperator" or "robot" include any system, such as a mobile robot, which can be controlled at a distance and from which visual information is fed by means of a video signal to a human controller. The terms also include a video camera system alone, without remote vehicle platform or telemanipulator, as used for remote surveillance.

In conventional systems, the ability to carry out these functions is limited primarily by the ability of the human operator to view the remote environment. Typically, a closed circuit monoscopic video system is used with such systems. A closed circuit monoscopic video system includes a single video camera mounted on or near the mobile robot and the human operator views the remote environment via a single video monitor. The term "remote" is used here in a general sense, to refer to any separation between the observer and the camera(s), that is, either a physical or a temporal or a functional separation. There are a number of visualisation problems which commonly accompany such viewing systems and these arise from the factors briefly discussed below.
First, the resolution of the closed circuit video system is typically about 330-360 horizontally resolvable lines, depending on the quality of the (colour, solid state) video camera, optics, and monitor. This is much less than that of the human visual system during direct viewing and therefore limits the ability of the human operator to detect and recognise details. Second, unless expensive coupling hardware between the human operator's head movements and the remote camera's pan and tilt unit has been provided, which is typically not the case at present, the ability of the human operator to "look around" and assess the remote environment comfortably is greatly restricted. Third, the relatively small field of view afforded by the camera lenses being used, typically around 30-40° depending on the focal length of the lens, is much less than the natural field of view of about 120° of the human binocular visual system. Further, the usual reduction in scale due to the size of the viewing screen restricts the ability of the human operator to assimilate important information from the remote visual environment, such as estimating the rate at which objects are streaming through the camera's visual field, information which is necessary for the operator to estimate robot speed and to control robot locomotion accurately. Fourth, single camera video systems can, under many circumstances, severely restrict the ability of the human operator to estimate the distances between objects in the remote environment, as well as to detect the presence of objects or obstacles which otherwise tend to blend into the visual background.
The present invention is particularly concerned with the fourth problem addressed above, although it does have implications for the other viewing problems mentioned. In order to estimate "depth" information with monoscopic video systems, i.e. the relative distance of objects in the direction perpendicular to the plane of the viewing screen, the main visual cues available include relative object size wherein objects closer to the camera appear larger, motion parallax involving relative change of visual angle of moving objects, occlusion wherein closer objects block off farther objects located behind them, surface texture and lighting. Stereopsis, the important ability to perceive volumetric information by means of binocular disparity, i.e. the differences between the projections of the parts of an object onto the two retinas of an observer's eyes, is not achievable with monoscopic television systems.




In some operations carried out with remotely manipulated systems, it is necessary to estimate the distance from the robot, or from the remote cameras, to a particular object or, more particularly, to estimate the spatial coordinates of a specified object relative to the robot. Furthermore, in some operations, it is necessary to estimate the distance between two particular objects or specific points in the remote vicinity of the teleoperator. For example, an operator might want to know the distance to a particular object for purposes of orientation, weapon aiming, manoeuvring, etc. Similarly, the operator might want to indicate a particular point in space in order to issue some kind of "go to" command to the locomotion or manipulator control system, in a higher order control mode than is presently possible. In the case of a mobile explosive ordnance disposal robot, for example, instead of aiming the robot's weapon at a target manually, if the operator were to have the relative spatial coordinates of the designated target available, it would be a straightforward matter to design a microprocessor based system to direct the weapon towards the specified target.

For all of the above operations, the basic objective is to automate various teleoperator functions and thereby to improve operational efficiency, by taking advantage of the ability to make precise numerical computations afforded by available computing power. The problem in all of these applications, however, is the lack of an adequate means to communicate accurately to the computer system the essential information about the spatial coordinates of objects of interest in the robot's surroundings.
Present techniques for addressing the problems outlined above consider separately two levels of problems. The first problem is with respect to the human operator's perception of the spatial relationship among various objects in the vicinity of the robot and the second problem is that of communicating the spatial coordinates of designated perceived objects or locations to the local computer system.

At present, the most common means of addressing the first problem is to continue to use monoscopic video and to rely on the various monoscopic depth cues listed above. A more advanced means of addressing the problem is to install a stereoscopic viewing capability on the mobile robot. Under many circumstances this will greatly improve the human operator's perception of the remote environment and should especially enhance operations involving, for example, (negative) obstacle avoidance, gripping and detection of camouflaged objects.
Stereoscopic video systems are used in practice to allow an observer to perceive volumetric information about all three dimensions of a (remote) environment. That is, instead of the two dimensional images displayed on the surface of a conventional video monitor, the viewer of a stereoscopic display is able also to perceive depth and distance directly within the image. In order to accomplish this, the two images produced by the two cameras at different viewpoints must be presented to the corresponding eyes of the observer separately, on either one or more than one display surface. The term "display surface" will therefore be taken here to refer to one or more display devices which are used to present left and right eye information separately to the observer's left and right eye respectively.
With respect to the second problem mentioned, there is at present no adequate practical means for the human operator to estimate an object's spatial coordinates, other than by estimating this solely on the basis of visual observation (either monoscopically or stereoscopically). On the other hand, it is possible to accomplish such measurements automatically, by making use of suitable machine vision equipment. Typically this would comprise suitably arranged remote camera(s), hardware and software for digitising camera images, pattern recognition software for recognising object features in the camera images, and software for computing the requisite spatial coordinates of designated objects of interest.
The obvious drawback to achieving the automated solution to the second problem outlined above is the expense involved in adding the necessary hardware and software components. Equally important, however, is the reliability of such an arrangement. Although great progress has been made in the area of machine vision, the general problem can not as yet be considered to be "solved". In real operational environments, potentially under poor lighting conditions, problems associated with using computer software to identify integral objects, whose features may not be easily distinguishable within a noisy and possibly complex visual environment, can be great and could impede performance of the teleoperator system as a whole. Furthermore, even if the computing power of the system is able to identify individual objects within the stereoscopic camera images, the problem still remains of how to enable the human operator to indicate to the computer system which of those objects in the visual scene are of interest to the human operator.
SUMMARY OF THE INVENTION
The present invention seeks to provide a system which enables an operator to perceive on a single video monitor volumetric information about all three dimensions in a remote environment, indicate to the computer system which of those objects in the visual scene are of interest and accurately communicate to the computer system the essential information about the spatial coordinates of objects of interest in the remote environment.

In accordance with the present invention, this is achieved by providing a method and an apparatus which synchronously superimposes a virtual, stereographic pointer video signal onto the stereographic video signal of a remote environment so as to allow the two signals to be displayed together as a single combined video signal on a single viewing screen.
This arrangement enables a human operator to not only perceive a three dimensional image of the remote visual surroundings, but also to manipulate the stereographic pointer within that image. The pointer appears to "float" within the real video environment and can be moved about by the operator, in three dimensions, within that environment. The operator is thus able to "place" this virtual pointer "on" or "near" any object being observed on the screen. The system is provided with data respecting the magnification, separation and orientation of the optics of the two cameras which generate the video image as well as data respecting the cameras' light sensing electronic elements so as to enable it to scale the dimensions of the pointer appropriately. Thus, the system can easily display at any time required, or continuously if necessary, the actual quantitative scaled spatial coordinates of the virtual pointer in terms of its location within the actual world recorded by the video cameras, as perceived by the human observer.
The pointer may be of any desired shape, such as a "V", an arrow, or cross-hairs. One especially useful embodiment of the pointer is a free-floating cursor tied to the end of a "rubber band", which is fixed at some point in space, such as in the vicinity of the camera system, so as to indicate clearly to the operator the "pathway" to the cursor through three dimensional space. A particularly useful embodiment of the "rubber band" option is a so called "tape measure" option whereby the stereographic pointer is used to fix one end of the "rubber band" at a point in the three dimensional space designated by the operator and the "rubber band" is then "stretched" by the operator to a second designated point in space. The real-world distance between the two points can then be computed and displayed on the display device.
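In computational terms, the "tape measure" readout reduces to a Euclidean norm once both endpoints are known in scaled real-world coordinates. A minimal Python sketch, assuming a camera-centred metric frame:

```python
import math

def tape_measure_distance(start, end):
    """Scalar distance between two 3-D pointer positions.

    `start` and `end` are (x, y, z) coordinates of the "rubber band"
    anchor and the second designated point, both expressed in the
    real-world (camera-centred) frame."""
    return math.dist(start, end)

# Example: anchor the first point, stretch the band to the second.
print(tape_measure_distance((0.0, 0.0, 2.5), (0.4, 0.1, 3.3)))  # 0.9
```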
Accordingly, one aspect of the present invention is generally defined as a stereoscopic image generator for superimposing a stereographic pointer video signal onto a composite standard video signal, the generator comprising computation means for generating the stereographic pointer video signal in response to an input signal representative of the three dimensional coordinates of a point in a three dimensional video image, video synchronizing circuit means for receiving the composite standard video signal and delivering a synchronization signal to the computation means, and video keying circuit means for receiving the composite standard video signal and the stereographic pointer video signal and superimposing the stereographic pointer video signal onto the composite standard video signal to produce a single combined video signal, the single combined video signal allowing the stereographic pointer video signal and the composite standard video signal to be viewed together simultaneously on the same video screen.
Yet another useful feature of the present invention enables automatic adjustment of the stereoscopic cameras by solving the dynamic viewpoint adjustment problem. The advantages of dynamically optimising the camera configuration relative to the particular viewing operation include eliminating excessive convergence or divergence of the observer's eyes, reducing eyestrain, fatigue and discomfort, minimising stereoscopic depth distortion and increasing stereoscopic depth resolution. In order to optimise the camera configuration it is necessary both to know where the observer's focus of attention is and to be able to control dynamically both degrees of freedom of the cameras, that is, the camera separation and the angle of convergence of the cameras. The present invention provides both of those capabilities.
According to the present invention, an observer who is viewing the remote environment by means of the stereoscopic video system employs a three degree of freedom pointing device, which communicates with the stereographic pointer via a control computer, to indicate where in the three dimensional environment he is interested in surveying, examining more closely, making measurements, etc. Any shift of attention indicated by movement of the stereographic pointer within an x-y plane, parallel to the display surface, can be accompanied by remote control of the cameras' pan and tilt angles and/or by means of remote translation of the cameras with respect to the x-y plane. Any shift of attention indicated by movement of the stereographic pointer in the z-direction, perpendicular to the x-y plane, however, may necessitate recomputation of the cameras' separation and angle of convergence, according to some optimisation routine. Similarly, the observer may use the pointer to indicate that an increased or decreased stereoscopic depth resolution is needed at that particular distance from the cameras, which may also necessitate recomputation of the cameras' separation and angle of convergence, according to some optimisation routine.
Once the focus of attention, or intended focus of attention, of the observer has been communicated to the control computer by means of the stereographic pointer, an optimisation routine is invoked which computes the separation and angle of convergence of the cameras which are suitable for that particular focus of attention, or intended focus of attention. In addition to the observer's focus of attention in three dimensional space, as indicated by the stereographic pointer, the optimisation routine takes into account the focal length of the lenses used, the gain of the display system and the distance of the observer's eyes from the display surface. It should also be noted that the optimisation routine takes into account variable focal lengths due to zooming of the lenses. The optimisation routine also takes into account the history of recent adjustments to the camera configuration, in order to prevent adjustments to the camera configuration which may be too rapid. Under some circumstances, it is advantageous for the camera alignment to be modified only whenever the location of the stereographic pointer deviates beyond a specified distance from the point of convergence of the cameras. This can be initiated either upon request of the user, or automatically by the control computer according to a defined criterion, or on a continuous basis.
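The disclosure does not commit to a particular optimisation formula, so the following Python sketch only illustrates the shape such a routine might take: it steers the convergence angle toward the pointer's indicated distance via the relation tan(φ) = s/c developed later in this description, and rate-limits the change as the history-based safeguard above suggests. The step limit and units are assumptions:

```python
import math

def update_convergence(half_separation_s, focus_distance_c, prev_phi,
                       max_step=0.02):
    """Hypothetical optimisation step: converge the cameras on the
    pointer's indicated distance, rate-limited so the configuration
    cannot change too rapidly. A full routine would also weigh lens
    focal length (including zoom), display gain and the observer's
    eye-to-screen distance, as described in the text."""
    target_phi = math.atan2(half_separation_s, focus_distance_c)
    step = max(-max_step, min(max_step, target_phi - prev_phi))
    return prev_phi + step

# Example: pointer moved to 2 m with the lenses 0.06 m from the centroid.
phi = update_convergence(0.06, 2.0, prev_phi=0.0)  # limited to 0.02 rad
```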
Once the updated separation and angle of convergence of the cameras are computed by the optimisation routine, control signals are generated which drive a suitable mechanism which controls the two degree of freedom camera alignment control system. A preferred mechanism is comprised of two stepper motors which operate twin roman screws having left and right handed threads on opposing halves of their shafts, upon which ride the two camera platforms. By turning both screws appropriately, the two platforms move together and apart symmetrically, which allows for adjustment of camera separation. By turning only one screw, or by turning both screws in opposite directions, the angle of convergence, or divergence, of the two cameras can be adjusted. Combining both of these modes of adjustment therefore allows independent adjustment of camera separation and camera convergence angle.
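The decomposition into symmetric and differential screw rotations can be written out directly. A sketch only, with hypothetical gain constants standing in for the thread pitch and bracket geometry, which the disclosure does not quantify:

```python
def screw_rotations(d_separation, d_convergence, k_sep, k_conv):
    """Decompose desired changes in camera separation and convergence
    angle into rotations of the two lead screws. Turning both screws
    the same way moves the platforms symmetrically (separation);
    turning them in opposite directions changes the convergence angle."""
    common = d_separation / k_sep   # symmetric component
    diff = d_convergence / k_conv   # differential component
    return common + diff, common - diff  # turns for each screw
```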
Accordingly, a further aspect of the present invention is generally defined as an interactive camera alignment control system for dynamically configuring the separation and convergence angle of a pair of cameras of a camera system so that the point of convergence of the cameras is as close as possible to the centre of an observer's interest within a video scene produced by the cameras being viewed, the camera system including means for mounting the cameras for movement toward and away from one another and for angular displacement of their lines of sight from a reference plane and means responsive to electrical control signals for adjusting the separation and convergence angle of the cameras, the cameras being operable to produce first and second standard video signals having alternating odd and even raster scan field video images, comprising means for combining the first and second standard video signals to produce a composite standard video signal comprised of alternating even video images from one of the first and second cameras and odd images from the other of the first and second cameras, means for producing an electrical signal indicative of the point or region of interest within the scene, means for superimposing onto the composite standard video signal a raster graphic output video signal of a virtual, stereographic pointer representative of the three dimensional coordinates of a point in a three dimensional video image to produce a multiplexed, composite video signal, means for displaying the multiplexed, composite video signal, and means responsive to the coordinates of the pointer for producing the camera separation and convergence angle control signals and transmitting the control signals to the control signal responsive means whereby to cause adjustment of the camera separation and convergence angle.
From an operational viewpoint, there are a number of novel applications of such a system. One application involves interactive, real-time, on-line rangefinding, whereby a combined computer/video system computes and displays the distance from the remote camera system to the object indicated by the human operator, or alternatively, between objects indicated by the human observer. Another application relates to aiming of weaponry wherein the operator points to a target in three dimensional space and the computer computes its relative spatial coordinates and the consequent necessary orientation of the telemanipulator. Still another application relates to high order manual vehicle control wherein, rather than steering a remotely operated vehicle manually to a particular point by means of continuous directional and velocity control, the stereographic pointer is used to indicate the desired end point, which, once completely specified in spatial coordinates relative to the vehicle's frame of reference, can be approached under the control of a control system (at any desired rate). A still further application is high order manual manipulator control wherein, rather than manually controlling each of the joints of a multi-degree of freedom robotic arm in order to move the end effector to a desired point in space, or even rather than controlling the end effector directly by means of resolved motion control, the stereographic pointer is used to indicate the desired end point, which, once completely specified in spatial coordinates relative to the manipulator's frame of reference, can be approached by the end effector under the control of the computer control system (at any desired rate).



The present invention can be achieved by existing, off-the-shelf technology and therefore is relatively inexpensive. The invention combines closed circuit stereoscopic video technology with stereoscopic computer graphic technology and uses a continuous, three degree of freedom computer input device and human pattern recognition and decision making capabilities, rather than those of a programmed microprocessor, to create a "virtual" pointing device for performing measurements in a "real" remotely viewed environment, thereby obviating the need for a relatively expensive and potentially unreliable and/or inaccurate "artificially intelligent" system.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the invention will become more apparent from the following description in which reference is made to the appended drawings wherein:
FIGURE 1 is a block diagrammatic view of a preferred embodiment of a stereoscopic television system with a stereographic pointer according to the present invention;
FIGURE 2 is a schematic diagram of a typical alternating-field stereoscopic television system;
FIGURE 3 is a block diagrammatic view of an alternating field combining circuit;
FIGURE 4 presents flow chart diagrams of one embodiment of interactive stereographic pointer display logic according to the present invention;
FIGURE 5 is a schematic representation of a preferred embodiment of stereographic pointer plotting geometry according to the present invention;
FIGURE 6 is a schematic diagram of a preferred embodiment of a dual camera alignment control system according to the present invention;
FIGURE 7a is a sketch of a simulated park scene;
FIGURE 7b is a sketch of how the park scene of FIGURE 7a might appear on a video monitor where, by using suitable stereoscopic spectacles, an observer would perceive the scene three dimensionally, that is, he would directly perceive the relative distances of the various objects in the scene viewed via the cameras, by using stereoscopic depth cues, rather than just monoscopic depth cues;
FIGURES 8a and 8b are similar to FIGURES 7a and 7b but illustrating an embodiment in which the pointer is used to indicate and/or highlight on a monitor a virtual three dimensional trajectory through the equivalent hypothetical real video scene;
FIGURE 9 illustrates an embodiment of the present invention, which provides for dynamic adjustment of the camera separation and camera angle of convergence;
FIGURES 10a and 10b are diagrams illustrating the geometry of the optical sensing elements of cameras and the perception by a human observer of the corresponding stereoscopic images; and
FIGURES 11a through 11e depict various top views of a stereoscopic viewing system in a hypothetical working environment to illustrate the manner in which one embodiment of the present invention operates to dynamically control the configuration of the stereoscopic cameras.
DESCRIPTION OF PREFERRED EMBODIMENT
FIGURE 1 illustrates the preferred embodiment of the present invention, comprising a stereoscopic television system 10 which provides virtual stereographic pointer images superimposed upon real-time live three dimensional video images produced by an alternating-field stereoscopic television system. In general, the present invention includes an alternating field video signal generating system 11 which includes a pair of synchronized cameras 14 and 16. The cameras may be mounted on a robot or may form part of a remote surveillance system. As explained more fully later, the individual video signals produced by the cameras are processed by an alternating field combining circuit 20 to produce a single composite video signal 23. Signal 23 is delivered to a stereographic pointer image generator 30 where a pointer video signal, having the same format as signal 23, is superimposed onto signal 23. The resulting signal 36 is delivered to a video display monitor 38 on which, using stereoscopic shuttering spectacles 64, a user is able to perceive three dimensional images which will include the images recorded by the cameras as well as the stereographic pointer. Further, using a pointer positioning device 35, the user is able to move the pointer on the screen within the three dimensional images captured by the cameras and place the pointer on or near any object observed on the screen. This, in turn, allows a control computer 32 to control the positioning of the cameras as well as the operation of the robot, if a robot is used.
Before describing the invention in greater detail, it would be useful to
review FIGURES 7 and 8.
FIGURE 7 is an illustration of how the relationship between a hypothetical real-world video scene and the same scene viewed stereoscopically on a monitor with superimposed stereoscopic graphics might look. FIGURE 7a is a sketch of a simulated park scene, comprising a park bench 394, in front of a tree 392, beside a dustbin 395, with a rock 393 in the background and a box 396 somewhere off to the right. The simulated scene is being viewed through a pair of stereoscopic video cameras 14 and 16. FIGURE 7b is a sketch of how the same scene might appear on a video monitor 38, where it is to be understood that, by using suitable stereoscopic spectacles 64, an observer would perceive this scene three dimensionally; that is, he would directly perceive the relative distances of the various objects in the scene viewed via the cameras, by using stereoscopic depth cues, rather than just monoscopic depth cues.

One embodiment of the stereographic pointer is shown by the shaded triangle pointer 371, which in this example has been placed at the top of the rock 393' by the observer, using the Pointer Positioning Device 35. It is to be understood that, to the observer, the stereographic pointer 371 would be perceived as hovering exactly above that corner of the rock in three dimensional space, that is, at the exact same distance from the cameras as the real corner of the rock 393 itself. It is important to note, however, that the image of the stereographic pointer 371 in FIGURE 7b is a virtual image, that is, there is not a corresponding image in the real video scene in FIGURE 7a.
One feature of the present invention is the capability of computing, according to the method illustrated in FIGURE 5 described later, the equivalent distance from the midpoint of the camera axis to the equivalent location of the virtual pointer within the corresponding real video world, and displaying the results of this computation to the observer. In FIGURE 7b this distance is illustrated by the "Start" distance within the readout block 374 on the screen.
Another feature of the present invention, which can not be illustrated in FIGURE 7b due to the two dimensionality of the figure, is the capability of displaying the computational readouts associated with the pointer position, or the entire readout block 374, or any other symbols or figures, alphanumeric or pictorial or otherwise, at any desired equivalent distance within the corresponding real visual scene. This is accomplished simply by displaying the information stereoscopically, at the same equivalent distance within the real world as any desired equivalent stereographic pointer location, according to the methods of computing the necessary binocular parallax outlined in this disclosure. This feature is in contrast to the conventional method whereby information is typically displayed at the same plane as the viewing screen, that is, monoscopically.
Yet another feature of the present invention is the so-called "tape measure" option. As illustrated in FIGURE 7b, the stereographic pointer 371 can also be moved by the observer, using the Pointer Positioning Device 35, from its original "Start" position, to any other position, indicated in the figure by the second stereographic pointer 372. Because the equivalent position of the stereographic pointer within the corresponding real video world is always known, according to the method illustrated in FIGURE 5, it is possible not only to compute the new equivalent position of the stereographic pointer 372, but also to compute the net scalar distance between the old and the new pointer positions 371 and 372. These values are illustrated within the readout block 374 as the "End" position and the "Net Distance". Another fundamental feature of the tape measure option is the ability to highlight on the viewing screen the equivalent three dimensional trajectory 373 of the pointer 372 relative to the "Start" pointer 371. Note that neither the stereographic pointers 371 and 372, nor the stereographic trajectory line 373 appear in the equivalent real-world illustration shown in FIGURE 7a.
An extension of the tape measure option is illustrated in FIGURES 8a and 8b, wherein the context of the two figures parallels exactly that of FIGURES 7a and 7b. Whereas in FIGURE 7b the observer has used the stereographic pointer as an instrument to measure the locations of specific objects or points, or distances between those objects or points, within the real three dimensional video scene, in FIGURE 8b the observer has used the pointer principally as a means to indicate and/or highlight a virtual three dimensional trajectory 388 through the equivalent hypothetical real video scene shown in FIGURE 8a. In this example, the observer has moved the stereographic pointer, here shown as an inverted arrow 383 (as opposed to the inverted shaded triangle shape depicted in FIGURE 7b) to an initial point within the real three dimensional video scene. He has then caused the stereographic pointer to move to five other subsequent points, depicted in the figure as stereographic pointers 384, 385, 386 and 387. The virtual three dimensional trajectory line 388 has been drawn to illustrate a pathway through these various waypoints. Note that neither the virtual three dimensional trajectory line 388, nor the stereographic pointers 383, 384, 385, 386 and 387, appear within the illustration of the corresponding real world scene in FIGURE 8a. This feature of superimposing a virtual three dimensional trajectory on top of a real video scene has applications in navigation, training, telerobotic control, robot path planning, highlighting the predicted course of manipulators or projectiles, as well as any other situation in which it could be useful to indicate to a human observer a potential three dimensional pathway through a real remote scene.
Yet another feature of the present invention is the ability, in addition to the superimposed stereographic pointers, tape measures and trajectories illustrated thus far, to consider the concept of the stereographic pointer in a more general sense by superimposing on the viewing screen stereoscopic images of more "complex" three dimensional objects. In generating such images, the methods described above for computing the screen coordinates of an arbitrary stereographic pointer located at a particular equivalent location within the real video world can be applied in the general case, to compute the screen coordinates of the vertices, for example, of any more complex three dimensional object which is to be drawn within the corresponding real video world. This feature is illustrated in FIGURE 7b by the virtual box 375, which has been superimposed next to the front of the park bench 394'. Note again that no similar box appears in the equivalent real-world illustration shown in FIGURE 7a. Such a capability is useful for applications where it is necessary to be able to visualise how or where a particular object will appear within a visual scene, such as during a design project, or for training purposes, or for superimposing "predictive displays" of future object locations and attitudes, or for graphically superimposing concepts such as "field intensities", etc.

In addition, if the geometrical measurements and location of a particular object are known to the computer system, the same object can be superimposed graphically onto the real video scene, for example using wireframe imaging, as a means of checking the integrity of the display system and/or the object location measuring system, by verifying whether the real and the virtual objects being displayed are indeed aligned. Another related application of this capability is to superimpose a graphical "wireframe" image of an object onto the corresponding real object as a means of enhancing the view of the boundaries of that object, such as for when the video display suffers from poor lighting conditions, glare, scatter, occlusion, etc. Object boundaries can also be enhanced by adding colour, or shading, or highlighting to the virtual stereographic image, or portions of that image, permitting one to indicate to the observer various types of information, such as proximity warnings. Clearly, the examples mentioned here are representative only, and do not limit the potential range of applications of the present invention.
Before describing the invention in greater detail, it would be useful to review the operation of a typical stereoscopic television system. FIGURE 2 illustrates a stereoscopic television system generally designated by reference numeral 44 in which individual video signals 15 and 17, produced by synchronised left and right video cameras 14 and 16 respectively, are processed by alternating field combining circuit 20 to produce a single composite interlaced standard video signal 23, in conventional analogue format, such as NTSC, PAL, SECAM and the like. As illustrated in FIGURE 2, the left (L) camera video signal 15 comprises alternating odd (O) and even (E) raster scan fields, labelled LO and LE respectively. Similarly, the right (R) camera video signal 17 comprises alternating odd (O) and even (E) raster scan fields, labelled RO and RE respectively. The resulting single composite interlaced standard video signal 23 is comprised of a repeating pattern of alternating fields, LO, RE, LO, RE, etc. which is fed both to video monitor 38 and to spectacle synchronisation and driving circuit 60. This results in driving signals fed in counterphase to stereoscopic shuttering spectacles 64, which act to "separate" for the observer the odd and even field video images being displayed on the video screen 38 into left and right eye images, respectively, to thereby allow the observer to perceive three dimensional images on the display screen. With no loss of generality, it is understood that the standard video signal 23 illustrated in FIGURE 2 could equally well be depicted as comprising the alternating field pattern LE, RO, LE, RO, etc.
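The resulting field multiplexing is easy to model in software. The sketch below is a Python illustration, not the analogue circuit; it shows how the LO, RE, LO, RE pattern of signal 23 arises from the two camera signals:

```python
def combine_fields(left_frames, right_frames):
    """Software model of alternating-field combining: take the odd
    raster field (LO) from each left-camera frame and the even field
    (RE) from each right-camera frame, yielding the repeating
    LO, RE, LO, RE ... pattern of composite signal 23. Frames are
    modelled as dicts {"odd": ..., "even": ...}; the actual circuit
    switches analogue fields, not Python objects."""
    composite = []
    for left, right in zip(left_frames, right_frames):
        composite.append(("LO", left["odd"]))
        composite.append(("RE", right["even"]))
    return composite
```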
With reference to FIGURE 1, the present invention will be seen to include an alternating-field video signal generating system 11 for producing composite interlaced standard video signal 23 in conventional analogue format, such as NTSC, PAL, SECAM and the like, comprising a combination of alternating left and right field video images. System 11 generally comprises a synchronised dual video signal generating system 12 for generating a pair of synchronised video signals, comprising a left camera video signal 15 and a right camera video signal 17, which are fed to alternating-field combining circuit 20. System 12 generally takes the form of a pair of video cameras, left camera 14 and right camera 16, which are physically aligned by camera alignment system 50 having a camera alignment mechanism 51 (see FIGURE 6) under the control of camera alignment controller 52, and are synchronised by camera sync generator circuit 18. Alternatively, system 12 could comprise a special purpose dual-optic, single-camera stereoscopic video system (not shown). An alternative embodiment of circuit 11, for off-line images, could comprise a video playback system (not shown).
FIGURE 3 illustrates a preferred form of alternating-field combining circuit 20 in which left video signal 15 is used to trigger sync separator 24, which in turn generates a square wave signal that drives a combining circuit 22, which includes left solid state relay 25 and right solid state relay 27. Sync separator 24 is well known to those skilled in the art, an embodiment of which is disclosed in an article by L.-Y. Lin, published in Electrical Design News, October 16, 1986, pp. 233-234. A NOT gate 26 is inserted between the sync separator 24 and the right solid state relay 27 and causes the standard video signal 23 to be comprised of alternating odd and even fields supplied by the incoming left and right video signals 15 and 17, as illustrated in FIGURE 2. With no loss of generality it is understood that sync separator 24 shown in FIGURE 3 could equally well have been triggered by right video signal 17, rather than left video signal 15 as shown.
Referring to FIGURE 1, stereographic pointer image generator 30 will be seen to include a computer video sync circuit 31, a control computer 32, a video keying circuit 34 and a pointer positioning device 35. Pointer image generator 30 operates, by means of computer video sync circuit 31, upon incoming standard video signal 23 to synchronise or "genlock" the raster graphic output 33 of control computer 32 with video signal 23. As is well known to those skilled in the art, "genlocking" is the forced synchronization, by an external signal, of the internal horizontal, vertical and colour burst oscillators of a video device. Raster graphic output 33 is of the same electronic format as conventional analogue video signals, such as NTSC, PAL, SECAM and the like. A video keying circuit 34 is operable to superimpose graphic output 33 upon the synchronised standard video signal 23, allowing the two video signals to be displayed together as a single combined video signal 36 on the same viewing screen 38. Circuits 31 and 34 are well known to those skilled in the art, one combined implementation of which is the AmiGen genlock device, manufactured by Mimetics Corporation of Palo Alto, California. Output 33 is a video signal of the stereographic pointer of the present invention.
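The keying operation itself amounts to a per-pixel switch between the two genlocked signals. A toy Python model only; the real circuit keys analogue video, and the luminance-threshold rule below is an assumed stand-in for whatever keying scheme circuit 34 actually uses:

```python
def key_pointer_onto_video(video_field, pointer_field, key_level=0):
    """Per-pixel keying model: wherever the genlocked graphic output
    is brighter than the key level, its pixel replaces the camera
    pixel, superimposing the pointer on the live scene. Fields are
    modelled as 2-D lists of luminance values."""
    return [
        [g if g > key_level else v for v, g in zip(vrow, grow)]
        for vrow, grow in zip(video_field, pointer_field)
    ]
```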
Control computer 32 includes software, generally designated by reference numeral 70 in FIGURE 4, for creating and displaying computer generated left and right images, such as a computer generated cursor, or crosshair, or orthogonal axis system, or cursor attached to a "rubber band" anchored at its other end, or a "tape measure" running between two movable ends, under the control of a human operator, and for displaying these in synchrony with the left and right camera video images on the same viewing screen 38, thereby creating a virtual, stereographic pointer which appears to be superimposed upon the observed "real" three dimensional video image produced by the camera system.




Pointer positioning means 35 is in the form of a joystick, trackball, mouse or the like and enables a human operator to communicate with the computer in order to manipulate, with three degrees of freedom, the perceived location of the stereographic pointer within the perceived three dimensional real video image. Three degree of freedom positioning devices are well known in the art and are generally comprised of a stick or handle or trackball which can move within a two-dimensional plane and an additional control mechanism which allows movement in a third orthogonal dimension. Sensors (not shown) associated with movement of the pointing device generate signals which are transmitted to the Control Computer 32. One embodiment of such a device is the FastTRAP three axis pointing device manufactured by MicroSpeed, Inc. of Fremont, California, comprising a two degree of freedom (x-y plane) trackball and a one degree of freedom (z plane) thumbwheel. The device includes buttons which can be used to cause the control computer to fix one or more pointers in a particular position and draw trajectories between them as indicated earlier with reference to FIGURES 7 and 8.
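A software model of such a device might simply accumulate trackball and thumbwheel counts into a three degree of freedom pointer position. A sketch; the gains are placeholders, not values from the disclosure:

```python
class PointerPosition:
    """Model of reading a trackball-plus-thumbwheel device into a
    three degree of freedom pointer position."""

    def __init__(self, gain_xy=0.01, gain_z=0.01):
        self.x = self.y = self.z = 0.0
        self.gain_xy, self.gain_z = gain_xy, gain_z

    def on_trackball(self, dx_counts, dy_counts):
        # Trackball motion drives the pointer within the x-y plane.
        self.x += dx_counts * self.gain_xy
        self.y += dy_counts * self.gain_xy

    def on_thumbwheel(self, dz_counts):
        # The thumbwheel drives the depth (z) axis.
        self.z += dz_counts * self.gain_z
```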
Spectacle synchronisation and driving circuit 60 acts upon composite interlaced standard video signal 36, which consists of the real three dimensional video images produced by the cameras and the superimposed stereographic pointer, as displayed on video screen 38, by identifying the odd and even raster fields, corresponding to left and right eye images respectively (or, equivalently, right and left eye images respectively), by means of sync separator 61. Sync separator 61 is well known to those skilled in the art, and one embodiment is known from the article by L.-Y. Lin, published in Electrical Design News, October 16, 1986, pp. 233-234. The output of sync separator 61 is fed to spectacle driver 62, which produces driving signals fed in counterphase to stereoscopic shuttering spectacles 64, according to the principles illustrated in FIGURE 2. Stereoscopic shuttering spectacles 64 are operable to separate, for the observer, the odd and even field video images being displayed on the video screen 38 into left and right eye images, respectively, to thereby allow the observer to perceive three dimensional images on the display screen.
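The counterphase drive can be summarised as: whichever field parity is currently being scanned, only the corresponding eye's shutter is open. A schematic Python sketch; the odd/left, even/right assignment may equally be reversed, as the text notes:

```python
def shutter_state(field_parity):
    """Counterphase shutter drive, schematically: only the eye
    matching the field currently being scanned sees the screen.
    Here odd fields are routed to the left eye and even fields
    to the right."""
    return {"left_open": field_parity == "odd",
            "right_open": field_parity == "even"}
```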




Stereoscopic shuttering spectacles 64 are well known in the art and are characterised as portable electro-optic shutter viewers having electro-optic shutters for positioning proximate to the eyes of the user. The preferred embodiment of stereoscopic shuttering spectacles 64, as well as necessary means for implementing spectacle driver 62, are known from Paul Milgram United States Patent No. 4,698,668 issued October 6, 1987. The combination of stereoscopic shuttering spectacles 64 and spectacle driver 62 includes means (not shown) to receive a synchronisation signal and to alternately activate and deactivate individual shutter lenses (not shown) opposite the left and right eyes of the viewer to observe the even field scan and the odd field scan as transmitted in the composite video signal delivered to video screen 38.
In addition to generating the raster graphic output 33 required for the
stereographic pointer, control computer 32 provides signals to control the
alignment of the dual video cameras comprised within the synchronised dual
video
signal generating system 12, as illustrated within FIGURE 6. Signals to
control the
alignment of the dual video cameras are optionally generated by software
within
control computer 32, in response to equivalent spatial location of the
stereographic pointer and/or requests by the human operator. In this
embodiment, signals arising from the control computer 32 are fed to the camera
alignment controller 52 which in turn activates motors 55 and 55' comprised
within
the camera alignment mechanism 51. The preferred embodiment of the camera
alignment mechanism 51, as illustrated schematically in FIGURE 6, comprises a
pair of mounting brackets, the left camera bracket 57, upon which left camera
14
is mounted, and right camera bracket 58, upon which right camera 16 is
mounted.
The motors 55 and 55' act to turn the two roman screws 56 and 56', each of
which
have both left and right hand threads, thereby moving the connection points of
the
two camera brackets 57 and 58 on roman screws 56 and 56' to different
positions
on the screws, thereby changing the orientation of the camera brackets 57 and
58
relative to each other. By turning both screws appropriately, the two
platforms
move together and apart symetrically, which allows for adjustment of camera
separation. By turning only one screw, or by turning both screws in opposite
directions, the angle of convergence, or divergence, of the two cameras can be




-20-
adjusted. Combining both of these modes of adjustment therefore allows
independent adjustment of camera separation and camera convergence angle.
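This pair of adjustment modes can be sketched as a simple decomposition into a
common-mode and a differential component. The sketch below is illustrative only,
not the patent's control code; the thread pitch, the lever arm length, the
function name and the small-angle approximation are assumptions introduced here.

```python
import math

# Hypothetical sketch of the screw-driven camera alignment mechanism 51:
# equal turning of screws 56 and 56' moves the brackets symmetrically
# (camera separation), while differential turning rotates the brackets
# relative to each other (convergence angle). 'pitch' is metres of bracket
# travel per turn; 'lever_arm' is metres from bracket pivot to screw
# connection. Both are assumed parameters.

def bracket_adjustment(turns_56, turns_56_prime, pitch=0.001, lever_arm=0.05):
    symmetric = 0.5 * (turns_56 + turns_56_prime)     # common-mode turning
    differential = 0.5 * (turns_56 - turns_56_prime)  # opposed turning
    delta_separation = 2.0 * symmetric * pitch        # both brackets move
    delta_convergence = 2.0 * math.atan(differential * pitch / lever_arm)
    return delta_separation, delta_convergence        # (metres, radians)
```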
In order to generate a stereoscopic graphic image of a pointer or any other
two or three dimensional object so that it appears to the user in a specific
location in space, it is necessary to determine the precise locations on the
viewing
monitor 38 where the corresponding left-eye and right-eye images would appear
if the pointer or object were physically present in the view of the
stereoscopic
video scene.
FIGURE 5a represents the geometry of the left camera, where the
compound lens of the camera is represented as a pinhole lens 806. The three
dimensional geometrical basis of the camera system PQR has its origin
corresponding with the location of the pinhole lens 806. The R axis 808 is
perpendicular to the camera image sensor 801 and passes through the centre of
it. Given an arbitrary point expressed in terms of the PQR basis, such as A(p,q,r)
807, it is possible to calculate the position of the image AIS(lx,ly) 805 on the
image sensor 801, with the equations lx=f(p/r) and ly=f(q/r), where lx 803 is the
horizontal distance of the image AIS(lx,ly) from the centre of the image sensor
801, ly 804 is the vertical distance of the image AIS(lx,ly) from the centre of the
image sensor 801, and "f" is the focal length of the lens.
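This projection can be sketched directly from the two equations just given; the
function name is illustrative, the relations are exactly lx=f(p/r), ly=f(q/r).

```python
def project_to_sensor(p, q, r, f):
    """Project point A(p,q,r), expressed in the camera's PQR basis, onto
    the image sensor of a pinhole camera with focal length f."""
    if r <= 0:
        raise ValueError("point must lie in front of the lens (r > 0)")
    lx = f * (p / r)   # horizontal offset of AIS from the sensor centre
    ly = f * (q / r)   # vertical offset of AIS from the sensor centre
    return lx, ly
```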
A top view of a simplified two camera stereoscopic system is shown in
FIGURE 5b. The centroid 812 of the XYZ basis of the stereoscopic system is
located midway between the left camera lens 813 and the right camera lens 814,
each of which lies at a distance "s" from the centroid. The two cameras converge
on point "C", situated a distance "c" from the centroid 812 along the Z axis. The
convergence angle φ is defined by the equation tan(φ)=s/c. Note that for
stereoscopic camera configurations that are parallel, c is defined as infinite and
φ is defined as zero.
The geometric basis of the left camera is denoted by the PQR axes in
FIGURE 5a and by the PR axes in FIGURE 5b, while the geometric basis of the
right camera is denoted by the UVW axes, shown as the UW axes in FIGURE 5b.
Any point expressed in terms of any one of the three bases (XYZ, PQR, UVW),
can also be expressed in terms of any of the other bases. For example, given
any
point A, expressed in terms of the XYZ basis, as in FIGURE 5b, its
representation
in the other bases is calculated as follows. Given f, s, and either c or φ, and
given:

    [A]XYZ = | x |
             | y |
             | z |

then one can calculate:

    [A]PQR = | p |   | cos(φ)   0   -sin(φ) | | x |   | s |
             | q | = |   0      1      0    | | y | + | 0 |
             | r |   | sin(φ)   0    cos(φ) | | z |   | 0 |

Similarly,

    [A]UVW = | u |   | cos(φ)   0    sin(φ) | | x |   | -s |
             | v | = |   0      1      0    | | y | + |  0 |
             | w |   | -sin(φ)  0    cos(φ) | | z |   |  0 |
When the location of any point A is expressed in the basis of each camera as
[A]PQR and [A]UVW, one can calculate the location of the image 805 on the left
camera image sensor 801 and the image 819 on the right camera image sensor 820
using the method outlined below:
For the left camera:

    lx = f(p/r)  and  ly = f(q/r)

For the right camera:

    rx = f(u/w)  and  ry = f(v/w)
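The two steps can be combined in a single sketch that transcribes the matrix
equations above and then applies the per-camera projections. Python and the
function name are illustrative assumptions; the arithmetic follows the
equations as reconstructed above.

```python
import math

def stereo_image_positions(x, y, z, s, c, f):
    """Express point [x, y, z] (XYZ basis) in the PQR and UVW bases per
    the matrix equations above, then project onto each image sensor."""
    phi = math.atan2(s, c)                 # convergence angle: tan(phi) = s/c
    cp, sp = math.cos(phi), math.sin(phi)
    # Left camera basis (PQR)
    p, q, r = cp * x - sp * z + s, y, sp * x + cp * z
    # Right camera basis (UVW)
    u, v, w = cp * x + sp * z - s, y, -sp * x + cp * z
    # Image positions on the left (801) and right (820) sensors
    return (f * p / r, f * q / r), (f * u / w, f * v / w)
```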



The relationship between the location of AIS on a camera image sensor and
the corresponding location on the display screen is linear in both the
horizontal
and vertical directions for an ideal display system. For real display systems,
appropriate calibration algorithms must be implemented to account for
non-linearities in each direction. Ideally, the gain in each direction is the
same,
although this is a function of the settings of the monitor. In order to draw a
realistic virtual image that will have the correct size and apparent location
in
depth, the left and right eye images are drawn on the viewing screen with the
size
and location they would have if they were real. The method described above
allows this to be done accurately.
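For the ideal case this amounts to one linear scaling, sketched below; the gain
G anticipates the display gain defined later in this description, and a real
system would substitute a calibrated, position-dependent mapping, as the text
notes.

```python
def sensor_to_screen(sensor_x, sensor_y, G):
    """Map a sensor coordinate (relative to the sensor centre) to a screen
    coordinate (relative to the screen centre), ideal linear case."""
    return G * sensor_x, G * sensor_y
```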
The preferred embodiment of software logic 70 for generating interactive
pointer images, under control of the user through pointer positioning device 35,
is illustrated in FIGURE 4. In the embodiment shown, software logic 70 serves to
enable the generation of pointer images which take on arbitrary forms, such as a
"V" shaped cursor, crosshair, orthogonal axis system, or cursor attached to a
"rubber band" anchored at its other end, or a "tape measure" function comprising
a "rubber band" anchored under interactive control of the software user at both
ends. In addition, ordinary alphanumeric text can be generated to appear at
specified apparent depth planes. In the description which follows, the term Event
means signals and commands generated by the software user, the term
TapeMeasure means a stereographic line generator produced by the software and
used by the user to measure real world distance between specified points in the
external video view, the term "Measuring" Flag indicates that the TapeMeasure is
currently employed, and the term "Measured" Flag indicates that the TapeMeasure
operation has completed, and a fixed line with distance information appears on
the display.
According to software logic 70, generation of pointer images commences
with invocation of stereographic pointer program 701, causing initialisation 702
of computer memory and hardware devices. Upon invocation of Draw Pointer
routine 703, Control Computer 32 enters the Wait for Event state 704.
Upon Event occurrence, the software first tests for a "Quit" Signal at 710.
If present, the software tests at 711 for Measuring flag to be True, signalling that




current measuring tape function is to cease. If True, the old TapeMeasure is
erased (713), Measuring Flag is set to False (714), and the software returns
to
Wait for Event state 704. If False, the software routines are ShutDown at 712
and the software exits. If no "Quit" signal is present at 710, the software
tests at
720 for "Change Pointer Shape" signal to be present.
If a "Change Pointer Shape" signal is present, the software sets the Pointer
Shape to NewShape at 721, redraws the Pointer at 722, and returns to Wait for
Event 704. If no "Change Pointer Shape" signal is present at 720, the software
tests for "StopMeasuring" Signal 730. If it is present, the software tests if
Measured flag is True at 731. If True, the software erases the old TapeMeasure
at 732 and erases the Distance Information at 733. This is in order to clear
the
screen if the StopMeasuring signal arrives without a preceding StartMeasuring
signal. If Measured flag is False at step 731, then steps 732 and 733 are
skipped.
The software then sets Measuring flag to False at 734 and Measured flag to
True
at 735. The software then sets TapeMeasure EndPosition to the current Pointer
Position at 736, redraws TapeMeasure at 737, prints Distance Information 738,
and returns to Wait for Event state 704. If no "StopMeasuring" signal is
present
at 730, the software tests for "StartMeasuring" signal 740, which indicates
that the
user wishes to start measuring from a new location. If this signal is present,
the
software tests whether the Measuring flag is True at 741. If True, the software
erases the old TapeMeasure at 742, while if Measuring flag is False, step 742 is
skipped. The software then tests if Measured flag is True at 743. If True, the
software erases old TapeMeasure 744, erases Distance Information 745, and sets
Measured flag to False 746. If Measured flag is False at 743, steps 744, 745 and
746 are skipped. The software then sets Measuring flag to True at 747 and
TapeMeasure StartPosition to the current Pointer Position 748. The software
then returns to Wait for Event state 704.
If no "StartMeasuring" signal is present at 740, the software tests at 750 if
"HomePointer" signal is present. This indicates a desire to return the pointer
to
a predefined location. If present, the software sets at step 751 the Pointer Real
World Position equal to a Preset Value, calculates Pointer Drawing Position at 752,
and redraws Pointer at 753. If Measuring flag is True at 754, the software
sets




TapeMeasure EndPosition to Pointer Position 755, redraws TapeMeasure 756,
and then returns to Wait for Event state 704. If Measuring flag is False at
754,
the software returns directly to Wait for Event state 704.
If no "Homel?ointer" signal is present at 750, the software tests whether a
"Painter Positioning Device Movement" signal is present at 760. If present,
the
software updates Pointer Real World Position at 761, and continues from step
752
above.
If no "Pointer Positioning Device Movement" signal is present at 764, the
software tests at step 770 whether a "Manual Camera Alignment Change" signal
is present, indicating that the camera position has been adjusted manually, outside
outside
of program control. If present, the software reads in new values for Camera
Separation and Convergence Point 771, and continues from 783 below.
If no "Manual Camera Alignment Change" signal is present at 770, the
software tests if "Remote Camera Alignment Change" signal is present at 780.
If
present, the software Reads in New Camera Separation and Convergence
information at 781. The software then sends appropriate control signals to
Camera Alignment Controller 52 at step 782 to adjust the cameras remotely. The
software then calculates new Pointer Drawing Position at step 783 and redraws
Pointer at step 784 to maintain apparent real world position. If either Measuring
flag or Measured flag is True at step 785, the software Recalculates TapeMeasure
Drawing Position at 786 to maintain correspondence with the real world view,
redraws TapeMeasure at 787, and returns to Wait for Event state at 704. If both
Measuring flag and Measured flag are False at 785, the software returns directly
to Wait for Event state at 704.
If no "Remote Camera Alignment Change" signal is present at 784,
software tests if "Automatic Camera Alignment Change" signal is present at
step
790. If present, software Calculates Optimal Camera Configuration to minimize
z-axis scale distortion based on current Pointer position and continues from
step
782 above. If no "Automatic Camera Alignment Changes" signal is present at
790,
the software ignores the Event and returns to Wait for Event state 704.
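The dispatch just described can be condensed into a short sketch. This is not
the patent's source code: the Event names, the State fields and the draw() stub
are illustrative stand-ins for the raster graphic calls, and only the step
numbers are taken from the description above; the remaining branches follow the
same pattern.

```python
from dataclasses import dataclass

@dataclass
class State:
    measuring: bool = False
    measured: bool = False
    pointer_pos: tuple = (0.0, 0.0, 0.0)
    tape_start: tuple = None
    tape_end: tuple = None
    shape: str = "crosshair"

def draw(action):
    print(action)   # stand-in for raster graphic output 33

def handle_event(kind, state, payload=None):
    """One pass of the Wait for Event dispatch (tests 710-790, condensed)."""
    if kind == "Quit":                                          # test 710
        if state.measuring:                                     # test 711
            draw("erase TapeMeasure"); state.measuring = False  # 713, 714
            return "wait"
        return "exit"                                           # ShutDown 712
    if kind == "ChangePointerShape":                            # test 720
        state.shape = payload                                   # 721
        draw("redraw Pointer")                                  # 722
    elif kind == "StopMeasuring":                               # test 730
        if state.measured:                                      # test 731
            draw("erase TapeMeasure and Distance Information")  # 732, 733
        state.measuring, state.measured = False, True           # 734, 735
        state.tape_end = state.pointer_pos                      # 736
        draw("redraw TapeMeasure, print Distance Information")  # 737, 738
    elif kind == "StartMeasuring":                              # test 740
        if state.measuring:                                     # test 741
            draw("erase TapeMeasure")                           # 742
        if state.measured:                                      # test 743
            draw("erase TapeMeasure and Distance Information")  # 744, 745
            state.measured = False                              # 746
        state.measuring = True                                  # 747
        state.tape_start = state.pointer_pos                    # 748
    # The HomePointer, pointer-movement and camera-alignment branches
    # (tests 750-790) follow the same pattern and are omitted here.
    return "wait"
```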




INTERACTIVE CAMERA ALIGNMENT
One of the problems associated with assembling a functional stereoscopic
video system is determining suitable values for the separation and directional
parameters of the cameras of the stereoscopic video system. Because the
objective of a stereoscopic video system is to allow an observer to perceive
volumetric information about an object by means of binocular disparity, that
is,
by means of the differences between the projections of the parts of an object
onto
the two retinas of the observer's eyes, it is generally desirable that the
images
produced by the cameras be oriented with respect to each other such that the
disparity between the corresponding retinal images does not exceed the limits
of
the observer's ability to fuse those separate images into a single
stereoscopically
perceived image.
Keeping in mind that the separation between a human's two eyes is fixed,
the objective of matching the orientation of the cameras with the orientation
of
the observer's eyes involves ensuring that the separation and the angle of
convergence of the two cameras are appropriate relative to the separation and
the
angle of convergence of the observer's eyes. For any particular angle of
convergence and particular horizontal separation between cameras, there will
be
one vertical line in space at which the vertical planes defined by the lines
of sight
of the two cameras intersect. Any object or object segment located along that
line
in space will be presented to the observer on the display surface as two
coincident images, that is, with no horizontal disparity, and will elicit a
particular angle of
convergence of the observer's eyes when observing it. The actual angle of
convergence of the observer's eyes will depend on the separation between the
observer's eyes and on the distance at which the observer's eyes are situated
from
the display surface.
Any object or object segment located at a point in space which is in front
of the vertical line of convergence of the cameras, that is, proximal to the
cameras, will produce left and right images on the display surface that have a
certain disparity. When viewing such a point, the observer's eyes will
converge at
a point in front of the display surface, and the point will thus appear to
lie in
front of the display surface. The actual angle of convergence of the
observer's




eyes will depend on the separation between the observer's eyes, the distance
at
which the observer's eyes are situated from the display surface, the focal
length
of the lenses used and the gain of the displays. The term "display gain" is
used
here to refer to the relationship between the displacement of a point on the
display screen, or display surface, relative to its corresponding displacement
on the
sensing element of the video cameras.
Any object or object segment located at a point in space which is behind
the vertical line of convergence of the cameras, that is, distal to the
cameras, will
produce left and right images on the display surface that also have a certain
disparity. When viewing such a point, the observer's eyes will converge at a
point
behind the display surface, and the point will thus appear to lie behind the
display
surface. The actual angle of convergence of the observer's eyes will depend on
the separation between the observer's eyes, the distance at which the
observer's
eyes are situated from the display surface, the focal length of the lenses
used and
the gain of the displays.
Whenever the separation and angle of convergence of the video cameras,
the focal length of the cameras, the gain of the displays and the distance of
the
observer from the display surface are such that the images presented to and
perceived stereoscopically by the observer are similar to what would be
perceived
if the observer were to view the scene naturally with his own eyes, the
stereoscopic display system can be thought of as orthostereoscopic. However,
it
is also possible to separate the cameras at smaller distances relative to the
convergence angle, to produce a hypostereoscopic display system, or to
increase
the camera separation relative to the convergence angle to produce a
hyperstereoscopic display system. With hyperstereoscopic display systems, the
increased disparity between the left and right eye images results in an
effective
increase in the observer's ability to perceive differences in the location of
points
along the longitudinal axis of the camera system, that is, increased depth
resolution. With hypostereoscopic display systems, the observer does not have
to converge his eyes as much, for a given angle of convergence of the cameras
to
view particular objects, and, although depth resolution is decreased, the
range
within which the observer is able to fuse images will be correspondingly
increased.




On the basis of these principles, at least six problem areas associated with
determining the separation and alignment of stereoscopic video cameras, or
optical elements, can be identified:
In order to perceive stereoscopically a "near object", that is, an object
which is located in front of the line of camera convergence, i.e. proximal to
the
cameras, the observer must converge his eyes to a point in front of the
display
surface, to fuse the binocularly disparate images into one perceptually
integral
three dimensional object. In other words, any point which is located directly
in
front of the line of camera convergence, for example, will appear to the left
of the
centre of the right camera and to the right of centre of the left camera,
thereby
forcing the observer to converge his eyes to a point in front of the display
surface.
For objects which are farther in front of the convergence line, i.e. closer to
the
cameras, the observer must converge his eyes more. At some point it will
become
too difficult for most observers to accomplish this fusion, and the integral
image
may break down and be perceived as two separate images. This problem is
magnified for camera configurations in which the cameras are placed in
parallel,
in which case all images appear in front of the line of convergence, which
is at
infinity. Similarly, for "far objects", that is, objects which are behind the
line of
camera convergence, i.e. distal to the cameras, the observer's eyes must
diverge
to a point behind the display surface, in order for a fused three dimensional
image
to be perceived. For objects which are farther behind the convergence line,
i.e.
farther from the cameras, the observer must converge his eyes less, and may
even
have to diverge his eyes. At some point it will become too difficult for most
observers to accomplish this fusion, and the integral image may break down and
be perceived as two separate images. One obvious practical solution to this
problem is to endeavour to maintain the camera alignment such that the objects
being viewed are as often as possible as close as possible to the line of
convergence of the two cameras, thereby minimising the extent to which the
observer is required to converge or diverge his eyes.
As an extension of the problem concerning the observer's need to converge
or diverge his eyes excessively, in order to perceive an integral (i.e. fused)
three
dimensional object whenever such objects are respectively very far in front of
or




behind the line of convergence of the cameras, it may be possible in many
cases
for the observer to succeed in perceiving an integral fused image; however, it
may
be difficult for the observer to maintain that image for a long time without
suffering from eyestrain and/or fatigue and/or discomfort. Once again, one
obvious practical solution to the problem is to endeavour to maintain the
camera
alignment such that the objects being viewed are as often as possible as close
as
possible to the line of convergence of the two cameras, thereby minimising the
the
extent to which the observer is required to converge or diverge his eyes, and
might
thereby potentially experience eyestrain and/or fatigue and/or discomfort.
Under normal binocular viewing, that is, without a stereoscopic video
system, the focussing and convergence actions of a human's eyes are
compatible,
and in fact the human uses these as additional cues about the depth or
distance
of objects being viewed. That is, for objects which are relatively close to
him, the
human will both converge his eyes more and adjust his ocular focal length for
close viewing. Conversely, for objects which are relatively far away, the
human
will both converge his eyes less and adjust his ocular focal length for far
viewing.
With a stereoscopic video system, on the other hand, a potential conflict
between
these cues may arise. That is, there is an intrinsic conflict between the
observer's
need to converge his eyes at different angles in order to perceive objects at
different depths within the visual scene and the fact that all images are in
fact
being presented on one display surface, which is at a fixed distance from the
observer's eyes. As a consequence, whereas the observer might continually be
changing the angle of convergence of his eyes as he scans the visual scene
presented on the display surface, the focal length of his visual system
remains
constant. Clearly this conflict is minimal whenever the objects being viewed
lie
in the vicinity of the line of convergence of the cameras, which corresponds
to the
objects being perceived as lying on the surface of the display, which is
therefore
compatible with the observer's own focal plane.
As is well known to those skilled in the art of producing stereoscopic
display images, it is advisable to avoid presenting "near" objects, as defined above,
defined above,
in close proximity to the edges of the display screen, or display surface.
This is
because there can arise a serious conflict between the observed depth of the
edges




of the screen, which the observer clearly observes to be at the same depth as
the
rest of the screen surface, and the intended depth of the near object.
Whenever
the near object is not only in the vicinity of the screen edge, but goes
partially
beyond it, the edge of the screen will occlude parts of the near object.
Because
the observer's perceptual mechanisms know that objects can be visually
occluded
only by other objects which are in front of them, the near object stereoscopic
cue
will be overpowered by the occlusion cue, which will give the impression that
the
object is a far object instead. One solution to this problem is to endeavour
to
ensure that no near objects of interest are presented near the edges of the
screen,
and, if this case does arise, to redirect the cameras such that the objects of
interest are displaced from the edges of the screen. This solution may not
always
be feasible, however, such as when the cameras are mounted on a stationary
mount which cannot be moved, or which is not equipped with a panning
capability. Another, more flexible, solution in such instances, which is
offered by
the present invention, is to realign the cameras and thereby redefine the
depth of
the observed environment relative to the viewing screen, such that the same
objects, which should have been perceived as "near" objects, become perceived
as
lying at or "behind" the plane of the viewing screen.
Converged stereoscopic camera configurations can result in what is known
as stereoscopic depth distortion, whereby the locations of objects in the real
world
which are located within a fronto-parallel plane defined by the alignment of
the
cameras will appear to be distorted to the observer by the stereoscopic video
system. For camera configurations for which the camera separation is
relatively
large relative to the angle of convergence of the cameras, stereoscopic depth
distortion will increase. For example, with widely converged cameras, an
observer
stereoscopically viewing a horizontal metre stick located in the fronto-
parallel
plane including the camera convergence point may report that the metre stick
appears to be curved away from the observer. For a fixed camera configuration,
the depth distortion phenomenon will be static. Whenever the camera system is
moved in some way, involving translational and/or rotational motion, dynamic
depth distortions will result. The various properties of the stereoscopic
depth
distortion phenomenon have been analysed and reported, for example, by A.B.




Diner and M. von Sydow in NASA Jet Propulsion Laboratory Publication JPL
87-1, Rev. 1, May, 1988, "Stereo Depth Distortions in Teleoperation".
Any stereoscopic camera configuration will have associated with it a
particular stereoscopic depth resolution, that is, the extent to which the
stereoscopic cues supplied by the display system allow the observer to detect
just
noticeable differences in the location of points along the longitudinal axis
of the
camera system. With hyperstereoscopic display systems, that is, with
relatively
large camera separations relative to the angle of convergence of the cameras,
stereoscopic depth resolution will be increased. Two problems are associated
with
the degree of stereoscopic depth resolution. One problem is that for
hyperstereoscopic display systems, stereoscopic depth distortion will also
increase,
which implies that stereoscopic depth resolution and stereoscopic depth
distortion
must always be traded off against each other in the design of a stereoscopic
display system. Another problem is the so-called cardboarding effect,
according
to which an object will appear to be flattened in the z direction, that is,
the
direction of the longitudinal axis of the camera system. This will occur with
hypostereoscopic display systems, whenever the display gain in the z-direction
is
less than the display gain in the x-y plane, that is, within the fronto-
parallel plane
which is orthogonal to the z-axis.
From this discussion it is clear that, under many circumstances, it is
advantageous to have the cameras arranged, in terms of direction, separation,
and
angle of convergence, such that the centre of the observer's interest within
the .
video scene being viewed is as close as possible to the point of convergence
of the
cameras. If this can be achieved, the problems of excessive convergence or
divergence of the observer's eyes, eyestrain, fatigue and discomfort, as well
as
stereoscopic depth distortion, can be minimised, and stereoscopic depth
resolution
can be increased in conjunction. In other words, in many instances there is
some
optimal camera arrangement, in terms of direction, separation, and angle of
convergence of the cameras, relative to the vicinity in the video scene within
which
the observer happens to be looking, or concentrating his attention. To
determine
this optimum, however, it is necessary to know where this region is within the
video scene.



If that region is fairly static, then an appropriate camera configuration can
be determined and fixed. If the observer is required to scan the scene
extensively,
however, and/or if the cameras are to be redirected dynamically within the
video
scene, then a fixed camera configuration will not, in the general case, be
optimal.
It is advantageous, in other words, to be able to reconfigure the cameras
dynamically during a particular viewing operation, and thereby optimise the
viewing operation.
A number of methods of reconfiguring the stereoscopic cameras
dynamically during a viewing operation are possible. The viewer may
interactively
adjust the cameras until a subjectively acceptable image is obtained.
Alternatively,
analytical methods can be used to determine an optimal camera configuration,
with respect to the properties of the stereoscopic video system, to the task
to be
accomplished, and to the viewing region within the video scene.
In all cases of determining the optimal camera configuration, it is necessary
to know the object of the viewing operation, which is a function of the region
in
space that is the centre of the observer's visual attention. Clearly, the
human
observer himself will know that point or region at any point in time. However,
for dynamic camera configuration control, the problem remains of how to
communicate information about that point or region to the system component
whose role it is to compute the optimal configuration. In terms of the x-y
plane,
that is, in terms of any of the fronto-parallel planes which are orthogonal to
the
longitudinal axis of symmetry of the two cameras, this information is
straightforward to communicate. Simply by panning, tilting, sweeping or
translating the cameras, the observer may be able to indicate the new centre
of
attention. Alternatively, the observer may use a set of cross-hairs, or
similar
indicator, overlaid on the screen to indicate where in the x-y plane the
cameras
should be directed. On the other hand, in the z-direction, that is, in the
direction
along the longitudinal axis of symmetry of the camera system, it is more
difficult
for the observer to indicate his focus of attention. With a stereoscopic video
system the observer is able to perceive the region of interest; however, there
is no
obvious means of specifying the location of this region to the system
component
whose role it is to compute the optimal camera configuration. One of the




principal objectives of the present invention is therefore to enable the
observer
to communicate, either overtly or covertly, the location of such points or
regions
of interest to the system component whose role it is to compute the optimal
camera configuration, in order that the cameras remain optimally configured
throughout any viewing operation.
The second main objective of the present invention is to enable the
stereoscopic video system to be reconfigured easily, in response to
(re)computation of the optimal camera configuration by the system component
whose role it is to compute that configuration. In principle, the actual
reconfiguring of the cameras can be carried out either under direct manual
control, or under remote control, using some source of power to move the
cameras relative to each other. Clearly, for situations in which the video
cameras
are physically remote from the observer, however, remote control is
preferable.
The present invention provides a motorised means of realising on-line
adjustment
of the camera separation and convergence angles, in order that the cameras
remain optimally configured relative to the observer's focus of attention, or
intended focus of attention, in the stereoscopic video scene.
It is important to point out that uses of the present invention are not
limited to stereoscopic video systems only. Frequently in cinematography,
video
cameras are incorporated within or in conjunction with film cameras, as a
means
of obtaining immediate feedback on the scene being recorded, aiming the
cameras, etc. In a stereoscopic cinematographic system, it is possible to
extend
this technique by employing stereoscopic video cameras. The present invention
therefore provides a means for the operator of a stereoscopic cinematographic
system to communicate, either overtly or covertly, the location of such points
points
or regions of interest to the system component whose role it is to compute the
optimal camera configuration, in order that the cinematographic cameras remain
optimally configured throughout any filming operation, as well as providing a
motorised means of realising on-line adjustment of the cinematographic camera
separation and convergence angles, in order that the cinematographic cameras
remain optimally configured relative to the intended focus of attention within
the
stereoscopic video scene.




FIGURE 9 illustrates an embodiment of the present invention, which
provides for dynamic adjustment of the camera separation 2s, and camera angle
of convergence 2φ, as defined in FIGURE 5b. In that figure, video cameras 14
and 16 are depicted schematically, from a simulated top view, as converging at
point 855 in space in front of the cameras, in the vicinity of two real
objects 390
and 391. The same two objects are shown reproduced in the video monitor 38,
as objects 390' and 391'. As depicted in the figure, however, the observer has
caused a virtual pointer 370, in the form of a pair of cross-hairs, to be
drawn
proximal to the observed location of object 390'. Note that the pointer image
370
does not appear in the real scene near object 390. According to the preferred
embodiment, if the observer's focus of attention should deviate an excessive
distance from point 855, for example as illustrated, the decision may be taken
to
realign the cameras, that is, to change the camera separation and camera angle
of convergence to accommodate this shift of focus. The decision to effect this
realignment may be taken by either the observer himself, or by rules which
have
been programmed within the control computer 32, in the Optimisation Routine
+ Camera Control Logic Subsystem 37. The actual reconfigured camera state,
together with the commands to the Camera Alignment Controller 52 necessary to
effect the changes, are computed by the Optimisation Routine + Camera Control
Logic Subsystem 37, according to the methods illustrated in FIGURES 10 and 11.
The Optimisation Routine and Camera Control Logic Subsystem 37
computes the required configuration of the stereoscopic video system, that is,
depending on the indicated region of interest, the routine computes the
appropriate separation of the video cameras 2s and their angle of convergence 2φ.
If the stereoscopic image generating system is equipped with a pan and/or tilt
controller, or if it is equipped with remotely controlled zoom lenses and/or
focus
adjustments, these parameters too can be configured by the Optimisation
Routine
and Camera Control Logic Subsystem 37.
According to the present invention, three strategies are possible to govern
realignment of the stereoscopic cameras, all of which are explained and
illustrated
in the following. According to one strategy, the user indicates, by means of the
the
Pointer Positioning Device 35 and the stereographic pointer 370, the current
point




of interest and indicates a desire for either increased depth resolution or
greater
fusable stereoscopic range. Upon command of the user, the Optimisation Routine
+ Camera Control Logic Subsystem 37 will use its knowledge of the equivalent
location of the pointer in real three dimensional video space to guide the
aiming
and the focus of the cameras. The Optimisation Routine + Camera Control
Logic Subsystem 37 will also permit the user to indicate an appropriate trade-
off
between resolution and fusable range. Separate controls can be provided for
remote operation of zoom lenses. According to another strategy, the
Optimisation
Routine + Camera Control Logic Subsystem 37 continuously tracks the
stereographic pointer, automatically adjusting the stereoscopic camera
configuration when appropriate, using the movement of the pointer as an
indicator of the current focus of interest and necessary stereoscopic range.
According to a third strategy, the user has complete manual control over the
stereoscopic camera configuration.
The Optimisation Routine + Camera Control Logic Subsystem 37
calculates the appropriate camera configuration based on the geometric model
of
human stereopsis illustrated in FIGURE 10b. The model assumes that human
eyes are limited in their range of acceptable convergence angles. In young
adults,
the eyes can converge comfortably from as close as 15 cm from the observer, at
which point the angle of convergence α 865, that is, the angle formed at the
convergence point by the intersection of the optical axes of the two eyes, is
approximately 60 degrees, to as far away as optical infinity, in which case α 865
is 0 degrees. Under direct viewing conditions, that is, without the aid of any
external viewing hardware, this range is sufficient to cover the entire visual
range
of distances from the observer.
Under stereoscopic video viewing conditions, the normal range of
convergence angles may not suffice, however. Depending on the parameters
which define the video system's optical sensing elements, fusion by the
observer
of some objects in the displayed image may require the observer to converge
his
eyes with a convergence angle which is greater than 60 degrees, or alternatively,
alternatively,
in some cases, even to diverge his eyes. In cases for which divergence of the
eyes
occurs, the angle of convergence of the observer's eyes is negative, that is,
the




optical axes of the observer's eyes intersect behind the observer. This can
occur
with a hyperstereoscopic system for distal objects, when the disparity between
left
and right images is exaggerated.
The relationship between the geometry of the optical sensing elements of
the real world cameras and the perception by a human observer of the
corresponding stereoscopic image is illustrated in FIGURES 10a and 10b. If the
two cameras are converging at Point T 855, then the positions upon the left
and
right image sensors 851 and 858 which correspond to any object located at
Point
T are both located at the centres of the respective image sensors (assuming
ideal
camera optics). In FIGURE 10a these points are depicted as TLIS 841 and TRIS
842 respectively. On the other hand, the images of any other Point B 850 along
the axis of symmetry of the cameras will be displaced from the centres of the
left
and right optical sensing elements. In FIGURE 10a these points are depicted as
BLIS 843 and BRIS 844 respectively. On the left sensor 851, the point BLIS 843
is displaced to the right of TLIS 841, by an amount that can be calculated by the
equations given in the discussion of FIGURE 5. On the right image sensor 858,
the image of Point B 850 is displaced to the left of TRIS 842.
The image received by each optical sensing element through its lens must
be scaled to fit the particular Stereoscopic Viewing Screen 873. In the
simplest
case, the displacement of each image point from the centre or origin of the
display screen is scaled relative to the centre or origin of the image sensors
by
some gain value G, where G can be defined, for ideal monitors and ideal
identical
cameras, as:
    G = (viewing screen width) / (image sensor width)
For video systems that are not ideal, on the other hand, suitable two
dimensional
filtering or optical correction or calibration techniques, known to those
skilled in
the art, can be employed, such that G, rather than having a constant value for
all
points on the screen, becomes a function of the horizontal and vertical
displacement of each point from the centre or origin of each image sensor.
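Both cases can be sketched as follows; the calibration callable standing in for
the two dimensional filtering or calibration techniques is a hypothetical
placeholder.

```python
def display_gain_ideal(screen_width, sensor_width):
    """Constant gain G for an ideal monitor and ideal identical cameras."""
    return screen_width / sensor_width

def display_gain_calibrated(x, y, calibration):
    """Non-ideal case: G becomes a function of the horizontal and vertical
    displacement (x, y) on the sensor; 'calibration' is a fitted model."""
    return calibration(x, y)
```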



Instead of the point T 855 lying at the point of convergence of the two
cameras, consider now an arbitrary point A 856 adjacent to point T 855. Such a
point could be considered, for example, as the end point of a real line with
width
a. In FIGURE 5a the general case for the computation of the displacement
AIS(lx,ly) 805 of an arbitrary point A(p,q,r) 807 from the centre of a camera image
camera image
sensor is illustrated. In FIGURE 10a, the image of a line with width a would
have
a corresponding width of aLIS = f(a/r) on the left image sensor, where f 802
and
r are as shown in FIGURE 5a. The total width of the left image of such a line
would therefore have some corresponding width on the stereoscopic viewing
screen 873. This width is labelled aVS 863 in FIGURE 10b, and is defined by
aVS = G aLIS. The visual angle subtended by the image of the line width aVS 863
is a function of the distance D 872 of the viewer from the viewing screen
873.
(Similar calculations apply for the width of the right eye image on the
viewing
screen and the visual angle subtended.)
The left and right eye images of point T are coincident on the viewing
screen because point T is at the convergence point of the cameras. The images
of point B are not, however. Referring to the computation of the distance lx
803
shown in FIGURE 5a, in FIGURE 10a the image of point B 850 on the left
camera image sensor 851, BLIS 843, is some distance lx from the centre of the
left sensor, while the corresponding image BRIS is some distance rx from the
centre of the right sensor. The distances lx and rx correspond to distances dL 865
and dR 866 on the viewing screen, and are related (for ideal video cameras and
viewing screens) by the equations dL = G lx and dR = G rx. For convenience,
positive values are defined here as indicated in the figure; that is, dL 865
is
positive to the left and dR 866 is positive to the right of the point of
convergence.
(For non-ideal systems, G is a function of the horizontal and vertical
position of
the image on the camera image sensor.)
The convergence distance DB 874 from the eyes of the observer to the
stereoscopic image on the viewing screen of the real point B, which is shown
as
equivalent point BVS 860 in FIGURE 10b, can be calculated from the relationship
relationship



    DB = (D · IPD) / (IPD − (dL + dR))
where D 872 is the distance from the eyes 869 and 871 of the observer to the
viewing screen 873, IPD 870 is the inter-pupillary distance of the observer,
and dL
865 and dR 866 are as defined above. This equation can be used to calculate
the
apparent distance to any such stereoscopically displayed point (i.e., any
point that
is presented with horizontal disparity to the left and right eyes).
Note that, as point B becomes farther and farther away, the value of
(dL + dR) approaches the IPD, and the apparent distance DB becomes infinite.
If dL and/or dR continue to increase, DB will become negative. That is, in
order
to fuse such a stereoscopic image into a single object, the observer must
diverge
his eyes. This can be done to a very small extent without discomfort.
Exceeding
this extent means that the object will eventually break down into two separate
left
and right images, which obviously will not convey a sense of depth, and which
may
cause strain to the observer's eyes.
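This relationship can be sketched directly, with the divergence case flagged as
discussed; the function name is illustrative, and units need only be consistent
(e.g. millimetres).

```python
def apparent_distance(D, ipd, dL, dR):
    """DB = D * IPD / (IPD - (dL + dR)); infinite when the disparity equals
    the IPD, negative when the observer would have to diverge his eyes."""
    denom = ipd - (dL + dR)
    if denom == 0:
        return float("inf")        # point appears at optical infinity
    return D * ipd / denom

# e.g. apparent_distance(D=1000.0, ipd=65.0, dL=10.0, dR=10.0) -> ~1444 mm
```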
Analogously, the configuration of the stereoscopic viewing system can
exaggerate the nearness of an object, making it difficult or impossible for
the
observer to converge his eyes on the image. Consequently, for any given camera
configuration, there is a certain point close to the cameras which defines the
nearest allowable point of the fusable stereoscopic region, and a certain
point far
from the cameras which will define the allowable furthest point of the
stereoscopically viewable region.
The most straightforward criterion for specifying these limits is in terms of
maintaining comfortable rotation angles of the observer's eyes. In other
words, the nearest point should have an image appear not closer than 15 cm
away
from the observer, and the farthest point should have an image appear at
optical
infinity. The maximum viewscreen disparity, disp, between the left and right
images can therefore be calculated from the relationship:
    disp = dL + dR = IPD · (1 − D/DB)




Using this equation to calculate the nearest and farthest fusable points for a
particular camera configuration, distance from observer's eyes to viewscreen D,
and IPD, we observe that the maximum allowable disparity, disp, for DB =
infinity, is the IPD. Conversely, given values for the IPD and the viewing distance
D, the minimum allowable parallax can also be calculated. (DB is negative for
images that appear in front of the viewing screen.) Note that the fusable
stereoscopic region can be increased by increasing the observer's viewing
distance D.
As stated earlier, however, there are many reasons why this simple method
of determining the usable stereoscopic range is inadequate. The potential
problems of conflicting depth cues, such as stereopsis versus occlusion cues
near
the borders of the viewing screen, or convergence versus accommodation, or
user
eyestrain, provide compelling reasons to avoid objects appearing in front of
the
viewscreen when possible, for example. It is therefore desirable to restrict
the
stereoscopic range such that the viewscreen disparity disp for most points is
greater than or equal to zero. That is, by this criterion, DB ≥ D, which implies
that the nearest objects will appear on the surface of the viewscreen.
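Both the disparity relation and this restricted-range criterion can be sketched
from the equation above; the function names are illustrative.

```python
def screen_disparity(ipd, D, DB):
    """disp = dL + dR = IPD * (1 - D/DB) for apparent distance DB."""
    return ipd * (1.0 - D / DB)

def within_restricted_range(ipd, D, DB):
    """True when 0 <= disp <= IPD, i.e. the point appears at or behind the
    viewscreen surface and no farther than optical infinity."""
    disp = screen_disparity(ipd, D, DB)
    return 0.0 <= disp <= ipd
```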
Referring to FIGURE 10b, the relationship between the real object distance
zB 859 and the apparent distance DB 874 from the observer to the object viewed
within the display can be referred to as the gain in the depth direction,
defined
as K = DB/zB. The parameter (or function) G, discussed earlier, describes the
gain in the horizontal and/or vertical directions; that is, it describes the
transformation upon the width and height of an object in transmission from the
cameras to the viewing screen. The parameter K, the gain in the depth
direction,
describes what happens to the apparent location in three dimensional space of
an
object.




To a first approximation, letting the horizontal displacement of any point
from the centre of the stereoscopic system be zero (i.e. x = 0), the gain in
the
depth direction K can be evaluated by the equation:

    K = ((s² + cz) · D · IPD) / (z · (IPD·s² + IPD·cz + 2csGf − 2szGf))
where z is the generalised distance from the centroid of the camera system to
the
point, and all other parameters are as previously defined. For K = 1, the
camera
system is orthostereoscopic, and objects will appear to be as far away from
the
observer on the screen as they are from the cameras. For values of K less than
one, the hypostereoscopic condition, the observer perceives objects to be
closer
than they really are. Conversely, for values of K greater than one, the
hyperstereoscopic condition, depth is magnified, and objects will look further
away.
Since in general different points on any object will appear at different
distances within the visual field, the front and back of that object will also
appear
at different positions in depth. The larger K is, the greater the apparent
distance
from the front of an object to the back; that is, the object will appear to
be
stretched. Similarly, for smaller K, objects will appear to be flattened.
Under
certain circumstances, these distorting effects can interfere with the task at
hand.
The appropriate value for K to minimise these distortions is one for which
the gain in the depth direction corresponds with that in the width and height
directions. The gain in the width and height direction can be expressed as
g = G(f/z). The ratio K/g can be regarded as the depth distortion ratio of the
stereoscopic system. When this ratio is approximately one, the object being
viewed may appear to be either smaller or larger than its real size, but it
will have
the correct shape or proportions; that is, for example, a cube will remain a
cube,
and not be perceived as a rectangular prism.
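A sketch evaluating K and the ratio K/g, using the first-approximation formula
for K as reconstructed above (all lengths in one consistent unit; function
names are illustrative):

```python
def depth_gain(z, s, c, D, ipd, G, f):
    """K = DB/z for an on-axis point at distance z from the camera centroid."""
    num = (s * s + c * z) * D * ipd
    den = z * (ipd * s * s + ipd * c * z + 2 * c * s * G * f - 2 * s * z * G * f)
    return num / den

def depth_distortion_ratio(z, s, c, D, ipd, G, f):
    """K/g, where g = G*f/z is the gain in the width and height directions;
    a ratio near one preserves object proportions."""
    g = G * f / z
    return depth_gain(z, s, c, D, ipd, G, f) / g
```

At z = c the expression reduces to K = D/c, as expected for a point at the
convergence point, which appears on the screen surface at distance D.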



To reduce the extent of an undesired depth distortion, it is necessary to
adjust the camera parameters accordingly. If we assign (K/g) = 1 and solve for c
816, the distance from the centroid of the cameras to their point of convergence,
we obtain the equation

    c = (D·IPD·s² − Gf·IPD·s² + 2G²f²sz) / (Gf·IPD·z − D·IPD·z + 2G²f²s)
Therefore, for situations requiring very little depth distortion, given an object or
"focus of interest" at distance z from the centroid of the cameras, and given the
distance D of the observer from the display surface, the gain G of the video
system, the focal length of the lenses f, and the inter-pupillary distance IPD of
the observer, it is possible, for a particular separation s of the cameras, to
configure the cameras to minimise the depth distortion by using the value of c
derived from the equation to prescribe the required camera convergence angle 2φ.
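This prescription can be sketched as follows, using the expression for c as
reconstructed above and then the convergence angle from tan(φ) = s/c; the
function names are illustrative.

```python
import math

def optimal_convergence_distance(z, s, D, ipd, G, f):
    """Solve (K/g) = 1 for c, the distance from the camera centroid to the
    convergence point, for an object of interest at distance z."""
    num = D * ipd * s * s - G * f * ipd * s * s + 2 * G * G * f * f * s * z
    den = G * f * ipd * z - D * ipd * z + 2 * G * G * f * f * s
    return num / den

def convergence_angle(s, c):
    """Full convergence angle 2*phi, where tan(phi) = s/c."""
    return 2.0 * math.atan2(s, c)
```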
The human visual system is limited in its ability to detect differences in
depth using only binocular disparity as a cue. Under ideal conditions subjects
in
the laboratory have been able to detect disparities of approximately 3 arc-
seconds.
Under normal conditions, the typical lower limit is between 10 and 20 arc-
seconds.
When the distance D of the observer from the viewing screen is 1 m, this
corresponds to an equivalent disparity perceived within the viewing screen of
between 0.05 mm and 0.1 mm. Because few display media have sufficient
resolution to match this theoretical limit, the practical limit of depth resolution
for most stereoscopic video systems is a function of the equipment used, and not
of the human stereoscopic visual system.
The practical limit of depth resolution can be expressed as the smallest
disparity between two points in the z direction which is discernible within
the
viewing screen. This will be a function of the particular hardware used, the
ambient conditions and, to a limited extent, the distance of the observer from
the
viewing screen. Assuming ideal equipment, it will be independent of the
position


of points in the x or y directions on the viewing screen. Viewing screen disparity
is defined as the distance between the left and right images, i.e. disp = dR + dL,
as defined earlier. If the range of fusable viewing screen disparities is defined
over the range from zero to the IPD, as discussed earlier, that is, 0 ≤ disp ≤ IPD,
and if the hardware limitations are such that the minimum discernible disparity
is d, then there is only a fixed number of discriminable depths within the
stereoscopic viewing region. This can be expressed as:

    NDI = (dispmax − dispmin) / d

where NDI is the number of discriminable depth intervals, dispmin is the minimum
fusable viewing screen disparity, and dispmax is the maximum fusable viewing
screen disparity.
The relationship between each depth interval on the viewing screen and
the corresponding depth interval in the real world is a function of the camera
configuration. The magnitude of each interval is related to the square of the
distance. For example, for a hypothetical stereoscopic video system with a
minimum discernible disparity d of 1 mm, dispmin = 0 mm, and dispmax = 65 mm,
the number of discriminable depth intervals in the fusable stereoscopic region is
therefore

    NDI = (65 − 0) mm / 1 mm = 65

Therefore, if the object depth corresponding to dispmin = 0 is znear = 1 m, and if
the object depth corresponding to dispmax = 65 mm is zfar = 11 m, there will be 65
discriminable depth regions within the 10 metre range, where the actual extent of
each region increases monotonically with the square of the distance from the
stereoscopic cameras.
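The worked example reduces to a one-line computation, sketched here with an
illustrative function name:

```python
def discriminable_depth_intervals(disp_max, disp_min, d):
    """NDI = (disp_max - disp_min) / d, all in the same units (e.g. mm)."""
    return (disp_max - disp_min) / d

print(discriminable_depth_intervals(65.0, 0.0, 1.0))   # -> 65.0
```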
To increase the effective depth resolution for a particular task, it is
therefore necessary to force x"~a~ and z~ar closer together, thereby
compressing the
same number of depth intervals into a small range of distances. Examining the



equations for calculating the screen parallax, it is seen that increasing the
convergence angle of the cameras results in a compression of the fusable
stereoscopic region.
Because, according to the present invention, it is possible for an observer
to indicate where his focus of attention is, by using the Pointer Positioning
Device
35 to move the Stereographic Pointer, and because the present invention
provides
for a dynamically controllable Camera Alignment System 50, the present
invention
now provides the possibility for the camera configuration parameters to be
optimised relative to the task being performed by the user of the stereoscopic
video system. This may be done either upon request by the user or
automatically, and either on a continuous basis or intermittently. Requests
for
particular resolution or fusable region requirements can clearly be
communicated
by means of conventional computer interface media. In all cases though, the
Stereographic Pointer is used as an indicator of the user's focus of attention
within
the three dimensional video space. The user can therefore control the size of
the
fusable stereoscopic region, by causing the cameras to be adjusted to have
higher
resolution when necessary, at the cost of a smaller fusable region, or to have
a
wider fusable region when necessary, at the cost of fewer discriminable
regions.
FIGURE 11 depicts the top view of a stereoscopic viewing system in a
hypothetical working environment. Referring to FIGURE 11a, the operator has
used the Stereographic Pointer 831, in this particular example depicted as a
small
darkened triangle, to indicate that he wishes to focus in the vicinity of
object C
830 with a fairly wide fusable region (and therefore only moderate resolution).
Consequently, the cameras have been configured by the Optimisation Routine +
Camera Control Logic Subsystem 37 so that the point of convergence of the
cameras is in the vicinity of object C 830. In FIGURE 11b, the operator has
moved the Stereographic Pointer 831 to indicate that he is now interested in
focussing in the region of object B 832 instead, but does not want to change the
the
size of the fusable region. Consequently, the Optimisation Routine + Camera
Control Logic Subsystem 37 has adjusted the convergence point and the
separation distance of the cameras accordingly, so that the convergence angle
is
similar to that of FIGURE 11a. In FIGURE 11c the user is still focussing in the
the



vicinity of object B 832, but has requested additional resolution in that
vicinity.
The camera separation has consequently been widened. In FIGURE 11d the
operator is still focussing in the vicinity of object B 832, but has requested
that the
fusable range be increased to include object E 834, so that he can observe it
also.
FIGURE 11e represents a general situation in which the user is moving the
Stereographic Pointer 831 within the video scene and in which the cameras are
dynamically tracking the Stereographic Pointer 831 as it moves around the
scene.
In that case, the Optimisation Routine + Camera Control Logic Subsystem 37
maintains the cameras continually converged at the position of the pointer, on
the
assumption that this is where the user is focussing his attention. In each of
the
examples illustrated in FIGURE 11, by adjusting the camera configuration to
meet
the needs of the operator, the task of the operator is theoretically
facilitated.
