Patent 2196563 Summary


(12) Patent Application: (11) CA 2196563
(54) English Title: APPARATUS AND METHODS FOR DETERMINING THE THREE-DIMENSIONAL SHAPE OF AN OBJECT USING ACTIVE ILLUMINATION AND RELATIVE BLURRING IN TWO IMAGES DUE TO DEFOCUS
(54) French Title: APPAREIL ET PROCEDES DE DETERMINATION DE LA FORME TRIDIMENSIONNELLE D'UN OBJET AU MOYEN D'UN ECLAIRAGE DYNAMIQUE ET D'UNE DIMINUTION DE NETTETE RELATIVE DANS DEUX IMAGES DUE A LA DEFOCALISATION
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06K 9/74 (2006.01)
(72) Inventors :
  • NAYAR, SHREE K. (United States of America)
  • NOGUCHI, MINORI (Japan)
  • WATANABE, MASAHIRO (Japan)
(73) Owners :
  • THE TRUSTEES OF COLUMBIA UNIVERSITY (United States of America)
(71) Applicants :
  • THE TRUSTEES OF COLUMBIA UNIVERSITY (United States of America)
(74) Agent: BLAKE, CASSELS & GRAYDON LLP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 1995-06-07
(87) Open to Public Inspection: 1996-12-19
Examination requested: 2002-04-19
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US1995/007890
(87) International Publication Number: WO1996/041304
(85) National Entry: 1997-01-31

(30) Application Priority Data: None

Abstracts

English Abstract




A method and apparatus for mapping depth of an object (22) in a preferred
arrangement uses a projected light pattern to provide a selected texture to
the object (22) along the optical axis (24) of observation. An imaging system
senses (32, 34) first and second images of the object (22) with the projected
light pattern and compares the defocus of the projected pattern in the
images to determine relative depth of elemental portions of the object (22).


French Abstract

Procédé et appareil de représentation de l'épaisseur d'un objet (22). Selon une disposition préférée, une configuration de lumière projetée donne une texture choisie à l'objet (22) le long de l'axe optique (24) d'observation. Un système d'imagerie détecte (32, 34) une première et une deuxième image de l'objet (22) à l'aide de la configuration de lumière projetée et compare la défocalisation de la configuration projetée dans les images pour déterminer l'épaisseur relative de parties élémentaires de l'objet (22).

Claims

Note: Claims are shown in the official language in which they were submitted.



Claims

1. A method for mapping a three-dimensional structure
by depth from defocus, comprising the steps of:
(a) illuminating said structure with a
preselected illumination pattern;
(b) sensing at least two images of said
illuminated structure each of said images
being formed with different imaging
parameters; and
(c) determining a relative blur between
corresponding elemental portions of said
sensed images thereby determining the depth
of corresponding elemental portions of said
three-dimensional structure.

2. The method of claim 1 wherein said illumination
comprises illumination with a two dimensional
illumination pattern.

3. The method of claim 2 wherein said illumination is
with a rectangular grid having selected horizontal
and vertical grid spacing of rectangular
transparent and opaque elements forming a
checkerboard pattern.

4. The method of claim 3 wherein said sensing
comprises sensing using an array of sensing
elements having horizontal and vertical array
spacings that are integral sub-multiples of said
horizontal and vertical grid spacing in said
sensed images.

5. The method of claim 1 wherein said at least two
images are formed with a telecentric lens system.

6. A method for mapping a three-dimensional structure
by depth from defocus, comprising the steps of:

(a) illuminating said structure with an
illumination pattern comprising a rectangular
grid projected along an optical axis;
(b) sensing at least two images of said
illuminated structure from said optical axis
using a constant magnification imaging
system, said images being sensed at at least two
imaging planes with different locations with
respect to the focal plane of said imaging
system;
(c) determining the relative blur between
corresponding elemental portions of said
illumination patterns in said sensed images
thereby determining the depth of
corresponding elemental portions of said
three dimensional structure.

7. The method of claim 6 wherein said images are
sensed using first and second sensing arrays of
sensing elements arranged in a rectangular pattern
with selected element spacing in each direction of
said array.

8. The method of claim 7 wherein said rectangular
grid has a checkerboard pattern with selected grid
periodicity.

9. The method of claim 8 wherein said grid
periodicity is selected to provide a grid image on
said sensing arrays wherein said grid periodicity
is an integral multiple of said corresponding
element spacing.

10. The method of claim 9, wherein said grid
periodicity is selected to provide a grid image
with a period substantially equal to twice said
element spacing, and said grid image is aligned
with said array in two orthogonal directions.

11. The method of claim 9, wherein said grid image
periodicity is substantially equal to four times
said pixel width and said grid image is shifted on
said array by one eighth of said grid image
periodicity.

12. The method of claim 6, wherein said light source
is a monochromatic laser light source.

13. The method of claim 9, wherein said sensing step
comprises sensing at least two depth images of
said scene formed by said laser light and at least
one brightness image of said scene formed by
ambient light, and said determining step comprises
measuring a relative blur between said sensed
laser light images.

14. The method of claim 6, wherein a first image is
sensed at a position corresponding to a near focused
plane in said scene and a second image is sensed at
a position corresponding to a far focused plane.

15. The method of claim 6, wherein said illumination
grid is selected so as to produce an illumination
pattern which generates multiple spatial
frequencies.

16. The method of claim 6, wherein said illuminating
step further comprises using half-mirror optics to
reflect said illumination pattern prior to
illuminating said scene, and said sensing step
further comprises passing said scene images
through said half-mirror optics prior to sensing
said scene, such that said illumination pattern
and said scene images pass along a common optical
axis.

17. The method of claim 6, wherein said illuminating
step further comprises using polarization optics
to polarize said illumination pattern prior to
illuminating said scene, and said sensing step
further comprises passing said scene images
through polarization optics prior to sensing said
scene.

18. The method of claim 6, wherein said determining
step further comprises:
(i) converting said sensed images into digital
signals on a pixel by pixel basis; and
(ii) convolving said digital signals on a pixel by
pixel basis to determine power measurement
signals that correspond to the fundamental
frequency of said illumination pattern at
each of said pixels for each sensed scene
image.

19. The method of claim 18, wherein said measuring
step further comprises:
(iii) correcting said power measurement signals for
mis-registration on a pixel by pixel basis,
such that any errors introduced into said
power measurement signals because of
misalignment between said sensing pixels of
said array and said illumination pattern are
corrected.

20. The method of claim 19, wherein said correcting
step comprises taking the sum of the squares of
said measurement signal at four neighboring
pixels.

21. The method of claim 18, wherein said measuring
step further comprises:
(iii) normalizing said power measurement signals on
a pixel by pixel basis.


22. The method of claim 18, wherein said measuring
step further comprises:
(iii) comparing said power measurement signals for
one of said sensed images, on a pixel by
pixel basis, with determined power
measurements for a second of said sensed
images to determine said depth information at
each of said pixels.

23. The method of claim 6, wherein said determination
step comprises arranging said pixel by pixel depth
information as a depth map.

24. The method of claim 23, further comprising the
step of displaying said depth map as a wireframe
image.

25. The method of claim 13, wherein said determination
step comprises arranging said pixel by pixel depth
information as a depth map, further comprising the
step of constructing a texture mapped three-dimensional
display from said sensed brightness
image and said depth map.

26. Apparatus for measuring a three-dimensional
structure of a scene by depth from defocus,
comprising:
(a) active illumination means for illuminating
the scene with a preselected illumination
pattern;
(b) sensor means, optically coupled to said
illuminating means, for sensing at least two
images of the scene, wherein at least one of
said sensed images is taken with optical or
imaging parameters that are different from at
least one other of said sensed images;


(c) depth measurement means, coupled to said
sensor means, for measuring a relative blur
between said sensed images; and
(d) scene recovery means, coupled to said depth
measurement means, for reconstructing said
three-dimensional structure of said sensed
scene from said measured relative blur of
said sensed images.

27. The apparatus of claim 26, wherein said sensor
means comprises a plurality of sensors, each
sensor having X * Y pixels of predetermined width
to form an X * Y sensing grid, said depth
measurement means measuring said relative blur on
a pixel by pixel basis over said X * Y pixel grid,
such that depth information is obtained for each
of said pixels within said X * Y grid.

28. The apparatus of claim 27, wherein said active
illumination means comprises:
(i) an illumination base;
(ii) a light source coupled to said illumination
base; and
(iii) a spectral filter having said preselected
illuminating pattern coupled to said
illumination base, such that light from said
light source passes through said spectral
filter to form said preselected illumination
pattern.

29. The apparatus of claim 28, wherein said
preselected illumination pattern of said spectral
filter is optimized so that a small variation in
the degree of defocus sensed by said sensor means
results in a large variation in the relative blur
measured by said depth measurement means.

30. The apparatus of claim 29, wherein said optimized
illumination pattern is a rectangular grid
pattern.

31. The apparatus of claim 30, wherein said optimized
illumination pattern comprises a pattern having a
period being substantially equal to twice said
pixel width and a phase shift being substantially
equal to zero with respect to said sensing grid,
in two orthogonal directions.

32. The apparatus of claim 30, wherein said optimized
illumination pattern comprises a pattern having a
period being substantially equal to four times
said pixel width and a phase shift being
substantially equal to one eighth of said pixel
width with respect to said sensing grid, in two
orthogonal directions.

33. The apparatus of claim 28, wherein said light
source is a Xenon lamp.

34. The apparatus of claim 28, wherein said light
source is a monochromatic laser.

35. The apparatus of claim 34, wherein said sensor
means further comprises:
(i) a sensor base;
(ii) first and second depth sensors, coupled to
said sensor base, for sensing depth images of
said scene formed by said laser light, such
that said depth measurement means measure a
relative blur between said sensed laser light
images; and
(iii) at least one brightness sensor, coupled to
said sensor base, for sensing an image of
said scene formed by ambient light.

36. The apparatus of claim 26, wherein said sensor
means comprises:
(i) a sensor base;
(ii) a lens, coupled to said sensor base and
optically coupled to said illuminating means,
for receiving scene images;
(iii) a beamsplitter, coupled to said sensor base
and optically coupled to said lens, for
splitting said scene images into two split
scene images; and
(iv) first and second sensors, coupled to said
sensor base, wherein said first sensor is
optically coupled to said beamsplitter such
that a first of said split scene images is
incident on said first sensor and said second
sensor is optically coupled to said
beamsplitter such that a second of said split
scene images is incident on said second
sensor.

37. The apparatus of claim 36, wherein said sensor
means further comprises:
(v) an optical member having an aperture, coupled
to said sensor base in a position between
said lens and said beamsplitter, being
optically coupled to both said lens and said
beamsplitter such that images received by
said lens are passed through said aperture
and are directed toward said beamsplitter.

38. The apparatus of claim 36, wherein said first
sensor is at a position corresponding to a near
focused plane in said sensed scene, and said
second sensor is at a position corresponding to a
far focused plane in said sensed scene.

39. The apparatus of claim 38, wherein said spectral
filter includes an illumination pattern capable of

generating multiple spatial frequencies for each
image sensed by said first and second sensors.

40. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active
illumination means and said sensor means; and
(f) a half-mirror, coupled to said support member
at an optical intersection of said active
illumination means and said sensor means,
such that said preselected illumination
pattern is reflected by said half-mirror
prior to illuminating said scene, and such
that said scene images pass through said
half-mirror prior to being sensed by said
sensor means, whereby said illumination
pattern and said scene images pass through
coaxial optical paths.

41. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active
illumination means and said sensor means; and
(f) a half-mirror, coupled to said support member
at an optical intersection of said active
illumination means and said sensor means,
such that said preselected illumination
pattern passes through said half-mirror prior
to illuminating said scene, and such that
said scene images are reflected by said half-mirror
prior to being sensed by said sensor
means, whereby said illumination pattern and
said scene images pass through coaxial
optical paths.

42. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active
illumination means and said sensor means; and
(f) a polarization filter, coupled to said
support member at an optical intersection of



said active illumination means and said
sensor means, such that said preselected
illumination pattern is reflected by said
polarization filter prior to illuminating
said scene, and such that said scene images
pass through said polarization filter prior
to being sensed by said sensor means, whereby
said illumination pattern incident on said
scene and said sensed scene images are both
polarized in controlled polarization
directions.

43. The apparatus of claim 27, wherein said depth
measurement means further comprises:
(i) analog to digital converting means, coupled
to said sensor means, for converting sensed
images into digital signals on a pixel by
pixel basis; and
(ii) convolving means, coupled to said analog to
digital converting means, for convolving said
digital signals on a pixel by pixel basis to
derive power measurement signals that
correspond to the fundamental frequency of
said illumination pattern at each of said
pixels for each sensed scene image.

44. The apparatus of claim 43, wherein said depth
measurement means further comprises:
(iii) registration correction means, coupled to
said convolving means, for correcting said
power measurement signals for mis-registration
on a pixel by pixel basis, such
that any errors introduced into said power
measurement signals because of misalignment
between said sensing pixels of said grid and
said illumination pattern are corrected.

45. The apparatus of claim 44, wherein said
registration correction means further include
arithmetic means for multiplying each of said
power measurement signals, on a pixel by pixel
basis, by the sum of the squares of said power
measurement signal's four neighboring power
measurement signals.

46. The apparatus of claim 43, wherein said depth
measurement means further comprises:
(iii) normalizing means, coupled to said convolving
means, for normalizing said power measurement
signals on a pixel by pixel basis.

47. The apparatus of claim 43, wherein said depth
measurement means further comprises:
(iii) comparator means, coupled to said convolving
means, for comparing said power measurement
signals for one of said sensed images, on a
pixel by pixel basis, with determined power
measurements for a second of said sensed
images, to determine said depth information
at each of said pixels.

48. The apparatus of claim 47, wherein said comparator
means includes a look-up table.

49. The apparatus of claim 27, wherein said scene
recovery means comprises depth map storage means,
coupled to said depth measurement means, for
storing derived pixel by pixel depth information
for said scene as a depth map.

50. The apparatus of claim 49, further comprising:
(e) display means, coupled to said scene recovery
means, for displaying said depth map as a
wireframe on a bitmapped workstation.

51. The apparatus of claim 35, wherein said scene
recovery means comprises three-dimensional
texturemap storage means, coupled to said depth
measurement means and said brightness sensor, for
storing derived pixel by pixel depth information
and brightness information for said scene, further
comprising:
(e) display means, coupled to said scene recovery
means, for displaying said three-dimensional
texturemap as a wireframe on a bitmapped
workstation.

Description

Note: Descriptions are shown in the official language in which they were submitted.


Description

Apparatus and Methods for Determining
the Three-Dimensional Shape of an Object
Using Active Illumination and Relative
Blurring in Two Images Due To Defocus




Background of the Invention

I. Field of the invention.

The present invention relates to techniques for mapping a three-dimensional structure or object from two-dimensional images, and more particularly relates to techniques employing active illumination to retrieve depth information.

II. Description of the related art.

A pertinent problem in computational vision is the recovery of three-dimensional measurements of a structure from two-dimensional images. There have been many proposed solutions to this problem that can be broadly classified into two categories: passive and active. Passive techniques such as shape from shading and texture attempt to extract structure from a single image. Such techniques are still under investigation and it is expected they will prove complementary to other techniques but cannot serve as stand-alone approaches. Other passive methods, such as stereo and structure from motion, use multiple views to resolve shape ambiguities inherent in a single image. The primary problem encountered by these methods has proved to be correspondence and feature tracking. In addition, passive algorithms have yet to demonstrate the accuracy and robustness required for high-level perception tasks such as object recognition and pose estimation.
Hitherto, high-quality three-dimensional mapping of objects has resulted only from the use of active sensors based on time of flight or light striping. From a practical perspective, light stripe range finding has been the preferred approach. In structured environments, where active illumination of a scene is feasible, it offers a robust yet inexpensive solution to a variety of problems. However, it has suffered from one inherent drawback, namely, speed. To achieve depth maps with sufficient spatial resolution, a large number of closely spaced stripes are used. If all stripes are projected simultaneously it is impossible to associate a unique stripe with any given image point, a process that is necessary to compute depth by triangulation. The classical approach is to obtain multiple images, one for each stripe. The requirement for multiple images increases the required time for mapping.
Focus analysis has a major advantage over stereo and structure from motion, as two or more images of a scene are taken under different optical settings but from the same viewpoint, which, in turn, circumvents the need for correspondence or feature tracking. However, differences between the two images tend to be very subtle, and previous solutions to depth from defocus have met with limited success as they are based on rough approximations to the optical and sensing mechanisms involved in focus analysis.

Fundamental to depth from defocus is the relationship between focused and defocused images. Figure 1 shows the basic image formation geometry. All light rays that are radiated by object point P and pass through aperture A, having an aperture diameter a, are refracted by the lens to converge at point Q on the focused image plane If. For a thin lens, the relationship between the object distance d, the focal length of the lens f, and the image distance di is given by the Gaussian lens law:

$$\frac{1}{d} + \frac{1}{d_i} = \frac{1}{f} \qquad (1)$$

Each point on an object plane that includes point P is projected onto a single point on the image plane If, causing a clear or focused image to be formed. If, however, a sensor plane such as I1 or I2 does not coincide with the image focus plane and is displaced from it, the energy received from P by the lens is distributed over a patch on the sensor plane. The result is a blurred image of P. It is clear that a single image does not include sufficient information for depth estimation, as two scenes defocused to different degrees can produce identical images. A solution to depth is achieved by using two images formed on image planes I1 and I2 separated by a known physical distance. The problem is reduced to analyzing the relative blurring of each scene point in the two images and computing the distance to the focused image for each image point. Then, using di, the lens law (1) yields depth d of the scene point. Simple as this procedure may appear, several technical problems emerge when implementing a method of practical value.
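To illustrate the last step with numbers (an illustrative example, not taken from the patent), suppose the focal length is f = 25 mm and focus analysis places the best-focused image of a scene point at di = 26 mm. Inverting the lens law (1) then gives the depth of that point:

$$d = \left(\frac{1}{f} - \frac{1}{d_i}\right)^{-1} = \left(\frac{1}{25\ \mathrm{mm}} - \frac{1}{26\ \mathrm{mm}}\right)^{-1} = 650\ \mathrm{mm}.$$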
First, there is the problem of determining relative defocus. In the frequency domain, blurring can be viewed as low-pass filtering of the scene texture. Relative blurring can thus in principle be estimated by frequency analysis. However, the local object texture is unknown and variable. Since the effect of blurring is frequency dependent, it is not meaningful to investigate the net blurring of the entire collection of frequencies that constitute scene texture. This observation has forced investigators to use narrow-band filters that isolate more or less single frequencies and estimate their relative attenuation due to defocus in two or more images. Given that the dominant frequencies of the scene are unknown and possibly spatially varying, one is forced to use complex filtering techniques that add to the complexity of the process. This complexity makes the approach impractical for any real-time application.
A second problem with the depth from defocus technique is with respect to textureless surfaces. If the imaged surface is textureless (a white sheet of paper, for instance) defocus and focus produce identical images and any number of filters would prove ineffective in estimating relative blurring. Particularly in structured environments this problem can be addressed by projecting an illumination pattern on the scene of interest, i.e. forcing scene texture. Indeed, illumination projection has been suggested in the past for both depth from defocus and depth from pattern size distortion under perspective projection.

For example, Girod et al., "Depth From Defocus of Structured Light," Proceedings of the SPIE - The Int'l Soc'y for Optical Eng'g, vol. 1194, pp. 209-215 (1990) discloses the use of a structured light source in a depth from defocus range sensing system. Girod projects a structured light pattern (evenly spaced vertical lines) through a large aperture lens onto the object surface. Girod detects a single image which has image characteristics derived from the defocusing effects of the large aperture light source. Girod also suggests use of an anisotropic aperture, e.g., a slit or T-shaped aperture, in connection with the light source to produce orthogonal patterns that can be compared to remove systemic errors due to the limited depth of field of the camera.

Similarly, A. Pentland et al., "Simple Range Cameras Based on Focal Error," J. Optical Soc'y of America, vol. 11, pp. 2925-2934 (1994) discloses a structured light sensor which projects a pattern of light (evenly spaced vertical lines) via a simple slide projector onto a scene, measures the apparent blurring of the pattern, and compares it to the known (focused) original light pattern to estimate depth.
Notably, these proposed solutions rely on evaluating defocus from a single image. As a result, they do not take into account variations in the defocus evaluation that can arise from the natural textural characteristics of the object.

When considering a multiple image system, the relation between magnification and focus must be taken into account. In the imaging system shown in Figure 1, the effective image location of point P moves along ray R as the sensor plane is displaced. Accordingly the defocused image formed on plane I1 is larger than the focused image that would be formed on plane If, and both of these images are larger than that formed on plane I2. This causes a shift in the image coordinates of P that in turn depends on the unknown scene coordinates of P. This variation in image magnification with defocus manifests as a correspondence-like problem in depth from defocus, since it is necessary to compare the defocus of corresponding image elements in image planes I1 and I2 to estimate blurring. This problem has been underemphasized in much of the previous work, where a precise focus-magnification calibration of motorized zoom-lenses is suggested and where a registration-like correction in image domain is proposed. The calibration approach, while effective, is cumbersome and not viable for many off-the-shelf lenses.

Summary of the Invention

An object of the present invention is to provide an apparatus for mapping a three-dimensional object from two-dimensional images.

A further object of the present invention is to provide an apparatus and method employing active illumination to retrieve depth information based on focus analysis.

A further object of the present invention is to provide a system that uses inexpensive off-the-shelf imaging and processing hardware.

A further object of the present invention is to provide a system for determining depth information with improved accuracy.

Still a further object of the present invention is to provide a depth from defocus method that uses two scene images which correspond to different levels of focus to retrieve depth information.

Still a further object of the present invention is to provide an apparatus and method for determining depth from defocus analysis based on a careful analysis of the optical, sensing, and computational elements required.

In order to meet these and other objects which will become apparent with reference to further disclosure set forth below, the present invention broadly provides a method for measuring the three-dimensional structure of an object by depth from defocus. The method requires that the scene be illuminated with a preselected illumination pattern, and that at least two images of the scene be sensed, where the sensed images are formed with different imaging parameters. The relative blur between corresponding elemental portions of the sensed images is measured, thereby to determine the relative depth of corresponding elemental portions of said three-dimensional structure.
The present invention also provides an apparatus for mapping the three-dimensional structure of an object by depth from defocus. The apparatus includes active illumination means for illuminating the object with a preselected illumination pattern and sensor means, optically coupled to the illuminating means, for sensing at least two images of the object, wherein the sensed images are taken with different imaging parameters. The apparatus also includes depth measurement means, coupled to the sensor means, for measuring the relative blur between the sensed images, thereby to determine the depth of the corresponding elemental portions of the object.

Preferably, the images are sensed via constant image magnification sensing as an array having X * Y pixels of predetermined width, and the relative blur is measured on a pixel by pixel basis over the X * Y pixel array, so that depth information is obtained on a pixel by pixel basis over the X * Y array.

In one embodiment, the scene is illuminated by a Xenon light source filtered by a spectral filter having the preselected illuminating pattern. The spectral filter may be selected so as to produce an illumination pattern which generates multiple spatial frequencies for each image element sensed. In a different embodiment, the light source is a monochromatic laser light source, and at least two depth images of the scene formed by the laser light and at least one brightness image of the scene formed by ambient light are sensed. In this embodiment, the relative blur between the sensed laser light images is measured.

Preferably, the preselected illumination pattern is optimized so that a small variation in the degree of defocus results in a large variation in the measured relative blur. The optimized illumination pattern advantageously takes the form of a rectangular grid pattern, and is preferably selected to have a grid period which is substantially equal to twice the pixel array element width and a registration phase shift being substantially equal to zero with respect to the sensing array, in two orthogonal directions. Alternatively, the optimized illumination pattern may have a period which is substantially equal to four times the pixel element width and a registration phase shift being substantially equal to π/4 with respect to the sensing array in two orthogonal directions.

In a preferred embodiment, two images are sensed, a first image at a position corresponding to a near focused plane in the sensed scene, and a second image at a position corresponding to a far focused plane in the sensed scene.

Half-mirror optics may be provided so that the illumination pattern and the scene images pass along the same optical axis.

Polarization optics which polarize both the illumination pattern and the scene images in controlled polarization directions may be used to filter specular reflections from the object.

Advantageously, the sensed images are converted into digital signals on a pixel by pixel basis, and are then convolved to determine power measurement signals that correspond to the fundamental frequency of the illumination pattern at each pixel for each sensed scene image. The power measurement signals are preferably corrected for mis-registration on a pixel by pixel basis, such that any errors introduced into the power measurement signals because of misalignment between the sensing pixels of the grid and the illumination pattern are corrected. Correction may be effectuated by multiplying each of the power measurement signals, on a pixel by pixel basis, by the sum of the squares of the power measurement signal's four neighboring power measurement signals. In addition, the power measurement signals are preferably normalized on a pixel by pixel basis.

The power measurement signals for one of the sensed images are compared, on a pixel by pixel basis, with determined power measurements for a second sensed image to determine depth information at each pixel. Look-up table mapping may be used for such comparison.
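The per-pixel processing chain just described can be summarized in the following sketch. It is a hedged, minimal illustration in Python/NumPy rather than the patent's implementation: the focus-operator kernel, the neighbour-based registration correction, the normalization, and the look-up-table mapping are stand-ins for the tuned operators specified elsewhere in the disclosure, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

# Illustrative depth-from-defocus pipeline (assumptions, not the patent's
# exact implementation): two registered images of the actively illuminated
# scene, one focused near, one focused far.
LAPLACIAN = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)  # simple focus operator

def power_measure(img):
    """Per-pixel response of the focus operator (stand-in for the operator
    tuned to the fundamental illumination frequency)."""
    g = convolve(img.astype(float), LAPLACIAN, mode="nearest")
    return g * g

def registration_correct(p):
    """Illustrative mis-registration correction: combine each measurement
    with the sum of squares of its four neighbours (hypothetical weights)."""
    s = (np.roll(p, 1, 0) ** 2 + np.roll(p, -1, 0) ** 2 +
         np.roll(p, 1, 1) ** 2 + np.roll(p, -1, 1) ** 2)
    return p * s

def depth_map(near_img, far_img, lut=None):
    p1 = registration_correct(power_measure(near_img))
    p2 = registration_correct(power_measure(far_img))
    # Normalized comparison of the two power measurements, pixel by pixel.
    q = (p1 - p2) / (p1 + p2 + 1e-12)
    if lut is None:
        return q          # relative depth measure
    return lut(q)         # calibrated look-up-table mapping to depth
```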
The pixel by pixel depth information is preferably arranged as a depth map and may be displayed as a wireframe on a bitmapped workstation. If both laser-light depth images and ambient light brightness images are sensed, the brightness image may be preferably displayed concurrently with the depth map so that the three-dimensional structure of the sensed scene and the actual scene picture are concurrently displayed as a texture map.
Based on the above techniques, both textured and textureless surfaces can be recovered by using an optimized illumination pattern that is registered with an image sensor. Further, constant magnification defocusing is provided in a depth-from-defocus imaging technique. Accordingly, techniques for real-time three-dimensional imaging are provided herein which produce precise, high resolution depth maps at frame rate.

The accompanying drawings, which are incorporated and constitute part of this disclosure, illustrate a preferred embodiment of the invention and serve to explain the principles of the invention.

Brief Description of the Drawings

Figure 1 is a simplified diagram illustrating the formation of images with a lens.

Figure 2 is a simplified diagram showing the arrangement of the optical portions of the present invention.

Figure 2A is a plan view of the image detectors used in the Figure 1 apparatus.

Figure 3 is a plan view of the image detectors used in the Figure 2 apparatus.

Figures 4A and 4B show the preferred geometry of rectangular grids for the projection screen used in the Figure 2 embodiment.

Figure 5 is a diagram showing the spatial and frequency domain functions for optimization of the system of the present invention.

Figure 6 are diagrams showing the determination of the tuned focus operator.

Figure 7 is a graph of the normalized defocus factor as a function of distance between the image plane and the focused image plane.

Figure 8 is a drawing of the optics of the apparatus of Figure 2 showing alternate filter locations.

Figure 9 is a plan view of a selectable filter.

Figures 10A to 10D show alternate arrangements for polarization control in the apparatus of the invention.

Figures 11A to F show an alternate grid pattern and the use of tuned filtering to provide two ranges of depth determination therefrom.

Figures 12A through D show further alternate grid patterns and the frequency response thereto.

Figure 13 is a simplified drawing of an apparatus for determining depth of imaged object elements using two different apertures.

Figure 14 is a plan view of a pair of constant area, variable geometry apertures.

Figure 15 shows a registration shift of the grid pattern with respect to a pixel array.

Figure 16 is an apparatus for recovering depth information and images of an object.

Figures 17A through E illustrate a phase shift grid screen and the resulting illumination and spatial frequency response.

Figure 18 is a computation flow diagram showing the derivation of depth map information from two defocused images.

Description of the Preferred Embodiments

Reference will now be made in detail to the present preferred embodiments of the invention as illustrated in the drawings.

Figure 2 is a simplified optical diagram showing an apparatus 10 for mapping the three-dimensional image of an object 22 within its field of view. The apparatus 10 includes a light source 12, preferably of high intensity, such as a Xenon arc lamp, a strobe light or a laser source. Light source 12 illuminates a partially transparent screen 14 having a two-dimensional grid to be projected as illumination onto object 22. A lens 16, aperture 18 and beam splitter 20 are provided such that an image of the screen 14 is projected onto object 22 along optical axis 24. Such active illumination provides a well defined texture to object 22 as viewed from the apparatus 10. In connection with measuring depth by defocus, image portions of the object 22 can only be measured where they have a defined texture, the defocus of which can be observed in the apparatus 10. The grid illumination pattern provides a forced and well defined texture onto object 22.
The apparatus 10 is arranged so that light from light source 12 is projected along the same optical axis 24 from which object 22 is observed. For this reason a beam splitting reflector 20 is provided to project the illumination in the direction of object 22. Reflector 20, which is illustrated as a plane mirror, may preferably be a prism beam splitter.

Light from the illumination grid pattern as reflected from object 22 along the optical axis 24 passes through beam splitter 20 into a detection apparatus 25. Apparatus 25 includes a focal plane aperture 26 and lens 28 for providing invariance of image magnification with variation in the image plane, also referred to as a telecentric lens system. A second beam splitter 30 reflects images of object 22 onto image detecting arrays 32 and 34, each of which has a different spacing along the optical axis from lens 28 to provide a pair of detected images which have different degrees of defocus, but identical image magnification.
The characteristics of a telecentric lens system using a single lens will be described in connection with the diagram of Figure 3. As previously discussed with respect to Figure 1, a point P on an object is imaged by a lens 28 onto a point Q on the focus plane corresponding to the image distance di from lens 28. Image distance di is related to the object distance d from lens 28 to point P on the object by the lens formula (1). At image plane I1, which is spaced from the focus plane If, a blurred image corresponding to an enlarged circular spot will result from the defocusing of object point P. Likewise, a blurred image of point P will be formed on image plane I2, located at a distance from image plane I1. Image planes I1 and I2 are on opposite sides of focus plane If. The objective of the telecentric lens system, wherein aperture 26 is located a distance f, corresponding to the focal length of lens 28, in front of the lens 28, is that the size of the resulting images on image planes I1 and I2 will be the same as the ideal image that would be projected on focus plane If. This is evident from the fact that the optical center of the refracted image path behind lens 28, which is designated by R', is parallel to the optical axis 24. Of course, those skilled in the art will appreciate that the location of the aperture will depend on the optical properties of the lens chosen. Indeed, certain lenses are manufactured to be telecentric and do not require an additional aperture.
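The constant-magnification property can be made concrete with a standard thin-lens sketch (a hedged illustration, not text from the patent): with an aperture of radius a' placed one focal length f in front of the lens, the chief ray from any object point emerges parallel to the optical axis behind the lens, so displacing the sensor plane by a distance σ from the focused image plane changes only the blur, not the image height. The blur-circle radius on the displaced sensor plane is then

$$r = \frac{a'}{f}\,\sigma,$$

independent of the object distance, whereas for a conventional arrangement with the stop at the lens the radius is approximately aσ/di and the image coordinates of the point shift with defocus, which is the magnification problem discussed in the Background section.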
In the apparatus 10 of Figure 2, image planes 32 and 34 are located at different optical distances from lens 28, corresponding, for example, to image planes I1 and I2 respectively. Because a telecentric lens system is used, the images detected on these image planes will have identical image size and corresponding locations of image elements, with different amounts of defocus resulting from the different distances from lens 28 along the optical paths to the image planes.

Figure 2A is a plan view from the incident optical axis of image sensors 32, 34, which are preferably charge coupled devices (CCD). In a preferred arrangement, the image planes consist of image sensing cells 36 which are preferably arranged in an array of 512 by 480 elements at regular spacings px and py along the x and y directions. The CCD image detector is preferred for use in the systems of the present invention because of the digital nature of its signal sampling elements, which facilitates precise identification of picture elements in the two image planes 32 and 34 so that a correlation can be made between the image elements. Alternatively, other image sampling devices, such as television receptor tubes, can be used with appropriate analog-to-digital conversion. However, the use of such analog devices may result in possible loss of precision in determining the defocus effect upon which the present system depends for depth perception.
Figure 4A is a plan view of a screen 14 according to one of the preferred embodiments of the present invention. The screen 14 may be formed on a glass substrate, for example by photo-etching techniques, and has a checkerboard grid of alternating transparent and opaque elements of a size bx times by and periodicity tx and ty, which is selected such that the individual black and white checkerboard squares of grid 14 project images which, after reflecting off the object 22, are incident on the photo image detectors 32 and 34 with an image grid element size corresponding to the size and periodicity of image detecting elements 36 in arrays 32 and 34. Accordingly, the illumination grid period (tx, ty) of the projected grid image on the photo detector arrays is twice the spacing (px, py) of the imaging elements of the array, which corresponds to the detected pixel elements of the image. Those skilled in the art will recognize that the defocused images of grid 14 on individual pixel elements of detectors 32 and 34 will be defocused from the ideal image by reason of the spacing of arrays 32 and 34 from the ideal focusing plane of the image. The amount of such defocusing for each image element and for each detector array is a function of the spacing S of the corresponding object portion from the ideal focused plane F in the object field for the corresponding detector array. By detecting image intensity in each of the detecting pixel elements 36 of detecting arrays 32, 34, it is possible to derive for each defocused image on detecting arrays 32, 34 the amount of defocusing of the projected grid which results from the spacing S of the corresponding object portion from the plane of focus F in the object field for that array. Accordingly, two measurements of the defocus factor, as described below, are obtained for each elemental area of the object, and such defocus factors are used to compute a normalized defocus measure which can be directly mapped to the depth of the corresponding object portion corresponding to the pixel element, as will be described below.

Optimization Process

In order to better describe the principles of operation of the method and apparatus of the present invention, it is useful to have an understanding of the analytical techniques by which the optimum parameters of the apparatus and method were determined. Such analytical techniques take into account modeling of both the illumination and imaging functions to analyze the physical and computational parameters involved in the depth from defocus determination, and its application to varying object field requirements.

There are five different elements, or components, that play a critical role. We briefly describe them before proceeding to model them.
1. Illumination Pattern: The exact pattern used to illuminate the scene determines its final texture. The spatial and spatial frequency characteristics of this texture determine the behavior of the focus measure and hence the accuracy of depth estimation. It is the parameters of this component that we set out to optimize so as to achieve maximum depth accuracy.

2. Optical Transfer Function: The finite size of the lens aperture 26 imposes restrictions on the range of spatial frequencies that are detectable by the imaging system. These restrictions play a critical role in the optimization of the illumination pattern. Upon initial inspection, the optical transfer function (OTF) seems to severely constrain the range of useful illumination patterns. However, as we shall see, the OTF's limited range also enables us to avoid serious problems such as image aliasing.

3. Defocusing: The depth d of a surface point is directly related to its defocus (or lack of it) on the image plane. It is this phenomenon that enables us to recover depth from defocus. It is well-known that defocus is essentially a low-pass spatial filter. However, a realistic model for this phenomenon is imperative for focus analysis. Our objective is to determine depth from two images on planes I1 and I2 by estimating the plane of best focus If for each scene point P.

4. Image Sensing: The two images used for shape recovery are of course discrete. The relationship between the continuous image formed on the sensor plane and the discrete image used in computations is determined by the shape and spatial arrangement of sensing elements 36 (pixels) on the image detectors 32, 34. As will be shown, the final illumination pattern 14 will include elements that are comparable to the size px, py of each pixel on the sensor array. Therefore, an accurate model for image sensing is essential for illumination optimization.

5. Focus Operator: The relative degree of defocus in two images is estimated by using a focus operator. Such an operator is typically a high-pass filter and is applied to discrete image elements. Interestingly, the optimal illumination pattern is also dependent on the parameters of the focus operator used.
All the above factors together determine the relation between the depth d of a scene point P and its two focus measures. Therefore, the optimal illumination grid 14 is viewed as one that maximizes the sensitivity and robustness of the focus measure function. To achieve this, each component is modeled in the spatial as well as Fourier domains. Since we have used the telecentric lens (Figure 3) in our implementation, its parameters are used in developing each model. However, all of the following expressions can be made valid for the classical lens system (Figure 2) by simply replacing the factor a'/f by a'/di.


Illumination Pattern

Before the parameters of the illumination grid 14 can be determined, an illumination model must be defined. Such a model must be flexible in that it must subsume a large enough variety of possible illumination patterns. In defining the model, it is meaningful to take the characteristics of the other components into consideration. As we will describe shortly, the image sensors 32, 34 used have rectangular sensing elements 36 arranged on a rectangular spatial array as shown in Figure 2A. With this in mind, we define the following illumination model. The basic building block of the model is a rectangular illuminated patch, or cell, with uniform intensity:
$$i_c(x,y) = i_c(x,y;\,b_x,b_y) = {}^2\Pi\!\left(\frac{x}{b_x},\,\frac{y}{b_y}\right) \qquad (2)$$

where ${}^2\Pi(\cdot)$ is the two-dimensional Rectangular function. The unknown parameters of this illumination cell are bx and by, the length and width of the cell.

This cell is assumed to be repeated on a two-dimensional grid to obtain a periodic pattern as shown in Figures 4A and 4B. This periodicity is essential since our goal is to achieve spatial invariance in depth accuracy, i.e. all image regions, irrespective of their distance from each other, must possess the same textural characteristics. The periodic grid is defined as:

$$i_g(x,y) = i_g(x,y;\,t_x,t_y) = {}^2III\!\left(\frac{1}{2}\left(\frac{x}{t_x}+\frac{y}{t_y}\right),\,\frac{1}{2}\left(\frac{x}{t_x}-\frac{y}{t_y}\right)\right) \qquad (3)$$

where ${}^2III(\cdot)$ is the two-dimensional Shah function, and 2tx and 2ty determine the periods of the grid in the x and y directions. Note that this grid is not rectangular but has vertical and horizontal symmetry on the x-y plane. The final illumination pattern i(x,y) is obtained by convolving the cell ic(x,y) with the grid ig(x,y):

$$i(x,y) = i(x,y;\,b_x,b_y,t_x,t_y) = i_c(x,y) * i_g(x,y) \qquad (4)$$
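A small sketch may help make eqs. (2)-(4) concrete. Assuming the rotated-lattice reading of eq. (3) given above, placing a bx-by-by cell at every lattice point (m·tx, n·ty) with m + n even reproduces a checkerboard when bx = tx and by = ty; all sizes below are arbitrary pixel units chosen for illustration, not values from the patent.

```python
import numpy as np

def illumination_pattern(shape, bx, by, tx, ty):
    """Build i(x,y) = i_c * i_g per eqs. (2)-(4): a bx-by-by cell repeated on
    the lattice {(m*tx, n*ty) : m + n even} (the checkerboard lattice)."""
    h, w = shape
    img = np.zeros(shape)
    for m in range(0, w // tx + 1):
        for n in range(0, h // ty + 1):
            if (m + n) % 2 == 0:                    # impulses of the Shah grid
                x0, y0 = m * tx, n * ty
                img[y0:y0 + by, x0:x0 + bx] = 1.0   # convolution with the cell
    return img

# bx = tx and by = ty yields a checkerboard whose period is 2*tx by 2*ty.
pattern = illumination_pattern((64, 64), bx=4, by=4, tx=4, ty=4)
```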

The exact pattern is therefore determined by four parameters, namely, bx, by, tx and ty. The above illumination grid is not as restrictive as it may appear upon initial inspection. For instance, bx, by, 2tx and 2ty can each be stretched to obtain repeated illumination and non-illumination stripes in the horizontal and vertical directions, respectively. Alternatively, they can also be adjusted to obtain a checkerboard illumination pattern with large or small illuminated patches. The exact values for bx, by, tx and ty will be evaluated by the optimization procedure described later. In practice, the illumination pattern determined by the optimization is used to fabricate a screen with the same pattern.
The optimization procedure requires the analysis of each component of the system in the spatial domain as well as the frequency domain (u,v). The Fourier transforms of the illumination cell, grid, and pattern are denoted as Ic(u,v), Ig(u,v), and I(u,v), respectively, and found to be:
$$I_c(u,v) = I_c(u,v;\,b_x,b_y) = b_x b_y\,\frac{\sin(\pi b_x u)}{\pi b_x u}\,\frac{\sin(\pi b_y v)}{\pi b_y v} \qquad (5)$$

$$I_g(u,v) = I_g(u,v;\,t_x,t_y) = {}^2III\big((t_x u + t_y v),\,(t_x u - t_y v)\big) \qquad (6)$$

$$I(u,v) = I(u,v;\,b_x,b_y,t_x,t_y) = I_c(u,v)\,I_g(u,v) \qquad (7)$$

Optical Transfer Function

Adjacent points on the illuminated surface reflect light waves that interfere with each other to produce diffraction effects. The angle of diffraction increases with the spatial frequency of the surface texture. Since the lens aperture 26 of the imaging system 25 (Figure 2) is of finite radius a', it does not capture the higher order diffractions radiated by the surface. This effect places a limit on the optical resolution of the imaging system, which is characterized by the optical transfer function (OTF):
$$O(u,v) = O(u,v;\,a'/f) = \begin{cases} \dfrac{1}{\pi}\big(\phi - \sin\phi\big), & \sqrt{u^2+v^2} < \dfrac{2a'}{\lambda f} \\[4pt] 0, & \text{otherwise,} \end{cases} \qquad \text{where } \phi = 2\cos^{-1}\!\left(\frac{\lambda f}{2a'}\sqrt{u^2+v^2}\right) \qquad (8)$$


where (u,v) is the spatial frequency of the two-dimensional surface texture as seen from the image side of the lens, f is the focal length of the lens, and λ is the wavelength of incident light. It is clear from the above expression that only spatial frequencies below the limit 2a'/(λf) will be imaged by the optical system (Figure 5). This in turn places restrictions on the frequency of the illumination pattern. Further, the above frequency limit can be used to "cut off" any desired number of higher harmonics produced by the illumination pattern. In short, the OTF is a curse and a blessing; it limits the extractable range of frequencies and at the same time can be used to minimize the detrimental effects of aliasing and high-order harmonics.
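For a sense of scale, and using the cutoff as reconstructed in equation (8) above with illustrative values (not values given in the patent) of a' = 5 mm, λ = 0.6 µm and f = 50 mm:

$$\frac{2a'}{\lambda f} = \frac{2 \times 5\ \mathrm{mm}}{0.6\ \mu\mathrm{m} \times 50\ \mathrm{mm}} \approx 333\ \text{cycles/mm},$$

so an illumination grid whose fundamental frequency at the sensor lies well below this limit is imaged with little diffraction loss, while its higher harmonics can be pushed beyond the cutoff and suppressed.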

Defocusing

Referring to Figure 3, σ is the distance between the focused image plane If of a surface point P and its defocused image formed on the sensor plane I1. The light energy radiated by the surface point and collected by the imaging optics is uniformly distributed over a circular patch on the sensor plane. This patch, also called the pillbox, is the defocus function (Figure 7):

$$h(x,y) = h(x,y;\,\sigma,a',f) = \frac{f^2}{\pi\,a'^2\sigma^2}\;\Pi\!\left(\frac{f\sqrt{x^2+y^2}}{2\,a'\sigma}\right) \qquad (9)$$

where, once again, a' is the radius of the telecentric lens aperture. In the Fourier domain, the above defocus function is given by:

$$H(u,v) = H(u,v;\,\sigma,a',f) = \frac{f}{\pi\,a'\sigma\sqrt{u^2+v^2}}\;J_1\!\left(\frac{2\pi a'\sigma}{f}\sqrt{u^2+v^2}\right) \qquad (10)$$

where J1 is the first-order Bessel function. As is evident from the above expression, defocus serves as a low-pass filter. The bandwidth of the filter increases as σ decreases, i.e. as the sensor plane I1 gets closer to the focus plane If. In the extreme case of σ = 0, H(u,v) passes all frequencies without attenuation, producing a perfectly focused image. Note that in a defocused image, all frequencies are attenuated at the same time. In the case of passive depth from focus or defocus, this poses a serious problem; different frequencies in an unknown scene are bound to have different (and unknown) magnitudes and phases. It is difficult therefore to estimate the degree of defocus of an image region without the use of a large set of narrow-band focus operators that analyze each frequency in isolation. This again indicates that it would be desirable to have an illumination pattern that has a single dominant frequency, enabling robust estimation of defocus and hence depth.
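The single-frequency idea can be made concrete with a small sketch. Assuming the pillbox transfer function in the form reconstructed above (eq. 10), the attenuation of one chosen illumination frequency can be tabulated as a function of the defocus σ; this monotone relationship is what allows a per-pixel power measurement to be mapped back to depth. The parameter values below are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.special import j1

def pillbox_attenuation(sigma, freq, a_prime=5.0, f=50.0):
    """Attenuation of a single spatial frequency by defocus, using the
    pillbox transfer function H as reconstructed in eq. (10).
    sigma: sensor displacement from the focused plane (mm)
    freq:  spatial frequency magnitude sqrt(u^2 + v^2) (cycles/mm)
    a_prime, f: aperture radius and focal length (mm), illustrative values."""
    x = 2.0 * np.pi * a_prime * sigma * freq / f
    if x == 0.0:
        return 1.0          # perfect focus: no attenuation
    return 2.0 * j1(x) / x  # equals f/(pi a' sigma rho) * J1(2 pi a' sigma rho / f)

# Example: attenuation of a 25 cycles/mm illumination fundamental
# over a range of defocus values.
for sigma in [0.0, 0.05, 0.1, 0.2]:
    print(sigma, float(pillbox_attenuation(sigma, freq=25.0)))
```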

Image Sensing

We assume the image sensor to be a typical CCD sensor array. Such a sensor can be modeled as a rectangular array 32, 34 of rectangular sensing elements 36 (pixels). The quantum efficiency of each sensor 36 is assumed to be uniform over the area of the pixel. Let m(x,y) be the continuous image formed on the sensor plane. The finite pixel area has the effect of averaging the continuous image m(x,y). In the spatial domain, the averaging function is the rectangular cell:
$$s_c(x,y) = s_c(x,y;\,w_x,w_y) = \frac{1}{w_x w_y}\,{}^2\Pi\!\left(\frac{x}{w_x},\,\frac{y}{w_y}\right) \qquad (11)$$

where wx and wy are the width and height of the sensing element 36, respectively. The discrete image is obtained by sampling the convolution of m(x,y) with sc(x,y). This sampling function is a rectangular grid:

$$s_g(x,y) = s_g(x,y;\,p_x,p_y,\phi_x,\phi_y) = {}^2III\!\left(\frac{x-\phi_x}{p_x},\,\frac{y-\phi_y}{p_y}\right) \qquad (12)$$

where px and py are the spacings between discrete samples in the two spatial dimensions, and (φx, φy) is the phase shift of the grid. The final discrete image is therefore:

$$m_d(x,y) = \big(s_c(x,y) * m(x,y)\big)\; s_g(x,y) \qquad (13)$$

The parameters wx, wy, px, and py are all determined by the particular image sensor used. These parameters are therefore known and their values are substituted after the optimization is done. On the other hand, the phase shift (φx, φy) of the sampling function is with respect to the illumination pattern and will also be viewed as illumination parameters during optimization. To recognize the importance of these phase parameters, one can visualize the variations in a discrete image that arise from simply translating a high-frequency illumination pattern with respect to the sensing grid.

In the Fourier domain, the above averaging and sampling functions are:

$$S_c(u,v) = S_c(u,v;\,w_x,w_y) = \frac{\sin(\pi w_x u)}{\pi w_x u}\,\frac{\sin(\pi w_y v)}{\pi w_y v} \qquad (14)$$

$$S_g(u,v) = S_g(u,v;\,p_x,p_y,\phi_x,\phi_y) = {}^2III(p_x u,\,p_y v)\;e^{-j2\pi(\phi_x u + \phi_y v)} \qquad (15)$$

The final discrete image is:

$$M_d(u,v) = \big(S_c(u,v)\,M(u,v)\big) * S_g(u,v) \qquad (16)$$
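As an illustration of the sensing model in eqs. (11)-(13), the following NumPy sketch forms a discrete image from a finely sampled "continuous" image by box-averaging over the pixel footprint and then sampling at the pixel pitch with a sub-pixel phase. The grid sizes and the oversampling factor are hypothetical choices for the sketch, not values from the patent.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sense(continuous_img, oversample=8, phase=(0, 0)):
    """Discrete image per eq. (13): box-average over the pixel footprint
    (s_c), then sample on the pixel grid (s_g) with a sub-pixel phase.
    'oversample' sub-samples per pixel pitch; all sizes are illustrative."""
    averaged = uniform_filter(continuous_img.astype(float), size=oversample)
    return averaged[phase[1]::oversample, phase[0]::oversample]

# Hypothetical fine-grid checkerboard whose period is two pixel pitches
# (cell size = one pixel pitch = 8 sub-samples).
yy, xx = np.indices((256, 256))
scene = ((((xx // 8) + (yy // 8)) % 2) == 0).astype(float)

in_register = sense(scene, phase=(0, 0))   # grid aligned with the pixels
half_pixel = sense(scene, phase=(4, 0))    # grid shifted by half a pixel
print(float(np.abs(in_register - half_pixel).mean()))  # registration matters
```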

Focus Operator

Since defocusing has the effect of suppressing high-frequency components in the focused image, it is desirable that the focus operator respond to high frequencies in the image. For the purpose of illumination optimization we use the Laplacian. However, the derived pattern will remain optimal for a large class of symmetric focus operators. In the spatial domain, the discrete Laplacian is:

$$l(x,y) = l(x,y;\,q_x,q_y) = 4\,\delta(x)\,\delta(y) - \big[\delta(x)\,\delta(y-q_y) + \delta(x)\,\delta(y+q_y) + \delta(x-q_x)\,\delta(y) + \delta(x+q_x)\,\delta(y)\big] \qquad (17)$$

Here, qx and qy are the spacings between neighboring elements of the discrete Laplacian kernel. In the optimization, these spacings will be related to the illumination parameters. The Fourier transform of the discrete Laplacian is:

$$L(u,v) = L(u,v;\,q_x,q_y) = 2\big(1-\cos(2\pi q_x u)\big) + 2\big(1-\cos(2\pi q_y v)\big) = 4 - 2\cos(2\pi q_x u) - 2\cos(2\pi q_y v) \qquad (18)$$

The required discrete nature of the focus operator comes with a price: it tends to broaden the bandwidth of the operator. Once the pattern has been determined, the above filter will be tuned to maximize sensitivity to the fundamental illumination frequency while minimizing the effects of spurious frequencies caused either by the scene's inherent texture or image noise.
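To see how the kernel spacing interacts with the illumination frequency, the short sketch below evaluates the response of eq. (18) at an arbitrary frequency; when the grid period equals twice the pixel pitch and qx = qy = 1 pixel, the fundamental (u = v = 1/2 cycle per pixel) sits at the operator's response peak. The numbers are illustrative only.

```python
import numpy as np

def laplacian_response(u, v, qx=1.0, qy=1.0):
    """Frequency response of the discrete Laplacian, eq. (18),
    with u, v in cycles per pixel and qx, qy in pixels."""
    return 4.0 - 2.0 * np.cos(2 * np.pi * qx * u) - 2.0 * np.cos(2 * np.pi * qy * v)

# Fundamental of a checkerboard whose period is two pixel pitches:
# u = v = 1/2 cycle per pixel gives the maximum response of 8.
print(laplacian_response(0.5, 0.5))    # 8.0
print(laplacian_response(0.25, 0.25))  # weaker response off the peak
```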

Focus Measure

The focus measure is simply the output of the focus operator. It is related to defocus σ (and hence depth d) via all of the components modeled above. Note that the illumination pattern i(x,y) is projected through optics that is similar to that used for image formation. Consequently, the pattern is also subjected to the limits imposed by the optical transfer function o and the defocus function h. Therefore, the texture projected on the scene is:

$$i(x,y;\,b_x,b_y,t_x,t_y) * o(x,y;\,a',f) * h'(x,y;\,\sigma',a',f) \qquad (19)$$

where, ~' represents defocus of the ;llnm;nAt;nn itself
that depends on the depth of the ;11 nm; n~t~ point.
However, the illumination pattern, once incident on a
surface patch, plays the role of surface texture and
hence defocus σ' of the illumination does not have any
significant effect on depth estimation. The projected
texture is reflected by the scene and projected by the
optics back onto the image plane to produce the
discrete image:

    { i(x, y; b_x, b_y, t_x, t_y) * o²(x, y; a, f) * h'(x, y; σ', a, f) * h(x, y; σ, a, f)
      * sc(x, y; w_x, w_y) } · sg(x, y; p_x, p_y, φ_x, φ_y)        (20)

where o² = o * o. The final focus measure function
g(x,y) is the result of applying the discrete Laplacian
to the above discrete image:

    g(x, y) = { ( i(x, y; b_x, b_y, t_x, t_y) * o²(x, y; a, f) * h'(x, y; σ', a, f) * h(x, y; σ, a, f)
                * sc(x, y; w_x, w_y) ) · sg(x, y; p_x, p_y, φ_x, φ_y) } * l(x, y; q_x, q_y)
            = { (i * o² * h' * h * sc) · sg } * l        (21)

Since the distance between adjacent weights of the
Laplacian kernel must be integer multiples of the
period of the image sampling function sg, the above
expression can be rearranged as:

    g(x, y) = (i * o² * h' * h * sc * l) · sg = g_0 · sg        (22)

where g_0 = i * o² * h' * h * sc * l. The same can be
expressed in Fourier domain as:

    G(u, v) = (I · O² · H' · H · SC · L) * SG = G_0 * SG        (23)

The above expression gives us the final output of
the focus operator for any value of the defocus
parameter. It will be used in the following section
to determine the optimal illumination pattern.

Optimization
In our implementation, the illumination grid is
projected on the object 22 using a high power light
source and a telecentric lens identical to the one used
to image the scene. This allows us to assume that the
projected illumination is the primary cause of surface
texture and is stronger than the natural texture of the
surface. Consequently our results are applicable not
only to textureless surfaces but also textured ones.
The illumination optimization problem is formulated as
follows: Establish closed-form relationships between
the illumination parameters (b_x, b_y, t_x, t_y), sensor
parameters (w_x, w_y, p_x, p_y, φ_x, φ_y), and discrete
Laplacian parameters (q_x, q_y) so as to maximize the
sensitivity, robustness, and spatial resolution of the
focus measure g(x,y). High sensitivity implies that a
small variation in the degree of focus results in a
large variation in g(x,y). This would ensure high
depth estimation accuracy in the presence of image
noise, i.e. high signal-to-noise ratio. By robustness
we mean that all pixel sensors 36 with the same degree
of defocus produce the same focus measure independent
of their location on the image plane. This ensures
that depth estimation accuracy is invariant to location
on the image plane. Lastly, high spatial resolution is
achieved by minimizing the size of the focus operator.
This ensures that rapid depth variations (surface
discontinuities) can be detected with high accuracy.
In order to minimize smoothing effects and
maximize spatial resolution of computed depth, the
support (or span) of the discrete Laplacian must be as
small as possible. This in turn requires the frequency
of the illumination pattern be as high as possible.
However, the optical transfer function described above
imposes limits on the highest spatial frequency that
can be imaged by the optical system.

This maximum allowable frequency is determined
by the numerical aperture of the telecentric lens.
Since the illumination grid pattern is periodic, its
Fourier transform must be discrete. It may have a
zero-frequency component, but this can be safely
ignored since the Laplacian operator, being a sum of
second-order derivatives, will eventually remove any
zero-frequency component in the final image. Our
objective then is to maximize the fundamental spatial
frequency (1/t_x, 1/t_y) of the illumination pattern. In
order to maximize this frequency while maintaining high
detectability, we must have √((1/t_x)² + (1/t_y)²) close to
this optical limit. This in turn pushes all
higher harmonics in the illumination pattern outside
the optical limit. What we are left with is a surface
texture whose image has only the quadruple fundamental
frequencies (±1/t_x, ±1/t_y). As a result, these are the
only frequencies we need consider in our analysis of
the focus measure function G(u,v).
Before we consider the final measure G(u,v), we
examine G_0(u,v), the focus measure prior to image
sampling. For the reasons given above, the two-
dimensional G_0(u,v) is reduced to four discrete spikes
at (1/t_x, 1/t_y), (1/t_x, −1/t_y), (−1/t_x, 1/t_y) and (−1/t_x,
−1/t_y). Since all components (I, O, H, SC and L) of G_0
are reflection symmetric about u = 0 and v = 0, we
have:

    G_0(1/t_x, 1/t_y) = G_0(1/t_x, −1/t_y) = G_0(−1/t_x, 1/t_y) = G_0(−1/t_x, −1/t_y)        (24)

where



    G_0(1/t_x, 1/t_y) = I(1/t_x, 1/t_y; b_x, b_y, t_x, t_y) · O²(1/t_x, 1/t_y; a, f)
        · H'(1/t_x, 1/t_y; σ', a, f) · H(1/t_x, 1/t_y; σ, a, f)
        · SC(1/t_x, 1/t_y; w_x, w_y) · L(1/t_x, 1/t_y; q_x, q_y)        (25)

Therefore, in frequency domain the focus measure
function prior to image sampling reduces to:
    G_0(u, v) = G_0(1/t_x, 1/t_y) · { δ(u − 1/t_x, v − 1/t_y) + δ(u + 1/t_x, v − 1/t_y)
                + δ(u − 1/t_x, v + 1/t_y) + δ(u + 1/t_x, v + 1/t_y) }        (26)

The function g_0(x,y) in image domain is simply the
inverse Fourier transform of G_0(u,v):

    g_0(x, y) = G_0(1/t_x, 1/t_y) · { 4 cos(2πx/t_x) cos(2πy/t_y) }        (27)

Note that g_0(x,y) is the product of cosine functions
weighted by the coefficient G_0(1/t_x, 1/t_y). The defocus
function h has the effect of reducing the coefficient
G_0(1/t_x, 1/t_y) in the focus measure g_0(x,y). Clearly,
the sensitivity of the focus measure to depth (or
defocus) is optimized by maximizing the coefficient
G_0(1/t_x, 1/t_y) with respect to the unknown parameters of
the system. This optimization procedure can be
summarized as:

    ∂G_0(1/t_x, 1/t_y)/∂t_x = 0,    ∂G_0(1/t_x, 1/t_y)/∂t_y = 0,        (28)

    ∂G_0(1/t_x, 1/t_y)/∂b_x = 0,    ∂G_0(1/t_x, 1/t_y)/∂b_y = 0,        (29)

    ∂G_0(1/t_x, 1/t_y)/∂q_x = 0,    ∂G_0(1/t_x, 1/t_y)/∂q_y = 0.        (30)


Since t_x and t_y show up in all the components in
(25), the first two partial derivatives (equation (28))
are difficult to evaluate. Fortunately, the
derivatives in (29) and (30) are sufficient to obtain
relations between the system parameters. The following
result maximizes sensitivity and spatial resolution of
the focus measure g(x,y):

    b_x = t_x/2,    b_y = t_y/2,        (31)

    q_x = t_x/2,    q_y = t_y/2.        (32)

Next, we examine the spatial robustness of g(x,y).
Imagine the imaged surface to be planar and parallel to
the image sensor. Then, we would like the image
sampling to produce the same absolute value of g(x,y)
at all discrete sampling points on the image. This
entails relating the illumination and sensing
parameters so as to facilitate careful sampling of the
product of cosine functions in (27). Note that the
final focus measure is:

    g(x, y) = g_0 · sg = G_0(1/t_x, 1/t_y) · { 4 cos(2πx/t_x) cos(2πy/t_y) }
              · III((x − φ_x)/p_x, (y − φ_y)/p_y)        (33)

All samples of g(x,y) have the same absolute value when
the two cosines in the above expression are sampled at
their peak values. Such a sampling is possible when:

    p_x = t_x/2,    p_y = t_y/2,        (34)

and

    φ_x = 0,    φ_y = 0.        (35)


Alternatively, the cosines can be sampled with a period
of π/2 and a phase shift of π/4. This yields the second
solution:

    p_x = t_x/4,    p_y = t_y/4,        (36)

    φ_x = ±t_x/8,    φ_y = ±t_y/8.        (37)

The above equations give two solutions, shown in
Figures 4A and 4B. Both are checkerboard illumination
patterns but differ in their fundamental frequencies,
size of the illumination cell, and the phase shift with
respect to the image sensor. Equations (31), (32),
(34), (35) yield the grid pattern shown in Figure 4A.
In this case the grid image and detector are registered
with zero phase shift, and the image of the
illumination cell has the same size and shape as the
sensor elements (pixels). The second solution, shown
in Figure 4B, is obtained using the remaining solutions
(36) and (37), yielding a pattern with an illumination
cell image two times the size of the sensor element and
a phase shift of half the sensor element size.
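The sketch below (Python with NumPy) illustrates how the two solutions follow from relations (31)-(37) once a sensor pixel pitch is fixed; the pitch value, the helper names and the one-dimensional rasterization are illustrative assumptions only.

import numpy as np

def grid_parameters(p_x, p_y, solution="A"):
    """Illumination grid parameters derived from the sensor pixel pitch.

    Solution A uses (31), (32), (34), (35): cell image = one pixel, zero phase.
    Solution B uses (31), (32), (36), (37): cell image = two pixels,
    phase shift of half a pixel (the + sign of (37) is taken here).
    """
    if solution == "A":
        t_x, t_y = 2 * p_x, 2 * p_y            # from p = t/2            (34)
        phi_x, phi_y = 0.0, 0.0                #                         (35)
    else:
        t_x, t_y = 4 * p_x, 4 * p_y            # from p = t/4            (36)
        phi_x, phi_y = t_x / 8, t_y / 8        #                         (37)
    b_x, b_y = t_x / 2, t_y / 2                # illuminated cell size   (31)
    q_x, q_y = t_x / 2, t_y / 2                # Laplacian spacing       (32)
    return dict(t=(t_x, t_y), b=(b_x, b_y), q=(q_x, q_y), phi=(phi_x, phi_y))

def checkerboard_row(n_pix, pitch, t, phi):
    """Sample a 1-D checkerboard of period t (cell width t/2, offset phi) at pixel centres."""
    x = (np.arange(n_pix) + 0.5) * pitch
    return (np.floor(2 * (x - phi) / t) % 2).astype(int)

pitch = 0.01  # assumed pixel pitch (arbitrary units), purely illustrative
for sol in ("A", "B"):
    g = grid_parameters(pitch, pitch, sol)
    print(sol, g["t"], g["phi"], checkerboard_row(8, pitch, g["t"][0], g["phi"][0]))

Solution A prints an alternating 0 1 0 1 ... row (one pixel per cell) and solution B prints 0 0 1 1 ... (two pixels per cell, shifted by half a pixel), matching the description of Figures 4A and 4B.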

Tuned Focus Operator
For the purpose of illumination optimization, we
used the Laplacian operator. The resulting
illumination pattern has only a single fundamental
absolute frequency, (1/t_x, 1/t_y). Given this, we are in
a position to further refine our focus operator so as
to minimize the effects of all other frequencies caused
either by the physical texture of the scene or image
noise. To this end, let us consider the properties of
the 3x3 discrete Laplacian (see Figures 6A and 6B). We
see that though the Laplacian does have peaks exactly
at (1/t_x, 1/t_y), (1/t_x, −1/t_y), (−1/t_x, 1/t_y) and (−1/t_x,
−1/t_y), it has a fairly broad bandwidth allowing other
spurious frequencies to contribute to the focus measure
G in (23), as shown in Figure 6B. Here, we seek a
narrow band operator with sharp peaks at the above four
coordinates in frequency space.
Given that the operator must eventually be
discrete and of finite support, there is a limit to the
extent to which it can be tuned. To constrain the
problem, we impose the following conditions. (a) To
maximize spatial resolution in computed depth we force
the operator kernel to be 3x3. (b) Since the
fundamental frequency of the illumination pattern has a
symmetric quadruple arrangement, the focus operator
must be rotationally symmetric. These two conditions
force the operator to have the structure shown in
Figure 6C. (c) The operator must not respond to any DC
in image brightness. This last condition is
satisfied if the sum of all elements of the operator
equals zero:

    a + 4b + 4c = 0        (38)

It is also imperative that the response L(u,v) of the
operator to the fundamental frequency not be zero:

    L(1/t_x, 1/t_y) = a + 2b·(cos(2π q_x/t_x) + cos(2π q_y/t_y)) + 4c·cos(2π q_x/t_x)·cos(2π q_y/t_y)        (39)

Given (32), the above reduces to:

    a − 4b + 4c ≠ 0.        (40)

Expressions (38) and (40) imply that b ≠ 0. Without
loss of generality, we set b = −1. Hence, (38) gives
a = 4(1 − c). Therefore, the tuned operator is determined
by a single unknown parameter, c, as shown in Figure
6D. The problem then is to find c such that the
operator's Fourier transform has a sharp peak at (1/t_x,
1/t_y). A rough measure of sharpness is given by the
second-order moment of the power ||L(u, v)||² with
respect to (1/t_x, 1/t_y):

    ℒ(c) = ∫∫ [ (u − 1/t_x)² + (v − 1/t_y)² ] · ||L(u, v)||² du dv
         ∝ 20π²c² + 6c² + 48c − 32π²c + 20π² − 93        (41)

The above measure is minimized when dℒ/dc = 0, i.e.
when c = 0.658, as shown in Figure 6E. The resulting
tuned focus operator has the response shown in Figure
6F; it has substantially sharper peaks than the
discrete Laplacian. Given that the operator is 3x3 and
discrete, the sharpness of the peaks is limited. The
above derivation brings to light the fundamental
difference between designing tuned operators in
continuous and discrete domains. In general, an
operator that is deemed optimal in continuous domain is
most likely sub-optimal for discrete images.
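A short numerical sketch of this construction is given below (Python with NumPy). It assumes the b = −1 normalization and c = 0.658 derived above, builds the 3x3 kernel of Figure 6D, and checks conditions (38) and (40) as well as the location of the response peak; the grid resolution used for the frequency response is arbitrary.

import numpy as np

c = 0.658             # minimizer of the sharpness measure (41), Figure 6E
b = -1.0              # normalization chosen above
a = 4.0 * (1.0 - c)   # from the zero-DC condition (38): a + 4b + 4c = 0

# 3x3 tuned focus operator (Figure 6D): centre a, edge neighbours b, corners c
kernel = np.array([[c, b, c],
                   [b, a, b],
                   [c, b, c]])

print("DC response  a + 4b + 4c =", kernel.sum())          # ~0, condition (38)
print("fundamental  a - 4b + 4c =", a - 4 * b + 4 * c)     # non-zero, condition (40)

# Frequency response, taking the kernel element spacing q as one sample
n = 128
U, V = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n))
H = np.zeros((n, n), dtype=complex)
for (dy, dx), w in np.ndenumerate(kernel):
    H += w * np.exp(-2j * np.pi * (U * (dx - 1) + V * (dy - 1)))
mag = np.abs(H)
iy, ix = np.unravel_index(np.argmax(mag), mag.shape)
print("peak of |L(u, v)| at |u|, |v| =", abs(U[iy, ix]), abs(V[iy, ix]))  # 0.5, 0.5

With these values the kernel elements are approximately a = 1.368, b = −1 and c = 0.658, and the response is largest at the fundamental frequency, as intended.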

Depth from Two Images
Depth estimation uses two images of the scene I_1(x,
y) and I_2(x, y) that correspond to different effective
focal lengths as shown in Figure 3. Depth of each
scene point is determined by estimating the
displacement α of the focused plane I_f for the scene
point. The tuned focus operator is applied to both
images to get focus measure images g_1(x, y) and g_2(x, y).
From (33) we see that:

    g_1(x, y) / g_2(x, y) = G_01(1/t_x, 1/t_y) / G_02(1/t_x, 1/t_y)        (42)

where G_01 and G_02 are the coefficients G_0 of the two
images. From (23) we see that the only factor in G_0
affected by the parameter α is the defocus function H:

    g_1(x, y) / g_2(x, y) = H(1/t_x, 1/t_y; σ_1) / H(1/t_x, 1/t_y; σ_2)        (43)

where σ_1 and σ_2 are the defocus values of the two
images, both determined by α. Note that the above
measure is not bounded. This poses a problem from a
computational viewpoint which is easily remedied by
using the following normalization:

    q(x, y) = (g_1(x, y) − g_2(x, y)) / (g_1(x, y) + g_2(x, y))
            = (H(1/t_x, 1/t_y; σ_1) − H(1/t_x, 1/t_y; σ_2)) / (H(1/t_x, 1/t_y; σ_1) + H(1/t_x, 1/t_y; σ_2))        (44)

As shown in Figure 7, q is a monotonic function of α
such that −p ≤ q ≤ p, p ≤ 1. In practice, the above
relation can be pre-computed and stored as a look-up
table that maps q measured at each image point to a
unique α. Since α represents the position of the
focused image, the lens law (1) yields the depth d of
the corresponding scene point. Note that the tuned
focus operator designed in the previous section is a
linear filter, making it feasible to compute depth maps
of scenes in real-time using simple image processing
hardware.
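The normalization and table look-up step can be sketched as follows (Python with NumPy). The monotone calibration curve used to build the table is a placeholder, since the real table would be measured or computed from the defocus function H, and taking the magnitudes of the tuned-operator outputs is an assumption of this sketch.

import numpy as np

def normalized_measure(g1, g2, eps=1e-12):
    """Equation (44): bounded, monotonic measure of relative defocus."""
    return (g1 - g2) / (g1 + g2 + eps)

# Placeholder monotone calibration from q to depth (stand-in for the relation
# of Figure 7, which in practice is measured or modelled)
q_table = np.linspace(-1.0, 1.0, 257)
depth_table = 300.0 + 150.0 * q_table              # depth in mm, linear only for illustration

def depth_from_measures(g1, g2):
    """Map per-pixel focus measures to depth via the precomputed look-up table."""
    q = normalized_measure(np.abs(g1), np.abs(g2))  # magnitudes of tuned-operator outputs
    return np.interp(q, q_table, depth_table)

# Synthetic focus-measure images just to exercise the mapping
rng = np.random.default_rng(0)
g1 = np.abs(rng.normal(1.0, 0.2, (240, 256)))
g2 = np.abs(rng.normal(0.8, 0.2, (240, 256)))
depth_map = depth_from_measures(g1, g2)
print(depth_map.shape, float(depth_map.min()), float(depth_map.max()))

In the actual sensor the table would be derived from the measured relation of Figure 7 rather than the linear placeholder used here.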

Real Time Range Sensor
Based on the above results, we have implemented
the real-time focus range sensor 25 shown in Figure 2.
The scene is imaged using a standard 12.5 mm Fujinon
lens 28 with an additional aperture 26 added to convert
it to telecentric. Light rays passing through the lens
28 are split in two directions, using a beam-splitting
prism 30. This produces two images that are
simultaneously detected using two Sony XC-77-RR 8-bit
CCD cameras 32, 34. The positions of the two cameras
are precisely fixed such that one obtains a near-focus
image while the other a far-focus image. In this setup
a physical displacement of 0.25 mm between the effective
focal lengths of the two CCD cameras translates to a
sensor depth of field of approximately 30 cms. This
detectable range of the sensor can be varied either by




changing the sensor displacement or the focal length of
the imaging optics.
The illumination grid shown in Figure 4B was
etched on a glass plate using microlithography, a
process widely used in VLSI. The grid 14 was then
placed in the path of a 300 W Xenon arc lamp. The
illumination pattern generated is projected using a
telecentric lens 16 identical to the one used for image
formation. A half-mirror 20 is used to ensure that the
illumination pattern projects onto the scene via the
same optical path 24 used to acquire images. As a
result, the pattern is almost perfectly registered with
respect to the pixels 36 of the two CCD cameras 32, 34.
Furthermore, the above arrangement ensures that every
scene point that is visible to the sensor is also
illuminated by it, avoiding shadows and thus
undetectable regions.
Images from the two CCD cameras 32, 34 are
digitized and processed using MV200 Datacube image
processing hardware. The present configuration
includes the equivalent of two 8-bit digitizers, two
A/D convertors, and one 12-bit convolver. This
hardware enables simultaneous digitization of the two
images, convolution of both images with the tuned focus
operator, and the computation of a 256x240 depth map,
all within a single frame time of 33 msec with a lag of
33 msec. A look-up table is used to map each pair of
focus measures (g_1 and g_2) to a unique depth estimate d.
Alternatively, a 512x480 depth map can be computed at
the same rate if the two images are taken in
succession. Simultaneous image acquisition is clearly
advantageous since it makes the sensor less sensitive
to variations in both illumination and scene structure
between frames. With minor additions to the present
processing hardware, it is easy to obtain 512x480 depth
maps at 30 Hz using simultaneous image grabbing. Depth
maps produced by the sensor 25 can be displayed as
wireframes at frame rate on a DEC Alpha workstation.

Variations On The Preferred Embodiments
One variation of the sensor 10 addresses the fact
that the defocus effect is a function of the chromatic
content of the illuminating light. Most lenses have
slightly different focal lengths for different light
wavelengths; accordingly, the accuracy of determination
of depth from defocus can vary with the spectral
characteristics of the illumination and the color of
the reflecting surface of the object, since depth
determination relies on prior knowledge of the focal
length f of lens 28. This source of error can be
avoided by providing a spectral band-pass filter 38,
shown in Figure 2, to allow only certain wavelengths of
reflected light to be imaged. A band-pass filter would
limit the range of the wavelengths to be imaged and
thereby limit the chromatic variation in the focal
length of the lens. Other possible locations for such
a filter are shown in Figure 8 at 38', 38'' and 38'''.
In the case where the illumination source 12 is a
laser, the filter is preferably narrow band, passing
the laser frequency and eliminating most ambient light,
thereby both eliminating the effects of chromatic
aberration of the lens and texture variations from
ambient light not resulting from the projected grid
pattern.
In multicolor scenes with non-overlapping spectral
characteristics, the pass-band of the spectral filter
may be changed or controlled to use an appropriate
pass-band to measure depth in different object areas.
For this purpose an electrically controllable filter,
or a filter wheel 101, shown in Figure 9, may be used.
In some instances objects to be mapped may include
surfaces or structures that provide specular
reflections as well as diffuse reflections. Specular
reflections can produce negative effects in a depth
from defocus measurement system. First, specular
reflections tend to saturate the image sensors 32, 34,
whereby focus and defocus information is lost. Second,

the depth from defocus values derived from specular
reflections represent the depth of the reflected
source, not the reflecting surface. Finally, if the
normal at a specular surface point does not bisect the
illumination direction and the optical axis 24, then
the surface point will not produce reflections of the
illumination light in the direction of the sensor.
When required, polarization filters, as shown in
Figures 10A to 10D, can be used to remove the effects of
specular reflection from the sensor images. In Figure
10A a polarizing filter 44 polarizes the illumination
light in a vertical direction indicated by arrowhead V.
Specular reflections would therefore have primarily
vertical polarization and would be filtered by
horizontally polarized filter 42 arranged to provide
horizontal polarization H in the sensor imaging system.
An alternate, illustrated in Figure 10B, uses a
vertically polarized laser source 45 which is projected
onto grid 14 by lens 47 to provide vertically polarized
illumination. A polarizing filter 42 protects the
imaging optics from specular reflections. Another
alternate shown in Figure 10C uses the polarizing
effect of a prism semi-reflective beam splitter 46,
which causes vertically polarized illumination V to be
reflected toward the object, but allows horizontally
polarized reflections H to pass to the imaging optics.
A final arrangement of Figure 10D shows a vertical
polarizer 48 followed by a quarter wave plate 50 to
produce circularly polarized light. Illumination light
passing through polarizer 48 becomes vertically
polarized and is converted to right-hand circular
polarization by circular polarizer 50. Specular
reflections, which have left-hand circular polarization,
are converted to horizontal polarization by polarizer
50 and are filtered out by vertical polarizer 48.
Diffuse reflections include right-hand circular
polarized components that are converted to vertical
polarization by polarizer 50 and pass polarizer 48 to
the sensor system.
As described with respect to the preferred
embodiment, the illumination patterns shown in Figures
4A and 4B include a single fundamental spatial frequency
in the x and y coordinates, with harmonic frequencies
outside the limits imposed by the optical transfer
function. It is, however, possible to use illumination
grid patterns that have multiple measurable spatial
frequencies within the limits of the optical transfer
function. One such multiple frequency grid pattern is
shown in Figure 11A, wherein two checkerboard grids,
one with twice the spatial frequency of the other, are
superimposed. The resulting sensing of the defocus
function, Figure 11B, can be filtered in the frequency
domain by tuned filters to result in multiple tuned
focus operators that detect power variations for
different defocus frequencies on a pixel by pixel
basis, as shown in Figures 11C and 11D. The defocus
discrimination functions g for sensitivity of depth
from defocus are shown in Figures 11E and 11F
respectively. The high-frequency defocus function
yields greater depth sensitivity, but reduced range.
The lower frequency defocus function yields lower depth
sensitivity, but increased range. Accordingly, using
the multiple frequency grid of Figure 11A can provide
variable resolution depth from defocus.
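A toy superposition of two such checkerboards is sketched below (Python with NumPy); the cell sizes, the equal weighting and the resulting three-level pattern are illustrative assumptions and are not taken from Figure 11A itself.

import numpy as np

def checkerboard(shape, cell):
    """Binary checkerboard with square cells `cell` pixels wide."""
    yy, xx = np.indices(shape)
    return ((yy // cell + xx // cell) % 2).astype(float)

# Superimpose a coarse grid and one at twice its spatial frequency (Figure 11A idea)
coarse = checkerboard((8, 8), cell=4)
fine = checkerboard((8, 8), cell=2)
dual = 0.5 * (coarse + fine)    # three brightness levels; both fundamentals are present
print(dual)

Separate tuned operators, one matched to each fundamental frequency, would then be applied to the resulting images to obtain the coarse and fine defocus measures.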
Still other grid patterns are shown in Figures 12A
and 12C, with their respective frequency responses in
Figures 12B and 12D respectively. The pattern of
Figure 12A has dual frequency response characteristics
similar to the pattern of Figure 11A, but using a
different pattern arrangement. The grid of Figure 12C
has different frequency responses in the x and y
coordinates.




Aperture Variation
The apparatus and method described thus far uses
two sensor images taken at different distances from the
imaging lens to generate different amounts of defocus
in the images. It is also possible to provide images
with different defocus by using different aperture
sizes or shapes for the two images, which are formed on
substantially the same image plane with respect to the
imaging lens. It is well recognized, for example, that
a small aperture opening a' will cause less defocus
effect in an image than a large aperture opening.
One approach to aperture variation is to use the
apparatus of Figure 2, eliminating beam splitter 30 and
sensor array 32. A first image of an object 22 is
formed on sensor array 34 with a first setting of
aperture 26 and a second image is sequentially formed
using a different setting of aperture 26. Preferably a
neutral density filter is used with the larger aperture
setting to compensate for the greater amount of light.
The variation in the defocus factor between the two
aperture settings can then be used to determine depth
of an image element by defocus.
Another approach, shown in Figure 13, provides a
beam splitter 57 which is external to a pair of sensor
units 60, 62. Units 60, 62 have identical sensor
arrays 64, 66 and lenses 68, 70. Unit 60 has a small
aperture opening 72 while unit 62 has a large aperture
opening 74 and a neutral density filter 76 to
compensate for the increased light from the larger
aperture. Alternately, in either a sequential or
simultaneous aperture based arrangement, two apertures
having similar transparent area size but different
shape, such as apertures 110 and 112, shown in Figure
14, can be used. The difference in aperture shape
changes the optical transfer function, i.e., depth of
focus, while maintaining the same image brightness.

Registration
While those skilled in the art will recognize that
calibration of the system 10 of Figure 2 can be
achieved by aligning the sensor arrays 32, 34 with the
image grid as projected onto a plane surface located at
the field focal plane of the sensor, it is also
possible to compensate for mis-alignment in the
computation of the defocus function. Misregistration
of an illumination pattern of the type shown in Figure
4B with respect to a sensor array is shown in Figure
15, wherein the illumination pattern is mis-registered
by an amount Δx and Δy from the normal phase offset
values φ_x = t_x/8 and φ_y = t_y/8 given by equation (37). In this
case the output of each sensor element in the
misaligned sensor will have an error factor of

    cos(2πΔx/t_x) · cos(2πΔy/t_y)

which will cause a depth map error.
It is possible to compensate for this alignment
error by applying an additional operator to the
convolved image, taking the sum of squared data of the
convolved image at four adjacent elements which
correspond to the phase shifts of (φ_x, φ_y) = (0, 0), (0,
π/2), (π/2, 0) and (π/2, π/2). This results in a power
measurement that can be directly applied to a power
look-up table for defocus, or can be modified by the
square root function before being applied to the
normalization or look-up table.
In the case of a one dimensional pattern (stripes)
it is only necessary to apply the above procedure to
two adjacent element points in the direction transverse
to the pattern stripes.
It is also possible to numerically construct two
tuned operators which produce focus measure data whose
phases differ by π/2 (sine and cosine). In the case of
the two dimensional pattern, it is likewise possible to
numerically construct four tuned operators which
produce focus measure data whose phases differ by (φ_x, φ_y)
= (0, 0), (0, π/2), (π/2, 0) and (π/2, π/2). These

convolved images can be combined to calculate the sum
of squares at positions corresponding to the image
elements to get a focus measure that is independent of
alignment phase in either one or two dimensional grid
patterns.
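The sum-of-squares compensation described above can be sketched as follows (Python with NumPy). The sketch assumes the Figure 4B registration, in which the four phase offsets correspond to the four samples of a 2x2 pixel neighbourhood of the convolved image; the helper name and the optional square-root step are illustrative choices.

import numpy as np

def phase_invariant_power(conv_img, take_sqrt=True):
    """Sum of squares of the convolved image over 2x2 neighbourhoods.

    The four adjacent samples correspond to the phase offsets
    (0, 0), (0, pi/2), (pi/2, 0) and (pi/2, pi/2), so the result is
    insensitive to mis-registration of the illumination grid.
    """
    sq = np.asarray(conv_img, dtype=float) ** 2
    power = sq[:-1, :-1] + sq[:-1, 1:] + sq[1:, :-1] + sq[1:, 1:]
    return np.sqrt(power) if take_sqrt else power

# Toy check: a sampled cosine product with an arbitrary registration error
y, x = np.indices((64, 64))
err = 0.7                                               # unknown phase error (radians)
g = np.cos(np.pi * x / 2 + err) * np.cos(np.pi * y / 2 + err)
p = phase_invariant_power(g)
print(float(p.std() / p.mean()))                        # ~0: measure independent of the error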

Concurrent Imaging
In some applications it is desirable to have both
a depth map and a brightness image of a scene. In this
respect the images used to compute depth from defocus
can be used to computationally reconstruct a normal
brightness image by removing the spatial frequencies
associated with the projected illumination. This can
be achieved using a simple convolution operation to
yield an image under ambient illumination. Further,
since the depth of each image point is known, a de-
blurring operation, which can also be implemented as a
convolution, can be applied to the brightness image to
obtain an image that has the highest degree of focus at
all points. In the case of coaxial illumination and
imaging, the computed focused brightness image is
registered with the computed depth map and may be
stored in a suitable memory. This enables the use of,
not only fast texture mapping, but also the joint
recovery of geometric and photometric scene properties
for visual processing, such as object recognition.
Three-dimensional texture maps may be displayed as
wireframes at frame rate on a bitmapped workstation.
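As a rough sketch of such a convolution (Python with NumPy): averaging over one full period of the projected pattern cancels its fundamental frequency. The four-pixel period assumed below corresponds to the Figure 4B registration; the toy scene and the choice of a box kernel are illustrative assumptions.

import numpy as np

def remove_illumination_pattern(img, period=4):
    """Cancel a projected grid of the given pixel period with a box average (valid region only)."""
    img = np.asarray(img, dtype=float)
    h = img.shape[0] - period + 1
    w = img.shape[1] - period + 1
    out = np.zeros((h, w))
    for dy in range(period):
        for dx in range(period):
            out += img[dy:dy + h, dx:dx + w]
    return out / period**2

# Toy scene: a smooth ambient ramp plus the fundamental of a period-4 projected grid
y, x = np.indices((64, 64))
ambient = 100.0 + 0.5 * x
pattern = 20.0 * np.cos(np.pi * x / 2) * np.cos(np.pi * y / 2)
restored = remove_illumination_pattern(ambient + pattern, period=4)

# The valid output is aligned with the window centres, offset by (period - 1) / 2 pixels
expected = 100.0 + 0.5 * (x[: restored.shape[0], : restored.shape[1]] + 1.5)
print(float(np.abs(restored - expected).max()))   # ~0: the projected pattern is removed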
Figure 16 shows an arrangement for separate
detection of brightness images in a television camera
80 and depth by sensor 25, which may be the embodiment
of Figure 2. In this arrangement various filtering or
sequencing techniques may be used to remove the effect
of the illumination pattern in the brightness image.
For example, beam splitter 82 may be formed as a
selective reflector allowing frequencies corresponding
to the illumination pattern to pass to depth sensor 25
and reflecting other light to camera 80.

Alternatively, filter 84 can be used to selectively
absorb the illumination frequency and pass other
frequencies, while filter 86 passes the illumination
frequency to depth sensor 25. Such filtering is
especially practical in the case of narrow band, e.g.
laser, pattern illumination.
An alternate to using a transmission grid screen,
as shown in Figures 4A and 4B, is to use a phase pattern
grid, wherein there is provided a checkerboard grid of
rectangular elements, with transmission phase shifted
by 90° in alternate grid elements as shown in Figure
17A. This "phase grid" provides a projected pattern of
alternating constructive and destructive interference
as shown in Figure 17B and results in a frequency
domain pattern, Figure 17C, that can be analyzed by
tuned filter convolution to provide alternate separate
frequency responses for defocus analysis as shown in
Figures 17D and 17E. The advantage of a phase shift
grid is that there is little loss of energy from the
grid illumination as compared to the transmission grid
pattern.
In connection with the provision of an illuminated
grid pattern, as noted above, a laser is the preferred
source for several reasons, including (1) narrow band
illumination, providing ease of filtering and absence
of chromatic aberration in detected images, (2) better
control of surfaces, including lens, filter and mirror
coatings for single frequency light, (3) polarized
light without loss of energy, and (4) bright and
controllable light source using low power.
Figure 18 is a flow diagram showing the
determination of the image depth map from the image
information received in sensor arrays 32 and 34. The
image sensor data is converted to digital format, and
then convolved in accordance with the methods described
herein, to result in a determination of the defocus
measures for each element of each image. Optionally,
registration correction, as described above, can be
performed in the process of arriving at the defocus
measures g_1 and g_2. The defocus measures are then
combined in a point-by-point manner to determine the
normalized relative blur of the two images and, using
computation or a look-up table, determine depth of the
object on a point-by-point basis, resulting in the
desired depth map.
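The flow of Figure 18 can be summarized in the following sketch (Python with NumPy), which strings together the steps described above; the convolution helper, the calibration table and the random test images are placeholders rather than part of the disclosed hardware implementation.

import numpy as np

def convolve3x3(img, kernel):
    """Convolve an image with a 3x3 kernel (edge-replicated borders)."""
    padded = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def depth_map_from_image_pair(i1, i2, kernel, q_table, depth_table):
    """Figure 18 flow: convolve both images with the tuned operator, combine the
    focus measures point by point, and look up depth from the normalized blur."""
    g1 = np.abs(convolve3x3(i1, kernel))            # focus measure of near-focus image
    g2 = np.abs(convolve3x3(i2, kernel))            # focus measure of far-focus image
    q = (g1 - g2) / (g1 + g2 + 1e-12)               # normalized relative blur, as in (44)
    return np.interp(q, q_table, depth_table)       # point-by-point look-up

# Tuned operator from the earlier section and a placeholder calibration table
c = 0.658
kernel = np.array([[c, -1.0, c],
                   [-1.0, 4 * (1 - c), -1.0],
                   [c, -1.0, c]])
q_table = np.linspace(-1.0, 1.0, 257)
depth_table = 300.0 + 150.0 * q_table               # placeholder monotone calibration

rng = np.random.default_rng(1)
i1, i2 = rng.random((240, 256)), rng.random((240, 256))
print(depth_map_from_image_pair(i1, i2, kernel, q_table, depth_table).shape)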
Further, while some embodiments of the invention
include simultaneous generation of images, depending
on the dynamics of the application it should be
recognized that the invention can be practiced with
sequentially formed images, wherein the image spacing,
lens position and/or aperture are varied between
images, but the object position remains constant.
While we have described what we believe to be the
preferred embodiments of the invention, those skilled
in the art will recognize that other and further
changes and modifications can be made thereto without
departing from the spirit of the invention, and it is
intended to claim all such changes as fall within the
true scope of the invention.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.


Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 1995-06-07
(87) PCT Publication Date 1996-12-19
(85) National Entry 1997-01-31
Examination Requested 2002-04-19
Dead Application 2006-06-07

Abandonment History

Abandonment Date Reason Reinstatement Date
2005-06-07 FAILURE TO PAY APPLICATION MAINTENANCE FEE
2005-10-04 R30(2) - Failure to Respond
2005-10-04 R29 - Failure to Respond

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $0.00 1997-01-31
Registration of a document - section 124 $0.00 1997-04-24
Maintenance Fee - Application - New Act 2 1997-06-09 $50.00 1997-05-30
Maintenance Fee - Application - New Act 3 1998-06-08 $50.00 1998-06-08
Maintenance Fee - Application - New Act 4 1999-06-07 $50.00 1999-06-03
Maintenance Fee - Application - New Act 5 2000-06-07 $75.00 2000-05-24
Maintenance Fee - Application - New Act 6 2001-06-07 $75.00 2001-05-18
Request for Examination $400.00 2002-04-19
Maintenance Fee - Application - New Act 7 2002-06-07 $150.00 2002-05-28
Maintenance Fee - Application - New Act 8 2003-06-09 $150.00 2003-05-27
Maintenance Fee - Application - New Act 9 2004-06-07 $200.00 2004-06-04
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
THE TRUSTEES OF COLUMBIA UNIVERSITY
Past Owners on Record
NAYAR, SHREE K.
NOGUCHI, MINORI
WANTANABE, MASAHIRO
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

List of published and non-published patent-specific documents on the CPD.


Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Cover Page 1995-06-07 1 14
Abstract 1995-06-07 1 32
Claims 1995-06-07 12 319
Representative Drawing 1997-06-11 1 5
Description 1995-06-07 40 1,291
Drawings 1995-06-07 15 197
Cover Page 1998-06-04 1 14
Fees 1997-05-30 1 38
Assignment 1997-01-31 10 416
PCT 1997-01-31 2 107
Prosecution-Amendment 2002-04-19 1 39
Prosecution-Amendment 2002-07-22 1 37
Fees 2003-05-27 1 36
Fees 2002-05-28 1 32
Fees 2000-05-24 1 35
Fees 1998-06-08 1 41
Fees 2001-05-18 1 33
Fees 1999-06-03 1 33
Fees 2004-06-04 1 35
Prosecution-Amendment 2005-04-04 3 81