APPARATUS AND METHOD FOR REAL-TIME VOLUME
PROCESSING AND UNIVERSAL 3D RENDERING
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates generally to three-dimensional (3D) graphics and volume visualization, and more particularly relates to an apparatus and method for real-time volume processing and universal three-dimensional rendering.
Description of the Prior Art
Computer rendering is the process of transforming complex information into a format which is comprehensible to human thought, while maintaining the integrity and accuracy of the information. Volumetric data, which consists of information relating to three-dimensional phenomena, is one species of complex information that can benefit from improved image rendering techniques. The process of presenting volumetric data, from a given viewpoint, is commonly referred to as volume rendering.
Volume visualization is a vital technology in the interpretation of the great amounts of volumetric data generated by acquisition devices (e.g., biomedical scanners), by supercomputer simulations, or by synthesizing geometric models using volume graphics techniques. Of particular importance for manipulation and display of volumetric objects are the interactive change of projection and rendering parameters, real-time display rates, and in many cases, the possibility to view changes of a dynamic dataset over time, called four-dimensional (4D) visualization (i.e., spatial-temporal), as in the emerging integrated acquisition visualization systems.
A volumetric dataset is commonly represented as a 3D grid of volume elements (voxels), often stored as a full 3D raster (i.e., volume buffer) of voxels. Volume rendering is one of the most common techniques for visualizing the 3D scalar field of a continuous object or phenomenon represented by voxels at the grid points of the volume dataset, and can be accomplished using two primary methods: object-order methods and image-order methods. Using an object-order approach, the contribution of each voxel to the screen pixels is calculated, and the combined contribution yields the final image. Using an image-order approach, sight rays are cast from screen pixels through the volume dataset, and contributions of voxels along these sight rays are used to evaluate the corresponding pixel values.
Over the past three decades, graphics systems have evolved twofold: from
primarily two-dimensional (2D) to 3D and 4D (space and time), and from vector
graphics to raster graphics, where the vector has been replaced by the polygon
as the
basic graphics primitive. This has led to the proliferation of polygon-based
geometry
engines, optimized to display millions of triangles per second. In such
systems,
however, triangle facets only approximate the shape of objects. Still, the 3D
polygon-based graphics market continues to boom, and has become one of the
hottest
arenas of the personal computer (PC) industry.
In response to emerging demands placed on traditional graphics systems, various techniques have been devised to handle and display discrete imagery in order
to enhance visual realism of the geometric model, as well as enhance or replace object shape and structure. Among these techniques are 2D texture and photo mapping, environment mapping, range images for image-based rendering, 2D mip-mapping, video streams, 3D volumes, 3D mip-mapping, 4D light fields and lumigraphs, and five-dimensional (5D) plenoptic functions. All these techniques require some sort of dimensionality-based interpolation (bilinear, trilinear, quadrilinear, etc.) between discrete pixels, texels, voxels, or n-oxels.
Special purpose computer architectures and methods for volume visualization are known in the art. Traditional methods of volume visualization typically operate by scanning through a volume dataset in a sequential manner in order to provide an accurate representation of an object. For example, Cube-4, an architecture developed by Dr. Arie Kaufman, Ingmar Bitter and Dr. Hanspeter Pfister, some of whom are also named inventors in the present application, is a special purpose scalable volume rendering architecture based on slice-parallel ray-casting. Cube-4 is capable of delivering true real-time ray-casting of high-resolution datasets (e.g., 1024³ 16-bit voxels at a 30 Hertz frame rate). However, Cube-4 cannot deliver such real-time performance for perspective projections. Presently, in known prior art rendering systems, the use of perspective projections either increases the rendering time or decreases the projected image quality. Additionally, prior architectures do not provide the ability to combine volumes and geometries into a single image.
Referring now to Figure 1, a conventional volume visualization system 1 is shown. As illustrated in Figure 1, the volume data is stored on a disk 2 and loaded into memory 4 before rendering. A Central Processing Unit (CPU) 6 then computes the volume rendered image from the data residing in memory 4. The final image is written to a frame buffer 8, which is typically embedded on a graphics card, for displaying on a monitor 9 or similar display device.
The present invention, therefore, is intended to provide a method and apparatus which significantly enhances the capabilities of known methods and apparatus to the extent that it can be considered a new generation of imaging data processing.
Other and further objects will be made known to the artisan as a result of the present disclosure, and it is intended to include all such objects which are realized as a result of the disclosed invention.
SUMMARY OF THE INVENTION
The present invention is tantamount to a departure from the prior art because of the all-encompassing new characteristics. An apparatus, in accordance with the present invention, for real-time volume processing and universal three-dimensional (3D) rendering includes one or more three-dimensional (3D) memory units; at least a first pixel bus; one or more rendering pipelines; one or more geometry busses; and a control unit. The apparatus is responsive to viewing and processing parameters which define a viewpoint, and the apparatus generates a 3D volume projection image from the viewpoint. The projected image includes a plurality of pixels.
The 3D memory units store a plurality of discrete voxels, each of the voxels having a location and voxel data associated therewith. The voxels together form a volume dataset, and the viewing and processing parameters define at least one face of the volume dataset as the base plane of the volume dataset as well as first and last processing slices of the volume dataset. The control unit initially designates the first processing slice as a current slice of sample points, and controls sweeping through subsequent slices of the volume dataset as current slices until the last processing slice is reached.
Each of the plurality of rendering pipelines is vertically coupled to both a corresponding one of the plurality of 3D memory units and the at least first pixel bus, and each of the rendering pipelines has global horizontal communication preferably with at most its two nearest neighbors. The rendering pipelines receive voxel data from the corresponding 3D memory units and generate a two-dimensional (2D) baseplane image aligned with a face of the volume dataset. The geometry I/O bus provides global horizontal communication between the plurality of rendering pipelines and a geometry engine, and the geometry I/O bus enables the rendering of geometric and volumetric objects together in a single image.
The apparatus and methods of the present invention surpass existing 3D volume visualization architectures and methods, not only in terms of enhanced performance, image rendering quality, flexibility and simplicity, but in terms of the ability to combine both volumes and surfaces (particularly translucent) in a single image. The present invention provides flexible, high quality, true real-time volume rendering from arbitrary viewing directions, control of rendering and projection parameters, and mechanisms for visualizing internal and surface structures of high-resolution datasets. It further supports a variety of volume rendering enhancements, including accurate perspective projection, multi-resolution volumes, multiple overlapping volumes, clipping, improved gradient calculation, depth cuing, haze, super-sampling, anisotropic datasets and rendering of large volumes.
The present invention is more than a mere volume rendering machine; it is a
high-performance interpolation engine, and as such, it provides hardware
support for
high-resolution volume rendering and acceleration of discrete imagery
operations that
are highly dependent on interpolation, including 2D and 3D texture mapping
(with
mip-mapping) and image-based rendering. Furthermore, the apparatus and methods
methods
of the present invention, coupled with a geometry engine, combine volumetric
and
geometric approaches, allowing users to efficiently model and render complex
scenes
containing traditional geometric primitives (e.g., polygonal facets), images
and
volumes together in a single image (defined as universal 3D rendering).
The apparatus of the present invention additionally provides enhanced system flexibility by including various global and local feedback connections, which adds the ability to reconfigure the pipeline stages to perform advanced imagery operations, such as image warping and multi-resolution volume processing. Furthermore, the present invention accomplishes these objectives in a cost-effective manner.
It is an advantage of the present invention to provide a method and apparatus
which, when coupled to a geometry engine, enables the mixing of geometries and
and
volumes together in a single image.
It is another advantage of the present invention to provide a method and apparatus for efficiently rendering multiple overlapping volumetric objects.
It is also an advantage of the present invention to provide an apparatus having the ability to reconfigure selective pipeline stages to perform advanced volume rendering functionalities.
It is a further advantage of the present invention to provide a method and
apparatus which supports enhanced imaging operations that are highly dependent
on
interpolation.
It is still a further advantage of the present invention to provide a method and apparatus which accelerate the processing of large volume datasets.
It is another advantage of the present invention to provide either higher-quality or faster perspective projections than existing special purpose hardware volume visualization systems.
It is still another advantage of the present invention to provide a method and apparatus which are readily capable of improving the quality of images with enhancements to gradient estimation and interpolation units.
It is yet another advantage of the present invention to provide an apparatus which is capable of implementing image warping in hardware.
It is yet a further advantage of the present invention to provide a method and apparatus which overcome the inherent disadvantages of known volume visualization systems.
These and other features and advantages of the present invention will become
apparent from the following detailed description of illustrative embodiments
thereof,
which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a conventional volume visualization system.
Figure 2 is a conceptual block diagram illustrating a universal three-
dimensional rendering system formed in accordance with one embodiment of the
present invention.
Figure 3 is a simplified block diagram of the Cube-5 unit of Figure 2
illustrating a preferred implementation of the present invention.
Figure 4 is a functional block diagram depicting an overview of the universal three-dimensional rendering architecture formed in accordance with one embodiment of the present invention.
Figure 5 is a functional block diagram illustrating a unit rendering pipeline formed in accordance with one embodiment of the present invention.
Figure 6A is a graphical representation showing how 32 bits of texel data is stored for a 2x2 neighborhood in a miniblock of 16-bit voxels, in accordance with a preferred method of the present invention.
Figure 6B depicts a tabular comparison of voxel storage and texel storage for the example of Figure 6A.
Figure 7 illustrates special parallel-preserving scanlines in the source and target images in accordance with a preferred forward image warping method of the present invention.
Figure 8 is a graphical representation illustrating a method for determining the parallel-preserving scanline direction in accordance with the preferred forward image warping method of the present invention.
Figure 9 is a two-dimensional graphical representation of an example illustrating pixel read templates in the source image for performing scanline processing, in accordance with the preferred forward image warping method of the present invention.
Figure 10 is a two-dimensional graphical representation of the example of Figure 9 illustrating a bilinear interpolation of samples for performing scanline processing, in accordance with the preferred forward image warping method of the present invention.
Figure 11 is a two-dimensional graphical representation of a linear interpolation on samples to obtain pixel values for performing target pixel correction, in accordance with a preferred method of the present invention.
Figure 12 is a graphical representation illustrating the calculation of an anisotropic filter footprint for performing antialiasing in accordance with a preferred forward image warping method of the present invention.
Figure 13 is a graphical representation illustrating the splatting of source pixels onto the target samples, in accordance with the preferred forward image warping method of the present invention.
Figure 14 depicts an example of a y-slice shear for performing three-dimensional rotation by two-dimensional slice shear decomposition, in accordance with one method of the present invention.
Figure 15 depicts an example of an x-beam shear for performing three-dimensional rotation by two-dimensional beam shear decomposition, in accordance with another method of the present invention.
Figure 16 depicts an example of an x-slice-y-beam shear for performing three-dimensional rotation by two-dimensional slice-beam shear decomposition, in accordance with still another method of the present invention.
Figure 17 depicts an example of a three-dimensional x-beam shear for performing three-dimensional rotation by three-dimensional beam shear decomposition, in accordance with yet another method of the present invention.
Figure 18A illustrates a conventional undersampling method for performing perspective projections.
Figure 18B illustrates a conventional oversampling method for performing perspective projections.
Figure 19 illustrates an adaptive perspective ray-casting method for performing perspective volumetric projections, in accordance with a preferred form of the present invention, wherein a view frustum is divided into regions based on exponentially increasing distance from a viewpoint.
Figure 20A is a graphical representation illustrating the splitting/merging of rays at exponential region boundaries, in accordance with the preferred adaptive perspective ray-casting method of the present invention.
Figure 20B is a graphical representation illustrating the effective filter weights for ray segments A, B and C of the adaptive perspective ray-casting method example of Figure 20A.
Figure 21 illustrates an example of the weights for a two-dimensional filter of size ±2 samples, in accordance with a preferred form of the present invention.
Figure 22 is a graphical representation illustrating an example of the adaptive perspective ray-casting method of the present invention, wherein a 7³ volume is three voxel units from the viewpoint.
Figure 23 is a pseudo-code representation of a preferred method for performing Exponential-Regions Perspective back-to-front projection of a volume, in accordance with one form of the present invention.
Figure 24 illustrates an example of the Exponential-Regions Perspective ray casting method of the present invention across two regions.
Figure 25 depicts an example of the preferred weights for performing a 3³ symmetric approximation of the x-component of a Sobel gradient filter, in accordance with one embodiment of the present invention.
Figure 26 is a graphical representation illustrating a method for mixing geometric objects and volumes in a single image in accordance with one form of the present invention.
Figure 27 is a graphical representation of a method for clipping triangles to thin slab boundaries in accordance with one form of the present invention.
Figure 28 is a graphical representation of a method for bucket sorting translucent polygons in accordance with a preferred form of the present invention.
Figure 29 is a graphical representation of a method, in accordance with one form of the present invention, for creating sheared viewing geometry by pre-warping the polygon footprints.
Figure 30 is a graphical representation of a Cube-5 pipeline, formed in accordance with one form of the present invention, illustrating an SRAM composite buffer included therein.
Figure 31 is a graphical representation of a conventional graphics accelerator, conceptually illustrating the interfaces between the texture memory, frame buffer and geometry pipeline.
Figure 32 is a graphical representation illustrating one embodiment of the present invention employing a dual-use DRAM frame buffer connecting a geometry pipeline with the Cube-5 volume rendering pipeline of the present invention.
Figure 33 is a block diagram illustrating memory interfaces for each Cube-5 pipeline including a coxel FIFO queue, in accordance with one form of the present invention.
Figure 34 is a graphical representation of an RGBα coxel layout onto eight DRAM chips, formed in accordance with a preferred embodiment of the present invention.
Figure 35 is a partial block diagram representation of an embedded DRAM chip implementation of run-length encoding (RLE) frame buffer hardware, formed in accordance with one form of the present invention.
Figure 36 is a pseudo-code representation showing processing occurring in the RLE hardware of Figure 35, in accordance with one form of the present invention.
Figure 37 is a graphical representation of a preferred embodiment of the present invention illustrating an RLE frame buffer connecting a geometry pipeline to an SRAM compositing buffer included in the Cube-5 pipeline.
Figure 38 illustrates a density profile of an oriented box filter taken along a line from the center of a solid primitive outward, perpendicular to the surface, in accordance with one form of the present invention.
Figure 39 illustrates a density profile of an oriented box filter taken along a line perpendicular to a triangle surface primitive, in accordance with another form of the present invention.
Figure 40 depicts a two-dimensional illustration of seven voxelization regions
for a triangle primitive, in accordance with a preferred embodiment of the
present
invention.
Figure 41 is a pseudo-code representation of a method for computing the distance from a plane, in accordance with one form of the present invention.
Figure 42 is a block diagram representation illustrating an overview of a
hardware voxelization pipeline, formed in accordance with one embodiment of
the
present invention.
Figure 43 is a block diagram depicting a distance unit which incrementally
computes the distance of a current voxel from a desired plane, in accordance
with one
form of the present invention.
Figure 44 is a top view graphical representation illustrating a preferred
method
for performing image-based rendering in accordance with one form of the
present
invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The apparatus and methods of the present invention are capable of processing
data and supporting real-time visualization of high resolution voxel-based
data sets.
The present invention is a universal three-dimensional (3D) rendering system
delivering enhanced volume rendering in addition to the integration of imagery
(e.g.,
volumes, textures and images) with geometry (e.g., polygons). The apparatus
and
methods are designed for use as a voxel-based system as described in the
issued
patents and pending applications of Dr. Arie Kaufman, a named inventor in
this
application, including "Method of Converting Continuous Three-Dimensional
Geometrical Representations Into Discrete Three-Dimensional Voxel-Based
Representations Within a Three-Dimensional Voxel-Based System", which issued
on
August 6, 1991, as U.S. Patent No. 5,038,302; "Method of Converting Continuous
Three-Dimensional Geometrical Representations of Polygonal Objects Into
Discrete
Three-Dimensional Voxel-Based Representations Thereof Within a Three-
Dimensional Voxel-Based System", which issued on January 22, 1991, as U.S.
Patent
No. 4,987,554; "Method and Apparatus for Storing, Accessing, and Processing
Voxel-Based Data", which issued on January 15, 1991, as U.S. Patent No.
4,985,856;
"Method and Apparatus for Generating Arbitrary Projections of Three-
Dimensional
Voxel-Based Data", which issued on March 31, 1992 as U.S. Patent No.
5,101,475;
"Method and Apparatus for Real-Time Volume Rendering From An Arbitrary
Viewing Direction", which issued on August 6, 1996, as U.S. Patent No.
5,544,283;
"Method and Apparatus For Generating Realistic tmages Using a Discrete
Representation", which issued on August 15, 1995, as U.S. Patent No.
5,442,733; and
"Apparatus and Method for Parallel and Perspective Real-Time Volume
Visualization", which issued on December 8, 1998, as U.S. Patent No.
5,847,711.
Figure 2 illustrates a conceptual view of a universal 3D rendering system 10
formed in accordance with one embodiment of the present invention.
Applications 12
which display collections of renderable objects are preferably split by an
Applications
Program Interface (API) 14 into appropriate imagery and geometry
representations.
These representations are subsequently processed by an imagery unit 16 and a
geometry unit 18, respectively, which are illustrated generally as functional
blocks.
The imagery unit 16 preferably includes a plurality of imagery pipelines and
the
geometry unit 18 preferably includes a plurality of geometry pipelines (not
shown) for
rendering the imagery and geometry representations, respectively. The rendered
outputs of the imagery unit 16 and the geometry unit 18 are subsequently
combined in
a blending unit 20 to generate a single baseplane image. This baseplane image
may
preferably be transformed by a warp unit 22 to a final projection plane for
display.
Figure 3 illustrates one implementation of the Cube-5 volume visualization
system of the present invention. As shown in Figure 3, the system preferably
includes
one or more three-dimensional memory units 24, with each 3D memory unit 24
vertically coupled to an input bus 26 and a corresponding Cube-5 chip 28. A
plurality
of Cube-5 chips 28 are shown connected to a frame buffer pixel bus 34.
Furthermore
CA 02337530 2001-01-11
W 00/04505 PC'I'/US99/1603$
the system 10 of the present invention preferably interfaces to at least one
conventional geometry engine 30 and a host computer 32, both operatively
coupled
between the input bus 26 and the frame buffer pixel bus 34 for communicating
with
the Cube-5 apparatus of the present invention.
Referring now to Figure 4, the apparatus of the present invention 10 includes a plurality of 3D memory units 24 which are preferably connected to an imagery input bus 26, providing global horizontal communication between the 3D memory units 24. The volume dataset is commonly represented as a regular grid of volume elements, or voxels, often stored as a full 3D raster (i.e., volume buffer). This volume dataset is preferably distributed across the 3D memory units 24. With a skewed distribution, the present invention allows conflict-free access to complete beams (i.e., rows) of voxels parallel to any of the major axes, thereby reducing the memory-processor bandwidth bottleneck. As illustrated in Figure 4, for streaming video or four-dimensional (4D) volume data through the system 10, each 3D memory unit 24 is preferably connected to a dedicated real-time input 36. By providing a dedicated connection to a real-time input source, the memory-processor bandwidth bottleneck is further reduced.
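By way of illustration, a minimal sketch of one plausible skewing function follows; the modulo-sum form and the unit count N are assumptions made for illustration, not mandated by the architecture:

    /* Hypothetical skewed voxel-to-memory-unit mapping (illustrative only).
     * With N memory units and skew index (x + y + z) mod N, the N voxels of
     * any beam parallel to a major axis fall in N distinct units, so a whole
     * beam can be fetched without memory conflicts. */
    int memory_unit(int x, int y, int z, int N) {
        return (x + y + z) % N;
    }

Along a beam in x (y and z fixed), consecutive voxels cycle through all N units, which is what makes the conflict-free beam fetch possible.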
The universal 3D rendering system 10 of the present invention further includes a plurality of rendering pipelines, shown as functional blocks of Cube-5 units 38 in Figure 4. Each rendering pipeline 38 is connected to a corresponding 3D memory unit 24 and preferably has horizontal communication with at least preferably its two nearest neighbors. The Cube-5 units 38 read from their dedicated 3D memory units 24 and produce a two-dimensional (2D) baseplane image. This baseplane image, which contains a plurality of composited pixels generated by the Cube-5 units 38, is preferably distributed across a plurality of two-dimensional (2D) memory units 40. Each of the plurality of 2D memory units 40 is preferably connected to both a corresponding Cube-5 pipeline unit 38 and a baseplane pixel bus 42 which provides global horizontal communication between 2D memory units 40.
Preferably, the present invention includes a plurality of warp units 44 connected to the baseplane pixel bus 42. The warp units 44 assemble and transform (i.e., warp) the baseplane image stored in the plurality of 2D memory units 40 onto a user-defined image plane. Although the present invention contemplates using a single warp unit 44 (e.g., in order to reduce the costs or overhead of the hardware), the use of a plurality of warp units 44 is desirable to accelerate image transformations.
The output of each of the warp units 44 is preferably connected to a frame buffer pixel bus 34 which provides global horizontal communication between warp units 44. Reading the source pixels over the baseplane pixel bus 42 and writing the final image pixels over the frame buffer pixel bus 34 preferably happens concurrently in order to allow greater system throughput. Although not a preferred architecture, the present invention also contemplates sequential reading and writing by the warp units 44. In this manner, only one pixel bus may be required, assuming the one pixel bus offers sufficient bandwidth for real-time image transfer for a full screen image.
With continued reference to Figure 4, the present invention preferably includes a geometry input bus 46 and a geometry output bus 48, although it is contemplated to combine the two busses into a single geometry input/output bus of sufficient bandwidth for real-time imaging. The geometry input and output busses 46 and 48 are preferably connected to the inputs and outputs of the Cube-5 units 38 respectively and provide for the unique coupling of at least one geometry pipeline or engine (not shown) to the present system 10. The architecture of the present invention, coupled with a geometry engine via the geometry busses 46 and 48, supports the integration of imagery, such as volumes and textures, with geometries, such as polygons and surfaces. This mixing of geometric data with volumetric objects is a powerful feature which is unique to the present invention.
Referring now to Figure 5, there is illustrated a block diagram depicting the functional stages of one of the plurality of Cube-5 rendering pipelines (reference number 38 in Figure 4), formed in accordance with one embodiment of the present invention. As shown in Figure 5, each rendering pipeline 52 preferably includes four types of processing units: a trilinear interpolation unit (TriLin) 54, a gradient estimation unit (Gradient) 56, a shading unit (Shader) 58 and a compositing unit (Compos) 60. Each of these rendering pipeline stages is described in detail in the prior issued patents and pending applications of Arie Kaufman relating to prior Cube volume visualization architectures (listed above) and are therefore only briefly discussed herein below.
As discussed above in reference to the 3D memory units 24, the volume dataset is stored as a regular grid of voxels distributed across the 3D memory units 24 in a skewed fashion, with each Cube-5 unit 38 connected to a corresponding 3D memory unit 24 (see Figure 4). Voxels of the same skewed beam are preferably fetched and processed in parallel, distributed across all Cube-5 units 38. Consecutive slices of the volume dataset parallel to a predefined baseplane (i.e., parallel to a face of the volume dataset which is most perpendicular to a predefined view direction) are preferably traversed in scanline order. Referring again to Figure 5, an address generation and control unit 62 preferably generates the addresses for access into the 3D memory unit 24. The address generation and control unit 62 additionally designates a first processing slice as the current processing slice and controls sweeping through subsequent slices of the volume dataset until the final slice has been processed.
The trilinear interpolation unit 54 computes a new slice of interpolated sample values between two processing slices. It is contemplated by the present invention that the trilinear interpolation function may alternatively be performed as a sequence of linear or bilinear interpolations.
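To make that decomposition concrete, the following minimal sketch (array layout and names assumed for illustration) evaluates one trilinear sample as a cascade of seven linear interpolations, four in x, two in y and one in z:

    /* Illustrative only: trilinear sample from the eight corner values of a
     * voxel cell, decomposed into seven linear interpolations. */
    static float lerp(float a, float b, float t) { return a + t * (b - a); }

    float trilinear(const float c[2][2][2], float tx, float ty, float tz) {
        float x00 = lerp(c[0][0][0], c[1][0][0], tx);
        float x10 = lerp(c[0][1][0], c[1][1][0], tx);
        float x01 = lerp(c[0][0][1], c[1][0][1], tx);
        float x11 = lerp(c[0][1][1], c[1][1][1], tx);
        float y0  = lerp(x00, x10, ty);   /* collapse y at z = 0 */
        float y1  = lerp(x01, x11, ty);   /* collapse y at z = 1 */
        return lerp(y0, y1, tz);          /* final interpolation in z */
    }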
The gradient estimation unit 56 preferably computes central difference gradients using volume data from multiple slices of the volume dataset. Utilizing the central difference gradients generated by the gradient estimation unit 56, sample points of the current processing slice are subsequently shaded by the shading unit 58. The shading unit 58 preferably uses the samples and gradients as indices into one or more look-up tables (LUTs), preferably residing in each shading unit 58, which store material color and intensity information. The material color table is dataset-type dependent, while the color intensity table is based on a local illumination model, as known by those skilled in the art. In simple terms, the multiplication of color and intensity yields a pixel color for each sample which is used in the compositing unit 60 to composite such color with the previously accumulated pixels along each sight ray.
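For orientation, a minimal sketch of the familiar front-to-back compositing recurrence is given below; the float representation and names are assumptions, and the actual unit operates on fixed-point pipeline registers:

    /* Illustrative only: front-to-back compositing of one shaded sample
     * (straight color plus alpha) into the accumulated, alpha-premultiplied
     * ray value. Accumulation can stop early once acc->a approaches 1. */
    typedef struct { float r, g, b, a; } RGBA;

    void composite(RGBA *acc, RGBA sample) {
        float t = 1.0f - acc->a;      /* transparency remaining on the ray */
        acc->r += t * sample.a * sample.r;
        acc->g += t * sample.a * sample.g;
        acc->b += t * sample.a * sample.b;
        acc->a += t * sample.a;
    }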
With reference again to Figure 4, data for computing the next sample along a
continuous sight ray may reside on a neighboring Cube-5 unit 38. In this case,
the
nearest-neighbor connections between Cube-5 units 38 are preferably used to
transfer
the necessary data to the appropriate Cube-5 unit 38, which will continue to
process
that particular sight ray. When compositing has been completed, the composited
pixels (i.e., baseplane pixels) are preferably stored in the corresponding 2D
memory
unit 40 connected to the Cube-5 unit pipeline 38. The baseplane pixels, which
form
the baseplane image, are subsequently read from the 2D memory units 40, via
the
baseplane pixel bus 42, and assembled by the warp units 44. The warp units 44
additionally transform the baseplane image to the final projection plane
image.
Referring to Figure 5, the delay of data required for the trilinear interpolation unit 54 and gradient estimation unit 56 is preferably achieved by inserting one or more first-in-first-out (FIFO) units 64 into the pipeline data path prior to being processed by the trilinear interpolation 54 and the gradient estimation 56 units. The FIFO unit(s) 64 may be implemented as, for example, random access memory (RAM), preferably embedded on the Cube-5 chip. The introduction of a predetermined delay may be particularly important when simultaneously processing beams of multiple slices of the volume dataset, thereby requiring more computation time between slices.
A compositing buffer (Compos Buffer) 74 operatively coupled to a bilinear interpolation unit (BiLin) 72 essentially provides a one-slice FIFO. The bilinear interpolation unit 72 preferably interpolates to obtain values between voxels as needed for texture mapping. For volume rendering, BiLin 72 preferably uses only weights of 0.0 or 1.0, which selects one of the corner voxels of the volume dataset (determined by Select x and Select y); it simply moves the ray data if the ray crosses pipelines. A simple multiplexer for x and y would be enough for volume rendering, but bilinear interpolation is preferred because of texture mapping.
The Cube-5 architecture preferably supports re-ordering of the pipeline stages and a number of multipass rendering and processing operations, which require feedback connections between various stages of the Cube-5 rendering pipelines 52 and the 3D memory units 24. For example, correct rendering of overlapping volumetric objects preferably requires at least two passes through the Cube-5 pipeline 52, where the first pass re-samples the volumetric objects to align them with each other and the second pass renders the volumetric objects in interleaved order. As shown in Figure 5, a multiple volumes feedback path 66 is preferably provided, operatively connecting the output of the compositing unit 60 to the corresponding 3D memory unit 24, which allows the re-sampled volumes to be written back into the 3D memory unit 24 after re-sampling, classification and shading. The final rendering pass works on RGBα volumes.
Similarly, each Cube-5 rendering pipeline 52 preferably includes an image-based rendering feedback path 68 connected between the warp unit 44 and the 3D memory unit 24. The image-based rendering feedback line 68 preferably provides a feedback path for writing the intermediate warped images to the 3D memory unit 24. This may be particularly useful for accelerating certain image-based rendering operations requiring multiple warp passes. The architecture of the present invention further contemplates feedback connections between the 3D memory unit 24 and various other Cube-5 rendering pipeline stages, or between the individual pipeline stages themselves. Image rendering speed may be substantially increased by including feedback paths which provide direct and immediate access to the computational results of individual pipeline stages, without having to wait for the results to traverse through the entire Cube-5 rendering pipeline 52.
In a preferred embodiment of the present invention, the Cube-5 system includes connections which bypass selective stages of the rendering pipeline that, for example, may not be required for certain imaging operations. By bypassing these unused pipeline stages, such imaging operations can be accelerated. As illustrated in Figure 5, a texture map bypass 70 is preferably included in each Cube-5 rendering pipeline 52. This texture map bypass connection 70 substantially speeds up mip-mapping, for instance, which consists of storing multiple levels-of-detail (LOD) of the image to be processed, by bypassing the shading unit 58 and compositing unit 60 and directly presenting the results from the trilinear interpolation unit 54 and gradient estimation unit 56 to the bilinear interpolation unit 72. In this way, the architecture of the present invention can preferably be considered not only as an array of pipelines for performing volume rendering, but as a collection of hardware resources which can be selectively configured to perform a variety of imaging operations. For example, when the Cube-5 system of the present invention is performing volume rendering, essentially all of the hardware resources are required, while texture mapping generally requires only memory, some buffering and the interpolation units.
Another unique and important aspect of the present invention which will now be discussed is the ability of the Cube-5 architecture to preferably interface with at least one conventional geometry engine 76 to support mixing of geometric data and volumetric objects in a single image. This is preferably accomplished by providing at least one geometry bus, as discussed above, to interface with the geometry engine 76.
Preferably, the Cube-5 architecture of the present invention is adapted to re-use pipeline components (e.g., interpolation unit, etc.), wherever possible, to accelerate a variety of rendering algorithms using multiple configurations, in particular, rendering scenes of multiple volumetric and polygonal objects, texture mapping, and image-based rendering. Among other important advantages, reusing pipeline components reduces hardware costs. The Cube-5 architecture also supports various unique methods and algorithms for enhancing volume rendering and acceleration of other imaging operations. Some of these methods and algorithms will be discussed individually in greater detail below.
In a preferred embodiment of the Cube-5 system, formed in accordance with the present invention, volume datasets are stored in blocks, thereby taking advantage of spatial locality. Instead of linear blocking (e.g., Voxelator API), hierarchical blocks are used which are preferably stored in a distributed arrangement, skewed across multiple 3D memory units. For example, using current Mitsubishi Electric 16-bit, 125 megahertz synchronous dynamic random access memory (SDRAM) to implement the 3D memory, each block can contain 8³ 16-bit voxels (i.e., 512 voxels at 2 bytes each), requiring 1024 bytes or two SDRAM pages.
Each block is preferably organized as a collection of 2³-voxel miniblocks residing in the same 3D memory unit. The banks inside the SDRAM can preferably be accessed in a pipelined fashion such that the current burst transfer essentially completely hides the setup of the subsequent burst transfer. If the view-dependent processing order of the voxels in a miniblock does not coincide with their storage order, then the eight miniblock voxels are preferably reordered on the Cube-5 chip. Hence, a single copy of the volume dataset on the SDRAM is sufficient. Therefore, hierarchical blocking allows random access to miniblocks at essentially full burst mode speed, essentially full (100%) bandwidth utilization, view-independent data storage and balanced workload.
Blocking not only optimizes the memory interface, but has an additional advantage of reducing the inter-chip communication bandwidth (i.e., between Cube-5 hardware units), since only the voxels on the block perimeters need to be exchanged between neighboring chips processing neighboring blocks. While processing a b³-voxel block in O(b³) time, only the O(b²) voxels on the block boundary need to be communicated between chips processing neighboring blocks, where b is the size of a block edge and each block has b x b x b (i.e., b³) voxels. Therefore, inter-chip communication needs O(1/b) less bandwidth than with a non-blocking solution. The size of the block edge b can be in the range of about 4 <= b <= 64, although a block edge size of eight (8) is preferred.
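To make the two-level hierarchy concrete, here is a minimal sketch of one plausible address decomposition for the preferred sizes (b = 8 blocks built from 2³ miniblocks); the bit-field split is an assumption made for illustration:

    /* Illustrative only: split one voxel coordinate into block, miniblock
     * and intra-miniblock indices for b = 8 and 2-voxel miniblocks.
     * The same decomposition is applied independently to x, y and z. */
    typedef struct { int block, mini, voxel; } AxisAddr;

    AxisAddr decompose(int x) {
        AxisAddr a;
        a.voxel = x & 1;         /* position within the 2x2x2 miniblock */
        a.mini  = (x >> 1) & 3;  /* miniblock within the 8x8x8 block    */
        a.block = x >> 3;        /* block index along this axis         */
        return a;
    }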
Block look-up tables (LUTs) are preferably utilized to store the pointers to all blocks comprising the current volume. This approach provides an easy method to restrict the active volume while zooming into a selected region of interest of a large volume. It also allows rendering of arbitrarily shaped sub-volumes (at block-sized granularity). Additionally, scenes containing many small volumes can be rendered very efficiently, as all volumes can reside anywhere among the 3D memory units, and only the look-up tables must be reloaded for each volume, rather than the 3D memory units.
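One plausible shape for such a table is sketched below; the field names and indexing are hypothetical, chosen only to illustrate that selecting a region of interest or a small sub-volume amounts to editing table entries rather than moving voxel data:

    /* Illustrative only: per-block pointer table for the current volume. */
    typedef struct {
        unsigned unit;    /* which 3D memory unit holds the block   */
        unsigned offset;  /* block start address within that unit   */
        int      valid;   /* 0 = block is outside the active volume */
    } BlockEntry;

    /* nx * ny * nz entries, one per block of the (sub-)volume. */
    BlockEntry *lookup_block(BlockEntry *lut, int bx, int by, int bz,
                             int nx, int ny) {
        return &lut[(bz * ny + by) * nx + bx];
    }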
One method of performing perspective projection and/or Level-of-Detail (LOD) relies on two-fold super-sampling in the x and y directions. Accordingly, a four-times (4x) replication of the interpolation units for trilinear interpolation, as well as the gradient estimation units for gradient computation, is preferably employed. As a result, the datapath between the SDRAM and the Cube-5 pipelines is essentially unchanged. However, the bandwidth between Cube-5 pipelines is quadrupled, as is the on-chip throughput and buffers, primarily because each sample of the normal mode is replaced by up to four samples (i.e., 2x in the x direction and 2x in the y direction).
Handling anisotropic datasets and super-sampling preferably requires a modification of opacity α. The combined function is α' = 1 - (1 - α)^(d/k), with super-sampling factor k representing the number of samples per voxel cell, and d representing the distance which a sight ray travels (i.e., the length of the path of the sight ray) through each voxel cell. Preferably, a look-up table (LUT) is employed for fast look-up of α' during rendering.
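A minimal sketch of such a table follows; the resolutions, the quantization of α and d, and the fixed factor k are assumptions made only for illustration:

    #include <math.h>

    /* Illustrative only: precompute alpha' = 1 - (1 - alpha)^(d/k) over a
     * quantized grid of stored opacities and per-cell path lengths. */
    #define N_ALPHA 256
    #define N_DIST   64

    static float opacity_lut[N_ALPHA][N_DIST];

    void build_opacity_lut(float k, float max_dist) {
        for (int i = 0; i < N_ALPHA; i++) {
            float alpha = (float)i / (N_ALPHA - 1);
            for (int j = 0; j < N_DIST; j++) {
                float d = max_dist * (float)(j + 1) / N_DIST;
                opacity_lut[i][j] = 1.0f - powf(1.0f - alpha, d / k);
            }
        }
    }

At render time a sample's stored opacity and its path length through the cell index the table directly, replacing a per-sample power evaluation with a single read.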
With continued reference to Figure 5, the perspective rendering of volumetric data with close to uniform sampling of the underlying volume dataset requires re-scaling of the compositing buffer 74 with filtering between levels. Level-of-detail (LOD) perspective rendering requires re-alignment of the compositing buffer 74 between levels. Both of these processes, which incorporate global communication not available in the pipelines 52, are preferably performed by the warp unit(s) 44. Although the compositing buffer 74 is already accessible to the warp units 44, it is preferred that a feedback line 43 be used to write the filtered values back into the compositing buffer 74.
A hardware warp unit is generally necessary to obtain final full screen images in real time (i.e., a 30 Hertz frame rate). As shown in Figure 5, the baseplane image, generated by the compositing units 60 of the Cube-5 rendering pipelines 52, is preferably buffered in the 2D memory units 40. To lower the memory bandwidth from the 2D memory units 40 to the warp unit 44, each pixel of the baseplane image is preferably accessed only once. To perform a linear interpolation between samples of the current and the previous scanline, another FIFO unit, sized to hold at least one scanline, is required to store the previous scanline samples. The interpolation weights for each grid pixel are preferably pre-calculated on a host machine.
In order to perform the accurate mixing of volumes and geometry, for opaque geometric objects, the Z-buffer image is preferably written to the compositing buffer. The compositing unit 60 must perform a z-comparison prior to blending each new sample. Additionally, for translucent geometric slices, the geometry engine 76 preferably utilizes the geometry input bus (reference number 46 in Figure 4) of the present invention to insert each slab of RGBα values into the data stream so that each slab is interleaved with the volumetric data slices.
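A minimal sketch of the z-test implied for the opaque case, reusing the RGBA type and composite() from the earlier compositing sketch (names and the depth convention are assumptions):

    /* Illustrative only: blend a volume sample into the ray only while it
     * lies in front of the opaque geometry depth recorded for this pixel. */
    void composite_with_geometry(RGBA *acc, RGBA sample,
                                 float sample_z, float geom_z) {
        if (sample_z < geom_z) {      /* sample is in front of geometry */
            composite(acc, sample);   /* usual front-to-back blend      */
        }
        /* samples at or behind the opaque surface are occluded: skipped */
    }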
For texture mapping, Figure 6A shows, by way of example, how 32 bits of texel data are preferably stored for a 2x2 neighborhood in a miniblock of 16-bit voxels in the 3D memory unit, in accordance with the present invention. Therefore, a four-texel neighborhood of 32-bit texels is preferably read during each memory burst read. Without data duplication, the Cube-5 system preferably performs, on average, 2.25 data burst reads to access the appropriate texel neighborhood, since some texture coordinates may lie between stored miniblocks (a 2x2 neighborhood falls entirely within one miniblock with probability 1/4, spans two miniblocks with probability 1/2, and spans four with probability 1/4, for an expected 1(1/4) + 2(1/2) + 4(1/4) = 2.25 reads).
With reference again to Figure 5, in accordance with one form of the present invention, one way to implement image-based rendering in hardware is to utilize the memory control unit 78, preferably included in each Cube-5 pipeline 52, to read the appropriate source pixels based on the contributing region for each pipeline. The interpolation units (e.g., 54 and 72) in that pipeline 52 will then preferably perform the four-dimensional (4D) interpolations needed for light field rendering or lumigraph. As an alternative implementation, the warp unit 44 may be utilized to perform this function. The source pixels contributing to the current view are read and
assembled into the 2D memory units 40, preferably through a connection line 41, followed by the warp transformation. Preferably, four assembled source images are processed in four consecutive warp passes. The final combination of the four intermediate warped images is performed in the Cube-5 pipeline 52. As described previously above, the image-based rendering feedback line 68 provides feedback for writing the intermediate warped images to the 3D memory 24. For either approach, the 3D memory units 24 provide local storage for a large database of images.
It is to be appreciated that the apparatus of the present invention described herein above (and referred to as Cube-5) may considerably accelerate conventional volume processing methods, beyond the universal rendering already described. Additionally, the Cube-5 apparatus of the present invention may be used in conjunction with a number of unique algorithms adapted for enhancing the performance of and/or providing enhanced features for real-time volume processing, therefore making the overall Cube-5 system superior to existing volume rendering architectures, such as Cube-4. Some of these unique algorithms, including those for performing image warping, three-dimensional transformations, perspective projections, handling large volumes, high quality rendering, clipping, depth cueing, super-sampling and anisotropic datasets, are discussed in detail below.
In accordance with one form of the present invention, a method for performing image warping is presented which, among other advantages, speeds perspective warping and provides improved image quality. Image warping is preferably the final stage of the Cube-5 volume rendering pipeline. In simple terms, image warping primarily relates to the geometric transformation between two images, namely, a source image and a target image. The geometric transformation defines the relationship between source pixels and target pixels. Efficiency and high quality are equally critical issues in such applications. In the apparatus of the present invention, the warp unit preferably performs the image transformation function. Consequently, applications employing a warp unit benefit from the image warping method of the present invention.
Distinguished by the data flow of the transformation, image warping methods are generally classified as either forward warping or backward warping. In forward warping, the source pixels are processed in scanline order and the results are projected onto the target image. In backward warping, the target pixels in raster order are inversely mapped to the source image and sampled accordingly. Most known prior art warping algorithms employ backward warping.
Compared with affine transformations (i.e., translation, rotation, scaling, shearing, etc.), a perspective transformation is considered to be more expensive and challenging. For perspective projection, an expensive division is needed when calculating the sample location in the baseplane image for a pixel in the projection plane. Conventional perspective warping is typically at least three-fold slower than parallel warping, when implemented by a CPU. Accordingly, some prior art approaches have decomposed the perspective transformation into several simpler transformations requiring multiple passes. One primary problem inherent in multi-pass transformation algorithms, however, is that the combination of two one-dimensional (1D) filtering operations is not as flexible as true two-dimensional (2D) filtering. Furthermore, conventional multi-pass approaches introduce additional filtering operations which degrade image quality.
The present invention preferably employs a unique single-pass forward warping method which can be implemented with substantially the same efficiency as affine transformations. Costly divisions, which were traditionally performed for every pixel, are reduced to only twice per scanline according to the present invention. Thus, by reducing the number of division operations, the present invention provides an alternative perspective warping method which is superior to known prior art methods, at least, for example, in terms of speed and efficient hardware implementation. A preferred method for perspective warping, in accordance with the present invention, will now be discussed.
Preferably, the present invention uses a scanline approach to perform perspective warping. Rather than scanning in normal raster scanline order, however, the algorithm of the present invention is processed in a special scanline direction in the source image. As illustrated in Figures 7 and 8, this special scanline direction 92 (Figure 8) preferably has the property that parallel scanlines 84 in the source image 80 appear as parallel scanlines 86 in the target image 82, and that equi-distant sample points 88 along a source scanline 84 remain as equi-distant sample points 90 in the target scanline 86. Some advantages of this unique approach include a reduced complexity of perspective-correct image warping (i.e., by eliminating the division per pixel and replacing it with two divisions per scanline), accurate antialiasing by incorporating anisotropic filtering, correction of flaws in Gouraud shading caused by bilinear interpolation and optimization of the memory bandwidth by reading each source pixel exactly once.
The intuition of the special scanline direction is derived from projection geometry, as shown in Figure 8. Referring to Figure 8, the source image 80 is preferably placed on a three-dimensional (3D) surface and the target image 82 is placed on a screen. As in typical texture mapping, to obtain the pixel on screen, a sight ray (or rays) 94 is cast from a viewpoint (or eye point) 96 to 3D space and intersected with the screen 82 and 3D surface 80. The intersection points are the sample points 98. When the scan direction 92 in screen space is parallel to the 3D planar surface, the scanlines in both images are parallel to each other, and equi-distant sample points 98 along the scanline remain equi-distant in the 3D surface plane. This parallel-preserving (PP) scanline direction exists and is unique for a given perspective transformation. It is to be appreciated that for parallel projections, any scan direction preserves this parallelism on both images, and thus a raster scanline direction may be preferably used due to its simplicity.
Referring again to Figure 7, parallel-preserving (PP) scanlines 84 and 86 are shown in both the source 80 and target 82 images, respectively. Once the parallelism property is achieved, pixel access becomes regular, and spatial coherency can be utilized in both images. Additionally, the PP scanline enables the application of a pure incremental algorithm without division to each scanline for calculating the projection of source samples 88. One division is still needed, however, for the two endpoints of every scanline due to the non-linear projection.
With continued reference to Figure 7, as the source image 80 is scanned in the PP scanline direction rather than the raster direction, sample points 90 on the target scanline 86 may not necessarily coincide with the target pixels 91. However, the sample points 90 can be aligned on the x grid lines 89 of the target image 82; thus the sample points 90 are only off the y grid lines 87 (they are equi-distant along the scanline). For a more efficient but lower-quality implementation, placing the sample value in the nearest-neighbor target pixel is a reasonable approximation, as a half pixel is the maximum error. However, when higher quality is preferred, the present invention may perform pixel correction and effective antialiasing, to be described herein below.
In general, a reduction in the number of divisions from O(n²) to O(n) is obtained by the algorithm of the present invention (where n is the linear resolution). For the present algorithm, preferably only two additions are needed to calculate each sample point, while conventional raster scanline algorithms generally require three additions, one division and two multiplications per pixel. A preferred method for performing forward image warping, in accordance with the present invention, is described in detail herein below.
The forward warping algorithm of the present invention is preferably performed in two stages: (1) calculating the special parallel-preserving (PP) scanline direction, and (2) forward mapping the source image to the target image along the special PP scanlines, incrementally within each scanline.
As discussed briefly above, the parallel-preserving (PP) scanline is the intersection line between the three-dimensional (3D) planar surface and the screen (i.e., target image). However, in a two-dimensional (2D) problem, the PP scanline must be calculated based on a 2D matrix. Generally, a perspective transformation can be presented as

    [u]       [x]   [a d g] [x]
    [v] = M * [y] = [b e h] [y]
    [1]       [1]   [c f 1] [1]

where (u, v) is the coordinate of the source pixel, (x, y) is the coordinate of the target pixel, and M is the perspective transformation matrix. The (u, v) coordinate can be expressed in terms of (x, y) as

    (u, v) = F(x, y) = (C(ax + dy + g), C(bx + ey + h))

where

    C = 1 / (cx + fy + 1)
A line in the target image can be expressed as y = kx + B, where slope k denotes a line direction and B denotes a line intercept. To calculate slope k for the PP scanline, two parallel lines are preferably defined having identical slope k and intercepts B of 0 and 1, represented by point pairs of (0, 0), (1, k) and (0, 1), (1, k + 1), respectively. The coordinates of these points in the source image are then calculated. Since perspective transformation preserves straight lines, these two lines will remain as straight lines in the source image and their slopes can be calculated from the two point pairs. Assuming that the slopes of the two mapped lines are essentially equal, an equation in k is preferably obtained. Solving this equation for k results in

    k = -c / f

The corresponding slope k' in the source image is then

    k' = (bf - ec) / (af - dc)
As can be noted from the above equation, when k = -c/f, the denominator of the homogeneous coordinates becomes a constant value of Bf + 1, where B is the intercept in y = kx + B.
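As a check on the derivation, a minimal sketch (matrix layout as above; names are illustrative) that computes both slopes directly from the matrix entries:

    /* Illustrative only: PP scanline slopes from the perspective matrix
     * [a d g; b e h; c f 1] mapping target (x, y) to source (u, v). */
    typedef struct { float a, d, g, b, e, h, c, f; } PerspM;

    float pp_slope_target(PerspM m) { return -m.c / m.f; }

    float pp_slope_source(PerspM m) {
        return (m.b * m.f - m.e * m.c) / (m.a * m.f - m.d * m.c);
    }

    /* With y = kx + B and k = -c/f, the denominator cx + fy + 1 collapses
     * to the constant B*f + 1 along the whole scanline, which is what
     * removes the per-pixel division. */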
The second stage of the preferred forward warping method of the present invention involves scanline processing and is illustrated in Figures 9 and 10 by way of example. Referring now to Figure 9, the preferred algorithm sweeps the scanlines 84 (e.g., scanlines S1-S4) through the source image 80. As discussed above, the scanlines 84 have the slope k'. The samples 88 along each scanline 84 are preferably incrementally calculated. First, for each scanline 84, the projection of the endpoints from the target image onto the source image is calculated. Then, based on the number of sample points on the scanline, increments are calculated in both the x and the y directions.
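A minimal sketch of that per-scanline setup follows, reusing the PerspM type from the sketch above; all names are assumptions. The costly divisions occur only in the two endpoint projections and the increment setup, never in the per-sample loop:

    /* Illustrative only: forward map one target point through M. */
    void project(PerspM m, float x, float y, float *u, float *v) {
        float C = 1.0f / (m.c * x + m.f * y + 1.0f);  /* per-endpoint divide */
        *u = C * (m.a * x + m.d * y + m.g);
        *v = C * (m.b * x + m.e * y + m.h);
    }

    /* Illustrative only: sample one PP scanline purely incrementally. */
    void sweep_scanline(PerspM m, float x0, float y0, float x1, float y1,
                        int n, float *us, float *vs) {
        float u0, v0, u1, v1;
        project(m, x0, y0, &u0, &v0);      /* endpoint 1 */
        project(m, x1, y1, &u1, &v1);      /* endpoint 2 */
        float du = (u1 - u0) / (n - 1);    /* per-scanline setup */
        float dv = (v1 - v0) / (n - 1);
        float u = u0, v = v0;
        for (int i = 0; i < n; i++) {      /* additions only per sample */
            us[i] = u;
            vs[i] = v;
            u += du;
            v += dv;
        }
    }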
Considering a traditional bilinear interpolation of samples in the source image, every sample essentially requires the contribution of four surrounding source pixels. If pixels are read every time for every sample, each pixel ought to be read four times. This leads to a memory bandwidth of four times the target image size. However, since all scanlines are in parallel, samples on neighboring scanlines usually share contributing source pixels. Consequently, in accordance with the method of the present invention, pixels that have been previously read are preferably buffered so that common pixels are read from the buffer rather than from the source image itself.
With reference to Figure 7, pixels are preferably read in a fixed pattern, called the pixel read template 100, calculated based on the Bresenham line algorithm (as appreciated by those skilled in the art). The binary digits shown at the bottom of Figure 7 represent one way of encoding the read template 100. The present invention, however, contemplates other encoding schemes, as appreciated by those skilled in the art. As illustrated in Figure 7, this code indicates the increase in the positive v direction; a "0" represents no increase and a "1" denotes an increase by one unit, while u is always increased by one unit. For the example of Figure 7, the u axis may preferably be referred to as the primary processing axis. It is preferred that the
template 100 always start from the left-most pixel and move in the vertical direction (i.e., increasing v direction) so that all pixels are read and placed into the buffer for subsequent use in the sampling. It can be seen from Figure 7 that in order to provide pixels for sampling on any scanline between the two dotted lines, four pixel templates are preferably required, even though for a specific scanline, only three pixel templates might seem sufficient (e.g., only templates 2, 3 and 4 are necessary to process the current scanline S2). Therefore, the buffer size is preferably four scanlines.
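The read template itself can be sketched with a Bresenham-style error accumulator; the code below is illustrative only (the slope value is hypothetical) and encodes a v-step of one unit as a 1 and no step as a 0:

```python
def read_template(k_src, width):
    """Encode v-steps (1 = step one unit in v) for a scanline of slope k_src."""
    code, err = [], 0.0
    for _ in range(width):           # u always advances by one unit per entry
        err += abs(k_src)
        if err >= 1.0:               # crossed a pixel row: step once in v
            code.append(1)
            err -= 1.0
        else:
            code.append(0)
    return code

print(read_template(0.4, 10))        # -> [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```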
Referring now to Figure 10A, there is illustrated the addressing of samples in the buffer. Whenever the template code value is 1, the sample decreases by one unit in the v direction. The thick zigzag line 104 represents the output scanline in the buffer. When the sample falls within the shaded region 106, in which the pixels in the buffer are sheared, care should be taken to read the correct pixels for sampling. Figure 10B illustrates a preferred procedure for bilinearly interpolating one of the samples, s, in this region.
The contents of the buffer are preferably updated based on the scanline position. For example, referring to Figure 9, templates 1, 2, 3 and 4 are preferably in the buffer when processing scanline S1. For scanline S2, the buffer preferably remains the same. For scanline S3, template 5 is preferably read into the buffer and template 1 is discarded. For scanline S4, template 6 preferably replaces template 2, and so on.
As mentioned above, one of the features of the unique forward image warping method of the present invention is the correction of flaws in Gouraud shading. Gouraud shading is a popular intensity interpolation algorithm used to shade the surfaces of geometric objects. Given color only at the vertices, Gouraud shading bilinearly interpolates the intensities for the entire rasterization of a geometry in a raster scanline order. The flaws of the Gouraud shading approach are known in the art and have been the subject of such articles as, for example, Digital Image Warping, by G. Wolberg, IEEE Computer Society Press, 1990.
One of the problems associated with the Gouraud approach is that diagonal lines (as an example) are not linearly mapped for perspective projections. When a diagonal line is perspectively projected onto a screen in 3D screen space, Gouraud shading converts this diagonal line into a curve, which violates the property of preserving lines in perspective transformation.
The image warping method of the present invention corrects the perspective distortion in Gouraud shading. The perspective distortion is present because the linear interpolation along a raster in screen space is generally non-linear when transformed into geometrical coordinates. Using the special scan direction of the present invention, however, linearity is preserved by the mapping. Thus, interpolation is linear in both image and geometrical space, thereby fixing the distortion of Gouraud shading. It is to be appreciated that interpolation along the edges is still non-linear, and therefore the scanline endpoints must be transformed into geometrical space for correct interpolation.
The forward mapping algorithm of the present invention, with nearest-neighbor approximation, preferably generates a target image that is essentially indistinguishable from an image generated using traditional methods. However, when a higher image quality is desired, the method of the present invention can preferably calculate the pixel value at exact grid points. A simple target pixel correction scheme may preferably be introduced to perform this correction.
With reference now to Figure 11, assuming the sample points 90 in the target image 82 are aligned on integer x coordinates, in order to obtain the pixel value at the exact pixel grid locations 91, a linear interpolation of the two samples immediately above and below each pixel is preferably performed. Performing this linear interpolation simply as a second pass may increase the cost, since the samples must be read over again. Instead, as each sample is generated, a preferred method of the present invention spreads the contribution of each sample to the corresponding upper and lower pixels with no intermediate buffering.
As illustrated by the example of Figure 11, samples 112 located on the thicker inclined scanline 108 contribute to the shaded pixels neighboring them (lighter shading above the scanline, darker shading below the scanline). The arrows indicate that each sample 112 preferably contributes to two pixels. It is preferred that a pixel not be written out until both contributions are collected. Thus, a one-scanline buffer is preferably included for storing the intermediate pixel values.
To write out pixels correctly and efficiently, a pixel write pattern, called a pixel write template 110, is preferably pre-calculated. Unlike the pixel read template (e.g., reference number 100 in Figure 7), the pixel write template 110 is preferably calculated by truncating the y coordinate value of samples along a scanline. The template 110 is preferably encoded as a series of integer y steps and fractional distances dy from the true scanline 86. The weights used for the final linear interpolation are dy and 1 - dy for the upper and lower pixels, respectively. Since all scanlines are preferably one unit apart in the vertical direction (i.e., y direction), the template is calculated only once per projection.
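A possible sketch of the write-template computation follows; the slope, intercept and width are hypothetical values, and the truncation and fractional distance dy follow the description above:

```python
def write_template(k, B, width):
    """Integer y steps plus fractional distances dy along y = k*x + B."""
    steps, prev_yi = [], None
    for x in range(width):
        y = k * x + B
        yi = int(y)                  # truncate to the pixel row below the sample
        dy = y - yi                  # fractional distance to the true scanline
        step = 0 if prev_yi is None else yi - prev_yi
        prev_yi = yi
        steps.append((step, dy))
    return steps

# The upper and lower pixels receive weights dy and 1 - dy, respectively.
for step, dy in write_template(k=0.3, B=1.5, width=6):
    print(step, round(dy, 2), round(1 - dy, 2))
```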
The forward image warping method of the present invention can further improve image quality by antialiasing. Using the parallel-preserving (PP) scanline, a higher quality, less expensive method of antialiasing may be achieved.
Referring again to Figure 7, the sample points on the upper scanlines of the source image are sparser than on the lower scanlines, resulting in a transition from under-sampling to normal sampling. Thus, an appropriate resampling filter may preferably be used to avoid aliasing on the upper scanlines. Isotropic filtering results in clearly incorrect and blurry images. The need for anisotropic filters has been addressed in such articles as Survey of Texture Mapping, by P. S. Heckbert, IEEE Computer Graphics and Applications, 6(11):56-67, November 1986, and more recently in Texram: A Smart Memory for Texturing, by A. Schilling, et al., IEEE Computer Graphics and Applications, 16(3):32-41, May 1996.
It is known by those skilled in the art that each filter is defined by its footprint and profile. Taking a target sample as a circle, its projection in the source image is its footprint. As illustrated in Figure 12, this footprint 114 should generally be neither circular (i.e., isotropic) nor square-shaped (i.e., as in mip-mapping), but conic in shape. The profile of the filter decides the weights of the contributing pixels within the footprint. Although a sinc filter is optimal, a Gaussian filter is easier to implement and is preferred because of its finite footprint and good low-pass characteristics. The perspective warping algorithm of the present invention offers more accuracy in calculating the anisotropic footprint, producing higher image quality at a lower cost.
Using conventional methods for calculating the anisotropic footprint, the main axes of the ellipse must be calculated for every pixel. Although approximations have been proposed, this remains an expensive computation, and no known incremental method is available. To obtain the major axes of the ellipse using these prior art methods, the Jacobian must be calculated. Using the image warping method of the present invention, however, calculation of the Jacobian may be eliminated.
In order to gain insight into a preferred method for calculating the anisotropic footprint in accordance with the present invention, the properties of the Jacobian will first be analyzed. The generalized backward mapping from an xy target image into a uv source image was previously defined above as
$$\begin{pmatrix} u \\ v \end{pmatrix} = F(x, y) = C \begin{pmatrix} ax + dy + g \\ bx + ey + h \end{pmatrix}, \quad \text{where } C = \frac{1}{cx + fy + 1}$$
The Jacobian J for the generalized transformation is a non-linear function of x and y:

$$J = C^2 \begin{pmatrix} y(af - cd) + a - gc & x(af - cd) - d + gf \\ y(bf - ce) + b - hc & x(bf - ce) - e + hf \end{pmatrix}$$
In conventional antialiasing approaches, the Jacobian is used to determine the footprint of each pixel in the source image and is necessary for anisotropic filtering. The differences between screen pixels in xy raster space are projected into the source image by computing the directional derivatives in the [1, 0] and [0, 1] directions. These derivatives in source image space are called r1 and r2, and are defined as
$$r_1 = J \begin{pmatrix} 1 \\ 0 \end{pmatrix} = C^2 \begin{pmatrix} y(af - cd) + a - gc \\ y(bf - ce) + b - hc \end{pmatrix}$$

and

$$r_2 = J \begin{pmatrix} 0 \\ 1 \end{pmatrix} = C^2 \begin{pmatrix} x(af - cd) - d + gf \\ x(bf - ce) - e + hf \end{pmatrix}$$
These vectors, r1 and r2, define the bounding box of an ellipse that approximates the footprint 114. Typically, these vectors 116 and 118 are calculated for every pixel, when needed, for conventional methods of anisotropic filtering (e.g., elliptical weighted average (EWA), footprint assembly). This requires one more division per pixel for calculating C. In accordance with the present invention, a more accurate method for determining the footprint is presented, as described herein below.
Because the Jacobian is a linear approximation of the non-linear mapping, it is more accurate, and therefore preferable, to compute the footprint by taking the distances to neighboring samples in source image space. Since the projections of neighboring samples are already computed, this method of the present invention requires no additional division.
The parallel-preserving (PP) scan direction provides for greater coherency and no division to compute the Jacobian. For each pixel in the PP scanning order, the footprint is preferably defined by r1' and r2'. The directional derivative r1' in direction [1, k] along the PP scanline is
$$r_1' = \nabla_{[1,k]} F = J \begin{pmatrix} 1 \\ k \end{pmatrix} = \frac{1}{f(Bf + 1)} \begin{pmatrix} af - cd \\ bf - ce \end{pmatrix}$$

and since y = kx + B, C is constant for every PP scanline, and thus r1' is
constant for every PP scanline. The method of the present invention exploits this fact in order to preferably increment the source image coordinates along a scanline, with no divisions. The value of the directional derivative r2' in the y direction [0, 1] is
$$r_2' = \nabla_{[0,1]} F = r_2$$
It is to be appreciated that r2' varies linearly along the scanline since it is a function of x, and thus it can be incremented along the scanline. The special scan direction makes it possible to compute the source image coordinates and pixel footprints simply and efficiently.
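The following sketch illustrates this incremental footprint computation using the formulas as reconstructed above (the coefficients and intercept are again hypothetical); r1' is computed once per scanline and r2' is stepped by a constant increment:

```python
import numpy as np

# Hypothetical coefficients and scanline intercept, as in the earlier sketches.
a, b, c, d, e, f, g, h = 0.9, 0.1, 0.02, -0.1, 1.1, 0.03, 5.0, -2.0
B = 2.0
C = 1.0 / (B * f + 1.0)          # constant on the whole PP scanline

# r1' is computed once per scanline (closed form given above).
r1p = np.array([a * f - c * d, b * f - c * e]) / (f * (B * f + 1.0))

def r2p(x):
    """r2' = C^2 * [x(af - cd) - d + gf, x(bf - ce) - e + hf]."""
    return C * C * np.array([x * (a * f - c * d) - d + g * f,
                             x * (b * f - c * e) - e + h * f])

# r2' is affine in x on the scanline, so a constant increment suffices.
dr2 = r2p(1.0) - r2p(0.0)
cur = r2p(0.0)
for step in range(1, 5):
    cur = cur + dr2
    assert np.allclose(cur, r2p(float(step)))
```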
After efficiently computing all the footprint and source pixel coordinate information, correct anisotropic filtering can be performed using a standard method known by those skilled in the art, such as, for example, Greene and Heckbert's elliptical weighted average (EWA) or Schilling et al.'s footprint assembly. These conventional algorithms are described, for example, in the text Creating Raster Omnimax Images from Multiple Perspective Views Using the Elliptical Weighted Average Filter, by N. Greene and P. S. Heckbert, IEEE Computer Graphics and Applications, 6(6):21-27, June 1986. However, these conventional filtering approaches are not preferred since, as pointed out previously, even the elliptical footprint approximation is inaccurate. Furthermore, such prior art methods result in redundant sampling (i.e., accessing each source pixel multiple times). For instance, for a circular filter region with a footprint radius of 1.0 source pixel, each source pixel is sampled an average of pi times. By using the forward mapping technique of the present invention, redundant memory access can be essentially eliminated, thus lowering the memory bandwidth by a factor of pi. Preferably, the present invention provides a forward mapping technique in which all source pixels are read once in
pixel read template order and subsequently splatted onto the target image with a filter kernel.
As illustrated in Figure 13, each source pixel 124 has a dx 120 and a dy 122 relative to each of its nearest-neighbor target samples 126. The dx can preferably be computed incrementally since all samples along a scanline are equidistant. The special scan direction essentially guarantees that the dy is constant along each scanline. Although the raster grid locations deviate from the true scanline 128, the actual distances can be estimated preferably by adding a small correction which may be stored in the pixel read template 130 and is preferably uniform among scanlines. The filter kernel is preferably pre-computed once and stored in a lookup table (LUT). Subsequently, the contribution of each source pixel 124 is preferably indexed by its dx and dy into the lookup table (LUT) for the four (or more) nearest-neighbor target samples 126. The number of target samples 126 depends upon the footprint of the filter used, and it may preferably vary from four to 16 samples. Using this method, each source pixel 124 is preferably read exactly once from memory, then four (or more) times modulated by a lookup table entry and accumulated in the target pixel. In this manner, the final pixel value is the weighted average of the nearby source pixels 124. This weighted average requires a division by the sum of the filter weights to normalize each final pixel intensity.
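A simplified software sketch of this LUT-based splatting is shown below; the Gaussian profile, table resolution and footprint radius are illustrative assumptions and do not represent the actual hardware datapath:

```python
import numpy as np

LUT_RES = 64
RADIUS = 1.0
r = np.linspace(0.0, RADIUS, LUT_RES)
lut = np.exp(-(r ** 2) / (2 * 0.4 ** 2))      # Gaussian profile, finite extent

def splat(source, coords, target_shape):
    """source: (H, W) pixels; coords[j, i] = (x, y) target position of a pixel."""
    acc = np.zeros(target_shape)
    wsum = np.zeros(target_shape)
    H, W = source.shape
    for j in range(H):
        for i in range(W):                    # each source pixel read exactly once
            x, y = coords[j, i]
            for ty in range(int(np.floor(y - RADIUS)) + 1,
                            int(np.floor(y + RADIUS)) + 1):
                for tx in range(int(np.floor(x - RADIUS)) + 1,
                                int(np.floor(x + RADIUS)) + 1):
                    if 0 <= ty < target_shape[0] and 0 <= tx < target_shape[1]:
                        dist = np.hypot(tx - x, ty - y)
                        if dist < RADIUS:
                            w = lut[int(dist / RADIUS * (LUT_RES - 1))]
                            acc[ty, tx] += w * source[j, i]
                            wsum[ty, tx] += w
    # Normalize each final pixel intensity by the sum of the filter weights.
    return acc / np.maximum(wsum, 1e-12)
```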
In addition to image warping, which can be broadly defined as a geometric transformation between two images (e.g., a source image and a target image), three-dimensional (3D) volume transformation plays a key role in volume rendering, volume modeling, and registration of multiple volumes. Among all affine transformations, rotation generally consumes the most computation time and is considered the most complicated. Accordingly, in providing a universal 3D rendering architecture in accordance with the present invention, several unique methods for performing arbitrary 3D volume rotation are presented, as described in detail herein below. Although the universal 3D rendering hardware of the present invention may be used without the 3D volume rotation methods described herein, these methods, or algorithms, are preferably implemented in conjunction with the apparatus of the
present invention to provide enhanced speed and features and are adapted to
most
efficiently utilize the apparatus of the present invention.
Prior to describing the unique methods for performing 3D volume rotation, it
is important to first provide some basic definitions of the terms used. As
appreciated
by those skilled in the art, relative to the rows and columns of an image, a
beam in a
volume may be defined as a row of voxels along one major coordinate axis
(e.g., an x-
beam is a row of voxels in the x direction). A slice of a volume is a plane of
voxels
which is perpendicular to a major axis (e.g., an x-slice is defined as a plane
perpendicular to the x axis).
Prior art methods for performing volume transformations typically utilize
multiple-pass algorithms, which are usually direct extensions of the multiple-
pass
algorithms used for image transformations. Various methods for performing 3D
rotation have been proposed, generally involving a decomposition of the 3D
transformation into multiple two-dimensional (2D) or one-dimensional (1D)
transformations. These prior art methods have been the subject of articles,
including
Volume Rendering, by R. A. Drebin et al., Computer Graphics (SIGGRAPH '88
Proceedings), Vol. 22, pp 65-74, August 1988, Three-Pass Affine
Transformations
for Volume Rendering, by P. Hanrahan, Computer Graphics (San Diego Workshop on
Volume Visualization), Vol. 24, pp 71-78, November 1990 and Fast Rotation of
Volume Data on Parallel Architectures, by P. Schroder and J. B. Salem,
Visualization
'91, pp. 50-57, 1991. However, these known 3D transformation methods typically
result in a lower quality rotation and/or slower processing speed.
One of the properties which make three-dimensional (3D) rotation so difficult
is that 3D rotations inherently require global communication and could cause
memory
contention while writing data back to the distributed memory modules. However,
as
shear transformation capitalizes on nearest neighbor connections, it lends
itself to an
extremely feasible multi-pipelined hardware implementation, as provided by the
unique architecture of the present invention. The present invention further
provides
novel methods for performing arbitrary 3D rotation, essentially by decomposing the 3D rotations into sequences of different types of shear transformations.
Using a conventional decomposition approach, since a 2D rotation can be decomposed into three one-dimensional (1D) shears, a direct extension to 3D rotation would require nine 1D shears. However, in accordance with the present invention, four preferred methods of shear decomposition of an arbitrary 3D volume rotation are presented, as described in detail herein below. These methods include a four-pass 2D slice shear, a four-pass 2D beam shear, a three-pass beam-slice shear and a two-pass 3D beam shear decomposition. By not introducing a scale operation, the algorithms of the present invention avoid complications in sampling, filtering and the associated image degradations.
It is to be appreciated by one skilled in the art that a 3D rotation matrix can be expressed as the concatenation of three major axis rotations, Rx(phi), Ry(theta) and Rz(alpha), where
$$R_x(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}, \quad R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}, \quad R_z(\alpha) = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The order in which this concatenation is performed results in different 3D rotation matrices. There are six permutations of the 3D rotation matrix in total. By way of illustration, the underlying 3D rotation matrix was chosen as R3D = Rx(phi) Ry(theta) Rz(alpha), where
$$R_{3D} = \begin{pmatrix} \cos\theta\cos\alpha & \cos\theta\sin\alpha & -\sin\theta \\ \sin\phi\sin\theta\cos\alpha - \cos\phi\sin\alpha & \sin\phi\sin\theta\sin\alpha + \cos\phi\cos\alpha & \sin\phi\cos\theta \\ \cos\phi\sin\theta\cos\alpha + \sin\phi\sin\alpha & \cos\phi\sin\theta\sin\alpha - \sin\phi\cos\alpha & \cos\phi\cos\theta \end{pmatrix}$$
The above 3D rotation matrix (R3D) is used for all the decompositions which follow. One of the primary differences between the unique methods of the present invention and other conventional approaches is that in the present invention, the decomposition is applied directly to a 3D rotation matrix, rather than to multiple 2D rotation sequences, to obtain shear sequences. It is to be appreciated that, for any of the shear operations performed in accordance with the present invention, barrel shifters may be used as a preferred hardware implementation, although other means, such as logarithmic shifters or the like, are similarly contemplated.
As shown in Figure 14, a method for performing two-dimensional (2D) slice shear rotation, in accordance with one embodiment of the present invention, preferably involves a decomposition of the 3D rotation into a sequence of 2D slice shears. In a 2D slice shear, a volume slice (i.e., a plane of voxels along a major projection axis and parallel to any two axes) is merely shifted within its plane. A slice may be arbitrarily taken along any major projection axis. For example, Figure 14 illustrates a y-slice shear. A 2D y-slice shear is preferably expressed as:
x' = x + a·y
z' = z + b·y
A 2D y-slice shear may preferably be written as S(xz, y, (a, b)), interpreted as a shear along the y axis by an amount a 132 in the x-direction and an amount b 134 in the z-direction. Although both a and b are preferably constants, it is further contemplated that a and b can represent functions as well. A 2D x-slice shear, S(yz, x, (c, d)), and a 2D z-slice shear, S(xy, z, (e, f)), are similarly defined. With reference to
Figure 14, the volume represented by the solid lines 136 is the shear result
of the
volume defined by the dotted lines 138.
Intuitively, consecutive shears along the same axis produce a conforming shear. For example:

$$S(xz, y, (a, b)) \cdot S(xz, y, (a', b')) = \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & b \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ a' & 1 & b' \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ a + a' & 1 & b + b' \\ 0 & 0 & 1 \end{pmatrix}$$
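This composition rule can be checked numerically; the sketch below (shear amounts arbitrary) uses the row-vector convention implied by the matrices above:

```python
import numpy as np

def slice_shear_y(a, b):
    """S(xz, y, (a, b)): x' = x + a*y, z' = z + b*y, for row vectors [x, y, z]."""
    return np.array([[1.0, 0.0, 0.0],
                     [a,   1.0, b  ],
                     [0.0, 0.0, 1.0]])

a, b, a2, b2 = 0.3, -0.7, 1.1, 0.4
product = slice_shear_y(a, b) @ slice_shear_y(a2, b2)
assert np.allclose(product, slice_shear_y(a + a2, b + b2))
```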
In order to build the general 3D matrix from 2D shear matrices, shear products may be restricted to products of different shears: S(yz, x, (c, d)), S(xz, y, (a, b)) and S(xy, z, (e, f)). However, the product matrix of these three shear matrices will still not be in the general form due to a constant 1 in the product matrix. Accordingly, another shear matrix is preferably concatenated, where this final shear is the same slice shear as the first one. This results in the following six permutations of shear sequences:
S(xz, y, (a, b)) · S(xy, z, (e, f)) · S(yz, x, (c, d)) · S(xz, y, (g, h))
S(xz, y, (a, b)) · S(yz, x, (c, d)) · S(xy, z, (e, f)) · S(xz, y, (g, h))
S(xy, z, (e, f)) · S(xz, y, (a, b)) · S(yz, x, (c, d)) · S(xy, z, (i, j))
S(xy, z, (e, f)) · S(yz, x, (c, d)) · S(xz, y, (a, b)) · S(xy, z, (i, j))
S(yz, x, (c, d)) · S(xz, y, (a, b)) · S(xy, z, (e, f)) · S(yz, x, (m, n))
S(yz, x, (c, d)) · S(xy, z, (e, f)) · S(xz, y, (a, b)) · S(yz, x, (m, n))
For each of the shear sequences, the product matrix of the consecutive shear matrices is preferably computed and set equal to the underlying 3D rotation matrix. For example, for the first shear sequence given above (i.e., S(xz, y, (a, b)) · S(xy, z, (e, f)) · S(yz, x, (c, d)) · S(xz, y, (g, h))):
$$R_{3D} = R_x(\phi) R_y(\theta) R_z(\alpha) = S(xz, y, (a, b)) \cdot S(xy, z, (e, f)) \cdot S(yz, x, (c, d)) \cdot S(xz, y, (g, h))$$
The above matrix equation implies nine trigonometric equations with eight variables, namely, a, b, c, d, e, f, g and h. In solving these nine equations for the eight variables, a - h, the following results are obtained:
$$a = \frac{\sin\theta \sin\alpha (\cos\theta - \cos\phi) + \sin\phi (\cos\alpha - \cos\theta)}{\cos^2\theta \, \sin\alpha \, \sin\phi}$$

$$b = -\frac{\cos\phi \cos\alpha - 1}{\sin\phi \cos\theta}$$

$$c = \cos\theta \sin\alpha$$

$$d = \frac{-\sin\phi \sin\theta \cos\alpha + \cos\phi \sin\alpha - \cos\theta \sin\alpha}{\sin\phi \cos\theta}$$

$$e = \frac{\sin\phi \cos\theta + \cos\phi \sin\theta \sin\alpha - \sin\phi \cos\alpha}{\cos\theta \sin\alpha}$$

$$f = -\sin\phi \cos\theta$$

$$g = \frac{\cos\phi \cos\alpha - 1}{\cos\theta \sin\alpha}$$

$$h = \frac{\cos\theta \cos\alpha - 1}{\cos\theta \sin\alpha}$$
In a similar manner, the shear matrices for the remaining five slice shear sequences given above may be obtained. In fact, the slice shear sequence with the solution given above has the simplest expression and is preferably termed the dominant sequence.
Referring now to Figure 15, a method for performing three-dimensional (3D) rotation by a two-dimensional (2D) beam shear decomposition will now be described. First, a beam shear may be defined as a beam that is merely shifted in its major
direction without any change of the other two coordinates. For example, a 2D x-beam shear is preferably expressed as:

x' = x + c·y + d·z

A 2D x-beam shear may preferably be written as S(x, yz, (c, d)), interpreted as a shear along the x axis by an amount c in the y-direction and an amount d in the z-direction. A 2D y-beam shear, S(y, xz, (a, b)), and a 2D z-beam shear, S(z, xy, (e, f)), are similarly defined. Figure 15 illustrates an x-beam shear, wherein the volume represented by the dotted lines 146 is sheared to the volume position represented by the solid lines 144.
A two-dimensional (2D) beam shear is advantageous over a 2D slice shear, and is therefore preferred, since a beam is shifted without changing the other two coordinates. Thus, the resampling for each pass of the 2D beam shear approach is simpler, as only a linear interpolation is required. In contrast, a 2D slice shear approach requires a bilinear interpolation, which is more complex.
Similar to the 2D slice shear decomposition, in order to build the general 3D matrix from 2D beam shear matrix decompositions, shear products may be restricted to products of different shears: S(x, yz, (c, d)), S(y, xz, (a, b)) and S(z, xy, (e, f)). However, the product matrix of these three shear matrices will still not be in the general form due to a constant 1 in the matrix. Accordingly, as in the slice shear method, another shear matrix is preferably concatenated, where this final shear is the same beam shear as the first one. This results in the following six permutations of shear sequences:
S(y, xz, (a, b)) · S(z, xy, (e, f)) · S(x, yz, (c, d)) · S(y, xz, (g, h))
S(y, xz, (a, b)) · S(x, yz, (c, d)) · S(z, xy, (e, f)) · S(y, xz, (g, h))
S(z, xy, (e, f)) · S(y, xz, (a, b)) · S(x, yz, (c, d)) · S(z, xy, (i, j))
S(z, xy, (e, f)) · S(x, yz, (c, d)) · S(y, xz, (a, b)) · S(z, xy, (i, j))
S(x, yz, (c, d)) · S(y, xz, (a, b)) · S(z, xy, (e, f)) · S(x, yz, (m, n))
S(x, yz, (c, d)) · S(z, xy, (e, f)) · S(y, xz, (a, b)) · S(x, yz, (m, n))
For each of the above shear sequences, the product matrix of the consecutive shear matrices is preferably computed and set equal to the underlying 3D rotation matrix. For example, for the first shear sequence given above (i.e., S(y, xz, (a, b)) · S(z, xy, (e, f)) · S(x, yz, (c, d)) · S(y, xz, (g, h))):
$$R_{3D} = R_x(\phi) R_y(\theta) R_z(\alpha) = S(y, xz, (a, b)) \cdot S(z, xy, (e, f)) \cdot S(x, yz, (c, d)) \cdot S(y, xz, (g, h))$$
The above matrix equation implies nine trigonometric equations with eight variables, namely, a, b, c, d, e, f, g and h. In solving these nine equations for the eight variables, a - h, the following results are obtained:
$$a = \frac{\sin\theta \sin\alpha (\cos\phi - \cos\theta) + \sin\phi (\cos\theta - \cos\alpha)}{\sin\phi \, \sin\alpha \, \cos^2\theta}$$

$$b = \frac{\cos\phi \cos\theta - 1}{\sin\phi \cos\theta}$$

$$c = -\cos\theta \sin\alpha$$

$$d = \frac{\cos\theta \sin\alpha - \sin\phi \sin\theta \cos\alpha - \cos\phi \sin\alpha}{\sin\phi \cos\theta}$$

$$e = \frac{\cos\phi \sin\theta \sin\alpha - \sin\phi \cos\alpha + \sin\phi \cos\theta}{\cos\theta \sin\alpha}$$

$$f = \sin\phi \cos\theta$$

$$g = \frac{\cos\phi \cos\alpha - 1}{\cos\theta \sin\alpha}$$

$$h = \frac{\sin\phi \sin\theta (\cos\theta - \cos\alpha) + \sin\alpha (\cos\phi - \cos\theta)}{\sin\phi \, \sin\alpha \, \cos^2\theta}$$
In a similar manner, the shear matrices for the remaining five beam shear sequences given above may be obtained. The beam shear sequence with the solution given above is preferably termed the dominant sequence.
With reference now to Figure 16, a method for performing three-dimensional (3D) rotation by two-dimensional (2D) beam-slice shear decomposition in accordance with the present invention will be described. A 2D beam-slice shear may preferably be defined as a beam that is shifted within a plane. For example, a 2D x-beam-y-slice shear is preferably expressed as:
x' = x + a·y + g·z
z' = z + b·y
A 2D x-beam y-slice shear may preferably be written as S((x, yz, (a, g)), (z, y, b)), interpreted as a shear along the x axis by an amount a in the y-direction and an amount g in the z-direction, combined with a shear along the z axis by an amount b in the y-direction, where a, g and b are preferably constants. In essence, a beam-slice shear is a combination of a beam shear and a slice shear. Figure 16 illustrates an x-beam y-slice shear, S((x, yz, (a, g)), (z, y, b)), wherein the volume represented by the dotted lines 156 is sheared to the volume position represented by the solid lines 154.
To build the general 3D matrix from a 2D shear matrix decomposition, shear products may be restricted to products of different shears: the y-beam-x-slice shear S((y, xz, (c, h)), (z, x, d)), the x-beam-y-slice shear S((x, yz, (a, g)), (z, y, b)), and the y-beam shear S(y, xz, (i, j)). As in the case of the slice shear and beam shear approaches, it is to be appreciated that there are also six permutations of beam-slice shear sequences.
For each of the shear sequences, the product matrix of the consecutive shear matrices is preferably computed and set equal to the underlying 3D rotation matrix. For example, for the first beam-slice shear sequence given above (i.e., S((y, xz, (c, h)), (z, x, d)) · S((x, yz, (a, g)), (z, y, b)) · S(y, xz, (i, j))):
$$R_{3D} = R_x(\phi) R_y(\theta) R_z(\alpha) = S((y, xz, (c, h)), (z, x, d)) \cdot S((x, yz, (a, g)), (z, y, b)) \cdot S(y, xz, (i, j))$$
The above matrix equation implies nine trigonometric equations with eight variables, namely, a, b, c, d, f, g, h and i. In solving these nine equations for the eight variables, the following results are obtained:
$$a = \sin\phi \sin\theta \cos\alpha - \cos\phi \sin\alpha$$

$$b = \sin\phi \cos\theta$$

$$c = \frac{\sin\phi (\cos\theta - \cos\alpha) + \sin\theta \sin\alpha (\cos\phi - \cos\theta)}{\sin\phi \, \cos^2\theta \, \sin\alpha}$$

$$d = \frac{\sin\phi \cos\alpha - \cos\phi \sin\theta \sin\alpha - \sin\phi \cos\theta}{\cos\theta \sin\alpha}$$

$$f = \frac{\sin\phi \sin\theta (\cos\theta - \cos\alpha) + \sin\alpha (\cos\phi - \cos\theta)}{\sin\phi \, \cos^2\theta \, \sin\alpha}$$

$$g = \frac{\cos\theta \sin\alpha + \sin\phi \sin\theta \cos\alpha - \cos\phi \sin\alpha}{\sin\phi \cos\theta}$$

$$h = \frac{\cos\phi \cos\theta - 1}{\sin\phi \cos\theta}$$

$$i = \frac{\cos\theta \cos\alpha - 1}{\cos\theta \sin\alpha}$$
It is to be appreciated that the shear matrices for the remaining five shear sequences may be obtained in a similar manner.
Figure 17 illustrates a fourth method for performing an arbitrary three-dimensional (3D) rotation using 3D beam shear decompositions, according to the present invention. By further examination of the product matrix of the consecutive shear matrices used in the beam-slice shear decomposition method described above (i.e., S((y, xz, (c, h)), (z, x, d)) · S((x, yz, (a, g)), (z, y, b)) · S(y, xz, (i, j))), the first pair and the last pair of 2D shears can be merged since there is a common beam in each pair. For example, the x beam is a common beam of the y-slice and z-slice shears of the first pair. Therefore, the number of shears can be reduced to two by introducing a new definition of a 3D beam shear.
Figure 17 illustrates a 3D x-beam shear, which is equal to the concatenation of two consecutive 2D slice shears S(xz, y, (a, b)) · S(xy, z, (e, f)). It is to be appreciated that there are two other 3D beam shears, namely, a 3D z-beam shear, represented as S(yz, x, (c, d)) · S(xz, y, (a, b)), and a 3D y-beam shear, represented as S(yz, x, (c, d)) · S(xy, z, (e, f)). Every 3D beam shear preferably involves only one major beam. With reference to Figure 17, the marked x beam 158 (dark shaded beam) is preferably translated to a new 3D location following the arrows. The lighter shaded beam 158' indicates the intermediate position if the shear decomposition is interpreted as two consecutive 2D slice shears.
The three 3D beam shears may preferably be denoted as SH3D_x, SH3D_y and SH3D_z. Now, using this method of the present invention described herein, an arbitrary 3D rotation can be decomposed into only two consecutive 3D beam shears. The dominant decomposition sequence may be obtained directly from the 2D slice shear sequence as:

$$R_{3D} = \mathrm{SH3D}_x \cdot \mathrm{SH3D}_z$$

where

$$\mathrm{SH3D}_x = S(xz, y, (a, b)) \cdot S(xy, z, (e, f)) = \begin{pmatrix} 1 & 0 & 0 \\ a + be & 1 + bf & b \\ e & f & 1 \end{pmatrix}$$

and

$$\mathrm{SH3D}_z = S(yz, x, (c, d)) \cdot S(xz, y, (g, h)) = \begin{pmatrix} 1 + cg & c & d + ch \\ g & 1 & h \\ 0 & 0 & 1 \end{pmatrix}$$
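The sketch below numerically confirms that each 3D beam shear matrix above equals the product of its two constituent 2D slice shears (shear amounts are arbitrary test values, row-vector convention):

```python
import numpy as np

def S_xz_y(a, b):   # x' = x + a*y, z' = z + b*y
    return np.array([[1, 0, 0], [a, 1, b], [0, 0, 1]], dtype=float)

def S_xy_z(e, f):   # x' = x + e*z, y' = y + f*z
    return np.array([[1, 0, 0], [0, 1, 0], [e, f, 1]], dtype=float)

def S_yz_x(c, d):   # y' = y + c*x, z' = z + d*x
    return np.array([[1, c, d], [0, 1, 0], [0, 0, 1]], dtype=float)

a, b, c, d, e, f, g, h = 0.2, -0.5, 0.7, 0.1, -0.3, 0.9, 0.4, -0.8

sh3d_x = np.array([[1, 0, 0],
                   [a + b * e, 1 + b * f, b],
                   [e, f, 1]], dtype=float)
assert np.allclose(sh3d_x, S_xz_y(a, b) @ S_xy_z(e, f))

sh3d_z = np.array([[1 + c * g, c, d + c * h],
                   [g, 1, h],
                   [0, 0, 1]], dtype=float)
assert np.allclose(sh3d_z, S_yz_x(c, d) @ S_xz_y(g, h))
```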
Using the 3D beam shear decomposition approach of the present invention described herein, an arbitrary 3D rotation preferably involves only two major beam
transformations, whereas conventional decomposition approaches require three (e.g., Hanrahan's decomposition). In accordance with the 3D beam shear method of the present invention, the first pass involves only x beams and the second pass involves only z beams. By the end of the first shear pass, all voxels of a beam preferably have the same offsets. As there are N^2 beams for an N^3 volume, there are only N^2 different offset values. Accordingly, the offset values for N^2 beams can be stored at the end of the first pass, while storing the voxels to their nearest neighbor integral positions.
When multiple pass algorithms are used, the resampling techniques chosen are key to achieving high quality. Intuitively, resampling is necessary for each pass because a continuous shear transformation may move voxels off the grid points. One problem inherent with multiple resampling, however, is the quick degradation of the volume quality if consecutive rotations are applied to a volume. It is therefore desirable to sample the volume only once.
Accordingly, a preferred method of the present invention achieves one pass resampling of a volume. In essence, the method of the present invention involves precomputing a sampled volume and then using only zero-order (i.e., nearest neighbor) interpolation in each shear pass, thereby distinguishing from known prior art methods which require global communication (e.g., Wittenbrink and Somani's permutation warping).
Given an original volume (source volume) and the desired rotated volume (target volume), the method of the present invention preferably first builds up a one-to-one correspondence between a source voxel and a target voxel. This one-to-one mapping is guaranteed by the multi-pass shear decomposition of the present invention because each shear is a one-to-one transformation using zero-order interpolation. The concatenation of a sequence of one-to-one mappings remains one-to-one. Once this one-to-one correspondence is built up, the method of the present invention preferably calculates for each source voxel its corresponding target voxel and stores it in the source voxel position. During this procedure, no global communication is required; the resampling is performed by interpolation on the local voxels. The sampling
position of each target voxel is preferably computed using a backward transformation of rotation.
After obtaining the values for all target voxels, the method of the present invention preferably shuffles them to their destinations in the target volume. Intuitively, this would involve global communication. However, global communication is expensive to perform for parallel implementation. Therefore, the method according to the present invention preferably uses multiple shears with a nearest neighbor placement scheme to achieve this voxel shuffling. Since shear is a regular, non-conflict transformation, each pass can be performed more efficiently than if global communication was utilized. Using the 3D beam shear decomposition method of the present invention described herein, only a minimum of two passes of regular local communication are necessary to achieve virtually the same effect as global communication.
It is to be appreciated that care should be taken to avoid the overlapping of beams in a 3D beam shear. Consider, for example, the 3D x-beam shear equation given above. While each x beam is preserved (i.e., an x beam remains rigid after a 3D x-beam shear), several x beams may overlap with each other. To maintain the required one-to-one mapping, recall that a 3D beam shear is the concatenation of two 2D slice shears, as discussed above. A 2D slice shear maintains one-to-one mapping when using zero-order interpolation. Therefore, as a solution, the method of the present invention preferably calculates the destination coordinates using the same order as that of two consecutive 2D slice shears, but communication is preferably performed only once. For a 3D x-beam shear, while the x coordinate is calculated directly using the 3D shear matrix (described above), the y and z coordinates of each beam are preferably calculated as

z' = round(z + b·y)
y' = round(y + f·z')
where round(w) is a function rounding w to the nearest integer. The coordinates (y', z') determine the integral coordinates of the whole beam for the nearest neighbor storage. In this manner, no overlap occurs.
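A minimal sketch of this per-beam nearest-neighbor placement is given below; the shear amounts b and f are hypothetical, and Python's built-in round stands in for the round() function of the text:

```python
b, f = 0.6, -0.4   # z-shear by y, then y-shear by the stored z' (illustrative)

def beam_destination(y, z):
    """Rounded (y', z') destination shared by every voxel of one x-beam."""
    z_new = round(z + b * y)        # first slice shear, rounded per beam
    y_new = round(y + f * z_new)    # second slice shear uses the stored z'
    return y_new, z_new

# All voxels (x, y, z) with the same (y, z) move rigidly to (y', z').
print(beam_destination(3, 5))
```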
In accordance with another form of the present invention, several unique methods for performing enhanced volume processing will be discussed in detail herein below.
Perspective projections present inherent challenges, particularly when performing ray casting. For parallel projections, sight rays that are cast through a volume dataset maintain a constant sampling rate on the underlying volume data. It is straightforward to set this sampling rate to create an output image of the required quality. For perspective projections, however, the rays do not maintain such a continuous and uniform sampling rate. Instead, the rays diverge as they traverse the volume from front to back. This creates an uneven sampling of the underlying volume, as shown in Figures 18A and 18B.
Referring now to Figures 18A and 18B, conventional ray casting algorithms generally handle ray divergence from perspective projections by one of two methods. The first method is undersampling (Figure 18A), in which rays 160 are cast, from a predefined viewpoint 162, so that the sampling rate at the front of the volume 164 is appropriate for the desired image quality. However, because of the perspective ray divergence, the underlying volume dataset is undersampled. This may result in severe aliasing by creating "holes" in the rear of the volume 166 where regions of voxels remain unsampled. The second method is oversampling (Figure 18B), in which rays 160 are cast from a predefined viewpoint 162 so that the sampling rate at the rear of the volume dataset 166 is appropriate for the desired image quality. This approach avoids the aliasing of the first method; however, the volume may be radically oversampled in the front 164. The inefficient oversampling in the front of the volume 164 dramatically increases the runtime of this method. The rays 160 can be cast with a sampling rate between undersampling and oversampling. This results in a tradeoff between the image quality of oversampling and the rendering speed of undersampling.
Many prior imaging architectures do not even attempt to perform perspective projections. Other architectures have dealt with perspective projections by casting diverging sight rays from a predefined viewpoint, which produce images with temporal aliasing and either do not achieve true real-time frame rates (i.e., 30 Hertz) or are much more complex than the slice-order method of the present invention.
A ray-splitting method applies the concept of adaptive super-sampling in order to maintain a uniform ray density. In this approach, a ray is split into two child rays when neighboring rays diverge beyond some predetermined threshold. Recently, a method was proposed which divides the viewing frustum into regions based on distance from the viewpoint, such that the ray density in each region is near the underlying volume resolution. Afterwards, such method projects each region onto sub-images and composites them into the frame buffer using texture mapping hardware. In effect, the technique casts continuous rays through a region, then at specified boundaries, splits them into a new set of continuous rays. This, however, creates a potential undesired discontinuity between regions.
A method for performing perspective projections of uniform regular datasets, termed ER-Perspective (exponential regions perspective), in accordance with one form of the present invention, preferably adaptively samples the underlying volume, whereby the above-described problems, inherent in conventional volume rendering systems and methods, are essentially eliminated. The ER-Perspective algorithm combines the desirable properties of both undersampling and oversampling, providing the extremely good anti-aliasing properties associated with oversampling methods, while providing runtimes on the order of undersampling methods. Furthermore, this algorithm preferably creates at least one sample for every visible voxel in the volume dataset. ER-Perspective gains a runtime advantage over previous work by utilizing slice-order voxel access, while maintaining equal or better image quality in comparison to known perspective projection methods.
Figure 19 is a 2D top view illustration of the ER-Perspective algorithm, in accordance with the present invention. As shown in Figure 19, the ER-Perspective
algorithm preferably works by dividing a view frustum 168 into a plurality of regions based on exponentially increasing distances along a major projection axis (e.g., z-axis) from a predefined viewpoint 172. Preferably, continuous sight rays 174 are cast from the viewpoint 172 from back-to-front (or front-to-back) through the volume dataset and the rays 174 are merged (or split) once they become too close (or too far) from each other. Since the operation of the ER-Perspective algorithm is similar for back-to-front compared with front-to-back ray casting, the remaining discussion of the ER-Perspective algorithm will be limited to the more intuitive case of back-to-front ray casting with merging. The differences are pointed out where they are significant.
The ER-Perspective algorithm preferably uses region boundaries 170, which define the exponential regions, to mark the locations where the sight rays 174 are preferably merged. By defining the regions and merging all rays 174 at the boundaries 170, the algorithm provides a regular pattern of ray merging that is dependent on the global geometry rather than local neighborhood conditions. Figure 20A more clearly illustrates the merging of sight rays at region boundaries 170 for contribution to baseplane pixel B, in particular. With reference to Figure 20A, an odd number of rays 174 are preferably merged such that the resulting ray 174 is essentially an exact continuation of the previous center ray, thus eliminating potential discontinuities present at the region boundaries 170. This is one important advantage of the method of the present invention over known prior approaches. Furthermore, this algorithm can be qualified by characterizing the filtering achieved when adaptively sampling the volume.
An example of a preferred filtering scheme is shown in Figure 20B. Referring to Figure 20B, a Bartlett window (i.e., linear interpolation, triangle filter) is preferably employed. Cascading efficient local Bartlett windows at each region boundary 170 is essentially the equivalent of resampling the rays 174 with a single large Bartlett filter for each baseplane pixel (see Figure 20A). A graphical representation of the preferred filter weights 175 is shown for contributions to the baseplane pixels (e.g., pixels A, B, C).
The base sampling rate of the algorithm can be set to a predefined value according to a desired image quality. The base sampling rate is the minimum ray density compared to the underlying volume resolution. Although the ER-Perspective method of the present invention supports virtually any sampling rate, a sampling rate of at least one ray per voxel is preferred. The algorithm has the advantage of keeping the ray density between one and two times the base sampling rate. This guarantees that no voxels are missed in the rear of the volume dataset and places an upper bound on the total amount of work performed at two times (2x) supersampling.
Since the present invention utilizes slice-order processing, the volume dataset is projected onto a baseplane of the volume which is most perpendicular to the view direction. The baseplane image is then warped onto the final image plane in a conventional manner (e.g., in the same manner as in shear-warp or the prior Cube-4 architecture).
The ER-Perspective method of the present invention is ideally suited for implementation on the Cube-5 architecture described above. Specifically, this algorithm preferably only requires nearest neighbor communication between processing elements. While processing a row of voxels on a one-dimensional array of processing elements, the algorithm only requires processing elements to communicate with their immediate left and right neighbors. The Cube-5 rendering pipelines similarly support nearest neighbor communication.
The ER-Perspective algorithm of the present invention preferably employs slice-order processing along one of the three major axes. Consequently, the regions in the ER-Perspective algorithm are defined as slabs of slices along a major projection axis. In a preferred embodiment of the ER-Perspective method according to the present invention, the volume dataset is projected along slices perpendicular to the z-axis. So as not to limit the methods of the present invention to projections along the z-axis only, it is to be appreciated that the coordinate system may be flipped and the geometry rotated. The algorithm proceeds, as illustrated in Figure 7: first, the distance along the z-axis from the viewpoint 96 to the front of the volume dataset 80
is determined (e_z). Subsequently, a first region 92 is created to consist of as many z-slices as this distance. Each successive region after the first region 92 is preferably twice as deep as the one before it.
When combined with high quality supersampling, the first region is exactly as large as needed to have one ray per voxel at the end of the region when shooting one ray per pixel of the final image. Thus, supersampling higher than 2x might be needed in the front of the volume to render high quality close-up views.
As illustrated in the example of Figure 19, if the viewpoint 172 is three voxel units from the front of the volume (i.e., the z = 3 region boundary), for example, then the first region 176 is preferably three voxel units thick, the next region is six voxel units thick, and so on. In general, the i-th region is preferably e_z · 2^i slices thick, where e_z is the distance from the viewpoint 172 to the front of the volume (see Figure 19). Forcing the regions to be thus defined produces the desired effect that any two perspective rays 174 cast through any of the regions are twice as far apart at the rear boundary (i.e., the z = 24 boundary) as they are at the front boundary (i.e., the z = 3 boundary). This is shown in Figure 19 as the distance between the two rays 174 grows from one unit to two units across the first region 176, then to four units, and finally to eight units at the rear of the last region. Additionally, since the region boundaries 170 are dependent on the global geometry, the efficiency of the ray casting algorithm is maximized by providing a mechanism for keeping the ray density between one and two times the underlying volume resolution in each dimension. It also creates a regular topology so that the filtering of the data can be controlled as perspective rays are cast.
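The exponential region construction can be sketched as follows; e_z and the volume depth are illustrative values, and the printed result reproduces the 3-6-12 slice pattern of Figure 19:

```python
def region_boundaries(e_z, depth):
    """Return z-extents [(z0, z1), ...] of regions covering `depth` slices."""
    bounds, z0, size = [], 0, e_z
    while z0 < depth:
        bounds.append((z0, min(z0 + size, depth)))
        z0 += size
        size *= 2                  # each region is twice as deep as the last
    return bounds

print(region_boundaries(e_z=3, depth=21))   # -> [(0, 3), (3, 9), (9, 21)]
```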
Having regions with boundaries at exponential distances produces a ray density twice as high at the front as at the back of each region. Therefore, a mechanism must preferably be provided to adjust the ray density when crossing a region boundary. Since each ray preferably starts on a voxel coordinate at the rear of a region, at the front of the region every second ray in each dimension will preferably coincide directly with a voxel coordinate. The remaining rays preferably intersect the
region boundary halfway between two voxel positions. To down-sample the ray density with this deterministic ray pattern, a two-dimensional (2D) Bartlett filter (also known as a tent or triangle filter) is preferably employed, with an extent of 1 voxel unit in each dimension. Because the ray density at the front of each region is twice the voxel density, this 3x3 voxel neighborhood is intersected by 5x5 rays. Referring now to Figure 21, since the edges 178 each have a weight of zero, only the 3x3 neighboring rays 180 are used for applying the filter to down-sample the ray density. This effectively merges neighboring rays. A Bartlett filter is preferred over a simple box filter for the added quality it produces in the final image. For the case of front-to-back processing, rays are split instead of merged. Here a bilinear interpolation of the rays is performed to generate the new rays which begin between the other rays. It should be mentioned that a Bartlett filter of size 1 is the inverse of a bilinear interpolation operation.
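The following sketch applies this separable Bartlett down-sampling to a buffer of partial rays (2:1 in each dimension); the edge-padding policy is an assumption made for illustration:

```python
import numpy as np

w1d = np.array([0.0, 0.25, 0.5, 0.25, 0.0])
w2d = np.outer(w1d, w1d)                 # separable 2D Bartlett kernel

def merge_rays(rays):
    """rays: (2R, 2C) buffer of partial ray values -> (R, C) merged buffer."""
    R, C = rays.shape[0] // 2, rays.shape[1] // 2
    out = np.zeros((R, C))
    padded = np.pad(rays, 2, mode="edge")
    for j in range(R):
        for i in range(C):
            # 5x5 window of rays centered on every second ray position;
            # the zero-weight border leaves an effective 3x3 stencil.
            block = padded[2 * j:2 * j + 5, 2 * i:2 * i + 5]
            out[j, i] = np.sum(block * w2d)
    return out
```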
Figure 22 shows a 2D example of how sight rays 186 travel through a 7^3 volume 192 when the viewpoint 196 is three voxel units in front of the volume (i.e., from the baseplane 192). Notice that the sampling rate remains between 7 and 14 per slice, and that it increases as the rays 186 travel through the regions from back to front. The number of ray density resampling stages for an N^3 volume is limited by log2 N, since that is the maximum number of regions in an N^3 volume. The last re-sampling step, shown on the baseplane 198, is preferably performed when the final image warp takes place.
As illustrated in Figure 22, the rear of the volume dataset 182 does not necessarily always coincide with a region boundary 184. However, since it is preferred that the rays 186 be on exact voxel coordinates 188 at all of the region boundaries 184, the rays 186 preferably originate on the grid coordinates 190 at the rear of the last region enclosing the volume dataset 192 (shaded area). Therefore, the voxel coordinates and the ray sample locations 194 may not be congruent at the rear of the volume 182. This not only provides the mentioned boundary conditions, but aids with temporal anti-aliasing when the viewpoint 196 is moved in smaller than
voxel unit distances, because the rays 186 will continue to originate from the
same
positions relative to the voxels.
Figure 23 depicts a preferred method for performing ER-Perspective back-to-front projection of a volume, in accordance with one form of the present invention, although other embodiments of the ER-Perspective method are contemplated. As described above, first, the distance from the eye or viewpoint to the baseplane is preferably determined (in voxel units). Using this viewpoint position, exponential region boundaries are created. Next, enough regions are preferably established to completely encompass the volume dataset. To perform the volume rendering, the algorithm loops through each region from the back to the front, computing normal ray casting, but in a slice-order fashion, and stores the partially computed rays in a compositing buffer. Between regions (i.e., at the region boundaries), ray density re-sampling of the compositing buffer is preferably performed, as described previously. The baseplane image is then warped onto the final image plane for display.
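A highly simplified, one-dimensional Python sketch of this loop is given below; the compositing step is a toy placeholder and the sampling uses simple linear interpolation, so it illustrates only the region and merge control flow, not the actual Cube-5 pipeline:

```python
import numpy as np

def er_perspective_1d(volume, e_z):
    """volume: (depth, N) scalar slices, processed back (depth-1) to front."""
    depth, N = volume.shape
    regions, z0, size = [], 0, e_z          # exponential regions, front-to-back
    while z0 < depth:
        regions.append((z0, min(z0 + size, depth)))
        z0, size = z0 + size, size * 2
    # The ray count doubles with each region going front-to-back.
    rays = np.zeros(N * 2 ** (len(regions) - 1))
    for (r0, r1) in reversed(regions):
        for z in reversed(range(r0, r1)):   # slice-order, back-to-front
            samples = np.interp(np.linspace(0, N - 1, rays.size),
                                np.arange(N), volume[z])
            rays = rays + 0.5 * (samples - rays)      # toy compositing step
        if (r0, r1) != regions[0]:
            # Region boundary: merge rays 2:1 with Bartlett weights 1/4, 1/2, 1/4.
            padded = np.pad(rays, 1, mode="edge")
            rays = (0.25 * padded[:-2:2] + 0.5 * padded[1:-1:2]
                    + 0.25 * padded[2::2])
    return rays        # baseplane scanline, later warped to the final image

print(er_perspective_1d(np.random.rand(21, 8), e_z=3).shape)
```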
With adaptive ray density perspective methods known in the prior art, it is generally difficult to determine the filtering function achieved when rays are merged using irregular patterns. However, since the ER-Perspective method of the present invention preferably uses regular boundaries for the filtering operations and exact ray placement within the boundaries, it is easier to compute the effective filter achieved by the cascading of local Bartlett filters. This is an important advantage of the ER-Perspective algorithm of the present invention. Additionally, the boundaries and filter of the present invention have preferably been chosen to overcome the poor image quality usually associated with conventional successive filtering of discrete data.
Consider, for example, the case of a perspective projection of a volume seven slices deep with the viewpoint two voxel units in front of the volume, as depicted in Figure 24. Using the ER-Perspective method of the present invention, the rays 200 that are cast through a region are one voxel unit apart at the rear of the region.
However, when the rays reach a region boundary 202 they are preferably filtered using local Bartlett filters. The Bartlett filters (simplified to 1 dimension) contain the following weights for a kernel of size 2n + 1, normalized so that the output has the same scalar range as the input:

$$0, \; \frac{1}{n^2}, \; \frac{2}{n^2}, \; \ldots, \; \frac{n-1}{n^2}, \; \frac{n}{n^2}, \; \frac{n-1}{n^2}, \; \ldots, \; \frac{2}{n^2}, \; \frac{1}{n^2}, \; 0$$
For two-dimensional images at region boundaries, the present invention preferably employs a two-dimensional Bartlett filter by convolving two one-dimensional Bartlett filters in the two principal directions. The ER-Perspective algorithm preferably always resamples the rays to have half of the original density. Using a filter of size 2 rays (n = 2) creates a filter kernel of 5x5, or just the following five weights for one dimension:

$$0, \; \frac{1}{4}, \; \frac{2}{4}, \; \frac{1}{4}, \; 0$$
By way of example, as illustrated in Figure 24, consider the contribution of samples a, b, c, d and e to the partially composited ray which changes from region 2 to region 1 at location o:

$$o = \frac{1}{4} b + \frac{2}{4} c + \frac{1}{4} d$$

Likewise, the partial rays at locations p and q are computed as:

$$p = \frac{1}{4} d + \frac{2}{4} e + \frac{1}{4} f$$
$$q = \frac{1}{4} f + \frac{2}{4} g + \frac{1}{4} h$$
The equations for partial rays n and r have been omitted since they have a weight of zero in the final filter for pixel A. Continuing the ER-Perspective algorithm, the resampled partial rays n, o, p, q and r are preferably cast through region 1, where they are again filtered by a local Bartlett filter. The normalized contribution of n, o, p, q and r to pixel A will be:

$$A = \frac{1}{4} o + \frac{2}{4} p + \frac{1}{4} q$$

Substituting in the values for o, p and q results in:

$$A = \frac{1}{16} b + \frac{2}{16} c + \frac{3}{16} d + \frac{4}{16} e + \frac{3}{16} f + \frac{2}{16} g + \frac{1}{16} h$$

It is to be appreciated that this formula contains the same weights (i.e., coefficients) as a Bartlett filter with a kernel size of nine values (n = 4). This can be repeated for pixel B with the same filter weights. For front-to-back processing, a similar analysis can be used to demonstrate the performance of the algorithm and the result of successive applications of the bilinear interpolation.
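This equivalence is easy to confirm numerically: upsampling the second-stage Bartlett taps to the first-stage ray spacing and convolving reproduces the size-9 kernel, as the sketch below shows:

```python
import numpy as np

stage = np.array([1, 2, 1]) / 4.0
# Upsample the second-stage taps to the first-stage ray spacing (factor 2),
# then convolve with the first-stage kernel.
stage2 = np.zeros(5)
stage2[::2] = stage                       # taps land on every other ray
effective = np.convolve(stage2, stage)
print(effective * 16)                     # -> [1. 2. 3. 4. 3. 2. 1.]
```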
Each sample of a slice preferably contributes the same amount to the final image as any other sample in the same region (assuming all other operations on samples, such as color mapping and compositing, are equal). For example, sample e contributes to pixel A with an effective weight of 1/4 after the cascading of the local Bartlett filters. Likewise, sample i contributes to pixel B with an effective weight of 1/4. Sample f contributes to pixel A with a weight of 3/16 and to pixel B with a weight of 1/16, for a total of 1/4. This can be repeated for samples g and h. The samples to the left of sample e and to the right of sample i partially contribute to pixels left of pixel A and right of pixel B, respectively, such that the sum of their contributions to the final image is also 1/4. In fact, every sample that is in this region
has the same weight. The weight is 1/4 because this region is the second region in the volume. For the first region in the volume, every sample preferably has a weight of 1/2. This is qualifiable by realizing that there are two rays per final image pixel in this region. There are four rays per final image pixel in the second region, etc. Consequently, the weight which determines the contribution of each sample towards the final image is the ratio

$$\frac{\text{image pixels}}{\text{samples in this slice}}$$
Since the ER-Perspective method of the present invention performs slice-order processing, the total amount of computation may be analyzed by calculating the amount of work performed on each slice. Assuming that the work done on each sample is the same, the count of the number of samples processed can be used as a comparison of the workloads. For example, in the oversampling method (see Figure 18B), the number of samples on the rear slice of a volume which ends exactly on a region boundary is N^2. On the front slice, the sample count depends on the geometry of the viewpoint. In particular, using similar triangles and defining e_z as the distance from the viewpoint to the front of the volume, the number of samples taken is

$$\left( \frac{N^2 + N e_z}{e_z} \right)^2$$
This can be generalized for any slice s throtiLgh the voliarne dataset to
N2 +.fV e 2
z
e -s- s
z
Thus, the total count of samples processed lby the oversarzipling method is
N (N2+N.e)2
S-o e, + s
Similarly, the undersampling method (see Figure 18A) can be shown to perform the
following amount of work:

    Sum[s=0..N] ((N*e_z) / (e_z + s))^2

For the ER-Perspective algorithm of the present invention, the analysis is more
complicated. Depending on the viewing geometry, log2((N + e_z)/e_z) regions are
created. It has been shown previously that each of these regions preferably contains
e_z * 2^reg slices. Again, using the geometric principle of similar triangles, the
ER-Perspective algorithm of the present invention processes the following number of
samples:

    Sum[reg=0..log2((N+e_z)/e_z)-1]  Sum[s = e_z*2^reg - e_z .. e_z*2^(reg+1) - e_z]
        ((2N * e_z * 2^reg) / (e_z + s))^2

This formula has an upper bound of

    Sum[s=0..N] (2N)^2

and a lower bound of

    Sum[s=0..N] N^2
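These counts can be tabulated for comparison. The following C program is a sketch
only (the function and variable names are ours, and the per-region ray density in
er_persp follows the reconstructed ER-Perspective sum above); it prints the per-slice
sample counts of the three approaches for a given volume size N and eye distance e_z:

    #include <math.h>
    #include <stdio.h>

    /* Per-slice sample counts for an N^3 volume whose front face is ez
     * voxel units from the viewpoint; slice s is measured from the front. */
    static double over(double N, double ez, double s)
    { double r = (N * N + N * ez) / (ez + s); return r * r; }

    static double under(double N, double ez, double s)
    { double r = (N * ez) / (ez + s); return r * r; }

    static double er_persp(double N, double ez, double s)
    {
        /* Region reg spans slices ez*2^reg - ez up to ez*2^(reg+1) - ez. */
        int reg = (int)floor(log2((ez + s) / ez));
        double r = (2.0 * N * ez * pow(2.0, reg)) / (ez + s);
        return r * r;              /* stays between N^2 and (2N)^2 */
    }

    int main(void)
    {
        const double N = 256.0, ez = 16.0;
        for (double s = 0.0; s <= N; s += 64.0)
            printf("s=%4.0f  over=%12.0f  under=%9.0f  er=%9.0f\n",
                   s, over(N, ez, s), under(N, ez, s), er_persp(N, ez, s));
        return 0;
    }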
Examining the equation for the total count of samples processed by the
oversampling method (given herein above), it can be seen that the oversampling
approach could perform O(N^4) work on the front slice when the viewpoint is very
close to the volume. The oversampling run times grow rapidly as the viewpoint is
moved closer to the front of the volume. Examining the undersampling equation
above, it can be seen that as the viewpoint approaches the front of the volume, the
numerator approaches zero. The amount of work performed on the rear slice also
approaches zero. The run times of the undersampling method decrease as the
viewpoint becomes closer to the volume.
Regardless of the viewpoint geometry, the amount of work performed by the
ER-Perspective algorithm of the present invention is bounded by O(4N^2) and
Ω(N^2) per slice. Some advantages of this approach are that the upper bound on the
run time of the algorithm is linear with the number of voxels and is independent of the
view position, and a lower bound on the image quality achieved is also independent of
the view position. Thus, a user can set the base sampling rate for the desired image
quality and be sure that the sampling rate is sufficient throughout the volume for that
desired image quality.
In contrast, a conventional oversampling approach provides a lower bound on
the image quality, yet the runtime of the algorithm may become much greater than that
of the ER-Perspective method of the present invention. A conventional undersampling
method provides an upper bound on the runtime for rendering, but the image quality
may become much worse than that of the ER-Perspective approach.
Referring again to Figure 23, a preferred back-to-front ER-Perspective
ray-casting algorithm, in accordance with the present invention, is illustrated. The
algorithm of Figure 23 is shown as a pseudo-code representation and assumes a
Z-major axis projection. The ER-Perspective algorithm of the present invention does
not suffer from the traditional pitfalls when performing perspective projections on
uniform regular grids. This unique approach runs faster than oversampling methods
and produces better quality images than undersampling methods. Employing a
Bartlett filter for ray merging provides an image quality improvement over a
conventional box filter. The ER-Perspective algorithm is qualified by characterizing
the effective filtering on the input data.
In accordance with another form of the present invention, a method is
presented for rendering a large volume, wherein the volume dataset exceeds the
physical single-pass capacity of the Cube-5 apparatus of the present invention. The
preferred method subdivides the volume dataset into a plurality of cuboid bricks.
Traversing the bricks in a predefined order preferably enables initialization of the
compositing buffer of the Cube-5 apparatus with a baseplane image of a previous
brick before rendering it, whereby ray path and compositing are logically extended
throughout the entire volume. Information regarding the boundary between bricks is
preferably re-read to ensure correct sampling. Using this approach, the maximum
volume size is limited only by the available intermediate baseplane storage.
In areas of the dataset where, during perspective projection, multiple voxels
contribute to the same image pixel, images of equivalent quality may preferably be
rendered using a level-of-detail (LOD) tree, which may be generated, for example, by
combining voxels of increasing neighborhood size in a pre-processing step. While
perspectively rendering a single large volume utilizing LOD, preferably only a small
portion of the volume, substantially close to the viewpoint, must be read in its highest
detail. The more distant portions of the volume, with respect to the viewpoint, may
then be rendered from lower resolution versions of the data. Thus the frame rate
and/or dataset size is preferably increased. Since each region in the perspective
algorithm of the present invention (previously described) will now be at a different
LOD, there is no longer a need to filter the rays between regions, but merely to
redistribute them. Preferably, only one region of each LOD tree level is processed;
thus, only those regions must be paged into memory.
The level-of-detail (LOD) method of the present invention may also be used
for rendering scenes comprised of multiple objects at differing distances from the
viewpoint. For such cases, a starting LOD is preferably selected that delivers a
baseplane image of about the same size as the screen space image, thereby relating
rendering time to image resolution and not to object size (i.e., scale independence).
Although back-to-front rendering is similarly contemplated by and within the
scope of the present invention, the unique LOD method will be described herein in a
front-to-back rendering context. Rendering front-to-back, it is preferable to start with
a slab of the most detailed representation of the volume to be rendered. In a preferred
method of the present invention, the thickness of the volume slab is chosen so that
projected voxel distances in front and back of the slab differ by a factor of two, similar
to perspective projections according to the present invention, as previously described
herein. After rendering a slab, the current compositing buffer image is preferably
scaled by a factor of 0.5 in the warp unit. This initializes the compositing buffer for
the rendering of the next slab of half the resolution. Preferably, only one slab of each
LOD actually flows through the rendering pipelines; thus, for large volumes, only
those slabs must be paged into the on-board 3D memory.
It is to be appreciated that the apparatus of the present invention can also be
employed to speed up off-line computations, such as generation of level-of-detail
(LOD) representations and filtering of datasets. To generate LODs, the trilinear
interpolation unit (TriLin) of the present invention preferably sets all its weights to
0.5. Once new samples become available, they are preferably subsampled and
compacted into a new volume, which is the next coarser LOD. To filter a dataset, the
trilinear interpolation unit again uses only 0.5 weights; this time, however, data is fed
back to the beginning of the rendering pipeline without compaction. Each additional
pass creates a new filtered volume with a filter kernel having one more voxel extent in
every major axis direction.
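In software terms, setting all trilinear weights to 0.5 and compacting amounts to
averaging each 2x2x2 cell. The following C sketch (our own function name and data
layout, offered for illustration only) produces the next coarser LOD of a cubic 8-bit
volume:

    #include <stdlib.h>

    /* Build the next coarser LOD by sampling with all trilinear weights
     * set to 0.5 (i.e., averaging each 2x2x2 cell) and compacting the
     * result into a half-resolution volume. */
    unsigned char *next_lod(const unsigned char *vol, int n)
    {
        int h = n / 2;
        unsigned char *out = malloc((size_t)h * h * h);
        if (!out) return NULL;
        for (int z = 0; z < h; z++)
            for (int y = 0; y < h; y++)
                for (int x = 0; x < h; x++) {
                    unsigned sum = 0;
                    for (int dz = 0; dz < 2; dz++)      /* 2x2x2 cell */
                        for (int dy = 0; dy < 2; dy++)
                            for (int dx = 0; dx < 2; dx++)
                                sum += vol[((size_t)(2*z+dz) * n + (2*y+dy)) * n
                                           + (2*x+dx)];
                    out[((size_t)z * h + y) * h + x] = (unsigned char)(sum / 8);
                }
        return out;
    }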
For higher quality image rendering, the apparatus and methods of the present
invention preferably provide the flexibility to utilize a full hardware implementation,
multi-pass algorithms, and/or a combination of the two, depending on the desired
tradeoffs. The full hardware implementations and multi-pass methods preferably
provide more accurate computations in two primary functional areas: filtering and
interpolation.
The Cube-4 architecture, a predecessor of the present invention (Cube-5),
utilizes a central difference gradient filter with only two sample points to estimate
each of the x, y and z gradients at a particular location. A larger 3D filter can deliver a
more accurate gradient estimate, such as a Sobel filter (which is a 3^3 filter with
weights derived from the inverse of the Manhattan distance from the center point). A
straightforward hardware implementation of a 3^3 filter, however, requires 27
multipliers and 26 adders.
The apparatus of the present invention presents an alternative to this expensive
prior art approach by using symmetric convolution filters. The convolution filters can
be efficiently implemented with only three multipliers and six adders, at a significant
cost savings. Replication of hardware per gradient component can preferably be
avoided by applying a three-pass algorithm instead. As an example, Figure 25
illustrates a symmetric approximation of the x-component of the Sobel gradient filter.
Within each stage, the weights are preferably applied to the nearest neighbors before
summation. With reference to Figure 25, if each stage operates on the output of a
previous stage instead of on the raw data, the weights presented in Figure 25 will
effectively produce the 3^3 symmetric approximation of the Sobel gradient filter
(right side of Figure 25). Changing the x-weights to {1 2 1} will produce an
approximation of a Gaussian filter instead.
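The separability being exploited can be sketched in software. The following C
fragment is an illustration under our own assumptions (function names, a fixed volume
size, and the classic {1 2 1} smoothing and {-1 0 1} derivative taps; the hardware
stages described above are approximated here by three cascaded one-axis passes):

    #include <string.h>

    #define N 64                     /* illustrative volume size */

    /* Apply a three-tap kernel k along axis (0=x, 1=y, 2=z), reading
     * from src and writing to dst; edge voxels are simply copied. */
    static void pass(float src[N][N][N], float dst[N][N][N],
                     int axis, const float k[3])
    {
        memcpy(dst, src, sizeof(float) * N * N * N);
        for (int z = 1; z < N - 1; z++)
            for (int y = 1; y < N - 1; y++)
                for (int x = 1; x < N - 1; x++) {
                    int dx = axis == 0, dy = axis == 1, dz = axis == 2;
                    dst[z][y][x] = k[0] * src[z - dz][y - dy][x - dx]
                                 + k[1] * src[z][y][x]
                                 + k[2] * src[z + dz][y + dy][x + dx];
                }
    }

    /* Three cascaded passes build the separable approximation of the
     * x-component of the Sobel gradient: smooth in z, smooth in y,
     * then differentiate in x. */
    void sobel_x(float vol[N][N][N], float gx[N][N][N], float tmp[N][N][N])
    {
        const float smooth[3] = { 1.0f, 2.0f, 1.0f };   /* {1 2 1}  */
        const float deriv[3]  = { -1.0f, 0.0f, 1.0f };  /* {-1 0 1} */
        pass(vol, tmp, 2, smooth);
        pass(tmp, gx, 1, smooth);
        pass(gx, tmp, 0, deriv);
        memcpy(gx, tmp, sizeof(float) * N * N * N);
    }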
The present invention contemplates higher quality rendering modes in which
no additional hardware is needed, but in which the frame rate is lowered. One such
example is to achieve larger neighborhood contributions to the gradient estimation by
utilizing level-of-detail (LOD) information. If the central difference gradient is
computed on data of the next coarser LOD, it is effectively the equivalent of
employing a 6x4x2 filter, with 6 being the extent in the direction of the current
gradient component. Since the apparatus of the present invention (i.e., the Cube-5
architecture) is able to hold mip-mapped LOD representations of the data, this filter is
preferably achieved with essentially no increase in hardware, beyond the simple
central difference solution.
Another higher quality multi-pass rendering mode provided by the present
invention, for which no additional hardware is required, is an approximation of
tri-cubic interpolation, which has beneficial applications in the medical field as well
as other fields. This mode enables more accurate resampling and iso-position
calculation. For this, the present invention preferably decomposes a piecewise 4^3-
voxel filter into a series of linear interpolations and extrapolations, which is symmetric
in every dimension, thereby allowing efficient reuse of intermediate results.
In performing higher quality rendering, it is to be appreciated that there are
certain tradeoffs between using additional hardware for providing more accurate and
flexible gradient estimation within the Cube-5 pipeline, as opposed to employing
multiple-pass algorithms. Generally, using a multiple-pass algorithm requires changes
in the Address Generation and Control unit (see Figure 5) of the present invention to
momentarily stall the pipeline for computational purposes, while the hardware
approach requires additional application specific integrated circuit (ASIC) logic and
additional connections to support larger neighborhoods.
With respect to enhanced volume rendering capabilities, a preferred
embodiment of the present invention supports clipping by arbitrary planes. The
distance from each plane may preferably be incrementally computed using only
registers and one adder per plane. In addition to conventional clipping planes, which
define only the positive direction as visible, the apparatus of the present invention
preferably supports extracting an arbitrarily thick slice from the dataset for oblique
multi-planar reformatting (MPR) by invalidating all samples lying outside a
predetermined offset.
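The incremental computation works because volume traversal visits samples in unit
steps, so the signed distance to a plane ax + by + cz + d changes by a constant per
step. A C sketch (our own naming, not the claimed hardware) of classifying one slice
against one plane:

    /* Signed distance to the plane ax+by+cz+d = 0, maintained
     * incrementally: stepping one voxel in x changes the distance by
     * exactly a (likewise b in y, c in z), so traversal needs only one
     * add per plane per step plus a sign test. */
    typedef struct { float a, b, c, d; } plane_t;

    void classify_slice(const plane_t *p, int nx, int ny, float z,
                        unsigned char *visible /* nx*ny flags */)
    {
        float row = p->c * z + p->d;          /* distance at (0, 0, z) */
        for (int y = 0; y < ny; y++) {
            float dist = row;                 /* distance at (0, y, z) */
            for (int x = 0; x < nx; x++) {
                visible[y * nx + x] = (dist >= 0.0f);  /* keep + side */
                dist += p->a;                 /* step one voxel in x   */
            }
            row += p->b;                      /* step one voxel in y   */
        }
    }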
Axis-aligned cutting planes are preferably implemented by restricting the
volume traversal to the cuboid of interest. Alternatively, the present invention
contemplates restricting this traversal to exclude a simple cuboid from the volume
(e.g., visualizing all but one octant of a volume).
In addition to clipping, the present invention further contemplates depth
cueing, which modulates the color of objects to simulate, for example, atmospheric
attenuation of light through a translucent medium. This phenomenon, as appreciated
by those skilled in the art, is termed fog or haze when the medium also contributes
some color (e.g., white or gray). To implement this feature in accordance with the
present invention, normally clear regions are preferably replaced with a semi-
transparent color (e.g., black for depth cueing, white for fog) by modifying the
transfer function. Each final pixel is preferably further attenuated to account for the
distance from the viewpoint to the surface of the volume, preferably implemented as a
part of warping.
The apparatus of the present invention additionally supports rendering of
super-sampled images with a preferred default super-sampling rate of two in the x and
y directions, although other sampling rates are contemplated. To improve image
quality further, the sampling rate along each ray can also be increased. Neither
approach requires re-reading voxels from the 3D memory. The apparatus of the
present invention preferably changes the volume traversal order so that voxels already
residing in the buffers will be read out repeatedly. Each time they are reused, new
weights are preferably utilized in the trilinear interpolation units (TriLin) of the
present invention to reflect the new resampling position.
In a preferred embodiment for supersampling in the present invention, central
difference gradients are computed between neighbors one distance unit apart to ensure
sufficient precision. These gradients are preferably computed by taking the difference
first and interpolating afterwards or, alternatively, by interpolating first and then
taking the difference between neighbors k positions apart (assuming k times
oversampling), and preferably not immediate neighbors. A classification stage must
consider the new inter-sample distances when computing a new α' value. Therefore,
during super-sampling, the volume will preferably be traversed in an interleaved
pattern within each slice. This essentially ensures that a translucent material (gel)
keeps its accumulated opacity (α value) independent of the sampling rate. Thus,
for example, for an oversampling factor of k in the z-direction, modified α' values are
preferably used, where:

    α' = 1 - (1 - α)^(1/k)
Anisotropic datasets have different distances between samples along different
axes. Thus, the gradient computation and the final two-dimensional (2D) image warp
preferably require axis-dependent scaling factors. In addition, the direction in which
the sight rays are being cast through the volume dataset preferably requires adjustment
to account for the implicit volume scaling, which occurs when storing anisotropic data
in an isotropic grid. The α value is preferably adjusted according to the
direction-dependent distance d which a sight ray travels through a voxel cell. The
corrected α' is

    α' = 1 - (1 - α)^d

with the direction-dependent distance d preferably being in the range [1, √3].
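Both opacity corrections have the same form. The following C helpers are a sketch
(the function names are ours) implementing the supersampling and
anisotropic-distance variants of the adjustment:

    #include <math.h>

    /* Opacity correction for k-times oversampling along a ray: each of
     * the k sub-samples must composite to the same accumulated opacity
     * as one original sample: alpha' = 1 - (1 - alpha)^(1/k). */
    float alpha_supersample(float alpha, int k)
    {
        return 1.0f - powf(1.0f - alpha, 1.0f / (float)k);
    }

    /* Opacity correction for anisotropic traversal: d is the direction-
     * dependent distance a sight ray travels through a voxel cell, in
     * the range [1, sqrt(3)] for a cubic cell. */
    float alpha_distance(float alpha, float d)
    {
        return 1.0f - powf(1.0f - alpha, d);
    }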
In addition to the methods for enhancing volume rendering capabilities
described herein above, the present invention further provides several unique methods
for universal three-dimensional (3D) rendering, including mixing polygons and
volumes, voxelization of polygons, rendering multiple overlapping volumes,
performing texture mapping and accelerating image-based rendering. These methods
are described in greater detail herein below.
An important aspect of the present invention is its unique ability to correctly
mix geometric objects (i.e., polygons) and volumes in a single image. The apparatus
of the present invention (i.e., Cube-5) preferably leverages conventional geometry
hardware to render opaque and translucent polygons together with the Cube-5 volume
rendering pipeline.
In a preferred method according to the present invention, to render a scene
containing volumes and opaque polygons, all opaque polygons are first projected onto
a Z-buffer coincident with a predefined baseplane and having sufficient resolution to
match the volume sample distance. Using the Z-buffer, a determination is preferably
made as to which slices of the volume are in front of the polygons for each pixel of
the baseplane image. The compositing buffer is then preferably pre-loaded (i.e.,
initialized) with this projected RGBαZ (i.e., Z-buffer) image, representing the color
and depth image of the polygons. Subsequently, the volume is rendered with
Z-comparison enabled in the compositing buffer. The depth values of the opaque
polygons are checked to keep volume samples which are hidden by opaque polygons
from contributing to the final image. Ultimately, the opaque polygons occlude the
volume behind them, and the volume in front correctly composites over the polygons.
In other words, the compositing buffer is pre-loaded with the Z-buffer image
{C_z, Z_z}, in accordance with the preferred method of the present invention, where
C_z represents the value of the geometry sample and Z_z represents the depth of the
geometry sample from a predetermined viewpoint. During back-to-front compositing,
the resulting output pixel in the compositing buffer, C_out, will preferably be equal to
the geometry sample value, C_z, when the volume sample is behind the geometry
(i.e., when the depth of the volume sample, Z_s, is greater than the geometry depth,
Z_z). Similarly, during front-to-back compositing, the samples are preferably
composited using the Porter-Duff over operator, as appreciated by those skilled in the
art. A more detailed discussion of the Porter-Duff compositing rules is provided, for
example, in the text Compositing Digital Images, by T. Porter and T. Duff, Computer
Graphics (SIGGRAPH 84), vol. 18, no. 3, pp. 253-259, July 1984. Therefore, the
resulting output pixel in the compositing buffer, C_out, will preferably be equal to the
volume sample value, C_s, over the geometry sample value, C_z, when the volume
sample is in front of the geometry (i.e., when the depth of the volume sample, Z_s, is
less than the geometry depth, Z_z).
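The per-sample rule can be sketched as follows (an illustration with our own
structure and function names; back-to-front compositing with the over operator and
the geometry depth test described above):

    typedef struct { float r, g, b, a; } rgba_t;

    /* Back-to-front compositing of one volume sample against a coxel
     * pre-loaded with the opaque-geometry color C_z and depth Z_z.
     * A sample behind the geometry (z_s > z_z) is discarded; a sample
     * in front composites over the running coxel color. */
    rgba_t composite(rgba_t coxel, float z_z, rgba_t sample, float z_s)
    {
        if (z_s > z_z)                 /* hidden by opaque geometry */
            return coxel;
        rgba_t out;                    /* sample OVER coxel         */
        out.r = sample.r * sample.a + coxel.r * (1.0f - sample.a);
        out.g = sample.g * sample.a + coxel.g * (1.0f - sample.a);
        out.b = sample.b * sample.a + coxel.b * (1.0f - sample.a);
        out.a = sample.a + coxel.a * (1.0f - sample.a);
        return out;
    }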
Translucent polygons pose a more complicated situation, since all fragments
(both translucent polygon pixels and volume samples) must be drawn in
topologically
depth-sorted order. This is required because compositing translucent fragments
with
the over operator is not commutative. Therefore, polygons must be re-depth-
sorted
whenever the scene or viewing geometry changes. Additionally, the sorting must
be
topologically correct, including the handling of depth cycles.
Although there are proposed architectures which use an A-buffer to provide
some hardware sorting support, implementing an A-buffer in hardware allows
only
limited depth complexity (i.e., number of overlapping polygons per pixel) in a
single
pass and is costly. A discussion of a conventional A-buffer algorithm may be
found,
for example, in the text The A-Buffer, an Antialiased Hidden Surface Method,
by L.
Carpenter, Computer Graphics (SIGGRAPH 84), vol. 18, no. 3, pages 103-108,
July
1984.
In a preferred method, the present invention adapts polygon rendering to
slice-order ray casting, and synchronizes the overall rendering process on a volume
slice-by-slice basis, rather than a polygon-by-polygon or pixel-by-pixel basis. The
Cube-5 apparatus preferably utilizes the geometry pipeline and conventional graphics
hardware to render geometric objects in thin slabs that are interleaved or dove-tailed
between slices of volume samples 212, as illustrated in Figure 26.
With reference now to Figure 26, each slice of the volume is preferably
sampled in planes perpendicular to the volume storage axes. The planes are drawn in
depth order (e.g., using near and far clipping planes) from farthest from the eye or
viewpoint 214 to nearest to the eye. Therefore, to mix translucent polygons with
volumetric data, thin slabs of the polygons 210 are preferably rendered and
composited in between the slices of volume samples 212. It is to be appreciated that
the slabs 210 represent all of the translucent objects which lie between two
consecutive slices of the volume sample planes. The boundaries of the slabs are
preferably defined such that the union of all rendered slabs 210 neither misses nor
duplicates any region (e.g., [ ), [ ), ..., [ ], as shown in Figure 26). The data from the
volume slices and the translucent polygonal slabs 210 are dove-tailed together in an
alternating fashion. In this manner, the correct depth ordering of all contributing
entities is preserved, and use of the over operator to composite them creates correct
colors in the final image pixels.
In accordance with a preferred method of the present invention, the opaque
polygons are drawn first with Z-buffering. Before drawing any volume slices, the
translucent polygons which lie behind the volume extent are preferably drawn over
the opaque polygons using any conventional translucent polygon rendering algorithm
(e.g., painter's). Likewise, translucent polygons which lie in front of the volume are
preferably drawn after the mixing portion of the algorithm. Polygons which lie
depth-wise within the volume boundary, but to the top/bottom/side of the volume, are
preferably drawn in slice order as if the volume slices were planes that extend to
infinity, cutting the translucent polygons.
OpenGL may be used to directly render the thin slabs of translucent
polygonal objects. The polygons are preferably shaded using the Gouraud shading
model included in OpenGL. A naive approach would be to render the complete set
of translucent polygons for every slab and set the hither and yon clipping planes to
cut the current thin slab of data. However, for an n^3 volume, there could be up to n
thin slabs that must be rendered. Since a typical scene contains very few polygons
which span all of the thin slabs, the present invention contemplates an alternative
approach which would involve clipping the polygons to the slab boundaries and only
rendering the portions of the polygons within each slab. This would substantially
reduce the processing load on the polygon pipeline. However, it would require the
application to clip every polygon against the two planes of each thin slab which
contains that polygon.
As illustrated in Figure 27, it is contemplated that the present invention may
take advantage of the fact that the two clipping planes 216, 218 are parallel
to keep
only the portions of the polygons which lie between the planes. While this
creates
fewer polygons than clipping against each plane separately, it still can
increase the
triangle count dramatically. The first case occurs when a triangle 220
intersects the
thin slab, but no vertices are within the slab boundaries 216, 218. When this
occurs,
one vertex must be on one side of the slab and the other two vertices on the
other side
of the slab, thus creating a trapezoid which is decomposed into two triangles.
Next,
consider when one vertex of a triangle is within the slab. In one situation, a
triangle
222 intersects the slab such that the remaining two vertices lay on the same
side of the
current slab, creating only one triangle. In a second situation, a triangle
224 intersects
the slab such that the remaining two vertices lay on opposite sides of the
current slab.
This is a worst case situation, since it produces a pentagon, or three
triangles. The
final case occurs when a triangle 226 intersects the slab such that two
vertices lie
within the same slab and, once again, a trapezoid is created resulting in two
triangles.
In a preferred embodiment of the present invention, a bucket sorting method is
applied to the translucent polygons. Whenever the viewing geometry changes, the
volume sample planes change their relative positions to the geometry.
Therefore, the present invention preferably creates a bucket for each thin slab
between two volume sample planes. All of the translucent polygons in a scene are
preferably traversed, and each of the polygons is placed in a bucket for each of the
slabs it intersects. For example, as shown in Figure 28, triangle T1 is placed in all six
buckets since it spans all six slabs S1-S6. Triangle T2 is placed in buckets
corresponding to slabs S2 and S3, and likewise for the remaining triangles. For the
example shown in Figure 28, bucketing the four triangles T1-T4 would result in
twelve triangles being sent to the graphics pipeline. As a comparison, if the triangles
were being clipped to the slab boundaries, twenty triangles would be sent to the
graphics pipeline.
An alternative to bucketing is to create an active triangle list similar to the
active edge list utilized in scan converting polygons. The triangles may be placed in
the active list at the first slice they intersect and removed from the list when they no
longer intersect any slices. A data structure is preferably pre-computed which
indicates which slice each triangle first encounters. This preprocessing is essentially
the same as for bucketing, with the exception that bucketing does not have to check
for triangle removal at each slice.
One advantage of the method of the present invention is that for applications
which choose to trade off image quality in order to maintain a predetermined frame
rate, the number of polygons drawn decreases as the number of slices drawn for the
volume decreases. This occurs because the interslice size increases as the number of
volume slices decreases. The rendering rate achieved is substantially proportional to
the number of polygons drawn and the number of volume samples drawn (which is
proportional to the number of volume slices drawn). The image quality degradation
resulting from this tradeoff affects only the volume data, similar to taking fewer
samples in any volume rendering algorithm.
When mixing translucent geometries and volumes, there exist at least three
options for handling two or more translucent polygons being drawn to the same pixel
within one thin slab. In the first option, the polygons could be drawn in regular
processing order with the over operator. While this method may produce the incorrect
color, the amount of color error is limited because the polygons are still
sorted by
bucketing them into thin slabs.
Another method for handling two or more translucent polygons is to draw thin
slabs of translucent polygons between two volume sample slices as on-the-fly
voxelization. In conventional voxelization methods, when a surface is 3D scan
converted into a 3D volume grid, the resolution of the grid is commonly chosen such
that the size of a single voxel represents the smallest area that can be discerned by the
human eye when it is rendered. In the X and Y dimensions, the polygons are drawn to
screen resolution. In the Z dimension, it is assumed that the volume is being rendered
with enough slices such that each volume sample also represents the smallest area that
can be discerned by the human eye. Therefore, each pixel bounded by two volume
slices in the Z dimension also represents this small area.
In view of the foregoing, a method, performed in accordance with one
embodiment of the present invention, may be viewed as computing on-the-fly
voxelization by utilizing 3D graphics hardware. Voxelization methods combine
polygons into a single voxel by using one of two preferred methods. The first method
is to take the max of each color channel. The second method is to take the
weighted max as

    C_v = (C_p1 * D_p1 + C_p2 * D_p2) / (D_p1 + D_p2)

where C_p1 is the color of a first polygon (polygon 1), D_p1 is the density of
polygon 1, and C_v is the color assigned to the voxel. Many OpenGL
implementations, for example, allow max blending with
glBlendEquationEXT(GL_MAX_EXT). Assuming that the density is equal to the
alpha value (e.g., a linear ramp transfer function for volume rendering), then the colors
may preferably be weighted by their alpha values before blending by using
glBlendFunc(GL_SRC_ALPHA, GL_ONE). However, OpenGL is not able to
compute the complete previous equation since it cannot divide by the sum of the alpha
values after accumulating them.
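The implied state setup is short; the following sketch assumes the EXT_blend_minmax
extension is available (on many platforms its entry point must be loaded at run time),
and note that conforming implementations ignore the blend factors under the MAX
equation, so the alpha weighting described above depends on implementation
behavior:

    #include <GL/gl.h>
    #include <GL/glext.h>

    /* Configure max blending of alpha-weighted colors for on-the-fly
     * voxelization. The final division by the accumulated alpha is not
     * expressible in this fixed-function path. */
    void setup_max_blending(void)
    {
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE);   /* weight colors by alpha */
        glBlendEquationEXT(GL_MAX_EXT);      /* per-channel maximum    */
    }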
CA 02337530 2001-01-11
WO 00/04505 PCT/tJS99/16038
72
The third method of drawing two or more translucent polygons to the same
pixel within one thin slab may also be considered the most accurate approach. By
utilizing one of the previously described methods of the present invention to perform
depth sorting, such as a BSP tree, proper ordering of all translucent polygons within
each slab is maintained. Depth cycles are preferably handled by the BSP algorithm by
splitting polygons which span a plane used in the partitioning, and eventually one of
the polygons in the cycle is used as the partitioning plane.
As previously discussed, an important feature of the present Cube-5 invention
is the unique ability to couple at least one geometry pipeline or engine to the Cube-5
system. In accordance with the present invention, two preferred methods of
connecting one or more geometry pipelines to the claimed Cube-5 system on PC-class
machines are provided, as described herein below. Both methods allow the unique
mixing of opaque and/or translucent polygons with volumetric data.
It is to be appreciated that the opaque polygons are preferably rendered such
that, after projection through the volume dataset, warping creates the correct footprint
on the final image. Furthermore, the Z-depth values are preferably aligned along the
processing axis, so that a volume slice index may be used for the Z-depth check.
In accordance with one embodiment of the present invention, a preferred
method begins by determining a major viewing axis for the current viewing direction.
As illustrated in Figure 29, a transformation is preferably applied to the geometry 228
so that the major viewing axis 230 is along, for example, the Z-axis. Next, the view
or eye point 232 is moved to be along this direction, preferably by rotating the vector
between the look-at point 234 and the eye point 232 by a predefined angle α around
the X-axis and an angle β around the Y-axis. Preferably, α and β are always in a
range between -45 and +45 degrees; otherwise, a different baseplane would be chosen.
A Z-slice shear transformation along X and Y (also known as an "X and Y according
to Z" shear) is preferably subsequently applied to the viewing matrix as follows:
    1  0  tan α  0
    0  1  tan β  0
    0  0    1    0
    0  0    0    1
With this geometry, when the opaque polygons are drawn, the polygon
footprints are "prewarped" so that the warping operation at the end of Cube-5
rendering creates correct polygons in the final image. Additionally, the Z-depths
computed are preferably proportional to the distances along the processing axis. It is
possible (e.g., if all opaque geometry fits within the volume extents) to set the hither
and yon clipping planes to the edges of the volume and, if the precision of the depth
buffer is the same, the depths computed are exactly the volume slice indexes for depth
checking. Otherwise, a simple scaling must be applied when the computed depths are
utilized by the volume rendering system. Light positions should be considered when
using this method, however, as the shearing may not move the lights to the correct
location.
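Constructing the shear is straightforward; a sketch in the column-major 4x4
convention used by OpenGL (our own function name, with α and β assumed already
limited to ±45 degrees as described):

    #include <math.h>

    /* Build the "X and Y according to Z" shear, column-major:
     * x' = x + z*tan(alpha), y' = y + z*tan(beta); z, w unchanged. */
    void z_slice_shear(float m[16], float alpha_rad, float beta_rad)
    {
        for (int i = 0; i < 16; i++)
            m[i] = (i % 5 == 0) ? 1.0f : 0.0f;   /* identity */
        m[8] = tanf(alpha_rad);   /* column 2, row 0: shear x by z */
        m[9] = tanf(beta_rad);    /* column 2, row 1: shear y by z */
    }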
The thin slabs of translucent polygons preferably align geometrically with
their 3D positions in space. Preferably, the eye point is first aligned as previously
described. Next, in order to keep the objects from projecting all the way to the final
image plane, the geometry is preferably translated such that the center of the current
thin slab is at the Z=0 plane prior to shearing. Clipping planes allow only the current
thin slab to be rendered, and the projection plane is set to be within the two volume
slices which border that region with, for example, glOrtho (glFrustum for
perspective).
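One way to express this setup (a sketch; the slab half-thickness and image dimensions
are illustrative parameters of ours):

    #include <GL/gl.h>

    /* Restrict rendering to the current thin slab: after the geometry
     * has been translated so the slab center lies at Z = 0, an
     * orthographic projection whose near/far planes coincide with the
     * two bounding volume sample slices clips away everything else. */
    void set_slab_projection(double half_w, double half_h, double half_thick)
    {
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        /* near/far bracket the slab symmetrically about Z = 0 */
        glOrtho(-half_w, half_w, -half_h, half_h, -half_thick, half_thick);
        glMatrixMode(GL_MODELVIEW);
    }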
Important to comprehending the present invention is an understanding of the
organization of the frame buffer design and the compositing buffer design. As
illustrated in Figure 30, the Cube-5 volume rendering pipeline 236 of the present
invention preferably utilizes a tightly coupled on-chip SRAM buffer 238, termed a
compositing buffer, to hold the partially composited rays as a volume is processed in
slice order.
This architecture exploits the regular processing sequence inherent in slice-order
rendering. Specifically, each slice of the volume 240 is preferably processed in the
same order as the previous: left-most voxel to right-most voxel of each row, and
bottom-most row to top-most row of each slice (possibly with some skewing). In this
way, the SRAM compositing buffer 238 becomes a simple FIFO queue having a
length equal to the size of a slice. The SRAM queue is preferably 32 bits wide to hold
8-bit fixed point RGBα values (called coxels). Each pipeline 236 preferably reads a
coxel from the front of the queue and writes a coxel to the rear of the queue for each
clock cycle.
In contrast, with reference now to Figure 31, conventional PC-class geometry
pipelines 242 utilize an external DRAM frame buffer 244, which stores the RGBα
color values and Z-depth values for each pixel. This frame buffer 244 must support
random access, since polygon rendering does not enjoy the regular access ordering
inherent in slice-order volume rendering. Normal polygon rendering produces
triangles on the screen averaging between 10 and 50 pixels in size. Therefore, the
DRAM memory is organized to maximize access to areas of the screen of this size.
As shown in Figure 31, when the 3D texture mapping method of the present
invention is implemented on geometry pipelines 242, volume slices 246 perpendicular
to the screen are texture mapped through the volume. The per-vertex geometry
calculations for the volume slices 246 are easily achievable with any level of graphics
hardware. However, the requirement to support random access to both the texture
memory 248 and frame buffer 244 limits the performance of this approach to the fill
rate achievable with a DRAM frame buffer.
Very high-end surface graphics systems typically utilize massive parallelism
in the fragment processing section 250 of the polygon pipeline. This, coupled with a
highly distributed frame buffer, allows increased fill rate performance.
In Figure 32 there is shown one embodiment for connecting a geometry
pipeline 242 to the Cube-5 volume rendering system 252, according to the present
invention. As illustrated in Figure 32, the SRAM compositing buffer is preferably
removed from inside the Cube-5 pipeline 252 and replaced with an external DRAM
frame buffer 254. Rather than organizing the DRAM frame buffer 254 as in
conventional polygon engines, the memory in the frame buffer of the present
invention is preferably organized so that it is specifically optimized for volume
rendering. The frame buffer 254 is also preferably accessible from a 3D graphics
pipeline 242 to allow mixing of polygonal data 256 with volumes.
With continued reference to Figure 32, the dual-use frame buffer 254
preferably connects the two pipelines 242, 252. In a preferred method, to render a
scene with both opaque and translucent polygons and also volume data, the geometry
pipeline 242 first renders all opaque polygons with Z-depth. The volume slices,
stored in volume memory 258, and thin slabs of translucent polygons are then
rendered in an alternating (e.g., dovetailing) fashion - volume slices by the Cube-5
pipeline 252 and translucent polygons by the graphics pipeline 242 (opaque polygons
may also be rendered with the same dovetailing algorithm, but with increased demand
on the graphics pipeline).
Z-depth checking is preferably utilized to ensure correct hidden object
removal, and blending is set in both pipelines to correctly composite the samples and
fragments. The geometry engine 242 preferably performs the final baseplane warp
required by the Cube-5 system of the present invention.
The design of the DRAM buffer 254 is critical to achieving, for example, the
503 million samples per second required for 30Hz rendering of 256^3 volume
datasets. Therefore, it is helpful to first create a DRAM buffer for the Cube-5
rendering pipeline itself, before discussing connecting the rendering pipeline to a
graphics pipeline. The volume rendering system of the present invention is preferably
comprised of multiple Cube-5 pipelines. In each rendering pipeline, at every clock
cycle, a coxel (compositing buffer element consisting of RGBα) is read from the
SRAM compositing buffer FIFO and blended with an appropriate compositing
equation. The new coxel is then placed at the rear of the FIFO. In a preferred
embodiment, the
structure of a coxel is changed to contain 32 bits of color (8 bits for each of RGBα)
and 32 bits of Z-depth information (a 24-bit depth plus an 8-bit stencil). This
configuration is required to handle Z-depth checking in the compositing stage.
Assuming that opaque polygon rendering is completed before any volume rendering
begins, the 32 bits of Z-depth/stencil information is read, but not re-written.
Therefore, for every clock cycle, each Cube-5 pipeline needs to read 8 bytes of coxel
data and write back 4 bytes.

Preferably, the rendering pipeline of the present invention utilizes memory
chips with a word size of 16 bits. Using this configuration, four words must be read
by each pipeline every cycle and two words must be written. To do this would require
six 16-bit memory interfaces per pipeline. An emerging technology in synchronous
DRAM (SDRAM) chips, of which the present invention may avail itself, is known as
double data rate (DDR), which reads/writes data at both the rising and falling edges of
the clock. Using DDR SDRAMs, the present invention can utilize two 16-bit memory
interfaces for reading 64 bits of data per clock and one 16-bit memory interface for
writing 32 bits per clock, for a total of three 16-bit memory interfaces per pipeline.
With reference now to Figure 33, since a read and a write must be performed
every clock cycle in order to keep the pipeline full, the present invention preferably
reads from one set of frame buffer chips (e.g., set A) 260 and writes to another set
(e.g., set B) 262. The Cube-5 system contemplates reading from set A 260 and
writing to set B 262 for a complete slice of the volume, and then swapping for the
next slice. With this approach, however, each frame buffer chip set would have to be
large enough to hold the complete frame buffer. Furthermore, the polygon engine
would have to be instructed as to which set is the current set. Therefore, in a preferred
embodiment, the present invention alternates reading and writing between set A 260
and set B 262 within a frame, and buffers the processed coxels from the read set until
it becomes the write set. Since every memory access must be a burst, each burst
actually lasts four clock cycles and reads/writes four coxels (i.e., eight words) with
16-bit DDR DRAM chips. The Cube-5 system preferably cycles through all four
banks to keep the memory bandwidth saturated before writing the new RGBα values
back.
For this reason, there is preferably a 16-coxel FIFO queue 264 (four coxels for each of
four banks) in which the newly composited RGBα portions of the coxels are stored.
There are many different possible configurations for the number of pipelines,
etc., in the Cube-5 volume rendering system of the present invention. An example for
a case of four parallel pipelines creating 12 total memory interfaces will now be
discussed with reference to Figure 33. As shown in Figure 33, each pipeline contains
one read interface 266 to the Z-depth/stencil portion 268 of the frame buffer and two
read/write interfaces 270 and 272 to set A 260 and set B 262, respectively, of the
RGBα portion of the frame buffer. To render a 256^3 volume at 30Hz, each of the
four pipelines processes 125 million voxels per second. Therefore, a 133 MHz clock
is utilized for the chip and the SDRAM. The mapping of the frame buffer pixels onto
the memory chips is critical to the performance. It must match exactly the processing
order of the Cube-5 pipelines and the parallel access by the four pipelines
substantially simultaneously. It is assumed here that the skewed memory access of
the Cube-5 architecture is "un-skewed" so that the volume samples are in order from
left to right across each scanline in groups of four, since this is easier to follow in the
explanations. The design can be extended to skewed memory, although the geometry
pipeline and screen refresh system must be aware of the additional skewing.
Figure 34 shows a preferred layout of the RGBα portion of the coxels in the
frame buffer. For a given scanline 274, there is a group of pixels which reside in set
A 276 followed by a group of pixels which reside in set B 278, repeated across the
entire scanline 274. The length of each set is 64 pixels due to the fact that each set
must contain pixels which are read from four different banks inside each chip, and
each bank consists of four RGBα values from four parallel chips/pipelines. Thus the
pixel data in the frame buffer is interleaved across eight chips; in fine detail, it is
really interleaved across only four chips. This provides an interface which reads

    4 pipelines x (1 RGBα chip + 1 depth chip) x 16 bits x 133MHz
        x 2 data rate = 34 Gbits = 4.25 GBytes
of data per second. This surpasses the required

    256^3 x 30Hz x 8 bytes = 3.75 GBytes per second

where eight bytes are organized as four bytes of RGBα plus four bytes of
Z-depth/stencil. Additionally, the frame buffer sub-system is capable of writing

    4 pipelines x 1 RGBα chip x 16 bits x 133MHz x 2 data rate
        = 17 Gbits = 2.1 GBytes

once again handling the

    256^3 x 30Hz x 4 bytes = 1.8 GBytes per second

required for real-time 30Hz rendering of 256^3 volumes.

This extra bandwidth is not sitting idle. The screen must be refreshed from the
data in the frame buffer. If we assume a 1280 x 1024 screen resolution with a 60Hz
refresh rate, and that all four RGBα bytes are read from the frame buffer (since the
burst mode access retrieves them anyway), then

    1280 x 1024 x 60Hz x 4 bytes = 300 MBytes

are read from the frame buffer per second. Only the RGBα portion of the frame
buffer is required for refresh. Therefore, the refresh data is read from eight chips. It
is sufficient to perform ten data burst reads/writes (depending on set A or set B) to
each chip followed by one read of data for refresh. This distribution of memory
accesses provides the refresh hardware with a consistent (although bursty) stream of
data. The Cube-5 pipelines also contain a small percentage of excess cycles, and thus
will not lose the ability to achieve 30Hz 256^3 rendering when the memory
sub-system is
temporarily stalled for refresh.
An alternative approach to connecting a graphics pipeline to the Cube-5
volume rendering pipeline, in accordance with a preferred embodiment of the present
invention, will now be described. This preferred connection approach keeps both the
graphics pipeline and the volume rendering pipeline working at all times and merges
the data in the SRAM compositing buffer inside the Cube-5 chip. At any given time,
the volume rendering pipeline is compositing the current volume slice with the
previous thin slab of polygon data in the compositing buffer, and the graphics
pipeline is rendering the next thin slab of translucent polygons.
The method described herein still utilizes the unique approach of dovetailing
volume slices and thin slabs of translucent polygonal data, as previously described
herein above. In a first step, all opaque polygons are projected onto a Z-buffer
coincident with the baseplane (e.g., the volume face most parallel to the screen).
Next, the projected RGBαZ image is loaded into the compositing buffer of the volume
rendering pipeline. Subsequently, the volume is rendered with a Z-comparison
enabled in the compositing stage. The thin slabs of translucent polygons are
preferably rendered by the geometry pipeline, and their corresponding RGBα data is
sent to the volume pipeline of the present invention to be blended into the SRAM
compositing buffer within the volume pipeline.
Preferably, the compositing stage of the volume rendering accelerator is
modified to composite two layers (one volume and one translucent polygon) per step,
thus not delaying the volume rendering process. This requires the addition of some
extra logic. The straightforward formula for performing a double composition of a
volume sample v over a translucent pixel fragment p over the old coxel c would
require four additions and four multiplies in five stages:

    C' = C_v*α_v + [C_p*α_p + C_c*(1 - α_p)] * (1 - α_v)

However, employing simple math allows the double composition to be calculated
with four additions and two multiplies in six stages with the following formula (some
of the calculations are re-used):

    C' = (C_c + (C_p - C_c)*α_p) + [C_v - (C_c + (C_p - C_c)*α_p)] * α_v
As appreciated by one skilled in the art, the hardware designer would choose the
option more desirable for a given implementation (i.e., less logic and more stages, or
fewer stages and more logic).
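That the two formulas agree can be checked numerically; a small C sketch (our own
function names) evaluates both forms for one channel:

    #include <stdio.h>

    /* Five-stage form: v over (p over c); 4 adds, 4 multiplies. */
    static float direct(float cv, float av, float cp, float ap, float cc)
    {
        return cv * av + (cp * ap + cc * (1.0f - ap)) * (1.0f - av);
    }

    /* Six-stage refactoring with a reused subexpression: 4 adds,
     * 2 multiplies, algebraically identical to direct(). */
    static float refactored(float cv, float av, float cp, float ap, float cc)
    {
        float t = cc + (cp - cc) * ap;     /* p over c, reused twice */
        return t + (cv - t) * av;
    }

    int main(void)
    {
        float a = direct(0.9f, 0.5f, 0.3f, 0.25f, 0.1f);
        float b = refactored(0.9f, 0.5f, 0.3f, 0.25f, 0.1f);
        printf("%f %f\n", a, b);           /* identical values */
        return 0;
    }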
Consider the amount of data transferred for a 256^3 volume. There are
preferably 255 slabs plus one buffer in front of the volume and one buffer behind the
volume. Each of these 257 slabs contains 256KB (256^2 pixels of RGBα) of data.
This equates to 64MB being read from the frame buffer and transferred between the
two sub-systems each frame. To achieve a 30Hz frame rate would require a
bandwidth of 1.9GB per second. While this much data could be transferred with
sufficiently wide channels, it must also be read from the frame buffer. It would be
virtually impossible to read this much data without changing the organization of
current DRAM frame buffers. Additionally, the frame buffer must be cleared 257
times per frame.
To solve this bandwidth challenge, the present invention preferably uses
run-length encoding (RLE) of the blank pixels. With this method, each scanline is
encoded separately, and a "run-of-zeros" is encoded as four zeros (RGBα) followed
by the length of the run. Since typically only a small percentage of the polygons in a
scene are translucent, the translucent polygon slabs will be relatively sparse.
Run-length encoding just the blank pixels in these thin slabs results in over 99%
reduction in the required bandwidth. Preferably, the method of the present invention
utilizes RLE on 2D images of sparse translucent polygons to save on bandwidth.
Using this preferred method requires adding hardware to the Cube-5 system of
the present invention. Specifically, additional hardware may be included in the
volume rendering pipeline that can decode the RLE input stream and create RGBα
fragments. However, since these fragments are utilized by the volume pipeline in a
regular order, it is preferable to decode the input stream using a double buffer to
synchronize the two pipelines. Every clock cycle, a value is output from the decoding
hardware. If the volume rendering machine has multiple pipelines (as most current
designs do), the decoding hardware is preferably replicated for each pipeline so that
it can keep up with pixel demand.
Likewise, RLE hardware at the originating end, connected to the geometry
pipeline, may encode the data in real time before sending it to the volume pipeline.
However, 1.9GB per second access to the frame buffer would still be required to read
all the thin slabs of translucent polygons and clear the frame buffer 257 times per
frame. Therefore, a separate frame buffer is preferably employed which stores the
data directly in RLE format. Since the thin slabs of translucent data are very sparse,
more time is spent clearing and reading than is spent rasterizing. An RLE buffer,
while generally not optimal for rasterization, is well suited for both clearing and
reading the data. For example, to clear an RLE frame buffer requires merely storing a
single run of zeros (in five bytes) for each scanline, instead of writing an entire
256^2 frame buffer.
To minimize the impact on current geometry pipelines, the RLE frame
buffer is preferably implemented using the emerging technology of embedded DRAM
and connecting it in parallel to the normal frame buffer. This differs from
conventional encoding algorithms, which typically assume that the data is given in
physical order. Triangle rasterization, however, does not guarantee any ordering of
the fragments. Therefore, the apparatus of the present invention must be able to
randomly insert an RGBα value into an RLE scanline of data.
Figure 35 illustrates a diagram of an RLE insert, formed in accordance with
the present invention. For each fragment, the encoded scanline is copied from one
buffer to another, inserting the new RGBα value. Every clock cycle, a single flit (i.e.,
either an RGBα pixel or a run-of-zeros) is processed. The entire scanline is preferably
processed flit by flit. With reference to Figure 35, an input buffer ("in Buffer") 280
holds the current encoded scanline and an output buffer ("out Buffer") 282 holds the
newly encoded scanline with the new RGBα fragment inserted. The choice of what to
insert at each cycle is preferably performed by a 5-byte multiplexer 284. The
apparatus of the present invention preferably includes pointers, namely "inPtr" 286
and "outPtr" 288, which point to the current flit of both the in buffer 280 and out
buffer 282, respectively. The logic on the right side of Figure 35 calculates how much
has been processed ("Total") 290 and two of the control points ("ctrl_1" and
"ctrl_3"). The other mux control point ("ctrl_2") is calculated by OR-ing together all
of the RGBα values (the flag for a run-of-zeros). "xPos" is defined as the x position
of the fragment. Preferably, a lookup table is implemented of where the current
buffer is located in memory for each y value. Thus, the buffer can be moved while
inserting new pixels and the table is simply updated. This preferred method is
illustrated in the AddFragment pseudo-code routine of Figure 36. With continued
reference to Figure 36, the RLE_AddPixelToScanline function demonstrates the
processing that occurs in the hardware embodiment of the present invention shown in
Figure 35.
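A software analogue of the insert is sketched below (the flit struct, its 16-bit run
field, and the function name are our own simplifications of the 5-byte hardware flit
described above; the hardware packs the run length into the zeroed color bytes):

    #include <stdint.h>

    /* One flit: if r, g, b, a are all zero it is a run-of-zeros whose
     * length is held in 'run'; otherwise it is a single pixel. */
    typedef struct { uint8_t r, g, b, a; uint16_t run; } flit_t;

    static int is_run(flit_t f) { return (f.r | f.g | f.b | f.a) == 0; }

    /* Copy the encoded scanline from in[] to out[], inserting pixel px
     * at x position xpos; returns the number of flits written. */
    int rle_add_pixel(const flit_t *in, int n_in, flit_t *out,
                      flit_t px, int xpos)
    {
        int total = 0, n_out = 0;
        px.run = 1;
        for (int i = 0; i < n_in; i++) {
            flit_t f = in[i];
            int len = is_run(f) ? f.run : 1;
            if (xpos >= total && xpos < total + len) {
                int left = xpos - total, right = total + len - xpos - 1;
                if (left > 0) {               /* shortened run before px */
                    flit_t pre = { 0, 0, 0, 0, (uint16_t)left };
                    out[n_out++] = pre;
                }
                out[n_out++] = px;            /* the inserted fragment   */
                if (right > 0) {              /* remainder of the run    */
                    flit_t post = { 0, 0, 0, 0, (uint16_t)right };
                    out[n_out++] = post;
                }
            } else {
                out[n_out++] = f;             /* pass the flit through   */
            }
            total += len;
        }
        return n_out;
    }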
By utilizing embedded DRAM, the present invention takes advantage of the
extremely high bandwidth available when processing occurs on the memory chip.
The processing is simple enough to be implemented in the DRAM manufacturing
process. For example, for a 1280 x 1024 frame buffer, the maximum amount of
memory required is 50 Mbits. This fits onto eDRAM dies with room for over 3
million gates for the encoding hardware.
Figure 37 is a preferred block diagram illustrating how a polygon pipeline 242
and volume pipeline 252 are connected through the RLE frame buffer 292, which is
preferably double-buffered to allow rendering during transmission of data. The
auxiliary frame buffer is preferably connected at the same place as the existing one by
simply duplicating the fragments, thus not affecting the remainder of the geometry
pipeline 242. The volume pipeline 252 also preferably utilizes double buffering to
allow receiving of data while blending the previous slab. It is to be appreciated that,
using the system of the present invention, volume rendering does not conflict with
polygon rendering. Since the volume pipeline 252 always accesses its memory in a
repeatable ordered fashion, it achieves the sample fill rate into the frame buffer at a
sufficient rate to achieve 30Hz volume rendering. The system of the present
invention utilizes the graphics pipeline 242 to render the opaque polygons before
rendering the volume, stored in volume memory 258. This can normally be
accomplished
concurrently with the rendering of the volume for the previous frame. Even if the
polygon engine must render translucent polygons mixed in with the volume, there is
usually enough time to render the opaque polygons before the volume finishes, due to
the small number of translucent polygons in normal scenes.
In accordance with a preferred embodiment of the present invention, a method
is provided to incrementally voxelize triangles into a volumetric dataset with
pre-filtering, thereby generating an accurate multivalued voxelization. Multivalued
voxelization allows direct volume rendering with intermixed geometry, accurate
multiresolution representations, and efficient antialiasing. Prior voxelization methods
either computed only a binary voxelization or inefficiently computed a multivalued
voxelization. The method, in accordance with the present invention, preferably
develops incremental equations to quickly determine which filter function to compute
for each voxel value. This preferred method, which is described in greater detail
herein below, requires eight additions per voxel of the triangle bounding box.
To avoid image aliasing, the present invention preferably employs
pre-filtering, in which scalar-valued voxels are used to represent the percentage of
spatial occupancy of a voxel, an extension of the two-dimensional line anti-aliasing
method conventionally known (Filtering Edges for Grayscale Displays, by S. Gupta
and R. F. Sproull, Computer Graphics (SIGGRAPH 81), vol. 15, no. 3, pp. 1-5,
Aug. 1981). It has also been shown that the optimal volume sampling filter for
central difference gradient estimation is a one-dimensional oriented box filter
perpendicular to the surface. The method of the present invention preferably utilizes
this filter, which is a simple linear function of the distance from the triangle.
Conventional graphics hardware only rasterizes points, lines, and triangles,
with higher order primitives expressed as combinations of these basic primitives.
Therefore, it is preferable to voxelize only triangles, because all other primitives can
be expressed in terms of triangles. Polygon meshes, spline surfaces, spheres,
cylinders, and others can be subdivided into triangles for voxelization. Points and
lines are special cases of triangles and can similarly be voxelized by the present
algorithm. To voxelize solid objects, the boundary of the object is preferably
voxelized as a set of triangles. The interior of the object is then filled using a
volumetric filling procedure.
As appreciated by those skilled in the art, edge equations are linear expressions
that maintain a distance from an edge by efficient incremental arithmetic. The
methods of the present invention extend this concept into three dimensions and apply
antialiasing during the scan conversion of volumetric triangles.
In essence, the general idea of the triangle voxelization method of the present invention is to voxelize a triangle by scanning a bounding box of the triangle in raster order. For each voxel in the bounding box, a filter equation is preferably evaluated and the result is stored in memory. The value of the equation is a linear function of the distance from the triangle. The result is preferably stored using a fuzzy algebraic union operator, namely, the max operator.
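By way of illustration only, the following C sketch captures this scan-and-merge structure in software form. The Triangle type and the filter_value() helper are hypothetical stand-ins for the region tests and oriented box filter described below; the actual embodiment is a hardware pipeline.

    #include <stddef.h>

    typedef struct { int xmin, xmax, ymin, ymax, zmin, zmax; } Box;
    typedef struct Triangle Triangle;   /* hypothetical triangle record */

    /* filter_value() stands in for the region test and oriented box filter
     * evaluation described below; its name and signature are assumptions. */
    float filter_value(const Triangle *tri, int x, int y, int z);

    /* Visit every voxel of the (inflated) bounding box in raster order and
     * merge each filter result into the volume with the fuzzy union (max). */
    void voxelize_triangle(const Triangle *tri, Box box,
                           float *vol, int dimx, int dimy)
    {
        for (int z = box.zmin; z <= box.zmax; z++)
            for (int y = box.ymin; y <= box.ymax; y++)
                for (int x = box.xmin; x <= box.xmax; x++) {
                    float v = filter_value(tri, x, y, z);
                    float *dst = &vol[((size_t)z * dimy + y) * dimx + x];
                    if (v > *dst)   /* max: fuzzy algebraic union */
                        *dst = v;
                }
    }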
With reference now to Figure 38, there is shown a density profile of an oriented box filter along a line 294 from the center of a solid primitive 296 outward, perpendicular to the surface 298. The width of the filter is defined as W. The inclusion of a voxel in the fuzzy set varies between zero and one, inclusive, determined by the value of the oriented box filter. The surface 298 of the primitive 296 is assumed to lie on the 0.5 density isosurface. Therefore, when voxelizing a solid primitive 296, as in Figure 38, the density profile varies from one inside the primitive to zero outside the primitive, and varies smoothly at the edge. For a surface primitive, such as the triangle 300 shown in Figure 39, the density is preferably one on the surface and drops off linearly to zero at distance W from the surface. Although the present invention similarly contemplates the voxelization of solids, the voxelization of surfaces will be described herein.
With continued reference to Figure 39, it has been determined that the optimum value for filter width W is 2√3 voxel units (see, e.g., Object Voxelization by Filtering, by M. Sramek and A. Kaufman, 1998 Volume Visualization Symposium, pp. 111-118, IEEE, Oct. 1998). For shading, the normal is preferably estimated by computing the central difference gradient at the 0.5 isosurface. Because the overall width of the central difference filter is at most 2√3 units, a correct gradient is found on the 0.5 density isosurface. The thickness of the triangle 300 may be defined as T. Normally, T can be zero, unless thick surfaces are desired. By thresholding at 0.5 density, a 6-tunnel-free set of voxels is generated when W ≥ 1. This property is useful for volumetric filling (e.g., in order to generate solid objects).
All voxels with non-zero values for a triangle are preferably within a bounding box which is S = W + T/2 voxel units larger in all directions than a tight bounding box. Therefore, the first step of the present method preferably determines a tight bound for the triangle 300, then inflates it in all directions by S voxel units and rounds outward to the nearest voxels.
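A minimal C sketch of this inflation step follows, assuming a tight floating-point bound has already been computed; it reuses the Box type from the sketch above, and the function and parameter names are illustrative only.

    #include <math.h>

    /* Inflate a tight floating-point bound by S = W + T/2 voxel units in
     * every direction and round outward to whole voxels. */
    Box inflate_bbox(const float lo[3], const float hi[3], float W, float T)
    {
        float S = W + T * 0.5f;
        Box b;
        b.xmin = (int)floorf(lo[0] - S);  b.xmax = (int)ceilf(hi[0] + S);
        b.ymin = (int)floorf(lo[1] - S);  b.ymax = (int)ceilf(hi[1] + S);
        b.zmin = (int)floorf(lo[2] - S);  b.zmax = (int)ceilf(hi[2] + S);
        return b;
    }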
As illustrated in Figure 40, the area surrounding a triangle defined by vertices C1, C2 and C3 may be divided into seven regions (e.g., R1 through R7) which must be treated separately. In a preferred method of the present invention, each candidate voxel is tested for inclusion within the seven regions, then filtered with a different equation for each region. In the interior region R1 of the triangle, the value of the oriented box filter is simply proportional to the distance from the plane of the triangle. In regions along the edges of the triangle, R2, R3, R4, the value of the filter is preferably proportional to the distance from the edge of the triangle. In regions at the corners of the triangle, R5, R6, R7, the value of the filter is preferably proportional to the distance from the corner of the triangle.
With continued reference to Figure 40, the regions R1 through R7 are preferably distinguished by their distance from seven planes. The first plane a is preferably coplanar with the triangle and its normal vector a points outward from the page. The next three planes b, c, and d preferably have normal vectors b, c, and d, respectively, and pass through the corner vertices C1, C2, and C3 of the triangle, respectively. The final three planes e, f, and g are preferably perpendicular to the triangle and parallel to the edges; their respective normal vectors, e, f, and g, lie in the plane of the triangle
and point inward so that a positive distance from all three planes defines region R1. All of the plane coefficients are normalized so that the length of the normal is one, except for normal vectors b, c, and d, which are normalized so that their length is equal to the inverse of their respective edge lengths. In that manner, the computed distance from the plane varies from zero to one along the valid length of the edge.
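The following C sketch illustrates one plausible form of this pre-normalization (the helper name is hypothetical; edge_len is zero for the unit-length planes a, e, f, and g, and is the corresponding edge length for planes b, c, and d):

    #include <math.h>

    /* Pre-normalize plane coefficients p = (A, B, C, D). For the edge
     * planes b, c, and d, the normal ends up with length 1/L so that the
     * computed distance runs from zero to one over the valid extent of
     * the edge. One plausible construction only. */
    void normalize_plane(float p[4], float edge_len)
    {
        float n = sqrtf(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]);
        float s = (edge_len > 0.0f) ? 1.0f / (n * edge_len) : 1.0f / n;
        for (int i = 0; i < 4; i++)
            p[i] *= s;
    }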
For any planar surface, the distance of any point from the surface can be computed using the plane equation coefficients:

    Dist = (Ax + By + Cz + D) / √(A² + B² + C²)

which simplifies to

    Dist = Ax + By + Cz + D

when the coefficients are pre-normalized. This computation can be made incremental so that when stepping along any vector, the distance only changes by a constant. For example, if the distance from a plane is Dist at position [x, y, z], then stepping one unit distance in the X direction changes the distance to

    Dist' = A(x + 1) + By + Cz + D
          = Ax + By + Cz + D + A
          = Dist + A
In general, stepping along any vector r = [rx, ry, rz], the distance from the plane changes by

    Dist' = Dist + r · [A, B, C]

where · indicates the dot product. While scanning the bounding box of the triangle, the distance from the plane of the triangle can be computed incrementally with just a
single addition per voxel. This method, performed in accordance with the present invention, for computing the distance from a plane is illustrated by the preferred pseudo-code routine shown in Figure 41.
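While Figure 41 itself is not reproduced here, the following C sketch is in the spirit of such a routine. Note that in this software form the X increment is applied once per voxel visited, so the Y and Z return steps subtract the full row and slice accumulations; the names and loop organization are illustrative, not the hardware design.

    /* Incremental plane-distance scan over an nx-by-ny-by-nz bounding box
     * starting at voxel (x0, y0, z0), for a pre-normalized plane
     * (A, B, C, D). dist holds Ax + By + Cz + D at the current voxel. */
    void scan_plane_distances(float A, float B, float C, float D,
                              int x0, int y0, int z0,
                              int nx, int ny, int nz)
    {
        float dist  = A * x0 + B * y0 + C * z0 + D; /* box corner */
        float ystep = B - nx * A;   /* +1 in Y, undo the row's X-steps */
        float zstep = C - ny * B;   /* +1 in Z, undo the slice's Y-steps */
        for (int z = 0; z < nz; z++, dist += zstep)
            for (int y = 0; y < ny; y++, dist += ystep)
                for (int x = 0; x < nx; x++, dist += A) {
                    /* dist == A*(x0+x) + B*(y0+y) + C*(z0+z) + D here;
                     * the region test and filter evaluation use it. */
                    (void)dist;
                }
    }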
The Y-step is more complicated than the X-step because it not only steps one unit in the Y direction, but it also steps back multiple units in the X direction. Consider, as an analogy, the operation of a typewriter which glides back to the left margin of the paper and advances the line with one push of the return key. Similarly, the Z-step combines stepping back in both the X and Y directions and stepping forward one unit in the Z direction. This simple pre-processing step ensures efficient stepping throughout the entire volume. If numerical approximation issues arise, then it is possible to store the distance value at the start of each inner loop and restore it at the end, thereby minimizing numerical creep due to roundoff error in the inner loops.
For multivalued voxelization, seven plane distances are required. Therefore, seven additions are required per voxel to compute the plane distances. Other computations per voxel may include incrementing the loop index, comparisons to determine the appropriate region and, if necessary, computations to determine the density.
Referring again to Figure 40, in region R1 the density value of a voxel is preferably computed with the box filter oriented perpendicular to plane a. Given a distance DistA from plane a, the density value V is computed using:

    V = 1 - (|DistA| - T/2) / W

In region R2, the density is preferably computed using the distance from planes a and b:

    V = 1 - (√(DistA² + DistB²) - T/2) / W
Similarly, region R3 uses planes a and c, and region R4 uses planes a and d. Region R5 uses the Pythagorean distance from the corner point C1:

    V = 1 - (√((C1x - x)² + (C1y - y)² + (C1z - z)²) - T/2) / W

Similarly, regions R6 and R7 use corner points C2 and C3, respectively.
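The three filter cases may be sketched in C as follows; this is a software illustration of the formulas above, and the clamping of the result to [0, 1] and the function names are illustrative.

    #include <math.h>

    static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

    /* Interior region R1: oriented box filter perpendicular to plane a. */
    float density_interior(float distA, float W, float T)
    {
        return clamp01(1.0f - (fabsf(distA) - T * 0.5f) / W);
    }

    /* Edge regions R2-R4: combine the distance from plane a with the
     * distance from the respective second plane (b, c, or d). */
    float density_edge(float distA, float distB, float W, float T)
    {
        float d = sqrtf(distA * distA + distB * distB);
        return clamp01(1.0f - (d - T * 0.5f) / W);
    }

    /* Corner regions R5-R7: Pythagorean distance from corner point c. */
    float density_corner(const float c[3], const float p[3], float W, float T)
    {
        float dx = c[0] - p[0], dy = c[1] - p[1], dz = c[2] - p[2];
        float d = sqrtf(dx * dx + dy * dy + dz * dz);
        return clamp01(1.0f - (d - T * 0.5f) / W);
    }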
At the shared edge of adjacent triangles, it is preferable to avoid discontinuities or cracks. Fortunately, the oriented box filter guarantees accurate filtering of the edges for any polyhedra, provided the union of the voxelized surfaces is correctly computed. The union operator can be defined over multivalued density values V(x) with V_A∪B(x) = max(V_A(x), V_B(x)). Other Boolean operators are available. However, the max operator preserves the correct oriented box filter value at shared edges, and is therefore preferred.
The implication of using max in the method of the present invention is that the current voxel value must be read from memory, then possibly modified and written back into memory. Therefore, a maximum of two memory cycles are required per voxel.
The efficiency of the algorithm of the present invention may be further increased by limiting the amount of unnecessary computation, because the bounding box often contains a higher percentage of voxels unaffected by the triangle than affected by it. The bounding box can be made tighter by recursively subdividing the triangle when edge lengths exceed a predetermined constant.
To visualize intermixed polygons and volumes, the polygons are preferably voxelized into the target volume and rendered in a single pass. If the polygons move with respect to the volume, then voxelization should occur into a copy of the original volume so as not to corrupt the data. The multivalued voxelized polygon voxels may
be tagged to distinguish them from volume data. In this manner, polygons can be colored and shaded separately from other data.
The preferred triangle voxelization algorithm described above is efficiently implemented in the distributed pipelines of the Cube-5 volume rendering system of the present invention. This algorithm adds just a small amount of hardware to the existing pipelines and performs accurate multivalued voxelization at interactive rates. One important advantage of the claimed Cube-5 volume rendering algorithm is that the volume data is accessed coherently in a deterministic order. This feature allows orderly scanning of a bounding box for this algorithm.
In Figure 42, a preferred embodiment of the overall voxelization pipeline is shown, in accordance with the present invention. If on-the-fly voxelization is important, the system of the present invention may preferably include separate pipelines for volume rendering and voxelization. If voxelization can occur in a separate pass, then these volume rendering and voxelization pipelines may be combined, with the voxelization pipeline re-using most of the hardware from the volume rendering pipeline. The setup for each triangle preferably occurs on the host system, in a similar manner as setup is performed on the host system for 2D rasterization.
With reference to Figure 42, in the first hardware stage 302 of the pipeline, the distances from the seven planes are preferably computed. Seven simple distance units are allocated with four registers for each of the seven planes. Preferably, one register holds the current distance from the plane and the other three registers hold the increments for the X-, Y-, and Z-steps. Figure 43 shows a distance computation unit 310 for one of the seven planes, formed in accordance with a preferred embodiment of the present invention. This distance computation unit 310 may be included as part of the distance calculation stage 302 of the pipeline (see Figure 42). The other six units can be essentially identical in design, but hold different values. During each clock cycle of voxelization, the pipeline preferably steps in either the X, Y, or Z direction (i.e., performs an X-Step 312, Y-Step 314, or Z-Step 316), thereby updating the
current distance according to the direction of movement. The hardware for looping through the volume is already present in the volume rendering pipeline and is therefore re-used here to scan the bounding box of the triangle.
After the seven plane distances are calculated, the resulting values preferably flow down the pipeline. As shown in Figure 42, the next pipeline stage 304 then preferably determines in which region the current voxel resides. In a preferred embodiment of the region selection stage 304, only seven comparators are needed to determine the outcome of the truth table, due to the mutual exclusion of some cases. For instance, in Figure 40, from the negative (lower) side of plane b, it is not necessary to test the distances from plane f or g, depending on the value of the distance from plane e.
With continued reference to Figure 42, after the region has been determined, the next pipeline stage 306 computes the filter function. The filter calculation stage 306 of the pipeline is preferably only activated if the current voxel is within S voxel units of the triangle; otherwise, the current voxel is essentially unaffected by the triangle. Different regions require different calculations, ranging from a simple linear expression to a complex Pythagorean distance evaluation. Since hardware ideally must handle all cases equally well, it is preferred that such hardware be able to perform a square root approximation by means of a limited resolution lookup table (LUT). However, the range of inputs and outputs is small, and therefore the size of the required LUT will be small. Furthermore, the Cube-5 hardware of the present invention has several LUTs available for volume rendering which can be re-used for voxelization. Instead of providing three separate units to compute the expression V = 1 - (Dist - T/2)/W, it is more efficient to roll all the calculations into one LUT. In this case, the input is Dist², defined over [0, 12], and the output is the density value V in the range [0, 1].
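A C sketch of building such a combined table follows; the table resolution of 256 entries and the clamping are assumptions, since the text specifies only a limited-resolution table (the upper input bound of 12 corresponds to (2√3)²).

    #include <math.h>

    #define LUT_SIZE 256   /* "limited resolution"; exact size is an assumption */

    /* Roll the square root and the filter expression into one table:
     * index i corresponds to Dist² in [0, 12], and the stored value is
     * V = 1 - (Dist - T/2)/W, clamped to [0, 1]. */
    void build_density_lut(float lut[LUT_SIZE], float W, float T)
    {
        for (int i = 0; i < LUT_SIZE; i++) {
            float dist2 = 12.0f * (float)i / (float)(LUT_SIZE - 1);
            float v = 1.0f - (sqrtf(dist2) - T * 0.5f) / W;
            lut[i] = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
        }
    }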
Due to the mutual exclusion of the seven regions, it is sufficient to provide hardware for only the most complex filter calculation. With reference to Figure 40, the most complex calculation is the corner distance computation of regions R5, R6,
and R7 which, in a preferred embodiment, requires five adders and three multipliers, in addition to the square root LUT previously mentioned. The line distance computations in regions R2, R3, and R4 are simpler, requiring only one adder, two multipliers and the square root LUT. Region R1 requires a single multiply to obtain the distance squared, which is the required input to the LUT.
Referring again to Figure 42, the final stage 308 of the pipeline preferably computes the max operation using the current voxel value and the computed density estimate. In a preferred embodiment of the present invention, the max operator is simply a comparator attached to a multiplexor such that the greater of the two values is written back to memory. Since most voxels in the bounding box are not close enough to the triangle to be affected by it, memory bandwidth will be saved by only reading the necessary voxels. Further bandwidth savings may be achieved by only writing back to memory those voxels that change the current voxel value. Since there is some latency between requesting and receiving a word from memory, the voxel is preferably fetched as soon as possible in the pipeline and the results queued until the memory word is received. The final stage 308 is write-back to memory, which can be buffered without worry of dependencies.
The present invention thus far has been described outside the context of skewing, which complicates the traversal. However, the present invention contemplates building skewing into the Y- and Z-step distance update values. Skewing also adds more complexities to the Cube-5 hardware of the present invention. Specifically, when a left-most voxel moves one unit in the Y direction, placing it outside of the bounding box, the pipeline actually takes p - 1 steps in the X direction to keep the voxel within the bounding box. Similarly, when the left-most voxel moves one step in the Z direction, it also moves one step in the negative X direction, which is handled in the same way as before. Therefore, the apparatus of the present invention is preferably adapted to perform skewing by adding fourteen (14) more registers and corresponding logic to determine when the pipeline is currently processing the left-most voxel.
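Purely as an illustration of the bookkeeping involved, the following C fragment sketches how skew-adjusted Y- and Z-step increments for one plane might be precomputed from the per-axis increments A, B, and C; the structure, the names, and the sign of the X correction are assumptions, and the actual hardware uses additional registers as noted above.

    /* Fold skewing into one plane's step increments, where p is the
     * number of memory modules. When the left-most voxel steps +1 in Y,
     * the traversal also takes p - 1 steps in X; when it steps +1 in Z,
     * it also steps once in the negative X direction. */
    typedef struct {
        float A, B, C;       /* per-axis increments for this plane */
        float ystep_left;    /* Y-step applied at the left-most voxel */
        float zstep_left;    /* Z-step applied at the left-most voxel */
    } SkewSteps;

    void setup_skew_steps(SkewSteps *s, int p)
    {
        s->ystep_left = s->B + (float)(p - 1) * s->A;
        s->zstep_left = s->C - s->A;
    }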
Pre-filtering, which may be performed in combination with the voxelization methods of the present invention, can be used to optimally generate a series of volumes of different resolutions. This technique is useful for rendering images of different sizes; the size of the volume is preferably chosen to correspond to the size of the final image. In this manner, aliasing is avoided at all image resolutions and no unnecessary work is performed rendering parts of a scene not visible at the image scale.
Pre-filtering can additionally be used to model motion blur. For example, as an object sweeps past a camera, it sweeps out a complex volume during the time the shutter is open, causing motion blur. To accurately render motion blur, conventional rendering techniques render multiple images and blend them into a single image. However, this approach is very slow. With pre-filtering, the present invention performs the sweeping operation once, during voxelization, so that motion blur can be rendered in the same time as regular volume rendering. This method works well, particularly for certain cases where the motion is constant (e.g., the same direction and/or rotation). For example, consider a helicopter blade which spins at a constant speed during flight. To voxelize the blade spinning at the rate of 5Hz for an animation frame rate of 30Hz, the blade sweeps out an arc of (5/30)(2π) radians each frame. Thus, at the outer edge of the blade, the density value is much lower and the blade appears more transparent than in the center, where it sweeps out a smaller volume and appears more solid. The volume rendering transfer function may be set so that the lower density values appear less opaque and higher density values appear more opaque.
When multiple volumetric objects overlap, the projected image of the volumes becomes quite complex. Consider, for example, a scene where smoke rises up through a cloud. Clearly, the two volumetric objects cannot be rendered separately with the images combined in the final frame. Therefore, in a preferred method, performed in accordance with one form of the present invention, multiple objects are combined into one object for a final rendering pass to create the resulting image.
When two or more objects occupy the same space, the colors from each object are preferably modulated together at each sample location along a projected sight ray. Therefore, it is preferred that each object be classified and shaded prior to being combined, followed by color modulation. If, alternatively, voxel data were combined first, a new transfer function would be required for each possible combination. This latter approach is therefore not preferred.
In accordance with one form of the present invention, a preferred method for mixing multiple overlapping volumes resamples all but the first object in the z-dimension of the first object so that slices of each object become interlaced. This includes a classification, a shading and a transformation which aligns all objects. Object transformations include translation and scaling, preferably performed by the apparatus of the present invention using nearest neighbor connections, and rotation, which is preferably performed using the rotation methods of the present invention previously described herein above.
For scenes containing objects which will not change position or orientation with respect to each other, the present invention contemplates optimizations such as high-level scene graph compilation that can preferably be employed. For instance, static objects are preferably combined once and stored for subsequent rendering, while non-static objects are re-combined each time they are moved with respect to the other objects.
Texture mapping is a widely used technique to simulate high-quality image effects, such as surface details, and even lighting and shadows. In general terms, texture mapping involves mapping a two-dimensional (2D) image onto a three-dimensional (3D) surface. Texture mapping occurs while geometric objects are rasterized onto the screen. The (x, y) pixel coordinates are preferably mapped into (u, v) texture coordinates and an RGBα value is returned as the color value to use for that pixel on the screen.
There are basically two processes involved in texture mapping: a mapping from (x, y) coordinates to (u, v) coordinates, and a look-up into the image of what RGBα value corresponds to a given (u, v) coordinate. The mapping from (x, y) to (u, v) coordinates preferably involves simple matrix multiplication, as appreciated by those skilled in the art. However, the look-up into the image of the (u, v) coordinate to return an RGBα value is complex. The very large scale integration (VLSI) hardware requirements for the texture lookup commonly consume large portions of today's graphics boards, at a significant cost. This is primarily due to the fact that (u, v) coordinates rarely map directly to a discrete image coordinate, called a texel. Therefore, the neighboring RGBα values are preferably linearly interpolated to produce the RGBα value at the exact (u, v) coordinate.
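A C sketch of this bilinear reconstruction follows; it is a minimal software illustration in which the row-major RGBA texel layout is an assumption and border handling is omitted.

    #include <math.h>

    typedef struct { float r, g, b, a; } RGBA;

    static float lerp(float a, float b, float t) { return a + t * (b - a); }

    /* Bilinear reconstruction of the RGBA value at a continuous (u, v)
     * coordinate from its four neighboring texels; tex has width w. */
    RGBA texel_bilinear(const RGBA *tex, int w, float u, float v)
    {
        int u0 = (int)floorf(u), v0 = (int)floorf(v);
        float fu = u - (float)u0, fv = v - (float)v0;
        const RGBA *t00 = &tex[v0 * w + u0], *t10 = t00 + 1;
        const RGBA *t01 = t00 + w,           *t11 = t01 + 1;
        RGBA out = {
            lerp(lerp(t00->r, t10->r, fu), lerp(t01->r, t11->r, fu), fv),
            lerp(lerp(t00->g, t10->g, fu), lerp(t01->g, t11->g, fu), fv),
            lerp(lerp(t00->b, t10->b, fu), lerp(t01->b, t11->b, fu), fv),
            lerp(lerp(t00->a, t10->a, fu), lerp(t01->a, t11->a, fu), fv),
        };
        return out;
    }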
Two-dimensional (2D) interpolations are generally sufficient if the pixel does not cover more than one texel. However, if the mapping produces pixel coverages greater than one texel, artifacts are introduced into the image using the 2D interpolation method. To avoid costly texel combining operations, a technique termed Mip-Mapping may be utilized by conventional graphics pipelines. Mip-Mapping basically consists of storing multiple levels-of-detail (LOD) of an image. Then, when an (x, y) pixel is mapped to a (u, v) texel, the appropriate Mip-Map level texels are chosen so that the pixel is smaller than the texels. A more accurate method is to look up the four neighborhood texels from both the higher level and lower level of detail texel images and then perform a trilinear interpolation on all eight texels to compute the appropriate RGBα value for the pixel.
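Building on the bilinear sketch above, the trilinear Mip-Map lookup may be sketched as follows; the mip-chain convention, whereby (u, v) are in level-0 texel units and each coarser level halves the texel scale, is an assumption.

    /* Trilinear Mip-Map lookup: bilinear samples from the two nearest
     * levels of detail, blended by the fractional LOD. Reuses the RGBA
     * type, lerp(), and texel_bilinear() from the sketch above. */
    RGBA mipmap_trilinear(const RGBA *const levels[], const int widths[],
                          float u, float v, float lod)
    {
        int   l0 = (int)lod;                 /* finer of the two levels */
        float fl = lod - (float)l0;
        float s0 = 1.0f / (float)(1 << l0);
        float s1 = s0 * 0.5f;                /* next coarser level */
        RGBA a = texel_bilinear(levels[l0],     widths[l0],     u * s0, v * s0);
        RGBA b = texel_bilinear(levels[l0 + 1], widths[l0 + 1], u * s1, v * s1);
        RGBA out = {
            lerp(a.r, b.r, fl), lerp(a.g, b.g, fl),
            lerp(a.b, b.b, fl), lerp(a.a, b.a, fl),
        };
        return out;
    }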
Texture mapping hardware from conventional graphics pipelines has been used to accelerate volume rendering and has been the subject of such texts as RealityEngine Graphics, by K. Akeley, Computer Graphics (SIGGRAPH 93), 27:109-116, Aug. 1993, and Accelerated Volume Rendering and Tomographic Reconstruction Using Texture Mapping Hardware, by B. Cabral, N. Cam and J. Foran, Symposium on Volume Visualization, pp. 91-98, Oct. 1994. This conventional approach, however, neither achieves the cost-performance nor supports the various functionalities (e.g., shading) of the present invention. Furthermore, using known
prior art methods, texture mapping is unscalable without data replication, often employs two-dimensional (2D) rather than three-dimensional (3D) interpolation, downloads datasets slowly, and/or does not support real-time four-dimensional (4D) input.
In accordance with a preferred form of the present invention, described previously herein above, the Cube-5 apparatus is combined with a conventional geometry engine via the geometry input/output bus 46, 48 (see Figure 4). Preferably, the rendering pipeline(s) of the present invention are utilized to perform the texture look-up function, while the geometry engine is used for mapping (x, y) pixel coordinates to (u, v) texture coordinates. In simple terms, once combined with the Cube-5 apparatus, the responsibility of the geometry engine is essentially to rasterize triangles, while the apparatus of the present invention preferably provides the high performance interpolation engine for texture mapping. To perform texture look-ups on the apparatus of the present invention, texel data is preferably loaded into 3D memory included within the Cube-5 unit(s). Figures 6A and 6B illustrate an example of how 32 bits of texel data for a 2x2 neighborhood are preferably arranged in a 2³ subcube of 16-bit voxels.
Another important advantage of the present invention is the ability to enhance image-based rendering. In general, image-based rendering methods render complex scenes from arbitrary viewpoints based on a finite set of images of that scene. Two similar image-based rendering methods, known by those skilled in the art, which use four-dimensional (4D) interpolation without requiring the depth information of the source images are light field rendering and Lumigraph. The high-performance interpolation engine of the present invention may be used to accelerate these two techniques.
Figure 44 shows that in light field rendering, the scene is modeled by uv 322 and st 320 planes. Every uv grid point preferably defines a viewpoint and has an associated st image. For every pixel of the projection plane 324, a sight ray 326 is preferably cast into the uv plane 322. The four st images corresponding to the uv grid
points surrounding the intersection of the sight ray with the uv plane contribute to that ray. The contributions are preferably calculated by casting a sight ray into each st image through its uv grid point. These rays hit between st image pixels and, therefore, a bi-linear interpolation must be performed for each st image. One final bi-linear interpolation between the four rays yields the final projection plane pixel color. Obtaining every pixel of the projection plane 324, therefore, requires four bi-linear interpolations in st planes 320 and one bi-linear interpolation in the uv plane 322, resulting in a total of five bi-linear interpolations. These five bi-linear interpolations are substantially equivalent to one 4D interpolation, or 15 linear interpolations.
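Reusing the bilinear helper sketched earlier, the five interpolations may be illustrated in C as follows; the st_image() accessor, the caller-supplied per-image (s, t) coordinates, and the array layout are all hypothetical.

    #include <math.h>

    /* One light field sample: four bilinear lookups in the st images of
     * the uv grid points surrounding the ray's uv intersection, followed
     * by one bilinear blend in the uv plane, for five bilinear (fifteen
     * linear) interpolations in total. Reuses RGBA, lerp(), and
     * texel_bilinear() from the texture mapping sketches above. */
    RGBA lightfield_sample(float u, float v,
                           const float s[4], const float t[4],
                           const RGBA *(*st_image)(int i, int j),
                           const int st_w[4])
    {
        int u0 = (int)floorf(u), v0 = (int)floorf(v);
        float fu = u - (float)u0, fv = v - (float)v0;
        RGBA c[4];
        for (int k = 0; k < 4; k++) {   /* the four surrounding uv points */
            int i = u0 + (k & 1), j = v0 + (k >> 1);
            c[k] = texel_bilinear(st_image(i, j), st_w[k], s[k], t[k]);
        }
        RGBA out = {
            lerp(lerp(c[0].r, c[1].r, fu), lerp(c[2].r, c[3].r, fu), fv),
            lerp(lerp(c[0].g, c[1].g, fu), lerp(c[2].g, c[3].g, fu), fv),
            lerp(lerp(c[0].b, c[1].b, fu), lerp(c[2].b, c[3].b, fu), fv),
            lerp(lerp(c[0].a, c[1].a, fu), lerp(c[2].a, c[3].a, fu), fv),
        };
        return out;
    }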
Performing lookups for each projection plane ray usually causes random access into the st images. Therefore, in accordance with a preferred method of the present invention, st images are accessed in object order, which is more appropriately adapted for use with the apparatus of the present invention since the Cube-5 apparatus allows reading of each st image pixel only once. With continued reference to Figure 44, for each quadrilateral 328 in the uv plane (e.g., abcd), its projections on the four st planes (e.g., corresponding to abcd) preferably determine which four tile regions 330 contribute to the final image. All st tile regions 330 are then preferably assembled into four images and are perspectively projected onto the projection plane 324. The final image is subsequently formed by bi-linear interpolation among the four projected images. Interpolation weights are preferably determined by the intersection between the original ray and the uv plane 322.
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the present invention.