Note: Descriptions are shown in the official language in which they were submitted.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
SYSTEM AND METHOD FOR TRANSFERRING WEB PAGE DATA
~: -
BACKGROUND ART
It has long been recognized that browsing the Web on a
small device, such as a Personal Digital Assistant(PDA)
cellular phone, or wireless terminal, would be useful. The
difficulties involved in implementing web browsing on such
devices generally do not arise because of limitations in
computing power. Rather such difficulties arise from the two
following causes. First, the displays are usually very small,
making it difficult for them to display web pages designed
specifically for larger displays or windows (e.g., typically,
at least 800x200 or 1024x768 pixels) . Second, small wireless
devices are typically connected to wide-area networks at low
bandwidth, making traditional web browsing a nuisance due to
the need to download HTML (Hypertext Markup Language) pages in
their entirety before viewing them.
Some partial solutions to this problem exist. For
example, one computer hardware manufacturer has designed a new
platform for wireless delivery of content to small devices
based on BREW (Binary Runtime Environment for Wireless). This
kind of solution, however, requires that content providers
agree to provide their content using BREW technologies, or
that third-party content aggregators continually repackage
existing content using formatting specialized to the content
or application, again serving the results using BREW
technologies.
Another partial solution is Opera For Mobile, in
combination with Opera Mobile Accelerator and Small-Screen
RenderingT"'. These technologies are based on a popular Web
browser, Opera, which has been ported to mobile wireless
platforms. Small-Screen RenderingTM attempts to reformat Web
pages dynamically to fit within the horizontal dimension of
1
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
the mobile terminal's display. Opera Mobile Accelerator uses
a proxy server to access Web pages in their original HTML form
on demand, and then compresses this HTML and forwards the
compressed version to the mobile terminal. According to
Opera, Opera Mobile Accelerator reduces the size of Web pages
by approximately 50-70%, but it has the ability to reduce them
up to 90%. With this in mind, the data is retrieved faster
and the Web pages are displayed sooner. Generally, the
increase in speed depends on the type of sites you are
visiting. These two technologies present partial solutions
to the problems of mobile terminal display size and bandwidth
in connection with Web browsing. However, there is a need in
the art for an improved system and method for providing web
page data to digital devices having limited bandwidth or other
communication and/or processing power limitations.
DISCLOSURE OF THE INVENTION
According to one aspect, the present invention discloses
a method, that may include accessing an Internet site by a
proxy server; converting image data from the Internet site
into a multiresolution representation; and transmitting the
image data in the multiresolution representation to a client
device. Preferably, the method further includes
navigating a web page by the client device. Preferably,
the navigating step includes (a) navigating the Internet site
image data by the proxy server; and (b) navigating the
multiresolution image data by the client device, wherein the
navigating of step (a) and the navigating of step (b) occur
substantially simultaneously.
Preferably, the navigating includes storing
multiresolution image data for at least substantially the
entirety of the web page at the proxy server; and navigating
2
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
the stored multiresolution image data by the client device.
Preferably, the converting step comprises: continuously
converting the Internet site image data as needed by the
navigating by the client device. Preferably, the converting
step comprises: pre-converting selected image data from the
Internet site by the proxy server prior to the selected image
data being needed for the client device navigation; and
storing the pre-converted image data at the proxy server.
Preferably, the method further comprises transmitting the
stored image data to the client device as needed by the client
device navigation. Preferably, the method further includes
ensuring that the stored pre-converted image data is not stale
prior to transmission thereof to the client device.
Preferably, the ensuring step includes comparing one of a
timestamp and a checksum of the pre-converted image data with
a respective one of a timestamp and a checksum of
corresponding image data originating from the Internet site.
Preferably, the pre-converting step includes converting image
data for at least one web page by the proxy server; and
storing the converted image data for the at least one web page
at the proxy server.
Preferably, the method further includes enabling bi-
directional interactive communication between the client
device and the Internet site. Preferably, the interactive
communication comprises: transmitting navigation commands from
the client device to the proxy server. Preferably, the method
further includes emulating dynamic Internet graphics for the
client device by the proxy server. Preferably, the emulating
step comprises emulating at least one of: dynamic HTML
(Hypertext Markup Language); and applets. Preferably, the
navigating comprises: selecting a resolution level at which to
view at least a portion of the web page, at the client device.
3
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Preferably, the navigating includes selecting a region of
the web page for display on the client device. Preferably,
the navigating further includes selecting a resolution at
which to view the region of the web page. Preferably, the
navigating step comprises selecting a plurality of portions of
the web page for viewing at the client device. Preferably,
the navigating step further includes selecting a plurality of
respective resolutions at which to view the plurality of web
page portions. Preferably, navigating the multiresolution
image data includes panning the multiresolution image data.
Preferably, the navigating the multiresolution image data
comprises: zooming in on the multiresolution image data.
According to another aspect, the invention may provide an
apparatus, that may include a client device; and a proxy
server in communication with, and serving as an intermediary
between, the client device and an Internet site, wherein the
proxy server is operable to convert image data from the
Internet site into a multiresolution visual data format.
Preferably, the multiresolution visual data format is
JPEG2000. Preferably, the client device is a cell phone, and
wherein the cell phone comprises: at least one mechanism for
navigating image data in the multiresolution visual data
format. Preferably, the mechanism is a touchpa.d and the
touchpad enables panning image data in the multiresolution
visual data format. Preferably, the mechanism enables zooming
in on image data in the multiresolution visual data format.
Preferably, the mechanism for zooming comprises a roller that
is rotatable by a user of the client device. Preferably, the
client device is one of the group consisting of: a cell phone,
a PDA (Personal Digital Assistant), a notebook computer, a
desktop computer, a tablet computer, and a television set.
Other aspects, features, advantages, etc. will become
apparent to one skilled in the art when the description of the
4
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
preferred embodiments of the invention herein is taken in
conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the various aspects of
the invention, there are shown in the drawings forms that are
presently preferred, it being understood, however, that the
invention is not limited to the precise arrangements and
instrumentalities shown.
FIG. 1 is a bock diagram of a communication system
including a proxy server and a client device in accordance
with one or more embodiments of the present invention;
FIG. 2A is a front view of a client device, which may be
a cell phone, customized in accordance with one or more
embodiments of the present invention;
FIG. 2B is an elevational view of the back of a client
device, which may be a cell phone, customized in accordance
with one or more embodiments of the present invention;
FIG. 2C is an elevational view of the back of a client
device, which may be a cell phone, customized in accordance
with one or more embodiments of the present invention;
FIG. 3 is a flowchart of a method for navigating image
data by a client device in accordance with one or more
embodiments of the present invention;
FIG. 4 is a view of a web page column as seen on a client
device in accordance with one or more embodiments of the
present invention;
FIG. 5A is a view of the web page of FIG. 4 after about
two seconds of loading at a rate of 20 kilobits per second, in
accordance with one or more embodiments of the present
invention;
5
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
FIG. 5B is a view of the web page of FIG. 4 after about
ten seconds of loading at a rate of 20 kilobits per second, in
accordance with one or more embodiments of the present
invention;
FIG. 6A is a view of a top portion of the web page of
FIG. 4 after loading data at 20 kilobits per second for about
two seconds, in accordance with one or more embodiments of the
present invention;
FIG. 6B is a view of a top portion of the web page of
FIG. 4 after loading data at 20 kilobits per second for about
ten seconds, in accordance with one or more embodiments of the
present invention;
FIG. 7 is a zoomed in view of a portion of the web page
of FIG. 4 captured shortly after capture of the views of FIG.
6, in accordance with one or more embodiments of the present
invention;
FIGS. 8A and 8B are portrait and landscape views,
respectively, of a portion of the web page of FIG. 4, in
accordance with one or more embodiments of the present
invention; and
FIG. 9 is a block diagram of a computer system adaptable
for use with one or more embodiments of the present invention.
BEST MODE OF CARRYING OUT THE INVENTION
FIG. 1 is a bock diagram of a communication system 140
including a proxy server 150 and a client device 230 in
accordance with one or more embodiments of the present
invention. Communication system 140 may include Internet 180,
client device 230, and proxy server 150, which may in turn
include cache 160. Internet 180 may interpreted to include
communication links and computing and communication hardware
6
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
needed to enable proxy server 150 to browse the Internet in a
conventional manner. While a high-bandwidth connection is
preferred for use between proxy server 150 and Internet 180,
any communication link, such as is known in the art, may be
used to connect proxy server 150 to Internet 180. Generally,
proxy server 150 may be connected to an Internet Service
Provider (ISP) (not shown) which serves as a gateway to the
Internet as a whole. In one or more embodiments herein,
Internet 180 is intended to include such an Internet Service
Provider.
Proxy server 150 may be a personal computer or other
computing device suitably configured to communicate with the
Internet and to process data, including image data, downloaded
therefrom. Proxy server 150 is preferably able to transmit
commands to and receive image data from Internet 180.
Proxy server 150 may include cache 160 for storing data,
including image data, to enable rapid and easy access to such
data. Cache 160 may be physically incorporated within server
150, such as by being within the same physical enclosure as
proxy server 150. Alternatively, cache 160 may be housed
separately from proxy server 150 but be accessible to cache
160.
Client device 230 is preferably a portable computing
device able to conduct communication with proxy server 150.
However, client device could be a larger, non-portable
computing device, such as a desktop computer. Client device
230 may be one of a cell phone, a Personal Digital Assistant
(PDA), a notebook computer, a desktop computer, a tablet
computer, and a television set. However, client device 230 is
not limited to being one of the foregoing items. Preferably,
if client device 230 is a television set, this television set
is able receive multiresolution image data and to transmit
7
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
navigation commands to an external device, such as, but not
limited to proxy server 150. Client device 230 may also
include a data storage capability, including volatile and/or
non-volatile storage, for storing multiresolution image data
received from proxy server 150. Such data storage may include
but is not limited to: RAM (Random Access Memory, ROM (Read
Only Memory), hard drive, CD-ROM drive, CD read-write, and
bubble memory. Where desired, the client device 230 data
storage may include cache memory.
Proxy server 150 may serve as an intermediary between
Internet 180 and client device 230 to enable client device 230
to browse image data from Internet 180 after conversion of
such data into a multiresolution format, as discussed in
greater detail in connection with FIG. 2.
FIG. 2A is a front view of client device 230, which may
be a cell phone, customized in accordance with one or more
embodiments of the present invention. FIG. 2B is an
elevational view of the back of the client device 230 of FIG.
2A. And, FIG. 2C is an elevational view of the back of the
client device of FIG. 2A.
In one or more embodiments covered by the illustrations
of FIG. 2, client device 230 is a cell phone or PDA that is
specially configured for browsing multiresolution image data.
Client device 230 may include a body 232, a display 234, an
antenna 236, a touchpad 238, a joystick 240, and/or a roller
242. In FIG. 2, roller 242 is shown protruding to some extent
from body 232 of client device 230. This protrusion is shown
for illustrative purposes. Roller or wheel 242 need only be
accessible to a user of client device 230. Thus, it may
protrude from body 232 very slightly. Alternatively, a recess
(not shown) in the side of body 232 may be provided into which
wheel 242 is nested, thereby enabling wheel 242 to not
8
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
protrude beyond the straight-edged portion of the side of body
232 of client device 230.
In one or more embodiments, display 234 may be located on
the front of client device 230 (FIG. 2A) However, in other
embodiments, display 234 may be located anywhere on client
device 230. Touchpad 238 may be located on the rear of client
device 230 (FIG. 2B) and may be used for panning image data
received from proxy server 150. However, alternatively,
touchpad 238 may be located on any surface of client device
230. Alternatively, touchpad 238 may be used for zooming
image data. In one or more other embodiments, touchpad 238
may be used for panning and zooming image data.
A user of client device 230 may pan or zoom image data by
moving a finger along the surface of touchpad 238.
Additionally or alternatively, touchpad 238 may be employed to
move a cursor over one or more image regions or web page
regions displayed on display 234. Where plural images
(possibly but not necessarily from two different web pages)
are displayed on display 234, use of a cursor may be
beneficial for identifying one of the images for the purpose
of performing a navigation step thereon.
In an alternative embodiment, joystick 240 (FIG. 2C) may
be employed for panning image data either in addition to, or
in place of touchpad 238. Joystick may be used for panning
image data by pushing the joystick 240 in a direction
corresponding to a direction in which a user of client device
230 wishes to pan across an image. As with touchpad 238,
joystick 240 may be located on any surface of client device
230.
In one or more embodiments, zooming may be accomplished
using a separate device, such as wheel 242, or other device
such as a slider (not shown), or using a stereotyped gesture
9
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
on touchpad 238, such as sliding the finger up and down one
edge of the touchpad 238 surface. In the latter case, both
index fingers could be used on the touchpad simultaneously,
one panning and the other zooming.
In one or more embodiments that include both touchpad 240
and wheel 242, a user of client device 230 could rapidly
switch back and forth between zooming and panning, or even pan
and zoom simultaneously. Herein, wheel 242 may also be
referred to as a "roller."
Employing one or more of the above-described features,
panning and/or zooming may become tactile experiences, as
though a user of client device 230 were looking through the
display (screen) 234 at a larger surface. The index finger of
one hand could stroke touchpad 238, panning the image and
providing a sense that the larger image is being moved from
behind display 234 using touchpad 238. In an analogous
manner, the use of wheel 242 for zooming in on an image may be
employed to provide a sense that movement of wheel 242 moves
the "larger image" closer to or further away from display 234,
depending upon the direction of rotation of wheel 242.
The haptic interface described above may provide a
natural and familiar "feel" with a shallow learning curve.
One or more embodiments may also provide the advantage,
compared to the more conventional solution of using a
touchscreen, of allowing an unobstructed view of the display
while the finger strokes touchpad 238 to pan an image.
Moreover, in or more embodiments, synergistic benefits may be
obtained by being able to readily alternate between panning
and zooming and/or to conduct these two activities
simultaneously or substantially simultaneously, whether using
touchpad 238 and wheel 242, or using other navigation
mechanisms.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In one or more embodiments, touchpad 238 may have
dimensions of about 2 inches by 1.5 inches, however touchpad
238 may be smaller or larger in either dimension than the
listed values.
FIG. 3 is a flowchart of a method 320 for navigating
image data by a client device 230 in accordance with one or
more embodiments of the present invention. The steps shown in
FIG. 3 are not limited to being performed in the order shown.
In certain cases, two or more of the illustrated steps may be
performed simultaneously. In the following, the steps are
listed and summarized, following which the application of the
steps to basic interactive browsing and the variations thereof
are discussed.
Method 320 may include accessing 322 an Internet site at
Internet 180 by proxy server 150. Proxy server may convert
324 image data from the accessed internet site into a
multiresolution visual data format that supports spatial
random access (multiresolution image data) JPEG2000 is one
possible example of such a visual data format. However, the
present invention is not limited to the use of this standard.
Herein, the term "converting" generally corresponds to the
term "repackaging," as applied to turning ordinary web page
data into multiresolution image data.
Method 320 may include transmitting 326 the
multiresolution image data to client device 230 by proxy
server 150, for ultimate display 328 on client device 230.
Client device 230 may navigate 330 the transmitted
multiresolution image data, either after or concurrently with
the transmission in step 326. The transmitted multiresolution
image data navigated 330 by client device 230 may include
emulated Internet graphical features, such as, but not limited
to dynamic HTML and applet functionality.
11
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In one embodiment, proxy server 150 converts 324 image
data from web pages on Internet 180 into multiresolution image
data, as such data is needed by the navigation 330 of client
device 230.
The multiresolution image data is then preferably
transmitted 326 to client device 230 and displayed 328
thereon. In this embodiment, the accessing 322 of the
Internet site, which may be a web page, by proxy server 150
and the navigation of the multiresolution image data by client
device 230 may occur substantially simultaneously. Thus, in
this embodiment, the conversion 324 of web page (or other type
of) image data occurs substantially "on the fly" in response
to the navigation 330 conducted by client device 230.
Navigation 330 may include but is not limited to panning and
zooming by client device 230.
In another embodiment, proxy server may convert, or "pre-
convert", one or more entire web pages, or other form of
Internet image data, into multiresolution image data and store
the converted data in cache 160, or other memory device
accessible to proxy server 150, for ready accessibility by,
and transmission to, client device 230. This pre-conversion
may be performed prior to the web page data concerned being
needed in response to client device 230 navigation 330
commands.
In one or more embodiments, proxy server 150 may ensure
that, when servicing a request for image data from client
device 230, the cached pre-converted multiresolution image
data isn't "stale" by verifying that the original HTML content
is unaltered in comparison with the cached version. This
verification may be accomplished by comparing timestamps,
checksums, or by using other well-known methods. Some
tolerance for "staleness" may be built in to bound the number
12
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
of HTML requests the proxy server must make. In this
disclosure, data that is not "stale" is "fresh."
In one or more embodiments, proxy server 150 serves as an
intermediary between the Internet 180, or more specifically a
site or Internet page of Internet 180, and client device 230.
Consequently, proxy server 150's communication with the
Internet 180 is preferably responsive to the browsing of
multiresolution image data by client device 230. Also, two or
more of the steps illustrated in FIG. 3 may occur
interactively and/or concurrently rather than occurring one at
a time and strictly in the sequence shown.
Generally, web page image data flows from Internet 180 to
proxy server 150 where it may be converted into
multiresolution image data. At that point, the
multiresolution image data may be stored in cache 160 and/or
transmitted to client device 230 for display thereon. Client
device may, where needed, store the received multiresolution
image data.
Navigation or browsing commands generally proceed in the
reverse direction. Specifically, navigation requests and/or
other commands pertinent to browsing a web page or Internet
site may originate at client device 230, proceed to proxy
server 150 and thereafter may be sent, possibly in modified
form, from proxy server 150 to the Internet 180. Suitable
conversion or reconfiguration of commands received at proxy
server 150 from client device 230 may be implemented at proxy
server 150 before re-transmitting such commands to the
Internet 180. For example, where client device 230 clicks on
a link to another web page, proxy server 150 may re-transmit
this linking command to the Internet 180 in order to access
the target web page. The commands that may be sent by client
device 230 may include navigation commands such as panning and
13
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
zooming. Further commands may include clicking with a mouse
or other device, typing, speaking (where voice commands may be
understood at client device 230), gesturing, movement of a
finger or implement along the surface of touchpad 238,
movement of wheel 242, or movement of joystick 240.
A first example is considered in which a single web page
is accessed 322, converted 324 into multiresolution image
data, and stored in cache 160. After being stored in cache
160, the multiresolution image data may be browsed as needed
by client device 230.
Continuing with the example, let us suppose that client
device 230 receives a initial rendition of the converted web
page at a low resolution level. From this starting point
("resolution level 1"), the user of client device 230 zooms in
on a selection region "A" of the web page. Assuming that
client device does not have the higher-resolution image data
in its local storage, the zoom command is preferably
transmitted to proxy server 150 which may retrieve the needed
data from cache 160 and which may transmit this image data to
client device 230.
Preferably, the extent of the zoom requested by client
device 230 may be associated with a selected resolution level
("resolution level 2" for the sake of this example) by proxy
server 150. The resulting resolution level may be selected
from one of a set of pre-defined resolution levels of the
selected region (region "A") . Alternatively, if the selected
resolution level does not precisely correspond to one of the
pre-defined resolution levels, the requested resolution level
may calculated by conducting one or more interpolation
calculations as disclosed in a document incorporated herein.
Display 234 of client device 230 then preferably shows
the selected region "A" at the selected resolution.
14
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Presumably, substantially only region A of the converted web
page is then displayed on display 234.
Continuing with the example, we consider a situation
where the user of client device 230 next wishes to pan
laterally from region "A" to region "B" (regions are not
shown) within a web page, while preferably maintaining the
resolution level last used to display region "A" ("resolution
level 2"). Assuming that the multiresolution image data for
region "B" is not available within client device 230, the step
of panning toward region "B" preferably causes proxy server
150 to seek the image data for region "B" from cache 160, or
other data storage device accessible to proxy server 150, and
to transmit this data to client device 230. Upon receiving
this data, client device 230 should be able to display region
"B" at "resolution level 2". It is noted that, if the image
data for resolution level 2 had not been available in cache
160, proxy server 150 would preferably have again accessed the
pertinent web page from Internet 180, and converted data for
that web page into multiresolution image data having greater
resolution than that stored in cache 160.
Continuing with the example, we next consider a situation
in which a user of client device 230 wishes to activate a
hyperlink within the first web page, preferably within region
B of the first web page.
As the user of client device 230 clicks on the hyperlink,
proxy server 150 may determine that the web page image data
associated with the hyperlink is not located in cache 160.
Accordingly, proxy server 150 preferably replicates the
request for the hyperlink's target web page in its
communication with Internet 180.
Preferably, proxy server 150 then accesses the
hyperlink's target web page. Proxy server 150 may then
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
convert image data from the target web page into
multiresolution image data and dynamically stream the
multiresolution image data to client device 230 for display
thereon. Alternatively or additionally, proxy server 150 may
store all or part of the multiresolution image data for the
target web page in cache 160 for later access, as needed, by
client device 230. Such storage in cache 160, or other
suitable storage device, may be beneficial since client device
230 may not be able to receive and display all of the
multiresolution image data for the target (hyperlinked) web
page at once. This inability is more likely to be present
where client device 230 has selected a high resolution level
at which to view one or more regions of the target web page.
It is noted that, in general, the higher the resolution (of a
displayed region) at which client device displays
multiresolution image data, the smaller the displayed region
will be.
In one or more embodiments, client device 230 may be able
to display plural regions of a web page, or plural regions
drawn from a plurality of respective web pages. In this case,
client device 230 is preferably able to independently select
display resolutions for each such displayed region. In this
case, touchpad 238 or other mechanism within client device 230
is preferably able to move a cursor over each image, in turn,
for the purpose of selecting a desired resolution for each
such image.
Moreover, within any one image, which image may form only
a portion of a web page, the resolution need not be constant.
The resolution may vary as needed to reconcile the competing
priorities of viewing a region of interest at high resolution
and avoiding the data storage and data communication burden of
receiving the amount of multiresolution image data needed to
display an entire image at high resolution.
16
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
One or more embodiments of the approach described herein
for Web browsing by proxy may provide the features discussed
below.
As discussed above, a zooming function may be provided
with web browsing by proxy. Thus, a Web page formatted for
viewing a wider display than that of client device 230 can
nonetheless be seen in its full column width on a small
display such as display 234. If the text is small relative to
the column size, it may be beneficial to pan horizontally to
read single lines in their entirety. However, the use of
continuous zooming may enable many text columns to be read in
their entirety even on a 176 pixel wide display (which width
is featured on certain commercially available mobile phones).
In one or more embodiments, this is possible without any
reformatting, which would alter the layout of the Web page.
FIG. 4 shows a column of a Web page (originally designed for
viewing on an 800x200 pixel display) being viewed in this
fashion.
Provided that either conversion to multiresolution image
data format is reasonably fast or that cached multiresolution
image content is already available at cache 160, an initial
view of an entire Web page can be rendered visible at client
device 230 shortly after requesting the pertinent web page.
This ability may be provided for Web pages of any length. It
also applies whether the initial view of the Web page is a
zoomed-out overview of the entire page or a zoomed-in view of
the top (or any other part) of the web page.
FIGS. 5A and 6A illustrate these situations, providing
general overviews and a top views, respectively, of a large
Web page after about 2 seconds of loading over a 20
kilobit/second connection. FIGS. 5B and 6B show clearer
17
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
versions of FIGS. 5A and 6A, respectively, after about ten
seconds of data loading time.
For complex Web pages with a variety of content, such as
newspapers, the client can continuously focus in on
interesting areas immediately, without waiting for the entire
document to download. See FIG. 7.
Web pages of virtually unlimited length may be viewed
without taxing the resources of client device 230, without
delaying the time to initial view, and without causing
degraded performance.
In one or more embodiments, a Web page may have very
large images, which client device 230 may zoom in on to see
fine details of, without taxing client device 230's resources,
without delaying the time to initial viewing of the web page,
and/or without resulting in degraded performance.
In some cases, small device displays are taller than they
are wide (e.g. 176 pixels wide by 220 pixels high for the
Samsung SGH-D410 mobile phone). Yet for Web browsing, the
most constraining variable may be column width, making it
desirable for the cellular display to use a "landscape" rather
than a "portrait" page setup. Because multiresolution image
data allows rotation of visual content by a selectable angular
distance, the larger dimension of the display can be used to
its greatest effect during content browsing. See FIGS. 8A and
8B.
In one or more embodiments, a client device 230 accessing
multiresolution image data may be enabled to view image
content originating on any server having access to
multiresolution image data, not only the multiresolution image
data generated by proxy server 150. Thus, such a client
device 150 may be enabled to continuously navigate large
numbers of large images, large multiresolution vectorial maps,
18
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
and other kinds of image content that might be difficult or
impossible to serve using conventional HTML.
The above-listed advantages may be compelling enough to
make multiresolution image data browsing using proxy server
150 beneficial not only for cellular phone users, but also for
users of larger devices, including, but not limited to: PDA or
palmtop wireless devices; notebook computers with wireless
internet; home computer users connected to the internet via a
narrowband connection such as a modem; and/or home broadband
users.
In the case of home broadband users, one or more
embodiments of a system using proxy server 150 to transmit
multiresolution image data to client device 230 may provide
features that include but are not limited to the following:
a) Web pages may be zoomed continuously to improve
legibility of small type or examine details of images;
b) web pages designed for a small display could be
browsed in sharp detail (for text or vector components) on a
high-resolution display;
c) multiple Web pages could be rearranged, manipulated
and resized in a rich 2D (two-dimensional) or 3D (three-
dimensional) multiscale environment, providing alternative
user experiences to conventional multi-window or tabbed
browsing; and/or
d) through the judicious use of proxy server 150 caching,
potentially unlimited numbers of Web pages could be aggregated
into live large-scale "views" of the Web, without requiring
excessive client device 230 memory or other resources.
Finally, because displaying images in a multiresolution
image data format may provide complementary advantages for
large-display, high-bandwidth platforms and small-display,
19
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
low-bandwidth platforms, displaying in this manner may unify
user experience and may simplify cross-platform engineering
and development, rather than requiring different versions of
visual content and/or browsers to be provided for the
different platforms.
FIG. 9 is a block diagram of a computing system 800
adaptable for use with one or more embodiments of the present
invention. In one or more embodiments, central processing
unit (CPU) 802 may be coupled to bus 804. In addition, bus 804
may be coupled to random access memory (RAM) 806, read only
memory (ROM) 808, input/output (I/0) adapter 810,
communications adapter 822, user interface adapter 806, and
display adapter 818.
In one or more embodiments, RAM 806 and/or ROM 808 may
hold user data, system data, and/or programs. I/0 adapter 810
may connect storage devices, such as hard drive 812, a CD-ROM
(not shown), or other mass storage device to computing system
800. Communications adapter 822 may couple computing system
800 to a local, wide-area, or Internet network 824. User
interface adapter 816 may couple user input devices, such as
keyboard 826 and/or pointing device 814, to computing system
800. Moreover, display adapter 818 may be driven by CPU 802 to
control the display on display device 820. CPU 802 may be any
general purpose CPU.
It is noted that the methods and apparatus described thus
far and/or described later in this document may be achieved
utilizing any of the known technologies, such as standard
digital circuitry, analog circuitry, any of the known
processors that are operable to execute software and/or
firmware programs, programmable digital devices or systems,
programmable array logic devices, or any combination of the
above. One or more embodiments of the invention may also be
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
embodied in a software program for storage in a suitable
storage medium and execution by a processing unit.
Although the invention herein has been described with
reference to particular embodiments, it is to be understood
that these embodiments are merely illustrative of the
principles and applications of the present invention. It is
therefore to be understood that numerous modifications may be
made to the illustrative embodiments and that other
arrangements may be devised without departing from the spirit
and scope of the present invention as defined by appended
claims appended to the end of this description.
It is noted that additional disclosure text is appended
hereto which precede the claims and abstract of this
application.
21
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
SYSTEM AND METHOD FOR EXACT
RENDERING IN A ZOOMING USER INTERFACE
FIELD OF THE INVENTION
The present invention relates generally to graphical
zooming user interfaces (ZUI) for computers. More
specifically, the invention is a system and method for
progressively rendering zoomable visual content in a manner
that is both computationally efficient, resulting in good user
responsiveness and interactive frame rates, and exact, in the
sense that vector drawings, text, and other non-photographic
content is ultimately drawn without the resampling which would
normally lead to degradation in image quality, and without
interpolation of other images, which would also lead to
degradation.
BACKGROUND OF THE INVENTION
Most present-day graphical computer user interfaces
(GUIs) are designed using visual components of a fixed spatial
scale. However, it was recognized from the birth of the field
of computer graphics that visual components could be
represented and manipulated in such a way that they do not
have a fixed spatial scale on the display, but can be zoomed
in or out. The desirability of zoomable components is obvious
in many application domains; to name only a few: viewing maps,
browsing through large heterogeneous text layouts such as
newspapers, viewing albums of digital photographs, and working
with visualizations of large data sets. Even when viewing
ordinary documents, such as spreadsheets and reports, it is
often useful to be able to glance at a document overview, and
then zoom in on an area of interest. Many modern computer
applications include zoomable components, such as Microsoft
Word and other Office products (Zoom under the View menu),
22
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Adobe Photoshop , Adobe Acrobat , QuarkXPress , etc.
In most cases, these applications allow zooming in and out of
documents, but not necessarily zooming in and out of the
visual components of the applications themselves. Further,
zooming is normally a peripheral aspect of the user's
interaction with the software, and the zoom setting is only
modified occasionally. Although continuous panning over a
document is standard (i.e., using scrollbars or the cursor to
translate the viewed document left, right, up or down), the
ability to zoom and pan continuously in a user-friendly manner
is absent from prior art systems.
First, we set forth several definitions. A display is
the device or devices used to output rendered imagery to the
user. A frame buffer is used to dynamically represent the
contents of at least a portion of the display. Display
refresh rate is the rate at which the physical display, or
portion thereof, is refreshed using the contents of the frame
buffer. A frame buffer's frame rate is the rate at which the
frame buffer is updated.
For example, in a typical personal computer, the display
refresh rate is 60-90 Hz. Most digital video, for example,
has a frame rate of 24-30 Hz. Thus, each frame of digital
video will actually be displayed at least twice as the display
is refreshed. Plural frame buffers may be utilized at
different frame rates and thus be displayed substantially
simultaneously on the same display. This would occur, for
example, when two digital videos with different frame rates
were being played on the same display, in different windows.
One problem with zooming user interfaces (ZUI) is that
the visual content has to be displayed at different
resolutions as the user zooms. The ideal solution to this
problem would be to display, in every consecutive frame, an
exact and newly computed image based on the underlying visual
23
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
content. The problem with such an approach is that the exact
recalculation of each resolution of the visual content in real
time as the user zooms is computationally impractical if the
underlying visual content is complex.
As a result of the foregoing, many prior art ZUI systems
use a plurality of precomputed images, each being a
representation of the same visual content but at different
resolutions. We term each of those different precomputed
images a Level of Detail (LOD). The complete set of LODs,
organized conceptually as a stack of images of decreasing
resolution, is termed the LOD pyramid- see Fig. 10. In such
prior systems, as zooming occurs, the system interpolates
between the LODs and displays a resulting image at a desired
resolution. While this approach solves the computational
issue, it displays a final compromised image that is often
blurred and unrealistic, and often involves loss of
information due to the fact that it represents interpolation
of different LODs. These interpolation errors are especially
noticeable when the user stops zooming and has the opportunity
to view a still image at a chosen resolution which does not
precisely match the resolution of any of the LODs.
Another problem with interpolating between precomputed
LODs is that this approach typically treats vector data in the
same way as photographic or image data. Vector data, such as
blueprints or line drawings, are displayed by processing a set
of abstract instructions using a rendering algorithm, which
can render lines, curves and other primitive shapes at any
desired resolution. Text rendered using scalable fonts is an
important special case of vector data. Image or photographic
data (including text rendered using bitmapped fonts) are not
so generated, but must be displayed either by interpolation
between precomputed LODs or by resampling an original image.
We refer to the latter herein as nonvector data.
24
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Prior art systems that use rendering algorithms to
redisplay vector data at a new resolution for each frame
during a zoom sequence must restrict themselves to simple
vector drawings only in order to achieve interactive frame
rates. On the other hand, prior art systems that precompute
LODs for vector data and interpolate between them, as for
nonvector data, suffer from markedly degraded visual quality,
as the sharp edges inherent in most vector data renditions are
particularly sensitive to interpolation error. This
degradation is usually unacceptable for textual content, which
is a special case of vector data.
It is an object of the invention to create a ZUI that
replicates the zooming effect a user would see if he or she
actually had viewed a physical object and moved it closer to
himself or herself.
It is an object of the invention to create a ZUI that
displays images at an appropriate resolution but which avoids
or diminishes the interpolation errors in the final displayed
image. A further object of the present invention is to allow
the user to zoom arbitrarily far in on vector content while
maintaining a crisp, unblurred view of the content and
maintaining interactive frame rates.
A further object of the present invention is to allow the
user to zoom arbitrarily far out to get an overview of complex
vectorial content, while both preserving the overall
appearance of the content and maintaining interactive frame
rates.
A further object of the present invention is to diminish
the user's perception of transitions between LODs or rendition
qualities during interaction.
A further object of the present invention is to allow the
graceful degradation of image quality by blurring when
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
information ordinarily needed to render portions of the image
is as yet incomplete.
A further object of the present invention is to gradually
increase image quality by bringing it into sharper focus as
more complete information needed to render portions of the
image becomes available.
It is an object of the invention to optimally and
independently render both vector and nonvector data.
These and other objects of the present invention will
become apparent to those skilled in the art from a review of
the specification that follows.
SUNIIMARY OF THE INVENTION
The above and other problems of the prior art are
overcome in accordance with the present invention, which
relates to a hybrid strategy for implementing a ZUI allowing
an image to be displayed at a dynamically varying resolution
as a user zooms in or out, rotates, pans, or otherwise
changes his or her view of an image. Any such change in view
is termed navigation. Zooming of the image to a resolution
not equal to that of any of the predefined LODs is
accomplished by displaying the image at a new resolution that
is interpolated from predefined LODs that "surround" the
desired resolution. By "surrounding LODs" we mean the LOD of
lowest resolution which is greater than the desired resolution
and the LOD of highest resolution which is less than the
desired resolution. If the desired resolution is either
greater than the resolution of the LOD with the highest
available resolution or less than the resolution of the LOD
with the lowest resolution, then there will be only a single
"surrounding LOD". The dynamic interpolation of an image at a
desired resolution based on a set of precomputed LODs is
26
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
termed in the literature mipmapping or trilinear
interpolation. The latter term further indicates that
bilinear sampling is used to resample the surrounding LODs,
followed by linear interpolation between these resampled LODs
(hence trilinear). See, e.g.; Lance Williams. "Pyramidal
Parametrics," Computer Graphics (Proc. SIGGRAPH 183) 17(3): 1-
11 (1983). The foregoing document is incorporated herein by
reference in its entirety. Obvious modifications of or
extensions to the mipmapping technique introduced by Williams
use nonlinear resampling and/or interpolation of the
surrounding LODs. In the present invention it is immaterial
whether the resampling and interpolation operations are
zeroth-order (nearest-neighbor), linear, higher-order, or more
generally nonlinear.
In accordance with the invention described herein, when
the user defines an exact desired resolution, which is almost
never the resolution of one of the predefined LODs, the final
image is then displayed by preferably first displaying an
intermediate final image. The intermediate final image is the
first image displayed at the desired resolution before that
image is refined as described hereafter. The intermediate
final image may correspond to the image that would be
displayed at the desired resolution using the prior art.
In a preferred embodiment, the transition from the
intermediate final image to the final image may be gradual, as
explained in more detail below.
In an enhanced embodiment, the present invention allows
LODs to be spaced in any resolution increments, including
irrational increments (i.e. magnification or minification
factors between consecutive LODs which cannot be expressed as
the ratio of two integers), as explained in more detail below.
In another enhanced embodiment, portions of the image at
each different LOD are denoted tiles, and such tiles are
27
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
rendered in an order that minimizes any perceived
imperfections to a viewer. In other embodiments, the
displayed visual content is made up of plural LODs
(potentially a superset of the surrounding LODs as described
above), each of which is displayed in the proper proportion
and location in order to cause the display to gradually fade
into the final image in a manner that conceals imperfections.
The rendition of various tiles in plural LODs is
accomplished in an order that optimizes the appearance of the
visual content while staying within acceptable levels of
computational complexity so that the system can run on
standard computers with typical clock speeds available in most
laptop and desktop personal computers.
The present invention involves a hybrid strategy, in
which an image is displayed using predefined LODs during rapid
zooming and panning, but when the view stabilizes
sufficiently, an exact LOD is rendered and displayed. The
exact LOD is rendered and displayed at the precise resolution
chosen by the user, which is normally different from the
predefined LODs. Because the human visual system is
insensitive to fine detail in the visual content while it is
still in motion, this hybrid strategy can produce the illusion
of continuous "perfect rendering" with far less computation.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 10 depicts an LOD pyramid (in this case the base
of the pyramid, representing the highest-resolution
representation, is a 512x512 sample image, and successive
minifications of this image are shown in factors of 2);
Figure 11 depicts a flow chart for use in an exemplary
embodiment of the invention;
28
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Figure 12 is another flow chart that shows how the system
displays the final image after zooming;
Figure 13 is the LOD pyramid of Figure 1 with grid lines
added showing the subdivision of each LOD into rectangular
tiles of equal size in samples;
Figure 14 is another flow chart, for use in connection
with the present invention, and it depicts a process for
displaying rendered tiles on a display;
Figure 15 shows a concept termed irrational tiling,
explained in more detail herein;
Figure 16 depicts a composite tile and the tiles that
make up the composite tile, as explained more fully below; and
FIG. 17 depicts successive stages of the rendering, in
foveated order, of the second layer of the LOD pyramid of FIG.
4, the successive stages being illustrated in FIGS. 17A, 17B,
17C, and 17D, respectively, in accordance with one or more
embodiments of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Figure 11 shows a flowchart of a basic technique for
implementation of the present invention. The flowchart of
Figure 11 represents an exemplary embodiment of the invention
and would begin executing when an image is displayed at an
initial resolution. It is noted that the invention may be
used in the client/server model, but that the client and
server may be on the same or different machines. Thus, for
example, there could be a set of discrete LODs stored remotely
at a host computer, and the user can be connected to said host
through a local PC. The actual hardware platform and system
utilized are not critical to the present invention.
The flowchart is entered at start block 241 with an
initial view of an image at a particular resolution. In this
29 1
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
example, the image is taken to be static. The image is
displayed at block 242. A user may navigate that image by
moving, for example, a computer mouse. The initial view
displayed at block 242 will change when the user navigates the
image. It is noted that the underlying image may itself be
dynamic, such as in the case of motion video, however, for
purposes of this example, the image itself is treated as
static. As explained above, any image to be displayed may
also have textual or other vector data and/or nonvector data
such as photographs and other images. The present invention,
and the entire discussion below, is applicable regardless of
whether the image comprises vector or nonvector data, or both.
Regardless of the type of visual content displayed in
block 242, the method transfers control to decision point 243
at which navigation input may be detected. If such input is
not detected, the method loops back to block 242 and continues
displaying the stationary visual content. If a navigation
input is detected, control will be transferred to block 244 as
shown.
Decision point 243 may be implemented by a continuous
loop in software looking for a particular signal that detects
movement, an interrupt system in hardware, or any other
desired methodology. The particular technique utilized to
detect and analyze the navigation request is not critical to
the present invention. Regardless of the methodology used,
the system can detect the request, thus indicating a desire to
navigate the image. Although much of the discussion herein
relates to zooming, it is noted that the techniques are
applicable to zooming, panning, or otherwise navigating.
Indeed, the techniques described herein are applicable to any
type of dynamic transformation or change in perspective on the
image. Such transformations may include, for example, three
dimensional translation and rotation, application of an image
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
filter, local stretching, dynamic spatial distortion applied
to selected areas of the image, or any other kind of
distortion that might reveal more information. Another
example would be a virtual magnifying glass, that can get
moved over the image and which magnifies parts of the image
under the virtual magnifying glass. When decision point 243
detects that a user is initiating navigation, block 244 will
then render and display a new view of the image, which may be,
for example, at a different resolution from the prior
displayed view.
One straightforward prior art technique of displaying the
new view is based upon interpolating LODs as the user zooms in
or out. The selected LODs may be those two LODs that
"surround" the desired resolution; i.e.; the resolution of the
new view. The interpolation, in prior systems, constantly
occurs as the user zooms and is thus often implemented
directly in the hardware to achieve speed. The combination of
detection of movement in decision point 245 and a
substantially immediate display of an appropriate interpolated
image at block 244 results in the image appearing to zoom
continuously as the user navigates. During zooming in or out,
since the image is moving, an interpolated image is sufficient
to look realistic and clear. Any interpolation error is only
minimally detectable by the human visual system, as such
errors are disguised by the constantly changing view of the
image.
At decision point 245, the system tests whether or not
the movement has substantially ceased. This can be
accomplished using a variety of techniques, including, for
example, measuring the rate of change of one or more
parameters of the view. That is, the methodology ascertains
whether or not the user has arrived at the point where he has
finished zooming. Upon such stabilization at decision point
31
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
245, control is transferred to block 246, where an exact image
is rendered, after which control returns to block 243. Thus,
at any desired resolution, the system will eventually display
an exact LOD.
Notably, the display is not simply rendered and displayed
by an interpolation of two predefined LODs, but may be
rendered and displayed by re-rendering vector data using the
original algorithm used to render the text or other vector
data when the initial view was displayed at block 242.
Nonvector data may also be resampled for rendering and
displayed at the exact required LOD. The required re-
rendering or resampling may be performed not only at the
precise resolution required for display at the desired
resolution, but also on a sampling grid corresponding
precisely to the correct positions of the display pixels
relative to the underlying content, as calculated based on the
desired view. As an example, translation of the image on the
display by '-~ pixel in the display plane does not change the
required resolution, but it does alter the sampling grid, and
therefore requires re-rendering or resampling of the exact
LOD.
The foregoing system of Fig. 11 represents a hybrid
approach in which interpolation based upon predefined LODs is
utilized while the view is changing (e.g. navigation is
occurring) but an exact view is rendered and displayed when
the view becomes substantially stationary.
For purposes of explanation herein, the term render
refers to the generation by the computer of a tile at a
specific LOD based upon vector or nonvector data. With
respect to nonvector data, these may be rerendered at an
arbitrary resolution by resampling an original image at higher
or lower resolution.
32
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
We turn now to the methodology of rendering and
displaying the different portions of the visual content needed
to achieve an exact final image as represented by block 246 of
Fig. 11. With reference to Fig. 12, when it is determined
that navigation has ceased, control is transferred to block
333 and an interpolated image is immediately displayed, just
as is the case during zooming. We call this interpolated
image that may be temporarily displayed after the navigation
ceases the intermediate final image, or simply an intezrmediate
image. This image is generated from an interpolation of the
surrounding LODs. In some cases, as explained in more detail
below, the intermediate image may be interpolated from more
than two discrete LODs, or from two discrete LODs other than
the ones that surround the desired resolution.
Once the intermediate image is displayed, block 334 is
entered, which causes the image to begin to gradually fade
towards an exact rendition of the image, which we term the
final image. The final image differs from the intermediate
image in that the final image may not involve interpolation of
any predefined LODs. Instead, the final image, or portions
thereof, may comprise newly rendered tiles. In the case of
photographic data, the newly rendered tiles may result from
resampling the original data, and in the case of vector data,
the newly rendered tiles may result from rasterization at the
desired resolution.
It is also noted that it is possible to skip directly
from block 333 to 335, immediately replacing the interpolated
image with a final and exact image. However, in the preferred
embodiment, step 334 is executed so the changeover from the
intermediate final image to the final image is done gradually
and smoothly. This gradual fading, sometimes called blending,
causes the image to come into focus gradually when navigation
ceases, producing an effect similar to automatic focusing in
33
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
cameras or other optical instruments. The illusion of
physicality created by this effect is an important aspect of
the present invention.
Following is a discussion of the manner in which this
fading or blending may take place in order to minimize
perceived irregularities, sudden changes, seams, and other
imperfections in the image. It is understood however that the
particular technique of fading is not critical to the present
invention, and that many variations will be apparent to those
of skill in the art.
Different LODs differ in the number of samples per
physical area of the underlying visual content. Thus, a first
LOD may take a 1 inch by 1 inch area of a viewable object and
generate a single 32 by 32 sample tile. However, the
information may also be rendered by taking the same 1 inch by
1 inch area and representing it as a tile that is 64 by 64
samples, and therefore at a higher resolution.
We define a concept called irrational tiling. Tiling
granularity, which we will write as the variable g, is defined
as the ratio of the linear tiling grid size at a higher-
resolution LOD to the linear tiling grid size at the next
lower-resolution LOD. In the Williams paper introducing
trilinear interpolation, g = 2. This same value of g has been
used in other prior art. Although LODs may be subdivided into
tiles in any fashion, in an exemplary embodiment each LOD is
subdivided into a grid of square or rectangular tiles
containing a constant number of samples (except, as required,
at the edges of the visual content) . Conceptually, when g =
2, each tile at a certain LOD "breaks up" into 2x2=4 tiles at
the next higher-resolution LOD (again, except potentially at
the edges), as shown in Figure 13.
There are fundamental shortcomings in tilings of
granularity 2. Usually, if a user zooms in on a random point
34
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
in a tile, every g-fold increase in zoom will require the
rendition of a single additional tile corresponding to the
next higher-resolution LOD near the point toward which the
user is zooming. However, if a user is zooming in on a grid
line in the tiling grid, then two new tiles need to be
rendered, one on either side of the line. Finally, if a user
is zooming in on the intersection of two grid lines, then four
new tiles need to be rendered. If these events-requests for
1, 2 or 4 new tiles with each g-fold zoom-are interspersed
randomly throughout an extended zooming sequence, then overall
performance will be consistent. However, a grid line in any
integral-granularity tiling (i.e. where g is a whole number)
remains a grid line for every higher-resolution LOD.
Consider, for example, zooming in on the center of a very
large image tiled with granularity 2. We will write the (x,y)
coordinates of this point as adopting the convention
that the visual content falls within a square with corners
(0,0), (0,1), (1,0) and (1,1). Because the center is at the
intersection of two grid lines, as the user reaches each
higher-resolution LOD, four new tiles need to be rendered
every time; this will result in slow performance and
inefficiency for zooming on this particular point. Suppose,
on the other hand, that the user zooms in on an irrational
point-meaning a grid point (x,y) such that x and y cannot be
expressed as the ratios of two whole numbers. Examples of
such numbers are pi (=3.14159...) and the square root of 2
(=1.414213...). Then, it can easily be demonstrated that the
sequence of l's, 2's and 4's given by the number of tiles that
need to be rendered for every g-fold zoom is quasi-random,
i.e. follows no periodic pattern. This kind of quasi-random
sequence is clearly more desirable from the point of view of
performance; then there are no distinguished points for
zooming from a performance standpoint.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Irrational tiling resolves this issue: g itself is taken
to be an irrational number, typically the square root of 3, 5
or 12. Although this means that on average 3, 5 or 12 tiles
(correspondingly) at a given LOD are contained within a single
tile at the next lower-resolution LOD, note that the tiling
grids at consecutive LODs no longer "agree" on any grid lines
in this scheme (except potentially at the leading edges of the
visual content, x= and y=0, or at some other preselected
single grid line along each axis) If g is chosen such that
it is not the nth root of any integer (pi is such a number),
then no LODs will share any grid lines (again, potentially
except x=O and y=0). Hence it can be shown that each tile may
randomly overlap 1, 2, or 4 tiles at the next lower LOD,
whereas with g=2 this number is always 1.
With irrational tiling granularity, zooming in on any
point will therefore produce a quasi-random stream of requests
for 1, 2 or 4 tiles, and performance will be on average
uniform when zooming in everywhere. Perhaps the greatest
benefit of irrational tiling emerges in connection with
panning after a deep zoom. When the user pans the image after
having zoomed in deeply, at some point a grid line will be
moved onto the display. It will usually be the case that the
region on the other side of this grid line will correspond to
a lower-resolution LOD than the rest of the display; it is
desirable, however, for the difference between these
resolutions to be as small as possible. With integral g,
however, the difference will often be extremely large, because
grid lines can overlap over many consecutive LODs. This
creates "deep cracks" in resolution over the node area, as
shown in Figure 15 (a) .
On the other hand, because grid lines in an irrational
tiling never overlap those of an adjacent LOD (again with the
possible exception of one grid line in each direction, which
36
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
may be at one corner of the image), discontinuities in
resolution of more than one LOD do not occur. This increased
smoothness in relative resolution allows the illusion of
spatial continuity to be much more convincing.
Figure 15(b) illustrates the advantage gained by
irrational tiling granularity. Figure 15 shows cross-sections
through several LODs of the visual content; each bar
represents a cross-section of a rectangular tile. Hence the
second level from the top, in which there are two bars, might
be a 2x2=4 tile LOD. The curves 631, drawn from top to
bottom, represent the bounds of the visible area of the visual
content at the relevant LOD during a zooming operation: as the
resolution is increased (zooming in to reveal more detail),
the area under examination decreases. Darker bars (e.g., 632)
represent tiles which have already been rendered over the
course of the zoom. Lighter bars have not yet been rendered,
so cannot be displayed. Note that when the tiling is integral
as in Figure 15(a), abrupt changes in resolution over space
are common; if the user were to pan right after the zoom, then
at the spatial boundary indicated by the arrow, four LODs
would "end" abruptly. The resulting image would look sharp to
the left of this boundary, and extremely blurry to the right.
The same visual content represented using an irrational tiling
granularity lacks such resolution "cracks": adjacent LODs do
not share tile boundaries, except as shown at the left edge.
Mathematically, this shared boundary may occur at most in one
position on the x-axis and at one position on the y-axis. In
the embodiment shown these shared boundaries are positioned at
y=0 and x=0, but, if present, they may also be placed at any
other position.
Another benefit of irrational tiling granularity is that
it allows finer control of g, since there are a great many
more irrational numbers than integers, particularly over the
37
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
useful range where g is not too large. This additional
freedom can be useful for tuning the zooming performance of
certain applications. If g is set to the irrational square
root of an integer (such as sqrt ( 2), sqrt ( 5) or sqrt ( 8)), then
in the embodiment described above the grid lines of alternate
LODs would align exactly; if g is an irrational cube root,
then every third LOD would align exactly; and so on. This
confers an additional benefit with respect to limiting the
complexity of a composite tiling, as defined below.
An important aspect of the invention is the order in
which the tiles are rendered. More particularly, the various
tiles of the various LODs are optimally rendered such that all
visible tiles are rendered first. Nonvisible tiles may not be
rendered at all. Within the set of visible tiles, rendition
proceeds in order of increasing resolution, so that tiles
within low-resolution LODs are rendered first. Within any
particular LOD, tiles are rendered in order of increasing
distance from the center of the display, which we refer to as
foveated rendering. To sort such tiles in the described
order, numerous sorting algorithms such as heapsort,
quicksort, or others may be used. To implement this ordering,
a lexicographic key may be used for sorting "requests" to
render tiles, such that the outer subkey is visibility, the
middle subkey is resolution in samples per physical unit, and
the inner subkey is distance to the center of the display.
Other methods for ordering tile rendering requests may also be
used. The actual rendering of the tiles optimally takes place
as a parallel process with the navigation and display
described herein. When rendering and navigation/display
proceed as parallel processes, user responsiveness may remain
high even when tiles are slow to render.
We now describe the process of rendering a tile in an
exemplary embodiment. If a tile represents vector data, such
38
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
as alphabetic typography in a stroke based font, then
rendering of the tile would involve running the algorithm to
rasterize the alphabetic data and possibly transmitting that
data to a client from a server. Alternatively, the data fed
to the rasterization algorithm could be sent to the client,
and the client could run the algorithm to rasterize the tile.
In another example, rendering of a tile involving digitally
sampled photographic data could involve resampling of that
data to generate the tile at the appropriate LOD. For
discrete LODs that are prestored, rendering may involve no
more than simply transmitting the tile to a client computer
for subsequent display. For tiles that fall between discrete
LODs, such as tiles in the final image, some further
calculation as described above may be required.
At any given time, when the tiles are rendered and the
image begins to fade toward the exact image, the actual
display may comprise different mixes of different tiles from
different LODs. Thus, any portion of the display could
contain for example, 20% from LOD 1, 40% from LOD 2, and 40%
from LOD 3. Regardless of the tiles displayed, the algorithm
attempts to render tiles from the various LODs in a priority
order best suited to supply the rendered tiles for display as
they are most needed. The actual display of the rendered
tiles will be explained in more detail later with reference to
Figure 14.
In what follows we describe a method for drawing the
plural LODs using an algorithm which can guarantee spatial and
temporal continuity of image detail. The algorithm is
designed to make the best use of all rendered tiles, using
high-resolution tiles in preference to lower-resolution tiles
covering the same display area, yet using spatial blending to
avoiding sharp boundaries between LODs, and temporally
graduated blending weights to blend in higher detail if and
39
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
when it becomes available (i.e. when higher-resolution tiles
have been rendered). Unlike the prior art, this algorithm and
variants thereof can result in more than two LODs being
blended together at a given point on the display; it can also
result in blending coefficients that vary smoothly over the
display area; and it can result in blending coefficients that
evolve in time even after the user has stopped navigating. In
this exemplary embodiment it is nonetheless computationally
efficient, and can be used to render imagery as partially
transparent, or with an overall transparency that varies over
the image area, as will become apparent.
We define herein a composite tile area, or simply a
composite tile. To define a composite tile we consider all of
the LODs stacked on top of each other. Each LOD has its own
tile grid. The composite grid is then formed by the
projection of all of the grids from all of the LODs onto a
single plane. The composite grid is then made up of various
composite tiles of different sizes, defined by the boundaries
of tiles from all of the different LODs. This is shown
conceptually in Fig. 16. Fig. 16 depicts the tiles from three
different LODs, 701 through 703, all representing the same
image. One can imagine the LODs 701 through 703 being stacked
up on top of each other. In such a case, if one lined up
corner 750 from each of these LODs and stacked them on top of
each other, an area represented by 740 would be inside the
area represented by 730, and the areas represented by 730 and
740, would be inside the area represented by 720. Area 710 of
Fig. 16 shows that there would be a single "composite tile"
710. Each of the composite tiles is examined during each
frame, wherein the frame rate may be typically greater than
ten frames per second. Note that, as explained above, this
frame rate is not necessarily the display refresh rate.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Fig. 14 depicts a flow chart of an algorithm for updating
the frame buffer as tiles are rendered. The arrangement of
Fig. 14 is intended to operate on every composite tile in the
displayed image each time the frame buffer is updated. Thus,
for example, if a frame duration is 1/20 of a second, each of
the composite tiles on the entire screen would preferably be
examined and updated during each 1/20 of a second. When a
composite tile is operated upon by the process of Fig. 14, the
composite tile may lack the relevant tiles in one or more
LODs. The process of Fig. 14 attempts to display each
composite tile as a weighted average of all the available
superimposed tiles within which the composite tile lies. Note
that composite tiles are defined in such a way that they fall
within exactly one tile at any given LOD; hence the weighted
average can be expressed as a relative proportion of each LOD.
The process attempts to determine the appropriate weights for
each LOD within the composite tile, and to vary those weights
gradually over space and time to cause the image to gradually
fade towards the final image discussed above.
The composite grid includes plural vertices which are
defined to be any intersection or corner of gridlines in the
composite grid. These are termed composite grid vertices. We
define an opacity for each LOD at each composite grid vertex.
The opacity can be expressed as a weight between 0.0 and 1.0,
and the sum of all the LOD weights at each vertex should
therefore be 1.0 if the desired result is for the image to be
totally opaque. The current weights at any particular time
for each LOD at each vertex are maintained in memory.
The algorithm for updating vertex weights proceeds as
described below.
The following variables, which are taken to be numbers
between 0.0 and 1.0, are kept in memory for each tile:
centerOpacity, cornerOpacity for each corner (4 if the tiling
41
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
is a rectangular grid), and edgeOpacity for each edge (4 if
the tiling is a rectangular grid) . When a tile is first
rendered, all of its opacities as just listed are normally set
to 1Ø
During a drawing pass, the algorithm walks through the
composite tiling once for each relevant LOD, beginning with
the highest-resolution LOD. In addition to the per-tile
variables, the algorithm maintains the following variables:
levelOpacityGrid and opacityGrid. Both of these variables are
again numbers between 0.0 and 1.0, and are maintained for each
vertex in the composite tiling.
The algorithm walks through each LOD in turn, in order
from highest-resolution to lowest, performing the following
operations. First 0.0 is assigned to levelOpacityGrid at all
vertices. Then, for each rendered tile at that LOD (which may
be a subset of the set of tiles at that LOD, if some have not
yet been rendered), the algorithm updates the parts of the
levelOpacityGrid touching that tile based on the tile's
center0pacity, corner0pacity and edgeOpacity values:
If the vertex is entirely in the interior of the tile,
then it gets updated using centerOpacity.
If the vertex is e.g. on the tile's left edge, it gets
updated with the left edgeOpacity.
If the vertex is e.g. on the top right corner, it gets
updated with the top right corner0pacity.
"Updating" means the following: if the pre-existing
levelOpacityGrid value is greater than 0.0, then set the new
value to the minimum of the present value, or the value it's
being updated with. If the pre-existing value is zero (i.e.
this vertex hasn't been touched yet) then just set the
levelOpacityGrid value to the value it's being updated with.
The end result is that the levelOpacityGrid at each vertex
42
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
position gets set to the minimum nonzero value with which it
gets updated.
The algorithm then walks through the levelOpacityGrid and
sets to 0.0 any vertices that touch a tile which has not yet
been rendered, termed a hole. This ensures spatial continuity
of blending: wherever a composite tile falls within a hole, at
the current LOD, drawing opacity should fade to zero at all
vertices abutting that hole.
In an enhanced embodiment, the algorithm can then relax
all levelOpacityGrid values to further improve spatial
continuity of LOD blending. The situation as described thus
far can be visualized as follows: every vertex is like a
tentpole, where the levelOpacityGrid value at that point are
the tentpole's height. The algorithm has thus far ensured
that at all points bordering on a hole, the tentpoles have
zero height; and in the interior of tiles that have been
rendered, the tentpoles are set to some (probably) nonzero
value. In the extreme case, perhaps all the values inside a
rendered tile are set to 1Ø Assume for purposes of
illustration that the rendered tile has no rendered neighbors
yet, so the border values are 0Ø We have not specified how
narrow the "margin" is between a 0.0 border tentpole and one
of the 1.0 internal tentpoles. If this margin is too small,
then even though the blending is technically continuous, the
transition may be too sharp when measured as an opacity
derivative over space. The relax operation smoothes out the
tent, always preserving values of 0.0, but possibly lowering
other tentpoles to make the function defined by the tent
surface smoother, i.e. limiting its maximum spatial
derivative. It is immaterial to the invention which of a
variety of methods are used to implement this operation; one
approach, for example, is to use selective low-pass filtering,
locally replacing every nonzero value with a weighted average
43
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
of its neighbors while leaving zeroes intact. Other methods
will also be apparent to those skilled in the art.
The algorithm then walks over all composite grid
vertices, considering corresponding values of levelOpacityGrid
and opacityGrid at each vertex: if levelOpacityGrid is greater
than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0-
opacityGrid. Then, again for each vertex, corresponding
values of levelOpacityGrid are added to opacityGrid. Due to
the previous step, this can never bring opacityGrid above 1Ø
These steps in the algorithm ensure that as much opacity as
possible is contributed by higher-resolution LODs when they
are available, allowing lower-resolution LODs to "show
through" only where there are holes.
The final step in the traversal of the current LOD is to
actually draw the composite tiles at the current LOD, using
levelOpacityGrid as the per-vertex opacity values. In an
enhanced embodiment, levelOpacityGrid can be multiplied by a
scalar overallOpacity variable in the range 0.0 to 1.0 just
before drawing; this allows the entire image to be drawn with
partial transparency given by the overallOpacity. Note that
drawing an image-containing polygon, such as a rectangle, with
different opacities at each vertex is a standard procedure.
It can be accomplished, for example, using industry-standard
texture mapping functions using the OpenGL or Direct3D
graphics libraries. In practice, the drawn opacity within the
interior of each such polygon is spatially interpolated,
resulting in a smooth change in opacity over the polygon.
In another enhanced embodiment of the algorithm described
above, tiles maintain not only their current values of
centerOpacity, corner0pacity and edgeOpacity (called the
current values), but also a parallel set of values called
targetCenter0pacity, targetCorner0pacity and targetEdgeOpacity
(called the target values). In this enhanced embodiment, the
44
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
current values are all set to 0.0 when a tile is first
rendered, but the the target values are all set to 1Ø Then,
after each frame, the current values are adjusted to new
values closer to the target values. This may be implemented
using a number of mathematical formulae, but as an example, it
can be done in the following way: newValue = oldValue*(1-b) +
targetValue*b, where b is a rate in greater than 0.0 and less
than 1Ø A value of b close to 0.0 will result in a very
slow transition toward the target value, and a value of b
close to 1.0 will result in a very rapid transition toward the
target value. This method of updating opacities results in
exponential convergence toward the target, and results in a
visually pleasing impression of temporal continuity. Other
formulae can achieve the same result.
FIG. 17 depicts successive stages of the rendering, in
foveated order, of the second layer of the LOD pyramid of FIG.
17, in accordance with one or more embodiments of the present
invention.
The foregoing describes the preferred embodiment of the
present invention. The invention is not limited to such
preferred embodiment, and various modifications consistent
with the appended claims are included within the invention as
well.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
A SYSTEM AND METHOD FOR MULTIPLE NODE DISPLAY
BACKGROUND OF THE INVENTION
The present invention relates to zooming user interfaces
(ZUI) for computers.
Most present day graphical computer user interfaces are
designed using visual components of a fixed spacial scale.
The visual content can be manipulated by zooming in or out or
otherwise navigating through it. However, the precision with
which coordinates of various objects can be represented is
extremely limited by the number of bits, usually between 16
and 64, designated to represent such coordinates. Because of
their limited representational size, there is limited
precision.
In the context of the zooming user interface, the user is
easily able to zoom in, causing the area which previously
covered only a single pixel to fill the entire display.
Conversely, the user may zoom out, causing the contents of the
entire display to shrink to the size of a single pixel. Since
each zoom in or out may multiply or divide the xy coordinates
by numerous orders of magnitude, just a few such zooms
completely exhaust the precision available with a 64 bit
floating point number, for example. Thereafter, round-off
causes noticeable degradation of image quality.
It is an object of the present invention to provide a ZUI
in which a larger range of zooms is possible.
It is a further object of the invention to provide a ZUI
in which the precision in which coordinates are represented is
related to the required precision needed at a particular zoom
level of detail. It is a further object of the present
invention to allow a pannable and zoomable two-dimensional
space of a finite physical size, but of an arbitrarily high
complexity or resolution, to be embedded into a well-defined
area of a larger pannable and zoomable two-dimensional space.
46
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
A further objective of the present invention is to allow
zooming out after a deep zoom-in to behave like the "back"
button of a web browser, letting the user retrace his or her
steps through a visual navigation.
A further objective of the present invention is to allow
zooming in immediately after zooming out to behave analogously
to the "forward" button of a web browser, letting the user
precisely undo the effects of an arbitrarily long zoom-out.
A further objective of the present invention is to allow
a node, a visual object as defined more precisely below, to
have a very large number of child nodes (for example, up to
10~28) .
A further objective of the present invention is to allow
a node to generate its own children programmatically on the
fly, enabling content to be defined, created or modified
dynamically during navigation.
A further objective of the present invention is to enable
near-immediate viewing of arbitrarily complex visual content,
even if this content is ultimately represented using a very
large amount of data, and even if the data are stored at a
remote location and shared over a low-bandwidth network.
A further objective of the present invention is to allow
the user to zoom arbitrarily far in on visual content while
maintaining interactive frame rates.
A further objective of the present invention is to allow
the user to zoom arbitrarily far out to get an overview of
complex visual content, in the process both preserving the
overall appearance of the content and maintaining interactive
frame rates.
These and other broader objectives of the present
invention will become apparent to those skilled in the art
from a review of the specification that follows.
SUN.MARY OF THE INVENTION
47
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
The above and other objects of the present invention are
accomplished by displaying visual content as plural "nodes."
Each node preferably has its own coordinate system and
rendering method, but may be contained within a parent node,
and may be represented in the coordinate system and rendering
method of the parent node. As a user navigates the visual
content, by for example, zooming in or out, a node is only
"launched" when the zooming results in an appropriate level of
detail. The launching of the node causes the node to be
represented in its own coordinate system and/or rendering
method, rather than in the coordinate system and/or rendering
method of a different node.
Prior to the node being launched, the node is either
represented in the coordinate system of the parent node, or
not represented at all. By launching nodes only when they are
required, the precision of a coordinate system is a function
of the zoom level of detail of what is being displayed. This
allows a variable level of precision, up to and including the
maximum permissible by the memory of the computer in which the
system operates.
DESCRIPTION OF THE DRAWINGS
For the purposes of illustration, there are forms shown
in the drawings that are presently preferred, it being
understood, however, that the invention is not limited to the
precise arrangements and instrumentalities shown.
FIG. 18 is a depiction of visual content on a display;
FIG. 19 is an image of the visual content of FIG. 18 at a
different level of detail;
FIG. 20 is a representation of an embodiment of the
invention;
FIG. 21 is an exemplary embodiment of the invention
showing plural nodes on a display;
FIG 22 is a tree diagram corresponding to the exemplary
48
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
embodiment shown in FIG 21.
FIG. 23 is a block diagram corresponding to a portion of
the tree diagram of FIG. 22.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
We assume a user interface metaphor in which the display
is a camera, through which the user can view part of a
two-dimensional surface, or 2D universe. For convenience,
although it is not necessary to do so, we ascribe physical
dimensions to this universe, so that it may be, for example,
one meter square. The invention is equally applicable to N-
dimensional representations.
The exemplary universe in turn contains 2D objects, or
nodes, which have a visual representation, and may also be
dynamic or interactive (i.e. video clips, applications,
editable text documents, CAD drawings, or still images) . For
a node to be visible it must be associated with a rendering
method, which is able to draw it in whole or in part on some
area of the display. Each node is also endowed with a local
coordinate system of finite precision. For illustrative
purposes, we assume a node is rectangular and represented by a
local coordinate system.
These two parameters, the rendering method and coordinate
system, specify how to display the node, and the positions of
items in the node. Each node may have 0 or more child nodes,
which are addressed by reference. The node need not, and
generally does not, contain all the information of each child
node, but instead only an address providing information
necessary to obtain the child node. As a user navigates, for
example, zooms in and out, the nodes are displayed on the
screen, as shown, for example in FIG. 18.
Generally, a "node" is the basic unit of functionality in
the present invention. Most nodes manifest visually on the
49
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
user's display during navigation, and some nodes may also be
animated and/or respond to user input. Nodes are
hierarchical, in that a node may contain child nodes. The
containing node is then called a parent node. When a parent
node contains a child node, the child's visual manifestation
is also contained within the parent's visual manifestation.
Each node has a logical coordinate system, such that the
entire extent of the node is contained within an exemplary
rectangle defined in this logical coordinate system; e.g. a
node may define a logical coordinate system such that it is
contained in the rectangle ( 0, 0)-(10 0, 10 0).
Each node may have the following data defining its
properties:
o the node's logical coordinate system, including its
logical size (100 x 100 in the above example);
o the identities, positions and sizes of any child
nodes, specified in the (parent) node's logical
coordinates;
0 optionally, any necessary user data;
- executable code defining these operations or "methods":
o initialization of the node's data based on
"construction arguments"
o rendering all or a portion of the node's visual
appearance (the output of this method is a rendered
tile) ;
0 optionally, responding to user input, such as
keyboard or mouse events.
The executable code defines a "node class", and may be
shared among many "node instances". Node instances differ in
their data content. Hence a node class might define the logic
needed to render a JPEG image. The "construction arguments"
given to the initialization code would then include the URL of
the JPEG image to display. A node displaying a particular
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
image would be an instance of the JPEG node class. Plural
instances of a node may be viewable in the s-ame visual
content, similar to the way a software application may be
instantiated numerous times simultaneously.
Note that in a complex visual document or application, it
is usually possible to divide the necessary functionality into
nodes in many different ways. For example, a scripted web-
page-like document containing multiple-images, pull-down menus
and buttons could be implemented as a single node with complex
rendering and user input methods. Alternatively, it could be
implemented as a parent node which only defines the overall
layout of the page, with every constituent image and button a
child node. This has the obvious advantage of reusing or
"factoring" the functionality more effectively: the buttons
may all have the same behavior, and hence all be instances of
the same node class; the images may all be in the same format
and so also be instances of a common node class, etc. This
also simplifies rearranging the layout- the parent node can
easily move or resize the child nodes.
In accordance with the present invention, visual content
may be displayed in a manner that depends upon the state of
navigation input by a user. For example, FIG. 18 shows a
node 105 which may be the image of a portion of the city.
Node 105 may contain child nodes 101-103. Node 101 may be an
image of a building in the city, node 107 could be an image of
a playground, and node 103 might be a sports arena. At the
level of zoom shown, nodes 101-103 are relatively small, so
they can be represented as a small darkened area with no
detail in node 105, located at the correct location in the
coordinate system of node 105. Only the coordinate system and
the rendering method of node 105 is needed.
Consider the case where the user now zooms in so that a
different level of detail (LOD) such as that shown in FIG. 19
results. In the LOD of FIG. 19, nodes 101 and 107 are no
51
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
longer visible on the screen, due to the fact that the visual
content is displayed as much larger. Additionally, it is
noted that the because the size at which sports arena node 103
is displayed is now much larger, the details of the sports
arena, such as the individual seats, the field, etc, now must
be displayed.
In furtherance of the foregoing, sports arena node 103
would now be displayed not as a darkened area with no detail
in the coordinate system of node 105, but rather, it would be
"launched" to be displayed using its own coordinate system and
rendering method. When displayed using its own coordinate
system and rendering method, the details such as seating, the
filed of play, etc. would be individually shown. Other
functions discussed above, and associated with the node 103,
would also begin executing at the point when node 103 is
launched. The particular navigation condition that causes
the launching of node 103, or any node for that matter, is a
function of design choice and is not critical to the present
invention.
The precision with which the node 103 will be displayed
is the combined precision of the coordinate system utilized by
node 105, as well as that of node 103. Thus, for example, if
the coordinate system of each of said nodes utilizes 8 bits,
then the combined precision will be 16 bits because the
coordinate system of node 103 is only utilized to specify the
position of items in node 103, but the overall location of
node 103 within node 105 is specified within the coordinate
system of node 105. Note that this nesting may continue
repeatedly if sports arena 103 itself contains additional
nodes within it. For example, one such node 201 may in fact
be a particular concession stand within the sports arena. it
is represented without much detail in the coordinate system
and rendering method of node 103. As a user continues
zooming in on sports arena 103, at some point node 201 will
52
CA 02599357 2007-08-27
WO 2006/105158
PCT/US2006/011405
launch. If it is displayed using 8 bits of precision, those
8 bits will specify where within the node 201 coordinate
system particular items are to be displayed. Yet, the
location of node 201 within node 103 will be maintained to 8
bits of precision within the coordinate system of node 103,
the location of which will in turn be maintained within the
coordinate system of node 105 using 8 bits. Hence, items
within node 201 will ultimately be displayed using 24 bits of
precision.
By nesting nodes within nodes, the precision at which
visual content may ultimately be displayed is limited only by
the memory capacity of the computer. The ultimate precision
with which visual content in a node is displayed after that
node is launched is effectively the combined precision of all
parent nodes and the precision of the node that has launched.
Hence, depending upon the level of nesting, the precision may
increase as needed limited only by the storage capacity of the
computer, which is almost always much more than sufficient.
Additionally, the increased precision is only utilized when
necessary, because if the image is at an LOD that does not
require launching, then in accordance with the above
description, it will only be displayed with the precision of
the node within which it is contained if that node has been
launched. Thus, for nodes nested within other nodes, as one
moves from the outermost node inward, one may traverse nodes
that have launched until finally reaching a node that has not
launched yet. Any such unlaunched node, and nodes further
within it, will be displayed only with the precision of the
last traversed node that has launched.
This results in an "accordion" type precision, wherein
the precision at which visual content is displayed expands and
contracts as necessary and as dictated by the navigational
input of the user, maximizing the efficiency of system
resources by using them only when necessary for higher
53
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
precision.
It is also noted, that when a node launches the display
of that node changes from being based upon the coordinates and
rendering method of the parent node to the coordinates and
rendering method of the child node. That change is optimally
made gradual through the use of blending, as described, for
example, in copending US patent application no. 10/790,253.
However, other methodologies of gradually changing from the
display of the information in the coordinate system and
rendering method the parent node to the child node are
possible. The system could be programmed, for example, that
over a particular range, the blending from parent to child
occurs. Then, as the user traverses through that range during
a zoom, the changeover occurs, unless the navigation is ceased
during that range, in which case the blending may continue
until fully displayed in the appropriate coordinate system.
An additional issue solved by the present invention
relates to a system for maintaining the spatial
interrelationship among all nodes during display. More
particularly, during dynamic navigation such as zooming and
panning, many different coordinate systems are being used to
display potentially different nodes. Some nodes, as explained
above, are being displayed merely as an image in the
coordinate system of other nodes, and some are being displayed
in their own coordinate systems. Indeed, the entire visual
display may be populated with nodes displayed at different
positions in different coordinate systems, and the coordinate
systems and precisions used for the various nodes may vary
during navigation as nodes are launched. Hence, it is
important to ensure that the nodes are properly located with
respect to each other, because each node is only knowledgeable
of its own coordinate system. The present invention provides
a technique for propagating relative location information
among all of the nodes and for updating that information when
54
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
needed so that each node will "know" the proper position in
the overall view at which it should render itself.
The foregoing may be accomplished with the addition of a
field to the node structure and an additional address stack
data structure. The expanded node definition includes a field
which we term the "view" field, and which is used by the node
to locate itself relative to the entire display. The view
field represents, in the coordinates of that node, the visible
area of the node-that is, the image of the display rectangle
in the node's coordinates. This rectangle may only partially
overlap the node's area, as when the node is partially off-
screen. Clearly the view field cannot always be kept updated
for every node, as we cannot necessarily traverse the entire
directed graph of nodes in real time as navigation occurs.
The stack structure is defined thus:
Stack<Address> view5tack;
where this stack is a global variable of the client
(the computer connected to the display). For exemplary
purposes we assume that navigation begins with an overview of
a universe of content, defined by a root node; then this root
node is pushed onto the viewStack, and the root node's view
field might be initialized to be the entire area of the root
node, i.e.
rootNode.view = rootNode.coordSystem;
Push(viewStack, rootNode);
Schematically, the viewStack will specify the addresses
of a sequence of nodes "pierced" by a point relative to the
display, which we will take in our exemplary implementation to
be the center of the display. This sequence must begin with
the root node, but may be infinite, and therefore must be
truncated. In an exemplary embodiment, the sequence is
truncated when the nodes "pierced" become smaller than some
minimum size, defined as minimumArea. The current view is
then represented by the view fields of all of the nodes in the
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
viewStack, each of which specify the current view in terms of
the node's local coordinate system. If a user has zoomed very
deeply into a universe, then the detailed location of the
display will be given most precisely by the view field of the
last node in the stack. The last element's view field does
not, however, specify the user's viewpoint relative to the
entire universe, but only relative to its local coordinates.
The view field of the root node, on the other hand, does
specify where in the universe the user is looking. Nodes
closer to the "fine end" of the viewStack thus specify the
view position with increasing precision, but relative to
progressively smaller areas in the universe. This is shown
conceptually in FIG. 20, where it can be seen that of the
three nodes 301, 302, and 303 that have been launched, node
303 provides the most accurate indication of where the user is
looking, since its coordinate system is the "finest", but node
301 provides information, albeit not as fine, on a much larger
area of the visual content.
The problem then reduces to the following: the views
(i.e. view fields) of all visible nodes must be kept
synchronized as the user navigates through the universe,
panning and zooming. Failure to keep them synchronized would
result in the appearance of nodes moving on the display
independently of each other, rather than behaving as a
cohesive and physically consistent 2D surface.
Changing the view during any navigation operation
proceeds as follows. Because the last node in the viewStack
has the most precise representation of the view, the first
step is to alter the view field of this last node; this
altered view is taken to be the correct new view, and any
other visible nodes must follow along. The second step is to
propagate the new view "upward" toward the root node, which
entails making progressively smaller and smaller changes to
the view fields of nodes earlier in the stack. If the user is
56
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
deeply zoomed, then at some point in the upward propagation
the alteration to the view may be so small that it ceases to
be accurately representable; upward propagation stops at this
node. At each stage of the upward propagation, the change is
also propagated downward to other visible nodes. Hence,
first, the last node's parent's view is modified; then, in the
downward propagation, the last node's "siblings". The next
upward propagation modified the grandparent's view, and the
second downward propagation modifies first uncles, then first
cousins. The downward propagation is halted, as before, when
the areas of "cousin nodes" become smaller than minimumArea,
or when a node falls entirely offscreen.
The foregoing technique involves translating the layout
of the various nodes into a tree, which conceptually is
illustrated in FIGs. 21 and 22. As can be seen from FIGs. 21
and 22, there is a corresponding tree for a particular
displayed set of nodes, and the tree structure may be used to
propagate the view information as previously described. Each
of the nodes 401 through 408 corresponds to a point on the
tree with a similar number shown in FIG. 22. Shown
conceptually in Figure 20 is the information that would be
stored with respect to a node, such as the pointer and the
information indicative of the size and location of another
node, for example. FIG. 23 is a block diagram corresponding
to a portion of the tree diagram of FIG. 22, in which
exemplary values are employed to illustrate the data that may
be used to define the properties of node 403. In this
exemplary case, node 403 includes a pointer to child node 406,
information about the logical size (100X100) of child node
406, and data specifying the location of child node 406 in the
coordinates of node 403: 10, 20.
A panning operation may move the last node far enough
away that it no longer belongs in the viewStack.
Alternatively, zooming in might enlarge a child to the extent
57
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
that a lengthening of the viewStack is required, or zooming
out might bring the last node's area below a minimum area
requiring a truncation of the viewStack. In all of these
cases the identity of the last node changes. These situations
are detected during the downward propagation, which may alter
the viewStack accordingly, potentially leaving it longer or
shorter.
One simple case of the foregoing is that during zooming,
a node gets launched so that now it needs to be placed in the
view stack. Another example is that by zooming out, a
previously visible node becomes so small that it must be
removed from the viewstack.
An extension of the idea is to avoid truncating the
viewStack immediately in response to a long outward zoom.
Truncating the viewStack is only necessary if the user then
pans. Although a long outward zoom will cause the view fields
of deeply zoomed nodes to grow very large (and therefore
numerically inaccurate), a field
Point2D viewCenter;
can be added to the Node structure, representing the
central point of the view rectangle; zooming without panning
therefore does not alter the viewCenter field of any node.
This construction allows zooming far outward to be followed
immediately by zooming back in. Because the viewStack has
been left intact, the user can then return to precisely the
starting view. This behavior is analogous to the "back" and
"forward" buttons of a web browser: "back" is analogous to
zooming out, and "forward" is analogous to zooming back in.
In a web browser, if a user uses "back" to return to a
previous web page, but then follows an alternate link, it is
at this point that "forward" ceases to work. Following an
alternate link is thus analogous to panning after zooming out.
The foregoing provides that visual content may be
displayed and navigated in a variety of fashions with
58
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
substantially infinite precision, limited only by the capacity
of the computer system on which the application is running.
The visual content displayed at any given time is then
displayed as an assembly of nodes, wherein only the nodes
needed for the particular view have been launched, and all
other nodes are displayed without launching as part of another
node or not at all. It is understood that various other
embodiments will be apparent to those of ordinary skill in the
art, and that the invention is not limited to the embodiments
described herein.
59
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
METHODS AND APPARATUS FOR NAVIGATING AN IMAGE
BACKGROUND OF THE INVENTION
The present invention relates to methods and apparatus for
navigating, such as zooming and panning, over an image of an
object in such a way as to provide the appearance of smooth,
continuous navigational movement.
Most conventional graphical computer user interfaces (GUIs)
are designed using visual components of fixed spatial scale, it
has long been recognized, however, that visual components may be
represented and manipulated such that they do not have a fixed
spatial scale on the display; indeed, the visual components may
be panned and/or zoomed in or out. The ability to zoom in and
out on an image is desirable in connection with, for example,
viewing maps, browsing through text layouts such as newspapers,
viewing digital photographs, viewing blueprints or diagrams, and
viewing other large data sets.
Many existing computer applications, such as Microsoft Word,
Adobe Photo Shop, Adobe Acrobat, etc., include zoomable
components. In general, the zooming capability provided by these
computer applications is a peripheral aspect of a user's
interaction with the software and the zooming feature is only
employed occasionally. These computer applications permit a user
to pan over an image smoothly and continuously (e.g., utilizing
scroll bars or the cursor to translate the viewed image left,
right, up or down). A significant problem with such computer
applications, however, is that they do not permit a user to zoom
smoothly and continuously. Indeed, they provide zooming in
discrete steps, such as 10 0, 25%, 50 0, 75%, 100%, 150%, 200%,
5000, etc. The user selects the desired zoom using the cursor
and, in response, the image changes abruptly to the selected zoom
level.
The undesirable qualities of discontinuous zooming also
exist in Internet-based computer applications. The computer
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
application underlying the www.mapquest.com website illustrates
this point. The MapQuest website permits a user to enter one or
more addresses and receive an image of a roadmap in response.
FIGS. 1-4 are examples of images that one may obtain from the
MapQuest website in response to a query for a regional map of
Long Island, NY, U.S.A. The MapQuest website permits the user to
zoom in and zoom out to discrete levels, such as 10 levels. FIG.
24 is a rendition at zoom level 5, which is approximately 100
meters/pixel. FIG. 25 is an image at a zoom level 6, which is
about 35 meters/pixel. FIG. 26 is an image at a zoom level 7,
which is about 20 meters/pixel. FIG. 27 is an image at a zoom
level 9, which is about 10 meters/pixel.
As can be seen by comparing FIGs. 1-4, the abrupt
transitions between zoom levels result in a sudden and abrupt
loss of detail when zooming out and a sudden and abrupt addition
of detail when zooming in. For example, no local, secondary or
connecting roads may be seen in FIG. 24 (at zoom level 5),
although secondary and connecting roads suddenly appear in FIG.
25, which is the very next zoom level. Such abrupt
discontinuities are very displeasing when utilizing the MapQuest
website. It is noted, however, that even if the MapQuest
software application were modified to permit a view of, for
example, local streets at zoom level 5 (FIG. 24), the results
would still be unsatisfactory. Although the visual density of
the map would change with the zoom level such that at some level
of zoom, the result might be pleasing (e.g., at level 7, FIG.
26), as one zoomed in the roads would not thicken, making the map
look overly sparse. As one zoomed out, the roads would
eventually run into each other, rapidly forming a solid nest in
which individual roads would be indistinguishable.
The ability to provide smooth, continuous zooming on images
of road maps is problematic because of the varying levels of
coarseness associated with the road categories. In the United
States, there are about five categories of roads (as categorized
under the Tiger/Line Data distributed by the U.S. Census Bureau):
61
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Al, primary highways; A2, primary roads; A3, state highways,
secondary roads, and connecting roads; A4, local streets, city
streets and rural roads; and A5, dirt roads. These roads may be
considered the elements of an overall object (i.e., a roadmap).
The coarseness of the road elements manifests because there are
considerably more A4 roads than A3 roads, there are considerably
more A3 roads than A2 roads, and there are considerably more A2
roads than Al roads. In addition, the physical dimensions of the
roads (e.g., their widths), vary significantly. Al roads may be
about 16 meters wide, A2 roads may be about 12 meters wide, A3
roads may be about 8 meters wide, A4 roads may be about 5 meters
wide, and A5 roads may be about 2.5 meters wide.
The MapQuest computer application deals with these varying
levels of coarseness by displaying only the road categories
deemed appropriate at a particular zoom,level. For example, a
nation-wide view might only show Al roads, while a state-wide
view might show Al and A2 roads, and a county-wide view might
show Al, A2 and A3 roads. Even if MapQuest were modified to
allow continuous zooming of the roadmap, this approach would lead
to the sudden appearance and disappearance of road categories
during zooming, which is confusing and visually displeasing.
In view of the foregoing, there are needs in the art for new
methods and apparatus for navigating images of complex objects,
which permit smooth and continuous zooming of the image while
also preserving visual distinctions between the elements of the
objects based on their size or importance.
SUMMARY OF THE INVENTION
In accordance with one or more aspects of the present
invention, methods and apparatus are contemplated to perform
various actions, including: zooming into or out of an image
having at least one object, wherein at least some elements of at
least one object are scaled up and/or down in a way that is
non-physically proportional to one or more zoom levels associated
with the zooming.
62
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
The non-physically proportional scaling may be expressed by
the following formula: p = c- d- za, where p is a linear size
in pixels of one or more elements of the object at the zoom
level, c is a constant, d is a linear size in physical units of
the one or more elements of the object, z is the zoom level in
units of physical linear size/pixel, and a is a scale power where
a 0 -1.
Under non-physical scaling, the scale power a is not equal
to -1 (typically -1 < a < 0) within a range of zoom levels zO and
zl, where zO is of a lower physical linear size/pixel than z1.
Preferably, at least one of zO and z1 may vary for one or more
elements of the object. It is noted that a, c and d may also
vary from element to element.
At least some elements of the at least one object may also
be scaled up and/or down in a way that is physically proportional
to one or more zoom levels associated with the zooming. The
physically proportional scaling may be expressed by the following
formula: p = c- d/z, where p is a linear size in pixels of one
or more elements of the object at the zoom level, c is a
constant, d is a linear size of the one or more elements of the
object in physical units, and z is the zoom level in units of
physical linear size/pixel.
It is noted that the methods and apparatus described thus
far and/or described later in this document may be achieved
utilizing any of the known technologies, such as standard digital
circuitry, analog circuitry, any of the known processors that are
operable to execute software and/or firmware programs,
programmable digital devices or systems, programmable array logic
devices, or any combination of the above. The invention may also
be embodied in a software program for storage in a suitable
storage medium and execution by a processing unit.
The elements of the object may be of varying degrees of
coarseness. For example, as discussed above, the coarseness of
the elements of a roadmap object manifests because there are
considerably more A4 roads than A3 roads, there are considerably
63
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
more A3 roads than A2 roads, and there are considerably more A2
roads than Al roads. Degree of coarseness in road categories
also manifests in such properties as average road length,
frequency of intersections, and maximum curvature. The
coarseness of the elements of other image objects may manifest in
other ways too numerous to list in their entirety. Thus, the
scaling of the elements in a given predetermined image may be
physically proportional or non-physically proportional based on
at least one of: (i) a degree of coarseness of such elements;
and (ii) the zoom level of the given predetermined image. For
example, the object may be a roadmap, the elements of the object
may be roads, and the varying degrees of coarseness may be road
hierarchies. Thus; the scaling of a given road in a given
predetermined image may be physically proportional or
non-physically proportional based on: (i) the road hierarchy of
the given road; and (ii) the zoom level of the given
predetermined image.
In accordance with one or more further aspects of the
present invention, methods and apparatus are contemplated to
perform various actions, including: receiving at a client
terminal a plurality of pre-rendered images of varying zoom
levels of a roadmap; receiving one or more user navigation
commands including zooming information at the client terminal;
and blending two or more of the pre-rendered images to obtain an
intermediate image of an intermediate zoom level that corresponds
with the zooming information of the navigation commands such that
a display of the intermediate image on the client terminal
provides the appearance of smooth navigation.
In accordance with one or more still further aspects of the
present invention, methods and apparatus are contemplated to
perform various actions, including: receiving at a client
terminal a plurality of pre-rendered images of varying zoom
levels of at least one object, at least some elements of the at
least one object being scaled up and/or down in order to produce
the plurality of pre-determined images, and the scaling being at
64
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
least one of: (i) physically proportional to the zoom level; and
(ii) non-physically proportional to the zoom level; receiving one
or more user navigation commands including zooming information at
the client terminal; blending two or more of the pre-rendered
images to obtain an intermediate image of an intermediate zoom
level that corresponds with the zooming information of the
navigation commands; and displaying the intermediate image on the
client terminal.
In accordance with one or more still further aspects of the
present invention, methods and apparatus are contemplated to
perform various actions, including: transmitting a plurality of
pre-rendered images of varying zoom levels of a roadmap to a
client terminal over a communications channel; receiving the
plurality of pre-rendered images at the client terminal; issuing
one or more user navigation commands including zooming
information using the client terminal; and blending two or more
of the pre-rendered images to obtain an intermediate image of an
intermediate zoom level that corresponds with the zooming
information of the navigation commands such that a display of the
intermediate image on the client terminal provides the appearance
of smooth navigation.
In accordance with one or more still further aspects of the
present invention, methods and apparatus are contemplated to
perform various actions, including: transmitting a plurality of
pre-rendered images of varying zoom levels of at least one object
to a client terminal over a communications channel, at least some
elements of the at least one object being scaled up and/or down
in order to produce the plurality of pre-determined images, and
the scaling being at least one of: (i) physically proportional
to the zoom level; and (ii) non-physically proportional to the
zoom level; receiving the plurality of pre-rendered images at the
client terminal; issuing one or more user navigation commands
including zooming information using the client terminal; blending
two of the pre-rendered images to obtain an intermediate image of
an intermediate zoom level that corresponds with the zooming
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
information of the navigation commands; and displaying the
intermediate image on the client terminal.
Other aspects, features, and advantages will become apparent
to one of ordinary skill in the art when the description herein
is taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the invention, forms are
shown in the drawing, it being understood, however, that the
invention is not limited to the precise arrangements and
instrumentalities shown.
FIG. 24 is an image taken from the MapQuest website, which
is at a zoom level 5;
FIG. 25 is an image taken from the MapQuest website, which
is at a zoom level 6;
FIG. 26 is an image taken from the MapQuest website, which
is at a zoom level 7;
FIG. 27 is an image taken from the MapQuest website, which
is at a zoom level 9;
FIG. 28 is an image of Long Island produced at a zoom level
of about 334 meters/pixel in accordance with one or more aspects
of the present invention;
FIG. 29 is an image of Long Island produced at a zoom level
of about 191 meters/pixel in accordance with one or more further
aspects of the present invention;
FIG. 30 is an image of Long Island produced at a zoom level
of about 109.2 meters/pixel in accordance with one or more
further aspects of the present invention;
FIG. 31 is an image of Long Island produced at a zoom level
of about 62.4 meters/pixel in accordance with one or more further
aspects of the present invention;
FIG. 32 is an image of Long Island produced at a zoom level
of about 35.7 meters/pixel in accordance with one or more further
aspects of the present invention;
FIG. 33 is an image of Long Island produced at a zoom level
66
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
of about 20.4 meters/pixel in accordance with one or more further
aspects of the present invention;
FIG. 34 is an image of Long Island produced at a zoom level
of about 11.7 meters/pixel in accordance with one or more further
aspects of the present invention;
FIG. 35 is a flow diagram illustrating process steps that
may be carried out in order to provide smooth and continuous
navigation of an image in accordance with one or more aspects of
the present invention;
FIG. 36 is a flow diagram illustrating further process steps
that may be carried out in order to smoothly navigate an image in
accordance with various aspects of the present invention;
FIG. 37 is a log-log graph of a line width in pixels versus
a zoom level in meters/pixel illustrating physical and
non-physical scaling in accordance with one or more further
aspects of the present invention; and
FIG. 38 is a log-log graph illustrating variations in the
physical and non-physical scaling of FIG. 37.
FIGS. 16A-D illustrate respective antialiased vertical lines
whose endpoints are precisely centered on pixel coordinates;
FIGS. 17A-C illustrate respective antialiased lines on a
slant, with endpoints not positioned to fall at exact pixel
coordinates; and
FIG. 41 is the log-log graph of line width versus zoom level
of FIG. 37 including horizontal lines indicating incremental line
widths, and vertical lines spaced such that the line width over
the interval between two adjacent vertical lines changes by no
more than two pixels.
DETAILED DESCRIPTION OF THE INVENTION
Referring now to the drawings, wherein like numerals
indicate like elements, there is shown in FIGS. 5-11 a series of
images representing the road system of Long Island, NY, U.S.A.
where each image is at a different zoom level (or resolution).
Before delving into the technical details of how the present
67
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
invention is implemented, these images will now be discussed in
connection with desirable resultant features of using the
invention, namely, at least the appearance of smooth and
continuous navigation, particularly zooming, while maintaining
informational integrity.
It is noted that the various aspects of the present
invention that will be discussed below may be applied in contexts
other than the navigation of a roadmap image. Indeed, the extent
of images and implementations for which the present invention may
be employed are too numerous to list in their entirety. For
example, the features of the present invention may be used to
navigate images of the human anatomy, complex topographies,
engineering diagrams such as wiring diagrams or blueprints, gene
ontologies, etc. It has been found, however, that the invention
has particular applicability to navigating images in which the
elements thereof are of varying levels of detail or coarseness.
Therefore, for the purposes of brevity and clarity, the various
aspects of the present invention will be discussed in connection
with a specific example, namely, images of a roadmap.
Although it is impossible to demonstrate the appearance of
smooth and continuous zooming in a patent document, this feature
has been demonstrated through experimentation and prototype
development by executing a suitable software program on a
Pentium-based computer. The image 100A of the roadmap
illustrated in FIG. 28 is at a zoom level that may be
characterized by units of physical length/pixel (or physical
linear size/pixel). In other words, the zoom level, z,
represents the actual physical linear size that a single pixel of
the image 100A represents. In FIG. 28, the zoom level is about
334 meters/pixel. Those skilled in the art will appreciate that
the zoom level may be expressed in other units without departing
from the spirit and scope of the claimed invention. FIG. 29 is
an image 100B of the same roadmap as FIG. 28, although the zoom
level, z, is about 191 meters/pixel.
In accordance with one or more aspects of the present
68
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
invention, a user of the software program embodying one or more
aspects of the invention may zoom in or out between the levels
illustrated in FIGS. 5 and 6. It is significant to note that
such zooming has the appearance of smooth and continuous
transitions from the 334 meters/pixel level (FIG. 28) to/from the
191 meters/pixel level (FIG. 29) and any levels therebetween.
Likewise, the user may zoom to other levels, such as z = 109.2
meters/pixel (FIG. 30), z 62.4 meters/pixel (FIG. 31), z = 35.7
meters/pixel (FIG. 32), z 20.4 meters/pixel (FIG. 33), and z
11.7 meters/pixel (FIG. 34). Again, the transitions through
these zoom levels and any levels therebetween advantageously have
the appearance of smooth and continuous movements.
Another significant feature of the present invention as
illustrated in FIGS. 5-11 is that little or no detail abruptly
appears or disappears when zooming from one level to another
level. The detail shown in FIG. 31 (at the zoom level of z = 62.4
meters/pixel) may also be found in FIG. 28 (at a zoom level of
z = 334 meters/pixel). This is so even though the image object,
in this case the roadmap, includes elements (i.e., roads) of
varying degrees of coarseness. Indeed, the roadmap 100D of FIG.
31 includes at least Al highways such as 102, A3 secondary roads
such as 104, and A4 local roads such as 106. Yet these details,
even the A4 local roads 106, may still be seen in image 100A of
FIG. 28, which is substantially zoomed out in comparison with the
image 100D of FIG. 31.
Still further, despite that the A4 local roads 106 may be
seen at the zoom level of z = 334 meters/pixel (FIG. 28) the Al,
A2, A3, and A4 roads may be distinguished from one another. Even
differences between Al primary highways 102 and A2 primary roads
108 may be distinguished from one another vis-a-vis the relative
weight given to such roads in the rendered image 100A.
The ability to distinguish among the road hierarchies is
also advantageously maintained when the user continues to zoom
in, for example, to the zoom level of z = 20.4 meters/pixel as
illustrated in image 100F of FIG. 33. Although the weight of the
69
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Al primary highway 102 significantly increases as compared with
the zoom level of z = 62.4 meters/pixel in FIG. 31, it does not
increase to such an extent as to obliterate other detail, such as
the A4 local roads 106 or even the A5 dirt roads. Nevertheless,
the weights of the roads at lower hierarchical levels, such as A4
local roads 106 significantly increase in weight as compared with
their counterparts at the zoom level z= 62.4 meters/pixel in
FIG. 31.
Thus, even though the dynamic range of zoom levels between
that illustrated in FIG. 28 and that illustrated in FIG. 34 is
substantial and detail remains substantially consistent (i.e., no
roads suddenly appear or disappear while smoothly zooming), the
information that the user seeks to obtain at a given zooming
level is not obscured by undesirable artifacts. For example, at
the zoom level of z = 334 meters/pixel (FIG. 28), the user may
wish to gain a general sense of what primary highways exist and
in what directions they extend. This information may readily be
obtained even though the A4 local roads 106 are also depicted.
At the zoom level of z = 62.4 meters/pixel (FIG. 31), the user
may wish to determine whether a particular Al primary highway 102
or A2 primary road 108 services a particular city or
neighborhood. Again, the user may obtain this information
without interference from other much more detailed information,
such as the existence and extent of A4 local roads 106 or even A5
dirt roads. Finally, at the zoom level of z= 11.7 meters/pixel,
a user may be interested in finding a particular A4 local road
such as 112, and may do so without interference by significantly
larger roads such as the Al primary highway 102.
In order to achieve one or more of the various aspects of
the present invention discussed above, it is contemplated that
one or more computing devices execute one or more software
programs that cause the computing devices to carry out
appropriate actions. In this regard, reference is now made to
FIGS. 12-13, which are flow diagrams illustrating process steps
that are preferably carried out by the one or more computing
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
devices and/or related equipment.
While it is preferred that the process flow is carried out
by commercially available computing equipment (such as
Pentium-based computers), any of a number of other techniques may
be employed to carry out the process steps without departing from
the spirit and scope of the present invention as claimed. Indeed,
the hardware employed may be implemented utilizing any other
known or hereinafter developed technologies, such as standard
digital circuitry, analog circuitry, any of the known processors
that are operable to execute software and/or firmware programs,
one or more programmable digital devices or systems, such as
programmable read only memories (PROMs), programmable array logic
devices (PALs), any combination of the above, etc. Further, the
methods of the present invention may be embodied in a software
program that may be stored on any of the known or hereinafter
developed media.
FIG. 35 illustrates an embodiment of the invention in which
a plurality of images are prepared (each at a different zoom
level or resolution), action 200, and two or more of the images
are blended together to achieve the appearance of smooth
navigation, such as zooming (action 206). Although not required
to practice the invention, it is contemplated that the approach
illustrated in FIG. 35 be employed in connection with a service
provider - client relationship. For example, a service provider
would expend the resources to prepare a plurality of pre-rendered
images (action 200) and make the images available to a user's
client terminal over a communications channel, such as the
Internet (action 202). Alternatively, the pre-rendered images
may be an integral or related part of an application program that
the user loads and executes on his or her computer.
It has been found through experimentation that, when the
blending approach is used, a set of images at the following zoom
levels work well when the image object is a roadmap: 30
meters/pixel, 50 meters/pixel, 75 meters/pixel, 100 meters/pixel,
200 meters/pixel, 300 meters/pixel, 500 meters/pixel, 1000
71
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
meters/pixel, and 3000 meters/pixel. It is noted, however, that
any number of images may be employed at any number of resolutions
without departing from the scope of the invention. Indeed, other
image objects in other contexts may be best served by a larger or
smaller number of images, where the specific zoom levels are
different from the example above.
Irrespective of how the images are obtained by the client
terminal, in response to user-initiated navigation commands
(action 204), such as zooming commands, the client terminal is
preferably operable to blend two or more images in order to
produce an intermediate resolution image that coincides with the
navigation command (action 206). This blending may be
accomplished by a number of methods, such as the well-known
trilinear interpolation technique described by Lance Williams,
Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH '83,
17(3):1-11 (1983), the entire disclosure of which is incorporated
herein by reference. Other approaches to image interpolation are
also useful in connection with the present invention, such as
bicubic-linear interpolation, and still others may be developed
in the future. It is noted that the present invention does not
require or depend on any particular one of these blending
methods. For example, as shown in FIG. 31, the user may wish to
navigate to a zoom level of 62.4 meters/pixel. As this zoom
level may be between two of the pre-rendered images (e.g., in
this example between zoom level 50 meters/pixel and zoom level 75
meters/pixel), the desired zoom level of 62.4 meters/pixel may be
achieved using the trilinear interpolation technique. Further,
any zoom level between 50 meters/pixel and 75 meters/pixel may be
obtained utilizing a blending method as described above, which if
performed quickly enough provides the appearance of smooth and
continuous navigation. The blending technique may be carried
through to other zoom levels, such as the 35.7 meters/pixel level
illustrated in FIG. 32. In such case, the blending technique may
be performed as between the pre-rendered images of 30
meters/pixel and 50 meters/pixel of the example discussed thus
72
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
far.
The above blending approach may be used when the computing
power of the processing unit on which the invention is carried
out is not high enough to (i) perform the rendering operation in
the first instance, and/or (ii) perform image rendering
"just-in-time" or "on the fly" (for example, in real time) to
achieve a high image frame rate for smooth navigation. As will
be discussed below, however, other embodiments of the invention
contemplate use of known, or hereinafter developed, high power
processing units that are capable of rendering at the client
terminal for blending and/or high frame rate applications.
The process flow of FIG. 36 illustrates the detailed steps
and/or actions that are preferably conducted to prepare one or
more images in accordance with the present invention. At action
220, the information is obtained regarding the image object or
objects using any of the known or hereinafter developed
techniques. Usually, such image objects have been modeled using
appropriate primitives, such as polygons, lines, points, etc.
For example, when the image objects are roadmaps, models of the
roads in any Universal Transverse Mercator (UTM) zone may readily
be obtained. The model is usually in the form of a list of line
segments (in any coordinate system) that comprise the roads in
the zone. The list may be converted into an image in the spatial
domain (a pixel image) using any of the known or hereinafter
developed rendering processes so long as it incorporates certain
techniques for determining the weight (e.g., apparent or real
thickness) of a given primitive in the pixel (spatial) domain.
In keeping with the roadmap example above, the rendering
processes should incorporate certain techniques for determining
the weight of the lines that model the roads of the roadmap in
the spatial domain. These techniques will be discussed below.
At action 222 (FIG. 36), the elements of the object are
classified. In the case of a roadmap object, the classification
may take the form of recognizing already existing categories,
namely, Al, A2, A3, A4, and A5. Indeed, these road elements have
73
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
varying degrees of coarseness and, as will be discussed below,
may be rendered differently based on this classification. At
action 224, mathematical scaling is applied to the different road
elements based on the zoom level. As will be discussed in more
detail below, the mathematical scaling may also vary based on the
element classification.
By way of background, there are two conventional techniques
for rendering image elements such as the roads of a map: actual
physical scaling, and pre-set pixel width. The actual physical
scaling technique dictates that the roadmap is rendered as if
viewing an actual physical image of the roads at different
scales. Al highways, for example, might be 16 meters wide, A2
roads might be 12 meters wide, A3 roads might be 8 meters wide,
A4 roads might be 5 meters wide, and A5 roads might be 2.5 meters
wide. Although this might be acceptable to the viewer when
zoomed in on a small area of the map, as one zooms out, all
roads, both major and minor, become too thin to make out. At
some zoom level, say at the state level (e.g., about 200
meters/pixel), no roads would be seen at all.
The pre-set pixel width approach dictates that every road is
a certain pixel width, such as one pixel in width on the display.
Major roads, such as highways, may be emphasized by making them
two pixels wide, etc. Unfortunately this approach makes the
visual density of the map change as one zooms in and out. At
some level of zoom, the result might be pleasing, e.g., at a
small-size county level. As one zooms in, however, roads would
not thicken, making the map look overly sparse. Further, as one
zooms out, roads would run into each other, rapidly forming a
solid nest in which individual roads would be indistinguishable.
In accordance with one or more aspects of the present
invention, at action 224, the images are produced in such a way
that at least some image elements are scaled up and/or down
either (i) physically proportional to the zoom level; or (ii)
non-physically proportional to the zoom level, depending on
parameters that will be discussed in more detail below.
74
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
It is noted that the scaling being "physically proportional
to the zoom level" means that the number of pixels representing
the road width increases or decreases with the zoom level as the
size of an element would appear to change with its distance from
the human eye. The perspective formula, giving the apparent
length y of an object of physical size d, is:
y = c = d/x,
where c is a constant determining the angular perspective
and x is the distance of the object from the viewer.
In the present invention, the linear size of an object of
physical linear size d' in display pixels p is given by
p = d, , za
f
where z is the zoom level in units of physical linear
size/pixel (e.g. meters/pixel), and a is a power law. When a =
-1 and d' = d (the real physical linear size of the object), this
equation is dimensionally correct and becomes equivalent to the
perspective formula, with p = y and z= x/c. This expresses the
equivalence between physical zooming and perspective
transformation: zooming in is equivalent to moving an object
closer to the viewer, and zooming out is equivalent to moving the
object farther away.
To implement non-physical scaling, a may be set to a power
law other than -1, and d' may be set to a physical linear size
other than the actual physical linear size d. In the context of
a road map, where p may represent the displayed width of a road
in pixels and d' may represent an imputed width in physical
units, "non-physically proportional to the zoom level" means that
the road width in display pixels increases or decreases with the
zoom level in a way other than being physically proportional to
the zoom level, i.e. a# -1. The scaling is distorted in a way
that achieves certain desirable results.
It is noted that "linear size" means one-dimensional size.
For example, if one considers any 2 dimensional object and
doubles its "linear size" then one multiplies the area by 4= 22.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In the two dimensional case, the linear sizes of the elements of
an object may involve length, width, radius, diameter, and/or any
other measurement that one can read off with a ruler on the
Euclidean plane. The thickness of a line, the length of a line,
the diameter of a circle or disc, the length of one side of a
polygon, and the distance between two points are all examples of
linear sizes. In this sense the "linear size" in two dimensions
is the distance between two identified points of an object on a
2D Euclidean plane. For example, the linear size can be
calculated by taking the square root of (dxz + dy2), where dx =
xl - xO, dy = yl - yO, and the two identified points are given by
the Cartesian coordinates (xO, yO) and (xl, y1) .
The concept of "linear size" extends naturally to more than
two dimensions; for example, if one considers a volumetric
object, then doubling its linear size involves increasing the
volume by 8 = 23. Similar measurements of linear size can also
be defined for non-Euclidean spaces, such as the surface of a
sphere.
Any power law a < 0 will cause the rendered size of an
element to decrease as one zooms out, and increase as one zooms
in. When a < -1, the rendered size of the element will decrease
faster than it would with proportional physical scaling as one
zooms out. Conversely, when -1 < a < 0, the size of the rendered
element decreases more slowly than it would with proportional
physical scaling as one zooms out.
In accordance with at least one aspect of the invention,
p(z), for a given length of a given object, is permitted to be
substantially continuous so that during navigation the user does
not experience a sudden jump or discontinuity in the size of an
element of the image (as opposed to the conventional approaches
that permit the most extreme discontinuity - a sudden appearance
or disappearance of an element during navigation). In addition,
it is preferred that p(z) monotonically decrease with zooming out
such that zooming out causes the elements of the object become
smaller (e.g., roads to become thinner), and such that zooming in
76
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
causes the elements of the object become larger. This gives the
user a sense of physicality about the object(s) of the image.
The scaling features discussed above may be more fully
understood with reference to FIG. 37, which is a log-log graph of
a rendered line width in pixels for an Al highway versus the zoom
level in meters/pixel. (Plotting log(z) on the x-axis and log(p)
on the y-axis is convenient because the plots become straight
lines due to the relationship log(xa) = a=log(x)). The basic
characteristics of the line (road) width versus zoom level plot
are:
(i) that the scaling of the road widths may be physically
proportional to the zoom level when zoomed in (e.g., up to
about 0.5 meters/pixel);
(ii) that the scaling of the road widths may be
non-physically proportional to the zoom level when zoomed
out (e.g., above about 0.5 meters/pixel); and
(iii) that the scaling of the road widths may be physically
proportional to the zoom level when zoomed further out
(e.g., above about 50 meters/pixel or higher depending on
parameters which will be discussed in more detail below).
As for the zone in which the scaling of the road widths is
physically proportional to the zoom level, the scaling formula of
p = d' = za, is employed where a=-1. In this example, a
reasonable value for the physical width of an actual Al highway
is about d' = 16 meters. Thus, the rendered width of the line
representing the Al highway monotonically decreases with physical
scaling as one zooms out at least up to a certain zoom level zO,
say zO = 0.5 meters/pixel.
The zoom level for zO = 0.5 is chosen to be an inner scale
below which physical scaling is applied. This avoids a
non-physical appearance when the roadmap is combined with other
fine-scale GIS content with real physical dimensions. In this
example, zO = 0.5 meters/pixel, or 2 pixels/meter, which when
expressed as a map scale on a 15 inch display (with 1600x1200
77
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
pixel resolution) corresponds to a scale of about 1:2600. At d
16 meters, which is a reasonable real physical width for Al
roads, the rendered road will appear to be its actual size when
one is zoomed in (0.5 meters/pixel or less). At a zoom level of
0.1 meters/pixel, the rendered line is about 160 pixels wide. At
a zoom level of 0.5 meters/pixel, the rendered line is 32 pixels
wide.
As for the zone in which the scaling of the road widths is
non-physically proportional to the zoom level, the scaling
formula of p= d' = za, is employed where -1 < a < 0 (within a
range of zoom levels zO and zl). In this example, the
non-physical scaling is performed between about zO=0.5
meters/pixel and z1=3300 meters/pixel. Again, when -1 < a < 0,
the width of the rendered road decreases more slowly than it
would with proportional physical scaling as one zooms out.
Advantageously, this permits the Al road to remain visible (and
distinguishable from other smaller roads) as one zooms out. For
example, as shown in FIG. 28, the Al road 102 remains visible and
distinguishable from other roads at the zoom level of z = 334
meters/pixel. Assuming that the physical width of the Al road is
d' = d = 16 meters, the width of the rendered line using physical
scaling would have been about 0.005 pixels at a zoom level of
about 3300 meters/pixel, rendering it virtually invisible. Using
non-physical scaling, however, where -1 < a < 0 (in this example,
a is about -0.473), the width of the rendered line is about 0.8
pixels at a zoom level of 3300 meters/pixel, rendering it clearly
visible.
It is noted that the value for zl is chosen to be the most
zoomed-out scale at which a given road still has "greater than
physical" importance. By way of example, if the entire U.S. is
rendered on a 1600x1200 pixel display, the resolution would be
approximately 3300 meters/pixel or 3.3 kilometers/pixel. If one
looks at the entire world, then there may be no reason for U.S.
highways to assume enhanced importance relative to the view of
78
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
the country alone.
Thus, at zoom levels above z1, which in the example above is
about 3300 meters/pixel, the scaling of the road widths is again
physically proportional to the zoom level, but preferably with a
large d' (much greater than the real width d) for continuity of
p(z). In this zone, the scaling formula of p= d' = za is
employed where a = -1. In order for the rendered road width to
be continuous at z1 = 3300 meters/pixel, a new imputed physical
width of the Al highway is chosen, for example, d' = 1.65
kilometers. z1 and the new value for d' are preferably chosen in
such a way that, at the outer scale z1, the rendered width of the
line will be a reasonable number of pixels. In this case, at a
zoom level in which the entire nation may be seen on the display
(about 3300 meters/pixel), Al roads may be about pixel wide,
which is thin but still clearly visible; this corresponds to an
imputed physical road width of 1650 meters, or 1.65 kilometers.
The above suggests a specific set of equations for the
rendered line width as a function of the zoom level:
p(z) = dO = z-1, if z -< z0
p(z) = dl = za, if z0 < z < zl,
p(z) = d2 = z-1, if z>- z1.
The above form of p(z) has six parameters: zO, z1, dO, dl,
d2 and a. zO and z1 mark the scales at which the behavior of
p(z) changes. In the zoomed-in zone (z < z0), zooming is
physical (i.e., the exponent of z is -1), with a physical width
of dO, which preferably corresponds to the real physical width d.
In the zoomed-out zone (z - z1), zooming is again physical, but
with a physical width of dl, which in general does not correspond
to d. Between zO and zl, the rendered line width scales with a
power law of a, which can be a value other than -1. Given the
preference that p(z) is continuous, specifying zO, z1, dO and d2
is sufficient to uniquely determine dl and a, which is clearly
79
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
shown in FIG. 37.
The approach discussed above with respect to Al roads may be
applied to the other road elements of the roadmap object. An
example of applying these scaling techniques to the Al, A2, A3,
A4, and A5 roads is illustrated in the log-log graph of FIG. 38.
In this example, zO = 0.5 meters/pixel for all roads, although
it may vary from element to element depending on the context. As
A2 roads are generally somewhat smaller that Al roads, dO = 12
meters. Further, A2 roads are "important," e.g., on the U.S.
state level, so z1 = 312 meters/pixel, which is approximately the
rendering resolution for a single state (about 1/10 of the
country in linear scale). At this scale, it has been found that
line widths of one pixel are desirable, so d2 = 312 meters is a
reasonable setting.
Using the general approach outlined above for Al and A2
roads, the parameters of the remaining elements of the roadmap
object may be established. A3 roads: dO = 8 meters, zO = 0.5
meters/pixel, z1 = 50 meters/pixel, and d2 = 100 meters. A4
streets: dO = 5 meters, zO = 0.5 meters/pixel, z1 = 20
meters/pixel, and d2 = 20 meters. And A5 dirt roads: dO = 2.5
meters, zO = 0.5 meters/pixel, zl = 20 meters/pixel, and
d2 = 20m. It is noted that using these parameter settings, A5
dirt roads look more and more like streets at zoomed-out zoom
levels, while their physical scale when zoomed in is half as
wide.
The log-log plot of FIG. 38 summarizes the scaling behaviors
for the road types. It is noted that at every scale the apparent
width of A1>A2>A3>A4>=A5. Note also that, with the exception of
dirt roads, the power laws all come out in the neighborhood of
a = -0.41. The dotted lines all have a slope of -1 and represent
physical scaling at different physical widths. From the top
down, the corresponding physical widths of these dotted lines
are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16
meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
When interpolation between a plurality of pre-rendered
images is used, it is possible in many cases to ensure that the
resulting interpolation is humanly indistinguishable or nearly
indistinguishable from an ideal rendition of all lines or other
primitive geometric elements at their correct pixel widths as
determined by the physical and non-physical scaling equations.
To appreciate this alternative embodiment of the current
invention, some background on antialiased line drawing will be
presented below.
The discussion of antialiased line drawing will be presented
in keeping with the roadmap example discussed at length above, in
which all primitive elements are lines, and the line width is
subject to the scaling equations as described previously. With
reference to FIG. 39A, a one pixel wide vertical line drawn in
black on white background, such that the horizontal position of
the line is aligned exactly to the pixel grid, consists simply of
a 1-pixel-wide column of black pixels on a white background. In
accordance with various aspects of the present invention, it is
desirable to consider and accommodate the case where the line
width is a non-integral number of pixels. With reference to FIG.
39B, if the endpoints of a line remain fixed, but the weight of
the line is increased to be 1.5 pixels wide, then on an anti-
aliased graphics display, the columns of pixels to the left and
right of the central column are drawn at 25% grey. With
reference to FIG. 39C, at 2 pixels wide, these flanking columns
are drawn at 50% grey. With reference to FIG. 39D, at 3 pixels
wide, the flanking columns are drawn at 100% black, and the
result is three solid black columns as expected.
This approach to drawing lines of non-integer width on a
pixellated display results in a sense (or illusion) of visual
continuity as line width changes, allowing lines of different
widths to be clearly distinguished even if they differ in width
only by a fraction of a pixel. In general, this approach, known
as antialiased line drawing, is designed to ensure that the line
integral of the intensity function (or "1-intensity" function,
81
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
for black lines on a white background) over a perpendicular to
the line drawn is equal, to the line width. This method
generalizes readily to lines whose endpoints do not lie precisely
in the centers of pixels, to lines which are in other
orientations than vertical, and to curves.
Note that drawing the antialiased vertical lines of FIGS.
16A-D could also be accomplished by alpha-blending two images,
one (image A) in which the line is 1 pixel wide, and the other
(image B) in which the line is 3 pixels wide. Alpha blending
assigns to each pixel on the display (1-alpha)*(corresponding
pixel in image A) + alpha*(corresponding pixel in image B). As
alpha is varied between zero and one, the effective width of the
rendered line varies smoothly between one and three pixels. This
alpha-blending approach only produces good visual results in the
most general case if the difference between the two rendered line
widths in images A and B is one pixel or less; otherwise, lines
may appear haloed at intermediate widths. This same approach can
be applied to rendering points, polygons, and many other
primitive graphical elements at different linear sizes.
Turning again to FIGS. 16A-D, the 1.5 pixel-wide line (FIG.
39B) and the 2 pixel-wide line (FIG. 39C) can be constructed by
alpha-blending between the 1 pixel wide line (FIG. 39A) and the 3
pixel wide line (FIG. 39D). With reference to FIGS. 17A-C, a 1
pixel wide line (FIG. 40A), a 2 pixel wide line (FIG. 40B) and a
3 pixel wide line (FIG. 40C) are illustrated in an arbitrary
orientation. The same principle applies to the arbitrary
orientation of FIGS. 17A-C as to the case where the lines are
aligned exactly to the pixel grid, although the spacing of the
line widths between which to alpha-blend may need to be finer
than two pixels for good results.
In the context of the present map example, a set of images
of different resolutions can be selected for pre-rendition with
reference to the log-log plots of FIGS. 14-15. For example,
reference is now made to FIG. 41, which is substantially similar
to FIG. 37 except that FIG. 41 includes a set of horizontal lines
82
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
and vertical lines. The horizontal lines indicate line widths
between 1 and 10 pixels, in increments of one pixel. The
vertical lines are spaced such that line width over the interval
between two adjacent vertical lines changes by no more than two
pixels. Thus, the vertical lines represent a set of zoom values
suitable for pre-rendition, wherein alpha-blending between two
adjacent such pre-rendered images will produce characteristics
nearly equivalent to rendering the lines representing roads at
continuously variable widths.
Interpolation between the six resolutions represented by the
vertical lines shown in FIG. 41 is sufficient to render the Al
highways accurately using the scaling curve shown at about nine
meters/pixel and above. Rendition below about nine meters/pixel
does not require pre-rendition, as such views are very zoomed-in
and thus show very few roads, making it more computationally
efficient (and more efficient with respect to data storage
requirements) to render them vectorially than to interpolate
between pre-rendered images. At resolutions of more than about
1000 meters/pixel (such views encompass large fractions of the
Earth's surface), the final pre-rendered image alone can be used,
as it is a rendition using 1 pixel wide lines. Lines that are
thinner than a single pixel render the same pixels more faintly.
Hence, to produce an image in which the Al lines are 0.5 pixels
wide, the 1 pixel wide line image can be multiplied by an alpha
of 0.5.
In practice, a somewhat larger set of resolutions are pre-
rendered, such that over each interval between resolutions, none
of the scaling curves of FIG. 38 varies by more than one pixel.
Reducing the allowed variation to one pixel can result in
improved rendering quality. Notably, the tiling techniques
contemplated and discussed in the following co-pending
application may be considered in connection with the present
invention: U.S. Patent Application No. 10/790,253, entitled
SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER
INTERFACE, filed March 1, 2004, Attorney Docket No. 489/2, the
83
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
entire disclosure of which is hereby incorporated by reference.
This tiling technique may be employed for resolving an image at a
particular zoom level, even if that level does not coincide with
a pre-rendered image. If each image in the somewhat larger set
of resolutions is pre-rendered at the appropriate resolution and
tiled, then the result is a complete system for zooming and
panning navigation through a roadmap of arbitrary complexity,
such that all lines appear to vary in width continuously in
accordance with the scaling equations disclosed herein.
Additional details concerning other techniques for blending
images, which may be employed in connection with implementing the
present invention, may be found in U.S. Provisional Patent
Application No. 60/475,897, entitled SYSTEM AND METHOD FOR THE
EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTI RESOLUTIONAL
VISUAL DATA, filed June 5, 2003, the entire disclosure of which
is hereby incorporated by reference. Still further details
concerning blending techniques that may be employed in connection
with implementing the present invention may be found in U.S.
Provisional Patent Application Serial No. 60/453,897, filed March
12, 2003, entitled SYSTEM AND METHOD FOR FOVEATED, SEAMLESS,
PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE, the entire
disclosure of which is hereby incorporated by reference.
Advantageously, employing the above-discussed aspects of the
present invention, the user enjoys the appearance of smooth and
continuous navigation through the various zoom levels. Further,
little or no detail abruptly appears or disappears when zooming
from one level to another. This represents a significant
advancement over the state of the art.
It is contemplated that the various aspects of the present
invention may be applied in numerous products, such as
interactive software applications over the Internet,
automobile-based software applications and the like. For
example, the present invention may be employed by an Internet
website that provides maps and driving directions to client
terminals in response to user requests. Alternatively, various
84
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
aspects of the invention may be employed in a GPS navigation
system in an automobile. The invention may also be incorporated
into medical imaging equipment, whereby detailed information
,concerning, for example, a patient's circulatory system, nervous
system, etc. may be rendered and navigated as discussed
hereinabove. The applications of the invention are too numerous
to list in their entirety, yet a skilled artisan will recognize
that they are contemplated herein and fall within the scope of
the invention as claimed.
The present invention may also be utilized in connection
with other applications in which the rendered images provide a
means for advertising and otherwise advancing commerce.
Additional details concerning these aspects and uses of the
present invention may be found in U.S. Provisional Patent
Application No. 60/553,803, entitled METHODS AND APPARATUS FOR
EMPLOYING MAPPING TECHNIQUES TO ADVANCE COMMERCE, filed on even
date herewith, Attorney Docket No. 489/7, the entire disclosure
of which is hereby incorporated by reference.
Although the invention herein has been described with
reference to particular embodiments, it is to be understood that
these embodiments are merely illustrative of the principles and
applications of the present invention. It is therefore to be
understood that numerous modifications may be made to the
illustrative embodiments and that other arrangements may be
-devised without departing from the spirit and scope of the
present invention as defined by the appended claims.
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
EFFICIENT DATA CACHE
BACKGROUND
"MRU caching," where MRU stands for "most recently used,"
is a known concept for implementing a client-side memory in a
client-server system. It is assumed that the server has
access to and can serve to a client a large number of data
objects, which in the aggregate may occupy a large amount of
memory. The available bandwidth between client and server is
limited, however, so client requests for data objects to be
sent from the server take time. If access to data objects is
reasonably "coherent," meaning that objects which the client
needed recently are likely to be needed again in the near
future, then MRU caching may increase the efficiency of the
client-server system. Employing this approach, the client
generally sets aside some limited amount of memory (generally
much less than would be needed to store all of the objects on
the server), and stores in this memory (a cache) as many of
the most recently requested objects as will fit. When a new
object is sent from the server to the client and the client's
cache space has run out, the least recently used (LRU) object
is erased from the cache to create data storage space in which
the new object may be stored.
Generally, when the client needs a data object, the cache
is first examined to see if the object is cached. If it is
cached, then the cached representation is used, obviating the
need for a slow or computationally expensive server request.
Usually, making use of a cached representation also "promotes"
that object to the MRU end of the cache. This approach
generally provides substantial performance advantages over
having to request data from the server for every data object
accessed.
86
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
The erasure of the least recently used object from a
cache when a new object is accessed by a computing system and
stored in the cache may cause inefficiency in cache usage.
Even the erased, least-recently-used object in the cache may
be again requested by the server. When this happens, the
server may undertake the relatively slow or computationally
expensive task of retrieving this object from a more remote
source of data storage, such as a main memory or mass storage
device. Given the finite size of cache memories, object
erasure may occur with some frequency, thereby causing a
server or other computing system to expend significant
resources accessing more remote memories to get data that was
once conveniently stored in a cache memory. Accordingly,
there is a need in the art for a more efficient and flexible
approach to cache memory management.
SUMMARY OF THE INVENTION
According to one aspect, the invention may provide a
method, comprising providing a cache in a computing system
having an initial group of cache objects, each cache object
having an initial compression ratio and including stored data;
decreasing an amount of data storage space occupied by at
least one of the cache objects other than a given one of the
cache objects; and increasing an amount of data storage space
occupied by the given cache object. Preferably, the
decreasing comprises decreasing the amount of data storage
space by a given amount. Preferably, the increasing comprises
increasing the amount of data storage space occupied by the
given cache object by the given amount. Preferably, the
decreasing comprises increasing the initial compression ratio
of the at least one cache object. Preferably, the increasing
87
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
comprises decreasing the initial compression ratio of the
given cache object. Preferably, the given cache object is a
most recently used cache object of the cache objects.
Preferably, the at least one cache object undergoing the
decreasing step comprises a least recently used cache object
of the cache objects.
Preferably, the decreasing comprises removing a portion
of the stored data for the at least one cache object.
Preferably, the increasing comprises supplementing the stored
data for the given cache object. Preferably, an amount of
data storage space available for each of the cache objects may
equal one of a finite number of discrete values. Preferably,
the decreasing comprises reducing the amount of data storage
space for at least one randomly selected cache object of the
cache objects, other than the given cache object. Preferably,
the reducing comprises reducing the amount of data storage
space for the at least one randomly selected cache object to a
randomly determined extent. Preferably, the randomly selected
cache object is selected using one of a random method and
pseudorandom method. Preferably, the selection of the
randomly selected cache object is guided by a heuristic.
Preferably, the method further comprises storing the
given cache object in a losslessly compressed form after the
increasing. Preferably, the method further comprises storing
the given cache object in uncompressed form after the
increasing. Preferably, the decreasing comprises removing at
least one of the cache objects other than the given cache
object.
According to another aspect, the invention may provide an
apparatus, comprising a computing system having at least one
processor capable of operative communication with a main
memory; and a cache in the computing system having an initial
88
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
group of cache objects, each cache object having an initial
compression ratio and including stored data; wherein the
computing system is operable to decrease an amount of data
storage space occupied by at least one of the cache objects
other than a given one of the cache objects; and increase an
amount of data storage space occupied by the given cache
object. Preferably, the decreasing comprises decreasing the
amount of data storage space by a given amount. Preferably,
the increasing comprises increasing the amount of data storage
space occupied by the given cache object by the given amount.
Preferably, the decreasing comprises increasing the initial
compression ratio of the at least one cache object.
Preferably, the increasing comprises decreasing the initial
compression ratio of the given cache object.
Preferably, the given cache object is a most recently
used cache object of the cache objects. Preferably, the
decreasing comprises removing a portion of the stored data for
the at least one cache object. Preferably, the increasing
comprises supplementing the stored data for the given cache
object. Preferably, an amount of data storage space available
for each of the cache objects may equal one of a finite number
of discrete values. Preferably, the decreasing comprises
reducing the amount of data storage space for at least one
randomly selected cache object of the cache objects, other
than the given cache object. Preferably, the reducing
comprises reducing the amount of data storage space for the at
least one randomly selected cache object to a randomly
determined extent.
According to another aspect, the invention provides
method, comprising: providing a cache in a computing system,
the cache having an initial condition; if insufficient data
storage space is present in the cache under the initial
condition to store at least one new object in the cache,
89
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
compressing at least one object in the cache to clear data
storage space for the at least one new object; and storing the
at least one new object in the cache. Preferably, the initial
condition corresponds to the cache being empty. Preferably,
the method further comprises continuing to store new objects
in the cache without compressing the objects stored in the
cache until insufficient data storage space remains in the
cache to store any additional new object. Preferably, the
method further comprises storing the at least one new object
in the cache without the compressing, if sufficient space for
storing the at least one new object is present in the cache
under the initial condition.
Other aspects, features, advantages, etc. will become
apparent to one skilled in the art when the description of the
preferred embodiments of the invention herein is taken in
conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 42 is a graph of the data storage sizes of
individual cache objects in a cache as a function of the cache
objects' recency of use within the cache, in accordance with
one or more embodiments of the present invention;
FIG. 43 is a graph of the cumulative sum of the data
storage space occupied by cache objects in a cache plotted
against the number, "N," of cache objects summed, in
accordance with one or more embodiments of the present
invention;
FIG. 44 is a block diagram of a data cache including a
plurality of cache objects in accordance with one or more
embodiments of the present invention;
FIG. 45 is a block diagram of the data cache of FIG. 44
in which cache objects have been resized in accordance with
one or more embodiments of the present invention;
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
FIG. 46 is a block diagram of the data cache of FIG. 45
in which an accessed cache object has been resized and
restored to the cache in accordance with one or more
embodiments of the present invention;
FIG. 47 is a flow diagram of a method for accessing a
cache object in a data cache and updating the data cache in
accordance with one or more embodiments of the present
invention; and
FIG. 48 is a block diagram of a computing system
adaptable for use with one or more embodiments of the present
invention.
For the purposes of illustrating the various aspects of
the invention, there are shown in the drawings forms that are
presently preferred, it being understood, however, that the
invention is not limited to the precise arrangements and
instrumentalities shown.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
This disclosure makes reference to the LRU and MRU ends
of the cache. objects are generally added at the MRU end, and
are generally erased from the LRU end. However, the present
invention is not limited to such a scheme. It is noted that
the physical layout of cache objects in a cache need not
correspond to the LRU-MRU layout. The layout of a cache
merely preferably enables a computing system to find, insert,
and/or erase objects in the manner described herein. The
linear LRU-MRU arrangement is a convenient mechanism for
describing the operation of a cache, but represents only of
many possible implementations of a cache memory. Herein, the
terms "cache" and "cache memory" are used interchangeably.
It is noted that, although MRU caching and its extensions
disclosed herein are discussed in the context of a
client/server architecture, similar principles apply to many
91
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
other scenarios, such as efficient hard disk access on a
single computer (where access to the hard disk is slower than
access to RAM, and RAM is thus used to cache the most recently
used content on the hard disk). In one or more other
embodiments, data are gathered from the environment or
generated computationally rather than being loaded from a disk
or sent across a network. In each case, the client has access
to a small but fast temporary cache memory, and a larger but
slower data source from which information is requested
repeatedly. This slower data source is generally referred to
herein as the "server."
The following discussion of convergent series is provided
as an introduction to the cache memory apparatus and method
disclosed herein.
The infinite sum of the series y(n)=n~-p, with n going
from 1 to infinity, and with p>1, is finite. Similarly, the
sum of y=1/b~n is finite for b>1. For example, in the latter
case, if b=2, the sum is exactly 2. The principles underlying
such convergent series may be used to implement one or more
embodiments of efficient data caching methods and apparatus as
described herein.
One or more embodiments of the methods and apparatus
described herein may employ concepts related to the "Zeno
paradox," which is described below. While this discussion
provides a conceptual underpinning applicable to one or more
embodiments described herein, the present invention is not
limited by the conceptual aspects discussed below.
Zeno Caching Concept.
Zeno is a runner who is so quick that in one step (which,
for the sake of discussion, it is assumed he makes every
second) he covers half the distance from his current position
92
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
to the end of any racetrack. The paradox is that he never
finishes the course, even though he moves forward with every
step. This paradox is easily related to the 1/b~n series
above with b=2, and summing from n=2 to infinity. This
concept may be extended to the storage of cache objects (with
the cache itself being analogized to the "racetrack") by
enabling the cache objects to be compressed to a progressively
greater extent with decreasing recency of use or access.
Thus, in proceeding from the MRU end of a cache to the LRU end
thereof, a theoretically infinite number of additional cache
objects of ever decreasing size could be put in place, without
ever running out of space. This principle is referred to
herein as the Zeno cache concept.
Preferably, the cache objects concerned herein are
compressible, which in this disclosure, corresponds to being
amenable to lossy data compression techniques. Lossy data
compression may be characterized by the ability to represent a
data object with fewer bytes than the full representation of
the data object. Higher compression ratios generally incur
higher distortion of the data object and lower quality of an
image rendered using the compressed data (where the object
includes one or more image files) Without limitation, lossy
compression techniques may also be applicable to sound, video,
and many other data types.
In one or more embodiments, compressed versions of the
data may be suitable as substitutes for the uncompressed data.
Below a given level of distortion, the compressed
representations of the data may be fully adequate, and above
the given level of distortion, the compressed representations
may be adequate as a temporary measure while the client waits
for a higher quality version of the data. The higher quality
version may merely be less compressed than the temporarily
used version, or may be losslessly compressed or uncompressed.
93
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In one or more embodiments, lower quality representations
may be subsets of higher quality representations, meaning that
improving the representation quality at the client side may
involve merely sending data to supplement lower quality
representations, thereby providing the higher quality
representation. Preferably, with this approach, there is no
need to incur the burden of sending an entirely new set of
data associated with the high quality representation. This
approach preferably avoids redundancy and hence preferably
substantially increases efficiency.
Consistent with the approach discussed above, the reverse
process of lowering the representation quality of an object
may involve merely removing a portion of the data employed for
a high quality representation of an image, rather than
requiring compression, or re-compression, of the data used for
the high quality representation. This property preferably
also enhances the efficiency of the caching apparatus and
method disclosed herein.
In one or more embodiments, the compression technique may
provide objects with compression levels that scale from lossy
to lossless. This feature may allow a lossless representation
of a data object to be built up in steps, from highly lossy to
lossless, at little or no extra total cost relative to sending
across a lossless version initially. An example of a data
type and compression technology enabling the above features is
the wavelet compression of images, as exemplified by the
JPEG2000 standard. However, the present invention is not
limited the use of the JPEG2000 standard.
Given the above properties, if memory were "continuous"
(i.e. not discretized into bytes) then it would be possible in
theory to cache an infinite number of data objects in a finite
amount of memory, merely by enforcing the constraint that the
94
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
compressed sizes of the objects conform to the rules of a
convergent series as discussed earlier herein. The operation
of a cache able to function in accordance with the theory
discussed above is described below in connection with FIGS. 1
and 2.
In the graph of FIG. 42, the variable "N" preferably
corresponds to the number of each cache object, the value of
the number of each cache object representing the recency of
use of each such cache object, with increasing values of N
corresponding to decreasing recency of use of the cache object
associated with that number. The variable "Y" preferably
corresponds to the size of each cache object. For the "Y"
variable, a value of "1" may correspond to the size of a cache
object in its highest-quality condition, i.e. when it is not
compressed at all. Most-recently-used objects may be
represented with low distortion, and less recently used
objects may be subject to an extent of compression consistent
with the recency of the last use thereof. It may be seen in
FIG. 42 that the value of the compressed cache-object size, Y,
declines with decreasing recency of use of the pertinent cache
object. Note that the "Y" variable may correspond to an
absolute size of each object (whether compressed or not) in
the cache (expressed in arbitrary units) . Alternatively, in
one or more other embodiments, "Y" may correspond to a
compression ratio, with, for example, the value "1"
corresponding to a full-size object, and the value "0.5"
corresponding to an object occupying one half of its
uncompressed size.
With reference to FIG. 43, the cumulative sum of the
sizes Y of the objects numbered from 1 to N, for each value of
N, may still be a finite number, as shown in FIG. 43. The
units of the variable "Y" may be units of data storage space
corresponding to the size (or data storage space needed by)
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
one representative fully expanded (uncompressed) cache object.
Since FIGS. 1 and 2 aid an understanding of the theory of one
or more embodiments of the present invention, information
describing the size of the cache objects in bits and/or bytes
of data is not provided herein.
In one or more embodiments, the theoretical
implementation described above is preferably modified for two
reasons. First, in practice, memory storage is preferably
composed of discrete storage units. Thus, for example, it is
usually meaningless in practice to compress a cache object to
occupy an amount of storage space that is smaller than one
bit. Second, the total number of operations performed on the
cache is preferably finite. In contrast, enforcing a
continuous curve of compression ratios described by one of the
convergent formulas above may involve reducing the size of
every cache object in the cache every time additional cache
storage space was needed. This would require an impractically
large number of operations.
In one or more embodiments, the number of objects in the
cache will in practice be finite. However, where the Zeno
cache concept is employed, this number may be much larger than
would be possible with conventional MRU caching. Further,
cached objects may have the property that if recently used,
they may be stored at a high quality level (anywhere from a
low level of distortion, or compression lossyness, to lossless
compression, to uncompressed data). The quality level of
cache objects may become progressively worse (i.e. be subject
to progressively higher levels of distortion or compression
lossyness) with each successive cache memory access in which
these cache objects are not accessed.
Because computer memory is discrete and there may be a
minimum compressed size of a cache object below which a cache
96
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
object may have no value to a user, cached representations may
be subject to a maximum compression ratio that yields this
minimum compressed size. Thus, in one or more embodiments,
the maximum number of cache objects that can be stored in the
cache may equal the total data storage space in the cache
divided by the amount of data storage space occupied by a
cache object having the above-described minimum compressed
size, if the objects are all of equal size. However, the
cache objects need not all be of equal size.
There are many ways to design a series which is bounded
by one of the equations discussed above (or any other
convergent sum), and which therefore has a finite sum. An
additional constraint can also be introduced, specifically
that the likelihood of any given value repeating in successive
values of a series increases at higher values of N such that
the number of different values of Y employed may be limited to
a reasonable number.
An example of such a series is: 1, 1/4, 1/4, 1/16, 1/16,
1/16, 1/16, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64,
1/256, etc.
Clearly the sum of the series 1, two quarters, four
sixteenths, eight sixty-fourths, etc. is 2, just like y=1/2~n,
as discussed earlier herein, But, if we take the series out
to n=16000, only about log2(16000), or about 14, values of y
(object data storage space size) may be used.
In one or more embodiments, the log function described
above provides one way to cause the number of available values
of Y (possible sizes of the cache objects) to grow much more
slowly than the value of N. However, the present invention is
not limited to the use of this log function, and other
mathematical operations that cause the number of values of Y
97
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
to grow more slowly than the value of N may be employed in
connection with the present invention.
In one or more embodiments, when N = one million, as few
as 20 values of Y may be used (as determined using the
logarithm-based formula recited above). This implies that when
space has to be freed in the cache, only a small number of
operations may be needed to establish a suitable allocation of
data storage space among the cache objects, since the majority
of the cache objects will occupy an amount of data storage
space that preferably does not need to change.
Other mathematical series may also satisfy the desired
criteria for use within a cache memory management system and
method. Additionally, it is possible to use series that are
not theoretically convergent (i.e. whose sums are infinite),
since in practice a finite number of terms will be summed in
any case.
In one or more embodiments, random algorithms may be used
to improve the basic algorithm in a number of ways. In one
more embodiments, the 2*1/4, 4*1/16 etc. series, described
above, may include only a small number of available cache
object sizes, possibly leading to stark differences in
compression ratios between different objects within a cache.
Random choice may be used to "squeeze" (reduce the data
storage space used by) a randomly selected subset of the cache
objects in a weighted fashion until some target amount of
space is made available for new cache objects. This approach
may provide beneficial results because the exact position in
the cache of a cache object may decrease in importance with an
increasing number of objects in the cache. The amount by
which each object is "squeezed" may also be at least partially
randomized. Using randomization algorithms like those
discussed herein may reduce obvious discontinuities or
98
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
thresholds in cache-object quality, which may be perceived in
images rendered using cache objects stored in the cache.
In the following, an illustrative example of managing
cache objects in a data cache in accordance with one or more
aspects of the present invention is presented.
FIG. 44 is a block diagram of a data cache 300 including
a plurality of cache objects 302-310 in accordance with one or
more embodiments of the present invention. FIG. 45 is a
block diagram of the data cache 300 of FIG. 44 in which cache
objects 302, 304, 308, and 310 have been resized in accordance
with one or more embodiments of the present invention. FIG.
46 is a block diagram of the data cache of FIG. 45 in which
an accessed cache object 306 has been resized and restored to
cache 300 in accordance with one or more embodiments of the
present invention.
In one or more embodiments, including that shown in FIG.
44, cache 300 may include five cache objects (abbreviated "CO"
in FIGS. 3-6 for the sake of brevity) C0 1 302, C0 2 304, CO 3
306, C04 308, and CO 5 310. The number of cache objects (5)
shown in FIG. 44 has been selected for convenience in
illustrating various concepts described herein. However,
fewer or more than five cache objects may be included within
cache 300. There is in principle no lower limit to the number
of cache objects which may be included within cache 300. In
principle, an upper limit of the number of cache objects that
may be included within cache 300 may correspond to the total
size of the cache divided by the smallest acceptable possible
cache object size, which is discussed elsewhere herein..
In FIGS. 3-5, for the sake of describing various concepts
disclosed herein, the width of each illustrated cache object
is intended to be proportional to the data storage space used
by that cache object. Also, in FIGS. 3-5, proceeding from the
99
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
left-most cache object to the right-most cache object
corresponds to increasi.ng recency of access of the displayed
cache objects, with the least recently used of the cache
objects shown at the extreme left, and the most recently used
of the cache objects shown at the extreme right.
FIG. 45 shows cache 300 after CO 3 306 has been accessed
by a computing system, such as computing system 700, employing
cache 300. In this example, CO 3 306 is not shown within
cache 300 in its original position, since this position has
been overwritten in the condition of cache 300 shown in FIG.
45. Moreover, free space 402 has been created to make room
for an expanded version of CO 3 306 which may occupy more data
storage space within cache 300 than did the original version
of CO 3 in FIG. 44. In FIG. 46, an expanded version of CO
306 has been written into a portion of cache 300 which was
occupied by free space 402 in the cache 300 condition shown in
FIG. 45.
FIG. 47 is a flow diagram of a method 600 for accessing
a cache object in data cache 300 and updating the data cache
300 in accordance with one or more embodiments of the present
invention. Reference is made to FIGS. 3-6 in the following.
At step 602, cache objects 302, 304, 306, 308, and 310
are provided to a program having access to cache 300. The
group of cache objects initially present in cache 300 are
shown in FIG. 44. This initial condition of cache 300 may
result from a default setting in or more programs or from
program steps previously executed within one or more programs.
In any case, it is the changes made to cache 300 after
establishment of the initial condition shown in FIG. 44 that
are of interest in the following discussion. Although any one
of many programs operating on any one of various computing
systems may be in communication with cache 300, for the sake
100
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
of convenience, software which may access cache 300 is
referred to merely as "the program" in the following.
At step 604, an indication may be provided as to which
cache object will be the next to be used by the program. At
step 606, the indicated cache object, which in this example is
CO 3 306, may be accessed by the program.
At step 608, a determination may be made as to whether
the accessed cache object is expandable. Herein, a cache
object is expandable when it may occupy more data storage
space by being subject to a lower compression ratio. Such
expansion may be accomplished by supplementing the data
already present in the cache object rather than by providing a
completely new set of data corresponding to the new
compression ratio (or corresponding to a lack of compression).
If the accessed cache object is not expandable, it is
preferably restored to cache 300 in step 610. Preferably, in
step 610, the restored cache object occupies the same amount
of data storage space after being accessed as it did prior to
such access. Consistent with the principles of LRU-MRU
caching, upon being restored to cache 300, the accessed cache
object may be written to the rightmost, or MRU end, of cache
300. Alternatively, however, the accessed cache object could
be written to any part of cache 300. Continuing with this
branch of method 600, the method 600 preferably ends at step
612.
With reference to step 608, if the accessed cache object,
such as cache object 306, is expandable, it is preferably
expanded (step 614) in accordance with one or more embodiments
of the present invention. As previously discussed herein,
expanding a cache object as described above preferably helps
provide an arrangement in which the most recently and/or the
101
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
most frequency accessed cache objects are stored in cache 300
at the highest quality levels.
In one or more embodiments, where there are "N" cache
objects in a cache, the number of possible sizes (as measured
in data storage space) of such cache objects may be limited to
the quantity equal to 1092(N). Establishing a limited, finite
number of possible cache object sizes, as described above,
preferably limits the computational expense of determining a
new, expanded size for a cache object, such as CO 306, to be
expanded in step 614.
In one or more embodiments, the amount of data storage
space needed for the expanded (or otherwise stated, less
compressed) version of CO 306 may be calculated by a computing
system (not shown) having access to cache 300. Where cache
300 is not yet ready to receive the expanded version of CO
306, the expanded version of Co 306 may be written to another
memory storage location (not shown) for temporary storage
therein.
At step 616, data storage space 402 needed for storing an
expanded version of CO 306 is preferably made available within
cache 300. If there is sufficient space present within cache
300 to store an expanded version of CO 306 without altering
any cache objects within cache 300, then a reduction in size
of one or more of the cache objects in cache 300 may be
omitted. However, where all or substantially all of the
storage space in cache 300 was occupied prior to CO 306 being
accessed, one or more of the cache objects other than CO 306
may undergo a reduction in size to free up space in cache 300
for storage of an expanded version of cache 306.
In one or more embodiments, the number of cache object
size reduction operations may be reduced where there is a
limited number of possible cache object sizes. Limiting the
102
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
number of cache object size reduction operations preferably
operates to reduce the computational burden on a computing
system accessing cache 300 and preferably provides for overall
computing system efficiency.
In one or more embodiments, there may be various ways to
achieve a desired amount of data storage space clearing.
Herein, the term "clearing" may correspond to making data
storage space available in cache 300 by reducing the data
storage space allocated to one or more cache objects within
cache 300.
In one or more embodiments, the amount of data storage
space to be cleared may correspond to the amount of additional
storage needed by the expanded cache object over and above the
space it occupied prior to its most recent access by a
computing system. However, in other embodiments, the amount of
space to be cleared may be smaller or greater than the amount
space by which the most recently accessed cache object has
increased in size.
In one or more embodiments, the space cleared for the
most recently used, expanded cache object may be at one end of
cache 300, as is illustrated in FIG. 46. However, in other
embodiments, the cleared space could be placed at other
locations within cache 300.
In one or more embodiments, the data storage space to be
made available may be provided at the expense of one or more
of the cache objects of FIG. 44 other than CO 3 306 (the most
recently used cache object). Specifically, it may be possible
to provide the needed additional space by reducing the size of
just one remaining cache object or by reducing the size of all
but the most recently used cache object. Moreover, any number
of cache objects in between these two extremes may be caused
to shed storage space in favor of the expanded, most recently
103
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
used cache object. In the following, all of the cache objects
other than the most recently accessed cache object are
considered to be "eligible for size reduction."
In one or more embodiments, the extent of size reduction
of the one or more cache objects eligible for size reduction
may be selected according one or more considerations. In one
embodiment, the cache objects eligible for size reduction may
shed an equal or substantially equal amount of storage space.
In another embodiment, the eligible cache objects may shed an
equal or substantially equal proportion of their pre-reduction
size to clear space for the expanded, most recently used cache
object.
In one or more other embodiments, the extent of size
reduction of each cache object may be based on how recently
the cache object was last accessed. Specifically, cache
objects eligible for size reduction may shed progressively
more storage space with decreasing recency of the last access
thereof. Thus, under this approach, the most recently used of
the cache objects eligible for size reduction may shed a
relatively small amount of storage space, and the least
recently used cache object may shed a relatively large amount
of data storage space, with those cache objects in between
these two extremes shedding intermediate amounts of storage
space.
While the discussion of storage space reduction herein is
directed primarily to merely reducing the size of cache
objects that are not the most recently accessed, in one or
more embodiments, one or more cache objects may be removed
from cache 300 to clear data storage space. Moreover, such
cache object removal may be practiced either alone, or in
combination with cache object data storage space reduction of
cache objects that will remain within cache 300.
104
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In the embodiment of FIG. 46, all four cache objects 302,
304, 308, and 310 remaining in cache 300 have been reduced in
size to clear data storage space for the writing of CO 306 to
the position shown at the rightmost end of cache 300.
However, in alternative embodiments, three or fewer of the
four cache objects 302, 304, 308, and 310 eligible for size
reduction could undergo size reduction. Preferably, the method
ends at step 620.
In one or more embodiments, rather than managing objects
in the cache employing only the recency of use of each cache
object as a variable in determining cache object size, cache
object management may also involve intelligent guessing about
which objects might be needed next. Thus, objects less likely
to be needed may be "squeezed" before objects with a higher
likelihood of being needed in the future. In one or more
embodiments, this guessing approach could be combined with an
algorithm that may randomly select objects in the cache for
squeezing and which may additionally generate a randomly
varying amount of squeezing for the objects selected.
FIG. 48 is a block diagram of a computing system 900
adaptable for use with one or more embodiments of the present
invention. In one or more embodiments, central processing
unit (CPU) 902 may be coupled to bus 904. In addition, bus 904
may be coupled to inventive cache 300, random access memory
(RAM) 906, read only memory (ROM) 908, input/output (I/0)
adapter 910, communications adapter 922, user interface
adapter 906, and display adapter 918.
In one or more embodiments, RAM 906 and/or ROM 908 may
hold user data, system data, and/or programs. I/0 adapter 910
may connect storage devices, such as hard drive 912, a CD-ROM
(not shown), or other mass storage device to computing system
900. Communications adapter 922 may couple computing system
105
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
900 to a local, wide-area, or Internet network 924. User
interface adapter 916 may couple user input devices, such as
keyboard 926 and/or pointing device 914, to computing system
900. Moreover, display adapter 918 may be driven by CPU 902 to
control the display on display device 920. CPU 902 may be any
general purpose CPU.
It is noted that the methods and apparatus described thus
far and/or described later in this document may be achieved
utilizing any of the known technologies, such as standard
digital circuitry, analog circuitry, any of the known
processors that are operable to execute software and/or
firmware programs, programmable digital devices or systems,
programmable array logic devices, or any combination of the
above. One or more embodiments of the invention may also be
embodied in a software program for storage in a suitable
storage medium and execution by a processing unit.
Although the invention herein has been described with
reference to particular embodiments, it is to be understood
that these embodiments are merely illustrative of the
principles and applications of the present invention. It is
therefore to be understood that numerous modifications may be
made to the illustrative embodiments and that other
arrangements may be devised without departing from the spirit
and scope of the present invention as defined by the appended
claims.
106
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
SYSTEM AND METHOD FOR EFFICIENTLY ENCODING DATA
BACKGROUND OF THE INVENTION
Recently, image compression standards such as
JPEG2000/JPIP (JPEG 2000 Interactive Protocol) have been
introduced to meet a demanding engineering goal: to enable
very large images (i.e. gigapixels in size) to be delivered
incrementally or selectively from a server to a client over a
low-bandwidth communication channel. (See David Taubman's
implementation of Kakadu at www.kakadusoftware.com). When
such images are being viewed at full resolution, only a
limited region of each image can fit on a client's graphical
display at any given time. The new standards are geared
toward selectively accessing such regions and sending across
the communication channel only data relevant to the region.
If this "region of interest" or ROI changes continuously, then
a continuous dialogue between a client and server over a low-
bandwidth channel can continue to keep the client's
representation of the area inside the ROI accurate.
However, existing approaches generally require the
transmission of substantial amounts of data to significantly
shift the location of the region of interest, thus limiting
the speed at which such shifts may be implemented. Moreover,
existing approaches generally depend on sequential access to a
linear string of text, thereby imposing a significant burden
on text navigation when a client seeks to significantly change
the location of the region of interest. Accordingly, an
improved method for transmitting data from a server to a
client is needed.
SUMMARY OF THE INVENTION
107
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
According to one or more embodiments, the invention
provides a method that may include converting data identifying
a plurality of visual features into a plurality of pixel
characteristic data values; and forming an image file with
pixels having the respective pixel characteristic data values.
Preferably, the plurality of visual features comprise at least
one graphical symbol. Preferably, the visual feature
identification data comprises ASCII codes. Preferably, the
visual feature identification data comprises frequency-order
positions of the visual features. Preferably, the visual
feature identification data is determined based on a
combination of a) a frequency of use of the visual features
and b) transition probabilities between successive ones of the
visual features. Preferably, the pixel characteristic data
values comprise at least one data type selected from the group
consisting of: pixel intensity data values, pixel color data
values, and image contrast data values.
Preferably, the pixel characteristic data values comprise
pixel color data values. Preferably, the pixel color data
values comprise at least one color-data value type selected
from the group consisting of: RBG (Red, Blue, Green) pixel
color data values, HSV (Hue Saturation Value) pixel color data
values, and CMYK (Cyan, Magenta, Yellow, Black) color data
values.
Preferably, the method further comprises losslessly
compressing said image file. Preferably, the visual feature
identification data comprises differences in values of the
identification data for successive ones of the visual
features. Preferably, the visual features comprise
alphanumeric characters. Preferably, the method further
comprises locating the pixels in said image file in a same
order that said alphanumeric characters are ordered in, within
an original document. Preferably, the method further
108
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
comprises transmitting the image file to a client. Preferably,
the method further comprises decoding at least a region of
said image file into the alphanumeric characters for viewing
by the client. Preferably, the method further comprises
enabling the client to browse the decoded image file region in
a first dimension.
Preferably, the method further comprises enabling the
client to browse the decoded image file region in first and
second dimensions. Preferably, browsing in said first
dimension includes advancing along a line of the alphanumeric
characters, and browsing in the second dimension preferably
includes scrolling over a plurality of lines of the
alphanumeric characters. Preferably, the browsing comprises
browsing the decoded image file in a manner that emulates
browsing within the original document. Preferably, the method
further comprises correlating locations of the pixels in the
image file with locations of the alphanumeric characters
corresponding to said pixels in an original document
containing the alphanumeric characters.
Preferably, the pixels' locations within the image file
at least substantially correspond to locations of the
corresponding alphanumeric characters in the original
document. Preferably, the pixels' locations within the image
file are substantially the same as locations of the
corresponding alphanumeric characters in the original
document. Preferably, the method further comprises
transmitting at least a region of the image file from a server
to a client. Preferably, the method further comprises
requesting a region of the image file from a server by a
client.
Preferably, the method further comprises sending the
requested region of the image file by the server. Preferably,
109
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
the sending step comprises transmitting compressed data
suitable for decoding by the client.
According to another aspect, one or more embodiments of
the present invention provide an apparatus that may include a
processor operating under the instructions of a software
program, the software program causing the apparatus to perform
actions, including converting data identifying a plurality of
visual features into a plurality of pixel characteristic data
values; and forming an image file with pixels having the
respective pixel characteristic data values.
According to another aspect, one or more 'embodiments of
the present invention may provide a storage medium containing
a software program operable to cause an apparatus including a
processor under the instructions of the software program to
perform actions, comprising: converting data identifying a
plurality of visual features into a plurality of pixel
characteristic data values; and forming an image file with
pixels having the respective pixel characteristic data values.
Other aspects, features, advantages, etc. will become
apparent to one skilled in the art when the description of the
preferred embodiments of the invention herein is taken in
conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the various aspects of
the invention, there are shown in the drawings forms that are
presently preferred, it being understood, however, that the
invention is not limited to the precise arrangements and
instrumentalities shown.
FIG. 49 is a text image of the full text of "Ulysses,"
using raw ASCII encoding in accordance with one or more
110
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
embodiments of the present invention, with the color White
having a numerical value of 0 and the color Black having a
numerical value of 255;
FIG. 50 is a text image of the first five chapters of
Ulysses, encoded using raw ASCII in accordance with one or
more embodiments of the present invention;
FIG. 51 is a text image of the first five chapters of
Ulysses encoded using frequency-order encoding in accordance
with one or more embodiments of the present invention; and
FIG. 52 is a block diagram of a computing system that may
be adaptable for use with one or more embodiments of the
present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
One or more embodiments of the present invention may
involve extending selectively decompressable image compression
and transmission technologies to textual or other data that
may be identified using a binary representation.
Herein, binary data that identify and/or describe visual
features may be converted from an initial format, such as
ASCII (American Standard Code for Information Interchange) or
other format, may be converted into a format suitable for
incorporation into image data, such as but not limited to
pixel intensity data.
The visual features referred to above may include but are
not limited to graphical symbols and/or alphanumeric
characters. However, herein, visual features may include any
visible image features that may be described and/or identified
using binary data. Moreover, while such data may be encoded
into pixel intensity data, the present invention is not
limited to encoding in this format.
111
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In one or more embodiments, the initial data (visual
feature identification data) may be converted into several
possible types of pixel characteristic data including but not
limited to pixel intensity data, pixel color data, image
contrast data, and/or other form of image data. The above-
mentioned pixel color data may include but is not limited to
Red, Blue, Green (RBG) pixel color data, Hue Saturation Value
(HSV) pixel color data, Cyan, Magenta, Yellow, Black (CMYK)
pixel color data, and/or other form of pixel color data.
In the following discussion, a large text, specifically
the book "Ulysses," by James Joyce, is considered. In one or
more embodiments, this text may be formatted by putting each
chapter in its own column, with columns for successive
chapters arranged from left to right. However, it will be
appreciated that other arrangements of the chapters may be
implemented. Columns are assumed to have a maximum width in
characters, such as 100. FIG. 49 shows the entire text of
Ulysses encoded as an image in this fashion, with each pixel
within FIG. 49 corresponding to a single text character (or
character position that doesn't include text, such as empty
space) in the original text document. FIG. 50 shows a text
image of the first five chapters of Ulysses, using the same
encoding method as in FIG. 49.
In one or more embodiments, the intensity value of each
pixel may be set equal to the ASCII code of the character
being encoded in the pixel. Because grayscale pixels and
ASCII characters may both be represented using 8-bit
sequences, (which may both have values in the range 0-255),
the correspondence between a pixel value and a character may
be readily implemented. In this disclosure, although textual
and other characters may be represented with pixels using the
ASCII code as a value for pixel intensity, it will be
112
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
appreciated that other codes for textual or other characters
may be employed for this purpose.
Generally, the full text of Ulysses in ordinary ASCII
representation (i.e. as a standard text file) occupies 1.5MB
of storage space, which may be too large to transmit in its
entirety over a narrowband communication channel. The pixel-
characteristic-data representation of character data (which is
known herein as the "character-pixel image" and also as the
"text-image") of FIG. 49, encoded as a lossless JPEG2000
image, requires about 2.2MB (Megabytes) of data storage space.
This storage space requirement could be significantly reduced
if the chapters of the book were more equal in length,
resulting in less empty space (encoded as zeros) in the
character-pixel image (text-image).
However, more important than the overall file size, is
the ability of an ordinary JPIP server to serve this file to a
client selectively and incrementally. Specifically, of
concern here is the ability of a server to serve selected
portions of the file at controllable increments of resolution.
In one or more embodiments, a client viewing the text at
a zoom level sufficient to read the characters (which may
require well over 1 pixel/character on the client-side
display) can use JPIP (or other suitable protocol) to request
only the relevant portion of the text. This operation is
efficient, and adequate performance could be achieved for a
reader of the text even with a very low bandwidth connection
to the server, under conditions that would make it prohibitive
to download the entire text, due to the magnitude of data
involved in such a download.
In one or more embodiments, similar effects may be
achieved using a client/server technology specifically
designed for selective access to large texts, but the
113
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
character-pixel image approach described above has a number of
advantages over conventional implementations, which are listed
below.
1) The character-pixel image approach may use existing
technology and protocols designed for very large data volume.
2) The character-pixel image approach may easily scale up to
texts of many gigabytes in size, or even larger, without any
degradation of performance.
3) Access to text or other visual features within a region
of interest is preferably two-dimensional in one or more
embodiments of the present invention. Thus, in many situations
(for example, when text is to be viewed in columns as in the
Ulysses case), the character-pixel image approach disclosed
herein preferably enables more efficient browsing than is
available using existing approaches that deal with text as a
one-dimensional string.
4) Because wavelets are used in JPEG2000, the character-
pixel image is preferably subject to a multi-resolution
representation. Although the text, or other visual feature
identified using visual feature identification data, will
generally not be readable at other than the final (most
detailed) resolution, the broader spatial support of lower-
resolution wavelets preferably provides an intelligent client-
side cache for text, and/or other visual features, near the
region of interest. Moving the ROI over various regions of the
original text, as may be done when scrolling through text in a
traditional manner, may therefore be efficiently performed
using one or more embodiments of the present invention.
In one or more embodiments, the above concepts may be
readily extended to deal with formatted text, Unicode, or
other metadata, as all such data can be represented using
ASCII text strings, possibly with embedded escape sequences.
114
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In selected applications, JPEG2000 may be used as a lossy
compression format, meaning that the decoded image bytes are
not necessarily identical to the original bytes. Herein, the
term "decoding" refers to converting pixel data in a text
image back into the original text data, or other visual
feature data. If the image bytes represent text, lossy
compression will generally not be acceptable. One of the
design goals of JPEG2000 was, however, to support lossless
compression efficiently, as this is important for certain
imaging functions (such as for medical and scientific
applications). Lossless compression ratios for photographic
images are typically only around 2:1, as compared with
visually acceptable lossy images, which can usually easily be
compressed by 24:1.
Image compression, both lossy and lossless, generally
operate best on images that have good spatial continuity,
meaning that the differences between the intensity values of
adjacent pixels are low. The raw ASCII encoding is not optimal
from this perspective, since successive text characters
encoded in ASCII may have widely varying values. Thus, some
alternatives are considered below.
FIG. 51 is a text image of the first five chapters of
Ulysses encoded using frequency-order encoding in accordance
with one or more embodiments of the present invention.
Frequency Encoding
In one or more embodiments of the present invention, the
encoding efficiency may be improved by reordering characters
according to their frequency of occurrence in the pertinent
text, in the English language as a whole, or in another
language as a whole, from highest frequency to lowest
frequency. Thus, in one or more embodiments, empty space
would have code zero, and a pixel value of zero in the
115
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
character-pixel image. The "space" character could receive
code "one" (with its corresponding value in the character-
pixel image also being "1"). A sequence of characters such as
e, t, a, o, i, n, s, r, h, 1, etc... could be caused to convert
to successive pixel values starting with "2" (corresponding to
"e") and proceeding upward therefrom up to the value 255.
It is possible that, upon converting all the characters
in a large text into pixel values using this approach, all 255
pixel values could end up being used. However, by the very
nature of the text character (or other visual feature) to
pixel value conversion contemplated herein, pixel value
occurrences preferably become increasingly rare with
increasing numerical values thereof.
The image of FIG. 51 illustrates this point. Pixel
values tend to cluster near zero. At least as importantly,
the difference in pixel values between successive character-
pixel values in the embodiment of FIG. 51 is preferably lower
than in the embodiment of FIG. 50.
Where all pixel values in the range 0-255 are equally
likely, eight bits will generally be needed to represent each
pixel value. However, in embodiments in which some pixel
values occur much more frequently than others, the pixel
values can preferably be represented with fewer bits, without
losing any information.
An example is considered to illustrate this point. In
this extreme case, the pixel value equals zero 99% of the
time, and has another value, somewhere between 0 and 255, the
rest of the time. In this case, the encoding algorithm may
encode the 0 value with a single "0" bit, and non-zero values
with a leading "1" bit (to signal the presence of a non-zero
value) followed by an 8-bit representation of the non-zero
value. Thus, this approach conserves 7 bits per pixel for 99
116
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
out of 100 pixels, but uses one extra pixel to represent a
non-zero pixel which occurs only 1% of the time. The decoding
algorithm corresponding to the above-described encoding
algorithm thus preferably interprets a "0" bit as representing
a 0 bit value and a bit sequence starting with a"1" bit value
has having the value represented by the bits succeeding the
leading "1" bit.
However, even in less extreme situations, the existence
of pixel values that occur much more frequently than others
may enable considerable savings in storage space, without
incurring any loss of pixel data, and by logical extension
without incurring any loss of the visual-feature data
represented by the pixel data. In general, two or more
categories of value occurrence frequency may be established,
generally using a progressively increasing number of bits to
represent values occurring with decreasing frequency. In this
manner, smaller bit sequences may be used most of the time,
thereby conserving data storage space and data communication
bandwidth requirements.
In an intermediate example, five bits could be used to
represent the most frequently occurring pixel values, and nine
bits for the less frequently occurring values. For the most
frequently occurring visual features, a leading bit, which may
be a "0", may be provided, which may be followed by four bits
representing the actual value of the pixel. For the less
frequency occurring pixel values, a leading bit, which may be
a"1", may be provided, which may be followed by eight bits
representing the actual value of the pixel.
In one or more other embodiments, frequency encoding may
benefit from spatial coherence to represent a text image using
a reduced number of bits. Specifically, the image may be
divided into 8X8 pixel blocks, thus providing 64 pixels in
117
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
each block, with each pixel representing a frequency encoded
visual feature (which may be a text character). The encoding
method may then review each block and determine the number of
bits needed to represent the highest-valued pixel value in the
block. This determined number of bits may then be used to
represent all pixel values in that block.
For many of the blocks within any given image, the
highest-value pixel may be able to be represented with four or
fewer bits. Thus, considerable savings in data storage
requirements may be obtained when employing this block by
block approach.
In one or more embodiments, when the frequency-encoded
text-image of Ulysses is compressed losslessly as a JPEG2000
image, the file size is 1.6MB, barely larger than the raw
ASCII text file (1.5MB), and 37% smaller than the ASCII
encoded text-image. With further optimizations of the letter
encoding, the compressed file size can drop well below the
ASCII text file size. The further optimizations can include,
but are not limited to: using letter transition probabilities
(Markov-1) to develop the encoding, instead of just
frequencies (Markov-0); and/or encoding as pixels the delta or
difference between one character and the next, rather than the
characters themselves.
With these added optimizations, text ready to be served
in this fashion may actually take up less data storage space
than the raw ASCII.
One or more embodiments of the present invention
discussed herein include using JPEG2000/JPIP as a selective
image decompression technology, but the present invention is
not limited to using this image compression technology. Other
image compression formats and protocols may be employed in
conjunction with the present invention, including but not
118
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
limited to, for example, LizardTech's MrSID format and
protocol, which has similar properties.
FIG. 52 is a block diagram of a computing system 1400
adaptable for use with one or more embodiments of the present
invention. In one or more embodiments, central processing
unit (CPU) 1402 may be coupled to bus 1404. In addition, bus
1404 may be coupled to random access memory (RAM) 1406, read
only memory (ROM) 1408, input/output (I/0) adapter 1410,
communications adapter 1422, user interface adapter 1406, and
display adapter 1418.
In one or more embodiments, RAM 1406 and/or ROM 1408 may
hold user data, system data, and/or programs. I/0 adapter
1410 may connect storage devices, such as hard drive 1412, a
CD-ROM (not shown), or other mass storage device to computing
system 1400. Communications adapter 1422 may couple computing
system 1400 to a local, wide-area, or Internet network 1424.
User interface adapter 1416 may couple user input devices,
such as keyboard 1426 and/or pointing device 1414, to
computing system 1400. Moreover, display adapter 1418 may be
driven by CPU 1402 to control the display on display device
1420. CPU 1402 may be any general purpose CPU.
It is noted that the methods and apparatus described thus
far and/or described later in this document may be achieved
utilizing any of the known technologies, such as standard
digital circuitry, analog circuitry, any of the known
processors that are operable to execute software and/or
firmware programs, programmable digital devices or systems,
programmable array logic devices, or any combination of the
above. One or more embodiments of the invention may also be
embodied in a software program for storage in a suitable
storage medium and execution by a processing unit.
119
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Although the invention herein has been described with
reference to particular embodiments, it is to be understood
that these embodiments are merely illustrative of the
principles and applications of the present invention. It is
therefore to be understood that numerous modifications may be
made to the illustrative embodiments and that other
arrangements may be devised without departing from the spirit
and scope of the present invention as defined by the appended
claims.
120
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
SYSTEM AND METHOD FOR MANAGING COMMUNICATION AND/OR STORAGE OF
IMAGE DATA
BACKGROUND OF THE INVENTION
Recently developed image compression and transmission
standards such as JPEG2000/JPIP have enabled the interactive
display of large images (i.e. gigapixels in size) over narrow
bandwidth communication channels. However, these emerging
standards and technologies do not provide means for achieving
a more ambitious goal: to allow flexible visual interaction
with a very large number of images simultaneously, each of
which may also potentially be very large. Accordingly, there
is a need in the art for an improved system and method for an
improved system and method for transmitting and/or storing
image data.
SUMMARY OF THE INVENTION
According to one aspect, the present invention provides a
method that may include establishing communication between a
first computer and a second computer over a communication
link, the second computer having an image collection stored
therein in the form of compressed image data; selecting a
plurality of images in the collection for communication to
said first computer; and transmitting low-resolution image
data for all of the selected images from the second computer
to the first computer before transmitting full-resolution
image data for any of the selected images.
Other aspects, features, advantages, etc. will become
apparent to one skilled in the art when the description of the
preferred embodiments of the invention herein is taken in
conjunction with the accompanying drawings.
121
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the various aspects of
the invention, there are shown in the drawings forms that are
presently preferred, it being understood, however, that the
invention is not limited to the precise arrangements and
instrumentalities shown.
FIG. 53 is a block diagram of a system that may be
connected to enable communication of image data a plurality of
computers in accordance with one or more embodiments of the
present invention;
FIG. 54 is a block diagram of an image having at least
two regions of interest therein in accordance with one or more
embodiments of the present invention;
FIG. 55 is a block diagram of a "virtual book" that
employs aspects of the technology disclosed herein in
accordance with one or more embodiments of the present
invention;
FIG. 56 is an illustration of a three-dimensional version
of the virtual book of FIG. 55 in accordance with one or more
embodiments of the present invention;
FIG. 57 is block diagram of a system for managing image
data communication between one or more portable devices and
one or more other computers in accordance with one or more
embodiments of the present invention;
FIG. 58A illustrates the results of an incomplete image
data download employing an existing approach;
FIG. 58B illustrates the results of an incomplete image
data download in accordance with one or more embodiments of
the present invention;
FIG. 59 is a block diagram of a "common space" that may
include a physical display (screen) and two virtual displays
122
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
in accordance with one or more embodiments of the present
invention;
FIG. 60 illustrates a collection of over one thousand
images (a collection of digitized maps of various sizes)
packed into a montage in accordance with one or more
embodiments of the present invention;
FIG. 61 illustrates a snapshot of about three thousand
images that have been dynamically re-arranged into a random
configuration in accordance with one or more embodiments of
the present invention; and
FIG. 62 is a block diagram of a computer system that may
be adaptable for use with one or more embodiments of the
present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 53 is a block diagram of a system 170 that may be
connected to enable communication of image data between a
plurality of computers in accordance with one or more
embodiments of the present invention. System 170 preferably
includes client computer 182 which is connected to display 184
and data storage device 186. System 170 preferably also
includes server computer 188 which may be connected to data
storage device 190. Server computer 188 may also be connected
to the Internet 180.
In one or more embodiments, image data may be
communicated between a plurality of computers 182, 188 so as
to enable viewing of large collections of potentially large
images using a relatively low-bandwidth connection
therebetween. For instance, desirable viewing and navigation
of images stored at server computer 188 may be accomplished by
transmitting selected portions of the image data stored at
server computer 188 at controllable levels of resolution. The
selectivity of image data 114 may be such as to select a
123
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
particular image at high resolution, or even a selected
portion of a particular image at high resolution.
Herein, various embodiments are discussed that include
varying the types of devices used as client computer 182 and
server 188, the types of image data 114 transmitted
therebetween, and various applications of the ability to
transmit selected image data at specified levels of
resolution.
FIG. 54 is a block diagram of an image 250 having at
least two regions of interest 252, 254 therein in accordance
with one or more embodiments of the present invention. Image
250 could be a subset of image data 114. Alternatively, image
data 114 could represent a subset of image 250, depending upon
what image data is requested by client computer 182.
In one or more embodiments, image 250 may be stored in
compressed form on server computer 188, or within storage
device 190. Preferably, when stored in this manner, data for
a plurality of resolution levels for various regions of image
250 may be stored and may be requested for downloading by
client computer 182.
In one or more embodiments, the resolution level at which
a particular image or region of an image is stored on client
computer 182 may be readily increased or decreased. Where a
prior download results in storage of region or image at a
first resolution level (which may be less-than-full
resolution), this first resolution level may be increased by
adding data representing the next higher level of resolution,
preferably without having to discard the data representing the
first resolution, thereby avoiding redundancy and increasing
the efficiency of the image data communication contemplated
herein. Conversely, the resolution level of a region or image
stored at client 182 may be decreased by discarding the
124
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
highest level of resolution stored therein, without losing
data corresponding to the lower levels of resolution for the
same region or image. Such resolution reduction may be
practiced at client 182 to clear data storage space needed for
one or more regions or images other than the one for which
data is being discarded.
The pertinent image compression may be provided by, for
instance, the use of JPEG2000 or another discrete wavelet
transform-based image compression scheme. However, the
present invention is not limited to the use of any particular
compression format or image data representation. Other
formats may be employed, including image formats whose sizes
in bytes are not substantially smaller than the uncompressed
image data. It is merely preferable that the selected image
format be susceptible to multiscale representation and storage
of image data.
In one or more embodiments, client computer 182 may seek
to download one or more regions of image 250, where such
regions may be portions of image 250. The one or more regions
of interest 252, 254 may be the only ones that client computer
182 seeks to download. Alternatively, client computer
(client) 182 may merely seek to download one or more selected
regions at higher resolution than the resolution at which the
remainder of image 250 is downloaded. In either case, client
182 may request a download by identifying both a specified
region of image 250 to download and a resolution level at
which this specified region will be provided by server
computer (server) 188.
In the example of FIG. 54, client 182 preferably requests
a download of all of image 250 at low resolution. (The exact
resolution level at which the bulk of image 250 is downloaded
is not pertinent to this discussion) However, client 182
125
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
seeks to download region of interest 1 252 at a higher
resolution, or even at full resolution. Accordingly, client
182 preferably specifies the coordinates and the desired
resolution level of region of interest 1 252 to server 188.
Thus, in addition to downloading the bulk (including that
portion external to region of interest 1 252) of image 250 at
low resolution, client 182 preferably downloads region of
interest 1 252 at the specified higher resolution. In other
situations, client 182 could seek to download only the
region(s) of interest and omit a download of the remainder of
image 250.
In this manner, a user of client computer 182 may view
region of interest 1 252 at high resolution without having to
download the entirety of image 250 at this high resolution.
Thus, a relatively low-bandwidth data communication link
between client 182 and server 188 could nevertheless transmit
the entirety of image 250, while providing a region of
particular interest (in this case, region of interest 1 252)
at high resolution, thereby providing the viewer with the same
viewing experience with respect to the region of interest that
would have occurred had client 182 downloaded the entirety of
image 250 at the high resolution, this latter option however
demanding considerably more download time and data storage
space at client computer 182 or data storage device 186.
Shifting the Region of Interest
In one or more embodiments, a user of client computer 182
may wish to pan across image 250. Normally, panning from one
region of interest 252 to another 254 would involve having
both regions downloaded at client 182 at the level of
resolution at which the regions will be viewed. Moreover,
generally, all image territory in between region of interest 1
252 and region of interest 2 254 would be stored at client
126
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
computer 182 to enable the described panning to occur. As
described in the following, in one or more embodiments of the
present invention, viewing of such regions of interest 252,
254 may be accomplished by downloading much less data and
using less storage space at client computer 182 than in the
approach described above.
In one or more embodiments, client 182 may shift from a
high resolution view of region of interest 1 252 to region of
interest 2 254. Preferably, image data corresponding to a
low-resolution representation of region of interest 2 254 is
already present in client computer 182 from the download of
image 250, discussed above. In this case, all that is needed
is to supplement the existing image data for region of
interest 2 254 with additional image data describing the
pertinent higher levels of resolution to arrive at a high-
resolution rendition of region of interest 2 254 at client
computer 182. If needed, image data representing the higher
resolution levels of region of interest 1 252 may be discarded
or overwritten to make space in data storage device 186 or
other data storage space for the additional image data to be
downloaded for region of interest 2 254.
In one or more embodiments, the shift in view from region
of interest 1 252 to region of interest 2 254 may be
accomplished gradually, to provide a viewer of display 184
with a viewing experience that may closely simulate that
available on a computer that has the entirety of image 250
downloaded at high resolution. Specifically, the level of
resolution at which region of interest 1 252 is displayed may
be reduced gradually to the resolution level at which most of
image 250 is represented. Thereafter, the view on display 184
may present a gradual pan across the low-resolution territory
in between region of interest 1 252 and region of interest 2
254. Finally, upon arriving at region of interest 2 254, the
127
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
view on display 184 may increase toward a high-resolution
rendition of region of interest 2 254 either after completing
the panning across image 250 or concurrently with the latter
portion of this panning operation. Preferably, at the
conclusion of the described process, region of interest 2 254
may be stored in client computer 182 at high resolution and
may be displayed on display 184 at this high resolution.
FIG. 55 is a block diagram of a "virtual book" 350 that
employs aspects of the technology disclosed herein in
accordance with one or more embodiments of the present
invention. Virtual book 350 may include display 352, backward
cache 354, and forward cache 356. While the caches 354, 356
are each shown having two pages stored therein, any number of
pages may be stored in either of caches 354 and 356.
In one or more embodiments, virtual book 352 employs the
ability to present selected image data at controllable levels
of resolution for the particular case of virtual book 350. In
virtual book 350, each image may be a page within display 352
of virtual book 350. Display 352 may correspond to display
184 of FIG. 53 or may be a special purpose display that
accommodates the specific features of virtual book 350.
Virtual book 350 may correspond to client computer 182 of FIG.
53, or may be a special purpose computer that is substantially
limited to communicating, storing, and displaying pages of
books.
In one or more embodiments, virtual book 350 may include
only one page that is stored and/or displayed at full
resolution, with other pages, both earlier and later in the
sequence of pages displayed, at a variety of other
resolutions.
In one or more embodiments, the page currently displayed
on display 184, i.e. the active page, is displayed at full
128
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
resolution, which is "page 10" in FIG. 55. In such
embodiments, other pages may be displayed at progressively
lower resolutions with increasing distance in pages from the
active page. More specifically, the resolution at which each
page is stored may equal the resolution of the active page
being displayed in display 356 divided the quantity equal to 2
raised to a power equal to the number of pages between each
stored page and the active page. Thus, applying this
approach, page 11 (in forward cache 356) and page 9 (in
backward cache 354) may each occupy one half the amount of
data storage space occupied by the active page in display 352.
Continuing with this approach, page 12 (in forward cache 356)
and page 8 (in backward cache 354) may each occupy one quarter
the amount of data storage space occupied by the active page
in display 352.
While in the above discussion, the amount of data storage
space allocated to each page differs by a factor of two with
respect to its immediately neighboring page, it will be
appreciated by those of skill in the art that a value greater
than or less than two may be employed as the division factor.
Moreover, arithmetic formulae other than division of the data
storage space of the active page by a constant may be employed
to determine the allocation of data storage space to a
succession of pages stored in caches 354 and 356.
In one or more embodiments, a new active page may be
selected in place of page 10 which is shown displayed in FIG.
55. The new selected page may, but need not, be a page
immediately adjacent page 10 (either page 9 or page 11). That
is, any page from 1 to the last page in the pertinent book (or
any other type of publication with discrete pages) may be the
new active page.
129
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In one or more embodiments, upon selection of the new
active page, a transition between the currently active page
and the new active page is preferably conducted. This
transition to a new active page may include acquiring
additional image data for the new active page to enable the
new active page to be stored and/or displayed at full
resolution. If the new active page is "page 11", and the
"factor-of-two" embodiment, discussed above, is employed, the
amount of data storage space allocated to page 11 will
preferably double. Continuing with an application of the
"factor-of-two" embodiment, the data storage space allocated
to page 10 will preferably be halved as part of the transition
away from page 10 and toward page 11, as the active page. The
data for the active version of page 10 that is not included in
the post-transition page 10 may be discarded (which may
include overwriting thereof). Alternatively however, this
"surplus" data for page 10 may be stored in another cache.
Such caching of the page-10 surplus data may provide
efficiency if a transition to page 10 occurs soon after (i.e.
within a reasonable number of page transitions) the transition
away therefrom.
In one or more embodiments, the transition from page 10
to page 11 (or other new active page) may include a gradual
fade-out from page 10 and gradual fade-in of page 11, to
provide an experience that is visually pleasing and/or
reminiscent of a physical page transition to the user of
virtual book 350. Optionally, a sequence of images showing
the folding and turning of the old active page may be provided
to make the virtual page transition look still more
reminiscent of a physical turn of a page.
FIG. 56 is an illustration of a three-dimensional version
of the virtual book of FIG. 55 in accordance with one or more
embodiments of the present invention. The embodiment of FIG.
130
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
56 illustrates a method in which an alpha channel, for partial
transparency (the rough edges) may be stored as image
information in addition to the red, green and blue color
components. While color components are discussed above, for
the sake of convenience, only a black and white rendition of
the image of FIG. 56 is provided herein.
In one or more embodiments, hardware-accelerated texture
mapping may be employed to support an alpha channel. Another
feature that may be practiced in connection with either two-
dimensional or three-dimensional embodiments of the virtual
book is dynamic deformation of images, e.g. bending the pages
of this book as they turn, as illustrated in FIG. 56.
Managing Image Data in One or More Portable Devices
In this section, a number of mechanisms for storing and
interacting with the digital images, based on progressive and
interactive visual collection transmission are described. In
one or more embodiments of the present invention, variations
on the methods disclosed herein allow near-instantaneous
viewing, on a desktop computer, a mobile device, or other
devices, of a large collection of images stored on a second
mobile device; the use of remote storage to augment a mobile
device's local memory for the purpose of viewing images; and
browsing of large image collections from a mobile device. The
various permutations enabled by one or more embodiments of the
present invention may rely on a common client/server imaging
and collection representation architecture.
One or more embodiments of the present invention may
provide a method which may include providing a collection of
digital images or other visual objects on a server;
establishing communication between a client and said server;
and enabling efficient multi-scale navigation by the client of
collections of visual objects residing on the server.
131
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
In this disclosure, the term "digital image data" may
include digital photographs, digital images, visual documents
or other forms of visual content. Herein, the term "image"
generally corresponds to the term "digital image," and either
of these terms may correspond to a "digital photograph."
Herein, the term "client" generally corresponds to the term
"client side" and to the term "client device". Herein, the
terms "portable device", "portable camera device", and "camera
device" generally refer to a digital image capturing devices
and/or digital image storage devices. Herein, a "digital
image capturing device" may include but is not limited to a
digital camera, a camera-enabled mobile phone (which may be
referred to as a camera-enabled cell phone), a personal
digital assistant, and/or a digital video recorder able to
record digital still images. A"digital image capturing
device" may include devices that are capable of receiving
image data by directly optically receiving and recording such
data (such as with a standard digital camera) and may also
include devices that are able to receive image data via a
wired or wireless Internet or other network connection.
One or more embodiments of the methods described herein
may use a multi-resolution approach to address the problems of
storing, synchronizing, browsing, and organizing collections
of digital image data, which may be visual documents. Digital
photos, which may be represented as arrays of color pixels at
a certain resolution (e.g. 1024x768 pixels = 0.75 megapixels,
2592x1944 pixels = about 5 megapixels, etc.), are a common
visual document type that end-users may create in large
numbers using digital cameras, camera-enabled mobile phones,
and digital video recorders, among other devices.
One or more of the methods described herein may also
apply to visual data objects other than images, such as the
roadmap or other vector data of Applicant reference document
132
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
489/17NP (U.S. Patent Application Serial No. 11/082,556) or
the textual data of Applicant reference document 489/13 (U.S.
Provisional Patent Application Serial No. 60/617,485). . (Both
documents are identified in greater detail at the beginning of
this document, and both documents are incorporated by
reference herein).
A problem facing users of existing systems is that the
camera devices can quickly create large numbers of potentially
large visual documents. However, these devices typically
don't have sufficient memory or visual browsing facilities to
allow satisfactory archiving, viewing, or organization of
these documents.
Digital photographs or other digital image data stored in
a camera or other portable device are generally downloaded
periodically to a desktop or notebook computer, cleared from
the camera's memory to allow more pictures to be taken, and
organized and/or viewed on the desktop or notebook computer.
Thereafter, digital photographs may be shared with friends by
posting a selection of digital photographs to one or more
Internet sites.
Traditional Approach to Managing Image Data on Portable
Devices
The following steps may be followed when using a
conventional approach to managing image data on portable
devices. First, a mobile device, which may be a digital
camera or other digital image data capturing device, takes
pictures. Then, potentially after some culling of the
pictures, the pictures are downloaded to the camera user's PC
(personal computer) and deleted from the camera device. The
camera device's local storage may be limited and, in this
conventional approach, only holds images transiently, until
they are safely stored on the PC.
133
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
The PC may permanently retain in its memory (e.g. hard
disk drive or other non-volatile storage) any subset of the
digital photos. The user may in turn upload some further
culled subset of those images to a web server which may be
owned by a web photo publishing service, typically at reduced
resolution. The images uploaded may be made publicly viewable
by any third party using a web browser on a PC or other
device, or by some subset of those users with restricted
access.
Limitations of the existing approach may include lengthy
download times from the camera device to the PC. There is also
usually poor management of persistent storage on the camera
device. Camera devices typically have small color displays on
which viewers could theoretically view persistently stored
images of the same type people commonly carry in their wallets
(such as of family and pets) and photos associated with
callers or other contacts on a PDA (Personal Digital
Assistant). However, the limitations on persistent storage in
existing camera devices makes the above task difficult to
achieve.
Moreover, existing camera devices impose other
limitations. In existing camera devices, navigation of images
stored on the camera device is generally awkward and
difficult. In existing camera devices, there is a lack of a
unified visual interface to image collections which would give
users a consistent experience either on the camera device or
on a PC. Existing camera devices tend to impose very
restrictive limits on the number of photos that can be stored
thereon before downloading becomes necessary. Thus, when
employing existing approaches, a lengthy series of steps is
generally involved in making images available to a third
party.
134
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Managing Image Data According to One or More Embodiments of
the Invention
FIG. 57 is block diagram of a system 550 for managing
image data communication between one or more portable devices
562, 572 and one or more other computers in accordance with
one or more embodiments of the present invention. System 550
may include a client side 560 and a server side 570. However,
in alternative embodiments, the client and server statuses of
the groupings of devices shown in FIG. 57 may be reversed.
In one or more embodiments, system 550 may include
portable device 1 562, portable device 2 572, personal
computer 182 (which may be substantially the same as client
computer 182 of FIG. 53), server 188 (which may be
substantially the same as server computer 188 of FIG. 53)
and/or additional computers 574. Preferably, each of devices
562, 572 and computers 182, 188, and 574, have memory and one
or more displays included therewith. Alternatively or
additionally, the devices and computers of FIG. 57 could be in
communication with memories and/or displays.
FIG. 57 illustrates various possible data paths useable
in accordance with one or more embodiments of the present
invention. Qne or more embodiments may use less than all of
the data paths shown in FIG. 57. The available data paths
shown in FIG. 57 may have one or more of the following
features in common: 1) The data paths may involve server side
570 (the originator of the image data) and a client side 560
(the recipient of the image data); 2) bi-directional data
paths (which are illustrated with lines having arrows at both
ends) indicate that the devices pointed to by these arrows are
capable of serving in either a client or a server capacity;
3) the connections may employ a hard-wired network (e.g.
Universal Serial Bus (USB), Firewire or Ethernet) or a
135
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
wireless network (e.g., for nearby devices, Bluetooth, and for
more remote connections, WiFi or a wireless wide-area
networking protocol); and/or 4) the illustrated connections
may or may not be ad-hoc.
In one or more embodiments, both the client side 560 and
the server side 570 may include one or more digital computing
and/or storage devices including but not limited to: camera
devices, personal computers, and personal digital assistants.
In one or more embodiments, a client device (client) may
have one or more displays. The client may browse a collection
of documents residing on the server using one or more of the
efficient multi-resolution browsing methods described in
Applicant reference document 489/15P (U.S. Provisional
Application Serial No. 60/619,118, entitled "Method for
Efficiently Interacting with Dynamic, Remote Photo Albums with
Large Numbers of Potentially Large Images" which is
incorporated herein by reference). These methods allow large
collections of large images or other visual documents to be
navigated efficiently over low-bandwidth connections.
Zooming, panning, and dynamic rearrangement of such image
collections are described in the referenced document.
In one or more embodiments, one of the properties of this
navigation method is that the display contents may gradually
come into focus as information is sent from the server to the
client. The rate at which this information comes into focus
may be governed by the ratio of connection bandwidth to
display pixels. When the user zooms, pans, or rearranges the
documents on the client side 560 such that new content becomes
visible, this content again appears blurred, then comes into
focus.
136
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Virtual Display
In one or more embodiments, a client's "display" need not
necessarily be physical, or visible to an end-user. In one or
more embodiments, this display can be a "virtual display",
i.e. an abstract model of a display with a specified
resolution. Such a "virtual display" might be represented as
an array of pixel values in the client's memory, irrespective
of whether those pixel values are rendered to a screen. A
virtual display may include wavelet data that at least
partially describes one or more images. The wavelet data is
preferably able to represent an image at a range of possible
resolutions. In one or more embodiments, the wavelet data may
correspond to that employed using JPEG2000. In one or more
embodiments, a virtual display may include enough wavelet data
to completely describe one or more images.
For example, if it were desirable for a device to acquire
thumbnails of all of the images in a collection at a specified
resolution, then this device could create a "virtual display"
of the appropriate size, establish a connection with a server,
and request a view of the entire collection. The full set of
thumbnails could then be transmitted to and rendered on this
"virtual display". If transmission were interrupted before
all of the relevant data were sent from the server to the
client, then the client's virtual display would not yet have
all of the thumbnail images in perfectly focused condition.
However, all of the requested thumbnail images would
preferably be stored within the client's virtual display with
sufficient resolution to enable rendering visible versions of
these images on a screen. The images rendered in the manner
described would generally be of lower visual quality than if
the transmission of the image had concluded without
interruption. Thus, some image degradation may be present in
137
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
the images rendered using data from an incomplete, interrupted
transmission.
Nevertheless, the described degradation is preferable to
the prior art approach to sending a set of thumbnails across a
network, in which the complete image of each thumbnail is
transmitted in turn. Under this prior art approach, a
premature interruption of connectivity would result in some
thumbnails being available in their entirety (i.e. at full
resolution) and would result in other thumbnails being
completely unavailable. FIG. 58 illustrates this difference.
FIG. 58A illustrates the results of an incomplete image data
download employing an existing approach; and FIG. 58B
illustrates the results of an incomplete image data download
in accordance with one or more embodiments of the present
invention.
FIG. 55A shows a prior art scenario in which all of the
data for three thumbnails (shown with square shapes) have been
received, and in which the remaining nine thumbnails (shown
with X's) have not been received at all. FIG. 55B illustrates
a situation that may arise employing one or more embodiments
of the present invention, in which all twelve thumbnails
(shown as cross-hatched square shapes) have been received at
some level of resolution, which is preferably acceptable for
viewing, but which is likely below the resolution that would
be obtained after conclusion of a complete and uninterrupted
data transmission.
In one or more embodiments, a client may have a client-
side cache that caches recently viewed visual content. A
standard MRU (most-recently-used) cache may be employed for
the caching needs for one or more embodiments of the present
invention. However, a cache disclosed in U.S. Patent
Application Serial No. 11/141,958 (client reference document
138
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
489/10NP), entitled "Efficient Data Cache", which is
incorporated herein by reference, may be beneficially employed
to enable more sophisticated client-side caching. In either
case, a given amount of client-side memory may be devoted to
the cache. Thus, navigation back to a recently viewed image
may permit using image data stored in the cache, rather than
requiring that this image data be re-sent from the server.
A client may have multiple displays. A given display may
be physical or virtual. A given display may be driven
directly by user input, or it may be driven programmatically
by software within a client computer such as computer 182.
The total size in pixels of all of the displays may be fixed
or bounded by some limit, and this limit may define a minimum
amount of client-side memory needed for visual content. This
client-side memory is preferably separate from storage space
allocated to the cache memory.
An embodiment involving both a physical display and a
virtual display is described in the following. Preferably, a
physical display within a client device is visible to the user
and allows zooming and panning navigation through, and
rearrangement of, a collection of digitally stored images.
The user may also select one or more images from the
collection and send them to a "holding pen" which may serve as
a place for storing user-selected images. The holding pen may
be visualized in some way on the physical display. Adding an
image to the holding pen preferably causes the image to be
placed on the virtual display, which may be invisible to the
user. As images are added to the holding pen, the virtual
display representing the holding pen gradually fills up.
This virtual display may increase in size (as measured in
number of pixels) up to some limit, after which its size may
remain fixed at this limit. The virtual display may be too
139
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
small to display all of the images in the holding pen at full
resolution. In this case, the data storage space needed for
the images resident in the virtual display is preferably
reduced as needed to fit the images into the virtual display.
Hence, an off-screen view (the virtual display) preferably
gets supplemented with images as the user puts viewable images
into the holding pen. This supplementing of the off-screen
view may occur invisibly to the user.
A method for browsing is disclosed in U.S. Patent
Application Serial No. 10/790,253 (Applicant reference
document 489/2NP), entitled "System and Method for Exact
Rendering in a Zooming User Interface", which is incorporated
by reference herein. The method disclosed in that document
for determining the order in which,information is sent from
the server to the client based on the client's view may be
modified for a multiple display scenario. The 489/2NP
document discloses that visual information may be broken up
into tiles, with each tile covering a region in space at a
given resolution. Low-resolution tiles may then occupy large
physical areas, while high-resolution tiles may occupy smaller
physical areas, such that the amount of information in each
tile may be substantially the same.
The 489/2NP document discloses methods for ordering tiles
using criteria discussed in the following. One criterion may
be tile resolution and tile position on the display. Sorting
of tiles could be lexicographic, such that lower-resolution
tiles always precede higher-resolution tiles, with spatial
position only playing a role in resolving order within a
resolution. (Lexicographic sorting is referenced here in the
generalized tuple sense - for example, the lexicographic sort
of the set of triplets {(1, 2, 3) , (0, 3, 1) , (4, 0, 0) , (0, 0, 1) ,
(0,3,2) } would be (0,0,1), (0,3,1), (0,3,2), (1,2,3),
(4,0,0).)
140
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Alternatively, non-lexicographic sorting criteria may be
employed. For instance, a linear combination of a plurality
of properties could be used to sort tiles. Such properties
may include but are not limited to: resolution (which could be
expressed in logarithmic units) and distance of the tile from
the center of the display. Herein, the term "sort key"
corresponds to the term "sorting criterion."
In this embodiment, lower-resolution tiles may be sent in
preference to higher-resolution tiles, and tiles near the
center of the display may be sent in preference to tiles near
the periphery, but these properties can trade off against each
other.
Preferably, minor changes may be implemented to adapt the
above scheme for a multiple display scenario. In one
embodiment, display number can be added as an extra
lexicographic sort key. Thus, a first display might refine
completely (in accordance with the other sort keys) before any
tiles are sent relevant to a second display.
In another embodiment, display number can be an
additional variable for inclusion in a linear combination,
allowing display number to trade off in some fashion against
resolution and proximity to the center of the display. In yet
another embodiment, the displays can coexist in an imaginary
"common space", and the resolution and proximity-to-center
sort keys can be used as before. The "common space" is a
notional space establishing an imaginary spatial relationship
between multiple displays, as if they were regions of a
single, larger display. Defining this imaginary spatial
relationship determines all parameters needed for prioritizing
tiles in the multiple displays.
FIG. 59 is a block diagram of a "common space" 740 that
may include a physical display (screen) 742 and two virtual
141
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
displays 744, 746 in accordance with one or more embodiments
of the present invention. The physical display 742 is
preferably in the center of "common space" 740 at normal size.
Virtual displays V1 744 and V2 746 are preferably off to the
side, and V2 is preferably scaled down, so that its pixels are
preferably half the linear size of the physical display's
pixels. This means that, assuming purely lexicographic tile
sort order, the contents of each resolution level in V1 746
will preferably be sent from the server to the client after
the corresponding resolution for the physical display (since
V1 is farther from the center of the space than any point on
the physical display) . Resolutions in V2 746 may be sent
after all the tiles at a resolution twice as fine have been
sent for both the physical display 742 and V1 744. It is
noted that it isn't necessary for the "common space" 740 to
correspond to any real larger display or memory address space.
The "common space" 740 is merely a conceptual convenience for
establishing the relationships among tile priorities across
different displays.
Clearly many tradeoffs are possible. These tradeoffs can
have the consequence, as in the lexicographic example above,
of giving refinement of the physical display 742 the highest
priority, while using any excess time and bandwidth not
required for bringing the physical display into focus to
continue refining the virtual display(s) 744, 746. The
tradeoffs may alternatively begin refining the virtual
display(s) after the physical display has largely, but not
completely, come into focus. After the physical display 742
has largely come into focus, the physical and virtual displays
744, 746 can share bandwidth resources to refine in concert.
If the images in a collection are JPEG2000 images, then
any subset of the data for a given image can itself comprise a
JPEG2000 image file. During navigation of an image, the
142
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
client may progressively download image data from the server,
thereby supplementing the quality of the client's subset of
the image and giving the client the ability to create a
JPEG2000 file that is an increasingly accurate approximation
of the full image.
If a client has navigated everywhere in an image, or has
viewed the entire image at full resolution for long enough
that all of the image data has been transmitted, then the
client can recreate the entire original JPEG2000 file for that
image. If a client has zoomed in closely on only a part of a
large image, then the client could still create a JPEG2000
file, but it would lack detail everywhere except where the
client zoomed in. This property of JPEG2000 can be extended
to other multi-resolution document types as well. If the
client never zoomed in beyond a given resolution, then no
information would be available regarding the image content
beyond that given resolution. In this case, the version of the
JPEG2000 image which may be created and/or stored by the
client may have a lower overall resolution than the original
version of that image.
One application of the virtual display scenario described
above is to ameliorate the problem of long download times for
images from a camera. In one or more embodiments, the camera
or camera-enabled mobile device may operate as the server, and
a PC may operate as the client.
In one or more embodiments, rather than initiating a
time-consuming batch download of all images to the PC, when
the camera and PC are connected, the PC can rapidly browse
through the complete set of images available on the camera.
During navigation, a group of images can be selected and put
in the holding pen. Note that if all images on the camera are
to be downloaded to the PC in their entirety, then the total
143
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
time needed to accomplish the transfer remains the same as in
the prior art. However, as with the closely related problem
of thumbnail transmission, this method can provide a number of
advantages over the conventional serial download of images
which are listed and discussed below. The present invention
is not limited to the features listed below.
Image download and user navigation of the full image set
on the camera or other mobile device may be concurrent and
cooperative in their use of bandwidth (in effect, navigation
merely influences the order in which tiles are sent from
server to client).
If the PC's display is larger than the mobile device's
display, then better choices can be made about which images to
download, which to leave on the mobile device, and which to
discard, without incurring the delay of downloading the entire
set before deciding.
The experiences of browsing on the PC and on the mobile
device (assuming that it also has a display), respectively,
are preferably simple and experientially similar, thereby
increasing usability.
If lower-resolution versions of the images in the holding
pen are desired, it's preferably straightforward to suitably
limit the detail of downloaded data by reducing the size of
the item on the virtual display. It is noted that reducing
the image size in this manner may both speed up downloading by
a large factor - i.e. by a factor of 4 per resolution level
discarded - and requires less space on the PC).
By limiting the size of the virtual display and reducing
the number of images therein as desired, the amount of memory
allocated to photos on the PC can be bounded. Also, different
constraints can be placed on different photos, and hence space
144
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
can be allocated based on recency or one or more other
criteria.
In one or more embodiments, premature loss of
connectivity results in a degradation in the quality of some
or all of the images to be downloaded, instead of completely
removing some images from the download operation. (Note that
the bulk of the data volume for an image is very high-
resolution detail, some of which is camera noise, and all of
which is less critical for ordinary viewing than the coarser
image structure. Hence, it is preferable to conduct the
transmission of the high-resolution image data for all images
after the lower-resolution image data for all of the images
has been fully transmitted.) Hybrid prioritization of the
image data is also possible, for example, favoring the
complete download of a subset of the photos before proceeding
to refine a second set beyond thumbnail detail.
In one or more embodiments, one or more methods disclosed
herein are resilient to intermittent connectivity, since any
JPEG2000 object can continue to be augmented at any time with
additional information while still allowing browsing and
interaction with whatever visual data has already been
received.
With regard to the above references to a) reducing the
size of the item on the physical display and b) bounding the
amount of memory allocated to photos on the PC, it is noted
that typical home users may not want to discard any of their
images (after an initial culling of such images) . If such
users continue to add sufficient storage to their PC, then of
course it should not be necessary to discard any content. The
addition of storage can in itself increase the virtual display
maximum size. Features of (a) and (b) above can therefore be
omitted if a sufficiently large virtual display size can be
145
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
created (i.e., if there is enough available client-side
storage).
Because it may be unclear to the user on the client-side
when the "holding pen" images are finished downloading, some
form of visual indication for completion is desirable. As an
example, checkmarks or green dots can appear next to images as
they finish downloading. When all images in the "holding pen"
include green dots, the connection can be broken without loss.
Operations such as requesting that the camera discard
some of its images using the client computer (which may be a
PC) may benefit from some additional communication from the
client to the server beyond that contemplated in Applicant
reference document 489/15P. In one or more other embodiments,
the client side could also instruct the server side (which may
be a mobile device such as a digital camera or mobile phone)
to launch its own client side, and create its own view to
receive content from the PC.
This is similar to "push" methods developed in the
context of the World Wide Web. The PC can render the
camera/mobile phone's "view" of content on the PC, thus (for
example) displaying the green completion dots described above
for images uploaded from the PC to the camera. Each of the
reciprocal arrows of FIG. 57 can be implemented using either a
"push" or "pull" arrangement. Specifically, the viewport
setting, the arrangement, and other navigation settings may be
controlled from either the client side 560 ("pull") or from
the server side 570 ("push"). A user interacting with one
device can be connected reciprocally to another device,
thereby enabling both "pulling" and "pushing" to occur
simultaneously.
146
CA 02599357 2007-08-27
WO 2006/105158 PCTIUS2006/011405
We will now enumerate the potential client-server
connections shown in FIG. 57, and describe briefly how they
can be used and why they are useful.
A mobile device 562 which may be a camera or camera-
enabled mobile phone may serve content to a user's PC
(personal computer) 182. This connection might typically take
place over a USB cable or a Bluetooth ad-hoc wireless network.
The benefits are described above.
The PC 182 may serve content back to the mobile device
562. This can be useful for the following applications, among
others.
"Wallet photos" can be sent from the PC to the camera or
mobile phone, even if those photos weren't taken by the mobile
device.
The PC may be a home appliance without a display, and the
mobile device may then be used as a primary visual interface
to the archived visual material. The mobile device in this
context may be a digital camera, a camera-enabled cell phone,
a PDA, or a mobile tablet PC with a display.
A first mobile device can be connected directly to, or
form an ad-hoc network with, another mobile device (the
"guest"). The two mobile devices can then view and share each
others' photos.
The PC could upload images (via push) to a remote server.
The server may be a photo sharing service, and may therefore
implement the kind of space constraints envisioned in the
above processes of reducing the size of the item on the
physical display and bounding the amount of memory allocated
to photos on the PC. The remote server could then serve its
collection to one or more additional PCs. Typically this
would be a broadband connection. However, other connection
types could be employed.
147
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
The remote server could also serve collections to mobile
device users. Typically this would be a mobile wireless wide-
area network.
Mobile devices could upload their images via "push" (that
is, under control of the mobile devices) to a remote server.
In one or more embodiments, the upload may be automatic,
allowing the mobile device to transparently extend its
apparent storage space by transferring content freely to a
server and deleting it locally when transfers are complete.
In connection with the last two items above, it is noted
that local caching on the mobile device 562 may allow the
mobile device 562 to support browsing through very large
thumbnail collections using only local storage, even if the
local storage is limited. Zooming in on details of recently
viewed images may also be possible, if the relevant
information is still in the mobile device's local cache.
Zooming in on images whose details are only available on
a remote server could result in a blurry and un-detailed
image. If the mobile device is on a network that includes the
remove server 188 however, the blurry image can become
progressively more refined as more and more detailed image
data is downloaded to the mobile device 562. If the mobile
device is not connected to a network that can supply
additional image data, the image may not be presented with any
greater detail than is available in the initial thumbnail
image.
Montage of Low-Resolution Images
One or more embodiments of the present invention may
define precomputed steps and interactive rendering algorithms
which can be used in a variety of configurations to implement
downloading of selected images and or image regions at
148
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
controllable levels of resolution for various applications.
Many of these applications (such as focusing on regions of
interest, virtual book etc..) may involve user interaction
with a "universe" of images.
In one or more embodiments, the starting point for
precomputation may therefore be a list of the filenames, URLs,
or other strings referencing the individual images. When a
user is zoomed out far enough to view all of these images at
once, it is impractical for either the client or the server to
traverse all of the image files, as there may be a very large
number of them. For example, in the regime where individual
images occupy 2x2=4 pixels onscreen, tens of thousands or
hundreds of thousands of images may be in view. Even if these
images support efficient low-resolution access, merely opening
and closing 100,000 files involves a large overhead and could
be impractical to accomplish on an interactive timescale. It
may therefore be desirable to use a cached representation of
low-resolution versions of these images, referred to herein as'
a "montage".
In one or more embodiments, a montage may be a mosaic or
collage of all of the images, rendered at low resolution and
packed efficiently into a rectangular area, as shown in FIG.
60. Auxiliary metadata, which can be embedded in the montage
image file or stored separately, may identify rectangular
regions on the montage image with a particular image file.
In one embodiment, the montage image itself can be
navigated using a zooming and panning interface. When the
user zooms in far enough to exhaust the resolution available
in the montage version of one or more images within the
montage, the metadata for that image may refer the client to
one or more individual image files, and the client may use
149
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
imagery from these image files to render the images at higher
resolution.
In one or more embodiments, the overall size of the
montage in pixels may be chosen such that its resolution is
only exhausted when zooming in to a stage where only a small
number of images, which may be referred to herein as a"set"
of images, are visible simultaneously. Therefore, access to
more than this small number of images at high resolution is
preferably not needed at any given time. During subsequent
zooming and panning, image streams may be opened and closed as
needed to limit the number of high resolution images that are
open at any given time.
The above approach to navigating many images of high
resolution incurs a limitation: the montage layout is
preferably designed for packing efficiency, but the user may
want a different arrangement of the images onscreen.
Moreover, the user may want to be able to dynamically
rearrange the layout of images on the screen.
In one or more embodiments, to enable such rearrangement,
we can make use of a graphics rendering technique known as
"texture mapping", which may be implemented in software but
is, in general, hardware-accelerated on modern personal
computers. Texture mapping allows a portion of a "texture",
or source image, to be drawn on the display, optionally
rescaling the image, rotating it, and/or performing a three-
dimensional perspective transform. Other hardware-accelerated
transformations are often supported, including color
correction or alteration, full or partial transparency,
lighting, occlusion, and coordinate remapping. A low-
resolution version of the montage can be used as a "texture",
so that when the user is zoomed out, the individual images
within the montage can be dynamically remapped in any way, as
150
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
in FIG. 61. More than one texture map may be used, in which
case each texture map may be a montage containing a subset of
the images. Transitions between arrangements may or may not
be animated. It is noted that rearrangement can take place
while the user is zoomed in, but because the rearrangement
might result in a new zoomed-in view of an image which was
previously off-screen, the new image may initially be very
blurry.
In another embodiment, the texture mapping technique may
be used only during dynamic rearrangement of images. When the
image arrangement is static, software compositing can be used
to assemble all or part of a higher-definition rearranged
montage on-screen. This software compositing method is
especially valuable in combination with the multiresolution
rendering techniques described in US Patent application Serial
No. 10/790,253, (Applicant reference document 489/2NP),
identified in detail earlier in this disclosure. This method
may in effect creates a new "display montage" by rearranging
the imagery of the original montage.
Texture mapping may also be used to display high
resolution images, but in this case, rather than using
textures containing montages of multiple images, textures are
used that contain tiles of individual images. This technique
is also described in US Patent application Serial No.
10/790,253 (Applicant reference document 489/2NP).
In one or more embodiments, montage rearrangement may be
used to support reorganization of the images without recourse
to texture mapping.
In one or more other embodiments, texture mapping,
software rendering, or any combination of the two may be used
to render imagery in three dimensions instead of on a one-
dimensional plane. Dynamic rearrangement in three dimensions
151
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
is also possible. Three-dimensional applications may include
virtual galleries or other walk-through environments as well
as virtual books. Virtual books are described herein and
still further in Provisional Patent Application Serial No.
60/619,053 which is incorporated herein by reference.
FIG. 62 is a block diagram of a computing system 1000
adaptable for use with one or more embodiments of the present
invention. In one or more embodiments, central processing
unit (CPU) 1002 may be coupled to bus 1004. In addition, bus
1004 may be coupled to random access memory (RAM) 1006, read
only memory (ROM) 1008, input/output (I/0) adapter 1010,
communications adapter 1022, user interface adapter 1006, and
display adapter 1018.
In one or more embodiments, RAM 1006 and/or ROM 1008 may
hold user data, system data, and/or programs. I/0 adapter
1010 may connect storage devices, such as hard drive 1012, a
CD-ROM (not shown), or other mass storage device to computing
system 1000. Communications adapter 1022 may couple computing
system 1000 to a local, wide-area, or Internet network 1024.
User interface adapter 1016 may couple user input devices,
such as keyboard 1026 and/or pointing device 1014, to
computing system 1000. Moreover, display adapter 1018 may be
driven by CPU 1002 to control the display on display device
1020. CPU 1002 may be any general purpose CPU. It is noted
that the methods and apparatus described thus far and/or
described later in this document may be achieved utilizing any
of the known technologies, such as standard digital circuitry,
analog circuitry, any of the known processors that are
operable to execute software and/or firmware programs,
programmable digital devices or systems, programmable array
logic devices, or any combination of the above. One or more
embodiments of the invention may also be embodied in a
152
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
software program for storage in a suitable storage medium and
execution by a processing unit.
Although the invention herein has been described with
reference to particular embodiments, it is to be understood
that these embodiments are merely illustrative of the
principles and applications of the present invention. It is
therefore to be understood that numerous modifications may be
made to the illustrative embodiments and that other
arrangements may be devised without departing from the spirit
and scope of the present invention as defined by the appended
claims.
153
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
METHODS AND APPARATUS FOR EMPLOYING IMAGE NAVIGATION
TECHNIQUES TO ADVANCE COMMERCE
BACKGROUND OF THE INVENTION
The present invention is directed to methods and
apparatus for the application of image navigation techniques
in advancing commerce, for example, by way of providing new
environments for advertising and purchasing products and/or
services.
Digital mapping and geospatial applications are a booming
industry. They have been attracting rapidly increasing
investment from businesses in many different markets-from
candidates like Federal Express, clothing stores and fast food
chains. In the past several years, mapping has also become
one of the very few software applications on the web that
generate significant interest (so-called "killer apps"),
alongside search engines, web-based email, and matchmaking.
Although mapping should in principle be highly visual, at
the moment its utility for end users lies almost entirely in
generating driving directions. The map images which
invariably accompany the driving directions are usually poorly
rendered, convey little information, and cannot be navigated
conveniently, making them little more than window dressing.
Clicking on a pan or zoom control causes a long delay, during
which the web browser becomes unresponsive, followed by the
appearance of a new map image bearing little visual
relationship to the previous image. Although in principle
computers should be able to navigate digital maps more
effectively than we navigate paper atlases, in practice visual
navigation of maps by computer is still inferior.
154
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
DESCRIPTION OF THE INVENTION
The present invention is intended to be employed in
combination with a novel technology permitting continuous and
rapid visual navigation of a map (or any other image), even
over a low bandwidth connection. This technology relates to
new techniques for rendering maps continuously in a panning
and zooming environment. It is an application of fractal
geometry to line and point rendering, allowing networks of
roads (1D curves) and dots marking locations (OD points) to be
drawn at all scales, producing the illusion of continuous
physical zooming, while still keeping the "visual density" of
the map bounded. Related techniques apply to text labels and
iconic content. This new approach to rendition avoids such
effects as the sudden appearance or disappearance of small
roads during a zoom, an adverse effect typical of digital map
drawing. The details of this navigation technology may be
found in U.S. Patent Application No.: 10/803,010, filed on
even date herewith, entitled METHODS AND APPARATUS FOR
NAVIGATING AN IMAGE, Attorney Docket No.: 489/9, the entire
disclosure of which is hereby incorporated by reference. This
navigation technology may be referred to herein as "Voss."
The use of the Voss technology enables a number of novel
and commercially valuable business models for Internet
mapping. These models take as their point of departure the
proven success of businesses like Yahoo! Maps and MapQuest,
both of which generate revenue from geographical advertising.
Our approach, however, goes well beyond advertising,
capitalizing on the ability of new technology to add
substantial value for both businesses and end users. The
essential idea is to allow businesses and people to rent "real
estate" on the map, normally at their physical address, in
155
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
which they can embed zoomable content. This content can
appear in an iconic form-i.e. golden arches for McDonalds-when
viewed on a large-scale map, but then smoothly and
continuously resolve into any kind of web-like content when
viewed closely. By integrating our mapping application with
dynamic user content and business/residential address data, we
can thus enable a kind of "geographical world wide web".
In addition to large horizontal markets for both
consumers and retailers, synergies with existing geographical
information service (GIS) providers and related industries
would obtain, such as car navigation systems, cell phones and
PDAs, real estate rentals or sales, classified advertising,
and many more. Possible business relationships in these areas
include technology licensing, strategic partnership, and
direct vertical sales.
The capabilities of the new navigation techniques of the
present invention are described in detail in the
aforementioned U.S. patent application. For this application,
the most relevant aspects of the base technology are:
- smooth zooming and panning through a 2D world with
perceptual continuity and advanced bandwidth management;
- an infinite-precision coordinate system, allowing
visual content to be nested without limit;
- the ability to nest content stored on many different
servers, so that spatial containment is equivalent to a
hyperlink.
With respect to the latter two elements, additional
details may be found in U.S. Provisional patent application
No.: 60/474,313, filed, May 30, 2003, entitled SYSTEM AND
METHOD FOR INFINITE PRECISION COORDINATES IN A ZOOMING USER
156
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
INTERFACE, the entire disclosure of which is hereby
incorporated by reference.
A map consists of many layers of information; ultimately,
the Voss map application will allow the user to turn most of
these layers on and off, making the map highly customizable.
Layers include:
1. roads;
2. waterways;
3. administrative boundaries;
4. aerial photography-based orthoimagery (aerial
photography which has been digitally "unwarped" such that
it tiles a map perfectly);
5. topography;
6. public infrastructure locations, e.g. schools,
churches, public telephones, restrooms;
7. labels for each of the above;
8. cloud cover, precipitation and other weather
conditions;
9. traffic conditions;
10. advertising; and
11. personal and commercial user content; etc.
The most salient layers from the typical user's point of
view are 1-4 and 7. The advertising/user content layers 10-
11, which are of particular interest in this patent
application are also of significant interest. Many of the map
layers-including 1-7-are already available, at high quality
and negligible cost, from the U.S. Federal Government. Value-
added layers like 8-9 (and others) can be made available at
any time during development or even after deployment.
157
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
Several companies offer enhanced road data with
annotations indicating one-way streets, entrance and exit
ramps on highways, and other features important for generating
driving directions but not for visual geography. The most
relevant commercial geographic information service (GIS)
offering for our application is geocoding, which enables the
conversion of a street address into precise latitude/longitude
coordinates. It has been determined that obtaining geocoding
services will not be prohibitive.
In addition to map data, national Yellow Pages/White
Pages data may also be valuable in implementing the present
invention. This information may also be licensed. National
Yellow Pages/White Pages data may be used in combination with
geocoding to allow geographical user searches for businesses,
or filtering (e.g. "highlight all restaurants in Manhattan").
Perhaps most importantly, directory listings combined with
geocoding will greatly simplify associating business and
personal users with geographic locations, allowing "real
estate" to be rented or assigned via an online transaction and
avoiding the need for a large sales force.
National telephone and address databases designed for
telemarketing can be obtained inexpensively on CDs, but these
are not necessarily of high quality-their coverage is usually
only partial, and they are often out of date. A number of
companies offer robust directory servers with APIs designed
for software-oriented businesses like ours. Among the best is
W3Data (www.w3data.com), which offers what it calls "neartime"
national telephone listings using an XML-based API for
$500/month minimum, beginning at $0.10/hit and going down to
$0.05/hit for volumes of 250,000/month, or $0.03/hit for
volumes over 1,000,000/month. The entire U.S. and Canada are
covered. Reverse queries are also possible, i.e. looking up a
name given a telephone number. The "neartime" data are
158
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
updated at least every 90 days. Combined with 90-day caching
of entries already obtained on our end, this is a very
economical way to obtain high-quality national listings.
"Realtime" data, updated nightly, are also available, but are
more expensive ($0.20/hit) The realtime data are identical
to those used by 411 operators.
Service providers similar to W3Data who can produce
national business listings by category also exist, with
comparable business and pricing models.
Classic advertising-based business models, as well as
"media player" models involving a proprietary data format and
a downloadable plug-in (such as Flash, Adobe Acrobat, and Real
Player) normally face a chicken-and-egg problem. An
advertising venue only becomes worth advertising in when
people are already looking; a plug-in (even if free) only
becomes worth downloading when there is already useful content
to view; and content only becomes attractive to invest in
making if there is already an installed user base ready to
view it.
Although the Voss mapping application requires both
downloadable client software and generates revenue through
advertising, it will not suffer the disadvantages of classic
advertising-based business models. Even before any
substantial commercial space has been "rented", the present
invention will provide a useful and visually compelling way of
viewing maps and searching for addresses-that is, similar
functionality to that of existing mapping applications, but
with a greatly improved visual interface. Furthermore, the
approach of the present invention provides limited but
valuable service to non-commercial users free of charge to
attract a user base. The limited service consists of hosting
159
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
a small amount (5-15 MB) of server space per user, at the
user's geographical location-typically a house. The client
software may include simple authoring capabilities, allowing
users to drag and drop images and text into their "physical
address", which can then be viewed by any other authorized
user with the client software. (Password protection may be
available.) Because the zooming user interface approach is of
obvious benefit for navigating digital photo collections-
especially over limited bandwidth-the photo album sharing
potential alone may attract substantial numbers of users.
Additional server space may be available for a modest yearly
fee. This very horizontal market is likely to be a major
source of revenue.
The usual chicken-and-egg problem is thus avoided by
providing valuable service from, the outset, an approach that
has worked well for search engines and other useful (and now
profitable) web services. This contrasts sharply with online
dating services, for example, which are not useful until the
user base is built up.
In accordance with various aspects of the invention, the
sources of revenue may include:
1. Commercial "rental" of space on the map
corresponding to a physical address;
2. Fees for "plus services" (defined below) geared
toward commercial users;
3. Fees for "plus services" geared toward non-
commercial users;
4. Professional zoomable content authoring
software;
160
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
5. Licensing or partnerships with PDA, cell phone,
car navigation system, etc. vendors and service
providers;
6. Information.
Basic commercial rental of space on a map can be priced
using a combination of the following variables:
1. Number of sites on the map;
2. Map area ("footprint") per site, in square
meters;
3. Desirability of the real estate, based on
aggregate viewing statistics;
4. Server space needed to host content, in MB.
Plus services for commercial users are geared toward
franchises, businesses wishing to conduct e-commerce or make
other more sophisticated use of the web, and businesses
wishing to increase their advertising visibility:
1. Greater visible height-several levels may be
offered, allowing visible but unobtrusive icons or
"flags" to indicate the locations of business sites from
farther away (more zoomed out) than they would otherwise
be visible.
2. Focusing priority-Voss brings areas of the
image into focus during navigation as data become
available. By default, all visual content is treated
equally, and focusing goes from the center of the screen
outward. Focusing priority allows commercial content to
come into focus faster than it would otherwise,
increasing its prominence in the user's "peripheral
vision". This feature will be tuned to deliver
161
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
commercial value without compromising the user's
navigation experience.
3. Including a conventional web hyperlink in the
zoomable content-these may be clearly marked (e.g., with
the conventional underlined blue text) and, on the user's
click, open a web browser. We can either charge for
including such a hyperlink, or, like Google, charge per
click.
4. Making the geographic area rented refer to an
outside commercial server, which will itself host
zoomable content of any type and size-this is a fancier
version of #3, and allows any kind of e-business to be
conducted via the map. We can once again either charge a
flat fee or charge per user connecting to the outside
server (though this no longer requires a click, just a
zoom-in).
5. Billboards-as in real life, many high-
visibility areas of the map will have substantial empty
space. Companies can buy this space and insert content,
including hyperlinks and "hyperjumps", which if clicked
will make the user jump through space to a commercial
site elsewhere on the map. In contrast to ordinary
commercial space, billboard space need not be rented at a
fixed location; its location can be generated on the fly
during user navigation.
This last service raises issues of the "ecology" or
visual aesthetics and usability of the map. It is desirable
that the map is attractive and remains a genuine service for
users, which implies limits on advertising and tasteful
"zoning regulations". If the map becomes too cluttered with
billboards or other content which does not mirror real world
162
CA 02599357 2007-08-27
WO 2006/105158 PCT/US2006/011405
geography, then users will be turned off and the value of the
map as an advertising and e-commerce venue will go down.
As we usage statistics are gathered, the value of many of
these commercial "plus services" may be quantitatively
demonstrable. Quantitative evidence of a competitive
advantage should increase sales of these extras.
Plus services geared toward non-commercial users will
consist of some of the same products, but scaled, priced and
marketed differently.
Limited authoring of zoomable Voss content will be
possible from within the free client. This will include
inserting text, dragging and dropping digital photos, and
setting passwords. Professional authoring software may be a
modified version of the client designed to allow more flexible
zoomable content creation, as well as facilities for making
hyperlinks and hyperjumps, and inserting custom applets.
Use of the present invention may generate a great deal of
aggregate and individual information on spatial attention
density, navigation routes and other patterns. These data are
of commercial value.
163