Patent 2486448 Summary


(12) Patent: (11) CA 2486448
(54) English Title: ENCRYPTED AND WATERMARKED TEMPORAL AND RESOLUTION LAYERING IN ADVANCED TELEVISION
(54) French Title: STRUCTURATION EN COUCHES TEMPORELLE, PAR RESOLUTION CODEE ET EN FILIGRANE NUMERIQUE DES TELEVISIONS DE POINTE
Status: Deemed expired
Bibliographic Data
(51) International Patent Classification (IPC):
  • H04N 21/2347 (2011.01)
(72) Inventors :
  • DEMOS, GARY A. (United States of America)
(73) Owners :
  • DOLBY LABORATORIES LICENSING CORPORATION (United States of America)
(71) Applicants :
  • DOLBY LABORATORIES LICENSING CORPORATION (United States of America)
(74) Agent: SMART & BIGGAR LLP
(74) Associate agent:
(45) Issued: 2012-01-24
(86) PCT Filing Date: 2002-06-13
(87) Open to Public Inspection: 2004-02-05
Examination requested: 2004-11-17
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US2002/018884
(87) International Publication Number: WO2004/012455
(85) National Entry: 2004-11-17

(30) Application Priority Data: None

Abstracts

English Abstract




A method and apparatus for image compression using temporal and resolution
layering of compressed image frames, and which provides encryption and
watermarking capabilities. In particular, layered compression allows a form of
modularized decomposition of an image that supports flexible encryption and
watermarking (1404) techniques. Using layered compression, the base layer and
various internal components of the base layer can be used to encrypt a
compressed layered movie data stream. By using such a layered subset of the
bits, the entire picture stream can be made unrecognizable by encrypting only
a small fraction of the bits of the entire stream. A variety of encryption
algorithms and strengths can be applied to various portions of the layered
stream, including enhancement layers. Encryption algorithms or keys can be
changed at each slice boundary as well, to provide greater intertwining of the
encryption and the picture stream. Watermarking (1404) tracks lost or stolen
copies back to the source, so that the nature of theft can be determined and
so that those involved in a theft can be identified. Watermarking (1404)
preferably uses low-order bits in certain coefficients in certain frames of a
layered compression movie stream to provide reliable identification while
being invisible or nearly invisible to the eye. An enhancement layer can also
have its own unique identifying watermark (1404) structure.
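For illustration, the selective encryption described in the abstract — scrambling only base-layer units so that the whole layered stream becomes unrecognizable — can be sketched as follows. This is a minimal sketch, not the patented method: the `(layer, payload)` unit structure, the function names, and the SHA-256 counter-mode keystream are all assumptions for demonstration.

```python
# Illustrative sketch of selective encryption of a layered stream.
# The unit structure and the SHA-256 counter-mode keystream are
# assumptions for demonstration, not a construction from the patent.
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_unit(key: bytes, unit_id: int, payload: bytes) -> bytes:
    """XOR a unit's payload with a per-unit keystream (self-inverse)."""
    ks = keystream(key, unit_id.to_bytes(8, "big"), len(payload))
    return bytes(a ^ b for a, b in zip(payload, ks))

def encrypt_stream(key: bytes, units):
    """Encrypt only base-layer units; pass enhancement units through.

    `units` is a list of (layer, payload) tuples, where layer is
    "base" or "enhancement". Because enhancement layers are coded as
    differences from the base layer, scrambling the small base-layer
    fraction of the bits renders the whole picture unrecognizable.
    """
    return [
        (layer, encrypt_unit(key, i, data) if layer == "base" else data)
        for i, (layer, data) in enumerate(units)
    ]
```

Since the XOR keystream cipher is its own inverse, running `encrypt_stream` a second time with the same key decrypts the base-layer units.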


French Abstract

L'invention concerne un procédé et un appareil de compression d'image au moyen d'une structuration en couches temporelle et par résolution de trames d'images comprimées, offrant des capacités de codage et de filigrane numérique (1404). Plus particulièrement, la compression en couches permet une forme de décomposition modularisée d'une image qui supporte les techniques de codage et de filigrane numérique (1404) flexibles. La compression en couches permet l'utilisation de la couche de base et de différents composants internes de la couche de base afin de coder un train de données de film en couches comprimées. L'utilisation d'un tel sous-ensemble en couches de bits permet de rendre tout le train d'images méconnaissable grâce au seul codage d'une fraction des bits du train entier. Plusieurs algorithmes et forces de codage peuvent être appliqués à différentes parties du train en couches, y compris des couches de renforcement. Des algorithmes ou des clés de codage peuvent être changés à chaque contour de tranche afin de fournir un meilleur entrelacement du codage et du train d'images. Le filigrane numérique (1404) repère les copies perdues ou volées pour les ramener vers la source de manière que la nature du vol puisse être déterminée et que les personnes impliquées dans le vol soient identifiées. Le filigrane numérique (1404) utilise de préférence des bits de poids faible dans certains coefficients et certains cadres d'un train de films comprimés en couches afin de fournir une identification fiable tout en étant invisible ou presque à l'oeil. Une couche de renforcement peut également avoir sa propre structure de filigrane numérique (1404) d'identification unique.

Claims

Note: Claims are shown in the official language in which they were submitted.




CLAIMS:

1. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt;

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit; and

(d) applying a first unique selected encryption
algorithm to selected units from the base layer and a second
unique selected encryption algorithm to selected units from
the at least one enhancement layer.


2. The method of claim 1, further comprising
selecting units to encrypt to confound decompression or
decoding of subsequent units that depend on information in
the units to encrypt.


3. The method of claim 1, wherein at least one
selected unit is a multi-frame unit.


4. The method of claim 1, wherein at least one
selected unit is a frame unit.


5. The method of claim 1, wherein at least one
selected unit is a sub-frame unit.


6. The method of claim 1, wherein at least one
selected unit is a distributed unit.





7. The method of claim 1, further comprising
decrypting each encrypted unit.


8. The method of claim 1, further comprising
decrypting a plurality of encrypted units in parallel.

9. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt; and

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit, wherein the at least one selected encryption algorithm
differs between at least two selected units.


10. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt;

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit, wherein each selected encryption algorithm has a key
having a key length; and

(d) varying at least one of a value for the key
and the key length for at least some of the selected units.




11. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt;

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit;

(d) encrypting a subset of selected units from the
data stream of video information to create an encrypted
custom distribution copy;

(e) grouping remaining units from the data stream
of video information to create a bulk distribution copy,
wherein the remaining units comprise units from the data
stream of video information that are different from the
subset of selected units; and

(f) distributing the bulk distribution copy
separately from the encrypted custom distribution copy.

12. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt; and

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit, wherein each selected encryption algorithm has a key




generated from at least one or more of the following
factors: a previous key; a serial number of a destination
decoding device; a date or time range determined by a secure
clock; a location identifier; a number of previous uses of a
work; a personal identification number of a specific
authorizing person; or portions of a previously encrypted
data stream of video information.


13. A method for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including the steps of:

(a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt; and

(c) applying at least one selected encryption
algorithm to encrypt each selected unit into an encrypted
unit, wherein each selected encryption algorithm has a key;

(d) storing a plurality of encrypted units on an
erasable media; and

(e) erasing the encrypted units upon an expiration
of the keys.


14. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;

means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;



means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit; and

means for applying a first unique selected
encryption algorithm to selected units from the base layer
and a second unique selected encryption algorithm to
selected units from the at least one enhancement layer.
15. The system of claim 14, further comprising means
for selecting units to encrypt to confound decompression or
decoding of subsequent units that depend on information in
the units to encrypt.

16. The system of claim 14, wherein at least one
selected unit is a multi-frame unit.

17. The system of claim 14, wherein at least one
selected unit is a frame unit.

18. The system of claim 14, wherein at least one
selected unit is a sub-frame unit.

19. The system of claim 14, wherein at least one
selected unit is a distributed unit.

20. The system of claim 14, further comprising means
for decrypting each encrypted unit.

21. The system of claim 14, further comprising means
for decrypting a plurality of encrypted units in parallel.
22. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;




means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;
and

means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit, wherein the at least one selected encryption
algorithm differs between at least two selected units.


23. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;

means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;
means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit, wherein each selected encryption algorithm
has a key having a key length; and

means for varying at least one of a value for the
key and the key length for at least some of the selected
units.


24. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;

means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;




means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit;

means for encrypting a subset of selected units
from the data stream of video information to create an
encrypted custom distribution copy;

means for grouping remaining units from the data
stream of video information to create a bulk distribution
copy, wherein the remaining units comprise units from the
data stream of video information that are different from the
subset of selected units; and

means for distributing the bulk distribution copy
separately from the encrypted custom distribution copy.


25. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;

means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;
and

means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit, wherein each selected encryption algorithm
has a key generated from at least one or more of the
following factors: a previous key; a serial number of a
destination decoding device; a date or time range determined
by a secure clock; a location identifier; a number of
previous uses of a work; a personal identification number of





a specific authorizing person; or portions of a previously
encrypted data stream of video information.


26. A system for encrypting a data stream of video
information encoded and compressed into a base layer and at
least one enhancement layer, including:

means for selecting at least one encryption
algorithm;

means for selecting at least one unit of one of
the base layer or at least one enhancement layer to encrypt;
means for applying at least one selected
encryption algorithm to encrypt each selected unit into an
encrypted unit, wherein each selected encryption algorithm
has a key;

means for storing a plurality of encrypted units
on an erasable media; and

means for erasing the encrypted units upon an
expiration of the keys.


27. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:

select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt;

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit; and




apply a first unique selected encryption algorithm
to selected units from the base layer and a second unique
selected encryption algorithm to selected units from the at
least one enhancement layer.


28. The computer-readable medium of claim 27, further
including instructions for causing the computer to select
units to encrypt to confound decompression or decoding of
subsequent units that depend on information in the units to
encrypt.


29. The computer-readable medium of claim 27, wherein
at least one selected unit is a multi-frame unit.


30. The computer-readable medium of claim 27, wherein
at least one selected unit is a frame unit.


31. The computer-readable medium of claim 27, wherein
at least one selected unit is a sub-frame unit.


32. The computer-readable medium of claim 27, wherein
at least one selected unit is a distributed unit.


33. The computer-readable medium of claim 27, further
comprising instructions for causing the computer to decrypt
each encrypted unit.


34. The computer-readable medium of claim 27, further
comprising instructions for causing the computer to decrypt
a plurality of encrypted units in parallel.


35. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:





select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt; and

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit,
wherein the at least one selected encryption algorithm
differs between at least two selected units.


36. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:

select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt;

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit,
wherein each selected encryption algorithm has a key having
a key length; and

vary at least one of a value for the key and the
key length for at least some of the selected units.


37. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:





select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt;

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit;
encrypt a subset of selected units from the data
stream of video information to create an encrypted custom
distribution copy;

group remaining units from the data stream of
video information to create a bulk distribution copy,
wherein the remaining units comprise units from the data
stream of video information that are different from the
subset of selected units; and

distribute the bulk distribution copy separately
from the encrypted custom distribution copy.


38. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:

select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt; and

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit,
wherein each selected encryption algorithm has a key
generated from at least one or more of the following





factors: a previous key; a serial number of a destination
decoding device; a date or time range determined by a secure
clock; a location identifier; a number of previous uses of a
work; a personal identification number of a specific
authorizing person; or portions of a previously encrypted
data stream of video information.


39. A computer-readable medium, having computer-
executable instructions stored thereon, for encrypting a
data stream of video information encoded and compressed into
a base layer and at least one enhancement layer, the
computer-executable instructions comprising instructions
that, when executed by a computer, cause the computer to:

select at least one encryption algorithm;

select at least one unit of one of the base layer
or at least one enhancement layer to encrypt;

apply at least one selected encryption algorithm
to encrypt each selected unit into an encrypted unit,
wherein each selected encryption algorithm has a key;

store a plurality of encrypted units on an
erasable media; and

erase the encrypted units upon an expiration of
the keys.
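The abstract's watermarking approach — embedding identification in the low-order bits of certain coefficients so the mark is reliable yet nearly invisible — can be sketched for illustration. The coefficient-selection rule (every seventh coefficient), the 32-bit serial-number payload, and the function names are assumptions for demonstration, not the patent's specified structure.

```python
# Illustrative low-order-bit watermarking of transform coefficients.
# The stride-based coefficient selection and 32-bit payload are
# assumptions for demonstration, not the patent's watermark structure.
def embed_watermark(coeffs, serial, nbits=32, stride=7):
    """Write `serial` (nbits wide, MSB first) into the least significant
    bit of every `stride`-th coefficient, leaving all others untouched."""
    out = list(coeffs)
    for i in range(nbits):
        bit = (serial >> (nbits - 1 - i)) & 1
        pos = i * stride
        out[pos] = (out[pos] & ~1) | bit  # changes the value by at most 1
    return out

def extract_watermark(coeffs, nbits=32, stride=7):
    """Recover the serial number from the marked coefficient LSBs."""
    serial = 0
    for i in range(nbits):
        serial = (serial << 1) | (coeffs[i * stride] & 1)
    return serial
```

Because each marked coefficient changes by at most one quantization step, the embedded identifier is invisible or nearly invisible in the decoded picture, yet a stolen copy can be traced back to its source by reading the LSBs at the agreed positions.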



Description

Note: Descriptions are shown in the official language in which they were submitted.



CA 02486448 2004-11-17
WO 2004/012455 PCT/US2002/018884
ENCRYPTED AND WATERMARKED TEMPORAL AND RESOLUTION
LAYERING IN ADVANCED TELEVISION

TECHNICAL FIELD
This invention relates to electronic communication systems, and more
particularly to an
advanced electronic television system having temporal and resolution layering
of compressed
image frames, and which provides encryption and watermarking capabilities.

BACKGROUND
The United States presently uses the NTSC standard for television
transmissions.
However, proposals have been made to replace the NTSC standard with an
Advanced
Television standard. For example, it has been proposed that the U.S. adopt
digital
standard-definition and advanced television formats at rates of 24 Hz, 30 Hz,
60 Hz, and 60 Hz
interlaced. It is apparent that these rates are intended to continue (and thus
be compatible with)
the existing NTSC television display rate of 60 Hz (or 59.94 Hz). It is also
apparent that "3-2
pulldown" is intended for display on 60 Hz displays when presenting movies,
which have a
temporal rate of 24 frames per second (fps). However, while the above proposal
provides a
menu of possible formats from which to select, each format only encodes and
decodes a single
resolution and frame rate. Because the display or motion rates of these
formats are not
integrally related to each other, conversion from one to another is difficult.
Further, this proposal does not provide a crucial capability of compatibility
with
computer displays. These proposed image motion rates are based upon historical
rates which
date back to the early part of this century. If a "clean-slate" choice were
to be made, it is unlikely that
these rates would be chosen. In the computer industry, where displays could
utilize any rate
over the last decade, rates in the 70 to 80 Hz range have proven optimal, with
72 and 75 Hz
being the most common rates. Unfortunately, the proposed rates of 30 and 60 Hz
lack useful
interoperability with 72 or 75 Hz, resulting in degraded temporal performance.
In addition, it is being suggested by some in the field that frame interlace
is required,
due to a claimed need to have about 1000 lines of resolution at high frame
rates, but based
upon the notion that such images cannot be compressed within the available
18-19 Mbits/second of a conventional 6 MHz broadcast television channel.
It would be much more desirable if a single signal format were to be adopted,
containing within it all of the desired standard and high definition
resolutions. However, to do
so within the bandwidth constraints of a conventional 6 MHz broadcast
television channel
requires compression (or "scalability") of both frame rate (temporal) and
resolution (spatial).
One method specifically intended to provide for such scalaoility is the MPEG-2
standard.
Unfortunately, the temporal and spatial scalability features specified within
the MPEG-2
standard are not sufficiently efficient to accommodate the needs of advanced
television for the

U.S. Thus, the proposal for advanced television for the U.S. is based upon the
premise that
temporal (frame rate) and spatial (resolution) layering are inefficient, and
therefore discrete
formats are necessary.
In addition to the above issues, the inventor has identified a need to protect
and manage
the use of valuable copyrighted audio and video media such as digital movies.
The viability of
entire technologies for movie data delivery can hinge on the ability to
protect and manage

usage. As the quality of digital compressed movie masters approaches that of
the original
work, the need for protection and management methodologies becomes a crucial
requirement.
In approaching a system architecture for digital content protection and
management, it
would be very beneficial to have a variety of tools and techniques which can
be applied in a
modular and flexible way. Most commercial encryption systems have been
eventually
compromised. It is therefore necessary to architect any protection system to
be sufficiently
flexible as to adapt and strengthen itself if and when it is compromised. It
is also valuable to
place informational clues into each copy via watermarking of symbols and/or
serial number
information in order to pinpoint the source and method by which the security
has been
compromised.
Digital distribution of movies to movie theaters is becoming feasible. The
high-value
copies of new movies have long been a target for theft or copying of today's
film prints.
Digital media such as DVD have attempted crude encryption and authorization
schemes (such
as DIVX). Analog cable scramblers have been in use from the beginning to
enable charging for
premium cable channels and pay-per-view events and movies. However, these crude
scramblers
have been broadly compromised.

One reason that digital and analog video systems have tolerated such poor
security
systems is that the value of the secondary video release and the loss due to
pirating is a
relatively small proportion of the market. However, for digital first-run
movies, for valuable
live events, and for high resolution images to the home and business (via
forms of HDTV),
robust security systems become a requirement.
Embodiments of the present invention overcome these and other problems of
current
digital content protection systems.


SUMMARY
According to an aspect of the present invention,
there is provided a method for encrypting a data stream of
video information encoded and compressed into a base layer
and at least one enhancement layer, including the steps of:
(a) selecting at least one encryption algorithm;

(b) selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; and (c) applying
at least one selected encryption algorithm to encrypt each
selected unit into an encrypted unit.

According to another aspect of the present
invention, there is provided a method for encrypting and
watermarking a data stream of video information encoded and
compressed into a base layer and at least one enhancement
layer, including the steps of: (a) selecting at least one
encryption algorithm; (b) selecting at least one
watermarking technique; (c) selecting at least one unit of
one of the base layer or at least one enhancement layer to
encrypt; (d) selecting at least one unit of one of the base

layer or at least one enhancement layer to watermark;

(e) applying at least one selected watermarking technique to
watermark each selected unit as a watermarked unit; and

(f) applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit.

According to another aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including:
(a) means for selecting at least one encryption algorithm;
(b) means for selecting at least one unit of one of the base
layer or at least one enhancement layer to encrypt; and

(c) means for applying at least one selected encryption
least one selected encryption algorithm to encrypt each
selected unit into an encrypted unit, wherein each selected
encryption algorithm has a key having a key length; and (d)
varying at least one of a value for the key and the key

length for at least some of the selected units.
According to another aspect of the present
invention, there is provided a method for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including the
steps of: (a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; (c) applying at
least one selected encryption algorithm to encrypt each

selected unit into an encrypted unit; (d) encrypting a
subset of selected units from the data stream of video
information to create an encrypted custom distribution copy;
(e) grouping remaining units from the data stream of video
information to create a bulk distribution copy, wherein the
remaining units comprise units from the data stream of video
information that are different from the subset of selected
units; and (f) distributing the bulk distribution copy
separately from the encrypted custom distribution copy.

According to another aspect of the present
invention, there is provided a method for encrypting a data
stream of video information encoded and compressed into a

base layer and at least one enhancement layer, including the
steps of: (a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; and (c) applying
at least one selected encryption algorithm to encrypt each
selected unit into an encrypted unit, wherein each selected
encryption algorithm has a key generated from at least one
or more of the following factors: a previous key; a serial
number of a destination decoding device; a date or time
range determined by a secure clock; a location identifier; a
number of previous uses of a work; a personal identification
number of a specific authorizing person; or portions of a

previously encrypted data stream of video information.
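The key-generation factors listed in this aspect can be combined in many ways; the patent does not prescribe a construction. A minimal sketch, assuming an HMAC-SHA-256 mix of length-prefixed factor fields chained from the previous key (the function name, field encoding, and factor names are all illustrative assumptions):

```python
# Illustrative derivation of a unit key from several of the listed
# factors (previous key, device serial number, date range, location,
# use count, PIN). The HMAC-SHA-256 construction and field encoding
# are assumptions for demonstration, not specified by the patent.
import hashlib
import hmac

def derive_key(previous_key: bytes, **factors) -> bytes:
    """Mix named factors (e.g. device_serial, date_range, location)
    into a fresh 32-byte key, chained from the previous key."""
    mac = hmac.new(previous_key, digestmod=hashlib.sha256)
    for name in sorted(factors):  # fixed order for determinism
        value = str(factors[name]).encode()
        mac.update(len(name).to_bytes(2, "big") + name.encode())
        mac.update(len(value).to_bytes(4, "big") + value)
    return mac.digest()
```

Chaining from the previous key means a new key can be issued for each unit or time window, while binding the key to a specific decoding device, date range, or authorizing person simply by including those factors in the mix.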
According to another aspect of the present
invention, there is provided a method for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including the

steps of: (a) selecting at least one encryption algorithm;
(b) selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; and (c) applying
at least one selected encryption algorithm to encrypt each
selected unit into an encrypted unit, wherein each selected

encryption algorithm has a key; (d) storing a plurality of
encrypted units on an erasable media; and (e) erasing the
encrypted units upon an expiration of the keys.

According to another aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a

base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means
for selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; means for

applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit; and means
for applying a first unique selected encryption algorithm to
selected units from the base layer and a second unique
selected encryption algorithm to selected units from the at
least one enhancement layer.

According to another aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means
for selecting at least one unit of one of the base layer or

at least one enhancement layer to encrypt; and means for
applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit, wherein
the at least one selected encryption algorithm differs
between at least two selected units.

According to still another aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means
for selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; means for
applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit, wherein
each selected encryption algorithm has a key having a key

length; and means for varying at least one of a value for
the key and the key length for at least some of the selected
units.

According to yet another aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means
for selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; means for
applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit; means for
encrypting a subset of selected units from the data stream
of video information to create an encrypted custom
distribution copy; means for grouping remaining units from
the data stream of video information to create a bulk
distribution copy, wherein the remaining units comprise
units from the data stream of video information that are

different from the subset of selected units; and means for
distributing the bulk distribution copy separately from the
encrypted custom distribution copy.

According to a further aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a

base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means
for selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; and means for

applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit, wherein
each selected encryption algorithm has a key generated from
at least one or more of the following factors: a previous
key; a serial number of a destination decoding device; a

date or time range determined by a secure clock; a location
identifier; a number of previous uses of a work; a personal
identification number of a specific authorizing person; or
portions of a previously encrypted data stream of video
information.
According to yet a further aspect of the present
invention, there is provided a system for encrypting a data
stream of video information encoded and compressed into a
base layer and at least one enhancement layer, including:
means for selecting at least one encryption algorithm; means

for selecting at least one unit of one of the base layer or
at least one enhancement layer to encrypt; means for
applying at least one selected encryption algorithm to
encrypt each selected unit into an encrypted unit, wherein
each selected encryption algorithm has a key; means for
storing a plurality of encrypted units on an erasable media;
and means for erasing the encrypted units upon an expiration
of the keys.

According to still a further aspect of the present
invention, there is provided a computer-readable medium,
having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and
compressed into a base layer and at least one enhancement

layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the
computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; apply at least one

selected encryption algorithm to encrypt each selected unit
into an encrypted unit; and apply a first unique selected
encryption algorithm to selected units from the base layer
and a second unique selected encryption algorithm to

selected units from the at least one enhancement layer.
According to another aspect of the present
invention, there is provided a computer-readable medium,
having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and
compressed into a base layer and at least one enhancement

layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the
computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; and apply at least

one selected encryption algorithm to encrypt each selected
unit into an encrypted unit, wherein the at least one
selected encryption algorithm differs between at least two
selected units.
According to yet another aspect of the present
invention, there is provided a computer-readable medium,
having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and

compressed into a base layer and at least one enhancement
layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the
computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; apply at least one
selected encryption algorithm to encrypt each selected unit
into an encrypted unit, wherein each selected encryption
algorithm has a key having a key length; and vary at least
one of a value for the key and the key length for at least
some of the selected units.

According to another aspect of the present
invention, there is provided a computer-readable medium,
having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and

compressed into a base layer and at least one enhancement
layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the
computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; apply at least one
selected encryption algorithm to encrypt each selected unit
into an encrypted unit; encrypt a subset of selected units
from the data stream of video information to create an

encrypted custom distribution copy; group remaining units
from the data stream of video information to create a bulk
distribution copy, wherein the remaining units comprise
units from the data stream of video information that are
different from the subset of selected units; and distribute

the bulk distribution copy separately from the encrypted
custom distribution copy.

According to another aspect of the present
invention, there is provided a computer-readable medium,

having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and
compressed into a base layer and at least one enhancement
layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the

computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; and apply at least
one selected encryption algorithm to encrypt each selected
unit into an encrypted unit, wherein each selected

encryption algorithm has a key generated from at least one
or more of the following factors: a previous key; a serial
number of a destination decoding device; a date or time
range determined by a secure clock; a location identifier; a
number of previous uses of a work; a personal identification
number of a specific authorizing person; or portions of a
previously encrypted data stream of video information.
According to another aspect of the present

invention, there is provided a computer-readable medium,
having computer-executable instructions stored thereon, for
encrypting a data stream of video information encoded and

compressed into a base layer and at least one enhancement
layer, the computer-executable instructions comprising
instructions that, when executed by a computer, cause the
computer to: select at least one encryption algorithm;
select at least one unit of one of the base layer or at
least one enhancement layer to encrypt; apply at least one
selected encryption algorithm to encrypt each selected unit
into an encrypted unit, wherein each selected encryption
algorithm has a key; store a plurality of encrypted units on
an erasable media; erase the encrypted units upon an
expiration of the keys.
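The key-generation factors recited in the aspects above (a previous key, a device serial number, a secure-clock time range, a location identifier, a use count, a PIN, or portions of a previously encrypted stream) can be combined into a single key by hashing. The following is an illustrative sketch only, assuming SHA-256 as a hypothetical key-derivation hash; the claims do not mandate any particular algorithm, and all names and values here are invented:

```python
import hashlib

def derive_key(previous_key: bytes, serial_number: str, time_range: str,
               location_id: str = "", use_count: int = 0, pin: str = "") -> bytes:
    """Derive a 256-bit content key from the factors listed in the claims.

    Any subset of factors may be used; unused factors default to empty values.
    """
    h = hashlib.sha256()
    for factor in (previous_key, serial_number.encode(), time_range.encode(),
                   location_id.encode(), str(use_count).encode(), pin.encode()):
        h.update(factor)
    return h.digest()

# Deriving each key from the previous one chains the keys together, so an
# old key alone does not reveal future keys without the other factors.
k1 = derive_key(b"seed", "SN-0001", "2002-06-13/2002-06-20")
k2 = derive_key(k1, "SN-0001", "2002-06-20/2002-06-27")
```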

Embodiments of the present invention provide a
method and apparatus for image compression which
demonstrably achieves better than 1000-line resolution image
compression at high frame rates with high quality.
Embodiments also achieve both temporal and resolution
scalability at this resolution at high frame rates within

the available bandwidth of a conventional television
broadcast channel. The inventive technique efficiently
achieves over twice the compression ratio being proposed for
advanced television while providing for flexible encryption
and watermarking techniques.

In some embodiments, image material is captured at
an initial or primary framing rate of 72 fps. An MPEG-2
data stream is then generated, comprising:

(1) a base layer, preferably encoded using only MPEG-2 P
frames, comprising a low resolution (e.g., 1024x512 pixels),
low frame rate (24 or 36 Hz) bitstream;

(2) an optional base resolution temporal enhancement layer, encoded using only MPEG-2
B frames, comprising a low resolution (e.g., 1024x512 pixels), high frame rate (72 Hz)
bitstream;

(3) an optional base temporal high-resolution enhancement layer, preferably encoded using
only MPEG-2 P frames, comprising a high resolution (e.g., 2k x 1k pixels), low frame
rate (24 or 36 Hz) bitstream;

(4) an optional high resolution temporal enhancement layer, encoded using only MPEG-2
B frames, comprising a high resolution (e.g., 2k x 1k pixels), high frame rate (72 Hz)
bitstream.
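The four-layer stream enumerated above can be modeled as a simple data structure. This is an illustrative sketch; the type and field names are invented for illustration and are not part of the MPEG-2 syntax:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Layer:
    frame_types: str   # MPEG-2 frame types used, e.g. "P" or "B"
    width: int         # resolution in pixels
    height: int
    frame_rate: int    # Hz

@dataclass
class LayeredStream:
    base: Layer                              # required base layer
    base_temporal: Optional[Layer] = None    # the three optional enhancement layers
    high_res: Optional[Layer] = None
    high_res_temporal: Optional[Layer] = None

# The full configuration described in items (1) through (4):
stream = LayeredStream(
    base=Layer("P", 1024, 512, 36),
    base_temporal=Layer("B", 1024, 512, 72),
    high_res=Layer("P", 2048, 1024, 36),
    high_res_temporal=Layer("B", 2048, 1024, 72),
)
```

A decoder that supports only the base layer simply ignores the optional fields, which is the source of the scalability discussed below.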
Embodiments of the invention provide a number of key technical attributes,
allowing
substantial improvement over current proposals, and including: replacement of
numerous resolutions
and frame rates with a single layered resolution and frame rate; no need for
interlace in order to
achieve better than 1000-lines of resolution for 2 megapixel images at high
frame rates (72 Hz)
within a 6 MHz television channel; compatibility with computer displays
through use of a
primary framing rate of 72 fps; and greater robustness than the current
unlayered format
proposal for advanced television, since all available bits may be allocated to
a lower resolution
base layer when "stressful" image material is encountered.

The disclosed layered compression technology allows a form of modularized
decomposition of an image. This modularity has additional benefits beyond
allowing scalable
decoding and better stress resilience. The modularity can be further tapped as
a structure which
supports flexible encryption and watermarking techniques. The function of
encryption is to
restrict viewing, performance, copying, or other use of audio/video shows
unless one or more
CA 02486448 2004-11-17
WO 2004/012455 PCT/US2002/018884
proper keys are applied to an authorized decryption system. The function
of watermarking is to
track lost or stolen copies back to a source, so that the nature of the method
of theft can be
determined to improve the security of the system, and so that those involved
in the theft can be
identified.
Using layered compression, the base layer, and various internal components of
the base
layer (such as I frames and their DC coefficients, or motion vectors for P
frames) can be used
to encrypt a compressed layered movie stream. By using such a layered subset
of the bits, the
entire picture stream can be made unrecognizable (unless decrypted) by
encrypting only a
small fraction of the bits of the entire picture stream. Further, a variety of
encryption
algorithms and strengths can be applied to various portions of the layered
stream, including the
enhancement layers (which can be seen as a premium quality service, and
encrypted specially).
Encryption algorithms or keys can be changed at each slice boundary as well,
to provide
greater intertwining of the encryption and the image stream.
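The selective encryption described above (scrambling only a structurally critical subset of units, with keys or algorithms changed at each slice boundary) can be sketched as follows. This is a minimal illustration: the hash-based XOR keystream stands in for whatever cipher an implementation would actually use, and the unit contents are hypothetical:

```python
import hashlib
from itertools import count

def keystream(key: bytes, length: int) -> bytes:
    """Counter-mode keystream built from repeated hashing (illustrative only)."""
    out = bytearray()
    for i in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return bytes(out[:length])

def encrypt_selected(units, base_key: bytes, selected: set) -> list:
    """Encrypt only the selected units; rotate the key at every unit boundary."""
    result, key = [], base_key
    for i, unit in enumerate(units):
        key = hashlib.sha256(key).digest()   # new key at each slice boundary
        if i in selected:
            ks = keystream(key, len(unit))
            unit = bytes(a ^ b for a, b in zip(unit, ks))
        result.append(unit)
    return result

# Encrypting only the I-frame unit leaves most bits untouched, yet the
# stream cannot be reconstructed without decrypting it first.
units = [b"I-frame DC data", b"P-frame motion vectors", b"B-frame residual"]
scrambled = encrypt_selected(units, b"base-key", selected={0})
# XOR with the same keystream is its own inverse, so the same call decrypts.
restored = encrypt_selected(scrambled, b"base-key", selected={0})
```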
The inventive layered compression structure can also be used for watermarking.
The
goal of watermarking is to be reliably identifiable to detection, yet be
essentially invisible to
the eye. For example, low order bits in DC coefficients in I frames would be
invisible to the
eye, but yet could be used to uniquely identify a particular picture stream
with a watermark.
Enhancement layers can also have their own unique identifying watermark
structure.
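Embedding an identifier in the low-order bits of I-frame DC coefficients, as just described, can be sketched directly. The coefficient values and mark bits below are invented for illustration:

```python
def embed_watermark(dc_coeffs, mark_bits):
    """Replace the least-significant bit of each DC coefficient with a mark bit."""
    return [(c & ~1) | bit for c, bit in zip(dc_coeffs, mark_bits)]

def extract_watermark(dc_coeffs, n):
    """Read back the low-order bits that carry the watermark."""
    return [c & 1 for c in dc_coeffs[:n]]

coeffs = [1024, 998, 1051, 1003, 987, 1040]   # hypothetical I-frame DC values
mark = [1, 0, 1, 1, 0, 0]                     # unique stream identifier
marked = embed_watermark(coeffs, mark)
```

Because each coefficient changes by at most one quantization step, the mark is invisible to the eye yet reliably recoverable from any copy of the stream.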
The details of one or more embodiments of the invention are set forth in the
accompanying drawings and the description below. Other features, objects, and
advantages
of the invention will be apparent from the description and drawings, and from
the claims.
DESCRIPTION OF DRAWINGS
FIG 1 is a timing diagram showing the pulldown rates for 24 fps and 36 fps
material to
be displayed at 60 Hz.
FIG 2 is a first preferred MPEG-2 coding pattern.
FIG 3 is a second preferred MPEG-2 coding pattern.
FIG 4 is a block diagram showing temporal layer decoding in accordance with
the
preferred embodiment of the present invention.
FIG 5 is a block diagram showing 60 Hz interlaced input to a converter that
can output
both 36 Hz and 72 Hz frames.
FIG 6 is a diagram showing a "master template" for a base MPEG-2 layer at 24
or
36 Hz.

FIG 7 is a diagram showing enhancement of a base resolution template using
hierarchical resolution scalability utilizing MPEG-2.
FIG 8 is a diagram showing the preferred layered resolution encoding process.
FIG 9 is a diagram showing the preferred layered resolution decoding process.
FIG 10 is a block diagram showing a combination of resolution and temporal
scalable
options for a decoder in accordance with the present invention.
FIG 11 is a diagram showing the scope of encryption and watermarking as a
function
of unit dependency.
FIGS. 12A and 12B show diagrams of image frames with different types of
watermarks.
FIG 13 is a flowchart showing one method of applying the encryption techniques
of
the invention.
FIG 14 is a flowchart showing one method of applying the watermarking
techniques of
the invention.
Like reference symbols in the various drawings indicate like elements.
DETAILED DESCRIPTION
Throughout this description, the preferred embodiment and examples shown
should be
considered as exemplars, rather than as limitations on the present invention.

TEMPORAL AND RESOLUTION LAYERING
Goals Of A Temporal Rate Family
After considering the problems of the prior art, and in pursuing the present
invention,
the following goals were defined for specifying the temporal characteristics
of a future digital
television system:
= Optimal presentation of the high resolution legacy of 24 frame-per-second
films.
= Smooth motion capture for rapidly moving image types, such as sports.

= Smooth motion presentation of sports and similar images on existing analog
NTSC
displays, as well as computer-compatible displays operating at 72 or 75 Hz.

= Reasonable but more efficient motion capture of less-rapidly-moving images,
such as
news and live drama.

= Reasonable presentation of all new digital types of images through a
converter box
onto existing NTSC displays.

= High quality presentation of all new digital types of images on computer-
compatible
displays.

= If 60 Hz digital standard or high resolution displays come into the market,
reasonable
or high quality presentation on these displays as well.

Since 60 Hz and 72/75 Hz displays are fundamentally incompatible at any rate
other
than the movie rate of 24 Hz, the best situation would be if either 72/75 or
60 were eliminated
as a display rate. Since 72 or 75 Hz is a required rate for N.I.I. (National
Information
Infrastructure) and computer applications, elimination of the 60 Hz rate as
being
fundamentally obsolete would be the most future-looking. However, there are
many competing
interests within the broadcasting and television equipment industries, and
there is a strong
demand that any new digital television infrastructure be based on 60 Hz (and
30 Hz). This has
lead to much heated debate between the television, broadcast, and computer
industries.
Further, the insistence by some interests in the broadcast and television
industries on
interlaced 60 Hz formats further widens the gap with computer display
requirements. Since
non-interlaced display is required for computer-like applications of digital
television systems, a
de-interlacer is required when interlaced signals are displayed. There is
substantial debate
about the cost and quality of de-interlacers, since they would be needed in
every such receiving
device. Frame rate conversion, in addition to de-interlacing, further impacts
cost and quality.
For example, NTSC-to/from-PAL converters continue to be very costly, and
yet conversion
performance is not dependable for many common types of scenes. Since the issue
of interlace
is a complex and problematic subject, and in order to attempt to address the
problems and issue
of temporal rate, the invention is described in the context of a digital
television standard
without interlace.

Selecting Optimal Temporal Rates
Beat Problems. Optimal presentation on a 72 or 75 Hz display will occur if a
camera or
simulated image is created having a motion rate equal to the display rate (72
or 75 Hz,
respectively), and vice versa. Similarly, optimal motion fidelity on a 60 Hz
display will result
from a 60 Hz camera or simulated image. Use of 72 Hz or 75 Hz generation rates
with 60 Hz
displays results in a 12 Hz or 15 Hz beat frequency, respectively. This beat
can be removed
through motion analysis, but motion analysis is expensive and inexact, often
leading to visible
artifacts and temporal aliasing. In the absence of motion analysis, the beat
frequency
dominates the perceived display rate, making the 12 or 15 Hz beat appear to
provide less
accurate motion than even 24 Hz. Thus, 24 Hz forms a natural temporal common
denominator
between 60 and 72 Hz. Although 75 Hz has a slightly higher 15 Hz beat with 60
Hz, its motion
is still not as smooth as 24 Hz, and there is no integral relationship between
75 Hz and 24 Hz
unless the 24 Hz rate is increased to 25 Hz. (In European 50 Hz countries,
movies are often
played 4% fast at 25 Hz; this can be done to make film presentable on 75 Hz
displays.)
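The beat frequencies cited above follow from a simple rule: the visible beat between a capture rate and a display rate is their greatest common divisor, the rate at which the two frame boundaries realign. A quick check of the figures in the text:

```python
from math import gcd

def beat_rate(capture_hz: int, display_hz: int) -> int:
    """Rate at which capture and display frame boundaries realign (the visible beat)."""
    return gcd(capture_hz, display_hz)

beat_72_on_60 = beat_rate(72, 60)   # 12 Hz beat, as stated above
beat_75_on_60 = beat_rate(75, 60)   # 15 Hz beat
```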
In the absence of motion analysis at each receiving device, 60 Hz motion on 72
or
75 Hz displays, and 75 or 72 Hz motion on 60 Hz displays, will be less smooth
than 24 Hz
images. Thus, neither 72/75 Hz nor 60 Hz motion is suitable for reaching a
heterogeneous
display population containing both 72 or 75 Hz and 60 Hz displays.
3-2 Pulldown. A further complication in selecting an optimal frame rate occurs
due to
the use of "3-2 pulldown" combined with video effects during the telecine
(film-to-video)
conversion process. During such conversions, the 3-2 pulldown pattern repeats
a first frame (or
field) 3 times, then the next frame 2 times, then the next frame 3 times, then
the next frame 2
times, etc. This is how 24 fps film is presented on television at 60 Hz
(actually, 59.94 Hz for
NTSC color). That is, each of 12 pairs of 2 frames in one second of film is
displayed 5 times,
giving 60 images per second. The 3-2 pulldown pattern is shown in FIG 1.
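The 3-2 pulldown mapping of 24 fps film onto 60 Hz display can be sketched as a repeating repeat-count pattern:

```python
from itertools import cycle

def pulldown(frames, pattern):
    """Repeat each source frame according to the repeating pulldown pattern."""
    out = []
    for frame, repeats in zip(frames, cycle(pattern)):
        out.extend([frame] * repeats)
    return out

film = list(range(1, 25))               # one second of 24 fps film
video = pulldown(film, pattern=(3, 2))  # 3-2 pulldown
# Each of the 12 pairs of film frames is shown 5 times: 24 frames -> 60 images.
```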
By some estimates, more than half of all film on video has substantial
portions where
adjustments have been made at the 59.94 Hz video field rate to the 24 fps
film. Such
adjustments include "pan-and-scan", color correction, and title scrolling.
Further, many films
are time-adjusted by dropping frames or clipping the starts and ends of scenes
to fit within a
given broadcast scheduled. These operations can make the 3-2 pulldown process
impossible to
reverse, since there is both 59.94 Hz and 24 Hz motion. This can make the film
very difficult
to compress using the MPEG-2 standard. Fortunately, this problem is limited to
existing
NTSC-resolution material, since there is no significant library of higher
resolution digital film
using 3-2 pulldown.
Motion Blur. In order to further explore the issue of finding a common
temporal rate
higher than 24 Hz, it is useful to mention motion blur in the capture of
moving images. Camera
sensors and motion picture film are open to sensing a moving image for a
portion of the
duration of each frame. On motion picture cameras and many video cameras, the
duration of
this exposure is adjustable. Film cameras require a period of time to advance
the film, and are
usually limited to being open only about 210 out of 360 degrees, or a 58% duty
cycle. On
video cameras having CCD sensors, some portion of the frame time is often
required to "read"
the image from the sensor. This can vary from 10% to 50% of the frame time. In
some sensors,
an electronic shutter must be used to blank the light during this readout
time. Thus, the "duty
cycle" of CCD sensors usually varies from 50 to 90%, and is adjustable in some
cameras. The
light shutter can sometimes be adjusted to further reduce the duty cycle, if
desired. However,
for both film and video, the most common sensor duty cycle duration is 50%.
Preferred Rate. With this issue in mind, one can consider the use of only some
of the
frames from an image sequence captured at 60, 72, or 75 Hz. Utilizing one
frame in two, three,
four, etc., the subrates shown in TABLE 1 can be derived.

Rate     1/2 Rate   1/3 Rate   1/4 Rate   1/5 Rate   1/6 Rate
75 Hz    37.5       25         18.75      15         12.5
72 Hz    36         24         18         14.4       12
60 Hz    30         20         15         12         10
TABLE 1
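The subrates in TABLE 1 are simply the capture rates divided by small integers, and the unifying rates discussed next are the values common to two rows:

```python
rates = (75, 72, 60)
subrates = {r: [r / n for n in range(2, 7)] for r in rates}

common_60_72 = set(subrates[60]) & set(subrates[72])   # the 12 Hz unifying rate
common_60_75 = set(subrates[60]) & set(subrates[75])   # the 15 Hz unifying rate
```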

The rate of 15 Hz is a unifying rate between 60 and 75 Hz. The rate of 12 Hz
is a
unifying rate between 60 and 72 Hz. However, the desire for a rate above 24 Hz
eliminates
these rates. 24 Hz is not common, but the use of 3-2 pulldown has come to be
accepted by the
industry for presentation on 60 Hz displays. The only candidate rates are
therefore 30, 36, and
37.5 Hz. Since 30 Hz has a 7.5 Hz beat with 75 Hz, and a 6 Hz beat with 72 Hz,
it is not useful
as a candidate.
The motion rates of 36 and 37.5 Hz become prime candidates for smoother motion
than
24 Hz material when presented on 60 and 72/75 Hz displays. Both of these rates
are about 50%
faster and smoother than 24 Hz. The rate of 37.5 Hz is not suitable for use
with either 60 or
72 Hz, so it must be eliminated, leaving only 36 Hz as having the desired
temporal rate
characteristics. (The motion rate of 37.5 Hz could be used if the 60 Hz
display rate for
television can be moved 4% to 62.5 Hz. Given the interests behind 60 Hz, 62.5
Hz appears
unlikely; there are even those who propose the very obsolete 59.94 Hz rate
for new television
systems. However, if such a change were to be made, the other aspects of the
present invention
could be applied to the 37.5 Hz rate.)
The rates of 24, 36, 60, and 72 Hz are left as candidates for a temporal rate
family. The
rates of 72 and 60 Hz cannot be used for a distribution rate, since motion is
less smooth when
converting between these two rates than if 24 Hz is used as the distribution
rate, as described
above. By hypothesis, we are looking for a rate faster than 24 Hz. Therefore,
36 Hz is the
prime candidate for a master, unifying motion capture and image distribution
rate for use with
60 and 72/75 Hz displays.

As noted above, the 3-2 pulldown pattern for 24 Hz material repeats a first
frame (or
field) 3 times, then the next frame 2 times, then the next frame 3 times, then
the next frame 2
times, etc. When using 36 Hz, each pattern optimally should be repeated in a 2-
1-2 pattern.
This can be seen in TABLE 2 and graphically in FIG 1.

Rate     Frame Numbers
60 Hz    1  2  3  4  5  6  7  8  9  10
24 Hz    1  1  1  2  2  3  3  3  4  4
36 Hz    1  1  2  3  3  4  4  5  6  6
TABLE 2
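The 2-1-2 pulldown of TABLE 2 uses the same mechanism as 3-2 pulldown, with the repeat pattern (2, 1, 2) mapping every three 36 Hz frames onto five 60 Hz slots; a sketch:

```python
from itertools import cycle

def pulldown(frames, pattern):
    """Repeat each source frame according to the repeating pulldown pattern."""
    out = []
    for frame, repeats in zip(frames, cycle(pattern)):
        out.extend([frame] * repeats)
    return out

# One second of 36 Hz material fills exactly one second of 60 Hz display.
video = pulldown(list(range(1, 37)), pattern=(2, 1, 2))
```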

This relationship between 36 Hz and 60 Hz only holds for true 36 Hz material.
60 Hz
material can be "stored" in 36 Hz, if it is interlaced, but 36 Hz cannot be
reasonably created
from 60 Hz without motion analysis and reconstruction. However, in looking for
a new rate for
motion capture, 36 Hz provides slightly smoother motion on 60 Hz than does 24
Hz, and
provides substantially better image motion smoothness on a 72 Hz display.
Therefore, 36 Hz is
the optimum rate for a master, unifying motion capture and image distribution
rate for use with
60 and 72 Hz displays, yielding smoother motion than 24 Hz material presented
on such
displays.
Although 36 Hz meets the goals set forth above, it is not the only suitable
capture rate.
Since 36 Hz cannot be simply extracted from 60 Hz, 60 Hz does not provide a
suitable rate for
capture. However, 72 Hz can be used for capture, with every other frame then
used as the basis
for 36 Hz distribution. The motion blur from using every other frame of 72 Hz
material will be
half of the motion blur at 36 Hz capture. Tests of motion blur appearance of
every third frame
from 72 Hz show that staccato strobing at 24 Hz is objectionable. However,
utilizing every
other frame from 72 Hz for 36 Hz display is not objectionable to the eye
compared to 36 Hz
native capture.
Thus, 36 Hz affords the opportunity to provide very smooth motion on 72 Hz
displays
by capturing at 72 Hz, while providing better motion on 60 Hz displays than 24
Hz material by
using alternate frames of 72 Hz native capture material to achieve a 36 Hz
distribution rate and
then using 2-1-2 pulldown to derive a 60 Hz image.
In summary, TABLE 3 shows the preferred optimal temporal rates for capture and
distribution in accordance with the present invention.

Preferred Rates
Capture   Distribution    Optimal Display   Acceptable Display
72 Hz     36 Hz + 36 Hz   72 Hz             60 Hz

TABLE 3

It is also worth noting that this technique of utilizing alternate frames from
a 72 Hz
camera to achieve a 36 Hz distribution rate can profit from an increased
motion blur duty
cycle. The normal 50% duty cycle at 72 Hz, yielding a 25% duty cycle at 36 Hz,
has been
demonstrated to be acceptable, and represents a significant improvement over
24 Hz on 60 Hz
and 72 Hz displays. However, if the duty cycle is increased to be in the 75-
90% range, then the
36 Hz samples would begin to approach the more common 50% duty cycle.
Increasing the
duty cycle may be accomplished, for example, by using "backing store" CCD
designs which
have a short blanking time, yielding a high duty cycle. Other methods may be
used, including
dual CCD multiplexed designs.
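The duty-cycle arithmetic above is direct: keeping one frame in two from 72 Hz capture halves the open-shutter fraction seen at the 36 Hz distribution rate. A small illustrative calculation:

```python
def effective_duty_cycle(shutter_fraction: float, keep_one_in: int) -> float:
    """Fraction of each distribution-frame time the shutter was open, when
    keeping one frame in `keep_one_in` from the capture rate."""
    return shutter_fraction / keep_one_in

normal = effective_duty_cycle(0.50, 2)    # 50% shutter at 72 Hz -> 25% at 36 Hz
extended = effective_duty_cycle(0.90, 2)  # 90% shutter -> 45%, near the common 50%
```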

Modified MPEG-2 Compression
For efficient storage and distribution, digital source material having the
preferred
temporal rate of 36 Hz should be compressed. The preferred form of compression
for the
present invention is accomplished by using a novel variation of the MPEG-2
standard.
MPEG-2 Basics. MPEG-2 is an international video compression standard defining
a
video syntax that provides an efficient way to represent image sequences in
the form of more
compact coded data. The language of the coded bits is the "syntax." For
example, a few tokens
can represent an entire block of 64 samples. MPEG also describes a decoding
(reconstruction)
process where the coded bits are mapped from the compact representation
into the original,
"raw" format of the image sequence. For example, a flag in the coded bitstream
signals
whether the following bits are to be decoded with a discrete cosine transform
(DCT) algorithm
or with a prediction algorithm. The algorithms comprising the decoding process
are regulated
by the semantics defined by MPEG. This syntax can be applied to exploit common
video
characteristics such as spatial redundancy, temporal redundancy, uniform
motion, spatial
masking, etc. In effect, MPEG-2 defines a programming language as well as a
data format. An
MPEG-2 decoder must be able to parse and decode an incoming data stream, but
so long as the
data stream complies with the MPEG-2 syntax, a wide variety of possible data
structures and
compression techniques can be used. The present invention takes advantage of
this flexibility


by devising a novel means and method for temporal and resolution scaling using
the MPEG-2
standard.
MPEG-2 uses an intraframe and an interframe method of compression. In most
video
scenes, the background remains relatively stable while action takes place in
the foreground.
The background may move, but a great deal of the scene is redundant. MPEG-2
starts its
compression by creating a reference frame called an I (for Intra) frame. I
frames are
compressed without reference to other frames and thus contain an entire frame
of video
information. I frames provide entry points into a data bitstream for random
access, but can
only be moderately compressed. Typically, the data representing I frames is
placed in the
bitstream every 10 to 15 frames. Thereafter, since only a small portion of the
frames that fall
between the reference I frames are different from the bracketing I frames,
only the differences
are captured, compressed and stored. Two types of frames are used for such
differences: P (for
Predicted) frames and B (for Bi-directional Interpolated) frames.
P frames generally are encoded with reference to a past frame (either an I
frame or a
previous P frame), and, in general, will be used as a reference for future P
frames. P frames
receive a fairly high amount of compression. B frames provide the
highest amount of
compression but generally require both a past and a future reference in order
to be encoded.
Bi-directional frames are never used for reference frames.
Macroblocks within P frames may also be individually encoded using intra-frame
coding. Macroblocks within B frames may also be individually encoded using
intra-frame
coding, forward predicted coding, backward predicted coding, or both forward
and backward,
or bi-directionally interpolated, predicted coding. A macroblock is a 16x16
pixel grouping of
four 8x8 DCT blocks, together with one motion vector for P frames, and one or
two motion
vectors for B frames.
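The macroblock structure just described can be sketched as a data type; the field names are illustrative, not MPEG-2 syntax elements:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Macroblock:
    """A 16x16 pixel macroblock: four 8x8 DCT blocks plus motion vectors."""
    dct_blocks: List[List[int]]                   # four 8x8 blocks of coefficients
    forward_mv: Optional[Tuple[int, int]] = None  # present in P and B frames
    backward_mv: Optional[Tuple[int, int]] = None # present only in B frames

def motion_vector_count(mb: Macroblock) -> int:
    return sum(v is not None for v in (mb.forward_mv, mb.backward_mv))

# One motion vector for a P-frame macroblock, one or two for a B-frame one.
p_mb = Macroblock(dct_blocks=[[0] * 64 for _ in range(4)], forward_mv=(3, -1))
b_mb = Macroblock(dct_blocks=[[0] * 64 for _ in range(4)],
                  forward_mv=(3, -1), backward_mv=(-2, 0))
```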
After coding, an MPEG data bitstream comprises a sequence of I, P, and B
frames. A
sequence may consist of almost any pattern of I, P, and B frames (there are a
few minor
semantic restrictions on their placement). However, it is common in industrial
practice to have
a fixed pattern (e.g., IBBPBBPBBPBBPBB).
As an important part of the present invention, an MPEG-2 data stream is
created
comprising a base layer, at least one optional temporal enhancement layer, and
an optional
resolution enhancement layer. Each of these layers will be described in
detail.

CA 02486448 2004-11-17
WO 2004/012455 PCT/US2002/018884
Temporal Scalability
Base Layer: The base layer is used to carry 36 Hz source material. In the
preferred
embodiment, one of two MPEG-2 frame sequences can be used for the base layer:
IBPBPBP
or IPPPPPP. The latter pattern is most preferred, since the decoder would only
need to decode
P frames, reducing the required memory bandwidth if 24 Hz movies were also
decoded
without B frames.
72 Hz Temporal Enhancement Layer. When using MPEG-2 compression, it is
possible
to embed a 36 Hz temporal enhancement layer as B frames within the MPEG-2
sequence for
the 36 Hz base layer if the P frame distance is even. This allows the single
data stream to
support both 36 Hz display and 72 Hz display. For example, both layers could
be decoded to
generate a 72 Hz signal for computer monitors, while only the base layer might
be decoded
and converted to generate a 60 Hz signal for television.
In the preferred embodiment, the MPEG-2 coding patterns of IPBBBPBBBPBBBP or
IPBPBPBPB both allow placing alternate frames in a separate stream containing
only temporal
enhancement B frames to take 36 Hz to 72 Hz. These coding patterns are shown
in FIGS 2 and
3, respectively. The 2-Frame P spacing coding pattern of FIG 3 has the added
advantage that
the 36 Hz decoder would only need to decode P frames, reducing the required
memory
bandwidth if 24 Hz movies were also decoded without B frames.
Experiments with high resolution images have suggested that the 2-Frame P
spacing of
FIG 3 is optimal for most types of images. That is, the construction in FIG 3
appears to offer
the optimal temporal structure for supporting both 60 and 72 Hz, while
providing excellent
results on modern 72 Hz computer-compatible displays. This construction allows
two digital
streams, one at 36 Hz for the base layer, and one at 36 Hz for the enhancement
layer B frames
to achieve 72 Hz. This is illustrated in FIG 4, which is a block diagram
showing that a 36 Hz
base layer MPEG-2 decoder 50 simply decodes the P frames to generate 36 Hz
output, which
may then be readily converted to either 60 Hz or 72 Hz display. An optional
second decoder 52
simply decodes the B frames to generate a second 36 Hz output, which when
combined with
the 36 Hz output of the base layer decoder 50 results in a 72 Hz output (a
method for
combining is discussed below). In an alternative embodiment, one fast MPEG-2
decoder 50
could decode both the P frames for the base layer and the B frames for the
enhancement layer.
Optimal Master Format. A number of companies are building MPEG-2 decoding chips
which operate at around 11 MPixels/second. The MPEG-2 standard has defined
some
"profiles" for resolutions and frame rates. Although these profiles are
strongly biased toward
computer-incompatible format parameters such as 60 Hz, non-square pixels, and
interlace,
many chip manufacturers appear to be developing decoder chips which operate at the "main
profile, main level". This profile is defined to be any horizontal resolution up to 720 pixels,
any vertical resolution up to 576 lines at a frame rate of up to 25 Hz, or up to 480 lines at a
frame rate of up to 30 Hz. A wide range of data rates from approximately 1.5 Mbits/second to
about 10
Mbits/second is also specified. However, from a chip point of view, the main
issue is the rate at
which pixels are decoded. The main-level, main-profile pixel rate is about
10.5
MPixels/second.
Although there is variation among chip manufacturers, most MPEG-2 decoder
chips
will in fact operate at up to 13 MPixels/second, given fast support memory.
Some decoder
chips will go as fast as 20 MPixels/second or more. Given that CPU chips tend
to gain 50%
improvement or more each year at a given cost, one can expect some near-term
flexibility in
the pixel rate of MPEG-2 decoder chips.
TABLE 4 illustrates some desirable resolutions and frame rates, and their
corresponding pixel rates.

Resolution Frame Rate Pixel Rate
X Y (Hz) (MPixels/s)
640 480 36 11.1
720 486 36 12.6
720 486 30 (for comparison) 10.5
704 480 36 12.2
704 480 30 (for comparison) 10.1
680 512 36 12.5
1024 512 24 12.6
TABLE 4

All of these formats can be utilized with MPEG-2 decoder chips that can
generate at
least 12.6 MPixels/second. The very desirable 640x480 at 36 Hz format can be
achieved by
nearly all current chips, since its rate is 11.1 MPixels/second. A widescreen
1024x512 image
can be squeezed into 680x512 using a 1.5:1 squeeze, and can be supported at 36
Hz if 12.5
MPixels/second can be handled. The highly desirable square pixel widescreen
template of
1024x512 can achieve 36 Hz when MPEG-2 decoder chips can process about 18.9
MPixels/second. This becomes more feasible if 24 Hz and 36 Hz material is
coded only with P
frames, such that B frames are only required in the 72 Hz temporal enhancement
layer
decoders. Decoders which use only P frames require less memory and memory
bandwidth,
making the goal of 19 MPixels/second more accessible.
The 1024x512 resolution template would most often be used with 2.35:1 and
1.85:1
aspect ratio films at 24 fps. This material only requires 11.8 MPixels/second,
which should fit
within the limits of most existing main level-main profile decoders.
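The pixel rates in TABLE 4 are simply width x height x frame rate; a small helper (illustrative only) reproduces them:

```python
def pixel_rate_mpix(width, height, hz):
    """Decoder pixel rate in MPixels/second for a given format."""
    return width * height * hz / 1e6

assert round(pixel_rate_mpix(640, 480, 36), 1) == 11.1
assert round(pixel_rate_mpix(720, 486, 36), 1) == 12.6
assert round(pixel_rate_mpix(1024, 512, 36), 1) == 18.9  # square-pixel widescreen
```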
All of these formats are shown in FIG. 6 in a "master template" for a base
layer at 24 or
36 Hz. Accordingly, the present invention provides a unique way of
accommodating a wide
variety of aspect ratios and temporal resolution compared to the prior art.
(Further discussion
of a master template is set forth below).
The temporal enhancement layer of B frames to generate 72 Hz can be decoded
using a
chip with double the pixel rates specified above, or by using a second chip in
parallel with
additional access to the decoder memory. Under the present invention, at least
two ways exist
for merging of the enhancement and base layer data streams to insert the
alternate B frames.
First, merging can be done invisibly to the decoder chip using the MPEG-2
transport layer. The
MPEG-2 transport packets for two PIDs (Program IDs) can be recognized as
containing the
base layer and enhancement layer, and their stream contents can both be simply
passed on to a
double-rate capable decoder chip, or to an appropriately configured pair of
normal rate
decoders. Second, it is also possible to use the "data partitioning" feature
in the MPEG-2 data
stream instead of the transport layer from MPEG-2 systems. The data
partitioning feature
allows the B frames to be marked as belonging to a different class within the
MPEG-2
compressed data stream, and can therefore be flagged to be ignored by 36-Hz
decoders which
only support the temporal base layer rate.
Temporal scalability, as defined by MPEG-2 video compression, is not as
optimal as
the simple B frame partitioning of the present invention. The MPEG-2 temporal
scalability is
only forward referenced from a previous P or B frame, and thus lacks the
efficiency available
in the B frame encoding proposed here, which is both forward and backward
referenced.
Accordingly, the simple use of B frames as a temporal enhancement layer
provides a simpler
and more efficient temporal scalability than does the temporal scalability
defined within
MPEG-2. Notwithstanding, this use of B frames as the mechanism for temporal
scalability is
fully compliant with MPEG-2. The two methods of identifying these B frames as
an
enhancement layer, via data partitioning or alternate PID's for the B frames,
are also fully
compliant.
50/60 Hz Temporal enhancement layer. In addition to, or as an alternative to,
the 72 Hz
temporal enhancement layer described above (which encodes a 36 Hz signal), a
60 Hz
temporal enhancement layer (which encodes a 24 Hz signal) can be added in
similar fashion to
the 36 Hz base layer. A 60 Hz temporal enhancement layer is particularly useful
for encoding
existing 60 Hz interlaced video material.
Most existing 60 Hz interlaced material is video tape for NTSC in analog, D1, or D2
or D2
format. There is also a small amount of Japanese HDTV (SMPTE 240/260M). There
are also
cameras which operate in this format. Any such 60 Hz interlaced format can be
processed in
known fashion such that the signal is de-interlaced and frame rate converted.
This process
involves very complex image understanding technology, similar to robot vision.
Even with
very sophisticated technology, temporal aliasing generally will result in
"misunderstandings"
by the algorithm and occasionally yield artifacts. Note that the typical 50%
duty cycle of
image capture means that the camera is "not looking" half the time. The
"backwards wagon
wheels" in movies is an example of temporal aliasing due to this normal
practice of temporal
undersampling. Such artifacts generally cannot be removed without human-
assisted
reconstruction. Thus, there will always be cases which cannot be automatically
corrected.
However, the motion conversion results available in current technology should
be reasonable
on most material.




The price of a single high definition camera or tape machine would be similar
to the
cost of such a converter. Thus, in a studio having several cameras and tape
machines, the cost
of such conversion becomes modest. However, performing such processing
adequately is
presently beyond the budget of home and office products. Thus, the complex
processing to
remove interlace and convert the frame rate for existing material is
preferably accomplished at
the origination studio. This is shown in FIG 5, which is a block diagram
showing 60 Hz
interlaced input from cameras 60 or other sources (such as non-film video
tape) 62 to a
converter 64 that includes a de-interlacer function and a frame rate
conversion function that
can output a 36 Hz signal (36 Hz base layer only) and a 72 Hz signal (36 Hz
base layer plus
36 Hz from the temporal enhancement layer).
As an alternative to outputting a 72 Hz signal (36 Hz base layer plus 36 Hz
from the
temporal enhancement layer), this conversion process can be adapted to produce
a second
MPEG-2 24 Hz temporal enhancement layer on the 36 Hz base layer which would
reproduce
the original 60 Hz signal, although de-interlaced. If similar quantization is
used for the 60 Hz
temporal enhancement layer B frames, the data rate should be slightly less
than the 72 Hz
temporal enhancement layer, since there are fewer B frames.
60I -> 36 + 36 = 72
60I -> 36 + 24 = 60
72  -> 36, 72, 60
50I -> 36, 50, 72
60  -> 24, 36, 72

The vast majority of material of interest to the United States is low
resolution NTSC.
At present, most NTSC signals are viewed with substantial impairment on most
home
televisions. Further, viewers have come to accept the temporal impairments
inherent in the use
of 3-2 pulldown to present film on television. Nearly all prime-time
television is made on film
at 24 frames per second. Thus, only sports, news, and other video-original
shows need be
processed in this fashion. The artifacts and losses associated with converting
these shows to a
36/72 Hz format are likely to be offset by the improvements associated with
high-quality
de-interlacing of the signal.
Note that the motion blur inherent in the 60 Hz (or 59.94 Hz) fields should be
very
similar to the motion blur in 72 Hz frames. Thus, this technique of providing
a base and
enhancement layer should appear similar to 72 Hz origination in terms of
motion blur.
Accordingly, few viewers will notice the difference, except possibly as a
slight improvement,
when interlaced 60 Hz NTSC material is processed into a 36 Hz base layer, plus
24 Hz from
the temporal enhancement layer, and displayed at 60 Hz. However, those who buy
new 72 Hz
digital non-interlaced televisions will notice a small improvement when
viewing NTSC, and a
major improvement when viewing new material captured or originated at 72 Hz.
Even the
decoded 36 Hz base layer presented on 72 Hz displays will look as good as high
quality digital
NTSC, replacing interlace artifacts with a slower frame rate.
The same process can also be applied to the conversion of existing PAL 50 Hz
material
to a second MPEG-2 enhancement layer. PAL video tapes are best slowed to 48 Hz
prior to
such conversion. Live PAL requires conversion using the relatively unrelated
rates of 50, 36,
and 72 Hz. Such converter units presently are only affordable at the source of
broadcast
signals, and are not presently practical at each receiving device in the home
and office.
Resolution Scalability
It is possible to enhance the base resolution template using hierarchical
resolution
scalability utilizing MPEG-2 to achieve higher resolutions built upon a base
layer. Use of
enhancement can achieve resolutions at 1.5x and 2x the base layer. Double
resolution can be
built in two steps, by using 3/2 then 4/3, or it can be a single factor-of-two
step. This is shown
in FIG. 7.
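The layering arithmetic above can be checked with a small helper (illustrative only):

```python
def resolution_ladder(base, factors):
    """Apply successive scale factors to a (width, height) base
    resolution, returning every layer in the ladder."""
    w, h = base
    layers = [(w, h)]
    for f in factors:
        w, h = round(w * f), round(h * f)
        layers.append((w, h))
    return layers

# Doubling in two steps, 3/2 then 4/3:
assert resolution_ladder((1024, 512), [3 / 2, 4 / 3]) == [
    (1024, 512), (1536, 768), (2048, 1024)]
# Or a single factor-of-two step:
assert resolution_ladder((1024, 512), [2]) == [(1024, 512), (2048, 1024)]
```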
The process of resolution enhancement can be achieved by generating a
resolution
enhancement layer as an independent MPEG-2 stream and applying MPEG-2
compression to
the enhancement layer. This technique differs from the "spatial scalability"
defined with
MPEG-2, which has proven to be highly inefficient. However, MPEG-2 contains
all of the
tools to construct an effective layered resolution to provide spatial
scalability. The preferred
layered resolution encoding process of the present invention is shown in FIG
8. The preferred
decoding process of the present invention is shown in FIG 9.
Resolution Layer Coding. In FIG 8, an original 2kx1k image 80 is filtered in
conventional fashion to 1/2 resolution in each dimension to create a 1024x512
base layer 81.
The base layer 81 is then compressed according to conventional MPEG-2
algorithms,
generating an MPEG-2 base layer 82 suitable for transmission. Importantly,
full MPEG-2
motion compensation can be used during this compression step. That same signal
is then
decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 83.
The
1024x512 image 83 is expanded (for example, by pixel replication, or
preferably by better
filters such as spline interpolation) to a first 2kx1k enlargement 84.
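A numerical sketch of this encoding path (NumPy stand-ins; the `compress`/`decompress` callables are hypothetical placeholders for an MPEG-2 codec, and the 2x2 box filter and pixel replication stand in for the better filters the text prefers):

```python
import numpy as np

def encode_base_and_difference(original, compress, decompress):
    """FIG. 8 sketch: filter to half resolution, compress the base
    layer, then form the difference between the original and the
    expanded decoded base as the enhancement-layer source."""
    h, w = original.shape
    base = original.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # layer 81
    compressed_base = compress(base)                                  # layer 82
    decoded = decompress(compressed_base)                             # image 83
    enlargement = decoded.repeat(2, axis=0).repeat(2, axis=1)         # image 84
    difference = original - enlargement    # enhancement source, before
    return compressed_base, difference     # weighting and feathering

img = np.arange(16.0).reshape(4, 4)
cb, diff = encode_base_and_difference(img, lambda x: x, lambda x: x)
assert cb.shape == (2, 2) and diff.shape == (4, 4)
```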

Meanwhile, as an optional step, the filtered 1024x512 base layer 81 is
expanded to a
second 2kx1k enlargement 85. This second 2kx1k enlargement 85 is subtracted
from the
original 2kx1k image 80 to generate an image that represents the top octave of
resolution
between the original high resolution image 80 and the original base layer
image 81. The
resulting image is optionally multiplied by a sharpness factor or weight, and
then added to the
difference between the original 2kx1k image 80 and the first 2kx1k
enlargement 84 to
generate a center-weighted 2kx1k enhancement layer source image 86. This
enhancement layer
source image 86 is then compressed according to conventional MPEG-2
algorithms,
generating a separate MPEG-2 resolution enhancement layer 87 suitable for
transmission.
Importantly, full MPEG-2 motion compensation can be used during this
compression step.
Resolution Layer Decoding. In FIG 9, the base layer 82 is decompressed using
conventional MPEG-2 algorithms back to a 1024x512 image 90. The 1024x512 image
90 is
expanded to a first 2kx1k image 91. Meanwhile, the resolution enhancement
layer 87 is
decompressed using conventional MPEG-2 algorithms back to a second 2kx1k image
92. The
first 2kx1k image 91 and the second 2kx1k image 92 are then added to generate
a high-
resolution 2kx1k image 93.
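The decode path can be sketched in the same spirit (hypothetical NumPy stand-ins; pixel replication stands in for the decoder's expansion filter):

```python
import numpy as np

def decode_layers(compressed_base, compressed_enh, decompress):
    """FIG. 9 sketch: decode the base layer, expand it, decode the
    enhancement layer, and add the two to recover full resolution."""
    base = decompress(compressed_base)                    # image 90
    enlarged = base.repeat(2, axis=0).repeat(2, axis=1)   # image 91
    enhancement = decompress(compressed_enh)              # image 92
    return enlarged + enhancement                         # image 93

base = np.full((2, 2), 1.0)
enh = np.full((4, 4), 0.25)
out = decode_layers(base, enh, lambda x: x)
assert out.shape == (4, 4) and np.allclose(out, 1.25)
```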
Improvements Over MPEG-2. In essence, the enhancement layer is created by
expanding the decoded base layer, taking the difference between the original
image and the
decoded base layer, and compressing that difference. However, a compressed resolution
enhancement layer
may be optionally added to the base layer after decoding to create a higher
resolution image in
the decoder. The inventive layered resolution encoding process differs from
MPEG-2 spatial
scalability in several ways:

= The enhancement layer difference picture is compressed as its own MPEG-2
data
stream, with I, B, and P frames. This difference represents the major reason
that
resolution scalability, as proposed here, is effective, where MPEG-2 spatial
scalability
is ineffective. The spatial scalability defined within MPEG-2 allows an upper
layer to
be coded as the difference between the upper layer picture and the expanded
base layer,
or as a motion compensated MPEG-2 data stream of the actual picture, or a
combination of both. However, neither of these encodings is efficient. The
difference
from the base layer could be considered as an I frame of the difference, which
is
inefficient compared to a motion-compensated difference picture, as in the
present
invention. The upper-layer encoding defined within MPEG-2 is also inefficient,
since it
is identical to a complete encoding of the upper layer. The motion compensated
encoding of the difference picture, as in the present invention, is therefore
substantially
more efficient.

= Since the enhancement layer is an independent MPEG-2 data stream, the MPEG-2
systems transport layer (or another similar mechanism) must be used to
multiplex the
base layer and enhancement layer.

= The expansion and resolution reduction filtering can be a gaussian or spline
function,
which are more optimal than the bilinear interpolation specified in MPEG-2
spatial
scalability.

= The image aspect ratio must match between the lower and higher layers in the
preferred embodiment. In MPEG-2 spatial scalability, extensions to width
and/or
height are allowed. Such extensions are not allowed in the preferred
embodiment due
to efficiency requirements.

= Due to efficiency requirements, and the extreme amounts of compression used
in the
enhancement layer, the entire area of the enhancement layer is not coded.
Usually, the
area excluded from enhancement will be the border area. Thus, the 2kx1k enhancement
enhancement
layer source image 86 in the preferred embodiment is center-weighted. In the
preferred
embodiment, a fading function (such as linear weighting) is used to "feather"
the
enhancement layer toward the center of the image and away from the border edge
to
avoid abrupt transitions in the image. Moreover, any manual or automatic
method of
determining regions having detail which the eye will follow can be utilized to
select
regions which need detail, and to exclude regions where extra detail is not
required. All
of the image has detail to the level of the base layer, so all of the image is
present. Only
the areas of special interest benefit from the enhancement layer. In the
absence of other
criteria, the edges or borders of the frame can be excluded from enhancement,
as in the
center-weighted embodiment described above. The MPEG-2
"lower_layer_prediction_horizontal_offset" and "lower_layer_prediction_vertical_offset"
parameters, used as signed negative integers, combined with the
"horizontal_subsampling_factor_m/n" and "vertical_subsampling_factor_m/n"
values, can be used to specify the enhancement layer rectangle's overall size
and placement within the expanded base layer.

= A sharpness factor is added to the enhancement layer to offset the loss of
sharpness
which occurs during quantization. Care must be taken to utilize this parameter
only to
restore the clarity and sharpness of the original picture, and not to enhance
the image.
As noted above with respect to FIG 8, the sharpness factor is the "high
octave" of
resolution between the original high resolution image 80 and the original base
layer
image 81 (after expansion). This high octave image will be quite noisy, in
addition to
containing the sharpness and detail of the high octave of resolution. Adding
too much
of this image can yield instability in the motion compensated encoding of the
enhancement layer. The amount that should be added depends upon the level of
the
noise in the original image. A typical weighting value is 0.25. For noisy
images, no
sharpness should be added, and it may even be advisable to suppress the noise in the
in the
original for the enhancement layer before compressing using conventional noise
suppression techniques which preserve detail.

= Temporal and resolution scalability are intermixed by utilizing B frames for
temporal
enhancement from 36 to 72 Hz in both the base and resolution enhancement
layers. In
this way, four possible levels of decoding performance are possible with two
layers of
resolution scalability, due to the options available with two levels of
temporal
scalability.
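The border feathering described above can be sketched as a separable linear weighting (illustrative only; the fade width and the linear ramp shape are assumptions):

```python
import numpy as np

def feather_weights(height, width, fade):
    """Weights that fade linearly from 0 at the frame border to 1
    at `fade` pixels in, combined row-by-column so an enhancement
    layer multiplied by them is feathered away from the edges."""
    def ramp(n):
        dist = np.minimum(np.arange(n), np.arange(n)[::-1])
        return np.clip(dist / fade, 0.0, 1.0)
    return np.outer(ramp(height), ramp(width))

w = feather_weights(8, 8, 2)
assert w[0, 0] == 0.0   # fully excluded at the border
assert w[4, 4] == 1.0   # full enhancement at the center
assert w[1, 4] == 0.5   # halfway through the fade
```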

These differences represent substantial improvements over MPEG-2 spatial and
temporal scalability. However, these differences are still consistent with
MPEG-2 decoder
chips, although additional logic may be required in the decoder to perform the
expansion and
addition in the resolution enhancement decoding process shown in FIG 9. Such
additional
logic is nearly identical to that required by the less effective MPEG-2
spatial scalability.
Optional Non-MPEG-2 Coding of the Resolution Enhancement Layer. It is possible
to
utilize a different compression technique for the resolution enhancement layer
than MPEG-2.
Further, it is not necessary to utilize the same compression technology for
the resolution
enhancement layer as for the base layer. For example, motion-compensated block
wavelets can
be utilized to match and track details with great efficiency when the
difference layer is coded.
Even if the most efficient position for placement of wavelets jumps around on
the screen due
to changing amounts of differences, it would not be noticed in the low-
amplitude enhancement
layer. Further, it is not necessary to cover the entire image; it is only
necessary to place the
wavelets on details. The wavelets can have their placement guided by detail
regions in the
image. The placement can also be biased away from the edge.
Multiple Resolution Enhancement Layers. At the bit rates being described here,
where
2 MPixels (2048x1024) at 72 frames per second are being coded in 18.5
mbits/second, only a
base layer (1024x512 at 72fps) and a single resolution enhancement layer have
been
successfully demonstrated. However, the anticipated improved efficiencies
available from
further refinement of resolution enhancement layer coding should allow for
multiple resolution
enhancement layers. For example, it is conceivable that a base layer at
512x256 could be
resolution-enhanced by four layers to 1024x512, 1536x768, and 2048x1024. This
is possible
with existing MPEG-2 coding at the movie frame rate of 24 frames per second.
At high frame
rates such as 72 frames per second, MPEG-2 does not provide sufficient
efficiency in the
coding of resolution-enhancement layers to allow this many layers at present.

Mastering Formats
Utilizing a template at or near 2048x1024 pixels, it is possible to create a
single digital
moving image master format source for a variety of release formats. As shown
in FIG 6, a
2kx1k template can efficiently support the common widescreen aspect ratios of
1.85:1 and
2.35:1. A 2kx1k template can also accommodate 1.33:1 and other aspect ratios.
Although integers (especially the factor of 2) and simple fractions (3/2 &
4/3) are the most
efficient step sizes in resolution layering, it is also possible to use
arbitrary ratios to achieve
any required resolution layering. However, using a 2048x1024 template, or
something near it,
provides not only a high quality digital master format, but also can provide
many other
convenient resolutions from a factor of two base layer (1kx512), including
NTSC, the U.S.
television standard.
It is also possible to scan film at higher resolutions such as 4kx2k, 4kx3k,
or 4kx4k.
Using optional resolution enhancement, these higher resolutions can be created
from a central
master format resolution near 2kx1k. Such enhancement layers for film will
consist of
image detail, grain, and other sources of noise (such as scanner noise).
Because of this
noisiness, the use of compression technology in the enhancement layer for
these very high
resolutions will require alternatives to MPEG-2 types of compression.
Fortunately, other
compression technologies exist which can be utilized for compressing such
noisy signals,
while still maintaining the desired detail in the image. One example of such a
compression
technology is motion compensated wavelets or motion compensated fractals.
Preferably, digital mastering formats should be created in the frame rate of
the film if
from existing movies (i.e., at 24 frames per second). The common use of both 3-2 pulldown
and interlace would be inappropriate for digital film masters. For new digital
electronic
material, it is hoped that the use of 60 Hz interlace will cease in the near
future, and be
replaced by frame rates which are more compatible with computers, such as 72
Hz, as
proposed here. The digital image masters should be made at whatever frame rate
the images
are captured, whether at 72 Hz, 60 Hz, 36 Hz, 37.5 Hz, 75 Hz, 50 Hz, or other
rates.


The concept of a mastering format as a single digital source picture format
for all
electronic release formats differs from existing practices, where PAL, NTSC,
letterbox, pan-
and-scan, HDTV, and other masters are all generally independently made from a
film original.
The use of a mastering format allows both film and digital/electronic shows to
be mastered
once, for release on a variety of resolutions and formats.

Combined Resolution and Temporal Enhancement Layers
As noted above, both temporal and resolution enhancement layering can be
combined.
Temporal enhancement is provided by decoding B frames. The resolution
enhancement layer
also has two temporal layers, and thus also contains B frames.
For 24 fps film, the most efficient and lowest cost decoders might use only P
frames,
thereby minimizing both memory and memory bandwidth, as well as simplifying
the decoder
by eliminating B frame decoding. Thus, in accordance with the present
invention, decoding
movies at 24 fps and decoding advanced television at 36 fps could utilize a
decoder without B
frame capability. B frames can then be utilized between P frames to yield the
higher temporal
layer at 72 Hz, as shown in FIG 3, which could be decoded by a second decoder.
This second
decoder could also be simplified, since it would only have to decode B frames.
Such layering also applies to the enhanced resolution layer, which can
similarly utilize
only P and I frames for 24 and 36 fps rates. The resolution enhancement layer
can add the full
temporal rate of 72 Hz at high resolution by adding B frame decoding within
the resolution
enhancement layer.
The combined resolution and temporal scalable options for a decoder are
illustrated in
FIG 10. This example also shows an allocation of the proportions of an
approximately
18 mbits/second data stream to achieve the spatio-temporal layered Advanced
Television of the
present invention.
In FIG 10, a base layer MPEG-2 1024x512 pixel data stream (comprising only
P frames in the preferred embodiment) is applied to a base resolution decoder
100.
Approximately 5 mbits/second of bandwidth is required for the P frames. The
base resolution
decoder 100 can decode at 24 or 36 fps. The output of the base resolution
decoder 100
comprises low resolution, low frame rate images (1024x512 pixels at 24 or 36
Hz).
The B frames from the same data stream are parsed out and applied to a base
resolution
temporal enhancement layer decoder 102. Approximately 3 mbits/second of
bandwidth is
required for such B frames. The output of the base resolution decoder 100 is
also coupled to
the temporal enhancement layer decoder 102. The temporal enhancement layer
decoder 102
can decode at 36 fps. The combined output of the temporal enhancement layer
decoder 102
comprises low resolution, high frame rate images (1024x512 pixels at 72 Hz).
Also in FIG 10, a resolution enhancement layer MPEG-2 2kx1k pixel data stream
(comprising only P frames in the preferred embodiment) is applied to a base
temporal high
resolution enhancement layer decoder 104. Approximately 6 mbits/second of
bandwidth is
required for the P frames. The output of the base resolution decoder 100 is
also coupled to the
high resolution enhancement layer decoder 104. The high resolution enhancement
layer
decoder 104 can decode at 24 or 36 fps. The output of the high resolution
enhancement layer
decoder 104 comprises high resolution, low frame rate images (2kx1k pixels at
24 or 36 Hz).
The B frames from the same data stream are parsed out and applied to a high
resolution
temporal enhancement layer decoder 106. Approximately 4 mbits/second of
bandwidth is
required for such B frames. The output of the high resolution enhancement
layer decoder 104
is coupled to the high resolution temporal enhancement layer decoder 106. The
output of the
temporal enhancement layer decoder 102 is also coupled to the high resolution
temporal
enhancement layer decoder 106. The high resolution temporal enhancement layer
decoder 106
can decode at 36 fps. The combined output of the high resolution temporal
enhancement layer
decoder 106 comprises high resolution, high frame rate images (2kx1k pixels at
72 Hz).
Note that the compression ratio achieved through this scalable encoding
mechanism is
very high, indicating excellent compression efficiency. These ratios are shown
in TABLE 5 for
each of the temporal and scalability options from the example in FIG 10. These
ratios are
based upon source RGB pixels at 24 bits/pixel. (If the 16 bits/pixel of
conventional 4:2:2
encoding or the 12 bits/pixel of conventional 4:2:0 encoding are factored in,
then the
compression ratios would be 3/4 and 1/2, respectively, of the values shown.)

Layer       Resolution  Rate (Hz)  Data Rate (mb/s, typical)  MPixels/s  Comp. Ratio (typical)
Base        1kx512      36         5                          18.9       90
Base Temp.  1kx512      72         8 (5+3)                    37.7       113
High        2kx1k       36         11 (5+6)                   75.5       165
High Temp.  2kx1k       72         18 (5+3+6+4)               151        201
for comparison:
CCIR 601    720x486     29.97      5                          10.5       50
TABLE 5
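The ratios in TABLE 5 follow directly from the uncompressed source bit rate at 24 bits/pixel RGB divided by the channel data rate (illustrative arithmetic):

```python
def compression_ratio(mpixels_per_sec, data_rate_mbits, bits_per_pixel=24):
    """Ratio of uncompressed source bit rate to compressed data rate."""
    return mpixels_per_sec * bits_per_pixel / data_rate_mbits

assert round(compression_ratio(151, 18)) == 201    # High Temp. layer
assert round(compression_ratio(37.7, 8)) == 113    # Base Temp. layer
# Factoring in 4:2:0's 12 bits/pixel halves the ratio:
assert compression_ratio(151, 18, 12) == compression_ratio(151, 18) / 2
```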

These high compression ratios are enabled by three factors:
1) The high temporal coherence of high-frame-rate 72 Hz images;
2) The high spatial coherence of high resolution 2kx1k images;
3) Application of resolution detail enhancement to the important parts of the
image (e.g., the
central heart), and not to the less important parts (e.g., the borders of the
frame).

These factors are exploited in the inventive layered compression technique by
taking
advantage of the strengths of the MPEG-2 encoding syntax. These strengths
include
bi-directionally interpolated B frames for temporal scalability. The MPEG-2
syntax also
provides efficient motion representation through the use of motion-vectors in
both the base and
enhancement layers. Up to some threshold of high noise and rapid image change,
MPEG-2 is
also efficient at coding details instead of noise within an enhancement layer
through motion
compensation in conjunction with DCT quantization. Above this threshold, the
data bandwidth
is best allocated to the base layer. These MPEG-2 mechanisms work together
when used
according to the present invention to yield highly efficient and effective
coding which is both
temporally and spatially scalable.
In comparison to 5 mbits/second encoding of CCIR 601 digital video, the
compression
ratios in TABLE 5 are much higher. One reason for this is the loss of
coherence that interlace causes in CCIR 601 material. Interlace impairs both
the ability to predict subsequent frames and fields and the correlation
between vertically adjacent pixels. Thus, a major
portion of the gain
in compression efficiency described here is due to the absence of interlace.
The large compression ratios achieved by the present invention can be
considered from
the perspective of the number of bits available to code each MPEG-2
macroblock. As noted
above, macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together
with one
motion vector for P frames, and one or two motion vectors for B frames. The
bits available per
macroblock for each layer are shown in TABLE 6.


Layer            Data Rate (mb/s)  MPixels/s  Average Available
                 (typical)                    Bits/Macroblock
Base             5                 19         68
Base Temporal    8 (5+3)           38         54
High             11 (5+6)          76         37 overall, 20/enh. layer
High w/border    11 (5+6)          61         46 overall, 35/enh. layer
  around hi-res center
High Temporal    18 (5+3+6+4)      151        30 overall, 17/enh. layer
High Temporal    18 (5+3+6+4)      123        37 overall, 30/enh. layer
  w/border around hi-res center
for comparison:
CCIR 601         5                 10.5       122
TABLE 6
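The averages in TABLE 6 follow from simple arithmetic: a macroblock covers 16x16 = 256 pixels, so the average bits available per macroblock equal the data rate divided by the macroblock rate. A minimal sketch, using figures from the tables above:

```python
def bits_per_macroblock(data_rate_mbps: float, pixel_rate_mpixels: float) -> float:
    """Average bits available per 16x16-pixel macroblock."""
    macroblocks_per_second = pixel_rate_mpixels * 1e6 / (16 * 16)
    return data_rate_mbps * 1e6 / macroblocks_per_second

# Figures from TABLE 6:
print(round(bits_per_macroblock(5, 18.9)))   # base layer -> 68
print(round(bits_per_macroblock(11, 75.5)))  # high (overall) -> 37
print(round(bits_per_macroblock(5, 10.5)))   # CCIR 601 -> 122
```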

The available number of bits to code each macroblock is smaller in the
enhancement
layer than in the base layer. This is appropriate, since it is desirable for
the base layer to have
as much quality as possible. The motion vector requires 8 bits or so, leaving
10 to 25 bits for
the macroblock type codes and for the DC and AC coefficients for all four 8x8
DCT blocks.
This leaves room for only a few "strategic" AC coefficients. Thus,
statistically, most of the
information available for each macroblock must come from the previous frame of
an
enhancement layer.
It is easily seen why the MPEG-2 spatial scalability is ineffective at these
compression
ratios, since there is not sufficient data space available to code enough DC
and AC coefficients
to represent the high octave of detail represented by the enhancement
difference image. The
high octave is represented primarily in the fifth through eighth horizontal
and vertical AC
coefficients. These coefficients cannot be reached if there are only a few
bits available per
DCT block.
The system described here gains its efficiency by utilizing motion compensated
prediction from the previous enhancement difference frame. This is
demonstrably effective in
providing excellent results in temporal and resolution (spatial) layered
encoding.
Graceful Degradation
The temporal scaling and resolution scaling techniques
described here work well for normal-running material at 72 frames per second
using a 2kx1k
original source. These techniques also work well on film-based material which
original source. These techniques also work well on film-based material which
runs at 24 fps.
At high frame rates, however, when a very noise-like image is coded, or when
there are


numerous shot cuts within an image stream, the enhancement layers may lose the
coherence
between frames which is necessary for effective coding. Such loss is easily
detected, since the
buffer-fullness/rate-control mechanism of a typical MPEG-2 encoder/decoder
will attempt to
set the quantizer to very coarse settings. When this condition is encountered,
all of the bits
normally used to encode the resolution enhancement layers can be allocated to
the base layer,
since the base layer will need as many bits as possible in order to code the
stressful material.
For example, at between about 0.5 and 0.33 MPixels per frame for the base
layer, at 72 frames
per second, the resultant pixel rate will be 24 to 36 MPixels/second. Applying
all of the
available bits to the base layer provides about 0.5 to 0.67 million additional
bits per frame at
18.5 mbits/second, which should be sufficient to code very well, even on
stressful material.
Under more extreme cases, where every frame is very noise-like and/or there
are cuts
happening every few frames, it is possible to gracefully degrade even further
without loss of
resolution in the base layer. This can be done by removing the B frames coding
the temporal
enhancement layer, thus allowing use of all of the available bandwidth (bits)
for the I and P
frames of the base layer at 36 fps. This increases the amount of data
available for each base
layer frame to between about 1.0 and 1.5 mbits/frame (depending on the
resolution of the base
layer). This will still yield the fairly good motion rendition rate of 36 fps
at the fairly high
quality resolution of the base layer, under what would be extremely stressful
coding
conditions. However, if the base-layer quantizer is still operating at a
coarse level under about
18.5 mbits/second at 36 fps, then the base layer frame rate can be dynamically
reduced to 24,
18, or even 12 frames per second (which would make available between 1.5 and 4
mbits for
every frame), which should be able to handle even the most pathological moving
image types.
Methods for changing frame rate in such circumstances are known in the art.
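The degradation cascade described above can be summarized as a simple decision procedure. This is only a sketch: the quantizer thresholds and configuration field names are illustrative assumptions, not values taken from the text:

```python
def degrade(base_quantizer: int) -> dict:
    """Shed layers progressively as the base-layer quantizer grows coarse.
    Thresholds here are illustrative placeholders; a real encoder would
    consult its buffer-fullness/rate-control state."""
    cfg = {"resolution_enhancement": True, "temporal_enhancement": True,
           "base_fps": 36}
    if base_quantizer > 20:   # stressful: give all enhancement bits to the base
        cfg["resolution_enhancement"] = False
    if base_quantizer > 40:   # extreme: drop B frames, all bits to I/P at 36 fps
        cfg["temporal_enhancement"] = False
    for threshold, fps in ((50, 24), (55, 18), (60, 12)):
        if base_quantizer > threshold:   # pathological: lower base frame rate
            cfg["base_fps"] = fps
    return cfg

print(degrade(10))  # normal material: all layers retained
print(degrade(65))  # pathological material: base layer only, at reduced rate
```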
The current proposal for U.S. advanced television does not allow for these
methods of
graceful degradation, and therefore cannot perform as well on stressful
material as the
inventive system.
In most MPEG-2 encoders, the adaptive quantization level is controlled by the
output
buffer fullness. At the high compression ratios involved in the resolution
enhancement layer of
the present invention, this mechanism may not function optimally. Various
techniques can be
used to optimize the allocation of data to the most appropriate image regions.
The conceptually
simplest technique is to perform a pre-pass of encoding over the resolution
enhancement layer
to gather statistics and to search out details which should be preserved. The
results from the
pre-pass can be used to set the adaptive quantization to optimize the
preservation of detail in
the resolution enhancement layer. The settings can also be artificially biased
to be non-uniform
over the image, so that bits for image detail are allocated toward the main
screen regions, and
away from the macroblocks at the extreme edges of the frame.
Except for leaving an enhancement-layer border at high frame rates, none of
these
adjustments are required, since existing decoders function well without such
improvements.
However, these further improvements are available with a small extra effort in
the
enhancement layer encoder.

Conclusion
The choice of 36 Hz as a new common ground temporal rate appears to be
optimal.
Demonstrations of the use of this frame rate indicate that it provides
significant improvement
over 24 Hz for both 60 Hz and 72 Hz displays. Images at 36 Hz can be created
by utilizing
every other frame from 72 Hz image capture. This allows combining a base layer
at 36 Hz
(preferably using P frames) and a temporal enhancement layer at 36 Hz (using B
frames) to
achieve a 72 Hz display.
The "future-looking" rate of 72 Hz is not compromised by the inventive
approach,
while providing a transition for 60 Hz analog NTSC display. The invention also
allows a
transition for other 60 Hz displays, if other passive-entertainment-only
(computer
incompatible) 60 Hz formats under consideration are accepted.
Resolution scalability can be achieved by using a separate MPEG-2 image
data
stream for a resolution enhancement layer. Resolution scalability can take
advantage of the
B frame approach to provide temporal scalability in both the base resolution
and enhancement
resolution layers.
The invention described here achieves many highly desirable features. It has
been
claimed by some involved in the U.S. advanced television process that neither
resolution nor
temporal scalability can be achieved at high definition resolutions within the
approximately
18.5 mbits/second available in terrestrial broadcast. However, the present
invention achieves
both temporal and spatial-resolution scalability within this available data
rate.
It has also been claimed that 2 MPixels at high frame rates cannot be achieved
without
the use of interlace within the available 18.5 mbits/second data rate.
However, achieves not
only resolution (spatial) and temporal scalability, it can provide 2 MPixels
at 72 frames per
second.

In addition to providing these capabilities, the present invention is also
very robust,
particularly compared to the current proposal for advanced television. This is
made possible by
the allocation of most or all of the bits to the base layer when very
stressful image material is
encountered. Such stressful material is by its nature both noise-like and very
rapidly changing.
In these circumstances, the eye cannot see detail associated with the
enhancement layer of
resolution. Since the bits are applied to the base layer, the reproduced
frames are substantially
more accurate than in the currently proposed advanced television system, which
uses a single
constant higher resolution.
Thus, the inventive system optimizes both perceptual and coding efficiency,
while
providing maximum visual impact. This system provides a very clean image at a
resolution
and frame rate performance that had been considered by many to be impossible.
It is believed
that the inventive system is likely to outperform the advanced television
formats being
proposed at this time. In addition to this anticipated superior performance,
the present
invention also provides the highly valuable features of temporal and
resolution layering.
ENCRYPTION & WATERMARKING
Overview
Layered compression allows a form of modularized decomposition of an image
that
supports flexible encryption and watermarking techniques. Using layered
compression, the
base layer and various internal components of the base layer can be used to
encrypt and/or
watermark a compressed layered movie data stream. Encrypting and watermarking
the
compressed data stream reduces the amount of required processing compared to a
high
resolution data stream, which must be processed at the rate of the original
data. The amount
of computing time required for encryption and watermarking depends on the
amount of
data that must be processed. For a particular level of computational
resources, reducing the
amount of data through layered compression can yield improved encryption
strength, or
reduced cost of encryption/decryption, or a combination of both.
Encryption allows protection of the compressed image (and audio) data so that
only
users with keys can easily access the information. Layered compression divides
images
into components: a temporal and spatial base layer, plus temporal and spatial
enhancement
layer components. The base layer is the key to decoding a viewable picture.
Thus, only the
temporal and spatial base layer need be encrypted, thereby reducing the
required amount of
computation. The enhancement layers, both temporal and spatial, are of no
value without
the decrypted and decompressed base layer. Accordingly, by using such a
layered subset of
the bits, the entire picture stream can be made unrecognizable by encrypting
only a small
fraction of the bits of the entire stream. A variety of encryption algorithms
and strengths
can be applied to various portions of the layered stream, including
enhancement layers.

Encryption algorithms or keys also can be changed as often as at each slice
boundary (a
data stream structure meant for signal error recovery), to provide greater
intertwining of the
encryption and the picture stream.
Watermarking invisibly (or nearly invisibly) marks copies of a work. The
concept
originates with the practice of placing an identifiable symbol within paper to
ensure that a
document (e.g., money) is genuine. Watermarking allows the tracking of copies
which may
be removed from the possession of an authorized owner or licensee. Thus,
watermarking
can help track lost or stolen copies back to a source, so that the nature of
the method of
theft can be determined and so that those involved in a theft can be
identified.
The concept of watermarking has been applied to images, by attempting to place
a
faint image symbol or signature on top of the real image being presented. The
most widely
held concept of electronic watermarking is that it is a visible low-amplitude
image,
impressed on top of the visible high-amplitude image. However, this approach
alters the
quality of the original image slightly, similar to the process of impressing a
network logo in
the corner of the screen on television. Such alteration is undesirable because
it reduces
picture quality.
In the compressed domain, it is possible to alter signals and impress
watermark
symbols or codes upon them without these watermark alterations being applied
directly in
the visual domain. For example, the DCT transformation operates in frequency
transform
space. Any alterations in this space, especially if corrected from frame to
frame, may be
much less visible (or completely invisible). Watermarking preferably uses low
order bits in
certain coefficients in certain frames of a layered compression movie stream
to provide
reliable identification while being invisible or nearly invisible to the eye.
Watermarking can
be applied to the base layer of a compressed data stream. However, it is
possible to protect
enhancement layers to a much greater degree than the base layer, since the
enhancement
layers are very subtle in detail to begin with. Each enhancement layer can
have its own
unique identifying watermark structure.
In general, care must be taken to ensure that encryption and watermarking are
co-
mingled such that the watermark cannot be easily stripped out of the stream.
For this
reason, it is valuable to apply watermarks in a variety of useful locations
within a layered
data stream. However, since the watermark is most useful in detection of
pirates and the
path of the piracy, one must assume that encryption may have been completely
or partially
compromised, and thus watermarking should be robustly ingrained in the data
stream in
such a way that no simple procedure can be applied to remove the various
watermarks. The
preferred approach is to have a secure master representation of a work and
provide random
variations from the master to uniquely create each watermark. Such random
variations
cannot be removed, since there is no way from the final stream to detect what
the variations
might have been. However, to guard against additional random variations being
added to
the pirated stream to confound the watermark (perhaps by adding visible levels
of noise to
the image), it is useful to have a variety of other techniques (such as the
motion vector
second-best technique described below) to define watermarks.
Encryption preferably operates in such a fashion as to scramble, or at least
visibly
impair, as many frames as possible from the smallest possible units of
encryption.
Compression systems such as the various types of MPEG and motion-compensated
wavelets utilize a hierarchy of units of information which must be processed
in cascade in
order to decode a range of frames (a "Group of Pictures," or GOP). This
characteristic
affords opportunities early in the range of concatenated decoded units to
encrypt in such a
way as to scramble a large range of frames from a small number of parameters.
Further, to
protect a work commercially, not every unit need by encrypted or confounded by
the
encryption of a higher-level unit. For example, a film may be rendered
worthless to pirates
if every other minute of film frames or particularly important plot or action
scenes are
encrypted or confounded.
In contrast, watermarking has the goal of placing a symbol and/or
serial-number-style identification marks on the image stream which are detectable to
analysis, but which
are invisible or nearly invisible in the image (i.e., yielding no significant
visual
impairment). Thus, watermarking preferably is applied in portions of the
decoding unit
chain which are near the end of the hierarchy of units, to yield a minimum
impact on each
frame within a group of frames.
For example, FIG 11 shows a diagram of the scope of encryption and
watermarking
as a function of unit dependency with respect to I, P, and B frames.
Encryption of any
frame confounds all subsequent dependent frames. Thus, encryption of the first
I frame
confounds all P and B frames derived from that I frame. In contrast, a
watermark on that I
frame generally would not carry over to subsequent frames, and thus it is
better to
watermark the larger number of B frames to provide greater prevalence of the
watermark
throughout the data stream.
Units of Video Information. A compressed MPEG-type or motion-compensated
wavelets bitstream is normally parsed by extracting and processing various
fundamental
units of compressed information in video. This is true of the most efficient
compression


systems such as MPEG-2, MPEG-4, and motion-compensated wavelets (considering
wavelets to have I, P, and B frame equivalents). Such units may consist of
multi-frame
units (such as a GOP), single frame units (e.g., I, P, and B frame types and
their motion-
compensated-wavelet equivalents), sub-frame units (such as AC and DC
coefficients,
macro blocks, and motion vectors), and "distributed units" (described below).
When GOPs are used as a unit of encryption, each GOP can be encrypted with
independent methods and/or keys. In this way, each GOP can have the benefits
of unique
treatment and modularity, and can be decoded and/or decrypted in parallel or
out-of-order
with other GOPs in non-realtime or near-realtime (slightly delayed by a few
seconds)
applications (such as electronic cinema and broadcast). The final frames need
only be
ordered for final presentation.
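The per-GOP modularity described above can be sketched as follows. The XOR keystream here is a toy stand-in for a real cipher (e.g., AES), and the GOP payloads and key-derivation scheme are invented for illustration:

```python
import hashlib

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (SHA-256 keystream in counter mode), standing in
    for a real cipher; used here only to illustrate per-GOP keying."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

# Each GOP is a self-contained unit with its own key, so GOPs can be
# decrypted in parallel or out of order, then merely re-ordered for display.
gops = [b"GOP 0: I P B B P ...", b"GOP 1: I B B P ...", b"GOP 2: I P P ..."]
keys = [hashlib.sha256(b"site-key" + bytes([i])).digest() for i in range(3)]
encrypted = [xor_stream(g, k) for g, k in zip(gops, keys)]
decrypted = [xor_stream(c, k) for c, k in zip(encrypted, keys)]  # XOR is its own inverse
assert decrypted == gops and encrypted[0] != gops[0]
```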
As suggested above, encryption of certain units may confound proper decoding
of
other units that depend on information derived from the encrypted unit. That is, some
That is, some
information within a frame may be required for decoding the video information
of
subsequent frames; encrypting only the earlier frame confounds decoding of
later frames
that are not otherwise encrypted. Thus, in selecting units to encrypt, it is
useful to note how
encryption of particular units can confound the usability of other, related
units. For
example, multiple frames spanning a GOP are influenced at various levels as
set forth in
TABLE 7:

Encryption of this Unit:     Confounds:
I frame starting a GOP       entire associated GOP
P frame within a GOP         the remainder of the GOP
B frame within a GOP         only itself as a frame
TABLE 7
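The rules of TABLE 7 can be expressed directly as a dependency function. This sketch uses a simplified display-order model of a hypothetical GOP; actual B-frame reference handling is more involved:

```python
def confounded(gop: list, encrypted_index: int) -> set:
    """Frames rendered undecodable when one frame of a GOP is encrypted,
    following TABLE 7 (simplified display-order model)."""
    kind = gop[encrypted_index]
    if kind == "I":                                   # I starts the GOP:
        return set(range(len(gop)))                   # whole GOP confounded
    if kind == "P":                                   # P frame:
        return set(range(encrypted_index, len(gop)))  # remainder of the GOP
    return {encrypted_index}                          # B frame: only itself

gop = ["I", "B", "B", "P", "B", "B", "P"]
print(sorted(confounded(gop, 0)))  # -> [0, 1, 2, 3, 4, 5, 6]
print(sorted(confounded(gop, 3)))  # -> [3, 4, 5, 6]
print(sorted(confounded(gop, 1)))  # -> [1]
```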

Further, an entire frame need not be encrypted to confound some or all of a
GOP.
Sub-units of frames may be encrypted and still have a confounding effect,
while reducing encryption and decryption processing time. For example,
encryption of certain intra-frame units influences subsequent frames at
various levels as set forth in
TABLE 8:

Encryption of this Unit:                           Confounds:

DC coefficients in an I frame (the top             all frames in the associated GOP
  hierarchy unit of DC coefficients)
DC coefficients in a P frame                       the remainder of the associated GOP
DC coefficients in a B frame (the bottom           only that frame
  hierarchy unit of DC coefficients)
motion vectors in a P frame                        the remainder of the associated GOP
motion vectors in a B frame                        only that frame
AC coefficients in an I frame (top hierarchy       the remainder of the associated GOP
  unit of AC coefficients)
AC coefficients in a P frame (bottom hierarchy     the remainder of the associated GOP
  unit of AC coefficients)
macroblock mode bits in a P frame (e.g., the       the remainder of the associated GOP
  macroblock modes "forward predict", "intra",
  and "4MV")
macroblock mode bits in a B frame ("forward",      only that frame
  "backward", "bi-directional", "direct" mode
  bits in MPEG-4)
"slice boundaries" (usually left-edge beginnings   only the following slice
  of macroblock lines, where various parameters
  are reset)
left column of macroblocks for each P and B        that P frame and subsequent P frames
  frame (e.g., motion vectors are reset at the     for P; the corresponding B frame for B
  left column, with each macroblock to the right
  being differentially determined from the left
  (and possibly above in MPEG-4))
base layer in a resolution enhancement layer       itself and all higher resolution layers
  system (top hierarchy unit in resolution)
enhancement layer in resolution (bottom and        itself and all higher resolution layers
  lower units of resolution)
base temporal layer (top hierarchy unit)           itself and all higher temporal layers
temporal enhancement layer(s) (B frames,           its own frames
  lower temporal hierarchy units)
TABLE 8

Delay can be applied in many applications (such as broadcast and digital
cinema),
allowing an aggregation of items from units of like types to be encrypted
before
transmission. This allows for a "distributed unit", where the bits comprising
an
encryption/decryption unit are physically allocated across a data stream in
conventional
units of the type described above, making decrypting without knowledge of the
key even
more difficult. For decryption, a sufficient number of conventional units
would be
aggregated (e.g., in a buffer) and decrypted as a group. For example, DC
coefficients can
be collected into groups for an entire frame or GOP. Similarly, motion vectors
are coded

differentially and predicted from one macroblock to the next
throughout the
frame, and thus can be encrypted and decrypted in aggregations. Variable-
length-coding
tables can also be aggregated into groups and form modular units between
"start codes".
Additional examples of units or subunits that can be aggregated, encrypted,
and then have
the encrypted bits separated or spread in the data stream include: motion
vectors, DC
coefficients, AC coefficients, and quantizer scale factors.
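A distributed unit can be sketched as gather, encrypt-as-a-group, and redistribute. The macroblock records and single-byte XOR "cipher" below are purely illustrative:

```python
# Illustrative macroblock records; only the "dc" field forms the distributed unit.
stream = [{"dc": 12, "ac": [3, 0, -1]},
          {"dc": 15, "ac": [0]},
          {"dc": 9,  "ac": [2, 2]}]

KEY = 0x5A                                            # toy key; XOR stands in for a real cipher
aggregate = bytes(mb["dc"] & 0xFF for mb in stream)   # 1. gather the unit
cipher = bytes(b ^ KEY for b in aggregate)            # 2. encrypt as one group
for mb, enc in zip(stream, cipher):                   # 3. spread cipher bytes back
    mb["dc"] = enc                                    #    into their stream positions

# Decryption mirrors the process: gather, decrypt as a group, redistribute.
recovered = [mb["dc"] ^ KEY for mb in stream]
assert recovered == [12, 15, 9]
```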

Application of Encryption
In the preferred embodiment, one or more of the units described above (or
other
data stream units with similar properties) may be selected for encryption, and
each unit can
be encrypted independently rather than as a combined stream (as with MPEG-1,
MPEG-2,
and MPEG-4). Encryption of each unit may use different keys of different
strengths (e.g.,
number of bits per key) and may use different encryption algorithms.
Encryption can be applied uniquely to each distinct copy of a work (when
physical
media is used, such as DVD-RAM), so that each copy has its own key(s).
Alternatively, an
encryption algorithm can be applied on the assembled stream with critical
portions of the
stream removed from the data stream or altered before encryption (e.g., by
setting all
motion vectors for the left-hand macroblocks to zero), thus defining a bulk
distribution
copy. The removed or altered portion can then be encrypted separately and
uniquely for
each display site, thereby defining a custom distribution copy that is sent
separately to
individual sites in a convenient manner (e.g., satellite transmission, modem,
Internet, etc.).
This technique is useful, for example, where the bulk of a work is distributed
on a medium
such as a DVD-ROM, while unique copies of the smaller critical compression
units are
separately sent, each with their own unique keys, to independent recipient
destinations
(e.g., by satellite, Internet, modem, express delivery, etc.). Only when the
custom portion is
decrypted and recombined with the decrypted bulk distribution copy will the
entire work be
decodable as a video signal. The larger the bandwidth (size capacity) of such
custom
information, the larger the portion of the image that can be custom encrypted.
This
technique can be used with watermarking as well.
A variant of this approach is to encrypt a subset of units from a data stream
as a
custom distribution copy, and not encrypt the remaining units at all. The
remaining units
may be distributed in bulk form, separately from the custom distribution copy.
Only when
the custom portion is decrypted and recombined with the unencrypted bulk
distribution
copy will the entire work be decodable as a video signal.

One or more overall encryptions can be concatenated or combined with special
customized encryptions for various of the crucial units of video decoding
information. For
example, the entire video data stream may be "lightly" encrypted (e.g., using
a short key or
simple algorithm) while certain key units of the data stream are more
"heavily" encrypted
(e.g., using a longer key or more complex algorithm). For example, in one
embodiment, the
highest resolution and/or temporal layers may be more heavily encrypted to
define a
premium signal that provides the best appearing image when properly decrypted.
Lower
layers of the image would be unaffected by such encryption. This approach
would allow
different grades of signal service for end-users.
If units are encrypted independently of each other, then decryption may be
performed in parallel using one or more concurrently processed decryption
methods on
separate units within the compressed image stream.

Application of Watermarking
With respect to the units discussed above, and other units having similar
properties,
various points within a compressed video data stream are suitable for applying
watermarks
in various ways, including:

= In transform space or real space or combinations thereof.

= In the least significant bits (LSBs) of the DC coefficients. For example,
the DC
coefficients can have extra bits (10 and 11 bits are allowed in MPEG-2, and up
to 14 bits in MPEG-4). The low order bit(s) can code a specific watermark
identifier without degrading the image in any visible way. Further, these low
order bits might only be present in I frames, since a clear watermark need not
be
present on every frame.

= In noise patterns within the LSBs of the AC coefficients.

= In low amplitude overall picture low frequencies, coded one-frame-to-the-
next,
forming a visually undetectable imaged pattern. For example, this might be a
small number of low signal amplitude letters or numbers on each frame, where
each letter is very large and soft. For example, where a pixel should have a
binary value of "84", the watermark process could instead set the value to
"83";
the watermark at this location thus has a value of "1". The difference is
essentially invisible to the eye, but forms a code in the compressed data
stream.
Such an imaged pattern would be detected by subtracting the decoded image
from the unperturbed (unwatermarked) decompressed original (and also from
the uncompressed original source work), and then greatly increasing the
amplitude. A series of very large blurry letters or numbers would then appear.

= In frames which do not propagate (such as I frames, the last P frame before
an I
frame, and B frames), using marks of extremely low visibility. These frames
also are displayed only briefly.

= At slice boundaries (usually left-edge beginnings of macroblock lines).
Watermarks at these points generally comprise imposed patterns of minor pixel
data
variations. In some cases, these variations form images or symbols that are
invisible or nearly
invisible to the eye due to the very low amplitude of the bit variations in
terms of pixel
brightness and color and/or due to the brevity of display. For example, FIGS.
12A and 12B
show diagrams of image frames 1200 with different types of watermarks. FIG 12A
shows a
frame 1200 with a single symbol ("X") 1202 in one corner. FIG 12B shows a
frame 1200 with
a set of marks (dots, in this example) 1204 scattered around the frame 1200.
Such watermarks
are detectable only by data comparison to yield the watermark signal. For
example, a precise
decoder can detect LSB variations between an original work and a watermarked
work that are
invisible to the eye, but which uniquely watermark the customized copy of the
original work.
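A minimal sketch of LSB watermarking and comparison-based detection follows; the DC coefficient values and mark bits are invented for illustration:

```python
def embed(dc_coeffs, mark_bits):
    """Overwrite the LSBs of the first len(mark_bits) DC coefficients
    with watermark bits (coefficient values are illustrative)."""
    out = list(dc_coeffs)
    for i, bit in enumerate(mark_bits):
        out[i] = (out[i] & ~1) | bit
    return out

def read_mark(marked, n_bits):
    """Read the mark back from the LSBs (embedding positions are known)."""
    return [c & 1 for c in marked[:n_bits]]

master = [100, 87, 254, 33, 140, 71]   # illustrative DC coefficient values
copy_a = embed(master, [1, 0, 1, 1])
assert read_mark(copy_a, 4) == [1, 0, 1, 1]

# Comparison against the securely held master reveals which positions
# of this particular copy were altered:
diffs = [i for i, (m, o) in enumerate(zip(copy_a, master)) if m != o]
print(diffs)  # -> [0, 1, 2]  (position 3 already had LSB 1)
```

Note that positions whose LSB already matched the mark bit show no difference, which is why comparison against the master and direct LSB reads are complementary.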
Other forms of watermarking may be used that do not impose specific images or
symbols, but do form unique patterns in the data streams. For example, certain
decisions of
coding are nearly invisible, and may be used to watermark a data stream. For
example,
minor rate control variations are invisible to the eye, but can be used to
mark each copy
such that each copy has a slightly different number of AC coefficients in some
locations.
Examples of other such decisions include:

= Rate control variations within an I frame.

= Rate control variations within P and B frames.

= Specific AC coefficient allocations, affecting LSBs.

Similarly, second-best choices for motion vectors which are nearly as good as
optimum motion vectors may be used to create a watermark code. Also, a system
can use
second-best selections for exactly the same SADs (sum of absolute differences,
a common
motion vector match criterion) when and where they occur. Other non-optimum
(e.g., third
and higher ranked) motion vector matches can also be used, if needed, with
very little
visual impairment. Such second-choice (and higher) motion vectors need only be
used
occasionally (e.g., a few per frame) in a coherent pattern to form a watermark
code.
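The second-best motion vector technique can be sketched as a small change to the vector search; the candidate (vector, SAD) pairs below are invented for illustration:

```python
def choose_vector(candidates, mark_bit=None):
    """Pick the best-SAD motion vector, or the second-best when this
    macroblock should carry a watermark bit of 1."""
    ranked = sorted(candidates, key=lambda c: c[1])   # ascending SAD
    if mark_bit == 1 and len(ranked) > 1:
        return ranked[1][0]   # nearly-as-good match encodes the mark bit
    return ranked[0][0]

candidates = [((0, 1), 412), ((1, 1), 415), ((2, 0), 498)]
print(choose_vector(candidates))               # -> (0, 1): normal coding
print(choose_vector(candidates, mark_bit=1))   # -> (1, 1): watermarked choice
```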



Image variations are less visible near the periphery of the frame (i.e., near
the top, bottom, right edge, and left edge). It is therefore better to apply
image or
symbol type
watermarks to image edge regions if the selected watermark is possibly
slightly visible.
Watermark methods of very low visibility (such as second-best motion vectors
or rate
control variations) can be used everywhere on the image.
Watermarking also can be coded as a unique serial-number-style code for each
watermarked copy. Thus, 1,000 copies of an original work would each be
watermarked in a
slightly different fashion using one or more techniques described above. By
tracking where
each watermarked copy is shipped, it is possible to determine which copy was
the source of
unauthorized duplication when a watermark is found in an unauthorized copy.

Watermark Detection
Most of these methods for watermarking require that the original decompressed
image be used as a reference for comparison with each watermarked copy in
order to reveal
(decipher) the watermark. The differences between the two images will disclose
the
watermark. Thus, it is necessary to keep the master decompressed source in a
secure place.
Security is required because possession of a copy of the master decompressed
source
provides sufficient information with which to defeat many of the watermarking
methods.
However, theft of the watermarking comparison master is itself detectable,
since the master
is automatically "watermarked" to be a perfect match to itself. When the
master is used to
confound a copy (i.e., find and remove the watermark), it implies possession
of a master.
Use of low amplitude, large blurry symbols or images as a watermark has the
advantage that such symbols or images are detectable not only by comparison
against the
decompressed master source, but also by comparison against the uncompressed
original
work. Thus, an original uncompressed work can be stored in an independent
secure
environment, such that low-amplitude watermarks can be used within the
original
(otherwise unvaried) compressed master source. In this way, a watermark
comparison
reference would remain if either the original work or the
compressed/decompressed master
source are stolen. However, possession of both would allow defeat of both
classes of
watermarks.

Watermark Vulnerability
Important to the use of watermarking is an understanding of methods that might
be
used to defeat or confound the detection of such marks. Some watermark methods
are
subject to confounding by adding small amounts of noise to the image. While
this may
degrade the image quality somewhat, the degradation might still be visually
small, but
sufficient to confound deciphering of the watermark. Watermark techniques
which are
vulnerable to being confounded by adding noise include use of LSBs in DC or AC
coefficients.
Other watermark methods are much more difficult to confound using noise.
Watermark techniques which are resistant to confounding by noise, but which can still be
readily detected, include low-amplitude, low-frequency variations across the overall picture
(such as a low-amplitude, very blurry large word superimposed on the image), second-best
motion vectors, and minor rate control variations.
It is thus valuable to utilize multiple methods of watermarking in order to
defeat
simple methods which attempt to confound the detection of the watermark.
Further, use of
encryption ensures that watermarks cannot be altered unless the encryption is
compromised. Accordingly, watermarking preferably is used in conjunction with
encryption of a suitable strength for the application.
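The fragility of LSB-based watermarks can be demonstrated directly: forcing each coefficient's least significant bit to a watermark bit reads back cleanly, but even a one-step change to the coefficients scrambles it. A toy sketch with hypothetical coefficient values:

```python
# Demonstrates why LSB watermarks are easily confounded: adding tiny
# "noise" to coefficients scrambles the low-order bits carrying the mark,
# while degrading the image only slightly. Toy values, for illustration.

def embed_lsb(coeffs, bits):
    """Force each coefficient's LSB to the corresponding watermark bit."""
    return [(c & ~1) | b for c, b in zip(coeffs, bits)]

def extract_lsb(coeffs):
    return [c & 1 for c in coeffs]

coeffs = [40, 57, 33, 72]
bits = [1, 0, 1, 1]
marked = embed_lsb(coeffs, bits)
assert extract_lsb(marked) == bits          # watermark reads back cleanly

noisy = [c + 1 for c in marked]             # tiny, visually minor noise
print(extract_lsb(noisy))                   # [0, 1, 0, 0] -- mark destroyed
```

A low-amplitude, low-frequency mark (such as a large blurry word) survives such noise because its detection averages over many pixels rather than reading individual bits.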

Tool-Kit Approach
The various concepts of encryption and watermarking comprising this aspect of
the
invention are preferably embodied as a set of tools which can be applied to
the task of
protecting valuable audio/video media. The tools can be combined by a content
developer
or distributor in various ways, as desired, to create a protection system for
a layered
compressed data stream.
For example, FIG. 13 is a flowchart showing one method of applying the encryption
techniques of the invention. A unit to be encrypted is selected (STEP 1300). This may be any
of the units described above (e.g., a distributed unit, a multi-frame unit, a single frame unit, or
a sub-frame unit), or other units with similar properties. An encryption algorithm is selected
(STEP 1302). This may be a single algorithm applied throughout an encryption session, or may
be a selection per unit, as noted above. Suitable algorithms are well known, and include, for
example, both private and public key algorithms, such as DES, Triple DES, RSA, Blowfish,
etc. Next, one or more keys are generated (STEP 1304). This involves selection of both key
length and key value. Again, this may be a single selection applied throughout an encryption
session, or may be a selection per unit, as noted above. Lastly, the unit is encrypted using
the selected algorithm and key(s) (STEP 1306). The process then repeats for the next unit. Of
course, a number of the steps may be carried out in different orders, particularly steps 1300,
1302, and 1304.
For decompression, the relevant key(s) would be applied to decrypt the data
stream.
Thereafter, the data stream would be decompressed and decoded, as described
above, to
generate a displayable image.
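The STEP 1300-1306 loop can be sketched as follows. The XOR "cipher" here is merely a runnable stand-in for a real algorithm such as DES or RSA, and all names are illustrative assumptions:

```python
# Sketch of the FIG. 13 flow: for each unit, select an algorithm, generate
# a key, and encrypt. The XOR keystream below is a placeholder for a real
# cipher such as DES, Triple DES, RSA, or Blowfish.
import hashlib
import itertools

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating SHA-256-derived keystream."""
    stream = itertools.cycle(hashlib.sha256(key).digest())
    return bytes(b ^ k for b, k in zip(data, stream))

def encrypt_stream(units, make_key):
    """STEP 1300-1306 loop: one (possibly distinct) key per unit."""
    out = []
    for n, unit in enumerate(units):            # STEP 1300: select unit
        key = make_key(n)                       # STEP 1302/1304: algorithm + key
        out.append(keystream_xor(unit, key))    # STEP 1306: encrypt
    return out

units = [b"frame-0 data", b"frame-1 data"]
enc = encrypt_stream(units, lambda n: f"session-key-{n}".encode())
dec = encrypt_stream(enc, lambda n: f"session-key-{n}".encode())
assert dec == units   # XOR keystream is symmetric: same keys decrypt
```

A per-unit `make_key` function corresponds to the per-unit key selection noted above; a constant function corresponds to a single session key.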
FIG. 14 is a flowchart showing one method of applying the watermarking techniques of
the invention. A unit to be watermarked is selected (STEP 1400). Again, this may be any of the
units described above (e.g., a distributed unit, a multi-frame unit, a single frame unit, or a sub-
frame unit), or other units with similar properties. One or more watermarking techniques are
then selected, such as a noise-tolerant method and a non-noise-tolerant method (STEP 1402).
This may be a single selection applied throughout a watermarking session, or may be a
selection per unit (or class of units, where two or more watermarking techniques are applied to
different types of units). Lastly, the selected unit is watermarked using the selected technique
(STEP 1404). The process then repeats for the next unit. Of course, a number of the steps may be
carried out in different orders, particularly steps 1400 and 1402.
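The per-unit technique selection in STEP 1402 can be sketched as a loop that alternates a noise-tolerant and a non-noise-tolerant mark by unit class. The technique functions are stubs, and all names are illustrative:

```python
# Sketch of the FIG. 14 flow: select a watermark technique per unit class
# and apply it. Real techniques (LSB coefficient marks, low-amplitude
# blurry symbols) are replaced by string-tagging stubs for illustration.

def lsb_mark(unit):            # stand-in for a non-noise-tolerant technique
    return unit + "+lsb"

def blurry_symbol_mark(unit):  # stand-in for a noise-tolerant technique
    return unit + "+blur"

def watermark_stream(units):
    marked = []
    for n, unit in enumerate(units):         # STEP 1400: select unit
        technique = lsb_mark if n % 2 == 0 else blurry_symbol_mark  # STEP 1402
        marked.append(technique(unit))       # STEP 1404: apply mark
    return marked

print(watermark_stream(["u0", "u1", "u2"]))  # ['u0+lsb', 'u1+blur', 'u2+lsb']
```

Applying both classes of technique across the stream reflects the earlier point that multiple simultaneous watermarks resist simple confounding attacks.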

Key Management
Encryption/decryption keys may be tied to various items of information, in
order to
construct more secure or synchronized keys. For example, public or private
encryption and
decryption keys may be generated to include or be derived from any of the
following
components:

• Previous keys.

• A serial number of a destination device (e.g., a theater projector having a secure
serial number).

• A date or time range (using a secure clock), such that the key only works during
specific time periods (e.g., only on certain days of the week, or only for a
relative period, such as one week). For example, an encryption system may plan
for the use of a secure GPS (Global Positioning System) receiver in the decoder as a
source for time. The decrypting processor would only need access to that secure
time source to decrypt the image file or stream.

• Location of the decryption processor. A GPS capability would allow fairly exact
real-time location information to be incorporated into a key. An internet
protocol (IP) static address of a known destination could also be used.

• Accounting records of the number of previous showings of a work, as reported
(manually or automatically) by each theater.

• A "PIN" (personal identification number) of a specific authorizing person (e.g.,
a theater manager).

• Physical customized-encrypted movies (such as DVDs, where each is uniquely
keyed to a specific movie theater) can be used such that possession of the
encrypted movie itself by a key holder at the intended site is a form of key
authorization for a subsequent movie. For example, playback of a portion of the
movie and transmission of that portion to a remote key generation site can be
part of the key authorization protocol. Further, the use of the encrypted movie
data as a key element can be tied to a secure media erasure key when a
distribution copy is stored on an erasable medium, such as a hard disk or DVD-
RAM. In this way, the previous movie is erased as part of the key process for
obtaining the new movie.

• Keys can also be active for a specific number of showings or other natural units
of use, requiring new keys subsequently.
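One plausible way to combine the components listed above into a key is to hash them together, so that the resulting key is only valid for one device on one day. The combiner, names, and serial format here are illustrative assumptions, not the invention's prescribed scheme:

```python
# Sketch of deriving a key from the components above: a previous key, a
# destination device serial number, and a secure-clock date. SHA-256 is
# one plausible combiner; the scheme and names are illustrative only.
import hashlib
from datetime import date

def derive_key(previous_key: bytes, device_serial: str, valid_day: date) -> bytes:
    """Bind a key to a specific device and a specific day."""
    material = previous_key + device_serial.encode() + valid_day.isoformat().encode()
    return hashlib.sha256(material).digest()

k_mon = derive_key(b"prev-key", "PROJECTOR-0042", date(2002, 6, 13))
k_tue = derive_key(b"prev-key", "PROJECTOR-0042", date(2002, 6, 14))
assert k_mon != k_tue      # key derived for one day fails on another
assert len(k_mon) == 32    # fixed-length key material
```

Because every component feeds the hash, changing any one of them (device, date, or chain of previous keys) yields unusable key material, which is what ties decryption to the intended destination and time window.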

Various methods of managing distribution of keys for decryption can be
applied.
Different key management strategies can be applied to each type of use and
each type of
data delivery (whether network data transfer, satellite, or physical disk or
tape media).
Following are examples of key distribution and management procedures:

• Keys can be stored on a medium (e.g., floppy disk, CD-ROM) and physically
shipped to a destination via overnight shipping, or transmitted electronically or
in text format (e.g., by facsimile, email, direct-connect data transmission,
Internet transmission, etc.).

• Public key methods can also be used with local unique keys, as well as
authenticated third-party key verification.

• Keys may themselves be encrypted and electronically transmitted (e.g., via
direct-connect data transmission, Internet transmission, email, etc.), with pre-
defined rules at each destination (e.g., theater) for how to decrypt and apply the
keys.

• Possession of a current key may be required as a condition of obtaining or
utilizing new keys. The current key value may be transmitted to a key
management site by any suitable means, as noted above; the new key can be
returned by one of the means noted above.

• Use of a decryption key may require a "key handshake" with a key management
site that validates or authorizes application of the key for every instance of
decryption. For example, a decryption key may need to be combined with
additional symbols maintained by the key management site, where the specific
symbols vary from use to use. Key handshakes can be used for every
showing, for every length of time of use, or for other natural value units.
Since such uses may also be a natural unit of accounting, key management can
also be integrally tied to accounting systems which log uses or use durations,
and apply appropriate charges to the key holder (e.g., rental charges per
showing for a theater). For example, both key management and use logging can
be tied to a key authorization server system which can simultaneously handle
the accounting for each authorized showing or use duration.
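The key-handshake-with-accounting idea can be sketched as a small server object that issues a fresh per-use symbol, combines it with the theater's base key, and logs the use for billing. The protocol, class, and field names are illustrative assumptions:

```python
# Sketch of a "key handshake": each decryption combines the holder's key
# with a per-use symbol from the key management site, which also logs the
# use for accounting. Protocol and names are illustrative only.
import hashlib

class KeyManagementSite:
    def __init__(self):
        self.uses = []                    # accounting log: one entry per showing

    def handshake(self, theater_id: str, base_key: bytes) -> bytes:
        """Authorize one use: issue a symbol that varies from use to use."""
        symbol = f"use-{len(self.uses)}".encode()
        self.uses.append(theater_id)      # charge this showing to the key holder
        return hashlib.sha256(base_key + symbol).digest()

site = KeyManagementSite()
k1 = site.handshake("theater-7", b"base-key")
k2 = site.handshake("theater-7", b"base-key")
assert k1 != k2              # each showing requires a fresh authorization
assert len(site.uses) == 2   # use logging is tied to key management
```

Because the symbol changes on every handshake, a captured session key is useless for the next showing, and the accounting log grows in lockstep with authorized uses.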
Some keys may be pre-authorized, versus keys which are authorized onsite.
Pre-authorized keys generally would be issued one at a time by a key management site. For
onsite key authorization, a key management site may issue a set of keys to a theater, thus
allowing a local manager to authorize additional decryptions (and hence showings) of a
movie which is more popular than originally projected, to accommodate audience
demand. If such keys are used, the system is preferably designed to signal the key
management site (e.g., by email or data record sent over the Internet or by modem) about the
additional showings, for accounting purposes.

Conclusion
Different aspects of the invention that are considered to be novel include
(without
limitation) the following concepts:

• Encryption applied to layered compression

• Watermarking applied to layered compression

• Unique encryption applied to each layer in a layered system, requiring different
keys, authorizations, or algorithms to unlock each independent layer

• Unique watermarking applied to each layer in order to identify the particular
layer (using a method such as a serial number)

• Utilizing sub-frame units of compressed image streams for encryption or
watermarking

• Utilizing multiple simultaneous watermark methods in order to protect against
methods which attempt to confound the detection of a particular type of
watermark

• Utilizing multiple simultaneous encryption methods and strengths, thus
requiring multiple independent decryption systems in order to decode the
various units within the single-layer or layered compressed image stream

• Parallel decryption using one or more simultaneous decryption methods on
various units within the compressed image stream

• Tying keys to accounting systems

• Tying encryption to specific media and/or a specific target location or serial
number

• Tying encryption to a secure clock and date range of use

• Tying encryption to a specific number of uses with a secure use counter

• Using the movie itself as a key to obtain new movies or keys

• Erasure of the movie data on physical media when used as a key to obtain new
movies, or when the duration of authorized use expires

• Use of a flexible key toolkit approach, so that key use methods can be
continuously refined in order to improve flexibility, convenience of use, and
security

• Use of second-best (or third-best, etc.) motion vectors as a watermark technique

• Use of minor rate control variations as a watermark technique (applied to any
combination of I, B, and/or P-type frames, as well as their motion-compensated-
wavelet equivalents)

• Use of low-order bit variations in DC and/or AC coefficients as a watermark
technique (applied to I, B, and/or P-type frames, and their equivalents)

• Use of low-amplitude blurry letters or numbers uniquely added to each copy of
the image during compression to uniquely watermark each copy

• Applying encryption to portions of the bitstream which affect large portions of
the image stream (high influence for encryption)

• Applying overall encryption for the bulk of a work, plus customized
encryption(s) for selected units

• Encrypting small portions of the data stream, and sending these by point-to-
point methods to each specific location (including tying to serial numbers, keys,
personnel codes, IP addresses, and other unique identifiers at that specific
location)

• Applying watermarking to portions of the bitstream which have low influence
on other frames, to minimize visibility

• Use of image edge regions (near the top, bottom, left, and right edges) for
potentially visible watermarks (such as low-amplitude letters and numbers, or
LSBs in DC or AC coefficients), to minimize visual impact

• Extraction of sub-frame unit influence points for independent encryption, such
as left column (slice start) motion vectors, DC and AC coefficients in I frames,
prediction mode bits, control codes, etc.

COMPUTER IMPLEMENTATION
The invention may be implemented in hardware (e.g., an integrated circuit) or software,
or a combination of both. However, preferably, the invention is implemented in computer
programs executing on one or more programmable computers each comprising at least a
processor, a data storage system (including volatile and non-volatile memory and/or storage
elements), an input device, and an output device. Program code is applied to input data to
perform the functions described herein and generate output information. The output
information is applied to one or more output devices, in known fashion.
Each such program may be implemented in any desired computer language
(including
machine, assembly, or high level procedural, logical, or object oriented
programming
languages) to communicate with a computer system. In any case, the language
may be a
compiled or interpreted language.
Each such computer program is preferably stored on a storage medium or device (e.g.,
ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose
programmable computer system, for configuring and operating the computer when the storage
medium or device is read by the computer system to perform the procedures described herein.
The inventive system may also be considered to be implemented as a computer-
readable
storage medium, configured with a computer program, where the storage medium
so
configured causes a computer system to operate in a specific and predefined
manner to
perform the functions described herein.

A number of embodiments of the present invention have been described. Nevertheless,
it will be understood that various modifications may be made without departing from the spirit
and scope of the invention. For example, while the preferred embodiment uses MPEG-2
coding and decoding, the invention will work with any comparable standard that provides
equivalents of I, B, and P frames and layers. Accordingly, it is to be understood that the
invention is not to be limited by the specific illustrated embodiment, but only by the scope of
the appended claims.


