CA 02529310 2009-09-22
METHODS AND APPARATUS FOR EMBEDDING WATERMARKS
TECHNICAL FIELD
[0002] The present disclosure relates generally to media measurements and, more particularly, to methods and apparatus for embedding watermarks in a compressed digital data stream.
BACKGROUND
[0003] In modern television or radio broadcast stations, compressed digital data streams are typically used to carry video and/or audio data for transmission. For example, the Advanced Television Systems Committee (ATSC) standard for digital television (DTV) broadcasts in the United States adopted Moving Picture Experts Group (MPEG) standards (e.g., MPEG-1, MPEG-2, MPEG-3, MPEG-4, etc.) for carrying video content and Digital Audio Compression standards (e.g., AC-3, which is also known as Dolby Digital) for carrying audio content (i.e., ATSC Standard: Digital Audio Compression (AC-3), Revision A, August 2001). The AC-3 compression standard is based on a perceptual digital audio coding technique that reduces the amount of data needed to reproduce the original audio signal while minimizing perceptible distortion. In particular, the AC-3 compression standard recognizes that the human ear is unable to perceive changes in spectral energy at particular spectral frequencies that are smaller than the masking energy at those spectral frequencies. The masking energy is a characteristic of an audio segment that depends on the tonality and noise-like characteristics of the audio segment. Different known psycho-acoustic models may be used to determine the masking energy at a particular spectral frequency. Further, the AC-3 compression standard provides a multi-channel digital audio format (e.g., a 5.1-channel format) for digital television (DTV), high definition television (HDTV), digital versatile discs (DVDs), digital cable, and satellite transmissions that enables the broadcast of special sound effects (e.g., surround sound).
CA 02529310 2005-12-12
WO 2005/008582 PCT/US2004/018953
[0004] Existing television or radio broadcast stations employ watermarking techniques to embed watermarks within video and/or audio data streams compressed in accordance with compression standards such as the AC-3 compression standard and the MPEG Advanced Audio Coding (AAC) compression standard. Typically, watermarks are digital data that uniquely identify broadcasters and/or programs. Watermarks are typically extracted using a decoding operation at one or more reception sites (e.g., households or other media consumption sites) and, thus, may be used to assess the viewing behaviors of individual households and/or groups of households to produce ratings information.
[0005] However, many existing watermarking techniques are designed for use with analog broadcast systems. In particular, existing watermarking techniques convert analog program data to an uncompressed digital data stream, insert watermark data in the uncompressed digital data stream, and convert the watermarked data stream to an analog format prior to transmission. In the ongoing transition towards an all-digital broadcast environment in which compressed video and audio streams are transmitted by broadcast networks to local affiliates, watermark data may need to be embedded or inserted directly in a compressed digital data stream. Existing watermarking techniques may decompress the compressed digital data stream into time-domain samples, insert the watermark data into the time-domain samples, and recompress the watermarked time-domain samples into a watermarked compressed digital data stream. Such decompression/compression may cause degradation in the quality of the media content in the compressed digital data stream. Further, existing decompression/compression techniques require additional equipment and cause delay of the audio component of a broadcast in a manner that, in some cases, may be unacceptable. Moreover, the methods employed by local broadcasting affiliates to receive compressed digital data streams from their parent networks and to insert local content through sophisticated splicing equipment prevent conversion of a compressed digital data stream to a time-domain (uncompressed) signal prior to recompression of the digital data streams.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a block diagram representation of an example media monitoring system.
[0007] FIG. 2 is a block diagram representation of an example watermark embedding system.
[0008] FIG. 3 is a block diagram representation of an example uncompressed digital data stream associated with the example watermark embedding system of FIG. 2.
[0009] FIG. 4 is a block diagram representation of an example embedding device that may be used to implement the example watermark embedding system of FIG. 2.
[0010] FIG. 5 depicts an example compressed digital data stream associated with the example embedding device of FIG. 4.
[0011] FIG. 6 depicts an example quantization look-up table that may be used to implement the example watermark embedding system of FIG. 2.
[0012] FIG. 7 depicts another example uncompressed digital data stream that may be compressed and then processed using the example watermark embedding system of FIG. 2.
[0013] FIG. 8 depicts an example compressed digital data stream associated with the example uncompressed digital data stream of FIG. 7.
[0014] FIG. 9 depicts one manner in which the example watermark embedding system of FIG. 2 may be configured to embed watermarks.
[0015] FIG. 10 depicts one manner in which the modification process of FIG. 9 may be implemented.
[0016] FIG. 11 depicts one manner in which a data frame may be processed.
[0017] FIG. 12 depicts one manner in which a watermark may be embedded in a compressed digital data stream.
[0018] FIG. 13 depicts an example code frequency index table that may be used to implement the example watermark embedding system of FIG. 2.
[0019] FIG. 14 is a block diagram representation of an example processor system that may be used to implement the example watermark embedding system of FIG. 2.
DETAILED DESCRIPTION
[0020] In general, methods and apparatus for embedding watermarks in compressed digital data streams are disclosed herein. The methods and apparatus disclosed herein may be used to embed watermarks in compressed digital data streams without prior decompression of the compressed digital data streams. As a result, the methods and apparatus disclosed herein eliminate the need to subject compressed digital data streams to multiple decompression/compression cycles, which are typically unacceptable to, for example, affiliates of television broadcast networks because multiple decompression/compression cycles may significantly degrade the quality of media content in the compressed digital data streams.
[0021] Prior to broadcast, for example, the methods and apparatus disclosed herein may be used to unpack the modified discrete cosine transform (MDCT) coefficient sets associated with a compressed digital data stream formatted according to a digital audio compression standard such as the AC-3 compression standard. The mantissas of the unpacked MDCT coefficient sets may be modified to embed watermarks that imperceptibly augment the compressed digital data stream. Upon receipt of the compressed digital data stream, a receiving device (e.g., a set-top television metering device at a media consumption site) may extract the embedded watermark information from an uncompressed analog output such as, for example, output emanating from speakers of a television set. The extracted watermark information may be used to identify the media sources and/or programs (e.g., broadcast stations) associated with media currently being consumed (e.g., viewed, listened to, etc.) at a media consumption site. In turn, the source and program identification information may be used in known manners to generate ratings information and/or any other information that may be used to assess the viewing behaviors associated with individual households and/or groups of households.
[0022] Referring to FIG. 1, an example broadcast system 100 including a service provider 110, a television 120, a remote control device 125, and a receiving device 130 is metered using an audience measurement system. The components of the broadcast system 100 may be coupled in any well-known manner. For example, the television 120 is positioned in a viewing area 150 located within a household occupied by one or more people, referred to as household members 160, some or all of whom have agreed to participate in an audience measurement research study. The receiving device 130 may be a set-top box (STB), a video cassette recorder, a digital video recorder, a personal video recorder, a personal computer, a digital video disc player, etc. coupled to the television 120. The viewing area 150 includes the area in which the television 120 is located and from which the television 120 may be viewed by the one or more household members 160 located in the viewing area 150.
[0023] In the illustrated example, a metering device 140 is configured to identify viewing information based on video/audio output signals conveyed from the receiving device 130 to the television 120. The metering device 140 provides this viewing information as well as other tuning and/or demographic data via a network 170 to a data collection facility 180. The network 170 may be implemented using any desired combination of hardwired and wireless communication links, including, for example, the Internet, an Ethernet connection, a digital subscriber line (DSL), a telephone line, a cellular telephone system, a coaxial cable, etc. The data collection facility 180 may be configured to process and/or store data received from the metering device 140 to produce ratings information.
[0024] The service provider 110 may be implemented by any service provider such as, for example, a cable television service provider 112, a radio frequency (RF) television service provider 114, and/or a satellite television service provider 116. The television 120 receives a plurality of television signals transmitted via a plurality of channels by the service provider 110 and may be adapted to process and display television signals provided in any format such as a National Television System Committee (NTSC) television signal format, a high definition television (HDTV) signal format, an Advanced Television Systems Committee (ATSC) television signal format, a phase alternating line (PAL) television signal format, a digital video broadcasting (DVB) television signal format, an Association of Radio Industries and Businesses (ARIB) television signal format, etc.
[0025] The user-operated remote control device 125 allows a user (e.g., the household member 160) to cause the television 120 to tune to and receive signals transmitted on a desired channel, and to cause the television 120 to process and present or deliver the programming or media content contained in the signals transmitted on the desired channel. The processing performed by the television 120 may include, for example, extracting a video and/or an audio component delivered via the received signal, causing the video component to be displayed on a screen/display associated with the television 120, and causing the audio component to be emitted by speakers associated with the television 120. The programming content contained in the television signal may include, for example, a television program, a movie, an advertisement, a video game, a web page, a still image, and/or a preview of other programming content that is currently offered or will be offered in the future by the service provider 110.
[0026] While the components shown in FIG. 1 are depicted as separate structures within the broadcast system 100, the functions performed by some of these structures may be integrated within a single unit or may be implemented using two or more separate components. For example, although the television 120 and the receiving device 130 are depicted as separate structures, the television 120 and the receiving device 130 may be integrated into a single unit (e.g., an integrated digital television set). In another example, the television 120, the receiving device 130, and/or the metering device 140 may be integrated into a single unit.
[0027] To assess the viewing behaviors of individual household members 160 and/or groups of households, a watermark embedding system (e.g., the watermark embedding system 200 of FIG. 2) may encode watermarks that uniquely identify broadcasters and/or programs in the broadcast signals from the service provider 110. The watermark embedding system may be implemented at the service provider 110 so that each of the plurality of media signals (e.g., television signals) transmitted by the service provider 110 includes one or more watermarks. Based on selections by the household members 160, the receiving device 130 may tune to and receive media signals transmitted on a desired channel and cause the television 120 to process and present the programming content contained in the signals transmitted on the desired channel. The metering device 140 may identify watermark information based on video/audio output signals conveyed from the receiving device 130 to the television 120. Accordingly, the metering device 140 may provide this watermark information as well as other tuning and/or demographic data to the data collection facility 180 via the network 170.
[0028] In FIG. 2, an example watermark embedding system 200 includes an embedding device 210 and a watermark source 220. The embedding device 210 is configured to insert watermark information 230 from the watermark source 220 into a compressed digital data stream 240. The compressed digital data stream 240 may be compressed according to an audio compression standard such as the AC-3 compression standard and/or the MPEG-AAC compression standard, either of which may be used to process blocks of an audio signal using a predetermined number of digitized samples from each block. The source of the compressed digital data stream 240 (not shown) may be sampled at a rate of, for example, 48 kilohertz (kHz) to form audio blocks as described below.
[0029] Typically, audio compression techniques such as those based on the AC-3 compression standard use overlapped audio blocks and the MDCT algorithm to convert an audio signal into a compressed digital data stream (e.g., the compressed digital data stream 240 of FIG. 2). Two different block sizes (i.e., short and long blocks) may be used depending on the dynamic characteristics of the sampled audio signal. For example, AC-3 short blocks may be used to minimize pre-echo for transient segments of the audio signal, and AC-3 long blocks may be used to achieve high compression gain for non-transient segments of the audio signal. In accordance with the AC-3 compression standard, an AC-3 long block corresponds to a block of 512 time-domain audio samples, whereas an AC-3 short block corresponds to 256 time-domain audio samples. Based on the overlapping structure of the MDCT algorithm used in the AC-3 compression standard, in the case of the AC-3 long block, the 512 time-domain samples are obtained by concatenating a preceding (old) block of 256 time-domain samples and a current (new) block of 256 time-domain samples to create an audio block of 512 time-domain samples. The AC-3 long block is then transformed using the MDCT algorithm to generate 256 transform coefficients. In accordance with the same standard, an AC-3 short block is similarly obtained from a pair of consecutive time-domain sample blocks of audio. The AC-3 short block is then transformed using the MDCT algorithm to generate 128 transform coefficients. The 128 transform coefficients corresponding to two adjacent short blocks are then interleaved to generate a set of 256 transform coefficients. Thus, processing of either AC-3 long or AC-3 short blocks results in the same number of MDCT coefficients. As another example, in accordance with the MPEG-AAC compression standard, a short block contains 128 samples and a long block contains 1024 samples.
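The block-size bookkeeping above can be checked with a textbook MDCT. The following is a sketch under stated assumptions, not the AC-3 transform itself: it applies no window, ignores AC-3's own scaling and phase conventions, and the even/odd interleaving order for the two short blocks, as well as the helper names `mdct` and `interleave`, are hypothetical choices made for illustration.

```python
import numpy as np


def mdct(frame: np.ndarray) -> np.ndarray:
    """Textbook MDCT: 2N input samples -> N coefficients (no window, no AC-3 scaling)."""
    two_n = len(frame)
    n_coef = two_n // 2
    n = np.arange(two_n)[:, None]          # time index, column vector
    k = np.arange(n_coef)[None, :]         # frequency index, row vector
    basis = np.cos(np.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
    return frame @ basis                   # shape (N,)


def interleave(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Merge two short-block coefficient sets: even slots from `first`, odd from `second`."""
    out = np.empty(len(first) + len(second))
    out[0::2] = first
    out[1::2] = second
    return out


rng = np.random.default_rng(1)

# Long block: 256 old samples + 256 new samples -> 256 coefficients.
long_coefs = mdct(rng.standard_normal(512))

# Short blocks: two 256-sample transforms -> 128 coefficients each, interleaved to 256.
short_a = mdct(rng.standard_normal(256))
short_b = mdct(rng.standard_normal(256))
short_coefs = interleave(short_a, short_b)

print(len(long_coefs), len(short_a), len(short_coefs))  # 256 128 256
```

Either path yields 256 coefficients per set, which is why the rest of the description can treat every MDCT coefficient set uniformly.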
[0030] In the example of FIG. 3, an uncompressed digital data stream 300 includes a plurality of 256-sample time-domain audio blocks 310, generally shown as A0, A1, A2, A3, A4, and A5. The MDCT algorithm processes the audio blocks 310 to generate MDCT coefficient sets 320, shown by way of example as MA0, MA1, MA2, MA3, MA4, and MA5 (where MA5 is not shown). For example, the MDCT algorithm may process the audio blocks A0 and A1 to generate the MDCT coefficient set MA0. The audio blocks A0 and A1 are concatenated to generate a 512-sample audio block (e.g., an AC-3 long block) that is transformed using the MDCT algorithm to generate the MDCT coefficient set MA0, which includes 256 MDCT coefficients. Similarly, the audio blocks A1 and A2 may be processed to generate the MDCT coefficient set MA1. Thus, the audio block A1 is an overlapping audio block because it is used to generate both MDCT coefficient sets MA0 and MA1. In a similar manner, the MDCT algorithm is used to transform the audio blocks A2 and A3 to generate the MDCT coefficient set MA2, the audio blocks A3 and A4 to generate the MDCT coefficient set MA3, the audio blocks A4 and A5 to generate the MDCT coefficient set MA4, etc. Thus, the audio block A2 is an overlapping audio block used to generate the MDCT coefficient sets MA1 and MA2, the audio block A3 is an overlapping audio block used to generate the MDCT coefficient sets MA2 and MA3, the audio block A4 is an overlapping audio block used to generate the MDCT coefficient sets MA3 and MA4, etc. Together, the MDCT coefficient sets 320 form the compressed digital data stream 240.
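The overlap pattern of FIG. 3 (A0+A1 for MA0, A1+A2 for MA1, and so on) can be sketched with plain lists. The function name `overlapping_inputs` and the ramp signal standing in for A0 through A5 are hypothetical; the sketch only shows how samples are grouped before the transform, not the transform itself.

```python
from typing import List

BLOCK = 256  # new time-domain samples contributed by each audio block (A0, A1, ...)


def overlapping_inputs(samples: List[float], block: int = BLOCK) -> List[List[float]]:
    """Split a signal into blocks A0, A1, ... and return the 512-sample MDCT inputs
    A0+A1, A1+A2, A2+A3, ... so that every interior block is used twice."""
    blocks = [samples[i:i + block] for i in range(0, len(samples) - block + 1, block)]
    return [blocks[i] + blocks[i + 1] for i in range(len(blocks) - 1)]


signal = [float(n) for n in range(6 * BLOCK)]  # stands in for A0..A5
inputs = overlapping_inputs(signal)            # inputs for MA0..MA4

print(len(inputs), len(inputs[0]))             # 5 512
# A1 is the second half of the input for MA0 and the first half of the input for MA1.
print(inputs[0][BLOCK:] == inputs[1][:BLOCK])  # True
```

Six blocks yield only five inputs, which matches the description: MA5 would also need the following block A6.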
[0031] As described in detail below, the embedding device 210 of FIG. 2 may embed or insert the watermark information or watermark 230 from the watermark source 220 into the compressed digital data stream 240. The watermark 230 may be used, for example, to uniquely identify broadcasters and/or programs so that media consumption information (e.g., viewing information) and/or ratings information may be produced. Accordingly, the embedding device 210 produces a watermarked compressed digital data stream 250 for transmission.
[0032] In the example of FIG. 4, the embedding device 210 includes an identifying unit 410, an unpacking unit 420, a modification unit 430, and a repacking unit 440. While the operation of the embedding device 210 is described below in accordance with the AC-3 compression standard, the embedding device 210 may be implemented to operate with additional or other compression standards such as, for example, the MPEG-AAC compression standard. The operation of the embedding device 210 is described in greater detail in connection with FIG. 5.
[0033] To begin, the identifying unit 410 is configured to identify one or more frames 510 associated with the compressed digital data stream 240, a portion of which is shown by way of example as Frame A and Frame B in FIG. 5. As mentioned previously, the compressed digital data stream 240 may be a digital data stream compressed in accordance with the AC-3 standard (hereinafter "AC-3 data stream"). While the AC-3 data stream 240 may include multiple channels, for purposes of clarity, the following example describes the AC-3 data stream 240 as including only one channel. In the AC-3 data stream 240, each of the frames 510 includes a plurality of MDCT coefficient sets 520. In accordance with the AC-3 compression standard, for example, each of the frames 510 includes six MDCT coefficient sets (i.e., six "audblk"). For example, Frame A includes the MDCT coefficient sets MA0, MA1, MA2, MA3, MA4, and MA5, and Frame B includes the MDCT coefficient sets MB0, MB1, MB2, MB3, MB4, and MB5.
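For a sense of scale, assuming the 48 kHz sampling rate given as an example in [0028] and 256 new time-domain samples per MDCT coefficient set, the time spans covered by one coefficient set and by one six-set frame work out as follows:

```latex
\begin{aligned}
\text{duration of one coefficient set} &= \frac{256\ \text{samples}}{48\,000\ \text{samples/s}} \approx 5.33\ \text{ms} \\
\text{samples per frame} &= 6 \times 256 = 1536 \\
\text{duration of one frame} &= \frac{1536\ \text{samples}}{48\,000\ \text{samples/s}} = 32\ \text{ms}
\end{aligned}
```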
[0034] The identifying unit 410 is also configured to identify header information associated with each of the frames 510, such as, for example, the number of channels associated with the AC-3 data stream 240. While the example AC-3 data stream 240 includes only one channel as noted above, an example compressed digital data stream having multiple channels is described below in connection with FIGS. 7 and 8.
[0035] Returning to FIG. 5, the unpacking unit 420 is configured to unpack the MDCT coefficient sets 520 to determine compression information such as, for example, the parameters of the original compression process (i.e., the manner in which an audio compression technique compressed an audio signal or audio data to form the compressed digital data stream 240). For example, the unpacking unit 420 may determine how many bits are used to represent each of the MDCT coefficients within the MDCT coefficient sets 520. Additionally, compression parameters may include information that limits the extent to which the AC-3 data stream 240 may be modified to ensure that the media content conveyed via the AC-3 data stream 240 is of a sufficiently high quality level. The embedding device 210 subsequently uses the compression information identified by the unpacking unit 420 to embed/insert the desired watermark information 230 into the AC-3 data stream 240, thereby ensuring that the watermark insertion is performed in a manner consistent with the compression information supplied in the signal.
[0036] As described in detail in the AC-3 compression standard, the compression information also includes a mantissa and an exponent associated with each MDCT coefficient. The AC-3 compression standard employs techniques to reduce the number of bits used to represent each MDCT coefficient. Psycho-acoustic masking is one factor that may be utilized by these techniques. For example, the presence of audio energy $E_k$ either at a particular frequency $k$ (e.g., a tone) or spread across a band of frequencies proximate to the particular frequency $k$ (e.g., a noise-like characteristic) creates a masking effect. That is, the human ear is unable to perceive a change in energy in a spectral region either at a frequency $k$ or spread across the band of frequencies proximate to the frequency $k$ if that change is less than a given energy threshold $\Delta E_k$. Because of this characteristic of the human ear, an MDCT coefficient $m_k$ associated with the frequency $k$ may be quantized with a step size related to $\Delta E_k$ without risk of causing any humanly perceptible changes to the audio content. For the AC-3 data stream 240, each MDCT coefficient $m_k$ is represented as a mantissa $M_k$ and an exponent $X_k$ such that $m_k = M_k \cdot 2^{-X_k}$. The number of bits used to represent the mantissa $M_k$ of each MDCT coefficient of the MDCT coefficient sets 520 may be determined based on known quantization look-up tables published in the AC-3 compression standard (e.g., the quantization look-up table 600 of FIG. 6). In the example of FIG. 6, the quantization look-up table 600 provides mantissa codes or bit patterns and corresponding mantissa values for MDCT coefficients represented by a four-bit number. As described in detail below, the mantissa $M_k$ may be changed (e.g., augmented) to represent a modified value of an MDCT coefficient to embed a watermark in the AC-3 data stream 240.
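The relationship $m_k = M_k \cdot 2^{-X_k}$ can be illustrated with a small decomposition routine. This is a sketch under assumptions: it treats coefficients as normalized to magnitude below 1, normalizes each mantissa into [0.5, 1) with an exponent capped at 24, and ignores AC-3's exponent sharing across frequency bins and its bit-allocation process. The 15-level table uses the formula (2c - 14)/15 for code c, which is inferred from the mantissa values quoted in the FIG. 6 example (0101 is -4/15, 0011 is -8/15, 0100 is -6/15). The function names are hypothetical.

```python
from typing import Tuple

MAX_EXPONENT = 24  # assumed exponent range 0..24


def split_coefficient(m: float) -> Tuple[float, int]:
    """Return (M, X) with m == M * 2**-X, shifting left until 0.5 <= |M| < 1
    or the exponent reaches MAX_EXPONENT. Assumes -1 < m < 1."""
    if not -1.0 < m < 1.0:
        raise ValueError("coefficient expected in (-1, 1)")
    mantissa, exponent = m, 0
    while abs(mantissa) < 0.5 and exponent < MAX_EXPONENT:
        mantissa *= 2.0   # doubling is exact in binary floating point
        exponent += 1
    return mantissa, exponent


def mantissa_value(code: int) -> float:
    """Symmetric 15-level table in the style of FIG. 6: code 0 -> -14/15 ... code 14 -> +14/15."""
    if not 0 <= code <= 14:
        raise ValueError("15-level mantissa codes run from 0 to 14")
    return (2 * code - 14) / 15


mantissa, exponent = split_coefficient(0.1)
print(mantissa, exponent)                               # 0.8 3
print(format(5, "04b"), round(mantissa_value(5), 4))    # 0101 -0.2667
```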
[0037] Returning to FIG. 5, the modification unit 430 is configured to perform an inverse transform of each of the MDCT coefficient sets 520 to generate time-domain audio blocks 530, shown by way of example as TA0', TA3'', TA4', TA4'', TA5', TA5'', TB0', TB0'', TB1', TB1'', and TB5' (TA0'' through TA3' and TB2' through TB4'' are not shown). The modification unit 430 performs inverse transform operations to generate sets of previous (old) time-domain audio blocks (which are represented as prime blocks) and sets of current (new) time-domain audio blocks (which are represented as double-prime blocks) associated with the 256-sample time-domain audio blocks that were concatenated to form the MDCT coefficient sets 520 of the AC-3 data stream 240. For example, the modification unit 430 performs an inverse transform on the MDCT coefficient set MA5 to generate the time-domain blocks TA4'' and TA5', on the MDCT coefficient set MB0 to generate TA5'' and TB0', on the MDCT coefficient set MB1 to generate TB0'' and TB1', etc. In this manner, the modification unit 430 generates reconstructed time-domain audio blocks 540, which provide a reconstruction of the original time-domain audio blocks that were compressed to form the AC-3 data stream 240. To generate the reconstructed time-domain audio blocks 540, the modification unit 430 may add time-domain audio blocks based on, for example, the known Princen-Bradley time-domain aliasing cancellation (TDAC) technique as described in Princen et al., Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation, Institute of Electrical and Electronics Engineers (IEEE) Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-34, No. 5, pp. 1153-1161 (1986). For example, the modification unit 430 may reconstruct the time-domain audio block TA5 (i.e., TA5R) by adding the prime time-domain audio block TA5' and the double-prime time-domain audio block TA5'' using the Princen-Bradley TDAC technique. Likewise, the modification unit 430 may reconstruct the time-domain audio block TB0 (i.e., TB0R) by adding the prime audio block TB0' and the double-prime audio block TB0'' using the Princen-Bradley TDAC technique. In this manner, the original time-domain audio blocks used to form the AC-3 data stream 240 are reconstructed to enable the watermark 230 to be embedded or inserted directly into the AC-3 data stream 240.
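The prime/double-prime reconstruction can be sketched with a generic windowed MDCT and overlap-add. This illustrates the TDAC principle only, under assumptions: it uses a sine window (AC-3 itself uses a Kaiser-Bessel-derived window and its own scaling conventions), an inverse transform scaled by 2/N to match a window satisfying the Princen-Bradley condition $w[n]^2 + w[n+N]^2 = 1$, and unquantized coefficients, so the reconstruction is exact here whereas a real stream would give back the decoded (quantized) audio. Variable names mirror the block labels in the text but are otherwise hypothetical.

```python
import numpy as np


def _basis(n_coef: int) -> np.ndarray:
    """(2N x N) MDCT cosine basis."""
    n = np.arange(2 * n_coef)[:, None]
    k = np.arange(n_coef)[None, :]
    return np.cos(np.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))


def mdct(frame: np.ndarray) -> np.ndarray:
    """2N windowed samples -> N coefficients."""
    return frame @ _basis(len(frame) // 2)


def imdct(coefs: np.ndarray) -> np.ndarray:
    """N coefficients -> 2N time-aliased samples. The 2/N scale pairs with a
    window satisfying w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley condition)."""
    return _basis(len(coefs)) @ coefs * (2.0 / len(coefs))


N = 256
window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))  # sine window

rng = np.random.default_rng(0)
a4, a5, b0 = (rng.standard_normal(N) for _ in range(3))  # consecutive 256-sample blocks

# Encoder: two overlapping windowed 512-sample inputs share the block a5.
ma5 = mdct(window * np.concatenate([a4, a5]))
mb0 = mdct(window * np.concatenate([a5, b0]))

# Decoder: inverse transform plus synthesis window; each half alone is aliased.
out_ma5 = window * imdct(ma5)
out_mb0 = window * imdct(mb0)
ta5_prime = out_ma5[N:]          # TA5': second half of the output for MA5
ta5_double_prime = out_mb0[:N]   # TA5'': first half of the output for MB0

ta5_r = ta5_prime + ta5_double_prime   # overlap-add cancels the aliasing
print(np.allclose(ta5_r, a5))          # True
print(np.allclose(ta5_prime, a5))      # False
```

Neither half on its own equals the original block; only their sum does, because the time-reversed alias terms carry opposite signs in the two halves and cancel.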
[0038] The modification unit 430 is also configured to insert the watermark 230 into the reconstructed time-domain audio blocks 540 to generate watermarked time-domain audio blocks 550, shown by way of example as TA0W, TA4W, TA5W, TB0W, TB1W, and TB5W (blocks TA1W, TA2W, TA3W, TB2W, TB3W, and TB4W are not shown). To insert the watermark 230, the modification unit 430 generates a modifiable time-domain audio block by concatenating two adjacent reconstructed time-domain audio blocks to create a 512-sample audio block. For example, the modification unit 430 may concatenate the reconstructed time-domain audio blocks TA5R and TB0R (each being a 256-sample audio block) to form a 512-sample audio block. The modification unit 430 may then insert the watermark 230 into the 512-sample audio block formed by the reconstructed time-domain audio blocks TA5R and TB0R to generate the watermarked time-domain audio blocks TA5W and TB0W. Encoding processes such as those described in U.S. Patent Nos. 6,272,176, 6,504,870, and 6,621,881 may be used to insert the watermark 230 into the reconstructed time-domain audio blocks 540.
[0039] In the example encoding methods and apparatus described in U.S. Patent Nos. 6,272,176, 6,504,870, and 6,621,881, watermarks may be inserted into a 512-sample audio block. For example, each 512-sample audio block carries one bit of embedded or inserted data of the watermark 230. In particular, spectral frequency components with indices $f_1$ and $f_2$ may be modified or augmented to insert data bits associated with the watermark 230. To insert a binary "1," for example, a power at the first spectral frequency associated with the index $f_1$ may be increased or augmented to be a spectral power maximum within a frequency neighborhood (e.g., a frequency neighborhood defined by the indices $f_1 - 2$, $f_1 - 1$, $f_1$, $f_1 + 1$, and $f_1 + 2$). At the same time, the power at the second spectral frequency associated with the index $f_2$ is attenuated or augmented to be a spectral power minimum within a frequency neighborhood (e.g., a frequency neighborhood defined by the indices $f_2 - 2$, $f_2 - 1$, $f_2$, $f_2 + 1$, and $f_2 + 2$). Conversely, to insert a binary "0," the power at the first spectral frequency associated with the index $f_1$ is attenuated to be a local spectral power minimum while the power at the second spectral frequency associated with the index $f_2$ is increased to a local spectral power maximum.
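A simplified sketch of the peak/dip rule just described, applied to one 512-sample block via an FFT. It is not the encoder of the cited patents (which, among other things, constrain changes psycho-acoustically); it only shows the local-maximum/local-minimum idea. The function names, the `margin` factor, and the bin choices 40 and 80 are hypothetical, and the sketch assumes the two five-bin neighborhoods do not overlap.

```python
import numpy as np


def _neighbors(f: int) -> list:
    """Indices f-2, f-1, f+1, f+2: the neighborhood of f, excluding f itself."""
    return [f - 2, f - 1, f + 1, f + 2]


def embed_bit(block: np.ndarray, bit: int, f1: int, f2: int, margin: float = 2.0) -> np.ndarray:
    """Bit 1: make bin f1 a local power maximum and bin f2 a local minimum.
    Bit 0: the reverse. Assumes 2 <= f1, f2 <= len(block) // 2 - 2 and |f1 - f2| > 4,
    so the two neighborhoods do not overlap."""
    spectrum = np.fft.rfft(block)
    power = np.abs(spectrum) ** 2
    peak, dip = (f1, f2) if bit == 1 else (f2, f1)
    targets = {
        peak: margin * power[_neighbors(peak)].max(),   # above every neighbor
        dip: power[_neighbors(dip)].min() / margin,     # below every neighbor
    }
    for f, target in targets.items():
        if power[f] > 0:
            spectrum[f] *= np.sqrt(target / power[f])   # rescale magnitude, keep phase
        else:
            spectrum[f] = np.sqrt(target)
    return np.fft.irfft(spectrum, n=len(block))


def detect_bit(block: np.ndarray, f1: int, f2: int) -> int:
    """Read the bit back: 1 if bin f1 is a local power maximum, else 0."""
    power = np.abs(np.fft.rfft(block)) ** 2
    return 1 if power[f1] > power[_neighbors(f1)].max() else 0


rng = np.random.default_rng(7)
block = rng.standard_normal(512)   # stands in for TA5R followed by TB0R
print(detect_bit(embed_bit(block, 1, 40, 80), 40, 80))  # 1
print(detect_bit(embed_bit(block, 0, 40, 80), 40, 80))  # 0
```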
[0040] Returning to FIG. 5, based on the watermarked time-domain audio blocks 550, the modification unit 430 generates watermarked MDCT coefficient sets 560, shown by way of example as MA0W, MA4W, MA5W, MB0W, and MB5W (blocks MA1W, MA2W, MA3W, MB1W, MB2W, MB3W, and MB4W are not shown). Following the example described above, the modification unit 430 generates the watermarked MDCT coefficient set MA5W based on the watermarked time-domain audio blocks TA5W and TB0W. Specifically, the modification unit 430 concatenates the watermarked time-domain audio blocks TA5W and TB0W to form a 512-sample audio block and converts the 512-sample audio block into the watermarked MDCT coefficient set MA5W which, as described in greater detail below, may be used to modify the original MDCT coefficient set MA5.
[0041] The difference between the MDCT coefficient sets 520 and the watermarked MDCT coefficient sets 560 represents a change in the AC-3 data stream 240 as a result of embedding or inserting the watermark 230. As described in connection with FIG. 6, for example, the modification unit 430 may modify the mantissa values in the MDCT coefficient set MA5 based on the differences between the coefficients in the corresponding watermarked MDCT coefficient set MA5W and the coefficients in the original MDCT coefficient set MA5. Quantization look-up tables (e.g., the look-up table 600 of FIG. 6) may be used to determine new mantissa values associated with the MDCT coefficients of the watermarked MDCT coefficient sets 560 to replace the old mantissa values associated with the MDCT coefficients of the MDCT coefficient sets 520. Thus, the new mantissa values represent the change in or augmentation of the AC-3 data stream 240 as a result of embedding or inserting the watermark 230. It is important to note that, in this example implementation, the exponents of the MDCT coefficients are not changed. Changing the exponents might require that the underlying compressed signal representation be recomputed, thereby requiring the compressed signal to undergo a true decompression/compression cycle. If a modification of only the mantissa is insufficient to fully account for the difference between a watermarked and an original MDCT coefficient, the affected MDCT mantissa is set to a maximum or minimum value, as appropriate. The redundancy included in the watermarking process allows the correct watermark to be decoded in the presence of such an encoding restriction.
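The mantissa-only update with saturation can be sketched as follows, assuming a 4-bit, 15-level table whose values follow (2c - 14)/15 for code c (a formula inferred from the FIG. 6 values quoted in this description) and assuming the original exponent $X_k$ of each coefficient is available from unpacking. The function name `replace_mantissa` is hypothetical. The exponent is never touched; the desired mantissa is clamped to the table's extremes and then snapped to the nearest code.

```python
from typing import Tuple

# 15-level (4-bit) mantissa table in the style of FIG. 6: code c -> (2c - 14) / 15.
LEVELS = [(2 * code - 14) / 15 for code in range(15)]


def replace_mantissa(exponent: int, watermarked_coef: float) -> Tuple[int, float]:
    """Keep the original exponent X_k and choose the mantissa code whose value is
    closest to wm_k * 2**X_k, clamping to the extreme codes when out of range."""
    desired = watermarked_coef * 2.0 ** exponent
    desired = min(max(desired, LEVELS[0]), LEVELS[-1])  # saturate at -14/15 or +14/15
    code = min(range(len(LEVELS)), key=lambda c: abs(LEVELS[c] - desired))
    return code, LEVELS[code]


code, value = replace_mantissa(0, -0.4300)
print(format(code, "04b"), round(value, 4))   # 0100 -0.4
print(replace_mantissa(0, -2.0)[0])           # 0  (clamped to the minimum code)
```

With an exponent of 0 the coefficient equals its mantissa, which reproduces the FIG. 6 example: the watermarked value -0.4300 maps to code 0100.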
[0042] Turning to FIG. 6, the example quantization look-up table 600 includes mantissa codes and mantissa values for a fifteen-level quantization of an example mantissa Mk in the range of -0.9333 to +0.9333. While the example quantization look-up table 600 provides mantissa information associated with MDCT coefficients that are represented using four bits, the AC-3 compression standard provides quantization look-up tables associated with other suitable numbers of bits per MDCT coefficient. To illustrate one manner in which the modification unit 430 may modify a particular MDCT coefficient mk with a mantissa Mk contained in the MDCT coefficient set MA5, assume the original mantissa value is -0.2666 (i.e., -4/15). Using the quantization look-up table 600, the mantissa code corresponding to the particular MDCT coefficient mk in the MDCT coefficient set MA5 is determined to be 0101. The watermarked MDCT coefficient set MA5W includes a watermarked MDCT coefficient wmk with a mantissa value WMk. Further, assume the new mantissa value of the corresponding watermarked MDCT coefficient wmk of the watermarked MDCT coefficient set MA5W is -0.4300, which lies between the mantissa values of codes 0011 and 0100. In other words, the watermark 230, in this example, results in a difference of approximately -0.1633 between the original mantissa value of -0.2666 and the watermarked mantissa value of -0.4300.
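The fifteen-level table 600 follows a regular pattern: code c (0000 through 1110) maps to the value (2c - 14)/15. The minimal Python sketch below assumes that this symmetric pattern, generalized to (2c - (L - 1))/L for an L-level quantizer, also describes the other symmetric AC-3 tables (3, 5, 7, and 11 levels); the helper names `mantissa_table` and `bracketing_codes` are hypothetical. It reproduces the worked example: code 0101 maps to -4/15, and the watermarked value -0.4300 falls between codes 0011 and 0100.

```python
from fractions import Fraction


def mantissa_table(levels=15):
    """Map each mantissa code to its value for a symmetric `levels`-level quantizer.

    Assumes value(code) = (2*code - (levels - 1)) / levels, which matches the
    fifteen-level example (code 0000 -> -14/15, code 1110 -> +14/15).
    """
    return {code: Fraction(2 * code - (levels - 1), levels) for code in range(levels)}


def bracketing_codes(value, levels=15):
    """Return the adjacent pair of codes whose values enclose `value`, or None if out of range."""
    table = mantissa_table(levels)
    for code in range(levels - 1):
        if table[code] <= value <= table[code + 1]:
            return code, code + 1
    return None


table = mantissa_table(15)
original_code = 0b0101
print(f"{original_code:04b} -> {float(table[original_code]):.4f}")  # 0101 -> -0.2667
low, high = bracketing_codes(Fraction("-0.43"))
print(f"{low:04b} {high:04b}")                                      # 0011 0100
print(f"{float(Fraction('-0.43') - table[original_code]):.4f}")    # -0.1633
```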
[0043] To embed or insert the watermark 230 in the AC-3 data stream 240, the modification unit 430 may use the watermarked MDCT coefficient set MA5W to modify or augment the MDCT coefficients in the MDCT coefficient set MA5. Continuing with the above example, either mantissa code 0011 or mantissa code 0100 may replace the mantissa code 0101 associated with the MDCT coefficient mk because the watermarked mantissa WMk associated with the corresponding watermarked MDCT coefficient wmk lies between the mantissa values of codes 0011 and 0100 (the mantissa value corresponding to the watermarked MDCT coefficient wmk is -0.4300). The mantissa value corresponding to the mantissa code 0011 is -0.5333 (i.e., -8/15) and the mantissa value corresponding to the mantissa code 0100 is -0.4 (i.e., -6/15). In this example, the modification unit 430 selects the mantissa code 0100 instead of the mantissa code 0011 to replace the original mantissa code 0101 associated with the MDCT coefficient mk because the mantissa value -0.4 corresponding to the mantissa code 0100 is closest to the desired watermarked mantissa value -0.4300. As a result, the new mantissa bit pattern of 0100, which corresponds to the watermarked mantissa WMk of the watermarked MDCT coefficient wmk, replaces the original mantissa bit pattern of 0101. Likewise, each of the MDCT coefficients in the MDCT coefficient set MA5 may be modified in the manner described above. If a watermarked mantissa value is outside the quantization range of mantissa values (i.e., greater than 0.9333 or less than -0.9333), either the positive limit of 1110 or the negative limit of 0000 is selected as the new mantissa code, as appropriate. Additionally, and as discussed above, while the mantissa codes associated with each MDCT coefficient of an MDCT coefficient set may be modified as described above, the exponents associated with the MDCT coefficients remain unchanged.
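The selection rule above (pick the nearest representable mantissa, saturating at 0000 or 1110 when the watermarked value is out of range) can be sketched as follows. `select_mantissa_code` is a hypothetical helper that assumes the same symmetric mapping value = (2c - 14)/15 used in table 600; ties between two equally close codes are resolved by Python's `round`, a detail the text does not specify.

```python
def select_mantissa_code(watermarked_value, levels=15):
    """Return the mantissa code whose quantized value is closest to `watermarked_value`.

    Out-of-range values saturate at code 0 (most negative) or code levels-1
    (most positive). Only the mantissa code is chosen; the exponent is untouched.
    """
    # Invert value = (2*code - (levels - 1)) / levels, then round to the nearest code.
    ideal = (watermarked_value * levels + (levels - 1)) / 2
    return min(max(round(ideal), 0), levels - 1)


print(f"{select_mantissa_code(-0.43):04b}")  # 0100
print(f"{select_mantissa_code(1.2):04b}")    # 1110 (positive limit)
```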
[0044] The repacking unit 440 is configured to repack the watermarked MDCT coefficient sets 560 associated with each frame of the AC-3 data stream 240 for transmission. In particular, the repacking unit 440 identifies the position of each MDCT coefficient set within a frame of the AC-3 data stream 240 so that the corresponding watermarked MDCT coefficient set can be used to modify the MDCT coefficient set. To rebuild a watermarked version of Frame A, for example, the repacking unit 440 may identify the position of and modify the MDCT coefficient sets MA0 to MA5 based on the corresponding watermarked MDCT coefficient sets MA0W to MA5W in the corresponding identified positions. Using the unpacking, modifying, and repacking processes described herein, the AC-3 data stream 240 remains a compressed digital data stream while the watermark 230 is embedded or inserted in the AC-3 data stream 240. As a result, the embedding device 210 inserts the watermark 230 into the AC-3 data stream 240 without additional decompression/compression cycles that may degrade the quality of the media content in the AC-3 data stream 240.
[0045] For simplicity, the AC-3 data stream 240 is described in connection with FIG. as including a single channel. However, the methods and apparatus disclosed herein may be applied to compressed digital data streams having audio blocks associated with multiple channels, such as 5.1 channels (i.e., five full-bandwidth channels and one low-frequency effect channel), as described below. In the example of FIG. 7, an uncompressed digital data stream 700 may include a plurality of audio block sets 710. Each of the audio block sets 710 may include audio blocks associated with multiple channels 720 and 730 including, for example, a front left channel, a front right channel, a center channel, a surround left channel, a surround right channel, and a low-frequency effect (LFE) channel (e.g., a sub-woofer channel). For example, the audio block set AUD0 includes an audio block A0L associated with the front left channel, an audio block A0R associated with the front right channel, an audio block A0C associated with the center channel, an audio block A0SL associated with the surround left channel, an audio block A0SR associated with the surround right channel, and an audio block A0LFE associated with the LFE channel. Similarly, the audio block set AUD1 includes an audio block A1L associated with the front left channel, an audio block A1R associated with the front right channel, an audio block A1C associated with the center channel, an audio block A1SL associated with the surround left channel, an audio block A1SR associated with the surround right channel, and an audio block A1LFE associated with the LFE channel.
[0046] Each of the audio blocks associated with a particular channel in the audio block sets 710 may be processed in a manner similar to that described above in connection with FIGS. 5 and 6. For example, the audio blocks associated with the center channel 810 of FIG. 8, shown by way of example as A0C, A1C, A2C, and A3C, may be transformed to generate the MDCT coefficient sets 820 associated with a compressed digital data stream 800. As noted above, each of the MDCT coefficient sets 820 may be derived from a 512-sample audio block formed by concatenating a preceding (old) 256-sample audio block and a current (new) 256-sample audio block. The MDCT algorithm may then process the time-domain audio blocks 810 (e.g., A0C through A5C) to generate the MDCT coefficient sets (e.g., M0C through M5C).
[0047] Based on the MDCT coefficient sets 820 of the compressed digital data stream 800, the identifying unit 410 identifies a plurality of frames (not shown) and header information associated with each of the frames as described above. The header information includes compression information associated with the compressed digital data stream 800. For each of the frames, the unpacking unit 420 unpacks the MDCT coefficient sets 820 to determine the compression information associated with the MDCT coefficient sets 820. For example, the unpacking unit 420 may identify the number of bits used by the original compression process to represent the mantissa of each MDCT coefficient in each of the MDCT coefficient sets 820. Such compression information may be used to embed the watermark 230 as described above in connection with FIG. 6. The modification unit 430 then generates inverse transformed time-domain audio blocks 830, shown by way of example as TA0C", TA1C', TA1C", TA2C', TA2C", and TA3C'. The time-domain audio blocks 830 include a set of previous (old) time-domain audio blocks (which are represented as prime blocks) and a set of current (new) time-domain audio blocks (which are represented as double-prime blocks). By adding the corresponding prime blocks and double-prime blocks based on, for example, the Princen-Bradley time-domain aliasing cancellation (TDAC) technique, the original time-domain audio blocks compressed to form the AC-3 digital data stream 800 may be reconstructed (i.e., the reconstructed time-domain audio blocks 840). For example, the modification unit 430 may add the time-domain audio blocks TA1C' and TA1C" to reconstruct the time-domain audio block TA1C (i.e., TA1CR). Likewise, the modification unit 430 may add the time-domain audio blocks TA2C' and TA2C" to reconstruct the time-domain audio block TA2C (i.e., TA2CR).
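The prime/double-prime reconstruction can be checked numerically. The sketch below is a minimal NumPy illustration, not AC-3 reference code: it assumes a sine window (AC-3 itself uses a Kaiser-Bessel-derived window; both satisfy the Princen-Bradley condition), an unscaled forward MDCT, and a 2/M-scaled inverse. The second half of the inverse transform of the set spanning (previous, current) is treated as the prime copy of the current block, and the first half of the inverse transform of the set spanning (current, next) as its double-prime copy; neither copy equals the block on its own, but their sum does.

```python
import numpy as np

M = 256  # samples per audio block; each transform spans 2 * M = 512 samples


def mdct_basis(m=M):
    """Cosine basis of shape (m, 2m): cos(pi/m * (n + 1/2 + m/2) * (k + 1/2))."""
    n = np.arange(2 * m)
    k = np.arange(m)
    return np.cos(np.pi / m * np.outer(k + 0.5, n + 0.5 + m / 2))


def sine_window(m=M):
    """Sine window; satisfies Princen-Bradley: w[n]**2 + w[n + m]**2 == 1."""
    return np.sin(np.pi * (np.arange(2 * m) + 0.5) / (2 * m))


def mdct(block, window):
    """Unscaled forward MDCT: 512 samples -> 256 coefficients."""
    return mdct_basis(len(block) // 2) @ (window * block)


def imdct(coeffs, window):
    """Inverse MDCT with 2/m scaling: 256 coefficients -> 512 windowed, aliased samples."""
    m = len(coeffs)
    return window * (mdct_basis(m).T @ coeffs) * 2 / m


rng = np.random.default_rng(seed=1)
prev_blk, cur_blk, next_blk = rng.standard_normal((3, M))  # three consecutive blocks
w = sine_window()

set_prev_cur = mdct(np.concatenate([prev_blk, cur_blk]), w)
set_cur_next = mdct(np.concatenate([cur_blk, next_blk]), w)

cur_prime = imdct(set_prev_cur, w)[M:]      # aliased copy of the current block (prime)
cur_dprime = imdct(set_cur_next, w)[:M]     # aliased copy of the current block (double-prime)
cur_reconstructed = cur_prime + cur_dprime  # time-domain aliasing cancels

print(np.allclose(cur_reconstructed, cur_blk))  # True
```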
[0048] To insert the watermark 230 from the watermark source 220, the modification unit 430 concatenates two adjacent reconstructed time-domain audio blocks to create a 512-sample audio block (i.e., a modifiable time-domain audio block). For example, the modification unit 430 may concatenate the reconstructed time-domain audio blocks TA1CR and TA2CR, each of which is a 256-sample short block, to form a 512-sample audio block. The modification unit 430 then inserts the watermark 230 into the 512-sample audio block formed by the reconstructed time-domain audio blocks TA1CR and TA2CR to generate the watermarked time-domain audio blocks TA1CW and TA2CW.
[0049] Based on the watermarked time-domain audio blocks 850, the modification unit 430 may generate the watermarked MDCT coefficient sets 860. For example, the modification unit 430 may concatenate the watermarked time-domain audio blocks TA1CW and TA2CW to generate the watermarked MDCT coefficient set M1CW. The modification unit 430 modifies the MDCT coefficient sets 820 based on a corresponding one of the watermarked MDCT coefficient sets 860. For example, the modification unit 430 may use the watermarked MDCT coefficient set M1CW to modify the original MDCT coefficient set M1C. The modification unit 430 may then repeat the process described above for the audio blocks associated with each channel to insert the watermark 230 into the compressed digital data stream 800.
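Because the MDCT is linear, re-transforming the watermarked 512-sample block changes each coefficient by exactly the MDCT of the added watermark; that change is what the modification unit must then express through the mantissas. The sketch below illustrates this under stated assumptions: the reconstructed blocks are random stand-ins for TA1CR and TA2CR, the watermark is a hypothetical fixed-amplitude tone at code frequency index 237 (the actual code signal is masking-weighted block by block), the window is a sine window, and quantization is ignored.

```python
import numpy as np

M = 256  # samples per reconstructed block; the modifiable block has 2 * M = 512


def mdct(block, window):
    """Unscaled 512-sample MDCT producing 256 coefficients."""
    n = np.arange(2 * M)
    k = np.arange(M)
    basis = np.cos(np.pi / M * np.outer(k + 0.5, n + 0.5 + M / 2))
    return basis @ (window * block)


window = np.sin(np.pi * (np.arange(2 * M) + 0.5) / (2 * M))  # assumed sine window
rng = np.random.default_rng(seed=2)
ta1c_r, ta2c_r = rng.standard_normal((2, M))  # stand-ins for TA1CR and TA2CR

# Hypothetical watermark: a low-level tone at code frequency index 237 of 12,288.
watermark = 0.01 * np.cos(2 * np.pi * 237 * np.arange(2 * M) / 12288)

modifiable = np.concatenate([ta1c_r, ta2c_r])     # 512-sample modifiable block
watermarked = modifiable + watermark
ta1c_w, ta2c_w = watermarked[:M], watermarked[M:]  # TA1CW, TA2CW

m1c = mdct(modifiable, window)                          # unquantized stand-in for M1C
m1c_w = mdct(np.concatenate([ta1c_w, ta2c_w]), window)  # watermarked set M1CW
delta = m1c_w - m1c  # per-coefficient change to fold into the mantissas of M1C
print(int(np.abs(delta).argmax()))  # index of the most strongly affected coefficient
```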
[0050] FIG. 9 is a flow diagram depicting one manner in which the example watermark embedding system of FIG. 2 may be configured to embed or insert watermarks in a compressed digital data stream. The example process of FIG. 9 may be implemented as machine-accessible instructions utilizing any of many different programming codes stored on any combination of machine-accessible media such as a volatile or nonvolatile memory or other mass storage device (e.g., a floppy disk, a CD, and a DVD). For example, the machine-accessible instructions may be embodied in a machine-accessible medium such as a programmable gate array, an application specific integrated circuit (ASIC), an erasable programmable read only memory (EPROM), a read only memory (ROM), a random access memory (RAM), magnetic media, optical media, and/or any other suitable type of medium. Further, although a particular order of actions is illustrated in FIG. 9, these actions can be performed in other temporal sequences. Again, the flow diagram 900 is merely provided and described in connection with the components of FIGS. 2 to 5 as an example of one way to configure a system to embed watermarks in a compressed digital data stream.
[0051] In the example of FIG. 9, the process begins with the identifying unit 410 (FIG. 4) identifying a frame associated with the compressed digital data stream 240 (FIG. 2) such as Frame A (FIG. 5) (block 910). The identified frame may include a plurality of MDCT coefficient sets formed by overlapping and concatenating a plurality of audio blocks. In accordance with the AC-3 compression standard, for example, a frame may include six MDCT coefficient sets (i.e., six "audblks"). Further, the identifying unit 410 (FIG. 4) also identifies header information associated with the frame (block 920). For example, the identifying unit 410 may identify the number of channels associated with the compressed digital data stream 240.
[0052] The unpacking unit 420 then unpacks the plurality of MDCT coefficient sets to determine compression information associated with the original compression process used to generate the compressed digital data stream 240 (block 930). In particular, the unpacking unit 420 identifies the mantissa Mk and the exponent Xk of each MDCT coefficient mk of each of the MDCT coefficient sets. The exponents of the MDCT coefficients may then be grouped in a manner compliant with the AC-3 compression standard. The unpacking unit 420 (FIG. 4) also determines the number of bits used to represent the mantissa of each of the MDCT coefficients so that a suitable quantization look-up table specified by the AC-3 compression standard may be used to modify or augment the plurality of MDCT coefficient sets as described above in connection with FIG. 6. Control then proceeds to block 940, which is described in greater detail below in connection with FIG. 10.
[0053] As illustrated in FIG. 10, the modification process 940 begins with the modification unit 430 (FIG. 4) performing an inverse transform of the MDCT coefficient sets to generate inverse transformed time-domain audio blocks (block 1010). In particular, the modification unit 430 generates a previous (old) time-domain audio block (which, for example, is represented as a prime block in FIG. 5) and a current (new) time-domain audio block (which is represented as a double-prime block in FIG. 5) associated with each of the 256-sample original time-domain audio blocks used to generate the corresponding MDCT coefficient set. As described in connection with FIG. 5, for example, the modification unit 430 may generate TA4" and TA5' from the MDCT coefficient set MA5, TA5" and TB0' from the MDCT coefficient set MB0, and TB0" and TB1' from the MDCT coefficient set MB1. For each time-domain audio block, the modification unit 430 adds corresponding prime and double-prime blocks to reconstruct the time-domain audio block based on, for example, the Princen-Bradley TDAC technique (block 1020). Following the above example, the prime block TA5' and the double-prime block TA5" may be added to reconstruct the time-domain audio block TA5 (i.e., the reconstructed time-domain audio block TA5R), while the prime block TB0' and the double-prime block TB0" may be added to reconstruct the time-domain audio block TB0 (i.e., the reconstructed time-domain audio block TB0R).
[0054] To insert the watermark 230, the modification unit 430 generates modifiable time-domain audio blocks using the reconstructed time-domain audio blocks (block 1030). The modification unit 430 generates a modifiable 512-sample time-domain audio block using two adjacent reconstructed time-domain audio blocks. For example, the modification unit 430 may generate a modifiable time-domain audio block by concatenating the reconstructed time-domain audio blocks TA5R and TB0R of FIG. 5.
[0055] Implementing an encoding process such as, for example, one or more of the encoding methods and apparatus described in U.S. Patent Nos. 6,272,176, 6,504,870, and/or 6,621,881, the modification unit 430 inserts the watermark 230 from the watermark source 220 into the modifiable time-domain audio blocks (block 1040). For example, the modification unit 430 may insert the watermark 230 into the 512-sample time-domain audio block generated using the reconstructed time-domain audio blocks TA5R and TB0R to generate the watermarked time-domain audio blocks TA5W and TB0W. Based on the watermarked time-domain audio blocks and the compression information, the modification unit 430 generates watermarked MDCT coefficient sets (block 1050). As noted above, two watermarked time-domain audio blocks, where each block includes 256 samples, may be used to generate a watermarked MDCT coefficient set. For example, the watermarked time-domain audio blocks TA5W and TB0W may be concatenated and then used to generate the watermarked MDCT coefficient set MA5W.
[0056] Based on the compression information associated with the compressed digital data stream 240, the modification unit 430 calculates the mantissa value associated with each of the watermarked MDCT coefficients in the watermarked MDCT coefficient set MA5W as described above in connection with FIG. 6. In this manner, the modification unit 430 can modify or augment the original MDCT coefficient sets using the watermarked MDCT coefficient sets to embed or insert the watermark 230 in the compressed digital data stream 240 (block 1060). Following the above example, the modification unit 430 may replace the original MDCT coefficient set MA5 based on the watermarked MDCT coefficient set MA5W of FIG. 5. For example, the modification unit 430 may replace an original MDCT coefficient in the MDCT coefficient set MA5 with a corresponding watermarked MDCT coefficient (which has an augmented mantissa value) from the watermarked MDCT coefficient set MA5W. Alternatively, the modification unit 430 may compute the difference between the mantissa codes associated with the original MDCT coefficient and the corresponding watermarked MDCT coefficient (i.e., ΔMk = Mk - WMk) and modify the original MDCT coefficient based on the difference ΔMk. In either case, after modifying the original MDCT coefficient sets, the modification process 940 terminates and returns control to block 950.
[0057] Referring back to FIG. 9, the repacking unit 440 repacks the frame of the compressed digital data stream (block 950). The repacking unit 440 identifies the position of the MDCT coefficient sets within the frame so that the modified MDCT coefficient sets may be substituted in the positions of the original MDCT coefficient sets to rebuild the frame. At block 960, if the embedding device 210 determines that additional frames of the compressed digital data stream 240 need to be processed, then control returns to block 910. If, instead, all frames of the compressed digital data stream 240 have been processed, then the process 900 terminates.
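The control flow of process 900 reduces to a loop over frames. The skeleton below is a sketch only: `unpack`, `modify`, and `repack` are hypothetical caller-supplied callables standing in for the unpacking unit 420, the modification process 940, and the repacking unit 440, and no AC-3 bitstream parsing is attempted. The usage lines at the end plug in toy lambdas purely to show the data flow.

```python
from typing import Any, Callable, Iterable, List, Tuple


def embed_stream(
    frames: Iterable[Any],
    unpack: Callable[[Any], Tuple[Any, List[Any]]],
    modify: Callable[[Any, List[Any]], List[Any]],
    repack: Callable[[Any, List[Any]], Any],
) -> List[Any]:
    """Skeleton of process 900; the three stage callables are hypothetical."""
    output = []
    for frame in frames:                            # block 910: next frame
        header, coeff_sets = unpack(frame)          # blocks 920-930: header, mantissas, exponents
        watermarked = modify(header, coeff_sets)    # block 940 (FIG. 10)
        output.append(repack(header, watermarked))  # block 950: same positions, exponents untouched
    return output                                   # block 960: no frames remain


# Toy usage: a "frame" is a tuple (header, set0, set1, ...).
result = embed_stream(
    [("hdr", 1, 2), ("hdr", 3, 4)],
    unpack=lambda f: (f[0], list(f[1:])),
    modify=lambda h, sets: [s + 1 for s in sets],
    repack=lambda h, sets: (h, *sets),
)
print(result)  # [('hdr', 2, 3), ('hdr', 4, 5)]
```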
[0058] As noted above, known watermarking techniques typically decompress a compressed digital data stream into uncompressed time-domain samples, insert the watermark into the time-domain samples, and recompress the watermarked time-domain samples into a watermarked compressed digital data stream. In contrast, the digital data stream 240 remains compressed during the example unpacking, modifying, and repacking processes described herein. As a result, the watermark 230 is embedded into the compressed digital data stream 240 without additional decompression/compression cycles that may degrade the quality of the content in the compressed digital data stream 500.
[0059] To further illustrate the example modification process 940 of FIGS. 9 and 10, FIG. 11 depicts one manner in which a data frame (e.g., an AC-3 frame) may be processed. The example frame processing process 1100 begins with the embedding device 210 reading the header information of the acquired frame (e.g., an AC-3 frame) (block 1110) and initializing an MDCT coefficient set count to zero (block 1120). In the case where an AC-3 frame is being processed, each AC-3 frame includes six MDCT coefficient sets having compressed-domain data (e.g., MA0, MA1, MA2, MA3, MA4, and MA5 of FIG. 5, which are also known as "audblks" in the AC-3 standard). Accordingly, the embedding device 210 determines whether the MDCT coefficient set count is equal to six (block 1130). If the MDCT coefficient set count is not yet equal to six, thereby indicating that at least one more MDCT coefficient set requires processing, the embedding device 210 extracts the exponent (block 1140) and the mantissa (block 1150) associated with an MDCT coefficient of the frame (e.g., the original mantissa Mk described above in connection with FIG. 6). The embedding device 210 computes a new mantissa associated with a code symbol read at block 1220 (e.g., the new mantissa WMk described above in connection with FIG. 6) (block 1160) and modifies the original mantissa associated with the frame based on the new mantissa (block 1170). For example, the original mantissa may be modified based on the difference between the new mantissa and the original mantissa (but limited within the range associated with the bit representation of the original mantissa). The embedding device 210 increments the MDCT coefficient set count by one (block 1180) and control returns to block 1130. Although the example process of FIG. 11 is described above as including six MDCT coefficient sets (e.g., the threshold of the MDCT coefficient set count is six), a process utilizing more or fewer MDCT coefficient sets could be used instead. At block 1130, if the MDCT coefficient set count is equal to six, then all MDCT coefficient sets have been processed such that the watermark has been embedded, and the embedding device 210 repacks the frame (block 1190).
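A minimal sketch of process 1100 follows, assuming a frame has already been unpacked into six lists of (exponent, mantissa code) pairs. The `Coefficient` type and the `code_delta` callable, which stands in for the mantissa change implied by the code symbol, are hypothetical. The clamp implements the requirement that the modified mantissa stay within the range of its bit representation, and the exponent is never written.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Coefficient:
    exponent: int  # Xk: left unchanged by the embedding process
    code: int      # mantissa code, 0 .. levels-1


def process_frame(
    coeff_sets: List[List[Coefficient]],
    code_delta: Callable[[int, int], int],
    levels: int = 15,
    sets_per_frame: int = 6,
) -> None:
    """Sketch of process 1100; modifies mantissa codes in place."""
    count = 0                                             # block 1120
    while count != sets_per_frame:                        # block 1130
        for index, coeff in enumerate(coeff_sets[count]):
            # Block 1140: coeff.exponent is extracted but deliberately not modified.
            original = coeff.code                         # block 1150: original mantissa code
            target = original + code_delta(count, index)  # block 1160: hypothetical new code
            coeff.code = min(max(target, 0), levels - 1)  # block 1170: clamp to valid range
        count += 1                                        # block 1180
    # Block 1190: the caller repacks the frame.


frame = [[Coefficient(exponent=3, code=5), Coefficient(exponent=2, code=14)] for _ in range(6)]
process_frame(frame, code_delta=lambda s, i: -1 if i == 0 else 2)
print(frame[0])  # [Coefficient(exponent=3, code=4), Coefficient(exponent=2, code=14)]
```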
[0060] As noted above, many methods are known to embed a watermark imperceptible to the human ear (e.g., an inaudible code) in an uncompressed audio signal. For example, one known method is described in U.S. Patent No. 6,421,445 to Jensen et al. In particular, as described by Jensen et al., a code signal (e.g., a watermark) may include information at a combination of ten different frequencies, which are detectable by a decoder using a Fourier spectral analysis of a sequence of audio samples (e.g., a sequence of 12,288 audio samples as described in detail below). For example, an audio signal may be sampled at a rate of 48 kilohertz (kHz) to output an audio sequence of 12,288 audio samples that may be processed (e.g., using a Fourier transform) to acquire a relatively high-resolution (e.g., 48,000 Hz / 12,288 ≈ 3.9 Hz) frequency domain representation of the uncompressed audio signal. However, in accordance with the encoding process of the method disclosed by Jensen et al., a sinusoidal code signal having constant amplitude across an entire sequence of audio samples is unacceptable because the sinusoidal code signal may be perceptible to the human ear. To satisfy the masking energy constraints (i.e., to ensure that the sinusoidal code signal information remains imperceptible), the sinusoidal code signal is synthesized across the entire sequence of 12,288 audio samples using a masking energy analysis that determines a local sinusoidal amplitude within each block of audio samples (e.g., wherein each block of audio samples may include 512 audio samples). Thus, the local sinusoidal waveforms may be coherent (in-phase) across the sequence of 12,288 audio samples but have varying amplitudes based on the masking energy analysis.
[0061] However, in contrast to the method disclosed by Jensen et al., the methods and apparatus described herein may be used to embed a watermark or other code signal in a compressed audio signal in a manner such that a compressed digital data stream containing the compressed audio signal remains compressed during the unpacking, modifying, and repacking processes. FIG. 12 depicts one manner in which a watermark, such as that disclosed by Jensen et al., may be inserted in a compressed audio signal. The example process 1200 begins with initializing a frame count to zero (block 1210). Eight frames (e.g., AC-3 frames, each carrying six 256-sample audio blocks per channel) representing a total of 12,288 audio samples of each audio channel may be processed to embed one or more code symbols (e.g., one or more of the symbols "0," "1," and "E" shown in FIG. 13 and described in Jensen et al.) into the audio signal. Although the compressed digital data stream is described herein as including 12,288 audio samples, the compressed digital data stream may have more or fewer audio samples. The embedding device 210 (FIG. 2) may read a watermark 230 from the watermark source 220 to inject one or more code symbols into the sequence of frames (block 1220). The embedding device 210 may acquire one of the frames (block 1230) and proceed to the frame processing operation 1100 described above to process the acquired frame. When the example frame processing operation 1100 terminates, control returns to block 1250 to increment the frame count by one. The embedding device 210 determines whether the frame count is eight (block 1260). If the frame count is not eight, the embedding device 210 returns to acquire another frame in the sequence and repeats the example frame processing operation 1100 as described above in connection with FIG. 11 to process another frame. If, instead, the frame count is eight, the embedding device 210 returns to block 1210 to reinitialize the frame count to zero and repeats the process 1200 to process another sequence of frames.
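The eight-frame grouping follows from the frame layout: 8 frames × 6 audio blocks × 256 new samples per block = 12,288 samples per channel. The generator below sketches process 1200 by pairing each frame with the code symbol it should carry. `assign_symbols` is a hypothetical name, frames are treated as opaque objects, and stopping when the symbol source is exhausted is an assumption the text does not address.

```python
SAMPLES_PER_BLOCK = 256  # new samples per audio block
BLOCKS_PER_FRAME = 6     # audio blocks (audblks) per AC-3 frame
FRAMES_PER_SYMBOL = 8    # frames per 12,288-sample code-symbol sequence


def assign_symbols(frames, symbols):
    """Yield (frame, symbol) pairs, switching to the next symbol every eight frames."""
    symbol_source = iter(symbols)
    frame_count = 0
    symbol = None
    for frame in frames:
        if frame_count == 0:                  # blocks 1210-1220: new sequence, read a symbol
            symbol = next(symbol_source, None)
            if symbol is None:                # assumption: stop when no symbols remain
                return
        yield frame, symbol                   # block 1230, then process 1100 on this frame
        frame_count += 1                      # block 1250
        if frame_count == FRAMES_PER_SYMBOL:  # block 1260
            frame_count = 0


print(FRAMES_PER_SYMBOL * BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK)  # 12288
```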
[0062] As noted above, a code signal (e.g., the watermark 230) may be embedded or injected into the compressed digital data stream (e.g., an AC-3 data stream). As shown in the example table 1300 of FIG. 13 and described in Jensen et al., the code signal may include a combination of ten sinusoidal components corresponding to frequency indices f1 through f10 to represent one of four code symbols "0," "1," "S," and "E." For example, the code symbol "0" may represent a binary value of zero and the code symbol "1" may represent a binary value of one. Further, the code symbol "S" may represent the start of a message and the code symbol "E" may represent the end of a message. While only four code symbols are shown in FIG. 13, more or fewer code symbols could be used instead. Additionally, table 1300 lists the transform bins corresponding to the center frequencies about which the ten sinusoidal components for each symbol are located. For example, the 512-sample central frequency indices (e.g., 10, 12, 14, 16, 18, 20, 22, 24, 26, and 28) are associated with a low-resolution frequency domain representation of the compressed digital data stream and the 12,288-sample central frequency indices (e.g., 240, 288, 336, 384, 432, 480, 528, 576, 624, and 672) are associated with a high-resolution frequency domain representation of the compressed digital data stream. Each 12,288-sample central frequency index is 24 times the corresponding 512-sample central frequency index, since 12,288 / 512 = 24.
[0063] As noted above, each code symbol may be formed using ten sinusoidal components associated with the frequency indices f1 through f10 depicted in table 1300. For example, a code signal for injecting or embedding the code symbol "0" includes ten sinusoidal components corresponding to the frequency indices 237, 289, 339, 383, 429, 481, 531, 575, 621, and 673, respectively. Likewise, a code signal for injecting or embedding the code symbol "1" includes ten sinusoidal components corresponding to the frequency indices 239, 291, 337, 381, 431, 483, 529, 573, 623, and 675, respectively. As shown in the example table 1300, each of the frequency indices f1 through f10 has a unique frequency value at or proximate to each of the 12,288-sample central frequency indices.
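Because 12,288 / 512 = 24, each 12,288-sample frequency index lies closest to the 512-sample index obtained by dividing by 24 and rounding, and at a 48 kHz sampling rate each 12,288-sample index spans 48,000 / 12,288 = 3.90625 Hz. The short sketch below (hypothetical names; the index lists are copied from the paragraph above) checks that the code frequencies for symbols "0" and "1" both map onto the 512-sample central indices 10, 12, ..., 28.

```python
SAMPLE_RATE_HZ = 48_000
LONG_WINDOW = 12_288            # samples per code-symbol analysis window
BLOCK = 512                     # samples per AC-3 long block
RATIO = LONG_WINDOW // BLOCK    # 24 high-resolution indices per 512-sample index

SYMBOL_INDICES = {
    "0": [237, 289, 339, 383, 429, 481, 531, 575, 621, 673],
    "1": [239, 291, 337, 381, 431, 483, 529, 573, 623, 675],
}


def to_block_index(index_12288):
    """Nearest 512-sample frequency index for a 12,288-sample frequency index."""
    return round(index_12288 / RATIO)


def to_hz(index_12288):
    """Frequency in Hz of a 12,288-sample frequency index at 48 kHz."""
    return index_12288 * SAMPLE_RATE_HZ / LONG_WINDOW


print(SAMPLE_RATE_HZ / LONG_WINDOW)                      # 3.90625
print([to_block_index(f) for f in SYMBOL_INDICES["0"]])  # [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
```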
[0064] Each of the ten sinusoidal components associated with the frequency indices f1 through f10 may be synthesized in the time domain using the methods and apparatus described herein. For example, the code signal for injecting or embedding the code symbol "0" may include sinusoids c1(k), c2(k), c3(k), c4(k), c5(k), c6(k), c7(k), c8(k), c9(k), and c10(k). The first sinusoid c1(k) may be synthesized in the time domain as a sequence of samples as follows:

$$c_1(k) = \cos\left(\frac{2\pi \cdot 237\,k}{12288}\right), \qquad k = 0, 1, \ldots, 12287.$$

However, the sinusoid c1(k) generated in this manner would have a constant amplitude over the entire 12,288-sample window. Instead, to generate a sinusoid whose amplitude may be varied from audio block to audio block, the sample values in a 512-sample audio block (e.g., a long AC-3 block) associated with the first sinusoid c1(k) may be computed as follows:

$$c_{1p}(m) = w(m)\cos\left(\frac{2\pi \cdot 237\,(256p + m)}{12288}\right), \qquad m = 0, 1, \ldots, 511, \quad p = 0, 1, \ldots, 46,$$

where w(m) is the window function used in the AC-3 compression described above. One having ordinary skill in the art will appreciate that the preceding equation may be used directly to compute c1p(m), or c1(k) may be pre-computed and appropriate segments extracted to generate c1p(m). In either case, the MDCT transform of c1p(m) includes a set of MDCT coefficient values (e.g., 256 real numbers). Continuing with the preceding example, for c1p(m) corresponding to symbol "0," the MDCT coefficient values associated with the 512-sample frequency indices 9, 10, and 11 may have significant magnitudes because c1p(m) is associated with the 12,288-sample central frequency index 240, which corresponds to the 512-sample central frequency index 10. The MDCT coefficient values associated with other 512-sample frequency indices will be negligible relative to the MDCT coefficient values associated with the 512-sample frequency indices 9, 10, and 11 for the case of c1p(m). Conventionally, the MDCT coefficient values associated with c1p(m) (as well as the other sinusoidal components c2p(m), ..., c10p(m)) are divided by a normalization factor Q as follows:

$$Q = \frac{512}{4} = 128,$$

where 512 is the number of samples associated with each block. This normalization allows a time-domain cosine wave of unit amplitude at the 12,288-sample central frequency index 240 to produce a unit-amplitude MDCT coefficient at the 512-sample central frequency index 10.
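The block-wise synthesis and normalization above can be sketched in NumPy. Assumptions, all labeled in the code: w(m) is taken to be a Kaiser-Bessel-derived window with alpha = 5 (our understanding of the AC-3 long-block window), the MDCT is the standard unscaled cosine-modulated form applied without a second window because c1p(m) is already windowed, and Q = 128 is the normalization factor from the text. The sketch only checks that the strongest coefficient falls near 512-sample index 10; which of the neighboring indices dominates in a given block may depend on the sinusoid's phase within that block, so no exact coefficient value is asserted.

```python
import numpy as np

M = 256           # hop between consecutive 512-sample blocks
N = 2 * M         # samples per long block
Q = N / 4         # normalization factor from the text: 512 / 4 = 128
CODE_INDEX = 237  # f1 for code symbol "0"


def kbd_window(length=N, alpha=5.0):
    """Kaiser-Bessel-derived window; an assumed stand-in for the AC-3 w(m)."""
    half = length // 2
    kernel = np.kaiser(half + 1, np.pi * alpha)
    rising = np.sqrt(np.cumsum(kernel[:half]) / kernel.sum())
    return np.concatenate([rising, rising[::-1]])


def code_block(p, window, freq_index=CODE_INDEX):
    """c_1p(m) = w(m) * cos(2*pi*freq_index*(256*p + m) / 12288), m = 0..511."""
    m = np.arange(N)
    return window * np.cos(2 * np.pi * freq_index * (M * p + m) / 12288)


def mdct_prewindowed(x):
    """Unscaled MDCT of an already-windowed 512-sample block (256 outputs)."""
    n = np.arange(N)
    k = np.arange(M)
    basis = np.cos(np.pi / M * np.outer(k + 0.5, n + 0.5 + M / 2))
    return basis @ x


w = kbd_window()
coeffs = mdct_prewindowed(code_block(p=3, window=w)) / Q
strongest = int(np.argmax(np.abs(coeffs)))
print(strongest, round(float(coeffs[strongest]), 3))
```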
[0065] Continuing with the preceding example, for c1p(m) associated with code symbol "0," the code frequency index 237 (e.g., the frequency value corresponding to the frequency index f1 associated with the code symbol "0") may cause the 512-sample central frequency index 10 to have the highest MDCT magnitude relative to the 512-sample frequency indices 9 and 11 because the 512-sample central frequency index 10 corresponds to the 12,288-sample central frequency index 240 and the code frequency index 237 is proximate to the 12,288-sample central frequency index 240. Likewise, the second frequency index f2 corresponding to the code frequency index 289 may produce MDCT coefficients with significant MDCT magnitudes in the 512-sample frequency indices 11, 12, and 13. The code frequency index 289 may cause the 512-sample central frequency index 12 to have the highest MDCT magnitude because the 512-sample central frequency index 12 corresponds to the 12,288-sample central frequency index 288 and the code frequency index 289 is proximate to the 12,288-sample central frequency index 288. Similarly, the third frequency index f3 corresponding to the code frequency index 339 may produce MDCT coefficients with significant MDCT magnitudes in the 512-sample frequency indices 13, 14, and 15. The code frequency index 339 may cause the 512-sample central frequency index 14 to have the highest MDCT magnitude because the 512-sample central frequency index 14 corresponds to the 12,288-sample central frequency index 336 and the code frequency index 339 is proximate to the 12,288-sample central frequency index 336. Based on the sinusoidal components at each of the ten frequency indices f1 through f10, the MDCT coefficients representing the actual watermarked code signal will correspond to the 512-sample frequency indices ranging from 9 to 29. Some of the 512-sample frequency indices, such as, for example, the odd indices 9 through 29, may be influenced by energy spill-over from neighboring code frequency indices (from two neighbors in the case of indices 11 through 27), with the amount of spill-over being a function of the weighting applied to each sinusoidal component based on the masking energy analysis. Accordingly, in each 512-sample audio block of the compressed digital data stream, the MDCT coefficients may be computed as described below to represent the code signal.
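Under the assumption stated above, that each code component produces significant MDCT magnitudes only at its 512-sample central index and the two adjacent indices, the affected range and the indices shared by two neighboring components can be enumerated directly. A small illustrative sketch (the variable names are hypothetical):

```python
# 512-sample central frequency indices for f1..f10, per table 1300.
CENTRAL = list(range(10, 29, 2))  # 10, 12, ..., 28

# Map each affected 512-sample index to the central indices that spill into it.
touched = {}
for center in CENTRAL:
    for k in (center - 1, center, center + 1):  # assumed: significant at center +/- 1 only
        touched.setdefault(k, []).append(center)

span = (min(touched), max(touched))
shared = sorted(k for k, owners in touched.items() if len(owners) == 2)
print(span)    # (9, 29)
print(shared)  # [11, 13, 15, 17, 19, 21, 23, 25, 27]
```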
[0066] In the compressed AC-3 data stream, for example, each AC-3 frame includes MDCT coefficient sets having six MDCT coefficients (e.g., MA0, MA1, MA2, MA3, MA4, and MA5 of FIG. 5), with each MDCT coefficient corresponding to a 512-sample audio block. As described above in connection with FIGS. 5 and 6, each MDCT coefficient is represented as $m_k = M_k \cdot 2^{-X_k} = (s_k \cdot N_k) \cdot 2^{-X_k}$, where $X_k$ is the exponent and $M_k$ is the mantissa. The mantissa $M_k$ is the product of a mantissa step size $s_k$ and an integer value $N_k$. The mantissa step size $s_k$ and the exponent $X_k$ may be used to form a quantization step size $S_k = s_k \cdot 2^{-X_k}$. Referring to the look-up table 600 of FIG. 6, for example, the mantissa step size $s_k$ is 2/15 and the integer value $N_k$ is -2 when the original mantissa value is -0.2666 (i.e., -4/15).
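The mantissa/exponent representation in paragraph [0066] can be checked numerically. The hypothetical helpers below (not part of any AC-3 library) use exact fractions to rebuild an MDCT coefficient from the mantissa step size $s_k$, the integer value $N_k$, and the exponent $X_k$, and to form the quantization step size $S_k$. The exponent value 3 is an arbitrary assumption for illustration; the source example only specifies $s_k$ = 2/15 and $N_k$ = -2.

```python
from fractions import Fraction


def mdct_coefficient(step: Fraction, n: int, exponent: int) -> Fraction:
    """m_k = (s_k * N_k) * 2**(-X_k)."""
    return step * n / Fraction(2) ** exponent


def quantization_step(step: Fraction, exponent: int) -> Fraction:
    """S_k = s_k * 2**(-X_k): the change in m_k caused by changing N_k by 1."""
    return step / Fraction(2) ** exponent


s_k = Fraction(2, 15)  # mantissa step size from the FIG. 6 example
n_k = -2               # integer value from the FIG. 6 example
mantissa = s_k * n_k   # M_k = s_k * N_k
print(mantissa)        # -4/15

x_k = 3  # assumed exponent, for illustration only
print(mdct_coefficient(s_k, n_k, x_k))  # -1/30
print(quantization_step(s_k, x_k))      # 1/60
```

Because $m_k = N_k \cdot S_k$, a change of one in $N_k$ shifts the coefficient by exactly one quantization step $S_k$, which is why $S_k$ governs the watermark modification that follows.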
[0067] To inject a code signal into the compressed AC-3 data stream, modifications to the mantissa set $M_k$ for k = 9 through 29 are determined. For example, consider a subset of the mantissa set $M_k$ for k = 9 through 29 in which the MDCT coefficient magnitudes $C_9$, $C_{10}$, and $C_{11}$ corresponding to the watermarked MDCT coefficients $wm_9$, $wm_{10}$, and $wm_{11}$ are -0.3, 0.8, and 0.2, respectively (with the varying amplitude based on the local masking energy). Furthermore, assume that the code MDCT magnitude $C_{11}$
associated with the 512-sample central frequency index 11 is the MDCT coefficient having the lowest absolute magnitude (e.g., an absolute value of 0.2) for the entire mantissa set ($C_k$ for k = 9 through 29). The value of the code MDCT magnitude $C_{11}$ is used to normalize and modify the values of the MDCT coefficients $m_9$, $m_{10}$, and $m_{11}$ (as well as the other MDCT coefficients in the set $m_9$ through $m_{29}$) because the code MDCT magnitude $C_{11}$ has the lowest absolute magnitude. First, $C_{11}$ is normalized to 1.0 and then used to normalize, for example, $C_9$ and $C_{10}$ as $C_9 = -0.3 / C_{11} = -1.5$ and $C_{10} = 0.8 / C_{11} = 4.0$. Then, the mantissa integer value $N_{11}$ corresponding to the original MDCT coefficient $m_{11}$ is increased by 1, as this is the minimum amount (due to mantissa step size quantization) by which $m_{11}$ may be modified to reflect the addition of the watermark code corresponding to $C_{11}$. Finally, the mantissa integer values $N_9$ and $N_{10}$ corresponding to the original MDCT coefficients $m_9$ and $m_{10}$ are modified relative to $N_{11}$ as follows:
$$N_9 \rightarrow N_9 + (-1.5) \cdot \frac{S_{11}}{S_9}, \qquad N_{10} \rightarrow N_{10} + 4.0 \cdot \frac{S_{11}}{S_{10}}$$
Thus, the modified mantissa integer values $N_9$, $N_{10}$, and $N_{11}$ (and the similarly modified mantissa integers $N_{12}$ through $N_{29}$) may be used to modify the corresponding original MDCT coefficients to embed the watermark code. Also, as mentioned previously, for any MDCT coefficient, the maximum change is limited by the upper and lower limits of its mantissa integer value $N_k$. Referring to FIG. 6, for example, the table 600 indicates lower and upper limit values of -0.9333 and +0.9333.
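The normalization and mantissa-integer update in paragraph [0067] can be sketched as follows. This is a hypothetical illustration, not the patented implementation. It divides every code magnitude by the absolute value of the smallest one (so a negative smallest magnitude would be stepped by -1 rather than +1), computes $\Delta N_k = (C_k / |C_{min}|) \cdot S_{min} / S_k$ so that each change in $m_k$ is proportional to $C_k$, rounds to the nearest integer, and clamps to an assumed range of -7 to +7, which corresponds to the ±0.9333 (±14/15) limits of table 600 when the mantissa step size is 2/15. The rounding rule, the clamp range, the step sizes, and the original $N_k$ values in the example are all assumptions; only $C_9$, $C_{10}$, and $C_{11}$ come from the text.

```python
# Hypothetical sketch of the mantissa-integer update described in [0067].

def integer_increments(code_mags, steps):
    """Return the unrounded change in N_k for every index k.

    code_mags: signed code MDCT magnitudes C_k, keyed by 512-sample index k.
    steps: quantization step sizes S_k, keyed the same way.
    The index with the smallest |C_k| changes by exactly +1 or -1; every other
    index is scaled so that its change in m_k (delta N_k * S_k) is proportional to C_k.
    """
    k_min = min(code_mags, key=lambda k: abs(code_mags[k]))
    ref = abs(code_mags[k_min])  # normalizing magnitude (0.2 in the text)
    s_ref = steps[k_min]         # S_min, e.g. S_11
    return {k: (c / ref) * s_ref / steps[k] for k, c in code_mags.items()}


def apply_increments(n_values, increments, n_min=-7, n_max=7):
    """Round each increment, add it to N_k, and clamp to [n_min, n_max].

    The default -7..+7 range assumes a mantissa step size of 2/15, i.e. the
    -0.9333..+0.9333 (= +/-14/15) limits of table 600.
    """
    return {
        k: max(n_min, min(n_max, n + round(increments[k])))
        for k, n in n_values.items()
    }


# C_9, C_10, C_11 from the text; step sizes and original N_k are assumed.
code = {9: -0.3, 10: 0.8, 11: 0.2}
base = 2 / 15
steps = {9: base * 2**-4, 10: base * 2**-3, 11: base * 2**-3}
n_orig = {9: 1, 10: 5, 11: -2}

inc = integer_increments(code, steps)
print({k: round(v, 6) for k, v in inc.items()})  # {9: -3.0, 10: 4.0, 11: 1.0}
result = apply_increments(n_orig, inc)
print(result)  # {9: -2, 10: 7, 11: -1}  (N_10 clipped from 9 to 7)
```

With equal step sizes the increments reduce to the normalized values -1.5, 4.0, and 1.0 from the text; the example above uses a smaller $S_9$ to show how the ratio $S_{11}/S_9$ scales the integer change and how the table limits cap $N_{10}$.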
[0068] Thus, the preceding example illustrates how the local masking energy may be used to determine the code magnitude for code symbols to be embedded into a compressed digital audio data stream. Moreover, eight successive frames of the compressed digital data stream were modified without performing decompression of
MDCT coefficients during the encoding process of the methods and apparatus described herein.
[0069] FIG. 14 is a block diagram of an example processor system 2000 that may be used to implement the methods and apparatus disclosed herein. The processor system 2000 may be a desktop computer, a laptop computer, a notebook computer, a personal digital assistant (PDA), a server, an Internet appliance, or any other type of computing device.
[0070] The processor system 2000 illustrated in FIG. 14 includes a chipset 2010, which includes a memory controller 2012 and an input/output (I/O) controller 2014. As is well known, a chipset typically provides memory and I/O management functions, as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by a processor 2020. The processor 2020 is implemented using one or more processors. In the alternative, other processing technology may be used to implement the processor 2020. The processor 2020 includes a cache 2022, which may be implemented using a first-level unified cache (L1), a second-level unified cache (L2), a third-level unified cache (L3), and/or any other suitable structures to store data.
[0071] As is conventional, the memory controller 2012 performs functions that enable the processor 2020 to access and communicate with a main memory 2030 including a volatile memory 2032 and a non-volatile memory 2034 via a bus 2040. The volatile memory 2032 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 2034 may be implemented using flash memory, Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), and/or any other desired type of memory device.
[0072] The processor system 2000 also includes an interface circuit 2050 that is coupled to the bus 2040. The interface circuit 2050 may be implemented using any type of well known interface standard such as an Ethernet interface, a universal serial bus (USB), a third generation input/output (3GIO) interface, and/or any other suitable type of interface.
[0073] One or more input devices 2060 are connected to the interface circuit 2050. The input device(s) 2060 permit a user to enter data and commands into the processor 2020. For example, the input device(s) 2060 may be implemented by a keyboard, a mouse, a touch-sensitive display, a track pad, a track ball, an isopoint, and/or a voice recognition system.
[0074] One or more output devices 2070 are also connected to the interface circuit 2050. For example, the output device(s) 2070 may be implemented by media presentation devices (e.g., a light emitting display (LED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, a printer and/or speakers). The interface circuit 2050, thus, typically includes, among other things, a graphics driver card.
[0075] The processor system 2000 also includes one or more mass storage devices 2080 to store software and data. Examples of such mass storage device(s) 2080 include floppy disks and drives, hard disk drives, compact disks and drives, and digital versatile disks (DVD) and drives.
[0076] The interface circuit 2050 also includes a communication device such as a modem or a network interface card to facilitate exchange of data with external computers via a network. The communication link between the processor system 2000 and the network may be any type of network connection such as an Ethernet connection, a digital subscriber line (DSL), a telephone line, a cellular telephone system, a coaxial cable, etc.
[0077] Access to the input device(s) 2060, the output device(s) 2070, the mass storage device(s) 2080 and/or the network is typically controlled by the I/O controller 2014 in a conventional manner. In particular, the I/O controller 2014 performs functions that enable the processor 2020 to communicate with the input device(s) 2060, the output device(s) 2070, the mass storage device(s) 2080 and/or the network via the bus 2040 and the interface circuit 2050.
[0078] While the components shown in FIG. 14 are depicted as separate blocks within the processor system 2000, the functions performed by some of these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits. For example, although the memory controller 2012 and the I/O controller 2014 are depicted as separate blocks within the chipset 2010, the memory controller 2012 and the I/O controller 2014 may be integrated within a single semiconductor circuit.
[0079] The methods and apparatus disclosed herein are particularly well suited for use with data streams implemented in accordance with the AC-3 standard. However, the methods and apparatus disclosed herein may be applied to other digital audio coding techniques.
[0080] In addition, while this disclosure is made with respect to example television systems, it should be understood that the disclosed system is readily applicable to many other media systems. Accordingly, while this disclosure describes example systems and processes, the disclosed examples are not the only way to implement such systems.
[0081] Although certain example methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus, and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of
equivalents. For example, although this disclosure describes example systems including, among other components, software executed on hardware, it should be noted that such systems are merely illustrative and should not be considered as limiting. In particular, it is contemplated that any or all of the disclosed hardware and software components could be embodied exclusively in dedicated hardware, exclusively in firmware, exclusively in software or in some combination of hardware, firmware, and/or software.