Note: Descriptions are shown in the official language in which they were submitted.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
1
A method for generating and decoding image dependent
watermarks
s Technical Field
The present invention relates to methods of
generating and decoding watermarks in images according to
the preamble of the independent claims.
Background Art
The idea of using a robust digital watermark
to detect and trace copyright violations has stimulated
significant interest among artists and publishers in
recent years. Podilchuk (Podilchuk & Zeng 1998) gives
three important requirements for an effective water-
marking scheme: transparency, robustness and capacity.
Transparency refers to the fact that we would like the
2o watermark to be invisible. The watermark should also be
robust against a variety of possible attacks by pirates.
These include robustness against compression such as
JPEG, scaling and aspect ratio changes, rotation,
cropping, row and column removal, addition of noise,
2s filtering, cryptographic and statistical attacks, as well
as insertion of other watermarks and the watermark copy
attack proposed by Kutter (Kutter, Voloshynovskiy
Herrigel 2000) in which a watermark is estimated from one
image and added to another one.
3o The third requirement is that the watermark
be able to carry a certain amount of information i.e.
capacity. In order to attach a unique identifier to each
buyer of an image, a typical watermark should be able to
carry at least 60-100 bits of information. However few
3s publications deal with 60 or more bits.
Watermarking methods can be divided into two
broad categories: spatial domain methods such as (Bender,
CONFIRMATION COPY
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
2
Gruhl & Morimoto 1996, Pitas 1996) and transform domain
methods which have for the most part focused on DCT
(Podilchuk & Zeng 1998, Barni et al. 1998), DFT (Pereira
& Pun 1999) and most recently wavelet domain methods
(Podilchuk & Zeng 1998, Barni, Bartolini, Cappellini,
Lippi & Piva 1999). Transform domain methods have several
advantages over spatial domain methods. Firstly, it has
been observed that in order for watermarks to be robust,
they must be inserted into the perceptually significant
Zo parts of an image. For images these are the lower
frequencies which can be marked directly if a transform
domain approach is adopted. Secondly, since compression
algorithms operate in the frequency domain (-for example
DCT for JPEG and wavelet for EZW) it is possible to
optimize methods against compression algorithms.
Thirdly, certain transforms are intrinsically robust to
certain transformations. For example, the DFT domain has
been successfully adopted in algorithms which attempt to
recover watermarks from images which have undergone
affine transformations (Pereira & Pun 1999).
While transform domain watermarking clearly
offers benefits, the problem is more challenging since it
is more difficult to generate watermarks which are
adapted to the human visual system (HVS). The problem
arises since constraints on the acceptable level of
distortion for a given pixel may be specified in the .
spatial domain. In the bulk of the literature on
adaptive transform domain watermarks, a watermark is
generated in the transform domain and then the inverse
3o transform is applied to generate the spatial domain
counterpart. The watermark is then modulated as a
function of a spatial domain mask in order to render it
invisible. However this spatial domain modulation is
suboptimal since it changes the original frequency domain
watermark. In the case of a DFT domain watermark,
multiplication by a mask in the spatial domain
corresponds to convolution of the magnitude of the
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
3
spectrum. Unfortunately, to correctly account for the
effects of the mask at decoding a deconvolution problem
would have to be solved. This is known to be difficult
and to our knowledge in the context of watermarking this
problem has not been addressed. Methods proposed in the
literature simply ignore the effects of the mask at
decoding. One alternative which has recently appeared is
the attempt at specifying the mask in the transform
domain (Podilchuk & Zeng 1998). However other authors
(e. g. Swanson (Swanson, Zhu & Tewfik 1998)) have noted
the importance of masking in the spatial domain even
after a frequency domain mask has been applied.
The existing technologies exhibit at least
one of the following problems:
s5 1. Less than 60 bits are encoded.
2. sub-optimal spatial domain modulation is
applied to reduce visibility.
3. truncation is applied to deal with values
greater than 255 or less than 0 in the image.
4. The watermark is not image dependent and
in particular does not resist against the watermark copy
attack which estimates the watermark from one image and
adds it to another.
5. Cannot be optimized against JPEG
compression.
6. Uses an additive watermark which is easily
copied, or attacked by denoising and perceptual
remodulation as proposed by Voloshynovskiy
(Voloshynovskiy, Herrigel, Baumgartner, Pereira & Pun
2000) .
7. At embedding the image is treated as
noise.
Disclosure of the Invention
It is the object of the present invention to
provide a method of the type mentioned above that is
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
4
capable of dealing with at least some, preferably all of
these problems.
According to the present invention, the
problem is solved by the method of the independent
claims.
Preferred embodiments are described in the
dependent claims.
The present method is suited for watermarking
still images or video data.
Modes for Carrying Out the Invention
Formulation of a preferred embodiment:
We formulate the embedding process as a constrained
optimization problem. We assume that we are given an
image to be watermarked denoted I. If it is an RGB image
we work with the luminance component though the same
methodology can be applied to other color spaces. We are
2o also given a masking function V(I) which returns 2
matrices of the same size of I containing the values
~pi~~ and Oni~~ corresponding to the amount by which
pixel Ii~~ can be respectively increased and decreased
without being noticed. These can be determined by noise
visibility functoins NVF as proposed by Voloshynovskiy
(Voloshynovskiy, Herrigel, Baumgaertner & Pun 1999) or
other visual models such as those proposed by Osberger
(Osberger, Bergmann & Maeder 1998). We note that these
Opi~~ and Oni~~ are not necessarily the same since we
ao also take into account truncation effects. That is pixels
are integers e.g. in the range 0-255, consequently it is
possible to have a pixel whose value is 1 which can be
increased by a large amount, but can be decreased by at
most 1. In the general case, the function V(I) can be a
complex function of texture, luminance, contrast,
frequency and patterns.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
We wish to embed a message m=(ml,m2...mM),
where miEfO,l~ and M is the number of bits in the
message. In general, the binary message may first be
augmented by a checksum and/or coded using error
s correction codes such as BCH or turbo codes to produce a
message me of length Mc=512. Without loss of generality
we assume the image I is of size 128x128 corresponding to
a very small image. For larger images the same procedure
is adopted for each 128x128 large block. To embed the
1o message, we first divide the image into 8x8 blocks. In
each 8x8 block we embed 2 bits from mc. For each 8x8
block we select, using a key, 2 mid-frequency DCT
coefficients in which we will embed the information bits.
We then have to solve the following constrained
minimization problem:
min fx ; Ax <- b (1)
x
where x= [x11 ... xg1 x12 .° x82 °~ x18 ~° x88 t is a
vector of
offsets to be added to the DCT coefficients of the 8x8
block. x is arranged column by column and the values of
this vector are to be varied during minimization. f is a
vector of zeros except in the positions of the 2 selected
coefficients where we insert a -1 or 1 depending on
whether we wish to respectively increase or decrease the
value of the coefficients as determined by me and
described in the next section. fx denotes the scalar
product of these vectors. Ax S b contain the constraints
which are partitioned as follows.
A - IDCT b - OP ( 2 )
- IDCT ~ O
where IDCT is the matrix which yields the 2D inverse DCT
transform of x (with elements of the resulting image
3s arranged column by column in the vector). If we let Did
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
6
be the coefficients of the 1D DCT transform then it is
easily shown that the matrix IDCT is given by:
D11D11 ... D1sD11 -D11~21 .. . Dl$D21 .. . D11D81 .. . DlsDs1
D21D11 ... DZgDlI D21D21 ... D2sD21 . .. Da1D81 .. . D2sD81
DsiDii . . . D88Dii DslDai . . . DssD2~ . . . DsiDsl . . . DssD81
IDCT = ~ DiiDia . . . D1sD12 DixDaa . . . Dl8Dz2 . . . D11Ds2 . . . DlsDsz
D81D12 . . . DssDia DsiDz2 . . . D$$D2z ... Dsl..D82 . .. DssDsa
DsiDis . .. DssDls DsiDzs . .. D88D2$ . .. DslDss . .. D$sDsa
We also note that we take Op and On to be
column vectors where the elements are taken column wise
from the matrices of allowable distortions. Stated in
this form the problem is easily solved by the well known
2o Simplex method.
Stated as such the problem only allows for
spatial domain masking, however many authors (Swanson,
Zhu & Tewfik 1996) suggest also using frequency domain
masking. This is possible by adding the following
constraints:
L <_ x <_ U (3)
Here L and U are the allowable lower and
2o upper bounds on the amount we by which we can change a
given frequency component. The Simplex method can also
be used to solve the problem with added frequency domain
constraints.
We note that by adopting this framework, we
z5 in fact allow all DCT coefficients to be modified (in a
given 8x8 block) even though we are only interested in 2
coefficients at decoding. This is a novel approach which
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
7
has not appeared in the literature. Other publications
select a subset of coefficients to mark while leaving the
rest unchanged. This is necessarily suboptimal relative
to our approach. In words, we are "making space" for the
watermark in an optimal fashion by modifying elements
from the orthogonal complement of the coefficients we are
interested in, while satisfying spatial domain
constraints.
1o Effective Channel Coding:
Rather than coding based on the sign of a
coefficient as in (Pereira & Pun 2000), we propose using
the magnitude of the coefficient. To encode a 1 we will
s5 increase the magnitude of a coefficient and to encode a 0
we will decrease the magnitude. At decoding a threshold
Tz will be chosen against which the magnitudes of
coefficients will be compared. The coding strategy is
summarised in table 1, where c3 is the selected DCT
2o coefficient,
Table 1: Magnitude coding
sign(ci) bit coding
25 + 0 decrease ci (set L to stop at 0)
- 0 increase ci (set U to stop at 0)
+ 1 increase ci
- 1 decrease ci
3o The actual embedding is performed by setting
f in equation (1) based on whether we want to increase or
decrease a coefficient.
The major advantage of this scheme over
encoding based on the sign is that the image is no longer
35 treated as noise. As noted by Cox (Cox, Miller &
McKellips 1999) this is an important characteristic of
the potentially most robust schemes since all a priori
information is used. Clearly the best schemes should not
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
8
treat the image as noise since it is known at embedding.
However most algorithms in the literature do not take
.advantage of this knowledge except in the extraction of
perceptual information. In our case, based on the
s observed image DCT coefficient we encode as indicated in
table 1 At decoding the image is once again not noise
since it contributes to the watermark.
The decoding procedure is simple, we need
only take the magnitude of the selected coefficients ci.
so We must then compare them to a threshold Ti. Coefficients
greater than the threshold are assigned a value of l, and
coefficients smaller than the threshold are assigned a
value of 0. Tf ECC was used, the codes are then decoded.
In a superior implementation, soft decoding
Zs can be used. Here we calculate the difference between
the magnitude of the coefficient ci and the threshold Ti.
The result is used for the soft decoding of the coded
sequence. It is well known that soft decoding can provide
up to a 3dB gain over hard decoding since all the
2o information is being used. One possible set of codes are
the turbo codes which provide a near optimum performance
in Gaussian channels. We note that the threshold Ti can
be set empirically. Typically the value Ti should be the
midpoint between the average magnitude associated with a
2s 1 and the value associated with a 0 at encoding. If other
information is available about the noise at decoding,
then the value of Ti should be set accordingly.
Another important property of this scheme is
that it is highly image dependent. This is an important
so property if we wish to resist against the watermark copy
attack (Kutter et al. 2000) in which a watermark is
estimated from one image (typically by denoising) and
added to another image to produce a fake watermark. If
this is done, the watermark will be falsely decoded since
35 at embedding and decoding the marked image is an integral
part of the watermark. Consequently changing the image
implies changing the watermark.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
9
It is also possible to incorporate JPEG
quantization tables into the model in order to increase
the robustness of the algorithm. Assume for example that
we would like to aim for resistance to JPEG compression
s at quality factor 10. Table 2 contains the threshold
value thri~ below which a given DCT coefficient will be
set to o.
Table 2: JPEG thresholds thri~ at quality factor 10
30 30 30 40 60 100 130 130
35 35 35 50 65 130 130 130
40 35 45 45 65 130 130 130
40 45 60 75 130 130 130 130
50 55 95 130 130 130 130 130
60 90 130 130 130 l30 130 130
125 130 130 130 130 130 130 130
130 130 130 130 130 130 130 130
2o In order to improve the performance of the
algorithm we can add bounds based on the values in table
2 to the amount we increase a coefficient. In particular,
if we wish to embed a 1 we need only increase the mag-
nitude of a coefficient to the threshold given in table 2
2s iii. order for it to survive a JPEG compression at quality
factor 10. This is accomplished by setting the bounds L
and U. Since 2 bits are embedded per block, the
remaining energy may be used to embed the other bit. It
is important to note that it may not be possible to
3o achieve the threshold since our visibility constraints as
determined by V in the spatial domain must not be viol-
ated, however the algorithm will embed as much as much
energy as possibly via the minimization in equation 1. We
note that we choose only to embed the watermark in
35 randomly chosen coefficients where the value in table 2
is less than 70 since for larger values we will require
more energy to be sure that the coefficient survives at
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
low JPEG compression. We avoid the 4 lowest frequency
components in the upper left hand part of the DCT block
since these tend to be visible even with small
modifications.
5
Extensions
While the algorithm has been detailed in the case of the
DCT, it is readily applicable to any watermarking domain
Zo (i.e. the coefficients c can either be in the space
domain or a transform domain), and further visibility
constraints can be imposed in the transform and/or
spatial domains. In the case of the DCT domain, we have
moved coefficients towards 0 to encode a 0 and increased
the magnitude to encode a 1. In the general case, we
should move a coefficient ci towards its expected value
Vi to encode a 0 and move it away from (increase the
magnitude of the difference from) the expected value to
encode a 1. In the case of a spatial domain approach,
2o this would consist of moving pixels away from the local
mean Vi to encode a 1 and towards the local V mean to
encode a 0. The local mean would be calculated over the
pixel to be modified and neighboring pixels thereof.
In the case of the spatial domain, i.e. we
select our coefficients ci in the spatial.domain, we can
specify the visibility constraints in the spatial domain
and/or in a transform domain and carry out the
optimization from there.
We also note that instead of using just one
3o coefficient per bit, we may use error correction coding
or in the simplest case the repetition code where each
bit is encoded several times. At decoding, we could use
the average, median or other type of estimate depending
on our estimate of the noise to combine the information.
We further note the extension to video
watermarking. In fact the extension is straightforward,
since we need only apply the algorithm to each frame.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
11
This is an extremely powerful method of overcoming
averaging attacks within video since the watermark
changes from frame to frame.
In general, the present method starts from a
still image of a single video image frame I. It then
derives a set c of coefficients ~c2, ... cN~ from this
image. The set c can either correspond to the values of
the image in the space domain (e.g. the luminance or the
so r, g or b coefficients) or in some transform domain (e. g.
c can be a vector of the DCT coefficients, the Fourier
magnitude coefficients or the coefficients of the
discrete wavelet transform).
Then, some of the coefficients ci are
s5 selected for encoding the bits of the message mc. This
selection can e.g. be based on some secret key, which is
used to derive the indices i of the coefficients to be
selected.
The selected coefficients will either be
2o increased or decreased to encode the corresponding bit.
To decide if a coefficient is to be increased or
decreased, a test function gi(c) is calculated for each
coefficient. This function can depend on all or a subset
of all coefficients in set c. Then, the following table
25 is used for deciding if c1 is to be increased or
decreased:
Table 3: generalized magnitude coding
3o gi (c) bit encoding
true 0 decrease ci
false 0 increase ci
true 1 increase ci
false 1 decrease ci
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
12
As it is obvious to a person skilled in the
art, the column titled "bit" could also be inverted, i.e.
the table could equivalently be written as:
Table 3': generalized magnitude coding
gi(c) bit encoding
true 1 decrease ci
false 1 increase ci
1o true 0 increase ci
false 0 decrease ci
Such a coding is considered to lie equivalent
to the one of table 3.
As mentioned above, the scheme of Table 3 or
3' has the advantage that it depends on the image to be
watermarked and therefore resists the watermark copy
attack.
Preferably, gi(c) is defined such that it is
2o true if and only if ci > Vi, where Vi is the expected
value mentioned above, e.g. 0 (in particular if the
coefficients c are in the DCT or DFT domain) or a local
mean at the position of said coefficients (in particular
when the coefficients c are in the space domain). In this
case, in the first line of table 3 (ci > Vi and bit = 0),
c1 should not be decreased below Vi. In the second line
of table 3 (ci < Vi and bit = 0), ci should not be
increased above Vi.
However, gi(c) could e.g. also depend on the
3o values of coefficients ck that lie adjacent to ci.
The optimization step as described in Eq. (1)
can be generalized as follows:
In order to apply the watermark, a set of
additive offsets x = ~xl, ... xN} must be calculated, which
is to be added to or subtracted from the set c of
coefficients derived from the image, i.e. ci . ci + xi
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
13
for all i = I to N. The offsets x are calculated by
minimizing or maximizing the value of a scalar product
f-x of the vectors f and x as described in Eq. (1). The
condition in Eq. (1) can be expressed in more general
terms by saying that a Boolean valued boundary function
a(x), which depends on the offsets x, must be true.
In the above embodiments, two different
possibilities for defining a(x) are described, which can
be used alternatively or in combination.
1o In Eq. (2) , a (x) is true if Ax 5 b, wherein A
is a double matrix of the inverse DCT. If, more
generally, a transform function t(I) of the image (also
other than DCT) is used to generate the set c of
coefficients, the limitation in Eq. (1) can be expressed
is as follows: a(x) is false if the conditions
t-2 (x) 5 Op and t-Z (x) >_ ~n (4)
are not fulfilled. If transform function t is
2o a linear transform expressed as a matrix T, i.e. c = T-I,
the boundary function a(x) is false if the condition A-x
_< b is false, wherein matrix A and vector b are defined
(analogously to Eq. (2)) by
-1
25 A = T -1 ; b = 'p . (5)
_ T ~n
The boundary function a(x) can achieve one or
several of the following goals:
a) prevent overflow/underflow conditions
3o and/or
b) ensure a minimum quality of the image with
the watermark embedded therein and/or
c) prevent unnecessarily large values of the
offsets x in view of a lossy compression algorithm, such
35 as e.g. the JPEG compression algorithm.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
14
In the above examples, goals a) and b) are
achieved by the boundary conditions in Eqs. (1) and (2)
or (4) and (5) because the values of Op and On prevent
overflow/underflow in the space domain as well as image
degradation.
Goal c) is can be achieved by the limitation
of Eq. (3), i.e. with the lower and upper limits L and U
of c. To do this, L and U are calculated for each
selected coefficient ci by the following steps:
so - Retrieving a threshold value thri for ci.
This value is given by the compression algorithm the
watermark is to be proof against: It is the value thrl
for which the compression algorithm suppress-es ci if the
magnitude of ci is smaller than thrz, and
- If the magnitude of the coefficient ci is
smaller that the threshold value thri and the magnitude
of the coefficient is to be increased, the upper limit Uz
is set approximately to the difference between thri and
the magnitude of ci. By "approximately", it is understood
2o that the upper limit is exactly equal to or slightly
higher than this difference. This ensures that the
coefficient will make it through the compression without
wasting too much energy, i.e. without making them
unnecessarily large.
While there are shown and described presently
preferred embodiments of the invention, it is to be dis-
tinctly understood that the invention is not limited
thereto but may be otherwise variously embodied and prac-
so ticed within the scope of the following claims.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
References:
Barni, M., Bartolini, F., Cappellini, V., Lippi, A. & Piva, A. (1999), A DWT-
based
technique for spatio-frequency masking of digital signatures, in P. W. Wong &
E. J. Delp, eds, 'Security and Watermarking of Multimedia Contents', Vol.
3657,
The Society fox Imaging Science and Technology (IS&T) and the International
Society for Optical Engineering (SPIE), SPIE, San Jose, California, U.S.A.,
pp. 31-39.
*http://lci.die.unifi.it/Publications/swmc99-b4.ps.gz
Barni, M., Bartolini, F., Cappellini, V., Piva, A. & Rigacci, F. (1998), A
M.A.P. identification criterion for DCT-based watermarking, in 'EUSIPCO'98',
Rhodes, Greece.
Cox, I. J., Miller, M. L. & McKellips, A. L. (1999), 'Watermarking as
communications with side infoxmation', Proceedings of the IEEE 8'T(7), 112'1-
1141. .
Kutter, M., Voloshynovskiy, S. &. Herrigel, A. (2000), Watermark copy attack,
iw P. Wah Wong & E. J. Delp, eds, 'IS&T/SPIE's 12th Annual Symposium,
Electronic Imaging 2000: Security and Watermarking of Multimedia Content
II', Vol. 39'T1 of SPIE Proceedings, San Jose; California USA.
'Osberger, W., Bergmann, N. & Maeder, A. J. (1998), An automatic image quality
assessment technique incorporating higher level perceptual factors, in 'IEEE
ICIP-98', Chicago,USA.
Pereira, S. & Pun, T. (1999), Fast xobust template matching for affine
resistant
watermarks, in '3rd International Information Hiding Workshop', Dreseden,
Germany.
Pereira, S. & Pun, T. (2000), A framework for optimal adaptive dct watermarks
using linear programming, in 'Tenth European Signal Processing Conference
(EUSIPCO'2000)', Tampere, Finland.
Podilchuk, C. I. & Zeng, W. (1998), 'Image-adaptive watermarking using visual
models', IEEE Journal on Selected Areas in Communications 16(4), 525-539.
Swanson, M. D., Zhu, B. & Tewfik, A. (1996), Robust data hiding for images,
in '7th TEEE Digital Signal Processing Workshop', Loen, Norway, pp. 3?-40.
G:WM1-A23.
Swanson, M. D., Zhu, B. & Tewfik, A. H. (1998), 'Multiresolution scene-based
video
watermarking using perceptual models', IEEE Journal on Selected Areas in
G'ommunications 16(4), 540-550.
Voloshynovskiy, S., Herrigel, A., Baumgaextner, N. & Pun, T. (1999), A
stochastic approach to content adaptive digital image watermarking, in 'Third
International Workshop on Information Hiding', Dresden, Germany.
CA 02377900 2001-12-20
WO 01/82226 PCT/IB00/00494
16
Voloshynovskiy, S., Herrigel, A., Baumgartner, N., Pereira, S. & Pun, T.
(2000),
A generalized watermark attack based on stochastic watermark estimation and
perceptual remodulation, in P. Wah Wong & E. J. Delp, eds, 'IS&T/SPIE's 12th
Annual Symposium, Electronic Imaging 2000: Security and Watermarking of
Multimedia Content II', Vol. 3971 of SPIE Proceedings, San Jose, California
USA. (Paper EI 3971-34).
Zhu, W., Xiong, Z. & Zhang, Y. (~. (1999), 'Multiresolution watermarking for
images
and video', IEEE Transactions on Carcuits and Systems for lrideo Technology
9(4), 545-550.