Patent 2522856 Summary

(12) Patent: (11) CA 2522856
(54) English Title: VIDEO SIZE CONVERSION AND TRANSCODING FROM MPEG-2 TO MPEG-4
(54) French Title: CONVERSION DE FORMAT D'ECRAN ET TRANSCODAGE DE MPEG-2 EN MPEG-4
Status: Term Expired - Post Grant Beyond Limit
Bibliographic Data
(51) International Patent Classification (IPC):
  • H04N 19/625 (2014.01)
  • H04N 19/40 (2014.01)
  • H04N 19/61 (2014.01)
(72) Inventors :
  • CHEN, XUEMIN (United States of America)
  • PANUSOPONE, KRIT (United States of America)
(73) Owners :
  • GOOGLE TECHNOLOGY HOLDINGS LLC
(71) Applicants :
  • GOOGLE TECHNOLOGY HOLDINGS LLC (United States of America)
(74) Agent: SMART & BIGGAR LP
(74) Associate agent:
(45) Issued: 2008-01-15
(22) Filed Date: 2001-05-25
(41) Open to Public Inspection: 2001-12-13
Examination requested: 2005-11-10
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
09/590,928 (United States of America) 2000-06-09

Abstracts

English Abstract

A transcoder architecture that provides the lowest possible complexity with a small error, e.g., for converting an MPEG-2 bitstream into an MPEG-4 bitstream. The transcoder reads header information (304) from an input bitstream and provides a corresponding header in the new format for the output bitstream. In one embodiment (Fig. 3), a low complexity front-to-back transcoder (with B frames disabled) avoids the need for motion compensation processing. In another embodiment (Fig. 4), a transcoder architecture that minimizes drift error (with B frames enabled) is provided. In another embodiment (Fig. 5), a size transcoder (with B frames enabled) is provided, e.g., to convert a bitstream of ITU-R 601 interlaced video coded with MPEG-2 MP@ML into a simple profile MPEG-4 bitstream which contains SIF progressive video suitable for a streaming video application. For spatial downscaling of field-mode DCT blocks, vertical and horizontal downscaling techniques are combined to use sparse matrices to reduce computations.


French Abstract (translated)

The invention concerns a transcoder architecture of the lowest possible complexity and with a low error level, intended, for example, for converting an MPEG-2 bitstream into an MPEG-4 bitstream. The transcoder reads the header information (304) contained in an input bitstream and provides a corresponding header in the new format for the output bitstream. In one embodiment (Figure 3), a low-complexity front-to-back transcoder (B frames disabled) makes any motion compensation processing unnecessary. In another embodiment (Figure 4), the invention concerns a transcoder architecture that minimizes drift errors (B frames enabled). In yet another embodiment (Figure 5), the invention concerns a size transcoder (B frames enabled) that serves, for example, to convert a bitstream of ITU-R 601 interlaced video coded with MPEG-2 MP@ML into a simple profile MPEG-4 bitstream containing SIF progressive video suitable for a streaming video application. To perform spatial conversion of field-mode DCT blocks, vertical and horizontal conversion techniques are combined to use sparse matrices that reduce computation.

Claims

Note: Claims are shown in the official language in which they were submitted.


1. A method for performing 2:1 downscaling on video data, comprising the steps of:
forming at least one input matrix of N×N Discrete Cosine Transform (DCT) coefficients from the video data by combining four N/2×N/2 field-mode DCT blocks;
performing vertical downsampling and de-interlacing to the input matrix to obtain two N/2×N/2 frame-mode DCT blocks;
forming an N×N/2 input matrix from the two frame-mode DCT blocks; and
performing horizontal downsampling to the N×N/2 matrix to obtain one N/2×N/2 frame-mode DCT block.
2. The method of claim 1, wherein N=16.
3. The method of claim 1, wherein:
the vertical downsampling also achieves low pass filtering of the N×N input matrix.
4. The method of claim 1, wherein:
the vertical downsampling uses a sparse downsampling matrix.
5. The method of claim 4, wherein:
the sparse downsampling matrix=0.5[I8 I8], where I8 is an 8×8 identity matrix.
6. The method of claim 1, wherein:
the horizontal downsampling uses a sparse downsampling matrix composed of odd "O" and even "E" matrices.

7. The method of claim 6, wherein:
the even matrix has the following form:
E = [e(0) 0 0 0,
0 e(1) 0 e(2),
0 0 0 0,
0 e(3) 0 e(4),
0 0 e(5) 0,
0 e(6) 0 e(7),
0 0 0 0,
0 e(8) 0 e(9)]
where e(1) through e(9) are non-zero coefficients; and
the odd matrix has the following form:
O = [0 0 0 0,
o(0) 0 o(1) 0,
0 o(2) 0 0,
o(3) 0 o(4) 0,
0 0 0 0,
o(5) 0 o(6) 0,
0 0 0 o(7),
o(8) 0 o(9) 0]
where o(1) through o(9) are non-zero coefficients.
8. An apparatus for performing 2:1 downscaling on video data, comprising:
means for forming at least one input matrix of N×N Discrete Cosine Transform (DCT) coefficients from the video data by combining four N/2×N/2 field-mode DCT blocks;
means for performing vertical downsampling and de-interlacing to the input matrix to obtain two N/2×N/2 frame-mode DCT blocks;
means for forming an N×N/2 input matrix from the two frame-mode DCT blocks; and
means for performing horizontal downsampling to the N×N/2 matrix to obtain one N/2×N/2 frame-mode DCT block.
9. The apparatus of claim 8, wherein N=16.
10. The apparatus of claim 8, wherein:
the means for performing vertical downsampling also achieves low pass filtering of the N×N input matrix.
11. The apparatus of claim 8, wherein:
the means for performing vertical downsampling uses a sparse downsampling matrix.
12. The apparatus of claim 11, wherein:
the sparse downsampling matrix=0.5[I8 I8], where I8 is an 8×8 identity matrix.
13. The apparatus of claim 8, wherein:
the means for performing horizontal downsampling uses a sparse downsampling matrix composed of odd "O" and even "E" matrices.
14. The apparatus of claim 13, wherein:
the even matrix has the following form:
E = [e(0) 0 0 0,
0 e(1) 0 e(2),
0 0 0 0,
0 e(3) 0 e(4),
0 0 e(5) 0,
0 e(6) 0 e(7),
0 0 0 0,
0 e(8) 0 e(9)]
where e(1) through e(9) are non-zero coefficients; and
the odd matrix has the following form:
O = [0 0 0 0,
o(0) 0 o(1) 0,
0 o(2) 0 0,
o(3) 0 o(4) 0,
0 0 0 0,
o(5) 0 o(6) 0,
0 0 0 o(7),
o(8) 0 o(9) 0]
where o(1) through o(9) are non-zero coefficients.

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 02522856 2005-11-10
VIDEO SIZE CONVERSION AND TRANSCODING FROM MPEG-2 TO
MPEG-4
BACKGROUND OF THE INVENTION
The present invention relates to compression of
multimedia data and, in particular, to a video
transcoder that allows a generic MPEG-4 decoder to
decode MPEG-2 bitstreams. Temporal and spatial size
conversion (downscaling) are also provided.
The following acronyms and terms are used:
CBP - Coded Block Pattern
DCT - Discrete Cosine Transform
DTV - Digital Television
DVD - Digital Video Disc
HDTV - High Definition Television
FLC - Fixed Length Coding
IP - Internet Protocol
MB - Macroblock
ME - Motion Estimation
ML - Main Level
MP - Main Profile
MPS - MPEG-2 Program Stream
MTS - MPEG-2 Transport Stream
MV - Motion Vector
QP - Quantization Parameter
PMV - Prediction Motion Vector
RTP - Real-Time Transport Protocol (RFC 1889)
SDTV - Standard Definition Television

SIF - Standard Intermediate Format
SVCD - Super Video Compact Disc
VLC - Variable Length Coding
VLD - Variable Length Decoding
VOP - Video Object Plane
MPEG-4, the multimedia coding standard, provides a
rich functionality to support various applications,
including Internet applications such as streaming
media, advertising, interactive gaming, virtual
traveling, etc. Streaming video over the Internet
(multicast), which is expected to be among the most
popular applications for the Internet, is also well-
suited for use with the MPEG-4 visual standard (ISO/IEC
14496-2 Final Draft of International Standard (MPEG-4),
"Information Technology - Generic coding of audio-
visual objects, Part 2: visual," Dec. 1998).
MPEG-4 visual handles both synthetic and natural
video, and accommodates several visual object types,
such as video, face, and mesh objects. MPEG-4 visual
also allows coding of an arbitrarily shaped object so
that multiple objects can be shown or manipulated in a
scene as desired by a user. Moreover, MPEG-4 visual is
very flexible in terms of coding and display
configurations by including enhanced features such as
multiple auxiliary (alpha) planes, variable frame rate,
and geometrical transformations (sprites).
However, the majority of the video material (e.g.,
movies, sporting events, concerts, and the like) which
is expected to be the target of streaming video is

already compressed by the MPEG-2 system and stored on
storage media such as DVDs, computer memories (e.g.,
server hard disks), and the like. The MPEG-2 System
specification (ISO/IEC 13818-2 International Standard
(MPEG-2), "Information Technology - Generic coding of
Moving Pictures and Associated Audio: Part 2 - Video,"
1995) defines two system stream formats: the MPEG-2
Transport Stream (MTS) and the MPEG-2 Program Stream
(MPS). The MTS is tailored for communicating or
storing one or more programs of MPEG-2 compressed data
and also other data in relatively error-prone
environments. One typical application of MTS is DTV.
The MPS is tailored for relatively error-free
environments. The popular applications include DVD and
SVCD.
Attempts to address this issue have been
unsatisfactory to date. For example, the MPEG-4 studio
profile (O. Sunohara and Y. Yagasaki, "The draft of
MPEG-4 Studio Profile Amendment Working Draft 2.0,"
ISO/IEC JTC1/SC29/WG11 MPEG99/5135, Oct. 1999) has
proposed an MPEG-2 to MPEG-4 transcoder, but that
process is not applicable to the other MPEG-4 version 1
profiles, which include the Natural Visual profiles
(Simple, Simple Scaleable, Core, Main, N-Bit),
Synthetic Visual profiles (Scaleable Texture, Simple
Face Animation), and Synthetic/Natural Hybrid Visual
(Hybrid, Basic Animated Texture). The studio profile
is not applicable to the Main Profile of MPEG-4 version
1 since it modifies the syntax, and the decoder process

is incompatible with the rest of the MPEG-4 version 1
profiles.
The MPEG standards designate several sets of
constrained parameters using a two-dimensional ranking
order. One of the dimensions, called the "profile"
series, specifies the coding features supported. The
other dimension, called "level", specifies the picture
resolutions, bit rates, and so forth, that can be
accommodated.
For MPEG-2, the Main Profile at Main Level, or
MP@ML, supports a 4:2:0 color subsampling ratio, and I,
P and B pictures. The Simple Profile is similar to the
Main Profile but has no B-pictures. The Main Level is
defined for ITU-R 601 video, while the Simple Level is
defined for SIF video.
Similarly, for MPEG-4, the Simple Profile contains
SIF progressive video (and has no B-VOPs or interlaced
video). The Main Profile allows B-VOPs and interlaced
video.
Accordingly, it would be desirable to achieve
interoperability among different types of end-systems
by the use of MPEG-2 video to MPEG-4 video transcoding
and/or MPEG-4-video to MPEG-2-video transcoding. The
different types of end-systems that should be
accommodated include:
Transmitting Interworking Unit (TIU): Receives
MPEG-2 video from a native MTS (or MPS) system and
transcodes to MPEG-4 -video and distributes over packet
networks using a native RTP-based system layer (such as

an IP-based internetwork). Examples include a real-
time encoder, a MTS satellite link to Internet, and a
video server with MPS-encoded source material.
Receiving Interworking Unit (RIU): Receives MPEG-4
video in real time from an RTP-based network and then
transcodes to MPEG-2 video (if possible) and forwards
to a native MTS (or MPS) environment. Examples include
an Internet-based video server to MTS-based cable
distribution plant.
Transmitting Internet End-System (TIES): Transmits
MPEG-2 or MPEG-4 video generated or stored within the
Internet end-system itself, or received from internet-
based computer networks. Examples include a video
server.
Receiving Internet End-System (RIES): Receives
MPEG-2 or MPEG-4 video over an RTP-based Internet for
consumption at the Internet end-system or forwarding to
a traditional computer network. Examples include a
desktop PC or workstation viewing a training video.
It would be desirable to determine similarities
and differences between MPEG-2 and MPEG-4 systems, and
provide transcoder architectures which yield a low
complexity and small error.
The transcoder architectures should be provided
for systems where B-frames are enabled (e.g., main
profile), as well as a simplified architecture for when
B-frames are not used (simple profile).
Format (MPEG-2 to MPEG-4) and/or size transcoding
should be provided.

It would also be desirable to provide an efficient
mapping from the MPEG-2 to MPEG-4 syntax, including a
mapping of headers.
The system should include size transcoding,
including spatial and temporal transcoding.
The system should allow size conversion at the
input bitstream or output bitstream of a transcoder.
The size transcoder should convert a bitstream of
ITU-R 601 interlaced video coded with MPEG-2 MP@ML into
a simple profile MPEG-4 bitstream which contains SIF
progressive video suitable, e.g., for a streaming video
application.
The system should provide an output bitstream that
can fit in the practical bandwidth for a streaming
video application (e.g., less than 1 Mbps).
The present invention provides a system having the
above and other advantages.

SUMMARY OF THE INVENTION
The invention relates to format transcoding (MPEG-
2 to MPEG-4) and size (spatial and temporal)
transcoding.
A proposed transcoder includes size conversion,
although these parameters can be transcoded either at
the input bitstream or the output bitstream. However,
it is more efficient to include all kinds of
transcoding into the product version of a transcoder to
reduce the complexity since the transcoders share
processing elements with each other (such as a
bitstream reader).
The invention addresses the most important
requirements for a transcoder, e.g., the complexity of
the system and the loss generated by the process.
In one embodiment, a proposed front-to-back
transcoder architecture reduces complexity because
there is no need to perform motion compensation.
In a particular embodiment, the transcoder can use
variable 5-bit QP representation, and eliminates AC/DC
prediction and the nonlinear DC scaler.
The invention is alternatively useful for rate
control and resizing.
A particular method for transcoding a pre-
compressed input bitstream that is provided in a first
video coding format includes the steps of: recovering
header information of the input bitstream; providing
corresponding header information in a second, different

video coding format; partially decompressing the input
bitstream to provide partially decompressed data; and
re-compressing the partially decompressed data in
accordance with the header information in the second
format to provide the output bitstream.
A method for performing 2:1 downscaling on video
data includes the steps of: forming at least one input
matrix of NxN (e.g., N=16) Discrete Cosine Transform
(DCT) coefficients from the video data by combining
four N/2xN/2 field-mode DCT blocks; performing vertical
downsampling and de-interlacing to the input matrix to
obtain two N/2xN/2 frame-mode DCT blocks; forming an
NxN/2 input matrix from the two frame-mode DCT blocks;
and performing horizontal downsampling to the NxN/2
matrix to obtain one N/2xN/2 frame-mode DCT block.
Preferably, the vertical and horizontal
downsampling use respective sparse downsampling
matrices. In particular, a vertical downsampling
matrix of 0.5[I8 I8] may be used, where I8 is an 8x8
identity matrix. This is essentially vertical pixel
averaging. A horizontal downsampling matrix composed
of odd "O" and even "E" matrices may be used.
Corresponding apparatuses are also presented.
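As an illustration of the matrix form just described, the following sketch applies the vertical operator 0.5[I8 I8] and a hypothetical adjacent-column horizontal averaging operator in the pixel domain. The patent applies the corresponding sparse operators to DCT coefficients, so this is only a pixel-domain analogy of the operation, not the claimed method.

```python
import numpy as np

N = 16

# Vertical operator 0.5*[I8 I8] from the summary (8x16): with the top
# field stacked in rows 0-7 and the bottom field in rows 8-15, row i of
# V @ x averages the two field lines that are vertically adjacent in
# display order -- de-interlacing and 2:1 vertical downsampling at once.
I8 = np.eye(N // 2)
V = 0.5 * np.hstack([I8, I8])                 # shape (8, 16)

# Horizontal 2:1 downsampling sketched here as adjacent-column averaging;
# the patent's actual horizontal operator is the sparse odd/even "O"/"E"
# form given in the claims.
H = np.zeros((N // 2, N))
for i in range(N // 2):
    H[i, 2 * i] = H[i, 2 * i + 1] = 0.5       # shape (8, 16)

x = np.ones((N, N))                           # 16x16 block of samples
z = (V @ x) @ H.T                             # one 8x8 downscaled block
```

Both operators are applied as single matrix products, which is what makes the sparse forms attractive: most entries are zero, so few multiplications survive.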

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an MPEG-2 video decoder.
FIG. 2 illustrates an MPEG-4 video decoder without
any scalability feature.
FIG. 3 illustrates a low complexity front-to-back
transcoder (with B frames disabled) in accordance with
the invention.
FIG. 4 illustrates a transcoder architecture that
minimizes drift error (with B frames enabled) in
accordance with the invention.
FIG. 5 illustrates a size transcoder in accordance
with the invention.
FIG. 6 illustrates downsampling of four field mode
DCT blocks to one frame mode DCT block in accordance
with the present invention.

DETAILED DESCRIPTION OF THE INVENTION
The invention relates to format transcoding (MPEG-
2 to MPEG-4) and size (spatial and temporal)
transcoding.
The invention provides bit rate transcoding to
convert a pre-compressed bitstream into another
compressed bitstream at a different bit rate. Bit rate
transcoding is important, e.g., for streaming video
applications because the network bandwidth is not
constant and, sometimes, a video server needs to reduce
the bit rate to cope with the network traffic demand.
A cascaded-based transcoder which re-uses MVs from the
input bitstream and, hence, eliminates motion
estimation (ME), is among the most efficient of the bit
rate transcoders. The cascaded-based transcoder
decodes the input bitstream to obtain the MV and form
the reference frame. It then encodes this information
with a rate control mechanism to generate an output
bitstream at the desired bit rate.
Spatial resolution transcoding becomes a big issue
with the co-existence of HDTV and SDTV in the near
future. It is also very beneficial for the streaming
video application since it is likely that the Internet
bandwidth is not going to be large enough for broadcast
quality video. Hence, downsampling of the broadcast
quality bitstream into a bitstream with a manageable
resolution is appealing. Spatial resolution
transcoding usually performs in the compressed (DCT)

domain since it drastically reduces the complexity of
the system. The process of downsampling in the
compressed domain involves the processing of two
parameters, namely DCT coefficients and MVs. A
downsampling filter and its fast algorithm is suggested
to perform DCT coefficient downsampling. MV resampling
a
is used to find the My of the downsampled video. In
the real product, to avoid drift, the residual of the
motion compensation should be re-transformed instead of
approximating the DCT coefficients from the input
bitstream.
2. High level comparison
Structure-wise, MPEG-2 and MPEG-4 employ a similar
video compression algorithm. Fundamentally, both
standards adopt motion prediction to exploit temporal
correlation and quantization in the DCT domain to use
spatial correlation within a frame. This section
describes the structure of the MPEG-2 and MPEG-4
decoders at a high level, and then notes differences
between the two standards.
2.1 MPEG-2
FIG. 1 shows the simplified video decoding process
of MPEG-2. In the decoder 100, coded video data is
provided to a variable length decoding function 110 to
provide the one-dimensional data QFS[n], where n is a
coefficient index in the range of 0-63. At the inverse
scan function 120, QFS[n] is converted into a two-
dimensional array of coefficients denoted by QF[v][u],
where the array indexes u and v both lie in the range 0

to 7. An inverse quantisation function 130 applies the
appropriate inverse quantisation arithmetic to give the
final reconstructed, frequency-domain DCT coefficients,
F[v][u]. An inverse DCT function 140 produces the
pixel (spatial) domain values f[y][x]. A motion
compensation function 150 is responsive to a frame
store memory 160 and the values f[y][x] for producing
the decoded pixels (pels) d[y][x], where y and x are
Cartesian coordinates in the pixel domain.
MPEG-2 operates on a macroblock level for motion
compensation, a block level for the DCT transformation,
and the coefficient level for run-length and lossless
coding. Moreover, MPEG-2 allows three types of
pictures, namely I-, P- and B- pictures. Allowed
motion prediction modes (forward, backward, bi-
directional) are specified for the P- and B- pictures.
MPEG-2 uses interlaced coding tools to handle
interlaced sources more efficiently.
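The inverse scan step (function 120) can be illustrated with a small sketch of the default zigzag order mapping QFS[n] back to QF[v][u]. This is an illustrative reconstruction rather than the standard's table, and MPEG-2's alternate scan is not shown.

```python
import numpy as np

def zigzag_order(n=8):
    """(v, u) positions of an n x n block in default zigzag scan order:
    diagonals of constant v + u, alternating direction each diagonal."""
    return sorted(((v, u) for v in range(n) for u in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def inverse_scan(qfs, n=8):
    """Convert the 1-D coefficient list QFS[n] into the 2-D array QF[v][u]."""
    qf = np.zeros((n, n), dtype=int)
    for i, (v, u) in enumerate(zigzag_order(n)):
        qf[v, u] = qfs[i]
    return qf
```

For example, the first six coefficients land at (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), so low-frequency coefficients cluster at the start of the 1-D list.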
2.2 MPEG-4
FIG. 2 shows the MPEG-4 video decoding process
without any scalability features.
At the decoder 200, data from a channel is output
from a demux 210. A coded bit stream of shape data is
provided to a switch 215, along with the MPEG-4 term
video_object_layer_shape (which indicates, e.g.,
whether or not the current image is rectangular, binary
only, or grayscale). If video_object_layer_shape is
equal to "00" then no binary shape decoding is

required. Otherwise, binary shape decoding is carried
out.
If binary shape decoding is performed, a shape
decoding function 220 receives the previous
reconstructed VOP 230 (which may be stored in a
memory), and provides a shape-decoded output to a
motion compensation function 240. The motion
compensation function 240 receives an output from a
motion decoding function 235, which, in turn, receives
a motion coded bit stream from the demux 210. The
motion compensation function 240 also receives the
previous reconstructed VOP 230 to provide an output to
a VOP reconstruction function 245.
The VOP reconstruction function 245 also receives
data from a texture decoding function 250 which, in
turn, receives a texture coded bit stream from the
demux 210, in addition to an output from the shape
decoding function 220. The texture decoding function
250 includes a variable length decoding function 255,
an inverse scan function 260, an inverse DC and AC
prediction function 270, an inverse quantization
function 280 and an Inverse DCT (IDCT) function 290.
Compared to MPEG-2, several new tools are adopted
in MPEG-4 to add features and interactivity, e.g.,
sprite coding, shape coding, still texture coding,
scalability, and error resilience. Moreover, motion
compensation and texture coding tools in MPEG-4, which
are similar to MPEG-2 video coding, are modified to
improve the coding efficiency, e.g., coding tools such

as direct mode motion compensation, unrestricted motion
compensation, and advanced prediction.
In particular, direct mode motion compensation is
used for B-VOPs. Specifically, it uses direct bi-
directional motion compensation derived by employing I-
or P-VOP macroblock MVs and scaling them to derive
forward and backward MVs for macroblocks in B-VOP.
Only one delta MV is allowed per macroblock. The
actual MV is calculated from the delta vector and the
scaled MV from its co-located macroblock.
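The scaling described above can be sketched as follows. This is a simplified illustration: TRB and TRD are assumed names for the temporal distance from the past reference to the B-VOP and between the two references, and the standard's integer-rounding details are glossed over.

```python
def direct_mode_mvs(mv_colocated, trb, trd, delta=(0, 0)):
    """Derive forward/backward MVs for a B-VOP macroblock from the
    co-located macroblock's MV, scaled by temporal distances, plus the
    single delta MV allowed per macroblock (rounding simplified)."""
    mvx, mvy = mv_colocated
    dx, dy = delta
    fwd = (trb * mvx // trd + dx, trb * mvy // trd + dy)
    if (dx, dy) == (0, 0):
        # With a zero delta, the backward MV is the complementary scaling.
        bwd = ((trb - trd) * mvx // trd, (trb - trd) * mvy // trd)
    else:
        bwd = (fwd[0] - mvx, fwd[1] - mvy)
    return fwd, bwd
```

The appeal for transcoding is that only the delta MV must be coded; both prediction vectors fall out of the co-located MV and the frame distances.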
Unrestricted motion compensation allows one or
four MVs per macroblock. The four MV mode is only
possible in B-VOPs with the use of direct mode. Note
that the MV for a chrominance macroblock is the average
of four MVs from its associated luminance macroblock.
Furthermore, unrestricted motion compensation allows an
MV to point out of the reference frame (the out-of-
bound texture is padded from the edge pixel).
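The chrominance-vector rule mentioned above can be sketched as a simple average. The standard applies a specific half-pel rounding table, which is glossed over here in favor of plain rounding.

```python
def chroma_mv(luma_mvs):
    """Chrominance MV as the average of the four luminance-block MVs
    (four-MV mode); plain rounding stands in for the standard's
    rounding table."""
    sx = sum(mv[0] for mv in luma_mvs)
    sy = sum(mv[1] for mv in luma_mvs)
    return (int(round(sx / 4)), int(round(sy / 4)))
```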
Advanced prediction defines the prediction method
for MV and DCT coefficients. A MV predictor is set
according to the median value of its three neighbors'
MVs. Prediction of the intra DCT coefficient follows
the intra AC/DC prediction procedure (Graham's rule).
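The median rule for the MV predictor can be sketched directly as a component-wise median of the three candidate neighbors (a minimal sketch; boundary cases where neighbors are unavailable are omitted).

```python
def mv_predictor(left, above, above_right):
    """MV predictor: component-wise median of the three neighboring MVs,
    per the advanced-prediction description."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))
```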
3. Transcoder architecture
FIG. 3 illustrates a low complexity front-to-back
transcoder in accordance with the invention, with B
frames disabled.
Similarities between the structures of MPEG-2 and
MPEG-4 allow a low complexity (front-to-back)

transcoder. Instead of completely decoding an MPEG-2
bitstream to the spatial (pixel) domain level, the
front-to-back transcoder 300 uses DCT coefficients and
MVs to generate an MPEG-4 bitstream without actually
performing a motion estimation process. A trade-off is
that this architecture may cause a drift in the
reconstructed frame, and does not allow bit rate
control. However, the drift problem is small since
most of the difference between the MPEG-2 and MPEG-4
decoders lies in the lossless coding part.
The transcoder 300 comprises a cascade of a MPEG-2
bitstream reader (decoder) (310-330) and a MPEG-4
header and texture coder (encoder) (340-370), along
with a header decoding function 304, a look-up table
308, and a communication path 312. The transcoder 300
reads an input MPEG-2 bitstream, performs a variable
length decoding (VLD) at a function 310 on DCT
coefficients and MV residual, and then follows MPEG-2
logic to find DCT coefficients and/or MVs of every
block in the frame.
The header decoding function 304 decodes the MPEG-
2 headers and provides them to a look-up table (or
analogous function) 308, which uses the tables detailed
below to obtain corresponding MPEG-4 headers.
With the information of the headers, DCT
coefficients and/or MV, the transcoder 300 encodes this
information into the MPEG-4 format. Note that the
reference frame is not needed in this architecture.
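The header look-up step (functions 304 and 308) amounts to a table-driven mapping. A hypothetical sketch follows, using the layer correspondence of Table 1; the dictionary keys and function name are invented for illustration, and the real look-up table covers the full header fields of Tables 2 through 6.

```python
# Hypothetical layer-name mapping based on Table 1; names are invented
# for illustration, not taken from either standard's syntax.
MPEG2_TO_MPEG4_LAYER = {
    "video_sequence": "video_object_sequence",
    "group_of_pictures": "group_of_video_object_plane",
    "picture": "video_object_plane",
    "macroblock": "macroblock",
}

def map_layer(mpeg2_layer):
    """Return the MPEG-4 layer that replaces an MPEG-2 layer header;
    the MPEG-2 slice layer has no MPEG-4 counterpart and maps to None
    (DC coefficients and predicted MVs are instead reset at the blocks
    that started the slice)."""
    if mpeg2_layer == "slice":
        return None
    return MPEG2_TO_MPEG4_LAYER[mpeg2_layer]
```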

The transcoder 300 reads the MPEG-2 header from
the input bitstream and writes the corresponding MPEG-4
header in its place in an output bitstream.
After processing at the VLD 310, the data is
provided to an inverse scan function 320, and an
inverse quantisation function 330. Next, using the
MPEG-4 header information provided via the path 312,
the decoded, DCT coefficient data is processed at a
MPEG-4 header and texture coder that includes a
quantisation function 340, and an AC/DC prediction
function 350 for differentially encoding the quantised
DCT coefficients. In particular, the AC/DC prediction
process generates a residual of DC and AC DCT
coefficients in an intra MB by subtracting the DC
coefficient and either the first row or first column of
the AC coefficients. The predictor is adaptively
selected. Note that the AC/DC prediction function 350
may not need the MPEG-4 header information.
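The residual formation at function 350 can be sketched for the DC case. This is a simplified illustration: the predictor is chosen between the left and above blocks by a gradient comparison, in the spirit of MPEG-4's adaptive rule, while quantiser scaling and block-availability handling are omitted.

```python
def dc_residual(dc_cur, dc_left, dc_above_left, dc_above):
    """DC residual for an intra block: pick the left or above DC as the
    predictor by comparing gradients among the three decoded neighbors
    (quantiser scaling omitted in this sketch)."""
    if abs(dc_left - dc_above_left) < abs(dc_above_left - dc_above):
        pred = dc_above   # horizontal gradient smaller: predict from above
    else:
        pred = dc_left    # vertical gradient smaller: predict from left
    return dc_cur - pred
```

The AC case is analogous, subtracting the first row or first column of AC coefficients from the block selected as predictor.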
Subsequently, a scan/run-length coding function
360 and a variable length encoding function 370 provide
the MPEG-4 bitstream.
FIG. 4 illustrates a transcoder architecture that
minimizes drift error in accordance with the invention,
with B frames enabled.
Like-numbered elements correspond to one another
in the figures.
To counter the problems of drift in the
reconstructed frame, and the lack of bit rate control,
a more complex architecture such as the transcoder 400,

which is an extension of the transcoder 300 of FIG. 3,
can be used. This architecture actually computes the
DCT coefficient of the texture/residual data, hence
motion compensation is required. Since the encoder of
this transcoder includes a decoding process, the drift
error can be minimized.
Moreover, the transcoder 400 can be used to
transcode bitstreams with B-frames since MPEG-4 does
not allow intra mode for B-frames. The transcoder 400
treats a block in intra mode in a B-frame (in MPEG-2)
as a block with a zero MV in inter mode (in MPEG-4). It
can be either a zero residual MV (PMV) or zero MV
(which may yield a non-zero MV code) since the MV is
predictive coded against the PMV.
In particular, the transcoder 400 includes a
variable length decoding function 405 that provides MV
residue data to a MV decoder 425, and that provides DCT
coefficient data to the inverse scan function 320. The
DCT data is processed by the inverse quantisation
function 330 and an inverse DCT function 420 to obtain
pixel domain data. Intra-coded pixel data is provided
via a path 422 to a buffer, while inter-coded pixel
data is provided to an adder 435 via a path 424.
The pixel (difference) data on path 424 is added
to reference pixel data from a motion compensation
function 430 (responsive to the MV decoder 425) to
provide inter-coded data to the buffer 450 via a path
448.

For re-encoding, e.g., in the MPEG-4 format, the
buffer 450 either outputs the intra pixel data directly
to a DCT function 455, or outputs the inter pixel data
to a subtractor 445, where a difference relative to an
output from a motion compensation function 440
(responsive to the MV decoder 425) is provided to the
DCT function 455.
The DCT coefficients are provided from the DCT
function 455 to the quantisation function 340, and the
quantised DCT data is then provided to the AC/DC (DCT
coefficient) prediction function 350, where AC and DC
residuals of the current MB are generated. These
residuals of DCT coefficients are entropy coded. The
output data is provided to the scan/run-length coding
function 360, and the output thereof is provided to the
variable length encoding function 370 to obtain the
MPEG-4 compliant bitstream.
The quantised DCT coefficients are also output
from the quantisation function 340 to an inverse
quantisation function 495, the output of which is
provided to an inverse DCT function 490, the output of
which is summed at an adder 485 with the output of the
motion compensation function 440. The output of the
adder 485 is provided to a buffer 480, and subsequently
to the motion compensation function 440.
The header decoding function 304 and look-up table
' 308 and path 312 operate as discussed in connection
with FIG. 3 to control the re-encoding to the MPEG-4
format at functions 340-370.

4. Implementation of the Format Transcoder
This section explains the implementation of the
format transcoding, e.g., as implemented in FIGS. 3 and
4, discussed above, and FIG. 5, to be discussed later.
Minor implementation details (e.g., systems-related
details such as the use of time stamps and the like)
that are not specifically discussed should be apparent
to those skilled in the art.
In a particular implementation, the transcoders of
the present invention can be used to convert a main-
profile, main-level (MP@ML) MPEG-2 bitstream into a
main-profile MPEG-4 bitstream. It is assumed that the
MPEG-2 bitstream is coded in frame picture structure
with B-picture coding (no dual prime prediction).
Generally, the same coding mode which is used in MPEG-2
coding should be maintained. This mode is likely to be
optimum in MPEG-4 and hence avoids the complexity of
the mode decision process. The transparency pattern in
MPEG-4 is always 1 (one rectangular object with the
same size of VOP in one VOP). That is, MPEG-4 allows
an arbitrarily shaped object which is defined by a
nonzero transparency pattern. This feature does not
exist in MPEG-2, so we can safely assume that all
transparency patterns of the transcoding object are one.
4.1 MPEG-2 bitstream reader
A transcoder in accordance with the invention
obtains the bitstream header, DCT coefficients and MVs
from the MPEG-2 bitstream. This information is mixed
together in the bitstream. Both MPEG-2 and MPEG-4
bitstreams adopt a hierarchical structure consisting of
several layers. Each layer starts with a header
followed by multiples of its sublayer. In this
implementation, as shown in Table 1, each MPEG-2 layer
has a direct translation into an MPEG-4 layer, except
the slice layer in MPEG-2, which is not used in MPEG-4.
DC coefficients and predicted MVs in MPEG-4 are reset
at the blocks that start the slice.
However, some MPEG-4 headers are different from
MPEG-2 headers, and vice versa. Fortunately, the
restrictions in MPEG-4 and the MPEG-2 header information
are sufficient to specify an MPEG-4 header. Tables 2
through 6 list the MPEG-4 headers and their relation to
an MPEG-2 header or restriction at each layer.
Table 1. Relationship between MPEG-2 and MPEG-4 layers

MPEG-2                       MPEG-4
Video Sequence               Video Object Sequence (VOS) /
                             Video Object (VO)
Sequence Scalable Extension  Video Object Layer (VOL)
Group of Picture (GOP)       Group of Video Object Plane (GOV)
Picture                      Video Object Plane (VOP)
Macroblock                   Macroblock
Table 2. MPEG-4 header and its derivation (VOS and VO)

Header                             Code               Comment
Visual_object_sequence_start_code  000001B0           Initiate a visual session
Profile_and_level_indication       00110100           Main Profile/Level 4
Visual_object_sequence_end_code    000001B1           Terminate a visual session
Visual_object_start_code           000001B5           Initiate a visual object
Is_visual_object_identifier        0                  No version identification or
                                                      priority needs to be specified
Visual_object_type                 0001               Video ID
Video_object_start_code            0000010X-0000011X  Mark a new video object
Video_signal_type                  Derived from       Corresponds to MPEG-2
                                   MPEG-2             sequence_display_extension_id
Video_format                       Same as MPEG-2     Corresponds to MPEG-2
                                                      sequence_display_extension_id
Video_range                        Derived from       Corresponds to MPEG-2
                                   MPEG-2             sequence_display_extension_id
Colour_description                 Same as MPEG-2     Corresponds to MPEG-2
                                                      sequence_display_extension_id
Colour_primaries                   Same as MPEG-2     Corresponds to MPEG-2
                                                      colour_description
Transfer_characteristics           Same as MPEG-2     Corresponds to MPEG-2
                                                      colour_description
Matrix_coefficients                Same as MPEG-2     Corresponds to MPEG-2
                                                      colour_description
Table 3. MPEG-4 header and its derivation (VOL)

Header                            Code            Comment
Video_object_layer_start_code     0000012X        Mark a new video object layer
Random_accessible_vol             0               Allow non-intra coded VOP
Video_object_type_identification  00000100        Main object type
Is_object_layer_identifier        0               No version identification or
                                                  priority needs to be specified
Aspect_ratio_info                 Same as MPEG-2  Corresponds to MPEG-2
                                                  aspect_ratio_information
Par_width                         Same as MPEG-2  Corresponds to MPEG-2
                                                  vertical_size
Par_height                        Same as MPEG-2  Corresponds to MPEG-2
                                                  horizontal_size
Vol_control_parameters            Same as MPEG-2  Corresponds to MPEG-2
                                                  extension_start_code_identifier
                                                  (sequence extension)
Chroma_format                     Same as MPEG-2  Corresponds to MPEG-2
                                                  chroma_format
Low_delay                         Same as MPEG-2  Corresponds to MPEG-2 low_delay
Vbv_parameters                    Recomputed      Follow MPEG-4 VBV spec.
Video_object_layer_shape          00              Rectangular
Vop_time_increment_resolution     Recomputed      See Table 7
Fixed_vop_rate                    1               Indicate that all VOPs are
                                                  coded at a fixed rate
Fixed_vop_time_increment          Recomputed      See Table 7
Video_object_layer_width          Same as MPEG-2  Corresponds to
                                                  display_vertical_size
Video_object_layer_height         Same as MPEG-2  Corresponds to
                                                  display_horizontal_size
Interlaced                        Same as MPEG-2  Corresponds to MPEG-2
                                                  progressive_sequence
Obmc_disable                      1               Disable OBMC
Sprite_enable                     0               Indicate absence of sprite
Not_8_bit                         Derived from    Corresponds to MPEG-2
                                  MPEG-2          intra_dc_precision
Quant_type                        1               MPEG quantization
Complexity_estimation_disable     1               Disable complexity estimation
                                                  header
Resync_marker_disable             1               Indicate absence of resync
                                                  marker
Data_partitioned                  0               Disable data partitioning
Reversible_vlc                    0               Disable reversible vlc
Scalability                       0               Indicate that the current
                                                  layer is used as base-layer

Table 4. MPEG-4 header and its derivation (VOP)

Header                        Code            Comment
Vop_start_code                000001B6        Mark a start of a video object plane
Vop_coding_type               Same as MPEG-2  Corresponds to MPEG-2
                                              picture_coding_type
Modulo_time_base              Regenerated     Follow MPEG-4 spec.
Vop_time_increment            Regenerated     Follow MPEG-4 spec.
Vop_coded                     1               Indicate that subsequent data
                                              exists for the VOP
Vop_rounding_type             0               Set value of rounding control to '0'
Change_conv_ratio_disable     1               Assume that conv_ratio is '1'
                                              for all macroblocks
Vop_constant_alpha            0               Do not include
                                              vop_constant_alpha_value in the
                                              bitstream
Intra_dc_vlc_thr              0               Use intra DC vlc for entire VOP
Top_field_first               Same as MPEG-2  Corresponds to MPEG-2
                                              top_field_first
Alternate_vertical_scan_flag  Same as MPEG-2  Corresponds to MPEG-2
                                              alternate_scan
Vop_quant                     Derived from    Corresponds to MPEG-2
                              MPEG-2          quantiser_scale_code
Vop_fcode_forward             Same as MPEG-2  See section 4.3
Vop_fcode_backward            Same as MPEG-2  See section 4.3
Table 5. MPEG-4 header and its derivation (macroblock and MV)

Header                  Code          Comment
Not_coded               Derived from  Corresponds to MPEG-2
                        MPEG-2        macroblock_address_increment
Mcbpc                   Derived from  Corresponds to MPEG-2 macroblock_type
                        MPEG-2
Ac_pred_flag            0             Disable intra AC prediction
Cbpy                    Derived from  See section 4.2
                        MPEG-2
Dquant                  Derived from  See section 4.2
                        MPEG-2
Modb                    Derived from  Corresponds to macroblock_type
                        MPEG-2
Mb_type                 Derived from  Corresponds to macroblock_type
                        MPEG-2
Cbpb                    Derived from  See section 4.2
                        MPEG-2
Dbquant                 Derived from  See section 4.2
                        MPEG-2
Horizontal_mv_data      Derived from  Corresponds to MPEG-2
                        MPEG-2        motion_code[r][s][0]
Vertical_mv_data        Derived from  Corresponds to MPEG-2
                        MPEG-2        motion_code[r][s][1]
Horizontal_mv_residual  Derived from  Corresponds to MPEG-2
                        MPEG-2        motion_residual[r][s][0]
Vertical_mv_residual    Derived from  Corresponds to MPEG-2
                        MPEG-2        motion_residual[r][s][1]
Table 6. MPEG-4 header and its derivation (block and interlaced information)

Header                           Code            Comment
Dct_dc_size_luminance            Same as MPEG-2  Corresponds to MPEG-2
                                                 dct_dc_size_luminance
Dct_dc_differential              Same as MPEG-2  Corresponds to
                                                 dct_dc_differential
Dct_dc_size_chrominance          Same as MPEG-2  Corresponds to MPEG-2
                                                 dct_dc_size_chrominance
DCT coefficient                  Derived from    See section 4.2
                                 MPEG-2
DCT_type                         Same as MPEG-2  Corresponds to MPEG-2 DCT_type
Field_prediction                 Same as MPEG-2  Corresponds to MPEG-2
                                                 frame_motion_type
Forward_top_field_reference      Same as MPEG-2  Corresponds to MPEG-2
                                                 motion_vertical_field_select[0][0]
Forward_bottom_field_reference   Same as MPEG-2  Corresponds to MPEG-2
                                                 motion_vertical_field_select[1][0]
Backward_top_field_reference     Same as MPEG-2  Corresponds to MPEG-2
                                                 motion_vertical_field_select[0][1]
Backward_bottom_field_reference  Same as MPEG-2  Corresponds to MPEG-2
                                                 motion_vertical_field_select[1][1]
Table 7. Mapping of frame_rate_code in MPEG-2 to
vop_time_increment_resolution and fixed_vop_time_increment in MPEG-4

Frame_rate_code  Vop_time_increment_resolution  Fixed_vop_time_increment
0001             24,000                         1001
0010             24                             1
0011             25                             1
0100             30,000                         1001
0101             30                             1
0110             50                             1
0111             60,000                         1001
1000             60                             1
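The mapping of Table 7 can be captured directly as a lookup table. A minimal sketch in Python (function and table names are illustrative, not part of either standard):

```python
# Table 7 as a lookup: MPEG-2 frame_rate_code (4-bit string) ->
# (vop_time_increment_resolution, fixed_vop_time_increment).
FRAME_RATE_MAP = {
    "0001": (24000, 1001),  # 23.976 fps
    "0010": (24, 1),
    "0011": (25, 1),
    "0100": (30000, 1001),  # 29.97 fps
    "0101": (30, 1),
    "0110": (50, 1),
    "0111": (60000, 1001),  # 59.94 fps
    "1000": (60, 1),
}

def mpeg4_timing(frame_rate_code: str) -> tuple:
    """Return (vop_time_increment_resolution, fixed_vop_time_increment)."""
    return FRAME_RATE_MAP[frame_rate_code]
```

Note that each pair reproduces the source frame rate exactly: resolution divided by increment equals the MPEG-2 frame rate (e.g., 30000/1001 = 29.97).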
MV data is stored in the macroblock layer. Up to
four MVs are possible for each macroblock. Moreover, a
MV can be of either field or frame type and have either
full pixel or half pixel resolution. The MPEG-2 MV
decoding process is employed to determine motion_code
(VLC) and motion_residual (FLC) and, hence, delta.
Combined with the predictive MV, delta gives the
field/frame MV. The MV for skipped macroblocks is set
to zero.
DCT data is stored in the block layer. It is
first decoded from the bitstream (VLC), inverse scanned
using either the zigzag or alternate scanning pattern,
and then inverse quantized. The intra DC coefficient is
determined from dct_dc_differential and the predictor
(the predictor is reset according to the MPEG-2 spec).
DCT coefficients in a skipped macroblock are set to
zero.
4.2 Texture coding
A transcoder in accordance with the invention
reuses DCT coefficients (for inter frames). The
following guidelines should be used:
1. q_scale_type = 1 (linear scale) is used in
MPEG-2 quantization.
2. Only the MPEG quantization method should be
used (not H.263) in MPEG-4 quantization, to reduce the
mismatch between the MPEG-2 and MPEG-4 reconstructed
frames (drift).
3. A differential value of the MPEG-2 QP determines
dquant in MPEG-4. Dquant is set to ±2 whenever the
differential value is greater than ±2. dquant is a 2-
bit code which specifies a change in the quantizer,
quant, for I- and P-VOPs.
4. The quantization matrix should be changed
following the change of matrix in the MPEG-2 bitstream.
5. The transcoder has the flexibility of
enabling an alternate vertical scanning method (for an
interlaced sequence) at the VOL level.
6. Intra AC/DC prediction (which involves
scaling when the QP of the current block is not the
same as that of the predicted block) should be turned
off at the macroblock level to reduce complexity and
mismatch in AC quantization.
7. Higher efficiency can be obtained with the
use of intra_dc_vlc_thr to select the proper VLC table
(AC/DC) for coding of intra DC coefficients, e.g., as a
function of the quantization parameter (except when
intra_dc_vlc_thr is either 0 or 7 - these thresholds
will force the use of the intra DC or AC table
regardless of the QP).
8. A skipped macroblock is coded as a not-coded
macroblock (all DCT coefficients are zero).
9. Cbpy and cbpc (CBP) are set according to
coded_block_pattern_420 (CBP_420). Note that there is a
slight discrepancy between CBP in MPEG-4 and CBP_420 in
MPEG-2 for an intra macroblock. Specifically, when
CBP_420 is set, it indicates that at least one of the
DCT coefficients in that block is not zero. CBP
contains similar information except it does not
correspond to a DC coefficient in an intra macroblock
(also depending on intra_dc_vlc_thr). Hence, it is
possible that CBP is not zero when CBP_420 is zero in
an intra macroblock (this case can happen in an I-VOP
and P-VOP, but not a B-VOP).
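Guideline 3 amounts to clamping the MPEG-2 QP differential into the 2-bit dquant range. A minimal sketch (illustrative Python; the function name is hypothetical):

```python
def to_dquant(prev_qp: int, cur_qp: int) -> int:
    """Map an MPEG-2 QP change to MPEG-4 dquant, clamped to [-2, +2].

    Differences larger than +/-2 cannot be represented by the 2-bit
    dquant code and are saturated, which is one source of loss.
    """
    diff = cur_qp - prev_qp
    return max(-2, min(2, diff))
```

Limiting QP fluctuation in the MPEG-2 rate control, as noted below, keeps this saturation from occurring.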
There are three sources of loss in texture coding,
namely QP coding, DC prediction and the nonlinear
scaler for DC quantization. MPEG-4 uses differential
coding to code a QP. MPEG-2 allows all possible 32 QP
values at the expense of 5 bits. However, the
differential value can take up to ±2 (in QP value
units) and, hence, a differential value greater than ±2
is lost. This loss can be minimized by limiting the QP
fluctuation among the macroblocks in the MPEG-2 rate
control algorithm. All intra macroblocks perform
adaptive DC prediction, which may take a different
prediction from the previous macroblock (MPEG-2 DC
prediction), thereby causing a different DC residual
for the quantization. DC coefficients of all intra
macroblocks in MPEG-4 are also quantised in a different
manner from MPEG-2 because of the nonlinear scaler.
Therefore, quantised DC coefficients for MPEG-2 and
MPEG-4 coding are likely to be different for an intra
macroblock.
4.3 MV coding
The transcoder encodes MVs into an MPEG-4 format.
However, there is no error involved in transcoding a MV
from MPEG-2 to MPEG-4 since MV coding is a lossless
process. The following constraints are imposed on a
MPEG-4 encoder:
1. Unrestricted motion compensation mode is
disabled, which means no MV pointing outside the
boundary of the frame.
2. Advanced prediction mode is employed. A
different predictor (a median value) is used in an
MPEG-4 bitstream, but a MV for an 8x8 pel block is not.
That is, advanced prediction mode allows an 8x8 MV and
a nonlinear (median filter) predictor. Only the
nonlinear predictor is used in our format transcoder
(we still keep a 16x16 MV).

3. Direct mode is not allowed in an MPEG-4
bitstream, which means there are only four MV types for
a B-VOP, i.e., 16x16 forward and backward vectors and
16x8 forward and backward field vectors.
4. Field motion compensation is applied whenever
a 16x8 field vector is used (maintain mode).
5. A skipped macroblock is coded as a not-coded
macroblock (motion compensation with a zero MV).
6. A single f_code is allowed in MPEG-4.
Therefore, the larger f_code in MPEG-2 between the two
directions (vertical, horizontal) is converted to the
f_code in MPEG-4 based on the following relationship:
f_code(MPEG-4) = f_code(MPEG-2) - 1.
7. A padding process is not used since the
texture for the entire reference frame is known.
8. Field motion compensation is used whenever
dual prime arithmetic is activated. Vector parities
(field of the reference and field of the predicting
frame) are preserved. Field MVs are generated
according to vector[0][0][1:0], which is coded in the
MPEG-2 bitstream. When prediction of the same parity
is used (e.g., top field to top field, or bottom field
to bottom field), both field MVs are vector[0][0][1:0].
When prediction of the odd parity is used (e.g., top
field to bottom field, or bottom field to top field),
the top field MV uses vector[2][0][1:0] and the bottom
field MV uses vector[3][0][1:0]. Vector[r][0][1:0] for
r=2,3 can be computed as follows:
(a) vector[r][0][0] = (vector[0][0][0] x
m[parity_ref][parity_pred] // 2) + dmvector[0].
(b) vector[r][0][1] = (vector[0][0][1] x
m[parity_ref][parity_pred] // 2) +
e[parity_ref][parity_pred] + dmvector[1].
Note that m[parity_ref][parity_pred] and
e[parity_ref][parity_pred] are defined in Tables 7-11
and 7-12, respectively, in the MPEG-2 specification
(ISO/IEC 13818-2).
Moreover, "r" denotes the order of the MV, e.g.,
first, second, etc. r=0 denotes the first set of MVs,
and r=1 denotes the second set. Dual prime prediction
uses r=2 and r=3 to identify two extra sets of MVs.
"//" denotes integer division with rounding to the
nearest integer.
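The two dual-prime formulas can be sketched as follows (illustrative Python; the m and e values from Tables 7-11 and 7-12 of ISO/IEC 13818-2 are passed in as arguments rather than reproduced here, and the rounding helper assumes half values round away from zero):

```python
def div2_nearest(x: int) -> int:
    # "//" division by 2 with rounding to the nearest integer
    # (half values assumed to round away from zero).
    sign = -1 if x < 0 else 1
    return sign * ((abs(x) + 1) >> 1)

def dual_prime_mv(vector00, dmvector, m, e):
    """Derive one extra dual-prime field MV (r = 2 or 3).

    vector00 : (h, v) MV coded in the MPEG-2 bitstream
    dmvector : (dh, dv) differential MV
    m, e     : scale and vertical offset for the given parity pair
               (from MPEG-2 Tables 7-11 and 7-12, not reproduced here)
    """
    h = div2_nearest(vector00[0] * m) + dmvector[0]      # formula (a)
    v = div2_nearest(vector00[1] * m) + e + dmvector[1]  # formula (b)
    return (h, v)
```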
4.4 Coding of intra MBs in a B-VOP
An additional conversion is necessary when coding
an intra MB in a B-frame of an MPEG-2 bitstream (e.g.,
as shown in FIG. 4). MPEG-4 replaces intra mode with
direct mode for a B-VOP, and hence an intra MB in a
B-frame has to be coded differently in the MPEG-4
syntax. There are two practical solutions to this
problem.
The first solution employs an architecture similar
to the front-to-back transcoder of FIG. 3 (no buffer
for the entire reference frame). MC is performed
against the previous MB (or the previous MB without
compensating the texture residual, at the expense of
extra memory with the size of one MB) in the same VOP,
under the assumption that this MB is close enough to
its reference MB (its uncompensated version). The MV
for the intra MB equals the MV of the previous MB
offset by its MB distance.
The second solution uses an architecture similar
to the one shown in FIG. 4. It keeps the reference
frame for all I- and P-VOPs. Note that MC has to be
performed on all P-VOPs in this solution. The MV for
the intra MB is the same as the predicted MV (the
median of its three neighbors), and MC is performed
against the reference MB pointed to by the derived MV.
5. Video downscaling in the compressed domain
Generally, video downscaling and size transcoding
have the same meaning. Downsampling means sub-sampling
with an anti-aliasing (low pass) filter, but
subsampling and downsampling are used interchangeably
herein.
Size transcoding becomes computationally intensive
when its input and output are in the compressed domain.
A video downscaling process which limits its operations
to the compressed domain (and, in effect, avoids full
decoding and encoding processes) provides a much
reduced complexity. However, two new problems arise
with downscaling in the compressed domain, i.e.,
downsampling of DCT coefficients and of MV data.
Recently, video downscaling algorithms in the
compressed domain have been discussed, but they do not
address the complete transcoding between MPEG-2 and
MPEG-4, which includes field-to-frame deinterlacing.
The present invention addresses this problem.
Subsections 5.1 and 5.2 provide solutions to the
two new problems in the downsampling process. The
implementation of a proposed size transcoder in
accordance with the invention is described in section 6
and FIGS. 5 and 6.
5.1 Subsampling of DCT blocks
In frame-based video downscaling, it is necessary
to merge four 8x8 DCT blocks into a new 8x8 DCT block
(specific details involving a field block will be
described later). Moreover, the output block should be
a low pass version of the input blocks. This process
is carried out in the spatial domain by multiplying the
input matrix with a subsampling matrix (preferably with
a low pass filter). Multiplication by a subsampling
matrix in the spatial domain is equivalent to
multiplication by the DCT coefficients of the
subsampling matrix in the DCT domain because of the
distributive property of the orthogonal transform.
However, the number of operations (computations) in the
downsampling process in the DCT domain for some
downsampling filters can be as high as the total number
of operations of its counterpart in the spatial domain.
The solution to this problem is to employ a
downsampling matrix which is sparse (e.g., a matrix
that has relatively few non-zero values, e.g.,
approximately 30% or less).
A sparse downsampling matrix may be based on the
orthogonal property between the DCT basis vectors and
the symmetry structure of the DCT basis vectors. One
approach, discussed in R. Dugad and N. Ahuja, "A Fast
Scheme For Downsampling And Upsampling In The DCT
Domain," International Conference on Image Processing
(ICIP) 99, incorporated herein by reference, takes the
lower 4x4 DCT coefficients from the four processing
blocks, applies a 4x4 IDCT to each DCT subblock, forms
a new 8x8 pixel block and applies an 8x8 DCT to obtain
an output block. The downsampling matrix can be pre-
calculated since the downsampling process is fixed. By
splitting the 8x8 DCT matrix into left and right
halves, about half of the downsampling matrix values
are zero because of the orthogonality between the
columns of the 4x4 IDCT matrix and the rows of both the
left and right 8x4 DCT matrices. This operation (one
dimension) can be written mathematically as:

B = Tb = [TL TR] [T4^t B1 ; T4^t B2] = TL T4^t B1 + TR T4^t B2

where b is an 8x1 spatial input vector, B is its
corresponding 8x1 DCT vector, b1 = T4^t B1 and
b2 = T4^t B2 are the subsampled 4x1 vectors, B1 and B2
are the lower 4x1 DCT vectors, T is the 8x8 DCT
transform matrix, T4 is the 4x4 DCT transform matrix,
and TL and TR are the left and right halves of T. The
superscript "t" denotes a matrix transpose.
Dugad's algorithm also employs the symmetry
property of the DCT basis vectors to reduce the
complexity of the downsampling process. TL T4^t and
TR T4^t are identical in terms of magnitude
( TL T4^t(i,j) = (-1)^(i+j) TR T4^t(i,j), 0<=i<=7,
0<=j<=3 ), since odd rows of T are anti-symmetric and
even rows of T are symmetric. "i" is a matrix row
index, and "j" is a matrix column index. Hence, both
TL T4^t and TR T4^t can be calculated based on the same
components, i.e., a symmetrical part, E (indices for
which i+j is even), and an anti-symmetrical part, O
(indices for which i+j is odd), with TL T4^t = E + O
and TR T4^t = E - O. This arrangement effectively
reduces the number of multiplications by a factor of
two when the downsampling process is done as:

B = TL T4^t B1 + TR T4^t B2 = (E+O)B1 + (E-O)B2 = E(B1 + B2) + O(B1 - B2)
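The factor-of-two saving can be checked numerically. The sketch below (NumPy; orthonormal DCT matrices are an assumption about scaling, so absolute values differ from an unscaled convention) builds TL T4^t and TR T4^t, splits them into E and O, and verifies that the E/O form reproduces the direct product while E and O occupy disjoint positions:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix (rows are basis vectors)."""
    T = np.array([[np.cos((2 * k + 1) * i * np.pi / (2 * n))
                   for k in range(n)] for i in range(n)]) * np.sqrt(2 / n)
    T[0] /= np.sqrt(2)
    return T

T8, T4 = dct_matrix(8), dct_matrix(4)
TL, TR = T8[:, :4], T8[:, 4:]        # left/right column halves of T
ML, MR = TL @ T4.T, TR @ T4.T        # TL T4^t and TR T4^t (8x4 each)
E, O = (ML + MR) / 2, (ML - MR) / 2  # symmetric / anti-symmetric parts

rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal(4), rng.standard_normal(4)  # lower DCT vectors
B_direct = ML @ B1 + MR @ B2
B_fast = E @ (B1 + B2) + O @ (B1 - B2)  # half the multiplications
```

Because E is zero wherever i+j is odd and O is zero wherever i+j is even, each product against E or O touches only the sparse entries.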
Implementation of Dugad's method to convert four
field blocks into one frame block is not as simple. An
extension of the downsampling process in this scenario
(one dimension) can be written as:

B = T(sT T4^t BT + sB T4^t BB)

where BT and BB are the lower 4x1 field DCT vectors,
and sT and sB are the portions of an 8x4 deinterlacing
matrix corresponding to the top and bottom field
blocks, respectively. Elements of sT: sT(i,j)=1 if
(i=2j, 0<=j<=3), and sT(i,j)=0 otherwise. Elements of
sB: sB(i,j)=1 if (i=2j+1, 0<=j<=3), and sB(i,j)=0
otherwise. This is a modification of Dugad's algorithm
for downsampling and deinterlacing in accordance with
the present invention.

The operations of the downscaling and
deinterlacing process are more complex since S and T
are not orthogonal to each other and, hence, the
downsampling matrix is not sparse. C. Yim and M.A.
Isnardi, "An Efficient Method For DCT-Domain Image
Resizing With Mixed Field/Frame-Mode Macroblocks," IEEE
Trans. Circ. and Syst. For Video Technol., vol. 9,
pp. 696-700, Aug. 1999, incorporated herein by
reference, propose an efficient method for downsampling
a field block. A low pass filter is integrated into
the deinterlacing matrix in such a way that the
downsampling matrix ( S = 0.5 [I8 I8] ) is sparse.
I8 denotes an 8x8 identity matrix, and [I8 I8]
denotes an 8x16 matrix that comprises a concatenation
of the two identity matrixes. The identity matrix, of
course, has all ones on the diagonal and all zeroes
elsewhere.
The method starts with four 8x8 IDCT field blocks,
then applies the downsampling matrix, S, and performs
an 8x8 DCT to obtain the output block. Note that an
8x8 IDCT is used in this method instead of a 4x4 IDCT.
This operation can be shown mathematically (in one
dimension) as:

B = T S [T^t 0 ; 0 T^t] [BT ; BB] = (1/2) T (T^t BT + T^t BB) = (BT + BB)/2
5.2 Subsampling of MV data
ME is the bottleneck of the entire video encoding
process. It is hence desirable to estimate a MV for
the resized MB by using the MVs of the four original
MBs without actually performing ME (assuming that all
MBs are coded in inter mode). Note that, if an MPEG-2
bitstream is assumed, subsampling of MV data takes the
MVs of four MBs since each MB has one MV (only an
MPEG-4 bitstream can have a MV for every block). The
simplest solution is to average the four MVs together
to obtain the new MV, but it gives a poor estimate when
those four MVs are different. B. Shen, I.K. Sethi and
B. Vasudev, "Adaptive Motion-Vector Resampling For
Compressed Video Downscaling," IEEE Trans. Circ. and
Syst. For Video Technol., vol. 9, pp. 929-936, Sep.
1999, show that a better result can be obtained by
giving more weight to the worst predicted MV. The
matching accuracy, Ai, of each MV is indicated by the
number of nonzero AC coefficients in that MB. By using
the Shen et al. technique, the new MV for the
downscaled MB can be computed as:

MV' = ( sum_{i=1..4} MVi Ai ) / ( sum_{i=1..4} Ai )
M.R. Hashemi, L. Winger and S. Panchanathan,
"Compressed Domain Motion Vector Resampling For
Downscaling Of MPEG Video," ICIP 99, propose a
nonlinear method to estimate the MV of the resized MB.
Similar to the algorithm in Shen et al., Hashemi's
technique uses the spatial activity of the processing
MBs to estimate the new MV. A heuristic measurement,
called the Maximum Average Correlation (MAC), is
employed in Hashemi's method to identify one of the
four original MVs to be the output MV. By using the
MAC, the new MV for the downscaled MB can be computed
as:

MV' = arg max_{MV} sum_{i=1..4} Ai p^di

where p is the spatial correlation and is set to 0.85,
and di is the Euclidean distance between the ith input
MV (MVi) and the candidate output MV.
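Both estimators can be sketched in a few lines (illustrative Python; `mvs` holds the four input MVs as (x, y) pairs and `acts` the per-MB nonzero-AC counts Ai; the names are hypothetical):

```python
import math

def shen_mv(mvs, acts):
    """Activity-weighted average of the four input MVs (Shen et al.)."""
    total = sum(acts)
    x = sum(mv[0] * a for mv, a in zip(mvs, acts)) / total
    y = sum(mv[1] * a for mv, a in zip(mvs, acts)) / total
    return (x, y)

def hashemi_mv(mvs, acts, rho=0.85):
    """Pick the input MV maximizing the Maximum Average Correlation."""
    def mac(cand):
        return sum(a * rho ** math.dist(cand, mv)
                   for mv, a in zip(mvs, acts))
    return max(mvs, key=mac)
```

Shen's estimator blends all four vectors, weighted by activity; Hashemi's instead selects one of the four original vectors, the one best correlated with the group.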
6. Implementation of the size transcoder
FIG. 5 illustrates a size transcoder in accordance
with the invention. B frames may be present in the
input bitstream, but are discarded by the transcoder
and therefore do not appear in the output bitstream.
In the transcoder 500, a MV scaling function 510,
DCT scaling function 520, and spatial scaling function
540 are added. Switches 530 and 535 are coordinated so
that, in a first setting, an output of the DCT function
455 is routed into the quantisation function 340, and
the switch 535 is closed to enable an output of the
spatial scaling function 540 to be input to the adder
445. In a second setting of the switches 530 and 535,
an output of the DCT scaling function 520 is routed
into the quantisation function 340, and the switch 535
is open.

The transcoder 500 converts an MPEG-2 bitstream
into an MPEG-4 bitstream which corresponds to a smaller
size video, e.g., from ITU-R 601 (720x480) to SIF
(352x240).
To achieve a bandwidth requirement for the MPEG-4
bitstream, the transcoder 500 subsamples the video by
two in both the horizontal and vertical directions (at
the spatial scaling function 540) and skips all B-
frames (at temporal scaling functions 545 and 546),
thereby reducing the temporal resolution accordingly.
Note that the temporal scaling function 546 could
alternatively be provided after the DCT scaling
function 520. Skipping B-frames before performing
downscaling reduces complexity.
Moreover, a low pass filter (which can be provided
in the spatial scaling function 540) prior to
subsampling should result in improved image quality.
The invention can be extended to include other
downsampling factors, and B-VOPs, with minor
modifications. Specifically, changes in MV downscaling
and mode decision are made. MV downscaling for a B-VOP
is a direct extension of what was discussed, to include
the backward MV. The mode decision for a B-VOP can be
handled in a similar way as in the P-VOP (e.g., by
converting a uni-directional MV into a bi-directional
MV, as in converting an intra MB into an inter MB in a
P-VOP).
Below, we discuss six problems that are addressed
by the size transcoder 500. We also assume that the
input video is 704x480 pixel resolution and coded with
an MP@ML MPEG-2 encoder, and that the desired output is
a simple profile MPEG-4 bitstream which contains SIF
progressive video (with a frame rate reduction by N).
However, the invention can be extended to other input
and output formats and resolutions as well.
6.1 Progressive video MV downscaling (luma)
This problem appears when all four MBs are coded
as inter, and use frame prediction. Each MV in those
MBs is downscaled by two in each direction (horizontal
and vertical) to determine the MVs of the four blocks
in MPEG-4 (MPEG-4 allows one MV per 8x8 block). The
scaled MVs are then predictively encoded (using a
median filter) using the normal MPEG-4 procedure.
Note that each MB (comprising four blocks) has to
be coded in the same mode in both MPEG-2 and MPEG-4.
With video downscaling, the output MB (four blocks)
corresponds to four input MBs.
6.2 Interlaced video MV downsampling (luma)
This problem exists when all four MBs are coded as
inter and use field prediction. We need to combine the
two field MVs in each MB to get a frame MV for the
resized block. Instead of setting the new MV based on
spatial activity, the proposed transcoder picks the new
MV based on its neighbors' MVs. The MVs of all eight
surrounding MBs are used to find a predictor (field MVs
are averaged in the case of an MB with field
prediction). The median value of these eight MVs
becomes the predictor, and the field MV of the current
MB which is closer to the predictor, in terms of
Euclidean distance, is scaled by two in the horizontal
direction to become the new MV.
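This selection rule can be sketched as follows (illustrative Python; it assumes a componentwise median over the eight neighboring MVs and reads "scaled by two" as halving the horizontal component - both are our interpretation of the text):

```python
import math
from statistics import median

def merge_field_mvs(field_mvs, neighbor_mvs):
    """Pick a frame MV for the resized block from two field MVs.

    field_mvs    : the two field MVs of the current MB, as (x, y)
    neighbor_mvs : MVs of the eight surrounding MBs (field MVs
                   already averaged for field-predicted neighbors)
    """
    # Componentwise median of the eight neighbors is the predictor.
    pred = (median(mv[0] for mv in neighbor_mvs),
            median(mv[1] for mv in neighbor_mvs))
    # Keep the field MV closest (Euclidean) to the predictor ...
    best = min(field_mvs, key=lambda mv: math.dist(mv, pred))
    # ... and halve its horizontal component for the downscaled frame.
    return (best[0] / 2, best[1])
```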
6.3 MV downsampling (chroma)
This problem happens when all four MBs are coded
as inter, and use either frame or field prediction
(MPEG-4 treats both prediction modes in the same way
for a chroma block). The process follows the MPEG-4
method to obtain a chroma MV from a luma MV, i.e., a
chroma MV is the downscaled version of the average of
its four corresponding 8x8 luma MVs.
6.4 DCT downsampling (luma progressive, chroma)
This problem occurs when all four luma MBs are
coded as intra or inter and use the frame MB structure,
and their eight chroma blocks (four for Cr and four
for Cb) use either frame or field structure. Dugad's
method is used to downscale the luma and chroma DCT
blocks by a factor of two in each direction.
6.5 Interlaced DCT downsampling (luma)
This problem arises in one of two ways: first, the
associated MB uses field prediction, and second, the
associated MB uses frame prediction. In either case,
we want to downscale four 8x8 field DCT blocks (two for
the top field, and two for the bottom field) into one
8x8 frame DCT block. The solution for the first case
is to use the same field DCT block as the one chosen
for MC. The second case involves deinterlacing, and we
propose a combination of the Dugad and Yim methods,
discussed above.

Specifically, the transcoder first downscales the
four field blocks in the vertical direction (and at the
same time performs deinterlacing) based on the Yim
algorithm to obtain two frame blocks. The transcoder
then downscales these two frame blocks in the
horizontal direction to get the output block using the
Dugad algorithm.
This is illustrated in FIG. 6, where four 8x8
coefficient field-mode DCT blocks are shown at 600, two
8x8 frame-mode DCT blocks are shown at 610, and one 8x8
frame-mode DCT block is shown at 620.
The procedure for DCT downscaling in accordance
with the invention can be summarized as follows:
1. Form the 16x16 coefficient input matrix by
combining the four field blocks together, as shown at
600.
2. For vertical downscaling and filtering, apply a
low pass (LP) filter D according to Yim's algorithm to
every row of the input matrix. The LP input matrix is
now 16x8, as shown at 610.
3. Form the B1 and B2 8x8 matrices from the LP
matrix ([B1 ; B2]).
4. Perform a horizontal downscaling operation
according to Dugad's algorithm on every column of B1
and B2 to obtain the output matrix (8x8) (620) as
follows:

B = B1 (TL T4^t)^t + B2 (TR T4^t)^t = (B1 + B2) E^t + (B1 - B2) O^t

where E and O denote the symmetric (even) and
anti-symmetric (odd) parts as discussed above.

In particular, a horizontal downsampling matrix
composed of odd "O" and even "E" matrices as follows
may be used (ignoring the scaling factor):

E = [ e(0)  0     0     0    ;
      0     e(1)  0     e(2) ;
      0     0     0     0    ;
      0     e(3)  0     e(4) ;
      0     0     e(5)  0    ;
      0     e(6)  0     e(7) ;
      0     0     0     0    ;
      0     e(8)  0     e(9) ]

O = [ 0     0     0     0    ;
      o(0)  0     o(1)  0    ;
      0     o(2)  0     0    ;
      o(3)  0     o(4)  0    ;
      0     0     0     0    ;
      o(5)  0     o(6)  0    ;
      0     0     0     o(7) ;
      o(8)  0     o(9)  0    ]

The following coefficients can be used:

e(0) =  4            o(0) =  2.56925448
e(1) =  0.831469612  o(1) = -0.149315668
e(2) =  0.045774654  o(2) =  2
e(3) =  1.582130167  o(3) = -0.899976223
e(4) = -0.195090322  o(4) =  1.026559934
e(5) =  2            o(5) =  0.601344887
e(6) = -0.704885901  o(6) =  1.536355513
e(7) =  0.980785280  o(7) =  2
e(8) =  0.906127446  o(8) = -0.509795579
e(9) =  1.731445835  o(9) = -0.750660555

Essentially, the product of DCT matrices, which is
sparse, is used as the downsampling matrix.
The technique may be extended generally for 2:1
downsizing of an NxN block that comprises four N/2xN/2
coefficient field-mode blocks. Other downsizing ratios
may also be accommodated.
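Under our reading of steps 1-4 (the vertical Yim step reduces to averaging co-located top- and bottom-field DCT coefficients, per the one-dimensional derivation in section 5.1, and the horizontal Dugad step uses the lower four columns of each frame block), the procedure can be sketched with NumPy; the orthonormal DCT scaling and the block layout are our assumptions:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix (rows are basis vectors)."""
    T = np.array([[np.cos((2 * k + 1) * i * np.pi / (2 * n))
                   for k in range(n)] for i in range(n)]) * np.sqrt(2 / n)
    T[0] /= np.sqrt(2)
    return T

# Precompute the sparse horizontal downsampling factors (Dugad):
# ML = TL T4^t and MR = TR T4^t, split into even (E) and odd (O) parts.
T8, T4 = dct_matrix(8), dct_matrix(4)
ML, MR = T8[:, :4] @ T4.T, T8[:, 4:] @ T4.T
E, O = (ML + MR) / 2, (ML - MR) / 2

def downscale_field_blocks(top_l, top_r, bot_l, bot_r):
    """Four 8x8 field-mode DCT blocks -> one 8x8 frame-mode DCT block."""
    # Vertical step (Yim): deinterlacing plus 2:1 vertical downscaling
    # amounts to averaging co-located top/bottom field coefficients.
    f1 = (top_l + bot_l) / 2          # left frame block
    f2 = (top_r + bot_r) / 2          # right frame block
    # Horizontal step (Dugad): keep the lower 4 columns of each frame
    # block and apply the sparse E/O combination.
    B1, B2 = f1[:, :4], f2[:, :4]
    return (B1 + B2) @ E.T + (B1 - B2) @ O.T
```

The E/O form is algebraically identical to B1 ML^t + B2 MR^t, while E and O have disjoint non-zero patterns (ten entries each), which is the computational saving claimed above.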
6.6 Special cases
Special cases occur when all four MBs are not
coded in the same mode (not falling in any of the five
previous cases). We always assume that any intra or
skipped MB among the other inter MBs is inter mode with
a zero MV. Field MVs are merged based on section 6.2
to obtain a frame MV, and then we apply the techniques
of section 6.1. MC is recommended to determine the
texture of the intra block, which is treated as an
inter block with a zero MV by the transcoder.
7. Conclusion
It should now be appreciated that the present
invention provides a transcoder architecture that
provides the lowest possible complexity with a small
error. This error is generated in the MPEG-4 texture
encoding process (QP coding, DC prediction, nonlinear
DC scaler). These processes should be removed in a
future profile of MPEG-4 to create a near-lossless
transcoding system.

The invention also provides complete details of a
size transcoder to convert a bitstream of ITU-R 601
interlaced video coded with MPEG-2 MP@ML into a
simple profile MPEG-4 bitstream which contains SIF
progressive video suitable for a streaming video
application.
For spatial downscaling of field-mode DCT blocks, it is proposed to combine vertical and horizontal downscaling techniques in a novel manner such that sparse downsampling matrices are used in both the vertical and horizontal directions, thereby reducing the computations of the transcoder.
Moreover, for MV downscaling, we propose using a median value from the eight neighboring MVs. This proposal works better than the algorithms of section 5.2 since our predicted MVs follow the global MV. It also works well with an interlaced MB, which has only two MVs instead of four MVs per MB.
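A minimal sketch of the proposed MV downscaling: take a median over the eight neighboring MVs, then halve it for 2:1 spatial downsizing. Component-wise medians and integer halving are assumptions about details the text leaves open:

```python
import statistics

def mv_median(neighbors):
    """Component-wise median of neighboring MVs (median_low keeps integers)."""
    mx = statistics.median_low([v[0] for v in neighbors])
    my = statistics.median_low([v[1] for v in neighbors])
    return (mx, my)

def downscale_mv(neighbors):
    """Median of the eight neighboring MVs, halved for 2:1 downsizing."""
    mx, my = mv_median(neighbors)
    return (mx // 2, my // 2)

# Eight neighboring MVs of the MB being downscaled (illustrative values).
neighbors = [(2, 0), (4, 2), (2, 2), (6, 0), (2, 0), (4, 4), (2, 2), (4, 0)]
scaled = downscale_mv(neighbors)
```

Using a median rather than an average makes the predicted MV robust to one or two outlier neighbors, which is why it tracks the global motion better than the per-MV scaling algorithms of section 5.2.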
Although the invention has been described in
connection with various preferred embodiments, it
should be appreciated that various modifications and
adaptations may be made thereto without departing from
the scope of the invention as set forth in the claims.

Administrative Status


Event History

Description Date
Inactive: Expired (new Act pat) 2021-05-25
Inactive: COVID 19 - Deadline extended 2020-05-14
Common Representative Appointed 2019-10-30
Common Representative Appointed 2019-10-30
Revocation of Agent Requirements Determined Compliant 2017-04-04
Inactive: Office letter 2017-04-04
Appointment of Agent Requirements Determined Compliant 2017-04-04
Appointment of Agent Request 2017-03-16
Revocation of Agent Request 2017-03-16
Letter Sent 2016-04-08
Inactive: IPC deactivated 2014-05-17
Inactive: IPC deactivated 2014-05-17
Inactive: IPC from PCS 2014-02-01
Inactive: First IPC from PCS 2014-02-01
Inactive: IPC from PCS 2014-02-01
Inactive: IPC from PCS 2014-02-01
Inactive: IPC expired 2014-01-01
Inactive: IPC expired 2014-01-01
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Letter Sent 2013-08-14
Grant by Issuance 2008-01-15
Inactive: Cover page published 2008-01-14
Inactive: Final fee received 2007-10-15
Pre-grant 2007-10-15
Notice of Allowance is Issued 2007-04-13
Letter Sent 2007-04-13
Notice of Allowance is Issued 2007-04-13
Inactive: Approved for allowance (AFA) 2007-03-30
Amendment Received - Voluntary Amendment 2007-01-24
Inactive: S.30(2) Rules - Examiner requisition 2006-07-25
Inactive: Office letter 2006-01-20
Inactive: Cover page published 2005-12-22
Inactive: First IPC assigned 2005-12-16
Inactive: IPC assigned 2005-12-16
Letter sent 2005-11-29
Inactive: Divisional - Presentation date updated 2005-11-22
Letter Sent 2005-11-22
Divisional Requirements Determined Compliant 2005-11-22
Application Received - Regular National 2005-11-22
Application Received - Divisional 2005-11-12
All Requirements for Examination Determined Compliant 2005-11-10
Request for Examination Requirements Determined Compliant 2005-11-10
Application Published (Open to Public Inspection) 2001-12-13

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2007-03-16

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
GOOGLE TECHNOLOGY HOLDINGS LLC
Past Owners on Record
KRIT PANUSOPONE
XUEMIN CHEN
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


List of published and non-published patent-specific documents on the CPD.



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Description 2005-11-10 44 1,714
Abstract 2005-11-10 1 32
Claims 2005-11-10 4 77
Drawings 2005-11-10 8 182
Representative drawing 2005-12-21 1 17
Cover Page 2005-12-22 1 54
Representative drawing 2007-12-19 1 17
Cover Page 2007-12-19 1 54
Acknowledgement of Request for Examination 2005-11-22 1 176
Commissioner's Notice - Application Found Allowable 2007-04-13 1 162
Correspondence 2005-11-22 1 38
Correspondence 2006-01-20 1 15
Fees 2006-03-16 1 27
Fees 2007-03-16 1 29
Correspondence 2007-10-15 1 28