CA 02805722 2015-06-12
77762-30
METHOD AND APPARATUS OF REGION-BASED ADAPTIVE LOOP
FILTERING
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention claims priority to U.S. Provisional Patent Application, Serial No. 61/390,068, filed October 5, 2010, entitled "Improved In-Loop Filter"; U.S. Provisional Patent Application, Serial No. 61/421,729, filed December 10, 2010, entitled "Rate-Distortion-Complexity Optimization for Adaptive Loop Filter"; U.S. Patent Application, Serial No. 13/177,343, filed July 06, 2011, entitled "Method and Apparatus of Region-Based Adaptive Loop Filtering"; and U.S. Provisional Patent Application, Serial No. 61/454,829, filed March 21, 2011, entitled "Region-Based ALF". The present invention is also related to U.S. Patent Application, Serial No. 13/093,068, filed on April 25, 2011, entitled "Method and Apparatus of Adaptive Loop Filtering"; U.S. Patent Application, Serial No. 13/158,427, filed on June 12, 2011, entitled "Apparatus and Method of Sample Adaptive Offset for Video Coding"; and U.S. Provisional Patent Application, Serial No. 61/429,313, filed January 03, 2011, entitled "MediaTek's Adaptive Loop Filter (MTK_ALF)".
TECHNICAL FIELD
The present invention relates to video coding. In particular, the present
invention relates to
adaptive loop filtering.
BACKGROUND
Motion compensated inter-frame coding has been widely adopted in various
coding standards,
such as MPEG-1/2/4 and H.261/H.263/H.264/AVC. Motion estimation and
compensation as well
as subsequent processing are performed on a block basis. During the compression process, coding noises may arise due to lossy operations such as quantization. The coding artifacts may
coding artifacts may
become noticeable in the reconstructed video data, especially at or near block
boundaries. In order
to alleviate the visibility of coding artifacts, a technique called deblocking
has been used in newer
coding systems such as H.264/AVC and the High Efficiency Video Coding (HEVC)
system under
development. The deblocking process applies filtering across block boundaries
adaptively to
smooth the large transitions at and near block boundaries due to coding noises
while retaining
CA 02805722 2013-01-16
WO 2012/045269
PCT/CN2011/080408
image sharpness. Furthermore, due to the nature of inter-frame coding, the
deblocking process is
configured for in-loop operation. In the recent HEVC development, adaptive
loop filtering (ALF)
is being adopted to process deblocked reconstruction frames. Adaptive loop
filtering is used as in-
loop processing in addition to deblocking and is often applied after
deblocking of reconstructed
video data.
The conventional adaptive loop filter is only applied to those blocks where the filtering helps to improve performance. For other blocks, where adaptive loop filtering does not improve performance, adaptive loop filtering is not applied. When the ALF is applied, a single filter SF is applied to the blocks in a slice. The single filter is selected from a group of filter candidates to achieve the best performance, such as minimum distortion, minimum rate or best R-D performance. Such an ALF scheme is referred to as SF (single filter) ALF. Another ALF scheme (called QC_ALF) was proposed by Qualcomm ("Video coding technology proposal by Qualcomm Inc.", Karczewicz et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 1st Meeting: Dresden, DE, 15-23 April, 2010, Document: JCTVC-A121).
According to QC_ALF, the ALF is applied to the deblocked video data on a block by block basis. For each block, the Sum-modified Laplacian Measure (SLM) SLM(i,j) of each pixel (i,j) of the block is computed. Each pixel of the block is then classified into multiple classes or categories according to the SLM measurement, and a respective ALF filter is selected for each pixel. While the QC_ALF scheme may improve performance over the SF ALF, the SLM computations for each pixel and filter switching from pixel to pixel will result in higher computational complexity and consume more power. It is desirable to develop an ALF scheme that provides the advantage of multiple filter choices for each region without high computational complexity and the need to switch filters from pixel to pixel.
During the ALF design in terms of region partition and mode decision, an
optimization technique,
called rate-distortion optimization (RDO), is often used to guide the region
partition and mode decision.
While the RDO technique achieves the best rate-distortion performance, it does not take the system complexity into consideration; the chosen solution may thus consume valuable system resources. It is
desirable to use an optimization
scheme that achieves the best performance in terms of rate, distortion, and
complexity. Accordingly, the
rate-distortion-complexity optimization (RDCO) technique is used for ALF
design. The RDCO
technique is also applied to sample adaptive offset (SAO) design to achieve
the best rate-distortion-
complexity performance.
SUMMARY
A method and apparatus for processing of coded video using adaptive loop
filter are
disclosed. In one embodiment according to the present invention, the method
and apparatus for
processing of coded video using adaptive loop filter comprise steps of
receiving reconstructed video
data corresponding to coded video data from a processing unit, applying
adaptive loop filtering to
the reconstructed video data to generate filtered video data, and providing
the filtered video data.
The adaptive loop filtering may be region based ALF or non-region based ALF
according to a
region adaptation flag. If the region based ALF is selected, the reconstructed
video data can be
divided into MxN regions, where M and N are integers. Regions of the MxN
regions may be
merged using either 1-D or 2-D syntax. Furthermore, a merge flag may be used
to indicate whether
a region is merged with one or more neighboring regions. When 1-D syntax is
used, a scanning
pattern may be used to assign a group index to each of the MxN regions,
wherein the scanning
pattern is selected from a group consisting of deformation of Hilbert curve,
horizontal snake scan,
vertical snake scan, zig-zag scan, spiral scan, quad-tree scan, and raster
scan. In another
embodiment according to the present invention, each of the MxN regions is
aligned with boundaries
of largest coding units (LCUs). In yet another embodiment according to the
present invention, a
merge flag is used to indicate whether region merging is allowed or not.
A method and apparatus for adaptive loop filter (ALF) design or sample
adaptive offset (SAO)
design are disclosed. In one embodiment according to the present invention,
the method and
apparatus for adaptive loop filter (ALF) design or sample adaptive offset
(SAO) design comprise
steps of determining candidates associated with a design feature for adaptive
loop filter (ALF)
design or sample adaptive offset (SAO) design and selecting a best candidate
among the candidates
according to rate-distortion-complexity optimization (RDCO). The design
feature can be associated
with mode decision or region partition. For the ALF design, the complexity of
the RDCO is related
to the number of filter coefficients, a combination of the number of filter
coefficients and the
number of pixels to be filtered, or a combination of the number of filter
coefficients, the number of
pixels to be filtered and the number of operations associated with a candidate
mode for mode
decision. For the SAO design, the complexity of the RDCO can be related to the
number of
operations associated with a pixel classification type or a combination of the
number of operations
associated with a pixel classification type and the number of pixels to be
compensated with an
offset value.
According to one aspect of the present invention, there is provided a method
for a video coding system for processing of coded video using in-loop
processing, the method
comprising: receiving reconstructed video data corresponding to coded video
data from a
processing unit, wherein the reconstructed video data is divided into regions,
and wherein a
merge syntax is used to indicate whether a current region shares a same in-
loop filter of a
neighboring region of the current region; applying in-loop processing to the
current region to
generate a filtered region, wherein said applying in-loop processing uses the
same in-loop
filter of the neighboring region and the current region shares all parameters
of the same in-
loop filter of the neighboring region if the merge syntax indicates that the
current region
shares the same in-loop filter of the neighboring region; and providing the
filtered region.
According to another aspect of the present invention, there is provided an
apparatus for processing coded video using in-loop processing, the apparatus
comprising:
means for receiving reconstructed video data corresponding to coded video data
from a
processing unit, wherein the reconstructed video data is divided into regions,
and wherein a
merge syntax is used to indicate whether a current region shares a same in-
loop filter of a
neighboring region of the current region; means for applying in-loop
processing to the current
region to generate a filtered region, wherein said applying in-loop processing
uses the same
in-loop filter of the neighboring region and the current region shares all
parameters of the
same in-loop filter of the neighboring region if the merge syntax indicates
that the current
region shares the same in-loop filter of the neighboring region; and means for
providing the
filtered region.
BRIEF DESCRIPTION OF DRAWINGS
Fig. 1 illustrates an exemplary block diagram of a video coding system based
on motion compensated prediction, where adaptive loop filter is applied to
reconstructed
video data.
Fig. 2 illustrates an example of pixel based adaptive loop filter where Sum-
modified Laplacian
Measure (SLM) is used to classify the pixels in a 6x4 block into three
categories.
Fig. 3 illustrates an example of dividing a picture consisting of 416x240
pixels into 4x4 LCU-
aligned regions, where each square is an LCU consisting of 64x64 pixels and a
non-rightmost and
non-bottom region consists of 2x1 LCUs.
Fig. 4 illustrates an exemplary syntax design to support region based ALF
incorporating a
flag, region_adaptation_flag, to indicate whether region based ALF is used.
Fig. 5 illustrates the scanning order through the 4x4 regions according to the
deformed Hilbert
curve.
Fig. 6 illustrates the scanning order through the 4x4 regions according to the
horizontal snake
scan.
Fig. 7 illustrates the scanning order through the 4x4 regions according to the
vertical snake
scan.
Fig. 8 illustrates the scanning order through the 4x4 regions according to the
zig-zag scan.
Fig. 9 illustrates the scanning order through the 4x4 regions according to the
spiral scan.
Fig. 10 illustrates the scanning order through the 4x4 regions according to
the quad-tree scan.
Fig. 11 illustrates the scanning order through the 4x4 regions according to
the raster scan.
Fig. 12 illustrates an example of region splitting, where a region is split
into five regions, i.e.,
F0, F1, F2, F3 and a No-Filter region.
Fig. 13 illustrates an example where the five to-be-filtered regions are
merged into three
regions: F0', F1' and a No-Filter region.
Fig. 14 illustrates an exemplary syntax design to support 2-D region merging.
DETAILED DESCRIPTION
For digital video compression, motion compensated inter-frame coding is an
effective
compression technique and has been widely adopted in various coding standards,
such as MPEG-
1/2/4 and H.261/H.263/H.264/AVC.
In a motion compensated system, motion estimation/compensation and subsequent compression are often performed on a block by block basis. During the compression process, coding noises may arise due to lossy operations such as quantization.
The coding artifacts may become noticeable in the reconstructed video
data,
especially at or near block boundaries. In order to alleviate the visibility
of coding artifacts, a
technique called deblocking has been used in newer coding systems such as
H.264/AVC and the
High Efficiency Video Coding (HEVC) system under development. The deblocking
process
applies filtering across block boundaries adaptively to smooth the large
transitions at and near block
boundaries due to coding noises while retaining image sharpness. Furthermore,
due to the nature of
inter-frame coding, the deblocking process is configured for in-loop
operation.
In HEVC, another in-loop filtering, called adaptive loop filtering (ALF), is
used in addition to
deblocking. While the deblocking filter is only applied to block boundaries, the adaptive loop filter may be applied to all pixels in a frame, field, slice or picture area. The conventional adaptive loop filter is only applied to those blocks where the filtering helps to improve performance. For other blocks, where adaptive loop filtering does not improve performance, adaptive loop filtering is not applied. When the ALF is applied, a single filter SF is applied to the blocks in a slice. The single filter is selected from a group of filter candidates to achieve the best performance (such as minimum distortion, minimum rate or best R-D performance). Such an ALF scheme is referred to as SF (single filter) ALF. Information associated with the selected filter has to
be conveyed to the
decoder side. In order to conserve the information to be transmitted or
stored, the set of ALF filters
may be pre-defined and the filter selection can be indicated by an index.
Alternatively, the filter
can be derived in a time-delayed arrangement based on video data already
reconstructed.
Therefore, no side information or very little side information is needed.
Other means for reducing
information associated with the filter can be used such as entropy coding of
the filter coefficients
and/or transmitting the coefficients differentially.
A system block diagram for a coding system incorporating adaptive loop
filtering and
deblocking is shown in Fig. 1. Fig. 1 illustrates a system block diagram 100 of a motion-compensated video encoder with deblocking. Compression system 100 illustrates
a typical video
encoder incorporating Intra/Inter-prediction, Discrete Cosine Transform (DCT)
and entropy coding
to generate compressed video data. The input video data enters the encoder
through input interface
112 and the input video data is predicted using Intra/Inter-prediction 110. In
the Intra prediction
mode, the incoming video data are predicted by surrounding data in the same
frame picture or field
picture that are already coded. In the Inter prediction mode, the prediction
is based on previously
reconstructed data 142 in the temporal direction where the reconstructed data
142 are stored in
picture buffer 140. The Inter prediction can be a list 0 prediction mode,
where the prediction is
based on a picture that is prior to the current picture in decoding order and
is in a first list of
reference pictures. The Inter prediction may also be a list 1 prediction mode
where the Inter
prediction is based on a picture that is prior to the current picture in the
decoding order and is in a
second list of reference pictures if the current slice is a B-slice. The Inter
prediction may also be a
bi-prediction mode where the Inter prediction is based on a list 0 reference
picture and a list 1
reference picture if the current slice is a B-slice. In the Inter-prediction
mode, the Intra/Inter
prediction 110 will cause the prediction data to be provided to the adder 115
and be subtracted from
the original video data 112. The output 117 from the adder 115 is termed as
the prediction error
which is further processed by the DCT/Q block 120 representing Discrete Cosine
Transform and
quantization (Q). The DCT and quantization 120 converts prediction errors 117
into coded symbols
for further processing by entropy coding 130 to produce compressed bitstream
132, which is stored
or transmitted. In order to provide the prediction data for Intra/Inter
prediction, the prediction error
processed by the DCT and quantization 120 has to be recovered by inverse DCT
and inverse
quantization (IDCT/IQ) 160 to provide a reconstructed prediction error 162. In
the Inter prediction
mode, the reconstructed prediction error 162 is added to previously
reconstructed video data 119 by
the reconstruction block 150 to form a currently reconstructed frame 152. In
the Intra prediction
mode, the reconstructed prediction error 162 is added to the previously
reconstructed surrounding
data in the same picture. The Intra/Inter prediction block 110 is configured
to route the previously
reconstructed data 119 to the reconstruction block 150, where the
reconstructed data 119 may
correspond to a previously reconstructed frame in the temporal direction or
reconstructed
surrounding data in the same picture depending on the Inter/ Intra mode.
The reconstructed data are processed by deblocking 170 and adaptive loop filtering 180 and
filtering 180 and
are then stored in the picture buffer 140 as reference video data for
processing of subsequent
pictures. The original ALF proposed for HEVC is applied on a block by block
basis. If ALF helps
to improve the performance (lower distortion, lower bit rate, or better R-D
performance), the ALF is
turned on for the block. Otherwise, ALF is turned off for the block. An ALF
scheme (called
QC_ALF) was proposed by Qualcomm ("Video coding technology proposal by Qualcomm Inc.", Karczewicz et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 1st Meeting: Dresden, DE, 15-23 April, 2010, Document: JCTVC-A121). According to QC_ALF, the ALF is applied to the deblocked video data on a
pixel by pixel basis. For
each block, the Sum-modified Laplacian Measure (SLM) SLM(i,j) of each pixel
(i,j) within the block is
computed:
SLM(i, j) = Σ_{k=−K..K} Σ_{l=−L..L} [ |2R(i+k, j+l) − R(i+k−1, j+l) − R(i+k+1, j+l)| + |2R(i+k, j+l) − R(i+k, j+l−1) − R(i+k, j+l+1)| ],

where R(i,j) is the deblocked video data. The SLM is calculated based on a (2K+1)x(2L+1) neighboring window centered at (i,j). The neighboring window size can be 9x9, 7x7, 5x5 or 3x3. In
order to reduce the complexity, the 3x3 neighboring window has been used. The
SLM value
computed for each pixel of the block is used to classify the pixel into one of
M groups. Fig. 2
illustrates an example of SLM classification where each square denotes a
pixel. The pixels are
classified into three groups according to the SLM value as shown in Fig. 2.
The QC_ALF scheme
selects a filter corresponding to each group to filter the pixels associated
with the group. The filter
used for ALF is often horizontally and vertically symmetric in order to reduce
computational
complexity. To further reduce the complexity, a diamond-shaped filter may be used, particularly for large filter sizes. For example, a 9x9, 7x7, or 5x5 diamond-shaped filter, or a 5x5 or 3x3 square filter may be used. The SLM-based ALF is applied on a pixel by pixel basis and pixels may use different filters. Therefore, the QC_ALF is also referred to as pixel-adaptive or pixel-adaptation (PA) ALF.
The SF ALF mentioned previously can be considered as a special case of PA ALF
where only one
group is used.
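As an illustrative sketch of the computation described above, the following Python fragment evaluates the SLM with a 3x3 window (K = L = 1) and classifies a pixel into one of three groups. The thresholds T0 and T1 are hypothetical values chosen for illustration only; the actual group boundaries are an encoder design choice not fixed by the text above.

```python
def slm(R, i, j, K=1, L=1):
    """Sum-modified Laplacian Measure at pixel (i, j) of deblocked frame R,
    summed over a (2K+1)x(2L+1) window centered at (i, j)."""
    total = 0
    for k in range(-K, K + 1):
        for l in range(-L, L + 1):
            # vertical second difference
            total += abs(2 * R[i + k][j + l] - R[i + k - 1][j + l] - R[i + k + 1][j + l])
            # horizontal second difference
            total += abs(2 * R[i + k][j + l] - R[i + k][j + l - 1] - R[i + k][j + l + 1])
    return total

def classify(R, i, j, T0=8, T1=32):
    """Map a pixel to one of three groups by thresholding its SLM value.
    T0 and T1 are hypothetical thresholds for illustration."""
    v = slm(R, i, j)
    return 0 if v < T0 else (1 if v < T1 else 2)
```

A flat area yields SLM = 0 (group 0), while a strong edge yields a large SLM value and falls into the highest-activity group.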
While the PA ALF can adaptively select a filter on a pixel by pixel basis, it requires deriving the group information based on the SLM for each pixel on the decoder side, because the required side information would be substantial if it were coded explicitly. Therefore, it is
desirable to develop an
ALF scheme that can achieve better performance and/or reduced complexity.
Accordingly, a region
based ALF scheme is disclosed herein. The ALF scheme according to the present
invention applies
ALF to deblocked video data on a region by region basis. The regions can be
formed by dividing a
picture or a picture area into fixed blocks or fixed sets of blocks.
Alternatively, the regions can be
formed by partitioning a picture or a picture area recursively. For example,
quadtree may be used
for recursive region partitioning. A flag in the syntax for ALF information
set is used to indicate
whether the region based ALF is applied or non-region based ALF method is
applied. For example,
a flag can be used to select between region-based ALF and block-based ALF. The
filter selected for
the block can be derived from a similar method used in the pixel-based ALF.
For example, Chong
et al, described a block based ALF, where Laplacian based activity metrics of
4x4 blocks are
averaged so that each 4x4 block can use one Laplacian activity value (Chong et
al., "CE8 Subtest 2:
Block based adaptive loop filter (ALF)", Joint Collaborative Team on Video
Coding (JCT-VC) of
ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 5th Meeting: Geneva, CH, 16-23 March, 2011,
March, 2011,
Document: JCTVC-E323). This method requires computing the Laplacian activity
value for each
pixel of the 4x4 block and does not save any computation over the conventional
pixel-based ALF.
However, the block-based ALF can reduce the frequency of filter switching
activity in comparison
with the pixel-based ALF. Instead of using Laplacian activity as a classifier,
other measurement,
such as the band offset (BO) or the edge offset (EO) classifier used for
sample adaptive offset
(SAO), may also be used to replace the SLM computation used by Karczewicz et
al. Both BO and
EO require much fewer computations compared with the SLM measurement.
As an example of region-based ALF using sets of blocks, a picture may be
divided into 16
(i.e., 4x4) roughly-equal-sized regions. For example, the region widths of non-
rightmost regions
can be (PicWidth/4), wherein PicWidth means the picture width. For
rightmost regions, the
region width can be (PicWidth − (PicWidth/4)*3). The region heights of non-
bottom regions can be
(PicHeight/4), wherein the PicHeight means the picture height. For bottom
regions, the region
height can be (PicHeight − (PicHeight/4)*3). In this example, the rightmost
regions and the bottom
regions may be larger than other regions. Another example is to partition a
picture into 16 (i.e.,
4x4) roughly-equal-sized LCU-aligned regions, wherein region boundaries must
also be largest
coding unit (LCU) boundaries, as shown in Fig. 3. The picture size is
416x240 and contains 7x4
LCUs, wherein each LCU has 64x64 pixels. The region widths of non-rightmost
regions can be
(((PicWidthInLCUs+1)/4)*64), wherein PicWidthInLCUs means the number of
LCUs of the
picture width. The region heights of non-bottom regions can be
(((PicHeightInLCUs+1)/4)*64),
wherein the PicHeightInLCUs means the number of LCUs of the picture height.
The size of the
rightmost and bottom regions can be derived from PicWidth, PicHeight, and the
size of a non-
rightmost and non-bottom region. An index is assigned to each of the 4x4 regions.
While 4x4 regions
are used as an example, it is understood that the present invention is not
limited to the particular 4x4
regions. In fact, MxN regions may be used to practice the present invention,
where M and N are
integers. For a picture having PicWidthInSamples pixels horizontally and
PicHeightInSamples
pixels vertically, the region index for a region with upper left corner
coordinate (x,y) can be derived
as follows. The (1<<Log2MaxCUSize) is the maximum coding unit size, and
xMaxIndex and
yMaxIndex are the maximum region indexes in horizontal and vertical
directions, respectively. The
horizontal interval x_interval and the vertical interval y_interval of the region can be derived as:

x_interval = ((((PicWidthInSamples + (1<<Log2MaxCUSize) − 1) >> Log2MaxCUSize) + x_round) / (xMaxIndex+1)), and (1)

y_interval = ((((PicHeightInSamples + (1<<Log2MaxCUSize) − 1) >> Log2MaxCUSize) + y_round) / (yMaxIndex+1)), (2)

where x_round = max(0, (xMaxIndex+1)/2 − 1) and y_round = max(0, (yMaxIndex+1)/2 − 1). The horizontal index x_idx and the vertical index y_idx can be derived as:

x_idx = min(xMaxIndex, x / (x_interval << Log2MaxCUSize)), and (3)

y_idx = min(yMaxIndex, y / (y_interval << Log2MaxCUSize)). (4)

The region index region_idx is determined according to:

region_idx = y_idx * (xMaxIndex + 1) + x_idx. (5)
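Equations (1) through (5) can be sketched in Python as follows. All arithmetic is integer, the parameter names mirror the syntax elements above, and the default arguments correspond to the 4x4 partition (xMaxIndex = yMaxIndex = 3).

```python
def region_index(x, y, pic_w, pic_h, log2_max_cu, x_max_idx=3, y_max_idx=3):
    """Region index for the region whose upper-left corner is at (x, y),
    following equations (1)-(5)."""
    max_cu = 1 << log2_max_cu
    x_round = max(0, (x_max_idx + 1) // 2 - 1)  # rounding offsets
    y_round = max(0, (y_max_idx + 1) // 2 - 1)
    # Picture size in LCUs, spread over (max_idx + 1) intervals -- eqs. (1), (2)
    x_interval = (((pic_w + max_cu - 1) >> log2_max_cu) + x_round) // (x_max_idx + 1)
    y_interval = (((pic_h + max_cu - 1) >> log2_max_cu) + y_round) // (y_max_idx + 1)
    # Horizontal/vertical region indices, clipped to the maximum -- eqs. (3), (4)
    x_idx = min(x_max_idx, x // (x_interval << log2_max_cu))
    y_idx = min(y_max_idx, y // (y_interval << log2_max_cu))
    return y_idx * (x_max_idx + 1) + x_idx      # eq. (5)
```

For the 416x240 picture of Fig. 3 with 64x64 LCUs (Log2MaxCUSize = 6), this yields x_interval = 2 and y_interval = 1, i.e., non-rightmost, non-bottom regions of 2x1 LCUs, consistent with the example above.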
When the 4x4 regions are used for region based ALF, the region index can be
derived as
follows. The horizontal interval x_interval and the vertical interval y_interval of the region can be derived as:

x_interval = ((((PicWidthInSamples + (1<<Log2MaxCUSize) − 1) >> Log2MaxCUSize) + 1) >> 2), and (6)

y_interval = ((((PicHeightInSamples + (1<<Log2MaxCUSize) − 1) >> Log2MaxCUSize) + 1) >> 2). (7)

The horizontal index x_idx and the vertical index y_idx can be derived as:

x_idx = min(3, x / (x_interval << Log2MaxCUSize)), and (8)

y_idx = min(3, y / (y_interval << Log2MaxCUSize)). (9)

The region index region_idx is determined according to:

region_idx = (y_idx << 2) + x_idx. (10)

The filter index filter_idx(x,y) is determined according to:

filter_idx(x, y) = region_tab[region_idx], (11)

where region_tab[16] = {0, 1, 4, 5, 15, 2, 3, 6, 14, 11, 10, 7, 13, 12, 9, 8}.
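The fixed 4x4 case of equations (6) through (11) can be sketched as follows. The REGION_TAB permutation reproduces the region_tab values given above, and the sketch assumes a picture at least a few LCUs wide and tall so that the derived intervals are nonzero.

```python
# Permutation of region indices 0..15, as in the region_tab of eq. (11)
REGION_TAB = [0, 1, 4, 5, 15, 2, 3, 6, 14, 11, 10, 7, 13, 12, 9, 8]

def filter_index(x, y, pic_w, pic_h, log2_max_cu):
    """Filter index for position (x, y) following equations (6)-(11)."""
    max_cu = 1 << log2_max_cu
    x_interval = ((((pic_w + max_cu - 1) >> log2_max_cu) + 1) >> 2)  # eq. (6)
    y_interval = ((((pic_h + max_cu - 1) >> log2_max_cu) + 1) >> 2)  # eq. (7)
    x_idx = min(3, x // (x_interval << log2_max_cu))                 # eq. (8)
    y_idx = min(3, y // (y_interval << log2_max_cu))                 # eq. (9)
    region_idx = (y_idx << 2) + x_idx                                # eq. (10)
    return REGION_TAB[region_idx]                                    # eq. (11)
```

With the 416x240 picture and 64x64 LCUs, the top-left region maps to filter index 0 and the bottom-right region (region_idx = 15) maps to filter index 8.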
Two neighboring regions, i.e., regions with successive indices, can be merged.
After region
merging, one filter is applied to each merged region. The picture can then be
processed by using
the pixel-based ALF, the block-based ALF, or the region-based ALF. Compared to
the pixel-based
ALF and the block-based ALF, the region-based ALF can save computations
associated with pixel-
based classification of pixel adaptation and block-based classification of
block adaptation,
respectively, so that the average decoder complexity can be reduced when the
region based ALF is
incorporated. Furthermore, the region-based ALF also significantly reduces the number of filter switches in the picture and consequently results in lower switching power. An
example of syntax to
support selection between region-based ALF and pixel-based ALF or between
region-based ALF
and block-based ALF is shown in Fig. 4. The only syntax change is to add one
flag,
region_adaptation_flag, in the ALF parameter set, alf_param(), of the slice header
or the picture
parameter set (PPS) to select between the pixel-based ALF and region-based ALF
or between the
block-based ALF and the region-based ALF.
The coding efficiency may be further improved by region merging. Neighboring
regions may
have similar characteristics and can share a same filter to reduce information
required for indicating
the ALF filter. One method to perform region merging is to order the 2-D
regions into 1-D regions.
For example, the 4x4 regions can be converted into 1-D regions with group
indexes from 0 through
15. The 2-D to 1-D conversion can be performed according to a specific
scanning pattern. There
are many known scan patterns that can be used to convert the 2-D regions into
1-D regions, such as
deformed Hilbert curve (Fig. 5), horizontal snake scan (Fig. 6), vertical snake
scan (Fig. 7), zig-zag
scan (Fig. 8), spiral scan (Fig. 9), quad-tree scan (Fig. 10) and raster scan
(Fig. 11). Upon
converting the 2-D regions into 1-D regions, neighboring regions, i.e.,
regions with consecutive
group indexes can be merged to share a same filter as indicated by merging
syntax. For example,
one merge flag can be used to indicate whether the region with group index n
is merged with the
region with group index (n-1) or not.
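The 2-D to 1-D conversion and the per-region merge flags described above can be sketched as follows. This is an illustrative fragment, not the normative syntax; the horizontal snake scan is used as the example scanning pattern.

```python
def horizontal_snake_order(m=4, n=4):
    """Assign a 1-D group index to each (row, col) of an m x n region grid:
    even rows scan left-to-right, odd rows right-to-left."""
    order, g = {}, 0
    for r in range(m):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)
        for c in cols:
            order[(r, c)] = g
            g += 1
    return order

def filters_from_merge_flags(num_regions, merge_flags):
    """merge_flags[g] == 1 means the region with group index g shares the
    filter of region g-1; returns the filter id used by each region."""
    filt = [0] * num_regions
    for g in range(1, num_regions):
        filt[g] = filt[g - 1] if merge_flags[g] else filt[g - 1] + 1
    return filt
```

Under this scan, region (1, 3) receives group index 4, so it is the 1-D neighbor of region (0, 3) with group index 3 and may be merged with it even though the two are vertical neighbors in 2-D.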
Region merging can also be performed using 2-D merge syntax where a region may
be
merged with a surrounding region. A first flag can be used to indicate whether this region is merged or not. The first flag is followed by one merge candidate flag if the first flag indicates that the region is merged. In certain circumstances, some of the surrounding regions may not be available as merge candidates. For example, a region on the boundaries of
a picture or on the
boundaries of 4x4 regions will not have certain merge candidates. Accordingly,
the merge
candidate flag can be simplified based on the neighboring regions. For
example, the merge
candidate can be left or upper region and a 1-bit flag can be used as the
merge candidate flag.
However, if the left and upper regions do not both exist, then the merge candidate flag can be saved. Accordingly, the representation of the merge candidate flag is
adapted to the availability
of the neighboring regions. When some of the neighboring regions are not
available, the merge
candidate flag may be represented in fewer bits. The region merging can be
adaptively applied to
each set of 4x4 regions. A region merge enable flag may be used to indicate
whether region
merging is allowed for the 4x4 regions.
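A minimal sketch of the availability-adapted merge candidate flag described above, assuming the candidates are the left and upper neighbors only (the function name is illustrative):

```python
def merge_candidate_flag_bits(row, col):
    """Bits spent on the merge candidate flag for the region at (row, col),
    assuming the only candidates are the left and upper neighbors."""
    has_left, has_up = col > 0, row > 0
    if has_left and has_up:
        return 1  # a 1-bit flag selects between left and upper
    return 0      # at most one candidate exists, so the flag can be inferred
```

For the top-left region neither candidate exists, and for regions on the first row or first column only one candidate exists, so the flag is saved in all three cases.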
The region for ALF filtering can also be the filter unit (FU). A picture or a
picture area can be
recursively partitioned into smaller FUs if the split will result in better
performance such as lower
rate-distortion (R-D) cost. The R-D cost can be computed for one to-be-
filtered region (an
individual FU in this case) and for regions resulted from splitting. If
splitting will result in lower R-
D cost, the to-be-filtered region will be split; otherwise the region is not
split. Alternatively, a
picture or a picture area may be divided into smaller regions first.
Neighboring regions can be
merged if the merging will result in better performance such as lower R-D
cost. If the cost for the
merged region is smaller than the cost for individual regions, the regions
will be merged; otherwise
the regions will not be merged. According to a method embodying the present
invention, several
to-be-filtered regions can be merged into one FU to reduce the bitrate
associated with the filter
information.
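The split and merge decisions above reduce to cost comparisons. A sketch, with the costs taken as precomputed R-D values:

```python
def should_split(cost_unsplit, split_costs):
    """Split a to-be-filtered region (FU) only if the summed cost of the
    resulting sub-regions is lower than the cost of leaving it whole."""
    return sum(split_costs) < cost_unsplit

def should_merge(individual_costs, cost_merged):
    """Merge neighboring to-be-filtered regions into one FU only if the
    merged region's cost is lower than the summed individual costs."""
    return cost_merged < sum(individual_costs)
```

Merging lets several regions share one set of filter coefficients, which lowers the rate term of the merged cost and is why merging frequently wins the comparison.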
An example of region splitting is shown in Fig. 12 where a region (filter unit
FU 1210) is split
into five regions (FUs 1220), i.e., F0, F1, F2, F3 and a No-Filter region. The
region split can be
according to lower R-D cost. Fig. 13 illustrates an example of region merging
according to one
embodiment of the present invention. The to-be-filtered regions 1310 are
considered for region
merging according to the R-D cost. The R-D cost for individual neighboring
regions is compared
with the R-D cost of a merged region corresponding to these individual
neighboring regions. The
individual neighboring regions are merged if the merging results in lower R-D
cost. The example
of Fig. 13 illustrates a case where the five to-be-filtered regions 1310 are merged into three regions 1320: F0', F1' and a No-Filter FU. Regions F0 and F2 are merged into F0', and regions F1 and F3 are merged into F1'.
An exemplary syntax design to support 2-D region merging is illustrated in
Fig. 14. The
alf_fs_selection_param(r, c) function processes filter sharing for region merging. The ALF merge flag, alf_merge_flag, indicates whether the current region (i.e., FU in this example) is merged with another region, where a value 1 denotes that the current region is merged and a value 0 denotes that the current FU is not merged with the left or the upper region. The filter set index of the current region, alf_fu_filter_set_idx, is incorporated when the current region is not merged. The first region of the picture always has the filter set index set to 0 and does not need to send the index.
The syntax element alf merge_up_flag indicates whether the region is merged
with the region on
the top side, where a value 0 denotes that the current region is merged with
the left FU; a value 1
denotes that the current region is merged with the upper region.
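A decoder-side reading of this syntax can be sketched as follows. The function and flag names mirror the syntax elements above; read_flag and read_idx are hypothetical stand-ins for the entropy decoder, and FUs on the first row or column that lack an upper or left neighbor are assumed to be constrained by the real syntax rather than special-cased here:

```python
def parse_fu_filter_index(r, c, filter_idx, read_flag, read_idx):
    """Determine the filter set index of the FU at row r, column c.

    filter_idx maps (row, col) -> filter set index of already-parsed
    FUs; read_flag / read_idx stand in for the entropy decoder.
    """
    if r == 0 and c == 0:
        # The first region of the picture always uses filter set 0
        # and does not send an index.
        filter_idx[(r, c)] = 0
    elif read_flag('alf_merge_flag'):        # 1: merged with a neighbor
        if read_flag('alf_merge_up_flag'):   # 1: merged with upper FU
            filter_idx[(r, c)] = filter_idx[(r - 1, c)]
        else:                                # 0: merged with left FU
            filter_idx[(r, c)] = filter_idx[(r, c - 1)]
    else:                                    # 0: not merged, index is coded
        filter_idx[(r, c)] = read_idx('alf_fu_filter_set_idx')
```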
Another aspect of the present invention is related to ALF filter design. Rate-
distortion
optimization (RDO) is a widely known technique used in video encoding to
obtain good coding
efficiency. RDO is usually applied during mode decision of macroblocks or sub-
macroblocks or
coding units, Intra prediction, motion estimation, adaptive loop filter (ALF)
decision (for filter
sizes, filter shapes, on/off switch, etc.), and sample adaptive offset (SAO)
decision (for choosing
different pixel classification methods). The best RDO decision is the one that minimizes a rate-distortion cost function, J = D + λR, where D is the estimated distortion between original pixels and reconstructed (or predicted) pixels, R is the estimated rate required for sending the side information, and λ is a Lagrange multiplier.
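The RDO selection rule can be sketched as picking, among candidate modes, the one with the smallest cost J = D + λR. The candidate list and λ value below are purely illustrative:

```python
def best_rdo_mode(candidates, lam):
    """candidates: iterable of (mode, distortion, rate) tuples.
    Returns the mode minimizing J = D + lambda * R."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]

# Hypothetical ALF candidates (mode, D, R): at lambda = 1 the 7x7
# filter wins because 100 + 35 = 135 is the smallest cost.
modes = [('5x5_filter', 120.0, 20), ('7x7_filter', 100.0, 35),
         ('alf_off', 200.0, 1)]
print(best_rdo_mode(modes, lam=1.0))  # 7x7_filter
```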
Rate-distortion-complexity optimization (RDCO) is also a well known technique
widely used
in video encoding to obtain a good trade-off between coding efficiency and
coding complexity.
RDCO is usually applied during mode decision of macroblocks, sub-macroblocks
or coding units,
Intra prediction, and motion estimation. The best RDCO decision is the one that minimizes a rate-distortion-complexity cost function, J = D + λ1R + λ2C, where D is the estimated distortion between original pixels and reconstructed (or predicted) pixels, R is the estimated rate required for sending the side information, C is the estimated complexity required for encoding or decoding (in terms of clock cycles, memory access, or other complexity measurement), and λ1 and λ2 are Lagrange multipliers.
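The RDCO rule extends the R-D cost with a complexity term. The sketch below uses made-up candidates to show how a mode that wins under pure RDO can lose once its complexity is priced in:

```python
def rdco_cost(d, r, c, lam1, lam2):
    """Rate-distortion-complexity cost J = D + lambda1*R + lambda2*C."""
    return d + lam1 * r + lam2 * c

def best_rdco_mode(candidates, lam1, lam2):
    """candidates: iterable of (mode, D, R, C) tuples."""
    return min(candidates,
               key=lambda m: rdco_cost(m[1], m[2], m[3], lam1, lam2))[0]

# With lambda2 = 0 this degenerates to RDO and the larger filter wins
# (130 < 135); with lambda2 = 0.5 its complexity makes it lose
# (170 > 147.5).
modes = [('9x9_filter', 100.0, 30, 80), ('5x5_filter', 110.0, 25, 25)]
print(best_rdco_mode(modes, 1.0, 0.0))  # 9x9_filter
print(best_rdco_mode(modes, 1.0, 0.5))  # 5x5_filter
```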
An embodiment according to the present invention incorporates RDCO for ALF/SAO design, wherein the best candidate associated with a design feature is determined according to the
RDCO. The design feature may be mode decision or region partition. The mode decision among different ALF modes can be dependent on the slice type such as I/B/P-slices or
the percentage of
Inter/Intra coding units (CUs). Another embodiment according to the present
invention
incorporates RDCO for ALF filter selection. When the RDCO is used for ALF
filter selection, the
complexity of the RDCO can be associated with the number of filter
coefficients. Alternatively, the
RDCO can be used for ALF filter selection, where the complexity C of the RDCO
can be associated
with (A) the number of filter coefficients and (B) the number of pixels to be
filtered, such as C =
A*B. In yet another embodiment according to the present invention, the RDCO
can be used for
ALF filter selection, where the complexity C of the RDCO can be associated
with (A) the number
of filter coefficients, (B) the number of pixels to be filtered, and the
number of required operations
in one mode (TableOfNumOperations[Mode]), such as C = A*B*TableOfNumOperations[Mode].
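The two complexity estimates mentioned above can be written out directly. The per-mode operation counts in the table below are made-up placeholders; the real table would depend on the implementation being modeled:

```python
# Hypothetical per-mode operation counts (placeholder values).
TableOfNumOperations = {'5x5_diamond': 7, '7x7_diamond': 13}

def complexity_ab(num_coeffs, num_pixels):
    """C = A * B: number of filter coefficients times pixels filtered."""
    return num_coeffs * num_pixels

def complexity_ab_mode(num_coeffs, num_pixels, mode):
    """C = A * B * TableOfNumOperations[Mode]."""
    return num_coeffs * num_pixels * TableOfNumOperations[mode]

print(complexity_ab(7, 100))                      # 700
print(complexity_ab_mode(7, 100, '5x5_diamond'))  # 4900
```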
In U.S. Patent Application, Serial No. 13/158,427, filed on June 12, 2011, entitled "Apparatus
and Method of Sample Adaptive Offset for Video Coding", a sample adaptive
offset (SAO) scheme
is disclosed where each pixel is classified using multiple pixel
classification types and each
classification type may classify the pixel into multiple categories. According
to the pixel
classification type and the category of the pixel, an offset value is
determined to compensate the
offset. The SAO scheme utilizes either the band offset (BO) context or the
edge offset (EO) context
to classify pixels into categories. The complexity associated with the
multiple pixel classification
types may be different. Therefore, the RDCO technique may be used for SAO decision
to select a pixel
classification type to achieve the best RDCO performance. Another aspect of
the invention
disclosed in U.S. Patent Application, Serial No. 13/158,427 addresses the region partition. The
RDCO technique can also be used for SAO decision to determine region
partition. The cost for
RDCO can be associated with the number of pixels to be processed and the
number of required
operations (TableOfNumOperations [Mode]) for the mode (i.e., the pixel
classification type being
considered).
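One of the pixel classification types the SAO decision chooses among is edge offset, where a pixel is categorized by comparison with its two neighbors along one direction and the category selects the offset to add. The sketch below follows the commonly known edge-offset classification; the category numbering and offset values are illustrative, not taken from the referenced application:

```python
def eo_category(left, cur, right):
    """Classify a pixel against its two neighbors along one EO
    direction; the category then selects the offset to apply."""
    sign = lambda d: (d > 0) - (d < 0)
    s = sign(cur - left) + sign(cur - right)
    # s = -2: local minimum, -1/1: edge pixels, 0: flat, 2: local maximum
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]

def apply_eo(pixels, offsets):
    """Add the per-category offset to each interior pixel of a 1-D row."""
    out = list(pixels)
    for i in range(1, len(pixels) - 1):
        cat = eo_category(pixels[i - 1], pixels[i], pixels[i + 1])
        out[i] += offsets.get(cat, 0)
    return out

# A local minimum (category 1 here) is lifted by its offset.
print(apply_eo([10, 5, 10], {1: 3}))  # [10, 8, 10]
```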
The adaptive loop filter scheme described above can be used in a video encoder
as well as in a
video decoder. In both the video encoder and the video decoder, the
reconstructed video data is
subject to coding artifacts. The adaptive loop filter scheme described above
can help to improve
visual quality of the reconstructed video. Embodiments of an encoding system
with region based
ALF according to the present invention as described above may be implemented
in various
hardware, software codes, or a combination of both. For example, an embodiment
of the present
invention can be a circuit integrated into a video compression chip or program
codes integrated into
video compression software to perform the processing described herein. An
embodiment of the
present invention may also be program codes to be executed on a Digital Signal
Processor (DSP) to
perform the processing described herein. The invention may also involve a
number of functions to
be performed by a computer processor, a digital signal processor, a
microprocessor, or field
programmable gate array (FPGA). These processors can be configured to perform
particular tasks
according to the invention, by executing machine-readable software code or
firmware code that
defines the particular methods embodied by the invention. The software code or
firmware codes
may be developed in different programming languages and in different formats or styles. The software code may also be compiled for different target platforms. However, different
code formats, styles
and languages of software codes and other means of configuring code to perform
the tasks in
accordance with the invention will not depart from the spirit and scope of the
invention.
The invention may be embodied in other specific forms without departing
from its
essential characteristics. The described examples are to be considered in all
respects only as
illustrative and not restrictive. The scope of the invention is, therefore,
indicated by the appended
claims rather than by the foregoing description. All changes which come within
the meaning and
range of equivalency of the claims are to be embraced within their scope.