Note: Descriptions are shown in the official language in which they were submitted.
CA 02272064 2004-11-23
-1
METHOD OF PROCESSING A VIDEO STREAM
This invention relates to a method of processing a video stream, to detect
changes,
for example, a cut in scenes.
In video terminology, a video stream consists of a number of frames that are
displayed successively to create the illusion of motion. A sequence of frames
can be
considered to form a "scene", which is considered to be a continuous action in
space and
time (i.e. with no camera breaks). A "cut" is a discontinuity between scenes.
A cut is
sharp if it can be located between two frames and gradual if it takes place
over a
sequence of frames. A keyframe is a frame that represents a whole scene. It
can either be
calculated or selected from the frames of the scene it represents.
There are many situations where it is desirable to select a cut. For example,
selecting keyframes to transmit over a network, save onto a hard disk, or use
to browse a
video can reduce bandwidth, capacity and time than considering the whole video
data.
However, video segmentation is a difficult process in view of the various
types of camera
breaks and different operations that can take place.
Video parameters include intensity, red-green-blue (RGB), hue-value-chroma
(HVC), and a motion vector. A traditional approach for detecting a cut is to
compare one
or more of these parameters, such as intensity, of the corresponding pixels in
a pair of
consecutive frames. If the number of pixels whose intensity values have
changed from
one frame to the next exceeds a certain threshold, a cut is presumed. However,
such an
approach results in low detection rates and result in the detection of false
cuts or missing
real cuts. False cuts may result from camera operations, object movements or
flashes
within a video clip, while missed cuts may result from gradual scene changes.
EP 696 016A describes a cut detection method wherein a scene changing ratio is
computed taking into account the frame difference between temporally spaced
images as
well as temporally successive images. EP 660327describes a method for
detecting abrupt
and gradual scene changes wherein matching is performed between a current
frame and a
D~' previous frame. Neither of these patents satisfactorily solves the
problems outlined
above.
An object of the invention is to alleviate the afore-mentioned problems.
CA 02272064 2004-11-23
-2-
According to the present invention there is provided a method of processing a
video stream, comprising the steps of selecting first pairs of frames in the
video stream
with a predetermined temporal spacing; selecting second pairs of frames in the
video
stream, said second pairs of frames having a longer temporal spacing than said
first pairs
S of frames; for each of said first and second pairs of frames, determining a
difference value
representing the degree of change between the first and second frames of the
pair and
generating a particular logic level depending on whether this difference value
exceeds a
predetermined threshold; characterized in that the change in interframe
difference value
for successive pairs of frames is determined for each of said first and second
pairs of
frames and compared with a threshold to generate additional logic levels
dependent on the
change in interfi~ame difference values for said successive frame pairs; and
the generated
logic levels are compared with a decision map to identify cuts in the video
stream.
The degree of change may be represented by the number of pixels for which a
particular value, such as intensity, has changed. Alternatively, the
difference value may be
arrived at by, for example, taking the root mean square of the differences in
pixel values.
In this case, the difference in intensity value of each corresponding pair of
pixels is
determined, the results squared, and the square root taken of the sum. This
rms value can
then be compared to a threshold. A value other than intensity, for example
hue; can be
chosen for the value.
By this method, gradual cuts between scenes can be more accurately detected
and
the occurrence of false detections can be reduced.
In a preferred embodiment, the change in difference value between each of the
first and second pairs of fi~ames and the corresponding previous pairs is
determined, and
additional logic levels are generated that depend on whether the change in
difference
values exceeds a predetermined threshold. The additional logic levels are also
compared
with the decision map to assist in identifying the cuts. This additional step
enhances the
detection process.
The invention also provides video processing apparatus comprising means for
selecting first pairs of frames in the video stream with a predetermined
temporal spacing;
means for selecting second pairs of frames in the video stream, said second
pairs of
frames having a longer temporal spacing than said first pairs of frames; means
for
CA 02272064 2004-11-23
-3-
determining, for each of said first and second pairs of frames, a difference
value
representing the degree of change between the first and second frames of the
pair and
generating a particular logic level depending on whether this difference value
exceeds a
predetermined threshold; characterized in that it further comprises means for
computing
the change in interframe difference value for successive pairs of frames for
each of said
first and second pairs of frames and comparing said change with a threshold to
generate
additional logic levels dependent on the change in interframe difference
values for said
successive frame pairs; and means for comparing the generated logic levels
with a
decision map to identify cuts in the video stream.
The invention will now be described in more detail, by way of example, only
with
reference to the accompanying drawings, in which:-
Figure 1 is a block diagram of an apparatus for detecting cuts in a video
stream;
Figure 2 illustrates the main processing routine;
Figure 3 illustrates the detection processing routine;
Figure 4 illustrates the compare frames processing routine;
Figure S illustrates the change detection routine;
Figure 6 illustrates the set tag routine; and
.Figure 7 shows typical threshold values.
The system illustrated is typically implemented on a Pentium 133 MHz computer.
A digital video stream, for example, from a digital video camera or an analog
feed passed
through an analog-to-digital converter (not shown), is split and passed to
short and long
period units l, 2. The short period comparison unit identifies a pair of
fi~mes in a stream,
for example, the fourth and fifth frames, and determines the number of pixels
whose
values have changed. This number is then compared with a threshold and
allocated a logic
level 1 if it exceeds the threshold and otherwise a logic level 0. The pixel
values can be
any suitable value, but typically the intensity is used.
The long period comparison unit 2 carnes out the same operation, except on
pairs
of frames that are temporally further apart, for example, first and eighth
frames in the
CA 02272064 2004-11-23
-4-
video stream. It generates a logical 1 if the number of pixels whose intensity
values have
changed exceeds a predetermined threshold. Otherwise it generates a logical 0.
The video stream is then passed to the short period change detection unit 3
and the
long period change detection unit 4. The short period change detection unit 3
compares
the current interframe difference value, derived in unit l, namely the number
of pixels
whose intensity values have changed between each pair of pixels, with the
previous pair,
or the average of all the previous pairs, of interframe difference values to
derive the
change. If the change in interframe difference values exceeds a predetermined
threshold, a
logical 1 is generated, otherwise a logical 0 is generated.
The long period change detection unit 4 does the same as the short period
change
detection unit, except with frames separated by a longer period, the first and
eighth frames
in this example. The threshold for the long period change detection unit is
typically higher
than for the short period detection unit.
A decision map shown below is stored in unit 5.
Short- Short- Long- Long-Period Type of Cut?
Change Period Change Cut Change
Cut
0 X X 0 - No
X X 0 1 Action No
X 0 1 1 Gradual Yes
0 1 1 1 Action No
1 X X 0 Flashes No
1 1 1 1 Shar Yes
This contains a table of all possible values of the logic outputs of units 1
to 4,
logic level 0 representing a comparison below the threshold, logic 1 being a
comparison
above the threshold and X, where X means "don't care". i.e. 0 or 1. For
example, in the
short change column, a 1 means that unit 3 detects a change in difference
values between
successive pairs of frames above a threshold, 0 means any change was below the
threshold, and X means that the outcome of the short change comparison is not
relevant to
the decision.
CA 02272064 2004-11-23
-S-
The system shown in Figure 1 moves through successive frames as follows. For
example, if the system processes six frames at a time, frames l and 6 would
form the long
pair and frames 3 and 4 might form the short pair. If no cuts are detected in
this block, the
next block will be made of frames 2 to 7, with frames 2 and 7 forming the long
pair, and
frames 4 and 5 forming the short pair and so on. However, if a cut is
detected, the next
block will contain frames 6 to 11 since the block size is very small compared
to a typical
scene length and no two cuts can be detected within one block.
On looking at the table above, it will be observed that a positive result for
all four
comparisons indicates a true sharp cut, whereas a positive result in the long
change and
long period detector without a corresponding result in the short period
detector indicates a
probable gradual cut.
A frame may contain one or more windows to which the above described process
can be applied. The use of windows allows the system to focus on specific
areas of a
frame instead of considering the frame as a whole. This can save considerable
processing
time.
The main processing routine is shown in Figure 2. Starting at block 10, the
routine
determines at process blockl 1 whether the frame is decompressed. If not, it
is
decompressed in process block 12. At process block.13, the frame number is
incremented.
Decision block 14 determines whether variable jump (set in the Tag routine to
be
described) is greater than 0 and the number of sectors is greater than 1. If
not, and the
FrameNo variable is greater than the Blocksize variable, block 15 calls the
call detection
routine shown in Figure 3. If the output of Decision block 14 is true, block l
6 decrements
the jump variable.
Figure 3 shows the cut detection routine. Starting from process block 21,
block 22
performs the short frame comparison and block 23 performs the long frame
comparison to
determine the number of pixels whose intensity values have changed for each
pair frames.
Process blocks 27, 28 determine whether the percentage change is above the
threshold for the associated window, and if so generate a logical 1, otherwise
they
generate a logical zero.
CA 02272064 2004-11-23
-6-
Process blocks 2S and 26 perform the short and long change detection
operations
following which block 29 calls the set tag routine described with reference to
Figure 4.
Process block 30 causes the routine to loop for all windows of interest
assuming the
system is operating on the basis that there is more than one window. Of
course, there
S could only be one window that represents a whole frame.
Figure 4 shows the compare frames routine in more detail. Processing blocks
40,
41 loop for all the pixels in a row and column respectively. Block 42 then
finds the
differences between the corresponding pixels in the first and second frames of
each pair.
If the decision unit 43 finds a difference greater than a threshold, the CP
variable, which
represents the percentage change of the window containing the pixels is
incremented.
Figure S shows the change detection routine identified in blocks 2S, 26 in
Figure 3
in more detail. Block 50 finds the difference between the change period CP for
the
window and the average change period CPA. If this change is greater than a
threshold, as
determined in decision unit S3, and the condition in block S4 is met, process
block SS sets
1 S the output to logical 1 to indicate a change. The change detection routine
shown in Figure
S works for both long period and short period changes.
Referring now to Figure 6, the tag routine essentially implements the decision
map
table shown in Figure 7. Starting from process block 60 called by process
block 29 in
Figure 3, the routine determines whether there has been a short period change
in block 61.
If yes, decision block 63 determines whether there has been a short period
cut, long period
cut, and long period change. If yes, the block 67 creates a cut tag. If.no,
the block 68
determines whether there has been long period cut. If yes, block 71 creates a
flash tag.
If the result of decision block 68 is true, block 69 checks the condition
!SPcut, and
LPcut and LPchange is met, where ! SPcut means that the SPcut variable is
logical 0, or in
2S other words no cut was detected in the short period. If yes, block 73
creates a cut tag. If
no, block 74 creates an action tag.
If the result of decision block 61 is no, decision block 62 determines whether
the
condition !SPcut and LPcut and Lpchange has been met. If yes, block 64 creates
a cut tag
and sets the variable jump equal to the block size. If no, block 65 determines
whether
CA 02272064 2004-11-23
_'
there has been an LP cut. If no, block 70 creates a no cut tag; if yes, block
74 creates an
action tag.
The program then moves on to a following block of frames to repeat the
process,
continually creating tags identifying cuts and indicating whether a cut has
been detected
in the processed block.
Figure 7 shows typical short-period interframe difference values expressed as
a
percentage vs. Frame number. Although Threshsold~oNCpExroD is applied on the
corresponding long-term chart, it is shown in this figure as well.
The described method has many uses. -For example, it can be applied to scene
. change detection with automatic cut detection and flagging, visual index
creation for
videos, video transformation from video to pictorial transcripts and
illustrated audio,
video sequence recognition, motion detection, motion tracking and sizing, and
bandwidth
reduction by extracting only changing information from a scene.
The described method can achieve a high and robust video cut detection rate in
. part due to the change detection routine, satisfy real-time requirements. It
can easily be
applied only to specific windows of interest within a frame in the manner
described. It can
be applied to automatic television monitoring and be situated either at the
network access
point or at the user end. It can also be integrated with any database
management system
that needs to index or store video.