Patent 3166347 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies between the text and image of the Claims and Abstract are due to differing posting times. The text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 3166347
(54) English Title: VIDEO GENERATION METHOD AND APPARATUS, AND COMPUTER SYSTEM
(54) French Title: PROCEDE ET APPAREIL DE GENERATION DE VIDEO, ET SYSTEME INFORMATIQUE
Status: Examination
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06T 15/20 (2011.01)
(72) Inventors :
  • HUANG, MINMIN (China)
  • DONG, BANGFA (China)
  • YANG, XIAN (China)
(73) Owners :
  • 10353744 CANADA LTD.
(71) Applicants :
  • 10353744 CANADA LTD. (Canada)
(74) Agent: HINTON, JAMES W.
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2020-08-28
(87) Open to Public Inspection: 2021-07-08
Examination requested: 2022-06-29
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/CN2020/111952
(87) International Publication Number: WO 2021135320
(85) National Entry: 2022-06-29

(30) Application Priority Data:
Application No. Country/Territory Date
201911396267.6 (China) 2019-12-30

Abstracts

English Abstract

Disclosed in the present application are a video generation method and apparatus, and a computer system. The method comprises: acquiring an original image; rendering the original image according to a pre-determined rendering method to obtain a key frame; rendering the key frame according to a pre-determined rendering method to obtain an intermediate frame corresponding to the key frame; and generating a video corresponding to the key frame, the video consisting of the key frame and the intermediate frame corresponding to the key frame. This realizes low cost and high efficiency for the video generating process while taking into account both scalability and content individualization.


French Abstract

L'invention concerne un procédé et un appareil de génération de vidéo, ainsi qu'un système informatique. Le procédé selon l'invention consiste : à acquérir une image d'origine ; à rendre l'image d'origine selon un procédé de rendu prédéterminé, à obtenir une trame clé ; à rendre la trame clé selon un procédé de rendu prédéterminé, à obtenir une trame intermédiaire correspondant à la trame clé ; à générer une vidéo correspondant à la trame clé, la vidéo étant constituée de la trame clé et de la trame intermédiaire correspondant à la trame clé, ce qui permet d'obtenir un faible coût et une efficacité élevée pour un processus de génération de vidéo, en tenant compte des problèmes de variabilité d'échelle et d'individualisation de contenu.

Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
What is claimed is:
1. A video generating method, characterized in that the method comprises:
receiving an initial video and a target video classification;
segmenting the initial video into video segments according to a preset video segmenting method;
inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.

2. The method according to Claim 1, characterized in that the step of segmenting the initial video into video segments according to a preset video segmenting method includes:
employing a preset shot boundary detection method to determine a shot boundary contained in the initial video; and
segmenting the initial video into video segments according to the determined shot boundary.

3. The method according to Claim 2, characterized in that the shot boundary contains an abrupt shot and a gradual shot of the initial video, and that the step of segmenting the initial video into video segments according to the determined shot boundary includes:
eliminating the abrupt shot and the gradual shot from the initial video, and obtaining a video segment collection consisting of the video segments remaining after the elimination.

4. The method according to Claim 3, characterized in that the video consists of consecutive frames, and that the step of determining the abrupt shot and the gradual shot includes:
calculating degrees of deviation of all the frames from adjacent frames;
judging, when a given degree of deviation exceeds a first preset threshold, the given frame as an abrupt frame, wherein the abrupt shot consists of consecutive abrupt frames;
judging, when the given degree of deviation is between the first preset threshold and a second preset threshold, the given frame as a latent gradual frame; and
judging, when the number of consecutive latent gradual frames exceeds a third preset threshold, the latent gradual frames as gradual frames, wherein the gradual shot consists of consecutive gradual frames.

5. The method according to any one of Claims 1 to 4, characterized in that the step of inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications includes:
sampling the video segments according to a preset sampling method, and obtaining at least two sample frames to which the video segments correspond; and
preprocessing the sample frames, inputting the preprocessed sample frames in the preset model, and obtaining confidences of the video segments corresponding to all the preset video classifications.

6. The method according to Claim 5, characterized in that the step of inputting the preprocessed sample frames in the preset model includes:
extracting spatiotemporal features contained in the preprocessed sample frames, and inputting the spatiotemporal features in the preset model.

7. The method according to any one of Claims 1 to 4, characterized in that the preset model is a previously trained MFnet 3D convolutional neural network model.

8. The method according to any one of Claims 1 to 4, characterized in that the method further comprises receiving a target duration, and that the step of determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications includes:
determining the video segment to which the target video classification corresponds according to the target duration, the target video classification, the confidence of each video segment corresponding to all preset video classifications, and a duration of the video segment.

9. A video generating device, characterized in that the device comprises:
a receiving module, for receiving an initial video and a target video classification;
a segmenting module, for segmenting the initial video into video segments according to a preset video segmenting method;
a processing module, for inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
a matching module, for determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
a joining module, for joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.

10. A computer system, characterized in that the system comprises:
one or more processor(s); and
a memory, associated with the one or more processor(s), for storing a program instruction that performs the following operations when it is read and executed by the one or more processor(s):
receiving an initial video and a target video classification;
segmenting the initial video into video segments according to a preset video segmenting method;
inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.

Description

Note: Descriptions are shown in the official language in which they were submitted.


VIDEO GENERATION METHOD AND APPARATUS, AND COMPUTER SYSTEM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of computer vision technology, and more particularly to a video generating method, and corresponding device and computer system.
Description of Related Art
[0002] With the quickening tempo of life, consumers hope to acquire relevant information about commodities more visually and directly. The traditional method of relying on a certain number of commodity pictures to present commodities can no longer satisfy the requirements of e-commerce platforms to present commodity characteristics and thereby help consumers select commodities and make decisions; instead, short commodity presentation videos showing commodity functions or actual use effects have become the mainstream of commodity promotion for large online retailers. However, the great quantities of commodity videos uploaded by users such as merchants are diversified in quality, not fixed in length, and cannot meet the marketing requirements of the platforms.
[0003] In the state of the art, the generation of commodity videos falls into two large categories, namely the traditional manual method and graphic-to-video conversion. The traditional manual method is to manually segment the shots of the uploaded source video according to scene contents, target source materials, etc., and thereafter manually screen and join the video segments that satisfy the marketing standards to obtain creative commodity marketing short videos that satisfy user requirements. This method places high technical demands on the operator, suffers from low timeliness and high subjectivity during the manual operation, and cannot satisfy the marketing requirements for videos.
[0004] The method of graphic-to-video conversion requires cutouts of the commodity presentation pictures provided by merchants; the cutouts are thereafter deployed on preset image backgrounds to form finished commodity pictures, template files such as video templates and background music are obtained from the video source material libraries existing on the platforms, and commodity videos are generated in batches from these template files. Although generation of commodity videos in great batches is thereby achieved, the styles and formats of the commodity videos are completely dependent upon the template files preconfigured in the source material libraries; the generated videos are therefore similar in style and lacking in variety, fall short of visually and directly presenting the actual status of commodities to consumers, and are rather limited in expressive capability.
SUMMARY OF THE INVENTION
[0005] In order to deal with deficiencies in the state of the art, a main objective of the present invention is to provide a video generating method to realize automatic generation of target videos according to initial videos.
[0006] In order to achieve the above objective, according to the first aspect, the present invention provides a video generating method that comprises:
[0007] receiving an initial video and a target video classification;
[0008] segmenting the initial video into video segments according to a preset video segmenting method;
[0009] inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0010] determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
[0011] joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
[0012] In some embodiments, the step of segmenting the initial video into video segments according to a preset video segmenting method includes:
[0013] employing a preset shot boundary detection method to determine a shot boundary contained in the initial video; and
[0014] segmenting the initial video into video segments according to the determined shot boundary.
[0015] In some embodiments, the shot boundary contains an abrupt shot and a gradual shot of the initial video, and the step of segmenting the initial video into video segments according to the determined shot boundary includes:
[0016] eliminating the abrupt shot and the gradual shot from the initial video, and obtaining a video segment collection consisting of the video segments remaining after the elimination.
[0017] In some embodiments, the video consists of consecutive frames, and the step of determining the abrupt shot and the gradual shot includes:
[0018] calculating degrees of deviation of all the frames from adjacent frames;
[0019] judging, when a given degree of deviation exceeds a first preset threshold, the given frame as an abrupt frame, wherein the abrupt shot consists of consecutive abrupt frames;
[0020] judging, when the given degree of deviation is between the first preset threshold and a second preset threshold, the given frame as a latent gradual frame; and
[0021] judging, when the number of consecutive latent gradual frames exceeds a third preset threshold, the latent gradual frames as gradual frames, wherein the gradual shot consists of consecutive gradual frames.
[0022] In some embodiments, the step of inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications includes:
[0023] sampling the video segments according to a preset sampling method, and obtaining at least two sample frames to which the video segments correspond; and
[0024] preprocessing the sample frames, inputting the preprocessed sample frames in the preset model, and obtaining confidences of the video segments corresponding to all the preset video classifications.
[0025] In some embodiments, the step of inputting the preprocessed sample frames in the preset model includes:
[0026] extracting spatiotemporal features contained in the preprocessed sample frames, and inputting the spatiotemporal features in the preset model.
[0027] In some embodiments, the preset model is a previously trained MFnet 3D convolutional neural network model.
[0028] In some embodiments, the method further comprises receiving a target duration, and the step of determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications includes:
[0029] determining the video segment to which the target video classification corresponds according to the target duration, the target video classification, the confidence of each video segment corresponding to all preset video classifications, and a duration of the video segment.
[0030] According to the second aspect, there is provided a video generating device that comprises:
[0031] a receiving module, for receiving an initial video and a target video classification;
[0032] a segmenting module, for segmenting the initial video into video segments according to a preset video segmenting method;
[0033] a processing module, for inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0034] a matching module, for determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
[0035] a joining module, for joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
[0036] According to the third aspect, the present application provides a computer system that comprises:
[0037] one or more processor(s); and
[0038] a memory, associated with the one or more processor(s), for storing a program instruction that performs the following operations when it is read and executed by the one or more processor(s):
[0039] receiving an initial video and a target video classification;
[0040] segmenting the initial video into video segments according to a preset video segmenting method;
[0041] inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0042] determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
[0043] joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
[0044] The present invention achieves the following advantageous effects.
[0045] The present invention discloses a video generating method. By receiving an initial video and a target video classification, segmenting the initial video into video segments according to a preset video segmenting method, inputting the video segments in a preset model, determining the confidence of each video segment corresponding to all preset video classifications, determining the video segment to which the target video classification corresponds according to the target video classification and those confidences, joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video, automatic generation of target videos that conform to requirements according to initial videos is realized, and timeliness and precision of video generation are ensured.
[0046] The present invention further proposes employing a preset shot boundary detection method to determine a shot boundary contained in the initial video, and segmenting the initial video into video segments according to the determined shot boundary. It still further proposes that the shot boundary contains an abrupt shot and a gradual shot of the initial video, and that segmenting the initial video into video segments according to the determined shot boundary includes eliminating the abrupt shot and the gradual shot from the initial video and obtaining a video segment collection consisting of the video segments remaining after the elimination, whereby the precision of video segmentation is guaranteed.
[0047] The present application discloses sampling the video segments according to a preset sampling method, obtaining at least two sample frames to which the video segments correspond, preprocessing the sample frames, inputting the preprocessed sample frames in the preset model, and obtaining confidences of the video segments corresponding to all the preset video classifications. The preset video classification to which the confidence having the maximum value corresponds is determined as the preset video classification to which the given video segment corresponds, that maximum confidence is determined as the confidence of the given video segment, and the video segment to which the target video classification corresponds, together with its corresponding confidence, is determined according to the preset video classifications and the confidences to which all of the video segments correspond, whereby the precision of the confidence calculation is guaranteed.
[0048] Not all products of the present invention are necessarily required to possess all of the aforementioned effects.
BRIEF DESCRIPTION OF THE DRAWINGS
[0049] In order to more clearly describe the technical solutions in the embodiments of the present invention, the drawings required for illustrating the embodiments will be briefly introduced below. Apparently, the drawings described below are merely directed to some embodiments of the present invention, and persons ordinarily skilled in the art can derive other drawings from these drawings without creative effort.
[0050] Fig. 1 is a view schematically illustrating the structure of the model network provided by an embodiment of the present application;
[0051] Fig. 2 is a flowchart illustrating shot segmentation provided by an embodiment of the present application;
[0052] Fig. 3 is a flowchart illustrating model training provided by an embodiment of the present application;
[0053] Fig. 4 is a flowchart illustrating the method provided by an embodiment of the present application;
[0054] Fig. 5 is a view illustrating the structure of the device provided by an embodiment of the present application; and
[0055] Fig. 6 is a view illustrating the structure of the computer system provided by an embodiment of the present application.
DETAILED DESCRIPTION OF THE INVENTION
[0056] In order to make the objectives, technical solutions and advantages of the present invention more lucid and clear, the technical solutions in the embodiments of the present invention will be clearly and comprehensively described below with reference to the accompanying drawings. Apparently, the embodiments as described are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtainable by persons ordinarily skilled in the art based on the embodiments of the present invention without creative effort shall be covered by the protection scope of the present invention.
[0057] As noted in the Description of Related Art, the two methods of generating commodity videos frequently used in the state of the art are each restricted to a certain degree. The manual editing method is high in manpower cost and low in efficiency, and cannot satisfy the practical requirements for generating commodity videos in great batches; although the video generating method based on graphic conversion achieves higher efficiency, the available video formats and styles are few and fixed, and the capability of expression is rather limited.
[0058] In order to solve the aforementioned technical problems, the present application proposes obtaining video segments by segmenting the video uploaded by a user with a preset segmenting method, employing a preset classification model to classify each video segment and obtain the confidence to which each video segment corresponds, and joining, according to a target video classification selected by the user, any video segment whose confidence in that classification satisfies a preset condition, so as to obtain the target video. Generation of a target video that conforms to requirements from the video uploaded by the user is thereby realized, and timeliness of video generation is guaranteed at the same time.
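The solution just outlined maps naturally onto a small driver routine. The following Python sketch is purely illustrative and is not taken from the application: `segment_fn`, `model_fn` and `join_fn` are assumed placeholders for the preset segmenting method, the preset classification model and the preset joining step described above.

```python
# Hypothetical end-to-end driver for the proposed pipeline.
def generate_target_video(initial_video, target_class, segment_fn, model_fn, join_fn):
    # 1. Segment the uploaded video with the preset segmenting method.
    segments = segment_fn(initial_video)
    # 2. Score every segment against all preset classifications;
    #    model_fn is assumed to return a dict: classification -> confidence.
    scored = [(seg, model_fn(seg)) for seg in segments]
    # 3. Keep segments whose top-scoring classification is the target one.
    selected = [
        (seg, conf[target_class])
        for seg, conf in scored
        if max(conf, key=conf.get) == target_class
    ]
    # 4. Join the matching segments, highest confidence first,
    #    to obtain the target video.
    selected.sort(key=lambda item: item[1], reverse=True)
    return join_fn([seg for seg, _ in selected])
```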
[0059] Embodiment 1
[0060] In order to achieve classification of the video segments obtained by segmentation, the classification model must be trained in advance; specifically, an MFnet 3D convolutional neural network model can be used as the classification model. The MFnet 3D convolutional neural network model is a lightweight deep learning model: relative to such recent deep learning models as I3D and SlowFast, the model is more refined and simplified, requires fewer floating-point operations (FLOPs), and exhibits better results on the testing dataset.
[0061] The training process includes:
[0062] 110 - importing a training dataset;
[0063] the training dataset can be generated by the following method:
[0064] 111 - obtaining a preset number of commodity videos, and creating a corresponding video folder for each video;
[0065] 112 - classifying the segments contained in each video into different categories according to the contents presented, wherein the categories include, but are not limited to, commodity subject appearance, commodity usage scene, and commodity content introduction, and performing manual editing according to the classified categories;
[0066] 113 - creating a home folder corresponding to each category under the folder to which each video corresponds, wherein the home folder marks the corresponding category, under each home folder are contained one or more sub video segment folder(s) of the video that correspond to the category, and under the sub video segment folder(s) are stored one or more image frame(s) of the corresponding video segment;
[0067] 114 - densely sampling the folder to which each video corresponds, and normalizing the sampled sample to a size of N×C×H×W, where N indicates the number of sample frames of each sub video segment folder, C indicates the RGB channels of each frame, H indicates the preset height of each frame, and W indicates the preset width of each frame; preferably, N is at least 8 (a sketch of this sampling and normalization step follows below).
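As a rough illustration of step 114, the following Python sketch densely samples a segment's frames and normalizes them into an N×C×H×W array. The 224×224 target size, the nearest-neighbour resize and the evenly spaced sampling are assumptions; the application does not specify them.

```python
import numpy as np

def normalize_clip(frames, n=8, h=224, w=224):
    """Sample n frames from a segment (list of HxWx3 uint8 arrays) and
    return an array of shape (n, 3, h, w), i.e. N x C x H x W."""
    idx = np.linspace(0, len(frames) - 1, n).astype(int)   # dense, evenly spaced
    clip = []
    for i in idx:
        f = frames[i].astype(np.float32) / 255.0           # scale to [0, 1]
        # nearest-neighbour resize to (h, w); a real pipeline would interpolate
        ys = np.linspace(0, f.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, f.shape[1] - 1, w).astype(int)
        f = f[ys][:, xs]                                   # (h, w, 3)
        clip.append(f.transpose(2, 0, 1))                  # to (C, H, W)
    return np.stack(clip)                                  # (N, C, H, W)
```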
[0068] 120 - employing the training dataset to train the MFnet 3D convolutional neural network model, and obtaining a preset model.
[0069] Fig. 1 is a view schematically illustrating the network structure of the model, including a 3DCNN for extracting the 3D convolution feature contained in each sample; the 3D convolution feature contains a spatiotemporal feature, including such movement information of objects inside the video stream as the movement tendency and background variation of commodities, etc.
[0070] 3Dpooling is a pooling layer of the model used for pooling the output from the 3DCNN; the pooling result is input to a 3D MF-Unit layer, and such different convolution operations as 1×1×1, 3×3×3 and 1×3×3 are carried out;
[0071] Global Pool is a global pooling layer used for retaining key features of the input result while reducing unnecessary parameters; and
[0072] FC layer is a fully connected layer used for outputting the confidence of each video segment corresponding to each category.
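The layer sequence described for Fig. 1 can be sketched in PyTorch as follows. This is only a plausible reading of the text, not the real MFnet: the channel widths, stage counts and internal wiring of the 3D MF-Unit are not disclosed here, so the unit below simply runs the three named convolutions (1×1×1, 3×3×3, 1×3×3) in parallel and sums them.

```python
import torch.nn as nn

class MFUnit(nn.Module):
    """Rough stand-in for the 3D MF-Unit described above."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv1 = nn.Conv3d(c_in, c_out, kernel_size=1)
        self.conv3 = nn.Conv3d(c_in, c_out, kernel_size=3, padding=1)
        self.conv13 = nn.Conv3d(c_in, c_out, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # three parallel convolutions, summed (assumed combination rule)
        return self.relu(self.conv1(x) + self.conv3(x) + self.conv13(x))

class MFNet3DSketch(nn.Module):
    """Skeleton matching the layer sequence in the text:
    3DCNN stem -> 3Dpooling -> 3D MF-Unit -> Global Pool -> FC layer."""
    def __init__(self, num_classes):
        super().__init__()
        self.stem = nn.Conv3d(3, 16, kernel_size=3, padding=1)  # 3DCNN
        self.pool = nn.MaxPool3d(kernel_size=2)                 # 3Dpooling
        self.mf = MFUnit(16, 32)                                # 3D MF-Unit
        self.gap = nn.AdaptiveAvgPool3d(1)                      # Global Pool
        self.fc = nn.Linear(32, num_classes)                    # FC layer

    def forward(self, x):
        # x: (batch, C, N, H, W); an N x C x H x W sample must be
        # permuted to C x N x H x W before batching for Conv3d.
        x = self.pool(self.stem(x))
        x = self.gap(self.mf(x))
        return self.fc(x.flatten(1))   # per-category confidence logits
```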
[0073] The model is employed to test a short-video testing set with 56 commodities, and the testing result is as shown in Table 1:

Model    Loss     Accuracy/%    Model size/MB    Inference time/ms
MFnet    0.022    95.92         29.6             330

Table 1
[0074] The model can classify samples obtained by dense sampling of a single shot. The classification accuracy reaches 95.92% on a testing result with altogether 1119 testing samples of the above video dataset, the single model occupies only 29.6 MB, and the forward inference time for the video densely sampled from a single shot is 330 ms, so the accuracy is high and the speed is fast.
[0075] After the preset model has been obtained, generation of the video can be realized according to the model. As shown in Fig. 2, the generating process includes:
[0076] Step A - receiving an initial video input by a user;
[0077] Step B - performing shot boundary detection on the initial video, segmenting the video according to the detection result, eliminating redundant segments, and obtaining video segments.
[0078] As shown in Fig. 3, the shot boundary detecting process includes the following.
[0079] Each frame of the initial video is first equally divided into a preset number of subblocks by the same preset method, and a sub-histogram of each subblock is thereafter calculated. The difference in histograms between subblocks at the same positions of adjacent frames is calculated according to the sub-histograms, where the adjacent frames of each frame are the previous frame and the next frame. When the difference exceeds a first preset threshold TH, this indicates that the corresponding subblocks of the adjacent frames differ unduly much; when the number of such strongly differing subblocks of a certain frame is higher than a second preset threshold, the frame is considered an abrupt frame, and consecutive abrupt frames constitute an abrupt shot. Any frame whose difference lies between the first preset threshold TH and a third preset threshold TL is determined to be a latent start frame; when the differences of its sequentially following frames likewise lie between TL and TH, and the duration of continuation exceeds a fourth preset threshold, these consecutive frames are determined to be gradual frames, which constitute a gradual shot. A shot from which the gradual and abrupt shots have been eliminated is considered a normal shot.
[0080] In order to guarantee the effect of the generated video, any unduly short shot whose length is less than a fifth preset threshold should also be eliminated from the normal shots, and the required video segment collection is finally obtained.
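A minimal sketch of the subblock-histogram comparison used for abrupt-frame detection might look as follows. The 4×4 block grid, the 16-bin histograms and both thresholds are placeholder values, and the gradual-shot branch is omitted for brevity.

```python
import numpy as np

def block_hist_diff(f1, f2, blocks=4, bins=16):
    """Per-subblock histogram differences between two frames (HxWx3 uint8)."""
    h, w = f1.shape[:2]
    diffs = []
    for by in range(blocks):
        for bx in range(blocks):
            ys = slice(by * h // blocks, (by + 1) * h // blocks)
            xs = slice(bx * w // blocks, (bx + 1) * w // blocks)
            h1, _ = np.histogram(f1[ys, xs], bins=bins, range=(0, 256))
            h2, _ = np.histogram(f2[ys, xs], bins=bins, range=(0, 256))
            diffs.append(np.abs(h1 - h2).sum())
    return np.array(diffs)

def abrupt_frames(frames, t_h, t_count):
    """Flag a frame as abrupt when more than t_count subblocks differ from
    the previous frame by more than T_H; consecutive flagged frames would
    then constitute an abrupt shot."""
    flags = [False]
    for prev, cur in zip(frames, frames[1:]):
        flags.append(int((block_hist_diff(prev, cur) > t_h).sum()) > t_count)
    return flags
```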
[0081] Step C - sampling the video segments, inputting the sampling result in the preset model, and obtaining the category and confidence to which each video segment corresponds.
[0082] Firstly, the video segments are randomly and densely sampled according to the temporal sequence of the video.
[0083] The random dense sampling process includes the following.
[0084] Sample points are randomly initialized on the video segments, the number of sample points being taken as seven, with emphasis put at the end of the video segments; N frames are uniformly sampled, and the sample frames are preprocessed so that they satisfy the size requirement for input to the preset model.
[0085] The preprocessed sample frames are thereafter input in the preset model, and the confidences of the video segments containing the sample frames, corresponding to all the categories, are obtained.
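A sketch of the random dense sampling step, under the assumption that "randomly initialized" means a random starting offset and that "emphasis at the end" means the final frames must always be covered; the application does not spell out the exact weighting.

```python
import numpy as np

def sample_segment(frames, n=8, rng=None):
    """Randomly initialize a sample point early in the segment, then take
    n frames uniformly up to and including the segment's last frame."""
    rng = rng or np.random.default_rng()
    start = int(rng.integers(0, max(1, len(frames) // 4)))    # random init point
    idx = np.linspace(start, len(frames) - 1, n).astype(int)  # end-inclusive
    return [frames[i] for i in idx]
```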
[0086] Step D - joining the video segments to which a target category corresponds according to the target category and a target duration selected by the user, and generating a target video.
[0087] For instance, when the user wishes to acquire an appearance presentation video of the current commodity, the video segments are sorted according to their confidences for the corresponding appearance presentation category, and the video segments that conform to the requirement are screened out.
[0088] The specific screening rule can include the following.
[0089] When the duration Tc of the video segment with the highest confidence already satisfies the requirement of the target duration, the video segment with the highest confidence is directly taken as the target video;
[0090] when the duration Tc of the video segment with the highest confidence does not satisfy the requirement of the target duration, the following n video segments Tj are sequentially selected according to the sorting of the confidence values, where j ∈ [1, n], until the following formula is satisfied:
[0091] T1 ≤ Tc + Σ(j=1..n) Tj ≤ T2, where the interval from T1 to T2 indicates the target duration range;
[0092] when the duration of the n+1 shots selected according to the above confidence scores exceeds the maximum duration T2, both ends of the longest shot are cut according to the duration of each shot, until the total duration satisfies the requirement of the target duration.
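Read this way, the screening rule is a greedy selection with a trim step. The sketch below assumes durations in seconds and per-segment confidences for the target category; it is one possible reading of the reconstructed formula T1 ≤ Tc + ΣTj ≤ T2, not the authors' code.

```python
def screen_segments(segments, t1, t2):
    """Greedy screening per the rule above. `segments` is a list of
    (duration, confidence) pairs for the target category; [t1, t2] is the
    target duration range. Returns the chosen durations, trimmed if the
    total overshoots t2."""
    ranked = sorted(segments, key=lambda s: s[1], reverse=True)
    chosen, total = [], 0.0
    for dur, _conf in ranked:
        if total >= t1:                 # lower bound already met
            break
        chosen.append(dur)              # highest-confidence segment first (Tc)
        total += dur
    if total > t2:                      # overshoot: cut the longest shot
        excess = total - t2
        longest = max(range(len(chosen)), key=lambda i: chosen[i])
        # in practice the excess would be trimmed from both ends of the shot
        chosen[longest] = max(0.0, chosen[longest] - excess)
    return chosen
```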
[0093] Step E - sequentially joining the video segments obtained in Step D according to the temporal sequence of the initial video, and obtaining the target video.
[0094] The generated target video can be stored in a video database to be reused when next required, or used for continued training of the model.
[0095] Based on the foregoing solution provided by the present application, generation of a target video that conforms to the requirement from the video uploaded by the user is realized, and timeliness of video generation is ensured at the same time.
[0096] Embodiment 2
[0097] Corresponding to the foregoing embodiment, the present application provides a video generating method; as shown in Fig. 4, the method comprises:
[0098] 410 - receiving an initial video and a target video classification;
[0099] 420 - segmenting the initial video into video segments according to a preset video segmenting method;
[0100] preferably, the method comprises:
[0101] 421 - employing a preset shot boundary detection method to determine a shot boundary contained in the initial video; and
[0102] segmenting the initial video into video segments according to the determined shot boundary;
[0103] preferably, the shot boundary contains an abrupt shot and a gradual shot of the initial video, and the method comprises:
[0104] 422 - eliminating the abrupt shot and the gradual shot from the initial video, and obtaining a video segment collection consisting of the video segments remaining after the elimination;
[0105] preferably, the video consists of consecutive frames, and the step of determining the abrupt shot and the gradual shot includes:
[0106] 423 - calculating degrees of deviation of all the frames from adjacent frames;
[0107] judging, when a given degree of deviation exceeds a first preset threshold, the given frame as an abrupt frame, wherein the abrupt shot consists of consecutive abrupt frames;
[0108] judging, when the given degree of deviation is between the first preset threshold and a second preset threshold, the given frame as a latent gradual frame; and
[0109] judging, when the number of consecutive latent gradual frames exceeds a third preset threshold, the latent gradual frames as gradual frames, wherein the gradual shot consists of consecutive gradual frames;
[0110] 430 - inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0111] preferably, the method comprises:
[0112] 431 - sampling the video segments according to a preset sampling method, and obtaining at least two sample frames to which the video segments correspond; and
[0113] preprocessing the sample frames, inputting the preprocessed sample frames in the preset model, and obtaining confidences of the video segments corresponding to all the preset video classifications;
[0114] preferably, the obtained sample frames are at least eight frames;
[0115] preferably, the step of inputting the preprocessed sample frames in the preset model includes:
[0116] 432 - extracting spatiotemporal features contained in the preprocessed sample frames, and inputting the spatiotemporal features in the preset model;
[0117] preferably, the preset model is a previously trained MFnet 3D convolutional neural network model;
[0118] 440 - determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications;
[0119] preferably, the method further comprises receiving a target duration, and the step of determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications includes:
[0120] 441 - determining the video segment to which the target video classification corresponds according to the target duration, the target video classification, the confidence of each video segment corresponding to all preset video classifications, and a duration of the video segment; and
[0121] 450 - joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
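The three-threshold judgment of step 423 above can be condensed into a small helper. This sketch assumes precomputed per-frame deviation values, and that the first threshold is larger than the second, as implied by "between the first preset threshold and a second preset threshold"; all threshold values are placeholders.

```python
def classify_frames(deviations, t1, t2, t3):
    """Label frames per step 423: deviation > t1 -> abrupt; between t2 and
    t1 -> latent gradual; a run of more than t3 consecutive latent frames
    is promoted to gradual (t3 is a count, per the third threshold)."""
    labels = []
    for d in deviations:
        if d > t1:
            labels.append("abrupt")
        elif d > t2:
            labels.append("latent")
        else:
            labels.append("normal")
    # promote latent runs longer than t3 frames to gradual
    i = 0
    while i < len(labels):
        if labels[i] == "latent":
            j = i
            while j < len(labels) and labels[j] == "latent":
                j += 1
            if j - i > t3:
                labels[i:j] = ["gradual"] * (j - i)
            i = j
        else:
            i += 1
    return labels
```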
[0122] Embodiment 3
[0123] Corresponding to the foregoing method embodiment, the present application provides a video generating device; as shown in Fig. 5, the device comprises:
[0124] a receiving module 510, for receiving an initial video and a target video classification;
[0125] a segmenting module 520, for segmenting the initial video into video segments according to a preset video segmenting method;
[0126] a processing module 530, for inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0127] a matching module 540, for determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
[0128] a joining module 550, for joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
[0129] Preferably, the segmenting module 520 is further usable for employing a preset shot boundary detection method to determine a shot boundary contained in the initial video; and
[0130] segmenting the initial video into video segments according to the determined shot boundary.
[0131] Preferably, the shot boundary contains an abrupt shot and a gradual shot of the initial video, and the segmenting module 520 is further usable for eliminating the abrupt shot and the gradual shot from the initial video, and obtaining a video segment collection consisting of the video segments remaining after the elimination.
[0132] Preferably, the video consists of consecutive frames, and the segmenting module 520 is further usable for calculating degrees of deviation of all the frames from adjacent frames; judging, when a given degree of deviation exceeds a first preset threshold, the given frame as an abrupt frame, wherein the abrupt shot consists of consecutive abrupt frames; judging, when the given degree of deviation is between the first preset threshold and a second preset threshold, the given frame as a latent gradual frame; and judging, when the number of consecutive latent gradual frames exceeds a third preset threshold, the latent gradual frames as gradual frames, wherein the gradual shot consists of consecutive gradual frames.
[0133] Preferably, the processing module 530 is further usable for sampling the video segments according to a preset sampling method, and obtaining at least two sample frames to which the video segments correspond; and for preprocessing the sample frames, inputting the preprocessed sample frames in the preset model, and obtaining confidences of the video segments corresponding to all the preset video classifications.
[0134] Preferably, the processing module 530 is further usable for extracting spatiotemporal features contained in the preprocessed sample frames, and inputting the spatiotemporal features in the preset model.
[0135] Preferably, the preset model is a previously trained MFnet 3D convolutional neural network model.
[0136] Preferably, the receiving module 510 is further usable for receiving a target duration, and the matching module 540 is further usable for determining the video segment to which the target video classification corresponds according to the target duration, the target video classification, the confidence of each video segment corresponding to all preset video classifications, and a duration of the video segment.
[0137] Embodiment 4
[0138] Corresponding to the foregoing method and device, Embodiment 4 of the present application provides a computer system that comprises: one or more processor(s); and a memory, associated with the one or more processor(s), for storing a program instruction that performs the following operations when it is read and executed by the one or more processor(s):
[0139] receiving an initial video and a target video classification;
[0140] segmenting the initial video into video segments according to a preset video segmenting method;
[0141] inputting the video segments in a preset model, and determining confidence of each video segment corresponding to all preset video classifications;
[0142] determining the video segment to which the target video classification corresponds according to the target video classification and the confidence of each video segment corresponding to all preset video classifications; and
[0143] joining the video segment to which the target video classification corresponds according to a preset joining parameter, and obtaining a target video.
[0144] Fig. 6 exemplarily illustrates the framework of the computer system, which can specifically include a processor 1510, a video display adapter 1511, a magnetic disk driver 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 can be communicably connected with one another via a communication bus 1530.
[0145] The processor 1510 can be embodied as a general CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuit(s) for executing relevant program(s) to realize the technical solutions provided by the present application.
[0146] The memory 1520 can be embodied in such a form as a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, or a dynamic storage device. The memory 1520 can store an operating system 1521 for controlling the running of a computer system 1500, and a basic input/output system (BIOS) for controlling lower-level operations of the computer system 1500. In addition, the memory 1520 can also store a web browser 1523, a data storage administration system 1524, and an icon font processing system 1525, etc. The icon font processing system 1525 can be an application program that specifically realizes the aforementioned step operations in the embodiments of the present application. In summary, when the technical solutions provided by the present application are realized via software or firmware, the relevant program codes are stored in the memory 1520, and invoked and executed by the processor 1510.
[0147] The input/output interface 1513 is employed to connect with an input/output module to realize input and output of information. The input/output module can be equipped in the device as a component part (not shown in the drawings), and can also be externally connected with the device to provide corresponding functions. The input means can include a keyboard, a mouse, a touch screen, a microphone, and various sensors, etc., and the output means can include a display screen, a loudspeaker, a vibrator, an indicator light, etc.
[0148] The network interface 1514 is employed to connect to a communication module (not shown in the drawings) to realize intercommunication between the current device and other devices. The communication module can realize communication in a wired mode (via USB or network cable, for example) or in a wireless mode (via mobile network, WiFi, Bluetooth, etc.).
[0149] The bus 1530 includes a passageway transmitting information between the various component parts of the device (such as the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
[0150] Additionally, the computer system 1500 may further obtain information on specific collection conditions from a virtual resource object collection condition information database 1541 for judgment on conditions, and so on.
[0151] As should be noted, although merely the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, the memory 1520, and the bus 1530 are illustrated for the aforementioned device, the device may further include other component parts prerequisite to normal running during specific implementation. In addition, as can be understood by persons skilled in the art, the aforementioned device may as well include only the component parts necessary for realizing the solutions of the present application, without including all the component parts as illustrated.
[0152] As can be known from the description of the aforementioned embodiments, it is clear to persons skilled in the art that the present application can be realized through software plus a general hardware platform. Based on such understanding, the technical solutions of the present application, or the contributions made thereby over the state of the art, can be essentially embodied in the form of a software product; such a computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes plural instructions enabling computer equipment (such as a personal computer, a cloud server, or a network device) to execute the methods described in the various embodiments, or in some sections of the embodiments, of the present application.
[0153] The various embodiments are progressively described in the Description; identical or similar sections among the various embodiments can be inferred from one another, and each embodiment stresses what is different from the other embodiments. Particularly, with respect to the system embodiment, since it is essentially similar to the method embodiment, its description is relatively simple, and the relevant sections can be inferred from the corresponding sections of the method embodiment. The system embodiment as described above is merely exemplary in nature: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is to say, they can be located at a single site, or distributed over a plurality of network units. Partial modules or all modules can be selected according to practical requirements to realize the objectives of the embodied solutions, which is understandable and implementable by persons ordinarily skilled in the art without creative effort.
[0154] What is described above is merely directed to preferred embodiments of the present invention, and is not meant to restrict the present invention. Any modification, equivalent substitution, or improvement made within the spirit and scope of the present invention shall be covered by the protection scope of the present invention.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

2024-08-01: As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Event History, Maintenance Fee and Payment History, should be consulted.

Event History

Description Date
Amendment Received - Voluntary Amendment 2024-06-17
Amendment Received - Response to Examiner's Requisition 2024-06-17
Examiner's Report 2024-06-13
Inactive: Report - No QC 2024-06-12
Amendment Received - Voluntary Amendment 2023-12-04
Amendment Received - Response to Examiner's Requisition 2023-12-04
Examiner's Report 2023-08-04
Inactive: Report - No QC 2023-07-11
Letter sent 2022-07-29
Letter Sent 2022-07-28
Application Received - PCT 2022-07-28
Inactive: First IPC assigned 2022-07-28
Inactive: IPC assigned 2022-07-28
Request for Priority Received 2022-07-28
Priority Claim Requirements Determined Compliant 2022-07-28
Request for Examination Requirements Determined Compliant 2022-06-29
All Requirements for Examination Determined Compliant 2022-06-29
National Entry Requirements Determined Compliant 2022-06-29
Application Published (Open to Public Inspection) 2021-07-08

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2023-12-15

Note : If the full payment has not been received on or before the date indicated, a further fee may be required which may be one of the following

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Request for examination - standard 2024-08-28 2022-06-29
Basic national fee - standard 2022-06-29 2022-06-29
MF (application, 2nd anniv.) - standard 02 2022-08-29 2022-06-29
MF (application, 3rd anniv.) - standard 03 2023-08-28 2023-06-15
MF (application, 4th anniv.) - standard 04 2024-08-28 2023-12-15
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
10353744 CANADA LTD.
Past Owners on Record
BANGFA DONG
MINMIN HUANG
XIAN YANG
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Claims 2024-06-17 20 1,300
Claims 2023-12-04 20 1,305
Drawings 2022-06-29 5 281
Claims 2022-06-29 4 143
Abstract 2022-06-29 1 25
Description 2022-06-29 20 904
Representative drawing 2022-06-29 1 56
Cover Page 2022-11-08 1 52
Amendment / response to report 2024-06-17 45 2,069
Examiner requisition 2024-06-13 5 263
Courtesy - Letter Acknowledging PCT National Phase Entry 2022-07-29 1 591
Courtesy - Acknowledgement of Request for Examination 2022-07-28 1 423
Examiner requisition 2023-08-04 3 139
Amendment / response to report 2023-12-04 51 3,662
International Preliminary Report on Patentability 2022-06-29 11 422
National entry request 2022-06-29 7 209
International search report 2022-06-29 4 180
Amendment - Abstract 2022-06-29 2 89