Patent 3057924 Summary

(12) Patent Application: (11) CA 3057924
(54) English Title: SYSTEM AND METHOD TO OPTIMIZE THE SIZE OF A VIDEO RECORDING OR VIDEO TRANSMISSION BY IDENTIFYING AND RECORDING A REGION OF INTEREST IN A HIGHER DEFINITION THAN THE REST OF THE IMAGE THAT IS SAVED OR TRANSMITTED IN A LOWER DEFINITION FORMAT
(54) French Title: SYSTEME ET METHODE D'OPTIMISATION DE LA TAILLE D'UN ENREGISTREMENT VIDEO OU D'UNE TRANSMISSION VIDEO EN CERNANT ET ENREGISTRANT UNE REGION D'INTERET DANS UNE DEFINITION SUPERIEURE AU RESTE DE L'IMAGE SAUVEGARDEE OU TRANSMISE DANS UNE DEFINITION INFERIEURE
Status: Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication
Bibliographic Data
(51) International Patent Classification (IPC):
  • H04N 19/167 (2014.01)
  • G06T 9/00 (2006.01)
  • H04N 19/162 (2014.01)
  • H04N 19/17 (2014.01)
  • H4N 21/4728 (2011.01)
(72) Inventors:
  • UNKNOWN, (Country Unknown)
(73) Owners:
  • ALFONSO F. DE LA FUENTE SANCHEZ
  • DANY A. CABRERA VARGAS
(71) Applicants:
  • ALFONSO F. DE LA FUENTE SANCHEZ (Canada)
  • DANY A. CABRERA VARGAS (Canada)
(74) Agent:
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2019-10-08
(41) Open to Public Inspection: 2021-04-08
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract


A method to identify points of interest in the video at the origin, the video capture device, and record them in higher quality than the rest of the image, thus making the video file smaller for saving or transmission over the internet.


Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
What is claimed is:
1- A system and method to optimize the size of a video recording or video transmission by identifying and recording in higher image resolution the region of interest, from the rest of the image, which is saved or transmitted in a lower resolution format, comprising:
receiving, in a first device, a request from a user to locate in an original master image an object or region of interest,
identifying what is the region or object of interest to extract from an original master image,
locating in the original master image, using computer vision trained with machine learning, matches to the region or object of interest,
masking in a polygon an area surrounding the located region or object of interest, wherein the image within the polygon becomes a first slave image, wherein the first slave image maintains the same image resolution as the original master image,
identifying the region of interest and viewport information of the first slave image in reference to the master image, saving the first slave image's viewport information into the viewport information file,
saving the processed master image in a smaller resolution format than the first slave image,
combining in a container the master image, the viewport information file and the first slave image,
saving the container in the device.
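The sequence of steps in claim 1 can be sketched in code. The snippet below is an illustrative sketch only, not the patented implementation: it uses a rectangular region in place of the claimed polygon mask, plain subsampling in place of a real lower-resolution encoding, and an invented container layout (`master`, `slaves`, `viewport_info`).

```python
import json
import numpy as np

def pack_roi_container(master: np.ndarray, roi_box: tuple, scale: int = 4) -> dict:
    """Sketch of claim 1: crop the region of interest (the "slave image") at
    full resolution, downscale the master image, and pack both together with
    viewport information. roi_box is (x, y, w, h) in master-image
    coordinates; this box format is an assumption for illustration."""
    x, y, w, h = roi_box
    # The slave image keeps the master's original resolution.
    slave = master[y:y + h, x:x + w].copy()
    # Downscale the master by simple subsampling (a real system would use a
    # proper resampling filter or a lower-bitrate codec).
    small_master = master[::scale, ::scale].copy()
    # Viewport information locates the slave within the master image.
    viewport = {"x": x, "y": y, "w": w, "h": h, "master_scale": scale}
    return {
        "master": small_master,
        "slaves": [slave],
        "viewport_info": json.dumps([viewport]),
    }
```

In this sketch the saved size shrinks because only the region of interest keeps full resolution, while the bulk of the frame is stored at 1/scale² of its original pixel count.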
2- The system and method of claim 1, wherein the original master image comprises a second match to the object of interest, the system and method further comprising:
masking in a polygon an area surrounding the second object of interest, wherein the image within the polygon becomes a second slave image, wherein the second slave image maintains the same image resolution as the original master image,
identifying the region of interest and viewport information of the second slave image in reference to the master image,
saving the second image's viewport information into the viewport information file,
saving the processed master image in a smaller resolution format than the first and second slave image,
combining in a container the master image, the viewport information file and the first and second slave image,
saving the container in the device.
3- The system and method of claim 1, further comprising:
transmitting the container to a second device,
opening the container in the second device,
performing a computer vision recognition process to the slave images,
using the data from the viewport information file to identify the slave images' location in reference to the master image,
playing back the master image with the slave images superimposed.
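The playback side of claim 3 can be sketched the same way. The function below assumes the hypothetical container layout from the sketch above (a downscaled master, full-resolution slave crops, and a JSON viewport-information file); the nearest-neighbour upscale stands in for a real decoder.

```python
import json
import numpy as np

def play_back(container: dict) -> np.ndarray:
    """Sketch of claim 3 playback: upscale the low-resolution master and
    superimpose each full-resolution slave image at the location given by
    its viewport information. The container format is assumed, not taken
    from the patent."""
    info = json.loads(container["viewport_info"])
    scale = info[0]["master_scale"]
    small = container["master"]
    # Nearest-neighbour upscale of the master back to its original size.
    full = np.repeat(np.repeat(small, scale, axis=0), scale, axis=1)
    # Paste each slave crop over the upscaled master at its viewport.
    for vp, slave in zip(info, container["slaves"]):
        full[vp["y"]:vp["y"] + vp["h"], vp["x"]:vp["x"] + vp["w"]] = slave
    return full
```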
4- The system and method of claim 1, wherein the device is one or more from the group of a server, a video surveillance system, a smart gadget, a computer, a smartphone, a tablet, a digital camera.
5- The system and method of claim 1, wherein the object of interest is the face of a person.

6- The system and method of claim 3, wherein the computer vision recognition process is one from the group of face recognition, text recognition.
7- The system and method of claim 1, wherein the original master image is one from a camera, wherein the camera is one from the group of security cameras, smart gadget, digital camera.
8- A non-transitory computer readable medium comprising instructions which, when executed by a processor, perform a method, the method to optimize the size of a video recording or video transmission by identifying and recording in higher image resolution the region of interest, from the rest of the image, which is saved or transmitted in a lower resolution format, comprising:
receiving, in a first device, a request from a user to locate in an original master image an object or region of interest,
identifying what is the region or object of interest to extract from an original master image,
locating in the original master image, using computer vision trained with machine learning, matches to the region or object of interest,
masking in a polygon an area surrounding the located region or object of interest, wherein the image within the polygon becomes a first slave image, wherein the first slave image maintains the same image resolution as the original master image,
identifying the viewport information of the first slave image in reference to the master image, saving the first slave image's viewport information into the viewport information file,
saving the processed master image in a smaller resolution format than the first slave image,
combining in a container the master image, the viewport information file and the first slave image,
saving the container in the device.
9- The non-transitory computer readable medium of claim 8, wherein the original master image comprises a second match to the object of interest, the method further comprising:
masking in a polygon an area surrounding the second object of interest, wherein the image within the polygon becomes a second slave image, wherein the second slave image maintains the same image resolution as the original master image,
identifying the viewport information of the second slave image in reference to the master image,
saving the second image's viewport information into the viewport information file,
saving the processed master image in a smaller resolution format than the first and second slave image,
combining in a container the master image, the viewport information file and the first and second slave image,
saving the container in the device.
10- The non-transitory computer readable medium of claim 8, further comprising:
transmitting the container to a second device,
opening the container in the second device,
performing a computer vision recognition process to the slave images,
using the data from the viewport information file to identify the slave images' location in reference to the master image,
playing back the master image with the slave images superimposed.

11- The non-transitory computer readable medium of claim 8, wherein the device is one or more from the group of a server, a video surveillance system, a smart gadget, a computer, a smartphone, a tablet, a digital camera.
12- The non-transitory computer readable medium of claim 8, wherein the object of interest is the face of a person.
13- The non-transitory computer readable medium of claim 10, wherein the computer vision recognition process is one from the group of face recognition, text recognition.
14- The non-transitory computer readable medium of claim 8, wherein the original master image is one from a camera, wherein the camera is one from the group of security cameras, smart gadget, digital camera.

Description

Note: Descriptions are shown in the official language in which they were submitted.


TITLE
SYSTEM AND METHOD TO OPTIMIZE THE SIZE OF A VIDEO RECORDING OR
VIDEO TRANSMISSION BY IDENTIFYING AND RECORDING A REGION OF
INTEREST IN A HIGHER DEFINITION THAN THE REST OF THE IMAGE THAT IS
SAVED OR TRANSMITTED IN A LOWER DEFINITION FORMAT
INVENTORS: DE LA FUENTE SANCHEZ, Alfonso Fabian
CABRERA VARGAS, Dany Alejandro
BACKGROUND
Security camera recordings, especially those that record or capture video 24/7 and save the information in the cloud, often record at low frames per second (FPS) or low resolution to save bandwidth or memory space in the storage device. Points of interest such as faces, vehicle tags, and others are as blurred as non-interest elements such as streets, trees, and people's bodies.
Discussion of Prior Art
A patent search showed several patented inventions related to the art. The concept of video recording focused on a region of interest is already known and is described in several granted patents and published patent applications.
For instance, one patent application presented a method for playing back video files that optimizes display viewing while minimizing file size. One of a plurality of video files representing the same video production is automatically selected for viewing based on multiple criteria, such as network bandwidth, the type of video players available to display the video file, the format of the video file and the platform used to display the file. The width and height of the image displayed from the selected video file are adjusted to match the resolution of a display screen, or a user-specified image size (US20050097615A1).
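The selection idea described above can be illustrated with a toy selector. The field names (`format`, `bitrate_kbps`) are assumptions for illustration, not taken from the patent.

```python
def select_video_file(files, bandwidth_kbps, supported_formats):
    """Illustrative sketch of criteria-based variant selection: among the
    variants in a supported format that fit the measured bandwidth, pick
    the highest-bitrate one. Returns None when nothing fits."""
    candidates = [f for f in files
                  if f["format"] in supported_formats
                  and f["bitrate_kbps"] <= bandwidth_kbps]
    if not candidates:
        return None
    # Highest bitrate that still fits gives the best quality for the link.
    return max(candidates, key=lambda f: f["bitrate_kbps"])
```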
Further, another invention provides a scalable video coding method in which a region of interest has transmission priority. In the method, the region of interest is either treated with a default setting or optionally chosen by a user at a receiving terminal during coding. The bit stream of an enhancement layer is treated differently according to whether an instruction about the region of interest (ROI) is received. If no ROI instruction is received, the encoded bit stream is packed according to a preset order and the obtained channel bandwidth; if an ROI instruction is received, a coding and recoding module recodes according to the position and size of the region of interest, which is given transmission priority so that the region of interest appears clearer to the user at the receiving terminal, improving the subjective quality of the video (CN101262604A).
Furthermore, a video-optimized media streamer with cache management was presented in another invention. Its data storage system includes a mass storage unit storing a data entity, such as a digital representation of a video presentation, that is partitioned into a plurality N of temporally-ordered segments. A data buffer is bidirectionally coupled to the mass storage unit for storing up to M of the temporally-ordered segments, wherein M is less than N. The data buffer has an output for outputting stored ones of the temporally-ordered segments. The data storage system further includes a data buffer manager for scheduling transfers of individual ones of the temporally-ordered segments between the mass storage unit and the data buffer (US5586264A).
Another patent described a display viewing system and methods for optimizing the display view based on active tracking. An apparatus for interfacing with a display screen is provided. The apparatus includes a frame. The frame includes (a) a pair of shutter lenses, (b) a light coupled to the frame, and (c) a circuit integrated with the frame to control the pair of shutter lenses and the light coupled to the frame. The circuit is configured to communicate with a display device to enable synchronization of the shutter lenses and the display device. The light is analyzed to determine the position of the frame relative to the display device, and the position is used to cause an adjustment in display output when viewed from the perspective of the position of the frame (US20100007582A1).
Another invention presents video streaming parameter optimization and QoS. A data streaming system comprises one or more streaming sources, one or more streaming clients, a network connecting said streaming sources and said clients, and a level-of-service selector able, for each data stream, to monitor the network, the respective streaming source and the respective streaming client, and to control streaming to the respective streaming client so as to define a level of service of the data stream. For a video stream the level of service may define the frame rate, the resolution, the overall quality or a level of masking used (US7587454B2).
Meanwhile, another invention presented instant replay of digital video optimized using non-MPEG frame tags: a method and device for providing instant replay in an MPEG video decoder. The method and device provide non-MPEG frame tags for correlation of the frames in the decompressed domain to the frames in the compressed domain (US6980256B2).
Another patent disclosed a method for determining a region-of-interest (ROI) for a client device on the basis of at least one HEVC-tiled (panorama) video stream, wherein the method comprises: receiving a ROI video stream for rendering a first ROI defining a first sub-region within the full image region of said HEVC-tiled video stream, the positions of HEVC tiles in said HEVC-tiled panorama video being defined by tile position information; providing ROI position information associated with at least one video frame of said ROI video stream, said ROI position information comprising at least a first position of said first ROI; identifying one or more HEVC tiles of said HEVC-tiled video stream on the basis of said ROI position information and said tile position information, preferably one or more HEVC tiles that overlap with said first ROI; and requesting video data associated with said one or more identified HEVC tiles for rendering a second ROI defining a second sub-region within the full image region of said HEVC-tiled video stream (EP3162074A1).
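For a uniform tile grid, the tile-identification step described above reduces to a rectangle-overlap computation, sketched below. The uniform grid is an assumption; real tile position information may describe arbitrary layouts.

```python
def tiles_overlapping_roi(roi, tile_w, tile_h, cols, rows):
    """Illustrative sketch of selecting the tiles that overlap an ROI:
    given an ROI rectangle (x, y, w, h) in panorama pixel coordinates and
    a cols x rows grid of uniformly sized tiles, return the (col, row)
    indices of every tile the ROI touches."""
    x, y, w, h = roi
    # First and last tile columns/rows covered by the ROI rectangle.
    c0 = max(0, x // tile_w)
    c1 = min(cols - 1, (x + w - 1) // tile_w)
    r0 = max(0, y // tile_h)
    r1 = min(rows - 1, (y + h - 1) // tile_h)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

Only the returned tiles need to be requested to render the ROI, which is the bandwidth saving the tiled approach is after.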
Another patented invention described an apparatus and method for automatic surveillance of a monitored area having one or more regions of interest using a video imaging device. The method includes receiving data defining said one or more regions of interest and one or more characteristics of an object of interest; pointing the line of sight of the imaging device at a region of interest selected from said one or more regions of interest; automatically scanning the selected region of interest to detect said one or more characteristics of the object of interest and, upon detection, issuing an alert; and, when said one or more regions of interest includes more than one region of interest, repeating the steps of pointing the line of sight of the imaging device and automatically scanning in a predetermined order of viewing of the regions of interest, and upon detection of said one or more characteristics of the object of interest issuing an alert, for each of the regions of interest (US9253453B2).
Another patent presents a method for point-of-interest attraction towards an object pixel in a digital image. First, object segmentation is performed, resulting in a contour-based or region-based representation of object pixels and background pixels of the image. Second, a vector distance transform image is computed comprising a vector displacement of each background pixel towards the nearest of said object pixels, and the nearest object pixel for a given background pixel is determined by adding the vector displacement to said background pixel. Finally, the point of interest is attracted towards the determined nearest object pixel (US20070116357A1).
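The nearest-object-pixel map at the heart of the scheme above can be approximated with a multi-source breadth-first search; this 4-connected BFS is a simplified stand-in for the patent's vector distance transform, used here only to illustrate the idea.

```python
from collections import deque

def nearest_object_map(mask):
    """For a binary segmentation mask (True = object pixel), compute for
    every pixel the coordinates of its nearest object pixel via
    multi-source BFS (4-connectivity, so distances are Manhattan-like)."""
    h, w = len(mask), len(mask[0])
    nearest = [[None] * w for _ in range(h)]
    q = deque()
    # Seed the search with every object pixel mapping to itself.
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                nearest[y][x] = (y, x)
                q.append((y, x))
    # Expand outward; each background pixel inherits its discoverer's seed.
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and nearest[ny][nx] is None:
                nearest[ny][nx] = nearest[y][x]
                q.append((ny, nx))
    return nearest
```

Attracting a point of interest then amounts to looking up its entry in the returned map.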
Video multicast optimization: a network device includes a communication interface and a processor. The communication interface may receive a multicast stream that includes a frame. The processor is coupled to the communication interface and may determine whether to send the frame unicast or multicast. The communication interface transmits the frame unicast or multicast based on the determination by the processor. The determination by the processor may be based on characteristics of the frame. If the characteristics of the frame include characteristics of a key frame such as an I-frame, the processor may determine to transmit the frame unicast. The determination may also be based on a predetermined state of client devices that are to receive the frame. If a client device is in a predetermined state such as a power-save state, the processor may determine to transmit the frame unicast to that client device. Other embodiments are also described (US20130286922A1).
System and method for optimizing multimedia compression using plural encoders: a multimedia stream is compressed in parallel by plural encoders, the compressed stream outputs of which are dynamically evaluated for merit. The best one of the compressed streams is transmitted, along with information regarding the particular compression algorithm that was used, so that the receiver's decoder can decompress the stream for presentation (US7720999B2).
Region of interest (ROI) video encoding: a method of encoding an image frame in a video encoding system. The image frame has a region of interest (ROI) and a non-region of interest (non-ROI). In the method, a quantization scale for the image frame based on rate control information is determined. ROI statistics based on the residual energy of the ROI and non-ROI are then calculated. A quantization scale for the image frame based on ROI priorities and ROI statistics is calculated. Further, quantization scales for the ROI and non-ROI based on ROI priorities are determined (US20110235706A1).
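The ROI/non-ROI quantization split described above can be illustrated with a toy weighting. The specific formula below is an invented stand-in that merely keeps the frame-level quantizer as an energy-weighted average of the two scales; the patent does not disclose this exact weighting.

```python
def roi_quant_scales(frame_qp: float, roi_priority: float,
                     roi_energy: float, non_roi_energy: float):
    """Illustrative sketch: give the ROI a finer quantizer (smaller scale)
    and the background a coarser one, shifted by an ROI priority, while
    preserving the rate-control frame quantizer as the energy-weighted
    average of the two."""
    total = roi_energy + non_roi_energy
    w = roi_energy / total if total else 0.5  # ROI share of residual energy
    # priority > 0 shifts bits toward the ROI (smaller scale = finer quant).
    qp_roi = frame_qp - roi_priority * (1 - w)
    qp_non = frame_qp + roi_priority * w
    return qp_roi, qp_non
```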
Another patent presents a method for optimizing the storage of video signals in a digital scan converter provided with an intermediate block memory between the radial memory and the image memory. In the block memory, the pixels to be displayed are grouped together in blocks, the blocks being transferred in parallel to the image memory when they are completely filled. The blocks of the block memory correspond to those of the image memory and can each be used several times during a single antenna revolution (US4740789A).
A method of optimizing storage for streaming video: a method for removing cookies and temporary Internet files, and defragmenting a hard drive, to optimize streaming video performance. A program residing on a computer automatically (and configurably) removes cookies and temporary Internet files from a hard drive, and defragments the hard drive on shutdown of the computer system (US20040059863A1).
Another invention discloses a method for optimizing video coding keyframe positions through dynamic planning, and relates to the field of audio/video coding. The method comprises the steps of assigning encoders and parameters for precoding, carrying out precoding, obtaining the coded size of each frame and the positions of all scene switching points after the precoding is finished, setting all the video frames at the scene switching points as keyframes, and calculating the optimized keyframe allocation plan of a section according to the size information of the coded frames between any two adjacent scene switching points. The method has the advantage that code rates can be saved without influencing play-dragging precision and image quality; equivalently, under the conditions of the same code rates and play-dragging precision, the method can partially improve the image quality (CN103491381A).
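The dynamic-planning step between two scene switching points can be sketched as a shortest-path style recurrence over candidate keyframe positions. The cost model below (a flat keyframe cost plus precoded inter-frame sizes, subject to a maximum GOP length) is a simplification for illustration, not the patented cost function.

```python
def plan_keyframes(frame_costs, key_cost, max_gop):
    """Choose keyframe positions for frames 0..n-1 (frame 0 is always a
    keyframe) minimising total coded size, where a keyframe costs key_cost,
    an inter frame i costs frame_costs[i], and no GOP may exceed max_gop
    frames. Classic O(n * max_gop) dynamic programming."""
    n = len(frame_costs)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i]: minimal cost of frames 0..i-1
    best[0] = 0.0
    choice = [0] * (n + 1)   # choice[i]: position of the last keyframe
    for i in range(1, n + 1):
        # Last keyframe at frame j; its GOP covers frames j..i-1.
        for j in range(max(0, i - max_gop), i):
            cost = best[j] + key_cost + sum(frame_costs[j + 1:i])
            if cost < best[i]:
                best[i], choice[i] = cost, j
    # Backtrack the chosen keyframe positions.
    keys, i = [], n
    while i > 0:
        keys.append(choice[i])
        i = choice[i]
    return sorted(keys)
```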
The next invention relates to a method and system for controlling the quality of video data by selecting the optimal quantization parameter during encoding. The system is configured to perform quantization for one or more macroblocks using a different range of step sizes; the kurtosis of the respective quantized data is then computed to determine the macroblock and the corresponding quantization step size that yields the highest kurtosis value. The quantization step size that generates the highest kurtosis value is selected for the blocks of input video data during encoding (US20030231796A1).
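The kurtosis criterion can be sketched directly: quantize with each candidate step size and keep the step whose quantized data is most leptokurtic. The rounding quantizer and plain-moment kurtosis below are simplifications for illustration; a real encoder would apply this to transform coefficients.

```python
def best_step_by_kurtosis(block, step_sizes):
    """Quantize a block of coefficients with each candidate step size and
    return the step whose quantized values have the highest kurtosis
    (fourth standardized moment). Degenerate constant outputs are skipped."""
    def kurtosis(xs):
        n = len(xs)
        m = sum(xs) / n
        var = sum((x - m) ** 2 for x in xs) / n
        if var == 0:
            return float("-inf")  # constant data: no usable kurtosis
        return sum((x - m) ** 4 for x in xs) / n / var ** 2
    best_q, best_k = None, float("-inf")
    for q in step_sizes:
        quantized = [round(x / q) for x in block]
        k = kurtosis(quantized)
        if k > best_k:
            best_q, best_k = q, k
    return best_q
```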

Optimizing data-transmission system and method: the invention provides a data transmission system (shown in its FIG. 2). The system comprises a frame analyzing system (106) for receiving frame data and generating regional data (112); the frame data is similar to a video data frame, and the regional data is similar to a uniform matrix dimension (204) for dividing the frame into a predetermined set of matrices. A pixel selection system (108) receives the regional data and generates a set of pixel data corresponding to each region, such as by selecting one of a plurality of pixels contained in each original matrix comprising the frame (CN101290679B).
Method and apparatus for video encoding optimization: there is provided an encoder and a corresponding method for encoding video signal data corresponding to a plurality of pictures. The encoder includes an overlapping-window analysis unit (310) for performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data, and for adapting encoding parameters for the video signal data based on a result of the video analysis (WO2006007285A1).
Video coding with optimized low-complexity variable-length codes: a novel group of optimized variable word-length codes for intra-coded pictures is disclosed. Hardware complexity and implementation cost are minimized by only optimizing the variable word-length codes for intra-coded pictures for a small, but frequently occurring, number of discrete cosine transform events. A table of variable word-length codes conforming to the Motion Pictures Expert Group Phase 1 ("MPEG-1") standard may be used for the remaining number of less frequently occurring events. In an illustrative example of the invention, the MPEG-1 variable word-length code table is also used to code non-intra pictures, which advantageously allows for totally compatible operation with the MPEG-1 standard (US5563593A).
In another patent, an optimized data transmission system and method: a data transmission system (see its FIG. 2) includes a frame analysis system that receives frame data, such as a video data frame, and generates area data dividing the frame into a predetermined set of uniformly sized matrices. A pixel selection system receives the area data and generates a set of pixel data for each region, for example by selecting one of a plurality of pixels contained in each original matrix comprising the frame (CN100395959C).
Video surveillance system employing video primitives: a video surveillance system extracts video primitives and extracts event occurrences from the video primitives using event discriminators. The system can undertake a response, such as an alarm, based on extracted event occurrences (CN101405779A).
In another patent, a wireless video surveillance system and method with input capture and data transmission prioritization and adjustment: a surveillance system and method with at least one wireless input capture device (ICD) and a corresponding digital input recorder (DIR) and/or another ICD, including the steps of providing the base system; at least one user accessing the DIR via a user interface either directly or remotely; the DIR and/or ICD searching for signals from the ICD(s) and establishing communication with them; and the system providing for input capture and data transmission prioritization, thereby providing a secure surveillance system having wireless communication for monitoring a target environment with prioritization capabilities (US9407877B2).
System and method for providing and transmitting condensed streaming content: a stream condense unit coupled to a streaming server and a client player is provided. The stream condense unit includes a streaming data input unit, a stream content analysis unit, a frame timestamp adjust unit, and a streaming data output unit. The streaming data input unit is configured to receive a plurality of streaming content groups sent by the streaming server. The stream content analysis unit is configured to receive the plurality of streaming content groups and execute a content analysis to get importance scores of the source streaming contents. The frame timestamp adjust unit is configured to receive the condensed stream and adjust a timestamp of each frame in the condensed stream. The streaming data output unit is configured to receive the condensed stream, attach content-identifying labels and tables to the condensed stream, and send the condensed stream to the client player for display (US8719442B2).
Networked surveillance and control system: a surveillance and control system includes a feature extraction unit to dynamically extract low-level features from a compressed digital video signal and a description encoder, coupled to the feature extraction unit, to encode the low-level features as content descriptors. An event detector is coupled to the description encoder to detect security events from the content descriptors, and a control signal processor, coupled to the event detector, generates control signals in response to detecting the security events (US6646676B1).
A system and method of alerting visually impaired users to nearby objects was also disclosed in another patent: a system and method for assisting a visually impaired user including an imaging device, a processing unit for receiving images from the imaging device and converting the images into signals for use by one or more controllers, and one or more vibro-tactile devices, wherein the one or more controllers activate one or more of the vibro-tactile devices in response to said signals received from the processing unit. The system preferably includes a lanyard to be worn around the neck of the user such that a first vibro-tactile device is arranged on the right side of the user's neck, a second vibro-tactile device on the left side of the user's neck, and a third vibro-tactile device at the back portion of the user's neck (US9801778B2).
A method of supporting region-of-interest cropping through constrained compression: region-of-interest cropping of high-resolution video is supported by video compression and extraction methods. The compression method divides each frame into virtual tiles, each containing a rectangular array of macroblocks. Inter-frame compression uses constrained motion estimation to ensure that no macroblock references data beyond the edge of a tile. Extra slice headers are included on the left side of every macroblock row in the tiles to permit access to macroblocks on the left edge of each tile during extraction. The compression method may also include breaking skipped macroblock runs into multiple smaller skipped macroblock runs. The extraction method removes slices from virtual tiles that intersect the region of interest to produce cropped frames. The cropped digital video stream and the compressed digital video stream have the same video sequence header information (US20100232504A1).
A method for optimizing a video stream, including: retrieving an original parameter set for an original bit stream in a video container received from an origin server; and writing stream-specific metadata to a header for the video container, wherein the stream-specific metadata includes the original parameter set for the original bit stream and an optimized parameter set for an optimized bit stream (EP2673958A1).
Method of displaying a region of interest in a video stream: methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain (GB2509954B).

Region of interest video coding using tiles and tile groups: a system, method and means for region of interest (ROI) video coding using tiles and tile groups are disclosed. An encoded video sequence including a plurality of tiles may be received. The plurality of tiles can be divided into one or more tile groups. Signaling may be received indicating the parameters of the at least one tile group. One of the tile groups of the one or more tile groups may be decoded and an image of the decoded tile group may be displayed. The decoded tile group may overlap with the ROI. The ROI may correspond to the displayed image, and the displayed image may be part of the encoded video sequence. A tile group that does not overlap with the ROI may not be decoded (KR101953679B1).
Enhancing a region of interest in video frames of a video stream. A method for
enhancing a region of interest in video frames of a video stream is described.
The
method includes receiving media data defining base video frames of at least
one base
stream, said base stream being associated with one or more enhancement tile
streams,
an enhancement tile stream comprising media data defining tiled video frames
comprising a tile, said tile comprising media data for enhancing visual
content in a
subregion of the image region of the base video frames; requesting media data
of one
or more enhancement tile streams, preferably one or more HEVC enhancement tile
streams, the one or more enhancement tile streams being associated with media
data
for enhancing visual content in the at least one region of interest;
generating base video
frames on the basis of media data of the at least one base stream and
generating tiled
video frames on the basis of media data of the one or more enhancement tile
streams,
the tiled video frames comprising one or more tiles forming visual content of the at
least
one region of interest; and, replacing or blending at least part of the visual
content of
said at least one region of interest in the video frames of said base stream
with at least
part of said enhanced visual content of the tiled video frames
(US20180295400A1).
Methods and systems for auto-zoom based adaptive video streaming. Automatic
adaptive zoom enables computing devices that receive video streams to use a
higher
resolution stream when the user enables zoom, so that the quality of the
output video is
preserved. In some examples, a tracking video stream and a target video stream
are
obtained and are processed. The tracking video stream has a first resolution,
and the
target video stream has a second resolution that is higher than the first
resolution. The
tracking video stream is processed to define regions of interest for frames of
the
tracking video stream. The target video stream is processed to generate zoomed-
in
regions of frames of the target video stream. A zoomed-in region of the target
video
stream corresponds to a region of interest defined using the tracking video
stream. The
zoomed-in regions of the frames of the target video stream are then provided
for display
on a client device (US10313417B2).
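The mapping from a region of interest defined on the low-resolution tracking stream to the high-resolution target stream can be sketched as a simple coordinate scaling (a minimal sketch; the function name and stream sizes are illustrative, not from the cited patent):

```python
def map_roi(roi, track_size, target_size):
    # Scale an ROI rectangle defined on the tracking stream up to the
    # coordinate space of the higher-resolution target stream.
    x, y, w, h = roi
    sx = target_size[0] / track_size[0]
    sy = target_size[1] / track_size[1]
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))

# ROI found on a 640x360 tracking stream, mapped to a 1920x1080 target:
roi_hi = map_roi((100, 50, 64, 36), (640, 360), (1920, 1080))
```

The zoomed-in region is then cropped from the target stream at these scaled coordinates, so zooming loses no resolution.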
Systems and methods for region-of-interest tone remapping. Systems and methods
are
described for providing viewers of adaptive bit rate (ABR) streaming video
with the
option to view alternative streams in which an alternative tone mapping is
applied to one
or more regions of interest. The availability of streams with alternative tone
mappings
may be identified in a media presentation description (MPD) in an MPEG-DASH
system. In some embodiments, the streaming video is divided into slices, and
alternative tone mappings are applied to regions of interest within the
slices. When a
server receives a request from a client device for alternative tone mappings
of different
regions, slices with the appropriate mapping may be assembled on demand and
delivered to the requestor as a single video stream. Tone mappings may be
used, for
example, to highlight particular players in a sporting event (WO2018009828A1).
Method and systems for displaying a portion of a video stream with partial
zoom ratios.
Systems and methods are described to enable video clients to zoom in to a
region or
object of interest without substantial loss of resolution. In an exemplary
method, a
server transmits a manifest, such as a DASH MPD, to a client device. The
manifest
identifies a plurality of sub-streams, where each sub-stream represents a
respective
spatial portion of a source video. The manifest also includes information
associating an
object of interest with a plurality of the spatial portions. To view high-
quality zoomed
video, the client requests the sub-streams that are associated with the object
of interest
and renders the requested sub-streams. In some embodiments, different sub-
streams
are available with different zoom ratios (WO2018049321A1).
A method for tracked video zooming. Systems, methods, and instrumentalities
are
disclosed for dynamic picture-in-picture (PIP) by a client. The client may
reside on any
device. The client may receive video content from a server, and identify an
object within
the video content using at least one of object recognition or metadata. The
metadata
may include information that indicates a location of an object within a frame
of the video
content. The client may receive a selection of the object by a user, and
determine
positional data of the object across frames of the video content using at
least one of
object recognition or metadata. The client may display an enlarged and time-
delayed
version of the object within a PIP window across the frames of the video
content.
Alternatively or additionally, the location of the PIP window within each
frame may be
fixed or may be based on the location of the object within each frame
(WO2019046095A1).
Systems and methods for providing high-resolution regions-of-interest. A
system and
method for controlling display of at least two video streams over a network
connection,
comprising analysing video capabilities of at least one multi-stream video
source to
determine if said at least one multi-stream video source is capable of
providing a first
video stream and a second video stream of a high quality region of interest
(HQ-ROI)
corresponding to said first video stream, accessing said first and second
video streams
from said at least one multi-stream video source for delivery over said
network
connection, receiving said first and second video streams simultaneously from
said at
least one multi-stream video source over said network connection and
simultaneously
displaying said received first and second video streams (WO2007015817A2).
Video transmission considering a region of interest in the image data. To
enable
efficient use of limited bandwidth in transmitting video, a region of interest
is determined
in each image. Before coding, the image is spatially scaled, with magnification
applied inside the region of interest. The scaled images are then compression
encoded.
Meta-data identifying the location of the region of interest accompanies the
transmitted
video so that, after decoding, the scaling can be reversed (WO2008107721A1).
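The idea of scaling with magnification inside the region of interest, plus metadata that lets the receiver reverse it, can be sketched in one dimension (a hypothetical simplification; real codecs operate on 2-D images, and the metadata format here is invented for illustration):

```python
def scale_row(row, roi, mag):
    """Repeat each pixel inside roi = (start, length) `mag` times; record
    where the magnified span lands so the scaling can be undone."""
    start, length = roi
    out, out_start = [], None
    for i, p in enumerate(row):
        if start <= i < start + length:
            if out_start is None:
                out_start = len(out)
            out.extend([p] * mag)
        else:
            out.append(p)
    meta = {"out_start": out_start, "out_len": length * mag, "mag": mag}
    return out, meta

def unscale_row(out, meta):
    # Reverse the magnification using the transmitted metadata.
    s, n, m = meta["out_start"], meta["out_len"], meta["mag"]
    return out[:s] + out[s:s + n:m] + out[s + n:]

scaled, meta = scale_row([0, 1, 2, 3, 4], roi=(1, 2), mag=3)
```

The encoder would compress `scaled` (where the ROI occupies more samples, hence more bits) and transmit `meta` alongside it.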
System and method for recording a person in a region of interest. A method and
system
for recording a player in at least one zone of a playing field are described
herein. The
system includes at least one set of cameras, each set containing one or more
cameras.
Each zone is associated with one set of cameras which is used to obtain a set
of video
data from the zone associated therewith. The player wears an emitter of
electromagnetic signals. A sensor apparatus receives and processes the
electromagnetic signals to obtain location information indicative of when the
player is
located in a particular zone. A video processing module, in data communication
with the
at least one set of cameras, utilizes the location information to extract
video information
of the player from the at least one set of video data (US20080297304A1).
System and method for high-resolution storage of images. An image-creation
method
includes capturing an image as digital data, locating an area of interest of
the captured
image, extracting, from the digital data, at least some data corresponding to
the located
area of interest, digitally magnifying the extracted at least some data to
yield digitally
magnified data, and combining the digitally magnified data with at least some
of the
digital data of the captured image to yield combined data (US9860536B2).
Method and apparatus for creating a zone of interest in a video display. A
method of
creating a zone of interest in a video scene comprising the steps of capturing
a video
scene, transmitting a captured video scene over a network, receiving a
captured video
scene from the network, enabling a user to identify a portion of a captured
video scene
to be a zone of interest, replicating the portion of the captured video scene
identified by
the user as a zone of interest, rendering the video scene in a first window,
and
rendering the replicated portion of the captured video scene in a second
window
independent of the first window (US20100245584A1).
Region of Interest (ROI) request and inquiry in a video chain. A method for
video stream
processing in a video chain is provided that includes transmitting a video
stream in the
video chain, receiving, by a first video node in the video chain, a region of
interest (ROI)
command from a second video node in the video chain, wherein the ROI command
includes an ROI type indicator, and performing, by the first video node, the
ROI
command according to the ROI type indicator (US20140079118A1).
Regions of interest in video frames. A first representation of a video stream
is received
that includes video frames, the representation expressing the video frames at
a
relatively high pixel resolution. At least one of the video frames is detected
to include a
region of interest. A second representation of the video stream that expresses
the video
frames at a relatively low pixel resolution is provided to a video playing
device. Included
with the second representation is additional information that represents at
least a
portion of the region of interest at a resolution level that is higher than
the relatively low
pixel resolution (US20070086669A1).
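The two-representation idea above can be sketched minimally, assuming plain decimation for the low-resolution version (function name and packaging format are illustrative, not from the cited patent):

```python
def package_frame(frame, roi, factor=2):
    """frame: 2-D list of pixels. Return a low-resolution version (simple
    decimation) plus the full-resolution ROI patch and its coordinates,
    as the additional information sent with the second representation."""
    x, y, w, h = roi
    low = [row[::factor] for row in frame[::factor]]
    patch = [row[x:x + w] for row in frame[y:y + h]]
    return {"low": low, "roi": roi, "patch": patch}

# A tiny 4x4 "frame" whose pixel values are just their raster index:
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
pkg = package_frame(frame, (1, 1, 2, 2), factor=2)
```

The player renders `low` everywhere and overlays `patch` at the recorded ROI coordinates.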
Method and apparatus for image frame identification and video stream
comparison. A
method and apparatus are provided for identifying image frames in a video
stream, and
for comparing one video stream against one or more other video streams. The or
each
video stream is examined to produce a respective digest stream comprising
digest
values, which may be recorded in a digest record. A candidate image frame from
a
second video stream provides a respective second digest value. A match of the
digest
values indicates matching images in the respective video streams
(US20160227275A1).
System and method for selective image capture, transmission and
reconstruction. A
video processing method and system for generating a foveated video display
with
sections having different resolutions uses a network channel for communicating
video
images having video sections of different resolutions, and includes a video
transmission
system for processing and transmitting the received video images over the
network
channel. The system assigns a larger portion of the network channel's
bandwidth to a
video section with higher resolution. Further, the system includes a video
receiving
system for receiving and seamlessly combining the first and second video
sections of
different resolutions to form an output video image on a display device, and a
control
unit for sending one or more video control parameters to the video
transmission system
to control capturing, transmitting and processing of the video images
(US20060176951A1).
System and method for exact rendering in a zooming user interface. A method
and
apparatus is disclosed that facilitates realistic navigation of visual content
by displaying
an interpolated image during navigation and a more exact image when the
navigation
ceases. Methodologies are disclosed for rendering and displaying "tiles",
portions of the
visual content at different levels of detail to minimize perceivable
discontinuities
(US20040233219A1).
Method and system for transcoding regions of interests in video surveillance.
A method
and a system for spatial scalable region of interest transcoding of JPEG2000
coded
video frames for video surveillance systems are shown. Based on a user defined
ROI
the method transcodes HD frames into images in moderate resolution with a ROI
in HD
resolution. The transcoder extracts all packets belonging to the ROI or the
lower
resolution levels of the background from the JPEG2000 bitstream. Non-ROI
packets of
higher resolution levels are replaced by empty packets. The ROI is tracked
using a
mean shift algorithm to guarantee that the correct image details are always
extracted in
high resolution. Since the transcoding is performed by extracting and
replacing packets
of the code stream, an expensive re-encoding of the code-stream is not
required. Thus,
the transcoding technique is of low complexity and shows a short processing
time.
Combining the transcoding technique with mean shift tracking leads to a
powerful video
transcoding technique (US20110051808A1).
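The packet-level transcoding described above can be sketched as follows (a hypothetical simplification: packets are modeled as tuples and "empty packets" as empty payloads; this is not the actual JPEG2000 code-stream syntax):

```python
def transcode_packets(packets, roi_ids, max_bg_level):
    """packets: list of (region_id, resolution_level, payload). Keep ROI
    packets at every level and background packets up to max_bg_level;
    replace everything else with an empty payload (no re-encoding)."""
    out = []
    for rid, level, payload in packets:
        if rid in roi_ids or level <= max_bg_level:
            out.append((rid, level, payload))
        else:
            out.append((rid, level, b""))  # "empty packet"
    return out

stream = [(1, 0, b"a"), (1, 2, b"b"), (2, 2, b"c")]
out = transcode_packets(stream, roi_ids={1}, max_bg_level=1)
```

Because only payloads are extracted or replaced, no inverse transform or re-encoding is needed, which is why the technique is cheap.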
Controlling bandwidth utilization of video transmissions for quality and
scalability. A
method of managing bandwidth associated with video transmissions over a
computer
network is disclosed. A plurality of video transmissions is received from a
plurality of
video cameras connected to a computer surveillance system via the computer
network.
Quality levels of the plurality of video transmissions are set to a first
level. A first
analysis is performed on a video transmission to identify whether a region of
interest
exists. A quality level of the video transmission is increased to a second
level with
respect to the region of interest. A second analysis is performed on the
region of
interest to identify whether an actionable event has occurred in an area
monitored by
one of the plurality of video cameras. The quality level may subsequently be
restored to
the first level to keep usage of the bandwidth efficient and scalable for a
large number of
camera nodes (US20170061214A1).
System and method for selective image capture, transmission and
reconstruction. A
video processing method and system for generating a foveated video display
with
sections having different resolutions uses a network channel for communicating
video
images having video sections of different resolutions, and includes a video
transmission
system for processing and transmitting the received video images over the
network
channel. The system assigns a larger portion of the network channel's
bandwidth to a
video section with higher resolution. Further, the system includes a video
receiving
system for receiving and seamlessly combining the first and second video
sections of
different resolutions to form an output video image on a display device, and a
control
unit for sending one or more video control parameters to the video
transmission system
to control capturing, transmitting and processing of the video images
(US7492821B2).
System and method for selective image capture, transmission and
reconstruction. A
video processing method and system for generating a foveated video display
with
sections having different resolutions uses a network channel for communicating
video
images having video sections of different resolutions, and includes a video
transmission
system for processing and transmitting the received video images over the
network
channel. The system assigns a larger portion of the network channel's
bandwidth to a
video section with higher resolution. Further, the system includes a video
receiving
system for receiving and seamlessly combining the first and second video
sections of
different resolutions to form an output video image on a display device, and a
control
unit for sending one or more video control parameters to the video
transmission system
to control capturing, transmitting and processing of the video images
(US20090167948A1).
Digital image compression with spatially varying quality levels determined by
identifying
areas of interest. Methods and systems for compression of digital images
(still or motion
sequences) are provided wherein predetermined criteria may be used to identify
a
plurality of areas of interest in the image, and each area of interest is
encoded with a
corresponding quality level (Q-factor). In particular, the predetermined
criteria may be
derived from measurements of where a viewing audience is focusing their gaze
(area of
interest). In addition, the predetermined criteria may be used to create areas
of interest
in an image in order to focus an observer's attention to that area. Portions
of the image
outside of the areas of interest are encoded at a lower quality factor and bit
rate. The
result is higher compression ratios without adversely affecting a viewer's
perception of
the overall quality of the image (US7027655B2).
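The spatially varying quality-level idea can be sketched as a per-block Q-factor lookup (block coordinates, Q values, and the area-list format are illustrative assumptions, not from the cited patent):

```python
Q_BACKGROUND = 30  # illustrative background quality factor

def quality_for_block(bx, by, areas_of_interest):
    """Return the encoding quality for block (bx, by): the highest
    Q-factor of any area of interest containing it, else background."""
    q = Q_BACKGROUND
    for (x, y, w, h), area_q in areas_of_interest:
        if x <= bx < x + w and y <= by < y + h:
            q = max(q, area_q)
    return q

areas = [((1, 1, 3, 3), 90)]  # e.g. one gaze-derived area of interest
```

An encoder would consult this map per macroblock, spending bits where viewers actually look.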
Content processing apparatus for processing high resolution content and
content
processing method thereof. A content processing apparatus is provided, which
includes
an inputter configured to receive an input of high resolution content, a data
processor
configured to generate video frames by processing the high resolution content,
and a
controller configured to control the data processor to configure an object
that
corresponds to the high resolution content as an object for low resolution and
to add the
object to the video frame if an output resolution of a display panel to
display the video
frames is a low resolution (W02014054847A1).
An apparatus for image capture with automatic and manual field of interest
processing
with a multi-resolution camera. An apparatus for capturing a video image
comprising a
means for generating a digital video image, a means for classifying the
digital video
image into one or more regions of interest and a background image, and a means
for
encoding the digital video image, wherein the encoding is selected to provide
at least
one of; enhancement of the image clarity of the one or more ROI relative to
the
background image encoding, and decreasing the video quality of the background
image
relative to the one or more ROI. A feedback loop is formed by the means for
classifying
the digital video image using a previous video image to generate a new ROI and
thus
allow for tracking of targets as they move through the imager field-of-view
(W02008057285A2).
Method and system for transcoding regions of interests in video surveillance.
A method
and a system for spatial scalable region of interest transcoding of JPEG2000
coded
video frames for video surveillance systems. Based on a user defined ROI, the
method
transcodes HD frames into images in moderate resolution with a ROI in HD
resolution.
The transcoder extracts all packets belonging to the ROI or the lower
resolution levels
of the background from the JPEG2000 bitstream. Non-ROI packets of higher
resolution
levels are replaced by empty packets. The ROI is tracked using a mean shift
algorithm
to guarantee that the correct image details are always extracted in high
resolution.
Since the transcoding is performed by extracting and replacing packets of the
codestream, an expensive re-encoding of the code-stream is not required. Thus,
the
transcoding technique is of low complexity and shows a short processing time.
Combining the transcoding technique with mean shift tracking leads to a
powerful video
transcoding technique (US8345749B2).
Processing video signals based on user focus on a particular portion of a
video display.
Devices and methods are disclosed for detecting a focus of at least one viewer
as being
directed to a particular region of a video display. A first portion of a frame
of video for
presentation in the region of the viewer's focus is processed differently than
another
portion of the frame of video related to a region that is not part of the
viewer's focus
(US9912930B2).
Surveillance video camera enhancement system. In a video camera surveillance
system, a video processor determines dense motion vector fields between
adjacent
frames of the video. From the dense motion vector fields moving objects are
detected
and objects undergoing unexpected motion are highlighted in the display of the
video.
To distinguish expected motion from unexpected motion, dense motion vector
fields are
stored representing expected motion and the vectors representing the moving
object
are compared with the stored vectors to determine whether the object motion is
expected or unexpected. In an alternative embodiment, the video surveillance
system
comprises a panning camera and the frames of the video are arranged in a
mosaic.
Object motion in the video is detected by means of dense motion vector fields
and the
predicted position of objects in the mosaic is detected based on the detected
object
motion. The position of moving objects in the current frame being detected by
the
panning camera is compared with the predicted position of the objects in the
mosaic and
if the positions are substantially different, the corresponding object is
tagged and
highlighted as undergoing unexpected motion. A system is also disclosed for
using the
dense motion vector fields to control the motion of the panning camera to
follow a
moving object (US20020054211A1).
Automatic video surveillance system and method. Apparatus and method for
automatic
surveillance of a monitored area having one or more regions of interest using
a video
imaging device are disclosed. The method includes receiving data defining said
one or
more regions of interest and one or more characteristics of an object of
interest; pointing
the line of sight of the imaging device at a region of interest selected from
said one or
more regions of interest; automatically scanning the selected region of
interest to detect
said one or more characteristics of the object of interest and upon detection
issuing an
alert; and when said one or more regions of interest includes more than one
region of
interest, repeating the steps of pointing the line of sight of the imaging
device and
automatically scanning in a predetermined order of viewing of the regions of
interest,
and upon detection of said one or more characteristics of the object of
interest issuing
an alert, for each of the regions of interest (US9253453B2).
Apparatus and method for a dynamic "region of interest" in a display system. A
method
and apparatus of displaying a magnified image comprising obtaining an image of
a
scene using a camera with greater resolution than the display, and capturing
the image
in the native resolution of the display by either grouping pixels together, or
by capturing
a smaller region of interest whose pixel resolution matches that of the
display. The
invention also relates to a method whereby the location of the captured region
of
interest may be determined by external inputs such as the location of a
person's gaze in
the displayed unmagnified image, or coordinates from a computer mouse. The
invention
further relates to a method whereby a modified image can be superimposed on an
unmodified image, in order to maintain the peripheral information or context
from which
the modified region of interest has been captured (US9618748B2).
Identifying objects tracked in images using active device. Methods are
disclosed for
augmenting the identification of objects tracked in a sequence of images using
active
devices. In one embodiment, encoded information transmitted from active
devices, such
as a PDA, is decoded from the sequence of images to provide identifying
information to
a tracked region of interest. In another embodiment, motion information
transmitted from
active devices is associated with regions of interest tracked between images
in the
sequence of images. In a further embodiment, these embodiments operate
together to
minimize possible signal interference (US7505607B2).
Region-of-interest encoding enhancements for variable-bitrate mezzanine
compression.
A specification defining allowable luma and chroma code-values is applied in a
region-of-interest encoding method of a mezzanine compression process. The
method
may include analyzing an input image to determine regions or areas within each
image
frame that contain code-values that are near allowable limits as specified by
the
specification. In addition, the region-of-interest method may comprise then
compressing
those regions with higher precision than the other regions of the image that
do not have
code-values that are close to the legal limits (US20160286237A1).
Method of identifying relevant areas in digital images, method of encoding
digital
images, and encoder system. A method of identifying relevant areas in digital
images is
provided. The method comprises receiving information representative of pixels
in a first
digital image, and calculating a spatial statistical measure of said
information for groups
of neighboring pixels in said first image to form a group value for each group
of pixels.
Further, the method comprises calculating differences between group values,
and
comparing said differences to a predetermined threshold value. If said
difference is
equal to or above said threshold value, said group is identified as relevant,
and if said
difference is below said threshold value, said group is identified as not
relevant. A
method of encoding digital images based on the identification of relevant and
non-relevant areas is also provided, as well as a digital encoder system
(EP3021583A1).
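A one-dimensional sketch of the group-value scheme, using variance as the spatial statistical measure (the patent leaves the measure open; the group size and threshold here are arbitrary):

```python
def group_values(pixels, group):
    """Split a 1-D strip of pixels into groups and compute a spatial
    statistical measure (here: variance) per group."""
    vals = []
    for i in range(0, len(pixels), group):
        g = pixels[i:i + group]
        mean = sum(g) / len(g)
        vals.append(sum((p - mean) ** 2 for p in g) / len(g))
    return vals

def relevant_groups(vals, threshold):
    """Flag a group as relevant when its value differs from the previous
    group's by at least the threshold (the first group is taken as not
    relevant, for lack of a neighbour)."""
    flags = [False]
    for prev, cur in zip(vals, vals[1:]):
        flags.append(abs(cur - prev) >= threshold)
    return flags

vals = group_values([10, 10, 10, 10, 10, 20, 10, 20, 10, 10, 10, 10], group=4)
flags = relevant_groups(vals, threshold=10)
```

Groups flagged relevant would then be encoded at higher quality than the rest.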
Dual encoding/compression method and system for picture quality/data density
enhancement. A digital compression apparatus and method uses a first
compression
encoding step associated with a first encode circuit to produce a first
statistical data set
from a first compression encoding of an initial digital video bit stream
representing a
group of pictures (GOP). The initial bit stream of the GOP is also stored in a
first
memory, while the first compression encoding is performed. Concurrently with,
or
following the first compression of the GOP data by the first encoder, a
companion
processor uses the first statistical data set and optional additional
information to
compute a set of filter and encoder control settings. The initial digital
video bit stream of
the GOP data previously stored in the first memory is retrieved from the
memory after
the first encoding step and is input as a time delayed version of the initial
digital video
bit stream through a set of filters and subsequently through a second
compression
encode circuit to produce a second compressed video bit stream of the GOP
data. The
filters are configured to provide dynamically variable filter characteristics
that are
responsive to the filter control settings from the companion processor. The
filter
characteristics can be changed optionally by the companion processor on a
block,
frame or GOP basis. The second encode circuit also has encoder characteristics
responsive to the encoder control settings. The filter and encoder control
settings
selected by the companion processor enable the second encoder to provide enhanced
compression encoding performance in the second compression encoding of the GOP data
video bit stream relative to the compression of the first encode circuit. In
particular, the
second compressed bit stream may have improved performance in either or both
bit
rate and total bits relative to the compression of the first encode circuit
(US5990955A).
Scene and activity identification in video summary generation based on motion
detected
in a video. Video and corresponding metadata is accessed. Events of interest
within the
video are identified based on the corresponding metadata, and best scenes are
identified based on the identified events of interest. In one example, best
scenes are
identified based on the motion values associated with frames or portions of a
frame of a
video. Motion values are determined for each frame and portions of the video
including
frames with the most motion are identified as best scenes. Best scenes may
also be
identified based on the motion profile of a video. The motion profile of a
video is a
measure of global or local motion within frames throughout the video. For
example, best
scenes are identified from portion of the video including steady global
motion. A video
summary can be generated including one or more of the identified best scenes
(US9646652B2).
While there are patented inventions available that describe high-quality video
recording, there has been no system and method to identify, from origin at the
capture device, points of interest in the video and record them in higher quality
than the rest of the image. This system and method thus appears unique and
innovative, and proves useful in fields of video recording such as surveillance: by
identifying, from origin at the capture device, points of interest in the video and
recording them in higher quality than the rest of the image, the video file becomes
smaller in size for saving or for transmission over the internet. The present
invention is directed towards this new and innovative idea.
SUMMARY
In general, in one aspect, the invention relates to a method for identifying, from
origin at the video-capture device, points of interest and recording them in higher
quality than the rest of the image, thus making the video file smaller in size for
saving or transmission over the internet.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of the relationship of the devices described in this invention.
FIG. 2 shows a flowchart in accordance with one or more embodiments of the invention.
FIG. 3 shows a flowchart describing the process for the system and method.
FIGS. 4A to 4D show a side view of an example of the method and system used to identify someone's face.
FIG. 5 shows a relationship diagram of the contents of the container.
FIG. 6 shows how the viewport data of each image is saved in the viewport data file.
FIG. 7 shows a side view of an image of an example of the operation of the invention.
FIG. 8 is a diagram describing a process of the invention.
DETAILED DESCRIPTION
Specific embodiments of the technology will now be described in detail with
reference to
the accompanying FIGS. In the following detailed description of embodiments
of the
technology, numerous specific details are set forth in order to provide a more
thorough
understanding of the technology. However, it will be apparent to one of
ordinary skill in
the art that the technology may be practiced without these specific details.
In other
instances, well-known features have not been described in detail to avoid
unnecessarily
complicating the description.
In the following description of FIGS., any component described with regard to
a FIG. - in
various embodiments of the technology - may be equivalent to one or more like-
named
components described with regard to any other figure. For brevity,
descriptions of these
components will not be repeated with regard to each figure. Thus, each and
every
embodiment of the components of each FIG. is incorporated by reference and
assumed
to be optionally present within every other FIG. having one or more like-named
components. Additionally, in accordance with various embodiments of the
technology,
any description of the components of a FIG. is to be interpreted as an
optional
embodiment, which may be implemented in addition to, in conjunction with, or
in place
of the embodiments described with regard to a corresponding like-named
component in
any other figure.
FIG. 1 is a diagram of the relationship of the devices described in this invention,
showing a first device (101), wherein the device is one or more from the group of a
server, a video surveillance system, a smart gadget, a computer, a smartphone, a
tablet, a digital camera, a ring camera, a security appliance, or a voice-operated
assistant such as an Amazon Alexa or Google Home device. The device has a CPU (110),
memory (111), a power supply (113), and a hard drive or recording apparatus such as
a Solid State Drive or persistent memory. The device (101) has an image input (114)
such as a camera input or a digital media input for the video image. It connects to
other devices (102) via a physical peer-to-peer connection (130) or via an internet
connection (121), optionally through a server (120).
FIG. 2 shows a flowchart in accordance with one or more embodiments of the invention. While the various steps in these flowcharts are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in a different order, may be combined or omitted, and some or all of the steps may be executed in parallel. Furthermore, the steps may be performed actively or passively.
FIG. 2 shows a flowchart of the system and method that optimizes the size of a video recording or video transmission by executing a series of steps for every frame (image) in the video and/or by identifying and recording the region of interest in a higher image resolution than the rest of the image, which is saved or transmitted in a lower-resolution format. One familiar with the art will appreciate that the term optimization, as used in this figure and throughout this application, refers to one or more of reducing, compressing, or packaging the image or the file that contains the image.
Step 201: receiving, in a first device, a request from a user to locate an object or region of interest in an original master image, wherein the original master image is one collected by a camera, including security cameras, smart gadgets, or digital cameras, to name a few.

A region of interest (ROI) is a set of samples within a data set identified for a particular purpose. In computer vision, the ROI defines the borders of an object under consideration. In many applications, symbolic (textual) labels are added to an ROI to describe its content in a compact manner. Within an ROI may lie individual points of interest (POIs).
Step 202: identifying the region or object of interest to extract from the original master image; for example, wherein the object of interest is the face of a person.
Step 203: locating, in the original master image, matches to the region or object of interest, using a machine-learning-based computer vision technique to detect and localize the region or object of interest. Examples of such techniques are object detection and localization methods that detect objects or faces using a pre-trained convolutional neural network model based on frameworks like YOLO, Fast-RCNN, or any similar alternative. Unlike YOLO, other detection systems repurpose classifiers or localizers to perform detection: they apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections. YOLO uses a different approach, applying a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region; these bounding boxes are weighted by the predicted probabilities. YOLO's model has several advantages over classifier-based systems. It looks at the whole image at test time, so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation, unlike systems like R-CNN, which require thousands for a single image.
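The application names YOLO and Fast-RCNN but does not fix an implementation of Step 203. A minimal, hypothetical sketch of the detection interface, with the neural network replaced by a toy brightness test purely to show the bounding-box output shape such a detector would produce, might look like:

```python
# Sketch of Step 203: locate candidate regions of interest in a frame.
# A real system would run a pre-trained CNN (YOLO, Fast-RCNN, etc.); here
# a toy detector flags any bright 2x2 block, only to illustrate the
# (x, y, w, h, score) bounding-box output.

def detect_regions(frame, threshold=200, box=2):
    """Return (x, y, w, h, score) tuples for blocks whose mean value meets the threshold."""
    h = len(frame)
    w = len(frame[0]) if h else 0
    detections = []
    for y in range(0, h - box + 1, box):
        for x in range(0, w - box + 1, box):
            block = [frame[y + dy][x + dx] for dy in range(box) for dx in range(box)]
            score = sum(block) / len(block)
            if score >= threshold:
                detections.append((x, y, box, box, score))
    return detections

# Synthetic 4x4 grayscale "frame": one bright block at the top-left.
frame = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(detect_regions(frame))  # [(0, 0, 2, 2, 255.0)]
```

A production detector would return the same kind of box list, with the score coming from the network's predicted probability rather than pixel brightness.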
Step 204: masking in a polygon an area surrounding the located region or object of interest, wherein the image within the polygon becomes a first slave image, and wherein the first slave image maintains the same image resolution as the original master image.
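The extraction of a full-resolution slave image in Step 204 can be sketched as follows, modeling the image as a nested list of pixel values and simplifying the polygon to a rectangle as in the FIG. 4C example (an assumption; the step allows any polygon):

```python
# Sketch of Step 204: cut the located region out of the master image as a
# "slave" image at the original (full) resolution. The polygon is
# simplified to an axis-aligned rectangle for illustration.

def extract_slave(master, x, y, w, h):
    """Return the w-by-h sub-image of master whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in master[y:y + h]]

master = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
slave = extract_slave(master, 1, 0, 2, 2)
print(slave)  # [[2, 3], [6, 7]]
```

Because the slave is a direct copy of the master's pixels, no resolution is lost inside the region of interest.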
Image resolution is the detail an image holds. The term applies to raster
digital
images, film images, and other types of images. Higher resolution means more
image detail.
Image resolution can be measured in various ways. Resolution quantifies how
close lines can be to each other and still be visibly resolved. Resolution
units can
be tied to physical sizes (e.g. lines per mm, lines per inch), to the overall
size of a
picture (lines per picture height, also known simply as lines, TV lines, or
TVL), or
to angular subtense. Line pairs are often used instead of lines; a line pair
comprises a dark line and an adjacent light line. A line is either a dark line
or a
light line. A resolution of 10 lines per millimeter means 5 dark lines
alternating
with 5 light lines, or 5 line pairs per millimeter (5 LP/mm). Photographic
lens and
film resolution are most often quoted in line pairs per millimeter.
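The lines-to-line-pairs relationship described above is a straight factor of two:

```python
# Worked example of the resolution units above: a resolution of 10 lines
# per millimeter means 5 dark lines alternating with 5 light lines, i.e.
# 5 line pairs per millimeter (5 LP/mm), since one line pair comprises one
# dark line and one adjacent light line.

def lines_to_line_pairs(lines_per_mm):
    return lines_per_mm / 2

print(lines_to_line_pairs(10))  # 5.0 LP/mm
```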
Step 205: identifying the region of interest and viewport information of the first slave image in reference to the master image, and saving the first slave image's viewport information into the viewport information file. One familiar with the art will appreciate that the original master image may comprise more than one match to the object of interest; in this case, our method repeats this process until all matched objects or regions of interest have their own slave image.
A viewport is a polygon viewing region in computer graphics, where there are two region-like notions of relevance when rendering some objects to an image. The viewport is an area (typically rectangular) expressed in rendering-device-specific coordinates, e.g. pixels for screen coordinates, in which the objects of interest are going to be rendered. Clipping to the world-coordinates window is usually applied to the objects before they are passed through the window-to-viewport transformation. For a 2D object, the latter transformation is simply a combination of translation and scaling, the latter not necessarily uniform. An analogy of this transformation process, based on traditional photography notions, is to equate the world-clipping window with the camera settings, and the variously sized prints that can be obtained from the resulting film image with possible viewports.

Because the physical-device-based coordinates may not be portable from one device to another, a software abstraction layer known as normalized device coordinates is typically introduced for expressing viewports; it appears, for example, in the Graphical Kernel System (GKS) and later systems inspired by it. Viewport information might also contain additional metadata to facilitate the later re-combination of the master and slave images, for example, the coordinates and size of the region of interest relative to the master image.
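The application does not prescribe a format for the viewport information file of Step 205. One plausible sketch, with assumed field and file names, stores the per-slave coordinates and sizes as JSON:

```python
# Sketch of Step 205: record each slave image's viewport (its position and
# size relative to the master image) in a viewport information file.
# JSON is used here for illustration; the field names are assumptions.
import json

viewport_info = {
    "master": "frame_000123.jpg",  # hypothetical master-image file name
    "slaves": [
        {"file": "slave_0.jpg", "x": 140, "y": 60, "width": 96, "height": 96},
        {"file": "slave_1.jpg", "x": 410, "y": 75, "width": 88, "height": 88},
    ],
}

text = json.dumps(viewport_info)       # serialize for the container
restored = json.loads(text)            # later re-combination reads it back
print(restored["slaves"][0]["x"])      # 140
```

As the text notes, one (x, y) corner plus the width and height is enough to place each slave back onto the master.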
Step 206: saving the processed master image in a smaller resolution format than the first slave image. As an example, security cameras in banks need to record 24/7, and they do so with low-resolution cameras. The resulting images are not very good for identifying people, nor for performing face recognition. Instead, by saving the non-meaningful parts of the image, such as the background and the person's body, in low resolution, while maintaining a high-resolution image of the face, it is possible to better identify the person's face as well as to run face-recognition algorithms on it.
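The reduction of the master image in Step 206 can be sketched with simple 2x2 average pooling; a real implementation would instead use a proper resampling filter or a lower-bitrate codec:

```python
# Sketch of Step 206: save the processed master image at a lower
# resolution than the slave images. Each 2x2 block of the grayscale
# nested-list image is averaged into one pixel, quartering the pixel count.

def downscale_2x(image):
    """Average each 2x2 block into one pixel (dimensions assumed even)."""
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            total = (image[y][x] + image[y][x + 1] +
                     image[y + 1][x] + image[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out

master = [
    [10, 30, 50, 70],
    [10, 30, 50, 70],
    [90, 90, 20, 20],
    [90, 90, 20, 20],
]
print(downscale_2x(master))  # [[20, 60], [90, 20]]
```

The downscaled master, together with the full-resolution slaves from Step 204, is what goes into the container in Step 207.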
Step 207: combining in a container the master image, the viewport information file, and the first slave image. A container or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file.

By definition, a container format could wrap any kind of data. Though there are some examples of such file formats (e.g. Microsoft Windows's DLL files), most container formats are specialized for specific data requirements. For example, a popular family of containers is found in use with multimedia file formats. Since audio and video streams can be coded and decoded with many different algorithms, a container format may be used to provide a single file format to the user.

The container file is used to identify and interleave different data types. Simpler container formats can contain different types of audio formats, while more advanced container formats can support multiple audio and video streams, subtitles, chapter information, and metadata (tags), along with the synchronization information needed to play back the various streams together. In most cases, the file header, most of the metadata, and the synchronization chunks are specified by the container format. For example, container formats exist for optimized, low-quality Internet video streaming, which differs from high-quality Blu-ray streaming requirements.
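The application leaves the container format of Step 207 open. As an illustrative stand-in, the three elements can be interleaved in an in-memory ZIP archive (the entry names are assumptions):

```python
# Sketch of Step 207: combine the low-resolution master image, the
# viewport information file, and the high-resolution slave image(s) in a
# single container. A ZIP archive built in memory serves as the metafile.
import io
import zipfile

def pack_container(master_bytes, viewport_json, slave_bytes):
    """Return the container as bytes, wrapping the three elements."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("master.bin", master_bytes)
        zf.writestr("viewport.json", viewport_json)
        zf.writestr("slave_0.bin", slave_bytes)
    return buf.getvalue()

container = pack_container(b"low-res pixels", '{"x": 1}', b"hi-res pixels")
with zipfile.ZipFile(io.BytesIO(container)) as zf:
    print(zf.namelist())  # ['master.bin', 'viewport.json', 'slave_0.bin']
```

Any wrapper that keeps the three elements together and identifiable would satisfy the step equally well.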
Step 208: saving the container in the device. Once the container is in the device, it does not occupy as much storage capacity as if the full video had been recorded and saved in high resolution. Also, if a transmission of the video is necessary, it is much easier and faster to transmit.
FIG. 3 shows a flowchart describing the process for the system and method, a continuation of FIG. 2.

Step 301: transmitting the container to a second device. One familiar with the art will appreciate that, as more and more camera systems are installed and connected to a network, transmitting those images is necessary to perform surveillance as well as dataset training. For example, doorbell devices that have a camera are constantly capturing images throughout the day, yet no one is monitoring them. The captured images are used at a later time in case the owner or the manager of the security surveillance company wants to see a clip of a video, or big-data companies want to use such data for training data algorithms, such as the ones used for computer vision in machine learning. Those algorithms may only need good-quality video in certain areas of the video and not in all of the image frames. As such, our invention can identify such areas of interest and transmit them to a server or secondary device for processing.
Step 302: opening the container in the second device.
Step 303: performing a computer vision recognition process on the slave images. For example, the computer vision recognition process could be a face-recognition process or a text-recognition process, to name some examples.
Step 304: using the data from the viewport information file to identify the slave image's location in reference to the master image, and playing back the master image with the slave images superimposed.
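The superimposition of Step 304 can be sketched as pasting each slave back at its recorded viewport coordinates (assuming, for simplicity, that master and slave are modeled at the same scale):

```python
# Sketch of Step 304: paste each high-resolution slave image back onto the
# master image at the coordinates recorded in the viewport information
# file, so playback shows full detail inside the regions of interest.

def superimpose(master, slave, x, y):
    """Overwrite master pixels with slave pixels at top-left (x, y)."""
    out = [row[:] for row in master]  # shallow row copies; input unchanged
    for dy, slave_row in enumerate(slave):
        for dx, value in enumerate(slave_row):
            out[y + dy][x + dx] = value
    return out

master = [[0] * 4 for _ in range(3)]
slave = [[7, 7], [7, 7]]
print(superimpose(master, slave, 1, 0))
# [[0, 7, 7, 0], [0, 7, 7, 0], [0, 0, 0, 0]]
```

In a full implementation the low-resolution master would first be scaled up to the playback size before the slaves are pasted in, which is why the viewport file records coordinates relative to the master image.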
FIGS. 4A to 4D show a side view of an example of the method and system used to identify someone's face.

FIG. 4A: A security camera captures images in high definition. In this example, it shows 4 people hanging out in the street (401, 402, 403, 404). In a different embodiment of our invention, the source can be a high-definition recording of a previously captured moment.
FIG. 4B shows the identification of the location of faces (411, 412, 413, 414). In this example we are using a person's face as the object of interest. One familiar with the art will appreciate that objects of interest could vary, for example, but not limited to, licence plate tags, objects, etc. Continuing with the description of FIG. 4B, algorithms process the image, identifying the object of interest, in this case the face of a person; in this example, the 4 faces of 4 people.
FIG. 4C shows how the system masks each identified face in a rectangle with viewport information in reference to the master image (421, 422, 423, 424). This rectangle is now a slave image with viewport data, as shown in FIG. 6. A rectangle is identified in the area surrounding the face or the object of interest. This viewport information allows the system to reconstruct the master image with the slave images to form the complete original image later, as shown in FIG. 7.
FIG. 4D shows how splitting a slave image from the master image works. This process repeats for as many faces (411) as are detected within the master image; in the case of this example, 4 times. The final result is a slave image (441) in the original resolution.
FIG. 5 shows a relationship diagram of the contents of the container as described in FIG. 2, step 207. This way, the method and system saves, records, or transmits the master and slave images together with the viewport data file. One familiar with the art will appreciate that the viewport data file could also reside as metadata within each slave image or within the master image.
FIG. 6 shows how the viewport data of each image is saved in the viewport data file (600). A master image (601) comprises one or multiple slave images (602), and each slave image comprises a series of x and y viewport coordinates (603); shown in the image are 4 different viewport values (603). In some instances, only one x and one y coordinate are necessary to identify the position of the image polygon; that, plus the size of the viewport on the polygon, is more than sufficient to superimpose the slave image on the master image, as shown in FIG. 7.
FIG. 7 shows a side view of an image exemplifying the operation of the invention, in which the master image (700) is played back with the slave images (701, 702, 703, 704) superimposed. One familiar with the art will appreciate that the original master image (700) is shown in lower resolution than each of the slave images (701, 702, 703, 704), which are shown in higher resolution.
Embodiments of the invention may be implemented on a computing system. Any combination of mobile, desktop, server, router, switch, embedded device, or other types of hardware may be used. For example, as shown in FIG. 8, the computing system (800) may include one or more computer processors (801), non-persistent storage (802) (for example, volatile memory, such as random access memory (RAM) or cache memory), persistent storage (803) (for example, a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, flash memory, etc.), a communication interface (804) (for example, a Bluetooth interface, infrared interface, network interface, optical interface, etc.), and numerous other elements and functionalities.
The computer processor(s) (801) may be an integrated circuit for processing
instructions. For example, the computer processor(s) may be one or more cores
or
micro-cores of a processor. The computing system (800) may also include one or
more
input devices (810), such as a touchscreen, keyboard, mouse, microphone,
touchpad,
electronic pen, or any other type of input device.
The communication interface (804) may include an integrated circuit for
connecting the
computing system (800) to a network (not shown) (for example, a local area
network
(LAN), a wide area network (WAN) such as the Internet, mobile network, or any
other
type of network) and/or to another device, such as another computing device.
Further, the computing system (800) may include one or more output devices (806), such as a screen (for example, an LCD display, a plasma display, a touch screen, a cathode ray tube (CRT) monitor, a projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same as or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (801), non-persistent storage (802), and persistent storage (803). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.
Software instructions in the form of computer-readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer-readable medium such as a CD, a DVD, a storage device, a diskette, a tape, flash memory, physical memory, or any other computer-readable storage medium. Specifically, the software instructions may correspond to computer-readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
The computing system (800) in FIG. 8 may be connected to or be a part of a
network.
By way of an example, embodiments of the invention may be implemented on a
node of
a distributed system that is connected to other nodes. By way of another
example,
embodiments of the invention may be implemented on a distributed computing
system
having multiple nodes, where each portion of the invention may be located on a
different
node within the distributed computing system. Further, one or more elements of
the
aforementioned computing system (800) may be located at a remote location and
connected to the other elements over a network.
FIG. 9 shows a different embodiment of the invention, where the recordings in high definition or resolution are stored in the first device and a low-resolution video is transmitted to the second device. A user at the second device sees the need to identify an object of interest in high resolution. The user identifies the viewport information of the area of the frame or frames it needs the first device to send in high resolution to properly identify the object or area(s) of interest. By receiving the viewport information and the frame identification (one familiar with the art will appreciate that the frame identification can be the identification of a camera, the footage, and the point on the video's timeline the user at the second device needs), the first device can use our process, separate the areas or objects of interest from the requested frames or videos, and transmit them to the second device.
FIG. 9 is a flowchart describing this process, starting from a low-resolution video received from the first device at the second device, where a higher-resolution video exists (is stored) in the first device.
Step 901: A user at the second device needs to identify an object of interest in higher resolution than the lower-resolution video the second device received from the first device. The user at the second device needs to request a higher-resolution video of the objects of interest. Such identification can be a request for a type of object of interest, where the user may only select the type of object of interest needed, for example, faces of people from a specific time window of the videos, without having to identify each face needed. A different embodiment of our invention is a manual identification of the viewport information, including but not limited to a specific image type or specific viewport information within a time frame.
Step 902 The user at the second device identifies the viewport information of
the area of
the frame or frames it needs the first device to send in high resolution to
properly
identify the object or area of interest.
Step 903: The first device receives the viewport information and the frame identification (one familiar with the art will appreciate that the frame identification can be the identification of a camera, the footage, and the point on the video's timeline the user at the second device needs).
Step 904: The first device separates the areas or objects of interest from the requested frames or videos using the process described in FIGS. 1 to 7, creating a master image and one or more slave images.
Step 905: The master and slave images, as well as the viewport data file, are compressed into a container.
Step 906: The first device transmits the container to the second device.
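The FIG. 9 request from the second device to the first can be sketched as a message carrying the frame identification and viewport of interest. The field names and JSON encoding are assumptions, since the application does not fix a wire format:

```python
# Sketch of the FIG. 9 exchange (Steps 901-903): the second device asks
# the first device for high-resolution regions by sending the frame
# identification (camera, footage, timeline) and the viewport or object
# type of interest. All identifiers below are hypothetical.
import json

request = {
    "camera_id": "lobby-cam-1",              # which camera's footage
    "time_start": "2019-10-08T14:00:00Z",    # timeline window requested
    "time_end": "2019-10-08T14:05:00Z",
    "object_type": "face",                   # Step 901: request by object type
    "viewport": {"x": 120, "y": 40, "width": 200, "height": 200},
}

payload = json.dumps(request)                # serialized for transmission
print(json.loads(payload)["object_type"])    # face
```

On receipt, the first device would run Steps 904-906: extract the matching regions, pack them with the viewport data file into a container, and transmit the container back.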
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
Administrative Status

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

Event History

Description Date
Time Limit for Reversal Expired 2023-04-11
Application Not Reinstated by Deadline 2023-04-11
Letter Sent 2022-10-11
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice 2022-04-08
Letter Sent 2021-10-08
Application Published (Open to Public Inspection) 2021-04-08
Inactive: Cover page published 2021-04-07
Common Representative Appointed 2019-10-30
Common Representative Appointed 2019-10-30
Inactive: Filing certificate - No RFE (bilingual) 2019-10-25
Inactive: IPC assigned 2019-10-21
Inactive: Reply to s.37 Rules - Non-PCT 2019-10-18
Inactive: Request under s.37 Rules - Non-PCT 2019-10-18
Inactive: IPC assigned 2019-10-16
Inactive: IPC assigned 2019-10-16
Inactive: IPC assigned 2019-10-16
Inactive: First IPC assigned 2019-10-16
Inactive: IPC assigned 2019-10-16
Application Received - Regular National 2019-10-10
Small Entity Declaration Determined Compliant 2019-10-08

Abandonment History

Abandonment Date Reason Reinstatement Date
2022-04-08

Fee History

Fee Type Anniversary Year Due Date Paid Date
Application fee - small 2019-10-08
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
ALFONSO F. DE LA FUENTE SANCHEZ
DANY A. CABRERA VARGAS
Past Owners on Record
UNKNOWN
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Cover Page 2021-02-28 1 39
Description 2019-10-07 36 1,618
Claims 2019-10-07 5 145
Abstract 2019-10-07 1 6
Drawings 2019-10-07 10 245
Representative drawing 2021-02-28 1 10
Filing Certificate 2019-10-24 1 213
Commissioner's Notice - Maintenance Fee for a Patent Application Not Paid 2021-11-18 1 549
Courtesy - Abandonment Letter (Maintenance Fee) 2022-05-05 1 550
Commissioner's Notice - Maintenance Fee for a Patent Application Not Paid 2022-11-21 1 550
Request Under Section 37 2019-10-17 1 60
Response to section 37 2019-10-17 2 57