Patent 2312333 Summary

(12) Patent Application: (11) CA 2312333
(54) English Title: MULTIMEDIA COMPRESSION, CODING AND TRANSMISSION METHOD AND APPARATUS
(54) French Title: METHODE ET APPAREIL DE COMPRESSION, DE CODAGE ET DE TRANSMISSION DE DONNEES MULTIMEDIA
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • H04N 7/08 (2006.01)
  • H04N 19/00 (2014.01)
  • H04N 5/917 (2006.01)
(72) Inventors :
  • SATO, KIMIHIKO E. (Canada)
  • MYERS, KELLY LEE (Canada)
(73) Owners :
  • SATO, KIMIHIKO E. (Canada)
  • MYERS, KELLY LEE (Canada)
(71) Applicants :
  • KYXPYX TECHNOLOGIES INC. (Canada)
(74) Agent: NA
(74) Associate agent: NA
(45) Issued:
(22) Filed Date: 2000-06-21
(41) Open to Public Inspection: 2001-12-21
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract





A system, method and apparatus for the compression, coding and transmission of multimedia data. This system encompasses the content creation step of capturing or creating the raw multimedia data and converting it into compressed and encoded multimedia data, the transmission step of adaptively requesting variable levels of information from the multimedia storage device based on the capabilities of the display device and the conditions of the communication channel, and the rendering step which presents the multimedia data to the intended audience. The method encodes the multimedia data in such a way that the rendering device can request at any time any variant of the original multimedia data, thus optimising the bandwidth that is available and presenting the best possible rendition of the original source data to the audience. The encoding algorithm is structured such that the compression may not be optimal for storage, but is optimal for parsing, transmission, and display.


Claims

Note: Claims are shown in the official language in which they were submitted.




We claim:

1. A method for multimedia compression, comprising the steps of, for each frame:
capturing image data for the frame;
capturing audio data for the frame;
compressing the image data; and
encoding the audio data within the comment field of the compressed image data.

2. A method for multimedia compression, comprising the steps of:
capturing raw audio data for a frame;
converting the raw audio data to provide back half audio data and front half audio data;
capturing raw video data for the frame;
flipping the frame diagonally to provide flipped video data;
collecting every other pixel in the flipped video data;
encoding the remaining uncollected pixels to provide back half video data;
converting the collected pixels to YUV space to provide front half video data; and
compressing and storing the back half data using a continuous tone compression algorithm.

3. A system for implementing the method according to claims 1 and 2.


Description

Note: Descriptions are shown in the official language in which they were submitted.



MULTIMEDIA COMPRESSION, CODING AND TRANSMISSION METHOD AND APPARATUS

FIELD OF THE INVENTION
The present invention relates to data compression, coding and transmission. In particular, the present invention relates to multimedia data compression, coding and transmission, such as video and audio multimedia data.

BACKGROUND OF THE INVENTION
A constant aim within video data compression systems is to achieve a higher degree of data compression with a reduced degree of degradation in image quality upon decompression. Generally speaking, the higher the degree of compression used, the greater will be the degradation in the image when it is decompressed. Similarly, audio compression systems aim to store the best quality reproduction of an audio signal with the least amount of storage space. In both audio and video compression, a model of human perception is used so that the information lost first is the least likely to be perceived.

Coding, and its inverse step decoding, describe how this compressed video and audio information is stored, sequenced, and categorized. Text, picture, audio and video data is generically termed multimedia data, and the long-term archival storage of this information is termed multimedia storage. The apparatus that captures, encodes and compresses multimedia data is termed the content creation apparatus, and the decoding, decompressing, and display apparatus for multimedia data is termed the display apparatus.

Content creation is the step whereby the multimedia data is created from raw sources and placed into multimedia storage.

Transmission is the step whereby the multimedia data is transmitted from multimedia storage to the display apparatus.

Rendering is the step whereby the multimedia data is converted from the encoded form into a video and audio representation at the display apparatus.

Everybody else, it seems, codes the information into the lowest common denominator format, which most of the world can play using the hardware and infrastructure present. This file format is optimized for lowest bitrate coding, so that the fewest bits are needed in order to reconstruct a representation of the original information. These formats, such as MPEG1, MPEG2, and MPEG4, are designed for small bitrate coding errors, such as are found in any kind of radio transmission (microwave, satellite, terrestrial). These formats are not designed specifically for the Internet, which is unreliable on a packet by packet basis rather than on a bit by bit basis.

A disadvantage of MPEG type information is that it achieves the compression rate by interframe coding. This means that only differences between successive frames are recorded. The bad thing about this is that you need a reference frame that the differences are applied towards. This means that the stream needs to be coded for the bitrate that it expects to have available. The bitrates that are specified in MPEG documentation are continuous, reliable, and reproducible. The Internet is far from any of these.

Everybody else also starts from television, and tries to reproduce it using the Internet as the transport mechanism for the information. In order to achieve this, the usual way is to use either TCP or RTP transports to send the MPEG coded information across the net. TCP is a protocol heavy transport, because the aim is to have the exact copy transferred from sender to receiver as fast as possible, but with no guarantee of time.

The creator of the MPEG type file needs to decide at creation time what the size and quality of the image is, and the quality of the audio. If a smaller, lower quality derivative of this image or this audio is required, the creator needs to make a separate file for each variant.

In order to compensate for unreliable and unpredictable transmission channels, such as the Internet, the conventional approach is to use "Pre-Buffering", a term that simply means that prior to commencing playback, enough of the file is received reliably and stored away such that if the network experiences any difficulties, there is a small amount of multimedia data that can be used. In some cases, tens of seconds to several minutes of time are spent collecting pre-buffering data instead of presenting pictures with sound to the viewing and listening audience.

These systems require that the width, height, compression level, and audio quality are all determined at the time of compression. Even if the decoding and decompression apparatus is capable of handling a much higher quality image, the resulting experience for the user has been limited to the least capable playback device. There is no mechanism for altering the experience on the fly based on the capabilities of the display apparatus.

The streaming technology is simply downloading whatever file, and starting the rendering prior to the whole file being available. This is achieved by something called prebuffering, which means that a certain amount of information is required at the client end before playback can commence. If the playback point ever catches up to the last information received, which happens whenever the conditions of the network are not what is expected, the playback has to stop and further prebuffering is required.

Conventional video and audio compression algorithms and coding systems rely heavily on committee based standards work, such as MPEG2 from the MPEG committee, or H.261 from the ITU-T. These describe a multimedia data file in multimedia storage that can more or less be transmitted error-free and at a reliable rate to the decoding and decompression apparatus. The encoding typically attempts to determine the differences from both earlier and later video frames. The encoder then stores only a representation of the differences between the earlier frame and the later frame. The audio information is typically interleaved within the same media storage file in order to correctly synchronize the audio playback with the video playback. In conventional systems, the multimedia data that is determined at the encoding step must be transmitted in its entirety to the decoding, decompressing apparatus. When motion prediction algorithms are used in the content creation step, there is a large amount of computation required at both content creation and rendering. This then means that it is more expensive in hardware costs to do real-time content creation and rendering.


The sizes and limits of the video data are typically limited to the height to width ratios of standard NTSC and PAL video, or 16:9 wide-screen movie, as this is the standard source of moving pictures with audio. This should not be the only possible size.

Parallel to the development of video and audio compression, the field of still picture compression, such as JPEG, PNG, and Compuserve GIF, is fairly straightforward. These are usually symmetrical algorithms, so that the content creation step is of roughly equivalent complexity to the rendering step. When still pictures are flashed at a high enough frame rate, the illusion of motion is created. Motion JPEG (MJPEG) is a system used in non-linear video editing that does just that with still JPEG files. This is simply a video storage system, and does not encompass audio as well.

There is a need for a new type of video compression method that overcomes the above-noted deficiencies in the prior art methodologies.

SUMMARY OF THE INVENTION
According to the invention, there is provided a video and audio compression, coding and transmission system, method and apparatus comprising:
- a communication channel coupled to one transmitter device, at least one transmission relay device, and at least one reception device, along with a method of coding and compressing multimedia data, such that there are multiple levels of detail and reproducible coherence in the multimedia data, and such that a variably encoded redundant set of audio and text information can be sent adaptively with the video in a minimally acknowledged transmission protocol.

Advantageously, the video compression method and apparatus according to the invention allows:
- multimedia data to be requested by the display device and transmitted through an unpredictable transmission channel, adapting to the capabilities of the display device and the reliability of the communication.
- multimedia data to be encoded in such a way that the rendering of audio can continue in some capacity for a short period of time, at a reduced level, in the case when information is sent but not received.
- the system to adapt by reducing the amount of multimedia data selectively, in such a way that the least perceived data, such as high frequency audio, higher frame rate, and possibly even stereo separation, is selectively removed from the transmission first.
- multimedia data to be encoded in such a way that multiple levels of audio and video can be reduced to the required level for that particular display device and the current communications capacity with minimal calculations.
- multimedia data to be encoded such that long term archival storage of the full highest quality video and audio is protected by multiple levels of encryption, in a way that the lowest representative audio and video has minimal or no protection and the highest representation of audio and video has maximum protection.

The above advantages and features of this invention will be apparent from the following detailed description of illustrative embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:
Figure 1 shows an apparatus for the creation, transmission, and rendering of streamed multimedia data;
Figure 2 shows a data format to store multimedia data;
Figure 3 shows a method for creation of multimedia data;
Figure 4 shows a method for front half compression and encoding;
Figure 5 shows a method for the diagonal flip step;
Figure 6 shows a method for front half data packing;
Figure 7 shows a method for front half decompression and decoding;
Figure 8 shows a method for a Picture Substream sample;
Figure 9 shows pixel numbering;
Figure 10 shows a tiled representation.

DETAILED DESCRIPTION
The present invention is a system and method that caters to the digital convergence industry. It is a format and a set of tools for content creation, content transmission, and content presentation. Digital convergence describes how consumer electronics, computers, and communications will eventually come together into being part of the same thing. The key to this happening is that the digital media, the format for sound, video, and any other data related to such, are recorded, converted and transmitted from source to destination.

As used herein, the following terms are defined:
Packet = a single unit of data that can be sent unreliably across the Internet.
Frame = one full picture from the video.
Video = a series of frames that need to be shown at a certain rate to achieve the illusion of continuous motion.
A/V = Audio / Video = audio and video put together into a single stream.
MPEG = Motion Picture Experts Group. This is an ISO standards workgroup that concentrates on moving images.
JPEG = Joint Photographic Experts Group. A similar ISO standards workgroup that concentrates on still images.

The system of the present invention encodes the information at the highest size and quality that will be requested by any client. It is then encoded in a way that makes it easy during playback for a client to downscale the quality or size based on the conditions of the network. In order to achieve this, the fundamental starting point everyone else uses, which is to use interframe coded MPEG, had to be abandoned. This instantly releases us from requiring a reliable transport mechanism such as TCP/IP.

The transport requirements are that there is a bidirectional channel, and that the propagation time from sender to receiver, and the reverse, are more or less uniform. We currently use straight unicast and multicast UDP/IP, although we are not limited to these at all.

As most motion picture is originally on film stock, any continuous tone image compression algorithm, such as, but not limited to, the discrete cosine transform (DCT) in JPEG, or the Wavelet image compression algorithm in JPEG2000, can be used to compress the frames. The audio corresponding to the film is captured separately, and then encoded within the frame data as comment fields. The audio can be encoded by any audio compression algorithm. Using the comment fields, I believe, has not been done before for audio.

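As an illustration only (not part of the original disclosure), the following Python sketch shows one way audio bytes could ride in JPEG comment (COM) segments; the function name and the placement right after the SOI marker are assumptions, not details from the patent.

```python
import struct

def embed_audio_in_jpeg(jpeg_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Insert audio as JPEG comment (COM, 0xFFFE) segments after SOI."""
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG: missing SOI marker"
    segments = bytearray()
    # A COM payload is capped at 65533 bytes (the 2-byte length field
    # counts itself), so long audio is split across several segments.
    for i in range(0, len(audio_bytes), 65533):
        chunk = audio_bytes[i:i + 65533]
        segments += b"\xff\xfe" + struct.pack(">H", len(chunk) + 2) + chunk
    # Standard JPEG tools skip COM segments, so the image still decodes.
    return jpeg_bytes[:2] + bytes(segments) + jpeg_bytes[2:]
```
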
A further innovation is to flip the image diagonally prior to compression. What this means is that instead of the sequential information in the file corresponding to the picture information from top left to bottom right of the image by rows, the information corresponds to picture data from top left to bottom right by columns. This introduces further work at both the compression step and the decompression step, but it allows for the capability to do left to right reduction of the image without the server having to decompress the entire image.

Another coding innovation is to encode a reduced resolution image with a low bitrate coded audio signal as the front half of the encoded frame data. The information required to modify this image into a higher resolution and higher quality image, as well as the corresponding high frequency encoded audio, is encoded as differences in a back half of the encoded data. The example uses two levels of image resolution, although this algorithm can be extended to have as many levels of resolution as necessary.

The back half components can be encrypted at high security levels, which allows for a lower quality rendition to be available for a lower price, etc.

Example:

Take a motion picture with audio signal at 24 frames a second.

Capture the audio at 44.1KHz Stereo PCM digital data format. This is referred to as the Raw Audio Data (RAD).

Convert the audio signal into a 44.1KHz Stereo PCM audio data file. This is referred to as the Back Half Audio Data (BHAD).

Convert the audio signal into an 11KHz Mono PCM audio data file. This is referred to as the Front Half Audio Data (FHAD).

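A sketch of the two audio conversions, assuming 16-bit PCM in a numpy array of shape (samples, 2). Decimating 44.1KHz by 4 gives 11.025KHz, taken here as the intended "11KHz"; a real implementation would also low-pass filter before decimating to avoid aliasing.

```python
import numpy as np

def split_audio(rad: np.ndarray):
    """rad: 16-bit PCM samples, shape (n_samples, 2), 44.1KHz stereo."""
    bhad = rad                                # Back Half Audio Data: full quality
    mono = rad.astype(np.int32).mean(axis=1)  # mix the two channels down to mono
    fhad = mono[::4].astype(np.int16)         # every 4th sample: ~11.025KHz
    return fhad, bhad
```
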
Capture the video at 24 frames per second into 720 x 576 pixel progressive picture files. Each pixel is RGB888 (CCIR601), which means that there are 8 bits of precision in the capture of each of the Red, Green, and Blue channels according to a color profile supplied by the digital TV industry committee (CCIR). This is chosen because it is one of the standard MPEG 2 video profile sizes and formats. A standard movie encoded to 30 frames per second for television usually goes through a process called 3/2 pulldown, which means that every fourth frame is doubled. This means that no extra information is being conveyed in that last frame, so we might as well capture only the original frames. A single frame of this information is referred to as the Raw Video Data (RVD), and all these frames collectively are referred to as the Raw Video Data Stream (RVDS).

Each frame is noise filtered and diagonally flipped to become a new image where the horizontal lines correspond to the columns of the original image. If there is any black band removal on a frame by frame basis, it is done at the same time as this step. This is referred to as the Flipped Video Data (FVD).

The (FVD) is converted into a new image that is half the width and half the height by a process of collecting every other pixel. It is important that this is collected and not averaged with adjoining pixels. This frame of information is referred to as the Front Half Video Data (FHVD), and is then converted into YUV format. In this example it is the lower right pixel of each 2 by 2 block that is collected.

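A sketch of the collection step, assuming the flipped frame is a numpy array indexed (row, column, channel) and blocks start at even indices, so the lower right pixel of each 2 by 2 block sits at an odd row and column:

```python
import numpy as np

def collect_front_half(fvd: np.ndarray) -> np.ndarray:
    """Keep the lower right pixel of every 2x2 block (collect, don't average)."""
    return fvd[1::2, 1::2]
```
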
The pixels that have not been collected into the (FHVD) are collected and encoded. This new representation of the data is referred to as the Back Half Video Data (BHVD), and consists of four planes: the delta left intensity plane (dLYP), the delta right intensity plane (dRYP), the delta U plane (dUP) and the delta V plane (dVP).

These last two steps are detailed as follows.

(a) Divide the FVD into 2 by 2 blocks. These are pixels with identifiers 11, 12, 21 and 22 based on their Cartesian coordinates. The RGB values of each pixel are R(xy), G(xy), and B(xy), where xy is the pixel identifier.

(b) Compute the YUV representation of all four pixels into Y(xy), U(xy) and V(xy) using the matrix conversion formula as follows:

    | Y |   |  0  |   | +0.29  +0.59  +0.14 |   | R |
    | U | = | 50% | + | -0.14  -0.29  +0.43 | x | G |
    | V |   | 50% |   | +0.36  +0.29  -0.07 |   | B |

(c) Calculate the delta values of the YUV data values with pixel 22, the bottom right pixel. These give the following delta values:

    dY(11) = Y(11) - Y(22)
    dY(12) = Y(12) - Y(22)
    dY(21) = Y(21) - Y(22)
    dU(11) = U(11) - U(22)
    dU(12) = U(12) - U(22)
    dU(21) = U(21) - U(22)
    dV(11) = V(11) - V(22)
    dV(12) = V(12) - V(22)
    dV(21) = V(21) - V(22)

(d) Average the delta U values to get dUavg:

    dUavg = [ dU(11) + dU(12) + dU(21) ] / 3

(e) Average the delta V values to get dVavg:

    dVavg = [ dV(11) + dV(12) + dV(21) ] / 3

(f) Collect all left side Y pixel delta values, dY(11) and dY(21), into a plane, and refer to it as the delta left intensity plane (dLYP).

(g) Collect all upper right Y pixel delta values, dY(12), into a plane, and refer to it as the delta right intensity plane (dRYP).

(h) Collect all dUavg values into a plane and refer to it as the delta U plane (dUP).

(i) Collect all dVavg values into a plane and refer to it as the delta V plane (dVP).

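A minimal numpy sketch of steps (a) through (i) on whole planes at once; it assumes the Y, U, V planes of the flipped frame are float arrays of even height and width, with pixel 22 of each 2 by 2 block at the odd row and odd column. The function name and layout are illustrative.

```python
import numpy as np

def back_half_planes(Y, U, V):
    """Build dLYP, dRYP, dUP, dVP from float planes of shape (H, W)."""
    y22 = Y[1::2, 1::2]                   # the collected (FHVD) luma pixels
    # dLYP holds two deltas per block (dY11 over dY21), so it is twice the
    # height of the other planes, matching the (288x720) size in the text.
    dLYP = np.empty((2 * y22.shape[0], y22.shape[1]), Y.dtype)
    dLYP[0::2] = Y[0::2, 0::2] - y22      # dY(11)
    dLYP[1::2] = Y[1::2, 0::2] - y22      # dY(21)
    dRYP = Y[0::2, 1::2] - y22            # dY(12)
    u22, v22 = U[1::2, 1::2], V[1::2, 1::2]
    dUP = ((U[0::2, 0::2] - u22) + (U[0::2, 1::2] - u22) +
           (U[1::2, 0::2] - u22)) / 3.0   # dUavg per block
    dVP = ((V[0::2, 0::2] - v22) + (V[0::2, 1::2] - v22) +
           (V[1::2, 0::2] - v22)) / 3.0   # dVavg per block
    return dLYP, dRYP, dUP, dVP
```
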
Using our original (720x576) pixel picture size, the flipped image FVD would be (576x720). This would mean that dLYP is (288x720), dRYP is (288x360), dUP is (288x360) and dVP is (288x360) in planar image size. In this example each plane has elements that have eight (8) bits of precision. That is for efficiency of implementation in software and should not be a restriction on hardware implementations.

Each plane is put through a continuous tone grey scale compression algorithm, such as a single plane JPEG. Prior to this, though, the (FHVD), (dLYP), (dRYP), (dUP), and (dVP) are divided into horizontal bands, which correspond to vertical bands of the original image. If there were four bands with our (720x576) example, then the FVD of (576x720) becomes a (FHVD) of (288x360) consisting of four bands each sized (288x90). It is allowable to have a single band encompassing the entire image, and for efficiency it is suggested that a power of two number of bands be used. The (FHVD) is compressed in the three equally sized component planes of YUV using a continuous tone image compression algorithm such as, but not limited to, JPEG. Each of these planes is (288x360).

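A sketch of the band split, assuming planes are numpy arrays indexed (rows, columns); np.array_split handles any power of two band count.

```python
import numpy as np

def split_bands(plane: np.ndarray, n_bands: int = 4) -> list:
    """Split a plane into horizontal bands along its rows."""
    return np.array_split(plane, n_bands, axis=0)

# e.g. the (288x360) FHVD Y plane, held as a (360, 288) array, splits
# into four (90, 288) bands, matching the (288x90) bands in the text.
```
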
The (FHVD) and the (FHAD) are interleaved with frame specific information such that the audio data, video data and padding are easily parsable by a server application. This is referred to as the Front Half Data (FHDATA). In the case that JPEG was used, this (FHDATA) should be parsable by any standard JPEG image tool, and any padding, extra information, and audio is discarded. This image is of course diagonally flipped, and needs to be flipped back. The (FHAD) is duplicated in a range of successive corresponding frames. This is so that only one of a sequence of successive frames needs to be received in order to be able to reproduce a lower quality continuous audio representation.

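One plausible reading of that duplication, sketched below: every frame in a window of w frames carries the concatenated low quality audio for the whole window, so receiving any one of them preserves audio for that stretch. The window size and grouping are assumptions; the patent states only that the (FHAD) is duplicated across successive frames.

```python
def fhad_for_frame(fhad_per_frame: list, frame_index: int, window: int = 4) -> bytes:
    """Low quality audio carried by one frame: its whole window's audio."""
    start = (frame_index // window) * window
    return b"".join(fhad_per_frame[start:start + window])
```
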
The (BHVD) and (BHAD) are stored following the (FHDATA) in a way such that the server can easily pull individual bands of the information out of the data. The (BHAD) is duplicated in a successive range of corresponding frames. This is similar to the (FHAD) in the (FHDATA), but the difference is in how redundant the information is when dealing with high frequency data. The aim is to have some form of audio available as the video is presented. The (BHVD) and (BHAD) interleaved in this form are called the Back Half Data (BHDATA).

A frame header (FRAMEHEADER), the (FHDATA) and the (BHDATA) put together are the complete frame data (FRAMEDATA).

A continuous stream of (FRAMEDATA) can be converted to audio and video. This is referred to as streamed data (STREAMDATA). A subset of (FRAMEDATA) can be constructed by the video server device. This is referred to as subframe data (SUBFRAMEDATA), and a continuous stream of this information, decimated accordingly, is referred to as subsampled stream data (SUBSTREAMDATA).

A collection of (FRAMEDATA) with a file header (FILEHEADER) is an unpacked media file (MEDIAFILE), and a packed compressed representation of a (MEDIAFILE) is a packed media file (PACKEDMEDIAFILE).

The server apparatus will read a (MEDIAFILE), or capture from a live video source, and create a (STREAMDATA) that goes to a relay apparatus.

A client apparatus contacts a relay apparatus and requests a certain (STREAMDATA). Through a continuous feedback process, the relay will customize a (SUBSTREAMDATA) based on the current instantaneous network conditions, the capabilities of the client apparatus, and specific user requests such as, but not limited to, pan and scan locations.

(SUBFRAMEDATA) is created from the (FRAMEDATA) by a process of decimation, which is the discarding of information selectively. The algorithm for discarding is variable, but the essence is to discard unnecessary information, and the least perceivable information, first.

Only complete (SUBFRAMEDATA) elements that are reliably received in their entirety are rendered. All others are discarded and ignored.

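A sketch of one possible decimation policy, assuming FRAMEDATA is modelled as a dict of named parts; the specific drop order (back half audio, then back half video bands, then all back half video) is an illustrative reading of "least perceivable information first", not an order the patent prescribes.

```python
def decimate(framedata: dict, level: int) -> dict:
    """Build SUBFRAMEDATA by discarding the least perceivable parts first."""
    sub = dict(framedata)
    if level >= 1:
        sub.pop("BHAD", None)                              # high quality audio goes first
    if level >= 2:
        sub["BHVD_bands"] = sub.get("BHVD_bands", [])[:1]  # fewer detail bands
    if level >= 3:
        sub.pop("BHVD_bands", None)                        # front half only
    return sub
```
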
The rendering step is as follows:

The audio data is pulled from the (SUBFRAMEDATA). If (BHAD) exists, then it is stored accordingly. (FHAD) always exists in a (SUBFRAMEDATA) and is stored accordingly.

The (FHVD), which is always available, is decompressed accordingly into its corresponding YUV planes. This is stored accordingly.

The (BHVD), if it is available, is used to create a decompressed full size image using the following algorithm.

a/ Reverse the continuous tone compression algorithm so that there are reconstructed [dRYP], [dLYP], [dUP], and [dVP] planes (square braces are used to indicate reconstructed values).

b/ Values from each plane and from the (FHVD) are interleaved into a YUYV data block.

    From [FHVD] : [Y22] [U22] [V22]
    From [dLYP] : [dY11] [dY21]
    From [dRYP] : [dY12]
    From [dUP]  : [dU]
    From [dVP]  : [dV]

    [Y11] = [dY11] + [Y22]
    [Y12] = [dY12] + [Y22]
    [Y21] = [dY21] + [Y22]
    [U1]  = [dU] + [U22]
    [U2]  = [U22]
    [V1]  = [dV] + [V22]
    [V2]  = [V22]

[Y11] [U1] [Y21] [V1] is the YUYV representation of the left two pixels, and [Y12] [U2] [Y22] [V2] is the YUYV representation of the right two pixels.
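A minimal numpy sketch of step b/, inverting the encoder sketch given earlier; it assumes the same plane layout and returns the chroma per 2 by 2 block rather than packed YUYV, since the interleaving order is a packing detail.

```python
import numpy as np

def reconstruct(y22, u22, v22, dLYP, dRYP, dUP, dVP):
    """Rebuild the full luma plane and per-block chroma from the halves."""
    H, W = 2 * y22.shape[0], 2 * y22.shape[1]
    Y = np.empty((H, W), y22.dtype)
    Y[1::2, 1::2] = y22                  # the front half pixels
    Y[0::2, 0::2] = dLYP[0::2] + y22     # Y11 = dY11 + Y22
    Y[1::2, 0::2] = dLYP[1::2] + y22     # Y21 = dY21 + Y22
    Y[0::2, 1::2] = dRYP + y22           # Y12 = dY12 + Y22
    U1, U2 = dUP + u22, u22              # left and right column chroma
    V1, V2 = dVP + v22, v22
    return Y, (U1, U2), (V1, V2)
```
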
All pixels are collected together into an intermediate frame. This frame is then put through the final reconstruction step of reversing the diagonal flip with another diagonal flip of the picture elements. Following this step, the columns of YUYV data calculated above are now rows of YUYV data, in the exact format that computer video card overlay surfaces require. During this reverse diagonal flip, an optional filtering step can be done to further remove any visual artifacts introduced during compression and decompression.

The available image is displayed at the appropriate time in the sequence. If high quality audio is available, then it is played on the audio device; otherwise the lower quality audio sample is used.

The client monitors the number of frames that it managed to receive and managed to decompress and process. This is reported back to the server, which then scales up or down the rate and the complexity of the data that is sent.

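A sketch of that feedback rule, reusing the illustrative decimation levels from the earlier sketch; the ratio thresholds are assumptions.

```python
def adjust_level(level: int, frames_sent: int, frames_rendered: int) -> int:
    """Nudge the decimation level based on the client's report."""
    ratio = frames_rendered / max(frames_sent, 1)
    if ratio < 0.9:
        return min(level + 1, 3)   # degrade: decimate more aggressively
    if ratio > 0.98:
        return max(level - 1, 0)   # recover: send more detail
    return level
```
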
As will be apparent to those of skill in the art, television variants, such as 29.98 fps, 30 fps, and 25 fps, can be downscaled to 24 frames per second by frame decimation (throwing away frames). 30 is another ideal framerate for storage, as it can easily be used for a lot of downscaled framerates, but there is very little difference in the perception to the average human eye.

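For the 30 to 24 frames per second case, decimation reduces to dropping every fifth frame, as in this sketch:

```python
def decimate_30_to_24(frames: list) -> list:
    """Drop every fifth frame: 30 fps becomes 24 fps."""
    return [f for i, f in enumerate(frames) if i % 5 != 4]
```
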
Any continuous tone compression algorithm can be substituted for the DCT in JPEG. A suggested alternative is Wavelet image compression, or fractal image compression.

Any audio rate and multispeaker/stereo/mono/subwoofer combination can be used for the high quality and low quality audio signals.

There are two levels of audio and video in the example. This algorithm can be extended to have three levels by having a front third, middle third, and back third. The server can send either the first third, the front two thirds, or the whole encoded frame, as desired. Other variants are academic.

Any rectangular picture size is possible. In particular, 16x9 width to height picture ratios of theatrical releases can be captured using a square pixel or a squashed pixel. Black band removal can be done either on a frame by frame basis, or across the whole video stream.

Any capture rate is possible.

Postbuffering can be done by the relay, so that the last n (FRAMEDATA) elements are stored. Any new client can have these (FRAMEDATA) or (SUBFRAMEDATA) burst across the communication channel at maximum rate to show something while the rest of the information is being prepared.

Other multimedia types are available, such as text, force feedback cues, closed captioning, etc. Multiple languages can be encoded and stored within this format, and selectively requested. Even higher audio representations, such as Dolby Surround sound 5.1 channel AC3 format encoding, can be selectively requested as well if high enough bandwidth and audio processing facilities exist at the client end.

The client device can send multiple cues and requests. If the source is encoded appropriately, then multiple angle shots can be stored for either a client controlled pan, or as a client controlled position around a central action point.

There is a mechanism for selectively requesting computer generated video streams to be created and presented based on user preferences.

Essentially, the present invention consists of a method for multimedia transmission comprising the following steps:
A signal from client to server specifying the line conditions for multimedia rendering, so that the multimedia data that is supplied can be modified as conditions change.
The same signal specifying the method by which the full multimedia data is reduced into a fully presentable subset depending on line conditions, direct user control, and demographic positioning.
The methods by which the user directly controls the multimedia data requested, so that the audio can be modified via mixing, equalization, computer controlled text to voice additions, and language selection provided by the transmission server.
The same signal specifying a demographic of the audience.
The same signal containing encryption and authentication data, so that the client is identified and is provided multimedia data in accordance with the exact request of the audience.
The same signal transmitted through an unpredictable and unreliable communication channel in such a way that acknowledgement is required based on time elapsed rather than by amount of information received.
The same signal transmitted as full frames of video with sub-sampled redundant sets of audio and text information, in such a way that at any time the probability that there is a form of playable audio of some quality available is maximized.
The same signal transmitted with a decimated picture header so that a simplified rendering device can be constructed.

A MULTIMEDIA compression and coding method comprising the following steps:
A video signal is captured and compressed using a discrete cosine transform based video compression algorithm similar to JPEG, whereby the information is ordered in the MULTIMEDIA DATA STREAM from top to bottom in sets of interleaved columns rather than left to right in sets of progressive rows.
The same MULTIMEDIA DATA STREAM that has sets of columns interleaved into sparse tiles in a way that allows for fast parsing at the transmitter.
The same MULTIMEDIA DATA STREAM is also stored using interleaved luminance and chrominance values in YUV4:2:2 format, in variably sized picture element processing set tiles that are greater than 8 by 8 byte matrixes, in units that are powers of two, such as but not limited to 64 by 64 matrixes and 128 byte by 128 byte matrixes.
The same MULTIMEDIA DATA STREAM is also stored as a lower resolution decimated JPEG image as a header, with the required information to reconstruct a higher resolution image stored as a secondary and tertiary set of information in comment blocks of the JPEG, or as additional data elements that may or may not be transmitted at the same time as the header; both put together are termed for this document COMMENT TYPE INFORMATION.
The same MULTIMEDIA DATA STREAM with the COMMENT TYPE INFORMATION variably encrypted and authenticated in such a way that the origin of the source and the legitimacy of the video requester can be controlled and regulated.
The same MULTIMEDIA DATA STREAM with audio, text, and force feedback information encoded as COMMENT TYPE INFORMATION within the image file, so that standard picture editing software will parse the file, yet not store or extract the additional multimedia information.
The same MULTIMEDIA DATA STREAM with audio encoded with variable sampling rates and compression ratios, and then packaged as COMMENT TYPE INFORMATION in such a way that a long time period of low quality audio and short periods of higher quality audio are redundantly transmitted.
The same MULTIMEDIA DATA STREAM with other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information, encoded as COMMENT TYPE INFORMATION.

A MULTIMEDIA CONTENT CREATION apparatus comprising the following components:
Software or hardware that can take an industry standard interface for capturing audio, video, and other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information, and then compress and encode the information into a MULTIMEDIA DATA STREAM format as described above, storing the data into a MULTIMEDIA DATA STORE.

A MULTIMEDIA TRANSMISSION apparatus comprising the following components:
A MULTIMEDIA DATA STORE that will, on an authenticated or unauthenticated request, transmit the previously described MULTIMEDIA DATA STREAM to another MULTIMEDIA TRANSMISSION apparatus in its entirety.
A MULTIMEDIA TRANSMISSION RELAY that will, on an authenticated or unauthenticated request, set up a network point that one or many MULTIMEDIA DATA STORES can transmit to, and from which one or many MULTIMEDIA RENDERING apparatus can request said MULTIMEDIA DATA.
The same apparatus that, based on time specified acknowledgement information, will modify the information that is presented by a process of parsing, merging, and filtering, in such a way that required information is always sent redundantly, and less important information is removed first, based on selection criteria specified by the MULTIMEDIA RENDERING apparatus.
The same apparatus that will collect and store information based on the audience demographic, and may or may not modify the MULTIMEDIA DATA STREAM to accommodate visual cues and market based product placement.
The same apparatus that will post-buffer the information that has already been sent, so that at the request of the MULTIMEDIA RENDERING apparatus, the missing information can be retransmitted at faster than real time rates.

A MULTIMEDIA RENDERING apparatus comprising the following components:
A software program or hardware device that can receive, through some communication channel and in a timely manner from reception time, the previously mentioned MULTIMEDIA DATA STREAM, and will produce a video picture stream and audio stream that can be presented to an audience.
The same apparatus that can present all other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information.
The same apparatus that may be a stand alone application, a plug in for an existing application, a standalone piece of hardware, or a component of an existing piece of hardware that may or may not have been originally intended for use as a MULTIMEDIA RENDERING DEVICE, but can be easily modified to be such a device.

Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.

Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.


Title                            Date
Forecasted Issue Date            Unavailable
(22) Filed                       2000-06-21
(41) Open to Public Inspection   2001-12-21
Dead Application                 2002-09-26

Abandonment History

Abandonment Date   Reason                                       Reinstatement Date
2001-09-26         FAILURE TO RESPOND TO OFFICE LETTER
2002-02-20         FAILURE TO RESPOND TO OFFICE LETTER
2002-06-21         FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type          Anniversary Year   Due Date   Amount Paid   Paid Date
Application Fee                                 $150.00       2000-06-21
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
SATO, KIMIHIKO E.
MYERS, KELLY LEE
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description     Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Representative Drawing 2001-11-27 1 10
Cover Page 2001-12-14 1 44
Abstract 2000-06-21 1 26
Description 2000-06-21 18 823
Claims 2000-06-21 1 24
Drawings 2000-06-21 10 140
Correspondence 2000-08-01 1 24
Assignment 2000-06-21 3 81
Correspondence 2001-10-31 1 19
Correspondence 2001-11-20 1 15
Correspondence 2001-11-20 1 28