Note: Descriptions are shown in the official language in which they were submitted.
CA 02311095 2000-06-09
DYNAMIC TRAFFIC SHAPING OF A
MULTI-CHANNEL AUDIO STREAM OVER THE INTERNET
Field of the invention
The present invention concerns a method and system for dynamic traffic shaping
of a multi-channel audio stream over a packet-based network.
Background of the invention
Streaming digital music over a packet-based network such as the Internet is
available today through solutions such as RealNetworks' RealPlayer® and
Microsoft's Windows Media Player. Unfortunately, the quality of current music
formats on the Internet is relatively low. For example, the popular MP3 format
uses a rather low-end compression scheme, inferior to CD quality. Furthermore,
none of the current Internet formats supports multi-channel audio, which will
play an important part in the future of audio.
The limiting factor in delivering high quality audio over the Internet is
typically the available bandwidth of the underlying networks. This limit is
especially felt by users whose last-mile connection to the Internet is a slow
telephone modem. The recent emergence of residential broadband technologies
such as DSL and cable modems provides an opportunity to implement systems for
streaming high quality, high bandwidth, multi-channel audio over the Internet.
However, even though transfer speeds are increased considerably, data
congestion still occurs, since the broadband connections are shared by many
customers using multiple transfer methods, such as web surfing and file
downloading, to access enriched content.
The present invention concerns a real-time, dynamic traffic-shaping method
based on the concept of layered coding congestion control. In a preferred
embodiment, the method is tailored for delivering audio encoded in Dolby
Digital® 5.1 format, the most popular, high quality, multi-channel audio
format, as used in DVDs. This method optimizes transmission over "best effort"
packet networks such as the Internet.
DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
ABBREVIATIONS
For the purpose of the present document, the following abbreviations are
adopted:
• AOD -- Audio On Demand
• kbps -- Kilobits per second
• Mbps -- Megabits per second
• RAID -- Redundant Array of Independent Disks
• RTT -- Round Trip Time
WHY DOLBY DIGITAL®?
The Dolby Digital® 5.1 audio coding scheme (also known as AC-3) was developed
by Dolby Laboratories to provide high-quality, multi-channel digital audio to
the consumer market. The AC-3 algorithm provides audio in 5.1 channels (left,
center, right, left and right surround, and a 1/10 bandwidth subwoofer), and is
used in a large majority of home-theatre setups. By applying psycho-acoustic
digital signal-processing algorithms, Dolby Digital® is able to achieve an
excellent compression ratio while still preserving almost all of the important
features of the original audio stream. In most current applications, the AC-3
coding scheme compresses a PCM representation requiring more than 5 Mbps
(6 channels x 48 kHz x 18 bit) into a 384 kbps bit stream. Due to its excellent
sound quality and multi-channel nature, it has been adopted by high definition
TV (HDTV) and digital versatile disk (DVD) for their audio systems. This audio
format is far superior to anything currently available on the Internet.
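
The bit-rate figures quoted above can be checked with a few lines of
arithmetic. The following is a minimal sketch; the function and constant names
are illustrative only and do not come from the original description.

```python
def pcm_bit_rate_bps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


# Figures quoted in the text: 6 channels, 48 kHz sampling, 18 bits per sample.
PCM_BPS = pcm_bit_rate_bps(6, 48_000, 18)

# The 384 kbps AC-3 bit stream quoted in the text.
AC3_BPS = 384_000

# Ratio between the uncompressed and the compressed bit rate.
COMPRESSION_RATIO = PCM_BPS / AC3_BPS

if __name__ == "__main__":
    print(f"PCM: {PCM_BPS / 1e6:.3f} Mbps, compression ratio {COMPRESSION_RATIO:.1f}:1")
```

With these inputs the PCM representation comes to 5.184 Mbps, which is
consistent with the "more than 5 Mbps" stated above, and the reduction to
384 kbps corresponds to a ratio of 13.5 to 1.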
Recently, another multi-channel audio format known as AAC (MPEG-2 Advanced
Audio Coding) has captured great attention within the circle of professional
audio engineers. AAC provides greater sound fidelity than Dolby Digital® with
an even better compression ratio. Although preliminary contact with some
influential members of the Audio Engineering Society suggests that widespread
adoption of AAC is several years away, the present invention can be adapted to
support AAC.
Dolby statistics for May 2000 show that over 45,389,120 products incorporating
Dolby Digital® have been sold. Due to the widespread acceptance of Dolby
Digital®, it will likely remain the leading multi-channel audio format for the
consumer market for years to come.
LAYERED CODING BASED CONGESTION CONTROL
This section presents an introduction to layered coding congestion control,
which is an aspect of the method for a dynamic traffic-shaping algorithm of the
invention.
When the traffic on the Internet exceeds the available bandwidth, buffers
overflow on the intermediate routers. Most modern routers drop packets from the
tail of the queue when a buffer overflow happens. Congestion control is the
algorithm that reduces the traffic according to the congestion of the network.
By reducing the traffic, congestion can be quickly relieved. More importantly,
all user traffic can share the Internet more fairly.
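
The tail-drop behaviour described above can be illustrated with a small
bounded queue. This is a simplified sketch, not a model of any particular
router; the class name DropTailQueue and its methods are hypothetical.

```python
from collections import deque


class DropTailQueue:
    """Bounded FIFO that discards arriving packets once the buffer is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        """Queue a packet; return False if it was dropped at the tail."""
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Forward the oldest queued packet, or None if the queue is empty."""
        return self.packets.popleft() if self.packets else None


if __name__ == "__main__":
    queue = DropTailQueue(capacity=2)
    accepted = [queue.enqueue(n) for n in range(4)]
    print(accepted, "dropped:", queue.dropped)
```

Packets that arrive while the buffer is full never reach the receiver, and it
is this loss that a sender or receiver can observe as a sign of congestion.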
The classical congestion control algorithms are the ones deployed in TCP. The
idea is that the TCP sender backs off and slows down its rate of sending
packets in case of network congestion, which is indicated by packet loss.
Although classical, this kind of congestion control is not suitable for
streaming audio data, since it causes discontinuities in the music stream and
therefore degrades the audio experience tremendously. WinAmp and many other
popular MP3 players support streaming over the Internet. However, users of
these systems frequently experience breaks in the music when the network is
busy, since these players use TCP as their transmission protocol. To counter
unpleasant breaks in the music, many systems impose a buffering delay at the
start of the transmission.
Layered coding, also known as hierarchical coding or embedded coding, was
first developed for packet audio transmission. In layered coding, a signal is
separated into subsignals of various importance so that they can be coded and
transmitted separately. In fact, the data format of packetized voice, as
specified by ITU-T G.764, is arranged in layers. This technique was also
extended to video coding and transmission. In layered coding, network
congestion will always first affect the subsignals of low importance. Thus,
hierarchical coding offers a way of achieving error control by preventing loss
of perceptually important information.
The most important idea of layered coding is that, by reducing the sampling
resolution, it is able to reduce the bandwidth usage of the media stream while
keeping the frame rate constant during congestion. This is significantly
different from the congestion control mechanism used in TCP, in which the
transmission rate (i.e. the frame rate for audio and video) is reduced during
periods of congestion. Layered coding relieves network congestion by reducing
the quality of the transmitted media, while preserving the frame rate at which
it is sent. Therefore, by applying layered coding, users are able to receive a
reasonable, continuous music stream while sharing the bandwidth efficiently in
case of network congestion.
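
The principle can be sketched as follows: when less bandwidth is available, the
least important layers are dropped, but the base layer is always sent so that
the stream never stalls. This is an illustrative sketch only; the layer names
and the 128 kbps layer sizes are assumptions made for the example and are not
taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int  # bandwidth added by this layer (hypothetical value)


# Ordered from most to least perceptually important (illustrative only).
LAYERS = (
    Layer("base", 128),
    Layer("enhancement-1", 128),
    Layer("enhancement-2", 128),
)


def select_layers(available_kbps: int, layers=LAYERS) -> list:
    """Return the most important layers that fit the available bandwidth.

    The base layer is always kept, so every frame is still delivered;
    only the quality of each frame is reduced under congestion.
    """
    chosen = [layers[0]]
    used = layers[0].kbps
    for layer in layers[1:]:
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


if __name__ == "__main__":
    for bandwidth in (400, 300, 50):
        print(bandwidth, [layer.name for layer in select_layers(bandwidth)])
```

In contrast to a TCP-style back-off, the number of frames sent per second does
not change in this sketch; only the amount of data per frame does.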
Although many applications have implemented layered coding congestion control,
the exact implementation is dependent on many factors such as: type of data
(audio or video), timing constraints, bandwidth constraints and quality
constraints. Among them, the most important factor is the exact type of media
that is transmitted. For example, the present invention is sufficiently
flexible to apply its dynamic traffic shaping to AC-3, AAC and MP3. Although
the concept is the same, the implementation in each case is significantly
different from the others.
DYNAMIC TRAFFIC SHAPING OF AN AC-3 STREAM OVER THE INTERNET
As mentioned earlier, the default rate of AC-3 as coded on DVDs is 384 kbps.
However, it is possible to generate AC-3 with either higher or lower bit rates
in order to obtain different audio quality. For example, the Dolby Encoder
DP569 is able to generate an AC-3 stream at 640 kbps, 320 kbps, 256 kbps and
128 kbps. This forms one of the foundations of the present invention.
[Figure-1: block diagram. Server: Music Storage, Transmission Engine, Traffic
Shaping Engine (Bit Rate Control Info). Client: Transmission Engine. Feedback
to the Traffic Shaping Engine: Loss Rate, Jitter, RTT.]
Figure-1
Figure-1 shows the block diagram of a client-server based music delivery
system that implements the dynamic traffic-shaping system and method of the
present invention. There are three important components in the server: music
storage, traffic-shaping engine and packet-transmission engine. The music
storage is a large RAID device that stores all the music tracks. All the music
tracks are preferably encoded in a proprietary nac3 (net AC-3) format, but
other formats are equally applicable. The traffic-shaping engine is the
software module that implements the core algorithm. It has a well-defined
interface to exchange information with other modules. In order to make a
decision, it takes standard parameters that describe the network performance:
number of users connected, packet loss rate, RTT and average arrival jitter at
the receiving end. To achieve the best result, all of these parameters are
required, although it is possible to operate with only the packet loss rate.
The packet-transmission engine is a software module that is responsible for
transmitting the audio data to the end user. The traffic-shaping engine is
completely independent from the packet-transmission engine. This allows the
method of the present invention to be easily integrated into existing
software. A sample implementation that integrates MediaStack's algorithm into
the RealAudio® system is presented in the SAMPLE IMPLEMENTATION section.
The traffic-shaping engine evaluates the network congestion based on the
parameters described above. If there is no congestion, it tells the
packet-transmission engine to transmit the AC-3 stream with the highest quality
from the music track. In case of congestion, the algorithm computes the proper
bit rate of the AC-3 stream that should be transmitted in order to achieve the
best throughput for all the connected users, and communicates this value to the
transmission engine. The transmission engine then dynamically switches to
transmitting the AC-3 data at the new bit rate. Once the network starts to
recover, the traffic-shaping engine starts to increase the bit rate of the AC-3
streams. Of course, there are a variety of different strategies available,
although in a preferred embodiment, a nature format preferably is used.
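
One possible decision rule for such an engine is sketched below. The
description does not disclose the actual algorithm, so this is only an
illustration under stated assumptions: the bit-rate ladder reuses the DP569
rates mentioned earlier, every threshold value is hypothetical, and the
function name next_bit_rate is invented for the example. The number of
connected users is left out for brevity.

```python
# Bit rates (kbps) assumed to be available for each track, lowest first.
BIT_RATES_KBPS = (128, 256, 320, 640)


def next_bit_rate(current_kbps, loss_rate, rtt_ms=None, jitter_ms=None, *,
                  loss_high=0.05, loss_low=0.01,
                  rtt_high_ms=500.0, jitter_high_ms=100.0):
    """Step one rung down under congestion, one rung up once it has cleared.

    Only the packet loss rate is mandatory; RTT and jitter refine the
    decision when they are supplied. All thresholds are hypothetical.
    Raises ValueError if current_kbps is not one of BIT_RATES_KBPS.
    """
    index = BIT_RATES_KBPS.index(current_kbps)
    congested = (
        loss_rate > loss_high
        or (rtt_ms is not None and rtt_ms > rtt_high_ms)
        or (jitter_ms is not None and jitter_ms > jitter_high_ms)
    )
    recovered = not congested and loss_rate < loss_low
    if congested and index > 0:
        return BIT_RATES_KBPS[index - 1]
    if recovered and index < len(BIT_RATES_KBPS) - 1:
        return BIT_RATES_KBPS[index + 1]
    return current_kbps


if __name__ == "__main__":
    rate = 640
    for loss in (0.10, 0.08, 0.03, 0.0, 0.0):
        rate = next_bit_rate(rate, loss)
        print(f"loss={loss:.2f} -> {rate} kbps")
```

Moving one rung at a time and leaving a gap between the two loss thresholds
keeps the rate from oscillating when the loss rate hovers near a single limit;
other strategies are equally possible.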
PREPARATION OF THE NETWORK READY AC-3 AUDIO FILE
The AC-3 data file format is not optimized for streaming over the Internet, as
it was originally designed for local fixed media storage such as DVD. For
transfer on digital audio links, AC-3 data is embedded within a transmission
protocol. For example, AC-3 is embedded in AES/EBU for communicating between
professional audio equipment, and it is embedded in S/PDIF for consumer
products such as DVD players and home amplifiers. Most of the music tracks in
the AC-3 format contain extra information that is added to facilitate the
embedding in either AES/EBU or S/PDIF, increasing the file size unnecessarily.
Transmitting such tracks over a network would therefore be very inefficient.
In order to effectively stream AC-3 audio over the Internet, a network-ready
AC-3 file format (.nac3) that is customized for dynamic traffic shaping has
been developed. A nac3 file is encoded with several AC-3 streams of the same
music track at different bit rates. A file header describes the exact
composition of individual blocks. The different AC-3 streams are encoded in
such a way as to facilitate the dynamic switching between different streams in
real time, as required during network transmissions.
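
The nac3 layout itself is proprietary and is not specified here, so the
following is only a sketch of one way a multi-rate file could support
switching. It relies on two properties of AC-3: each frame carries 1536
samples per channel, and the bit rate is constant, so frame n covers the same
time interval in every stream. The header size, the contiguous layout and all
names (StreamEntry, build_header, frame_offset) are hypothetical.

```python
from dataclasses import dataclass

SAMPLES_PER_AC3_FRAME = 1536  # one AC-3 frame carries 1536 samples per channel


def frame_bytes(bit_rate_kbps: int, sample_rate_hz: int = 48_000) -> int:
    """Size in bytes of one constant-bit-rate AC-3 frame."""
    return bit_rate_kbps * 1000 * SAMPLES_PER_AC3_FRAME // (8 * sample_rate_hz)


@dataclass(frozen=True)
class StreamEntry:
    """Hypothetical header entry describing one AC-3 stream in the file."""
    bit_rate_kbps: int
    start_offset: int  # byte offset of the stream's first frame

    def frame_offset(self, frame_index: int) -> int:
        """Byte offset of a given frame within the file."""
        return self.start_offset + frame_index * frame_bytes(self.bit_rate_kbps)


def build_header(bit_rates_kbps, frame_count: int, header_size: int = 64) -> list:
    """Lay the streams out one after another behind a fixed-size header."""
    entries, offset = [], header_size
    for rate in bit_rates_kbps:
        entries.append(StreamEntry(rate, offset))
        offset += frame_count * frame_bytes(rate)
    return entries


if __name__ == "__main__":
    header = build_header([128, 384], frame_count=10)
    # Switching streams at frame 3 means reading frame 3 of the other stream.
    for entry in header:
        print(entry.bit_rate_kbps, "kbps, frame 3 at byte", entry.frame_offset(3))
```

Under these assumptions the transmission engine can change bit rate at any
frame boundary by looking up the same frame index in another stream, without
any gap or overlap in playback time.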
SAMPLE IMPLEMENTATION
The modular design of the traffic-shaping engine of the present invention
allows it to be integrated into many different audio products. To date, a
proprietary transmission system has been developed, and the traffic-shaping
algorithm for AC-3 has also been implemented within a set of RealNetworks'
RealSystem® G2 plugins. This algorithm could also be integrated into
Microsoft's Windows Media Player.
AC-3 FILE FORMAT AND RENDERING PLUGINS
RealNetworks' RealSystem® is based on the COM binary standard. It provides a
media streaming platform that allows custom-developed server and client
plugins for streaming new audio file formats.
In order to deliver AC-3 data from the RealServer® to the RealPlayer®, special
AC-3 file format and rendering plugins have been implemented. As shown in
Figure-2, the AC-3 file format plugin is used by the RealServer® to convert
AC-3 frames into a stream of Real Media packets. The AC-3 rendering plugin is
used by the RealPlayer® to repack the received Real Media packets into AC-3
frames.
[Figure-2: Real Server with AC3 File Format Plugin, sending Streaming Packets
to Real Player with AC3 Rendering Plugin, which outputs AC3 frames.]
Figure-2
ADAPTIVE BIT RATE STREAMING
When streaming AC-3 data through the RealSystem® platform, adaptive bit rate
is achieved by using the feedback channel, as illustrated in Figure-3.
[Figure-3: Music Storage and Real Server (AC3 FileFormat Plugin containing the
Transmission Engine and Traffic Shaping Engine) sending Streaming Packets to
Real Player (AC3 Rendering Plugin containing a Transmission Engine), which
outputs S/PDIF compatible AC-3 frames. Back Channel from player to server:
Packet Loss, Jitter, RTT.]
Figure-3
Figure-4 illustrates the dynamic traffic shaping architecture implemented in
the RealSystem. While re-packing the Real Media packets into AC-3 frames with
S/PDIF compatible headers, the AC-3 rendering plugin continuously monitors the
net traffic parameters, such as packet loss, jitter and RTT. When these
parameters indicate that the net traffic has increased to a dangerous level,
the rendering plugin sends a message to the file format plugin through
RealSystem's back channel. With this message, the rendering plugin passes the
net traffic parameters to the file format plugin.
As soon as the traffic-shaping engine in the file format plugin receives the
net traffic parameters from the rendering plugin, it processes these
parameters and sends a bit rate control message to the transmission engine to
adjust the streaming bit rate.
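
The receiver side of this feedback loop might be sketched as follows. This is
not the RealSystem plugin API: the classes ReceiverMonitor and Feedback, the
per-packet sequence number and send timestamp, and the 5% loss threshold are
all assumptions made for illustration. The jitter estimate uses the smoothed
interarrival formula from RTP (RFC 3550) as a stand-in, since the description
does not say how jitter is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feedback:
    """Hypothetical back-channel message carrying the net traffic parameters."""
    loss_rate: float
    jitter_ms: float
    rtt_ms: float


class ReceiverMonitor:
    """Tracks packet loss and interarrival jitter on the receiving side."""

    def __init__(self) -> None:
        self.expected = 0        # packets that should have arrived so far
        self.received = 0        # packets that actually arrived
        self.next_seq = None     # next sequence number we expect to see
        self.jitter_ms = 0.0
        self.prev_transit = None

    def on_packet(self, seq: int, sent_ms: float, arrival_ms: float) -> None:
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:
            # A jump in sequence numbers counts the skipped packets as expected.
            self.expected += seq - self.next_seq + 1
            self.next_seq = seq + 1
        self.received += 1
        transit = arrival_ms - sent_ms
        if self.prev_transit is not None:
            # Smoothed estimate: J += (|D| - J) / 16, as in RFC 3550.
            self.jitter_ms += (abs(transit - self.prev_transit) - self.jitter_ms) / 16
        self.prev_transit = transit

    def loss_rate(self) -> float:
        if self.expected == 0:
            return 0.0
        return max(0.0, 1.0 - self.received / self.expected)

    def report(self, rtt_ms: float, loss_threshold: float = 0.05) -> Optional[Feedback]:
        """Return a message for the server only when loss exceeds the threshold."""
        if self.loss_rate() > loss_threshold:
            return Feedback(self.loss_rate(), self.jitter_ms, rtt_ms)
        return None


if __name__ == "__main__":
    monitor = ReceiverMonitor()
    for seq in (0, 1, 3, 4):  # packet 2 never arrives
        monitor.on_packet(seq, sent_ms=seq * 32.0, arrival_ms=seq * 32.0 + 50.0)
    print(monitor.report(rtt_ms=120.0))
```

On the server side, the values carried in such a message would be handed to
the traffic-shaping engine, which then issues the bit rate control message
described above.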
[Figure-4: sequence diagram between Player, Rendering plugin, Server and
FileFormat plugin. Steps shown: Play an AC-3; Connect to Server; Locate AC-3
Music Storage; Load AC-3 Rendering Plugin; Init Transmission Engine; Load AC-3
FileFormat Plugin; Init Transmission Engine; Init Traffic Shaping Engine; Data
Transmission; Packet Streaming; Compute Packet Loss, Jitter, RTT; Back Channel
(Packet Loss, Jitter, RTT); Adjust BitRate.]
Figure-4
CONCLUSION
Through advanced knowledge in network protocols and digital multichannel audio
encoding, a new adaptable method for efficiently transmitting Dolby Digital®
audio through congested packet networks such as the Internet is proposed. The
methods and systems of the present invention have been integrated into two
systems for this purpose, and have been tested in varying network congestion
conditions. Preliminary testing indicates good performance in transmitting
DVD-quality audio through home DSL connections, as well as through cable-based
systems.
Although the present invention has been explained hereinabove by way of a
preferred embodiment thereof, it should be pointed out that any modifications
to this preferred embodiment within the scope of the appended claims are not
deemed to alter or change the nature and scope of the present invention.