CA 02571122 2006-12-18
WO 2006/005052 PCT/US2005/023725
TITLE OF INVENTION
AUDIO CHUNKING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to the filing date of U.S.
Provisional
Application for Patent filed on June 30, 2004 entitled AUDIO CHUNKING and
having been assigned serial number 60/584,058, which application is hereby
incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] Alexander Graham Bell's notebook entry of 10 March 1876 describes his
successful experiment with the telephone. Speaking through the instrument to
his
assistant, Thomas A. Watson, in the next room, Bell utters these famous first
words,
"Mr. Watson -- come here -- I want to see you." Ever since that time,
engineers, marketers and consumers have been on a quest for the faster
delivery of more information through telecommunication and/or computer
networks. In a short period of time, we have moved from 300 baud modems
delivering data to the home to full-blown T1 carriers, cable modems and DSL
lines bringing data to consumers at
millions of bits per second.
[0003] Although the technological advances in the speed of data delivery have
been astonishing, they are still challenged by the imagination of users. As
bandwidth and data rates increase, users continue to come up with applications
that challenge the capabilities of the current state of technology.
Applications that require the downloading of extensive amounts of data, audio
files, video files and graphics can easily challenge the bandwidth and data
throughput capabilities of home and office network solutions. As the data
rates increase, the quality of the audio, video or other data also improves,
thereby requiring the download of even more data and once again challenging
the throughput of the network.
[0004] As a result, users are somewhat accustomed, especially in the realm of
personal computer applications, to waiting at least some period of time for a
data file,
audio file, video file or graphics file to download before they can utilize
the file.
More specifically, for downloading audio files, the users are used to waiting
several
seconds while a streamed audio file is downloaded, or at least a significant
amount of
the file is loaded into a buffer.
[0005] In the context of a voice mail system, such delays are not acceptable.
Thus, there is a need in the art for a technique to minimize or alleviate the
delay experienced by a user downloading an audio file, especially in the
context of the delivery of voice mail messages through a telecommunications
system.
BRIEF SUMMARY OF THE INVENTION
[0006] The present invention satisfies the above-listed needs in the art, as
well as
other needs, by providing a technique for downloading files on a chunk by
chunk
basis, maintaining sufficient data on the target destination to ensure
uninterrupted
playback or access to the data. In general, when the download of a file is
requested,
two portions of the file are transferred to the requesting target. While the
first portion
of the file is being played back or utilized, a third portion of the file is
downloaded.
This operation continues until the entire file is downloaded or the user has
requested
the download to stop. As a result the user is able to access the data in an
uninterrupted manner with minimal delay.
[0007] In one embodiment, the present invention is incorporated into a voice
mail system to facilitate the access of voice messages by subscribers. When a
subscriber attempts to retrieve a voice mail message, the metadata and the
first portion or first two chunks of the file are downloaded to the requesting
target. Once these are downloaded, which takes place in a short period of
time, the playback commences. While the playback of a first portion of the
voice mail message is active, a next portion of the message is downloaded to
the subscriber. Thus, the subscriber has a continuous feed of the audio with
minimal delay. Advantageously, the present invention provides continuous
playback of audio and/or video files without requiring extensive buffering at
the target destination and without incurring a significant delay in the
reception of the start of the audio and/or video.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
[0008] Fig. 1 is a system diagram illustrating the components and the
connectivity of an exemplary next-generation communications platform in which
the present invention can be incorporated.
[0009] Fig. 2 is a flow diagram illustrating the operation of an exemplary
embodiment of the present invention.
[0010] Fig. 3 is a timing diagram illustrating another embodiment of the
present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0011] The present invention is directed towards the provision of audio
messages using a chunking technique, that is, the delivery of the audio in
small pieces. The invention involves breaking the audio message into several
pieces or chunks. Initially, when a user requests the download of the audio
message, two chunks are immediately downloaded. Once the first two chunks are
delivered, the first chunk begins to be played back to the user. While the
first chunk is being played, a third chunk is downloaded. Upon the completion
of the playback of the first chunk, the playback of the second chunk is
commenced, the third chunk is fully downloaded, and the download of a next
chunk is initiated.
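The ordering of downloads and playback described above can be sketched as a simple event schedule. This is only an illustrative sketch and not the claimed implementation: the function name chunk_schedule and the "fetch"/"play" event labels are hypothetical, and a "fetch" event listed directly after a "play" event is meant to run concurrently with that playback.

```python
from typing import Iterator, Tuple


def chunk_schedule(num_chunks: int, prefetch: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield ("fetch", i) and ("play", i) events for a chunked download.

    The first `prefetch` chunks are fetched before playback starts. After
    that, each time a chunk starts playing, the next unfetched chunk (if any)
    is requested, so the download stays ahead of the playback.
    """
    # Initial burst: download the first chunks before any playback.
    fetched = min(prefetch, num_chunks)
    for i in range(fetched):
        yield ("fetch", i)

    for playing in range(num_chunks):
        yield ("play", playing)
        # Requested while chunk `playing` is being rendered to the user.
        if fetched < num_chunks:
            yield ("fetch", fetched)
            fetched += 1


if __name__ == "__main__":
    for event in chunk_schedule(4):
        print(event)
```

With four chunks, the schedule fetches chunks 0 and 1, plays chunk 0 while fetching chunk 2, plays chunk 1 while fetching chunk 3, and then plays the remaining chunks.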
[0012] Advantageously, aspects and embodiments of the present invention
provide a seamless audio interface from the user's perspective with
unnoticeable delay in playback. The delivery of the audio can be performed
using a TCP/IP protocol that provides ordering and retransmission, or some
other similar or similarly functional protocol that can provide some level of
reception assurance and packet ordering. The chunk sizes are chosen so as to
minimize the delivery time of the initial download and to provide a level of
assurance that continuous audio will be available to the user.
[0013] United States Patent Application filed on , and assigned serial
number 11/ , describes a distributed IP architecture for a telecommunications
voice mail system. The contents of this application are incorporated herein by
reference.
[0014] Fig. 1 is a system diagram illustrating the components and the
connectivity of
an exemplary next-generation communications platform in which the present
invention can be incorporated. The illustrated system includes a distributed
IP-based
architecture for telecommunications equipment that, among other things, can
provide
telecommunication services such as voice mail, call forwarding and other
telecommunication features. In the illustrated embodiment, the next-generation
communications platform 100 has a distributed IP architecture and is connected
to the
Public Switched Telephone Network (PSTN) or a Mobile Switching Network (MSC)
110. The communications platform 100 is illustrated as including a signaling
gateway
function (SGF) 120, one or more media servers (MS) 130, one or more system
management units (SMU) 140, one or more application servers (AS) 150 and one
or
more central data and message store (CDMS) or next generation message storage
device (NGMS) 160. It should be understood that the distribution of
functionality
illustrated in the figures and described is not the only acceptable platform,
and aspects
of the present invention could be incorporated into a system that includes
fewer or
more components and a different arrangement of functionality among the
components.
[0015] In the illustrated distributed system, problems associated with the
download
and playback of voice mail messages are introduced. Rather than a subscriber
calling
into a system dedicated to providing the voice messages, such a system is
required to
deliver the messages over an IP network to the media server. This can result
in
significant delays in the retrieval of the messages and also result in dead
space
between messages or portions of the message. The present invention provides
for the
seamless delivery of voice messages through audio chunking.
[0016] In general, the SGF 120 serves as the Signaling System 7 (SS7)
interface to
the PSTN, MSC or other telecommunications network 110. The media server 130
terminates IP and/or circuit switched traffic from the telecommunications
network via
a multi-interface design and is responsible for trunking and call control. The
application server module 150 generates dynamic VoiceXML pages for various
applications and renders the pages through the media server 130 and provides
an
external interface via a web application server configuration. The SMU 140 is
a
management portal that enables service providers to provision and maintain
subscriber accounts and manage network elements from a centralized web
interface. The CDMS 160 stores voice messages and subscriber records, and
manages specific application functions including notification. Each of these
sub-systems is described in more detail below.
[0017] Each of the components in the next-generation communications platform
is
independently scalable and independently interconnected onto an IP network.
Thus,
the components can be geographically distributed but still operate as a single
communications platform as long as they can communicate with each other over
the
IP network. This is a significant advantage of the present invention that is
not
available in state-of-the-art communication systems.
[0018] The MS 130 terminates IP traffic from the SGF 120 and circuit-switched
traffic from the PSTN 110. The MS 130 is responsible for call set up and
control within the platform architecture. The MS 130 processes input from the
user in voice, DTMF or another signaling format (much like a web client gathers
keyboard and mouse click inputs from a user). The MS 130 then presents the
content
back to the user in voice form (similar in principle to graphic and text
displayed back
to the user on a PC client). This client/server methodology is important in
the
platform architecture in that it enables rapid creation of new applications
and quick
utilization of content available on the World Wide Web. The client/server
architecture also is an enabler for the ability of the system to be
geographically
distributed.
[0019] Voice messages that are left for a subscriber are stored in the CDMS
160 and
can be retrieved by the subscribers at a later time. When a subscriber
retrieves voice
messages, the audio messages are delivered to a Media Server 130 from the CDMS
160 via one or more Application Servers 150. Advantageously, the audio
messages
can be interleaved and thus, multiple voice message playbacks for multiple
users can
be accommodated.
[0020] Fig. 2 is a flow diagram illustrating the operation of an exemplary
embodiment of the present invention. Although this embodiment is described
within
a voice mail retrieval environment, it will be appreciated that the various
aspects of
the present invention can be employed in a variety of environments. In the
described
embodiment, it is assumed that the distributed voice mail system has received
a
plurality of voice messages for a particular subscriber. At step 210, the MS
130
receives an incoming call from a subscriber requesting to review the voice
mail
messages. At this point it is necessary for the MS 130 to extract the voice
mail
messages from the CDMS 160 or next generation message system (NGMS). At step
215, the MS 130 requests the subscriber's voice mails to be retrieved. In the
illustrated embodiment this is shown as placing a request to the AS 150. At
step 220,
the AS 150 retrieves the header or metadata information from the NGMS 160 and
provides this information to the MS 130 at step 225. The metadata is a
relatively small block of data and is transferred rather quickly. In an
exemplary embodiment, the metadata includes header information. As a
non-limiting example, the header information can include the time the message
was received, the length of the
message,
the identity of the sender of the message, the priority of the message, the
class or type
of message, etc. The MS 130 and the AS 150 cooperate to convert the metadata
into a VXML page and begin to render it to the caller at step 230A. The AS 150
operates simultaneously to extract blocks of the voice message associated with
the metadata at step 230B. In an exemplary embodiment, two blocks of 16 Kbytes
each of the voice data are retrieved while the metadata VXML is played for the
caller. At step 235, the two blocks or chunks are delivered to the MS 130 for
playback once the AS 150 retrieves them from the CDMS 160. Once the rendering
of the metadata is completed, the caller can immediately begin the playback of
the voice message by playing back the first block at step 240A.
Simultaneously, the AS 150 proceeds to extract the next block of the voice
message from the CDMS 160 at step 240B and delivers the next block to the MS
130 at step 245. The MS 130, after rendering the first block to the calling
subscriber, begins to render the second block at step 250A while the AS 150
retrieves the next block from the CDMS 160 at step 250B. Thus, the system
operates to always keep at least one block ahead of the playback of the
message: while the caller listens to the first block, the third block is
requested by the AS 150 and provided to the MS 130. It should be appreciated
that the present invention can also be implemented in a "just-in-time"
fashion. This means that rather than ensuring that the delivery of the voice
message is at least one block ahead of the playback, the delivery of the next
block of the voice message can be made just in time for playback before the
preceding block is completed.
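The difference between keeping one block ahead and the "just-in-time" variant can be explored with a small timing model. The sketch below is an illustration under stated assumptions, not part of the described system: blocks are assumed to download one at a time at a fixed download_s seconds each, every block is assumed to play for play_s seconds, the initial blocks are assumed to start downloading when the metadata rendering starts, and the function name simulate_playback and its parameters are hypothetical.

```python
from typing import List


def simulate_playback(num_blocks: int, download_s: float, play_s: float,
                      metadata_s: float, lookahead: int = 2) -> float:
    """Return the total silence (seconds) heard after the metadata ends.

    lookahead=2 models the "one block ahead" policy: two blocks are fetched
    up front and block k is requested when block k-2 starts playing.
    lookahead=1 models "just-in-time": block k is requested when block k-1
    starts playing.
    """
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    if num_blocks <= 0:
        return 0.0

    # Initial blocks are fetched back to back while the metadata is rendered.
    initial = min(lookahead, num_blocks)
    arrival: List[float] = [(k + 1) * download_s for k in range(initial)]

    # Playback starts once the metadata is done and the initial blocks arrived.
    start: List[float] = [max(metadata_s, arrival[-1])]
    stall = start[0] - metadata_s

    for k in range(1, num_blocks):
        if k >= initial:
            # Requested when block k-lookahead starts playing, but not before
            # the previous download has finished (single connection assumed).
            requested = max(arrival[k - 1], start[k - lookahead])
            arrival.append(requested + download_s)
        ready = start[k - 1] + play_s          # previous block finishes here
        start.append(max(ready, arrival[k]))   # wait if block k is late
        stall += start[k] - ready
    return stall


if __name__ == "__main__":
    print(simulate_playback(10, download_s=0.5, play_s=2.0, metadata_s=3.0))
    print(simulate_playback(10, download_s=0.5, play_s=2.0, metadata_s=3.0,
                            lookahead=1))
```

In this model, both policies play without gaps as long as a block downloads faster than it plays. The one-block-ahead policy additionally tolerates a single slow download, which the just-in-time policy does not.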
[0021] One advantage of the present invention is the ability to minimize the
amount of
data downloaded if a user simply wants to scan his or her messages. For
instance, if a
user requests the download of a voice message, two chunks of the voice message
can
be downloaded and playback can commence. If the user decides to delete or skip
this
message, the user can so instruct the MS 130 through either a voice command or
a
DTMF command. While the system is processing the user's action the first two
chunks of the next message can be downloaded. For instance, if while the user
is
listening to a message, the user elects to skip the rest of the message and go
to the
next message, the metadata for the next message is then extracted from the
NGMS
160 (unless it was previously extracted) and converted to a VXML page (unless
it was
previously converted) and then rendered to the MS 130. Again, while the
metadata
VXML is being rendered, the AS 150 retrieves the first two blocks of the next
voice
message. Thus, rather than downloading an entire message or series of
messages,
only the content that is imminently necessary for the user is downloaded.
[0022] Fig. 3 is a timing diagram illustrating another embodiment of the
present
invention. In this diagram, the CDMS has been excluded for simplification
purposes.
At step 310 the MS 130 receives a request from a subscriber via
telecommunications
network 110 to retrieve his or her voice mail messages. At step 315 the MS 130
requests the subscriber's voice mail messages from the AS 150. At step 325,
the AS
150 delivers the metadata of the first voice message to the MS 130 which then
begins
to render the metadata to the subscriber at step 330. The AS 150 then proceeds
to
extract the metadata for the second voice mail message at step 335 and the
first two
blocks of the first voice mail message at step 340. At step 345, the playback
of the
metadata for the first message is complete and the MS 130 begins the playback
of the
first block of the first message. If the subscriber elects to skip the
remainder of this message at step 350, the MS 130 has already received the
metadata for the second voice mail message and thus immediately begins the
playback of the second voice mail message metadata at step 355. During the
playback of the metadata for the second voice mail message, the AS 150
retrieves the metadata for a third voice mail message, if any, at step 360 and
the first two blocks of the second voice mail message at step 365. Upon
completion of the playback of the metadata for the second voice mail message,
the MS 130 has received the blocks of the second message and immediately
commences the playback of the first block at step 370.
[0023] In another embodiment, a smart downloading of the audio chunks can be
performed. For instance, if a user has multiple audio files to be downloaded,
such as a series of voice mail messages or several songs selected from an MP3
download site, etc., the smart downloading can be applied incorporating
aspects of the present invention. In this embodiment, the metadata for the
first audio file is downloaded and rendered to the user while the next block
is being retrieved. If the block sizes are chosen such that the playback time
of a block exceeds its average download time, then the entire audio file will
eventually be downloaded while the playback is still in progress. This aspect
of the present invention is based on this characteristic. In one embodiment,
once the first file is completely downloaded, the present invention can
operate to start a download of the second file. Thus, the user is able to
transition to the next file without any lag time.
[0024] In another embodiment, most applicable to a voice mail environment but
not
limited to such environment, the chunk sizes are chosen such that the playback
time
exceeds the average download time. During the download of a first file, a
block count
is maintained. Once enough chunks have been downloaded to ensure that the
remaining playback time exceeds the chunk download time by at least a factor
of 2, chunks of the second file are then downloaded. In a high-speed delivery
network, this aspect of the invention can be applied in a cascaded manner so
that portions of multiple files are simultaneously downloaded, and the user is
able to experience uninterrupted playback regardless of whether the user
listens to the entire files sequentially, skips or deletes messages before
listening to them through completion, skips over messages, or recalls messages
directly.
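The block-count rule above can be written as a small predicate. The sketch below reflects one possible reading, labeled here as an assumption: "playback time remaining" is taken to mean the playback time of the chunks that have been downloaded but not yet played. The names may_start_next_file and min_buffered_chunks are hypothetical.

```python
import math


def may_start_next_file(chunks_downloaded: int, chunks_played: int,
                        chunk_play_s: float, chunk_download_s: float,
                        factor: float = 2.0) -> bool:
    """Return True once the buffered audio covers `factor` chunk downloads.

    Assumption: the remaining playback time is the duration of chunks that
    are already downloaded but not yet played.
    """
    buffered_s = (chunks_downloaded - chunks_played) * chunk_play_s
    return buffered_s >= factor * chunk_download_s


def min_buffered_chunks(chunk_play_s: float, chunk_download_s: float,
                        factor: float = 2.0) -> int:
    """Smallest number of buffered chunks that satisfies the rule."""
    return math.ceil(factor * chunk_download_s / chunk_play_s)


if __name__ == "__main__":
    # Hypothetical numbers: 2.0 s of audio per chunk, 1.5 s to download one.
    print(min_buffered_chunks(2.0, 1.5))
    print(may_start_next_file(3, 1, 2.0, 1.5))
```

Under this reading, a download of the second file only begins when an unexpected delay of up to two chunk download times would still not interrupt the playback of the first file.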
[0025] In an application of the smart download, the strategy for the download
can change in response to the user's activities. For instance, if the smart
download is able to download multiple messages and the user skips to another
message, the messages being downloaded by the invention can be adjusted. For
example, suppose a user is listening to the playback of a first audio file.
While the user is listening, the remainder of the first audio file, along with
portions of the next N audio files, can be downloaded. If the user elects the
playback of message X, then the download of file X can be initiated and, once
playback commences, the download of the next N audio files after message X can
be initiated.
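The adjustment of the download set when the user jumps to message X can be sketched as a sliding window over message indices. This is a minimal sketch under assumptions: messages are identified by zero-based index, the function names prefetch_window and retarget are hypothetical, and cancelling downloads that fall outside the new window is an assumed policy that the description does not specify.

```python
from typing import List, Set, Tuple


def prefetch_window(current: int, total: int, n: int) -> List[int]:
    """Messages to download: the current one first, then the next n."""
    last = min(current + n, total - 1)
    return list(range(current, last + 1))


def retarget(in_flight: Set[int], current: int, total: int,
             n: int) -> Tuple[List[int], List[int]]:
    """Return (downloads to start, downloads to cancel) after a jump.

    Cancelling downloads outside the new window is an assumption; an
    implementation could equally let them finish.
    """
    wanted = prefetch_window(current, total, n)
    to_start = [i for i in wanted if i not in in_flight]
    to_cancel = sorted(in_flight - set(wanted))
    return to_start, to_cancel


if __name__ == "__main__":
    # The user was on message 0 with N = 3 and jumps to message 6 of 10.
    print(retarget({0, 1, 2, 3}, current=6, total=10, n=3))
```

Listing the selected message first in the window mirrors the description: file X is fetched before the N files that follow it.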
[0026] In an exemplary and non-limiting embodiment, the size of the audio
chunks can be between 1 and 5 seconds of playback. The present invention is
applicable to, but not
limited
to, the downloading of audio, video and data. The invention can work with a
variety
of file types in a variety of formats and using a variety of delivery
mechanisms and
protocols.
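To relate chunk durations to chunk sizes in bytes, a short calculation helps. The audio encoding is not stated in this description, so the sketch below assumes 8 kHz, 8-bit mono telephony audio (64 kbit/s, i.e. 8000 bytes per second, as used by G.711); the constant and function names are hypothetical.

```python
from typing import List

# Assumption: 8 kHz sampling, 1 byte per sample, mono (64 kbit/s telephony).
BYTES_PER_SECOND = 8000


def chunk_bytes(seconds: float) -> int:
    """Chunk size in bytes for a given playback duration."""
    return int(seconds * BYTES_PER_SECOND)


def chunk_seconds(num_bytes: int) -> float:
    """Playback duration in seconds of a chunk of the given size."""
    return num_bytes / BYTES_PER_SECOND


def split_into_chunks(audio: bytes, size: int) -> List[bytes]:
    """Split raw audio into consecutive chunks; the last may be shorter."""
    return [audio[i:i + size] for i in range(0, len(audio), size)]


if __name__ == "__main__":
    print(chunk_bytes(1), chunk_bytes(5))   # byte range for 1 to 5 seconds
    print(chunk_seconds(16 * 1024))         # duration of a 16 Kbyte block
    print([len(c) for c in split_into_chunks(bytes(40000), 16 * 1024)])
```

Under this assumed encoding, a 16 Kbyte block holds roughly two seconds of audio, which falls inside the 1 to 5 second range given above.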
[0027] The present invention has been described using detailed descriptions of
embodiments thereof that are provided by way of example and are not intended
to
limit the scope of the invention. The present invention can be implemented as
a
process that runs within a variety of system environments or as an entire
system
including various components. The described embodiments comprise different
features, not all of which are required in all embodiments of the invention.
Some
embodiments of the present invention utilize only some of the features,
aspects or
possible combinations of the features or aspects. Variations of embodiments of
the
present invention that are described and embodiments of the present invention
comprising different combinations of features noted in the described
embodiments
will occur to persons skilled in the art.