Note: Descriptions are shown in the official language in which they were submitted.
CA 02620483 2011-01-26
77787-105
SYSTEM FOR AND METHOD OF GENERATING AUDIO SEQUENCES
OF PRESCRIBED DURATION
[0003] Background
[0004] The present invention relates to the automatic generation of a musical
piece in a
selected genre and having a desired duration.
[0005] In a typical audiovisual environment, it is often desirable to shorten
or lengthen the
performance duration of, for example, a portion of a sound track associated
with a video clip of a
home or cinematic movie or a television commercial. Video sequences are often
repetitively
edited before an aesthetically satisfactory sequence is achieved. An audio
source segment
associated with the video sequence must similarly be edited to form an audio
output sequence
that is synchronized to match the duration of the edited video output
sequence.
[0006] Exemplary video editing products, such as Pinnacle Studio (Pinnacle
Systems, Inc.,
Mountain View, CA), are not typically equipped with the powerful audio editing
capabilities of
music production applications such as, for example, the Cubase System SX
(Steinberg Media
Technologies GmbH) application and its progeny. Such music production
applications allow
segmentation of a musical piece into discrete audio blocks, which can be
manually configured in
a desired order by an author of an audio sequence.
[0007] Applications for shortening or lengthening the performance duration of
a music piece
include systems that rely on audio block looping, on uniformly regulating the
reproduction speed
of the piece, and/or on rearranging the audio blocks in orders that do not
necessarily preserve the
musical integrity, or theme, of the edited music piece.
[0008] Summary
According to one aspect of the present invention, there is provided a
system for automatically generating a composed audio output sequence of a
prescribed duration, the composed audio sequence associated with a musical
piece, comprising: a data storage library populated with a plurality of audio
parts
having predetermined durations and associated with at least one musical piece,
the plurality of audio parts including one or more intro parts suitable as
introductions to an audio selection, one or more outro parts suitable as
endings to
the audio selection, and optionally one or more main parts, marker data
associated with each audio part indicative of a musical structure type and
optionally one or more part properties, one or more audio selections
associated
with the at least one musical piece, wherein each audio selection is comprised
of
a plurality of the intro parts having distinct durations, one or more of the
outro
parts having distinct durations, and optionally one or more main parts having
distinct durations and suitable for use in the audio selection between a
particular
intro part and a particular outro part, and one or more templates associated
with
the one or more audio selections, each template comprised of a style
identifier
and text string representing a preferred ordering of one or more of the audio
parts;
an interface permitting a user: to prescribe a target duration, to choose a
musical
style, to choose a musical piece from among the at least one musical piece in
the
data storage library that conform to the chosen style, to choose an audio
selection
from among the audio selections in the data storage library that conform to
the
chosen style and that have an available duration range encompassing the target
duration, and to preview the chosen audio selection; and a sequence composer
for identifying one or more audio selections of the one or more audio
selections in
the data storage library to the user that conform to the chosen style and that
have
respective duration ranges encompassing the target duration and for
automatically
composing a sequence of audio parts conforming to the chosen selection, the
sequence derived from the template associated with the chosen selection, the
sequence having an intro part, an outro part, and optionally one or more main
parts in a preferred ordering and further having a duration substantially
matching
the target duration.
According to another aspect of the present invention, there is
provided a method of automatically generating a composed audio sequence of a
prescribed duration, the composed audio sequence associated with a musical
piece, comprising the steps of: providing a data storage library populated
with a
plurality of audio parts having predetermined durations and associated with at
least one musical piece, the plurality of audio parts including one or more
intro
parts suitable as introductions to an audio selection, one or more outro parts
suitable as endings to the audio selection, and optionally one or more main
parts,
marker data associated with each audio part indicative of a musical structure
type
and optionally one or more part properties, one or more audio selections
associated with the at least one musical piece, wherein each audio selection
is
comprised of a plurality of the intro parts having distinct durations, one or
more of
the outro parts having distinct durations, and optionally one or more main
parts
having distinct durations and suitable for use in the audio selection between
a
particular intro part and a particular outro part, and one or more templates
associated with the one or more audio selections, each template comprised of a
style identifier and text string representing a preferred ordering of one or
more of
the audio parts; receiving a musical style choice; receiving a musical piece
choice
from among the one or more musical pieces in the data storage library that
conform to the chosen style; identifying one or more audio selections of the
one or
more audio selections in the data storage library that conform to the chosen
style
and that have associated duration ranges encompassing a target duration;
displaying the identified one or more audio selections; receiving an audio
selection
choice from among the identified one or more audio selections that are
displayed;
and automatically composing a sequence of audio parts conforming to the chosen
audio selection, the sequence derived from the template associated with the
chosen audio selection, the sequence including an intro part, an outro part,
and
optionally one or more main parts in a preferred ordering and further having a
duration substantially matching the target duration.
According to still another aspect of the present invention, there is
provided a computer program product, comprising: a computer readable medium;
computer program instructions stored on the computer readable medium that,
when executed by a computer, cause the computer to perform a method for
automatically generating a composed audio sequence of a prescribed duration,
the composed audio sequence associated with a musical piece, the method
comprising: providing a data storage library populated with a plurality of
audio
parts having predetermined durations and associated with at least one musical
piece, the plurality of audio parts including one or more intro parts suitable
as
introductions to an audio selection, one or more outro parts suitable as
endings to
the audio selection, and optionally one or more main parts, marker data
associated with each audio part indicative of a musical structure type and
optionally one or more part properties, one or more audio selections
associated
with the at least one musical piece, wherein each audio selection is comprised
of
a plurality of the intro parts having distinct durations, one or more of the
outro
parts having distinct durations, and optionally one or more main parts having
distinct durations and suitable for use in the audio selection between a
particular
intro part and a particular outro part, and one or more templates associated
with
the one or more audio selections, each template comprised of a style
identifier
and text string representing a preferred ordering of one or more of the audio
parts;
receiving a musical style choice; receiving a musical piece choice from among
the
one or more musical pieces in the data storage library that conform to the
chosen
style; identifying in a list one or more audio selections of the one or
more
audio selections in the data storage library that conform to the chosen style and
that
have associated duration ranges encompassing a target duration; displaying the
identified one or more audio selections; receiving an audio selection choice
from
among the identified one or more audio selections that are displayed; and
automatically composing a sequence of audio parts conforming to the chosen
audio selection, the sequence defined by the template associated with the
chosen
audio selection, the sequence including an intro part, an outro part, and
optionally
one or more main parts in a preferred ordering and further having a duration
substantially matching the target duration.
[0009] A system and method are provided for automatically generating a sequence
of audio data
for producing an audio output sequence that has a prescribed duration and that
is based upon a
chosen musical piece such as, for example, a desired song.
[00010] The audio output sequence is automatically composed on a block-by-block
basis,
utilizing marked audio blocks (also referred to herein as audio "parts")
associated with the
chosen musical piece and stored in a data storage library. A list of musical
selection options is
generated that satisfy a chosen musical style and a prescribed target
duration. In response to a
choice of one of the musical selections, the audio sequence is automatically
composed. The
composed audio sequence typically extends over a plurality of musical time
units, its duration
matching the target duration. The target duration may be prescribed by
directly entering a time
value into the system, or may be inherited from a video sequence with which
the audio sequence
is associated. In the latter embodiment, the system may be used in conjunction
with a video
editing application, wherein editing the length of the video sequence results
in a dynamic
adjustment of the available musical selections from which a user may choose to
accompany the
edited video sequence. Once a user chooses a musical selection, the audio
sequence is
automatically generated and previewed for the user through a playback device.
If satisfactory to
the user, the audio output sequence may be stored and/or placed in a timeline
synchronized with
the edited video sequence.
[00011] An audio output sequence of practically any prescribed length may be
automatically
generated, with or without looping, while preserving the thematic musical
integrity of the chosen
musical selection. To this end, the data storage library is populated with
data files including
MIDI data and associated metadata used by the system to identify a list of
available audio
selections meeting the genre and target duration criteria, and to generate one
or more audio
output sequences. Authors add editorial annotations to ("mark up") the data
files that facilitate
the subsequent automatic generation of the audio output sequence, using a
syntax that defines
musical structure and functionality of small segments of audio data, referred
to herein as "audio
CA 02620483 2008-02-07
Express Mailing Label No. EQ 967995431 US
Atty. Docket No. A2007001(2)
parts". Audio parts may be comprised of single segments, or longer data
structures comprised of
a plurality of audio data segments with unique identification information.
[0010] Accordingly, in one aspect, a system is provided for populating the
data storage library
with musical selections associated with a musical piece, such as a song. The
system includes
means for permitting an author to define a plurality of audio parts of desired
durations that are
associated with a musical piece, to assign marker data to each audio part
indicative of a musical
structure type and optionally one or more part properties, and to define one
or more audio
"selections" of the musical piece and one or more "templates" associated with
the musical piece.
Each audio selection is comprised of a plurality of "intro" parts, one or more
"outro" parts, and
optionally one or more "main" parts suitable for ordering in the audio
selection between a
particular intro part and an outro part. Each template is comprised of a
unique template type
identifier and a text string representing a preferred ordering of the audio
parts. The system
further comprises means for exporting the audio parts, marker data,
selections, templates and
MIDI data associated with the audio parts to the data storage library. A
detailed description of an
exemplary syntax useful in the system is provided below.
[0011] In another embodiment, there is provided a method and a computer
program product
for using a computer system to perform the actions described above.
[0012] In another aspect, a system is provided for automatically generating
the composed
audio output sequence having the prescribed duration. This system includes the
data storage
library already populated with exported audio parts, marker data, selections,
templates and
MIDI data associated with one or more musical pieces or songs. The
system further
comprises a user interface permitting a user to prescribe a target duration,
and to choose a
musical style, a musical piece from among the musical pieces (or songs)
available in the data
storage library that conform to the chosen style, and an audio selection from
among the audio
selections in the data storage library that conform to the chosen style and
that have an available
range of durations that accommodate or encompass the target duration. The user
interface also
permits the user to preview the chosen audio selection. The system includes
means for
identifying (via listing in the user interface) the audio selections
conforming to the chosen style
and target duration criteria. The system optionally, but preferably, adjusts
this list dynamically in
response to changes in the target duration. For example, if the system is
being used in
conjunction with a video editing application such as, for example, Pinnacle
Studio, the length of
an edited video sequence may change numerous times, which may result in the
addition or
deletion of musical sequences from the list of available sequences, if a
greater or lesser number
of musical selections can accommodate the target duration inherited from the
edited video
sequence. The system is further comprised of a sequence composer for
automatically composing
an output sequence of audio parts that conforms to the chosen selection,
wherein the sequence is
derived from the template associated with the chosen musical selection, and
wherein each audio
output sequence includes an intro part, an outro part, and optionally one or
more main parts in a
preferred ordering. Once a musical selection is chosen, the sequence composer
generates the
output audio sequence for preview by the user. If the author of the data
storage library entries has
properly marked the audio parts and otherwise defined the metadata
associated with a
musical piece, the automatic lengthening or shortening of the audio output by
the sequence
composer will not affect the musical thematic integrity of the music composed.
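The listing step described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the AudioSelection record, its field names, the style label "Blues" and all durations are hypothetical and do not come from the data storage library format described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSelection:
    """Hypothetical library entry; the field names are illustrative only."""
    name: str
    style: str
    min_duration: float  # shortest composable duration, in seconds
    max_duration: float  # longest composable duration, in seconds


def available_selections(library, style, target_duration):
    """Return selections of the chosen style whose range covers the target."""
    return [
        sel for sel in library
        if sel.style == style
        and sel.min_duration <= target_duration <= sel.max_duration
    ]


# Made-up library contents for demonstration.
library = [
    AudioSelection("Version 1", "Blues", 10.0, 240.0),
    AudioSelection("Fallen Breath", "Blues", 30.0, 90.0),
    AudioSelection("Medley", "Blues", 120.0, 720.0),
    AudioSelection("Other Song", "Rock", 10.0, 300.0),
]

# Re-running the filter each time the edited video length changes yields
# the dynamically adjusted list.
for target in (45.0, 150.0):
    names = [sel.name for sel in available_selections(library, "Blues", target)]
    print(target, names)
```

With these made-up entries, a 45 second target lists "Version 1" and "Fallen Breath", while a 150 second target lists "Version 1" and "Medley".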
[0013] The automatically composed audio output sequence has a duration that
substantially
matches the target duration. The meaning of substantially matching includes an
exact match in
some embodiments. In other embodiments, the system makes fine adjustments in
the composed
sequence duration through one or more of a variety of techniques, including
adding silence data
to the composed audio output sequence of parts, trimming one or more of the
parts comprising
the output sequence, and/or globally adjusting the tempo of all of the parts
of the output
sequence upon preview rendering. Further, the sequence composer may compensate
for playback
trailing effects by targeting an output sequence duration that is slightly
shorter than the
prescribed target duration, i.e., in order to permit a natural-sounding decay
or to avoid cutting off
reverberation at the end of a piece.
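One possible way to choose among the fine adjustment techniques named above is sketched below. The order in which the techniques are tried, the function name fine_adjust and the 3% tempo threshold are assumptions made for illustration; the description does not specify how the sequence composer decides between them.

```python
import math


def fine_adjust(composed, target, tail_allowance=0.0, max_tempo_change=0.03):
    """Pick a fine-adjustment strategy for a composed sequence.

    composed, target and tail_allowance are in seconds. Returns a tuple
    (strategy, amount). max_tempo_change is a hypothetical threshold.
    """
    # Aim slightly short of the target to leave room for decay or reverb.
    goal = target - tail_allowance
    if math.isclose(composed, goal):
        return ("exact", 0.0)
    # Tempo factor that would make the composed duration equal the goal.
    ratio = composed / goal
    if abs(ratio - 1.0) <= max_tempo_change:
        return ("tempo", ratio)
    if composed < goal:
        return ("pad_silence", goal - composed)
    return ("trim", composed - goal)


print(fine_adjust(28.0, 30.0))  # too short by 2 s: pad with silence
print(fine_adjust(33.0, 30.0))  # too long by 3 s: trim
print(fine_adjust(30.3, 30.0))  # within 3%: global tempo change
```

A small mismatch is absorbed by a global tempo change, while larger gaps fall back to silence padding or trimming.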
[0014] In another embodiment, there is provided a method and a computer
program product
for using a computer system to perform the actions described above.
[0015] Other features and advantages of the present invention will become
readily apparent to
artisans from the following description of exemplary embodiments, taken in
conjunction with the
accompanying drawings, which illustrate, by way of example, principles of the
present invention.
[0016] Brief Description of the Drawings
[0017] Fig. 1 is a block diagram illustrating a configuration and working
environment for an
authoring system and an automatic audio output sequence composing system in
accordance with
an embodiment of the invention.
[0018] Fig. 2 is a simplified flow chart illustrating an authoring process in
accordance with the
present invention.
[0019] Fig. 3 is a computer display screen shot illustrating an exemplary
user interface for
defining audio parts, play orders and marker track data associated with a
musical piece.
[0020] Fig. 4 is another computer display screen shot of the exemplary user
interface.
[0021] Figs. 5A-5E illustrate the contents of an exemplary project file
including audio block
and marker data, authored with an authoring system in accordance with the
present invention.
[0022] Fig. 6 is a simplified flow chart illustrating an exemplary automatic
audio output
sequence composing process in accordance with the present invention.
[0023] Fig. 7 is a computer display screen shot illustrating an exemplary
user interface for
using the automatic audio output sequence composing system.
[0024] Fig. 8 is a simplified flow chart illustrating an exemplary algorithm
for composing an
audio output sequence given a target duration and user choice of an audio
selection.
[0025] Detailed Description of the Invention
[0026] A. Overview
[0027] Referring now to Fig. 1, a block diagram is shown of a configuration
capable of
performing several functions of the invention. Within the configuration is an
authoring system
100, including an authoring interface 104 and an audio part definition and
marking system 106.
Authoring system 100 permits an author to populate a data storage library 202
with audio
"selections" associated with a musical piece such as, for example, a song. The
audio selections
are contained within project files, the complete contents of which will be
described below. Also
depicted in the exemplary configuration is a composing system 200 for
automatically generating
a composed audio output sequence associated with the musical piece and having
practically any
user-prescribed duration. The primary components of composing system 200
include the data
storage library 202 populated with the audio selections, a user interface 204
and a sequence
composer 206. Composed output audio sequences are output to a playback engine
201 and mixer
203.
[0028] In a preferred embodiment, the definition/marking system 106 and
sequence composer
206 are comprised of computer-executed software applications, executing on
computer
processors that would typically, but not necessarily, be integrated into
distinct computers. As
such, the following description relates to computer systems. However,
different computer
platforms or hardware-only implementations are also considered within the
scope of the
invention.
[0029] The user interfaces 104, 204 are each electronically coupled to their
respective
computers for inputting choices and/or instructions to the computers. For the
purposes of clarity,
the term "author" is used to refer to a user of interface 104, through which
definition/marking
system 106 is utilized to populate data storage library 202, and the term
"user" is used to refer to
a user of interface 204, through which sequence composer 206 is utilized to
automatically
generate a composed audio sequence of a user-prescribed duration. As will be
described below,
the user may prescribe the target duration for the composed audio sequence by
directly entering a
time value through user interface 204. Alternatively, the user may indirectly
prescribe the target
duration, in configurations wherein composing system 200 is used in
conjunction with an
optional video editing system 208 such as, for example, the Pinnacle Systems
STUDIO application.
In such embodiments, an alteration of the length of an edited video clip or
sequence is
determined by sequence composer 206, and results in a corresponding, dynamic
change in the
target duration for the audio output sequence that is intended to accompany
the edited video clip.
[0030] In general, authoring system 100 is utilized by the author to define
blocks of audio data
associated with a song, herein referred to as "audio parts", which in various
orderings define
musical compositions thematically related to the song, as well as metadata
associated with the
audio parts. In the description below, definition/marking system 106 is
described in the context
of a CUBASE SX 3.1 (Steinberg Media Technologies GmbH) software application
(or a later
version), and the marked up audio parts are described in the context of CUBASE
project MIDI
files employing a unique syntax (described below), while retaining
compatibility with standard
MIDI files as defined by the MIDI 1.0 Spec, July 1998. However, all of the
techniques disclosed
could be implemented using any application permitting the definition and
unconstrained marking
of audio parts, and the authored projects could equally be authored as WAV
files. The only
technical distinctions, which should be readily apparent to those of skill in
the art, would be in
exporting the author-defined metadata to the data storage library 202, and in
parsing the
metadata for the purposes of automatically composing the user-defined audio
sequence(s).
[0031] Once the data storage library 202 is populated with project files
including audio parts
and artistically appropriate marker tracks (described below) associated with
one or more musical
pieces, composing system 200 permits the user to prescribe the target duration
and to choose
from among one or more musical styles, and then automatically composes an
output audio
sequence (song) based on these inputs utilizing the data storage library 202
entries. Preferably, as
the target duration is changed (either directly by the user, or through
inheritance of the video clip
length), the audio output song duration is correspondingly dynamically
changed.
[0032] B. The Authoring Environment
[0033] The process of specifying characteristics of music and sound is both
musical and
technical. This process is used to provide as much information as possible
about each piece of
music or sound so that the sequence composer 206 can make informed, musical
decisions, when
it manipulates the music according to requests from users.
[0034] Fig. 2 shows a simplified flow chart illustrating an exemplary process
250 for
populating data storage library 202 with audio selections associated with a
musical piece. In step
252, the authoring system permits an author to create a project file related
to a musical piece. In
step 254, the author uses the authoring system to define a plurality of audio
parts of desired
durations from the musical piece.
[0035] In step 256, the author adds a marker track to the project file, and
assigns marker data
to each audio part indicative of a musical structure type of the audio part,
and optionally one or
more properties that the part possesses. Exemplary musical structure types
include verses,
choruses, bridges, ends, intros, outros, headers, sections, versions, effects
and silence. Exemplary
part properties include the suitability of the audio part to be trimmed, faded
out, looped, used as a
step-in block, and/or used as a step-out block. The author further defines one
or more musically
thematic templates, each comprised of a template type identifier and a text
string representing a
preferred ordering of one or more of the audio parts, and one or more audio
selections, each
comprised of a plurality of intro parts of distinct durations, one or more
outro parts of distinct
durations, and optionally one or more main audio parts of distinct durations
suitable for use in
the audio selection between a particular intro part and a particular outro
part. Optionally, the
authoring system permits the author to associate effects parameters with the
audio parts that are
readable by a rendering engine to control mixer playback. In this manner, the
present invention
uses audio parts as constructs to contain global control data, such as a
reverb. The authoring
system also optionally permits definition of playback automation information
associated with
one or more of the audio parts and readable by a rendering engine so as to
control audio output
sequence playback.
[0036] Finally, in step 258, once the author has completed defining and
annotating, the
authoring system exports the plurality of audio parts, marker data, audio
selections, templates
and sound file (e.g., MIDI) data associated with the musical piece as a
project file to the data
storage library.
[0037] An implementation of the authoring system 100 will now be described in
detail.
[0038] Cubase SX provides an author with means for defining numerous data
structures and
associated metadata related to a musical piece. The "Cubase SX/SL 3 Operation
Manual"
describes in detail existing functionality in the application.
Only those Cubase features relevant to the exemplary embodiment of authoring
system 100 will
be discussed here. A .cpr project is the basic file type in Cubase, and
typically includes the
audio, WAV or MIDI, automation, and settings for a piece of music. The
authoring system 100
leverages Cubase's capability to permit a great deal of customization by an
author.
[0039] With reference to Fig. 3, which illustrates a screen shot 210 of a
Cubase SX window,
an author creates a MIDI Project by inputting a musical piece, represented as
multiple channels
of MIDI data 212, and defining audio parts 214, i.e. audio blocks of the
musical piece, with
desired durations using a Play Order Track function. Once the audio parts are
defined, they may
be arranged (with reference to Fig. 3) in any preferred ordering in a Play
Order List 216 using a
Play Order Editor 218. Multiple play orders can be easily created and
previewed for musicality.
Audio parts should be contiguous, in order to permit calculation of each
part's duration.
Once the play orders are approved, the audio part and play order information
are annotated. With
reference to Fig. 4, the author utilizes the Cubase Marker Track function to
add a marker track
220 to the project. In the marker track 220, markers 222 are assigned to each
audio part 214
and/or groupings of audio parts (also referred to as audio parts). The marker
data (embedded
in the MIDI file) contain all the instructions needed by the sequence composer
206 to perform
its audio output sequence generation, and to alter the characteristics of the
resultant new
soundfile in various ways. The marker files as described in this example are
therefore an
"overlay" technology as opposed to an "embedded" technology, although they do
not exclude
combining the digital information file with the digital instruction file to
form a new digital
information file in an "embedded" technology. The marker files do not directly
interact with or
alter the existing MIDI files in any way.
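The remark above that audio parts should be contiguous so that each part's duration can be calculated may be illustrated as follows. This is a minimal sketch assuming that each part is known by its start position in beats and that the tempo is constant; the function name, the part names and the beat positions are hypothetical.

```python
def part_durations(starts_in_beats, end_in_beats, bpm):
    """Map each part name to its duration in seconds.

    starts_in_beats is a list of (name, start beat) pairs and end_in_beats
    is where the last part ends. Because the parts are contiguous, each
    part ends exactly where the next one starts.
    """
    ordered = sorted(starts_in_beats, key=lambda item: item[1])
    boundaries = [start for _, start in ordered] + [end_in_beats]
    seconds_per_beat = 60.0 / bpm
    return {
        name: (boundaries[i + 1] - boundaries[i]) * seconds_per_beat
        for i, (name, _) in enumerate(ordered)
    }


# Made-up positions: an intro, a verse, a chorus and an outro at 120 bpm.
durations = part_durations(
    [("I1", 0), ("A1", 8), ("B1", 24), ("O1", 40)], end_in_beats=48, bpm=120
)
print(durations)
```

If the parts were not contiguous, the gap between one part and the next would be silently counted as part of the earlier one, which is why contiguity matters for this calculation.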
[0040] In a preferred, non-limiting example syntax, each marker 222 is a
Cubase Cycle Marker
with a text string of the form,
Mnn^part properties^{marker name}^[template]
where,
M represents a musical structure type,
nn is a unique identifier (e.g., a number between 1 and 99),
part properties is an optional indication of one or more properties of the
audio part,
marker name is an optional marker name text string (enclosed in {}),
template is an optional text string (enclosed in []) used in longer audio parts
comprised of a listing of
selected other audio parts, and represents a preferred ordering of the other
audio parts,
and
^ represents a space in the marker text string.
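A marker string of the form described above could be parsed along the following lines. This is a sketch under assumptions: fields are taken to be separated by whitespace, the marker name to be enclosed in braces and the template in square brackets, and the part properties to be written as letters with optional commas. The Marker type and parse_marker function are hypothetical names, not part of Cubase or of the described system.

```python
import re
from typing import NamedTuple, Optional

MARKER_RE = re.compile(
    r"^(?P<type>[A-Z])(?P<id>\d{1,2})"            # structure type + identifier
    r"(?:\s+(?P<props>[ULTFSR](?:,?[ULTFSR])*))?"  # optional part properties
    r"(?:\s+\{(?P<name>[^}]*)\})?"                 # optional {marker name}
    r"(?:\s+\[(?P<template>[^\]]*)\])?\s*$"        # optional [template]
)


class Marker(NamedTuple):
    structure_type: str
    identifier: int
    properties: tuple
    name: Optional[str]
    template: Optional[str]


def parse_marker(text):
    """Split a marker text string into its fields, or raise ValueError."""
    match = MARKER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a marker string: {text!r}")
    props = tuple(ch for ch in (match["props"] or "") if ch != ",")
    return Marker(match["type"], int(match["id"]), props,
                  match["name"], match["template"])


print(parse_marker("I3"))
print(parse_marker("A1 S,R"))
print(parse_marker("I7 {Fallen Breath} [I7B1B2B3O3]"))
```

Missing optional fields come back as an empty tuple or None, so a bare marker such as "I3" still parses.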
[0041] Table One illustrates preferred, non-limiting examples of a variety of
possible musical
structure types and corresponding part properties assignable to an audio part.
Type     Meaning  Comment                                           Part Properties
A        Verse    start of verse.                                   U,L,S,R
B        Chorus   start of chorus.                                  U,L,S,R
C        Bridge   start of bridge.                                  U,L,S,R
E        End      any parts after this marker will be ignored
I        Intro    start of variant n of intro. Optional {} and [].  T,S
O        Outro    start of variant n of outro.                      U,F,R
H        Header   style template. Optional {} and [].
S        Section  any contiguous part with Step-in and Step-out
V        Version  start of new version.
X        Effect   global FX parameter settings 0 - 100
unnamed  Silence  will be ignored.
Table One
[0042] In this exemplary syntax, intro part types indicate that the audio part
is suitable as an
introduction to an audio selection, and outro part types are indicative that
the audio part is
suitable as an ending of an audio selection. Intros are comprised of single
audio parts, of which
some, referred to as "extended" intros, are typically long enough (e.g., 15 to
30 seconds) to
create musical variations with a distinct "feel" or theme, particularly when
combined with other
carefully selected audio parts. Defining multiple intros and outros provides a
mechanism by
which sequence composer 206 can make small adjustments in the duration of
composed audio
output sequences. Intro parts and outro parts, taken together, should provide
durations in the
range of 1 to 7 musical bars. For songs at a slow tempo (e.g., < 100 bpm),
extra short intro
parts of 0.5 bars are helpful. An intro part may optionally be designated as
supporting a crash
start (trimmable), and outro parts may be allowed to fade out (fadeable).
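The way multiple intros and outros allow small duration adjustments can be sketched as a search over intro and outro pairs. The code below assumes 4 beats per bar and a constant tempo; the part names, bar lengths and the function names bars_to_seconds and best_intro_outro are hypothetical and only illustrate the idea, not the sequence composer's actual procedure.

```python
from itertools import product


def bars_to_seconds(bars, bpm, beats_per_bar=4):
    """Convert a length in musical bars to seconds at a constant tempo."""
    return bars * beats_per_bar * 60.0 / bpm


def best_intro_outro(intros, outros, remainder):
    """Pick the (intro, outro) names whose total is closest to remainder.

    intros and outros map part names to durations in seconds; remainder is
    the time left after the main parts have been placed.
    """
    return min(
        product(intros, outros),
        key=lambda pair: abs(intros[pair[0]] + outros[pair[1]] - remainder),
    )


# Made-up parts for a slow song at 60 bpm, including a half-bar intro.
bpm = 60
intros = {f"I{n}": bars_to_seconds(b, bpm)
          for n, b in enumerate((0.5, 1, 2, 4), start=1)}
outros = {f"O{n}": bars_to_seconds(b, bpm)
          for n, b in enumerate((1, 2, 3), start=1)}

print(best_intro_outro(intros, outros, 14.0))
print(best_intro_outro(intros, outros, 24.0))
```

With these values the intro and outro pairs together span 1.5 to 7 bars, so the remainder left by the main parts can be matched to within a bar or so.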
[0043] As the term is used herein, "main" parts are suitable for use in an
audio selection
between a particular intro part and a particular outro part, such as verses,
choruses and bridges.
[0044] A "version" is a self-contained play order of audio parts that has
multiple short intros
and outros of different lengths, and that uses as many of the audio parts as
possible (but
preferably not extended intro parts) for a typical play duration of
approximately 3-4 minutes.
Extended length intros provide the primary musical themes for generating
interesting variations
when combined with selected audio parts. Each song project file typically
contains multiple
versions that can be dramatically different even though broadly complying with
a specified
musical style. It is suggested that within each version there should be at
least three (3) extended
intros, each having its own name and template. With reference to Fig. 4, in
Version 1 232 of the
song Blue Wednesday, I1-I4 represent short intro parts, and I5-I7 represent
extended intro parts. I5 is
also used as an extended intro for the variation "Should've Left" 229.
Medleys, such as Blue
Wednesday Medley 234, permit the generation of more complex output songs, and
can extend
for significant durations (e.g., 12 minutes or longer.) Each extended intro,
version and medley is
defined to include a template (such as, for example, template 228C), which is
a text string
representing a preferred ordering of audio parts. When automatically composing
output audio
sequences, sequence composer 206 will adhere as closely as possible to the
ordering represented
in a selected template, and will only compose new orderings when no template
is associated with
a user's chosen audio selection.
[0045] In the exemplary syntax, the optional audio part properties may include one or more of:
U Upbeat - signifying that the audio part includes an additional 1 beat from the preceding audio part;
L Loopable - signifying that the audio part can be looped;
T Trimmable - signifying that, if necessary, the audio part can be crash started up to 4 beats in (only applies to intro parts);
F Fadeable - signifying that the audio part can be optionally faded out (only applies to outro parts);
S Step-in - signifying that the audio part represents a preferable audio part to segue after an intro part or prior section part; and
R Step-out - signifying that the audio part represents a preferable audio part to segue before another section part or outro part.
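The property codes listed above can be modelled as a set of bit flags. The following is a minimal sketch only; the names PartProperty, parse_properties and validate are hypothetical and are not part of the described syntax, and the part type strings "intro" and "outro" are assumed labels.

```python
from enum import Flag, auto


class PartProperty(Flag):
    """Optional audio part properties (single-letter codes from the list above)."""
    UPBEAT = auto()     # U
    LOOPABLE = auto()   # L
    TRIMMABLE = auto()  # T (intro parts only)
    FADEABLE = auto()   # F (outro parts only)
    STEP_IN = auto()    # S
    STEP_OUT = auto()   # R


# Hypothetical lookup table from code letter to flag.
_CODES = {
    "U": PartProperty.UPBEAT,
    "L": PartProperty.LOOPABLE,
    "T": PartProperty.TRIMMABLE,
    "F": PartProperty.FADEABLE,
    "S": PartProperty.STEP_IN,
    "R": PartProperty.STEP_OUT,
}


def parse_properties(codes: str) -> PartProperty:
    """Combine a string of code letters such as "SR" into one flag value."""
    props = PartProperty(0)  # empty set of properties
    for ch in codes.upper():
        if ch not in _CODES:
            raise ValueError(f"unknown property code: {ch!r}")
        props |= _CODES[ch]
    return props


def validate(part_type: str, props: PartProperty) -> None:
    """Enforce the two type restrictions stated in the list."""
    if PartProperty.TRIMMABLE in props and part_type != "intro":
        raise ValueError("Trimmable applies only to intro parts")
    if PartProperty.FADEABLE in props and part_type != "outro":
        raise ValueError("Fadeable applies only to outro parts")
```

For example, parse_properties("SR") yields a value containing both the step-in and step-out flags.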
[0046] With reference again to Fig. 4, some examples of markers will be
described. For
example, audio part 226 represents a simple intro I3 with no optionally
assigned part properties.
Audio part 228 also has no optionally assigned part properties, but represents
an extended intro
part, including an "Intro" structure type 228A, a unique numerical identifier
("7"), a marker
name 228B ("Fallen Breath"), and a template 228C comprised of a text string
indicating a
preferred ordering of audio blocks ("I7B1B2B3I7B1B2B3I7B1B2B3O3"). Audio part
230
represents a verse (a type of main audio part) that has been
identified as suitable for
stepping-in ("S") or stepping-out ("R"). As the term is used herein, an audio
"selection" refers to
musical data structures such as, for example, version V1 232 and medley H 234
of "Blue
Wednesday", that each are comprised of a plurality of intro parts having
distinct durations, one
or more outro parts having distinct durations, and optionally (but preferably)
one or more main
audio parts of distinct durations.
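A template text string of the kind quoted above can be split into an ordered list of part references. The sketch below assumes that each reference is a single type letter followed by a decimal identifier, and it reads the final token of the example string as outro O3; the function name parse_template is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed token shape: one type letter followed by a numerical identifier.
_TOKEN = re.compile(r"([A-Za-z])(\d+)")


def parse_template(template: str) -> List[Tuple[str, int]]:
    """Return the play order encoded in a template as (type letter, id) pairs."""
    tokens = []
    pos = 0
    while pos < len(template):
        match = _TOKEN.match(template, pos)
        if match is None:
            raise ValueError(
                f"unexpected character at offset {pos}: {template[pos]!r}"
            )
        tokens.append((match.group(1).upper(), int(match.group(2))))
        pos = match.end()
    return tokens


order = parse_template("I7B1B2B3I7B1B2B3I7B1B2B3O3")
```

Here order begins with ("I", 7) and ends with ("O", 3), with the I7-B1-B2-B3 group appearing three times in between.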
[0047] The authoring system 100 permits the composing skills of the author to
be captured by
defining audio parts and templates in a manner that preserves thematic musical
integrity. The
sequence composer 206 automatically generates audio output sequences (songs)
from the
templates. If the templates are appropriately defined, the duration of the
composed output
sequences can be dynamically adjusted to accommodate longer or shorter target
durations while
preserving the musical theme. The author creates audio parts of the various
types that have
different lengths, or durations, that can be edited and arranged by sequence
composer 206 (as
described below) in multiple ways to obtain output audio sequences matching
user choices and
prescribed target durations.
[0048] Once the author is done annotating the audio parts and defining
templates for a musical
piece, the marked-up MIDI file is exported from Cubase to the data storage
library. Figs. 5A-5F
illustrate a text conversion of the binary content of the MIDI file for the
"Blue Wednesday" song
as partially shown in Fig. 4. In particular, Fig. 5A through a portion of Fig.
5D illustrate the
marker track data associated with each of the audio parts, versions and medley
associated with
the song "Blue Wednesday". The remainder of Fig. 5D, and Figs. 5E-5F,
illustrate the
associated Event Track data, including instrument and playback control
information. The
exported file automatically includes certain information, such as the project
name, tempo and
time signature.
[0049] It should be readily apparent to artisans that the syntax described
above represents
merely a starting point for audio part marking. For example, additional
metadata could be
assigned to the audio parts that is readable by the playback engine 201 so as
to control audio
output playback. For example, dynamic changes in volume and pan automation are
definable
and stored with the MIDI Event Track data, as shown in Figs. 5D-5F. There are
many other
parameters that can be automated with this scheme, e.g. any of the parameters
represented by
127 MIDI Controller Codes.
[0050] C. Automatic Song Generation
[0051] A description of the operation of composing system 200 will now be
provided with
reference to Figs. 6-8.
[0052] Fig. 6 shows a simplified flow chart illustrating an exemplary process
350 for
automatically generating an audio output sequence associated with a musical
piece (e.g., a song)
and having a prescribed duration. A number of the steps (351, 354 and 358)
described preferably
occur in the full usage of the composing system 200, but are optional in that
the input values
resulting from performance of these steps could be gathered without prompting
a user of the
composing system to input the values. In step 351, a target duration is
prescribed. As noted
above, the target duration could be directly entered by the user, or inherited
from an edited video
clip or sequence. Data storage library 202 is made available to sequence
composer 206 (step
352), which displays a list of available musical style choices in the library
202.
[0053] Sequence composer 206 then receives (step 356) a musical style choice
from the user,
and in turn displays (step 358) a list of available musical pieces (or songs)
in the library that
conform to the chosen style. Sequence composer 206 then receives (step 360) a
musical piece
choice from the user. At this point, sequence composer 206 identifies (step
362) and displays
(step 364) one or more audio selections in the library that conform to the
chosen style and that
have associated duration ranges encompassing the prescribed target duration,
preferably by
calculating the duration ranges of the audio selections conforming to the
chosen style, and
comparing those ranges to the target duration. In preferred embodiments, the
list of available
selections is dynamically adjusted in accordance with changes in the target
duration.
[0054] In step 366, sequence composer 206 receives the user's choice of
preferred audio
selection, and in response, automatically generates an audio output sequence
368 having a duration
substantially matching the target duration from the template data associated
with the chosen
audio selection. The resultant audio output sequence is then output (step 370)
for preview by the
user, and/or written to an audio track in a timeline associated with the
edited video sequence.
[0055] Fig. 7 illustrates an exemplary user interface 204 presented to the
user of the
composing system. As noted above, in preferred embodiments, user interface 204
is called up as
an adjunct routine of a video editing system, at a point when the user is
satisfied with the edited
video clip and when musical accompaniment for the edited video clip is
desired. One exemplary
video editing system with which the composing system may be employed is Pinnacle
Systems
Studio 10. The target duration for the output audio sequence is, thus,
inherited from the edited
video clip. In other embodiments, the target duration may be directly entered
by the user in target
duration field 246.
[0056] User interface 204 presents three window panes to the user, style pane
240, song pane
242, and version pane 244. Sequence composer 206 retrieves the information
necessary to
populate these panes with musical choices for the user from the marked up
project file entries
available in data storage library 202.
[0057] In operation, the user chooses a musical style from among the musical
style choices 241
listed in style pane 240, and in response, sequence composer 206 lists the
available musical
piece(s) 243, or songs, that meet the chosen style. An XML file details the
contents of the data
storage library 202 and provides additional information about each song
including author name,
tempo (bpm), musical style, and a 25 word description. This information is
displayed in the UI as
a "Tool Tip", i.e. when the user's mouse hovers over a song title, the
appropriate information is
briefly shown. This XML file is the single method for maintaining, expanding
or deleting access
to items from the library. This XML file is also used by the sequence composer
206 to identify
the styles of the musical pieces 243 available in the data storage library
202. Sequence
composer 206 then identifies and lists in version pane 244 one or more audio
selections 248 that
conform to the chosen style and that can accommodate the prescribed target
duration. Each of the
audio selections 248, as noted above, includes a number of intro parts, outro
parts and optionally
one or more main parts. These various parts preferably have a variety of
distinct durations, which
permits sequence composer 206 to generate, for each audio selection, audio
output sequences in
a range of durations. The list of available audio selections 248 is compiled
by sequence
composer by identifying the audio selections conforming to the chosen style
and that have
respective duration ranges supporting or encompassing the prescribed target
duration. In
preferred embodiments, changes to the target durations (e.g., lengthening or
shortening resulting
from further editing of the associated video clip) result in dynamic
adjustments to (e.g., pruning
of) the list of available musical pieces 243 and associated audio selections
248 presented to the
user.
[0058] Sequence composer 206 re-creates the audio parts and play orders by
scanning the
templates in the stored project files, and calculating the start, end and
duration of each part from
adjacent markers in the associated marker track for the project. If a template
is not available for a
version, variation or medley, sequence composer infers an audio part sequence
from the
available audio parts.
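The calculation of start, end and duration from adjacent markers can be sketched as follows. The sketch assumes that marker positions are expressed in bars, that tempo and time signature are constant over the project, and that a final end position closes the last part; Part and parts_from_markers are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Part:
    name: str
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def parts_from_markers(
    markers: List[Tuple[str, float]],
    end_bar: float,
    bpm: float,
    beats_per_bar: int = 4,
) -> List[Part]:
    """Derive each part's extent from its marker and the next marker.

    markers is a list of (name, start position in bars); end_bar closes
    the last part. Constant tempo is an assumption of this sketch.
    """
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    ordered = sorted(markers, key=lambda m: m[1])
    parts = []
    for i, (name, start_bar) in enumerate(ordered):
        # A part ends where the adjacent (next) marker begins.
        next_bar = ordered[i + 1][1] if i + 1 < len(ordered) else end_bar
        parts.append(
            Part(name, start_bar * seconds_per_bar, next_bar * seconds_per_bar)
        )
    return parts
```

At 120 bpm in 4/4 time a bar lasts 2 seconds, so a part whose marker sits at bar 2 and whose neighbour sits at bar 6 starts at 4 seconds and lasts 8 seconds.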
[0059] When the user chooses one of the audio selections 248, sequence
composer 206
automatically generates an audio output sequence, for audio preview by the
user, which
substantially matches the prescribed target duration. As noted above, the
audio output sequence
includes an intro part, an outro part, and optionally one or more main parts
in an ordering that
conforms as closely as possible to the template of the chosen audio selection.
As used herein, the
meaning of the term "substantially matching" may include an exact match between
the
automatically generated output sequence and the target duration. However, in
alternative
embodiments, the sequence composer 206 performs additional editing steps
(described below)
that further refine the duration of the output sequence.
[0060] In general, sequence composer 206 scans the template associated with a
chosen audio
selection 248 for long, contiguous, audio part sequences marked with step-in
and step-out
properties. That is, it tries to minimize the number of edits to be performed,
and identifies
explicit indications that it is acceptable, from a musical aesthetic
perspective, to edit sequences
when necessary to do so. Another objective is to generate the output sequence
without using
looping techniques. The contiguous sequences can comprise explicitly marked
Section types, or
they can be assembled from contiguous audio parts of various types. A
resulting main section is
then combined with a suitable intro part and outro part to derive the output
sequence having a
duration substantially matching the prescribed target duration. In one
exemplary embodiment,
sequence composer 206 generates an output sequence having an approximate
duration within
±3% of the target duration, and then performs a final tempo adjustment to
refine the output
sequence duration to exactly match the target duration. In other embodiments,
sequence
composer employs further output sequence duration refining techniques, such as
adding silence
data at the end of a composed sequence and/or trimming one or more of the
audio parts
comprising the sequence. In another embodiment, the sequence composer composes
output
sequences slightly shorter than the target duration in order to provide a
natural sounding decay at
the end of the piece.
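The final tempo adjustment can be illustrated with simple arithmetic: playing a sequence of duration D at tempo bpm x (D / T) makes it last exactly T. The sketch below assumes a uniform tempo change applied to the whole sequence and a limit of 3%; tempo_adjustment is a hypothetical name.

```python
from typing import Optional


def tempo_adjustment(
    actual_s: float, target_s: float, bpm: float, max_change: float = 0.03
) -> Optional[float]:
    """Return the tempo at which the sequence lasts exactly target_s.

    Returns None when the required change exceeds max_change, in which
    case a different composition or refining technique is needed.
    """
    if actual_s <= 0 or target_s <= 0:
        raise ValueError("durations must be positive")
    ratio = actual_s / target_s  # > 1 means the sequence is too long
    if abs(ratio - 1.0) > max_change:
        return None
    # A faster tempo shortens playback in inverse proportion.
    return bpm * ratio
```

For a 61.2 second sequence at 100 bpm and a 60 second target, the ratio is 1.02 and the adjusted tempo is approximately 102 bpm.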
[0061] Once the audio output sequence is generated it is rendered (using
playback engine 201
and mixer 203) for preview. When used in conjunction with a video editing
system, the audio
output sequence may be placed on a music track of a timeline associated with
the edited video
clip.
[0062] One exemplary algorithm 600 executed by sequence composer 206 for the
automatic
audio output sequence generation will now be described with reference to Fig.
8. It will be
appreciated by those of skill in the art that alternative algorithms could be
utilized, and/or a
selection among distinct algorithms could be made based upon the target
duration, user audio
selection choice and the range of durations supported by the chosen audio
selection.
[0063] An exemplary algorithm utilizes four "composer" routines, which work in
similar ways,
but each of which has a unique capability:
Stinger or ShortSong Composer;
Variation Composer;
Version Composer; and
Medley Composer.
[0064] In general, variation composer algorithm 600 looks for a section to
remove from the
template associated with the chosen audio selection, while preserving the
longest possible outro
and intro parts. For very long target durations, the main body may be looped
in its entirety.
[0065] For a given selection, step 601 uses the template to find all the parts
that may be used
by the composer. The extended intro is located first since this anchors the
theme of the variation.
In step 602, durations are calculated for the extended intro, the mean value
of the available
outros, and in step 604 the maximum length of the main section. In 603, the
target length is
adjusted by the known extended intro, the computed mean outro and a trail
buffer of 1 second.
Two competing techniques are used to compose a main section of a length
matching the target
duration, the first is a contiguous sequence 606, the other is allowed to be
non-contiguous 605.
The contiguous method 606 looks for a section to remove such that what
remains, when spliced
together, is the correct length. Thus there is only one edit, or departure from
the composer's
intention. Where possible the edit will be done close to the middle of the
main section so that the
intros and outros are not influenced. The step-in and step-out flags may be
optionally used for
the best result. The non-contiguous method 605 scans all parts and allows any
combination that
best approximates the target length. Again, the step-in and step-out flags may
be optionally used
for the best result. In 607, the number of parts created in the non-contiguous
sequence is totaled
and if small, the method will override the contiguous result. Normally, the
priority is to preserve
musicality but for certain short sequences, for example less than 2, it is
unlikely that a
sufficiently close match can be found. The computed main section can now be
added to the
output sequence 608, 609, and the final adjustments made. In 610, an outro is
chosen that best
matches the remaining time difference between the target and actual sequences.
In 611, an
optional pad/trim tool can add or remove content (described below). Finally, a
tempo adjustment
of +/-3% may be used, an amount typically not noticed when comparing different
rendered
sequences, 612.
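The contiguous method 606 and the outro choice 610 can be sketched as below. This is an illustrative brute-force search under stated assumptions, not the algorithm of Fig. 8: it ignores the step-in and step-out flags, breaks ties by preferring a cut centred near the middle of the main section, and uses the hypothetical names best_contiguous_cut and choose_outro. The non-contiguous method 605 is not shown.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def best_contiguous_cut(durations: List[float], target_s: float) -> Tuple[int, int]:
    """Find one contiguous run of parts [i, j) to remove from the main section.

    The remaining parts, spliced together, should be as close as possible
    to target_s. An empty cut (i == j) means nothing is removed.
    """
    n = len(durations)
    prefix = [0.0] + list(accumulate(durations))  # prefix[k] = time before part k
    total = prefix[-1]
    middle = total / 2.0
    best_key = None
    best_cut = (0, 0)
    for i in range(n + 1):
        for j in range(i, n + 1):
            removed = prefix[j] - prefix[i]
            error = round(abs((total - removed) - target_s), 6)
            # Tie-breaker: prefer a cut centred near the middle of the section.
            centre = (prefix[i] + prefix[j]) / 2.0
            key = (error, abs(centre - middle))
            if best_key is None or key < best_key:
                best_key = key
                best_cut = (i, j)
    return best_cut


def choose_outro(outros: Dict[str, float], remaining_s: float) -> str:
    """Pick the outro whose duration best matches the time still to fill."""
    return min(outros, key=lambda name: abs(outros[name] - remaining_s))
```

With six 8 second parts and a 32 second target, the search removes the two middle parts, leaving a single splice in the centre of the main section.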
[0066] For short compositions, a Pad/Trim Tool is used, as described below.
[0067] The Version Composer uses similar processing but expands the content
used by the
composer as the target duration is increased. The selection is typically
locked while a user
makes duration changes and so it is necessary to add additional parts to the
template used by the
composer while preserving the existing parts that identify the theme of the
selection. For
example, if the initial choice is a Variation and the target duration is
extended, at some point the
Version composer will kick in to add content and variety. The composition will
start with Parts
from the selected Variation to retain the theme, then segue to additional
parts from the owner
version, then segue back to parts from the original variation. Similarly, if
expanded further, the
Medley Composer will kick in. The composition will start with the Selection,
work through the
Version Parts and then add content and variety from the Parts in the Medley.
The thresholds at
which the different composers are used are determined by a scope calculation
for each song. This
estimates the minimum and maximum durations that can be readily implemented
with the parts
of each template.
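One way to picture the threshold behaviour is as an escalation through the composers, starting at the level of the user's locked selection. The sketch below is an assumption made for illustration: the scope values are invented example numbers, the ordering tuple and the name pick_composer are hypothetical, and the Stinger Composer is left out because it builds its own template.

```python
from typing import Dict, Optional, Tuple

# Assumed escalation order, from the narrowest to the widest pool of parts.
ORDER = ("variation", "version", "medley")


def pick_composer(
    target_s: float,
    scopes: Dict[str, Tuple[float, float]],
    chosen: str = "variation",
) -> Optional[str]:
    """Return the first composer, from the chosen level upward, whose
    estimated (minimum, maximum) duration scope covers the target."""
    start = ORDER.index(chosen)
    for name in ORDER[start:]:
        lo, hi = scopes[name]
        if lo <= target_s <= hi:
            return name
    return None


# Invented example scopes in seconds, for illustration only.
example_scopes = {
    "variation": (15.0, 90.0),
    "version": (60.0, 240.0),
    "medley": (200.0, 900.0),
}
```

With these example scopes, a 75 second target stays with the variation composer, a 120 second target moves to the version composer, and a 600 second target moves to the medley composer.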
[0068] The Stinger Composer uses similar processing, but creates its own
template based on
all the available intros and outros belonging to the version. To meet a
specific target length
intros/outro pairs or outro-only sequences are created. Because the
compositions are so short the
pad/trim tool is essential.
[0069] The pad/trim tool provides the ability to add or delete content in
order to achieve a
desired target duration. A sequence is lengthened by padding, i.e. adding
silence after an outro.
On playback, the MIDI synthesizer will typically fill the silence with the
reverberant decay of the
notes played, usually with very acceptable results. A sequence can be
shortened in at least two
ways: (1) by applying a crash start to an Intro, i.e. starting playback at an
initial offset into the
Part (if an intro has the trimmable property, an offset of between 1 and 4
beats may be applied);
and (2) if an outro has the fadeable property, it can be shortened by applying
a fade out.
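The pad/trim decisions can be sketched as a small function. The sketch assumes that a crash start offset is a whole number of beats, that the crash start is tried before the fade out, and that a fade out can shorten the sequence by at most the outro's own duration; Adjustment and pad_trim are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:
    pad_s: float = 0.0          # silence appended after the outro
    crash_start_beats: int = 0  # beats skipped at the start of the intro
    fade_shorten_s: float = 0.0 # time removed by fading the outro early
    residual_s: float = 0.0     # excess still left after the adjustments


def pad_trim(
    actual_s: float,
    target_s: float,
    bpm: float,
    intro_trimmable: bool,
    outro_fadeable: bool,
    outro_s: float,
) -> Adjustment:
    """Pad a short sequence with silence, or shorten a long one."""
    beat_s = 60.0 / bpm
    excess = actual_s - target_s
    adj = Adjustment()
    if excess <= 0:
        # Too short (or exact): lengthen by adding silence after the outro.
        adj.pad_s = -excess
        return adj
    if intro_trimmable:
        # Crash start: skip between 1 and 4 whole beats of the intro.
        beats = min(4, int(excess // beat_s))
        if beats >= 1:
            adj.crash_start_beats = beats
            excess -= beats * beat_s
    if outro_fadeable and excess > 0:
        adj.fade_shorten_s = min(excess, outro_s)
        excess -= adj.fade_shorten_s
    adj.residual_s = excess
    return adj
```

At 120 bpm a beat lasts 0.5 seconds, so a sequence that is 1.25 seconds too long can be corrected with a 2 beat crash start and a 0.25 second fade.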
[0070] The various components of the systems described herein may be
implemented as a
computer program (including plug-in) using a general-purpose computer system.
Such a
computer system typically includes a main unit connected to both an output
device that displays
information to a user and an input device that receives input from a user. The
main unit
generally includes a processor connected to a memory system via an
interconnection mechanism.
The input device and output device also are connected to the processor and
memory system via
the interconnection mechanism.
[0071] One or more output devices may be connected to the computer system.
Example output
devices include, but are not limited to, a cathode ray tube (CRT) display,
liquid crystal displays
(LCD) and other video output devices, printers, communication devices such as
a modem, and
storage devices such as disk or tape. One or more input devices may be
connected to the
computer system. Example input devices include, but are not limited to, a
keyboard, keypad,
track ball, mouse, pen and tablet, communication device, and data input
devices. The invention
is not limited to the particular input or output devices used in combination
with the computer
system or to those described herein.
[0072] The computer system may be a general purpose computer system which is
programmable using a computer programming language, a scripting language or
even assembly
language. The computer system may also be specially programmed, special
purpose hardware.
In a general-purpose computer system, the processor is typically a
commercially available
processor. The general-purpose computer also typically has an operating
system, which controls
the execution of other computer programs and provides scheduling, debugging,
input/output
control, accounting, compilation, storage assignment, data management and
memory
management, and communication control and related services.
[0073] A memory system typically includes a computer readable medium. The
medium may
be volatile or nonvolatile, writeable or nonwriteable, and/or rewriteable or
not rewriteable. A
memory system stores data typically in binary form. Such data may define an
application
program to be executed by the microprocessor, or information stored on the
disk to be processed
by the application program. The invention is not limited to a particular
memory system.
[0074] The components of a system in accordance with the present invention may
also be
implemented in hardware or firmware, or a combination of hardware, firmware
and software.
The various elements of the system, either individually or in combination may
be implemented
as one or more computer program products in which computer program
instructions are stored on
a computer readable medium for execution by a computer. Various steps of a
process may be
performed by a computer executing such computer program instructions. The
computer system
may be a multiprocessor computer system or may include multiple computers
connected over a
computer network. The components shown in Fig. 1 may be separate modules of a
computer
program, or may be separate computer programs, which may be operable on
separate computers.
The data produced by these components may be stored in a memory system or
transmitted
between computer systems.
[0075] Having now described an example embodiment, it should be apparent to
those skilled
in the art that the foregoing is merely illustrative and not limiting, having
been presented by way
of example only. Numerous modifications and other embodiments are within the
scope of one of
ordinary skill in the art and are contemplated as falling within the scope of
the invention.