Language selection

Search

Patent 2686264 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 2686264
(54) English Title: OPTIMIZATION OF MP3 ENCODING WITH COMPLETE DECODER COMPATIBILITY
(54) French Title: OPTIMISATION DE CODAGE MP3 AVEC COMPLETE COMPATIBILITE DU DECODEUR
Status: Granted and Issued
Bibliographic Data
(51) International Patent Classification (IPC):
  • G10L 19/032 (2013.01)
(72) Inventors :
  • WU, GUIXING (Canada)
  • YANG, EN-HUI (Canada)
(73) Owners :
  • BLACKBERRY LIMITED
(71) Applicants :
  • BLACKBERRY LIMITED (Canada)
(74) Agent: ROWAND LLP
(74) Associate agent:
(45) Issued: 2015-01-27
(22) Filed Date: 2009-11-24
(41) Open to Public Inspection: 2010-06-01
Examination requested: 2009-11-24
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
08170396.9 (European Patent Office (EPO)) 2008-12-01

Abstracts

English Abstract

An iterative rate-distortion optimization algorithm for MPEG I/II Layer-3 (MP3) encoding based on the method of Lagrangian multipliers. Generally, an iterative method is performed such that a global quantization step size is determined while scale factors are fixed, and thereafter the scale factors are determined while the global quantization step size is fixed. This is repeated until a calculated rate-distortion cost is within a predetermined threshold. The methods are demonstrated to be computationally efficient and the resulting bit stream is fully standard compatible.


French Abstract

Un algorithme itératif d'optimisation de débit-distorsion de codage de la couche 3 MPEGI/II (MP3) est fondé sur la méthode des multiplicateurs lagrangiens. Généralement, un procédé itératif est exécuté de sorte qu'une taille d'étape de quantification globale est déterminée pendant que les facteurs d'échelle sont fixés, puis les facteurs d'échelle sont déterminés pendant que la taille de l'étape de quantification globale est fixée. Ce procédé est répété jusqu'à ce que le coût calculé du rapport débit-distorsion respecte un seuil prédéterminé. Les procédés ont prouvé leur efficacité en termes de calculs et le flux de bits résultant respecte parfaitement la norme.

Claims

Note: Claims are shown in the official language in which they were submitted.


26
1. A method for optimizing audio encoding of a source sequence, the
encoding being
dependent on quantization factors, the quantization factors including a global
quantization step size and scale factors, the method comprising:
initializing fixed values of the scale factors; and
determining, using a processor, values of the quantization factors which
minimize
a cost function by iteratively performing:
determining, for the fixed values of the scale factors, a value of the global
quantization step size which minimizes the cost function of the encoding of
the source sequence, the cost function being dependent on the quantization
factors;
fixing the determined value of the global quantization step size and
determining values of scale factors which minimize the cost function, and
fixing the determined values of the scale factors, and
determining whether the cost function is below a predetermined threshold, and
if so ending the iteratively performing,
wherein the scale factors are constrained within a bit length, wherein the bit
length is a first bit length for a first group of scale factor bands, and
wherein
the bit length is a second bit length for a second group of scale factor
bands.
2. The method claimed in claim 1, wherein the cost function is based on a
distortion of the
encoding of the source sequence.
3. The method claimed in claim 2, wherein the cost function is further
based on a rate, said
rate being a transmission bit rate of the encoding of the source sequence.
4. The method claimed in claim 3, wherein the cost function is further
based on a tradeoff
function that represents a tradeoff of the rate for distortion.
5. The method claimed in claim 4, further comprising obtaining the
distortion from a pre-
generated table.
6. The method claimed in claim 4, wherein the tradeoff function includes
.lambda., the method
further comprising:

27
calculating .lambda. as:
<IMG>
wherein PE is Perceptual Entropy of an encoded frame, R is the rate, M is a
number of audio samples to be encoded, and c1, c2 and c3 are constants; and
calculating the cost function using .lambda..
7. The method claimed in claim 1, wherein the step of determining the value
of the global
quantization step size includes differentially calculating the cost function
with respect to
global quantization step size to determine the global quantization step size
which
minimizes the cost function.
8. The method claimed in claim 1, wherein the determining of the value of
global
quantization step size includes calculating:
<IMG>
wherein xr i is the source sequence, scale_factor[sb] is a quantization step
size for
scale factor band sb, l[sb] and l[sb+1]-1 are start and end positions for
scale
factor band sb respectively, w[sb] is an inverse of the masking threshold for
scale factor band sb, and y i is a quantized spectral coefficient of the
source
sequence.

28
9. The method claimed in claim 1, wherein the scale factors include a
parameter scalefac
being a scale factor for a particular scale factor band, the method further
comprising:
calculating a value of scalefac which minimizes the cost function and
constraining scalefac to within the bit length.
10. The method claimed in claim 9, wherein the step of calculating the
value of scalefac
includes differentially calculating the cost function with respect to scalefac
to determine
the value of scalefac which minimizes the cost function.
11. The method claimed in claim 9, wherein the step of calculating the
value of scalefac
includes calculating:
<IMG>
wherein xr, is the source sequence, 1[sb] and 1[sb+ 1] -1 are start and end
positions
for scale factor band sb respectively and y is a quantized spectral
coefficient
of the source sequence.
12. The method claimed in claim 1, wherein the scale factors include a high
frequency
amplification parameter.
13. The method claimed in claim 1, wherein the audio encoding is MPEG I/II
Layer-3
encoding.
14. The method claimed in claim 1, wherein the encoding is further
dependent on quantized
spectral coefficients, Huffman codebooks, and Huffman coding region partition,
the
method further including minimizing the cost function with respect to the
quantized
spectral coefficients, the Huffman codebooks, and the Huffman coding region
partition.
15. An encoder for optimizing audio encoding of a source sequence, the
audio encoding
being dependent on quantization factors, the quantization factors including a
global
quantization step size and scale factors, the encoder comprising:
a controller;

29
a memory accessible by the controller, a cost function of the encoding of the
source sequence stored in memory, the cost function being dependent on the
quantization factors; and
a predetermined threshold of the cost function stored in the memory,
wherein the controller is configured to:
access the cost function and predetermined threshold from memory,
initialize fixed values of the scale factors, and
determine values of the quantization factors which minimize the cost function
by iteratively performing:
determining, for the fixed values of the scale factors, a value of the global
quantization step size which minimizes the cost function,
fixing the determined value of the global quantization step size and
determining values of scale factors which minimize the cost function, and
fixing the determined values of the scale factors, and
determining whether the cost function is below the predetermined
threshold, and if so ending the iteratively performing,
wherein the scale factors are constrained within a bit length, wherein the
bit length is a first bit length for a first group of scale factor bands, and
wherein the bit length is a second bit length for a second group of scale
factor bands.

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 02686264 2009-11-24
32737-CA-PAT - 1 -
OPTIMIZATION OF MP3 ENCODING WITH COMPLETE DECODER
COMPATIBILITY
FIELD
[0001] Example embodiments herein relate to audio signal encoding,
and in particular to rate-distortion optimization for MP3 encoding.
BACKGROUD
[0002] Many compression standards have been developed and evolved
for the efficient use of storage and/or transmission resources. Among these
standards is the audio coding scheme MPEG I/II Layer-3 (conventionally
referred to as "MP3"), which has been a popular audio coding method since
its inception in 1991. MP3 has greatly facilitated the storage and access of
audio files. MP3 is now widely used in the Internet, portable audio devices
and wireless communications.
[0003] An example MP3 encoder is LAME, which refers to "LAME Ain't
an Mp3 Encoder", as is known in the art. Another MP3 encoder is ISO
reference codec, which is based on the ISO standard. Generally, such MP3
encoders include use of two nested loop search (TNLS) algorithms, which are
computationally complex and may not be guaranteed to converge. These
encoders may be configured or operated to provide for additional
functionality and customization.
[0004] Generally, although the encoding algorithm is not standardized
in MP3, the basic structure and syntax-related tools are fixed so that the MP3
encoded/compressed bitstreams can be correctly decoded by any standard
compatible decoder. However, there may be opportunities to manipulate the
encoding algorithm while maintaining full decoder compatibility.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] Reference will now be made, by way of example, to the
accompanying drawings which show example embodiments of the present
application, and in which:

CA 02686264 2009-11-24
32737-CA-PAT - 2 -
[0006] Figure 1 shows an MP3 encoding process to which example
embodiments may be applied;
[0007] Figure 2 shows a flow diagram of an optimization process in
accordance with an example embodiment;
[0008] Figure 3 shows a graph of an optimal path search algorithm for
use in the process of Figure 2;
[0009] Figure 4 shows the graph of Figure 3, illustrating an optimal
path;
[0010] Figure 5 shows a flow diagram of a process to be used in the
optimization process of Figure 2;
[0011] Figure 6 shows a graph of performance characteristics of an
example embodiment, for encoding of audio file waltz.wav as compared to
ISO reference codec;
[0012] Figure 7 shows a graph of performance characteristics of an
example embodiment, for encoding of audio file waltz.way as compared to
LAME;
[0013] Figure 8 shows a graph of performance characteristics of an
example embodiment, for encoding of audio file vioin.way as compared to
ISO reference codec;
[0014] Figure 9 shows a graph of performance characteristics of an
example embodiment, for encoding of audio file violin.way as compared to
LAME; and
[0015] Figure 10 shows an encoder for optimizing encoding
performance of MP3 in accordance with an example embodiment.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0016] It would be advantageous to provide an iterative optimization
algorithm to jointly optimize quantized coefficient sequences, quantization
factors, Huffman coding and Huffman coding region partition for MP3
encoding.

CA 02686264 2009-11-24
32737-CA-PAT - 3 -
[0017] It would be advantageous to provide for efficient optimization of
quantization factors.
[0018] In one aspect, the present application provides a method for
optimizing audio encoding of a source sequence, the encoding being
dependent on quantization factors, the quantization factors including a global
quantization step size and scale factors. The method includes defining a cost
function of the encoding of the source sequence, the cost function being
dependent on the quantization factors. The method includes initializing fixed
values of the scale factors; and determining values of the quantization
factors which minimize the cost function by iteratively performing:
[0019] determining, for the fixed values of the scale factors, a value of
the global quantization step size which minimizes the cost function,
[0020] fixing the determined value of the global quantization step size
and determining values of scale factors which minimize the cost function, and
fixing the determined values of the scale factors, and
[0021] determining whether the cost function is below a predetermined
threshold, and if so ending the iteratively performing.
[0022] In another aspect, the present application provides a method for
optimizing audio encoding of a source sequence based on minimizing of a
cost function, the cost function being a function of quantization distortion
and
encoding bit rate, the cost function including A as a function that represents
the tradeoff of encoding bit rate for quantization distortion, the method
comprising calculating A as the function
R
C1 In 1 0
A = ___________________________
X1o(c,PE-c3101 Al
10A4
wherein PE is Perceptual Entropy of an encoded frame, R is an encoding bit
rate, M is the number of audio samples to be encoded, and ch c2 and c3 are
constants; and calculating the cost function using A.
[0023] In another aspect, the present application provides an encoder
for optimizing audio encoding of a source sequence, the audio encoding being
dependent on quantization factors, the quantization factors including a global
quantization step size and scale factors. The encoder includes a controller, a

CA 02686264 2009-11-24
32737-CA-PAT - 4 -
memory accessible by the controller, a cost function of an encoding of the
source sequence stored in memory, the cost function being dependent on the
quantization factors; and a predetermined threshold of the cost function
stored in the memory. The controller is configured to access the cost
function and predetermined threshold from memory, initialize fixed values of
the scale factors, and determine values of the quantization factors which
minimize the cost function by iteratively performing:
determining, for the fixed values of the scale factors, a value of the
global quantization step size which minimizes the cost function,
fixing the determined value of the global quantization step size and
determining values of scale factors which minimize the cost function,
and fixing the determined values of the scale factors, and
determining whether the cost function is below the predetermined
threshold, and if so ending the iteratively performing.
[0024] Reference is now made to Figure 1, which shows an MP3
encoding process 20 to which example embodiments may be applied.
Generally, the MP3 encoding process 20 receives digital audio input 22 and
produces a compressed or encoded output 32 in the form of a bitstream for
storage and transmission. The encoding process 20 may for example be
implemented by an encoder such as a suitably configured computing device.
In Figure 1, continuous lines denote the time or spectral domain signal flow,
and dash lines denote the control information flow. As shown, the encoding
process 20 includes audio input 22 for input to a time/frequency (T/F)
mapping module 24 and a psychoacoustic model module 26. Also shown are
a quantization and entropy coding module 28 and a frame packing module
30. The encoding process 20 results in an encoded output 32 of the audio
input 22, for example for sending to a decoder for subsequent decoding.
[0025] The audio input 22 (in time domain) are first input into the TIE
mapping module 24, which converts the audio input 22 into spectral
coefficients. The T/F mapping module 24 is composed of three steps:
pseudo-quadrature mirror filter (PQMF), windowing and modified discrete
cosine transform (MDCT), and aliasing reduction. The PQMF filterbank splits
a so-called granule (in MPEG I and II layer 3 each audio frame contains 2

CA 02686264 2009-11-24
32737-CA-PAT - 5 -
and 1 granules respectively) of 576 input audio samples into 32 equally
spaced subbands, where each subband has 18 time domain audio samples.
The 18 time domain audio samples in each subband are then combined with
their counterpart of the next frame, and processed by a sine-type window
based on psychoacoustic modeling decisions. A long window, which covers a
whole length of 36, addresses stationary audio parts. Long windowing with
MDCT afterwards ensures a high frequency resolution, but also causes
quantization errors spreading over the 1152 time-samples in the process of
quantization. A short window is used to reduce the temporal noise to spread
for the signals containing transients/attacks. In the short window, audio
signals with a length of 36 are divided into 3 equal sub-blocks. In order to
ensure a smooth transition from a long window to a short window and vice
versa, two transition windows, long-short (start) and short-long (stop), which
have the same size as a long window, are employed.
[0026] The
psychoacoustic model module 26 is generally used to
generate control information for the T/F mapping module 24, and for the
quantization and entropy coding module 28. Based
on the control
information from the psychoacoustic model module 26, the spectral
coefficients which are output from the T/F mapping module 24 are received
by the quantization and entropy coding module 28, and are quantized and
entropy coded. Finally these compressed bits streams are packed up along
with format information, control information and other auxiliary data in MP3
frames, and output as the encoded output 32.
[0027] The
MP3 syntax leaves the selection of quantization step sizes
and Huffman codebooks to each encoder or encoding algorithm, which
provides opportunity to apply rate-distortion consideration. A conventional
MP3 encoding algorithm is now be described as follows, which employs a
"hard decision quantization", a two nested loop search (TNLS) algorithm, and
fixed or static Huffman codebooks.
[0028] The
MP3 quantization and entropy coding module 28 first
subdivides an entire frame of 576 spectral coefficients into 21 or 12 scale
factor bands for a long window block (including long-short window and short-

CA 02686264 2009-11-24
32737-CA-PAT - 6 -
long window) or a short window block respectively. Each coefficient xri, i=0
to 575, is quantized by the following non-uniform quantizer:
xr,
y = nint[( ____________________________ õ , )075 ¨0.0946] (2.1)
pH/7_2 I 0¨ scale luclori sh
where yi denotes the quantized index, flint denotes the nearest non-negative
integer, global gain is a global quantization step size which determines the
overall quantization step size for the entire frame, and scale factor[sb] is
used to determine the actual quantization step size for scale factor band sb
where the spectral coefficient xri lies (sb= 0 to 11 for short blocks, sb =0
to
20 for other blocks) to make the perceptually weighted quantization noise as
small as possible. The formulaic determination of y, as in (2.1) may be
referred to as "hard decision quantization".
[0029] The scale_factor[sb] is expressed as
scale _ factorisid = 2 = (scalefac[sub _blockilsh]+ preflag pretab[slil)
(2.2)
x (1+ scale/ac _scale)+ 8 x .s.ubb/ock _ gain[sab _block].
[0030] Generally, each of the parameters listed in (2.2) may be
referred to as a "scale factor", and all of which may be collectively referred
to
herein as "scale factors", as appropriate, global gain and the scale factors
may collectively be referred to herein as "quantization factors".
[0031] In (2.2), sub block is only used for short windows, and it refers
to one of the 3 sub-blocks for a short window. scalefac[sub block][sb] is a
scale factor parameter for scale factor band sb to color the quantization
noise. scalefac[sub_block][sb] are variable length transmitted according to
scalefac_compress which occupies 4 bits (MPEG-1) or 9 bits (MPEG-2) in the
side information of MP3 encoded frames. preflag is a shortcut for additional
high frequency amplification of the quantized values. If preflag is set, the
values of a fixed table pretab[sb] are added to the scale factors. preflag is
never used in short windows (for the purposes of the standard).
subblock_gain[sub block] is the gain offset for the short window.
scalefac_scale is a one-bit parameter used to control the quantization step
size.

CA 02686264 2009-11-24
32737-CA-PAT - 7 -
[0032] The
quantized spectral coefficients are then encoded by static
Huffman coding, which utilizes 34 fixed Huffman codebooks. To achieve
greater coding efficiency, MP3 subdivides the entire quantized spectrum into
three regions. Each region is coded with a different set of Huffman codebooks
that best match the statistics of that region.
Specifically, at high
frequencies, MP3 identifies a region of "all zeros". The size of this region
can
be deduced from the sizes of the other two regions, and the coefficients in
this region don't need to be coded. The only restriction is that it must
contain an even number of zeros since the other two regions group their
values in 2- or 4-tuples. The second region, called "count 1" region, contains
a series of contiguous values consisting only of -1, 0, +1 just before the
"zero" region, and is encoded in 4-tuples by Huffman codebook 32 or 33.
Finally the low frequency region, called "big value" region, covers the
remaining coefficients which are encoded in pairs. This region is further
subdivided into 3 (for long window) or 2 (for short, long-short and short-long
window) parts with each covered by a distinct Huffman codebook.
[0033] To
minimize the quantization noise, a noise shaping method
may be applied to find the proper global quantization step size global gain
and scale factors before the actual quantization. Some
conventional
algorithms use the TNLS algorithm to jointly control the bit rate and
distortion. The TNLS algorithm consists of an inner (rate control) loop and
an outer (noise control) loop. The task of the inner loop is to change the
global quantization step size global gain such that the given spectral data
can just be encoded with the number of bits available. If the number of bits
resulting from Huffman coding exceeds this number, the global gain can be
increased to result in a larger quantization step size, leading to smaller
quantized values. This operation is repeated until the resulting bit demand
for Huffman coding is small enough. The TNLS algorithm may require
quantization step sizes so small to obtain the best perceptual quality. On the
other hand, it has to increase to the quantization step sizes to enable coding
at the required bit rate. These two requirements are conflicting. Therefore,
this conventional algorithm does not guarantee to converge.
[0034] In
some example embodiments, soft decision quantization,
instead of the hard decision quantization, is applied, and the corresponding

CA 02686264 2009-11-24
32737-CA-PAT - 8 -
purpose of quantization and entropy coding in MP3 encoding is to achieve the
minimum perceptual distortion for a given encoding bit rate by solving,
mathematically, the following minimization problem:
{min%qp D (xr rxr), subject to
R(q)+R(y, P, H) RI (3.1)
where xr is the original spectral signal, rxr is the reconstructed signal
obtained from the quantized spectral coefficients y, P and H represent
Huffman codebook region partition and Huffman codebooks selection
respectively, q denotes the quantization factors including global gain and
scale factors, R(q) and R(y, P, H) are the bit rates to encode q and the
quantized spectral coefficients y respectively, R1 is the rate constraint, and
D, (xr, rxr) denotes the weighted distortion measure between xr and rxr.
Note that here y is not calculated according to (2.1) any more; instead, it is
treated as a variable in a cost function involving the distortion and rates,
and
has to be determined jointly along with q, P, and H. Average noise-to mask
ratio (ANMR) is used as the distortion measure. The noise-to mask ratio
(NMR), the ratio of the quantization noise to the masking threshold, is a
widely used objective measure for the evaluation of an audio signal. ANMR is
expressed as
1
ANIVIR--lw[shI=d[sb] (3.2)
N
where N is the number of scale factor bands, w[sb] is the inverse of the
masking threshold for scale factor band sb, and d[sb] is the quantization
distortion, mean squared quantization error for scale factor band sb.
[0035] The
above constrained optimization problem could be converted
into the following minimization problem:
,JA (y, q, P, H) = Dõ (xr, rxr)+ A =(R(q)+ R(y,P,H)) (3.3)
where A is a fixed parameter that represents the tradeoff of rate for
distortion, and JA is referred to as the "Lagrangian cost".
[0036]
Reference is now made to Figure 2, which shows a flow diagram
of an optimization process 50 in accordance with an example embodiment.

CA 02686264 2012-11-05
9
The exact order of steps may vary from those shown in FIG. 2 in different
applications
and embodiments. It can also be appreciated that more or less steps may be
required in
some example embodiments, as appropriate. To find an optimum J, the parameters
y,
q, P and H are jointly optimized. The general framework for the process 50 has
been
outlined previously in Xu and E.-h. Yang, "Rate-distortion optimization for
MP3 audio
coding with complete decoder compatibility,"in Proc. 2005 IECE Workshop on
Multimedia Signal Processing, October 2005. Generally, the process 50 selects
the
quantized spectral coefficients y and Huffman codebook region division P,
quantization
factors q and Huffman codebook region selection H alternatively to minimize
the
Lagrangian cost J. The iterative searching for the parameters may be referred
to as
"soft-decision quantization" (rather than the formulaic "hard-decision
quantization" of
(2.1), described above).
[0037] Referring still to FIG. 2, the iterative algorithm of the process 50
can be
described as follows. At step 52, specify a tolerance as the convergence
criterion for
the Lagranglan cost J. At step 54, initialize a set of quantization factors go
from the
given frame of spectral domain coefficients xr with a Huffman codebooks
selection
mode Ho; and set t=0.
[0038] At step 56, qt and Ht are fixed or given for any t _. 0. Find the
optimal
quantized spectral coefficients yt and Huffman codebook region division Pt by
soft
decision quantization, where yt and Pt achieve the minimum
miny,pix = Dw(xr,Q-1(q,y)) + A = (R(qt) + R(y,P,Ht))
(3.4)
[0039] At step 58, given yt, Pt and Ht, update qt to qt+i so that qt+i
achieves the
minimum
ming./ A = Dw(xr,Q-1(q,y)) + A = (R(q))
(3.5)
[0040] At step 60, given yt, Pt and qt+i, update Ht to Ht+1 so that Ht+1
achieves the
minimum
mine (R(yt,Pt+i,Ht))
(3.6)

CA 02686264 2009-11-24
32737-CA-PAT - 10 -
[0041] At step 62, query whether .I21 --,P;156.=1/1. If so,
the
optimization process 50 proceeds to step 66 and outputs the final y, q, P and
H and ends at step 68. If not, proceed to step 64 wherein t=t+/, and repeat
steps 56, 58 and 60 for t=0, I, 2, ... until . Since
the
Lagrangian cost function may be non-increasing at each step, the
convergence is guaranteed. The final y, q, P and H may thereafter be
provided for MP3 coding of xr.
[0042]
Referring still to Figure 2, an example embodiment of step 56
will now be described in greater detail, with reference now to Figures 3 and
4. Figure 3 shows a graph 80 of an optimal path search algorithm for use in
the process of Figure 2; while Figure 4 shows an optimal path of the graph
80.
[0043] Without
being limiting, consider for example the long window
case. The graph 80 is defined with 4 layers (shown as I, II, III, and IV) and
288 nodes in each layer as shown in Figure 3. The 4 layers correspond to
the three divisions of the big value region and the count I region. Each
state SL,, (L=I, ===,IV, 0 5_ I <288) in layer L stands for two neighboring
coefficients xr2; and xr21+1 to be quantized, since Huffman coding is always
applied on 2-(for layer I, II, III) or 4-(for layers IV) tuples. Two special
states, frame begin and frame_end, denote the start and end of the frame
respectively. Connection between any two states denotes a Huffman
codebook region division decision pair: state SL,, may have incoming
connections from states SK, (M=/, ===, L; j=i-2 if L=IV, and j=i-/ otherwise),
each of which represents the decision of assigning node i, i.e., coefficients
xr21 and xr21-F/ to the Huffman codebook region denoted by layer L. Note that
not all the states and paths are compatible with the standard and the
following syntax constraints should be observed for the construction of the
graph 80:
a) States of scale factor band 0 in layers II and III, states of scale
factor band 1 in layer III, and the second state in layer IV are
illegitimate, and thus don't have any incoming and outgoing
connections;
b) States after scale factor band 15 in Layer I are not allowed;

CA 02686264 2009-11-24
32737-CA-PAT - 11 -
c) A graph path cannot transverse more than 8 scale factor bands
in layer II;
d) The connections among layers I, II and III can only occur at the
scale factor band boundaries, and the frame_begin state has
only outgoing connections to states S1,0 and S1,0 and
frame_end; and
e) The frame end state has incoming connections from all
legitimate states, with each connection from non-trailing state
SL,, (0 i <
287) representing the decision of assigning the
coefficients after node i to the zero region, that is, dropping that
part of spectrum without Huffman encoding and transmission.
[0044] Assign
to each connection from previous states (no matter
which layer they lie in) to state SL,, (0 i <
288) a cost which is defined as
the minimum incremental Lagrangian cost of quantizing and Huffman
encoding the coefficients of state SL,, (or states and
SL,, if L=IV) by using
the Huffman codebook selected for layer L.
Specifically, this minimum
incremental cost is equal to
mini. D. (xi" (q , Y ,)) + =ri(Y2,-k,'='.Y2,) (3.7)
1=21-1,
where k=3 if L=IV, and k=1 otherwise, y1, j=2i-k, === 2i, is the jth quantized
coefficient, qj is the corresponding scale factor for yj, and rif=-) denotes
the
codeword length by using the Huffman codebook selected for layer L.
Similarly, for the connection from state SL,, (0 i < 287) to the frame end
state, its cost is defined as
576 576
Dõ.(xr, , (q 1,0))+2=0= Dõ(xr1,0-1(q 1,0)) (3.8)
1-2i-k
[0045] No
cost is assigned to the connections from trailing state SL,288
to the frame end state.
[0046] With
the above definitions, every sequence of connections from
the frame begin state to the frame end state corresponds to a Huffman
codebook region division of the entire frame with a Lagrangian cost. For
example, the sequence of connection in Figure 4 assigns scale factor band 0
and 1 to the fist two subdivisions of the big value region respectively, the

CA 02686264 2009-11-24
32737-CA-PAT - 12 -
next 4 coefficients to the count 1 region, and the rest to the zero region. On
the other hand, any Huffman codebook region division of the entire frame
that is compatible with the standard can be represented by a sequence of
connections from the frame begin to the frame_end state in the graph 80.
Hence the optimal path from the frame begin state to the frame begin state,
together with quantized coefficients along each connection that give the
minimum cost defined by (3.7), achieves the minimum in step 56 (Figure 2)
for any given q and H.
[0047] An
elaborate step-by-step description of the path searching
algorithm is described as follows, referring still to Figures 3 and 4. As an
initialization, the algorithm preselects and stores the best quantized
coefficients based on minimizing the Lagrangian cost of (3.7) for each
legitimate state SL,õ and sets their associated cost as the cost of each
connection to that state. The algorithm also recursively precalculates, for
each state, the distortion/cost resulting from ending the frame at that state,
i.e., the cost of its connection to the state frame_end. The algorithm begins
with the state frame begin by storing the cost of dropping the entire frame
in iframe_begm= Then, one proceeds to state SL,0 (L=I, ==, IV), among which
only
states 51,0 and .5/17,0 have incoming connections from the state frame begin.
The cost of each state is set to the cost of corresponding incoming
connection, and added with the cost of dropping the remaining coefficients to
get Ji,0 and 1, respectively. Proceeding to state SL,1 (L=I, ==, IV), only
states SL,/ has an incoming connection from states S1,0. Set its cost to the
sum of the costs of state Su) and the connection between S/,0 and Sim and
add it with the cost of dropping the remaining coefficients to get J. Next,
consider states SL,2 (L=I, ==, IV), it may be observed that 51v,2 has two
incoming connections from .50 and SL,0 respectively. Here the connection
from the state with less cost is chosen, and the costs of Siv,2 and /
- IV,2 are
computed by adding it with corresponding incremental connection costs,
respectively.
Following the above cost computation rule, process all
legitimate states: for each state SL,õ the best incoming connection is
selected
such that the accumulated cost (from frame begin to SL,,) can be minimized.
Store this connection selection decision at that state, set the cost of SL,,
to

CA 02686264 2009-11-24
32737-CA-PAT - 13 -
the accumulated cost, and then sum it with the cost of dropping the
remaining coefficients to get
[0048]
Referring now to Figure 4, after traversing all the legitimate
states, the path cost information, L=I, ==, IV, 0 i<288,
is available.
Obtain the minimum path cost J,õ,õ = mirk,/ By
backtracking the path
which gives Jrnin with the help of the stored information in each state, the
optimal quantized spectral coefficients y and region division P that solve the
problem (3.4) may be obtained.
[0049] In a
similar manner as described above, a three-layer graph
could be constructed for other three window cases.
[0050]
Referring to Figure 2, step 58 will now be described in greater
detail, with reference now to Figure 5. Figure
5 shows an example
embodiment of a process 100 to be used in step 58 of Figure 2. Step 58
generally determines the quantization factors q (i.e., scale factors and
global gain) that minimize the combined cost of weighted distortion and bit
rate for encoding or transmittal. Given
the nonuniform quantizer and
nonlinear bit rate for quantization factors in the standard, there is no
direct
formula to calculate the optimal quantization factors. Direct search through
all combinations of global gain, scale fac compress, scale fac, scalfac scale,
and subblock gain (for short windows) or preflag (for other windows) may be
computationally complex. Take an MPEG-1 encoded long-block frame as an
example. There are 256 different cases for global gain. scalefac compress,
preflag and scalfac scale have 16, 2 and 2 different cases respectively.
There are 256 x 16 x 2 x 2 = 16384 different combinations to find the
minimum combined cost. In some example embodiments, to reduce the
computational complexity, the method 100 includes the following alternating
minimization procedure to minimize the combined cost. Generally, at step
102 global gain is determined while the scale factors are fixed, and at step
104 the scale factors are determined while global gain is fixed. This is
repeated iteratively until the calculated rate-distortion cost is within a
predetermined threshold.
[0051] At step
102, update global gain when scalefac, scalfac scale
and subblock gain (for short windows) or preflag (for other windows) are

CA 02686264 2009-11-24
32737-CA-PAT - 14 -
fixed. In this case, the bit rate for the transmission of scale factors is
fixed.
Therefore, at this stage only the encoding distortion is minimized, while rate
is not considered. The weighted distortion for scale factor band sb is
/[,/, 1]-1
dõ[sb].= w[sbl= y,"3 2''" 4 i2 (3.9)
14101
where s[sb]= global gain-210-scale_factor[sb], l[sb] and l[sb+1]-1 are the
start and end positions for scale factor band sb respectively, w[sb] is the
inverse of the masking threshold for scale factor band sb. The total average
weighted distortion D,,õ for an encoded frame could be expressed as
1 \ 1 A'
D
=-1(1õ[sb]=¨Iw[sb]= [xi,¨ ji,4' 3 2%0/))/ 4 ]2
(3.10)
N 0=1
[0052]
Differentially calculate the distortion based on encoding with
respect to global at) gain to minimize the
distortion. Let =0,
aglobal gain
which leads to
Ib[sb]
4
global gain= _____________ log10
\h=1 +210 (3.11)
logio2
a[sb]
where
b [sb] - %mit, or! I 4 xi;
x4/3
(3.12)
,=/101
and
/10+11-1
a[sb] 2- %rah, lact or[ 2 lu[ch]
-12,4 (3.13)
t-/I NhI
As global gain should be an integer, global gain is chosen as one of the two
nearest integers to formula (3.11) which has smaller weighted distortion.
[0053] At step
104, fix global gain. Update the scale factors scalefac,
scalfac scale and subblock gain (for short windows) or preflag (for other
windows) to minimize the combined cost of weighted distortion and bit rate
for transmitting the scale factors. As indicated from equation (3.9),
s[sb]= global gain - 210 - scale factor[sb],
where global gain has the value of 0 to 255, and scale_factor[sb] is equal to

CA 02686264 2012-11-05
scale factolsb] = 2 x (scalefac[sb) + preflag = pretab(sbJ) x (3.14)
(1 + scalefac scale).
[0054] preflag is equal to 0 or 1. The value of pretablSb] is typically fixed
and is of the
form as shown in Table 1.
[0055]
Sb 0 to 10 11 12 13 14 15 16 17 18 19 20
Preflag=0 0 0 0 0 0 0 0 0 0 0 0
Preflag=1 0 1 1 1 1 2 2 3 3 3 2
Table 1. The value of pretabfsbj for long windows.
[0056] scalefac scale is equal to 0 or 1.
[0057] The bit length of scalefacfsb] is determined by scalefac compress, that
is,
sca/efac compress determines the number of bits used for the transmission of
the
scalefactors according to Table 2.
[0058]
scalefac compress slen1 slen2
0 0 0
1 0 1
2 0 2
3 0 3
4 3 0
5 1 1
6 1 2
7 1 3
8 2 1
9 2 2
10 2 3
11 3 1
12 3 2
13 3 3
14 4 2
15 4 3
Table 2: The bit length for scalefactsb]
[0059] As can be appreciated from Table 2, the bit length may be a first bit
length for
a first group of scale factor bands and the bit length may be a second bit
length for a
second group of scale factor bands. In Table 2 slen1 is the bit length of
scalefac for
each of scalefactor bands 0 to 10, and slen2 is the bit length of scale fac
for each of
scalefactor bands 11 to 20.

CA 02686264 2009-11-24
32737-CA-PAT - 16 -
[0060] From
the above, it can be observed that a direct search for the
minimum combined cost requires the computation of encoding costs for all
combinations of scalefac compress, scalfac scale and preflag. This leads to
16 x 2 x 2 = 64 different combinations to find the minimum combined cost
for each scalefactor band. Without intending to be limiting, the following
example embodiment assumes that the encoding block is an MPEG-1
encoded, long-window frame. In
some example embodiments, it is
recognized that there are some redundant operations in the distortion
computations. Therefore, some example embodiments provide for pre-
generating a look-up table for those redundant operations, which are based
on slen rather than searching through all combinations of scalefac compress.
[0061] From
Table 2, the maximum length for slenl is 4 while the
maximum length for slen2 is 3 (as based on the MP3 standard). When slen1
and slen2 are given, in some example embodiments, one can find the
minimum encoding distortion for each scalefactor band and the
corresponding scalefac[sb] which generates the minimum encoding
distortion. Hence, when preflag and scalfac scale are fixed, there only needs
to be calculated 5 (the first 11 bands) or 4 (the last 10 bands) different
cases
of encoding distortion for each scale factor band, rather than calculate the
encoding distortion 16 times for different scalefac compress. In each case,
the pre-calculated encoding distortion is minimized with a certain value for
scalefac[sb] given the length slenl or slen2.
[0062] Let's
denote dist[sb][slen] as the minimum weighted distortion
for scale factor band sb, where sb=0, ===, 20 and slen=0, ===, 4. Denote
sf[sb][slen] as the value for scalefac[sb] such that the weighted distortion
is
minimized for scale factor band sb when the bit length used for transmitting
scalefac[sb] is slen. To generate a look-up table for each scale factor band,
apply the following approach given the fixed values for global gain,
scalfac scale and preflag. Without loss of generality, the following example
embodiment considers the first 11 scale factor bands for an MPEG-1
encoded, long-window frame.
[0063] Assume
s[sb] in equation (3.9) can be freely chosen. That is,
s[sb] is not restricted by the value of scalefactsb] to be one of the 16
integer
numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15). Apply the

CA 02686264 2009-11-24
32737-CA-PAT - 17 -
minimum mean square error criterion to find the minimum weighted
ad,,,[sh]
distortion for (3.9). That is, let __ =0, which leads to
as[sb]
/[sh+li-1
XI; = y,4/3
4 ,=/ishi
s[sb] ____________________
i gio (3.15)
log102 y/813
1=1N)]
[0064] Denote sg[sb]=s[sb]+210. The
corresponding value for
scalefac[sb] is (global gain - sg[sb])/2(1+scalfac_scale ) _ preflag =
pretab[sb].
Denote this value as T. scalefac[sb] cannot be freely chosen in reality (as
defined by the standard), that is, it must be constrained to one of the 16
integer numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15). In
some example embodiments, the value of scalefac[sb] can be determined
using the following algorithm. Generally, it is determined whether T exceeds
encoding within slen, and if so constraining T to within slen:
If slen=0
let sf[sb][slen] = 0, and calculate the distortion dist[sb][0]. (3.16)
Else (slen=0)
if T 5_ 0
for slen = 1 to 4
let sf[sb][slen] = 0, and let dist[sb][slen] = dist[sb][0].
else if T 15
for slen=1 to 4
let sf[sb][slen]=2slen_1, and calculate dist[sb][slen] using
equation (3.9).
else
let sf[sb][4]=T (If T is not an integer, choose one of the two nearest
integers to T which has smaller weighted distortion), calculate
dist[sb][4] using equation (3.9)
for slen,--3 down to /
if sf[sb][slen+1] Z1'1-1
let sf[sb][slen]=2slen_1.
else
let sf[sb][slen]=sf[sb][slen+1].

CA 02686264 2009-11-24
32737-CA-PAT - 18 -
calculate dist[sb][slen] using equation (3.9).
[0065]
Totally there are 20 different cases (5 slenl x 2 preflag x 2
scalfac scale) of encoding distortion for each of the first 11 scale factor
bands and 16 different cases (4 slen2 x 2 preflag x 2 scalfac scale) of
encoding distortion for each of the last 10 scale factor bands. As the setting
of preflag only affects the last 10 scale factor bands, the number of
different
cases of encoding distortion to be computed for each of the first 11 scale
factor bands is reduced to 10 (5 sleni x 2 scalfac scale). In other words,
the cost function is minimized with respect to preflag for only one set of
scale
factor bands, being the higher frequency scale factor bands 11 to 20. In
addition, there exists one redundant case for each scale factor band if
scalefac[sb] is equal to 0 (i.e., (3.16) may be calculated once). As a result,
in some example embodiments, there are 9 (the first 11 scale factor bands)
or 15 (the last 10 scale factor bands) different cases of encoding distortion
for each scale factor band.
[0066] After
generating the above table based on encoding distortion,
what remains is the calculation of the total Lagrangian cost by calculating
(3.3). As described above with respect to (3.3), the total Lagrangian cost is
the addition of the encoding distortion and the bit rate. Therefore, what
remains is the addition of bit rate to calculate the combined cost. For
example, the distortion based on bit rate for the transmission of all scale
factors can also be looked up from a pre-generated table, as is known in the
art. Similarly, for other window cases, a similar approach could be applied to
reduce the computational complexity.
[0067] At
step 106, repeat steps 102 and 104 until the decrease of the
combined cost is below a prescribed threshold. If the
predetermined
threshold is reached, at step 110 output the final global gain and scale
factors (scalefac, scalfac scale, preflag/subblock gain), and then ends at
step 112 (or proceed to the next step in method 50 (Figure 2)).
[0068] As the
iterative method 100 generally converges after two
rounds of iteration, the number of different cases to be computed for each
scale factor band of an MPEG-1 encoded, long-window frame has been
reduced from 16384 to 18 (the first 11 bands) or 30 (the last 10 bands).

CA 02686264 2012-11-05
19
[0069] The particular quantization factors or scale factors to be
determined may
depend on the particular application or coding scheme, and may not be limited
to the
parameters global gain, scale fac, scalfac scale, and preflag/subblock gain.
[0070] Referring now to FIG. 2, step 60 will now be described. Given Huffman
coding
region division P, the quantization factors q and quantized spectral
coefficients y,
determining the Huffman codebook H may be performed as follows: for each
region,
every Huffman codebook that has encodable value limit larger than or equal to
the
greatest coefficient amplitude of that region is considered, and the one with
the
minimum codeword length is selected.
[0071] Implementation and simulation results will now be described. In regards
to
(3.3), the estimation of lambda (A) will now be described in greater detail.
In
conventional systems, bisection methods may be used to determine for a final
A. This
may require a high computational complexity which is proportional to the
number of
iterations over the optimization algorithm described in the last section. As
recognized
herein, in some example embodiments, by analyzing the relationship between
Perceptual Entropy, signal to noise ratio, signal to mask ratio, encoding bit
rate and the
number of audio samples to be encoded, the final A was estimated using the
following
formula in a trellis search algorithm for the optimization of advance audio
coding (MC),
c11n10x10(cPE-c3R)/ M
final 10M
where PE is Perceptual Entropy of an encoded frame, R is the encoding bit
rate, and M
is the number of audio samples to be encoded. ci, c2 and c3 are determined
from the
experimental data using the least square criterion. This is for example
generally
described in C. Bauer and M. Vinton, "Joint optimization of scale factors and
Huffman
codebooks for MEPG-4 MC," in Proc. of the 2004 IEEE workshop on Multimedia
Signal
Processing, pp. 111-114, 2004; and C. Bauer and M. Vinton, "Joint optimization
of scale
factors and Huffman codebooks for MEPG-4 MC," in IEEE Trans. on Signal
Processing, vol. 54, pp. 177-189, January 2006.

CA 02686264 2009-11-24
32737-CA-PAT - 20 -
[0072] In the experiment, 16 RIFF WAVE files with a sampling rate of
44.1 khz from a sound test file were used. The initial value for A was
arbitrarily selected, and the bisection method was used to find the final
value
for A. The optimized MP3 encoded files were generated for each of the 16
RIFF WAVE test files at the encoding bit rates of 32, 40, 48, 56, 64, 80, 96,
112, 128, 144, 160, 192, 224, 256 and 320 kbps. For each tested file,
tested values of Perceptual Entropy and A at different encoding bit rates were
recorded. As the values of Perceptual Entropy are usually in the range of
100 to 3000, tested data outside this range was discarded. Next, uniformly
quantize the values of tested Perceptual Entropy with a quantization step size
of 100, and calculated the mean value and standard deviation for the tested
A for each possible encoding bit rate and perceptual entropy pair.
[0073] To determine the values of cl, c2 and c3, a non-linear regression
progress within MATLAB optimization toolbox was used in some example
simulations. Specifically, use the following MATLAB function
beta = nlinfit(X ,y, fun, beta0) (4.2)
to estimate the coefficients of ci, c2 and c3. In the above formula, X
represents independent variables PE and R. y represents the dependent
variable /1õ,/,' . fun represents the formula
(4.1). beta is a vector
containing initial values for the coefficients for cl, c2 and c3. To avoid the
ill
condition in the nonlinear regression process, discard those encoding bit rate
and perceptual entropy pairs where 75% of the tested 2,1, values generated
from the bisection method fall outside the range of 20% of standard
deviation from the mean value.
[0074] For 44.1 khz sampling audio, LAME's psychoacoustic model, the
following values for ci, c2 and c3 to encode the audio file in MP3 format were
obtained:
lci = 8.3839
c2 =1.3946
c, = 6.2698

CA 02686264 2012-11-05
21
[0075] The average number of iterations was tested over the Lagrangian
multiplier if the
formula (4.1) with the above estimated coefficient is used as the initial
point for the bisection
search. The average number of iterations over the Lagrangian multiplier is
1.5. On the other
hand, the average number of iterations over the Lagrangian multiplier ranges
from 4 to 8 if an
arbitrary number is used as the initial point. Therefore, on the average,
using (4.1) as the initial
point can run 4 times as fast as the method in which an arbitrary initial
point is used.
[0076]
Implementation and simulation results of the optimization process 50 will now
be
described, referring now to Figures 6 to 9. Generally, the performance of
example
embodiments is implemented based on two MP3 encoders: ISO reference codec and
LAME
3.96.1. For each case, the iterative optimization algorithm uses the original
encoder output as
the initial points. Figure 6 shows a graph 140 of performance characteristics
of an example
embodiment, showing a comparison of the method 50 (Figure 2) for encoding of
audio file
waltz.wav as compared to LAME. Figure 8 shows a graph 160 of performance
characteristics of
an example embodiment, for encoding audio file vioin.wav as compared to ISO
reference
codec. Figure 9 shows a graph 170 of performance characteristics of an example
embodiment,
for encoding of audio file violin.wav as compared to LAME.
[0077] The LAME MP3 encoder features a psychoacoustic model, joint stereo
encoding and
variable bit-rate encoding. However, LAME still uses the basic structure of
typical TNLS. In
LAME 3.96.1, a refining TNLS is used to minimize the total noise to masking
ratio for an entire
frame after the successful termination of search process given its typical
TNLS. Specifically,
during each outer loop, the band with maximum noise to masking ratio is
amplified and the
best result based on total noise to mask ratio is stored.
[0078] The method 50 (Figure 2) is implemented as described above. For each
case, the
perceptual model, joint stereo encoding mode and window switching decision are
kept intact.
Figure 6 shows the rate-distortion performance of the method 50 (Figure 2)
(denoted as "RD
optimization" in

CA 02686264 2009-11-24
32737-CA-PAT - 22 -
the graph 140) applied to ISO reference encoder, when compared to a
conventional or normal ISO reference encoder implementing TNLS, in
constant bit-rate mode for waltz.wav. The test file may for example be
encoded at 48khz, 2 channel, 16 bits/sample, 30 seconds. In Figure 6, "ISO-
HO" represents the optimal Huffman tables used for Huffman coding, while
"ISO-NH" means that the first Huffman table satisfying the coding limit is
selected for each Huffman coding region. The vertical axes denote the
average noise to mask ratio over all audio frames. From Figure 6, the
method 50 (Figure 2) can achieve significant performance gain over the ISO
reference encoder. For instance, at 320kbps the proposed optimization
algorithm achieves 4.57dB and 2.75dB ANMR gains over ISO-NH and ISO-HO
respectively. The ANMR of the optimized algorithm at 32kbps is similar to
that of ISO reference encoder at 40kbps, which corresponds to equivalent
20% compression rate reduction.
[0079] Figure
7 depicts the rate-distortion performance of the method
50 (Figure 2) (also denoted as "RD optimization) applied to LAME when
compared to the LAME reference encoder (implementing conventional TNLS)
in constant bit-rate mode for waltz.wav. It is shown separately from ISO
reference encoder because ISO reference encoder and LAME adopt different
perceptual models. For an
unbiased comparison, in some example
embodiments the LAME encoder disables the functions of amplitude scaling
and low pass filter. In
Figure 7, "LAME" means that the audio file is
compressed using LAME's normal compression mode. As shown, the method
50 (Figure 2) outperforms LAME in terms of compression performance. At
96kbps, the proposed optimization algorithm achieves about 1.34dB ANMR
gain over LAME.
[0080] Figures
8 and 9 compare the compression performance of the
method 50 (Figure 2) for the music file violin.wav (MPEG lossless audio
coding test file, 48khz, 2 channel, 16 bits/sample, 30 seconds) in constant
bit-rate mode. Figure 8 shows results from ISO reference encoder, while
Figure 9 shows results from LAME. It may
be observed that "RD
optimization" has improved rate-distortion over the conventional reference
encoders. Similar results may be observed for other test music files.

CA 02686264 2009-11-24
32737-CA-PAT - 23 -
[0081]
Referring now to Figure 2, the computational complexity of the
method 50 will now be described. Given the value of A, the number of
iterations in the iterative joint optimization algorithm has a direct impact
on
the computational complexity.
Experiments show that by setting the
convergence tolerance E to 0.005, the iteration process is observed to
converge after 2 loops in most cases, that is, most of the gain achievable
from full joint optimization is obtained within two iterations. This is the
same
to the iterative quantization factor q updating in step 58. In Step 56, the
search range for yl is set to [yhra, yk+a], where yh, is the jth quantized
coefficient from hard decision quantization (e.g. yi is determined from (2.1))
and a is a fixed integer. Experiments show that further expansion of the
search range for yj beyond a = 2 does not significantly improve compression
performance. In constant bit-rate mode, the average number of iterations
over the Lagrangian multiplier is 1.5 if the formula (4.1) is used as the
initial
point. On the other hand, the average number of iterations over the
Lagrangian multiplier ranges from 4 to 8 if an arbitrary number is used as
the initial point.
[0082] Table
3 lists the computation time (in seconds) on a Pentium
PC, 2.16GHZ, 1G bytes of RAM to encode violin.wav and waltz.wav at
different transmission rates for the method 50 based on LAME reference
codec.
[0083]
Bit rates 96 112 128 160 192
(kbps)
Waltz. wav 27 23 21 21 16
Violin.wav 23 22 20 16 15
Table 3. Computation time in seconds for different MP3 encoders
[0084] From
Table 3 the proposed optimization algorithm generally
reaches real time throughput, which suggests that the method 50 is
computationally efficient. As shown in Table 3, the computation time is
generally less than 30 seconds. The computation time for ISO-based
encoders is not listed, but are generally less-efficient than LAME-based
encoders in both the computation time and compression performance.

CA 02686264 2012-11-05
24
[0085] Reference is now made to Figure 10, which shows an encoder 300 in
accordance
with an example embodiment. The encoder 300 may for example be implemented on
a
suitable configured computer device. The encoder 300 includes a controller
such as a
microprocessor 302 that controls the overall operation of the encoder 300. The
microprocessor
302 may also interact with other subsystems (not shown) such as a
communications subsystem,
display, and one or more auxiliary input/output (I/O) subsystems or devices.
The encoder 300
includes a memory 304 accessible by the microprocessor 302. Operating system
software 306
and various software applications 308 used by the microprocessor 302 are, in
some example
embodiments, stored in memory 304 or similar storage element. For example, MP3
software
application 310, such as the ISO-based encoder or LAME-based encoder described
above, may
be installed as one of the various software application 308. The
microprocessor 302, in
addition to its operating system functions, in example embodiments enables
execution of
software applications 308 on the device.
[0086] The encoder 300 may be used for optimizing performance of MP3 encoding
of a
source sequence. Specifically, the encoder 300 may enable the microprocessor
302 to
determine quantization factors (for example including a global quantization
step size and scale
factors) for the source sequence. The memory 304 may contain a cost function
of an encoding
of the source sequence, wherein the cost function is dependent of the
quantization factors.
The memory 304 may also contain a predetermined tolerance of the cost function
stored in the
memory 304. Instructions residing in memory 304 enable the microprocessor 302
to access the
cost function and predetermined tolerance from memory 304, determine the
quantization
factors which minimize the cost function within the predetermined tolerance,
and store the
determined quantization factors in memory 304 for MP3 encoding of the source
sequence.
Generally, an iterative method is performed such that global gain is
determined while the scale
factors are fixed, and the scale factors are determined while global gain is
fixed. This is
repeated until a calculated rate-distortion cost is within a predetermined
threshold. For
example, the MP3 software application

CA 02686264 2009-11-24
32737-CA-PAT - 25 -
310 may be used to perform MP3 encoding using the determined
quantization factors.
[0087] In another example embodiment, the encoder 300 may be
configured for optimizing of parameters including quantization factors, in a
manner similar to the example methods described above. For example, the
encoder 300 may be configured to perform the method 50 (Figure 2).
[0088] While the foregoing has been described with respect to MP3
encoding, it may be appreciated by those skilled in the art that example
embodiments may be adapted to or implemented by other forms of signal
encoding or audio signal encoding, for example Advanced Audio Coding.
[0089] While example embodiments have been described in detail in
the foregoing specification, it will be understood by those skilled in the art
that variations may be made without departing from the scope of the present
application.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer , as well as the definitions for Patent , Event History , Maintenance Fee  and Payment History  should be consulted.

Event History

Description Date
Common Representative Appointed 2019-10-30
Common Representative Appointed 2019-10-30
Revocation of Agent Request 2018-11-29
Appointment of Agent Request 2018-11-29
Grant by Issuance 2015-01-27
Inactive: Cover page published 2015-01-26
Maintenance Request Received 2014-11-04
Pre-grant 2014-09-26
Inactive: Final fee received 2014-09-26
Notice of Allowance is Issued 2014-04-04
Letter Sent 2014-04-04
Notice of Allowance is Issued 2014-04-04
Inactive: Approved for allowance (AFA) 2014-03-31
Inactive: Q2 passed 2014-03-31
Amendment Received - Voluntary Amendment 2013-11-27
Maintenance Request Received 2013-11-05
Inactive: S.30(2) Rules - Examiner requisition 2013-05-31
Inactive: IPC assigned 2013-02-13
Inactive: First IPC assigned 2013-02-13
Amendment Received - Voluntary Amendment 2013-01-15
Inactive: IPC expired 2013-01-01
Inactive: IPC removed 2012-12-31
Maintenance Request Received 2012-11-05
Amendment Received - Voluntary Amendment 2012-11-05
Inactive: S.30(2) Rules - Examiner requisition 2012-05-09
Amendment Received - Voluntary Amendment 2011-07-07
Appointment of Agent Requirements Determined Compliant 2011-06-16
Inactive: Office letter 2011-06-16
Inactive: Office letter 2011-06-16
Revocation of Agent Requirements Determined Compliant 2011-06-16
Revocation of Agent Request 2011-05-30
Appointment of Agent Request 2011-05-30
Application Published (Open to Public Inspection) 2010-06-01
Inactive: Cover page published 2010-05-31
Inactive: IPC assigned 2010-03-05
Inactive: First IPC assigned 2010-03-05
Inactive: Office letter 2009-12-22
Inactive: Filing certificate - RFE (English) 2009-12-16
Letter Sent 2009-12-16
Letter Sent 2009-12-16
Letter Sent 2009-12-16
Application Received - Regular National 2009-12-16
Amendment Received - Voluntary Amendment 2009-11-24
Request for Examination Requirements Determined Compliant 2009-11-24
All Requirements for Examination Determined Compliant 2009-11-24

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2014-11-04

Note : If the full payment has not been received on or before the date indicated, a further fee may be required which may be one of the following

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
BLACKBERRY LIMITED
Past Owners on Record
EN-HUI YANG
GUIXING WU
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

To view selected files, please enter reCAPTCHA code :



To view images, click a link in the Document Description column (Temporarily unavailable). To download the documents, select one or more checkboxes in the first column and then click the "Download Selected in PDF format (Zip Archive)" or the "Download Selected as Single PDF" button.

List of published and non-published patent-specific documents on the CPD .

If you have any difficulty accessing content, you can call the Client Service Centre at 1-866-997-1936 or send them an e-mail at CIPO Client Service Centre.

({010=All Documents, 020=As Filed, 030=As Open to Public Inspection, 040=At Issuance, 050=Examination, 060=Incoming Correspondence, 070=Miscellaneous, 080=Outgoing Correspondence, 090=Payment})


Document
Description 
Date
(yyyy-mm-dd) 
Number of pages   Size of Image (KB) 
Description 2009-11-23 25 1,108
Abstract 2009-11-23 1 15
Claims 2009-11-23 5 149
Drawings 2009-11-23 7 88
Description 2012-11-04 25 1,102
Claims 2012-11-04 4 131
Claims 2013-11-26 4 132
Representative drawing 2014-03-30 1 5
Acknowledgement of Request for Examination 2009-12-15 1 175
Courtesy - Certificate of registration (related document(s)) 2009-12-15 1 103
Courtesy - Certificate of registration (related document(s)) 2009-12-15 1 103
Filing Certificate (English) 2009-12-15 1 156
Reminder of maintenance fee due 2011-07-25 1 113
Commissioner's Notice - Application Found Allowable 2014-04-03 1 162
Correspondence 2009-12-15 1 19
Correspondence 2011-05-29 5 156
Correspondence 2011-06-15 1 13
Correspondence 2011-06-15 1 21
Fees 2011-11-23 1 35
Fees 2012-11-04 1 39
Fees 2013-11-04 1 38
Correspondence 2014-07-14 5 102
Correspondence 2014-09-25 1 38
Fees 2014-11-03 1 38